amachine
What is A-Machine?
A-Machine is a work in progress library for constructing epsilon-machines1 and other stochastic models for generating structured symbol sequences with ground truth causal structure and information-theoretic complexity. The intended use case is the study of neural network learning dynamics and internal representations.
Quickstart
Installation
# CPU only
pip install a-machine
# With GPU support (requires CUDA 13)
pip install "a-machine[cuda]" --extra-index-url https://pypi.nvidia.com
Example
More examples are available at the gitlab repo.
import amachine as am
# Seeds random and np.random globally
am.srand_global( 42 )
# May have multiple recurrent subgraphs, terminal states, or tranistory states
m = am.random_machine(
n_states=23,
symbols=[ '0', '1' ],
connectedness=0.35,
randomness=0.25 )
# Collapse to the largest recurrent subgraph
m.collapse_to_largest_strongly_connected_subgraph()
# Minimize the machine in case it's not minimal -> epsilon-machine.
m.minimize( retain_names=True )
# Entropy rate, statistical complexity, excess entropy, crypticity
print( f"h_mu : {m.h_mu()}" )
print( f"C_mu : {m.C_mu()}" )
# Display the epislon machine
m.draw_graph()

HMMs and Epsilon Machines
An $\epsilon$-machine is the minimal, optimally predictive, unifilar, hidden Markov model of a stationary stochastic process. A large body of theory linking them to complex dynamical systems, information theory, and physics, has been developed by James P. Crutchfield and collaborators under the name computational mechanics 12.
A-Machine's HMM implementation is designed primarily for constructing $\epsilon$-machines to generate training data with interesting ground truth theoretical properties. This should be useful for making informed hypotheses about, designing experiments for, and interpreting the learning dynamics and internal representations of neural networks.
More expressive finite generative models, such as non-unifilar HMMs or probabilistic context-free grammars, can represent a broader class of languages than a finite $\epsilon$-machine, but may lack the theoretical tractability that makes $\epsilon$-machines particularly analytically useful. Thus, utilizing $\epsilon$-machines instead of alternatives, may involve tradeoff between expressiveness and interpretability.
Complexity Measures
Complexity measures implemented within HMM, include:
- statistical complexity $C_\mu$,
- entropy rate $h_{\mu}$,
- single symbol entropy H(1),
- learnable information $\rho_{\mu}$,
- Excess entropy $\mathbf{E}$,
- Syncronization information $\mathbf{S}$,
- Transient information $\mathbf{T}$,
- Crypticity $\chi$, and
- Block convergence measures $\mathbf{E}(L), \mathbf{S}(L), \mathbf{T}(L), \mathcal{H}(L),$ and $h_{\mu}(L)$.
Citations, definitions, and interpretations are in their individual documentation sections linked above.
Construction
A few ways to construct $\epsilon$-machines are impemented so far, and more are planned:
- random_machine, which generates a random HMM that can be reduced to an $\epsilon$-machine,
- star_join, which joins a list of machines to a central hub,
- star, whcih combines random_machine and star_join, and
- isomorphic_to, which creates a copy of an existing machine with new labels.
Mixed State Presentation
The mixed states 34, track beliefs about which intrinsic state the $\epsilon$-machicine as an observer attempts to syncronize with the $\epsilon$ machines. The MSP can be a useful tool for analyzing learning dynamics, because this is the task that a neural network learning from the generated data must perform to minimize loss. This has been empirically emipirically demonstrated by Shai et al. in transformer models 5.
The MSP can explode in terms of the number of states that are produced, or even be inifite for non-synchronizable processes. The belief states transitions also may produce fractal-like patterns in the probability simplex.
A-Machine implements MSP construction from a given $\epsilon$ machine. Since the MSP may have too many states to compute, a cap on the max number of states is imposed, and when the cap is reached, the MSP is closed by mapping the unresolved frontier of belief states back to the nearest existing belief states. When this happens, the final MSP is an only an approximation.
The MSP also yields closed form expressions for a range of complexity measures 6, such as $\mathbf{E}, \mathbf{S}$, and $\mathbf{T}$, which can be solve for using spectral decompositon. A-machine implements this approach to computing complexity measures, in addition to the alternative block entropy convergence approach. When the MSP construction excedes the max states cap, comparing the measures computed from its approximation with those approximated by block entropy convergence is useful for cooboration and additional insights about the synronization dynamics.
Block Entropy Convergence
As mentioned, block entropy convergence supports estimation of a range of complexity measures, and supports observing the convergence of uncertain as the machine synchronizes as all symbol blocks of length $L$ have been observed. A-machine implements this as a C++ extension, c++ code on gitlab.
Data Generation
Data generation supports generating stochastic sequence of symbols generated by a given machine, optionally along with the sequence of hidden states that were traversed, and also optionally with an additional symbol sequence with what we are calling isomorphic_shift applied. This changes the symbols that were emited for each state by the symbols that would have been emitted by alternative states that isomorphic to those states (should they exist).
Currently only pairs of isomorphic states are supported, while the plan is to generalize and expand the features for isomorphic substructure construction in general.
import amachine as am
am.srand_global(42)
m = am.random_machine(
n_states=11,
symbols=[ '0', '1', '2' ],
connectedness=0.65,
randomness=0.37 )
# Collapse to the largest recurrent subgraph
m.collapse_to_largest_strongly_connected_subgraph()
# Minimize the machine -> epsilon-machine.
m.minimize()
# Create an isomorphic machine with new labels
m_iso = am.isomorphic_to( m, alphabet=[ '3', '4', '5' ] )
m_iso.isoclass = 0
m.isoclass = 0
for j, state in enumerate( m.states ) :
m.states[ j ].add_isomorph( m_iso.states[ j ].name )
m_iso.states[ j ].add_isomorph( m.states[ j ].name )
# Join them together
m_star = am.star_join(
exit_symbol='x',
enter_symbols=['a','b', 'c', 'd' ],
machines=[ m, m_iso ],
mode_residency_factor=0.75
)
m_star.draw_graph()
m_star.generate_data(
file_prefix="./data/iso_012-345", # saves as iso_012-345.parqet
n_gen=2_000, # 2 billion symbols
include_states=True,
isomorphic_shifts={1}
)

The two sequences will be aligned, and with identical local and global statistics. Thus, you can train on one of them, and then analyze activation patterns and residuals on both of them in parallel during inference, to study if and how the neural network learned polysemantic representations that exploit the structural redundancies.
Cleaner and more general methods for constructing graphs with ismorphic substructure are in the works.
-
Crutchfield, "The calculi of emergence: computation, dynamics and induction.", 1994. https://csc.ucdavis.edu/~cmg/papers/CalcEmerg.pdf ↩
-
Shalizi, Rohilla, and Crutchfield. "Computational mechanics: Pattern and prediction, structure and simplicity.", 2001. https://arxiv.org/abs/cond-mat/9907176 ↩
-
Blackwell, David Harold. "The entropy of functions of finite-state Markov chains.", 1959. ↩
-
Jurgens & Crutchfield "Shannon Entropy Rate of Hidden Markov Processes", 2021. https://link.springer.com/article/10.1007/s10955-021-02769-3 ↩
-
Shai et al., "Transformers Represent Belief State Geometry in their Residual Stream", 2024. https://arxiv.org/abs/2405.15943 ↩
-
Crutchfield, "Exact complexity: The spectral decomposition of intrinsic computation", 2016. https://csc.ucdavis.edu/~cmg/papers/ec.pdf ↩
1""" 2# What is A-Machine? 3 4A-Machine is a work in progress library for constructing epsilon-machines[^1] and other stochastic models for generating structured symbol sequences with ground truth causal structure and information-theoretic complexity. The intended use case is the study of neural network learning dynamics and internal representations. 5 6# Quickstart 7 8## Installation 9 10```bash 11# CPU only 12pip install a-machine 13 14# With GPU support (requires CUDA 13) 15pip install "a-machine[cuda]" --extra-index-url https://pypi.nvidia.com 16``` 17 18##Example 19 20[More examples](https://gitlab.com/tneuroth/a-machine/-/tree/main/examples) are available at the gitlab repo. 21 22```python 23import amachine as am 24 25# Seeds random and np.random globally 26am.srand_global( 42 ) 27 28# May have multiple recurrent subgraphs, terminal states, or tranistory states 29m = am.random_machine( 30 n_states=23, 31 symbols=[ '0', '1' ], 32 connectedness=0.35, 33 randomness=0.25 ) 34 35# Collapse to the largest recurrent subgraph 36m.collapse_to_largest_strongly_connected_subgraph() 37 38# Minimize the machine in case it's not minimal -> epsilon-machine. 39m.minimize( retain_names=True ) 40 41# Entropy rate, statistical complexity, excess entropy, crypticity 42print( f"h_mu : {m.h_mu()}" ) 43print( f"C_mu : {m.C_mu()}" ) 44 45# Display the epislon machine 46m.draw_graph() 47``` 48 49<img src="resources/am_graph.png" alt="AM Graph" style="width: 80%; margin-left: 10%;"> 50 51# HMMs and Epsilon Machines 52 53An $\\epsilon$-machine is the minimal, optimally predictive, unifilar, hidden Markov model of a stationary stochastic process. A large body of theory linking them to complex dynamical systems, information theory, and physics, has been developed by James P. Crutchfield and collaborators under the name computational mechanics [^1][^2]. 54 55A-Machine's [HMM implementation](amachine/am_hmm.html#HMM) is designed primarily for constructing $\\epsilon$-machines to generate training data with interesting ground truth theoretical properties. This should be useful for making informed hypotheses about, designing experiments for, and interpreting the learning dynamics and internal representations of neural networks. 56 57More expressive finite generative models, such as non-unifilar HMMs or probabilistic context-free grammars, can represent a broader class of languages than a finite $\\epsilon$-machine, but may lack the theoretical tractability that makes $\\epsilon$-machines particularly analytically useful. Thus, utilizing $\\epsilon$-machines instead of alternatives, may involve tradeoff between expressiveness and interpretability. 58 59## Complexity Measures 60 61Complexity measures implemented within HMM, include: 62 63* [statistical complexity $C_\\mu$](amachine/am_hmm.html#HMM.C_mu), 64* [entropy rate $h_{\\mu}$](amachine/am_hmm.html#HMM.h_mu), 65* [single symbol entropy H(1)](amachine/am_hmm.html#HMM.H_1), 66* [learnable information $\\rho_{\\mu}$](amachine/am_hmm.html#HMM.rho_mu), 67* [Excess entropy $\\mathbf{E}$](amachine/am_hmm.html#HMM.E), 68* [Syncronization information $\\mathbf{S}$](amachine/am_hmm.html#HMM.S), 69* [Transient information $\\mathbf{T}$](amachine/am_hmm.html#HMM.S), 70* [Crypticity $\\chi$](amachine/am_hmm.html#HMM.chi), and 71* [Block convergence measures $\\mathbf{E}(L), \\mathbf{S}(L), \\mathbf{T}(L), \\mathcal{H}(L),$ and $h_{\\mu}(L)$](amachine/am_hmm.html#HMM.block_convergence). 72 73Citations, definitions, and interpretations are in their individual documentation sections linked above. 74 75## Construction 76 77A few ways to construct $\\epsilon$-machines are impemented so far, and more are planned: 78 79* [random_machine](amachine/am_create.html#random_machine), which generates a random HMM that can be reduced to an $\\epsilon$-machine, 80* [star_join](amachine/am_create.html#star_join), which joins a list of machines to a central hub, 81* [star](amachine/am_create.html#star), whcih combines random_machine and star_join, and 82* [isomorphic_to](amachine/am_create.html#isomorphic_to), which creates a copy of an existing machine with new labels. 83 84## Mixed State Presentation 85 86The mixed states [^3][^4], track beliefs about which intrinsic state the $\\epsilon$-machicine as an observer attempts to syncronize with the $\\epsilon$ machines. The MSP can be a useful tool for analyzing learning dynamics, because this is the task that a neural network learning from the generated data must perform to minimize loss. This has been empirically emipirically demonstrated by Shai et al. in transformer models [^5]. 87 88The MSP can explode in terms of the number of states that are produced, or even be inifite for non-synchronizable processes. The belief states transitions also may produce fractal-like patterns in the probability simplex. 89 90A-Machine implements [MSP construction](amachine/am_machine.html#get_msp) from a given $\\epsilon$ machine. Since the MSP may have too many states to compute, a cap on the max number of states is imposed, and when the cap is reached, the MSP is closed by mapping the unresolved frontier of belief states back to the nearest existing belief states. When this happens, the final MSP is an only an approximation. 91 92The MSP also yields closed form expressions for a range of complexity measures [^6], such as $\\mathbf{E}, \\mathbf{S}$, and $\\mathbf{T}$, which can be solve for using spectral decompositon. A-machine implements this approach to computing complexity measures, in addition to the alternative block entropy convergence approach. When the MSP construction excedes the max states cap, comparing the measures computed from its approximation with those approximated by block entropy convergence is useful for cooboration and additional insights about the synronization dynamics. 93 94## Block Entropy Convergence 95 96As mentioned, [block entropy convergence](amachine/am_machine.html#get_msp) supports estimation of a range of complexity measures, and supports observing the convergence of uncertain as the machine synchronizes as all symbol blocks of length $L$ have been observed. A-machine implements this as a [C++ extension](amachine/am_fast.html#block_entropy_convergence), [c++ code on gitlab](https://gitlab.com/tneuroth/a-machine/-/blob/main/amachine/am_fast/am_fast.cpp). 97 98## Data Generation 99 100[Data generation](amachine/am_hmm.html#HMM.generate_data) supports generating stochastic sequence of symbols generated by a given machine, optionally along with the sequence of hidden states that were traversed, and also optionally with an additional symbol sequence with what we are calling [isomorphic_shift](amachine/am_hmm.html#HMM.isomorphic_shift) applied. This changes the symbols that were emited for each state by the symbols that would have been emitted by alternative states that isomorphic to those states (should they exist). 101 102Currently only pairs of isomorphic states are supported, while the plan is to generalize and expand the features for isomorphic substructure construction in general. 103 104```python 105import amachine as am 106 107am.srand_global(42) 108 109m = am.random_machine( 110 n_states=11, 111 symbols=[ '0', '1', '2' ], 112 connectedness=0.65, 113 randomness=0.37 ) 114 115# Collapse to the largest recurrent subgraph 116m.collapse_to_largest_strongly_connected_subgraph() 117 118# Minimize the machine -> epsilon-machine. 119m.minimize() 120 121# Create an isomorphic machine with new labels 122m_iso = am.isomorphic_to( m, alphabet=[ '3', '4', '5' ] ) 123 124m_iso.isoclass = 0 125m.isoclass = 0 126 127for j, state in enumerate( m.states ) : 128 m.states[ j ].add_isomorph( m_iso.states[ j ].name ) 129 m_iso.states[ j ].add_isomorph( m.states[ j ].name ) 130 131# Join them together 132m_star = am.star_join( 133 exit_symbol='x', 134 enter_symbols=['a','b', 'c', 'd' ], 135 machines=[ m, m_iso ], 136 mode_residency_factor=0.75 137) 138 139m_star.draw_graph() 140 141m_star.generate_data( 142 file_prefix="./data/iso_012-345", # saves as iso_012-345.parqet 143 n_gen=2_000, # 2 billion symbols 144 include_states=True, 145 isomorphic_shifts={1} 146) 147``` 148 149<img src="resources/iso.png" alt="iso machine" style="width: 80%; margin-left: 10%;"> 150 151The two sequences will be aligned, and with identical local and global statistics. Thus, you can train on one of them, and then analyze activation patterns and residuals on both of them in parallel during inference, to study if and how the neural network learned polysemantic representations that exploit the structural redundancies. 152 153*Cleaner and more general methods for constructing graphs with ismorphic substructure are in the works.* 154 155[^1]: Crutchfield, "The calculi of emergence: computation, dynamics and induction.", 1994. 156 <https://csc.ucdavis.edu/~cmg/papers/CalcEmerg.pdf> 157 158[^2]: Shalizi, Rohilla, and Crutchfield. "Computational mechanics: Pattern and prediction, structure and simplicity.", 2001. 159 <https://arxiv.org/abs/cond-mat/9907176> 160 161[^3]: Blackwell, David Harold. "The entropy of functions of finite-state Markov chains.", 1959. 162 163[^4]: Jurgens & Crutchfield "Shannon Entropy Rate of Hidden Markov Processes", 2021. 164 <https://link.springer.com/article/10.1007/s10955-021-02769-3> 165 166[^5]: Shai et al., "Transformers Represent Belief State Geometry in their Residual Stream", 2024. 167 <https://arxiv.org/abs/2405.15943> 168 169[^6]: Crutchfield, "Exact complexity: The spectral decomposition of intrinsic computation", 2016. 170 <https://csc.ucdavis.edu/~cmg/papers/ec.pdf> 171 172""" 173__version__ = "0.1.2" 174 175from .am_create import ( 176 random_machine, 177 star_join, 178 star, 179 isomorphic_to 180) 181 182from .am_hmm import HMM 183 184from .am_syntagmatics import ( 185 Syntagmatics, 186 ExplicitCohesionRule 187) 188 189from .am_substitution_algebra import ( 190 SubstitutionAlgebra 191) 192 193from .am_random import srand_global