GitLab Repo

amachine

What is A-Machine?

A-Machine is a work in progress library for constructing epsilon-machines1 and other stochastic models for generating structured symbol sequences with ground truth causal structure and information-theoretic complexity. The intended use case is the study of neural network learning dynamics and internal representations.

Quickstart

Installation

# CPU only
pip install a-machine

# With GPU support (requires CUDA 13)
pip install "a-machine[cuda]" --extra-index-url https://pypi.nvidia.com

Example

More examples are available at the gitlab repo.

import amachine as am

# Seeds random and np.random globally
am.srand_global( 42 )

# May have multiple recurrent subgraphs, terminal states, or tranistory states
m = am.random_machine( 
        n_states=23, 
        symbols=[ '0', '1' ],
        connectedness=0.35,
        randomness=0.25 )

# Collapse to the largest recurrent subgraph
m.collapse_to_largest_strongly_connected_subgraph()

# Minimize the machine in case it's not minimal -> epsilon-machine.
m.minimize( retain_names=True )

# Entropy rate, statistical complexity, excess entropy, crypticity 
print( f"h_mu : {m.h_mu()}" )
print( f"C_mu : {m.C_mu()}" )

# Display the epislon machine
m.draw_graph()

AM Graph

HMMs and Epsilon Machines

An $\epsilon$-machine is the minimal, optimally predictive, unifilar, hidden Markov model of a stationary stochastic process. A large body of theory linking them to complex dynamical systems, information theory, and physics, has been developed by James P. Crutchfield and collaborators under the name computational mechanics 12.

A-Machine's HMM implementation is designed primarily for constructing $\epsilon$-machines to generate training data with interesting ground truth theoretical properties. This should be useful for making informed hypotheses about, designing experiments for, and interpreting the learning dynamics and internal representations of neural networks.

More expressive finite generative models, such as non-unifilar HMMs or probabilistic context-free grammars, can represent a broader class of languages than a finite $\epsilon$-machine, but may lack the theoretical tractability that makes $\epsilon$-machines particularly analytically useful. Thus, utilizing $\epsilon$-machines instead of alternatives, may involve tradeoff between expressiveness and interpretability.

Complexity Measures

Complexity measures implemented within HMM, include:

Citations, definitions, and interpretations are in their individual documentation sections linked above.

Construction

A few ways to construct $\epsilon$-machines are impemented so far, and more are planned:

  • random_machine, which generates a random HMM that can be reduced to an $\epsilon$-machine,
  • star_join, which joins a list of machines to a central hub,
  • star, whcih combines random_machine and star_join, and
  • isomorphic_to, which creates a copy of an existing machine with new labels.

Mixed State Presentation

The mixed states 34, track beliefs about which intrinsic state the $\epsilon$-machicine as an observer attempts to syncronize with the $\epsilon$ machines. The MSP can be a useful tool for analyzing learning dynamics, because this is the task that a neural network learning from the generated data must perform to minimize loss. This has been empirically emipirically demonstrated by Shai et al. in transformer models 5.

The MSP can explode in terms of the number of states that are produced, or even be inifite for non-synchronizable processes. The belief states transitions also may produce fractal-like patterns in the probability simplex.

A-Machine implements MSP construction from a given $\epsilon$ machine. Since the MSP may have too many states to compute, a cap on the max number of states is imposed, and when the cap is reached, the MSP is closed by mapping the unresolved frontier of belief states back to the nearest existing belief states. When this happens, the final MSP is an only an approximation.

The MSP also yields closed form expressions for a range of complexity measures 6, such as $\mathbf{E}, \mathbf{S}$, and $\mathbf{T}$, which can be solve for using spectral decompositon. A-machine implements this approach to computing complexity measures, in addition to the alternative block entropy convergence approach. When the MSP construction excedes the max states cap, comparing the measures computed from its approximation with those approximated by block entropy convergence is useful for cooboration and additional insights about the synronization dynamics.

Block Entropy Convergence

As mentioned, block entropy convergence supports estimation of a range of complexity measures, and supports observing the convergence of uncertain as the machine synchronizes as all symbol blocks of length $L$ have been observed. A-machine implements this as a C++ extension, c++ code on gitlab.

Data Generation

Data generation supports generating stochastic sequence of symbols generated by a given machine, optionally along with the sequence of hidden states that were traversed, and also optionally with an additional symbol sequence with what we are calling isomorphic_shift applied. This changes the symbols that were emited for each state by the symbols that would have been emitted by alternative states that isomorphic to those states (should they exist).

Currently only pairs of isomorphic states are supported, while the plan is to generalize and expand the features for isomorphic substructure construction in general.

import amachine as am

am.srand_global(42)

m = am.random_machine( 
        n_states=11, 
        symbols=[ '0', '1', '2' ],
        connectedness=0.65,
        randomness=0.37 )

# Collapse to the largest recurrent subgraph
m.collapse_to_largest_strongly_connected_subgraph()

# Minimize the machine -> epsilon-machine.
m.minimize()

# Create an isomorphic machine with new labels
m_iso = am.isomorphic_to( m, alphabet=[ '3', '4', '5' ] )

m_iso.isoclass = 0
m.isoclass     = 0

for j, state in enumerate( m.states ) : 
    m.states[ j ].add_isomorph( m_iso.states[ j ].name )
    m_iso.states[ j ].add_isomorph( m.states[ j ].name )

# Join them together
m_star = am.star_join(
    exit_symbol='x', 
    enter_symbols=['a','b', 'c', 'd' ],
    machines=[ m, m_iso ],
    mode_residency_factor=0.75 
)

m_star.draw_graph()

m_star.generate_data(
    file_prefix="./data/iso_012-345", # saves as iso_012-345.parqet
    n_gen=2_000, # 2 billion symbols    
    include_states=True,
    isomorphic_shifts={1}
)

iso machine

The two sequences will be aligned, and with identical local and global statistics. Thus, you can train on one of them, and then analyze activation patterns and residuals on both of them in parallel during inference, to study if and how the neural network learned polysemantic representations that exploit the structural redundancies.

Cleaner and more general methods for constructing graphs with ismorphic substructure are in the works.


  1. Crutchfield, "The calculi of emergence: computation, dynamics and induction.", 1994. https://csc.ucdavis.edu/~cmg/papers/CalcEmerg.pdf 

  2. Shalizi, Rohilla, and Crutchfield. "Computational mechanics: Pattern and prediction, structure and simplicity.", 2001. https://arxiv.org/abs/cond-mat/9907176 

  3. Blackwell, David Harold. "The entropy of functions of finite-state Markov chains.", 1959. 

  4. Jurgens & Crutchfield "Shannon Entropy Rate of Hidden Markov Processes", 2021. https://link.springer.com/article/10.1007/s10955-021-02769-3 

  5. Shai et al., "Transformers Represent Belief State Geometry in their Residual Stream", 2024. https://arxiv.org/abs/2405.15943 

  6. Crutchfield, "Exact complexity: The spectral decomposition of intrinsic computation", 2016. https://csc.ucdavis.edu/~cmg/papers/ec.pdf 

  1"""
  2# What is A-Machine?
  3
  4A-Machine is a work in progress library for constructing epsilon-machines[^1] and other stochastic models for generating structured symbol sequences with ground truth causal structure and information-theoretic complexity. The intended use case is the study of neural network learning dynamics and internal representations.
  5
  6# Quickstart
  7
  8## Installation
  9
 10```bash
 11# CPU only
 12pip install a-machine
 13
 14# With GPU support (requires CUDA 13)
 15pip install "a-machine[cuda]" --extra-index-url https://pypi.nvidia.com
 16```
 17
 18##Example
 19
 20[More examples](https://gitlab.com/tneuroth/a-machine/-/tree/main/examples) are available at the gitlab repo.
 21
 22```python
 23import amachine as am
 24
 25# Seeds random and np.random globally
 26am.srand_global( 42 )
 27
 28# May have multiple recurrent subgraphs, terminal states, or tranistory states
 29m = am.random_machine( 
 30	n_states=23, 
 31	symbols=[ '0', '1' ],
 32	connectedness=0.35,
 33	randomness=0.25 )
 34
 35# Collapse to the largest recurrent subgraph
 36m.collapse_to_largest_strongly_connected_subgraph()
 37
 38# Minimize the machine in case it's not minimal -> epsilon-machine.
 39m.minimize( retain_names=True )
 40
 41# Entropy rate, statistical complexity, excess entropy, crypticity 
 42print( f"h_mu : {m.h_mu()}" )
 43print( f"C_mu : {m.C_mu()}" )
 44
 45# Display the epislon machine
 46m.draw_graph()
 47```
 48
 49<img src="resources/am_graph.png" alt="AM Graph" style="width: 80%; margin-left: 10%;">
 50
 51# HMMs and Epsilon Machines
 52
 53An $\\epsilon$-machine is the minimal, optimally predictive, unifilar, hidden Markov model of a stationary stochastic process. A large body of theory linking them to complex dynamical systems, information theory, and physics, has been developed by James P. Crutchfield and collaborators under the name computational mechanics [^1][^2].
 54
 55A-Machine's [HMM implementation](amachine/am_hmm.html#HMM) is designed primarily for constructing $\\epsilon$-machines to generate training data with interesting ground truth theoretical properties. This should be useful for making informed hypotheses about, designing experiments for, and interpreting the learning dynamics and internal representations of neural networks.
 56
 57More expressive finite generative models, such as non-unifilar HMMs or probabilistic context-free grammars, can represent a broader class of languages than a finite $\\epsilon$-machine, but may lack the theoretical tractability that makes $\\epsilon$-machines particularly analytically useful. Thus, utilizing $\\epsilon$-machines instead of alternatives, may involve tradeoff between expressiveness and interpretability.
 58
 59## Complexity Measures
 60
 61Complexity measures implemented within HMM, include:
 62
 63* [statistical complexity $C_\\mu$](amachine/am_hmm.html#HMM.C_mu),
 64* [entropy rate $h_{\\mu}$](amachine/am_hmm.html#HMM.h_mu),
 65* [single symbol entropy H(1)](amachine/am_hmm.html#HMM.H_1), 
 66* [learnable information $\\rho_{\\mu}$](amachine/am_hmm.html#HMM.rho_mu),
 67* [Excess entropy $\\mathbf{E}$](amachine/am_hmm.html#HMM.E),
 68* [Syncronization information $\\mathbf{S}$](amachine/am_hmm.html#HMM.S),
 69* [Transient information $\\mathbf{T}$](amachine/am_hmm.html#HMM.S),
 70* [Crypticity $\\chi$](amachine/am_hmm.html#HMM.chi), and
 71* [Block convergence measures $\\mathbf{E}(L), \\mathbf{S}(L), \\mathbf{T}(L), \\mathcal{H}(L),$ and $h_{\\mu}(L)$](amachine/am_hmm.html#HMM.block_convergence).
 72
 73Citations, definitions, and interpretations are in their individual documentation sections linked above.
 74
 75## Construction
 76
 77A few ways to construct $\\epsilon$-machines are impemented so far, and more are planned:
 78
 79* [random_machine](amachine/am_create.html#random_machine), which generates a random HMM that can be reduced to an $\\epsilon$-machine,
 80* [star_join](amachine/am_create.html#star_join), which joins a list of machines to a central hub,
 81* [star](amachine/am_create.html#star), whcih combines random_machine and star_join, and
 82* [isomorphic_to](amachine/am_create.html#isomorphic_to), which creates a copy of an existing machine with new labels.
 83
 84## Mixed State Presentation
 85
 86The mixed states [^3][^4], track beliefs about which intrinsic state the $\\epsilon$-machicine as an observer attempts to syncronize with the $\\epsilon$ machines. The MSP can be a useful tool for analyzing learning dynamics, because this is the task that a neural network learning from the generated data must perform to minimize loss. This has been empirically emipirically demonstrated by Shai et al. in transformer models [^5].
 87
 88The MSP can explode in terms of the number of states that are produced, or even be inifite for non-synchronizable processes. The belief states transitions also may produce fractal-like patterns in the  probability simplex.
 89
 90A-Machine implements [MSP construction](amachine/am_machine.html#get_msp) from a given $\\epsilon$ machine. Since the MSP may have too many states to compute, a cap on the max number of states is imposed, and when the cap is reached, the MSP is closed by mapping the unresolved frontier of belief states back to the nearest existing belief states. When this happens, the final MSP is an only an approximation.
 91
 92The MSP also yields closed form expressions for a range of complexity measures [^6], such as $\\mathbf{E}, \\mathbf{S}$, and $\\mathbf{T}$, which can be solve for using spectral decompositon. A-machine implements this approach to computing complexity measures, in addition to the alternative block entropy convergence approach. When the MSP construction excedes the max states cap, comparing the measures computed from its approximation with those approximated by block entropy convergence is useful for cooboration and additional insights about the synronization dynamics.
 93
 94## Block Entropy Convergence
 95
 96As mentioned, [block entropy convergence](amachine/am_machine.html#get_msp) supports estimation of a range of complexity measures, and supports observing the convergence of uncertain as the machine synchronizes as all symbol blocks of length $L$ have been observed. A-machine implements this as a [C++ extension](amachine/am_fast.html#block_entropy_convergence), [c++ code on gitlab](https://gitlab.com/tneuroth/a-machine/-/blob/main/amachine/am_fast/am_fast.cpp).
 97
 98## Data Generation
 99
100[Data generation](amachine/am_hmm.html#HMM.generate_data) supports generating stochastic sequence of symbols generated by a given machine, optionally along with the sequence of hidden states that were traversed, and also optionally with an additional symbol sequence with what we are calling [isomorphic_shift](amachine/am_hmm.html#HMM.isomorphic_shift) applied. This changes the symbols that were emited for each state by the symbols that would have been emitted by alternative states that isomorphic to those states (should they exist).
101
102Currently only pairs of isomorphic states are supported, while the plan is to generalize and expand the features for isomorphic substructure construction in general.
103
104```python
105import amachine as am
106
107am.srand_global(42)
108
109m = am.random_machine( 
110        n_states=11, 
111        symbols=[ '0', '1', '2' ],
112        connectedness=0.65,
113        randomness=0.37 )
114
115# Collapse to the largest recurrent subgraph
116m.collapse_to_largest_strongly_connected_subgraph()
117
118# Minimize the machine -> epsilon-machine.
119m.minimize()
120
121# Create an isomorphic machine with new labels
122m_iso = am.isomorphic_to( m, alphabet=[ '3', '4', '5' ] )
123
124m_iso.isoclass = 0
125m.isoclass     = 0
126
127for j, state in enumerate( m.states ) : 
128    m.states[ j ].add_isomorph( m_iso.states[ j ].name )
129    m_iso.states[ j ].add_isomorph( m.states[ j ].name )
130    
131# Join them together
132m_star = am.star_join(
133    exit_symbol='x', 
134    enter_symbols=['a','b', 'c', 'd' ],
135    machines=[ m, m_iso ],
136    mode_residency_factor=0.75 
137)
138
139m_star.draw_graph()
140
141m_star.generate_data(
142    file_prefix="./data/iso_012-345", # saves as iso_012-345.parqet
143    n_gen=2_000, # 2 billion symbols    
144    include_states=True,
145    isomorphic_shifts={1}
146)
147```
148
149<img src="resources/iso.png" alt="iso machine" style="width: 80%; margin-left: 10%;">
150
151The two sequences will be aligned, and with identical local and global statistics. Thus, you can train on one of them, and then analyze activation patterns and residuals on both of them in parallel during inference, to study if and how the neural network learned polysemantic representations that exploit the structural redundancies.
152
153*Cleaner and more general methods for constructing graphs with ismorphic substructure are in the works.*
154
155[^1]: Crutchfield, "The calculi of emergence: computation, dynamics and induction.", 1994.
156    <https://csc.ucdavis.edu/~cmg/papers/CalcEmerg.pdf>
157
158[^2]: Shalizi, Rohilla, and Crutchfield. "Computational mechanics: Pattern and prediction, structure and simplicity.", 2001.
159    <https://arxiv.org/abs/cond-mat/9907176>
160
161[^3]: Blackwell, David Harold. "The entropy of functions of finite-state Markov chains.", 1959.
162
163[^4]: Jurgens & Crutchfield "Shannon Entropy Rate of Hidden Markov Processes", 2021.
164    <https://link.springer.com/article/10.1007/s10955-021-02769-3>
165
166[^5]: Shai et al., "Transformers Represent Belief State Geometry in their Residual Stream", 2024. 
167    <https://arxiv.org/abs/2405.15943>
168
169[^6]: Crutchfield, "Exact complexity: The spectral decomposition of intrinsic computation", 2016.
170    <https://csc.ucdavis.edu/~cmg/papers/ec.pdf>
171
172"""
173__version__ = "0.1.2"
174
175from .am_create import (
176    random_machine,
177    star_join,
178    star,
179    isomorphic_to
180)
181
182from .am_hmm import HMM
183
184from .am_syntagmatics import (
185    Syntagmatics,
186    ExplicitCohesionRule
187)
188
189from .am_substitution_algebra import (
190    SubstitutionAlgebra
191)
192
193from .am_random import srand_global