FluxEM: Algebraic Embeddings for Neural Arithmetic

Hunter Bown Shannon Labs
December 2024

Abstract

Neural networks often struggle with arithmetic because they treat numbers as arbitrary tokens. FluxEM is an embedding scheme where arithmetic operations map to geometric operations: addition becomes vector addition, and multiplication becomes addition in log space. The result is systematic generalization from algebraic structure without learned parameters, with accuracy evaluated within floating-point tolerance. The approach draws on Lewin's Generalized Interval Systems (1987), a framework from music theory that formalizes transformation-based structure.

1. The Problem

Large language models can describe calculus but still fumble arithmetic like 1847 x 392. Prior work (NALU, xVal, Abacus) focuses on training models to learn arithmetic more reliably.

The core issue is representation: when numbers are embedded as arbitrary token vectors, arithmetic relationships must be inferred from data. This inference often fails to generalize to new ranges.

2. The Insight

What if numbers were encoded so arithmetic operations are geometric operations?

For addition, embed numbers along a fixed direction. Then vector addition in embedding space equals arithmetic addition:

embed(a) + embed(b) = embed(a + b)

For multiplication, embed in log space. Addition in log space corresponds to multiplication in linear space:

log_embed(a) + log_embed(b) = log_embed(a x b)

These are algebraic identities over the reals for the magnitude component; under IEEE-754, they hold within floating-point precision bounds.

3. Formal Definition

3.1 Linear Embedding

For addition and subtraction:

e_lin(n) = (n / scale) * v

where v in R^d is a fixed unit vector. The homomorphism property e_lin(a) + e_lin(b) = e_lin(a + b) follows directly.

3.2 Logarithmic Embedding

For multiplication and division:

e_log(n) = (log|n| / scale) * v_mag + sign(n) * v_sign

where v_mag and v_sign are orthogonal unit vectors. Magnitude and sign are tracked separately, and the sign component is recombined in the operator definition.

Zero is handled explicitly outside the log embedding via a masking branch.

4. Results

Operation OOD Accuracy Test Range
AdditionWithin tolerance[-100000, 100000]
SubtractionWithin tolerance[-100000, 100000]
MultiplicationWithin tolerance[10, 1000] x [10, 1000]
DivisionWithin tolerance[100, 10000] / [10, 100]
PowersWithin toleranceVarious
RootsWithin toleranceVarious

"Within tolerance" means all samples achieved < 1% relative error (or < 0.5 absolute error when |expected| <= 1). These bounds are intended for neural approximation contexts rather than calculator-grade financial or scientific workloads. Errors arise from IEEE-754 floating-point precision.

5. Connection to Music Theory

This approach has historical roots in music theory. Euler's Tonnetz (1739) embedded pitches geometrically, and Lewin's Generalized Musical Intervals and Transformations (1987) formalized a transformation-based framework.

A Generalized Interval System (GIS) consists of:

FluxEM applies this structure to numbers: arithmetic is encoded directly as interval relations in embedding space.

6. Prior Work

Approach Method Limitation
NALU (Trask 2018) Log/exp with learned gates Gates can fail to generalize
xVal (Golkar 2023) Learned scaling direction Direction must be learned
Abacus (McLeish 2024) Positional digit encoding Character-level, not continuous
FluxEM Fixed algebraic structure Parameter-free

7. Limitations

8. Code

Requires Python 3.10+.

Note: PyPI currently serves fluxem==0.1.0. For the latest source (0.2.0), install from GitHub.

git clone https://github.com/Hmbown/FluxEM.git
cd FluxEM
python3 -m pip install -e .
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install fluxem

from fluxem import create_unified_model

model = create_unified_model()
model.compute("1847*392")    # -> 724024.0
model.compute("123456+789")  # -> 124245.0
model.compute("2**16")       # -> 65536.0
# Windows (PowerShell)
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install fluxem

Source: github.com/Hmbown/FluxEM

9. Lean4 Notes

These snippets illustrate how the linear and logarithmic magnitude properties can be stated in a proof assistant. They are illustrative and omit proof details.

-- Linear embedding property (sketch)
-- e_lin (a + b) = e_lin a + e_lin b

-- Log embedding magnitude property (sketch)
-- proj_mag (e_log (a * b)) = proj_mag (e_log a) + proj_mag (e_log b)

References