Metadata-Version: 2.4
Name: mlx-mhc
Version: 0.2.0
Summary: MLX implementation of Manifold-Constrained Hyper-Connections (mHC) for Apple Silicon
Author: MA
License: MIT
Project-URL: Homepage, https://github.com/ml-explore/mlx-mhc
Project-URL: Repository, https://github.com/ml-explore/mlx-mhc
Keywords: mlx,apple silicon,deep learning,transformers,mhc,hyper-connections
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mlx>=0.10.0
Requires-Dist: numpy>=1.24.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Dynamic: license-file

# mlx-mhc

First MLX implementation of DeepSeek's **Manifold-Constrained Hyper-Connections (mHC)** for Apple Silicon.

Based on: [arXiv:2512.24880](https://arxiv.org/abs/2512.24880)

## Installation

```bash
pip install mlx-mhc
```

## Quick Start

```python
import mlx.core as mx
import mlx_mhc as mhc

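# Sinkhorn-Knopp projection onto a doubly stochastic matrix
matrix = mx.random.normal((8, 8))
doubly_stochastic = mhc.sinkhorn_knopp(matrix)

# Manifold Hyper-Connection module
connection = mhc.ManifoldHyperConnection(dims=512, expansion=2)

# Placeholder inputs; the (batch, sequence, dims) shape is an assumption
x = mx.random.normal((1, 16, 512))             # residual stream input
layer_output = mx.random.normal((1, 16, 512))  # e.g. attention or MLP output
output = connection(x, layer_output)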
```

## What is mHC?

mHC (Manifold-Constrained Hyper-Connections) improves training stability for large language models by constraining the residual-connection mixing matrices to the Birkhoff polytope, the set of doubly stochastic matrices (nonnegative entries, with every row and every column summing to 1).

Key benefits:
- Prevents gradient explosion in deep networks
- Maintains identity mapping property
- Reported 2.1% improvement on benchmarks with only 6.7% overhead
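
The stability argument can be checked numerically. Doubly stochastic matrices are closed under multiplication, so however many mixing matrices are composed across layers, the product still has rows and columns summing to 1 and entries in [0, 1]. The sketch below is plain Python and does not use `mlx_mhc`; the helper names (`random_doubly_stochastic`, `matmul`) are illustrative only, and the matrices are built as convex combinations of permutation matrices rather than with Sinkhorn-Knopp.

```python
import random


def random_doubly_stochastic(n, num_perms=4, seed=0):
    """Build a doubly stochastic matrix as a convex combination of
    random permutation matrices (illustrative helper, not part of mlx_mhc)."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(num_perms)]
    total = sum(weights)
    weights = [w / total for w in weights]  # weights now sum to 1

    matrix = [[0.0] * n for _ in range(n)]
    for w in weights:
        perm = list(range(n))
        rng.shuffle(perm)
        for row, col in enumerate(perm):
            matrix[row][col] += w
    return matrix


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def row_sums(m):
    return [sum(row) for row in m]


def col_sums(m):
    return [sum(col) for col in zip(*m)]


# Compose 64 "layers" of 4x4 mixing matrices.
product = random_doubly_stochastic(4, seed=0)
for layer in range(1, 64):
    product = matmul(random_doubly_stochastic(4, seed=layer), product)

# Every sum stays at 1.0 up to floating-point rounding.
print(row_sums(product))
print(col_sums(product))
```

An unconstrained mixing matrix offers no such guarantee: its repeated products can grow or shrink without bound as depth increases.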

## API

### `sinkhorn_knopp(matrix, max_iterations=100, epsilon=1e-6, log_space=True)`

Project a matrix onto the Birkhoff polytope (set of doubly stochastic matrices).
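
For intuition, here is a minimal, dependency-free sketch of a log-space Sinkhorn-Knopp iteration. It is not the package's implementation: the function name `sinkhorn_knopp_sketch` is hypothetical, the input is assumed to be a matrix of unconstrained logits (so negative entries are allowed), and `epsilon` is assumed to be a convergence tolerance on the row sums. The package itself operates on MLX arrays and may differ in all of these details.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn_knopp_sketch(matrix, max_iterations=100, epsilon=1e-6):
    """Alternately normalize rows and columns in log space.

    `matrix` is a square list of rows holding logits. Returns a matrix
    whose rows and columns each sum to approximately 1.
    """
    log_p = [list(row) for row in matrix]
    n = len(log_p)

    for _ in range(max_iterations):
        # Row step: subtract each row's log-sum-exp so the row sums to 1.
        for i in range(n):
            lse = logsumexp(log_p[i])
            log_p[i] = [v - lse for v in log_p[i]]

        # Column step: same normalization applied down each column.
        for j in range(n):
            lse = logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= lse

        # Columns are normalized now; stop once the rows are close enough too.
        row_error = max(
            abs(sum(math.exp(v) for v in row) - 1.0) for row in log_p
        )
        if row_error < epsilon:
            break

    return [[math.exp(v) for v in row] for row in log_p]


logits = [
    [0.5, -1.2, 0.3],
    [2.0, 0.1, -0.7],
    [-0.4, 1.5, 0.9],
]
result = sinkhorn_knopp_sketch(logits)
print([sum(row) for row in result])        # row sums, each close to 1
print([sum(col) for col in zip(*result)])  # column sums, each close to 1
```

Working in log space replaces repeated divisions by small numbers with subtractions of log-sum-exp terms, which avoids underflow when some logits are strongly negative.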

### `ManifoldHyperConnection(dims, expansion=2, sinkhorn_iterations=10)`

MLX module implementing mHC for transformer residual connections.
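
The module's internals are not documented here, so the following is only a conceptual sketch of the residual mixing idea, written in plain Python without `mlx_mhc`. It assumes `expansion` parallel residual streams that are mixed by a doubly stochastic matrix, with the layer output then added to every stream; the names `mix_streams` and `hyper_connection_step` are hypothetical, and the real module may combine the layer output differently.

```python
def mix_streams(mixing, streams):
    """Apply an (n x n) mixing matrix across n residual streams of equal width."""
    n, width = len(streams), len(streams[0])
    return [
        [sum(mixing[i][k] * streams[k][d] for k in range(n)) for d in range(width)]
        for i in range(n)
    ]


def hyper_connection_step(mixing, streams, layer_output):
    """Mix the streams, then add the layer output to each one (assumed form)."""
    mixed = mix_streams(mixing, streams)
    return [[m + o for m, o in zip(row, layer_output)] for row in mixed]


# Two streams (expansion=2), each with three features.
streams = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]
mixing = [[0.75, 0.25], [0.25, 0.75]]  # doubly stochastic
identity = [[1.0, 0.0], [0.0, 1.0]]    # also doubly stochastic
zero_output = [0.0, 0.0, 0.0]

mixed = hyper_connection_step(mixing, streams, zero_output)
unchanged = hyper_connection_step(identity, streams, zero_output)

# Because every column of `mixing` sums to 1, the per-feature total
# across streams is the same before and after mixing.
sum_before = [sum(col) for col in zip(*streams)]
sum_after = [sum(col) for col in zip(*mixed)]
print(sum_before, sum_after)
```

The identity matrix lies inside the Birkhoff polytope, so a plain residual connection remains a valid special case of the constrained mixing.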

## License

MIT
