Metadata-Version: 2.4
Name: cymeta
Version: 0.1.7
Summary: Cyclic Meta-Dictionary Compression for Transformer Models
Home-page: https://github.com/cymeta/cymeta
Author: Bajirao Sudhakar Sali
Author-email: salibajirao@gmail.com
License: Apache-2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.9.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: transformers>=4.20.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# CyMeta: Cyclic Meta-Dictionary Compression

A production-ready Python library for **Cyclic Meta-Dictionary Compression (CyMeta)**, a novel weight-compression and weight-reconstruction approach for transformer models.

## Overview

CyMeta replaces large transformer weight matrices with:
- **Small shared meta-dictionaries**: 8-64 learned high-precision 1D dictionary vectors per module type
- **Compact integer index maps**: Store which dictionary atom to use and how much to cyclically shift it
- **Tiny contextual gating networks**: Output mixture coefficients over dictionary atoms for context-dependent reconstruction

This enables **extreme compression** while maintaining functional capacity through adaptive weight reconstruction at runtime.
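
To make the storage scheme concrete, the dependency-free sketch below rebuilds weight rows from a dictionary atom and an index-map entry. The helper `roll`, the toy `dictionary`, and the `(atom, shift)` pair layout are illustrative assumptions rather than CyMeta's actual internals. The shift direction follows the convention of `torch.roll`, where positive shifts move elements toward higher indices.

```python
from typing import List, Tuple


def roll(vec: List[float], shift: int) -> List[float]:
    """Cyclically shift `vec` to the right by `shift` positions (wrap-around)."""
    n = len(vec)
    if n == 0:
        return []
    shift %= n
    return vec[-shift:] + vec[:-shift] if shift else list(vec)


# Toy dictionary with two atoms of length 4 (hypothetical values).
dictionary: List[List[float]] = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.0, -0.5, 0.0],
]

# One index-map entry per weight row: (atom index, cyclic shift).
index_map: List[Tuple[int, int]] = [(0, 1), (1, 2), (0, 0)]

# Rebuild every row from its (atom, shift) pair.
rows = [roll(dictionary[p], s) for p, s in index_map]
```

Only the two atoms and three small integer pairs are stored here, yet three full rows are recovered.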

## Key Features

- 🚀 **Extreme Compression**: Achieve 10-100x compression ratios
- 🎯 **Context-Dependent**: Adaptive weight reconstruction based on token context
- 🔄 **Circular Shifts**: Efficient wrap-around shifts without matrix multiplications
- 🧩 **Modular Design**: Drop-in replacements for PyTorch layers
- 📦 **Production-Ready**: Full type annotations, tests, and documentation

## Installation

```bash
pip install cymeta
```

For local development (editable mode):
```bash
pip install -e .
```

For development with all tooling:
```bash
pip install -e ".[dev]"
```

## Quick Start

### Basic Usage

```python
import torch
from cymeta import CyMetaLinear

# Create a CyMeta-compressed linear layer
layer = CyMetaLinear(
    in_features=128,
    out_features=64,
    dict_size=32,        # Number of dictionary atoms
    shift_bits=8,        # Bits for shift encoding
    gating_hidden=16,    # Hidden dim for gating network
)

# Use like a normal PyTorch layer
x = torch.randn(4, 128)  # (batch_size, in_features)
output = layer(x)        # (batch_size, out_features)

# Check compression ratio
print(f"Compression: {layer.get_compression_ratio():.2f}x")
```

### Compressing a Model

```python
from cymeta.training.compressor import CyMetaCompressor
import torch.nn as nn

# Original layer
original_layer = nn.Linear(128, 64)

# Compress
compressor = CyMetaCompressor(num_atoms=32, gating_hidden=16)
compressed = compressor.compress_layer(original_layer, num_iterations=100)

# Use compressed components
print(f"Dictionary atoms: {compressed['dictionary'].num_atoms}")
print(f"Index map shape: {compressed['index_map'].shape}")
```

### Using Attention and FFN Layers

```python
from cymeta import CyMetaMultiHeadAttention, CyMetaFFN

# Multi-head attention with CyMeta compression
attention = CyMetaMultiHeadAttention(
    embed_dim=256,
    num_heads=8,
    dict_size=32,
)

# Feed-forward network with CyMeta compression
ffn = CyMetaFFN(
    embed_dim=256,
    ffn_dim=1024,
    dict_size=32,
)
```

## Architecture

### Core Components

1. **MetaDictionary** (`cymeta/dictionary.py`)
   - Stores learned high-precision dictionary atoms
   - Supports efficient circular shift operations

2. **IndexMap** (`cymeta/index_map.py`)
   - Compact integer matrices for atom and shift indices
   - Enables extreme compression

3. **GatingNetwork** (`cymeta/gating.py`)
   - Tiny MLP for context-dependent atom mixing
   - Ensures expressivity with extreme compression

4. **Reconstruction** (`cymeta/reconstruct.py`)
   - Algorithms for on-the-fly weight reconstruction
   - Combines atoms, shifts, and gating coefficients
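
As a rough illustration of how these four components could interact, the sketch below mixes cyclically shifted atoms using softmax-normalized gating scores. The softmax normalization, the per-atom shifts, and all names (`roll`, `softmax`, `reconstruct_row`) are assumptions made for this example; the library's `reconstruct.py` may combine the pieces differently.

```python
import math
from typing import List, Sequence


def roll(vec: Sequence[float], shift: int) -> List[float]:
    """Cyclically shift `vec` to the right by `shift` positions."""
    n = len(vec)
    shift %= n
    return list(vec[-shift:]) + list(vec[:-shift]) if shift else list(vec)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw gating scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct_row(
    atoms: Sequence[Sequence[float]],
    shifts: Sequence[int],
    gate_scores: Sequence[float],
) -> List[float]:
    """Weighted sum of shifted atoms; weights come from the gating scores."""
    weights = softmax(gate_scores)
    length = len(atoms[0])
    row = [0.0] * length
    for atom, shift, w in zip(atoms, shifts, weights):
        shifted = roll(atom, shift)
        for i in range(length):
            row[i] += w * shifted[i]
    return row


# Two toy atoms, equal gating scores, so each atom contributes half.
atoms = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
row = reconstruct_row(atoms, shifts=[1, 0], gate_scores=[0.0, 0.0])
```

In a real layer the gating scores would be produced by the gating MLP from the current input, which is what makes the reconstructed weights context-dependent.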

### Model Wrappers

- `CyMetaLinear`: Drop-in replacement for `nn.Linear`
- `CyMetaMultiHeadAttention`: Compressed multi-head attention
- `CyMetaFFN`: Compressed feed-forward network

### Training Utilities

- `CyMetaCompressor`: Compress full-precision weights
- `CyMetaTrainer`: Joint training of dictionaries, index maps, and gating networks
- `KnowledgeDistillation`: Distillation from teacher models

## Examples

See the `examples/` directory for:
- `example_compression.py`: Basic compression example
- `benchmark.py`: Performance benchmarking utilities
- `compress_hf_model.py`: Convert a HuggingFace transformer

## Testing

Run the test suite:

```bash
pytest tests/
```

## Documentation

### Compression Algorithm

1. **Preprocessing/Compression**:
   - Learn K dictionary atoms D = {d_1, ..., d_K}
   - For each weight matrix row, find best dictionary index p and cyclic shift s
   - Save index map I[row] = (p, s)
   - Train tiny gating net G to modulate atom usage

2. **Forward Pass/Reconstruction**:
   - For each token t and head h: g = G(token_embedding_t, head=h)
   - Reconstruct: weight_vector = Σₖ g[k] * roll(D[k], shift=I[k])
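
The row-matching step in the preprocessing stage can be sketched as an exhaustive search over atoms and shifts. This is a minimal illustration that assumes a squared-error criterion and a fixed dictionary; `best_atom_and_shift` and `roll` are hypothetical helpers, and the actual compressor (which also learns the atoms over `num_iterations`) may use a different objective or search strategy.

```python
from typing import List, Sequence, Tuple


def roll(vec: Sequence[float], shift: int) -> List[float]:
    """Cyclically shift `vec` to the right by `shift` positions."""
    n = len(vec)
    shift %= n
    return list(vec[-shift:]) + list(vec[:-shift]) if shift else list(vec)


def best_atom_and_shift(
    row: Sequence[float],
    dictionary: Sequence[Sequence[float]],
) -> Tuple[int, int, float]:
    """Return (atom index p, shift s, squared error) of the closest shifted atom."""
    best = (0, 0, float("inf"))
    for p, atom in enumerate(dictionary):
        for s in range(len(atom)):
            candidate = roll(atom, s)
            err = sum((r - c) ** 2 for r, c in zip(row, candidate))
            if err < best[2]:
                best = (p, s, err)
    return best


dictionary = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]]
target_row = [3.0, 4.0, 1.0, 2.0]  # equals atom 0 shifted by 2

p, s, err = best_atom_and_shift(target_row, dictionary)
index_entry = (p, s)  # what would be saved as I[row]
```

The brute-force loop costs O(K·n²) per row for K atoms of length n. For long rows, the scores for all shifts of one atom can instead be computed at once with an FFT-based circular cross-correlation.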

### Compression Ratio

The compression ratio depends on:
- Number of dictionary atoms (`dict_size`)
- Size of index maps (integer vs. float32)
- Size of gating network (typically very small)

Typical compression ratios: **10-100x** for transformer layers.
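
A back-of-the-envelope estimate shows how these three factors trade off. The accounting below is an assumption for illustration only: it supposes float32 atoms as long as a weight row, one `(atom, shift)` entry per output row, a two-layer gating MLP fed by the layer input, and it ignores biases of the dense layer. It is not the formula behind `get_compression_ratio()`.

```python
def dense_bits(in_features: int, out_features: int, float_bits: int = 32) -> int:
    """Storage of an uncompressed weight matrix, in bits."""
    return in_features * out_features * float_bits


def cymeta_bits(
    in_features: int,
    out_features: int,
    dict_size: int = 32,
    shift_bits: int = 8,
    gating_hidden: int = 16,
    float_bits: int = 32,
) -> int:
    """Assumed storage of dictionary + index map + gating network, in bits."""
    # Bits needed to address one of `dict_size` atoms.
    atom_bits = max(1, (dict_size - 1).bit_length())
    index_bits = out_features * (atom_bits + shift_bits)
    dictionary_params = dict_size * in_features
    # Two-layer MLP: in_features -> gating_hidden -> dict_size, with biases.
    gating_params = (in_features * gating_hidden + gating_hidden) + (
        gating_hidden * dict_size + dict_size
    )
    return (dictionary_params + gating_params) * float_bits + index_bits


ratio = dense_bits(4096, 4096) / cymeta_bits(4096, 4096)
print(f"Estimated ratio for a 4096x4096 layer: {ratio:.1f}x")
```

Under these assumptions a 4096x4096 layer lands at roughly 84x, while a small 128x64 layer stays close to 1x because the dictionary and gating network dominate the cost. The scheme pays off for large layers or when one dictionary is shared across many layers.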

### HuggingFace Conversion Workflow

CyMeta can compress any HuggingFace transformer by replacing every `nn.Linear` layer with a `CyMetaLinear`. Use `ConversionConfig` to control dictionary size, gating hidden dimension, and compression iterations.

```python
from cymeta.exports.converter import ConversionConfig, convert_pretrained_model

config = ConversionConfig(dict_size=32, compress_iterations=50, verbose=True)
compressed_model = convert_pretrained_model("distilbert-base-uncased", config=config)
```

Under the hood CyMeta:
1. Loads the pretrained model from HuggingFace
2. Iterates over every linear projection (attention QKV, FFN up/down, classifier heads, etc.)
3. Uses `CyMetaCompressor` to learn dictionaries + index maps for each layer
4. Replaces the original module with `CyMetaLinear`

You can supply custom `include_filter` / `exclude_filter` functions inside `ConversionConfig` to limit which layers get compressed (e.g., to skip the final classification head).
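
The exact signature expected for these filters is not documented here, so the following is only a sketch that assumes a filter receives a module's qualified name and returns `True` when it matches. The layer names are typical of a DistilBERT classification model and serve purely as sample data; `exclude_filter` and `select_layers` are hypothetical stand-ins, not CyMeta functions.

```python
from typing import Callable, List, Sequence


def exclude_filter(name: str) -> bool:
    """Return True for layers that should stay uncompressed (assumed semantics)."""
    return name.startswith(("classifier", "pre_classifier"))


def select_layers(
    names: Sequence[str],
    exclude: Callable[[str], bool],
) -> List[str]:
    """Keep only the layer names that the exclude predicate does not match."""
    return [n for n in names if not exclude(n)]


# Sample module names (illustrative data, not read from a real model).
layer_names = [
    "distilbert.transformer.layer.0.attention.q_lin",
    "distilbert.transformer.layer.0.ffn.lin1",
    "pre_classifier",
    "classifier",
]

to_compress = select_layers(layer_names, exclude_filter)
```

Keeping the task head in full precision is a common choice because it is small relative to the encoder and directly determines the output logits.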

### CPU Deployment (No GPU Required)

CyMeta can compress models so that they run efficiently on a CPU, with no GPU required. Set `cpu_optimized=True` for aggressive compression that achieves 20-50x compression ratios, making large models runnable on CPU.

```python
import torch
from transformers import AutoTokenizer

from cymeta import (
    ConversionConfig,
    convert_pretrained_model,
    ensure_cpu_model,
    get_compression_summary,
)

model_name = "distilbert-base-uncased"

# Configure for CPU deployment
config = ConversionConfig(
    cpu_optimized=True,  # Aggressive compression: dict_size=16, gating_hidden=8
    verbose=True,
)

# Compress model
compressed_model = convert_pretrained_model(model_name, config=config)

# Ensure CPU-ready
compressed_model = ensure_cpu_model(compressed_model)

# Check compression stats
summary = get_compression_summary(compressed_model)
print(f"Compression ratio: {summary['average_compression_ratio']:.2f}x")
print("Model is ready for CPU inference!")

# Tokenize a sample input with the tokenizer that matches the checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("CyMeta runs on CPU.", return_tensors="pt")

# Use the model on CPU
compressed_model.eval()
with torch.no_grad():
    outputs = compressed_model(**inputs)  # Runs on CPU
```

**CPU-Optimized Settings:**
- `dict_size=16` (reduced from 32) - Fewer dictionary atoms for higher compression
- `gating_hidden=8` (reduced from 16) - Smaller gating networks
- `compress_iterations=150` (increased) - More iterations for better quality at lower atom count
- Automatic CPU device placement

This enables running large transformer models on CPU that would normally require GPU memory. See `examples/compress_for_cpu.py` for a complete example.

## Project Structure

```
cymeta/
├── __init__.py              # Package exports
├── dictionary.py            # Meta-dictionary implementation
├── index_map.py             # Index map implementation
├── gating.py                # Gating network implementation
├── reconstruct.py           # Weight reconstruction algorithms
├── models/                  # Model wrappers
│   ├── linear.py
│   ├── attention.py
│   └── ffn.py
├── training/                # Training utilities
│   ├── compressor.py
│   ├── trainer.py
│   └── distillation.py
└── exports/                 # Serialization
    ├── serialization.py
    └── converter.py

tests/                       # Unit tests
examples/                    # Example scripts
```

## Requirements

- Python >= 3.8
- PyTorch >= 1.9.0
- NumPy >= 1.20.0
- Transformers >= 4.20.0 (installed as a dependency; used for HuggingFace conversion)

## License

Apache License 2.0

## Citation

If you use CyMeta in your research, please cite:

```bibtex
@software{cymeta2025,
  title={CyMeta: Cyclic Meta-Dictionary Compression for Transformers},
  author={Sali, Bajirao Sudhakar},
  email={salibajirao@gmail.com},
  year={2025},
  url={https://github.com/cymeta/cymeta}
}
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Future Work

- [ ] CUDA kernels for optimized circular shift operations
- [ ] C++ backend for production deployment
- [ ] Support for more transformer architectures
- [ ] Quantization-aware training integration
- [ ] Advanced compression algorithms

## Acknowledgments

This library implements the Cyclic Meta-Dictionary Compression (CyMeta) approach for transformer model compression.

