Metadata-Version: 2.4
Name: autoattn
Version: 0.1.0
Summary: Automatic routing between attention backends (dense, flash, sparse)
Author: Achintya Paningapalli
License: MIT
Project-URL: Homepage, https://github.com/achintya-p/auto_attn
Project-URL: Repository, https://github.com/achintya-p/auto_attn
Keywords: attention,transformer,llm,flash-attention,pytorch
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Dynamic: license-file

# autoattn

`autoattn` automatically routes attention calls in LLMs and VLMs to a dense, flash, or local (sparse) backend.

## Install

```bash
# Editable install; run from the repository root
pip install -e .
```

## Usage

```python
import torch
from autoattn import AutoAttention

attn = AutoAttention(d_model=256, num_heads=8, causal=True)

# Inputs are shaped (batch, seq_len, d_model)
q = torch.randn(2, 128, 256)
k = torch.randn(2, 128, 256)
v = torch.randn(2, 128, 256)

out = attn(q, k, v)  # The backend is selected automatically
```

## Backends

| Backend | When Used | Memory | Exact? |
|---------|-----------|--------|--------|
| `dense` | CPU, fallback | O(N²) | ✅ |
| `flash` | GPU, seq ≤ 2048 | O(N) | ✅ |
| `local` | GPU, seq > 4096, memory mode | O(N·W) | ❌ |

Here N is the sequence length and W is the local attention window size.
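
The routing rules in the table can be read as a small decision function. The sketch below is illustrative only: `pick_backend` is a hypothetical helper, not part of the `autoattn` API. It assumes that `local` requires both memory mode and a sequence longer than 4096, that performance mode always picks `flash` on GPU, and that every other case falls back to `dense`. The table does not state any of these explicitly.

```python
def pick_backend(device_type: str, seq_len: int, mode: str = "auto") -> str:
    """Hypothetical routing rule mirroring the Backends table."""
    if mode not in ("auto", "performance", "memory"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Without a GPU, only the dense backend applies.
    if device_type != "cuda":
        return "dense"
    # Assumption: performance mode prefers flash regardless of length.
    if mode == "performance":
        return "flash"
    # Assumption: local attention needs memory mode AND a long sequence.
    if mode == "memory" and seq_len > 4096:
        return "local"
    # Short sequences on GPU use flash.
    if seq_len <= 2048:
        return "flash"
    # Assumption: everything else falls back to dense.
    return "dense"


for args in [("cpu", 128, "auto"), ("cuda", 1024, "auto"), ("cuda", 8192, "memory")]:
    print(args, "->", pick_backend(*args))
```

Keeping the rule as a pure function of device type, sequence length, and mode makes it easy to unit-test without a GPU.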

## Modes

```python
# Auto (default) - picks based on device/seq length
AutoAttention(d_model=256, num_heads=8, mode="auto")

# Performance - prefer flash on GPU
AutoAttention(d_model=256, num_heads=8, mode="performance")

# Memory - prefer local/sparse
AutoAttention(d_model=256, num_heads=8, mode="memory")
```
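
To get a feel for why memory mode prefers `local`, the following sketch counts attention-score entries per head using the asymptotic figures from the Backends table. `score_entries` is a hypothetical helper, and the window size of 256 is an arbitrary illustrative value, not a documented default.

```python
from typing import Optional


def score_entries(seq_len: int, window: Optional[int] = None) -> int:
    """Attention scores per head: N*N for dense, about N*W for local."""
    if window is None:
        # Dense attention scores every query against every key.
        return seq_len * seq_len
    # Local attention: each query attends to at most `window` keys.
    return seq_len * min(window, seq_len)


N, W = 8192, 256  # W is an illustrative window size, not a library default
dense = score_entries(N)
local = score_entries(N, W)
print(f"dense: {dense:,} entries")
print(f"local: {local:,} entries")
print(f"ratio: {dense // local}x")
```

With these values the dense score matrix has 32 times as many entries as the windowed one. The price is that `local` ignores keys outside the window, which is why the table marks it as not exact.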

## Requirements

- Python ≥ 3.9
- PyTorch ≥ 2.0

