Metadata-Version: 2.4
Name: kolmo
Version: 0.0.2
Summary: Ultra-low-latency LLM router using a compression-distance heuristic inspired by Kolmogorov complexity.
Author-email: Carlos Bustamante <me@carlosbustamante.dev>, KODEX <agent@carlosbustamante.dev>
Project-URL: Homepage, https://github.com/charlybgai/kolmo
Project-URL: Repository, https://github.com/charlybgai/kolmo
Project-URL: Author, https://carlosbustamante.dev
Keywords: llm,router,routing,kolmogorov,compression,ncd,information-theory,zstd,latency
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: loguru>=0.7.3
Requires-Dist: pydantic>=2.12.5
Requires-Dist: zstandard>=0.22.0
Provides-Extra: hf
Requires-Dist: datasets>=4.5.0; extra == "hf"
Dynamic: license-file

# ⚡ Kolmo v0.0.2

> **The Information-Theoretic LLM Router**
> Zero VRAM. Microsecond latency. Dictionary-based compression distance heuristic.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Version: 0.0.2](https://img.shields.io/badge/version-0.0.2-green.svg)](https://github.com/charlybgai/kolmo)

---

## 🧬 What is Kolmo?

Kolmo is a **high-performance LLM router** that leverages a **dictionary-based compression distance heuristic** inspired by Normalized Compression Distance (NCD) to classify and route prompts to the most appropriate model. No neural networks. No GPU memory. Pure computational elegance.

```python
from kolmo import KolmoRouter

# Initialize with your fallback model
router = KolmoRouter(default_model="gpt-4o-mini")

# Train domains from HuggingFace datasets or local samples
router.add_domain(
    name="code",
    source="huggingface:codeparrot/github-code",
    target_model="deepseek-coder"
)
router.add_domain(
    name="science",
    samples=[b"Photosynthesis converts light energy into...", b"Quantum mechanics states that..."],
    target_model="claude-3-opus"
)

# Route in microseconds ⚡
result = router.route("Implement a Red-Black tree in Python")
print(result.winner)           # "code"
print(result.target_model)     # "deepseek-coder"
print(f"{result.latency_ms:.3f}ms")  # ~0.042ms
```

---

## 🚀 Why Kolmo?

| Feature | Traditional Routers | Kolmo |
|---------|-------------------|-------|
| **Latency** | Milliseconds | **Microseconds** |
| **VRAM Usage** | 2-20GB | **Zero** 🎯 |
| **Explainability** | Black-box embeddings | **Compression ratios** |
| **Language Support** | Requires training data | **Universal (byte-level)** |
| **Deploy Cost** | GPU required | **CPU-only** |
| **Cold Start** | Model loading time | **Instant** |

---

## 📦 Installation

```bash
pip install kolmo
```

Requires Python 3.10+.

---

## 🎮 Quick Start

### Python API

```python
from kolmo import KolmoRouter

router = KolmoRouter()

# Add domains from various sources
router.add_domain(
    name="medical",
    source="huggingface:medalpaca/medical_meadow_medical_flashcards",
    target_model="meditron-7b",
    sample_count=5000
)

# Or train from raw samples
router.add_domain(
    name="legal",
    samples=[b"Contract clause 1...", b"In accordance with Section 2..."],
    target_model="lawgpt"
)

# Route with confidence
result = router.route("Patient presents with tachycardia and chest pain...")
print(f"Domain: {result.winner}")              # "medical"
print(f"Model: {result.target_model}")         # "meditron-7b"
print(f"Confidence: {result.confidence:.2%}")  # e.g. 94.00%
print(f"Latency: {result.latency_ms:.3f}ms")   # 0.038ms

# Save as portable bundle
router.save_bundle("medical_router.kolmo")
```

### CLI Interface

Kolmo ships with a powerful CLI for training and routing without writing code:

```bash
# Create config.json
cat > config.json <<'EOF'
{
  "default_model": "gpt-4o-mini",
  "dict_size": 112640,
  "domains": [
    {
      "name": "code",
      "source": "huggingface:codeparrot/github-code",
      "target_model": "deepseek-coder",
      "sample_count": 1000
    },
    {
      "name": "general",
      "samples": ["Hello world", "How are you?"],
      "target_model": "gpt-4o-mini"
    }
  ]
}
EOF

# Train your bundle
kolmo train config.json router.kolmo

# Route a prompt
kolmo route router.kolmo "Write a Python function to reverse a string"

# Get bundle info
kolmo info router.kolmo
```

---

## 🧠 The Science: Compression Distance Heuristic

Kolmo is inspired by **Kolmogorov Complexity** theory and uses a dictionary-based compression distance heuristic (not strict NCD). For context, the **strict NCD** definition is below.

### The Foundation

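**Kolmogorov Complexity** K(x) is the length of the shortest program that outputs a string x, i.e. the theoretical minimum information needed to represent it. K(x) is uncomputable, but the output size of a real compressor gives a computable upper bound on it (up to an additive constant), which is what practical methods rely on.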

### Strict NCD Formula

$$
\mathrm{NCD}(x, y) = \frac{C(xy) - \min(C(x), C(y))}{\max(C(x), C(y))}
$$

Where:
- **C(x)** = compressed size of x
- **C(xy)** = compressed size of x concatenated with y
- When x and y share structure → C(xy) < C(x) + C(y)
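
The formula above can be tried out in a few lines. This is a minimal sketch, not Kolmo's code: it uses Python's standard-library `zlib` as the compressor C (Kolmo itself uses zstd), and the helper names `csize` and `ncd` are hypothetical.

```python
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes, standing in for C(.) in the NCD formula."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Strict NCD: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


code_a = b"def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
code_b = b"def mul(a, b):\n    return a * b\n\ndef div(a, b):\n    return a / b\n"
prose = b"The mitochondria is the powerhouse of the cell and produces ATP."

# Two code snippets share structure, so their distance should be smaller
# than the distance between a code snippet and unrelated prose.
print(f"code vs code : {ncd(code_a, code_b):.2f}")
print(f"code vs prose: {ncd(code_a, prose):.2f}")
```

On inputs this short the compressor's fixed overhead is noticeable, so the absolute values matter less than their ordering.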

### How Kolmo Uses a Compression Heuristic

Kolmo approximates the NCD idea with **zstd dictionaries** by treating each dictionary as a compact "reference string":

```
score(prompt, dict) = (C(dict) + C(prompt | dict) - min(C(prompt), C(dict))) / max(C(prompt), C(dict))
```

This is a **dictionary-based compression distance heuristic**, not strict NCD: it replaces the concatenation term C(xy) with C(dict) + C(prompt | dict), where C(prompt | dict) is the size of the prompt compressed with the domain dictionary.

**The domain with the LOWEST score wins** — the prompt compresses best with that domain's dictionary, indicating highest similarity.
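
To make the score concrete, here is a rough sketch of the same formula using only the standard library. It is an illustration under stated assumptions, not Kolmo's implementation: `zlib` preset dictionaries (`zdict`) stand in for trained zstd dictionaries, the "dictionaries" are just raw sample text, and the names `cond_size`, `score`, and `DOMAINS` are hypothetical.

```python
import zlib


def csize(data: bytes) -> int:
    """C(x): size of x compressed on its own."""
    return len(zlib.compress(data, 9))


def cond_size(prompt: bytes, dictionary: bytes) -> int:
    """C(prompt | dict): size of the prompt compressed with a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return len(comp.compress(prompt) + comp.flush())


def score(prompt: bytes, dictionary: bytes) -> float:
    """(C(dict) + C(prompt | dict) - min(C(prompt), C(dict))) / max(C(prompt), C(dict))."""
    c_prompt, c_dict = csize(prompt), csize(dictionary)
    c_cond = cond_size(prompt, dictionary)
    return (c_dict + c_cond - min(c_prompt, c_dict)) / max(c_prompt, c_dict)


# Raw sample text used as stand-in dictionaries (assumption: real zstd
# dictionaries are trained from many samples rather than copied verbatim).
DOMAINS = {
    "code": b"def reverse_string(s):\n    return s[::-1]\n\n"
            b"def add(a, b):\n    return a + b\n\n"
            b"for i in range(10):\n    print(i)\n",
    "medical": b"The patient presents with chest pain and shortness of breath. "
               b"Blood pressure is elevated and the heart rate is rapid.\n",
}

prompt = b"def reverse_list(items):\n    return items[::-1]\n"
scores = {name: score(prompt, d) for name, d in DOMAINS.items()}
winner = min(scores, key=scores.get)  # lowest score wins
print(scores, winner)
```

Because the prompt reuses long byte sequences from the code samples, its conditional compressed size drops for that domain and the code score comes out lowest.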

---

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        Kolmo v0.0.2 • Router                   │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Input Prompt                                               │
│        │                                                     │
│        ▼                                                     │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│   │ zstd + Dict  │    │ zstd + Dict  │    │ zstd + Dict  │  │
│   │   (Code)     │    │  (Medical)   │    │   (Legal)    │  │
│   └──────┬───────┘    └──────┬───────┘    └──────┬───────┘  │
│          │                    │                    │          │
│          ▼                    ▼                    ▼          │
│      score 0.12          score 0.34          score 0.89      │
│          │                    │                    │          │
│          └────────────────────┼────────────────────┘          │
│                               │                              │
│                    Winner: Code (lowest score)               │
│                               │                              │
│                    Route → deepseek-coder                    │
└─────────────────────────────────────────────────────────────┘
```

---

## 🔥 v0.0.2 • Features

### ⚡ Batch Routing
Process thousands of prompts efficiently:

```python
prompts = [
    "Write a Python function",
    "Explain quantum entanglement",
    "Draft a legal contract clause"
]

batch = router.route_many(prompts)
print(f"Throughput: {batch.throughput_rps:.0f} req/s")
print(f"Avg latency: {batch.avg_latency_ms:.3f}ms")
```

### 🎯 Thresholded Routing with Ambiguity Detection

Handle uncertain decisions gracefully:

```python
result = router.route_thresholded(
    prompt,
    min_confidence=0.3,      # Minimum confidence to accept
    ambiguity_gap=0.05,      # Gap between top candidates
    max_candidates=3         # Return top-N candidates
)

if result.ambiguous:
    print(f"Uncertain ({result.reason}): {result.candidates}")
    # Fallback to general-purpose model
else:
    model = result.target_model
```
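
The two thresholds can be read as two independent checks on the ranked candidates. The sketch below shows one plausible way to combine them; it is an assumption about the logic, not Kolmo's actual code. It takes per-domain confidences as given (how Kolmo derives them is not covered here), and the `Decision` type, the `decide` function, and the reason strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Decision:
    winner: str | None
    ambiguous: bool
    reason: str | None
    candidates: list[tuple[str, float]]


def decide(
    confidences: dict[str, float],
    min_confidence: float = 0.3,
    ambiguity_gap: float = 0.05,
    max_candidates: int = 3,
) -> Decision:
    """Accept the top domain only if it is confident enough and clearly ahead."""
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:max_candidates]
    best_name, best_conf = ranked[0]
    # Check 1: the best candidate must reach the minimum confidence.
    if best_conf < min_confidence:
        return Decision(None, True, "low_confidence", top)
    # Check 2: the best candidate must lead the runner-up by at least the gap.
    if len(ranked) > 1 and best_conf - ranked[1][1] < ambiguity_gap:
        return Decision(None, True, "small_gap", top)
    return Decision(best_name, False, None, top)


print(decide({"code": 0.90, "legal": 0.20}))   # clear winner
print(decide({"code": 0.50, "legal": 0.48}))   # too close to call
print(decide({"code": 0.10, "legal": 0.05}))   # nothing is confident enough
```

Keeping the reason alongside the candidate list lets the caller distinguish "no domain fits" from "two domains fit equally well", which may call for different fallbacks.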

### 📊 Pluggable Metrics Hooks

Integrate with your monitoring stack:

```python
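from prometheus_client import Counter, Histogram
from kolmo import KolmoRouter, RoutingMetrics  # RoutingMetrics import path is assumed

latency = Histogram("kolmo_latency_ms", "Routing latency in ms")  # illustrative names
requests = Counter("kolmo_requests", "Routed prompts", ["domain"])

def prometheus_hook(metrics: RoutingMetrics):
    latency.observe(metrics.latency_ms)
    requests.labels(domain=metrics.winner).inc()

router = KolmoRouter(metrics_hooks=[prometheus_hook])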
```

### 🛠️ Fine-Grained Controls

Tune dictionary size per domain:

```python
router.add_domain(
    name="code",
    samples=code_samples,
    dict_size=50000,           # Custom dictionary size
)
```

---

## 📈 Performance

Run the reproducible benchmark script to generate the table for your hardware:

```bash
python examples/quickstart_benchmark.py --iterations 500 --domains 5
```

It logs CPU details, domain count, iteration count, and prompt lengths, then prints a README-ready table.

| Prompt Length | Domains | Avg Latency (ms) | p50 (ms) | p95 (ms) | Throughput (req/s) | Accuracy |
|:-------------:|:-------:|:---------------:|:--------:|:--------:|:-----------------:|:--------:|
| (run script)  | 5       | (run script)    | (run)    | (run)    | (run script)      | (run)    |

*Memory footprint: ~50 KB per domain dictionary (scales with `dict_size`)*
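
For reference, the latency columns in the table can be derived from a list of per-request timings as sketched below. This is not the benchmark script itself: the `summarize` helper is hypothetical, and the throughput figure assumes prompts are routed sequentially on a single thread.

```python
import statistics


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Aggregate per-request latencies (needs at least two samples)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100)
    total_seconds = sum(latencies_ms) / 1000.0
    return {
        "avg_ms": statistics.fmean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": cuts[94],
        # Sequential routing: requests divided by total time spent routing.
        "throughput_rps": len(latencies_ms) / total_seconds,
    }


sample = [float(i) for i in range(1, 101)]  # synthetic timings: 1 ms .. 100 ms
stats = summarize(sample)
print({key: round(value, 2) for key, value in stats.items()})
```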

---

## 🌍 Language Agnostic

Kolmo operates at the **byte level** using statistical compression:

- ✅ English, Spanish, Mandarin, Japanese...
- ✅ Python, Rust, Go, C++, JavaScript...
- ✅ Log files, JSON, XML, Protocol Buffers...
- ✅ Mixed-domain prompts

No tokenization. No language-specific training.
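
A quick illustration of what "byte level" means in practice: every prompt, whatever its language, is reduced to a plain byte sequence before compression. The snippet assumes UTF-8 encoding (whether `route()` encodes `str` prompts this way internally is an assumption), and the sample strings are illustrative only.

```python
text_samples = {
    "english": "How are you?",
    "spanish": "Hola, ¿cómo estás?",
    "japanese": "お元気ですか？",
    "python": "print('hello')",
}

# Encode each sample to the raw bytes a compressor would actually see.
encoded = {label: text.encode("utf-8") for label, text in text_samples.items()}

for label, raw in encoded.items():
    chars = len(text_samples[label])
    print(f"{label:9s} chars={chars:2d} bytes={len(raw):2d} head={raw[:8]!r}")
```

Non-ASCII text expands to multi-byte sequences, and those sequences are themselves recurring patterns that a dictionary can capture without any language-specific preprocessing.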

---

## 📚 Advanced Examples

### Coding vs. Reasoning Router

```python
router = KolmoRouter()

router.add_domain(
    "coding",
    source="huggingface:codeparrot/github-code",
    target_model="deepseek-coder"
)
router.add_domain(
    "reasoning",
    source="huggingface:allenai/sciq",
    target_model="deepseek-r1"
)

# Coding prompt → deepseek-coder
res = router.route("Write a FastAPI endpoint for user login")

# Reasoning prompt → deepseek-r1
res = router.route("If 3x + 5 = 20, what is x?")
```

### Language Detection

```python
router.add_domain("en", samples=english_samples, target_model="gpt-4o")
router.add_domain("es", samples=spanish_samples, target_model="gpt-4o-es")

res = router.route("Hola, ¿cómo estás?")
print(res.winner)  # "es"
```

### Using Pre-trained Dictionaries

```python
# Load a pre-trained zstd dictionary
with open("medical.dict", "rb") as f:
    dict_bytes = f.read()

router.add_domain(
    name="medical",
    dict_bytes=dict_bytes,
    target_model="meditron-7b"
)
```

---

## 🧪 API Reference

### `KolmoRouter`

| Method | Description |
|--------|-------------|
| `add_domain(name, ...)` | Add a routing domain |
| `route(prompt)` | Route a single prompt |
| `route_thresholded(prompt, ...)` | Route with ambiguity detection |
| `route_many(prompts)` | Batch routing with aggregate stats |
| `save_bundle(path)` | Save router to `.kolmo` file |
| `load_bundle(path)` | Load router from `.kolmo` file |

### CLI Commands

```bash
kolmo train <config.json> <output.kolmo>  # Train a bundle
kolmo route <bundle.kolmo> <prompt>        # Route a prompt
kolmo info <bundle.kolmo>                  # Show bundle info
```

---

## 🗺️ Roadmap

### ✅ Implemented (Current)

- Batch routing with throughput stats
- Thresholded routing with ambiguity detection
- CLI tooling (train, route, info)
- Pluggable metrics hooks
- Fine-grained dictionary sizing controls

### 🚀 Next Steps (High Priority)

- Dictionary optimization and pruning tools
- Config-driven domain registry (YAML/JSON)
- Comprehensive benchmark suite
- Caching layer for repeated prompts
- Improved dataset ingestion with progress tracking

### 🗺️ Future Horizons (Backlog)

- Hierarchical multi-stage routing
- Adaptive dictionary refresh
- REST/gRPC service layer
- SIMD-optimized compression kernels

---

## 🤝 Contributing

We welcome contributions! Kolmo is in active development.

```bash
git clone https://github.com/charlybgai/kolmo
cd kolmo
uv sync --frozen
uv tool install pre-commit
pre-commit install
```

---

## 📄 License

MIT License — See [LICENSE](https://opensource.org/licenses/MIT) for details.

---

## 🙏 Acknowledgments

Kolmo is inspired by research in:
- [Normalized Compression Distance](https://en.wikipedia.org/wiki/Normalized_compression_distance) (Li et al.)
- [The Similarity Metric](https://arxiv.org/abs/cs/0111054) (Li et al., 2004)
- Modern zstd dictionary training techniques (Facebook)

---

**Built by [Carlos Bustamante](mailto:me@carlosbustamante.dev) & [KODEX](mailto:agent@carlosbustamante.dev)**
*Information theory meets production inference. ⚡*
