Metadata-Version: 2.4
Name: context-compress
Version: 0.1.0
Summary: Hierarchical context compression toolkit for LLM applications
Project-URL: Homepage, https://github.com/elliottx/cctx
Project-URL: Documentation, https://github.com/elliottx/cctx
Project-URL: Repository, https://github.com/elliottx/cctx
Project-URL: Issues, https://github.com/elliottx/cctx/issues
Author: Elliott Weed
License-Expression: MIT
License-File: LICENSE
Keywords: agent,compression,context,llm,rag,summarization,token
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: click>=8.1
Requires-Dist: fastapi>=0.110
Requires-Dist: httpx>=0.27
Requires-Dist: scikit-learn<1.5,>=1.4
Requires-Dist: tiktoken>=0.7
Requires-Dist: uvicorn>=0.29
Provides-Extra: benchmark
Requires-Dist: bert-score>=0.3.13; extra == 'benchmark'
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Provides-Extra: spacy
Requires-Dist: spacy>=3.7; extra == 'spacy'
Description-Content-Type: text/markdown

# cctx — Context Compression Toolkit

Hierarchical context compression for LLM applications. Compress long inputs into layered representations so cheap, small-context models perform like expensive, large-context ones.

**Key numbers:** 2.6x compression (L1 extractive), 11x compression (L2 abstractive) on real conversation data. 26ms compression time. Entity preservation rate >80%.

## Why

LLM context windows are expensive. Stuffing 100K tokens into every call burns money and latency. Most of that context is filler — cctx extracts what matters and gives you three layers to choose from:

| Layer | What | Compression | Use Case |
|-------|------|------------|----------|
| **L0 (Raw)** | Original text | 1x | Full context when budget allows |
| **L1 (Facts)** | Key sentences extracted | ~2.5x | Most API calls |
| **L2 (Summary)** | Abstractive summary | ~10x | Agent-to-agent context exchange |
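
One way to choose between the layers is to take the most detailed one that fits a token budget. The sketch below illustrates that idea only: `pick_layer`, `count_tokens`, and the layer names used as dictionary keys are hypothetical and not part of the cctx API, and the whitespace token count stands in for a real tokenizer.

```python
from typing import Mapping


def count_tokens(text: str) -> int:
    """Crude whitespace token count; a stand-in for a real tokenizer."""
    return len(text.split())


def pick_layer(layers: Mapping[str, str], budget: int) -> str:
    """Return the name of the most detailed layer that fits the budget."""
    # Walk the layers from most to least detailed.
    for name in ("raw", "facts", "summary"):
        if name in layers and count_tokens(layers[name]) <= budget:
            return name
    # Nothing fits: fall back to the smallest layer available.
    return min(layers, key=lambda name: count_tokens(layers[name]))


layers = {
    "raw": "one two three four five six seven eight",
    "facts": "one two three four",
    "summary": "one two",
}
print(pick_layer(layers, budget=5))
```

With a budget of 5 tokens the raw text (8 tokens) is too large, so the facts layer (4 tokens) is chosen.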

## Features

- **Hierarchical compression** — L0/L1/L2 layers with drill-down to recover detail
- **Entity-aware scoring** — Named entities boost sentence importance (spaCy + regex hybrid)
- **Conversation-aware** — Recency bias, role weighting, incremental delta compression
- **Delta cache** — Content-addressable LRU cache with disk persistence; only compress what changed
- **LLM-based scoring** — Optional GPT-4o-mini scorer for higher quality (with budget caps)
- **Abstractive L2** — LLM-powered summarization (about $0.00013/call) with extractive fallback
- **OpenAI-compatible proxy** — Drop-in `/v1/chat/completions` endpoint that auto-compresses context
- **Agent protocol** — A2A/MCP-compatible context exchange with MIME types and header negotiation
- **Benchmark suite** — ROUGE-L, BERTScore, LLM-as-Judge evaluation
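
To illustrate the delta cache idea from the list above, here is a minimal in-memory sketch of a content-addressable LRU cache. It is not the cctx implementation: the `DeltaCache` class, its method names, and the choice of SHA-256 as the content hash are assumptions made for the example, and disk persistence is left out.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class DeltaCache:
    """Content-addressable LRU cache keyed by a SHA-256 digest of the input."""

    def __init__(self, max_entries: int = 128) -> None:
        self.max_entries = max_entries
        self._store: OrderedDict[str, str] = OrderedDict()

    @staticmethod
    def key_for(text: str) -> str:
        # Identical content always maps to the same key.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_compress(self, text: str, compress: Callable[[str], str]) -> str:
        key = self.key_for(text)
        if key in self._store:
            self._store.move_to_end(key)     # mark as most recently used
            return self._store[key]
        result = compress(text)              # only runs for unseen content
        self._store[key] = result
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used
        return result
```

Calling `get_or_compress` once per conversation chunk means unchanged chunks are served from the cache and only new or edited chunks reach the compressor.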

## Installation

```bash
# From a source checkout
pip install -e .

# Optional: better entity extraction (uses the package's "spacy" extra)
pip install -e ".[spacy]" && python -m spacy download en_core_web_sm
```

## Quick Start

### Python API

```python
from cctx.compressor import Compressor
from cctx.types import CompressRequest, Layer

compressor = Compressor()
result = compressor.compress(CompressRequest(
    text="Your long document or conversation here...",
    l1_ratio=0.3,        # Keep top 30% of sentences
    l2_max_sentences=5,   # Summarize to 5 sentences
))

print(result.layers[Layer.FACTS])    # L1: key sentences
print(result.layers[Layer.SUMMARY])  # L2: abstractive summary
print(result.metadata)               # Token counts, entities, timing
```

### CLI

```bash
# Compress a file
cctx compress document.txt
cctx compress document.txt --layer 2 --output summary.txt

# Run as API server
cctx serve --port 8420

# Benchmark compression quality
cctx benchmark document.txt

# Show version
cctx version
```

### OpenAI-Compatible Proxy

Drop cctx in front of any OpenAI-compatible API. It compresses conversation history before forwarding:

```bash
cctx serve --port 8420

# Use it like the regular OpenAI API; context is compressed automatically
curl -X POST http://localhost:8420/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are helpful."},
      {"role": "user", "content": "Very long conversation history..."}
    ]
  }'
```
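
The same request can be prepared from Python. The sketch below uses only the standard library and assumes the proxy is listening on `localhost:8420` as in the curl example; `build_chat_request` is a hypothetical helper, not part of cctx, and the API key shown is a placeholder.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, messages: list[dict]
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request aimed at the proxy."""
    body = json.dumps({"model": "gpt-4o-mini", "messages": messages})
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8420",
    "sk-placeholder",  # placeholder; read the real key from the environment
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Very long conversation history..."},
    ],
)

# To send it against a running proxy:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
```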

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/v1/compress` | POST | Compress text with layer selection |
| `/v1/conversation` | POST | Incremental conversation compression |
| `/v1/drill-down` | POST | Recover detail from a compressed section |
| `/v1/chat/completions` | POST | OpenAI-compatible proxy with auto-compression |

### Compress

```bash
curl -X POST localhost:8420/v1/compress \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Long text to compress...",
    "layer": 1,
    "l1_ratio": 0.3,
    "l2_max_sentences": 3,
    "sections": true
  }'
```

### Incremental Conversation

```bash
# Add messages one at a time — cctx maintains compressed context
curl -X POST localhost:8420/v1/conversation \
  -H 'Content-Type: application/json' \
  -d '{
    "conversation_id": "conv-1",
    "message": {"role": "user", "content": "Hello, can you help me?"}
  }'
```

## Scoring Strategies

cctx provides multiple sentence scoring algorithms that can be composed:

| Scorer | Speed | Quality | Notes |
|--------|-------|---------|-------|
| **TextRank** | Fast | Good | Default. Graph-based, no external deps |
| **TF-IDF** | Fast | Good | Query-aware variant available |
| **ConversationScorer** | Fast | Best for chat | Recency bias + role weights + entity boost |
| **LLMScorer** | Slow | Highest | GPT-4o-mini based, with budget caps and caching |
| **CompositeScorer** | Varies | Custom | Weighted blend of any scorers |
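
As an illustration of what a weighted blend of scorers can look like, the sketch below min-max normalizes each scorer's per-sentence scores and then takes a weighted average. It is a generic example and not the `CompositeScorer` implementation; the `normalize` and `blend` functions and the score values are made up for the demonstration, and each score list is assumed to be non-empty and of equal length.

```python
from typing import Sequence


def normalize(scores: Sequence[float]) -> list[float]:
    """Min-max scale scores to [0, 1]; constant inputs map to all zeros."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


def blend(
    score_lists: Sequence[Sequence[float]], weights: Sequence[float]
) -> list[float]:
    """Weighted average of several per-sentence score lists."""
    if len(score_lists) != len(weights):
        raise ValueError("need one weight per scorer")
    total = sum(weights)
    normalized = [normalize(scores) for scores in score_lists]
    return [
        sum(w * scores[i] for w, scores in zip(weights, normalized)) / total
        for i in range(len(normalized[0]))
    ]


textrank_scores = [0.2, 0.5, 0.3]  # made-up values for three sentences
tfidf_scores = [1.0, 4.0, 7.0]     # made-up values on a different scale
combined = blend([textrank_scores, tfidf_scores], weights=[0.5, 0.5])
```

Normalizing first keeps a scorer with a larger numeric range from dominating the blend regardless of its weight.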

### Entity-Aware Scoring

Sentences containing named entities (people, organizations, products, money amounts) get an automatic importance boost, and entities that appear three or more times get a stronger one. After the initial selection, a safety net swaps in sentences for any entities that were left out.

```python
from cctx.scorer import ConversationScorer
from cctx.compressor import Compressor

# Entity boost is automatic when using ConversationScorer
compressor = Compressor(scorer=ConversationScorer())
```
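
The boost and safety net described above can be sketched in plain Python. This is an illustration of the idea rather than the cctx code: `select_sentences`, the boost size, the doubled boost for frequent entities, and the substring-based entity matching are all assumptions made for the example.

```python
import math
from typing import Sequence


def select_sentences(
    sentences: Sequence[str],
    base_scores: Sequence[float],
    entities: Sequence[str],
    ratio: float = 0.3,
    boost: float = 0.1,
) -> list[str]:
    """Pick the top `ratio` of sentences, then swap in missed entities."""
    lowered = [s.lower() for s in sentences]
    names = [e.lower() for e in entities]
    # Count how many sentences mention each entity (crude substring match).
    counts = {n: sum(n in s for s in lowered) for n in names}

    scores = []
    for text, score in zip(lowered, base_scores):
        for name, count in counts.items():
            if name in text:
                # Assumed rule: entities seen 3+ times get a doubled boost.
                score += boost * (2 if count >= 3 else 1)
        scores.append(score)

    k = max(1, math.ceil(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:k]

    # Safety net: swap the best sentence for each missed entity in for the
    # weakest pick that was not itself swapped in.
    protected: set[int] = set()
    for name in names:
        if any(name in lowered[i] for i in chosen):
            continue
        candidates = [i for i in ranked if name in lowered[i]]
        if not candidates:
            continue  # the entity does not occur in any sentence
        replaceable = [i for i in chosen if i not in protected]
        if replaceable:
            chosen.remove(replaceable[-1])
        chosen.append(candidates[0])
        protected.add(candidates[0])

    # Return the selection in original document order.
    return [sentences[i] for i in sorted(set(chosen))]
```

A swap in this sketch can itself push out a sentence that carried another entity, so a more careful version would re-check coverage after each swap.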

## Agent Protocol

cctx defines a lightweight protocol for compressed context exchange between AI agents, compatible with A2A and MCP patterns:

```
X-CCTX-Encoding: facts          # Current compression level
X-CCTX-Accept: facts,summary    # Accepted levels
X-CCTX-Token-Budget: 2000       # Token budget constraint
X-CCTX-Drill-Down: true         # Drill-down supported
Content-Type: application/vnd.cctx+json
```

Agents can negotiate compression levels and request drill-downs to recover detail when the summary isn't enough.

## Architecture

```
Input Text
    │
    ├─→ Entity Extraction (spaCy + regex)
    │       │
    │       ▼
    ├─→ Sentence Scoring (TextRank / ConversationScorer / LLM)
    │       │ (entity boost applied here)
    │       ▼
    ├─→ L1: Top-K Sentence Selection + Entity Safety Net
    │       │
    │       ▼
    └─→ L2: Abstractive Summarization (LLM or extractive fallback)
            │
            ▼
        Delta Cache (content-addressable, LRU, disk-persistent)
```

See [ARCHITECTURE.md](ARCHITECTURE.md) for full module contracts.

## Development

```bash
# Install test dependencies (the package's "dev" extra)
pip install -e ".[dev]"

# Run tests
pytest -q

# With coverage
pytest --cov=cctx

# Current: 146 tests passing
```

## Benchmarks

On a real 22-turn Tesla-buying conversation (1,673 tokens):

| Metric | L1 (Facts) | L2 (Summary) |
|--------|-----------|--------------|
| Output tokens | 641 | 151 |
| Compression ratio | 2.6x | 11.1x |
| Key entities preserved | ✅ Penske, IMCU, Tesla | ✅ Tesla, IMCU, Model Y |
| Compression time | 26ms | — |
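
The ratios in the table follow directly from the token counts, assuming the ratio is defined as input tokens divided by output tokens. The `compression_ratio` helper below is written for this check and is not part of cctx.

```python
def compression_ratio(input_tokens: int, output_tokens: int) -> float:
    """Input size divided by output size; 2.0 means half the tokens remain."""
    return input_tokens / output_tokens


input_tokens = 1673
for name, output_tokens in (("L1", 641), ("L2", 151)):
    ratio = compression_ratio(input_tokens, output_tokens)
    saved = 1 - output_tokens / input_tokens
    print(f"{name}: {ratio:.1f}x, {saved:.0%} fewer tokens")

# L1: 2.6x, 62% fewer tokens
# L2: 11.1x, 91% fewer tokens
```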

## License

MIT
