Metadata-Version: 2.4
Name: tokenpak
Version: 1.0.2
Summary: Slash LLM costs with intelligent context compression, smart routing, and cost tracking
Author-email: Kevin Yang <kaywhy331@gmail.com>
Maintainer-email: Kevin Yang <kaywhy331@gmail.com>
License: MIT License
        
        Copyright (c) 2026 TokenPak Contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/tokenpak/tokenpak
Project-URL: Documentation, https://github.com/tokenpak/tokenpak/blob/main/README.md
Project-URL: Bug Reports, https://github.com/tokenpak/tokenpak/issues
Project-URL: Source Code, https://github.com/tokenpak/tokenpak
Project-URL: Changelog, https://github.com/tokenpak/tokenpak/blob/main/CHANGELOG.md
Keywords: llm,ai,proxy,token-optimization,openai,anthropic,compression,context,context-window,tokens,cost-tracking,routing
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Internet :: WWW/HTTP :: HTTP Servers
Classifier: Topic :: System :: Networking
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: click>=8.1.0
Requires-Dist: starlette>=0.36.0
Requires-Dist: uvicorn>=0.27.0
Requires-Dist: httpx>=0.26.0
Requires-Dist: h2<5,>=3
Requires-Dist: requests>=2.28.0
Requires-Dist: watchdog>=3.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.12.0; extra == "dev"
Requires-Dist: black>=24.0.0; extra == "dev"
Requires-Dist: ruff>=0.2.0; extra == "dev"
Provides-Extra: tokens
Requires-Dist: tiktoken>=0.5.0; extra == "tokens"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5.0; extra == "docs"
Requires-Dist: mkdocs-material>=9.5.0; extra == "docs"
Dynamic: license-file

# TokenPak

[![PyPI version](https://img.shields.io/pypi/v/tokenpak.svg)](https://pypi.org/project/tokenpak/)
[![Python 3.10+](https://img.shields.io/pypi/pyversions/tokenpak.svg)](https://pypi.org/project/tokenpak/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://img.shields.io/badge/tests-375%20passing-brightgreen.svg)](#)

**Token compression and cost optimization proxy for LLM APIs**

TokenPak is a drop-in HTTP proxy that sits between your app and LLM APIs. It compresses requests, caches responses to deduplicate repeated requests, and tracks token spend. The only change to your code is the SDK's `base_url`.

---

## Installation

```bash
pip install tokenpak
```

Requires Python 3.10+. The free tier supports Anthropic, OpenAI, and Google Gemini SDKs.

---

## 5-Minute Quickstart

### 1. Start the proxy

```bash
tokenpak serve --port 8766
```
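
Before pointing an SDK at the proxy, it can help to confirm that something is listening on the port. The following is a minimal sketch using only the standard library. `proxy_is_listening` is a hypothetical helper, not part of TokenPak, and it only checks that a TCP connection succeeds, not that the listener is actually TokenPak.

```python
import socket


def proxy_is_listening(host: str = "127.0.0.1", port: int = 8766, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; the default port matches the
    quickstart command `tokenpak serve --port 8766`.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("proxy reachable:", proxy_is_listening())
```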

### 2. Point your SDK at the proxy

```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8766",
    api_key="your-anthropic-api-key",
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Explain token compression in 2 sentences."}],
)
print(response.content[0].text)
```

**Example output** (model responses vary):
```
Token compression reduces the number of tokens sent to an LLM by removing
redundant context and reformatting prompts more efficiently, lowering API cost
without changing the model response quality.
```

No code changes are needed beyond setting `base_url`. The OpenAI and Google SDKs work the same way, each with its own base URL (see the Adapters table).

### 3. Check your savings

```bash
tokenpak status
tokenpak cost --last 7d
```

---

## Adapters

| Provider | SDK | Base URL |
|----------|-----|----------|
| Anthropic | `anthropic` | `http://localhost:8766` |
| OpenAI | `openai` | `http://localhost:8766/v1` |
| Google Gemini | `google-generativeai` | `http://localhost:8766/google` |

See [`docs/adapters/`](docs/adapters/) for full per-provider reference.
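
If one application talks to several providers, the table above can be captured in a small lookup. This is only a sketch: `ADAPTER_PATHS` and `proxy_base_url` are illustrative names rather than TokenPak APIs, the path suffixes are copied from the table, and the default host and port assume the quickstart setup.

```python
# Path suffixes copied from the Adapters table; names here are illustrative only.
ADAPTER_PATHS = {
    "anthropic": "",
    "openai": "/v1",
    "google": "/google",
}


def proxy_base_url(provider: str, host: str = "localhost", port: int = 8766) -> str:
    """Build the base URL to pass to a provider SDK so it routes through the proxy."""
    try:
        path = ADAPTER_PATHS[provider.lower()]
    except KeyError:
        known = ", ".join(sorted(ADAPTER_PATHS))
        raise ValueError(f"unknown provider {provider!r}; expected one of: {known}") from None
    return f"http://{host}:{port}{path}"


if __name__ == "__main__":
    for name in ADAPTER_PATHS:
        print(f"{name:10s} {proxy_base_url(name)}")
```

The returned string is what you would pass as `base_url` when constructing the SDK client, as in the Anthropic quickstart example.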

---

## Feature Matrix

| Feature | Free | Pro |
|---------|:----:|:---:|
| Anthropic, OpenAI, Google adapters | ✅ | ✅ |
| Request compression (capsule builder) | ✅ | ✅ |
| Token usage telemetry | ✅ | ✅ |
| Cost dashboard (`tokenpak status`) | ✅ | ✅ |
| Streaming support | ✅ | ✅ |
| Vault indexer (8 modules) | ✅ | ✅ |
| Middleware & logging hooks | ✅ | ✅ |
| Semantic cache | ❌ | ✅ |
| Smart model routing | ❌ | ✅ |
| PII scrubbing | ❌ | ✅ |
| Multi-tenant seat management | ❌ | ✅ |
| SLA/priority queuing | ❌ | ✅ |

---

## Configuration

```yaml
# tokenpak.yaml
proxy:
  host: 127.0.0.1
  port: 8766
  workers: 4

compression:
  enabled: true
  ratio_target: 0.8   # target 20% token reduction

cache:
  enabled: true
  max_size_mb: 1000
  ttl_seconds: 3600
```

```bash
tokenpak serve --config tokenpak.yaml
```
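
To make the `ratio_target` setting concrete, the sketch below works through the arithmetic implied by the inline comment, where `0.8` corresponds to a 20% token reduction. It assumes `ratio_target` means compressed tokens divided by original tokens. That reading comes from the comment alone, and both function names are hypothetical.

```python
def reduction_percent(ratio_target: float) -> float:
    """Percentage of tokens removed, assuming ratio_target = compressed / original."""
    if not 0 < ratio_target <= 1:
        raise ValueError("ratio_target is expected to be in the range (0, 1]")
    return (1 - ratio_target) * 100


def target_prompt_tokens(original_tokens: int, ratio_target: float) -> int:
    """Token count a prompt would be compressed toward under the same assumption."""
    if not 0 < ratio_target <= 1:
        raise ValueError("ratio_target is expected to be in the range (0, 1]")
    return round(original_tokens * ratio_target)


if __name__ == "__main__":
    # A 10,000-token prompt with the sample config's ratio_target of 0.8.
    print(target_prompt_tokens(10_000, 0.8))
    print(round(reduction_percent(0.8), 1))
```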

---

## CLI Commands

```bash
tokenpak serve          # Start the proxy
tokenpak status         # Live stats (requests, tokens, cost)
tokenpak cost           # Cost breakdown by model/provider
tokenpak demo           # Interactive pipeline visualization
tokenpak compress <file> # Test compression on a file
tokenpak doctor         # Diagnose configuration issues
```

---

## How It Works

1. **Intercept** — Proxy receives your SDK request
2. **Compress** — Capsule builder removes redundant context (10–40% reduction)
3. **Forward** — Optimized request sent to real API with your credentials
4. **Cache** — Response stored for deduplication
5. **Track** — Token usage and cost logged per request

Response headers include `x-tokenpak-ratio` (compression ratio) and `x-tokenpak-cache-hit`.
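
The two headers can be read from any HTTP client's response headers. The sketch below parses them from a plain mapping. The value formats are assumptions, since this README does not specify them: the ratio is treated as a decimal string and the cache flag as a truthy string such as `true` or `1`. `read_tokenpak_headers` and `TokenPakStats` are hypothetical names.

```python
from typing import Mapping, NamedTuple, Optional


class TokenPakStats(NamedTuple):
    ratio: Optional[float]      # parsed x-tokenpak-ratio, or None if absent/unparseable
    cache_hit: Optional[bool]   # parsed x-tokenpak-cache-hit, or None if absent


def read_tokenpak_headers(headers: Mapping[str, str]) -> TokenPakStats:
    """Extract TokenPak's response headers from a header mapping.

    Assumed formats (not confirmed by the README): ratio is a decimal string,
    cache hit is one of "1", "true", "yes", "hit" (case-insensitive).
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}

    raw_ratio = lowered.get("x-tokenpak-ratio")
    try:
        ratio = float(raw_ratio) if raw_ratio is not None else None
    except ValueError:
        ratio = None

    raw_hit = lowered.get("x-tokenpak-cache-hit")
    cache_hit = None if raw_hit is None else raw_hit.strip().lower() in {"1", "true", "yes", "hit"}

    return TokenPakStats(ratio=ratio, cache_hit=cache_hit)


if __name__ == "__main__":
    sample = {"X-TokenPak-Ratio": "0.8", "X-TokenPak-Cache-Hit": "false"}
    print(read_tokenpak_headers(sample))
```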

---

## Known Limitations

- Vision/image inputs pass through uncompressed; only text is compressed
- With streaming, compression applies to the prompt only; the response is streamed verbatim
- Python 3.10+ required
- Semantic cache and smart routing require Pro tier

---

## Documentation

- [Anthropic Adapter](docs/adapters/anthropic.md)
- [OpenAI Adapter](docs/adapters/openai.md)
- [Google Adapter](docs/adapters/google.md)
- [Vault Indexer](docs/vault.md) *(documentation coming soon)*

---

## Support

- **Bugs & features:** [GitHub Issues](https://github.com/tokenpak/tokenpak/issues)
- **License:** MIT — see [LICENSE](LICENSE)

---

*TokenPak v1.0.2 · Last updated March 2026*
