Metadata-Version: 2.4
Name: robot-resources-router
Version: 2.1.3
Summary: Intelligent LLM routing proxy — cost optimization via local proxy
Project-URL: Homepage, https://github.com/robot-resources/robot-resources
Project-URL: Documentation, https://github.com/robot-resources/robot-resources#readme
Project-URL: Repository, https://github.com/robot-resources/robot-resources
Author: Robot Resources Team
License: MIT
License-File: LICENSE
Keywords: ai,cost-optimization,llm,proxy,router,routing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Requires-Dist: click>=8.1.0
Requires-Dist: fastapi>=0.109.0
Requires-Dist: httpx>=0.26.0
Requires-Dist: pydantic-settings>=2.1.1
Requires-Dist: pydantic>=2.5.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: structlog>=24.1.0
Requires-Dist: tiktoken>=0.7.0
Requires-Dist: uvicorn>=0.27.0
Provides-Extra: dev
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pre-commit>=3.6.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# Robot Resources

[![CI](https://github.com/robot-resources/robot-resources/actions/workflows/ci.yml/badge.svg)](https://github.com/robot-resources/robot-resources/actions/workflows/ci.yml)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![Version](https://img.shields.io/badge/version-2.1.3-blue.svg)](CHANGELOG.md)

> Intelligent LLM cost optimization via local proxy.

Automatically route each LLM request to the cheapest model that can handle it. Capability scores are calibrated from [Chatbot Arena](https://arena.ai) Elo ratings.

- **API-key users**: 60-90% direct cost savings (benchmarked 82.5% avg, 210 prompts)
- **Subscription users** (e.g., OpenClaw + Claude): 3x token budget stretch (53.7% avg savings, Haiku/Sonnet/Opus split)

## Quick Start

The fastest way to get started is a single command that installs Router, registers it as an always-on service, and auto-configures MCP:

```bash
npx @robot-resources/router
```

### From PyPI (manual setup)

```bash
# 1. Install
pip install robot-resources-router

# 2. Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# 3. Start proxy
rr-router start
# Proxy running on http://localhost:3838
```

Either way, point your agent to `http://localhost:3838` and use `model: "auto"`.

## Why Robot Resources?

| Without RR | With RR |
|------------|---------|
| Every message uses the same expensive model | Each message routed to optimal model |
| "hello" costs the same as "refactor codebase" | Simple tasks use cheap/free models |
| Manual model selection | Automatic task detection |
| No cost visibility | Full routing transparency |

### Savings by user type

**API-key users** (pay per token, all 11 models available):

| Workload | Avg Savings | Typical Model |
|----------|-----------|---------------|
| Simple Q&A | 98% | gemini-2.5-flash-lite, gpt-5.4-nano |
| Creative | 83% | gpt-5.4-mini, gemini-2.5-flash |
| Reasoning | 79% | o4-mini, gemini-2.5-pro |
| Coding | 77% | gpt-5.4-mini, gemini-2.5-flash |
| Analysis | 73% | gpt-5.4-mini, gemini-2.5-pro |

**Subscription users** (e.g., OpenClaw + Claude, Anthropic models only):

| Complexity | Model Selected | Savings vs Opus |
|-----------|---------------|----------------|
| Simple prompts | Haiku (41.9%) | 80% |
| Medium prompts | Sonnet (50.5%) | 40% |
| Complex prompts | Opus (7.6%) | 0% |

Token budget multiplier: **3x** — your subscription handles 3x more requests through intelligent routing.
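
The 53.7% average savings quoted for subscription users can be reproduced from the table above as a weighted sum of each model's share of prompts and its savings versus Opus. A small sketch of that arithmetic (the `routing_mix` name and helper are illustrative, not part of the router):

```python
# Model mix and per-model savings copied from the subscription table.
# Each entry: (share of prompts in %, savings vs Opus in %).
routing_mix = {
    "haiku": (41.9, 80.0),
    "sonnet": (50.5, 40.0),
    "opus": (7.6, 0.0),
}


def weighted_savings(mix):
    """Return the blended savings percentage across all routed prompts."""
    return sum(share * savings for share, savings in mix.values()) / 100.0


average = weighted_savings(routing_mix)
print(f"{average:.1f}% average savings")  # 53.7% average savings
```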

## How It Works

```
Your Agent
    │
    │  POST /v1/chat/completions
    │  model: "auto"
    ▼
┌─────────────────────────────────────┐
│   Robot Resources (localhost:3838)  │
│                                     │
│   1. Detect task type               │
│      → coding, reasoning, analysis  │
│        simple_qa, creative, general │
│                                     │
│   2. Filter capable models          │
│      → capability >= 0.70 threshold │
│                                     │
│   3. Select cheapest                │
│      → lowest cost_per_1k_input     │
│                                     │
│   4. Forward to provider            │
│      → Anthropic, OpenAI, Google    │
└─────────────────────────────────────┘
    │
    ▼
Real LLM Provider (using your API keys)
```
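
Steps 2 and 3 of the diagram (filter by capability, then pick the cheapest) can be sketched in a few lines. This is a simplified illustration, not the router's actual `selector.py`: the `Model` class, the catalogue entries, and all scores and prices are made up, and only the 0.70 threshold and the `cost_per_1k_input` field name come from the diagram.

```python
from dataclasses import dataclass

CAPABILITY_THRESHOLD = 0.70  # threshold from step 2 of the diagram


@dataclass(frozen=True)
class Model:
    name: str
    capability: dict[str, float]  # task type -> score in [0, 1]
    cost_per_1k_input: float      # USD, illustrative values only


def select_model(models, task_type, threshold=CAPABILITY_THRESHOLD):
    """Return the cheapest model whose score for task_type meets the threshold."""
    capable = [m for m in models if m.capability.get(task_type, 0.0) >= threshold]
    if not capable:
        return None  # no model qualifies for this task type
    return min(capable, key=lambda m: m.cost_per_1k_input)


# Hypothetical catalogue; names, scores and prices are placeholders.
catalogue = [
    Model("small", {"simple_qa": 0.85, "coding": 0.55}, 0.0001),
    Model("medium", {"simple_qa": 0.90, "coding": 0.78}, 0.0010),
    Model("large", {"simple_qa": 0.95, "coding": 0.93}, 0.0100),
]

print(select_model(catalogue, "simple_qa").name)  # small
print(select_model(catalogue, "coding").name)     # medium
```

Because the filter runs before the cost comparison, a cheap model that falls below the threshold for a task is never chosen, however low its price.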

## Installation

### From PyPI

```bash
pip install robot-resources-router
```

### From Source

```bash
git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
pip install -e ".[dev]"
```

### Requirements

- Python 3.10+
- API keys for at least one provider (Anthropic, OpenAI, or Google)

## Configuration

### Environment Variables

```bash
# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."

# Optional: Server settings
export ROUTER_PORT=3838              # Default: 3838
export ROUTER_API_KEY="your-key"     # Optional: enable auth on all endpoints
export ROUTER_CORS_ORIGINS=""        # Default: localhost only
```
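
Before starting the proxy, it can help to confirm which provider keys are visible to the process. The helper below is a hypothetical pre-flight check, not something shipped with the router; it only reads the variable names listed above and applies the documented default port.

```python
import os

# Environment variable names as listed above, keyed by provider label.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


def router_port(env=None):
    """Return ROUTER_PORT as an int, falling back to the documented default."""
    env = os.environ if env is None else env
    return int(env.get("ROUTER_PORT", "3838"))


if __name__ == "__main__":
    providers = configured_providers()
    if not providers:
        print("No provider API key found; set at least one before starting.")
    else:
        print(f"Providers: {', '.join(providers)} (port {router_port()})")
```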

### Agent Integration

Point your agent's API base URL to `http://localhost:3838` and use model `auto`. Works with any OpenAI-compatible client; OpenAI-style SDKs expect the `/v1` suffix on the base URL, as in the example below.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3838/v1", api_key="unused")
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

## API Reference

### Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completions (streaming supported) |
| `/v1/models` | GET | List available models |
| `/v1/stats` | GET | Cost savings statistics |
| `/v1/models/compare` | GET | Compare models by task type |
| `/v1/config` | GET/PATCH | View or update routing config at runtime |
| `/health` | GET | Health check with component diagnostics |
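
The table notes that `/v1/chat/completions` supports streaming. Assuming the proxy emits OpenAI-style server-sent events (`data: {...}` lines ending with a `data: [DONE]` sentinel), which this README does not spell out, the text fragments could be reassembled roughly like this. The `iter_stream_text` helper and the sample lines are illustrative.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample of what a stream might look like.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

print("".join(iter_stream_text(sample)))  # Hello!
```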

### Request Format

Standard OpenAI chat completions format:

```json
{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
```
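
The same request can be built without an SDK. The sketch below uses only the standard library; `build_chat_request` is a hypothetical helper. Sending `ROUTER_API_KEY` as an `Authorization: Bearer` header is an assumption based on the bearer token auth mentioned in the project structure, so check the server's behaviour before relying on it.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:3838", router_key=None):
    """Build a POST request for /v1/chat/completions with model "auto"."""
    body = {
        "model": "auto",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    headers = {"Content-Type": "application/json"}
    if router_key:
        # Assumed header format when ROUTER_API_KEY is enabled on the proxy.
        headers["Authorization"] = f"Bearer {router_key}"
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_chat_request("Hello!", router_key="your-key")

# Sending the request needs a running proxy:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```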

### Response Format

Standard OpenAI format plus `routing_info`:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "routing_info": {
    "selected_model": "gemini-2.5-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o",
    "reasoning": "Selected gemini-2.5-flash as cheapest capable model..."
  }
}
```
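
Since `routing_info` is an extra field on top of the standard response, clients that want routing transparency have to read it themselves. A minimal sketch working on the parsed JSON body; the `summarize_routing` helper is illustrative and the model names in the sample are placeholders, not real model IDs.

```python
import json

# Trimmed response body; model names are placeholders.
raw_response = """
{
  "model": "cheap-model",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hi"},
     "finish_reason": "stop"}
  ],
  "routing_info": {
    "selected_model": "cheap-model",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "baseline-model"
  }
}
"""


def summarize_routing(response):
    """Return a one-line routing summary, or None if routing_info is absent."""
    info = response.get("routing_info")
    if not info:
        return None
    return (
        f"{info['task_type']} -> {info['selected_model']} "
        f"({info['savings_percent']:.0f}% cheaper than {info['baseline_model']})"
    )


summary = summarize_routing(json.loads(raw_response))
print(summary)  # simple_qa -> cheap-model (96% cheaper than baseline-model)
```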

## Task Types

RR automatically detects 6 task types:

| Task Type | Detection Keywords | Typical Models |
|-----------|-------------------|----------------|
| `coding` | function, code, debug, python, api | claude-sonnet-4-6, gpt-5.4-mini |
| `reasoning` | explain why, prove, step by step | o3, o4-mini |
| `analysis` | compare, pros and cons, evaluate | gpt-5.4-mini, gemini-2.5-pro |
| `simple_qa` | what is, who invented, capital of | gemini-2.5-flash, claude-haiku-4-5 |
| `creative` | write a story, compose, brainstorm | claude-sonnet-4-6, gpt-5.4 |
| `general` | (fallback) | cheapest available |
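
To illustrate the keyword side of detection, here is a deliberately simplified sketch that scores a prompt against the example keywords from the table and falls back to `general`. The project structure describes the real detector as keyword plus context, backed by an LLM classifier, so treat this as a toy model of the idea; `detect_task_type` and the tie-breaking rule (first task type listed wins) are assumptions.

```python
import re

# Example keywords copied from the table above.
TASK_KEYWORDS = {
    "coding": ["function", "code", "debug", "python", "api"],
    "reasoning": ["explain why", "prove", "step by step"],
    "analysis": ["compare", "pros and cons", "evaluate"],
    "simple_qa": ["what is", "who invented", "capital of"],
    "creative": ["write a story", "compose", "brainstorm"],
}


def _count_hits(text, keywords):
    """Count keywords that appear as whole words or phrases in text."""
    # Word boundaries stop "api" from matching inside "capital".
    return sum(bool(re.search(rf"\b{re.escape(kw)}\b", text)) for kw in keywords)


def detect_task_type(prompt):
    """Return the task type with the most keyword hits, or "general"."""
    text = prompt.lower()
    scores = {task: _count_hits(text, kws) for task, kws in TASK_KEYWORDS.items()}
    best_task = max(scores, key=scores.get)  # ties go to the first task listed
    return best_task if scores[best_task] > 0 else "general"


print(detect_task_type("Debug this Python function"))      # coding
print(detect_task_type("What is the capital of France?"))  # simple_qa
print(detect_task_type("Hello there"))                     # general
```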

## Supported Models

11 models across 3 supported providers. The router only routes among models from the providers you have API keys for:

| Provider | Models |
|----------|--------|
| **OpenAI** | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o4-mini |
| **Anthropic** | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 |
| **Google** | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite |

## CLI Commands

```bash
rr-router start                # Start the proxy server
rr-router start --port 8080    # Start on custom port
rr-router status               # Check proxy health and config
rr-router report weekly        # Cost savings report (7 days)
rr-router report monthly       # Cost savings report (30 days)
rr-router --version            # Show version
```

## MCP Server

The Router includes an MCP server for AI agent integration:

```bash
npx @robot-resources/router-mcp
```

Available tools: `router_get_stats`, `router_compare_models`, `router_get_config`, `router_set_config`.

## Development

See [CONTRIBUTING.md](CONTRIBUTING.md) for the full development guide.

```bash
git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest  # 681 tests
```

### Project Structure

```
src/robot_resources/
├── cli/                    # CLI entry point (click)
├── config.py               # Centralized settings (pydantic-settings)
├── proxy/
│   ├── server.py           # FastAPI app (auth, CORS, lifespan, middleware)
│   ├── security.py         # Bearer token auth (timing-safe)
│   ├── models.py           # Pydantic models with validators
│   ├── handlers/           # API endpoints (completions, stats, config, compare)
│   └── providers/          # LLM provider clients (Anthropic, OpenAI, Google)
├── routing/
│   ├── task_detection.py   # 6 task types, keyword + context
│   ├── classifier.py       # LLM task classifier (async)
│   ├── router.py           # Hybrid routing with confidence branching
│   ├── selector.py         # Capability filter + cheapest model
│   ├── decision_log.py     # SQLite WAL decision persistence
│   └── models_db.json      # 11 models with capabilities + pricing
└── tracking/
    ├── db.py               # OutcomeDB (SQLite WAL, async, migrations)
    ├── recorder.py         # OutcomeRecorder (routing outcomes)
    ├── calculator.py       # CostCalculator (pricing from models_db)
    └── telemetry.py        # TelemetryReporter (platform API)
```
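
The tree above describes `security.py` as timing-safe bearer token auth. For contributors unfamiliar with the term, a minimal sketch of such a check using the standard library's `hmac.compare_digest` follows. The `is_authorized` function and the rule that an unset key disables auth are assumptions drawn from the `ROUTER_API_KEY` description, not the project's actual code.

```python
import hmac


def is_authorized(authorization_header, expected_key):
    """Check an Authorization header against the configured key."""
    if not expected_key:
        return True  # assumed: auth is off when no ROUTER_API_KEY is configured
    scheme, _, token = (authorization_header or "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest runs in time independent of where the inputs differ,
    # which avoids leaking key contents through response timing.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())


print(is_authorized("Bearer secret", "secret"))  # True
print(is_authorized("Bearer wrong", "secret"))   # False
```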

## Troubleshooting

### Port already in use

```bash
lsof -i :3838
rr-router start --port 3839
```

### API key not found

```bash
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
export ANTHROPIC_API_KEY="sk-ant-..."
```

### Model not found

Use `model: "auto"` for automatic routing. Check `/v1/models` for available models.

## Roadmap

- [x] Local proxy with task detection routing
- [x] Real SSE streaming for all 3 providers
- [x] Hybrid routing (keyword + LLM classifier)
- [x] MCP server for stats and configuration
- [x] Production hardening (681 tests, error handling, observability)
- [ ] Outcome-based routing (learning from success/failure)
- [ ] Calibration lab (benchmark-driven model scoring)

## License

[MIT](LICENSE)

## Contributing

Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first.

For security vulnerabilities, see [SECURITY.md](.github/SECURITY.md).
