Metadata-Version: 2.4
Name: llm-rotator
Version: 0.2.0
Summary: Fault-tolerant LLM provider rotation with circuit breaker, quotas, and model-first routing
Project-URL: Homepage, https://github.com/dmitry/llm-rotator
Project-URL: Repository, https://github.com/dmitry/llm-rotator
Project-URL: Issues, https://github.com/dmitry/llm-rotator/issues
Author-email: Dmitry <dmitry@example.com>
License-Expression: MIT
License-File: LICENSE
Keywords: anthropic,circuit-breaker,fallback,gemini,langchain,llm,openai,rotation
Classifier: Development Status :: 3 - Alpha
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: httpx<1,>=0.27
Requires-Dist: pydantic<3,>=2.0
Provides-Extra: all
Requires-Dist: langchain-core>=0.3; extra == 'all'
Requires-Dist: redis[asyncio]>=5.0; extra == 'all'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.3; extra == 'langchain'
Provides-Extra: redis
Requires-Dist: redis[asyncio]>=5.0; extra == 'redis'
Description-Content-Type: text/markdown

# llm-rotator

Fault-tolerant LLM provider rotation with circuit breaker, quotas, and model-first routing.

## Features

- **Model-First Routing** — tries all keys for a model before downgrading
- **Circuit Breaker** — granular blocking per key+model with TTL
- **Quota Management** — token and request quotas with automatic reset
- **Tier System** — quality ceiling to control model selection
- **Lifecycle Hooks** — extensible before/after request callbacks
- **Streaming** — with mid-stream error recovery
- **Tool Calling & Structured Output** — unified format across all providers
- **Structured Logging** — full routing chain in one log line
- **Quota Warnings** — alerts when usage crosses a threshold
- **LangChain Integration** — drop-in `RotatorChatModel` for chains and agents
- **JSON Config** — load config from file with env variable substitution
- **Async-First** — built on asyncio + httpx

## Supported Providers

- OpenAI (and any OpenAI-compatible API: OpenRouter, Groq, etc.)
- Google Gemini
- Anthropic Claude

## Installation

```bash
pip install llm-rotator

# With optional dependencies
pip install llm-rotator[redis]      # Redis state backend
pip install llm-rotator[langchain]  # LangChain integration
pip install llm-rotator[all]        # Everything
```

## Quick Start

### Basic Usage

```python
import asyncio
from llm_rotator import LLMRotator, OpenAIClient, RotatorConfig

config = RotatorConfig(
    providers=[
        {
            "name": "openai",
            "client_type": "openai",
            "priority": 1,
            "models": ["gpt-4o"],
            "keys": [{"token": "sk-...", "alias": "main"}],
        }
    ]
)

async def main():
    rotator = LLMRotator(config, clients={"openai": OpenAIClient()})
    response = await rotator.complete(
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.content)
    print(f"Tokens used: {response.usage.total_tokens}")

asyncio.run(main())
```

### JSON Config

Load configuration from a JSON file, with environment variable substitution for secrets:

```json
{
  "providers": [
    {
      "name": "openai",
      "client_type": "openai",
      "priority": 1,
      "models": ["gpt-4o"],
      "keys": [
        {"token": "$OPENAI_API_KEY", "alias": "main"},
        {"token": "${OPENAI_BACKUP_KEY:-}", "alias": "backup"}
      ]
    }
  ]
}
```

```python
rotator = LLMRotator.from_json("config.json", clients={"openai": OpenAIClient()})
```

Supported patterns: `$VAR`, `${VAR}`, `${VAR:-default}`.

### Multi-Provider with Fallback

If a provider fails (for example, with a rate limit or server error), the rotator automatically falls back to the next provider in priority order:

```python
from llm_rotator import (
    AnthropicClient,
    GeminiClient,
    LLMRotator,
    OpenAIClient,
    RotatorConfig,
)

config = RotatorConfig(
    providers=[
        {
            "name": "openai",
            "client_type": "openai",
            "priority": 1,
            "model_groups": [
                {
                    "name": "flagship",
                    "tier": 1,
                    "models": ["gpt-4o"],
                    "token_quota": {"limit": 250_000, "reset": "daily_utc"},
                },
                {
                    "name": "mini",
                    "tier": 3,
                    "models": ["gpt-4o-mini"],
                },
            ],
            "keys": [
                {"token": "sk-key1", "alias": "key1"},
                {"token": "sk-key2", "alias": "key2"},
            ],
        },
        {
            "name": "anthropic",
            "client_type": "anthropic",
            "priority": 2,
            "models": ["claude-sonnet-4-20250514"],
            "keys": [{"token": "sk-ant-...", "alias": "claude_key"}],
        },
        {
            "name": "gemini",
            "client_type": "gemini",
            "priority": 3,
            "models": ["gemini-2.0-flash"],
            "keys": [{"token": "AIza...", "alias": "gemini_key"}],
        },
    ]
)

rotator = LLMRotator(
    config,
    clients={
        "openai": OpenAIClient(),
        "anthropic": AnthropicClient(),
        "gemini": GeminiClient(),
    },
)
```

### Streaming

```python
async for chunk in rotator.stream(
    messages=[{"role": "user", "content": "Tell me a story"}]
):
    print(chunk.delta, end="", flush=True)
```

Mid-stream errors are handled automatically: the rotator restarts the request from the beginning on the next available candidate.
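
The restart-on-failure behavior can be sketched with plain async generators. This is an illustration only: `stream_with_restart`, `flaky`, and `healthy` are hypothetical names, `ConnectionError` stands in for whatever errors the rotator treats as retryable, and the sketch makes no attempt to hide chunks that were already yielded before the failure.

```python
import asyncio
from collections.abc import AsyncIterator, Callable

# A candidate is modeled as a zero-argument factory that opens a fresh stream.
StreamFactory = Callable[[], AsyncIterator[str]]


async def stream_with_restart(candidates: list[StreamFactory]) -> AsyncIterator[str]:
    """Yield chunks from the first candidate that completes without error."""
    last_error: Exception | None = None
    for open_stream in candidates:
        try:
            async for chunk in open_stream():
                yield chunk
            return  # Stream finished cleanly; stop trying candidates.
        except ConnectionError as exc:
            # Assumption: a dropped connection is retryable on the next candidate.
            last_error = exc
    raise RuntimeError("all candidates failed") from last_error


async def flaky() -> AsyncIterator[str]:
    yield "Once"
    raise ConnectionError("stream dropped")


async def healthy() -> AsyncIterator[str]:
    yield "Once"
    yield " upon a time"


async def collect() -> list[str]:
    return [chunk async for chunk in stream_with_restart([flaky, healthy])]


chunks = asyncio.run(collect())
print(chunks)  # ['Once', 'Once', ' upon a time']
```

In this sketch the consumer sees the first chunk twice, once from the failed attempt and once from the restarted one, so a caller that renders output incrementally may want to reset its buffer when a restart happens.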

### Tool Calling

Unified OpenAI-compatible tool format across all providers:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = await rotator.complete(
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools,
)

if response.tool_calls:
    for tc in response.tool_calls:
        print(f"Call {tc.name}({tc.arguments})")
```

Tools are automatically translated to each provider's native format (Gemini `functionDeclarations`, Anthropic `tools` with `input_schema`).
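
As a rough picture of what that translation involves, the sketch below maps one OpenAI-style tool definition onto the Anthropic and Gemini request shapes. The helper names `to_anthropic_tool` and `to_gemini_tools` are hypothetical and not part of llm-rotator, and the JSON Schema is passed through unchanged, which a real translator may need to adjust per provider.

```python
from typing import Any

weather_tool: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def to_anthropic_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Anthropic expects a flat tool object with the schema under `input_schema`."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],
    }


def to_gemini_tools(tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Gemini groups function definitions under a `functionDeclarations` list."""
    declarations = [
        {
            "name": tool["function"]["name"],
            "description": tool["function"].get("description", ""),
            "parameters": tool["function"]["parameters"],
        }
        for tool in tools
    ]
    return [{"functionDeclarations": declarations}]


print(to_anthropic_tool(weather_tool)["input_schema"]["required"])  # ['city']
print(to_gemini_tools([weather_tool])[0]["functionDeclarations"][0]["name"])  # get_weather
```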

### Structured Output

```python
# JSON mode
response = await rotator.complete(
    messages=[{"role": "user", "content": "Return a JSON with name and age"}],
    response_format={"type": "json_object"},
)
```

### Tier-Based Routing

Control model quality per request using `RoutingContext`:

```python
from llm_rotator import RoutingContext

# Simple task — use only economy models (tier >= 3)
result = await rotator.complete(
    messages=[{"role": "user", "content": "Classify this text"}],
    routing=RoutingContext(tier=3),
)

# Complex task — full access starting from flagship (tier >= 1, default)
result = await rotator.complete(
    messages=[{"role": "user", "content": "Write a detailed analysis"}],
    routing=RoutingContext(tier=1),
)

# Restrict to specific providers
result = await rotator.complete(
    messages=messages,
    routing=RoutingContext(allowed_providers=["gemini"]),
)
```
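
The tier acts as a quality ceiling: a request with `tier=3` only considers groups whose tier is 3 or higher, while `tier=1` considers everything. A minimal sketch of that filter, using a hypothetical `ModelGroup` dataclass and `eligible_groups` helper rather than the library's own types:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelGroup:
    name: str
    tier: int  # 1 = flagship; larger numbers = cheaper models


def eligible_groups(groups: list[ModelGroup], tier: int = 1) -> list[ModelGroup]:
    """Keep groups at or below the requested quality ceiling, best first."""
    return sorted((g for g in groups if g.tier >= tier), key=lambda g: g.tier)


groups = [ModelGroup("mini", 3), ModelGroup("flagship", 1)]
print([g.name for g in eligible_groups(groups, tier=3)])  # ['mini']
print([g.name for g in eligible_groups(groups, tier=1)])  # ['flagship', 'mini']
```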

### LangChain Integration

Drop-in replacement for any LangChain `BaseChatModel`:

```python
from llm_rotator import RotatorChatModel, RoutingContext
from langchain_core.messages import HumanMessage

model = RotatorChatModel(rotator=rotator)

# Use directly
response = await model.ainvoke([HumanMessage(content="Hello")])

# Use in chains
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = ChatPromptTemplate.from_messages([
    ("system", "Answer concisely."),
    ("user", "{question}"),
]) | model | StrOutputParser()

result = await chain.ainvoke({"question": "Capital of France?"})

# With tools
bound = model.bind_tools([...])
result = await bound.ainvoke([HumanMessage(content="Search for Python")])

# With routing context
result = await model.ainvoke(
    [HumanMessage(content="Simple task")],
    routing=RoutingContext(tier=3),
)
```

### Structured Logging

Full routing chain in one log line:

```
[req:abc123] [OpenAI/flagship] gpt-4o (main_key) → 429 RateLimit → gpt-4o (backup_key) → 200 OK (usage: 150 tokens)
[req:def456] [OpenAI/flagship] gpt-4o (key1) → 500 ServerError → [Gemini/_default] gemini-flash (google_key) → 200 OK (usage: 120 tokens)
```

Pass a custom logger:

```python
import logging

my_logger = logging.getLogger("my_app.llm")
rotator = LLMRotator(config, clients=clients, logger=my_logger)
```

### Quota Warnings

Get notified when usage approaches the limit:

```python
from llm_rotator.quota import QuotaWarning

async def on_warning(w: QuotaWarning):
    print(f"Quota {w.scope}: {w.percentage:.0%} ({w.current}/{w.limit})")

rotator = LLMRotator(
    config,
    clients=clients,
    on_quota_warning=on_warning,
    warning_threshold=0.8,  # default: 80%
)
```

### Lifecycle Hooks

Add custom logic without modifying the rotator:

```python
class BudgetHook:
    async def before_request(self, ctx, candidate):
        """Return False to skip a candidate."""
        if ctx.tags.get("user_tier") == "free" and candidate.model_group == "flagship":
            return False
        return True

    async def after_response(self, ctx, candidate, usage):
        """Called after a successful response."""
        print(f"Used {usage.total_tokens} tokens on {candidate.model}")

    async def on_fallback(self, ctx, from_candidate, to_candidate, error):
        """Called when switching to the next candidate."""
        print(f"Fallback from {from_candidate.model}: {error}")

rotator.add_hook(BudgetHook())

result = await rotator.complete(
    messages=messages,
    routing=RoutingContext(tags={"user_tier": "free"}),
)
```

### Redis Backend

For multi-instance deployments:

```python
from llm_rotator import RedisBackend

backend = await RedisBackend.from_url("redis://localhost:6379/0")
rotator = LLMRotator(config, clients=clients, backend=backend)
```

### OpenAI-Compatible Providers

Use any OpenAI-compatible API by setting `base_url`:

```python
config = RotatorConfig(
    providers=[
        {
            "name": "openrouter",
            "client_type": "openai",
            "priority": 1,
            "base_url": "https://openrouter.ai/api/v1",
            "models": ["meta-llama/llama-3.3-70b-instruct:free"],
            "keys": [{"token": "sk-or-...", "alias": "openrouter"}],
        },
        {
            "name": "groq",
            "client_type": "openai",
            "priority": 2,
            "base_url": "https://api.groq.com/openai/v1",
            "models": ["llama-3.3-70b-versatile"],
            "keys": [{"token": "gsk_...", "alias": "groq"}],
        },
    ]
)
```

## How Rotation Works

1. Providers are tried in **priority order** (lowest number first)
2. Within each provider, model groups are sorted by **tier** (best quality first)
3. For each model, **all keys** are tried before downgrading (Model-First)
4. **Circuit breaker** blocks failed key+model combinations with TTL
5. **Quota manager** skips exhausted candidates without making HTTP requests
6. **Hooks** can filter candidates based on custom business logic

## License

MIT
