Metadata-Version: 2.4
Name: covenance
Version: 0.0.7
Summary: Online LLM clients for OpenAI, Google Gemini, Mistral, Anthropic Claude, and OpenRouter
Author: Ilya Kamen
License: MIT
License-File: LICENSE
Keywords: anthropic,gemini,llm,mistral,openai,openrouter
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Requires-Dist: httpx
Requires-Dist: pydantic
Requires-Dist: pydantic-ai
Requires-Dist: python-dotenv
Requires-Dist: tenacity
Provides-Extra: dev
Requires-Dist: build; extra == 'dev'
Requires-Dist: pre-commit; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest-rerunfailures; extra == 'dev'
Requires-Dist: pytest-xdist; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Description-Content-Type: text/markdown

# covenance

[![PyPI version](https://img.shields.io/pypi/v/covenance)](https://pypi.org/project/covenance/)
[![Tests](https://github.com/ikamensh/covenance/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/ikamensh/covenance/actions/workflows/test.yml)
[![codecov](https://codecov.io/gh/ikamensh/covenance/branch/main/graph/badge.svg)](https://codecov.io/gh/ikamensh/covenance)

Type-safe LLM outputs across any provider. Track every call and its cost.

```python
from covenance import ask_llm

review = ask_llm("Write a short review of Inception", model="gpt-4.1-nano")
is_positive = ask_llm(
    f"Is this review positive? '{review}'",
    model="gemini-2.5-flash-lite",
    response_type=bool,
)
print(is_positive)  # True/False
```

## Use cases

- **Structured outputs that work** - Same code, any provider. Pydantic models, primitives, lists, tuples.
- **Zero routing code** - The model name determines the provider automatically (`gemini-*`, `claude-*`, `gpt-*`).
- **Convenience** - Calls that hit a TPM (tokens per minute) limit are retried automatically, and so are calls where the LLM fails to return the type you requested.
- **Visibility: Know what you're calling and spending** - Every call is logged with token counts and cost. `print_usage()` for totals, `print_call_timeline()` for a visual waterfall.

## Installation

Install only the providers you need:

```bash
# Quotes keep shells such as zsh from treating the brackets as a glob
pip install "covenance[openai]"      # OpenAI, Grok, OpenRouter
pip install "covenance[anthropic]"   # Anthropic Claude
pip install "covenance[google]"      # Google Gemini
pip install "covenance[mistral]"     # Mistral

# Multiple providers
pip install "covenance[openai,anthropic]"

# All providers
pip install "covenance[all]"
```

## Structured outputs

Pass `response_type` to get validated, typed results:

```python
from covenance import ask_llm
from pydantic import BaseModel

# Pydantic models
class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

result = ask_llm("Is 2+2=5?", model="gemini-2.5-flash-lite", response_type=Evaluation)
print(result.reasoning)  # "2+2 equals 4, not 5"
print(result.is_correct)  # False

# Primitives
answer = ask_llm("Is Python interpreted?", model="gpt-4.1-nano", response_type=bool)
print(answer)  # True

# Collections
items = ask_llm("List 3 prime numbers", model="claude-sonnet-4-20250514", response_type=list[int])
print(items)  # [2, 3, 5]
```

Works identically across OpenAI, Gemini, Anthropic, Mistral, Grok, and OpenRouter.

## Cost tracking

Every call is recorded with token counts and cost:

```python
from covenance import ask_llm, print_usage, print_call_timeline, get_records

ask_llm("Hello", model="gpt-4.1-nano")
ask_llm("Hello", model="gemini-2.5-flash-lite")

print_usage()
# ==================================================
# LLM Usage Summary (default client)
# ==================================================
#   Calls: 2
#   Tokens: 45 (In: 12, Out: 33)
#   Cost: $0.0001
#   Models: gemini/gemini-2.5-flash-lite, openai/gpt-4.1-nano

# Access individual records
for record in get_records():
    print(f"{record.model}: {record.cost_usd}")
```

Persist records by setting `COVENANCE_RECORDS_DIR` or calling `set_llm_call_records_dir()`.
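
For per-model totals, the records can be grouped by hand. The sketch below assumes each record exposes `model` and `cost_usd`, as in the loop above. `FakeRecord` and `cost_by_model` are hypothetical names, and the costs are made up so the snippet runs without API calls.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeRecord:
    """Hypothetical stand-in for a covenance call record."""

    model: str
    cost_usd: float


def cost_by_model(records) -> dict[str, float]:
    """Sum cost_usd per model name."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        # Assumption: a missing cost is counted as zero.
        totals[record.model] += record.cost_usd or 0.0
    return dict(totals)


# Made-up sample data, not real prices.
records = [
    FakeRecord("gpt-4.1-nano", 0.25),
    FakeRecord("gemini-2.5-flash-lite", 0.5),
    FakeRecord("gpt-4.1-nano", 0.25),
]
print(cost_by_model(records))
```

With real data, pass the result of `get_records()` instead of the sample list.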

## Call timeline

Visualize call sequences and parallelism in your terminal:

```python
from covenance import print_call_timeline

print_call_timeline()
# LLM Call Timeline (4.4s total, 5 calls)
#                         |0s                                            4.4s|
#   gpt-4.1-nano    1.3s  |████████████████                                  |
#   g2.5-flash-l    1.1s  |                 ████████████                     |
#   g2.5-flash-l    1.1s  |                 ████████████                     |
#   g2.5-flash-l    1.5s  |                 ████████████████                 |
#   g2.5-flash-l    1.5s  |                                 █████████████████|
```

Each line is a call, sorted by start time. Blocks show when each call was active - parallel calls appear as overlapping bars on different rows.
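
To make the bar layout concrete, here is a minimal sketch of how one row of such a waterfall could be drawn from a call's start and end time. It illustrates the idea only and is not covenance's rendering code. `render_bar` and the sample timings are hypothetical.

```python
def render_bar(start: float, end: float, total: float, width: int = 50) -> str:
    """Return a fixed-width row with blocks covering the span [start, end]."""
    # Scale both timestamps onto the character grid.
    first = min(int(start / total * width), width - 1)
    # Always draw at least one block so very short calls stay visible.
    last = min(max(first + 1, round(end / total * width)), width)
    return "|" + " " * first + "█" * (last - first) + " " * (width - last) + "|"


# Made-up timings: (label, start seconds, end seconds).
calls = [("call-a", 0.0, 1.3), ("call-b", 1.4, 2.5), ("call-c", 1.4, 2.9)]
total = max(end for _, _, end in calls)
for label, start, end in sorted(calls, key=lambda c: c[1]):
    print(f"{label:<8} {end - start:4.1f}s  {render_bar(start, end, total)}")
```

Rows whose time ranges intersect get blocks in the same columns, which is what makes parallel calls visible.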

## Consensus for quality

Run parallel LLM calls and integrate results for higher quality:

```python
from covenance import llm_consensus
from pydantic import BaseModel

class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

result = llm_consensus(
    "Explain quantum entanglement",
    model="gpt-4.1-nano",
    response_type=Evaluation,
    num_candidates=3,  # 3 parallel calls + integration
)
```
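
The "parallel calls + integration" pattern can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation. `consensus_sketch`, `fake_ask`, and `fake_integrate` are hypothetical, and the stubs stand in for real LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def consensus_sketch(
    ask: Callable[[str], str],
    integrate: Callable[[str, list[str]], str],
    prompt: str,
    num_candidates: int = 3,
) -> str:
    """Collect candidates in parallel, then merge them in one extra step."""
    with ThreadPoolExecutor(max_workers=num_candidates) as pool:
        candidates = list(pool.map(ask, [prompt] * num_candidates))
    return integrate(prompt, candidates)


def fake_ask(prompt: str) -> str:
    # Stub for a single LLM call.
    return f"draft for: {prompt}"


def fake_integrate(prompt: str, candidates: list[str]) -> str:
    # Stub for the integration step; a real one might be another LLM call.
    return f"{len(candidates)} candidates merged"


result = consensus_sketch(fake_ask, fake_integrate, "Explain quantum entanglement")
print(result)  # 3 candidates merged
```

Threads suit this pattern because each call spends most of its time waiting on the network.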

## Supported providers

Provider is determined by model name prefix:

| Prefix | Provider |
|--------|----------|
| `gpt-*`, `o1-*`, `o3-*` | OpenAI |
| `gemini-*` | Google Gemini |
| `claude-*` | Anthropic |
| `mistral-*`, `codestral-*` | Mistral |
| `grok-*` | xAI Grok |
| `org/model` (contains `/`) | OpenRouter |
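
The table can be read as a simple lookup. The sketch below is a hypothetical illustration of prefix-based routing, not covenance's actual code. The function name, the provider labels, and the choice to check for `/` first are all assumptions.

```python
# Prefixes copied from the table above; the provider labels are illustrative.
PREFIX_TO_PROVIDER = {
    "gpt-": "openai",
    "o1-": "openai",
    "o3-": "openai",
    "gemini-": "gemini",
    "claude-": "anthropic",
    "mistral-": "mistral",
    "codestral-": "mistral",
    "grok-": "grok",
}


def guess_provider(model: str) -> str:
    """Map a model name to a provider using the prefix table."""
    # Assumption: names containing "/" are checked first and go to OpenRouter.
    if "/" in model:
        return "openrouter"
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Cannot determine provider for model {model!r}")


print(guess_provider("gpt-4.1-nano"))
print(guess_provider("claude-sonnet-4-20250514"))
print(guess_provider("org/model"))
```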

### Structured output reliability

Providers differ in how they enforce JSON schema compliance:

| Provider | Method | Guarantee |
|----------|--------|-----------|
| OpenAI | [Constrained decoding](https://openai.com/index/introducing-structured-outputs-in-the-api) | 100% schema-valid JSON |
| Google Gemini | [Controlled generation](https://ai.google.dev/gemini-api/docs/structured-output) | 100% schema-valid JSON |
| Grok | [Constrained decoding](https://docs.x.ai/docs/guides/structured-outputs) | 100% schema-valid JSON |
| Anthropic | [Structured outputs beta](https://docs.anthropic.com/en/docs/build-with-claude/structured-outputs) | 100% schema-valid JSON* |
| Mistral | [Best-effort](https://docs.mistral.ai/capabilities/structured_output) | Probabilistic |
| OpenRouter | Varies | Depends on underlying model |

*Anthropic structured outputs require the Anthropic SDK >= 0.74.1 (uses `anthropic-beta: structured-outputs-2025-11-13`). Mistral's generation is probabilistic, so Covenance automatically retries (up to 3 times) on JSON parse errors for Mistral.
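
The retry-on-parse-error idea can be sketched as follows. This minimal example uses only the standard library and a stubbed generator. `parse_with_retries` is a hypothetical helper, and it assumes "up to 3 times" means three attempts in total.

```python
import json
from typing import Callable

MAX_ATTEMPTS = 3  # assumption: three attempts in total


def parse_with_retries(generate: Callable[[], str], max_attempts: int = MAX_ATTEMPTS) -> dict:
    """Call `generate` until it returns valid JSON or the attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        raw = generate()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # malformed output: ask again
    raise ValueError(f"No valid JSON after {max_attempts} attempts") from last_error


# Stand-in for a model that returns truncated JSON once, then succeeds.
replies = iter(['{"is_correct": fals', '{"is_correct": false}'])
result = parse_with_retries(lambda: next(replies))
print(result)  # {'is_correct': False}
```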

## API keys

Set environment variables for the providers you use:

- `OPENAI_API_KEY`
- `GOOGLE_API_KEY` (or `GEMINI_API_KEY`)
- `ANTHROPIC_API_KEY`
- `MISTRAL_API_KEY`
- `OPENROUTER_API_KEY`
- `XAI_API_KEY` (for Grok)

A `.env` file in the working directory is loaded automatically.
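
A quick way to check which providers are configured before making calls is sketched below. The variable names come from the list above. `PROVIDER_KEYS` and `configured_providers` are hypothetical helpers and not part of covenance.

```python
import os
from typing import Mapping

# Environment variable names from the list above, grouped per provider.
PROVIDER_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GOOGLE_API_KEY", "GEMINI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "mistral": ["MISTRAL_API_KEY"],
    "openrouter": ["OPENROUTER_API_KEY"],
    "grok": ["XAI_API_KEY"],
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return providers that have at least one non-empty key set."""
    return [
        provider
        for provider, names in PROVIDER_KEYS.items()
        if any(env.get(name) for name in names)
    ]


print(configured_providers())
```

This reads only the process environment. Values that live only in a `.env` file appear once that file has been loaded.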

## Isolated clients

Use `Covenance` instances for separate API keys and call records per subsystem:

```python
from covenance import Covenance
from pydantic import BaseModel

# Each client tracks its own usage
question_client = Covenance(label="questions")
review_client = Covenance(label="review")

answer = question_client.ask_llm("Who is David Blaine?", model="gpt-4.1-nano")

class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

evaluation = review_client.llm_consensus(  # avoid shadowing the built-in eval()
    f"Is this accurate? '''{answer}'''",
    model="gemini-2.5-flash-lite",
    response_type=Evaluation,
)

question_client.print_usage()  # Shows only the question call
review_client.print_usage()    # Shows only the review call
```

## How it works: dual backend

Covenance uses two backends for structured output and picks the better one per provider:

- **Native SDK** — calls the provider's API directly (e.g., OpenAI [Responses API](https://platform.openai.com/docs/api-reference/responses) with `responses.parse`)
- **pydantic-ai** — uses [pydantic-ai](https://github.com/pydantic/pydantic-ai) as a unified layer

The default routing:

| Provider | Backend | Why |
|----------|---------|-----|
| OpenAI | Native | Responses API with constrained decoding handles enums, recursive types, and large schemas more reliably |
| Grok | Native | OpenAI-compatible API, same benefits |
| Gemini | pydantic-ai | Native SDK hits `RecursionError` on self-referencing types (e.g., tree nodes) |
| Anthropic | pydantic-ai | No native client implemented |
| Mistral | pydantic-ai | Similar pass rates; pydantic-ai handles recursive types better |
| OpenRouter | pydantic-ai | No native client implemented |

These defaults are based on a [stress test suite](scripts/stress/) that runs 14 test categories across providers with both backends. The results for the cheapest model per provider:

```
OpenAI  (gpt-4.1-nano):          native 14/14, pydantic-ai 10/14
Gemini  (gemini-2.5-flash-lite): native 11/14, pydantic-ai 13/14
Mistral (mistral-small-latest):  native  9/14, pydantic-ai  8/14
```

Where native beats pydantic-ai on OpenAI: enum adherence (strict values vs. hallucinated ones), recursive types (deeper trees), real-world schemas (fewer empty fields), and extreme schema limits (100+ fields with Literal types).

Where pydantic-ai beats native on Gemini: recursive/self-referencing types (native Google SDK crashes with `RecursionError`).

### Overriding the backend

Each `Covenance` instance has a `backends` object with a field per provider. You can inspect and override them:

```python
from covenance import Covenance

client = Covenance()
print(client.backends)
# Backends(native=[openai, grok], pydantic=[gemini, anthropic, mistral, openrouter])

# Override a specific provider
client.backends.gemini = "native"

# Force all providers to one backend (useful for benchmarking)
client.backends.set_all("native")
```

Only `"native"` and `"pydantic"` are accepted — anything else raises `ValueError`.

Every call records which backend was used:

```python
for record in client.get_records():
    print(f"{record.model}: {record.backend}")  # "native" or "pydantic"
```

The backend also shows in `print_call_timeline()` as `(N)` or `(P)`:

```python
print_call_timeline()
# LLM Call Timeline (2.1s total, 2 calls)
#                            |0s                                       2.1s|
#   gpt-4.1-nano(N)    0.8s  |█████████████████                            |
#   g2.5-flash-l(P)    1.1s  |                  ██████████████████████████  |
```

To see routing decisions in real time, enable debug logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
# DEBUG:covenance:ask_llm: model=gpt-4.1-nano provider=openai backend=native
```
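
`basicConfig(level=logging.DEBUG)` also turns on debug output from every other library. A narrower variant is sketched below. It assumes the logger is named `covenance`, as the sample output line above suggests.

```python
import logging

# Keep the root logger at WARNING so other libraries stay quiet.
logging.basicConfig(level=logging.WARNING)

# Assumption: covenance logs under the "covenance" logger name.
logging.getLogger("covenance").setLevel(logging.DEBUG)
```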
