Metadata-Version: 2.4
Name: ai-preflight
Version: 0.3.0
Summary: Safe, budget-aware multi-model agent experiments via OpenRouter
Project-URL: Homepage, https://github.com/seadotdev/agent-preflight
Project-URL: Repository, https://github.com/seadotdev/agent-preflight
Project-URL: Issues, https://github.com/seadotdev/agent-preflight/issues
License: MIT
License-File: LICENSE
Keywords: agents,budget,docker,llm,openrouter,preflight
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27
Provides-Extra: docker
Description-Content-Type: text/markdown

# agent-preflight

Safe, budget-aware multi-model agent experiments via OpenRouter.

When you're running a field of LLMs on an autonomous task — each in its own Docker container, each with its own API key and spend cap — you need: preflight validation, per-model budget enforcement, eject policies, event logging, and a status dashboard. That's what this library provides.

## Background

This pattern emerged from two independent experiments:

1. **Loanville2** — an LLM lending benchmark where models autonomously underwrite loan applications through a full Loan Origination System REST API. Each model gets a provisioned sub-key and budget cap; a preflight check validates balance, model availability, and provisioning before the run starts.

2. **pi-docker** — a game strategy experiment where models compete to improve a Kurve AI in Docker containers. Budget awareness was injected as a file every 30 seconds; eject policies killed models that spent 25%+ of their budget without running a single test.

Both converged on the same infrastructure. This library extracts it.

## Install

```bash
pip install ai-preflight
```

Requires Python 3.11+. Docker is used via the CLI (no Docker SDK dependency).
The package ships with `py.typed` for mypy/pyright users.

## Quickstart

```python
from agent_preflight import preflight, provision_key, BudgetTracker

# 1. Validate before spending anything
preflight(
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    models=["meta-llama/llama-3.3-70b-instruct"],
    budget_per_model=0.25,
)

# 2. Provision a capped sub-key
key = provision_key(admin_key, label="my-run", limit_usd=0.25)

# Optional: prefer {} on model-list fetch errors
from agent_preflight import get_available_models_or_empty
models = get_available_models_or_empty("sk-or-...")

# 3. Track spend inside your agent loop
tracker = BudgetTracker(api_key=key, limit_usd=0.25, mode="detailed")
status = tracker.poll()
print(status)  # Budget: $0.25 remaining of $0.25 (100% left)
```

## Prompt caching

Prompt caching is built into the runner, cache report, and `PromptBuilder`.

- Anthropic defaults to **explicit** per-block `cache_control` because that is the most reliable OpenRouter behavior we have observed.
- OpenAI-style providers benefit from a stable, append-only prompt prefix.
- The detailed guide, behavior notes, and practical best practices live in [CACHE.md](./CACHE.md).

## Full Docker harness

```python
from agent_preflight import run_multi, BudgetFractionEject, IdleTimeoutEject, CompositeEject

results = run_multi(
    models=[
        ("meta-llama/llama-3.3-70b-instruct", "llama"),
        ("google/gemini-flash-1.5", "gemini"),
    ],
    image_name="my-agent:latest",
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    budget_usd=0.25,
    mounts=[
        ("/path/to/TASK.md", "/workspace/TASK.md", "ro"),
        "/path/to/cache:/workspace/cache:delegated",
    ],
    eject_policy=CompositeEject([
        BudgetFractionEject(check_after_pct=0.25,
                            progress_fn=lambda ctx: ctx["tool_calls"] > 0),
        IdleTimeoutEject(timeout_s=180),
    ]),
    budget_mode="detailed",
)
```

Each container:
- Gets a provisioned sub-key with a hard spend cap
- Receives `budget.txt` in its workspace every 30s (so the agent can see its balance)
- Is killed if an eject policy triggers

## Status dashboard

```bash
# Watch live run progress
watch -n 10 python -m agent_preflight status

# Or a specific run
python -m agent_preflight status run-20260312-120000
```

Output:
```
  Run: run-20260312-120000
  ============================================================
  alias           status      events  tools  snaps  cost  time  note
  ──────────────────────────────────────────────────────────────────
  llama           DONE            87     42      5  $0.18  4m22s
  gemini          running         34     18      2  $0.07  1m45s*  $0.18 remaining...
```

## Preflight checks

`preflight()` runs up to 7 checks before your experiment starts:

| Check | What it validates |
|-------|-------------------|
| `balance` | Account balance > $0.10 |
| `math` | Balance covers `n_models × budget_per_model` |
| `models` | Each model exists on OpenRouter (with pricing) |
| `tool_support` | Models in `required_params` / `common_required_params` support those request params |
| `key_smoke` | Provisions a $0.01 test key and makes a real API call |
| `cache_smoke` | Sends two Anthropic cacheable requests to verify prompt caching |
| `docker_image` | Docker image exists locally (if `docker_image=` set) |

```python
preflight(
    admin_key=..., or_key=...,
    models=[...], budget_per_model=0.25,
    checks=["balance", "math", "models", "tool_support"],
    common_required_params=["tools"],
    required_params={"anthropic/claude-sonnet-4": ["tools"]},
    custom_checks=[my_health_check],  # zero-arg or context-aware
)
```

```python
from agent_preflight import run_single

result = run_single(
    model_id="z-ai/glm-5",
    extra_models=[
        "google/gemini-2.5-flash",
        "anthropic/claude-sonnet-4",
    ],
    image_name="my-agent:latest",
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    budget_usd=0.25,
)
```

Context-aware custom checks receive a dict with already-fetched data:

```python
def check_tool_support(ctx):
    model = "anthropic/claude-sonnet-4"
    supported = set((ctx["available_models"] or {}).get(model, {}).get("supported_parameters", []))
    missing = {"tools"} - supported
    return (
        "Custom tool support",
        not missing,
        "ok" if not missing else f"missing: {', '.join(sorted(missing))}",
    )
```

## Eject policies

Built-in policies:

| Class | Triggers when |
|-------|---------------|
| `BudgetExceededEject` | Measured spend ≥ limit |
| `IdleTimeoutEject(300)` | No events for N seconds |
| `BudgetFractionEject(0.20, progress_fn)` | progress_fn returns False at N% budget spent |
| `CompositeEject([...])` | Any sub-policy triggers |
| `default_eject_policies()` | BudgetExceededEject + IdleTimeoutEject(300s) |

Custom policy:

```python
from agent_preflight import EjectPolicy, EjectDecision

class MyEject(EjectPolicy):
    def check(self, *, budget_status, events, context, elapsed_s, **_):
        if context.get("output_written"):
            return EjectDecision(should_eject=False)
        if budget_status.pct_spent > 0.5:
            return EjectDecision(should_eject=True, reason="no output after 50% budget")
        return EjectDecision(should_eject=False)
```

## Budget injection

Agents that read `/workspace/output/budget.txt` get real-time cost visibility:

**Simple mode:**
```
Budget: $0.086 remaining of $0.15 (57% left)
```

**Detailed mode:**
```
Budget: $0.086 remaining of $0.15 (57% left)

Recent costs:
  - $0.0012  (bash)
  - $0.0031  (read_file)
  - $0.0008  (bash)

Total actions: 14  |  Avg cost/action: $0.0010
Estimated actions remaining: ~86
```

## Agent contract

Your Docker image needs to:
1. Accept `OPENROUTER_API_KEY` as an env var
2. Optionally read `/workspace/output/budget.txt` for cost awareness
3. Write output files to `/workspace/output/`
4. (For pi-style RPC mode) Accept a JSON prompt on stdin

Beyond that, the harness is framework-agnostic. Works with [pi-coding-agent](https://github.com/anthropics/pi), Claude Code, custom agents, or any container that calls OpenRouter.

## Environment variables for the CLI

```bash
OR_ADMIN_KEY=sk-or-admin-...
OPENROUTER_API_KEY=sk-or-...
PREFLIGHT_MODELS=meta-llama/llama-3.3-70b-instruct,google/gemini-flash-1.5
PREFLIGHT_BUDGET=0.25

python -m agent_preflight preflight
```

## License

MIT
