Metadata-Version: 2.4
Name: costwise-mcp
Version: 0.2.0
Summary: CostWise Managed Cost Policy SDK — automatically reduce AI costs without changing your workflow
Author-email: CostWise <sdk@cost-wise.dev>
License: MIT
Project-URL: Homepage, https://cost-wise.dev
Project-URL: Documentation, https://cost-wise.dev/docs/mcp-sdk
Project-URL: Repository, https://github.com/costwise/mcp-sdk
Keywords: ai,cost-optimization,llm,finops,openai,anthropic
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: tiktoken>=0.5.0
Provides-Extra: telemetry
Requires-Dist: httpx>=0.25.0; extra == "telemetry"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21; extra == "dev"

# CostWise MCP SDK

**Automatically reduce AI costs without changing your workflow.**

The CostWise Managed Cost Policy (MCP) SDK analyzes your LLM requests, classifies task complexity, and recommends cheaper models and token limits — saving up to 90% on AI costs.
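
To make the idea concrete, here is a minimal sketch of complexity-based token limits. The keyword heuristic, the word-count thresholds, and the `classify` and `token_limit` names are assumptions for illustration and are not the SDK's actual classifier. Only the three limits (256, 1024, 4096) come from this README, where they appear as the defaults under Custom Configuration.

```python
from enum import Enum


class TaskComplexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"


# Default limits as quoted in the Custom Configuration section.
DEFAULT_MAX_TOKENS = {
    TaskComplexity.SIMPLE: 256,
    TaskComplexity.MEDIUM: 1024,
    TaskComplexity.COMPLEX: 4096,
}

# Hypothetical keyword hints; the real classifier is not documented here.
COMPLEX_HINTS = ("analyze", "architecture", "debug", "prove")


def classify(prompt: str) -> TaskComplexity:
    """Toy heuristic: keyword hints first, then prompt length in words."""
    text = prompt.lower()
    words = len(text.split())
    if any(hint in text for hint in COMPLEX_HINTS) or words > 200:
        return TaskComplexity.COMPLEX
    if words > 30:
        return TaskComplexity.MEDIUM
    return TaskComplexity.SIMPLE


def token_limit(prompt: str) -> int:
    """Map the classified complexity to its default token limit."""
    return DEFAULT_MAX_TOKENS[classify(prompt)]


print(token_limit("Translate 'hello' to French"))  # 256
```

A short prompt with no hint words falls into the simple bucket, which matches the `max_tokens` value of 256 shown in the Quick Start.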

## Install

```bash
pip install costwise-mcp
```

## Quick Start

```python
from costwise_mcp import CostPolicy

policy = CostPolicy(
    api_key="cw_your_api_key",
    backend_url="https://app.cost-wise.dev",  # your CostWise instance
)

decision = policy.optimize(
    prompt="Translate 'hello' to French",
    model="gpt-5.4",
)

print(decision.recommended_model)  # "gpt-5.4-mini" (70% cheaper)
print(decision.max_tokens)         # 256
print(decision.estimated_cost)     # 0.001154 (USD)

# Call your LLM with the optimized params
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model=decision.recommended_model,
    max_tokens=decision.max_tokens,
    messages=[{"role": "user", "content": "Translate 'hello' to French"}],
)

# Report actual usage (async, non-blocking, optional)
policy.report(decision, actual_tokens=response.usage.total_tokens)
```

## Team Policies (AI Governance)

Enforce model access and budgets per team:

```python
from costwise_mcp import CostPolicy

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

# Interns: restricted to budget models (tier 3), max $50/month
decision = policy.optimize(
    prompt="Write a summary",
    model="gpt-5.4",      # they asked for premium
    team="interns",        # policy enforces tier 3
)
# decision.recommended_model = "gpt-5.4-nano"
# decision.message = "Team 'interns' restricted to tier 3 models"

# Engineers: standard tier, $500/month budget
decision = policy.optimize(
    prompt="Debug this code",
    model="gpt-5.4",
    team="engineers",      # policy enforces tier 2
)
# decision.recommended_model = "gpt-5.4-mini"

# AI Research: full access, no budget limit
decision = policy.optimize(
    prompt="Analyze this architecture...",
    model="gpt-5.4",
    team="ai-team",        # tier 1 = all models allowed
)
# decision.recommended_model = "gpt-5.4" (kept for complex tasks)
```

### Team tier levels

| Tier | Access | Example teams |
|------|--------|---------------|
| 1 — Premium | All models (gpt-5.4, claude-opus-4-6, etc.) | AI Research, CTO |
| 2 — Standard | Mid-tier models (gpt-5.4-mini, claude-sonnet-4-6) | Engineers, Data Science |
| 3 — Budget | Cheapest models only (gpt-5.4-nano, claude-haiku-4-5) | Interns, Support, QA |

Configure team policies in CostWise: **Settings > MCP Teams**.
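
As a rough illustration of how tier enforcement could work, the sketch below maps a requested model to the model a team is allowed to use. The model ladder is taken from the table above (OpenAI family only); the dictionary layout, the `enforce_tier` name, and the rule that cheaper requests pass through unchanged are assumptions, not documented SDK behavior.

```python
# Tier ladder from the table above (OpenAI family only, for brevity).
# Lower tier numbers mean more capable and more expensive models.
MODEL_BY_TIER = {1: "gpt-5.4", 2: "gpt-5.4-mini", 3: "gpt-5.4-nano"}
TIER_BY_MODEL = {model: tier for tier, model in MODEL_BY_TIER.items()}


def enforce_tier(requested_model: str, team_tier: int) -> str:
    """Return the requested model, or the team's tier model if the request is above it."""
    requested_tier = TIER_BY_MODEL[requested_model]
    if requested_tier < team_tier:
        # The request is more premium than the team allows: downgrade it.
        return MODEL_BY_TIER[team_tier]
    return requested_model


print(enforce_tier("gpt-5.4", team_tier=3))  # gpt-5.4-nano
print(enforce_tier("gpt-5.4", team_tier=1))  # gpt-5.4
```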

## Error Handling

```python
from costwise_mcp import (
    CostPolicy,
    BudgetExceededError,
    ModelBlockedError,
    TierRestrictionError,
    PolicyViolationError,
)

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

decision = policy.optimize(
    prompt="Generate a report",
    model="gpt-5.4",
    team="interns",
)

if not decision.allowed:
    print(f"Blocked: {decision.message}")
    # "Team 'interns' monthly budget exceeded ($50.42 / $50.00)"
else:
    # Proceed with the LLM call
    print(f"Use {decision.recommended_model}, max {decision.max_tokens} tokens")
```

### Error types

| Error | When | Fields |
|-------|------|--------|
| `BudgetExceededError` | Team monthly/daily budget exceeded | `current_spend`, `budget_limit` |
| `ModelBlockedError` | Model is on the team's blocklist | `blocked_model`, `alternative` |
| `TierRestrictionError` | Model tier above team's limit | `model_tier`, `max_tier` |
| `PolicyViolationError` | Any team policy violation (base) | `team`, `reason` |
| `InvalidAPIKeyError` | API key invalid or expired | — |
| `RateLimitError` | Backend rate limit exceeded | `retry_after` |
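
This README does not state which calls raise these errors and which return a `Decision` with `allowed=False`, so the sketch below only illustrates the catch order. It uses local stand-in classes so that it runs without the SDK. The subclass relationship is inferred from the "(base)" label in the table, and the constructor signatures and the `describe` helper are assumptions.

```python
class PolicyViolationError(Exception):
    """Stand-in for the SDK's base policy error (fields: team, reason)."""

    def __init__(self, team: str, reason: str):
        super().__init__(reason)
        self.team = team
        self.reason = reason


class BudgetExceededError(PolicyViolationError):
    """Stand-in carrying the documented fields current_spend and budget_limit."""

    def __init__(self, team: str, current_spend: float, budget_limit: float):
        super().__init__(team, f"budget exceeded (${current_spend:.2f} / ${budget_limit:.2f})")
        self.current_spend = current_spend
        self.budget_limit = budget_limit


def describe(error: Exception) -> str:
    """Catch the most specific error first, then fall back to the base class."""
    try:
        raise error
    except BudgetExceededError as exc:
        overspend = exc.current_spend - exc.budget_limit
        return f"{exc.team}: over budget by ${overspend:.2f}"
    except PolicyViolationError as exc:
        return f"{exc.team}: {exc.reason}"


print(describe(BudgetExceededError("interns", 50.42, 50.00)))  # interns: over budget by $0.42
print(describe(PolicyViolationError("interns", "model not allowed")))  # interns: model not allowed
```

Listing the base class last matters: if `PolicyViolationError` came first, it would also match the budget error and hide the spend details.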

## Framework Integration

The SDK works with **any AI framework** — it runs before the LLM call, not instead of it.

### OpenAI SDK

```python
from costwise_mcp import CostPolicy
from openai import OpenAI

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")
client = OpenAI()

prompt = "Summarize this article"
decision = policy.optimize(prompt, model="gpt-5.4", team="engineers")
response = client.chat.completions.create(
    model=decision.recommended_model,
    max_tokens=decision.max_tokens,
    messages=[{"role": "user", "content": prompt}],
)
policy.report(decision, actual_tokens=response.usage.total_tokens)
```

### Anthropic SDK

```python
from costwise_mcp import CostPolicy
import anthropic

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")
client = anthropic.Anthropic()

prompt = "Review this system design"  # example prompt
decision = policy.optimize(prompt, model="claude-opus-4-6", team="ai-team")
message = client.messages.create(
    model=decision.recommended_model,
    max_tokens=decision.max_tokens,
    messages=[{"role": "user", "content": prompt}],
)
policy.report(decision, output_tokens=message.usage.output_tokens)
```

### LangChain

```python
from costwise_mcp import CostPolicy
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

prompt = "Summarize this article"  # example prompt
decision = policy.optimize(prompt, model="gpt-5.4", team="engineers")
llm = ChatOpenAI(model=decision.recommended_model, max_tokens=decision.max_tokens)
result = llm.invoke([HumanMessage(content=prompt)])
```

### LlamaIndex

```python
from costwise_mcp import CostPolicy
from llama_index.llms.openai import OpenAI

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

prompt = "Summarize this article"  # example prompt
decision = policy.optimize(prompt, model="gpt-5.4", team="engineers")
llm = OpenAI(model=decision.recommended_model, max_tokens=decision.max_tokens)
response = llm.complete(prompt)
```

### Google Gemini

```python
from costwise_mcp import CostPolicy
import google.generativeai as genai

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

prompt = "Summarize this article"  # example prompt
decision = policy.optimize(prompt, model="gemini-2.5-pro", team="engineers")
model = genai.GenerativeModel(decision.recommended_model)
response = model.generate_content(prompt)
```

## Custom Configuration

```python
from costwise_mcp import CostPolicy, PolicyConfig

policy = CostPolicy(
    api_key="cw_...",
    backend_url="https://app.cost-wise.dev",
    config=PolicyConfig(
        max_tokens_simple=128,      # override default 256
        max_tokens_medium=512,      # override default 1024
        max_tokens_complex=2048,    # override default 4096
        auto_downgrade=True,        # auto-select cheaper models for simple tasks
        blocked_models=["o1-pro"],  # local blocklist (in addition to team policy)
        telemetry_batch_size=100,   # send every 100 events (default: 50)
        telemetry_flush_interval=60,  # send every 60 seconds (default: 30)
    ),
    project_id="my-chatbot",
)
```
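
The two telemetry settings suggest a buffer that flushes when either a batch fills up or enough time has passed. The sketch below shows that pattern under stated assumptions: the `TelemetryBuffer` class, its `send` callback, and the injectable `clock` are hypothetical, and the SDK's internals may differ (for example, it may flush from a background thread, whereas this version only checks the timer when an event is added).

```python
import time
from typing import Callable, List


class TelemetryBuffer:
    """Buffers events and flushes on batch size or elapsed time (illustrative only)."""

    def __init__(
        self,
        send: Callable[[List[dict]], None],
        batch_size: int = 50,
        flush_interval: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._send = send
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._clock = clock
        self._events: List[dict] = []
        self._last_flush = clock()

    def add(self, event: dict) -> None:
        """Queue an event and flush if the batch is full or the interval has elapsed."""
        self._events.append(event)
        too_many = len(self._events) >= self._batch_size
        too_old = self._clock() - self._last_flush >= self._flush_interval
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Send any pending events as one batch and restart the timer."""
        if self._events:
            self._send(self._events)
            self._events = []
        self._last_flush = self._clock()


sent_batches = []
buffer = TelemetryBuffer(send=sent_batches.append, batch_size=3, flush_interval=60)
for i in range(7):
    buffer.add({"event": i})
print([len(batch) for batch in sent_batches])  # [3, 3]
```

The seventh event stays in the buffer until the next flush. A larger batch size or interval means fewer network calls, but more events are lost if the process exits before flushing.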

## Supported Models (75 models, 10 providers)

| Provider (total models) | Models (selection) |
|----------|--------|
| OpenAI (18) | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-4.1, gpt-4o, gpt-4o-mini, o1, o1-mini, o3, o3-mini, o3-pro, o4-mini |
| Anthropic (23) | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5, claude-opus-4-5, claude-sonnet-4-5, claude-3.5-sonnet, claude-3.5-haiku |
| Google (7) | gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash |
| Mistral (7) | mistral-large, mistral-small, open-mistral-nemo, codestral, pixtral-large |
| Meta/Llama (6) | llama-3.3-70b, llama-3.1-405b, llama-4-scout, llama-4-maverick |
| Cohere (4) | command-r-plus, command-r, command-light, command-a |
| xAI/Grok (3) | grok-3, grok-3-mini, grok-2 |
| AWS Nova (3) | nova-pro, nova-lite, nova-micro |
| DeepSeek (2) | deepseek-chat, deepseek-reasoner |
| AI21 (2) | jamba-1.5-large, jamba-1.5-mini |

## Privacy

- Prompts are **never** sent to the CostWise backend.
- Only metadata is sent: token counts, model name, cost, and team ID.
- Telemetry is optional. To disable it, pass `PolicyConfig(telemetry_enabled=False)`.
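
The points above can be sketched as a metadata-only record. The `UsageEvent` class, its field names, and the `build_event` helper are hypothetical and do not describe the SDK's wire format. The whitespace split is a stand-in for a real tokenizer such as tiktoken.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical telemetry record: metadata only, no prompt text."""

    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    team: str


def build_event(prompt: str, model: str, output_tokens: int, cost_usd: float, team: str) -> UsageEvent:
    # The prompt is used locally to count tokens and is then discarded.
    # A whitespace split stands in for a real tokenizer here.
    input_tokens = len(prompt.split())
    return UsageEvent(model, input_tokens, output_tokens, cost_usd, team)


event = build_event("Translate 'hello' to French", "gpt-5.4-mini", 12, 0.001154, "interns")
payload = json.dumps(asdict(event))
print(payload)
```

Because the record type has no field for prompt text, the serialized payload cannot leak it by accident.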

## API Reference

### `CostPolicy(api_key, backend_url, config, project_id)`

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `api_key` | str | Yes | CostWise API key (`cw_...`). Get one at Settings > MCP SDK. |
| `backend_url` | str | No | Your CostWise instance URL. Default: `https://app.cost-wise.dev` |
| `config` | PolicyConfig | No | Custom token limits, budgets, blocked models |
| `project_id` | str | No | Group analytics by project |

### `policy.optimize(prompt, model, task_type, max_budget, team) → Decision`

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `prompt` | str | Yes | Prompt text (for token estimation + complexity classification) |
| `model` | str | Yes | Model you intend to use (e.g. `"gpt-5.4"`) |
| `task_type` | TaskComplexity | No | Override auto-classification: `SIMPLE`, `MEDIUM`, `COMPLEX` |
| `max_budget` | float | No | Max cost in USD for this request |
| `team` | str | No | Team ID for policy enforcement (e.g. `"interns"`) |

### `policy.report(decision, actual_tokens, output_tokens, latency_ms)`

Report actual usage after the LLM call. The call is asynchronous and non-blocking, and reporting is optional.

### `Decision` object

| Field | Type | Description |
|-------|------|-------------|
| `recommended_model` | str | Model to use |
| `max_tokens` | int | Token limit |
| `estimated_cost` | float | Cost in USD |
| `original_model` | str | Originally requested model |
| `input_tokens` | int | Estimated input tokens |
| `complexity` | TaskComplexity | simple, medium, complex |
| `allowed` | bool | Whether request is within budget/policy |
| `savings_pct` | float | Percentage saved vs original |
| `estimated_savings` | float | USD saved |
| `message` | str | Human-readable recommendation |
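
The relationship between `estimated_cost`, `estimated_savings`, and `savings_pct` can be illustrated with a small worked calculation. The prices below are made-up placeholders, not real pricing data, and the worst-case cost formula and function names are assumptions about how such fields might be derived.

```python
# Hypothetical prices in USD per 1M tokens (input, output); not real price data.
PRICES = {
    "gpt-5.4": (10.00, 30.00),
    "gpt-5.4-mini": (3.00, 9.00),
}


def estimate_cost(model: str, input_tokens: int, max_tokens: int) -> float:
    """Worst-case cost: all input tokens plus a full max_tokens completion."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + max_tokens * output_price) / 1_000_000


def savings(original_model: str, recommended_model: str, input_tokens: int, max_tokens: int):
    """Return (estimated_savings in USD, savings_pct) for a model downgrade."""
    original = estimate_cost(original_model, input_tokens, max_tokens)
    recommended = estimate_cost(recommended_model, input_tokens, max_tokens)
    estimated_savings = original - recommended
    savings_pct = 100 * estimated_savings / original if original else 0.0
    return round(estimated_savings, 6), round(savings_pct, 1)


print(savings("gpt-5.4", "gpt-5.4-mini", input_tokens=8, max_tokens=256))  # (0.005432, 70.0)
```

With these placeholder prices the cheaper model costs 30% of the original, so the percentage saved is 70% regardless of the token counts.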

## Get Your API Key

1. Sign in to your CostWise instance
2. Go to **Settings > MCP SDK**
3. Click **Generate API Key**
4. Copy the `cw_...` key
5. Set up team policies at **Settings > MCP Teams**
