Metadata-Version: 2.4
Name: costwise-mcp
Version: 0.1.0
Summary: CostWise Managed Cost Policy SDK — automatically reduce AI costs without changing your workflow
Author-email: CostWise <sdk@cost-wise.dev>
License: MIT
Project-URL: Homepage, https://cost-wise.dev
Project-URL: Documentation, https://cost-wise.dev/docs/mcp-sdk
Project-URL: Repository, https://github.com/costwise/mcp-sdk
Keywords: ai,cost-optimization,llm,finops,openai,anthropic
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: tiktoken>=0.5.0
Provides-Extra: telemetry
Requires-Dist: httpx>=0.25.0; extra == "telemetry"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21; extra == "dev"

# CostWise MCP SDK

**Automatically reduce AI costs without changing your workflow.**

The CostWise Managed Cost Policy (MCP) SDK analyzes your LLM requests, classifies task complexity, and recommends cheaper models and token limits, saving up to 90% on AI costs. Actual savings depend on how many of your requests are simple enough to downgrade.

## Install

```bash
pip install costwise-mcp
```

## Quick Start

```python
import openai  # the openai client reads OPENAI_API_KEY from the environment
from costwise_mcp import CostPolicy

policy = CostPolicy(api_key="cw_your_api_key")
prompt = "Translate 'hello' to French"

# Before calling your LLM, optimize the request
decision = policy.optimize(prompt=prompt, model="gpt-4")

print(decision.recommended_model)  # e.g. "gpt-4o-mini"
print(decision.max_tokens)         # e.g. 256
print(decision.estimated_cost)     # estimated USD, e.g. 0.000045
print(decision.savings_pct)        # percent saved vs. gpt-4, e.g. 97.0

# Call your LLM with the optimized params
response = openai.chat.completions.create(
    model=decision.recommended_model,
    max_tokens=decision.max_tokens,
    messages=[{"role": "user", "content": prompt}],
)

# Report actual usage (async, non-blocking, optional)
policy.report(decision, actual_tokens=response.usage.total_tokens)
```

## How It Works

```
Your Code                    CostWise MCP SDK              LLM Provider
    |                              |                            |
    |  optimize(prompt, model)     |                            |
    |----------------------------->|                            |
    |                              |  1. Estimate tokens        |
    |                              |  2. Classify complexity    |
    |                              |  3. Select cheaper model   |
    |                              |  4. Set max_tokens         |
    |                              |  5. Estimate cost          |
    |  <-- Decision object --------|                            |
    |                              |                            |
    |  openai.create(model=decision.recommended_model) -------->|
    |  <-- LLM response ----------------------------------------|
    |                              |                            |
    |  report(decision, tokens)    |                            |
    |----------------------------->|  (async, background)       |
    |                              |  --> CostWise Dashboard    |
```

**Key principle:** The SDK never proxies your LLM calls. It analyzes the prompt locally and only recommends parameters; you call the LLM directly with your own API keys.
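
The five steps in the diagram above can be pictured as a small local pipeline. The sketch below is illustrative only and is not the SDK's implementation: the 4-characters-per-token estimate, the classification thresholds, the downgrade map, and the prices in `PRICES_PER_MTOK` are placeholder assumptions, and `sketch_optimize` is a hypothetical name. Only the output caps (256 / 1024 / 4096) come from this README.

```python
from dataclasses import dataclass

# Assumed prices in USD per million tokens (input, output); illustrative only.
PRICES_PER_MTOK = {
    "gpt-4": (30.0, 60.0),
    "gpt-4o-mini": (0.15, 0.60),
}
# Output caps per complexity class, as listed under "Token Limiting".
MAX_TOKENS = {"simple": 256, "medium": 1024, "complex": 4096}
# Hypothetical downgrade map.
DOWNGRADE = {"gpt-4": "gpt-4o-mini"}


@dataclass
class SketchDecision:
    recommended_model: str
    max_tokens: int
    input_tokens: int
    complexity: str
    estimated_cost: float


def estimate_tokens(prompt: str) -> int:
    # Step 1: rough heuristic of about 4 characters per token (assumption).
    return max(1, len(prompt) // 4)


def classify(input_tokens: int) -> str:
    # Step 2: hypothetical thresholds based on prompt length alone.
    if input_tokens < 50:
        return "simple"
    if input_tokens < 500:
        return "medium"
    return "complex"


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Step 5: worst-case cost, assuming the full output cap is used.
    price_in, price_out = PRICES_PER_MTOK[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def sketch_optimize(prompt: str, model: str) -> SketchDecision:
    input_tokens = estimate_tokens(prompt)
    complexity = classify(input_tokens)
    # Step 3: in this sketch only simple tasks are downgraded (assumption).
    recommended = DOWNGRADE.get(model, model) if complexity == "simple" else model
    # Step 4: cap output tokens by complexity class.
    max_tokens = MAX_TOKENS[complexity]
    cost = estimate_cost(recommended, input_tokens, max_tokens)
    return SketchDecision(recommended, max_tokens, input_tokens, complexity, cost)


decision = sketch_optimize("Translate 'hello' to French", "gpt-4")
print(decision)
```

Everything here runs locally on the prompt string, which is the property the diagram emphasizes: no request leaves the process until you call the provider yourself.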

## Features

### Automatic Model Downgrade
```python
# GPT-4 → GPT-4o-mini for simple tasks (saves ~97%)
decision = policy.optimize("What is 2+2?", model="gpt-4")
assert decision.recommended_model == "gpt-4o-mini"

# Claude Opus → Claude Haiku for simple tasks (saves ~95%)
decision = policy.optimize("Translate this", model="claude-3-opus-20240229")
assert decision.recommended_model == "claude-3-5-haiku-20241022"

# Complex tasks keep the original model
# (long_analysis_prompt is a placeholder for your own long, multi-step prompt)
decision = policy.optimize(long_analysis_prompt, model="gpt-4")
assert decision.recommended_model == "gpt-4"
```

### Token Limiting
```python
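# Default output-token caps by classified complexity:
#   simple:  256
#   medium:  1024
#   complex: 4096
decision = policy.optimize("What is 2+2?", model="gpt-4")
print(decision.max_tokens)  # 256 when the prompt is classified as simple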
```

### Cost Estimation
```python
prompt = "Translate 'hello' to French"  # any prompt of your own
decision = policy.optimize(prompt, model="gpt-4")
print(f"Original cost: ${decision.original_estimated_cost:.6f}")
print(f"Optimized cost: ${decision.estimated_cost:.6f}")
print(f"Savings: ${decision.estimated_savings:.6f} ({decision.savings_pct:.0f}%)")
```
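
The three figures are presumably related in the usual way: savings are the difference between the two estimates, and the percentage is that difference relative to the original. A minimal sketch under that assumption follows; `savings` is a hypothetical helper, not part of the SDK, and the original cost of $0.0015 is an illustrative value.

```python
def savings(original_cost: float, optimized_cost: float) -> tuple[float, float]:
    """Return (absolute savings in USD, savings as a percentage of the original)."""
    if original_cost <= 0:
        # Avoid dividing by zero when there is nothing to save.
        return 0.0, 0.0
    saved = original_cost - optimized_cost
    return saved, saved / original_cost * 100.0


saved, pct = savings(original_cost=0.0015, optimized_cost=0.000045)
print(f"Savings: ${saved:.6f} ({pct:.0f}%)")  # Savings: $0.001455 (97%)
```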

### Budget Enforcement
```python
decision = policy.optimize(
    prompt="Write a very long essay...",
    model="gpt-4",
    max_budget=0.01,  # $0.01 max
)
if not decision.allowed:
    print(decision.message)  # "Estimated cost exceeds budget"
```
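
One way to act on the `allowed` flag is to refuse to call the provider at all when the estimate is over budget. The sketch below uses a stand-in `StubDecision` with only the two fields it needs, so it runs without the SDK; `BudgetExceeded` and `require_allowed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StubDecision:
    # Stand-in for the SDK's Decision; only the fields used here.
    allowed: bool
    message: str


class BudgetExceeded(RuntimeError):
    """Raised when a decision reports that the request is over budget."""


def require_allowed(decision):
    # Fail fast so the LLM call is never made for an over-budget request.
    if not decision.allowed:
        raise BudgetExceeded(decision.message)
    return decision


try:
    require_allowed(StubDecision(allowed=False, message="Estimated cost exceeds budget"))
except BudgetExceeded as exc:
    print(f"Skipping LLM call: {exc}")
```

Raising an exception rather than returning `None` keeps the over-budget path explicit at the call site.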

### Custom Configuration
```python
from costwise_mcp import CostPolicy, PolicyConfig

policy = CostPolicy(
    api_key="cw_...",
    config=PolicyConfig(
        max_tokens_simple=128,
        max_tokens_medium=512,
        max_tokens_complex=2048,
        daily_budget_usd=10.0,
        auto_downgrade=True,
        blocked_models=["gpt-4", "claude-3-opus-20240229"],  # Force cheaper models
    ),
    project_id="my-chatbot",
)
```

### Works with Any Provider
```python
# OpenAI
decision = policy.optimize(prompt, model="gpt-4o")

# Anthropic
decision = policy.optimize(prompt, model="claude-3-5-sonnet-20241022")

# Google
decision = policy.optimize(prompt, model="gemini-2.5-pro")

# Mistral
decision = policy.optimize(prompt, model="mistral-large-latest")

# DeepSeek
decision = policy.optimize(prompt, model="deepseek-chat")
```
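
Because the SDK only returns a model name, your code still has to route the call to the right provider client. A minimal sketch of prefix-based routing follows; `provider_for` and the prefix table are assumptions for illustration, not SDK features.

```python
# Hypothetical mapping from model-name prefix to provider.
PROVIDER_PREFIXES = {
    "gpt-": "openai",
    "o1": "openai",
    "o3": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "mistral-": "mistral",
    "open-mistral-": "mistral",
    "command-": "cohere",
    "deepseek-": "deepseek",
}


def provider_for(model: str) -> str:
    """Return the provider name for a model, based on its prefix."""
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Unknown model: {model}")


for name in ("gpt-4o-mini", "claude-3-opus-20240229", "open-mistral-nemo"):
    print(name, "->", provider_for(name))
```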

## Supported Models

| Provider | Models | Auto-downgrade target |
|----------|--------|----------------------|
| OpenAI | gpt-4, gpt-4-turbo, gpt-4o, gpt-4o-mini, gpt-4.1, o1, o3-mini | gpt-4o-mini, gpt-4.1-nano |
| Anthropic | claude-3-opus, claude-3.5-sonnet, claude-3.5-haiku, claude-sonnet-4, claude-opus-4 | claude-3.5-haiku |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-1.5-pro, gemini-1.5-flash | gemini-2.5-flash |
| Mistral | mistral-large, mistral-medium, mistral-small, open-mistral-nemo | open-mistral-nemo |
| Cohere | command-r-plus, command-r, command-light | command-light |
| DeepSeek | deepseek-chat, deepseek-reasoner | deepseek-chat |

## Privacy

- Prompts are **never** sent to the CostWise backend
- Only metadata is reported: token counts, model name, cost estimates
- Telemetry is optional and can be disabled:
  ```python
  policy = CostPolicy(config=PolicyConfig(telemetry_enabled=False))
  ```
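
To make the metadata-only claim concrete, a usage report could be built from numeric and identifier fields alone. The payload shape and field names below are assumptions, not the SDK's wire format; the point of the sketch is that the prompt text is never an input to the function.

```python
import json


def build_report_payload(model, input_tokens, actual_tokens, estimated_cost, project_id=None):
    """Build a hypothetical telemetry payload containing metadata only."""
    return {
        "model": model,
        "input_tokens": input_tokens,
        "actual_tokens": actual_tokens,
        "estimated_cost_usd": estimated_cost,
        "project_id": project_id,
    }


payload = build_report_payload("gpt-4o-mini", 6, 14, 0.000045, project_id="my-chatbot")
print(json.dumps(payload, indent=2))
```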

## API Reference

### `CostPolicy(api_key, backend_url, config, project_id)`
Main SDK class. Create one instance per application.

### `policy.optimize(prompt, model, task_type, max_budget) → Decision`
Analyze a prompt and return optimization recommendations. **Local-only, no network calls.**

### `policy.report(decision, actual_tokens, output_tokens, latency_ms)`
Report actual usage after LLM call. **Async, non-blocking.**
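
A non-blocking `report` is commonly built on a queue drained by a background worker, so the caller only pays for an enqueue. The sketch below shows that general pattern with the standard library; it is not a description of the SDK's internals, and `BackgroundReporter` and its `send` callback are hypothetical names.

```python
import queue
import threading


class BackgroundReporter:
    def __init__(self, send):
        self._send = send  # callable that delivers one payload
        self._queue = queue.Queue()
        # Daemon thread: it will not keep the interpreter alive on exit.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def report(self, payload):
        """Enqueue the payload and return immediately."""
        self._queue.put(payload)

    def _run(self):
        while True:
            payload = self._queue.get()
            try:
                self._send(payload)
            except Exception:
                pass  # a telemetry failure should never break the caller
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued payload has been processed."""
        self._queue.join()


delivered = []
reporter = BackgroundReporter(send=delivered.append)
reporter.report({"model": "gpt-4o-mini", "actual_tokens": 14})
reporter.flush()
print(delivered)
```

Calling `flush()` before process exit is one way to avoid losing queued reports, since a daemon thread is stopped abruptly at shutdown.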

### `Decision`
| Field | Type | Description |
|-------|------|-------------|
| `recommended_model` | str | The model you should use |
| `max_tokens` | int | Maximum output tokens |
| `estimated_cost` | float | Estimated cost in USD |
| `original_model` | str | The model you requested |
| `input_tokens` | int | Estimated input token count |
| `complexity` | TaskComplexity | simple, medium, or complex |
| `allowed` | bool | Whether the request is within budget |
| `savings_pct` | float | Percentage saved vs original model |
| `estimated_savings` | float | USD saved vs original model |
| `message` | str | Human-readable recommendation |
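
Read as a type, the table corresponds to something like the dataclass below. This is a sketch inferred from the table, not the SDK's source: the names `DecisionSketch` and `Complexity`, the enum values, the frozen dataclass form, and the example field values are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(str, Enum):
    # Stand-in for the SDK's TaskComplexity.
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"


@dataclass(frozen=True)
class DecisionSketch:
    recommended_model: str   # the model you should use
    max_tokens: int          # maximum output tokens
    estimated_cost: float    # estimated cost in USD
    original_model: str      # the model you requested
    input_tokens: int        # estimated input token count
    complexity: Complexity   # simple, medium, or complex
    allowed: bool            # whether the request is within budget
    savings_pct: float       # percentage saved vs. the original model
    estimated_savings: float # USD saved vs. the original model
    message: str             # human-readable recommendation


# Illustrative values only.
example = DecisionSketch(
    recommended_model="gpt-4o-mini",
    max_tokens=256,
    estimated_cost=0.000045,
    original_model="gpt-4",
    input_tokens=6,
    complexity=Complexity.SIMPLE,
    allowed=True,
    savings_pct=97.0,
    estimated_savings=0.001455,
    message="Downgraded to gpt-4o-mini for a simple task",
)
print(example.recommended_model, example.complexity.value)
```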

## Get Your API Key

1. Sign in to [CostWise](https://app.cost-wise.dev)
2. Go to your organization → **Settings** → **MCP SDK**
3. Click **Generate API Key**
4. Copy the key and use it in `CostPolicy(api_key="cw_...")`
