Metadata-Version: 2.4
Name: inferenceiq
Version: 0.1.0
Summary: InferenceIQ - AI inference cost optimization. Drop-in replacement for OpenAI SDK.
Home-page: https://github.com/awh233/inferenceiq
Author: Drew Hutton
Author-email: awh233@gmail.com
Keywords: ai llm inference optimization cost openai anthropic
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.24.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: respx; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# InferenceIQ Python SDK

Drop-in replacement for the OpenAI Python SDK that automatically optimizes every inference request, cutting costs by up to 40% through intelligent model routing, semantic caching, and prompt compression.

## Installation

```bash
pip install inferenceiq
```

## Quick Start

```python
from inferenceiq import InferenceIQ

# Drop-in OpenAI replacement
client = InferenceIQ(api_key="iq-live_...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```

## Features

- **OpenAI-compatible**: Swap `openai.OpenAI()` for `InferenceIQ()`; the API is the same
- **Automatic cost optimization**: Intelligent model routing saves 25-40% on inference cost
- **Semantic caching**: Identical or similar requests are served from cache
- **Quality guarantees**: Set minimum quality thresholds per request
- **Full observability**: Per-request cost attribution and savings tracking
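
To make the semantic caching idea concrete, here is a minimal sketch of a cache that returns a stored response when a new prompt is close enough to one seen before. It is an illustration only, not InferenceIQ's implementation: the `ToySemanticCache` class, its `threshold` parameter, and the use of string similarity from `difflib` are assumptions made for this example. A production cache would more likely compare embedding vectors.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


class ToySemanticCache:
    """Hypothetical cache: matches prompts by normalized string similarity."""

    def __init__(self, threshold: float = 0.9) -> None:
        # Minimum similarity (0.0 to 1.0) required to count as a cache hit.
        self.threshold = threshold
        self._entries: List[Tuple[str, str]] = []  # (normalized prompt, response)

    @staticmethod
    def _normalize(prompt: str) -> str:
        # Lowercase and collapse whitespace so trivial variations still match.
        return " ".join(prompt.lower().split())

    def put(self, prompt: str, response: str) -> None:
        self._entries.append((self._normalize(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        key = self._normalize(prompt)
        best_score, best_response = 0.0, None
        for cached_key, response in self._entries:
            score = SequenceMatcher(None, key, cached_key).ratio()
            if score > best_score:
                best_score, best_response = score, response
        # Return the closest entry only if it clears the threshold.
        return best_response if best_score >= self.threshold else None


cache = ToySemanticCache(threshold=0.9)
cache.put("What is the capital of France?", "Paris")
print(cache.get("what is the   capital of France?"))  # near-identical prompt
print(cache.get("Explain quantum entanglement"))      # unrelated prompt
```

A higher threshold lowers the risk of returning a cached answer to a different question, at the price of fewer cache hits.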

## Native API

```python
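from inferenceiq import InferenceIQ

client = InferenceIQ(api_key="iq-live_...")  # same client as in Quick Start

# Use the native optimization API for more control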
result = client.optimize(
    prompt="Summarize this document...",
    strategy="balanced",        # cost_optimized | balanced | quality_first | latency_optimized
    quality_threshold=0.85,
    max_cost_per_token=0.00003
)

print(f"Saved: ${result.savings:.4f}")
print(f"Provider: {result.provider_used}")
```
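
To put a `max_cost_per_token` value in perspective, it can help to convert it into a dollar ceiling for a given token count. The helper below is a plain arithmetic sketch. `cost_ceiling` is a hypothetical function, not part of the SDK, and it assumes the limit applies uniformly to every token in the request, which this README does not specify.

```python
def cost_ceiling(total_tokens: int, max_cost_per_token: float) -> float:
    """Upper bound in dollars for a request of `total_tokens` tokens.

    Assumes one flat per-token limit (an assumption for this example).
    """
    if total_tokens < 0 or max_cost_per_token < 0:
        raise ValueError("token count and per-token cost must be non-negative")
    return total_tokens * max_cost_per_token


# The example value above, 0.00003 dollars per token, expressed per million tokens.
print(f"${cost_ceiling(1_000_000, 0.00003):.2f} per 1M tokens")  # $30.00 per 1M tokens
```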

## Links

- **Dashboard**: https://inferenceiq.onrender.com
- **API Docs**: https://inferenceiq-api.onrender.com/docs
- **GitHub**: https://github.com/awh233/inferenceiq
