Metadata-Version: 2.4
Name: compress-lightreach
Version: 1.0.8
Summary: AI cost management SDK with intelligent model routing, prompt compression, and real-time token tracking
Home-page: https://compress.lightreach.io
Author: Light Reach
Author-email: Light Reach <jonathankt@lightreach.io>
License: MIT
Project-URL: Homepage, https://compress.lightreach.io
Project-URL: Documentation, https://compress.lightreach.io/docs
Project-URL: Source, https://github.com/lightreach/compress-lightreach
Project-URL: Bug Tracker, https://github.com/lightreach/compress-lightreach/issues
Keywords: llm,ai,model-routing,cost-management,compression,token-tracking,openai,anthropic,optimization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tiktoken>=0.5.0
Requires-Dist: requests>=2.31.0
Requires-Dist: urllib3>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: tzdata>=2023.3
Provides-Extra: api
Requires-Dist: fastapi>=0.104.0; extra == "api"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "api"
Requires-Dist: pydantic>=2.0.0; extra == "api"
Requires-Dist: pydantic-settings>=2.0.0; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Requires-Dist: slowapi>=0.1.9; extra == "api"
Requires-Dist: httpx>=0.25.0; extra == "api"
Requires-Dist: sqlalchemy>=2.0.0; extra == "api"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "api"
Requires-Dist: bcrypt>=4.0.0; extra == "api"
Requires-Dist: pyjwt>=2.8.0; extra == "api"
Requires-Dist: alembic>=1.12.0; extra == "api"
Requires-Dist: stripe>=7.0.0; extra == "api"
Requires-Dist: cryptography>=42.0.0; extra == "api"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Compress Light Reach

**AI cost management SDK with intelligent model routing, prompt compression, and real-time token tracking**

[![PyPI version](https://badge.fury.io/py/compress-lightreach.svg)](https://badge.fury.io/py/compress-lightreach)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Compress Light Reach is a Python SDK that provides intelligent model routing and prompt compression for LLM applications, reducing token usage and costs while maintaining quality.

## Features

- **Intelligent Model Routing**: Automatically selects the optimal model based on admin-configured quality settings and available provider keys
- **Token-aware Compression**: Replaces repeated substrings with shorter placeholders using a fast greedy algorithm
- **Lossless**: Perfect decompression guaranteed
- **Output Compression**: Optional model output compression support
- **Cloud API**: Uses Light Reach's cloud service for compression and routing
- **Multi-provider Support**: OpenAI, Anthropic, Google, DeepSeek, Moonshot
- **BYOK**: Provider API keys managed securely in dashboard (never passed through SDK)

## Installation

```bash
pip install compress-lightreach
```

## Quick Start

The SDK uses **intelligent model routing** and targets `POST /api/v2/complete`.

- Authenticate with your **LightReach API key** (env var `PCOMPRESLR_API_KEY` or `LIGHTREACH_API_KEY`)
- Manage **provider keys** (OpenAI/Anthropic/Google/etc.) in the dashboard (BYOK)
- The system automatically selects the optimal model based on admin-configured quality settings

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

result = client.complete(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    tags={"team": "backend", "environment": "production"},
)

print(result["decompressed_response"])
print(f"Selected: {result['routing_info']['selected_model']}")
print(f"Token savings: {result['compression_stats']['token_savings']}")
```

## OpenAI-compatible API (Cursor / OpenAI SDKs)

LightReach also exposes a **strict OpenAI-compatible** surface (including streaming SSE) so you can use standard OpenAI tooling without changing your app.

- **Cursor base URL**: `https://api.compress.lightreach.io/v1/cursor`
- **Generic OpenAI-compatible base URL**: `https://api.compress.lightreach.io/v1`
- **Endpoints**: `GET /models`, `POST /chat/completions`
- **Model id**: `lightreach`

Example (cURL):

```bash
curl -sS https://api.compress.lightreach.io/v1/chat/completions \
  -H "Authorization: Bearer lr_your_lightreach_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lightreach",
    "messages": [{"role":"user","content":"Say hello"}],
    "stream": true
  }'
```
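
The same request can be assembled from Python with only the standard library. The following is a minimal sketch, not part of the SDK: `build_chat_request` is a hypothetical helper name, and it only builds the request object so you can inspect it before sending.

```python
import json
import urllib.request

# Generic OpenAI-compatible base URL listed above.
BASE_URL = "https://api.compress.lightreach.io/v1"


def build_chat_request(api_key: str, messages: list, stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST /chat/completions request."""
    payload = {"model": "lightreach", "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "lr_your_lightreach_key",
    [{"role": "user", "content": "Say hello"}],
)
print(req.get_method(), req.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=60)
```

Any OpenAI-compatible client should work equally well once its base URL points at the address above; the helper is only meant to make the wire format explicit.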

## Tags

Tags provide **cost attribution** and enable **admin-controlled quality ceilings** per tag. The system supports three tag categories that you can set on requests:

| Tag Key | Description | Example Values |
|---------|-------------|----------------|
| `team` | Your team or group | `"backend"`, `"ml-platform"`, `"marketing"` |
| `environment` | Deployment environment | `"development"`, `"staging"`, `"production"` |
| `feature` | Feature or use case | `"search"`, `"chat"`, `"summarization"` |

Tags are validated server-side. Your workspace admin can configure allowed values for each tag category via the dashboard. If a tag value is not in the allowed list, the request may be warned or rejected depending on your workspace's enforcement mode.

```python
result = client.complete(
    messages=[{"role": "user", "content": "Summarize this document..."}],
    tags={
        "team": "backend",
        "environment": "production",
        "feature": "summarization",
    },
)
```

> **Note:** The `integration` tag is reserved for system use (e.g., Cursor, Claude Code) and should not be set manually. The `project` tag is also available for workspace-level project attribution; see your dashboard for configuration.
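
Because validation happens server-side, a small local pre-check can catch obvious mistakes before a request is sent. The sketch below is hypothetical and not part of the SDK: `check_tags` is an illustrative name, it assumes the documented keys are the only ones you intend to use, and it cannot know your workspace's allowed values.

```python
# Keys documented in this section; `integration` is reserved for system use.
USER_TAG_KEYS = {"team", "environment", "feature", "project"}
RESERVED_TAG_KEYS = {"integration"}


def check_tags(tags: dict) -> list:
    """Return a list of problems; an empty list means the tags look fine locally."""
    problems = []
    for key, value in tags.items():
        if key in RESERVED_TAG_KEYS:
            problems.append(f"'{key}' is reserved for system use")
        elif key not in USER_TAG_KEYS:
            # Assumption: keys outside the documented set are treated as mistakes.
            problems.append(f"unknown tag key '{key}'")
        elif not isinstance(value, str) or not value.strip():
            problems.append(f"tag '{key}' needs a non-empty string value")
    return problems


print(check_tags({"team": "backend", "integration": "cursor"}))
```

The server remains the source of truth: a tag that passes this check may still be warned about or rejected under your workspace's enforcement mode.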

## Intelligent Model Routing

Model routing is fully managed by your workspace admin via the dashboard. The system uses **HLE (Humanity's Last Exam)** scores — a standardized benchmark — to determine model quality. Admins configure quality ceilings at three levels:

- **Global ceiling**: Set via the HLE slider in the dashboard. Applies to all requests.
- **Tag-level ceilings**: Set per tag (e.g., `environment=development` gets a lower ceiling to save costs).
- **Integration-level ceilings**: Set per integration (e.g., Cursor, Claude Code).

The routing engine picks the **cheapest model** whose HLE score meets the effective ceiling. HLE scores are maintained server-side and cannot be overridden by SDK callers.

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

result = client.complete(
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    tags={"team": "backend", "environment": "production"},
)

print(result["routing_info"]["selected_model"])           # e.g., "gpt-4o-mini"
print(result["routing_info"]["selected_provider"])        # e.g., "openai"
print(result["routing_info"]["model_hle"])                # e.g., 32.5
print(result["routing_info"]["model_price_per_million"])  # e.g., 0.15
```
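
To make the selection rule concrete, here is a toy model of it. This is an illustration only, not the service's implementation: the catalog entries, scores, and prices are invented, `pick_model` is a hypothetical name, and it assumes that "meets the effective ceiling" means the model's HLE score is at least the configured value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    name: str
    provider: str
    hle: float                # illustrative score, not real data
    price_per_million: float  # illustrative price, not real data


def pick_model(catalog, effective_hle: Optional[float], provider: Optional[str] = None):
    """Return the cheapest entry that satisfies the quality bar, or None."""
    candidates = [
        m for m in catalog
        if (provider is None or m.provider == provider)
        and (effective_hle is None or m.hle >= effective_hle)  # assumed reading of "meets"
    ]
    return min(candidates, key=lambda m: m.price_per_million, default=None)


catalog = [
    ModelEntry("small", "provider-a", 10.0, 0.2),
    ModelEntry("medium", "provider-b", 25.0, 1.0),
    ModelEntry("large", "provider-a", 40.0, 5.0),
]
choice = pick_model(catalog, effective_hle=20.0)
print(choice.name if choice else "no eligible model")
```

Under this reading, lowering the ceiling for a tag such as `environment=development` widens the candidate set toward cheaper models, which matches the cost-saving intent described above.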

### Routing Response

Every `complete()` response includes `routing_info` with full transparency into the routing decision:

```python
info = result["routing_info"]
print(f"Model: {info['selected_model']}")
print(f"Provider: {info['selected_provider']}")
print(f"Model HLE: {info['model_hle']}")
print(f"Effective HLE ceiling: {info['effective_hle']}")
print(f"Ceiling source: {info['hle_source']}")  # "tag", "global", or "none"
```

### Provider-Constrained Routing

Optionally constrain to a specific provider:

```python
result = client.complete(
    messages=[{"role": "user", "content": "Write a poem"}],
    llm_provider="anthropic",
)
```

### With Output Compression

```python
result = client.complete(
    messages=[{"role": "user", "content": "Generate a long report..."}],
    compress_output=True,
)

print(result["decompressed_response"])
```

### Using CompressionConfig

Control which message roles get compressed:

```python
result = client.complete(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    compress=True,
    compression_config={
        "compress_system": False,
        "compress_user": True,
        "compress_assistant": False,
        "compress_only_last_n_user": 1,
    },
    temperature=0.7,
    max_tokens=1000,
    tags={"team": "backend", "environment": "production"},
)
```

### Compression Only (No LLM Call)

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

compressed = client.compress(
    prompt="Your text with repeated content here...",
    model="gpt-4",
    tags={"team": "backend"},
)

print(compressed["llm_format"])
print(f"Compression ratio: {compressed['compression_ratio']:.2%}")

# Decompress later
decompressed = client.decompress(compressed["llm_format"])
print(decompressed["decompressed"])
```

### Command Line Interface

```bash
export PCOMPRESLR_API_KEY=your-api-key

pcompresslr "Your prompt with repeated text here..."
```

## API Reference

### `PcompresslrAPIClient`

Main API client for intelligent model routing and compression.

#### Constructor

```python
PcompresslrAPIClient(
    api_key: str = None,  # Falls back to env vars
    api_url: str = None,  # Default: https://api.compress.lightreach.io
    timeout: int = 900    # Request timeout in seconds
)
```

**Parameters:**
- `api_key` (str, optional): LightReach API key. Falls back to `LIGHTREACH_API_KEY` or `PCOMPRESLR_API_KEY` env vars.
- `api_url` (str, optional): Override base API URL. Falls back to `PCOMPRESLR_API_URL` env var.
- `timeout` (int, optional): Request timeout in seconds. Default: `900` (15 minutes; `complete()` can include long upstream LLM calls).

#### Methods

##### `complete(messages, ...)`

Messages-first completion with intelligent routing (POST `/api/v2/complete`).

By default, uses async job processing (enqueue + poll) for production reliability. Pass `mode="sync"` for direct synchronous calls.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `messages` | `list[dict]` | required | Conversation history with `role` and `content` |
| `llm_provider` | `str` | `None` | Provider constraint: `"openai"`, `"anthropic"`, `"google"`, `"deepseek"`, `"moonshot"`. Omit for cross-provider optimization |
| `compress` | `bool` | `True` | Whether to compress messages |
| `compress_output` | `bool` | `False` | Whether to request compressed output from LLM |
| `compression_config` | `dict` | `None` | Per-role compression settings (see below) |
| `temperature` | `float` | `None` | LLM temperature parameter |
| `max_tokens` | `int` | `None` | Maximum tokens to generate |
| `tags` | `dict[str, str]` | `None` | Tags for cost attribution and quality ceilings. Use `team`, `environment`, and/or `feature` keys |
| `max_history_messages` | `int` | `None` | Limit conversation history length |
| `mode` | `str` | `"async"` | `"async"` (job + poll, recommended) or `"sync"` (direct call) |
| `wait` | `bool` | `True` | Whether to wait for async job completion (only applies when `mode="async"`) |
| `poll_interval_s` | `float` | `1.0` | Poll interval for async jobs |
| `max_wait_s` | `float` | `None` | Max wait time for async jobs (defaults to `timeout`) |
| `idempotency_key` | `str` | `None` | Idempotency key for async job creation |
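
The interplay of `poll_interval_s` and `max_wait_s` can be pictured as a plain polling loop. The sketch below is a generic illustration, not the SDK's internals: `poll_until_done`, the `fetch_status` callable, and the status strings are all hypothetical.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    poll_interval_s: float = 1.0,
    max_wait_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until the job reports a terminal state or time runs out."""
    deadline = clock() + max_wait_s
    while True:
        job = fetch_status()
        # Hypothetical terminal states; the real job schema may differ.
        if job.get("status") in ("succeeded", "failed"):
            return job
        if clock() + poll_interval_s > deadline:
            raise TimeoutError(f"job not finished after {max_wait_s} seconds")
        sleep(poll_interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. With `wait=False` you would presumably skip such a loop and check on the job yourself later.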

**`compression_config` options:**

```python
{
    "compress_system": False,         # default
    "compress_user": True,            # default
    "compress_assistant": False,      # default
    "compress_only_last_n_user": 1,   # default (None = compress all)
}
```
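
One way to reason about these options is to compute which messages they would select. The sketch below is an assumption about the semantics, not the service's logic: `indices_to_compress` is a hypothetical helper, and it reads `compress_only_last_n_user` as "only the most recent N user messages".

```python
from typing import Optional

DEFAULT_CONFIG = {
    "compress_system": False,
    "compress_user": True,
    "compress_assistant": False,
    "compress_only_last_n_user": 1,
}


def indices_to_compress(messages: list, config: Optional[dict] = None) -> list:
    """Return the positions of messages that would be compressed under `config`."""
    cfg = {**DEFAULT_CONFIG, **(config or {})}
    user_positions = [i for i, m in enumerate(messages) if m["role"] == "user"]
    last_n = cfg["compress_only_last_n_user"]
    if last_n is not None:
        # Assumed meaning: keep only the most recent N user messages.
        user_positions = user_positions[-last_n:] if last_n > 0 else []
    eligible_users = set(user_positions) if cfg["compress_user"] else set()

    selected = []
    for i, m in enumerate(messages):
        role = m["role"]
        if role == "user" and i in eligible_users:
            selected.append(i)
        elif role == "system" and cfg["compress_system"]:
            selected.append(i)
        elif role == "assistant" and cfg["compress_assistant"]:
            selected.append(i)
    return selected


history = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "How do I read a file in Python?"},
    {"role": "assistant", "content": "You can use open() with a context manager..."},
    {"role": "user", "content": "How about writing to a file?"},
]
print(indices_to_compress(history))
```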

**Response (dict):**

```python
{
    "decompressed_response": str,     # Final decompressed LLM response
    "compression_stats": {
        "compression_enabled": bool,
        "original_tokens": int,
        "compressed_tokens": int,
        "token_savings": int,
        "compression_ratio": float,
        "token_count_exact": bool,
        "token_count_source": str,
        "token_accounting_note": str,
        "processing_time_ms": float | None,
    },
    "llm_stats": {
        "provider": str,
        "model": str,
        "input_tokens": int,
        "output_tokens": int,
        "total_tokens": int,
        "finish_reason": str,
    },
    "routing_info": {
        "selected_model": str,          # Model chosen by system
        "selected_provider": str,       # Provider chosen by system
        "model_hle": float,             # HLE score of selected model (server-computed)
        "model_price_per_million": float,
        "effective_hle": float | None,  # The quality ceiling that was applied
        "hle_source": str,              # "tag", "global", or "none"
    },
    "warnings": list[str],
    "cost_estimate": float | None,
    "savings_estimate": float | None,
}
```
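
Several fields in this response can be `None`, so code that reports on a result should guard for that. A minimal sketch follows; `summarize` is a hypothetical helper, and the sample values are made up for illustration.

```python
def summarize(result: dict) -> str:
    """Produce a one-line summary from a complete() response dict."""
    routing = result["routing_info"]
    stats = result["compression_stats"]
    parts = [
        f"{routing['selected_provider']}/{routing['selected_model']}",
        f"saved {stats['token_savings']} of {stats['original_tokens']} input tokens",
    ]
    cost = result.get("cost_estimate")
    if cost is not None:  # cost_estimate is documented as float | None
        parts.append(f"est. cost {cost:.4f}")
    for warning in result.get("warnings") or []:
        parts.append(f"warning: {warning}")
    return " | ".join(parts)


sample = {
    "routing_info": {"selected_provider": "openai", "selected_model": "gpt-4o-mini"},
    "compression_stats": {"token_savings": 30, "original_tokens": 120},
    "warnings": [],
    "cost_estimate": None,
}
print(summarize(sample))
```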

##### `compress(prompt, model, tags)`

Compression-only (POST `/api/v1/compress`).

**Parameters:**
- `prompt` (str, required): Text to compress
- `model` (str, optional): Model for tokenization. Default: `"gpt-4"`
- `tags` (dict, optional): Tags for attribution

**Response (dict):**

```python
{
    "compressed": str,
    "dictionary": dict[str, str],
    "llm_format": str,
    "compression_ratio": float,
    "original_size": int,
    "compressed_size": int,
    "processing_time_ms": float,
    "algorithm": str,
}
```

##### `decompress(llm_format)`

Decompress an LLM-formatted compressed prompt (POST `/api/v1/decompress`).

**Parameters:**
- `llm_format` (str, required): The `llm_format` string from a compress response

**Response (dict):**

```python
{
    "decompressed": str,
    "processing_time_ms": float,
}
```

##### `health_check()`

Check API health status (GET `/health`).

**Response (dict):**

```python
{
    "status": str,
    "version": str,
}
```

### Data Classes

#### `Message`

```python
from pcompresslr import Message

msg = Message(role="user", content="Hello!")
msg.to_dict()  # {"role": "user", "content": "Hello!"}
```

**Roles:** `"system"`, `"developer"`, `"user"`, `"assistant"`

#### `CompressionConfig`

```python
from pcompresslr import CompressionConfig

config = CompressionConfig(
    compress_system=False,
    compress_user=True,
    compress_assistant=False,
    compress_only_last_n_user=1,
)
config.to_dict()
```

### Environment Variables

| Variable | Description |
|----------|-------------|
| `PCOMPRESLR_API_KEY` | Your LightReach API key (primary) |
| `LIGHTREACH_API_KEY` | Your LightReach API key (alternative) |
| `PCOMPRESLR_API_URL` | Override the API base URL (advanced/testing) |
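
If you want to fail fast when no key is configured, you can resolve it yourself before constructing the client. This is a sketch under assumptions: `resolve_api_key` is a hypothetical helper, and the lookup order simply follows the "primary" and "alternative" labels in the table rather than any confirmed SDK behavior.

```python
import os
from typing import Mapping, Optional

# Order follows the table above (assumed, not confirmed against the SDK).
KEY_VARS = ("PCOMPRESLR_API_KEY", "LIGHTREACH_API_KEY")


def resolve_api_key(explicit: Optional[str] = None, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return an explicit key if given, else the first non-empty env var, else None."""
    if explicit:
        return explicit
    for name in KEY_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


if resolve_api_key() is None:
    print("No LightReach API key found in the environment")
```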

### Exceptions

| Exception | Description |
|-----------|-------------|
| `PcompresslrAPIError` | Base exception class |
| `APIKeyError` | Invalid or missing API key |
| `RateLimitError` | Rate limit exceeded |
| `APIRequestError` | General API errors (including routing failures, tag validation errors) |

```python
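from pcompresslr import (
    PcompresslrAPIClient,
    APIKeyError,
    RateLimitError,
    APIRequestError,
)

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

try:
    result = client.complete(
        messages=[{"role": "user", "content": "Hello!"}],
    )
except APIKeyError:
    print("Invalid or missing API key")
except RateLimitError:
    print("Rate limited, please retry later")
except APIRequestError as e:
    # Also covers routing failures and tag validation errors
    print(f"API error: {e}")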
```
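
For rate limits, a retry with exponential backoff is a common pattern. The helper below is a generic sketch and not part of the SDK: `call_with_backoff` is a hypothetical name, and the exception type to retry on is passed in so the block runs without the package installed.

```python
import time
from typing import Callable


def call_with_backoff(
    fn: Callable[[], object],
    retry_on: type,
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call fn(); on `retry_on`, wait base_delay_s * 2**(attempt - 1) and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay_s * 2 ** (attempt - 1))
```

With the SDK installed, usage would look like `call_with_backoff(lambda: client.complete(messages=msgs), retry_on=RateLimitError)`. Pairing this with an `idempotency_key` may be worth considering so that retried async jobs are not duplicated.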

## How It Works

1. **Compression**: Identifies repeated substrings using a fast greedy algorithm and replaces them with shorter placeholders, reducing token count
2. **Routing**: Selects the cheapest model that meets the admin-configured quality ceiling (global, tag-level, or integration-level)
3. **LLM Call**: Sends the compressed prompt to the selected model via your BYOK provider keys
4. **Decompression**: Losslessly restores the model's response if output compression was enabled
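
The compression step can be illustrated with a toy greedy substitution. This is not the service's algorithm: it counts characters rather than tokens, the `⟦n⟧` placeholder format is invented for the example, and `toy_compress` and `toy_decompress` are hypothetical names. It only shows how repeated substrings can be swapped for placeholders and restored exactly.

```python
OPEN, CLOSE = "\u27e6", "\u27e7"  # placeholder brackets, assumed absent from the input


def toy_compress(text: str, min_len: int = 6, max_len: int = 40, max_entries: int = 10):
    """Greedily replace the most profitable repeated substring, up to max_entries times."""
    if OPEN in text or CLOSE in text:
        raise ValueError("input already contains placeholder brackets")
    dictionary = {}
    for n in range(max_entries):
        token = f"{OPEN}{n}{CLOSE}"
        best, best_gain = None, 0
        for length in range(min_len, min(max_len, len(text)) + 1):
            for start in range(len(text) - length + 1):
                sub = text[start:start + length]
                if OPEN in sub or CLOSE in sub:
                    continue  # never nest placeholders inside dictionary entries
                # Characters saved in the text minus the cost of storing the entry.
                gain = text.count(sub) * (length - len(token)) - length
                if gain > best_gain:
                    best, best_gain = sub, gain
        if best is None:
            break  # nothing left that pays for itself
        dictionary[token] = best
        text = text.replace(best, token)
    return text, dictionary


def toy_decompress(text: str, dictionary: dict) -> str:
    """Restore the original text; entries never contain placeholders, so order is irrelevant."""
    for token, sub in dictionary.items():
        text = text.replace(token, sub)
    return text


prompt = (
    "Write a story about a cat. The cat is very friendly.\n"
    "Write a story about a dog. The dog is very friendly.\n"
)
packed, table = toy_compress(prompt)
assert toy_decompress(packed, table) == prompt
print(len(prompt), "->", len(packed), "characters")
```

A token-aware version would score candidates by tokenizer counts instead of character counts, since a placeholder that is short in characters is not necessarily short in tokens.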

## Examples

### Example 1: Complete with Compression

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

prompt = """
Write a story about a cat. The cat is very friendly.
Write a story about a dog. The dog is very friendly.
Write a story about a bird. The bird is very friendly.
"""

result = client.complete(
    messages=[{"role": "user", "content": prompt}],
    tags={"team": "content", "environment": "production"},
)

print(result["decompressed_response"])
print(f"Model used: {result['routing_info']['selected_model']}")
print(f"Token savings: {result['compression_stats']['token_savings']} tokens")
print(f"Compression ratio: {result['compression_stats']['compression_ratio']:.2%}")
```

### Example 2: Multi-turn Conversation

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

result = client.complete(
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "How do I read a file in Python?"},
        {"role": "assistant", "content": "You can use open() with a context manager..."},
        {"role": "user", "content": "How about writing to a file?"},
    ],
    compression_config={
        "compress_system": False,
        "compress_user": True,
        "compress_assistant": False,
        "compress_only_last_n_user": 2,
    },
    tags={"team": "engineering", "feature": "code-assistant"},
)
```

### Example 3: Provider-Constrained Request

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

result = client.complete(
    messages=[
        {"role": "system", "content": "You are a creative writing assistant."},
        {"role": "user", "content": "Write a haiku about coding."},
    ],
    llm_provider="anthropic",
    tags={"team": "content", "environment": "staging"},
)

print(result["decompressed_response"])
print(f"Model: {result['routing_info']['selected_model']}")
```

## Getting an API Key

To use Compress Light Reach, you need an API key from [compress.lightreach.io](https://compress.lightreach.io).

1. Visit [compress.lightreach.io](https://compress.lightreach.io)
2. Sign up for an account
3. Get your API key from the dashboard
4. Set it as an environment variable: `export PCOMPRESLR_API_KEY=your-key`

## Security & Privacy

**BYOK model:** Provider keys (OpenAI/Anthropic/Google/etc.) are managed in the dashboard and **never passed through this SDK**. The SDK only uses your LightReach API key for authentication with the service.

## Requirements

- Python 3.10+
- tiktoken >= 0.5.0
- requests >= 2.31.0
- urllib3 >= 2.0.0
- python-dotenv >= 1.0.0
- tzdata >= 2023.3

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Support

- Documentation: [compress.lightreach.io/docs](https://compress.lightreach.io/docs)
- Issues: [GitHub Issues](https://github.com/lightreach/compress-lightreach/issues)
- Email: jonathankt@lightreach.io

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
