Metadata-Version: 2.4
Name: compress-lightreach
Version: 1.0.2
Summary: AI cost management SDK with intelligent model routing, prompt compression, and real-time token tracking
Home-page: https://compress.lightreach.io
Author: Light Reach
Author-email: Light Reach <jonathankt@lightreach.io>
License: MIT
Project-URL: Homepage, https://compress.lightreach.io
Project-URL: Documentation, https://compress.lightreach.io/docs
Project-URL: Source, https://github.com/lightreach/compress-lightreach
Project-URL: Bug Tracker, https://github.com/lightreach/compress-lightreach/issues
Keywords: llm,ai,model-routing,cost-management,compression,token-tracking,openai,anthropic,optimization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tiktoken>=0.5.0
Requires-Dist: requests>=2.31.0
Requires-Dist: urllib3>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: tzdata>=2023.3
Provides-Extra: api
Requires-Dist: fastapi>=0.104.0; extra == "api"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "api"
Requires-Dist: pydantic>=2.0.0; extra == "api"
Requires-Dist: pydantic-settings>=2.0.0; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Requires-Dist: slowapi>=0.1.9; extra == "api"
Requires-Dist: httpx>=0.25.0; extra == "api"
Requires-Dist: sqlalchemy>=2.0.0; extra == "api"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "api"
Requires-Dist: bcrypt>=4.0.0; extra == "api"
Requires-Dist: pyjwt>=2.8.0; extra == "api"
Requires-Dist: alembic>=1.12.0; extra == "api"
Requires-Dist: stripe>=7.0.0; extra == "api"
Requires-Dist: cryptography>=42.0.0; extra == "api"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Compress Light Reach

**AI cost management SDK with intelligent model routing, prompt compression, and real-time token tracking**

[![PyPI version](https://badge.fury.io/py/compress-lightreach.svg)](https://badge.fury.io/py/compress-lightreach)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Compress Light Reach is a Python SDK that provides intelligent model routing and prompt compression for LLM applications, reducing token usage and costs while maintaining quality.

## Features

- **Intelligent Model Routing**: Automatically selects the optimal model based on your quality requirement (HLE) and available provider keys
- **Token-aware Compression**: Replaces repeated substrings with shorter placeholders
- **Dual Algorithms**:
  - Fast greedy (~99% of optimal) for everyday use
  - Optimal dynamic programming (O(n²)) for critical prompts
- **Lossless**: Perfect decompression guaranteed
- **Output Compression**: Optionally compresses model output as well
- **Cloud API**: Uses Light Reach's cloud service for compression and routing
- **Multi-provider Support**: OpenAI, Anthropic, Google, DeepSeek, Moonshot
- **BYOK**: Provider API keys managed securely in dashboard (never passed through SDK)

## Installation

```bash
pip install compress-lightreach
```

## Quick Start

The SDK uses **intelligent model routing** and targets `POST /api/v2/complete`.

- Authenticate with your **LightReach API key** (env var `PCOMPRESLR_API_KEY` or `LIGHTREACH_API_KEY`)
- Manage **provider keys** (OpenAI/Anthropic/Google/etc.) in the dashboard (BYOK)
- System automatically selects optimal model based on your requirements

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

result = client.complete(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    desired_hle=30,  # Quality preference (0-40, where 40 is SOTA)
)

print(result["decompressed_response"])
print(f"Selected: {result['routing_info']['selected_model']}")
print(f"Token savings: {result['compression_stats']['token_savings']}")
```

### With Output Compression

```python
result = client.complete(
    messages=[{"role": "user", "content": "Generate a long report..."}],
    desired_hle=25,
    compress_output=True,
)

print(result["decompressed_response"])
```

### Intelligent Model Routing

The system automatically selects the optimal model based on quality requirements and your available provider keys:

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

# Cross-provider optimization: system picks cheapest model meeting your quality bar
result = client.complete(
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    desired_hle=30,  # Quality preference (0-40, where 40 is SOTA)
)

# Check what was selected
print(result["routing_info"]["selected_model"])           # e.g., "gpt-4o-mini"
print(result["routing_info"]["selected_provider"])        # e.g., "openai"
print(result["routing_info"]["model_hle"])                # e.g., 32.5
print(result["routing_info"]["model_price_per_million"])  # e.g., 0.15
```

### Provider-Constrained Routing

Optionally constrain to a specific provider:

```python
# Only use OpenAI models, but pick the cheapest one meeting HLE 35
result = client.complete(
    messages=[{"role": "user", "content": "Write a poem"}],
    llm_provider="openai",  # Optional: constrain to one provider
    desired_hle=35,
)
```

### HLE Cascading with Admin Controls

Admins can set quality **ceilings** via the dashboard (global or per-tag) to control costs. Your `desired_hle` is a preference; a request that exceeds the admin-set ceiling returns an error:

```python
from pcompresslr import PcompresslrAPIClient, APIRequestError

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

# Admin set global HLE ceiling to 30%
# Requesting above the ceiling will error
try:
    result = client.complete(
        messages=[{"role": "user", "content": "Process payment"}],
        desired_hle=35,  # ERROR: exceeds ceiling of 30
        tags={"env": "production"},
    )
except APIRequestError as e:
    print(f"Error: {e}")  # "Requested HLE 35% exceeds workspace maximum of 30%"

# Correct usage: request within ceiling
result = client.complete(
    messages=[{"role": "user", "content": "Process payment"}],
    desired_hle=25,  # OK: below ceiling of 30
    tags={"env": "production"},
)

# Check if your HLE was lowered by admin ceiling
if result["routing_info"]["hle_clamped"]:
    print(f"HLE lowered from {result['routing_info']['requested_hle']} "
          f"to {result['routing_info']['effective_hle']} "
          f"by {result['routing_info']['hle_source']}-level ceiling")
```

**HLE Ceiling Logic:**
- `effective_hle = min(desired_hle, tag_hle, global_hle)`: the most restrictive (lowest) ceiling wins (sketched below)
- A lower ceiling forces cheaper models (tighter cost control)
- Requests above the ceiling return an error
- Tag-level ceilings combine with the global ceiling; the lowest value applies
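
To make the clamping rule concrete, here is a minimal sketch of the `min()` logic above (`effective_hle` is a hypothetical helper for illustration, not an SDK function):

```python
from typing import Optional

def effective_hle(desired_hle: float,
                  tag_hle: Optional[float] = None,
                  global_hle: Optional[float] = None) -> float:
    """Most restrictive (lowest) ceiling wins, per the rule above."""
    ceilings = [c for c in (tag_hle, global_hle) if c is not None]
    return min([desired_hle, *ceilings])

assert effective_hle(35, tag_hle=25, global_hle=30) == 25  # tag ceiling wins
assert effective_hle(20, global_hle=30) == 20              # below ceiling: passes through
```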

### Using Message and CompressionConfig Classes

For more structured code, use the `Message` and `CompressionConfig` dataclasses:

```python
from pcompresslr import PcompresslrAPIClient, Message, CompressionConfig

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

# Build messages and compression settings with the dataclasses
result = client.complete(
    messages=[
        Message(role="system", content="You are a helpful assistant.").to_dict(),
        Message(role="user", content="Hello!").to_dict(),
    ],
    desired_hle=30,
    compress=True,
    compress_output=False,
    compression_config=CompressionConfig(
        compress_system=False,
        compress_user=True,
        compress_assistant=False,
        compress_only_last_n_user=1,
    ).to_dict(),
    temperature=0.7,
    max_tokens=1000,
    tags={"env": "production"},
)

print(result["decompressed_response"])
print(f"Model used: {result['routing_info']['selected_model']}")
```

> **Note:** The `LightReach` wrapper class is available for backwards compatibility but uses deprecated parameters (`default_model`, `default_provider`). Use `PcompresslrAPIClient` directly for v1.0.0 intelligent routing.

### Compression Only (No LLM Call)

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

# Compress text without making an LLM call
compressed = client.compress(
    prompt="Your text with repeated content here...",
    model="gpt-4",      # Model for tokenization
    algorithm="greedy", # 'greedy' or 'optimal'
    tags={"env": "dev"} # Optional tags
)

print(compressed["llm_format"])
print(f"Compression ratio: {compressed['compression_ratio']:.2%}")

# Decompress later
decompressed = client.decompress(compressed["llm_format"])
print(decompressed["decompressed"])
```

### Command Line Interface

```bash
# Set your API key
export PCOMPRESLR_API_KEY=your-api-key

# Compress a prompt
pcompresslr "Your prompt with repeated text here..."

# Use optimal algorithm only
pcompresslr "Your prompt here" --optimal-only

# Use greedy algorithm only
pcompresslr "Your prompt here" --greedy-only
```

## API Reference

### `PcompresslrAPIClient`

Main API client for intelligent model routing and compression.

#### Constructor

```python
PcompresslrAPIClient(
    api_key: str = None,  # Falls back to env vars
    api_url: str = None,  # Default: https://api.compress.lightreach.io
    timeout: int = 120    # Request timeout in seconds
)
```

**Parameters:**
- `api_key` (str, optional): LightReach API key. Falls back to `LIGHTREACH_API_KEY` or `PCOMPRESLR_API_KEY` env vars.
- `api_url` (str, optional): Override base API URL. Falls back to `PCOMPRESLR_API_URL` env var.
- `timeout` (int, optional): Request timeout in seconds. Default: `120` (2 minutes for LLM calls).
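
For example, the two documented ways to supply the key (explicit argument vs. env-var fallback):

```python
import os

from pcompresslr import PcompresslrAPIClient

# Pass the LightReach key explicitly...
client = PcompresslrAPIClient(api_key=os.environ["PCOMPRESLR_API_KEY"])

# ...or rely on the LIGHTREACH_API_KEY / PCOMPRESLR_API_KEY fallback;
# a shorter timeout may suit compression-only calls (a judgment call,
# not a documented default).
client = PcompresslrAPIClient(timeout=30)
```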

#### Methods

##### `complete(messages, ...)`

Messages-first completion with intelligent routing (POST `/api/v2/complete`).

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `messages` | `list[dict]` | required | Conversation history with `role` and `content` |
| `llm_provider` | `str` | `None` | Provider constraint: `"openai"`, `"anthropic"`, `"google"`, `"deepseek"`, `"moonshot"`. Omit for cross-provider optimization |
| `desired_hle` | `float` | `None` | Quality preference (0-40, where 40 is SOTA). Must not exceed admin ceilings |
| `compress` | `bool` | `True` | Whether to compress messages |
| `compress_output` | `bool` | `False` | Whether to request compressed output from LLM |
| `algorithm` | `str` | `"greedy"` | Compression algorithm: `"greedy"` or `"optimal"` |
| `compression_config` | `dict` | `None` | Per-role compression settings (see below) |
| `temperature` | `float` | `None` | LLM temperature parameter |
| `max_tokens` | `int` | `None` | Maximum tokens to generate |
| `tags` | `dict[str, str]` | `None` | Tags for cost attribution and tag-level HLE ceilings |
| `max_history_messages` | `int` | `None` | Limit conversation history length |

**`compression_config` options:**

```python
{
    "compress_system": False,         # default
    "compress_user": True,            # default
    "compress_assistant": False,      # default
    "compress_only_last_n_user": 1,   # default (None = compress all)
}
```

**Response (dict):**

```python
{
    "decompressed_response": str,     # Final decompressed LLM response
    "compression_stats": {
        "original_size_chars": int,
        "compressed_size_chars": int,
        "original_tokens": int,
        "compressed_tokens": int,
        "compression_ratio": float,
        "token_savings": int,
        "token_savings_percent": float,
        "processing_time_ms": float,
    },
    "llm_stats": {
        "prompt_tokens": int,
        "completion_tokens": int,
        "total_tokens": int,
    },
    "routing_info": {
        "selected_model": str,          # Model chosen by system
        "selected_provider": str,       # Provider chosen by system
        "model_hle": float,             # HLE score of selected model
        "model_price_per_million": float,
        "requested_hle": float | None,
        "effective_hle": float | None,  # Effective HLE after admin ceilings
        "hle_source": str,              # "request", "tag", "global", or "none"
        "hle_clamped": bool,            # True if admin ceiling lowered desired_hle
    },
    "warnings": list[str],
    "cost_estimate": float | None,
    "savings_estimate": float | None,
}
```

**Deprecated parameters** (ignored in v1.0.0):
- `model`: System now selects models automatically
- `hle_target_percent`: Use `desired_hle` instead
- `min_hle_score`: Use `desired_hle` instead
- `auto_select_by_hle`: Always auto-selects now
- `same_provider_only`: Use `llm_provider` instead
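
A hypothetical migration from the deprecated parameters (old call shown as a comment; `msgs` is a placeholder for your message list):

```python
# v0.x style (these parameters are now ignored):
# client.complete(messages=msgs, model="gpt-4",
#                 min_hle_score=30, same_provider_only=True)

# v1.x intelligent routing:
result = client.complete(
    messages=msgs,
    desired_hle=30,         # replaces hle_target_percent / min_hle_score
    llm_provider="openai",  # replaces same_provider_only
)
```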

##### `compress(prompt, model, algorithm, tags)`

Compression-only (POST `/api/v1/compress`).

**Parameters:**
- `prompt` (str, required): Text to compress
- `model` (str, optional): Model for tokenization. Default: `"gpt-4"`
- `algorithm` (str, optional): `"greedy"` or `"optimal"`. Default: `"greedy"`
- `tags` (dict, optional): Tags for attribution

**Response (dict):**

```python
{
    "compressed": str,
    "dictionary": dict[str, str],
    "llm_format": str,
    "compression_ratio": float,
    "original_size": int,
    "compressed_size": int,
    "processing_time_ms": float,
    "algorithm": str,
}
```

##### `decompress(llm_format)`

Decompress an LLM-formatted compressed prompt (POST `/api/v1/decompress`).

**Parameters:**
- `llm_format` (str, required): The `llm_format` string from a compress response

**Response (dict):**

```python
{
    "decompressed": str,
    "processing_time_ms": float,
}
```

##### `health_check()`

Check API health status (GET `/health`).

**Response (dict):**

```python
{
    "status": str,
    "version": str,
}
```
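
For example, a quick liveness check before issuing completions:

```python
info = client.health_check()
print(f"API version {info['version']}: {info['status']}")
```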

### `LightReach` Class (Legacy)

> **Deprecated:** This wrapper uses `default_model` and `default_provider` parameters which are ignored in v1.0.0. Use `PcompresslrAPIClient` directly for intelligent routing.

Available for backwards compatibility. The `Pcompresslr` class is an alias for `LightReach`.

### Data Classes

#### `Message`

```python
from pcompresslr import Message

msg = Message(role="user", content="Hello!")
msg.to_dict()  # {"role": "user", "content": "Hello!"}
```

**Roles:** `"system"`, `"developer"`, `"user"`, `"assistant"`

#### `CompressionConfig`

```python
from pcompresslr import CompressionConfig

config = CompressionConfig(
    compress_system=False,        # default
    compress_user=True,           # default
    compress_assistant=False,     # default
    compress_only_last_n_user=1,  # default (None = compress all)
)
config.to_dict()
```

### Environment Variables

| Variable | Description |
|----------|-------------|
| `PCOMPRESLR_API_KEY` | Your LightReach API key (primary) |
| `LIGHTREACH_API_KEY` | Your LightReach API key (alternative) |
| `PCOMPRESLR_API_URL` | Override the API base URL (advanced/testing) |
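
Since `python-dotenv` is already a runtime dependency, you can keep these in a `.env` file; a minimal sketch (assuming your application calls `load_dotenv` itself):

```python
from dotenv import load_dotenv

from pcompresslr import PcompresslrAPIClient

load_dotenv()  # reads PCOMPRESLR_API_KEY / LIGHTREACH_API_KEY from .env
client = PcompresslrAPIClient()  # constructor falls back to those env vars
```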

### Exceptions

| Exception | Description |
|-----------|-------------|
| `PcompresslrAPIError` | Base exception class |
| `APIKeyError` | Invalid or missing API key |
| `RateLimitError` | Rate limit exceeded |
| `APIRequestError` | General API errors (including routing failures, HLE ceiling exceeded) |

```python
from pcompresslr import (
    PcompresslrAPIClient,
    APIKeyError,
    RateLimitError,
    APIRequestError,
)

try:
    result = client.complete(messages=[...])
except APIKeyError as e:
    print("Invalid API key")
except RateLimitError as e:
    print("Rate limited, please retry later")
except APIRequestError as e:
    print(f"API error: {e}")
```

## How It Works

Compress Light Reach uses intelligent algorithms to identify repeated substrings in your prompts and replace them with shorter placeholders.

The library:
1. Identifies repeated substrings using efficient suffix array algorithms
2. Calculates token savings for each potential replacement
3. Selects optimal replacements that reduce total token count
4. Intelligently routes to the best model based on your quality requirements
5. Formats the result for easy LLM consumption
6. Provides perfect decompression
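
To illustrate the core idea, here is a toy greedy version of steps 1-3 and 6 (a naive sketch with no suffix arrays or token accounting; the `§N§` placeholder scheme is invented for this example and is not the library's format):

```python
from typing import Dict, Tuple

def toy_compress(text: str, min_len: int = 10) -> Tuple[str, Dict[str, str]]:
    """Repeatedly replace the longest repeated substring with a placeholder.

    Assumes the sentinel character '§' does not occur in the input.
    """
    dictionary: Dict[str, str] = {}
    next_id = 0
    while True:
        best = None
        # Scan lengths from long to short; take the first repeated chunk.
        for length in range(len(text) // 2, min_len - 1, -1):
            seen = set()
            for i in range(len(text) - length + 1):
                chunk = text[i:i + length]
                if chunk in seen and "§" not in chunk:
                    best = chunk
                    break
                seen.add(chunk)
            if best:
                break
        if best is None:
            break
        token = f"§{next_id}§"
        next_id += 1
        dictionary[token] = best
        text = text.replace(best, token)
    return text, dictionary

def toy_decompress(text: str, dictionary: Dict[str, str]) -> str:
    # Placeholders never nest (chunks containing '§' are skipped above),
    # so replacing each token restores the original text exactly.
    for token, original in dictionary.items():
        text = text.replace(token, original)
    return text

original = "Write a story about a cat. Write a story about a dog."
compressed, mapping = toy_compress(original)
assert toy_decompress(compressed, mapping) == original
```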

## Examples

### Example 1: Complete with Compression

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

prompt = """
Write a story about a cat. The cat is very friendly. 
Write a story about a dog. The dog is very friendly.
Write a story about a bird. The bird is very friendly.
"""

result = client.complete(
    messages=[{"role": "user", "content": prompt}],
    desired_hle=30,
)

print(result["decompressed_response"])
print(f"Model used: {result['routing_info']['selected_model']}")
print(f"Token savings: {result['compression_stats']['token_savings']} tokens")
print(f"Compression ratio: {result['compression_stats']['compression_ratio']:.2%}")
```

### Example 2: Multi-turn Conversation

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

result = client.complete(
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "How do I read a file in Python?"},
        {"role": "assistant", "content": "You can use open() with a context manager..."},
        {"role": "user", "content": "How about writing to a file?"},
    ],
    desired_hle=30,
    compression_config={
        "compress_system": False,
        "compress_user": True,
        "compress_assistant": False,
        "compress_only_last_n_user": 2,  # Only compress last 2 user messages
    },
)
```

### Example 3: Provider-Constrained Request

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

# Constrain to Anthropic models only
result = client.complete(
    messages=[
        {"role": "system", "content": "You are a creative writing assistant."},
        {"role": "user", "content": "Write a haiku about coding."},
    ],
    llm_provider="anthropic",  # Only use Anthropic models
    desired_hle=30,
)

print(result["decompressed_response"])
print(f"Model: {result['routing_info']['selected_model']}")  # e.g., claude-3-haiku
```

## Getting an API Key

To use Compress Light Reach, you need an API key from [compress.lightreach.io](https://compress.lightreach.io).

1. Visit [compress.lightreach.io](https://compress.lightreach.io)
2. Sign up for an account
3. Get your API key from the dashboard
4. Set it as an environment variable: `export PCOMPRESLR_API_KEY=your-key`

## Security & Privacy

**BYOK model:** Provider keys (OpenAI/Anthropic/Google/etc.) are managed in the dashboard and **never passed through this SDK**. The SDK only uses your LightReach API key for authentication with the service.

### BYOK Provider Key Encryption (Required for Dashboard Settings → Provider Keys)

Provider keys are encrypted at rest using **Fernet** (symmetric authenticated encryption). The backend requires a Fernet key via:

- `API_KEY_ENCRYPTION_KEY`

Generate a key:

```bash
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
```

Set it in your runtime environment (examples):

- **Docker Compose**: set `API_KEY_ENCRYPTION_KEY` in your shell or `.env` before running `docker compose up`
- **GitHub Actions**: store the value as a GitHub Secret, then map it to the environment variable `API_KEY_ENCRYPTION_KEY` in your deploy workflow
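
For reference, a Fernet round-trip with the `cryptography` package looks like this (illustrative only; the actual key storage happens server-side, not in this SDK):

```python
import os

from cryptography.fernet import Fernet

fernet = Fernet(os.environ["API_KEY_ENCRYPTION_KEY"])

# Encrypt a provider key for storage at rest; decrypt() also verifies
# the authentication tag, so tampered ciphertext raises InvalidToken.
ciphertext = fernet.encrypt(b"sk-example-provider-key")
assert fernet.decrypt(ciphertext) == b"sk-example-provider-key"
```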

## Requirements

- Python 3.8+
- tiktoken >= 0.5.0
- requests >= 2.31.0
- urllib3 >= 2.0.0
- python-dotenv >= 1.0.0

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Support

- Documentation: [compress.lightreach.io/docs](https://compress.lightreach.io/docs)
- Issues: [GitHub Issues](https://github.com/lightreach/compress-lightreach/issues)
- Email: jonathankt@lightreach.io

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
