Metadata-Version: 2.4
Name: compress-lightreach
Version: 1.0.0
Summary: Intelligent compression algorithms for LLM prompts that reduce token usage
Home-page: https://compress.lightreach.io
Author: Light Reach
Author-email: Light Reach <jonathankt@lightreach.io>
License: MIT
Project-URL: Homepage, https://compress.lightreach.io
Project-URL: Documentation, https://compress.lightreach.io/docs
Project-URL: Source, https://github.com/lightreach/compress-lightreach
Project-URL: Bug Tracker, https://github.com/lightreach/compress-lightreach/issues
Keywords: llm,compression,prompt,token,optimization,ai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tiktoken>=0.5.0
Requires-Dist: requests>=2.31.0
Requires-Dist: urllib3>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: tzdata>=2023.3
Provides-Extra: api
Requires-Dist: fastapi>=0.104.0; extra == "api"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "api"
Requires-Dist: pydantic>=2.0.0; extra == "api"
Requires-Dist: pydantic-settings>=2.0.0; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Requires-Dist: slowapi>=0.1.9; extra == "api"
Requires-Dist: httpx>=0.25.0; extra == "api"
Requires-Dist: sqlalchemy>=2.0.0; extra == "api"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "api"
Requires-Dist: bcrypt>=4.0.0; extra == "api"
Requires-Dist: pyjwt>=2.8.0; extra == "api"
Requires-Dist: alembic>=1.12.0; extra == "api"
Requires-Dist: stripe>=7.0.0; extra == "api"
Requires-Dist: cryptography>=42.0.0; extra == "api"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Compress Light Reach

**Intelligent compression algorithms for LLM prompts that reduce token usage**

[![PyPI version](https://badge.fury.io/py/compress-lightreach.svg)](https://badge.fury.io/py/compress-lightreach)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Compress Light Reach is a Python library that compresses LLM prompts by replacing repeated substrings with shorter placeholders, reducing token usage and cost while keeping decompression lossless.

## Features

- **Token-aware compression**: Replaces only substrings longer than one token, each with a single-token placeholder
- **Dual algorithms**:
  - Fast greedy (~99% optimal) for daily use
  - Optimal DP (O(n²)) for critical prompts
- **Lossless**: Perfect decompression guaranteed
- **Output compression**: Optional model output compression support
- **Cloud API**: Uses Light Reach's cloud service for compression
- **Model-aware**: Optimized for GPT-4, GPT-3.5-turbo, Claude, and more
- **Intelligent Routing**: Automatic model selection based on quality requirements

## Installation

```bash
pip install compress-lightreach
```

## Quick Start (v1.0.0)

The SDK uses **intelligent model routing** and targets `POST /api/v2/complete`.

- Authenticate with your **LightReach API key** (env var `PCOMPRESLR_API_KEY`)
- Manage **provider keys** (OpenAI/Anthropic/Google) in the dashboard (BYOK)
- The system automatically selects the optimal model based on your requirements

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

result = client.complete(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    desired_hle=30,  # Quality preference (0-40, where 40 is SOTA)
)

print(result["decompressed_response"])
print(f"Selected: {result['routing_info']['selected_model']}")
print(f"Token savings: {result['compression_stats']['token_savings']}")
```

### With Output Compression

```python
result = client.complete(
    messages=[{"role": "user", "content": "Generate a long report..."}],
    desired_hle=25,
    compress_output=True,
)

print(result["decompressed_response"])
```

### Intelligent Model Routing (v1.0.0)

The system automatically selects the optimal model based on quality requirements and your available provider keys:

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

# Cross-provider optimization: system picks cheapest model meeting your quality bar
result = client.complete(
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    desired_hle=30,  # Quality preference (0-40, where 40 is SOTA)
)

# Check what was selected
print(result["routing_info"]["selected_model"])      # e.g., "gpt-4o-mini"
print(result["routing_info"]["selected_provider"])   # e.g., "openai"
print(result["routing_info"]["model_hle"])           # e.g., 32.5
print(result["routing_info"]["model_price_per_million"])  # e.g., 0.15
```

### Provider-Constrained Routing

Optionally constrain to a specific provider:

```python
# Only use OpenAI models, but pick the cheapest one meeting HLE 35
result = client.complete(
    messages=[{"role": "user", "content": "Write a poem"}],
    llm_provider="openai",  # Optional: constrain to one provider
    desired_hle=35,
)
```
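
The routing rule described above ("cheapest model meeting your quality bar", optionally within one provider) can be sketched locally. This is only an illustration of the selection idea, not the service's implementation: `ModelInfo`, `pick_model`, and the catalog entries (names, scores, prices) are hypothetical, and the tie-breaking rule is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelInfo:
    name: str
    provider: str
    hle: float                # quality score on the 0-40 scale
    price_per_million: float  # price per million tokens


def pick_model(
    catalog: Sequence[ModelInfo],
    desired_hle: float,
    llm_provider: Optional[str] = None,
) -> ModelInfo:
    """Return the cheapest model whose HLE meets the bar, optionally within one provider."""
    candidates = [
        m for m in catalog
        if m.hle >= desired_hle and (llm_provider is None or m.provider == llm_provider)
    ]
    if not candidates:
        raise ValueError("no model satisfies the requested HLE and provider constraint")
    # Cheapest first; break price ties in favour of the higher-quality model (assumption).
    return min(candidates, key=lambda m: (m.price_per_million, -m.hle))


# Hypothetical catalog: every value here is made up for illustration.
CATALOG = [
    ModelInfo("small-a", "openai", 22.0, 0.10),
    ModelInfo("mid-a", "openai", 32.5, 0.15),
    ModelInfo("mid-b", "anthropic", 31.0, 0.12),
    ModelInfo("large-b", "anthropic", 38.0, 3.00),
]

print(pick_model(CATALOG, desired_hle=30).name)                         # mid-b
print(pick_model(CATALOG, desired_hle=30, llm_provider="openai").name)  # mid-a
```

Constraining the provider can only shrink the candidate set, so the constrained choice is never cheaper than the cross-provider one.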

### HLE Cascading with Admin Controls

Admins can set quality **ceilings** via the dashboard (global or per-tag) to control costs. Your `desired_hle` is a preference, but requests will error if they exceed the admin-set ceiling:

```python
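from pcompresslr import APIRequestError  # assumes the exception is exported at the package root

# Admin set the global HLE ceiling to 30%
# Requesting above the ceiling raises an error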
try:
    result = client.complete(
        messages=[{"role": "user", "content": "Process payment"}],
        desired_hle=35,  # ❌ ERROR: exceeds ceiling of 30
        tags={"env": "production"},
    )
except APIRequestError as e:
    print(f"Error: {e}")  # "Requested HLE 35% exceeds workspace maximum of 30%"

# Correct usage: request within ceiling
result = client.complete(
    messages=[{"role": "user", "content": "Process payment"}],
    desired_hle=25,  # ✅ OK: below ceiling of 30
    tags={"env": "production"},
)

# Check if your HLE was lowered by admin ceiling
if result["routing_info"]["hle_clamped"]:
    print(f"HLE lowered from {result['routing_info']['requested_hle']} "
          f"to {result['routing_info']['effective_hle']} "
          f"by {result['routing_info']['hle_source']}-level ceiling")
```

**HLE Ceiling Logic:**
- `effective_hle = min(desired_hle, tag_hle, global_hle)`: the most restrictive value wins
- A lower ceiling forces cheaper models (tighter cost control)
- Requests above the ceiling return an error
- Tag-level and global ceilings are both applied; the lowest one wins
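
The `min` rule above can be worked through with a small local function. This is a sketch of the arithmetic only, not the service's code: `resolve_effective_hle` is a hypothetical helper, and how unset ceilings and ties are handled is an assumption. The `hle_source` labels follow the values listed in the API reference.

```python
from typing import Optional, Tuple


def resolve_effective_hle(
    desired_hle: Optional[float],
    tag_hle: Optional[float] = None,
    global_hle: Optional[float] = None,
) -> Tuple[Optional[float], str]:
    """Return (effective_hle, hle_source) using the 'lowest value wins' rule."""
    # Collect only the limits that are actually set (assumption: unset means no limit).
    limits = []
    if desired_hle is not None:
        limits.append((desired_hle, "request"))
    if tag_hle is not None:
        limits.append((tag_hle, "tag"))
    if global_hle is not None:
        limits.append((global_hle, "global"))
    if not limits:
        return None, "none"
    # min() keeps the first entry on ties, so the request wins a tie with a ceiling.
    return min(limits, key=lambda pair: pair[0])


print(resolve_effective_hle(25, global_hle=30))              # (25, 'request')
print(resolve_effective_hle(35, tag_hle=28, global_hle=30))  # (28, 'tag')
print(resolve_effective_hle(35, global_hle=30))              # (30, 'global')
```

The second call shows a tag-level ceiling undercutting the global one, which is the "lowest wins" case.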

### Command Line Interface

```bash
# Set your API key
export PCOMPRESLR_API_KEY=your-api-key

# Compress a prompt
pcompresslr "Your prompt with repeated text here..."

# Use optimal algorithm only
pcompresslr "Your prompt here" --optimal-only

# Use greedy algorithm only
pcompresslr "Your prompt here" --greedy-only
```

## API Reference

### `PcompresslrAPIClient`

Main API client for intelligent model routing and compression.

#### Constructor Parameters

- `api_key` (str, optional): LightReach API key. If omitted, the client reads `PCOMPRESLR_API_KEY` (or `LIGHTREACH_API_KEY`) from the environment.
- `api_url` (str, optional): Override base API URL (advanced/testing).
- `timeout` (int): Request timeout in seconds (default: 120).

#### Methods

##### `complete(messages, ...)`

Messages-first completion with intelligent routing (POST `/api/v2/complete`).

**Parameters:**
- `messages` (required): Conversation history as list of dicts with `role` and `content`
- `llm_provider` (optional): Provider constraint (`"openai"`, `"anthropic"`, `"google"`, etc.). Omit for cross-provider optimization.
- `desired_hle` (optional): Quality preference (0-40, where 40 is SOTA). Must not exceed the admin's global or tag-level ceilings; the request errors if it does.
- `tags` (optional): Dict of tags for cost attribution and tag-level HLE ceilings
- `compress` (optional): Whether to compress messages (default: `True`)
- `compress_output` (optional): Whether to request compressed output from LLM (default: `False`)
- `algorithm` (optional): Compression algorithm (`"greedy"` or `"optimal"`, default: `"greedy"`)
- `temperature` (optional): LLM temperature parameter
- `max_tokens` (optional): Maximum tokens to generate
- `compression_config` (optional): Per-role compression settings
- `max_history_messages` (optional): Limit conversation history length

**Response includes:**
- `decompressed_response`: Final decompressed LLM response
- `routing_info`: Details about model selection:
  - `selected_model`: Model chosen by system
  - `selected_provider`: Provider chosen by system
  - `model_hle`: HLE score of selected model
  - `effective_hle`: Effective HLE after applying admin ceilings (min of desired/tag/global)
  - `hle_source`: Which ceiling was applied: `"request"`, `"tag"`, `"global"`, or `"none"`
  - `hle_clamped`: `True` if admin ceiling lowered your requested HLE
- `compression_stats`: Token savings statistics
- `llm_stats`: Token usage from the LLM
- `warnings`: List of any warnings
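
As a usage illustration, the documented response fields can be folded into a short log line. The helper below is hypothetical (`summarize` is not part of the SDK) and reads every field defensively with `.get`, on the assumption that some keys may be absent in a given response.

```python
from typing import Any, Dict


def summarize(result: Dict[str, Any]) -> str:
    """Build a short, human-readable summary from a complete() result dict."""
    routing = result.get("routing_info", {})
    stats = result.get("compression_stats", {})
    lines = [
        f"model: {routing.get('selected_provider', '?')}/{routing.get('selected_model', '?')}",
        f"effective HLE: {routing.get('effective_hle', '?')} "
        f"(source: {routing.get('hle_source', 'none')})",
        f"token savings: {stats.get('token_savings', '?')}",
    ]
    # Surface warnings last so they are easy to spot in logs.
    lines += [f"warning: {w}" for w in result.get("warnings", [])]
    return "\n".join(lines)


# Hand-built stand-in for a real response; values are illustrative only.
example = {
    "routing_info": {
        "selected_model": "gpt-4o-mini",
        "selected_provider": "openai",
        "effective_hle": 30,
        "hle_source": "global",
    },
    "compression_stats": {"token_savings": 42},
    "warnings": [],
}
print(summarize(example))
```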

##### `compress(prompt, model, algorithm, tags)`

Compression-only (POST `/api/v1/compress`).

##### `decompress(llm_format)`

Decompress an LLM-formatted compressed prompt (POST `/api/v1/decompress`).

##### `health_check()`

Check API health status (GET `/health`).

### Environment Variables

- `PCOMPRESLR_API_KEY` (or `LIGHTREACH_API_KEY`): Your LightReach API key.
- `PCOMPRESLR_API_URL`: Override the API base URL (advanced/testing).
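
If you want to validate configuration before constructing the client, the lookup can be mirrored in a few lines. This is a sketch under assumptions: `resolve_api_key` is a hypothetical helper, and the precedence shown (explicit argument, then `PCOMPRESLR_API_KEY`, then `LIGHTREACH_API_KEY`) is a guess at a sensible order, not a documented guarantee.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick an API key: explicit argument first, then the documented env vars."""
    env = os.environ if env is None else env
    candidates = (
        explicit_key,
        env.get("PCOMPRESLR_API_KEY"),
        env.get("LIGHTREACH_API_KEY"),  # alternative name; precedence is assumed
    )
    for value in candidates:
        if value and value.strip():
            return value.strip()
    raise RuntimeError("No LightReach API key found; set PCOMPRESLR_API_KEY")


# Passing a plain dict instead of os.environ keeps the example side-effect free.
print(resolve_api_key(env={"LIGHTREACH_API_KEY": "key-from-env"}))  # key-from-env
```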

### Exceptions

- `APIKeyError`: Raised when API key is invalid or missing
- `RateLimitError`: Raised when rate limit is exceeded
- `APIRequestError`: Raised for general API errors (including routing failures)
- `PcompresslrAPIError`: Base exception class
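
A common way to handle a rate-limit error is to retry with exponential backoff. The helper below is a generic sketch, not part of the SDK: `call_with_backoff` and its parameters are hypothetical, and whether the client already retries internally is not stated here.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff when it raises one of retry_on."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")
```

With the SDK this might be used as `call_with_backoff(lambda: client.complete(messages=msgs), retry_on=(RateLimitError,))`, assuming `RateLimitError` can be imported from the package. Errors such as `APIKeyError` are deliberately not retried, since repeating the call cannot fix them.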

## How It Works

Compress Light Reach uses intelligent algorithms to identify repeated substrings in your prompts and replace them with shorter placeholders.

The library:
1. Identifies repeated substrings using efficient suffix array algorithms
2. Calculates token savings for each potential replacement
3. Selects optimal replacements that reduce total token count
4. Intelligently routes to the best model based on your quality requirements
5. Formats the result for easy LLM consumption
6. Provides perfect decompression
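
Steps 1 to 3 and 6 can be illustrated with a toy version. This is a simplified sketch, not the library's algorithm: it treats whitespace-separated words as tokens instead of using a real tokenizer, enumerates candidates by brute force rather than with suffix arrays, and assumes each dictionary entry costs its length plus one token. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Gram = Tuple[str, ...]


def replace_all(tokens: List[str], gram: Gram, placeholder: str) -> Tuple[List[str], int]:
    """Replace non-overlapping occurrences of gram, scanning left to right."""
    out, hits, i, n = [], 0, 0, len(gram)
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == gram:
            out.append(placeholder)
            hits += 1
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out, hits


def compress(tokens: List[str], max_len: int = 6) -> Tuple[List[str], Dict[str, Gram]]:
    """Greedily apply the replacement with the largest positive saving until none is left."""
    table: Dict[str, Gram] = {}
    while True:
        best_saving, best_gram = 0, None
        for n in range(2, max_len + 1):
            # dict.fromkeys de-duplicates candidates while keeping a stable order.
            grams = dict.fromkeys(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
            for gram in grams:
                _, hits = replace_all(tokens, gram, "")
                # Each hit shrinks n tokens to 1; the table entry costs n + 1 tokens.
                saving = hits * (n - 1) - (n + 1)
                if saving > best_saving:
                    best_saving, best_gram = saving, gram
        if best_gram is None:
            return tokens, table
        placeholder = f"<{len(table)}>"  # assumed not to occur in the input
        tokens, _ = replace_all(tokens, best_gram, placeholder)
        table[placeholder] = best_gram


def decompress(tokens: List[str], table: Dict[str, Gram]) -> List[str]:
    """Expand placeholders, newest first, since later entries may contain earlier ones."""
    for placeholder, gram in reversed(list(table.items())):
        expanded: List[str] = []
        for tok in tokens:
            expanded.extend(gram if tok == placeholder else (tok,))
        tokens = expanded
    return tokens


words = ("the quick brown fox jumps . the quick brown fox sleeps . "
         "the quick brown fox eats .").split()
packed, table = compress(words)
assert decompress(packed, table) == words  # lossless round trip
print(" ".join(packed), table)
```

A replacement is accepted only when the tokens saved across all occurrences exceed the cost of storing the dictionary entry, which is why a phrase that repeats just twice is often left alone.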

## Examples

### Example 1: Using `complete()` (Recommended)

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

result = client.complete(
    messages=[
        {"role": "system", "content": "You are a creative writing assistant."},
        {"role": "user", "content": "Write a story about a cat, a dog, and a bird."},
    ],
    desired_hle=30,
    compression_config={"compress_user": True, "compress_only_last_n_user": 1},
)

print(result["decompressed_response"])
print(f"Model used: {result['routing_info']['selected_model']}")
print(f"Token savings: {result['compression_stats']['token_savings']}")
```

### Example 2: Complete with Output Compression

```python
from pcompresslr import PcompresslrAPIClient

client = PcompresslrAPIClient(api_key="your-lightreach-api-key")

result = client.complete(
    messages=[{"role": "user", "content": "Generate a long report with repeated sections..."}],
    desired_hle=35,
    compress_output=True,
)

print(result["decompressed_response"])
```

## Getting an API Key

To use Compress Light Reach, you need an API key from [compress.lightreach.io](https://compress.lightreach.io).

1. Visit [compress.lightreach.io](https://compress.lightreach.io)
2. Sign up for an account
3. Get your API key from the dashboard
4. Set it as an environment variable: `export PCOMPRESLR_API_KEY=your-key`

## Security & Privacy

**BYOK model:** Provider keys (OpenAI/Anthropic/Google) are managed in the dashboard and **never passed through this SDK**. The SDK only uses your LightReach API key for authentication with the service.

### BYOK Provider Key Encryption (Required for Dashboard Settings → Provider Keys)

Provider keys are encrypted at rest using **Fernet** (symmetric authenticated encryption). The backend requires a Fernet key via:

- `API_KEY_ENCRYPTION_KEY`

Generate a key:

```bash
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
```

Set it in your runtime environment (examples):

- **Docker Compose**: set `API_KEY_ENCRYPTION_KEY` in your shell or `.env` before running `docker compose up`
- **GitHub Actions**: store the value as a GitHub Secret, then map it to the environment variable `API_KEY_ENCRYPTION_KEY` in your deploy workflow
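
To show what Fernet encryption at rest looks like, here is a minimal round trip using the `cryptography` package. It is a sketch, not the backend's code: the provider key string is a placeholder, and a fresh key is generated in place of reading `API_KEY_ENCRYPTION_KEY` so the snippet runs on its own.

```python
from cryptography.fernet import Fernet, InvalidToken

# In a deployment the key would come from API_KEY_ENCRYPTION_KEY; a fresh key
# is generated here only so the sketch is self-contained.
key = Fernet.generate_key()
fernet = Fernet(key)

provider_key = "sk-example-not-a-real-key"     # placeholder provider key
token = fernet.encrypt(provider_key.encode())  # ciphertext that would be stored
assert fernet.decrypt(token).decode() == provider_key

# A different key cannot decrypt the stored token.
try:
    Fernet(Fernet.generate_key()).decrypt(token)
except InvalidToken:
    print("decryption with the wrong key is rejected")
```

Because decryption depends on the exact key, losing or rotating `API_KEY_ENCRYPTION_KEY` without re-encrypting makes previously stored provider keys unreadable.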

## Requirements

- Python 3.8+
- tiktoken >= 0.5.0
- requests >= 2.31.0
- urllib3 >= 2.0.0
- python-dotenv >= 1.0.0
- tzdata >= 2023.3

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Support

- Documentation: [compress.lightreach.io/docs](https://compress.lightreach.io/docs)
- Issues: [GitHub Issues](https://github.com/lightreach/compress-lightreach/issues)
- Email: jonathankt@lightreach.io

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
