Metadata-Version: 2.4
Name: caliper-sdk
Version: 0.0.0
Summary: Auto-instrumentation SDK for LLM API observability
Author: Oliver Guy
License-Expression: GPL-3.0-only
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.12
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: structlog>=24.0
Requires-Dist: tenacity>=8.0
Provides-Extra: dev
Requires-Dist: anthropic>=0.30.0; extra == 'dev'
Requires-Dist: openai>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.9.0; extra == 'dev'
Provides-Extra: s3
Requires-Dist: boto3>=1.35.0; extra == 's3'
Description-Content-Type: text/markdown

# Caliper Python SDK

The Caliper SDK auto-instruments the OpenAI and Anthropic SDKs for observability.

It captures token usage, latency, time to first token (TTFT), and any number of custom features from LLM SDK calls. Basic metrics need no code changes beyond a single `init()` call, and annotating a call with custom metrics takes one extra line.

## Install

```bash
pip install caliper-sdk              # auto-detects installed provider SDKs
pip install "caliper-sdk[s3]"        # S3 export support (quoted so shells like zsh do not expand the brackets)
```

## Quick start

```bash
make install-dev
```

```python
import caliper
import anthropic

caliper.init(target="dev")  # writes to caliper_records.jsonl

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=128,
    messages=[{"role": "user", "content": "Hello"}],
)

caliper.shutdown()
```

## Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `target` | `str` | `"dev"` | Export handler: `"dev"` (local JSONL) or `"s3"` (S3 bucket) |
| `s3_bucket` | `str \| None` | `None` | Required for `"s3"` target. Also reads `CALIPER_S3_BUCKET` env var |
| `s3_access_key` | `str \| None` | `None` | S3 access key (optional if using IAM roles) |
| `s3_secret_key` | `str \| None` | `None` | S3 secret key (optional if using IAM roles) |
| `s3_region` | `str` | `"us-east-1"` | S3 region |
| `s3_prefix` | `str` | `""` | Key prefix for S3 objects |
| `s3_endpoint` | `str \| None` | `None` | Custom S3-compatible endpoint URL |
| `flush_interval` | `float` | `2.0` | Seconds between background flushes |
| `batch_size` | `int` | `250` | Max records per export call |
| `max_queue_size` | `int` | `10_000` | Backpressure limit; when the queue is full, the oldest records are dropped |
| `max_retries` | `int` | `3` | Retry count for HTTP transport |
| `file_path` | `str` | `"caliper_records.jsonl"` | File path for `"dev"` target records |
| `annotations_file_path` | `str \| None` | `None` | File path for `"dev"` target annotations. When `None`, `caliper_annotations.jsonl` is used |
| `debug` | `bool` | `False` | Verbose logging |

All parameters can also be set via environment variables with the `CALIPER_` prefix (e.g. `CALIPER_S3_BUCKET` for `s3_bucket`).
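
To make the naming convention concrete, here is a minimal stdlib-only sketch of how prefixed variables line up with parameter names. It is an illustration, not the SDK's own loader: `caliper_env_overrides` is a hypothetical helper, and the assumption that every parameter maps to its upper-cased name (e.g. `CALIPER_TARGET`) is extrapolated from the single `CALIPER_S3_BUCKET` example.

```python
import os

PREFIX = "CALIPER_"


def caliper_env_overrides(environ=None):
    """Return {parameter_name: raw_string_value} for CALIPER_-prefixed variables.

    Hypothetical helper for illustration only; values stay as strings and
    are not validated or converted to the types listed in the table.
    """
    environ = os.environ if environ is None else environ
    return {
        name[len(PREFIX):].lower(): value
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }


# A fake environment so the example does not depend on the real one.
example_env = {
    "CALIPER_S3_BUCKET": "my-bucket",
    "CALIPER_TARGET": "s3",  # assumed name, following the same pattern
    "HOME": "/home/dev",     # unrelated variables are ignored
}
overrides = caliper_env_overrides(example_env)
print(overrides)
```

Calling `caliper_env_overrides()` with no argument inspects the real process environment, which can help when checking which settings a deployment will pick up.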

## Metadata

### Context-scoped (block)

```python
with caliper.features(user_id="123", feature="chat"):
    client.messages.create(...)  # record gets {"user_id": "123", "feature": "chat"}
```

### Per-request (kwarg)

```python
client.messages.create(
    ...,
    caliper_metadata={"campaign": "q4"},
)
```

Per-request metadata takes precedence over context metadata.
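
One way to picture the precedence rule is a plain dictionary merge. The sketch below assumes the override happens key by key, so context keys that the request does not set are kept. The SDK may resolve this differently, and `effective_metadata` is a hypothetical name used only for this example.

```python
def effective_metadata(context_metadata, request_metadata=None):
    """Merge two metadata dicts; per-request values win on key collisions.

    Illustrative only: assumes a key-by-key override, not a wholesale
    replacement of the context metadata.
    """
    merged = dict(context_metadata)        # start from the context block
    merged.update(request_metadata or {})  # per-request keys overwrite
    return merged


context = {"user_id": "123", "feature": "chat"}
per_request = {"feature": "summarize", "campaign": "q4"}

result = effective_metadata(context, per_request)
print(result)
```

Here `feature` appears in both places, so the per-request value is the one that ends up in the merged dictionary, while `user_id` carries over from the context.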

### Linking requests

Link LLM calls to a prior request to track multi-turn conversations, retry chains, etc.

```python
response = client.messages.create(...)
first_id = caliper.last_request_id()

with caliper.features(previous_request=first_id, feature="followup"):
    followup = client.messages.create(...)  # record gets linked_request_id=first_id
```
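
Once records carry a link to their predecessor, a conversation or retry chain can be rebuilt offline by walking those links backwards. The sketch below works on plain dictionaries and assumes each exported record has a `request_id` field and a `linked_request_id` field; the exact record schema is not documented here, so treat both field names and the `conversation_chain` helper as hypothetical.

```python
def conversation_chain(records, last_id):
    """Follow linked_request_id pointers back from last_id.

    Returns request IDs oldest-first. Stops at a missing link, an unknown
    ID, or a cycle. Field names are assumptions about the record schema.
    """
    by_id = {record["request_id"]: record for record in records}
    chain = []
    seen = set()
    current = last_id
    while current is not None and current in by_id and current not in seen:
        seen.add(current)
        chain.append(current)
        current = by_id[current].get("linked_request_id")
    return chain[::-1]


records = [
    {"request_id": "req-1", "linked_request_id": None},
    {"request_id": "req-2", "linked_request_id": "req-1"},
    {"request_id": "req-3", "linked_request_id": "req-2"},
    {"request_id": "req-9", "linked_request_id": None},  # unrelated request
]
chain = conversation_chain(records, "req-3")
print(chain)
```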

## Post-request annotations

Attach metadata to requests **after** they complete — user feedback, classification labels, eval scores, etc.

```python
# Get the ID of the most recent request
request_id = caliper.last_request_id()

# Annotate implicitly (uses last request)
caliper.annotate(sentiment="positive")

# Annotate explicitly by request ID
caliper.annotate(request_id, user_feedback="thumbs_up")

# Multiple annotations per request are allowed
caliper.annotate(request_id, reviewed_by="human")
```

Each annotation is keyed by the caliper-generated `request_id` for joining with the main request record.
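
As an illustration of that join, the following sketch merges annotation rows into request records using only the standard library. It assumes both files are JSON Lines and that each annotation row is a flat object with a `request_id` plus the annotation keys. The real file layout may differ, and `join_annotations` is a hypothetical helper.

```python
import json
from collections import defaultdict


def join_annotations(record_lines, annotation_lines):
    """Attach all annotations for a request under record["annotations"].

    Later annotation rows overwrite earlier ones for the same key.
    The row layout is an assumption, not the documented schema.
    """
    annotations = defaultdict(dict)
    for line in annotation_lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        request_id = row.pop("request_id")
        annotations[request_id].update(row)

    joined = []
    for line in record_lines:
        if not line.strip():
            continue
        record = json.loads(line)
        record["annotations"] = dict(annotations.get(record["request_id"], {}))
        joined.append(record)
    return joined


# In-memory stand-ins for the records file and the annotations file.
record_lines = [
    '{"request_id": "req-1", "model": "example-model"}',
    '{"request_id": "req-2", "model": "example-model"}',
]
annotation_lines = [
    '{"request_id": "req-1", "user_feedback": "thumbs_up"}',
    '{"request_id": "req-1", "reviewed_by": "human"}',
]
joined = join_annotations(record_lines, annotation_lines)
print(json.dumps(joined, indent=2))
```

With real files, the two arguments can be open file handles, since iterating a text file yields one line at a time.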

## Versioning and releases

The package version is derived from **git tags** at build time using [hatch-vcs](https://github.com/ofek/hatch-vcs). There is no hardcoded version in `pyproject.toml`.

**Releasing a new version:**

```bash
git tag v0.2.0
git push origin v0.2.0
```

CI detects the tag, builds the package (`uv build`), and publishes to PyPI. The `v` prefix is stripped automatically — tag `v0.2.0` produces package version `0.2.0`.

**During development** (no tag on HEAD), editable installs get a dev version like `0.1.0.dev3+gabc1234` based on the last tag and number of commits since.
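
For scripts that need to tell a tagged release from a development build, the version string can be taken apart with a regular expression. This is a rough sketch based only on the shape of the example above (`<base>.dev<N>+g<commit>`, with an optional `.dYYYYMMDD` suffix that some setups append for uncommitted changes). `tag_to_version` and `parse_dev_version` are hypothetical helpers, not part of the SDK or hatch-vcs.

```python
import re

# Assumed shape of a development version string.
DEV_VERSION = re.compile(
    r"^(?P<base>\d+(?:\.\d+)*)"   # e.g. 0.1.0
    r"\.dev(?P<distance>\d+)"     # commits since the last tag
    r"\+g(?P<commit>[0-9a-f]+)"   # abbreviated commit hash
    r"(?:\.d\d{8})?$"             # optional dirty-tree date suffix
)


def tag_to_version(tag):
    """Mirror the documented rule: tag v0.2.0 produces version 0.2.0."""
    return tag.removeprefix("v")


def parse_dev_version(version):
    """Return the parts of a dev version, or None if the shape differs."""
    match = DEV_VERSION.match(version)
    if match is None:
        return None
    return {
        "base": match["base"],
        "commits_since_tag": int(match["distance"]),
        "commit": match["commit"],
    }


release = tag_to_version("v0.2.0")
dev_info = parse_dev_version("0.1.0.dev3+gabc1234")
print(release, dev_info)
```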

**Checking the version:**

```python
import caliper
print(caliper.__version__)
```

Or from the command line:

```bash
uv run python -c "import caliper; print(caliper.__version__)"
```

## Make targets

```
make install                   Install production dependencies
make install-dev               Install development dependencies
make lint                      Run ruff linter
make format                    Run ruff formatter
make test                      Run all tests
make test-sample P=10          Run random P% of tests
make up                        Start services with docker compose
make down                      Stop services with docker compose
make reload                    Rebuild and restart containers
```
