Metadata-Version: 2.4
Name: ptuner
Version: 0.2.1
Summary: Python client library for the ptuner prompt-tuning API
Project-URL: Homepage, https://prompts.church
Project-URL: Repository, https://github.com/ptuner/ptuner
Project-URL: Documentation, https://github.com/ptuner/ptuner/tree/main/client#readme
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: httpx<1,>=0.27
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Description-Content-Type: text/markdown

# ptuner

Python client for the **ptuner** prompt-tuning API.

Evaluate, compare, and iterate on LLM prompts with dataset-driven benchmarks,
exact-match scoring, and LLM-as-judge evaluation.

**Hosted at [prompts.church](https://prompts.church)**

## Installation

```bash
pip install ptuner
```

## Quick Start

```python
from ptuner import PtunerClient

client = PtunerClient(
    base_url="https://api.prompts.church",
    api_key="sk_...",
)

# 1. Create a project
project = client.create_project(
    name="Sentiment Analysis",
    description="Classify customer feedback",
)

# 2. Create a prompt with a version
prompt = client.create_prompt(
    project["id"],
    name="Sentiment Classifier",
    slug="sentiment-v1",
)

version = client.create_version(
    prompt["id"],
    system_template=(
        "You are a sentiment classifier. "
        "Respond with exactly one word: positive, negative, or neutral."
    ),
    message_template="Text: {{ text }}\n\nSentiment:",
)

# 3. Create a dataset
dataset = client.create_dataset(project["id"], name="Customer Reviews")

reviews = [
    {"text": "This product is amazing!", "label": "positive"},
    {"text": "Terrible quality, broke after one day.", "label": "negative"},
    {"text": "The package arrived on time.", "label": "neutral"},
]

for r in reviews:
    client.create_datapoint(
        dataset["id"],
        message_params=[{"role": "user", "params": {"text": r["text"]}}],
        exact_match_label=r["label"],
        external_id=r["text"][:64],  # optional: dedup on re-import
    )

# 4. Store your LLM API key (one-time)
client.create_credential(
    provider="openai",
    api_key="sk-your-openai-key",
    display_label="My Key",
)

# 5. Run evaluation
run = client.create_eval_run(
    project_id=project["id"],
    prompt_version_id=version["id"],
    dataset_id=dataset["id"],
    model_config={"model": "gpt-5-nano", "provider": "openai", "temperature": 0.0},
    judge_config={"judge_model": "gpt-5-mini"},
    iterations=3,
)

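# 6. Wait for the run to finish (poll every 2 s, for about 60 s), then check results
import time

for _ in range(30):
    status = client.get_eval_run(run["id"])
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(2)
else:
    # The loop ended without reaching a terminal status
    raise TimeoutError("eval run did not finish within about 60 seconds")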

results = client.list_eval_results(run["id"])
exact = [r["exact_match_score"] for r in results if r.get("exact_match_score") is not None]
judge = [r["judge_score"] for r in results if r.get("judge_score") is not None]

if exact:
    print(f"Exact match accuracy: {sum(exact)/len(exact):.1%}")
if judge:
    print(f"Judge avg score: {sum(judge)/len(judge):.2f}")
```
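
The polling loop in the Quick Start can be factored into a small reusable helper. The following is a minimal sketch, not part of the library: `wait_for_run` is a hypothetical name, and it only assumes what the Quick Start shows, namely that the run dict has a `status` key whose terminal values are `"completed"` and `"failed"`.

```python
import time
from typing import Any, Callable


def wait_for_run(
    get_run: Callable[[], dict[str, Any]],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll get_run() until the run is completed or failed, or raise on timeout.

    Hypothetical helper: get_run is any zero-argument callable returning the
    run dict, for example a lambda wrapping client.get_eval_run.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = get_run()
        if run["status"] in ("completed", "failed"):
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run['status']!r} after {timeout}s")
        sleep(interval)
```

With the client from the Quick Start, this could be called as `final = wait_for_run(lambda: client.get_eval_run(run["id"]))`. Taking a callable rather than the client keeps the helper testable without network access.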

## Authentication

Pass either an API key or a Firebase JWT:

```python
# API key (recommended)
client = PtunerClient(base_url="https://api.prompts.church", api_key="sk_...")

# Firebase JWT
client = PtunerClient(base_url="https://api.prompts.church", token="eyJ...")
```

Generate an API key in the UI at **Settings → Generate API Key**.
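
To keep secrets out of source code, the credentials can be read from the environment before constructing the client. This is a sketch under assumptions: `auth_kwargs` is a hypothetical helper, and `PTUNER_API_KEY` and `PTUNER_TOKEN` are variable names chosen for illustration, not names the library is documented to read.

```python
import os
from typing import Mapping


def auth_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build PtunerClient auth keyword arguments from environment variables.

    PTUNER_API_KEY and PTUNER_TOKEN are hypothetical names used only in this
    sketch. The API key wins if both are set.
    """
    api_key = env.get("PTUNER_API_KEY")
    if api_key:
        return {"api_key": api_key}
    token = env.get("PTUNER_TOKEN")
    if token:
        return {"token": token}
    raise RuntimeError("set PTUNER_API_KEY or PTUNER_TOKEN")
```

The result can then be unpacked into the constructor: `PtunerClient(base_url="https://api.prompts.church", **auth_kwargs())`.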

## Structured JSON Output

Force models to return structured JSON by adding `json_schema` when creating
a prompt version:

```python
version = client.create_version(
    prompt["id"],
    system_template="You are a sentiment expert. Return JSON with sentiment and confidence.",
    message_template="Text: {{ text }}",
    json_schema={
        "type": "object",
        "properties": {
            "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
            "confidence": {"type": "number"},
        },
        "required": ["sentiment", "confidence"],
        "additionalProperties": False,
    },
)
```

This works across all providers (OpenAI, Anthropic, Google): ptuner
translates the schema into each provider's structured output format automatically.

Omit `json_schema` (or set it to `None`) for plain text mode.
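
Even with a schema in place, it can be worth checking a reply on the client side before using it. The sketch below parses a reply that follows the sentiment schema above. `parse_sentiment` is a hypothetical helper, and it takes the reply as a plain string because this README does not document which field of an eval result carries the raw model output.

```python
import json

# Values allowed by the "enum" in the example schema
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_sentiment(raw: str) -> tuple[str, float]:
    """Parse a JSON reply shaped like the example schema and sanity-check it."""
    data = json.loads(raw)
    sentiment = data["sentiment"]    # KeyError if a required key is missing
    confidence = data["confidence"]
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError(f"confidence is not a number: {confidence!r}")
    return sentiment, float(confidence)
```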

## Comparing Prompt Versions

A common workflow: iterate on a prompt and compare versions against the same dataset.

```python
v2 = client.create_version(
    prompt["id"],
    system_template="You are a sentiment analysis expert. Respond: positive, negative, or neutral.",
    message_template="Text: {{ text }}\n\nSentiment:",
)

run_v2 = client.create_eval_run(
    project_id=project["id"],
    prompt_version_id=v2["id"],
    dataset_id=dataset["id"],
    model_config={"model": "gpt-5-nano", "provider": "openai", "temperature": 0.0},
    iterations=3,
)
# Compare results between v1 and v2 in the UI or via the API
```
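
To compare the two versions via the API, the per-result scores can be reduced to one accuracy figure per run. This is a minimal sketch: `exact_match_accuracy` is a hypothetical helper, and it assumes, as the Quick Start does, that each result is a dict whose `exact_match_score` is numeric or `None`.

```python
from typing import Any, Iterable, Optional


def exact_match_accuracy(results: Iterable[dict[str, Any]]) -> Optional[float]:
    """Mean exact_match_score over the results that have one, else None."""
    scores = [
        r["exact_match_score"]
        for r in results
        if r.get("exact_match_score") is not None
    ]
    return sum(scores) / len(scores) if scores else None
```

Once both runs have finished, `exact_match_accuracy(client.list_eval_results(run["id"]))` and `exact_match_accuracy(client.list_eval_results(run_v2["id"]))` give the two numbers to compare.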

## API Reference

### Client

| Method | Description |
|---|---|
| `PtunerClient(base_url, api_key=, token=, timeout=)` | Create a client |
| `client.close()` | Close the HTTP connection |

The client can also be used as a context manager: `with PtunerClient(...) as client:`

### User

| Method | Description |
|---|---|
| `get_me()` | Get current user info |
| `generate_api_key()` | Generate a new API key |

### Projects

| Method | Description |
|---|---|
| `list_projects()` | List all projects |
| `create_project(name, description="")` | Create a project |
| `get_project(project_id)` | Get project details |
| `list_members(project_id)` | List project members |
| `add_member(project_id, email, role="editor")` | Add a member |

### Prompts & Versions

| Method | Description |
|---|---|
| `list_prompts(project_id)` | List prompts in a project |
| `create_prompt(project_id, name, slug)` | Create a prompt |
| `list_versions(prompt_id)` | List versions of a prompt |
| `create_version(prompt_id, system_template=, message_template=, json_schema=)` | Create a version |

### Datasets & Datapoints

| Method | Description |
|---|---|
| `list_datasets(project_id)` | List datasets |
| `create_dataset(project_id, name)` | Create a dataset |
| `list_datapoints(dataset_id)` | List datapoints |
| `create_datapoint(dataset_id, system_params=, message_params=, exact_match_label=, acceptance_criteria=, labels=, external_id=)` | Add a datapoint (upserts if `external_id` already exists in the dataset) |
| `update_datapoint(datapoint_id, **fields)` | Update a datapoint |
| `delete_datapoint(datapoint_id)` | Delete a datapoint |
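
Because `create_datapoint` upserts on `external_id`, two datapoints that share an ID overwrite each other. The Quick Start uses the first 64 characters of the text, which collides for texts with a common prefix. One option, sketched here, is to hash the full text instead. `stable_external_id` is a hypothetical helper, and the sketch assumes the API accepts a 32-character hex string as an `external_id`.

```python
import hashlib


def stable_external_id(text: str, length: int = 32) -> str:
    """Derive a deterministic ID from the full text (truncated SHA-256 hex)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digest[:length]
```

The same text always maps to the same ID, so re-importing a dataset updates existing datapoints instead of duplicating them: `external_id=stable_external_id(r["text"])`.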

### LLM Credentials

| Method | Description |
|---|---|
| `list_credentials()` | List stored credentials |
| `create_credential(provider, api_key, project_id=, display_label=)` | Store a credential |
| `update_credential(credential_id, **fields)` | Update a credential |
| `delete_credential(credential_id)` | Delete a credential |
| `resolve_credential(project_id, provider)` | Resolve which credential will be used |

### Eval Runs

| Method | Description |
|---|---|
| `create_eval_run(project_id, prompt_version_id, dataset_id, model_config=, judge_config=, iterations=1)` | Start an eval run |
| `get_eval_run(run_id)` | Get run status |
| `list_eval_results(run_id)` | Get run results |
| `list_project_runs(project_id)` | List all runs in a project |

## Examples

See [examples/benchmark_sentiment.py](examples/benchmark_sentiment.py) for a
full end-to-end benchmark that compares multiple models with both plain text
and structured JSON output.

## License

MIT
