Metadata-Version: 2.4
Name: traqo
Version: 0.7.4
Summary: Structured tracing for applications. JSONL files, hierarchical spans, zero infrastructure.
Project-URL: Homepage, https://github.com/Cecuro/traqo
Project-URL: Repository, https://github.com/Cecuro/traqo
Author: Cecuro
License-Expression: MIT
License-File: LICENSE
Keywords: jsonl,llm,observability,spans,tracing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: zstandard>=0.23
Provides-Extra: all
Requires-Dist: anthropic>=0.40; extra == 'all'
Requires-Dist: boto3>=1.28; extra == 'all'
Requires-Dist: google-cloud-storage>=2.10; extra == 'all'
Requires-Dist: google-genai>=1.0; extra == 'all'
Requires-Dist: langchain-core>=0.3; extra == 'all'
Requires-Dist: openai>=1.0; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.40; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: pyright>=1.1.400; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.12.0; extra == 'dev'
Provides-Extra: gcs
Requires-Dist: google-cloud-storage>=2.10; extra == 'gcs'
Provides-Extra: gemini
Requires-Dist: google-genai>=1.0; extra == 'gemini'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.3; extra == 'langchain'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Provides-Extra: s3
Requires-Dist: boto3>=1.28; extra == 's3'
Description-Content-Type: text/markdown

<p align="center">
  <img src="frontend/public/favicon.svg" width="96" height="96" alt="Pedro the Raccoon — traqo mascot">
</p>

# traqo

Structured tracing for applications. JSONL files, hierarchical spans, zero infrastructure.

```python
from traqo import Tracer, trace
from pathlib import Path

@trace
def classify(text: str) -> str:
    response = llm.chat(text)  # `llm` stands in for your own LLM client
    return response

with Tracer(Path("traces/run.jsonl"), input={"query": "Is this a bug?"}):
    result = classify("Is this a bug?")
```

Traces are saved as compressed `.jsonl.gz` files (this example produces `traces/run.jsonl.gz`). Read them with `gzcat | jq`, query them with DuckDB, explore them visually with `traqo ui`, or hand them to an AI assistant.

## Why traqo?

- **Zero infrastructure** -- no server, no database, no account. `pip install traqo` and go.
- **AI-first** -- JSONL is text. AI assistants read your traces directly, no browser needed.
- **Hierarchical spans** -- not flat logs. Reconstruct the full call tree across functions and files.
- **Everything is a span** -- LLM calls, DB queries, HTTP requests. All spans with metadata.
- **Minimal dependencies** -- one runtime dep (`zstandard`). Integrations are optional extras.
- **Transparent** -- traces are portable files. No vendor lock-in, no proprietary format.

## Install

```bash
pip install traqo                   # Core (requires zstandard)
pip install "traqo[openai]"         # + OpenAI integration
pip install "traqo[anthropic]"      # + Anthropic integration
pip install "traqo[langchain]"      # + LangChain integration
pip install "traqo[gemini]"         # + Google Gemini integration
pip install "traqo[s3]"             # + Amazon S3 (boto3)
pip install "traqo[gcs]"            # + Google Cloud Storage
pip install "traqo[all]"            # Everything
```

## Quick Start

### 1. Trace a function

```python
from traqo import Tracer, trace
from pathlib import Path

@trace
def summarize(text: str) -> str:
    summary = text[:200]  # placeholder: replace with your summarization logic
    return summary

@trace
def pipeline(docs: list[str]) -> list[str]:
    return [summarize(doc) for doc in docs]

with Tracer(
    Path("traces/my_run.jsonl"),
    input={"docs": ["doc1", "doc2"]},
    tags=["production"],
) as tracer:
    results = pipeline(["doc1", "doc2"])
    tracer.set_output({"count": len(results)})
```

`@trace` works with sync functions, async functions, and generators, and detects which kind it is wrapping automatically.
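One way a decorator like `@trace` can tell these cases apart is with the standard `inspect` predicates, choosing a different wrapper for each kind so that async spans close after the `await` and generator spans stay open while the caller iterates. The sketch below is illustrative only and is not traqo's source: `sketch_trace` and the in-memory `EVENTS` list are hypothetical stand-ins for real span recording, and the sketch also covers async generators.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable

# Hypothetical in-memory event log standing in for a real JSONL writer.
EVENTS: list[tuple[str, str]] = []


def sketch_trace(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record start/end events around fn, picking a wrapper by function kind."""
    name = fn.__qualname__

    if inspect.isasyncgenfunction(fn):
        @functools.wraps(fn)
        async def agen_wrapper(*args: Any, **kwargs: Any) -> Any:
            EVENTS.append(("start", name))
            try:
                async for item in fn(*args, **kwargs):
                    yield item
            finally:
                EVENTS.append(("end", name))
        return agen_wrapper

    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
            EVENTS.append(("start", name))
            try:
                return await fn(*args, **kwargs)
            finally:
                EVENTS.append(("end", name))
        return async_wrapper

    if inspect.isgeneratorfunction(fn):
        @functools.wraps(fn)
        def gen_wrapper(*args: Any, **kwargs: Any) -> Any:
            EVENTS.append(("start", name))
            try:
                yield from fn(*args, **kwargs)
            finally:
                EVENTS.append(("end", name))
        return gen_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
        EVENTS.append(("start", name))
        try:
            return fn(*args, **kwargs)
        finally:
            EVENTS.append(("end", name))
    return sync_wrapper


@sketch_trace
def double(x: int) -> int:
    return 2 * x


@sketch_trace
async def fetch(x: int) -> int:
    await asyncio.sleep(0)
    return x + 1


@sketch_trace
def count(n: int):
    yield from range(n)


@sketch_trace
async def ticks(n: int):
    for i in range(n):
        yield i


async def collect() -> list[int]:
    return [i async for i in ticks(2)]


print(double(2), asyncio.run(fetch(1)), list(count(3)), asyncio.run(collect()))
# 4 2 [0, 1, 2] [0, 1]
```

Note the design choice for generators in this sketch: the span opens on the first iteration and closes only when the generator is exhausted or closed, so it measures the whole stream rather than just the call that created the generator object.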

### 2. Auto-trace LLM calls

```python
from traqo.integrations.openai import traced_openai
from openai import OpenAI

client = traced_openai(OpenAI(), operation="summarize")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this..."}],
)
# Token usage, model, input/output all captured automatically as span metadata
```

Works the same way for Anthropic, Gemini, and LangChain:

```python
from traqo.integrations.anthropic import traced_anthropic
from traqo.integrations.gemini import traced_gemini
from traqo.integrations.langchain import traced_model
```

All integrations automatically capture token usage, model parameters, tool calls, and streaming responses, including time to first token (TTFT).

### 3. Use metadata, tags, and kind

```python
from pathlib import Path
from traqo import Tracer, LLM

with Tracer(Path("traces/run.jsonl"), tags=["prod"]) as tracer:
    with tracer.span(
        "classify",
        input={"text": "Is this a bug?"},
        metadata={"model": "gpt-4o", "provider": "openai"},
        tags=["llm"],
        kind=LLM,
    ) as span:
        result = call_llm(...)  # your own LLM call
        span.set_metadata("token_usage", {"input_tokens": 100, "output_tokens": 50})
        span.set_output(result)
```

Kind constants: `LLM`, `TOOL`, `RETRIEVER`, `CHAIN`, `AGENT`, `EMBEDDING`, `GUARDRAIL` (or use any string).

### 4. Access the current span from anywhere

```python
from traqo import trace, update_current_span

@trace
def classify(text: str) -> str:
    update_current_span(metadata={"confidence": 0.95, "model": "gpt-4o"})
    return "bug"  # placeholder for your classification result
```

`update_current_span()` is a convenience helper — no-op when no span is active. For full control, use `get_current_span()` directly.
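Helpers like these can work "from anywhere" when the active span is tracked per execution context. The sketch below shows one common standard-library way to do that with `contextvars`; it is an assumption about the general mechanism, not traqo's source, and `SketchSpan`, `sketch_span`, and the `*_sketch` helpers are hypothetical names.

```python
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class SketchSpan:
    name: str
    metadata: dict[str, Any] = field(default_factory=dict)


# Each thread and each asyncio task works on its own copy of the context,
# so concurrent work does not share a "current span".
_current: contextvars.ContextVar[Optional[SketchSpan]] = contextvars.ContextVar(
    "current_span", default=None
)


@contextmanager
def sketch_span(name: str) -> Iterator[SketchSpan]:
    """Make a new span current for the duration of the with-block."""
    span = SketchSpan(name)
    token = _current.set(span)
    try:
        yield span
    finally:
        _current.reset(token)  # restore the enclosing span (or None)


def get_current_span_sketch() -> Optional[SketchSpan]:
    return _current.get()


def update_current_span_sketch(**metadata: Any) -> None:
    span = _current.get()
    if span is None:
        return  # no active span: silently do nothing
    span.metadata.update(metadata)


def classify(text: str) -> str:
    # No span object is passed in; the helper finds it via the ContextVar.
    update_current_span_sketch(confidence=0.95)
    return "bug"


with sketch_span("pipeline") as outer:
    with sketch_span("classify") as inner:
        classify("Is this a bug?")

classify("outside any span")  # the update is a no-op here
print(inner.metadata, outer.metadata, get_current_span_sketch())
# {'confidence': 0.95} {} None
```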

### 5. Read your traces

```bash
# gzcat ships with macOS; on Linux use zcat (or gzip -dc) instead

# Last line is always trace_end with summary stats
gzcat traces/my_run.jsonl.gz | tail -1 | jq .

# All LLM spans
gzcat traces/my_run.jsonl.gz | jq -c 'select(.kind == "llm")'

# Filter by tag (here: "production")
gzcat traces/my_run.jsonl.gz | jq -c 'select((.tags // []) | any(. == "production"))'

# Errors across all trace files
find traces -name '*.jsonl.gz' -exec gzcat {} + | jq -c 'select(.status == "error")'

# Token usage from span metadata
gzcat traces/my_run.jsonl.gz | jq -c 'select(.metadata.token_usage) | .metadata.token_usage'

# Or use the built-in trace viewer UI
traqo ui ./traces
```
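The same kinds of queries can be run from Python with only the standard library. This sketch assumes what the JSONL Format section documents: one JSON object per line with a `type` field, span `id`s shared between `span_start` and `span_end`, and a final `trace_end` line. The helper names are hypothetical, and the sample events are made up for illustration, not real traqo output.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Iterator, Optional


def iter_events(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one decoded event per non-empty line of a .jsonl.gz trace."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def error_spans(path: Path) -> list[str]:
    """Names of spans whose span_end has status "error".

    span_end carries the span id but not its name, so names are looked up
    from the matching span_start.
    """
    names: dict[Any, str] = {}
    errors: list[str] = []
    for event in iter_events(path):
        if event.get("type") == "span_start":
            names[event["id"]] = event.get("name", "?")
        elif event.get("type") == "span_end" and event.get("status") == "error":
            errors.append(names.get(event["id"], str(event["id"])))
    return errors


def trace_summary(path: Path) -> Optional[dict[str, Any]]:
    """Return the final trace_end event, or None if the trace is incomplete."""
    last: Optional[dict[str, Any]] = None
    for event in iter_events(path):
        last = event
    return last if last is not None and last.get("type") == "trace_end" else None


# Hand-written sample events to exercise the helpers.
demo = Path(tempfile.mkdtemp()) / "demo.jsonl.gz"
sample = [
    {"type": "trace_start", "tags": ["demo"]},
    {"type": "span_start", "id": "s1", "parent_id": None, "name": "fetch_docs"},
    {"type": "span_end", "id": "s1", "duration_s": 0.2, "status": "error"},
    {"type": "trace_end", "duration_s": 0.3},
]
with gzip.open(demo, "wt", encoding="utf-8") as f:
    f.writelines(json.dumps(e) + "\n" for e in sample)

print(error_spans(demo))                  # ['fetch_docs']
print(trace_summary(demo)["duration_s"])  # 0.3
```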

## Claude Agent SDK Integration

Trace [Claude Agent SDK](https://github.com/anthropics/claude-code/tree/main/packages/claude-agent-sdk) sessions with an async context manager. The Stop hook converts the session transcript into a traqo trace automatically.

```python
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions
from traqo.integrations.claude_agent_sdk import traqo_agent


async def main() -> None:
    async with traqo_agent("code-review", output_dir="./traces", tags=["review"]) as hooks:
        async for msg in query(
            prompt="Review this PR for security issues",
            options=ClaudeAgentOptions(hooks=hooks),
        ):
            print(msg)


asyncio.run(main())
```

Nest multiple agents inside a parent trace for pipeline orchestration:

```python
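import asyncio
from pathlib import Path
from claude_agent_sdk import query, ClaudeAgentOptions
from traqo import Tracer
from traqo.integrations.claude_agent_sdk import traqo_agent

async def pipeline() -> None:
    with Tracer(Path("traces/pipeline.jsonl"), tags=["ci"]) as tracer:
        async with traqo_agent("code-review", tags=["review"]) as hooks:
            async for msg in query(prompt="Review", options=ClaudeAgentOptions(hooks=hooks)):
                ...
        async with traqo_agent("test-gen", tags=["testing"]) as hooks:
            async for msg in query(prompt="Generate tests", options=ClaudeAgentOptions(hooks=hooks)):
                ...

asyncio.run(pipeline())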
```

The parent `trace_end` rolls up token usage and span counts from all child agents.

## Agent Skill

Give your AI coding assistant full knowledge of traqo traces — reading, querying, instrumenting, and launching the UI.

```bash
npx skills add Cecuro/traqo --yes --global
```

Works with Claude Code, Cursor, Copilot, Codex, and other agents. Once installed, the agent can navigate traces, extract token usage, find errors, and add tracing to your code without further guidance.

## Claude Code Integration

Convert Claude Code session transcripts into traqo traces — one trace per session with turns, LLM calls, tool calls, and subagent hierarchy.

```bash
# Sync all sessions
traqo cc-sync --all --output-dir ./traces

# Sync a single session
traqo cc-sync path/to/session.jsonl

# View the results
traqo ui ./traces
```

**As a Claude Code Stop hook** (`~/.claude/settings.json`):

```json
{
  "hooks": {
    "Stop": [
      { "hooks": [{ "type": "command", "command": "traqo cc-sync --hook" }] }
    ]
  }
}
```

## Trace Viewer UI

Browse and inspect traces in your browser. Uses Python's built-in HTTP server.

```bash
traqo ui ./traces                 # Serve traces on http://localhost:7600
traqo ui ./traces --port 8080     # Custom port
traqo ui s3://my-bucket/traces/   # Browse traces from S3 (install the s3 extra)
traqo ui gs://my-bucket/traces/   # Browse traces from GCS (install the gcs extra)
python -m traqo ui ./traces       # Alternative invocation
```

Cloud sources list files through the storage API up front and download a trace only when you open it. Traces you have already viewed show full summary data (duration, stats, tags) on later page loads.

Features: folder navigation, search/filter, span tree with waterfall timing, JSON viewer with syntax highlighting, token usage visualization, keyboard shortcuts (Escape to go back, ? for help).

## API Reference

### `Tracer(path, *, input=None, metadata=None, tags=None, thread_id=None, capture_content=True, backends=None)`

Creates a trace session writing to a JSONL file. Use as a context manager.

```python
with Tracer(
    Path("traces/run.jsonl"),
    input={"query": "What is the weather?"},
    metadata={"run_id": "abc123"},
    tags=["production", "chatbot"],
    thread_id="conv-456",
    capture_content=False,  # Integrations omit LLM input/output
) as tracer:
    result = my_pipeline()
    tracer.set_output({"response": result})
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `path` | `Path` | required | JSONL file path. Parent dirs created automatically. |
| `input` | `Any` | `None` | Trace input, written to `trace_start`. |
| `metadata` | `dict` | `None` | Arbitrary metadata written to `trace_start`. Treated as `{}` when omitted. |
| `tags` | `list[str]` | `None` | Tags for filtering/categorization, written to `trace_start`. Treated as `[]` when omitted. |
| `thread_id` | `str` | `None` | Conversation/thread grouping ID, written to `trace_start`. |
| `capture_content` | `bool` | `True` | If `False`, integration wrappers omit LLM message inputs/outputs. The `@trace` decorator has separate `capture_input`/`capture_output` flags. |
| `backends` | `list[Backend]` | `None` | Storage backends notified on events and trace completion. Traces are compressed to `.jsonl.gz` locally; backends receive the compressed paths. |

**Methods:**

| Method | Description |
|---|---|
| `span(name, *, input=, metadata=, tags=, kind=)` | Span context manager. Yields a `Span` object. |
| `set_output(value)` | Set trace-level output (written to `trace_end`). |
| `log(name, data)` | Write a custom event. |
| `child(name, path)` | Create a child tracer writing to a separate file. |

### `Span`

Mutable handle yielded by `tracer.span()`. Set output and metadata during execution.

```python
with tracer.span("my_step", input=data, tags=["important"], kind="tool") as span:
    result = do_work()
    span.set_output(result)
    span.set_metadata("latency_ms", 42)
    span.update_metadata({"extra": "info"})
```

| Method | Description |
|---|---|
| `set_output(value)` | Set span output (written to `span_end`) |
| `set_metadata(key, value)` | Set a metadata key |
| `update_metadata(dict)` | Merge a dict into metadata |

### `@trace`

Decorator that wraps a function in a span. Works with sync/async functions and generators.

```python
from traqo import trace, TOOL

@trace
def my_step(data: list) -> dict:
    return process(data)

@trace("custom_name", capture_input=False, kind=TOOL)
def sensitive_step(secret: str) -> str:
    return handle(secret)

@trace(ignore_arguments=["password"], kind=TOOL)
def login(user: str, password: str) -> bool:
    return authenticate(user, password)
```

Parameters: `name`, `capture_input`, `capture_output`, `ignore_arguments`, `metadata`, `tags`, `kind`.

When no tracer is active, `@trace` is a pure passthrough with zero overhead.
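How `ignore_arguments` could plausibly work: bind the call's positional and keyword arguments to parameter names with `inspect.signature`, then drop the ignored names before recording the span input. This is an illustrative sketch, not traqo's implementation; `captured_input` is a hypothetical helper.

```python
import inspect
from typing import Any, Callable, Optional


def captured_input(
    fn: Callable[..., Any],
    args: tuple[Any, ...],
    kwargs: dict[str, Any],
    ignore_arguments: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Map a call's arguments to parameter names, minus any ignored ones."""
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()  # include parameters the caller left at their defaults
    ignored = set(ignore_arguments or [])
    return {k: v for k, v in bound.arguments.items() if k not in ignored}


def login(user: str, password: str, remember: bool = False) -> bool:
    return True  # stand-in for a real authentication check


print(captured_input(login, ("alice", "hunter2"), {}, ignore_arguments=["password"]))
# {'user': 'alice', 'remember': False}
```

Binding by name means an ignored argument is dropped whether the caller passed it positionally or as a keyword.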

### `get_current_span() -> Span | None`

Returns the current active span, or `None`.

### `update_current_span(*, output=, metadata=, tags=, **kw_metadata)`

Convenience helper to update the active span. No-op when no span is active.

```python
from traqo import trace, update_current_span

@trace
def my_function(text: str) -> str:
    update_current_span(metadata={"custom_key": "custom_value"})
    return process(text)
```

### `get_tracer() -> Tracer | None`

Returns the active tracer for the current context, or `None`.

```python
from traqo import get_tracer

tracer = get_tracer()
if tracer:
    tracer.log("checkpoint", {"count": len(results)})
```

### `disable()` / `enable()`

```python
import traqo
traqo.disable()  # All tracing becomes no-op
traqo.enable()   # Re-enable
```

Or via environment variable: `TRAQO_DISABLED=1`

## Child Tracers

Use child tracers for concurrent agents or workers that produce many events. Each child writes to its own file and is linked to the parent.

```python
from pathlib import Path
from traqo import Tracer

with Tracer(Path("traces/pipeline.jsonl")) as tracer:
    child = tracer.child("reentrancy_agent", Path("traces/agents/reentrancy.jsonl"))
    with child:
        run_agent(...)
```

The parent trace records `child_started` / `child_ended` events and includes child summaries in `trace_end`.

## JSONL Format

Every line is a self-contained JSON object. Five event types:

| Type | When | Key Fields |
|---|---|---|
| `trace_start` | Tracer enters | `tracer_version`, `input`, `metadata`, `tags`, `thread_id` |
| `span_start` | Span begins | `id`, `parent_id`, `name`, `input`, `metadata`, `tags`, `kind` |
| `span_end` | Span ends | `id`, `duration_s`, `status`, `output`, `metadata`, `tags`, `kind` |
| `event` | Custom checkpoint | `name`, `data` |
| `trace_end` | Tracer exits | `duration_s`, `output`, `stats`, `children` |

The `kind` field categorizes spans (e.g. `"llm"`, `"tool"`, `"retriever"`). The `tags` field is a list of strings for filtering. Both are omitted when not set.

The `metadata` dict is the universal extension point. LLM-specific data like `model`, `provider`, and `token_usage` are stored there.
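Because every `span_start` records its `parent_id`, the call tree can be rebuilt from the events alone. A minimal sketch, assuming root spans have a null or missing `parent_id` and that each span's `duration_s` appears on its `span_end`; `build_tree` and `render` are hypothetical helpers and the events are made up.

```python
from collections import defaultdict
from typing import Any


def build_tree(events: list[dict[str, Any]]) -> dict[Any, list[Any]]:
    """Map each parent_id (None for root spans) to its child span ids in start order."""
    children: dict[Any, list[Any]] = defaultdict(list)
    for e in events:
        if e.get("type") == "span_start":
            children[e.get("parent_id")].append(e["id"])
    return children


def render(events: list[dict[str, Any]]) -> list[str]:
    """Depth-first, indented 'name (duration)' lines for the whole tree."""
    names = {e["id"]: e["name"] for e in events if e.get("type") == "span_start"}
    durations = {e["id"]: e.get("duration_s") for e in events if e.get("type") == "span_end"}
    children = build_tree(events)
    lines: list[str] = []

    def walk(span_id: Any, depth: int) -> None:
        lines.append(f"{'  ' * depth}{names[span_id]} ({durations.get(span_id)}s)")
        for child_id in children.get(span_id, []):
            walk(child_id, depth + 1)

    for root_id in children.get(None, []):
        walk(root_id, 0)
    return lines


events = [
    {"type": "span_start", "id": 1, "parent_id": None, "name": "pipeline"},
    {"type": "span_start", "id": 2, "parent_id": 1, "name": "summarize"},
    {"type": "span_end", "id": 2, "duration_s": 0.5},
    {"type": "span_start", "id": 3, "parent_id": 1, "name": "summarize"},
    {"type": "span_end", "id": 3, "duration_s": 0.25},
    {"type": "span_end", "id": 1, "duration_s": 0.8},
]
print("\n".join(render(events)))
# pipeline (0.8s)
#   summarize (0.5s)
#   summarize (0.25s)
```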

## Query with DuckDB

```sql
-- LLM spans with token usage (one span_end row per call)
SELECT json->>'$.metadata.model' AS model,
       count(*) AS calls,
       sum((json->>'$.metadata.token_usage.input_tokens')::INTEGER) AS total_in,
       sum((json->>'$.metadata.token_usage.output_tokens')::INTEGER) AS total_out,
       avg((json->>'$.duration_s')::DOUBLE) AS avg_duration
FROM read_ndjson_objects('traces/**/*.jsonl.gz')
WHERE json->>'$.type' = 'span_end'
  AND json->>'$.kind' = 'llm'
GROUP BY model;

-- All traces for a conversation thread
SELECT json FROM read_ndjson_objects('traces/**/*.jsonl.gz')
WHERE json->>'$.type' = 'trace_start'
  AND json->>'$.thread_id' = 'conv-123';
```
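Without DuckDB, the same per-model rollup can be computed in plain Python. This sketch mirrors the first query: it counts each LLM span once via its `span_end` event and assumes `model` and `token_usage` live in that event's `metadata`, as described in the JSONL Format section. `llm_usage` is a hypothetical helper and the sample file is made up.

```python
import gzip
import json
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Any


def llm_usage(root: Path) -> dict[str, dict[str, float]]:
    """Per-model call count, token totals and mean duration over traces under root."""
    acc: dict[str, dict[str, float]] = defaultdict(
        lambda: {"calls": 0, "total_in": 0, "total_out": 0, "duration": 0.0}
    )
    for path in sorted(root.rglob("*.jsonl.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                e: dict[str, Any] = json.loads(line)
                # Count each LLM span once, using its span_end event.
                if e.get("type") != "span_end" or e.get("kind") != "llm":
                    continue
                meta = e.get("metadata") or {}
                usage = meta.get("token_usage") or {}
                row = acc[meta.get("model", "unknown")]
                row["calls"] += 1
                row["total_in"] += usage.get("input_tokens", 0)
                row["total_out"] += usage.get("output_tokens", 0)
                row["duration"] += e.get("duration_s", 0.0)
    return {
        model: {
            "calls": r["calls"],
            "total_in": r["total_in"],
            "total_out": r["total_out"],
            "avg_duration": r["duration"] / r["calls"],
        }
        for model, r in acc.items()
    }


demo_root = Path(tempfile.mkdtemp())
(demo_root / "runs").mkdir()
sample = [
    {"type": "span_start", "id": 1, "kind": "llm", "name": "classify"},
    {"type": "span_end", "id": 1, "kind": "llm", "duration_s": 1.0,
     "metadata": {"model": "gpt-4o", "token_usage": {"input_tokens": 100, "output_tokens": 50}}},
    {"type": "span_start", "id": 2, "kind": "llm", "name": "classify"},
    {"type": "span_end", "id": 2, "kind": "llm", "duration_s": 3.0,
     "metadata": {"model": "gpt-4o", "token_usage": {"input_tokens": 20, "output_tokens": 10}}},
]
with gzip.open(demo_root / "runs" / "run.jsonl.gz", "wt", encoding="utf-8") as f:
    f.writelines(json.dumps(e) + "\n" for e in sample)

print(llm_usage(demo_root))
# {'gpt-4o': {'calls': 2, 'total_in': 120, 'total_out': 60, 'avg_duration': 2.0}}
```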

## License

MIT
