Metadata-Version: 2.4
Name: llmdebug
Version: 2.1.0
Summary: Structured debug snapshots for LLM-assisted debugging
Project-URL: Homepage, https://github.com/NicolasSchuler/llmdebug
Project-URL: Repository, https://github.com/NicolasSchuler/llmdebug
Author-email: Nicolas Schuler <schuler.nicolas@proton.me>
License: MIT
License-File: LICENSE
Keywords: crash-reporting,debugging,llm,pytest
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Debuggers
Requires-Python: >=3.10
Requires-Dist: filelock>=3.0
Provides-Extra: cli
Requires-Dist: click>=8.0; extra == 'cli'
Requires-Dist: rich>=13.0; extra == 'cli'
Provides-Extra: dev
Requires-Dist: click>=8.0; extra == 'dev'
Requires-Dist: mcp>=1.0; extra == 'dev'
Requires-Dist: numpy>=1.20; extra == 'dev'
Requires-Dist: pyright>=1.1; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest-benchmark>=4.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: python-semantic-release>=9.0; extra == 'dev'
Requires-Dist: rich>=13.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Requires-Dist: toons>=0.1; extra == 'dev'
Provides-Extra: evals
Requires-Dist: datasets>=2.0; extra == 'evals'
Provides-Extra: mcp
Requires-Dist: mcp>=1.0; extra == 'mcp'
Provides-Extra: toon
Requires-Dist: toons>=0.1; extra == 'toon'
Description-Content-Type: text/markdown

<p align="center">
  <img src="logo/bird.png" alt="llmdebug logo" width="200">
</p>

# llmdebug

Structured debug snapshots for LLM-assisted debugging.

When your code fails, `llmdebug` captures the exception, stack frames, local variables, and environment info in a JSON format optimized for LLM consumption. This enables **evidence-based debugging** instead of the "guess → patch → rerun" loop.

## Why?

Without observability, LLMs debug by guessing:
```
fail → guess patch → rerun → repeat (LLM roulette)
```

With `llmdebug`, failures produce rich snapshots automatically:
```
fail → read snapshot → ranked hypotheses → minimal patch → verify
```

The key insight: **baseline instrumentation should always be on**, so the first failure already has the evidence needed to diagnose it.

## Installation

```bash
pip install llmdebug            # Core library + pytest plugin
pip install "llmdebug[cli]"     # Adds CLI for viewing snapshots (quoted so shells like zsh don't expand the brackets)
```

## Quick Start

### Pytest (automatic - recommended)

Just install the package. Test failures automatically generate snapshots.

```bash
pytest  # Failures create .llmdebug/latest.json
```

### Decorator

```python
from llmdebug import debug_snapshot

@debug_snapshot()  # Writes a snapshot if main() raises
def main():
    data = load_data()  # load_data() and process() stand in for your own code
    process(data)

if __name__ == "__main__":
    main()
```

### Context Manager

Wrap a specific block when you need more detail about one stage of your code:

```python
from llmdebug import snapshot_section

with snapshot_section("data_processing"):
    result = transform(data)
```

## CLI

View snapshots directly in the terminal with rich formatting:

```bash
llmdebug              # Show latest snapshot (default)
llmdebug show --full  # Show all stack frames
llmdebug show --json  # Output raw expanded JSON
llmdebug list         # List recent snapshots
llmdebug frames -i 0  # Inspect a specific frame
llmdebug clean -k 5   # Keep only 5 most recent snapshots
```

All commands accept `--dir <path>` to point at a custom snapshot directory.

The CLI requires the `cli` extra: `pip install "llmdebug[cli]"`.

## Output

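On failure, the snapshot is written to `.llmdebug/latest.json`. The example below uses the expanded key names, as produced by `output_format="json"` or `llmdebug show --json`; the default `json_compact` format stores abbreviated keys on disk: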

```json
{
  "name": "test_training_step",
  "timestamp_utc": "2026-01-27T14:30:52Z",
  "exception": {
    "type": "ValueError",
    "message": "operands could not be broadcast together..."
  },
  "frames": [
    {
      "file": "training.py",
      "line": 42,
      "function": "train_step",
      "code": "output = model(x) + residual",
      "locals": {
        "x": {"__array__": "jax.Array", "shape": [32, 64], "dtype": "float32"},
        "residual": {"__array__": "jax.Array", "shape": [32, 128], "dtype": "float32"}
      }
    }
  ],
  "env": {"python": "3.12.0", "platform": "Darwin-24.0.0-arm64"}
}
```

**Key features:**
- Crash frame is at index 0 (most relevant first)
- Arrays summarized with `shape` and `dtype` (not raw data)
- Source snippet around the failing line
- Environment info for reproducibility
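
As a rough illustration of how these fields can be consumed, the sketch below turns a snapshot dict into a short text summary. It assumes the expanded key names shown in the example above; `summarize_crash` is a hypothetical helper, not part of llmdebug, and the sample dict is copied from the example.

```python
def summarize_crash(snapshot: dict) -> str:
    """Build a short text summary from an expanded-key snapshot dict."""
    exc = snapshot["exception"]
    frame = snapshot["frames"][0]  # crash frame is at index 0
    lines = [
        f"{exc['type']}: {exc['message']}",
        f"  at {frame['file']}:{frame['line']} in {frame['function']}",
        f"  code: {frame['code']}",
    ]
    for name, value in frame.get("locals", {}).items():
        if isinstance(value, dict) and "shape" in value:
            # Array summary: show dtype and shape, never the raw data.
            lines.append(f"  {name}: {value.get('dtype', '?')}{value['shape']}")
        else:
            lines.append(f"  {name}: {value!r}")
    return "\n".join(lines)


# Sample data copied from the example snapshot above.
snapshot = {
    "name": "test_training_step",
    "exception": {
        "type": "ValueError",
        "message": "operands could not be broadcast together...",
    },
    "frames": [
        {
            "file": "training.py",
            "line": 42,
            "function": "train_step",
            "code": "output = model(x) + residual",
            "locals": {
                "x": {"__array__": "jax.Array", "shape": [32, 64], "dtype": "float32"},
                "residual": {"__array__": "jax.Array", "shape": [32, 128], "dtype": "float32"},
            },
        }
    ],
}

print(summarize_crash(snapshot))
```

Because arrays are stored as summaries, the mismatch between `[32, 64]` and `[32, 128]` is visible without loading any tensor data.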

### Snapshot metadata (new)

The snapshot includes extra fields to reduce LLM guesswork:
- `schema_version` and `llmdebug_version` for compatibility
- `crash_frame_index` to mark the exact crash frame in `frames`
- `capture_config` (frames, locals_mode, truncation limits, redaction patterns)
- `exception` may include `qualified_type`, `args`, `notes`, `cause`, `context`, and `exceptions` (ExceptionGroup)
- Each frame may include `module`, `file_rel`, and `locals_meta` (type/size hints)
- Frames may include `locals_truncated` and `locals_truncated_keys` when locals are omitted
- Pytest runs add `pytest` context (`longrepr`, `capstdout`, `capstderr`, params, `repro`)
- `env.argv` records the command-line invocation
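
A minimal sketch of how a consumer might use two of these fields. It assumes that `crash_frame_index` is an integer index into `frames`, that `locals_truncated` is a boolean, and that `locals_truncated_keys` is a list of variable names; the README does not spell out these types, and both helper functions and the sample data are hypothetical.

```python
def crash_frame(snapshot: dict) -> dict:
    """Return the crash frame, falling back to index 0 if the field is absent."""
    index = snapshot.get("crash_frame_index", 0)
    return snapshot["frames"][index]


def omitted_locals(frame: dict) -> list[str]:
    """Names of locals left out of the frame (assumed to be a list of strings)."""
    if not frame.get("locals_truncated"):
        return []
    return list(frame.get("locals_truncated_keys", []))


# Hypothetical snapshot fragment for illustration only.
example = {
    "crash_frame_index": 0,
    "frames": [
        {
            "function": "train_step",
            "locals": {"x": 1},
            "locals_truncated": True,
            "locals_truncated_keys": ["big_table"],
        },
        {"function": "main", "locals": {}},
    ],
}

frame = crash_frame(example)
print(frame["function"], omitted_locals(frame))
```

Checking the omitted names first tells a reader whether a missing variable is really absent or was only dropped by the truncation limits.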

## For Claude Code / LLM Users

Add this to your project's `CLAUDE.md`:

```markdown
## Debug Snapshots (llmdebug)

This project uses `llmdebug` for structured crash diagnostics.

### On any failure:
1. **Read `.llmdebug/latest.json` first** (or run `llmdebug show --json`) - never patch before reading
2. Analyze the snapshot:
   - **Exception type/message** - what went wrong
   - **Crash frame (index 0)** - where it happened
   - **Locals** - variable values at crash time
   - **Array shapes** - look for empty arrays, shape mismatches
3. **Produce 2-4 ranked hypotheses** based on evidence
4. Apply minimal fix for the most likely hypothesis
5. Re-run to verify

### Key signals:
- `shape: [0, ...]` - empty array, upstream data issue
- `None` where object expected - initialization bug
- Shape mismatch in binary op - broadcasting error
- `i=10` with `len(arr)=10` - off-by-one

### When the snapshot isn't enough:
If locals show the symptom but not the cause:
1. Add `with snapshot_section("stage_x")` around suspect code
2. Re-run to get a better snapshot
3. Repeat hypothesis→patch loop

### Don't:
- Guess without reading the snapshot first
- Make multiple speculative changes at once
- Refactor until tests pass
```
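
Some of the key signals in the template above can also be checked mechanically. The sketch below scans one frame's locals for `None` values, empty array dimensions, and arrays with differing shapes. `find_signals` is a hypothetical helper, it assumes the expanded key names, and its output is a set of hints to investigate rather than a diagnosis.

```python
def find_signals(frame: dict) -> list[str]:
    """Collect simple hints from a frame's locals (expanded-key snapshot)."""
    signals = []
    shapes = {}
    for name, value in frame.get("locals", {}).items():
        if value is None:
            signals.append(f"{name} is None (possible initialization bug)")
        elif isinstance(value, dict) and "shape" in value:
            shape = value["shape"]
            shapes[name] = shape
            if 0 in shape:
                signals.append(f"{name} has an empty dimension: shape {shape}")
    # Differing shapes are only a hint: many operations broadcast legitimately.
    if len({tuple(shape) for shape in shapes.values()}) > 1:
        signals.append(f"arrays differ in shape: {shapes}")
    return signals


# Hypothetical frame for illustration.
frame = {
    "locals": {
        "x": {"shape": [32, 64], "dtype": "float32"},
        "residual": {"shape": [32, 128], "dtype": "float32"},
        "batch": {"shape": [0, 64], "dtype": "float32"},
        "model": None,
    }
}

for signal in find_signals(frame):
    print(signal)
```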

## Configuration

```python
from llmdebug import debug_snapshot

@debug_snapshot(
    out_dir=".llmdebug",          # Output directory
    frames=5,                     # Stack frames to capture
    source_context=3,             # Lines of source before/after crash
    source_mode="all",            # "all" | "crash_only" | "none"
    locals_mode="safe",           # "safe" | "meta" | "none"
    max_str=500,                  # Truncate long strings
    max_items=50,                 # Truncate large collections
    redact=[r"api_key=.*"],       # Regex patterns to redact
    redact_keys=False,            # Keep dict keys stable by default
    include_env=True,             # Include Python/platform info
    max_snapshots=50,             # Auto-cleanup old snapshots (0 = unlimited)
    output_format="json_compact", # "json" | "json_compact" | "toon"
)
def main():
    ...
```

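By default, redaction patterns apply only to leaf string values, not to dict keys. Redacting keys can make two distinct keys collapse into the same redacted string, which merges their entries in nested dicts.
Set `redact_keys=True` only if you explicitly need key-name redaction and can accept possible key merging.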
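
The key-merging risk is easiest to see in a small example. The sketch below is not llmdebug's implementation: `redact` is a hypothetical helper and the `[REDACTED]` placeholder is an assumption. It only illustrates why redacting leaf values keeps keys stable while redacting keys can drop entries.

```python
import re

REDACTED = "[REDACTED]"  # assumed placeholder; llmdebug's actual token may differ


def redact(value, patterns, redact_keys=False):
    """Recursively redact strings matching any pattern (illustrative only)."""
    if isinstance(value, str):
        for pattern in patterns:
            value = re.sub(pattern, REDACTED, value)
        return value
    if isinstance(value, dict):
        out = {}
        for key, item in value.items():
            if redact_keys and isinstance(key, str):
                key = redact(key, patterns)
            out[key] = redact(item, patterns, redact_keys)  # later keys overwrite
        return out
    if isinstance(value, list):
        return [redact(item, patterns, redact_keys) for item in value]
    return value


data = {
    "api_key=abc": "first",
    "api_key=xyz": "second",
    "url": "https://h/?api_key=abc",
}
patterns = [r"api_key=.*"]

values_only = redact(data, patterns)
with_keys = redact(data, patterns, redact_keys=True)
print(values_only)
print(with_keys)
```

With values-only redaction all three entries survive. With key redaction the two `api_key=...` keys both become the placeholder, so one entry silently overwrites the other.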

### Output Formats

llmdebug supports multiple output formats to optimize for different use cases:

| Format | Size | Best For |
|--------|------|----------|
| `json` | baseline | Human readability, external tools |
| `json_compact` (default) | ~40% smaller | LLM context efficiency |
| `toon` | ~50% smaller | Maximum token savings |

**Compact JSON** uses abbreviated keys (e.g., `_exc` instead of `exception`) to reduce token usage. The `get_latest_snapshot()` function auto-expands keys, so your code works identically regardless of format.
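
Conceptually, key expansion is a lookup from abbreviated to full names. The sketch below shows the idea for top-level keys only. It is not llmdebug's code: `expand_top_level_keys` is a hypothetical helper, and the table contains only the single `_exc` pair documented above, since the full abbreviation table is not listed here.

```python
# Only the pair documented in this README; the real table is presumably larger.
KEY_MAP = {"_exc": "exception"}


def expand_top_level_keys(snapshot: dict) -> dict:
    """Rename known abbreviated top-level keys; leave everything else untouched."""
    return {KEY_MAP.get(key, key): value for key, value in snapshot.items()}


compact = {"_exc": {"type": "ValueError"}, "frames": []}
print(expand_top_level_keys(compact))
```

In practice, prefer `get_latest_snapshot()` or `llmdebug show --json`, which handle expansion for you.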

**TOON format** requires an optional dependency:
```bash
pip install "llmdebug[toon]"
```

For pytest runs, set the format with the `LLMDEBUG_OUTPUT_FORMAT` environment variable:
```bash
LLMDEBUG_OUTPUT_FORMAT=json pytest  # Use pretty JSON
```

### Pytest Opt-out

Skip snapshot capture for specific tests:

```python
import pytest

@pytest.mark.no_snapshot
def test_expected_failure():
    ...
```

## API

```python
from llmdebug import debug_snapshot, snapshot_section, get_latest_snapshot

# Read the most recent snapshot programmatically
snapshot = get_latest_snapshot()  # Returns dict or None
```
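
Since `get_latest_snapshot()` may return `None`, callers should handle the case where no snapshot exists yet. A minimal sketch, assuming the expanded key names shown in the Output section; `describe` is a hypothetical helper, not part of the library:

```python
def describe(snapshot):
    """Return a one-line description of a snapshot dict, or a notice if None."""
    if snapshot is None:
        return "No snapshot found."
    exc = snapshot.get("exception", {})
    return f"{exc.get('type', 'UnknownError')}: {exc.get('message', '')}"


print(describe(None))
print(describe({"exception": {"type": "ValueError", "message": "bad shape"}}))
```

With llmdebug installed, this would be called as `describe(get_latest_snapshot())`.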

## License

MIT
