Zero-config · Open Source · MIT

Stop the LLM debugging roulette.

Give your AI actual evidence. llmdebug captures structured snapshots at failure time — so LLMs diagnose instead of guess.

fail → guess → rerun → repeat
fail → snapshot → diagnose → fix
$ pip install "llmdebug[cli]"

How it works

Three steps. No configuration. Evidence-first debugging.

Step 1

Test fails

# test_pipeline.py
def test_transform():
    result = transform(data)  # ← fails here
    assert result.shape == (100, 5)
Step 2

Snapshot captured

{
  "exception": "ValueError: shape mismatch",
  "closest_frame": {
    "file": "pipeline.py",
    "line": 47,
    "locals": {
      "data": "ndarray (100,3)",
      "result": "ndarray (100,4)"
    }
  }
}
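A snapshot like the one above is plain JSON, so it can be turned into prompt text with the standard library alone. The sketch below assumes only the fields shown in the sample (`exception`, `closest_frame`, `file`, `line`, `locals`); the real snapshot schema may carry more, and `format_snapshot` is a hypothetical helper, not part of llmdebug.

```python
import json

# Sample payload copied from the snapshot shown above.
SAMPLE = """
{
  "exception": "ValueError: shape mismatch",
  "closest_frame": {
    "file": "pipeline.py",
    "line": 47,
    "locals": {
      "data": "ndarray (100,3)",
      "result": "ndarray (100,4)"
    }
  }
}
"""


def format_snapshot(snapshot: dict) -> str:
    """Render a snapshot dict as a short evidence block for an LLM prompt."""
    frame = snapshot["closest_frame"]
    lines = [
        f"Exception: {snapshot['exception']}",
        f"Location: {frame['file']}:{frame['line']}",
        "Locals:",
    ]
    for name, summary in sorted(frame["locals"].items()):
        lines.append(f"  {name} = {summary}")
    return "\n".join(lines)


evidence = format_snapshot(json.loads(SAMPLE))
print(evidence)
```

Pasting text like this into a chat gives the model the failing location and the observed values rather than a bare traceback.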
Step 3

LLM diagnoses

1. Shape mismatch: expected (100,5), got (100,4)
    Check transform() dimensions

2. Missing feature column after preprocessing step
    Verify feature_engineering() returns 5 cols

"No more guessing. Evidence first, every time."

Works everywhere you do

Six integration patterns. Choose the one that fits your workflow.

shell
# Zero configuration needed. Just install and run.
$ pip install llmdebug
$ pytest

# ✓ Failures automatically create .llmdebug/latest.json
# ✓ Read with:   llmdebug show
# ✓ Hypotheses:  llmdebug hypothesize
# ✓ Compare:     llmdebug diff

python
from llmdebug import debug_snapshot

@debug_snapshot()
def process_batch(data: list) -> list:
    return [transform(item) for item in data]

# Snapshot is captured automatically if the function raises.
# Pass config= to customize detail level, PII redaction, etc.
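To make the idea concrete, here is a rough sketch of how a capture-on-raise decorator could work in general. It is not llmdebug's implementation: `capture_on_error` and the `store` list are hypothetical names, and the snapshot fields are chosen only for illustration.

```python
import functools


def capture_on_error(store: list):
    """Hypothetical decorator: append a snapshot dict to `store` when the
    wrapped function raises, then re-raise the original exception."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                tb = exc.__traceback__
                # Walk to the innermost traceback entry, where the error was raised.
                while tb.tb_next is not None:
                    tb = tb.tb_next
                store.append({
                    "exception": f"{type(exc).__name__}: {exc}",
                    "function": tb.tb_frame.f_code.co_name,
                    "line": tb.tb_lineno,
                    # repr() keeps arbitrary objects serializable.
                    "locals": {k: repr(v) for k, v in tb.tb_frame.f_locals.items()},
                })
                raise

        return wrapper

    return decorator


snapshots = []


@capture_on_error(snapshots)
def process_batch(data: list) -> list:
    results = []
    for item in data:
        results.append(100 // item)
    return results


try:
    process_batch([5, 0])
except ZeroDivisionError:
    pass

print(snapshots[0]["locals"])
```

The exception is always re-raised, so the decorated function behaves the same for callers; the snapshot is a side effect.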

python
from llmdebug import snapshot_section

with snapshot_section("feature_engineering"):
    features = build_features(raw_data)

# Pinpoint exactly where in a pipeline something breaks.
# Locals at the failure boundary are captured for you.
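As an illustration of what "locals at the failure boundary" can mean, the sketch below records the locals of the frame that contains the `with` block. `SectionRecorder` is a hypothetical class written for this example, not llmdebug's `snapshot_section`, and the recorded fields are assumptions.

```python
class SectionRecorder:
    """Hypothetical context manager: on failure, record the section name and
    the locals of the frame that contains the `with` block."""

    records = []  # shared list, kept simple for the sketch

    def __init__(self, name: str):
        self.name = name

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc is not None:
            # The first traceback entry is the frame running the `with` body.
            boundary = tb.tb_frame
            SectionRecorder.records.append({
                "section": self.name,
                "exception": f"{exc_type.__name__}: {exc}",
                "line": tb.tb_lineno,
                "locals": {k: repr(v) for k, v in boundary.f_locals.items()},
            })
        return False  # never swallow the exception


def run_pipeline(raw_data):
    features = []
    with SectionRecorder("feature_engineering"):
        for value in raw_data:
            features.append(float(value))
    return features


try:
    run_pipeline(["1.5", "oops"])
except ValueError:
    pass

print(SectionRecorder.records[0]["section"])
```

Naming the section means the record says which pipeline stage failed, even when the exception itself comes from deep inside a library.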

python
%load_ext llmdebug       # Auto-captures on cell errors

%llmdebug               # Rich HTML snapshot in notebook
%llmdebug hypothesize   # Ranked debugging hypotheses
%llmdebug diff          # Compare with previous run
%llmdebug list          # List recent snapshots

python
import llmdebug

llmdebug.install_hooks()
# Captures: sys.excepthook, threading.excepthook,
#           sys.unraisablehook
# Includes: rate limiting + PII redaction

# For web apps — WSGI/ASGI middleware:
from llmdebug import LLMDebugWSGIMiddleware
app = LLMDebugWSGIMiddleware(app)  # Flask, Django
# (use LLMDebugASGIMiddleware for FastAPI)
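For readers curious how a rate-limited exception hook might be structured, here is a minimal sketch using only the standard library. `RateLimitedHook` and its parameters are hypothetical and do not describe what `install_hooks()` does internally.

```python
import threading
import time


class RateLimitedHook:
    """Hypothetical hook: record at most `max_events` exceptions per
    `window` seconds, then defer to the previously installed hook."""

    def __init__(self, previous, max_events: int = 5, window: float = 60.0):
        self.previous = previous
        self.max_events = max_events
        self.window = window
        self.captured = []
        self._stamps = []
        self._lock = threading.Lock()

    def _allow(self) -> bool:
        now = time.monotonic()
        with self._lock:
            # Drop timestamps that have aged out of the window.
            self._stamps = [t for t in self._stamps if now - t < self.window]
            if len(self._stamps) >= self.max_events:
                return False
            self._stamps.append(now)
            return True

    def __call__(self, exc_type, exc, tb):
        if self._allow():
            self.captured.append(f"{exc_type.__name__}: {exc}")
        # Chain to the earlier hook so normal traceback printing still happens.
        if self.previous is not None:
            self.previous(exc_type, exc, tb)


hook = RateLimitedHook(previous=None, max_events=2)
for i in range(5):
    hook(ValueError, ValueError(f"boom {i}"), None)
print(hook.captured)  # ['ValueError: boom 0', 'ValueError: boom 1']
```

To apply such a hook process-wide, one would assign `sys.excepthook = RateLimitedHook(sys.excepthook)`. `threading.excepthook` and `sys.unraisablehook` each receive a single argument object instead of three positional values, so they would need thin adapters.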

json
{
  "mcpServers": {
    "llmdebug": {
      "command": "uvx",
      "args": ["llmdebug[mcp]", "serve"]
    }
  }
}

Works with Claude Code, Cursor, and any MCP-compatible IDE. 10 tools: show_snapshot, hypothesize, diff_snapshots, and more.

Everything you need

Built for modern Python development workflows.

Zero-config pytest

Install and run. Failures are captured automatically by the pytest plugin. No setup required.

🧠

Hypothesis generation

10 pattern detectors auto-rank debugging leads — empty arrays, shape mismatches, None values, off-by-one errors.
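A pattern detector can be as small as a function that looks at one captured local and returns a scored lead. The sketch below is an assumption about the general shape of such a system, not llmdebug's detectors: the function names, the scores, and the `"ndarray (0,5)"` summary format (borrowed from the sample snapshot) are all illustrative.

```python
import re


def detect_none(name, summary):
    if summary == "None":
        return 0.9, f"{name} is None"
    return None


def detect_empty(name, summary):
    if summary in ("[]", "{}", "()", "''"):
        return 0.7, f"{name} is empty"
    return None


def detect_zero_dim(name, summary):
    # Matches summaries such as "ndarray (0,5)".
    match = re.search(r"\(([\d,\s]+)\)", summary)
    if match:
        dims = [d.strip() for d in match.group(1).split(",") if d.strip()]
        if "0" in dims:
            return 0.8, f"{name} has a zero-length dimension"
    return None


DETECTORS = [detect_none, detect_empty, detect_zero_dim]


def rank_leads(local_vars: dict) -> list:
    """Run every detector over every local and sort leads by score, highest first."""
    leads = []
    for name, summary in local_vars.items():
        for detector in DETECTORS:
            lead = detector(name, summary)
            if lead is not None:
                leads.append(lead)
    return sorted(leads, key=lambda lead: lead[0], reverse=True)


leads = rank_leads({"batch": "ndarray (0,5)", "config": "None", "rows": "[]"})
for score, message in leads:
    print(f"{score:.1f}  {message}")
```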

📦

LLM-optimized output

Compact JSON (~40% smaller). TOON format for ~50% token savings when pasting into an AI chat.
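Part of the saving from compact JSON comes simply from dropping indentation and separator whitespace. The sketch below measures that effect on the sample snapshot with the standard library; it counts characters, not tokens, and is not how the ~40% figure above was derived.

```python
import json

snapshot = {
    "exception": "ValueError: shape mismatch",
    "closest_frame": {
        "file": "pipeline.py",
        "line": 47,
        "locals": {"data": "ndarray (100,3)", "result": "ndarray (100,4)"},
    },
}

pretty = json.dumps(snapshot, indent=2)
# separators=(",", ":") removes the spaces json.dumps adds by default.
compact = json.dumps(snapshot, separators=(",", ":"))

saving = 1 - len(compact) / len(pretty)
print(f"pretty={len(pretty)} chars, compact={len(compact)} chars, saved={saving:.0%}")
```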

🔬

ML-aware

Tensor shapes, NaN/Inf detection, device tracking, requires_grad — captured out of the box for PyTorch & NumPy.

🔌

MCP server

Direct integration with Claude Code, Cursor, and any MCP-compatible IDE. 10 tools ready to use.

🛡️

Production-ready

Exception hooks with rate limiting and automatic PII redaction — safe to deploy in production environments.
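Redaction of captured locals typically combines key-name checks with value patterns. The following is a simplified sketch under that assumption; the `SENSITIVE_KEYS` list, the email pattern, and the `redact` helper are hypothetical and far less thorough than a production redactor would need to be.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SENSITIVE_KEYS = ("password", "token", "secret", "api_key")


def redact(local_vars: dict) -> dict:
    """Return a copy of `local_vars` with sensitive values masked."""
    clean = {}
    for name, value in local_vars.items():
        if any(key in name.lower() for key in SENSITIVE_KEYS):
            # The variable name itself suggests a credential: hide the whole value.
            clean[name] = "[REDACTED]"
        else:
            clean[name] = EMAIL.sub("[EMAIL]", str(value))
    return clean


redacted = redact({"user": "alice@example.com", "api_token": "sk-123", "retries": 3})
print(redacted)
```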

🔍

Snapshot diffing

Compare runs to see exactly what changed between two failures — pinpoint regressions instantly.
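Conceptually, diffing two snapshots comes down to comparing their captured values key by key. The sketch below shows that idea on two `locals` dicts; `diff_locals` and its output format are hypothetical and say nothing about what `llmdebug diff` actually prints.

```python
def diff_locals(old: dict, new: dict) -> dict:
    """Map each differing name to a (kind, before, after) tuple."""
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if name not in old:
            changes[name] = ("added", None, after)
        elif name not in new:
            changes[name] = ("removed", before, None)
        elif before != after:
            changes[name] = ("changed", before, after)
    return changes


previous_run = {"data": "ndarray (100,3)", "result": "ndarray (100,5)"}
latest_run = {"data": "ndarray (100,3)", "result": "ndarray (100,4)", "mask": "None"}

changes = diff_locals(previous_run, latest_run)
for name, (kind, before, after) in changes.items():
    print(f"{name}: {kind} ({before} -> {after})")
```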

🌐

Web middleware

WSGI/ASGI middleware for Flask, FastAPI, and Django. Zero-config crash capture for web applications.
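For context, a crash-capturing WSGI middleware is a callable that wraps another WSGI app. The sketch below shows the general pattern with a hypothetical `CaptureMiddleware` class and an in-memory `sink`; it is not the `LLMDebugWSGIMiddleware` implementation.

```python
class CaptureMiddleware:
    """Hypothetical WSGI middleware: record request context when the wrapped
    app raises, then re-raise so the server's error handling still runs."""

    def __init__(self, app, sink: list):
        self.app = app
        self.sink = sink

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception as exc:
            self.sink.append({
                "path": environ.get("PATH_INFO", ""),
                "method": environ.get("REQUEST_METHOD", ""),
                "exception": f"{type(exc).__name__}: {exc}",
            })
            raise


def failing_app(environ, start_response):
    raise RuntimeError("db unavailable")


crashes = []
wrapped = CaptureMiddleware(failing_app, crashes)

try:
    wrapped({"PATH_INFO": "/users", "REQUEST_METHOD": "GET"}, lambda status, headers: None)
except RuntimeError:
    pass

print(crashes)
```

This version only sees exceptions raised while the app is called. Errors raised later, while the server iterates over the response body, would need the returned iterable to be wrapped as well.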

Start diagnosing in 30 seconds.

Three commands. That's it.

$ pip install "llmdebug[cli]"
$ pytest
$ llmdebug show
# Your first structured snapshot is ready.