Metadata-Version: 2.4
Name: llmdebug
Version: 2.27.0
Summary: Structured debug snapshots for LLM-assisted debugging
Project-URL: Homepage, https://github.com/NicolasSchuler/llmdebug
Project-URL: Repository, https://github.com/NicolasSchuler/llmdebug
Author-email: Nicolas Schuler <schuler.nicolas@proton.me>
License: MIT
License-File: LICENSE
Keywords: crash-reporting,debugging,llm,pytest
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Debuggers
Requires-Python: >=3.10
Requires-Dist: filelock>=3.0
Requires-Dist: orjson>=3.10.0
Provides-Extra: cli
Requires-Dist: click>=8.0; extra == 'cli'
Requires-Dist: rich>=13.0; extra == 'cli'
Provides-Extra: dev
Requires-Dist: bandit>=1.8.0; extra == 'dev'
Requires-Dist: click>=8.0; extra == 'dev'
Requires-Dist: deptry>=0.22.0; extra == 'dev'
Requires-Dist: diff-cover>=9.2; extra == 'dev'
Requires-Dist: httpx[http2]>=0.27.0; extra == 'dev'
Requires-Dist: import-linter>=2.0; extra == 'dev'
Requires-Dist: ipython>=8.0; extra == 'dev'
Requires-Dist: mcp>=1.0; extra == 'dev'
Requires-Dist: mutmut>=3.2; extra == 'dev'
Requires-Dist: numpy>=1.20; extra == 'dev'
Requires-Dist: pip-audit>=2.9.0; extra == 'dev'
Requires-Dist: polars>=1.12.0; extra == 'dev'
Requires-Dist: pyright>=1.1.390; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.25.0; extra == 'dev'
Requires-Dist: pytest-benchmark>=4.0; extra == 'dev'
Requires-Dist: pytest-cov>=6.0; extra == 'dev'
Requires-Dist: pytest>=9.0; extra == 'dev'
Requires-Dist: python-semantic-release>=9.0; extra == 'dev'
Requires-Dist: radon>=6.0; extra == 'dev'
Requires-Dist: rich>=13.0; extra == 'dev'
Requires-Dist: ruff>=0.12.0; extra == 'dev'
Requires-Dist: scikit-learn>=1.4.0; extra == 'dev'
Requires-Dist: scipy>=1.13; extra == 'dev'
Requires-Dist: toons>=0.1; extra == 'dev'
Requires-Dist: vulture>=2.14; extra == 'dev'
Requires-Dist: xenon>=0.9.3; extra == 'dev'
Provides-Extra: evals
Requires-Dist: datasets>=2.0; extra == 'evals'
Requires-Dist: httpx[http2]>=0.27.0; extra == 'evals'
Requires-Dist: polars>=1.12.0; extra == 'evals'
Requires-Dist: scikit-learn>=1.4.0; extra == 'evals'
Requires-Dist: scipy>=1.13; extra == 'evals'
Requires-Dist: swebench==4.1.0; extra == 'evals'
Requires-Dist: testcontainers>=4.13.2; extra == 'evals'
Provides-Extra: jupyter
Requires-Dist: ipython>=8.0; extra == 'jupyter'
Provides-Extra: mcp
Requires-Dist: mcp>=1.0; extra == 'mcp'
Provides-Extra: toon
Requires-Dist: toons>=0.1; extra == 'toon'
Description-Content-Type: text/markdown

<p align="center">
  <img src="logo/bird.png" alt="llmdebug logo" width="200">
</p>

<h1 align="center">llmdebug</h1>

<p align="center">Structured debug snapshots for LLM-assisted debugging.</p>

<p align="center">
  <a href="https://pypi.org/project/llmdebug/"><img src="https://img.shields.io/pypi/v/llmdebug" alt="PyPI"></a>
  <a href="https://pypi.org/project/llmdebug/"><img src="https://img.shields.io/pypi/pyversions/llmdebug" alt="Python"></a>
  <a href="https://github.com/NicolasSchuler/llmdebug/actions/workflows/ci-cd.yml"><img src="https://github.com/NicolasSchuler/llmdebug/actions/workflows/ci-cd.yml/badge.svg" alt="CI"></a>
  <a href="https://github.com/NicolasSchuler/llmdebug/blob/main/LICENSE"><img src="https://img.shields.io/pypi/l/llmdebug" alt="License"></a>
  <a href="https://pypi.org/project/llmdebug/"><img src="https://img.shields.io/pypi/dm/llmdebug" alt="Downloads"></a>
</p>

---

`llmdebug` captures failure-time evidence — exception details, prioritized stack
frames, local variables, and execution context — as a machine-readable artifact
that works for both humans and coding agents. The goal is to make the **first
failing run** useful, rather than reconstructing state after the fact.

## The Debugging Loop

Without structured evidence, a typical loop looks like:

```
fail → infer missing state → guess patch → rerun → repeat
```

With `llmdebug`, the loop becomes:

```
fail → read snapshot → ranked hypotheses → minimal patch → verify
```

## Installation

```bash
pip install 'llmdebug[cli]'   # Recommended: pytest plugin + CLI
pip install llmdebug          # Core library + pytest plugin only
pip install 'llmdebug[mcp]'   # MCP server for IDE integration
```

Other extras: `jupyter`, `toon`, `evals` — see [Configuration Reference](docs/configuration.md#installation-extras).

## Quick Start

### Pytest (automatic)

Failing tests automatically create `.llmdebug/latest.json`:

```bash
pytest                                  # Run tests as usual
llmdebug                                # View crash summary
llmdebug show --detail context          # Full context (git, env, repro command)
llmdebug diff                           # Compare latest vs previous
```

### Decorator

```python
from llmdebug import debug_snapshot

@debug_snapshot()
def main():
    data = load_data()
    process(data)
```

### Context Manager

```python
from llmdebug import snapshot_section

with snapshot_section("data_processing"):
    result = transform(data)
```

More entry points (production hooks, web middleware, Jupyter) in the [Configuration Reference](docs/configuration.md#capture-entry-points).

## Features

- **Automatic capture** — pytest plugin, decorator, context manager, production hooks, web middleware
- **Rich snapshots** — exception chain, prioritized frames, typed locals (array shapes, dtypes), source context
- **Layered detail** — `crash` (~2K tokens) → `full` (~5K) → `context` (~10K) disclosure levels
- **CLI inspection** — `show`, `list`, `frames`, `diff`, `git-context`, `clean`
- **MCP server** — 10 tools for Claude Code, Cursor, and other MCP-capable editors
- **Hypothesis engine** — auto-generated ranked debugging hypotheses from snapshot patterns
- **Privacy controls** — PII redaction profiles (`dev` / `ci` / `prod`), pattern-based redaction
- **Jupyter integration** — cell-error banners + `%llmdebug` magic commands
- **Compact formats** — `json_compact` (~40% smaller) and `toon` (~50% smaller) for LLM context

## CLI

```bash
llmdebug                                # Latest snapshot (crash detail)
llmdebug show --detail full             # All frames
llmdebug show --detail context          # Everything (git, env, repro)
llmdebug show --json --detail context   # JSON output
llmdebug list                           # List snapshots
llmdebug diff                           # Compare latest vs previous
llmdebug clean -k 5                     # Keep 5 most recent
```

| Level | Content | ~Tokens |
|-------|---------|---------|
| `crash` (default) | Exception + crash frame | ~2K |
| `full` | All frames + traceback | ~5K |
| `context` | Everything (repro, git, env, coverage) | ~10K |
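
When a snapshot is fed to an LLM, the detail level can be chosen from the available context budget. The sketch below is illustrative only: `pick_detail` and `DETAIL_LEVELS` are hypothetical names, not part of `llmdebug`, and the token figures are the approximate values from the table above.

```python
# Approximate token cost per detail level, copied from the table above.
# These are rough estimates, not guarantees made by llmdebug.
DETAIL_LEVELS = [("crash", 2_000), ("full", 5_000), ("context", 10_000)]


def pick_detail(budget_tokens: int) -> str:
    """Return the richest detail level whose approximate size fits the budget.

    Falls back to "crash" (the smallest level) when nothing fits.
    """
    chosen = DETAIL_LEVELS[0][0]
    for name, approx_tokens in DETAIL_LEVELS:
        if approx_tokens <= budget_tokens:
            chosen = name
    return chosen
```

The result can then be passed to the CLI, for example `llmdebug show --detail full` for a budget of roughly 6K tokens.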

Full reference: [docs/cli-reference.md](docs/cli-reference.md)

## MCP Server

```bash
pip install 'llmdebug[mcp]'
llmdebug-mcp  # Start the MCP server (stdio transport)
```

| Tool | Description |
|------|-------------|
| `llmdebug_diagnose` | Concise crash summary optimized for LLM consumption |
| `llmdebug_show` | Full expanded JSON snapshot with detail level control |
| `llmdebug_list` | List available snapshots with metadata |
| `llmdebug_frame` | Detailed view of a specific stack frame |
| `llmdebug_git_context` | On-demand enhanced git metadata for crash triage |
| `llmdebug_diff` | Compare two snapshots to show what changed |
| `llmdebug_hypothesize` | Generate ranked debugging hypotheses |
| `llmdebug_rca_status` | Show latest RCA state for a session |
| `llmdebug_rca_history` | Show RCA attempt history |
| `llmdebug_rca_advance` | Manually advance RCA state machine |

Claude Code configuration (`.mcp.json`):

```json
{
  "mcpServers": {
    "llmdebug": {
      "command": "llmdebug-mcp"
    }
  }
}
```
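
If a project already has a `.mcp.json` with other servers, the entry can be merged in rather than overwritten. This is a minimal sketch: `add_llmdebug_server` is a hypothetical helper, not something `llmdebug` ships, and it assumes the file follows the `mcpServers` layout shown above.

```python
import json
from pathlib import Path


def add_llmdebug_server(config: dict) -> dict:
    """Return a copy of an MCP config dict with the llmdebug server added.

    Existing servers are kept, and an existing "llmdebug" entry is left as-is.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers.setdefault("llmdebug", {"command": "llmdebug-mcp"})
    merged["mcpServers"] = servers
    return merged


def update_mcp_file(path: str = ".mcp.json") -> None:
    """Read the config file if it exists, merge the entry, and write it back."""
    target = Path(path)
    current = json.loads(target.read_text(encoding="utf-8")) if target.exists() else {}
    target.write_text(json.dumps(add_llmdebug_server(current), indent=2) + "\n", encoding="utf-8")
```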

Full reference: [docs/mcp-reference.md](docs/mcp-reference.md)

## Output

On failure, `.llmdebug/latest.json` stores a versioned `DebugSession` envelope:

```json
{
  "schema_version": "2.0",
  "kind": "llmdebug.debug_session",
  "session": {
    "name": "test_training_step",
    "timestamp_utc": "2026-01-27T14:30:52Z"
  },
  "snapshot": {
    "exception": {
      "type": "ValueError",
      "message": "operands could not be broadcast together..."
    },
    "frames": [
      {
        "file": "training.py",
        "line": 42,
        "function": "train_step",
        "code": "output = model(x) + residual",
        "locals": {
          "x": {"shape": [32, 64], "dtype": "float32"},
          "residual": {"shape": [32, 128], "dtype": "float32"}
        }
      }
    ]
  }
}
```
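
Because the envelope is plain JSON, it can be inspected with the standard library alone. The sketch below assumes the snapshot was written in the pretty `json` format and uses only the field names visible in the example above; `summarize_snapshot` and `load_latest` are hypothetical helpers, not part of the `llmdebug` API, and compact formats may need different handling.

```python
import json
from pathlib import Path


def load_latest(path: str = ".llmdebug/latest.json") -> dict:
    """Load the latest snapshot file as a dict (assumes plain JSON output)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_snapshot(session: dict) -> str:
    """Render the exception and each frame's locals as a short text summary."""
    snapshot = session.get("snapshot", {})
    exc = snapshot.get("exception", {})
    lines = [f"{exc.get('type', '?')}: {exc.get('message', '')}"]
    for frame in snapshot.get("frames", []):
        lines.append(
            f"  {frame.get('file')}:{frame.get('line')} in {frame.get('function')}"
        )
        # Missing keys are tolerated so partial snapshots still summarize.
        for name, value in frame.get("locals", {}).items():
            lines.append(f"    {name} = {value}")
    return "\n".join(lines)
```

After a failing run, `print(summarize_snapshot(load_latest()))` would list the exception followed by the captured frames and their locals.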

## Configuration

All capture settings are configurable via `@debug_snapshot()` parameters,
`snapshot_section()` arguments, or environment variables (for pytest):

```bash
LLMDEBUG_OUTPUT_FORMAT=json pytest          # Use pretty JSON
LLMDEBUG_REDACTION_PROFILE=ci pytest        # Use CI redaction profile
LLMDEBUG_INCLUDE_GIT=false pytest           # Disable git context
```
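
The same settings can be applied from a script, for example in a CI wrapper. This is a sketch under stated assumptions: `llmdebug_env` and `run_pytest` are hypothetical helpers, and only the three variables shown above are confirmed here. Treating every setting as `LLMDEBUG_<NAME>` is an assumption; check the configuration reference for the real names.

```python
import os
import subprocess
import sys


def llmdebug_env(base=None, **settings) -> dict:
    """Build an environment dict with LLMDEBUG_* variables added.

    Keyword names are upper-cased and prefixed (assumed naming scheme);
    booleans become "true"/"false" to match the shell examples above.
    """
    env = dict(os.environ if base is None else base)
    for key, value in settings.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        env[f"LLMDEBUG_{key.upper()}"] = text
    return env


def run_pytest(args=(), **settings) -> int:
    """Run pytest in a subprocess with the given llmdebug settings applied."""
    cmd = [sys.executable, "-m", "pytest", *args]
    return subprocess.run(cmd, env=llmdebug_env(**settings)).returncode
```

For instance, `run_pytest(["-x"], redaction_profile="ci", include_git=False)` corresponds to setting `LLMDEBUG_REDACTION_PROFILE=ci` and `LLMDEBUG_INCLUDE_GIT=false` on the command line.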

Full parameter reference, output formats, and redaction profiles:
[docs/configuration.md](docs/configuration.md)

## Documentation

| Document | Description |
|----------|-------------|
| [Configuration Reference](docs/configuration.md) | Parameters, env vars, output formats, API surface |
| [CLI Reference](docs/cli-reference.md) | Full CLI command reference |
| [MCP Reference](docs/mcp-reference.md) | MCP server JSON schemas and parameters |
| [Contributing](CONTRIBUTING.md) | Development setup and quality gates |
| [Quality Map](docs/quality-map.md) | Which checks block package changes vs staged eval work |
| [Eval Framework](evals/README.md) | Benchmark methodology and analysis |
| [Research Roadmap](docs/research-improvement-roadmap.md) | Forward-looking priorities |

## License

MIT
