Metadata-Version: 2.4
Name: jaymd96-pytest-agent
Version: 0.1.0
Summary: LLM-optimised test output for pytest -- structured JSONL, deduplicated warnings, adaptive token budgets, and failure clustering via Claude Code.
Project-URL: Homepage, https://github.com/jaymd96/psclaude
Project-URL: Source, https://github.com/jaymd96/psclaude
Project-URL: Issues, https://github.com/jaymd96/psclaude/issues
Author: jaymd96
License-Expression: MIT
License-File: LICENSE
Keywords: claude,llm,pytest,structured-output,testing
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Requires-Dist: pytest>=7.0
Provides-Extra: cluster
Requires-Dist: jaymd96-psclaude>=0.1.0; extra == 'cluster'
Description-Content-Type: text/markdown

# pytest-agent

LLM-optimised test output for pytest. Structured JSONL, deduplicated warnings, adaptive token budgets, and automatic failure clustering via Claude Code.

## The idea

Test output is designed for humans scanning terminals. LLM agents consuming test results must reverse-parse decorative formatting back into structured data. This wastes tokens, introduces parsing errors, and conflates root causes with their symptoms across duplicated tracebacks.

`pytest-agent` fixes this with a two-phase architecture:

- **Phase 1** (deterministic): a pytest plugin extracts structured JSONL from `TestReport` objects — no formatting, no decoration, no information loss
- **Phase 2** (LLM clustering): sends failures to Claude Code CLI for semantic root-cause clustering — grouping failures by *cause*, not by string similarity

The tool is **self-contained**. It detects whether Claude Code is installed, uses it if available, and falls back gracefully to raw JSONL if not. No pipes, no external scripts, no configuration.

## Install

```bash
pip install -e .
```

## Usage

### One command — full pipeline

```bash
# Run tests with automatic failure clustering
pytest --cluster

# With output files for CI integration
pytest --cluster --agent-output report.jsonl --cluster-output clusters.txt
```

If Claude Code is installed, you get a clustered failure report. If not, you get a clear message and the raw structured JSONL — the tool never fails silently.

### Phase 1 only — structured JSONL

```bash
# Emit structured JSONL without clustering
pytest --agent-json --agent-output report.jsonl

# Control verbosity with token budget
pytest --agent-json --agent-output report.jsonl --token-budget 500
```

### Standalone clustering

```bash
# Check if Claude Code is available
python -m pytest_agent.cluster --check

# Cluster a previously-generated JSONL file
python -m pytest_agent.cluster report.jsonl
```

### In CLAUDE.md for Claude Code projects

```markdown
When running tests, use:
  pytest --cluster --agent-output /tmp/test-report.jsonl --cluster-output /tmp/clusters.txt
Read /tmp/clusters.txt for failure analysis. If clustering fails, read /tmp/test-report.jsonl directly.
```

## How it works

```
pytest run
    │
    ▼
Phase 1: pytest plugin hooks into TestReport objects
    │  ├─ Extracts structured exception/assertion data
    │  ├─ Deduplicates warnings with occurrence counts
    │  └─ Writes JSONL (verdict-first, one line per failure)
    │
    ▼
Phase 2: Claude Code subprocess (if --cluster and claude is installed)
    │  ├─ Detects claude CLI via shutil.which + fallback paths
    │  ├─ Sends structured failures as JSON (not terminal output)
    │  ├─ Receives clustered root-cause analysis
    │  └─ Falls back to raw JSONL if unavailable
    │
    ▼
Output: compact cluster report + raw JSONL
```

## Output schema

### JSONL (Phase 1)

Every line is valid JSON. Line 1 is always the summary — the decision signal comes first.

```json
{"type": "summary", "verdict": "FAIL", "counts": {"passed": 42, "failed": 3, "skipped": 5}, "duration_s": 1.23}
{"type": "failure", "node_id": "tests/test_auth.py::test_validate_token", "exception": {"type": "AssertionError", "message": "..."}, "assertion": {"left": "'expired'", "comparator": "==", "right": "'valid'"}, "traceback": [...]}
{"type": "warnings", "unique_count": 1, "total_count": 12, "items": [{"category": "DeprecationWarning", "message": "...", "count": 12}]}
```

### Cluster report (Phase 2)

```
VERDICT: FAIL | 6 passed, 4 failed, 1 skipped | 0.22s

3 failure cluster(s):

  [incorrect/localised] validate_token() returns "expired" for expired tokens — tests expect "valid"
    fix → examples/test_demo.py::test_expired_token_returns_valid
    tests (2): test_expired_token_returns_valid, test_token_status_is_not_expired
    evidence: Both call validate_token("expired-abc"), function is correct, tests are wrong

  [incorrect/one-liner] calculate_discount() missing "platinum" tier in rates dict
    fix → examples/test_demo.py::calculate_discount
    tests (1): test_platinum_discount

  [incorrect/one-liner] parse_config() raises ValueError on empty input
    fix → examples/test_demo.py::parse_config
    tests (1): test_parse_empty_config
```

## Token budget tiers

| Flag | Includes | ~Tokens (4 failures) |
|------|----------|----:|
| `--token-budget 0` (default) | Full output — all tracebacks, all warnings | ~840 |
| `--token-budget 2000` | Truncated tracebacks (assertion + source-under-test only) | ~500 |
| `--token-budget 500` | Failure one-liners (node_id + exception) | ~182 |
| `--token-budget 100` | Verdict line only | ~57 |

## CLI flags

| Flag | Description |
|------|-------------|
| `--agent-json` | Enable structured JSONL output |
| `--agent-output PATH` | Write JSONL to file (default: stderr) |
| `--cluster` | Cluster failures via Claude Code (implies `--agent-json`) |
| `--cluster-output PATH` | Write cluster report to file (default: stderr) |
| `--cluster-timeout N` | Max seconds for Claude Code response (default: 120) |
| `--token-budget N` | Control JSONL verbosity (0 = unlimited) |

## Requirements

- Python ≥ 3.10
- pytest ≥ 7.0
- Claude Code CLI (optional, for `--cluster`): `npm install -g @anthropic-ai/claude-code`
