Metadata-Version: 2.4
Name: memory-harness
Version: 1.2.0
Summary: Benchmark and validate AI memory systems
License: MIT
Project-URL: Homepage, https://github.com/memory-harness/cli
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.24.0
Provides-Extra: embedding
Requires-Dist: sentence-transformers>=2.2.0; extra == "embedding"

# Memory Harness

**Benchmark and validate AI memory systems. Detect regressions, leakage, and shortcuts.**
```
Score: 90/100  Grade: A
```

## Install
```bash
pip install memory-harness
```

The optional `embedding` extra also installs `sentence-transformers`: `pip install "memory-harness[embedding]"`.

## Quick Start (Direct Mode)

Test your memory endpoint in 30 seconds:
```bash
python3 -m memorybench dataset \
  -d dataset.jsonl \
  --provider-endpoint https://your-memory-api.com \
  --n-probe 16
```

## Quick Start (Dashboard Mode)
```bash
python3 -m memorybench login -e your@email.com
python3 -m memorybench dataset -d dataset.jsonl --n-probe 16
```

## Dataset Format

Create a JSONL file with `store` and `query` items:
```jsonl
{"type":"store","item_id":"doc1","tenant_id":"acme","text":"Customer bought widgets"}
{"type":"store","item_id":"doc2","tenant_id":"acme","text":"Support ticket opened"}
{"type":"query","query_id":"q1","tenant_id":"acme","text":"customer purchase","expected_item_id":"doc1"}
```

### Fields

**Store items** (documents to remember):

| Field | Required | Description |
|-------|----------|-------------|
| type | Yes | `"store"` |
| item_id | Yes | Unique identifier |
| tenant_id | Yes | Namespace/tenant |
| text | Yes | Content |

**Query items** (retrieval tests):

| Field | Required | Description |
|-------|----------|-------------|
| type | Yes | `"query"` |
| query_id | No | Unique query identifier |
| tenant_id | Yes | Namespace/tenant |
| text | Yes | Search query |
| expected_item_id | Yes | Correct match |

### Validate Your Dataset
```bash
python3 -m memorybench validate -d dataset.jsonl
```
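
The `validate` command is the authoritative check. As a rough illustration of what the field tables above imply, here is a minimal sketch of a standalone checker. The function name `check_dataset` is hypothetical, and the rule that `expected_item_id` must refer to an item stored under the same tenant is an assumption, not documented behavior of the CLI.

```python
import json

# Required fields per record type, taken from the tables above.
REQUIRED = {
    "store": {"type", "item_id", "tenant_id", "text"},
    "query": {"type", "tenant_id", "text", "expected_item_id"},
}


def check_dataset(lines):
    """Return a list of human-readable problems found in JSONL lines."""
    problems = []
    stored = set()   # (tenant_id, item_id) pairs seen in store records
    expected = []    # (line number, tenant_id, expected_item_id) from queries

    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            rec = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {n}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(rec, dict):
            problems.append(f"line {n}: expected a JSON object")
            continue
        kind = rec.get("type")
        if kind not in REQUIRED:
            problems.append(f"line {n}: unknown type {kind!r}")
            continue
        missing = REQUIRED[kind] - set(rec)
        if missing:
            problems.append(f"line {n}: missing fields {sorted(missing)}")
            continue
        if kind == "store":
            stored.add((rec["tenant_id"], rec["item_id"]))
        else:
            expected.append((n, rec["tenant_id"], rec["expected_item_id"]))

    # Assumption: a query's expected item must be stored under the same tenant.
    for n, tenant, item in expected:
        if (tenant, item) not in stored:
            problems.append(
                f"line {n}: expected_item_id {item!r} not stored for tenant {tenant!r}"
            )
    return problems
```

To run it on a file, pass the open file object: `check_dataset(open('dataset.jsonl', encoding='utf-8'))`. An empty list means no problems were found by this sketch.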

## Metrics

| Metric | Description | Target |
|--------|-------------|--------|
| Accuracy@1 | Exact match rate | ≥70% |
| Accuracy@k | Top-k hit rate | ≥90% |
| Cross-tenant | Leakage between tenants | <5% |
| Collision | Same result for different queries | <20% |
| Confidence | Clear winner margin | ≥80% |

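The table gives only one-line definitions, so the following is a sketch of one plausible reading of Accuracy@1, Accuracy@k, and Cross-tenant, not the harness's actual implementation. The function names and the input shapes (`ranked` maps a query ID to result item IDs, best first) are assumptions made for illustration.

```python
def accuracy_at_k(ranked, expected, k):
    """Fraction of queries whose expected item is among the top-k results."""
    if not expected:
        return 0.0
    hits = sum(
        1 for qid, want in expected.items()
        if want in ranked.get(qid, [])[:k]
    )
    return hits / len(expected)


def cross_tenant_rate(ranked, query_tenant, item_tenant):
    """Fraction of queries whose top result belongs to another tenant.

    Assumption: leakage is judged on the top-ranked result only.
    """
    if not ranked:
        return 0.0
    leaks = 0
    for qid, results in ranked.items():
        if results and item_tenant.get(results[0]) != query_tenant[qid]:
            leaks += 1
    return leaks / len(ranked)


# Toy data: three queries from tenant "acme", one of which surfaces a
# document owned by tenant "globex".
ranked = {"q1": ["doc1", "doc2"], "q2": ["doc3", "doc2"], "q3": ["doc9", "doc4"]}
expected = {"q1": "doc1", "q2": "doc2", "q3": "doc5"}
query_tenant = {"q1": "acme", "q2": "acme", "q3": "acme"}
item_tenant = {"doc1": "acme", "doc2": "acme", "doc3": "globex",
               "doc4": "acme", "doc9": "acme"}

acc1 = accuracy_at_k(ranked, expected, k=1)   # 1 of 3 queries hit at rank 1
acc2 = accuracy_at_k(ranked, expected, k=2)   # 2 of 3 queries hit within top 2
leak = cross_tenant_rate(ranked, query_tenant, item_tenant)  # 1 of 3 leaks
```
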
## Scoring
```
Score = 35×Acc@1 + 15×Acc@k + 25×(1-CrossTenant) + 10×(1-Collision) + 15×Confidence
```

Each metric enters the formula as a fraction between 0 and 1; the weights sum to 100, so the score ranges from 0 to 100.

| Grade | Score |
|-------|-------|
| A | 90-100 |
| B | 80-89 |
| C | 70-79 |
| D | 60-69 |
| F | <60 |

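The formula and grade table can be reproduced in a few lines. This is a sketch, assuming every metric is passed as a fraction in [0, 1] and that grade boundaries are inclusive lower bounds (so a fractional score such as 89.5 is a B); the function names are illustrative and not part of the CLI.

```python
def score(acc1, acck, cross_tenant, collision, confidence):
    """Weighted score on a 0-100 scale, following the formula above."""
    return (
        35 * acc1
        + 15 * acck
        + 25 * (1 - cross_tenant)
        + 10 * (1 - collision)
        + 15 * confidence
    )


def grade(value):
    """Map a score to a letter grade (assumed inclusive lower bounds)."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if value >= floor:
            return letter
    return "F"


# A system sitting exactly on every target in the Metrics table.
at_target = score(acc1=0.70, acck=0.90, cross_tenant=0.05,
                  collision=0.20, confidence=0.80)
print(round(at_target, 2), grade(at_target))
```

Under these assumptions, a system that exactly meets every target in the Metrics table scores about 81.75, which is a B.
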
## CI Integration
```yaml
name: Memory Audit
on: [push]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install memory-harness
      - run: |
          python3 -m memorybench dataset \
            -d tests/memory_data.jsonl \
            --provider-endpoint ${{ secrets.MEMORY_ENDPOINT }} \
            --n-probe 16 \
            --pass-threshold 70
```

## CLI Reference
```bash
python3 -m memorybench --version
python3 -m memorybench login -e EMAIL
python3 -m memorybench validate -d DATASET
python3 -m memorybench dataset -d DATASET [OPTIONS]
python3 -m memorybench run  # 7-test audit
```

### Dataset Options

| Option | Default | Description |
|--------|---------|-------------|
| -d, --dataset | required | JSONL file path |
| -a, --adapter | text | Adapter: `hash`, `text`, or `embedding` |
| --n-probe | 16 | Pattern dimension |
| --n-bridge | 16 | Cue dimension |
| --provider-endpoint | - | Direct API URL |
| --pass-threshold | 70 | Minimum score |
| -o, --output | dataset_report.json | Report file |

## Example: CSV to JSONL
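```python
import csv
import json

# Write one "store" record per row of docs.csv (columns: id, org, content).
with open('docs.csv', newline='', encoding='utf-8') as f, \
        open('dataset.jsonl', 'w', encoding='utf-8') as out:
    for row in csv.DictReader(f):
        out.write(json.dumps({
            "type": "store",
            "item_id": row["id"],
            "tenant_id": row["org"],
            "text": row["content"]
        }) + "\n")

# Append one "query" record per row of queries.csv (columns: org, query, doc_id).
with open('queries.csv', newline='', encoding='utf-8') as f, \
        open('dataset.jsonl', 'a', encoding='utf-8') as out:
    for row in csv.DictReader(f):
        out.write(json.dumps({
            "type": "query",
            "tenant_id": row["org"],
            "text": row["query"],
            "expected_item_id": row["doc_id"]
        }) + "\n")
```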

## Provider API Requirements

Your memory endpoint must implement:
```
POST /reset     {"seed": int}
POST /store     {"pattern": [[]], "cue": [[]], "learn_steps": int}
POST /recall    {"cue": [[]], "steps": int} → {"pattern": [[]]}
```
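
For local experiments, a stub provider can be built with the Python standard library alone. The sketch below assumes details the listing above does not specify: `[[]]` is read as a 2-D array of numbers, `/reset` and `/store` reply with an empty JSON object, and recall returns the stored pattern whose cue is nearest to the query cue. The `ToyMemory` class, the `serve` helper, and port 8000 are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyMemory:
    """Stores (cue, pattern) pairs; recalls the pattern with the nearest cue."""

    def __init__(self):
        self.pairs = []

    def reset(self, seed):
        self.pairs = []  # seed is accepted but unused in this toy
        return {}

    def store(self, pattern, cue, learn_steps=1):
        self.pairs.append((cue, pattern))  # learn_steps is ignored here
        return {}

    def recall(self, cue, steps=1):
        if not self.pairs:
            return {"pattern": [[]]}

        def distance(a, b):
            # Squared Euclidean distance between two flattened 2-D arrays.
            flat_a = [x for row in a for x in row]
            flat_b = [x for row in b for x in row]
            return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b))

        _, best = min(self.pairs, key=lambda pair: distance(pair[0], cue))
        return {"pattern": best}


MEMORY = ToyMemory()


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        routes = {"/reset": MEMORY.reset, "/store": MEMORY.store,
                  "/recall": MEMORY.recall}
        route = routes.get(self.path)
        if route is None:
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        payload = json.dumps(route(**body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Run the stub until interrupted (host and port are arbitrary choices)."""
    HTTPServer((host, port), Handler).serve_forever()
```

Calling `serve()` starts the stub, after which `--provider-endpoint http://127.0.0.1:8000` would point the benchmark at it. A nearest-cue lookup is only a stand-in for a real memory system, so scores from this stub say nothing about the harness itself.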

## License

MIT
