Metadata-Version: 2.4
Name: assay-ai
Version: 1.14.0
Summary: Tamper-evident audit trails for AI systems
Author-email: Tim Bhaserjian <tim2208@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/Haserjian/assay
Project-URL: Repository, https://github.com/Haserjian/assay
Project-URL: Bug Tracker, https://github.com/Haserjian/assay/issues
Project-URL: Documentation, https://github.com/Haserjian/assay/blob/main/docs/README_quickstart.md
Keywords: ai,safety,audit,receipts,governance
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.9.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: PyNaCl>=1.5.0
Requires-Dist: jsonschema>=4.17.0
Requires-Dist: referencing>=0.30.0
Requires-Dist: packaging>=21.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20.0; extra == "anthropic"
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.1.0; extra == "langchain"
Provides-Extra: all
Requires-Dist: openai>=1.0.0; extra == "all"
Requires-Dist: anthropic>=0.20.0; extra == "all"
Requires-Dist: langchain-core>=0.1.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: hypothesis>=6.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Dynamic: license-file

# Assay

Tamper-evident audit trails for AI systems.

We scanned 30 popular AI projects and found 202 high-confidence LLM call
sites. Zero had tamper-evident audit trails.
[Full results](scripts/scan_study/results/report.md).

Assay adds independently verifiable execution evidence to AI systems:
cryptographically signed receipt bundles that a third party can verify
offline without trusting your server logs. Two lines of code. Four exit codes.

```bash
pip install assay-ai && assay quickstart   # requires Python 3.9+
```

> **Boundary:** Assay proves tamper-evident internal consistency and
> completeness relative to scanned call sites. It does not prevent a fully
> compromised machine from fabricating a consistent story. That's what
> [trust tiers](docs/FULL_PICTURE.md#trust-tiers) are for.

> **Not this:** Assay is not a logging framework, an observability dashboard,
> or a monitoring tool. It produces signed evidence bundles that a third party
> can verify offline. If you need Datadog, this isn't it.

## See It -- Then Understand It

Try it now (no API key needed -- demos use synthetic data; with real calls, Assay instruments OpenAI, Anthropic, Gemini, LiteLLM, LangChain, and local models):

```bash
pip install assay-ai
assay demo-challenge    # tamper detection: one valid pack, one with a single byte changed
```

Two packs, one byte changed ("gpt-4" -> "gpt-5" in the receipts). Here's what happens
(pack IDs and timestamps will differ on your machine):

```
$ assay verify-pack challenge_pack/good/

  VERIFICATION PASSED

  Pack ID:    pack_20260222_ca2bb665
  Integrity:  PASS
  Claims:     PASS
  Receipts:   3
  Signature:  Ed25519 valid

  Exit code: 0

$ assay verify-pack challenge_pack/tampered/

  VERIFICATION FAILED

  Pack ID:    pack_20260222_ca2bb665
  Integrity:  FAIL
  Error:      Hash mismatch for receipt_pack.jsonl

  Exit code: 2
```

One byte changed. Verification fails. No server access needed. No trust required. Just math.

Now try the policy violation demo:

```bash
assay demo-incident     # two-act scenario: honest PASS vs honest FAIL
```

```
  Act 1: Agent uses gpt-4 with guardian check
  Integrity: PASS    Claims: PASS    Exit code: 0

  Act 2: Someone swaps model to gpt-3.5-turbo, removes guardian
  Integrity: PASS    Claims: FAIL    Exit code: 1
```

Act 2 is an **honest failure** -- authentic evidence proving the run violated
its declared standards. The evidence is real. The failure is real. Nobody can
edit the history. Exit code 1.

Exit 1 is **audit gold**: a control failed, but the failure is detectable and
retained. Auditors love that more than systems that always claim to pass.

### How that works

Assay separates two questions on purpose:

- **Integrity**: "Were these bytes tampered with after creation?" (signatures, hashes, required files)
- **Claims**: "Does this evidence satisfy our declared governance checks?" (receipt types, counts, field values)

| Integrity | Claims | Exit | Meaning |
|-----------|--------|------|---------|
| PASS | PASS | 0 | Evidence checks out, behavior meets standards |
| PASS | FAIL | 1 | Honest failure: authentic evidence of a standards violation |
| FAIL | -- | 2 | Tampered evidence |
| -- | -- | 3 | Bad input (missing files, invalid arguments) |

The split is the point. Systems that can prove they failed honestly are
more trustworthy than systems that always claim to pass.

**With real calls:** `assay scan .` finds your actual OpenAI / Anthropic / Gemini / LiteLLM / LangChain call sites. `assay patch .` instruments them. Every real LLM call emits a signed receipt. The demos above use synthetic data so you can see verification without configuring anything.

## Add to Your Project

```bash
# 1. Find uninstrumented LLM calls
assay scan . --report

# 2. Patch (one line per SDK, or auto-patch all)
assay patch .

# 3. Run + build a signed evidence pack
# -c receipt_completeness runs the built-in completeness check (see `assay cards list` for all options)
# everything after -- is your normal run command
assay run -c receipt_completeness -- python my_app.py

# 4. Verify
assay verify-pack ./proof_pack_*/
```

`assay scan . --report` finds every LLM call site (OpenAI, Anthropic, Google
Gemini, LiteLLM, LangChain) and generates a self-contained HTML gap report.
`assay patch` inserts the two-line integration. `assay run` wraps your command,
collects receipts, and produces a signed 5-file evidence pack. `assay verify-pack`
checks integrity + claims and exits with one of the four codes above. Then run
`assay explain` on any pack for a plain-English summary.

**Local models**: Any OpenAI-compatible server (Ollama, LM Studio, vLLM,
llama.cpp) works automatically -- Assay patches the OpenAI SDK at the class
level, so `OpenAI(base_url="http://localhost:11434/v1")` emits receipts like
any other provider. LiteLLM users get the same coverage via the LiteLLM
integration (`ollama/llama3`, etc.).

> **Why now**: EU AI Act Article 12 requires automatic logging for high-risk
> AI systems; Article 19 requires providers to retain automatically generated
> logs for at least 6 months. High-risk obligations apply from 2 Aug 2026
> (Annex III) and 2 Aug 2027 (regulated products). SOC 2 CC7.2 requires
> monitoring of system components and analysis of anomalies as security events.
> "We have logs on our server" is not independently verifiable evidence.
> Assay produces evidence that is.
> See [compliance citations](docs/compliance-citations.md) for exact references.

## CI Gate

Three commands, one lockfile:

```bash
assay run -c receipt_completeness -- python my_app.py
assay verify-pack ./proof_pack_*/ --lock assay.lock --require-claim-pass
assay diff ./baseline_pack/ ./proof_pack_*/ --gate-cost-pct 25 --gate-errors 0 --gate-strict
```

The lockfile catches config drift. Verify-pack catches tampering. Diff
catches regressions and budget overruns. See
[Decision Escrow](docs/decision-escrow.md) for the protocol model.

```bash
# Lock your verification contract
assay lock write --cards receipt_completeness -o assay.lock
```

### Daily use after CI is green

**Regression forensics**:

```bash
assay diff ./proof_pack_*/ --against-previous --why
```

`--against-previous` auto-discovers the baseline pack.
`--why` traces receipt chains to explain what regressed and which call sites caused it.

**Cost/latency drift (from receipts)**:

```bash
assay analyze --history --since 7
```

Shows cost, latency percentiles, error rates, and per-model breakdowns
from your local trace history.

## Trust Model

What Assay detects, what it doesn't, and how to strengthen guarantees.

**Assay detects:**
- Retroactive tampering (edit one byte, verification fails)
- Selective omission under a completeness contract
- Claiming checks that were never run
- Policy drift from a locked baseline

**Assay does not prevent:**
- A fully fabricated false run (attacker controls the machine)
- Dishonest receipt content (receipts are self-attested)
- Timestamp fraud without an external time anchor

Completeness is enforced relative to the call sites enumerated by the scanner
and/or declared by policy. Undetected call sites are a known residual risk,
reduced via multi-detector scanning and CI gating.

**To strengthen guarantees:**
- [Transparency ledger](https://github.com/Haserjian/assay-ledger) (independent witness)
- CI-held org key + branch protection (separation of signer and committer)
- External timestamping (RFC 3161)

The cost of cheating scales with the complexity of the lie. Assay doesn't
make fraud impossible -- it makes fraud expensive.

## The Evidence Compiler

Assay is an **evidence compiler** for AI execution. If you've used a build
system, you already know the mental model:

| Concept | Build System | Assay |
|---------|-------------|-------|
| Source | `.c` / `.ts` files | Receipts (one per LLM call) |
| Artifact | Binary / bundle | Evidence pack (5 files, 1 signature) |
| Tests | Unit / integration tests | Verification (integrity + claims) |
| Lock | `package-lock.json` | `assay.lock` |
| Gate | CI deploy check | CI evidence gate |

## Commands

The core path is 6 commands:

```
assay quickstart          # discover
assay scan / assay patch  # instrument
assay run                 # produce evidence
assay verify-pack         # verify evidence
assay diff                # catch regressions
assay score               # evidence readiness (0-100, A-F)
```

Full command reference:

**Getting started**

| Command | Purpose |
|---------|---------|
| `assay quickstart` | One command: demo + scan + next steps |
| `assay status` | One-screen operational dashboard |
| `assay start demo\|ci\|mcp` | Guided entrypoints for trying, CI setup, or MCP auditing |
| `assay onboard` | Guided setup: doctor -> scan -> first run plan |
| `assay doctor` | Preflight check: is Assay ready here? |
| `assay version` | Print installed version |

**Instrument + produce evidence**

| Command | Purpose |
|---------|---------|
| `assay scan` | Find uninstrumented LLM call sites (`--report` for HTML) |
| `assay patch` | Auto-insert SDK integration patches into your entrypoint |
| `assay run` | Wrap command, collect receipts, build signed evidence pack |

**Verify + analyze**

| Command | Purpose |
|---------|---------|
| `assay verify-pack` | Verify integrity + claims (the 4 exit codes) |
| `assay verify-signer` | Extract and verify signer identity from a pack manifest |
| `assay explain` | Plain-English summary of an evidence pack |
| `assay analyze` | Cost, latency, error breakdown from pack or `--history` |
| `assay diff` | Compare packs: claims, cost, latency (`--against-previous`, `--why`, `--gate-*`) |
| `assay score` | Evidence Readiness Score (0-100, A-F) with anti-gaming caps |

**Workflows + CI**

| Command | Purpose |
|---------|---------|
| `assay flow try\|adopt\|ci\|mcp\|audit` | Guided workflow executor (dry-run by default, `--apply` to execute) |
| `assay ci init github` | Generate a GitHub Actions workflow |
| `assay ci doctor` | CI-profile preflight checks |
| `assay audit bundle` | Create portable audit bundle (tar.gz with verify instructions) |
| `assay compliance report` | Generate compliance evidence report |

**Pack + baseline management**

| Command | Purpose |
|---------|---------|
| `assay packs list` | List local proof packs |
| `assay packs show` | Show pack details |
| `assay packs pin-baseline` | Pin a pack as the diff baseline |
| `assay baseline set\|get` | Set or get the baseline pack for diff |

**Key management**

| Command | Purpose |
|---------|---------|
| `assay key generate` | Generate a new Ed25519 signing key |
| `assay key list` | List local signing keys and active signer |
| `assay key info` | Show key details (fingerprint, creation date) |
| `assay key set-active` | Set active signing key for future runs |
| `assay key rotate` | Generate a new key and switch active signer |
| `assay key export\|import` | Export or import keys for CI or team sharing |
| `assay key revoke` | Revoke a signing key |

**Lockfile + cards**

| Command | Purpose |
|---------|---------|
| `assay lock write` | Freeze verification contract to lockfile |
| `assay lock check` | Validate lockfile against current card definitions |
| `assay lock init` | Initialize a new lockfile interactively |
| `assay cards list` | List built-in run cards and their claims |
| `assay cards show` | Show card details, claims, and parameters |

**MCP + policy**

| Command | Purpose |
|---------|---------|
| `assay mcp-proxy` | Transparent MCP proxy: intercept tool calls, emit receipts |
| `assay mcp policy init` | Generate a starter MCP policy YAML file |
| `assay mcp policy validate` | Validate a policy file against the schema |
| `assay policy impact` | Analyze policy impact on existing evidence |

**Incident forensics**

| Command | Purpose |
|---------|---------|
| `assay incident timeline` | Build incident timeline from receipts |
| `assay incident replay` | Replay an incident from receipt chain |

**Demos**

| Command | Purpose |
|---------|---------|
| `assay demo-incident` | Two-act scenario: passing run vs failing run |
| `assay demo-challenge` | CTF-style good + tampered pack pair |
| `assay demo-pack` | Generate demo packs (no config needed) |

## Documentation

- **[Start Here](docs/START_HERE.md) -- 6 steps from install to evidence in CI**
- [Full Picture](docs/FULL_PICTURE.md) -- architecture, trust tiers, repo boundaries, release history
- [Quickstart](docs/README_quickstart.md) -- install, golden path, command reference
- [For Compliance Teams](docs/for-compliance.md) -- what auditors see, evidence artifacts, framework alignment
- [Compliance Citations](docs/compliance-citations.md) -- exact regulatory references (EU AI Act, SOC 2, ISO 42001)
- [Decision Escrow](docs/decision-escrow.md) -- protocol model: agent actions don't settle until verified
- [Roadmap](docs/ROADMAP.md) -- phases, product boundary, execution stack
- [Repo Map](docs/REPO_MAP.md) -- what lives where across the Assay ecosystem
- [Pilot Program](docs/PILOT_PROGRAM.md) -- early adopter program details

## Common Issues

- **"No receipts emitted" after `assay run`**: First, check whether your code
  has call sites: `assay scan .` -- if scan finds 0 sites, you may not be
  using a supported SDK yet. If scan finds sites, check: (1) Is `# assay:patched`
  in the file? Run `assay scan . --report` to see patch status per file.
  (2) Did you install the SDK extra (`pip install assay-ai[openai]`)?
  (3) Did you use `--` before your command (`assay run -- python app.py`)?
  Run `assay doctor` for a full diagnostic.

- **LangChain projects**: `assay patch` auto-instruments OpenAI and Anthropic
  SDKs but not LangChain (which uses callbacks, not monkey-patching). For
  LangChain, add `AssayCallbackHandler()` to your chain's `callbacks` parameter
  manually. See `src/assay/integrations/langchain.py` for the handler.

- **`assay run python app.py` gives "No command provided"**: You need the `--`
  separator: `assay run -c receipt_completeness -- python app.py`. Everything
  after `--` is passed to the subprocess.

- **Quickstart blocked on large directories**: `assay quickstart` guards against
  scanning system directories (>10K Python files). Use `--force` to bypass:
  `assay quickstart --force`.

- **macOS: `ModuleNotFoundError` inside `assay run` but works outside it**:
  On macOS, `python3` on PATH may point to a different Python version than
  where assay and your SDK are installed (e.g. `python3` → 3.14, but packages
  are in 3.11). Use a virtual environment (recommended), or specify the exact
  interpreter: `assay run -- python3.11 app.py`. Check with
  `python3 --version` and compare to the Python where you ran `pip install`.

## Get Involved

- **Try it**: `pip install assay-ai && assay quickstart`
- **Questions / feedback**: [GitHub Discussions](https://github.com/Haserjian/assay/discussions)
- **Bug reports**: [Issues](https://github.com/Haserjian/assay/issues)
- **Want this in your stack in 2 weeks?** [Pilot program](docs/PILOT_PROGRAM.md) --
  we instrument your AI workflows, set up CI gates, and hand you a working
  evidence pipeline. [Open a pilot inquiry](https://github.com/Haserjian/assay/issues/new?template=pilot-inquiry.md).

## Related Repos

| Repo | Purpose |
|------|---------|
| [assay](https://github.com/Haserjian/assay) | Core CLI, SDK, conformance corpus (this repo) |
| [assay-verify-action](https://github.com/Haserjian/assay-verify-action) | GitHub Action for CI verification |
| [assay-ledger](https://github.com/Haserjian/assay-ledger) | Public transparency ledger |

## License

Apache-2.0
