Metadata-Version: 2.4
Name: ultrastable
Version: 0.4.0
Summary: Local-first, audit-grade stability/guard library for AI agents (with optional robotics extras)
Author: Zsolt Döme
License-Expression: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Provides-Extra: events
Requires-Dist: pydantic>=2.5; extra == "events"
Provides-Extra: cli
Requires-Dist: rich>=13; extra == "cli"
Requires-Dist: typer>=0.9; extra == "cli"
Provides-Extra: http
Requires-Dist: httpx>=0.25; extra == "http"
Provides-Extra: otlp
Requires-Dist: opentelemetry-sdk>=1.20; extra == "otlp"
Requires-Dist: opentelemetry-exporter-otlp>=1.20; extra == "otlp"
Provides-Extra: robotics
Requires-Dist: gymnasium>=0.29; extra == "robotics"
Requires-Dist: torch>=2.0; extra == "robotics"
Provides-Extra: cortex
Requires-Dist: pydantic>=2.5; extra == "cortex"
Requires-Dist: rich>=13; extra == "cortex"
Requires-Dist: typer>=0.9; extra == "cortex"
Requires-Dist: httpx>=0.25; extra == "cortex"
Requires-Dist: opentelemetry-sdk>=1.20; extra == "cortex"
Requires-Dist: opentelemetry-exporter-otlp>=1.20; extra == "cortex"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: mypy>=1.6; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: pre-commit>=3.5; extra == "dev"
Requires-Dist: PyYAML>=6; extra == "dev"
Requires-Dist: twine>=4.0; extra == "dev"
Requires-Dist: readme_renderer[md]>=43.0; extra == "dev"
Requires-Dist: langchain-core>=1.2.16; extra == "dev"
Requires-Dist: langgraph>=1.0.0; extra == "dev"
Dynamic: license-file

# Ultrastable

Ultrastable is a local-first, audit-grade guard layer for AI agents. It keeps
agents within viable boundaries by monitoring **essential variables**, running
deterministic detectors, and applying typed interventions when tokens, spend,
context pressure, or retries become risky. The implementation stays focused on
guarding runtime behavior, enforcing viability, and producing finance-grade
evidence—all while keeping the core NumPy-only and offline-first.

## Why Ultrastable?

[![Pipeline](https://gitlab.com/domezsolt/ultrastable/badges/main/pipeline.svg)](https://gitlab.com/domezsolt/ultrastable/-/pipelines)

- **Guard runtime behavior**: detect lexical repeats, runaway tool loops, or escalating costs and respond with deterministic interventions.
- **Enforce viability**: encode budgets/caps as PolicyPacks, hash them, and log every Run/Step/Trigger with schema v1.2 RunEvents that include `policy_hash`.
- **Finance-grade evidence**: append-only JSONL ledgers with per-entry `event_hash`/`prev_hash` hash chains plus CLI spend/unit-econ/zombie reports for CFO-ready attribution.
- **Offline-first + minimal core**: `ultrastable.core` depends only on NumPy/stdlib; CLI/exporters/robotics live behind extras.
- **Dual-domain**: the same primitives stabilize agent loops (tokens/$) and robotics/control loops (battery, torque, temperature).
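
The idea of hashing a PolicyPack so that every logged event can point back to the exact policy in force can be sketched with the standard library alone. This is a minimal illustration and not the library's implementation: the `policy_hash` helper, the canonical-JSON encoding, and the example policy fields are all assumptions made for the sketch.

```python
import hashlib
import json


def policy_hash(policy: dict) -> str:
    """Hash a policy dict via canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical policy contents; the real PolicyPack schema may differ.
pack_a = {"budget_usd": 5.0, "max_retries": 3}
pack_b = {"max_retries": 3, "budget_usd": 5.0}  # same content, different key order

same_hash = policy_hash(pack_a) == policy_hash(pack_b)
changed_hash = policy_hash(pack_a) != policy_hash({**pack_a, "max_retries": 4})
```

Sorting keys before hashing means two files that differ only in key order share a hash, while any change to a cap or budget produces a different one.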

## Installation & Quick Check

User install (PyPI):

```bash
pip install ultrastable
```

Extras (optional):

```bash
# CLI + reports
pip install "ultrastable[cli]"
# Full agent/cortex toolchain (CLI + OTLP telemetry)
pip install "ultrastable[cortex]"
# Robotics demos
pip install "ultrastable[robotics]"
```

Development install:

```bash
pip install -e ".[dev]"
```

To mirror the dependency set used in CI/experiments (core + extras):

```bash
pip install -r requirements.txt
```

Quick sanity check:

```bash
python -c "import ultrastable, ultrastable.core; print(ultrastable.core.ping())"
```

Run the automated experiment suite (writes ledgers/reports under `runs/experiments`):

```bash
python scripts/run_ultrastable_experiments.py --output-dir runs/experiments --keep-artifacts
```

Need a fast CI sanity check? Use the curated `experiments.json` smoke plan:

```bash
python scripts/run_ultrastable_experiments.py --plan experiments.json --output-dir runs/smoke --keep-artifacts
```

Want to exercise every major feature (agent demos, reports, exports, robotics examples) in under
45 minutes? Use the full-suite runner:

```bash
python scripts/run_ultrastable_full_suite.py --output-dir runs/full-suite --keep-artifacts
```

Need the AFMB failure-mode smoke suite to run offline in under 10 minutes (no paid APIs)?

```bash
python scripts/run_afmb_suite.py --output-dir runs/afmb-suite --keep-artifacts
```

## Install Matrix

| Goal | Command | Extras pulled in | Notes |
| --- | --- | --- | --- |
| Core library / embeddable guards | `pip install ultrastable` | NumPy only | Minimal footprint; `tests/test_imports.py` ensures no optional deps are loaded. |
| CLI + demos + reports | `pip install "ultrastable[cli]"` | `typer`, `rich` | Enables the Typer CLI with colorized output + interactive prompts. |
| AI Cortex / FinOps guardrails | `pip install "ultrastable[cortex]"` | `pydantic`, `httpx`, `rich`, `typer`, `opentelemetry-*` | Full AgentGuard stack including report/export tooling and telemetry hooks. |
| Robotics demos + DriveWrapper | `pip install "ultrastable[robotics]"` | `gymnasium`, `torch` | Heavier extra; only required for DriveReward/DriveWrapper/MobileHomeostat2D. |

Extras can be combined (e.g., `pip install "ultrastable[cortex,robotics]"`). Development installs still use `pip install -e ".[dev]"` for lint/test tooling (the quotes keep shells such as zsh from treating the brackets as a glob pattern).
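
The "no optional deps are loaded" guarantee mentioned for the core install can be checked in a few lines. The sketch below is an assumption about how such a check might look, not the contents of `tests/test_imports.py`; the helper names `leaked_optional` and `modules_after_import` are hypothetical.

```python
import subprocess
import sys

# Top-level packages that the extras pull in, per the install matrix above.
OPTIONAL = {"typer", "rich", "pydantic", "httpx", "opentelemetry", "gymnasium", "torch"}


def leaked_optional(loaded: set[str], optional_roots: set[str]) -> set[str]:
    """Return the optional top-level packages present in a set of module names."""
    return {name.split(".")[0] for name in loaded} & optional_roots


def modules_after_import(package: str) -> set[str]:
    """Import `package` in a fresh interpreter and return the names in sys.modules."""
    code = f"import sys, {package}; print('\\n'.join(sorted(sys.modules)))"
    result = subprocess.run(
        [sys.executable, "-c", code], check=True, capture_output=True, text=True
    )
    return set(result.stdout.split())


# With ultrastable installed, a core-only import should leak nothing:
# assert not leaked_optional(modules_after_import("ultrastable"), OPTIONAL)
```

Running the import in a subprocess avoids false positives from modules that the test runner itself has already loaded.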

## 60-second LangChain integration

Guard an existing LangChain or LangGraph workflow without rewriting it—install the connector and attach a callback:

```bash
pip install ultrastable ultrastable-langchain langchain-openai
```

Set your model provider credentials (e.g., `OPENAI_API_KEY`) and drop this snippet into your chain runner:

```python
from langchain_openai import ChatOpenAI
from ultrastable.agent import AgentGuard
from ultrastable.cli.demos import build_agent_loop_controller
from ultrastable.ledger import JsonlLedger
from ultrastable_langchain import (
    UltrastableCallbackHandler,
    llm_run_to_guard_step,
    pre_step_context_from_prompts,
)

ledger = JsonlLedger("runs/langchain_demo.jsonl", redaction="metadata-only")
guard = AgentGuard(
    controller=build_agent_loop_controller(),
    ledger=ledger,
    context_budget_chars=2000,
)
handler = UltrastableCallbackHandler()
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[handler], tags=["support-desk"])

guard.start_run(run_id="support-demo", tags={"agent": "tickets"})
llm.invoke("Draft a 1 sentence welcome.")
context = pre_step_context_from_prompts(handler.llm_runs[-1].prompts, base_id="welcome")
if context:
    guard.pre_step(context)
guard_step = llm_run_to_guard_step(handler.llm_runs[-1])
guard.post_step(
    step_id=guard_step.result.step_id,
    role=guard_step.result.role,
    kind=guard_step.result.kind,
    model=guard_step.result.model,
    prompt_text=guard_step.result.prompt_text,
    response_text=guard_step.result.response_text,
    metrics=guard_step.metrics,
    tags={"tenant": "acme-co", **(guard_step.result.tags or {})},
)
guard.end_run()
ledger.close()
```

`UltrastableCallbackHandler` records LLM/tool spans, the bridge helpers convert them into `AgentGuard` payloads, and `JsonlLedger` writes a tamper-evident log under `runs/`. Iterate over `handler.tool_runs` with `tool_run_to_guard_step` to emit tool calls (with deterministic `tool_args_hash`) so ToolLoopDetector and spend reports see the full interaction history.
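
To see why a deterministic argument hash matters for loop detection, consider the standalone sketch below. It does not use the Ultrastable or LangChain APIs: `tool_args_hash` here is a hypothetical stand-in built on canonical JSON, and `longest_repeat` is an illustrative heuristic rather than the logic of ToolLoopDetector.

```python
import hashlib
import json


def tool_args_hash(args: dict) -> str:
    """Hash tool arguments so that key order does not affect the result."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def longest_repeat(calls: list[tuple[str, dict]]) -> int:
    """Length of the longest run of consecutive identical (tool, args) calls."""
    best = run = 0
    previous = None
    for name, args in calls:
        key = (name, tool_args_hash(args))
        run = run + 1 if key == previous else 1
        best = max(best, run)
        previous = key
    return best


calls = [
    ("search", {"q": "refund policy", "top_k": 3}),
    ("search", {"top_k": 3, "q": "refund policy"}),  # same call, keys reordered
    ("search", {"q": "refund policy", "top_k": 3}),
    ("lookup", {"ticket": 42}),
]
repeats = longest_repeat(calls)  # three identical search calls in a row
```

A guard could compare `repeats` against a threshold to decide when a run looks like a tool loop; the threshold itself would be a policy choice.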

## Documentation

- [`docs/concepts.md`](docs/concepts.md) — essential variables, viability policies, interventions.
- [`docs/quickstart.md`](docs/quickstart.md) — guided CLI walkthrough (mirrors CI smoke tests).
- [`docs/ai/agent_guard.md`](docs/ai/agent_guard.md), [`docs/ai/detectors_interventions.md`](docs/ai/detectors_interventions.md), [`docs/ai/reports_and_gitops.md`](docs/ai/reports_and_gitops.md) — AI guard tutorials.
- [`docs/robotics/homeostatic_reward.md`](docs/robotics/homeostatic_reward.md), [`docs/robotics/mobile_homeostat.md`](docs/robotics/mobile_homeostat.md), [`docs/robotics/plasticity_resets.md`](docs/robotics/plasticity_resets.md) — robotics demos + extras.
- [`docs/registry.md`](docs/registry.md) — coupling/plasticity registries and IDs.
- [`docs/howto_detectors.md`](docs/howto_detectors.md), [`docs/howto_interventions.md`](docs/howto_interventions.md), [`docs/cli.md`](docs/cli.md) — practical guides.
- [`docs/event_schema.md`](docs/event_schema.md) — schema v1.2 (RunEvent `policy_hash`, PolicySwitchEvent provenance).
- [`docs/experiments.md`](docs/experiments.md) — instructions for the experiment runner.
- [`docs/langchain_connector_limitations.md`](docs/langchain_connector_limitations.md) — known limitations of the `ultrastable-langchain` connector’s first public alpha.
- [`docs/benchmark_manifest.md`](docs/benchmark_manifest.md) — `manifest.json` schema + helpers for benchmark runs.
- [`docs/benchmark_results.md`](docs/benchmark_results.md) — `results.json` schema + helpers for benchmark outputs.
- [`docs/benchmark_harness_config.md`](docs/benchmark_harness_config.md) — AFMB harness config format (`afmb_baselines.json`) covering timeout/retry/tool/budget baselines.
- [`ROADMAP.md`](ROADMAP.md) — current milestone focus and upcoming goals.

Baseline harness helpers live under the optional `ultrastable.benchmark` namespace and are only
imported when explicitly requested, so importing the core `ultrastable` package never pulls in
baseline/batch experiment code.

## CLI Highlights

```bash
# Validate and hash a PolicyPack
ultrastable validate policy configs/guard.json

# Tamper-evident ledger validation (hash chain)
ultrastable ledger validate runs/agent_loop.jsonl --hash-chain

# Run demos (built-in or PolicyPack-backed)
ultrastable demo agent-loop --output runs/agent_loop.jsonl --policy-pack configs/guard.json
ultrastable demo budget-cap --output runs/budget_cap.jsonl

# Replay ledgers deterministically
ultrastable replay runs/agent_loop.jsonl --policy-pack configs/guard.json --deterministic

# Export deterministic routing + budget configs
ultrastable export routing-policy runs/agent_loop.jsonl --output configs/routing.json
ultrastable export budget-policy runs/agent_loop.jsonl --output configs/budget_policy.json

# Finance-grade reports
ultrastable report spend runs/agent_loop.jsonl --by customer --filter agent=loop
ultrastable report unit-econ runs/agent_loop.jsonl --metric cost_per_task --task-map mappings/tasks.json
ultrastable report zombie runs/agent_loop.jsonl --start-time 2025-06-01T00:00:00Z --end-time 2025-06-07T23:59:59Z
```

Spend reports now emit a `health` block that surfaces the latest `D(H)` trend
(start/end/min/max plus median and p90) alongside an `intervention_effect_size`
summary of the recorded ΔD(H) deltas. Use it to see whether interventions are
actually reducing the health distance—even when you are only skimming the CLI
output. When you pass `--format text`, the CLI prints a short `D(H)` trend plus
ΔD(H) summary line so on-call engineers can confirm recovery direction at a
glance without scrolling through JSON.
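
The shape of such a summary can be illustrated with a small standard-library sketch. Everything here is an assumption for illustration: the `health_summary` helper, the nearest-rank definition of p90, the output keys under `intervention_effect_size`, and the convention that ΔD(H) is the distance after an intervention minus the distance before it (so negative values mean recovery).

```python
import math
import statistics


def nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def health_summary(distances: list[float], deltas: list[float]) -> dict:
    """Summarize a D(H) series and the recorded per-intervention deltas."""
    return {
        "start": distances[0],
        "end": distances[-1],
        "min": min(distances),
        "max": max(distances),
        "median": statistics.median(distances),
        "p90": nearest_rank(distances, 90),
        "intervention_effect_size": {
            "count": len(deltas),
            "mean_delta": statistics.fmean(deltas) if deltas else 0.0,
        },
    }


distances = [0.9, 0.7, 0.8, 0.4, 0.3]  # hypothetical D(H) per step
deltas = [-0.2, -0.4]                  # hypothetical ΔD(H) per intervention
summary = health_summary(distances, deltas)
```

In this made-up run the distance falls from start to end and the mean delta is negative, which is the pattern to look for when checking that interventions are helping.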

`ultrastable ledger validate` also supports `--hash-chain` to recompute every
`event_hash` and prove the log has not been modified—any mid-chain modification,
reordering, or insertion yields a `prev_hash` mismatch; omit the flag for a quick
JSON/schema sanity check. The legacy `ultrastable validate ledger` remains as an
alias for existing scripts. Ledgers written prior to the hash-chain rollout still
pass this command—the CLI simply reports that the hash chain is absent so older
archives remain readable.

`ultrastable replay` performs the same hash-chain verification (when ledgers
carry `event_hash`/`prev_hash` pairs) before running controllers and includes a
`hash_chain` block in the emitted report metadata so audits can prove the replay
consumed unmodified evidence; legacy ledgers that predate hash chaining are
noted as such instead of failing.
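
The way an `event_hash`/`prev_hash` chain exposes tampering can be shown with a self-contained sketch. This is not the ledger's actual encoding: the canonical-JSON payload, the empty-string genesis value, and the helper names `append` and `first_break` are assumptions chosen for the illustration.

```python
import hashlib
import json
from typing import Optional


def event_hash(payload: dict, prev_hash: str) -> str:
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    """Append an entry whose prev_hash links to the current tail."""
    prev = chain[-1]["event_hash"] if chain else ""
    chain.append(
        {"payload": payload, "prev_hash": prev, "event_hash": event_hash(payload, prev)}
    )


def first_break(chain: list[dict]) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None."""
    prev = ""
    for index, entry in enumerate(chain):
        expected = event_hash(entry["payload"], prev)
        if entry["prev_hash"] != prev or entry["event_hash"] != expected:
            return index
        prev = entry["event_hash"]
    return None


chain: list[dict] = []
for step in range(3):
    append(chain, {"step": step, "cost_usd": 0.01 * step})
intact = first_break(chain)             # None: every link verifies

chain[1]["payload"]["cost_usd"] = 0.0   # tamper with a middle entry
broken_at = first_break(chain)          # index of the modified entry
```

Because each hash covers the previous one, editing, reordering, or inserting an entry invalidates verification from that point onward.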

`inspect` and all `report` subcommands support `--format json|text` plus `--output FILE` to control machine-readable artifacts.
The `inspect` summary highlights `steps.tool_calls` and `interventions.outcomes`
so tool loops and their intervention results are visible without opening the
raw JSONL.

> PolicyPacks must be JSON; YAML parsing has been removed to keep the core dependency-free.

CLI demos/dashboards honor `--redaction metadata-only|selective-text|full-text|none`
(with `none` acting as a convenience alias for `full-text`). Use `--redaction none`
only when you explicitly want prompt/response bodies persisted to disk; the
default `metadata-only` mode hashes/redacts those fields. `selective-text` keeps
error strings for debugging while still hashing prompts/responses, and `full-text`
(`none`) preserves every raw body for short-lived investigations—treat ledgers
produced in that mode as sensitive. See `docs/privacy.md` for the full tradeoff
matrix.
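
As a rough mental model of the four modes, the sketch below applies them to a single record. It is illustrative only: the `redact` helper, the `error` field name, the `sha256:` prefix, and the choice to hash error strings in `metadata-only` mode are assumptions, while `prompt_text` and `response_text` follow the field names used in the LangChain snippet above.

```python
import hashlib

TEXT_FIELDS = ("prompt_text", "response_text")
ALIASES = {"none": "full-text"}  # `none` is a convenience alias
MODES = {"metadata-only", "selective-text", "full-text"}


def _digest(text: str) -> str:
    """Replace a text body with a stable fingerprint."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def redact(record: dict, mode: str = "metadata-only") -> dict:
    """Return a copy of `record` with text fields hashed according to `mode`."""
    mode = ALIASES.get(mode, mode)
    if mode not in MODES:
        raise ValueError(f"unknown redaction mode: {mode}")
    if mode == "full-text":
        return dict(record)  # raw bodies kept: treat the output as sensitive
    hashed = set(TEXT_FIELDS)
    if mode == "metadata-only":
        hashed.add("error")  # assumption: errors are hashed too in this mode
    return {
        key: (_digest(value) if key in hashed and isinstance(value, str) else value)
        for key, value in record.items()
    }


record = {"step_id": "s1", "prompt_text": "hi", "response_text": "hello", "error": "timeout"}
default_view = redact(record)
debug_view = redact(record, "selective-text")
```

Hashing rather than dropping the bodies keeps repeated prompts comparable across entries without storing their text.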

## Observability (Grafana/OTEL) Quick Start

Ultrastable emits spans/metrics via OTLP/HTTP when telemetry is enabled. For a fast local check:

1) Start a local OTEL Collector (HTTP receiver → debug exporter):

```bash
docker run --rm -p 4318:4318 \
  -v "$(pwd)/collector-minimal.yaml:/etc/otelcol/config.yaml:ro" \
  otel/opentelemetry-collector:latest --config /etc/otelcol/config.yaml
```

2) Point Ultrastable to it and send telemetry:

```bash
export ULTRASTABLE_OTLP_ENDPOINT=http://localhost:4318
python3 scripts/run_integrations_smoke.py --otel auto
```

3) Import the dashboard template:

- Open Grafana and import `docs/grafana_dashboard.json` (see `docs/grafana_dashboard.md`).
- For hosted setups (Grafana Cloud), configure your collector to export traces via `otlphttp` and metrics via `prometheusremotewrite` using your Cloud endpoints/tokens.

## Examples

Run any example with `python examples/<name>.py`:

- `agent_loop_offline.py`, `budget_cap.py`, `tool_loop.py`, `context_pressure.py`
- `battery_agent.py` — trust-battery agent that halts when feedback is poor
- `robotics_drive_demo.py`, `mobile_homeostat_demo.py` (writes `runs/mobile_homeostat_trajectory.svg`), `dipaolo_replication.py`
- `replay_policy_change.py`

## License & Contributions

Ultrastable is released under the MIT License (see `LICENSE`). Contributions are
welcome—see `CONTRIBUTING.md` for branching/testing/style guidelines before
opening a merge request.

## Acknowledgement: A Note on the Name

The name “Ultrastable” is a small salute to W. Ross Ashby and his work on
ultrastability and homeostatic systems (e.g., *An Introduction to Cybernetics*;
*Design for a Brain: The Origin of Adaptive Behaviour*). Ideas such as essential
variables, viability, and homeostasis inform the way we model agent health and
design controllers in this library.
