Metadata-Version: 2.4
Name: aegis-eval
Version: 0.1.0
Summary: The Adaptive Intelligence Layer for AI Agents — eval, train, memory, environments.
Project-URL: Homepage, https://aegis.dev
Project-URL: Documentation, https://docs.aegis.dev
Project-URL: Repository, https://github.com/metronis-space/aegis
Project-URL: Issues, https://github.com/metronis-space/aegis/issues
Author-email: "Metronis, Inc." <eng@metronis.dev>
License-Expression: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: httpx<1.0,>=0.27
Requires-Dist: pydantic-settings<3.0,>=2.0
Requires-Dist: pydantic<3.0,>=2.0
Requires-Dist: python-dotenv<2.0,>=1.0
Requires-Dist: pyyaml<7.0,>=6.0
Requires-Dist: rich<14.0,>=13.0
Requires-Dist: typer<1.0,>=0.9
Provides-Extra: all
Requires-Dist: datasets<3.0,>=2.0; extra == 'all'
Requires-Dist: docling<3.0,>=2.0; extra == 'all'
Requires-Dist: fastapi<1.0,>=0.115; extra == 'all'
Requires-Dist: huggingface-hub<1.0,>=0.20; extra == 'all'
Requires-Dist: neo4j<6.0,>=5.0; extra == 'all'
Requires-Dist: numpy<3.0,>=1.26; extra == 'all'
Requires-Dist: pgvector<1.0,>=0.3; extra == 'all'
Requires-Dist: psycopg[binary]<4.0,>=3.1; extra == 'all'
Requires-Dist: python-multipart<1.0,>=0.0.9; extra == 'all'
Requires-Dist: redis<6.0,>=5.0; extra == 'all'
Requires-Dist: sentence-transformers<4.0,>=3.0; extra == 'all'
Requires-Dist: uvicorn[standard]<1.0,>=0.30; extra == 'all'
Provides-Extra: api
Requires-Dist: fastapi<1.0,>=0.115; extra == 'api'
Requires-Dist: python-multipart<1.0,>=0.0.9; extra == 'api'
Requires-Dist: uvicorn[standard]<1.0,>=0.30; extra == 'api'
Provides-Extra: browser
Requires-Dist: playwright<2.0,>=1.40; extra == 'browser'
Provides-Extra: data
Requires-Dist: datasets<3.0,>=2.0; extra == 'data'
Requires-Dist: huggingface-hub<1.0,>=0.20; extra == 'data'
Provides-Extra: db
Requires-Dist: neo4j<6.0,>=5.0; extra == 'db'
Requires-Dist: pgvector<1.0,>=0.3; extra == 'db'
Requires-Dist: psycopg[binary]<4.0,>=3.1; extra == 'db'
Requires-Dist: redis<6.0,>=5.0; extra == 'db'
Provides-Extra: dev
Requires-Dist: mypy<2.0,>=1.13; extra == 'dev'
Requires-Dist: pre-commit<5.0,>=4.0; extra == 'dev'
Requires-Dist: pytest-asyncio<1.0,>=0.24; extra == 'dev'
Requires-Dist: pytest-cov<6.0,>=5.0; extra == 'dev'
Requires-Dist: pytest<9.0,>=8.0; extra == 'dev'
Requires-Dist: ruff<1.0,>=0.8; extra == 'dev'
Requires-Dist: twine<6.0,>=5.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material<10.0,>=9.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]<1.0,>=0.27; extra == 'docs'
Provides-Extra: full
Requires-Dist: datasets<3.0,>=2.0; extra == 'full'
Requires-Dist: docling<3.0,>=2.0; extra == 'full'
Requires-Dist: fastapi<1.0,>=0.115; extra == 'full'
Requires-Dist: huggingface-hub<1.0,>=0.20; extra == 'full'
Requires-Dist: neo4j<6.0,>=5.0; extra == 'full'
Requires-Dist: numpy<3.0,>=1.26; extra == 'full'
Requires-Dist: peft>=0.11; extra == 'full'
Requires-Dist: pgvector<1.0,>=0.3; extra == 'full'
Requires-Dist: playwright<2.0,>=1.40; extra == 'full'
Requires-Dist: psycopg[binary]<4.0,>=3.1; extra == 'full'
Requires-Dist: python-multipart<1.0,>=0.0.9; extra == 'full'
Requires-Dist: redis<6.0,>=5.0; extra == 'full'
Requires-Dist: sentence-transformers<4.0,>=3.0; extra == 'full'
Requires-Dist: torch>=2.0; extra == 'full'
Requires-Dist: transformers>=4.40; extra == 'full'
Requires-Dist: uvicorn[standard]<1.0,>=0.30; extra == 'full'
Requires-Dist: verl>=0.3; extra == 'full'
Provides-Extra: gpu
Requires-Dist: numpy<3.0,>=1.26; extra == 'gpu'
Requires-Dist: peft>=0.11; extra == 'gpu'
Requires-Dist: sentence-transformers<4.0,>=3.0; extra == 'gpu'
Requires-Dist: torch>=2.0; extra == 'gpu'
Requires-Dist: transformers>=4.40; extra == 'gpu'
Requires-Dist: verl>=0.3; extra == 'gpu'
Provides-Extra: ingestion
Requires-Dist: docling<3.0,>=2.0; extra == 'ingestion'
Provides-Extra: scoring
Requires-Dist: numpy<3.0,>=1.26; extra == 'scoring'
Requires-Dist: sentence-transformers<4.0,>=3.0; extra == 'scoring'
Provides-Extra: training
Requires-Dist: peft>=0.11; extra == 'training'
Requires-Dist: torch>=2.0; extra == 'training'
Requires-Dist: transformers>=4.40; extra == 'training'
Requires-Dist: verl>=0.3; extra == 'training'
Description-Content-Type: text/markdown

# Aegis

[![CI](https://github.com/metronis-space/aegis/actions/workflows/ci.yml/badge.svg)](https://github.com/metronis-space/aegis/actions/workflows/ci.yml)
[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
[![PyPI version](https://img.shields.io/pypi/v/aegis-eval.svg)](https://pypi.org/project/aegis-eval/)

**The Adaptive Intelligence Layer for AI Agents** -- eval, train, and memory on one platform.

Aegis is an open-source framework by [Metronis, Inc.](https://aegis.dev) that provides three integrated products for building, evaluating, and improving AI agents:

| Product | What it does |
|---------|-------------|
| **Aegis Eval** | 51 core + 50 domain dimensions, triangulated scoring, scenario generation, diagnostic reporting |
| **Aegis Train** | GRPO-based RL training engine with progressive memory-op unlocking and Observatory monitoring |
| **Aegis Memory** | 7 memory types, 12 RL-trained operations, knowledge graph, vector store, provenance tracking |

---

## Table of Contents

- [Architecture](#architecture)
- [Why Aegis](#why-aegis)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [CLI Reference](#cli-reference)
- [Eval Dimensions](#eval-dimensions)
- [Domain Plugins](#domain-plugins)
- [Agent Adapters](#agent-adapters)
- [API Server](#api-server)
- [Testing](#testing)
- [Benchmarks](#benchmarks)
- [Contributing](#contributing)
- [License](#license)

---

## Architecture

```
┌──────────────────────────────────────────────────────────┐
│                      Aegis Platform                       │
├─────────────────┬─────────────────┬──────────────────────┤
│   Aegis Eval    │   Aegis Train   │    Aegis Memory      │
│   51+50 dims    │   GRPO engine   │  7 types · 12 ops    │
│   3 scorers     │   Observatory   │  KG · Vectors · Log  │
├─────────────────┴─────────────────┴──────────────────────┤
│           Adapters · API · CLI · Plugins                  │
└──────────────────────────────────────────────────────────┘
```

```mermaid
flowchart LR
    A["Aegis Eval"] --> B["Diagnostics"]
    B --> C["Aegis Train"]
    C --> D["Improved Agent Policy"]
    D --> E["Aegis Memory"]
    E --> F["Production Agent Runtime"]
    F --> A
```

### Detailed Data Flow

```mermaid
flowchart TB
    subgraph Eval["Aegis Eval"]
        E1["51 Core Dimensions\n(7 Tiers)"] --> E2["Triangulated Scoring"]
        E3["50 Domain Dimensions\n(Legal / Finance / Safety)"] --> E2
        E2 --> E4["Rule-Based"]
        E2 --> E5["Semantic"]
        E2 --> E6["LLM Judge"]
        E4 & E5 & E6 --> E7["JudgePacketV1"]
    end

    subgraph Train["Aegis Train"]
        T1["AMIR-GRPO / GRPO-SG"] --> T2["Rollout Engine"]
        T2 --> T3["Reward Engine"]
        T3 --> T4["DrGRPO / DAPO / GiGPO / Forge"]
        T4 --> T5["Observatory Monitor"]
        T5 --> T6["Checkpoints + LoRA Adapters"]
    end

    subgraph Memory["Aegis Memory"]
        M1["12 Operations"] --> M2["Event Log"]
        M1 --> M3["Knowledge Graph\n(Neo4j)"]
        M1 --> M4["Vector Store\n(pgvector)"]
        M1 --> M5["Temporal Index"]
    end

    subgraph Adapters["Agent Adapters"]
        AD["OpenAI / Anthropic / LangChain\nLlamaIndex / LangGraph / DSPy / REST"]
    end

    AD -->|TrajectoryV1| Eval
    E7 -->|Diagnostics| Train
    T6 -->|Trained Policy| Memory
    Memory -->|Context| AD
```

**Aegis Eval** provides 51 core evaluation dimensions organized into 7 tiers, plus 50 domain-specific dimensions across Legal, Finance, and Safety verticals. Scoring is triangulated through three independent backends -- rule-based, semantic similarity, and LLM judge -- to reduce single-method bias.

**Aegis Train** implements AMIR-GRPO (Adaptive Multi-stage Iterative Reward GRPO) and GRPO-SG (Staged Gating) for training memory policy networks. Twelve memory operations are progressively unlocked across training stages. The Observatory subsystem monitors for reward hacking, gradient health issues, and distribution drift.

**Aegis Memory** provides managed memory infrastructure with seven memory types (session, episodic, semantic, procedural, prospective, social, meta), backed by an event log, temporal index, knowledge graph, and vector store. Every memory operation is tracked with full provenance, and point-in-time snapshot reconstruction is supported.

---

## Why Aegis

| Capability | Aegis | DeepEval | RAGAS | LangSmith |
|------------|-------|----------|-------|-----------|
| Multi-tier capability + safety eval | ✅ | ⚠️ limited | ⚠️ limited | ⚠️ workflow-focused |
| Integrated RL training loop | ✅ | ❌ | ❌ | ❌ |
| Managed memory operations | ✅ | ❌ | ❌ | ❌ |
| Open-source extensibility (plugins/adapters) | ✅ | ✅ | ✅ | ⚠️ partial |
| Domain-specific evaluation packs | ✅ | ⚠️ custom work | ⚠️ custom work | ⚠️ custom work |

---

## Installation

**Core package:**

```bash
pip install aegis-eval
```

**Optional extras:**

```bash
pip install aegis-eval[api]       # + FastAPI server (uvicorn)
pip install aegis-eval[scoring]   # + sentence-transformers, numpy
pip install aegis-eval[db]        # + PostgreSQL, Neo4j, Redis
pip install aegis-eval[data]      # + HuggingFace datasets
pip install aegis-eval[ingestion] # + Docling document parsing
pip install aegis-eval[gpu]       # + training + scoring (torch, verl, peft)
pip install aegis-eval[all]       # API + scoring + DB + ingestion + data
pip install aegis-eval[full]      # Everything including training
```

**Development setup:**

```bash
git clone https://github.com/metronis-space/aegis.git
cd aegis
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,all]"
```

**Docker (full stack):**

```bash
docker compose up -d  # PostgreSQL, Neo4j, Redis, API server, dashboard
```

**Requirements:** Python 3.11 or later.

---

## Quick Start

### Run an evaluation in Python

```python
from aegis import Evaluator, EvalConfig

evaluator = Evaluator(config=EvalConfig(dimensions="all"))
result = evaluator.run()

print(f"Overall score: {result.overall_score:.2%}")
for tier_name, tier_score in result.tier_scores.items():
    print(f"  {tier_name}: {tier_score:.2%}")
```

### Run an evaluation from the CLI

```bash
aegis eval run --config eval.yaml
```

### Inspect available dimensions

```bash
aegis eval dimensions
```

### Start training (CLI)

```bash
aegis train start --model Qwen/Qwen2.5-7B --optimizer dr_grpo
aegis train status --job-id <JOB_ID>
```

### Check memory subsystem

```bash
aegis memory health
aegis memory audit
```

### Run the closed-loop proof demo

Simulated backend (deterministic, no GPU required):

```bash
./scripts/run_closed_loop_demo.sh
```

This executes: baseline eval -> weak-dimension diagnosis -> simulated training -> re-eval,
and writes a reproducible report to `tmp-download/closed_loop_demo.json`.

Real-artifact backend (compares proof outputs from GPU pipeline):

```bash
python examples/closed_loop_demo.py \
  --backend real \
  --real-baseline results/proof/baseline_eval.json \
  --real-trained results/proof/trained_eval.json \
  --output results/proof/closed_loop_demo_real.json
```

---

## CLI Reference

```
aegis version                                  Show version info
aegis eval run --config eval.yaml              Run evaluation suite
aegis eval dimensions                          List all registered dimensions
aegis eval compare --runs RUN_A --runs RUN_B   Compare two eval runs
aegis eval report --run RUN_ID --format json   Export diagnostic report (json or html)
aegis eval benchmark-list                      List available benchmark suites
aegis eval benchmark --suite legal-memory      Run a benchmark suite
aegis eval benchmark --suite legal-memory-scale  Run 200+ case legal benchmark suite
aegis data download --dataset cuad             Download and prepare datasets
aegis memory health                            Check memory subsystem health
aegis train start --model base-agent-v1        Start RL training run
```

---

## Eval Dimensions

### 7 Core Tiers (51 dimensions)

| Tier | Name | Dimensions | Examples |
|------|------|-----------|----------|
| T1 | Memory Fidelity | 8 | Verbatim recall, temporal ordering, source attribution |
| T2 | Context Intelligence | 8 | Multi-session coherence, relevance filtering, context switching |
| T3 | Learning Dynamics | 8 | Preference drift, correction integration, few-shot adaptation |
| T4 | Reasoning Quality | 7 | Causal reasoning, counterfactual analysis, uncertainty calibration |
| T5 | Meta-Cognition | 7 | Confidence calibration, knowledge boundary detection, self-correction |
| T6 | Collaborative Context | 6 | Shared knowledge management, perspective tracking, conflict resolution |
| T7 | Security & Adversarial | 7 | Injection resistance, memory poisoning detection, privacy compliance |

### Triangulated Scoring

Every dimension is scored through three independent backends:

1. **Rule-based** -- Deterministic checks (exact match, pattern matching, threshold validation)
2. **Semantic** -- Embedding-based similarity using sentence-transformers
3. **LLM Judge** -- Structured rubric evaluation with configurable model backend

Final scores are reconciled across all three to produce a calibrated result.

---

## Domain Plugins

Plugins extend Aegis Eval with industry-specific dimensions and scoring criteria. They are auto-discovered via Python entry points.

### Legal (18 dimensions)

Clause retention, precedent tracking, citation validity, confidentiality boundary enforcement, jurisdictional awareness, statutory interpretation, contract term extraction, and more.

### Finance (20 dimensions)

Numerical retention, FINRA/SEC compliance verification, materiality judgment, portfolio context tracking, risk factor memory, earnings data accuracy, regulatory disclosure, and more.

### Safety (12 dimensions)

Prompt injection detection, PII leakage prevention, privilege boundary enforcement, jailbreak resistance, data exfiltration detection, and more.

### Registering a custom plugin

Define a class extending `aegis.plugins.base.DomainPlugin` and register it as an entry point under `aegis.plugins` in your `pyproject.toml`:

```toml
[project.entry-points."aegis.plugins"]
my_domain = "my_package.plugin:MyDomainPlugin"
```

---

## Agent Adapters

Aegis ships with built-in adapters for common agent frameworks:

| Adapter | Module | Description |
|---------|--------|-------------|
| OpenAI | `aegis.adapters.openai` | Chat Completions API wrapper |
| Anthropic | `aegis.adapters.anthropic` | Messages API wrapper |
| LangChain | `aegis.adapters.langchain` | Chain/Agent integration |
| LlamaIndex | `aegis.adapters.llamaindex` | Query engine integration |
| LangGraph | `aegis.adapters.langgraph` | State machine agent adapter |
| DSPy | `aegis.adapters.dspy` | DSPy module adapter |
| REST | `aegis.adapters.rest` | Generic HTTP endpoint adapter |

Adapters normalize agent interactions into Aegis trajectory format (`TrajectoryV1`) for evaluation and memory tracking.

---

## API Server

Launch the FastAPI server for programmatic access:

```bash
pip install aegis-eval[api]
uvicorn aegis.api.app:app --host 0.0.0.0 --port 8000
```

### Key Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/evals/runs` | Start an evaluation run |
| `GET` | `/v1/evals/dimensions` | List registered dimensions |
| `GET` | `/v1/evals/benchmarks` | List benchmark suites |
| `POST` | `/v1/evals/benchmarks/run` | Run a benchmark suite |
| `GET` | `/v1/evals/benchmarks/history` | List benchmark run history |
| `GET` | `/v1/evals/benchmarks/runs/{run_id}` | Get benchmark run details |
| `POST` | `/v1/arena/submit` | Submit an arena agent |
| `GET` | `/v1/arena/leaderboard` | Get arena leaderboard |
| `POST` | `/v1/retrieval/query` | Query memory with context retrieval |
| `GET` | `/v1/retrieval/benchmark/m4` | Run M4 retrieval precision/depth benchmark |
| `POST` | `/v1/ingestion/upload` | Upload and ingest a document |
| `GET` | `/v1/ingestion/history` | List ingestion runs |
| `GET` | `/v1/ingestion/formats` | List supported formats and parsers |
| `GET` | `/v1/ingestion/{document_id}` | Get ingestion status/details |
| `POST` | `/v1/ingestion/{document_id}/retry` | Retry ingesting a prior document |
| `GET` | `/v1/events/webhooks/queue` | Inspect pending webhook queue |
| `GET` | `/v1/events/webhooks/deliveries` | List webhook delivery attempts |
| `POST` | `/v1/events/webhooks/process` | Process queue with success/failure/retry summary |
| `GET` | `/v1/memory/health` | Memory subsystem health check |
| `POST` | `/v1/train/jobs` | Create a training job |
| `GET` | `/v1/train/jobs` | List training jobs |
| `GET` | `/v1/train/jobs/{job_id}` | Get training job status |
| `POST` | `/v1/train/jobs/{job_id}/enqueue` | Queue a training job |
| `POST` | `/v1/train/jobs/{job_id}/tick` | Advance queued training job by one tick |
| `GET` | `/v1/train/jobs/{job_id}/metrics` | Get training metrics snapshot |
| `GET` | `/v1/train/jobs/{job_id}/metrics/series` | Get per-stage training metrics series |
| `GET` | `/v1/train/jobs/{job_id}/observatory` | Get observatory health checks for a training job |
| `POST` | `/v1/train/jobs/{job_id}/run` | Execute a training job and persist metrics |
| `GET` | `/v1/train/jobs/{job_id}/result` | Get full training result payload (includes PODS and transfer safeguards) |
| `POST` | `/v1/train/jobs/{job_id}/stop` | Stop or cancel a training job |
| `POST` | `/v1/train/run` | Compatibility alias: create training run |
| `GET` | `/v1/train/run/{run_id}` | Compatibility alias: get run status |
| `POST` | `/v1/train/run/{run_id}/enqueue` | Compatibility alias: queue run |
| `POST` | `/v1/train/run/{run_id}/tick` | Compatibility alias: advance queued run |
| `POST` | `/v1/train/run/{run_id}/start` | Compatibility alias: start run |
| `GET` | `/v1/train/run/{run_id}/metrics` | Compatibility alias: get run metrics |
| `GET` | `/v1/train/run/{run_id}/metrics/series` | Compatibility alias: get run metrics series |
| `GET` | `/v1/train/run/{run_id}/observatory` | Compatibility alias: get run observatory checks |
| `GET` | `/v1/train/run/{run_id}/result` | Compatibility alias: get run result |
| `POST` | `/v1/train/run/{run_id}/stop` | Compatibility alias: stop run |
| `GET` | `/v1/train/adapters` | List produced LoRA adapters |
| `GET` | `/v1/train/adapters/{adapter_id}` | Get a LoRA adapter metadata record |
| `GET` | `/v1/train/adapters/{adapter_id}/download` | Download adapter payload + checksum |
| `POST` | `/v1/observability/events` | Emit observability events |

Full OpenAPI docs are available at `/docs` when the server is running.

---

## Backends

Aegis supports both local deterministic mode and real backends:

| Component | Local (default) | Production |
|-----------|----------------|------------|
| **Retrieval** | Lexical index from fixtures | pgvector + Neo4j + cross-encoder reranking |
| **Training** | Deterministic simulation | verl GRPO with LoRA on GPU |
| **Scoring** | Rule-based + semantic | + LLM judge (OpenAI/Anthropic) |
| **Storage** | SQLite (`~/.aegis/aegis.db`) | PostgreSQL + Neo4j + Redis |
| **Ingestion** | In-memory pipeline | pgvector sink + Neo4j entity extraction |

Set environment variables to activate production backends:
```bash
export OPENAI_API_KEY=sk-...            # LLM judge scoring
export ANTHROPIC_API_KEY=sk-ant-...     # Alternative LLM backend
export AEGIS_POSTGRES_URL=postgresql://... # Real persistence
export AEGIS_NEO4J_URL=bolt://...       # Knowledge graph
export AEGIS_REDIS_URL=redis://...      # Cache layer
```

---

## Testing

Run the quality checks locally:

```bash
# Run all tests
pytest -q

# Run with coverage
pytest --cov=aegis --cov-fail-under=80

# Run a specific module
pytest tests/test_dimensions.py
```

Linting and type checking:

```bash
ruff check src/ tests/
mypy src/
```

---

## Benchmarks

Run domain-specific benchmark suites:

```bash
aegis eval benchmark-list
aegis eval benchmark --suite legal-memory      # 50 legal memory lifecycle scenarios
aegis eval benchmark --suite legal-memory-scale # 250 scaled legal memory scenarios
aegis eval benchmark --suite finance-memory    # 50 finance memory lifecycle scenarios
aegis eval benchmark --suite reward-integrity  # 200 seeded reward-hacking scenarios
```

Or via Make: `make benchmark`

---

## Notebooks

Interactive Jupyter notebooks in `notebooks/`:

| Notebook | Description |
|----------|-------------|
| `01_quickstart_eval.ipynb` | First eval run, dimension scores, tier grouping |
| `02_domain_eval.ipynb` | Legal/finance domain evaluation and benchmarks |
| `03_memory_operations.ipynb` | All 12 memory operations demonstrated |
| `04_ingestion_pipeline.ipynb` | Document ingestion with storage sinks |
| `05_closed_loop_demo.ipynb` | Eval -> diagnose -> improve -> re-eval walkthrough |

---

## Project Structure

```
aegis/
├── src/aegis/
│   ├── adapters/          # Agent framework adapters (OpenAI, Anthropic, etc.)
│   ├── api/               # FastAPI server, routes, middleware
│   ├── cli/               # Typer CLI application
│   ├── core/              # Config, shared types, schema definitions
│   ├── data/              # Dataset downloaders (CUAD, LegalBench, FinanceBench)
│   ├── eval/              # Evaluation engine, dimensions, scorers, judges
│   ├── ingestion/         # Document ingestion pipeline + storage sinks
│   ├── memory/            # Event log, graph, vector, temporal, provenance
│   ├── observatory/       # Training monitoring (reward hacking, drift)
│   ├── plugins/           # Domain plugins (legal, finance, safety)
│   ├── retrieval/         # Context retrieval (pgvector, Neo4j, cross-encoder)
│   ├── security/          # Governance and access control
│   ├── store/             # Persistence (SQLite, PostgreSQL)
│   └── training/          # RL engine (AMIR-GRPO, GRPO-SG, verl, curriculum)
├── benchmarks/            # Domain benchmark suites and harness
├── dashboard/             # Next.js dashboard (eval, training, memory)
├── examples/              # Python examples and sample configs
├── notebooks/             # Jupyter notebooks
├── tests/                 # 1300+ automated tests
├── Dockerfile             # Multi-stage API server image
├── docker-compose.yml     # Full-stack local development
├── pyproject.toml         # Build config, dependencies, entry points
└── README.md
```

---

## Contributing

Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/my-feature`)
3. Install dev dependencies (`pip install -e ".[dev,all]"`)
4. Make your changes and add tests
5. Run `pytest`, `ruff check`, and `mypy` to verify
6. Submit a pull request

---

## License

Apache License 2.0. See [LICENSE](LICENSE) for details.

---

Built by [Metronis, Inc.](https://aegis.dev)
