Metadata-Version: 2.4
Name: sage-ncs
Version: 0.1.0
Summary: SAGE - Safety Analysis and Guidance Engine for Nuclear Criticality Safety
Project-URL: Homepage, https://github.com/sage-ncs/sage
Project-URL: Documentation, https://sage-ncs.github.io/sage
Project-URL: Repository, https://github.com/sage-ncs/sage
Author: SAGE Team
License: MIT
Keywords: ai,criticality,llm,nuclear,rag,safety
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.11
Requires-Dist: alembic>=1.13.0
Requires-Dist: anthropic>=0.40.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: fastembed>=0.3.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: jinja2>=3.0.0
Requires-Dist: openai>=1.0.0
Requires-Dist: pdfplumber>=0.11.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: psycopg2-binary>=2.9.9
Requires-Dist: pydantic-settings>=2.1.0
Requires-Dist: pydantic>=2.6.0
Requires-Dist: pymupdf>=1.24.0
Requires-Dist: pytesseract>=0.3.10
Requires-Dist: python-docx>=1.1.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: python-json-logger>=2.0.0
Requires-Dist: qdrant-client>=1.9.0
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: tenacity>=8.2.0
Provides-Extra: all
Requires-Dist: bandit[toml]>=1.7.0; extra == 'all'
Requires-Dist: black>=24.0.0; extra == 'all'
Requires-Dist: detect-secrets>=1.4.0; extra == 'all'
Requires-Dist: hypothesis>=6.100.0; extra == 'all'
Requires-Dist: mkdocs-material>=9.5.0; extra == 'all'
Requires-Dist: mkdocs>=1.5.0; extra == 'all'
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'all'
Requires-Dist: mlflow>=2.10.0; extra == 'all'
Requires-Dist: mypy>=1.8.0; extra == 'all'
Requires-Dist: pip-audit>=2.7.0; extra == 'all'
Requires-Dist: pre-commit>=3.6.0; extra == 'all'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'all'
Requires-Dist: pytest-cov>=4.1.0; extra == 'all'
Requires-Dist: pytest-xdist>=3.5.0; extra == 'all'
Requires-Dist: pytest>=8.0.0; extra == 'all'
Requires-Dist: ruff>=0.2.0; extra == 'all'
Requires-Dist: safety>=3.0.0; extra == 'all'
Requires-Dist: sentence-transformers>=2.6.0; extra == 'all'
Requires-Dist: torch>=2.0.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: bandit[toml]>=1.7.0; extra == 'dev'
Requires-Dist: black>=24.0.0; extra == 'dev'
Requires-Dist: detect-secrets>=1.4.0; extra == 'dev'
Requires-Dist: hypothesis>=6.100.0; extra == 'dev'
Requires-Dist: mlflow>=2.10.0; extra == 'dev'
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pip-audit>=2.7.0; extra == 'dev'
Requires-Dist: pre-commit>=3.6.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest-xdist>=3.5.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.2.0; extra == 'dev'
Requires-Dist: safety>=3.0.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5.0; extra == 'docs'
Requires-Dist: mkdocs>=1.5.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'docs'
Provides-Extra: local
Requires-Dist: sentence-transformers>=2.6.0; extra == 'local'
Requires-Dist: torch>=2.0.0; extra == 'local'
Description-Content-Type: text/markdown

<p align="center">
  <img src="assets/sage-logo.svg" width="400" alt="SAGE Logo">
</p>

# SAGE

**Safety Analysis & Guidance Engine**

A reasoning language model platform for safety-critical technical domains.

**Status:** Phase 1 (Foundation) complete | Phase 2 (Core Development) planned

**Version:** 0.1.0 | **Python:** 3.11-3.12 | **License:** MIT

## Initial Focus: SAGE-NCS

Nuclear Criticality Safety (NCS) is the initial domain, providing:

- Knowledge queries (standards, regulations, historical accidents)
- Document drafting assistance (CSEs, technical basis)
- Calculation support (SCALE/MCNP setup, interpretation)
- Double contingency analysis
- Training for new NCS engineers

## What's Built

### Safety Core (Production-Ready)

The safety-critical infrastructure is fully implemented with 100% test coverage:

- **Abstention system** - 6 triggers for refusing to answer (out-of-domain, low confidence, conflicting sources, calculation limits, dual-use concerns, ambiguous safety)
- **Escalation workflow** - 5 triggers routing to human experts (novel configurations, safety basis changes, regulatory implications, low confidence + high stakes, conflicting reasoning)
- **Output classification** - GREEN/YELLOW/RED safety levels with integrated validation
- **Reasoning verification** - Chain-of-thought validation, citation injection, uncertainty quantification

### Data Pipeline

- **Ingestion:** PDF (with table preservation), OCR (Tesseract), Word, HTML extraction
- **Sources:** NRC ADAMS and OSTI.gov API clients, ANSI/ANS-8 standards catalog
- **Processing:** Intelligent chunking preserving tables/equations/section hierarchy, metadata extraction, quality validation
- **Storage:** Qdrant vector store with hybrid search (dense + sparse via Reciprocal Rank Fusion), PostgreSQL for metadata
- **Embeddings:** OpenAI text-embedding-3-small (primary), BGE-large-en-v1.5 (local alternative)
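
Reciprocal Rank Fusion combines the dense and sparse result lists by summing reciprocal ranks, so a chunk that ranks well in both lists rises to the top. The following is a minimal sketch of that idea, assuming each retriever returns an ordered list of chunk IDs. The function name `rrf_fuse`, the sample chunk IDs, and the constant `k = 60` (a commonly used default) are illustrative assumptions and are not taken from the SAGE codebase, where the fusion may be performed inside Qdrant itself.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the rankings it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["chunk-a", "chunk-b", "chunk-c"]   # hypothetical dense-search results
sparse = ["chunk-b", "chunk-d", "chunk-a"]  # hypothetical sparse-search results
fused = rrf_fuse([dense, sparse])
print([doc_id for doc_id, _ in fused])
```

Because only ranks are used, the dense and sparse scores never need to be normalized against each other.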

### RAG Pipeline

- Full retrieval-augmented generation with 30+ configurable options
- Context building, response generation, citation verification, grounding score calculation
- NCS-specific prompt templates with conservative bias enforcement
- Multi-provider LLM support (Claude, GPT-4)

### NCS Tools

- **K-eff estimator** - Hand-method screening calculations, ANSI/ANS-8.1 single-parameter limits, surface density method
- **SCALE interface** - Input file generation and output parsing (requires external SCALE installation)
- **Standards lookup** - ANSI/ANS-8 series queries with version tracking
- **Geometry visualizer** - 3D visualization of fissile configurations
- **Unit converter** - Mass, volume, concentration, enrichment conversions

### Evaluation & Benchmarks

- Benchmark runner with ARH-600, ICSBEP (5000+ experiments), CSE review, and red team adversarial suites
- Evaluation metrics: accuracy, precision, recall, F1, citation quality, conservative bias
- Current baseline: 100% accuracy on initial test suite (19/19 questions)
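
For reference, accuracy, precision, recall, and F1 can be computed from paired binary labels as sketched below. This is an illustration under stated assumptions, not the SAGE benchmark runner: the function name `binary_metrics` and the sample labels are hypothetical, and `True` stands for whatever positive class a suite defines (for example, "should escalate").

```python
def binary_metrics(predicted: list[bool], expected: list[bool]) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")

    pairs = list(zip(predicted, expected))
    tp = sum(1 for p, e in pairs if p and e)
    fp = sum(1 for p, e in pairs if p and not e)
    fn = sum(1 for p, e in pairs if not p and e)
    tn = sum(1 for p, e in pairs if not p and not e)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative, 1 true negative.
metrics = binary_metrics(
    predicted=[True, True, False, False, True],
    expected=[True, False, False, True, True],
)
print(metrics)
```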

### Training Data Collection

- Async interaction logger for all SAGE sessions (JSONL/CSV export)
- Expert feedback collection (correctness, citations, clarity)
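
One way an asynchronous JSONL logger could look is sketched below, using only the standard library. The function names, record fields, and file name are assumptions made for illustration; the actual SAGE logger and its schema may differ.

```python
import asyncio
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def _append_line(path: Path, line: str) -> None:
    """Blocking helper: append one line to the log file."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")


async def log_interaction(path: Path, query: str, response: str) -> dict:
    """Append one interaction as a JSON object on its own line (JSONL)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,        # hypothetical field names
        "response": response,
    }
    # Run the blocking write in a worker thread so the event loop stays free.
    await asyncio.to_thread(_append_line, path, json.dumps(record))
    return record


async def main() -> list[dict]:
    path = Path(tempfile.mkdtemp()) / "interactions.jsonl"
    await log_interaction(path, "hypothetical query", "hypothetical response")
    await log_interaction(path, "second query", "second response")
    # Read the file back: one JSON document per line.
    return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]


records = asyncio.run(main())
print(len(records))
```

Writing one JSON object per line keeps the log append-only and lets an export step stream it without loading the whole file.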

### Infrastructure

- **CI/CD:** GitHub Actions pipelines for lint, type-check, test, security scanning, and release
- **Docker:** Multi-stage production image (Alpine), dev image, docker-compose for local dev (PostgreSQL + Qdrant + MLflow)
- **Monitoring:** Structured JSON logging, audit trail, production metrics tracking
- **Experiment tracking:** MLflow integration

## Architecture

```
Query → Router → [KNOWLEDGE | CALCULATION | ANALYSIS | REASONING]
                        ↓
               RAG Pipeline + Tools
                        ↓
         Reasoning Verification & Validation
                        ↓
         Classification (GREEN / YELLOW / RED)
              ↓              ↓           ↓
           Return      Caution       Escalate to
           answer      + caveats     human expert
```
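
The final branch of the diagram can be read as a dispatch on the classification level. The sketch below illustrates that reading only: `SafetyLevel`, `Outcome`, `dispatch`, and the caveat wording are hypothetical names, and withholding the draft answer on RED is an assumption rather than documented SAGE behavior.

```python
from dataclasses import dataclass
from enum import Enum


class SafetyLevel(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


@dataclass(frozen=True)
class Outcome:
    action: str  # "return", "caution", or "escalate"
    text: str


def dispatch(level: SafetyLevel, answer: str) -> Outcome:
    """Map a classification level to the action shown in the diagram."""
    if level is SafetyLevel.GREEN:
        return Outcome("return", answer)
    if level is SafetyLevel.YELLOW:
        caveat = "Caveat: verify against the cited sources before use."
        return Outcome("caution", f"{answer}\n\n{caveat}")
    # RED: assumed here to withhold the draft answer and hand off to a person.
    return Outcome("escalate", "Routed to a human expert for review.")


print(dispatch(SafetyLevel.YELLOW, "Draft answer").action)
```

Treating RED as the fall-through case means any level that is not explicitly GREEN or YELLOW ends in escalation, which matches the conservative-by-design principle.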

**Dual-database design:** PostgreSQL (metadata, escalation queue, interaction logs) + Qdrant (vectors, chunked documents)

**Multi-provider LLM:** Anthropic Claude (primary), OpenAI GPT-4 (fallback), pluggable interface

## Quick Start

```bash
# Start services (PostgreSQL, Qdrant, app)
docker-compose up -d

# Download public NCS documents
python scripts/download_public_docs.py

# Run tests
pytest tests/ -v --cov=src/sage --cov-fail-under=80

# Run benchmark validation
python scripts/run_benchmark_validation.py

# Generate decision report
python scripts/generate_decision_report.py
```

See [examples/sage_config.yaml](examples/sage_config.yaml) for full configuration options.

## Testing

674 tests across unit, integration, system, acceptance, and benchmark categories.

| Category | Coverage Requirement |
|----------|---------------------|
| General | 80% minimum (enforced in CI) |
| Safety-critical modules | 100% |

Tests run on Python 3.11 and 3.12 with parallel execution (pytest-xdist). See [docs/testing/](docs/testing/) for test guides and conventions.

## Planned Work

### Phase 2: Core Development

- Continued pre-training on 10B+ token NCS corpus
- Supervised fine-tuning (5000+ expert Q&A pairs)
- RLHF with domain expert preference data
- Constitutional AI safety training
- Full SCALE/MCNP tool integration
- Advanced recursive reasoning (self-verification, iterative refinement)

### Phases 3-5: Validation, Pilot, Production

- Expert blind evaluation and red team exercises
- NRC/DOE regulatory compliance review
- Pilot site deployment
- Multi-domain expansion

## Future Domains

| Module | Domain |
|--------|--------|
| SAGE-RP | Radiation Protection |
| SAGE-PSA | Probabilistic Safety Assessment |
| SAGE-Fire | Fire Protection Engineering |
| SAGE-Trans | Transportation of Radioactive Materials |
| SAGE-Decom | Decommissioning & Waste Management |

## Documentation

- [Development Plan](SAGE-Development-Plan.md) - Architecture, strategy, and approach
- [Implementation TODOs](SAGE-Implementation-TODOs.md) - Actionable task list by phase
- [Changelog](CHANGELOG.md) - Version history
- [Threat Model](docs/security/threat-model.md) - Security analysis
- [Incident Response](docs/security/incident-response.md) - Incident procedures
- [Data Licensing](docs/data/licensing.md) - Source licensing guide
- [Benchmark Plan](docs/plans/benchmark-validation-suite.md) - Evaluation strategy

## Core Principles

1. **Conservative by design** - Always err on the side of safety
2. **Human-in-the-loop** - Augment engineers, don't replace judgment
3. **Traceable reasoning** - Every conclusion backed by citations
4. **Tool-augmented** - Calculations via verified tools, not LLM arithmetic
5. **Auditable** - Full reasoning chains for regulatory compliance
