Metadata-Version: 2.4
Name: spectra-ml
Version: 1.0.0
Summary: Scenario-first ML evaluation engine — stress-test your models to find where metrics lie
Author: Spectra Contributors
License: Apache-2.0
Project-URL: Homepage, https://github.com/StrangeStorm243-bit/when-metrics-lie
Project-URL: Documentation, https://strangestorm243-bit.github.io/when-metrics-lie/
Project-URL: Repository, https://github.com/StrangeStorm243-bit/when-metrics-lie
Project-URL: Issues, https://github.com/StrangeStorm243-bit/when-metrics-lie/issues
Project-URL: Changelog, https://github.com/StrangeStorm243-bit/when-metrics-lie/releases
Keywords: machine-learning,evaluation,testing,metrics,stress-testing,model-validation,ml-ops,fairness,calibration,scikit-learn
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0
Requires-Dist: numpy>=1.26
Requires-Dist: pandas>=2.0
Requires-Dist: scikit-learn>=1.3
Requires-Dist: matplotlib>=3.7
Requires-Dist: SQLAlchemy>=2.0
Requires-Dist: alembic>=1.13
Requires-Dist: typer>=0.9
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.3; extra == "dev"
Requires-Dist: requests>=2.31.0; extra == "dev"
Provides-Extra: web
Requires-Dist: fastapi>=0.104.0; extra == "web"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "web"
Requires-Dist: requests>=2.31.0; extra == "web"
Provides-Extra: onnx
Requires-Dist: onnxruntime>=1.16; extra == "onnx"
Requires-Dist: skl2onnx>=1.16; extra == "onnx"
Provides-Extra: boosting
Requires-Dist: xgboost>=2.0; extra == "boosting"
Requires-Dist: lightgbm>=4.0; extra == "boosting"
Requires-Dist: catboost>=1.2; extra == "boosting"
Provides-Extra: fairness
Requires-Dist: fairlearn>=0.10; extra == "fairness"
Provides-Extra: drift
Requires-Dist: evidently>=0.4; extra == "drift"
Provides-Extra: security
Requires-Dist: picklescan>=0.0.14; extra == "security"
Provides-Extra: mlflow
Requires-Dist: mlflow>=2.10; extra == "mlflow"
Provides-Extra: metrics
Requires-Dist: evaluate>=0.4; extra == "metrics"
Requires-Dist: rouge-score>=0.1; extra == "metrics"
Provides-Extra: pytorch
Requires-Dist: torch>=2.0; extra == "pytorch"
Provides-Extra: tensorflow
Requires-Dist: tensorflow>=2.15; extra == "tensorflow"
Provides-Extra: huggingface
Requires-Dist: transformers>=4.35; extra == "huggingface"
Requires-Dist: safetensors; extra == "huggingface"
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.0; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24; extra == "docs"
Provides-Extra: all
Requires-Dist: spectra-ml[boosting,dev,docs,drift,fairness,huggingface,metrics,mlflow,onnx,pytorch,security,tensorflow,web]; extra == "all"
Dynamic: license-file

# Spectra

[![CI](https://github.com/StrangeStorm243-bit/when-metrics-lie/actions/workflows/ci.yml/badge.svg)](https://github.com/StrangeStorm243-bit/when-metrics-lie/actions/workflows/ci.yml)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-green.svg)](LICENSE)

**Scenario-first ML evaluation engine.** Stress-test your models to find where metrics lie.

Spectra runs your model through realistic failure scenarios (label noise, score noise, class imbalance, threshold gaming) and shows you exactly where your metrics break down. Instead of a single accuracy number, you get a transparent stress-test report.
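
To make the idea concrete, here is a minimal, standard-library-only sketch of a label-noise stress test. It is not Spectra's implementation: the helper names (`auc`, `flip_labels`, `label_noise_report`), the toy data, and the noise rates are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def flip_labels(labels, rate, rng):
    """Flip each binary label independently with probability `rate`."""
    return [1 - y if rng.random() < rate else y for y in labels]


def label_noise_report(labels, scores, rates, trials=50, seed=0):
    """Mean AUC per noise rate, averaged over repeated random trials."""
    rng = random.Random(seed)
    return {
        rate: mean(auc(flip_labels(labels, rate, rng), scores) for _ in range(trials))
        for rate in rates
    }


# Toy data (hypothetical): scores separate the two classes well but not perfectly.
data_rng = random.Random(42)
labels = [i % 2 for i in range(200)]
scores = [data_rng.gauss(0.7 if y else 0.3, 0.15) for y in labels]

report = label_noise_report(labels, scores, rates=[0.0, 0.1, 0.3])
for rate, value in report.items():
    print(f"label noise {rate:.0%}: mean AUC = {value:.3f}")
```

The model's scores never change here; only the labels it is judged against do. The measured AUC still falls as the noise rate rises, which is the kind of breakdown a single headline number hides.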

## Install

```bash
pip install spectra-ml
```

With web UI support:

```bash
pip install "spectra-ml[web]"
```

## Quick Start

### Python SDK

```python
import metrics_lie as spectra

result = spectra.evaluate(
    name="my-model-audit",
    dataset="data.csv",
    model="model.pkl",
    metric="auc",
    trust_pickle=True,  # unpickling can execute arbitrary code; enable only for model files you trust
)

spectra.display(result)
```

### CLI

```bash
# Run from spec file
spectra run experiment.json

# Quick evaluation
spectra evaluate model.pkl --dataset data.csv --metric auc --trust-pickle

# Launch web UI
spectra serve
```

### Web UI (Quick Test)

```bash
pip install "spectra-ml[web]"
spectra serve
```

Upload your model and a dataset CSV. Spectra auto-detects the columns, task type, and best metric, and a single click runs the full stress test.

## What It Does

1. **Stress-tests metrics** across scenarios: label noise, score noise, class imbalance, threshold gaming
2. **Detects metric disagreement** — when accuracy says "great" but calibration says "broken"
3. **Runs diagnostics**: calibration analysis, subgroup gaps, sensitivity ranking, threshold sweeps
4. **Produces decision scorecards** with weighted components and transparent reasoning
5. **Compares models** with regression detection and structured comparison reports
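
As an illustration of item 2, the sketch below builds two toy models with identical accuracy, only one of which is well calibrated. It uses a simple binned expected calibration error (ECE) written from scratch; the function names and the `min_accuracy` / `max_ece` cut-offs are hypothetical and do not reflect Spectra's API or defaults.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def expected_calibration_error(labels, probs, n_bins=10):
    """Binned ECE: weighted mean of |observed positive rate - mean predicted probability|."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((y, p))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / len(labels) * abs(observed - predicted)
    return ece


def disagreement(labels, probs, min_accuracy=0.85, max_ece=0.05):
    """Flag the case where accuracy looks fine but calibration does not (hypothetical cut-offs)."""
    acc = accuracy(labels, probs)
    ece = expected_calibration_error(labels, probs)
    return {"accuracy": acc, "ece": ece, "disagree": acc >= min_accuracy and ece > max_ece}


# Both toy models are right 90% of the time; they differ only in how confident they claim to be.
labels = [1] * 45 + [0] * 5 + [0] * 45 + [1] * 5
overconfident = [0.99] * 50 + [0.01] * 50
calibrated = [0.9] * 50 + [0.1] * 50

overconfident_result = disagreement(labels, overconfident)
calibrated_result = disagreement(labels, calibrated)
print("overconfident:", overconfident_result)
print("calibrated:   ", calibrated_result)
```

Accuracy alone cannot tell the two models apart. Only the calibration metric shows that the first one claims 99% confidence while being right 90% of the time.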

## Supported

| Category | Options |
|----------|---------|
| **Task Types** | Binary classification, multiclass, regression, ranking |
| **Metrics** | 27 metrics: AUC, F1, precision, recall, Brier, ECE, MAE, RMSE, R2, NDCG, and more |
| **Model Formats** | sklearn pickle, ONNX, PyTorch, TensorFlow, XGBoost, LightGBM, CatBoost, MLflow |
| **Scenarios** | Label noise, score noise, class imbalance, threshold gaming |
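
To show why class imbalance appears among the scenarios, here is a small sketch, assuming binary labels and hard predictions, that resamples a toy evaluation set to a lower positive rate. The helper names and the toy classifier (95% of negatives and 60% of positives predicted correctly) are hypothetical, not taken from Spectra.

```python
import random


def resample_to_prevalence(labels, preds, positive_rate, rng):
    """Keep every negative and subsample positives so they form `positive_rate` of the result."""
    pos = [(y, p) for y, p in zip(labels, preds) if y == 1]
    neg = [(y, p) for y, p in zip(labels, preds) if y == 0]
    n_pos = round(len(neg) * positive_rate / (1 - positive_rate))
    if n_pos > len(pos):
        raise ValueError("not enough positives to reach the requested rate")
    return rng.sample(pos, n_pos) + neg


def accuracy(pairs):
    """Fraction of (label, prediction) pairs that agree."""
    return sum(y == p for y, p in pairs) / len(pairs)


def recall(pairs):
    """Fraction of true positives that were predicted positive."""
    positives = [p for y, p in pairs if y == 1]
    return sum(positives) / len(positives)


# Toy classifier (hypothetical): 950 of 1000 negatives and 600 of 1000 positives are correct.
labels = [0] * 1000 + [1] * 1000
preds = [0] * 950 + [1] * 50 + [1] * 600 + [0] * 400

rng = random.Random(0)
balanced = resample_to_prevalence(labels, preds, 0.5, rng)
imbalanced = resample_to_prevalence(labels, preds, 0.05, rng)

for name, pairs in [("50% positives", balanced), ("5% positives", imbalanced)]:
    print(f"{name}: accuracy = {accuracy(pairs):.3f}, recall = {recall(pairs):.3f}")
```

The classifier is unchanged between the two runs, yet accuracy rises as positives become rarer because the easier negative class dominates the average. Recall stays near 60%, so the model still misses about four in ten positives.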

## Architecture

```
spectra run / evaluate / serve
        |
  Core Engine (metrics_lie)
    |- Dataset Loading (CSV)
    |- Model Adapter (pickle, ONNX, PyTorch, ...)
    |- Scenario Runner (Monte Carlo trials)
    |- Metrics (27 metrics across 4 task types)
    |- Diagnostics (calibration, gaming, subgroups)
    |- Analysis (dashboard, disagreement, sensitivity)
    |- Decision Framework (scorecard, components)
    '- Artifacts (plots, reports)
```
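
The diagram lists gaming among the diagnostics. One plausible reading, stated here as an assumption and not as Spectra's definition, is that a metric can be inflated by tuning the decision threshold on the same data used to report it. The sketch below sweeps thresholds and measures that headroom for F1; the function names and toy data are hypothetical.

```python
def f1_at(labels, scores, threshold):
    """F1 score when samples with score >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def threshold_sweep(labels, scores, steps=19):
    """F1 at evenly spaced thresholds strictly between 0 and 1 (0.05 to 0.95 for steps=19)."""
    thresholds = [(i + 1) / (steps + 1) for i in range(steps)]
    return [(t, f1_at(labels, scores, t)) for t in thresholds]


# Toy data (hypothetical): most positives score below the default 0.5 cut-off.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.91, 0.46, 0.41, 0.37, 0.31, 0.22, 0.43, 0.12, 0.17, 0.04]

sweep = threshold_sweep(labels, scores)
default_f1 = f1_at(labels, scores, 0.5)
best_threshold, best_f1 = max(sweep, key=lambda pair: pair[1])
headroom = best_f1 - default_f1

print(f"F1 at 0.5: {default_f1:.3f}")
print(f"best F1:   {best_f1:.3f} at threshold {best_threshold:.2f}")
print(f"headroom:  {headroom:.3f}")
```

A large gap between the default-threshold score and the best swept score suggests the reported number depends heavily on where the threshold was set, so it is worth checking whether that threshold was chosen on held-out data.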

## Development

```bash
git clone https://github.com/StrangeStorm243-bit/when-metrics-lie.git
cd when-metrics-lie
python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -e ".[dev,web]"
pytest
```

## Documentation

Full docs: [https://strangestorm243-bit.github.io/when-metrics-lie/](https://strangestorm243-bit.github.io/when-metrics-lie/)

## License

Apache 2.0 — see [LICENSE](LICENSE) for details.
