Metadata-Version: 2.4
Name: mlx-triage
Version: 0.1.1
Summary: MLX Inference Quality Diagnostic Toolkit
Project-URL: Homepage, https://github.com/swaylenhayes/mlx-triage
Project-URL: Repository, https://github.com/swaylenhayes/mlx-triage
Project-URL: Bug Tracker, https://github.com/swaylenhayes/mlx-triage/issues
Project-URL: Documentation, https://github.com/swaylenhayes/mlx-triage#readme
Author: Swaylen Hayes
License: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: click>=8.1
Requires-Dist: numpy>=1.26
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: safetensors>=0.4
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Provides-Extra: mlx
Requires-Dist: mlx-lm>=0.20; extra == 'mlx'
Requires-Dist: mlx>=0.20; extra == 'mlx'
Provides-Extra: reference
Requires-Dist: torch>=2.0; extra == 'reference'
Requires-Dist: transformers>=4.40; extra == 'reference'
Description-Content-Type: text/markdown

# mlx-triage

![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)
![macOS Apple Silicon](https://img.shields.io/badge/macOS-Apple%20Silicon-black.svg)
![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)
![Tests: 102 passing](https://img.shields.io/badge/tests-102%20passing-brightgreen.svg)
![Validated: 13 models across 5 families](https://img.shields.io/badge/validated-13%20models%20across%205%20families-orange.svg)

**Your MLX model is producing garbage. Is it the weights? A known MLX bug? Your quantization settings?**

mlx-triage answers that in under 30 seconds, without loading the model into memory.

```bash
pip install mlx-triage
mlx-triage check ./my-model
```

![mlx-triage demo](docs/assets/demo.gif)

## What It Checks

Tested against **13 models** across **5 families** (Llama, Qwen, Phi, LiquidAI, Nanbeige) and **4 quantization levels** (bf16 through 4-bit), ranging from 0.6B to 30B parameters, with zero false negatives. [Full validation results ->](docs/validation-results.md)

### Tier 0 — Sanity Checks (no MLX needed, < 30 seconds)

| Check | What it catches |
|-------|----------------|
| **Dtype Compatibility** | BF16->FP16 precision loss, training/storage dtype mismatches |
| **Tokenizer & EOS Config** | Missing EOS tokens, chat template issues, Llama 3 dual-stop-token edge cases |
| **Weight File Integrity** | NaN/Inf values, all-zero layers, corrupt safetensors headers |
| **MLX Version & Known Bugs** | Outdated MLX with documented bugs affecting your model architecture |
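
To illustrate why the dtype and header checks need neither MLX nor the tensor data, here is a minimal sketch that reads only the header of a `.safetensors` file. It relies on the published safetensors layout (an 8-byte little-endian length followed by a JSON header). The function names `read_safetensors_header` and `dtype_counts` are illustrative and are not mlx-triage's internal API.

```python
import json
import os
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the tensor table from a safetensors header; tensor data is never read."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file is too short to hold a safetensors header")
        # The first 8 bytes are an unsigned little-endian 64-bit header length.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > file_size - 8:
            raise ValueError("declared header length exceeds file size")
        raw = f.read(header_len)
    # json.loads raises a ValueError subclass if the header is not valid JSON.
    header = json.loads(raw)
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def dtype_counts(header):
    """Count tensors per stored dtype string, e.g. "BF16" or "F16"."""
    return Counter(entry["dtype"] for entry in header.values())
```

Comparing the result of `dtype_counts(...)` with the dtype declared in the model's `config.json` is one plausible way to spot a training/storage dtype mismatch.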

### Tier 1 — Statistical Smoke Tests (MLX required)

| Check | What it catches |
|-------|----------------|
| **Determinism** | Non-reproducible outputs at temp=0 (infrastructure issue, not model) |
| **Reference Divergence** | MLX output diverging from PyTorch/Transformers reference |
| **Quantization Quality** | Excessive perplexity indicating broken quantization |
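
The idea behind the determinism check can be sketched in a few lines: call the same generation function repeatedly and count distinct outputs. This is an illustration only. `generate` is a hypothetical callable standing in for a greedy (temp=0) MLX generation call, and the returned dictionary is not mlx-triage's report format.

```python
from collections import Counter
from typing import Callable


def check_determinism(generate: Callable[[str], str], prompt: str, runs: int = 10):
    """Call `generate` `runs` times and report whether every output is identical."""
    outputs = [generate(prompt) for _ in range(runs)]
    distinct = Counter(outputs)
    return {
        "deterministic": len(distinct) == 1,
        "runs": runs,
        "distinct_outputs": len(distinct),
    }


# Usage with a stand-in generator; a real check would wrap an MLX generation call.
print(check_determinism(lambda prompt: prompt.upper(), "hello"))
```

With greedy decoding, more than one distinct output points at the runtime rather than the weights, which is why the table labels it an infrastructure issue.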

## Install

Requires Python 3.11+ and macOS on Apple Silicon (M1-M4).

```bash
# From PyPI
pip install mlx-triage

# With MLX for Tier 1 checks
pip install "mlx-triage[mlx]"

# With the PyTorch reference comparison (Tier 1 Reference Divergence check; Tier 1 also needs MLX)
pip install "mlx-triage[mlx,reference]"

# Development
git clone https://github.com/swaylenhayes/mlx-triage.git
cd mlx-triage
uv sync --extra dev
```

## Usage

```bash
# Tier 0 only (default — no MLX needed)
mlx-triage check /path/to/model

# Tier 0 + Tier 1
mlx-triage check /path/to/model --tier 1

# JSON output
mlx-triage check /path/to/model --format json

# Save report to file
mlx-triage check /path/to/model --tier 1 --output report.json
```

Tier 0 runs in under 30 seconds on any model. Tier 1 requires MLX and takes 5-15 minutes depending on model size.
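
For scripted use (for example in CI), the same commands can be driven from Python. The sketch below only assembles the flags shown above and runs them with `subprocess`. The helper names are hypothetical, and it makes no assumptions about exit codes or the report's JSON schema, since neither is documented here.

```python
import subprocess


def build_check_command(model_path, tier=0, fmt=None, output=None):
    """Assemble an `mlx-triage check` argv list using only the flags shown above."""
    cmd = ["mlx-triage", "check", str(model_path)]
    if tier:  # Tier 0 is the default, so the flag is omitted for it
        cmd += ["--tier", str(tier)]
    if fmt:
        cmd += ["--format", fmt]
    if output:
        cmd += ["--output", str(output)]
    return cmd


def run_check(model_path, **kwargs):
    """Run the CLI and return the CompletedProcess for the caller to inspect."""
    return subprocess.run(
        build_check_command(model_path, **kwargs),
        capture_output=True,
        text=True,
    )
```

`run_check("/path/to/model", tier=1, output="report.json")` would then mirror the last command in the block above.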

## How It Works

mlx-triage uses a tiered diagnostic protocol — each tier increases in depth and cost:

1. **Tier 0** reads model files directly (safetensors headers, config JSON, tokenizer config) without loading the model into memory. This catches the most common issues in under 30 seconds.

2. **Tier 1** loads the model via MLX and runs statistical tests — determinism checks (10 runs at temp=0), perplexity measurement against a fixed eval corpus, and optional comparison against a PyTorch reference backend.

3. **Tiers 2-3** (planned) will add isolation tests (batch invariance, memory pressure, context length stress) and deep diagnostics (layer-wise activation comparison, cross-runtime analysis).
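
For reference, the perplexity that Tier 1 measures is the exponential of the mean negative log-likelihood per token. The sketch below uses that standard definition. It assumes the per-token natural-log probabilities have already been obtained from the model, and the function name is illustrative rather than part of mlx-triage.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the evaluated tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 8))  # ≈ 4.0
```

A model guessing uniformly over a vocabulary of size V scores a perplexity of V, so a badly quantized checkpoint tends to drift toward very large values on ordinary text.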

If Tier 0 finds critical issues, Tier 1 is skipped — fix the fundamentals first.
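
The gating rule can be sketched as follows. This is a simplified illustration, not the tool's actual runner. The `CheckResult` type, the severity labels, and `run_tiers` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    severity: str  # hypothetical levels: "ok", "warning", "critical"


def run_tiers(tier0_checks, tier1_checks, requested_tier=0):
    """Run Tier 0 first; run Tier 1 only if requested and Tier 0 found nothing critical."""
    results = [check() for check in tier0_checks]
    if requested_tier < 1:
        return results, "tier 1 not requested"
    if any(r.severity == "critical" for r in results):
        # Tier 1 checks are never called, so the model is never loaded.
        return results, "tier 1 skipped: fix critical tier 0 findings first"
    results += [check() for check in tier1_checks]
    return results, "tier 1 completed"
```

Skipping the expensive tier avoids spending minutes on statistical tests whose results would be meaningless on, say, a checkpoint with NaN weights.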

## Known Bugs Database

mlx-triage ships with a curated database of documented MLX bugs ([`known_bugs.yaml`](src/mlx_triage/data/known_bugs.yaml)), cross-referenced against your installed MLX version and model architecture. Running MLX < 0.22.0 with float16 weights? It flags the known qmv kernel overflow. Got a 4-bit Llama model looping on long prompts? There's a documented bug for that. Safetensors file looks valid but weights are numerically garbage? That's a known silent bfloat16 corruption path.
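
As a rough illustration of the cross-referencing step, the sketch below matches an installed MLX version and a model dtype against one in-memory entry modelled on the float16 example above. The field names (`id`, `affected_below`, `dtypes`, `summary`) are assumptions made for this example and are not the actual schema of `known_bugs.yaml`. Versions are assumed to be plain `X.Y.Z` strings.

```python
def parse_version(text):
    """Turn "0.21.1" into (0, 21, 1); assumes plain numeric X.Y.Z versions."""
    return tuple(int(part) for part in text.split(".")[:3])


# Hypothetical entry, modelled on the float16 example in the paragraph above.
KNOWN_BUGS = [
    {
        "id": "qmv-fp16-overflow",
        "affected_below": "0.22.0",
        "dtypes": {"float16"},
        "summary": "qmv kernel overflow with float16 weights",
    },
]


def matching_bugs(mlx_version, model_dtype, bugs=KNOWN_BUGS):
    """Return entries whose version bound and dtype both match the local setup."""
    installed = parse_version(mlx_version)
    return [
        bug
        for bug in bugs
        if installed < parse_version(bug["affected_below"])
        and model_dtype in bug["dtypes"]
    ]


print([bug["id"] for bug in matching_bugs("0.21.1", "float16")])
```

Tuple comparison keeps the version check dependency-free. Pre-release or local version strings would need a fuller parser such as `packaging.version`.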

Contributing a bug report to the database is the easiest way to help — see [CONTRIBUTING.md](CONTRIBUTING.md).

## Research Basis

The diagnostic protocol is grounded in systematic analysis of MLX infrastructure defects across multiple model architectures and quantization levels. See [METHODOLOGY.md](METHODOLOGY.md) for the evidence basis, including infrastructure defect taxonomy, first-party experiments, and cross-model synthesis.

## Contributing

Contributions welcome — especially to the known bugs database. See [CONTRIBUTING.md](CONTRIBUTING.md).

## License

[MIT](LICENSE)

---

If mlx-triage saved you a debugging session, **star it** — it helps other MLX developers find the tool.
