Metadata-Version: 2.4
Name: stat-guard
Version: 0.2.0
Summary: Prevent statistically invalid analyses from being shipped
Author-email: Aaryan Solanki <aryan.solanki.stats@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/aaryansolankii/stat-guard
Project-URL: Repository, https://github.com/aaryansolankii/stat-guard
Project-URL: Issues, https://github.com/aaryansolankii/stat-guard/issues
Keywords: statistics,validation,experiments,ab-testing,data-quality
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: scipy>=1.7.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: flake8>=4.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Dynamic: license-file

# stat-guard

**stat-guard** is a production-grade **statistical assumption validation library** for experiments such as A/B tests and controlled studies.

It acts as a **guardrail**, validating **data integrity and statistical assumptions** *before* any analysis is performed.

> **If stat-guard reports a blocking error, the experiment must not be analyzed.**

---

## 🚦 Why stat-guard exists

Most statistical failures do not come from incorrect formulas.  
They come from **broken data and violated assumptions**:

- Duplicate users counted multiple times
- Users appearing in both control and treatment
- Samples too small to be meaningful
- Imbalanced or biased groups
- Metrics with zero variance
- Silent assumption violations in production pipelines

These issues often surface **after results are shipped**.

**stat-guard catches them before the analysis runs.**

---

## 🧠 What stat-guard does

- Validates **unit integrity** (missing IDs, duplicates, leakage)
- Checks **minimum sample size**
- Detects **group imbalance**
- Measures **covariate balance (SMD)**
- Flags **zero-variance metrics**
- Diagnoses **distribution issues** (skewness, normality)
- Separates **errors** (blocking) from **warnings** (diagnostic)
- Produces **deterministic, machine-readable reports**

Designed for:
- CI/CD pipelines
- Experiment gating systems
- Production data workflows
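
To make two of the checks listed above concrete, here is a minimal sketch of unit leakage detection and a standardized mean difference (SMD) in pandas and NumPy. It is an illustration of the underlying ideas, not stat-guard's implementation: the function names `find_leaked_units` and `standardized_mean_difference`, the column names, and the choice of pooled standard deviation are all assumptions.

```python
import numpy as np
import pandas as pd


def find_leaked_units(df: pd.DataFrame, unit_col: str, group_col: str) -> list:
    """Return unit IDs that appear in more than one group (hypothetical helper)."""
    groups_per_unit = df.groupby(unit_col)[group_col].nunique()
    return sorted(groups_per_unit[groups_per_unit > 1].index.tolist())


def standardized_mean_difference(a, b) -> float:
    """Difference in means divided by the pooled standard deviation (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pooled SD as the root of the average of the two sample variances
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    if pooled_sd == 0.0:
        raise ValueError("SMD is undefined when both groups have zero variance")
    return float((a.mean() - b.mean()) / pooled_sd)


# Toy data: user 3 appears in both groups, and the groups differ in age
df = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 4, 5],
    "group": ["control", "control", "control", "treatment", "treatment", "treatment"],
    "age": [30, 34, 38, 38, 42, 46],
})

leaked = find_leaked_units(df, "user_id", "group")
smd = standardized_mean_difference(
    df.loc[df["group"] == "control", "age"],
    df.loc[df["group"] == "treatment", "age"],
)

print(leaked)          # [3]
print(round(smd, 2))   # -2.0
```

Both helpers only report what they find. Neither drops the leaked unit nor rebalances the groups, which matches the library's stated stance of not modifying data.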

---

## 🚫 What stat-guard does NOT do

stat-guard is **not** a statistics engine.

It deliberately does **not**:
- ❌ Run hypothesis tests
- ❌ Modify or auto-fix data
- ❌ Apply transformations
- ❌ Guess intent or apply heuristics

This keeps behavior **explicit, transparent, and reproducible**.

---

## 🧱 Core Philosophy

- Explicit over implicit
- No automatic corrections
- Errors invalidate experiments; warnings do not
- Deterministic, reproducible behavior
- Production-first, not notebook-first
- Simple, readable, maintainable code
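
The split between blocking errors and diagnostic warnings can be sketched as a small report object. This is a hypothetical shape for illustration only: apart from `is_valid`, which the quick example uses, the `Report` class, its `errors` and `warnings` fields, `to_json`, and `exit_code` are assumptions and may not match stat-guard's actual API.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Report:
    """Hypothetical report shape; stat-guard's real class may differ."""
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        # Only errors invalidate the experiment; warnings are diagnostic
        return not self.errors

    def to_json(self) -> str:
        # Sorted messages and sorted keys give identical output for identical findings
        payload = {
            "errors": sorted(self.errors),
            "is_valid": self.is_valid,
            "warnings": sorted(self.warnings),
        }
        return json.dumps(payload, sort_keys=True)


def exit_code(report: Report) -> int:
    """Map a report to a process exit code, as a CI gate might."""
    return 0 if report.is_valid else 1


warn_only = Report(warnings=["metric is heavily skewed"])
blocked = Report(
    errors=["unit appears in both groups"],
    warnings=["metric is heavily skewed"],
)

print(exit_code(warn_only))  # 0
print(exit_code(blocked))    # 1
```

Serializing with sorted keys is one simple way to keep reports reproducible, so that two runs over the same data can be compared byte for byte.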

---

## 📦 Installation

### From GitHub (current)

stat-guard is currently distributed via GitHub:

```bash
pip install git+https://github.com/aaryansolankii/stat-guard.git
```

## 🚀 Quick example

```python
import pandas as pd
from stat_guard import validate

# Toy dataset: one metric value per row, three rows per group
data = pd.DataFrame({
    "metric": [10, 12, 11, 13, 15, 14],
    "group": ["control", "control", "control", "treatment", "treatment", "treatment"]
})

report = validate(
    data,
    target_col="metric",
    group_col="group"
)

# Stop here if validation failed; an invalid experiment must not be analyzed
if not report.is_valid:
    raise RuntimeError(report)
```