Metadata-Version: 2.4
Name: erasus
Version: 0.1.5
Summary: Efficient Representative And Surgical Unlearning Selection — Universal Machine Unlearning via Coreset Selection
Author-email: Avaya Aggarwal <aggarwal.avaya27@gmail.com>
License: MIT
Keywords: machine-unlearning,coreset,foundation-models,privacy,pytorch
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0
Requires-Dist: numpy>=1.24
Requires-Dist: Pillow>=9.0
Requires-Dist: tqdm>=4.60
Requires-Dist: pyyaml>=6.0
Requires-Dist: transformers>=4.30
Provides-Extra: full
Requires-Dist: diffusers>=0.20; extra == "full"
Requires-Dist: opacus>=1.3; extra == "full"
Requires-Dist: datasets>=2.14; extra == "full"
Requires-Dist: scikit-learn>=1.2; extra == "full"
Requires-Dist: matplotlib>=3.7; extra == "full"
Requires-Dist: seaborn>=0.12; extra == "full"
Requires-Dist: wandb>=0.15; extra == "full"
Requires-Dist: peft>=0.5; extra == "full"
Requires-Dist: huggingface_hub>=0.20; extra == "full"
Provides-Extra: hub
Requires-Dist: huggingface_hub>=0.20; extra == "hub"
Requires-Dist: datasets>=2.14; extra == "hub"
Provides-Extra: dashboard
Requires-Dist: streamlit>=1.28; extra == "dashboard"
Requires-Dist: gradio>=4.0; extra == "dashboard"
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"
Requires-Dist: pytest-cov>=4.1; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: mypy>=1.5; extra == "dev"
Requires-Dist: pre-commit>=3.4; extra == "dev"

<p align="center">
  <h1 align="center">Erasus</h1>
  <p align="center">
    <strong>Efficient Representative And Surgical Unlearning Selection</strong><br>
    Universal machine unlearning via coreset selection
  </p>
  <p align="center">
    <a href="#installation"><img src="https://img.shields.io/badge/python-3.9+-blue.svg" alt="Python 3.9+"></a>
    <a href="#installation"><img src="https://img.shields.io/badge/pytorch-2.0+-ee4c2c.svg" alt="PyTorch 2.0+"></a>
    <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License: MIT"></a>
    <a href="#test-status"><img src="https://img.shields.io/badge/tests-465%20passed-brightgreen.svg" alt="Tests"></a>
  </p>
</p>

---

Erasus surgically removes specific data, concepts, or behaviours from trained models -- without the cost of retraining from scratch. It works across **LLMs**, **VLMs**, **Diffusion**, **Audio**, and **Video** models through a single API.

```python
from erasus import ErasusUnlearner

unlearner = ErasusUnlearner(model, strategy="auto", selector="influence")
result = unlearner.fit(forget_data=forget_loader, retain_data=retain_loader, epochs=5)
```

The core insight: **coreset selection finds the minimal set of samples that define forgetting** -- unlearning only the 10% most influential samples of the forget set can approximate unlearning the entire forget set, with bounded loss in utility.
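
Once each forget sample has a score, the selection step amounts to ranking and keeping the top fraction. The sketch below assumes the scores were already produced by some selector (influence, gradient norm, EL2N, ...); `select_top_fraction` is a hypothetical helper, not part of the Erasus API.

```python
import math
from typing import List, Sequence


def select_top_fraction(scores: Sequence[float], fraction: float = 0.1) -> List[int]:
    """Return the (sorted) indices of the ceil(fraction * n) highest-scoring samples.

    `scores` is assumed to hold one importance score per forget sample, where a
    higher score means the sample matters more for forgetting.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not scores:
        return []
    k = max(1, math.ceil(fraction * len(scores)))
    # Rank indices by score, highest first, and keep the first k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Ten forget samples; keep the top 20% (2 samples).
scores = [0.2, 3.1, 0.05, 1.7, 0.9, 0.4, 2.2, 0.1, 0.3, 0.6]
print(select_top_fraction(scores, fraction=0.2))  # [1, 6]
```

The hard part in practice is computing good scores; the ranking itself is cheap.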

---

## How it works

```
Forget data + Retain data  -->  Coreset selection  -->  Targeted unlearning  -->  Evaluation & certification
                               (24 selectors)          (41 strategies)          (25+ metrics)
```

1. **Select** the minimal representative subset (coreset) from the forget set
2. **Unlearn** by applying a strategy (gradient ascent, Fisher forgetting, SCRUB, NPO, etc.)
3. **Verify** via membership inference attacks, accuracy checks, and certified removal bounds
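
To make the three steps concrete, here is a toy, dependency-free sketch on a 1-D logistic model. It is not how Erasus implements them: per-sample gradient norm stands in for the influence selector, plain gradient ascent/descent stands in for the strategy, and a before/after forget-loss comparison stands in for the verification suite. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature x, label y)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mean_grad(w: float, b: float, data: List[Sample]) -> Tuple[float, float]:
    """Mean gradient of binary cross-entropy for p = sigmoid(w * x + b)."""
    gw = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
    gb = sum(sigmoid(w * x + b) - y for x, y in data) / len(data)
    return gw, gb


def mean_loss(w: float, b: float, data: List[Sample]) -> float:
    eps = 1e-12
    losses = []
    for x, y in data:
        p = sigmoid(w * x + b)
        losses.append(-(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)))
    return sum(losses) / len(losses)


def select(w: float, b: float, forget: List[Sample], k: int) -> List[Sample]:
    """1. Select: keep the k forget samples with the largest per-sample gradient norm."""
    def grad_norm(sample: Sample) -> float:
        x, y = sample
        r = sigmoid(w * x + b) - y          # per-sample gradient is (r * x, r)
        return abs(r) * math.sqrt(x * x + 1.0)
    return sorted(forget, key=grad_norm, reverse=True)[:k]


def unlearn(w: float, b: float, coreset: List[Sample], retain: List[Sample],
            lr: float = 0.1, epochs: int = 20) -> Tuple[float, float]:
    """2. Unlearn: ascend the loss on the coreset while descending it on the retain set."""
    for _ in range(epochs):
        fw, fb = mean_grad(w, b, coreset)
        rw, rb = mean_grad(w, b, retain)
        w, b = w + lr * (fw - rw), b + lr * (fb - rb)
    return w, b


retain = [(-3.0, 0), (-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1), (3.0, 1)]
forget = [(0.5, 0), (2.0, 0), (2.5, 0), (3.0, 0)]
w0, b0 = 0.5, 0.0  # toy "trained" weights

coreset = select(w0, b0, forget, k=2)
w1, b1 = unlearn(w0, b0, coreset, retain)

# 3. Verify: the loss on the *full* forget set should rise after unlearning the coreset.
print("coreset:", coreset)
print("forget loss before/after:", mean_loss(w0, b0, forget), mean_loss(w1, b1, forget))
```

Only two of the four forget samples are used for the update, yet the loss rises on all four, which is the intuition behind unlearning a coreset instead of the whole forget set.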

---

## Installation

```bash
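# run from the root of a clone of the repository
pip install -e ".[dev]"          # editable install with dev tools
pip install -e ".[full]"         # all optional deps (diffusers, wandb, peft, etc.)
pip install -e ".[hub]"          # huggingface_hub + datasets only
pip install -e ".[dashboard]"    # streamlit + gradio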
```

## Quick start

### High-level API

```python
from erasus import ErasusUnlearner

unlearner = ErasusUnlearner(
    model=model,
    strategy="gradient_ascent",   # or "auto" for automatic selection
    selector="influence",
    precision="bf16-mixed",       # mixed precision support
)
result = unlearner.fit(
    forget_data=forget_loader,
    retain_data=retain_loader,
    epochs=5,
    gradient_checkpointing=True,  # enable for large models
)
```

### Composable primitives (Fabric-style)

For users who want their own training loop:

```python
from erasus.fabric import select_coreset, apply_gradient_ascent, compute_forgetting_quality

indices = select_coreset("influence", model, forget_loader, k=100)
apply_gradient_ascent(model, forget_loader, lr=1e-4, epochs=3)
quality = compute_forgetting_quality(model, forget_loader)
```

### Custom unlearning logic

```python
import torch.nn.functional as F

from erasus import UnlearningModule, UnlearningTrainer

class MyModule(UnlearningModule):
    def __init__(self, model, lr=1e-3):
        super().__init__(model)
        self.save_hyperparameters(ignore=["model"])

    def forget_step(self, batch, batch_idx):
        loss = -F.cross_entropy(self.model(batch[0]), batch[1])
        self.log("forget_loss", loss)
        return loss

    def retain_step(self, batch, batch_idx):
        return F.cross_entropy(self.model(batch[0]), batch[1])

trainer = UnlearningTrainer(epochs=10, validate_every=2, early_stopping_patience=3)
result = trainer.fit(MyModule(model), forget_loader, retain_loader)
```
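
Conceptually, a trainer like this alternates the two steps and uses periodic validation for early stopping. The sketch below shows one plausible reading of `validate_every` and `early_stopping_patience` (patience counted in validation checks, higher score is better). That reading is an assumption, not `UnlearningTrainer`'s actual implementation, and `run_unlearning_loop` is a hypothetical name.

```python
from typing import Callable, List


def run_unlearning_loop(
    forget_step: Callable[[int], float],
    retain_step: Callable[[int], float],
    validate: Callable[[int], float],
    epochs: int = 10,
    validate_every: int = 2,
    early_stopping_patience: int = 3,
) -> List[float]:
    """Run forget/retain steps each epoch; stop early once validation stalls.

    `validate` returns a score where higher is better (e.g. forgetting quality).
    Returns the validation scores that were recorded.
    """
    history: List[float] = []
    best = float("-inf")
    stale_checks = 0
    for epoch in range(1, epochs + 1):
        forget_step(epoch)   # e.g. one pass of negated cross-entropy on forget data
        retain_step(epoch)   # e.g. one pass of ordinary cross-entropy on retain data
        if epoch % validate_every != 0:
            continue
        score = validate(epoch)
        history.append(score)
        if score > best:
            best, stale_checks = score, 0
        else:
            stale_checks += 1
            if stale_checks >= early_stopping_patience:
                break
    return history


# Toy run: validation improves, then fails to improve for two checks in a row.
scores = iter([0.1, 0.3, 0.3, 0.2, 0.9])
steps: List[int] = []
history = run_unlearning_loop(
    forget_step=lambda e: steps.append(e) or 0.0,
    retain_step=lambda e: 0.0,
    validate=lambda e: next(scores),
    epochs=20,
    early_stopping_patience=2,
)
print(history, len(steps))  # [0.1, 0.3, 0.3, 0.2] 8
```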

### Strategy pipeline (chaining)

```python
from erasus import StrategyPipeline, ErasusUnlearner

pipeline = StrategyPipeline([
    ("gradient_ascent", {"epochs": 3, "lr": 1e-3}),
    ("fisher_forgetting", {"epochs": 2, "lr": 1e-4}),
])
unlearner = ErasusUnlearner(model, strategy=pipeline)
```
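
Chaining is sequential application: each stage is looked up by name and receives the state left by the previous stage. A minimal sketch with a hypothetical registry and placeholder strategies that only record what they were asked to do; Erasus's own registry and `StrategyPipeline` internals may differ.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Strategy = Callable[..., State]
REGISTRY: Dict[str, Strategy] = {}


def register(name: str) -> Callable[[Strategy], Strategy]:
    """Decorator that makes a strategy callable by name."""
    def decorator(fn: Strategy) -> Strategy:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("gradient_ascent")
def gradient_ascent(state: State, epochs: int, lr: float) -> State:
    # Placeholder: a real strategy would update model weights here.
    return {**state, "log": state["log"] + [("gradient_ascent", epochs, lr)]}


@register("fisher_forgetting")
def fisher_forgetting(state: State, epochs: int, lr: float) -> State:
    return {**state, "log": state["log"] + [("fisher_forgetting", epochs, lr)]}


def run_pipeline(stages: List[Tuple[str, Dict[str, Any]]], state: State) -> State:
    """Apply each (name, kwargs) stage in order, feeding it the previous output."""
    for name, kwargs in stages:
        if name not in REGISTRY:
            raise KeyError(f"unknown strategy: {name!r}")
        state = REGISTRY[name](state, **kwargs)
    return state


final = run_pipeline(
    [("gradient_ascent", {"epochs": 3, "lr": 1e-3}),
     ("fisher_forgetting", {"epochs": 2, "lr": 1e-4})],
    {"log": []},
)
print(final["log"])
```

Because stages share one state object, order matters: a coarse gradient-ascent pass followed by a gentler Fisher-based pass is a different procedure from the reverse.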

### Incremental unlearning

```python
from erasus.data.datasets.unlearning import UnlearningDataset
from erasus.unlearners.continual_unlearner import ContinualUnlearner

ds = UnlearningDataset(base_dataset, forget_indices=[0, 5, 12])
unlearner = ContinualUnlearner(model, strategy="gradient_ascent")
result = unlearner.incremental_fit(ds)

# Later, new deletion requests arrive:
ds.mark_forget([42, 88])
result = unlearner.incremental_fit(ds, previous_result=result)
```
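
One way to think about incremental requests: the dataset keeps a growing set of forget indices, each call unlearns only the indices not yet processed, and the retain set always excludes every index ever marked. The class below is a hypothetical sketch of that bookkeeping, not the `UnlearningDataset` or `ContinualUnlearner` implementation.

```python
from typing import Iterable, List, Set


class ForgetIndexTracker:
    """Bookkeeping for repeated deletion requests over a dataset of n samples."""

    def __init__(self, n_samples: int, forget_indices: Iterable[int] = ()) -> None:
        self.n_samples = n_samples
        self.forget: Set[int] = set()
        self.processed: Set[int] = set()
        self.mark_forget(forget_indices)

    def mark_forget(self, indices: Iterable[int]) -> None:
        for i in indices:
            if not 0 <= i < self.n_samples:
                raise IndexError(f"index {i} out of range")
            self.forget.add(i)

    def pending(self) -> List[int]:
        """Indices marked for forgetting that have not been unlearned yet."""
        return sorted(self.forget - self.processed)

    def retain_indices(self) -> List[int]:
        """Every index that was never marked for forgetting."""
        return [i for i in range(self.n_samples) if i not in self.forget]

    def commit(self, indices: Iterable[int]) -> None:
        """Record that these indices have now been unlearned."""
        self.processed.update(indices)


tracker = ForgetIndexTracker(100, forget_indices=[0, 5, 12])
first = tracker.pending()              # indices for the first incremental call
tracker.commit(first)                  # ...after they have been unlearned
tracker.mark_forget([42, 88])          # new deletion requests arrive
second = tracker.pending()             # only the new requests
print(first, second, len(tracker.retain_indices()))  # [0, 5, 12] [42, 88] 95
```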

### Benchmarking with protocols

```python
from erasus.evaluation import UnlearningBenchmark

benchmark = UnlearningBenchmark(
    protocol="tofu",              # or "muse", "wmdp", "general"
    include_privacy=True,         # adds epsilon-delta verification
    n_runs=5,
)
report = benchmark.evaluate(model, forget_loader, retain_loader, gold_model=retrained)
print(report.verdict)    # PASS / PARTIAL / FAIL
report.save("results.json")
```
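
Benchmarks of this kind usually compare the unlearned model against the gold (retrained-from-scratch) model. A hypothetical sketch of how a three-way verdict could follow from that comparison: average each metric over the runs, count a metric as matched if it lies within a tolerance of the gold value, then report PASS if all match, FAIL if none do, and PARTIAL otherwise. The tolerance and aggregation rule are assumptions, not the rules `UnlearningBenchmark` applies.

```python
from statistics import mean
from typing import Dict, List


def verdict(runs: Dict[str, List[float]], gold: Dict[str, float], tol: float = 0.02) -> str:
    """Compare per-metric means over runs against the gold (retrained) model."""
    matched = [abs(mean(runs[name]) - gold_value) <= tol for name, gold_value in gold.items()]
    if all(matched):
        return "PASS"
    if any(matched):
        return "PARTIAL"
    return "FAIL"


gold = {"forget_acc": 0.62, "retain_acc": 0.91}
runs = {
    "forget_acc": [0.63, 0.61, 0.62],   # mean 0.62: matches gold
    "retain_acc": [0.80, 0.82, 0.81],   # mean 0.81: 0.10 below gold
}
print(verdict(runs, gold))  # PARTIAL
```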

### CLI

```bash
erasus unlearn --config config.yaml --coreset-from influence --coreset-k 100
erasus benchmark --protocol tofu --gold-model retrained.pt --n-runs 5
erasus evaluate --protocol general --include-privacy
```

---

## Strategies (41)

| Category | Methods |
|----------|---------|
| Gradient | Gradient Ascent, SCRUB, Fisher Forgetting, Negative Gradient, WGA, Saliency |
| Parameter | LoRA Unlearning, Sparse-Aware, Mask-Based, Neuron Pruning, Layer Freezing |
| Data | Amnesiac, SISA, Certified Removal, Knowledge Distillation |
| LLM-specific | SSD, NPO, SimNPO, AltPO, FLAT, RMU, UNDIAL, Delta, Token Masking, Embedding Alignment, Causal Tracing, Attention Surgery |
| Diffusion | Concept Erasure, Noise Injection, U-Net Surgery, Timestep Masking, Safe Latents, Meta |
| VLM | Contrastive, Attention, Vision-Text Split, Modality Decoupling |
| Inference-time | DExperts, Activation Steering |
| Meta | AutoStrategy (`"auto"`), StrategyPipeline, Ensemble |

## Selectors (24)

| Category | Methods |
|----------|---------|
| Gradient-based | Influence, TracIn, Gradient Norm, GradMatch/CRAIG, EL2N, Representer |
| Geometry-based | k-Center, Herding, GLISTER, Submodular, k-Means++, Farthest First |
| Learning-based | Forgetting Events, Data Shapley, Valuation Network, Active Learning |
| Ensemble | Voting, AutoSelector, Weighted Fusion |

## Models

| Modality | Architectures | Unlearner |
|----------|--------------|-----------|
| Language | LLaMA, Mistral, GPT-2/J, BERT, T5 | `LLMUnlearner` |
| Vision-Language | CLIP, LLaVA, BLIP-2 | `VLMUnlearner` |
| Diffusion | Stable Diffusion 1.x/2.x/XL | `DiffusionUnlearner` |
| Audio | Whisper, CLAP, Wav2Vec | `AudioUnlearner` |
| Video | VideoMAE, VideoCLIP | `VideoUnlearner` |
| Any | Auto-detect | `MultimodalUnlearner` |

---

## Evaluation

25+ metrics across forgetting quality, model utility, efficiency, and privacy:

```python
from erasus.metrics import MetricSuite
results = MetricSuite(["accuracy", "mia"]).run(model, forget_loader, retain_loader)
```
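
To illustrate what an MIA metric measures, here is a minimal loss-threshold membership attack scored by AUC. It asks how often a forget sample has lower loss than a sample the model never trained on; after successful unlearning the AUC should move towards 0.5 (indistinguishable). This is a generic sketch; the attack behind the `"mia"` metric may differ.

```python
from typing import Sequence


def loss_threshold_mia_auc(forget_losses: Sequence[float], holdout_losses: Sequence[float]) -> float:
    """AUC of a membership attack that predicts 'member' for lower-loss samples.

    Equals the probability that a random forget sample has lower loss than a
    random held-out (never-trained-on) sample; ties count as one half.
    1.0 = forget samples are perfectly identifiable, 0.5 = indistinguishable.
    """
    if not forget_losses or not holdout_losses:
        raise ValueError("both loss lists must be non-empty")
    wins = 0.0
    for f in forget_losses:
        for h in holdout_losses:
            if f < h:
                wins += 1.0
            elif f == h:
                wins += 0.5
    return wins / (len(forget_losses) * len(holdout_losses))


holdout = [0.9, 1.1, 1.3, 1.6]
before = [0.05, 0.1, 0.2, 0.3]     # memorised: much lower loss than held-out data
after = [0.8, 1.2, 1.4, 1.0]       # after unlearning: losses overlap with held-out
print(loss_threshold_mia_auc(before, holdout), loss_threshold_mia_auc(after, holdout))  # 1.0 0.625
```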

Certification with formal bounds:

```python
from erasus.certification import CertifiedRemovalVerifier
verifier = CertifiedRemovalVerifier(epsilon=1.0, delta=1e-5)
result = verifier.verify(unlearned_model, retrained_model, n_total=10000, n_forget=500)
```
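
A complementary, purely empirical check: if the unlearned model were (ε, δ)-indistinguishable from the retrained one, any attack that tries to tell them apart must satisfy `TPR <= e^eps * FPR + delta` and `1 - FPR <= e^eps * (1 - TPR) + delta`. Rearranging gives a lower bound on ε from an observed attack. This is a generic DP-auditing sketch, not the method `CertifiedRemovalVerifier` uses. A bound below the claimed ε does not prove the guarantee; a bound above it refutes it.

```python
import math


def empirical_epsilon_lower_bound(tpr: float, fpr: float, delta: float) -> float:
    """Smallest epsilon consistent with an attack's observed TPR/FPR at a given delta.

    Uses TPR <= e^eps * FPR + delta and (1 - FPR) <= e^eps * (1 - TPR) + delta.
    Returns math.inf when no finite epsilon can explain the observed rates.
    """
    if not (0.0 <= tpr <= 1.0 and 0.0 <= fpr <= 1.0):
        raise ValueError("tpr and fpr must be probabilities")
    bounds = [0.0]
    if tpr - delta > 0.0:
        bounds.append(math.inf if fpr == 0.0 else math.log((tpr - delta) / fpr))
    if 1.0 - fpr - delta > 0.0:
        bounds.append(math.inf if tpr == 1.0 else math.log((1.0 - fpr - delta) / (1.0 - tpr)))
    return max(bounds)


claimed_epsilon, delta = 1.0, 1e-5
weak = empirical_epsilon_lower_bound(tpr=0.55, fpr=0.45, delta=delta)    # about 0.2
strong = empirical_epsilon_lower_bound(tpr=0.80, fpr=0.20, delta=delta)  # about 1.39
print(weak <= claimed_epsilon, strong <= claimed_epsilon)  # True False
```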

---

## Performance features

- **Mixed precision** -- `precision="bf16-mixed"` for higher throughput and lower memory use on bf16-capable hardware
- **Gradient checkpointing** -- `gradient_checkpointing=True` for large models
- **Adaptive memory** -- auto batch-size tuning and chunked computation to prevent OOM
- **In-place operations** -- optimised Fisher/gradient accumulation
- **Composable callbacks** -- 11 hook points for custom behaviour injection
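
Automatic batch-size tuning can be sketched as halving the batch size whenever a step runs out of memory. In this dependency-free sketch, a `SimulatedOOM` exception stands in for `torch.cuda.OutOfMemoryError` (the exception PyTorch 2.x raises on CUDA OOM); `run_with_adaptive_batch` is a hypothetical helper, not Erasus's memory manager.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class SimulatedOOM(RuntimeError):
    """Stand-in for torch.cuda.OutOfMemoryError so the sketch runs without a GPU."""


def run_with_adaptive_batch(
    step: Callable[[int], T],
    batch_size: int,
    min_batch_size: int = 1,
    oom_error: Type[BaseException] = SimulatedOOM,
) -> Tuple[int, T]:
    """Call step(batch_size), halving batch_size after each out-of-memory error.

    Returns the batch size that succeeded together with the step's result.
    """
    if min_batch_size < 1:
        raise ValueError("min_batch_size must be >= 1")
    while batch_size >= min_batch_size:
        try:
            return batch_size, step(batch_size)
        except oom_error:
            # With real CUDA OOMs you would usually also call torch.cuda.empty_cache() here.
            batch_size //= 2
    raise RuntimeError("no batch size >= min_batch_size fits in memory")


def fake_step(batch_size: int) -> str:
    # Pretend anything above 48 samples per batch exhausts GPU memory.
    if batch_size > 48:
        raise SimulatedOOM(f"batch of {batch_size} does not fit")
    return f"processed batch of {batch_size}"


print(run_with_adaptive_batch(fake_step, batch_size=256))  # (32, 'processed batch of 32')
```

Halving converges in a logarithmic number of retries; a finer search (e.g. binary search between the last failure and last success) would recover more memory at the cost of more attempts.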

---

## Project structure

```
erasus/
  core/           Base classes, registry, config, coreset, pipeline, trainer
  strategies/     41 unlearning algorithms
  selectors/      24 coreset selection methods
  unlearners/     High-level orchestrators (LLM, VLM, Diffusion, Audio, Video, Federated, Continual)
  metrics/        25+ evaluation metrics
  evaluation/     Adversarial evaluation, benchmarks, verification suite
  losses/         8 loss functions
  fabric.py       Composable primitives for custom loops
  privacy/        DP mechanisms, certificates
  certification/  Formal removal verification, theoretical bounds
  experiments/    Tracking (W&B, MLflow), HPO, ablation
  visualization/  16 visualization modules
  cli/            Command-line interface
  utils/          Helpers, callbacks, memory management, distributed
```

## Test status

```
465 tests passed  |  0 regressions  |  ~3s
```

```bash
python3 -m pytest tests/ -v --tb=short \
  --ignore=tests/test_components.py \
  --ignore=tests/unit/test_sprint_b.py \
  --ignore=tests/unit/test_sprint_f.py
```

---

## Contributing

```bash
git clone https://github.com/OnePunchMonk/erasus.git
cd erasus && pip install -e ".[dev]"
python -m pytest tests/ -v
```

## License

MIT -- see [LICENSE](LICENSE).

## Citation

```bibtex
@software{erasus2026,
  title={Erasus: Universal Machine Unlearning via Coreset Selection},
  author={Aggarwal, Avaya},
  year={2026},
  url={https://github.com/OnePunchMonk/erasus}
}
```
