Metadata-Version: 2.4
Name: deeplens-sae
Version: 0.1.1
Summary: A library to train and use Sparse Autoencoders for mechanistic interpretability
Author-email: Iñigo Parra <iparra@berkeley.edu>
License-Expression: MIT
Project-URL: Homepage, https://iparramartin.github.io/deeplens/
Project-URL: Documentation, https://iparramartin.github.io/deeplens/
Project-URL: Repository, https://github.com/iparramartin/deeplens
Project-URL: Bug Tracker, https://github.com/iparramartin/deeplens/issues
Keywords: machine-learning,interpretability,sparse-autoencoders,mechanistic-interpretability,deep-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: matplotlib==3.10.8
Requires-Dist: numpy<2
Requires-Dist: pandas==2.3.3
Requires-Dist: PyYAML==6.0.3
Requires-Dist: tqdm==4.67.1
Requires-Dist: transformers>=4.57.1
Requires-Dist: datasets==4.4.2
Requires-Dist: wandb==0.23.1
Dynamic: license-file

![DeepLens](docs/assets/header.png)

## Overview
__DeepLens__ is a library for mechanistic interpretability. It includes a full set of tools that allow end-to-end interpretability pipelines: from feature extraction, to feature steering. The library includes Sparse Autoencoders (TopK and L1), feature extractors, feature dataset modules, and intervention modules. 

## Quick How To
### Installation
Before installing any dependency, I recommend creating a new virtual envoronment to avoid library conflicts.

```bash
conda create -n deeplens python=3.11
conda activate deeplens
```

The following command should install the necessary dependencies and tools:

**For Windows (CUDA support)**:
```bash
pip install -e .
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```

**For Mac (no CUDA support)**: 
```bash
pip install -e .
pip3 install torch torchvision
```

If any errors arise, you may alternatively use the manual installation:
```bash
pip install -r requirements.txt
pip install -e .
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
# OR FOR MAC
pip3 install torch torchvision
```

### 1. MLP Feature Extraction

```python
from deeplens.extractor import FromHuggingFace

extractor = FromHuggingFace(
    model="gpt2",
    layer=3,
    dataset_name="HuggingFaceFW/fineweb", # uses dataset streaming!
    num_samples=500,
    seq_length=1024,
    inference_batch_size=16,
    device="auto",
    save_features=True
)

features = extractor.extract_features()
```

### 2. Training

```python
from deeplens.sae import SparseAutoencoder
from deeplens.train import SAETrainer
from deeplens.utils.dataset import ActivationsDatasetBuilder
import torch

dataset = ActivationsDatasetBuilder(
    activations="YOUR_SAVED_FEATURES_PT_FILE",
    splits=[0.8, 0.2],
    batch_size=16,
    norm=True
)
train, eval = dataset.get_dataloaders()

config = SAETrainer.config_from_yaml('demo/config.yaml')
model = SparseAutoencoder(**config)

optimizer = torch.optim.Adam(
    model.parameters(), 
    lr=0.0003, 
    betas=(0.9,0.99),
    weight_decay=1e-5 # Just when using untied weights! Else set to 0
)

trainer = SAETrainer(
    model=model,
    model_name="YOUR_GIVEN_MODEL_NAME",
    train_dataloader=train,
    eval_dataloader=eval,
    optim=optimizer,
    epochs=3,
    bf16=True,
    random_seed=42,
    save_checkpoints=True,
    device="auto",
    grad_clip_norm=3.0,
    lrs_type='cosine',
    eval_steps=1000,
    warmup_fraction=0.1,
    save_best_only=True,
    log_to_wandb=True
)

trainer.train()
```

### 3. SAE Feature Extraction

```python
text = "What color is the car next to Mary's house?"
sample = ExtractSingleSample(
    model="SAVED_MODEL_DIR",
    sample=text,
    layer=3,
    max_length=512,
    device="auto"
)

acts = sample.get_mlp_acts()
```

### 4. Feature Intervention

```python
text = "What color is the car next to Mary's house?"
sample = ExtractSingleSample(
    model="SAVED_MODEL_DIR",
    sample=text,
    layer=3,
    max_length=512,
    device="auto"
)

acts = sample.get_mlp_acts()
```
