Metadata-Version: 2.4
Name: inteprelens
Version: 0.1.0
Summary: InterpreLens: A Lens for Interpreting Large Language Models based on the Transformer architecture.
Project-URL: Homepage, https://github.com/Aisuko/inteprelens
Project-URL: Repository, https://github.com/Aisuko/inteprelens
Author: Bowen
License: MIT
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: einops>=0.8.2
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: torch>=2.0.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: transformers>=4.30.0
Provides-Extra: datasets
Requires-Dist: datasets>=2.14.0; extra == 'datasets'
Requires-Dist: pyarrow>=12.0.0; extra == 'datasets'
Description-Content-Type: text/markdown

# inteprelens

`inteprelens` is the extracted core package for transformer architecture analysis.
It keeps the Causal Head Gating (CHG) workflow and adds tracing utilities for
named transformer stages and internal logits.

## Overview

Use `inteprelens` when you want to:

- train CHG masks over attention heads
- inspect necessary, sufficient, and facilitating heads
- trace intermediate transformer states such as attention output projection inputs,
  block outputs, final norm states, and per-layer logits
- score final-token logits, log-probabilities, and probabilities for calibration-style analysis
- run gradient-based attribution through facilitating head circuits

This repository contains only the reusable core package. It does not include
downstream task pipelines or visualization workflows.

## Installation

Install the runtime package:

```bash
pip install inteprelens
```

For local development:

```bash
uv sync --group dev
uv run pytest -q
```

For build and publish checks:

```bash
uv sync --group publish
uv run --group publish python -m build
uv run --group publish twine check dist/*
```

If you use a gated Hugging Face model such as `meta-llama/Llama-3.2-1B`, make sure
`HF_TOKEN` is set in your environment or in a `.env` file.
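
A quick pre-flight check can catch a missing token before a gated model download
fails. The sketch below uses only the standard library. The helper name
`require_hf_token` is hypothetical and not part of `inteprelens`, and it checks
only the process environment (it does not parse `.env`).

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token, or raise if it is missing or blank.

    Hypothetical helper for illustration; not part of inteprelens.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it or load it from .env before "
            "requesting a gated model."
        )
    return token
```

If you keep the token only in `.env`, load that file into the environment before
calling a check like this one.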

## Quick Start

```python
from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=["What is the capital of France?"],
    targets=["Paris"],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

print(results.summary())
print(results.necessary_heads().head())
```

## Usage Examples

### CHG analysis

```python
from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=[
        "The capital of France is",
        "2 + 2 equals",
    ],
    targets=[
        "Paris",
        "4",
    ],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

necessary = results.necessary_heads()
taxonomy = results.head_taxonomy()

print(necessary.head())
print(taxonomy.head())
```

### Trace transformer stages and logits

```python
from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

trace = analyzer.trace(
    texts="Paris is the capital of",
    layers=[0],
    sites=["attn_o_proj_pre", "final_norm", "logits"],
)

print(trace.get("attn_o_proj_pre", 0).shape)
print(trace.get("logits", 0).shape)
print(trace.final_logits.shape)
```

`trace.final_logits` contains the model's final output logits for the traced batch.

### Score final-token logits, log-probabilities, and probabilities

```python
from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

scores = analyzer.score(
    texts=[
        "The capital of France is",
        "2 + 2 equals",
    ],
    temperature=1.0,
)

print(scores.logits.shape)
print(scores.log_probs.shape)
print(scores.probs.shape)

final_logits = scores.final_token_logits()
final_log_probs = scores.final_token_log_probs()
final_probs = scores.final_token_probs()

print(final_logits.shape)
print(final_log_probs.shape)
print(final_probs.shape)
```

Treat `logits` as the canonical output for calibration, use `log_probs` for
numerically stable token scoring, and use `probs` when you need confidence-style metrics.
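
The relationship between these three outputs can be sketched in plain Python.
This assumes `temperature` divides the logits before a softmax, which is the usual
convention but is not confirmed here for `analyzer.score`. The function name
`log_softmax` and the example values are illustrative only.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax using the log-sum-exp trick."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


# Made-up final-token logits over a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
log_probs = log_softmax(logits)
probs = [math.exp(lp) for lp in log_probs]

# A higher temperature flattens the distribution and lowers the top probability.
flat_probs = [math.exp(lp) for lp in log_softmax(logits, temperature=2.0)]
print(max(probs), max(flat_probs))
```

Working in log space keeps scores finite even for very large logits, which is why
log-probabilities are the safer quantity to sum or compare across tokens.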

### Gradient-based attribution through facilitating heads

```python
from inteprelens import CausalCircuitAttribution, LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=["The capital of France is"],
    targets=["Paris"],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

facilitating_mask = results.get_facilitating_mask()
attribution = CausalCircuitAttribution(analyzer.model, analyzer.tokenizer)

sentence_scores = attribution.compute_sentence_importance(
    document="Paris is the capital of France. It is one of Europe's largest cities.",
    sentences=[
        "Paris is the capital of France.",
        "It is one of Europe's largest cities.",
    ],
    summary="Paris is the capital of France.",
    facilitating_mask=facilitating_mask,
)

print(sentence_scores)
```

## Public API

- `LensAPI`: high-level interface for CHG fitting, token scoring, and named-site tracing
- `TransformerTracer`: lower-level tracer for direct transformer-site collection
- `CausalCircuitAttribution`: gradient attribution through facilitating CHG heads
- `CHGDataset`: helper for building CHG-ready datasets from prompt/target pairs

## Acknowledgements

`inteprelens` builds on and adapts code and ideas from the
[Causal Head Gating](https://github.com/jonhanke/nam_causal-head-gating) project.
The extracted core package keeps CHG support and extends it with transformer-stage
tracing and internal-logit inspection for architecture analysis workflows.

## License

This project is released under the MIT License. See [LICENSE](LICENSE).
