Metadata-Version: 2.4
Name: steerling
Version: 0.1.2
Summary: Steerling: An interpretable causal diffusion language model with concept steering
Project-URL: Homepage, https://github.com/guidelabs/steerling
Project-URL: Repository, https://github.com/guidelabs/steerling
Author: Guide Labs
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: concept-steering,diffusion,interpretability,language-model
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.13
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: numpy~=2.3.0
Requires-Dist: pydantic~=2.10.0
Requires-Dist: safetensors>=0.4.0
Requires-Dist: tiktoken~=0.8.0
Requires-Dist: torch~=2.8.0
Requires-Dist: triton>=3.0.0
Provides-Extra: dev
Requires-Dist: ipykernel~=6.29.5; extra == 'dev'
Requires-Dist: ipywidgets~=8.1.5; extra == 'dev'
Requires-Dist: jupyter~=1.1.1; extra == 'dev'
Requires-Dist: lm-eval~=0.4.0; extra == 'dev'
Requires-Dist: matplotlib~=3.9.2; extra == 'dev'
Requires-Dist: pandas~=2.2.3; extra == 'dev'
Requires-Dist: pre-commit~=4.0.1; extra == 'dev'
Requires-Dist: pytest~=8.3.0; extra == 'dev'
Requires-Dist: ruff~=0.8.4; extra == 'dev'
Requires-Dist: seaborn~=0.13.2; extra == 'dev'
Requires-Dist: tqdm~=4.67.1; extra == 'dev'
Requires-Dist: ty; extra == 'dev'
Provides-Extra: dev-tools
Requires-Dist: pre-commit~=4.0.1; extra == 'dev-tools'
Requires-Dist: ruff~=0.8.4; extra == 'dev-tools'
Requires-Dist: ty; extra == 'dev-tools'
Provides-Extra: eval
Requires-Dist: lm-eval~=0.4.0; extra == 'eval'
Requires-Dist: pandas~=2.2.3; extra == 'eval'
Requires-Dist: tqdm~=4.67.1; extra == 'eval'
Provides-Extra: notebook
Requires-Dist: ipykernel~=6.29.5; extra == 'notebook'
Requires-Dist: ipywidgets~=8.1.5; extra == 'notebook'
Requires-Dist: jupyter~=1.1.1; extra == 'notebook'
Requires-Dist: matplotlib~=3.9.2; extra == 'notebook'
Requires-Dist: seaborn~=0.13.2; extra == 'notebook'
Provides-Extra: test
Requires-Dist: pytest~=8.3.0; extra == 'test'
Description-Content-Type: text/markdown

# Steerling

An interpretable causal diffusion language model.

Steerling-8B combines masked diffusion language modeling with concept decomposition, enabling:
- **Generation**: Non-autoregressive text generation via confidence-based unmasking
- **Attribution**: Decompose predictions into known concept contributions
- **Steering**: Intervene on concept activations to control generation
- **Embeddings**: Extract hidden, composed, known, or unknown representations
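
Confidence-based unmasking can be illustrated with a toy, pure-Python sketch. It is not the package's implementation. `confidence_unmask`, `toy_predict`, and the use of `None` as the mask marker are hypothetical. The "model" is a fixed lookup table that ignores context, and the sketch ignores Steerling's 64-token block structure. It only shows the core loop: start fully masked, then repeatedly commit the positions the model is most confident about.

```python
MASK = None  # hypothetical stand-in for the <|mask|> token

def confidence_unmask(length, predict, tokens_per_step=1):
    """Start fully masked; at each step commit the most confident predictions.

    `predict(seq)` must return one (token, confidence) pair per position.
    Returns the final sequence and the order in which positions were filled.
    """
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be >= 1")
    seq = [MASK] * length
    order = []
    while any(tok is MASK for tok in seq):
        preds = predict(seq)
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        # Highest-confidence masked positions are unmasked first.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            seq[i] = preds[i][0]
            order.append(i)
    return seq, order

# Toy "model": fixed guesses and confidences that do not depend on the context.
TARGET = ["the", "cat", "sat", "down"]
CONF = [0.2, 0.9, 0.5, 0.7]

def toy_predict(seq):
    return list(zip(TARGET, CONF))

seq, order = confidence_unmask(len(TARGET), toy_predict)
print(seq, order)  # ['the', 'cat', 'sat', 'down'] [1, 3, 2, 0]
```

In a real model the confidences would be recomputed at every step, conditioned on the tokens already committed. That is why the loop calls `predict` again after each round instead of sorting once.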

## Quick Start

```bash
pip install steerling
```

```python
from steerling import SteerlingGenerator, GenerationConfig

generator = SteerlingGenerator.from_pretrained("guidelabs/steerling-8b")

text = generator.generate(
    "The key to understanding neural networks is",
    GenerationConfig(max_new_tokens=100, seed=42),
)
print(text)
```

## Model Details

| Property | Value |
|---|---|
| Parameters | ~8B |
| Architecture | CausalDiffusionLM + Interpretable Concept Head |
| Context Length | 4096 |
| Vocabulary | 100,281 (cl100k_base + specials) |
| Known Concepts | 33,732 |
| Unknown Concepts | 101,196 |
| Attention | GQA: 32 query heads, 4 KV heads |
| Precision | bfloat16 |
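
Several numbers in the table can be sanity-checked with simple arithmetic. The sketch below rests on three assumptions that this README does not confirm. First, it uses exactly 8×10⁹ parameters, although the table says ~8B. Second, it assumes the common contiguous grouping of query heads onto KV heads, via a hypothetical `kv_head_for` helper. Third, it assumes Steerling's 4 extra special tokens are counted on top of `cl100k_base`'s 100,277 token IDs.

```python
# Back-of-the-envelope checks of the Model Details table (assumed round numbers).
PARAMS = 8e9         # "~8B" parameters, rounded
BYTES_PER_BF16 = 2   # bfloat16 stores each value in 16 bits

weight_bytes = PARAMS * BYTES_PER_BF16
print(f"bf16 weights alone: {weight_bytes / 1e9:.0f} GB")  # bf16 weights alone: 16 GB

# Grouped-query attention: 32 query heads share 4 key/value heads.
N_HEADS, N_KV_HEADS = 32, 4
group_size = N_HEADS // N_KV_HEADS  # 8 query heads per KV head

def kv_head_for(query_head):
    """Hypothetical helper: contiguous grouping (heads 0-7 -> KV head 0, 8-15 -> 1, ...)."""
    return query_head // group_size

# Vocabulary: cl100k_base plus the 4 extra special tokens listed in the FAQ.
CL100K_N_VOCAB = 100_277  # tiktoken's cl100k_base n_vocab, including its own special tokens
EXTRA_SPECIALS = ["<|pad|>", "<|bos|>", "<|endofchunk|>", "<|mask|>"]
vocab_size = CL100K_N_VOCAB + len(EXTRA_SPECIALS)
print(f"vocabulary: {vocab_size:,}")  # vocabulary: 100,281
```

The 16 GB of raw bf16 weights is below the ~18 GB VRAM figure given in the FAQ. The difference presumably covers activations and runtime overhead.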

## Architecture

Steerling uses block-causal attention (bidirectional within 64-token blocks, causal across blocks) with masked diffusion training. At inference, tokens are generated by iteratively unmasking positions in order of model confidence. The interpretable concept heads decompose transformer hidden states `h` into:

```
h → known_features + unk_hat + epsilon = composed → lm_head → logits
```

- `known_features`: Weighted sum of top-k learned concept embeddings
- `unk_hat`: Residual features captured by a factorized unknown head
- `epsilon`: Small correction term for reconstruction fidelity
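
The decomposition above can be illustrated with toy numbers. This pure-Python sketch is not the package's implementation, and all helper names are hypothetical. It also assumes the following details:

- the top-k rule selects the largest activations;
- `lm_head` is linear and bias-free;
- steering means overwriting one concept activation;
- the sizes are tiny (3 concepts, hidden size 2).

It shows composing the hidden state, attributing a logit to individual known concepts, and steering, all in one place.

```python
# Toy illustration of h -> known_features + unk_hat + epsilon = composed -> lm_head -> logits.

def add(*vecs):
    return [sum(xs) for xs in zip(*vecs)]

def scale(a, v):
    return [a * x for x in v]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def known_features(activations, concept_embeddings, k):
    """Weighted sum of the k concept embeddings with the largest activations (assumed rule)."""
    top = sorted(range(len(activations)), key=lambda c: activations[c], reverse=True)[:k]
    out = [0.0] * len(concept_embeddings[0])
    for c in top:
        out = add(out, scale(activations[c], concept_embeddings[c]))
    return out, top

def logits(lm_head, composed):
    # Assumed linear, bias-free lm_head: one row per vocabulary token.
    return [dot(row, composed) for row in lm_head]

def concept_contributions(lm_head, token, activations, concept_embeddings, top):
    """Attribution: with a linear lm_head, concept c adds activation * (row . embedding)."""
    return {c: activations[c] * dot(lm_head[token], concept_embeddings[c]) for c in top}

CONCEPTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 known concept embeddings
ACTS = [0.5, 2.0, 1.0]                           # concept activations at one position
UNK_HAT = [0.5, -0.25]                           # residual from the unknown head
EPS = [0.0, 0.125]                               # small correction term
LM_HEAD = [[1.0, 0.0], [1.0, -1.0]]              # 2-token vocabulary

known, top = known_features(ACTS, CONCEPTS, k=2)
composed = add(known, UNK_HAT, EPS)
print(top, known, composed, logits(LM_HEAD, composed))
print(concept_contributions(LM_HEAD, 0, ACTS, CONCEPTS, top))

# Steering (toy): overwrite one concept's activation, then recompute downstream.
steered = list(ACTS)
steered[0] = 3.0
known_s, top_s = known_features(steered, CONCEPTS, k=2)
print(top_s, logits(LM_HEAD, add(known_s, UNK_HAT, EPS)))
```

Because `lm_head` is linear here, the per-concept contributions to token 0 (`{1: 0.0, 2: 1.0}`) add up exactly to the part of its logit that comes from `known_features`. The remaining 0.5 of the 1.5 logit comes from `unk_hat`. How the package actually exposes attribution and steering is not shown in this sketch.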

## Installation

```bash
# From PyPI
pip install steerling

# From source
git clone https://github.com/guidelabs/steerling.git
cd steerling
pip install -e ".[dev]"

# With evaluation support
pip install -e ".[eval]"
```


## FAQ

- **Where can I read more about the details of this architecture?**\
  You can read more about the architecture in these blog posts: [Scaling Interpretable Models with 8B Parameters](https://www.guidelabs.ai/post/scaling-interpretable-models-8b/) and [Causal Diffusion Language Models](https://www.guidelabs.ai/post/block-causal-diffusion-language-model/). We will be releasing a more detailed technical report in a few months.

- **This is a base model. What about an instruction-tuned model?**\
  Stay tuned.

- **Is training code available?**\
  This release is inference-only, so the training code is not included. If you're interested in training or fine-tuning, please reach out to info@guidelabs.ai.


- **What dataset did you train on?**\
  We trained on an augmented version of the Nemotron-CC-HQ data, for a total of about 1.35 trillion tokens.

- **What is block-causal attention?**\
  Standard causal attention only lets each token attend to itself and earlier tokens. Block-causal attention groups tokens into fixed-size blocks (64 tokens in Steerling) and allows bidirectional attention within each block, while keeping causal ordering across blocks. This gives the model local bidirectional context while preserving the ability to generate sequentially. See [Causal Diffusion Language Models](https://www.guidelabs.ai/post/block-causal-diffusion-language-model/) for more details.

- **What are "known" and "unknown" concepts?**\
  The model decomposes its internal representations into two parts:
  - *Known concepts* (33,732): learned, supervised features that correspond to patterns a human can identify and understand.
  - *Unknown concepts* (101,196): capture the signal in the hidden representations that the known concepts don't explain.
  - Together they reconstruct the full hidden state up to a small residual error: `hidden ≈ known_features + unknown_features + epsilon`.

- **How do I find concept IDs for steering?**\
  Over the coming weeks, we will provide a full walkthrough of concept extraction and steering with Steerling-8B.

- **What GPU do I need?**\
  Steerling-8B in bfloat16 requires approximately 18 GB of VRAM. It fits on a single H100, A100 (40GB or 80GB), A6000 (48GB), or RTX 4090 (24GB).

- **Can I fine-tune this model?**\
  Yes, but this package does not include fine-tuning code. It is currently an inference-only release; if there is enough demand, we will support fine-tuning in a future release.

- **What tokenizer does Steerling-8B use?**\
  Steerling uses OpenAI's `cl100k_base` tokenizer (via tiktoken) with 4 additional special tokens: `<|pad|>`, `<|bos|>`, `<|endofchunk|>`, and `<|mask|>`, for a total vocabulary of 100,281 tokens.

- **Can I use this with the Hugging Face transformers library?**\
  Not directly. Steerling uses a custom architecture (block-causal attention, concept heads) that isn't part of the transformers library. Use the `steerling` package instead, which provides `SteerlingGenerator.from_pretrained()` with a similar interface.

- **How do I get training data attributions?**\
  This release is a lightweight version of the pipeline, so it doesn't directly support training data attribution. We have provided notebooks for concept and feature attribution. If you're interested in training data attribution, please reach out to Guide Labs.
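
The block-causal pattern described in the Architecture section and the FAQ can be written as a boolean attention mask. In this minimal pure-Python sketch (the helper names are hypothetical), `True` means "query position i may attend to key position j". Steerling uses 64-token blocks; the block size of 3 in the example is only for readability. Whether the package constructs its mask this way is not shown here.

```python
def block_causal_mask(seq_len, block_size=64):
    """allowed[i][j] is True when query position i may attend to key position j.

    Positions in the same block see each other (bidirectional); across blocks,
    a position sees every earlier block but no later one (causal).
    """
    return [[(j // block_size) <= (i // block_size) for j in range(seq_len)]
            for i in range(seq_len)]

def causal_mask(seq_len):
    # Standard causal attention, for comparison: each token sees itself and earlier tokens.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

mask = block_causal_mask(6, block_size=3)
for row in mask:
    print("".join("x" if ok else "." for ok in row))
# xxx...
# xxx...
# xxx...
# xxxxxx
# xxxxxx
# xxxxxx
```

Setting `block_size=1` recovers ordinary causal attention. This shows that block-causal attention relaxes standard causal attention only inside each block. PyTorch's `scaled_dot_product_attention` uses the same convention for a boolean `attn_mask`, where `True` means the element takes part in attention.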


## License

The Steerling source code is released under the [Apache License 2.0](LICENSE).

The model weights are provided for research and evaluation purposes.
The weights were trained on datasets with varying license terms, including
[Nemotron-CC-HQ](https://huggingface.co/datasets/nvidia/Nemotron-CC) and
[Dolmino Mix](https://huggingface.co/datasets/allenai/dolmino-mix-1124).
Some training data includes synthetic content generated by third-party models
with their own license terms. We are currently reviewing the implications of
these upstream licenses for downstream use of the model weights.
Please check back for updates on the weight licensing terms.

For questions about commercial use of the model weights,
contact us at info@guidelabs.ai.
