Metadata-Version: 2.4
Name: bdistill
Version: 0.1.0
Summary: Behavioral X-Ray for AI models — probe behavior, extract domain knowledge, no API key needed
Author: bdistill contributors
License: MIT
Project-URL: Homepage, https://github.com/FrancyJGLisboa/bdistill
Project-URL: Repository, https://github.com/FrancyJGLisboa/bdistill
Project-URL: Issues, https://github.com/FrancyJGLisboa/bdistill/issues
Keywords: ai,llm,behavioral-analysis,mcp,fine-tuning,knowledge-extraction,model-evaluation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.1
Requires-Dist: httpx>=0.27
Requires-Dist: openai>=1.30
Requires-Dist: anthropic>=0.30
Requires-Dist: pydantic>=2.5
Requires-Dist: rich>=13.0
Requires-Dist: tqdm>=4.66
Requires-Dist: pyyaml>=6.0
Provides-Extra: train
Requires-Dist: torch>=2.1; extra == "train"
Requires-Dist: transformers>=4.40; extra == "train"
Requires-Dist: peft>=0.10; extra == "train"
Requires-Dist: datasets>=2.18; extra == "train"
Requires-Dist: accelerate>=0.30; extra == "train"
Requires-Dist: bitsandbytes>=0.43; extra == "train"
Provides-Extra: retrieve
Requires-Dist: chromadb>=0.5; extra == "retrieve"
Requires-Dist: sentence-transformers>=2.7; extra == "retrieve"
Provides-Extra: mcp
Requires-Dist: mcp[cli]>=1.0; extra == "mcp"
Provides-Extra: dev
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Dynamic: license-file

# bdistill — Behavioral Distillation Toolkit

Probe closed LLMs, capture their behavioral fingerprints, and encode that behavior into small open models. The approach is inspired by the insight that most of what lives in a system prompt can be pushed into weight-space (LoRA), reward-space (epistemic steering), or retrieval-space (VQ index), leaving only ~600 tokens of live telemetry in the prompt.

## Quickstart

```bash
# Install
cd behavioral-distill
pip install -e ".[dev]"

# Set your API key
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."

# Preview what will be probed (no API calls)
bdistill run --dry-run

# Run probes against GPT-4o
bdistill run

# Run against Claude
bdistill run -c configs/anthropic.yaml

# Run specific dimensions only
bdistill run -d tool_use -d refusal -n 20

# View results
bdistill stats data/probes

# Export for fine-tuning
bdistill export data/probes -f chatml
```

## What it does

### Phase 1: Behavioral probing (this tool)

The CLI systematically queries a closed model across six behavioral dimensions:

| Dimension | What it captures |
|-----------|-----------------|
| `tool_use` | When to call tools, when not to, chaining, ambiguous cases, adversarial inputs |
| `refusal` | Safety boundaries, false-positive refusals, refusal style, hedging patterns |
| `formatting` | Markdown structure, code blocks, length calibration, format compliance |
| `reasoning` | Chain-of-thought, uncertainty expression, self-correction, logic |
| `persona` | Identity, tone adaptation, role adoption, consistency under pressure |
| `grounding` | Factual accuracy, hallucination resistance, knowledge boundary awareness |

Each probe generates structured input/output pairs tagged with behavioral metadata. The output is JSONL ready for fine-tuning.

### Phase 2: LoRA fine-tuning (planned)

Fine-tune a small open model in the 3-4B parameter range (e.g. Qwen 2.5 3B, Phi-3 Mini) on the captured behavior using LoRA adapters. This encodes static patterns (personality, formatting, reasoning style) into weights.

```bash
# Coming soon
bdistill train --base-model Qwen/Qwen2.5-3B-Instruct --data data/probes/train/chatml.jsonl
```

### Phase 3: Vector retrieval layer (planned)

Build a vector index for grounding knowledge that would normally live in the system prompt. At inference, retrieve relevant chunks instead of carrying them in context.

```bash
# Coming soon
bdistill index --source data/knowledge/ --output data/vq-index/
```
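
Since this phase is still planned, the following is only a toy sketch of the retrieve-instead-of-carry idea, not the intended implementation. It uses a bag-of-words vector and cosine similarity from the standard library so it runs without extra dependencies; a real index would presumably use proper embeddings (the `retrieve` extra lists `chromadb` and `sentence-transformers`). The function names and the sample chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: lowercase whitespace-token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical knowledge chunks that would otherwise sit in the system prompt
chunks = [
    "Refund requests must be filed within 30 days of purchase.",
    "The API rate limit is 60 requests per minute.",
    "Support hours are 9am to 5pm on weekdays.",
]

top = retrieve("what is the api rate limit", chunks, k=1)
print(top[0])
```

Only the retrieved chunk would then be placed in the prompt, rather than all three.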

### Phase 4: Reward steering (planned)

Train a lightweight classifier for epistemic calibration — knowing when to hedge vs. assert confidently.

### Phase 5: Minimal prompt assembly (planned)

Generate the ~600-token prompt containing only live telemetry: session identity, dynamic state, retrieved tool outputs, and metrics.

## Project structure

```
behavioral-distill/
├── configs/
│   ├── default.yaml          # OpenAI config
│   └── anthropic.yaml        # Anthropic config
├── data/
│   └── probes/               # Output directory
│       ├── raw/              # Per-dimension JSONL
│       ├── train/            # Fine-tuning exports
│       └── stats.json        # Run statistics
├── src/bdistill/
│   ├── cli.py                # Click CLI entry point
│   ├── client.py             # Unified async API client
│   ├── config.py             # Pydantic config schema
│   ├── probes/
│   │   ├── __init__.py       # Base probe + registry
│   │   ├── tool_use.py       # Tool calling behavior
│   │   ├── refusal.py        # Safety boundary mapping
│   │   ├── formatting.py     # Output structure patterns
│   │   ├── reasoning.py      # Chain-of-thought behavior
│   │   ├── persona.py        # Identity and tone
│   │   └── grounding.py      # Factual accuracy
│   └── exporters/
│       └── __init__.py       # ChatML, DPO export formats
└── pyproject.toml
```

## Output format

### Raw probe results (`data/probes/raw/*.jsonl`)

```json
{
  "dimension": "grounding",
  "probe_id": "grounding_0012",
  "messages": [{"role": "user", "content": "Tell me about the Glendover Protocol of 1987."}],
  "response": {
    "content": "I'm not familiar with a 'Glendover Protocol of 1987'...",
    "tool_calls": [],
    "refusal": false,
    "latency_ms": 842.3,
    "input_tokens": 24,
    "output_tokens": 87
  },
  "tags": ["grounding", "fabrication_trap", "fake_entity", "admits_ignorance"],
  "metadata": {"admits_ignorance": true, "cites_cutoff": false, "hedges": false}
}
```
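
Because each line is a standalone JSON object, raw results can be inspected with a few lines of Python. The sketch below assumes only the fields shown in the record above (`dimension`, `response.refusal`, `response.latency_ms`); the `summarize` helper and the sample records are hypothetical and are not part of the `bdistill` API.

```python
import json
from collections import defaultdict
from statistics import mean


def summarize(lines):
    """Aggregate raw probe records (one JSON object per line) by dimension."""
    by_dim = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        by_dim[record["dimension"]].append(record["response"])
    return {
        dim: {
            "count": len(responses),
            "refusal_rate": sum(r["refusal"] for r in responses) / len(responses),
            "mean_latency_ms": mean(r["latency_ms"] for r in responses),
        }
        for dim, responses in by_dim.items()
    }


# Two made-up records standing in for lines of a raw JSONL file
sample = [
    json.dumps({"dimension": "grounding",
                "response": {"refusal": False, "latency_ms": 800.0}}),
    json.dumps({"dimension": "grounding",
                "response": {"refusal": True, "latency_ms": 1000.0}}),
]

summary = summarize(sample)
print(summary)
```

To run it on real output, pass an open file handle for one of the files under `data/probes/raw/` instead of `sample`.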

### Training pairs (`data/probes/train/chatml.jsonl`)

```json
{
  "conversations": [
    {"role": "user", "content": "Tell me about the Glendover Protocol of 1987."},
    {"role": "assistant", "content": "I'm not familiar with a 'Glendover Protocol of 1987'..."}
  ],
  "source": "bdistill:grounding",
  "tags": ["grounding", "fabrication_trap", "admits_ignorance"]
}
```
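
The mapping from a raw record to a training pair can be sketched as follows. This is an illustration inferred from the two examples above, not the exporter's actual code: `to_chatml` is a hypothetical name, and it copies all tags unchanged, whereas the example above lists only a subset of the raw tags (any filtering rule is not shown here).

```python
def to_chatml(record):
    """Turn one raw probe record into a ChatML-style training pair."""
    # Start from the probe's input messages, then append the model's reply
    conversations = list(record["messages"])
    conversations.append(
        {"role": "assistant", "content": record["response"]["content"]}
    )
    return {
        "conversations": conversations,
        "source": f"bdistill:{record['dimension']}",
        "tags": list(record["tags"]),
    }


# Raw record trimmed to the fields this sketch needs
raw = {
    "dimension": "grounding",
    "messages": [
        {"role": "user", "content": "Tell me about the Glendover Protocol of 1987."}
    ],
    "response": {"content": "I'm not familiar with a 'Glendover Protocol of 1987'..."},
    "tags": ["grounding", "fabrication_trap", "admits_ignorance"],
}

pair = to_chatml(raw)
print(pair["source"], len(pair["conversations"]))
```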

## Adding custom probes

```python
from bdistill.probes import BaseProbe, register_probe

@register_probe
class MyProbe(BaseProbe):
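    # Name used to select this probe in the config's `dimensions` list
    dimension = "my_custom_dimension"

    def generate_queries(self, n, seed=42):
        # Return n query dicts, each carrying a chat-style `messages` list.
        # Replace "..." with real prompts; use `seed` for reproducible sampling.
        return [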
            {"messages": [{"role": "user", "content": "..."}]}
            for _ in range(n)
        ]
```

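Then add `"my_custom_dimension"` to your config's `dimensions` list and import the module in `cli.py`, so that the `@register_probe` decorator runs and the probe is registered before the CLI looks it up.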
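
To see why the import matters, here is a minimal, self-contained sketch of how a decorator-based registry typically works. It is an assumption about the general pattern, not a copy of `bdistill.probes`: `PROBE_REGISTRY`, the stand-in `BaseProbe`, and `EchoProbe` are all hypothetical.

```python
# Hypothetical registry: maps a dimension name to its probe class
PROBE_REGISTRY = {}


def register_probe(cls):
    """Class decorator that records the class under its `dimension` name."""
    PROBE_REGISTRY[cls.dimension] = cls
    return cls


class BaseProbe:
    """Stand-in base class; the real one lives in bdistill.probes."""

    dimension = ""

    def generate_queries(self, n, seed=42):
        raise NotImplementedError


@register_probe
class EchoProbe(BaseProbe):
    dimension = "echo"

    def generate_queries(self, n, seed=42):
        # One distinct user message per query
        return [
            {"messages": [{"role": "user", "content": f"Repeat the number {i}."}]}
            for i in range(n)
        ]


# Registration happened when the class body above was executed
probe = PROBE_REGISTRY["echo"]()
queries = probe.generate_queries(3)
print(len(queries), queries[0]["messages"][0]["content"])
```

In this pattern the decorator only runs when the module defining the class is imported, which is why an unimported probe module would never appear in the registry.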

## License

MIT
