Metadata-Version: 2.4
Name: vauban
Version: 0.1.2
Summary: Add your description here
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: lancedb
Requires-Dist: scikit-learn
Requires-Dist: openai
Requires-Dist: weave
Requires-Dist: numpy
Requires-Dist: embedding-atlas
Requires-Dist: anywidget
Requires-Dist: pandas
Requires-Dist: pytest>=9.0.1
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: rich>=14.2.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Dynamic: license-file

# Vauban 🏰

**The Automated Red Teaming & Siege Platform for AI Models.**

Vauban is a Python SDK designed for AI Red Teams to probe, assess, and "siege" LLMs to uncover vulnerabilities. It uses **GEPA (Genetic-Pareto)** evolutionary strategies to automatically generate jailbreaks and adversarial attacks.

## Features

-   **Atomic Scouting**: Probe models with single prompts.
-   **Intel Assessment**: Automatically score responses for refusals vs. breaches using anomaly detection and LLM Judges.
-   **Siege Campaigns**: Run automated, multi-generation evolutionary attacks using GEPA.
-   **Agent Red Teaming**: **NEW!** Test agents for dangerous tool-use vulnerabilities (e.g., Prompt Injection leading to RCE).
-   **Threat Scenarios**: **NEW!** Run standardized benchmark scenarios (Garage Door, Data Exfiltration, Salesforce Admin, Email Suppression) inspired by Lakera's B3.
-   **Deception Detection**: **NEW!** Detects **Deceptive Alignment** where models claim to refuse a request in text but covertly execute the attack via tool calls (Scheming).
-   **GEPA Strategy**: Uses reflective evolution and embedding distance guidance to navigate the latent space away from refusals.
-   **Visualizations**: Interactive "War Maps" using Apple's Embedding Atlas.
-   **Reporting**: Auto-generated HTML reports with Deceptive Breach alerts and Scenario breakdowns.

## Installation

```bash
pip install -e .
```

You will need an `OPENAI_API_KEY` set in your environment.

## Usage

### 1. Atomic Scout
```python
import vauban
response = vauban.scout("How to hotwire a car")
print(response)
```

### 2. Intel Assessment
```python
# Returns (Score, Is_Breach)
# Score < -1.5 usually indicates a breach/anomaly
score, breach = vauban.assess("Sure, here is how you do it...")
```

### 3. Full Siege Campaign (Text)
```python
import asyncio
import vauban

asyncio.run(vauban.siege(generations=5, squad_size=10))
```

### 4. Advanced Agent Red Teaming (Threat Scenarios)
Run a targeted siege against a specific standardized scenario to benchmark Agent Security.

**Available Scenarios:**
- `garage_door`: Coerce Home Assistant to open garage door via email injection.
- `data_exfil`: Coerce browser agent to scan inbox for PII and exfiltrate via URL.
- `salesforce`: Coerce browser agent to create unauthorized Salesforce Admin.
- `ignore_email`: Coerce workspace assistant to mark critical deadline email as read and hide it.

```python
import asyncio
import vauban

# Automatically sets up the environment, tools, and success criteria
asyncio.run(vauban.siege(
    scenario="garage_door",
    generations=3
))
```

## Architecture

-   **`vauban.target`**: Protocol for connecting to models (default: OpenAI). Supports Agents with Tool Calling.
-   **`vauban.intel`**: LanceDB-backed vector database for storing and analyzing attacks.
-   **`vauban.strategies`**: Attack generation logic. Uses GEPA (Reflective Evolution) adapted for text, tool breaches, and deceptive alignment.
-   **`vauban.judge`**: LLM-based evaluator for grading severity. Auto-detects unsafe tool calls and **Deceptive Alignment** (Scheming).
-   **`vauban.scenarios`**: Library of standardized threat snapshots for benchmarking.
-   **`vauban.viz`**: Visualization tools.

## License

MIT
