Metadata-Version: 2.4
Name: probegpt
Version: 0.4.3
Summary: Automated red-team discovery for AI models
License-Expression: MIT
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic-ai>=1.63.0
Requires-Dist: openai>=2.0.0
Requires-Dist: azure-identity>=1.15.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: textual>=8.0.0
Requires-Dist: pyyaml>=6.0
Dynamic: license-file

# probegpt

Automated red-team discovery for AI models.

probegpt iteratively generates, sends, and scores adversarial prompts ("probes") against a target language model. It uses a separate generator model to create probes and a judge model to evaluate whether each probe succeeded — no manual prompt writing required.

## Features

- **8 built-in attack strategies** — objective, mutation, encoding, roleplay, persuasion, linguistic, structural, distraction
- **Multi-provider support** — Azure OpenAI, OpenRouter, GCP Vertex AI, AWS Bedrock, Cerebras, Mistral
- **Textual TUI** — interactive terminal UI for configuration and live run monitoring
- **Extensible** — drop custom strategy files into `~/.config/probegpt/strategies/`
- **JSON export** — save all results for offline analysis

## Installation

```bash
pip install probegpt
```

Or with [uv](https://github.com/astral-sh/uv):

```bash
uv tool install probegpt
```

## Quick start

```bash
probegpt
```

The TUI will open. Navigate to **Config** first to set up your target, generator, and judge providers, then return to the main menu and select **Run Discovery**.

## Configuration

Config is stored at `~/.config/probegpt/config.json` (XDG-compliant). The TUI writes this file automatically when you save from the Config screen.

### Provider credentials

| Provider | How to authenticate |
|---|---|
| Azure OpenAI | `az login` |
| OpenRouter | Set `OPENROUTER_API_KEY` env var |
| GCP Vertex AI | `gcloud auth application-default login` |
| AWS Bedrock | `aws configure` |
| Cerebras | Set `CEREBRAS_API_KEY` env var |
| Mistral | Set `MISTRAL_API_KEY` env var |

A `.env` file in the working directory is loaded automatically.

## Strategies

| Strategy | Description |
|---|---|
| `objective` | Generate probes directly targeting the objective |
| `mutation` | Mutate high-scoring seeds from the pool |
| `encoding` | Obfuscate via base64, ROT13, leetspeak, fragmentation |
| `roleplay` | Wrap in persona, simulation, or game-context frames |
| `persuasion` | Use emotional appeals, urgency, flattery, logical fallacies |
| `linguistic` | Typos, phonetic spelling, double negatives, logic tricks |
| `structural` | Payload splitting, CoT hijack, prefix/suffix injection |
| `distraction` | Benign prefix, topic pivot, attention splitting |

Each strategy can be enabled/disabled and weighted independently from the **Strategies** tab in the Config screen.

## Custom strategies

Place a Python file in `~/.config/probegpt/strategies/`. It must define a class that subclasses `probegpt.strategies.base.BaseStrategy`:

```python
# ~/.config/probegpt/strategies/my_strategy.py
from probegpt.strategies.base import BaseStrategy

class MyStrategy(BaseStrategy):
    name = "my_strategy"

    async def generate(self, objective: str, count: int, seed_pool: list) -> list[str]:
        # Return a list of probe strings
        ...
```

The file is loaded automatically on startup.

## Architecture

```
probegpt/
├── cli/          # Textual TUI (app.py) and entry point (main.py)
├── core/         # Config, models (Probe, DiscoveryResult), discovery loop
├── strategies/   # 8 built-in strategies + custom loader
├── judge/        # ObjectiveJudge — evaluates target responses
├── grading/      # Grader — scores probes, manages seed pool
├── providers/    # Factory for all supported LLM providers
├── output/       # AbstractOutput, JsonOutput
└── data/         # YAML prompt templates and seed probes
```

## Development

```bash
git clone https://github.com/your-org/probegpt
cd probegpt
uv sync
uv run probegpt
```

Run linting:

```bash
uv run ruff check .
uv run ruff format .
```

## License

MIT
