Metadata-Version: 2.4
Name: llm_jailbreak
Version: 0.2.2
Summary: A jailbreak testing package that integrates several open-source methods
Author: Jay Woden
Author-email: wodenjay@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0.1
Requires-Dist: transformers>=4.28.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: tqdm>=4.66.1
Provides-Extra: autodan
Requires-Dist: accelerate>=0.23.0; extra == "autodan"
Requires-Dist: fschat==0.2.20; extra == "autodan"
Requires-Dist: nltk>=3.8.1; extra == "autodan"
Requires-Dist: openai>=1.12.0; extra == "autodan"
Requires-Dist: sentencepiece>=0.1.99; extra == "autodan"
Requires-Dist: protobuf>=4.24.4; extra == "autodan"
Provides-Extra: masterkey
Requires-Dist: openai==1.34.0; extra == "masterkey"
Requires-Dist: retry==0.9.2; extra == "masterkey"
Requires-Dist: loguru==0.7.2; extra == "masterkey"
Requires-Dist: httpx==0.26.0; extra == "masterkey"
Provides-Extra: all
Requires-Dist: llm_jailbreak[autodan,masterkey]; extra == "all"
Requires-Dist: openai==1.34.0; extra == "all"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# LLM Jailbreak Testing Toolkit (work in progress, not yet complete)

A Python package for testing jailbreak vulnerabilities in large language models (LLMs).

## Features

- Supports multiple LLMs (LLaMA-2, Vicuna, WizardLM, DeepSeek, etc.)
- Includes AutoDAN and MasterKey algorithms
- Configurable testing parameters
- Automatic model downloading
- Attack success rate calculation
- Modular design for easy extension
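
As a rough illustration of the attack success rate mentioned above, the sketch below computes it as the fraction of test cases judged successful. This is an assumption about the metric's general shape, not the package's actual implementation; the function name `attack_success_rate` is hypothetical, and how each outcome is judged is left to the caller.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of test cases marked as successful.

    Hypothetical helper: `outcomes` holds one boolean per test case,
    True when that case was judged successful by whatever criterion
    the evaluation uses.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    # Count the True entries and divide by the number of test cases.
    return sum(1 for ok in outcomes if ok) / len(outcomes)


if __name__ == "__main__":
    print(attack_success_rate([True, False, False, True]))
```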

## Installation

### Basic Installation

```bash
pip install llm_jailbreak
```

### Algorithm-Specific Installation

```bash
# Install only AutoDAN
pip install "llm-jailbreak[autodan]"

# Install only MasterKey
pip install "llm-jailbreak[masterkey]"

# Install all algorithms
pip install "llm-jailbreak[all]"
```

### From Source (not finished yet)

```bash
git clone https://github.com/WodenJay/llm_jailbreak.git
cd llm_jailbreak
pip install -e ".[all]"  # or ".[autodan]" / ".[masterkey]"
```

## Usage

### AutoDAN Usage

```python
from autodan import AutoDAN, AutoDANConfig

config = AutoDANConfig(
    model_name="vicuna",
    api_key="your_deepseek_key"
)
autodan = AutoDAN(config)
results = autodan.run()
```

### MasterKey Usage

```python
from masterkey import MasterKey, MasterKeyConfig

config = MasterKeyConfig(
    api_key="your_deepseek_key",
    model_name="deepseek-chat"
)
masterkey = MasterKey(config)
results = masterkey.run()
```
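
The examples above pass the API key as a string literal. One way to keep the key out of source files is to read it from an environment variable, as in this minimal sketch. The helper `load_api_key` and the variable name `DEEPSEEK_API_KEY` are assumptions made for illustration; the package itself may not read any environment variable.

```python
import os


def load_api_key(env_var: str = "DEEPSEEK_API_KEY") -> str:
    """Read an API key from the environment (hypothetical helper).

    The default variable name is an assumption, not something the
    package defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The returned value can then be passed wherever the examples use `api_key="your_deepseek_key"`, for instance `MasterKeyConfig(api_key=load_api_key(), model_name="deepseek-chat")`.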

### Command Line

```bash
# Run AutoDAN
autodan --api-key your_deepseek_key --model vicuna

# Run MasterKey
masterkey --api-key your_deepseek_key --model deepseek-chat
```

### Configuration Options

#### AutoDAN Config

- `model_name`: Name of model to test (llama2, vicuna, etc.)
- `api_key`: DeepSeek API key for prompt mutation (optional)
- `device`: CUDA device index (default: 0)
- `num_steps`: Number of optimization steps (default: 100)
- `batch_size`: Batch size for evaluation (default: 256)
- `dataset_path`: Path to harmful behaviors dataset

#### MasterKey Config

- `api_key`: DeepSeek API key (required)
- `model_name`: Model name (default: "deepseek-chat")
- `max_retries`: Maximum retry attempts (default: 3)
- `timeout`: Request timeout in seconds (default: 30)
- `temperature`: Generation temperature (default: 0.7)

See `AutoDANConfig` and `MasterKeyConfig` classes for all available options.
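
For quick reference, the documented options and defaults can be summarized as plain dataclasses. This is only a sketch of the lists above: the class names `AutoDANOptionsSketch` and `MasterKeyOptionsSketch` are hypothetical, the real `AutoDANConfig` and `MasterKeyConfig` may define more fields or different types, and the `None` default for `dataset_path` is a placeholder because no default is documented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoDANOptionsSketch:
    """Hypothetical mirror of the documented AutoDAN options."""
    model_name: str                     # e.g. "llama2" or "vicuna"
    api_key: Optional[str] = None       # optional DeepSeek API key
    device: int = 0                     # CUDA device index
    num_steps: int = 100                # number of optimization steps
    batch_size: int = 256               # batch size for evaluation
    dataset_path: Optional[str] = None  # placeholder; default not documented


@dataclass
class MasterKeyOptionsSketch:
    """Hypothetical mirror of the documented MasterKey options."""
    api_key: str                        # required DeepSeek API key
    model_name: str = "deepseek-chat"
    max_retries: int = 3                # maximum retry attempts
    timeout: float = 30                 # request timeout in seconds
    temperature: float = 0.7            # generation temperature

    def __post_init__(self) -> None:
        # The key is documented as required, so reject empty values early.
        if not self.api_key:
            raise ValueError("api_key is required")
```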

## Data Files

The package includes:

- Harmful behaviors dataset (`data/advbench/harmful_behaviors.csv`)
- Initial prompts (`assets/autodan_initial_prompt.txt`)
- Prompt templates (`assets/prompt_group.pth`)

## License

MIT

## Acknowledgements

The core code comes from [AutoDAN](https://github.com/SheltonLiu-N/AutoDAN) and [MasterKey](https://github.com/LLMSecurity/MasterKey); I only extended and packaged it.
If this raises any infringement concerns, I apologize in advance; please contact me by email and I will remove the code.
I did this because I am preparing a project and need to use this excellent code conveniently.
