Metadata-Version: 2.4
Name: oubliette-shield
Version: 0.3.2
Summary: AI LLM Firewall -- Runtime defense for LLM applications against prompt injection, jailbreak, and adversarial attacks
Author-email: Oubliette Security <info@oubliettesecurity.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/oubliettesecurity/oubliette-shield
Project-URL: Documentation, https://github.com/oubliettesecurity/oubliette-shield#readme
Project-URL: Repository, https://github.com/oubliettesecurity/oubliette-shield
Project-URL: Issues, https://github.com/oubliettesecurity/oubliette-shield/issues
Keywords: llm,security,firewall,prompt-injection,jailbreak,ai-safety,red-team,honeypot,adversarial,defense,owasp,nist,mitre-atlas,cwe,compliance,llm-firewall
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28.0
Provides-Extra: flask
Requires-Dist: flask>=2.3.0; extra == "flask"
Provides-Extra: ollama
Requires-Dist: ollama>=0.1.0; extra == "ollama"
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.40.0; extra == "anthropic"
Provides-Extra: azure
Requires-Dist: openai>=1.0.0; extra == "azure"
Provides-Extra: bedrock
Requires-Dist: boto3>=1.34.0; extra == "bedrock"
Provides-Extra: vertex
Requires-Dist: google-cloud-aiplatform>=1.40.0; extra == "vertex"
Provides-Extra: gemini
Requires-Dist: google-generativeai>=0.8.0; extra == "gemini"
Provides-Extra: llamacpp
Requires-Dist: llama-cpp-python>=0.2.0; extra == "llamacpp"
Provides-Extra: transformers
Requires-Dist: transformers>=4.35.0; extra == "transformers"
Requires-Dist: torch>=2.0.0; extra == "transformers"
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.1.0; extra == "langchain"
Provides-Extra: llamaindex
Requires-Dist: llama-index-core>=0.10.0; extra == "llamaindex"
Provides-Extra: fastapi
Requires-Dist: fastapi>=0.100.0; extra == "fastapi"
Requires-Dist: uvicorn>=0.20.0; extra == "fastapi"
Provides-Extra: webhooks
Requires-Dist: requests>=2.28.0; extra == "webhooks"
Provides-Extra: allama
Requires-Dist: requests>=2.28.0; extra == "allama"
Provides-Extra: all
Requires-Dist: flask>=2.3.0; extra == "all"
Requires-Dist: ollama>=0.1.0; extra == "all"
Requires-Dist: openai>=1.0.0; extra == "all"
Requires-Dist: anthropic>=0.40.0; extra == "all"
Requires-Dist: google-generativeai>=0.8.0; extra == "all"
Requires-Dist: llama-cpp-python>=0.2.0; extra == "all"
Requires-Dist: transformers>=4.35.0; extra == "all"
Requires-Dist: torch>=2.0.0; extra == "all"
Requires-Dist: langchain-core>=0.1.0; extra == "all"
Requires-Dist: llama-index-core>=0.10.0; extra == "all"
Requires-Dist: fastapi>=0.100.0; extra == "all"
Requires-Dist: uvicorn>=0.20.0; extra == "all"
Provides-Extra: pyrit
Requires-Dist: pyrit-core>=0.11.0; extra == "pyrit"
Provides-Extra: deepteam
Requires-Dist: deepteam>=1.0.0; extra == "deepteam"
Provides-Extra: redteam
Requires-Dist: pyrit-core>=0.11.0; extra == "redteam"
Requires-Dist: deepteam>=1.0.0; extra == "redteam"
Provides-Extra: dotenv
Requires-Dist: python-dotenv>=1.0.0; extra == "dotenv"
Provides-Extra: drift
Requires-Dist: numpy>=1.24.0; extra == "drift"
Requires-Dist: scipy>=1.10.0; extra == "drift"
Provides-Extra: loadtest
Requires-Dist: locust>=2.20.0; extra == "loadtest"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: python-dotenv>=1.0.0; extra == "dev"
Requires-Dist: numpy>=1.24.0; extra == "dev"
Requires-Dist: scipy>=1.10.0; extra == "dev"
Requires-Dist: locust>=2.20.0; extra == "dev"
Dynamic: license-file

# Oubliette Security Platform

**AI security honeypot and deception platform for detecting prompt injection attacks.**

![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)
![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue)
![Detection Rate](https://img.shields.io/badge/detection_rate-85--90%25-brightgreen)
![ML F1](https://img.shields.io/badge/ML_F1-0.98-brightgreen)

---

## Overview

Oubliette is a defensive AI security platform built to detect, classify, and respond to prompt injection attacks against large language models. Rather than simply blocking malicious input, Oubliette deploys deception techniques -- serving decoy responses and honey tokens to attackers while logging forensic evidence.

Built by a disabled veteran-owned business specializing in cyber deception, AI security, and red teaming.

## Architecture

```
                         Oubliette Security Platform
 +----------------------------------------------------------------------+
 |                                                                      |
 |  User Input                                                          |
 |      |                                                               |
 |      v                                                               |
 |  [1. Input Sanitization] -----> 9 sanitization rules                 |
 |      |                                                               |
 |      v                                                               |
 |  [2. Pre-Filter Rules]  -----> 7 blocking rules (~10ms)              |
 |      |   |                                                           |
 |      |   +---> BLOCKED: decoy response + honey token                 |
 |      v                                                               |
 |  [3. ML Classifier]     -----> LogisticRegression + TF-IDF (~2ms)    |
 |      |   |                                                           |
 |      |   +---> score > 0.85: blocked                                 |
 |      |   +---> score < 0.30: allowed                                 |
 |      v                                                               |
 |  [4. LLM Judge]         -----> Ollama llama3 (~15s)                  |
 |      |                                                               |
 |      v                                                               |
 |  Safe: LLM response    Unsafe: decoy + honey token + log             |
 |                                                                      |
 +----------------------------------------------------------------------+
         |                                    |
    Flask :5000                         FastAPI :8000
   (Honeypot Engine)                (Anomaly Detection API)
```
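
The control flow in the diagram can be sketched as a single function that stops at the first tier able to decide. This is a minimal illustration, not the project's implementation: `run_pipeline` and its four callable parameters are hypothetical names, and only the 0.85 and 0.30 thresholds come from the diagram.

```python
from typing import Callable

BLOCK_THRESHOLD = 0.85  # ML score above this is blocked (from the diagram)
ALLOW_THRESHOLD = 0.30  # ML score below this is allowed (from the diagram)


def run_pipeline(
    text: str,
    sanitize: Callable[[str], str],
    prefilter_blocks: Callable[[str], bool],
    ml_score: Callable[[str], float],
    llm_judge_is_safe: Callable[[str], bool],
) -> str:
    """Return "allow" or "block", stopping at the first tier that decides.

    All four callables are hypothetical stand-ins for the real tiers.
    """
    cleaned = sanitize(text)               # Tier 1: input sanitization
    if prefilter_blocks(cleaned):          # Tier 2: pre-filter rules
        return "block"
    score = ml_score(cleaned)              # Tier 3: ML classifier
    if score > BLOCK_THRESHOLD:
        return "block"
    if score < ALLOW_THRESHOLD:
        return "allow"
    # Tier 4: only ambiguous scores reach the slow LLM judge
    return "allow" if llm_judge_is_safe(cleaned) else "block"
```

Ordering the tiers from cheapest to most expensive means the ~15s LLM call is only paid for inputs the faster tiers cannot settle.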

## Key Features

**Detection**
- 4-tier ensemble defense: sanitization, pre-filter rules, ML classifier, LLM judge
- Multi-turn attack tracking across conversation sessions
- Jailbreak-specific defenses (roleplay, DAN, hypothetical framing, logic traps)
- ML classifier: F1=0.98, AUC=0.99, 1.9ms inference on 733 features

**Deception**
- Honey token injection into decoy responses
- Realistic decoy answers that waste attacker time
- Forensic logging of all attack sessions
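
As a rough sketch of how honey tokens can work, the snippet below plants a unique, credential-looking string in a decoy response and later checks whether that string reappears in other traffic. The token format, the decoy wording, and every function name here are assumptions made for illustration, not the project's actual scheme.

```python
import secrets


def make_honey_token(prefix: str = "sk-decoy") -> str:
    """Create a unique, credential-looking marker (hypothetical format)."""
    return f"{prefix}-{secrets.token_hex(16)}"


def build_decoy_response(session_id: str, issued: dict[str, str]) -> str:
    """Return a decoy answer and remember which session received the token."""
    token = make_honey_token()
    issued[token] = session_id
    # Hypothetical decoy text; a real decoy would be tailored to the attack.
    return f"Internal configuration loaded. API key: {token}"


def find_leaked_sessions(text: str, issued: dict[str, str]) -> list[str]:
    """Return the sessions whose honey tokens appear in the given text."""
    return [session for token, session in issued.items() if token in text]
```

Because each token is issued to exactly one session, any later sighting of it ties the activity back to the session that was served the decoy.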

**Red Teaming**
- 50 YAML attack scenarios mapped to MITRE ATLAS and OWASP LLM Top 10
- Automated attack execution and success/failure evaluation
- Scenario coverage: prompt injection, jailbreaking, context switching, nested injection, roleplay, DAN variants, logic traps, multi-turn attacks

**Training**
- 11 progressive CTF challenges for prompt injection training
- Challenges cover: basic injection, defense bypass, code interpreter exploitation, RAG exploitation, agent abuse, multi-modal attacks
- Full Docker environment with Open WebUI, Ollama, Jupyter

## Quick Start

### Prerequisites

- Python 3.10+
- [Ollama](https://ollama.com/) with the `llama3` model pulled
- (Optional) Docker and Docker Compose for containerized deployment

### Local Setup

```bash
# Clone the repository
git clone https://github.com/oubliettesecurity/oubliette.git
cd oubliette

# Install dependencies
pip install flask requests pyyaml scikit-learn

# Pull the LLM model
ollama pull llama3

# Start the Oubliette honeypot
python oubliette_security.py

# (Optional) Start the anomaly detection API in a second terminal
cd anomaly-detection
pip install -r config/requirements_api.txt
python api/anomaly_api.py
```

The honeypot server runs on `http://localhost:5000` and the anomaly detection API on `http://localhost:8000`.

### Docker Setup

```bash
docker-compose up --build
```

This starts both the Oubliette honeypot (port 5000) and the anomaly detection API (port 8000).

## Project Structure

```
oubliette/
|-- oubliette_security.py              # Honeypot engine (Flask, 1325 lines)
|-- redteam_engine.py                 # AI red team engine (869 lines)
|-- redteam_results_db.py             # Red team results database
|-- AI_RED_TEAMING_ATTACK_SCENARIOS.yaml  # 50 attack scenarios
|-- docker-compose.yml                # Container orchestration
|-- Dockerfile.oubliette              # Honeypot container
|
|-- anomaly-detection/                # ML anomaly detection system
|   |-- core/                         #   ML models and training pipeline
|   |   |-- chat_anomaly_detector.py  #     Chat injection classifier
|   |   |-- chat_feature_pipeline.py  #     Feature extraction (TF-IDF + structural)
|   |   |-- chat_training_data.json   #     1365 labeled samples
|   |   +-- train_chat_classifier.py  #     Model training script
|   |-- api/                          #   FastAPI REST server
|   |   +-- anomaly_api.py            #     Detection endpoint
|   |-- mcp/                          #   Model Context Protocol server
|   |-- chronicle/                    #   Google Chronicle SIEM integration
|   |-- batch/                        #   Batch log processing
|   |-- docker/                       #   Anomaly detection container
|   |-- tests/                        #   Test data and ML tests
|   +-- config/                       #   Requirements files
|
|-- AI-CTF/                           # Prompt injection CTF platform
|   |-- openwebui/                    #   Challenge definitions
|   |   |-- functions/                #     Defense filter functions
|   |   |-- knowledge/                #     RAG knowledge base
|   |   |-- pipelines/                #     Custom LLM pipelines
|   |   +-- tools/                    #     Agent tools
|   |-- docker-compose.yaml           #   CTF infrastructure
|   +-- setup.sh                      #   Automated setup
|
|-- tests/                            # Integration tests
|   +-- test_integration.py
|-- test_*.py                         # Unit tests (6 test modules)
+-- quick_*.py                        # Quick validation scripts
```

## Components

### Oubliette Security (`oubliette_security.py`)

The core honeypot engine: a Flask server that intercepts chat messages and runs them through the 4-tier detection pipeline. It responds normally to benign input and deploys deception (decoy responses plus honey tokens) when it detects an attack. Session state is tracked across conversation turns to catch multi-turn attack sequences.

**Endpoints:**
- `POST /api/chat` -- Chat interface (attack surface)
- `GET /api/health` -- Health check
- `GET /api/session/<id>` -- Session state inspection
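
A client for the chat endpoint might look like the sketch below, which uses only the standard library. The JSON field names `message` and `session_id` are assumptions, since the request schema is not documented here; check `oubliette_security.py` for the real contract.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default honeypot address from Quick Start


def build_chat_request(message: str, session_id: str) -> urllib.request.Request:
    """Build a POST request for /api/chat without sending it."""
    # ASSUMPTION: the field names below are illustrative, not confirmed.
    payload = json.dumps({"message": message, "session_id": session_id})
    return urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `send(build_chat_request("Hello", "demo-session"))` would return the response body. The generous default timeout leaves room for requests that escalate to the LLM judge.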

### Anomaly Detection (`anomaly-detection/`)

A modular ML pipeline for log and chat anomaly detection. The chat injection classifier uses LogisticRegression with TF-IDF and structural/keyword/pattern features (733 dimensions) trained on 1365 labeled samples. Exposed via a FastAPI REST API and optionally through an MCP server for AI platform integration.

**Integrations:** Google Chronicle SIEM, Splunk, Elasticsearch, Slack, Kafka

### Red Team Framework (`redteam_engine.py`)

An automated AI attack testing engine. It loads 50 YAML-defined attack scenarios (ATK-001 through ATK-050), executes them against a target LLM endpoint, and uses pattern matching to evaluate whether each attack succeeded. Each scenario is mapped to MITRE ATLAS techniques and OWASP LLM Top 10 categories.
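
Pattern-matching evaluation of this kind can be sketched as follows. The `Scenario` fields, the `ATK-XXX` placeholder ID, and the idea of passing the target as a `send_prompt` callable are all assumptions for illustration; the real YAML schema in `AI_RED_TEAMING_ATTACK_SCENARIOS.yaml` may differ.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Scenario:
    """Hypothetical in-memory form of one attack scenario."""
    id: str
    prompt: str
    success_patterns: list[str] = field(default_factory=list)


def attack_succeeded(scenario: Scenario, response: str) -> bool:
    """An attack counts as successful if any success pattern matches."""
    return any(
        re.search(pattern, response, re.IGNORECASE)
        for pattern in scenario.success_patterns
    )


def run_scenarios(
    scenarios: Iterable[Scenario],
    send_prompt: Callable[[str], str],
) -> dict[str, bool]:
    """Send each prompt to the target and record whether the attack worked."""
    return {s.id: attack_succeeded(s, send_prompt(s.prompt)) for s in scenarios}
```

Passing the target as a callable keeps the evaluation logic testable without a running server.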

**Attack categories:** prompt injection, jailbreaking, context switching, nested injection, roleplay, hypothetical framing, DAN variants, logic traps, multi-turn escalation

### AI-CTF (`AI-CTF/`)

A Docker-based Capture The Flag platform with 11 progressive prompt injection challenges built on Open WebUI and Ollama. Challenges range from basic prompt injection to advanced techniques including RAG exploitation, code interpreter abuse, agent tool manipulation, and multi-modal attacks.

**Services:** Open WebUI (:4242), Ollama (:11434), Pipelines (:9099), Jupyter (:8888)

## Detection Pipeline

The 4-tier ensemble processes every incoming chat message:

| Tier | Component | Latency | Action |
|------|-----------|---------|--------|
| 1 | Input Sanitization | <1ms | Neutralizes 9 attack patterns (encoding tricks, special chars, markdown injection, etc.) |
| 2 | Pre-Filter Rules | ~10ms | Blocks obvious attacks via 7 pattern-matching rules. Catches system prompt extraction, instruction override, encoding attacks, jailbreak patterns |
| 3 | ML Classifier | ~2ms | LogisticRegression scores input from 0.0 to 1.0. Above 0.85 = blocked, below 0.30 = allowed, anything in between = escalated to the LLM judge |
| 4 | LLM Judge | ~15s | Ollama llama3 evaluates the ambiguous inputs escalated from tier 3. Verdict extraction parses a safe/unsafe verdict out of conversational model output |
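
To make tiers 1 and 2 concrete, here is a small sketch of one sanitization pass followed by regex-based pre-filter rules. The two rules shown are illustrative examples of the categories named in the table, not the project's actual 9 sanitization rules or 7 blocking rules, and all names are hypothetical.

```python
import re
import unicodedata

# Illustrative rules only; the real pre-filter defines its own patterns.
PREFILTER_RULES = {
    "instruction_override": re.compile(
        r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE
    ),
    "system_prompt_extraction": re.compile(
        r"(reveal|show|print)\s+(me\s+)?(your\s+)?system\s+prompt", re.IGNORECASE
    ),
}


def sanitize(text: str) -> str:
    """Normalize look-alike characters and strip invisible ones."""
    # NFKC folds compatibility forms (e.g. fullwidth letters) to plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Drop control (Cc) and format (Cf) characters, keeping newline and tab.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Collapse runs of spaces and tabs so patterns match reliably.
    return re.sub(r"[ \t]+", " ", text).strip()


def prefilter(text: str) -> list[str]:
    """Return the names of every rule the sanitized text triggers."""
    return [name for name, rule in PREFILTER_RULES.items() if rule.search(text)]
```

Sanitizing first matters because an attacker can otherwise split a trigger phrase with zero-width characters and slip past the pattern rules.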

Additional layers:
- **Multi-turn tracking:** Accumulates risk across conversation turns. Escalation thresholds trigger on repeated attack patterns.
- **Jailbreak-specific rules:** Dedicated detection for roleplay jailbreaks (scenario ATK-006, 89.6% attack success rate), DAN variants, hypothetical framing, and logic traps.
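
One way to accumulate risk across turns is a decayed running total per session, sketched below. The class name, the decay factor, and the escalation threshold are hypothetical values chosen for illustration and are not taken from the project.

```python
from collections import defaultdict


class SessionRiskTracker:
    """Keep a decayed running risk total per session (illustrative only)."""

    def __init__(self, escalation_threshold: float = 1.5, decay: float = 0.8):
        # ASSUMPTION: both defaults are made-up values for this sketch.
        self.escalation_threshold = escalation_threshold
        self.decay = decay
        self._risk = defaultdict(float)

    def record_turn(self, session_id: str, turn_score: float) -> bool:
        """Add this turn's score to the decayed total; True means escalate."""
        total = self._risk[session_id] * self.decay + turn_score
        self._risk[session_id] = total
        return total >= self.escalation_threshold
```

With these defaults, four consecutive turns that each score 0.6 (individually in the ambiguous band) cross the threshold on the fourth turn, while a single 0.6 turn in a fresh session does not.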

## Performance Metrics

| Metric | Value |
|--------|-------|
| Detection rate | 85-90% (up from 10% baseline) |
| ML F1 score | 0.98 |
| ML AUC-ROC | 0.99 |
| ML cross-validation F1 | 0.986 (mean) |
| ML inference time | 1.9ms average |
| False positive rate | 0% on test set (TN=111, FP=0) |
| Training samples | 1365 (553 benign, 812 malicious) |
| Feature dimensions | 733 |
| Pre-filter latency | ~10ms |
| Full pipeline with LLM | ~15s |
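
The false positive rate in the table follows from the confusion-matrix counts as FP / (FP + TN). The helper below reproduces that arithmetic. Because the rate is measured on 111 benign test samples, it also computes a one-sided 95% upper bound for the zero-error case as extra context. Both function names are hypothetical.

```python
def false_positive_rate(tn: int, fp: int) -> float:
    """FPR = FP / (FP + TN): the share of benign inputs wrongly flagged."""
    benign_total = fp + tn
    if benign_total == 0:
        raise ValueError("need at least one benign sample")
    return fp / benign_total


def zero_error_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Upper bound on the true error rate after 0 errors in n trials.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


fpr = false_positive_rate(tn=111, fp=0)   # counts from the table above
bound = zero_error_upper_bound(111)
print(f"FPR on test set: {fpr:.1%}, 95% upper bound: {bound:.1%}")
```

On these counts the measured rate is 0%, and the bound comes out at roughly 2.7%, so a larger benign test set would be needed to claim a tighter figure.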

## Testing

```bash
# Run all unit tests
pytest test_*.py -v

# Run integration tests (requires running server)
pytest tests/test_integration.py -v

# Run ML classifier tests
pytest anomaly-detection/tests/test_chat_classifier.py -v

# Quick validation (no server required)
python quick_validation_test.py

# Red team attack simulation (requires running server)
python redteam_engine.py
```

## Docker

```bash
# Start the full platform (honeypot + anomaly API)
docker-compose up --build

# Start the CTF environment
cd AI-CTF
docker-compose -f docker-compose.yaml up --build
```

| Service | Port | Description |
|---------|------|-------------|
| Oubliette Security | 5000 | Honeypot engine |
| Anomaly Detection API | 8000 | ML classification endpoint |
| Open WebUI (CTF) | 4242 | CTF challenge interface |
| Ollama (CTF) | 11434 | LLM backend for CTF |
| Pipelines (CTF) | 9099 | Custom LLM pipelines |
| Jupyter (CTF) | 8888 | Notebook environment |

## Disclaimer

This software is a **security research and training tool**. It is designed for authorized security testing, defensive research, CTF competitions, and educational purposes only. Use only on systems you own or have explicit written authorization to test. The authors accept no liability for misuse.

## License

[Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
