Metadata-Version: 2.4
Name: redsafe
Version: 0.1.2
Summary: Local AI-safe redaction engine for security data
Author: AI Safe Redaction Engine
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: spacy
Requires-Dist: pytesseract
Requires-Dist: opencv-python
Requires-Dist: lxml
Requires-Dist: numpy
Requires-Dist: scikit-learn

# AI Safe Redaction Engine

A local-first, modular redaction system that sanitizes security data before it is passed to AI tools.

## Single-Word CLI Name
This project is packaged as:
- Command: `redsafe`
- Package name: `redsafe`

PyPI:
- https://pypi.org/project/redsafe/

## Supported Inputs
- Burp Suite XML exports
- Burp project/session files (`.burp`) via best-effort binary HTTP extraction
- Raw HTTP requests/responses
- Network logs
- PCAP captures (basic parsing)
- Screenshots (OCR + masking)

## Features
- Sensitive data detection that combines several analysis techniques rather than relying on regex alone
- Named Entity Recognition (spaCy)
- Secret detection with heuristics and entropy scoring
- Context-aware detection from header/parameter names
- Consistent placeholder mapping (`<EMAIL_1>`, `<JWT_TOKEN_1>`, etc.)
- Local-only processing, no external API calls
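
To illustrate what consistent placeholder mapping means in practice, here is a minimal sketch. It is not the code in `redaction/`; `PlaceholderMapper` and `redact` are hypothetical names, and the findings are assumed to arrive as `(kind, value)` pairs from some detector. The idea is that the same value always receives the same placeholder, so relationships in the data survive redaction.

```python
from collections import defaultdict


class PlaceholderMapper:
    """Hypothetical helper: gives each distinct value a stable placeholder."""

    def __init__(self) -> None:
        self._counters: dict[str, int] = defaultdict(int)
        self._assigned: dict[tuple[str, str], str] = {}

    def placeholder_for(self, kind: str, value: str) -> str:
        key = (kind, value)
        if key not in self._assigned:
            # First time this value is seen: allocate the next index for its kind.
            self._counters[kind] += 1
            self._assigned[key] = f"<{kind}_{self._counters[kind]}>"
        return self._assigned[key]


def redact(text: str, findings: list[tuple[str, str]], mapper: PlaceholderMapper) -> str:
    # Assign placeholders in detection order so the numbering is predictable.
    pairs = [(value, mapper.placeholder_for(kind, value)) for kind, value in findings]
    # Replace longer values first so a value contained in a longer one is not clobbered.
    for value, placeholder in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(value, placeholder)
    return text


if __name__ == "__main__":
    mapper = PlaceholderMapper()
    sample = "From: alice@example.com To: bob@example.com Cc: alice@example.com"
    found = [("EMAIL", "alice@example.com"), ("EMAIL", "bob@example.com")]
    print(redact(sample, found, mapper))
```

Reusing one mapper across several inputs keeps placeholders aligned between files, which is one plausible way to preserve cross-references in a sanitized data set.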

## Project Structure
- `parsers/`: input ingestion modules
- `detection/`: entity, secret, entropy, and context detectors
- `redaction/`: placeholder and redaction logic
- `core/`: data models and orchestration pipeline
- `utils/`: file/encoding helpers
- `tests/`: sample files + pytest coverage

## Install With pipx (PyPI)
```bash
pipx install redsafe
```

Run (the sample files live in the repository's `tests/` directory):
```bash
redsafe --input tests/sample_burp.xml --type burp
redsafe --input tests/sample_http.txt --type http
redsafe --input tests/sample_log.txt --type log
redsafe --input tests/sample_image.png --type image
```

Upgrade:
```bash
pipx upgrade redsafe
```

## Install From GitHub (Latest Source)
```bash
pipx install git+https://github.com/sam1101-sys/ai-redaction-engine.git
```

## Install With venv (Alternative)
```bash
git clone https://github.com/sam1101-sys/ai-redaction-engine.git
cd ai-redaction-engine
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

Run examples:
```bash
python main.py --input tests/sample_burp.xml --type burp
python main.py --input tests/sample_http.txt --type http
python main.py --input tests/sample_log.txt --type log
python main.py --input tests/sample_image.png --type image
```

Outputs are written to `sanitized_output/`.

## Redaction Tuning (False Positive Control)
Secret entropy detection can be tuned via environment variables:

```bash
export REDACTION_ENTROPY_THRESHOLD=4.2
export REDACTION_MIN_SECRET_LEN=24
export REDACTION_MIN_BASE64_LEN=28
export REDACTION_IGNORE_VALUES="application/x-www-form-urlencoded,text/plain"
```

These values are consumed by `SecretDetectionConfig` in `detection/secret_detection.py`.
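
The sketch below shows how such settings might interact with an entropy check. It is not the real `SecretDetectionConfig`: `TuningSketch`, its field names, and its fallback values (borrowed from the example exports above) are assumptions for illustration, and the Base64 length setting is left out for brevity.

```python
import math
import os
from collections import Counter
from dataclasses import dataclass


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


@dataclass
class TuningSketch:
    # Hypothetical stand-in for the project's config class.
    # Fallback values are assumptions taken from the example exports.
    entropy_threshold: float = 4.2
    min_secret_len: int = 24
    ignore_values: frozenset[str] = frozenset()

    @classmethod
    def from_env(cls, env=None) -> "TuningSketch":
        env = os.environ if env is None else env
        raw_ignore = env.get("REDACTION_IGNORE_VALUES", "")
        return cls(
            entropy_threshold=float(env.get("REDACTION_ENTROPY_THRESHOLD", "4.2")),
            min_secret_len=int(env.get("REDACTION_MIN_SECRET_LEN", "24")),
            ignore_values=frozenset(v.strip() for v in raw_ignore.split(",") if v.strip()),
        )

    def looks_like_secret(self, value: str) -> bool:
        # Skip explicitly ignored values and anything too short to be a token.
        if value in self.ignore_values or len(value) < self.min_secret_len:
            return False
        return shannon_entropy(value) >= self.entropy_threshold


if __name__ == "__main__":
    config = TuningSketch.from_env()
    for candidate in ("text/plain", "dGhpcy1pcy1hLXNhbXBsZS10b2tlbi12YWx1ZQ"):
        print(candidate, config.looks_like_secret(candidate))
```

In a model like this, raising the threshold or the minimum length reduces false positives at the cost of missing shorter or more repetitive secrets, while the ignore list suppresses known-benign values such as content types.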

## Run Tests
```bash
pytest -q
```

## Notes
- Designed for integration into a future AI pentesting engine.
- All processing is local.
- If `en_core_web_sm` is unavailable, regex/heuristic detection still works.
- Image redaction requires the `opencv-python` and `pytesseract` Python packages plus the system `tesseract` binary to be installed locally.
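
Before running image redaction, it can help to confirm that these prerequisites are present. The following is a small sketch using only the standard library; `missing_image_prerequisites` is a hypothetical helper, not part of `redsafe`.

```python
import importlib.util
import shutil


def missing_image_prerequisites() -> list[str]:
    """Return the names of image-redaction prerequisites that cannot be found."""
    missing = []
    # The opencv-python package is imported under the module name "cv2".
    for module in ("cv2", "pytesseract"):
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    # pytesseract is a wrapper; it needs the tesseract executable on PATH.
    if shutil.which("tesseract") is None:
        missing.append("tesseract (system binary)")
    return missing


if __name__ == "__main__":
    problems = missing_image_prerequisites()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Image redaction prerequisites look available.")
```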
