Metadata-Version: 2.4
Name: sura-rag
Version: 0.1.0
Summary: Verified data deletion and leak detection for RAG systems
Project-URL: Homepage, https://github.com/SURA-RAG/sura-rag
Project-URL: Repository, https://github.com/SURA-RAG/sura-rag
Project-URL: Issues, https://github.com/SURA-RAG/sura-rag/issues
Project-URL: Changelog, https://github.com/SURA-RAG/sura-rag/blob/main/CHANGELOG.md
Author-email: Aditya Saxena <madityasaxena@gmail.com>
License: MIT License
        
        Copyright (c) 2024 Aditya Saxena
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: chromadb,compliance,gdpr,langchain,leak-detection,llamaindex,llm-safety,machine-unlearning,rag,right-to-be-forgotten
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security
Requires-Python: >=3.10
Requires-Dist: chromadb>=0.5.0
Requires-Dist: faiss-cpu>=1.8.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: ollama>=0.2.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: qdrant-client>=1.9.0
Requires-Dist: reportlab>=4.2.0
Requires-Dist: rich>=13.0.0
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: typer>=0.12.0
Provides-Extra: all
Requires-Dist: huggingface-hub>=0.23.0; extra == 'all'
Requires-Dist: langchain-community>=0.2.0; extra == 'all'
Requires-Dist: langchain>=0.2.0; extra == 'all'
Requires-Dist: llama-index>=0.10.0; extra == 'all'
Requires-Dist: pandas>=2.0.0; extra == 'all'
Requires-Dist: sentence-transformers>=3.0.0; extra == 'all'
Requires-Dist: transformers>=4.40.0; extra == 'all'
Provides-Extra: cpu
Requires-Dist: sentence-transformers>=3.0.0; extra == 'cpu'
Requires-Dist: torch>=2.2.0; extra == 'cpu'
Provides-Extra: cuda
Requires-Dist: sentence-transformers>=3.0.0; extra == 'cuda'
Provides-Extra: dev
Requires-Dist: httpx>=0.27.0; extra == 'dev'
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: python-dotenv>=1.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Provides-Extra: hf
Requires-Dist: accelerate>=0.30.0; extra == 'hf'
Requires-Dist: huggingface-hub>=0.23.0; extra == 'hf'
Requires-Dist: transformers>=4.40.0; extra == 'hf'
Provides-Extra: langchain
Requires-Dist: langchain-community>=0.2.0; extra == 'langchain'
Requires-Dist: langchain>=0.2.0; extra == 'langchain'
Provides-Extra: llamaindex
Requires-Dist: llama-index>=0.10.0; extra == 'llamaindex'
Provides-Extra: pandas
Requires-Dist: pandas>=2.0.0; extra == 'pandas'
Description-Content-Type: text/markdown

# sura-rag

[![PyPI version](https://img.shields.io/pypi/v/sura-rag.svg)](https://pypi.org/project/sura-rag/)
[![Python versions](https://img.shields.io/pypi/pyversions/sura-rag.svg)](https://pypi.org/project/sura-rag/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/SURA-RAG/sura-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/SURA-RAG/sura-rag/actions/workflows/ci.yml)

**Verified data deletion and runtime leak detection for RAG systems.** GDPR Article 17 compliant forget pipeline with multi-strategy leak probing, runtime guardrailing, and signed compliance certificates. 100% local, zero cloud API required.

## The Problem

RAG systems retrieve and present data from vector stores, but when a user exercises their GDPR Article 17 "right to be forgotten," simply deleting a document from the vector store is not enough. The LLM may have memorized fragments during retrieval, cached chunks may persist, and there is no way to verify that the data is truly gone. **sura-rag** closes this gap by providing a complete forget pipeline: delete → probe → guardrail → certify.

## Quick Install

```bash
# Core (Ollama-based, no GPU required)
pip install sura-rag

# With CPU embeddings (sentence-transformers)
pip install sura-rag[cpu]

# With CUDA support (pre-install CUDA torch first)
pip install sura-rag[cuda]

# With framework connectors
pip install sura-rag[langchain]
pip install sura-rag[llamaindex]

# Everything
pip install sura-rag[all]
```

## 30-Second Quickstart

```python
import sura_rag as sr

# Connect to your vector store
client = sr.SuraClient(
    vector_store=sr.adapters.ChromaDBAdapter("my_collection"),
    config=sr.SuraConfig(generator_model="llama3.2:3b"),
)

# Forget a document (GDPR Article 17)
result = client.forget(
    doc_ids=["doc_001"],
    subject="John Smith salary records",
    requestor_id="user_4821",
    regulation="GDPR_Art17",
)

print(f"Score: {result.forget_score.composite_score}")  # 0.0–1.0
print(f"Status: {result.status}")                       # "completed"
print(f"Certificate: {result.certificate_id}")          # UUID
```

## Features

| Feature | Phase 1 (v0.1) | Phase 2 (planned) |
|---------|:-:|:-:|
| Vector store deletion | ✅ | ✅ |
| Fingerprint registry | ✅ | ✅ |
| Direct entity probes | ✅ | ✅ |
| Paraphrase probes | ✅ | ✅ |
| Contextual probes | ✅ | ✅ |
| Adversarial probes | ✅ | ✅ |
| Runtime guardrail (4 modes) | ✅ | ✅ |
| Audit logging (SQLite/Postgres) | ✅ | ✅ |
| PDF compliance certificates | ✅ | ✅ |
| LangChain connector | ✅ | ✅ |
| LlamaIndex connector | ✅ | ✅ |
| Parametric unlearning (LoRA) | — | ✅ |
| TOFU benchmark evaluation | — | ✅ |
| Multi-GPU training | — | ✅ |

## Architecture

SURA-RAG follows a pipeline architecture: **Delete → Probe → Guardrail → Certify**. Documents are deleted from the vector store, their fingerprints are stored for runtime monitoring, multi-strategy probes verify the deletion, and a compliance certificate is generated. The runtime guardrail continuously scans all RAG responses against the fingerprint registry to catch any residual leakage.

## Compatibility

| Component | Supported |
|-----------|-----------|
| ChromaDB | ✅ ≥0.5.0 |
| Qdrant | ✅ ≥1.9.0 |
| FAISS | ✅ ≥1.8.0 (soft-delete) |
| LangChain | ✅ ≥0.2.0 |
| LlamaIndex | ✅ ≥0.10.0 |
| Ollama | ✅ ≥0.2.0 |
| HuggingFace | ✅ ≥4.40.0 |
| PyTorch | ✅ ≥2.2.0 |
| Pandas | ✅ ≥2.0.0 |
| Python 3.10 | ✅ |
| Python 3.11 | ✅ |
| Python 3.12 | ✅ |
| Windows | ✅ |
| Linux | ✅ |
| macOS | ✅ |

## Local Setup

### 1. Install Ollama

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows — download from https://ollama.com
```

### 2. Pull models

```bash
ollama pull llama3.2:3b
ollama pull nomic-embed-text
```

### 3. Start Ollama

```bash
ollama serve
```

### 4. Install sura-rag

```bash
pip install sura-rag
# or for development:
git clone https://github.com/SURA-RAG/sura-rag.git
cd sura-rag
pip install -e ".[dev,cpu]"
```

### 5. Run tests

```bash
pytest tests/unit/ -v
```

## Environment Setup

Copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
```

The `.env` file is in `.gitignore` and will never be committed. For Phase 1 (Ollama-based), no tokens are required. See `.env.example` for all available settings.

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Run tests: `pytest tests/unit/ -v`
4. Run linting: `ruff check sura_rag/`
5. Submit a pull request

## License

MIT License. See [LICENSE](LICENSE) for details.

## Citation

If you use sura-rag in academic research, please cite:

```bibtex
@software{sura_rag_2024,
  title = {sura-rag: Verified Data Deletion and Leak Detection for RAG Systems},
  author = {Saxena, Aditya},
  year = {2024},
  url = {https://github.com/SURA-RAG/sura-rag},
  license = {MIT},
}
```
