Metadata-Version: 2.4
Name: rag-llm-api-pipeline
Version: 0.7.0
Summary: Multimodal RAG pipeline for low-compute, local, real-world deployment
Home-page: https://github.com/pkbythebay29/ot-rag-llm-api
Author: pkbythebay29
Author-email: kannan@haztechrisk.org
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.2
Requires-Dist: transformers>=4.42.3
Requires-Dist: accelerate>=0.30.1
Requires-Dist: sentence-transformers>=2.7
Requires-Dist: faiss-cpu>=1.8.0
Requires-Dist: fastapi>=0.111
Requires-Dist: uvicorn[standard]>=0.30
Requires-Dist: pydantic>=2.7
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: pytesseract>=0.3.10
Requires-Dist: pillow>=10.3
Requires-Dist: moviepy>=1.0.3
Requires-Dist: pymupdf>=1.24.5
Requires-Dist: openai-harmony>=0.0.3
Requires-Dist: tqdm>=4.66
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# RAG-LLM-API-Pipeline

A fully  local GPU poor, multimodal Retrieval-Augmented Generation (RAG) system powered by open-source local LLMs. This pipeline is designed for operational technology environments to provide AI-assisted access to technical knowledge, manuals, and historical data — securely and offline, at min cost.

---

✅ Key Features

    🔍 Retrieval-Augmented Generation (RAG) using FAISS + SentenceTransformers

    🧠 Flexible LLM Integration with support for:
	- Open-source HuggingFace models (Qwen, Mistral, etc.)
	- Mixed precision support: fp32, fp16, bfloat16
	- Dynamic model/device/precision switching via YAML

    🔧 1-line YAML configuration to control:
    - System-specific documents
    - Embedding & generation models
    - GPU/CPU inference toggle
    - Index rebuilding, token limits, chunking
		
    📂 Multimodal Input Support:
    - PDFs
    - Plain text
    - Images (OCR via Tesseract)
    - Audio (.wav)
    - Video (.mp4)
		
    💻 Multiple Interfaces:
   - CLI (rag-cli) for single-line querying
   - FastAPI-powered REST API for local serving
   - Lightweight HTML Web UI for interactive search

###🛠️ Per-system configuration via system.yaml for flexible deployments
###🔐 Fully local operation — no cloud dependencies required

###✅ One-line install via pip install rag-llm-api-pipeline
###✅ Quickstart guide and prebuilt example included
###✅ Runs on CPU or GPU with smart memory management
###✅ Web UI + CLI + API, all in one package

---

## 📦 Installation

```bash
pip install rag-llm-api-pipeline

```

---

## 🛠️ Setup Instructions (Windows + Anaconda)

### 1. Create Python Environment
```bash
conda create -n rag_env python=3.10
conda activate rag_env
```

### 2. Install Dependencies
#### Via Conda (system-level tools):
```bash
conda install -c conda-forge ffmpeg pytesseract pyaudio
```

#### Via Pip (Python packages):
```bash
pip install -r requirements.txt
```

> Ensure Tesseract is installed and in your system PATH. You can get it from https://github.com/tesseract-ocr/tesseract.

---

## 🚀 Usage

Please review the quickstart guide. 

---
## 🐧 Setup Instructions (Linux)

### 1. Create Python Environment
```bash
python3 -m venv rag_env
source rag_env/bin/activate
```

Or with `conda`:
```bash
conda create -n rag_env python=3.10
conda activate rag_env
```

### 2. Install System Dependencies
```bash
sudo apt update
sudo apt install -y ffmpeg tesseract-ocr libpulse-dev portaudio19-dev
```

> Optional: install language packs for OCR (e.g., `tesseract-ocr-eng`).

### 3. Install Python Packages
```bash
pip install -r requirements.txt
```

---

## 🔁 Running the Application on Linux

### CLI
```bash
python cli/main.py --system TestSystem --question "What is the restart sequence for this machine?"
```

### API Server
```bash
uvicorn rag_llm_api_pipeline.api.server:app --host 0.0.0.0 --port 8000
```

### cURL Query
```bash
curl -X POST http://localhost:8000/query \
     -H "Content-Type: application/json" \
     -d '{"system": "TestSystem", "question": "What does error E204 indicate?"}'
```

## 📚 How it Works

1. **Index Building**:
   - Files are parsed using `loader.py`.
   - Text chunks are embedded with MiniLM.
   - FAISS index stores embeddings for fast similarity search.

2. **Query Execution**:
   - User provides a natural language question.
   - Relevant text chunks are retrieved from the index.
   - LLM generates an answer based on retrieved context.

---

## 🧠 Model Info

- All models are open-source and run offline.

> You can replace with any local-compatible Hugging Face model.

---

## 🔐 Security & Offline Use

- No cloud or external dependencies required after initial setup.
- Ideal for OT environments.
- All processing is local: embeddings, LLM inference, and data storage.

---

## 📜 License

MIT License

---

## 📧 Contact

For issues, improvements, or contributions, please open an issue or PR.
