Metadata-Version: 2.4
Name: ragbox-core
Version: 1.0.5
Summary: RAG-in-a-Box: Zero-Configuration Self-Building Agentic RAG System
License-File: LICENSE
Keywords: rag,llm,ai,vector-search,knowledge-graph,agentic
Author: Developer
Author-email: amankumarpandeyin@gmail.com
Requires-Python: >=3.11,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: aiofiles (>=23.0,<24.0)
Requires-Dist: anthropic
Requires-Dist: camelot-py
Requires-Dist: chromadb (>=0.4,<0.5)
Requires-Dist: groq
Requires-Dist: igraph
Requires-Dist: leidenalg
Requires-Dist: llama-cpp-python
Requires-Dist: loguru (>=0.7,<0.8)
Requires-Dist: neo4j (>=5.0,<6.0)
Requires-Dist: networkx (>=3.0,<4.0)
Requires-Dist: numpy (>=1.24,<2.0)
Requires-Dist: openai
Requires-Dist: paddleocr
Requires-Dist: pdfplumber
Requires-Dist: pydantic (>=2.0,<3.0)
Requires-Dist: pydantic-settings (>=2.13.1,<3.0.0)
Requires-Dist: pypdf
Requires-Dist: python-dotenv
Requires-Dist: python-louvain
Requires-Dist: python-magic
Requires-Dist: python-pptx
Requires-Dist: rank-bm25
Requires-Dist: rich (>=13.0,<14.0)
Requires-Dist: scikit-learn (>=1.3,<2.0)
Requires-Dist: scipy (>=1.11,<2.0)
Requires-Dist: sentence-transformers
Requires-Dist: tenacity
Requires-Dist: tiktoken
Requires-Dist: tree-sitter
Requires-Dist: tree-sitter-python
Requires-Dist: typer (>=0.9,<0.10)
Requires-Dist: watchdog (>=3.0,<4.0)
Project-URL: Homepage, https://pypi.org/project/ragbox-core/
Project-URL: Repository, https://github.com/ixchio/ragbox-core
Description-Content-Type: text/markdown

# RAGBox-Core

[![PyPI version](https://badge.fury.io/py/ragbox-core.svg)](https://badge.fury.io/py/ragbox-core)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![CI](https://github.com/ixchio/ragbox-core/actions/workflows/ci.yml/badge.svg)](https://github.com/ixchio/ragbox-core/actions/workflows/ci.yml)

**RAG-in-a-Box: Zero-Configuration Self-Building Agentic RAG System**

RAGBox is a production-ready, auto-configuring, async-first RAG engine that combines Vector Search, Agentic Orchestration, and Graph Retrieval natively.

## Installation

```bash
pip install ragbox-core
```

> **Note on Dependencies:** Advanced document-processing features such as OCR and complex PDF parsing pull in packages with native components. Depending on your OS, you may need to install C++ build tools and system libraries (for example, libmagic for `python-magic`) before they work reliably.

## Configuration (API Keys)

RAGBox auto-detects cloud LLM providers from environment variables. Set one of the following before running:

```bash
export OPENAI_API_KEY="sk-..."
# OR
export ANTHROPIC_API_KEY="sk-ant-..."
# OR
export GROQ_API_KEY="gsk_..."
```

If no key is found, RAGBox falls back to a local Llama model, which you must download manually to `models/llama-3.1-8b-instruct.gguf`.
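
The detection and fallback behavior described above can be pictured with a short sketch. This is not RAGBox's actual implementation: the helper name `detect_provider`, the priority order of the keys, and the `"local"` label are assumptions made for illustration. Only the environment variable names and the model path come from this README.

```python
import os
from pathlib import Path

# Environment variables named in this README, in an ASSUMED priority order.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
}

# Local fallback path taken from this README.
LOCAL_MODEL_PATH = Path("models/llama-3.1-8b-instruct.gguf")


def detect_provider(env=None):
    """Return the first provider whose key is set, else "local" (hypothetical helper)."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_ENV_VARS.items():
        if env.get(var):  # an unset or empty variable counts as missing
            return provider
    return "local"


def local_model_available(path=LOCAL_MODEL_PATH):
    """Check whether the manually downloaded model file exists."""
    return Path(path).is_file()
```

Calling `detect_provider()` with no argument reads the real environment; passing a plain dict makes the logic easy to test without touching `os.environ`.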

## Quick Start (3-Line API)

```python
from ragbox import RAGBox

# Ingests and chunks the documents, builds the knowledge graph, and configures the vector DB automatically
rag = RAGBox("./company-docs")

# Intelligent routing via query classification
answer = rag.query("What's our vacation policy?")
print(answer)
```

## CLI Interface

RAGBox also provides a CLI for running locally without writing code:

```bash
# Point to your documents. RAGBox will self-build the index and graph.
ragbox init ./company-docs

# Query the active index
ragbox query "What's our vacation policy?" -d ./company-docs
```

## Architecture

```mermaid
graph TD
    A[Local Documents] --> B{Document Processor Auto-Router}
    B --> C[AST / OCR / PDF Parsing]
    C --> D[Chunking Engine]
    D --> E[(Vector Store)]
    C --> F[(Knowledge Graph)]
    
    Q[User Query] --> G[Agentic Orchestrator]
    G --> H[Retrieval Fusion Engine]
    E --> H
    F --> H
    H --> G
    G --> I[Final Answer]
```

## Risk Surface Analysis

*   **Temporal Edges (T=0 vs T=Scale):** At T=0, `ragbox init` blocks until the index is built, which guarantees index availability. At T=scale, the background daemon applies delta updates (via `watchdog`) to prevent index staleness and thundering herds.
*   **Adversarial Edges:** Subject to standard prompt injection if queries are exposed raw to external users. The Orchestrator currently assumes trusted inputs.
*   **Resource Edges:** Highly concurrent reads and writes can spike memory usage, because the local Vector DB and the Knowledge Graph are both maintained at the same time.
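
One common way to keep delta updates from turning into a thundering herd is to coalesce bursts of file-change events into a single batch. The sketch below illustrates that idea using only the standard library. It is not taken from the RAGBox source: the class name `ChangeBatcher`, its methods, and the two-second quiet window are all assumptions.

```python
import time


class ChangeBatcher:
    """Coalesce bursts of file-change events into one batch (hypothetical helper)."""

    def __init__(self, quiet_seconds=2.0, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds  # assumed quiet window, not a RAGBox setting
        self.clock = clock                  # injectable so the logic can be tested
        self.pending = set()                # paths changed since the last flush
        self.last_event_at = None           # time of the most recent change

    def record(self, path):
        """Note that a path changed; repeated saves of one file collapse into one entry."""
        self.pending.add(path)
        self.last_event_at = self.clock()

    def flush_if_quiet(self):
        """Return the sorted batch once no change arrived for quiet_seconds, else []."""
        if not self.pending:
            return []
        if self.clock() - self.last_event_at < self.quiet_seconds:
            return []  # still inside a burst; keep waiting
        batch = sorted(self.pending)
        self.pending.clear()
        return batch
```

In a real deployment, `record` would presumably be called from a `watchdog` event handler and `flush_if_quiet` from a periodic timer, with each returned batch handed to one incremental re-index job. That wiring is omitted here.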

## Features

* **Self-Healing Infrastructure:** Watchdog auto-detects file changes and updates the vector store and knowledge graph incrementally, which avoids stale indexes and re-indexing storms.
* **Auto-Document Intelligence:** Detects PDFs, text, images, and code, then applies AST parsing, OCR (`paddleocr`), or layout-aware extraction (`pdfplumber`) as appropriate.
* **Cost Estimator:** See the expected USD cost of indexing *before* it runs.
* **Auto-Knowledge-Graph (GraphRAG):** Extracts entities and communities automatically using the Leiden algorithm for structured reasoning.
* **Retrieval Fusion & Reranking:** Merges dense vector and graph search results using Reciprocal Rank Fusion, then reranks the fused candidate pool with an `ms-marco` Cross-Encoder.
* **Late Chunking:** Token embeddings are computed over the full document and only then pooled into chunk vectors, so each chunk keeps global semantic context.
* **Agentic Orchestrator & Intelligent Routing:** Automatically routes incoming queries into 6 distinct pipelines: Vector, Keyword, Graph, Multi-Query, Time-Based, and Agentic.
* **Multi-Query Expansion:** Broad-intent queries are expanded by the LLM into multiple variations; results are retrieved for each variation and fused to improve recall.
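
Reciprocal Rank Fusion, named in the feature list above, is simple enough to sketch in a few lines. The example below is a generic illustration of the technique, not RAGBox's internal code: the function name, the document IDs, and the constant `k=60` (a conventional default for RRF) are assumptions. Each document scores the sum of `1 / (k + rank)` over every ranked list it appears in.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; return IDs sorted by fused score."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for vector search and graph search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
graph_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, graph_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both lists (`doc_b`, `doc_a`) rise to the top because their reciprocal ranks add up, which is why RRF works without having to normalize the raw scores of the two retrievers. A reranker would then reorder only this fused list.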

## Contributing

We welcome contributions to RAGBox-Core! Please see our [CONTRIBUTING.md](https://github.com/ixchio/ragbox-core/blob/master/CONTRIBUTING.md) for details on how to set up your development environment, run the test suite, and submit Pull Requests.

## License

This project is licensed under the MIT License - see the [LICENSE](https://github.com/ixchio/ragbox-core/blob/master/LICENSE) file for details.

