Metadata-Version: 2.1
Name: r2r
Version: 0.1.1
Summary: SciPhi R2R
License: Apache-2.0
Author: Owen Colegrove
Author-email: owen@sciphi.ai
Requires-Python: >=3.9,<3.13
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: all
Provides-Extra: codesearch
Provides-Extra: embedding
Provides-Extra: local-vectordb
Provides-Extra: monitoring
Provides-Extra: parsing
Provides-Extra: postgres
Provides-Extra: qdrant
Provides-Extra: streaming
Requires-Dist: bs4 (>=0.0.2,<0.0.3) ; extra == "parsing" or extra == "all"
Requires-Dist: datasets (>=2.16.1,<3.0.0) ; extra == "streaming" or extra == "all"
Requires-Dist: fastapi (>=0.109.2,<0.110.0)
Requires-Dist: fire (>=0.5.0,<0.6.0)
Requires-Dist: gunicorn (>=21.2.0,<22.0.0)
Requires-Dist: langchain (>=0.1.5,<0.2.0)
Requires-Dist: numpy (>=1.26.4,<2.0.0) ; extra == "local-vectordb" or extra == "all"
Requires-Dist: openai (>=1.11.1,<2.0.0)
Requires-Dist: protobuf (>=4.25.3,<5.0.0) ; extra == "codesearch" or extra == "all"
Requires-Dist: psycopg2-binary (>=2.9.9,<3.0.0) ; extra == "postgres" or extra == "all"
Requires-Dist: pydantic (>=1.10.13,<2.0.0)
Requires-Dist: pypdf (>=4.0.2,<5.0.0) ; extra == "parsing" or extra == "all"
Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
Requires-Dist: python-multipart (>=0.0.9,<0.0.10)
Requires-Dist: qdrant_client (>=1.7.0,<2.0.0) ; extra == "qdrant" or extra == "all"
Requires-Dist: requests (>=2.31.0,<3.0.0)
Requires-Dist: scikit-learn (>=1.4.1.post1,<2.0.0) ; extra == "local-vectordb" or extra == "all"
Requires-Dist: sentry-sdk (>=1.40.4,<2.0.0) ; extra == "monitoring" or extra == "all"
Requires-Dist: tiktoken (>=0.5.2,<0.6.0) ; extra == "embedding" or extra == "all"
Requires-Dist: types-requests (>=2.31.0.20240125,<3.0.0.0)
Requires-Dist: uvicorn (>=0.27.0.post1,<0.28.0)
Requires-Dist: vecs (>=0.4.0,<0.5.0)
Description-Content-Type: text/markdown

# R2R

R2R (RAG to Riches) is a semi-opinionated Python framework for rapidly building and deploying production-ready Retrieval-Augmented Generation (RAG) systems. It is designed to shorten the path from an experimental prototype to a production-grade RAG system.


### Quick Install:

**Install R2R directly using `pip`:**
   
   ```bash
   pip install r2r
   ```


### Full Install:

Follow these steps to ensure a smooth setup:

1. **Install Poetry:**
   - Before installing the project, make sure you have Poetry on your system. If not, visit the [official Poetry website](https://python-poetry.org/docs/#installation) for installation instructions.

2. **Clone and Install Dependencies:**
   - Clone the project repository and navigate to the project directory:
     ```bash
     git clone git@github.com:SciPhi-AI/r2r.git
     cd r2r
     ```
   - Install the project dependencies with Poetry:
     ```bash
     # See pyproject.toml for available extras
     # use "all" to include every optional dependency
     poetry install --extras "parsing"
     ```

3. **Configure Environment Variables:**
   - Set your cloud provider secrets in a `.env` file. At a minimum, you will need an OpenAI API key.
   - The framework currently supports pgvector and Qdrant as vector databases, with plans to extend coverage.
   - If starting from the example, copy `.env.example` to `.env` to apply your configurations:
     ```bash
     cp .env.example .env
     ```
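
Before starting the application, it can help to confirm that the required secrets are actually present. The sketch below is a minimal, standard-library illustration of that check and is not part of R2R. The helper names `parse_env_file` and `missing_keys` are hypothetical, and the variable name `OPENAI_API_KEY` is an assumption based on the OpenAI client's usual convention; check `.env.example` for the names the project actually expects. The parser handles only plain `KEY=VALUE` lines; in practice the `python-dotenv` dependency does this job more robustly.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(settings, required=("OPENAI_API_KEY",)):
    """Return the required keys that are absent or empty.

    The default key name is an assumption, not taken from the R2R source.
    """
    return [key for key in required if not settings.get(key)]


if __name__ == "__main__" and Path(".env").exists():
    # Real environment variables take precedence over values from the file.
    settings = {**parse_env_file(".env"), **os.environ}
    missing = missing_keys(settings)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Running this from the project root after copying `.env.example` to `.env` reports any required key that is still empty.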


### Basic Examples

The project includes several basic examples that demonstrate application deployment and standalone usage of the embedding and RAG pipelines:

1. [`app.py`](examples/basic/app.py): This example runs the main application, which includes the ingestion, embedding, and RAG pipelines served via FastAPI.

    ```bash
    poetry run uvicorn examples.basic.app:app
    ```

2. [`run_client.py`](examples/basic/run_client.py): Run this example after starting the main application. It demonstrates uploading text entries and a PDF with the Python client, and it shows the built-in document-level and user-level management features.

    ```bash
    poetry run python -m examples.basic.run_client
    ```


3. [`run_demo.py`](examples/pdf_chat/run_demo.py): A more comprehensive example that demonstrates uploading a more realistic PDF and chatting with it.
    ```bash
    # Ingest pdf
    poetry run python -m examples.pdf_chat.run_demo ingest

    # Ask a question
    poetry run python -m examples.pdf_chat.run_demo search "What are the key themes of Meditations?"
    ```

4. [`web`](web/package.json): A companion web application that provides visual intelligence for the framework.
    ```bash
    cd web && pnpm install
    # Serve the web app
    pnpm dev
    ```


## Demonstration

https://github.com/SciPhi-AI/r2r/assets/68796651/c648ab67-973a-416a-985e-2eafb0a41ef0

## Community
[Join our Discord server!](https://discord.gg/p6KqD2kjtB)

## Core Abstractions

The framework primarily revolves around three core abstractions:

- The **Ingestion Pipeline**: Facilitates the preparation of embeddable 'Documents' from various data formats (json, txt, pdf, html, etc.). The abstraction can be found in [`ingestion.py`](r2r/core/pipelines/ingestion.py).

- The **Embedding Pipeline**: Manages the transformation of text into stored vector embeddings, interacting with embedding and vector database providers through a series of steps (e.g., `extract_text`, `transform_text`, `chunk_text`, `embed_chunks`). The abstraction can be found in [`embedding.py`](r2r/core/pipelines/embedding.py).

- The **RAG Pipeline**: Works similarly to the embedding pipeline but incorporates an LLM provider to produce text completions. The abstraction can be found in [`rag.py`](r2r/core/pipelines/rag.py).

Each pipeline incorporates a logging database for operation tracking and observability.
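
To make the step sequence of the embedding pipeline concrete, here is a toy sketch that chains the four named steps. It is an illustration only and does not reproduce the abstraction in `embedding.py`: the class `ToyEmbeddingPipeline`, its attributes, and the `run` method are hypothetical, the "embedding" is a two-number placeholder rather than a call to an embedding provider, a plain dict stands in for the vector database, and a plain list stands in for the logging database.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEmbeddingPipeline:
    """Hypothetical stand-in for an embedding pipeline; not the R2R class."""

    chunk_size: int = 20
    store: dict = field(default_factory=dict)  # stands in for a vector database
    log: list = field(default_factory=list)    # stands in for the logging database

    def extract_text(self, document):
        # Pull the raw text out of a document record.
        return document["text"]

    def transform_text(self, text):
        # Normalize whitespace; a real pipeline might clean markup here.
        return " ".join(text.split())

    def chunk_text(self, text):
        # Split into fixed-size character chunks.
        return [
            text[i:i + self.chunk_size]
            for i in range(0, len(text), self.chunk_size)
        ]

    def embed_chunks(self, chunks):
        # Placeholder embedding: [chunk length, vowel count].
        # A real pipeline would call an embedding provider instead.
        return [
            [float(len(c)), float(sum(ch in "aeiou" for ch in c.lower()))]
            for c in chunks
        ]

    def run(self, document):
        # Chain the steps, store the vectors, and record the operation.
        text = self.transform_text(self.extract_text(document))
        chunks = self.chunk_text(text)
        vectors = self.embed_chunks(chunks)
        for i, (chunk, vector) in enumerate(zip(chunks, vectors)):
            self.store[f"{document['id']}-{i}"] = (vector, chunk)
        self.log.append({"document_id": document["id"], "chunks": len(chunks)})
        return len(chunks)


if __name__ == "__main__":
    pipeline = ToyEmbeddingPipeline(chunk_size=10)
    count = pipeline.run({"id": "doc1", "text": "Hello   world,\nthis is R2R."})
    print(f"Stored {count} chunks")
```

Keeping each step as a separate method means a subclass could swap in a different chunking strategy or embedding provider without touching the rest of the flow, which is the kind of customization a step-based pipeline is meant to allow.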

