Metadata-Version: 2.1
Name: r2r
Version: 0.1.0
Summary: SciPhi R2R
License: Apache-2.0
Author: Owen Colegrove
Author-email: owen@sciphi.ai
Requires-Python: >=3.9,<3.12
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: bs4 (>=0.0.2,<0.0.3)
Requires-Dist: builtwith (>=1.3.4,<2.0.0)
Requires-Dist: configparser (>=5.0.0,<6.0.0)
Requires-Dist: datasets (>=2.16.1,<3.0.0)
Requires-Dist: docx2txt (>=0.8,<0.9)
Requires-Dist: fastapi (>=0.109.2,<0.110.0)
Requires-Dist: fire (>=0.5.0,<0.6.0)
Requires-Dist: gunicorn (>=21.2.0,<22.0.0)
Requires-Dist: hatchet-sdk (>=0.9.4,<0.10.0)
Requires-Dist: langchain (>=0.1.5,<0.2.0)
Requires-Dist: llama-index (>=0.9.45.post1,<0.10.0)
Requires-Dist: numpy (>=1.25.2,<2.0.0)
Requires-Dist: openai (>=1.11.1,<2.0.0)
Requires-Dist: psycopg2-binary (>=2.9.9,<3.0.0)
Requires-Dist: pydantic (>=1.10.13,<2.0.0)
Requires-Dist: pypdf (>=4.0.2,<5.0.0)
Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
Requires-Dist: python-multipart (>=0.0.9,<0.0.10)
Requires-Dist: qdrant_client (>=1.7.0,<2.0.0)
Requires-Dist: requests (>=2.31.0,<3.0.0)
Requires-Dist: sentry-sdk (>=1.40.4,<2.0.0)
Requires-Dist: supabase (>=2.3.4,<3.0.0)
Requires-Dist: tiktoken (>=0.5.2,<0.6.0)
Requires-Dist: tqdm (>=4.66.1,<5.0.0)
Requires-Dist: types-requests (>=2.31.0.20240125,<3.0.0.0)
Requires-Dist: unstructured (>=0.12.4,<0.13.0)
Requires-Dist: uvicorn (>=0.27.0.post1,<0.28.0)
Requires-Dist: vecs (>=0.4.0,<0.5.0)
Description-Content-Type: text/markdown

# R2R

R2R (RAG to Riches) is a Python framework designed for the rapid construction and deployment of production-ready Retrieval-Augmented Generation (RAG) systems. This semi-opinionated framework accelerates the transition from experimental stages to production-grade RAG systems.

## Installation Guide

This project uses Poetry to manage dependencies. Follow these steps to set it up:

1. **Install Poetry:**
   - Before installing the project, make sure you have Poetry on your system. If not, visit the [official Poetry website](https://python-poetry.org/docs/#installation) for installation instructions.

2. **Clone and Install Dependencies:**
   - Clone the project repository and navigate to the project directory:
     ```bash
     git clone git@github.com:SciPhi-AI/r2r.git
     cd r2r
     ```
   - Install the project dependencies with Poetry:
     ```bash
     poetry install
     ```

3. **Configure Environment Variables:**
   - You need to set up cloud provider secrets in your `.env` file for the project to work properly. At a minimum, you will need an OpenAI key and a vector database provider.
   - For a fast setup, we recommend creating a project on Supabase, enabling the vector extension, and then updating the `.env.example` with the necessary details.
   - Once updated, copy the `.env.example` to `.env` to apply your configurations:
     ```bash
     cp .env.example .env
     ```
   - Qdrant is also supported as an alternative vector database provider, and support for more providers is planned.
     
This guide should help you set up the project with minimal hassle. Ensure you follow each step carefully to avoid any issues.
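
Before starting the application, it can help to confirm that the `.env` file actually contains a value for every secret you expect. The following is a minimal sketch using only the standard library; it is not part of R2R. `OPENAI_API_KEY` is the variable the OpenAI client reads by default, while `VECTOR_DB_URL` is a hypothetical placeholder: check `.env.example` for the names this project really uses.

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or set to an empty value."""
    return [key for key in required if not env.get(key)]


# VECTOR_DB_URL is a hypothetical name used only for illustration.
REQUIRED = ["OPENAI_API_KEY", "VECTOR_DB_URL"]

sample = """
# secrets
OPENAI_API_KEY=sk-example
VECTOR_DB_URL=
"""

print(missing_keys(parse_env(sample), REQUIRED))
```

To check a real file, pass the result of `open(".env").read()` to `parse_env` instead of `sample`.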

## Demonstration

https://github.com/SciPhi-AI/r2r/assets/68796651/c648ab67-973a-416a-985e-2eafb0a41ef0

## Community

[Join our Discord server!](https://discord.gg/p6KqD2kjtB)

## Core Abstractions

The framework primarily revolves around three core abstractions:

- The **Ingestion Pipeline**: Facilitates the preparation of embeddable 'Documents' from various data formats (json, txt, pdf, html, etc.). The abstraction can be found in [`ingestion.py`](r2r/core/pipelines/ingestion.py).

- The **Embedding Pipeline**: Manages the transformation of text into stored vector embeddings, interacting with embedding and vector database providers through a series of steps (e.g., extract_text, transform_text, chunk_text, embed_chunks, etc.). The abstraction can be found in [`embedding.py`](r2r/core/pipelines/embedding.py).

- The **RAG Pipeline**: Works similarly to the embedding pipeline but incorporates an LLM provider to produce text completions. The abstraction can be found in [`rag.py`](r2r/core/pipelines/rag.py).

Each pipeline incorporates a logging database for operation tracking and observability.
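
To make the step-based design concrete, here is a rough sketch of how an embedding pipeline built from the steps named above (`extract_text`, `transform_text`, `chunk_text`, `embed_chunks`) might be structured. The class names, method signatures, in-memory store, and toy embedding are all assumptions made for illustration; the actual abstraction lives in `embedding.py` and may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class EmbeddingPipelineSketch(ABC):
    """Hypothetical stand-in for the abstraction; not the real R2R class."""

    def __init__(self) -> None:
        self.store: Dict[str, List[float]] = {}  # in-memory "vector database"
        self.log: List[str] = []                 # stand-in for the logging database

    def extract_text(self, document: Dict[str, str]) -> str:
        return document["text"]

    def transform_text(self, text: str) -> str:
        # Collapse runs of whitespace into single spaces.
        return " ".join(text.split())

    @abstractmethod
    def chunk_text(self, text: str) -> List[str]:
        ...

    @abstractmethod
    def embed_chunks(self, chunks: List[str]) -> List[List[float]]:
        ...

    def run(self, document: Dict[str, str]) -> int:
        """Run every step in order and return the number of stored chunks."""
        text = self.transform_text(self.extract_text(document))
        chunks = self.chunk_text(text)
        vectors = self.embed_chunks(chunks)
        for i, vector in enumerate(vectors):
            self.store[f"{document['id']}-{i}"] = vector
        self.log.append(f"embedded {len(chunks)} chunks from {document['id']}")
        return len(chunks)


class ToyPipeline(EmbeddingPipelineSketch):
    """Fixed-size word chunks and a toy two-number 'embedding'."""

    def chunk_text(self, text: str) -> List[str]:
        words = text.split()
        return [" ".join(words[i:i + 3]) for i in range(0, len(words), 3)]

    def embed_chunks(self, chunks: List[str]) -> List[List[float]]:
        # A real pipeline would call an embedding provider here.
        return [
            [float(len(c)), float(sum(ch in "aeiou" for ch in c.lower()))]
            for c in chunks
        ]


pipeline = ToyPipeline()
doc = {"id": "doc1", "text": "RAG   systems retrieve context\nbefore generating answers"}
print(pipeline.run(doc), pipeline.log)
```

Keeping `chunk_text` and `embed_chunks` abstract is one way to let a subclass swap the chunking strategy or embedding provider without touching the orchestration in `run`.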

## Running the Examples

The project includes several basic examples that demonstrate application deployment and standalone usage of the embedding and RAG pipelines:

1. [`app.py`](examples/basic/app.py): This example runs the main application, which includes the ingestion, embedding, and RAG pipelines served via FastAPI.

    ```bash
    poetry run uvicorn examples.basic.app:app
    ```

2. [`test_client.py`](examples/client/test_client.py): Run this example after starting the main application. It exercises the user client against the running server.

    ```bash
    poetry run python -m examples.client.test_client
    ```

3. [`rag_pipeline.py`](examples/basic/rag_pipeline.py): This standalone example demonstrates the usage of the RAG pipeline. It takes a query as input and returns a completion generated by the OpenAI API.

    ```bash
    poetry run python -m examples.basic.rag_pipeline
    ```

4. [`embedding_pipeline.py`](examples/basic/embedding_pipeline.py): This standalone example demonstrates the usage of the embedding pipeline. It loads datasets from HuggingFace, generates embeddings for the data using the OpenAI API, and stores the embeddings in a PostgreSQL vector database.

    ```bash
    poetry run python -m examples.basic.embedding_pipeline
    ```
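
For readers who want a feel for what the standalone RAG example does conceptually (query in, completion out), the sketch below walks through a generic retrieve-then-generate flow. It does not use the R2R API: the keyword-count embedding, the in-memory document list, and the stubbed LLM callable are all assumptions standing in for the embedding, vector database, and LLM providers.

```python
import math
import re
from typing import Callable, List, Sequence

# Toy vocabulary for the stand-in embedding; purely illustrative.
VOCAB = ("poetry", "vector", "pipeline")


def embed(text: str) -> List[float]:
    """Toy embedding: how often each vocabulary word appears in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, contexts: List[str]) -> str:
    """Place the retrieved passages ahead of the question."""
    bullet_list = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{bullet_list}\n\nQuestion: {query}"


def answer(query: str, documents: List[str], llm: Callable[[str], str]) -> str:
    """Retrieve context, build the prompt, and hand it to the LLM callable."""
    return llm(build_prompt(query, retrieve(query, documents)))


DOCS = [
    "Poetry manages the project dependencies",
    "The embedding pipeline stores vector embeddings",
    "The RAG pipeline adds an LLM provider",
]


def stub_llm(prompt: str) -> str:
    # Placeholder for a real completion call to an LLM provider.
    return f"(stub completion for a {len(prompt)}-character prompt)"


print(answer("Which pipeline stores vector data", DOCS, stub_llm))
```

Passing the LLM as a plain callable keeps the sketch testable offline; a real completion call can be swapped in where `stub_llm` is used.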

