Metadata-Version: 2.1
Name: examinationrag
Version: 0.1.0
Summary: XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced Retrieval-Augmented Generation
Home-page: https://github.com/DocAILab/XRAG
Author: DocAILab
Author-email: luoyangyifei@buaa.edu.cn
License: Apache 2.0 License
Keywords: RAG,LLM,ChatGPT,LLamaIndex,Benchmarking
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dashscope==1.20.0
Requires-Dist: langchain==0.2.7
Requires-Dist: scikit-learn==1.3.2
Requires-Dist: llama-index==0.11.11
Requires-Dist: litellm==1.48.10
Requires-Dist: numpy==1.26.4
Requires-Dist: openai==1.47.0
Requires-Dist: pandas==2.2.2
Requires-Dist: sentence_transformers==3.0.1
Requires-Dist: torch==2.3.0
Requires-Dist: tqdm==4.66.4
Requires-Dist: transformers==4.42.2
Requires-Dist: llama-index-embeddings-huggingface==0.3.1
Requires-Dist: llama-index-embeddings-langchain==0.2.1
Requires-Dist: llama-index-embeddings-openai==0.2.5
Requires-Dist: llama-index-retrievers-bm25==0.3.0
Requires-Dist: llama-index-readers-file==0.2.2
Requires-Dist: llama-index-legacy==0.9.48
Requires-Dist: llama-index-llms-dashscope==0.2.1
Requires-Dist: llama-index-llms-huggingface==0.3.4
Requires-Dist: llama-index-llms-openai==0.2.9
Requires-Dist: FlagEmbedding==1.2.10
Requires-Dist: zhipuai
Requires-Dist: uptrain
Requires-Dist: toml
Requires-Dist: deepeval==0.21.26
Requires-Dist: rouge-score==0.0.4
Requires-Dist: evaluate
Requires-Dist: fire>=0.4.0
Requires-Dist: validators
Requires-Dist: llama-index-postprocessor-cohere-rerank==0.2.1
Requires-Dist: llama-index-postprocessor-colbert-rerank==0.2.1
Requires-Dist: llama-index-postprocessor-flag-embedding-reranker==0.2.0
Requires-Dist: sacrebleu
Requires-Dist: jiwer
Requires-Dist: seqeval==1.2.2
Requires-Dist: bert_score
Requires-Dist: sentencepiece
Requires-Dist: mauve-text
Requires-Dist: streamlit
Requires-Dist: streamlit-option-menu
Requires-Dist: streamlit_card
Provides-Extra: jury
Requires-Dist: jury; extra == "jury"

<img src="img.png" width="100%" align="center" alt="XRAG">

# XRAG

**XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced Retrieval-Augmented Generation**

[![PyPI version](https://badge.fury.io/py/examinationrag.svg)](https://badge.fury.io/py/examinationrag)
[![Python](https://img.shields.io/pypi/pyversions/examinationrag)](https://pypi.org/project/examinationrag/)
[![License](https://img.shields.io/github/license/DocAILab/XRAG)](https://github.com/DocAILab/XRAG/blob/main/LICENSE)
[![Downloads](https://static.pepy.tech/badge/examinationrag)](https://pepy.tech/project/examinationrag)
[![GitHub stars](https://img.shields.io/github/stars/DocAILab/XRAG)](https://github.com/DocAILab/XRAG/stargazers)
[![GitHub issues](https://img.shields.io/github/issues/DocAILab/XRAG)](https://github.com/DocAILab/XRAG/issues)

---

## 📖 Introduction

XRAG is a benchmarking framework designed to evaluate the foundational components of advanced Retrieval-Augmented Generation (RAG) systems. By dissecting and analyzing each core module, XRAG provides insights into how different configurations and components impact the overall performance of RAG systems.

---

## ✨ Features

- **🔍 Modular Benchmarking**: Evaluate individual components such as retrievers, readers, and rankers independently.
- **⚙️ Configurable Pipelines**: Easily modify configurations to test various setups and components.
- **📊 Extensive Metrics**: Utilize a range of evaluation metrics for comprehensive assessment.
- **📚 Dataset Support**: Built-in support for popular datasets like HotpotQA, with the ability to add custom datasets.
- **🤖 Integration with LLMs**: Seamless integration with large language models for generation tasks.

---

## 🛠️ Installation

Before installing XRAG, ensure that you have Python 3.11 or later installed.

### Create a Virtual Environment via conda(Recommended)

```bash
  
# Create a new conda environment
conda create -n xrag python=3.11

# Activate the environment
conda activate xrag
```

### **Install via pip**

You can install XRAG directly using `pip`:

```bash
# Clone the repository
git clone https://github.com/DocAILab/xrag.git
cd xrag

# Install XRAG
pip install .

# Install 'jury' without dependencies to avoid conflicts
pip install jury --no-deps
```
---

## 🚀 Quick Start

Here's how you can get started with XRAG:

### 1. **Prepare Configuration**: 

Modify the `config.toml` file to set up your desired configurations.

### 2. Using `xrag-cli`

After installing XRAG, the `xrag-cli` command becomes available in your environment. This command provides a convenient way to interact with XRAG without needing to call Python scripts directly.

### **Command Structure**

```bash
xrag-cli [command] [options]
```

### **Commands and Options**

- **run**: Runs the benchmarking process.

  ```bash
  xrag-cli run [--override key=value ...]
  ```

- **webui**: Launches the web-based user interface.

  ```bash
  xrag-cli webui
  ```

- **ver**: Displays the current version of XRAG.

  ```bash
  xrag-cli version
  ```

- **help**: Displays help information.

  ```bash
  xrag-cli help
  ```

### **Overriding Configuration Parameters**

Use the `--override` flag followed by key-value pairs to override configuration settings:

```bash
xrag-cli run --override embeddings="new-embedding-model"
```

---

## ⚙️ Configuration

XRAG uses a `config.toml` file for configuration management. Below is a sample of the configuration file:

```toml
[api_keys]
api_key = "sk-xxxx"
api_base = "https://xxx"
api_name = "chatgpt-3.5"
auth_token = "hf_xxx"

[settings]
llm = "chatgpt-3.5"
embeddings = "BAAI/bge-large-en-v1.5"
split_type = "sentence"
chunk_size = 128
dataset = "hotpot_qa"
source_dir = "../wiki"
persist_dir = "storage"
# ... additional settings ...
```

---

## ❗ Troubleshooting

- **Dependency Conflicts**: If you encounter dependency issues, ensure that you have the correct versions specified in `requirements.txt` and consider using a virtual environment.

- **Invalid Configuration Keys**: Ensure that the keys you override match exactly with those in the `config.toml` file.

- **Data Type Mismatches**: When overriding configurations, make sure the values are of the correct data type (e.g., integers, booleans).

---

## 📝 Changelog

### Version 0.1.0

- Initial release with core benchmarking functionality.
- Support for HotpotQA dataset.
- Command-line configuration overrides.
- Introduction of the `xrag-cli` command-line tool.

---

## 💬 Feedback and Support

We value feedback from our users. If you have suggestions, feature requests, or encounter issues:

- **Open an Issue**: Submit an issue on our [GitHub repository](https://github.com/DocAILab/xrag/issues).
- **Email Us**: Reach out at [luoyangyifei@buaa.edu.cn](mailto:luoyangyifei@buaa.edu.cn).
- **Join the Discussion**: Participate in discussions and share your insights.

---

## 🙏 Thank You

Thank you for using XRAG! We hope it proves valuable in your research and development efforts in the field of Retrieval-Augmented Generation.
