Metadata-Version: 2.1
Name: denser-retriever
Version: 0.1.2
Summary: Enterprise-grade AI retriever solution that seamlessly integrates to enhance your AI applications.
Home-page: https://github.com/denser-org/denser-retriever
License: MIT
Author: denser-org
Author-email: support@denser.ai
Requires-Python: >=3.10.0,<4.0.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: beautifulsoup4 (>=4.12.3,<5.0.0)
Requires-Dist: datasets (>=2.18.0,<3.0.0)
Requires-Dist: elasticsearch (>=8.13.0,<9.0.0)
Requires-Dist: fake_useragent (>=1.5.1,<2.0.0)
Requires-Dist: langchain-community (>=0.2.0,<0.3.0)
Requires-Dist: langchain-core (==0.2.10)
Requires-Dist: langchain-elasticsearch (>=0.2.2,<0.3.0)
Requires-Dist: langchain-huggingface (>=0.0.3,<0.0.4)
Requires-Dist: langchain-milvus (>=0.1.1,<0.2.0)
Requires-Dist: langchain-qdrant (>=0.1.2,<0.2.0)
Requires-Dist: langchain-text-splitters (>=0.2.2,<0.3.0)
Requires-Dist: numpy (>=1.26.4,<2.0.0)
Requires-Dist: pydantic-settings (>=2.2.1,<3.0.0)
Requires-Dist: pydantic[dotenv] (>=2.7.1,<3.0.0)
Requires-Dist: pymilvus (>=2.4.4,<3.0.0)
Requires-Dist: pypdf (>=4.2.0,<5.0.0)
Requires-Dist: pytrec-eval (>=0.5,<0.6)
Requires-Dist: rich (>=10.14.0,<11.0.0)
Requires-Dist: sentence-transformers (>=2.7.0,<3.0.0)
Requires-Dist: torch (>=1.13.1,<2.0.0)
Requires-Dist: typer[all] (>=0.12.1,<0.13.0)
Requires-Dist: xgboost (>=2.0.3,<3.0.0)
Project-URL: Repository, https://github.com/denser-org/denser-retriever
Description-Content-Type: text/markdown

# <img src="assets/images/logo.png" alt="denser logo" width="40"/> Denser Retriever

<div align="center">

<!-- [![Build status](https://github.com/denser-org/denser-retriever/workflows/build/badge.svg?branch=main&event=push)](https://github.com/denser-org/denser-retriever/actions?query=workflow%3Abuild) -->

[![Python Version](https://img.shields.io/pypi/pyversions/denser-retriever.svg)](https://pypi.org/project/denser-retriever/)
[![Dependencies Status](https://img.shields.io/badge/dependencies-up%20to%20date-brightgreen.svg)](https://github.com/denser-org/denser-retriever/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Aapp%2Fdependabot)

[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
[![Security: bandit](https://img.shields.io/badge/security-bandit-green.svg)](https://github.com/PyCQA/bandit)
[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/denser-org/denser-retriever/blob/main/.pre-commit-config.yaml)
[![Semantic Versions](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--versions-e10079.svg)](https://github.com/denser-org/denser-retriever/releases)
[![License](https://img.shields.io/github/license/denser-org/denser-retriever)](https://github.com/denser-org/denser-retriever/blob/main/LICENSE)
![Coverage Report](assets/images/coverage.svg)

An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy.

</div>

## 📝 Description

Denser Retriever combines multiple search technologies into a single platform. It uses **gradient boosting (xgboost)**, a machine learning technique, to combine:

- **Keyword-based searches** that focus on fetching precisely what the query mentions.
- **Vector databases** that are great for finding a wide range of potentially relevant answers.
- **Machine Learning rerankers** that fine-tune the results to ensure the most relevant answers top the list.
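
As a rough illustration of the idea above, the sketch below merges scores from the three sources into one feature row per document and ranks documents with a scoring function. It is not the library's API: `build_features`, `rank`, and `placeholder_score`, along with the sample scores, are hypothetical names and values chosen for this example. In the real system a trained xgboost model would play the role of the scoring function.

```python
from collections import defaultdict


def build_features(keyword_hits, vector_hits, rerank_scores, missing=0.0):
    """Merge per-retriever scores into one feature row per document.

    Each argument maps a document id to a score. A document missing from
    a retriever's results gets the `missing` value for that feature.
    Feature order: [keyword score, vector score, reranker score].
    """
    features = defaultdict(lambda: [missing, missing, missing])
    for column, hits in enumerate((keyword_hits, vector_hits, rerank_scores)):
        for doc_id, score in hits.items():
            features[doc_id][column] = score
    return dict(features)


def rank(features, score_fn):
    """Return document ids sorted by score_fn(feature_row), best first."""
    return sorted(features, key=lambda doc_id: score_fn(features[doc_id]), reverse=True)


def placeholder_score(row):
    # Hypothetical stand-in for a trained gradient-boosting model's prediction.
    return 0.2 * row[0] + 0.3 * row[1] + 0.5 * row[2]


# Made-up scores for three documents.
keyword_hits = {"doc1": 12.4, "doc2": 7.9}
vector_hits = {"doc2": 0.83, "doc3": 0.79}
rerank_scores = {"doc1": 0.10, "doc2": 0.95, "doc3": 0.40}

features = build_features(keyword_hits, vector_hits, rerank_scores)
ranking = rank(features, placeholder_score)
```

A learned model is preferable to fixed weights like those in `placeholder_score` because the raw scores live on different scales (keyword scores are unbounded, similarity scores are typically bounded), and a gradient-boosted model can learn how to weigh them from labeled relevance data.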

* Our experiments on MTEB datasets show that combining keyword search, vector search, and a reranker via an xgboost model (denoted as ES+VS+RR_n) can significantly improve on the vector search (VS) baseline.

![mteb_ndcg_plot](mteb_ndcg_plot.png)

* **Check out the Denser Retriever experiments on the Anthropic Contextual Retrieval dataset [here](https://github.com/denser-org/denser-retriever/tree/main/experiments/data/contextual-embeddings)**.

## 🚀 Features

The initial release of Denser Retriever provides the following features.

- Support for heterogeneous retrievers such as **keyword search**, **vector search**, and **ML model reranking**
- Use of **xgboost** to effectively combine heterogeneous retrievers
- **State-of-the-art accuracy** on [MTEB](https://github.com/embeddings-benchmark/mteb) retrieval benchmarks
- Examples of using Denser Retriever to power **end-to-end applications** such as chatbots and semantic search

## 📦 Installation

We recommend installing Python via [Anaconda](https://www.anaconda.com/download), because some users have reported NumPy installation issues when using the installer from https://www.python.org/downloads/. We are working on a solution to this problem. To install Denser Retriever, use one of the following methods:

### Pip

```bash
pip install git+https://github.com/denser-org/denser-retriever.git@main
```

### Poetry

```bash
poetry add git+https://github.com/denser-org/denser-retriever.git#main
```
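
After installing, a quick sanity check can confirm that the interpreter meets the package's `Requires-Python` constraint (`>=3.10.0,<4.0.0`) and that the distribution is visible to Python. This is a minimal sketch using only the standard library; the helper names `python_supported` and `installed_version` are made up for this example and are not part of Denser Retriever.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def python_supported(version_info=sys.version_info):
    """Return True if the interpreter version is >=3.10 and <4.0."""
    return (3, 10) <= tuple(version_info[:2]) < (4, 0)


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("denser-retriever version:", installed_version("denser-retriever"))
```

If the second line prints `None`, the package is not installed in the active environment.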

## 📃 Documentation

The official documentation is hosted at [retriever.denser.ai](https://retriever.denser.ai).
See the [quick start guide](https://retriever.denser.ai/docs/quick-start) to get started.

## 👨🏼‍💻 Development

You can start developing Denser Retriever on your local machine.

See [DEVELOPMENT.md](DEVELOPMENT.md) for more details.

## 🛡 License

[![License](https://img.shields.io/github/license/denser-org/denser-retriever)](https://github.com/denser-org/denser-retriever/blob/main/LICENSE)

This project is licensed under the terms of the `MIT` license.
See [LICENSE](https://github.com/denser-org/denser-retriever/blob/main/LICENSE) for more details.

## 📃 Citation

```bibtex
@misc{denser-retriever,
  author = {denser-org},
  title = {An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy.},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/denser-org/denser-retriever}}
}
```

