Metadata-Version: 2.1
Name: rerankers
Version: 0.0.1
Summary: A unified API for various document re-ranking models.
Author-email: Ben Clavié <ben@clavie.eu>
Maintainer-email: Ben Clavié <ben@clavie.eu>
Project-URL: Homepage, https://github.com/bclavie/rerankers
Keywords: reranking,retrieval,rag,nlp
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic
Requires-Dist: tqdm
Provides-Extra: all
Requires-Dist: transformers; extra == "all"
Requires-Dist: torch; extra == "all"
Requires-Dist: litellm; extra == "all"
Requires-Dist: requests; extra == "all"
Provides-Extra: transformers
Requires-Dist: transformers; extra == "transformers"
Requires-Dist: torch; extra == "transformers"
Provides-Extra: t5
Requires-Dist: transformers; extra == "t5"
Requires-Dist: torch; extra == "t5"
Requires-Dist: sentencepiece; extra == "t5"
Requires-Dist: protobuf; extra == "t5"
Provides-Extra: api
Requires-Dist: requests; extra == "api"
Provides-Extra: gpt
Requires-Dist: litellm; extra == "gpt"
Provides-Extra: training
Requires-Dist: inranker; extra == "training"
Requires-Dist: sentence-transformers; extra == "training"
Provides-Extra: dev
Requires-Dist: ruff; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: pytest; extra == "dev"


## TL;DR

**UNDER CONSTRUCTION**


Load any reranker, no matter the architecture:
```python
from rerankers import Reranker

# Cross-encoder default
ranker = Reranker('cross-encoder')

# Specific cross-encoder
ranker = Reranker('mixedbread-ai/mxbai-rerank-xlarge-v1')

# T5 Seq2Seq reranker
ranker = Reranker("t5")

# Specific T5 Seq2Seq reranker
ranker = Reranker("unicamp-dl/InRanker-base")

# API (Cohere)
ranker = Reranker("cohere", lang='en', api_key = API_KEY)  # lang is 'en' or 'other'

# API (Jina)
ranker = Reranker("jina", api_key = API_KEY)

# RankGPT4-turbo
ranker = Reranker("rankgpt", api_key = API_KEY)

# RankGPT3-turbo
ranker = Reranker("rankgpt3", api_key = API_KEY)

```

Then:

```python
results = ranker.rank(query="I love you", docs=["I hate you", "I really like you", "I like you", "I despise you"])
```

You can also pass a list of `doc_ids` to `rank()`. If you don't, the IDs are generated automatically, and each `doc_id` is the index of the corresponding document in `docs`.
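
The defaulting behaviour described here can be pictured with a minimal sketch. `resolve_doc_ids` is a hypothetical helper written for illustration, not a function exposed by `rerankers`, and the length check is an assumption about what a sensible implementation would enforce:

```python
from typing import List, Optional, Sequence, Union

DocId = Union[int, str]


def resolve_doc_ids(
    docs: Sequence[str], doc_ids: Optional[Sequence[DocId]] = None
) -> List[DocId]:
    """Hypothetical helper: use the given doc_ids, or fall back to list indices."""
    if doc_ids is None:
        # No IDs supplied: each document's ID is its position in `docs`.
        return list(range(len(docs)))
    if len(doc_ids) != len(docs):
        # Assumed validation: one ID per document.
        raise ValueError("doc_ids and docs must have the same length")
    return list(doc_ids)


docs = ["I hate you", "I really like you", "I like you", "I despise you"]
default_ids = resolve_doc_ids(docs)
custom_ids = resolve_doc_ids(docs, ["a", "b", "c", "d"])
```

With no IDs supplied, `default_ids` is simply `[0, 1, 2, 3]`, which is why the `doc_id` values in the returned results line up with positions in `docs`.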

`rank()` always returns a `RankedResults` pydantic object containing a list of `Result`s:

```python
RankedResults(results=[Result(doc_id=2, text='I like you', score=0.13376183807849884, rank=1), Result(doc_id=1, text='I really like you', score=0.002901385771110654, rank=2), Result(doc_id=0, text='I hate you', score=-2.278848886489868, rank=3), Result(doc_id=3, text='I despise you', score=-3.1964476108551025, rank=4)], query='I love you', has_scores=True)
```

You can retrieve any number of top results by calling `.top_k()` on a `RankedResults` object:

```python
> results.top_k(1)
[Result(doc_id=2, text='I like you', score=0.13376183807849884, rank=1)]
> results.top_k(1)[0].text
'I like you'
```
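
As a rough picture of what a top-k selection over ranked results involves, here is a self-contained sketch. `SketchResult` is a stand-in dataclass that mirrors the fields shown in the example output, and this `top_k` function is an assumption for illustration rather than the library's own implementation:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchResult:
    """Stand-in for a single ranked result (not the library's class)."""
    doc_id: int
    text: str
    score: Optional[float]
    rank: int


def top_k(results: List[SketchResult], k: int) -> List[SketchResult]:
    """Return the k best results, ordered by rank (rank 1 is best)."""
    return sorted(results, key=lambda r: r.rank)[:k]


ranked = [
    SketchResult(doc_id=2, text="I like you", score=0.13376183807849884, rank=1),
    SketchResult(doc_id=1, text="I really like you", score=0.002901385771110654, rank=2),
    SketchResult(doc_id=0, text="I hate you", score=-2.278848886489868, rank=3),
    SketchResult(doc_id=3, text="I despise you", score=-3.1964476108551025, rank=4),
]

best_two = top_k(ranked, 2)
```

Sorting by `rank` rather than `score` keeps the sketch valid for rerankers that produce an ordering without scores.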

You can also retrieve the score for a given doc_id. This is useful if you're scoring documents to use for knowledge distillation:

```python
> results.get_score_by_docid(3)
-3.1964476108551025
```
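
For the knowledge-distillation use case, one possible way to turn ranked output into training targets is sketched below. The tuples reuse the values from the example output above; the helper names `scores_by_doc_id` and `distillation_triples` are hypothetical and not part of `rerankers`:

```python
from typing import Dict, List, Tuple

# (doc_id, text, score) triples copied from the example RankedResults output.
teacher_results: List[Tuple[int, str, float]] = [
    (2, "I like you", 0.13376183807849884),
    (1, "I really like you", 0.002901385771110654),
    (0, "I hate you", -2.278848886489868),
    (3, "I despise you", -3.1964476108551025),
]


def scores_by_doc_id(results: List[Tuple[int, str, float]]) -> Dict[int, float]:
    """Hypothetical helper: map each doc_id to its teacher score."""
    return {doc_id: score for doc_id, _text, score in results}


def distillation_triples(
    query: str, results: List[Tuple[int, str, float]]
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: build (query, document, teacher_score) examples."""
    return [(query, text, score) for _doc_id, text, score in results]


teacher_scores = scores_by_doc_id(teacher_results)
triples = distillation_triples("I love you", teacher_results)
```

A student model could then be trained to regress onto the third element of each triple; the choice of loss is left open here.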

For the same purpose, you can also use `ranker.score()` to score a single query-document pair:
```python
> ranker.score(query="I love you", doc="I hate you")
-2.278848886489868
```

Note that `score` is not available for RankGPT rerankers: they return a ranked list of results rather than individual relevance scores.
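
Downstream code that accepts output from any reranker may therefore want to check whether scores exist before reading them. The sketch below uses stand-in dataclasses and assumes that listwise rerankers leave `score` empty and set `has_scores` to `False`, which the text above does not confirm. The reciprocal-rank fallback is an illustrative choice, not something the library provides:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchResult:
    """Stand-in for a single result; score may be missing for listwise rankers."""
    doc_id: int
    rank: int
    score: Optional[float] = None


@dataclass
class SketchRankedResults:
    """Stand-in for the ranked output, with the has_scores flag seen above."""
    results: List[SketchResult]
    has_scores: bool


def ordering_signal(ranked: SketchRankedResults) -> Dict[int, float]:
    """Map doc_id to its score if available, else to reciprocal rank (1/rank)."""
    if ranked.has_scores:
        return {r.doc_id: r.score for r in ranked.results}
    # Assumed fallback: higher is still better, but values are not relevance scores.
    return {r.doc_id: 1.0 / r.rank for r in ranked.results}


listwise = SketchRankedResults(
    results=[
        SketchResult(doc_id=2, rank=1),
        SketchResult(doc_id=1, rank=2),
        SketchResult(doc_id=0, rank=3),
        SketchResult(doc_id=3, rank=4),
    ],
    has_scores=False,
)
signal = ordering_signal(listwise)
```

Reciprocal rank preserves the ordering only, so it should not be mixed with real relevance scores from pointwise rerankers.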

## Features

TODO PRE-RELEASE:
- Allow the use of RankGPT with other LLMs (but no RankZephyr codebase yet)
- Allow easier model_type specification via inference (will also fix the above)

Legend:
- ✅ Supported
- 🟠 Implemented, but not fully fledged
- 📍 Not supported, but intended to be in the future
- ❌ Not supported & not currently planned

Supported models:
- ✅ Any standard SentenceTransformer or Transformers cross-encoder
- 🟠 RankGPT (implemented using the original repo, but missing the improvements from the rankllm repo)
- ✅ T5-based pointwise rankers (InRanker, MonoT5...)
- ✅ Cohere API rerankers
- ✅ Jina API rerankers
- 📍 MixedBread API (reranking API not yet released)
- 📍 RankLLM/RankZephyr (a proper RankLLM implementation should replace the unsafe RankGPT one)
- 📍 LiT5

Supported features:
- ✅ Reranking
- 📍 Training on Python >=3.10 (via interfacing with other libraries)
- 📍 ONNX runtime support (quantised rankers, more efficient inference, etc.)
- ❌ (📍 Maybe?) Training via rerankers directly

## Usage
  
