Metadata-Version: 2.4
Name: splade_index
Version: 0.0.2
Summary: An ultra-fast search index for SPLADE sparse retrieval models.
Home-page: https://github.com/rasyosef/splade_index
Author: Yosef Worku Alemneh
Author-email: 
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: scipy
Requires-Dist: numpy
Requires-Dist: sentence-transformers>=5.0.0
Provides-Extra: core
Requires-Dist: orjson; extra == "core"
Requires-Dist: tqdm; extra == "core"
Requires-Dist: numba; extra == "core"
Provides-Extra: hf
Requires-Dist: huggingface_hub; extra == "hf"
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Provides-Extra: selection
Requires-Dist: jax[cpu]; extra == "selection"
Provides-Extra: evaluation
Requires-Dist: pytrec_eval; extra == "evaluation"
Provides-Extra: full
Requires-Dist: orjson; extra == "full"
Requires-Dist: tqdm; extra == "full"
Requires-Dist: numba; extra == "full"
Requires-Dist: huggingface_hub; extra == "full"
Requires-Dist: black; extra == "full"
Requires-Dist: jax[cpu]; extra == "full"
Requires-Dist: pytrec_eval; extra == "full"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# SPLADE-Index⚡

<i>
SPLADE-Index is an ultra-fast index for SPLADE sparse retrieval models, implemented in pure Python and powered by SciPy sparse matrices. It is built on top of the BM25s library.
</i>
<br/><br/>

SPLADE is a neural retrieval model that learns sparse expansions of queries and documents. Sparse representations have several advantages over dense approaches: efficient use of an inverted index, explicit lexical matching, and interpretability. They also seem to generalize better to out-of-domain data (BEIR benchmark).
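
To make the inverted-index and lexical-matching points concrete, here is a minimal sketch of scoring sparse vectors with a toy inverted index. The term weights are made up for illustration (a real SPLADE model produces them), and the `score` helper is hypothetical; this is not how `splade-index` is implemented internally.

```python
from collections import defaultdict

# Toy sparse vectors: term -> weight. These weights are invented for
# illustration; a real SPLADE model would produce them.
docs = {
    0: {"cat": 1.8, "feline": 1.2, "purr": 0.9},
    1: {"dog": 1.7, "friend": 0.8, "play": 1.1},
    2: {"fish": 1.9, "water": 1.3, "swim": 1.0},
}

# Inverted index: term -> list of (doc_id, weight) postings
inverted_index = defaultdict(list)
for doc_id, vector in docs.items():
    for term, weight in vector.items():
        inverted_index[term].append((doc_id, weight))


def score(query_vector):
    """Dot product between the query and every document sharing a term with it."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        # Only documents that contain the term are visited
        for doc_id, d_weight in inverted_index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


query = {"fish": 1.5, "purr": 1.0, "cat": 0.7}
ranking = score(query)
for doc_id, doc_score in ranking:
    print(f"doc {doc_id}: {doc_score:.2f}")
```

Each score is a sum of per-term products, so you can see exactly which terms contributed to a match. Documents that share no term with the query are never touched.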

For more information about SPLADE models, see the following resources:
 - [SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking](https://arxiv.org/abs/2107.05720)
 - [List of Pretrained Sparse Encoder (Sparse Embeddings) Models](https://sbert.net/docs/sparse_encoder/pretrained_models.html)
 - [Training and Finetuning Sparse Embedding Models with Sentence Transformers v5](https://huggingface.co/blog/train-sparse-encoder)

## Installation

You can install `splade-index` with pip:

```bash
pip install splade-index
```

Recommended (but optional) dependencies:

```bash
# To speed up the top-k selection process, you can install `jax`
pip install "jax[cpu]"
```
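
For intuition about what "top-k selection" means here, the following is a minimal NumPy sketch that picks the `k` highest scores per query from a dense score matrix. The `topk` helper and the score values are hypothetical and only illustrate the operation; the library's own selection code (with or without `jax`) may work differently.

```python
import numpy as np


def topk(scores, k):
    """Return (indices, values) of the k largest entries in each row, sorted descending."""
    k = min(k, scores.shape[1])
    # argpartition moves the k largest entries into the last k columns (unordered)
    part = np.argpartition(scores, -k, axis=1)[:, -k:]
    part_scores = np.take_along_axis(scores, part, axis=1)
    # Sort only those k candidates instead of the whole row
    order = np.argsort(-part_scores, axis=1)
    indices = np.take_along_axis(part, order, axis=1)
    values = np.take_along_axis(part_scores, order, axis=1)
    return indices, values


# One row per query, one column per document (made-up scores)
scores = np.array([[0.1, 2.5, 0.0, 1.7], [3.0, 0.2, 0.9, 0.4]])
indices, values = topk(scores, k=2)
print(indices)
print(values)
```

Partitioning before sorting avoids a full sort of every row, which matters when the corpus is much larger than `k`.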

## Quickstart

Here is a simple example of how to use `splade-index`:

```python
from sentence_transformers import SparseEncoder
from splade_index import SPLADE

# Download a SPLADE model from the 🤗 Hub
model = SparseEncoder("rasyosef/splade-tiny")

# Create your corpus here
corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
    "a fish is a creature that lives in water and swims",
]

# Create the SPLADE retriever and index the corpus
retriever = SPLADE()
retriever.index(model=model, documents=corpus)

# Query the corpus
queries = ["does the fish purr like a cat?"]

# Get top-k results as a tuple of (doc ids, documents, scores).
# All three are arrays of shape (n_queries, k).
results = retriever.retrieve(queries, k=2)
doc_ids, result_docs, scores = results.doc_ids, results.documents, results.scores

for i in range(doc_ids.shape[1]):
    doc_id, doc, score = doc_ids[0, i], result_docs[0, i], scores[0, i]
    print(f"Rank {i+1} (score: {score:.2f}) (doc_id: {doc_id}): {doc}")

# You can save the index to a directory
retriever.save("animal_index_splade")

# ...and load it when you need it
import splade_index

reloaded_retriever = splade_index.SPLADE.load("animal_index_splade", model=model)
```
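
The loop in the quickstart prints results for the first query only. When you pass several queries, a small helper like the sketch below can walk every row. It assumes only what the quickstart states, namely that the doc ids, documents, and scores all have shape `(n_queries, k)`. The `format_results` name and the stand-in data are hypothetical and not part of `splade-index`.

```python
def format_results(doc_ids, documents, scores):
    """Build one printable line per (query, rank) pair from (n_queries, k) results."""
    lines = []
    for q in range(len(doc_ids)):
        for rank in range(len(doc_ids[q])):
            lines.append(
                f"Query {q} | Rank {rank + 1} (score: {scores[q][rank]:.2f}) "
                f"(doc_id: {doc_ids[q][rank]}): {documents[q][rank]}"
            )
    return lines


# Stand-in data shaped like results for 2 queries with k=2 (values are made up)
doc_ids = [[3, 0], [1, 2]]
documents = [["fish doc", "cat doc"], ["dog doc", "bird doc"]]
scores = [[12.5, 9.25], [11.0, 4.5]]

lines = format_results(doc_ids, documents, scores)
for line in lines:
    print(line)
```

The helper uses `x[q][rank]` indexing, which works for nested lists and for NumPy arrays alike, so the fields of `results` from the quickstart could be passed in directly under the shape assumption above.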
