Metadata-Version: 2.3
Name: retsim-pytorch
Version: 0.1.1
Project-URL: Documentation, https://github.com/LLukas22/retsim-pytorch#readme
Project-URL: Issues, https://github.com/LLukas22/retsim-pytorch/issues
Project-URL: Source, https://github.com/LLukas22/retsim-pytorch
Author-email: Lukas Kreussel <65088241+LLukas22@users.noreply.github.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.8
Requires-Dist: torch>=2.0.0
Provides-Extra: convert
Requires-Dist: keras==2.15.0; extra == 'convert'
Requires-Dist: safetensors==0.4.2; extra == 'convert'
Requires-Dist: tensorflow==2.15.0; extra == 'convert'
Provides-Extra: test
Requires-Dist: keras==2.15.0; extra == 'test'
Requires-Dist: onnxruntime==1.17.1; extra == 'test'
Requires-Dist: pytest>=8.0.0; extra == 'test'
Requires-Dist: retvec==1.0.1; extra == 'test'
Requires-Dist: safetensors==0.4.2; extra == 'test'
Requires-Dist: tensorflow-similarity==0.17.1; extra == 'test'
Requires-Dist: tensorflow==2.15.0; extra == 'test'
Description-Content-Type: text/markdown

# retsim-pytorch
[![PyPI Version](https://img.shields.io/pypi/v/retsim-pytorch.svg)](https://pypi.org/project/retsim-pytorch)
[![Supported Python Versions](https://img.shields.io/pypi/pyversions/retsim-pytorch.svg)](https://pypi.org/project/retsim-pytorch)

Welcome to `retsim-pytorch`, the PyTorch adaptation of Google's [RETSim](https://arxiv.org/abs/2311.17264) (Resilient and Efficient Text Similarity) model, which is part of the [UniSim (Universal Similarity)](https://github.com/google/unisim) framework.

The model is designed for efficient and accurate multilingual fuzzy string matching, near-duplicate detection, and string similarity assessment. For details, see the [UniSim documentation](https://github.com/google/unisim).

## Installation

Install `retsim-pytorch` with pip:

```shell
pip install retsim-pytorch
```

## Usage

Configure the model with the `RETSimConfig` class. Its defaults match the configuration of the original UniSim model. To use the same weights as the original Google model, download the [SafeTensors port of the weights](./weights/model.safetensors).

Here's how to use the model in your code:

```python
import torch
from retsim_pytorch import RETSim, RETSimConfig
from retsim_pytorch.preprocessing import binarize

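# Build the model with the default configuration.
# This constructs the architecture only; load the SafeTensors weights
# separately to reproduce the original model's outputs.
config = RETSimConfig()
model = RETSim(config)
model.eval()  # switch to inference mode

# Binarize the input strings, then run a forward pass without gradient tracking
binarized_inputs, chunk_ids = binarize(["hello world"])
with torch.no_grad():
    embedded, unpooled = model(torch.tensor(binarized_inputs))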
```
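
Because RETSim targets near-duplicate detection and string similarity, a natural next step is to compare two embeddings. The sketch below is an illustration only and not part of the `retsim-pytorch` API: `cosine_similarity` is a hypothetical helper, and it assumes each embedding has already been converted to a plain list of floats (for example with something like `embedded[0].tolist()`). It uses toy vectors and the standard library so it runs without the model or its weights.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical helper: cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumption: real ones would
# come from the model output, converted to lists of floats).
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

print(cosine_similarity(vec_a, vec_b))  # identical direction, close to 1
print(cosine_similarity(vec_a, vec_c))  # orthogonal vectors, 0
```

When working directly with tensors, `torch.nn.functional.cosine_similarity` computes the same quantity without leaving PyTorch.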