Metadata-Version: 2.4
Name: tau-eval
Version: 0.2.1
Summary: Text Anonymization Evaluation Library
Author-email: Gabriel Loiseau <gabriel.loiseau@hornetsecurity.com>
Maintainer-email: Gabriel Loiseau <gabriel.loiseau@hornetsecurity.com>
License: GPL-3.0
Keywords: Text anonymization,evaluation,NLP
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: transformers<5.0.0,>=4.48.0
Requires-Dist: sentence-transformers>=3.3.1
Requires-Dist: torch>=2.5.0
Requires-Dist: evaluate>=0.4.1
Requires-Dist: datasets<4.0.0,>=2.14.4
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: tasknet>=1.57.0
Requires-Dist: tasksource>=0.0.47
Requires-Dist: ipywidgets>=8.1.5
Requires-Dist: ipykernel>=6.29.5
Requires-Dist: rich>=14.0.0
Requires-Dist: accelerate>=1.6.0
Requires-Dist: bert-score>=0.3.13
Requires-Dist: nltk>=3.9.1
Requires-Dist: rouge-score>=0.1.2
Requires-Dist: faker>=37.1.0
Requires-Dist: presidio-analyzer>=2.2.358
Requires-Dist: presidio-anonymizer>=2.2.358
Requires-Dist: pip>=25.0.1
Requires-Dist: pytest>=8.3.5
Requires-Dist: numpy>=2.0.2
Requires-Dist: matplotlib>=3.9.4
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: accelerate>=0.20.3; extra == "tests"
Provides-Extra: quality
Requires-Dist: ruff; extra == "quality"
Requires-Dist: pyyaml>=5.3.1; extra == "quality"
Provides-Extra: docs
Requires-Dist: sphinx-rtd-theme; extra == "docs"
Requires-Dist: sphinx; extra == "docs"
Dynamic: license-file

# 𝜏 Tau-Eval: A Unified Evaluation Framework for Useful and Private Text Anonymization

**Tau-Eval** is a user-friendly, modular, and customizable Python library designed to benchmark and evaluate text anonymization algorithms. It enables granular analysis of anonymization impacts from both privacy and utility perspectives. Tau-Eval seamlessly integrates with [LiteLLM](https://www.litellm.ai/) and [🤗 Hugging Face](https://huggingface.co/) to support a wide range of datasets, models, and evaluation metrics.

<div align="center">

[![GNU-GPLv3](https://img.shields.io/badge/license-%20%20GNU%20GPLv3%20-green?style=plastic)](LICENSE)
[![v0.2.0](https://img.shields.io/badge/pypi-v0.2.0-orange)](https://pypi.org/project/tau-eval/0.2.0/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue)](https://www.python.org/downloads/)
[![Tutorials](https://img.shields.io/badge/tutorials-colab-orange)](https://github.com/gabrielloiseau/tau-eval/tree/main/examples)
[![Docs - Read the Docs](https://img.shields.io/static/v1?logo=github&style=flat&color=pink&label=docs&message=Tau-Eval)](https://tau-eval.readthedocs.io/en/latest/)

</div>


## Installation

### From PyPI

Install Tau-Eval via pip:

```
pip install tau-eval
```

### From source

To install from source:

1) Clone this repository to a directory of your choice:
```
git clone https://github.com/gabrielloiseau/tau-eval.git
cd tau-eval
```

2) Create an environment with your preferred package manager. We used [Python 3.10](https://www.python.org/downloads/release/python-3100/) and the dependencies listed in [`pyproject.toml`](pyproject.toml). If you use [conda](https://docs.conda.io/en/latest/), run the following commands from the root of the project:

```
conda create --name taueval python=3.10        # create the environment
conda activate taueval                         # activate the environment
pip install -e .                               # install Tau-Eval in editable mode with its dependencies
```
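
After either installation route, a quick check can confirm that the package is visible to the active environment. The snippet below is a minimal sketch that relies only on the standard library; the helper name `installed_version` is hypothetical and not part of Tau-Eval.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if tau-eval is not installed
# in the active environment.
print(installed_version("tau-eval"))
```

If this prints `None`, the environment used to run the script is probably not the one where Tau-Eval was installed.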


## Quickstart

Tau-Eval is designed for flexibility. With just a few lines of code, you can set up and run evaluations.

#### 1. Define Your Anonymization Model

Create a custom anonymization model by subclassing `Anonymizer`:
```python
from tau_eval.models import Anonymizer

class TestModel(Anonymizer):
    def __init__(self):
        self.name = "Test Model"

    def anonymize(self, text: str) -> str:
        # Implement anonymization logic
        return text

    def anonymize_batch(self, texts: list[str]) -> list[str]:
        # Batch processing
        return texts

```
Or use prebuilt models from `tau_eval.models`.
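
As a slightly more concrete illustration, the sketch below masks e-mail addresses with a regular expression. It is an assumption-laden example rather than a model shipped with Tau-Eval: the class name `EmailMasker`, the `[EMAIL]` placeholder, and the pattern are all hypothetical, and the class is kept standalone so it runs without the library. To plug it into an experiment, it would subclass `Anonymizer` in the same way as `TestModel`.

```python
import re

# Deliberately simple pattern for illustration; it will miss unusual but valid addresses.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class EmailMasker:
    """Hypothetical anonymizer that replaces e-mail addresses with a placeholder.

    Exposes the same two methods as TestModel (anonymize and anonymize_batch);
    in Tau-Eval it would inherit from tau_eval.models.Anonymizer.
    """

    def __init__(self, placeholder: str = "[EMAIL]"):
        self.name = "Email Masker"
        self.placeholder = placeholder

    def anonymize(self, text: str) -> str:
        # Replace every match of the pattern with the placeholder.
        return EMAIL_PATTERN.sub(self.placeholder, text)

    def anonymize_batch(self, texts: list[str]) -> list[str]:
        # Apply the single-text method to each element, preserving order.
        return [self.anonymize(text) for text in texts]


masker = EmailMasker()
print(masker.anonymize("Contact jane.doe@example.com for details."))
# prints: Contact [EMAIL] for details.
```

Implementing `anonymize_batch` in terms of `anonymize` keeps the two methods consistent; a model with real batching support (for example a neural model) would likely override it for speed.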

#### 2. Configure Evaluation Metrics
Use built-in metrics from `tau_eval.metrics` or define your own following this signature:
```python
Callable[[str | list[str], str | list[str]], dict]
```
This gives you full control over what you evaluate and how.
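
A custom metric matching that signature might look like the sketch below. It is only an illustration: the function name `token_retention`, the returned key, and the assumption that the first argument is the original text and the second the anonymized text are not taken from the Tau-Eval documentation.

```python
from __future__ import annotations


def token_retention(original: str | list[str], anonymized: str | list[str]) -> dict:
    """Mean share of original whitespace tokens that still appear after anonymization."""
    # Normalize single strings to one-element lists so both input shapes work.
    originals = [original] if isinstance(original, str) else list(original)
    outputs = [anonymized] if isinstance(anonymized, str) else list(anonymized)
    if len(originals) != len(outputs):
        raise ValueError("inputs must contain the same number of texts")

    scores = []
    for src, out in zip(originals, outputs):
        src_tokens = src.lower().split()
        out_tokens = set(out.lower().split())
        kept = sum(token in out_tokens for token in src_tokens)
        # An empty source text has nothing to lose, so it counts as fully retained.
        scores.append(kept / len(src_tokens) if src_tokens else 1.0)

    mean = sum(scores) / len(scores) if scores else 0.0
    return {"token_retention": mean}


print(token_retention("Alice lives in Paris", "[NAME] lives in [CITY]"))
# prints: {'token_retention': 0.5}
```

Returning a dictionary lets a single metric report several named values; how Tau-Eval aggregates the keys of custom metrics is not shown here.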

#### 3. Instantiate Tasks
Tasks can be created using prebuilt options in `tau_eval.tasks`, or customized using `CustomTask`. Tau-Eval also supports [tasksource](https://github.com/sileod/tasksource) for dataset integration.
```python
from tau_eval.tasks import DeIdentification
from tasknet import AutoTask

anli = AutoTask("anli/a1")
deid = DeIdentification(dataset="ai4privacy/pii-masking-400k")
```

#### 4. Configure and Run Your Experiment
Define an experiment configuration:
```python
from tau_eval.config import ExperimentConfig

config = ExperimentConfig(
    exp_name="test-experiment",
    classifier_name="answerdotai/ModernBERT-base",
    train_task_models=True,
    train_with_generations=False,
)
```
Run the experiment:
```python
from tau_eval.experiment import Experiment

Experiment(
    models=[TestModel(), ...],
    metrics=["bertscore", "rouge"],
    tasks=[anli, deid],
    config=config
).run()
```
#### 5. Visualize Results

Tau-Eval includes built-in visualization tools for comparing anonymization models and their evaluation results. They are available under `tau_eval.visualization`.


## Tutorials

Explore the tutorials in the [`examples/`](https://github.com/gabrielloiseau/tau-eval/tree/main/examples) folder to learn how to use **Tau-Eval** effectively.


## Contributors

- **[Gabriel Loiseau](https://gabrielloiseau.github.io)**, *Hornetsecurity, Inria Lille*


## Citation

If you use 𝜏 **Tau-Eval** in your work, please cite our paper as follows:

```
@misc{loiseau2025taueval,
      title={Tau-Eval: A Unified Evaluation Framework for Useful and Private Text Anonymization},
      author={Gabriel Loiseau and Damien Sileo and Damien Riquet and Maxime Meyer and Marc Tommasi},
      year={2025},
      eprint={2506.05979},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.05979},
}
```
