Metadata-Version: 2.4
Name: discophon
Version: 0.0.7
Summary: The Phoneme Discovery Benchmark
Author: Maxime Poli
Author-email: CoML <dev@cognitive-ml.fr>
License-Expression: MIT
License-File: LICENSE
Keywords: machine learning,speech
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: C++
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Requires-Dist: filelock>=3.20.2
Requires-Dist: fsspec[http]>=2026.2.0
Requires-Dist: joblib>=1.5.3
Requires-Dist: numba>=0.63.1
Requires-Dist: numpy>=2.3.5
Requires-Dist: polars>=1.36.1
Requires-Dist: praat-textgrids>=1.4.0
Requires-Dist: scipy>=1.17.0
Requires-Dist: soundfile>=0.13.1
Requires-Dist: soxr>=1.0.0
Requires-Dist: tqdm>=4.67.1
Requires-Dist: xarray>=2025.12.0
Provides-Extra: abx
Requires-Dist: fastabx>=0.7.0; extra == 'abx'
Provides-Extra: baselines
Requires-Dist: minimal-hubert>=0.0.2; extra == 'baselines'
Requires-Dist: spidr[train]>=0.1.3; extra == 'baselines'
Description-Content-Type: text/markdown

# DiscoPhon

<div class="page-subtitle">Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units</div>

[arXiv](https://arxiv.org/abs/2603.18612) · [GitHub](https://github.com/bootphon/discophon) · [Website](https://benchmarks.cognitive-ml.fr/discophon)

DiscoPhon is a multilingual benchmark for evaluating unsupervised phoneme discovery from discrete speech units.
Given only 10 hours of speech in an unseen language, models must produce discrete units that map to a predefined phoneme inventory.
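
To make "units that map to a phoneme inventory" concrete, here is a minimal sketch of one possible many-to-one mapping: each discrete unit is assigned the phoneme it co-occurs with most often in frame-aligned data. This is an illustrative assumption, not DiscoPhon's evaluation code or metric; the function name `majority_vote_mapping` and the toy sequences are hypothetical, and the tutorials define the actual protocol.

```python
from collections import Counter, defaultdict


def majority_vote_mapping(units, phonemes):
    """Map each unit to the phoneme it co-occurs with most often.

    Hypothetical helper for illustration only; `units` and `phonemes`
    are assumed to be frame-aligned sequences of equal length.
    """
    if len(units) != len(phonemes):
        raise ValueError("units and phonemes must be frame-aligned")
    counts = defaultdict(Counter)
    for unit, phoneme in zip(units, phonemes):
        counts[unit][phoneme] += 1
    # Keep the most frequent phoneme for every unit.
    return {unit: c.most_common(1)[0][0] for unit, c in counts.items()}


# Toy, made-up data: one unit id and one gold phoneme label per frame.
units = [3, 3, 7, 7, 7, 1, 1, 3]
phonemes = ["a", "a", "t", "t", "a", "s", "s", "a"]

mapping = majority_vote_mapping(units, phonemes)
predicted = [mapping[u] for u in units]
# Fraction of frames whose mapped phoneme matches the gold label.
accuracy = sum(p == g for p, g in zip(predicted, phonemes)) / len(phonemes)
print(mapping, accuracy)
```

In this toy case unit `7` is mostly aligned with `t`, so the single frame labeled `a` counts as an error under the mapping.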

## Getting started

- Install this package:
  ```bash
  pip install discophon
  ```
- [Follow the tutorials](./docs/index.md) to download data, evaluate models, and prepare your submission.
- Check the [current leaderboard](./leaderboard/index.md).
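
After installing, a quick sanity check can confirm that the interpreter and the package are in place. The sketch below relies only on the standard library. The helper names `installed_version` and `python_is_supported` are hypothetical. The minimum Python version (3.12) is taken from the package metadata, which also declares two optional extras, `abx` and `baselines`, installable with `pip install "discophon[abx]"` or `pip install "discophon[baselines]"`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Taken from the package metadata: Requires-Python >=3.12.
MIN_PYTHON = (3, 12)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def python_is_supported(version_info=sys.version_info) -> bool:
    """Check the (major, minor) interpreter version against MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


print("Python supported:", python_is_supported())
print("discophon version:", installed_version("discophon"))
```

If the second line prints `None`, the package is not installed in the active environment.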

## References

```bibtex
@misc{poli2026discophon,
  title={{DiscoPhon}: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units},
  author={Maxime Poli and Manel Khentout and Angelo Ortiz Tandazo and Ewan Dunbar and Emmanuel Chemla and Emmanuel Dupoux},
  year={2026},
  eprint={2603.18612},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.18612},
}
```
