Metadata-Version: 2.4
Name: picolang
Version: 0.5.0
Summary: Simple, fast dictionary-based language detector
Author: Piero Toffanin
Author-email: Aurelien Thirion <aurelien.thirion@circl.lu>
Maintainer-email: Aurelien Thirion <aurelien.thirion@circl.lu>
License-Expression: AGPL-3.0-only
Project-URL: Homepage, https://ail-project.org/
Project-URL: Repository, https://github.com/ail-project/PicoLang
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing :: Linguistic
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# PicoLang

Simple, fast dictionary-based language detector for short texts.

## Installation

```bash
pip install picolang
```

## Usage

```python
from picolang.detector import detect

print(detect("bonjour")) # ('fr', 0.45)
print(detect("学中文")) # ('zh', 0.45)
print(detect("ciao mondo")) # ('it', 0.9)
print(detect("El gato doméstico")) # ('es', 0.45)

# Optionally, specify a subset of languages to consider
print(detect("ciao", languages=["de", "ro"])) # ('de', 0.45)
```

`detect(text, languages=[])` -> tuple (`iso_639_1`, `confidence`)

## Supported Languages

 * Afrikaans
 * Albanian
 * Arabic
 * Basque
 * Bengali
 * Bulgarian
 * Catalan
 * Chinese
 * Czech
 * Danish
 * Dutch
 * English
 * Esperanto
 * Estonian
 * Finnish
 * French
 * German
 * Greek
 * Hebrew
 * Hindi
 * Hungarian
 * Indonesian
 * Italian
 * Japanese
 * Kabyle
 * Kazakh
 * Korean
 * Latvian
 * Lithuanian
 * Macedonian
 * Norwegian
 * Occitan
 * Polish
 * Portuguese
 * Romanian
 * Russian
 * Serbian
 * Slovak
 * Slovenian
 * Spanish
 * Swedish
 * Thai
 * Turkish
 * Ukrainian
 * Vietnamese
 * Farsi

## Limitations

This detector was designed for handling small texts (< 150 characters). It will probably not work reliably for longer text sequences. As it relies on dictionaries, if a word is missing or misspelled, the detection will fail.

## Contributing

If you want to add a new language, or improve an existing one, add more words to the respective dictionary in the `dictionaries` folder.

## License

AGPLv3
