Metadata-Version: 2.4
Name: tamil-utils
Version: 0.0.1
Summary: Tiny Tamil text utilities: normalize, tokenize, stopword removal, graphemes
Author: Arulnidhi Karunanidhi
License: MIT
Project-URL: Homepage, https://github.com/arulnidhii/tamil-utils
Project-URL: Issues, https://github.com/arulnidhii/tamil-utils/issues
Keywords: tamil,indic,nlp,unicode
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: regex>=2023.0
Dynamic: license-file

# tamil-utils

Tiny **Tamil** text utilities: `normalize`, `tokens`, `remove_stopwords`, `graphemes`.
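
As a rough illustration of why normalization matters for Tamil text, the sketch below uses only Python's standard `unicodedata` module. It is not the implementation behind `tamil_utils.normalize`; the helper name `nfc_normalize` is hypothetical, and NFC is assumed here purely for the example.

```python
import unicodedata


def nfc_normalize(text: str) -> str:
    """Hypothetical helper: return text in Unicode NFC form (stdlib only)."""
    return unicodedata.normalize("NFC", text)


# The Tamil vowel sign O can be encoded as one code point (U+0BCA)
# or as the two-part sequence U+0BC6 + U+0BBE. Both render identically.
composed = "\u0b95\u0bca"           # KA + VOWEL SIGN O
decomposed = "\u0b95\u0bc6\u0bbe"   # KA + VOWEL SIGN E + VOWEL SIGN AA

print(composed == decomposed)                                # False
print(nfc_normalize(composed) == nfc_normalize(decomposed))  # True
print(len(decomposed), len(nfc_normalize(decomposed)))       # 3 2
```

Strings that look the same on screen can compare unequal until they are brought to a common form, so normalizing before tokenizing or matching stopwords is generally the safer order.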

```python
from tamil_utils import normalize, tokens, remove_stopwords, graphemes

s = "இது ஒரு சோதனை 👋🏽"
print(tokens(s))                        # ['இது', 'ஒரு', 'சோதனை']
print(remove_stopwords(tokens(s)))      # ['சோதனை']
print(graphemes("👩🏽‍💻"))               # ['👩🏽‍💻']
