Metadata-Version: 2.1
Name: olunicodenormalizer
Version: 1.0.0
Summary: Olchiki Unicode Normalization Toolkit
Home-page: UNKNOWN
Author: Shivnath Kisku
Author-email: 
License: MIT
Keywords: olchiki,unicode,text normalization,indic
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Education
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Description-Content-Type: text/markdown
License-File: LICENSE

# olunicodenormalizer
Unicode normalization toolkit for ᱚᱞ-ᱪᱦᱤᱠᱤ (Ol Chiki) text, applied word by word.
# install
```bash
pip install olunicodenormalizer
```
# usage
**initialization and cleaning**
```python
# import
from olunicodenormalizer import Normalizer
from pprint import pprint
# initialize
norm = Normalizer()
# normalize a single word
word = 'ᱡᱚᱦᱟᱨ'
result = norm(word)
print(f"Non-norm:{word}; Norm:{result['normalized']}")
print("--------------------------------------------------")
pprint(result)
```
> output 

```
Non-norm:ᱡᱚᱦᱟᱨ; Norm:ᱡᱚᱦᱟᱨ
--------------------------------------------------
{'given': 'ᱡᱚᱦᱟᱨ', 'normalized': 'ᱡᱚᱦᱟᱨ', 'ops': []}
```
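The result above is a plain dict with the keys `given`, `normalized` and `ops`. A minimal sketch of how such a result might be summarized follows; `describe_result` is a hypothetical helper, not part of the package, and the sample dict is copied from the output above so the snippet runs without the package installed.

```python
def describe_result(result):
    """Return a one-line summary of a normalizer result dict.

    Assumes the shape shown in the output above:
    keys 'given', 'normalized' and 'ops'.
    """
    given = result["given"]
    normalized = result["normalized"]
    # A normalized value of None means the word was rejected
    if normalized is None:
        return f"{given!r}: rejected"
    # No recorded operations and identical text: nothing was changed
    if normalized == given and not result["ops"]:
        return f"{given!r}: unchanged"
    return f"{given!r} -> {normalized!r} ({len(result['ops'])} op(s))"


# Sample copied from the documented output, not produced by a live call
sample = {'given': 'ᱡᱚᱦᱟᱨ', 'normalized': 'ᱡᱚᱦᱟᱨ', 'ops': []}
print(describe_result(sample))
```

With the real package, the dict returned by `Normalizer()(word)` could be passed in place of `sample`, assuming it keeps the same three keys.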



```python
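from olunicodenormalizer import Normalizer
# initialize without english (default): "ASD123" normalizes to None
norm=Normalizer()
print("without english:",norm("ASD123")["normalized"])
# initialize with english allowed: "ASD123" is returned as-is
norm=Normalizer(allow_english=True)
print("with english:",norm("ASD123")["normalized"])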
```
> output

```
without english: None
with english: ASD123
```
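Because a rejected word comes back with `normalized` set to `None`, callers that process longer text may want to filter those words out. The sketch below shows one way to do that; `normalize_text` and `fake_normalizer` are hypothetical names, and the stand-in only mimics the `"ASD123"` behavior shown above (it rejects pure-ASCII words), not the real normalization rules.

```python
def normalize_text(text, normalizer):
    """Normalize whitespace-separated words and drop rejected ones.

    `normalizer` is any callable returning a dict with a 'normalized'
    key, such as the Normalizer instance shown above.
    """
    kept = []
    for word in text.split():
        normalized = normalizer(word)["normalized"]
        # Skip words the normalizer rejected (normalized is None)
        if normalized is not None:
            kept.append(normalized)
    return " ".join(kept)


def fake_normalizer(word):
    # Stand-in for Normalizer(): rejects pure-ASCII words and passes
    # everything else through unchanged. This is an assumption made for
    # illustration only.
    normalized = None if word.isascii() else word
    return {"given": word, "normalized": normalized, "ops": []}


print(normalize_text("ᱡᱚᱦᱟᱨ ASD123", fake_normalizer))
```

With the package installed, `Normalizer()` could be passed instead of `fake_normalizer`. Whether dropping rejected words is appropriate depends on the application; keeping the original word is an equally reasonable choice.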

 


Change Log
===========

0.0.5 (9/03/2022)
-------------------
- added details for execution map
- checkop typo correction

0.0.6 (9/03/2022)
-------------------
- broken diacritics op addition

0.0.7 (11/03/2022)
-------------------
- Assamese replacement
- word op and unicode op mapping
- modifier list modification
- doc string for call and initialization
- verbosity removal
- typo correction for operation
- unit test updates
- 'এ' replacement correction
- NonGylphUnicodes
- Legacy symbols option
- legacy mapper added 
- added bn:bd declaration

0.0.8 (14/03/2022)
-------------------
- MultipleConsonantDiacritics handling change
- to+hosonto correction
- invalid hosonto correction 

0.0.9 (15/04/2022)
-------------------
- base normalizer
- language class
- olchiki extension
- complex root normalization 

0.0.10 (15/04/2022)
-------------------
- added conjuncts
- exception for english words

0.0.11 (15/04/2022)
-------------------
- fixed no space char issue for olchiki

0.0.12 (26/04/2022)
-------------------
- fixed consonants orders 

0.0.13 (26/04/2022)
-------------------
- fixed non char followed by diacritics 

0.0.14 (01/05/2022)
-------------------
- word based normalization
- encoding fix

0.0.15 (02/05/2022)
-------------------
- import correction

0.0.16 (02/05/2022)
-------------------
- local variable issue

0.0.17 (17/05/2022)
-------------------
- nukta mod break

0.0.18 (08/06/2022)
-------------------
- no space chars fix


0.0.19 (15/06/2022)
-------------------
- no space chars further fix
- base_olchiki_compose to avoid false op flags
- added foreign conjuncts


0.0.20 (01/08/2022)
-------------------
- এ্যা replacement correction

0.0.21 (01/08/2022)
-------------------
- "য","ব" + hosonto combination correction
- added 'ব্ল্য' in conjuncts

0.0.22 (22/08/2022)
-------------------
- \u200d combination limiting

0.0.23 (23/08/2022)
-------------------
- \u200d condition change

0.0.24 (26/08/2022)
-------------------
- \u200d error handling

0.0.25 (10/09/22)
-------------------
- removed unnecessary operations: fixRefOrder,fixOrdersForCC
- added conjuncts: 'র্ন্ত','ঠ্য','ভ্ল'

0.1.0 (20/10/22)
-------------------
- added indic parser
- fixed language class

0.1.1 (21/10/22)
-------------------
- added nukta and diacritic maps for indics 
- cleaned conjuncts for now
- fixed issues with no-space and connector

0.1.2 (10/12/22)
-------------------
- allow halant ending for indic language except olchiki

0.1.3 (10/12/22)
-------------------
- broken char break cases for halant 

0.1.4 (01/01/23)
-------------------
- added Sylheti Nagri

0.1.5 (01/01/23)
-------------------
- cleaned Panjabi double quotes in diacritic map

0.0.1 (26/08/23)
-------------------
- added olchiki punctuations 

