Metadata-Version: 2.1
Name: LEKCut
Version: 0.1
Summary: LEKCut (เล็ก คัด) is a Thai tokenization library that ports the deep learning model to the onnx model.
Home-page: https://github.com/PyThaiNLP/LEKCut
Author: Wannaphong Phatthiyaphaibun
Author-email: wannaphong@yahoo.com
License: Apache Software License 2.0
Project-URL: Source, https://github.com/PyThaiNLP/LEKCut
Project-URL: Bug Reports, https://github.com/PyThaiNLP/LEKCut/issues
Keywords: thai,NLP,natural language processing,text analytics,text processing,localization,computational linguistics,Thai language
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: onnxruntime
Requires-Dist: onnx

# LEKCut
<a href="https://pypi.python.org/pypi/lekcut"><img alt="pypi" src="https://img.shields.io/pypi/v/lekcut.svg"/></a>

LEKCut (เล็ก คัด) is a Thai tokenization library that ports the deep learning model to the onnx model.

## Install

> pip install lekcut

## How to use

```python
from lekcut import word_tokenize
word_tokenize("ทดสอบการตัดคำ")
# output: ['ทดสอบ', 'การ', 'ตัด', 'คำ']
```

**API**

```python
word_tokenize(text: str, model: str="deepcut", path: str="default") -> List[str]
```

## Model
- ```deepcut``` - We ported deepcut model from tensorflow.keras to ONNX model. The model and code come from [Deepcut's Github](https://github.com/rkcosmos/deepcut). The model is [here](https://github.com/PyThaiNLP/LEKCut/blob/main/lekcut/model/deepcut.onnx).

### Load custom model

If you has trained custom your model from deepcut or other that LEKCut support, You can load the custom model by ```path``` in ```word_tokenize``` after porting your model.

- How to train custom model ith your dataset by deepcut - [Notebook](https://github.com/rkcosmos/deepcut/blob/master/notebooks/training.ipynb) (Needs to update ```deepcut/train.py``` before train model)

## How to porting model?

See ```notebooks/```
