Metadata-Version: 2.4
Name: khmereasytools
Version: 0.3.2
Summary: A simple library for Khmer text processing, including keyword extraction, segmentation, OCR, and validation.
Home-page: https://github.com/back-kh/khmereasytools
Author: Nimol Thuon
Author-email: nimol.thuon@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: khmercut>=0.1.0
Requires-Dist: khmernltk>=1.6
Requires-Dist: pytesseract>=0.3.8
Requires-Dist: Pillow>=9.0.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary


# Khmer Easy Tools

A simple, user-friendly Python library for common Khmer Natural Language Processing (NLP) tasks. This package provides easy-to-use functions for keyword extraction, segmentation, POS tagging, OCR, and character validation.

## Installation

Install the package using pip:

```bash
pip install khmereasytools
```

**For OCR functionality**, you must also install Google's Tesseract OCR engine on your system.
-   [Tesseract Installation Guide](https://github.com/tesseract-ocr/tesseract/wiki)
-   Make sure to install the Khmer (`khm`) language data.

## How to Use

### Khmer Character Validation (`is_khmer`)
Checks if a string contains Khmer characters.
```python
import khmereasytools as ket

print(ket.is_khmer("សួស្តី"))  # True
print(ket.is_khmer("Hello")) # False
```

### Keyword Extraction (`khfilter`)
Segments text and removes stop words.
```python
import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្តស្ថិតនៅក្នុងខេត្តសៀមរាប"
keywords = ket.khfilter(text)
print(f"Keywords: '{{keywords}}'")
```

### Text Segmentation (`khseg`)
Segments text into words using `khmer-nltk`.
```python
import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្ត"
words = ket.khseg(text)
print(f"Segmented Words: {words}")
```

### Syllable Segmentation (`syllable_segment`)
Segments text into syllables.
```python
import khmereasytools as ket
text = "សាលារៀន"
syllables = ket.syllable_segment(text)
print(f"Syllables: {syllables}")
```

### Part-of-Speech Tagging (`pos_tag`)
Tags words with their part of speech.
```python
import khmereasytools as ket
text = "ខ្ញុំ ស្រឡាញ់ ភាសាខ្មែរ"
tags = ket.pos_tag(text)
print(f"POS Tags: {tags}")
```

### OCR from Image (`ocr_from_image`)
Extracts Khmer text from an image.
```python
import khmereasytools as ket
# Make sure you have an image file e.g., 'khmer_text.png'
# text_from_image = ket.ocr_from_image('khmer_text.png')
# print(f"Text from OCR: {{text_from_image}}")
```
