Metadata-Version: 2.1
Name: nlpturk
Version: 0.0.2
Summary: Turkish NLP library
Author-email: "Bedii A. Aydoğan" <nlpturk.ai@gmail.com>
License: MIT License
        
        Copyright (c) 2022 Bedii A. Aydoğan
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/nlpturk/nlpturk
Project-URL: Repository, https://github.com/nlpturk/nlpturk
Keywords: nlp,turkish
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: build
Provides-Extra: dev
License-File: LICENSE

# nlpTurk - Turkish NLP library

nlpTurk is an open-source Turkish NLP library that provides machine-learning-based models for sentence boundary detection, lemmatization, and POS tagging.

## Installation & Usage

nlpTurk can be installed from [PyPI](https://pypi.org/project/nlpturk/).

```bash
pip install nlpturk
```

nlpTurk offers a simple API to extract sentences, lemmas and POS tags.

```python
import nlpturk

text = "Sosyal medya hayatımıza hızlı girdi.ama yazım kurallarına dikkat eden pek yok :)"
doc = nlpturk(text)

# iterate over tokens
for token in doc:
    print(f"token: {token.text}, lemma: {token.lemma}, pos: {token.pos}")

"""
Prints:
  token: Sosyal, lemma: sosyal, pos: ADJ
  token: medya, lemma: medya, pos: NOUN
  ...
"""

# or access individual tokens by index
token = doc[5]
print(f"token: {token.text}, sent_start: {token.is_sent_start}, sent_end: {token.is_sent_end}")
token = doc[6]
print(f"token: {token.text}, sent_start: {token.is_sent_start}, sent_end: {token.is_sent_end}")

"""
Prints:
  token: ., sent_start: False, sent_end: True
  token: ama, sent_start: True, sent_end: False
"""

# iterate over sentences
for i, sent in enumerate(doc.sents):
    print(f"sentence #{i+1}: {sent.text}")
    for token in sent:
        print(f"  token: {token.text}, lemma: {token.lemma}, pos: {token.pos}")

"""
Prints:
  sentence #1: Sosyal medya hayatımıza hızlı girdi.
    token: Sosyal, lemma: sosyal, pos: ADJ
    ...
  sentence #2: ama yazım kurallarına dikkat eden pek yok :)
    token: ama, lemma: ama, pos: CCONJ
    ...
"""
```
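
The token attributes shown above can be combined into small helpers. The following is a minimal sketch, not part of nlpTurk: the function names `pos_counts` and `split_on_sent_start` are hypothetical, and the tokens are stand-in objects with made-up values so the snippet runs without the models. It assumes only the `text`, `pos`, and `is_sent_start` attributes used in the example above.

```python
from collections import Counter
from types import SimpleNamespace


def pos_counts(tokens):
    """Count how often each POS tag occurs in an iterable of tokens."""
    return Counter(token.pos for token in tokens)


def split_on_sent_start(tokens):
    """Group token texts into sentences, opening a new group at each sentence start."""
    sentences = []
    for token in tokens:
        if token.is_sent_start or not sentences:
            sentences.append([])
        sentences[-1].append(token.text)
    return sentences


# Stand-in tokens (hypothetical values) that mimic the attributes used above.
# With nlpTurk installed, a doc returned by nlpturk(text) should be usable instead.
tokens = [
    SimpleNamespace(text="Sosyal", pos="ADJ", is_sent_start=True),
    SimpleNamespace(text="medya", pos="NOUN", is_sent_start=False),
    SimpleNamespace(text=".", pos="PUNCT", is_sent_start=False),
    SimpleNamespace(text="ama", pos="CCONJ", is_sent_start=True),
]

print(pos_counts(tokens)["NOUN"])   # 1
print(split_on_sent_start(tokens))  # [['Sosyal', 'medya', '.'], ['ama']]
```

In practice `doc.sents` already provides the sentences; the second helper only illustrates how the `is_sent_start` flags relate to them.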

## Performance

The models were evaluated on the test dataset; all scores in the table below are percentages. Detailed evaluation and benchmarking results are available in the repository's [benchmarks](https://github.com/nlpturk/nlpturk/blob/master/benchmarks).

|                        | accuracy | precision | recall | f1-score | 
| :--------------------- | :------: | :-------: | :----: | :------: | 
| **Sentence Segmenter** |    -     |   98.09   |  96.05 |  97.06   |  
| **POS Tagger**         |    -     |   95.75   |  96.26 |  96.01   |   
| **Lemmatizer**         |  96.87   |     -     |    -   |    -     |
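
As a quick sanity check on the table above, the f1-score of a single class is the harmonic mean of its precision and recall. The sketch below recomputes it from the listed values; the function name `harmonic_f1` is illustrative and not part of nlpTurk. For the POS tagger the result differs from the table in the last digit, presumably because the reported scores are averaged over tag classes or rounded beforehand (the source does not say).

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values copied from the table above.
print(round(harmonic_f1(98.09, 96.05), 2))  # 97.06, matches the Sentence Segmenter row
print(round(harmonic_f1(95.75, 96.26), 2))  # 96.0, whereas the table reports 96.01
```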

<br/>You can also run the benchmark on your own dataset:

```bash
git clone https://github.com/nlpturk/nlpturk.git
cd nlpturk
pip install -r requirements.txt
python -m nlpturk benchmark --data_path path/to/data --output_path path/to/output
```
