Metadata-Version: 2.1
Name: reason
Version: 0.5.1
Summary: Easy-to-use NLP toolbox
Home-page: https://github.com/alisoltanirad/reason
Author: Ali Soltani Rad
Author-email: soltaniradali@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: Natural Language :: English
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: numpy

# Reason

[![License](https://img.shields.io/pypi/l/reason.svg)](https://github.com/alisoltanirad/Reason/blob/main/LICENSE)
[![PyPI](https://img.shields.io/pypi/v/reason.svg)](https://pypi.org/project/reason/)
[![Downloads](https://pepy.tech/badge/reason)](https://pepy.tech/project/reason)
[![Lines of Code](https://sonarcloud.io/api/project_badges/measure?project=alisoltanirad_reason&metric=ncloc)](https://sonarcloud.io/dashboard?id=alisoltanirad_reason)
[![Activity](https://img.shields.io/github/last-commit/alisoltanirad/reason)](https://github.com/alisoltanirad/Reason/)

An easy-to-use natural language processing toolbox for Python.


## Packages

- **classify**  
Naive Bayes classifier
- **metrics**  
Confusion matrix, accuracy
- **tag**  
POS tagger, regex, lookup and default tagging tools
- **tokenize**  
Regex word and sentence tokenizer
- **stem**  
Porter and regex stemmer
- **analysis**  
Frequency distribution
- **util**  
Bigrams, trigrams and ngrams


## Install

Install the latest stable version using pip:
```
pip install reason
```


## Quick Start

Classification:

```python
>>> from reason.classify import NaiveBayesClassifier
>>> classifier = NaiveBayesClassifier(train_set)
>>> y_pred = classifier.classify(new_data)

>>> from reason.metrics import accuracy
>>> accuracy(y_true, y_pred)
0.9358
```
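For intuition, accuracy is the fraction of predictions that match the true labels. The plain-Python sketch below is not Reason's implementation; the function name `accuracy_sketch` and the sample labels are made up for illustration.

```python
def accuracy_sketch(y_true, y_pred):
    """Return the fraction of positions where prediction equals truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty sequence")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, made up for this example.
sample_true = ["pos", "neg", "pos", "neg"]
sample_pred = ["pos", "neg", "neg", "neg"]
print(accuracy_sketch(sample_true, sample_pred))  # 3 of 4 labels match
```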

Confusion matrix:

```python
>>> from reason.metrics import ConfusionMatrix
>>> cm = ConfusionMatrix(y_true, y_pred)

>>> cm
68 21 13
16 70 11
14 10 77

>>> cm[actual, predicted]
16

>>> from reason.metrics import BinaryConfusionMatrix
>>> bcm = BinaryConfusionMatrix(b_y_true, b_y_pred)

>>> bcm.precision()
0.7837
>>> bcm.recall()
0.8055
>>> bcm.f1_score()
0.7944
```
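The scores above follow from counting (actual, predicted) label pairs. The sketch below shows one way to derive them in plain Python; it is an assumption about the standard definitions, not Reason's code, and `confusion_counts`, `binary_scores` and the sample labels are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


def binary_scores(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical binary labels, made up for this example.
sample_true = [1, 1, 1, 0, 0, 0]
sample_pred = [1, 1, 0, 1, 0, 0]
cells = confusion_counts(sample_true, sample_pred)
print(cells[(1, 0)])  # actual 1, predicted 0
print(binary_scores(sample_true, sample_pred))
```

Keying the counts by `(actual, predicted)` mirrors the `cm[actual, predicted]` lookup shown earlier.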

Part-of-speech tagging:

```python
>>> from reason.tag import POSTagger

>>> text = "10 tools from the file"
>>> tagger = POSTagger()
>>> tagger.tag(text)
[('10', 'CD'), ('tools', 'NNS'), ('from', 'IN'), ('the', 'AT'), ('file', 'NN')]
```
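The tag package also lists regex, lookup and default tagging tools. As a rough illustration of how those three ideas can be chained, the sketch below tries a lookup table first, then regex rules, then a default tag. The table, the rules and the name `tag_sketch` are invented for this example and do not reflect Reason's API or trained model.

```python
import re

# Hypothetical lookup table and regex rules, made up for this sketch.
LOOKUP = {"from": "IN", "the": "AT"}
PATTERNS = [
    (re.compile(r"^\d+$"), "CD"),   # digits only: cardinal number
    (re.compile(r"^.+s$"), "NNS"),  # ends in "s": guess a plural noun
]
DEFAULT_TAG = "NN"


def tag_sketch(text):
    """Tag whitespace-separated words: lookup, then regex, then default."""
    tagged = []
    for word in text.split():
        tag = LOOKUP.get(word.lower())
        if tag is None:
            tag = next((t for rx, t in PATTERNS if rx.match(word)), DEFAULT_TAG)
        tagged.append((word, tag))
    return tagged


print(tag_sketch("10 tools from the file"))
```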

Word tokenization:

```python
>>> from reason.tokenize import word_tokenize

>>> text = "Testing reason0.1.0, (on: 127.0.0.1). Cool stuff..."
>>> word_tokenize(text, 'alphanumeric')
['Testing', 'reason0.1.0', 'on', '127.0.0.1', 'Cool', 'stuff']
```
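To see how a regex can keep tokens such as version numbers and IP addresses intact, here is a minimal sketch using only the standard `re` module. The pattern and the name `word_tokenize_sketch` are assumptions for illustration, not the pattern Reason uses.

```python
import re

# A run of word characters, optionally continued by "." or "'" when
# more word characters follow (so "127.0.0.1" and "what's" stay whole).
WORD_RE = re.compile(r"\w+(?:[.']\w+)*")


def word_tokenize_sketch(text):
    """Return alphanumeric tokens, dropping standalone punctuation."""
    return WORD_RE.findall(text)


text = "Testing reason0.1.0, (on: 127.0.0.1). Cool stuff..."
print(word_tokenize_sketch(text))
# ['Testing', 'reason0.1.0', 'on', '127.0.0.1', 'Cool', 'stuff']
```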

Sentence tokenization:

```python
>>> from reason.tokenize import sent_tokenize

>>> text = "Hey, what's up? I love using Reason library!"
>>> sents = sent_tokenize(text)
>>> for sent in sents:
...     print(sent)
Hey, what's up?
I love using Reason library!
```
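A simple regex-based splitter can be sketched as follows: split on whitespace that follows sentence-ending punctuation. This is an illustrative assumption rather than Reason's implementation, and it will wrongly split after abbreviations such as "Dr."; `sent_tokenize_sketch` is a hypothetical name.

```python
import re


def sent_tokenize_sketch(text):
    """Split text on whitespace that directly follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


text = "Hey, what's up? I love using Reason library!"
for sent in sent_tokenize_sketch(text):
    print(sent)
```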

Stemming:

```python
>>> from reason.stem import PorterStemmer

>>> text = "watched birds flying"
>>> stemmer = PorterStemmer()
>>> stemmer.stem(text)
['watch', 'bird', 'fly']

>>> from reason.stem import regex_stem

>>> regex_pattern = r'^(.*?)(ous)?$'
>>> regex_stem('dangerous', regex_pattern)
danger
```
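The regex pattern above works because `(.*?)` is lazy: it captures as little as possible, which lets the optional `(ous)?` group claim the suffix before `$`. The sketch below reproduces that behaviour with the standard `re` module; `regex_stem_sketch` is a hypothetical helper, not Reason's function.

```python
import re


def regex_stem_sketch(word, pattern):
    """Return the first capture group of pattern, or the word if no match."""
    match = re.match(pattern, word)
    return match.group(1) if match else word


pattern = r"^(.*?)(ous)?$"
print(regex_stem_sketch("dangerous", pattern))  # danger
print(regex_stem_sketch("danger", pattern))     # danger (no suffix to strip)
```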

Preprocess text (tokenizing + stemming):

```python
>>> from reason import preprocess

>>> text = "What's up? I love using Reason library!"
>>> preprocess(text)
[["what's", 'up', '?'], ['i', 'love', 'us', 'reason', 'librari', '!']]
```

Frequency distribution:

```python
>>> from reason.analysis import FreqDist

>>> words = ['hey', 'hey', 'oh', 'oh', 'oh', 'yeah']
>>> fd = FreqDist(words)

>>> fd
Frequency Distribution
Most-Common: [('oh', 3), ('hey', 2), ('yeah', 1)]
>>> fd.most_common(2)
[('oh', 3), ('hey', 2)]
>>> fd['yeah']
1
```
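For comparison, the same counts can be obtained with `collections.Counter` from the Python standard library. This is only a point of reference and says nothing about how `FreqDist` is built internally.

```python
from collections import Counter

words = ['hey', 'hey', 'oh', 'oh', 'oh', 'yeah']
counts = Counter(words)

print(counts.most_common(2))  # [('oh', 3), ('hey', 2)]
print(counts['yeah'])         # 1
```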

N-grams:

```python
>>> sent = "Reason is easy to use"

>>> from reason.util import bigrams
>>> bigrams(sent)
[('Reason', 'is'), ('is', 'easy'), ('easy', 'to'), ('to', 'use')]

>>> from reason.util import trigrams
>>> trigrams(sent)
[('Reason', 'is', 'easy'), ('is', 'easy', 'to'), ('easy', 'to', 'use')]

>>> from reason.util import ngrams
>>> ngrams(sent, 4)
[('Reason', 'is', 'easy', 'to'), ('is', 'easy', 'to', 'use')]
```
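An n-gram is a sliding window of `n` consecutive tokens. The sketch below builds them by zipping shifted copies of a token list; it assumes whitespace tokenization and is not Reason's implementation (`ngrams_sketch` is a hypothetical name).

```python
def ngrams_sketch(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # zip stops at the shortest shifted copy, so windows never overrun.
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = "Reason is easy to use".split()
print(ngrams_sketch(tokens, 2))
print(ngrams_sketch(tokens, 4))
```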


## Dependencies

- [NumPy](https://numpy.org)  
Used to handle data
- [Pandas](https://pandas.pydata.org)  
Used in classify package

Note that only *NumPy* is installed automatically with *Reason*; install *Pandas* separately if you use the classify package.


## License

MIT. See [LICENSE](https://github.com/alisoltanirad/Reason/blob/main/LICENSE) for details.


