Metadata-Version: 2.1
Name: text-hammer
Version: 0.1.4
Summary: A text preprocessing package
Home-page: UNKNOWN
Author: Abhishek Jaiswal
Author-email: abhishek.jaiswal26102001@gmail.com
License: Apache License 2.0
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: beautifulsoup4 (==4.9.1)
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: spacy
Requires-Dist: TextBlob

#### Dependencies
```
pip install spacy==2.2.3
python -m spacy download en_core_web_sm
pip install beautifulsoup4==4.9.1
pip install textblob==0.15.3
```


#### Installation
```
pip install text_hammer
```


#### How to use it for preprocessing

Python 3 and spaCy (including the `en_core_web_sm` model) must be installed for this to work.
```
import re
import text_hammer as th

def get_clean(x):
    x = str(x).lower().replace('\\', '').replace('_', ' ')
    x = th.cont_exp(x)
    x = th.remove_emails(x)
    x = th.remove_urls(x)
    x = th.remove_html_tags(x)
    x = th.remove_rt(x)
    x = th.remove_accented_chars(x)
    x = th.remove_special_chars(x)
    x = re.sub("(.)\\1{2,}", "\\1", x)
    return x
```
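
The chaining pattern in `get_clean` can be sketched with the standard library alone. The helpers below (`strip_emails`, `strip_urls`, `squeeze_spaces`, `run_pipeline`) and their regular expressions are hypothetical stand-ins written for illustration; they are not text_hammer's implementations and will not match its behaviour in every case.

```python
import re

# Hypothetical stand-ins for illustration only; NOT text_hammer's own code.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")

def strip_emails(text):
    """Remove anything that looks like an email address."""
    return EMAIL_RE.sub("", text)

def strip_urls(text):
    """Remove anything that starts with http://, https:// or www."""
    return URL_RE.sub("", text)

def squeeze_spaces(text):
    """Collapse runs of whitespace left behind by the removals."""
    return " ".join(text.split())

def run_pipeline(text, steps):
    """Apply each cleaning step in order, feeding the result to the next."""
    for step in steps:
        text = step(text)
    return text

# Emails are removed before URLs so the URL pattern never sees a mail domain.
STEPS = [str.lower, strip_emails, strip_urls, squeeze_spaces]

cleaned = run_pipeline(
    "Mail ME at someone@example.com or visit https://example.com NOW", STEPS
)
print(cleaned)  # mail me at or visit now
```

Keeping the steps in a list makes it easy to reorder them or drop one without touching the others.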

To apply the functions one at a time instead:
```
import pandas as pd
import numpy as np
import text_hammer as th

df = pd.read_csv('imdb_reviews.txt', sep = '\t', header = None)
df.columns = ['reviews', 'sentiment']

# Apply the preprocessing steps in sequence
df['reviews'] = df['reviews'].apply(lambda x: th.cont_exp(x)) #you're -> you are; i'm -> i am
df['reviews'] = df['reviews'].apply(lambda x: th.remove_emails(x))
df['reviews'] = df['reviews'].apply(lambda x: th.remove_html_tags(x))
df['reviews'] = df['reviews'].apply(lambda x: th.remove_urls(x))

df['reviews'] = df['reviews'].apply(lambda x: th.remove_special_chars(x))
df['reviews'] = df['reviews'].apply(lambda x: th.remove_accented_chars(x))
df['reviews'] = df['reviews'].apply(lambda x: th.make_base(x)) #ran -> run
df['reviews'] = df['reviews'].apply(lambda x: th.spelling_correction(x).raw_sentences[0]) #seplling -> spelling
```

Note: Avoid using `make_base` and `spelling_correction` on very large datasets; they can take hours to process.
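
One way to judge whether a step is affordable is to time it on a small sample and extrapolate. The sketch below is a rough estimate under the assumption that cost per row is roughly constant; `estimate_seconds` and `slow_step` are hypothetical names, and `slow_step` is only a placeholder for whichever function you want to measure.

```python
import time

def estimate_seconds(func, items, sample_size=100):
    """Time `func` on the first `sample_size` items and scale to the full list."""
    sample = items[:sample_size]
    if not sample:
        return 0.0
    start = time.perf_counter()
    for item in sample:
        func(item)
    elapsed = time.perf_counter() - start
    # Average cost per item multiplied by the total number of items.
    return elapsed / len(sample) * len(items)

def slow_step(text):
    """Placeholder workload (hypothetical); swap in the real function to measure."""
    return " ".join(word[::-1] for word in text.split())

rows = ["this is a short review"] * 1000
estimate = estimate_seconds(slow_step, rows, sample_size=50)
print(f"estimated total: {estimate:.4f} s")
```

With a DataFrame, `df['reviews'].tolist()` could be passed as `items`.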


#### Extra

```
import re

x = 'lllooooovvveeee youuuu'
# Collapse any character repeated three or more times into a single occurrence
x = re.sub(r"(.)\1{2,}", r"\1", x)
print(x)  # love you
```
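
Collapsing every long run to a single character also shortens words whose correct spelling has a double letter. A minimal sketch of a variant that keeps a configurable number of copies follows; `squeeze_repeats` and its `keep` parameter are hypothetical names and are not part of text_hammer.

```python
import re

def squeeze_repeats(text, keep=1):
    """Replace each run of 3+ identical characters with `keep` copies of it."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # (.) captures one character; \1{2,} requires at least two more of the same.
    return re.sub(r"(.)\1{2,}", lambda m: m.group(1) * keep, text)

print(squeeze_repeats("lllooooovvveeee youuuu"))   # love you
print(squeeze_repeats("sooooo goooood"))           # so god
print(squeeze_repeats("sooooo goooood", keep=2))   # soo good
```

Neither setting is right for every word, so a spelling-correction pass afterwards may still be needed.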

