Metadata-Version: 2.1
Name: cleaner_panda
Version: 0.1.7
Summary: A package for handling various data preprocessing tasks
Home-page: https://github.com/EmirhanSyl/cleaner-panda
Download-URL: https://github.com/EmirhanSyl/cleaner-panda/archive/refs/tags/v0.1.7.tar.gz
Author: asimtarik & emirs
Author-email: support@cleanpanda.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scikit-learn
Requires-Dist: nltk
Requires-Dist: category_encoders
Requires-Dist: scipy
Requires-Dist: bs4
Requires-Dist: contractions

# Cleaner Panda
 Programming For Data Engineering course final project

![https://github.com/EmirhanSyl/cleaner-panda/blob/main/logo.jpg](logo.jpg)

 https://pypi.org/project/cleaner-panda/

 ## Installation
 `pip install cleaner-panda`


## Modules


### Missing Value Handler
- `strategy enum {MEAN, MEDIAN, CONSTANT, REMOVE_ROW, REMOVE_COLUMN, FORWARD_BACKWARD}`
- `cont_int = 0, const_str =”none”, const_date=01.01.2024…`
- `replace_missing_values(dataFrame, strategy=”strategy.MEAN”, column=0)`
- `replace_mean(dataframe, column)`
- `replace_median(dataframe, column)`
- `replace_constant(dataframe, column, constant)`
- `replace_remove_row(dataframe, column)`
- `replace_remove_column(dataframe, column)`
- `replace_forward_backward(dataframe, column)`

### Outlier Handler
- `identify_outliers_iqr(data, threshold=1.5)`
- `handle_outliers_iqr(data, threshold=1.5, replacement=None)` //replacement: Value to replace outliers with (e.g., median, mean) or None to remove outliers


### Scaler
- `standardize_data(dataframe)`
- `normalize_data(dataframe)`
- `robust_scale_data(dataframe)`
- `normalize_vectors(dataframe)`
- `log_transform_data(dataframe)`


### Text Cleaner
- `remove_common_words(dataframe, column)`
- `convert_to_lowercase(dataframe, column)` // Stopwords are words like "the", "is", "and", "in", etc., that occur frequently in a language
- `remove_punctuation(dataframe, column)`
- `lemmatization(dataframe, column)`
- `expand_contractions(dataframe, column)` // (e.g., "can't" to "cannot", "won't" to "will not")
- `remove_special_characters(dataframe, column, remove=[‘.’])`
- `remove_numerical(dataframe, column)`
- `filter_words(dataframe, column, remove=[“fuck”])`


### Data Type Converter
- 


### Categorical Encoder
- `label_encoding(dataframe, column)`
- `one_hot_encoding(dataframe, column)`
- `ordinal_encoding(dataframe, column)`


### Date Time Handler
- `convert_date_to_strings(dataframe column)`
- `extract_components(dataframe, column)`
- `reformat_date(dataframe, column)`
- `calculate_datetime_differences()`
- `convert_datetime_to_different_timezones`
- `shift_time()`
- `handle_irregular_time_intervals()`
