Metadata-Version: 2.1
Name: pyoptimus
Version: 21.9.0b0
Summary: Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
Home-page: https://github.com/hi-primus/optimus/
Author: Argenis Leon
Author-email: argenisleon@gmail.com
License: APACHE
Keywords: datacleaner,data-wrangling,data-cleansing,data-profiling
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Description-Content-Type: text/markdown
Requires-Dist: dask (==2021.5.0)
Requires-Dist: distributed (==2021.5.0)
Requires-Dist: numba (>=0.53.1)
Requires-Dist: jsonschema (>=3.2.0)
Requires-Dist: simplejson (>=3.17.2)
Requires-Dist: cryptography (>=3.2.1)
Requires-Dist: imgkit (>=1.2.2)
Requires-Dist: packaging (>=20.9)
Requires-Dist: requests (>=2.24.0)
Requires-Dist: tqdm (>=4.51.0)
Requires-Dist: fastnumbers (>=3.1.0)
Requires-Dist: multipledispatch (>=0.6.0)
Requires-Dist: numpy (>=1.21.2)
Requires-Dist: yellowbrick
Requires-Dist: deprecated (>=1.2.12)
Requires-Dist: setuptools (>=50.3.1)
Requires-Dist: Jinja2 (>=3.0.1)
Requires-Dist: pygments (>=2.2.0)
Requires-Dist: ipython (>=7.24.1)
Requires-Dist: humanize (>=3.9.0)
Requires-Dist: psutil (>=5.8.0)
Requires-Dist: ordered-set (>=4.0.2)
Requires-Dist: deepdiff (>=5.5.0)
Requires-Dist: statsmodels (>=0.12.2)
Requires-Dist: glom (>=20.11.0)
Requires-Dist: fastavro (>=1.4.1)
Requires-Dist: fast-histogram (>=0.9)
Requires-Dist: nltk (>=3.6.2)
Requires-Dist: pendulum (>=2.1.2)
Requires-Dist: rply (>=0.7.8)
Requires-Dist: pybigquery (>=0.10.1)
Requires-Dist: pandavro (>=1.6.0)
Requires-Dist: openpyxl (>=3.0.7)
Requires-Dist: tabulate (>=0.8.9)
Requires-Dist: matplotlib (>=3.4.2)
Requires-Dist: seaborn (>=0.11.1)
Requires-Dist: wordninja (==2.0.0)
Requires-Dist: jellyfish (>=0.8.7)
Requires-Dist: Metaphone (==0.6)
Requires-Dist: num2words (==0.5.10)
Requires-Dist: xlrd (>=2.0.1)
Requires-Dist: s3fs (>=2021.7.0)
Requires-Dist: aiobotocore[boto3] (>=1.3.3)
Requires-Dist: python-Levenshtein (>=0.12.2)
Requires-Dist: string-grouper (==0.5.0)
Requires-Dist: python-magic (>=0.4.15) ; sys_platform != "win32"
Requires-Dist: python-magic-bin (==0.4.14) ; sys_platform == "win32"
Provides-Extra: ai
Requires-Dist: tensorflow (>=2.0.0b1) ; extra == 'ai'
Requires-Dist: keras (>=2.4.3) ; extra == 'ai'
Requires-Dist: nltk (>=3.4.5) ; extra == 'ai'
Provides-Extra: all
Requires-Dist: pyspark (>=2.4.1) ; extra == 'all'
Requires-Dist: findspark (>=1.3.0) ; extra == 'all'
Requires-Dist: pandas (==1.2.4) ; extra == 'all'
Requires-Dist: dask[complete] (==2021.5.0) ; extra == 'all'
Requires-Dist: distributed (==2021.5.0) ; extra == 'all'
Requires-Dist: dask-ml (>=1.9.0) ; extra == 'all'
Requires-Dist: pyarrow (==1.0.1) ; extra == 'all'
Requires-Dist: coiled (>=0.0.30) ; extra == 'all'
Requires-Dist: vaex (==4.1) ; extra == 'all'
Requires-Dist: gputil ; extra == 'all'
Requires-Dist: dask[complete] (==2021.4.0) ; extra == 'all'
Requires-Dist: distributed (==2021.4.0) ; extra == 'all'
Requires-Dist: tensorflow (>=2.0.0b1) ; extra == 'all'
Requires-Dist: keras (>=2.4.3) ; extra == 'all'
Requires-Dist: nltk (>=3.4.5) ; extra == 'all'
Requires-Dist: sqlalchemy (==1.3.18) ; extra == 'all'
Requires-Dist: flask ; extra == 'all'
Provides-Extra: api
Requires-Dist: flask ; extra == 'api'
Provides-Extra: cudf
Requires-Dist: gputil ; extra == 'cudf'
Requires-Dist: dask[complete] (==2021.4.0) ; extra == 'cudf'
Requires-Dist: distributed (==2021.4.0) ; extra == 'cudf'
Provides-Extra: dask
Requires-Dist: dask[complete] (==2021.5.0) ; extra == 'dask'
Requires-Dist: distributed (==2021.5.0) ; extra == 'dask'
Requires-Dist: dask-ml (>=1.9.0) ; extra == 'dask'
Requires-Dist: pyarrow (==1.0.1) ; extra == 'dask'
Requires-Dist: coiled (>=0.0.30) ; extra == 'dask'
Provides-Extra: db
Requires-Dist: sqlalchemy (==1.3.18) ; extra == 'db'
Provides-Extra: docs
Requires-Dist: sphinx ; extra == 'docs'
Requires-Dist: pytest ; extra == 'docs'
Requires-Dist: mock ; extra == 'docs'
Requires-Dist: nose ; extra == 'docs'
Provides-Extra: lint
Requires-Dist: pep8 ; extra == 'lint'
Requires-Dist: pyflakes ; extra == 'lint'
Provides-Extra: pandas
Requires-Dist: pandas (==1.2.4) ; extra == 'pandas'
Provides-Extra: spark
Requires-Dist: pyspark (>=2.4.1) ; extra == 'spark'
Requires-Dist: findspark (>=1.3.0) ; extra == 'spark'
Provides-Extra: test
Requires-Dist: pytest ; extra == 'test'
Requires-Dist: mock ; extra == 'test'
Requires-Dist: nose ; extra == 'test'
Provides-Extra: vaex
Requires-Dist: vaex (==4.1) ; extra == 'vaex'

# Optimus

[![Logo Optimus](https://raw.githubusercontent.com/hi-primus/optimus/develop-21.9/images/optimus-logo.png)](https://hi-optimus.com)

![Tests](https://github.com/hi-primus/optimus/actions/workflows/main.yml/badge.svg)
[![Docker image updated](https://github.com/hi-primus/optimus/actions/workflows/docker.yml/badge.svg)](https://hub.docker.com/r/hiprimus/optimus)
[![PyPI Latest Release](https://img.shields.io/pypi/v/pyoptimus.svg)](https://pypi.org/project/pyoptimus/) 
[![GitHub release](https://img.shields.io/github/release/hi-primus/optimus.svg?include_prereleases)](https://github.com/hi-primus/optimus/releases)
[![CalVer](https://img.shields.io/badge/calver-YY.MM.MICRO-22bfda.svg)](http://calver.org)

[![Downloads](https://pepy.tech/badge/pyoptimus)](https://pepy.tech/project/pyoptimus)
[![Downloads](https://pepy.tech/badge/pyoptimus/month)](https://pepy.tech/project/pyoptimus/month)
[![Downloads](https://pepy.tech/badge/pyoptimus/week)](https://pepy.tech/project/pyoptimus/week)
[![Mentioned in Awesome Data Science](https://awesome.re/mentioned-badge.svg)](https://github.com/bulutyazilim/awesome-datascience) 
[![Slack](https://img.shields.io/badge/chat-slack-red.svg?logo=slack&color=36c5f0)](https://communityinviter.com/apps/hi-bumblebee/welcome)

# Get started 🏃

## Try Optimus

To launch a live notebook server and try Optimus on Binder or Colab, click one of the following badges:

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hi-primus/optimus/develop-21.9)
[![Colab](https://img.shields.io/badge/launch-colab-yellow.svg?logo=googlecolab&color=e6a210)](https://colab.research.google.com/github/hi-primus/optimus/blob/master/examples/10_min_to_optimus.ipynb)

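## Installation (pip)

In your terminal, type `pip install pyoptimus`. Engine-specific dependencies are packaged as optional extras, so also install the one you plan to use, for example `pip install "pyoptimus[pandas]"` or `pip install "pyoptimus[dask]"`.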

### Requirements
* Python 3.7 or 3.8

## Examples

Start with the 10 Minutes to Optimus [notebook](https://github.com/hi-primus/optimus/blob/develop-21.9/examples/10_min_to_optimus.ipynb), which covers the basics.

You can also browse the [Examples](https://github.com/hi-primus/optimus/tree/develop-21.9/examples/examples.md) for notebooks on data cleaning, data munging, profiling, data enrichment, and building ML and DL models.

Also check out the [Cheat Sheet](https://htmlpreview.github.io/?https://github.com/hi-primus/optimus/blob/develop-21.9/docs/cheatsheet/optimus_cheat_sheet.html).

## Start Optimus

Start Optimus with one of the supported engines: `"pandas"`, `"dask"`, `"cudf"` or `"dask_cudf"`.

```python
from optimus import Optimus
op = Optimus("pandas")
```
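Because each engine relies on a different backend library, it can help to check what is installed before choosing one. The sketch below is a hypothetical helper, not part of Optimus: `ENGINE_MODULES` is an assumed mapping from engine name to the top-level modules that engine likely needs, and `available_engines` uses `importlib.util.find_spec` to look those modules up without importing them.

```python
import importlib.util

# Assumed mapping (not taken from Optimus): engine name accepted by
# Optimus(...) -> top-level modules that engine probably needs at runtime.
ENGINE_MODULES = {
    "pandas": ("pandas",),
    "dask": ("dask", "distributed", "pandas"),
    "cudf": ("cudf",),
    "dask_cudf": ("dask_cudf",),
}


def available_engines(engine_modules=ENGINE_MODULES):
    """Return the engine names whose required modules can all be found.

    find_spec is only given top-level names here, so the check locates the
    modules without importing (or initializing) any backend.
    """
    return [
        engine
        for engine, modules in engine_modules.items()
        if all(importlib.util.find_spec(name) is not None for name in modules)
    ]


if __name__ == "__main__":
    engines = available_engines()
    print("Installed engines:", engines or "none")
```

Passing one of the returned names to `Optimus(...)` should then avoid a missing-backend error at startup, assuming the mapping matches your Optimus version.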

## Loading data

Optimus can load CSV, JSON, Parquet, Avro, and Excel data from a local file or a URL.

```python
# csv
df = op.load.csv("../examples/data/foo.csv")

# json
df = op.load.json("../examples/data/foo.json")

# from a URL
df = op.load.json("https://raw.githubusercontent.com/hi-primus/optimus/develop-21.9/examples/data/foo.json")

# parquet
df = op.load.parquet("../examples/data/foo.parquet")

# ...or any other supported format, such as this Excel file
df = op.load.file("../examples/data/titanic3.xls")
```
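`op.load.file` picks the loader for you. One simple way to do that kind of dispatch is by file extension; the hypothetical `guess_format` below sketches it with the standard library. It is not Optimus's implementation, which may detect the type differently (for example, by inspecting the file contents), and the format names in `FORMATS` are assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical suffix -> format table (an assumption, not Optimus's table).
FORMATS = {
    ".csv": "csv",
    ".json": "json",
    ".parquet": "parquet",
    ".avro": "avro",
    ".xls": "excel",
    ".xlsx": "excel",
}


def guess_format(path_or_url):
    """Guess a loader name from the extension of a local path or a URL."""
    # For URLs, look only at the path component (ignores ?query and #fragment).
    path = urlparse(path_or_url).path or path_or_url
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None
```

For instance, `guess_format("../examples/data/titanic3.xls")` returns `"excel"`, while the `foo.json` URL above maps to `"json"`.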

You can also load data from Oracle, Redshift, MySQL and Postgres.

## Saving Data

```python
# csv
df.save.csv("data/foo.csv")

# json
df.save.json("data/foo.json")

# parquet
df.save.parquet("data/foo.parquet")
```

You can also save data to Oracle, Redshift, MySQL and Postgres.

## Create dataframes

You can also create a dataframe from scratch:
```python
df = op.create.dataframe({
    'A': ['a', 'b', 'c', 'd'],
    'B': [1, 3, 5, 7],
    'C': [2, 4, 6, None],
    'D': ['1980/04/10', '1980/04/10', '1980/04/10', '1980/04/10']
})
```

`display` renders your data with extra information such as column numbers, column data types, and highlighted whitespace.

```python
display(df)
```
![Optimus table display](https://raw.githubusercontent.com/hi-primus/optimus/develop-21.9/readme/images/table.png)

## Cleaning and Processing

Optimus was created to make data cleaning a breeze. The API was designed to be easy for newcomers and familiar to people coming from pandas.
Optimus extends the standard DataFrame functionality with `.rows` and `.cols` accessors.
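To see why accessors make chaining possible, here is a toy, standard-library-only sketch of the pattern. `ToyFrame`, `_Cols`, and `_Rows` are hypothetical classes, not Optimus's internals: each accessor is a property that returns a small namespace object bound to the frame, and every operation returns a new frame, so calls such as `.rows.sort(...).cols.lower(...)` can be chained.

```python
class ToyFrame:
    """Tiny column-oriented table: {column name: list of values}."""

    def __init__(self, data):
        self.data = data

    @property
    def cols(self):
        # Column-wise namespace bound to this frame.
        return _Cols(self)

    @property
    def rows(self):
        # Row-wise namespace bound to this frame.
        return _Rows(self)


class _Cols:
    def __init__(self, frame):
        self._frame = frame

    def lower(self, names):
        """Lowercase string values in the given columns; returns a new frame."""
        data = {}
        for name, values in self._frame.data.items():
            if name in names:
                values = [v.lower() if isinstance(v, str) else v for v in values]
            data[name] = list(values)
        return ToyFrame(data)


class _Rows:
    def __init__(self, frame):
        self._frame = frame

    def sort(self, column, order="asc"):
        """Reorder every column by the values of `column`; returns a new frame."""
        data = self._frame.data
        index = sorted(
            range(len(data[column])),
            key=lambda i: data[column][i],
            reverse=(order == "desc"),
        )
        return ToyFrame({name: [values[i] for i in index] for name, values in data.items()})
```

Because no method mutates its input, the original frame stays untouched and each step in a chain works on the previous step's result.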

For example, after loading data you can transform it and apply some predefined cleaning functions in a single chain:

```python
new_df = (
    df.rows.sort("rank", "desc")
    .cols.lower(["names", "function"])
    .cols.date_format("date arrival", "yyyy/MM/dd", "dd-MM-YYYY")
    .cols.years_between("date arrival", "dd-MM-YYYY", output_cols="from arrival")
    .cols.normalize_chars("names")
    .cols.remove_special_chars("names")
    .rows.drop(df["rank"] > 8)
    .cols.rename("*", str.lower)
    .cols.trim("*")
    .cols.unnest("japanese name", output_cols="other names")
    .cols.unnest("last position seen", separator=",", output_cols="pos")
    .cols.drop(["last position seen", "japanese name", "date arrival", "cybertronian", "nulltype"])
)
```
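Several steps in the chain above are easy to reason about one value at a time. The functions below are hypothetical, standard-library approximations of what `date_format`, `years_between`, `normalize_chars`, and `remove_special_chars` do to a single value; they are not Optimus's implementations. They assume the patterns `yyyy/MM/dd` and `dd-MM-YYYY` correspond to the strftime codes `%Y/%m/%d` and `%d-%m-%Y`, and this `years_between` counts whole elapsed years, whereas Optimus may compute the value differently.

```python
import re
import unicodedata
from datetime import date, datetime


def reformat_date(value: str, src: str = "%Y/%m/%d", dst: str = "%d-%m-%Y") -> str:
    """Re-render a date string, e.g. "1980/04/10" -> "10-04-1980"."""
    return datetime.strptime(value, src).strftime(dst)


def years_between(start: date, end: date) -> int:
    """Whole years elapsed from `start` to `end`."""
    years = end.year - start.year
    # Subtract one if the anniversary has not been reached yet in end's year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def normalize_chars(text: str) -> str:
    """Strip accents: decompose characters (NFKD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_special_chars(text: str) -> str:
    """Keep only word characters (letters, digits, underscore) and whitespace."""
    return re.sub(r"[^\w\s]", "", text)
```

Applying `normalize_chars` before `remove_special_chars`, as the chain does, turns accented letters into their plain forms instead of deleting them.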

# Need help? 🛠️

## Feedback

Feedback drives Optimus's future, so please take a couple of minutes to help shape the Optimus roadmap: http://bit.ly/optimus_survey

If you have a suggestion or feature request, open an issue at https://github.com/hi-primus/optimus/issues

## Troubleshooting

If you run into problems, see our [Troubleshooting Guide](https://github.com/hi-primus/optimus/tree/develop-21.9/troubleshooting.md).

# Contributing to Optimus 💡

Contributions go far beyond pull requests and commits. We are happy to receive any kind of contribution, including:

* [Documentation](https://github.com/hi-primus/optimus/tree/develop-21.9/docs/source) updates, enhancements, designs, or bugfixes.
* Spelling or grammar fixes.
* README.md corrections or redesigns.
* Adding unit or functional [tests](https://github.com/hi-primus/optimus/tree/develop-21.9/tests).
* Triaging GitHub issues, especially determining whether an issue still persists or is reproducible.
* [Blogging, speaking about, or creating tutorials](https://hioptimus.com/category/blog/) about Optimus and its many features.
* Helping others on our official chats.

# Backers and Sponsors

Become a [backer](https://opencollective.com/optimus#backer) or a [sponsor](https://opencollective.com/optimus#sponsor) and get your image on our README on GitHub with a link to your site.

[![OpenCollective](https://opencollective.com/optimus/backers/badge.svg)](#backers) [![OpenCollective](https://opencollective.com/optimus/sponsors/badge.svg)](#sponsors)


