Metadata-Version: 2.1
Name: dpiper
Version: 0.1.2
Summary: A Python module for maintaining pipeline syntax of Pandas statements.
Home-page: https://github.com/miketarpey/piper
Author: Mike Tarpey
Author-email: miketarpey@gmx.net
License: BSD
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Description-Content-Type: text/markdown
Requires-Dist: cx-oracle
Requires-Dist: numpy (>=1.20.0)
Requires-Dist: openpyxl (>=3.0.6)
Requires-Dist: pandas (>=1.0.0)
Requires-Dist: psycopg2
Requires-Dist: pypyodbc
Requires-Dist: seaborn (>=0.11.1)
Requires-Dist: xlsxwriter (>=1.3.2)

![PyPI version shields.io](https://img.shields.io/pypi/v/dpiper)
![PyPI version shields.io](https://img.shields.io/pypi/pyversions/dpiper)
![PyPI version shields.io](https://img.shields.io/pypi/l/dpiper)

# Piper

__Piper__ is a python package to help simplify data wrangling tasks with
[pandas](https://pandas.pydata.org/). It provides a set of wrapper functions or
'verbs' that provide a simpler interface to standard Pandas functions.

Piper functions accept and receive pandas dataframe objects. They can be used as
standalone functions but are more powerful when used together in a
[Jupyter](https://jupyter.org/) notebook cell to form a data pipeline.

Instead of the traditional the 'dot notation' or method calling technique from
within an object, piper receives and passes on the dataframe object between
functions using the piper magic command link operator (that is __'>>'__) within
a cell. So, in traditional pandas, to see the first 5 rows of a dataframe:

```python
df.head()
```

The equivalent in piper would be:

```python
%%piper
df >> head()
```

# Installation
To install the package, enter the following:

```unix
pip install dpiper
```

# Documentation
Piper API documentation available at [readthedocs](https://dpiper.readthedocs.io/en/latest/)

# Quick start

## __Example #1__

A dataframe consisting of two columns A and B.

```python
import pandas as pd
import numpy as np

np.random.seed(42)
df = pd.DataFrame({'A': np.random.randint(10, 1000, 10),
                   'B': np.random.randint(10, 1000, 10)})
df.head()
```

|    |   A |   B |
|---:|----:|----:|
|  0 | 112 | 476 |
|  1 | 445 | 224 |
|  2 | 870 | 340 |
|  3 | 280 | 468 |
|  4 | 116 |  97 |

### __Piper equivalent__

```python
from piper.defaults import *
```
```python
%%piper
df >> assign(C = lambda x: x.A + x.B,
             D = lambda x: x.C < 1000)
   >> where("~D")
```

|    |   A |   B |    C | D     |
|---:|----:|----:|-----:|:------|
|  2 | 870 | 340 | 1210 | False |
|  8 | 624 | 673 | 1297 | False |

## __Example #2__
Suppose you need the following function to trim columnar text data.

```python
def trim_columns(df):
    ''' Trim blanks for given dataframe '''

    str_cols = df.select_dtypes(include='object').columns

    for col in str_cols:
        df[col] = df[col].str.strip()

    return df

import pandas as pd
from piper.factory import sample_data

df = sample_data()

# Select all columns EXCEPT 'dates'
subset_cols = ['order_dates', 'regions', 'countries', 'values_1', 'values_2']

criteria1 = ~df['countries'].isin(['Italy', 'Portugal'])
criteria2 = df['values_1'] > 40
criteria3 = df['values_2'] < 25

df2 = (df[subset_cols][criteria1 & criteria2 & criteria3]
       .pipe(trim_columns)
       .sort_values('countries', ascending=False))
df2.head()
```

### __Piper equivalent__

Using the __%%piper__ magic function, piper verbs can be combined with standard
python functions.

```python
from piper.defaults import *
```

```python
%%piper
sample_data()
>> trim_columns()
>> select('-dates')
>> where(""" ~countries.isin(['Italy', 'Portugal']) &
              values_1 > 40 &
              values_2 < 25 """)
>> order_by('-countries')
>> head(5)
```

Result:
| dates | order_dates | countries | ids | values_1 | values_2 |
| ----- | ----------- | --------- | --- | -------- | -------- |
2020-03-03 | 2020-03-09 | Sweden | E |	194  |20
2020-05-02 | 2020-05-08 | Sweden | D |	322  |14
2020-01-20 | 2020-01-26 | Spain  | A |  183  |20
2020-02-01 | 2020-02-07 | Norway | D |	344  |21
2020-05-06 | 2020-05-12 | Norway | B |	135  |21


