Metadata-Version: 2.1
Name: etlrules
Version: 0.3.3
Summary: A python ETL rule engine operating with data frames.
Author-email: Ciprian Miclaus <ciprianm@gmail.com>
License: MIT License
        
        Copyright (c) 2023, Ciprian Miclaus
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
        
Project-URL: homepage, https://github.com/ciprianmiclaus2/etlrules
Project-URL: documentation, https://ciprianmiclaus2.github.io/etlrules/
Project-URL: repository, https://github.com/ciprianmiclaus2/etlrules
Project-URL: history, https://github.com/ciprianmiclaus2/etlrules/blob/main/HISTORY.md
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS.md
Requires-Dist: pandas>2
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: pyarrow>=13.0.0
Requires-Dist: sqlalchemy>=2.0.0
Provides-Extra: polars
Requires-Dist: polars>1; extra == "polars"
Provides-Extra: dask
Requires-Dist: aiohttp; extra == "dask"
Requires-Dist: dask; extra == "dask"
Requires-Dist: requests; extra == "dask"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: black; extra == "test"
Requires-Dist: isort; extra == "test"
Requires-Dist: flake8; extra == "test"
Requires-Dist: flake8-docstrings; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: dev
Requires-Dist: tox; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: virtualenv; extra == "dev"
Requires-Dist: pip; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: toml; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs==1.5.3; extra == "docs"
Requires-Dist: mkdocs-include-markdown-plugin==3.9.1; extra == "docs"
Requires-Dist: mkdocs-material==8.1.11; extra == "docs"
Requires-Dist: mkdocstrings==0.18.1; extra == "docs"
Requires-Dist: mkdocs-material-extensions==1.3; extra == "docs"
Requires-Dist: mkdocs-autorefs==0.3.1; extra == "docs"
Requires-Dist: mike==1.1.2; extra == "docs"

# etlrules


<p align="center">
<a href="https://pypi.python.org/pypi/etlrules">
    <img src="https://img.shields.io/pypi/v/etlrules.svg"
        alt = "Release Status">
</a>

<a href="http://github.com/ciprianmiclaus2/etlrules/">

<img src="https://img.shields.io/pypi/pyversions/etlrules.svg" alt="Python versions">
</a>

<a href="https://github.com/ciprianmiclaus2/etlrules/actions">
    <img src="https://github.com/ciprianmiclaus2/etlrules/actions/workflows/python-package.yml/badge.svg?branch=main" alt="CI Status">
</a>

<a href="https://codecov.io/gh/ciprianmiclaus2/etlrules" > 
 <img src="https://codecov.io/gh/ciprianmiclaus2/etlrules/graph/badge.svg?token=4N0N8XSVZY"/> 
 </a>

<a href="https://ciprianmiclaus2.github.io/etlrules/">
    <img src="https://img.shields.io/website/https/ciprianmiclaus2.github.io/etlrules/index.html.svg?label=docs&down_message=unavailable&up_message=available" alt="Documentation Status">
</a>

</p>


A python rule engine for applying transformations to dataframes.

ETL stands for [Extract, Trasform, Load](https://en.wikipedia.org/wiki/Extract,_transform,_load), which is a three step
process to source the data from some data source (Extract), transform the data (Transform) and publish it to a final
destination (Load).

Data transformation of tabular sets can be done in pure python with many dedicated python packages, the most widely
recognized being [pandas](https://pandas.pydata.org/). The result of such transformations can be quite opaque with the
logic difficult to read and understand, especially by non-coders. Even coders can struggle to understand certain
transformations unless in-code documentation is added and even when documentation is available, the code change in ways
which renders the documentation stale.

The etlrules package solves this by offering a set of simple rules which users can use to form a plan. The plan is a blueprint
on how to transform the data. The plan can be saved to a yaml file, stored in a repo for version control or in a database for
manipulation via UIs and then executed in a repeatable and predictable fashion. The rules in the plan can have names and
extensive description acting as an embedded documentation as to what the rule is trying to achieve.

This data-driven way of operating on tabular data allows non-technical users to come up with data transformations which can
automate various problems without the need to code. Workflows for managing change and version control can be put in place
around the plans, allowing technical and non-technical users to collaborate on data transformations that can be scheduled to
run periodically for solving real business problems.

## High level concepts

### Plan

A plan is a blueprint of how to perform extractions of tabular data, transformations of the data and how to load (ie write)
the transformed data to its final destination.

A plan is a collection of rules, each of which operate on a dataframe (tabular data).

### Rule

A rule is an operation performed on a dataframe. There are three types of rules:
    * Extract rules (aka read rules)
        - They will read an external data source (ie files, DBs, APIs endpoints) and bring the data into memory for processing
    * Transform rules
        - They will perform a transformation of the data (ie add a new column, modify an existing column, join columns, aggregate)
    * Load rules (aka write rules)
        - They will write the output into an external storage (ie files, DBs, APIs endpoints)

### Rule engine

The component that takes a plan and executes it (rule by rule) based on an input.


### Rule data

The structure that holds together any input dataframes, temporary results and the final output of a rule engine execution of a plan.
The rule data can have some input dataframes or they can start as empty canvases, with the plan performing extractions/reading of data that it needs.

### Backend

The underlying dataframe library to use for executing the plan. For example: pandas, vaex, polars, etc.
At the moment, only pandas is supported.

### Runner

A command line interface application which can take a plan serialized as a yaml file and run it. Also allows to run that
programmatically via the run_plan import.


## Documentation

<https://ciprianmiclaus2.github.io/etlrules/>


## License

Free software: MIT
