Metadata-Version: 2.1
Name: schemaflow
Version: 0.1.0
Summary: a package to write data pipelines for data science systematically
Home-page: https://github.com/jorgecarleitao/schemaflow
Author: Jorge C. Leitao
License: UNKNOWN
Description: [![Build Status](https://travis-ci.org/jorgecarleitao/schemaflow.svg?branch=master)](https://travis-ci.org/jorgecarleitao/schemaflow)
        [![Coverage Status](https://coveralls.io/repos/github/jorgecarleitao/schemaflow/badge.svg)](https://coveralls.io/github/jorgecarleitao/schemaflow)
        [![Documentation Status](https://readthedocs.org/projects/schemaflow/badge/?version=latest)](https://schemaflow.readthedocs.io/en/latest/?badge=latest)
        
        # SchemaFlow
        
        This is a package to write data pipelines for data science systematically in Python.
        Thanks for checking it out.
        
        ## The problem that this package solves
        
        A major challenge in creating a robust data pipeline is guaranteeing interoperability between
        pipes: how do we guarantee that the pipe that someone wrote is compatible
        with my pipeline *without* running the whole pipeline multiple times until I get it right?
        
        ## The solution this package adopts
         
        This package declares an interface for defining a stateful data transformation. The interface
        lets the developer declare what comes in, what comes out, and what state is modified
        by each pipe and, therefore, by the whole pipeline.
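        
        As a rough illustration of this idea, the sketch below shows how declared inputs and outputs
        allow a pipeline to be checked without running it. It is not this package's actual API:
        `Pipe`, `requires`, `produces`, `state`, `Scale`, `Total` and `check_pipeline` are all
        hypothetical names chosen for the example.
        
        ```python
        class Pipe:
            """Hypothetical base class: declares inputs, outputs and fitted state."""
            requires = {}  # key -> type the pipe expects to find in the data
            produces = {}  # key -> type the pipe adds to the data
            state = {}     # name -> type of each parameter learned in fit()
        
            def fit(self, data):
                pass
        
            def transform(self, data):
                return data
        
        
        class Scale(Pipe):
            requires = {"x": list}
            produces = {"x_scaled": list}
            state = {"max": float}
        
            def fit(self, data):
                # The only state this pipe modifies.
                self.max = float(max(data["x"]))
        
            def transform(self, data):
                data["x_scaled"] = [v / self.max for v in data["x"]]
                return data
        
        
        class Total(Pipe):
            requires = {"x_scaled": list}
            produces = {"total": float}
        
            def transform(self, data):
                data["total"] = float(sum(data["x_scaled"]))
                return data
        
        
        def check_pipeline(pipes, schema):
            """Propagate a schema through the pipes without touching any data."""
            schema = dict(schema)
            errors = []
            for pipe in pipes:
                name = type(pipe).__name__
                for key, expected in pipe.requires.items():
                    if key not in schema:
                        errors.append(f"{name}: missing '{key}'")
                    elif schema[key] is not expected:
                        errors.append(f"{name}: '{key}' is not {expected.__name__}")
                # Whatever the pipe produces becomes available to later pipes.
                schema.update(pipe.produces)
            return errors
        ```
        
        With these assumed classes, `check_pipeline([Scale(), Total()], {"x": list})` returns an
        empty list, while swapping the two pipes reports that `Total` is missing `x_scaled`.
        Neither call fits or transforms anything.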
        
        ## Install 
        
            # git clone the repository
            pip install .
        
        ## Run tests
        
            pip install -r requirements_tests.txt
            python -m unittest discover
        
        ## Build documentation
        
            pip install -r requirements_docs.txt
            cd docs && make html && cd ..
            open docs/build/html/index.html
        
        ## Use cases
        
        You have a Hadoop cluster with CSV (or similar) files and use PySpark to process them
        and fit a model. The pipeline has multiple processing steps developed by many people.
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
