Metadata-Version: 2.1
Name: ploomber
Version: 0.6
Summary: Spend your time discovering insights from data, not writing plumbing code. Declare your pipeline in a short YAML file and Ploomber will take care of the rest.
Home-page: https://github.com/ploomber/ploomber
Author: 
Author-email: 
License: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Dist: pyyaml
Requires-Dist: networkx
Requires-Dist: jinja2
Requires-Dist: tabulate
Requires-Dist: humanize
Requires-Dist: tqdm
Requires-Dist: sqlparse
Requires-Dist: autopep8
Requires-Dist: parso
Requires-Dist: mistune
Requires-Dist: pygments
Requires-Dist: sqlalchemy
Requires-Dist: click
Requires-Dist: papermill
Requires-Dist: jupytext
Requires-Dist: ipykernel (>=1.5.2)
Requires-Dist: jupyter-client (>=5.3.1)
Requires-Dist: nbconvert (>=5.6.0)
Requires-Dist: notebook
Requires-Dist: pyflakes
Requires-Dist: importlib-resources ; python_version < "3.7"
Provides-Extra: all
Requires-Dist: pandas ; extra == 'all'
Requires-Dist: pyarrow ; extra == 'all'
Requires-Dist: numpydoc ; extra == 'all'
Provides-Extra: test
Requires-Dist: pandas ; extra == 'test'
Requires-Dist: pyarrow ; extra == 'test'
Requires-Dist: numpydoc ; extra == 'test'
Requires-Dist: pygraphviz ; extra == 'test'
Requires-Dist: matplotlib ; extra == 'test'
Requires-Dist: paramiko ; extra == 'test'
Requires-Dist: boto3 ; extra == 'test'
Requires-Dist: nose ; extra == 'test'

Ploomber
========

.. image:: https://travis-ci.org/ploomber/ploomber.svg?branch=master
    :target: https://travis-ci.org/ploomber/ploomber

.. image:: https://readthedocs.org/projects/ploomber/badge/?version=latest
    :target: https://ploomber.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: https://mybinder.org/badge_logo.svg
    :target: https://mybinder.org/v2/gh/ploomber/projects/master



Point Ploomber to your Python and SQL scripts in a ``pipeline.yaml`` file and it will figure out the execution order by extracting dependencies from the scripts themselves.
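
As an illustration of the idea (not Ploomber's implementation): once each script's dependencies are known, an execution order follows from a topological sort. The sketch below uses Kahn's algorithm; ``execution_order`` is a hypothetical helper, and the dependency map is an assumption loosely modeled on the ``pipeline.yaml`` example, with each task reading from the previous one.

.. code-block:: python

    from collections import deque


    def execution_order(upstream):
        """Return task names so that every task comes after its dependencies.

        upstream maps each task name to the names of the tasks it reads from.
        Raises ValueError if the dependencies form a cycle.
        """
        # number of dependencies each task is still waiting for
        pending = {name: len(deps) for name, deps in upstream.items()}
        # reverse edges: which tasks are waiting on a given task
        downstream = {name: [] for name in upstream}
        for name, deps in upstream.items():
            for dep in deps:
                downstream[dep].append(name)

        # start with tasks that have no dependencies (sorted for a stable result)
        ready = deque(sorted(name for name, n in pending.items() if n == 0))
        order = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for child in sorted(downstream[name]):
                pending[child] -= 1
                if pending[child] == 0:
                    ready.append(child)

        if len(order) != len(upstream):
            raise ValueError('dependency cycle detected')
        return order


    # hypothetical dependencies, as they might be extracted from the scripts
    deps = {
        'clean': [],
        'aggregate': ['clean'],
        'dump_agg_data': ['aggregate'],
        'plot': ['dump_agg_data'],
    }
    print(execution_order(deps))
    # ['clean', 'aggregate', 'dump_agg_data', 'plot']

Every dependency must also appear as a key in ``upstream``, otherwise this sketch raises ``KeyError``. Sorting the ready tasks only serves to make the output deterministic.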


It also keeps track of source code changes to speed up builds by skipping up-to-date tasks. This is a great way to interactively develop your projects, sync work with your team and quickly recover from crashes (just fix the bug and build again).
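
Here is a minimal sketch of how skipping up-to-date tasks can work. It assumes each task's source is fingerprinted with SHA-256 and that the fingerprints from the previous build are available as a dict. This illustrates the idea and is not Ploomber's metadata format; ``fingerprint`` and ``tasks_to_run`` are hypothetical names. A task reruns when its own source changed or when any task it depends on reruns.

.. code-block:: python

    import hashlib


    def fingerprint(source):
        """Hash a task's source code so that any edit changes the value."""
        return hashlib.sha256(source.encode('utf-8')).hexdigest()


    def tasks_to_run(order, sources, upstream, stored):
        """Return the tasks that must run, in execution order.

        order:    task names sorted so that dependencies come first
        sources:  task name -> current source code
        upstream: task name -> list of dependency names
        stored:   task name -> fingerprint saved by the previous build
        """
        outdated = set()
        for name in order:
            changed = stored.get(name) != fingerprint(sources[name])
            upstream_rebuilt = any(dep in outdated for dep in upstream[name])
            if changed or upstream_rebuilt:
                outdated.add(name)
        return [name for name in order if name in outdated]


    order = ['clean', 'aggregate', 'plot']
    upstream = {'clean': [], 'aggregate': ['clean'], 'plot': ['aggregate']}
    sources = {
        'clean': 'SELECT * FROM raw',
        'aggregate': 'SELECT x, COUNT(*) FROM clean GROUP BY x',
        'plot': 'df.plot()',
    }

    # first build: nothing stored yet, so every task runs
    print(tasks_to_run(order, sources, upstream, {}))
    # ['clean', 'aggregate', 'plot']

    # record fingerprints after a successful build
    stored = {name: fingerprint(src) for name, src in sources.items()}

    # edit only the aggregate query: it and its downstream task rerun
    sources['aggregate'] = 'SELECT x, AVG(y) FROM clean GROUP BY x'
    print(tasks_to_run(order, sources, upstream, stored))
    # ['aggregate', 'plot']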


`Try out the live demo (no installation required) <https://mybinder.org/v2/gh/ploomber/projects/master?filepath=spec%2FREADME.md>`_.

`Click here for documentation <https://ploomber.readthedocs.io/>`_.

`Our blog <https://ploomber.io/>`_.


Works with Python 3.5 and higher.


``pipeline.yaml`` example
-------------------------

.. code-block:: yaml

    # pipeline.yaml

    # clean data from the raw table
    - source: clean.sql
      product: clean_data
      # function that returns a db client
      client: db.get_client

    # aggregate clean data
    - source: aggregate.sql
      product: agg_data
      client: db.get_client

    # dump data to a csv file
    - class: SQLDump
      source: dump_agg_data.sql
      product: output/data.csv
      client: db.get_client

    # visualize data from csv file
    - source: plot.py
      product:
        # where to save the executed notebook
        nb: output/executed-notebook-plot.ipynb
        # tasks can generate other outputs
        data: output/some_data.csv



Python script example
---------------------

.. code-block:: python

    # annotated python file (it will be converted to a notebook during execution)
    import pandas as pd

    # + tags=["parameters"]
    # this script depends on the output generated by a task named "clean"
    upstream = {'clean': None}
    product = None

    # during execution, Ploomber adds a new cell here with the actual values
    # for upstream and product

    # +
    df = pd.read_csv(upstream['clean'])
    # do data processing...
    df.to_csv(product['data'])
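
Ploomber lists papermill as a dependency. Papermill injects parameters by adding a cell tagged ``injected-parameters`` right after the cell tagged ``parameters``, so the placeholder values in the script above get overridden. The sketch below mimics that step on a simplified list of cells. ``inject_parameters`` is hypothetical, the upstream path is made up, and ``repr()`` stands in for papermill's language-specific translation of values.

.. code-block:: python

    def inject_parameters(cells, params):
        """Return a new list of cells with an extra cell placed right after
        the one tagged "parameters", overriding its placeholder values.

        cells:  list of dicts with "source" (str) and "tags" (list of str)
        params: mapping of variable name -> value to inject
        """
        lines = ['# Parameters']
        for name, value in params.items():
            # repr() gives a Python literal for str, dict, list and numbers
            lines.append('{} = {!r}'.format(name, value))
        injected = {'source': '\n'.join(lines), 'tags': ['injected-parameters']}

        for i, cell in enumerate(cells):
            if 'parameters' in cell['tags']:
                return cells[:i + 1] + [injected] + cells[i + 1:]
        raise ValueError('no cell tagged "parameters" found')


    cells = [
        {'source': 'import pandas as pd', 'tags': []},
        {'source': "upstream = {'clean': None}\nproduct = None",
         'tags': ['parameters']},
        {'source': "df = pd.read_csv(upstream['clean'])", 'tags': []},
    ]
    # hypothetical values; the upstream path is made up for this example
    params = {
        'upstream': {'clean': 'output/clean.csv'},
        'product': {'nb': 'output/executed-notebook-plot.ipynb',
                    'data': 'output/some_data.csv'},
    }
    print(inject_parameters(cells, params)[2]['source'])

The original cells are left untouched, and the ``parameters`` cell stays in place. The injected cell simply runs after it, so its assignments win.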


SQL script example
------------------

.. code-block:: sql

    DROP TABLE IF EXISTS {{product}};

    CREATE TABLE {{product}} AS
    -- this task depends on the output generated by a task named "clean"
    SELECT * FROM {{upstream['clean']}}
    WHERE x > 10;
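
Because the template declares its dependencies through ``upstream``, they can be recovered by scanning the source before it is rendered. The regular-expression sketch below is a simplification for illustration only; a real tool would more likely inspect the parsed Jinja template. ``UPSTREAM_REF`` and ``find_upstream`` are hypothetical names.

.. code-block:: python

    import re

    # matches upstream['name'] or upstream["name"]; a simplified pattern that
    # ignores other ways a template could reference upstream
    UPSTREAM_REF = re.compile(r"""upstream\[\s*(['"])(?P<name>[^'"]+)\1\s*\]""")


    def find_upstream(source):
        """Return the sorted, de-duplicated task names a template refers to."""
        return sorted({m.group('name') for m in UPSTREAM_REF.finditer(source)})


    sql = """
    DROP TABLE IF EXISTS {{product}};

    CREATE TABLE {{product}} AS
    SELECT * FROM {{upstream['clean']}}
    WHERE x > 10;
    """
    print(find_upstream(sql))
    # ['clean']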


Install
-------

.. code-block:: shell

    pip install ploomber


To install Ploomber along with all optional dependencies:

.. code-block:: shell

    pip install "ploomber[all]"

``graphviz`` is required for plotting pipelines:

.. code-block:: shell

    # if you use conda (recommended)
    conda install graphviz
    # if you use homebrew
    brew install graphviz
    # for more options, see: https://www.graphviz.org/download/


Create a new project
--------------------

.. code-block:: shell

    ploomber new


Python API
----------

There is also a Python API for advanced use cases. It allows you to build
flexible abstractions such as dynamic pipelines, where the exact number of
tasks is determined by the pipeline's parameters.
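
The following plain-Python sketch shows what a dynamic pipeline means: the number of tasks is computed from a parameter. It deliberately does not use Ploomber's API. ``make_pipeline``, ``run`` and the task functions are hypothetical, and the scheduler is a naive stand-in for a real executor.

.. code-block:: python

    def make_pipeline(n_partitions):
        """Build a task graph whose size depends on a parameter.

        Returns a dict mapping task name -> (function, list of upstream names).
        One "process" task is created per partition.
        """
        def get_data():
            return list(range(10))

        def make_process(i):
            # bind i now; a closure defined directly in the loop would see
            # only the last value of i
            def process(data):
                return sum(x for x in data if x % n_partitions == i)
            return process

        def combine(*partials):
            return sum(partials)

        tasks = {'get_data': (get_data, [])}
        names = []
        for i in range(n_partitions):
            name = 'process_{}'.format(i)
            tasks[name] = (make_process(i), ['get_data'])
            names.append(name)
        tasks['combine'] = (combine, names)
        return tasks


    def run(tasks):
        """Run each task once all of its upstream results exist."""
        results = {}
        while len(results) < len(tasks):
            progressed = False
            for name, (fn, deps) in tasks.items():
                if name not in results and all(d in results for d in deps):
                    results[name] = fn(*(results[d] for d in deps))
                    progressed = True
            if not progressed:
                raise ValueError('dependency cycle detected')
        return results


    for n in (2, 3):
        tasks = make_pipeline(n)
        print(len(tasks), run(tasks)['combine'])
    # 4 45
    # 5 45

Changing ``n_partitions`` changes the shape of the graph but not the final result, which is the kind of flexibility a static YAML file cannot express on its own.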

CHANGELOG
=========

0.6.1dev
--------
* Experimental PythonCallable.develop()

0.6 (2020-07-08)
------------------
* Adds Jupyter notebook extension to inject parameters when opening a task
* Improved CLI: `ploomber new`, `ploomber add` and `ploomber entry`
* Spec API documentation additions
* Support for `on_finish`, `on_failure` and `on_render` hooks in spec API
* Improved validation for DAG specs
* Several bug fixes


0.5.1 (2020-06-30)
------------------
* Reduces the number of required dependencies
* A new option in DBAPIClient to split source with a custom separator


0.5 (2020-06-27)
----------------
* Adds CLI
* New spec API to instantiate DAGs using YAML files
* NotebookRunner.debug() for debugging and .develop() for interactive development
* Bug fixes


0.4.1 (2020-05-19)
-------------------
* PythonCallable.debug() now works in Jupyter notebooks

0.4.0 (2020-05-18)
-------------------
* PythonCallable.debug() now uses IPython debugger by default
* Improvements to Task.build() public API
* Moves hook triggering logic to Task to simplify executors implementation
* Adds DAGBuildEarlyStop exception to signal DAG execution stop
* New option in Serial executor to turn warnings and exceptions capture off
* Adds Product.prepare_metadata hook
* Implements hot reload for notebooks and python callables
* General cleanups of old `__str__` and `__repr__` methods in several modules
* Refactored ploomber.sources module and ploomber.placeholders (previously ploomber.templates)
* Adds NotebookRunner.debug() and NotebookRunner.develop()
* NotebookRunner now has an option to run static analysis on render
* Adds documentation for DAG-level hooks
* Bug fixes

0.3.5 (2020-05-03)
-------------------
* Bug fixes #88, #89, #90, #84, #91
* Modifies Env API: Env() is now Env.load(), Env.start() is now Env()
* New advanced Env guide added to docs
* Env can now be used with a context manager
* Improved DAGConfigurator API
* Removes logger configuration from executor constructors; logging is now configured via DAGConfigurator


0.3.4 (2020-04-25)
-------------------
* Dependencies cleanup
* numpydoc is no longer a required dependency (now optional)
* A few bug fixes: #79, #71
* All warnings are captured and shown at the end (Serial executor)
* Moves the differ parameter from the DAG constructor to DAGConfigurator


0.3.3 (2020-04-23)
-------------------
* Cleaned up some modules, deprecated some rarely used functionality
* Improves documentation aimed at developers looking to extend ploomber
* Introduces DAGConfigurator for advanced DAG configuration [Experimental API]
* Adds task to upload files to S3 (ploomber.tasks.UploadToS3), requires boto3
* Adds DAG-level on_finish and on_failure hooks
* Support for enabling logging in entry points (via --logging)
* Support for starting an interactive session using entry points (via python -i -m)
* Improved support for database drivers that can only send one query at a time
* Improved repr for SQLAlchemyClient, shows URI (but hides password)
* PythonCallable now validates signature against params at render time
* Bug fixes


0.3.2 (2020-04-07)
------------------

* Faster Product status checking, now performed at rendering time
* New products: GenericProduct and GenericSQLRelation for Products that do not have a specific implementation (e.g. you can use Hive with the DBAPI client + GenericSQLRelation)
* Improved DAG build reports: select a subset of columns, convert to pandas.DataFrame or dict
* Parallel executor now returns build reports, just like the Serial executor



0.3.1 (2020-04-01)
------------------

* DAG parallel executor
* Interact with pipelines from the command line (entry module)
* Bug fixes
* Refactored access to Product.metadata


0.3 (2020-03-20)
----------------
* New Quickstart and User Guide section in documentation
* DAG rendering and build now continue until no more tasks can render/build (instead of failing at the first exception)
* New @with_env and @load_env decorators for managing environments
* Env expansion ({{user}} expands to the current user; {{git}} and {{version}} are also available)
* Task.name is now optional when Task is initialized with a source that has __name__ attribute (Python functions) or a name attribute (like Placeholders returned from SourceLoader)
* New Task.on_render hook
* Bug fixes
* A lot of new tests
* Now compatible with Python 3.5 and higher

0.2.1 (2020-02-20)
------------------

* Adds integration with pdb via PythonCallable.debug
* Env.start now accepts a filename to look for
* Improvements to data_frame_validator

0.2 (2020-02-13)
----------------

* Simplifies installation
* Deletes BashCommand, use ShellScript
* More examples added
* Refactored env module
* Renames SQLStore to SourceLoader
* Improvements to SQLStore
* Improved documentation
* Renamed PostgresCopy to PostgresCopyFrom
* SQLUpload and PostgresCopy now have the same API
* A few fixes to PostgresCopy (#1, #2)

0.1
---

* First release

