Metadata-Version: 2.1
Name: pybda
Version: 0.0.5
Summary: Big Data analytics powered by Apache Spark
Home-page: https://github.com/cbg-ethz/pybda
Author: Simon Dirmeier
Author-email: simon.dirmeier@bsse.ethz.ch
License: GPLv3
Keywords: bigdata analysis pipeline workflow spark pyspark machinelearning
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Python: >=3
Requires-Dist: click (>=6.7)
Requires-Dist: joypy (>=0.1.9)
Requires-Dist: matplotlib (>=2.2.3)
Requires-Dist: numpy (>=1.15.0)
Requires-Dist: pandas (>=0.23.3)
Requires-Dist: pyspark (==2.4.0)
Requires-Dist: scipy (>=1.0.0)
Requires-Dist: seaborn (>=0.9.0)
Requires-Dist: snakemake (>=5.2.2)
Requires-Dist: sparkhpc (>=0.3.post4)
Requires-Dist: uuid (>=1.3.0)
Provides-Extra: dev
Requires-Dist: coverage ; extra == 'dev'
Requires-Dist: findspark ; extra == 'dev'
Requires-Dist: flake8 ; extra == 'dev'
Requires-Dist: pylint ; extra == 'dev'
Requires-Dist: pytest (>=3.6.2) ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: pytest-pep8 ; extra == 'dev'
Requires-Dist: scikit-learn ; extra == 'dev'
Requires-Dist: yapf ; extra == 'dev'
Requires-Dist: sphinx ; extra == 'dev'
Requires-Dist: sphinx-fontawesome ; extra == 'dev'
Requires-Dist: sphinxcontrib-fulltoc ; extra == 'dev'
Provides-Extra: doc
Requires-Dist: sphinx ; extra == 'doc'
Requires-Dist: sphinx-fontawesome ; extra == 'doc'
Requires-Dist: sphinxcontrib-fulltoc ; extra == 'doc'
Provides-Extra: test
Requires-Dist: coverage ; extra == 'test'
Requires-Dist: findspark ; extra == 'test'
Requires-Dist: flake8 ; extra == 'test'
Requires-Dist: pylint ; extra == 'test'
Requires-Dist: pytest (>=3.6.2) ; extra == 'test'
Requires-Dist: pytest-cov ; extra == 'test'
Requires-Dist: pytest-pep8 ; extra == 'test'
Requires-Dist: scikit-learn ; extra == 'test'
Requires-Dist: yapf ; extra == 'test'

*****
PyBDA
*****

A command line tool for the analysis of big biological data sets using Snakemake and Apache Spark.

About
=====

PyBDA is a Python library and command line tool for big data analytics and machine learning that scales to terabyte-sized data sets.

To scale to big data sets, PyBDA uses `Apache Spark`_'s DataFrame API, which automatically distributes
data across the nodes of a high-performance cluster and runs expensive machine learning computations in parallel.
For scheduling, PyBDA uses Snakemake_ to execute pipelines of jobs automatically. In particular, PyBDA first builds a DAG (directed acyclic graph) of the methods/jobs
you want to execute in succession (e.g. dimensionality reduction followed by clustering) and then runs every method by traversing the DAG.
When a job completes successfully, PyBDA writes results and plots and creates statistics. If a job fails, PyBDA reports which method failed and where
(owing to Snakemake's scheduling), so that the pipeline can be resumed from the point of failure.
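
The scheduling idea can be illustrated with a small, self-contained sketch. It is not PyBDA's or Snakemake's actual implementation: the function ``run_pipeline`` and the job names are hypothetical, and the standard-library ``graphlib`` module (Python 3.9 or later) stands in for Snakemake's scheduler. The sketch runs jobs in dependency order, stops at the first failure, and skips already completed jobs when the pipeline is run again.

.. code-block:: python

    from graphlib import TopologicalSorter  # standard library, Python 3.9+


    def run_pipeline(dag, jobs, done=None):
        """Run jobs in dependency order; return (completed, failed_job).

        dag  maps each job name to the set of jobs it depends on.
        jobs maps each job name to a zero-argument callable.
        done holds jobs completed by an earlier run; they are skipped.
        """
        completed = set(done or ())
        for name in TopologicalSorter(dag).static_order():
            if name in completed:
                continue
            try:
                jobs[name]()
            except Exception:
                # Stop at the first failure and report which job failed.
                return completed, name
            completed.add(name)
        return completed, None


    # Hypothetical two-step pipeline: dimensionality reduction, then clustering.
    calls = {"dimension_reduction": 0, "clustering": 0}


    def reduce_dimensions():
        calls["dimension_reduction"] += 1


    def cluster():
        calls["clustering"] += 1
        if calls["clustering"] == 1:
            raise RuntimeError("simulated failure on the first attempt")


    dag = {"dimension_reduction": set(), "clustering": {"dimension_reduction"}}
    jobs = {"dimension_reduction": reduce_dimensions, "clustering": cluster}

    # The first run fails at the clustering step.
    first_done, first_failed = run_pipeline(dag, jobs)

    # The second run resumes and does not repeat the dimensionality reduction.
    second_done, second_failed = run_pipeline(dag, jobs, done=first_done)

Passing the set of completed jobs back in is a simplification made for this sketch; Snakemake itself decides what to rerun by checking which output files already exist.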

Documentation
=============

Check out the documentation `here <https://pybda.readthedocs.io/en/latest/>`_.
The documentation will walk you through

* the installation process,
* setting up Apache Spark,
* using `pybda`.

Author
======

Simon Dirmeier `simon.dirmeier at bsse.ethz.ch <mailto:simon.dirmeier@bsse.ethz.ch>`_.

.. _`Apache Spark`: https://spark.apache.org/
.. _Snakemake: https://snakemake.readthedocs.io/en/stable/

