Metadata-Version: 2.1
Name: great-expectations
Version: 0.8.2
Summary: Always know what to expect from your data.
Home-page: https://github.com/great-expectations/great_expectations
Author: The Great Expectations Team
Author-email: team@greatexpectations.io
License: Apache-2.0
Keywords: data science testing pipeline data quality dataquality validation datavalidation
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Other Audience
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Testing
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Dist: numpy (>=1.14.1)
Requires-Dist: scipy (>=0.19.0)
Requires-Dist: pandas (>=0.22.0)
Requires-Dist: python-dateutil (>=2.4.2)
Requires-Dist: pytz (>=2015.6)
Requires-Dist: six (>=1.12.0)
Requires-Dist: jsonschema (>=2.5.1)
Requires-Dist: altair (>=3.1.0)
Requires-Dist: backports.functools-lru-cache (>=1.5)
Requires-Dist: ruamel.yaml (>=0.15.24)
Requires-Dist: ipywidgets (>=7.4.2)
Requires-Dist: requests (>=2.20)
Requires-Dist: Click (>=7.0)
Requires-Dist: termcolor (>=1.1.0)
Requires-Dist: tzlocal (>=1.2)
Requires-Dist: pypandoc (>=1.4)
Requires-Dist: future (>=0.16)
Requires-Dist: jinja2 (>=2.10)
Requires-Dist: marshmallow (<3.0,>=2.0)
Provides-Extra: airflow
Requires-Dist: apache-airflow[s3] (>=1.9.0) ; extra == 'airflow'
Requires-Dist: boto3 (>=1.7.3) ; extra == 'airflow'
Provides-Extra: spark
Requires-Dist: pyspark (>=2.3.2) ; extra == 'spark'
Provides-Extra: sqlalchemy
Requires-Dist: sqlalchemy (>=1.2) ; extra == 'sqlalchemy'

|Build Status| |Coverage Status| |Documentation Status|

Great Expectations
==================

*Always know what to expect from your data.*

Quick Start
-----------

`Getting
Started <http://docs.greatexpectations.io/en/latest/getting_started.html>`__
will teach you how to get up and running in minutes.

For full documentation, visit `Great Expectations on
readthedocs.io <http://great-expectations.readthedocs.io/en/latest/>`__.

`Down with Pipeline
Debt! <https://medium.com/@expectgreatdata/down-with-pipeline-debt-introducing-great-expectations-862ddc46782a>`__
explains the core philosophy behind Great Expectations. Please give it a
read, and clap, follow, and share while you’re at it.

What is great_expectations?
---------------------------

Great Expectations helps teams save time and promote analytic integrity
by offering a unique approach to automated testing: pipeline tests.
Pipeline tests are applied to data (instead of code) and at batch time
(instead of compile or deploy time). Pipeline tests are like unit tests
for datasets: they help you guard against upstream data changes and
monitor data quality.

Software developers have long known that automated testing is essential
for managing complex codebases. Great Expectations brings the same
discipline, confidence, and acceleration to data science and engineering
teams.

Why would I use Great Expectations?
-----------------------------------

To get more done with data, faster. Teams use Great Expectations to:

-  Save time during data cleaning and munging.
-  Accelerate ETL and data normalization.
-  Streamline analyst-to-engineer handoffs.
-  Streamline knowledge capture and requirements gathering from
   subject-matter experts.
-  Monitor data quality in production data pipelines and data products.
-  Automate verification of new data deliveries from vendors and other
   teams.
-  Simplify debugging data pipelines if (when) they break.
-  Codify assumptions used to build models when sharing with other teams
   or analysts.
-  Develop rich, shared data documentation in the course of normal work.
-  Make implicit knowledge explicit.
-  etc., etc., etc.

Key features
------------

**Expectations**

Expectations are the workhorse abstraction in Great Expectations. Like
assertions in traditional Python unit tests, Expectations provide a
flexible, declarative language for describing expected behavior. Unlike
traditional unit tests, Great Expectations applies Expectations to data
instead of code.

Expectations include:

-  ``expect_table_row_count_to_equal``
-  ``expect_column_values_to_be_unique``
-  ``expect_column_values_to_be_in_set``
-  ``expect_column_mean_to_be_between``
-  …and many more
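
To make the idea concrete, here is a minimal sketch of what "an assertion
applied to data" can look like. It is a toy written in plain Python, not the
Great Expectations API or implementation: the helpers borrow two Expectation
names from the list above, but their signatures, the list-of-dicts batch, and
the ``success`` / ``unexpected_list`` result keys are assumptions made for
illustration.

```python
# Toy illustration only: these helpers are NOT the Great Expectations API.
# A "batch" here is assumed to be a plain list of dicts, one dict per row.


def expect_column_values_to_be_unique(rows, column):
    """Succeed when no value in `column` appears more than once."""
    seen = set()
    duplicates = []
    for row in rows:
        value = row[column]
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return {"success": not duplicates, "unexpected_list": duplicates}


def expect_column_values_to_be_in_set(rows, column, value_set):
    """Succeed when every value in `column` is a member of `value_set`."""
    unexpected = [row[column] for row in rows if row[column] not in value_set]
    return {"success": not unexpected, "unexpected_list": unexpected}


# Hypothetical batch of new data arriving in a pipeline.
batch = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "inactive"},
    {"id": 2, "status": "deleted"},
]

unique_result = expect_column_values_to_be_unique(batch, "id")
in_set_result = expect_column_values_to_be_in_set(
    batch, "status", {"active", "inactive"}
)
```

Each call describes what the data should look like rather than how to check
it, and the result reports which values broke the expectation instead of
raising on the first failure.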

Great Expectations currently supports native execution of Expectations
in three environments: pandas, SQL (through the SQLAlchemy core), and
Spark. This approach follows the philosophy of “take the compute to the
data.” Future releases of Great Expectations will extend this
functionality to other frameworks, such as dask and BigQuery.

**Automated data profiling**

Writing pipeline tests from scratch can be tedious and counterintuitive.
Great Expectations jump-starts the process by providing powerful tools
for automated data profiling. Profiling has a double benefit: it helps
you explore data faster, and it captures knowledge for future
documentation and testing.
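
The sketch below shows the general idea of profiling: look at a sample of
data and propose candidate Expectations from what is observed. It is not the
profiler that ships with Great Expectations. The ``profile_column`` helper,
the ``max_categories`` threshold, the dictionary layout of each candidate,
and the use of ``expect_column_values_to_be_between`` are all assumptions
chosen for illustration.

```python
# Toy profiler: NOT the profiler shipped with Great Expectations.
# It proposes one candidate expectation per column from a sample of rows.


def profile_column(rows, column, max_categories=5):
    """Suggest a candidate expectation for `column`, or None if unsure."""
    values = [row[column] for row in rows]

    # Numeric column: propose the observed range as a starting point.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        return {
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {
                "column": column,
                "min_value": min(values),
                "max_value": max(values),
            },
        }

    # Low-cardinality column: propose the observed set of values.
    distinct = sorted(set(values), key=repr)
    if len(distinct) <= max_categories:
        return {
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {"column": column, "value_set": distinct},
        }

    return None


# Hypothetical sample of rows used to seed the candidate suite.
sample = [
    {"age": 31, "plan": "free"},
    {"age": 45, "plan": "pro"},
    {"age": 27, "plan": "free"},
]

candidate_suite = [profile_column(sample, column) for column in ("age", "plan")]
```

Candidates produced this way describe only the sample they were derived from,
so they are a draft to review and loosen or tighten, not a finished test
suite.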

**DataContexts and DataSources**

…allow you to configure connections to your data stores, using names
that point to concepts you’re already familiar with: “the
``ml_training_results`` bucket in S3,” “the ``Users`` table in
Redshift.” Great Expectations provides convenience libraries to
introspect most common data stores (e.g., SQL databases, data
directories, and S3 buckets). We are also working to integrate with
pipeline execution frameworks (e.g., airflow, dbt, dagster, prefect.io).
The Great Expectations framework lets you fetch, validate, profile, and
document your data in a way that’s meaningful within your existing
infrastructure and work environment.

**Tooling for validation**

Evaluating Expectations against data is just one step in a typical
validation workflow. Great Expectations makes the follow-up steps
simple, too: storing validation results to a shared bucket, summarizing
results and posting notifications to Slack, handling differences between
warnings and errors, etc.
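
As one illustration of the "warnings versus errors" step, the sketch below
runs a set of checks against a batch and lets only error-level failures fail
the run. It is a plain-Python toy, not Great Expectations' validation
tooling: the ``validate`` helper, the ``(severity, description, predicate)``
check format, and the report layout are assumptions made for this example.

```python
# Toy validation step: NOT Great Expectations' validation tooling.
# Each check is an assumed (severity, description, predicate) tuple, where
# severity is "error" or "warning" and predicate takes one row (a dict).


def validate(rows, checks):
    """Run every check on every row and group failures by severity."""
    report = {"errors": [], "warnings": []}
    for severity, description, predicate in checks:
        failing = [row for row in rows if not predicate(row)]
        if failing:
            report[severity + "s"].append(
                {"check": description, "unexpected_count": len(failing)}
            )
    # Only error-level failures fail the run; warnings are reported only.
    report["success"] = not report["errors"]
    return report


checks = [
    ("error", "id is present", lambda row: row.get("id") is not None),
    ("warning", "amount is non-negative", lambda row: row["amount"] >= 0),
]

# Hypothetical batch: one negative amount, no missing ids.
batch = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -3.5},
]

report = validate(batch, checks)
```

A report shaped like this can be stored, summarized in a notification, or
used to decide whether downstream pipeline steps should proceed.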

Great Expectations also provides robust concepts of Batches and Runs.
Although we sometimes talk informally about validating “dataframes” or
“tables,” it’s much more common to validate batches of new data—subsets
of tables, rather than whole tables. DataContexts provide simple,
universal syntax to generate, fetch, and validate Batches of data from
any of your DataSources.

**Compile to Docs**

As of v0.7.0, Great Expectations includes new classes and methods to
``render`` Expectations to clean, human-readable documentation. Since
docs are compiled from tests and you are running tests against new data
as it arrives, your documentation is guaranteed to never go stale.
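
The idea of compiling tests into documentation can be sketched with simple
string templates. This is not the ``render`` machinery in Great Expectations:
the ``TEMPLATES`` table, the ``render_expectation`` helper, and the
``expectation_type`` / ``kwargs`` dictionary layout are assumptions made for
illustration.

```python
# Toy renderer: NOT the rendering classes shipped with Great Expectations.
# Each expectation is assumed to be a dict with "expectation_type" and
# "kwargs" keys; templates turn it into one human-readable sentence.

TEMPLATES = {
    "expect_column_values_to_be_unique": "Values in ``{column}`` must be unique.",
    "expect_column_values_to_be_in_set": (
        "Values in ``{column}`` must be one of: {value_set}."
    ),
    "expect_column_mean_to_be_between": (
        "The mean of ``{column}`` must be between {min_value} and {max_value}."
    ),
}


def render_expectation(expectation):
    """Return a one-sentence description of a single expectation."""
    template = TEMPLATES[expectation["expectation_type"]]
    kwargs = dict(expectation["kwargs"])
    if "value_set" in kwargs:
        # Show the allowed values as a comma-separated list.
        kwargs["value_set"] = ", ".join(str(v) for v in kwargs["value_set"])
    return template.format(**kwargs)


# Hypothetical suite of expectations for a users table.
suite = [
    {
        "expectation_type": "expect_column_values_to_be_unique",
        "kwargs": {"column": "user_id"},
    },
    {
        "expectation_type": "expect_column_values_to_be_in_set",
        "kwargs": {"column": "plan", "value_set": ["free", "pro"]},
    },
    {
        "expectation_type": "expect_column_mean_to_be_between",
        "kwargs": {"column": "age", "min_value": 18, "max_value": 65},
    },
]

docs = [render_expectation(expectation) for expectation in suite]
```

Because the sentences are generated from the same definitions that are
validated against incoming data, the documentation and the tests cannot drift
apart.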

What does Great Expectations NOT do?
------------------------------------

**Great Expectations is NOT a pipeline execution framework.**

We aim to integrate seamlessly with DAG execution tools like
`Spark <https://spark.apache.org/>`__,
`Airflow <https://airflow.apache.org/>`__,
`dbt <https://www.getdbt.com/>`__,
`prefect <https://www.prefect.io/>`__,
`dagster <https://github.com/dagster-io/dagster>`__,
`Kedro <https://github.com/quantumblacklabs/kedro>`__, etc. We DON’T
execute your pipelines for you.

**Great Expectations is NOT a data versioning tool.**

Great Expectations does not store data itself. Instead, it deals in
metadata about data: Expectations, validation results, etc. If you want
to bring your data itself under version control, check out tools like
`DVC <https://dvc.org/>`__ and
`Quilt <https://github.com/quiltdata/quilt>`__.

**Great Expectations currently works best in a Python/bash
environment.**

Great Expectations is Python-based. You can invoke it from the command
line without using a Python programming environment, but if you’re
working in another ecosystem, other tools might be a better choice. If
you’re running in a pure R environment, you might consider
`assertr <https://github.com/ropensci/assertr>`__ as an alternative.
Within the TensorFlow ecosystem,
`TFDV <https://www.tensorflow.org/tfx/guide/tfdv>`__ serves a similar
function to Great Expectations.

How do I learn more?
--------------------

For full documentation, visit `Great Expectations on
readthedocs.io <http://great-expectations.readthedocs.io/en/latest/>`__.

`Down with Pipeline
Debt! <https://medium.com/@expectgreatdata/down-with-pipeline-debt-introducing-great-expectations-862ddc46782a>`__
explains the core philosophy behind Great Expectations. Please give it a
read, and clap, follow, and share while you’re at it.

For quick, hands-on introductions to Great Expectations’ key features,
check out our walkthrough videos:

-  `Introduction to Great
   Expectations <https://www.youtube.com/watch?v=-_0tG7ACNU4>`__
-  `Using Distributional
   Expectations <https://www.youtube.com/watch?v=l3DYPVZAUmw&t=20s>`__

Who maintains Great Expectations?
---------------------------------

Great Expectations is under active development by James Campbell, Abe
Gong, Eugene Mandel, Rob Lim, and Taylor Miller, with help from many
others.

What’s the best way to get in touch with the Great Expectations team?
---------------------------------------------------------------------

If you have questions, comments, or just want to have a good
old-fashioned chat about data pipelines, please hop on our public `Slack
channel <https://greatexpectations.io/slack>`__.

If you’d like hands-on assistance setting up Great Expectations,
establishing a healthy practice of data testing, or adding functionality
to Great Expectations, please see options for consulting help
`here <https://greatexpectations.io/consulting/>`__.

Can I contribute to the library?
--------------------------------

Absolutely. Yes, please. Start
`here <https://github.com/great-expectations/great_expectations/blob/develop/CONTRIBUTING.md>`__
and please don’t be shy with questions.

.. |Build Status| image:: https://travis-ci.org/great-expectations/great_expectations.svg?branch=develop
   :target: https://travis-ci.org/great-expectations/great_expectations
.. |Coverage Status| image:: https://coveralls.io/repos/github/great-expectations/great_expectations/badge.svg?branch=develop
   :target: https://coveralls.io/github/great-expectations/great_expectations?branch=develop
.. |Documentation Status| image:: https://readthedocs.org/projects/great-expectations/badge/?version=latest
   :target: http://great-expectations.readthedocs.io/en/latest/?badge=latest


