Metadata-Version: 1.1
Name: goodtables
Version: 1.0.0a16
Summary: Goodtables is a framework to inspect tabular data.
Home-page: https://github.com/frictionlessdata/goodtables
Author: Open Knowledge International
Author-email: info@okfn.org
License: MIT
Description: goodtables
        ==========
        
        | |Travis|
        | |Coveralls|
        | |PyPi|
        | |SemVer|
        | |Gitter|
        
        Goodtables is a framework to inspect tabular data.
        
            [BREAKING] Version ``v1.0.0-alpha8`` merged the ``tables`` and
            ``datapackages`` presets into a universal ``nested`` preset.
        
        --------------
        
            [BREAKING] Version ``v1.0`` introduces a renewed API that is NOT
            backward-compatible. The previous version can be found
            `here <https://github.com/frictionlessdata/goodtables-py/tree/4b85254cc0358c0caf85bbd41d0c2023df99fb9b>`__.
        
        Features
        --------
        
        -  tabular data inspection and validation
        -  general, structure and schema checks
        -  support for different input data presets
        -  parallel computation for multitable datasets
        -  builtin command-line interface
        
        Getting Started
        ---------------
        
        Installation
        ~~~~~~~~~~~~
        
        .. code:: bash
        
            $ pip install goodtables --pre
            $ pip install goodtables[ods] --pre # With ods format support
        
        Example
        ~~~~~~~
        
        Let's start with a simple example:
        
        .. code:: python
        
            from goodtables import Inspector
        
            inspector = Inspector()
            print(inspector.inspect('data/invalid.csv'))
        
            # will print
            #{'time': 0.029,
            # 'valid': False,
            # 'error-count': 2,
            # 'table-count': 1,
            # 'warnings': [],
            # 'tables': [
            #    {'time': 0.027,
            #     'valid': False,
            #     'headers': ['id', 'name', ''],
            #     'row-count': 4,
            #     'source': 'data/invalid.csv',
            #     'error-count': 2,
            #     'errors': [
            #        {'row': None,
            #         'code': 'blank-header',
            #         'message': 'Blank header',
            #         'row-number': None,
            #         'column-number': 2},
            #        {'row': [],
            #         'code': 'blank-row',
            #         'message': 'Blank row',
            #         'row-number': 3,
            #         'column-number': None}]}]}
        
        Inspection
        ~~~~~~~~~~
        
        Goodtables inspects your tabular data to find general, structure and
        schema errors. As shown in the example above, to inspect data:
        
        -  instantiate the ``Inspector(**options)`` class
        -  call ``inspector.inspect(source, preset=<preset>, **options)``
        -  use the returned value, which is a report dictionary
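        
        Because the report is a plain dictionary, it can be post-processed with
        ordinary Python. The sketch below is illustrative only: ``summarize`` is
        a hypothetical helper, not part of goodtables, and the sample report is
        an abridged copy of the one printed in the example above.
        
        .. code:: python
        
            # Abridged copy of the report from the example above
            report = {
                'valid': False,
                'error-count': 2,
                'table-count': 1,
                'warnings': [],
                'tables': [{
                    'valid': False,
                    'source': 'data/invalid.csv',
                    'error-count': 2,
                    'errors': [
                        {'code': 'blank-header', 'message': 'Blank header',
                         'row-number': None, 'column-number': 2},
                        {'code': 'blank-row', 'message': 'Blank row',
                         'row-number': 3, 'column-number': None},
                    ],
                }],
            }
        
            def summarize(report):
                """Return one human-readable line per error in the report."""
                lines = []
                for table in report['tables']:
                    for error in table['errors']:
                        lines.append('{}: [{}] {} (row {}, column {})'.format(
                            table['source'], error['code'], error['message'],
                            error['row-number'], error['column-number']))
                return lines
        
            for line in summarize(report):
                print(line)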
        
        Dataset
        ^^^^^^^
        
        Goodtables supports different sources for an inspection, but each
        source should be convertible to the dataset shown in figure 1. Details
        are explained in the next sections:
        
        |Dataset|
        
        Report
        ^^^^^^
        
        As a result of an inspection goodtables returns a report dictionary. It
        includes a valid flag, a count of errors, a list of reports per table
        including errors, etc. See the example above for an instance. The report
        structure and all errors are standardized and described in the **data
        quality spec**:
        
            https://github.com/frictionlessdata/goodtables-py/blob/next-initial/goodtables/spec.json
        
        Errors
        ^^^^^^
        
        Report errors are categorized by type:
        
        -  source - data can't be loaded or parsed
        -  structure - general tabular errors like duplicate headers
        -  schema - errors from checks against a JSON Table Schema
        
        Report errors are categorized by context:
        
        -  table - errors affecting the whole table, like IO, HTTP or encoding
           errors
        -  head - header errors
        -  body - content errors
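        
        The two categorizations can be combined when summarizing a report. The
        sketch below is a rough illustration, not goodtables API:
        ``ERROR_KINDS`` is a hypothetical hand-written excerpt (the
        authoritative mapping lives in the data quality spec), and
        ``count_by_kind`` is a hypothetical helper.
        
        .. code:: python
        
            from collections import Counter
        
            # Hypothetical excerpt: error code -> (type, context).
            # Consult the data quality spec for the authoritative values.
            ERROR_KINDS = {
                'blank-header': ('structure', 'head'),
                'blank-row': ('structure', 'body'),
            }
        
            def count_by_kind(errors):
                """Count errors per (type, context) pair."""
                counts = Counter()
                for error in errors:
                    kind = ERROR_KINDS.get(error['code'], ('unknown', 'unknown'))
                    counts[kind] += 1
                return counts
        
            errors = [
                {'code': 'blank-header'},
                {'code': 'blank-row'},
                {'code': 'blank-row'},
            ]
            counts = count_by_kind(errors)
            for (error_type, context), count in sorted(counts.items()):
                print(error_type, context, count)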
        
        Presets
        ~~~~~~~
        
        A table is the main inspection object in goodtables. The simplest option
        is to pass a path and other options for one table to
        ``Inspector.inspect`` (see the example above). But when parallelized
        inspection of multiple tables is needed, different presets can be used
        to process a dataset.
        
        Let's see how to inspect a datapackage:
        
        .. code:: python
        
            from goodtables import Inspector
        
            inspector = Inspector()
            inspector.inspect('datapackage.json', preset='datapackage')
        
        A preset function processes the passed source and options and fills the
        tables list for the subsequent inspection. If any issues occur, a preset
        function should add them to the warnings list.
        
        Builtin presets
        ^^^^^^^^^^^^^^^
        
        Goodtables by default supports the following presets:
        
        -  table
        -  datapackage
        -  nested (a special preset that allows nesting ``inspect`` calls -
           `example <https://github.com/frictionlessdata/goodtables-py/blob/master/examples/nested.py>`__)
        
        Custom presets
        ^^^^^^^^^^^^^^
        
            This is a provisional API excluded from SemVer. If you use it as
            part of another program, please pin a concrete ``goodtables``
            version in your requirements file.
        
        To register a custom preset, use the ``preset`` decorator. This way a
        builtin preset can be overridden or a custom preset can be added.
        
        .. code:: python
        
            from tabulator import Stream
            from jsontableschema import Schema
            from goodtables import Inspector, preset
        
            @preset('custom-preset')
            def custom_preset(source, **options):
                warnings = []
                tables = []
                for table in source:
                    try:
                        tables.append({
                            'source':  str(source),
                            'stream':  Stream(...),
                            'schema': Schema(...),
                            'extra': {...},
                        })
                    except Exception:
                        warnings.append('Warning message')
                return warnings, tables
        
            inspector = Inspector(custom_presets=[custom_preset])
            inspector.inspect(source, preset='custom-preset')
        
        See the builtin presets to learn more about the dataset extraction
        protocol.
        
        Checks
        ~~~~~~
        
        A check is the main inspection actor in goodtables. Every check is
        associated with a specification error. The checking order is the same as
        the order of errors in the specification. The list of checks can be
        customized using the inspector's ``checks`` argument. Let's explore the
        options with an example:
        
        .. code:: python
        
            inspector = Inspector(checks='structure') # by type: 'all', 'structure' or 'schema'
            inspector = Inspector(checks={'bad-headers': False}) # exclude
            inspector = Inspector(checks={'bad-headers': True}) # cherry-pick
        
        A check gets input data from the framework based on its context (e.g.
        ``columns, sample`` for the ``head`` context) and updates the errors and
        columns lists in-place.
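        
        The in-place protocol can be sketched with a plain function, without
        goodtables installed. Everything here is an assumption made for
        illustration: the ``(errors, columns, sample=None)`` signature, the
        ``header`` key of a column, and the name ``blank_header_sketch`` are
        hypothetical. Only the ``number`` key and the error keys are taken from
        the examples in this document.
        
        .. code:: python
        
            def blank_header_sketch(errors, columns, sample=None):
                """Report columns with an empty header and drop them in-place."""
                # Iterate over a copy because ``columns`` is modified in-place
                for column in list(columns):
                    if not column['header']:
                        errors.append({
                            'code': 'blank-header',
                            'message': 'Blank header',
                            'row-number': None,
                            'column-number': column['number'],
                        })
                        # Removing the column hides it from subsequent checks
                        columns.remove(column)
        
            errors = []
            columns = [
                {'number': 1, 'header': 'id'},
                {'number': 2, 'header': ''},
            ]
            blank_header_sketch(errors, columns)
            print(errors)
            print(columns)
        
        The caller keeps references to both lists, so nothing is returned: the
        new error and the shortened columns list are visible after the call.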
        
        Builtin checks
        ^^^^^^^^^^^^^^
        
        Goodtables by default supports the following checks:
        
        -  [check for every error from the specification]
        
        Custom checks
        ^^^^^^^^^^^^^
        
            This is a provisional API excluded from SemVer. If you use it as
            part of another program, please pin a concrete ``goodtables``
            version in your requirements file.
        
        To register a custom check, use the ``check`` decorator. This way a
        builtin check can be overridden (use the spec error code like
        ``duplicate-row``) or a check for a custom error can be added (use the
        ``type``, ``context`` and ``after/before`` arguments):
        
        .. code:: python
        
            from goodtables import Inspector, check
        
            @check('custom-error', type='structure', context='body', after='blank-row')
            def custom_check(errors, columns, row_number, state=None):
                # Iterate over a copy because ``columns`` is modified in-place
                for column in list(columns):
                    errors.append({
                        'code': 'custom-error',
                        'message': 'Custom error',
                        'row-number': row_number,
                        'column-number': column['number'],
                    })
                    columns.remove(column)
        
            inspector = Inspector(custom_checks=[custom_check])
        
        See the builtin checks to learn more about the checking protocol.
        
        CLI
        ~~~
        
            This is a provisional API excluded from SemVer. If you use it as
            part of another program, please pin a concrete ``goodtables``
            version in your requirements file.
        
        All common goodtables tasks can be done using the command-line interface
        (one command per preset, excluding ``nested``):
        
        ::
        
            $ goodtables
            Usage: cli.py [OPTIONS] COMMAND [ARGS]...
        
            Options:
              --json
              --error-limit INTEGER
              --table-limit INTEGER
              --row-limit INTEGER
              --infer-schema
              --infer-fields
              --order-fields
              --help                 Show this message and exit.
        
            Commands:
              datapackage
              table
        
        For example, run the following command in the shell:
        
        ::
        
            $ goodtables table data/invalid.csv
        
        And a report (the same as in the initial example) will be printed to the
        standard output.
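        
        The ``--json`` option listed above suggests that the report can also be
        consumed by another program. The sketch below assumes, without
        confirmation from this document, that ``goodtables --json table <path>``
        prints the report as a single JSON document on standard output.
        ``inspect_via_cli`` and ``error_codes`` are hypothetical helpers.
        
        .. code:: python
        
            import json
            import subprocess
        
            def inspect_via_cli(path):
                """Run the CLI for one table and parse its (assumed) JSON output."""
                process = subprocess.Popen(
                    ['goodtables', '--json', 'table', path],
                    stdout=subprocess.PIPE)
                output, _ = process.communicate()
                return json.loads(output.decode('utf-8'))
        
            def error_codes(report):
                """Collect the error codes found across all tables of a report."""
                return [error['code']
                        for table in report['tables']
                        for error in table['errors']]
        
            # Usage, assuming the ``goodtables`` command is installed:
            # report = inspect_via_cli('data/invalid.csv')
            # print(error_codes(report))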
        
        FAQ
        ---
        
        Is it an inspection or validation?
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        For now we use the word ``inspector`` because we create reports as the
        result of an inspection. One difference from validation: goodtables will
        not raise an exception if the dataset is invalid. The final naming is
        under consideration and will be based on the exposed methods (only
        ``inspect``, or something like ``inspect/validate/stream``).
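        
        Callers that prefer validation-style behaviour can raise an exception
        themselves based on the report's ``valid`` flag. This is a minimal
        sketch: ``InvalidDataError`` and ``ensure_valid`` are hypothetical names
        and are not provided by goodtables.
        
        .. code:: python
        
            class InvalidDataError(Exception):
                """Hypothetical exception; not part of goodtables."""
        
            def ensure_valid(report):
                """Return the report unchanged, or raise if it is not valid."""
                if not report['valid']:
                    raise InvalidDataError(
                        '{} error(s) found'.format(report['error-count']))
                return report
        
            try:
                ensure_valid({'valid': False, 'error-count': 2})
            except InvalidDataError as exception:
                print(exception)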
        
        Is it possible to stream reporting?
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        Not for now, but it's under consideration. Streaming is ruled out for
        multitable datasets because of parallelism, but for one table it could
        be exposed via the public API because internally that's how goodtables
        works. The question here is "what should be streamed?": errors, or a
        valid/invalid indication per row with errors, etc. We would be happy to
        see a real-world use case for this feature.
        
        API Reference
        -------------
        
        Snapshot
        ~~~~~~~~
        
        ::
        
            Inspector(checks='all',
                      table_limit=10,
                      row_limit=1000,
                      error_limit=1000,
                      infer_schema=False,
                      infer_fields=False,
                      order_fields=False,
                      custom_presets=[],
                      custom_checks=[])
                inspect(source, preset='table', **options)
            ~@preset(name)
            ~@check(error)
            exceptions
            spec
            ~cli
        
        Detailed
        ~~~~~~~~
        
        -  `Docstrings <https://github.com/frictionlessdata/goodtables-py/tree/master/goodtables>`__
        -  `Changelog <https://github.com/frictionlessdata/goodtables/commits/master>`__
        
        Contributing
        ------------
        
        Please read the contribution guideline:
        
        `How to Contribute <CONTRIBUTING.md>`__
        
        Thanks!
        
        .. |Travis| image:: https://img.shields.io/travis/frictionlessdata/goodtables-py/master.svg
           :target: https://travis-ci.org/frictionlessdata/goodtables-py
        .. |Coveralls| image:: http://img.shields.io/coveralls/frictionlessdata/goodtables-py.svg?branch=master
           :target: https://coveralls.io/r/frictionlessdata/goodtables-py?branch=master
        .. |PyPi| image:: https://img.shields.io/pypi/v/goodtables.svg
           :target: https://pypi.python.org/pypi/goodtables
        .. |SemVer| image:: https://img.shields.io/badge/versions-SemVer-brightgreen.svg
           :target: http://semver.org/
        .. |Gitter| image:: https://img.shields.io/gitter/room/frictionlessdata/chat.svg
           :target: https://gitter.im/frictionlessdata/chat
        .. |Dataset| image:: data/dataset.png
Keywords: data validation,frictionless data,open data,json schema,json table schema,data package,tabular data package
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
