Metadata-Version: 2.0
Name: goodtables
Version: 2.0.0
Summary: Goodtables is a framework to inspect tabular data.
Home-page: https://github.com/frictionlessdata/goodtables
Author: Open Knowledge International
Author-email: info@okfn.org
License: MIT
Keywords: data validation,frictionless data,open data,json schema,json table schema,data package,tabular data package
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Provides-Extra: ods
Provides-Extra: develop
Requires-Dist: six (<2.0,>=1.9)
Requires-Dist: click (<7.0,>=6.6)
Requires-Dist: click-default-group
Requires-Dist: requests (<3.0,>=2.10)
Requires-Dist: simpleeval (<2.0,>=0.9)
Requires-Dist: statistics (<2.0,>=1.0)
Requires-Dist: tabulator (<2.0,>=1.3)
Requires-Dist: tableschema (<2.0,>=1.0.3)
Requires-Dist: datapackage (<2.0,>=1.2)
Provides-Extra: develop
Requires-Dist: tox; extra == 'develop'
Provides-Extra: ods
Requires-Dist: ezodf (<2.0,>=0.3); extra == 'ods'
Requires-Dist: lxml (<4.0,>=3.0); extra == 'ods'

goodtables-py
=============

| |Travis|
| |Coveralls|
| |PyPi|
| |Gitter|

| Goodtables is a framework to validate tabular data. It can check the
structure
| of your data (e.g. all rows have the same number of columns), and its
contents
| (e.g. all dates are valid).

Features
--------

-  **Structural checks**: Ensure that there are no empty rows, no blank
   headers, etc.
-  **Content checks**: Ensure that the values have the correct types
   ("string", "number", "date", etc.), that their format is valid
   ("string must be an e-mail"), and that they respect the constraints
   ("age must be a number greater than 18").
-  **Support for multiple tabular formats**: CSV, Excel files,
   LibreOffice, Data Package, etc.
-  **Parallelized validations for multi-table datasets**
-  **Command line interface**

Table of Contents
-----------------

-  `Features <#features>`__
-  `Getting Started <#getting-started>`__

   -  `Installing <#installing>`__
   -  `Running on CLI <#running-on-cli>`__
   -  `Running on Python <#running-on-python>`__

-  `Validation <#validation>`__

   -  `Basic checks <#basic-checks>`__
   -  `Structural checks <#structural-checks>`__
   -  `Content checks <#content-checks>`__
   -  `Advanced checks <#advanced-checks>`__
   -  `Validation report format <#validation-report-format>`__

-  `Developer documentation <#developer-documentation>`__

   -  `Semantic versioning <#semantic-versioning>`__
   -  `Validate <#validate>`__

-  `FAQ <#faq>`__

   -  `How can I add a new custom
      check? <#how-can-i-add-a-new-custom-check>`__
   -  `How can I add support for a new tabular file
      type? <#how-can-i-add-support-for-a-new-tabular-file-type>`__

-  `Changelog <#changelog>`__

Getting Started
---------------

Installing
~~~~~~~~~~

::

    pip install goodtables
    pip install goodtables[ods]  # If you need LibreOffice's ODS file support

Running on CLI
~~~~~~~~~~~~~~

::

    goodtables data.csv

Use ``goodtables --help`` to see the different options.

Running on Python
~~~~~~~~~~~~~~~~~

.. code:: python

    from goodtables import validate

    report = validate('invalid.csv')
    report['valid'] # false
    report['table-count'] # 1
    report['error-count'] # 3
    report['tables'][0]['valid'] # false
    report['tables'][0]['source'] # 'invalid.csv'
    report['tables'][0]['errors'][0]['code'] # 'blank-header'

| You can read a more in depth explanation on using goodtables with
Python on
| the `developer documentation <#developer-documentation>`__ section.
Check also
| the `examples <examples>`__ folder for other examples.

Validation
----------

Basic checks
~~~~~~~~~~~~

The basic checks can't be disabled, as they deal with goodtables being
able to read the files.

+------------------+-------------------------------------------------------------------------+
| check            | description                                                             |
+==================+=========================================================================+
| io-error         | Data reading error because of IO error.                                 |
+------------------+-------------------------------------------------------------------------+
| http-error       | Data reading error because of HTTP error.                               |
+------------------+-------------------------------------------------------------------------+
| source-error     | Data reading error because of not supported or inconsistent contents.   |
+------------------+-------------------------------------------------------------------------+
| scheme-error     | Data reading error because of incorrect scheme.                         |
+------------------+-------------------------------------------------------------------------+
| format-error     | Data reading error because of incorrect format.                         |
+------------------+-------------------------------------------------------------------------+
| encoding-error   | Data reading error because of an encoding problem.                      |
+------------------+-------------------------------------------------------------------------+

Structural checks
~~~~~~~~~~~~~~~~~

These checks validate that the structure of the file are valid.

+--------------------+-----------------------------------------------------------------------------------+
| check              | description                                                                       |
+====================+===================================================================================+
| blank-header       | There is a blank header name. All cells in the header row must have a value.      |
+--------------------+-----------------------------------------------------------------------------------+
| duplicate-header   | There are multiple columns with the same name. All column names must be unique.   |
+--------------------+-----------------------------------------------------------------------------------+
| blank-row          | Rows must have at least one non-blank cell.                                       |
+--------------------+-----------------------------------------------------------------------------------+
| duplicate-row      | Rows can't be duplicated.                                                         |
+--------------------+-----------------------------------------------------------------------------------+
| extra-value        | A row has more columns than the header.                                           |
+--------------------+-----------------------------------------------------------------------------------+
| missing-value      | A row has less columns than the header.                                           |
+--------------------+-----------------------------------------------------------------------------------+

Content checks
~~~~~~~~~~~~~~

These checks validate the contents of the file. To use them, you need to
pass a `Table
Schema <https://specs.frictionlessdata.io/table-schema/>`__. If you
don't have a schema, goodtables can infer it if you use the
``infer_schema`` option.

If your schema only covers part of the data, you can use the
``infer_fields`` to infer the remaining fields.

Lastly, if the order of the fields in the data is different than in your
schema, enable the ``order_fields`` option.

+-----------------------------+-------------------------------------------------------------------------------------------+
| check                       | description                                                                               |
+=============================+===========================================================================================+
| schema-error                | Schema is not valid.                                                                      |
+-----------------------------+-------------------------------------------------------------------------------------------+
| non-matching-header         | The header's name in the schema is different from what's in the data.                     |
+-----------------------------+-------------------------------------------------------------------------------------------+
| extra-header                | The data contains a header not defined in the schema.                                     |
+-----------------------------+-------------------------------------------------------------------------------------------+
| missing-header              | The data doesn't contain a header defined in the schema.                                  |
+-----------------------------+-------------------------------------------------------------------------------------------+
| type-or-format-error        | The value can’t be cast based on the schema type and format for this field.               |
+-----------------------------+-------------------------------------------------------------------------------------------+
| required-constraint         | This field is a required field, but it contains no value.                                 |
+-----------------------------+-------------------------------------------------------------------------------------------+
| pattern-constraint          | This field value's should conform to the defined pattern.                                 |
+-----------------------------+-------------------------------------------------------------------------------------------+
| unique-constraint           | This field is a unique field but it contains a value that has been used in another row.   |
+-----------------------------+-------------------------------------------------------------------------------------------+
| enumerable-constraint       | This field value should be equal to one of the values in the enumeration constraint.      |
+-----------------------------+-------------------------------------------------------------------------------------------+
| minimum-constraint          | This field value should be greater or equal than constraint value.                        |
+-----------------------------+-------------------------------------------------------------------------------------------+
| maximum-constraint          | This field value should be less or equal than constraint value.                           |
+-----------------------------+-------------------------------------------------------------------------------------------+
| minimum-length-constraint   | A length of this field value should be greater or equal than schema constraint value.     |
+-----------------------------+-------------------------------------------------------------------------------------------+
| maximum-length-constraint   | A length of this field value should be less or equal than schema constraint value.        |
+-----------------------------+-------------------------------------------------------------------------------------------+

Advanced checks
~~~~~~~~~~~~~~~

+----------------------------------------------+---------------------------------------------------------------------------------------------------+
| check                                        | description                                                                                       |
+==============================================+===================================================================================================+
| `blacklisted-value <#blacklisted-value>`__   | Ensure there are no cells with the blacklisted values.                                            |
+----------------------------------------------+---------------------------------------------------------------------------------------------------+
| `deviated-value <#deviated-value>`__         | Ensure numbers are within a number of standard deviations from the average.                       |
+----------------------------------------------+---------------------------------------------------------------------------------------------------+
| `sequential-value <#sequential-value>`__     | Ensure numbers are sequential.                                                                    |
+----------------------------------------------+---------------------------------------------------------------------------------------------------+
| `truncated-value <#truncated-value>`__       | Detect values that were potentially truncated.                                                    |
+----------------------------------------------+---------------------------------------------------------------------------------------------------+
| `custom-constraint <#custom-constraint>`__   | Defines a constraint based on the values of other columns (e.g. ``value * quantity == total``).   |
+----------------------------------------------+---------------------------------------------------------------------------------------------------+

blacklisted-value
^^^^^^^^^^^^^^^^^

Sometimes we have to check for some values we don't want to have in out
dataset. It accepts following options:

+-------------+---------------+------------------------------+
| option      | type          | description                  |
+=============+===============+==============================+
| column      | int/str       | Column number or name        |
+-------------+---------------+------------------------------+
| blacklist   | list of str   | List of blacklisted values   |
+-------------+---------------+------------------------------+

Consider the following CSV file:

.. code:: csv

    id,name
    1,John
    2,bug
    3,bad
    5,Alex

Let's check that the ``name`` column doesn't contain rows with ``bug``
or ``bad``:

.. code:: python

    from goodtables import validate

    report = validate('data.csv', checks=[
        {'blacklisted-value': {'column': 'id', 'blacklist': ['bug', 'bad']}},
    ])
    # error on row 3 with code "blacklisted-value"
    # error on row 4 with code "blacklisted-value"

deviated-value
^^^^^^^^^^^^^^

This check helps to find outlines in a column containing positive
numbers. It accepts following options:

+------------+-----------+---------------------------------------------------------------------------+
| option     | type      | description                                                               |
+============+===========+===========================================================================+
| column     | int/str   | Column number or name                                                     |
+------------+-----------+---------------------------------------------------------------------------+
| average    | str       | Average type, either "mean", "median" or "mode"                           |
+------------+-----------+---------------------------------------------------------------------------+
| interval   | int       | Values must be inside range ``average ± standard deviation * interval``   |
+------------+-----------+---------------------------------------------------------------------------+

Consider the following CSV file:

.. code:: csv

    temperature
    1
    -2
    7
    0
    1
    2
    5
    -4
    100
    8
    3

We use ``median`` to get an average of the column values and allow
interval of 3 standard deviations. For our case median is ``2.0`` and
standard deviation is ``29.73`` so all valid values must be inside the
``[-87.19, 91.19]`` interval.

.. code:: python

    report = validate('data.csv', checks=[
        {'deviated-value': {'column': 'temperature', 'average': 'median', 'interval': 3}},
    ])
    # error on row 10 with code "deviated-value"

sequential-value
^^^^^^^^^^^^^^^^

This checks is for pretty common case when a column should have integers
that sequentially increment. It accepts following options:

+----------+-----------+-------------------------+
| option   | type      | description             |
+==========+===========+=========================+
| column   | int/str   | Column number or name   |
+----------+-----------+-------------------------+

Consider the following CSV file:

.. code:: csv

    id,name
    1,one
    2,two
    3,three
    5,five

Let's check if the ``id`` column contains sequential integers:

.. code:: python

    from goodtables import validate

    report = validate('data.csv', checks=[
        {'sequential-value': {'column': 'id'}},
    ])
    # error on row 5 with code "sequential-value"

truncated-value
^^^^^^^^^^^^^^^

Some database or spreadsheet software (like MySQL or Excel) could cutoff
values on saving. There are some well-known heuristics to find this bad
values. See
https://github.com/propublica/guides/blob/master/data-bulletproofing.md
for more detailed information.

Consider the following CSV file:

.. code:: csv

    id,amount,comment
    1,14000000,good
    2,2147483647,bad
    3,32767,bad
    4,234234234,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbad

To detect all probably truncated values we could use ``truncated-value``
check:

.. code:: python

    report = validate('data.csv', checks=[
        'truncated-value',
    ])
    # error on row 3 with code "truncated-value"
    # error on row 4 with code "truncated-value"
    # error on row 5 with code "truncated-value"

custom-constraint
^^^^^^^^^^^^^^^^^

With Table Schema we could create constraints for an individual field
but sometimes it's not enough. With a custom constraint check every row
could be checked against given limited python expression in which
variable names resolve to column values. See list of `available
operators <https://github.com/danthedeckie/simpleeval#operators>`__. It
accepts following options:

.. raw:: html

   <dl>
     <dt>constraint (str)</dt>
     <dd>Constraint definition (e.g. <code>col1 + col2 == col3</code>)</dd>
   </dl>

Consider csv file like this:

.. code:: csv

    id,name,salary,bonus
    1,Alex,1000,200
    2,Sam,2500,500
    3,Ray,1350,500
    4,John,5000,1000

Let's say our business rule is to be shy on bonuses:

.. code:: python

    report = validate('data.csv', checks=[
        {'custom-constraint': {'constraint': 'salary > bonus * 4'}},
    ])
    # error on row 4 with code "custom-constraint"

Validation report format
~~~~~~~~~~~~~~~~~~~~~~~~

| The validation report follows the JSON Schema defined on
| `goodtables/schemas/report.json <goodtables/schemas/report.json>`__.
As an example, this is
| the report generated by running ``goodtables --row-limit 5 --json`` on
the file
| `data/datapackages/invalid/datapackage.json <data/datapackage/invalid/datapackage.json>`__:

.. code:: json

    {
        "time": 0.015,
        "valid": false,
        "error-count": 2,
        "table-count": 2,
        "tables": [
            {
                "datapackage": "data/datapackages/invalid/datapackage.json",
                "time": 0.005,
                "valid": false,
                "error-count": 1,
                "row-count": 4,
                "source": "data/datapackages/invalid/data.csv",
                "headers": [
                    "id",
                    "name",
                    "description",
                    "amount"
                ],
                "format": "inline",
                "schema": "table-schema",
                "errors": [
                    {
                        "code": "blank-row",
                        "row-number": 3,
                        "message": "Row 3 is completely blank"
                    }
                ]
            },
            {
                "datapackage": "data/datapackages/invalid/datapackage.json",
                "time": 0.004,
                "valid": false,
                "error-count": 1,
                "row-count": 5,
                "source": "data/datapackages/invalid/data2.csv",
                "headers": [
                    "parent",
                    "comment"
                ],
                "format": "inline",
                "schema": "table-schema",
                "errors": [
                    {
                        "code": "blank-row",
                        "row-number": 4,
                        "message": "Row 4 is completely blank"
                    }
                ]
            }
        ],
        "warnings": [
            "Table \"data/datapackages/invalid/data2.csv\" inspection has reached 5 row(s) limit"
        ],
        "preset": "datapackage"
    }

Developer documentation
-----------------------

Semantic versioning
~~~~~~~~~~~~~~~~~~~

| We follow the `Semantic Versioning <https://semver.org/>`__
specification to define our version
| numbers. This means that we'll increase the major version number when
there's a
| breaking change. Because of this, we recommend you to explicitly
specify the
| goodtables version on your dependency list (e.g. ``setup.py`` or
| ``requirements.txt``).

Validate
~~~~~~~~

| Goodtables validates your tabular dataset to find structural and
content
| errors. Consider you have a file named ``invalid.csv``. Let's validate
it:

.. code:: python

    report = validate('invalid.csv')

| We could also pass a remote URI instead of a local path. It supports
CSV, XLS,
| XLSX, ODS, JSON, and all other formats supported by the
`tabulator <https://github.com/frictionlessdata/tabulator-py/>`__
| library.

Validation report
^^^^^^^^^^^^^^^^^

| The output of the ``validate()`` method is a report dictionary. It
includes
| information if the data was valid, count of errors, list of table
reports, which
| individual checks failed, etc.

Resulting report will be looking like this:

.. code:: json

    {
        "time": 0.009,
        "error-count": 1,
        "warnings": [
            "Table \"data/invalid.csv\" inspection has reached 1 error(s) limit"
        ],
        "preset": "table",
        "valid": false,
        "tables": [
            {
                "errors": [
                    {
                        "row-number": null,
                        "message": "Header in column 3 is blank",
                        "row": null,
                        "column-number": 3,
                        "code": "blank-header"
                    }
                ],
                "error-count": 1,
                "headers": [
                    "id",
                    "name",
                    "",
                    "name"
                ],
                "scheme": "file",
                "row-count": 2,
                "valid": false,
                "encoding": "utf-8",
                "time": 0.007,
                "schema": null,
                "format": "csv",
                "source": "data/invalid"
            }
        ],
        "table-count": 1
    }

| Rase report errors are standardized and described
| in
| `Data Quality
Spec <https://github.com/frictionlessdata/data-quality-spec/blob/master/spec.json>`__.
| The errors are divided in one of the following categories:

-  ``source`` - data can't be loaded or parsed
-  ``structure`` - general tabular errors like duplicate headers
-  ``schema`` - error of checks against `Table
   Schema <http://specs.frictionlessdata.io/table-schema/>`__
-  ``custom`` - custom checks errors

Checks
^^^^^^

| Check is a main validation actor in goodtables. The list of enabled
checks can
| be changed using ``checks`` and ``skip_checks`` arguments. Let's
explore the options
| on an example:

.. code:: python

    report = validate('data.csv') # by default structure and schema (if available) checks
    report = validate('data.csv', checks=['structure']) # only structure checks
    report = validate('data.csv', checks=['schema']) # only schema (if available) checks
    report = validate('data.csv', checks=['bad-headers']) # check only 'bad-headers'
    report = validate('data.csv', skip_checks=['bad-headers']) # exclude 'bad-headers'

| By default a dataset will be validated against all available Data
Quality Spec
| errors. Some checks can be unavailable for validation. For example, if
the
| schema isn't provided, only the ``structure`` checks will be done.

Presets
^^^^^^^

| Goodtables support different formats of tabular datasets. They're
called
| presets. A tabular dataset is some data that can be split in a list of
data
| tables, as:

|Dataset|

| We can change the preset using the ``preset`` argument for
``validate()``. By
| default, it'll be inferred from the source, falling back to ``table``.
To validate
| a `data package <https://specs.frictionlessdata.io/data-package/>`__,
we can do:

.. code:: python

    report = validate('datapackage.json') # implicit preset
    report = validate('datapackage.json', preset='datapackage') # explicit preset

This will validate all tabular resources in the datapackage.

| It's also possible to validate a list of files using the "nested"
preset. To do
| so, the first argument to ``validate()`` should be a list of
dictionaries, where
| each key in the dictionary is named after a parameter on
``validate()``. For example:

.. code:: python

    report = validate([{'source': 'data1.csv'}, {'source': 'data2.csv'}]) # implicit preset
    report = validate([{'source': 'data1.csv'}, {'source': 'data2.csv'}], preset='nested') # explicit preset

Is similar to:

.. code:: python

    report_data1 = validate('data1.csv')
    report_data2 = validate('data2.csv')

| The difference is that goodtables validates multiple tables in
parallel, so
| calling using the "nested" preset should run faster.

Contributing
------------

This project follows the `Open Knowledge International coding
standards <https://github.com/okfn/coding-standards>`__.

| We recommend you to use ``virtualenv`` to isolate goodtables from the
rest of the
| packages in your machine.

To install goodtables and the development dependencies, run:

::

    $ make install

To run the tests, use:

.. code:: bash

    $ make test

FAQ
---

How can I add a new custom check?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To create a custom check user could use a ``check`` decorator. This way
the builtin check could be overridden (use the spec error code like
``duplicate-row``) or could be added a check for a custom error (use
``type``, ``context`` and ``position`` arguments):

.. code:: python

    from goodtables import validate, check, Error

    @check('custom-check', type='custom', context='body')
    def custom_check(cells):
        errors = []
        for cell in cells:
            message = 'Custom error on column {column_number} and row {row_number}'
            error = Error(
                'custom-error',
                cell,
                message
            )
            errors.append(error)
        return errors

    report = validate('data.csv', checks=['custom-check'])

For now this documentation section is incomplete. Please see builtin
checks to learn more about checking protocol.

How can I add support for a new tabular file type?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To create a custom preset user could use a ``preset`` decorator. This
way the builtin preset could be overridden or could be added a custom
preset.

.. code:: python

    from tabulator import Stream
    from tableschema import Schema
    from goodtables import validate

    @preset('custom-preset')
    def custom_preset(source, **options):
        warnings = []
        tables = []
        for table in source:
            try:
                tables.append({
                    'source':  str(source),
                    'stream':  Stream(...),
                    'schema': Schema(...),
                    'extra': {...},
                })
            except Exception:
                warnings.append('Warning message')
        return warnings, tables

    report = validate(source, preset='custom-preset')

For now this documentation section is incomplete. Please see builtin
presets to learn more about the dataset extraction protocol.

Changelog
---------

v2.0
~~~~

Breaking changes:

-  Checks method signature now only receives the current row's ``cells``
   list
-  Checks raise errors by returning an array of ``Error`` objects
-  Cells have the row number in the ``row-number`` key
-  Files with ZIP extension are presumed to be datapackages, so
   ``goodtables mydatapackage.zip`` works
-  Improvements to goodtables CLI
   (`#233 <https://github.com/frictionlessdata/goodtables-py/issues/233>`__)
-  New ``goodtables init <data paths>`` command to create a new
   ``datapackage.json`` with the files passed as parameters and their
   inferred schemas.

Bug fixes:

-  Fix bug with ``truncated-values`` check on date fields
   (`#250 <https://github.com/frictionlessdata/goodtables-py/issues/250>`__)

v1.5
~~~~

New API added:

-  Validation ``source`` now could be a ``pathlib.Path``

v1.4
~~~~

Improved behaviour:

-  rebased on Data Quality Spec v1
-  rebased on Data Package Spec v1
-  rebased on Table Schema Spec v1
-  treat primary key as required/unique field

v1.3
~~~~

New advanced checks added:

-  ``blacklisted-value``
-  ``custom-constraint``
-  ``deviated-value``
-  ``sequential-value``
-  ``truncated-value``

v1.2
~~~~

New API added:

-  ``report.preset``
-  ``report.tables[].schema``

v1.1
~~~~

New API added:

-  ``report.tables[].scheme``
-  ``report.tables[].format``
-  ``report.tables[].encoding``

v1.0
~~~~

This version includes various big changes. A migration guide is under
development and will be published here.

v0.6
~~~~

First version of ``goodtables``.

.. |Travis| image:: https://img.shields.io/travis/frictionlessdata/goodtables-py/master.svg
   :target: https://travis-ci.org/frictionlessdata/goodtables-py
.. |Coveralls| image:: http://img.shields.io/coveralls/frictionlessdata/goodtables-py.svg?branch=master
   :target: https://coveralls.io/r/frictionlessdata/goodtables-py?branch=master
.. |PyPi| image:: https://img.shields.io/pypi/v/goodtables.svg
   :target: https://pypi.python.org/pypi/goodtables
.. |Gitter| image:: https://img.shields.io/gitter/room/frictionlessdata/chat.svg
   :target: https://gitter.im/frictionlessdata/chat
.. |Dataset| image:: data/dataset.png

