Metadata-Version: 2.0
Name: jsontableschema
Version: 0.6.4
Summary: A utility library for working with JSON Table Schema in Python
Home-page: https://github.com/frictionlessdata/jsontableschema-py
Author: Open Knowledge Foundation
Author-email: info@okfn.org
License: MIT
Keywords: frictionless data,open data,json schema,json table schema,data package,tabular data package
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: click (>=3.3)
Requires-Dist: requests (>=2.5.1)
Requires-Dist: python-dateutil (>=2.4.0)
Requires-Dist: rfc3986 (>=0.3.0)
Requires-Dist: jsonschema (>=2.5.1)
Requires-Dist: future (>=0.15.2)
Requires-Dist: tabulator (>=0.3)
Requires-Dist: unicodecsv (>=0.14)
Provides-Extra: develop
Requires-Dist: pylint; extra == 'develop'
Requires-Dist: tox; extra == 'develop'

# JSON Table Schema

[![Travis](https://travis-ci.org/frictionlessdata/jsontableschema-py.svg?branch=master)](https://travis-ci.org/frictionlessdata/jsontableschema-py)
[![Coveralls](http://img.shields.io/coveralls/frictionlessdata/jsontableschema-py.svg?branch=master)](https://coveralls.io/r/frictionlessdata/jsontableschema-py?branch=master)
[![PyPi](https://img.shields.io/pypi/v/jsontableschema.svg)](https://pypi.python.org/pypi/jsontableschema)
[![Gitter](https://img.shields.io/gitter/room/frictionlessdata/chat.svg)](https://gitter.im/frictionlessdata/chat)

A utility library for working with [JSON Table Schema](http://dataprotocols.org/json-table-schema/) in Python.

## Table of Contents

- [Goals](#goals)
- [Installation](#installation)
- [Components](#components)
  - [Model](#model) - a python model of a JSON Table Schema with useful methods for interaction
  - [Types](#types) - a collection of classes to validate type/format and constraints of data described by a JSON Table Schema
  - [Infer](#infer) - a utility that creates a JSON Table Schema based on a data sample
  - [Validate](#validate) - a utility to validate a **schema** as valid according to the current spec
  - [Push/pull](#pushpull) - utilities to push and pull resources to/from storage
  - [Storage](#storage) - Tabular Storage interface declaration
- [Plugins](#plugins)
  - [BigQuery](#bigquery) - Tabular Storage implementation for BigQuery
  - [SQL](#sql) - Tabular Storage implementation for SQL
- [CLI](#cli)
  - [Infer](#infer-1) - command line interface to infer utility
  - [Validate](#validate-1) - command line interface to validate utility
- [Contributing](#contributing)

## Goals

* A core set of utilities for working with [JSON Table Schema](http://dataprotocols.org/json-table-schema/)
* Use in *other* packages that deal with actual validation of data, or other 'higher level' use cases around JSON Table Schema (e.g. [Tabular Validator](https://github.com/okfn/tabular-validator))
* Be 100% compliant with the the JSON Table Schema specification (we are not there yet)

## Installation

```
pip install jsontableschema
```

## Components

Let's look at each of the components in more detail.

### Model

A model of a schema with helpful methods for working with the schema and
supported data. SchemaModel instances can be initialized with a schema source as a filepath or url to a JSON file, or a Python dict. The schema is initially validated (see [validate](#validate) below), and will raise an exception if not a valid JSON Table Schema.

```python
from jsontableschema.model import SchemaModel
...
schema = SchemaModel(file_path_to_schema)

# convert a row
schema.convert_row('12345', 'a string', 'another field')

# convert a set of rows
schema.convert([['12345', 'a string', 'another field'],
                ['23456', 'string', 'another field']])
```

Some methods available to SchemaModel instances:

* `headers` - return an array of the schema headers (property)
* `required_headers` - return headers with the `required` constraint as an array (property)
* `fields` - return an array of the schema's fields (property)
* `primaryKey` - return the primary key field for the schema (property)
* `foreignKey` - return the foreign key property for the schema (property)
* `cast(field_name, value, index=0)` - return a value cast against a named `field_name`.
* `get_field(field_name, index=0)` - return the field object for `field_name`
* `has_field(field_name)` - return a bool if the field exists in the schema
* `get_type(field_name, index=0)` - return the type for a given `field_name`
* `get_fields_by_type(type_name)` - return all fields that match the given type
* `get_constraints(field_name, index=0)` - return the constraints object for a given `field_name`
* `convert_row(*args, fail_fast=False)` - convert the arguments given to the types of the current schema,
* `convert(rows, fail_fast=False)` - convert an iterable `rows` using the current schema of the SchemaModel instance.

Where the optional `index` argument is available, it can be used as a positional argument if the schema has multiple fields with the same name.
Where the option `fail_fast` is given, it will raise the first error it encouters, otherwise an exceptions.MultipleInvalid will be raised (if there are errors).

### Types

Data values can be cast to native Python objects with a type instance from `jsontableschema.types`.

Types can either be instantiated directly, or returned from `SchemaModel` instances instantiated with a JSON Table Schema.

Casting a value will check the value is of the expected type, is in the correct format, and complies with any constraints imposed by a schema. E.g. a date value (in ISO 8601 format) can be cast with a DateType instance:

```python
from jsontableschema import types, exceptions

# Create a DateType instance
date_type = types.DateType()

# Cast date string to date
cast_value = date_type.cast('2015-10-27')

print(type(cast_value))
# <type 'datetime.date'>

```

Values that can't be cast will raise an `InvalidCastError` exception.

Type instances can be initialized with [field descriptors](http://dataprotocols.org/json-table-schema/#field-descriptors). This allows formats and constraints to be defined:

```python

field_descriptor = {
    'name': 'Field Name',
    'type': 'date',
    'format': 'default',
    'constraints': {
        'required': True,
        'minimum': '1978-05-30'
    }
}

date_type = types.DateType(field_descriptor)
```

Casting a value that doesn't meet the constraints will raise a `ConstraintError` exception.

Note: the `unique` constraint is not currently supported.

### Infer

Given headers and data, `infer` will return a JSON Table Schema as a Python dict based on the data values. Given the data file, data_to_infer.csv:

```csv
id,age,name
1,39,Paul
2,23,Jimmy
3,36,Jane
4,28,Judy
```

Call `infer` with headers and values from the datafile:

```python
import io
import csv

from jsontableschema import infer

filepath = 'data_to_infer.csv'
with io.open(filepath) as stream:
    headers = stream.readline().rstrip('\n').split(',')
    values = csv.reader(stream)

schema = infer(headers, values)
```

`schema` is now a schema dict:

```python
{u'fields': [
    {
        u'description': u'',
        u'format': u'default',
        u'name': u'id',
        u'title': u'',
        u'type': u'integer'
    },
    {
        u'description': u'',
        u'format': u'default',
        u'name': u'age',
        u'title': u'',
        u'type': u'integer'
    },
    {
        u'description': u'',
        u'format': u'default',
        u'name': u'name',
        u'title': u'',
        u'type': u'string'
    }]
}
```

The number of rows used by `infer` can be limited with the `row_limit` argument.

### Validate

Given a schema as JSON file, url to JSON file, or a Python dict, `validate` returns `True` for a valid JSON Table Schema, or raises an exception, `SchemaValidationError`.

```python
import io
import json

from jsontableschema import validate

filepath = 'schema_to_validate.json'

with io.open(filepath) as stream:
    schema = json.load(stream)

try:
    jsontableschema.validate(schema)
except jsontableschema.exceptions.SchemaValidationError as e:
   # handle errors

```

It may be useful to report multiple errors when validating a schema. This can be done with `validator.iter_errors()`.

```python

from jsontableschema import validator

filepath = 'schema_with_multiple_errors.json'
with io.open(filepath) as stream:
    schema = json.load(stream)
    errors = [i for i in validator.iter_errors(schema)]
```

Note: `validate()` validates whether a **schema** is a validate JSON Table Schema. It does **not** validate data against a schema.

### Push/pull

This utilities provide push and pull to/from storage possibilites
to JSON Table Schema resource (schema and data file).

This functionality requires some storage plugin installed. See
[plugins](#plugins) section for more information. Let's imagine we
have installed `jsontableschema-mystorage` (not a real name) plugin.

Then we could push and pull resources to/from the storage:

> All parameters should be used as keyword arguments.

```python
from jsontableschema import push_resource, pull_resource

# Push
push_resource(
    table='table_name', schema='schema_path', data='data_path',
    backend='mystorage, '**<mystorage_options>)

# Import
pull_resource(
    table='table_name', schema='schema_path', data='data_path',
    backend='mystorage', **<mystorage_options>)
```

Options could be a SQLAlchemy engine or a BigQuery project and dataset name etc.
Detailed desctiption you could find in a concrete plugin documentation.

See concrete exmples in [plugins](#plugins) section.

### Storage

On level between the high-level interface and low-level driver
package uses **Tabular Storage** concept:

![Tabular Storage](files/storage.png)

To write you own storage driver implement
`jsontableschema.storage.Storage` interface:

```python
from jsontableschema.storage import Storage

class CustomStorage(Storage):

    pass
```

Reference:
- [Tabular Storage](https://github.com/datapackages/jsontableschema-py/blob/feature/plugins-and-storage/jsontableschema/storage.py)

## Plugins

JSON Table Schema has a plugin system.
Any package with the name like `jsontableschema_<name>` could be imported as:

```python
from jsontableschema.plugins import <name>
```

If a plugin is not installed `ImportError` will be raised with
a message describing how to install the plugin.

Below there is a list of official supported plugins.

### BigQuery

Tabular Storage implementation for Google's BigQuery.

Installation:

```
$ pip install jsontableschema-bigquery
```

Push/pull:

> To start using Google BigQuery service:
> - Create a new project - [link](https://console.developers.google.com/home/dashboard)
> - Create a service key - [link](https://console.developers.google.com/apis/credentials)
> - Download json credentials and set `GOOGLE_APPLICATION_CREDENTIALS` environment variable

```python
import io
import os
import json
from apiclient.discovery import build
from oauth2client.client import GoogleCredentials
from jsontableschema import push_resource, pull_resource

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '.credentials.json'
credentials = GoogleCredentials.get_application_default()
service = build('bigquery', 'v2', credentials=credentials)
project = json.load(io.open('.credentials.json', encoding='utf-8'))['project_id']

# Push
push_resource(
    table='table_name', schema='schema_path', data='data_path',
    backend='bigquery', service=service, project=project, dataset='dataset', prefix='prefix_')

# Pull
pull_resource(
    table='table_name', schema='schema_path', data='data_path',
    backend='bigquery', service=service, project=project, dataset='dataset', prefix='prefix_')
```

Storage usage:

```python
from jsontableschema.plugins.bigquery import Storage

# Use Storage here
```

Reference:
- [Package Page](https://github.com/okfn/jsontableschema-bigquery-py)

### SQL

Tabular Storage implementation for SQL:

Installation:

```
$ pip install jsontableschema-sql
```

Push/pull:

```python
from sqlalchemy import create_engine
from jsontableschema import push_resource, pull_resource

engine = create_engine('sqlite:///:memory:')

# Push
push_resource(
    table='table_name', schema='schema_path', data='data_path',
    backend='sql', engine=engine, prefix='prefix_')

# Import
pull_resource(
    table='table_name', schema='schema_path', data='data_path',
    backend='sql', engine=engine, prefix='prefix_')
```

Storage usage:

```python
from jsontableschema.plugins.sql import Storage

# Use Storage here
```

Reference:
- [Package Page](https://github.com/okfn/jsontableschema-sql-py)

## CLI

JSON Table Schema features a CLI called `jsontableschema`. This CLI exposes the `infer` and `validate` functions for command line use.

### Infer

```
$ jsontableschema infer path/to/data.csv
```

The optional argument `--encoding` allows a character encoding to be specified for the data file. The default is utf-8.

The response is a schema as JSON.

See the above [Infer](#infer) section for details.

### Validate

```
$ jsontableschema validate path/to-schema.json
```

See the above [Validate](#validate) section for details.

## Contributing

Please read the contribution guideline:

[How to Contribute](CONTRIBUTING.md)

Thanks!

