Metadata-Version: 2.1
Name: babbage
Version: 0.4.0
Summary: A light-weight analytical engine for OLAP processing
Home-page: http://github.com/openspending/babbage
Author: Friedrich Lindenberg
Author-email: friedrich@pudo.org
License: MIT
Keywords: sql sqlalchemy olap cubes analytics
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Description-Content-Type: text/markdown
Requires-Dist: normality (>=0.2.2)
Requires-Dist: PyYAML (>=3.10)
Requires-Dist: six (>=1.7.3)
Requires-Dist: flask (>=0.10.1)
Requires-Dist: jsonschema (>=2.5.1)
Requires-Dist: sqlalchemy (>=1.0)
Requires-Dist: psycopg2 (>=2.6)
Requires-Dist: grako (==3.10.1)

# Babbage Analytical Engine

[![Gitter](https://img.shields.io/gitter/room/openspending/chat.svg)](https://gitter.im/openspending/chat)
[![Build Status](https://travis-ci.org/openspending/babbage.svg?branch=master)](https://travis-ci.org/openspending/babbage)
[![Coverage Status](https://coveralls.io/repos/openspending/babbage/badge.svg?branch=master&service=github)](https://coveralls.io/github/openspending/babbage?branch=master)

``babbage`` is a lightweight implementation of an OLAP-style database
query tool for PostgreSQL. Given a database schema and a logical model
of the data, it can be used to perform analytical queries against that
data - programmatically or via a web API.

It is heavily inspired by [Cubes](http://cubes.databrewery.org/) but
has less ambitious goals, i.e. no pre-computation of aggregates, or
multiple storage backends.

``babbage`` is not specific to government finances, and could easily be used e.g. for ReGENESIS, a project that makes German national statistics available via an API. The API functions by interpreting modelling metadata generated by the user (measures and dimensions).

## Installation and test

``babbage`` will normally be included as a PyPI dependency, or installed via
``pip``:

```bash
$ pip install babbage
```

People interested in contributing to the package should instead check out the
source repository and then use the provided ``Makefile`` to install the
library (this requires ``virtualenv`` to be installed):

```bash
$ git clone https://github.com/openspending/babbage.git
$ cd babbage
$ make install
$ pip install tox
$ export BABBAGE_TEST_DB=postgresql://postgres@localhost:5432/postgres
$ make test
```

## Usage

``babbage`` queries a set of existing database tables through an
abstract, logical model. A sample logical model can be found in
``tests/fixtures/models/cra.json``, and a JSON schema specifying
the model format is available in ``babbage/schema/model.json``.
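
To give a rough idea of what such a model contains, here is a minimal sketch
written as a Python dictionary. The table, column, and concept names are
invented for illustration, and the key names used (``fact_table``,
``dimensions``, ``measures``, ``attributes``, ``key_attribute``) are an
assumption about the model layout; ``babbage/schema/model.json`` remains the
authoritative reference.

```python
import json

# Hypothetical model for a 'procurement' table. All names are illustrative,
# and the key layout is an assumption that should be checked against
# babbage/schema/model.json.
model = {
    "fact_table": "procurement",
    "dimensions": {
        "supplier": {
            "label": "Supplier",
            "key_attribute": "code",
            "attributes": {
                "code": {"column": "supplier_code", "label": "Code"},
                "label": {"column": "supplier_name", "label": "Name"},
            },
        },
    },
    "measures": {
        "total_value": {"column": "total_value", "label": "Total Value"},
    },
}

# Models are stored as JSON, so the dictionary should survive a round trip.
serialized = json.dumps(model, indent=2)
restored = json.loads(serialized)

# Each dimension's key attribute should be one of its declared attributes.
for name, dimension in restored["dimensions"].items():
    assert dimension["key_attribute"] in dimension["attributes"], name
```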

The central unit of ``babbage`` is a ``Cube``, i.e. an
[OLAP cube](https://en.wikipedia.org/wiki/OLAP_cube) that uses the provided
model metadata to construct queries against a database table. Additionally,
the application supports managing multiple cubes at the same time via a
``CubeManager``, which can be subclassed to enable application-specific ways
of defining cubes and of storing their metadata.

Further, ``babbage`` includes a Flask Blueprint that can be used to expose
a standard API via HTTP. This API is consumed by the JavaScript ``babbage.ui``
package and is very closely modelled on the Cubes and OpenSpending HTTP
APIs.

### Programmatic usage

Let's assume you have an existing database table of procurement data and
want to query it using ``babbage`` in a Python shell. A session might look
like this:

```python
import json
from sqlalchemy import create_engine
from babbage.cube import Cube
from babbage.model import Measure

engine = create_engine('postgresql://localhost/procurement')
# Load the logical model; the context manager closes the file afterwards.
with open('procurement_model.json', 'r') as fh:
    model = json.load(fh)

cube = Cube(engine, 'procurement', model)
facts = cube.facts(page_size=5)

# There are 17201 rows in the table:
assert facts['total_fact_count'] == 17201

# There's a field called 'total_value':
assert 'total_value' in facts['fields']

# We can get metadata about it:
concept = cube.model['total_value']
assert isinstance(concept, Measure)
assert concept.label == 'Total Value'

# And there's some actual data:
assert len(facts['data']) == 5
fact_0 = facts['data'][0]
assert 'total_value' in fact_0

# For dimensions, we can get all the distinct values:
members = cube.members('supplier', cut='year:2015', page_size=500)
assert len(members['data']) <= 500
assert members['total_member_count']

# And, finally, we can aggregate by specific dimensions:
aggregate = cube.aggregate(aggregates='total_value.sum',
                           drilldowns='supplier|authority',
                           cut='year:2015|authority.country:GB',
                           page_size=500)
# This translates to:
#   Aggregate the procurement data by summing up the 'total_value'
#   for each unique pair of values in the 'supplier' and 'authority'
#   dimensions, and filter for only those entries where the 'year'
#   dimension's key attribute is '2015' and the 'authority' dimension's
#   'country' attribute is 'GB'. Return the first 500 results.
assert aggregate['total_cell_count']
assert len(aggregate['cells']) <= 500
aggregate_0 = aggregate['cells'][0]
assert 'total_value.sum' in aggregate_0

# Note that these attribute names are made up for this example; in
# practice they are determined by the model:
assert 'supplier.code' in aggregate_0
assert 'supplier.label' in aggregate_0
assert 'authority.code' in aggregate_0
assert 'authority.label' in aggregate_0
```
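
The ``cut`` and ``drilldowns`` arguments above are plain strings: filters are
written as ``reference:value`` and multiple entries are joined with ``|``.
When the values come from application code, it can be convenient to assemble
these strings from Python data structures. The helpers below are a
hypothetical sketch based only on the syntax shown in the example; they are
not part of ``babbage`` and do not handle values that themselves contain
``|`` or ``:``.

```python
def build_cut(filters):
    """Join (reference, value) pairs into a cut string, e.g. 'year:2015|a.b:GB'."""
    return '|'.join('{}:{}'.format(ref, value) for ref, value in filters)


def build_drilldowns(dimensions):
    """Join dimension names into a drilldowns string, e.g. 'supplier|authority'."""
    return '|'.join(dimensions)


# Reproduce the filter and drilldown strings used in the example above.
cut = build_cut([('year', 2015), ('authority.country', 'GB')])
drilldowns = build_drilldowns(['supplier', 'authority'])
```

The resulting strings can then be passed as the ``cut`` and ``drilldowns``
arguments of ``cube.aggregate()``.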

### Using the HTTP API

The HTTP API for ``babbage`` is a simple Flask
[Blueprint](http://flask.pocoo.org/docs/latest/blueprints/) that exposes a
small set of calls corresponding to the cube functions listed above. To
include it in an existing Flask application, create a ``CubeManager`` and
then configure the API like this:

```python
from flask import Flask
from sqlalchemy import create_engine
from babbage.manager import JSONCubeManager
from babbage.api import configure_api

app = Flask('demo')
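# Placeholder connection URL; point this at your own PostgreSQL database.
engine = create_engine('postgresql://localhost/procurement')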
models_directory = 'models/'
manager = JSONCubeManager(engine, models_directory)
blueprint = configure_api(app, manager)
app.register_blueprint(blueprint, url_prefix='/api/babbage')

app.run()
```

Of course, you can define your own ``CubeManager``, for example if
you wish to retrieve model metadata from a database.

When enabled, the API will expose a number of JSON(P) endpoints
relative to the given ``url_prefix``:

* ``/``, returns the system status and version.
* ``/cubes``, returns a list of the available cubes (name only).
* ``/cubes/<name>/model``, returns full metadata for a given 
  cube (i.e. measures, dimensions, aggregates etc.)
* ``/cubes/<name>/facts`` is used to return individual entries from
  the cube in a non-aggregated form. Supports filters (``cut``), a
  set of ``fields`` to return and a ``sort`` (``field_name:direction``),
  as well as ``page`` and ``page_size``.
* ``/cubes/<name>/members`` is used to return the distinct set of 
  values for a given dimension, e.g. all the suppliers mentioned in
  a procurement dataset. Supports filters (``cut``) and a ``sort``
  (``field_name:direction``), as well as ``page`` and ``page_size``.
* ``/cubes/<name>/aggregate`` is the main endpoint for generating 
  aggregate views of the data. Supports specifying the ``aggregates``
  to include, the ``drilldowns`` to aggregate by, a set of filters
  (``cut``) and a ``sort`` (``field_name:direction``), as well
  as ``page`` and ``page_size``.
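
As a rough illustration of how a client might address these endpoints, the
sketch below builds an ``aggregate`` request URL using only the Python 3
standard library. The host, the ``/api/babbage`` prefix, and the cube name are
placeholders, and the query parameter names are taken from the list above;
they may differ in the installed version, so treat them as assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder base URL: host and prefix depend on how the blueprint is mounted.
BASE_URL = 'http://localhost:5000/api/babbage'


def aggregate_url(cube_name, **params):
    """Build the URL of the aggregate endpoint for a cube.

    Parameters set to None are dropped; the rest are URL-encoded.
    """
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = '{}/cubes/{}/aggregate'.format(BASE_URL, cube_name)
    return url + '?' + query if query else url


url = aggregate_url('procurement',
                    aggregates='total_value.sum',
                    drilldowns='supplier|authority',
                    cut='year:2015|authority.country:GB',
                    page_size=500,
                    sort=None)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
```

Characters such as ``|`` and ``:`` are percent-encoded in the resulting URL
and decoded again on the server side.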



