Metadata-Version: 2.1
Name: corintick
Version: 0.2.0
Summary: Column-based datastore for historical timeseries
Home-page: https://github.com/plugaai/corintick
Author: Gustavo Bezerra
Author-email: gusutabopb@gmail.com
License: GPL
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Database
Classifier: Topic :: Database :: Front-Ends
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.6
Provides-Extra: test
Requires-Dist: lz4 (>=1.0.0)
Requires-Dist: pandas (>=0.23)
Requires-Dist: pymongo (>=3.6)
Requires-Dist: numpy
Requires-Dist: pytz
Requires-Dist: msgpack-python
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: flake8; extra == 'test'

corintick
=========

Column-based datastore for historical timeseries data. Corintick is
designed mainly to store `pandas <http://pandas.pydata.org/>`__
DataFrames that represent timeseries.


Installation
------------

Corintick requires a running MongoDB server. See the MongoDB installation
instructions `here <https://docs.mongodb.com/manual/installation/>`__.

Corintick itself can be installed with ``pip``:

.. code:: bash

   $ pip install corintick


Quickstart
----------

Initialize Corintick:

.. code:: python

   from corintick import Corintick
   corin = Corintick()

Now we need a DataFrame to insert into Corintick. For demonstration
purposes, we will get data from `Quandl <https://www.quandl.com/>`__:

.. code:: python

   import quandl
   df1 = quandl.get('TSE/7203')

Here, ``df1`` looks like this:

.. code:: text

                 Open    High     Low   Close      Volume
   Date
   2012-08-23  3240.0  3270.0  3220.0  3260.0   4652200.0
   2012-08-24  3225.0  3245.0  3210.0  3235.0   3659600.0
   2012-08-27  3250.0  3280.0  3215.0  3220.0   3614600.0
   2012-08-28  3235.0  3260.0  3150.0  3180.0   6759100.0
   2012-08-29  3180.0  3195.0  3160.0  3175.0   2614800.0
   2012-08-30  3180.0  3190.0  3160.0  3170.0   3291700.0
   2012-08-31  3135.0  3155.0  3095.0  3095.0   5663800.0
   ...
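
If you cannot use Quandl (fetching from it requires the ``quandl`` package
and network access), you can build a DataFrame with the same shape by hand.
The sketch below fills a business-day ``DatetimeIndex`` with random, made-up
OHLCV values. It assumes only that Corintick accepts a DataFrame indexed by
timestamps, as the examples in this README suggest.

.. code:: python

   import numpy as np
   import pandas as pd

   # Synthetic daily OHLCV data: the values are random, not real market data.
   rng = np.random.default_rng(seed=0)
   index = pd.bdate_range('2012-08-23', periods=250, name='Date')  # business days
   n = len(index)

   close = 3200 + rng.normal(0, 30, size=n).cumsum()  # random walk around 3200
   open_ = close + rng.normal(0, 10, size=n)
   df1 = pd.DataFrame({
       'Open': open_,
       # High/Low bracket both Open and Close so every row is self-consistent.
       'High': np.maximum(open_, close) + rng.uniform(0, 20, size=n),
       'Low': np.minimum(open_, close) - rng.uniform(0, 20, size=n),
       'Close': close,
       'Volume': rng.integers(2_000_000, 8_000_000, size=n).astype(float),
   }, index=index)

The resulting ``df1`` has the same columns as the Quandl data, so you can use
it in the examples that follow.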

Writing
^^^^^^^

Inserting ``df1`` into Corintick is simple:

.. code:: python

   corin.write('7203.T', df1, source='Quandl', country='Japan')

The first argument passed to ``Corintick.write`` is a UID (universal
identifier) and must be unique for each timeseries inserted in a given
collection. The second argument is the DataFrame to be inserted. The
remaining keyword arguments are optional metadata tags that are
attached to the DataFrame/document and can be used for querying.

Reading
^^^^^^^

Reading from Corintick is also straightforward:

.. code:: python

   df2 = corin.read('7203.T')

You can also specify ``start`` and ``end`` as ISO-8601 datetime strings...

.. code:: python

   df2 = corin.read('7203.T', start='2014-01-01', end='2014-12-31')


.. code:: text

                 Open    High     Low   Close      Volume
   2014-01-06  6360.0  6400.0  6280.0  6300.0  12249300.0
   2014-01-07  6270.0  6340.0  6260.0  6270.0   7891400.0
   2014-01-08  6310.0  6320.0  6260.0  6300.0   7184100.0
   2014-01-09  6310.0  6340.0  6260.0  6270.0   8653000.0
   2014-01-10  6260.0  6310.0  6250.0  6290.0   7815900.0
   ...
   2014-12-24  7645.0  7687.0  7639.0  7657.0   9287900.0
   2014-12-25  7600.0  7655.0  7597.0  7611.0   5362700.0
   2014-12-26  7629.0  7700.0  7615.0  7696.0   6069100.0
   2014-12-29  7740.0  7746.0  7565.0  7662.0   9942800.0
   2014-12-30  7652.0  7674.0  7558.0  7558.0   7821200.0

...and which columns you want retrieved:

.. code:: python

   df2 = corin.read('7203.T', columns=['Close', 'Volume'], start='2017-05-10')

.. code:: text

                Close      Volume
   2017-05-10  6081.0   7823700.0
   2017-05-11  6123.0  13511900.0
   2017-05-12  6047.0   8216600.0
   2017-05-15  6009.0   5925200.0
   2017-05-16  6093.0   6449300.0
   ...
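
The two ``read`` calls above select roughly what the following pandas label
slices select from an in-memory DataFrame. This is only an analogy, and it
rests on an assumption: it treats ``start`` and ``end`` as inclusive and an
omitted ``end`` as "up to the last stored row". Pandas slicing behaves this
way, but this README does not say that Corintick does. The data is made up.

.. code:: python

   import pandas as pd

   # Made-up stand-in for the stored '7203.T' series.
   index = pd.to_datetime(['2013-12-30', '2014-01-06', '2014-12-30',
                           '2017-05-10', '2017-05-11'])
   df = pd.DataFrame({'Open': [100.0, 101.0, 102.0, 103.0, 104.0],
                      'Close': [100.5, 101.5, 102.5, 103.5, 104.5],
                      'Volume': [1e6, 2e6, 3e6, 4e6, 5e6]},
                     index=index)

   # Like corin.read('7203.T', start='2014-01-01', end='2014-12-31'):
   # ISO-8601 strings are parsed into timestamps; both bounds are inclusive.
   year_2014 = df.loc['2014-01-01':'2014-12-31']

   # Like corin.read('7203.T', columns=['Close', 'Volume'], start='2017-05-10'):
   # an open-ended slice restricted to the requested columns.
   recent = df.loc['2017-05-10':, ['Close', 'Volume']]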

Configuration
^^^^^^^^^^^^^

By default, Corintick tries to use a MongoDB instance running at ``localhost:27017``.
This can be changed through the ``host`` and ``port`` arguments of the ``Corintick`` initializer.
Similarly, the database to be used by Corintick defaults to ``corintick`` and can also be changed using
the ``db`` parameter.
All the data in the ``db`` database is assumed to be Corintick data, so do not let
other processes or applications read from or write to that database.

In case your MongoDB setup requires authentication, you can use the ``username`` and ``password`` arguments.

See ``Corintick.__init__`` for details.


Collections
-----------

Corintick can use multiple collections to better organize data. A
Corintick collection is the same as a MongoDB collection. In each
collection, only a single dataframe/document can exist for a given UID
for a given time period.

If you need to store two different types of data for the same UID
over overlapping time frames (e.g. trade data and order book data for
a given stock), store each type of data in a separate collection.
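
The rule above comes down to an interval-overlap test: two time ranges for
the same UID collide when each one starts no later than the other ends. The
``overlaps`` helper below is a hypothetical illustration of that check (both
ends assumed inclusive). Corintick does not export a function by that name.

.. code:: python

   from datetime import date


   def overlaps(start_a: date, end_a: date, start_b: date, end_b: date) -> bool:
       """Hypothetical helper: True if [start_a, end_a] and [start_b, end_b] intersect."""
       return start_a <= end_b and start_b <= end_a


   # Trade data and order book data for the same stock over overlapping
   # periods collide, so they belong in different collections.
   trades = (date(2017, 5, 1), date(2017, 5, 31))
   orderbook = (date(2017, 5, 15), date(2017, 6, 15))
   print(overlaps(*trades, *orderbook))  # True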

By default, data is written to the ``corintick`` collection.
This default collection can be changed by assigning a string to
``Corintick.default_collection``:

.. code:: python

   corin.default_collection = 'another_collection'

Collections can also be specified on a method call basis:

.. code:: python

   df = corin.read('7203.T', collection='orderbook')

.. code:: python

   corin.write('7203.T', df, collection='another_collection')


Corintick mechanics
-------------------

During writing, Corintick does the following:

1) Takes the input DataFrame and splits it into columns
2) Serializes each column and compresses it using the LZ4 compression algorithm
3) Generates a MongoDB document containing the binary blobs
   corresponding to each column and other metadata

During reading, the opposite takes place:

1) Documents are fetched
2) Data is decompressed and converted back to numpy arrays
3) DataFrame is reconstructed and returned to the user
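
The write and read steps above can be sketched with NumPy and the standard
library. This is a simplified illustration, not Corintick's actual code. It
makes several assumptions: ``zlib`` stands in for LZ4 so that the sketch runs
without extra dependencies, only numeric columns and a tz-naive
``DatetimeIndex`` are handled, and the document layout (``uid``,
``metadata``, ``index``, ``dtypes``, ``columns`` keys) is hypothetical.

.. code:: python

   import zlib

   import numpy as np
   import pandas as pd


   def encode(uid, df, **metadata):
       """Split df into columns and compress each one (zlib stands in for LZ4)."""
       # Assumes a tz-naive DatetimeIndex, stored as int64 nanoseconds since the epoch.
       stamps = df.index.values.astype('datetime64[ns]').view('int64')
       doc = {'uid': uid, 'metadata': metadata,
              'index': zlib.compress(stamps.tobytes()),
              'dtypes': {}, 'columns': {}}
       for name in df.columns:
           values = df[name].to_numpy()  # numeric columns only in this sketch
           doc['dtypes'][name] = values.dtype.str  # e.g. '<f8', needed to decode
           doc['columns'][name] = zlib.compress(values.tobytes())
       return doc


   def decode(doc, columns=None):
       """Decompress the index and the requested columns, then rebuild the DataFrame."""
       stamps = np.frombuffer(zlib.decompress(doc['index']), dtype='int64')
       index = pd.DatetimeIndex(stamps.view('datetime64[ns]'))
       names = list(doc['columns']) if columns is None else columns
       data = {name: np.frombuffer(zlib.decompress(doc['columns'][name]),
                                   dtype=doc['dtypes'][name])
               for name in names}
       return pd.DataFrame(data, index=index)

In this sketch, ``decode(doc, columns=['Close'])`` decompresses only the
``Close`` blob. This is what makes reading a subset of columns cheap in a
column-based layout.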

Background
----------

Corintick was inspired by and aims to be a simplified version of Man
AHL’s `Arctic <https://github.com/manahl/arctic>`__.



Differences from Arctic
^^^^^^^^^^^^^^^^^^^^^^^

Corintick has a single storage engine, which is column-based and not
versioned, similar to Arctic’s TickStore. Unlike TickStore, however, it
supports non-numerical ``object`` dtype columns by serializing them
with MessagePack.
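
``object`` columns (for example, strings) cannot take the raw-bytes route. For
an ``object`` array, ``tobytes()`` returns the stored object pointers rather
than the values, so each element has to be serialized individually. That is
the job MessagePack does here. The sketch below uses the standard library's
``json`` as a stand-in for MessagePack so that it runs without ``msgpack``
installed. Both function names are hypothetical.

.. code:: python

   import json
   import zlib

   import numpy as np


   def encode_object_column(values: np.ndarray) -> bytes:
       """Hypothetical: serialize element-wise (json stands in for MessagePack), then compress."""
       return zlib.compress(json.dumps(values.tolist()).encode('utf-8'))


   def decode_object_column(blob: bytes) -> np.ndarray:
       """Hypothetical: reverse encode_object_column, returning an object-dtype array."""
       return np.array(json.loads(zlib.decompress(blob).decode('utf-8')), dtype=object)


   side = np.array(['buy', 'sell', 'buy'], dtype=object)
   restored = decode_object_column(encode_object_column(side))
   print(restored.tolist())  # ['buy', 'sell', 'buy']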

Naming
^^^^^^

Corintick was designed from the beginning to be a column-based datastore.
"Corintick" is a blend of "Corinthian" (a classical Greek and Roman column
style) and "tick".

Benchmarks
----------

**TODO**

- **vs InfluxDB**
- **vs vanilla MongoDB**
- **vs MySQL**
- **vs KDB+ (32-bit)**

Contributing
------------

| To contribute, fork the repository on GitHub, make your changes and
  submit a pull request.
| Corintick is not a mature project yet, so simply raising issues
  is also greatly appreciated :)


