Metadata-Version: 2.0
Name: cooler
Version: 0.6.6
Summary: Sparse binary format for genomic interaction matrices
Home-page: https://github.com/mirnylab/cooler
Author: Nezar Abdennur
Author-email: nezar@mit.edu
License: BSD3
Keywords: genomics,bioinformatics,Hi-C,contact,matrix,format,hdf5
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: biopython
Requires-Dist: click (>=6.6)
Requires-Dist: h5py (>=2.5)
Requires-Dist: multiprocess
Requires-Dist: numpy (>=1.9)
Requires-Dist: pandas (>=0.17)
Requires-Dist: pyfaidx
Requires-Dist: pypairix
Requires-Dist: pysam (>0.8)
Requires-Dist: scipy (>=0.16)
Requires-Dist: six
Provides-Extra: docs
Requires-Dist: Sphinx (>=1.1); extra == 'docs'
Requires-Dist: numpydoc (>=0.5); extra == 'docs'

Cooler
======

|Build Status| |Documentation Status| |Binder| |Join the chat at
https://gitter.im/mirnylab/cooler|

A cool place to store your Hi-C
-------------------------------

Cooler is a support library for a **sparse, compressed, binary**
persistent storage format for Hi-C contact matrices, called ``cool``,
which is based on HDF5.

Cooler aims to provide the following functionality:

-  Generate contact matrices from contact lists at arbitrary
   resolutions.
-  Store contact matrices efficiently in ``cool`` format based on the
   widely used
   `HDF5 <https://en.wikipedia.org/wiki/Hierarchical_Data_Format>`__
   container format.
-  Perform out-of-core, genome-wide contact matrix normalization (a.k.a.
   balancing).
-  Perform fast range queries on a contact matrix.
-  Convert contact matrices between formats.
-  Provide a clean and well-documented Python API to work with Hi-C
   data.

To get started:

-  Documentation is available
   `here <http://cooler.readthedocs.org/en/latest/>`__.
-  `Walkthrough <https://github.com/mirnylab/cooler-binder>`__ with a
   Jupyter notebook.
-  ``cool`` files from published Hi-C data sets are available at
   ``ftp://cooler.csail.mit.edu/coolers``.

Installation
~~~~~~~~~~~~

Requirements:

-  Python 2.7/3.4+
-  libhdf5 and Python packages ``numpy``, ``scipy``, ``pandas``,
   ``h5py``. We highly recommend using the ``conda`` package manager to
   install scientific packages like these. To get it, you can either
   install the full `Anaconda <https://www.continuum.io/downloads>`__
   Python distribution or just the standalone
   `conda <http://conda.pydata.org/miniconda.html>`__ package manager.

Install from PyPI using ``pip``:

.. code:: sh

    $ pip install cooler

See the `docs <http://cooler.readthedocs.org/en/latest/>`__ for more
information.

Command line interface
~~~~~~~~~~~~~~~~~~~~~~

The ``cooler`` library includes utilities for creating and querying
``cool`` files and for performing contact matrix balancing on a ``cool``
file of any resolution.

.. code:: bash

    $ cooler makebins $CHROMSIZES_FILE $BINSIZE > bins.10kb.bed
    $ cooler cload bins.10kb.bed $CONTACTS_FILE out.cool
    $ cooler balance -p 10 out.cool
    $ cooler dump -b -t pixels --header --join -r chr3:10,000,000-12,000,000 -r2 chr17 out.cool | head

::

    chrom1  start1  end1    chrom2  start2  end2    count   balanced
    chr3    10000000        10010000        chr17   0       10000   1       0.810766
    chr3    10000000        10010000        chr17   520000  530000  1       1.2055
    chr3    10000000        10010000        chr17   640000  650000  1       0.587372
    chr3    10000000        10010000        chr17   900000  910000  1       1.02558
    chr3    10000000        10010000        chr17   1030000 1040000 1       0.718195
    chr3    10000000        10010000        chr17   1320000 1330000 1       0.803212
    chr3    10000000        10010000        chr17   1500000 1510000 1       0.925146
    chr3    10000000        10010000        chr17   1750000 1760000 1       0.950326
    chr3    10000000        10010000        chr17   1800000 1810000 1       0.745982
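
As a rough illustration of what the ``makebins`` step produces, the
sketch below tiles each chromosome with fixed-width bins. It is not
cooler's own implementation: the function name ``make_bins`` and the toy
chromosome sizes are hypothetical, and it assumes that the last bin of
each chromosome is truncated at the chromosome end.

.. code:: python

    def make_bins(chromsizes, binsize):
        """Return (chrom, start, end) tuples tiling each chromosome.

        `chromsizes` maps chromosome name -> length in bp (hypothetical input).
        Bins are half-open intervals; the final bin of a chromosome may be
        shorter than `binsize`.
        """
        bins = []
        for chrom, length in chromsizes.items():
            for start in range(0, length, binsize):
                bins.append((chrom, start, min(start + binsize, length)))
        return bins


    # Toy example: two tiny "chromosomes" binned at 10 bp.
    toy_bins = make_bins({"chrA": 25, "chrB": 10}, binsize=10)
    for chrom, start, end in toy_bins:
        print(chrom, start, end, sep="\t")

Each printed row has the same three-column, tab-separated layout as a
BED file of bins.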

See also:

-  `CLI Reference <http://cooler.readthedocs.io/en/latest/cli.html>`__.
-  Jupyter Notebook
   `walkthrough <https://github.com/mirnylab/cooler-binder>`__.

Python API
~~~~~~~~~~

The ``cooler`` library provides a thin wrapper over the excellent
`h5py <http://docs.h5py.org/en/latest/>`__ Python interface to HDF5. It
supports creation of cooler files and the following types of **range
queries** on the data:

-  Tabular selections are retrieved as Pandas DataFrames and Series.
-  Matrix selections are retrieved as NumPy arrays or SciPy sparse
   matrices.
-  Metadata is retrieved as a JSON-serializable Python dictionary.
-  Range queries can be supplied using either integer bin indexes or
   genomic coordinate intervals.

.. code:: python


    >>> import cooler
    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>> c = cooler.Cooler('bigDataset.cool')
    >>> resolution = c.info['bin-size']
    >>> mat = c.matrix(balance=True).fetch('chr5:10,000,000-15,000,000')
    >>> plt.matshow(np.log10(mat), cmap='YlOrRd')
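
To make the link between genomic coordinate intervals and integer bin
indexes concrete, here is a minimal sketch that maps a region string to
a span of bins. It assumes a fixed bin size and reports offsets relative
to the first bin of the chromosome; the helper names ``parse_region``
and ``region_to_bin_span`` are hypothetical and are not part of the
cooler API.

.. code:: python

    import re


    def parse_region(region):
        """Parse 'chrom:start-end' (commas allowed) into (chrom, start, end)."""
        match = re.fullmatch(r"([^:]+):([\d,]+)-([\d,]+)", region)
        if match is None:
            raise ValueError("expected 'chrom:start-end', got %r" % region)
        chrom, start, end = match.groups()
        return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


    def region_to_bin_span(region, binsize):
        """Return (chrom, lo, hi): bins lo..hi-1 of `chrom` overlap the region."""
        chrom, start, end = parse_region(region)
        lo = start // binsize      # first bin touching the interval
        hi = -(-end // binsize)    # ceiling division: one past the last bin
        return chrom, lo, hi


    span = region_to_bin_span("chr5:10,000,000-15,000,000", binsize=10000)
    print(span)

With 10 kb bins, the 5 Mb interval above covers 500 bins on ``chr5``.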

.. code:: python

    >>> import multiprocessing as mp
    >>> import h5py
    >>> pool = mp.Pool(8)
    >>> f = h5py.File('bigDataset.cool', 'r')
    >>> weights, stats = cooler.ice.iterative_correction(f, map=pool.map, ignore_diags=3, min_nnz=10)
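
The idea behind balancing can be sketched on a small dense matrix. The
code below is a simplified, in-memory illustration of iterative
correction, not cooler's out-of-core implementation: the function name
``balance`` is hypothetical, and filtering options such as
``ignore_diags`` and ``min_nnz`` are left out. It assumes that a
balanced value is the raw count multiplied by the weights of its two
bins.

.. code:: python

    def balance(matrix, n_iter=50):
        """Return (weights, balanced) for a small dense symmetric matrix.

        Each pass divides the running matrix by the outer product of its
        mean-normalized row sums and accumulates those factors as biases.
        """
        n = len(matrix)
        current = [[float(v) for v in row] for row in matrix]
        bias = [1.0] * n
        for _ in range(n_iter):
            sums = [sum(row) for row in current]
            nonzero = [s for s in sums if s != 0]
            if not nonzero:
                break
            mean = sum(nonzero) / len(nonzero)
            # Empty rows get a neutral factor of 1 so they are left untouched.
            factors = [s / mean if s != 0 else 1.0 for s in sums]
            for i in range(n):
                for j in range(n):
                    current[i][j] /= factors[i] * factors[j]
            bias = [b * f for b, f in zip(bias, factors)]
        weights = [1.0 / b for b in bias]
        return weights, current


    raw = [[2, 2, 4],
           [2, 8, 8],
           [4, 8, 32]]
    weights, balanced = balance(raw)
    row_sums = [sum(row) for row in balanced]
    print(row_sums)

After enough passes the row sums of the balanced matrix are equal up to
floating-point error, and each balanced entry equals
``raw[i][j] * weights[i] * weights[j]``.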

See also:

-  `API Reference <http://cooler.readthedocs.io/en/latest/api.html>`__.
-  Jupyter Notebook
   `walkthrough <https://github.com/mirnylab/cooler-binder>`__.

Schema
~~~~~~

The ``cool``
`format <http://cooler.readthedocs.io/en/latest/datamodel.html>`__
implements a simple schema that stores a contact matrix in a sparse
representation. Sparsity is crucial for developing robust tools for
increasingly high-resolution Hi-C data sets, including streaming and
`out-of-core <https://en.wikipedia.org/wiki/Out-of-core_algorithm>`__
algorithms.

The data tables in a ``cool`` file are stored in a **columnar**
representation as HDF5 groups of 1D array datasets of equal length. The
contact matrix itself is stored as a single table containing only the
**nonzero upper triangle** pixels.
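
The following sketch shows what a columnar, upper-triangle
representation looks like for a tiny symmetric matrix. It uses plain
Python lists in place of HDF5 datasets. The function name
``to_pixel_columns`` is hypothetical, and the column names
``bin1_id``, ``bin2_id`` and ``count`` are assumed here for
illustration; check the format documentation for the authoritative
layout.

.. code:: python

    def to_pixel_columns(dense):
        """Convert a symmetric dense matrix into columnar upper-triangle pixels."""
        columns = {"bin1_id": [], "bin2_id": [], "count": []}
        n = len(dense)
        for i in range(n):
            for j in range(i, n):  # upper triangle, diagonal included
                value = dense[i][j]
                if value != 0:     # zero pixels are not stored
                    columns["bin1_id"].append(i)
                    columns["bin2_id"].append(j)
                    columns["count"].append(value)
        return columns


    dense = [[3, 0, 1],
             [0, 0, 2],
             [1, 2, 5]]
    pixels = to_pixel_columns(dense)
    print(pixels)

The three columns have equal length, one entry per stored pixel, so
the nine-element matrix above is reduced to four rows.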

Contributing
~~~~~~~~~~~~

`Pull
requests <https://akrabat.com/the-beginners-guide-to-contributing-to-a-github-project/>`__
are welcome. The current requirements for testing are ``nose`` and
``mock``.

For development, clone and install in "editable" (i.e. development) mode
with the ``-e`` option. This way you can also pull changes on the fly.

.. code:: sh

    $ git clone https://github.com/mirnylab/cooler.git
    $ cd cooler
    $ pip install -e .

License
~~~~~~~

BSD (New)

.. |Build Status| image:: https://travis-ci.org/mirnylab/cooler.svg?branch=master
   :target: https://travis-ci.org/mirnylab/cooler
.. |Documentation Status| image:: https://readthedocs.org/projects/cooler/badge/?version=latest
   :target: http://cooler.readthedocs.org/en/latest/
.. |Binder| image:: http://mybinder.org/badge.svg
   :target: http://mybinder.org:/repo/mirnylab/cooler-binder
.. |Join the chat at https://gitter.im/mirnylab/cooler| image:: https://badges.gitter.im/mirnylab/cooler.svg
   :target: https://gitter.im/mirnylab/cooler?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge


