=======
msprime
=======

Msprime is a reimplementation of Hudson's classical ms program for modern
datasets. It is under heavy development at the moment, and the present
code should be considered beta quality. Bugs may be present,
and the API might change considerably. We provide an ms-compatible
command line interface ``mspms``, which should be useful to fit into
existing workflows. However, the Python API is the recommended way to
use msprime to gain full advantage of its internal structures.

See the `Examples`_ section below for some quick examples on how to
use the API. Full documentation will be forthcoming.

-------------
Installation
-------------

The simplest method of installation is to use PyPI and pip::

    $ pip install msprime --user

will install msprime your user Python installation. This should
work in most cases (but see the `Requirements`_ section).

To use the ms-compatible command line program, you must ensure
that the ``~/.local/bin/`` directory is in your ``PATH``.


*************
Requirements
*************

Msprime requires Python 2.7+ (Python 3 versions are fully supported from
3.1 onwards), and the `GNU Scientific Library <http://www.gnu.org/software/gsl/>`_,
which is available for all major platforms. For example, to install on
Debian/Ubuntu use::

    # apt-get install libgsl0-dev

--------
Examples
--------

To simulate a tree under the standard coalescent for a sample of 4
individuals, use::

    >>> import msprime
    >>> pi, tau = msprime.simulate_tree(4)

Here, ``pi`` is an oriented tree, and ``tau`` is the node time list.
See the documentation for
`ercs <http://jeromekelleher.github.io/ercs/#oriented-trees-and-forests>`_
for an explanation of oriented forests. The oriented forest is a simple
list of integers describing the genealogy of the sample in a concise
and elegant manner::

    >>> pi
    [-1, 6, 5, 6, 5, 7, 7, 0]

To simulate the history of the sample over multiple loci with recombination,
use the ``simulate_trees(n, m, r)`` function. Here, ``n`` is the sample
size, ``m`` is the number of loci and ``r`` is the scaled recombination rate
``4 * Ne * rho`` and ``rho`` is the per-generation recombination rate (note that
we do *not* scale the recombination rate by the number of loci as
ms does). For
example::

    >>> for l, pi, tau in msprime.simulate_trees(4, 20, 0.1):
    ...     print(l, ":", pi, ":", tau)
    ...
    5 : [-1, 5, 6, 5, 7, 6, 7, 0] : [-1, 0, 0, 0, 0, 0.008097067475318909, 0.29228582978248596, 1.4635169506072998]
    2 : [-1, 5, 6, 5, 7, 6, 7, 0] : [-1, 0, 0, 0, 0, 0.008097067475318909, 0.29228582978248596, 1.1387423276901245]
    1 : [-1, 5, 6, 5, 7, 6, 7, 0] : [-1, 0, 0, 0, 0, 0.008097067475318909, 0.29228582978248596, 1.4635169506072998]
    4 : [-1, 5, 6, 5, 7, 6, 7, 0] : [-1, 0, 0, 0, 0, 0.008097067475318909, 0.29228582978248596, 1.4635169506072998]
    3 : [-1, 5, 6, 5, 7, 6, 7, 0] : [-1, 0, 0, 0, 0, 0.008097067475318909, 0.29228582978248596, 1.2797138690948486]
    3 : [-1, 5, 7, 5, 6, 6, 7, 0] : [-1, 0, 0, 0, 0, 0.008097067475318909, 0.23547497391700745, 0.29228582978248596]
    2 : [-1, 5, 6, 5, 6, 7, 7, 0] : [-1, 0, 0, 0, 0, 0.008097067475318909, 0.10243768990039825, 0.23547497391700745]

The ``simulate_trees`` function returns an iterator over the trees
generated by the coalescent with recombination (``pi`` and ``tau``) along
with the length of each tree, ``l``. For instance, the first tree in the simulation
above is 5 loci long, and the length of the second is 2 loci.


***********
Demography
***********

Msprime supports demographic events in the history of the sample.
These are described using ``PopulationModel`` objects. A population
model describes the dynamics of a population starting from a given
time point (as we look into the past), and may be either a fixed
size or exponentially growing (or shrinking). The models (and parameters)
correspond exactly to the ``-eN`` and ``-eG`` options of ms.

For example, to simulate a history in which the population size was
twice its current size between 0.5 and 1.0 time units ago and
half its size after that, we might use::

    >>> models = [
    ...    msprime.ConstantPopulationModel(0.5, 2.0),
    ...    msprime.ConstantPopulationModel(1.0, 0.5)
    ...]
    >>> pi, tau = msprime.simulate_tree(10, models)

Similarly, if we wanted to simulate a population currently growing
exponentially, with growth rate ``alpha = 6``, we might use::

    >>> models = [msprime.ExponentialPopulationModel(0.0, 6.0)]
    >>> pi, tau = msprime.simulate_tree(10, models)

These population models can be mixed freely, and should behave
precisely the same way as in ms.
