Metadata-Version: 2.1
Name: lshmm
Version: 0.0.7
License: MIT
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numba

# lshmm
**lshmm** is a Python library for prototyping, experimenting, and testing implementations of algorithms using the Li & Stephens (2003) Hidden Markov Model.

## Usage

### Inputs
#### Data
* Sample and/or ancestral haplotypes comprising a reference panel.
* Query haplotypes.

In the haploid mode, the alleles in haplotypes can be represented by any integer value (besides `-1` and `-2`, which are special values). In the diploid mode, the genotypes (encoded as allele dosages) can be `0` (homozygous for the reference allele), `1` (heterozygous for the alternative allele), or `2` (homozygous for the alternative allele). Currently, multiallelic sites are supported in the haploid mode, but not the diploid mode.

Note that there are two special values `NONCOPY` and `MISSING`. `NONCOPY` (or `-2`) represent non-copiable states, and can only be found in partial ancestral haplotypes in the reference panel. `MISSING` (or `-1`) representing missing data, and can be found only in query haplotypes.

#### Parameters
* Recombination probabilities.
* Mutation probabilities.

### Models and algorithms
* Haploid LS HMM
  * Forward-backward algorithm
  * Viterbi algorithm
* Diploid LS HMM
  * Forward-backward algorithm
  * Viterbi algorithm

### Features
* Scaling of mutation rate by the number of distinct alleles per site.
* Non-copiable state in the reference panel (`NONCOPY`).
* Missing state in the query (`MISSING`).
* Multiallelic sites (haploid only).
