Metadata-Version: 2.4
Name: PyMSQ
Version: 0.1.2
Summary: A Python package for fast Mendelian sampling (co)variance and haplotype-based similarity in genomic selection
Home-page: https://github.com/aromemusa/PyMSQ
Author: Abdulraheem Musa, Norbert Reinsch
Author-email: musa@fbn-dummerstorf.de, reinsch@fbn-dummerstorf.de
License: MIT
Project-URL: Source, https://github.com/aromemusa/PyMSQ
Project-URL: Bug Tracker, https://github.com/aromemusa/PyMSQ/issues
Project-URL: Documentation, https://github.com/aromemusa/PyMSQ/tree/main/docs
Project-URL: Zenodo (v0.1.2 DOI), https://doi.org/10.5281/zenodo.18643470
Project-URL: Software paper, https://doi.org/10.1186/s12859-026-06392-5
Project-URL: Method paper, https://doi.org/10.1111/jbg.12930
Keywords: Mendelian sampling,variance,covariance,similarity,selection,haplotype diversity
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<1.25
Requires-Dist: pandas
Requires-Dist: scipy
Requires-Dist: numba<0.58
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# PyMSQ: a Python package for fast Mendelian sampling (co)variance and haplotype-based similarity in genomic selection

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18643470.svg)](https://doi.org/10.5281/zenodo.18643470)

PyMSQ is an open-source Python package that enables breeders, geneticists, and quantitative biologists to estimate Mendelian sampling metrics (variance, covariance, and haplotype-based similarity) in both plant and animal species. For simplicity, PyMSQ consists of a single module, [`msq`](docs/documentation_msq.md).

## Key Features
- **Within-Family Covariance**  
  Constructs population-specific covariance matrices that capture within-family linkage disequilibrium, reflecting recombination patterns and phased marker data.

- **Mendelian Sampling (Co)Variance**  
  Estimates Mendelian sampling variance (MSV) for single or multiple traits, as well as covariances (MSCs), crucial for maintaining genetic diversity and controlling inbreeding.

- **Similarity Matrices**  
  Computes haplotype-based similarity matrices between individuals (or zygotes), focusing on shared heterozygous segments that drive within-family genetic variation.

- **Selection Criteria**  
  Offers functions to derive selection strategies (e.g., GEBVs, usefulness criteria, index-based approaches) that leverage MSV/MSC or similarity measures.
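
To make the MSV idea concrete, the sketch below computes the gametic Mendelian sampling variance of a single parent as a quadratic form `t' R t`, where `t` holds phase-signed marker effects and `R` is a within-family covariance matrix. It is a toy illustration, not PyMSQ's implementation: the function name `toy_msv`, the 0/1 haplotype coding, and the use of the Haldane mapping function to build `R` are all assumptions made for brevity.

```python
import numpy as np


def toy_msv(hap1, hap2, effects, pos_morgan):
    """Gametic Mendelian sampling variance of one parent (toy sketch).

    hap1, hap2 : phased parental haplotypes coded 0/1
    effects    : additive marker effects
    pos_morgan : marker positions on one chromosome, in Morgans
    """
    hap1 = np.asarray(hap1, dtype=float)
    hap2 = np.asarray(hap2, dtype=float)
    effects = np.asarray(effects, dtype=float)
    pos = np.asarray(pos_morgan, dtype=float)

    # Only heterozygous markers contribute: the entry is zero where hap1 == hap2.
    t = effects * (hap1 - hap2)

    # Assumed Haldane mapping: 1 - 2r = exp(-2d), so the covariance of the
    # haplotype-transmission indicators at two loci is exp(-2d) / 4.
    dist = np.abs(pos[:, None] - pos[None, :])
    cov = np.exp(-2.0 * dist) / 4.0

    return float(t @ cov @ t)


# Two fully linked heterozygous markers in coupling phase.
coupling = toy_msv([1, 1], [0, 0], [1.0, 1.0], [0.0, 0.0])
# The same markers in repulsion phase: both haplotypes carry the same total effect.
repulsion = toy_msv([1, 0], [0, 1], [1.0, 1.0], [0.0, 0.0])
print(coupling, repulsion)
```

In this toy setting, the coupling-phase parent produces variable gametes while the repulsion-phase parent does not, which is why phased marker data and recombination patterns matter for the within-family covariance.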

## Installation
PyMSQ is available on PyPI and can be installed via:

```bash
python -m pip install PyMSQ
```
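
After installing, a quick check like the following can confirm that the package is visible to the interpreter. This is a minimal sketch using only the standard library; the helper names are hypothetical. The package metadata declares only a lower bound (`Requires-Python: >=3.8`), so treating 3.8 to 3.11 as the expected range is an assumption based on the listed classifiers and the `numpy<1.25` and `numba<0.58` pins.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def python_in_listed_range(version_info=None):
    """True if the interpreter is among the versions listed in the classifiers.

    The 3.8 to 3.11 range is an assumption taken from the classifiers,
    not a limit enforced by the package itself.
    """
    major, minor = tuple(version_info or sys.version_info)[:2]
    return (3, 8) <= (major, minor) <= (3, 11)


print("PyMSQ:", installed_version("PyMSQ"))
print("Python version listed in classifiers:", python_in_listed_range())
```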
    

## Basic Usage
Below is a minimal example illustrating how to import PyMSQ and call its core functions:


```python
from PyMSQ import msq  # Imports the msq module

# Example: Loading an included dataset
data = msq.load_package_data()

# Deriving expected LD matrices
ld_matrices = msq.expldmat(data['chromosome_data'], data['group_data'])

# Estimating Mendelian sampling (co)variances
msv = msq.msvarcov(
    gmat      = data['genotype_data'],
    gmap      = data['chromosome_data'],
    meff      = data['marker_effect_data'],
    exp_ldmat = ld_matrices,
    group     = data['group_data']
)

# Constructing similarity matrices
similarity = msq.simmat(
    gmat      = data['genotype_data'],
    gmap      = data['chromosome_data'],
    meff      = data['marker_effect_data'],
    group     = data['group_data'],
    exp_ldmat = ld_matrices
)
```
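
When substituting your own data for the bundled dataset, it may help to confirm that the dictionary carries the four keys the example above reads before calling any `msq` function. The helper below is a hypothetical convenience, not part of PyMSQ; it assumes only that the data object supports `in` lookups by key, as a plain dictionary does.

```python
# Keys read by the usage example above; the bundled loader may return more.
REQUIRED_KEYS = (
    "genotype_data",
    "chromosome_data",
    "marker_effect_data",
    "group_data",
)


def missing_keys(data):
    """Return the keys used in the usage example that are absent from `data`."""
    return [key for key in REQUIRED_KEYS if key not in data]


# Stand-in dictionary; with PyMSQ installed, pass msq.load_package_data() instead.
example = {"genotype_data": None, "chromosome_data": None}
print(missing_keys(example))
```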


## Tutorial
A tutorial detailing each function’s parameters, usage examples, and best practices is available in [Illustration of PyMSQ functions](docs/Illustration_of_PyMSQ_functions.md). It walks you through:

1. **Loading** your own data or the bundled Holstein-Friesian dataset,

2. **Building** LD matrices for each chromosome,

3. **Estimating** Mendelian sampling (co)variances,

4. **Deriving** haplotype-based similarity,

5. **Applying** selection strategies using advanced metrics.


## Citation
If you use PyMSQ in academic work, please cite the following papers and the archived software release:

1. Musa, A. A., & Reinsch, N. (2026). PyMSQ: a Python package for fast Mendelian sampling (co)variance and haplotype-based similarity in genomic selection. *BMC Bioinformatics*, https://doi.org/10.1186/s12859-026-06392-5.

2. Musa, A. A., & Reinsch, N. (2025). A similarity matrix for hedging haplotype diversity among parents in genomic selection. *Journal of Animal Breeding and Genetics*, https://doi.org/10.1111/jbg.12930.

3. Zenodo (software version v0.1.2): https://doi.org/10.5281/zenodo.18643470

## Funding
This study was supported by the Bundesanstalt für Landwirtschaft und Ernährung (BLE) under Grant 281B101516.



## Getting Help
- **Issues & Feature Requests**  
  If you encounter bugs, have feature requests, or need clarification, please open an issue on the [PyMSQ issue tracker](https://github.com/aromemusa/PyMSQ/issues).

- **License**  
  PyMSQ is released under the MIT License, which permits both academic and commercial use.



**Happy analyzing!**

We hope PyMSQ supports your work in breeding, helping you balance short-term genetic gains with the long-term preservation of essential haplotype diversity.

