Metadata-Version: 2.4
Name: dmeth
Version: 0.2.0
Summary: dmeth: A toolkit for comprehensive, transparent, and reproducible DNA methylation analysis
Author-email: Dare Afolabi <dare.afolabi@outlook.com>
License: MIT License
        
        Copyright (c) 2025 Dare Afolabi
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Keywords: dna-methylation,epigenetics,differential-methylation,bioinformatics,limma,biomarker-discovery,cell-type-deconvolution
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Typing :: Typed
Requires-Python: <3.13,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<3.0,>=1.23
Requires-Dist: pandas<3.0,>=1.5
Requires-Dist: scipy<2.0,>=1.9
Requires-Dist: statsmodels<1.0,>=0.14
Requires-Dist: pydantic<3.0,>=1.10
Requires-Dist: matplotlib<4.0,>=3.7
Requires-Dist: seaborn<1.0,>=0.12
Requires-Dist: tqdm<5.0,>=4.65
Requires-Dist: scikit-learn<2.0,>=1.2
Requires-Dist: jsonschema<5.0,>=4.17
Requires-Dist: importlib-resources<7.0,>=6.4
Provides-Extra: speed
Requires-Dist: combat<1.0,>=0.3; extra == "speed"
Requires-Dist: numba<1.0,>=0.57; extra == "speed"
Provides-Extra: annotation
Requires-Dist: intervaltree<4.0,>=3.1; extra == "annotation"
Requires-Dist: pyliftover<1.0,>=0.4; extra == "annotation"
Provides-Extra: parallel
Requires-Dist: joblib<2.0,>=1.2; extra == "parallel"
Provides-Extra: data-formats
Requires-Dist: PyYAML<7.0,>=6.0; extra == "data-formats"
Requires-Dist: toml<1.0,>=0.10; extra == "data-formats"
Requires-Dist: xlrd<3.0,>=2.0; extra == "data-formats"
Requires-Dist: h5py<4.0,>=3.8; extra == "data-formats"
Provides-Extra: plotting
Requires-Dist: plotly<7.0,>=5.15; extra == "plotting"
Requires-Dist: umap-learn<1.0,>=0.5.3; extra == "plotting"
Provides-Extra: io
Requires-Dist: pyarrow<23.0,>=14; extra == "io"
Requires-Dist: tables<4.0,>=3.9; extra == "io"
Requires-Dist: openpyxl<4.0,>=3.1; extra == "io"
Requires-Dist: xlsxwriter<4.0,>=3.0; extra == "io"
Provides-Extra: full
Requires-Dist: dmeth[speed]; extra == "full"
Requires-Dist: dmeth[annotation]; extra == "full"
Requires-Dist: dmeth[parallel]; extra == "full"
Requires-Dist: dmeth[data-formats]; extra == "full"
Requires-Dist: dmeth[plotting]; extra == "full"
Requires-Dist: dmeth[io]; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest<10.0,>=7.3; extra == "dev"
Requires-Dist: pytest-cov<8.0,>=4.1; extra == "dev"
Requires-Dist: black<26.0,>=23.3; extra == "dev"
Requires-Dist: isort<8.0,>=5.12; extra == "dev"
Requires-Dist: flake8<8.0,>=7.0; extra == "dev"
Requires-Dist: flake8-pyproject<2.0,>=1.2; extra == "dev"
Requires-Dist: flake8-bugbear<26.0,>=23.0; extra == "dev"
Requires-Dist: bandit<2.0,>=1.7; extra == "dev"
Requires-Dist: mypy<2.0,>=1.5; extra == "dev"
Requires-Dist: mkdocs<2.0,>=1.5; extra == "dev"
Requires-Dist: mkdocs-material<10.0,>=9.5; extra == "dev"
Dynamic: license-file

# `dmeth`: Differential Methylation Analysis Toolkit

<div align="center">
  <a href="https://codecov.io/gh/dare-afolabi/dmeth">
    <img src="https://img.shields.io/codecov/c/github/dare-afolabi/dmeth?style=flat" alt="Coverage">
  </a>
  <a href="https://www.python.org/downloads/">
    <img src="https://img.shields.io/badge/python-3.9+-blue.svg" alt="Python 3.9+">
  </a>
  <a href="https://badge.fury.io/py/dmeth">
    <img src="https://badge.fury.io/py/dmeth.svg" alt="PyPI version">
  </a>
  <a href="https://github.com/sponsors/dare-afolabi">
    <img src="https://img.shields.io/badge/Sponsor-grey?style=flat&logo=github-sponsors" alt="Sponsor">
  </a>
</div>


**`dmeth`** is a fast, statistically rigorous Python toolkit for DNA methylation analysis, covering the path from raw beta matrices to biomarkers and functional interpretation. It implements the full modern differential methylation pipeline used in epigenome-wide association studies (EWAS), with performance and correctness on par with established R/Bioconductor tools, all in pure Python.

## Key Features

| Feature                                | Implementation                                      | Performance |
|----------------------------------------|------------------------------------------------------|-------------|
| Empirical Bayes moderated t-tests      | limma-style (Smyth 2004) with exact replication     | Numba-accelerated (10–100× faster) |
| Memory-efficient chunked analysis      | Automatic fallback for >1M probes                   | <4 GB RAM typical |
| Cell-type deconvolution                | Reference-based NNLS (Houseman/Horvath-style)       | Parallelized with joblib |
| DMR discovery                          | Sliding-window clustering + gap merging             | Vectorized |
| Gene annotation & pathway enrichment   | IntervalTree + Fisher’s exact (FDR)                 | Sub-second on 450k/EPIC |
| Coordinate liftover (hg19 ↔ hg38)     | pyliftover integration                              | Per-region tracking |
| Biomarker panel discovery & validation | RF / Elastic Net + stratified CV                    | Built-in |
| Robust preprocessing & QC              | Missingness, group representation, imputation       | Production-safe |

Fully supports **Illumina 450K**, **EPIC (850K)**, and any custom CpG × sample matrix.
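
Differential methylation models are often fitted on M-values rather than raw beta values (see Du et al., 2010, in the references). The conversion is a base-2 logit. The sketch below is a standalone illustration using only the standard library; `beta_to_m`, `m_to_beta`, and the `eps` clipping value are hypothetical names and choices, not part of the `dmeth` API.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value in [0, 1] to an M-value: log2(beta / (1 - beta))."""
    # Clip away from 0 and 1 so the logarithm stays finite (eps is an assumed choice).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: beta = 2^m / (2^m + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


for beta in (0.1, 0.5, 0.9):
    print(f"beta={beta:.1f} -> M={beta_to_m(beta):+.3f}")
```

M-values are unbounded and closer to homoscedastic across the methylation range, which is why they tend to suit linear modelling better than beta values; beta values remain easier to interpret as a methylation proportion.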

## Quick Start

```bash
pip install "dmeth[full]"
```

```python
import pandas as pd

from dmeth.io.readers import load_methylation_data
from dmeth.core.analysis.preparation import filter_cpgs_by_missingness, impute_missing_values
from dmeth.core.analysis.validation import build_design, validate_contrast
from dmeth.core.analysis.core_analysis import fit_differential
from dmeth.core.downstream.annotation import find_dmrs_by_sliding_window

# 1. Load your data
# beta: CpG x samples matrix
# pheno: sample metadata with a 'group' column
beta = pd.read_csv("beta_matrix.csv", index_col=0)
pheno = pd.read_csv("phenotype.csv", index_col=0)

# 2. Preprocessing
# Drop CpGs with too much missingness
beta_clean, _, _ = filter_cpgs_by_missingness(beta, max_missing_rate=0.2)

# Impute remaining missing values (kNN)
beta_imp = impute_missing_values(beta_clean, method="knn", k=10)

# 3. Differential analysis (case vs control)
# Build design matrix from phenotype
design = build_design(pheno["group"])
contrast = validate_contrast(design, "case-control")

# Fit
res = fit_differential(
    M=beta_imp,
    design=pd.DataFrame(design, index=beta_imp.columns),
    contrast=contrast,
    shrink="smyth",
    robust=True,
)

# 4. Discover DMRs
annotation = pd.read_csv("cpg_annotation.csv", index_col=0)  # must include chr, pos columns
dmrs = find_dmrs_by_sliding_window(
    dms=res[res["padj"] < 0.05],
    annotation=annotation,
    max_gap=500,
    min_cpgs=3,
)

print(f"Found {len(dmrs)} DMRs")
print(dmrs.head())
```
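
To make the `max_gap` and `min_cpgs` arguments in step 4 more concrete, here is a minimal sketch of gap-based clustering of significant CpGs. It is an illustration of the general idea only, not `dmeth`'s implementation: `cluster_by_gap` is a hypothetical helper, and the real function may apply additional windowing or statistics.

```python
def cluster_by_gap(sites, max_gap=500, min_cpgs=3):
    """Group significant CpGs into candidate regions.

    sites: iterable of (chrom, pos) tuples for significant CpGs.
    Returns a list of (chrom, start, end, n_cpgs) tuples.
    """
    regions = []
    run = []  # (chrom, pos) pairs of the region currently being built
    for chrom, pos in sorted(sites):
        # Close the current run when the chromosome changes or the gap is too large.
        if run and (chrom != run[-1][0] or pos - run[-1][1] > max_gap):
            if len(run) >= min_cpgs:
                regions.append((run[0][0], run[0][1], run[-1][1], len(run)))
            run = []
        run.append((chrom, pos))
    # Flush the final run.
    if len(run) >= min_cpgs:
        regions.append((run[0][0], run[0][1], run[-1][1], len(run)))
    return regions


sites = [
    ("chr1", 1000), ("chr1", 1200), ("chr1", 1600), ("chr1", 5000),
    ("chr2", 100), ("chr2", 300), ("chr2", 450),
]
print(cluster_by_gap(sites))
```

In this toy input, the isolated CpG at `chr1:5000` is more than `max_gap` bases from its neighbours and forms a run shorter than `min_cpgs`, so it is dropped rather than reported as a region.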

## Installation

```bash
# Minimal (no speed, annotation, or other extras)
pip install dmeth

# Recommended: full scientific environment
pip install "dmeth[full]"

# Development
pip install "dmeth[full,dev]"
```

Optional extras (dmeth\[full]):

- **speed**: numba, combat (highly recommended)
- **annotation**: intervaltree, pyliftover
- **parallel**: joblib
- **data-formats**: PyYAML, toml, xlrd, h5py
- **plotting**: plotly, umap-learn
- **io**: pyarrow, tables, openpyxl, xlsxwriter

Optional dev extras (dmeth\[dev]):

pytest, pytest-cov, black, isort, flake8, flake8-pyproject, flake8-bugbear, bandit, mypy, mkdocs, mkdocs-material
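
After a minimal install it can be useful to check which optional extras are actually usable in the current environment. The sketch below uses only the standard library; `OPTIONAL_MODULES` and `available_extras` are hypothetical names for illustration, and the mapping lists just a few import names per extra (for example, PyYAML is imported as `yaml` and umap-learn as `umap`).

```python
from importlib.util import find_spec

# Import names of a few optional dependencies, grouped by the extra that installs them.
OPTIONAL_MODULES = {
    "speed": ["numba"],
    "annotation": ["intervaltree", "pyliftover"],
    "parallel": ["joblib"],
    "data-formats": ["yaml", "h5py"],
    "plotting": ["plotly", "umap"],
    "io": ["pyarrow", "openpyxl"],
}


def available_extras(modules=OPTIONAL_MODULES):
    """Return {extra: bool}, True when every listed module can be located."""
    return {
        extra: all(find_spec(name) is not None for name in names)
        for extra, names in modules.items()
    }


for extra, ok in available_extras().items():
    print(f"{extra:<13} {'installed' if ok else 'missing'}")
```

`find_spec` only locates top-level modules without importing them, so the check stays cheap even for heavy packages.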

## Documentation

Full documentation with tutorials, API reference, and reproducibility examples:
[User Guide](https://github.com/dare-afolabi/dmeth/blob/main/docs/UserGuide.md)

## Citation

If you use `dmeth` in your research, please cite:

```bibtex
@software{dmeth2025,
  author = {Afolabi, Dare},
  title = {dmeth: A comprehensive Python toolkit for differential DNA methylation analysis with empirical Bayes moderation and biomarker discovery},
  version = {0.2.0},
  year = {2025},
  publisher = {GitHub},
  doi = {10.5281/zenodo.17777501},
  url = {https://doi.org/10.5281/zenodo.17777501},
}
```

### References

- Smyth, G.K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. *Statistical Applications in Genetics and Molecular Biology*, 3(1).
- Liu, P., & Hwang, J.T.G. (2007). Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. *Bioinformatics*, 23(6), 739–746.
- Du, P., Zhang, X., Huang, C.-C., Jafari, N., Kibbe, W.A., Hou, L., & Lin, S. (2010). Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. *BMC Bioinformatics*, 11, 587.
- Jung, S.H., & Young, S.S. (2012). Power and sample size calculation for microarray studies. *Journal of Biopharmaceutical Statistics*, 22(1), 30–42.
- Phipson, B. et al. (2016). missMethyl: an R package for analyzing data from Illumina’s HumanMethylation450 platform. *Bioinformatics*, 32(2), 286–288.

## Support

- **Issues**: [GitHub Issues](https://github.com/dare-afolabi/dmeth/issues)
- **Discussions**: [GitHub Discussions](https://github.com/dare-afolabi/dmeth/discussions/1)
- **Email**: [dare.afolabi@outlook.com](mailto:dare.afolabi@outlook.com)
