Metadata-Version: 2.4
Name: pathway-subtyping
Version: 0.1.0
Summary: A disease-agnostic framework for identifying molecular subtypes through pathway-based analysis of rare genetic variants
Author-email: Rohit Chauhan <info@topmist.com>
Maintainer-email: Rohit Chauhan <info@topmist.com>
License: MIT
Project-URL: Homepage, https://github.com/topmist-admin/pathway-subtyping-framework
Project-URL: Documentation, https://github.com/topmist-admin/pathway-subtyping-framework#readme
Project-URL: Repository, https://github.com/topmist-admin/pathway-subtyping-framework.git
Project-URL: Issues, https://github.com/topmist-admin/pathway-subtyping-framework/issues
Project-URL: Changelog, https://github.com/topmist-admin/pathway-subtyping-framework/releases
Keywords: genomics,pathway-analysis,molecular-subtypes,clustering,rare-variants,bioinformatics,autism,neurogenetics
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Healthcare Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<2.0.0,>=1.24.0
Requires-Dist: pandas<3.0.0,>=2.0.0
Requires-Dist: scikit-learn<2.0.0,>=1.3.0
Requires-Dist: scipy<2.0.0,>=1.11.0
Requires-Dist: pysam<1.0.0,>=0.21.0
Requires-Dist: pyyaml<7.0,>=6.0
Requires-Dist: click<9.0.0,>=8.0.0
Requires-Dist: matplotlib<4.0.0,>=3.7.0
Requires-Dist: seaborn<1.0.0,>=0.12.0
Requires-Dist: jinja2<4.0.0,>=3.1.0
Requires-Dist: tqdm<5.0.0,>=4.65.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Provides-Extra: notebooks
Requires-Dist: jupyter>=1.0.0; extra == "notebooks"
Requires-Dist: ipykernel>=6.0.0; extra == "notebooks"
Requires-Dist: nbformat>=5.0.0; extra == "notebooks"
Provides-Extra: docs
Requires-Dist: sphinx>=6.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.2.0; extra == "docs"
Requires-Dist: myst-parser>=1.0.0; extra == "docs"
Provides-Extra: all
Requires-Dist: pathway-subtyping[dev,docs,notebooks]; extra == "all"
Dynamic: license-file

# Pathway Subtyping Framework

**A Disease-Agnostic Tool for Pathway-Based Molecular Subtype Discovery**

[![CI](https://github.com/topmist-admin/pathway-subtyping-framework/actions/workflows/ci.yml/badge.svg)](https://github.com/topmist-admin/pathway-subtyping-framework/actions/workflows/ci.yml)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

---

## Overview

The **Pathway Subtyping Framework** is an open-source computational tool for identifying molecular subtypes in genetically heterogeneous diseases. Instead of analyzing individual genes, it aggregates rare variant burden at the **biological pathway level**, enabling:

- Better signal detection across genetically diverse cohorts
- Identification of biologically coherent patient subgroups
- Cross-cohort validation of discovered subtypes

Originally developed for [autism research](https://github.com/topmist-admin/autism-pathway-framework), this generalized version can be adapted for any disease with:
- Genetic heterogeneity (many implicated genes)
- Convergent pathway biology
- Available exome/genome sequencing data

## Supported Disease Areas

| Disease | Status | Pathway File |
|---------|--------|--------------|
| Autism Spectrum Disorder | Validated | `autism_pathways.gmt` |
| Schizophrenia | Template | `schizophrenia_pathways.gmt` |
| Epilepsy | Template | `epilepsy_pathways.gmt` |
| Intellectual Disability | Template | `intellectual_disability_pathways.gmt` |
| *Your disease* | [Adapt it →](docs/guides/adapting-for-your-disease.md) | `your_pathways.gmt` |

## Key Features

| Feature | Description |
|---------|-------------|
| **Pathway Scoring** | Aggregate gene burdens across biological pathways |
| **Subtype Discovery** | GMM clustering with automatic model selection (BIC) |
| **Validation Gates** | Negative controls + bootstrap stability testing |
| **Reproducibility** | Deterministic execution, pinned dependencies, Docker |
| **Config-Driven** | YAML configuration for all parameters |

## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/topmist-admin/pathway-subtyping-framework
cd pathway-subtyping-framework

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install the package
pip install -e .

# Verify installation
psf --version
```

### Run with Sample Data

```bash
# Run the pipeline with synthetic test data
psf --config configs/test_synthetic.yaml

# View results
cat outputs/synthetic_test/report.md
```

### Run with Your Data

```bash
# Copy and customize a config
cp configs/example_autism.yaml configs/my_analysis.yaml

# Edit paths in my_analysis.yaml, then run
psf --config configs/my_analysis.yaml
```

### Try in Browser (No Installation)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/topmist-admin/pathway-subtyping-framework/blob/main/examples/notebooks/01_getting_started.ipynb)

### Docker

```bash
# Run pipeline
docker-compose run pipeline

# Run tests
docker-compose run test

# Start Jupyter notebook
docker-compose up jupyter
# Open http://localhost:8888
```

## Adapting for Your Disease

1. **Create a pathway GMT file** with disease-relevant gene sets
2. **Copy an example config** and point to your data
3. **Run the pipeline** — validation gates will tell you if subtypes are meaningful

See the full guide: [Adapting for Your Disease](docs/guides/adapting-for-your-disease.md)

## How It Works

```
VCF Input → Variant Filter → Gene Burden → Pathway Aggregation → GMM Clustering → Validation → Report
```

### 1. Pathway Scoring
Rare damaging variants are aggregated into pathway-level disruption scores:
- Loss-of-function variants weighted higher
- Missense variants weighted by CADD score
- Scores normalized across samples

### 2. Subtype Discovery
Gaussian Mixture Model clustering identifies patient subgroups:
- Automatic cluster selection via BIC
- Configurable cluster range (default: 2-8)

### 3. Validation Gates
Built-in tests prevent overfitting:
- **Label shuffle**: Randomized labels should NOT cluster (ARI < 0.15)
- **Random genes**: Fake pathways should NOT work (ARI < 0.15)
- **Bootstrap**: Clusters should be stable under resampling (ARI > 0.8)

## Data Requirements

| Input | Format | Notes |
|-------|--------|-------|
| Variants | VCF | Annotated with gene symbols, consequences |
| Phenotypes | CSV | Sample IDs + clinical features |
| Pathways | GMT | Gene sets for your disease |

**Your data stays on your infrastructure.** The framework runs locally or in your cloud environment.

## Project Structure

```
pathway-subtyping-framework/
├── src/pathway_subtyping/     # Core Python package
├── configs/                   # Example YAML configurations
│   ├── example_autism.yaml
│   ├── example_schizophrenia.yaml
│   ├── test_synthetic.yaml    # Ready-to-run test config
│   └── example_epilepsy.yaml
├── data/
│   ├── pathways/              # Pathway GMT files
│   └── sample/                # Synthetic test data
├── docs/guides/               # Documentation
│   ├── adapting-for-your-disease.md
│   ├── pathway-curation-guide.md
│   └── validation-gates.md
├── examples/notebooks/        # Jupyter tutorials
├── tests/                     # Test suite (64 tests)
├── Dockerfile                 # Container support
└── docker-compose.yml         # Easy orchestration
```

## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run linting
black src/ tests/
isort src/ tests/
flake8 src/ tests/

# Set up pre-commit hooks
pre-commit install
```

## Related Projects

- **[Autism Pathway Framework](https://github.com/topmist-admin/autism-pathway-framework)** — The original autism-focused implementation with SFARI cohort validation

## Contributing

Contributions welcome! Areas where help is needed:
- Additional disease pathway definitions
- Performance optimization for large cohorts
- Documentation and tutorials

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## Citation

If you use this framework, please cite:

```
Chauhan R. Pathway Subtyping Framework. GitHub. 2026.
https://github.com/topmist-admin/pathway-subtyping-framework
```

For autism-specific work, also cite:
```
Chauhan R. Autism Pathway Framework. Zenodo. 2026.
DOI: 10.5281/zenodo.18403844
```

## License

MIT License — see [LICENSE](LICENSE) for details.

## Contact

**Rohit Chauhan**
- Email: info@topmist.com
- GitHub: [@topmist-admin](https://github.com/topmist-admin)

---

> **RESEARCH USE ONLY** — This framework is for hypothesis generation. Not for clinical diagnosis or treatment decisions.
