Metadata-Version: 2.4
Name: haplophaser
Version: 0.1.1
Summary: Polyploid Haplotype Analysis for Sequenced Eukaryotic References - A comprehensive toolkit for haplotype analysis in complex genomes
Project-URL: Homepage, https://github.com/aseetharam/haplophaser
Project-URL: Documentation, https://github.com/aseetharam/haplophaser#readme
Project-URL: Repository, https://github.com/aseetharam/haplophaser
Project-URL: Issues, https://github.com/aseetharam/haplophaser/issues
Project-URL: Changelog, https://github.com/aseetharam/haplophaser/blob/main/CHANGELOG.md
Author-email: Arun Seetharam <arnstrm@iastate.edu>
Maintainer-email: Arun Seetharam <arnstrm@iastate.edu>
License-Expression: MIT
License-File: LICENSE
Keywords: bioinformatics,expression-bias,genetics,genomics,haplotype,homeolog,maize,paleopolyploid,polyploid,subgenome,wheat
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: chromoplot>=0.1.0
Requires-Dist: cyvcf2>=0.30.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.9.0
Provides-Extra: alignment
Requires-Dist: pysam>=0.21.0; extra == 'alignment'
Provides-Extra: all
Requires-Dist: matplotlib>=3.7.0; extra == 'all'
Requires-Dist: pandas>=2.0.0; extra == 'all'
Requires-Dist: pysam>=0.21.0; extra == 'all'
Requires-Dist: scipy>=1.10.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pandas>=2.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: scipy>=1.10.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.0.0; extra == 'docs'
Requires-Dist: mkdocs>=1.5.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'docs'
Provides-Extra: stats
Requires-Dist: pandas>=2.0.0; extra == 'stats'
Requires-Dist: scipy>=1.10.0; extra == 'stats'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.7.0; extra == 'viz'
Description-Content-Type: text/markdown

# 🧬 Haplophaser

**Haplotype analysis toolkit for complex genomes with full polyploid support.**

Haplophaser analyzes haplotype inheritance patterns in derived lines relative to founder/source populations. It is designed from the ground up to handle any ploidy level, from diploid through hexaploid and beyond.

## Features

- **Haplotype Proportion Estimation**: Calculate what fraction of a sample's genome derives from each founder population
- **Chromosome Painting**: Paint genomic regions by haplotype origin using Hidden Markov Models
- **Chimeric Contig Detection**: Identify potential misassemblies through haplotype switches
- **Linkage-Informed Scaffolding**: Order and orient scaffolds using haplotype phase information
- **Full Polyploid Support**: First-class support for diploid, autopolyploid, and allopolyploid genomes

## Installation

### From PyPI

```bash
pip install haplophaser
```

### Development Installation

```bash
# Clone the repository
git clone https://github.com/aseetharam/haplophaser.git
cd haplophaser

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode with dev dependencies
pip install -e ".[dev]"
```

### Dependencies

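Core dependencies:
- Python 3.10+
- chromoplot
- cyvcf2
- NumPy
- Pydantic v2
- PyYAML
- Rich
- Typer

Optional extras add further packages: `alignment` (pysam), `stats` (pandas, SciPy), `viz` (Matplotlib), and `all` (everything in the previous three). Install them with, for example, `pip install "haplophaser[all]"`.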

## Quick Start

### Basic Usage

```bash
# Estimate haplotype proportions
haplophaser proportion variants.vcf.gz -p populations.tsv -o results/

# Paint chromosomes by haplotype origin
haplophaser paint variants.vcf.gz -p populations.tsv -o painted/

# Order scaffolds using linkage
haplophaser scaffold scaffolds.vcf.gz -p populations.tsv -g genetic_map.tsv

# Run quality control checks
haplophaser qc variants.vcf.gz -p populations.tsv
```

### Population File Format

Haplophaser uses TSV or YAML files to define population structure:

**TSV format** (`populations.tsv`):
```
sample	population	role	ploidy
B73	NAM_founders	founder	2
Mo17	NAM_founders	founder	2
W22	NAM_founders	founder	2
RIL_001	NAM_RILs	derived	2
RIL_002	NAM_RILs	derived	2
```

**YAML format** (`populations.yaml`):
```yaml
populations:
  - name: NAM_founders
    role: founder
    ploidy: 2
    samples:
      - B73
      - Mo17
      - W22

  - name: NAM_RILs
    role: derived
    ploidy: 2
    samples:
      - RIL_001
      - RIL_002
```
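As a rough illustration of the TSV layout shown earlier in this section, the sketch below reads the same columns with Python's standard `csv` module and groups samples by population. It does not use Haplophaser's own loader; `group_samples` is a hypothetical helper written only for this example.

```python
import csv
import io
from collections import defaultdict

# Same columns as the TSV example: sample, population, role, ploidy.
TSV_TEXT = (
    "sample\tpopulation\trole\tploidy\n"
    "B73\tNAM_founders\tfounder\t2\n"
    "Mo17\tNAM_founders\tfounder\t2\n"
    "W22\tNAM_founders\tfounder\t2\n"
    "RIL_001\tNAM_RILs\tderived\t2\n"
    "RIL_002\tNAM_RILs\tderived\t2\n"
)


def group_samples(handle):
    """Group sample names by (population, role, ploidy).

    Hypothetical helper for illustration; not part of the Haplophaser API.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["population"], row["role"], int(row["ploidy"]))
        groups[key].append(row["sample"])
    return dict(groups)


groups = group_samples(io.StringIO(TSV_TEXT))
for (population, role, ploidy), samples in groups.items():
    print(population, role, ploidy, samples)
```

Grouping by population mirrors the nesting of the YAML format, where each population entry carries its own `role`, `ploidy`, and `samples` list.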

### Polyploid Examples

For polyploid species, define subgenomes in YAML:

```yaml
populations:
  - name: wheat_founders
    role: founder
    ploidy: 6
    subgenomes:
      - name: A
        ploidy: 2
      - name: B
        ploidy: 2
      - name: D
        ploidy: 2
    samples:
      - Chinese_Spring
      - Jagger
```
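One sanity check worth running on such a definition is that the subgenome ploidies add up to the population's total ploidy (2 + 2 + 2 = 6 for the wheat example). The sketch below does this on a plain dictionary with the same keys as the YAML. Whether Haplophaser enforces this rule itself is an assumption, and `subgenome_ploidy_matches` is a hypothetical helper, not a library function.

```python
# Plain-dict mirror of the YAML example above, as a YAML loader might return it.
population = {
    "name": "wheat_founders",
    "role": "founder",
    "ploidy": 6,
    "subgenomes": [
        {"name": "A", "ploidy": 2},
        {"name": "B", "ploidy": 2},
        {"name": "D", "ploidy": 2},
    ],
    "samples": ["Chinese_Spring", "Jagger"],
}


def subgenome_ploidy_matches(pop):
    """Return True if subgenome ploidies sum to the total ploidy.

    Hypothetical check for illustration; not part of the Haplophaser API.
    """
    subgenomes = pop.get("subgenomes", [])
    if not subgenomes:
        return True  # nothing to check when no subgenomes are declared
    return sum(sg["ploidy"] for sg in subgenomes) == pop["ploidy"]


ok = subgenome_ploidy_matches(population)
print(ok)
```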

### Configuration

Generate a configuration template:

```bash
haplophaser init-config -o haplophaser.yaml
```

Then customize and use:

```bash
haplophaser proportion variants.vcf.gz -p populations.tsv -c haplophaser.yaml
```

## Python API

```python
from haplophaser import Sample, Population, PopulationRole
from haplophaser.core.models import make_hexaploid_sample
from haplophaser.io import load_populations_yaml, VCFReader

# Create samples programmatically
b73 = Sample(name="B73", ploidy=2, population="founders")

# Create polyploid samples
wheat = make_hexaploid_sample("Chinese_Spring", ("A", "B", "D"), "founders")

# Load populations from file
populations = load_populations_yaml("populations.yaml")

# Read VCF files
with VCFReader("variants.vcf.gz") as reader:
    for variant in reader.fetch("chr1", 0, 1_000_000):
        print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alt}")
```

## Coordinate System

Haplophaser uses **0-based, half-open intervals** (BED-style) internally:
- Position 0 is the first base
- Intervals are `[start, end)` — start is included, end is excluded

Conversion to/from 1-based systems (VCF, GFF) happens automatically during I/O.
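To make the convention concrete, here is a minimal sketch of the arithmetic involved. The function names are hypothetical and are not Haplophaser's internal converters; they only show how a 1-based VCF position maps to a 0-based, half-open interval and back.

```python
def vcf_pos_to_interval(pos, ref):
    """Map a 1-based VCF POS and REF allele to a 0-based half-open interval.

    Hypothetical helper for illustration; not part of the Haplophaser API.
    """
    if pos < 1:
        raise ValueError("VCF positions are 1-based and must be >= 1")
    start = pos - 1                  # shift to 0-based
    return start, start + len(ref)   # end is exclusive


def interval_to_one_based(start, end):
    """Map a 0-based half-open [start, end) to 1-based inclusive coordinates."""
    return start + 1, end


snv = vcf_pos_to_interval(100, "A")          # single base at VCF position 100
deletion = vcf_pos_to_interval(100, "ACG")   # REF allele spanning three bases
print(snv, deletion, interval_to_one_based(*snv))
```

With half-open intervals, the length of a region is simply `end - start`, and adjacent regions share a boundary without overlapping.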

## Development

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=haplophaser --cov-report=html

# Run specific test file
pytest tests/test_models.py
```

### Code Quality

```bash
# Lint
ruff check src tests

# Format code
ruff format src tests

# Type checking
mypy src
```

### Project Structure

```
haplophaser/
├── pyproject.toml          # Package configuration
├── README.md
├── src/
│   └── haplophaser/
│       ├── __init__.py     # Package exports
│       ├── core/
│       │   ├── models.py   # Data models (Sample, Variant, etc.)
│       │   └── config.py   # Configuration system
│       ├── io/
│       │   ├── vcf.py      # VCF reading
│       │   └── populations.py  # Population file I/O
│       └── cli/
│           └── main.py     # CLI commands
├── tests/
│   ├── conftest.py         # Test fixtures
│   ├── test_models.py
│   ├── test_config.py
│   └── test_populations.py
└── docs/
```

## Roadmap

- [x] Core data models with polyploid support
- [x] Configuration system
- [x] Population file I/O
- [x] CLI skeleton
- [x] VCF reading implementation
- [x] Window-based analysis
- [x] HMM-based haplotype inference
- [x] Chromosome painting
- [x] Proportion estimation
- [x] Scaffold ordering
- [x] Integration with chromoplot for visualization
- [x] Expression bias analysis
- [x] Subgenome dominance testing

## Citation

If you use Haplophaser in your research, please cite:

> Haplophaser: Haplotype analysis toolkit for complex genomes. (in preparation)

## License

MIT License. See the LICENSE file for details.

## Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
