Metadata-Version: 2.4
Name: umierrorcorrect2
Version: 0.30.2
Summary: Pipeline for analyzing barcoded amplicon sequencing data with Unique Molecular Identifiers (UMI)
Project-URL: Homepage, https://github.com/sfilges/umierrorcorrect2
Project-URL: Documentation, https://github.com/sfilges/umierrorcorrect2/wiki
Project-URL: Repository, https://github.com/sfilges/umierrorcorrect2
Author-email: Stefan Filges <stefan.filges@pm.me>, Tobias Osterlund <tobias.osterlund@gu.se>
License-Expression: MIT
License-File: LICENSE.txt
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Requires-Dist: loguru>=0.7.0
Requires-Dist: matplotlib
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pysam>=0.8.4
Requires-Dist: scipy
Requires-Dist: typer>=0.9.0
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pre-commit; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: docs
Requires-Dist: sphinx; extra == 'docs'
Requires-Dist: sphinx-rtd-theme; extra == 'docs'
Provides-Extra: fast
Requires-Dist: numba>=0.57.0; extra == 'fast'
Description-Content-Type: text/markdown

# UMIErrorCorrect2

[![PyPI version](https://badge.fury.io/py/umierrorcorrect2.svg)](https://badge.fury.io/py/umierrorcorrect2)
[![CI](https://github.com/sfilges/umierrorcorrect2/actions/workflows/ci.yml/badge.svg)](https://github.com/sfilges/umierrorcorrect2/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/sfilges/umierrorcorrect2/branch/master/graph/badge.svg)](https://codecov.io/gh/sfilges/umierrorcorrect2)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

A modern, high-performance pipeline for analyzing barcoded amplicon sequencing data with Unique Molecular Identifiers (UMI).

This package is a **complete modernization** of the original [UMIErrorCorrect](https://github.com/stahlberggroup/umierrorcorrect) published in *Clinical Chemistry* (2022).

## Key Features

- **High Performance**: Parallel processing of genomic regions and fastp-based preprocessing.
- **Modern Tooling**: Built with `typer`, `pydantic`, `loguru`, and `hatch`.
- **Easy Installation**: Fully PEP 621 compliant, installable via `pip` or `uv`.
- **Comprehensive**: From raw FASTQ to error-corrected VCFs and consensus statistics.
- **Robust**: Extensive test suite and type safety.

## Dependencies

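The following external command-line tools are not installed by `pip` and must be installed separately:

- [fastp](https://github.com/OpenGene/fastp) for preprocessing
- [bwa](https://github.com/lh3/bwa) for alignment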
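
Before running the pipeline, it can help to confirm that both tools are reachable. The snippet below is a minimal sketch, assuming the tools are expected on your `PATH`. The helper name `check_tools` is hypothetical and not part of this package.

```bash
# Hypothetical helper: report which of the given tools are on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_tools fastp bwa || echo "Install the missing tools before running the pipeline." >&2
```

`command -v` is used rather than `which` because it is a shell builtin and behaves consistently across systems.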

## Installation

Install with [uv](https://github.com/astral-sh/uv):

```bash
uv pip install umierrorcorrect2
```

Or standard pip:

```bash
pip install umierrorcorrect2
```
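
The package metadata also declares an optional `fast` extra, which adds `numba>=0.57.0` as a dependency. A sketch of installing it with pip is shown below. Which parts of the pipeline benefit from `numba` is not stated here, so treat any speed-up as an assumption to verify.

```bash
# Install with the optional "fast" extra (pulls in numba).
# The quotes keep the shell from interpreting the square brackets.
pip install "umierrorcorrect2[fast]"
```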

## Quick Start

The command-line tool is named `umierrorcorrect`. Run the full pipeline on a single sample:

```bash
umierrorcorrect batch \
    -r1 sample_R1.fastq.gz \
    -r2 sample_R2.fastq.gz \
    -r hg38.fa \
    -o results/ \
    -ul 12 \
    -sl 16 \
    --fastp
```

For detailed instructions, see the **[User Guide](docs/USER_GUIDE.md)** or run:

```bash
umierrorcorrect --help
```

## Documentation

- [User Guide](docs/USER_GUIDE.md): Detailed usage instructions for all commands.
- [Docker Guide](docs/DOCKER.md): Running with containers.
- [Implementation Details](docs/IMPLEMENTATION.md): Architecture and design overview.

## Citation

> Osterlund T., Filges S., Johansson G., Stahlberg A. *UMIErrorCorrect and UMIAnalyzer: Software for Consensus Read Generation, Error Correction, and Visualization Using Unique Molecular Identifiers*, Clinical Chemistry, 2022. [doi:10.1093/clinchem/hvac136](https://doi.org/10.1093/clinchem/hvac136)
