Metadata-Version: 2.4
Name: uQCme
Version: 0.8.1
Summary: Microbial Quality Control Dashboard
Project-URL: Repository, https://github.com/ssi-dk/uQCme
Author-email: Kim Ng <kimn@ssi.dk>
License-Expression: MIT
License-File: LICENSE
Keywords: dashboard,microbial,qc,quality-control,sequencing
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Requires-Python: <3.13,>=3.12
Requires-Dist: pandas>=1.5.0
Requires-Dist: pandera>=0.18.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: all
Requires-Dist: openpyxl>=3.0.0; extra == 'all'
Requires-Dist: plotly>=5.15.0; extra == 'all'
Requires-Dist: streamlit>=1.28.0; extra == 'all'
Provides-Extra: app
Requires-Dist: openpyxl>=3.0.0; extra == 'app'
Requires-Dist: plotly>=5.15.0; extra == 'app'
Requires-Dist: streamlit>=1.28.0; extra == 'app'
Provides-Extra: cli
Description-Content-Type: text/markdown

# uQCme - Microbial Quality Control Tool

![Version](https://img.shields.io/badge/version-0.8.1-blue.svg)
![Python](https://img.shields.io/badge/python-3.12-green.svg)
![License](https://img.shields.io/badge/license-MIT-blue.svg)

A quality control (QC) tool for microbial sequencing data that combines command-line processing with interactive web-based visualization.

## Overview

uQCme consists of two main components:

1.  **CLI Tool (`uqcme`)**: A command-line interface that processes microbial sequencing QC data against configurable quality control rules. It determines QC outcomes (PASS/FAIL/WARNING) based on species-specific criteria.
2.  **Web Dashboard (`uqcme-dashboard`)**: An interactive Streamlit application for visualizing and exploring the QC results generated by the CLI tool.

## Features

### Core Functionality
-   **Species-specific QC rules**: Support for numerous microbial species with tailored quality control criteria defined in `QC_rules.tsv`.
-   **Configurable QC tests**: Define custom QC outcomes with priority-based rule conditions.
-   **Flexible rule engine**: Regex-based validation with threshold checks for various QC metrics.
-   **Robust Validation**: Data validation using Pandera schemas and Pydantic configuration management.
-   **Interactive dashboard**: Web-based visualization with filtering, sorting, and detailed sample exploration.
-   **Comprehensive logging**: Detailed logging system with both file and console output.

### Supported Species
The tool handles specific QC rules for the following species (as defined in the default `QC_rules.tsv`):

-   *Acinetobacter baumannii*
-   *Campylobacter coli*
-   *Campylobacter jejuni*
-   *Enterobacter spp.*
-   *Enterococcus faecium*
-   *Escherichia coli*
-   *Haemophilus influenzae*
-   *Helicobacter pylori*
-   *Klebsiella pneumoniae*
-   *Mycoplasma genitalium*
-   *Neisseria gonorrhoeae*
-   *Pseudomonas aeruginosa*
-   *Salmonella enterica*
-   *Shigella flexneri*
-   *Shigella sonnei*
-   *Staphylococcus aureus*
-   *Streptococcus pneumoniae*

*Note: Rules whose species is "all" act as general baseline checks and also cover any species not explicitly listed.*

### QC Metrics
-   Assembly statistics (N50, contigs, genome size)
-   CheckM completeness and contamination
-   Species identification validation
-   Coverage depth analysis
-   Quality score assessments

## Installation

### From PyPI

**Core only (shared logic):**
```bash
pip install uqcme
```

**CLI only:**
```bash
pip install "uqcme[cli]"
```

**Dashboard/App only:**
```bash
pip install "uqcme[app]"
```

**Full installation (CLI + Web Dashboard):**
```bash
pip install "uqcme[all]"
```

### From Source

```bash
git clone https://github.com/ssi-dk/uQCme.git
cd uQCme

# Core only
pip install .

# Full installation
pip install ".[all]"
```

## Usage

### 1. CLI Tool (`uqcme`)

The CLI tool reads your sequencing run data and applies the QC rules defined in your configuration.

**Basic Usage (using defaults):**
```bash
uqcme
```
This uses the bundled default configuration, which looks for input files in the current directory.

**Override Data Source:**
You can process a specific data file or API endpoint without creating a full config file:

```bash
# Process a local file
uqcme --file path/to/my_run_data.tsv

# Process data from an API
uqcme --api-call "https://api.example.com/runs/123"
```

**Custom Configuration:**
For full control over rules, tests, and mappings, provide a custom configuration file:

```bash
uqcme --config my_config.yaml
```

**What it does:**
1.  Loads run data (from file, API, or defaults).
2.  Loads QC rules (`QC_rules.tsv`) and QC tests (`QC_tests.tsv`).
    *   *Note: If local rule files are missing, it falls back to bundled defaults.*
3.  Evaluates each sample against the rules for its species.
4.  Determines the final QC outcome (e.g., PASS, FAIL).
5.  Outputs a new TSV file containing the original data plus the QC results (e.g., `uQCme_run_data.tsv`).
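
Step 3 can be pictured with a small sketch. This is not uQCme's actual implementation: the `Rule` class and `rules_for_species` function are hypothetical names, and the sketch assumes that rules marked `all` are applied in addition to any species-specific rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical, trimmed-down view of one row in QC_rules.tsv."""
    rule_id: str
    species: str  # a species name, or "all" for baseline rules


def rules_for_species(rules: list[Rule], species: str) -> list[Rule]:
    """Return rules that target `species` plus the generic "all" rules.

    Assumption: "all" rules always apply on top of species-specific ones.
    """
    return [r for r in rules if r.species in (species, "all")]


rules = [
    Rule("CUSTOM1", "Escherichia coli"),
    Rule("CUSTOM2", "all"),
]

ecoli_rules = rules_for_species(rules, "Escherichia coli")
other_rules = rules_for_species(rules, "Listeria monocytogenes")
print([r.rule_id for r in ecoli_rules])  # ['CUSTOM1', 'CUSTOM2']
print([r.rule_id for r in other_rules])  # ['CUSTOM2']
```

Under this assumption, a sample from a species without dedicated rules is still checked against the baseline rules.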

### 2. Web Dashboard (`uqcme-dashboard`)

The dashboard visualizes the results generated by the CLI tool.

![uqcme-dashboard](uqcme_interface.png)

**Command:**
```bash
uqcme-dashboard --config config.yaml
```

**What it does:**
1.  Launches a local web server (Streamlit).
2.  Loads the processed data (from file or API) as specified in `config.yaml`.
3.  Provides an interactive interface to:
    *   View summary statistics.
    *   Filter samples by QC outcome, species, or specific metrics.
    *   Inspect individual sample details and failed rules.
    *   Visualize metric distributions.

## Configuration

The tool is driven by a `config.yaml` file. This file defines:

*   **Input paths**: Locations of your data, mapping file, and QC rules/tests.
*   **Output paths**: Where to save the results and logs.
*   **App settings**: Dashboard configuration (server port, UI preferences).

**Key Input Files:**
*   `run_data.tsv`: Your raw sequencing metrics.
*   `mapping.yaml`: Maps your data columns to the tool's internal field names.
*   `QC_rules.tsv`: Defines the specific thresholds and checks for each species. The default values are derived from https://www.pathogensurveillance.net/resources/quality/ and https://happykhan.github.io/qualibact/.
*   `QC_tests.tsv`: Defines how rule failures translate into overall QC outcomes.

See the `input/example/` directory for template files.
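
To illustrate what a column mapping does, here is a minimal sketch using only the standard library. The structure of `mapping.yaml` is not shown in this README, so the `COLUMN_MAP` dictionary, the source column names, and the `rename_columns` helper are all hypothetical; only the idea of renaming source columns to internal field names comes from the description above.

```python
import csv
import io

# Hypothetical mapping: source column name -> internal field name.
COLUMN_MAP = {"N50 (bp)": "n50", "CheckM completeness": "completeness"}

# Stand-in for a run_data.tsv file with lab-specific column names.
RAW_TSV = "sample\tN50 (bp)\tCheckM completeness\nS1\t62000\t98.7\n"


def rename_columns(row: dict[str, str], column_map: dict[str, str]) -> dict[str, str]:
    """Rename mapped columns and keep unmapped columns unchanged."""
    return {column_map.get(name, name): value for name, value in row.items()}


reader = csv.DictReader(io.StringIO(RAW_TSV), delimiter="\t")
rows = [rename_columns(row, COLUMN_MAP) for row in reader]
print(rows[0])  # {'sample': 'S1', 'n50': '62000', 'completeness': '98.7'}
```

After renaming, rules can refer to stable internal names such as `n50` regardless of how the source data labels its columns.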

## Output Files

### 1. QC Results (`uQCme_run_data.tsv`)
Enhanced run data with QC outcomes:
- Original sample data
- Failed rules per sample
- Assigned QC outcome with priority
- Color coding for visualization
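
As a usage illustration, the results file can be summarized with a few lines of standard-library Python. The column name `qc_outcome` and the sample rows below are assumptions made for this sketch; check the header of your own `uQCme_run_data.tsv` for the real column names.

```python
import csv
import io
from collections import Counter

# Stand-in for uQCme_run_data.tsv; the `qc_outcome` column name is hypothetical.
RESULTS_TSV = (
    "sample\tspecies\tqc_outcome\n"
    "S1\tEscherichia coli\tPASS\n"
    "S2\tEscherichia coli\tFAIL\n"
    "S3\tSalmonella enterica\tPASS\n"
)


def count_outcomes(tsv_text: str, outcome_column: str = "qc_outcome") -> Counter:
    """Count how many samples received each QC outcome."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[outcome_column] for row in reader)


counts = count_outcomes(RESULTS_TSV)
print(dict(counts))  # {'PASS': 2, 'FAIL': 1}
```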

### 2. Rule Warnings (`uQCme_rule_warnings.tsv`)
Detailed log of rule evaluation issues:
- Skipped rules and reasons
- Data validation warnings
- Processing statistics

## Dashboard Features

### Data Overview
- Interactive data table with filtering and sorting
- Priority-based color coding of QC outcomes
- Summary statistics and sample counts

### Sample Details
- Detailed view of individual sample QC results
- Failed rules and thresholds
- Interactive metric exploration

### Visualization
- Plotly-based interactive charts
- Customizable metric comparisons
- Species-specific analysis views

### Filtering and Search
- Dynamic filtering by QC outcome, species, and metrics
- Search functionality across all data columns
- Export capabilities for filtered datasets

## Advanced Usage

### Custom QC Rules

Create custom QC rules by editing `QC_rules.tsv`:

```tsv
rule_id	species	qc_tool	qc_metric	validation_type	threshold	column_name
CUSTOM1	Escherichia coli	Assembly	N50	threshold	>=50000	n50
CUSTOM2	all	CheckM	Completeness	threshold	>=90	completeness
```
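
The `threshold` column combines a comparison operator with a number. The sketch below shows one way such an expression could be evaluated; it is an illustration rather than uQCme's own parser, and the `passes_threshold` function and the set of accepted operators are assumptions.

```python
import operator
import re

_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

# Two-character operators come first in the alternation so ">=" is not read as ">".
_THRESHOLD_RE = re.compile(r"^\s*(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def passes_threshold(value: float, threshold: str) -> bool:
    """Return True if `value` satisfies an expression such as '>=50000'."""
    match = _THRESHOLD_RE.match(threshold)
    if match is None:
        raise ValueError(f"Unrecognized threshold expression: {threshold!r}")
    op, limit = match.groups()
    return _OPERATORS[op](value, float(limit))


print(passes_threshold(62000, ">=50000"))  # True: N50 meets the CUSTOM1 threshold
print(passes_threshold(87.5, ">=90"))      # False: completeness is below CUSTOM2
```

Rejecting malformed expressions with an error, rather than silently passing the sample, keeps typos in a rules file from hiding real failures.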

### Species-Specific Tests

Define new QC tests in `QC_tests.tsv`:

```tsv
outcome_id	outcome_name	description	priority	rule_conditions	action_required
FAIL_CUSTOM	Fail - Custom QC	Custom quality control failed	3	failed_rules_contain:CUSTOM1,CUSTOM2	reject
```
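
The following sketch shows how a priority-based outcome could be picked from a sample's failed rules. It rests on several assumptions that this README does not confirm: a `failed_rules_contain:` condition matches when any listed rule has failed, a lower `priority` number wins, and the `WARN_N50` outcome exists only for this example. The `Outcome` class and both functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX = "failed_rules_contain:"


@dataclass(frozen=True)
class Outcome:
    """Hypothetical, trimmed-down view of one row in QC_tests.tsv."""
    outcome_id: str
    priority: int
    rule_conditions: str


def condition_matches(rule_conditions: str, failed_rules: set[str]) -> bool:
    """Assumption: the condition matches if ANY listed rule has failed."""
    if not rule_conditions.startswith(PREFIX):
        return False
    listed = rule_conditions[len(PREFIX):].split(",")
    wanted = {rule_id.strip() for rule_id in listed if rule_id.strip()}
    return bool(wanted & failed_rules)


def resolve_outcome(outcomes: list[Outcome], failed_rules: set[str]) -> Optional[Outcome]:
    """Assumption: among matching outcomes, the lowest priority number wins."""
    matching = [o for o in outcomes if condition_matches(o.rule_conditions, failed_rules)]
    return min(matching, key=lambda o: o.priority, default=None)


outcomes = [
    Outcome("FAIL_CUSTOM", 3, "failed_rules_contain:CUSTOM1,CUSTOM2"),
    Outcome("WARN_N50", 5, "failed_rules_contain:CUSTOM1"),  # hypothetical row
]

print(resolve_outcome(outcomes, {"CUSTOM1"}).outcome_id)  # FAIL_CUSTOM
print(resolve_outcome(outcomes, set()))                   # None
```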

## Development

This project uses [pixi](https://prefix.dev/) for dependency management and development workflow.

To set up the development environment:

```bash
pixi install
```

To run the tests through pixi:

```bash
pixi run pytest
```

### Project Structure

```
uQCme/
├── src/
│   └── uQCme/
│       ├── __init__.py
│       ├── app.py          # Streamlit web dashboard
│       ├── cli.py          # CLI processing tool
│       ├── plot.py         # Plotting utilities
│       └── utils.py        # Shared utilities
├── config.yaml             # Configuration file
├── input/
│   └── example/            # Example input files
├── output/                 # Generated results
├── log/                    # Application logs
└── tests/                  # Unit tests
```

### Running Tests

```bash
python -m pytest tests/
```

### Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## Troubleshooting

### Common Issues

1. **Missing input files**: Ensure all required input files exist and that the paths in `config.yaml` are correct.
2. **Rule validation errors**: Check that QC rules reference valid column names in your data.
3. **Dashboard not loading**: Verify that Streamlit is installed and that the configured port is available.

### Logging

Check the log file (`./log/log.tsv`) for detailed processing information:

```bash
tail -f ./log/log.tsv
```

## Citation

If you use uQCme in your research, please cite:

```
uQCme: A Comprehensive Quality Control Tool for Microbial Sequencing Data
SSI-DK, 2025
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Support

For questions, issues, or feature requests:

- Create an issue on GitHub
- Contact: Kim Ng (kimn@ssi.dk)

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history and updates.
