Metadata-Version: 2.4
Name: uQCme
Version: 0.1.0
Summary: Microbial Quality Control Dashboard
Author-email: SSI-DK <kimn@ssi.dk>
Project-URL: Repository, https://github.com/ssi-dk/uQCme
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: streamlit>=1.28.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: plotly>=5.15.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: openpyxl>=3.0.0
Dynamic: license-file

# uQCme - Microbial Quality Control Tool

![Version](https://img.shields.io/badge/version-0.1.0-blue.svg)
![Python](https://img.shields.io/badge/python-3.8+-green.svg)
![License](https://img.shields.io/badge/license-MIT-blue.svg)

A quality control (QC) tool for microbial sequencing data. It combines command-line processing with an interactive web dashboard for exploring the results.

## Overview

uQCme consists of two main components:

1.  **CLI Tool (`uqcme`)**: A command-line interface that processes microbial sequencing QC data against configurable quality control rules. It determines QC outcomes (PASS/FAIL/WARNING) based on species-specific criteria.
2.  **Web Dashboard (`uqcme-dashboard`)**: An interactive Streamlit application for visualizing and exploring the QC results generated by the CLI tool.

## Features

### Core Functionality
-   **Species-specific QC rules**: Support for numerous microbial species with tailored quality control criteria defined in `QC_rules.tsv`.
-   **Configurable QC tests**: Define custom QC outcomes with priority-based rule conditions.
-   **Flexible rule engine**: Regex-based validation with threshold checks for various QC metrics.
-   **Interactive dashboard**: Web-based visualization with filtering, sorting, and detailed sample exploration.
-   **Comprehensive logging**: Detailed logging system with both file and console output.

### Supported Species
The tool handles specific QC rules for the following species (as defined in the default `QC_rules.tsv`):

-   *Acinetobacter baumannii*
-   *Campylobacter coli*
-   *Campylobacter jejuni*
-   *Enterobacter* spp.
-   *Enterococcus faecium*
-   *Escherichia coli*
-   *Haemophilus influenzae*
-   *Helicobacter pylori*
-   *Klebsiella pneumoniae*
-   *Mycoplasma genitalium*
-   *Neisseria gonorrhoeae*
-   *Pseudomonas aeruginosa*
-   *Salmonella enterica*
-   *Shigella flexneri*
-   *Shigella sonnei*
-   *Staphylococcus aureus*
-   *Streptococcus pneumoniae*

*Note: Rules assigned to "all" act as general baseline checks and also cover species not listed above.*
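
One way to read the note above is that a sample is checked against the rules for its own species plus every "all" rule. The sketch below illustrates that reading only; `select_rules` is a hypothetical helper, not part of the uQCme API, and the actual selection logic may differ.

```python
from typing import Dict, List


def select_rules(rules: List[Dict[str, str]], species: str) -> List[Dict[str, str]]:
    """Return the rules for `species` plus the generic "all" rules.

    Assumption: "all" rules are applied in addition to species-specific ones.
    """
    return [rule for rule in rules if rule["species"] in (species, "all")]


# Rule IDs reused from the custom-rule example in this README.
rules = [
    {"rule_id": "CUSTOM1", "species": "Escherichia coli"},
    {"rule_id": "CUSTOM2", "species": "all"},
]

ecoli_rules = [r["rule_id"] for r in select_rules(rules, "Escherichia coli")]
unlisted_rules = [r["rule_id"] for r in select_rules(rules, "Listeria monocytogenes")]
```

Under this assumption, a species without its own rules is still checked against the "all" baseline.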

### QC Metrics
-   Assembly statistics (N50, contigs, genome size)
-   CheckM completeness and contamination
-   Species identification validation
-   Coverage depth analysis
-   Quality score assessments

## Installation

uQCme is a Python package. You can install it directly from the source:

```bash
git clone https://github.com/ssi-dk/uQCme.git
cd uQCme
pip install .
```

## Usage

### 1. CLI Tool (`uqcme`)

The CLI tool reads your sequencing run data and applies the QC rules defined in your configuration.

**Command:**
```bash
uqcme --config config.yaml
```

**What it does:**
1.  Loads run data (from file or API) as specified in `config.yaml`.
2.  Loads QC rules (`QC_rules.tsv`) and QC tests (`QC_tests.tsv`).
3.  Evaluates each sample against the rules for its species.
4.  Determines the final QC outcome (e.g., PASS, FAIL).
5.  Outputs a new TSV file containing the original data plus the QC results (e.g., `uQCme_run_data.tsv`).
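
If the CLI step is part of a larger pipeline, it can be wrapped in a small Python helper. This is a minimal sketch that only uses the documented `uqcme --config` invocation; the helper names are hypothetical.

```python
import subprocess
from typing import List


def build_uqcme_command(config_path: str) -> List[str]:
    """Build the documented `uqcme --config <file>` command line."""
    return ["uqcme", "--config", config_path]


def run_uqcme(config_path: str) -> None:
    """Run the CLI and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(build_uqcme_command(config_path), check=True)
```

Calling `run_uqcme("config.yaml")` requires that the package is installed and that `uqcme` is on your `PATH`.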

### 2. Web Dashboard (`uqcme-dashboard`)

The dashboard visualizes the results generated by the CLI tool.

**Command:**
```bash
uqcme-dashboard --config config.yaml
```

**What it does:**
1.  Launches a local web server (Streamlit).
2.  Loads the processed data (from file or API) as specified in `config.yaml`.
3.  Provides an interactive interface to:
    *   View summary statistics.
    *   Filter samples by QC outcome, species, or specific metrics.
    *   Inspect individual sample details and failed rules.
    *   Visualize metric distributions.

## Configuration

The tool is driven by a `config.yaml` file. This file defines:

*   **Input paths**: Locations of your data, mapping file, and QC rules/tests.
*   **Output paths**: Where to save the results and logs.
*   **App settings**: Dashboard configuration (server port, UI preferences).

**Key Input Files:**
*   `run_data.tsv`: Your raw sequencing metrics.
*   `mapping.yaml`: Maps your data columns to the tool's internal field names.
*   `QC_rules.tsv`: Defines the specific thresholds and checks for each species.
*   `QC_tests.tsv`: Defines how rule failures translate into overall QC outcomes.

See the `input/example/` directory for template files.
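
Conceptually, the mapping step renames your column headers to the names the rules expect. The sketch below shows the idea with a plain dictionary; the layout of `mapping.yaml` is not described here, so the dictionary contents and the `apply_mapping` helper are illustrative assumptions (see the example files for the real format).

```python
from typing import Dict


def apply_mapping(row: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename the keys of one data row; unmapped columns keep their name."""
    return {mapping.get(column, column): value for column, value in row.items()}


# Hypothetical mapping from a source column header to an internal field name.
mapping = {"Assembly N50": "n50"}
row = {"sample": "S1", "Assembly N50": "61000"}

mapped_row = apply_mapping(row, mapping)
```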

## Output Files

### 1. QC Results (`uQCme_run_data.tsv`)
Enhanced run data with QC outcomes:
- Original sample data
- Failed rules per sample
- Assigned QC outcome with priority
- Color coding for visualization

### 2. Rule Warnings (`uQCme_rule_warnings.tsv`)
Detailed log of rule evaluation issues:
- Skipped rules and reasons
- Data validation warnings
- Processing statistics

## Dashboard Features

### Data Overview
- Interactive data table with filtering and sorting
- Priority-based color coding of QC outcomes
- Summary statistics and sample counts

### Sample Details
- Detailed view of individual sample QC results
- Failed rules and thresholds
- Interactive metric exploration

### Visualization
- Plotly-based interactive charts
- Customizable metric comparisons
- Species-specific analysis views

### Filtering and Search
- Dynamic filtering by QC outcome, species, and metrics
- Search functionality across all data columns
- Export capabilities for filtered datasets

## Advanced Usage

### Custom QC Rules

Create custom QC rules by editing `QC_rules.tsv`:

```tsv
rule_id	species	qc_tool	qc_metric	validation_type	threshold	column_name
CUSTOM1	Escherichia coli	Assembly	N50	threshold	>=50000	n50
CUSTOM2	all	CheckM	Completeness	threshold	>=90	completeness
```
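
To illustrate how a `threshold` value such as `>=50000` can be interpreted, here is a minimal sketch of a comparison parser. It assumes thresholds are a comparison operator followed by a number; `passes_threshold` is a hypothetical helper, and uQCme's own rule engine may accept other forms.

```python
import operator
import re

# Supported comparison operators (an assumption about the threshold syntax).
_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not read as ">".
_THRESHOLD = re.compile(r"^\s*(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def passes_threshold(value: float, threshold: str) -> bool:
    """Return True if `value` satisfies a threshold string such as '>=50000'."""
    match = _THRESHOLD.match(threshold)
    if match is None:
        raise ValueError(f"Unsupported threshold: {threshold!r}")
    symbol, limit = match.groups()
    return _OPERATORS[symbol](value, float(limit))


n50_ok = passes_threshold(61000, ">=50000")
completeness_ok = passes_threshold(85.5, ">=90")
```

With the example rules above, an N50 of 61000 satisfies `CUSTOM1`, while a completeness of 85.5 does not satisfy `CUSTOM2`.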

### Species-Specific Tests

Define new QC tests in `QC_tests.tsv`:

```tsv
outcome_id	outcome_name	description	priority	rule_conditions	action_required
FAIL_CUSTOM	Fail - Custom QC	Custom quality control failed	3	failed_rules_contain:CUSTOM1,CUSTOM2	reject
```
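
The sketch below shows one possible way a `failed_rules_contain:` condition and the `priority` column could combine into a single outcome. It rests on two assumptions that this README does not confirm: a condition matches when any listed rule ID has failed, and the matching test with the highest priority number wins. Both helpers are hypothetical.

```python
from typing import Dict, List, Optional, Set


def condition_matches(condition: str, failed_rules: Set[str]) -> bool:
    """Check a 'failed_rules_contain:ID1,ID2' condition (assumed: any ID matches)."""
    kind, _, payload = condition.partition(":")
    if kind != "failed_rules_contain":
        raise ValueError(f"Unsupported condition: {condition!r}")
    wanted = {rule_id.strip() for rule_id in payload.split(",") if rule_id.strip()}
    return bool(wanted & failed_rules)


def pick_outcome(tests: List[Dict[str, str]], failed_rules: Set[str]) -> Optional[str]:
    """Return the outcome_id of the matching test with the highest priority number."""
    matching = [t for t in tests if condition_matches(t["rule_conditions"], failed_rules)]
    if not matching:
        return None  # no test matched; default handling is left to the caller
    return max(matching, key=lambda t: int(t["priority"]))["outcome_id"]


tests = [
    {
        "outcome_id": "FAIL_CUSTOM",
        "priority": "3",
        "rule_conditions": "failed_rules_contain:CUSTOM1,CUSTOM2",
    }
]

outcome = pick_outcome(tests, {"CUSTOM1"})
```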

## Development

### Project Structure

```
uQCme/
├── src/
│   └── uQCme/
│       ├── __init__.py
│       ├── app.py          # Streamlit web dashboard
│       ├── cli.py          # CLI processing tool
│       ├── plot.py         # Plotting utilities
│       └── utils.py        # Shared utilities
├── config.yaml             # Configuration file
├── input/
│   └── example/            # Example input files
├── output/                 # Generated results
├── log/                    # Application logs
└── tests/                  # Unit tests
```

### Running Tests

```bash
python -m pytest tests/
```

### Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## Troubleshooting

### Common Issues

1. **Missing input files**: Ensure all required input files exist and that the paths in `config.yaml` are correct.
2. **Rule validation errors**: Check that every QC rule references a column name that exists in your data.
3. **Dashboard not loading**: Verify that Streamlit is installed and that the configured port is available.
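
For rule validation errors, a quick check is to compare the `column_name` values in the rules file with the header of your data. The sketch below works on TSV text and assumes `column_name` refers to the column names after any mapping has been applied; `missing_rule_columns` is a hypothetical helper, not a uQCme function.

```python
import csv
import io
from typing import List


def missing_rule_columns(rules_tsv: str, data_tsv: str) -> List[str]:
    """Return rule column names that do not appear in the data header."""
    rules = csv.DictReader(io.StringIO(rules_tsv), delimiter="\t")
    data_header = next(csv.reader(io.StringIO(data_tsv), delimiter="\t"))
    referenced = {row["column_name"] for row in rules}
    return sorted(referenced - set(data_header))


rules_text = (
    "rule_id\tspecies\tcolumn_name\n"
    "CUSTOM1\tEscherichia coli\tn50\n"
    "CUSTOM2\tall\tcompleteness\n"
)
data_text = "sample\tn50\nS1\t61000\n"

missing = missing_rule_columns(rules_text, data_text)
```

To run this against real files, read them first, for example with `pathlib.Path("QC_rules.tsv").read_text()`.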

### Logging

Check the log file (`./log/log.tsv`) for detailed processing information:

```bash
tail -f ./log/log.tsv
```

## Citation

If you use uQCme in your research, please cite:

```
uQCme: A Comprehensive Quality Control Tool for Microbial Sequencing Data
SSI-DK, 2025
```

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Support

For questions, issues, or feature requests:

- Create an issue on GitHub
- Contact: Kim Ng (kimn@ssi.dk)

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history and updates.
