Metadata-Version: 2.4
Name: qcatch
Version: 0.2.4
Summary: QCatch: Quality Control downstream of alevin-fry / simpleaf. 
Project-URL: Homepage, https://github.com/COMBINE-lab/QCatch
Project-URL: Source, https://github.com/COMBINE-lab/QCatch
Author-email: Yuan Gao <ygao61@umd.edu>, Dongze He <dhe17@umd.edu>, Rob Patro <rob@cs.umd.edu>
License: BSD 3-Clause License
        
        Copyright (c) 2025, COMBINE lab
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
        
        
        Additional License Information
        ------------------------------
        This repository is primarily licensed under the BSD 3-Clause License.
        However, the module located at `find_retained_cells/` is licensed under the
        GNU Affero General Public License, version 2 (AGPLv2). The full text
        of the AGPLv2 license is included at `find_retained_cells/README.md`.
        
        If you modify or redistribute code from `find_retained_cells/`, you must comply
        with the terms of the AGPLv2.
License-File: LICENSE
Requires-Python: ==3.12.9
Requires-Dist: beautifulsoup4==4.13.3
Requires-Dist: numpy==2.1.3
Requires-Dist: pandas==2.2.3
Requires-Dist: plotly==6.0.0
Requires-Dist: pyroe==0.9.0
Requires-Dist: python-igraph==0.11.8
Requires-Dist: scanpy==1.10.4
Requires-Dist: scipy==1.15.2
Description-Content-Type: text/markdown

# QCatch
Quality Control downstream of alevin-fry / simpleaf.

**QCatch** is a Python package designed to streamline quality control for single-cell sequencing data quantified by [alevin-fry](https://github.com/COMBINE-lab/alevin-fry) or [simpleaf](https://github.com/COMBINE-lab/simpleaf). It provides a comprehensive web-based quality control report, enabling researchers to:

- Summarize key **quality metrics** for single-cell sequencing datasets.
- Perform **cell calling** to identify high-quality cells.
- Generate interactive **visualizations** to support downstream analysis and interpretation.

QCatch is built to simplify the quality control process, making it easier for researchers to assess data quality and make informed decisions for further analysis.

## Installation

### Bioconda
You can install QCatch with [Conda](http://anaconda.org/) from the [Bioconda](https://bioconda.github.io/) channel:

```bash
conda install -c bioconda qcatch 
```

### PyPI

You can also install from [PyPI](https://pypi.org/project/qcatch/) using `pip`:

```bash
pip install qcatch
```

## Basic Usage
Provide the path to the parent folder of the quantification results, or the direct path to a `.h5ad` file generated by `alevin-fry` or `simpleaf`. QCatch will automatically scan the input path, assess data quality, and generate an interactive HTML report that can be viewed directly in your browser.

```bash
# --output is only needed if you want the results written to another folder
qcatch \
    --input path/to/your/quantification/result \
    --output path/to/desired/QC/output/folder \
    --chemistry 10X_3p_v3 \
    --save_filtered_h5ad
```
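For scripted or batch runs, the same command can be assembled in Python. The sketch below is illustrative only: `build_qcatch_command` is a hypothetical helper, not part of QCatch, and it uses only flags documented in this README. It assumes `--output` may be omitted, as described in the Tips below.

```python
import shlex


def build_qcatch_command(input_path, output_path=None, chemistry=None,
                         save_filtered_h5ad=False, skip_umap_tsne=False):
    """Assemble a qcatch argument list (hypothetical helper, not a QCatch API)."""
    args = ["qcatch", "--input", input_path]
    if output_path is not None:
        args += ["--output", output_path]
    if chemistry is not None:
        args += ["--chemistry", chemistry]
    if save_filtered_h5ad:
        args.append("--save_filtered_h5ad")
    if skip_umap_tsne:
        args.append("--skip_umap_tsne")
    return args


cmd = build_qcatch_command("quant_dir", chemistry="10X_3p_v3")
print(shlex.join(cmd))  # prints: qcatch --input quant_dir --chemistry 10X_3p_v3
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `qcatch` is installed and on your `PATH`.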

### Tips
**1- Input path:**

Provide either:

- the **path to the parent directory** containing quantification results, or
- the **direct path to a .h5ad file** generated by those tools.

QCatch will automatically detect the input type:
- If a **.h5ad file** is provided, QCatch will process it directly.
- If a **directory** is provided, QCatch will first look for an existing .h5ad file inside. If not found, it will fall back to processing the mtx-based quantification results.
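The detection order described above can be sketched as follows. This is a minimal illustration, not QCatch's actual implementation: the function name `detect_input_type` and the recursive search for `.h5ad` files are assumptions made for this example.

```python
from pathlib import Path


def detect_input_type(input_path: str) -> tuple[str, Path]:
    """Return ("h5ad", file) or ("mtx", directory) for the given input path.

    Illustrative only: mirrors the behavior described above, not QCatch's code.
    """
    path = Path(input_path)
    if path.is_file() and path.suffix == ".h5ad":
        # A .h5ad file was given directly.
        return "h5ad", path
    if path.is_dir():
        # Assumption: search recursively, since simpleaf nests the file
        # under af_quant/alevin/.
        h5ad_files = sorted(path.rglob("*.h5ad"))
        if h5ad_files:
            return "h5ad", h5ad_files[0]
        # No .h5ad found: fall back to the mtx-based results.
        return "mtx", path
    raise ValueError(f"Not an .h5ad file or a directory: {input_path}")
```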

See the example directory structures at the end of the Tips section for reference.

**2- Output path:**

If you do not want any modifications to your input folder or files, specify the output path; QCatch will save all new results and the QC HTML report there.

**_By default_**, QCatch saves the QC report and all results in your input directory, so the output path is not required.
Specifically,
- If QCatch finds a `.h5ad` file in the input path, it will modify the original `.h5ad` file in place by appending the cell filtering results to `anndata.obs`, and create a separate HTML QC report in the input folder.
- For `mtx-based` results, QCatch will generate text files for the cell calling results, as well as the QC report, in the input folder.

**3- Chemistry:**

We highly recommend specifying the chemistry used in your experiment. By default, QCatch will assume the settings for 10X 3' v2 and v3 chemistry. If you use a custom chemistry that is not listed in the predefined chemistry options, you can specify `--n_partitions` instead.

**3- Gene ID-to-name mapping file:**

If you are using simpleaf v0.19.3 or later, the generated `.h5ad` file already includes gene names. In this case, you do not need to specify the `--gene_id2name_file` option.

To provide the gene ID-to-name mapping yourself, supply a **TSV** file containing two columns, `gene_id` (e.g., ENSG00000284733) and `gene_name` (e.g., OR4F29), **without** a header row. If no file is provided, the program will attempt to retrieve the mapping from a remote registry. If that lookup fails, the mitochondria plots will not be displayed, but the rest of the QC report is unaffected.
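As a minimal sketch, a mapping file in this layout can be written with Python's standard `csv` module. The helper name `write_gene_id2name_tsv` and the output file name are hypothetical; the example pair is the one given above.

```python
import csv


def write_gene_id2name_tsv(pairs, out_path):
    """Write (gene_id, gene_name) pairs as a two-column TSV with no header row."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(pairs)


# Example pair taken from the text above; the output file name is arbitrary.
write_gene_id2name_tsv([("ENSG00000284733", "OR4F29")], "gene_id2name.tsv")
```

The resulting file can then be passed with `--gene_id2name_file gene_id2name.tsv`.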

**4- Save filtered h5ad file:**

To save the filtered data as a separate `.h5ad` file, specify `--save_filtered_h5ad`. This option only applies when QCatch detects an `.h5ad` file as the input.

**5- Specify your desired cell list:**

If you want to use a specified list of valid cell barcodes, you can provide the file path with `--valid_cell_list`. QCatch will then skip the default cell calling step and use the supplied list instead. The updated `.h5ad` file will include only one additional column, `is_retained_cells`, containing boolean values based on the specified list.
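A barcode list in the expected layout (one barcode per line, no header row) could be produced with a few lines of Python. This is a sketch: `write_valid_cell_list` is a hypothetical helper, and dropping blank and duplicate entries is a convenience added here, not something QCatch requires.

```python
def write_valid_cell_list(barcodes, out_path):
    """Write unique, non-empty barcodes one per line, with no header row.

    Returns the number of barcodes written.
    """
    seen = set()
    with open(out_path, "w") as handle:
        for barcode in barcodes:
            barcode = barcode.strip()
            if barcode and barcode not in seen:
                seen.add(barcode)
                handle.write(barcode + "\n")
    return len(seen)
```

Calling `write_valid_cell_list(my_barcodes, "valid_cells.tsv")` and then passing `--valid_cell_list valid_cells.tsv` would supply the list to QCatch; both names are placeholders.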

**6- Skip clustering plots:**

To reduce runtime, you may enable the `--skip_umap_tsne` option to bypass dimensionality reduction and visualization steps.

**7- Debug-level messages:**

To get debug-level messages and more detail on the intermediate computations in the cell calling step, specify `--verbose`.

**8- Re-run QCatch on a modified h5ad file:**

If you re-run QCatch on a modified `.h5ad` file (i.e., an `.h5ad` file with additional columns added for cell calling results), the existing cell calling-related columns will be removed and then replaced with the new results. The new cell calling can be generated either through QCatch's internal method or from a user-specified list of valid cell barcodes.

**Example directory structures:**

```bash
# simpleaf 
parent_quant_dir/
├── af_map/
├── af_quant/
│   ├── alevin/
│   │   ├── quants_mat_cols.txt
│   │   ├── quants_mat_rows.txt
│   │   ├── quants_mat.mtx
│   │   └── quants.h5ad (available if you use simpleaf after v0.19.3)
│   │   ...
│   ├── featureDump.txt
│   └── quant.json
└── simpleaf_quant_log.json

# alevin-fry
parent_quant_dir/
├── alevin/
│   ├── quants_mat_cols.txt
│   ├── quants_mat_rows.txt
│   └── quants_mat.mtx
├── featureDump.txt
└── quant.json

```
For more advanced options and usage details, see the sections below.

## FAQ
- For more details about how the metrics and plots were generated, refer to the [FAQ](./docs/faq.md)

## Command-Line Arguments

| Flag | Short | Type | Description |
|------|-------|------|-------------|
| `--input` | `-i` | `str` (Required) | Path to the input directory containing the quantification output files, or to the `.h5ad` (HDF5) file itself. |
| `--output` | `-o` | `str` (Required) | Path to the output directory. |
| `--chemistry` | `-c` | `str` (Optional but recommended) | Specifies the chemistry used in the experiment, determining the range for the `empty_drops` step. **Options**: `'10X_3p_v2'`, `'10X_3p_v3'`, `'10X_3p_v4'`, `'10X_3p_LT'`, `'10X_5p_v3'`, `'10X_HT'`. **Default**: Will use the range for `'10X_3p_v2'` and `'10X_3p_v3'`. |
| `--save_filtered_h5ad` | `-s` | `flag` (Optional) | If enabled, `qcatch` will save a separate `.h5ad` file containing only the retained cells. |
| `--gene_id2name_file` | `-g` | `str` (Optional) | File providing a mapping from gene IDs to gene names. The file must be a TSV containing two columns, `gene_id` (e.g., ENSG00000284733) and `gene_name` (e.g., OR4F29), without a header row. If not provided, the program will attempt to retrieve the mapping from a remote registry. If that lookup fails, mitochondria plots will not be displayed. |
| `--valid_cell_list` | `-l` | `str` (Optional) | File providing a user-specified list of valid cell barcodes. The file must be a TSV containing one column of cell barcodes without a header row. If provided, `qcatch` will skip the internal cell calling steps and use the supplied list instead. |
| `--n_partitions` | `-n` | `int` (Optional) | Number of partitions (max number of barcodes to consider for ambient estimation). Skip this option if you already specified `--chemistry`. Only use `--n_partitions` when your experiment uses a custom chemistry not listed in the predefined chemistry options. |
| `--skip_umap_tsne` | `-u` | `flag` (Optional) | If provided, skips generation of UMAP and t-SNE plots. |
| `--verbose` | `-b` | `flag` (Optional) | Enable verbose logging with debug-level messages. |
| `--version` | `-v` | `flag` (Optional) | Display the installed version of qcatch. |
