Metadata-Version: 2.4
Name: qcatch
Version: 0.2.6
Summary: QCatch: Quality Control downstream of alevin-fry / simpleaf.
Project-URL: Documentation, https://github.com/COMBINE-lab/QCatch#readme
Project-URL: Homepage, https://github.com/COMBINE-lab/QCatch
Project-URL: Source, https://github.com/COMBINE-lab/QCatch
Author: Yuan Gao, Dongze He, Rob Patro
License-File: LICENSE
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.11
Requires-Dist: anndata>=0.11.4
Requires-Dist: beautifulsoup4>=4.13.3
Requires-Dist: igraph<0.12,>=0.11
Requires-Dist: numpy<3,>=2.1.3
Requires-Dist: pandas<3,>=2.2.3
Requires-Dist: plotly>=6
Requires-Dist: requests>=2.32.4
Requires-Dist: scanpy<2,>=1.10.4
Requires-Dist: scipy<2,>=1.15.2
Requires-Dist: session-info2<0.2,>=0.1
Provides-Extra: dev
Requires-Dist: pre-commit; extra == 'dev'
Requires-Dist: twine>=4.0.2; extra == 'dev'
Provides-Extra: test
Requires-Dist: coverage; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: setuptools; extra == 'test'
Description-Content-Type: text/markdown

# QCatch

[![PyPI version][badge-pypi]][pypi]
[![Tests][badge-tests]][tests]
[![Documentation][badge-docs]][documentation]

[badge-pypi]: https://img.shields.io/pypi/v/qcatch
[pypi]: https://pypi.org/project/qcatch/

[badge-tests]: https://img.shields.io/github/actions/workflow/status/COMBINE-lab/QCatch/test.yaml?branch=main
[tests]: https://github.com/COMBINE-lab/QCatch/actions/workflows/test.yaml

[badge-docs]: https://img.shields.io/badge/docs-online-blue
[documentation]: https://COMBINE-lab.github.io/QCatch

QCatch: Quality Control downstream of `alevin-fry` and `simpleaf`

View the complete [QCatch documentation](https://COMBINE-lab.github.io/QCatch) with interactive examples, FAQs, and detailed usage guides.

## Installation

You need Python 3.11 or later installed on your system.

There are several ways to install QCatch:


#### 1. Bioconda
You can install QCatch with [Conda](http://anaconda.org/)
from the [Bioconda](https://bioconda.github.io/) channel:

```bash
conda install -c bioconda qcatch
```

#### 2. PyPI

You can also install from [PyPI](https://pypi.org/project/qcatch/) using `pip`:

```bash
pip install qcatch
```

> Tip: If you run into environment issues, you can also create an environment from the provided Conda `.yml` file, which specifies the exact versions of all dependencies to ensure consistency.

```bash
conda env create -f qcatch_conda_env.yml
```


## Basic Usage
Provide the path to the parent folder for quantification results, or the direct path to a .h5ad file generated by `alevin-fry` or `simpleaf`. QCatch will automatically scan the input path, assess data quality, and generate an interactive HTML report that can be viewed directly in your browser.

```bash
# --output: set this if you want the results written to another folder
qcatch \
    --input path/to/your/quantification/result \
    --output path/to/desired/QC/output/folder \
    --chemistry 10X_3p_v3 \
    --save_filtered_h5ad
```
## Tutorial: Run QCatch on Example data
### Step 1 — Download Dataset
```bash
#!/bin/bash
set -e  # Exit immediately if a command exits with a non-zero status

echo "📦 Downloading QCatch example dataset..."

# Define where to run the tutorial (you can change this path if desired)
CWD=$(pwd)  # Current working directory
TUTORIAL_DIR="${CWD}/qcatch_tutorial"

# Clean any existing tutorial directory to ensure a fresh download
rm -rf "$TUTORIAL_DIR" && mkdir -p "$TUTORIAL_DIR"
ZIP_FILE="data.zip"

# Download from Box
wget -O "$ZIP_FILE" "https://umd.box.com/shared/static/zd4sai70uw9fs24e1qx6r41ec50pf45g.zip?dl=1"

# Unzip and clean up
unzip "$ZIP_FILE" -d "$TUTORIAL_DIR"
rm "$ZIP_FILE"

echo "✅ Test data downloaded to $TUTORIAL_DIR"
```
### Step 2 - Run QCatch
🎉 All set! Now let’s run QCatch:
```bash
# Set up the output directory
OUT_DIR="${TUTORIAL_DIR}/output"
mkdir -p "$OUT_DIR"

# Run QCatch on the example data
qcatch --input "${TUTORIAL_DIR}/test_data/simpleaf_with_map/quants.h5ad" \
       --output "${OUT_DIR}" \
       --chemistry 10X_3p_v3
```
### Tips
**1- Input path:**

Provide either:

- the **path to the parent directory** containing quantification results, or
- the **direct path to a .h5ad file** generated by those tools.

QCatch will automatically detect the input type:
- If a **.h5ad file** is provided, QCatch will process it directly.
- If a **directory** is provided, QCatch will first look for an existing .h5ad file inside. If not found, it will fall back to processing the mtx-based quantification results.
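
The detection order described above can be sketched as a small shell function. This is an illustrative approximation, not QCatch's actual implementation: the function name `detect_input_type` and the labels it prints are hypothetical, and the recursive `find` search is an assumption about how a directory is scanned.

```bash
# Illustrative sketch only: mimics the input-detection order described above.
# `detect_input_type` is a hypothetical helper, not part of QCatch.
detect_input_type() {
    if [ -f "$1" ]; then
        case "$1" in
            *.h5ad) echo "h5ad" ;;              # a .h5ad file was given directly
            *) echo "unknown"; return 1 ;;      # some other kind of file
        esac
    elif [ -d "$1" ]; then
        # Look for an existing .h5ad file anywhere under the directory
        if [ -n "$(find "$1" -type f -name '*.h5ad' | head -n 1)" ]; then
            echo "h5ad-in-directory"
        else
            echo "mtx"                          # fall back to mtx-based results
        fi
    else
        echo "unknown"; return 1                # path does not exist
    fi
}
```

For example, `detect_input_type path/to/your/quantification/result` prints which of the cases applies to that path.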

See the example directory structures at the end of the Tips section for reference.

**2- Output path:**

If you do not want any modifications to your input folder or files, specify an output path; QCatch will save all new results and the QC HTML report there.

**_By default_**, QCatch saves the QC report and all output files in your input directory, so specifying an output path is optional. Specifically:
- If QCatch finds a `.h5ad` file in the input path, it modifies the original `.h5ad` file in place by appending the cell filtering results to `anndata.obs`, and creates a separate HTML QC report in the input folder.
- For mtx-based results, QCatch generates text files for the cell calling results, as well as the QC report, in the input folder.

**3- Chemistry:**

We highly recommend specifying the chemistry used in your experiment. By default, QCatch will assume the settings for 10X 3' v2 and v3 chemistry. If you use a custom chemistry that is not listed in the predefined chemistry options, you can specify `--n_partitions` instead.
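
As a pre-flight check, a wrapper script could verify the chemistry name before launching QCatch. This is a minimal sketch, assuming the predefined options are exactly those listed in the Command-Line Arguments table; `is_predefined_chemistry` is a hypothetical helper, not a QCatch command.

```bash
# Hypothetical helper: succeeds if the name is one of the chemistry options
# listed in the Command-Line Arguments table.
is_predefined_chemistry() {
    case "$1" in
        10X_3p_v2|10X_3p_v3|10X_3p_v4|10X_3p_LT|10X_5p_v3|10X_HT) return 0 ;;
        *) return 1 ;;
    esac
}

CHEMISTRY="10X_3p_v3"
if is_predefined_chemistry "$CHEMISTRY"; then
    echo "Predefined chemistry: pass --chemistry $CHEMISTRY"
else
    echo "Custom chemistry: pass --n_partitions instead"
fi
```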

**4- Gene ID-to-name mapping file:**

If you are using simpleaf v0.19.3 or later, the generated .h5ad file already includes gene names. In this case, you do not need to specify the `--gene_id2name_file` option.

To provide a gene ID-to-name mapping, supply a **TSV** file containing two columns, ‘gene_id’ (e.g., ENSG00000284733) and ‘gene_name’ (e.g., OR4F29), **without** a header row. If no file is provided, the program will attempt to retrieve the mapping from a remote registry. If that lookup fails, the mitochondria plots will not be displayed, but the rest of the QC report will not be affected.

**5- Save filtered h5ad file:**

If you want to save the filtered h5ad file separately, specify `--save_filtered_h5ad`. This option applies only when QCatch detects an h5ad file as the input.

**6- Specify your desired cell list:**

If you want to use a specified list of valid cell barcodes, you can provide the file path with `--valid_cell_list`. QCatch will then skip the default cell calling step and use the supplied list instead. The updated .h5ad file will include only one additional column, 'is_retained_cells', containing boolean values based on the specified list.

**7- Skip clustering plots:**

To reduce runtime, you may enable the `--skip_umap_tsne` option to bypass the dimensionality reduction and visualization steps.

**8- Export the summary metrics:**

To export the summary metrics, enable the `--export_summary_table` flag. The summary table will be saved as a separate CSV file in the output directory.

**9- Debug-level messages:**

To get debug-level messages and more detail on the intermediate computations in the cell calling step, specify `--verbose`.

**10- Re-run QCatch on a modified h5ad file:**

If you re-run QCatch on a modified `.h5ad` file (i.e., an `.h5ad` file with additional columns added for cell calling results), the existing cell calling-related columns will be removed and replaced with the new results. The new cell calling can be generated either through QCatch's internal method or from a user-specified list of valid cell barcodes.

**Example directory structures:**

```bash
# simpleaf
parent_quant_dir/
├── af_map/
├── af_quant/
│   ├── alevin/
│   │   ├── quants_mat_cols.txt
│   │   ├── quants_mat_rows.txt
│   │   ├── quants_mat.mtx
│   │   └── quants.h5ad (available if you use simpleaf after v0.19.3)
│   │   ...
│   ├── featureDump.txt
│   └── quant.json
└── simpleaf_quant_log.json

# alevin-fry
parent_quant_dir/
├── alevin/
│   ├── quants_mat_cols.txt
│   ├── quants_mat_rows.txt
│   └── quants_mat.mtx
├── featureDump.txt
└── quant.json

```
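
Before running QCatch on mtx-based results, it can help to confirm that the expected files are present. The following is a minimal sketch based only on the two layouts shown above; `find_alevin_dir` is a hypothetical helper, not part of QCatch, and it checks only for the three `quants_mat*` files.

```bash
# Hypothetical helper: print the alevin/ directory that holds the mtx files,
# trying the simpleaf layout first and then the alevin-fry layout.
find_alevin_dir() {
    for candidate in "$1/af_quant/alevin" "$1/alevin"; do
        if [ -f "$candidate/quants_mat.mtx" ] &&
           [ -f "$candidate/quants_mat_rows.txt" ] &&
           [ -f "$candidate/quants_mat_cols.txt" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "No complete mtx output found under $1" >&2
    return 1
}
```

Calling `find_alevin_dir parent_quant_dir` prints the matching directory, or returns a non-zero status if any of the three files is missing.
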
For more advanced options and usage details, see the sections below.

## Command-Line Arguments

| Flag | Short | Type | Description |
|------|-------|------|-------------|
| `--input`  | `-i` | `str` (Required) | Path to the input directory containing the quantification output files or to the HDF5 file itself. |
| `--output` | `-o` | `str` (Required) | Path to the output directory. |
| `--chemistry` | `-c` | `str` (Optional but recommended) | Specifies the chemistry used in the experiment, which determines the range for the `empty_drops` step. **Options**: `'10X_3p_v2'`, `'10X_3p_v3'`, `'10X_3p_v4'`, `'10X_3p_LT'`, `'10X_5p_v3'`, `'10X_HT'`. **Default**: Will use the range for `'10X_3p_v2'` and `'10X_3p_v3'`. |
| `--save_filtered_h5ad` | `-s` | `flag` (Optional) | If enabled, `qcatch` will save a separate `.h5ad` file containing only the retained cells. |
| `--gene_id2name_file` | `-g` | `str` (Optional) | File providing a mapping from gene IDs to gene names. The file must be a TSV containing two columns, ‘gene_id’ (e.g., ENSG00000284733) and ‘gene_name’ (e.g., OR4F29), without a header row. If not provided, the program will attempt to retrieve the mapping from a remote registry. If that lookup fails, mitochondria plots will not be displayed. |
| `--valid_cell_list` | `-l` | `str` (Optional) | File providing a user-specified list of valid cell barcodes. The file must be a TSV containing one column of cell barcodes, without a header row. If provided, `qcatch` will skip the internal cell calling steps and use the supplied list instead. |
| `--n_partitions` | `-n` | `int` (Optional) | Number of partitions (max number of barcodes to consider for ambient estimation). Skip this option if you already specified `--chemistry`. Only use `--n_partitions` when your experiment uses a custom chemistry not listed in the predefined chemistry options. |
| `--skip_umap_tsne` | `-u` | `flag` (Optional) | If provided, skips generation of UMAP and t-SNE plots. |
| `--export_summary_table` | `-x` | `flag` (Optional) | If enabled, QCatch will export the summary metrics as a separate CSV file. |
| `--verbose` | `-b` | `flag` (Optional) | Enable verbose logging with debug-level messages. |
| `--version` | `-v` | `flag` (Optional) | Display the installed version of qcatch. |
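
The two optional input files described in the table can be prepared and sanity-checked from the shell. This is a minimal sketch: the file names (`gene_id2name.tsv`, `valid_barcodes.tsv`) and the barcodes are placeholders, the `MT-ND1` row is added only for illustration, and the `has_n_columns` helper is hypothetical. The checks encode only the format rules stated above (tab-separated, no header row).

```bash
# Gene ID-to-name mapping: two tab-separated columns, no header row.
printf 'ENSG00000284733\tOR4F29\n'  >  gene_id2name.tsv
printf 'ENSG00000198888\tMT-ND1\n' >> gene_id2name.tsv

# Valid cell list: one barcode per line, no header row (placeholder barcodes).
printf '%s\n' AAACCCAAGAAACACT AAACCCAAGAAACCAT > valid_barcodes.tsv

# Hypothetical helper: succeeds only if the file is non-empty and every line
# has exactly N non-empty tab-separated fields.
has_n_columns() {
    awk -F '\t' -v n="$2" '
        NF != n { bad = 1 }
        { for (i = 1; i <= NF; i++) if ($i == "") bad = 1 }
        END { exit (bad || NR == 0) }
    ' "$1"
}

has_n_columns gene_id2name.tsv 2   && echo "gene mapping file looks OK"
has_n_columns valid_barcodes.tsv 1 && echo "barcode list looks OK"
```

The files can then be passed to QCatch with `--gene_id2name_file gene_id2name.tsv` and `--valid_cell_list valid_barcodes.tsv`. Note that this check cannot detect an accidental header row, since a header has the same number of columns as the data.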

