Title: FlowMeta: Automated End-to-End Metagenomic Profiling Pipeline
Repository: https://github.com/SkinMicrobe/FlowMeta
Author: Dongqiang Zeng
Email: interlaken@smu.edu.cn
Welcome! This guide walks through installing, configuring, and running FlowMeta, the 10-stage metagenomic pipeline that links fastp → Bowtie2 → Kraken2/Bracken → host filtering → multi-format reporting. Use it to replicate shotgun metagenomic workflows on Linux servers, HPC clusters, or WSL.
conda env create -f environment.yml
conda activate meta
# or
mamba env create -f environment.yml
mamba activate meta
Prefer an isolated virtual environment?
python -m venv .venv
source .venv/bin/activate # Windows PowerShell: .venv\Scripts\activate
pip install -r docs/quickstart-requirements.txt # tailor as needed
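Before installing, confirm the following prerequisites:
- The external tools (fastp, Bowtie2, samtools, Kraken2, Bracken) are available on $PATH.
- A Bowtie2 host index is built (prefix such as /mnt/db/GRCh38_noalt_as/GRCh38_noalt_as).
- A Kraken2 database directory contains hash.k2d, opts.k2d, taxo.k2d, and optional Bracken taxonomy tables.
pip install flowmeta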
Ideal once the package is published publicly; every user can install directly.
pip install dist/flowmeta-0.1.5-py3-none-any.whl
Use when sharing a pre-built artifact inside secure networks.
pip install build
python -m build --wheel --sdist
pip install dist/flowmeta-0.1.5-py3-none-any.whl
Confirm README.md, README.zh.md, and docs/tutorial.html stay bundled in the sdist so downstream installs ship with documentation.
tar -tf dist/flowmeta-0.1.5.tar.gz (sdist) and python -m zipfile -l dist/flowmeta-0.1.5-py3-none-any.whl (wheel) quickly reveal missing files.
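The same check can be scripted. The sketch below is a minimal example using only the Python standard library: it lists the members of an sdist or wheel and reports which of the expected documentation files are absent. The helper names `archive_members` and `missing_docs` are hypothetical and not part of FlowMeta; the expected file names are taken from the note above.

```python
import tarfile
import zipfile
from pathlib import Path

# Documentation files this guide expects to be bundled.
EXPECTED_DOCS = ("README.md", "README.zh.md", "docs/tutorial.html")


def archive_members(path):
    """Return the member names of a wheel (.whl) or an sdist (.tar.gz)."""
    path = Path(path)
    if path.suffix == ".whl":
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    with tarfile.open(path) as tf:  # default mode detects gzip compression
        return tf.getnames()


def missing_docs(path, expected=EXPECTED_DOCS):
    """Hypothetical helper: list expected files not found in the archive.

    Matching is by path suffix because an sdist prefixes every member
    with a top-level directory such as flowmeta-0.1.5/.
    """
    names = archive_members(path)
    return [
        doc for doc in expected
        if not any(n == doc or n.endswith("/" + doc) for n in names)
    ]
```

Calling `missing_docs("dist/flowmeta-0.1.5.tar.gz")` after a build should return an empty list when the documentation is packaged as intended.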
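- Download and extract a prebuilt Kraken2 database (e.g. k2_standard_20240112) to /mnt/db/k2ppf.
- Verify that the directory contains hash.k2d, opts.k2d, taxo.k2d, library_report.tsv, and related files.
- Optionally pass --shm_path /dev/shm/k2ppf so FlowMeta caches the DB in RAM.
# fetch the genome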
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.28_GRCh38.p13/GCA_000001405.28_GRCh38.p13_genomic.fna.gz
# optional: remove alternate-locus and patch contigs (-n matches the full header, not only the sequence ID)
seqkit grep -n -r -v -p "alt|PATCH" GCA_000001405.28_GRCh38.p13_genomic.fna.gz > GRCh38_noalt.fna
# build Bowtie2 index
mkdir -p /mnt/db/GRCh38_noalt_as
bowtie2-build GRCh38_noalt.fna /mnt/db/GRCh38_noalt_as/GRCh38_noalt_as
# FlowMeta configuration
flowmeta_base ... --db_bowtie2 /mnt/db/GRCh38_noalt_as/GRCh38_noalt_as
Swap the download URL and filtering logic to index other host species.
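Before launching a run, it can help to confirm that both databases are in place. The following is a minimal sketch, not part of FlowMeta: the function names are hypothetical, the Kraken2 file names are those listed in this guide, and the Bowtie2 check relies on the six files that bowtie2-build writes per index (with the .bt2l extension for large indexes).

```python
from pathlib import Path

# Files this guide lists for a Kraken2 database directory.
KRAKEN2_REQUIRED = ("hash.k2d", "opts.k2d", "taxo.k2d")

# bowtie2-build writes six files per index prefix.
BOWTIE2_PARTS = ("1", "2", "3", "4", "rev.1", "rev.2")


def missing_kraken2_files(db_dir):
    """Hypothetical helper: required Kraken2 files absent from db_dir."""
    db_dir = Path(db_dir)
    return [name for name in KRAKEN2_REQUIRED if not (db_dir / name).is_file()]


def bowtie2_index_complete(prefix):
    """Hypothetical helper: True when all six index files exist for the
    prefix, using either the small (.bt2) or large (.bt2l) extension."""
    for ext in ("bt2", "bt2l"):
        if all(Path(f"{prefix}.{part}.{ext}").is_file() for part in BOWTIE2_PARTS):
            return True
    return False
```

For the paths used in this guide, `bowtie2_index_complete("/mnt/db/GRCh38_noalt_as/GRCh38_noalt_as")` and `missing_kraken2_files("/mnt/db/k2ppf")` would be the two calls to make.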
flowmeta_base \
--input_dir /mnt/data/flowmeta/01-raw \
--output_dir /mnt/data/flowmeta/flowmeta-out \
--db_bowtie2 /mnt/db/GRCh38_noalt_as/GRCh38_noalt_as \
--db_kraken /mnt/db/k2ppf \
--threads 32 \
--project_prefix GLOBAL-
The pipeline auto-detects paired FASTQ files, writes outputs to the specified workspace, and drops .task.complete flags for safe restarts.
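To illustrate what suffix-based pair detection looks like, here is a minimal sketch. It is not FlowMeta's implementation: the function name `find_pairs` is hypothetical, and the default suffixes `_1.fastq.gz` and `_2.fastq.gz` are assumptions chosen for illustration.

```python
from pathlib import Path


def find_pairs(input_dir, suffix1="_1.fastq.gz", suffix2="_2.fastq.gz"):
    """Map sample name -> (R1 path, R2 path) for complete pairs only.

    The default suffixes are assumptions for illustration, not
    necessarily FlowMeta's defaults.
    """
    pairs = {}
    for r1 in sorted(Path(input_dir).glob(f"*{suffix1}")):
        sample = r1.name[: -len(suffix1)]    # strip the R1 suffix
        r2 = r1.with_name(sample + suffix2)  # expected mate file
        if r2.is_file():
            pairs[sample] = (r1, r2)
    return pairs
```

Samples with an R1 file but no matching R2 file are left out of the result, which makes incomplete uploads easy to spot by comparing against the raw file listing.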
| Flag | Purpose |
|---|---|
| `--input_dir` | Raw FASTQ directory (default: `01-raw`, expecting `_1/_2` suffixes). |
| `--output_dir` | Pipeline workspace; creates `02-qc` through `09-mpa`. |
| `--db_bowtie2` | Bowtie2 index prefix used for host filtering. |
| `--db_kraken` | Kraken2 database directory. |
| `--threads` | Total threads per sample. |
| `--batch` | Concurrent samples processed in fastp/Kraken2. |
| `--se` | Toggle single-end mode. |
| `--suffix1`, `--suffix2` | Override FASTQ suffixes when naming schemes differ. |
| `--min_count` | Bracken minimum count threshold (host filtering). |
| `--skip_integrity_checks` | Skip FASTQ integrity checks to maximize speed (use only on trusted storage). |
| `--check_result` | Enable integrity checks in Steps 2 and 4 (off by default to save time). |
| `--step` | Resume from a logical pipeline step (1–10). |
| `--force` | Re-run work from the specified step even if flags exist. |
| `--skip_host_extract` | Skip exporting host reads in Step 5. |
| `--no_shm`, `--shm_path` | Control shared-memory staging of Kraken2 databases. |
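When runs are launched from a script, the flags above can be assembled programmatically. The sketch below only builds and prints the argument list; `build_command` is a hypothetical helper, and the paths are the example values used in this guide.

```python
import shlex


def build_command(options, switches=()):
    """Hypothetical helper: assemble a flowmeta_base argument list.

    `options` maps flag names (without dashes) to values; `switches`
    lists value-less flags such as check_result.
    """
    cmd = ["flowmeta_base"]
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    cmd += [f"--{name}" for name in switches]
    return cmd


cmd = build_command(
    {
        "input_dir": "/mnt/data/flowmeta/01-raw",
        "output_dir": "/mnt/data/flowmeta/flowmeta-out",
        "db_bowtie2": "/mnt/db/GRCh38_noalt_as/GRCh38_noalt_as",
        "db_kraken": "/mnt/db/k2ppf",
        "threads": 32,
    },
    switches=("check_result",),
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list to `subprocess.run(cmd, check=True)` would start the pipeline. Keeping the arguments as a list rather than a single string avoids shell-quoting problems with unusual paths.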
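flowmeta_base ... --step 6                      # resume from step 6
flowmeta_base ... --step 3 --force              # re-run from step 3 even if flags exist
flowmeta_base ... --se --suffix1 _R.fastq.gz    # single-end mode with a custom suffix
flowmeta_base ... --skip_host_extract           # skip exporting host reads in step 5
02-qc/ fastp reports + trimmed reads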
03-hr/ Host-depleted FASTQ files
04-bam/ Bowtie2 BAM and index files
05-host/ Optional host FASTQ exports
06-ku/ Kraken2 reports (first pass)
07-bracken/ Bracken abundance tables
08-ku2/ Host-filtered rerun outputs
09-mpa/ Final merged OTU / MPA matrices
Each directory stores per-sample .task.complete files so you can resume safely after interruptions.
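The flag-file convention makes it straightforward to see which samples a step still has to process. The sketch below assumes each flag is named `<sample>.task.complete`; that naming, and both function names, are assumptions for illustration rather than FlowMeta's documented behavior.

```python
from pathlib import Path

FLAG_SUFFIX = ".task.complete"  # assumed naming: <sample>.task.complete


def completed_samples(step_dir):
    """Return the set of sample names that have a flag file in step_dir."""
    return {
        p.name[: -len(FLAG_SUFFIX)]
        for p in Path(step_dir).glob(f"*{FLAG_SUFFIX}")
    }


def pending_samples(all_samples, step_dir):
    """Return samples (in input order) that lack a flag file in step_dir."""
    done = completed_samples(step_dir)
    return [s for s in all_samples if s not in done]
```

For example, `pending_samples(["S1", "S2"], "flowmeta-out/02-qc")` would list the samples that have no completion flag in the QC directory yet.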
Integrity checks in Steps 2 and 4 run only when you supply --check_result (off by default to save time) and do not also pass --skip_integrity_checks. Passing --skip_integrity_checks skips all FASTQ integrity checks to maximize throughput; use it only when the storage is trusted. At startup the CLI also prints a concise path overview for all step directories.
Steps map to functions in flowmeta/steps/ for advanced customization and can be resumed with --step.
| Step | Purpose | Files counted when announced |
|---|---|---|
| 1 | fastp trimming/QC. | FASTQ pairs in 01-raw matching suffix1. |
| 2 | fastp integrity verification (requires --check_result). | .task.complete or _fastp.json in 02-qc. |
| 3 | Bowtie2 host depletion + BAM creation. | .task.complete in 02-qc. |
| 4 | Host-removed FASTQ validation (requires --check_result). | _host_remove_R1.fastq.gz in 03-hr. |
| 5 | Optional samtools host-read export. | .bam files in 04-bam. |
| 6 | Stage Kraken2 DB in shared memory (if enabled). | N/A |
| 7 | Kraken2/Bracken classification. | _host_remove_R1.fastq.gz in 03-hr. |
| 8 | Kraken report validation. | .kraken.report.std.txt in 06-ku. |
| 9 | Host-taxid filtering + Bracken rerun. | .kraken.report.std.txt in 06-ku. |
| 10 | Merge OTU/MPA/Bracken outputs. | .nohuman.kraken.mpa.std.txt (08-ku2) + .bracken tables (07-bracken). |
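The table can be read as a simple plan: start at the requested step and drop the optional ones that are switched off. The sketch below encodes that reading. It is an interpretation of the table, not FlowMeta's code; `planned_steps` and its parameters are hypothetical names.

```python
# Step numbers and purposes as listed in the table above.
STEPS = {
    1: "fastp trimming/QC",
    2: "fastp integrity verification",
    3: "Bowtie2 host depletion + BAM creation",
    4: "Host-removed FASTQ validation",
    5: "Optional samtools host-read export",
    6: "Stage Kraken2 DB in shared memory",
    7: "Kraken2/Bracken classification",
    8: "Kraken report validation",
    9: "Host-taxid filtering + Bracken rerun",
    10: "Merge OTU/MPA/Bracken outputs",
}


def planned_steps(start=1, check_result=False, skip_host_extract=False, use_shm=True):
    """Hypothetical helper: step numbers expected to run for these options.

    Assumes steps 2 and 4 need --check_result, step 5 is dropped by
    --skip_host_extract, and step 6 is dropped by --no_shm.
    """
    skipped = set()
    if not check_result:
        skipped |= {2, 4}
    if skip_host_extract:
        skipped.add(5)
    if not use_shm:
        skipped.add(6)
    return [n for n in STEPS if n >= start and n not in skipped]


for n in planned_steps(start=6):
    print(f"STEP {n}: {STEPS[n]}")
```

Under these assumptions, resuming with `--step 6` covers steps 6 through 10, while a default run from step 1 omits the two integrity-check steps.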
- Each step logs a STEP n banner; unrecoverable errors raise FlowMetaError.
- Delete the corresponding .task.complete file to re-run a specific sample.
- Adjust --fastp_retries, --host_retries, or --kraken_retries after transient failures.
- Control Kraken2 database staging with --no_shm, or copy databases to SSD/RAM using --shm_path.
- Re-run with --step 3 --force (Bowtie2) or a later step depending on the scope.
- Add new samples to 01-raw; FlowMeta processes only the samples lacking .task.complete markers.
- Check 02-qc for both HTML reports and logs; ensure disk space is sufficient.
- flowmeta_base -h prints the entire argument list.
Maintainer: Dongqiang Zeng · Southern Medical University · interlaken@smu.edu.cn
Please cite the GitHub repository if FlowMeta contributes to your research: https://github.com/SkinMicrobe/FlowMeta.
Happy sequencing!