Metadata-Version: 2.4
Name: resfinder-parser
Version: 0.3.0
Summary: Parser for ResFinder output data
License: MIT
License-File: LICENSE
Author: João Dourado Santos
Author-email: joao.dourado@insa.min-saude.pt
Requires-Python: >=3.8,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: pandas (>=2.0,<3.0)
Description-Content-Type: text/markdown

# ResFinderParser

[![PyPI version](https://badge.fury.io/py/resfinder-parser.svg)](https://pypi.org/project/resfinder-parser/)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/resfinder-parser.svg)](https://pypi.org/project/resfinder-parser/)
[![License](https://img.shields.io/pypi/l/resfinder-parser.svg)](https://pypi.org/project/resfinder-parser/)
[![PyPI format](https://img.shields.io/pypi/format/resfinder-parser.svg)](https://pypi.org/project/resfinder-parser/)

_Harmonizing ResFinder outputs for scalable AMR analysis_

**ResFinderParser** is a Python tool that parses and standardizes JSON outputs generated by ResFinder (CGE/DTU) across multiple isolates. It compiles per-sample resistance detection results into structured, analysis-ready tabular datasets.

ResFinder integrates multiple resistance detection layers, including acquired resistance genes (ResFinder database), chromosomal mutations (PointFinder), disinfectant-associated determinants (DisinFinder), and predicted phenotypic resistance. Although biologically comprehensive, these outputs are generated independently for each isolate and are not directly optimized for cross-sample analyses.

The parser restructures these results into harmonized matrices suitable for:

- Antimicrobial resistance (AMR) surveillance
- Comparative resistance profiling
- Resistance frequency and trend analysis
- Integration into automated genomic analysis pipelines

By transforming nested JSON outputs into standardized TSV files, the tool enables scalable genomic resistance analyses across research and surveillance contexts.

## Requirements

- Python 3.8+
- pandas 2.0+

## Installation

### Using Poetry (recommended)

From the root of a local clone of the source repository:

```bash
poetry install
```

### Using pip

```bash
pip install resfinder-parser
```

## Input

**ResFinderParser** expects the JSON output files generated by ResFinder. The tool should be run on a directory containing one subfolder per isolate, each including the corresponding ResFinder `*.json` file (and, if available, PointFinder and DisinFinder results).

Example structure:

```
RESFINDER_DIR/
├── ISOLATE_001/
│   └── ResFinder_results.json
├── ISOLATE_002/
│   └── ResFinder_results.json
└── ...
```
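This layout implies a simple discovery step: list the isolate subfolders and collect the JSON files inside each one. The sketch below shows that step with `pathlib`. It assumes that the isolate ID is the subfolder name and that every `*.json` file directly inside a subfolder is a ResFinder-family result. `discover_isolates` is a hypothetical helper and is not part of the package API.

```python
from pathlib import Path
from typing import Dict, List


def discover_isolates(resfinder_dir: str) -> Dict[str, List[Path]]:
    """Map each isolate subfolder name to the JSON files it contains.

    Hypothetical helper. Assumptions: the isolate ID is the subfolder name,
    and any *.json file directly inside it is a ResFinder-family result.
    """
    root = Path(resfinder_dir)
    isolates: Dict[str, List[Path]] = {}
    # Sort for a deterministic isolate order across file systems
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        json_files = sorted(sub.glob("*.json"))
        if json_files:  # silently skip folders with no JSON output
            isolates[sub.name] = json_files
    return isolates


# Example: discover_isolates("/path/to/RESFINDER_DIR")
# returns a dict such as {"ISOLATE_001": [Path(...)], ...}
```

Skipping subfolders without JSON output, rather than raising an error, is one possible design choice; a stricter pipeline might prefer to report them.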

The parser extracts resistance genes, chromosomal mutations, disinfectant-associated determinants, and predicted phenotypes directly from the JSON structure.
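As an illustration of this extraction, the sketch below flattens per-antibiotic phenotype predictions into long-format rows. It assumes a top-level `phenotypes` object keyed by antibiotic name, with `amr_classes`, `amr_resistant` and `grade` fields. This matches our understanding of ResFinder 4's JSON output, so verify the key names against your own files. `phenotype_rows` and `load_rows` are hypothetical names, and the package's own implementation may differ.

```python
import json
from typing import Any, Dict, List


def phenotype_rows(isolate_id: str, result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one isolate's phenotype predictions into long-format rows.

    Assumed (not verified) layout: result["phenotypes"] maps an antibiotic
    name to a dict with "amr_classes", "amr_resistant" and "grade" keys.
    """
    rows = []
    for antibiotic, pheno in sorted(result.get("phenotypes", {}).items()):
        rows.append({
            "isolate_id": isolate_id,
            "antibiotic": antibiotic,
            # An antibiotic may belong to several classes; join them in one cell
            "resistance_class": ";".join(pheno.get("amr_classes", [])),
            "amr_resistant": bool(pheno.get("amr_resistant", False)),
            "grade": pheno.get("grade"),
        })
    return rows


def load_rows(isolate_id: str, json_path: str) -> List[Dict[str, Any]]:
    """Read one ResFinder JSON file and flatten its phenotypes."""
    with open(json_path, encoding="utf-8") as fh:
        return phenotype_rows(isolate_id, json.load(fh))
```

Concatenating the rows from every isolate gives a long-format table with one row per isolate × antibiotic.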

## Outputs

The parser generates four standardized TSV files:

1. **isolate_summaries.tsv**: one row per isolate, containing:
   - isolate_id
   - analysis_date
   - ResFinder version
   - databases used
   - provided species
   - predicted phenotype summary

   Useful for run tracking and dataset documentation.

2. **resfinder_results.tsv**: long-format table (one row per isolate × antibiotic), including:
   - antibiotic
   - resistance class
   - amr_resistant (True/False)
   - identity
   - coverage
   - grade

   Suitable for resistance frequency analysis and comparative profiling.

3. **pointfinder_results.tsv**: long-format mutation table, including:
   - gene
   - mutation
   - nucleotide change
   - associated phenotype
   - PMID

   Enables mutation-level resistance analysis.

4. **combined_presence_absence.tsv**: wide-format matrix with:
   - One row per isolate
   - One column per detected gene (ResFinder + DisinFinder)
   - One column per relevant mutation (PointFinder)

   Designed for comparative analyses, clustering, and integration with epidemiological metadata.

All outputs are plain tab-separated files that can be loaded directly in R, Python (pandas), and automated genomic analysis pipelines.
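As an example of downstream analysis, the long-format table can be summarized with pandas. The sketch below assumes that `resfinder_results.tsv` has `isolate_id`, `antibiotic` and `amr_resistant` columns (the exact column names are an assumption). It computes the fraction of isolates predicted resistant per antibiotic, plus a phenotype-level 0/1 isolate × antibiotic matrix that has the same shape as `combined_presence_absence.tsv` but is not that file. All helper names are hypothetical.

```python
import pandas as pd


def load_results(path) -> pd.DataFrame:
    """Read a TSV output; pandas parses "True"/"False" strings as booleans."""
    return pd.read_csv(path, sep="\t")


def resistance_frequency(results: pd.DataFrame) -> pd.Series:
    """Fraction of isolates predicted resistant, per antibiotic (highest first).

    Assumes one row per isolate x antibiotic, so the mean of the boolean
    amr_resistant column equals the fraction of resistant isolates.
    """
    return (
        results.groupby("antibiotic")["amr_resistant"]
        .mean()
        .sort_values(ascending=False)
    )


def resistance_matrix(results: pd.DataFrame) -> pd.DataFrame:
    """Wide isolate x antibiotic matrix: 1 = predicted resistant, 0 = not."""
    return (
        results.assign(resistant=results["amr_resistant"].astype(int))
        .pivot_table(
            index="isolate_id",
            columns="antibiotic",
            values="resistant",
            aggfunc="max",  # collapse any duplicate isolate/antibiotic rows
            fill_value=0,   # antibiotics absent for an isolate count as 0
        )
        .astype(int)
    )


# Example: freq = resistance_frequency(load_results("output_dir/resfinder_results.tsv"))
```

Taking the mean of a boolean column keeps the frequency computation to a single `groupby`, while `pivot_table` with `aggfunc="max"` guarantees exactly one cell per isolate and antibiotic.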

## Usage

### Command Line

```bash
resfinder-parser -r /path/to/RESFINDER_DIR -o /path/to/output_dir
```

Arguments:

- `-r`, `--resfinder_dir` (required): directory containing the isolate subfolders
- `-o`, `--output_dir` (optional): directory where the output files will be written

### As a Python Module

```python
from resfinder_parser import ResfinderCollector

collector = ResfinderCollector("/path/to/RESFINDER_DIR", "/path/to/output_dir")
collector.collect()
```
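For pipeline integration, one option is to call the documented `resfinder-parser` command from Python with `subprocess`, passing arguments as a list so that no shell quoting is needed. This is a minimal sketch: `build_command` and `run_batches` are hypothetical helpers, and writing one output folder per run directory is our own convention, not behavior of the tool.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_command(resfinder_dir: str, output_dir: str) -> List[str]:
    """Assemble the documented CLI call as an argument list."""
    return ["resfinder-parser", "-r", str(resfinder_dir), "-o", str(output_dir)]


def run_batches(run_dirs: Iterable[str], output_root: str) -> None:
    """Parse several ResFinder run directories, one output folder per run.

    Hypothetical helper; the per-run output layout is an assumption.
    """
    for run_dir in run_dirs:
        out = Path(output_root) / Path(run_dir).name
        out.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError and stops at the first failing run
        subprocess.run(build_command(run_dir, str(out)), check=True)
```

Stopping on the first failure keeps a broken run from silently producing an incomplete dataset; a surveillance pipeline might instead log failures and continue.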

## Development

Run tests:

```bash
poetry run pytest
```

## Citation

We recommend citing this repository together with ResFinder and its associated databases:

- Zankari et al., 2012. Journal of Antimicrobial Chemotherapy.
- Florensa et al., 2022. Microbial Genomics.
- CGE/DTU ResFinder platform.

