Metadata-Version: 2.1
Name: taxonomical_utils
Version: 0.0.4
Summary: A set of Python scripts for taxonomic name resolution and retrieval of upper taxonomic lineages.
Home-page: https://github.com/digital-botanical-gardens-initiative/taxonomical-utils
Author: Pierre-Marie Allard
Author-email: fpierre-marie.allard@unifr.ch
Requires-Python: >=3.10,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: click (>=8.1.7,<9.0.0)
Requires-Dist: opentree (>=1.0.1,<2.0.0)
Requires-Dist: pandas (>=2.2.2,<3.0.0)
Requires-Dist: requests (>=2.32.2,<3.0.0)
Project-URL: Documentation, https://digital-botanical-gardens-initiative.github.io/taxonomical-utils/
Project-URL: Repository, https://github.com/digital-botanical-gardens-initiative/taxonomical-utils
Description-Content-Type: text/markdown

# taxonomical-utils

[![Release](https://img.shields.io/github/v/release/digital-botanical-gardens-initiative/taxonomical-utils)](https://github.com/digital-botanical-gardens-initiative/taxonomical-utils/releases)
[![Build status](https://img.shields.io/github/actions/workflow/status/digital-botanical-gardens-initiative/taxonomical-utils/main.yml?branch=main)](https://github.com/digital-botanical-gardens-initiative/taxonomical-utils/actions/workflows/main.yml?query=branch%3Amain)
[![codecov](https://codecov.io/gh/digital-botanical-gardens-initiative/taxonomical-utils/branch/main/graph/badge.svg)](https://codecov.io/gh/digital-botanical-gardens-initiative/taxonomical-utils)
[![Commit activity](https://img.shields.io/github/commit-activity/m/digital-botanical-gardens-initiative/taxonomical-utils)](https://github.com/digital-botanical-gardens-initiative/taxonomical-utils/commits)
[![License](https://img.shields.io/github/license/digital-botanical-gardens-initiative/taxonomical-utils)](https://github.com/digital-botanical-gardens-initiative/taxonomical-utils/blob/main/LICENSE)

A set of Python scripts for resolving taxonomic names and retrieving their upper taxonomic lineages.

- **GitHub repository**: <https://github.com/digital-botanical-gardens-initiative/taxonomical-utils/>
- **Documentation**: <https://digital-botanical-gardens-initiative.github.io/taxonomical-utils/>

## Description

This repository contains a set of Python scripts for resolving taxonomic names and retrieving their upper taxonomic lineages. It currently uses the [Open Tree of Life](https://tree.opentreeoflife.org/about/open-tree-of-life) as its source of taxonomic data: the taxonomical-utils functions are thin wrappers around the Python [opentree](https://github.com/OpenTreeOfLife/python-opentree) package. They cover resolving taxonomic names, appending upper taxonomic lineage information, and merging the results back into the original data file.

## Installation

Taxonomical Utils requires Python 3.10 or later (below 4.0). To install it from source, follow these steps:

### Clone the repository:

```bash
git clone https://github.com/digital-botanical-gardens-initiative/taxonomical-utils.git
```

### Navigate to the project directory:

```bash
cd taxonomical-utils
```

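### Install the dependencies with Poetry:

With [Poetry](https://python-poetry.org/) installed, the following command installs the package and its dependencies into a Poetry-managed virtual environment: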

```bash
poetry install
```

## Usage

### CLI Commands

Taxonomical Utils provides several command-line interface (CLI) commands to process taxonomic data. Each command can be run individually or as part of a pipeline.

#### 1. Resolve Taxa

This command resolves taxonomic names from an input file and generates a resolved taxa file.

Command:

```bash
poetry run taxonomical-utils resolve --input-file <input_file> --output-file <resolved_taxa_file> --org-column-header <org_column_header>
```

- `<input_file>`: Path to the input CSV/TSV file containing taxonomic names.
- `<resolved_taxa_file>`: Path to the output file where the resolved taxa will be saved.
- `<org_column_header>`: Header of the column in the input file that contains the taxonomic names.

Example:

```bash
poetry run taxonomical-utils resolve --input-file data/example.csv --output-file data/out/resolved_taxa.csv --org-column-header idTaxon
```
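
If you do not yet have an input file, you can create a small one by hand. This is a minimal sketch that assumes a single-column CSV is acceptable input; the file name `data/my_taxa.csv` and the species names are illustrative, and the only requirement taken from this README is that the column header matches the value passed to `--org-column-header`.

```bash
# Create a small example input file (hypothetical path and contents).
# The header "idTaxon" is the column name later passed to --org-column-header.
mkdir -p data
cat > data/my_taxa.csv <<'EOF'
idTaxon
Arabidopsis thaliana
Quercus robur
Bellis perennis
EOF
```

You can then resolve it with `poetry run taxonomical-utils resolve --input-file data/my_taxa.csv --output-file data/out/my_resolved_taxa.csv --org-column-header idTaxon`.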

#### 2. Append Upper Taxa Lineage

This command appends upper taxonomic lineage information to the resolved taxa file.

Command:

```bash
poetry run taxonomical-utils append-taxonomy --input-file <resolved_taxa_file> --output-file <upper_taxa_lineage_file>
```

- `<resolved_taxa_file>`: Path to the resolved taxa file generated by the `resolve` command.
- `<upper_taxa_lineage_file>`: Path to the output file where the upper taxa lineage information will be saved.

Example:

```bash
poetry run taxonomical-utils append-taxonomy --input-file data/out/resolved_taxa.csv --output-file data/out/upper_taxa_lineage.csv
```
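
To see which lineage columns ended up in the output file, you can print its header one column name per line. The `list_columns` helper below is a hypothetical convenience function, not part of taxonomical-utils; it assumes a delimited text file and splits naively on every delimiter, so quoted fields that contain the delimiter will be split incorrectly.

```bash
# Hypothetical helper (not part of taxonomical-utils): print the header
# of a delimited file, one column name per line.
# Naive: splits on every delimiter, so quoted fields containing it break.
list_columns() {
  local file="$1"
  local delimiter="${2:-,}"   # default comma; pass $'\t' for TSV files
  head -n 1 "$file" | tr "$delimiter" '\n'
}
```

For example, `list_columns data/out/upper_taxa_lineage.csv` lists the columns produced by `append-taxonomy`.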

#### 3. Merge Data Files

This command merges the original input file with the resolved taxa file and upper taxa lineage file to produce a fully resolved dataset.

Command:

```bash
poetry run taxonomical-utils merge --input-file <input_file> --resolved-taxa-file <resolved_taxa_file> --upper-taxa-lineage-file <upper_taxa_lineage_file> --output-file <final_output_file> --org-column-header <org_column_header>
```

- `<input_file>`: Path to the original input CSV/TSV file.
- `<resolved_taxa_file>`: Path to the resolved taxa file generated by the `resolve` command.
- `<upper_taxa_lineage_file>`: Path to the upper taxa lineage file generated by the `append-taxonomy` command.
- `<final_output_file>`: Path to the final output file where the merged data will be saved.
- `<org_column_header>`: Header of the column in the input file that contains the taxonomic names.

Example:

```bash
poetry run taxonomical-utils merge --input-file data/example.csv --resolved-taxa-file data/out/resolved_taxa.csv --upper-taxa-lineage-file data/out/upper_taxa_lineage.csv --output-file data/out/final_output.csv --org-column-header idTaxon
```
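
As a quick sanity check after merging, you can compare the number of data rows in the original input and in the merged output. This is a naive sketch, not a feature of the tool: `count_rows` and `check_row_counts` are hypothetical helpers that count the lines after the header, so they assume one record per line (no line breaks inside quoted fields) and a trailing newline at the end of each file. A mismatch does not prove an error, but it may indicate that rows were dropped or duplicated during the merge and is worth a closer look.

```bash
# Hypothetical helper: count data rows (every line except the header).
# Assumes one record per line and a trailing newline at end of file.
count_rows() {
  tail -n +2 "$1" | wc -l | tr -d ' '
}

# Hypothetical helper: compare the row counts of the input and merged output.
# Returns 0 when they match, 1 (with a warning on stderr) otherwise.
check_row_counts() {
  local input_rows output_rows
  input_rows=$(count_rows "$1")
  output_rows=$(count_rows "$2")
  if [ "$input_rows" -eq "$output_rows" ]; then
    echo "OK: $input_rows rows in both files"
  else
    echo "WARNING: $input_rows input rows vs $output_rows output rows" >&2
    return 1
  fi
}
```

For example: `check_row_counts data/example.csv data/out/final_output.csv`.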

### Running the Full Pipeline

To run the entire pipeline, you can execute the commands sequentially:

#### Resolve Taxa:

```bash
poetry run taxonomical-utils resolve --input-file data/example.csv --output-file data/out/resolved_taxa.csv --org-column-header idTaxon
```

#### Append Upper Taxa Lineage:

```bash
poetry run taxonomical-utils append-taxonomy --input-file data/out/resolved_taxa.csv --output-file data/out/upper_taxa_lineage.csv
```

#### Merge Data Files:

```bash
poetry run taxonomical-utils merge --input-file data/example.csv --resolved-taxa-file data/out/resolved_taxa.csv --upper-taxa-lineage-file data/out/upper_taxa_lineage.csv --output-file data/out/final_output.csv --org-column-header idTaxon
```

### Running the Commands as a Pipeline

You can also chain the commands with `&&` so that each command runs only if the previous one succeeds:

```bash
poetry run taxonomical-utils resolve --input-file data/example.csv --output-file data/out/resolved_taxa.csv --org-column-header idTaxon && \
poetry run taxonomical-utils append-taxonomy --input-file data/out/resolved_taxa.csv --output-file data/out/upper_taxa_lineage.csv && \
poetry run taxonomical-utils merge --input-file data/example.csv --resolved-taxa-file data/out/resolved_taxa.csv --upper-taxa-lineage-file data/out/upper_taxa_lineage.csv --output-file data/out/final_output.csv --org-column-header idTaxon
```
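
If you run the pipeline repeatedly on different inputs, the chain above can be wrapped in a shell function. The sketch below is a hypothetical convenience wrapper, not part of taxonomical-utils: the function name `run_taxonomy_pipeline`, its argument order, and the default output directory `data/out` are illustrative assumptions. It only calls the three documented commands with their documented options, and it creates the output directory first in case the commands do not create missing directories themselves.

```bash
# Hypothetical wrapper (not part of taxonomical-utils).
# Usage: run_taxonomy_pipeline <input_file> <org_column_header> [output_dir]
run_taxonomy_pipeline() {
  local input_file="$1"
  local org_column="$2"
  local out_dir="${3:-data/out}"
  local resolved="$out_dir/resolved_taxa.csv"
  local lineage="$out_dir/upper_taxa_lineage.csv"
  local final="$out_dir/final_output.csv"

  # Create the output directory in case it does not exist yet.
  mkdir -p "$out_dir" || return 1

  # Each step runs only if the previous one succeeded.
  poetry run taxonomical-utils resolve --input-file "$input_file" \
    --output-file "$resolved" --org-column-header "$org_column" &&
  poetry run taxonomical-utils append-taxonomy --input-file "$resolved" \
    --output-file "$lineage" &&
  poetry run taxonomical-utils merge --input-file "$input_file" \
    --resolved-taxa-file "$resolved" --upper-taxa-lineage-file "$lineage" \
    --output-file "$final" --org-column-header "$org_column"
}
```

For example, `run_taxonomy_pipeline data/example.csv idTaxon` writes all three output files to `data/out/`. Chaining with `&&` inside the function, rather than calling `set -e`, keeps the same stop-on-failure behaviour as the one-liner above without changing the options of the shell that defines the function.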

## Testing

To run the tests, use the following command:

```bash
make test
```

This runs the test suite to check that the functions behave as expected.

## Contributing

Contributions are welcome! Please submit a pull request or open an issue to discuss any changes.

---

Repository initiated with [fpgmaas/cookiecutter-poetry](https://github.com/fpgmaas/cookiecutter-poetry).

