Metadata-Version: 2.4
Name: RIAssigner
Version: 0.6.1
Summary: Python library for retention index calculation.
License: MIT
License-File: LICENSE
Keywords: gas chromatography,mass spectrometry,retention index
Author: Helge Hecht
Author-email: helge.hecht@recetox.muni.cz
Requires-Python: >=3.11,<3.14
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: click (>=8.0)
Requires-Dist: fastparquet (>=2025.12.0,<2026.0.0)
Requires-Dist: matchms (>=0.30.1,<0.31.0)
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: pint (>=0.25.2,<0.26.0)
Requires-Dist: scipy
Requires-Dist: urllib3 (>=2.6.3,<3.0.0)
Project-URL: Repository, https://github.com/RECETOX/RIAssigner
Description-Content-Type: text/markdown

# RIAssigner

[![Python package](https://github.com/RECETOX/RIAssigner/actions/workflows/python-package.yml/badge.svg)](https://github.com/RECETOX/RIAssigner/actions/workflows/python-package.yml)
[![Python Package using Conda](https://github.com/RECETOX/RIAssigner/actions/workflows/python-package-conda.yml/badge.svg?branch=main)](https://github.com/RECETOX/RIAssigner/actions/workflows/python-package-conda.yml)
[![Anaconda Build](https://github.com/RECETOX/RIAssigner/actions/workflows/anaconda.yml/badge.svg?branch=main)](https://github.com/RECETOX/RIAssigner/actions/workflows/anaconda.yml)
[![bioconda package](https://img.shields.io/conda/v/bioconda/riassigner)](https://anaconda.org/bioconda/riassigner)
[![PyPI - Python Version](https://img.shields.io/pypi/v/RIAssigner)](https://pypi.org/project/RIAssigner/)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.04337/status.svg)](https://doi.org/10.21105/joss.04337)

## Overview

RIAssigner is a Python tool for retention index (RI) computation for GC-MS data, developed at [RECETOX](https://www.recetox.muni.cz/en) and hosted on [Galaxy](https://umsa.cerit-sc.cz/).

The [retention index](https://goldbook.iupac.org/terms/view/R05360) is a mapping of retention time that makes the retention data of compounds comparable: the same compound may have different retention times in different experiments but a very similar retention index.
To compute this index, a set of reference compounds - often an inert alkane series - is analyzed as part of the batch (on the same column).
The retention indices of the alkanes are fixed (carbon number x 100), and any query compound can be assigned a retention index based on its retention time.
This can be done via piecewise linear interpolation or other mathematical methods.
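
To make the piecewise linear case concrete, here is a minimal sketch in plain Python. It is not RIAssigner's implementation: the function name `linear_retention_index`, the made-up alkane retention times, and the choice to return `None` outside the reference range are all assumptions for illustration.

```python
from bisect import bisect_right


def linear_retention_index(rt, ref_rts, ref_ris):
    """Interpolate a retention index for `rt` between the two bracketing references.

    `ref_rts` must be sorted ascending, hold at least two values, and be
    paired element-wise with `ref_ris`. Returns None when `rt` lies outside
    the reference range (an assumption of this sketch).
    """
    if rt < ref_rts[0] or rt > ref_rts[-1]:
        return None
    # Index of the first reference eluting after rt, clamped so that the
    # last reference itself still falls into the final segment.
    upper = min(bisect_right(ref_rts, rt), len(ref_rts) - 1)
    lower = upper - 1
    fraction = (rt - ref_rts[lower]) / (ref_rts[upper] - ref_rts[lower])
    return ref_ris[lower] + fraction * (ref_ris[upper] - ref_ris[lower])


# Hypothetical alkane ladder (C10, C11, C12) with invented retention times in seconds.
alkane_rts = [300.0, 360.0, 430.0]
alkane_ris = [1000.0, 1100.0, 1200.0]  # carbon number x 100

print(linear_retention_index(330.0, alkane_rts, alkane_ris))  # 1050.0
print(linear_retention_index(500.0, alkane_rts, alkane_ris))  # None
```

A query peak eluting halfway between C10 and C11 is assigned an index halfway between 1000 and 1100, regardless of the absolute retention times in that run.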

If you use this software, please cite our paper!

Hecht et al., (2022). RIAssigner: A package for gas chromatographic retention index calculation. Journal of Open Source Software, 7(75), 4337, https://doi.org/10.21105/joss.04337

## Installation

(1) From source by cloning the repository and then installing the package with `poetry`.

```
git clone https://github.com/RECETOX/RIAssigner.git
cd RIAssigner
poetry install
```

(2) Install via [bioconda](https://anaconda.org/bioconda/riassigner) in your existing environment.

```
conda install -c bioconda riassigner
```

(3) Install via [pip](https://pypi.org/project/RIAssigner/) in your existing environment.

```
pip install riassigner
```

## Usage

RIAssigner can read data from `.msp` and `.mgf` files using [matchms](https://github.com/matchms/matchms) and from `.csv`, `.tsv` and `.parquet` files using [pandas](https://pandas.pydata.org/), and compute retention indices for the data.
A reference list of retention-indexed compounds (traditionally an alkane series) with retention times is used to compute the RI for a query dataset of retention time values, using either the [van den Dool and Kratz](<https://doi.org/10.1016/S0021-9673(01)80947-X>) method or [cubic spline-based interpolation](https://doi.org/10.1021/ac50035a026).

### Python API

```python
from RIAssigner.compute import Kovats
from RIAssigner.data import MatchMSData, PandasData

# Load reference & query data
query = PandasData("../tests/data/csv/aplcms_aligned_peaks.csv", "csv", rt_unit="seconds")
reference = MatchMSData("../tests/data/msp/Alkanes_20210325.msp", "msp", rt_unit="min")

# Compute RI and write it back to file
query.retention_indices = Kovats().compute(query, reference)
query.write("peaks_with_ri.csv")
```

For more details check out this [notebook](doc/example_usage.ipynb).
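
In the example above, the query uses `rt_unit="seconds"` while the reference uses `rt_unit="min"`, so both sets of retention times have to be brought to a common unit before any interpolation. The following is a minimal sketch of that normalization in plain Python. The `to_seconds` helper and the `SECONDS_PER_UNIT` table are hypothetical names for illustration and are not part of the RIAssigner API; how the library performs the conversion internally is not described here.

```python
# Hypothetical conversion table covering only the two units named in this README.
SECONDS_PER_UNIT = {"seconds": 1.0, "min": 60.0}


def to_seconds(values, rt_unit):
    """Convert retention times given in `rt_unit` to seconds."""
    try:
        factor = SECONDS_PER_UNIT[rt_unit]
    except KeyError:
        raise ValueError(f"unsupported rt_unit: {rt_unit!r}") from None
    return [value * factor for value in values]


# Invented values: reference times in minutes, query times already in seconds.
reference_rts = to_seconds([5.0, 6.5], "min")
query_rts = to_seconds([330.0], "seconds")

print(reference_rts)  # [300.0, 390.0]
print(query_rts)      # [330.0]
```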

### Command Line Interface

RIAssigner provides a command-line interface for computing retention indices without writing Python code.

#### Compute retention indices

Use the `compute` command to calculate retention indices for a query dataset based on a reference dataset:

```bash
riassigner compute \
  --reference <path> <filetype> <rt_unit> \
  --query <path> <filetype> <rt_unit> \
  --method <kovats|cubicspline> \
  --output <output_path>
```

**Parameters:**
- `--reference`: Reference dataset with retention times and indices. Provide: path, filetype (msp/csv/tsv/parquet), and retention time unit (min/seconds)
- `--query`: Query dataset for which to compute retention indices. Provide: path, filetype (msp/csv/tsv/parquet), and retention time unit (min/seconds)
- `--method`: Computation method - either `kovats` (van den Dool and Kratz) or `cubicspline`
- `--output`: Path for the output file with computed retention indices

**Example:**

```bash
riassigner compute \
  --reference reference_alkanes.msp msp min \
  --query query_peaks.csv csv seconds \
  --method kovats \
  --output peaks_with_ri.csv
```

#### Extract retention indices from comments

Use the `ri-from-comment` command to extract retention indices from the comment field of a dataset:

```bash
riassigner ri-from-comment \
  --query <path> <filetype> <rt_unit> \
  --ri-source <key> \
  --output <output_path>
```

**Parameters:**
- `--query`: Query dataset from which to read retention indices. Provide: path, filetype (msp/csv/tsv/parquet), and retention time unit (min/seconds)
- `--ri-source`: Key used in the comment field to identify the retention index value
- `--output`: Path for the output file

**Example:**

```bash
riassigner ri-from-comment \
  --query compounds.msp msp min \
  --ri-source "retention index" \
  --output compounds_with_ri.msp
```

## Developer Documentation

### Setup

```
conda create -n riassigner-dev -c conda-forge python=3.11 poetry

conda activate riassigner-dev

poetry install --no-root
```

### Contributing

We appreciate contributions - feel free to open an issue on our repository, create your own fork, work on the problem and open a PR.
Make sure to add your contributions to the [changelog](CHANGELOG.md) and to adhere to [semantic versioning](https://semver.org/spec/v2.0.0.html).
For more information, see the [contributing guidelines](CONTRIBUTING.md).

### Architecture

<!-- generated by mermaid compile action - START -->

![~mermaid diagram 1~](/.resources/README-md-1.svg)

<details>
  <summary>Mermaid markup</summary>

```mermaid
classDiagram
    class MatchMSData{
        -List ~Spectra~ data
    }

    class PandasData {
        -DataFrame data
    }

    Data <|-- MatchMSData
    Data <|-- PandasData

    class Data{
        <<abstract>>
        +read(string filename)
        +write(string filename)
        +retention_times() List~float~
        +retention_indices() List~float~
    }


    class ComputationMethod{
        <<interface>>
        +compute(Data query, Data reference) List~float~

    }

    class Kovats {

    }
    class CubicSpline {

    }

    ComputationMethod <|-- Kovats
    ComputationMethod <|-- CubicSpline

```

</details>
<!-- generated by mermaid compile action - END -->
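
The two hierarchies in the diagram can be sketched with Python's `abc` module. This is an illustrative skeleton only, not the library's source: `InMemoryData` and `NearestReference` are hypothetical classes invented here, `read` and `write` are omitted, and accessors are written as plain methods as drawn in the diagram (the Python API example assigns to `query.retention_indices`, so the real classes may well expose properties instead).

```python
from abc import ABC, abstractmethod


class Data(ABC):
    """Abstract container for retention data (file I/O omitted in this sketch)."""

    @abstractmethod
    def retention_times(self) -> list[float]: ...

    @abstractmethod
    def retention_indices(self) -> list[float]: ...


class ComputationMethod(ABC):
    """Interface shared by all RI computation strategies."""

    @abstractmethod
    def compute(self, query: Data, reference: Data) -> list[float]: ...


class InMemoryData(Data):
    """Hypothetical stand-in for MatchMSData / PandasData backed by plain lists."""

    def __init__(self, rts, ris=None):
        self._rts = list(rts)
        self._ris = list(ris) if ris is not None else []

    def retention_times(self):
        return self._rts

    def retention_indices(self):
        return self._ris


class NearestReference(ComputationMethod):
    """Toy strategy: copy the RI of the closest reference. Not Kovats or CubicSpline."""

    def compute(self, query, reference):
        pairs = list(zip(reference.retention_times(), reference.retention_indices()))
        return [
            min(pairs, key=lambda pair: abs(pair[0] - rt))[1]
            for rt in query.retention_times()
        ]


reference = InMemoryData([300.0, 360.0], [1000.0, 1100.0])
query = InMemoryData([310.0, 355.0])
print(NearestReference().compute(query, reference))  # [1000.0, 1100.0]
```

Because a computation method only depends on the abstract `Data` interface, a new file format or a new interpolation scheme can be added without touching the other hierarchy.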

### Testing

All functionality is tested with the [pytest](https://docs.pytest.org/en/6.2.x/contents.html) framework. Run your IDE in the `riassigner-dev` conda environment (or select the respective Python interpreter when developing) so that you follow the formatting guidelines and can execute the tests.

For testing, install the package dependencies and run the tests as follows:

```
git clone https://github.com/RECETOX/RIAssigner.git
cd RIAssigner
poetry install --no-root
pytest
```

