Metadata-Version: 2.1
Name: starlingrt
Version: 0.1.0
Summary: [I]nteractive [R]etention [T]ime vi[S]ualization for gas chromatography.
Author-email: "Nathan A. Mahynski" <nathan.mahynski@gmail.com>
Maintainer-email: "Nathan A. Mahynski" <nathan.mahynski@gmail.com>
License: # NIST Software Licensing Statement
        
        NIST-developed software is provided by NIST as a public service.
        You may use, copy, and distribute copies of the software in any
        medium, provided that you keep intact this entire notice. You may
        improve, modify, and create derivative works of the software or
        any portion of the software, and you may copy and distribute such
        modifications or works. Modified works should carry a notice
        stating that you changed the software and should note the date
        and nature of any such change. Please explicitly acknowledge the
        National Institute of Standards and Technology as the source of
        the software.
        
        NIST-developed software is expressly provided "AS IS." NIST MAKES
        NO WARRANTY OF ANY KIND, EXPRESS, IMPLIED, IN FACT, OR ARISING BY
        OPERATION OF LAW, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
        WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
        NON-INFRINGEMENT, AND DATA ACCURACY. NIST NEITHER REPRESENTS NOR
        WARRANTS THAT THE OPERATION OF THE SOFTWARE WILL BE UNINTERRUPTED
        OR ERROR-FREE, OR THAT ANY DEFECTS WILL BE CORRECTED. NIST DOES
        NOT WARRANT OR MAKE ANY REPRESENTATIONS REGARDING THE USE OF THE
        SOFTWARE OR THE RESULTS THEREOF, INCLUDING BUT NOT LIMITED TO THE
        CORRECTNESS, ACCURACY, RELIABILITY, OR USEFULNESS OF THE
        SOFTWARE.
        
        You are solely responsible for determining the appropriateness of
        using and distributing the software and you assume all risks
        associated with its use, including but not limited to the risks
        and costs of program errors, compliance with applicable laws,
        damage to or loss of data, programs or equipment, and the
        unavailability or interruption of operation. This software is not
        intended to be used in any situation where a failure could cause
        risk of injury or damage to property. The software developed by
        NIST employees is not subject to copyright protection within the
        United States.
        
Project-URL: Repository, https://github.com/mahynski/starlingrt.git
Project-URL: Documentation, https://starlingrt.readthedocs.io/
Project-URL: Issues, https://github.com/mahynski/starlingrt/issues
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: numpy <2.0.0,>=1.23
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: matplotlib >=3.7.2
Requires-Dist: pandas ==2.2
Requires-Dist: bokeh ==3.0.3
Requires-Dist: xlrd ==2.0.1
Requires-Dist: pre-commit ==3.3.3
Requires-Dist: pytest >=7.4.0
Requires-Dist: ipykernel
Requires-Dist: mypy
Requires-Dist: sphinx
Provides-Extra: all

![Workflow](https://github.com/mahynski/starlingrt/actions/workflows/python-app.yml/badge.svg?branch=main)
[![Documentation Status](https://readthedocs.org/projects/starlingrt/badge/?version=latest)](https://starlingrt.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/mahynski/starlingrt/graph/badge.svg?token=7EILPHJ40F)](https://codecov.io/gh/mahynski/starlingrt)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![DOI](https://zenodo.org/badge/886299192.svg)](https://doi.org/10.5281/zenodo.14170132)

STARLINGrt : [I]nteractive [R]etention [T]ime vi[S]ualization for gas chromatography
===

<img src="docs/_static/logo.png" height="100" align="left" />

STARLINGrt is a tool for analyzing retention times from gas chromatography mass spectrometry (GCMS). It can be used to determine a consensus retention time for each compound by visualizing a collection of results. Compound identifications made at a given retention time are assumed to come from a separate program that analyzes the mass spectrometry data collected at that time. Currently, STARLINGrt is configured to work with outputs from [MassHunter(TM)](https://www.agilent.com/en/product/software-informatics/mass-spectrometry-software), but it can be extended by subclassing `data._SampleBase` (see `sample.py` for an example). The code produces an interactive HTML file using [Bokeh](https://bokeh.org/) that can be modified interactively, saved, exported, and shared easily between users. The name "starling" was chosen as a reverse acronym of the tool's purpose.

Installation
===

We recommend creating a [virtual environment](https://docs.python.org/3/library/venv.html) or a [conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) and then installing starlingrt with [pip](https://pip.pypa.io/en/stable/):

~~~bash
$ conda create -n starlingrt-env python=3.10
$ conda activate starlingrt-env
$ pip install starlingrt
~~~

You can also install from this GitHub repo source:

~~~bash
$ git clone git@github.com:mahynski/starlingrt.git
$ cd starlingrt
$ conda create -n starlingrt-env python=3.10
$ conda activate starlingrt-env
$ pip install .
$ python -m pytest  # Optional: run the unit tests
~~~

To install this into a Jupyter kernel:

~~~bash
$ conda activate starlingrt-env
$ python -m ipykernel install --user --name starlingrt-kernel --display-name "starlingrt-kernel"
~~~
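Inside a notebook, a rough check that `starlingrt-kernel` really runs the environment's interpreter is to inspect `sys.prefix`, which points at the root directory of the active conda or virtual environment. The helper `running_in_env` is a hypothetical sketch and assumes the environment is named `starlingrt-env` as above:

~~~python
import sys
from pathlib import Path


def running_in_env(env_name: str = "starlingrt-env") -> bool:
    """Return True if the active interpreter lives in an environment named env_name.

    Hypothetical helper; relies on sys.prefix being the environment's root
    directory, which holds for conda and venv environments.
    """
    return Path(sys.prefix).name == env_name


# Show which interpreter the kernel is using and whether it matches the env name.
print(sys.executable)
print(running_in_env())
~~~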

Use Cases
===

Imagine you have multiple GCMS output files that have been used to identify chemicals at different retention times, e.g., by searching a mass spectral library.
In principle, these could correspond to analyses of many different mixtures; nevertheless, under the same chromatographic conditions an individual component should elute at the same time regardless of what it is combined with. However, natural variations in:

* the retention times can cause confusion when other compounds coelute or elute at very similar times,
* the mass spectrometry peak location(s) at a given retention time can cause the identification routine to identify the same compound differently.
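As a toy illustration of how such variation shows up in pooled results, the sketch below counts pairs of different compound names reported within a retention-time tolerance of each other across all files. It is not how STARLINGrt works internally; the function `confused_pairs`, the tolerance, and the retention times (in minutes) are made up for illustration:

~~~python
from collections import Counter


def confused_pairs(hits, tol):
    """Count pairs of different compound names reported within `tol` of each other.

    hits: iterable of (retention_time, compound_name) tuples pooled across files.
    Hypothetical helper for illustration only.
    """
    ordered = sorted(hits)  # sort by retention time (then name)
    counts = Counter()
    for i, (rt_i, name_i) in enumerate(ordered):
        # Scan forward only while the retention times stay within the tolerance.
        for rt_j, name_j in ordered[i + 1:]:
            if rt_j - rt_i > tol:
                break
            if name_i != name_j:
                counts[tuple(sorted((name_i, name_j)))] += 1
    return counts


# Made-up hits from several files: two names reported near 5.0 min.
hits = [(5.01, "limonene"), (5.03, "limonene"), (5.04, "p-cymene"),
        (7.80, "linalool"), (7.82, "linalool")]
print(dict(confused_pairs(hits, tol=0.05)))  # {('limonene', 'p-cymene'): 2}
~~~

Pairs with high counts are candidates for coelution or for the identification routine naming the same compound differently.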

Given these uncertainties, we would like to answer questions such as:

1. What is a consensus value, or at least a natural range, of retention times for each compound identified?
2. What compounds elute at similar points and are commonly confused with each other?
3. Are there any analyses that identify a compound at a retention time far away from its consensus value (data cleaning)?
4. What is a natural "gap" in retention times that can be used to "ideally" divide all compounds from their "neighbors"?
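Questions 1 and 3 can be approached with simple robust statistics. The sketch below is one possible approach, not STARLINGrt's method: it takes the median retention time of each compound as a consensus value, reports the observed range, and flags hits more than `k` median absolute deviations (MAD) from the median. The function `summarize`, the default `k=3`, and the data are hypothetical:

~~~python
import statistics
from collections import defaultdict


def summarize(hits, k=3.0):
    """Per-compound median retention time, observed range, and outlier samples.

    hits: iterable of (sample_id, retention_time, compound_name).
    A hit is flagged when |rt - median| > k * MAD. If MAD is 0, the rule
    is undefined, so nothing is flagged for that compound.
    """
    by_name = defaultdict(list)
    for sample_id, rt, name in hits:
        by_name[name].append((sample_id, rt))

    summary = {}
    for name, rows in by_name.items():
        rts = [rt for _, rt in rows]
        med = statistics.median(rts)
        mad = statistics.median(abs(rt - med) for rt in rts)
        outliers = [s for s, rt in rows if mad > 0 and abs(rt - med) > k * mad]
        summary[name] = {"median": med, "range": (min(rts), max(rts)), "outliers": outliers}
    return summary


# Made-up data: sample "E" reports linalool far from the other samples.
hits = [("A", 5.01, "limonene"), ("B", 5.03, "limonene"), ("C", 5.02, "limonene"),
        ("A", 7.80, "linalool"), ("B", 7.82, "linalool"), ("C", 7.81, "linalool"),
        ("D", 7.79, "linalool"), ("E", 9.40, "linalool")]
for name, info in summarize(hits).items():
    print(name, info)
~~~

The median and MAD are used instead of the mean and standard deviation so that a single misidentified hit does not drag the consensus value toward itself.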

This visualization tool helps users answer these questions by exploring their data with interactive graphs. The output of this tool is an HTML file that acts as a self-contained summary of your data and of how you cleaned or modified it, and it can be easily shared between users.

Example
===

Here is a simple example (see `docs/_static/example.py`):

~~~python
import os
import starlingrt

from starlingrt import sample, data, functions, visualize

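def load_mass_hunter(input_directory):
    """
    Load MassHunter results from every raw data folder in a directory.

    Parameters
    ----------
    input_directory : str
        Directory containing the raw data folders to search.

    Returns
    -------
    samples : list(sample.MassHunterSample)
        List of samples collected from all folders in `input_directory`.
    """
    samples = []
    ...  # Build one sample.MassHunterSample per raw folder (see docs/_static/example.py).
    return samples

# Create entries from the loaded samples, then select the top entries.
top_entries = starlingrt.data.Utilities.select_top_entries(
    starlingrt.data.Utilities.create_entries(
        load_mass_hunter(
            "path/to/data/"
        )
    )
)

# Estimate a threshold from the entries and write the interactive HTML summary.
starlingrt.visualize.make(
    top_entries=top_entries,
    width=1200,
    threshold=starlingrt.functions.estimate_threshold(starlingrt.functions.get_dataframe(top_entries)[0]),
    output_filename="summary.html",
)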
~~~
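The example passes a `threshold` obtained from `starlingrt.functions.estimate_threshold`; how that estimate is computed is not described here. To illustrate the idea behind question 4 only, the hypothetical helper `split_by_gap` below groups pooled retention times wherever consecutive sorted values differ by more than a threshold (single-linkage clustering in one dimension). A threshold that is too small splits one compound's hits apart, while one that is too large merges neighboring compounds:

~~~python
def split_by_gap(retention_times, threshold):
    """Group retention times, starting a new group wherever consecutive
    sorted values differ by more than `threshold`.

    Hypothetical helper for illustration; not STARLINGrt's algorithm.
    """
    ordered = sorted(retention_times)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > threshold:
            groups.append([cur])  # gap too large: start a new group
        else:
            groups[-1].append(cur)  # close enough: same group
    return groups


# Made-up pooled retention times (minutes) for two compounds.
rts = [7.80, 5.03, 7.82, 5.01, 7.81, 5.04]
print(split_by_gap(rts, 0.5))    # [[5.01, 5.03, 5.04], [7.8, 7.81, 7.82]]
print(split_by_gap(rts, 0.015))  # [[5.01], [5.03, 5.04], [7.8, 7.81, 7.82]]
~~~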

Documentation
===

Documentation is hosted at [https://starlingrt.readthedocs.io/](https://starlingrt.readthedocs.io/) via [readthedocs](https://about.readthedocs.com/).

The logo was generated using Google Gemini with the prompt "Design a logo involving a starling and gas chromatography" on Nov. 9, 2024.

Contributors
===

This code was developed during a collaboration with:

* [Prof. Nives Ogrinc](https://orcid.org/0000-0002-0773-0095)
* [Dr. Lidija Strojnik](https://orcid.org/0000-0003-1898-9147)
