Metadata-Version: 2.4
Name: idtrack
Version: 0.0.4
Summary: Cross-Temporal and Cross-Database Biological Identifier Mapping.
License: BSD-3-Clause
License-File: LICENSE
Author: Kemal Inecik
Author-email: k.inecik@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Provides-Extra: all-external
Provides-Extra: embeddings
Provides-Extra: external-mappers
Provides-Extra: ortholog
Requires-Dist: PyMySQL (>=0.9.3)
Requires-Dist: PyYAML (>=5.2,<7) ; python_version < "3.10"
Requires-Dist: PyYAML (>=6,<7) ; python_version >= "3.10"
Requires-Dist: anndata (>=0.7)
Requires-Dist: biopython (>=1.79) ; extra == "ortholog" or extra == "all-external"
Requires-Dist: gget (>=0.28) ; extra == "external-mappers" or extra == "ortholog" or extra == "all-external"
Requires-Dist: gprofiler-official (>=1.0) ; extra == "external-mappers" or extra == "all-external"
Requires-Dist: h5py (>=2.10.0)
Requires-Dist: mygene (>=3.2) ; extra == "external-mappers" or extra == "all-external"
Requires-Dist: networkx (>=2.4)
Requires-Dist: numpy (>=1.17.4)
Requires-Dist: pandas (>=0.25.3)
Requires-Dist: pybiomart (>=0.2) ; extra == "external-mappers" or extra == "all-external"
Requires-Dist: requests (>=2.32.4)
Requires-Dist: scipy (>=1.5.3)
Requires-Dist: sparse (>0.11.2)
Requires-Dist: torch (>=1.9) ; extra == "embeddings" or extra == "all-external"
Requires-Dist: tqdm (>=4.37.0)
Requires-Dist: transformers (>=4.20) ; extra == "embeddings" or extra == "all-external"
Requires-Dist: urllib3 (>=2.5.0)
Project-URL: Documentation, https://idtrack.readthedocs.io
Project-URL: Homepage, https://github.com/theislab/idtrack
Project-URL: Repository, https://github.com/theislab/idtrack
Description-Content-Type: text/x-rst

**IDTrack**
===========

|PyPI| |PyPIDownloads| |Python Version| |License| |Read the Docs| |Build| |Tests| |Codecov|

.. |PyPI| image:: https://img.shields.io/pypi/v/idtrack.svg
   :target: https://pypi.org/project/idtrack/
   :alt: PyPI
.. |Python Version| image:: https://img.shields.io/pypi/pyversions/idtrack
   :target: https://pypi.org/project/idtrack
   :alt: Python Version
.. |License| image:: https://img.shields.io/github/license/theislab/idtrack
   :target: https://opensource.org/licenses/BSD-3-Clause
   :alt: License
.. |Read the Docs| image:: https://img.shields.io/readthedocs/idtrack/latest.svg?label=Read%20the%20Docs
   :target: https://idtrack.readthedocs.io/
   :alt: Read the documentation at https://idtrack.readthedocs.io/
.. |Build| image:: https://github.com/theislab/idtrack/actions/workflows/build_package.yml/badge.svg?branch=main
   :target: https://github.com/theislab/idtrack/actions/workflows/build_package.yml
   :alt: Build Package Status
.. |Tests| image:: https://github.com/theislab/idtrack/actions/workflows/run_tests.yml/badge.svg?branch=main
   :target: https://github.com/theislab/idtrack/actions/workflows/run_tests.yml
   :alt: Tests status
.. |PyPIDownloads| image:: https://pepy.tech/badge/idtrack
   :target: https://pepy.tech/project/idtrack
   :alt: downloads
.. |Codecov| image:: https://codecov.io/gh/theislab/idtrack/branch/main/graph/badge.svg
   :target: https://codecov.io/gh/theislab/idtrack
   :alt: Codecov

.. image:: https://raw.githubusercontent.com/theislab/idtrack/main/docs/_logo/logo.png
    :width: 350
    :alt: IDTrack logo

Cross-Temporal and Cross-Database Biological Identifier Mapping
--------------------------------------------------------------------

Modern biology constantly mixes identifiers from different years, databases, and genome builds. The result is a familiar set of problems:
IDs disappear, symbols change, references disagree, and “the same gene” isn’t always represented the same way across datasets.

**IDTrack** is built for that reality. It provides a **time-aware, audit-friendly** way to translate and harmonize biological identifiers
across **Ensembl releases** and across **external namespaces** (HGNC, UniProt, RefSeq, Entrez, …), while keeping ambiguity explicit
instead of silently forcing a single answer.

What makes IDTrack different
----------------------------

* **Time-aware mapping**: treat Ensembl releases as a “time axis” and travel forward/backward through identifier history.
* **Assembly-aware mapping**: harmonize identifiers across genome builds (e.g. GRCh37 ↔ GRCh38) and respect external databases that are assembly-scoped.
* **Snapshot boundary for reproducibility**: build a release-bounded graph snapshot so results are stable and repeatable.
* **Explicit external database opt-in**: choose which external namespaces participate via a small, editable YAML contract.
* **Transparency over coercion**: conversions are naturally classified as **1→0** (no match), **1→1** (clean), or **1→n** (ambiguous).
* **Scale-ready workflows**: caching and snapshot reuse make repeated conversions and multi-dataset harmonization practical.
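
To make the time-axis and cardinality ideas above concrete, here is a minimal, self-contained sketch in plain Python. It does **not** use the IDTrack API: the ``HISTORY`` table, the identifiers, the release numbers, and the helper names ``travel_forward`` and ``classify`` are all hypothetical and exist only for illustration. The sketch walks a toy identifier history forward one release at a time and labels the outcome as no match (1→0), clean (1→1), or ambiguous (1→n).

.. code-block:: python

    # Toy illustration only: this is NOT the IDTrack API.
    # HISTORY maps (identifier, release) to the identifiers that succeed it
    # in the next release. All names and release numbers are invented.
    HISTORY = {
        ("GENE_A", 100): ["GENE_A"],
        ("GENE_A", 101): ["GENE_A1", "GENE_A2"],  # split into two successors
        ("GENE_B", 100): ["GENE_B"],
        ("GENE_B", 101): [],                      # retired, no successor
        ("GENE_C", 100): ["GENE_C"],
        ("GENE_C", 101): ["GENE_C"],              # unchanged
    }


    def travel_forward(identifier, start, end):
        """Follow an identifier from release `start` to release `end`."""
        current = {identifier}
        for release in range(start, end):
            successors = set()
            for ident in current:
                # Unknown (identifier, release) pairs have no successors.
                successors.update(HISTORY.get((ident, release), []))
            current = successors
        return sorted(current)


    def classify(targets):
        """Label a conversion by how many targets it produced."""
        if not targets:
            return "no_match"   # 1 -> 0
        if len(targets) == 1:
            return "clean"      # 1 -> 1
        return "ambiguous"      # 1 -> n


    # Build a small audit report instead of silently picking one answer.
    report = {}
    for gene in ["GENE_A", "GENE_B", "GENE_C"]:
        targets = travel_forward(gene, 100, 102)
        report[gene] = (classify(targets), targets)

In this toy data, ``GENE_A`` ends up ambiguous, ``GENE_B`` has no match, and ``GENE_C`` maps cleanly. Keeping the full target list next to each label is what lets ambiguous cases stay visible for later review.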

Who is it for?
--------------

* Wet-lab researchers who need a reliable, step-by-step path from “my gene list is old” to “my analysis is reproducible”.
* Bioinformaticians who want release-pinned, auditable conversions in notebooks, pipelines, and integration workflows.
* Atlas builders and integrators who need to harmonize gene identifiers across many cohorts (different Ensembl releases, symbols, and external IDs), keep an explicit audit trail of what mapped, what failed, and what was ambiguous, and ship a release-pinned, reproducible feature space for downstream integration and publication.

Common use cases
----------------

* **Dataset harmonization** before integration (single-cell, bulk, atlas-scale collections).
* **Legacy data rescue** (old Ensembl releases, mixed symbols/IDs, retired identifiers).
* **Publication-grade reproducibility** (pin a snapshot boundary + share the exact external configuration).
* **Cross-database interoperability** when collaborators use different identifier conventions.

Documentation and tutorials
---------------------------

The documentation includes a **full tutorial suite** designed to be the primary learning resource:

* Documentation: Documentation_
* Tutorials: start from the “Tutorials” section in the docs (Part 0 → Part 7).

.. _Documentation: https://idtrack.readthedocs.io/

