Metadata-Version: 2.1
Name: broad-babel
Version: 0.1.2
Summary: A translator of Broad and JUMP ids to more conventional names.
Author: Alan Munoz
Requires-Python: >=3.8,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Description-Content-Type: text/markdown

# Proposal for broad-babel Module

broad-babel aims to translate identifiers from the Broad Institute or the JUMP consortium into more standardised versions: NCBI gene names for genes and InChIKeys for chemical compounds. From there, the user can get more biological context and explore different translations.

Making this a Python module facilitates its integration into existing workflows and makes it easier to update the (small) database. The Python code contains exclusively the querying logic, and the CSV files have been trimmed as much as possible to focus on the important data.

## Input and output design
### Inputs
- One or more identifiers (str) of a single type (either JUMP or Broad ID).
- A string (str) specifying the type of identifier to query.

### Outputs
- A dictionary whose key-value pairs map input_id -> output_id.
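
The input and output design above could be sketched roughly as follows. This is a minimal illustration, not the module's actual API: the function name `translate_ids`, the column names (`jump_id`, `broad_id`, `standard_name`) and the placeholder rows are all assumptions made for the example.

```python
import csv
import io
from typing import Dict, Iterable, Optional, Union

# Hypothetical rows standing in for the package's trimmed CSV files.
# The column names and placeholder values are assumptions for illustration.
EXAMPLE_CSV = """jump_id,broad_id,standard_name
JUMP_EXAMPLE_1,BROAD_EXAMPLE_1,GENE_A
JUMP_EXAMPLE_2,BROAD_EXAMPLE_2,GENE_B
"""

# Maps the user-facing identifier type to the (assumed) CSV column.
ID_COLUMNS = {"jump": "jump_id", "broad": "broad_id"}


def translate_ids(
    identifiers: Union[str, Iterable[str]],
    id_type: str,
    table: str = EXAMPLE_CSV,
) -> Dict[str, Optional[str]]:
    """Return a dict mapping each input id to its standard name (or None)."""
    # Accept a single identifier as well as a collection of them.
    if isinstance(identifiers, str):
        identifiers = [identifiers]
    if id_type not in ID_COLUMNS:
        raise ValueError(f"Unknown id_type {id_type!r}; expected one of {sorted(ID_COLUMNS)}")

    column = ID_COLUMNS[id_type]
    rows = csv.DictReader(io.StringIO(table))
    lookup = {row[column]: row["standard_name"] for row in rows}

    # Identifiers without a match map to None instead of raising.
    return {identifier: lookup.get(identifier) for identifier in identifiers}
```

Calling `translate_ids(["BROAD_EXAMPLE_1", "unknown"], "broad")` on the placeholder table would give a dictionary with one translated entry and one `None`, so untranslated identifiers stay visible to the caller.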

## Data sources
- JUMP perturbation lists: https://github.com/jump-cellpainting/jump-perturbation-lists/tree/main
- JUMP CP Metadata: https://github.com/jump-cellpainting/datasets/tree/main/metadata
- JUMP-target metadata: https://github.com/jump-cellpainting/JUMP-Target/blob/master/JUMP-Target-1_compound_metadata.tsv

## Considerations
### Advantages
- Broad-babel would make plotting data with "well-known" gene names seamless: translate your Broad IDs and plot away. It also provides access to the CSV compendium with all the name-based metadata of the JUMP data collective.
- It would also make it easier for other biologists or data scientists to approach the JUMP dataset, as Broad/JUMP IDs mean nothing to them.

### Limitations
- Data and metadata may currently be incomplete, but the module would be useful for finding the holes in the JUMP metadata. This would make it easier to be aware of what is missing, as current data
- Querying in Python is probably inefficient, but as long as the dataset stays small this should not be prohibitive.
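
Both limitations can be illustrated with a small sketch. Building a dictionary index once keeps repeated lookups cheap for a small table, and the same index can report which identifiers have no translation. The helper names `build_index` and `find_missing`, the column names and the placeholder rows are hypothetical and not taken from the module.

```python
from typing import Dict, Iterable, List


def build_index(rows: Iterable[Dict[str, str]], key: str, value: str) -> Dict[str, str]:
    """Build a one-off lookup table from rows, skipping rows without a key."""
    return {row[key]: row[value] for row in rows if row.get(key)}


def find_missing(index: Dict[str, str], identifiers: Iterable[str]) -> List[str]:
    """Return the identifiers that are absent or have an empty translation."""
    return [identifier for identifier in identifiers if not index.get(identifier)]


# Placeholder rows; real rows would come from the package's CSV files.
rows = [
    {"broad_id": "BROAD_EXAMPLE_1", "standard_name": "GENE_A"},
    {"broad_id": "BROAD_EXAMPLE_2", "standard_name": ""},  # a metadata hole
]
index = build_index(rows, key="broad_id", value="standard_name")
missing = find_missing(index, ["BROAD_EXAMPLE_1", "BROAD_EXAMPLE_2", "BROAD_EXAMPLE_3"])
```

Here `missing` collects both the identifier with an empty name and the one that does not appear in the table at all.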

