Metadata-Version: 2.1
Name: genotations
Version: 0.1.6
Summary: Genotations - python library to work with genomes and primers
Author: antonkulaga (Anton Kulaga)
Author-email: <antonkulaga@gmail.com>
Keywords: python,utils,files
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Description-Content-Type: text/markdown


Genotations
===========

Python library for working with genomes and their annotations, mostly Ensembl genomes. It also supports visualization of transcript and gene features and primer selection.
Since pandas and polars are everyday tools for many Python developers, this library focuses on representing annotations as dataframes.


The library allows:
* downloading Ensembl annotations and genomes (uses genomepy under the hood)
* working with genomic annotations as polars dataframes
* getting sequences for selected genes
* visualizing gene features
* designing primers for selected transcripts with the Primer3 Python wrapper
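
For context on what a row of such a dataframe corresponds to: Ensembl distributes gene annotations as GTF files, where each tab-separated line describes one feature (gene, transcript, exon, CDS, ...) and the ninth column packs key-value attributes such as `gene_id` and `gene_name`. A dataframe view essentially spreads those attributes into columns. The minimal standard-library sketch below flattens one GTF line into a record. It is only an illustration, not genotations' own parser; `parse_gtf_line` is a hypothetical helper and the example line is made up.

```python
from typing import Dict, Union

# The first eight standard GTF columns; the ninth holds the attributes
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end", "score", "strand", "frame"]


def parse_gtf_line(line: str) -> Dict[str, Union[str, int]]:
    """Flatten one GTF line into a dict (hypothetical helper, for illustration only)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record: Dict[str, Union[str, int]] = dict(zip(GTF_COLUMNS, fields[:8]))
    # GTF coordinates are 1-based and inclusive
    record["start"] = int(fields[3])
    record["end"] = int(fields[4])
    # Attributes look like: gene_id "..."; gene_name "Foxo1";
    # Simplification: a repeated key (e.g. several `tag` entries) keeps only its last value
    for attribute in fields[8].split(";"):
        attribute = attribute.strip()
        if not attribute:
            continue
        key, _, value = attribute.partition(" ")
        record[key] = value.strip().strip('"')
    return record


# Made-up example line
toy = '3\tensembl\texon\t100\t250\t.\t+\t.\tgene_id "G1"; gene_name "Foxo1"; gene_biotype "protein_coding";'
print(parse_gtf_line(toy)["gene_name"])  # Foxo1
```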
 
Usage
=====

Install with pip:
```bash
pip install genotations
```
In some cases you may also need the UCSC annotation tools. They are installed from the bioconda channel, so you can add them to your micromamba/conda environment.
Here is how your environment file may look:
```yaml
name: genotations
channels:
  - conda-forge
  - BjornFJohansson
  - bioconda
  - defaults
dependencies:
  - python=3.10
  - ucsc-bedtogenepred
  - ucsc-genepredtobed
  - ucsc-genepredtogtf
  - ucsc-gff3togenepred
  - ucsc-gtftogenepred
  - pip
  - pip:
      - genotations
```
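
After creating the environment, it can help to confirm that the conversion executables actually ended up on your `PATH`. The sketch below relies only on `shutil.which`. The executable names (`gtfToGenePred`, `genePredToBed`, ...) are the usual UCSC binary names shipped by the `ucsc-*` packages listed above, but treat the list as an assumption and adjust it to the tools you installed; `check_ucsc_tools` is a hypothetical helper, not part of genotations.

```python
import shutil
from typing import Dict, Iterable

# Assumed executable names provided by the ucsc-* bioconda packages above
UCSC_TOOLS = ("bedToGenePred", "genePredToBed", "genePredToGtf", "gff3ToGenePred", "gtfToGenePred")


def check_ucsc_tools(tools: Iterable[str] = UCSC_TOOLS) -> Dict[str, bool]:
    """Map each tool name to whether an executable with that name is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    status = check_ucsc_tools()
    missing = [tool for tool, found in status.items() if not found]
    print("missing UCSC tools:", ", ".join(missing) if missing else "none")
```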

Now you can start using it, for example:
```python
from genotations import ensembl
human = ensembl.human # getting human genome
mouse = ensembl.mouse # getting mouse genome
mouse.annotations.exons().annotations_df # getting exons as DataFrame
mouse.annotations.protein_coding().exons().annotations_df # getting exons of protein coding genes
mouse.annotations.transcript_gene_names_df # getting transcript gene names
mouse.annotations.with_gene_name_contains("Foxo1").protein_coding().transcripts() # getting only coding Foxo1 transcripts
mouse.annotations.with_gene_name_contains("Foxo1").genes_visual(mouse.genome)[0].plot() # plotting features of the Foxo1 gene
cow_assemblies = ensembl.search_assemblies("Bos taurus") # you can also search genomes by species name if it exists in Ensembl
cow1 = ensembl.SpeciesInfo("Cow", cow_assemblies[-1][0]) # selecting one of several cow assemblies
cow1.annotations.annotations_df # getting annotations as dataframe
```
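
The calls above are chained: filters such as `exons()`, `protein_coding()` and `with_gene_name_contains()` return another annotations object, and `annotations_df` is read at the end to get the dataframe itself. The toy class below sketches that chainable-filter pattern with plain Python dicts. `ToyAnnotations`, its row keys (`feature`, `gene_name`, `transcript_biotype`) and the filtering logic are all hypothetical; this is not how genotations implements these methods.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Row = Dict[str, str]


@dataclass(frozen=True)
class ToyAnnotations:
    """Hypothetical stand-in for an annotation table (not genotations' class)."""
    rows: Tuple[Row, ...]

    def _where(self, keep: Callable[[Row], bool]) -> "ToyAnnotations":
        # Each filter builds a new object, so calls can be chained and the original is untouched
        return ToyAnnotations(tuple(row for row in self.rows if keep(row)))

    def exons(self) -> "ToyAnnotations":
        return self._where(lambda row: row.get("feature") == "exon")

    def protein_coding(self) -> "ToyAnnotations":
        return self._where(lambda row: row.get("transcript_biotype") == "protein_coding")

    def with_gene_name_contains(self, text: str) -> "ToyAnnotations":
        return self._where(lambda row: text in row.get("gene_name", ""))


# Made-up rows, only to exercise the filters
table = ToyAnnotations((
    {"feature": "exon", "gene_name": "Foxo1", "transcript_biotype": "protein_coding"},
    {"feature": "exon", "gene_name": "Foxo1", "transcript_biotype": "retained_intron"},
    {"feature": "transcript", "gene_name": "Foxo1", "transcript_biotype": "protein_coding"},
    {"feature": "exon", "gene_name": "Foxo3", "transcript_biotype": "protein_coding"},
))
foxo1_coding_exons = table.with_gene_name_contains("Foxo1").protein_coding().exons()
print(len(foxo1_coding_exons.rows))  # 1
```

Returning a new object from every filter keeps each step side-effect free, so intermediate results such as `table.exons()` can be reused safely.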

You can also use the library to annotate existing gene expression data with gene and transcript symbols and features.
For example:
```python
from pathlib import Path

import polars as pl
from genotations.quantification import *
from genotations import ensembl

base = Path(".")  # repository root; Path supports the "/" operator used below, a plain str does not
examples = base / "examples"
data = examples / "data"
expressions = pl.read_parquet(str(data / "PRJNA543661_transcripts.parquet"))
with_expressions_summaries(expressions, min_avg_value=1)
expressions_ext = ensembl.mouse.annotations.extend_with_annotations_and_sequences(expressions, ensembl.mouse.genome) # extend expression data with annotations and sequences
```
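
Conceptually, attaching a transcript's sequence means taking its exons from the genome, concatenating them, and reverse-complementing the result for transcripts on the minus strand. The sketch below shows that standard procedure for a single transcript, assuming 1-based inclusive exon coordinates (as in GTF) and a genome held as a plain dict of chromosome strings. `transcript_sequence` and `reverse_complement` are hypothetical helpers, not part of genotations, whose actual implementation may differ.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement DNA; IUPAC codes other than N are left as they are."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcript_sequence(genome: Dict[str, str], chrom: str,
                        exons: List[Tuple[int, int]], strand: str) -> str:
    """Splice exons (1-based, inclusive) and return the sequence 5'->3' on the given strand."""
    # Sort by genomic start so the concatenation follows the + strand
    ordered = sorted(exons)
    # 1-based inclusive [start, end] becomes Python's 0-based half-open slice [start - 1:end]
    spliced = "".join(genome[chrom][start - 1:end] for start, end in ordered)
    if strand == "+":
        return spliced
    if strand == "-":
        return reverse_complement(spliced)
    raise ValueError(f"unknown strand: {strand!r}")


# Toy genome and exons, made up for illustration
print(transcript_sequence({"chr1": "AAACCCGGGTTT"}, "chr1", [(7, 9), (1, 3)], "-"))  # CCCTTT
```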

For more examples of usage and the API, see the [example notebook](https://github.com/antonkulaga/genotations/blob/main/examples/explore_mouse.ipynb).


Working with the library code
=============================

Use micromamba (or conda) and `environment.yaml` to install the dependencies:
```bash
micromamba create -f environment.yaml
micromamba activate genotations
```
