Metadata-Version: 2.1
Name: annoPipeline
Version: 0.0.1
Summary: API-enabled Gene Annotation
Home-page: https://github.com/jimmyjamesarnold/annoPipeline
Author: Jim Arnold
Author-email: jimmyjamesarnold@gmail.com
License: MIT
Description: # annoPipeline - an API-enabled gene annotation pipeline
        
        ***annoPipeline*** uses APIs from [mygene.info](http://mygene.info/) and [Entrez esummary](https://dataguide.nlm.nih.gov/eutilities/utilities.html#esummary) to annotate a user-provided list of gene symbols.
        
        Generates a pandas DataFrame with gene symbol, gene name, EntrezID, and bibliographic info for up to 5 PubMed publications per gene for which a functional reference was provided (more about functional references at [GeneRIF](https://www.ncbi.nlm.nih.gov/gene/about-generif)).
        
        Designed to be useful for tasks such as:
        * identifying relevant publications for a given function
        * analyzing publication trends for genes belonging to a common pathway
        * identifying influential PIs for a given gene network
        
        ## Requirements:
        
        * Written for use with Python 3.7; not tested on other versions.
        
        * *annoPipeline* requires:
            - numpy >= 1.16.2
            - pandas >= 0.24.2
            - Biopython >= 1.73
            - openpyxl >= 2.6.1
            - requests >= 2.21.0
        
        ## To Install:
        ```
        pip install annoPipeline
        ```
        
        Or clone the repo from GitHub.
        Then, in the annoPipeline directory, run:
        ```
        python setup.py install
        ```
        Required dependencies will be installed if missing, which may take a few seconds.
        
        ## Example usage:
        
        Execute the full annotation pipeline on a list of gene symbols like this:
        ```python
        import annoPipeline as ap
        
        # define a list of genes you would like annotated
        geneList = ['CDK2', 'FGFR1', 'SLC6A4']
        
        # annoPipeline will execute full annotation pipeline (see individual functions below). 
        df = ap.annoPipeline(geneList) # returns pandas df with annotations for gene and bibliographic info.
        ```
        - ***ap.annoPipeline*** saves the annotation output to an Excel file by default, named by the geneList symbols separated by '_'.
        
        ### Warning! 
        If querying a **single gene**, still pass it as a list. For example:
        ```python
        import annoPipeline as ap
        
        df = ap.annoPipeline(['CDK2']) # for single gene queries still include [] - will be fixed in later version
        ```
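        
        Until that is fixed, a small guard can catch a bare string before it reaches the pipeline. The sketch below is only an illustration: `as_gene_list` is a hypothetical helper and is not part of *annoPipeline*.
        
        ```python
        def as_gene_list(genes):
            """Wrap a single gene symbol in a list; copy any other iterable into a list."""
            if isinstance(genes, str):
                return [genes]
            return list(genes)
        
        print(as_gene_list('CDK2'))
        print(as_gene_list(('CDK2', 'FGFR1')))
        
        # intended use (hypothetical): df = ap.annoPipeline(as_gene_list('CDK2'))
        ```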
        
        
        ## v0.0.1 Functionality
        
        ### Task 1:
        1.  From the MyGeneInfo API, use the "Gene query service" GET method to return details on a given list of human gene symbols.
        2.  From the returned json, parse out the "name", "symbol" and "entrezgene" values and print to screen.
        
        Use *queryGenes()*:
        ```python
        import annoPipeline as ap
        
        geneList = ['CDK2', 'FGFR1', 'SLC6A4']
        
        l1 = ap.queryGenes(geneList) # returns list of dicts where keys are default mygene fields (symbol,name,taxid,entrezgene,ensemblgene)
        ```
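        
        As a rough illustration of step 2, the sketch below pulls the three fields out of a response shaped like a mygene.info query result (a dict with a `hits` list). The helper name `parse_hits` and the sample payload are assumptions made for this example; they are not taken from *annoPipeline* itself.
        
        ```python
        def parse_hits(response_json):
            """Return (symbol, name, entrezgene) tuples from a mygene-style query response."""
            rows = []
            for hit in response_json.get("hits", []):
                # .get() yields None instead of raising when a field is absent from a hit
                rows.append((hit.get("symbol"), hit.get("name"), hit.get("entrezgene")))
            return rows
        
        # hand-written sample payload for illustration only, not real API output
        sample = {
            "hits": [
                {"symbol": "CDK2", "name": "cyclin dependent kinase 2", "entrezgene": "1017"}
            ]
        }
        
        for symbol, name, entrez_id in parse_hits(sample):
            print(symbol, name, entrez_id)
        ```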
        
        ### Task 2: 
        1.  Using the appropriate identifier from the above result, send a query to the MyGeneInfo "Gene annotation services" method for each gene.
        2.  From the resulting json, collate up to 5 generif descriptions per gene.
        3.  Write the results to an Excel spreadsheet with columns: gene_symbol, gene_name, entrez_id, generifs.
        
        Use *getAnno()*:
        ```python
        import annoPipeline as ap
        
        geneList = ['CDK2', 'FGFR1', 'SLC6A4']
        l1 = ap.queryGenes(geneList)
        l2 = ap.getAnno(l1, saveExcel=True) # saveExcel defaults False
        ```
        - returns a pandas df with genes and up to 5 generifs from mygene.info.
        - **saveExcel** defaults to *False*; set it to *True* to save the output to Excel.
        - if *True*, the Excel file will be named by the geneList symbols separated by '_'.
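        
        The collation and file naming described above could look roughly like the sketch below. It assumes a gene annotation record carries a `generif` list of dicts with a `text` key. The helpers `collate_generifs` and `excel_name`, the one-row-per-generif layout, and the `.xlsx` extension are all choices made for this example rather than confirmed details of *getAnno()*.
        
        ```python
        def collate_generifs(record, max_refs=5):
            """Build one row per generif, keeping at most max_refs for the gene."""
            generifs = record.get("generif", [])
            if isinstance(generifs, dict):
                # assumption: a gene with a single generif may come back as one dict
                generifs = [generifs]
            return [
                {
                    "gene_symbol": record.get("symbol"),
                    "gene_name": record.get("name"),
                    "entrez_id": record.get("entrezgene"),
                    "generifs": ref.get("text"),
                }
                for ref in generifs[:max_refs]
            ]
        
        def excel_name(symbols):
            """Join gene symbols with '_' (the .xlsx extension is an assumption)."""
            return "_".join(symbols) + ".xlsx"
        
        # hand-written placeholder record with 7 generifs, not real API output
        sample = {
            "symbol": "CDK2",
            "name": "cyclin dependent kinase 2",
            "entrezgene": "1017",
            "generif": [{"pubmed": i, "text": f"placeholder generif {i}"} for i in range(1, 8)],
        }
        
        rows = collate_generifs(sample)
        print(len(rows), excel_name(["CDK2", "FGFR1", "SLC6A4"]))
        ```
        
        A list of row dicts like this can be passed straight to `pandas.DataFrame(rows)` to get the four spreadsheet columns.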
        
        ### Task 3:
        1.  Use the Pubmed IDs associated with the above generif content to extract additional bibliographic information.
        
        Use *addBibs()*:
        ```python
        import annoPipeline as ap
        
        geneList = ['CDK2', 'FGFR1', 'SLC6A4']
        l1 = ap.queryGenes(geneList)
        l2 = ap.getAnno(l1)
        l3 = ap.addBibs(l2) # returns the getAnno df with bibliographic info added for each generif's Pubmed ID
        ```  
        * Currently returns the following bibliographic information when available:
            * PubDate
            * Source
            * Title
            * LastAuthor
            * DOI
            * PmcRefCount
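        
        To show what "when available" can mean in practice, the sketch below picks these six fields out of a summary record and fills in `None` for any that are missing. It assumes the record behaves like a plain dict keyed by the field names above; `pick_bib_fields` and the sample record are hypothetical and not part of *addBibs()*.
        
        ```python
        BIB_FIELDS = ["PubDate", "Source", "Title", "LastAuthor", "DOI", "PmcRefCount"]
        
        def pick_bib_fields(summary):
            """Return the six bibliographic fields, using None where a field is absent."""
            return {field: summary.get(field) for field in BIB_FIELDS}
        
        # hand-written placeholder record (no DOI or PmcRefCount), not real esummary output
        sample_summary = {
            "PubDate": "2019 Jan",
            "Source": "Example J",
            "Title": "Placeholder title",
            "LastAuthor": "Doe J",
        }
        
        print(pick_bib_fields(sample_summary))
        ```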
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Interface Engine/Protocol Translator
Description-Content-Type: text/markdown
