Metadata-Version: 2.1
Name: GENIALbiologists
Version: 0.9.0
Summary: GENIAL: GENes Indentification with Abricate for Lucky biologists
Home-page: https://github.com/p-barbet/GENIAL
Author: Pauline Barbet, Arnaud Felten
Author-email: pauline.barbet@anses.fr, arnaud.felten@anses.fr
License: UNKNOWN
Description: GENIAL : GENes Identification with Abricate for Lucky biologists
        ================================================================
        
        Authors : Barbet Pauline, Felten Arnaud
        
        Affiliation: [Food Safety Laboratory - ANSES Maisons Alfort (France)](https://www.anses.fr/en/content/laboratory-food-safety-maisons-alfort-and-boulogne-sur-mer)
        
        You can find the latest version of the tool at [https://github.com/p-barbet/GENIAL](https://github.com/p-barbet/GENIAL)
        
        
        GENIAL
        ======
        
        GENIAL aims to identify antimicrobial resistance and virulence genes from bacterial genomes matching them to a database gathering genes of interest using [ABRicate](https://github.com/tseemann/abricate).
        
        ### Databases
        
        Default databases available are ([Resfinder](https://cge.cbs.dtu.dk/services/ResFinder/), [CARD](https://card.mcmaster.ca/), [ARG-ANNOT](http://backup.mediterranee-infection.com/article.php?laref=282&titre=arg-annot), [NCBI](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA313047), [EcOH](https://github.com/katholt/srst2/tree/master/data), [PlasmidFinder](https://cge.cbs.dtu.dk/services/PlasmidFinder/), [Ecoli_VF](https://github.com/phac-nml/ecoli_vf) and [VFDB](http://www.mgc.ac.cn/VFs/))
        
        As well as this databes, it's posible to use your own database.
        
        The tool is divided into two scripts.
        
        ### GENIALanalysis
        
        GENIALanalysis aims to run ABricate. It takes in input a tsv file containing genomes fasta files paths and IDs.If you want to use your own database you also need to provide a multifasta whith genes IDs as headers. Then the script run ABricate and produce in output one ABRicate result file per genome, corresponding to a tsv file including genes found in each sample.
        
        ### GENIALresults
        
        GENIALresults aims to conditionning ABRicate results in the form of matrixes and heatmaps of presence/absence. It takes in input a temporary file produced by the Abricate analysis containing the genomes Abricate results paths and IDs. In the case of vfdb database a file containing the virulence factors names, their family and species is automticaly included in the script.
        
        The output depending on the database used :
        
        * In any cases a matrix in tsv format and a heatmap in png format with all genes found are created
        
        
        On top of that:
        
        * If you use one of the default databases [Resfinder](https://cge.cbs.dtu.dk/services/ResFinder/) or [VFDB](http://www.mgc.ac.cn/VFs/) news matrix and heatmap by gene type are produced with a correspondace table between the gene name, its family and its number in all genomes.
        
        * If you don't use one of the two previous databases or if you use your own database, only a corespondance table between the gene name and its number in all genomes is produced in addition.
        
        ![](workflow.PNG?raw=true "script workflow")
        
        
        Dependencies
        ============
        
        The script has been developed with python 3.6 (tested with 3.6.6)
        
        ### External dependencies
        
        * [ABRicate](https://github.com/tseemann/abricate) tested with 0.8.7
        * [Pandas](https://pandas.pydata.org/) tested with 0.23.4
        * [seaborn](https://seaborn.pydata.org/installing.html) tested with 0.9.0
        
        
        Parameters
        ==========
        
        ### Command line options
        
        
        |   Options   |                                                              Description                                                              |        Required        |      Default     |
        |:-----------:|:-------------------------------------------------------------------------------------------------------------------------------------:|:----------------------:|:----------------:|
        |      -f     |                                            tsv file with FASTA files paths ans strains IDs                                            |           Yes          |                  |
        |     -dbp    |                                   Path to ABRicate databases repertory. Implies -dbf and --privatedb                                  |   Yes if --privatedb   |                  |
        |     -dbf    |                           Multifasta containing the private database sequences. Implies -dbp and --privatedb                          |   Yes if --privatedb   |                  |
        |      -T     |                                                        Number of thread to use                                                        |           No           |         1        |
        |      -w     |                                                           Working directory                                                           |           No           |         .        |
        |      -r     |                                                         Results directory name                                                        |           No           | ABRicate_results |
        | --defaultdb |  default databases available : resfinder, card, argannot, acoh, ecoli_vf, plasmidfinder, vfdb or ncbi. Incompatible with --privatedb  | Yes if not --privatedb |                  |
        | --privatedb |                              Private database name. Implies -dbp and -dbf. Incompatible with --defaultdb                              | Yes if not --defaultdb |                  |
        |   --mincov  |                                                   Minimum proportion of gene covered                                                  |           No           |        80        |
        |   --minid   |                                             Minimum proportion of exact nucleotide matches                                            |           No           |        90        |
        |     --R     |                                          Remove genes present in all genomes from the matrix                                          |           No           |       False      |
        
        Test 
        ====
        
        After installing ABRicate and Pandas and seaborn you can test the script with the command line :
        
        ## Default database
        
        	python AntiViruce.py -f input_file.tsv --defaultdb vfdb -r results_directory --minid 90 --mincov 80
        
        ## Private database
        
        	python AntiViruce.py -f input_file.tsv  --privatedb private_db_name -T 10 -r results_directory --minid 90 --mincov 80 -dbp path_to_abricate_databases_repertory -dbf private_db_multifasta_path
        
        
        
        
        
        
        
        
Platform: UNKNOWN
Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2)
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: POSIX :: Linux
Description-Content-Type: text/markdown
