Metadata-Version: 2.1
Name: RKP
Version: 0.1.0
Summary: Relative K-mer Project
Home-page: https://gitlab.com/microbial_genomics/relative-kmer-project
Author: Lennard Epping, Felix Hartkopf
Author-email: EppingL@rki.de, HartkopfF@rki.de
License: UNKNOWN
Description: # Relative K-mer Project
        
        ## Abstract
        
        ### WGS analysis reveals extended natural transformation in Campylobacter impacting diagnostics and the pathogen's adaptive potential.
        ### Running title: WGS analysis of Campylobacter hybrid strains
        
        ### Julia C. Golz 1a, Lennard Epping 2#, Marie-Theres Knüver 1a, Maria Borowiak 1b, Felix Hartkopf 2, Carlus Deneke 1b, Burkhard Malorny 1b, Torsten Semmler 2, Kerstin Stingl 1a*
        
        1 German Federal Institute for Risk Assessment, Department of Biological Safety, a National Reference Laboratory for *Campylobacter*, b Study Centre for Genome Sequencing and Analysis, Berlin, Germany
        2 Robert Koch Institute, Microbial Genomics, Berlin, Germany  
        
        \# sharing first author  
        \* corresponding author
        
        Over the past decade, *Campylobacter* infections have become more common worldwide. These infections can cause diarrhea, abdominal pain, fever, headache, nausea, and/or vomiting, and they pose a serious danger to public health. This has sparked efforts to improve prevention and treatment and to reduce transmission. As stated by Kaakoush et al. [1], the main risk factors are the consumption of animal products and water, contact with animals, and international travel.
        
        Because the threat to public health differs among *Campylobacter* species, it is important to identify dangerous species and to investigate their genotypic and phenotypic characteristics. In this work, a k-mer mapping approach is used to identify recombination events and the genes involved in them in order to describe hybrid species. Hybrids of *Campylobacter jejuni* and *Campylobacter coli* are analyzed to validate this approach and to develop a workflow that can be applied to emerging hybrids in general. This would allow a fast and reliable classification of hybrids.
        
        KMC3 [2] and BEDTools [5] are used to extract k-mers from *Campylobacter* genomes and to calculate the k-mers shared between the two species and their hybrids. Subsequently, these k-mers are used in combination with BLAST [3] and Bowtie 2 [4] to select genes that are shared with the hybrid genomes. These genes can be grouped into batches, each of which was involved in a single recombination event. A visualization of the gene coverage, generated using R, provides further information about the selected genes.
        
        This work will provide a new generic tool for hybrid analysis that could be expanded to other bacteria and enable researchers to classify new species and recombination events in a fast and reliable manner.
        
        
        [1] Nadeem O. Kaakoush, Natalia Castaño-Rodríguez, Hazel M. Mitchell, Si Ming Man, Global Epidemiology of Campylobacter Infection, Clinical Microbiology Reviews, Volume 28, Issue 3, June 2015, Pages 687–720, https://doi.org/10.1128/CMR.00006-15  
        [2] Marek Kokot, Maciej Długosz, Sebastian Deorowicz, KMC 3: counting and manipulating k-mer statistics, Bioinformatics, Volume 33, Issue 17, 01 September 2017, Pages 2759–2761, https://doi.org/10.1093/bioinformatics/btx304  
        [3] Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, David J. Lipman,
        Basic local alignment search tool, Journal of Molecular Biology, Volume 215, Issue 3, 1990, Pages 403-410, ISSN 0022-2836, https://doi.org/10.1016/S0022-2836(05)80360-2.  
        [4] Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.  
        [5] Aaron R. Quinlan, Ira M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features, Bioinformatics, Volume 26, Issue 6, 15 March 2010, Pages 841–842, https://doi.org/10.1093/bioinformatics/btq033
        
        ## Requirements
        
        + [Conda](https://docs.conda.io/en/latest/)
        
        or 
        
        + Python 3.X
          + numpy = 1.17.3
          + matplotlib = 3.1.2
          + pandas = 0.25.3
          + biopython = 1.76
          + argparse = 1.4.0
          + tqdm = 4.41.1
        + kmc = 3.1.1
        + bowtie2 = 2.3.5
        + bedtools = 2.29.2
        + r = 3.6
          + pheatmap = 1.0.12
          + gplots = 3.0.1.1
        + blast = 2.9.0
        + samtools = 1.10
        + bedops = 2.4.37
        + seqkit = 0.11.0
        
        
        ## Installation
        
        
        1. Change to the `src` directory of the RKP repository:
        
        ```bash
        cd path/to/repo/src
        ```
        
        2. Create a Conda environment with all dependencies needed by RKP:
        
        ```bash
        conda env create -f RKP.yaml
        ```
        
        3. Activate the RKP environment:
        
        ```bash
        conda activate RKP
        ```
        
        4. Run RKP:
        
        ```bash
        python RKP.py -A <acceptor genome dir A> -B <hybrid genome dir B> -C <donor genome dir C> -k <kmerlength> -a <acceptor threshold> -c <donor threshold> -g <acceptor reference genome fasta> -f <acceptor reference genome gff> -o <output directory>
        ```
        
        
        Required parameters: 
        
        | Parameter | Description |
        |-----------|-------------|
        | -A, -C    | Directories with the genomes (.fna) of the acceptor (-A) and the donor (-C) |
        | -B        | Directory with genomes (.fasta) and fnn files of the hybrids |
        | -k        | Length of the k-mers |
        | -at       | Fraction (0 to 1) of acceptor isolates that must contain a k-mer |
        | -dt       | Fraction (0 to 1) of donor isolates that must contain a k-mer |
        | -g        | Acceptor reference genome (FASTA) |
        | -f        | Acceptor reference GFF file |
        | -o        | Output directory |
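        
        The two threshold parameters can be read as the minimum fraction of isolates that must contain a k-mer. The sketch below illustrates that idea in plain Python. It is not the RKP implementation, which relies on KMC3 and BEDTools. The function names `kmers_of` and `kmers_above_threshold` and the toy sequences are hypothetical, and the use of `>=` for the comparison is an assumption.
        
        ```python
        from collections import Counter
        
        
        def kmers_of(sequence, k):
            """Return the set of all k-mers (substrings of length k) in a sequence."""
            return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
        
        
        def kmers_above_threshold(isolates, k, threshold):
            """Keep k-mers found in at least `threshold` (0 to 1) of the isolates."""
            counts = Counter()
            for sequence in isolates:
                # Using a set means each isolate counts at most once per k-mer.
                counts.update(kmers_of(sequence, k))
            total = len(isolates)
            return {kmer for kmer, n in counts.items() if n / total >= threshold}
        
        
        # Toy example: three hypothetical acceptor isolates, k = 3, threshold = 1.0
        acceptor = ["ACGTAC", "ACGTTT", "GGACGT"]
        core = kmers_above_threshold(acceptor, 3, 1.0)
        ```
        
        With a threshold of 1.0, only k-mers present in every isolate are kept. Lower values tolerate k-mers that are missing from some isolates.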
        
        
        Optional parameters: 
        
        
        | Parameter | Description |
        |-----------|-------------|
        | -d        | Keep all temporary files |
        | --version | Show version of RKP |
        | -h        | Show help |
        | -t        | Number of threads (default: 8) |
        
        ## File structure of output
        ```
        output
        ├── Acceptor
        │   └── (only temporary files)
        ├── Hybrid
        │   ├── *_iso_seq_protein.fasta
        │   ├── *_iso_seq.fasta
        │   ├── mapping_result_Genes_count.csv
        │   ├── mapping_result_Genes_cutoff_20.csv
        │   ├── mapping_result_Genes_raw.csv
        │   ├── mapping_result.csv
        │   ├── mapping_result.pdf
        │   ├── recombination_cov_<kmerLength>_W50.pdf
        │   ├── recombination_cov_<kmerLength>_W100.pdf
        │   ├── recombination_cov_<kmerLength>_W200.pdf
        │   ├── recombination_cov_<kmerLength>_W300.pdf
        │   ├── recombination_cov_<kmerLength>_W400.pdf
        │   ├── recombination_cov_<kmerLength>_W500.pdf
        │   ├── Recombination_result_<kmerLength>_W50.csv
        │   ├── Recombination_result_<kmerLength>_W100.csv
        │   ├── Recombination_result_<kmerLength>_W200.csv
        │   ├── Recombination_result_<kmerLength>_W300.csv
        │   ├── Recombination_result_<kmerLength>_W400.csv
        │   └── Recombination_result_<kmerLength>_W500.csv
        ├── Donor
        │   └── (only temporary files)
        └── RKP.log
        ``` 
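        
        To check that a run produced the per-window result files listed above, a small helper along the following lines may be useful. This is a sketch and not part of RKP. The function names are hypothetical, and it assumes that `<kmerLength>` in the file names is replaced by the plain integer passed to `-k`.
        
        ```python
        from pathlib import Path
        
        # Window sizes as they appear in the file names of the listing above.
        WINDOW_SIZES = (50, 100, 200, 300, 400, 500)
        
        
        def expected_recombination_files(output_dir, kmer_length):
            """Return the per-window CSV and PDF paths expected in the Hybrid folder."""
            hybrid = Path(output_dir) / "Hybrid"
            files = []
            for window in WINDOW_SIZES:
                files.append(hybrid / f"Recombination_result_{kmer_length}_W{window}.csv")
                files.append(hybrid / f"recombination_cov_{kmer_length}_W{window}.pdf")
            return files
        
        
        def missing_files(output_dir, kmer_length):
            """Return the expected files that do not exist on disk."""
            expected = expected_recombination_files(output_dir, kmer_length)
            return [path for path in expected if not path.exists()]
        ```
        
        For example, `missing_files("output", 21)` returns an empty list if all twelve per-window files for a k-mer length of 21 are present.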
        
        ## Call structure
        
        ```mermaid
        graph TD;
          RKP.py-->create_kmers.sh;
          create_kmers.sh-->map_kmers.sh;
          RKP.py-->heatmap.R;
        ```
        
        ## Workflow
        
        ![workflow](workflow.png "Workflow")
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: POSIX :: Linux
Requires-Python: >=3.6
Description-Content-Type: text/markdown
