Metadata-Version: 2.1
Name: HIFI-SE
Version: 0.0.3
Summary: HIFI-SE
Home-page: https://github.com/comery/HIFI-barcode-SE400
Author: Chentao Yang, Guanliang Meng
Author-email: yangchentao@genomics.cn
License: UNKNOWN
Description: # HIFI-barcode-SE400
        The BGISEQ-500 platform has launched a new test sequencing kit capable of single-end 400 bp sequencing (SE400), which offers a simple and reliable way to obtain DNA barcodes efficiently. In this study, we explored the potential of BGISEQ-500 SE400 sequencing for DNA barcode reference construction, and we provide an updated HIFI-Barcode software package that can generate COI barcode assemblies using HTS reads of length > 400 bp.
        
        
        ### Versions
        
        New release: 0.0.3 (2018-11-15)
        
        ### Usage (latest==0.0.3)
        
        
        ```shell
        HIFI-SE
        ```
        
        ```text
        usage: HIFI-SE [-h] {all,filter,assign,assembly,bold_identification} ...
        
        Description
        
        	An automatic pipeline for HIFI-SE400 project, including filtering raw reads,
        	assigning reads to samples, and assembling HIFI barcodes (COI sequences).
        
        Version
        	0.0.3 2018-11-15
        
        Author
        	yangchentao at genomics.cn, BGI.
        	mengguanliang at genomics.cn, BGI.
        
        
        positional arguments:
          {all,filter,assign,assembly,bold_identification}
            all                 run filter, assign and assembly
            filter              filter raw reads
            assign              assign reads to samples
            assembly            do assembly from input fastq reads,
                                output HIFI barcodes.
            bold_identification
                                do taxa identification on BOLD system,
        
        optional arguments:
          -h, --help            show this help message and exit
        
        ```
        
        #### run in "all"
        Example:
        
        ```shell
        HIFI-SE all -outpre hifi -raw test.raw.fastq -index 5 -primer index_primer.list -cid 0.98 -oid 0.95 -seqs_lim 50000 -threads 4 -tp 2
        ```
        #### run by steps [filter -> assign -> assembly]
        
        - ```python3 HIFI-SE.py filter ```
        
        ```text
        usage: HIFI-SE filter [-h] -outpre <STR> -raw <STR> [-e <INT>]
                              [-q <INT> <INT>] [-n <INT>]
        
        optional arguments:
          -h, --help      show this help message and exit
        
        common arguments:
          -outpre <STR>   prefix for output files
        
        filter arguments:
          -raw <STR>      input raw single-end fastq file; only adapters should be removed;
                          assumes the Phred33 score system (BGISEQ-500)
          -e <INT>        expected error threshold, default=10
                          see more: http://drive5.com/usearch/manual/exp_errs.html
          -q <INT> <INT>  filter by base quality; for example: '20 5' means
                          dropping reads in which more than 5 percent of
                          bases have a quality score < 20.
          -n <INT>        remove reads containing [INT] Ns, default=1
        ```
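        
        The `-e`, `-q` and `-n` checks can be sketched in a few lines of Python. This is only an illustration, not the code HIFI-SE runs: the function names are hypothetical, and the exact comparisons (`>` versus `>=`) and the order of the checks are assumptions. The expected error of a read is the sum of its per-base error probabilities, `P = 10 ** (-Q / 10)`, as described on the USEARCH page linked above.
        
        ```python
        def phred33_scores(quality):
            """Decode a Phred+33 quality string into integer scores."""
            return [ord(ch) - 33 for ch in quality]
        
        
        def expected_errors(quality):
            """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
            return sum(10 ** (-q / 10) for q in phred33_scores(quality))
        
        
        def low_quality_percent(quality, cutoff):
            """Percentage of bases whose quality score is below `cutoff`."""
            scores = phred33_scores(quality)
            low = sum(1 for q in scores if q < cutoff)
            return 100.0 * low / len(scores)
        
        
        def passes_filter(seq, quality, max_ee=10, q_cutoff=20, q_percent=5, max_n=1):
            """Hypothetical combination of the -e, -q and -n checks."""
            # Assumption: a read is dropped once it holds `max_n` or more Ns.
            if seq.upper().count("N") >= max_n:
                return False
            # Assumption: a read is dropped when its expected error exceeds -e.
            if expected_errors(quality) > max_ee:
                return False
            # '-q 20 5': drop reads with more than 5 percent of bases below Q20.
            return low_quality_percent(quality, q_cutoff) <= q_percent
        ```
        
        For example, a read whose quality string is all `I` (Q40 in Phred33) contributes 0.0001 expected errors per base, so a 400 bp read of that quality stays far below the default threshold of 10.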
        
        - ```python3 HIFI-SE.py assign```
        
        ```text
        usage: HIFI-SE assign [-h] -outpre <STR> -index INT -fq <STR> -primer <STR>
                              [-outdir <STR>]
        
        optional arguments:
          -h, --help     show this help message and exit
        
        common arguments:
          -outpre <STR>  prefix for output files
        
        index arguments:
          -index INT     the length of tag sequence in the ends of primers
        
        when only run assign arguments:
          -fq <STR>      cleaned fastq file
        
        assign arguments:
          -primer <STR>  tagged-primer list, in the following format:
                         Rev001	AAGCTAAACTTCAGGGTGACCAAAAAATCA
                         For001	AAGCGGTCAACAAATCATAAAGATATTGG
                         ...
                         this format is necessary!
          -outdir <STR>  output directory for assignment, default="assigned"
        ```
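        
        To check a primer list before running `assign`, the format above can be parsed with a short script. This is a minimal sketch, not HIFI-SE's own parser: `parse_primer_list` is a hypothetical helper, and it assumes the tag occupies the first `-index` bases of each sequence. An index length of 4 is used here purely for illustration.
        
        ```python
        def parse_primer_list(lines, index_len):
            """Return {name: (tag, primer)} from whitespace-separated lines."""
            primers = {}
            for line in lines:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                name, seq = line.split()
                seq = seq.upper()
                # Assumption: the tag is the first `index_len` bases of the sequence.
                primers[name] = (seq[:index_len], seq[index_len:])
            return primers
        
        
        example = [
            "Rev001\tAAGCTAAACTTCAGGGTGACCAAAAAATCA",
            "For001\tAAGCGGTCAACAAATCATAAAGATATTGG",
        ]
        for name, (tag, primer) in parse_primer_list(example, 4).items():
            print(name, tag, primer)
        ```
        
        A line that does not contain exactly two fields raises a `ValueError`, which is a convenient way to catch a malformed list early.
        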
        - ```python3 HIFI-SE.py assembly```
        
        ```text
        usage: HIFI-SE assembly [-h] -outpre <STR> -index INT -list FILE
                                [-vsearch <STR>] [-threads <INT>] [-cid FLOAT]
                                [-min INT] [-max INT] [-oid FLOAT] [-tp INT] [-ab INT]
                                [-seqs_lim INT] [-len INT] [-mode INT] [-rc] [-cc]
                                [-codon INT] [-frame INT]
        
        optional arguments:
          -h, --help      show this help message and exit
        
        common arguments:
          -outpre <STR>   prefix for output files
        
        index arguments:
          -index INT      the length of tag sequence in the ends of primers
        
        when only run assembly arguments:
          -list FILE      input file, fastq file list. [required]
        
        software path:
          -vsearch <STR>  vsearch path(only needed if vsearch is not in $PATH)
          -threads <INT>  threads for vsearch, default=2
          -cid FLOAT      identity for clustering, default=0.98
        
        assembly arguments:
          -min INT        minimum length of overlap, default=80
          -max INT        maximum length of overlap, default=90
          -oid FLOAT      minimum similarity of overlap region, default=0.95
          -tp INT         how many clusters will be used in assembly, recommendation=2
          -ab INT         keep clusters to assembly if its abundance >=INT
          -seqs_lim INT   reads number limitation. by default, no limitation for input reads
          -len INT        standard read length, default=400
          -mode INT       1 or 2; mode 1 clusters reads and keeps the [-tp] most abundant clusters,
                          or clusters with abundance of at least [-ab], and then makes a consensus
                          sequence for each cluster. mode 2 directly makes a single
                          consensus sequence without clustering. default=1
          -rc             whether to check amino acid translation for reads, default not
          -cc             whether to check final COI contig's amino acid translation, default not
          -codon INT      codon usage table used to check translation, default=5
          -frame INT      start codon shift for amino acid translation, default=1
        ```
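        
        The cluster selection described for `-mode 1` can be pictured as follows. This is a rough sketch under stated assumptions, not the tool's implementation: `select_clusters` and the cluster names are hypothetical, and letting `-ab` take precedence over `-tp` when both are given is a guess, since the help text does not say how the two options interact.
        
        ```python
        def select_clusters(abundances, tp=None, ab=None):
            """Pick cluster ids to assemble from a {cluster_id: read_count} mapping."""
            # Rank clusters from most to least abundant (ties keep input order).
            ranked = sorted(abundances.items(), key=lambda item: item[1], reverse=True)
            if ab is not None:
                # Assumption: an abundance cutoff, when given, overrides -tp.
                return [cid for cid, count in ranked if count >= ab]
            if tp is not None:
                return [cid for cid, _ in ranked[:tp]]
            return [cid for cid, _ in ranked]
        
        
        counts = {"cluster1": 120, "cluster2": 45, "cluster3": 3}
        print(select_clusters(counts, tp=2))
        print(select_clusters(counts, ab=100))
        ```
        
        With the recommended `-tp 2`, only the two most abundant clusters would go on to consensus building, which keeps low-abundance clusters (often sequencing errors or contaminants) out of the assembly.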
        
        #### Github page
        https://github.com/comery/HIFI-barcode-SE400
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Requires-Python: >=3.5
Description-Content-Type: text/markdown
