Metadata-Version: 2.1
Name: itsxpress
Version: 1.8.1
Summary: Rapidly trim sequences down to their Internally Transcribed Spacer ITS regions
Author-email: "Adam R. Rivers" <adam.rivers@usda.gov>, "Sveinn V. Einarsson" <seinarsson@ufl.edu>
Maintainer-email: "Adam R. Rivers" <adam.rivers@usda.gov>, "Sveinn V. Einarsson" <seinarsson@ufl.edu>
License: Creative Commons Legal Code CC0 1.0 Universal Official translations of this
        legal tool are available CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES
        NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
        ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN
        "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS
        DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY
        FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
        PROVIDED HEREUNDER. Statement of Purpose The laws of most jurisdictions
        throughout the world automatically confer exclusive Copyright and Related Rights
        (defined below) upon the creator and subsequent owner(s) (each and all, an
        "owner") of an original work of authorship and/or a database (each, a "Work").
        
        Certain owners wish to permanently relinquish those rights to a Work for the
        purpose of contributing to a commons of creative, cultural and scientific works
        ("Commons") that the public can reliably and without fear of later claims of
        infringement build upon, modify, incorporate in other works, reuse and
        redistribute as freely as possible in any form whatsoever and for any purposes,
        including without limitation commercial purposes. These owners may contribute to
        the Commons to promote the ideal of a free culture and the further production of
        creative, cultural and scientific works, or to gain reputation or greater
        distribution for their Work in part through the use and efforts of others.
        
        For these and/or other purposes and motivations, and without any expectation of
        additional consideration or compensation, the person associating CC0 with a Work
        (the "Affirmer"), to the extent that he or she is an owner of Copyright and
        Related Rights in the Work, voluntarily elects to apply CC0 to the Work and
        publicly distribute the Work under its terms, with knowledge of his or her
        Copyright and Related Rights in the Work and the meaning and intended legal
        effect of CC0 on those rights.
        
        1. Copyright and Related Rights. A Work made available under CC0 may be
        protected by copyright and related or neighboring rights ("Copyright and Related
        Rights"). Copyright and Related Rights include, but are not limited to, the
        following:
        
        the right to reproduce, adapt, distribute, perform, display, communicate, and
        translate a Work; moral rights retained by the original author(s) and/or
        performer(s); publicity and privacy rights pertaining to a person's image or
        likeness depicted in a Work; rights protecting against unfair competition in
        regards to a Work, subject to the limitations in paragraph 4(a), below; rights
        protecting the extraction, dissemination, use and reuse of data in a Work;
        database rights (such as those arising under Directive 96/9/EC of the European
        Parliament and of the Council of 11 March 1996 on the legal protection of
        databases, and under any national implementation thereof, including any amended
        or successor version of such directive); and other similar, equivalent or
        corresponding rights throughout the world based on applicable law or treaty, and
        any national implementations thereof. 2. Waiver. To the greatest extent
        permitted by, but not in contravention of, applicable law, Affirmer hereby
        overtly, fully, permanently, irrevocably and unconditionally waives, abandons,
        and surrenders all of Affirmer's Copyright and Related Rights and associated
        claims and causes of action, whether now known or unknown (including existing as
        well as future claims and causes of action), in the Work (i) in all territories
        worldwide, (ii) for the maximum duration provided by applicable law or treaty
        (including future time extensions), (iii) in any current or future medium and
        for any number of copies, and (iv) for any purpose whatsoever, including without
        limitation commercial, advertising or promotional purposes (the "Waiver").
        Affirmer makes the Waiver for the benefit of each member of the public at large
        and to the detriment of Affirmer's heirs and successors, fully intending that
        such Waiver shall not be subject to revocation, rescission, cancellation,
        termination, or any other legal or equitable action to disrupt the quiet
        enjoyment of the Work by the public as contemplated by Affirmer's express
        Statement of Purpose.
        
        3. Public License Fallback. Should any part of the Waiver for any reason be
        judged legally invalid or ineffective under applicable law, then the Waiver
        shall be preserved to the maximum extent permitted taking into account
        Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
        is so judged Affirmer hereby grants to each affected person a royalty-free, non
        transferable, non sublicensable, non exclusive, irrevocable and unconditional
        license to exercise Affirmer's Copyright and Related Rights in the Work (i) in
        all territories worldwide, (ii) for the maximum duration provided by applicable
        law or treaty (including future time extensions), (iii) in any current or future
        medium and for any number of copies, and (iv) for any purpose whatsoever,
        including without limitation commercial, advertising or promotional purposes
        (the "License"). The License shall be deemed effective as of the date CC0 was
        applied by Affirmer to the Work. Should any part of the License for any reason
        be judged legally invalid or ineffective under applicable law, such partial
        invalidity or ineffectiveness shall not invalidate the remainder of the License,
        and in such case Affirmer hereby affirms that he or she will not (i) exercise
        any of his or her remaining Copyright and Related Rights in the Work or (ii)
        assert any associated claims and causes of action with respect to the Work, in
        either case contrary to Affirmer's express Statement of Purpose.
        
        4. Limitations and Disclaimers.
        
        No trademark or patent rights held by Affirmer are waived, abandoned,
        surrendered, licensed or otherwise affected by this document. Affirmer offers
        the Work as-is and makes no representations or warranties of any kind concerning
        the Work, express, implied, statutory or otherwise, including without limitation
        warranties of title, merchantability, fitness for a particular purpose, non
        infringement, or the absence of latent or other defects, accuracy, or the
        present or absence of errors, whether or not discoverable, all to the greatest
        extent permissible under applicable law. Affirmer disclaims responsibility for
        clearing rights of other persons that may apply to the Work or any use thereof,
        including without limitation any person's Copyright and Related Rights in the
        Work. Further, Affirmer disclaims responsibility for obtaining any necessary
        consents, permissions or other rights required for any use of the Work. Affirmer
        understands and acknowledges that Creative Commons is not a party to this
        document and has no duty or obligation with respect to this CC0 or use of the
        Work.
        
Project-URL: repository, http://github.com/usda-ars-gbru/itsxpress
Keywords: Amplicon,sequencing,fungal,ITS
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.5
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.5
Description-Content-Type: text/x-rst
License-File: LICENSE.txt
Requires-Dist: biopython (>=1.79)
Requires-Dist: pyzstd (>=0.15.3)
Provides-Extra: tests
Requires-Dist: pytest ; extra == 'tests'

ITSxpress: Software to rapidly trim the internally transcribed spacer (ITS) region of FASTQ files
==================================================================================================

.. image:: https://readthedocs.org/projects/itsxpress/badge/?version=latest
    :target: https://itsxpress.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: https://github.com/USDA-ARS-GBRU/itsxpress/actions/workflows/python-package-conda.yml/badge.svg
   :target: https://github.com/USDA-ARS-GBRU/itsxpress/actions/workflows/python-package-conda.yml
   :alt: Build Status

.. image:: https://anaconda.org/bioconda/itsxpress/badges/downloads.svg
   :target: https://anaconda.org/bioconda/itsxpress
   :alt: Anaconda-Server Badge
   
.. image:: https://img.shields.io/github/v/release/USDA-ARS-GBRU/itsxpress?style=social
   :target: https://github.com/USDA-ARS-GBRU/itsxpress/releases/latest
   :alt: GitHub release (latest by date)

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1304349.svg
  :target: https://doi.org/10.5281/zenodo.1304349

Authors
-------
* Adam R. Rivers, US Department of Agriculture, Agricultural Research Service
* Sveinn V. Einarsson, US Department of Agriculture, Agricultural Research Service


Citation
--------
Rivers AR, Weber KC, Gardner TG et al. ITSxpress: Software to rapidly trim
internally transcribed spacer sequences with quality scores for marker gene
analysis [version 1]. F1000Research 2018, 7:1418
(doi: `10.12688/f1000research.15704.1`_)

.. _`10.12688/f1000research.15704.1`: https://doi.org/10.12688/f1000research.15704.1

#####

**This is the end-of-life version 1 of ITSxpress.
Version 2 of ITSxpress has the QIIME2 plugin built in alongside the command-line version of ITSxpress.
See the master branch of ITSxpress.**

#####

Introduction
-------------

The internally transcribed spacer (ITS) region lies between the highly conserved small
subunit (SSU) and large subunit (LSU) rRNA genes. In eukaryotes it contains the 5.8S rRNA
gene flanked by two variable-length spacer regions, ITS1 and ITS2. In amplicon sequencing
studies it is common practice to trim off the conserved (SSU, 5.8S, or LSU) regions.
`Bengtsson-Palme et al. (2013)`_ published the software package ITSx_ to do this.

ITSxpress is designed to support the calling of exact sequence variants rather than OTUs_.
This newer method of sequence error correction requires the quality scores of each
sequence, so every input sequence must be trimmed. ITSxpress makes this possible by
taking FASTQ data, dereplicating the sequences, and then identifying the start and stop
sites of the ITS region using HMMSearch. The results are parsed and the trimmed files are
returned. The ITS1 region, the ITS2 region, or the entire ITS region including the 5.8S
rRNA gene can be selected. ITSxpress uses the HMM models from ITSx, so results are
comparable.
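
The benefit of dereplicating first is that, when many reads share an identical sequence,
only the distinct sequences need to be searched. The sketch below illustrates that idea
only; it is not the code ITSxpress runs (ITSxpress performs its own dereplication,
controlled by ``--cluster_id``). It reports how many reads a FASTQ file holds and how many
distinct sequences they collapse to. The function name ``derep_summary`` is hypothetical,
and the sketch assumes standard four-line FASTQ records.

.. code-block:: bash

    # Hypothetical helper, not part of ITSxpress: count reads and distinct
    # sequences in a FASTQ file. Assumes four-line records, so the sequence is
    # on every 4th line starting at line 2. gzip -dcf decompresses .gz input
    # and passes uncompressed input through unchanged.
    derep_summary() {
        gzip -dcf "$1" | awk '
            NR % 4 == 2 {
                reads++
                if (!($0 in seen)) { seen[$0] = 1; unique++ }
            }
            END { printf "reads=%d unique=%d\n", reads, unique }'
    }

    # Example: derep_summary r1.fastq.gz

A large gap between the two numbers suggests that searching only the distinct sequences
saves a substantial amount of work.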

ITSxpress is also available as a `QIIME2 Plugin`_.

.. _`Bengtsson-Palme et al. (2013)`: https://doi.org/10.1111/2041-210X.12073
.. _ITSx: http://microbiology.se/software/itsx/
.. _OTUs: https://doi.org/10.1038/ismej.2017.119
.. _`QIIME2 Plugin`: https://github.com/USDA-ARS-GBRU/q2_itsxpress
.. _`mamba installation guide`: https://mamba.readthedocs.io/en/latest/installation.html


Installation
-------------

These are the installation instructions for the final release of ITSxpress version 1 (BBmap is no longer used in ITSxpress version 2):

    - This version should primarily be used for reproducibility with other datasets that were processed with ITSxpress <=1.8.1.
    - The new version 2 is compatible with newer versions of QIIME2.
    - **To install this release of ITSxpress with QIIME2, follow the installation instructions here:** `QIIME2 Plugin`_

Because this version is no longer supported, you **must** create a new conda environment so that the dependency versions are compatible.


Example of how to create and activate a new conda environment for this version of ITSxpress. We use mamba because it resolves packages faster, but conda can be substituted.

    - Information on installing mamba or micromamba (either is highly recommended) can be found here: `mamba installation guide`_

.. code-block:: bash
  
  # with mamba
  mamba create -n ITSxpress_V1EOL python=3.8.13
  mamba activate ITSxpress_V1EOL
  # or, equivalently, with conda
  conda create -n ITSxpress_V1EOL python=3.8.13
  conda activate ITSxpress_V1EOL



ITSxpress can be installed in 3 ways:
--------------------------------------


1. **Bioconda** (preferred, because it installs the dependencies for you):

.. code-block:: bash

    mamba install -y -c conda-forge -c bioconda itsxpress==1.8.1

2. **Pip:** https://pypi.org/project/itsxpress/
    - If using pip, you need to install the non-Python dependencies at the versions listed below before installing ITSxpress, and pin ITSxpress itself to 1.8.1 so that pip does not install version 2.

.. code-block:: bash

    mamba install -y -c conda-forge -c bioconda hmmer==3.1b2
    mamba install -y -c conda-forge -c bioconda bbmap==38.69
    mamba install -y -c conda-forge -c bioconda vsearch==2.21.1
    pip install itsxpress==1.8.1


3. **The GitHub repository:** https://github.com/USDA-ARS-GBRU/itsxpress

.. code-block:: bash

    git clone --branch 1.8.1-EOL https://github.com/USDA-ARS-GBRU/itsxpress.git


Dependencies
-------------
This software requires Vsearch 2.21.1, BBtools 38.69, HMMER 3.1b2, and Biopython>=1.79
(pyzstd>=0.15.3 is also listed as a Python dependency). Bioconda takes care of all of these
for you, so it is the preferred installation method.
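
After activating the environment, a quick way to confirm that the external programs are
reachable is to look them up on ``PATH``. The sketch below only checks that the
executables exist; it does not compare their versions with the pins above. The function
name ``check_itsxpress_deps`` is hypothetical, and using BBtools' ``reformat.sh`` as a
stand-in for the whole BBtools suite is an assumption, as is the exact set of programs
ITSxpress calls.

.. code-block:: bash

    # Hypothetical helper: report whether ITSxpress and the external programs
    # it depends on are on PATH. reformat.sh stands in for BBtools (an
    # assumption). Returns non-zero if anything is missing; versions are NOT checked.
    check_itsxpress_deps() {
        local missing=0
        for tool in itsxpress hmmsearch vsearch reformat.sh; do
            if command -v "$tool" >/dev/null 2>&1; then
                echo "found:   $tool"
            else
                echo "missing: $tool" >&2
                missing=1
            fi
        done
        return "$missing"
    }

    # Example: check_itsxpress_deps || echo "install the missing tools first"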


Usage
---------


+-------------------------+---------------------------------------------------------------+
| Option                  | Description                                                   |
+=========================+===============================================================+
| -h, --help              | Show this help message and exit.                              |
+-------------------------+---------------------------------------------------------------+
| --fastq                 | A ``.fastq``, ``.fq``, ``.fastq.gz`` or ``.fq.gz`` file.      |
|                         | Interleaved or not. Required.                                 |
+-------------------------+---------------------------------------------------------------+
| --single_end            | A flag to specify that the fastq file is single-ended (not    |
|                         | paired). Default is false.                                    |
+-------------------------+---------------------------------------------------------------+
| --fastq2                | A ``.fastq``, ``.fq``, ``.fastq.gz`` or ``.fq.gz`` file       |
|                         | representing read 2 if present, optional.                     |
+-------------------------+---------------------------------------------------------------+
| --outfile               | The trimmed FASTQ file, if it ends in ``gz`` it will be       |
|                         | gzipped.                                                      |
+-------------------------+---------------------------------------------------------------+
| --outfile2              | The trimmed FASTQ read 2 file, if it ends in ``gz`` it will   |
|                         | be gzipped. If used, reads will be returned as unmerged pairs |
|                         | rather than merged.                                           |
+-------------------------+---------------------------------------------------------------+
| --tempdir               | Specify the temp file directory. Default is None.             |
+-------------------------+---------------------------------------------------------------+
| --keeptemp              | Should intermediate files be kept? Default is false.          |
+-------------------------+---------------------------------------------------------------+
| --region                | Options : {ITS2, ITS1, ALL}                                   |
+-------------------------+---------------------------------------------------------------+
| --taxa                  | Select the taxonomic group sequenced: {Alveolata, Bryophyta,  |
|                         | Bacillariophyta, Amoebozoa, Euglenozoa, Fungi, Chlorophyta,   |
|                         | Rhodophyta, Phaeophyceae, Marchantiophyta, Metazoa, Oomycota, |
|                         | Haptophyceae, Raphidophyceae, Rhizaria, Synurophyceae,        |
|                         | Tracheophyta, Eustigmatophyceae, All}. Default Fungi.         |
+-------------------------+---------------------------------------------------------------+
| --cluster_id            | The percent identity for clustering reads range [0.99-1.0],   |
|                         | set to 1 for exact de-replication. Default 1.0.               |
+-------------------------+---------------------------------------------------------------+
| --log                   | Log file. Default is ITSxpress.log.                           |
+-------------------------+---------------------------------------------------------------+
| --threads               | Number of processor threads to use. Default is 1.             |
+-------------------------+---------------------------------------------------------------+
| --reversed_primers      | Primers are in reverse orientation as in Taylor et al. 2016,  |
|                         | DOI:10.1128/AEM.02576-16. If selected, ITSxpress returns      |
|                         | trimmed reads flipped to the forward orientation.             |
+-------------------------+---------------------------------------------------------------+
| --allow_staggered_reads | Allow merging staggered reads with --fastq_allowmergestagger  |
|                         | for Vsearch --fastq_mergepairs. See Vsearch documentation.    |
|                         | (Optional) Default is true.                                   |
+-------------------------+---------------------------------------------------------------+



Examples
---------

Use case 1: Trimming the ITS2 region from a fungal amplicon sequencing dataset with
forward and reverse gzipped FASTQ files using two CPU threads. Return a single merged file for use in Deblur.

.. code-block:: bash

    itsxpress --fastq r1.fastq.gz --fastq2 r2.fastq.gz --region ITS2 \
    --taxa Fungi --log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2

ITSxpress can read and write both gzipped and uncompressed FASTQ files. It expects FASTQ
file names to end in ``.fq``, ``.fastq``, ``.fq.gz``, or ``.fastq.gz``.
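
Because the accepted file name endings are fixed, a pre-flight check can catch a misnamed
file before a long run starts. The sketch below is a hypothetical helper
(``has_fastq_extension`` is not part of ITSxpress); it matches the four endings exactly as
written, so it is case-sensitive.

.. code-block:: bash

    # Hypothetical helper: succeed only if the name ends in one of the
    # extensions ITSxpress expects (.fq, .fastq, .fq.gz or .fastq.gz).
    has_fastq_extension() {
        case "$1" in
            *.fq | *.fastq | *.fq.gz | *.fastq.gz) return 0 ;;
            *) return 1 ;;
        esac
    }

    # Example: warn about every input name that does not match
    for f in r1.fastq.gz r2.fastq.gz; do
        has_fastq_extension "$f" || echo "unexpected file name: $f" >&2
    done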

Use case 2: Trimming the ITS2 region from a fungal amplicon sequencing dataset with
forward and reverse gzipped FASTQ files using two CPU threads. Return separate forward
and reverse read files for use in DADA2.

.. code-block:: bash

    itsxpress --fastq r1.fastq.gz --fastq2 r2.fastq.gz --region ITS2 \
    --taxa Fungi --log logfile.txt --outfile trimmed_reads_r1.fastq.gz \
    --outfile2 trimmed_reads_r2.fastq.gz --threads 2
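
DADA2 expects the forward and reverse files to contain matching reads in the same order,
so it can be worth confirming that the two trimmed outputs written with ``--outfile`` and
``--outfile2`` hold the same number of reads. The sketch below compares read counts only
(it does not compare read names) and assumes four-line FASTQ records; both function names
are hypothetical.

.. code-block:: bash

    # Hypothetical helpers: count reads in a gzipped or plain FASTQ file
    # (assuming four lines per record) and compare the counts of a pair.
    count_fastq_reads() {
        gzip -dcf "$1" | awk 'END { print NR / 4 }'
    }

    check_pair_counts() {
        local n1 n2
        n1=$(count_fastq_reads "$1")
        n2=$(count_fastq_reads "$2")
        if [ "$n1" = "$n2" ]; then
            echo "OK: $n1 reads in each file"
        else
            echo "MISMATCH: $1 has $n1 reads, $2 has $n2" >&2
            return 1
        fi
    }

    # Example: check_pair_counts forward.fastq.gz reverse.fastq.gz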



Use case 3: Trimming the ITS2 region from a fungal amplicon sequencing dataset with
an interleaved gzipped FASTQ file using two CPU threads. Return a single merged file for use in Deblur.

.. code-block:: bash

    itsxpress --fastq interleaved.fastq.gz --region ITS2 --taxa Fungi \
    --log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2


Use case 4: Trimming the ITS2 region from a fungal amplicon sequencing dataset with
a single-end gzipped FASTQ file using two CPU threads.

.. code-block:: bash

    itsxpress --fastq single-end.fastq.gz --single_end --region ITS2 --taxa Fungi \
    --log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2

Single-end data is less common and may come from a dataset in which the reads have already
been merged.

Use case 5: Trimming the ITS1 region from an Alveolata amplicon sequencing dataset with
an interleaved gzipped FASTQ file using eight CPU threads.

.. code-block:: bash

    itsxpress --fastq interleaved.fastq.gz --region ITS1 --taxa Alveolata \
    --log logfile.txt --outfile trimmed_reads.fastq.gz --threads 8
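
To apply the same command to many samples, a shell loop is usually enough. The sketch
below repeats Use case 1 for every sample in a directory. It assumes, hypothetically, that
paired files are named ``<sample>_R1.fastq.gz`` and ``<sample>_R2.fastq.gz``; the function
name ``trim_its2_dir`` and the ``DRY_RUN`` variable are illustrations, not ITSxpress
features. Only options documented in the table above are passed to ``itsxpress``.

.. code-block:: bash

    # Hypothetical helper: run ITSxpress with the Use case 1 options on every
    # <sample>_R1.fastq.gz / <sample>_R2.fastq.gz pair in a directory.
    # Set DRY_RUN=1 to print each command instead of running it.
    trim_its2_dir() {
        local in_dir="$1" out_dir="$2" r1 r2 sample
        mkdir -p "$out_dir"
        for r1 in "$in_dir"/*_R1.fastq.gz; do
            [ -e "$r1" ] || continue            # the glob matched nothing
            sample=$(basename "$r1" _R1.fastq.gz)
            r2="$in_dir/${sample}_R2.fastq.gz"
            if [ ! -e "$r2" ]; then
                echo "skipping $sample: $r2 not found" >&2
                continue
            fi
            # Build the argument list portably (no bash arrays needed).
            set -- itsxpress --fastq "$r1" --fastq2 "$r2" --region ITS2 \
                --taxa Fungi --log "$out_dir/$sample.log" \
                --outfile "$out_dir/$sample.fastq.gz" --threads 2
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "$@"
            else
                "$@" || return 1                # stop at the first failure
            fi
        done
    }

For example, ``DRY_RUN=1 trim_its2_dir reads trimmed`` prints the commands that would be
run; without ``DRY_RUN=1`` they are executed one sample at a time.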


License information
--------------------
This software is a work of the United States Department of Agriculture,
Agricultural Research Service and is released under the Creative Commons CC0
public domain dedication.
