Metadata-Version: 2.1
Name: triqler
Version: 0.1.2
Summary: A combined identification and quantification error model of label-free protein quantification
Home-page: https://github.com/statisticalbiotechnology/triqler
Author: Matthew The, KTH
Author-email: matthew.the@scilifelab.se
License: Apache
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: numpy (>=1.12)
Requires-Dist: scipy (>=0.17)

Triqler: TRansparent Identification-Quantification-Linked Error Rates
=====================================================================

Requirements
------------

Python 2 or 3 installation

Packages needed:

- numpy 1.12+
- scipy 0.17+

Installation via ``pip``
************************

::

  pip install triqler

Installation from source
************************

::

  git clone https://github.com/statisticalbiotechnology/triqler.git
  cd triqler
  pip install .

Usage
-----

::

  usage: python -m triqler [-h] [--out_file OUT] [--fold_change_eval F]
                     [--decoy_pattern P] [--min_samples N] [--num_threads N]
                     [--ttest]
                     IN_FILE

  positional arguments:
    IN_FILE               List of PSMs with abundances (not log transformed!)
                          and search engine score. See README for a detailed
                          description of the columns.

  optional arguments:
    -h, --help            show this help message and exit
    --out_file OUT        Path to output file (writing in TSV format). N.B. if
                          more than 2 treatment groups are present, suffixes
                          will be added before the file extension. (default:
                          proteins.tsv)
    --fold_change_eval F  log2 fold change evaluation threshold. (default: 1.0)
    --decoy_pattern P     Prefix for decoy proteins. (default: decoy_)
    --min_samples N       Minimum number of samples a peptide needed to be
                          quantified in. (default: 2)
    --num_threads N       Number of threads, by default this is equal to the
                          number of CPU cores available on the device. (default:
                          auto detect)
    --ttest               Use t-test for evaluating differential expression
                          instead of posterior probabilities. (default: False)

Example
-------

A sample file ``iPRG2016.tsv`` is provided in the ``example`` folder. You can
run Triqler on this file with the following command:

::

  python -m triqler --fold_change_eval 0.8 example/iPRG2016.tsv
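
The ``--fold_change_eval`` threshold is given on a log2 scale, so it can help
to convert it to a plain abundance ratio. The helper below is a hypothetical
illustration of that arithmetic and is not part of Triqler:

::

  def log2_threshold_to_ratio(log2_fc):
      """Convert a log2 fold change threshold to a linear abundance ratio."""
      return 2.0 ** log2_fc

  # Default threshold (1.0) and the value used in the example command (0.8).
  default_ratio = log2_threshold_to_ratio(1.0)
  example_ratio = log2_threshold_to_ratio(0.8)

Under this reading, the default of 1.0 corresponds to a twofold change, and
the 0.8 used above to a roughly 1.74-fold change.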

Interface
---------

The simplest input format is a tab-separated file consisting of a header line 
followed by one PSM per line in the following format:

::

  run <tab> condition <tab> charge <tab> searchScore <tab> intensity <tab> peptide     <tab> proteins
  r1  <tab> 1         <tab> 2      <tab> 1.345       <tab> 21359.123 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB 
  r2  <tab> 1         <tab> 2      <tab> 1.945       <tab> 24837.398 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB 
  r3  <tab> 2         <tab> 2      <tab> 1.684       <tab> 25498.869 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB
  ...
  r1  <tab> 1         <tab> 3      <tab> 0.452       <tab> 13642.232 <tab> A.NTPEPTIDE.- <tab> decoy_proteinA


Alternatively, if you have match-between-run probabilities, a slightly more
complex input format can be used:

::

  run <tab> condition <tab> charge <tab> searchScore <tab> spectrumId <tab> linkPEP <tab> featureClusterId <tab> intensity <tab> peptide     <tab> proteins
  r1  <tab> 1         <tab> 2      <tab> 1.345       <tab> 3          <tab> 0.0     <tab> 1                <tab> 21359.123 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB 
  r2  <tab> 1         <tab> 2      <tab> 1.345       <tab> 3          <tab> 0.021   <tab> 1                <tab> 24837.398 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB 
  r3  <tab> 2         <tab> 2      <tab> 1.684       <tab> 4          <tab> 0.0     <tab> 1                <tab> 25498.869 <tab> A.PEPTIDE.A <tab> proteinA <tab> proteinB
  ...
  r1  <tab> 1         <tab> 3      <tab> 0.452       <tab> 6568       <tab> 0.15    <tab> 9845             <tab> 13642.232 <tab> A.NTPEPTIDE.- <tab> decoy_proteinA

Some remarks:

- Triqler also requires decoy PSMs, preferably obtained by searching against
  the target database concatenated with a reversed protein sequence database.
- The intensities should **not** be log transformed; Triqler will do this
  transformation for you.
- The search engine scores should be such that higher scores indicate a higher
  confidence in the PSM.
- We recommend using well-calibrated search engine scores, e.g. the
  SVM scores from Percolator.
- Multiple proteins can be specified at the end of the line, separated by tabs.
  Note, however, that Triqler currently discards shared peptides.
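
As a minimal sketch of producing the simple input format described above, the
following Python 3 snippet writes a header line and one PSM per line, with the
proteins appended as extra tab-separated fields. The function name
``write_triqler_input`` and the dictionary keys are assumptions made for this
illustration; they are not part of Triqler itself:

::

  import csv

  # Column names of the simple input format.
  HEADER = ["run", "condition", "charge", "searchScore",
            "intensity", "peptide", "proteins"]

  def write_triqler_input(path, psms):
      """Write PSMs to a tab-separated file (hypothetical helper).

      Each PSM is a dict keyed by the first six header names, plus a
      "proteins" entry holding a list of protein identifiers.
      Intensities are written as given, i.e. not log transformed.
      """
      with open(path, "w", newline="") as handle:
          writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
          writer.writerow(HEADER)
          for psm in psms:
              fixed = [psm[name] for name in HEADER[:-1]]
              # Proteins go at the end of the line, separated by tabs.
              writer.writerow(fixed + list(psm["proteins"]))

Decoy PSMs would be written the same way, with protein identifiers that start
with the decoy prefix (``decoy_`` by default).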

The output format is a tab-separated file consisting of a header line followed
by one protein per line in the following format:

::

  q_value <tab> posterior_error_prob <tab> protein <tab> num_peptides <tab> protein_id_PEP <tab> log2_fold_change <tab> diff_exp_prob_<FC> <tab> <condition1>:<run1> <tab> <condition1>:<run2> <tab> ... <tab> <conditionM>:<runN> <tab> peptides

Some remarks:

- The reported protein expressions are the expected value of the protein's
  expression in the run. They are calculated relative to the protein's mean 
  expression and are **not** log transformed.
- The reported fold change is log2 transformed and is the expected value based 
  on the posterior distribution of the fold change.
- If more than 2 treatment groups are present, separate files will be written
  out for each pairwise comparison with suffixes added before the file
  extension, e.g. ``proteins.1vs3.tsv``.
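
The output file can be post-processed with a few lines of Python 3. The sketch
below looks up columns by their header name and keeps proteins at or below a
``q_value`` cutoff. The function name ``read_significant_proteins`` and the
0.05 cutoff are assumptions chosen for illustration, not Triqler defaults:

::

  import csv

  def read_significant_proteins(path, q_threshold=0.05):
      """Return (protein, log2_fold_change) pairs passing the q-value cutoff.

      Hypothetical helper: columns are located by header name, so the
      variable number of <condition>:<run> columns does not matter.
      """
      hits = []
      with open(path, newline="") as handle:
          reader = csv.reader(handle, delimiter="\t")
          header = next(reader)
          q_col = header.index("q_value")
          protein_col = header.index("protein")
          fc_col = header.index("log2_fold_change")
          for row in reader:
              if float(row[q_col]) <= q_threshold:
                  hits.append((row[protein_col], float(row[fc_col])))
      return hits

For example, ``read_significant_proteins("proteins.tsv")`` would read the
default output file named in the usage section.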



