Metadata-Version: 2.1
Name: pypgx
Version: 0.4.1
Summary: A Python package for pharmacogenomics research
Home-page: https://github.com/sbslee/pypgx
Author: Seung-been "Steven" Lee
Author-email: sbstevenlee@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Description-Content-Type: text/x-rst
Requires-Dist: fuc
Requires-Dist: scikit-learn

..
   This file was automatically generated by docs/create.py.

README
******

.. image:: https://badge.fury.io/py/pypgx.svg
    :target: https://badge.fury.io/py/pypgx

.. image:: https://readthedocs.org/projects/pypgx/badge/?version=latest
    :target: https://pypgx.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

Introduction
============

The main purpose of the PyPGx package is to provide a unified platform for pharmacogenomics (PGx) research.

The package is written in Python and supports both a command line interface (CLI) and an application programming interface (API), whose documentation is available at `Read the Docs <https://pypgx.readthedocs.io/en/latest/>`_.

Your contributions (e.g. feature ideas, pull requests) are most welcome.

| Author: Seung-been "Steven" Lee
| Email: sbstevenlee@gmail.com
| License: MIT License

Installation
============

The following packages are required to run PyPGx:

.. parsed-literal::

   fuc
   scikit-learn

There are various ways you can install PyPGx. The recommended way is via conda (`Anaconda <https://www.anaconda.com/>`__):

.. code-block:: text

   $ conda install -c bioconda pypgx

The command above will automatically download and install all the dependencies as well. Alternatively, you can use pip (`PyPI <https://pypi.org/>`__) to install PyPGx and all of its dependencies:

.. code-block:: text

   $ pip install pypgx

Finally, you can clone the GitHub repository and then install PyPGx locally:

.. code-block:: text

   $ git clone https://github.com/sbslee/pypgx
   $ cd pypgx
   $ pip install .

The advantage of this approach is that you have access to development versions that are not available in Anaconda or PyPI. For example, you can switch to a development branch with the ``git checkout`` command. When you do this, please make sure your environment already has all the dependencies installed.
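
As a quick sanity check before a local install, the snippet below is a minimal sketch of how you might confirm that the dependencies are importable. It is not part of PyPGx; the helper name ``find_missing`` is hypothetical, and the snippet assumes the import names ``fuc`` and ``sklearn`` (scikit-learn is imported as ``sklearn``):

.. code:: python3

   import importlib.util

   # Import names for the two dependencies listed above (assumed here);
   # scikit-learn is imported as ``sklearn``.
   REQUIRED_MODULES = ("fuc", "sklearn")

   def find_missing(modules=REQUIRED_MODULES):
       """Return the top-level modules that cannot be found in this environment."""
       return [name for name in modules if importlib.util.find_spec(name) is None]

   if __name__ == "__main__":
       missing = find_missing()
       if missing:
           print("Missing dependencies:", ", ".join(missing))
       else:
           print("All dependencies are installed.")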

Archive file, semantic type, and metadata
=========================================

To store and transfer data efficiently, PyPGx uses the ZIP archive file format (``.zip``), which supports lossless data compression. Each archive file created by PyPGx has a metadata file (``metadata.txt``) and a data file (e.g. ``data.tsv``, ``data.vcf``). A metadata file contains important information about the data file within the same archive, which is expressed as pairs of ``=``-separated keys and values (e.g. ``Assembly=GRCh37``):

.. list-table::
    :widths: 20 40 40
    :header-rows: 1

    * - Metadata
      - Description
      - Examples
    * - ``Assembly``
      - Reference genome assembly.
      - ``GRCh37``, ``GRCh38``
    * - ``Control``
      - Control gene.
      - ``VDR``, ``chr1:10000-20000``
    * - ``Gene``
      - Target gene.
      - ``CYP2D6``, ``GSTT1``
    * - ``Platform``
      - NGS platform.
      - ``WGS``, ``Targeted``
    * - ``Program``
      - Name of the phasing program.
      - ``Beagle``
    * - ``Samples``
      - Samples used for inter-sample normalization.
      - ``NA07000,NA10854,NA11993``
    * - ``SemanticType``
      - Semantic type of the archive.
      - ``CovFrame[CopyNumber]``, ``Model[CNV]``

Notably, all archive files have defined semantic types, which allows us to ensure that the data that is passed to a PyPGx command (CLI) or method (API) is meaningful for the operation that will be performed. Below is a list of currently defined semantic types:

- ``CovFrame[CopyNumber]``
    * CovFrame for storing target gene's per-base copy number which is computed from read depth with control statistics.
    * Requires the following metadata: ``Gene``, ``Assembly``, ``SemanticType``, ``Platform``, ``Control``, ``Samples``.
- ``CovFrame[ReadDepth]``
    * CovFrame for storing target gene's per-base read depth which is computed from BAM files.
    * Requires the following metadata: ``Gene``, ``Assembly``, ``SemanticType``, ``Platform``.
- ``Model[CNV]``
    * Model for calling CNV in target gene.
    * Requires the following metadata: ``Gene``, ``Assembly``, ``SemanticType``, ``Control``.
- ``SampleTable[Alleles]``
    * TSV file for storing target gene's candidate star alleles for each sample.
    * Requires the following metadata: ``Gene``, ``Assembly``, ``SemanticType``, ``Program``.
- ``SampleTable[CNVCalls]``
    * TSV file for storing target gene's CNV call for each sample.
    * Requires the following metadata: ``Gene``, ``Assembly``, ``SemanticType``, ``Control``.
- ``SampleTable[Genotypes]``
    * TSV file for storing target gene's genotype call for each sample.
    * Requires the following metadata: ``Gene``, ``Assembly``, ``SemanticType``.
- ``SampleTable[Results]``
    * TSV file for storing various results for each sample.
    * Requires the following metadata: ``Gene``, ``Assembly``, ``SemanticType``.
- ``SampleTable[Statistics]``
    * TSV file for storing control gene's various statistics on read depth for each sample. Used for converting target gene's read depth to copy number.
    * Requires the following metadata: ``Control``, ``Assembly``, ``SemanticType``, ``Platform``.
- ``VcfFrame[Consolidated]``
    * VcfFrame for storing target gene's consolidated variant data.
    * Requires the following metadata: ``Gene``, ``Assembly``, ``SemanticType``, ``Program``.
- ``VcfFrame[Imported]``
    * VcfFrame for storing target gene's raw variant data.
    * Requires the following metadata: ``Gene``, ``Assembly``, ``SemanticType``.
- ``VcfFrame[Phased]``
    * VcfFrame for storing target gene's phased variant data.
    * Requires the following metadata: ``Gene``, ``Assembly``, ``SemanticType``, ``Program``.
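
To illustrate the archive layout and the metadata requirements described above, here is a minimal sketch that reads ``metadata.txt`` from a ZIP archive with the standard library and reports missing keys for a given semantic type. It does not use the PyPGx API; the function names (``parse_metadata``, ``read_metadata``, ``missing_metadata``) are hypothetical, only a few semantic types are included, and the sketch assumes the archive contains exactly one member named ``metadata.txt`` (possibly inside a folder):

.. code:: python3

   import io
   import zipfile

   # Required metadata for a few of the semantic types listed above.
   REQUIRED_METADATA = {
       "CovFrame[ReadDepth]": {"Gene", "Assembly", "SemanticType", "Platform"},
       "SampleTable[Genotypes]": {"Gene", "Assembly", "SemanticType"},
       "VcfFrame[Phased]": {"Gene", "Assembly", "SemanticType", "Program"},
   }

   def parse_metadata(text):
       """Parse ``=``-separated key/value lines into a dictionary."""
       metadata = {}
       for line in text.splitlines():
           line = line.strip()
           if not line:
               continue
           # Split on the first ``=`` only, so values may contain ``=``.
           key, _, value = line.partition("=")
           metadata[key] = value
       return metadata

   def read_metadata(archive):
       """Read metadata.txt from a ZIP archive (path or file-like object)."""
       with zipfile.ZipFile(archive) as zf:
           # Assumption: one member is named metadata.txt, at any depth.
           names = [n for n in zf.namelist() if n.split("/")[-1] == "metadata.txt"]
           if len(names) != 1:
               raise ValueError("Expected exactly one metadata.txt in the archive")
           return parse_metadata(zf.read(names[0]).decode("utf-8"))

   def missing_metadata(metadata):
       """Return the required keys absent for the archive's semantic type."""
       required = REQUIRED_METADATA[metadata["SemanticType"]]
       return sorted(required - set(metadata))

   if __name__ == "__main__":
       # Build a small in-memory archive purely for demonstration.
       buffer = io.BytesIO()
       with zipfile.ZipFile(buffer, "w") as zf:
           zf.writestr(
               "example/metadata.txt",
               "Gene=CYP2D6\nAssembly=GRCh37\nSemanticType=VcfFrame[Phased]\n",
           )
           zf.writestr("example/data.vcf", "")
       buffer.seek(0)
       metadata = read_metadata(buffer)
       print(metadata)
       print("Missing:", missing_metadata(metadata))

In this demonstration the ``Program`` key is deliberately omitted, so it is reported as missing for ``VcfFrame[Phased]``.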

Getting help
============

For detailed documentation on the CLI and API, please refer to `Read the Docs <https://pypgx.readthedocs.io/en/latest/>`_.

To get help on the CLI:

.. code-block:: text

   $ pypgx -h

   usage: pypgx [-h] [-v] COMMAND ...

   positional arguments:
     COMMAND
       call-genotypes      Call genotypes for target gene.
       combine-results     Combine various results for the target gene.
       compute-control-statistics
                           Compute various statistics for control gene with BAM data.
       compute-copy-number
                           Compute copy number from read depth for target gene.
       compute-target-depth
                           Compute read depth for target gene with BAM data.
       create-consolidated-vcf
                           Create consolidated VCF.
       create-read-depth-tsv
                           Compute read depth for target gene with BAM data.
       create-regions-bed  Create a BED file which contains all regions used by PyPGx.
       estimate-phase-beagle
                           Estimate haplotype phase of observed variants with the Beagle program.
       filter-samples      Filter Archive file for specified samples.
       import-read-depth   Import read depth data for target gene.
       import-variants     Import variant data for target gene.
       plot-bam-copy-number
                           Plot copy number profile with BAM data.
       plot-bam-read-depth
                           Plot read depth profile with BAM data.
       plot-vcf-allele-fraction
                           Plot allele fraction profile with VCF data.
       plot-vcf-read-depth
                           Plot read depth profile with VCF data.
       predict-alleles     Predict candidate star alleles based on observed variants.
       predict-cnv         Predict CNV for target gene based on copy number data.
       print-metadata      Print the metadata of specified archive.
       run-ngs-pipeline    Run NGS pipeline for the target gene.
       test-cnv-caller     Test a CNV caller for the target gene.
       train-cnv-caller    Train a CNV caller for the target gene.

   optional arguments:
     -h, --help            Show this help message and exit.
     -v, --version         Show the version number and exit.

To get help on a specific command (e.g. ``call-genotypes``):

.. code-block:: text

   $ pypgx call-genotypes -h

Below is the list of submodules available in the API:

- **genotype** : The genotype submodule is a suite of tools for accurately predicting genotype calls.
- **pipeline** : The pipeline submodule is used to provide convenient methods that combine multiple PyPGx actions and automatically handle semantic types.
- **plot** : The plot submodule is used to plot various kinds of profiles such as read depth, copy number, and allele fraction.
- **utils** : The utils submodule is the main suite of tools for PGx research.


To get help on a specific submodule (e.g. ``utils``):

.. code:: python3

   >>> from pypgx.api import utils
   >>> help(utils)

CLI examples
============

Run the NGS pipeline for CYP2D6:

.. code-block:: text

   $ pypgx run-ngs-pipeline \
   CYP2D6 \
   CYP2D6-pipeline \
   --vcf input.vcf \
   --panel ref.vcf \
   --tsv input.tsv \
   --control-statistics control-statistics-VDR.zip

API examples
============

Predict phenotype based on two haplotype calls:

.. code:: python3

    >>> import pypgx
    >>> pypgx.predict_phenotype('CYP2D6', '*4', '*5')   # Both alleles have no function
    'Poor Metabolizer'
    >>> pypgx.predict_phenotype('CYP2D6', '*5', '*4')   # The order of alleles does not matter
    'Poor Metabolizer'
    >>> pypgx.predict_phenotype('CYP2D6', '*1', '*22')  # *22 has uncertain function
    'Indeterminate'
    >>> pypgx.predict_phenotype('CYP2D6', '*1', '*1x2') # Gene duplication
    'Ultrarapid Metabolizer'
    >>> pypgx.predict_phenotype('CYP2B6', '*1', '*4')   # *4 has increased function
    'Rapid Metabolizer'


