Metadata-Version: 2.1
Name: pyPINTS
Version: 1.1.5
Summary: Peak Identifier for Nascent Transcripts Starts (PINTS)
Home-page: https://pints.yulab.org
Author: Li Yao
Author-email: regulatorygenome@gmail.com
License: GPL
Platform: UNKNOWN
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy (>=1.19.2)
Requires-Dist: pandas (>=1.1.5)
Requires-Dist: scipy (>=1.5.2)
Requires-Dist: pysam (>=0.16.0.1)
Requires-Dist: requests
Requires-Dist: pybedtools (>=0.8.1)
Requires-Dist: statsmodels (>=0.12.1)
Requires-Dist: pyBigWig
Requires-Dist: biopython
Requires-Dist: matplotlib

# PINTS: Peak Identifier for Nascent Transcripts Starts

![](https://img.shields.io/badge/platform-linux%20%7C%20osx-lightgrey.svg)
![](https://img.shields.io/badge/python-3.x-blue.svg)
[![PyPI](https://github.com/liyao001/PINTS/actions/workflows/python-publish.yml/badge.svg)](https://github.com/liyao001/PINTS/actions/workflows/python-publish.yml)

## Installation

PINTS is available on PyPI and can be installed with the following command:

```shell
pip install pyPINTS
```

Alternatively, clone this repo to a local directory and run the following command inside that directory:

```shell
python setup.py install
```

## Prerequisites

Python packages

* biopython
* matplotlib
* numpy
* pandas
* pybedtools
* pyBigWig
* pysam
* requests
* scipy
* statsmodels

## Get started

PINTS can call peaks directly from BAM files. To do so,
provide the path to the BAM file and the type of experiment that generated it.
If the data come from a standard protocol, like [PROcap](https://doi.org/10.1038/nprot.2016.086), you can set `--exp-type PROcap`.
Other supported experiments include [GROcap](https://doi.org/10.7554/eLife.00808)/
[CoPRO](https://doi.org/10.1038/s41588-018-0234-5)/
[csRNAseq](https://doi.org/10.1101/gr.253492.119)/
[NETCAGE](https://doi.org/10.1038/s41588-019-0485-9)/
[CAGE](https://doi.org/10.1038/nmeth0306-211)/
[RAMPAGE](https://doi.org/10.1101/gr.139618.112)/
[STRIPEseq](https://doi.org/10.1101/gr.261545.120). For a comprehensive list of directly supported assays, run

```shell
pints_caller --help
```

If the data were generated by another method, you need to tell the tool where to find the ends of the RNAs you are interested in.
For example, `--exp-type R_5` tells the tool that:

1. this alignment is from a single-end library;
2. the tool should look at the 5' end of the reads.

Other supported values are `R_3`, `R1_5`, `R1_3`, `R2_5`, `R2_3`.

If reads represent the reverse complement of the original RNAs, as in PROseq, you also need to set `--reverse-complement`
(not necessary for standard protocols).
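
The naming scheme of the custom `--exp-type` values can be read mechanically. The sketch below is only an illustration of that scheme: `describe_exp_type` is a hypothetical helper, not a function provided by PINTS, and it handles only the six custom values listed in this README.

```python
import re

# Hypothetical helper, not part of PINTS: it only decodes the custom
# --exp-type values named in this README (R_5, R_3, R1_5, R1_3, R2_5, R2_3).
_CUSTOM_EXP_TYPE = re.compile(r"^R([12]?)_([53])$")


def describe_exp_type(exp_type):
    """Return (library layout, read, end of read) for a custom --exp-type value."""
    match = _CUSTOM_EXP_TYPE.match(exp_type)
    if match is None:
        raise ValueError(f"not a custom --exp-type value: {exp_type!r}")
    mate, end = match.groups()
    # No mate number means a single-end library; 1 or 2 means paired-end.
    layout = "paired-end" if mate else "single-end"
    read = f"read{mate}" if mate else "read"
    return layout, read, f"{end}'"


print(describe_exp_type("R_5"))
print(describe_exp_type("R2_3"))
```

Named protocols such as `PROcap` are deliberately rejected by this helper, because their read-end settings are defined inside PINTS and are not described here.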

An example of calling peaks from a BAM file:

```shell
pints_caller --bam-file input.bam --save-to output_dir --file-prefix output_prefix --thread 16 --exp-type PROcap
```

Or you can call peaks from BigWig files:

```shell
pints_caller --save-to output_dir --file-prefix output_prefix --bw-pl path_to_pl.bw --bw-mn path_to_mn.bw --thread 16
```
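
If you drive PINTS from a Python pipeline, the BAM command above can be assembled as an argument list. This is a minimal sketch that uses only the flags shown in this README; `build_pints_command` is a hypothetical helper name, and the file names are placeholders.

```python
import shlex


def build_pints_command(bam_files, save_to, file_prefix, exp_type, threads=1):
    """Assemble a pints_caller argument list from the options shown above."""
    command = ["pints_caller", "--bam-file", *bam_files]
    command += ["--save-to", save_to, "--file-prefix", file_prefix]
    command += ["--thread", str(threads), "--exp-type", exp_type]
    return command


command = build_pints_command(
    ["input.bam"], "output_dir", "output_prefix", "PROcap", threads=16
)
# Print the command as it would be typed in a shell.
print(shlex.join(command))

# To actually run it (requires pyPINTS to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when paths contain spaces.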

## Outputs

* prefix+`_{SID}_divergent_peaks.bed`: Divergent TREs;
* prefix+`_{SID}_bidirectional_peaks.bed`: Bidirectional TREs (divergent + convergent);
* prefix+`_{SID}_unidirectional_peaks.bed`: Unidirectional TREs, possibly lncRNAs transcribed from enhancers (e-lncRNAs), as suggested [here](http://www.nature.com/articles/s41576-019-0184-5).

`{SID}` is replaced with the number of the sample that the peaks were called from.
If you provide PINTS with only one sample, `{SID}` is replaced with **1**.
If you run PINTS with three replicates (`--bam-file A.bam B.bam C.bam`), `{SID}` for peaks identified from `A.bam` is replaced with **1**.
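
The naming pattern above can be expanded as follows. This is a sketch: `expected_outputs` is a hypothetical helper, and it assumes the files are written directly inside the `--save-to` folder.

```python
from pathlib import Path

# File name endings taken from the list of outputs above.
OUTPUT_SUFFIXES = (
    "divergent_peaks.bed",
    "bidirectional_peaks.bed",
    "unidirectional_peaks.bed",
)


def expected_outputs(save_to, file_prefix, sid):
    """Build the expected output paths for one sample (hypothetical helper)."""
    return [
        Path(save_to) / f"{file_prefix}_{sid}_{suffix}"
        for suffix in OUTPUT_SUFFIXES
    ]


for path in expected_outputs("output_dir", "output_prefix", 1):
    print(path.name)
```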

For divergent or bidirectional TREs, there will be 6 columns in the outputs:

1. Chromosome
2. Start site: 0-based
3. End site: 0-based
4. Confidence about the peak pair. Can be:
   * `Stringent(qval)`, which means the peaks on both the forward and reverse strands are significant based on their *q*-values;
   * `Stringent(pval)`, which means one peak is significant according to its *q*-value while the other one is significant according to its *p*-value;
   * `Relaxed`, which means only one peak in the pair is significant;
   * A combination of the three types above, because nearby elements can overlap.
   * If epigenomic annotation is enabled by `--epig-annotation <biosample>`, then peaks that are less significant (`--relaxed-fdr-target`, default is 2*`fdr_target`) but overlap with epigenomic annotations from the PINTS web server will be listed with the confidence level `Marginal`.
5. Major TSSs on the forward strand; multiple major TSSs are separated by commas (`,`)
6. Major TSSs on the reverse strand; multiple major TSSs are separated by commas (`,`)
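
A minimal sketch of reading one line in this six-column layout is shown below. It assumes the file is tab-separated and that the major TSSs are integer coordinates; the `BidirectionalTRE` type, the `parse_bidirectional_line` helper, and the example line are made up for illustration and are not PINTS output.

```python
from typing import NamedTuple, Tuple


class BidirectionalTRE(NamedTuple):
    chrom: str
    start: int
    end: int
    confidence: str
    forward_tss: Tuple[int, ...]
    reverse_tss: Tuple[int, ...]


def parse_bidirectional_line(line):
    """Parse the first six tab-separated columns of a divergent/bidirectional record."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, confidence, fwd, rev = fields[:6]
    return BidirectionalTRE(
        chrom,
        int(start),
        int(end),
        confidence,  # kept as-is, since it may be a combination of labels
        tuple(int(pos) for pos in fwd.split(",")),
        tuple(int(pos) for pos in rev.split(",")),
    )


# Made-up line for illustration only.
example = "chr1\t1000\t1300\tStringent(qval)\t1180,1185\t1100\n"
print(parse_bidirectional_line(example))
```

Any extra column, such as the epigenomic annotation, is ignored by this sketch.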

For unidirectional TREs, there will be 9 columns in the output:

1. Chromosome
2. Start
3. End
4. Peak ID
5. Q-value
6. Strand
7. Read counts
8. Position of the summit TSS
9. Height of the summit

For all three types of TREs, if a valid biosample name for `--epig-annotation` is provided, then an additional column with epigenomic annotation for each TRE will show up in the final output.
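
The nine-column layout, with the optional annotation column, could be read as sketched below. The column names, the `read_unidirectional` helper, and the example line are assumptions made for illustration; the sketch also assumes tab-separated files and keeps most values as strings rather than guessing their types.

```python
import csv
import io

# Illustrative column names for the nine columns listed above.
UNIDIRECTIONAL_COLUMNS = [
    "chrom", "start", "end", "peak_id", "q_value",
    "strand", "read_counts", "summit_position", "summit_height",
]


def read_unidirectional(handle):
    """Read unidirectional TRE records from a tab-separated file handle."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip empty lines
        record = dict(zip(UNIDIRECTIONAL_COLUMNS, row))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        # A tenth column is present only when --epig-annotation was used.
        if len(row) > len(UNIDIRECTIONAL_COLUMNS):
            record["epig_annotation"] = row[len(UNIDIRECTIONAL_COLUMNS)]
        records.append(record)
    return records


# Made-up line for illustration only.
example = io.StringIO("chr1\t2000\t2100\tpeak_1\t0.001\t+\t42\t2050\t17\n")
print(read_unidirectional(example))
```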

## Parameters

### Input & Output

* If you want to use BAM files as inputs:
  * `--bam-file`: input bam file(s);
  * `--exp-type`: Type of experiment. If the experiment is not listed as a choice, or you know the position of RNA ends on the reads and you want to override the defaults, you can specify:
    * `R_5` (5' of the read for single-end lib),
    * `R_3` (3' of the read for single-end lib),
    * `R1_5` (5' of the read1 for paired-end lib),
    * `R1_3` (3' of the read1 for paired-end lib),
    * `R2_5` (5' of the read2 for paired-end lib),
    * or `R2_3` (3' of the read2 for paired-end lib)
  * `--reverse-complement`: Set this switch if 1) `exp-type` is `Rx_x` and 2) reads in this library represent the reverse complement of RNAs, like PROseq;
  * `--ct-bam`: Bam file for input/control (optional);
* If you want to use bigwig files as inputs:
  * `--bw-pl`: Bigwig for signals on the forward strand;
  * `--bw-mn`: Bigwig for signals on the reverse strand;
  * `--ct-bw-pl`: Bigwig for input/control signals on the forward strand (optional);
  * `--ct-bw-mn`: Bigwig for input/control signals on the reverse strand (optional);
* `--save-to`: Save peaks to this path (a folder); by default, the current folder;
* `--file-prefix`: Prefix for all outputs

### Optional parameters

* `--epig-annotation <biosample>`: Use this option together with the name of the biosample that the library was derived from, for example K562; then epigenomic annotations will be downloaded from the PINTS web server and used for annotating and augmenting TREs identified by PINTS **(for hg38 only)**;
* `--relaxed-fdr-target <relaxed fdr>`: In the presence of `--epig-annotation`, peaks that do not pass the original FDR cutoff but pass this relaxed cutoff and have support from DNase-seq and H3K27ac ChIP-seq will also be included in final outputs. By default, 2*fdr;
* `--mapq-threshold <min mapq>`: Minimum mapping quality, by default: 30 or `None`;
* `--close-threshold <close distance>`: Distance threshold for two peaks (on opposite strands) to be merged, by default: 300;
* `--fdr-target <fdr>`: FDR target for multiple testing, by default: 0.1;
* `--chromosome-start-with <chromosome prefix>`: Only keep reads mapped to chromosomes with this prefix. If it is set to `None`, all reads will be analyzed;
* `--thread <n thread>`: Max number of threads the tool can create;
* `--borrow-info-reps`: Borrow information from replicates to refine calling of divergent elements;
* `--output-diagnostic-plot`: Save diagnostic plots (independent filtering and *p*-value distribution) to a local folder
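
The relationship between `--fdr-target` and `--relaxed-fdr-target` can be sketched as below. This only illustrates the two cutoffs described above and is not how PINTS implements the test: `classify_q_value` is a hypothetical helper, and whether the comparisons are strict is an assumption.

```python
def classify_q_value(q_value, fdr_target=0.1, relaxed_fdr_target=None):
    """Illustrate how the two FDR cutoffs partition q-values (not PINTS code)."""
    if relaxed_fdr_target is None:
        # Documented default: twice the FDR target.
        relaxed_fdr_target = 2 * fdr_target
    if q_value < fdr_target:
        return "passes --fdr-target"
    if q_value < relaxed_fdr_target:
        # Only kept with --epig-annotation and overlapping epigenomic support.
        return "passes --relaxed-fdr-target only"
    return "fails both cutoffs"


for q in (0.05, 0.15, 0.3):
    print(q, classify_q_value(q))
```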

More parameters can be seen by running `pints_caller -h`.

## Other tools

* `pints_boundary_extender`: Extend peaks from summits.
* `pints_visualizer`: Generate bigwig files for the inputs.
* `pints_normalizer`: Normalize inputs.

## Tips

1. Be cautious about reads mapped to scaffolds instead of the main chromosomes (for example, the notorious `chrUn_gl000220` in `hg19`); they may be rRNA contamination!
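
One way to check how many called peaks fall on scaffolds is sketched below. It assumes UCSC-style names, where main chromosomes look like `chr1`, `chrX`, `chrY`, or `chrM` and everything else is treated as a scaffold; `count_peaks_by_contig_type` is a hypothetical helper and the example lines are made up.

```python
import re
from collections import Counter

# Assumption: UCSC-style naming; adjust the pattern for other genome builds.
MAIN_CHROMOSOME = re.compile(r"^chr(\d+|X|Y|M)$")


def count_peaks_by_contig_type(bed_lines):
    """Count BED records on main chromosomes versus scaffolds."""
    counts = Counter()
    for line in bed_lines:
        if not line.strip():
            continue  # skip empty lines
        chrom = line.split("\t", 1)[0]
        key = "main" if MAIN_CHROMOSOME.match(chrom) else "scaffold"
        counts[key] += 1
    return counts


# Made-up records for illustration only.
example_lines = [
    "chr1\t1000\t1300",
    "chrUn_gl000220\t500\t800",
    "chrX\t42\t342",
]
print(count_peaks_by_contig_type(example_lines))
```

Note that a prefix such as `chr` would not exclude `chrUn_gl000220`, which is why this sketch matches the full chromosome name.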

## Contact

Please submit an issue with any questions or if you experience any issues/bugs. If you use PINTS in your work, please cite: [https://www.nature.com/articles/s41587-022-01211-7](https://www.nature.com/articles/s41587-022-01211-7).


