Metadata-Version: 2.1
Name: phylopypruner
Version: 0.1.2
Summary: tree-based orthology inference
Home-page: https://gitlab.com/fethalen/phylopypruner
Author: Felix Thalen
Author-email: fe1430th-s@student.lu.se
License: MIT
Keywords: orthology inference,orthologs,tree-based,phylogenetics,phylogenomics,orthology
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown

PhyloPyPruner
-------------

PhyloPyPruner is a tree-based orthology inference program that refines the
output of a graph-based orthology inference approach. In addition to
implementing previously published paralogy pruning algorithms seen in
[PhyloTreePruner](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3825643/),
[UPhO](https://academic.oup.com/mbe/article/33/8/2117/2578877),
[Agalma](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3840672/) and [Yang and
Smith's
phylogenomic_dataset_construction](https://www.ncbi.nlm.nih.gov/pubmed/25158799),
this software provides tools for identifying and removing operational
taxonomic units (OTUs) that display contamination-like issues.

PhyloPyPruner is currently under active development. I would appreciate it if
you tried this software on your own data and [left
feedback](mailto:felix.thalen.1430@student.lu.se).

See [the Wiki](https://gitlab.com/fethalen/phylopypruner/wikis) for more
details.

### Features

* Remove short sequences
* Remove sequences with long branches
* Collapse weakly supported nodes into polytomies
* Five different paralogy pruning algorithms
* Measure and remove OTUs with frequent paralogs
* Identify problematic OTUs using taxon jackknifing
* Exclude certain OTUs
* Specify taxonomic groups and see how often they form a phylogenetic group
* Mask monophylies by choosing the longest sequence or using pairwise distance

### Installation

This software runs under both Python 3 and 2.7. There are no required external
dependencies. The plotting library [Matplotlib](https://matplotlib.org/) is
optional and only needed for generating paralog frequency plots.

You can install PhyloPyPruner using pip.

```bash
pip install --user phylopypruner
```

### Usage

Once installed, execute this software like so:

```bash
python -m phylopypruner
```

To get a list of options, either run the software without any arguments or use
the `-h` or `--help` flag.

Provide a single multiple sequence alignment (MSA) and a Newick tree by using
the `--msa` and `--tree` flags:

```bash
python -m phylopypruner --msa 16s.fas --tree 16s.tre
```

Alternatively, provide the path to an input directory that contains multiple
trees and alignments by using `--dir <path>`.

FASTA descriptions and Newick names must match and must be in one of the
following formats: `OTU|ID` or `OTU@ID`, where `OTU` is the operational
taxonomic unit (usually the species) and `ID` is a unique annotation or
sequence identifier. For example: `>Meiomenia_swedmarki|Contig00001_Hsp90`.
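
To check names before a run, the naming convention above can be sketched in a
few lines of Python. This is only an illustration of the `OTU|ID` and `OTU@ID`
formats, not PhyloPyPruner's own parser. The function name `split_description`
is hypothetical, and splitting at the first delimiter is an assumption.

```python
def split_description(description):
    """Split a FASTA description or Newick name into (OTU, ID).

    Hypothetical helper: assumes the first '|' or '@' separates OTU from ID.
    """
    # Drop the leading '>' of a FASTA description line, if present.
    name = description.lstrip(">").strip()
    for delimiter in ("|", "@"):
        if delimiter in name:
            otu, identifier = name.split(delimiter, 1)
            # Both parts must be non-empty for the name to be usable.
            if otu and identifier:
                return otu, identifier
    raise ValueError("expected 'OTU|ID' or 'OTU@ID', got: %r" % description)


otu, identifier = split_description(">Meiomenia_swedmarki|Contig00001_Hsp90")
print("OTU: %s, ID: %s" % (otu, identifier))
```

Names that contain neither delimiter raise a `ValueError`, which makes
malformed descriptions easy to spot before they reach the program.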

Sequence descriptions and tree names must not deviate from each other.
Sequence data must consist of [valid IUPAC nucleotide or amino acid
sequences](https://www.bioinformatics.org/sms/iupac.html).
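
A rough way to screen sequences for unexpected characters is sketched below.
The alphabets are assumptions: the nucleotide set holds the IUPAC bases and
ambiguity codes, the amino acid set holds the 20 standard residues plus the
ambiguity codes `B`, `Z` and `X`, and `-` and `.` are treated as gaps. Which
characters PhyloPyPruner itself accepts may differ, and `invalid_characters`
is a hypothetical helper.

```python
# Assumed alphabets; adjust them to whatever your data and the program allow.
NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBZX")
GAPS = set("-.")


def invalid_characters(sequence, alphabet):
    """Return the sorted characters in `sequence` outside `alphabet` and GAPS."""
    # Comparison is case-insensitive, so lower-case residues are accepted.
    return sorted(set(sequence.upper()) - alphabet - GAPS)


print(invalid_characters("ACGTN-acgu", NUCLEOTIDES))
print(invalid_characters("ACGJ", NUCLEOTIDES))
```

An empty list means that every character belongs to the chosen alphabet.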

To process multiple files, provide the path to the directory in which these
files reside.

```bash
python -m phylopypruner --dir <path>
```

The program will automatically look for trees and alignments with the same name
and run once for each of these pairs.
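
To preview which files would be paired, the matching by name can be
approximated as follows. This is a sketch under stated assumptions, not the
program's actual lookup: the function `pair_files` is hypothetical, "same
name" is taken to mean the same file name without its extension, and the
extension sets are guesses based on the `16s.fas` and `16s.tre` example above.

```python
import os

# Assumed file extensions; the program may recognize a different set.
ALIGNMENT_EXTENSIONS = {".fa", ".fas", ".fasta"}
TREE_EXTENSIONS = {".tre", ".tree", ".nwk", ".newick"}


def pair_files(directory):
    """Return (alignment, tree) path pairs that share a name in `directory`."""
    alignments, trees = {}, {}
    for filename in sorted(os.listdir(directory)):
        stem, extension = os.path.splitext(filename)
        path = os.path.join(directory, filename)
        if extension.lower() in ALIGNMENT_EXTENSIONS:
            alignments[stem] = path
        elif extension.lower() in TREE_EXTENSIONS:
            trees[stem] = path
    # Keep only the names that have both an alignment and a tree.
    return [(alignments[stem], trees[stem])
            for stem in sorted(alignments) if stem in trees]


if __name__ == "__main__":
    for alignment, tree in pair_files("."):
        print("%s <-> %s" % (alignment, tree))
```

Files without a partner are silently skipped in this sketch, so an alignment
that is missing from the preview most likely lacks a tree with the same name.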

### Output files

The following files are generated after running this program.

* `<timestamp>_<orthologs>/...` – output alignments
* `<timestamp>_ppp_summary.log` – summary statistics for all alignments
* `<timestamp>_ppp_run.log` – detailed report of each performed action
* `<timestamp>_ppp_ortho_stats.csv` – statistics for output alignments
* `<timestamp>_ppp_paralog_freq.csv` – paralogy frequency data
* `<timestamp>_ppp_paralog_freq.png` – paralogy frequency plot\*

If no output directory has been specified by the `--output` flag, then output
files will be located within the same directory as the input alignment files.

\* – only produced if [Matplotlib](https://matplotlib.org/) is installed

© [Kocot Lab](https://www.kocotlab.com/) 2018


