Metadata-Version: 2.1
Name: PhyloFunc
Version: 1.0.6
Summary: calculating protein-level functional distances
Author: Luman Wang
Author-email: lumanottawa@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: pandas
Requires-Dist: Bio

# PhyloFunc

PhyloFunc is a Python package that calculates phylogeny-informed functional beta-diversity distances between any pair of samples. It can reveal how strongly phylogenetic context shapes functional diversity within metaproteomes.

## Installation of package

You can install this package via pip:  
pip install PhyloFunc

## Usage
Once installed, you can use the PhyloFunc package in a Python script or an interactive environment.

### Quick Start

#### Import package

from PhyloFunc import PhyloFunc_distance  
from PhyloFunc import PhyloFunc_matrix

#### Script

#### Calculate PhyloFunc distance for sample pair

#### 1. Use the embedded Genome-COG metaproteomic data file for sample pair and embedded UHGG tree file
PhyloFunc_distance()

#### 2. Use a custom metaproteomic data file or a custom phylogenetic tree constructed from the corresponding metagenomic data
PhyloFunc_distance(tree_file='phylogenetic_tree_file.nwk', sample_file='metaproteomic_sample_file.csv')

#### Calculate PhyloFunc distance matrix for all samples

#### 1. Use the embedded taxon-function metaproteomic data file for all samples and embedded UHGG tree file
PhyloFunc_matrix()

#### 2. Use a custom metaproteomic data file or a custom phylogenetic tree constructed from the corresponding metagenomic data
PhyloFunc_matrix(tree_file='phylogenetic_tree_file.nwk', sample_file='metaproteomic_sample_file.csv')

### Parameters
tree_file: A phylogenetic tree file in Newick format, which can be built from metagenomic data using software such as MEGA. If no file is provided, the package uses the embedded UHGG tree file: bac120_iqtree_v2.0.1.nwk.
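
To illustrate the Newick format expected for `tree_file`, the sketch below extracts the tip labels from a small, made-up tree using only the standard library. The tree string, the genome identifiers, and the helper `newick_leaf_names` are hypothetical and not part of PhyloFunc, and the regular expression handles only simple unquoted labels. Whether PhyloFunc requires tip labels to match the Genome column exactly is an assumption here, but a check like this is a cheap way to catch naming mismatches before computing distances.

```python
import re


def newick_leaf_names(newick: str) -> list[str]:
    """Return the tip labels of a simple Newick string (unquoted labels only)."""
    # A tip label directly follows "(" or ","; internal node labels follow ")".
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


# Hypothetical three-genome tree with branch lengths.
example_tree = "((genome_A:0.1,genome_B:0.2):0.3,genome_C:0.4);"
leaves = newick_leaf_names(example_tree)

# Hypothetical genome IDs as they might appear in the Genome column.
sample_genomes = {"genome_A", "genome_C", "genome_D"}

# Genomes present in the sample data but absent from the tree.
missing_from_tree = sorted(sample_genomes - set(leaves))

print(leaves)
print(missing_from_tree)
```

For real trees with quoted labels or comments, a dedicated Newick parser would be a safer choice than this regular expression.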

sample_file: A CSV file containing the Genome-COG metaproteomic data. To calculate the distance between two samples, the file should have four columns: Genome, COG accession, the intensity for sample 1, and the intensity for sample 2. To calculate a distance matrix across multiple samples, the first two columns should be Genome and COG accession, followed by one intensity column per sample. If no file is provided, the package uses the embedded sample file from a human gut microbiome experiment referenced in our paper.
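
The layout described for `sample_file` can be sketched as follows. This writes a small CSV with made-up values using only the standard library. The genome IDs, COG accessions, intensities, sample column names, and the exact header spellings are illustrative assumptions, so compare them with the embedded example files before relying on them.

```python
import csv
import os
import tempfile

# Header: two identifier columns followed by one intensity column per sample.
header = ["Genome", "COG accession", "sample_1", "sample_2", "sample_3"]

# Made-up rows: one row per (genome, COG) pair.
rows = [
    ["genome_A", "COG0001", 1200.0, 0.0, 845.5],
    ["genome_A", "COG0002", 310.2, 95.1, 0.0],
    ["genome_B", "COG0001", 0.0, 4020.7, 77.3],
]

csv_path = os.path.join(tempfile.mkdtemp(), "metaproteomic_sample_file.csv")
with open(csv_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(header)
    writer.writerows(rows)

# Read the file back to confirm the layout.
with open(csv_path, newline="") as handle:
    loaded = list(csv.reader(handle))

# Every column after the first two is treated as one sample.
n_samples = len(loaded[0]) - 2
print(n_samples)
```

A file with exactly two intensity columns corresponds to the sample-pair case; the resulting path would then be passed as the `sample_file` argument.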

### Output
PhyloFunc_distance outputs the PhyloFunc distance for a sample pair, and PhyloFunc_matrix outputs the PhyloFunc distance matrix for all samples.

## Performance optimization
This package reduces disk I/O by processing data in memory, which speeds up computation on large datasets.

## Project structure
```
PhyloFunc/  
├── __init__.py  
├── PhyloFunc.py  
│  └── The main function code.  
├── data/  
│  ├── Taxon_Function_distance.csv
│  │  └── Data file for calculating the distance between two samples. 
│  ├── Taxon_Function_matrix.csv
│  │  └── Data file for calculating distances among multiple samples.  
│  └── bac120_iqtree_v2.0.1.nwk  
│     └── Phylogenetic tree file.  
└── PhyloFunc_Package_Tutorial.ipynb
   └── Demonstrates the specific application of this package.
```

## Contribution
Code contributions and suggestions for improvement are welcome! Feel free to submit an issue or a pull request on GitHub.

## License
This project is released under the MIT License. For details, see the LICENSE file.

## Application
For more detailed usage instructions, please refer to the paper:
Wang and Li et al., PhyloFunc: Phylogeny-informed Functional Distance as a New Ecological Metric for Metaproteomic Data Analysis
doi: https://doi.org/10.1101/2024.05.28.596184
