Metadata-Version: 2.4
Name: fdog
Version: 1.1.6
Summary: Feature-aware Directed OrtholoG search tool
Home-page: https://github.com/BIONF/fDOG
Author: Vinh Tran
Author-email: tran@bio.uni-frankfurt.de
License: GPL-3.0
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.12.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: biopython
Requires-Dist: tqdm
Requires-Dist: ete4
Requires-Dist: six
Requires-Dist: PyYAML
Requires-Dist: pyhmmer
Requires-Dist: pysam
Requires-Dist: pandas
Requires-Dist: greedyFAS>=1.19.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# fDOG - Feature-aware Directed OrtholoG search
[![published in: MBE](https://img.shields.io/badge/published%20in-MBE-ff69b4)](https://doi.org/10.1093/molbev/msaf120)
[![PyPI version](https://badge.fury.io/py/fdog.svg)](https://pypi.org/project/fdog/)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
![Github Build](https://github.com/BIONF/fDOG/workflows/build/badge.svg)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17250793.svg)](https://doi.org/10.5281/zenodo.17250793)


# Table of Contents
* [How to install](#how-to-install)
     * [Install the fDOG package](#install-the-fdog-package)
     * [Setup fDOG](#setup-fdog)
* [Usage](#usage)
* [fDOG data set](#fdog-data-set)
     * [Adding a new gene set into fDOG](#adding-a-new-gene-set-into-fdog)
     * [Adding a list of gene sets into fDOG](#adding-a-list-of-gene-sets-into-fdog)
* [fDOG-Assembly](#fdog-assembly)
* [Bugs](#bugs)
* [How to cite](#how-to-cite)
* [Contributors](#contributors)
* [Contact](#contact)

# How to install

The *fDOG* tool is distributed as a Python package called *fdog*. It is compatible with [Python ≥ v3.12](https://www.python.org/downloads/).
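If you are unsure whether your interpreter satisfies this requirement, a quick check along the following lines can help. This is only a minimal sketch; the helper name `meets_minimum` is made up for illustration and is not part of *fdog*.

```python
import sys

MIN_PYTHON = (3, 12)  # minimum version stated above


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True if a (major, minor, ...) tuple satisfies the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


print("Python version OK for fdog:", meets_minimum())
```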

## Install the fDOG package

**_RECOMMENDATION:_** Install fDOG in a fresh conda-like environment to ensure compatibility and prevent conflicts with other packages. We recommend using Mamba or Micromamba instead of Conda for better performance, faster dependency resolution and more reliable environment management.

### Using a Mamba/Conda Environment

1. Follow [these instructions](https://mamba.readthedocs.io/en/latest/) to install Mamba or Micromamba (a faster package manager for conda)

2. Create a new environment

```
mamba create -n fdog python -y
```

3. Activate the environment

```
mamba activate fdog
```

4. Install *fdog* using `pip`:

```
python3 -m pip install fdog
```

### Without conda

1. Install *fdog* globally (requires admin rights)

```
python3 -m pip install fdog
```

2. Install *fdog* for a single user (no admin rights needed)

```
python3 -m pip install --user fdog
```

3. Add local bin to PATH (if using `--user`)

Append this line to the end of your **~/.bashrc** or **~/.bash_profile** file

```
export PATH=$HOME/.local/bin:$PATH
```

Then, reload the current terminal to apply the change (or run `source ~/.bashrc`)
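
To confirm that the directory is really on your `PATH` after reloading, you could run a small check like the one below. It is a sketch only; `is_on_path` is a hypothetical helper, and `~/.local/bin` is simply the location used in the line above.

```python
import os
from pathlib import Path


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = Path(directory).expanduser()
    entries = [Path(p).expanduser() for p in path_value.split(os.pathsep) if p]
    return target in entries


print("~/.local/bin on PATH:", is_on_path("~/.local/bin"))
```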

## Setup fDOG

After installing *fdog*, you need to set it up to get its dependencies and pre-calculated data.

**IMPORTANT NOTE**: if you haven't installed [greedyFAS](https://github.com/BIONF/FAS), it will be automatically installed during the *fDOG* setup. After installation, you must run [setupFAS](https://github.com/BIONF/FAS/wiki/setupFAS) before using *fDOG* with *FAS*! This step is required to configure *FAS* correctly. You can run *fDOG* without *FAS* by adding the `--fasOff` option to your command. However, it is recommended to use *FAS* to access all the features of *fDOG*.
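
One way to see whether the `greedyFAS` distribution is already present in your environment is to query the package metadata from Python. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name. Note that it only tells you whether the package is installed, not whether `setupFAS` has been run.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


fas_version = installed_version("greedyFAS")
if fas_version is None:
    print("greedyFAS not found in this environment")
else:
    print(f"greedyFAS {fas_version} is installed; remember to run setupFAS")
```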

You can set up fDOG by running this command:
```
fdog.setup -d /output/path/for/fdog/data
```

The [pre-calculated data set](https://github.com/BIONF/fDOG/wiki/Input-and-Output-Files#data-structure) of *fdog* will be saved in `/output/path/for/fdog/data`. After the setup has run successfully, you can start using *fdog*. **Please make sure to check if you need to run [setupFAS](https://github.com/BIONF/FAS/wiki/setupFAS) first.**

You will get a warning if any of the dependencies are not ready to use. Please resolve those issues and rerun `fdog.setup`.

*For debugging the setup, please create a log file by running the setup as e.g. `fdog.setup | tee log.txt` and send us that log file, so that we can troubleshoot the issues. Most problems can be solved by simply re-running the setup.*
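
If you prefer to drive the setup from Python, the same "show and save" behaviour can be sketched with `subprocess`. The helper `run_and_log` is hypothetical and not part of *fdog*. Unlike the plain `| tee` pipe, this sketch also redirects error messages into the log, which is a deliberate difference.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run `cmd`, echo its output to the terminal, and copy it into `log_path`."""
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep warnings and errors in the log too
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()  # exit code of the command
```

For example, `run_and_log(["fdog.setup", "-d", "/output/path/for/fdog/data"], "log.txt")` would run the setup command shown above and write its output to `log.txt`.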

# Usage

Once *fdog* is installed and set up correctly, you can test it with the provided sample input file `infile.fa`.

## Running fDOG with FAS
```
fdog.run --seqFile infile.fa --jobName test --refspec HUMAN@9606@qfo24_02
```

## Running fDOG without FAS

If FAS has not been set up, add the `--fasOff` option

```
fdog.run --seqFile infile.fa --jobName test --refspec HUMAN@9606@qfo24_02 --fasOff
```
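
When calling *fdog* from a script, it can be convenient to assemble the command as an argument list. The following sketch uses only the options shown in the two commands above; `build_run_command` is a hypothetical helper, not an *fdog* API.

```python
import shlex


def build_run_command(seq_file, job_name, refspec, use_fas=True):
    """Assemble the fdog.run argument list from the options shown above."""
    cmd = [
        "fdog.run",
        "--seqFile", seq_file,
        "--jobName", job_name,
        "--refspec", refspec,
    ]
    if not use_fas:
        cmd.append("--fasOff")  # skip FAS if it has not been set up
    return cmd


cmd = build_run_command("infile.fa", "test", "HUMAN@9606@qfo24_02", use_fas=False)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to start the run.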

## Output

All output files will be saved in your **current working directory** with the prefix specified in `--jobName` (e.g. `test`).
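
To collect the files belonging to one job afterwards, you can filter the directory by that prefix. This is a minimal sketch; `find_job_outputs` is a hypothetical helper and makes no assumption about the individual output file names.

```python
from pathlib import Path


def find_job_outputs(job_name, directory="."):
    """Return the sorted paths in `directory` whose names start with `job_name`."""
    return sorted(
        p for p in Path(directory).iterdir() if p.name.startswith(job_name)
    )


for path in find_job_outputs("test"):
    print(path)
```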

## Viewing all options

You can get an overview of all available options with the command
```
fdog.run -h
```

Please see [our wiki](https://github.com/BIONF/fDOG/wiki) for more information, including the [input and output files](https://github.com/BIONF/fDOG/wiki/Input-and-Output-Files) of *fdog*.

# fDOG data set

Within the data package we provide a set of [81 reference taxa](https://ftp.ebi.ac.uk/pub/databases/reference_proteomes/QfO/QfO_release_2024_02.tar.gz). They will be automatically downloaded during the setup. This data comes "ready to use" with the *fdog* framework. Species data must be present in the three directories listed below:

* searchTaxa_dir (Contains sub-directories for proteome fasta files for each species)
* coreTaxa_dir (Contains sub-directories for BLAST databases made with `makeblastdb` out of your proteomes)
* annotation_dir (Contains feature annotation files for each proteome)

For each species/taxon there is a sub-directory named according to the naming schema ([Species acronym]@[NCBI ID]@[Proteome version]), e.g. `HUMAN@9606@qfo24_02`.
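
The naming schema can be split into its three parts with a few lines of Python. The sketch below is for illustration only: `TaxonName`, `parse_taxon_name` and `format_taxon_name` are hypothetical names, and the check that the NCBI ID is numeric is an assumption based on NCBI taxonomy IDs being integers.

```python
from typing import NamedTuple


class TaxonName(NamedTuple):
    acronym: str
    ncbi_id: int
    version: str


def parse_taxon_name(name):
    """Split '[Species acronym]@[NCBI ID]@[Proteome version]' into its parts."""
    parts = name.split("@")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected ACRONYM@NCBI_ID@VERSION, got {name!r}")
    acronym, ncbi_id, version = parts
    if not ncbi_id.isdecimal():
        raise ValueError(f"NCBI ID must be numeric, got {ncbi_id!r}")
    return TaxonName(acronym, int(ncbi_id), version)


def format_taxon_name(taxon):
    """Build the directory name back from its three parts."""
    return f"{taxon.acronym}@{taxon.ncbi_id}@{taxon.version}"


taxon = parse_taxon_name("HUMAN@9606@qfo24_02")
print(taxon)
```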

*fdog* is not limited to those 81 reference taxa. If needed, you can manually add further gene sets (in multi-FASTA format) using the provided functions.

## Adding a new gene set into fDOG
For adding **one gene set**, please use the `fdog.addTaxon` function:
```
fdog.addTaxon -f newTaxon.fa -i tax_id [-o /output/directory] [-n abbr_tax_name] [-c] [-v protein_version] [-a]
```

in which `newTaxon.fa` is the gene set that needs to be added and `tax_id` is its NCBI taxonomy ID; both are required. `/output/directory` is where the sub-directories (*searchTaxa_dir*, *coreTaxa_dir* and *annotation_dir*) can be found. If it is not given, the new taxon will be added to the directory of the pre-calculated data. The other arguments are optional: `-n` specifies your own taxon name (if not given, an abbreviated name will be suggested based on the NCBI taxon name of the input `tax_id`), `-c` calculates the BLAST DB (only needed if you want to include your new taxon in the list of taxa for compiling the core set), `-v` sets the genome/proteome version (the default is the current date in `YYMMDD` format), and `-a` turns off the annotation step (*not recommended*).
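
For scripted use, the synopsis above can be turned into an argument list. The sketch below uses only the flags listed in the synopsis; `build_add_taxon_command` and `default_version` are hypothetical helpers, and `default_version` merely illustrates the `YYMMDD` date format rather than reproducing what *fdog* does internally.

```python
from datetime import date


def default_version(today=None):
    """Format a date as YYMMDD, the default version format described above."""
    return (today or date.today()).strftime("%y%m%d")


def build_add_taxon_command(fasta, tax_id, out_dir=None, name=None,
                            version=None, make_blast_db=False, annotate=True):
    """Assemble the fdog.addTaxon argument list from the synopsis above."""
    cmd = ["fdog.addTaxon", "-f", fasta, "-i", str(tax_id)]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    if name is not None:
        cmd += ["-n", name]
    if make_blast_db:
        cmd.append("-c")  # also build the BLAST DB
    if version is not None:
        cmd += ["-v", version]
    if not annotate:
        cmd.append("-a")  # turn off annotation (not recommended)
    return cmd


print(build_add_taxon_command("newTaxon.fa", 9606, make_blast_db=True))
```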

## Adding a list of gene sets into fDOG
For adding **more than one gene set**, please use the `fdog.addTaxa` script:
```
fdog.addTaxa -i /path/to/newtaxa/fasta -m mapping_file [-o /output/directory] [-c]
```
in which `/path/to/newtaxa/fasta` is a folder where the FASTA files of all new taxa can be found, and `mapping_file` is a tab-delimited text file in which you provide the taxonomy IDs that belong to the FASTA files:

```
#filename	tax_id	abbr_tax_name	version
filename1.fa	12345678
filename2.faa	9606
filename3.fasta	4932	my_fungi
...
```

The header line (starting with #) is mandatory. The values of the last two columns (abbreviated taxon name and genome version) are optional. If you want to specify a version for a genome, you must also define the abbreviated taxon name, so that the genome version is always in the 4th column of the mapping file.
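
A mapping file in this layout can also be generated programmatically. The following is a minimal sketch using Python's `csv` module; `write_mapping` is a hypothetical helper, and its checks simply mirror the rules stated above (2 to 4 columns, numeric taxonomy ID, version only together with a taxon name).

```python
import csv
import io

HEADER = ["#filename", "tax_id", "abbr_tax_name", "version"]


def write_mapping(rows, handle):
    """Write (filename, tax_id[, abbr_tax_name[, version]]) rows as a mapping file."""
    checked = []
    for row in rows:
        row = [str(value) for value in row]
        if not 2 <= len(row) <= 4:
            raise ValueError(f"expected 2 to 4 columns, got {len(row)}: {row}")
        if not row[1].isdecimal():
            raise ValueError(f"tax_id must be numeric: {row[1]!r}")
        if len(row) == 4 and not row[2]:
            raise ValueError("a version requires an abbr_tax_name in column 3")
        checked.append(row)
    # Only write once every row has passed the checks.
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(checked)


buffer = io.StringIO()
write_mapping(
    [
        ("filename1.fa", 12345678),
        ("filename2.faa", 9606),
        ("filename3.fasta", 4932, "my_fungi"),
    ],
    buffer,
)
print(buffer.getvalue(), end="")
```

To write to disk instead, pass a file opened with `open("mapping_file", "w", newline="")` as `handle`.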

_**NOTE:** After adding new taxa into *fdog*, you should [check for the validity of the new data](https://github.com/BIONF/fDOG/wiki/Check-data-validity) before running fdog._
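
As a quick sanity check before the full validation described in the wiki, you could look for the new taxon in each of the three data directories. The sketch below is not a replacement for that validation; `taxon_presence` is a hypothetical helper. It assumes that an entry is either named exactly like the taxon or starts with the taxon name followed by a dot (e.g. an annotation file), which you should verify against your own data.

```python
from pathlib import Path

# Directory names as listed in the "fDOG data set" section.
DATA_DIRS = ("searchTaxa_dir", "coreTaxa_dir", "annotation_dir")


def taxon_presence(data_path, taxon_name, data_dirs=DATA_DIRS):
    """Map each data directory to True if it holds an entry for `taxon_name`."""
    root = Path(data_path)
    result = {}
    for name in data_dirs:
        directory = root / name
        result[name] = directory.is_dir() and any(
            entry.name == taxon_name or entry.name.startswith(taxon_name + ".")
            for entry in directory.iterdir()
        )
    return result
```

For example, `taxon_presence("/output/path/for/fdog/data", "HUMAN@9606@qfo24_02")` returns a dictionary with one `True`/`False` value per directory. A missing entry in *coreTaxa_dir* is expected if the taxon was added without `-c`.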

# fDOG-Assembly

*fDOG-Assembly* is an extension of *fDOG* that enables searching for orthologs directly within unannotated genome assemblies. For more details about *fDOG-Assembly*, please refer to our [wiki page](https://github.com/BIONF/fDOG/wiki/fDOG-Assembly).

# Bugs
Bug reports, comments and suggestions are highly appreciated. Please [open an issue on GitHub](https://github.com/BIONF/fDOG/issues/new) or get in touch via email.

# How to cite
Tran V, Langschied F, Muelbaier H, Dosch J, Arthen F, Balint M, Ebersberger I. 2025. Feature architecture-aware ortholog search with fDOG reveals the distribution of plant cell wall-degrading enzymes across life. Molecular Biology and Evolution:msaf120. https://doi.org/10.1093/molbev/msaf120

# Contributors
- [Ingo Ebersberger](https://github.com/ebersber)
- [Vinh Tran](https://github.com/trvinh)
- [Hannah Muelbaier](https://github.com/HannahBioI)
- [Holger Bergmann](https://github.com/holgerbgm)

# Contact
For further support or bug reports please contact: ebersberger@bio.uni-frankfurt.de
