Metadata-Version: 2.1
Name: alpharing
Version: 2.0.0
Summary: Interpretable, protein structure-based prediction of missense variant deleteriousness
Author-email: Aaron Logsdon <aaron.logsdon19@imperial.ac.uk>
License: GPLv3
Project-URL: Homepage, https://github.com/loggy01/alpharing
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: POSIX :: Linux
Requires-Python: <3.11,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: absl-py ==1.0.0
Requires-Dist: biopython ==1.79
Requires-Dist: chex ==0.1.86
Requires-Dist: dm-haiku ==0.0.12
Requires-Dist: dm-tree ==0.1.8
Requires-Dist: immutabledict ==2.0.0
Requires-Dist: jax[cuda12] ==0.4.26
Requires-Dist: ml-collections ==0.1.0
Requires-Dist: numpy ==1.24.3
Requires-Dist: pandas ==2.0.3
Requires-Dist: scikit-learn ==1.7.0
Requires-Dist: scipy ==1.11.1
Requires-Dist: shap ==0.48.0
Requires-Dist: tensorflow-cpu ==2.16.1
Requires-Dist: xgboost ==3.0.0

# AlphaRING v2 (AlphaRING-X)

AlphaRING is a package designed for interpretable, protein structure-based prediction of missense variant deleteriousness.

To predict the deleteriousness of a missense variant, AlphaRING performs the following steps:

1. Predicts the structure of the wild-type protein using [AlphaFold](https://github.com/google-deepmind/alphafold) and extracts the pLDDT of the substituted wild-type residue.
2. Converts the wild-type structure into a residue interaction network using [RING](https://ring.biocomputingup.it/) and extracts the degree of the substituted wild-type residue.
3. Calculates the ΔΔG of the substitution from the wild-type structure using [FoldX](https://foldxsuite.crg.eu/).
4. Calculates the relative substitution position (RSP) along the protein.
5. Feeds pLDDT, degree, ΔΔG, and RSP into an in-house [XGBoost](https://github.com/dmlc/xgboost) classifier trained to classify missense variant deleteriousness.
6. Outputs the probability of deleteriousness, together with feature SHAP values that explain the prediction mechanistically.
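
Step 4 is the only feature that needs no external tool, so it is easy to illustrate. The sketch below assumes that RSP is the 1-based substitution position divided by the protein length; this definition is an assumption made for illustration, and the function name is hypothetical rather than part of the AlphaRING API.

```python
def relative_substitution_position(position: int, sequence_length: int) -> float:
    """Return the substitution position as a fraction of the protein length.

    Assumption: RSP = position / length, with a 1-based position.
    AlphaRING's exact definition may differ.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    if not 1 <= position <= sequence_length:
        raise ValueError("position must lie within the sequence")
    return position / sequence_length


# Hypothetical example: residue 70 of a 350-residue protein.
rsp = relative_substitution_position(70, 350)
print(f"RSP = {rsp:.2f}")
```

Under this assumption, values near 0 correspond to substitutions close to the N-terminus and values near 1 to substitutions close to the C-terminus.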

> [!NOTE]
> The AlphaRING-X manuscript and benchmarking data/scripts (including those for classifier training, calibration, and testing) will be released soon.

## Installation

Before installation, ensure you have a Linux machine equipped with a modern NVIDIA GPU and the following:

1. [Full AlphaFold v2 genetic database](https://github.com/google-deepmind/alphafold?tab=readme-ov-file#genetic-databases)
2. [RING v4](https://biocomputingup.it/services/download/)
3. [FoldX v5.1](https://foldxsuite.crg.eu/academic-license-info)
4. [Miniconda](https://www.anaconda.com/docs/getting-started/miniconda/main)

To install AlphaRING, please do the following:

1. Create an environment for AlphaRING using Miniconda:

   ```bash
   conda create -n alpharing -c bioconda -c conda-forge python==3.10 hmmer kalign2 pdbfixer hhsuite==3.3.0 openmm==8.0.0
   ```

2. Activate the environment and install AlphaRING:

   ```bash
   conda activate alpharing
   pip install alpharing
   ```

## Usage

To predict the deleteriousness of a missense variant, activate the AlphaRING environment and execute the `alpharing` command as follows:

```bash
alpharing \
  --fasta_path=... \
  --substitutions=... \
  --output_dir=... \
  --data_dir=... \
  --ring_exe_path=... \
  --foldx_exe_path=...
```

Argument breakdown:

- `--fasta_path`: path to a FASTA file representing the wild-type protein. See [here](https://github.com/loggy01/alpharing/tree/main/tests/test_data/input/protein.fa) for an example.
- `--substitutions`: one or more single-residue substitutions, each applied individually to the wild-type protein. Write each substitution in FoldX format, e.g., `WA70Y`, where W is the wild-type residue, A is the chain, 70 is the substitution position, and Y is the variant residue. For AlphaRING, the chain must always be A. Separate multiple substitutions with commas only, e.g., `WA70Y,WA80F`.
- `--output_dir`: path to the directory that will store the output.
- `--data_dir`: path to the directory containing the full AlphaFold genetic database.
- `--ring_exe_path`: path to the RING executable. Leave the executable in its original installation directory.
- `--foldx_exe_path`: path to the FoldX executable. Leave the executable in its original installation directory.
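
Because a malformed `--substitutions` value may only surface after a long AlphaFold run, it can help to check the string beforehand. The following is a minimal sketch, not AlphaRING's own parser: the names `Substitution`, `parse_substitutions`, and `check_against_sequence` are hypothetical, and the sketch assumes the 20 standard one-letter amino acid codes and 1-based positions.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: standard residues only

# Wild-type letter, chain A, 1-based position, variant letter (e.g. WA70Y).
_SUBSTITUTION_RE = re.compile(
    rf"^([{AMINO_ACIDS}])A([1-9]\d*)([{AMINO_ACIDS}])$"
)


class Substitution(NamedTuple):
    wild_type: str
    chain: str
    position: int
    variant: str


def parse_substitutions(value: str) -> list[Substitution]:
    """Split a comma-separated string and validate each substitution."""
    parsed = []
    for token in value.split(","):
        match = _SUBSTITUTION_RE.match(token)
        if match is None:
            raise ValueError(f"not a valid substitution: {token!r}")
        wild_type, position, variant = match.groups()
        parsed.append(Substitution(wild_type, "A", int(position), variant))
    return parsed


def check_against_sequence(substitutions: list[Substitution], sequence: str) -> None:
    """Raise if a substitution does not match the wild-type sequence."""
    for sub in substitutions:
        if sub.position > len(sequence):
            raise ValueError(f"position {sub.position} is beyond the sequence end")
        found = sequence[sub.position - 1]
        if found != sub.wild_type:
            raise ValueError(
                f"expected {sub.wild_type} at position {sub.position}, found {found}"
            )


# Hypothetical five-residue sequence with W at position 3.
subs = parse_substitutions("WA3Y,VA5F")
check_against_sequence(subs, "MKWLV")
print(subs)
```

The pattern rejects spaces around commas, which matches the "commas only" requirement above.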

## Downstream

AlphaRING stores the output in a subdirectory within the directory specified by `--output_dir`. The subdirectory is named after the basename of the FASTA file specified by `--fasta_path`. In addition to the default AlphaFold, RING, and FoldX outputs, the subdirectory contains the file `alpharing_scores.txt`, which summarises the feature values, deleteriousness probability, and feature SHAP values of each substitution specified by `--substitutions`.
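
To locate the scores file from a script, the path can be derived from the two arguments. This sketch assumes that "basename" means the FASTA filename without its extension (the convention AlphaFold uses for its own output directories); the function name and the example paths are hypothetical.

```python
from pathlib import Path


def expected_scores_path(fasta_path: str, output_dir: str) -> Path:
    """Build the expected location of alpharing_scores.txt.

    Assumption: the output subdirectory is named after the FASTA
    filename with its extension removed.
    """
    return Path(output_dir) / Path(fasta_path).stem / "alpharing_scores.txt"


# Hypothetical paths, for illustration only.
print(expected_scores_path("inputs/protein.fa", "results"))
```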

> [!NOTE]
> For efficiency, when running a prediction, AlphaRING will check if the FASTA file specified by `--fasta_path` already has its corresponding output subdirectory with an AlphaFold relaxed model file, and, if found, will skip running AlphaFold.
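
To see in advance whether a run is likely to reuse an existing structure, a similar check can be made by hand. This is a rough sketch of the idea and not AlphaRING's internal logic: it assumes the subdirectory is named after the FASTA filename without its extension and that relaxed models follow AlphaFold's default `relaxed_model_*.pdb` naming. Both points, and the function name, are assumptions.

```python
from pathlib import Path


def has_relaxed_model(fasta_path: str, output_dir: str) -> bool:
    """Return True if a relaxed AlphaFold model appears to exist already.

    Assumptions: the subdirectory is the FASTA filename without its
    extension, and relaxed models match 'relaxed_model_*.pdb'.
    """
    subdir = Path(output_dir) / Path(fasta_path).stem
    if not subdir.is_dir():
        return False
    return any(subdir.glob("relaxed_model_*.pdb"))


# Hypothetical paths, for illustration only.
if has_relaxed_model("inputs/protein.fa", "results"):
    print("A relaxed model was found; AlphaFold would probably be skipped.")
else:
    print("No relaxed model was found; AlphaFold would probably run.")
```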
