Metadata-Version: 2.1
Name: tengu-py
Version: 0.12.2
Summary: Python SDK for interacting with the QDX Tengu API and modules
Author: Ryan Swart
Author-email: ryan@talosystems.com
Requires-Python: >=3.10,<3.13
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: dataclasses-json (>=0.5.14,<0.6.0)
Requires-Dist: gql[requests] (>=3.5.0b,<4.0.0)
Requires-Dist: pdb-tools (>=2.5.0,<3.0.0)
Requires-Dist: requests-toolbelt (>=1.0.0,<2.0.0)
Description-Content-Type: text/markdown


## tengu-py

Below we’ll walk through the process of building and running a drug
discovery workflow using tengu!

First, install the following modules via pip - we require Python \>= 3.10 and \< 3.13

    pip install tengu-py pdb-tools

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

``` python
import json
import os
import sys
import tarfile

from pdbtools import *
import requests
from datetime import datetime
from pathlib import Path

import tengu
```

### 0) Setup

``` python
# Set our token - ensure you have exported TENGU_TOKEN in your shell; or just replace the os.getenv with your token
TOKEN = os.getenv("TENGU_TOKEN")
```
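`os.getenv` returns `None` when the variable is unset, so a missing token would presumably only show up later as a failed API call. A minimal sketch of failing early instead; `require_token` is a hypothetical helper, not part of tengu-py:

```python
import os


def require_token(env=os.environ, name="TENGU_TOKEN"):
    """Return the token stored under `name`, or raise if it is missing or empty."""
    token = env.get(name)
    if not token:
        raise RuntimeError(f"{name} is not set; export it in your shell first")
    return token


# Usage (uncomment once TENGU_TOKEN is exported):
# TOKEN = require_token()
```

Passing the environment in as a mapping keeps the helper easy to check without touching the real shell environment.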

``` python
# Define our project information
DESCRIPTION = "tengu-py demo notebook"
TAGS = ["qdx", "tengu-py", "demo"]
WORK_DIR = Path.home() / "qdx" / "tengu-py-demo"
OUT_DIR = WORK_DIR / "runs"
OUT_DIR.mkdir(parents=True, exist_ok=True)

# Set our inputs
SYSTEM_PDB_PATH = WORK_DIR / "test.pdb"
PROTEIN_PDB_PATH = WORK_DIR / "test_P.pdb"
LIGAND_SMILES_STR = "CCCc1ccccc1O"
LIGAND_PDB_PATH = WORK_DIR / "test_L.pdb"
```

``` python
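# Fetch the data files. fetch_structure returns a generator, so materialize it
# with list() to reuse the same lines for the complex, protein and ligand.
complex = list(pdb_fetch.fetch_structure("3HTB"))
protein = pdb_delhetatm.remove_hetatm(pdb_selchain.select_chain(complex, "A"))
ligand = pdb_selres.select_residues(complex, "JZ4")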
with open(SYSTEM_PDB_PATH, 'w') as f:
    for l in complex:
        f.write(str(l))
with open(PROTEIN_PDB_PATH, 'w') as f:
    for l in protein:
        f.write(str(l))
with open(LIGAND_PDB_PATH, 'w') as f:
    for l in ligand:
        f.write(str(l))
```
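To make the three selections above concrete, here is a rough pure-Python sketch of chain, HETATM and residue-name filtering on the fixed-width PDB format. The helpers `pdb_line`, `keep_chain`, `drop_hetatm` and `keep_resname` are stand-ins written for this illustration, not pdb-tools functions, and the toy records are made up:

```python
def pdb_line(record, serial, name, resname, chain, resseq):
    """Build the first 26 columns of a PDB coordinate record (fixed-width layout)."""
    return f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"


def keep_chain(lines, chain):
    """Yield ATOM/HETATM records whose chain ID (column 22) matches `chain`."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and line[21] == chain:
            yield line


def drop_hetatm(lines):
    """Yield every line that is not a HETATM record."""
    return (line for line in lines if not line.startswith("HETATM"))


def keep_resname(lines, resname):
    """Yield coordinate records whose residue name (columns 18-20) matches."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == resname:
            yield line


# A list can be filtered repeatedly; a generator could only be consumed once.
toy_complex = [
    pdb_line("ATOM", 1, " N", "MET", "A", 1),
    pdb_line("ATOM", 2, " CA", "MET", "B", 1),
    pdb_line("HETATM", 3, " C1", "JZ4", "A", 200),
]
toy_protein = list(drop_hetatm(keep_chain(toy_complex, "A")))
toy_ligand = list(keep_resname(toy_complex, "JZ4"))
```

Here `toy_protein` keeps only the chain A `ATOM` record, and `toy_ligand` keeps only the `JZ4` record.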

``` python
# Get our client, for calling modules and using the tengu API
client = tengu.Provider(access_token=TOKEN)
```

``` python
# Get our latest modules as a dict[module_name, module_path]
modules = client.get_latest_module_paths()
```

- `module_name` is a descriptive string and indicates the “function” the
  module is calling;
- `module_path` is a versioned tengu “endpoint” for a module accessible
  via the client.

Using the same `module_path` string across multiple runs provides
reproducibility.

The following is an example of how to save and load frozen modules:

``` python
frozen_modules_filepath = client.save_module_paths(modules)
frozen_modules = client.load_module_paths(frozen_modules_filepath)
assert(modules == frozen_modules)
```

You could also save the modules once and pass a fixed filename to `load_module_paths`:

``` python
FROZEN_MODULES_FILEPATH = 'tengu-modules-20231006T132244.json'
frozen_modules = client.load_module_paths(FROZEN_MODULES_FILEPATH)
```
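For intuition, freezing amounts to writing the name-to-path mapping to disk and reading it back later. The sketch below assumes the frozen file is a plain JSON mapping, which the `.json` filename suggests; it is not tengu-py’s own implementation, and `freeze_paths`, `thaw_paths` and the example paths are hypothetical:

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def freeze_paths(paths, directory):
    """Write `paths` to a timestamped JSON file and return the file's path."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    target = Path(directory) / f"tengu-modules-{stamp}.json"
    target.write_text(json.dumps(paths, indent=2, sort_keys=True))
    return target


def thaw_paths(filepath):
    """Load a mapping previously written by `freeze_paths`."""
    return json.loads(Path(filepath).read_text())


# Hypothetical module paths, for illustration only.
example_modules = {
    "pdb2pqr_tengu": "example/path#pdb2pqr",
    "gmx_tengu_pdb": "example/path#gmx",
}
with tempfile.TemporaryDirectory() as tmp:
    frozen_file = freeze_paths(example_modules, tmp)
    round_tripped = thaw_paths(frozen_file)
```

Committing such a file alongside your analysis code is one way to pin the module versions a result was produced with.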

Below we’ll call modules using `client.run2(...)`.

The parameters to `client.run2()` are as follows:

- `module_path`: The endpoint of the module we’ll be running;
- `args`: A list of the arguments to the module; an argument can be one
  of the following:
  1. A `pathlib.Path` or a file-like object like `BufferedReader`,
     `FileIO`, `StringIO` etc.:  
     Loads the data in the file as an argument.  
     **NOTE**: The uploaded value isn’t just the string of the file, so
     don’t pass the string directly; pass the path or wrap in
     `StringIO`.
  2. A `tengu.ArgId`:  
     Uses an object already uploaded to tengu, such as outputs of other
     run calls.  
     See below for more details. It’s easier to understand when you see
     an example.
  3. A parameter, i.e. a value of any other type, including `None`:  
     Tengu modules take configs as JSON in the backend; we’ll convert
     for you.  
     Just pass arguments directly, as per the schema for the module
     you’re running.
- `target`: The machine we want to run on (`NIX_SSH` for a cluster,
  `GADI` for a supercomputer).
- `resources`: The resources to use on the target.
- `tags`: Tags to associate with our run, so we can easily look up our
  runs.
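The three argument kinds can be told apart by type alone. The sketch below illustrates that distinction; `FakeArgId` is a stand-in for `tengu.ArgId`, and `describe_arg` is a hypothetical helper, so this is not how the client itself dispatches:

```python
import io
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FakeArgId:
    """Stand-in for `tengu.ArgId`; the real class is provided by tengu-py."""
    value: str


def describe_arg(arg):
    """Return which of the three argument kinds `arg` falls into."""
    if isinstance(arg, (Path, io.IOBase)):
        return "file"
    if isinstance(arg, FakeArgId):
        return "arg_id"
    return "parameter"


kinds = [
    describe_arg(Path("test_P.pdb")),       # path: the file's data is uploaded
    describe_arg(io.StringIO("ATOM ...")),  # in-memory text wrapped as a file
    describe_arg(FakeArgId("0000")),        # reference to an uploaded object
    describe_arg("CCCc1ccccc1O"),           # plain string: treated as a parameter
    describe_arg({"num_gpus": 4}),          # config dict: a parameter
    describe_arg(None),                     # None is also a parameter
]
```

Note that a bare string such as a SMILES string counts as a parameter, which is why file contents held in a string need to be wrapped in `StringIO`.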

The return value is a dict that contains:

- key `"module_instance_id"` -\> val is a `ModuleInstanceId` for the run
  itself;
- key `"output_ids"` -\> val is a list of `ArgId`s, one for each output.

Both of these ID types have the form of a UUID. An `ArgId` lets you
manipulate the output of a module without having to:

1. wait for the module to finish its computation, or
2. download the actual value corresponding to this output.

You can pass it to subsequent modules as if it were the value itself, or
you can wait on it to obtain the value itself.
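As a small illustration of working with the returned dict, the sketch below uses a made-up result with the two keys described above. The IDs are random UUID strings generated locally, and `is_uuid` and `some_option` are hypothetical names:

```python
import uuid

# A made-up result shaped like the return value described above.
fake_result = {
    "module_instance_id": str(uuid.uuid4()),
    "output_ids": [str(uuid.uuid4()), str(uuid.uuid4())],
}

run_id = fake_result["module_instance_id"]
first_output_id = fake_result["output_ids"][0]


def is_uuid(text):
    """Return True if `text` parses as a UUID."""
    try:
        uuid.UUID(str(text))
    except ValueError:
        return False
    return True


# The next call's args can hold the ID in place of the value it stands for.
next_args = [first_output_id, {"some_option": True}]
```

Because only the ID is passed along, a downstream call can be queued before the upstream module has produced its output.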

<div class="alert alert-block alert-warning">

A coming improvement will provide explicit naming and type info for the
inputs and outputs of each module, which will improve clarity and
discoverability.

</div>

### 1.1) Prep the protein

``` python
pdb2pqr_result = client.run2(
    modules["pdb2pqr_tengu"],
    [
        PROTEIN_PDB_PATH,
    ],
    target="NIX_SSH",
    resources={"gpus": 1, "storage": 1_024_000_000, "walltime": 15},
    tags=TAGS,
)
pdb2pqr_run_id = pdb2pqr_result["module_instance_id"]
prepped_protein_id = pdb2pqr_result["output_ids"][0]
print(f"{datetime.now().time()} | Running protein prep!")
```

``` python
with open(OUT_DIR / f"01-pdb2pqr-{pdb2pqr_run_id}.json", "w") as f:
    json.dump(pdb2pqr_result, f, default=str, indent=2)
```
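The `default=str` argument matters because `json` cannot encode arbitrary objects. The sketch below assumes the IDs behave like `uuid.UUID` objects, which is a guess based on their UUID form; the sample values are made up:

```python
import json
import uuid

# Assumption: the IDs behave like uuid.UUID objects, which json cannot encode natively.
sample = {"module_instance_id": uuid.UUID(int=1), "output_ids": [uuid.UUID(int=2)]}

try:
    json.dumps(sample)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True

# default=str is called for any object json cannot encode, so IDs become strings.
encoded = json.dumps(sample, default=str, indent=2)
decoded = json.loads(encoded)
```

After a round trip the IDs come back as plain strings, so they would need converting again if a typed ID is required.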

``` python
client.poll_module_instance(pdb2pqr_run_id)
client.download_object(prepped_protein_id, OUT_DIR / "01-prepped-protein.pdb")
print(f"{datetime.now().time()} | Downloaded prepped protein!")
```
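Polling before downloading follows a common pattern: check the status, sleep, and give up after a fixed number of attempts. The sketch below is a generic version of that loop, not the client’s implementation. The `n_retries` and `poll_rate` names mirror the keyword arguments passed to `client.poll_module_instance` in the GROMACS step, but their semantics here, and the status strings, are assumptions:

```python
import time


def poll_until_done(get_status, n_retries=10, poll_rate=30, sleep=time.sleep):
    """Call `get_status()` up to `n_retries` times, sleeping `poll_rate` seconds between calls.

    Returns the final status, or raises TimeoutError if the job never finishes.
    """
    for attempt in range(n_retries):
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        if attempt < n_retries - 1:
            sleep(poll_rate)
    raise TimeoutError(f"job still running after {n_retries} polls")


# Simulated job that completes on the third poll; sleep is stubbed out.
_states = iter(["QUEUED", "RUNNING", "COMPLETED"])
final_status = poll_until_done(
    lambda: next(_states), n_retries=5, poll_rate=0, sleep=lambda s: None
)
```

Under these assumptions, the total wait is bounded by roughly `n_retries * poll_rate` seconds, which is worth sizing against the `walltime` requested for the run.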

### 1.2) Prep the ligand

``` python
ligand_prep_config = {
    "source": "",
    "output_folder": "./",
    "job_manager": "multiprocessing",
    "num_processors": -1,
    "max_variants_per_compound": 1,
    "thoroughness": 3,
    "separate_output_files": True,
    "min_ph": 6.4,
    "max_ph": 8.4,
    "pka_precision": 1.0,
    "skip_optimize_geometry": True,
    "skip_alternate_ring_conformations": True,
    "skip_adding_hydrogen": False,
    "skip_making_tautomers": True,
    "skip_enumerate_chiral_mol": True,
    "skip_enumerate_double_bonds": True,
    "let_tautomers_change_chirality": False,
    "use_durrant_lab_filters": True,
}
ligand_prep_result = client.run2(
    modules["prepare_ligand_tengu"],
    [
        LIGAND_SMILES_STR,
        LIGAND_PDB_PATH,
        ligand_prep_config,
    ],
    target="NIX_SSH",
    resources={"gpus": 1, "storage": 16_000_000, "walltime": 5},
    tags=TAGS,
)
ligand_prep_run_id = ligand_prep_result["module_instance_id"]
prepped_ligand_id = ligand_prep_result["output_ids"][0]
print(f"{datetime.now().time()} | Running ligand prep!")
```

``` python
with open(OUT_DIR / f"01-prepare-ligand-{ligand_prep_run_id}.json", "w") as f:
    json.dump(ligand_prep_result, f, default=str, indent=2)
```

``` python
client.poll_module_instance(ligand_prep_run_id)
client.download_object(prepped_ligand_id, OUT_DIR / "01-prepped-ligand.pdb")
print(f"{datetime.now().time()} | Downloaded prepped ligand!")
```

### 2) Run GROMACS (module: gmx_tengu / gmx_tengu_pdb)

``` python
gmx_config = {
    "param_overrides": {
        "md": [("nsteps", "5000")],
        "em": [("nsteps", "1000")],
        "nvt": [("nsteps", "1000")],
        "npt": [("nsteps", "1000")],
        "ions": [],
    },
    "num_gpus": 4,
    "num_replicas": 1,
    "ligand_charge": None,
    "frame_sel": {
        "begin_time": 2,
        "end_time": 10,
        "delta_time": 2,
    },
}
gmx_result = client.run2(
    # TODO: Should be using qdxf conformer versions of these modules
    modules["gmx_tengu_pdb"],
    [
        prepped_protein_id,
        prepped_ligand_id,
        gmx_config,
    ],
    target="GADI",
    resources={"gpus": 4, "storage": 1_024_000_000, "cpus": 48, "walltime": 60},
    tags=TAGS,
)
gmx_run_id = gmx_result["module_instance_id"]
gmx_output_id = gmx_result["output_ids"][0]
gmx_ligand_gro_id = gmx_result["output_ids"][3]
print(f"{datetime.now().time()} | Running GROMACS simulation!")
```

``` python
with open(OUT_DIR / f"02-gmx-{gmx_run_id}.json", "w") as f:
    json.dump(gmx_result, f, default=str, indent=2)
```

``` python
client.poll_module_instance(gmx_run_id, n_retries=60, poll_rate=60)
client.download_object(gmx_output_id, OUT_DIR / "02-gmx-output.zip")
# Get the "dry" (i.e. non-solvated) frames we asked for
with tarfile.open(OUT_DIR / "02-gmx-output.zip", "r") as tf:
    selected_frame_pdbs = [
        tf.extractfile(member)
        for member in sorted(tf, key=lambda m: m.name)
        if ("dry" in member.name and "pdb" in member.name)
    ]
client.download_object(gmx_ligand_gro_id, OUT_DIR / "02-gmx-ligand.gro")
print(f"{datetime.now().time()} | Downloaded GROMACS output!")
```
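The member filtering above can be tried out without a real simulation output. The sketch below builds a small in-memory tar archive with made-up member names and copies the matching members into memory while the archive is open. `build_demo_archive` and `read_dry_frames` are hypothetical helpers, and the real archive layout may differ:

```python
import io
import tarfile


def build_demo_archive():
    """Create an in-memory tar archive with made-up member names."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tf:
        for name in ["frame_4_dry.pdb", "frame_2_dry.pdb", "frame_2_wet.pdb", "md.log"]:
            payload = f"REMARK {name}\n".encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


def read_dry_frames(fileobj):
    """Return {member name: text} for the dry PDB members, in name order."""
    frames = {}
    with tarfile.open(fileobj=fileobj, mode="r") as tf:
        for member in sorted(tf.getmembers(), key=lambda m: m.name):
            if "dry" in member.name and member.name.endswith(".pdb"):
                # Read while the archive is open, so the data does not
                # depend on the archive handle staying open afterwards.
                frames[member.name] = tf.extractfile(member).read().decode()
    return frames


dry_frames = read_dry_frames(build_demo_archive())
```

A frame held as text can be wrapped in `io.StringIO` when a file-like object is needed later.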

### 3.1) Run quantum energy calculation (modules: qp_gen_inputs, hermes_energy, qp_collate)

``` python
# We have a helper function for this, as it combines 3 modules without much need
# to inspect the intermediate results.
(_, _, qp_result) = client.run_qp(
    modules["qp_gen_inputs"],
    modules["hermes_energy"],
    modules["qp_collate"],
    pdb=selected_frame_pdbs[0],  # extractfile returns a BufferedReader, which is file-like
    gro=gmx_ligand_gro_id,
    lig=prepped_ligand_id,
    lig_type="sdf",
    lig_res_id="UNL",  # The ligand's residue code in the PDB file; this is what our prep uses
    target="GADI",
    resources={"storage": 1_024_000_000, "walltime": 600},
    tags=TAGS,
)
qp_run_id = qp_result["module_instance_id"]
qp_interaction_energy_id = qp_result["output_ids"][0]
print(f"{datetime.now().time()} | Running QP energy calculation!")
```

``` python
with open(OUT_DIR / f"03-qp-{qp_run_id}.json", "w") as f:
    json.dump(qp_result, f, default=str, indent=2)
```

``` python
client.poll_module_instance(qp_run_id)
client.download_object(qp_interaction_energy_id, OUT_DIR / "03-qp-interaction-energy.json")
print(f"{datetime.now().time()} | Downloaded QP interaction energy!")
```

### 3.2) Run MM-PBSA

``` python
mmpbsa_config = [
    401,  # start frame
    901,  # end frame
    None,  # optional argument for overriding raw GROMACS parameters
    12,  # num_cpus
]
mmpbsa_result = client.run2(
    modules["gmx_mmpbsa_tengu"],
    [
        gmx_output_id,
        *mmpbsa_config,
    ],
    target="GADI",
    resources={"storage": 1_024_000_000, "walltime": 600},
    tags=TAGS,
)
mmpbsa_run_id = mmpbsa_result["module_instance_id"]
mmpbsa_output_id = mmpbsa_result["output_ids"][0]
print(f"{datetime.now().time()} | Running GROMACS MM-PBSA calculation!")
```
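The `*mmpbsa_config` in the call above spreads the list into separate positional arguments. The sketch below shows the difference using a placeholder string in place of `gmx_output_id`; it only builds the argument lists and does not call tengu:

```python
# A stand-in ID string; in the walkthrough this is `gmx_output_id`.
example_output_id = "example-gmx-output-id"

mmpbsa_config = [
    401,   # start frame
    901,   # end frame
    None,  # optional override for raw GROMACS parameters
    12,    # num_cpus
]

# The star expands the list in place, so the module sees five positional arguments.
args = [example_output_id, *mmpbsa_config]

# Without the star, the config would arrive as one nested list argument instead.
nested_args = [example_output_id, mmpbsa_config]
```

Since the arguments are positional, the `None` placeholder has to stay in the list so that `num_cpus` keeps its position.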

``` python
with open(OUT_DIR / f"03-mmpbsa-{mmpbsa_run_id}.json", "w") as f:
    json.dump(mmpbsa_result, f, default=str, indent=2)
```

``` python
client.poll_module_instance(mmpbsa_run_id)
client.download_object(mmpbsa_output_id, OUT_DIR / "03-mmpbsa-output.zip")
print(f"{datetime.now().time()} | Downloaded MM-PBSA results!")
```

