Metadata-Version: 2.4
Name: wtfdtb
Version: 0.2.0
Summary: Inverse virtual screening — dock one ligand against a whole protein library via GNINA.
Author: Chandragupt Sharma
License-Expression: MIT
Keywords: bioinformatics,cheminformatics,docking,drug-discovery,target-fishing,virtual-screening
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Requires-Python: >=3.10
Requires-Dist: biopython>=1.80
Requires-Dist: dimorphite-dl>=1.3
Requires-Dist: gemmi
Requires-Dist: meeko>=0.5
Requires-Dist: openmm>=8.0
Requires-Dist: pandas>=2.0
Requires-Dist: pdb-tools>=2.5
Requires-Dist: pdb2pqr>=3.6
Requires-Dist: pdbfixer>=1.9
Requires-Dist: prolif>=2.0
Requires-Dist: rdkit
Requires-Dist: requests
Requires-Dist: tqdm
Requires-Dist: typer>=0.9
Provides-Extra: dev
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# WTFDTB — High-Throughput Inverse Virtual Screening

> **Target Fishing**: Dock a single small-molecule ligand against a library of protein structures using ML-based pocket detection (P2Rank) and CNN-rescored docking (GNINA).

![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)
![License: MIT](https://img.shields.io/badge/license-MIT-green)
![Status: Alpha](https://img.shields.io/badge/status-v0.2.0-blue)

---

## What Is This?

Traditional virtual screening docks many ligands against one protein target. **WTFDTB flips this**: it docks **one ligand** against **many proteins** to answer the question — *"What targets does this drug bind?"*

This is called **inverse virtual screening** (or *target fishing*), and it's essential for:

- **Drug repurposing** — finding new uses for existing drugs
- **Off-target prediction** — identifying potential side effects  
- **Polypharmacology** — understanding multi-target drug activity
- **Natural product target deconvolution** — identifying targets for bioactive compounds

WTFDTB automates the entire workflow from a raw ligand file to a ranked CSV of protein targets with interaction fingerprints — no manual intervention needed.

---

## Pipeline Architecture

The pipeline runs in five sequential phases:

```
  ┌──────────────┐    ┌────────────────────┐    ┌──────────────────┐
  │  1. Ligand   │───▶│  2. Receptor       │───▶│  3. Pocket       │
  │     Prep     │    │     Curation       │    │     Detection    │
  │              │    │     (parallel)     │    │                  │
  │ Dimorphite-DL│    │ PDBFixer + PDB2PQR │    │     P2Rank       │
  │ RDKit + Meeko│    │ + PROPKA + Meeko   │    │     (Java ML)    │
  └──────────────┘    └────────────────────┘    └──────────────────┘
                                                         │
         ┌───────────────────────────────────────────────┘
         ▼
  ┌──────────────────┐    ┌──────────────────────┐
  │  4. Docking      │───▶│  5. Post-Docking     │
  │     (parallel)   │    │     Analysis         │
  │                  │    │                      │
  │     GNINA        │    │ ProLIF + Pandas      │
  │  (CNN-rescored)  │    │ Filter → Rank → CSV  │
  └──────────────────┘    └──────────────────────┘
```

### Phase Details

| Phase | Module | Tools | What It Does |
|-------|--------|-------|--------------|
| **1. Ligand Prep** | `ligand_prep.py` | Dimorphite-DL, RDKit, Meeko | Enumerate protonation states at target pH, generate 3D conformer, produce PDBQT with Gasteiger charges |
| **2. Receptor Curation** | `receptor_curation.py` | PDBFixer, PDB2PQR, PROPKA | Download PDB structures, strip HETATM records and waters, repair missing heavy atoms, protonate at target pH; runs in parallel |
| **3. Pocket Detection** | `pocket_detection.py` | P2Rank (Java) | ML-based cavity prediction — no template bias, detects all druggable sites per protein |
| **4. Docking** | `docking.py` | GNINA (C++) | CNN-rescored molecular docking for each pocket × ligand combination, parallelised |
| **5. Post-Docking** | `post_dock.py` | ProLIF, Pandas | Compute interaction fingerprints (H-bond, hydrophobic, etc.), filter, rank by Vina affinity, export CSV |

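As a rough illustration of phase 4, the sketch below assembles a GNINA command line for a single pocket. It is an assumption about how such a call could look, not WTFDTB's actual implementation: the helper name `gnina_command`, the file names, and the pocket centre are hypothetical, while the flags themselves (`-r`, `-l`, `-o`, `--center_*`, `--size_*`, `--exhaustiveness`) are standard GNINA options. The box size and exhaustiveness defaults mirror the CLI defaults of this package.

```python
from pathlib import Path


def gnina_command(receptor: Path, ligand: Path, out: Path,
                  center: tuple[float, float, float],
                  box_size: float = 25.0,
                  exhaustiveness: int = 8) -> list[str]:
    """Build a GNINA argument list for a cubic box centred on one pocket.

    Hypothetical helper: WTFDTB's real invocation may differ.
    """
    cx, cy, cz = center
    size = f"{box_size:g}"  # same side length on all three axes
    return [
        "gnina",
        "-r", str(receptor),
        "-l", str(ligand),
        "-o", str(out),
        "--center_x", f"{cx:.3f}",
        "--center_y", f"{cy:.3f}",
        "--center_z", f"{cz:.3f}",
        "--size_x", size,
        "--size_y", size,
        "--size_z", size,
        "--exhaustiveness", str(exhaustiveness),
    ]


# Example with made-up file names and a made-up pocket centre.
cmd = gnina_command(
    Path("1IEP.pdbqt"), Path("ligand.pdbqt"), Path("1IEP_pocket1.sdf"),
    center=(15.6, 53.4, 15.5),
)
print(" ".join(cmd))
```
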
---

## Installation

### From PyPI (Recommended)

```bash
pip install wtfdtb
```

### From Source (Development)

```bash
git clone https://github.com/ChandraguptSharma07/WTFDTB.git
cd WTFDTB
pip install -e ".[dev]"
```

---

## Setup (External Binaries)

WTFDTB requires **GNINA** and **P2Rank**. You can set them up automatically:

```bash
# Default (CPU version)
wtfdtb install

# GPU version (Highly recommended for 1000+ proteins)
wtfdtb install --gpu
```

This command will:
1. Download pre-compiled Linux binaries for GNINA (v1.3.2) and P2Rank.
2. Place them in `~/.local/`.
3. **Automatically configure your PATH** by updating your `.bashrc`.

*Note: The GPU version requires an NVIDIA GPU and CUDA 12 drivers. P2Rank requires Java ≥ 11.*

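To confirm the external tools are reachable after installation, a small check like the following may help. It is only a sketch: the executable names `gnina`, `prank` (P2Rank's launcher script), and `java` are assumptions about what ends up on your PATH, and `find_binaries` is a hypothetical helper, not part of WTFDTB.

```python
import shutil
from typing import Optional


def find_binaries(names: tuple[str, ...] = ("gnina", "prank", "java")) -> dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_binaries().items():
    status = f"found at {path}" if path else "MISSING"
    print(f"{name:6s} {status}")
```
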
---

## Quick Start

### 1. Create a Target List
Create a file named `targets.txt` containing PDB IDs or paths to `.pdb` files, one per line:
```text
1IEP
1PXX
```

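A target list can also be generated from a script. The sketch below writes `targets.txt` after a basic sanity check on each entry. The helper name `write_targets` is hypothetical, and the check assumes classic four-character PDB IDs (a digit followed by three alphanumerics); anything else is kept only if it looks like a `.pdb` path.

```python
import re
from pathlib import Path

# Classic PDB ID: one digit followed by three letters or digits.
PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def write_targets(entries: list[str], path: Path) -> list[str]:
    """Write valid entries one per line and return the entries that were kept."""
    kept = []
    for entry in entries:
        entry = entry.strip()
        if PDB_ID.match(entry):
            kept.append(entry.upper())  # normalise IDs to upper case
        elif entry.lower().endswith(".pdb"):
            kept.append(entry)          # local structure file, kept as given
    path.write_text("\n".join(kept) + "\n")
    return kept


kept = write_targets(
    ["1iep", "1PXX", "not-an-id", "structures/custom.pdb"],
    Path("targets.txt"),
)
print(kept)
```
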
### 2. Run the Screen
```bash
wtfdtb screen --ligand aspirin.smi --targets targets.txt --output results.csv
```
*If binaries are missing, the tool will interactively prompt you to install them.*

---

## CLI Reference

```bash
wtfdtb screen [OPTIONS]
```

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--ligand`, `-l` | Path | *required* | Input ligand file (`.sdf`, `.mol`, `.mol2`, `.smi`) |
| `--targets`, `-t` | Path | *required* | Directory of `.pdb` files or text file of PDB IDs |
| `--output`, `-o` | Path | `results.csv` | Output CSV path for ranked results |
| `--ph` | float | `7.4` | Target pH for protonation (default is physiological pH) |
| `--box-size` | int | `25` | Side length (Å) of the cubic docking box |
| `--cnn-model` | str | `default` | GNINA CNN model (`default`, `dense`) |
| `--cnn-score-threshold` | float | `0.5` | Minimum CNNscore (0–1) to accept a pose |
| `--min-interactions` | int | `1` | Minimum interactions to keep a pose |
| `--workers`, `-w` | int | CPU count | Parallel workers for curation and docking |
| `--exhaustiveness` | int | `8` | GNINA search exhaustiveness (higher = slower) |
| `--verbosity` | int | `1` | Logging: 0=quiet, 1=normal, 2=debug |

---
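When launching screens from another script, the flags in the table can be assembled programmatically. The following is a minimal sketch: `screen_command` is a hypothetical helper that turns keyword arguments into the documented `--flag-name` form, and it only builds the argument list without running anything.

```python
import shlex


def screen_command(ligand: str, targets: str,
                   output: str = "results.csv", **options) -> list[str]:
    """Build a `wtfdtb screen` argument list; keyword names map to --flag-names."""
    cmd = ["wtfdtb", "screen",
           "--ligand", ligand, "--targets", targets, "--output", output]
    for key, value in options.items():
        # cnn_score_threshold -> --cnn-score-threshold
        cmd += [f"--{key.replace('_', '-')}", str(value)]
    return cmd


cmd = screen_command("aspirin.smi", "targets.txt",
                     ph=7.0, cnn_score_threshold=0.7, workers=8)
print(shlex.join(cmd))
# To launch it: subprocess.run(cmd, check=True)
```
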

## Output Format

The output CSV is ranked primarily by **Vina affinity** (more negative is better), with **CNNaffinity** used to break ties:

| Column | Description |
|--------|-------------|
| `rank` | Overall rank (1 = best binder) |
| `pdb_id` | Target protein PDB ID |
| `pocket` | Cavity name (from P2Rank) |
| `pose_rank` | Pose rank within this pocket (from GNINA) |
| `cnn_score` | Neural network confidence (0–1) |
| `cnn_affinity` | Predicted binding affinity (pKd) |
| `vina_affinity` | Empirical scoring affinity (kcal/mol) |
| `hbond` | Number of hydrogen bonds |
| `hydrophobic` | Number of hydrophobic contacts |
| `pi_stacking` | Number of π-stacking interactions |
| `salt_bridge` | Number of salt bridges |
| `total_interactions` | Sum of all interaction types |

---
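The filtering and ranking rule described in this section can be reproduced on an output file with the standard library. This is a sketch, not the package's own code: the sample rows are made-up values for illustration only, `rank_poses` is a hypothetical helper, and the tie-break direction (higher `cnn_affinity` first) is an assumption. The thresholds mirror the `--cnn-score-threshold` and `--min-interactions` defaults.

```python
import csv
import io

# Made-up rows for illustration; a real run would open results.csv instead.
SAMPLE = """pdb_id,pocket,cnn_score,cnn_affinity,vina_affinity,total_interactions
1IEP,pocket1,0.82,6.1,-8.4,5
1PXX,pocket2,0.35,5.0,-9.0,4
1PXX,pocket1,0.71,6.5,-8.4,3
"""


def rank_poses(handle, cnn_score_threshold=0.5, min_interactions=1):
    """Filter poses, then sort by Vina affinity (most negative first)."""
    rows = [
        row for row in csv.DictReader(handle)
        if float(row["cnn_score"]) >= cnn_score_threshold
        and int(row["total_interactions"]) >= min_interactions
    ]
    # Assumed tie-break: higher CNN affinity ranks first.
    rows.sort(key=lambda r: (float(r["vina_affinity"]), -float(r["cnn_affinity"])))
    return rows


for rank, row in enumerate(rank_poses(io.StringIO(SAMPLE)), start=1):
    print(rank, row["pdb_id"], row["pocket"], row["vina_affinity"])
```
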

## Python API

You can also run the pipeline programmatically in scripts or Jupyter notebooks:

```python
from pathlib import Path
from wtfdtb.pipeline import run_pipeline

results_csv = run_pipeline(
    ligand_path=Path("aspirin.smi"),
    targets_path=Path("targets.txt"),
    output_csv=Path("my_results.csv"),
    workers=4
)
```

---

## Supported Platforms

| Platform | Status | Notes |
|----------|--------|-------|
| **Linux x86_64** | ✅ Supported | Primary platform. Binaries auto-installed via `wtfdtb install`. |
| **Windows (WSL)** | ✅ Supported | Runs via Windows Subsystem for Linux. |
| **Kaggle / Colab** | ✅ Supported | Verified working. Use `pip install wtfdtb` in cells. |
| **macOS** | ⚠️ Partial | Python pipeline works; GNINA must be compiled from source. |

---

## Citation

If you use WTFDTB in your research, please cite:

```bibtex
@software{wtfdtb2026,
  title  = {WTFDTB: High-Throughput Inverse Virtual Screening},
  author = {Chandragupt Sharma},
  year   = {2026},
  url    = {https://github.com/ChandraguptSharma07/WTFDTB}
}
```

---

## License

MIT — see [LICENSE](LICENSE) for details.
