bioneuralnet.downstream_task.dpmon

Functions

build_omics_networks_tg(adjacency_matrix, ...)

get_logger(name)

Retrieves a global logger configured to write to 'bioneuralnet.log' at the project root.

run_hyperparameter_tuning(dpmon_params, ...)

run_standard_training(dpmon_params, ...)

setup_device(gpu, cuda)

slice_omics_datasets(omics_dataset, ...)

train_model(model, criterion, optimizer, ...)

Classes

ASHAScheduler

alias of AsyncHyperBandScheduler

Autoencoder(*args, **kwargs)

CLIReporter(*[, metric_columns, ...])

Command-line reporter

Checkpoint(path[, filesystem])

A reference to data persisted as a directory in local or remote storage.

DPMON(adjacency_matrix, omics_list, ...[, ...])

DPMON (Disease Prediction using Multi-Omics Networks): An end-to-end pipeline for disease prediction using multi-omics networks.

DimensionAveraging(*args, **kwargs)

DownstreamTaskNN(*args, **kwargs)

GAT(*args, **kwargs)

GCN(*args, **kwargs)

GIN(*args, **kwargs)

NeuralNetwork(*args, **kwargs)

Path(*args, **kwargs)

PurePath subclass that can make system calls.

SAGE(*args, **kwargs)

class bioneuralnet.downstream_task.dpmon.Autoencoder(*args: Any, **kwargs: Any)[source]

Bases: Module

forward(x)[source]
class bioneuralnet.downstream_task.dpmon.DPMON(adjacency_matrix: DataFrame, omics_list: List[DataFrame], phenotype_data: DataFrame, clinical_data: DataFrame | None = None, model: str = 'GAT', gnn_hidden_dim: int = 16, layer_num: int = 5, nn_hidden_dim1: int = 8, nn_hidden_dim2: int = 8, num_epochs: int = 100, repeat_num: int = 5, lr: float = 0.1, weight_decay: float = 0.0001, tune: bool = False, gpu: bool = False, cuda: int = 0, output_dir: str | None = None)[source]

Bases: object

DPMON (Disease Prediction using Multi-Omics Networks): An end-to-end pipeline for disease prediction using multi-omics networks.

Rather than performing node-level MSE regression, DPMON combines node embeddings with patient-level omics data. A downstream classification head (e.g., a softmax layer trained with CrossEntropyLoss) then performs sample-level disease prediction. This end-to-end approach leverages both local (node-level) and global (patient-level) network information.
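The aggregation described above can be pictured with a small, dependency-free sketch. It is an assumption about the general shape of the computation, not DPMON's actual implementation: each node embedding is collapsed to one scalar by averaging, each patient's omics value is scaled by its node's scalar, and a linear layer followed by softmax and cross-entropy scores the result. All helper names, the toy embeddings, and the toy class weights are hypothetical.

```python
import math


def average_dimensions(node_embeddings):
    """Collapse each node's embedding vector to a single scalar weight."""
    return [sum(vec) / len(vec) for vec in node_embeddings]


def weight_omics(patient_omics, node_weights):
    """Scale every patient's feature values by the matching node weight."""
    return [[v * w for v, w in zip(row, node_weights)] for row in patient_omics]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, class_weights):
    """Linear layer (one weight vector per class) followed by softmax."""
    logits = [sum(f * w for f, w in zip(features, ws)) for ws in class_weights]
    return softmax(logits)


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


# Hypothetical toy data: 3 network nodes (features), 2 patients, 2 classes.
node_embeddings = [[0.2, 0.4], [1.0, 1.0], [0.0, 0.6]]
patient_omics = [[1.0, 2.0, 3.0], [3.0, 0.0, 1.0]]
labels = [1, 0]
class_weights = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]

node_weights = average_dimensions(node_embeddings)
weighted = weight_omics(patient_omics, node_weights)
probs = [classify(row, class_weights) for row in weighted]
losses = [cross_entropy(p, y) for p, y in zip(probs, labels)]
```

In a trained model the class weights would be learned by minimizing the mean of `losses`; here they are fixed by hand only to keep the example short.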

adjacency_matrix

The adjacency matrix of the network.

Type:

pd.DataFrame

omics_list

A list of omics datasets.

Type:

List[pd.DataFrame]

phenotype_data

A DataFrame containing the disease phenotype.

Type:

pd.DataFrame

clinical_data

A DataFrame containing clinical data.

Type:

Optional[pd.DataFrame]

model

The GNN model to use: one of GCN, GAT, SAGE, or GIN. Default='GAT'.

Type:

str

gnn_hidden_dim

The hidden dimension of the GNN. Default=16.

Type:

int

layer_num

The number of GNN layers. Default=5.

Type:

int

nn_hidden_dim1

The hidden dimension of the first NN layer. Default=8.

Type:

int

nn_hidden_dim2

The hidden dimension of the second NN layer. Default=8.

Type:

int

num_epochs

The number of training epochs. Default=100.

Type:

int

repeat_num

The number of training repeats. Default=5.

Type:

int

lr

The learning rate. Default=1e-1.

Type:

float

weight_decay

The weight decay. Default=1e-4.

Type:

float

tune

Whether to perform hyperparameter tuning. Default=False.

Type:

bool

gpu

Whether to use GPU. Default=False.

Type:

bool

cuda

The CUDA device ID. Default=0.

Type:

int

output_dir

The output directory. Default=None.

Type:

Optional[str]

run() → DataFrame[source]

Execute the DPMON pipeline for disease prediction.

Steps:

  1. Combining Omics and Phenotype Data: Merges the provided omics datasets and ensures that the phenotype column (phenotype) is included.

  2. Tuning or Training:

     • Tuning: If tune=True, performs hyperparameter tuning using Ray Tune and returns an empty DataFrame.

     • Training: If tune=False, runs standard training to generate predictions.

  3. Predictions: If training is performed, returns a DataFrame of predictions with 'Actual' and 'Predicted' columns.

Returns: pd.DataFrame

  • If tune=False, a DataFrame containing disease phenotype predictions for each sample.

  • If tune=True, an empty DataFrame, since no predictions are generated.

Raises:

  • ValueError: If the input data is improperly formatted or missing.

  • Exception: For any unforeseen errors encountered during preprocessing, tuning, or training.

Notes:

  • DPMON relies on internally-generated embeddings (via GNNs), node correlations, and a downstream neural network.

  • Ensure that the adjacency matrix and omics data are properly aligned and that clinical/phenotype data match the sample indices.
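The alignment requirement in the notes above can be checked before calling run(). The following is a minimal sketch rather than part of the library: `check_alignment` is a hypothetical helper that compares plain label collections, and it assumes that network nodes correspond to omics feature names and that omics and phenotype data are indexed by the same sample IDs.

```python
def check_alignment(network_nodes, omics_features, omics_samples, phenotype_samples):
    """Return a list of human-readable problems; an empty list means aligned."""
    problems = []

    # Every node in the network should exist as a feature in the omics data.
    missing = sorted(set(network_nodes) - set(omics_features))
    if missing:
        problems.append(f"network nodes missing from omics features: {missing}")

    # Omics and phenotype data should describe exactly the same samples.
    unmatched = sorted(set(omics_samples) ^ set(phenotype_samples))
    if unmatched:
        problems.append(f"samples not shared by omics and phenotype data: {unmatched}")

    return problems


# Hypothetical labels: sample "s3" has omics data but no phenotype entry.
problems = check_alignment(
    network_nodes=["gene_a", "gene_b"],
    omics_features=["gene_a", "gene_b", "gene_c"],
    omics_samples=["s1", "s2", "s3"],
    phenotype_samples=["s1", "s2"],
)
```

With pandas inputs, the arguments could be taken from `adjacency_matrix.index`, `omics.columns`, `omics.index`, and `phenotype_data.index`, assuming samples are stored as rows and features as columns; that layout is an assumption and is not stated in this reference.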

Example:

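from bioneuralnet.downstream_task.dpmon import DPMON

# adjacency_matrix, omics_list, phenotype_data, and clinical_data must be loaded beforehand.
dpmon = DPMON(adjacency_matrix, omics_list, phenotype_data, clinical_data, model='GCN')
predictions = dpmon.run()
print(predictions.head())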
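
Because run() returns an empty DataFrame when tune=True, code that consumes the result may want to handle both cases. The sketch below is one hedged way to do so: `summarize_predictions` is a hypothetical helper that works on a list of row dictionaries, such as the output of pandas' `predictions.to_dict("records")`, and computes accuracy from the 'Actual' and 'Predicted' columns documented above. The sample rows are made up for illustration.

```python
def summarize_predictions(records):
    """Compute accuracy from rows holding 'Actual' and 'Predicted' values.

    Returns None for empty input, which is what a tuning-only run would yield.
    """
    if not records:
        return None
    correct = sum(1 for row in records if row["Actual"] == row["Predicted"])
    return correct / len(records)


# Hypothetical rows standing in for predictions.to_dict("records").
records = [
    {"Actual": 1, "Predicted": 1},
    {"Actual": 0, "Predicted": 1},
    {"Actual": 2, "Predicted": 2},
    {"Actual": 0, "Predicted": 0},
]
accuracy = summarize_predictions(records)
no_predictions = summarize_predictions([])
```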
class bioneuralnet.downstream_task.dpmon.DimensionAveraging(*args: Any, **kwargs: Any)[source]

Bases: Module

forward(x)[source]
class bioneuralnet.downstream_task.dpmon.DownstreamTaskNN(*args: Any, **kwargs: Any)[source]

Bases: Module

forward(x)[source]
class bioneuralnet.downstream_task.dpmon.NeuralNetwork(*args: Any, **kwargs: Any)[source]

Bases: Module

forward(omics_dataset, omics_network_tg)[source]
bioneuralnet.downstream_task.dpmon.build_omics_networks_tg(adjacency_matrix: DataFrame, omics_datasets: List[DataFrame], clinical_data: DataFrame) → List[torch_geometric.data.Data][source]
bioneuralnet.downstream_task.dpmon.run_hyperparameter_tuning(dpmon_params, adjacency_matrix, combined_omics, clinical_data)[source]
bioneuralnet.downstream_task.dpmon.run_standard_training(dpmon_params, adjacency_matrix, combined_omics, clinical_data, output_dir)[source]
bioneuralnet.downstream_task.dpmon.setup_device(gpu, cuda)[source]
bioneuralnet.downstream_task.dpmon.slice_omics_datasets(omics_dataset: DataFrame, adjacency_matrix: DataFrame) → List[DataFrame][source]
bioneuralnet.downstream_task.dpmon.train_model(model, criterion, optimizer, train_data, train_labels, epoch_num)[source]