bioneuralnet.downstream_task

Classes

DPMON(adjacency_matrix, omics_list, ...[, ...])

DPMON (Disease Prediction using Multi-Omics Networks): An end-to-end pipeline for disease prediction using multi-omics networks.

SubjectRepresentation(omics_data, embeddings)

SubjectRepresentation: integrates network embeddings into omics data.

class bioneuralnet.downstream_task.DPMON(adjacency_matrix: DataFrame, omics_list: List[DataFrame], phenotype_data: DataFrame, clinical_data: DataFrame | None = None, model: str = 'GAT', gnn_hidden_dim: int = 16, layer_num: int = 5, nn_hidden_dim1: int = 8, nn_hidden_dim2: int = 8, num_epochs: int = 100, repeat_num: int = 5, lr: float = 0.1, weight_decay: float = 0.0001, tune: bool = False, gpu: bool = False, cuda: int = 0, output_dir: str | None = None)

Bases: object

DPMON (Disease Prediction using Multi-Omics Networks): An end-to-end pipeline for disease prediction using multi-omics networks.

Rather than performing node-level MSE regression, DPMON combines node embeddings with patient-level omics data. A downstream classification head (e.g., a softmax layer trained with CrossEntropyLoss) then performs sample-level disease prediction. This end-to-end approach uses both local (node-level) and global (patient-level) network information.
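The idea of combining node embeddings with patient-level omics data and classifying with a softmax/cross-entropy head can be sketched in NumPy. This is a minimal illustration under stated assumptions, not DPMON's actual implementation: the helper names are hypothetical, each node embedding is collapsed to one feature weight by a simple mean, and a single linear layer stands in for the downstream network (DPMON itself uses a GNN plus two hidden layers).

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    # Mean negative log-likelihood of each sample's true class.
    eps = 1e-12
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def combine_and_classify(omics, node_emb, W, b):
    """Hypothetical sketch: weight omics features by their node embeddings, then classify.

    omics:    (n_samples, n_features) patient-level omics values
    node_emb: (n_features, emb_dim) one embedding per network node (feature)
    W, b:     weights of a single linear classification layer (assumption)
    """
    # Assumption: collapse each node embedding to one scalar weight via its mean.
    feature_weights = node_emb.mean(axis=1)   # (n_features,)
    weighted = omics * feature_weights        # broadcast the weights over samples
    logits = weighted @ W + b                 # (n_samples, n_classes)
    return softmax(logits)

rng = np.random.default_rng(0)
omics = rng.normal(size=(4, 3))        # 4 patients, 3 omics features
node_emb = rng.normal(size=(3, 16))    # 16-dim embedding per feature
W = rng.normal(size=(3, 2))            # 2 phenotype classes
b = np.zeros(2)
probs = combine_and_classify(omics, node_emb, W, b)
loss = cross_entropy(probs, np.array([0, 1, 0, 1]))
```

In a trained model the embeddings and classifier weights would be learned jointly by minimizing this loss; here they are random, so only the shapes and normalization are meaningful.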

adjacency_matrix

The adjacency matrix of the network.

Type:

pd.DataFrame

omics_list

A list of omics datasets.

Type:

List[pd.DataFrame]

phenotype_data

A DataFrame containing the disease phenotype.

Type:

pd.DataFrame

clinical_data

A DataFrame containing clinical data.

Type:

Optional[pd.DataFrame]

model

The GNN architecture to use: 'GCN', 'GAT', 'SAGE', or 'GIN'. Default='GAT'.

Type:

str

gnn_hidden_dim

The hidden dimension of the GNN. Default=16.

Type:

int

layer_num

The number of GNN layers. Default=5.

Type:

int

nn_hidden_dim1

The hidden dimension of the first NN layer. Default=8.

Type:

int

nn_hidden_dim2

The hidden dimension of the second NN layer. Default=8.

Type:

int

num_epochs

The number of training epochs. Default=100.

Type:

int

repeat_num

The number of training repeats. Default=5.

Type:

int

lr

The learning rate. Default=1e-1.

Type:

float

weight_decay

The weight decay. Default=1e-4.

Type:

float

tune

Whether to perform hyperparameter tuning. Default=False.

Type:

bool

gpu

Whether to use GPU. Default=False.

Type:

bool

cuda

The CUDA device ID. Default=0.

Type:

int

output_dir

The output directory. Default=None.

Type:

Optional[str]
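Before constructing DPMON it can help to validate the hyperparameters listed above. The helper below is hypothetical and not part of bioneuralnet; it checks the documented model names and positive counts, and it assumes that gpu and cuda map to a PyTorch-style device string such as "cuda:0".

```python
VALID_MODELS = {"GCN", "GAT", "SAGE", "GIN"}  # the architectures listed for `model`

def resolve_settings(model="GAT", layer_num=5, num_epochs=100, repeat_num=5,
                     lr=0.1, gpu=False, cuda=0):
    """Hypothetical validation of DPMON-style constructor arguments."""
    if model not in VALID_MODELS:
        raise ValueError(f"model must be one of {sorted(VALID_MODELS)}, got {model!r}")
    # Layer, epoch, and repeat counts must be positive integers.
    for name, value in (("layer_num", layer_num), ("num_epochs", num_epochs),
                        ("repeat_num", repeat_num)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    if lr <= 0:
        raise ValueError(f"lr must be positive, got {lr!r}")
    # Assumption: gpu enables CUDA and cuda selects the device index.
    device = f"cuda:{cuda}" if gpu else "cpu"
    return {"model": model, "device": device}
```

For example, resolve_settings(gpu=True, cuda=1) reports the device "cuda:1", while resolve_settings(model="MLP") raises ValueError.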

run() -> DataFrame

Execute the DPMON pipeline for disease prediction.

Steps:

  1. Combining omics and phenotype data:
     - Merges the provided omics datasets and ensures that the phenotype column is included.

  2. Tuning or training:
     - Tuning: if tune=True, performs hyperparameter tuning using Ray Tune and returns an empty DataFrame.
     - Training: if tune=False, runs standard training to generate predictions.

  3. Predictions:
     - If training is performed, returns a DataFrame of predictions with 'Actual' and 'Predicted' columns.
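Steps 1 and 3 can be sketched with pandas. This is an illustration under assumptions, not DPMON's internal code: the helper names are hypothetical, samples are assumed to be the row index of every table, the phenotype is assumed to be a single-column DataFrame, and an inner join is assumed so that only samples present in every table are kept.

```python
import pandas as pd

def combine_omics_and_phenotype(omics_list, phenotype, phenotype_col="phenotype"):
    """Hypothetical helper: join omics tables on sample index and append the phenotype."""
    # Inner-join column-wise so only samples present in every omics table remain.
    combined = pd.concat(omics_list, axis=1, join="inner")
    # Assumption: phenotype is a single-column DataFrame indexed by sample ID.
    pheno = phenotype.iloc[:, 0].rename(phenotype_col)
    return combined.join(pheno, how="inner")

def predictions_frame(actual, predicted, index):
    # Same layout as the documented output: one row per sample.
    return pd.DataFrame({"Actual": actual, "Predicted": predicted}, index=index)

rna = pd.DataFrame({"g1": [1.0, 2.0, 3.0], "g2": [0.5, 0.1, 0.2]}, index=["s1", "s2", "s3"])
meth = pd.DataFrame({"m1": [0.3, 0.4]}, index=["s1", "s3"])
pheno = pd.DataFrame({"label": [0, 1, 1]}, index=["s1", "s2", "s3"])
data = combine_omics_and_phenotype([rna, meth], pheno)  # samples s1 and s3 only
```

Sample s2 is dropped because it has no methylation values; whether DPMON drops, imputes, or rejects such samples is not stated here.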

Returns: pd.DataFrame

  • If tune=False, a DataFrame containing disease phenotype predictions for each sample.

  • If tune=True, returns an empty DataFrame since no predictions are generated.
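Because run() returns an empty DataFrame when tune=True, downstream code should check for that case before scoring. A minimal sketch, assuming only the documented 'Actual' and 'Predicted' columns; summarize_predictions is a hypothetical helper that computes plain accuracy.

```python
import pandas as pd

def summarize_predictions(predictions: pd.DataFrame):
    """Hypothetical helper: accuracy of a DPMON-style predictions frame, or None."""
    if predictions.empty:
        # tune=True produces no predictions, so there is nothing to score.
        return None
    # Fraction of samples whose predicted label matches the actual label.
    return float((predictions["Actual"] == predictions["Predicted"]).mean())

preds = pd.DataFrame({"Actual": [0, 1, 1, 0], "Predicted": [0, 1, 0, 0]})
accuracy = summarize_predictions(preds)  # 3 of 4 correct
```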

Raises:

  • ValueError: If the input data is improperly formatted or missing.

  • Exception: For any unforeseen errors encountered during preprocessing, tuning, or training.

Notes:

  • DPMON relies on internally generated embeddings (via GNNs), node correlations, and a downstream neural network.

  • Ensure that the adjacency matrix and omics data are properly aligned and that clinical/phenotype data match the sample indices.
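The alignment requirements above can be checked before calling run(). The sketch below is a hypothetical pre-flight helper, not part of bioneuralnet. It assumes that network nodes correspond to omics feature (column) names and that all tables are indexed by sample ID.

```python
import pandas as pd

def check_alignment(adjacency, omics_list, phenotype, clinical=None):
    """Hypothetical check of adjacency/omics/phenotype/clinical alignment."""
    problems = []
    # The adjacency matrix should be square with identical row and column labels.
    if not adjacency.index.equals(adjacency.columns):
        problems.append("adjacency index and columns differ")
    # Assumption: each network node is an omics feature (a column name).
    features = set().union(*(set(df.columns) for df in omics_list))
    missing = set(adjacency.index) - features
    if missing:
        problems.append(f"nodes missing from omics data: {sorted(missing)}")
    # Every omics table, and any clinical table, should cover the phenotype samples.
    samples = set(phenotype.index)
    for i, df in enumerate(omics_list):
        if not samples <= set(df.index):
            problems.append(f"omics_list[{i}] lacks some phenotype samples")
    if clinical is not None and not samples <= set(clinical.index):
        problems.append("clinical data lacks some phenotype samples")
    return problems

adj = pd.DataFrame([[0, 1], [1, 0]], index=["g1", "g2"], columns=["g1", "g2"])
omics = [pd.DataFrame({"g1": [1.0, 2.0], "g2": [3.0, 4.0]}, index=["s1", "s2"])]
pheno = pd.DataFrame({"phenotype": [0, 1]}, index=["s1", "s2"])
issues = check_alignment(adj, omics, pheno)  # empty list when everything lines up
```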

Example:

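from bioneuralnet.downstream_task import DPMON

# adjacency_matrix, omics_list, phenotype_data and clinical_data are prepared pandas DataFrames.
dpmon = DPMON(adjacency_matrix, omics_list, phenotype_data, clinical_data, model='GCN')
predictions = dpmon.run()
print(predictions.head())
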
class bioneuralnet.downstream_task.SubjectRepresentation(omics_data: DataFrame, embeddings: DataFrame, phenotype_data: DataFrame | None = None, phenotype_col: str = 'phenotype', reduce_method: str = 'AE', seed: int | None = None, tune: bool | None = False, output_dir: str | None = None)

Bases: object

SubjectRepresentation: integrates network embeddings into omics data.

run() -> DataFrame

Execute the subject representation workflow. If tuning is enabled, runs hyperparameter tuning and uses the best configuration to reduce the embeddings; otherwise, uses the default reduction method.

Returns: pd.DataFrame

  • Enhanced omics data as a DataFrame.
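One way to picture "integrating network embeddings into omics data" is to reduce each feature's embedding to a single score and use it to reweight that feature's omics column. The sketch below is hypothetical: it swaps in a one-component PCA (via NumPy's SVD) for the documented 'AE' reduction, assumes the embeddings DataFrame is indexed by feature name, and uses min-max scaled scores as weights. It is not the library's actual integration rule.

```python
import numpy as np
import pandas as pd

def integrate_embeddings(omics: pd.DataFrame, embeddings: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: weight each omics feature by a 1-D summary of its embedding.

    Uses a one-component PCA as a stand-in for the 'AE' reduce_method.
    """
    # Keep features that have both omics values and an embedding (assumed row index).
    shared = [c for c in omics.columns if c in embeddings.index]
    emb = embeddings.loc[shared].to_numpy(dtype=float)
    # Centre the embeddings and project them onto the first principal direction.
    centred = emb - emb.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[0]  # one scalar per feature
    # Min-max scale scores into [0, 1] so they act as feature weights.
    span = scores.max() - scores.min()
    weights = (scores - scores.min()) / span if span > 0 else np.ones_like(scores)
    # Multiply each omics column by its feature weight.
    return omics[shared] * weights

omics = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                     index=["s1", "s2"], columns=["g1", "g2", "g3"])
embeddings = pd.DataFrame([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], index=["g1", "g2", "g3"])
enhanced = integrate_embeddings(omics, embeddings)
```

The sign of a principal component is arbitrary, so the end features may swap weights 0 and 1 between runs or platforms; the middle feature here always receives weight 0.5.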

Modules

dpmon

subject_representation