bioneuralnet.downstream_task
Classes
| Class | Description |
| --- | --- |
| DPMON | DPMON (Disease Prediction using Multi-Omics Networks): An end-to-end pipeline for disease prediction using multi-omics networks. |
| SubjectRepresentation | SubjectRepresentation class for integrating network embeddings into omics data. |
- class bioneuralnet.downstream_task.DPMON(adjacency_matrix: DataFrame, omics_list: List[DataFrame], phenotype_data: DataFrame, clinical_data: DataFrame | None = None, model: str = 'GAT', gnn_hidden_dim: int = 16, layer_num: int = 5, nn_hidden_dim1: int = 8, nn_hidden_dim2: int = 8, num_epochs: int = 100, repeat_num: int = 5, lr: float = 0.1, weight_decay: float = 0.0001, tune: bool = False, gpu: bool = False, cuda: int = 0, output_dir: str | None = None)
Bases: object
DPMON (Disease Prediction using Multi-Omics Networks): An end-to-end pipeline for disease prediction using multi-omics networks.
Instead of performing node-level MSE regression, DPMON combines node embeddings with patient-level omics data and applies a downstream classification head (e.g., a softmax layer trained with CrossEntropyLoss) for sample-level disease prediction. This end-to-end approach leverages both local (node-level) and global (patient-level) network information.
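To make the classification head concrete, the sketch below computes class probabilities with a softmax and the mean cross-entropy loss over a batch of samples, which is what a softmax layer with CrossEntropyLoss evaluates. It is plain Python rather than PyTorch, the helper names are illustrative, and the logits are made-up inputs rather than outputs of DPMON's network.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert one sample's logits into class probabilities."""
    # Subtracting the max logit avoids overflow and does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_sum_exp(logits: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(z)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def cross_entropy(batch_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean of -log softmax(logits)[label] over the batch (mean reduction)."""
    # -log softmax(z)[y] simplifies to log_sum_exp(z) - z[y].
    losses = [log_sum_exp(z) - z[y] for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


def predict(batch_logits: Sequence[Sequence[float]]) -> List[int]:
    """Sample-level prediction: the index of the highest-scoring class."""
    return [max(range(len(z)), key=lambda k: z[k]) for z in batch_logits]
```

For example, predict([[2.0, 0.5], [0.1, 1.2]]) returns [0, 1]. Computing the loss through log-sum-exp rather than taking log(softmax(...)) directly keeps it finite when one class probability underflows to zero.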
- adjacency_matrix
The adjacency matrix of the network.
- Type:
pd.DataFrame
- omics_list
A list of omics datasets.
- Type:
List[pd.DataFrame]
- phenotype_data
A DataFrame containing the disease phenotype.
- Type:
pd.DataFrame
- clinical_data
A DataFrame containing clinical data.
- Type:
Optional[pd.DataFrame]
- gnn_hidden_dim
The hidden dimension of the GNN. Default=16.
- Type:
int
- nn_hidden_dim1
The hidden dimension of the first NN layer. Default=8.
- Type:
int
- nn_hidden_dim2
The hidden dimension of the second NN layer. Default=8.
- Type:
int
- run() → DataFrame
Execute the DPMON pipeline for disease prediction.
Steps:
1. Combining Omics and Phenotype Data:
   - Merges the provided omics datasets and ensures that the 'phenotype' column is included.
2. Tuning or Training:
   - Tuning: If tune=True, performs hyperparameter tuning using Ray Tune and returns an empty DataFrame.
   - Training: If tune=False, runs standard training to generate predictions.
3. Predictions:
   - If training is performed, returns a DataFrame of predictions with 'Actual' and 'Predicted' columns.
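The three steps can be outlined in plain Python, with each DataFrame stood in for by a dict keyed by sample ID and the returned DataFrame by a list of records. This is a hypothetical sketch of the control flow only: combine, run_pipeline, and the train_and_predict callable are illustrative names, not part of the bioneuralnet API, and keeping only samples shared by every table is an assumption.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Table = Dict[str, Row]  # sample ID -> {column name: value}


def combine(omics_list: List[Table], phenotype: Dict[str, int]) -> Table:
    """Step 1: merge omics tables on shared sample IDs and add a 'phenotype' column."""
    # Assumption: keep only samples present in every omics table and in the phenotype data.
    shared = set(phenotype)
    for table in omics_list:
        shared &= set(table)
    combined: Table = {}
    for sample in sorted(shared):
        row: Row = {}
        for table in omics_list:
            row.update(table[sample])
        row["phenotype"] = phenotype[sample]
        combined[sample] = row
    return combined


def run_pipeline(
    omics_list: List[Table],
    phenotype: Dict[str, int],
    train_and_predict: Callable[[Table], Dict[str, int]],  # hypothetical trainer
    tune: bool = False,
) -> List[Dict[str, object]]:
    """Hypothetical outline of run(); a list of records stands in for the DataFrame."""
    data = combine(omics_list, phenotype)
    if not data:
        raise ValueError("no samples are shared by the omics and phenotype data")
    # Step 2 (tuning): no predictions are produced, so the result is empty.
    if tune:
        return []
    # Steps 2 (training) and 3: train, then report actual vs. predicted labels.
    predicted = train_and_predict(data)
    return [
        {"sample": s, "Actual": row["phenotype"], "Predicted": predicted[s]}
        for s, row in data.items()
    ]
```

Passing a trainer as a callable keeps the sketch independent of any particular GNN; in DPMON that role is filled by the internal embedding and neural-network training.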
Returns: pd.DataFrame
If tune=False, a DataFrame containing disease phenotype predictions for each sample.
If tune=True, returns an empty DataFrame since no predictions are generated.
Raises:
ValueError: If the input data is improperly formatted or missing.
Exception: For any unforeseen errors encountered during preprocessing, tuning, or training.
Notes:
DPMON relies on internally generated embeddings (via GNNs), node correlations, and a downstream neural network.
Ensure that the adjacency matrix and omics data are properly aligned and that clinical/phenotype data match the sample indices.
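A small pre-flight check can catch misaligned inputs before run() is called. The sketch assumes that the adjacency matrix's rows and columns are omics feature names and that every table is indexed by sample ID; check_alignment is a hypothetical helper that works on plain label sequences, so with pandas you would pass, for example, list(df.index) and list(df.columns).

```python
from typing import List, Optional, Sequence


def check_alignment(
    adj_rows: Sequence[str],
    adj_cols: Sequence[str],
    omics_columns: List[Sequence[str]],
    omics_index: List[Sequence[str]],
    phenotype_index: Sequence[str],
    clinical_index: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return human-readable problems; an empty list means the inputs line up."""
    problems: List[str] = []
    # Assumption: the network is square and its nodes are omics features.
    if list(adj_rows) != list(adj_cols):
        problems.append("adjacency matrix rows and columns differ")
    features = set()
    for cols in omics_columns:
        features.update(cols)
    missing_nodes = set(adj_rows) - features
    if missing_nodes:
        problems.append(f"nodes without omics columns: {sorted(missing_nodes)}")
    # Assumption: every table should cover exactly the phenotype's sample IDs.
    samples = set(phenotype_index)
    for i, idx in enumerate(omics_index):
        if set(idx) != samples:
            problems.append(f"omics table {i} samples differ from phenotype samples")
    if clinical_index is not None and set(clinical_index) != samples:
        problems.append("clinical samples differ from phenotype samples")
    return problems
```

Raising a ValueError when the returned list is non-empty would match the error behavior documented for run().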
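Example:
    from bioneuralnet.downstream_task import DPMON

    # adjacency_matrix, phenotype_data and clinical_data are pandas DataFrames,
    # omics_list is a list of DataFrames; all are prepared beforehand
    dpmon = DPMON(adjacency_matrix, omics_list, phenotype_data, clinical_data, model='GCN')
    predictions = dpmon.run()
    print(predictions.head())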
- class bioneuralnet.downstream_task.SubjectRepresentation(omics_data: DataFrame, embeddings: DataFrame, phenotype_data: DataFrame | None = None, phenotype_col: str = 'phenotype', reduce_method: str = 'AE', seed: int | None = None, tune: bool | None = False, output_dir: str | None = None)
Bases: object
SubjectRepresentation class for integrating network embeddings into omics data.
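As a hedged illustration of what integrating network embeddings into omics data can look like, the sketch below reduces each feature's embedding vector to a single scalar and uses it to rescale that feature's values for every subject. The default reduce_method='AE' suggests an autoencoder-based reduction; this sketch swaps in a plain mean instead, and integrate is a hypothetical function, not a method of the class.

```python
from statistics import fmean
from typing import Dict, List

Embeddings = Dict[str, List[float]]   # feature name -> embedding vector
Omics = Dict[str, Dict[str, float]]   # sample ID -> {feature name: value}


def integrate(omics: Omics, embeddings: Embeddings) -> Omics:
    """Scale each omics feature by a scalar summary of its network embedding."""
    # Swapped-in reduction: the mean of each embedding vector (not an autoencoder).
    weights = {feature: fmean(vec) for feature, vec in embeddings.items()}
    enriched: Omics = {}
    for sample, row in omics.items():
        # Assumption: features without an embedding keep their values (weight 1.0).
        enriched[sample] = {f: v * weights.get(f, 1.0) for f, v in row.items()}
    return enriched
```

The function builds new rows rather than modifying its input, so the original omics values stay available for comparison.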
Modules