Example 1: SmCCNet + GNN Embeddings + Subject Representation
This tutorial illustrates how to:
Build: an adjacency matrix with SmCCNet.
Enhance Representation: Generate node embeddings using GNNEmbedding.
Integrate: Incorporate these embeddings into subject-level omics data using SubjectRepresentation.
Workflow:
Construct: - A multi-omics network adjacency using SmCCNet.
Generate: - Node embeddings with a Graph Neural Network (GNN).
Integrate: - These embeddings into subject-level omics data for enhanced representation.
Subject-level embeddings provide richer phenotypic and clinical context.
View full-size image: Subject Representation
Step-by-Step Instructions:
Data Setup: - Load omics data, phenotype data, and clinical data using DatasetLoader.
Network Construction (SmCCNet): - Call SmCCNet.run() to produce an adjacency matrix from multi-omics data.
GNN Embedding: - Pass the adjacency, omics data, and (optionally) clinical data to GNNEmbedding. - Use .fit() and .embed() to generate node embeddings.
Subject Representation: - Integrate these embeddings into omics data via SubjectRepresentation.
Below is a complete Python implementation:
import pandas as pd
from bioneuralnet.datasets import DatasetLoader
from bioneuralnet.external_tools import SmCCNet
from bioneuralnet.network_embedding import GNNEmbedding
from bioneuralnet.downstream_task import SubjectRepresentation
# 1) Load dataset
loader = DatasetLoader("example1")
omics1, omics2, phenotype, clinical = loader.load_data()
# 2) Merge omics data
merged_omics = pd.concat([omics1, omics2], axis=1)
# 3) Generate adjacency matrix using SmCCNet
smccnet = SmCCNet(
phenotype_df=phenotype,
omics_dfs=[omics1, omics2],
data_types=["genes", "proteins"],
kfold=3,
subSampNum=500,
)
global_network, smccnet_clusters = smccnet.run()
# 4) Generate embeddings using GNNEmbedding
gnn_embedding = GNNEmbedding(
adjacency_matrix=global_network,
omics_data=merged_omics,
phenotype_data=phenotype,
clinical_data=clinical,
tune=True,
)
gnn_embedding.fit()
embeddings_output = gnn_embedding.embed(as_df=True)
print(f"GNN embeddings generated. Shape: {embeddings_output.shape}")
# 5) Perform subject representation using SubjectRepresentation
graph_embedding = SubjectRepresentation(
omics_data=merged_omics,
embeddings=embeddings_output,
phenotype_data=phenotype,
tune=True,
)
enhanced_data = graph_embedding.run()
print(f"Enhanced omics data shape: {enhanced_data.shape}")
# Save enhanced omics data
enhanced_data.to_csv("enhanced_omics_data.csv")
Results:
Adjacency Matrix generated using SmCCNet.
Node Embeddings from GNN.
Enhanced Omics Data, integrating node embeddings for subject-level analysis.