BioNeuralNet - Multi-Omics Integration with Graph Neural Networks
Installation
To install BioNeuralNet, simply run:
pip install bioneuralnet
For additional installation details, see Installation.
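After installing, it can be useful to confirm that the package is visible to the interpreter you plan to use. The sketch below relies only on the Python standard library; `installed_version` is a hypothetical helper written for this illustration, not part of BioNeuralNet.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("bioneuralnet")
if version is None:
    print("bioneuralnet is not installed in this environment")
else:
    print(f"bioneuralnet version: {version}")
```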
BioNeuralNet Overview
Embeddings form the core of BioNeuralNet, enabling a number of downstream applications.
BioNeuralNet Core Features
For an end-to-end example of BioNeuralNet, see Quick Start Guide.
- Network Embedding: GNN Embeddings
Given a multi-omics network as input, BioNeuralNet can generate embeddings using Graph Neural Networks (GNNs).
Generate embeddings using methods such as GCN, GAT, GraphSAGE, and GIN.
Outputs can be obtained as native tensors or converted to pandas DataFrames for easy analysis and visualization.
Embeddings unlock numerous downstream applications, including disease prediction, enhanced subject representation, clustering, and more.
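As an illustration of the tensor-to-DataFrame step described above, the following sketch wraps an embedding matrix in a pandas DataFrame indexed by node name. It is not BioNeuralNet's own conversion code: the node names, the `dim_*` column names, and the random array standing in for a GNN output are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Stand-in for a GNN output: one row per network node, one column per
# embedding dimension. For a PyTorch tensor, tensor.detach().cpu().numpy()
# yields an equivalent array.
node_names = ["protein_A", "protein_B", "metabolite_C"]  # hypothetical names
rng = np.random.default_rng(0)
embedding_array = rng.normal(size=(len(node_names), 4))

# Label rows with node names and columns with embedding dimensions.
embeddings_df = pd.DataFrame(
    embedding_array,
    index=node_names,
    columns=[f"dim_{i}" for i in range(embedding_array.shape[1])],
)
print(embeddings_df.round(3))
```

Keeping node names in the index makes it straightforward to join the embeddings back onto omics features later.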
- Graph Clustering: Correlated Clustering
Identify functional modules or communities using correlated clustering methods (e.g., CorrelatedPageRank, CorrelatedLouvain, HybridLouvain) that integrate phenotype correlation to extract biologically relevant modules [1].
Clustering methods can be applied to any network, allowing flexible analysis across different domains.
All clustering components return either raw partition dictionaries or induced subnetwork adjacency matrices (as DataFrames) for visualization.
Use cases include feature selection, biomarker discovery, and network-based analysis.
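The relationship between the two return forms mentioned above (a partition dictionary and induced subnetwork adjacency matrices) can be sketched in a few lines of pandas. This is a minimal illustration that assumes a `{node: cluster_id}` partition and a symmetric adjacency DataFrame; `induced_subnetworks` is a hypothetical helper, not a BioNeuralNet function.

```python
import pandas as pd

# Toy symmetric adjacency matrix; node names are hypothetical.
nodes = ["g1", "g2", "g3", "g4"]
adjacency = pd.DataFrame(
    [[0.0, 0.8, 0.1, 0.0],
     [0.8, 0.0, 0.0, 0.2],
     [0.1, 0.0, 0.0, 0.9],
     [0.0, 0.2, 0.9, 0.0]],
    index=nodes,
    columns=nodes,
)

# A partition in the common {node: cluster_id} form.
partition = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}


def induced_subnetworks(adjacency, partition):
    """Return {cluster_id: adjacency restricted to that cluster's nodes}."""
    members = {}
    for node, cluster_id in partition.items():
        members.setdefault(cluster_id, []).append(node)
    return {cid: adjacency.loc[ns, ns] for cid, ns in members.items()}


subnetworks = induced_subnetworks(adjacency, partition)
print(subnetworks[0])
```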
- Downstream Tasks: Downstream Tasks
- Subject Representation:
Integrate node embeddings back into omics data to enrich subject-level profiles by weighting features with learned embeddings.
This embedding-enriched data can be used for downstream tasks such as disease prediction or biomarker discovery.
The result can be returned as a DataFrame or a PyTorch tensor, fitting naturally into downstream analyses.
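To make the idea of weighting features with learned embeddings concrete, here is a minimal sketch. It assumes each feature's embedding is collapsed to a single scalar weight; the L2 norm used here is an arbitrary stand-in for whatever reduction the library applies, and `enrich_with_embeddings` is a hypothetical name, not the SubjectRepresentation API.

```python
import numpy as np
import pandas as pd

# Toy omics table: rows are subjects, columns are features (network nodes).
omics = pd.DataFrame(
    {"g1": [1.0, 2.0, 3.0], "g2": [0.5, 0.1, 0.9], "g3": [4.0, 4.5, 3.5]},
    index=["s1", "s2", "s3"],
)

# Toy node embeddings: rows are the same features, columns are dimensions.
embeddings = pd.DataFrame(
    [[0.2, 1.0], [0.4, -0.5], [1.5, 0.3]],
    index=["g1", "g2", "g3"],
    columns=["dim_0", "dim_1"],
)


def enrich_with_embeddings(omics, embeddings):
    """Scale each omics column by a scalar weight derived from its embedding."""
    norms = np.linalg.norm(embeddings.to_numpy(), axis=1)
    if norms.max() == 0:
        raise ValueError("all embeddings are zero; no weights can be derived")
    # Normalise so the largest weight is 1 (an assumption of this sketch).
    weights = pd.Series(norms / norms.max(), index=embeddings.index)
    # Features without an embedding become NaN after reindexing.
    return omics.mul(weights.reindex(omics.columns), axis=1)


enriched = enrich_with_embeddings(omics, embeddings)
print(enriched.round(3))
```

The output keeps the subjects-by-features shape of the input, so it can replace the original omics table in a downstream model.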
- Disease Prediction using Multi-Omics Networks (DPMON) [2]:
End-to-end classification pipeline for disease prediction using Graph Neural Network embeddings.
DPMON supports hyperparameter tuning; when enabled, it searches for the best configuration for the given data.
This approach, along with the native pandas integration across modules, ensures that BioNeuralNet can be easily incorporated into your analysis workflows.
- Metrics: Metrics
Several plotting functions to visualize networks, embeddings, variance distribution, cluster comparison, and more.
Correlation-based functions to compare clusters and omics data with the phenotype.
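One common way to compare a cluster with the phenotype is to summarize the cluster's features by their first principal component and correlate that summary with the phenotype. The sketch below shows that idea with NumPy and pandas; it is an assumption about how such a metric could look, and `cluster_phenotype_correlation` is a hypothetical name rather than a function from the Metrics module.

```python
import numpy as np
import pandas as pd


def cluster_phenotype_correlation(omics, phenotype, cluster_features):
    """Absolute Pearson correlation between a cluster's first principal
    component (across subjects) and the phenotype."""
    block = omics[cluster_features]
    centered = (block - block.mean(axis=0)).to_numpy()
    # Project subjects onto the first principal axis of the cluster features.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = centered @ vt[0]
    target = phenotype.loc[omics.index].to_numpy()
    # The sign of a principal component is arbitrary, so report |r|.
    return abs(float(np.corrcoef(pc1, target)[0, 1]))


# Toy data: g1 and g2 track the phenotype, g3 does not.
subjects = ["s1", "s2", "s3", "s4", "s5"]
omics = pd.DataFrame(
    {"g1": [1.0, 2.0, 3.0, 4.0, 5.0],
     "g2": [2.0, 4.0, 6.0, 8.0, 10.5],
     "g3": [5.0, 1.0, 4.0, 2.0, 3.0]},
    index=subjects,
)
phenotype = pd.Series([1.1, 2.0, 2.9, 4.2, 5.0], index=subjects)

strong = cluster_phenotype_correlation(omics, phenotype, ["g1", "g2"])
weak = cluster_phenotype_correlation(omics, phenotype, ["g3"])
print(f"cluster {{g1, g2}}: {strong:.3f}   cluster {{g3}}: {weak:.3f}")
```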
- Utilities: Utils
- Filtering Functions:
Network filtering allows users to apply variance or zero-fraction filtering to an omics network, reducing noise and removing outliers.
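The two filtering criteria can be sketched on a subjects-by-features table as follows. This is an illustration under assumptions, not the Utils implementation: `filter_features`, its parameter names, and the thresholds are hypothetical. Once features are dropped, the network would be restricted to the remaining nodes.

```python
import pandas as pd


def filter_features(omics, min_variance=0.0, max_zero_fraction=1.0):
    """Keep columns whose variance exceeds `min_variance` and whose share of
    zero entries does not exceed `max_zero_fraction`."""
    variances = omics.var(axis=0)
    zero_fraction = (omics == 0).mean(axis=0)
    keep = (variances > min_variance) & (zero_fraction <= max_zero_fraction)
    return omics.loc[:, keep]


omics = pd.DataFrame({
    "g1": [1.0, 2.0, 3.0, 4.0],  # varies, no zeros
    "g2": [5.0, 5.0, 5.0, 5.0],  # zero variance
    "g3": [0.0, 0.0, 0.0, 7.0],  # 75% zeros
})
filtered = filter_features(omics, min_variance=0.1, max_zero_fraction=0.5)
print(list(filtered.columns))  # ['g1']
```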
- Data Conversion:
Convert RData files to both CSV and pandas DataFrames for easier integration with R-based workflows.
- External Tools: External Tools
- Graph Construction:
BioNeuralNet provides additional tools in the bioneuralnet.external_tools module, allowing users to generate networks using R-based tools like SmCCNet.
While optional, these tools enhance BioNeuralNet’s capabilities and are recommended for comprehensive analysis.
What is BioNeuralNet?
BioNeuralNet is a Python-based framework designed to bridge the gap between multi-omics data analysis and Graph Neural Networks (GNNs). By leveraging advanced techniques, it enables:
- Graph Clustering: Identifies biologically meaningful communities within omics networks.
- GNN Embeddings: Learns network-based feature representations from biological graphs, capturing both biological structure and feature correlations for enhanced analysis.
- Subject Representation: Generates high-quality embeddings for individuals based on multi-omics profiles.
- Disease Prediction: Builds predictive models using integrated multi-layer biological networks.
Why GNNs?
Traditional methods often struggle to model complex multi-omics relationships due to their inability to capture biological interactions and dependencies. BioNeuralNet addresses this challenge by utilizing GNN-powered embeddings, incorporating models such as:
- Graph Convolutional Networks (GCN): Aggregates features from neighboring nodes to capture local structure.
- Graph Attention Networks (GAT): Applies attention mechanisms to prioritize important interactions between biomolecules.
- GraphSAGE: Enables inductive learning, making it applicable to unseen omics data.
- Graph Isomorphism Networks (GIN): Improves expressiveness in graph-based learning tasks.
By integrating omics features within a network-aware framework, BioNeuralNet preserves biological interactions, leading to more accurate and interpretable predictions. For a deeper dive into how BioNeuralNet applies GNN embeddings, see GNN Embeddings.
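To show what aggregating features from neighboring nodes means in practice, the sketch below implements one GCN propagation step and the GIN sum aggregation in plain NumPy. These are the standard textbook formulations, not BioNeuralNet's internal code. The function names are hypothetical, the adjacency matrix is assumed to be non-negative, and the GIN function shows only the aggregation (the full GIN layer then applies an MLP).

```python
import numpy as np


def gcn_layer(adjacency, features, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees >= 1 here
    norm_adj = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)


def gin_aggregate(adjacency, features, eps=0.0):
    """GIN-style aggregation: (1 + eps) * h_v plus the sum over neighbours."""
    return (1.0 + eps) * features + adjacency @ features


# Toy path graph with three nodes: 0 - 1 - 2.
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
features = np.eye(3)      # one-hot node features
weight = np.ones((3, 2))  # fixed weights for illustration; normally learned

gcn_out = gcn_layer(adjacency, features, weight)
gin_out = gin_aggregate(adjacency, features)
print(gcn_out.shape, gin_out.shape)
```

GCN averages over a degree-normalized neighborhood, while GIN keeps the raw sum, which is what lets it distinguish neighborhoods that a mean would collapse.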
Seamless Data Integration
One of BioNeuralNet’s core strengths is interoperability:
- Outputs are structured as pandas DataFrames, ensuring easy downstream analysis.
- Supports integration with external tools and machine learning frameworks, making it adaptable to various research workflows.
- Works seamlessly with network-based and graph-learning pipelines.
Our User API provides detailed information on how to use BioNeuralNet’s modules and functions.
Example: Transforming Multi-Omics for Enhanced Disease Prediction
BioNeuralNet: Transforming Multi-Omics for Enhanced Disease Prediction
Below is a quick example demonstrating the following steps:
Data Preparation:
Input your multi-omics data (e.g., proteomics, metabolomics) along with phenotype and clinical data.
Network Construction:
Network construction is not performed internally; generate the network adjacency matrix externally (for example, with SmCCNet).
Lightweight wrappers such as SmCCNet are available in bioneuralnet.external_tools for convenience; R is required to use them.
Disease Prediction:
Use DPMON to predict disease phenotypes by integrating the network information with omics data.
DPMON supports an end-to-end pipeline with hyperparameter tuning that can return predictions as pandas DataFrames, enabling seamless integration with existing workflows.
Code Example:
import pandas as pd
from bioneuralnet.external_tools import SmCCNet
from bioneuralnet.downstream_task import DPMON

# Step 1: Data Preparation
phenotype_data = pd.read_csv('phenotype_data.csv')
omics_proteins = pd.read_csv('omics_proteins.csv')
omics_metabolites = pd.read_csv('omics_metabolites.csv')
clinical_dt = pd.read_csv('clinical_data.csv')

# Step 2: Network Construction
smccnet = SmCCNet(
    phenotype_df=phenotype_data,
    omics_dfs=[omics_proteins, omics_metabolites],
    data_types=["protein", "metabolite"],
    kfold=5,
    summarization="PCA",
)
global_network, clusters = smccnet.run()
print("Adjacency matrix generated.")

# Step 3: Disease Prediction (DPMON)
dpmon = DPMON(
    adjacency_matrix=global_network,
    omics_list=[omics_proteins, omics_metabolites],
    phenotype_data=phenotype_data,
    clinical_data=clinical_dt,
    model="GCN",
)
predictions = dpmon.run()
print("Disease phenotype predictions:\n", predictions)
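The example above takes for granted that the phenotype, omics, and clinical tables describe the same subjects. If that is not guaranteed for your files, a small alignment step during data preparation can help. The sketch below assumes subject IDs are stored in each DataFrame's index (for instance by reading the CSVs with `index_col=0`); `align_subjects` is a hypothetical helper, not part of BioNeuralNet.

```python
import pandas as pd


def align_subjects(*tables):
    """Restrict every table to the subject IDs they all share, in sorted order."""
    shared = tables[0].index
    for table in tables[1:]:
        shared = shared.intersection(table.index)
    shared = shared.sort_values()
    return [table.loc[shared] for table in tables]


# Toy tables with partially overlapping subjects.
proteins = pd.DataFrame({"p1": [0.1, 0.2, 0.3]}, index=["s1", "s2", "s3"])
metabolites = pd.DataFrame({"m1": [1.0, 2.0, 3.0]}, index=["s2", "s3", "s4"])
phenotype = pd.DataFrame({"phenotype": [1, 0, 1]}, index=["s3", "s2", "s1"])

proteins, metabolites, phenotype = align_subjects(proteins, metabolites, phenotype)
print(list(phenotype.index))  # ['s2', 's3']
```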
Contents:
- Installation
- GNN Embeddings
- Key Contributions:
- GNN Model Overviews
- Dimensionality Reduction and Downstream Integration
- Task-Driven (Supervised/Semi-Supervised) GNNs
- Generating Low-Dimensional Embeddings for Multi-Omics
- Key Insights into GNN Parameters and Outputs
- Dimensionality Reduction: PCA vs. Autoencoders
- How DPMON Uses GNNs Differently
- Example Usage
- Correlated Clustering
- Metrics
- Utils
- Downstream Tasks
- Quick Start Guide
- BioNeuralNet offers a number of classes and functions.
- Load Data
- Generate Global Network using SmCCNet
- Generate Network Embeddings using GNNEmbedding
- Embeddings visualization
- Integrate Embeddings into Omics Data with SubjectRepresentation
- Comparing results for prediction task
- Disease Prediction using DPMON (Disease Prediction using Multi-Omics Networks)
- Phenotype after:
- Visualizing the variance distribution and feature variance
- DPMON Predictions
- Clustering with CorrelatedLouvain and HybridLouvain
- Let's plot the clustered network from CorrelatedLouvain
- TCGA-BRCA Demo
- Preprocessing
- Optional: Load the data we just saved to make sure it looks okay.
- Easy Access via DatasetLoader
- Preparing Multi-Omics Data for downstream tasks
- TOPMed Demo
- Trans-Omics for Precision Medicine | TOPMed
- Loading your data:
- Ease of component exploration via DatasetLoader
- Generating a Multi-Omics Network using SmCCNet
- SmCCNet Output
- Generating Low-Dimensional Embeddings using Graph Neural Networks to capture meaningful biological interactions.
- Output
- Embeddings visualization
- Using the Embeddings
- Comparing results
- Network Visualization
- Correlated Clustering
- DPMON (Disease Prediction using Multi-Omics Networks) reuses the same GNN architectures but with a different objective:
- DPMON allows BioNeuralNet users to significantly improve phenotype predictions with a few lines of code.
- Tutorials
- External Tools
- User API
- Acknowledgments