BioNeuralNet - Multi-Omics Integration with Graph Neural Networks

https://img.shields.io/badge/license-MIT-blue.svg https://img.shields.io/pypi/v/bioneuralnet https://static.pepy.tech/badge/bioneuralnet https://img.shields.io/badge/GitHub-View%20Code-blue
BioNeuralNet Logo

Installation

To install BioNeuralNet, simply run:

pip install bioneuralnet

For additional installation details, see Installation.

BioNeuralNet Overview

BioNeuralNet Logo

Embeddings form the core of BioNeuralNet, enabling a number of downstream applications.

BioNeuralNet Core Features

For an End-to-End example example of BioNeuralNet, see Quick Start Guide.

Network Embedding: GNN Embeddings
  • Given a multi-omics network as input, BioNeuralNet can generate embeddings using Graph Neural Networks (GNNs).

  • Generate embeddings using methods such as GCN, GAT, GraphSAGE, and GIN.

  • Outputs can be obtained as native tensors or converted to pandas DataFrames for easy analysis and visualization.

  • Embeddings unlock numerous downstream applications, including disease prediction, enhanced subject representation, clustering, and more.

Graph Clustering: Correlated Clustering
  • Identify functional modules or communities using correlated clustering methods (e.g., CorrelatedPageRank, CorrelatedLouvain, HybridLouvain) that integrate phenotype correlation to extract biologically relevant modules [1].

  • Clustering methods can be applied to any network represented allowing flexible analysis across different domains.

  • All clustering components return either raw partitions dictionaries or induced subnetwork adjacency matrices (as DataFrames) for visualization.

  • Use cases include, feature selection, biomarker discovery, and network-based analysis.

Downstream Tasks: Downstream Tasks
  • Subject Representation:
    • Integrate node embeddings back into omics data to enrich subject-level profiles by weighting features with learned embedding.

    • This embedding-enriched data can be used for downstream tasks such as disease prediction or biomarker discovery.

    • The result can be returned as a DataFrame or a PyTorch tensor, fitting naturally into downstream analyses.

  • Disease Prediction for Multi-Omics Network DPMON [2]:
    • Classification End-to-End pipeline for disease prediction using Graph Neural Network embeddings.

    • DPMON supports hyperparameter tuning-when enabled, it finds the best for the given data.

    • This approach, along with the native pandas integration across modules, ensures that BioNeuralNet can be easily incorporated into your analysis workflows.

Metrics: Metrics
  • Several plotting funcctions to visualize networks, emebddings, variance distribution, cluster comparison, and more.

  • Correlation based functions to compare clustersand omics data with the phenotype.

Utilities: Utils
  • Filtering Functions:
    • Network filtering allows users to select variance or zero-fraction filtering to an omics network.

    • Reducing noise, and removing outliers.

  • Data Conversion:
    • Convert RData files both CSV and to Pandas DataFrame. For ease of integration for R-based workflows.

External Tools: External Tools
  • Graph Construction:
    • BioNeuralNet provides additional tools in the bioneuralnet.external_tools module.

    • Allowing users to generate networks using R-based tools like SmCCNet.

    • While optional, these tools enhance BioNeuralNet’s capabilities and are recommended for comprehensive analysis.

What is BioNeuralNet?

BioNeuralNet is a Python-based framework designed to bridge the gap between multi-omics data analysis and Graph Neural Networks (GNNs). By leveraging advanced techniques, it enables:

  • Graph Clustering: Identifies biologically meaningful communities within omics networks.

  • GNN Embeddings: Learns network-based feature representations from biological graphs, capturing both biological structure and feature correlations for enhanced analysis.

  • Subject Representation: Generates high-quality embeddings for individuals based on multi-omics profiles.

  • Disease Prediction: Builds predictive models using integrated multi-layer biological networks.

Why GNNs?

Traditional methods often struggle to model complex multi-omics relationships due to their inability to capture biological interactions and dependencies. BioNeuralNet addresses this challenge by utilizing GNN-powered embeddings, incorporating models such as:

  • Graph Convolutional Networks (GCN): Aggregates features from neighboring nodes to capture local structure.

  • Graph Attention Networks (GAT): Applies attention mechanisms to prioritize important interactions between biomolecules.

  • GraphSAGE: Enables inductive learning, making it applicable to unseen omics data.

  • Graph Isomorphism Networks (GIN): Improves expressiveness in graph-based learning tasks.

By integrating omics features within a network-aware framework, BioNeuralNet preserves biological interactions, leading to more accurate and interpretable predictions. For a deeper dive into how BioNeuralNet applies GNN embeddings, see GNN Embeddings.

Seamless Data Integration

One of BioNeuralNet’s core strengths is interoperability:

  • Outputs are structured as pandas DataFrames, ensuring easy downstream analysis.

  • Supports integration with external tools and machine learning frameworks, making it adaptable to various research workflows.

  • Works seamlessly with network-based and graph-learning pipelines.

  • Our User API provides detailed information on how to use BioNeuralNet’s modules and functions.

Example: Transforming Multi-Omics for Enhanced Disease Prediction

View full-size image: Transforming Multi-Omics for Enhanced Disease Prediction

Overview of BioNeuralNet's multi-omics integration process

BioNeuralNet: Transforming Multi-Omics for Enhanced Disease Prediction

Below is a quick example demonstrating the following steps:

  1. Data Preparation:

    • Input your multi-omics data (e.g., proteomics, metabolomics) along with phenotype and clinical data.

  2. Network Construction:

    • Not performed internally: Generate the network adjacency matrix externally (SmCCNet).

    • Lightweight wrappers (SmCCNet) are available in bioneuralnet.external_tools for convenience, R is required for their usage.

  3. Disease Prediction:

    • Use DPMON to predict disease phenotypes by integrating the network information with omics data.

    • DPMON supports an end-to-end pipeline with hyperparameter tuning that can return predictions as pandas DataFrames, enabling seamless integration with existing workflows.

Code Example:

import pandas as pd
from bioneuralnet.external_tools import SmCCNet
from bioneuralnet.downstream_task import DPMON

# Step 1: Data Preparation
phenotype_data = pd.read_csv('phenotype_data.csv')
omics_proteins = pd.read_csv('omics_proteins.csv')
omics_metabolites = pd.read_csv('omics_metabolites.csv')
clinical_dt = pd.read_csv('clinical_data.csv')

# Step 2: Network Construction
smccnet = SmCCNet(
    phenotype_df=phenotype_data,
    omics_dfs=[omics_proteins, omics_metabolites],
    data_types=["protein", "metabolite"],
    kfold=5,
    summarization="PCA",
)
global_network, clusters = smccnet.run()
print("Adjacency matrix generated.")

# Step 3: Disease Prediction (DPMON)
dpmon = DPMON(
    adjacency_matrix=global_network,
    omics_list=[omics_proteins, omics_metabolites],
    phenotype_data=phenotype_data,
    clinical_data=clinical_dt,
    model="GCN",
)
predictions = dpmon.run()
print("Disease phenotype predictions:\n", predictions)

Contents:

Indices and References