bioneuralnet.metrics

Functions

cluster_correlation(cluster_df, pheno)

Compute the Pearson correlation coefficient between PC1 of a cluster and phenotype.

compare_clusters(louvain_clusters, ...[, ...])

Compare clusters from two methods by computing the correlation for each induced subnetwork.

evaluate_f1m(X, y[, model_type, ...])

Evaluate macro F1-score over multiple runs.

evaluate_f1w(X, y[, model_type, ...])

Evaluate weighted F1-score over multiple runs.

evaluate_model(X, y[, model_type, ...])

Evaluate a single model (RF or XGB, classif or reg) over multiple runs, returning three tuples.

evaluate_rf(X, y[, mode, n_estimators, ...])

Shortcut function: evaluate a RandomForest (classification or regression).

evaluate_single_run(X, y[, model_type, ...])

Perform one train/test split and train the specified model.

evaluate_xgb(X, y[, mode, n_estimators, ...])

Shortcut function: evaluate an XGBoost (classification or regression).

louvain_to_adjacency(louvain_cluster)

Convert a Louvain cluster to an adjacency matrix.

omics_correlation(omics, pheno)

Compute the Pearson correlation between a group of omics data (reduced to one principal component) and a phenotype.

plot_embeddings(embeddings[, node_labels])

Plot the embeddings in 2D space using t-SNE.

plot_multiple_metrics(metrics[, title_map, ...])

Consolidate grouped performance results for multiple metrics into one figure.

plot_network(adjacency_matrix[, ...])

Plot a network graph from an adjacency matrix.

plot_performance(embedding_result, raw_rf_acc)

Minimal bar plot comparing raw and embedding-based performance.

plot_performance_three(raw_score, gnn_score, ...)

Bar plot comparing performance for raw omics, GNN-enriched omics, and one other method.

plot_variance_by_feature(df)

Plot the variance for each feature against its index or name.

plot_variance_distribution(df[, bins])

Compute the variance for each feature (column) in the DataFrame and plot a histogram of these variances.

bioneuralnet.metrics.cluster_correlation(cluster_df: DataFrame, pheno: DataFrame) → tuple

Compute the Pearson correlation coefficient between PC1 of a cluster and phenotype.

Parameters:
  • cluster_df – DataFrame representing a cluster of samples.

  • pheno – DataFrame representing the phenotype.

Returns:

A tuple (cluster_size, correlation), or (cluster_size, None) if the correlation cannot be computed.

bioneuralnet.metrics.compare_clusters(louvain_clusters: list, smccnet_clusters: list, pheno: DataFrame, omics_merged: DataFrame, label1: str = 'Louvain', label2: str = 'SmCCNet')

Compare clusters from two methods by computing the correlation for each induced subnetwork. Both inputs are expected to be lists of pandas DataFrames. If the lists have different lengths, only the first min(n, m) clusters are compared.

Parameters:
  • louvain_clusters (list of pd.DataFrame) – Each DataFrame represents an induced subnetwork (from Louvain).

  • smccnet_clusters (list of pd.DataFrame) – Each DataFrame represents an induced subnetwork (from SmCCNet).

  • pheno (pd.DataFrame) – Phenotype data (the first column is used).

  • omics_merged (pd.DataFrame) – Full omics data.

  • label1 (str) – Label for the first method.

  • label2 (str) – Label for the second method.

Returns:

Results table with cluster indices, sizes, and correlations.

Return type:

pd.DataFrame
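The min(n, m) rule above amounts to pairing clusters by position and ignoring the surplus of the longer list. A minimal sketch of that pairing, using plain lists in place of DataFrames; pair_clusters and the row keys are hypothetical names, not part of the library:

```python
def pair_clusters(first, second):
    """Pair clusters by position; zip stops at the shorter list."""
    return [
        {"cluster": index, "size_1": len(a), "size_2": len(b)}
        for index, (a, b) in enumerate(zip(first, second))
    ]


# Three clusters from one method, two from the other: two rows result.
method_1 = [["g1", "g2"], ["g3"], ["g4", "g5", "g6"]]
method_2 = [["g1"], ["g2", "g3"]]
rows = pair_clusters(method_1, method_2)
```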

bioneuralnet.metrics.evaluate_f1m(X: ndarray, y: ndarray, model_type: str = 'rf_classif', n_estimators: int = 100, runs: int = 5, seed: int = 119)

Evaluate macro F1-score over multiple runs.

bioneuralnet.metrics.evaluate_f1w(X: ndarray, y: ndarray, model_type: str = 'rf_classif', n_estimators: int = 100, runs: int = 5, seed: int = 119)

Evaluate weighted F1-score over multiple runs.
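The difference between the macro and weighted averages is easiest to see on a small imbalanced example. The sketch below computes both from the standard definitions in plain Python; the helper names are illustrative and this is not the code path the library uses.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 per class: 2*TP / (2*TP + FP + FN), or 0.0 when undefined."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by class support in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
macro = macro_f1(y_true, y_pred)        # (0.75 + 0.5) / 2 = 0.625
weighted = weighted_f1(y_true, y_pred)  # (0.75*4 + 0.5*2) / 6 ≈ 0.667
```

The weighted score is pulled toward the majority class, so macro F1 is the more conservative choice when minority classes matter.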

bioneuralnet.metrics.evaluate_model(X: ndarray, y: ndarray, model_type: str = 'rf_classif', n_estimators: int = 150, runs: int = 100, seed: int = 119)

Evaluate a single model (RF or XGB, classif or reg) over multiple runs, returning three tuples. For classification:

  • (accuracy_mean, accuracy_std)

  • (f1_weighted_mean, f1_weighted_std)

  • (f1_macro_mean, f1_macro_std)

For regression:

  • (r2_mean, r2_std)

  • (None, None)

  • (None, None)
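The shape of the three returned tuples can be illustrated by aggregating per-run scores. This sketch assumes one (accuracy, f1_weighted, f1_macro) triple per run and uses the population standard deviation; which standard deviation the library reports is not stated here, so treat that choice, and the helper names, as assumptions.

```python
import statistics


def mean_std(values):
    """Return (mean, population standard deviation) of a sequence of scores."""
    return statistics.fmean(values), statistics.pstdev(values)


def summarize_classification(run_results):
    """run_results holds one (accuracy, f1_weighted, f1_macro) triple per run."""
    accuracy, f1_weighted, f1_macro = zip(*run_results)
    return mean_std(accuracy), mean_std(f1_weighted), mean_std(f1_macro)


def summarize_regression(r2_scores):
    """Regression fills only the first tuple; the F1 slots stay empty."""
    return mean_std(r2_scores), (None, None), (None, None)


# Placeholder scores for two runs, for illustration only.
runs = [(0.8, 0.7, 0.6), (1.0, 0.9, 0.8)]
acc, f1w, f1m = summarize_classification(runs)
```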

bioneuralnet.metrics.evaluate_rf(X: ndarray, y: ndarray, mode: str = 'classification', n_estimators: int = 150, runs: int = 100, seed: int = 119, return_all: bool = False)

Shortcut function: evaluate a RandomForest (classification or regression).

bioneuralnet.metrics.evaluate_single_run(X: ndarray, y: ndarray, model_type: str = 'rf_classif', n_estimators: int = 100, test_size: float = 0.3, seed: int = 119)

Perform one train/test split and train the specified model.

Returns:

(accuracy, f1_weighted, f1_macro)
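The role of test_size and seed in a single split can be sketched as follows. This is a generic seeded shuffle-and-cut, not the splitting routine the library calls; split_indices is a hypothetical helper and the rounding rule for the test-set size is an assumption.

```python
import random


def split_indices(n_samples, test_size=0.3, seed=119):
    """Return (train_indices, test_indices) from one seeded shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed, same split
    n_test = max(1, round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(10)
```

Reusing the same seed reproduces the split exactly, which is what makes a single run repeatable.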

bioneuralnet.metrics.evaluate_xgb(X: ndarray, y: ndarray, mode: str = 'classification', n_estimators: int = 150, runs: int = 100, seed: int = 119, return_all: bool = False)

Shortcut function: evaluate an XGBoost (classification or regression).

bioneuralnet.metrics.louvain_to_adjacency(louvain_cluster: DataFrame) → DataFrame

Convert a Louvain cluster to an adjacency matrix.

Parameters:

louvain_cluster (pd.DataFrame) – An induced subnetwork (from Louvain).

Returns:

Adjacency matrix

Return type:

pd.DataFrame

bioneuralnet.metrics.omics_correlation(omics: DataFrame, pheno: DataFrame) → Tuple[float, float]

Compute the Pearson correlation between a group of omics data (reduced to one principal component) and a phenotype.

Parameters:
  • omics (pd.DataFrame) – Omics data with rows as samples and columns as features.

  • pheno (pd.DataFrame) – Phenotype data. Expected to have a single column.

Returns:

Pearson correlation coefficient and p-value.

Return type:

Tuple[float, float]

bioneuralnet.metrics.plot_embeddings(embeddings, node_labels=None)

Plot the embeddings in 2D space using t-SNE.

Parameters:
  • embeddings (array-like) – High-dimensional embedding data.

  • node_labels (array-like or DataFrame, optional) – Labels for the nodes to color the points.

bioneuralnet.metrics.plot_multiple_metrics(metrics: dict[str, dict[str, dict[str, tuple[float, float]]]], title_map: dict[str, str] = None, ylabel_map: dict[str, str] = None, filename: Path = None)

Consolidate grouped performance results for multiple metrics into one figure.

Adds numeric labels on top of each bar.
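The nested type of the metrics argument is easier to read with a concrete value. In the sketch below, the meaning of the three key levels (metric, then group, then method) and the reading of each tuple as (mean, std) are assumptions inferred from the type hint; the keys and numbers are placeholders, not results.

```python
# Assumed layout: metric name -> group name -> method name -> (mean, std).
metrics = {
    "accuracy": {
        "dataset_a": {"raw": (0.81, 0.03), "embeddings": (0.86, 0.02)},
    },
    "f1_macro": {
        "dataset_a": {"raw": (0.74, 0.04), "embeddings": (0.80, 0.03)},
    },
}


def flatten(nested):
    """Yield one (metric, group, method, mean, std) row per innermost entry."""
    for metric, groups in nested.items():
        for group, methods in groups.items():
            for method, (mean, std) in methods.items():
                yield metric, group, method, mean, std


rows = list(flatten(metrics))
```

Flattening the structure this way is a quick check that every innermost value is a two-element tuple before passing the dictionary to a plotting function.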

bioneuralnet.metrics.plot_network(adjacency_matrix, weight_threshold=0.0, show_labels=False, show_edge_weights=False)

Plot a network graph from an adjacency matrix. Also adds a summary table mapping node indexes to actual gene names.

Parameters:
  • adjacency_matrix (pd.DataFrame) – The adjacency matrix of the network.

  • weight_threshold (float) – Minimum weight to keep an edge (default: 0.0).

  • show_labels (bool) – Whether to show node labels.

  • show_edge_weights (bool) – Whether to show edge weights.

Returns:

Mapping of node indexes to actual gene names.

Return type:

pd.DataFrame
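The effect of weight_threshold and the index-to-name mapping can be sketched on a small symmetric matrix. This is an illustration only: threshold_edges is a hypothetical helper, the matrix is a list of lists rather than a DataFrame, and the strict comparison (weight > threshold, so the default 0.0 drops absent edges) is an assumption about how the cutoff is applied.

```python
def threshold_edges(names, matrix, weight_threshold=0.0):
    """Return (edges, mapping) for an undirected weighted graph.

    edges   -- (i, j, weight) triples taken from the upper triangle
    mapping -- node index -> original node name
    """
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # symmetric: visit each pair once
            weight = matrix[i][j]
            if weight > weight_threshold:  # assumption: strict comparison
                edges.append((i, j, weight))
    return edges, dict(enumerate(names))


names = ["geneA", "geneB", "geneC"]
matrix = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.0],
    [0.1, 0.0, 0.0],
]
all_edges, mapping = threshold_edges(names, matrix)
strong_edges, _ = threshold_edges(names, matrix, weight_threshold=0.5)
```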

bioneuralnet.metrics.plot_performance(embedding_result, raw_rf_acc, title='Performance Comparison', filename=None)

Minimal bar plot comparing raw and embedding-based performance.

bioneuralnet.metrics.plot_performance_three(raw_score, gnn_score, other_score, labels=['Raw', 'GNN', 'Other'], title='Performance Comparison', filename=None)

Bar plot comparing performance for raw omics, GNN-enriched omics, and one other method.

bioneuralnet.metrics.plot_variance_by_feature(df: DataFrame)

Plot the variance for each feature against its index or name.

Parameters:

df (pd.DataFrame) – Input data.

Returns:

Generated figure.

Return type:

matplotlib.figure.Figure

bioneuralnet.metrics.plot_variance_distribution(df: DataFrame, bins: int = 50)

Compute the variance for each feature (column) in the DataFrame and plot a histogram of these variances.

Parameters:
  • df (pd.DataFrame) – Input data.

  • bins (int) – Number of bins for the histogram.

Returns:

Generated figure.

Return type:

matplotlib.figure.Figure
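The quantities behind this histogram can be computed by hand to check a plot. The sketch below uses the sample variance (divisor n - 1) and equal-width bins; both are assumptions about the library's behavior, and feature_variances and histogram_counts are hypothetical helpers working on plain dictionaries and lists rather than a DataFrame.

```python
import statistics


def feature_variances(columns):
    """Sample variance for each feature; columns maps name -> list of values."""
    return {name: statistics.variance(values) for name, values in columns.items()}


def histogram_counts(values, bins=50):
    """Count values in equal-width bins spanning [min(values), max(values)]."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # guard against identical values
    counts = [0] * bins
    for value in values:
        index = min(int((value - low) / width), bins - 1)  # max goes in last bin
        counts[index] += 1
    return counts


columns = {"g1": [1, 2, 3], "g2": [2, 2, 2], "g3": [0, 5, 10]}
variances = feature_variances(columns)
counts = histogram_counts(list(variances.values()), bins=5)
```

A large spike in the lowest bin usually indicates many near-constant features, which are common candidates for filtering before model training.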

Modules

correlation

evaluation

plot