bioneuralnet.metrics
Functions
cluster_correlation | Compute the Pearson correlation coefficient between PC1 of a cluster and phenotype.
compare_clusters | Compare clusters from two methods by computing the correlation for each induced subnetwork.
evaluate_f1m | Evaluate macro F1-score over multiple runs.
evaluate_f1w | Evaluate weighted F1-score over multiple runs.
evaluate_model | Evaluate a single model (RF or XGB, classif or reg) over multiple runs, returning three tuples.
evaluate_rf | Shortcut function: evaluate a RandomForest (classification or regression).
evaluate_single_run | Do one train/test split, train the specified model.
evaluate_xgb | Shortcut function: evaluate an XGBoost (classification or regression).
louvain_to_adjacency | Convert a Louvain cluster to an adjacency matrix.
omics_correlation | Compute the Pearson correlation between a group of omics data (reduced to one principal component) and a phenotype.
plot_embeddings | Plot the embeddings in 2D space using t-SNE.
plot_multiple_metrics | Consolidate multiple metric grouped performances into one figure.
plot_network | Plots a network graph from an adjacency matrix with improved visualization.
plot_performance | Clean and minimal bar plot comparing raw vs embeddings-based performance.
plot_performance_three | Bar plot comparing performance for raw omics, GNN-enriched omics, and one other method.
plot_variance_by_feature | Plot the variance for each feature against its index or name.
plot_variance_distribution | Compute the variance for each feature (column) in the DataFrame and plot a histogram of these variances.
- bioneuralnet.metrics.cluster_correlation(cluster_df: DataFrame, pheno: DataFrame) -> tuple
Compute the Pearson correlation coefficient between PC1 of a cluster and phenotype.
- Parameters:
cluster_df – DataFrame representing a cluster of samples.
pheno – DataFrame representing the phenotype.
- Returns:
(cluster_size, correlation), or (cluster_size, None) if the correlation cannot be computed.
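The idea of correlating a cluster's first principal component (PC1) with a phenotype can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names (`pearson`, `first_pc_scores`, `cluster_correlation_sketch`) are hypothetical, PC1 is approximated by power iteration, samples are assumed to be rows, and the cluster size is assumed to be the number of feature columns.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    if n < 2 or n != len(y):
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def first_pc_scores(rows, iters=200):
    """Project each sample (row) onto an approximate first principal component.

    Power iteration on the covariance matrix of the centered columns.
    The all-ones start vector is an assumption that works for typical data.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # zero covariance: no direction to find
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(p)) for c in centered]


def cluster_correlation_sketch(rows, pheno):
    """Return (cluster_size, correlation), mirroring the documented return shape."""
    size = len(rows[0])  # assumption: size = number of feature columns
    return size, pearson(first_pc_scores(rows), pheno)


# Toy data: 4 samples, 2 features, one phenotype value per sample.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
pheno = [1.0, 2.1, 2.9, 4.2]
size, corr = cluster_correlation_sketch(rows, pheno)
print(size, corr)
```

The sign of a principal component is arbitrary, so only the magnitude of the resulting correlation is meaningful across different PCA implementations.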
- bioneuralnet.metrics.compare_clusters(louvain_clusters: list, smccnet_clusters: list, pheno: DataFrame, omics_merged: DataFrame, label1: str = 'Louvain', label2: str = 'SmCCNet')
Compare clusters from two methods by computing the correlation for each induced subnetwork. Both inputs are expected to be lists of pandas DataFrames. If the lists have different lengths, only the first min(n, m) clusters are compared.
- Parameters:
louvain_clusters (list of pd.DataFrame) – Each DataFrame represents an induced subnetwork (from Louvain).
smccnet_clusters (list of pd.DataFrame) – Each DataFrame represents an induced subnetwork (from SmCCNet).
pheno (pd.DataFrame) – Phenotype data (the first column is used).
omics_merged (pd.DataFrame) – Full omics data.
label1 (str) – Label for the first method.
label2 (str) – Label for the second method.
- Returns:
Results table with cluster indices, sizes, and correlations
- Return type:
pd.DataFrame
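The "first min(n, m) clusters" rule amounts to pairing the two lists by position and stopping at the shorter one. A minimal sketch of that pairing, assuming clusters are represented as plain lists of feature names rather than DataFrames; the function name `pair_clusters` and the result keys are hypothetical and not taken from the library:

```python
def pair_clusters(first, second, label1="Louvain", label2="SmCCNet"):
    """Pair clusters by position; zip() stops at the shorter list."""
    rows = []
    for idx, (a, b) in enumerate(zip(first, second), start=1):
        rows.append({
            "Cluster": idx,                 # hypothetical column name
            f"{label1} Size": len(a),
            f"{label2} Size": len(b),
        })
    return rows


# Three clusters from one method, two from the other: only two pairs result.
louvain = [["g1", "g2", "g3"], ["g4", "g5"], ["g6"]]
smccnet = [["g1", "g2"], ["g4", "g5", "g7"]]
pairs = pair_clusters(louvain, smccnet)
print(len(pairs))  # 2
```

The third cluster in the longer list is silently ignored, which is worth keeping in mind when the two methods produce very different numbers of clusters.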
- bioneuralnet.metrics.evaluate_f1m(X: ndarray, y: ndarray, model_type: str = 'rf_classif', n_estimators: int = 100, runs: int = 5, seed: int = 119)
Evaluate macro F1-score over multiple runs.
- bioneuralnet.metrics.evaluate_f1w(X: ndarray, y: ndarray, model_type: str = 'rf_classif', n_estimators: int = 100, runs: int = 5, seed: int = 119)
Evaluate weighted F1-score over multiple runs.
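The difference between macro and weighted F1 is easiest to see on a small imbalanced example. The sketch below computes both from scratch using the standard definitions (macro: unweighted mean of per-class F1; weighted: mean of per-class F1 weighted by class support). The function names are illustrative and are not part of `bioneuralnet.metrics`.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label present in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with weights equal to class support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
macro = macro_f1(y_true, y_pred)        # (0.75 + 0.5) / 2 = 0.625
weighted = weighted_f1(y_true, y_pred)  # (0.75 * 4 + 0.5 * 2) / 6 = 0.666...
print(macro, weighted)
```

Macro F1 treats the minority class as equally important, so it is lower here than weighted F1, which is pulled toward the better-predicted majority class.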
- bioneuralnet.metrics.evaluate_model(X: ndarray, y: ndarray, model_type: str = 'rf_classif', n_estimators: int = 150, runs: int = 100, seed: int = 119)
Evaluate a single model (RF or XGB, classification or regression) over multiple runs, returning three tuples.
For classification:
(accuracy_mean, accuracy_std)
(f1_weighted_mean, f1_weighted_std)
(f1_macro_mean, f1_macro_std)
For regression:
(r2_mean, r2_std)
(None, None)
(None, None)
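The three-tuple return shape for classification can be illustrated by collapsing per-run scores into (mean, std) pairs. This is a sketch under assumptions: the helper name `summarize_runs` is hypothetical, the per-run scores are made up for illustration, and the population standard deviation is used, which may differ from what the library computes.

```python
import statistics


def summarize_runs(run_scores):
    """Collapse per-run (accuracy, f1_weighted, f1_macro) into three (mean, std) tuples."""
    columns = zip(*run_scores)  # one column per metric
    # Assumption: population standard deviation (ddof=0).
    return tuple((statistics.mean(col), statistics.pstdev(col)) for col in columns)


# Illustrative scores from two runs, in (accuracy, f1_weighted, f1_macro) order.
runs = [(0.80, 0.78, 0.75), (0.90, 0.88, 0.85)]
acc, f1_weighted, f1_macro = summarize_runs(runs)
print(acc, f1_weighted, f1_macro)
```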
- bioneuralnet.metrics.evaluate_rf(X: ndarray, y: ndarray, mode: str = 'classification', n_estimators: int = 150, runs: int = 100, seed: int = 119, return_all: bool = False)
Shortcut function: evaluate a RandomForest (classification or regression).
- bioneuralnet.metrics.evaluate_single_run(X: ndarray, y: ndarray, model_type: str = 'rf_classif', n_estimators: int = 100, test_size: float = 0.3, seed: int = 119)
Perform one train/test split and train the specified model.
Return: (accuracy, f1_weighted, f1_macro)
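A seeded hold-out split is what makes a single run reproducible. The following sketch shows the general technique in plain Python; it does not reproduce the library's splitting logic, and `holdout_split` and `accuracy` are hypothetical helpers. The defaults mirror the `test_size` and `seed` values in the signature above.

```python
import random


def holdout_split(n_samples, test_size=0.3, seed=119):
    """Return (train_indices, test_indices) from one seeded shuffle."""
    rng = random.Random(seed)          # local generator: no global state touched
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_idx, test_idx = holdout_split(10)
print(len(train_idx), len(test_idx))  # 7 3
```

Calling `holdout_split` again with the same seed returns the same partition, so any score computed on the test indices is repeatable.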
- bioneuralnet.metrics.evaluate_xgb(X: ndarray, y: ndarray, mode: str = 'classification', n_estimators: int = 150, runs: int = 100, seed: int = 119, return_all: bool = False)
Shortcut function: evaluate an XGBoost (classification or regression).
- bioneuralnet.metrics.louvain_to_adjacency(louvain_cluster: DataFrame) -> DataFrame
Convert a Louvain cluster to an adjacency matrix.
- Parameters:
louvain_cluster (pd.DataFrame) – An induced subnetwork (from Louvain).
- Returns:
Adjacency matrix
- Return type:
pd.DataFrame
- bioneuralnet.metrics.omics_correlation(omics: DataFrame, pheno: DataFrame) -> Tuple[float, float]
Compute the Pearson correlation between a group of omics data (reduced to one principal component) and a phenotype.
- bioneuralnet.metrics.plot_embeddings(embeddings, node_labels=None)
Plot the embeddings in 2D space using t-SNE.
- Parameters:
embeddings (array-like) – High-dimensional embedding data.
node_labels (array-like or DataFrame, optional) – Labels for the nodes to color the points.
- bioneuralnet.metrics.plot_multiple_metrics(metrics: dict[str, dict[str, dict[str, tuple[float, float]]]], title_map: dict[str, str] = None, ylabel_map: dict[str, str] = None, filename: Path = None)
Consolidate multiple metric grouped performances into one figure.
Adds numeric labels on top of each bar.
- bioneuralnet.metrics.plot_network(adjacency_matrix, weight_threshold=0.0, show_labels=False, show_edge_weights=False)
Plots a network graph from an adjacency matrix with improved visualization. Also adds a summary table mapping node indexes to actual gene names.
- Parameters:
- Returns:
Mapping of node indexes to actual gene names.
- Return type:
pd.DataFrame
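To illustrate how a weight threshold and an index-to-name mapping relate to an adjacency matrix, here is a small sketch in plain Python. It assumes a symmetric matrix and a strict "greater than" comparison against the threshold; neither is confirmed by this page. The helper name and the gene names are placeholders.

```python
def edges_above_threshold(adjacency, names, weight_threshold=0.0):
    """List undirected edges heavier than the threshold, plus an index-to-name map."""
    n = len(adjacency)
    edges = [
        (i, j, adjacency[i][j])
        for i in range(n)
        for j in range(i + 1, n)            # upper triangle only: matrix assumed symmetric
        if adjacency[i][j] > weight_threshold  # assumption: strict comparison
    ]
    mapping = dict(enumerate(names))        # node index -> gene name
    return edges, mapping


adjacency = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.4],
    [0.1, 0.4, 0.0],
]
names = ["geneA", "geneB", "geneC"]  # placeholder names
edges, mapping = edges_above_threshold(adjacency, names, weight_threshold=0.3)
print(edges)    # [(0, 1, 0.8), (1, 2, 0.4)]
print(mapping)  # {0: 'geneA', 1: 'geneB', 2: 'geneC'}
```

Labeling nodes by index and keeping the names in a separate table keeps a dense plot readable while still letting each node be traced back to its gene.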
- bioneuralnet.metrics.plot_performance(embedding_result, raw_rf_acc, title='Performance Comparison', filename=None)
Clean and minimal bar plot comparing raw vs embeddings-based performance.
- bioneuralnet.metrics.plot_performance_three(raw_score, gnn_score, other_score, labels=['Raw', 'GNN', 'Other'], title='Performance Comparison', filename=None)
Bar plot comparing performance for raw omics, GNN-enriched omics, and one other method.
- bioneuralnet.metrics.plot_variance_by_feature(df: DataFrame)
Plot the variance for each feature against its index or name.
- Parameters:
df (pd.DataFrame) – Input data.
- Returns:
Generated figure.
- Return type:
- bioneuralnet.metrics.plot_variance_distribution(df: DataFrame, bins: int = 50)
Compute the variance for each feature (column) in the DataFrame and plot a histogram of these variances.
- Parameters:
df (pd.DataFrame) – Input data.
bins (int) – Number of bins for the histogram.
- Returns:
Generated figure.
- Return type:
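The numbers behind a variance histogram can be sketched without any plotting: compute one variance per feature column, then count how many variances fall into each equal-width bin. This is an illustration under assumptions, not the library's code: the table is a plain dict of columns, the sample variance (ddof=1, the pandas default) is assumed, and both helper names are hypothetical.

```python
import statistics


def column_variances(table):
    """Sample variance (ddof=1) of each column in a {name: values} mapping."""
    return {name: statistics.variance(values) for name, values in table.items()}


def histogram_counts(values, bins=50):
    """Count values in `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # all values identical: everything lands in the first bin
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return counts


table = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [2.0, 2.0, 2.0],
    "g3": [0.0, 5.0, 10.0],
}
variances = column_variances(table)
counts = histogram_counts(list(variances.values()), bins=5)
print(variances)  # {'g1': 1.0, 'g2': 0.0, 'g3': 25.0}
print(counts)     # [2, 0, 0, 0, 1]
```

A spike in the lowest bin, as here, usually points to near-constant features that are candidates for filtering.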
Modules