Utils
RData Conversion
bioneuralnet.utils.rdata_to_df()converts an RData file to a CSV and loads it into a pandas DataFrame.
Logging
bioneuralnet.utils.get_logger()configures and returns a logger writing tobioneuralnet.logat the project root.
Graph Generation
This section details utility functions for generating networks from omics data matrices. All methods are industry-standard and open in the literature.
Methods and Examples
k-NN Cosine/RBF Similarity Graph
Computes either cosine similarity:
\[S_{ij} = \frac{x_i^\top x_j}{\|x_i\|\,\|x_j\|}\]or Gaussian RBF kernel:
\[S_{ij} = \exp\bigl(-\|x_i - x_j\|^2 /(2\sigma^2)\bigr)\]Sparsify by keeping top-(k) neighbors per node, optionally mutual.
from bioneuralnet.utils.graphs import gen_similarity_graph A = gen_similarity_graph(X, k=15, metric='cosine', mutual=True) Reference: Hastie et al., 2009 [Hastie2009_]
Pearson/Spearman Co-expression Graph
Compute correlation:
\[C_{ij} = \mathrm{corr}(x_i, x_j)\]Then sparsify by top-(k) or threshold.
from bioneuralnet.utils.graphs import gen_correlation_graph A = gen_correlation_graph(X, k=15, method='pearson') Reference: Langfelder & Horvath, 2008 [Langfelder2008_]
Soft-Threshold Graph
Soft-threshold absolute correlation, similar to WGCNA:
\[W_{ij} = |C_{ij}|^\beta\]Then top-(k) selection and normalization.
from bioneuralnet.utils.graphs import gen_threshold_graph A = gen_threshold_graph(X, b=6.0, k=15) Reference: Langfelder & Horvath, 2008 [Langfelder2008_]
Gaussian k-NN Graph
Gaussian kernel sparsified by k-nearest neighbors:
\[S_{ij} = \exp\bigl(-\|x_i - x_j\|^2 /(2\sigma^2)\bigr),\quad W = \text{Top}_k(S)\]from bioneuralnet.utils.graphs import gen_gaussian_knn_graph A = gen_gaussian_knn_graph(X, k=15, sigma=None)
Credit: adapts common practice from spectral clustering (Ng et al., 2002).
Mutual Information Graph
Estimate pairwise mutual information:
\[\mathrm{MI}_{ij} = I(x_i; x_j)\]from bioneuralnet.utils.graphs import gen_mutual_info_graph A = gen_mutual_info_graph(X, k=15) Reference: Margolin et al., 2006 [Margolin2006_]
Preprocessing Utilities
A collection of data-cleaning and feature-selection functions for clinical and omics datasets.
Clinical Preprocessing
bioneuralnet.utils.preprocess.preprocess_clinical()Splits numeric and categorical features; replaces Inf/NaN; optionally scales numeric data (RobustScaler); encodes categoricals; drops zero-variance; and selects top-k features by RandomForest importance.Example:
from bioneuralnet.utils.preprocess import preprocess_clinical df_top = preprocess_clinical(X, y, top_k=10, scale=True)
bioneuralnet.utils.preprocess.clean_inf_nan()Replaces Inf with NaN, imputes medians, drops zero-variance columns, and logs counts.Example:
from bioneuralnet.utils.preprocess import clean_inf_nan df_clean = clean_inf_nan(df)
Variance-Based Selection
bioneuralnet.utils.preprocess.select_top_k_variance()Cleans data, then picks the top-k numeric features by variance.Example:
from bioneuralnet.utils.preprocess import select_top_k_variance df_var = select_top_k_variance(df, k=500)
Correlation-Based Selection
bioneuralnet.utils.preprocess.select_top_k_correlation()- Supervised: if you pass y, selects features by absolute Pearson correlation with y. - Unsupervised: if y=None, picks features with the lowest average inter-feature correlation (redundancy reduction).Example:
from bioneuralnet.utils.preprocess import select_top_k_correlation df_sup = select_top_k_correlation(X, y, top_k=100) # supervised df_unsup = select_top_k_correlation(X, top_k=100) # unsupervised
RandomForest Feature Importance
bioneuralnet.utils.preprocess.select_top_randomforest()Requires numeric inputs; cleans data; drops zero-variance; fits RandomForest; and returns the top-k features by importance.Example:
from bioneuralnet.utils.preprocess import select_top_randomforest df_rf = select_top_randomforest(X, y, top_k=200)
ANOVA F-Test Selection
bioneuralnet.utils.preprocess.top_anova_f_features()Runs ANOVA F-test (classification or regression), applies FDR correction, selects all significant features, and pads with next-best to reach max_features.Example:
from bioneuralnet.utils.preprocess import top_anova_f_features df_anova = top_anova_f_features(X, y, max_features=100, alpha=0.05)
Network Pruning
bioneuralnet.utils.preprocess.prune_network()Prunes edges below a weight threshold, removes isolates, and logs before/after stats.Example:
from bioneuralnet.utils.preprocess import prune_network pruned = prune_network(adj_df, weight_threshold=0.1)
bioneuralnet.utils.preprocess.prune_network_by_quantile()Uses a quantile cutoff on edge weights, prunes accordingly, removes isolates, and logs stats.Example:
from bioneuralnet.utils.preprocess import prune_network_by_quantile pruned_q = prune_network_by_quantile(adj_df, quantile=0.75)
bioneuralnet.utils.preprocess.network_remove_low_variance()Drops rows/columns in the adjacency matrix whose variance falls below a threshold.Example:
from bioneuralnet.utils.preprocess import network_remove_low_variance filtered = network_remove_low_variance(adj_df, threshold=1e-5)
bioneuralnet.utils.preprocess.network_remove_high_zero_fraction()Removes rows/columns where the fraction of zero weights exceeds a threshold (default 0.95).Example:
from bioneuralnet.utils.preprocess import network_remove_high_zero_fraction filtered_z = network_remove_high_zero_fraction(adj_df, threshold=0.95)
Data Summary Utilities
bioneuralnet.utils.data_summary.variance_summary()- Computes summary statistics for column variances.Example:
from bioneuralnet.utils.data_summary import variance_summary stats = variance_summary(df, low_var_threshold=1e-4)
bioneuralnet.utils.data_summary.zero_fraction_summary()- Computes statistics for the fraction of zeros per column.Example:
from bioneuralnet.utils.data_summary import zero_fraction_summary stats = zero_fraction_summary(df, high_zero_threshold=0.5)
bioneuralnet.utils.data_summary.expression_summary()- Computes summary of mean expression values across features.Example:
from bioneuralnet.utils.data_summary import expression_summary stats = expression_summary(df)
bioneuralnet.utils.data_summary.correlation_summary()- Computes statistics of each feature’s maximum pairwise correlation (excluding self).Example:
from bioneuralnet.utils.data_summary import correlation_summary stats = correlation_summary(df)
bioneuralnet.utils.data_summary.explore_data_stats()- Prints an overall summary (variance, zero fraction, expression, correlation) to stdout.Example:
from bioneuralnet.utils.data_summary import explore_data_stats explore_data_stats(df, name="MyOmicsData")