bioneuralnet.clustering

Classes

CorrelatedLouvain(G, B[, Y, k3, k4, weight, ...])

CorrelatedLouvain Class for Community Detection with Correlated Omics Data.

CorrelatedPageRank(graph, omics_data, ...[, ...])

PageRank Class for Clustering Nodes Based on Personalized PageRank.

HybridLouvain(G, B, Y[, k3, k4, max_iter, ...])

HybridLouvain Class that combines Correlated Louvain and Correlated PageRank for community detection.

class bioneuralnet.clustering.CorrelatedLouvain(G: Graph, B: DataFrame, Y=None, k3: float = 0.2, k4: float = 0.8, weight: str = 'weight', tune: bool = False, gpu: bool = False, seed: int | None = None)[source]

Bases: object

CorrelatedLouvain Class for Community Detection with Correlated Omics Data.

G

NetworkX graph object.

Type:

nx.Graph

B

Omics data.

Type:

pd.DataFrame

Y

Phenotype data.

Type:

pd.DataFrame

k3

Weight for Correlated Louvain.

Type:

float

k4

Weight for Correlated Louvain.

Type:

float

weight

Edge weight parameter name.

Type:

str

tune

Flag to enable tuning of parameters.

Type:

bool

get_quality() float[source]

partition_to_adjacency(partition: dict) list[source]

Convert the partition dictionary into a list of adjacency matrices (DataFrames), where each adjacency matrix represents a cluster with more than 2 nodes.
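The conversion described above can be sketched with NetworkX and pandas. This is a minimal illustration, not the library's implementation: the function name partition_to_adjacency_sketch, the min_size parameter, and the assumption that the partition maps each node to a community label are all hypothetical.

```python
from typing import Dict, Hashable, List

import networkx as nx
import pandas as pd


def partition_to_adjacency_sketch(
    G: nx.Graph, partition: Dict[Hashable, int], min_size: int = 3, weight: str = "weight"
) -> List[pd.DataFrame]:
    """Return one adjacency DataFrame per community with at least min_size nodes."""
    # Invert {node: community_label} into {community_label: [nodes]}.
    clusters: Dict[int, list] = {}
    for node, label in partition.items():
        clusters.setdefault(label, []).append(node)

    adjacency = []
    for label in sorted(clusters):
        nodes = clusters[label]
        if len(nodes) < min_size:
            continue  # drop clusters with 2 nodes or fewer
        sub = G.subgraph(nodes)
        # Rows and columns follow the order of `nodes`; missing edges become 0.0.
        adjacency.append(nx.to_pandas_adjacency(sub, nodelist=nodes, weight=weight))
    return adjacency


# Toy graph: a weighted triangle (a, b, c) plus a separate pair (d, e).
G = nx.Graph()
G.add_weighted_edges_from(
    [("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7), ("d", "e", 0.5)]
)
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

dfs = partition_to_adjacency_sketch(G, partition)
```

In this toy case only the triangle survives the size filter, so dfs holds a single 3 x 3 DataFrame indexed by a, b and c.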

run(as_dfs: bool = False) dict | list[source]

Run correlated Louvain clustering.

If as_dfs is True, returns a list of adjacency matrices (DataFrames), where each adjacency matrix represents a cluster with more than 2 nodes. Otherwise, returns the partition dictionary.

If tune is True and as_dfs is False, hyperparameter tuning is performed and the best parameters are returned. If tune is True and as_dfs is True, tuning is performed, and then standard detection is run using the tuned parameters.

run_tuning(num_samples=10)[source]

class bioneuralnet.clustering.CorrelatedPageRank(graph: Graph, omics_data: DataFrame, phenotype_data: DataFrame, alpha: float = 0.9, max_iter: int = 100, tol: float = 1e-06, k: float = 0.5, tune: bool = False, gpu: bool = False, seed: int | None = None)[source]

Bases: object

PageRank Class for Clustering Nodes Based on Personalized PageRank.

This class handles the execution of the Personalized PageRank algorithm and identification of clusters based on sweep cuts.

alpha

Damping factor for PageRank.

Type:

float

max_iter

Maximum number of iterations for PageRank convergence.

Type:

int

tol

Tolerance for convergence.

Type:

float

k

Weighting factor for composite correlation-conductance score.

Type:

float

output_dir

Directory to save outputs.

Type:

str

generate_weighted_personalization(nodes: List[Any]) Dict[Any, float][source]

Generates a weighted personalization vector for PageRank.

Parameters:

nodes (List[Any]) – List of node identifiers to consider.

Returns:

Personalization vector with weights for each node.

Return type:

Dict[Any, float]

get_quality() float[source]

Returns the composite score (or correlation) from the latest clustering run.

phen_omics_corr(nodes: List[Any]) Tuple[float, str][source]

Calculates the Pearson correlation between the PCA of omics data and phenotype.

Parameters:

nodes (List[Any]) – List of node identifiers to include in the calculation.

Returns:

Correlation coefficient and formatted correlation with p-value.

Return type:

Tuple[float, str]
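The idea behind this correlation can be illustrated with NumPy alone. The sketch below assumes that the omics matrix has samples in rows and the selected nodes in columns, and that the first principal component is the one compared with the phenotype; both are assumptions, and the helper name first_pc_correlation is hypothetical. The p-value and the formatted string returned by the real method are left out.

```python
import numpy as np


def first_pc_correlation(omics, phenotype) -> float:
    """Pearson correlation between the first principal component and the phenotype."""
    X = np.asarray(omics, dtype=float)
    y = np.asarray(phenotype, dtype=float).ravel()
    centered = X - X.mean(axis=0)  # center each feature (column)
    # The first right singular vector of the centered matrix holds the PC1 loadings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = centered @ vt[0]  # per-sample scores on PC1
    return float(np.corrcoef(pc1, y)[0, 1])


# Synthetic data: two features driven by one latent signal, one pure-noise feature.
rng = np.random.default_rng(0)
latent = rng.normal(size=50)
omics = np.column_stack(
    [
        2.0 * latent + 0.1 * rng.normal(size=50),
        -1.5 * latent + 0.1 * rng.normal(size=50),
        0.1 * rng.normal(size=50),
    ]
)
phenotype = latent + 0.1 * rng.normal(size=50)

r = first_pc_correlation(omics, phenotype)
```

The sign of a principal component is arbitrary, so only the magnitude of r is meaningful in this sketch.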

run(seed_nodes: List[Any]) Dict[str, Any][source]

Executes the correlated PageRank clustering pipeline.

Steps:

  1. Initializing Clustering:
    • Receives a list of seed nodes to personalize the PageRank algorithm.

    • Prepares the input graph and relevant parameters for clustering.

  2. PageRank Execution:
    • Applies the PageRank algorithm with personalization based on the seed nodes.

    • Computes node scores and determines cluster memberships.

  3. Result Compilation:
    • Compiles clustering results, including cluster sizes and node memberships, into a dictionary.

    • Logs the successful completion of the clustering process.

Parameters:

seed_nodes (List[Any]) – A list of node identifiers used as seed nodes for personalized PageRank. These nodes influence the clustering process by biasing the algorithm.

Returns:

A dictionary containing the clustering results. Keys may include:

  • clusters: Lists of nodes grouped into clusters.

  • scores: PageRank scores for each node.

  • metadata: Additional metrics or details about the clustering process.

Return type:

Dict[str, Any]

Raises:

  • ValueError: If the input graph is empty or seed nodes are invalid.

  • Exception: For any unexpected errors during clustering execution.

Notes:

  • Seed nodes strongly influence the clustering outcome; select them carefully based on prior knowledge or experimental goals.

  • The PageRank algorithm requires a well-defined and connected graph to produce meaningful results.

  • Results are sensitive to the alpha (damping factor) and other hyperparameters.
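The roles of alpha, max_iter, tol and the seed nodes can be seen in a plain power iteration for personalized PageRank. This is a sketch under stated assumptions, not the code behind run(): the function name personalized_pagerank is hypothetical, and the personalization is taken to be uniform over the seed nodes, whereas the class builds a weighted personalization vector.

```python
import networkx as nx


def personalized_pagerank(G, seed_nodes, alpha=0.9, max_iter=100, tol=1e-6, weight="weight"):
    """Power iteration for PageRank with teleportation restricted to the seed nodes."""
    if G.number_of_nodes() == 0:
        raise ValueError("graph is empty")
    seeds = {n for n in seed_nodes if n in G}
    if not seeds:
        raise ValueError("no valid seed nodes")

    # Uniform personalization over the seeds (an assumption of this sketch).
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in G}
    strength = {n: sum(d.get(weight, 1.0) for _, _, d in G.edges(n, data=True)) for n in G}
    scores = dict(teleport)

    for _ in range(max_iter):
        # With probability 1 - alpha the walk jumps back to a seed node.
        nxt = {n: (1.0 - alpha) * teleport[n] for n in G}
        for u in G:
            if strength[u] == 0:
                # Isolated node: hand its mass back to the seeds.
                for n in seeds:
                    nxt[n] += alpha * scores[u] * teleport[n]
                continue
            for _, v, d in G.edges(u, data=True):
                nxt[v] += alpha * scores[u] * d.get(weight, 1.0) / strength[u]
        err = sum(abs(nxt[n] - scores[n]) for n in G)
        scores = nxt
        if err < tol:  # stop once the L1 change falls below the tolerance
            break
    return scores


# Two triangles joined by the bridge c-d; the walk is seeded at "a".
G = nx.Graph()
G.add_edges_from(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
)
scores = personalized_pagerank(G, ["a"])
```

Because every jump returns to the seed, nodes in the seed's triangle end up with higher scores than their mirror images across the bridge, which is the bias the notes above describe.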

run_pagerank_clustering(seed_nodes: List[Any]) Dict[str, Any][source]

Executes the PageRank clustering algorithm.

Parameters:

seed_nodes (List[Any]) – List of seed node identifiers for personalization.

Returns:

Dictionary containing clustering results.

Return type:

Dict[str, Any]

run_tuning(num_samples: int = 10) Dict[str, Any][source]

sweep_cut(p: Dict[Any, float]) Tuple[List[Any], int, float, float, float, str][source]
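
A textbook sweep cut over a PageRank vector can be sketched as follows. It is an illustration only: the function name sweep_cut_sketch is hypothetical, it ranks nodes by score divided by weighted degree and keeps the prefix with the lowest conductance, and it returns just the node list and that conductance. The method documented here returns a six-element tuple and, per the k attribute, scores clusters with a composite of correlation and conductance, which this sketch does not attempt to reproduce.

```python
import networkx as nx


def sweep_cut_sketch(G, p, weight="weight"):
    """Return (nodes, conductance) for the lowest-conductance prefix of the sweep order."""
    degree = dict(G.degree(weight=weight))
    total_volume = sum(degree.values())
    # Rank nodes with a positive score by score per unit of weighted degree.
    order = sorted(
        (n for n in p if p[n] > 0 and degree.get(n, 0) > 0),
        key=lambda n: p[n] / degree[n],
        reverse=True,
    )

    best_nodes, best_conductance = [], float("inf")
    inside, volume, cut = set(), 0.0, 0.0
    for i, node in enumerate(order):
        inside.add(node)
        volume += degree[node]
        for _, nbr, data in G.edges(node, data=True):
            if nbr == node:
                continue  # self-loops never cross the cut
            w = data.get(weight, 1.0)
            # An edge to a node already inside stops crossing the cut; otherwise it starts to.
            cut += -w if nbr in inside else w
        denom = min(volume, total_volume - volume)
        if denom <= 0:
            continue  # the prefix covers the whole graph, conductance is undefined
        conductance = cut / denom
        if conductance < best_conductance:
            best_nodes, best_conductance = order[: i + 1], conductance
    return best_nodes, best_conductance


# Two triangles joined by the bridge c-d, with scores concentrated on the first triangle.
G = nx.Graph()
G.add_edges_from(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
)
p = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.05, "e": 0.03, "f": 0.02}

nodes, conductance = sweep_cut_sketch(G, p)
```

On this toy graph the best prefix is the first triangle, which is separated from the rest by the single bridge edge.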
class bioneuralnet.clustering.HybridLouvain(G: Graph, B: DataFrame, Y: DataFrame, k3: float = 0.2, k4: float = 0.8, max_iter: int = 10, weight: str = 'weight', gpu: bool = False, seed: int | None = None, tune: bool | None = False)[source]

Bases: object

HybridLouvain Class that combines Correlated Louvain and Correlated PageRank for community detection.

G

NetworkX graph object.

Type:

nx.Graph

B

Omics data.

Type:

pd.DataFrame

Y

Phenotype data.

Type:

pd.DataFrame

k3

Weight for Correlated Louvain.

Type:

float

k4

Weight for Correlated Louvain.

Type:

float

max_iter

Maximum number of iterations.

Type:

int

weight

Edge weight parameter name.

Type:

str

tune

Flag to enable tuning of parameters.

Type:

bool

run(as_dfs: bool = False) dict | list[source]

Modules

correlated_louvain

correlated_pagerank

hybrid_louvain