bioneuralnet.clustering.correlated_pagerank

Functions

get_logger(name)

Retrieves a global logger configured to write to 'bioneuralnet.log' at the project root.
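The entry above only says where the log is written. The sketch below shows the usual standard-library pattern for a named logger that writes to a single file. `make_file_logger` is a hypothetical name. It takes an explicit path instead of locating a project root, and it is not bioneuralnet's `get_logger`.

```python
import logging
import os


def make_file_logger(name: str, log_path: str) -> logging.Logger:
    """Hypothetical helper: a named logger that appends to one log file."""
    logger = logging.getLogger(name)  # same name -> same logger object, process-wide
    logger.setLevel(logging.INFO)
    target = os.path.abspath(log_path)
    # Attach the file handler only once so repeated calls do not duplicate log lines.
    if not any(
        isinstance(h, logging.FileHandler) and h.baseFilename == target
        for h in logger.handlers
    ):
        handler = logging.FileHandler(target)
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Because `logging.getLogger` returns the same object for the same name, the duplicate-handler guard is what keeps repeated calls from writing each message several times.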

pearsonr(x, y, *[, alternative, method])

Pearson correlation coefficient and p-value for testing non-correlation.

Classes

ASHAScheduler

alias of AsyncHyperBandScheduler

Any(*args, **kwargs)

Special type indicating an unconstrained type.

CLIReporter(*[, metric_columns, ...])

Command-line reporter

CorrelatedPageRank(graph, omics_data, ...[, ...])

PageRank Class for Clustering Nodes Based on Personalized PageRank.

class bioneuralnet.clustering.correlated_pagerank.CorrelatedPageRank(graph: Graph, omics_data: DataFrame, phenotype_data: DataFrame, alpha: float = 0.9, max_iter: int = 100, tol: float = 1e-06, k: float = 0.5, tune: bool = False, gpu: bool = False, seed: int | None = None)

Bases: object

PageRank Class for Clustering Nodes Based on Personalized PageRank.

This class runs the Personalized PageRank algorithm and identifies clusters from its scores using sweep cuts.
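The class description names Personalized PageRank as the core step. Below is a minimal pure-Python power iteration, assuming a weighted undirected graph stored as a symmetric `{node: {neighbor: weight}}` mapping. `personalized_pagerank` is a hypothetical helper, not part of bioneuralnet. This page does not say whether `CorrelatedPageRank` instead delegates to a library routine such as `networkx.pagerank`, which accepts `alpha`, `personalization`, `max_iter` and `tol`. Here `alpha`, `max_iter` and `tol` play the roles documented for the class attributes.

```python
from typing import Dict, Hashable, Mapping

# Weighted undirected graph: node -> {neighbor: edge weight}, stored symmetrically.
Graph = Mapping[Hashable, Mapping[Hashable, float]]


def personalized_pagerank(
    graph: Graph,
    personalization: Mapping[Hashable, float],
    alpha: float = 0.9,
    max_iter: int = 100,
    tol: float = 1e-6,
) -> Dict[Hashable, float]:
    """Hypothetical helper: Personalized PageRank by power iteration."""
    nodes = list(graph)
    n = len(nodes)
    # Restart distribution: personalization weights over graph nodes, normalized to 1.
    total = sum(personalization.get(v, 0.0) for v in nodes)
    if total <= 0:
        raise ValueError("personalization must put positive weight on some graph node")
    restart = {v: personalization.get(v, 0.0) / total for v in nodes}
    # Weighted degree: a node's score is split among neighbors in proportion to edge weight.
    degree = {v: sum(graph[v].values()) for v in nodes}
    p = dict(restart)
    for _ in range(max_iter):
        # Score held by zero-degree (dangling) nodes is returned through the restart vector.
        dangling = sum(p[v] for v in nodes if degree[v] == 0)
        new_p = {v: (1 - alpha + alpha * dangling) * restart[v] for v in nodes}
        for u in nodes:
            if degree[u] == 0:
                continue
            share = alpha * p[u] / degree[u]
            for v, w in graph[u].items():
                new_p[v] += share * w
        # Stop once the L1 change is below n * tol (the same criterion networkx.pagerank uses).
        if sum(abs(new_p[v] - p[v]) for v in nodes) < n * tol:
            return new_p
        p = new_p
    raise RuntimeError(f"PageRank did not converge within {max_iter} iterations")
```

Consider a path `a-b-c-d` seeded at `a`, run with `personalized_pagerank(g, {"a": 1.0}, max_iter=500)`. The seed's neighbor `b` ends up with a higher score than the seed itself, because `b` receives mass from both sides. This degree effect is why sweep cuts usually rank nodes by `p(v)/deg(v)` rather than `p(v)`. On this small bipartite graph, convergence with `alpha=0.9` also takes more than the default 100 iterations, which is why the call raises `max_iter`.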

alpha

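Damping factor for PageRank: the probability of following an edge at each step rather than restarting from the personalization (seed) distribution.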

Type:

float

max_iter

Maximum number of iterations for PageRank convergence.

Type:

int

tol

Tolerance for convergence.

Type:

float

k

Weighting factor for the composite correlation-conductance score, which balances phenotype correlation against cluster conductance.

Type:

float
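This page does not specify how `k` combines the two quantities. One plausible reading, stated purely as an assumption, is a convex combination where lower values are better: `k` weights the conductance and `1 - k` weights `1 - |correlation|`. The function name `composite_score` and the formula are hypothetical and may differ from what `CorrelatedPageRank` computes.

```python
def composite_score(conductance: float, correlation: float, k: float = 0.5) -> float:
    """Hypothetical composite score in [0, 1]; lower means a better cluster.

    k weights conductance (lower = better separated from the rest of the graph);
    1 - k weights 1 - |correlation| (lower = stronger phenotype association).
    This is an assumed formula, not necessarily the one CorrelatedPageRank uses.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    if not 0.0 <= conductance <= 1.0:
        raise ValueError("conductance must lie in [0, 1]")
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return k * conductance + (1.0 - k) * (1.0 - abs(correlation))
```

With `k = 1.0` only conductance matters, and with `k = 0.0` only the strength of the phenotype correlation matters. The default `k = 0.5` weights them equally under this assumed form.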

output_dir

Directory to save outputs.

Type:

str

generate_weighted_personalization(nodes: List[Any]) → Dict[Any, float]

Generates a weighted personalization vector for PageRank.

Parameters:

nodes (List[Any]) – List of node identifiers to consider.

Returns:

Personalization vector with weights for each node.

Return type:

Dict[Any, float]
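The entry above does not say how the weights are derived. This minimal sketch assumes each node already has a numeric score, for instance its correlation with the phenotype. Each weight is made proportional to the absolute score, and the weights sum to 1. `weighted_personalization` and `node_scores` are hypothetical names.

```python
from typing import Dict, Hashable, List, Mapping


def weighted_personalization(
    nodes: List[Hashable], node_scores: Mapping[Hashable, float]
) -> Dict[Hashable, float]:
    """Hypothetical: turn per-node scores into a personalization vector.

    Each node's weight is |score| / sum(|score|), so the vector sums to 1;
    nodes without a score get weight 0.
    """
    if not nodes:
        raise ValueError("nodes must be non-empty")
    raw = {v: abs(node_scores.get(v, 0.0)) for v in nodes}
    total = sum(raw.values())
    if total == 0:
        # No informative scores: fall back to an unweighted (uniform) personalization.
        return {v: 1.0 / len(nodes) for v in nodes}
    return {v: w / total for v, w in raw.items()}
```

Normalizing is not strictly required if the vector is passed to `networkx.pagerank`, which rescales `personalization` itself. An explicit normalization keeps the weights interpretable as restart probabilities.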

get_quality() → float

Returns the composite score (or correlation) from the latest clustering run.

phen_omics_corr(nodes: List[Any]) → Tuple[float, str]

Calculates the Pearson correlation between a PCA summary of the omics data for the given nodes and the phenotype.

Parameters:

nodes (List[Any]) – List of node identifiers to include in the calculation.

Returns:

Correlation coefficient and formatted correlation with p-value.

Return type:

Tuple[float, str]
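Here is a sketch of the described computation. It assumes that "PCA summary" means the first principal component (PC1) of the standardized omics columns for the given nodes. It uses only the standard library, with power iteration for PC1, and it omits the p-value. In practice, `scipy.stats.pearsonr`, listed among this module's functions, returns both the coefficient and the p-value. All names here are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def _standardize(values: Sequence[float]) -> List[float]:
    # Center to mean 0 and scale to unit (population) standard deviation.
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    if sd == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(x - mean) / sd for x in values]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pc1_phenotype_corr(
    features: Sequence[Sequence[float]], phenotype: Sequence[float], iters: int = 500
) -> Tuple[float, str]:
    """Hypothetical: Pearson r between PC1 of the selected features and the phenotype.

    features holds one sequence of per-sample values for each selected node.
    """
    z = [_standardize(f) for f in features]  # p features x n samples
    p, n = len(z), len(z[0])
    # Feature-by-feature covariance of the standardized data (a correlation matrix).
    cov = [[sum(z[i][s] * z[j][s] for s in range(n)) / n for j in range(p)] for i in range(p)]
    # Power iteration from a fixed random start converges to the leading eigenvector.
    rng = random.Random(0)
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("covariance matrix annihilated the start vector")
        v = [x / norm for x in w]
    # PCA signs are arbitrary: orient PC1 so its largest-magnitude loading is positive.
    if v[max(range(p), key=lambda i: abs(v[i]))] < 0:
        v = [-x for x in v]
    # PC1 score of each sample: projection of its standardized values onto the loadings.
    pc1 = [sum(v[i] * z[i][s] for i in range(p)) for s in range(n)]
    r = _pearson(pc1, phenotype)
    return r, f"r = {r:.2f}"
```

Because the sign of a principal component is arbitrary, only `|r|` is comparable across PCA implementations. The sign convention used here is just one deterministic choice.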

run(seed_nodes: List[Any]) → Dict[str, Any]

Executes the correlated PageRank clustering pipeline.

Steps:

  1. Initializing Clustering:
    • Receives a list of seed nodes to personalize the PageRank algorithm.

    • Prepares the input graph and relevant parameters for clustering.

  2. PageRank Execution:
    • Applies the PageRank algorithm with personalization based on the seed nodes.

    • Computes node scores and determines cluster memberships.

  3. Result Compilation:
    • Compiles clustering results, including cluster sizes and node memberships, into a dictionary.

    • Logs the successful completion of the clustering process.

Args:

seed_nodes (List[Any]):
  • A list of node identifiers used as seed nodes for personalized PageRank.

  • These nodes bias the algorithm: the PageRank random walk restarts at the seeds, so the resulting cluster forms around them.

Returns: Dict[str, Any]

  • A dictionary containing the clustering results. Keys may include:
    • clusters: Lists of nodes grouped into clusters.

    • scores: PageRank scores for each node.

    • metadata: Additional metrics or details about the clustering process.

Raises:

  • ValueError: If the input graph is empty or seed nodes are invalid.

  • Exception: For any unexpected errors during clustering execution.

Notes:

  • Seed nodes strongly influence the clustering outcome; select them carefully based on prior knowledge or experimental goals.

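  • The graph should be connected, or at least the nodes of interest should share a connected component with the seed nodes: when personalization is restricted to the seeds, nodes unreachable from them receive a PageRank score of zero.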

  • Results are sensitive to alpha (the damping factor) and to the other hyperparameters (max_iter, tol, k).
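The Raises and Notes sections suggest two checks worth running before clustering. The sketch below assumes a `{node: {neighbor: weight}}` graph mapping and uses the hypothetical names `validate_seed_nodes` and `reachable_from`. It raises `ValueError` for an empty graph, an empty seed list, or unknown seeds, and it lists the nodes the seeds can reach.

```python
from collections import deque
from typing import Hashable, List, Mapping, Sequence, Set

# Weighted undirected graph: node -> {neighbor: edge weight}.
Graph = Mapping[Hashable, Mapping[Hashable, float]]


def validate_seed_nodes(graph: Graph, seed_nodes: Sequence[Hashable]) -> List[Hashable]:
    """Hypothetical pre-flight check for the documented ValueError cases."""
    if not graph:
        raise ValueError("input graph is empty")
    if not seed_nodes:
        raise ValueError("seed_nodes must contain at least one node")
    missing = [v for v in seed_nodes if v not in graph]
    if missing:
        raise ValueError(f"seed nodes not found in graph: {missing}")
    # Drop duplicate seeds while keeping the caller's order.
    return list(dict.fromkeys(seed_nodes))


def reachable_from(graph: Graph, seed_nodes: Sequence[Hashable]) -> Set[Hashable]:
    """Breadth-first search: every node that some seed can reach along edges."""
    seen = set(seed_nodes)
    queue = deque(seed_nodes)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen
```

A node outside `reachable_from(graph, seeds)`, such as an isolated node, cannot receive PageRank mass from seed-only personalization and so cannot join the seeds' cluster.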

run_pagerank_clustering(seed_nodes: List[Any]) → Dict[str, Any]

Executes the PageRank clustering algorithm.

Parameters:

seed_nodes (List[Any]) – List of seed node identifiers for personalization.

Returns:

Dictionary containing clustering results.

Return type:

Dict[str, Any]

run_tuning(num_samples: int = 10) → Dict[str, Any]

sweep_cut(p: Dict[Any, float]) → Tuple[List[Any], int, float, float, float, str]
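`sweep_cut` has no description on this page. If it follows the standard sweep-cut recipe used with personalized PageRank, it ranks nodes by the degree-normalized score `p(v)/deg(v)`. It then scans the prefixes of that ranking and keeps the prefix with the lowest conductance, `cut(S) / min(vol(S), vol(V \ S))`. The sketch below assumes that recipe and a symmetric `{node: {neighbor: weight}}` mapping without self-loops; the function itself is hypothetical. It returns only the node list, its size and its conductance. The documented return tuple has further `float` and `str` fields that are not reproduced.

```python
from typing import Dict, Hashable, List, Mapping, Set, Tuple

# Weighted undirected graph: node -> {neighbor: edge weight}, stored symmetrically.
Graph = Mapping[Hashable, Mapping[Hashable, float]]


def sweep_cut(graph: Graph, p: Dict[Hashable, float]) -> Tuple[List[Hashable], int, float]:
    """Hypothetical: lowest-conductance prefix of nodes ranked by p(v) / deg(v)."""
    degree = {v: sum(graph[v].values()) for v in graph}
    total_vol = sum(degree.values())
    # Rank nodes with positive score by degree-normalized PageRank, highest first.
    order = sorted(
        (v for v in graph if p.get(v, 0.0) > 0 and degree[v] > 0),
        key=lambda v: p[v] / degree[v],
        reverse=True,
    )
    in_set: Set[Hashable] = set()
    vol = 0.0  # total weighted degree inside the current prefix
    cut = 0.0  # total weight of edges leaving the current prefix
    best: Tuple[List[Hashable], int, float] = ([], 0, float("inf"))
    for i, v in enumerate(order, start=1):
        in_set.add(v)
        vol += degree[v]
        # Adding v: edges to nodes already inside stop being cut; the others become cut.
        for u, w in graph[v].items():
            if u == v:
                continue  # a self-loop never crosses the cut
            cut += -w if u in in_set else w
        denom = min(vol, total_vol - vol)
        if denom <= 0:
            break  # the prefix covers the whole volume; conductance is undefined
        phi = cut / denom
        if phi < best[2]:
            best = (order[:i], i, phi)
    return best
```

The cut is updated incrementally, so each prefix costs only the degree of the node just added. On two triangles joined by one edge, with scores concentrated on one triangle, the sweep returns that triangle at conductance 1/7.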