Metadata-Version: 2.4
Name: sbcluster
Version: 0.3.2
Summary: Spectral Bridges clustering algorithm
Author: Félix Laplante
Project-URL: Source, https://gitlab.com/felixlaplante0/sbcluster
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: faiss-cpu
Dynamic: license-file

# 📊 Spectral Bridges

**sbcluster** is a Python package implementing **Spectral Bridges**, a clustering algorithm that combines k-means with spectral clustering. It computes an affinity matrix efficiently and merges clusters using a connectivity measure inspired by the SVM margin concept. The package aims to provide robust clustering and is particularly suited to large datasets.
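
To make the two-stage idea concrete, here is a rough NumPy-only sketch: k-means summarizes the data into nodes, an affinity is computed between nodes, and spectral clustering merges the nodes into final clusters. It is an illustration under assumptions, not the package's implementation: the function names (`sq_dists`, `kmeans`, `bridge_affinity`, `spectral_bridges_sketch`), the farthest-point initialization, and the exact affinity formula are all hypothetical choices, and sbcluster itself relies on FAISS and may scale the affinity differently.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distances between rows of X and rows of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_iter=25):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [X[0]]
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        centers.append(X[d2.argmax()])
        d2 = np.minimum(d2, ((X - centers[-1]) ** 2).sum(axis=1))
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers, sq_dists(X, centers).argmin(axis=1)


def bridge_affinity(X, centers, labels):
    """Assumed affinity: how far points of one region reach toward another node."""
    m = len(centers)
    reach = np.zeros((m, m))
    for i in range(m):
        Xi = X[labels == i] - centers[i]  # region i, centered on its node
        for j in range(m):
            if i != j:
                seg = centers[j] - centers[i]
                # Positive part of the projection, as a fraction of the segment length
                frac = np.maximum(Xi @ seg, 0.0) / (seg @ seg)
                reach[i, j] = (frac ** 2).sum()
    counts = np.bincount(labels, minlength=m)
    pooled = np.maximum(counts[:, None] + counts[None, :], 1)
    return np.sqrt((reach + reach.T) / pooled)


def spectral_bridges_sketch(X, n_clusters, n_nodes):
    """k-means nodes, node affinity, spectral embedding, then node merging."""
    centers, labels = kmeans(X, n_nodes)
    A = bridge_affinity(X, centers, labels)
    d = np.maximum(A.sum(axis=1), 1e-12)
    S = A / np.sqrt(np.outer(d, d))  # symmetrically normalized affinity
    _, vecs = np.linalg.eigh(S)
    emb = vecs[:, -n_clusters:]  # eigenvectors of the largest eigenvalues
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    _, node_labels = kmeans(emb, n_clusters)
    return node_labels[labels]  # each point inherits its node's cluster


# Two well-separated Gaussian blobs, summarized by 10 nodes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(50.0, 1.0, (100, 2))])
pred = spectral_bridges_sketch(X, n_clusters=2, n_nodes=10)
```

The spectral step only ever sees an `n_nodes` by `n_nodes` matrix, which is why this family of methods can scale to datasets where a full point-to-point affinity matrix would be too large.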

---

## ✨ Features

- **Spectral Bridges Clustering Algorithm**: Integrates k-means and spectral clustering with efficient affinity matrix calculation for improved clustering results.
- **Scalability**: Designed to handle large datasets by optimizing cluster formation through advanced affinity matrix computations.
- **Customizable**: Parameters such as number of clusters, iterations, and random state allow flexibility in clustering configurations.
- **Model selection**: Automatic selection of the number of nodes (`m`) according to a normalized eigengap metric.
- **scikit-learn compatibility**: Follows the standard scikit-learn estimator API, so it works with the usual tools for model selection and evaluation.
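
As an illustration of the eigengap idea behind model selection, the sketch below scores candidate cluster counts on a fixed toy affinity matrix. The definition used here, the relative gap between consecutive eigenvalues of the normalized Laplacian, is an assumption made for the example; the helper `normalized_eigengap` is hypothetical and the exact metric computed by the package may differ.

```python
import numpy as np


def normalized_eigengap(A, k):
    """Assumed score for k clusters: relative gap after the k smallest Laplacian eigenvalues."""
    d = A.sum(axis=1)
    L = np.eye(len(A)) - A / np.sqrt(np.outer(d, d))  # normalized Laplacian
    lam = np.linalg.eigvalsh(L)  # eigenvalues in ascending order
    return (lam[k] - lam[k - 1]) / lam[k]


# Toy affinity: three tight groups of 4 nodes each, weakly linked to each other
A = np.full((12, 12), 0.01)
for g in range(3):
    A[4 * g:4 * g + 4, 4 * g:4 * g + 4] = 1.0
np.fill_diagonal(A, 0.0)

# Score each candidate cluster count and keep the one with the largest gap
gaps = {k: normalized_eigengap(A, k) for k in range(2, 7)}
best_k = max(gaps, key=gaps.get)
print(best_k)
```

On this toy matrix the largest gap sits after the third eigenvalue, matching the three planted groups. The candidate range starts at 2 because the smallest Laplacian eigenvalue is close to zero, which makes the relative gap for a single cluster uninformative.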


---

## ⚡ Speed

Spectral Bridges runs k-means with FAISS's efficient implementation and initializes centroids with a clone of a scikit-learn method. This is over 2x faster than using scikit-learn's implementation.
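
The initialization method is not named here. Assuming it refers to k-means++ seeding, which is scikit-learn's default, the following NumPy sketch shows the basic idea: each new seed is drawn with probability proportional to its squared distance from the seeds already chosen. The function `kmeans_plus_plus` is a hypothetical, simplified variant (it omits the extra candidate trials scikit-learn performs) and is not the package's code.

```python
import numpy as np


def kmeans_plus_plus(X, k, seed=0):
    """Simplified k-means++ seeding: spread the initial centroids across the data."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]  # first seed: uniform at random
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        # Draw the next seed with probability proportional to squared distance
        idx = rng.choice(len(X), p=d2 / d2.sum())
        centers.append(X[idx])
        # Keep, for every point, the squared distance to its nearest seed
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)


# Three tight, well-separated groups: the seeds should land one per group
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.01, (50, 2)) for c in (0.0, 10.0, 20.0)])
seeds = kmeans_plus_plus(X, 3)
```

Good seeding matters for speed because k-means then needs fewer iterations to converge.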

---

## 🚀 Installation

```bash
pip install sbcluster
```

## 🔧 Usage

### Example

```python
import matplotlib.pyplot as plt
import numpy as np
from sbcluster import SpectralBridges, ngap_scorer
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import GridSearchCV

# Load some synthetic data
data = np.genfromtxt("datasets/impossible.csv", delimiter=",")
X, y = data[:, :-1], data[:, -1]

# Define the parameter grid
param_grid = {"n_clusters": [2, 3, 4, 5, 6, 7, 8, 9, 10]}
# Each of the 5 splits uses the full dataset for both fitting and scoring (no held-out data)
cv = [(np.arange(X.shape[0]), np.arange(X.shape[0]))] * 5

# Perform grid search for optimal parameters
grid_search = GridSearchCV(
    estimator=SpectralBridges(n_clusters=2, n_nodes=250),
    param_grid=param_grid,
    scoring=ngap_scorer,
    cv=cv,
    verbose=1,
)

# Fit the grid search
grid_search.fit(X)

# Print the results
print(grid_search.cv_results_["mean_test_score"])
print(grid_search.best_params_)

# Make predictions with the best model
guess = grid_search.best_estimator_.predict(X)
ari = adjusted_rand_score(y, guess)

# Print the ARI
print(f"Adjusted Rand Index: {ari}")

# Visualize the clustering results
plt.scatter(X[:, 0], X[:, 1], c=guess, alpha=0.1)
plt.scatter(
    grid_search.best_estimator_.cluster_centers_[:, 0],
    grid_search.best_estimator_.cluster_centers_[:, 1],
    c=grid_search.best_estimator_.cluster_labels_,
    marker="X",
)
plt.title("Clustered data and centroids with best SpectralBridges fit")
plt.show()

# Compare with sklearn's SpectralClustering
sc = SpectralClustering(n_clusters=7).fit(X)

plt.scatter(X[:, 0], X[:, 1], c=sc.labels_, alpha=0.1)
plt.title("Spectral Clustering of the original dataset")
plt.show()
```

## Results Comparison

<p align="center">
  <img src="./figures/spectralbridges.png" alt="Spectral Bridges result" width="48%">
  <img src="./figures/spectralclustering.png" alt="Spectral Clustering result" width="48%">
</p>

---

## 📖 Learn More

For tutorials and the API reference, visit the official site:  
👉 [sbcluster Documentation](https://felixlaplante0.gitlab.io/sbcluster)
