Metadata-Version: 2.4
Name: mcmst_clust
Version: 1.0.3
Summary: Multi-Center Minimum Spanning Tree Clustering algorithm
Home-page: https://github.com/senolali/MCMSTClustering
Author: Ali Şenol
Author-email: Ali Şenol <alisenol@tarsus.edu.tr>
Project-URL: Homepage, https://github.com/senolali/MCMSTClustering
Project-URL: Documentation, https://github.com/senolali/MCMSTClustering
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: networkx
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# Motivation

MCMSTClustering is a clustering algorithm based on the minimum spanning tree (MST).
It uses MST distances and optional DBSCAN to detect clusters in high-dimensional data.

## Installation

```bash
pip install mcmst_clust
```

## Usage

```python
import numpy as np
from sklearn.datasets import make_moons

# Import the MCMSTClustering package
from mcmst_clust import MCMSTClustering, normalize

# =============================================================================
# EXAMPLE 1: BASIC USAGE WITH SYNTHETIC DATA
# =============================================================================

print("=" * 60)
print(" BASIC USAGE WITH SYNTHETIC DATA")
print("=" * 60)

# Generate synthetic data (two moons).
# scikit-learn is needed for this example only; it is not listed among the
# package's requirements, so install it separately.
X_moons, y_moons = make_moons(n_samples=300, noise=0.05, random_state=42)

# Normalize the data (important for distance-based clustering)
X_moons_normalized = normalize(X_moons)

# Initialize the MCMSTClustering model
# Parameters from the paper:
# N: Minimum number of data points to define a Micro Cluster (default: 5)
# r: Radius of the Micro Cluster (default: 0.05)
# n_micro: Minimum number of Micro Clusters to define a Macro Cluster (default: 5)
model = MCMSTClustering(N=5, r=0.05, n_micro=3)

# Fit the model to the data
model.fit(X_moons_normalized)

# Get cluster labels (one entry per input row)
labels = model.labels_
print("Distinct labels:", np.unique(labels))
```
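
The radius `r` is an absolute distance, so its effect depends on the scale of the features. The sketch below shows one common way to bring every feature into the range [0, 1] before clustering. It assumes that `normalize` behaves like per-column min-max scaling, which is not confirmed here; `min_max_scale` is a hypothetical helper written for illustration and is not part of the package.

```python
import numpy as np


def min_max_scale(X):
    """Rescale each column of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (X - lo) / span


# Two features on very different scales
X_raw = np.array([[0.0, 100.0], [5.0, 300.0], [10.0, 200.0]])
X_scaled = min_max_scale(X_raw)
print(X_scaled)
```

After scaling, a radius such as `r=0.05` covers 5% of each feature's range, which makes the parameter easier to reason about across datasets.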

## Overview

MCMSTClustering (Defining Non-Spherical Clusters by using Minimum Spanning Tree over KD-Tree-based Micro-Clusters) is designed to overcome limitations of conventional clustering algorithms when handling:

- High-dimensional data
- Imbalanced datasets
- Clusters with varying densities
- Noisy data/outliers
- Arbitrary-shaped clusters
	

The algorithm consists of three main steps:

1. Micro-cluster Formation: Defines micro-clusters using a KD-Tree data structure with range search.
2. Macro-cluster Construction: Builds a minimum spanning tree (MST) over the micro-clusters to form macro-clusters.
3. Cluster Regulation: Refines the clusters to improve accuracy and overall clustering quality.
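
The three steps above can be sketched roughly as follows. This is a simplified illustration built on SciPy, not the package's implementation: the greedy seeding order, the `2 * r` edge cutoff, and the size filter standing in for cluster regulation are all assumptions made for this example, and `sketch_mst_over_micro_clusters` is a hypothetical name.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist


def sketch_mst_over_micro_clusters(X, N=5, r=0.05, n_micro=3):
    """Illustrative only: micro-clusters -> MST over centers -> size filter."""
    X = np.asarray(X, dtype=float)
    tree = cKDTree(X)
    owner = np.full(len(X), -1)  # micro-cluster index per point, -1 = unassigned
    centers = []

    # Step 1: a point whose r-ball holds at least N unassigned points
    # seeds a micro-cluster (KD-tree range search).
    for i in range(len(X)):
        if owner[i] != -1:
            continue
        ball = np.array(tree.query_ball_point(X[i], r))
        free = ball[owner[ball] == -1]
        if len(free) >= N:
            owner[free] = len(centers)
            centers.append(X[free].mean(axis=0))

    labels = np.full(len(X), -1)  # -1 marks noise / unclustered points
    if not centers:
        return labels
    centers = np.array(centers)

    # Step 2: MST over micro-cluster centers; edges longer than 2 * r
    # (an assumed cutoff) are removed, leaving one component per macro-cluster.
    mst = minimum_spanning_tree(cdist(centers, centers)).toarray()
    mst[mst > 2 * r] = 0.0
    _, comp = connected_components(mst, directed=False)

    # Step 3 (simplified stand-in for regulation): keep only components
    # made of at least n_micro micro-clusters.
    sizes = np.bincount(comp)
    keep = np.flatnonzero(sizes >= n_micro)
    comp_label = np.full(len(sizes), -1)
    comp_label[keep] = np.arange(len(keep))
    assigned = owner >= 0
    labels[assigned] = comp_label[comp[owner[assigned]]]
    return labels
```

Points that never join a micro-cluster, or whose macro-cluster is too small, keep the label `-1` in this sketch. How the package itself marks noise is not stated here.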
	

Extensive experiments against state-of-the-art algorithms show that MCMSTClustering achieves high-quality clustering results with acceptable runtime.

## Key Features

- Clusters datasets with high quality
- Detects arbitrary-shaped clusters
- Robust against outliers/noisy data
- Handles clusters with varying densities
- Efficient on imbalanced datasets


## Cite

If you use this code in your work, please cite the following paper:

```text
Şenol, A. MCMSTClustering: defining non-spherical clusters by using minimum 
spanning tree over KD-tree-based micro-clusters. Neural Comput & Applic 35, 
13239–13259 (2023). https://doi.org/10.1007/s00521-023-08386-3
```

## BibTeX

```bibtex
@article{csenol2023mcmstclustering,
  title={MCMSTClustering: defining non-spherical clusters by using minimum spanning tree over KD-tree-based micro-clusters},
  author={{\c{S}}enol, Ali},
  journal={Neural Computing and Applications},
  volume={35},
  number={18},
  pages={13239--13259},
  year={2023},
  publisher={Springer}
}
```
