Metadata-Version: 2.1
Name: cyhdbscan
Version: 0.10
Summary: Very fast hdbscan for Python - written in Cython/C++
Home-page: https://github.com/hansalemaos/cyhdbscan
Author: Johannes Fischer
Author-email: aulasparticularesdealemaosp@gmail.com
License: MIT
Keywords: hdbscan,Cython,cpp
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Description-Content-Type: text/markdown
License-File: LICENSE.rst
Requires-Dist: Cython
Requires-Dist: setuptools


# Python Wrapper for HDBSCAN-C++

### `pip install cyhdbscan`

This repository contains a Python wrapper for the [HDBSCAN-C++ implementation by Rohan Mohapatra / Sumedh Basarkod](https://github.com/rohanmohapatra/hdbscan-cpp). It lets you run HDBSCAN clustering directly from Python, using Cython to bridge Python and C++. Apart from the build tooling declared in the package metadata (Cython and setuptools), it has no dependencies.

## Features

- Uses the fast and efficient HDBSCAN implementation written in C++
- Easy to use from Python
- Supports two distance metrics: Euclidean and Manhattan
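
For reference, the two supported metrics compare points as shown below. This is a plain-Python sketch of the standard definitions, not the wrapper's C++ code; the function names `euclidean` and `manhattan` are illustrative only.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance: square root of the summed squared differences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """City-block distance: sum of the absolute coordinate differences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(x - y) for x, y in zip(a, b))


p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

Manhattan distance is never smaller than Euclidean distance for the same pair of points, so the choice of metric can change which points end up close enough to form a cluster.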

## Prerequisites

Before you can use this wrapper, make sure the following are installed (the C++ extension is compiled the first time you import the package):
- Python 3 (of course)
- Cython
- A C++ compiler (e.g., GCC or MSVC)
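
If you are unsure whether these are available, a quick check like the one below may help. It is a hypothetical helper, not part of cyhdbscan: it looks up Cython and setuptools (the two build dependencies listed in the package metadata) and searches `PATH` for a few common C++ compiler executables. The compiler list is an assumption; note that MSVC's `cl` is usually only on `PATH` inside a Developer Command Prompt.

```python
import importlib.util
import shutil
import sys

# Common C++ compiler executables; this list is an assumption, extend it as needed.
CANDIDATE_COMPILERS = ("g++", "clang++", "c++", "cl")


def check_prerequisites() -> dict:
    """Report whether the build prerequisites for cyhdbscan appear to be present."""
    found = [name for name in CANDIDATE_COMPILERS if shutil.which(name)]
    return {
        "python_3": sys.version_info.major == 3,
        "cython": importlib.util.find_spec("Cython") is not None,
        "setuptools": importlib.util.find_spec("setuptools") is not None,
        "cpp_compiler": found,  # names of compilers found on PATH (may be empty)
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```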

## Usage example

```py
from cyhdbscan import py_calculate_hdbscan # The lib will be compiled the first time you import it

dataset = [
    (0.837, 2.136),
    (-1.758, 2.974),
    (1.190, 4.728),
    (2.140, 0.706),
    (-1.035, 8.206),
    (1.255, 0.090),
    (0.596, 4.086),
    (1.280, 1.058),
    (1.730, 1.147),
    (-0.949, 8.464),
    (0.935, 5.332),
    (2.369, 0.795),
    (0.429, 4.974),
    (-2.048, 6.654),
    (-1.457, 7.487),
    (0.529, 3.808),
    (1.782, 0.908),
    (-1.956, 8.616),
    (-1.746, 3.012),
    (-1.180, 3.128),
    (1.164, 3.791),
    (1.362, 1.366),
    (2.601, 1.088),
    (0.272, 5.470),
    (-3.122, 3.282),
    (-0.588, 8.614),
    (1.669, -0.436),
    (-0.683, 7.675),
    (2.368, 0.552),
    (1.052, 4.545),
    (2.227, 1.263),
    (2.439, -0.073),
    (1.345, 4.857),
    (-1.315, 6.839),
    (0.983, 5.375),
    (-1.063, 2.208),
    (-1.607, 3.565),
    (1.573, 0.484),
    (-2.179, 8.086),
    (1.834, 0.754),
    (2.106, 3.495),
    (-1.643, 7.527),
    (1.106, 1.264),
    (1.612, 1.823),
    (0.460, 5.450),
    (-0.538, 3.016),
    (1.678, 0.609),
    (-1.012, 3.603),
    (1.342, 0.594),
    (1.428, 1.624),
    (2.045, 1.125),
    (1.673, 0.659),
    (-1.359, 2.322),
    (1.131, 0.936),
    (-1.739, 1.948),
    (-0.340, 8.167),
    (-1.638, 2.433),
    (-1.688, 2.241),
    (2.430, -0.064),
    (-1.380, 7.185),
    (-1.252, 2.339),
    (-2.395, 3.398),
    (-2.092, 7.481),
    (0.488, 3.268),
    (-0.539, 7.456),
    (-2.592, 8.076),
    (-1.047, 2.965),
    (1.256, 3.382),
    (-1.622, 4.272),
    (1.869, 5.441),
    (-1.764, 2.222),
    (-1.382, 7.288),
    (0.008, 4.176),
    (-1.103, 7.302),
    (-1.794, 7.581),
    (-1.512, 7.944),
    (0.959, 4.561),
    (-0.601, 6.300),
    (0.225, 4.770),
    (1.567, 0.018),
    (-1.034, 2.921),
    (-0.922, 8.099),
    (-1.886, 2.248),
    (1.869, 0.956),
    (1.101, 4.890),
    (-1.932, 8.306),
    (0.670, 4.041),
    (0.744, 4.122),
    (1.640, 1.819),
    (0.815, 4.785),
    (-2.633, 2.631),
    (-0.961, 1.274),
    (0.214, 4.885),
    (1.435, 1.307),
    (1.214, 3.648),
    (1.083, 4.063),
    (-1.226, 8.296),
    (1.482, 0.690),
    (1.896, 5.185),
    (-1.324, 4.131),
    (-1.150, 7.893),
    (2.469, 1.679),
    (2.311, 1.304),
    (0.573, 4.088),
    (-0.968, 3.122),
    (2.625, 0.950),
    (1.684, 4.196),
    (-2.221, 2.731),
    (-1.578, 3.034),
    (0.082, 4.567),
    (1.433, 4.377),
    (1.063, 5.176),
    (0.768, 4.398),
    (2.470, 1.315),
    (-1.732, 7.164),
    (0.347, 3.452),
    (-1.001, 2.849),
    (1.016, 4.485),
    (0.560, 4.214),
    (-2.118, 2.035),
    (-1.362, 2.383),
    (-2.784, 2.992),
    (1.652, 3.656),
    (-1.940, 2.189),
    (-1.815, 7.978),
    (1.202, 3.644),
    (-0.969, 3.267),
    (1.870, -0.108),
    (-1.807, 2.068),
    (1.218, 3.893),
    (-1.484, 6.008),
    (-1.564, 2.853),
    (-0.686, 8.683),
    (1.076, 4.685),
    (-0.976, 6.738),
    (1.380, 4.548),
    (-1.641, 2.681),
    (-0.002, 4.581),
    (1.714, 5.025),
    (-1.405, 7.726),
    (-0.708, 2.504),
    (-0.886, 2.646),
    (1.984, 0.490),
    (2.952, -0.344),
    (0.432, 4.335),
    (-1.866, 7.625),
    (2.527, 0.618),
    (2.041, 0.455),
    (-2.580, 3.188),
    (1.620, 0.068),
    (-2.588, 3.131),
    (0.444, 3.115),
    (-0.457, 7.306),
    (-1.129, 7.805),
    (2.130, 5.192),
    (1.004, 4.191),
    (-1.393, 8.746),
    (0.728, 3.855),
    (0.893, 1.011),
    (-1.108, 2.920),
    (0.789, 4.337),
    (1.976, 0.719),
    (-1.249, 3.085),
    (-1.078, 8.881),
    (-1.868, 3.080),
    (2.768, 1.088),
    (0.277, 4.844),
    (3.411, 0.872),
    (-1.581, 7.553),
    (-1.530, 7.705),
    (-1.825, 7.360),
    (-1.686, 7.953),
    (-1.651, 3.446),
    (-1.304, 3.003),
    (-0.731, 6.242),
    (2.406, 4.870),
    (-1.536, 3.014),
    (1.489, 0.652),
    (0.514, 4.627),
    (-1.815, 3.290),
    (-1.937, 3.914),
    (-0.615, 3.950),
    (2.032, 0.197),
    (2.149, 1.037),
    (-1.370, 7.770),
    (0.914, 4.550),
    (0.334, 4.936),
    (-2.160, 3.410),
    (1.367, 0.635),
    (-0.571, 8.133),
    (-1.006, 3.084),
    (1.495, 3.858),
    (-0.590, 7.695),
    (0.715, 5.413),
    (2.114, 1.247),
    (1.201, 0.602),
    (-2.546, 3.150),
    (-1.959, 2.430),
    (2.338, 3.431),
    (3.353, 1.700),
    (1.843, 0.073),
    (1.320, 1.404),
    (2.097, 4.847),
    (-1.243, 8.152),
    (-1.859, 7.789),
    (2.747, 1.545),
    (2.608, 1.089),
    (1.660, 3.563),
    (2.352, 0.828),
    (2.223, 0.839),
    (3.229, 1.132),
    (-1.559, 7.248),
    (-0.647, 3.429),
    (-1.327, 8.515),
    (0.917, 3.906),
    (2.295, -0.766),
    (1.816, 1.120),
    (-1.120, 7.110),
    (-1.655, 8.614),
    (-1.276, 7.968),
    (1.974, 1.580),
    (2.518, 1.392),
    (0.439, 4.536),
    (0.369, 7.791),
    (-1.791, 2.750),
]

result = py_calculate_hdbscan(
    data=dataset, min_points=5, min_cluster_size=5, distance_metric="Euclidean"
)

# pandas is used only to pretty-print the result; cyhdbscan itself does not need it
import pandas as pd

print(pd.DataFrame(result).to_string())

#        original_data  label  membership_probability  outlier_score  outlier_id
# 0     [0.837, 2.136]      4                0.000000       0.000000          66
# 1    [-1.758, 2.974]      7                0.000000       0.000000          29
# 2      [1.19, 4.728]      6                0.000000       0.000000           6
# 3      [2.14, 0.706]      4                0.742785       0.000000          80
# 4    [-1.035, 8.206]      3                0.000000       0.000000         188
# 5      [1.255, 0.09]      4                0.738651       0.000000          48
# 6     [0.596, 4.086]      6                0.742785       0.000000          76
# 7      [1.28, 1.058]      4                0.719853       0.000000         103
# 8      [1.73, 1.147]      4                0.689899       0.000000          70
# 9    [-0.949, 8.464]      3                0.742785       0.000000         190
# 10    [0.935, 5.332]      6                0.738651       0.000000         177
# 11    [2.369, 0.795]      4                0.416808       0.000000         108
# 12    [0.429, 4.974]      6                0.719853       0.000000         159
# 13   [-2.048, 6.654]      3                0.738651       0.000000          51
# 14   [-1.457, 7.487]      3                0.719853       0.000000          97
# 15    [0.529, 3.808]      6                0.689899       0.000000          86
# 16    [1.782, 0.908]      4                0.826320       0.000000          82
# 17   [-1.956, 8.616]      3                0.689899       0.000000         128
# 18   [-1.746, 3.012]      7                0.742785       0.000000         186
# 19    [-1.18, 3.128]      7                0.738651       0.000000         166
# 20    [1.164, 3.791]      6                0.416808       0.000000         118
# 21    [1.362, 1.366]      4                0.636566       0.000000          87
# 22    [2.601, 1.088]      4                0.639942       0.000000         133
# 23     [0.272, 5.47]      6                0.826320       0.000000         117
# 24   [-3.122, 3.282]      7                0.719853       0.000000          57
# 25   [-0.588, 8.614]      3                0.416808       0.000000          18
# 26   [1.669, -0.436]      4                0.582667       0.000000          19
# 27   [-0.683, 7.675]      3                0.826320       0.000000         185
# 28    [2.368, 0.552]      4                0.461632       0.000000          41
# 29    [1.052, 4.545]      6                0.636566       0.000000         169
# 30    [2.227, 1.263]      4                0.722914       0.000000         168
# 31   [2.439, -0.073]      4                0.671035       0.000000          74
# 32    [1.345, 4.857]      6                0.639942       0.000000         219
# 33   [-1.315, 6.839]      3                0.636566       0.000000         184
# 34    [0.983, 5.375]      6                0.582667       0.000000           1
# 35   [-1.063, 2.208]      7                0.689899       0.000000         176
# 36   [-1.607, 3.565]      7                0.416808       0.000000         131
# 37    [1.573, 0.484]      4                0.122696       0.000000          92
# 38   [-2.179, 8.086]      3                0.639942       0.000000         123
# 39    [1.834, 0.754]      4                0.737856       0.000000          78
# 40    [2.106, 3.495]      6                0.461632       0.000000         201
# 41   [-1.643, 7.527]      3                0.582667       0.000000          21
# 42    [1.106, 1.264]      4                0.673931       0.000000         148
# 43    [1.612, 1.823]      4                0.721101       0.000000          49
# 44      [0.46, 5.45]      6                0.722914       0.000000          12
# 45   [-0.538, 3.016]      7                0.826320       0.000000         196
# 46    [1.678, 0.609]      4                0.341140       0.000000          93
# 47   [-1.012, 3.603]      7                0.636566       0.000000           7
# 48    [1.342, 0.594]      4                0.760534       0.000000          61
# 49    [1.428, 1.624]      4                0.760116       0.000000         150
# 50    [2.045, 1.125]      4                0.689325       0.000000         121
# 51    [1.673, 0.659]      4                0.685775       0.005590         104
# 52   [-1.359, 2.322]      7                0.639942       0.023094          14
# 53    [1.131, 0.936]      4                0.701151       0.031076          42
# 54   [-1.739, 1.948]      7                0.582667       0.056146          39
# 55    [-0.34, 8.167]      3                0.461632       0.057855         204
# 56   [-1.638, 2.433]      7                0.461632       0.062953          75
# 57   [-1.688, 2.241]      7                0.722914       0.080455           2
# 58    [2.43, -0.064]      4                0.387009       0.081781         139
# 59    [-1.38, 7.185]      3                0.722914       0.087602          46
# 60   [-1.252, 2.339]      7                0.671035       0.094225         173
# 61   [-2.395, 3.398]      7                0.122696       0.098303         116
# 62   [-2.092, 7.481]      3                0.671035       0.104613         162
# 63    [0.488, 3.268]      6                0.671035       0.135718         211
# 64   [-0.539, 7.456]      3                0.122696       0.140783          37
# 65   [-2.592, 8.076]      3                0.737856       0.160861          56
# 66   [-1.047, 2.965]      7                0.737856       0.161913         145
# 67    [1.256, 3.382]      6                0.122696       0.162461         170
# 68   [-1.622, 4.272]      0                0.000000       0.180867         194
# 69    [1.869, 5.441]      6                0.737856       0.180867         102
# 70   [-1.764, 2.222]      7                0.673931       0.180867          50
# 71   [-1.382, 7.288]      3                0.673931       0.180867         216
# 72    [0.008, 4.176]      6                0.673931       0.180867          83
# 73   [-1.103, 7.302]      3                0.721101       0.183881           4
# 74   [-1.794, 7.581]      3                0.341140       0.183881         100
# 75   [-1.512, 7.944]      3                0.760534       0.183881         203
# 76    [0.959, 4.561]      6                0.721101       0.190656         209
# 77     [-0.601, 6.3]      3                0.760116       0.190656          30
# 78     [0.225, 4.77]      6                0.341140       0.190656         183
# 79    [1.567, 0.018]      4                0.326183       0.196035          71
# 80   [-1.034, 2.921]      7                0.721101       0.203713          11
# 81   [-0.922, 8.099]      3                0.689325       0.208430         160
# 82   [-1.886, 2.248]      7                0.341140       0.208627         224
# 83    [1.869, 0.956]      4                0.417112       0.208888          16
# 84     [1.101, 4.89]      6                0.760534       0.212415           3
# 85   [-1.932, 8.306]      3                0.685775       0.217258         112
# 86     [0.67, 4.041]      6                0.760116       0.217690         153
# 87    [0.744, 4.122]      6                0.689325       0.226782         157
# 88     [1.64, 1.819]      4                0.386631       0.233446         171
# 89    [0.815, 4.785]      6                0.685775       0.249974          54
# 90   [-2.633, 2.631]      7                0.760534       0.253104          59
# 91   [-0.961, 1.274]      0                0.000000       0.255871         161
# 92    [0.214, 4.885]      6                0.701151       0.256637         129
# 93    [1.435, 1.307]      4                0.470280       0.256637         125
# 94    [1.214, 3.648]      6                0.387009       0.256637          94
# 95    [1.083, 4.063]      6                0.326183       0.256637          20
# 96   [-1.226, 8.296]      3                0.701151       0.256637         214
# 97     [1.482, 0.69]      4                0.684759       0.261700          22
# 98    [1.896, 5.185]      6                0.417112       0.261700         113
# 99   [-1.324, 4.131]      7                0.760116       0.269267         206
# 100   [-1.15, 7.893]      3                0.387009       0.275141          95
# 101   [2.469, 1.679]      4                0.838035       0.280604          15
# 102   [2.311, 1.304]      4                0.727384       0.283523         164
# 103   [0.573, 4.088]      6                0.386631       0.286972          84
# 104  [-0.968, 3.122]      7                0.689325       0.288302         147
# 105    [2.625, 0.95]      4                0.338441       0.292141         208
# 106   [1.684, 4.196]      6                0.470280       0.300090          28
# 107  [-2.221, 2.731]      7                0.685775       0.300860         155
# 108  [-1.578, 3.034]      7                0.701151       0.306750          96
# 109   [0.082, 4.567]      6                0.684759       0.310031         144
# 110   [1.433, 4.377]      6                0.838035       0.324773          89
# 111   [1.063, 5.176]      6                0.727384       0.325524         126
# 112   [0.768, 4.398]      6                0.338441       0.327308         217
# 113    [2.47, 1.315]      4                0.635927       0.333191         136
# 114  [-1.732, 7.164]      3                0.326183       0.335403         221
# 115   [0.347, 3.452]      6                0.635927       0.336760          52
# 116  [-1.001, 2.849]      7                0.387009       0.342861         195
# 117   [1.016, 4.485]      6                0.482353       0.344652         197
# 118    [0.56, 4.214]      6                0.430840       0.344827         179
# 119  [-2.118, 2.035]      7                0.326183       0.348280         142
# 120  [-1.362, 2.383]      7                0.417112       0.352888         105
# 121  [-2.784, 2.992]      7                0.386631       0.355081         124
# 122   [1.652, 3.656]      6                0.472956       0.355697          32
# 123   [-1.94, 2.189]      7                0.470280       0.362003         178
# 124  [-1.815, 7.978]      3                0.417112       0.363128          81
# 125   [1.202, 3.644]      6                0.393618       0.372836         135
# 126  [-0.969, 3.267]      7                0.684759       0.386366           9
# 127   [1.87, -0.108]      4                0.482353       0.387209           8
# 128  [-1.807, 2.068]      7                0.838035       0.391382         191
# 129   [1.218, 3.893]      6                0.743208       0.392763         120
# 130  [-1.484, 6.008]      3                0.386631       0.395156         114
# 131  [-1.564, 2.853]      7                0.727384       0.397227         213
# 132  [-0.686, 8.683]      3                0.470280       0.401954         222
# 133   [1.076, 4.685]      6                0.502175       0.402517         109
# 134  [-0.976, 6.738]      3                0.684759       0.403438         141
# 135    [1.38, 4.548]      6                0.766235       0.418471          62
# 136  [-1.641, 2.681]      7                0.338441       0.432980          53
# 137  [-0.002, 4.581]      6                0.189619       0.437722          73
# 138   [1.714, 5.025]      6                0.759588       0.439706         200
# 139  [-1.405, 7.726]      3                0.838035       0.439706          79
# 140  [-0.708, 2.504]      7                0.635927       0.439706         127
# 141  [-0.886, 2.646]      7                0.482353       0.439706         182
# 142    [1.984, 0.49]      4                0.430840       0.441015         146
# 143  [2.952, -0.344]      4                0.472956       0.457827          85
# 144   [0.432, 4.335]      6                0.624909       0.457827         218
# 145  [-1.866, 7.625]      3                0.727384       0.458990         119
# 146   [2.527, 0.618]      4                0.393618       0.463463         137
# 147   [2.041, 0.455]      4                0.743208       0.470460          60
# 148   [-2.58, 3.188]      7                0.430840       0.470825         149
# 149    [1.62, 0.068]      4                0.502175       0.483465         165
# 150  [-2.588, 3.131]      7                0.472956       0.485582          38
# 151   [0.444, 3.115]      0                0.000000       0.492651         163
# 152  [-0.457, 7.306]      3                0.338441       0.505280          33
# 153  [-1.129, 7.805]      3                0.635927       0.505572         172
# 154    [2.13, 5.192]      6                0.417656       0.511402         111
# 155   [1.004, 4.191]      6                0.441913       0.511402         193
# 156  [-1.393, 8.746]      3                0.482353       0.516557         192
# 157   [0.728, 3.855]      6                0.438766       0.516557          27
# 158   [0.893, 1.011]      4                0.766235       0.518120         110
# 159   [-1.108, 2.92]      7                0.393618       0.521854         189
# 160   [0.789, 4.337]      6                0.758176       0.524485         101
# 161   [1.976, 0.719]      4                0.189619       0.526365         212
# 162  [-1.249, 3.085]      7                0.743208       0.532096         156
# 163  [-1.078, 8.881]      3                0.430840       0.535303          67
# 164   [-1.868, 3.08]      7                0.502175       0.536606          98
# 165   [2.768, 1.088]      4                0.759588       0.536606         202
# 166   [0.277, 4.844]      6                0.425497       0.536606         154
# 167   [3.411, 0.872]      4                0.624909       0.536606         138
# 168  [-1.581, 7.553]      3                0.472956       0.543011         122
# 169   [-1.53, 7.705]      3                0.393618       0.544018         207
# 170   [-1.825, 7.36]      3                0.743208       0.544851          35
# 171  [-1.686, 7.953]      3                0.502175       0.547966         187
# 172  [-1.651, 3.446]      7                0.766235       0.555784         220
# 173  [-1.304, 3.003]      7                0.189619       0.560654          25
# 174  [-0.731, 6.242]      3                0.766235       0.564289          10
# 175    [2.406, 4.87]      6                0.806437       0.572578          45
# 176  [-1.536, 3.014]      7                0.759588       0.576194         107
# 177   [1.489, 0.652]      4                0.417656       0.577034         205
# 178   [0.514, 4.627]      6                0.671555       0.579684          44
# 179   [-1.815, 3.29]      7                0.624909       0.582450          47
# 180  [-1.937, 3.914]      7                0.417656       0.587861          34
# 181   [-0.615, 3.95]      0                0.000000       0.600229          90
# 182   [2.032, 0.197]      4                0.441913       0.600301         132
# 183   [2.149, 1.037]      4                0.438766       0.602544         140
# 184    [-1.37, 7.77]      3                0.189619       0.604401          36
# 185    [0.914, 4.55]      6                0.738818       0.610496         134
# 186   [0.334, 4.936]      6                0.779360       0.611313          17
# 187    [-2.16, 3.41]      7                0.441913       0.615856          64
# 188   [1.367, 0.635]      4                0.758176       0.616716          55
# 189  [-0.571, 8.133]      3                0.759588       0.617330          23
# 190  [-1.006, 3.084]      7                0.438766       0.618283         180
# 191   [1.495, 3.858]      6                0.638316       0.619707         106
# 192   [-0.59, 7.695]      3                0.624909       0.621117          43
# 193   [0.715, 5.413]      6                0.610878       0.621668           5
# 194   [2.114, 1.247]      4                0.425497       0.622072         158
# 195   [1.201, 0.602]      4                0.806437       0.628202          72
# 196   [-2.546, 3.15]      7                0.758176       0.629062         115
# 197   [-1.959, 2.43]      7                0.425497       0.630759          88
# 198   [2.338, 3.431]      0                0.000000       0.640282          26
# 199     [3.353, 1.7]      4                0.671555       0.643993          24
# 200   [1.843, 0.073]      4                0.738818       0.652346         152
# 201    [1.32, 1.404]      4                0.779360       0.666485          31
# 202   [2.097, 4.847]      6                0.642165       0.673339          58
# 203  [-1.243, 8.152]      3                0.417656       0.675609          63
# 204  [-1.859, 7.789]      3                0.441913       0.676482          99
# 205   [2.747, 1.545]      4                0.638316       0.676673          69
# 206   [2.608, 1.089]      4                0.610878       0.689151         210
# 207    [1.66, 3.563]      6                0.331853       0.708094          13
# 208   [2.352, 0.828]      4                0.642165       0.709907         175
# 209   [2.223, 0.839]      4                0.331853       0.710540          40
# 210   [3.229, 1.132]      4                0.680165       0.713226          65
# 211  [-1.559, 7.248]      3                0.438766       0.718864         181
# 212  [-0.647, 3.429]      7                0.806437       0.731078         174
# 213  [-1.327, 8.515]      3                0.758176       0.740460         151
# 214   [0.917, 3.906]      6                0.680165       0.747472         130
# 215  [2.295, -0.766]      4                0.760572       0.751100          68
# 216    [1.816, 1.12]      4                0.324560       0.752195         215
# 217    [-1.12, 7.11]      3                0.425497       0.758514          77
# 218  [-1.655, 8.614]      3                0.806437       0.766876         167
# 219  [-1.276, 7.968]      3                0.671555       0.767946         223
# 220    [1.974, 1.58]      4                0.657128       0.771445         199
# 221   [2.518, 1.392]      4                0.546996       0.779360           0
# 222   [0.439, 4.536]      6                0.760572       0.782303         198
# 223   [0.369, 7.791]      3                0.738818       0.816012         143
# 224   [-1.791, 2.75]      7                0.671555       0.827391          91
```
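
In the output above, `outlier_score` increases steadily from top to bottom while `outlier_id` appears to be a permutation of the row indices. This suggests the outlier scores are returned sorted by score rather than in input order, with `outlier_id` naming the data point each score belongs to. Likewise, every row labelled `0` has a membership probability of 0.0, which is consistent with label 0 marking noise. The helpers below are a hypothetical post-processing sketch built on those two assumptions (`cluster_sizes` and `align_outlier_scores` are not part of cyhdbscan): one counts points per cluster, the other puts the outlier scores back in input order.

```python
from collections import Counter
from typing import Dict, List, Sequence

NOISE_LABEL = 0  # assumption: label 0 marks noise points


def cluster_sizes(labels: Sequence[int]) -> Dict[int, int]:
    """Count points per cluster label, leaving out the (assumed) noise label."""
    counts = Counter(label for label in labels if label != NOISE_LABEL)
    return dict(sorted(counts.items()))


def align_outlier_scores(scores: Sequence[float], ids: Sequence[int]) -> List[float]:
    """Reorder sorted outlier scores so that position i holds the score of point i.

    Assumes ids[k] is the index of the data point that scores[k] belongs to.
    """
    if len(scores) != len(ids):
        raise ValueError("scores and ids must have the same length")
    aligned = [float("nan")] * len(ids)
    for score, point_id in zip(scores, ids):
        aligned[point_id] = score
    return aligned


# Toy values, not taken from the example output:
print(cluster_sizes([4, 7, 0, 4, 7, 4]))                  # {4: 3, 7: 2}
print(align_outlier_scores([0.1, 0.5, 0.9], [2, 0, 1]))  # [0.5, 0.9, 0.1]
```

With the example above, you could apply them to the DataFrame columns, e.g. `df = pd.DataFrame(result)` followed by `cluster_sizes(df["label"].tolist())` and `align_outlier_scores(df["outlier_score"].tolist(), df["outlier_id"].tolist())`.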
