Metadata-Version: 2.1
Name: dbscan1d
Version: 0.1.1
Summary: dbscan1d is a package for DBSCAN on 1D arrays
Home-page: https://github.com/d-chambers/dbscan1d
Author: Derrick Chambers
Author-email: djachambeador@gmail.com
License: GNU Lesser General Public License v3.0 or later (LGPLv3.0+)
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: numpy (>=1.13.0)
Requires-Dist: black
Requires-Dist: flake8

# DBSCAN1D
dbscan1d is a 1D implementation of the [DBSCAN algorithm](https://en.wikipedia.org/wiki/DBSCAN). It was created to efficiently
perform clustering on large 1D arrays.

[scikit-learn's DBSCAN implementation](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) does
not have a special case for 1D data, where calculating the full distance matrix is wasteful. It is much better to simply sort
the input array and perform efficient bisects to find the closest points. Here are the results of running the simple
profile script included with the package. In every case, DBSCAN1D is much faster than scikit-learn's implementation.

![image](https://github.com/d-chambers/dbscan1d/raw/master/profile_results.png)
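
The sort-and-bisect idea can be illustrated with a minimal sketch using only the standard library. This is not dbscan1d's actual code: the function names `count_neighbors_1d` and `core_mask_1d` are hypothetical, and the sketch assumes the usual DBSCAN convention that a point's neighborhood includes the point itself and every point at distance `<= eps`.

```python
from bisect import bisect_left, bisect_right


def count_neighbors_1d(values, eps):
    """Count, for each value, the points within eps of it (itself included)."""
    # Sort once, remembering where each value came from in the input.
    order = sorted(range(len(values)), key=values.__getitem__)
    sorted_vals = [values[i] for i in order]
    counts = [0] * len(values)
    for idx, val in zip(order, sorted_vals):
        # Two bisects bound the window [val - eps, val + eps] in the sorted data.
        lo = bisect_left(sorted_vals, val - eps)
        hi = bisect_right(sorted_vals, val + eps)
        counts[idx] = hi - lo
    return counts


def core_mask_1d(values, eps, min_samples):
    """Flag points that have at least min_samples neighbors (hypothetical helper)."""
    return [c >= min_samples for c in count_neighbors_1d(values, eps)]


points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
counts = count_neighbors_1d(points, eps=0.25)
is_core = core_mask_1d(points, eps=0.25, min_samples=3)
```

Each neighborhood lookup costs two binary searches on the sorted data, so the whole pass is O(n log n) and never builds an n-by-n distance matrix.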

## Installation
Simply use pip to install dbscan1d:
```bash
pip install dbscan1d
```
It only requires numpy.

## Quickstart
dbscan1d is designed to be interchangeable with sklearn's implementation in almost
all cases. The exception is that the `weights` parameter is not yet supported.

```python
from sklearn.datasets import make_blobs

from dbscan1d.core import DBSCAN1D

# make blobs to test clustering
X = make_blobs(1_000_000, centers=2, n_features=1)[0]

# init dbscan object
dbs = DBSCAN1D(eps=.5, min_samples=4)

# get labels for each point
labels = dbs.fit_predict(X)

# show core point indices
dbs.core_sample_indices_

# get values of core points
dbs.components_
```


