Metadata-Version: 2.1
Name: bkmeans
Version: 1.0
Summary: The breathing k-means algorithm
Home-page: https://github.com/gittar/bkmeans
Author: Bernd Fritzke
Author-email: fritzke@web.de
License: MIT
Description: 
        # The Breathing *K*-Means Algorithm
        
        ## A novel approximation algorithm for the *k*-means problem
        
        **Breathing *k*-means** finds solutions which
        * are usually better than solutions from ***k*-means++**
        * are lower-bounded by ***k*-means++** (the worst possible result *is* a solution from ***k*-means++**)
        * require more computation time than ***k*-means++** (ca 50%-300% depending on the problem)
        
        The second claim is fulfilled, since 
        the default seeding method of **breathing *k*-means** *is* ***k*-means++**.  If no improvement is found by **breathing *k*-means**, it simply returns the ***k*-means++** result used as seed.
        
        The more interesting part about **breathing *k*-means** is probably its ability to improve upon ***k*-means++** in many cases and to do so at moderate  extra computational cost  (considering that the *k*-means problem is NP-hard).
        
        ## Further Info
        For a detailed motivation and description of the **breathing *k*-means algorithm** see the preprint https://arxiv.org/abs/2006.15666. 
        
        An extended software repo including the data sets used in the above preprint and jupyter notebooks to experiment with them can be found at https://github.com/gittar/breathing-k-means
        
        ## API
        The included class **BKMeans** is subclassed from [scikit-learn's **KMeans** class](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)
        and has thus the same API, only extended by two parameters which, however, can be ignored for normal usage and thus simply left at their default values:
        
        * *m* (breathing depth), default: 5, (m=0 is equivalent to running k-means++)
        * *theta* (neighborhood freezing range), default: 1.1
        
        
        ## Installation
        
        ```bash
        pip install bkmeans
        ```
        ## Example 1: running on simple random data set
        Code:
        ```python
        import numpy as np
        from bkmeans import BKMeans
        
        # generate random data set
        X=np.random.rand(1000,2)
        
        # create BKMeans instance
        bkm = BKMeans(n_clusters=100)
        
        # run the algorithm
        bkm.fit(X)
        
        # print SSE (inertia in scikit-learn terms)
        print(bkm.inertia_)
        ```
        Output:
        ```
        1.1775040547902602
        ```
        
        ## Example 2: comparison with *k*-means++ (multiple runs)
        Code:
        ```python
        import numpy as np
        from sklearn.cluster import KMeans
        from bkmeans import BKMeans
        
        # random 2D data set
        X=np.random.rand(1000,2)
        
        # number of centroids
        k=100
        
        for i in range(5):
            # kmeans++
            km = KMeans(n_clusters=k)
            km.fit(X)
        
            # breathing k-means
            bkm = BKMeans(n_clusters=k)
            bkm.fit(X)
        
            # relative SSE improvement of bkm over km++
            imp = 1 - bkm.inertia_/km.inertia_
            print(f"SSE improvement over k-means++: {imp:.2%}")
        ```
        Output:
        
        ```
        SSE improvement over k-means++: 3.38%
        SSE improvement over k-means++: 4.16%
        SSE improvement over k-means++: 6.14%
        SSE improvement over k-means++: 6.79%
        SSE improvement over k-means++: 4.76%
        ```
        
        
        
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.7.0
Description-Content-Type: text/markdown
