===================
Quick start example
===================

Users of ``scikit-hubness`` typically want to

1. analyse, whether their data show hubness
2. reduce hubness
3. perform learning (classification, regression, ...)

The following example shows all these steps for an example dataset
from the text domain (dexter).
Please make sure you have installed ``scikit-hubness``
(`installation instructions <installation.html>`_).

First, we load the dataset and inspect its size.

.. code-block:: python

    from skhubness.data import load_dexter
    X, y = load_dexter()
    print(f'X.shape = {X.shape}, y.shape={y.shape}')

Dexter is embedded in a high-dimensional space,
and could, thus, be prone to hubness.
Therefore, we assess the actual degree of hubness.

.. code-block:: python

    from skhubness import Hubness
    hub = Hubness(k=10, metric='cosine')
    hub.fit(X)
    k_skew = hub.score()
    print(f'Skewness = {k_skew:.3f}')

As a rule-of-thumb, skewness > 1.2 indicates significant hubness.
Additional hubness indices are available, for example:

.. code-block:: python

    print(f'Robin hood index: {hub.robinhood_index:.3f}')
    print(f'Antihub occurrence: {hub.antihub_occurrence:.3f}')
    print(f'Hub occurrence: {hub.hub_occurrence:.3f}')

There is considerable hubness in dexter.
Let's see, whether hubness reduction can improve
kNN classification performance.

.. code-block:: python

    from sklearn.model_selection import cross_val_score
    from skhubness.neighbors import KNeighborsClassifier

    # vanilla kNN
    knn_standard = KNeighborsClassifier(n_neighbors=5,
                                        metric='cosine')
    acc_standard = cross_val_score(knn_standard, X, y, cv=5)

    # kNN with hubness reduction (mutual proximity)
    knn_mp = KNeighborsClassifier(n_neighbors=5,
                                  metric='cosine',
                                  hubness='mutual_proximity')
    acc_mp = cross_val_score(knn_mp, X, y, cv=5)

    print(f'Accuracy (vanilla kNN): {acc_standard.mean():.3f}')
    print(f'Accuracy (kNN with hubness reduction): {acc_mp.mean():.3f}')


Accuracy was considerably improved by mutual proximity (MP).
But did MP actually reduce hubness?

.. code-block:: python

    hub_mp = Hubness(k=10, metric='cosine',
                     hubness='mutual_proximity')
    hub_mp.fit(X)
    k_skew_mp = hub_mp.score()
    print(f'Skewness after MP: {k_skew_mp:.3f} '
          f'(reduction of {k_skew - k_skew_mp:.3f})')
    print(f'Robin hood: {hub_mp.robinhood_index:.3f} '
          f'(reduction of {hub.robinhood_index - hub_mp.robinhood_index:.3f})')

Yes!

The neighbor graph can also be created directly,
with or without hubness reduction:

.. code-block:: python

    from skhubness.neighbors import kneighbors_graph
    neighbor_graph = kneighbors_graph(X,
                                      n_neighbors=5,
                                      hubness='mutual_proximity')

You may want to precompute the graph like this,
in order to avoid computing it repeatedly for subsequent hubness estimation and learning.
