
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/combine/plot_comparison_combine.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_combine_plot_comparison_combine.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_combine_plot_comparison_combine.py:


==================================================
Compare sampler combining over- and under-sampling
==================================================

This example shows the effect of applying an under-sampling algorithm after
SMOTE over-sampling. In the literature, Tomek's links and edited nearest
neighbours are the two cleaning methods that have been used for this purpose,
and both are available in imbalanced-learn.

.. GENERATED FROM PYTHON SOURCE LINES 11-15

.. code-block:: default


    # Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
    # License: MIT








.. GENERATED FROM PYTHON SOURCE LINES 16-24

.. code-block:: default

    print(__doc__)

    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.set_context("poster")










.. GENERATED FROM PYTHON SOURCE LINES 25-30

Dataset generation
------------------

We will create a small imbalanced dataset: 100 samples split across three
classes with weights of 0.1, 0.2 and 0.7. We will use
:func:`~sklearn.datasets.make_classification` to generate this dataset.

.. GENERATED FROM PYTHON SOURCE LINES 32-47

.. code-block:: default

    from sklearn.datasets import make_classification

    X, y = make_classification(
        n_samples=100,
        n_features=2,
        n_informative=2,
        n_redundant=0,
        n_repeated=0,
        n_classes=3,
        n_clusters_per_class=1,
        weights=[0.1, 0.2, 0.7],
        class_sep=0.8,
        random_state=0,
    )

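To check that the generated dataset is imbalanced as intended, the class
counts can be inspected before plotting. The snippet below is a minimal
sketch that regenerates the same dataset; the name ``class_counts`` is
introduced here for illustration only, and the exact counts are not stated in
this example, so none are shown.

.. code-block:: python

    from collections import Counter

    from sklearn.datasets import make_classification

    # Same parameters as the dataset generated above.
    X, y = make_classification(
        n_samples=100,
        n_features=2,
        n_informative=2,
        n_redundant=0,
        n_repeated=0,
        n_classes=3,
        n_clusters_per_class=1,
        weights=[0.1, 0.2, 0.7],
        class_sep=0.8,
        random_state=0,
    )

    # Count how many samples fall in each class and report the proportions.
    class_counts = Counter(int(label) for label in y)
    for label, count in sorted(class_counts.items()):
        print(f"class {label}: {count} samples ({count / len(y):.0%})")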







.. GENERATED FROM PYTHON SOURCE LINES 48-51

.. code-block:: default

    _, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(X[:, 0], X[:, 1], c=y, alpha=0.8, edgecolor="k")




.. image:: /auto_examples/combine/images/sphx_glr_plot_comparison_combine_001.png
    :alt: plot comparison combine
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none


    <matplotlib.collections.PathCollection object at 0x145288760>



.. GENERATED FROM PYTHON SOURCE LINES 52-54

The following function will be used to plot the sample space after resampling
to illustrate the characteristics of each algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 56-68

.. code-block:: default

    from collections import Counter


    def plot_resampling(X, y, sampler, ax):
        """Plot the resampled dataset using the sampler."""
        # Resample the data, then scatter the result coloured by class.
        X_res, y_res = sampler.fit_resample(X, y)
        ax.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.8, edgecolor="k")
        sns.despine(ax=ax, offset=10)
        ax.set_title(f"Resampling using {sampler.__class__.__name__}")
        # Return the class counts of the resampled dataset.
        return Counter(y_res)









.. GENERATED FROM PYTHON SOURCE LINES 69-71

The following function will be used to plot the decision function of a
classifier given some data.

.. GENERATED FROM PYTHON SOURCE LINES 73-92

.. code-block:: default

    import numpy as np


    def plot_decision_function(X, y, clf, ax):
        """Plot the decision function of the classifier and the original data."""
        # Build a regular grid covering the data with a margin of 1 on each side.
        plot_step = 0.02
        x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
        y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
        xx, yy = np.meshgrid(
            np.arange(x_min, x_max, plot_step), np.arange(y_min, y_max, plot_step)
        )

        # Predict a class for every grid point and shade the resulting regions.
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
        Z = Z.reshape(xx.shape)
        ax.contourf(xx, yy, Z, alpha=0.4)
        ax.scatter(X[:, 0], X[:, 1], alpha=0.8, c=y, edgecolor="k")
        # ``clf`` is a pipeline whose first step is the sampler.
        ax.set_title(f"Decision function for {clf[0].__class__.__name__}")

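The grid construction used in the function above can be hard to picture. The
following sketch repeats the same ``np.meshgrid`` / ``np.c_`` / ``reshape``
steps on a deliberately coarse grid. The thresholding rule standing in for
``clf.predict`` is hypothetical and only serves to produce one label per grid
point.

.. code-block:: python

    import numpy as np

    plot_step = 0.5  # coarse step so the grid stays small enough to inspect
    xx, yy = np.meshgrid(
        np.arange(0.0, 2.0, plot_step), np.arange(0.0, 1.0, plot_step)
    )
    # xx and yy share the shape (number of y steps, number of x steps).
    print(xx.shape)

    # Stack the flattened coordinates into an (n_points, 2) array, which is
    # the layout a classifier expects for a two-feature dataset.
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    print(grid_points.shape)

    # Hypothetical stand-in for ``clf.predict``: label 1 where x > 1.0.
    labels = (grid_points[:, 0] > 1.0).astype(int)

    # Reshape the flat predictions back to the grid shape for ``contourf``.
    Z = labels.reshape(xx.shape)
    print(Z)

Because ``Z`` has the same shape as ``xx`` and ``yy``, it can be passed
directly to ``ax.contourf(xx, yy, Z)``.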








.. GENERATED FROM PYTHON SOURCE LINES 93-104

:class:`~imblearn.over_sampling.SMOTE` generates new synthetic samples.
However, this method of over-sampling has no knowledge of the underlying
distribution. Therefore, it can generate noisy samples, e.g. when the
different classes are not well separated. Hence, it can be beneficial to
apply an under-sampling algorithm afterwards to clean these noisy samples.
Two cleaning methods are commonly used in the literature: (i) Tomek's links
and (ii) edited nearest neighbours. Imbalanced-learn provides two
ready-to-use samplers, :class:`~imblearn.combine.SMOTETomek` and
:class:`~imblearn.combine.SMOTEENN`. In general,
:class:`~imblearn.combine.SMOTEENN` removes more noisy samples than
:class:`~imblearn.combine.SMOTETomek`.
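
To make the two cleaning rules concrete, here is a minimal NumPy sketch of the
underlying ideas. It is not the imbalanced-learn implementation: the helper
names (``pairwise_distances``, ``tomek_link_mask``, ``enn_keep_mask``) and the
toy data are assumptions made for illustration. A Tomek link is taken to be a
pair of samples from different classes that are each other's nearest
neighbour, and the edited nearest neighbours rule is sketched in its strict
form, where a sample is kept only if all of its nearest neighbours share its
label.

.. code-block:: python

    import numpy as np


    def pairwise_distances(X):
        """Euclidean distance matrix with the diagonal set to infinity."""
        diff = X[:, None, :] - X[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
        return dist


    def tomek_link_mask(X, y):
        """Flag samples that belong to a Tomek link."""
        nearest = pairwise_distances(X).argmin(axis=1)
        # Mutual nearest neighbours: the neighbour's neighbour is the sample.
        mutual = nearest[nearest] == np.arange(len(X))
        return mutual & (y != y[nearest])


    def enn_keep_mask(X, y, n_neighbors=3):
        """Keep samples whose nearest neighbours all share their label."""
        neighbours = np.argsort(pairwise_distances(X), axis=1)[:, :n_neighbors]
        return (y[neighbours] == y[:, None]).all(axis=1)


    # Toy data on a line: class 0 on the left, class 1 on the right, and one
    # class-1 sample (index 4) sitting next to the class-0 group.
    positions = np.array([-1.2, 0.0, 1.0, 2.1, 2.4, 6.0, 7.0, 8.1, 9.3])
    X_toy = np.column_stack([positions, np.zeros_like(positions)])
    y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

    in_tomek_link = np.flatnonzero(tomek_link_mask(X_toy, y_toy))
    removed_by_enn = np.flatnonzero(~enn_keep_mask(X_toy, y_toy))
    print("samples in a Tomek link:", in_tomek_link)  # indices 3 and 4
    print("samples removed by ENN:", removed_by_enn)  # indices 2, 3 and 4

On this toy data the edited nearest neighbours rule discards one more sample
than the Tomek link rule flags, which matches the general tendency described
above. Which samples of a Tomek link are actually discarded depends on how
the sampler is configured.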

.. GENERATED FROM PYTHON SOURCE LINES 106-121

.. code-block:: default

    from imblearn.over_sampling import SMOTE
    from imblearn.combine import SMOTEENN, SMOTETomek
    from imblearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    samplers = [SMOTE(random_state=0), SMOTEENN(random_state=0), SMOTETomek(random_state=0)]

    fig, axs = plt.subplots(3, 2, figsize=(15, 25))
    for ax, sampler in zip(axs, samplers):
        clf = make_pipeline(sampler, LinearSVC()).fit(X, y)
        plot_decision_function(X, y, clf, ax[0])
        plot_resampling(X, y, sampler, ax[1])
    fig.tight_layout()

    plt.show()



.. image:: /auto_examples/combine/images/sphx_glr_plot_comparison_combine_002.png
    :alt: Resampling using SMOTE, Decision function for SMOTE, Resampling using SMOTEENN, Decision function for SMOTEENN, Resampling using SMOTETomek, Decision function for SMOTETomek
    :class: sphx-glr-single-img

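``plot_resampling`` returns the class counts of the resampled data, but the
plotting loop discards them. The sketch below collects the counts for the
three samplers without any plotting, so that the amount of cleaning can be
compared numerically. The name ``resampled_counts`` is introduced here for
illustration, and no counts are quoted because they are not reported in this
example.

.. code-block:: python

    from collections import Counter

    from imblearn.combine import SMOTEENN, SMOTETomek
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    # Same dataset as in the example above.
    X, y = make_classification(
        n_samples=100,
        n_features=2,
        n_informative=2,
        n_redundant=0,
        n_repeated=0,
        n_classes=3,
        n_clusters_per_class=1,
        weights=[0.1, 0.2, 0.7],
        class_sep=0.8,
        random_state=0,
    )

    samplers = [
        SMOTE(random_state=0),
        SMOTEENN(random_state=0),
        SMOTETomek(random_state=0),
    ]

    # Resample with each sampler and record the per-class counts.
    resampled_counts = {}
    for sampler in samplers:
        _, y_res = sampler.fit_resample(X, y)
        resampled_counts[type(sampler).__name__] = Counter(
            int(label) for label in y_res
        )

    print("original:", dict(sorted(Counter(int(label) for label in y).items())))
    for name, counts in resampled_counts.items():
        total = sum(counts.values())
        print(f"{name}: {dict(sorted(counts.items()))} (total {total})")

Since the combined samplers only remove samples after over-sampling, their
totals cannot exceed the total obtained with
:class:`~imblearn.over_sampling.SMOTE` alone.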





.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.433 seconds)


.. _sphx_glr_download_auto_examples_combine_plot_comparison_combine.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_comparison_combine.py <plot_comparison_combine.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_comparison_combine.ipynb <plot_comparison_combine.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
