
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/datasets/plot_make_imbalance.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_datasets_plot_make_imbalance.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_datasets_plot_make_imbalance.py:


============================
Create an imbalanced dataset
============================

This example illustrates how :func:`~imblearn.datasets.make_imbalance` creates
an imbalanced dataset from a balanced one. It also shows that
:func:`~imblearn.datasets.make_imbalance` accepts a pandas DataFrame as input.

.. GENERATED FROM PYTHON SOURCE LINES 10-16

.. code-block:: default


    # Authors: Dayvid Oliveira
    #          Christos Aridas
    #          Guillaume Lemaitre <g.lemaitre58@gmail.com>
    # License: MIT








.. GENERATED FROM PYTHON SOURCE LINES 17-23

.. code-block:: default

    print(__doc__)

    import seaborn as sns

    sns.set_context("poster")









.. GENERATED FROM PYTHON SOURCE LINES 24-30

Generate the dataset
--------------------

First, we will generate a dataset and convert it to a
:class:`~pandas.DataFrame` with arbitrary column names. We will plot the
original dataset.

.. GENERATED FROM PYTHON SOURCE LINES 32-46

.. code-block:: default

    import pandas as pd
    from sklearn.datasets import make_moons

    X, y = make_moons(n_samples=200, shuffle=True, noise=0.5, random_state=10)
    X = pd.DataFrame(X, columns=["feature 1", "feature 2"])
    ax = X.plot.scatter(
        x="feature 1",
        y="feature 2",
        c=y,
        colormap="viridis",
        colorbar=False,
    )
    sns.despine(ax=ax, offset=10)




.. image:: /auto_examples/datasets/images/sphx_glr_plot_make_imbalance_001.png
    :alt: plot make imbalance
    :class: sphx-glr-single-img

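A quick way to confirm that the dataset is balanced before resampling is to
count the labels. The snippet below is a minimal sketch that regenerates the
same ``y`` with the parameters used above; the variable name ``class_counts``
is introduced here only for illustration.

.. code-block:: python

    from collections import Counter

    from sklearn.datasets import make_moons

    # Same parameters as the example above.
    _, y = make_moons(n_samples=200, shuffle=True, noise=0.5, random_state=10)

    # Count how many samples fall in each class.
    class_counts = Counter(y.tolist())
    print(class_counts)

With an even ``n_samples``, :func:`~sklearn.datasets.make_moons` splits the
samples equally between the two classes, so each class should hold 100
samples here.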




.. GENERATED FROM PYTHON SOURCE LINES 47-53

Make a dataset imbalanced
-------------------------

Now, we will show the helper :func:`~imblearn.datasets.make_imbalance`,
which randomly selects a subset of samples. The resulting class distribution
is controlled by the ``sampling_strategy`` parameter.

.. GENERATED FROM PYTHON SOURCE LINES 55-63

.. code-block:: default

    from collections import Counter


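    def ratio_func(y, multiplier, minority_class):
        # Keep only a fraction ``multiplier`` of the minority class samples.
        target_stats = Counter(y)  # number of samples per class
        # Classes absent from the returned dict are left untouched.
        return {minority_class: int(multiplier * target_stats[minority_class])}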

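To make the arithmetic concrete, the sketch below calls ``ratio_func`` on a
plain list of labels with 100 samples per class, mirroring the balanced
dataset generated earlier. The label list ``y_demo`` is a stand-in introduced
for illustration and is not part of the original example.

.. code-block:: python

    from collections import Counter


    def ratio_func(y, multiplier, minority_class):
        # Same helper as in the example above.
        target_stats = Counter(y)
        return {minority_class: int(multiplier * target_stats[minority_class])}


    # Hypothetical stand-in for the labels: 100 samples in each class.
    y_demo = [0] * 100 + [1] * 100

    for multiplier in [0.9, 0.75, 0.5, 0.25, 0.1]:
        # Only class 1 appears in the result; class 0 is not listed.
        print(multiplier, ratio_func(y_demo, multiplier, minority_class=1))

For these multipliers the requested sizes of class 1 are 90, 75, 50, 25 and
10. Note that ``int`` truncates, so a product that is not a whole number is
rounded toward zero.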








.. GENERATED FROM PYTHON SOURCE LINES 64-101

.. code-block:: default

    import matplotlib.pyplot as plt
    from imblearn.datasets import make_imbalance

    fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(15, 10))

    X.plot.scatter(
        x="feature 1",
        y="feature 2",
        c=y,
        ax=axs[0, 0],
        colormap="viridis",
        colorbar=False,
    )
    axs[0, 0].set_title("Original set")
    sns.despine(ax=axs[0, 0], offset=10)

    multipliers = [0.9, 0.75, 0.5, 0.25, 0.1]
    for ax, multiplier in zip(axs.ravel()[1:], multipliers):
        X_resampled, y_resampled = make_imbalance(
            X,
            y,
            sampling_strategy=ratio_func,
            multiplier=multiplier,
            minority_class=1,
        )
        X_resampled.plot.scatter(
            x="feature 1",
            y="feature 2",
            c=y_resampled,
            ax=ax,
            colormap="viridis",
            colorbar=False,
        )
        ax.set_title(f"Sampling ratio = {multiplier}")
        sns.despine(ax=ax, offset=10)

    plt.tight_layout()
    plt.show()



.. image:: /auto_examples/datasets/images/sphx_glr_plot_make_imbalance_002.png
    :alt: Original set, Sampling ratio = 0.9, Sampling ratio = 0.75, Sampling ratio = 0.5, Sampling ratio = 0.25, Sampling ratio = 0.1
    :class: sphx-glr-single-img


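The callable used above only returns an entry for class 1, so class 0 keeps
all of its samples. As a rough equivalent, the sketch below passes the target
count directly as a dict. It assumes the same ``make_moons`` data as above;
the value 25 and ``random_state=0`` are arbitrary choices made for this
illustration.

.. code-block:: python

    from collections import Counter

    from imblearn.datasets import make_imbalance
    from sklearn.datasets import make_moons

    X, y = make_moons(n_samples=200, shuffle=True, noise=0.5, random_state=10)

    # Keep 25 samples of class 1; class 0 is not listed and stays untouched.
    X_res, y_res = make_imbalance(
        X, y, sampling_strategy={1: 25}, random_state=0
    )
    resampled_counts = Counter(y_res.tolist())
    print(resampled_counts)

Fixing ``random_state`` makes the random selection reproducible across runs.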




.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.288 seconds)


.. _sphx_glr_download_auto_examples_datasets_plot_make_imbalance.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_make_imbalance.py <plot_make_imbalance.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_make_imbalance.ipynb <plot_make_imbalance.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
