
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_validation_curve.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_model_selection_plot_validation_curve.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_validation_curve.py:


==========================
Plotting Validation Curves
==========================

This example examines the impact of the ``k_neighbors`` parameter of
:class:`~imblearn.over_sampling.SMOTE`. The plot shows the validation scores
of a SMOTE-CART classifier for different values of this parameter.

.. GENERATED FROM PYTHON SOURCE LINES 11-16

.. code-block:: default


    # Authors: Christos Aridas
    #          Guillaume Lemaitre <g.lemaitre58@gmail.com>
    # License: MIT








.. GENERATED FROM PYTHON SOURCE LINES 17-26

.. code-block:: default

    print(__doc__)

    import seaborn as sns

    sns.set_context("poster")


    RANDOM_STATE = 42









.. GENERATED FROM PYTHON SOURCE LINES 27-28

Let's first generate a dataset with an imbalanced class distribution.

.. GENERATED FROM PYTHON SOURCE LINES 30-45

.. code-block:: default

    from sklearn.datasets import make_classification

    X, y = make_classification(
        n_classes=2,
        class_sep=2,
        weights=[0.1, 0.9],
        n_informative=10,
        n_redundant=1,
        flip_y=0,
        n_features=20,
        n_clusters_per_class=4,
        n_samples=5000,
        random_state=RANDOM_STATE,
    )
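
As a quick sanity check, the class counts can be inspected to confirm the
imbalance. The sketch below regenerates the same dataset so that it runs on
its own; the name ``class_counts`` is introduced here for illustration only.

.. code-block:: python

    from collections import Counter

    from sklearn.datasets import make_classification

    RANDOM_STATE = 42

    # Same call as in the example above, repeated so the snippet is standalone.
    X, y = make_classification(
        n_classes=2,
        class_sep=2,
        weights=[0.1, 0.9],
        n_informative=10,
        n_redundant=1,
        flip_y=0,
        n_features=20,
        n_clusters_per_class=4,
        n_samples=5000,
        random_state=RANDOM_STATE,
    )

    # Count how many samples fall into each class; class 0 should be the
    # minority because it was given the smaller weight.
    class_counts = Counter(y)
    print(sorted(class_counts.items()))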








.. GENERATED FROM PYTHON SOURCE LINES 46-50

We will use the :class:`~imblearn.over_sampling.SMOTE` over-sampler followed
by a :class:`~sklearn.tree.DecisionTreeClassifier`. The goal is to find the
value of ``k_neighbors`` that works best for the dataset that we generated.

.. GENERATED FROM PYTHON SOURCE LINES 52-60

.. code-block:: default

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    model = make_pipeline(
        SMOTE(random_state=RANDOM_STATE), DecisionTreeClassifier(random_state=RANDOM_STATE)
    )
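
``make_pipeline`` names each step automatically, and those names are what make
a nested parameter such as ``smote__k_neighbors`` addressable. A minimal
sketch of how to look them up is shown here; the variables ``step_names`` and
``smote_params`` are illustrative and not part of the original example.

.. code-block:: python

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    model = make_pipeline(
        SMOTE(random_state=42), DecisionTreeClassifier(random_state=42)
    )

    # Step names are derived from the lowercased class names.
    step_names = [name for name, _ in model.steps]
    print(step_names)

    # Nested parameters follow the "<step name>__<parameter>" convention.
    smote_params = [key for key in model.get_params() if key.startswith("smote__")]
    print(smote_params)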








.. GENERATED FROM PYTHON SOURCE LINES 61-65

We can use :func:`~sklearn.model_selection.validation_curve` to inspect
the impact of varying the ``k_neighbors`` parameter. To do so, we need to
choose a metric that measures generalization performance during the
cross-validation. Here, we use Cohen's kappa.

.. GENERATED FROM PYTHON SOURCE LINES 67-82

.. code-block:: default

    from sklearn.metrics import cohen_kappa_score, make_scorer
    from sklearn.model_selection import validation_curve

    scorer = make_scorer(cohen_kappa_score)
    param_range = range(1, 11)
    train_scores, test_scores = validation_curve(
        model,
        X,
        y,
        param_name="smote__k_neighbors",
        param_range=param_range,
        cv=3,
        scoring=scorer,
    )
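
The example does not state why Cohen's kappa is used, but one plausible
motivation is that kappa corrects for chance agreement, which matters with
imbalanced classes. The toy labels below are made up for illustration: a
predictor that always outputs the majority class gets a high accuracy yet a
kappa of zero.

.. code-block:: python

    from sklearn.metrics import accuracy_score, cohen_kappa_score

    # Made-up labels: 9 majority samples (1) and 1 minority sample (0).
    y_true = [1] * 9 + [0]
    # A trivial predictor that always answers with the majority class.
    y_pred = [1] * 10

    accuracy = accuracy_score(y_true, y_pred)
    kappa = cohen_kappa_score(y_true, y_pred)

    print(f"accuracy: {accuracy:.2f}")
    print(f"kappa:    {kappa:.2f}")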








.. GENERATED FROM PYTHON SOURCE LINES 83-88

.. code-block:: default

    train_scores_mean = train_scores.mean(axis=1)
    train_scores_std = train_scores.std(axis=1)
    test_scores_mean = test_scores.mean(axis=1)
    test_scores_std = test_scores.std(axis=1)
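
The arrays returned by ``validation_curve`` have one row per parameter value
and one column per cross-validation fold, which is why the statistics above
are taken along ``axis=1``. The sketch below uses made-up scores (not results
from this example) to show the aggregation and how the best parameter value
can then be located with ``argmax``.

.. code-block:: python

    import numpy as np

    # Illustrative values only: 3 parameter values (rows) x 3 folds (columns).
    toy_param_range = [1, 2, 3]
    toy_scores = np.array(
        [
            [0.50, 0.60, 0.70],
            [0.70, 0.80, 0.90],
            [0.60, 0.60, 0.60],
        ]
    )

    # Aggregate over the folds to get one mean and one spread per parameter.
    mean_per_param = toy_scores.mean(axis=1)
    std_per_param = toy_scores.std(axis=1)

    # Index of the highest mean score, mapped back to the parameter value.
    best_idx = mean_per_param.argmax()
    best_param = toy_param_range[best_idx]
    print(best_param, mean_per_param[best_idx], std_per_param[best_idx])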








.. GENERATED FROM PYTHON SOURCE LINES 89-91

We can now plot the results of the cross-validation for the different
parameter values that we tried.

.. GENERATED FROM PYTHON SOURCE LINES 93-123

.. code-block:: default

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(7, 5))
    ax.plot(param_range, test_scores_mean, label="SMOTE")
    ax.fill_between(
        param_range,
        test_scores_mean + test_scores_std,
        test_scores_mean - test_scores_std,
        alpha=0.2,
    )
    idx_max = test_scores_mean.argmax()
    ax.scatter(
        param_range[idx_max],
        test_scores_mean[idx_max],
        label=r"Cohen Kappa: ${:.2f}\pm{:.2f}$".format(
            test_scores_mean[idx_max], test_scores_std[idx_max]
        ),
    )

    fig.suptitle("Validation Curve with SMOTE-CART")
    ax.set_xlabel("k_neighbors")
    ax.set_ylabel("Cohen's kappa")

    # tidy up the axes, limits, and legend
    sns.despine(ax=ax, offset=10)
    ax.set_xlim([1, 10])
    ax.set_ylim([0.4, 0.8])
    ax.legend(loc="lower right")

    plt.show()



.. image:: /auto_examples/model_selection/images/sphx_glr_plot_validation_curve_001.png
    :alt: Validation Curve with SMOTE-CART
    :class: sphx-glr-single-img






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  3.152 seconds)


.. _sphx_glr_download_auto_examples_model_selection_plot_validation_curve.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_validation_curve.py <plot_validation_curve.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_validation_curve.ipynb <plot_validation_curve.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
