Receiver Operating Characteristic (ROC) with cross-validation
=============================================================
This example presents how to estimate and visualize the variance of the Receiver Operating Characteristic (ROC) metric using cross-validation.
ROC curves typically feature true positive rate (TPR) on the Y axis, and false positive rate (FPR) on the X axis. This means that the top left corner of the plot is the "ideal" point: an FPR of zero, and a TPR of one. This is not very realistic, but it does mean that a larger Area Under the Curve (AUC) is usually better. The "steepness" of ROC curves is also important, since it is ideal to maximize the TPR while minimizing the FPR.
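As a minimal illustration of these quantities (toy labels and scores made up for this sketch, not part of the example's data), :func:`sklearn.metrics.roc_curve` returns exactly the FPR/TPR pairs such a plot traces:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels and decision scores: a higher score means "more positive".
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.6, 0.2, 0.8, 0.7, 0.4, 0.9])

# Each (fpr[i], tpr[i]) pair is one point of the ROC curve; the curve
# always starts at (0, 0) and ends at (1, 1).
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc_value = roc_auc_score(y_true, y_score)  # area under that curve
```

Here one positive (score 0.4) ranks below one negative (score 0.6), so the AUC is slightly below 1.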
This example shows the ROC response of different datasets, created from K-fold cross-validation. Taking all of these curves, it is possible to calculate the mean AUC, and see the variance of the curve when the training set is split into different subsets. This roughly shows how the classifier output is affected by changes in the training data, and how different the splits generated by K-fold cross-validation are from one another.
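To make the "different subsets" concrete, here is a small sketch (with made-up labels rather than the iris data used below) showing that :class:`~sklearn.model_selection.StratifiedKFold` preserves the class ratio in every test fold:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up binary labels: 8 negatives and 4 positives (1/3 positive overall).
y = np.array([0] * 8 + [1] * 4)
X = np.arange(len(y)).reshape(-1, 1)  # dummy feature matrix

cv = StratifiedKFold(n_splits=4)
# Fraction of positives in each test fold; every fold gets 2 negatives
# and 1 positive, so each ratio equals the overall 1/3.
test_ratios = [y[test_idx].mean() for _, test_idx in cv.split(X, y)]
```

With a plain (unstratified) `KFold` on the same labels, some folds could contain no positives at all, making a per-fold ROC curve undefined.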
.. note::

   See :ref:`sphx_glr_auto_examples_model_selection_plot_roc.py` for a
   complement of the present example explaining the averaging strategies to
   generalize the metrics for multiclass classifiers.
Imports for Cross-Validated ROC Curves with Variance Estimation
---------------------------------------------------------------
Estimating ROC variability across CV folds: A single train-test split produces one ROC curve, but its shape depends heavily on which samples land in the test set. cross_validate with return_estimator=True and return_indices=True provides the fitted models and fold indices needed to compute per-fold ROC curves via RocCurveDisplay.from_cv_results. Each fold's ROC curve is interpolated onto a common mean_fpr grid using np.interp, enabling pointwise averaging and standard deviation computation across folds. The resulting mean ROC curve with a shaded one-standard-deviation band reveals how stable the classifier's discrimination ability is across different data partitions.
Interpreting the mean ROC and its confidence band: A narrow band indicates the model consistently discriminates well regardless of the specific training data, while a wide band suggests sensitivity to the training sample, a sign of overfitting or insufficient data. The mean AUC and its standard deviation (e.g., 0.80 +/- 0.05) summarize the distribution of fold-level AUCs. StratifiedKFold with 6 splits ensures each fold preserves the class ratio, and adding 200 noisy features per original feature makes the binary iris classification problem challenging enough to exhibit meaningful ROC variability. The chance-level baseline (diagonal) corresponds to a classifier that predicts the most frequent class, providing a reference for how much the model improves over random guessing.
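The interpolation-and-averaging step described above can be sketched on two made-up per-fold curves (the numbers are toy values chosen for this sketch):

```python
import numpy as np

# Two toy per-fold ROC curves, each sampled at different FPR values.
fold_curves = [
    (np.array([0.0, 0.2, 1.0]), np.array([0.0, 0.7, 1.0])),
    (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.9, 1.0])),
]

mean_fpr = np.linspace(0, 1, 101)  # common grid shared by all folds
interp_tprs = []
for fpr, tpr in fold_curves:
    interp_tpr = np.interp(mean_fpr, fpr, tpr)  # resample TPR on the grid
    interp_tpr[0] = 0.0  # force every curve through the origin
    interp_tprs.append(interp_tpr)

mean_tpr = np.mean(interp_tprs, axis=0)  # pointwise mean across folds
std_tpr = np.std(interp_tprs, axis=0)    # pointwise std -> shaded band
```

Because all folds now share the same FPR grid, the mean and standard deviation are simple elementwise reductions, which is exactly what the full example below does with the curves returned by RocCurveDisplay.from_cv_results.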
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
# %%
# Load and prepare data
# =====================
#
# We import the :ref:`iris_dataset` which contains 3 classes, each one
# corresponding to a type of iris plant. One class is linearly separable from
# the other 2; the latter are **not** linearly separable from each other.
#
# In the following we binarize the dataset by dropping the "virginica" class
# (`class_id=2`). This means that the "versicolor" class (`class_id=1`) is
# regarded as the positive class and "setosa" as the negative class
# (`class_id=0`).
import numpy as np
from sklearn.datasets import load_iris
iris = load_iris()
target_names = iris.target_names
X, y = iris.data, iris.target
X, y = X[y != 2], y[y != 2]
n_samples, n_features = X.shape
# %%
# We also add noisy features to make the problem harder.
random_state = np.random.RandomState(0)
X = np.concatenate([X, random_state.randn(n_samples, 200 * n_features)], axis=1)
# %%
# Classification and ROC analysis
# -------------------------------
#
# Here we run :func:`~sklearn.model_selection.cross_validate` on a
# :class:`~sklearn.svm.SVC` classifier, then use the computed cross-validation results
# to plot the ROC curves fold-wise. Notice that the baseline to define the chance
# level (dashed ROC curve) is a classifier that would always predict the most
# frequent class.
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.metrics import RocCurveDisplay, auc
from sklearn.model_selection import StratifiedKFold, cross_validate
n_splits = 6
cv = StratifiedKFold(n_splits=n_splits)
classifier = svm.SVC(kernel="linear", probability=True, random_state=random_state)
cv_results = cross_validate(
    classifier, X, y, cv=cv, return_estimator=True, return_indices=True
)
prop_cycle = plt.rcParams["axes.prop_cycle"]
colors = prop_cycle.by_key()["color"]
curve_kwargs_list = [
    dict(alpha=0.3, lw=1, color=colors[fold % len(colors)]) for fold in range(n_splits)
]
names = [f"ROC fold {idx}" for idx in range(n_splits)]
mean_fpr = np.linspace(0, 1, 100)
interp_tprs = []
_, ax = plt.subplots(figsize=(6, 6))
viz = RocCurveDisplay.from_cv_results(
    cv_results,
    X,
    y,
    ax=ax,
    name=names,
    curve_kwargs=curve_kwargs_list,
    plot_chance_level=True,
)
for idx in range(n_splits):
    interp_tpr = np.interp(mean_fpr, viz.fpr[idx], viz.tpr[idx])
    interp_tpr[0] = 0.0
    interp_tprs.append(interp_tpr)
mean_tpr = np.mean(interp_tprs, axis=0)
mean_tpr[-1] = 1.0
mean_auc = auc(mean_fpr, mean_tpr)
std_auc = np.std(viz.roc_auc)
ax.plot(
    mean_fpr,
    mean_tpr,
    color="b",
    label=r"Mean ROC (AUC = %0.2f $\pm$ %0.2f)" % (mean_auc, std_auc),
    lw=2,
    alpha=0.8,
)
std_tpr = np.std(interp_tprs, axis=0)
tprs_upper = np.minimum(mean_tpr + std_tpr, 1)
tprs_lower = np.maximum(mean_tpr - std_tpr, 0)
ax.fill_between(
    mean_fpr,
    tprs_lower,
    tprs_upper,
    color="grey",
    alpha=0.2,
    label=r"$\pm$ 1 std. dev.",
)
ax.set(
    xlabel="False Positive Rate",
    ylabel="True Positive Rate",
    title=f"Mean ROC curve with variability\n(Positive label '{target_names[1]}')",
)
ax.legend(loc="lower right")
plt.show()