Plot the decision surface of decision trees trained on the iris dataset

Plot the decision surface of a decision tree trained on pairs of features of the iris dataset.

See the decision tree section of the scikit-learn user guide for more information on the estimator.

For each pair of iris features, the decision tree learns decision boundaries made of combinations of simple thresholding rules inferred from the training samples.
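The thresholding rules a tree learns can be printed as text. The following is a minimal sketch, not part of the original example: it fits a tree on one feature pair and prints its rules with `export_text`. The choice of pair (petal length and petal width), `max_depth=2`, and `random_state=0` are assumptions made only to keep the output short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Illustrative choice: petal length (index 2) and petal width (index 3).
pair = [2, 3]
X = iris.data[:, pair]
y = iris.target

# max_depth=2 and random_state=0 are assumptions for a short, repeatable
# printout; the example itself uses an unconstrained tree.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each printed line is one threshold test on a single feature.
pair_names = [iris.feature_names[i] for i in pair]
rules = export_text(clf, feature_names=pair_names)
print(rules)
```

Each indented line in the printout is a test of the form `feature <= threshold` or `feature > threshold`, and the nesting mirrors the path from the root to a leaf.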

We also show the tree structure of a model built on all of the features.

Imports for Visualizing Decision Tree Boundaries on Iris Data

Decision trees classify data by learning a sequence of if-then rules, each of which tests a single feature against a threshold. When a tree is trained on just two features, every split is a horizontal or vertical line in the feature plane, so the decision regions are unions of axis-aligned rectangles that can be plotted directly in 2D. This is part of what makes decision trees easy to interpret. Visualizing these regions across all pairs of iris features reveals which feature combinations provide the cleanest class separation.
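One way to check that every split tests a single feature is to inspect the fitted tree's low-level arrays. The sketch below is an illustration under stated assumptions rather than part of the original example: it uses the sepal length and sepal width pair and `random_state=0`, and reads the `feature`, `threshold`, `children_left`, and `children_right` arrays of `clf.tree_`.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Illustrative choice: sepal length (index 0) and sepal width (index 1).
X = iris.data[:, [0, 1]]
clf = DecisionTreeClassifier(random_state=0).fit(X, iris.target)

tree = clf.tree_

# A leaf has identical left and right child ids; a split node does not.
is_split = tree.children_left != tree.children_right

# For split nodes, `feature` holds the column index tested at that node
# and `threshold` holds the cut value on that single feature.
split_features = tree.feature[is_split]
split_thresholds = tree.threshold[is_split]

for f, t in zip(split_features[:5], split_thresholds[:5]):
    print(f"split on column {f} at threshold {t:.2f}")
```

Because each split node refers to exactly one column, each boundary segment in the 2D plot is parallel to one of the two axes.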

Why pairwise feature plots? The iris dataset has four features (sepal length, sepal width, petal length, petal width). Training a separate tree on each of the six possible feature pairs shows how well each pair separates the three classes. DecisionBoundaryDisplay.from_estimator colors the 2D plane by predicted class, while plot_tree renders the full tree structure, showing the split feature, threshold, and class distribution at each node, which makes the fitted model easy to inspect.
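The six pairs are the 2-element combinations of the four feature indices. This example hardcodes that list. As a small sketch of an alternative, the same list can be generated with `itertools.combinations`; the variable names here are illustrative only.

```python
from itertools import combinations

from sklearn.datasets import load_iris

iris = load_iris()
n_features = iris.data.shape[1]  # 4 for the iris dataset

# All unordered pairs of feature indices, in lexicographic order.
pairs = [list(p) for p in combinations(range(n_features), 2)]

for a, b in pairs:
    print(f"{iris.feature_names[a]} vs {iris.feature_names[b]}")
```

With four features this yields 4 choose 2 = 6 pairs, matching the 2 x 3 grid of subplots.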

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

# %%
# First load the copy of the Iris dataset shipped with scikit-learn:
from sklearn.datasets import load_iris

iris = load_iris()


# %%
# Display the decision functions of trees trained on all pairs of features.
import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.tree import DecisionTreeClassifier

# Parameters
n_classes = 3
plot_colors = "ryb"
plot_step = 0.02


for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Train
    clf = DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary
    ax = plt.subplot(2, 3, pairidx + 1)
    plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)
    DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        cmap=plt.cm.RdYlBu,
        response_method="predict",
        ax=ax,
        xlabel=iris.feature_names[pair[0]],
        ylabel=iris.feature_names[pair[1]],
    )

    # Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.flatnonzero(y == i)
        plt.scatter(
            X[idx, 0],
            X[idx, 1],
            c=color,
            label=iris.target_names[i],
            edgecolor="black",
            s=15,
        )

plt.suptitle("Decision surface of decision trees trained on pairs of features")
plt.legend(loc="lower right", borderpad=0, handletextpad=0)
_ = plt.axis("tight")

# %%
# Display the structure of a single decision tree trained on all the features
# together.
from sklearn.tree import plot_tree

plt.figure()
clf = DecisionTreeClassifier().fit(iris.data, iris.target)
plot_tree(clf, filled=True)
plt.title("Decision tree trained on all the iris features")
plt.show()