Two-class AdaBoost
This example fits an AdaBoosted decision stump on a non-linearly separable
classification dataset composed of two "Gaussian quantiles" clusters
(see :func:`sklearn.datasets.make_gaussian_quantiles`) and plots the decision
boundary and decision scores. The distributions of decision scores are shown
separately for samples of class A and B. The predicted class label for each
sample is determined by the sign of the decision score. Samples with decision
scores greater than zero are classified as B, and are otherwise classified
as A. The magnitude of a decision score determines the degree of likeness with
the predicted class label. Additionally, a new dataset could be constructed
containing a desired purity of class B, for example, by only selecting samples
with a decision score above some value.
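As a minimal sketch of the two points above, the snippet below rebuilds the same dataset, checks that the predicted label follows the sign of the decision score, and measures the purity of class B among high-scoring samples. The 75% quantile cut and the names `clf`, `cut`, and `purity` are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Same dataset construction as in this example.
X1, y1 = make_gaussian_quantiles(
    cov=2.0, n_samples=200, n_features=2, n_classes=2, random_state=1
)
X2, y2 = make_gaussian_quantiles(
    mean=(3, 3), cov=1.5, n_samples=300, n_features=2, n_classes=2, random_state=1
)
X = np.concatenate((X1, X2))
y = np.concatenate((y1, -y2 + 1))

# random_state is added here only to make the sketch reproducible.
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=200, random_state=0
)
clf.fit(X, y)

scores = clf.decision_function(X)

# Scores greater than zero map to class 1 ("B"), all others to class 0 ("A").
labels_from_sign = (scores > 0).astype(int)
agreement = np.mean(labels_from_sign == clf.predict(X))

# Illustrative cut (an assumption): keep the samples whose score is at or
# above the 75% quantile, then measure the fraction that is truly class B.
cut = np.quantile(scores, 0.75)
selected = scores >= cut
purity = np.mean(y[selected] == 1)

print(f"sign/predict agreement: {agreement:.2f}")
print(f"selected {selected.sum()} samples, class B purity: {purity:.2f}")
```

Raising the cut usually trades sample count for purity; the purity here is measured on the training data, so it is likely optimistic.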
Imports for Two-Class AdaBoost with Decision Stumps
AdaBoost (Adaptive Boosting) builds an ensemble by iteratively reweighting training samples. After each weak learner (here a decision stump with max_depth=1) is trained, the samples it misclassified receive higher weights, so subsequent learners focus on the hardest examples. The final prediction is a weighted vote of all weak learners, where a learner's weight grows as its weighted training error shrinks. The decision_function method returns a signed score: the sign indicates the predicted class and the magnitude indicates confidence.
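The reweighting loop described above can be sketched by hand. This is the textbook discrete AdaBoost update written for illustration, not scikit-learn's internal implementation, which may scale the learner weights differently. The helper names `adaboost_fit` and `adaboost_decision` and the choice of 100 rounds are assumptions.

```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_rounds=100):
    """Textbook discrete AdaBoost with stumps, for labels in {0, 1}."""
    y_signed = np.where(y == 1, 1, -1)
    n_samples = len(y_signed)
    w = np.full(n_samples, 1.0 / n_samples)  # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y_signed, sample_weight=w)
        miss = stump.predict(X) != y_signed
        err = np.sum(w[miss])  # weighted training error of this stump
        if err >= 0.5:
            break  # no better than chance on the weighted data
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * np.log((1.0 - err) / err)  # lower error, larger vote
        stumps.append(stump)
        alphas.append(alpha)
        # Increase the weight of misclassified samples, decrease the rest.
        w = w * np.exp(np.where(miss, alpha, -alpha))
        w = w / w.sum()
    return stumps, np.array(alphas)


def adaboost_decision(stumps, alphas, X):
    """Signed score: weighted sum of the stumps' +1/-1 votes."""
    scores = np.zeros(len(X))
    for alpha, stump in zip(alphas, stumps):
        scores += alpha * stump.predict(X)
    return scores


X, y = make_gaussian_quantiles(
    n_samples=300, n_features=2, n_classes=2, random_state=1
)
stumps, alphas = adaboost_fit(X, y, n_rounds=100)
scores = adaboost_decision(stumps, alphas, X)

ensemble_acc = np.mean((scores > 0).astype(int) == y)
stump_acc = np.mean((stumps[0].predict(X) > 0).astype(int) == y)
print(f"first stump accuracy: {stump_acc:.2f}")
print(f"ensemble accuracy:    {ensemble_acc:.2f}")
```

Comparing the first stump's training accuracy with the ensemble's shows how much the weighted vote adds over a single axis-aligned split.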
Decision boundary and score analysis: The left plot shows AdaBoost's non-linear decision boundary, built by combining 200 decision stumps. Each stump can make only one axis-aligned split, yet their weighted combination produces a piecewise, staircase-like boundary that approximates the curved separation between the overlapping Gaussian quantile clusters. The right plot shows the distribution of decision scores for each class: well-separated distributions indicate confident classification, while overlap near zero indicates ambiguous predictions. The score distribution is useful for setting custom classification thresholds when the costs of false positives and false negatives differ.
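To see the axis-aligned splits behind the boundary, one can inspect the fitted stumps. The sketch below assumes a smaller ensemble (50 stumps on a single Gaussian quantiles dataset) to keep the output short. It reads each stump's root split from the `tree_` attribute and pairs it with the ensemble's `estimator_weights_`.

```python
from sklearn.datasets import make_gaussian_quantiles
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_gaussian_quantiles(
    n_samples=300, n_features=2, n_classes=2, random_state=1
)
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0
)
clf.fit(X, y)

# A depth-1 tree has at most one split, stored at the root node (index 0).
# feature is the column index used for the split (-2 marks a node with no
# split), and threshold is the cut value along that axis.
splits = [
    (int(est.tree_.feature[0]), float(est.tree_.threshold[0]))
    for est in clf.estimators_
]

for (feature, threshold), weight in zip(splits[:5], clf.estimator_weights_[:5]):
    print(f"split on feature {feature} at {threshold:.3f}, vote weight {weight:.3f}")
```

Every split tests a single feature against a threshold, so the combined boundary is made only of horizontal and vertical segments.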
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_gaussian_quantiles
from sklearn.ensemble import AdaBoostClassifier
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.tree import DecisionTreeClassifier
# Construct dataset
X1, y1 = make_gaussian_quantiles(
    cov=2.0, n_samples=200, n_features=2, n_classes=2, random_state=1
)
X2, y2 = make_gaussian_quantiles(
    mean=(3, 3), cov=1.5, n_samples=300, n_features=2, n_classes=2, random_state=1
)
X = np.concatenate((X1, X2))
y = np.concatenate((y1, -y2 + 1))
# Create and fit an AdaBoosted decision tree
bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200)
bdt.fit(X, y)
plot_colors = "br"
plot_step = 0.02
class_names = "AB"
plt.figure(figsize=(10, 5))
# Plot the decision boundaries
ax = plt.subplot(121)
disp = DecisionBoundaryDisplay.from_estimator(
    bdt,
    X,
    cmap=plt.cm.Paired,
    response_method="predict",
    ax=ax,
    xlabel="x",
    ylabel="y",
)
x_min, x_max = disp.xx0.min(), disp.xx0.max()
y_min, y_max = disp.xx1.min(), disp.xx1.max()
plt.axis("tight")
# Plot the training points
for i, n, c in zip(range(2), class_names, plot_colors):
    idx = (y == i).nonzero()
    plt.scatter(
        X[idx, 0],
        X[idx, 1],
        c=c,
        s=20,
        edgecolor="k",
        label="Class %s" % n,
    )
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.legend(loc="upper right")
plt.title("Decision Boundary")
# Plot the two-class decision scores
twoclass_output = bdt.decision_function(X)
plot_range = (twoclass_output.min(), twoclass_output.max())
plt.subplot(122)
for i, n, c in zip(range(2), class_names, plot_colors):
    plt.hist(
        twoclass_output[y == i],
        bins=10,
        range=plot_range,
        facecolor=c,
        label="Class %s" % n,
        alpha=0.5,
        edgecolor="k",
    )
x1, x2, y1, y2 = plt.axis()
plt.axis((x1, x2, y1, y2 * 1.2))
plt.legend(loc="upper right")
plt.ylabel("Samples")
plt.xlabel("Score")
plt.title("Decision Scores")
plt.tight_layout()
plt.subplots_adjust(wspace=0.35)
plt.show()