Plot multi-class SGD on the iris dataset
========================================
Plot the decision surface of multi-class SGD on the iris dataset. The hyperplanes corresponding to the three one-versus-all (OVA) classifiers are represented by the dashed lines.
Imports for SGD classification on iris
Stochastic Gradient Descent (SGD) is the workhorse optimization algorithm behind most modern machine learning, from logistic regression to deep neural networks. Instead of computing the gradient over the entire dataset (as in batch gradient descent), SGD updates model weights using one sample (or a small mini-batch) at a time. This makes it dramatically faster for large datasets and enables online learning where data arrives in streams.
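The per-sample update can be sketched in a few lines of NumPy. This is an illustrative toy (plain SGD on squared loss for linear regression, with hypothetical names like `lr` and `n_epochs`), not scikit-learn's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr = 0.01
n_epochs = 50
for epoch in range(n_epochs):
    for i in rng.permutation(len(X)):        # shuffle each pass
        grad = (X[i] @ w - y[i]) * X[i]      # gradient from ONE sample
        w -= lr * grad                       # immediate weight update

print(np.round(w, 1))  # weights approach true_w
```

Each update touches a single row of `X`, which is what makes SGD cheap per step and usable on streaming data; batch gradient descent would instead average the gradient over all 200 rows before every update.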
Multi-class SGD on iris: SGDClassifier with the default hinge loss implements a linear SVM trained via SGD. For multi-class problems, it uses a one-versus-all (OVA) strategy, fitting one binary classifier per class. The alpha parameter controls L2 regularization strength, and max_iter limits the number of passes over the data. Standardizing features (zero mean, unit variance) is essential for SGD because the algorithm's convergence rate depends on feature scales: unscaled features cause the loss landscape to be elongated, leading to slow or oscillating convergence.
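A quick way to see the OVA structure is to inspect the fitted attributes: with 3 classes and 4 features, `coef_` holds one hyperplane per class, and `decision_function` returns one score per class, with the prediction being the argmax. A minimal sketch (the pipeline with `StandardScaler` stands in for the manual standardization used below; `random_state=0` is an assumption added for reproducibility):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
clf = make_pipeline(
    StandardScaler(),
    SGDClassifier(alpha=0.001, max_iter=100, random_state=0),
)
clf.fit(X, y)

sgd = clf.named_steps["sgdclassifier"]
print(sgd.coef_.shape)   # (3, 4): one OVA hyperplane per class
scores = clf.decision_function(X[:1])
print(scores.shape)      # (1, 3): one score per class
# The predicted class is the one whose OVA classifier scores highest
assert clf.predict(X[:1])[0] == sgd.classes_[np.argmax(scores)]
```

The same argmax-over-OVA-scores rule is what produces the wedge-shaped decision regions in the plot below: each region is where one classifier's score dominates the other two.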
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import SGDClassifier
# import some data to play with
iris = datasets.load_iris()
# we only take the first two features. We could
# avoid this ugly slicing by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target
colors = "bry"
# shuffle
idx = np.arange(X.shape[0])
np.random.seed(13)
np.random.shuffle(idx)
X = X[idx]
y = y[idx]
# standardize
mean = X.mean(axis=0)
std = X.std(axis=0)
X = (X - mean) / std
clf = SGDClassifier(alpha=0.001, max_iter=100).fit(X, y)
ax = plt.gca()
DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    cmap=plt.cm.Paired,
    ax=ax,
    response_method="predict",
    xlabel=iris.feature_names[0],
    ylabel=iris.feature_names[1],
)
plt.axis("tight")
# Plot also the training points
for i, color in zip(clf.classes_, colors):
    idx = (y == i).nonzero()
    plt.scatter(
        X[idx, 0],
        X[idx, 1],
        c=color,
        label=iris.target_names[i],
        edgecolor="black",
        s=20,
    )
plt.title("Decision surface of multi-class SGD")
plt.axis("tight")
# Plot the three one-against-all classifiers
xmin, xmax = plt.xlim()
ymin, ymax = plt.ylim()
coef = clf.coef_
intercept = clf.intercept_
def plot_hyperplane(c, color):
    def line(x0):
        # Solve w0*x0 + w1*x1 + b = 0 for x1 to draw the OVA hyperplane
        return (-(x0 * coef[c, 0]) - intercept[c]) / coef[c, 1]

    plt.plot([xmin, xmax], [line(xmin), line(xmax)], ls="--", color=color)


for i, color in zip(clf.classes_, colors):
    plot_hyperplane(i, color)
plt.legend()
plt.show()