
Plot multi-class SGD on the iris dataset

Plot the decision surface of multi-class SGD on the iris dataset. The hyperplanes corresponding to the three one-versus-all (OVA) classifiers are represented by the dashed lines.

Imports for SGD Classification on Iris

Stochastic Gradient Descent (SGD) is the workhorse optimization algorithm behind most modern machine learning, from logistic regression to deep neural networks. Instead of computing the gradient over the entire dataset (as in batch gradient descent), SGD updates model weights using one sample (or a small mini-batch) at a time. This makes it dramatically faster for large datasets and enables online learning where data arrives in streams.
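As a minimal sketch of the per-sample update (a hypothetical toy example with squared loss, not the scikit-learn implementation): each step nudges the weights by the gradient computed on a single sample rather than the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])          # hypothetical ground-truth weights
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(2)
lr = 0.05                                # fixed learning rate for simplicity
for epoch in range(5):
    for i in rng.permutation(len(X)):    # visit samples in random order
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of squared loss on one sample
        w -= lr * grad

print(w)  # approximately [2, -1]
```

Production implementations like SGDClassifier add refinements on top of this loop, e.g. learning-rate schedules, regularization, and averaging.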

Multi-class SGD on Iris: SGDClassifier with the default hinge loss implements a linear SVM trained via SGD. For multi-class problems, it uses a one-vs-all (OVA) strategy, fitting one binary classifier per class. The alpha parameter controls L2 regularization strength, and max_iter limits the number of passes over the data. Standardizing features (zero mean, unit variance) is essential for SGD because the algorithm’s convergence rate depends on feature scales – unscaled features cause the loss landscape to be elongated, leading to slow or oscillating convergence.
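The standardization step mentioned above is just subtracting the per-feature mean and dividing by the per-feature standard deviation. A small sketch with synthetic data on wildly different scales (hypothetical values, chosen only to illustrate the transform):

```python
import numpy as np

rng = np.random.default_rng(42)
# two features on very different scales
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1000, 100)])

# standardize: zero mean, unit variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0).round(6))  # ~[0, 0]
print(X_std.std(axis=0))            # [1, 1]
```

In a full pipeline you would typically use sklearn.preprocessing.StandardScaler instead, which remembers the training-set statistics so the same transform can be applied to new data.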

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np

from sklearn import datasets
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import SGDClassifier

# import some data to play with
iris = datasets.load_iris()

# we only take the first two features. We could
# avoid this ugly slicing by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target
colors = "bry"

# shuffle
idx = np.arange(X.shape[0])
np.random.seed(13)
np.random.shuffle(idx)
X = X[idx]
y = y[idx]

# standardize
mean = X.mean(axis=0)
std = X.std(axis=0)
X = (X - mean) / std

clf = SGDClassifier(alpha=0.001, max_iter=100).fit(X, y)
ax = plt.gca()
DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    cmap=plt.cm.Paired,
    ax=ax,
    response_method="predict",
    xlabel=iris.feature_names[0],
    ylabel=iris.feature_names[1],
)
plt.axis("tight")

# Plot also the training points
for i, color in zip(clf.classes_, colors):
    idx = (y == i).nonzero()
    plt.scatter(
        X[idx, 0],
        X[idx, 1],
        c=color,
        label=iris.target_names[i],
        edgecolor="black",
        s=20,
    )
plt.title("Decision surface of multi-class SGD")
plt.axis("tight")

# Plot the three one-against-all classifiers
xmin, xmax = plt.xlim()
ymin, ymax = plt.ylim()
coef = clf.coef_
intercept = clf.intercept_


def plot_hyperplane(c, color):
    def line(x0):
        return (-(x0 * coef[c, 0]) - intercept[c]) / coef[c, 1]

    plt.plot([xmin, xmax], [line(xmin), line(xmax)], ls="--", color=color)


for i, color in zip(clf.classes_, colors):
    plot_hyperplane(i, color)
plt.legend()
plt.show()
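The dashed lines come from setting each OVA decision function to zero: for coefficients (w0, w1) and intercept b, the hyperplane satisfies w0*x0 + w1*x1 + b = 0, so the plotting helper solves for x1 = -(w0*x0 + b) / w1. A quick check with hypothetical coefficients (not taken from the fitted model):

```python
import numpy as np

# hypothetical coefficients for one OVA classifier
w = np.array([1.5, -2.0])
b = 0.5

def boundary_x1(x0):
    # solve w[0]*x0 + w[1]*x1 + b = 0 for x1
    return -(w[0] * x0 + b) / w[1]

x0 = 1.0
x1 = boundary_x1(x0)
# the point (x0, x1) lies exactly on the hyperplane
print(w[0] * x0 + w[1] * x1 + b)  # 0.0
```

Note this division fails when w1 is zero (a vertical boundary); the example above sidesteps that case, as does the iris data in practice.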