Plot Ridge Path

Plot Ridge coefficients as a function of the regularization

Shows the effect of collinearity in the coefficients of an estimator.


Ridge Regression is the estimator used in this example. Each color represents a different feature of the coefficient vector, and this is displayed as a function of the regularization parameter.

This example also shows the usefulness of applying Ridge regression to highly ill-conditioned matrices. For such matrices, a slight change in the target variable can cause huge variances in the calculated weights. In such cases, it is useful to set a certain regularization (alpha) to reduce this variation (noise).

When alpha is very large, the regularization term dominates the squared loss and the coefficients shrink toward zero. At the other end of the path, as alpha tends toward zero, the solution approaches ordinary least squares and the coefficients exhibit large oscillations. In practice, alpha must be tuned to strike a balance between the two.
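In practice, that balance is usually found by cross-validation. A minimal sketch using scikit-learn's RidgeCV (the data and alpha grid here are illustrative, not part of this example):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Illustrative synthetic data, not the Hilbert matrix used below
rng = np.random.RandomState(0)
X_demo = rng.randn(50, 5)
y_demo = X_demo @ np.array([1.0, 0.5, 0.0, -0.5, -1.0]) + 0.1 * rng.randn(50)

# RidgeCV scores each candidate alpha by cross-validation and keeps the best
reg = RidgeCV(alphas=np.logspace(-6, 3, 10)).fit(X_demo, y_demo)
print(reg.alpha_)  # the selected regularization strength
```

The selected `alpha_` is the grid value that best trades shrinkage-induced bias against variance on this particular data.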

Imports for Ridge Regularization Path

Plotting how Ridge regression coefficients change as a function of the regularization parameter alpha is one of the most insightful diagnostic visualizations in linear modeling. As alpha decreases toward zero, the solution approaches OLS and coefficients can grow wildly, especially when the design matrix is ill-conditioned. As alpha increases, coefficients are shrunk toward zero, stabilizing the model at the cost of increased bias.

The Hilbert matrix demonstration: The 10x10 Hilbert matrix used here is a classic example of severe ill-conditioning. Its condition number is on the order of 10^13, meaning tiny perturbations in the input cause enormous changes in the OLS solution. Ridge regression explicitly counteracts this by adding alpha * I to X^T X before inverting, which stabilizes the system. The regularization path (coefficients plotted across a range of alpha values on a log scale) reveals exactly how each feature's influence is smoothly dampened as regularization strength increases.
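The ill-conditioning, and the stabilizing effect of the alpha * I term, can be checked directly with NumPy (a quick verification, not part of the original example):

```python
import numpy as np

# 10x10 Hilbert matrix: H[i, j] = 1 / (i + j + 1)
H = 1.0 / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])

# Condition number of H is enormous (on the order of 1e13)
print(np.linalg.cond(H))

# Adding alpha * I to H^T H shifts every eigenvalue up by alpha,
# which caps the condition number at roughly (largest eigenvalue) / alpha
alpha = 1e-2
gram = H.T @ H
print(np.linalg.cond(gram + alpha * np.eye(10)))  # far smaller
```

Even a modest alpha collapses the condition number by many orders of magnitude, which is exactly why the ridge solution stays stable where OLS does not.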

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np

from sklearn import linear_model

# X is the 10x10 Hilbert matrix
X = 1.0 / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)

# %%
# Compute paths
# -------------

n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)

coefs = []
for a in alphas:
    ridge = linear_model.Ridge(alpha=a, fit_intercept=False)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)
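As a sanity check, the fitted coefficients at any point on the path should match the closed-form ridge solution w = (X^T X + alpha * I)^{-1} X^T y described above. A sketch for a single alpha (not part of the original example):

```python
import numpy as np
from sklearn import linear_model

# Same Hilbert-matrix setup as in this example
X = 1.0 / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)

alpha = 1e-2
ridge = linear_model.Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

# Closed form: solve (X^T X + alpha * I) w = X^T y
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
print(np.allclose(ridge.coef_, w))
```

At this alpha the regularized system is well conditioned, so the estimator's iterative/factorized solver and the direct solve agree to high precision.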

# %%
# Display results
# ---------------

ax = plt.gca()

ax.plot(alphas, coefs)
ax.set_xscale("log")
ax.set_xlim(ax.get_xlim()[::-1])  # reverse axis
plt.xlabel("alpha")
plt.ylabel("weights")
plt.title("Ridge Coefficients vs Regularization Strength (alpha)")
plt.axis("tight")
plt.legend(
    [f"Feature {i + 1}" for i in range(X.shape[1])], loc="best", fontsize="small"
)
plt.show()