Linear Regression with scikit-learn
Linear regression is the foundational supervised learning algorithm. It models a continuous target variable as a linear combination of input features, learning the optimal weights (coefficients) and intercept by minimizing the sum of squared residuals. Despite its simplicity, linear regression remains one of the most interpretable and widely used models in practice, from predicting house prices to estimating the effect of marketing spend on revenue.
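"Minimizing the sum of squared residuals" has a closed-form solution, which scikit-learn computes internally. As a rough illustration (not part of the original tutorial; variable names are illustrative), the intercept can be folded into the weight vector by appending a column of ones, and the least-squares weights obtained directly with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 1))                      # 20 samples, 1 feature
y = 3 * X[:, 0] + 2 + rng.standard_normal(20)

# Append a column of ones so the intercept is learned as an extra weight
X_aug = np.hstack([np.ones((20, 1)), X])

# Solve min_w ||X_aug @ w - y||^2 (ordinary least squares)
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
intercept, slope = w
print(intercept, slope)  # should land near the true values 2 and 3
```

With only 20 noisy samples the estimates will not match the true parameters exactly, which is the same behavior you will see from `LinearRegression` below.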
Credits: Forked from PyCon 2015 Scikit-learn Tutorial by Jake VanderPlas
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn
from sklearn.linear_model import LinearRegression

seaborn.set()
Linear Regression
Linear Regression is a supervised learning algorithm that models the relationship between a scalar dependent variable y and one or more explanatory (or independent) variables, denoted X.
Generate some data:
# Create some simple data
np.random.seed(0)
X = np.random.random(size=(20, 1))
y = 3 * X.squeeze() + 2 + np.random.randn(20)

plt.plot(X.squeeze(), y, 'o');
Fitting the model and visualizing predictions
model.fit(X, y) learns the best-fit line through the data. To visualize the result, we create a dense grid of x-values with np.linspace(), predict the corresponding y-values, and overlay the fitted line on top of the original data points. The gap between the line and the points represents the residuals, the errors that the model could not explain. A good fit has small, randomly scattered residuals with no visible pattern.
model = LinearRegression()
model.fit(X, y)
# Plot the data and the model prediction
X_fit = np.linspace(0, 1, 100)[:, np.newaxis]
y_fit = model.predict(X_fit)
plt.plot(X.squeeze(), y, 'o')
plt.plot(X_fit.squeeze(), y_fit);
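After fitting, the learned parameters are available as the estimator's `coef_` and `intercept_` attributes (these are scikit-learn's actual attribute names), and the residuals can be checked numerically. A small, self-contained sketch continuing the example above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Recreate the same data as above
np.random.seed(0)
X = np.random.random(size=(20, 1))
y = 3 * X.squeeze() + 2 + np.random.randn(20)

model = LinearRegression()
model.fit(X, y)

print("slope:    ", model.coef_[0])    # true value used to generate the data: 3
print("intercept:", model.intercept_)  # true value used to generate the data: 2

# Residuals: observed minus predicted targets
residuals = y - model.predict(X)
print("mean residual:", residuals.mean())  # essentially zero for OLS with an intercept
```

The mean residual is (up to floating-point error) zero by construction: ordinary least squares with an intercept forces the errors to average out, so the interesting diagnostic is the *pattern* of the residuals, not their mean.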