Nearest Neighbors regression

Demonstrates solving a regression problem with k-Nearest Neighbors, interpolating the target using both barycenter (distance) and constant (uniform) weights.

Imports for K-Nearest Neighbors Regression

KNN regression predicts via local averaging of neighbor targets: KNeighborsRegressor predicts the target value for a query point by finding its k nearest training samples and combining their target values. With weights="uniform", the prediction is the simple arithmetic mean of the k neighbors’ targets, producing a piecewise-constant function with sharp jumps as the set of nearest neighbors changes. With weights="distance", each neighbor’s contribution is weighted by the inverse of its distance to the query point, producing smoother interpolation that more closely follows nearby training points while being less influenced by distant ones in the neighborhood.
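As a toy check of the two weighting schemes (using a hypothetical four-point dataset, not the example's data): with k=2, a query between two neighbors gets their plain average under uniform weights, but is pulled toward the closer one under distance weights.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Four training points on the line y = x
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])

# Nearest two neighbors of 1.2 are x=1 (d=0.2) and x=2 (d=0.8)
query = np.array([[1.2]])

uniform = KNeighborsRegressor(n_neighbors=2, weights="uniform").fit(X, y)
distance = KNeighborsRegressor(n_neighbors=2, weights="distance").fit(X, y)

print(uniform.predict(query))   # mean of 1 and 2 -> 1.5
print(distance.predict(query))  # (5*1 + 1.25*2) / 6.25 -> 1.2
```

The distance-weighted prediction (1.2) tracks the nearby training point much more closely than the uniform average (1.5), which is exactly the smoothing difference visible in the plots below.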

Non-parametric regression with no training phase: Unlike parametric regressors (linear regression, neural networks) that learn explicit model parameters during training, KNN regression stores the training data and performs all computation at prediction time – making β€œtraining” instantaneous but prediction O(n*d) per query point without spatial indexing. The sinusoidal data with added noise demonstrates how KNN adapts to arbitrary nonlinear relationships without assuming a functional form, while the n_neighbors=5 parameter controls the bias-variance tradeoff: fewer neighbors produce a wiggly fit that tracks noise (low bias, high variance), while more neighbors produce a smoother fit that may miss local structure (high bias, low variance).
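To make the "all computation at prediction time" point concrete, here is a minimal brute-force sketch of uniform-weight KNN regression (the helper name `knn_predict` is ours, not part of scikit-learn); for distinct distances it should agree with `KNeighborsRegressor(weights="uniform")`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def knn_predict(X_train, y_train, X_query, k=5):
    """Brute-force uniform-weight KNN regression: O(n*d) work per query point."""
    preds = []
    for x in X_query:
        dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training sample
        nearest = np.argsort(dists)[:k]              # indices of the k closest samples
        preds.append(y_train[nearest].mean())        # simple average of their targets
    return np.array(preds)

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel()
X_q = np.linspace(0, 5, 20)[:, np.newaxis]

sk = KNeighborsRegressor(n_neighbors=5).fit(X, y).predict(X_q)
ours = knn_predict(X, y, X_q, k=5)
print(np.allclose(sk, ours))  # True
```

Note that "fitting" the sketch would just mean storing `X_train` and `y_train`; scikit-learn additionally builds a spatial index (a k-d tree or ball tree) when it helps, cutting the per-query cost below the naive O(n*d).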

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

# %%
# Generate sample data
# --------------------
# Here we generate a few data points to train the model. We also generate
# data over the whole range of the training data to visualize how the model
# behaves across that entire region.
import matplotlib.pyplot as plt
import numpy as np

from sklearn import neighbors

rng = np.random.RandomState(0)
X_train = np.sort(5 * rng.rand(40, 1), axis=0)
X_test = np.linspace(0, 5, 500)[:, np.newaxis]
y = np.sin(X_train).ravel()

# Add noise to every 5th target (40 / 5 = 8 points), drawing from the seeded
# rng so the example stays reproducible
y[::5] += 1 * (0.5 - rng.rand(8))

# %%
# Fit regression model
# --------------------
# Here we train a model and visualize how `uniform` and `distance`
# weights affect the predicted values.
n_neighbors = 5

for i, weights in enumerate(["uniform", "distance"]):
    knn = neighbors.KNeighborsRegressor(n_neighbors, weights=weights)
    y_ = knn.fit(X_train, y).predict(X_test)

    plt.subplot(2, 1, i + 1)
    plt.scatter(X_train, y, color="darkorange", label="data")
    plt.plot(X_test, y_, color="navy", label="prediction")
    plt.axis("tight")
    plt.legend()
    plt.title("KNeighborsRegressor (k = %i, weights = '%s')" % (n_neighbors, weights))

plt.tight_layout()
plt.show()