Illustration of Gaussian process classification (GPC) on the XOR dataset

This example illustrates GPC on XOR data. Compared are a stationary, isotropic kernel (RBF) and a non-stationary kernel (DotProduct). On this particular dataset, the DotProduct kernel obtains considerably better results because the class boundaries are linear and coincide with the coordinate axes. In general, however, stationary kernels often obtain better results.

Imports for GPC on the XOR Dataset with Stationary vs Non-Stationary Kernels

Kernel choice determines whether GPC can capture the XOR pattern. The XOR function creates four quadrant-based clusters where diagonally opposite quadrants share the same label, a pattern that requires the decision boundary to coincide with the coordinate axes. The stationary RBF kernel defines similarity based only on the distance between points, producing smooth circular contours that struggle to align with axis-parallel boundaries. The non-stationary DotProduct**2 kernel computes similarity from the dot product of input positions (equivalent to an inhomogeneous polynomial kernel of degree 2), which naturally produces axis-aligned decision regions because its feature space contains the interaction term x1*x2, whose sign distinguishes the XOR quadrants.
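The claim about the feature space can be checked directly. The sketch below (plain NumPy, with sigma_0 = 1.0 as in the example) writes out an explicit feature map for the squared dot-product kernel k(x, z) = (sigma_0^2 + x·z)^2 and verifies that its inner product reproduces the kernel value; note the x1*x2 interaction feature that makes the XOR quadrants separable:

```python
import numpy as np


def poly2_features(x, sigma_0=1.0):
    """Explicit feature map of k(x, z) = (sigma_0**2 + x @ z) ** 2 in 2D."""
    c = sigma_0**2
    x1, x2 = x
    return np.array(
        [
            c,
            np.sqrt(2 * c) * x1,
            np.sqrt(2 * c) * x2,
            x1**2,
            x2**2,
            np.sqrt(2) * x1 * x2,  # interaction term: its sign encodes XOR
        ]
    )


x = np.array([1.0, -2.0])
z = np.array([0.5, 3.0])

k_direct = (1.0 + x @ z) ** 2  # kernel evaluated directly
k_features = poly2_features(x) @ poly2_features(z)  # via the feature map

print(k_direct, k_features)  # the two values agree
```

Because the feature map is finite-dimensional and contains x1*x2, a linear boundary in feature space can separate the diagonally opposite quadrants.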

Log-marginal-likelihood as model selection criterion: the log-marginal-likelihood (LML) printed for each kernel quantifies how well the model explains the training data while penalizing complexity, a Bayesian form of model selection that does not require a separate validation set. The DotProduct kernel achieves a substantially higher LML on XOR data because its inductive bias (polynomial feature interactions) matches the data structure, while the RBF kernel's assumption of smooth, distance-based similarity is a poor match. The warm_start=True parameter reuses the solution of the last Newton iteration of the Laplace approximation as initialization for subsequent fits, which can speed up convergence when similar problems are solved repeatedly (e.g., during hyperparameter optimization).
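The LML comparison can be reproduced without any plotting. The sketch below (using the same data generation and kernels as the example) fits a classifier per kernel and prints its LML; on this data the squared DotProduct kernel scores higher:

```python
import numpy as np

from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, DotProduct

# Same XOR data as in the example below
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

lml = {}
for name, kernel in [
    ("RBF", 1.0 * RBF(length_scale=1.15)),
    ("DotProduct**2", 1.0 * DotProduct(sigma_0=1.0) ** 2),
]:
    clf = GaussianProcessClassifier(kernel=kernel).fit(X, Y)
    # LML of the optimized kernel hyperparameters
    lml[name] = clf.log_marginal_likelihood(clf.kernel_.theta)
    print(f"{name}: LML = {lml[name]:.3f}")
```

A higher (less negative) LML indicates a kernel whose inductive bias better explains the training data, which is exactly the comparison shown in the subplot titles below.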

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np

from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, DotProduct

xx, yy = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

# fit the model
plt.figure(figsize=(10, 5))
kernels = [1.0 * RBF(length_scale=1.15), 1.0 * DotProduct(sigma_0=1.0) ** 2]
for i, kernel in enumerate(kernels):
    clf = GaussianProcessClassifier(kernel=kernel, warm_start=True).fit(X, Y)

    # predicted probability of the positive class for each grid point
    Z = clf.predict_proba(np.vstack((xx.ravel(), yy.ravel())).T)[:, 1]
    Z = Z.reshape(xx.shape)

    plt.subplot(1, 2, i + 1)
    image = plt.imshow(
        Z,
        interpolation="nearest",
        extent=(xx.min(), xx.max(), yy.min(), yy.max()),
        aspect="auto",
        origin="lower",
        cmap=plt.cm.PuOr_r,
    )
    contours = plt.contour(xx, yy, Z, levels=[0.5], linewidths=2, colors=["k"])
    plt.scatter(X[:, 0], X[:, 1], s=30, c=Y, cmap=plt.cm.Paired, edgecolors=(0, 0, 0))
    plt.xticks(())
    plt.yticks(())
    plt.axis([-3, 3, -3, 3])
    plt.colorbar(image)
    plt.title(
        "%s\n Log-Marginal-Likelihood:%.3f"
        % (clf.kernel_, clf.log_marginal_likelihood(clf.kernel_.theta)),
        fontsize=12,
    )

plt.tight_layout()
plt.show()