Plot RBM Logistic Classification

Restricted Boltzmann Machine features for digit classification

For greyscale image data where pixel values can be interpreted as degrees of blackness on a white background, like handwritten digit recognition, the Bernoulli Restricted Boltzmann Machine model (BernoulliRBM, sklearn.neural_network.BernoulliRBM) can perform effective nonlinear feature extraction.

Imports for RBM Feature Extraction with Logistic Regression

A BernoulliRBM learns a generative model of (near-)binary 8x8 images whose hidden units make useful discriminative features: The Restricted Boltzmann Machine is an undirected graphical model with visible units (pixel values) and hidden units (learned features) connected by a weight matrix, with no connections within a layer. scikit-learn trains it with persistent contrastive divergence (also called stochastic maximum likelihood), fitting the joint distribution P(visible, hidden) so that individual hidden units tend to respond to recurring visual patterns. When the RBM is used as a feature extractor in a Pipeline before LogisticRegression, its transform step outputs each hidden unit's activation probability given the pixels. This nonlinear transformation of the raw pixels can capture stroke fragments and spatial correlations that a linear model on raw pixels cannot express directly.
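The following sketch shows what the "rbm" step hands to the classifier. It fits a small BernoulliRBM on random data in [0, 1] and compares transform with the sigmoid of the hidden pre-activations, P(h_j = 1 | v) = sigmoid(v · W_j + c_j), computed from the fitted components_ (W) and intercept_hidden_ (c) attributes. The random data and the small n_components and n_iter values are illustrative assumptions, chosen only to keep the sketch fast.

```python
import numpy as np
from scipy.special import expit  # logistic sigmoid
from sklearn.neural_network import BernoulliRBM

rng = np.random.RandomState(0)
# Toy "images": 200 samples with 64 pixel intensities already in [0, 1].
X = rng.uniform(0.0, 1.0, size=(200, 64))

rbm = BernoulliRBM(n_components=16, n_iter=5, random_state=0)
rbm.fit(X)

# transform() returns the mean hidden activations P(h_j = 1 | v).
H = rbm.transform(X)

# The same quantity computed by hand from the learned parameters:
# components_ has shape (n_components, n_features); intercept_hidden_ has shape (n_components,).
H_manual = expit(X @ rbm.components_.T + rbm.intercept_hidden_)

print(H.shape)                   # (200, 16)
print(np.allclose(H, H_manual))  # True
```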

Data augmentation via 1-pixel shifts compensates for the small training set: The nudge_dataset function uses scipy.ndimage.convolve with directional kernels to shift each 8x8 digit image by one pixel in four directions (up, down, left, right), quintupling the dataset size. This simple augmentation encourages the model to be approximately invariant to small translations without any architectural changes. Scaling with minmax_scale to [0, 1] is essential because BernoulliRBM expects inputs that are either binary or between 0 and 1, each interpreted as the probability that a visible unit is on. Values outside this range do not fit that probabilistic interpretation. The 100 learned components (weight vectors reshaped to 8x8), visualized at the end, show the patterns the RBM has picked up. These often resemble pen strokes, digit fragments, and edge detectors.
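minmax_scale rescales each feature (pixel position) independently because its default is axis=0. It does not normalize each image as a whole. The toy array below shows this. It has three samples by three "pixels" on the digits' 0 to 16 intensity scale, which is an illustrative assumption. It also shows that a pixel that never changes, such as a border pixel that is always blank, is mapped to 0 rather than causing a division by zero.

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

# Three samples (rows) by three "pixels" (columns), on a 0-16 intensity scale.
X = np.array([
    [0.0, 4.0, 16.0],
    [8.0, 4.0, 0.0],
    [16.0, 4.0, 8.0],
])

# Default axis=0: each column is rescaled to [0, 1] using its own min and max.
X01 = minmax_scale(X, feature_range=(0, 1))
print(X01)
# column 0 -> [0, 0.5, 1]; constant column 1 -> all 0; column 2 -> [1, 0, 0.5]
```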

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

# %%
# Generate data
# -------------
#
# In order to learn good latent representations from a small dataset, we
# artificially generate more labeled data by perturbing the training data with
# linear shifts of 1 pixel in each direction.

import numpy as np
from scipy.ndimage import convolve

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import minmax_scale

Data Augmentation via 1-Pixel Shifts

The nudge_dataset function creates translation-augmented training data by convolving each 8x8 image with directional shift kernels: Each 3x3 kernel places a single 1 at a cardinal direction (up, down, left, right), and scipy.ndimage.convolve with mode='constant' shifts the image by exactly one pixel, filling the vacated edge with zeros (the default cval is 0). This produces four shifted copies of every image, expanding the dataset from N to 5N samples. It also encourages downstream models to be approximately invariant to small translations, which is useful for digit recognition because the same digit can appear slightly off-center.
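The kernels are easy to misread because scipy.ndimage.convolve flips the weights: it computes a true convolution, not a correlation. The sketch below copies the four kernels from nudge_dataset and uses where_does_it_go, a hypothetical helper written only for illustration. It places one bright pixel on a blank 8x8 image and reports where that pixel lands after each kernel. It also shows that a pixel pushed past the border is dropped, and the gap is filled with zeros under mode="constant".

```python
import numpy as np
from scipy.ndimage import convolve

# Same kernels as in nudge_dataset, in the same order.
direction_vectors = [
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 1, 0]],
]


def where_does_it_go(row, col, kernel):
    """Hypothetical helper: put one bright pixel on a blank 8x8 image and
    return its (row, col) position(s) after convolving with `kernel`."""
    img = np.zeros((8, 8))
    img[row, col] = 1.0
    out = convolve(img, weights=np.array(kernel, dtype=float), mode="constant")
    # An empty list means the pixel was shifted off the image.
    return [tuple(int(v) for v in pos) for pos in np.argwhere(out > 0)]


for kernel in direction_vectors:
    print(where_does_it_go(4, 4, kernel))
# [(3, 4)]  up
# [(4, 3)]  left
# [(4, 5)]  right
# [(5, 4)]  down

# A pixel in the top row pushed "up" leaves the image entirely.
print(where_does_it_go(0, 4, direction_vectors[0]))  # []
```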

def nudge_dataset(X, Y):
    """
    This produces a dataset 5 times bigger than the original one,
    by moving the 8x8 images in X around by 1px to left, right, down, up
    """
    direction_vectors = [
        [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
        [[0, 0, 0], [1, 0, 0], [0, 0, 0]],
        [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
        [[0, 0, 0], [0, 0, 0], [0, 1, 0]],
    ]

    def shift(x, w):
        return convolve(x.reshape((8, 8)), mode="constant", weights=w).ravel()

    X = np.concatenate(
        [X] + [np.apply_along_axis(shift, 1, X, vector) for vector in direction_vectors]
    )
    Y = np.concatenate([Y for _ in range(5)], axis=0)
    return X, Y


X, y = datasets.load_digits(return_X_y=True)
X = np.asarray(X, "float32")
X, Y = nudge_dataset(X, y)
X = minmax_scale(X, feature_range=(0, 1))  # 0-1 scaling

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# %%
# Models definition
# -----------------
#
# We build a classification pipeline with a BernoulliRBM feature extractor and
# a :class:`LogisticRegression <sklearn.linear_model.LogisticRegression>`
# classifier.

from sklearn import linear_model
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

logistic = linear_model.LogisticRegression(solver="newton-cg", tol=1)
rbm = BernoulliRBM(random_state=0, verbose=True)

rbm_features_classifier = Pipeline(steps=[("rbm", rbm), ("logistic", logistic)])
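Calling predict on this two-step Pipeline amounts to running the RBM's transform and then the logistic regression's predict on the resulting hidden activations. The sketch below checks that equivalence on random toy data with three made-up classes. The data, the small n_components and n_iter, and max_iter=500 are assumptions chosen only to keep it quick.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
X = rng.uniform(0.0, 1.0, size=(120, 64))  # toy inputs already in [0, 1]
y = rng.randint(0, 3, size=120)            # toy labels for 3 classes

pipe = Pipeline(steps=[
    ("rbm", BernoulliRBM(n_components=16, n_iter=3, random_state=0)),
    ("logistic", LogisticRegression(max_iter=500)),
])
pipe.fit(X, y)

# The same prediction, done step by step with the fitted pipeline steps.
manual = pipe.named_steps["logistic"].predict(pipe.named_steps["rbm"].transform(X))
print(np.array_equal(pipe.predict(X), manual))  # True
```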

# %%
# Training
# --------
#
# The hyperparameters of the entire model (learning rate, hidden layer size,
# regularization) were optimized by grid search, but the search is not
# reproduced here because of runtime constraints.

from sklearn.base import clone

# Hyper-parameters. These were set by cross-validation,
# using a GridSearchCV. Here we are not performing cross-validation to
# save time.
rbm.learning_rate = 0.06
rbm.n_iter = 10

# More components tend to give better prediction performance, but at the
# cost of longer fitting time
rbm.n_components = 100
logistic.C = 6000

# Training RBM-Logistic Pipeline
rbm_features_classifier.fit(X_train, Y_train)

# Training the logistic regression classifier directly on the raw pixels
raw_pixel_classifier = clone(logistic)
raw_pixel_classifier.C = 100.0
raw_pixel_classifier.fit(X_train, Y_train)
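The Training notes above say the hyperparameters came from a grid search that is not reproduced here. The block below is a sketch of how such a search could be set up by wrapping the same two steps in GridSearchCV. Pipeline exposes each step's parameters as step name, double underscore, parameter name, for example rbm__learning_rate or logistic__C. The grid values, the 300-sample subset without augmentation, and cv=3 are all illustrative assumptions, not the search the authors actually ran.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

# Assumption: a small, unaugmented subset so the search runs in seconds.
X, y = load_digits(return_X_y=True)
X = minmax_scale(X[:300], feature_range=(0, 1))
y = y[:300]

pipe = Pipeline(steps=[
    ("rbm", BernoulliRBM(n_iter=10, random_state=0)),
    ("logistic", LogisticRegression(solver="newton-cg", tol=1)),
])

# Hypothetical grid; "<step>__<param>" targets a parameter of one pipeline step.
param_grid = {
    "rbm__learning_rate": [0.01, 0.06],
    "rbm__n_components": [50, 100],
    "logistic__C": [100.0, 6000.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

The number of fits is the grid size times the number of folds (here 8 × 3, plus one refit). On the full augmented dataset this grows expensive quickly, which is the runtime constraint the example cites.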

# %%
# Evaluation
# ----------

from sklearn import metrics

Y_pred = rbm_features_classifier.predict(X_test)
print(
    "Logistic regression using RBM features:\n%s\n"
    % (metrics.classification_report(Y_test, Y_pred))
)

# %%
Y_pred = raw_pixel_classifier.predict(X_test)
print(
    "Logistic regression using raw pixel features:\n%s\n"
    % (metrics.classification_report(Y_test, Y_pred))
)

# %%
# The features extracted by the BernoulliRBM help improve the classification
# accuracy with respect to the logistic regression on raw pixels.

# %%
# Plotting
# --------

import matplotlib.pyplot as plt

plt.figure(figsize=(4.2, 4))
for i, comp in enumerate(rbm.components_):
    plt.subplot(10, 10, i + 1)
    plt.imshow(comp.reshape((8, 8)), cmap=plt.cm.gray_r, interpolation="nearest")
    plt.xticks(())
    plt.yticks(())
plt.suptitle("100 components extracted by RBM", fontsize=16)
plt.subplots_adjust(0.08, 0.02, 0.92, 0.85, 0.08, 0.23)

plt.show()