Various Agglomerative Clustering on a 2D embedding of digits
An illustration of various linkage options for agglomerative clustering on a 2D embedding of the digits dataset.
The goal of this example is to show intuitively how the metrics behave, and not to find good clusters for the digits. This is why the example works on a 2D embedding.
What this example shows us is the "rich getting richer" behavior of agglomerative clustering, which tends to create uneven cluster sizes.
This behavior is pronounced for the average linkage strategy, which ends up with a couple of clusters containing few datapoints.
The case of single linkage is even more pathological, with one very large cluster covering most digits, an intermediate-sized (clean) cluster containing mostly zero digits, and all other clusters being drawn from noise points around the fringes.
The other linkage strategies lead to more evenly distributed clusters that are therefore likely to be less sensitive to a random resampling of the dataset.
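The chaining tendency of single linkage versus the more balanced behavior of Ward can be sketched on synthetic data as well. The following is an illustrative, self-contained example (the blob parameters and the uniform noise box are assumptions chosen to make the effect visible, not part of this example):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Toy data: three Gaussian blobs plus a sprinkle of uniform noise points.
# The noise points can "bridge" blobs, which single linkage chains through.
rng = np.random.RandomState(0)
X_blobs, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
X_noise = rng.uniform(low=-10, high=10, size=(30, 2))
X_toy = np.vstack([X_blobs, X_noise])

for linkage in ("ward", "single"):
    labels = AgglomerativeClustering(linkage=linkage, n_clusters=3).fit_predict(X_toy)
    # Sorted cluster sizes, largest first: Ward tends to stay balanced,
    # single linkage tends to produce one dominant cluster.
    sizes = np.sort(np.bincount(labels))[::-1]
    print(f"{linkage:>6}: cluster sizes = {sizes}")
```

Sorting the cluster sizes makes the imbalance easy to read off: a dominant first entry signals the "rich getting richer" effect.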
Imports for Agglomerative Clustering on 2D Digit Embeddings
This example applies four linkage strategies (Ward, average, complete, single) to a spectral embedding of the 64-dimensional digits dataset projected into 2D, revealing how each linkage criterion handles uneven cluster densities and shapes. The SpectralEmbedding preserves the local neighborhood structure of the high-dimensional data while mapping it to a 2D plane where agglomerative clustering can be visualized. Each digit is rendered as its actual numeral character, color-coded by cluster assignment, making it easy to see which digits are grouped together and which are fragmented.
Linkage behavior on real data: Ward linkage produces the most balanced clusters because its variance-minimization objective discourages merging small clusters into large ones. Average linkage exhibits the "rich get richer" effect: a few clusters absorb most data points while others remain tiny. Single linkage is the most pathological, producing one enormous cluster and many singleton noise clusters due to its chaining tendency. Complete linkage falls between Ward and average. These behaviors are amplified on 2D embeddings where the digit classes overlap and have irregular shapes, illustrating why Ward is typically the safest default for exploratory hierarchical clustering.
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
from time import time
import numpy as np
from matplotlib import pyplot as plt
from sklearn import datasets, manifold
digits = datasets.load_digits()
X, y = digits.data, digits.target
n_samples, n_features = X.shape
np.random.seed(0)
# ----------------------------------------------------------------------
# Visualize the clustering
def plot_clustering(X_red, labels, title=None):
    x_min, x_max = np.min(X_red, axis=0), np.max(X_red, axis=0)
    X_red = (X_red - x_min) / (x_max - x_min)

    plt.figure(figsize=(6, 4))
    for digit in digits.target_names:
        plt.scatter(
            *X_red[y == digit].T,
            marker=f"${digit}$",
            s=50,
            c=plt.cm.nipy_spectral(labels[y == digit] / 10),
            alpha=0.5,
        )

    plt.xticks([])
    plt.yticks([])
    if title is not None:
        plt.title(title, size=17)
    plt.axis("off")
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
# ----------------------------------------------------------------------
# 2D embedding of the digits dataset
print("Computing embedding")
X_red = manifold.SpectralEmbedding(n_components=2).fit_transform(X)
print("Done.")
from sklearn.cluster import AgglomerativeClustering
for linkage in ("ward", "average", "complete", "single"):
    clustering = AgglomerativeClustering(linkage=linkage, n_clusters=10)
    t0 = time()
    clustering.fit(X_red)
    print("%s :\t%.2fs" % (linkage, time() - t0))
    plot_clustering(X_red, clustering.labels_, "%s linkage" % linkage)
plt.show()
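The size imbalance described above can also be inspected numerically rather than visually. The following self-contained sketch (not part of the original example; it recomputes the embedding so it runs standalone) prints the sorted cluster sizes for each linkage:

```python
import numpy as np
from sklearn import datasets, manifold
from sklearn.cluster import AgglomerativeClustering

# Recompute the 2D spectral embedding of the digits data.
X = datasets.load_digits().data
X_red = manifold.SpectralEmbedding(n_components=2).fit_transform(X)

for linkage in ("ward", "average", "complete", "single"):
    labels = AgglomerativeClustering(linkage=linkage, n_clusters=10).fit_predict(X_red)
    # Sorted cluster sizes, largest first. Single linkage is expected to
    # show one dominant cluster and several tiny ones; Ward should be
    # comparatively balanced.
    sizes = np.sort(np.bincount(labels))[::-1]
    print(f"{linkage:>8}: {sizes}")
```

A flat size profile for Ward and a heavily skewed one for single linkage gives a quantitative counterpart to the scatter plots.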