Plot Label Propagation Structure

Label Propagation circles: Learning a complex structure

Example of LabelPropagation learning a complex internal structure to demonstrate "manifold learning". The outer circle should be labeled "red" and the inner circle "blue". Because both label groups lie inside their own distinct shape, we can see that the labels propagate correctly around the circle.

Imports for Label Propagation on Concentric Circles

LabelSpreading with kernel='knn' exploits the manifold structure to propagate labels from just two labeled points. The two concentric circles generated by make_circles form a dataset where Euclidean distance alone cannot separate the classes (a point on the inner circle can be closer to a point on the outer circle than to points on the far side of its own circle), but graph connectivity along each circle provides a natural separation. With only one labeled point on the outer circle (index 0) and one on the inner circle (index -1), the KNN graph connects each point to its nearest neighbors along its respective circle, and label spreading propagates the known labels through these local connections until all 200 points are classified.
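As a rough check of this intuition (a sketch, not part of the original example), we can build the KNN graph directly with kneighbors_graph, using n_neighbors=7 to match LabelSpreading's default for kernel="knn", and measure what fraction of graph edges stay within a single circle:

```python
import numpy as np

from sklearn.datasets import make_circles
from sklearn.neighbors import kneighbors_graph

X, y = make_circles(n_samples=200, shuffle=False)

# n_neighbors=7 matches LabelSpreading's default for kernel="knn"
graph = kneighbors_graph(X, n_neighbors=7)
rows, cols = graph.nonzero()

# Fraction of KNN edges whose endpoints lie on the same circle; most
# edges follow a circle, which is what carries labels around each ring.
same_circle = np.mean(y[rows] == y[cols])
print(f"fraction of KNN edges within one circle: {same_circle:.2f}")
```

Even when a few edges cross between the circles, the dense within-circle connectivity dominates, so labels still flow around each ring.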

The alpha=0.8 clamping parameter controls how strongly the original labels are preserved during propagation: At each iteration, each node’s label distribution is updated as a weighted average of its neighbors’ distributions, then the labeled nodes’ distributions are partially reset toward their known labels with strength (1-alpha). Higher alpha values allow labeled nodes to be more influenced by their neighbors (softer clamping), while lower alpha values force labeled nodes to retain their original labels more strongly. On this manifold-structured dataset, the algorithm achieves perfect classification despite having labels for only 1% of the data, demonstrating that geometric structure in the feature space can substitute for large amounts of labeled data.
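The update rule described above can be sketched on a toy graph. The following is an illustrative hand-rolled iteration, not scikit-learn's implementation (which uses a normalized graph Laplacian): a 4-node path graph 0-1-2-3 with node 0 labeled class 0, node 3 labeled class 1, and the clamped update F = alpha * S @ F + (1 - alpha) * Y0 applied until convergence.

```python
import numpy as np

alpha = 0.8

# Toy path graph 0 - 1 - 2 - 3 (adjacency matrix).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = W / W.sum(axis=1, keepdims=True)  # row-normalized weights

# Initial label distributions: node 0 is known class 0, node 3 is known
# class 1, nodes 1 and 2 are unlabeled (all-zero rows).
Y0 = np.array([[1, 0],
               [0, 0],
               [0, 0],
               [0, 1]], dtype=float)

F = Y0.copy()
for _ in range(50):
    # weighted average of neighbors, then partial reset toward Y0
    F = alpha * (S @ F) + (1 - alpha) * Y0

# Each node adopts the class of the nearer seed: [0 0 1 1]
print(F.argmax(axis=1))
```

With alpha=0.8 the reset term is weak, so information flows freely along the path, yet the seed nodes still anchor their ends of the graph.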

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

# %%
# We generate a dataset with two concentric circles. In addition, a label
# is associated with each sample of the dataset that is: 0 (belonging to
# the outer circle), 1 (belonging to the inner circle), and -1 (unknown).
# Here, all labels but two are tagged as unknown.

import numpy as np

from sklearn.datasets import make_circles

n_samples = 200
X, y = make_circles(n_samples=n_samples, shuffle=False)
outer, inner = 0, 1
labels = np.full(n_samples, -1.0)
labels[0] = outer
labels[-1] = inner

# %%
# Plot raw data
import matplotlib.pyplot as plt

plt.figure(figsize=(4, 4))
plt.scatter(
    X[labels == outer, 0],
    X[labels == outer, 1],
    color="navy",
    marker="s",
    lw=0,
    label="outer labeled",
    s=10,
)
plt.scatter(
    X[labels == inner, 0],
    X[labels == inner, 1],
    color="c",
    marker="s",
    lw=0,
    label="inner labeled",
    s=10,
)
plt.scatter(
    X[labels == -1, 0],
    X[labels == -1, 1],
    color="darkorange",
    marker=".",
    label="unlabeled",
)
plt.legend(scatterpoints=1, shadow=False, loc="center")
_ = plt.title("Raw data (2 classes=outer and inner)")

# %%
#
# The aim of :class:`~sklearn.semi_supervised.LabelSpreading` is to associate
# a label with each sample whose label is initially unknown.
from sklearn.semi_supervised import LabelSpreading

label_spread = LabelSpreading(kernel="knn", alpha=0.8)
label_spread.fit(X, labels)
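Since make_circles also returns the ground-truth classes, we can score the transduction directly. A self-contained sketch (repeating the setup so it runs on its own):

```python
import numpy as np

from sklearn.datasets import make_circles
from sklearn.semi_supervised import LabelSpreading

X, y = make_circles(n_samples=200, shuffle=False)
labels = np.full(200, -1)
labels[0], labels[-1] = 0, 1  # one labeled point per circle

model = LabelSpreading(kernel="knn", alpha=0.8).fit(X, labels)

# Compare the transduced labels against the true classes from make_circles.
accuracy = float((model.transduction_ == y).mean())
print(f"transductive accuracy: {accuracy:.2f}")
```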

# %%
# Now, we can check which labels have been associated with each sample
# when the label was unknown.
output_labels = label_spread.transduction_
output_label_array = np.asarray(output_labels)
outer_numbers = (output_label_array == outer).nonzero()[0]
inner_numbers = (output_label_array == inner).nonzero()[0]

plt.figure(figsize=(4, 4))
plt.scatter(
    X[outer_numbers, 0],
    X[outer_numbers, 1],
    color="navy",
    marker="s",
    lw=0,
    s=10,
    label="outer learned",
)
plt.scatter(
    X[inner_numbers, 0],
    X[inner_numbers, 1],
    color="c",
    marker="s",
    lw=0,
    s=10,
    label="inner learned",
)
plt.legend(scatterpoints=1, shadow=False, loc="center")
plt.title("Labels learned with Label Spreading (KNN)")
plt.show()