Label Propagation circles: Learning a complex structure
=======================================================

Example of LabelPropagation learning a complex internal structure to demonstrate "manifold learning". The outer circle should be labeled "red" and the inner circle "blue". Because both label groups lie inside their own distinct shape, we can see that the labels propagate correctly around the circle.
Label Propagation on Concentric Circles
LabelSpreading with kernel='knn' exploits manifold structure to propagate labels from just 2 labeled points. The two concentric circles generated by make_circles form a dataset where Euclidean distance to the labeled seeds is misleading (a point on the far side of the outer circle is Euclidean-closer to the inner seed than to the outer one), but graph connectivity along each circle provides a natural separation. With only one labeled point on the outer circle (index 0) and one on the inner circle (index -1), the KNN graph connects each point chiefly to its nearest neighbors along its own circle, and label spreading propagates the known labels through these local connections until all 200 points are classified.
The alpha=0.8 clamping parameter controls how strongly the original labels are preserved during propagation. At each iteration, every node's label distribution is updated as a weighted average of its neighbors' distributions, and the labeled nodes' distributions are then partially reset toward their known labels with strength (1 - alpha). Higher alpha values allow labeled nodes to be more influenced by their neighbors (softer clamping), while lower alpha values force labeled nodes to retain their original labels more strongly. On this manifold-structured dataset, the algorithm achieves perfect classification despite having labels for only 1% of the data, demonstrating that geometric structure in the feature space can substitute for large amounts of labeled data.
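The perfect-classification claim can be checked directly. The following is a minimal sketch that restates the example's setup and compares the transduced labels against the classes make_circles generated:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.semi_supervised import LabelSpreading

# Reproduce the example's setup: 200 points, only the first (outer)
# and last (inner) samples carry a known label.
X, y_true = make_circles(n_samples=200, shuffle=False)
labels = np.full(200, -1)
labels[0], labels[-1] = 0, 1

model = LabelSpreading(kernel="knn", alpha=0.8)
model.fit(X, labels)

# Compare the transduced labels against the generating classes.
accuracy = (model.transduction_ == y_true).mean()
print(f"transductive accuracy: {accuracy:.2f}")
```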
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
# %%
# We generate a dataset with two concentric circles. In addition, a label
# is associated with each sample of the dataset that is: 0 (belonging to
# the outer circle), 1 (belonging to the inner circle), and -1 (unknown).
# Here, all labels but two are tagged as unknown.
import numpy as np
from sklearn.datasets import make_circles
n_samples = 200
X, y = make_circles(n_samples=n_samples, shuffle=False)
outer, inner = 0, 1
labels = np.full(n_samples, -1.0)
labels[0] = outer
labels[-1] = inner
# %%
# Plot raw data
import matplotlib.pyplot as plt
plt.figure(figsize=(4, 4))
plt.scatter(
X[labels == outer, 0],
X[labels == outer, 1],
color="navy",
marker="s",
lw=0,
label="outer labeled",
s=10,
)
plt.scatter(
X[labels == inner, 0],
X[labels == inner, 1],
color="c",
marker="s",
lw=0,
label="inner labeled",
s=10,
)
plt.scatter(
X[labels == -1, 0],
X[labels == -1, 1],
color="darkorange",
marker=".",
label="unlabeled",
)
plt.legend(scatterpoints=1, shadow=False, loc="center")
_ = plt.title("Raw data (2 classes=outer and inner)")
# %%
#
# The aim of :class:`~sklearn.semi_supervised.LabelSpreading` is to associate
# a label with each sample whose label is initially unknown.
from sklearn.semi_supervised import LabelSpreading
label_spread = LabelSpreading(kernel="knn", alpha=0.8)
label_spread.fit(X, labels)
# %%
# Now, we can check which labels have been associated with each sample
# when the label was unknown.
output_labels = label_spread.transduction_
output_label_array = np.asarray(output_labels)
outer_numbers = (output_label_array == outer).nonzero()[0]
inner_numbers = (output_label_array == inner).nonzero()[0]
plt.figure(figsize=(4, 4))
plt.scatter(
X[outer_numbers, 0],
X[outer_numbers, 1],
color="navy",
marker="s",
lw=0,
s=10,
label="outer learned",
)
plt.scatter(
X[inner_numbers, 0],
X[inner_numbers, 1],
color="c",
marker="s",
lw=0,
s=10,
label="inner learned",
)
plt.legend(scatterpoints=1, shadow=False, loc="center")
plt.title("Labels learned with Label Spreading (KNN)")
plt.show()