Plot Affinity Propagation
=================================================
Demo of affinity propagation clustering algorithm
=================================================
Reference: Brendan J. Frey and Delbert Dueck, "Clustering by Passing Messages Between Data Points", Science Feb. 2007
Imports for Affinity Propagation Clustering
Affinity Propagation discovers clusters by exchanging "messages" between data points, simultaneously identifying both the number of clusters and which data points serve as cluster centers (exemplars). Unlike K-Means, which requires specifying k, Affinity Propagation takes a similarity matrix (by default the negative squared Euclidean distance) and a preference parameter that controls how likely each point is to become an exemplar: lower preference values produce fewer, larger clusters, while higher values produce more, smaller clusters. The algorithm alternates between sending "responsibility" messages (how well-suited a candidate exemplar is for a point) and "availability" messages (how appropriate it is for a point to choose that candidate), with a damping factor (between 0.5 and 1) controlling convergence stability.
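The message-passing updates just described can be sketched directly in NumPy. This is a minimal illustration on a two-blob toy dataset, not scikit-learn's implementation (which adds convergence checks, noise injection, and a different extraction step); the variable names `S`, `R`, `A`, the damping value 0.7, and the 200-iteration budget are all choices made for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated toy blobs of 20 points each
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in ([0.0, 0.0], [3.0, 3.0])])
n = len(X)
idx = np.arange(n)

# Similarity matrix: negative squared Euclidean distance
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
# Preference goes on the diagonal (here: the median similarity)
np.fill_diagonal(S, np.median(S))

R = np.zeros((n, n))  # responsibilities r(i, k)
A = np.zeros((n, n))  # availabilities a(i, k)
damping = 0.7

for _ in range(200):
    # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
    AS = A + S
    k_best = AS.argmax(axis=1)
    best = AS[idx, k_best]
    AS[idx, k_best] = -np.inf
    second = AS.max(axis=1)
    max_other = np.tile(best[:, None], (1, n))  # row-wise max excluding column k
    max_other[idx, k_best] = second
    R = damping * R + (1 - damping) * (S - max_other)

    # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
    # and a(k,k) = sum_{i' != k} max(0, r(i',k))
    Rp = np.maximum(R, 0)
    Rp[idx, idx] = R[idx, idx]
    col = Rp.sum(axis=0)
    A_new = np.minimum(0.0, col[None, :] - Rp)
    A_new[idx, idx] = col - R[idx, idx]
    A = damping * A + (1 - damping) * A_new

# Each point's exemplar is the k maximizing a(i,k) + r(i,k)
exemplars = (A + R).argmax(axis=1)
print("number of exemplars found:", len(np.unique(exemplars)))
```

Damping mixes each new message with the previous one, which suppresses the oscillations that plain iteration of these coupled updates tends to produce.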
When to use Affinity Propagation: It excels when the number of clusters is unknown and actual data points should serve as cluster representatives (exemplars), making it interpretable for tasks like document summarization, image selection, or gene expression analysis. The cluster_centers_indices_ attribute returns the indices of the exemplar points in the original dataset, and the visualization draws lines from each point to its exemplar, revealing the cluster structure. The tradeoff is computational cost: Affinity Propagation has O(n^2) memory and O(n^2 * T) time complexity (where T is iterations), making it impractical for datasets larger than a few thousand points without approximation.
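The effect of the preference parameter described above can be seen by refitting over a range of values on the same blob data this example uses below; the sweep values -1000, -50, -5 are arbitrary choices for this sketch:

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# More negative preferences -> fewer exemplars, hence fewer clusters
n_clusters_for = {}
for preference in (-1000, -50, -5):
    af = AffinityPropagation(preference=preference, random_state=0).fit(X)
    n_clusters_for[preference] = len(af.cluster_centers_indices_)
    print(f"preference={preference}: {n_clusters_for[preference]} clusters")
```

At preference=-50 this data yields the three clusters the example reports; a much more negative value collapses toward a single cluster, and values near zero fragment the blobs.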
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
import numpy as np
from sklearn import metrics
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
# %%
# Generate sample data
# --------------------
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=300, centers=centers, cluster_std=0.5, random_state=0
)
# %%
# Compute Affinity Propagation
# ----------------------------
af = AffinityPropagation(preference=-50, random_state=0).fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_
n_clusters_ = len(cluster_centers_indices)
print("Estimated number of clusters: %d" % n_clusters_)
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print("Adjusted Rand Index: %0.3f" % metrics.adjusted_rand_score(labels_true, labels))
print(
    "Adjusted Mutual Information: %0.3f"
    % metrics.adjusted_mutual_info_score(labels_true, labels)
)
print(
    "Silhouette Coefficient: %0.3f"
    % metrics.silhouette_score(X, labels, metric="sqeuclidean")
)
# %%
# Plot result
# -----------
import matplotlib.pyplot as plt
plt.close("all")
plt.figure(1)
plt.clf()
colors = plt.cycler("color", plt.cm.viridis(np.linspace(0, 1, 4)))
for k, col in zip(range(n_clusters_), colors):
    class_members = labels == k
    cluster_center = X[cluster_centers_indices[k]]
    plt.scatter(
        X[class_members, 0], X[class_members, 1], color=col["color"], marker="."
    )
    plt.scatter(
        cluster_center[0], cluster_center[1], s=14, color=col["color"], marker="o"
    )
    for x in X[class_members]:
        plt.plot(
            [cluster_center[0], x[0]], [cluster_center[1], x[1]], color=col["color"]
        )
plt.title("Estimated number of clusters: %d" % n_clusters_)
plt.show()