
Demo of affinity propagation clustering algorithm

Reference: Brendan J. Frey and Delbert Dueck, "Clustering by Passing Messages Between Data Points", Science, Feb. 2007

Imports for Affinity Propagation Clustering

Affinity Propagation discovers clusters by exchanging "messages" between data points, simultaneously identifying both the number of clusters and which data points serve as cluster centers (exemplars). Unlike K-Means, which requires specifying k, Affinity Propagation takes a similarity matrix (by default the negative squared Euclidean distance) and a preference parameter that controls how likely each point is to become an exemplar: lower preference values produce fewer, larger clusters, while higher values produce more, smaller clusters. The algorithm alternates between sending "responsibility" messages (how well-suited a candidate exemplar is for a point) and "availability" messages (how appropriate it is for a point to choose a candidate), with damping (in [0.5, 1.0)) controlling convergence stability.
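The effect of the preference parameter can be sketched quickly: fit the same data with increasingly high (less negative) preferences and watch the estimated number of clusters grow. The specific preference values and the high damping below are illustrative choices, not recommendations from this example.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Toy data similar in scale to the example below (illustrative values).
X, _ = make_blobs(
    n_samples=150, centers=[[1, 1], [-1, -1], [1, -1]], cluster_std=0.5, random_state=0
)

# Higher (less negative) preference -> more points become exemplars.
# damping=0.9 is chosen here only to make convergence robust in this sketch.
counts = {}
for pref in (-200, -50, -5):
    af = AffinityPropagation(preference=pref, damping=0.9, random_state=0).fit(X)
    counts[pref] = len(af.cluster_centers_indices_)
    print("preference=%5d -> %d clusters" % (pref, counts[pref]))
```

The cluster counts should be non-decreasing as the preference rises, since a higher preference makes every point a more attractive candidate exemplar.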

When to use Affinity Propagation: It excels when the number of clusters is unknown and actual data points should serve as cluster representatives (exemplars), making it interpretable for tasks like document summarization, image selection, or gene expression analysis. The cluster_centers_indices_ attribute returns the indices of the exemplar points in the original dataset, and the visualization draws lines from each point to its exemplar, revealing the cluster structure. The tradeoff is computational cost: Affinity Propagation has O(n^2) memory and O(n^2 * T) time complexity (where T is iterations), making it impractical for datasets larger than a few thousand points without approximation.
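A small check makes the exemplar property concrete: the fitted cluster centers are literally rows of the input data, selected by `cluster_centers_indices_`, and each exemplar belongs to its own cluster. The data and parameters below are an illustrative sketch, not part of the example that follows.

```python
import numpy as np

from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=2, cluster_std=0.5, random_state=0)
af = AffinityPropagation(random_state=0).fit(X)

idx = af.cluster_centers_indices_
# Each cluster center is an actual data point (an exemplar), not a mean.
assert np.allclose(af.cluster_centers_, X[idx])
# Each exemplar is labeled as a member of the cluster it represents.
assert all(af.labels_[i] == k for k, i in enumerate(idx))
print("exemplar indices:", idx)
```

This is what makes the method interpretable: a cluster can be summarized by pointing at a real example from the dataset rather than an averaged centroid.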

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import numpy as np

from sklearn import metrics
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# %%
# Generate sample data
# --------------------
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=300, centers=centers, cluster_std=0.5, random_state=0
)

# %%
# Compute Affinity Propagation
# ----------------------------
af = AffinityPropagation(preference=-50, random_state=0).fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_

n_clusters_ = len(cluster_centers_indices)

print("Estimated number of clusters: %d" % n_clusters_)
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print("Adjusted Rand Index: %0.3f" % metrics.adjusted_rand_score(labels_true, labels))
print(
    "Adjusted Mutual Information: %0.3f"
    % metrics.adjusted_mutual_info_score(labels_true, labels)
)
print(
    "Silhouette Coefficient: %0.3f"
    % metrics.silhouette_score(X, labels, metric="sqeuclidean")
)

# %%
# Plot result
# -----------
import matplotlib.pyplot as plt

plt.close("all")
plt.figure(1)
plt.clf()

colors = plt.cycler("color", plt.cm.viridis(np.linspace(0, 1, 4)))

for k, col in zip(range(n_clusters_), colors):
    class_members = labels == k
    cluster_center = X[cluster_centers_indices[k]]
    plt.scatter(
        X[class_members, 0], X[class_members, 1], color=col["color"], marker="."
    )
    plt.scatter(
        cluster_center[0], cluster_center[1], s=14, color=col["color"], marker="o"
    )
    for x in X[class_members]:
        plt.plot(
            [cluster_center[0], x[0]], [cluster_center[1], x[1]], color=col["color"]
        )

plt.title("Estimated number of clusters: %d" % n_clusters_)
plt.show()