Plot Spectral Coclustering

A demo of the Spectral Co-Clustering algorithm

This example demonstrates how to generate a dataset and bicluster it using the Spectral Co-Clustering algorithm.

The dataset is generated using the make_biclusters function, which creates a matrix of small values and implants biclusters with large values. The rows and columns are then shuffled and the shuffled matrix is passed to the Spectral Co-Clustering algorithm. Rearranging the shuffled matrix to make the biclusters contiguous shows how accurately the algorithm found them.
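To make the generated ground truth concrete, the following sketch inspects what make_biclusters returns. The small shape and the noise level are chosen only for illustration and are not the settings used in this example.

```python
import numpy as np
from sklearn.datasets import make_biclusters

# Small illustrative matrix; shape and noise differ from this example's settings.
data, rows, columns = make_biclusters(
    shape=(12, 10), n_clusters=3, noise=0.5, shuffle=False, random_state=0
)

print(data.shape)     # the data matrix
print(rows.shape)     # one boolean row-membership mask per bicluster
print(columns.shape)  # one boolean column-membership mask per bicluster

# Count how many biclusters each row and each column belongs to.
memberships_per_row = rows.sum(axis=0)
memberships_per_column = columns.sum(axis=0)
print(memberships_per_row, memberships_per_column)
```

The two membership arrays are what consensus_score later compares against the biclusters found by the model.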

Imports for Spectral Co-Clustering on Synthetic Biclusters

SpectralCoclustering partitions rows and columns into corresponding groups that form diagonal blocks. Unlike SpectralBiclustering, which finds a checkerboard of row clusters and column clusters, co-clustering assigns each row and each column to exactly one of k clusters, producing k biclusters along the diagonal of the rearranged matrix. The algorithm treats the data matrix as a bipartite graph between rows and columns, normalizes it, computes singular vectors of the normalized matrix, and applies k-means to the stacked left and right singular vectors, so rows and columns are clustered jointly. The make_biclusters function generates a matrix with k implanted blocks of large values on a noisy background and returns the true row and column memberships, providing ground truth for evaluation.
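The steps described above can be sketched in a few lines of NumPy. This is a simplified illustration, not scikit-learn's implementation: the helper name spectral_cocluster_sketch is hypothetical, a dense SVD is used for simplicity, and the input is assumed to have strictly positive row and column sums.

```python
import numpy as np
from sklearn.cluster import KMeans


def spectral_cocluster_sketch(A, n_clusters, random_state=0):
    """Hypothetical helper: co-cluster the rows and columns of matrix A."""
    A = np.asarray(A, dtype=float)

    # Scale rows and columns by the inverse square roots of their sums
    # (assumes every row sum and column sum is strictly positive).
    row_scale = 1.0 / np.sqrt(A.sum(axis=1))
    col_scale = 1.0 / np.sqrt(A.sum(axis=0))
    A_norm = row_scale[:, None] * A * col_scale[None, :]

    # Dense SVD of the normalized matrix. The leading singular pair carries
    # no cluster information, so it is skipped below.
    n_vecs = 1 + int(np.ceil(np.log2(n_clusters)))
    U, _, Vt = np.linalg.svd(A_norm, full_matrices=False)

    # Stack the rescaled left and right singular vectors so that rows and
    # columns are embedded in the same space.
    Z = np.vstack(
        [
            row_scale[:, None] * U[:, 1:n_vecs],
            col_scale[:, None] * Vt.T[:, 1:n_vecs],
        ]
    )

    # One k-means run labels rows and columns together.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(Z)
    n_rows = A.shape[0]
    return labels[:n_rows], labels[n_rows:]


# Toy matrix with two implanted blocks on a small positive background.
A = np.full((6, 8), 0.1)
A[:3, :4] += 10
A[3:, 4:] += 10
row_labels, col_labels = spectral_cocluster_sketch(A, n_clusters=2)
print(row_labels, col_labels)
```

Because rows and columns share one k-means run, a row and a column that receive the same label belong to the same bicluster, which is what produces the diagonal block structure.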

Rearranging the shuffled matrix by the learned labels gives a visual check of clustering quality. After the rows and columns are randomly permuted (simulating the real-world situation where the natural ordering is unknown), SpectralCoclustering is fitted on the shuffled matrix and recovers the block structure. Sorting rows by model.row_labels_ and columns by model.column_labels_ reassembles the blocks, and consensus_score quantifies the match against the ground-truth biclusters on a scale from 0 to 1, where 1 is a perfect match. With noise=5 on a (300, 300) matrix with 5 clusters, the algorithm typically achieves near-perfect recovery, which illustrates its effectiveness on matrices where both rows and columns carry cluster structure.
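The two mechanisms in this paragraph, sorting by labels and scoring with consensus_score, can be seen on a toy case. The labels and membership masks below are made up for illustration and do not come from the dataset in this example.

```python
import numpy as np
from sklearn.metrics import consensus_score

# Made-up cluster labels for six rows in arbitrary order.
labels = np.array([2, 0, 1, 0, 2, 1])

# argsort returns an ordering that places equal labels next to each other.
order = np.argsort(labels, kind="stable")
print(labels[order])

# Two made-up biclusters, given as boolean row masks and column masks.
rows_true = np.array([[True, True, False, False], [False, False, True, True]])
cols_true = np.array([[True, False, True, False], [False, True, False, True]])

# The same biclusters listed in the opposite order.
rows_found = rows_true[::-1]
cols_found = cols_true[::-1]

# consensus_score matches biclusters across the two sets before comparing,
# so the order in which they are listed does not matter.
score = consensus_score((rows_true, cols_true), (rows_found, cols_found))
print(score)
```

This order independence is why the cluster numbering chosen by the model does not need to agree with the numbering of the ground truth.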

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import numpy as np
from matplotlib import pyplot as plt

from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters
from sklearn.metrics import consensus_score

data, rows, columns = make_biclusters(
    shape=(300, 300), n_clusters=5, noise=5, shuffle=False, random_state=0
)

plt.matshow(data, cmap=plt.cm.Blues)
plt.title("Original dataset")

# shuffle clusters
rng = np.random.RandomState(0)
row_idx = rng.permutation(data.shape[0])
col_idx = rng.permutation(data.shape[1])
data = data[row_idx][:, col_idx]

plt.matshow(data, cmap=plt.cm.Blues)
plt.title("Shuffled dataset")

model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(data)
score = consensus_score(model.biclusters_, (rows[:, row_idx], columns[:, col_idx]))

print("consensus score: {:.3f}".format(score))

fit_data = data[np.argsort(model.row_labels_)]
fit_data = fit_data[:, np.argsort(model.column_labels_)]

plt.matshow(fit_data, cmap=plt.cm.Blues)
plt.title("After biclustering; rearranged to show biclusters")

plt.show()