Hexagonal Binning with hexbin(x, y, C)
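Hexagonal binning divides the 2D plane into a grid of hexagonal cells and colors each cell by the number of data points that fall within it. If the optional C argument is supplied, each cell is instead colored by an aggregate of the C values of its points (the mean by default, configurable through reduce_C_function). The Axes.hexbin() method is an efficient way to visualize the density of large point sets where the individual markers of a scatter plot would overlap and obscure the distribution.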
Why this matters for data science: When you have thousands or millions of data points, a standard scatter plot becomes an unreadable blob of overlapping markers. Hexbin solves this by aggregating points into hexagonal bins and using color to encode density, effectively creating a 2D histogram that reveals the underlying structure. Hexagons are often preferred over squares because every neighboring cell center is the same distance away, whereas a square cell's diagonal neighbors are farther than its edge neighbors, and because a hexagonal grid avoids the strong horizontal and vertical lines that a rectangular grid imposes on the eye. The gridsize=20 parameter sets the number of hexagons across the x-axis: smaller values create larger, coarser bins, while larger values give finer resolution. The example generates 5000 correlated points (y = 1.2*x + noise), so the linear relationship and its spread show up in the density coloring rather than in individual points.
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('_mpl-gallery-nogrid')
# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
# plot:
fig, ax = plt.subplots()
ax.hexbin(x, y, gridsize=20)
ax.set(xlim=(-2, 2), ylim=(-3, 3))
plt.show()
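The example above leaves out the C argument named in the heading, so every hexagon shows a count. As a rough sketch of the difference, the block below draws the same kind of data twice: once colored by count and once colored by the mean of a third variable. The variable c (distance from the origin) and the names hb_count and hb_mean are made up for illustration and do not come from the original example; np.random.default_rng is used here in place of np.random.seed, so the sample differs slightly from the one above.

```python
import matplotlib.pyplot as plt
import numpy as np

# Correlated data similar to the example above (different random stream)
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
y = 1.2 * x + rng.standard_normal(5000) / 3

# Hypothetical third variable, for illustration only: distance from the origin
c = np.hypot(x, y)

fig, (ax_count, ax_mean) = plt.subplots(
    1, 2, figsize=(8, 4), sharex=True, sharey=True
)

# Without C: each hexagon is colored by how many points it contains
hb_count = ax_count.hexbin(x, y, gridsize=20)
ax_count.set_title("count")
fig.colorbar(hb_count, ax=ax_count)

# With C: each hexagon is colored by reduce_C_function applied to the
# C values of the points inside it (np.mean is the default)
hb_mean = ax_mean.hexbin(x, y, C=c, gridsize=20, reduce_C_function=np.mean)
ax_mean.set_title("mean of c")
fig.colorbar(hb_mean, ax=ax_mean)

ax_count.set(xlim=(-2, 2), ylim=(-3, 3))
```

Call plt.show() or fig.savefig(...) to render the figure. The count panel should be brightest along the line y = 1.2*x near the origin, while the second panel should get brighter toward the edges, since the mean distance from the origin grows there. Passing a different reducer, such as np.median or np.max, changes what the color encodes without changing the binning.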