2D Histograms with hist2d(x, y)

A 2D histogram divides the x-y plane into a grid of rectangular bins and colors each bin according to the number of data points it contains, creating a heatmap-like view of the joint distribution of two variables. The Axes.hist2d() method extends the familiar 1D histogram concept into two dimensions.

Why this matters for data science: Understanding the joint distribution of two variables is fundamental to statistical analysis and feature engineering. Scatter plots show individual points, but once points overlap heavily they hide density. A 2D histogram makes that density visible: modes, ridges, and clusters in the joint distribution stand out at a glance. Common uses include comparing predicted and actual values in regression, examining the joint distribution of two features before modeling, and serving as a faster alternative to kernel density estimation for large datasets.

The bins parameter accepts a pair of arrays of bin edges for fine control. In the example below, np.arange(-3, 3, 0.1) places edges 0.1 apart along both axes, starting at -3 and stopping short of 3, which produces a high-resolution density map. Unlike hexbin(), which uses hexagonal bins, hist2d() uses rectangular bins that align naturally with the Cartesian axes.

import matplotlib.pyplot as plt
import numpy as np

plt.style.use('_mpl-gallery-nogrid')

# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3

# plot:
fig, ax = plt.subplots()

# bin edges every 0.1 from -3 up to (but not including) 3, on both axes
ax.hist2d(x, y, bins=(np.arange(-3, 3, 0.1), np.arange(-3, 3, 0.1)))

ax.set(xlim=(-2, 2), ylim=(-3, 3))

plt.show()
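
The example above discards what hist2d() returns. As a minimal sketch, the variation below keeps the returned counts, bin edges, and image so that a colorbar can be attached and the counts checked against np.histogram2d(). Several choices here are assumptions for illustration rather than part of the original example: the edges come from np.linspace so that the right edge at 3 is included, the data is drawn with np.random.default_rng instead of np.random.seed, and the colorbar label is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same kind of data as above: correlated + noise (generator choice is an assumption)
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
y = 1.2 * x + rng.standard_normal(5000) / 3

# 61 edges give 60 bins of width 0.1 that cover [-3, 3] on each axis
edges = np.linspace(-3, 3, 61)

fig, ax = plt.subplots()

# hist2d returns the 2D count array, the x and y edges, and the drawn image
counts, xedges, yedges, image = ax.hist2d(x, y, bins=(edges, edges))

# The returned image is what the colorbar needs in order to map colors to counts
fig.colorbar(image, ax=ax, label="points per bin")

# The same counts can be computed without plotting
counts_np, _, _ = np.histogram2d(x, y, bins=(edges, edges))

# Points outside the edge range are dropped, so the counts sum to the in-range total
in_range = int(np.sum((np.abs(x) <= 3) & (np.abs(y) <= 3)))
print(counts.shape, int(counts.sum()), in_range)
```

The first axis of counts follows x and the second follows y. Calling plt.show() afterwards displays the figure as in the example above.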