Empirical Cumulative Distribution Functions with `ecdf(x)`

The empirical cumulative distribution function (ECDF) shows, for each value on the x-axis, the fraction of the data that falls at or below that value. The `Axes.ecdf()` method (available since Matplotlib 3.8) computes and plots this step function directly from raw data, with none of the binning decisions a histogram requires.
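
To make the definition concrete, the sketch below evaluates an ECDF by hand with NumPy. The helper name `ecdf_at` is hypothetical and not part of Matplotlib; it only illustrates the "fraction at or below" rule and makes no claim about how `Axes.ecdf()` is implemented internally.

```python
import numpy as np


def ecdf_at(samples, values):
    """Fraction of `samples` that are <= each entry of `values`.

    Hypothetical helper for illustration only.
    """
    xs = np.sort(np.asarray(samples, dtype=float))
    # side="right" counts every sample <= the query value, so ties are included.
    return np.searchsorted(xs, values, side="right") / xs.size


data = [3.0, 1.0, 2.0, 5.0]
queries = [0.0, 1.0, 2.5, 5.0, 10.0]

# Fractions at or below each query: 0, 0.25, 0.5, 1, 1
print(ecdf_at(data, queries))
```

The result is 0 below the smallest sample, jumps by 1/n at each data point, and reaches 1 at the largest sample, which is the step shape that `ax.ecdf(x)` draws.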

Why this matters for data science: a histogram depends on the choice of bin width, and that choice can change the visual impression of the data dramatically. An ECDF has no such parameter and keeps every observed value, so nothing is lost to binning. This makes it easy to read off percentiles such as the median and the quartiles, to compare distributions by overlaying several ECDFs on the same axes, and to reason about statistical tests such as the Kolmogorov-Smirnov test, whose statistic is defined directly on the ECDF. In ML, ECDFs are useful for comparing score distributions between classes, examining the calibration of predicted probabilities, and checking whether features follow expected distributions.

The example below generates 200 samples from a normal distribution centered at 4 with a standard deviation of 1.5 and plots their ECDF.

import matplotlib.pyplot as plt
import numpy as np

plt.style.use('_mpl-gallery')

# make data
np.random.seed(1)
x = 4 + np.random.normal(0, 1.5, 200)

# plot:
fig, ax = plt.subplots()
ax.ecdf(x)
plt.show()
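
The percentile reading and the Kolmogorov-Smirnov comparison described earlier can be sketched with NumPy alone. Everything here is an illustrative assumption rather than part of the original example: the second sample `b`, the seed, and the helper names `ecdf_at`, `ecdf_quantile`, and `ks_statistic` are all hypothetical.

```python
import numpy as np

# Two illustrative samples; `b` is an assumed second group shifted by one unit.
rng = np.random.default_rng(1)
a = 4 + rng.normal(0, 1.5, 200)
b = 5 + rng.normal(0, 1.5, 200)


def ecdf_at(samples, values):
    """Fraction of `samples` that are <= each entry of `values`."""
    xs = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(xs, values, side="right") / xs.size


def ecdf_quantile(samples, q):
    """Smallest sample value whose ECDF is at least q (0 < q <= 1)."""
    xs = np.sort(np.asarray(samples, dtype=float))
    idx = int(np.ceil(q * xs.size)) - 1
    return xs[max(idx, 0)]


def ks_statistic(first, second):
    """Largest vertical gap between the two ECDFs.

    Both ECDFs only change at data points, so checking the pooled
    samples is enough to find the maximum.
    """
    pooled = np.concatenate([first, second])
    return np.max(np.abs(ecdf_at(first, pooled) - ecdf_at(second, pooled)))


# Read the quartiles of `a` off its ECDF.
q1, median, q3 = (ecdf_quantile(a, q) for q in (0.25, 0.5, 0.75))
print("quartiles of a:", q1, median, q3)

# Two-sample Kolmogorov-Smirnov statistic between `a` and `b`.
print("KS statistic:", ks_statistic(a, b))
```

To see the same comparison visually, call `ax.ecdf()` once per sample on the same Axes, passing a `label` to each call and finishing with `ax.legend()`. If SciPy is available, `scipy.stats.ks_2samp` reports this statistic together with a p-value.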
