Chapter 9: Diffusion ModelsΒΆ
Generating Images by Learning to Reverse NoiseΒΆ
Diffusion models represent a fundamentally different approach to generative AI compared to autoregressive models like GPT. Instead of generating outputs token-by-token, diffusion models start with pure random noise and gradually refine it into a coherent image through a sequence of small denoising steps. The mathematical framework is elegant: define a forward process that progressively adds Gaussian noise to a real image until it becomes indistinguishable from pure noise, then train a neural network to learn the reverse process: predicting and removing the noise at each step.
The forward process is a fixed Markov chain: \(x_t = \sqrt{\alpha_t} x_{t-1} + \sqrt{1 - \alpha_t} \epsilon\) where \(\epsilon \sim \mathcal{N}(0, I)\). A convenient consequence is the closed form \(x_t = \sqrt{\bar\alpha_t} x_0 + \sqrt{1 - \bar\alpha_t} \epsilon\) with \(\bar\alpha_t = \prod_{s=1}^{t} \alpha_s\), so any noise level can be sampled in one step. After \(T\) steps (typically 1000), the original image \(x_0\) has been completely destroyed. The neural network learns to approximate the reverse: given a noisy image \(x_t\) and the timestep \(t\), predict the noise \(\epsilon\) that was added. At generation time, you sample random noise and iteratively denoise it, producing a realistic image from nothing. This is the foundation of DALL-E 2, Stable Diffusion, and Midjourney.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Circle

sns.set_style('whitegrid')                 # cleaner plot background
plt.rcParams['figure.figsize'] = (16, 10)  # default figure size for this chapter
np.random.seed(42)                         # reproducible randomness
How Diffusion WorksΒΆ
TrainingΒΆ
Take real image
Add noise gradually
Train network to predict noise
GenerationΒΆ
Start with pure noise
Denoise step by step
Get realistic image!
Applications: DALL-E, Stable Diffusion, Midjourney
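The training and generation recipes above can be sketched end to end in NumPy. This is a toy illustration, not a real implementation: `predict_noise` is a placeholder for the trained network \(\epsilon_\theta(x_t, t)\) (in practice a U-Net), and it simply returns zeros so the code runs. The training step implements the simplified DDPM objective (regress the added noise with MSE), and the sampling loop applies the standard DDPM reverse update.

```python
import numpy as np

np.random.seed(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # same linear schedule as the forward process
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x_t, t):
    """Placeholder for the trained network eps_theta(x_t, t).
    A real model is a U-Net; zeros keep this sketch runnable."""
    return np.zeros_like(x_t)

def training_loss(x0):
    """One training step: sample a timestep and noise, corrupt the
    image with the closed form, regress the noise (MSE)."""
    t = np.random.randint(T)
    eps = np.random.randn(*x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps
    return np.mean((predict_noise(x_t, t) - eps) ** 2)

def sample(shape):
    """Generation: start from pure noise and denoise from t = T-1 to 0
    using the DDPM reverse update:
    x_{t-1} = (x_t - beta_t/sqrt(1-alpha_bar_t) * eps_hat)/sqrt(alpha_t) + sqrt(beta_t) * z"""
    x = np.random.randn(*shape)
    for t in range(T - 1, -1, -1):
        eps_hat = predict_noise(x, t)
        mean = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        z = np.random.randn(*shape) if t > 0 else 0.0  # no noise at the final step
        x = mean + np.sqrt(betas[t]) * z
    return x

x0 = np.random.randn(8, 8)          # stand-in for a real training image
loss = training_loss(x0)
print(f"loss with untrained predictor: {loss:.2f}")  # near 1, since eps has unit variance
img = sample((8, 8))
print("sampled shape:", img.shape)
```

With a real trained predictor, the same `sample` loop is what turns pure noise into an image; everything that changes between this sketch and Stable Diffusion is inside `predict_noise`.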
# Diffusion concept
import numpy as np
print("Forward: Image → Noise")
print("Reverse: Noise → Image")
print("\nTrained on millions of images!")