Chapter 9: Diffusion Models

Generating Images by Learning to Reverse Noise

Diffusion models represent a fundamentally different approach to generative AI compared to autoregressive models like GPT. Instead of generating outputs token-by-token, diffusion models start with pure random noise and gradually refine it into a coherent image through a sequence of small denoising steps. The mathematical framework is elegant: define a forward process that progressively adds Gaussian noise to a real image until it becomes indistinguishable from pure noise, then train a neural network to learn the reverse process – predicting and removing the noise at each step.

The forward process is a fixed Markov chain: \(x_t = \sqrt{\alpha_t} x_{t-1} + \sqrt{1 - \alpha_t} \epsilon\) where \(\epsilon \sim \mathcal{N}(0, I)\). After \(T\) steps (typically 1000), the original image \(x_0\) has been completely destroyed. The neural network learns to approximate the reverse: given a noisy image \(x_t\) and the timestep \(t\), predict the noise \(\epsilon\) that was added. At generation time, you sample random noise and iteratively denoise it, producing a realistic image from nothing. This is the foundation of DALL-E 2, Stable Diffusion, and Midjourney.
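Composing the forward steps gives a useful closed form, \(x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon\) with \(\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s\), so a noisy sample at any timestep can be drawn in one shot. A minimal NumPy sketch of the forward process on a toy "image" (the linear beta schedule from 1e-4 to 0.02 is an assumption, matching common DDPM setups):

```python
import numpy as np

# Assumed linear noise schedule over T steps (beta_t from 1e-4 to 0.02)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product: alpha_bar_t

# A toy 8x8 gradient standing in for a real image x_0
x0 = np.linspace(-1, 1, 64).reshape(8, 8)

def forward_sample(x0, t, rng):
    """Sample x_t directly via the closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
for t in [0, 249, 499, 999]:
    xt, _ = forward_sample(x0, t, rng)
    print(f"step {t+1:4d}: signal weight = {np.sqrt(alpha_bars[t]):.3f}")
```

By the last step the signal weight is essentially zero: the image contributes almost nothing to \(x_T\), which is why \(x_T\) is treated as pure Gaussian noise.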

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Circle

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (16, 10)
np.random.seed(42)

How Diffusion Works

Training

  1. Take a real image

  2. Add noise gradually

  3. Train a network to predict the noise
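The three training steps above can be sketched in NumPy. The "network" here is a zero-returning placeholder (a real system would use a U-Net), and the linear beta schedule is an assumption; what the sketch shows is the objective itself, a mean-squared error between the true and predicted noise:

```python
import numpy as np

rng = np.random.default_rng(42)

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)

def dummy_eps_model(x_t, t):
    """Placeholder for the noise-prediction network eps_theta(x_t, t).
    A real model would be a trained U-Net; this just returns zeros."""
    return np.zeros_like(x_t)

# One conceptual training step:
x0 = rng.standard_normal((4, 8, 8))    # step 1: a batch of "real images"
t = rng.integers(0, T, size=4)         # random timestep for each example
eps = rng.standard_normal(x0.shape)    # the noise the model must predict
ab = alpha_bars[t][:, None, None]
x_t = np.sqrt(ab) * x0 + np.sqrt(1 - ab) * eps   # step 2: add noise (closed form)

# step 3: MSE between predicted and actual noise (the training loss)
loss = np.mean((dummy_eps_model(x_t, t) - eps) ** 2)
print(f"noise-prediction MSE: {loss:.3f}")
```

With the zero-returning placeholder the loss is roughly the variance of the noise itself (about 1.0); training drives it down by making the model's prediction match `eps`.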

Generation

  1. Start with pure noise

  2. Denoise step by step

  3. Get realistic image!
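The generation steps above follow the standard DDPM reverse update. This sketch uses a zero-returning stand-in for the trained noise predictor, so it demonstrates only the mechanics of the loop, not real image synthesis; the beta schedule is again an assumption:

```python
import numpy as np

rng = np.random.default_rng(7)

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x_t, t):
    """Placeholder for a trained noise predictor eps_theta(x_t, t)."""
    return np.zeros_like(x_t)

x = rng.standard_normal((8, 8))        # step 1: start from pure noise
for t in reversed(range(T)):           # step 2: denoise from t = T down to 1
    z = rng.standard_normal(x.shape) if t > 0 else 0.0
    eps_hat = eps_model(x, t)
    # DDPM reverse update: subtract the predicted noise contribution,
    # rescale, then inject a small amount of fresh noise (except at t = 0)
    x = (x - (1 - alphas[t]) / np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    x = x + np.sqrt(betas[t]) * z

print("final sample shape:", x.shape)  # step 3: with a trained model, x is an image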

Applications: DALL-E, Stable Diffusion, Midjourney

# Diffusion concept: two opposite processes
print("Forward: Image -> Noise")
print("Reverse: Noise -> Image")
print("\nTrained on millions of images!")