Stable Diffusion: Text-to-Image Generation

Stable Diffusion is an open-source latent diffusion model that generates high-quality images from text prompts. Run it locally for free, fine-tune it, and control generation precisely.

# Install required packages (December 2025)
# Quote the specifiers: an unquoted ">" is treated as shell redirection
# !pip install "diffusers>=0.31.0" "transformers>=4.47.0" "accelerate>=1.2.0" "safetensors>=0.4.0"
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from PIL import Image
import matplotlib.pyplot as plt

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

1. Load Model (December 2025)

Model Options

Model         Quality       Speed      VRAM    Release
SDXL Turbo    Excellent     Very Fast  6GB     2023
SDXL          Excellent     Medium     8GB     2023
SD 3.5        Best          Medium     10GB+   2024
FLUX.1        Best Overall  Slow      12GB+   2024

December 2025 Recommendations:

  • Best Quality: FLUX.1-dev or SD 3.5 Large

  • Best Speed: SDXL Turbo (1-4 steps)

  • Balanced: SDXL (still excellent)

  • Legacy: SD 1.5/2.1 (outdated)
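The table above can be turned into a simple selection helper. A minimal sketch, assuming the VRAM figures from the table; `pick_model` and `MODEL_VRAM` are hypothetical names for illustration, not part of diffusers:

```python
# Rough VRAM requirements from the table above (GB), keyed by Hub model id
MODEL_VRAM = {
    "stabilityai/sdxl-turbo": 6,
    "stabilityai/stable-diffusion-xl-base-1.0": 8,
    "stabilityai/stable-diffusion-3.5-large": 10,
    "black-forest-labs/FLUX.1-dev": 12,
}

def pick_model(vram_gb):
    """Return the most capable model that fits in the given VRAM."""
    candidates = [(req, name) for name, req in MODEL_VRAM.items() if req <= vram_gb]
    if not candidates:
        raise ValueError(f"Need at least 6GB of VRAM, have {vram_gb}GB")
    # In this table, a higher requirement roughly tracks higher quality
    return max(candidates)[1]

print(pick_model(8))   # SDXL base fits; SD 3.5 and FLUX.1 do not
print(pick_model(16))  # everything fits, so FLUX.1-dev wins
```

Pair this with the `torch.cuda.get_device_properties(0).total_memory` check from the setup cell to choose a model automatically.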

# Load SDXL Turbo (December 2025 - Fast & Quality)
model_id = "stabilityai/sdxl-turbo"
# Modern alternatives (2024-2025):
# "stabilityai/stable-diffusion-3.5-large" - Best quality from Stability
# "black-forest-labs/FLUX.1-dev" - Best overall (requires login)
# "stabilityai/stable-diffusion-xl-base-1.0" - SDXL base (slower but good)

# Legacy models (2023):
# "stable-diffusion-v1-5/stable-diffusion-v1-5" - SD 1.5 (the original runwayml repo was removed)
# "stabilityai/stable-diffusion-2-1" - Better than 1.5

# Load pipeline - AutoPipelineForText2Image picks the correct pipeline class
# (SD, SDXL, SD3, FLUX) for the checkpoint; StableDiffusionPipeline only
# loads SD 1.x/2.x models and would fail on SDXL Turbo
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # Half precision for speed
    variant="fp16",             # Use FP16 weights
)

# SDXL Turbo is distilled: keep its default scheduler and run 1-4 steps
# with guidance_scale=0.0. For non-Turbo models a faster scheduler helps:
# pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Move to GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = pipe.to(device)

print(f"Model: {model_id}")
print(f"Device: {device}")
if torch.cuda.is_available():
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")

2. Generate Your First Image

# Simple generation
prompt = "a photo of an astronaut riding a horse on mars"

# Note: SDXL Turbo expects num_inference_steps=1-4 with guidance_scale=0.0;
# the values below are typical for standard SD/SDXL models
image = pipe(
    prompt,
    num_inference_steps=25,  # More steps = better quality (but slower)
    guidance_scale=7.5,      # How closely to follow the prompt (higher = more literal)
).images[0]

# Display
plt.figure(figsize=(8, 8))
plt.imshow(image)
plt.axis('off')
plt.title(prompt)
plt.show()

# Save
image.save("astronaut_horse.png")
print("Image saved!")

3. Prompt Engineering for Images

Quality prompts make quality images!

# Prompt structure
def create_prompt(
    subject,
    style="",
    lighting="",
    quality_tags="high quality, detailed"
):
    """Build a good prompt."""
    parts = [subject]
    if style:
        parts.append(style)
    if lighting:
        parts.append(lighting)
    if quality_tags:
        parts.append(quality_tags)
    return ", ".join(parts)

# Examples
prompts = [
    create_prompt(
        "a majestic lion",
        style="digital art, fantasy",
        lighting="golden hour lighting"
    ),
    create_prompt(
        "a futuristic city",
        style="cyberpunk, neon lights",
        lighting="night time"
    ),
    create_prompt(
        "a serene mountain landscape",
        style="oil painting, impressionist",
        lighting="soft morning light"
    ),
]

# Generate all
fig, axes = plt.subplots(1, len(prompts), figsize=(15, 5))

for ax, prompt in zip(axes, prompts):
    image = pipe(prompt, num_inference_steps=25).images[0]
    ax.imshow(image)
    ax.set_title(prompt[:40] + "...", fontsize=8)
    ax.axis('off')

plt.tight_layout()
plt.show()

4. Control Generation Parameters

prompt = "a fantasy castle on a floating island"

# Try different guidance scales
guidance_scales = [3, 7.5, 15]

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for ax, scale in zip(axes, guidance_scales):
    image = pipe(
        prompt,
        num_inference_steps=25,
        guidance_scale=scale,
        generator=torch.manual_seed(42)  # Same seed for comparison
    ).images[0]
    
    ax.imshow(image)
    ax.set_title(f"Guidance: {scale}")
    ax.axis('off')

plt.suptitle(prompt)
plt.tight_layout()
plt.show()
# Different seeds = different images
prompt = "a cute robot assistant"

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for ax, seed in zip(axes, [42, 123, 999]):
    image = pipe(
        prompt,
        num_inference_steps=25,
        generator=torch.manual_seed(seed)
    ).images[0]
    
    ax.imshow(image)
    ax.set_title(f"Seed: {seed}")
    ax.axis('off')

plt.suptitle(prompt)
plt.tight_layout()
plt.show()
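The reason a fixed seed reproduces an image can be checked without a GPU: a seeded generator always emits the identical sequence, and the initial latent noise is the only random input to the pipeline. A stdlib analogy (`random.Random` standing in for `torch.Generator`):

```python
import random

def sample(seed, n=5):
    rng = random.Random(seed)       # independent generator, like torch.Generator
    return [rng.random() for _ in range(n)]

assert sample(42) == sample(42)     # same seed -> identical "noise" -> identical image
assert sample(42) != sample(123)    # different seed -> different starting noise
print("seeding is deterministic")
```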

5. Negative Prompts

Tell the model what NOT to include.

prompt = "a beautiful portrait of a woman"
negative_prompt = "blurry, distorted, ugly, bad anatomy, extra limbs"

# Without negative prompt
image1 = pipe(
    prompt,
    num_inference_steps=30,
    generator=torch.manual_seed(42)
).images[0]

# With negative prompt
image2 = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    generator=torch.manual_seed(42)
).images[0]

# Compare
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.imshow(image1)
ax1.set_title("Without Negative Prompt")
ax1.axis('off')

ax2.imshow(image2)
ax2.set_title("With Negative Prompt")
ax2.axis('off')

plt.tight_layout()
plt.show()

6. Batch Generation

# Generate multiple variations
prompt = "a magical forest with glowing mushrooms"

images = pipe(
    prompt,
    num_images_per_prompt=4,  # Generate 4 at once
    num_inference_steps=25
).images

# Display grid
fig, axes = plt.subplots(2, 2, figsize=(10, 10))

for ax, img in zip(axes.flat, images):
    ax.imshow(img)
    ax.axis('off')

plt.suptitle(prompt, fontsize=14)
plt.tight_layout()
plt.show()

7. Image-to-Image Generation

Start from an existing image.

from diffusers import AutoPipelineForImage2Image

# Load img2img pipeline - AutoPipelineForImage2Image resolves the right
# class for the checkpoint and reuses the already-downloaded weights
img2img_pipe = AutoPipelineForImage2Image.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    variant="fp16"
).to(device)

# Create or load initial image
init_image = pipe(
    "a simple sketch of a house",
    num_inference_steps=20
).images[0]

# Transform it
new_image = img2img_pipe(
    prompt="a beautiful victorian mansion, detailed architecture, photorealistic",
    image=init_image,
    strength=0.75,  # Noise amount: 0 keeps the input unchanged, 1 ignores it entirely
    num_inference_steps=30
).images[0]

# Compare
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.imshow(init_image)
ax1.set_title("Initial Image")
ax1.axis('off')

ax2.imshow(new_image)
ax2.set_title("Transformed")
ax2.axis('off')

plt.tight_layout()
plt.show()
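The `strength` parameter is worth unpacking: img2img injects noise part-way into the diffusion schedule, so only roughly `num_inference_steps * strength` denoising steps actually run. A quick check of that arithmetic (the exact rounding is an assumption about diffusers' behavior; treat this as an approximation):

```python
def effective_steps(num_inference_steps, strength):
    """Approximate number of denoising steps img2img actually runs."""
    return min(int(num_inference_steps * strength), num_inference_steps)

# With the settings used above: 30 steps at strength 0.75
print(effective_steps(30, 0.75))  # -> 22
print(effective_steps(30, 1.0))   # -> 30 (behaves like text-to-image)
print(effective_steps(30, 0.1))   # -> 3 (barely changes the input)
```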

Prompt Engineering Tips

Structure Your Prompts

[Subject], [Style], [Lighting], [Quality modifiers]

Examples:

# ✅ Good
"a dragon, fantasy art style, dramatic lighting, highly detailed, 8k"

# ❌ Bad
"dragon"

Style Keywords

  • Photorealistic: “photorealistic”, “photo”, “4k”, “8k”

  • Artistic: “oil painting”, “watercolor”, “digital art”, “concept art”

  • 3D: “3d render”, “octane render”, “unreal engine”

  • Specific artists: “in the style of Van Gogh”, “by Greg Rutkowski”

Lighting

  • “golden hour”, “soft lighting”, “dramatic lighting”

  • “volumetric lighting”, “studio lighting”

  • “sunrise”, “sunset”, “neon lights”

Quality Boosters

  • “highly detailed”, “intricate details”

  • “masterpiece”, “award winning”

  • “trending on artstation”

Common Negative Prompts

negative = "blurry, low quality, distorted, ugly, bad anatomy, "
negative += "deformed, disfigured, extra limbs, missing limbs, "
negative += "watermark, text, signature"
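Since a negative prompt is just a comma-joined string, building it from lists avoids the error-prone concatenation above (a missing trailing space silently fuses two terms). A small hypothetical helper:

```python
def build_negative_prompt(*groups):
    """Join terms from several lists into one deduplicated negative prompt."""
    seen = []
    for group in groups:
        for term in group:
            term = term.strip()
            if term and term not in seen:
                seen.append(term)
    return ", ".join(seen)

QUALITY = ["blurry", "low quality", "distorted", "ugly", "bad anatomy"]
ANATOMY = ["deformed", "disfigured", "extra limbs", "missing limbs"]
ARTIFACTS = ["watermark", "text", "signature", "low quality"]  # duplicate dropped

negative = build_negative_prompt(QUALITY, ANATOMY, ARTIFACTS)
print(negative)
```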

Performance Optimization

Memory Optimization

# Enable attention slicing (reduces VRAM at a small speed cost)
pipe.enable_attention_slicing()

# Enable VAE slicing (helps with large images and batched decoding)
pipe.enable_vae_slicing()

# Sequential CPU offload (runs on very low VRAM, but much slower).
# Call this INSTEAD of pipe.to("cuda"), not after it.
pipe.enable_sequential_cpu_offload()

Speed Optimization

# Fewer steps (faster, lower quality)
num_inference_steps=15  # vs 50

# Use faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config
)

# Compile the UNet (PyTorch 2.0+); the first call is slow while
# compilation runs, subsequent calls are faster
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead")
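To confirm that any of these tweaks actually helps on your hardware, time a few generations before and after. A minimal stdlib timer (the `time.sleep` stands in for the `pipe(...)` call you would wrap):

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    """Print wall-clock time for the enclosed block."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")

# Usage - wrap a generation call (simulated here with a short sleep):
with timer("25-step generation"):
    time.sleep(0.1)  # replace with: pipe(prompt, num_inference_steps=25)
```

Remember to discard the first timed run when using `torch.compile`, since it includes compilation time.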

Exercise: Create an Image Generator

Build a tool to generate variations of a concept.

class ImageGenerator:
    def __init__(self, pipe):
        self.pipe = pipe
    
    def generate_variations(
        self,
        base_prompt,
        styles,
        negative_prompt=None,
        seed=42
    ):
        """Generate same subject in different styles."""
        images = []
        prompts = []
        
        for style in styles:
            prompt = f"{base_prompt}, {style}"
            prompts.append(prompt)
            
            image = self.pipe(
                prompt,
                negative_prompt=negative_prompt,
                num_inference_steps=25,
                generator=torch.manual_seed(seed)
            ).images[0]
            
            images.append(image)
        
        return images, prompts

# Use it
generator = ImageGenerator(pipe)

images, prompts = generator.generate_variations(
    base_prompt="a cat wearing a wizard hat",
    styles=[
        "watercolor painting",
        "digital art, vibrant colors",
        "photorealistic, studio lighting"
    ],
    negative_prompt="blurry, low quality"
)

# Display
fig, axes = plt.subplots(1, len(images), figsize=(15, 5))
for ax, img, prompt in zip(axes, images, prompts):
    ax.imshow(img)
    ax.set_title(prompt.split(",")[-1].strip(), fontsize=10)
    ax.axis('off')
plt.tight_layout()
plt.show()

Key Takeaways

  1. Open source: Free to use and modify

  2. Runs locally: Privacy and control

  3. Prompt engineering: Key to good results

  4. Flexible: Many parameters to tune

  5. Active community: Thousands of models and LoRAs

December 2025 State of the Art

Best Models:

  1. FLUX.1 (Black Forest Labs) - Best quality, photorealism

  2. SD 3.5 Large (Stability AI) - Excellent, fast iteration

  3. SDXL Turbo - Fast (1-4 steps), good quality

  4. SDXL - Solid baseline, widely supported

For Production (December 2025):

  • API: OpenAI DALL-E 3, Midjourney, Ideogram

  • Self-hosted: FLUX.1, SD 3.5, SDXL

  • Fast iteration: SDXL Turbo, LCM-LoRA

Cost Comparison:

  • Local (SDXL Turbo): Free after GPU cost

  • DALL-E 3: $0.040-0.120 per image

  • Midjourney: $10-60/month subscription
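The figures above imply a simple break-even calculation: divide the one-time GPU cost by the per-image API price. A sketch using the DALL-E 3 range quoted above (the $500 GPU price is an assumption for illustration):

```python
def break_even_images(gpu_cost, api_price_per_image):
    """Number of images at which a local GPU pays for itself vs an API."""
    return gpu_cost / api_price_per_image

# Assumed $500 GPU vs DALL-E 3 at $0.040-$0.120 per image (figures above)
print(f"{break_even_images(500, 0.120):,.0f} images at the high end")  # -> 4,167
print(f"{break_even_images(500, 0.040):,.0f} images at the low end")   # -> 12,500
```

Electricity and your time are not included, so treat the result as a lower bound on the break-even point.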

Next Steps

  • 02_sdxl_advanced.ipynb - SDXL with ControlNet, IP-Adapter

  • 03_flux_basics.ipynb - FLUX.1 for best quality

  • 04_lora_training.ipynb - Fine-tune for specific styles

  • 05_controlnet.ipynb - Precise control over composition

  • 06_inpainting.ipynb - Edit parts of images