Stable Diffusion: Text-to-Image Generation

Stable Diffusion is an open-source latent diffusion model that generates high-quality images from text prompts. Run it locally for free, fine-tune it, and control generation precisely.

# Install required packages (December 2025)
# Quote the specifiers: an unquoted ">" is treated as shell redirection
# !pip install "diffusers>=0.31.0" "transformers>=4.47.0" "accelerate>=1.2.0" "safetensors>=0.4.0"
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from PIL import Image
import matplotlib.pyplot as plt

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

1. Load Model (December 2025)

Model Options

Model         Quality       Speed      VRAM    Release
SDXL Turbo    Excellent     Very Fast  6GB     2023
SDXL          Excellent     Medium     8GB     2023
SD 3.5        Best          Medium     10GB+   2024
FLUX.1        Best Overall  Slow      12GB+   2024

December 2025 Recommendations:

  • Best Quality: FLUX.1-dev or SD 3.5 Large

  • Best Speed: SDXL Turbo (1-4 steps)

  • Balanced: SDXL (still excellent)

  • Legacy: SD 1.5/2.1 (outdated)
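The table above can be turned into a simple selection helper. A minimal sketch, assuming the VRAM figures from the table; `pick_model` and `MODEL_VRAM` are hypothetical names for illustration, not part of diffusers:

```python
# Rough VRAM requirements from the table above (GB), keyed by Hub model id
MODEL_VRAM = {
    "stabilityai/sdxl-turbo": 6,
    "stabilityai/stable-diffusion-xl-base-1.0": 8,
    "stabilityai/stable-diffusion-3.5-large": 10,
    "black-forest-labs/FLUX.1-dev": 12,
}

def pick_model(vram_gb):
    """Return the most capable model that fits in the given VRAM."""
    candidates = [(req, name) for name, req in MODEL_VRAM.items() if req <= vram_gb]
    if not candidates:
        raise ValueError(f"Need at least 6GB of VRAM, have {vram_gb}GB")
    # In this table, a higher requirement roughly tracks higher quality
    return max(candidates)[1]

print(pick_model(8))   # SDXL base fits; SD 3.5 and FLUX.1 do not
print(pick_model(16))  # everything fits, so FLUX.1-dev wins
```

Pair this with the `torch.cuda.get_device_properties(0).total_memory` check from the setup cell to choose a model automatically.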

# Load SDXL Turbo (December 2025 - Fast & Quality)
model_id = "stabilityai/sdxl-turbo"
# Modern alternatives (2024-2025):
# "stabilityai/stable-diffusion-3.5-large" - Best quality from Stability
# "black-forest-labs/FLUX.1-dev" - Best overall (requires login)
# "stabilityai/stable-diffusion-xl-base-1.0" - SDXL base (slower but good)

# Legacy models (2023):
# "stable-diffusion-v1-5/stable-diffusion-v1-5" - SD 1.5 (the original runwayml repo was removed)
# "stabilityai/stable-diffusion-2-1" - Better than 1.5

# Load pipeline - AutoPipelineForText2Image picks the correct pipeline class
# (SD, SDXL, SD3, FLUX) for the checkpoint; StableDiffusionPipeline only
# loads SD 1.x/2.x models and would fail on SDXL Turbo
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # Half precision for speed
    variant="fp16",             # Use FP16 weights
)

# SDXL Turbo is distilled: keep its default scheduler and run 1-4 steps
# with guidance_scale=0.0. For non-Turbo models a faster scheduler helps:
# pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Move to GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = pipe.to(device)

print(f"Model: {model_id}")
print(f"Device: {device}")
if torch.cuda.is_available():
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")

2. Generate Your First Image

# Simple generation
prompt = "a photo of an astronaut riding a horse on mars"

# Note: SDXL Turbo expects num_inference_steps=1-4 with guidance_scale=0.0;
# the values below are typical for standard SD/SDXL models
image = pipe(
    prompt,
    num_inference_steps=25,  # More steps = better quality (but slower)
    guidance_scale=7.5,      # How closely to follow the prompt (higher = more literal)
).images[0]

# Display
plt.figure(figsize=(8, 8))
plt.imshow(image)
plt.axis('off')
plt.title(prompt)
plt.show()

# Save
image.save("astronaut_horse.png")
print("Image saved!")

3. Prompt Engineering for Images

Quality prompts make quality images!

# Prompt structure
def create_prompt(
    subject,
    style="",
    lighting="",
    quality_tags="high quality, detailed"
):
    """Build a good prompt."""
    parts = [subject]
    if style:
        parts.append(style)
    if lighting:
        parts.append(lighting)
    if quality_tags:
        parts.append(quality_tags)
    return ", ".join(parts)

# Examples
prompts = [
    create_prompt(
        "a majestic lion",
        style="digital art, fantasy",
        lighting="golden hour lighting"
    ),
    create_prompt(
        "a futuristic city",
        style="cyberpunk, neon lights",
        lighting="night time"
    ),
    create_prompt(
        "a serene mountain landscape",
        style="oil painting, impressionist",
        lighting="soft morning light"
    ),
]

# Generate all
fig, axes = plt.subplots(1, len(prompts), figsize=(15, 5))

for ax, prompt in zip(axes, prompts):
    image = pipe(prompt, num_inference_steps=25).images[0]
    ax.imshow(image)
    ax.set_title(prompt[:40] + "...", fontsize=8)
    ax.axis('off')

plt.tight_layout()
plt.show()

4. Control Generation Parameters

prompt = "a fantasy castle on a floating island"

# Try different guidance scales
guidance_scales = [3, 7.5, 15]

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for ax, scale in zip(axes, guidance_scales):
    image = pipe(
        prompt,
        num_inference_steps=25,
        guidance_scale=scale,
        generator=torch.manual_seed(42)  # Same seed for comparison
    ).images[0]
    
    ax.imshow(image)
    ax.set_title(f"Guidance: {scale}")
    ax.axis('off')

plt.suptitle(prompt)
plt.tight_layout()
plt.show()
# Different seeds = different images
prompt = "a cute robot assistant"

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for ax, seed in zip(axes, [42, 123, 999]):
    image = pipe(
        prompt,
        num_inference_steps=25,
        generator=torch.manual_seed(seed)
    ).images[0]
    
    ax.imshow(image)
    ax.set_title(f"Seed: {seed}")
    ax.axis('off')

plt.suptitle(prompt)
plt.tight_layout()
plt.show()
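The reason a fixed seed reproduces an image can be checked without a GPU: a seeded generator always emits the identical sequence, and the initial latent noise is the only random input to the pipeline. A stdlib analogy (`random.Random` standing in for `torch.Generator`):

```python
import random

def sample(seed, n=5):
    rng = random.Random(seed)       # independent generator, like torch.Generator
    return [rng.random() for _ in range(n)]

assert sample(42) == sample(42)     # same seed -> identical "noise" -> identical image
assert sample(42) != sample(123)    # different seed -> different starting noise
print("seeding is deterministic")
```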

5. Negative Prompts

Tell the model what NOT to include.

prompt = "a beautiful portrait of a woman"
negative_prompt = "blurry, distorted, ugly, bad anatomy, extra limbs"

# Without negative prompt
image1 = pipe(
    prompt,
    num_inference_steps=30,
    generator=torch.manual_seed(42)
).images[0]

# With negative prompt
image2 = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    generator=torch.manual_seed(42)
).images[0]

# Compare
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.imshow(image1)
ax1.set_title("Without Negative Prompt")
ax1.axis('off')

ax2.imshow(image2)
ax2.set_title("With Negative Prompt")
ax2.axis('off')

plt.tight_layout()
plt.show()

6. Batch Generation

# Generate multiple variations
prompt = "a magical forest with glowing mushrooms"

images = pipe(
    prompt,
    num_images_per_prompt=4,  # Generate 4 at once
    num_inference_steps=25
).images

# Display grid
fig, axes = plt.subplots(2, 2, figsize=(10, 10))

for ax, img in zip(axes.flat, images):
    ax.imshow(img)
    ax.axis('off')

plt.suptitle(prompt, fontsize=14)
plt.tight_layout()
plt.show()

7. Image-to-Image Generation

Start from an existing image.

from diffusers import AutoPipelineForImage2Image

# Load img2img pipeline - AutoPipelineForImage2Image resolves the right
# class for the checkpoint and reuses the already-downloaded weights
img2img_pipe = AutoPipelineForImage2Image.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    variant="fp16"
).to(device)

# Create or load initial image
init_image = pipe(
    "a simple sketch of a house",
    num_inference_steps=20
).images[0]

# Transform it
new_image = img2img_pipe(
    prompt="a beautiful victorian mansion, detailed architecture, photorealistic",
    image=init_image,
    strength=0.75,  # Noise amount: 0 keeps the input unchanged, 1 ignores it entirely
    num_inference_steps=30
).images[0]

# Compare
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.imshow(init_image)
ax1.set_title("Initial Image")
ax1.axis('off')

ax2.imshow(new_image)
ax2.set_title("Transformed")
ax2.axis('off')

plt.tight_layout()
plt.show()
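The `strength` parameter is worth unpacking: img2img injects noise part-way into the diffusion schedule, so only roughly `num_inference_steps * strength` denoising steps actually run. A quick check of that arithmetic (the exact rounding is an assumption about diffusers' behavior; treat this as an approximation):

```python
def effective_steps(num_inference_steps, strength):
    """Approximate number of denoising steps img2img actually runs."""
    return min(int(num_inference_steps * strength), num_inference_steps)

# With the settings used above: 30 steps at strength 0.75
print(effective_steps(30, 0.75))  # -> 22
print(effective_steps(30, 1.0))   # -> 30 (behaves like text-to-image)
print(effective_steps(30, 0.1))   # -> 3 (barely changes the input)
```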

Prompt Engineering Tips

Structure Your Prompts

[Subject], [Style], [Lighting], [Quality modifiers]

Examples:

# ✅ Good
"a dragon, fantasy art style, dramatic lighting, highly detailed, 8k"

# ❌ Bad
"dragon"

Style Keywords

  • Photorealistic: “photorealistic”, “photo”, “4k”, “8k”

  • Artistic: “oil painting”, “watercolor”, “digital art”, “concept art”

  • 3D: “3d render”, “octane render”, “unreal engine”

  • Specific artists: “in the style of Van Gogh”, “by Greg Rutkowski”

Lighting

  • “golden hour”, “soft lighting”, “dramatic lighting”

  • “volumetric lighting”, “studio lighting”

  • “sunrise”, “sunset”, “neon lights”

Quality Boosters

  • “highly detailed”, “intricate details”

  • “masterpiece”, “award winning”

  • “trending on artstation”

Common Negative Prompts

negative = "blurry, low quality, distorted, ugly, bad anatomy, "
negative += "deformed, disfigured, extra limbs, missing limbs, "
negative += "watermark, text, signature"
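Since a negative prompt is just a comma-joined string, building it from lists avoids the error-prone concatenation above (a missing trailing space silently fuses two terms). A small hypothetical helper:

```python
def build_negative_prompt(*groups):
    """Join terms from several lists into one deduplicated negative prompt."""
    seen = []
    for group in groups:
        for term in group:
            term = term.strip()
            if term and term not in seen:
                seen.append(term)
    return ", ".join(seen)

QUALITY = ["blurry", "low quality", "distorted", "ugly", "bad anatomy"]
ANATOMY = ["deformed", "disfigured", "extra limbs", "missing limbs"]
ARTIFACTS = ["watermark", "text", "signature", "low quality"]  # duplicate dropped

negative = build_negative_prompt(QUALITY, ANATOMY, ARTIFACTS)
print(negative)
```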

Performance Optimization

Memory Optimization

# Enable attention slicing (reduces VRAM at a small speed cost)
pipe.enable_attention_slicing()

# Enable VAE slicing (helps with large images and batched decoding)
pipe.enable_vae_slicing()

# Sequential CPU offload (runs on very low VRAM, but much slower).
# Call this INSTEAD of pipe.to("cuda"), not after it.
pipe.enable_sequential_cpu_offload()

Speed Optimization

# Fewer steps (faster, lower quality)
num_inference_steps=15  # vs 50

# Use faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config
)

# Compile the UNet (PyTorch 2.0+); the first call is slow while
# compilation runs, subsequent calls are faster
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead")
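To confirm that any of these tweaks actually helps on your hardware, time a few generations before and after. A minimal stdlib timer (the `time.sleep` stands in for the `pipe(...)` call you would wrap):

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    """Print wall-clock time for the enclosed block."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")

# Usage - wrap a generation call (simulated here with a short sleep):
with timer("25-step generation"):
    time.sleep(0.1)  # replace with: pipe(prompt, num_inference_steps=25)
```

Remember to discard the first timed run when using `torch.compile`, since it includes compilation time.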

Exercise: Create an Image Generator

Build a tool to generate variations of a concept.

class ImageGenerator:
    def __init__(self, pipe):
        self.pipe = pipe
    
    def generate_variations(
        self,
        base_prompt,
        styles,
        negative_prompt=None,
        seed=42
    ):
        """Generate same subject in different styles."""
        images = []
        prompts = []
        
        for style in styles:
            prompt = f"{base_prompt}, {style}"
            prompts.append(prompt)
            
            image = self.pipe(
                prompt,
                negative_prompt=negative_prompt,
                num_inference_steps=25,
                generator=torch.manual_seed(seed)
            ).images[0]
            
            images.append(image)
        
        return images, prompts

# Use it
generator = ImageGenerator(pipe)

images, prompts = generator.generate_variations(
    base_prompt="a cat wearing a wizard hat",
    styles=[
        "watercolor painting",
        "digital art, vibrant colors",
        "photorealistic, studio lighting"
    ],
    negative_prompt="blurry, low quality"
)

# Display
fig, axes = plt.subplots(1, len(images), figsize=(15, 5))
for ax, img, prompt in zip(axes, images, prompts):
    ax.imshow(img)
    ax.set_title(prompt.split(",")[-1].strip(), fontsize=10)
    ax.axis('off')
plt.tight_layout()
plt.show()

Key Takeaways

  1. Open source: Free to use and modify

  2. Runs locally: Privacy and control

  3. Prompt engineering: Key to good results

  4. Flexible: Many parameters to tune

  5. Active community: Thousands of models and LoRAs

December 2025 State of the Art

Best Models:

  1. FLUX.1 (Black Forest Labs) - Best quality, photorealism

  2. SD 3.5 Large (Stability AI) - Excellent, fast iteration

  3. SDXL Turbo - Fast (1-4 steps), good quality

  4. SDXL - Solid baseline, widely supported

For Production (December 2025):

  • API: OpenAI DALL-E 3, Midjourney, Ideogram

  • Self-hosted: FLUX.1, SD 3.5, SDXL

  • Fast iteration: SDXL Turbo, LCM-LoRA

Cost Comparison:

  • Local (SDXL Turbo): Free after GPU cost

  • DALL-E 3: $0.040-0.120 per image

  • Midjourney: $10-60/month subscription
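The figures above imply a simple break-even calculation: divide the one-time GPU cost by the per-image API price. A sketch using the DALL-E 3 range quoted above (the $500 GPU price is an assumption for illustration):

```python
def break_even_images(gpu_cost, api_price_per_image):
    """Number of images at which a local GPU pays for itself vs an API."""
    return gpu_cost / api_price_per_image

# Assumed $500 GPU vs DALL-E 3 at $0.040-$0.120 per image (figures above)
print(f"{break_even_images(500, 0.120):,.0f} images at the high end")  # -> 4,167
print(f"{break_even_images(500, 0.040):,.0f} images at the low end")   # -> 12,500
```

Electricity and your time are not included, so treat the result as a lower bound on the break-even point.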

Next Steps

  • 02_sdxl_advanced.ipynb - SDXL with ControlNet, IP-Adapter

  • 03_flux_basics.ipynb - FLUX.1 for best quality

  • 04_lora_training.ipynb - Fine-tune for specific styles

  • 05_controlnet.ipynb - Precise control over composition

  • 06_inpainting.ipynb - Edit parts of images