Containerizing ML Applications with Docker
🎯 Learning Objectives
Understand Docker concepts
Build Docker images for ML apps
Create multi-stage builds
Optimize image size
Use Docker Compose for multi-container apps
Why Docker?
Problems without Docker:
"Works on my machine" syndrome
Dependency conflicts
Environment inconsistencies
Difficult deployment
Docker benefits:
Consistent environments
Easy deployment
Isolation
Reproducibility
Scalability
Docker Basics
Key Concepts
Image: Blueprint for containers (like a class)
Container: Running instance of an image (like an object)
Dockerfile: Instructions to build an image
Registry: Storage for images (Docker Hub, ECR, etc.)
Basic Commands
# Build image
docker build -t myapp:v1 .
# Run container
docker run -p 8000:8000 myapp:v1
# List containers
docker ps
# Stop container
docker stop <container_id>
# Remove container
docker rm <container_id>
Simple ML API Dockerfile
A Dockerfile is a recipe that describes how to build an image layer by layer. The pattern below follows a standard convention for Python ML applications: start from a slim base image, install dependencies first (to leverage Docker's layer caching), then copy the application code. The WORKDIR /app directive sets the working directory inside the container, EXPOSE 8000 documents which port the app listens on, and CMD specifies the default command when the container starts. By copying requirements.txt before the rest of the code, Docker can skip reinstalling packages when only your application logic changes, a significant speedup during development.
# Create a simple Dockerfile
dockerfile_content = '''FROM python:3.11-slim
WORKDIR /app
# Copy requirements
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Expose port
EXPOSE 8000
# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''
# Save to file (in practice)
print("Dockerfile created:")
print(dockerfile_content)
Requirements File
Pinning exact dependency versions in requirements.txt is essential for reproducible builds. Without version pins, running pip install on different days could pull different package versions, leading to subtle bugs or broken models. For ML applications, this is especially important because numerical libraries like NumPy and scikit-learn can produce slightly different results across versions. The --no-cache-dir flag used in the Dockerfile prevents pip from storing downloaded packages in the container, keeping the image size smaller.
requirements = '''fastapi==0.104.1
uvicorn[standard]==0.24.0
scikit-learn==1.3.2
joblib==1.3.2
numpy==1.26.2
pydantic==2.5.0
'''
print("requirements.txt:")
print(requirements)
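Since unpinned lines are the usual source of drift, a small pre-build check can flag them before an image is built. This is an illustrative sketch, not part of the project's tooling; the helper name and sample input are hypothetical.

```python
def unpinned_requirements(text):
    """Return requirement lines that lack an exact == version pin."""
    unpinned = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if "==" not in line:
            unpinned.append(line)
    return unpinned

example_reqs = """fastapi==0.104.1
uvicorn[standard]==0.24.0
scikit-learn==1.3.2
numpy>=1.26
"""
print(unpinned_requirements(example_reqs))  # ['numpy>=1.26']
```

Running this in CI before `docker build` catches loose constraints like `numpy>=1.26` that would otherwise float between builds.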
Building and Running
# Build the image
docker build -t ml-api:v1 .
# Run the container
docker run -d -p 8000:8000 --name ml-api ml-api:v1
# Check logs
docker logs ml-api
# Test the API
curl http://localhost:8000/health
Multi-Stage Build (Optimized)
A multi-stage build uses two or more FROM statements to separate the build environment from the runtime environment. The first stage (builder) installs compilers like gcc and g++ needed to build C-extension packages (NumPy, scikit-learn), then pip-installs everything into the user directory. The second stage starts from a clean python:3.11-slim image and copies only the installed packages, leaving behind all build tools. The result is a dramatically smaller image, often a 50-70% reduction, which means faster pulls, lower storage costs, and a reduced attack surface since the production container has no compilers or development headers.
optimized_dockerfile = '''# Stage 1: Build dependencies
FROM python:3.11-slim AS builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc \
g++ \
&& rm -rf /var/lib/apt/lists/*
# Install Python packages
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
# Stage 2: Runtime
FROM python:3.11-slim
WORKDIR /app
# Copy only necessary files from builder
COPY --from=builder /root/.local /root/.local
# Copy application code
COPY main.py .
COPY models/ models/
# Update PATH
ENV PATH=/root/.local/bin:$PATH
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''
print("Optimized Multi-Stage Dockerfile:")
print(optimized_dockerfile)
print("\nBenefits:")
print("- Smaller image size (no build tools in final image)")
print("- Faster deployment")
print("- Better security (fewer packages)")
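To put a number on the size claim, the reduction is simple arithmetic; the image sizes below are hypothetical examples, not measurements (check real sizes with `docker images`):

```python
def size_reduction_pct(before_mb, after_mb):
    """Percent size reduction between single-stage and multi-stage images."""
    return round((1 - after_mb / before_mb) * 100, 1)

# Hypothetical: a 1.2 GB single-stage image shrinks to 450 MB
print(size_reduction_pct(1200, 450))  # 62.5
```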
Docker Compose for Multi-Container Apps
Production ML systems rarely consist of a single container. Docker Compose lets you define and run multi-container applications with a single YAML file. The configuration below orchestrates three services: the ML API, a Redis cache for storing prediction results, and Prometheus for metrics collection. The depends_on directive ensures Redis is started before the API (start order only, not readiness), and volume mounts let you share model files and configuration between the host and containers. In development, docker-compose up -d starts the entire stack with one command; in production, this same topology translates to Kubernetes manifests or cloud-native equivalents.
docker_compose = '''version: '3.8'
services:
# ML API Service
api:
build: .
ports:
- "8000:8000"
environment:
- MODEL_PATH=/app/models
- REDIS_HOST=redis
depends_on:
- redis
volumes:
- ./models:/app/models
# Redis for caching
redis:
image: redis:7-alpine
ports:
- "6379:6379"
# Monitoring
prometheus:
image: prom/prometheus:latest
ports:
- "9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
'''
print("docker-compose.yml:")
print(docker_compose)
print("\nUsage:")
print(" docker-compose up -d # Start all services")
print(" docker-compose down # Stop all services")
print(" docker-compose logs -f # View logs")
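The compose file mounts a prometheus.yml from the host but never shows its contents. A minimal sketch might look like the following, assuming the API exposes a /metrics endpoint (e.g. via a Prometheus client library; the job name and interval are illustrative):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "ml-api"
    static_configs:
      - targets: ["api:8000"]  # Compose service name, resolved by Docker's internal DNS
```

Note that the target uses the service name `api`, not `localhost`: containers on the same Compose network reach each other by service name.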
API with Redis Caching
Caching is one of the simplest ways to reduce inference latency and compute costs. If the same input appears repeatedly (common in recommendation systems and search), you can store the prediction and return it instantly on subsequent requests. The implementation below uses Redis as an external cache, hashing the input features with MD5 to create a deterministic cache key. The setex method stores results with a 3600-second TTL (time to live), ensuring stale predictions are eventually evicted. Returning a cached flag in the response helps monitoring systems distinguish cache hits from actual model invocations.
from fastapi import FastAPI
import redis
import json
import hashlib
app = FastAPI()
# Connect to Redis
try:
    cache = redis.Redis(host='redis', port=6379, decode_responses=True)
    cache.ping()
    print("✅ Connected to Redis")
except redis.ConnectionError:
    cache = None
    print("⚠️ Redis not available, running without cache")
def get_cache_key(data):
"""Generate cache key from input"""
return hashlib.md5(json.dumps(data, sort_keys=True).encode()).hexdigest()
@app.post("/predict")
async def predict(features: dict):
# Check cache
if cache:
cache_key = get_cache_key(features)
cached_result = cache.get(cache_key)
if cached_result:
            print("✅ Cache hit")
return {"prediction": json.loads(cached_result), "cached": True}
# Perform prediction (mock)
result = {"class": 0, "confidence": 0.95}
# Cache result (1 hour TTL)
if cache:
cache.setex(cache_key, 3600, json.dumps(result))
        print("✅ Result cached")
return {"prediction": result, "cached": False}
print("API with caching ready")
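Because get_cache_key serializes with sort_keys=True, the order of keys in the input dict does not change the resulting cache key. A quick standalone check (the helper is repeated here so the cell runs on its own; the feature names are made up):

```python
import hashlib
import json

def get_cache_key(data):
    """Generate a deterministic cache key from input features."""
    return hashlib.md5(json.dumps(data, sort_keys=True).encode()).hexdigest()

# Same features, different key order -> identical cache key
k1 = get_cache_key({"age": 42, "income": 55000})
k2 = get_cache_key({"income": 55000, "age": 42})
print(k1 == k2)  # True
print(len(k1))   # 32 (MD5 digest as hex)
```

Without sort_keys=True, logically identical requests could hash to different keys and silently miss the cache.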
Best Practices
Image Optimization
Use specific base images
FROM python:3.11-slim # Not 'latest'
Minimize layers
# Bad (multiple layers)
RUN pip install pandas
RUN pip install numpy

# Good (single layer)
RUN pip install pandas numpy
Use .dockerignore
__pycache__
*.pyc
.git
.env
tests/
*.md
Order matters (cache)
# Copy requirements first (changes less often)
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy code last (changes often)
COPY . .
Health Checks
Docker's HEALTHCHECK instruction defines a command that runs periodically inside the container to verify the application is still functioning. If the health check fails consecutively (controlled by --retries), Docker marks the container as unhealthy, which orchestrators such as Docker Swarm use to trigger automatic restarts (Kubernetes ignores Dockerfile health checks and uses its own liveness probes instead). The --start-period gives the application time to initialize before failed checks count against the retry limit; this matters for ML APIs that need to load large models into memory at startup. A well-configured health check is the difference between a service that self-heals and one that silently serves errors.
healthcheck_dockerfile = '''FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Add health check (slim images don't ship curl, so probe with Python's urllib)
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''
print("Dockerfile with health check:")
print(healthcheck_dockerfile)
Deployment Workflow
1. Build
docker build -t myapp:v1.0 .
2. Test Locally
docker run -p 8000:8000 myapp:v1.0
curl http://localhost:8000/health
3. Tag for Registry
docker tag myapp:v1.0 myregistry.com/myapp:v1.0
4. Push to Registry
docker push myregistry.com/myapp:v1.0
5. Deploy
# Pull and run on production server
docker pull myregistry.com/myapp:v1.0
docker run -d -p 8000:8000 myregistry.com/myapp:v1.0
Key Takeaways
✅ Docker ensures consistent environments
✅ Multi-stage builds reduce image size
✅ Docker Compose manages multi-container apps
✅ Caching speeds up predictions
✅ Health checks enable monitoring
✅ Follow best practices for production