Containerizing ML Applications with Docker
🎯 Learning Objectives
Understand Docker concepts
Build Docker images for ML apps
Create multi-stage builds
Optimize image size
Use Docker Compose for multi-container apps
Why Docker?
Problems without Docker:
"Works on my machine" syndrome
Dependency conflicts
Environment inconsistencies
Difficult deployment
Docker benefits:
Consistent environments
Easy deployment
Isolation
Reproducibility
Scalability
Docker Basics
Key Concepts
Image: Blueprint for containers (like a class)
Container: Running instance of an image (like an object)
Dockerfile: Instructions to build an image
Registry: Storage for images (Docker Hub, ECR, etc.)
Basic Commands
# Build image
docker build -t myapp:v1 .
# Run container
docker run -p 8000:8000 myapp:v1
# List containers
docker ps
# Stop container
docker stop <container_id>
# Remove container
docker rm <container_id>
Simple ML API Dockerfile
A Dockerfile is a recipe that describes how to build an image layer by layer. The pattern below follows a standard convention for Python ML applications: start from a slim base image, install dependencies first (to leverage Docker's layer caching), then copy the application code. The WORKDIR /app directive sets the working directory inside the container, EXPOSE 8000 documents which port the app listens on, and CMD specifies the default command when the container starts. By copying requirements.txt before the rest of the code, Docker can skip reinstalling packages when only your application logic changes, a significant speedup during development.
# Create a simple Dockerfile
dockerfile_content = '''FROM python:3.11-slim
WORKDIR /app
# Copy requirements
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Expose port
EXPOSE 8000
# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''
# Save to file (in practice)
print("Dockerfile created:")
print(dockerfile_content)
Requirements File
Pinning exact dependency versions in requirements.txt is essential for reproducible builds. Without version pins, running pip install on different days could pull different package versions, leading to subtle bugs or broken models. For ML applications, this is especially important because numerical libraries like NumPy and scikit-learn can produce slightly different results across versions. The --no-cache-dir flag used in the Dockerfile prevents pip from storing downloaded packages in the container, keeping the image size smaller.
requirements = '''fastapi==0.104.1
uvicorn[standard]==0.24.0
scikit-learn==1.3.2
joblib==1.3.2
numpy==1.26.2
pydantic==2.5.0
'''
print("requirements.txt:")
print(requirements)
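Since unpinned lines are the usual source of drift, a small pre-build check can flag them before an image is built. This is an illustrative sketch, not part of the project's tooling; the helper name and sample input are hypothetical.

```python
def unpinned_requirements(text):
    """Return requirement lines that lack an exact == version pin."""
    unpinned = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if "==" not in line:
            unpinned.append(line)
    return unpinned

example_reqs = """fastapi==0.104.1
uvicorn[standard]==0.24.0
scikit-learn==1.3.2
numpy>=1.26
"""
print(unpinned_requirements(example_reqs))  # ['numpy>=1.26']
```

Running this in CI before `docker build` catches loose constraints like `numpy>=1.26` that would otherwise float between builds.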
Building and Running
# Build the image
docker build -t ml-api:v1 .
# Run the container
docker run -d -p 8000:8000 --name ml-api ml-api:v1
# Check logs
docker logs ml-api
# Test the API
curl http://localhost:8000/health
Multi-Stage Build (Optimized)
A multi-stage build uses two or more FROM statements to separate the build environment from the runtime environment. The first stage (builder) installs compilers like gcc and g++ needed to build C-extension packages (NumPy, scikit-learn), then pip-installs everything into the user directory. The second stage starts from a clean python:3.11-slim image and copies only the installed packages, leaving behind all build tools. The result is a dramatically smaller image, often a 50-70% reduction, which means faster pulls, lower storage costs, and a reduced attack surface since the production container has no compilers or development headers.
optimized_dockerfile = '''# Stage 1: Build dependencies
FROM python:3.11-slim AS builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc \
g++ \
&& rm -rf /var/lib/apt/lists/*
# Install Python packages
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
# Stage 2: Runtime
FROM python:3.11-slim
WORKDIR /app
# Copy only necessary files from builder
COPY --from=builder /root/.local /root/.local
# Copy application code
COPY main.py .
COPY models/ models/
# Update PATH
ENV PATH=/root/.local/bin:$PATH
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''
print("Optimized Multi-Stage Dockerfile:")
print(optimized_dockerfile)
print("\nBenefits:")
print("- Smaller image size (no build tools in final image)")
print("- Faster deployment")
print("- Better security (fewer packages)")
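To put a number on the size claim, the reduction is simple arithmetic; the image sizes below are hypothetical examples, not measurements (check real sizes with `docker images`):

```python
def size_reduction_pct(before_mb, after_mb):
    """Percent size reduction between single-stage and multi-stage images."""
    return round((1 - after_mb / before_mb) * 100, 1)

# Hypothetical: a 1.2 GB single-stage image shrinks to 450 MB
print(size_reduction_pct(1200, 450))  # 62.5
```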
Docker Compose for Multi-Container Apps
Production ML systems rarely consist of a single container. Docker Compose lets you define and run multi-container applications with a single YAML file. The configuration below orchestrates three services: the ML API, a Redis cache for storing prediction results, and Prometheus for metrics collection. The depends_on directive ensures Redis is started before the API (start order only, not readiness), and volume mounts let you share model files and configuration between the host and containers. In development, docker-compose up -d starts the entire stack with one command; in production, this same topology translates to Kubernetes manifests or cloud-native equivalents.
docker_compose = '''version: '3.8'
services:
# ML API Service
api:
build: .
ports:
- "8000:8000"
environment:
- MODEL_PATH=/app/models
- REDIS_HOST=redis
depends_on:
- redis
volumes:
- ./models:/app/models
# Redis for caching
redis:
image: redis:7-alpine
ports:
- "6379:6379"
# Monitoring
prometheus:
image: prom/prometheus:latest
ports:
- "9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
'''
print("docker-compose.yml:")
print(docker_compose)
print("\nUsage:")
print(" docker-compose up -d # Start all services")
print(" docker-compose down # Stop all services")
print(" docker-compose logs -f # View logs")
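The compose file mounts a prometheus.yml from the host but never shows its contents. A minimal sketch might look like the following, assuming the API exposes a /metrics endpoint (e.g. via a Prometheus client library; the job name and interval are illustrative):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "ml-api"
    static_configs:
      - targets: ["api:8000"]  # Compose service name, resolved by Docker's internal DNS
```

Note that the target uses the service name `api`, not `localhost`: containers on the same Compose network reach each other by service name.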
API with Redis Caching
Caching is one of the simplest ways to reduce inference latency and compute costs. If the same input appears repeatedly (common in recommendation systems and search), you can store the prediction and return it instantly on subsequent requests. The implementation below uses Redis as an external cache, hashing the input features with MD5 to create a deterministic cache key. The setex method stores results with a 3600-second TTL (time to live), ensuring stale predictions are eventually evicted. Returning a cached flag in the response helps monitoring systems distinguish cache hits from actual model invocations.
from fastapi import FastAPI
import redis
import json
import hashlib
app = FastAPI()
# Connect to Redis
try:
    cache = redis.Redis(host='redis', port=6379, decode_responses=True)
    cache.ping()
    print("✅ Connected to Redis")
except redis.ConnectionError:
    cache = None
    print("⚠️ Redis not available, running without cache")
def get_cache_key(data):
"""Generate cache key from input"""
return hashlib.md5(json.dumps(data, sort_keys=True).encode()).hexdigest()
@app.post("/predict")
async def predict(features: dict):
# Check cache
if cache:
cache_key = get_cache_key(features)
cached_result = cache.get(cache_key)
if cached_result:
            print("✅ Cache hit")
return {"prediction": json.loads(cached_result), "cached": True}
# Perform prediction (mock)
result = {"class": 0, "confidence": 0.95}
# Cache result (1 hour TTL)
if cache:
cache.setex(cache_key, 3600, json.dumps(result))
        print("✅ Result cached")
return {"prediction": result, "cached": False}
print("API with caching ready")
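Because get_cache_key serializes with sort_keys=True, the order of keys in the input dict does not change the resulting cache key. A quick standalone check (the helper is repeated here so the cell runs on its own; the feature names are made up):

```python
import hashlib
import json

def get_cache_key(data):
    """Generate a deterministic cache key from input features."""
    return hashlib.md5(json.dumps(data, sort_keys=True).encode()).hexdigest()

# Same features, different key order -> identical cache key
k1 = get_cache_key({"age": 42, "income": 55000})
k2 = get_cache_key({"income": 55000, "age": 42})
print(k1 == k2)  # True
print(len(k1))   # 32 (MD5 digest as hex)
```

Without sort_keys=True, logically identical requests could hash to different keys and silently miss the cache.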
Best Practices
Image Optimization
Use specific base images
FROM python:3.11-slim # Not 'latest'
Minimize layers
# Bad (multiple layers)
RUN pip install pandas
RUN pip install numpy

# Good (single layer)
RUN pip install pandas numpy
Use .dockerignore
__pycache__
*.pyc
.git
.env
tests/
*.md
Order matters (cache)
# Copy requirements first (changes less often)
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy code last (changes often)
COPY . .
Health Checks
Docker's HEALTHCHECK instruction defines a command that runs periodically inside the container to verify the application is still functioning. If the health check fails consecutively (controlled by --retries), Docker marks the container as unhealthy, which orchestrators such as Docker Swarm use to trigger automatic restarts (Kubernetes ignores Dockerfile health checks and uses its own liveness probes instead). The --start-period gives the application time to initialize before failed checks count against the retry limit; this matters for ML APIs that need to load large models into memory at startup. A well-configured health check is the difference between a service that self-heals and one that silently serves errors.
healthcheck_dockerfile = '''FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Add health check (slim images don't ship curl, so probe with Python's urllib)
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''
print("Dockerfile with health check:")
print(healthcheck_dockerfile)
Deployment Workflow
1. Build
docker build -t myapp:v1.0 .
2. Test Locally
docker run -p 8000:8000 myapp:v1.0
curl http://localhost:8000/health
3. Tag for Registry
docker tag myapp:v1.0 myregistry.com/myapp:v1.0
4. Push to Registry
docker push myregistry.com/myapp:v1.0
5. Deploy
# Pull and run on production server
docker pull myregistry.com/myapp:v1.0
docker run -d -p 8000:8000 myregistry.com/myapp:v1.0
Key Takeaways
✅ Docker ensures consistent environments
✅ Multi-stage builds reduce image size
✅ Docker Compose manages multi-container apps
✅ Caching speeds up predictions
✅ Health checks enable monitoring
✅ Follow best practices for production