Building ML APIs with FastAPI¶

🎯 Learning Objectives¶

  • Build REST APIs for ML models

  • Handle request validation

  • Create API documentation

  • Implement async endpoints

  • Add authentication and error handling

Why FastAPI?¶

  • Fast: High performance (comparable to NodeJS/Go)

  • Easy: Intuitive Python syntax

  • Auto-docs: Swagger UI out of the box

  • Type safety: Pydantic validation

  • Async: Native async/await support

# Install dependencies
# !pip install fastapi uvicorn pydantic

Hello World API¶

The simplest FastAPI application consists of an app instance and one or more route-decorated functions. The @app.get("/") decorator maps HTTP GET requests at the root path to the root() coroutine. FastAPI uses Python’s async/await natively, so every endpoint can handle concurrent requests without blocking. A /health endpoint is standard practice in production services – orchestrators like Kubernetes poll it to decide whether your container is alive and ready to receive traffic. Running uvicorn main:app --reload starts a development server with hot-reloading so code changes take effect immediately.

from fastapi import FastAPI

app = FastAPI(title="My First ML API")

@app.get("/")
async def root():
    return {"message": "Hello, MLOps!"}

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

print("API created!")
print("Run with: uvicorn main:app --reload")

Request/Response Models with Pydantic¶

Pydantic models define the shape and validation rules for API inputs and outputs. By inheriting from BaseModel, you get automatic type checking, serialization, and OpenAPI schema generation. The Field(...) function adds constraints like min_length and max_length, which FastAPI enforces before your code even runs – returning a clear 422 error if validation fails. In ML APIs, Pydantic models serve as a contract: the TextInput class guarantees that every request contains valid text, while SentimentOutput documents exactly what the caller can expect back. This eliminates an entire class of runtime bugs caused by malformed inputs.

from pydantic import BaseModel, Field
from typing import List

class TextInput(BaseModel):
    text: str = Field(..., min_length=1, max_length=1000)
    language: str = Field(default="en")

class SentimentOutput(BaseModel):
    label: str
    confidence: float
    scores: dict

# Example usage
sample_input = TextInput(
    text="FastAPI is amazing!",
    language="en"
)

print(sample_input.model_dump())
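
The constraint enforcement described above can be observed directly: violating a Field constraint raises a ValidationError before any endpoint code runs, which FastAPI translates into a 422 response. A minimal standalone sketch:

```python
from pydantic import BaseModel, Field, ValidationError

class TextInput(BaseModel):
    text: str = Field(..., min_length=1, max_length=1000)
    language: str = Field(default="en")

# Valid input passes and defaults are filled in
ok = TextInput(text="hello")
print(ok.language)  # "en"

# Invalid input is rejected by Pydantic itself
try:
    TextInput(text="")  # violates min_length=1
except ValidationError as e:
    print(f"Rejected with {e.error_count()} validation error(s)")
```

In a running FastAPI app you never write this try/except yourself; the framework performs the same validation and serializes the errors into the 422 response body.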

ML Model Endpoint¶

Bringing a trained model into a FastAPI endpoint involves three stages: load the model at startup, define request/response schemas, and implement the prediction route. The model and vectorizer are loaded once when the module initializes (not per-request), keeping latency low. Inside the /predict handler, the input text is transformed using the same TfidfVectorizer used during training – a critical requirement, since the feature space must match. The predict_proba() method returns class probabilities, giving callers a confidence score alongside the label. Wrapping the logic in a try/except that raises HTTPException ensures the API returns structured error responses rather than crashing silently.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
import joblib

app = FastAPI(title="Sentiment Analysis API")

# Mock training data
train_texts = [
    "I love this product",
    "This is amazing",
    "Terrible experience",
    "Very disappointing"
]
train_labels = [1, 1, 0, 0]  # 1=positive, 0=negative

# Train simple model
vectorizer = TfidfVectorizer(max_features=100)
X_train = vectorizer.fit_transform(train_texts)
model = MultinomialNB()
model.fit(X_train, train_labels)

print("✓ Model trained")

class TextRequest(BaseModel):
    text: str

class SentimentResponse(BaseModel):
    sentiment: str
    confidence: float

@app.post("/predict", response_model=SentimentResponse)
async def predict_sentiment(request: TextRequest):
    try:
        # Vectorize input
        X = vectorizer.transform([request.text])
        
        # Predict
        prediction = model.predict(X)[0]
        probabilities = model.predict_proba(X)[0]
        
        sentiment = "positive" if prediction == 1 else "negative"
        confidence = float(probabilities[prediction])
        
        return SentimentResponse(
            sentiment=sentiment,
            confidence=confidence
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

print("✓ Sentiment endpoint created")

Batch Predictions¶

Real-world applications often need to classify many items at once – for example, scoring an entire email inbox or labeling a batch of customer reviews. A batch prediction endpoint accepts a list of inputs and returns a list of results in a single HTTP round-trip, which is far more efficient than calling /predict in a loop. The BatchRequest model wraps a List[str], and the response mirrors it with a List[SentimentResponse]. For even higher throughput, you could vectorize the entire batch at once with vectorizer.transform(request.texts) and call model.predict() on the full matrix, leveraging NumPy’s vectorized operations.

from typing import List

class BatchRequest(BaseModel):
    texts: List[str]

class BatchResponse(BaseModel):
    predictions: List[SentimentResponse]

@app.post("/batch_predict", response_model=BatchResponse)
async def batch_predict(request: BatchRequest):
    predictions = []
    
    for text in request.texts:
        X = vectorizer.transform([text])
        pred = model.predict(X)[0]
        proba = model.predict_proba(X)[0]
        
        predictions.append(SentimentResponse(
            sentiment="positive" if pred == 1 else "negative",
            confidence=float(proba[pred])
        ))
    
    return BatchResponse(predictions=predictions)

print("✓ Batch endpoint created")
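
As noted above, the per-item loop can be replaced by transforming the whole batch in one call. A self-contained sketch of that vectorized variant (it retrains the same toy model so it runs on its own; in the API you would reuse the existing vectorizer and model):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Same toy setup as in the endpoint above
train_texts = ["I love this product", "This is amazing",
               "Terrible experience", "Very disappointing"]
vectorizer = TfidfVectorizer(max_features=100)
model = MultinomialNB().fit(vectorizer.fit_transform(train_texts), [1, 1, 0, 0])

def batch_predict_vectorized(texts):
    """Score all texts in one matrix operation instead of a Python loop."""
    X = vectorizer.transform(texts)      # one sparse matrix for the whole batch
    preds = model.predict(X)             # shape (n,)
    probas = model.predict_proba(X)      # shape (n, 2)
    return [
        {"sentiment": "positive" if p == 1 else "negative",
         "confidence": float(probas[i, p])}
        for i, p in enumerate(preds)
    ]

print(batch_predict_vectorized(["I love it", "Awful service"]))
```

The list comprehension at the end only formats results; the heavy lifting happens in the three vectorized calls, so cost grows sub-linearly in Python overhead as the batch size increases.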

Error Handling¶

Robust APIs validate inputs explicitly and return informative error messages with appropriate HTTP status codes. FastAPI’s HTTPException lets you raise errors with a specific status_code and detail message that gets serialized as JSON. The pattern below checks text length constraints before running inference, returning 400 Bad Request for invalid inputs rather than letting the model process garbage data. In production ML APIs, input validation is your first line of defense against adversarial inputs, excessively long sequences that could cause out-of-memory errors, and other edge cases that the model was never trained to handle.

from fastapi import HTTPException, status

@app.post("/predict_with_validation")
async def predict_with_validation(request: TextRequest):
    # Validate input
    if len(request.text) < 3:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Text must be at least 3 characters"
        )
    
    if len(request.text) > 1000:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Text too long (max 1000 characters)"
        )
    
    # Process request
    X = vectorizer.transform([request.text])
    prediction = model.predict(X)[0]
    
    return {
        "sentiment": "positive" if prediction == 1 else "negative"
    }

API Documentation¶

FastAPI automatically generates interactive documentation:

  • Swagger UI: http://localhost:8000/docs

  • ReDoc: http://localhost:8000/redoc

  • OpenAPI schema (JSON): http://localhost:8000/openapi.json

No extra work needed!

Running the API¶

# Development mode (auto-reload)
uvicorn main:app --reload

# Production mode
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

Testing the API¶

# Using curl
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is fantastic!"}'

# Using Python
import requests

response = requests.post(
    "http://localhost:8000/predict",
    json={"text": "This is fantastic!"}
)

print(response.json())

Best Practices¶

  1. Use Pydantic models for request/response validation

  2. Add proper error handling with HTTPException

  3. Implement health checks for monitoring

  4. Use async/await for I/O operations

  5. Add API versioning (/v1/predict, /v2/predict)

  6. Document endpoints with docstrings

  7. Add rate limiting for production

  8. Implement authentication for security

Key Takeaways¶

✅ FastAPI makes building ML APIs simple

✅ Pydantic ensures type safety and validation

✅ Auto-generated docs save time

✅ Async support for better performance

✅ Ready for production with proper error handling