Building ML APIs with FastAPI
🎯 Learning Objectives
Build REST APIs for ML models
Handle request validation
Create API documentation
Implement async endpoints
Add authentication and error handling
Why FastAPI?
Fast: High performance (comparable to NodeJS/Go)
Easy: Intuitive Python syntax
Auto-docs: Swagger UI out of the box
Type safety: Pydantic validation
Async: Native async/await support
# Install dependencies
# !pip install fastapi uvicorn pydantic
Hello World API
The simplest FastAPI application consists of an app instance and one or more route-decorated functions. The @app.get("/") decorator maps HTTP GET requests at the root path to the root() coroutine. FastAPI supports Python's async/await natively, so every endpoint can handle concurrent requests without blocking. A /health endpoint is standard practice in production services: orchestrators like Kubernetes poll it to decide whether your container is alive and ready to receive traffic. Running uvicorn main:app --reload starts a development server with hot-reloading, so code changes take effect immediately.
from fastapi import FastAPI

app = FastAPI(title="My First ML API")

@app.get("/")
async def root():
    return {"message": "Hello, MLOps!"}

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

print("API created!")
print("Run with: uvicorn main:app --reload")
Request/Response Models with Pydantic
Pydantic models define the shape and validation rules for API inputs and outputs. By inheriting from BaseModel, you get automatic type checking, serialization, and OpenAPI schema generation. The Field(...) function adds constraints like min_length and max_length, which FastAPI enforces before your code even runs, returning a clear 422 error if validation fails. In ML APIs, Pydantic models serve as a contract: the TextInput class guarantees that every request contains valid text, while SentimentOutput documents exactly what the caller can expect back. This eliminates an entire class of runtime bugs caused by malformed inputs.
from pydantic import BaseModel, Field
from typing import List

class TextInput(BaseModel):
    text: str = Field(..., min_length=1, max_length=1000)
    language: str = Field(default="en")

class SentimentOutput(BaseModel):
    label: str
    confidence: float
    scores: dict

# Example usage
sample_input = TextInput(
    text="FastAPI is amazing!",
    language="en"
)
print(sample_input.model_dump())
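To see the validation contract in action, here is a quick sketch (plain Pydantic, no server needed) of what happens when a Field constraint is violated; this is the same check FastAPI turns into a 422 response:

```python
from pydantic import BaseModel, Field, ValidationError

class TextInput(BaseModel):
    text: str = Field(..., min_length=1, max_length=1000)
    language: str = Field(default="en")

# Valid input passes and fills in the default language
ok = TextInput(text="hello")
print(ok.language)  # en

# An empty string violates min_length=1 and raises ValidationError
# (inside FastAPI this becomes a 422 response automatically)
try:
    TextInput(text="")
except ValidationError as e:
    print(e.error_count(), "validation error")
```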
ML Model Endpoint
Bringing a trained model into a FastAPI endpoint involves three stages: load the model at startup, define request/response schemas, and implement the prediction route. The model and vectorizer are loaded once when the module initializes (not per-request), keeping latency low. Inside the /predict handler, the input text is transformed using the same TfidfVectorizer used during training, a critical requirement since the feature space must match. The predict_proba() method returns class probabilities, giving callers a confidence score alongside the label. Wrapping the logic in a try/except that raises HTTPException ensures the API returns structured error responses rather than crashing silently.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

app = FastAPI(title="Sentiment Analysis API")

# Mock training data
train_texts = [
    "I love this product",
    "This is amazing",
    "Terrible experience",
    "Very disappointing"
]
train_labels = [1, 1, 0, 0]  # 1=positive, 0=negative

# Train simple model
vectorizer = TfidfVectorizer(max_features=100)
X_train = vectorizer.fit_transform(train_texts)
model = MultinomialNB()
model.fit(X_train, train_labels)
print("✅ Model trained")

class TextRequest(BaseModel):
    text: str

class SentimentResponse(BaseModel):
    sentiment: str
    confidence: float

@app.post("/predict", response_model=SentimentResponse)
async def predict_sentiment(request: TextRequest):
    try:
        # Vectorize input
        X = vectorizer.transform([request.text])

        # Predict
        prediction = model.predict(X)[0]
        probabilities = model.predict_proba(X)[0]

        sentiment = "positive" if prediction == 1 else "negative"
        confidence = float(probabilities[prediction])

        return SentimentResponse(
            sentiment=sentiment,
            confidence=confidence
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

print("✅ Sentiment endpoint created")
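In a real deployment the model would be trained offline and persisted, then loaded once at startup rather than trained inside the API module. A minimal sketch of that split using joblib (the file names here are illustrative):

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# --- training script (runs offline) ---
texts = ["I love this product", "This is amazing",
         "Terrible experience", "Very disappointing"]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer(max_features=100)
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

joblib.dump(vectorizer, "vectorizer.joblib")
joblib.dump(model, "model.joblib")

# --- API startup (runs once per process) ---
vectorizer = joblib.load("vectorizer.joblib")
model = joblib.load("model.joblib")
print(model.predict(vectorizer.transform(["This is amazing"]))[0])
```

Persisting the vectorizer alongside the model matters: loading only the model and refitting a new vectorizer at serve time would silently change the feature space.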
Batch Predictions
Real-world applications often need to classify many items at once, for example scoring an entire email inbox or labeling a batch of customer reviews. A batch prediction endpoint accepts a list of inputs and returns a list of results in a single HTTP round-trip, which is far more efficient than calling /predict in a loop. The BatchRequest model wraps a List[str], and the response mirrors it with a List[SentimentResponse]. For even higher throughput, you could vectorize the entire batch at once with vectorizer.transform(request.texts) and call model.predict() on the full matrix, leveraging NumPy's vectorized operations.
from typing import List

class BatchRequest(BaseModel):
    texts: List[str]

class BatchResponse(BaseModel):
    predictions: List[SentimentResponse]

@app.post("/batch_predict", response_model=BatchResponse)
async def batch_predict(request: BatchRequest):
    predictions = []
    for text in request.texts:
        X = vectorizer.transform([text])
        pred = model.predict(X)[0]
        proba = model.predict_proba(X)[0]
        predictions.append(SentimentResponse(
            sentiment="positive" if pred == 1 else "negative",
            confidence=float(proba[pred])
        ))
    return BatchResponse(predictions=predictions)

print("✅ Batch endpoint created")
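The fully vectorized variant mentioned above replaces the per-item loop with one transform and one predict over the whole batch. A standalone sketch of the core trick (same toy model as before, outside the API for clarity):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["I love this product", "This is amazing",
         "Terrible experience", "Very disappointing"]
labels = [1, 1, 0, 0]
vectorizer = TfidfVectorizer(max_features=100)
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

batch = ["Amazing product", "Terrible service"]

# One transform + one predict call for the entire batch
X = vectorizer.transform(batch)
preds = model.predict(X)
probas = model.predict_proba(X)

results = [
    {"sentiment": "positive" if p == 1 else "negative",
     "confidence": float(prob[p])}
    for p, prob in zip(preds, probas)
]
print(results)
```

The list comprehension only formats the output; all the numerical work happens in the two batched scikit-learn calls.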
Error Handling
Robust APIs validate inputs explicitly and return informative error messages with appropriate HTTP status codes. FastAPI's HTTPException lets you raise errors with a specific status_code and detail message that gets serialized as JSON. The pattern below checks text length constraints before running inference, returning 400 Bad Request for invalid inputs rather than letting the model process garbage data. In production ML APIs, input validation is your first line of defense against adversarial inputs, excessively long sequences that could cause out-of-memory errors, and other edge cases that the model was never trained to handle.
from fastapi import HTTPException, status

@app.post("/predict_with_validation")
async def predict_with_validation(request: TextRequest):
    # Validate input
    if len(request.text) < 3:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Text must be at least 3 characters"
        )
    if len(request.text) > 1000:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Text too long (max 1000 characters)"
        )

    # Process request
    X = vectorizer.transform([request.text])
    prediction = model.predict(X)[0]

    return {
        "sentiment": "positive" if prediction == 1 else "negative"
    }
API Documentation
FastAPI automatically generates interactive documentation:
Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
No extra work needed!
Running the API
# Development mode (auto-reload)
uvicorn main:app --reload
# Production mode
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
Testing the API
# Using curl
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is fantastic!"}'
# Using Python
import requests
response = requests.post(
    "http://localhost:8000/predict",
    json={"text": "This is fantastic!"}
)
print(response.json())
Best Practices
Use Pydantic models for request/response validation
Add proper error handling with HTTPException
Implement health checks for monitoring
Use async/await for I/O operations
Add API versioning (/v1/predict, /v2/predict)
Document endpoints with docstrings
Add rate limiting for production
Implement authentication for security
Key Takeaways
✅ FastAPI makes building ML APIs simple
✅ Pydantic ensures type safety and validation
✅ Auto-generated docs save time
✅ Async support for better performance
✅ Ready for production with proper error handling