# Phase 20: Real-Time Streaming – Start Here
Stream LLM outputs token-by-token, build live chat interfaces, and process real-time data feeds with AI.
## Why Streaming?
Streaming makes AI feel instant. Instead of waiting 10 seconds for a full response, users see tokens appear immediately, dramatically improving perceived performance and UX.
## Notebooks in This Phase
| Notebook | Topic |
|---|---|
| 01_streaming_responses.ipynb | OpenAI/Anthropic streaming APIs |
| 02_websocket_connections.ipynb | Real-time bidirectional communication |
| 03_real_time_rag.ipynb | Stream RAG results as they're retrieved |
| 04_production_streaming.ipynb | FastAPI + SSE for production streaming apps |
## Streaming Patterns
| Pattern | Use Case | Tech |
|---|---|---|
| SSE (Server-Sent Events) | Chat UI, one-way stream | FastAPI, Flask |
| WebSocket | Interactive, bidirectional | FastAPI WS, Socket.IO |
| Async generator | Backend streaming pipeline | Python async/await |
| Kafka/Redis | High-throughput event streams | Kafka, Redis Streams |
## Quick Start: OpenAI Streaming
```python
from openai import OpenAI

client = OpenAI()

# stream=True makes the API return chunks as they are generated
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # each chunk carries a delta; content can be None (e.g. on the final chunk)
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
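The async-generator pattern from the table can be sketched without any network calls: a producer yields tokens as they arrive, and the consumer forwards each one immediately instead of waiting for the whole response. `fake_llm_stream` below is a stand-in name for a real streaming API call, used only for illustration:

```python
import asyncio
from typing import AsyncIterator

async def fake_llm_stream(prompt: str) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM call: yields tokens one at a time."""
    for token in ["Once", " upon", " a", " time"]:
        await asyncio.sleep(0)   # simulate waiting on the network
        yield token              # hand each token to the consumer immediately

async def relay(prompt: str) -> str:
    """Consume the stream token-by-token, as an SSE/WebSocket handler would."""
    parts = []
    async for token in fake_llm_stream(prompt):
        parts.append(token)      # in a server, send the token to the client here
    return "".join(parts)

print(asyncio.run(relay("Tell me a story")))  # → Once upon a time
```

In a real backend, the `async for` body would write each token to an SSE or WebSocket connection rather than accumulate it.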
## Prerequisites
- LLM API basics (Phase 11)
- Basic async Python is helpful but not required
## Learning Path
1. 01_streaming_responses.ipynb (start here)
2. 02_websocket_connections.ipynb
3. 03_real_time_rag.ipynb
4. 04_production_streaming.ipynb