Phase 20: Real-Time Streaming — Start Here

Stream LLM outputs token-by-token, build live chat interfaces, and process real-time data feeds with AI.

Why Streaming?

Streaming makes AI feel instant. Instead of waiting 10 seconds for a full response, users see tokens appear immediately — dramatically improving perceived performance and UX.

Notebooks in This Phase

| Notebook | Topic |
|---|---|
| 01_streaming_responses.ipynb | OpenAI/Anthropic streaming APIs |
| 02_websocket_connections.ipynb | Real-time bidirectional communication |
| 03_real_time_rag.ipynb | Stream RAG results as they're retrieved |
| 04_production_streaming.ipynb | FastAPI + SSE for production streaming apps |

Streaming Patterns

| Pattern | Use Case | Tech |
|---|---|---|
| SSE (Server-Sent Events) | Chat UI, one-way stream | FastAPI, Flask |
| WebSocket | Interactive, bidirectional | FastAPI WS, Socket.IO |
| Async generator | Backend streaming pipeline | Python async/await |
| Kafka/Redis | High-throughput event streams | Kafka, Redis Streams |
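The SSE row above reduces to a very simple wire format: each event is a `data:` line terminated by a blank line, sent down one long-lived HTTP response. A minimal, framework-free sketch of that framing (the generator and the `[DONE]` sentinel are illustrative conventions; in a FastAPI app you would hand such a generator to a streaming response with media type `text/event-stream`):

```python
def sse_format(token: str) -> str:
    """Frame one token as a Server-Sent Events message:
    a 'data:' line followed by a blank line."""
    return f"data: {token}\n\n"

def sse_stream(tokens):
    """Yield SSE-framed chunks; a web framework would write each
    chunk to the open HTTP response as soon as it is produced."""
    for tok in tokens:
        yield sse_format(tok)
    yield "data: [DONE]\n\n"  # common (not standardized) end-of-stream marker

# Frame a tiny token stream the way a chat backend would
chunks = list(sse_stream(["Hello", " world"]))
print(chunks[0])  # the framed first token: 'data: Hello' plus a blank line
```

Because the framing is just text over HTTP, the browser side needs nothing more than the built-in `EventSource` API to consume it.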

Quick Start — OpenAI Streaming

```python
from openai import OpenAI

client = OpenAI()

# Pass stream=True to receive incremental chunks instead of one full response
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a delta; content can be None on role/stop chunks
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
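The same consumption loop maps directly onto the async-generator pattern from the table above. A sketch with a stubbed token source standing in for the LLM API call (`fake_token_source` and `stream_tokens` are hypothetical names for illustration):

```python
import asyncio

async def fake_token_source():
    """Stand-in for an LLM streaming API: yields tokens one at a time."""
    for tok in ["Once", " upon", " a", " time"]:
        await asyncio.sleep(0)  # yield control, simulating network latency
        yield tok

async def stream_tokens(source):
    """Print tokens as they arrive while accumulating the full text —
    the shape of a backend streaming pipeline stage."""
    parts = []
    async for tok in source:
        print(tok, end="", flush=True)
        parts.append(tok)
    return "".join(parts)

text = asyncio.run(stream_tokens(fake_token_source()))
```

Swapping `fake_token_source` for a real async client keeps the pipeline code unchanged, which is the main appeal of the pattern.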

Prerequisites

- LLM API basics (Phase 11)

- Basic async Python is helpful but not required

Learning Path

01_streaming_responses.ipynb     ← Start here
02_websocket_connections.ipynb
03_real_time_rag.ipynb
04_production_streaming.ipynb