Phase 20: Real-Time Streaming — Start Here

Stream LLM outputs token-by-token, build live chat interfaces, and process real-time data feeds with AI.

Why Streaming?

Streaming makes AI feel instant. Instead of waiting 10 seconds for a full response, users see tokens appear immediately — dramatically improving perceived performance and UX.

Notebooks in This Phase

| Notebook | Topic |
|---|---|
| 01_streaming_responses.ipynb | OpenAI/Anthropic streaming APIs |
| 02_websocket_connections.ipynb | Real-time bidirectional communication |
| 03_real_time_rag.ipynb | Stream RAG results as they're retrieved |
| 04_production_streaming.ipynb | FastAPI + SSE for production streaming apps |

Streaming Patterns

| Pattern | Use Case | Tech |
|---|---|---|
| SSE (Server-Sent Events) | Chat UI, one-way stream | FastAPI, Flask |
| WebSocket | Interactive, bidirectional | FastAPI WS, Socket.IO |
| Async generator | Backend streaming pipeline | Python async/await |
| Kafka/Redis | High-throughput event streams | Kafka, Redis Streams |
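The SSE row above reduces to a very simple wire format: each event is a `data:` line terminated by a blank line, sent down one long-lived HTTP response. A minimal, framework-free sketch of that framing (the generator and the `[DONE]` sentinel are illustrative conventions; in a FastAPI app you would hand such a generator to a streaming response with media type `text/event-stream`):

```python
def sse_format(token: str) -> str:
    """Frame one token as a Server-Sent Events message:
    a 'data:' line followed by a blank line."""
    return f"data: {token}\n\n"

def sse_stream(tokens):
    """Yield SSE-framed chunks; a web framework would write each
    chunk to the open HTTP response as soon as it is produced."""
    for tok in tokens:
        yield sse_format(tok)
    yield "data: [DONE]\n\n"  # common (not standardized) end-of-stream marker

# Frame a tiny token stream the way a chat backend would
chunks = list(sse_stream(["Hello", " world"]))
print(chunks[0])  # the framed first token: 'data: Hello' plus a blank line
```

Because the framing is just text over HTTP, the browser side needs nothing more than the built-in `EventSource` API to consume it.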

Quick Start — OpenAI Streaming

```python
from openai import OpenAI

client = OpenAI()

# Pass stream=True to receive incremental chunks instead of one full response
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a delta; content can be None on role/stop chunks
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
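The same consumption loop maps directly onto the async-generator pattern from the table above. A sketch with a stubbed token source standing in for the LLM API call (`fake_token_source` and `stream_tokens` are hypothetical names for illustration):

```python
import asyncio

async def fake_token_source():
    """Stand-in for an LLM streaming API: yields tokens one at a time."""
    for tok in ["Once", " upon", " a", " time"]:
        await asyncio.sleep(0)  # yield control, simulating network latency
        yield tok

async def stream_tokens(source):
    """Print tokens as they arrive while accumulating the full text —
    the shape of a backend streaming pipeline stage."""
    parts = []
    async for tok in source:
        print(tok, end="", flush=True)
        parts.append(tok)
    return "".join(parts)

text = asyncio.run(stream_tokens(fake_token_source()))
```

Swapping `fake_token_source` for a real async client keeps the pipeline code unchanged, which is the main appeal of the pattern.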

Prerequisites

- LLM API basics (Phase 11)

- Basic async Python is helpful but not required

Learning Path

01_streaming_responses.ipynb     ← Start here
02_websocket_connections.ipynb
03_real_time_rag.ipynb
04_production_streaming.ipynb