# Phase 8: RAG

## 🎯 Overview

Combine your skills from previous phases to build production-grade RAG (Retrieval-Augmented Generation) systems!

**Prerequisites:**

- ✅ Tokenization (Phase 3)
- ✅ Embeddings (Phase 4)
- ✅ Neural Networks (Phase 5)
- ✅ Vector Databases (Phase 6)

**Time:** 3-4 weeks | 60-80 hours
**Outcome:** Build AI applications that can query your knowledge base
## 📚 What You'll Learn

### Core RAG Concepts

- RAG architecture and pipeline
- Document processing and chunking strategies
- Retrieval methods (dense, sparse, hybrid)
- Context management and prompt construction
- Re-ranking and result filtering
- LLM integration (OpenAI, Anthropic, local models)

### Advanced RAG Techniques

- Hybrid search (vector + keyword)
- Query transformation and expansion
- Multi-query retrieval
- Parent-document retrieval
- Self-query and metadata filtering
- Conversation memory and context
## 🏗️ Module Structure

```
08-rag/
├── 00_START_HERE.ipynb           # RAG overview and quick demo
├── 01_basic_rag.ipynb            # Simple RAG from scratch
├── 02_document_processing.ipynb  # Chunking strategies
├── 03_langchain_rag.ipynb        # Using LangChain framework
├── 04_llamaindex_rag.ipynb       # Using LlamaIndex framework
├── 05_advanced_retrieval.ipynb   # Hybrid search, re-ranking
├── 06_conversation_rag.ipynb     # Chat with memory
├── 07_evaluation.ipynb           # RAG evaluation metrics
├── 09_advanced_retrieval.ipynb   # Parent-child retrieval, ensemble
├── 10_graphrag_visual_rag.ipynb  # GraphRAG and multimodal RAG
├── assignment.md                 # Phase assignment
├── challenges.md                 # Hands-on challenges
└── README.md                     # This file
```
## 🚀 Quick Start

### 1. Basic RAG Pipeline

```python
# The fundamental RAG flow:
# 1. Index documents → embeddings → vector DB
# 2. User query → embedding → similarity search
# 3. Retrieved docs + query → LLM → answer

from sentence_transformers import SentenceTransformer
from your_vector_db import VectorDB  # Chroma, Qdrant, etc.
from openai import OpenAI

# 1. Index your documents
# Use any embedding model - see 05-embeddings/embedding_comparison.md for options
#   API: Gemini Embedding (cheapest + best), Voyage 3.5, or OpenAI
#   Local: Qwen3-Embedding, BGE-M3, or all-MiniLM-L6-v2
model = SentenceTransformer('all-MiniLM-L6-v2')  # local, fast
db = VectorDB()  # instantiate whichever vector DB you chose in Phase 6

docs = ["Your documents here..."]
embeddings = model.encode(docs)
db.add(documents=docs, embeddings=embeddings)

# 2. Retrieve relevant context
query = "What is RAG?"
query_embedding = model.encode(query)
results = db.search(query_embedding, top_k=3)

# 3. Generate answer with LLM (Claude, GPT, Gemini, or local)
client = OpenAI()
context = "\n".join(results)
prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": prompt}],
)
answer = response.choices[0].message.content
```
## 📖 Learning Path

### Week 1: RAG Fundamentals

1. Complete 00_START_HERE.ipynb
2. Build basic RAG in 01_basic_rag.ipynb
3. Learn chunking strategies in 02_document_processing.ipynb

**Project:** Simple Q&A on your documents

### Week 2: RAG Frameworks

1. Learn LangChain in 03_langchain_rag.ipynb
2. Explore LlamaIndex in 04_llamaindex_rag.ipynb
3. Compare frameworks and choose your favorite

**Project:** Build a research paper assistant

### Week 3: Advanced Techniques

1. Implement hybrid search in 05_advanced_retrieval.ipynb
2. Add conversation memory in 06_conversation_rag.ipynb
3. Learn evaluation in 07_evaluation.ipynb

**Project:** Code search system for your repos

### Week 4: Production Project

1. Build an end-to-end RAG application
2. Add proper error handling
3. Implement caching and optimization
4. Deploy as an API (preview of the MLOps phase)

**Capstone:** Personal knowledge assistant
## 🛠️ Technologies You'll Use

**LLM Frameworks:**

- LangChain - most popular, extensive ecosystem
- LlamaIndex - best for document indexing
- Haystack - production-focused

**LLM Providers:**

- OpenAI (GPT-5.4, GPT-4.1, GPT-4.1-mini)
- Anthropic (Claude Sonnet 4.6, Haiku 4.5)
- Google (Gemini 3.1 Pro, Flash)
- Local models (Qwen 3, Llama 4, DeepSeek R1 via Ollama)

**Vector Databases:**

- Use what you learned in Phase 6!
- Chroma, Qdrant, Weaviate, Milvus

**Embeddings:**

- OpenAI embeddings (text-embedding-3-small/large)
- Sentence Transformers (all-MiniLM-L6-v2, all-mpnet-base-v2)
- Cohere embeddings
## 📝 Key Concepts Explained

### 1. RAG Pipeline

```
Documents → Split → Embed → Store in Vector DB
                                      ↓
User Query → Embed → Search → Retrieve Top-K
                                      ↓
Retrieved Docs + Query → LLM Prompt → Answer
```
### 2. Chunking Strategies

**Fixed-size chunks:**

```python
chunk_size = 512  # tokens or characters
overlap = 50      # overlap between chunks
```

**Semantic chunks:**

- Split by paragraphs or sentences
- Preserve document structure
- Maintain context boundaries

**Recursive splitting:**

- Try separators in order (\n\n, \n, ".", space)
- Preserve hierarchy
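As a concrete reference, the fixed-size strategy can be sketched in a few lines of plain Python. This is character-based for simplicity (a token-based version would count tokens instead), and `chunk_text` is an illustrative helper, not part of any framework:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with the given overlap."""
    step = chunk_size - overlap  # how far the window advances each time
    return [text[start:start + chunk_size]
            for start in range(0, len(text), step)]
```

Each chunk repeats the last `overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.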
### 3. Retrieval Methods

**Dense (vector search):**

- Semantic similarity
- Works for paraphrased queries
- Requires embeddings

**Sparse (keyword search):**

- BM25, TF-IDF
- Exact keyword matching
- Fast and interpretable

**Hybrid:**

- Combine both approaches
- Re-rank with RRF (Reciprocal Rank Fusion)
- Best of both worlds
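RRF itself is tiny: each document scores the sum of `1 / (k + rank)` over every ranked list it appears in, so documents ranked highly by *both* dense and sparse retrieval float to the top. A minimal sketch (doc IDs are hypothetical; `k=60` is the commonly used constant):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # vector-search ranking
sparse = ["d1", "d4", "d3"]  # BM25 ranking
fused = reciprocal_rank_fusion([dense, sparse])
```

Here `d1` wins: it is near the top of both lists, while `d3` leads only the dense list.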
## 🎯 Projects

### Project 1: Personal Documentation Q&A

Build a chatbot that answers questions about your personal notes, docs, and PDFs.

**Features:**

- Upload PDF, TXT, and Markdown files
- Chunk and embed documents
- Conversational interface
- Source citation

### Project 2: Code Search Engine

Semantic search across your GitHub repositories.

**Features:**

- Index code files (Python, JavaScript, etc.)
- Search by intent ("how to connect to a database?")
- Show relevant code snippets
- Explain code functionality

### Project 3: Research Assistant

Query academic papers and scientific literature.

**Features:**

- Process research papers (PDFs)
- Extract citations and references
- Summarize papers
- Compare multiple papers

### Project 4: Customer Support Bot

A RAG-powered FAQ system.

**Features:**

- Index support documentation
- Handle common questions
- Escalate to a human when needed
- Track conversation context
## 📊 Evaluation Metrics

### Retrieval Quality

- **Precision@K:** fraction of the top K results that are relevant
- **Recall@K:** fraction of all relevant docs that appear in the top K
- **MRR (Mean Reciprocal Rank):** average of 1/rank of the first relevant result
- **NDCG:** Normalized Discounted Cumulative Gain
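The first three metrics are simple enough to implement yourself before reaching for an evaluation library. A minimal sketch (function names are our own; `retrieved` is a ranked list of doc IDs, `relevant` the ground-truth set for that query):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved docs that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant docs that appear in the top k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average 1/rank of the first relevant doc over a batch of queries."""
    total = 0.0
    for retrieved, relevant in runs:
        for rank, d in enumerate(retrieved, start=1):
            if d in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(runs)
```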
### Generation Quality

- **Faithfulness:** answer is grounded in the retrieved context
- **Relevance:** answer addresses the question
- **Correctness:** answer is factually accurate
- **Human evaluation:** user satisfaction

### System Metrics

- **Latency:** response time
- **Cost:** API cost per query
- **Cache hit rate:** caching efficiency
## 💡 Best Practices

### Document Processing

- ✅ Chunk size: 256-1024 tokens (experiment!)
- ✅ Overlap: 10-20% of chunk size
- ✅ Preserve metadata (source, date, author)
- ✅ Clean text (remove headers, footers)

### Retrieval

- ✅ Retrieve 3-10 documents (balance context vs. noise)
- ✅ Use hybrid search when possible
- ✅ Re-rank results for better quality
- ✅ Filter by metadata when relevant
### Prompting

- ✅ Provide clear instructions
- ✅ Include relevant context only
- ✅ Ask the LLM to cite sources
- ✅ Handle "I don't know" cases
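These prompting practices can all live in one small prompt builder. A sketch under our own conventions (the `[n]` citation format and wording are illustrative choices, not a standard):

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt: numbered sources, a citation request,
    and an explicit escape hatch for unanswerable questions."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer the question using ONLY the sources below, and cite them "
        "as [n]. If the sources do not contain the answer, say "
        '"I don\'t know."\n\n'
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the chunks lets you map each `[n]` in the answer back to a source document for display.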
### Production

- ✅ Cache embeddings and results
- ✅ Monitor LLM costs
- ✅ Implement rate limiting
- ✅ Add error handling and retries
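Caching and retries are easy to prototype with the standard library before adopting a production cache or a library like tenacity. A minimal sketch (class and function names are our own; real systems would add TTLs, persistence, and narrower exception handling):

```python
import hashlib
import time

class CachingEmbedder:
    """Wrap any embedding function with an in-memory cache keyed by text hash."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def embed(self, text: str):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self.cache:
            self.hits += 1          # repeated text: skip the API call
        else:
            self.misses += 1
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise               # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)
```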
## 📚 Resources

### Documentation

### Papers

### Courses

### Tools
## ✅ Completion Checklist

Before moving on to the MLOps phase, you should be able to:

- [ ] Explain RAG architecture and benefits
- [ ] Process and chunk documents effectively
- [ ] Build a basic RAG pipeline from scratch
- [ ] Use LangChain or LlamaIndex
- [ ] Implement hybrid search (dense + sparse)
- [ ] Add conversation memory to chatbots
- [ ] Evaluate RAG system quality
- [ ] Deploy a working RAG application
- [ ] Understand cost/latency tradeoffs
- [ ] Handle edge cases and errors
## 🚀 What's Next?

**MLOps & Production →**

- Deploy RAG as a scalable API
- Monitor performance and costs
- CI/CD for ML systems
- Cloud deployment (AWS, Azure, GCP)

**Specializations →**

- Multimodal RAG (images + text)
- Agent systems with RAG
- Advanced prompt engineering

Ready to build your first RAG system? → Start with 00_START_HERE.ipynb

Questions? → Check assignment.md and challenges.md for practice exercises.

🚀 Let's build intelligent systems that can learn from your data!