Phase 8: RAG

🎯 Overview

Combine your skills from previous phases to build production-grade RAG systems!

Prerequisites:

  • ✅ Tokenization (Phase 3)

  • ✅ Embeddings (Phase 4)

  • ✅ Neural Networks (Phase 5)

  • ✅ Vector Databases (Phase 6)

Time: 3-4 weeks | 60-80 hours
Outcome: Build AI applications that can query your knowledge base

📚 What You’ll Learn

Core RAG Concepts

  • RAG architecture and pipeline

  • Document processing and chunking strategies

  • Retrieval methods (dense, sparse, hybrid)

  • Context management and prompt construction

  • Re-ranking and result filtering

  • LLM integration (OpenAI, Anthropic, local models)

Advanced RAG Techniques

  • Hybrid search (vector + keyword)

  • Query transformation and expansion

  • Multi-query retrieval

  • Parent-document retrieval

  • Self-query and metadata filtering

  • Conversation memory and context

πŸ—‚οΈ Module StructureΒΆ

08-rag/
├── 00_START_HERE.ipynb           # RAG overview and quick demo
├── 01_basic_rag.ipynb             # Simple RAG from scratch
├── 02_document_processing.ipynb   # Chunking strategies
├── 03_langchain_rag.ipynb         # Using LangChain framework
├── 04_llamaindex_rag.ipynb        # Using LlamaIndex framework
├── 05_advanced_retrieval.ipynb    # Hybrid search, re-ranking
├── 06_conversation_rag.ipynb      # Chat with memory
├── 07_evaluation.ipynb            # RAG evaluation metrics
├── 09_advanced_retrieval.ipynb    # Parent-child retrieval, ensemble
├── 10_graphrag_visual_rag.ipynb   # GraphRAG and multimodal RAG
├── assignment.md                  # Phase assignment
├── challenges.md                  # Hands-on challenges
└── README.md                      # This file

🚀 Quick Start

1. Basic RAG Pipeline

# The fundamental RAG flow:
# 1. Index documents → embeddings → vector DB
# 2. User query → embedding → similarity search
# 3. Retrieved docs + query → LLM → answer

from sentence_transformers import SentenceTransformer
from your_vector_db import VectorDB  # Chroma, Qdrant, etc.

# 1. Index your documents
# Use any embedding model - see 05-embeddings/embedding_comparison.md for options
# API: Gemini Embedding (cheapest + best), Voyage 3.5, or OpenAI
# Local: Qwen3-Embedding, BGE-M3, or all-MiniLM-L6-v2
model = SentenceTransformer('all-MiniLM-L6-v2')  # local, fast
db = VectorDB()  # whichever vector DB you chose in Phase 6
docs = ["Your documents here..."]
embeddings = model.encode(docs)
db.add(documents=docs, embeddings=embeddings)

# 2. Retrieve relevant context
query = "What is RAG?"
query_embedding = model.encode(query)
results = db.search(query_embedding, top_k=3)

# 3. Generate answer with LLM (Claude, GPT, Gemini, or local)
# `llm` stands in for your chosen client (OpenAI, Anthropic, Ollama, ...)
context = "\n".join(results)
prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
response = llm.generate(prompt)

📋 Learning Path

Week 1: RAG Fundamentals

  • Complete 00_START_HERE.ipynb

  • Build basic RAG in 01_basic_rag.ipynb

  • Learn chunking strategies in 02_document_processing.ipynb

  • Project: Simple Q&A on your documents

Week 2: RAG Frameworks

  • Learn LangChain in 03_langchain_rag.ipynb

  • Explore LlamaIndex in 04_llamaindex_rag.ipynb

  • Compare frameworks and choose your favorite

  • Project: Build a research paper assistant

Week 3: Advanced Techniques

  • Implement hybrid search in 05_advanced_retrieval.ipynb

  • Add conversation memory in 06_conversation_rag.ipynb

  • Learn evaluation in 07_evaluation.ipynb

  • Project: Code search system for your repos

Week 4: Production Project

  • Build end-to-end RAG application

  • Add proper error handling

  • Implement caching and optimization

  • Deploy as API (preview of the MLOps phase)

  • Capstone: Personal knowledge assistant

πŸ› οΈ Technologies You’ll UseΒΆ

LLM Frameworks:

  • LangChain - Most popular, extensive ecosystem

  • LlamaIndex - Best for document indexing

  • Haystack - Production-focused

LLM Providers:

  • OpenAI (GPT-5.4, GPT-4.1, GPT-4.1-mini)

  • Anthropic (Claude Sonnet 4.6, Haiku 4.5)

  • Google (Gemini 3.1 Pro, Flash)

  • Local models (Qwen 3, Llama 4, DeepSeek R1 via Ollama)

Vector Databases:

  • Use what you learned in Phase 6!

  • Chroma, Qdrant, Weaviate, Milvus

Embeddings:

  • OpenAI embeddings (text-embedding-3-small/large)

  • Sentence Transformers (all-MiniLM-L6-v2, all-mpnet-base-v2)

  • Cohere embeddings

📊 Key Concepts Explained

1. RAG Pipeline

Documents → Split → Embed → Store in Vector DB
                                    ↓
User Query → Embed → Search → Retrieve Top-K
                                    ↓
Retrieved Docs + Query → LLM Prompt → Answer

2. Chunking Strategies

Fixed-size chunks:

chunk_size = 512  # tokens or characters
overlap = 50      # overlap between chunks

Semantic chunks:

  • Split by paragraphs, sentences

  • Preserve document structure

  • Maintain context boundaries

Recursive splitting:

  • Try different separators (\n\n, \n, ., space)

  • Preserve hierarchy
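The fixed-size strategy above can be sketched in a few lines. This is a character-based sketch with illustrative names; production pipelines usually count tokens with the tokenizer from Phase 3 instead:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks, each sharing `overlap`
    characters with the previous chunk so context isn't cut abruptly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("word " * 300, chunk_size=512, overlap=50)
```

Note that a larger overlap improves continuity between chunks but inflates storage and embedding cost, which is why 10-20% of the chunk size is a common starting point.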

3. Retrieval Methods

Dense (Vector Search):

  • Semantic similarity

  • Works for paraphrased queries

  • Requires embeddings

Sparse (Keyword Search):

  • BM25, TF-IDF

  • Exact keyword matching

  • Fast and interpretable

Hybrid:

  • Combine both approaches

  • Re-rank with RRF (Reciprocal Rank Fusion)

  • Best of both worlds
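Reciprocal Rank Fusion itself is only a few lines. A minimal sketch (the doc IDs are illustrative; k=60 is the commonly used smoothing constant from the original RRF paper):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs: each doc accumulates
    1 / (k + rank) from every list it appears in, then docs are
    sorted by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_b", "doc_c"]   # from vector search
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # from BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
# doc_b comes out on top: it ranked highly in both lists
```

Because RRF only uses ranks, not raw scores, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.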

🎯 Projects

Project 1: Personal Documentation Q&A

Build a chatbot that answers questions about your personal notes, docs, PDFs.

Features:

  • Upload PDFs, TXTs, Markdown files

  • Chunk and embed documents

  • Conversational interface

  • Source citation

Project 2: Code Search Engine

Semantic search across your GitHub repositories.

Features:

  • Index code files (Python, JavaScript, etc.)

  • Search by intent (“how to connect to database?”)

  • Show relevant code snippets

  • Explain code functionality

Project 3: Research Assistant

Query academic papers and scientific literature.

Features:

  • Process research papers (PDFs)

  • Extract citations and references

  • Summarize papers

  • Compare multiple papers

Project 4: Customer Support Bot

RAG-powered FAQ system.

Features:

  • Index support documentation

  • Handle common questions

  • Escalate to human when needed

  • Track conversation context

📈 Evaluation Metrics

Retrieval Quality

  • Precision@K: Relevant docs in top K results

  • Recall@K: % of relevant docs retrieved

  • MRR (Mean Reciprocal Rank): Position of first relevant result

  • NDCG: Normalized Discounted Cumulative Gain
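These metrics are straightforward to compute once you have a labeled set of relevant documents per query. A minimal sketch (doc IDs and function names are illustrative, not a fixed evaluation API):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved docs that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant docs that appear in the top k."""
    return sum(doc in relevant for doc in retrieved[:k]) / len(relevant)

def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1/rank of the first relevant doc, over all queries."""
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)

retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d9"}
precision_at_k(retrieved, relevant, k=2)  # 0.5 (d1 relevant, d2 not)
recall_at_k(retrieved, relevant, k=4)     # 2/3 (d1 and d3 of three relevant)
```

Precision@K and Recall@K ignore ordering within the top K; MRR and NDCG reward putting relevant documents earlier.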

Generation Quality

  • Faithfulness: Answer grounded in context

  • Relevance: Answer addresses the question

  • Correctness: Factually accurate

  • Human evaluation: User satisfaction

System Metrics

  • Latency: Response time

  • Cost: API costs per query

  • Cache hit rate: Efficiency

💡 Best Practices

Document Processing

✅ Chunk size: 256-1024 tokens (experiment!)
✅ Overlap: 10-20% of chunk size
✅ Preserve metadata (source, date, author)
✅ Clean text (remove headers, footers)

Retrieval

✅ Retrieve 3-10 documents (balance context vs noise)
✅ Use hybrid search when possible
✅ Re-rank results for better quality
✅ Filter by metadata when relevant

Prompting

✅ Provide clear instructions
✅ Include relevant context only
✅ Ask LLM to cite sources
✅ Handle “I don’t know” cases
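All four guidelines above can be baked into the prompt template itself. A sketch, where the chunk dict shape and instruction wording are assumptions rather than a fixed API:

```python
def build_prompt(query: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt from retrieved chunks.
    Each chunk is assumed to look like {"source": "...", "text": "..."}."""
    context = "\n\n".join(
        f"[{i}] (source: {c['source']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the context below, and cite the "
        "sources you used as [1], [2], etc. If the context does not "
        "contain the answer, reply \"I don't know.\"\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"
    )

prompt = build_prompt(
    "What is RAG?",
    [{"source": "notes.md", "text": "RAG augments an LLM with retrieved context."}],
)
```

Numbering the chunks makes the model's citations checkable: you can map every [n] in the answer back to a concrete source.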

Production

✅ Cache embeddings and results
✅ Monitor LLM costs
✅ Implement rate limiting
✅ Add error handling and retries

🔗 Resources

Documentation

Papers

Courses

Tools

✅ Completion Checklist

Before moving on to the next phase (MLOps), you should be able to:

  • Explain RAG architecture and benefits

  • Process and chunk documents effectively

  • Build basic RAG pipeline from scratch

  • Use LangChain or LlamaIndex

  • Implement hybrid search (dense + sparse)

  • Add conversation memory to chatbots

  • Evaluate RAG system quality

  • Deploy a working RAG application

  • Understand cost/latency tradeoffs

  • Handle edge cases and errors

🎓 What’s Next?

Next phase: MLOps & Production →

  • Deploy RAG as scalable API

  • Monitor performance and costs

  • CI/CD for ML systems

  • Cloud deployment (AWS, Azure, GCP)

After that: Specializations →

  • Multimodal RAG (images + text)

  • Agent systems with RAG

  • Advanced prompt engineering

Ready to build your first RAG system? → Start with 00_START_HERE.ipynb

Questions? → Check assignment.md and challenges.md for practice exercises

🚀 Let’s build intelligent systems that can learn from your data!