Phase 7: Retrieval-Augmented Generation (RAG) - Pre-Quiz
Time: 15 minutes
Questions: 10
Passing Score: 70%
Purpose: Assess your baseline knowledge before learning about RAG systems
Question 1 (Easy)
What does RAG stand for?
A) Random Access Generation
B) Retrieval-Augmented Generation ✓
C) Recursive Automated Generation
D) Rapid AI Gateway
Explanation
Answer: B) Retrieval-Augmented Generation
RAG is a technique that combines information retrieval with language generation. It retrieves relevant documents/passages and uses them to augment the LLM's response, grounding answers in factual sources.
Reference: Phase 7 - Introduction to RAG
Question 2 (Easy)
What is the main problem that RAG solves?
A) Slow inference speed
B) LLM hallucinations and outdated knowledge ✓
C) High training costs
D) Model size limitations
Explanation
Answer: B) LLM hallucinations and outdated knowledge
RAG addresses:
Hallucinations: LLMs making up facts → RAG grounds responses in retrieved documents
Outdated knowledge: LLMs trained on old data → RAG accesses current information
Domain-specific needs: LLMs lack specialized knowledge → RAG retrieves from custom knowledge bases
Reference: Phase 7 - Why RAG?
Question 3 (Medium)
What are the three main components of a RAG system?
A) Train, Test, Deploy
B) Indexing, Retrieval, Generation ✓
C) Input, Process, Output
D) Encode, Store, Decode
Explanation
Answer: B) Indexing, Retrieval, Generation
RAG Pipeline:
Indexing (Offline):
Split documents into chunks
Create embeddings
Store in vector database
Retrieval (Query time):
Embed user query
Find similar chunks
Retrieve top-k relevant passages
Generation (Query time):
Combine query + retrieved docs
Send to LLM
Generate grounded response
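The three stages above can be sketched end-to-end in a toy script. Everything here is illustrative: `embed` is a hypothetical stand-in that counts words rather than calling a real embedding model, and the "database" is a plain Python list scanned linearly.

```python
import math
import re

def embed(text):
    # Hypothetical stand-in for a learned embedding model: a word-count vector.
    counts = {}
    for word in re.findall(r"\w+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Indexing (offline): chunk documents and store their embeddings.
chunks = [
    "RAG combines retrieval with generation.",
    "Vector databases store embeddings.",
    "Fine-tuning updates model weights.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieval (query time): embed the query, rank chunks by similarity.
query = "What is RAG?"
q_vec = embed(query)
top_chunk = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# 3. Generation (query time): combine query + retrieved text into a prompt for the LLM.
prompt = f"Context: {top_chunk}\n\nQuestion: {query}"
print(prompt)
```

A real system swaps in an embedding model and a vector database for steps 1-2, and sends the assembled prompt to an LLM in step 3; the data flow is the same.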
Reference: Phase 7 - RAG Architecture
Question 4 (Medium)
What is a vector embedding?
A) A compressed file format
B) A numerical representation of text in high-dimensional space ✓
C) A type of database index
D) A machine learning model
Explanation
Answer: B) A numerical representation of text in high-dimensional space
Vector Embedding:
Converts text → array of numbers (e.g., 384, 768, or 1536 dimensions)
Similar meaning → similar vectors (close in vector space)
Enables semantic search (beyond keyword matching)
Example:

```
"dog"   → [0.12, -0.34, 0.56, ..., 0.78]    # 384 dims
"puppy" → [0.15, -0.31, 0.54, ..., 0.81]    # Similar!
"car"   → [-0.45, 0.67, -0.23, ..., -0.12]  # Different
```
Reference: Phase 7 - Embeddings & Vector Search
Question 5 (Medium)
What is the purpose of "chunking" documents in RAG?
A) To compress file size
B) To break large documents into smaller, retrievable pieces ✓
C) To encrypt data
D) To format text for display
Explanation
Answer: B) To break large documents into smaller, retrievable pieces
Why chunk?
LLMs have token limits (can't process entire books)
Retrieve only relevant sections (not entire documents)
Better precision (specific paragraphs vs. whole pages)
Chunking strategies:
Fixed size: 500 tokens per chunk
Sentence-based: Complete sentences only
Semantic: Chunks by topic/section
Overlap: Chunks share 50-100 tokens
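A minimal sketch of the fixed-size strategy with overlap, counting words as a rough proxy for tokens (production code would count real model tokens with a tokenizer):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size word chunks, with neighbors sharing `overlap` words."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window reached the end
            break
    return chunks

# Small demo: 12 words, chunks of 5 with a 2-word overlap.
chunks = chunk_text(
    "one two three four five six seven eight nine ten eleven twelve",
    chunk_size=5, overlap=2,
)
print(chunks)
# → ['one two three four five', 'four five six seven eight',
#    'seven eight nine ten eleven', 'ten eleven twelve']
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated storage.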
Reference: Phase 7 - Document Processing
Question 6 (Hard)
Which similarity metric is commonly used to find relevant chunks?
A) Euclidean distance
B) Manhattan distance
C) Cosine similarity ✓
D) Hamming distance
Explanation
Answer: C) Cosine similarity
Cosine Similarity:
```
similarity = (A · B) / (||A|| * ||B||)
```
Range: -1 to +1
+1: Identical direction
0: Orthogonal (unrelated)
-1: Opposite direction
Why cosine?
Measures angle, not magnitude
Works well for high-dimensional text embeddings
Efficient to compute
Alternatives:
Dot product: Faster but magnitude-sensitive
Euclidean: Can work but less common
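The formula above translates directly into a few lines of Python, and the small checks mirror the range described:

```python
import math

def cosine_similarity(a, b):
    # similarity = (A · B) / (||A|| * ||B||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # identical direction → 1.0
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal → 0.0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite direction → -1.0
print(cosine_similarity([1, 0], [2, 0]))   # magnitude ignored → still 1.0
```

The last line shows the "angle, not magnitude" property: doubling a vector's length leaves its cosine similarity unchanged, which is exactly why it behaves well for embeddings of texts of different lengths.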
Reference: Phase 7 - Similarity Search
Question 7 (Medium)
What is a vector database?
A) A SQL database with vectors
B) A specialized database optimized for similarity search ✓
C) A NoSQL document store
D) A graph database
Explanation
Answer: B) A specialized database optimized for similarity search
Vector Databases:
Store high-dimensional embeddings efficiently
Perform fast approximate nearest neighbor (ANN) search
Scale to millions/billions of vectors
Popular vector DBs:
Pinecone (managed)
Weaviate (open-source)
Qdrant (open-source)
Chroma (lightweight)
FAISS (Facebook AI library)
What traditional databases can't do: efficiently answer "find the most similar vectors" queries at scale
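To make the scale point concrete, here is the brute-force linear scan that vector databases replace with ANN indexes. The vectors are random stand-ins for real embeddings; the query is deliberately placed near stored vector #42:

```python
import math
import random

random.seed(0)
DIM = 64

def normalize(v):
    # Scale a vector to unit length so dot product == cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# 500 stored unit vectors standing in for chunk embeddings.
vectors = [normalize([random.gauss(0, 1) for _ in range(DIM)])
           for _ in range(500)]

# A query vector very close to stored vector #42.
query = normalize([x + random.gauss(0, 0.01) for x in vectors[42]])

# Linear scan over every stored vector: O(n) work per query.
best = max(range(len(vectors)),
           key=lambda i: sum(a * b for a, b in zip(vectors[i], query)))
print(best)  # → 42
```

This O(n) scan is fine for 500 vectors but not for a billion; ANN indexes (e.g. HNSW graphs) trade a little accuracy for sub-linear query time, which is the core engineering value of a vector database.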
Reference: Phase 9 - Vector Databases
Question 8 (Hard)
In a RAG system, what is the "context window"?
A) The time limit for queries
B) The maximum tokens the LLM can process ✓
C) The number of documents indexed
D) The embedding dimension
Explanation
Answer: B) The maximum tokens the LLM can process
Context Window:
Maximum tokens LLM can handle in one request
Includes: system prompt + retrieved docs + user query + response
Example (GPT-3.5):
Context window: 4,096 tokens
System prompt: 500 tokens
Retrieved docs: 2,000 tokens
User query: 100 tokens
Available for response: 1,496 tokens
Modern models:
GPT-4: 8K-128K tokens
Claude 3: 200K tokens
Gemini 1.5: 1M tokens
RAG consideration: Must fit retrieved chunks within context window!
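The budget in the GPT-3.5 example above is simple subtraction. The numbers are the illustrative ones from the example; real code would measure each part with the model's tokenizer rather than assume fixed counts:

```python
# Token budget for one request (illustrative numbers from the example above).
context_window = 4096   # model's hard limit
system_prompt = 500     # instructions sent with every request
retrieved_docs = 2000   # chunks returned by the retriever
user_query = 100        # the question itself

# Whatever is left over is the maximum length of the model's response.
available_for_response = context_window - (system_prompt + retrieved_docs + user_query)
print(available_for_response)  # → 1496
```

In practice a RAG system does this arithmetic before calling the LLM, dropping or truncating the lowest-ranked retrieved chunks until everything fits.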
Reference: Phase 7 - LLM Context Limits
Question 9 (Medium)
What does "k" represent in "top-k retrieval"?
A) The number of keywords
B) The number of documents to retrieve ✓
C) The embedding dimension
D) The chunk size
Explanation
Answer: B) The number of documents to retrieve
Top-k Retrieval:
Retrieve the k most similar chunks
Common values: k = 3, 5, or 10
Example:
```python
# Query: "What is RAG?"
# Vector DB contains 10,000 chunks
results = vector_db.search(query_embedding, top_k=5)
# Returns the 5 most relevant chunks
```
Trade-offs:
Small k (3): Fast, focused, but might miss context
Large k (10): More context, but slower and noisier
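A runnable sketch of the ranking step itself. The chunk names and scores here are invented for illustration; a real vector database computes the scores from embeddings and uses an ANN index instead of a full sort:

```python
# Similarity scores per chunk (made-up values for illustration).
scores = {
    "chunk_a": 0.91,
    "chunk_b": 0.35,
    "chunk_c": 0.78,
    "chunk_d": 0.66,
    "chunk_e": 0.12,
}

top_k = 3
# Sort chunk ids by descending score and keep the k best.
results = sorted(scores, key=scores.get, reverse=True)[:top_k]
print(results)  # → ['chunk_a', 'chunk_c', 'chunk_d']
```

Changing `top_k` is the knob described in the trade-offs above: a smaller value keeps only the highest-scoring chunks, a larger one admits noisier matches.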
Reference: Phase 7 - Retrieval Strategies
Question 10 (Hard)
What is the main difference between RAG and fine-tuning?
A) RAG is faster to deploy
B) RAG retrieves external knowledge; fine-tuning updates model weights ✓
C) RAG is more accurate
D) Fine-tuning doesn't require data
Explanation
Answer: B) RAG retrieves external knowledge; fine-tuning updates model weights
| Aspect | RAG | Fine-Tuning |
|---|---|---|
| Knowledge source | External documents (retrieved) | Model parameters (learned) |
| Updates | Add/remove documents anytime | Retrain model (expensive) |
| Use case | Factual Q&A, current data | Domain adaptation, style |
| Cost | Low (no retraining) | High (GPU hours) |
| Transparency | Can cite sources | Black box |
When to use:
RAG: Frequently changing data, need citations
Fine-tuning: Specialized behavior, consistent style
Both: Often combined for best results!
Reference: Phase 7 - RAG vs Fine-Tuning
Scoring Guide
0-3 correct (0-30%): RAG is an advanced topic. Make sure you've completed:
Phase 5 (Neural Networks)
Phase 10 (Prompt Engineering)
Phase 8 (LangChain basics)
4-5 correct (40-50%): You have some relevant background. Review vector embeddings and LLM basics before starting.
6-7 correct (60-70%): Good foundation! You're ready to learn RAG with focused effort.
8-9 correct (80-90%): Excellent! You understand the core concepts. This phase will teach you implementation.
10 correct (100%): Outstanding! You may have RAG experience. You'll learn production best practices and advanced techniques.
Prerequisites Checklist
Before starting Phase 7, ensure you understand:
✓ How LLMs work (Phase 10)
✓ Vector embeddings (Phase 9)
✓ Python and APIs
✓ Basic prompt engineering
✓ JSON and data structures
Next Steps
After this pre-quiz:
Score < 50%: Review prerequisites first
Phase 10: Prompt Engineering
Phase 9: Vector Databases intro
LLM fundamentals
Score 50-70%: Start Phase 7 but take it slow
Revisit embedding concepts
Practice with simple examples
Ask questions in community
Score > 70%: Dive into Phase 7!
Build a RAG system
Complete the assignment
Try advanced challenges
Remember: RAG is complex but incredibly powerful. Take your time and build incrementally!