Assignment: Build a Production-Ready RAG System

🎯 Objective

Build a complete Retrieval-Augmented Generation (RAG) system for a real-world use case. Your system should handle document ingestion, intelligent retrieval, and high-quality answer generation with proper evaluation metrics.

Estimated Time: 8-10 hours
Difficulty: ⭐⭐⭐⭐ Advanced
Due Date: 2 weeks from assignment

📊 Grading Rubric

Criteria	Exemplary (A: 90-100%)	Proficient (B: 80-89%)	Adequate (C: 70-79%)	Needs Work (D/F: <70%)
Document Processing (20pts)	Multi-format, semantic chunking, metadata	Good chunking, basic metadata	Simple chunking only	Broken or incomplete
Retrieval (25pts)	Hybrid search + reranking, excellent relevance	Good retrieval, some reranking	Basic semantic search	Poor retrieval quality
Generation (25pts)	Citations, confidence, high quality answers	Good answers, some citations	Basic answers generated	Poor answer quality
Evaluation (30pts)	Comprehensive metrics, 50+ test cases, deep analysis	Good metrics, 30-49 tests	Basic eval, 20-29 tests	Incomplete evaluation

🎯 Use Case Options (Choose One)

Option 1: Technical Documentation Assistant

Dataset: Python/React/AWS documentation
Challenge: Handle code examples, API references
Special requirement: Syntax highlighting in answers

Option 2: Research Paper Q&A

Dataset: ArXiv papers in your field of interest
Challenge: Mathematical notation, citations
Special requirement: LaTeX rendering

Option 3: Company Knowledge Base

Dataset: Internal docs, wikis, Slack conversations
Challenge: Privacy, access control
Special requirement: User permissions

Option 4: Legal Document Analysis

Dataset: Court cases, statutes, regulations
Challenge: Precise language, citations critical
Special requirement: Confidence scoring

Option 5: Medical Literature Search

Dataset: PubMed articles, clinical trials
Challenge: Technical terminology, accuracy critical
Special requirement: Source verification

📦 Submission Requirements

Repository Structure

your-name-rag-system/
├── README.md                          # Setup and usage guide
├── requirements.txt                   # Dependencies
├── .env.example                       # Environment variables template
├── src/
│   ├── document_processor.py          # Part 1
│   ├── retriever.py                   # Part 2
│   ├── generator.py                   # Part 3
│   ├── evaluator.py                   # Part 4
│   └── rag_system.py                  # Main system
├── data/
│   ├── documents/                     # Source documents
│   ├── test_set.json                  # Evaluation questions
│   └── ground_truth.json              # Expected answers
├── notebooks/
│   ├── 01_data_preparation.ipynb
│   ├── 02_retrieval_experiments.ipynb
│   ├── 03_generation_tuning.ipynb
│   └── 04_evaluation_analysis.ipynb
├── tests/
│   ├── test_processor.py
│   ├── test_retriever.py
│   ├── test_generator.py
│   └── test_integration.py
├── results/
│   ├── metrics.json
│   ├── error_analysis.md
│   └── charts/
└── EVALUATION_REPORT.md               # Detailed analysis

Deliverables

Working RAG System:
- All 4 parts implemented
- Passes all tests
- CLI or API interface
- Demo notebook
Evaluation Report:
- Methodology description
- Metrics tables and charts
- Error analysis
- Optimization attempts
- Conclusions
Test Dataset:
- 50+ diverse questions
- Ground truth answers
- Difficulty levels
- Coverage of edge cases
Demo:
- 5-minute video OR live Gradio/Streamlit app
- Show: ingestion → retrieval → generation → evaluation
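
The test dataset and the metrics tables can be tied together with a small scoring loop. The sketch below assumes each test case is a dict with the hypothetical keys `question`, `relevant_ids`, and `difficulty` (the assignment does not prescribe a schema) and that retrieval returns a ranked list of document IDs. It computes recall@k and mean reciprocal rank (MRR), two common retrieval metrics; answer-quality metrics would need a separate step.

```python
from statistics import mean


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(set(relevant_ids))


def reciprocal_rank(retrieved_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(test_cases, retrieve, k=5):
    """Average recall@k and MRR over a list of test cases.

    `retrieve` is any callable mapping a question string to a ranked
    list of document IDs (a stand-in for your own retriever).
    """
    recalls, rranks = [], []
    for case in test_cases:
        ranked = retrieve(case["question"])
        recalls.append(recall_at_k(ranked, case["relevant_ids"], k))
        rranks.append(reciprocal_rank(ranked, case["relevant_ids"]))
    return {f"recall@{k}": mean(recalls), "mrr": mean(rranks)}


# Tiny illustrative test set (hypothetical schema and IDs)
test_cases = [
    {"question": "q1", "relevant_ids": ["d1", "d2"], "difficulty": "easy"},
    {"question": "q2", "relevant_ids": ["d9"], "difficulty": "hard"},
]
fake_index = {"q1": ["d1", "d3", "d2"], "q2": ["d4", "d9"]}
metrics = evaluate_retrieval(test_cases, fake_index.get, k=2)
print(metrics)
```

Grouping the per-case scores by `difficulty` before averaging is one way to feed the error analysis.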

💡 Advanced Tips

Tip 1: Semantic Chunking Strategy

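import re


def semantic_chunking(text, max_chunk_size=512):
    """
    Chunk text at semantic boundaries (sizes are in characters).
    Priority: paragraphs > sentences > words
    """
    def pack(units, sep):
        # Greedily join units with sep into chunks of at most max_chunk_size
        chunks, current = [], ""
        for unit in units:
            candidate = current + sep + unit if current else unit
            if len(candidate) <= max_chunk_size:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = unit  # a single oversized word is kept whole
        if current:
            chunks.append(current)
        return chunks

    pieces = []
    for para in text.split('\n\n'):
        para = para.strip()
        if not para:
            continue
        if len(para) <= max_chunk_size:
            pieces.append(para)
            continue
        # Paragraph too long: fall back to sentences, then to words
        units = []
        for sentence in re.split(r'(?<=[.!?])\s+', para):
            if len(sentence) <= max_chunk_size:
                units.append(sentence)
            else:
                units.extend(pack(sentence.split(), ' '))
        pieces.extend(pack(units, ' '))

    # Merge small neighboring pieces back up to the size limit
    return pack(pieces, '\n\n')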
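
The rubric also rewards chunk metadata. One possible way to attach it is sketched here, assuming the chunks are plain strings taken verbatim from the source text. The `Chunk` dataclass, its field names, and `attach_metadata` are illustrative and not part of the assignment.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str       # e.g. file name of the originating document
    index: int        # position of the chunk within the document
    start_char: int   # offset in the original text, -1 if not found verbatim


def attach_metadata(chunks, full_text, source):
    """Wrap raw chunk strings with source and position metadata."""
    result = []
    cursor = 0
    for i, text in enumerate(chunks):
        # Search from the end of the previous match so repeated passages
        # map to successive positions rather than the first occurrence.
        start = full_text.find(text, cursor)
        if start != -1:
            cursor = start + len(text)
        result.append(Chunk(text=text, source=source, index=i, start_char=start))
    return result


doc = "Intro paragraph.\n\nSecond paragraph.\n\nIntro paragraph."
chunks = attach_metadata(doc.split("\n\n"), doc, source="guide.md")
for c in chunks:
    print(c.index, c.start_char, repr(c.text))
```

Character offsets make it possible to link a citation back to the exact span in the source document; page numbers or section titles could be added the same way for formats that have them.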

Tip 2: Hybrid Search Implementation

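def hybrid_search(self, query, top_k=10, alpha=0.7):
    """
    Combine dense (embeddings) + sparse (BM25) search.

    alpha: weight for dense search (1-alpha for sparse)

    Assumes self.vector_store.similarity_search returns (doc, score) pairs
    where a higher score means more similar, and that self.bm25 is a
    rank_bm25 index built over self.documents, in the same order, from
    lowercased whitespace tokens.
    """
    def normalize(scores):
        # Min-max scale to [0, 1] so dense and BM25 scores are comparable
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        return {k: (v - lo) / (hi - lo) if hi > lo else 1.0
                for k, v in scores.items()}

    # Dense retrieval (fetch extra candidates so the two lists can overlap)
    query_embedding = self.embed(query)
    dense_results = self.vector_store.similarity_search(
        query_embedding, k=top_k*2
    )
    dense = normalize({doc.id: score for doc, score in dense_results})

    # Sparse retrieval (BM25): get_scores returns one score per document
    bm25_scores = self.bm25.get_scores(query.lower().split())
    ranked = sorted(
        zip(self.documents, bm25_scores), key=lambda x: x[1], reverse=True
    )[:top_k*2]
    sparse = normalize({doc.id: score for doc, score in ranked})

    # Combine with weighted scoring; a doc missing from one list scores 0 there
    combined_scores = {
        doc_id: alpha * dense.get(doc_id, 0.0)
        + (1 - alpha) * sparse.get(doc_id, 0.0)
        for doc_id in dense.keys() | sparse.keys()
    }

    # Sort and return top k
    sorted_docs = sorted(
        combined_scores.items(),
        key=lambda x: x[1],
        reverse=True
    )[:top_k]

    return [self.get_doc(doc_id) for doc_id, _ in sorted_docs]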
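
Weighted score fusion depends on the dense and sparse scores being on comparable scales. A common alternative that sidesteps this is reciprocal rank fusion (RRF), which uses only each document's rank in every list. A minimal sketch, assuming each retriever returns a ranked list of document IDs; the function name is illustrative, and `k=60` is the value commonly used for RRF rather than anything the assignment specifies.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=10):
    """Fuse several ranked lists of doc IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_k]]


dense_ranking = ["a", "b", "c"]
sparse_ranking = ["c", "a", "d"]
fused_ids = reciprocal_rank_fusion([dense_ranking, sparse_ranking], top_k=3)
print(fused_ids)
```

Documents that rank well in both lists rise to the top, and there is no `alpha` to tune, which makes RRF a useful baseline to compare against the weighted approach in your retrieval experiments.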

Tip 3: Citation Extraction

import re


def generate_with_citations(self, query, chunks):
    """Generate answer with inline citations."""
    # Number each chunk
    context = "\n\n".join(
        f"[{i+1}] {chunk.text}"
        for i, chunk in enumerate(chunks)
    )

    # Build the prompt line by line so no stray indentation reaches the model
    prompt = (
        "Answer using the numbered sources below.\n"
        "Include citations like [1], [2] in your answer.\n\n"
        f"{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

    answer = self.llm(prompt)

    # Extract citations and map to sources, dropping numbers that match
    # no chunk (the model can cite sources that were never provided)
    cited = sorted({int(c) for c in re.findall(r'\[(\d+)\]', answer)})
    valid = [c for c in cited if 1 <= c <= len(chunks)]
    sources = [chunks[c-1].metadata for c in valid]

    return {
        "answer": answer,
        "sources": sources,
        "citation_count": len(valid)
    }
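
The rubric pairs citations with confidence. One crude signal, offered only as a heuristic sketch, is citation coverage: the share of answer sentences that carry at least one `[n]` marker. The function name and the sentence-splitting rule below are assumptions for illustration, and a low score flags an answer for review rather than proving it wrong.

```python
import re


def citation_coverage(answer):
    """Return the fraction of sentences containing at least one [n] citation."""
    # Naive split on '.', '!' or '?' followed by whitespace. A marker placed
    # after the closing punctuation ("... fast. [1]") would be detached, so
    # this assumes citations sit before the end of the sentence.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    cited = sum(1 for s in sentences if re.search(r"\[\d+\]", s))
    return cited / len(sentences)


answer = (
    "BM25 is a sparse ranking function [1]. It is fast. "
    "Dense models capture paraphrases [2]."
)
coverage = citation_coverage(answer)
print(f"{coverage:.2f}")
```

Coverage only checks that citations are present, not that the cited chunk supports the claim; verifying support would need a separate comparison of each sentence against its source.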

📚 Resources

Essential Reading

Tools & Libraries

Vector Stores: ChromaDB, Pinecone, Weaviate, Qdrant
Embeddings: OpenAI, Cohere, Sentence-Transformers
Frameworks: LangChain, LlamaIndex, Haystack
Evaluation: RAGAS, DeepEval

Papers

🎓 Learning Objectives

After completing this assignment, you will:

✅ Build end-to-end RAG systems
✅ Implement advanced retrieval techniques
✅ Optimize for quality, speed, and cost
✅ Evaluate RAG systems comprehensively
✅ Deploy production-ready AI applications

💬 Support

Office Hours: Tuesdays/Thursdays 2-5 PM
Discussion: GitHub Discussions
Email: instructor@zero-to-ai.com

Good luck building your RAG system! 🚀

📋 Requirements

Part 1: Document Processing Pipeline (20 points)

Part 2: Vector Database & Retrieval (25 points)

Part 3: Answer Generation (25 points)

Part 4: Evaluation & Testing (30 points)