Master Study Guide
Goal: Become a competitive AI/ML engineer in 6-12 months.
How to Use This Guide
Read top-to-bottom once for the full picture
Use CAREER_ROADMAP.md to target job roles
Follow checklist.md to track progress
Use REFERENCES.md for courses and papers per phase
Use INTERVIEW_PREP.md for practice
Build portfolio projects progressively; they matter more than certifications
Learning Tracks
Track A: AI Engineer (4-6 months)
Focus on LLMs, RAG, agents, deployment. Skip deep math.
Month 1: Phase 0 (Foundations) → Phase 4 (Embeddings) → Phase 5 (NN basics only)
Month 2: Phase 6 (Vector DBs) → Phase 7 (RAG) → Phase 18 (Low-code demos)
Month 3: Phase 10 (Prompt Eng) → Phase 13 (Local LLMs) → Phase 14 (Agents)
Month 4: Phase 8 (MLOps) → Phase 11 (Fine-tuning) → Portfolio projects
Month 5: Interview prep → Applications
Track B: ML Engineer (8-10 months)
Full foundation plus advanced topics.
Month 1-2: Phases 0-2 (Python, Data Science, Math)
Month 3: Phases 3-5 (Tokenization, Embeddings, Neural Networks)
Month 4: Phases 6-7 (Vector DBs, RAG) + Phase 18 (demos)
Month 5: Phases 8-9 (MLOps, Specializations) + Phase 16 (Eval)
Month 6-7: Phases 10-14 (Prompt Eng, Fine-tuning, Agents, Local LLMs)
Month 8: Phase 19 (Safety) + Portfolio + Phase 15 (Streaming)
Month 9: Interview prep + Applications
Track C: Data Scientist (6-8 months)
Statistics, experimentation, classical ML.
Month 1-2: Phases 0-2 (Python, Statistics, Math)
Month 3-4: Phase 27 (Causal Inference) + Phase 26 (Time Series)
Month 5: Phase 7 (RAG) + Phase 10 (Prompt Eng)
Month 6: Phase 16 (Eval) + Portfolio + Interview prep
Phase-by-Phase Notes
Phase 0: Foundations (1 week)
Supervised vs unsupervised vs RL. Train/val/test splits. Overfitting. Loss functions. ML vs deep learning vs LLMs.
Start: 23-glossary/GLOSSARY.md
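The train/val/test discipline from this phase can be sketched in a few lines of plain Python. This is a toy illustration only; in practice you would reach for scikit-learn's `train_test_split`:

```python
import random

def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle and split a dataset into train/validation/test partitions."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The key property to remember: the three partitions never overlap, and the test set is touched only once, at the very end.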
Phase 1: Python & Data Science (3-4 weeks)
NumPy, pandas, scikit-learn. Focus on: LinearRegression, LogisticRegression, RandomForest, GradientBoosting, KMeans, PCA.
Key notebooks: 02-data-science/
Phase 2: Mathematics for ML (2-3 weeks)
Linear algebra (vectors, matrices, dot products), gradient descent, probability. You need intuition, not proofs.
Key notebooks: 03-maths/foundational/
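"Intuition, not proofs" for gradient descent fits in a few lines. A toy 1-D example (not library code): repeatedly step opposite the gradient and you slide toward the minimum.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a 1-D function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # the whole algorithm is this one line
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0
```

Training a neural network is this same loop, just with millions of parameters and gradients computed by backprop.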
Phase 3: Tokenization (1 week)
Tokens != words. BPE tokenization. Why it matters for cost and context. tiktoken and HuggingFace tokenizers.
Key notebooks: 04-token/
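The core BPE idea is small enough to sketch: repeatedly merge the most frequent adjacent pair of tokens. This is a deliberately simplified illustration; real tokenizers (tiktoken, HuggingFace) add byte-level handling, pre-tokenization rules, and learned merge tables.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")  # start from individual characters
for _ in range(3):                 # three merge rounds
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```

After a few merges, frequent substrings like "low" become single tokens, which is exactly why common words cost one token and rare words cost several.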
Phase 4: Embeddings (1 week)
Text to vectors. Cosine similarity. Word vs sentence embeddings. Choosing between local models and APIs (Gemini Embedding, Voyage, OpenAI, Sentence Transformers).
Key notebooks: 05-embeddings/
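Cosine similarity, the workhorse of this phase, is worth implementing once by hand before letting a library do it:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))  # 1.0
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

Note that cosine similarity ignores vector length; only direction matters, which is why embeddings are often stored pre-normalized.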
Phase 5: Neural Networks (2-3 weeks)
Neurons, layers, activations, backprop. The Transformer architecture. Self-attention. Multi-head attention.
Key notebooks: 06-neural-networks/
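Self-attention reduces to one formula: softmax(QK^T / sqrt(d)) V. A single-query, toy-dimension sketch in plain Python (real implementations batch this with matrix libraries):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # subtract max for numerical stability
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # attention weights sum to 1
    # Output is the attention-weighted average of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

out = attention(query=[1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[10.0, 0.0], [0.0, 10.0]])
print(out)  # leans toward the first value vector, since the query matches key 1
```

Multi-head attention just runs several copies of this with different learned projections and concatenates the results.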
Phase 6: Vector Databases (1 week)
ANN search, HNSW. ChromaDB for prototypes, Qdrant for production, pgvector if you already use Postgres.
Key notebooks: 07-vector-databases/
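It helps to see the exact search that ANN indexes like HNSW are approximating. A brute-force top-k over a tiny in-memory "corpus" (toy code; a real vector DB avoids this O(N) scan):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def top_k(query, corpus, k=2):
    """Exact nearest-neighbor search: score every vector, keep the best k.
    HNSW and friends trade a little recall for sub-linear query time."""
    scored = sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

corpus = {"doc1": [1.0, 0.0], "doc2": [0.9, 0.1], "doc3": [0.0, 1.0]}
print(top_k([1.0, 0.05], corpus))  # ['doc1', 'doc2']
```

When you benchmark a vector DB, this exact scan is your ground truth for measuring recall.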
Phase 7: RAG Systems (2 weeks)
The RAG pipeline: chunk → embed → store → retrieve → rerank → generate. Chunking strategies. Hybrid search. RAGAS evaluation.
RAG is the most in-demand AI skill in enterprise right now. The basic version is easy; making it work well is the challenge.
Key notebooks: 08-rag/
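The pipeline above can be sketched end to end with stand-ins for the model calls. Here `embed` is a toy bag-of-words and `generate` just assembles the augmented prompt; in a real system those two functions would call an embedding model and an LLM:

```python
def chunk(text, size=40):
    """Fixed-size character chunks (real systems use smarter chunking strategies)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Toy embedding: bag-of-words counts. A real pipeline uses a trained model."""
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

def score(query_vec, doc_vec):
    return sum(query_vec.get(w, 0) * c for w, c in doc_vec.items())

def retrieve(query, store, k=1):
    ranked = sorted(store, key=lambda d: score(embed(query), d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

def generate(query, context):
    """Stand-in for the LLM call: just shows the augmented prompt shape."""
    return f"Answer '{query}' using: {' | '.join(context)}"

docs = ["Qdrant is a vector database.", "FastAPI serves Python APIs."]
store = [{"text": c, "vec": embed(c)} for d in docs for c in chunk(d)]  # chunk -> embed -> store
query = "what is Qdrant?"
result = generate(query, retrieve(query, store))  # retrieve -> generate
print(result)
```

Every production concern in this phase (chunking strategy, hybrid search, reranking, RAGAS) is an upgrade to exactly one of these functions.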
Phase 8: MLOps (2 weeks)
MLflow, FastAPI, Docker, GitHub Actions, monitoring. 80% of ML projects fail because of bad ops, not bad models.
Key notebooks: 09-mlops/
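A minimal container for a model-serving API looks something like the sketch below. It is hypothetical: it assumes an `app.py` that defines a FastAPI instance named `app` and a `requirements.txt` listing `fastapi` and `uvicorn`.

```dockerfile
# Hypothetical FastAPI model server image; assumes app.py defines `app`.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Copying `requirements.txt` before the rest of the code lets Docker cache the dependency layer, so code-only changes rebuild in seconds.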
Phase 9: Specializations (2-3 weeks, pick one)
AI Agents: ReAct, tool calling, LangGraph, MCP → 10-specializations/ai-agents/
Computer Vision: CNNs, YOLO, CLIP → 10-specializations/computer-vision/
NLP: BERT fine-tuning, NER, summarization → 10-specializations/nlp/
Phase 10: Prompt Engineering (1 week)
Zero/few-shot, Chain-of-Thought, ReAct, structured outputs, DSPy. Treat prompting like software engineering: version control, testing, iteration.
Key notebooks: 11-prompt-engineering/
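"Prompting as software engineering" starts with building prompts from code instead of editing strings by hand. A hypothetical helper (`few_shot_prompt` is illustrative, not from any library) that assembles a few-shot prompt you can version-control and test:

```python
def few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input."""
    lines = [instruction, ""]
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    lines += [f"Input: {query}", "Output:"]  # trailing "Output:" cues the model
    return "\n".join(lines)

prompt = few_shot_prompt(
    instruction="Classify the sentiment as positive or negative.",
    examples=[("I loved it", "positive"), ("Terrible service", "negative")],
    query="Best purchase I ever made",
)
print(prompt)
```

Because the prompt is built by a function, you can unit-test its shape and diff changes in code review, which is the whole point of treating prompts like software.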
Phase 11: LLM Fine-tuning (2-3 weeks)
Decision order: prompt → RAG → fine-tune. LoRA, QLoRA, SFT, DPO. Use Unsloth for 2-5x faster training.
Key notebooks: 12-llm-finetuning/
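The arithmetic behind LoRA's efficiency is worth doing once. Instead of updating a full d_out × d_in weight matrix, LoRA trains two low-rank factors B (d_out × r) and A (r × d_in), so the update is B @ A:

```python
# Parameter count for one weight matrix, full fine-tune vs. LoRA at rank r.
d_out, d_in, r = 4096, 4096, 8

full_update_params = d_out * d_in        # every entry of W is trainable
lora_params = d_out * r + r * d_in       # only B and A are trainable

print(full_update_params)                 # 16777216
print(lora_params)                        # 65536
print(full_update_params // lora_params)  # 256
```

At rank 8 on a 4096-wide layer, that is 256x fewer trainable parameters, which is why LoRA (and QLoRA, which additionally quantizes the frozen base weights) fits on a single consumer GPU.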
Phase 12: Multimodal AI (1 week)
CLIP, vision-language models (GPT-5.4, Claude Sonnet 4.6, Qwen2.5-VL), Stable Diffusion, multimodal RAG.
Key notebooks: 13-multimodal/
Phase 13: Local LLMs (1 week)
Ollama (Qwen 3, Llama 4, DeepSeek R1), llama.cpp, vLLM. Quantization formats. TurboQuant for KV cache compression (ICLR 2026). OpenAI-compatible local APIs.
Key notebooks: 14-local-llms/
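Quantization decisions come down to back-of-envelope memory math. A rough estimate for the weights alone (it ignores the KV cache, activations, and runtime overhead, which add more on top):

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for model weights alone: params * bits / 8 bytes, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

n = 7e9  # a 7B-parameter model
print(round(weight_memory_gb(n, 16), 1))  # 14.0  (FP16)
print(round(weight_memory_gb(n, 4), 1))   # 3.5   (4-bit quantization)
```

This is why a 7B model that needs a 16 GB GPU at FP16 runs comfortably on an 8 GB card once quantized to 4 bits.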
Phase 14: AI Agents (2 weeks)
ReAct loop, tool calling, agent memory, multi-agent systems, MCP, LangGraph.
Key notebooks: 15-ai-agents/
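The ReAct loop itself is a small control structure. In this sketch the "LLM" is a stubbed policy function (`fake_llm` is illustrative); a real agent would ask a model to choose the next thought/action from the observation history:

```python
TOOLS = {
    # Toy tool for the demo; never eval untrusted input in a real agent.
    "calculator": lambda expr: str(eval(expr)),
}

def fake_llm(question, observations):
    """Stand-in policy: a real agent asks an LLM to pick the next action."""
    if not observations:
        return ("act", "calculator", "6 * 7")        # Thought: I should compute this.
    return ("finish", f"The answer is {observations[-1]}.")

def react_loop(question, max_steps=5):
    observations = []
    for _ in range(max_steps):                        # reason -> act -> observe, repeated
        decision = fake_llm(question, observations)
        if decision[0] == "finish":
            return decision[1]
        _, tool, tool_input = decision
        observations.append(TOOLS[tool](tool_input))  # observation feeds the next step
    return "gave up"

print(react_loop("What is 6 times 7?"))  # The answer is 42.
```

Everything this phase adds (memory, multi-agent systems, MCP, LangGraph) is scaffolding around this loop: better state between iterations, better tool discovery, and explicit graphs instead of a plain `for` loop.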
Supplementary Phases
Model Evaluation → 16-model-evaluation/ (do after Phase 7)
Debugging → 17-debugging-troubleshooting/ (alongside portfolio projects)
Low-Code Tools → 18-low-code-ai-tools/ (Gradio/Streamlit for demos)
AI Safety → 19-ai-safety-redteaming/ (before deploying anything publicly)
Streaming → 20-real-time-streaming/ (for production UX)
Portfolio Projects
Required (3 projects)
Project 1: RAG Document Assistant. LangChain/LlamaIndex + ChromaDB + Ollama/OpenAI + FastAPI + Streamlit. Add hybrid search, reranking, and RAGAS eval.
Project 2: Fine-tuned Domain Expert. Pick a domain. Fine-tune Qwen 3 4B or Phi-4 Mini with QLoRA. Evaluate before/after. Deploy with Ollama or vLLM.
Project 3: End-to-End MLOps Pipeline. Train a model. Track with MLflow. Serve with FastAPI + Docker. CI/CD with GitHub Actions.
Differentiators
Project 4: AI Agent with Real Tools (3+ tools, memory, documented failure modes)
Project 5: Evaluation Framework (automated regression testing for model quality)
Key Papers to Know
| Paper | Year | Contribution |
|---|---|---|
| Attention Is All You Need | 2017 | Transformer architecture |
| BERT | 2018 | Bidirectional pre-training |
| GPT-3 | 2020 | Scale + few-shot prompting |
| RAG (Lewis et al.) | 2020 | Retrieval-augmented generation |
| LoRA | 2021 | Parameter-efficient fine-tuning |
| InstructGPT (RLHF) | 2022 | Alignment via human feedback |
| Chain-of-Thought | 2022 | Reasoning via step-by-step prompts |
| DPO | 2023 | Alignment without RL |
Weekly Schedule Template
Weekdays (3 hrs/day): 1 hr theory → 1 hr notebook → 1 hr project work
Weekends (4-5 hrs/day): 2-3 hrs deep dive → 1-2 hrs portfolio → 30 min review
Progress Checkpoints
Week 4: Can you train a scikit-learn classifier, clean data, evaluate properly?
Week 8: Can you explain transformers? Build a simple neural network in PyTorch?
Week 12: Can you build a working RAG system? Evaluate its quality?
Week 16: Can you fine-tune a small LLM? Deploy it with FastAPI?
Week 20: Can you build a working AI agent with tools and memory?
Week 24: 3+ portfolio projects on GitHub with READMEs?