Master Study Guide

Goal: Become a competitive AI/ML engineer in 6-12 months.

How to Use This Guide

  1. Read top-to-bottom once for the full picture

  2. Use CAREER_ROADMAP.md to target job roles

  3. Follow checklist.md to track progress

  4. Use REFERENCES.md for courses and papers per phase

  5. Use INTERVIEW_PREP.md for practice

  6. Build portfolio projects progressively; they matter more than certifications

Learning Tracks

Track A: AI Engineer (4-6 months)

Focus on LLMs, RAG, agents, deployment. Skip deep math.

Month 1:  Phase 0 (Foundations) → Phase 4 (Embeddings) → Phase 5 (NN basics only)
Month 2:  Phase 6 (Vector DBs) → Phase 7 (RAG) → Phase 18 (Low-code demos)
Month 3:  Phase 10 (Prompt Eng) → Phase 13 (Local LLMs) → Phase 14 (Agents)
Month 4:  Phase 8 (MLOps) → Phase 11 (Fine-tuning) → Portfolio projects
Month 5:  Interview prep → Applications

Track B: ML Engineer (8-10 months)

Full foundation plus advanced topics.

Month 1-2: Phases 0-2 (Python, Data Science, Math)
Month 3:   Phases 3-5 (Tokenization, Embeddings, Neural Networks)
Month 4:   Phases 6-7 (Vector DBs, RAG) + Phase 18 (demos)
Month 5:   Phases 8-9 (MLOps, Specializations) + Phase 16 (Eval)
Month 6-7: Phases 10-14 (Prompt Eng, Fine-tuning, Agents, Local LLMs)
Month 8:   Phase 19 (Safety) + Portfolio + Phase 15 (Streaming)
Month 9:   Interview prep + Applications

Track C: Data Scientist (6-8 months)

Statistics, experimentation, classical ML.

Month 1-2: Phases 0-2 (Python, Statistics, Math)
Month 3-4: Phase 27 (Causal Inference) + Phase 26 (Time Series)
Month 5:   Phase 7 (RAG) + Phase 10 (Prompt Eng)
Month 6:   Phase 16 (Eval) + Portfolio + Interview prep

Phase-by-Phase Notes

Phase 0: Foundations (1 week)

Supervised vs unsupervised vs RL. Train/val/test splits. Overfitting. Loss functions. ML vs deep learning vs LLMs.

Start: 23-glossary/GLOSSARY.md

Phase 1: Python & Data Science (3-4 weeks)

NumPy, pandas, scikit-learn. Focus on: LinearRegression, LogisticRegression, RandomForest, GradientBoosting, KMeans, PCA.

Key notebooks: 02-data-science/
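The core Phase 1 workflow can be sketched in a few lines: split first, fit, then evaluate on held-out data. The two-blob dataset below is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Two Gaussian blobs: class 0 around (0, 0), class 1 around (3, 3).
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Always hold out a test set before fitting anything.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression().fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {acc:.2f}")  # well-separated blobs score near 1.0
```

Swapping `LogisticRegression` for `RandomForestClassifier` or `GradientBoostingClassifier` leaves the rest of the pipeline unchanged, which is the point of the scikit-learn estimator API.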

Phase 2: Mathematics for ML (2-3 weeks)

Linear algebra (vectors, matrices, dot products), gradient descent, probability. You need intuition, not proofs.

Key notebooks: 03-maths/foundational/
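The intuition for gradient descent fits in a few lines: repeatedly step against the derivative. The function and learning rate below are arbitrary choices for illustration.

```python
# Gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3.
# Update rule: x <- x - lr * f'(x).
def grad(x):
    return 2 * (x - 3)  # derivative of (x - 3)^2

x, lr = 0.0, 0.1
for _ in range(100):
    x -= lr * grad(x)

print(round(x, 4))  # converges to ~3.0
```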

Phase 3: Tokenization (1 week)

Tokens != words. BPE tokenization. Why it matters for cost and context. tiktoken and Hugging Face tokenizers.

Key notebooks: 04-token/
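One BPE training step, stripped to its essence: count adjacent symbol pairs, fuse the most frequent pair everywhere. Real tokenizers repeat this thousands of times; the corpus here is a toy example.

```python
from collections import Counter

corpus = ["low", "lower", "lowest"]
words = [list(w) for w in corpus]          # start from characters

# Count adjacent symbol pairs across the corpus.
pairs = Counter()
for w in words:
    for a, b in zip(w, w[1:]):
        pairs[(a, b)] += 1
best = pairs.most_common(1)[0][0]          # most frequent adjacent pair

def merge(word, pair):
    """Fuse every occurrence of `pair` in a tokenized word."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

words = [merge(w, best) for w in words]
print(words)
```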

Phase 4: Embeddings (1 week)

Text to vectors. Cosine similarity. Word vs sentence embeddings. Choosing between local models and APIs (Gemini Embedding, Voyage, OpenAI, Sentence Transformers).

Key notebooks: 05-embeddings/
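Cosine similarity is the one formula to internalize here. The three-dimensional "embeddings" below are made-up numbers; real models produce hundreds to thousands of dimensions.

```python
import numpy as np

def cosine(a, b):
    # Cosine of the angle between two vectors: dot product over norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = np.array([0.9, 0.1, 0.2])
dog = np.array([0.8, 0.2, 0.25])
car = np.array([0.1, 0.9, 0.7])

print(cosine(cat, dog))  # high: related concepts point the same way
print(cosine(cat, car))  # lower: unrelated concepts diverge
```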

Phase 5: Neural Networks (2-3 weeks)

Neurons, layers, activations, backprop. The Transformer architecture. Self-attention. Multi-head attention.

Key notebooks: 06-neural-networks/
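The heart of the Transformer, scaled dot-product attention, is short enough to write from scratch: softmax(QKᵀ/√d_k)V. Random toy matrices stand in for learned projections here.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) pairwise similarities
    weights = softmax(scores)        # each row is a distribution over positions
    return weights @ V, weights      # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
out, w = attention(Q, K, V)
print(out.shape, w.sum(axis=-1))  # (4, 8); each weight row sums to 1
```

Multi-head attention just runs several of these in parallel with different learned Q/K/V projections and concatenates the outputs.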

Phase 6: Vector Databases (1 week)

ANN search, HNSW. ChromaDB for prototypes, Qdrant for production, pgvector if you already use Postgres.

Key notebooks: 07-vector-databases/
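Before reaching for HNSW, it helps to see the exact search it approximates: a brute-force dot product against every stored vector. Everything below is synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 64))                 # 1000 stored embeddings
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit-normalize once

# Query is a slightly perturbed copy of item 42.
query = db[42] + rng.normal(scale=0.01, size=64)
query /= np.linalg.norm(query)

scores = db @ query                  # cosine similarity (unit vectors)
top3 = np.argsort(scores)[::-1][:3]  # indices of the 3 nearest items
print(top3[0])                       # 42: the perturbed source wins
```

ANN indexes like HNSW trade a little recall for sublinear query time; at a few thousand vectors, this exact scan is already fast enough.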

Phase 7: RAG Systems (2 weeks)

The RAG pipeline: chunk → embed → store → retrieve → rerank → generate. Chunking strategies. Hybrid search. RAGAS evaluation.

RAG is the most in-demand AI skill in enterprise right now. The basic version is easy; making it work well is the challenge.

Key notebooks: 08-rag/
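The end-to-end shape of the pipeline can be seen on toy data, using word overlap as a stand-in for real embeddings. The documents and query are invented for illustration.

```python
docs = [
    "The refund policy allows returns within 30 days.",
    "Shipping takes 5 to 7 business days.",
    "Support is available by email and chat.",
]

# chunk -> "embed" (bag of words here) -> store
store = [(d, set(d.lower().split())) for d in docs]

def retrieve(query, k=1):
    # Rank stored chunks by word overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(store, key=lambda item: len(q & item[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

# retrieve -> generate: the retrieved chunk is stuffed into the prompt
context = retrieve("how long do refund returns take")[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: ..."
print(context)
```

Swap the bag-of-words sets for real embeddings and a vector DB, add reranking and evaluation, and this skeleton becomes Project 1.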

Phase 8: MLOps (2 weeks)

MLflow, FastAPI, Docker, GitHub Actions, monitoring. Most ML projects that fail do so because of bad ops, not bad models.

Key notebooks: 09-mlops/
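As a sketch of how the pieces fit, here is a hypothetical GitHub Actions workflow for a repo like Project 3: run tests on every push, then build the Docker image that serves the model. File name, requirements file, and image tag are assumptions.

```yaml
# .github/workflows/ci.yml -- hypothetical CI pipeline, adapt to your repo
name: ci
on: [push, pull_request]
jobs:
  test-and-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest                            # unit + model smoke tests
      - run: docker build -t my-model-api .    # image that serves via FastAPI
```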

Phase 9: Specializations (2-3 weeks, pick one)

Phase 10: Prompt Engineering (1 week)

Zero/few-shot, Chain-of-Thought, ReAct, structured outputs, DSPy. Treat prompting like software engineering: version control, testing, iteration.

Key notebooks: 11-prompt-engineering/
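"Prompting like software engineering" mostly means building prompts from data instead of hand-editing one giant string, so examples can be versioned and tested. A minimal few-shot sketch, with invented examples:

```python
# Few-shot examples live in data, not in a hand-edited string.
EXAMPLES = [
    ("I loved this product!", "positive"),
    ("Terrible, broke after a day.", "negative"),
]

def build_prompt(text):
    # Assemble instruction + shots + the new input to classify.
    shots = "\n".join(f"Review: {r}\nLabel: {l}" for r, l in EXAMPLES)
    return (
        f"Classify the review sentiment.\n\n{shots}\n\n"
        f"Review: {text}\nLabel:"
    )

prompt = build_prompt("Works fine, nothing special.")
print(prompt)
```

Because `EXAMPLES` is plain data, changing shots is a diff you can review, and a test suite can assert on prompt structure before anything hits an LLM.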

Phase 11: LLM Fine-tuning (2-3 weeks)

Decision: prompt β†’ RAG β†’ fine-tune. LoRA, QLoRA, SFT, DPO. Use Unsloth for 2-5x faster training.

Key notebooks: 12-llm-finetuning/
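The arithmetic behind LoRA's efficiency is worth doing once: a full d×d weight update is replaced by two low-rank factors B (d×r) and A (r×d). The sizes below are illustrative (d = 4096 is a common hidden size, r = 8 a common rank).

```python
import numpy as np

d, r = 4096, 8
full_update = d * d            # params to train a dense delta-W
lora_update = d * r + r * d    # params in the factors B and A
print(full_update, lora_update, full_update / lora_update)  # 256x fewer

# The effective update is still a full d x d matrix: delta_W = B @ A.
B = np.zeros((d, r))           # B starts at zero, so delta_W = 0 at init
A = np.random.default_rng(0).normal(size=(r, d))
print((B @ A).shape)
```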

Phase 12: Multimodal AI (1 week)

CLIP, vision-language models (GPT-5.4, Claude Sonnet 4.6, Qwen2.5-VL), Stable Diffusion, multimodal RAG.

Key notebooks: 13-multimodal/

Phase 13: Local LLMs (1 week)

Ollama (Qwen 3, Llama 4, DeepSeek R1), llama.cpp, vLLM. Quantization formats. TurboQuant for KV cache compression (ICLR 2026). OpenAI-compatible local APIs.

Key notebooks: 14-local-llms/
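Quantization decisions come down to back-of-envelope memory math. The sketch below covers weights only (the KV cache and activations add more) and ignores per-format overhead; a 7B-parameter model is assumed.

```python
params = 7e9
bytes_fp16 = params * 2    # 16 bits = 2 bytes per weight
bytes_q4 = params * 0.5    # ~4 bits per weight, overhead ignored
print(f"fp16: {bytes_fp16 / 1e9:.1f} GB, 4-bit: {bytes_q4 / 1e9:.1f} GB")
```

This is why a 7B model that needs a 16 GB GPU at fp16 fits on a laptop at 4-bit, and why KV-cache compression matters once contexts get long.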

Phase 14: AI Agents (2 weeks)

ReAct loop, tool calling, agent memory, multi-agent systems, MCP, LangGraph.

Key notebooks: 15-ai-agents/
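The ReAct loop is a plain control loop: decide, act, observe, repeat, answer. In the skeletal sketch below the "decide" step, which would be an LLM call in a real agent, is a hard-coded stub, and the single tool is a toy.

```python
def calculator(expr):
    return str(eval(expr))  # toy tool; never eval untrusted input

TOOLS = {"calculator": calculator}

def agent(question, max_steps=3):
    memory = []  # scratchpad of (action, argument, observation) tuples
    for _ in range(max_steps):
        # Stub "LLM": if the question has digits, call the calculator once.
        if any(c.isdigit() for c in question) and not memory:
            action, arg = "calculator", "17 * 23"
            observation = TOOLS[action](arg)
            memory.append((action, arg, observation))
        else:
            # Final answer grounded in the last observation, if any.
            return memory[-1][2] if memory else "no tool needed"
    return memory[-1][2]

print(agent("What is 17 * 23?"))  # 391
```

Everything the frameworks add, tool schemas for function calling, persistent memory, multi-agent routing, MCP servers, slots into this same loop.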

Supplementary Phases

Portfolio Projects

Required (3 projects)

Project 1: RAG Document Assistant. LangChain/LlamaIndex + ChromaDB + Ollama/OpenAI + FastAPI + Streamlit. Add hybrid search, reranking, RAGAS eval.

Project 2: Fine-tuned Domain Expert. Pick a domain. Fine-tune Qwen 3 4B or Phi-4 Mini with QLoRA. Evaluate before/after. Deploy with Ollama or vLLM.

Project 3: End-to-End MLOps Pipeline. Train a model. Track with MLflow. Serve with FastAPI + Docker. CI/CD with GitHub Actions.

Differentiators

Project 4: AI Agent with Real Tools – 3+ tools, memory, documented failure modes

Project 5: Evaluation Framework – automated regression testing for model quality

Key Papers to Know

| Paper                     | Year | Contribution                       |
|---------------------------|------|------------------------------------|
| Attention Is All You Need | 2017 | Transformer architecture           |
| BERT                      | 2018 | Bidirectional pre-training         |
| GPT-3                     | 2020 | Scale + few-shot prompting         |
| RAG (Lewis et al.)        | 2020 | Retrieval-augmented generation     |
| LoRA                      | 2021 | Parameter-efficient fine-tuning    |
| InstructGPT (RLHF)        | 2022 | Alignment via human feedback       |
| Chain-of-Thought          | 2022 | Reasoning via step-by-step prompts |
| DPO                       | 2023 | Alignment without RL               |

Weekly Schedule Template

Weekdays (3 hrs/day): 1 hr theory → 1 hr notebook → 1 hr project work

Weekends (4-5 hrs/day): 2-3 hrs deep dive → 1-2 hrs portfolio → 30 min review

Progress Checkpoints

  • Week 4: Can you train a scikit-learn classifier, clean data, evaluate properly?

  • Week 8: Can you explain transformers? Build a simple neural network in PyTorch?

  • Week 12: Can you build a working RAG system? Evaluate its quality?

  • Week 16: Can you fine-tune a small LLM? Deploy it with FastAPI?

  • Week 20: Can you build a working AI agent with tools and memory?

  • Week 24: 3+ portfolio projects on GitHub with READMEs?