# Complete Reference Materials: Videos, Repos, Courses, Papers

Curated resources aligned to each learning phase. Free resources are marked with (FREE).

## How to Use This File

- Don't try to consume all of these; use them as a menu.
- For each phase, pick 1-2 video resources and 1-2 repos to study.
- "Must-watch" items are marked with `**`.

## Video Courses & YouTube Channels

### Mathematics & Foundations

| Resource | Description | Link |
|---|---|---|
| 3Blue1Brown: Essence of Linear Algebra | Best visual explanation of linear algebra | https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab |
| 3Blue1Brown: Essence of Calculus | Intuitive calculus from scratch | https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr |
| 3Blue1Brown: Neural Networks | 4-part series, best visual intro to neural networks | https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi |
| StatQuest with Josh Starmer | Every ML concept explained simply with cartoons | |
| MIT 18.065 (Gilbert Strang) | Full MIT linear algebra course (more rigorous) | https://www.youtube.com/playlist?list=PLUl4u3cNGP63oMNUHXqIUcrkS2PivhN3k |
| Stanford CS229 (Andrew Ng) | Classic ML course, full lectures | https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU |

### Deep Learning & Neural Networks

| Resource | Description | Link |
|---|---|---|
| Andrej Karpathy: Neural Networks Zero to Hero | Build GPT from scratch, best hands-on series | https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ |
| Andrej Karpathy: makemore series | Build a character-level LM from scratch | |
| MIT 6.S191: Intro to Deep Learning | Annual MIT deep learning course | |
| fast.ai Practical Deep Learning | Top-down approach, very practical | https://www.youtube.com/playlist?list=PLfYUBJiXbdtSvpQjSnJJ_PmDQB_VyT5iU |
| Stanford CS231N: CNNs for Visual Recognition | Best course on computer vision | https://www.youtube.com/playlist?list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv |
| Stanford CS224N: NLP with Deep Learning | Best course on NLP/transformers | https://www.youtube.com/playlist?list=PLoROMvodv4rOSH4v6133s9LFPRHjEmbmJ |

### LLMs, Transformers, and Generative AI

| Resource | Description | Link |
|---|---|---|
| Andrej Karpathy: Let's build GPT | Build GPT-2 from scratch in 2 hours | |
| Andrej Karpathy: Let's build the GPT Tokenizer | Hands-on tokenization | |
| Yannic Kilcher: Paper Explanations | Deep research paper walkthroughs | |
| The AI Epiphany | Clear explanations of recent AI papers | |
| Umar Jamil: Transformers from scratch | Code implementations of major transformer papers | |
| Two Minute Papers | 3-5 minute summaries of top AI research | |

### RAG, Agents, and Applied LLMs

| Resource | Description | Link |
|---|---|---|
| Sam Witteveen (Red Dragon AI) | RAG, LangChain, agents hands-on | |
| James Briggs | LangChain, RAG, Pinecone tutorials | |
| Data Independent | LangChain + OpenAI practical tutorials | |
| LangChain Official | Official LangChain tutorials and walkthroughs | |
| Matt Williams (Ollama) | Local LLMs with Ollama | |

### Fine-tuning and MLOps

| Resource | Description | Link |
|---|---|---|
| Trelis Research | LLM fine-tuning, quantization, deployment | |
| Weights & Biases (W&B) | MLOps, experiment tracking tutorials | |
| Made With ML | Full MLOps course (free) | |
| Patrick Loeber | PyTorch tutorials from scratch | |
| Sentdex | Python, ML, reinforcement learning | |

## Free Online Courses

### Must-Complete (Curated for Job Readiness)

| Course | Platform | Duration | Cost |
|---|---|---|---|
| Hugging Face NLP Course | HuggingFace | 4-6 weeks | FREE |
| fast.ai Practical Deep Learning | fast.ai | 8 weeks | FREE |
| DeepLearning.AI: ChatGPT Prompt Engineering | Coursera | 1 week | FREE audit |
| Made With ML | madewithml.com | 6-8 weeks | FREE |
| Stanford CS229 Machine Learning | Stanford | 12 weeks | FREE |
| Stanford CS224N NLP with Deep Learning | Stanford | 10 weeks | FREE |
| MIT OpenCourseWare 6.036 | MIT | 12 weeks | FREE |
| Google Machine Learning Crash Course | Google | 3-4 weeks | FREE |
| Microsoft AI for Beginners | GitHub | 24 lessons | FREE |
| Microsoft ML for Beginners | GitHub | 26 lessons | FREE |
| Microsoft Generative AI for Beginners | GitHub | 18 lessons | FREE |
| Microsoft AI Agents for Beginners | GitHub | Self-paced | FREE |

### Paid (Worth the Investment)

| Course | Platform | Duration | Cost |
|---|---|---|---|
| Deep Learning Specialization (Andrew Ng) | Coursera | 3-4 months | $49/mo |
| LLM Fine-tuning with Hugging Face | Coursera/DL.AI | 2 weeks | $49/mo |
| LangChain for LLM Applications | Coursera/DL.AI | 1 week | $49/mo |
| Building AI Agents (DeepLearning.AI) | Coursera | 1-2 weeks | $49/mo |
| MLOps Specialization | Coursera | 4 months | $49/mo |
| Full Stack LLM Bootcamp | FSDL | Self-paced | $200 |

## GitHub Repositories to Study

### Foundational ML

| Repo | Stars | Why Study It |
|---|---|---|
| scikit-learn/scikit-learn | 60k+ | Industry-standard ML library; study the examples folder |
| ageron/handson-ml3 | 28k+ | Hands-On ML book notebooks (excellent quality) |
| | 5k+ | ML with scikit-learn, Keras, TensorFlow book code |
| microsoft/ML-For-Beginners | 70k+ | 26 lessons, curriculum-style |

### Deep Learning & PyTorch

| Repo | Stars | Why Study It |
|---|---|---|
| karpathy/nanoGPT | 38k+ | Minimal, readable GPT-2 implementation; a must-read |
| karpathy/minbpe | 10k+ | Minimal BPE tokenizer implementation |
| pytorch/tutorials | 8k+ | Official PyTorch tutorials |
| | 26k+ | High-level DL library, great for learning patterns |
| | 4k+ | Clean transformer implementations |

### LLMs and NLP

| Repo | Stars | Why Study It |
|---|---|---|
| huggingface/transformers | 140k+ | Industry standard; study examples and model cards |
| huggingface/peft | 17k+ | LoRA/QLoRA fine-tuning library |
| huggingface/trl | 12k+ | SFT, DPO, RLHF training |
| openai/openai-cookbook | 62k+ | Official OpenAI examples and best practices |
| anthropics/anthropic-cookbook | 8k+ | Official Claude/Anthropic examples |
| mlabonne/llm-course | 42k+ | Excellent LLM engineering course with notebooks |

### RAG and Vector Search

| Repo | Stars | Why Study It |
|---|---|---|
| langchain-ai/langchain | 97k+ | Most popular LLM orchestration framework |
| run-llama/llama_index | 38k+ | Alternative to LangChain, strong for RAG |
| chroma-core/chroma | 16k+ | Most popular local vector database |
| qdrant/qdrant | 21k+ | High-performance vector DB for production |
| pgvector/pgvector | 13k+ | Vector search in PostgreSQL |
| BerriAI/litellm | 16k+ | Unified interface to 100+ LLM APIs |

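
All of the vector stores above implement the same core operation: embed documents, embed the query, and return the nearest neighbors by similarity. A minimal pure-Python sketch of that idea, using toy 3-dimensional vectors standing in for real embedding-model output (the documents and numbers are illustrative only):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=2):
    """Return the k document texts whose vectors best match the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Toy "embeddings"; real systems use hundreds of dimensions from an embedding model.
docs = [
    ("intro to transformers", [0.9, 0.1, 0.0]),
    ("cooking pasta at home", [0.0, 0.2, 0.9]),
    ("attention mechanisms",  [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], docs, k=2))
# ['intro to transformers', 'attention mechanisms']
```

The libraries listed here add the parts that matter at scale: approximate-nearest-neighbor indexes, persistence, filtering, and metadata, but the retrieval semantics are exactly this loop.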
### Agents and Tool Use

| Repo | Stars | Why Study It |
|---|---|---|
| langchain-ai/langgraph | 10k+ | Graph-based stateful agent framework |
| openai/swarm | 18k+ | OpenAI's lightweight multi-agent framework |
| microsoft/autogen | 37k+ | Multi-agent conversation framework |
| crewAIInc/crewAI | 26k+ | Role-based multi-agent framework |
| | Growing | MCP standard for tool integration |

### Fine-tuning

| Repo | Stars | Why Study It |
|---|---|---|
| unslothai/unsloth | 22k+ | 2-4x faster fine-tuning with less memory |
| | 9k+ | Powerful fine-tuning config framework |
| huggingface/alignment-handbook | 5k+ | Official HF recipes for SFT and DPO |
| hiyouga/LLaMA-Factory | 40k+ | Easy fine-tuning for many models |
| | 2k+ | Curated LLM fine-tuning datasets |

### MLOps and Deployment

| Repo | Stars | Why Study It |
|---|---|---|
| mlflow/mlflow | 19k+ | Experiment tracking and model registry |
| wandb/wandb | 9k+ | Weights & Biases experiment tracking |
| vllm-project/vllm | 44k+ | Fast LLM inference and serving |
| tiangolo/fastapi | 80k+ | Fast API framework for ML serving |
| gradio-app/gradio | 34k+ | Build ML demos in minutes |
| streamlit/streamlit | 36k+ | Data apps in pure Python |

### Evaluation

| Repo | Stars | Why Study It |
|---|---|---|
| explodinggradients/ragas | 7k+ | RAG evaluation framework |
| openai/evals | 14k+ | OpenAI's evaluation framework |
| EleutherAI/lm-evaluation-harness | 7k+ | Industry-standard LLM benchmarking |
| huggingface/evaluate | 2k+ | Metrics for ML models |

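
Before reaching for these frameworks, it helps to know what the simplest metrics actually compute. Here is a sketch of SQuAD-style token-overlap F1 in plain Python. This is simplified: the official SQuAD script also strips punctuation and articles, while this version only lowercases and splits on whitespace:

```python
def token_f1(prediction, reference):
    """Token-overlap F1, the classic extractive-QA answer metric."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Count overlapping tokens, respecting multiplicity.
    ref_counts = {}
    for t in ref:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    common = 0
    for t in pred:
        if ref_counts.get(t, 0) > 0:
            common += 1
            ref_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the eiffel tower", "eiffel tower"))
# 0.8 (precision 2/3, recall 1.0)
```

LLM-as-judge frameworks like ragas build on the same precision/recall intuition, just with model-graded rather than token-level overlap.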
## Blogs and Reading

### Must-Read Blogs

| Blog | Why Read It | Link |
|---|---|---|
| Lilian Weng (OpenAI) | Deeply researched posts on transformer internals, RL, agents | |
| Sebastian Ruder | NLP research trends and summaries | |
| Jay Alammar | Visual explanations of the Transformer and BERT | |
| Andrej Karpathy | Occasional deep-dive blog posts | |
| The Gradient | Long-form ML research coverage | |
| Import AI (Jack Clark) | Weekly AI news for practitioners | |
| Simon Willison | Practical LLM and tool use | |
| Eugene Yan | Applied ML in production | |

### Research Papers (Required Reading for Senior Roles)

Start with the abstracts and conclusions; read in full the ones most relevant to your role.

**Foundation papers:**

- Attention Is All You Need (2017) - the Transformer
- BERT (2018) - bidirectional pre-training
- Language Models are Few-Shot Learners / GPT-3 (2020) - scaling + prompting
- An Image is Worth 16x16 Words / ViT (2020) - Vision Transformers

**RAG and retrieval:**

- Retrieval-Augmented Generation (2020) - the original RAG paper
- Precise Zero-Shot Dense Retrieval / HyDE (2022) - the HyDE technique

**Fine-tuning and alignment:**

- LoRA (2021) - low-rank adaptation
- QLoRA (2023) - quantized LoRA on consumer hardware
- InstructGPT / RLHF (2022) - training with human feedback
- Direct Preference Optimization / DPO (2023) - alignment without RL
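
The arithmetic behind LoRA's efficiency is worth internalizing before reading the paper: instead of updating a full weight matrix, it trains two small matrices of rank r whose product approximates the update. A back-of-envelope sketch (the 4096 dimension is illustrative, typical of an attention projection in a 7B-scale model):

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters for a LoRA adapter on one weight matrix.

    LoRA freezes W (d_out x d_in) and learns a low-rank update B @ A,
    where A is (rank x d_in) and B is (d_out x rank).
    """
    return rank * d_in + d_out * rank

full_params = 4096 * 4096                                 # full fine-tuning of one matrix
lora_params = lora_trainable_params(4096, 4096, rank=8)   # LoRA adapter at rank 8
print(full_params, lora_params, full_params // lora_params)
# 16777216 65536 256  -> roughly 256x fewer trainable weights for this matrix
```

QLoRA then stores the frozen W in 4-bit precision, which is why the combination fits on consumer hardware.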
**Agents and tools:**

- ReAct: Synergizing Reasoning and Acting (2022) - ReAct agents
- Toolformer (2023) - teaching LLMs to use tools

**Prompting:**

- Chain-of-Thought Prompting (2022) - step-by-step reasoning
- Large Language Models are Zero-Shot Reasoners (2022) - "Let's think step by step"
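
The zero-shot trick from the last paper is simple enough to show inline: append the reasoning trigger to the user message. A hedged sketch in OpenAI-style chat-message format (the message schema and example question are illustrative, not taken from the paper):

```python
def build_zero_shot_cot(question):
    """Wrap a question with the zero-shot chain-of-thought trigger phrase
    from "Large Language Models are Zero-Shot Reasoners" (2022)."""
    return [
        {"role": "user", "content": f"{question}\n\nLet's think step by step."},
    ]

messages = build_zero_shot_cot(
    "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
)
print(messages[0]["content"])
```

Few-shot chain-of-thought (the 2022 Wei et al. paper) works the same way, except the trigger is replaced by worked examples in the prompt.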
## Datasets for Practice Projects

| Dataset | Use Case | Where to Find |
|---|---|---|
| HuggingFace Datasets Hub | Everything: 100k+ datasets | |
| Kaggle Datasets | Competition-grade structured data | |
| Common Crawl | Web text for LLM training | |
| The Pile | Diverse text dataset | |
| OpenAssistant OASST2 | Conversation data for fine-tuning | |
| Alpaca dataset | Instruction-following data | |
| SQuAD v2 | Question answering | |
| MS MARCO | Information retrieval | |
| BEIR benchmark | RAG/retrieval evaluation | |

## Developer Tools to Know

### APIs and Platforms

| Tool | What It Does | Why Learn It |
|---|---|---|
| OpenAI API | GPT-4, embeddings, DALL-E | Most common LLM API in production |
| Anthropic API | Claude models | Strong at reasoning, long context |
| Hugging Face Hub | Model hosting and sharing | Industry standard for open-source models |
| Ollama | Local LLM serving | Dev/testing without cloud costs |
| Replicate | Run models via API | Easy access to open-source models |
| Modal | Serverless GPU computing | Run GPU workloads without managing servers |
| Together AI | Fast inference for open models | Good OpenAI-compatible API alternative |

### Development Tools

| Tool | Use | Link |
|---|---|---|
| LangSmith | LangChain observability and tracing | |
| Weights & Biases | Experiment tracking, visualizations | |
| MLflow | Open-source experiment tracking | |
| Label Studio | Data labeling for fine-tuning | |
| Chainlit | Build chat UIs for LLM apps | |
| Gradio | Quick ML demos | |

## Learning Communities

| Community | Platform | Value |
|---|---|---|
| Hugging Face Forums | HuggingFace | Best for transformers/fine-tuning questions |
| r/MachineLearning | Reddit | Research and industry news |
| r/LocalLLaMA | Reddit | Local LLM community, very active |
| ML Twitter/X | X (Twitter) | Follow researchers and engineers for news |
| Discord: Eleuther AI | Discord | Open-source LLM research community |
| Discord: fast.ai | Discord | Practical DL community |
| Discord: LangChain | Discord | LangChain help and announcements |
| Kaggle forums | Kaggle | Competition strategies and feedback |

## Phase-to-Resource Mapping

Quick reference: what to use for each phase.

| Phase | Video | GitHub Repo | Blog/Other |
|---|---|---|---|
| 0: Foundations | StatQuest channel | microsoft/ML-For-Beginners | |
| 1: Python/Data Science | StatQuest | scikit-learn/scikit-learn examples | ageron/handson-ml3 |
| 2: Mathematics | 3Blue1Brown Linear Algebra + Calculus | karpathy/nanoGPT (for context) | Jay Alammar blog |
| 3: Tokenization | Karpathy: GPT Tokenizer | karpathy/minbpe | HF tokenizers docs |
| 4: Embeddings | Sam Witteveen | sentence-transformers repo | Jay Alammar: Illustrated Word2Vec |
| 5: Neural Networks | 3Blue1Brown NN series + Karpathy Zero to Hero | karpathy/nanoGPT | Lilian Weng blog |
| 6: Vector Databases | James Briggs | chroma-core/chroma | Pinecone docs |
| 7: RAG | Sam Witteveen, James Briggs | openai/openai-cookbook | Lilian Weng RAG post |
| 8: MLOps | Made With ML, W&B | mlflow/mlflow | Eugene Yan blog |
| 9: Specializations | fast.ai (for CV), Stanford CS224N (for NLP) | langchain-ai/langgraph | 10-specializations/README.md |
| 10: Prompt Engineering | DeepLearning.AI course | openai/openai-cookbook | Lilian Weng prompting |
| 11: Fine-tuning | Trelis Research | unslothai/unsloth | LoRA + QLoRA papers |
| 12: Multimodal | Two Minute Papers | huggingface/transformers | ViT paper, CLIP paper |
| 13: Local LLMs | Matt Williams (Ollama) | ollama/ollama | r/LocalLLaMA |
| 14: AI Agents | LangChain official | langchain-ai/langgraph | ReAct + Toolformer papers |
| 15: Streaming | FastAPI docs | tiangolo/fastapi | SSE spec, WebSocket MDN |
| 16: Model Evaluation | StatQuest: ROC/AUC | explodinggradients/ragas | Hugging Face evaluate docs |
| 17: Debugging | Made With ML | whylogs, evidently | Eugene Yan debugging post |
| 18: Low-Code Tools | Gradio official channel | gradio-app/gradio | Hugging Face Spaces docs |
| 19: AI Safety | Anthropic research blog | microsoft/promptbench | Lilian Weng adversarial post |
| 24: Advanced DL | Yannic Kilcher | lucidrains/x-transformers | Lilian Weng GAN/VAE posts |
| 25: Reinforcement Learning | DeepMind YouTube | openai/gym (now Gymnasium) | Lilian Weng RL post |
| 26: Time Series | StatQuest time series | facebook/prophet | Rob Hyndman blog |
| 27: Causal Inference | Brady Neal causal course | microsoft/EconML | Judea Pearl writings |