Phase 6: Neural Networks

This module is where the repo shifts from classical ML intuition into modern deep learning. The goal is not just to run PyTorch code, but to understand why gradient-based learning, attention, and transformers work well enough that later LLM modules feel connected instead of magical.

What You Should Be Able To Explain

  • Why nonlinear activations are needed

  • How backpropagation propagates error gradients backward through a network

  • Why PyTorch autograd matters in practice

  • What attention is computing and why scaling matters

  • How transformer blocks combine attention, MLPs, residual paths, and normalization
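To make the attention and scaling points concrete, here is a minimal NumPy sketch of scaled dot-product attention. The function name, shapes, and random inputs are illustrative, not from any particular codebase; the point is that scores are scaled by 1/sqrt(d_k) because dot products of d_k-dimensional random vectors have variance proportional to d_k, and unscaled scores push softmax toward one-hot rows where gradients vanish.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q, K: (seq, d_k); V: (seq, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq, seq)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (seq, d_v)

rng = np.random.default_rng(0)
d_k = 64
Q = rng.standard_normal((4, d_k))
K = rng.standard_normal((4, d_k))
V = rng.standard_normal((4, 8))
out = attention(Q, K, V)                 # shape (4, 8)

# Why scaling matters: without 1/sqrt(d_k) the score variance is ~d_k,
# so each softmax row collapses toward one-hot.
unscaled = softmax(Q @ K.T)
scaled = softmax(Q @ K.T / np.sqrt(d_k))
print(unscaled.max(axis=-1))  # peaks near 1.0
print(scaled.max(axis=-1))    # noticeably softer distribution
```

Writing the shape of every intermediate (as in the comments above) is exactly the practice suggested below; the same computation in PyTorch only swaps `np` for `torch` and adds autograd.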

How To Study This Module

Suggested Practice

  • Implement a tiny MLP from scratch with NumPy

  • Rebuild the same idea in PyTorch

  • Write down tensor shapes at each step of attention

  • Explain a transformer block without using the phrase β€œit just learns it”
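As a starting point for the first practice item, here is one possible tiny-MLP sketch in pure NumPy: a single hidden layer trained on XOR with hand-written backpropagation. The architecture, seed, and hyperparameters are arbitrary choices for illustration. XOR is the classic case that needs the hidden nonlinearity, which also ties back to the first explanation goal above.

```python
import numpy as np

rng = np.random.default_rng(42)

# XOR is not linearly separable, so a linear model cannot fit it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# one hidden layer: 2 -> 8 -> 1 (sizes chosen arbitrarily)
W1 = rng.standard_normal((2, 8)) * 0.5
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)) * 0.5
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # forward pass, one shape-annotated step at a time
    h = np.tanh(X @ W1 + b1)             # (4, 8)
    out = sigmoid(h @ W2 + b2)           # (4, 1)
    loss = -np.mean(y * np.log(out) + (1 - y) * np.log(1 - out))

    # backward pass: chain rule, layer by layer
    d_logits = (out - y) / len(X)        # BCE + sigmoid gradient
    dW2 = h.T @ d_logits
    db2 = d_logits.sum(axis=0)
    d_h = d_logits @ W2.T
    d_pre = d_h * (1 - h ** 2)           # through tanh
    dW1 = X.T @ d_pre
    db1 = d_pre.sum(axis=0)

    # plain gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(out.round(3).ravel())
```

Rebuilding the same model with `torch.nn.Linear` and `loss.backward()` makes the second practice item mostly a deletion exercise: everything between the forward pass and the update disappears into autograd.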

Why This Module Matters

If this phase is weak, later phases on fine-tuning, local LLMs, evaluation, and agents become tool memorization. If this phase is strong, the rest of the repo becomes a connected system.