Phase 6: Neural Networks

This module is where the repo shifts from classical ML intuition into modern deep learning. The goal is not just to run PyTorch code, but to understand why gradient-based learning, attention, and transformers work well enough that later LLM modules feel connected instead of magical.

What You Should Be Able To Explain

  • Why nonlinear activations are needed

  • How backpropagation propagates error gradients backward through a network

  • Why PyTorch autograd matters in practice

  • What attention is computing and why scaling matters

  • How transformer blocks combine attention, MLPs, residual paths, and normalization
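To make the attention and scaling points concrete, here is a minimal NumPy sketch of scaled dot-product attention. The function name, shapes, and random inputs are illustrative, not from any particular codebase; the point is that scores are scaled by 1/sqrt(d_k) because dot products of d_k-dimensional random vectors have variance proportional to d_k, and unscaled scores push softmax toward one-hot rows where gradients vanish.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q, K: (seq, d_k); V: (seq, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq, seq)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (seq, d_v)

rng = np.random.default_rng(0)
d_k = 64
Q = rng.standard_normal((4, d_k))
K = rng.standard_normal((4, d_k))
V = rng.standard_normal((4, 8))
out = attention(Q, K, V)                 # shape (4, 8)

# Why scaling matters: without 1/sqrt(d_k) the score variance is ~d_k,
# so each softmax row collapses toward one-hot.
unscaled = softmax(Q @ K.T)
scaled = softmax(Q @ K.T / np.sqrt(d_k))
print(unscaled.max(axis=-1))  # peaks near 1.0
print(scaled.max(axis=-1))    # noticeably softer distribution
```

Writing the shape of every intermediate (as in the comments above) is exactly the practice suggested below; the same computation in PyTorch only swaps `np` for `torch` and adds autograd.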

How To Study This Module

Suggested Practice

  • Implement a tiny MLP from scratch with NumPy

  • Rebuild the same idea in PyTorch

  • Write down tensor shapes at each step of attention

  • Explain a transformer block without using the phrase β€œit just learns it”
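As a starting point for the first practice item, here is one possible tiny-MLP sketch in pure NumPy: a single hidden layer trained on XOR with hand-written backpropagation. The architecture, seed, and hyperparameters are arbitrary choices for illustration. XOR is the classic case that needs the hidden nonlinearity, which also ties back to the first explanation goal above.

```python
import numpy as np

rng = np.random.default_rng(42)

# XOR is not linearly separable, so a linear model cannot fit it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# one hidden layer: 2 -> 8 -> 1 (sizes chosen arbitrarily)
W1 = rng.standard_normal((2, 8)) * 0.5
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)) * 0.5
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # forward pass, one shape-annotated step at a time
    h = np.tanh(X @ W1 + b1)             # (4, 8)
    out = sigmoid(h @ W2 + b2)           # (4, 1)
    loss = -np.mean(y * np.log(out) + (1 - y) * np.log(1 - out))

    # backward pass: chain rule, layer by layer
    d_logits = (out - y) / len(X)        # BCE + sigmoid gradient
    dW2 = h.T @ d_logits
    db2 = d_logits.sum(axis=0)
    d_h = d_logits @ W2.T
    d_pre = d_h * (1 - h ** 2)           # through tanh
    dW1 = X.T @ d_pre
    db1 = d_pre.sum(axis=0)

    # plain gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(out.round(3).ravel())
```

Rebuilding the same model with `torch.nn.Linear` and `loss.backward()` makes the second practice item mostly a deletion exercise: everything between the forward pass and the update disappears into autograd.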

Why This Module Matters

If this phase is weak, later phases on fine-tuning, local LLMs, evaluation, and agents become tool memorization. If this phase is strong, the rest of the repo becomes a connected system.