```python
# Install required packages
!pip install torch torchvision numpy matplotlib scikit-learn transformers datasets tqdm
```
## Verify Installation

Before diving into neural networks, we need to confirm that all core libraries are available and properly configured. PyTorch is the deep learning framework we will use throughout this module for building, training, and evaluating models. NumPy provides the numerical foundation, Matplotlib handles visualization, and HuggingFace Transformers gives us access to state-of-the-art pre-trained models like BERT and GPT. The cell below also checks for CUDA (GPU) availability: GPU acceleration is not required for the exercises, but it dramatically speeds up training for larger models.
```python
import torch
import numpy as np
import matplotlib.pyplot as plt
from transformers import __version__ as transformers_version

print("✅ All packages installed successfully!\n")
print(f"PyTorch version: {torch.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Transformers version: {transformers_version}")
print(f"\nCUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("Running on CPU (this is fine for learning!)")
```
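Beyond just checking for CUDA, the standard PyTorch idiom is to select a device once and move both model and data onto it, so the same code runs on GPU or CPU. A minimal sketch (the `nn.Linear(4, 2)` layer is just a placeholder model for illustration):

```python
import torch
import torch.nn as nn

# Select GPU if available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move a model and a batch of data onto the chosen device
model = nn.Linear(4, 2).to(device)
batch = torch.randn(8, 4, device=device)
output = model(batch)

print(f"Computed on: {output.device}")
```

We will use this pattern whenever training time matters; for the small demos in this module, CPU is plenty.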
## 🎯 What You'll Build

By the end of this module, you'll have built:
### Notebook 1: Simple Neural Network

```python
# Binary classifier from scratch
class NeuralNetwork:
    def forward(self, X):
        # Your code!
        pass
```
### Notebook 2: Backpropagation Engine

```python
# Automatic differentiation
loss.backward()  # Compute all gradients!
```
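To make `loss.backward()` concrete before you build your own engine: PyTorch's autograd computes derivatives for us, and we can check it against a hand-derived value. For f(x) = x², the derivative is 2x, so at x = 3 the gradient should be 6:

```python
import torch

# f(x) = x^2 at x = 3; the analytic derivative is f'(x) = 2x = 6
x = torch.tensor(3.0, requires_grad=True)
loss = x ** 2

loss.backward()  # reverse-mode autodiff fills in x.grad

print(x.grad)  # tensor(6.)
```

In Notebook 2 you will reimplement this mechanism yourself for simple expressions.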
### Notebook 3: PyTorch Models

```python
# Modern deep learning
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)
```
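The 784 → 128 → 10 sizes are the classic MNIST setup: 28×28 images flattened to 784 features, mapped to 10 digit classes. A quick shape check with a batch of random stand-in inputs:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# A batch of 32 flattened 28x28 "images" (random data, just to check shapes)
x = torch.randn(32, 784)
logits = model(x)

print(logits.shape)  # torch.Size([32, 10]) - one score per class, per example
```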
### Notebook 4: Attention Mechanism

```python
# The innovation that changed AI
attention_scores = softmax(Q @ K.T / sqrt(d_k))
output = attention_scores @ V
```
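Those two pseudocode lines expand into a runnable scaled dot-product attention sketch. This is the plain single-head form (the `Q`, `K`, `V`, `d_k` names match the formula above; the shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # (queries, keys)
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ V, weights

# 4 query positions attending over 6 key/value positions, d_k = 8
Q, K, V = torch.randn(4, 8), torch.randn(6, 8), torch.randn(6, 8)
output, weights = scaled_dot_product_attention(Q, K, V)

print(output.shape, weights.shape)  # torch.Size([4, 8]) torch.Size([4, 6])
```

Each output row is a weighted mix of the value vectors, with the weights decided by query-key similarity; Notebook 4 builds this up step by step.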
### Notebook 5: Transformer

```python
# The architecture behind GPT, BERT, etc.
transformer = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6
)
```
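As a hedged usage sketch: `nn.Transformer` expects source and target sequences that are already embedded to `d_model` dimensions. The shapes below assume `batch_first=True` (an added argument for readability; the default layout is sequence-first):

```python
import torch
import torch.nn as nn

transformer = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    batch_first=True,  # use (batch, seq, feature) layout
)

# Batch of 2: source length 10, target length 7, already embedded to 512 dims
src = torch.randn(2, 10, 512)
tgt = torch.randn(2, 7, 512)

out = transformer(src, tgt)
print(out.shape)  # torch.Size([2, 7, 512]) - one vector per target position
```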
## 🧪 Quick Neural Network Demo

Let's see a neural network in action!
```python
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

# Generate data: classify points above/below a curve
np.random.seed(42)
torch.manual_seed(42)

n_points = 200
X = np.random.randn(n_points, 2)
y = (X[:, 1] > X[:, 0]**2).astype(float)  # Quadratic boundary

# Convert to PyTorch tensors
X_tensor = torch.FloatTensor(X)
y_tensor = torch.FloatTensor(y).reshape(-1, 1)

# Define a simple neural network
model = nn.Sequential(
    nn.Linear(2, 8),   # Input layer: 2 features → 8 neurons
    nn.ReLU(),         # Activation function
    nn.Linear(8, 8),   # Hidden layer: 8 → 8
    nn.ReLU(),
    nn.Linear(8, 1),   # Output layer: 8 → 1
    nn.Sigmoid()       # Convert to probability
)

# Training setup
criterion = nn.BCELoss()  # Binary cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train the network
losses = []
for epoch in range(1000):
    # Forward pass
    predictions = model(X_tensor)
    loss = criterion(predictions, y_tensor)

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    losses.append(loss.item())

    if (epoch + 1) % 200 == 0:
        print(f"Epoch {epoch+1}/1000, Loss: {loss.item():.4f}")

print("\n✅ Training complete!")
```
```python
# Visualize results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Training loss
ax1.plot(losses)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Training Loss Over Time')
ax1.grid(True, alpha=0.3)

# Plot 2: Decision boundary
# Create mesh for decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                     np.linspace(y_min, y_max, 100))

# Predict on mesh
mesh_data = torch.FloatTensor(np.c_[xx.ravel(), yy.ravel()])
with torch.no_grad():
    Z = model(mesh_data).numpy().reshape(xx.shape)

# Plot decision boundary
ax2.contourf(xx, yy, Z, levels=20, cmap='RdYlBu', alpha=0.6)
ax2.scatter(X[y==0, 0], X[y==0, 1], c='blue', label='Class 0', edgecolors='k', s=50)
ax2.scatter(X[y==1, 0], X[y==1, 1], c='red', label='Class 1', edgecolors='k', s=50)
ax2.set_xlabel('Feature 1')
ax2.set_ylabel('Feature 2')
ax2.set_title('Neural Network Decision Boundary')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate accuracy
with torch.no_grad():
    predictions = (model(X_tensor) > 0.5).float()
    accuracy = (predictions == y_tensor).float().mean()

print(f"\n✅ Accuracy: {accuracy.item()*100:.2f}%")
```
## 📊 What Just Happened?

In just a few lines, you:

- Created a neural network with 2 hidden layers
- Trained it to learn a complex decision boundary
- Achieved high accuracy on classification

The network learned the quadratic boundary automatically from data!
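One way to appreciate how compact this model is: count its learnable parameters. A sketch using the same 2 → 8 → 8 → 1 architecture as the demo:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 8), nn.ReLU(),
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

# Per layer, weights + biases: (2*8 + 8) + (8*8 + 8) + (8*1 + 1) = 24 + 72 + 9
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 105
```

Just 105 numbers, tuned by gradient descent, are enough to carve out a quadratic decision boundary.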
π Reading MaterialΒΆ
Theory Documents (Read First)ΒΆ
intro.md- Neural network fundamentalsNeurons, layers, activation functions
Forward propagation
Loss functions
attention_explained.md- The attention mechanismWhy attention matters
Self-attention explained
Multi-head attention
transformer_architecture.md- Transformer deep diveEncoder-decoder architecture
Position embeddings
How GPT and BERT differ
## 📐 Prerequisites Review

Make sure you're comfortable with these concepts from earlier phases:
```python
# Mathematics (Phase 0.5)
import numpy as np

# 1. Matrix multiplication (Linear Algebra)
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = A @ B  # Neural networks are matrix operations!
print("Matrix multiplication:\n", C)

# 2. Derivatives (Calculus)
# f(x) = x^2, derivative f'(x) = 2x
x = 3
derivative = 2 * x  # Backpropagation uses chain rule!
print(f"\nDerivative of x^2 at x=3: {derivative}")

# 3. Probability (Statistics)
# Softmax converts scores to probabilities
scores = np.array([2.0, 1.0, 0.1])
exp_scores = np.exp(scores)
probabilities = exp_scores / exp_scores.sum()
print(f"\nSoftmax probabilities: {probabilities}")
print(f"Sum to 1.0: {probabilities.sum()}")
```
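One practical refinement worth knowing: `np.exp` overflows for large scores, so real softmax implementations subtract the maximum score first. The result is mathematically identical, because the shift cancels in the ratio:

```python
import numpy as np

def softmax(scores):
    shifted = scores - scores.max()  # guards against overflow in np.exp
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))        # same probabilities as before
print(softmax(np.array([1000.0, 999.0, 998.1])))  # no overflow, same ratios
```

You will meet this trick again inside attention, where softmax is applied to raw similarity scores.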
## 📦 Next Steps

### Ready to Start?

Recommended path:

1. Watch 3Blue1Brown videos 1-2
2. Read `intro.md` sections 1-3
3. Complete `01_neural_network_basics.ipynb`
4. Watch 3Blue1Brown videos 3-4
5. Complete `02_backpropagation_explained.ipynb`
6. Complete `03_pytorch_fundamentals.ipynb`
7. Read `attention_explained.md`
8. Complete `04_attention_mechanism.ipynb`
9. Read `transformer_architecture.md`
10. Complete `05_transformer_architecture.ipynb`
### Study Tips

- 📺 Watch videos first - Visual intuition is crucial
- 📝 Take notes - Write down key equations
- 💻 Code along - Don't just read, type the code
- 🔬 Experiment - Change hyperparameters, see what happens
- 🤔 Ask why - Understand the purpose of each component
## 📊 Progress Tracker

Track your learning:

| Milestone | Status |
|---|---|
| Watched 3Blue1Brown videos | ☐ |
| Read intro.md | ☐ |
| Completed Notebook 1 | ☐ |
| Completed Notebook 2 | ☐ |
| Completed Notebook 3 | ☐ |
| Read attention_explained.md | ☐ |
| Completed Notebook 4 | ☐ |
| Read transformer_architecture.md | ☐ |
| Completed Notebook 5 | ☐ |
| Built a custom model | ☐ |
## 🎯 Learning Goals

By the end of this module, you should be able to:

- ✅ Explain how neural networks learn from data
- ✅ Implement forward and backward propagation
- ✅ Build and train neural networks with PyTorch
- ✅ Understand the attention mechanism
- ✅ Explain transformer architecture
- ✅ Fine-tune pre-trained models
- ✅ Connect this to earlier phases (tokenization → embeddings → transformers)
## 📚 Helpful Resources

### Videos

### Interactive

- TensorFlow Playground - Visualize neural networks
- CNN Explainer - Interactive CNN
- Transformer Explainer - Interactive transformer

### Articles

### Documentation
## 🚀 Let's Begin!

Start with: `01_neural_network_basics.ipynb`

You're about to understand the technology behind ChatGPT, DALL-E, and all modern AI systems. Exciting times ahead! 🚀