# Install required packages
!pip install torch torchvision numpy matplotlib scikit-learn transformers datasets tqdm

Verify Installation

Before diving into neural networks, we need to confirm that all core libraries are available and properly configured. PyTorch is the deep learning framework we will use throughout this module for building, training, and evaluating models. NumPy provides the numerical foundation, Matplotlib handles visualization, and HuggingFace Transformers gives us access to state-of-the-art pre-trained models like BERT and GPT. The cell below also checks for CUDA (GPU) availability – GPU acceleration is not required for the exercises, but it dramatically speeds up training for larger models.

import torch
import numpy as np
import matplotlib.pyplot as plt
from transformers import __version__ as transformers_version

print("✅ All packages installed successfully!\n")
print(f"PyTorch version: {torch.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Transformers version: {transformers_version}")
print(f"\nCUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("Running on CPU (this is fine for learning!)")

🎯 What You’ll Build

By the end of this module, you’ll have built:

Notebook 1: Simple Neural Network

# Binary classifier from scratch
class NeuralNetwork:
    def forward(self, X):
        # Your code!
        pass
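One possible shape for that forward pass, as a hedged NumPy sketch (the class name `TinyNetwork`, the layer sizes, and the single hidden layer are illustrative assumptions; the notebook may structure it differently):

```python
import numpy as np

class TinyNetwork:
    """Minimal binary classifier: one ReLU hidden layer, sigmoid output."""
    def __init__(self, n_in=2, n_hidden=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        h = np.maximum(0, X @ self.W1 + self.b1)  # ReLU hidden layer
        z = h @ self.W2 + self.b2                 # linear output layer
        return 1 / (1 + np.exp(-z))               # sigmoid -> probability

net = TinyNetwork()
probs = net.forward(np.random.default_rng(1).normal(size=(5, 2)))
print(probs.shape)  # one probability per input row
```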

Notebook 2: Backpropagation Engine

# Automatic differentiation
loss.backward()  # Compute all gradients!
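To see what that one line does, here is a tiny sketch using PyTorch autograd (not the from-scratch engine the notebook builds, just the same idea): for f(w) = (3w - 6)², the chain rule gives df/dw = 2·(3w - 6)·3, and `backward()` computes exactly that.

```python
import torch

w = torch.tensor(2.5, requires_grad=True)
loss = (w * 3 - 6) ** 2   # f(w) = (3w - 6)^2
loss.backward()           # autograd applies the chain rule
print(w.grad)             # analytic gradient: 2 * (7.5 - 6) * 3 = 9.0
```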

Notebook 3: PyTorch Models

# Modern deep learning
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)

Notebook 4: Attention Mechanism

# The innovation that changed AI
attention_scores = softmax(Q @ K.T / sqrt(d_k))
output = attention_scores @ V
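Those two pseudocode lines map directly onto runnable PyTorch; a minimal sketch with made-up sizes (`seq_len` and `d_k` are illustrative, not from the notebook):

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d_k = 4, 8
Q = torch.randn(seq_len, d_k)
K = torch.randn(seq_len, d_k)
V = torch.randn(seq_len, d_k)

scores = Q @ K.T / math.sqrt(d_k)              # scaled dot products
attention_scores = F.softmax(scores, dim=-1)   # each row sums to 1
output = attention_scores @ V                  # weighted mix of values

print(attention_scores.sum(dim=-1))  # rows sum to ~1.0
print(output.shape)                  # same shape as V
```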

Notebook 5: Transformer

# The architecture behind GPT, BERT, etc.
transformer = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6
)
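A quick usage check with deliberately tiny sizes (these dimensions are illustrative, not the ones above): `nn.Transformer` defaults to sequence-first inputs and returns an output shaped like the target sequence.

```python
import torch
import torch.nn as nn

tiny = nn.Transformer(d_model=16, nhead=2,
                      num_encoder_layers=1, num_decoder_layers=1,
                      dim_feedforward=32)
src = torch.randn(5, 3, 16)  # (source_len, batch, d_model)
tgt = torch.randn(4, 3, 16)  # (target_len, batch, d_model)
out = tiny(src, tgt)
print(out.shape)             # target-shaped: (4, 3, 16)
```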

🧪 Quick Neural Network Demo

Let’s see a neural network in action!

import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

# Generate data: classify points above/below a curve
np.random.seed(42)
torch.manual_seed(42)

n_points = 200
X = np.random.randn(n_points, 2)
y = (X[:, 1] > X[:, 0]**2).astype(float)  # Quadratic boundary

# Convert to PyTorch tensors
X_tensor = torch.FloatTensor(X)
y_tensor = torch.FloatTensor(y).reshape(-1, 1)

# Define a simple neural network
model = nn.Sequential(
    nn.Linear(2, 8),      # Input layer: 2 features → 8 neurons
    nn.ReLU(),            # Activation function
    nn.Linear(8, 8),      # Hidden layer: 8 → 8
    nn.ReLU(),
    nn.Linear(8, 1),      # Output layer: 8 → 1
    nn.Sigmoid()          # Convert to probability
)

# Training setup
criterion = nn.BCELoss()  # Binary cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train the network
losses = []
for epoch in range(1000):
    # Forward pass
    predictions = model(X_tensor)
    loss = criterion(predictions, y_tensor)
    
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    losses.append(loss.item())
    
    if (epoch + 1) % 200 == 0:
        print(f"Epoch {epoch+1}/1000, Loss: {loss.item():.4f}")

print("\n✅ Training complete!")
# Visualize results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Training loss
ax1.plot(losses)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Training Loss Over Time')
ax1.grid(True, alpha=0.3)

# Plot 2: Decision boundary
# Create mesh for decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                     np.linspace(y_min, y_max, 100))

# Predict on mesh
mesh_data = torch.FloatTensor(np.c_[xx.ravel(), yy.ravel()])
with torch.no_grad():
    Z = model(mesh_data).numpy().reshape(xx.shape)

# Plot decision boundary
ax2.contourf(xx, yy, Z, levels=20, cmap='RdYlBu', alpha=0.6)
ax2.scatter(X[y==0, 0], X[y==0, 1], c='blue', label='Class 0', edgecolors='k', s=50)
ax2.scatter(X[y==1, 0], X[y==1, 1], c='red', label='Class 1', edgecolors='k', s=50)
ax2.set_xlabel('Feature 1')
ax2.set_ylabel('Feature 2')
ax2.set_title('Neural Network Decision Boundary')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate accuracy
with torch.no_grad():
    predictions = (model(X_tensor) > 0.5).float()
    accuracy = (predictions == y_tensor).float().mean()
    print(f"\n✅ Accuracy: {accuracy.item()*100:.2f}%")

🎉 What Just Happened?

In just a few lines, you:

  1. Created a neural network with 2 hidden layers

  2. Trained it to learn a complex decision boundary

  3. Achieved high accuracy on classification

The network learned the quadratic boundary automatically from data!

📖 Reading Material

Theory Documents (Read First)

  1. intro.md - Neural network fundamentals

    • Neurons, layers, activation functions

    • Forward propagation

    • Loss functions

  2. attention_explained.md - The attention mechanism

    • Why attention matters

    • Self-attention explained

    • Multi-head attention

  3. transformer_architecture.md - Transformer deep dive

    • Encoder-decoder architecture

    • Position embeddings

    • How GPT and BERT differ

🎓 Prerequisites Review

Make sure you’re comfortable with these concepts from earlier phases:

# Mathematics (Phase 0.5)
import numpy as np

# 1. Matrix multiplication (Linear Algebra)
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = A @ B  # Neural networks are matrix operations!
print("Matrix multiplication:\n", C)

# 2. Derivatives (Calculus)
# f(x) = x^2, derivative f'(x) = 2x
x = 3
derivative = 2 * x  # Backpropagation uses chain rule!
print(f"\nDerivative of x^2 at x=3: {derivative}")

# 3. Probability (Statistics)
# Softmax converts scores to probabilities
scores = np.array([2.0, 1.0, 0.1])
exp_scores = np.exp(scores)
probabilities = exp_scores / exp_scores.sum()
print(f"\nSoftmax probabilities: {probabilities}")
print(f"Sum to 1.0: {probabilities.sum()}")
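The derivative claim above is easy to sanity-check numerically: a centered finite difference of f(x) = x² at x = 3 should closely match the analytic derivative 2x = 6. (This is also how gradient implementations are commonly verified before trusting backpropagation.)

```python
def f(x):
    return x ** 2

x, h = 3.0, 1e-5
numerical = (f(x + h) - f(x - h)) / (2 * h)  # centered finite difference
analytic = 2 * x                             # d/dx of x^2
print(numerical, analytic)                   # both approximately 6.0
```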

🚦 Next Steps

Ready to Start?

Recommended path:

  1. ✅ Watch 3Blue1Brown videos 1-2

  2. ✅ Read intro.md sections 1-3

  3. ✅ Complete 01_neural_network_basics.ipynb

  4. ✅ Watch 3Blue1Brown videos 3-4

  5. ✅ Complete 02_backpropagation_explained.ipynb

  6. ✅ Complete 03_pytorch_fundamentals.ipynb

  7. ✅ Read attention_explained.md

  8. ✅ Complete 04_attention_mechanism.ipynb

  9. ✅ Read transformer_architecture.md

  10. ✅ Complete 05_transformer_architecture.ipynb

Study Tips

  • 📺 Watch videos first - Visual intuition is crucial

  • 📝 Take notes - Write down key equations

  • 💻 Code along - Don’t just read, type the code

  • 🔄 Experiment - Change hyperparameters, see what happens

  • 🤔 Ask why - Understand the purpose of each component

📊 Progress Tracker

Track your learning:

| Milestone                        | Status |
|----------------------------------|--------|
| Watched 3Blue1Brown videos       | ☐      |
| Read intro.md                    | ☐      |
| Completed Notebook 1             | ☐      |
| Completed Notebook 2             | ☐      |
| Completed Notebook 3             | ☐      |
| Read attention_explained.md      | ☐      |
| Completed Notebook 4             | ☐      |
| Read transformer_architecture.md | ☐      |
| Completed Notebook 5             | ☐      |
| Built a custom model             | ☐      |

🎯 Learning Goals

By the end of this module, you should be able to:

  • ✅ Explain how neural networks learn from data

  • ✅ Implement forward and backward propagation

  • ✅ Build and train neural networks with PyTorch

  • ✅ Understand the attention mechanism

  • ✅ Explain transformer architecture

  • ✅ Fine-tune pre-trained models

  • ✅ Connect this to earlier phases (tokenization → embeddings → transformers)

🔗 Helpful Resources

Videos

Interactive

Articles

Documentation

🚀 Let’s Begin!

Start with: 01_neural_network_basics.ipynb

You’re about to understand the technology behind ChatGPT, DALL-E, and all modern AI systems. Exciting times ahead! 🎉