Challenges: Neural Networks
Hands-on challenges to deepen your understanding of neural networks
Challenge 1: Gradient Vanishing Detective
Difficulty: ⭐⭐ Beginner-Intermediate
Time: 30-45 minutes
Concepts: Activation functions, gradient flow, deep networks
The Problem
Build a very deep neural network (10+ layers) and observe the gradient vanishing problem in action.
Your Task
Create a 15-layer network with sigmoid activations
Train on MNIST dataset
Track gradient magnitudes at each layer during training
Visualize the gradients: notice how they shrink in the early layers
Fix it by switching to ReLU - observe the difference!
Starter Code
class DeepNetwork:
    def __init__(self, n_layers=15, activation='sigmoid'):
        self.n_layers = n_layers
        self.activation = activation
        # TODO: Initialize layers

    def track_gradients(self):
        """Return gradient magnitudes for each layer."""
        # TODO: Track gradient norms
        pass
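A hedged sketch of what `track_gradients` might reveal, using a standalone NumPy simulation; the layer width, weight scale, and seed below are illustrative assumptions, not part of the challenge spec:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_layers, width = 15, 64

# Forward pass through 15 sigmoid layers, caching activations
weights = [rng.standard_normal((width, width)) * 0.1 for _ in range(n_layers)]
h = rng.standard_normal((1, width))
activations = []
for W in weights:
    h = sigmoid(h @ W)
    activations.append(h)

# Backward pass: propagate a unit gradient and record its norm per layer
grad = np.ones_like(h)
norms = []
for W, a in zip(reversed(weights), reversed(activations)):
    grad = grad * a * (1 - a)   # sigmoid derivative, at most 0.25
    grad = grad @ W.T
    norms.append(np.linalg.norm(grad))

# norms[0] is the layer nearest the output, norms[-1] the earliest layer;
# the earliest layer's gradient is orders of magnitude smaller
print(norms[0], norms[-1])
```

Plotting `norms` on a log scale against layer index makes the vanishing effect obvious; swapping `sigmoid` for a ReLU in the same harness is the comparison the challenge asks for.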
Success Criteria
Demonstrate gradient vanishing with sigmoid
Show improvement with ReLU
Create visualization comparing both
Explain why this happens mathematically
Hint
Sigmoid derivative: max value is 0.25. If you chain 15 layers: 0.25^15 ≈ 0 (vanishing!). ReLU derivative: either 0 or 1, so gradients flow better.
Challenge 2: Architecture Search
Difficulty: ⭐⭐⭐ Intermediate
Time: 1-2 hours
Concepts: Hyperparameter tuning, model selection, experimental design
The Problem
Find the best neural network architecture for CIFAR-10 image classification.
Your Task
Experiment with:
Number of layers (2, 4, 6, 8)
Layer sizes (32, 64, 128, 256)
Activation functions (ReLU, LeakyReLU, ELU)
Dropout rates (0, 0.2, 0.5)
Batch sizes (32, 64, 128)
Requirements
Test at least 20 different architectures
Track accuracy, training time, and parameter count
Create visualization showing tradeoffs
Identify Pareto-optimal architectures
Present your findings
Deliverable
results/
├── experiment_log.csv   # All experiments
├── best_models.json     # Top 5 configurations
├── tradeoff_plot.png    # Accuracy vs params vs time
└── recommendations.md   # Your analysis
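The experiment loop above can be sketched as a random search; the `evaluate` function below is a hypothetical placeholder you would replace with real CIFAR-10 training, and the logging uses plain dicts (a pandas DataFrame works equally well):

```python
import random

# Search space taken from the challenge description
space = {
    "n_layers": [2, 4, 6, 8],
    "layer_size": [32, 64, 128, 256],
    "activation": ["relu", "leaky_relu", "elu"],
    "dropout": [0.0, 0.2, 0.5],
    "batch_size": [32, 64, 128],
}

def sample_config(rng):
    """Draw one random configuration from the space."""
    return {k: rng.choice(v) for k, v in space.items()}

def evaluate(config):
    """Placeholder: replace with real training on CIFAR-10.
    Returns (accuracy, train_seconds, n_params)."""
    n_params = config["n_layers"] * config["layer_size"] ** 2
    return 0.0, 0.0, n_params

rng = random.Random(0)
log = []
for _ in range(20):  # at least 20 experiments, per the requirements
    cfg = sample_config(rng)
    acc, secs, params = evaluate(cfg)
    log.append({**cfg, "accuracy": acc, "seconds": secs, "params": params})

# Sort by accuracy; from here you can scan for Pareto-optimal rows
log.sort(key=lambda r: r["accuracy"], reverse=True)
print(len(log))
```

Writing `log` out with `csv.DictWriter` gives you the `experiment_log.csv` deliverable directly.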
Hint
Use a grid search or random search. Track experiments in a DataFrame. Consider the accuracy/params ratio as a metric.
Challenge 3: Attention Visualization
Difficulty: ⭐⭐⭐ Intermediate-Advanced
Time: 2-3 hours
Concepts: Attention mechanism, visualization, interpretability
The Problem
Implement and visualize a simple attention mechanism for sequence classification.
Your Task
Build attention layer from scratch
Train on text sentiment classification
Visualize attention weights for sample inputs
Show which words the model focuses on
Compare with/without attention
Example Output
Input: "This movie was absolutely terrible and boring"
Attention weights:
- "terrible": 0.45 ██████████
- "boring": 0.38 ████████
- "absolutely": 0.10 ██
- Other words: <0.05
Prediction: Negative (0.92 confidence)
Success Criteria
Working attention implementation
Heatmap visualization of attention
Interpretable results
Performance comparison
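A minimal sketch of the dot-product attention the hint suggests, in plain NumPy; the hidden states here are random stand-ins for the word embeddings your trained model would produce:

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_product_attention(H, query):
    """H: (seq_len, d) hidden states; query: (d,) query vector.
    Returns attention weights and the weighted context vector."""
    scores = H @ query       # score(h_i, query) = h_i . query
    alpha = softmax(scores)  # attention weights sum to 1
    context = alpha @ H      # weighted sum of hidden states
    return alpha, context

rng = np.random.default_rng(0)
H = rng.standard_normal((7, 16))  # e.g. 7 word representations
query = rng.standard_normal(16)
alpha, context = dot_product_attention(H, query)
print(alpha.round(3), context.shape)
```

The `alpha` vector is exactly what you would render as the per-word bars or heatmap in the example output above.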
Hint
Attention formula: α = softmax(score(h_i, query)); output = Σ α_i · h_i. Start with simple dot-product attention.
Challenge 4: Transfer Learning Master
Difficulty: ⭐⭐⭐⭐ Advanced
Time: 3-4 hours
Concepts: Transfer learning, fine-tuning, domain adaptation
The Problem
Use a pre-trained ImageNet model for a custom classification task with limited data.
Your Task
Choose a small custom dataset (100-500 images, 5-10 classes)
Load pre-trained ResNet/VGG/MobileNet
Try 3 approaches:
Feature extraction (freeze all layers)
Fine-tuning (unfreeze last few layers)
Full training (train entire network)
Compare results and training time
Analyze which layers learn what
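The feature-extraction approach (freeze everything, train only a new head) can be illustrated without a real backbone. The sketch below is a conceptual stand-in: a fixed random "pre-trained" layer plays the role of the frozen backbone, and only the linear head is updated; all shapes and data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_feat, n_classes = 200, 64, 32, 5

X = rng.standard_normal((n, d_in))
y = rng.integers(0, n_classes, n)

# "Pre-trained backbone": frozen weights, never touched by the optimizer
W_frozen = rng.standard_normal((d_in, d_feat))
features = np.tanh(X @ W_frozen)  # one feature-extraction pass

# Trainable linear head, optimized with plain gradient descent
W_head = np.zeros((d_feat, n_classes))
Y = np.eye(n_classes)[y]  # one-hot labels
for _ in range(200):
    logits = features @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = features.T @ (p - Y) / n  # softmax cross-entropy gradient
    W_head -= 0.5 * grad             # only the head moves

acc = (logits.argmax(axis=1) == y).mean()
print(acc)
```

With a real framework, "freezing" means excluding the backbone's parameters from the optimizer (and disabling their gradient computation); fine-tuning then amounts to re-including the last few layers.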
Dataset Suggestions
Food classification
Dog breed recognition
Flower species
Medical images
Your own photos
Analysis Required
Learning curves for each approach
Confusion matrices
Layer-wise feature visualization
Recommendations for when to use each approach
Hint
Feature extraction works well with <1000 images. Fine-tuning helps when the task is somewhat different from ImageNet. Full training needs lots of data.
Challenge 5: Neural Network Debugger
Difficulty: ⭐⭐⭐ Intermediate
Time: 1-2 hours
Concepts: Debugging, troubleshooting, systematic diagnosis
The Problem
You're given 5 broken neural network implementations. Find and fix the bugs!
Scenarios
Bug 1: The Never-Learning Network
# Network trains but loss doesn't decrease
# What's wrong?
for epoch in range(100):
    loss = compute_loss(model(X), y)
    gradients = backprop(loss)
    # weights stay the same! Why?
Bug 2: The Exploding Loss
# Loss goes to infinity after a few batches
# What's the issue?
def train_step(X, y):
    pred = model(X)
    loss = mse_loss(pred, y)
    # loss = 1e+10 after iteration 3
Bug 3: The Plateauing Network
# Accuracy stuck at 10% on MNIST (10 classes)
# Red flag! What's happening?
Bug 4: The Slow Learner
# Training takes 10x longer than expected
# Same architecture, same data
# Where's the bottleneck?
Bug 5: The Overfitting Champion
# Training accuracy: 99%
# Validation accuracy: 45%
# How do you fix this?
Your Task
Identify each bug
Explain why it happens
Provide the fix
Share prevention strategies
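To make the kind of fix Bug 1 needs concrete, here is a toy linear-regression loop (hypothetical data, plain NumPy) where the one line that applies the update is exactly what a "never-learning" network is missing:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.1
losses = []
for epoch in range(100):
    pred = X @ w
    loss = ((pred - y) ** 2).mean()
    grad = 2 * X.T @ (pred - y) / len(y)
    w -= lr * grad  # <-- the step Bug 1 forgot: actually apply the update
    losses.append(loss)

print(losses[0], losses[-1])
```

Delete the `w -= lr * grad` line and the loss stays flat for all 100 epochs; that flat loss curve is the signature to look for in Bug 1.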
Hint 1
Bug 1: Are you actually updating the weights? Bug 2: Check the learning rate and weight initialization. Bug 3: Is your model predicting all one class? Bug 4: Profile your code; is it the data loading? Bug 5: Classic overfitting; what's missing?
Meta Challenge: Build Your Own Framework
Difficulty: ⭐⭐⭐⭐⭐ Expert
Time: 8-12 hours
Concepts: Software engineering, API design, comprehensive understanding
The Ultimate Challenge
Build a mini deep learning framework (like PyTorch, but simpler).
Requirements
Your framework should support:
Automatic differentiation (autograd)
Common layers (Linear, Conv2D, ReLU, etc.)
Loss functions (MSE, CrossEntropy)
Optimizers (SGD, Adam)
Model save/load
GPU support (optional but impressive!)
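Autograd is the heart of the framework. As a starting point, here is a minimal scalar reverse-mode autograd node (in the spirit of micrograd, not the tensor version the full challenge requires); the class and method names are one possible design, not a prescribed API:

```python
class Value:
    """Minimal scalar autograd node."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topological order, then one reverse-mode sweep
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._backward_fn:
                v._backward_fn()

x = Value(3.0)
y = Value(4.0)
z = x * y + x  # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)
```

Extending this from scalars to arrays, and layering `Linear`, losses, and optimizers on top of it, is essentially the rest of the challenge.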
Example Usage
from your_framework import Module, Linear, ReLU, CrossEntropyLoss, Adam

class MyModel(Module):
    def __init__(self):
        self.fc1 = Linear(784, 128)
        self.relu = ReLU()
        self.fc2 = Linear(128, 10)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model = MyModel()
optimizer = Adam(model.parameters(), lr=0.001)
criterion = CrossEntropyLoss()
# Training loop works just like PyTorch!
Bonus Points
Clean, documented API
Comprehensive tests
Tutorial notebooks
Performance benchmarks
PyTorch compatibility layer
Challenge Completion Tracker
Mark off challenges as you complete them:
Challenge 1: Gradient Vanishing Detective
Challenge 2: Architecture Search
Challenge 3: Attention Visualization
Challenge 4: Transfer Learning Master
Challenge 5: Neural Network Debugger
Meta Challenge: Build Your Own Framework
Learning Tips
Start small: Don't try all challenges at once
Debug systematically: Print intermediate values, visualize data
Compare with libraries: Check your implementation against PyTorch
Read papers: Many challenges have research papers behind them
Ask for help: Post in Discussions if stuck
Happy challenging!
Remember: The goal is to learn, not to complete everything perfectly. Each challenge deepens your understanding!