Challenges: Neural Networks
Hands-on challenges to deepen your understanding of neural networks
Challenge 1: Gradient Vanishing Detective
Difficulty: ⭐⭐ Beginner-Intermediate
Time: 30-45 minutes
Concepts: Activation functions, gradient flow, deep networks
The Problem
Build a very deep neural network (10+ layers) and observe the gradient vanishing problem in action.
Your Task
Create a 15-layer network with sigmoid activations
Train on MNIST dataset
Track gradient magnitudes at each layer during training
Visualize the gradients: notice how they shrink in the early layers
Fix it by switching to ReLU - observe the difference!
Starter Code
class DeepNetwork:
    def __init__(self, n_layers=15, activation='sigmoid'):
        self.n_layers = n_layers
        self.activation = activation
        # TODO: Initialize layers

    def track_gradients(self):
        """Return gradient magnitudes for each layer."""
        # TODO: Track gradient norms
        pass
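A hedged sketch of what `track_gradients` might reveal, using a standalone NumPy simulation; the layer width, weight scale, and seed below are illustrative assumptions, not part of the challenge spec:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_layers, width = 15, 64

# Forward pass through 15 sigmoid layers, caching activations
weights = [rng.standard_normal((width, width)) * 0.1 for _ in range(n_layers)]
h = rng.standard_normal((1, width))
activations = []
for W in weights:
    h = sigmoid(h @ W)
    activations.append(h)

# Backward pass: propagate a unit gradient and record its norm per layer
grad = np.ones_like(h)
norms = []
for W, a in zip(reversed(weights), reversed(activations)):
    grad = grad * a * (1 - a)   # sigmoid derivative, at most 0.25
    grad = grad @ W.T
    norms.append(np.linalg.norm(grad))

# norms[0] is the layer nearest the output, norms[-1] the earliest layer;
# the earliest layer's gradient is orders of magnitude smaller
print(norms[0], norms[-1])
```

Plotting `norms` on a log scale against layer index makes the vanishing effect obvious; swapping `sigmoid` for a ReLU in the same harness is the comparison the challenge asks for.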
Success Criteria
Demonstrate gradient vanishing with sigmoid
Show improvement with ReLU
Create visualization comparing both
Explain why this happens mathematically
Hint
Sigmoid derivative: max value is 0.25. If you chain 15 layers: 0.25^15 ≈ 0 (vanishing!). ReLU derivative: either 0 or 1, so gradients flow better.
Challenge 2: Architecture Search
Difficulty: ⭐⭐⭐ Intermediate
Time: 1-2 hours
Concepts: Hyperparameter tuning, model selection, experimental design
The Problem
Find the best neural network architecture for CIFAR-10 image classification.
Your Task
Experiment with:
Number of layers (2, 4, 6, 8)
Layer sizes (32, 64, 128, 256)
Activation functions (ReLU, LeakyReLU, ELU)
Dropout rates (0, 0.2, 0.5)
Batch sizes (32, 64, 128)
Requirements
Test at least 20 different architectures
Track accuracy, training time, and parameter count
Create visualization showing tradeoffs
Identify Pareto-optimal architectures
Present your findings
Deliverable
results/
├── experiment_log.csv   # All experiments
├── best_models.json     # Top 5 configurations
├── tradeoff_plot.png    # Accuracy vs params vs time
└── recommendations.md   # Your analysis
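The experiment loop above can be sketched as a random search; the `evaluate` function below is a hypothetical placeholder you would replace with real CIFAR-10 training, and the logging uses plain dicts (a pandas DataFrame works equally well):

```python
import random

# Search space taken from the challenge description
space = {
    "n_layers": [2, 4, 6, 8],
    "layer_size": [32, 64, 128, 256],
    "activation": ["relu", "leaky_relu", "elu"],
    "dropout": [0.0, 0.2, 0.5],
    "batch_size": [32, 64, 128],
}

def sample_config(rng):
    """Draw one random configuration from the space."""
    return {k: rng.choice(v) for k, v in space.items()}

def evaluate(config):
    """Placeholder: replace with real training on CIFAR-10.
    Returns (accuracy, train_seconds, n_params)."""
    n_params = config["n_layers"] * config["layer_size"] ** 2
    return 0.0, 0.0, n_params

rng = random.Random(0)
log = []
for _ in range(20):  # at least 20 experiments, per the requirements
    cfg = sample_config(rng)
    acc, secs, params = evaluate(cfg)
    log.append({**cfg, "accuracy": acc, "seconds": secs, "params": params})

# Sort by accuracy; from here you can scan for Pareto-optimal rows
log.sort(key=lambda r: r["accuracy"], reverse=True)
print(len(log))
```

Writing `log` out with `csv.DictWriter` gives you the `experiment_log.csv` deliverable directly.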
Hint
Use a grid search or random search. Track experiments in a DataFrame. Consider the accuracy/params ratio as a metric.
Challenge 3: Attention Visualization
Difficulty: ⭐⭐⭐ Intermediate-Advanced
Time: 2-3 hours
Concepts: Attention mechanism, visualization, interpretability
The Problem
Implement and visualize a simple attention mechanism for sequence classification.
Your Task
Build attention layer from scratch
Train on text sentiment classification
Visualize attention weights for sample inputs
Show which words the model focuses on
Compare with/without attention
Example Output
Input: "This movie was absolutely terrible and boring"
Attention weights:
- "terrible": 0.45 ██████████
- "boring": 0.38 ████████
- "absolutely": 0.10 ██
- Other words: <0.05
Prediction: Negative (0.92 confidence)
Success Criteria
Working attention implementation
Heatmap visualization of attention
Interpretable results
Performance comparison
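A minimal sketch of the dot-product attention the hint suggests, in plain NumPy; the hidden states here are random stand-ins for the word embeddings your trained model would produce:

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_product_attention(H, query):
    """H: (seq_len, d) hidden states; query: (d,) query vector.
    Returns attention weights and the weighted context vector."""
    scores = H @ query       # score(h_i, query) = h_i . query
    alpha = softmax(scores)  # attention weights sum to 1
    context = alpha @ H      # weighted sum of hidden states
    return alpha, context

rng = np.random.default_rng(0)
H = rng.standard_normal((7, 16))  # e.g. 7 word representations
query = rng.standard_normal(16)
alpha, context = dot_product_attention(H, query)
print(alpha.round(3), context.shape)
```

The `alpha` vector is exactly what you would render as the per-word bars or heatmap in the example output above.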
Hint
Attention formula: α = softmax(score(h_i, query)); output = Σ α_i · h_i. Start with simple dot-product attention.
Challenge 4: Transfer Learning Master
Difficulty: ⭐⭐⭐⭐ Advanced
Time: 3-4 hours
Concepts: Transfer learning, fine-tuning, domain adaptation
The Problem
Use a pre-trained ImageNet model for a custom classification task with limited data.
Your Task
Choose a small custom dataset (100-500 images, 5-10 classes)
Load pre-trained ResNet/VGG/MobileNet
Try 3 approaches:
Feature extraction (freeze all layers)
Fine-tuning (unfreeze last few layers)
Full training (train entire network)
Compare results and training time
Analyze which layers learn what
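The feature-extraction approach (freeze everything, train only a new head) can be illustrated without a real backbone. The sketch below is a conceptual stand-in: a fixed random "pre-trained" layer plays the role of the frozen backbone, and only the linear head is updated; all shapes and data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_feat, n_classes = 200, 64, 32, 5

X = rng.standard_normal((n, d_in))
y = rng.integers(0, n_classes, n)

# "Pre-trained backbone": frozen weights, never touched by the optimizer
W_frozen = rng.standard_normal((d_in, d_feat))
features = np.tanh(X @ W_frozen)  # one feature-extraction pass

# Trainable linear head, optimized with plain gradient descent
W_head = np.zeros((d_feat, n_classes))
Y = np.eye(n_classes)[y]  # one-hot labels
for _ in range(200):
    logits = features @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = features.T @ (p - Y) / n  # softmax cross-entropy gradient
    W_head -= 0.5 * grad             # only the head moves

acc = (logits.argmax(axis=1) == y).mean()
print(acc)
```

With a real framework, "freezing" means excluding the backbone's parameters from the optimizer (and disabling their gradient computation); fine-tuning then amounts to re-including the last few layers.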
Dataset Suggestions
Food classification
Dog breed recognition
Flower species
Medical images
Your own photos
Analysis Required
Learning curves for each approach
Confusion matrices
Layer-wise feature visualization
Recommendations for when to use each approach
Hint
Feature extraction works well with <1000 images. Fine-tuning helps when the task is somewhat different from ImageNet. Full training needs lots of data.
Challenge 5: Neural Network Debugger
Difficulty: ⭐⭐⭐ Intermediate
Time: 1-2 hours
Concepts: Debugging, troubleshooting, systematic diagnosis
The Problem
You're given 5 broken neural network implementations. Find and fix the bugs!
Scenarios
Bug 1: The Never-Learning Network
# Network trains but loss doesn't decrease
# What's wrong?
for epoch in range(100):
    loss = compute_loss(model(X), y)
    gradients = backprop(loss)
    # weights stay the same! Why?
Bug 2: The Exploding Loss
# Loss goes to infinity after a few batches
# What's the issue?
def train_step(X, y):
    pred = model(X)
    loss = mse_loss(pred, y)
    # loss = 1e+10 after iteration 3
Bug 3: The Plateauing Network
# Accuracy stuck at 10% on MNIST (10 classes)
# Red flag! What's happening?
Bug 4: The Slow Learner
# Training takes 10x longer than expected
# Same architecture, same data
# Where's the bottleneck?
Bug 5: The Overfitting Champion
# Training accuracy: 99%
# Validation accuracy: 45%
# How do you fix this?
Your Task
Identify each bug
Explain why it happens
Provide the fix
Share prevention strategies
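To make the kind of fix Bug 1 needs concrete, here is a toy linear-regression loop (hypothetical data, plain NumPy) where the one line that applies the update is exactly what a "never-learning" network is missing:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.1
losses = []
for epoch in range(100):
    pred = X @ w
    loss = ((pred - y) ** 2).mean()
    grad = 2 * X.T @ (pred - y) / len(y)
    w -= lr * grad  # <-- the step Bug 1 forgot: actually apply the update
    losses.append(loss)

print(losses[0], losses[-1])
```

Delete the `w -= lr * grad` line and the loss stays flat for all 100 epochs; that flat loss curve is the signature to look for in Bug 1.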
Hint 1
Bug 1: Are you actually updating the weights? Bug 2: Check the learning rate and weight initialization. Bug 3: Is your model predicting all one class? Bug 4: Profile your code; is it the data loading? Bug 5: Classic overfitting; what's missing?
Meta Challenge: Build Your Own Framework
Difficulty: ⭐⭐⭐⭐⭐ Expert
Time: 8-12 hours
Concepts: Software engineering, API design, comprehensive understanding
The Ultimate Challenge
Build a mini deep learning framework (like PyTorch, but simpler).
Requirements
Your framework should support:
Automatic differentiation (autograd)
Common layers (Linear, Conv2D, ReLU, etc.)
Loss functions (MSE, CrossEntropy)
Optimizers (SGD, Adam)
Model save/load
GPU support (optional but impressive!)
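Autograd is the heart of the framework. As a starting point, here is a minimal scalar reverse-mode autograd node (in the spirit of micrograd, not the tensor version the full challenge requires); the class and method names are one possible design, not a prescribed API:

```python
class Value:
    """Minimal scalar autograd node."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topological order, then one reverse-mode sweep
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._backward_fn:
                v._backward_fn()

x = Value(3.0)
y = Value(4.0)
z = x * y + x  # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)
```

Extending this from scalars to arrays, and layering `Linear`, losses, and optimizers on top of it, is essentially the rest of the challenge.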
Example Usage
from your_framework import Module, Linear, ReLU, CrossEntropyLoss, Adam

class MyModel(Module):
    def __init__(self):
        self.fc1 = Linear(784, 128)
        self.relu = ReLU()
        self.fc2 = Linear(128, 10)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model = MyModel()
optimizer = Adam(model.parameters(), lr=0.001)
criterion = CrossEntropyLoss()
# Training loop works just like PyTorch!
Bonus Points
Clean, documented API
Comprehensive tests
Tutorial notebooks
Performance benchmarks
PyTorch compatibility layer
Challenge Completion Tracker
Mark off challenges as you complete them:
Challenge 1: Gradient Vanishing Detective
Challenge 2: Architecture Search
Challenge 3: Attention Visualization
Challenge 4: Transfer Learning Master
Challenge 5: Neural Network Debugger
Meta Challenge: Build Your Own Framework
Learning Tips
Start small: Don't try all challenges at once
Debug systematically: Print intermediate values, visualize data
Compare with libraries: Check your implementation against PyTorch
Read papers: Many challenges have research papers behind them
Ask for help: Post in Discussions if stuck
Happy challenging!
Remember: The goal is to learn, not to complete everything perfectly. Each challenge deepens your understanding!