# Assignment: Build a Neural Network from Scratch

## Objective

Build a complete neural network from scratch (without PyTorch/TensorFlow) to classify the MNIST handwritten digits dataset. This assignment will solidify your understanding of how neural networks actually work under the hood.

Estimated Time: 6-8 hours
Difficulty: Intermediate
Due Date: 2 weeks from assignment
## Requirements

### Part 1: Network Architecture (25 points)

Implement a 3-layer neural network with:

- Input layer: 784 neurons (28x28 flattened images)
- Hidden layer 1: 128 neurons with ReLU activation
- Hidden layer 2: 64 neurons with ReLU activation
- Output layer: 10 neurons with Softmax activation (digits 0-9)
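The Softmax on the output layer deserves care: a naive `np.exp` overflows for large logits. A minimal numerically stable sketch (the function name is just illustrative, not part of the required API):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    # Subtracting the row-wise max leaves the result unchanged
    # (softmax is shift-invariant) but prevents overflow in np.exp.
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)
```

Each output row sums to 1, so the entries can be read directly as class probabilities for digits 0-9.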
Required Implementation:

```python
class NeuralNetwork:
    def __init__(self, layer_sizes):
        """
        Initialize network with given layer sizes.

        Args:
            layer_sizes: List of layer sizes, e.g., [784, 128, 64, 10]
        """
        # TODO: Initialize weights and biases
        pass

    def forward(self, X):
        """Forward pass through the network."""
        # TODO: Implement forward propagation
        pass

    def backward(self, X, y):
        """Backward pass - compute gradients."""
        # TODO: Implement backpropagation
        pass

    def update_weights(self, learning_rate):
        """Update weights using computed gradients."""
        # TODO: Implement gradient descent update
        pass
```
### Part 2: Training Loop (25 points)

Implement the training process:

1. Load and preprocess the MNIST dataset
2. Implement mini-batch gradient descent
3. Use categorical cross-entropy loss
4. Train for at least 10 epochs
5. Track training and validation loss per epoch
6. Plot learning curves

Training Requirements:

- Batch size: 32-128 (your choice)
- Learning rate: 0.001-0.01 (experiment)
- Validation split: 20% of training data
- Save the best model based on validation accuracy
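The epoch and batch bookkeeping is easy to get subtly wrong (for example, forgetting to reshuffle each epoch). A sketch of the split and batching mechanics under the requirements above; the helper names are illustrative, not part of the required API:

```python
import numpy as np

def train_val_split(X, y, val_frac=0.2, seed=0):
    """Shuffle once, then hold out val_frac of the samples for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_frac)
    return X[idx[n_val:]], y[idx[n_val:]], X[idx[:n_val]], y[idx[:n_val]]

def iter_batches(n_samples, batch_size, rng):
    """Yield freshly shuffled index batches for one epoch."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]
```

Indexing into the arrays per batch (rather than copying the data up front) keeps the loop simple and makes per-epoch reshuffling a one-liner.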
### Part 3: Evaluation & Analysis (25 points)

Evaluate your trained model:

1. Calculate final test accuracy (target: >90%)
2. Create a confusion matrix
3. Show examples of misclassified digits
4. Analyze which digits are most often confused
5. Visualize the learned weights of the first layer

Required Metrics:

- Test accuracy
- Precision per class
- Recall per class
- F1-score per class
- Overall confusion matrix
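Since scikit-learn is already installed in the Getting Started environment, it is reasonable to lean on it for evaluation metrics even though the network itself must be hand-rolled (check with the instructor if unsure whether the no-frameworks rule covers metrics). A sketch:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

def report_metrics(y_true, y_pred):
    """Compute the required test metrics from integer label arrays."""
    acc = accuracy_score(y_true, y_pred)
    # average=None returns one value per class (digits 0-9).
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=None, zero_division=0)
    cm = confusion_matrix(y_true, y_pred)  # rows = true, cols = predicted
    return acc, prec, rec, f1, cm
```

Row i, column j of the confusion matrix counts samples of true digit i predicted as digit j, which is exactly what the "most confused digits" analysis needs.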
### Part 4: Experimentation & Documentation (25 points)

Experiment and document your findings:

- Experiment 1: Try 3 different learning rates - which works best?
- Experiment 2: Compare 2-layer vs. 3-layer networks
- Experiment 3: Try different activation functions (sigmoid, tanh, ReLU)
- Experiment 4: Test different weight initialization strategies

Documentation Requirements:

Create a markdown report with:

- Introduction and approach
- Architecture decisions and rationale
- Training process description
- Results table comparing experiments
- Conclusions and lessons learned
## Grading Rubric

| Criteria | Exemplary (A: 90-100%) | Proficient (B: 80-89%) | Adequate (C: 70-79%) | Needs Work (D/F: <70%) |
|---|---|---|---|---|
| Implementation | Clean, efficient, well-commented code; all functions work correctly | Mostly correct, minor bugs; adequate comments | Basic implementation with several bugs | Broken or incomplete code |
| Architecture | Proper layer sizes, activations, and initialization; optimized design | Correct structure with minor inefficiencies | Basic structure but suboptimal choices | Incorrect architecture |
| Training | Smooth convergence, proper validation, excellent learning curves | Good training process with minor issues | Training works but is inefficient | Poor training or doesn't converge |
| Evaluation | Comprehensive analysis, insightful visualizations, >92% accuracy | Good analysis, clear results, >90% accuracy | Basic evaluation, >85% accuracy | Incomplete evaluation or <85% accuracy |
| Experiments | 4+ experiments, thorough analysis, clear insights | 3-4 experiments with good documentation | 2-3 experiments, basic documentation | <2 experiments or poor analysis |
| Documentation | Exceptionally clear, professional, insightful | Well-written and organized | Adequate but could be clearer | Poor or missing documentation |
### Grade Breakdown

- A (90-100): All requirements met + bonus challenges + exceptional documentation
- B (80-89): All core requirements met with good quality
- C (70-79): Most requirements met, basic functionality
- D/F (<70): Major requirements missing or not working
## Bonus Challenges (+10 points each, max +30)

### Bonus 1: Advanced Optimizers (+10 points)

- Implement momentum optimization
- Implement the Adam optimizer
- Compare SGD vs. Momentum vs. Adam with plots
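The momentum update is a small change to plain SGD: keep a decaying running average of past gradients and step along it. A sketch (`beta=0.9` is the conventional default, not a requirement):

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """One SGD-with-momentum update; returns (new_weights, new_velocity)."""
    velocity = beta * velocity - lr * grad  # decaying sum of past gradients
    return w + velocity, velocity
```

With `beta=0` this reduces to plain SGD, which makes the comparison plots easy to generate from a single function.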
### Bonus 2: Regularization (+10 points)

- Add L2 regularization
- Implement dropout
- Show the impact on overfitting with plots
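For the L2 part, the penalty touches both the loss and the weight gradients; a sketch of the extra terms (the helper name and the default λ are illustrative):

```python
import numpy as np

def add_l2(loss, grad_W, W, lam=1e-4):
    """Add the L2 penalty lam/2 * ||W||^2 to a loss and its gradient."""
    loss = loss + 0.5 * lam * np.sum(W ** 2)
    grad_W = grad_W + lam * W  # derivative of the penalty w.r.t. W
    return loss, grad_W
```

Biases are conventionally left out of the penalty; apply it to the weight matrices only.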
### Bonus 3: Advanced Analysis (+10 points)

- Visualize activation patterns in the hidden layers
- Implement and visualize attention/saliency maps
- Create an interactive demo with matplotlib widgets

### Bonus 4: Performance Optimization (+10 points)

- Vectorize all operations (no Python loops)
- Compare training time: original vs. optimized
- Profile the code and show a bottleneck analysis
## Submission Requirements

### What to Submit

1. Code Files:
   - `neural_network.py` - Your NN class implementation
   - `train.py` - Training script
   - `evaluate.py` - Evaluation script
   - `requirements.txt` - Dependencies
2. Jupyter Notebook:
   - `analysis.ipynb` - Complete analysis with:
     - Training process
     - Visualizations
     - Experiments
     - Results discussion
3. Report:
   - `REPORT.md` - Markdown report with:
     - Methodology
     - Results tables
     - Conclusions
     - Lessons learned
4. Assets:
   - `models/` - Saved model weights
   - `plots/` - All generated visualizations
   - `results/` - Experiment results (CSV/JSON)
### Submission Format

GitHub Repository:

```
your-name-mnist-nn/
├── README.md              # Setup and run instructions
├── requirements.txt       # Dependencies
├── neural_network.py      # Core implementation
├── train.py               # Training script
├── evaluate.py            # Evaluation script
├── analysis.ipynb         # Analysis notebook
├── REPORT.md              # Written report
├── models/
│   └── best_model.npz     # Saved weights
├── plots/
│   ├── learning_curves.png
│   ├── confusion_matrix.png
│   └── ...
└── results/
    └── experiments.csv
```

Submit:

- GitHub repository link (make it public)
- Include all files listed above
- Ensure the code runs with:

```bash
pip install -r requirements.txt && python train.py
```
## Hints & Tips

**Hint 1: Weight Initialization**

Use Xavier/He initialization to prevent vanishing gradients:

```python
import numpy as np

# Xavier initialization for layers with sigmoid/tanh
W = np.random.randn(n_in, n_out) * np.sqrt(2.0 / (n_in + n_out))

# He initialization for layers with ReLU
W = np.random.randn(n_in, n_out) * np.sqrt(2.0 / n_in)
```
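Applied across the whole [784, 128, 64, 10] architecture, He initialization might look like the following (the function name and parameter layout are one possible design, not prescribed):

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    """He-initialized weight matrices and zero biases for each layer pair."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Scale by sqrt(2 / n_in) so activations keep a stable variance
        # through ReLU layers.
        W = rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)
        params.append((W, np.zeros(n_out)))
    return params
```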
**Hint 2: Debugging Gradients**

Implement gradient checking to verify backpropagation:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Compute the gradient numerically (central differences) for verification."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        old_val = x.flat[i]
        x.flat[i] = old_val + eps
        pos = f(x)
        x.flat[i] = old_val - eps
        neg = f(x)
        x.flat[i] = old_val  # restore the original value
        grad.flat[i] = (pos - neg) / (2 * eps)
    return grad
```
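To interpret the check, compare your analytic (backprop) gradient against the numerical one with a relative error; as a rule of thumb, values around 1e-7 or below suggest a correct implementation, while values near 1e-2 usually mean a bug. A small helper:

```python
import numpy as np

def relative_error(analytic, numeric, eps=1e-12):
    """Max elementwise relative error between two gradient arrays."""
    num = np.abs(analytic - numeric)
    # eps in the denominator guards against division by zero when
    # both gradients are exactly zero.
    den = np.maximum(np.abs(analytic) + np.abs(numeric), eps)
    return (num / den).max()
```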
**Hint 3: Vectorization**

Avoid loops! Process entire batches at once:

```python
import numpy as np

# Bad: loop through samples one at a time
for i in range(batch_size):
    output[i] = np.dot(W, X[i]) + b

# Good: vectorized - the entire batch at once
output = np.dot(X, W.T) + b
```
**Hint 4: Debugging Low Accuracy**

If accuracy is low, check:

- Data normalization (scale pixel values to 0-1)
- Learning rate (try 0.001, 0.01, 0.1)
- Weight initialization
- Gradient flow (print gradient magnitudes)
- Is the loss decreasing? (plot the loss curve)
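The first item on that list is the most common culprit: raw MNIST pixels are in 0-255, and feeding them in unscaled tends to destabilize training. A quick guard (the assertion style is a suggestion, not a requirement):

```python
import numpy as np

def preprocess(X_raw):
    """Scale raw pixel values (0-255) into [0, 1] as float32."""
    X = X_raw.astype(np.float32) / 255.0
    assert X.min() >= 0.0 and X.max() <= 1.0, "pixels not in [0, 1]"
    return X
```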
## Resources

### Essential Reading

### Code References

### Optional Deep Dives
## FAQ

**Q: Can I use PyTorch/TensorFlow for parts of it?**
A: No - the point is to implement from scratch. You can use NumPy, but not ML frameworks.

**Q: What if I can't reach 90% accuracy?**
A: 85-89% is still acceptable for a passing grade. Document what you tried and why you think it didn't work better.

**Q: Can I work with a partner?**
A: Discuss concepts together, but write your own code. No shared code submissions.

**Q: How long should the report be?**
A: Quality over quantity. 2-4 pages of clear analysis is better than 10 pages of fluff.

**Q: Can I use a different dataset?**
A: No - use MNIST so we can fairly compare submissions.
## Learning Objectives

After completing this assignment, you will be able to:

- Implement forward and backward propagation from scratch
- Understand the mathematical foundations of neural networks
- Debug gradient computation issues
- Choose appropriate hyperparameters
- Evaluate model performance comprehensively
- Communicate technical results clearly
## Getting Started

1. Fork the starter repository: github.com/zero-to-ai/nn-assignment-starter
2. Set up your environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install numpy matplotlib scikit-learn jupyter
   ```

3. Download MNIST:

   ```python
   from sklearn.datasets import fetch_openml
   mnist = fetch_openml('mnist_784', version=1)
   ```

4. Start coding! Begin with the `neural_network.py` skeleton.
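One gotcha after the download step: `fetch_openml` typically returns the MNIST labels as strings, and categorical cross-entropy needs one-hot targets. A small sketch of both conversions (the helper name is illustrative):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Convert (possibly string) labels to one-hot rows for cross-entropy."""
    labels = np.asarray(labels).astype(int)  # '7' -> 7
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out
```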
## Questions & Support

- Office Hours: Tuesdays 2-4 PM, Thursdays 3-5 PM
- Discussion Forum: GitHub Discussions
- Email: instructor@zero-to-ai.com (response within 24 hours)

Stuck? Post your question in Discussions - help others by answering too!

Good luck! You've got this!

Remember: This assignment is designed to be challenging but doable. Start early, test often, and don't hesitate to ask for help.