# Setup: Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import linalg
from scipy.stats import multivariate_normal
# Set style
sns.set_style('whitegrid')
np.random.seed(42)
print("✅ Libraries loaded successfully!")
Chapter 2: Linear Algebra
Exercise 2.1: Gaussian Elimination 🟢
Solve the following system of linear equations using Gaussian elimination:
2x + 3y - z = 5
4x + 4y - 3z = 3
-2x + 3y - z = 1
Tasks:
Convert to augmented matrix form
Apply row operations to get row echelon form
Back-substitute to find x, y, z
Verify your solution
# Your code here
# Hint: Create augmented matrix [A | b]
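A possible starting sketch (one route among several; the elimination loop below assumes every pivot is nonzero, so it omits the partial pivoting real code would use):

```python
import numpy as np

# Augmented matrix [A | b] for the system
M = np.array([[ 2., 3., -1.,  5.],
              [ 4., 4., -3.,  3.],
              [-2., 3., -1.,  1.]])
n = 3

# Forward elimination to row echelon form (no pivoting: assumes M[i, i] != 0)
for i in range(n):
    M[i] = M[i] / M[i, i]              # normalize pivot row
    for j in range(i + 1, n):
        M[j] = M[j] - M[j, i] * M[i]   # eliminate below the pivot

# Back-substitution
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = M[i, -1] - M[i, i+1:n] @ x[i+1:n]

print(x)  # → [1. 2. 3.]
```

Verifying with `np.linalg.solve` (or multiplying `A @ x`) is a good habit after any hand-rolled solver.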
Exercise 2.2: Linear Independence 🟡
Determine whether the following vectors are linearly independent:
v1 = [1, 2, 3]
v2 = [2, 4, 6]
v3 = [1, 1, 1]
Method: Check if the matrix formed by these vectors has full rank.
# Your code here
# Hint: Use np.linalg.matrix_rank()
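A minimal sketch of the rank check (note v2 = 2·v1, so full rank should fail):

```python
import numpy as np

V = np.array([[1, 2, 3],    # v1
              [2, 4, 6],    # v2 = 2 * v1
              [1, 1, 1]])   # v3
rank = np.linalg.matrix_rank(V)
independent = rank == V.shape[0]   # independent iff rank equals the number of vectors
print(rank, independent)           # → 2 False
```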
Exercise 2.3: Basis and Dimension 🟡
Given the following vectors in ℝ³:
v1 = [1, 0, 1]
v2 = [0, 1, 1]
v3 = [1, 1, 0]
Tasks:
Verify they form a basis for ℝ³
Express the vector [2, 3, 1] in this basis
Convert back to standard basis to verify
# Your code here
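One way to approach this: stack the basis vectors as columns, check invertibility, then solve a linear system for the coordinates (a sketch; variable names are my own):

```python
import numpy as np

v1, v2, v3 = np.array([1, 0, 1]), np.array([0, 1, 1]), np.array([1, 1, 0])
B = np.column_stack([v1, v2, v3])   # basis vectors as columns

# Three vectors in R^3 form a basis iff B is invertible (nonzero determinant)
assert abs(np.linalg.det(B)) > 1e-12

# Coordinates c of w in this basis satisfy B @ c = w
w = np.array([2, 3, 1])
c = np.linalg.solve(B, w)
w_back = B @ c                      # back to the standard basis
print(c, w_back)                    # → [0. 1. 2.] [2. 3. 1.]
```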
Exercise 2.4: Linear Transformations 🟡
Consider the transformation T: ℝ² → ℝ² that:
Scales x-direction by 2
Rotates 45° counterclockwise
Tasks:
Construct the transformation matrix (combine scaling and rotation)
Apply to the unit square vertices: (0,0), (1,0), (1,1), (0,1)
Plot both original and transformed squares
# Your code here
# Hint: Rotation matrix is [[cos θ, -sin θ], [sin θ, cos θ]]
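A sketch of the matrix construction and application (plotting omitted; I read the task as scale first, then rotate, which makes the combined matrix T = R @ S):

```python
import numpy as np

theta = np.deg2rad(45)
S = np.array([[2.0, 0.0],
              [0.0, 1.0]])                       # scale x-direction by 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotate 45° counterclockwise
T = R @ S                                        # scale, then rotate

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]]).T  # vertices as columns
transformed = T @ square
print(transformed)   # e.g. (1, 0) maps to (√2, √2)
```

Plot `square` and `transformed` with `plt.plot` on the same axes to see the effect.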
Chapter 3: Analytic Geometry
Exercise 3.1: Norms and Distances 🟢
Given vectors:
x = [3, 4]
y = [1, 2]
Calculate:
L1 norm (Manhattan) of x
L2 norm (Euclidean) of x
L∞ norm (Maximum) of x
Distance between x and y (L2)
Cosine similarity between x and y
# Your code here
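A compact sketch using `np.linalg.norm` with different `ord` values:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([1.0, 2.0])

l1   = np.linalg.norm(x, 1)        # Manhattan: 7
l2   = np.linalg.norm(x)           # Euclidean: 5
linf = np.linalg.norm(x, np.inf)   # Maximum: 4
dist = np.linalg.norm(x - y)       # L2 distance: 2√2
cos_sim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(l1, l2, linf, dist, cos_sim)
```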
Exercise 3.2: Gram-Schmidt Orthogonalization 🟡
Apply Gram-Schmidt process to orthogonalize:
v1 = [3, 1]
v2 = [2, 2]
Tasks:
Compute orthogonal vectors u1, u2
Normalize to get orthonormal vectors
Verify orthogonality (dot product = 0)
Verify unit length
# Your code here
# Formula: u1 = v1, u2 = v2 - proj_u1(v2)
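The hinted formula translates almost directly into code (a sketch for two vectors; a general version would loop over a list of vectors):

```python
import numpy as np

v1 = np.array([3.0, 1.0])
v2 = np.array([2.0, 2.0])

u1 = v1
u2 = v2 - (v2 @ u1) / (u1 @ u1) * u1   # subtract the projection of v2 onto u1

e1 = u1 / np.linalg.norm(u1)           # normalize to unit length
e2 = u2 / np.linalg.norm(u2)
print(u2, e1 @ e2)                     # u2 = [-0.4, 1.2]; dot product ≈ 0
```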
Exercise 3.3: Projection and Distance 🟡
Given a point p = [5, 5] and a line defined by vector v = [1, 2] passing through origin:
Tasks:
Project p onto the line (find closest point on line)
Calculate distance from p to the line
Visualize: plot p, the line, projection point, and perpendicular distance
# Your code here
# Projection formula: proj_v(p) = (p·v / v·v) * v
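The projection formula in the hint, as code (visualization left as part of the exercise):

```python
import numpy as np

p = np.array([5.0, 5.0])
v = np.array([1.0, 2.0])

proj = (p @ v) / (v @ v) * v       # closest point on the line through the origin along v
dist = np.linalg.norm(p - proj)    # perpendicular distance from p to the line
print(proj, dist)                  # → [3. 6.] and √5
```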
Exercise 3.4: Angles and Orthogonality 🟢
For vectors:
a = [1, 1, 1]
b = [1, -1, 0]
c = [2, 0, -2]
Calculate:
Angle between a and b (in degrees)
Angle between b and c
Are any pairs orthogonal?
Find a vector orthogonal to both a and b (use cross product)
# Your code here
# Angle formula: cos(θ) = (a·b) / (||a|| ||b||)
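A sketch using the hinted angle formula (the `clip` guards against arccos of values a hair outside [−1, 1] from rounding):

```python
import numpy as np

a = np.array([1.0,  1.0,  1.0])
b = np.array([1.0, -1.0,  0.0])
c = np.array([2.0,  0.0, -2.0])

def angle_deg(u, w):
    cos = u @ w / (np.linalg.norm(u) * np.linalg.norm(w))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

ab = angle_deg(a, b)   # 90° — a ⊥ b (and note a · c = 0, so a ⊥ c as well)
bc = angle_deg(b, c)   # 60°
n = np.cross(a, b)     # orthogonal to both a and b: [1, 1, -2]
print(ab, bc, n)
```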
Chapter 4: Matrix Decompositions
Exercise 4.1: Eigenvalues and Eigenvectors 🟢
For matrix:
A = [[4, 1],
[2, 3]]
Tasks:
Compute eigenvalues
Compute eigenvectors
Verify: Av = λv for each eigenvalue/eigenvector pair
Visualize the transformation effect on eigenvectors
# Your code here
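A sketch of the computation and verification step (visualization omitted; note `np.linalg.eig` returns eigenvectors as the *columns* of the second output):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)

# Verify A v = λ v for each eigenpair (columns of eigvecs)
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
print(eigvals)   # eigenvalues 5 and 2 (order may vary)
```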
Exercise 4.2: Eigendecomposition 🟡
Given symmetric matrix:
S = [[5, 2],
[2, 5]]
Tasks:
Decompose S = PDP^T (eigendecomposition)
Verify decomposition by reconstructing S
Compute S² using eigendecomposition (faster than direct multiplication)
Compute matrix square root √S
# Your code here
# S^2 = P D^2 P^T (eigenvalue exponentiation)
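A sketch using `np.linalg.eigh`, which is the right tool for symmetric matrices (it guarantees real eigenvalues and an orthogonal P):

```python
import numpy as np

S = np.array([[5.0, 2.0],
              [2.0, 5.0]])
w, P = np.linalg.eigh(S)                 # eigenvalues 3 and 7; P orthogonal

S_rec  = P @ np.diag(w) @ P.T            # reconstruct S = P D P^T
S_sq   = P @ np.diag(w**2) @ P.T         # S² by squaring eigenvalues
S_sqrt = P @ np.diag(np.sqrt(w)) @ P.T   # matrix square root (eigenvalues > 0 here)
print(w)
```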
Exercise 4.3: Singular Value Decomposition 🟡
Apply SVD to:
M = [[3, 2, 2],
[2, 3, -2]]
Tasks:
Compute SVD: M = UΣV^T
Verify reconstruction
What is the rank of M?
Compute low-rank approximation (rank-1) and calculate reconstruction error
# Your code here
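One possible sketch (using the thin SVD; the rank-1 reconstruction error in the Frobenius norm should equal the dropped singular value):

```python
import numpy as np

M = np.array([[3.0, 2.0,  2.0],
              [2.0, 3.0, -2.0]])
U, s, Vt = np.linalg.svd(M, full_matrices=False)

M_rec = U @ np.diag(s) @ Vt                  # exact reconstruction
rank  = int(np.sum(s > 1e-10))               # numerical rank
M1    = s[0] * np.outer(U[:, 0], Vt[0, :])   # best rank-1 approximation
err   = np.linalg.norm(M - M1)               # equals the discarded singular value
print(s, rank, err)                          # singular values 5 and 3; rank 2
```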
Exercise 4.4: Cholesky Decomposition 🔴
For positive definite matrix:
A = [[4, 2],
[2, 3]]
Tasks:
Verify A is positive definite (check eigenvalues > 0)
Compute Cholesky decomposition A = LL^T
Solve Ax = b for b = [1, 2] using forward and backward substitution
Compare efficiency with direct solve
# Your code here
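A sketch of the two triangular solves using SciPy (the efficiency comparison against `np.linalg.solve` is left to you; for a 2×2 matrix the difference will be in the noise, so try larger matrices):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
assert np.all(np.linalg.eigvalsh(A) > 0)   # positive definite check

L = cholesky(A, lower=True)                # A = L L^T
b = np.array([1.0, 2.0])

y = solve_triangular(L, b, lower=True)     # forward substitution:  L y = b
x = solve_triangular(L.T, y, lower=False)  # backward substitution: L^T x = y
print(x)                                   # → [-0.125  0.75]
```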
Chapter 5: Vector Calculus
Exercise 5.1: Gradients 🟢
For functions:
f(x, y) = x² + y²
g(x, y) = x²y + xy²
Tasks:
Compute gradients analytically (by hand, write formulas)
Implement numerical gradient computation
Evaluate at point (2, 3)
Visualize gradient field with quiver plot
# Your code here
# Numerical gradient: (f(x+h) - f(x-h)) / (2h)
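A sketch of a central-difference gradient, shown for f (g works the same way; the quiver plot is left to you):

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + y**2            # analytic gradient: (2x, 2y)

def num_grad(func, p, h=1e-5):
    # Central differences: (f(p + h e_i) - f(p - h e_i)) / (2h) per coordinate
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (func(p + e) - func(p - e)) / (2 * h)
    return g

g = num_grad(f, [2.0, 3.0])
print(g)   # → [4. 6.]
```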
Exercise 5.2: Jacobian Matrix 🟡
For vector function F: ℝ² → ℝ²:
F([x, y]) = [x² - y², 2xy]
Tasks:
Compute Jacobian matrix J analytically
Evaluate J at point (1, 1)
Implement numerical Jacobian
Verify your analytical result matches numerical approximation
# Your code here
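A sketch of a numerical Jacobian by central differences, column by column (the analytic Jacobian is J = [[2x, −2y], [2y, 2x]]):

```python
import numpy as np

def F(p):
    x, y = p
    return np.array([x**2 - y**2, 2 * x * y])

def num_jacobian(func, p, h=1e-6):
    # Column i of J holds the central difference along coordinate i
    p = np.asarray(p, dtype=float)
    m = func(p).size
    J = np.zeros((m, p.size))
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        J[:, i] = (func(p + e) - func(p - e)) / (2 * h)
    return J

J = num_jacobian(F, [1.0, 1.0])
print(J)   # analytic value at (1, 1): [[2, -2], [2, 2]]
```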
Exercise 5.3: Chain Rule and Backpropagation 🟡
Consider a simple computational graph:
x → a = x² → b = a + 1 → c = b³ → output
Tasks:
Compute forward pass for x = 2
Compute ∂c/∂x using chain rule analytically
Implement backward pass (backpropagation)
Verify numerical gradient
# Your code here
# Chain rule: dc/dx = (dc/db) * (db/da) * (da/dx)
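A sketch of the forward pass and the hand-rolled backward pass, with a finite-difference check:

```python
# Forward pass through x -> a = x² -> b = a + 1 -> c = b³
x = 2.0
a = x**2           # 4
b = a + 1          # 5
c = b**3           # 125

# Backward pass (local derivatives multiplied along the graph)
dc_db = 3 * b**2               # 75
db_da = 1.0
da_dx = 2 * x                  # 4
dc_dx = dc_db * db_da * da_dx  # 300

# Numerical check via central differences
def forward(x):
    return (x**2 + 1)**3
h = 1e-6
num = (forward(x + h) - forward(x - h)) / (2 * h)
print(c, dc_dx, num)   # → 125.0 300.0 and ≈ 300
```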
Exercise 5.4: Hessian Matrix 🔴
For function f(x, y) = x³ + y³ - 3xy:
Tasks:
Compute Hessian matrix (matrix of second derivatives)
Find critical points (where gradient = 0)
Classify critical points using Hessian (minimum/maximum/saddle)
Visualize function with contours and mark critical points
# Your code here
# Hessian: H = [[∂²f/∂x², ∂²f/∂x∂y], [∂²f/∂y∂x, ∂²f/∂y²]]
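A sketch of the classification step, after solving ∇f = (3x² − 3y, 3y² − 3x) = 0 by hand (substituting y = x² gives x = x⁴, so the critical points are (0, 0) and (1, 1); contour plotting is left to you):

```python
import numpy as np

def hessian(x, y):
    # H for f(x, y) = x³ + y³ - 3xy
    return np.array([[6 * x, -3.0],
                     [-3.0, 6 * y]])

for pt in [(0.0, 0.0), (1.0, 1.0)]:
    w = np.linalg.eigvalsh(hessian(*pt))   # sorted ascending
    kind = ('saddle' if w[0] < 0 < w[1]
            else 'minimum' if w[0] > 0
            else 'maximum')
    print(pt, w, kind)   # (0,0) is a saddle; (1,1) is a minimum
```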
Chapter 6: Probability and Distributions
Exercise 6.1: Probability Basics 🟢
A bag contains 3 red, 4 blue, and 5 green marbles.
Calculate:
P(drawing a blue marble)
P(drawing a red OR green marble)
P(drawing 2 blue marbles without replacement)
Simulate 10,000 draws and verify your probabilities
# Your code here
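A sketch of the analytic answers plus a simulation for the single-draw case (the seed and variable names are my own choices):

```python
import numpy as np

red, blue, green = 3, 4, 5
total = red + blue + green   # 12

p_blue = blue / total                                     # 1/3
p_red_or_green = (red + green) / total                    # 2/3
p_two_blue = (blue / total) * ((blue - 1) / (total - 1))  # 4/12 · 3/11 = 1/11

# Simulation check for P(blue) over 10,000 draws with replacement
rng = np.random.default_rng(0)
bag = np.array(['r'] * red + ['b'] * blue + ['g'] * green)
draws = rng.choice(bag, size=10_000)
print(p_blue, p_red_or_green, p_two_blue, np.mean(draws == 'b'))
```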
Exercise 6.2: Bayes’ Theorem 🟡
Medical test scenario:
Disease prevalence: 1% of population
Test sensitivity (true positive rate): 95%
Test specificity (true negative rate): 90%
Tasks:
If someone tests positive, what’s the probability they have the disease?
If someone tests negative, what’s the probability they’re healthy?
Implement a function to compute posterior probabilities
Visualize how posterior changes with different prior probabilities
# Your code here
# P(Disease|Positive) = P(Positive|Disease) * P(Disease) / P(Positive)
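A sketch of the two posterior computations (the prior-sensitivity plot is left to you; note how small the positive-test posterior is despite the 95% sensitivity):

```python
prior = 0.01   # P(Disease)
sens  = 0.95   # P(Positive | Disease)
spec  = 0.90   # P(Negative | Healthy)

# Total probability of a positive test: true positives + false positives
p_pos = sens * prior + (1 - spec) * (1 - prior)
p_disease_given_pos = sens * prior / p_pos        # ≈ 0.088

p_neg = (1 - sens) * prior + spec * (1 - prior)
p_healthy_given_neg = spec * (1 - prior) / p_neg  # ≈ 0.9994
print(p_disease_given_pos, p_healthy_given_neg)
```

The counterintuitive ~8.8% posterior is the classic base-rate effect: at 1% prevalence, false positives from the healthy 99% swamp the true positives.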
Exercise 6.3: Gaussian Distribution 🟡
Given: μ = 10, σ = 2
Tasks:
Generate 1000 samples from N(10, 4)
Plot histogram and overlay theoretical PDF
Calculate P(8 < X < 12) analytically and empirically
Find the value x such that P(X < x) = 0.95
Compute sample mean and variance, compare to theoretical
# Your code here
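A sketch of the sampling and the analytic pieces via `scipy.stats.norm` (histogram/PDF overlay left to you; the seed is my own choice):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 10, 2
rng = np.random.default_rng(0)
samples = rng.normal(mu, sigma, 1000)

p_8_12 = norm.cdf(12, mu, sigma) - norm.cdf(8, mu, sigma)  # P(8 < X < 12) ≈ 0.683
x95 = norm.ppf(0.95, mu, sigma)                            # 95th percentile ≈ 13.29
emp = np.mean((samples > 8) & (samples < 12))              # empirical estimate
print(p_8_12, x95, emp, samples.mean(), samples.var())
```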
Exercise 6.4: Multivariate Gaussian 🔴
Create a 2D Gaussian with:
μ = [0, 0]
Σ = [[2, 1],
[1, 2]]
Tasks:
Generate 500 samples
Plot samples with covariance ellipse
Compute correlation coefficient
Apply transformation to make variables independent (whitening)
Verify independence (compute covariance of transformed data)
# Your code here
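A sketch of the sampling and the whitening step, using Σ^(−1/2) built from the eigendecomposition (one of several valid whitening transforms; plotting and the ellipse are left to you):

```python
import numpy as np

mu = np.zeros(2)
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=500)

# Whitening: W = Sigma^(-1/2), so Z = X W^T has identity covariance
w, P = np.linalg.eigh(Sigma)
W = P @ np.diag(1 / np.sqrt(w)) @ P.T
Z = X @ W.T

corr = np.corrcoef(X.T)[0, 1]   # true correlation is 1/2
print(corr, np.cov(Z.T))        # cov(Z) should be close to the identity
```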
Chapter 7: Continuous Optimization
Exercise 7.1: Gradient Descent 🟢
Minimize f(x) = x² + 5x + 6 using gradient descent.
Tasks:
Implement gradient descent from scratch
Start from x₀ = 10, use learning rate α = 0.1
Track convergence (plot x vs iteration)
Find optimal x analytically and compare
Experiment with different learning rates (too small, too large)
# Your code here
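A minimal sketch of the loop (the analytic optimum is f′(x) = 2x + 5 = 0, i.e. x* = −2.5; plotting `history` and the learning-rate experiments are left to you):

```python
# Gradient descent on f(x) = x² + 5x + 6, with f'(x) = 2x + 5
x = 10.0
alpha = 0.1
history = [x]
for _ in range(100):
    x = x - alpha * (2 * x + 5)
    history.append(x)
print(x)   # converges toward -2.5
```

Try `alpha = 1.1` to see divergence: the update factor |1 − 2α| exceeds 1.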
Exercise 7.2: Momentum 🟡
For function f(x, y) = x² + 10y²:
Tasks:
Implement vanilla gradient descent
Implement gradient descent with momentum (β = 0.9)
Start from (10, 10), use α = 0.01
Plot trajectories of both methods
Compare convergence speed (iterations to reach tolerance)
# Your code here
# Momentum: v = β*v + (1-β)*gradient; x = x - α*v
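A sketch of both loops side by side, using the EMA-style momentum update from the hint (trajectory plots and the iterations-to-tolerance comparison are left to you):

```python
import numpy as np

def grad(p):
    # Gradient of f(x, y) = x² + 10y²
    return np.array([2 * p[0], 20 * p[1]])

alpha, beta = 0.01, 0.9
p_gd = np.array([10.0, 10.0])   # vanilla gradient descent iterate
p_mo = np.array([10.0, 10.0])   # momentum iterate
v = np.zeros(2)
for _ in range(2000):
    p_gd = p_gd - alpha * grad(p_gd)
    v = beta * v + (1 - beta) * grad(p_mo)   # EMA form, as in the hint
    p_mo = p_mo - alpha * v
print(np.linalg.norm(p_gd), np.linalg.norm(p_mo))
```

Track the iterates in lists if you want to plot the two trajectories over the contours of f.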
Exercise 7.3: Newton’s Method 🔴
Minimize f(x) = x⁴ - 3x³ + 2 using Newton’s method.
Tasks:
Implement Newton’s method (uses Hessian)
Compare with gradient descent
Visualize convergence (quadratic vs linear)
What happens with bad initialization? (Try x₀ = 0.5)
# Your code here
# Newton: x_new = x_old - H^(-1) * gradient
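A sketch of the 1-D Newton loop (here the "Hessian" is just f″; I start from x₀ = 3, my own choice, which sits in the basin of the minimum at x = 9/4):

```python
# Minimize f(x) = x⁴ - 3x³ + 2 via Newton's method on f'
def f1(x):
    return 4 * x**3 - 9 * x**2    # f'(x)

def f2(x):
    return 12 * x**2 - 18 * x     # f''(x)

x = 3.0
for _ in range(50):
    x = x - f1(x) / f2(x)         # Newton step: x - f'(x) / f''(x)
print(x)   # → 2.25
```

From x₀ = 0.5, f″ is negative and the iterates drift to the degenerate critical point at x = 0 instead of the minimum: Newton's method finds stationary points of any kind, not minima specifically.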
Exercise 7.4: Constrained Optimization 🔴
Minimize f(x, y) = (x - 3)² + (y - 2)²
Subject to: x + y = 5
Tasks:
Solve using Lagrange multipliers (analytical)
Verify solution satisfies constraint
Visualize: plot objective contours, constraint line, and solution
Use projected gradient descent as alternative method
# Your code here
# Lagrangian: L(x, y, λ) = f(x,y) + λ(x + y - 5)
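The stationarity conditions of the Lagrangian form a linear system, so one way to check the analytic work is to solve it directly (plotting and projected gradient descent are left to you):

```python
import numpy as np

# Stationarity of L: 2(x - 3) + λ = 0, 2(y - 2) + λ = 0, plus x + y = 5
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([6.0, 4.0, 5.0])
x, y, lam = np.linalg.solve(A, rhs)
print(x, y, lam)   # → 3.0 2.0 0.0
```

λ = 0 here because the unconstrained minimum (3, 2) already satisfies x + y = 5, so the constraint is not actually binding.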
🎯 Bonus Challenge: Integration Exercise 🔴🔴
Ridge Regression with Gradient Descent
Combine concepts from multiple chapters to implement ridge regression:
Given:
Generate synthetic data: y = 3x₁ + 2x₂ + noise
Use 100 samples with 2 features
Tasks:
Linear Algebra: Set up normal equations with regularization
Calculus: Derive gradient of ridge objective J(w) = ||Xw - y||² + λ||w||²
Optimization: Implement gradient descent to minimize J(w)
Probability: Add noise to data, analyze effect on estimates
Matrix Decompositions: Solve using SVD as alternative
Comparison: Compare gradient descent vs closed-form solution
This exercise tests your understanding of how all concepts connect in real ML!
# Your code here - Good luck! 🚀
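A possible skeleton covering three of the solution routes (closed form, gradient descent, SVD); the noise level, λ, learning rate, and iteration count are my own choices, and the noise-sensitivity analysis is left to you:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
w_true = np.array([3.0, 2.0])
y = X @ w_true + 0.1 * rng.normal(size=100)
lam = 0.1

# Closed form (regularized normal equations): w = (XᵀX + λI)⁻¹ Xᵀy
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Gradient descent on J(w) = ||Xw - y||² + λ||w||², with ∇J = 2Xᵀ(Xw - y) + 2λw
w = np.zeros(2)
alpha = 1e-3
for _ in range(5000):
    w = w - alpha * (2 * X.T @ (X @ w - y) + 2 * lam * w)

# SVD route: with X = UΣVᵀ, the ridge solution is w = V diag(s / (s² + λ)) Uᵀ y
U, s, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))

print(w_closed, w, w_svd)   # all three should agree, near [3, 2]
```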
📊 Self-Assessment
After completing these exercises:
Check your understanding:
✅ Can you explain why each method works, not just how?
✅ Can you identify when to use each technique in ML?
✅ Can you combine concepts from different chapters?
Next Steps:
Review solutions in mml_solutions_part1.ipynb
Redo exercises where you struggled
Move to mml_exercises_part2.ipynb (ML Applications)
Apply these concepts in real ML projects!
Remember: Understanding the math gives you superpowers in machine learning! 🦸
Debug models when they fail
Design better architectures
Understand why things work
Read research papers with confidence