# Setup: Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import linalg
from scipy.stats import multivariate_normal

# Set style
sns.set_style('whitegrid')
np.random.seed(42)

print("✅ Libraries loaded successfully!")

Chapter 2: Linear Algebra

Exercise 2.1: Gaussian Elimination 🟢

Solve the following system of linear equations using Gaussian elimination:

2x + 3y - z = 5
4x + 4y - 3z = 3
-2x + 3y - z = 1

Tasks:

  1. Convert to augmented matrix form

  2. Apply row operations to get row echelon form

  3. Back-substitute to find x, y, z

  4. Verify your solution

# Your code here
# Hint: Create augmented matrix [A | b]
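If you get stuck, here is one possible sketch (partial pivoting added for numerical safety; `np.linalg.solve` would confirm the answer):

```python
import numpy as np

# Augmented matrix [A | b] for the system above
M = np.array([[ 2., 3., -1., 5.],
              [ 4., 4., -3., 3.],
              [-2., 3., -1., 1.]])

n = 3
# Forward elimination with partial pivoting
for col in range(n):
    pivot = np.argmax(np.abs(M[col:, col])) + col
    M[[col, pivot]] = M[[pivot, col]]                  # swap in the largest pivot
    for row in range(col + 1, n):
        M[row] -= (M[row, col] / M[col, col]) * M[col]

# Back-substitution
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (M[i, -1] - M[i, i + 1:n] @ x[i + 1:]) / M[i, i]
```

Verify by plugging `x` back into the original system.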

Exercise 2.2: Linear Independence 🟡

Determine whether the following vectors are linearly independent:

v1 = [1, 2, 3]
v2 = [2, 4, 6]
v3 = [1, 1, 1]

Method: Check if the matrix formed by these vectors has full rank.

# Your code here
# Hint: Use np.linalg.matrix_rank()
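A minimal sketch of the rank check (note v2 = 2·v1, so dependence is expected):

```python
import numpy as np

V = np.array([[1, 2, 3],
              [2, 4, 6],
              [1, 1, 1]])   # rows are v1, v2, v3

rank = np.linalg.matrix_rank(V)
independent = (rank == V.shape[0])   # full rank <=> linearly independent
```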

Exercise 2.3: Basis and Dimension 🟡

Given the following vectors in ℝ³:

v1 = [1, 0, 1]
v2 = [0, 1, 1]
v3 = [1, 1, 0]

Tasks:

  1. Verify they form a basis for ℝ³

  2. Express the vector [2, 3, 1] in this basis

  3. Convert back to standard basis to verify

# Your code here
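One way to approach this: put the basis vectors in the columns of a matrix and solve a linear system for the coordinates (a sketch, not the only route):

```python
import numpy as np

B = np.column_stack([[1, 0, 1], [0, 1, 1], [1, 1, 0]])  # columns are v1, v2, v3
assert not np.isclose(np.linalg.det(B), 0)  # nonzero det => the vectors form a basis

target = np.array([2.0, 3.0, 1.0])
coords = np.linalg.solve(B, target)   # coordinates of target in the basis {v1, v2, v3}
back = B @ coords                     # convert back to the standard basis
```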

Exercise 2.4: Linear Transformations 🟡

Consider the transformation T: ℝ² → ℝ² that:

  • Scales x-direction by 2

  • Rotates 45° counterclockwise

Tasks:

  1. Construct the transformation matrix (combine scaling and rotation)

  2. Apply to the unit square vertices: (0,0), (1,0), (1,1), (0,1)

  3. Plot both original and transformed squares

# Your code here
# Hint: Rotation matrix is [[cos θ, -sin θ], [sin θ, cos θ]]
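A sketch of the matrix construction, assuming the scale is applied first and the rotation second (the exercise leaves the order open, and the two orders give different results); plotting is left to you:

```python
import numpy as np

theta = np.pi / 4
S = np.array([[2.0, 0.0],
              [0.0, 1.0]])                         # scale x-direction by 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # rotate 45° counterclockwise
T = R @ S                                          # scale first, then rotate

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]]).T  # columns are vertices
transformed = T @ square
```

As a sanity check, (1, 0) scales to (2, 0) and then rotates to (√2, √2).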

Chapter 3: Analytic Geometry

Exercise 3.1: Norms and Distances 🟢

Given vectors:

x = [3, 4]
y = [1, 2]

Calculate:

  1. L1 norm (Manhattan) of x

  2. L2 norm (Euclidean) of x

  3. L∞ norm (Maximum) of x

  4. Distance between x and y (L2)

  5. Cosine similarity between x and y

# Your code here
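All five quantities can be read off from `np.linalg.norm` and a dot product; one possible sketch:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([1.0, 2.0])

l1   = np.linalg.norm(x, 1)        # Manhattan: |3| + |4| = 7
l2   = np.linalg.norm(x)           # Euclidean: sqrt(9 + 16) = 5
linf = np.linalg.norm(x, np.inf)   # Maximum: max(3, 4) = 4
dist = np.linalg.norm(x - y)       # L2 distance
cos_sim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
```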

Exercise 3.2: Gram-Schmidt Orthogonalization 🟡

Apply the Gram-Schmidt process to orthogonalize:

v1 = [3, 1]
v2 = [2, 2]

Tasks:

  1. Compute orthogonal vectors u1, u2

  2. Normalize to get orthonormal vectors

  3. Verify orthogonality (dot product = 0)

  4. Verify unit length

# Your code here
# Formula: u1 = v1, u2 = v2 - proj_u1(v2)
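The two-vector case of the formula in the hint looks like this (a sketch; the full Gram-Schmidt loop generalizes it to n vectors):

```python
import numpy as np

v1 = np.array([3.0, 1.0])
v2 = np.array([2.0, 2.0])

u1 = v1
u2 = v2 - (v2 @ u1) / (u1 @ u1) * u1   # subtract the projection of v2 onto u1

e1 = u1 / np.linalg.norm(u1)           # normalize to unit length
e2 = u2 / np.linalg.norm(u2)
```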

Exercise 3.3: Projection and Distance 🟡

Given a point p = [5, 5] and a line defined by vector v = [1, 2] passing through origin:

Tasks:

  1. Project p onto the line (find closest point on line)

  2. Calculate distance from p to the line

  3. Visualize: plot p, the line, projection point, and perpendicular distance

# Your code here
# Projection formula: proj_v(p) = (p·v / v·v) * v
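The projection formula in the hint translates almost directly; visualization is left as the exercise asks:

```python
import numpy as np

p = np.array([5.0, 5.0])
v = np.array([1.0, 2.0])

proj = (p @ v) / (v @ v) * v       # closest point on the line through the origin
dist = np.linalg.norm(p - proj)    # perpendicular distance from p to the line
```

The residual `p - proj` should be orthogonal to `v`, which is a useful check.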

Exercise 3.4: Angles and Orthogonality 🟢

For vectors:

a = [1, 1, 1]
b = [1, -1, 0]
c = [2, 0, -2]

Calculate:

  1. Angle between a and b (in degrees)

  2. Angle between b and c

  3. Are any pairs orthogonal?

  4. Find a vector orthogonal to both a and b (use cross product)

# Your code here
# Angle formula: cos(θ) = (a·b) / (||a|| ||b||)
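A sketch using the angle formula from the hint (the `np.clip` guards against floating-point values just outside [−1, 1]):

```python
import numpy as np

a = np.array([1.0, 1.0, 1.0])
b = np.array([1.0, -1.0, 0.0])
c = np.array([2.0, 0.0, -2.0])

def angle_deg(u, w):
    cos_t = u @ w / (np.linalg.norm(u) * np.linalg.norm(w))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

ang_ab = angle_deg(a, b)   # a·b = 0, so a and b are orthogonal (90°)
ang_bc = angle_deg(b, c)
perp = np.cross(a, b)      # orthogonal to both a and b
```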

Chapter 4: Matrix Decompositions

Exercise 4.1: Eigenvalues and Eigenvectors 🟢

For matrix:

A = [[4, 1],
     [2, 3]]

Tasks:

  1. Compute eigenvalues

  2. Compute eigenvectors

  3. Verify: Av = λv for each eigenvalue/eigenvector pair

  4. Visualize the transformation effect on eigenvectors

# Your code here
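A sketch of the computation and verification steps (visualization omitted):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are the eigenvectors

# Verify A v = λ v for each pair
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```

For this matrix the characteristic polynomial is λ² − 7λ + 10, so the eigenvalues are 2 and 5.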

Exercise 4.2: Eigendecomposition 🟡

Given symmetric matrix:

S = [[5, 2],
     [2, 5]]

Tasks:

  1. Decompose S = PDP^T (eigendecomposition)

  2. Verify decomposition by reconstructing S

  3. Compute S² using the eigendecomposition (for higher powers Sⁿ, this is cheaper than repeated multiplication because D is diagonal)

  4. Compute matrix square root √S

# Your code here
# S^2 = P D^2 P^T (eigenvalue exponentiation)
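For a symmetric matrix, `np.linalg.eigh` returns orthonormal eigenvectors, which makes the hint's formula a one-liner per task; a possible sketch:

```python
import numpy as np

S = np.array([[5.0, 2.0],
              [2.0, 5.0]])
lam, P = np.linalg.eigh(S)     # symmetric => real eigenvalues, orthonormal P

S_rebuilt = P @ np.diag(lam) @ P.T               # S = P D P^T
S_squared = P @ np.diag(lam**2) @ P.T            # same as S @ S
S_sqrt    = P @ np.diag(np.sqrt(lam)) @ P.T      # valid since all λ > 0
```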

Exercise 4.3: Singular Value Decomposition 🟡

Apply SVD to:

M = [[3, 2, 2],
     [2, 3, -2]]

Tasks:

  1. Compute SVD: M = UΣV^T

  2. Verify reconstruction

  3. What is the rank of M?

  4. Compute low-rank approximation (rank-1) and calculate reconstruction error

# Your code here
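A minimal sketch covering all four tasks; note that for the best rank-1 approximation, the Frobenius reconstruction error equals the dropped singular value:

```python
import numpy as np

M = np.array([[3.0, 2.0,  2.0],
              [2.0, 3.0, -2.0]])
U, s, Vt = np.linalg.svd(M, full_matrices=False)

M_rebuilt = U @ np.diag(s) @ Vt
rank = int(np.sum(s > 1e-10))                 # number of nonzero singular values

M_rank1 = s[0] * np.outer(U[:, 0], Vt[0])     # best rank-1 approximation
err = np.linalg.norm(M - M_rank1)             # Frobenius error = s[1]
```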

Exercise 4.4: Cholesky Decomposition 🔴

For positive definite matrix:

A = [[4, 2],
     [2, 3]]

Tasks:

  1. Verify A is positive definite (check eigenvalues > 0)

  2. Compute Cholesky decomposition A = LL^T

  3. Solve Ax = b for b = [1, 2] using forward and backward substitution

  4. Compare efficiency with direct solve

# Your code here
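A sketch of tasks 1–3 using `scipy.linalg.solve_triangular` for the two substitution passes (timing comparison left to you):

```python
import numpy as np
from scipy import linalg

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

assert np.all(np.linalg.eigvalsh(A) > 0)   # all eigenvalues positive => PD

L = np.linalg.cholesky(A)                  # A = L @ L.T with L lower triangular

# Ax = b  becomes  L y = b (forward), then  L^T x = y (backward)
y = linalg.solve_triangular(L, b, lower=True)
x = linalg.solve_triangular(L.T, y, lower=False)
```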

Chapter 5: Vector Calculus

Exercise 5.1: Gradients 🟢

For functions:

  1. f(x, y) = x² + y²

  2. g(x, y) = x²y + xy²

Tasks:

  1. Compute gradients analytically (by hand, write formulas)

  2. Implement numerical gradient computation

  3. Evaluate at point (2, 3)

  4. Visualize gradient field with quiver plot

# Your code here
# Numerical gradient: (f(x+h) - f(x-h)) / (2h)
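The central-difference hint generalizes to a small helper; `num_grad` is just an illustrative name (quiver plot omitted):

```python
import numpy as np

def num_grad(f, point, h=1e-5):
    """Central-difference gradient, one coordinate at a time."""
    point = np.asarray(point, dtype=float)
    g = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h
        g[i] = (f(point + step) - f(point - step)) / (2 * h)
    return g

f = lambda p: p[0]**2 + p[1]**2                  # ∇f = (2x, 2y)
g = lambda p: p[0]**2 * p[1] + p[0] * p[1]**2    # ∇g = (2xy + y², x² + 2xy)

grad_f = num_grad(f, [2.0, 3.0])   # analytic answer: (4, 6)
grad_g = num_grad(g, [2.0, 3.0])   # analytic answer: (21, 16)
```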

Exercise 5.2: Jacobian Matrix 🟡

For vector function F: ℝ² → ℝ²:

F([x, y]) = [x² - y², 2xy]

Tasks:

  1. Compute Jacobian matrix J analytically

  2. Evaluate J at point (1, 1)

  3. Implement numerical Jacobian

  4. Verify your analytical result matches numerical approximation

# Your code here
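A sketch of the numerical Jacobian (each column is a central-difference derivative with respect to one input), checked against the hand-derived matrix:

```python
import numpy as np

def F(p):
    x, y = p
    return np.array([x**2 - y**2, 2 * x * y])

def num_jacobian(F, point, h=1e-5):
    point = np.asarray(point, dtype=float)
    cols = []
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h
        cols.append((F(point + step) - F(point - step)) / (2 * h))
    return np.column_stack(cols)

J_analytic = np.array([[2.0, -2.0],   # [[2x, -2y],
                       [2.0,  2.0]])  #  [2y,  2x]]  evaluated at (1, 1)
J_numeric = num_jacobian(F, [1.0, 1.0])
```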

Exercise 5.3: Chain Rule and Backpropagation 🟡

Consider a simple computational graph:

x → a = x² → b = a + 1 → c = b³ → output

Tasks:

  1. Compute forward pass for x = 2

  2. Compute ∂c/∂x using chain rule analytically

  3. Implement backward pass (backpropagation)

  4. Verify numerical gradient

# Your code here
# Chain rule: dc/dx = (dc/db) * (db/da) * (da/dx)
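The whole graph is small enough to work by hand; one possible sketch of forward pass, backward pass, and numerical check:

```python
x = 2.0
a = x**2            # forward pass: a = 4
b = a + 1           #               b = 5
c = b**3            #               c = 125

dc_db = 3 * b**2    # backward pass: one local derivative per node
db_da = 1.0
da_dx = 2 * x
dc_dx = dc_db * db_da * da_dx    # 3·25 · 1 · 4 = 300

# numerical check with a central difference
h = 1e-6
dc_dx_num = (((x + h)**2 + 1)**3 - ((x - h)**2 + 1)**3) / (2 * h)
```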

Exercise 5.4: Hessian Matrix 🔴

For function f(x, y) = x³ + y³ - 3xy:

Tasks:

  1. Compute Hessian matrix (matrix of second derivatives)

  2. Find critical points (where gradient = 0)

  3. Classify critical points using Hessian (minimum/maximum/saddle)

  4. Visualize function with contours and mark critical points

# Your code here
# Hessian: H = [[∂²f/∂x², ∂²f/∂x∂y], [∂²f/∂y∂x, ∂²f/∂y²]]
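Setting the gradient (3x² − 3y, 3y² − 3x) to zero gives y = x² and x = y², hence x⁴ = x, so the critical points are (0, 0) and (1, 1). A sketch of the Hessian-based classification (contour plot omitted):

```python
import numpy as np

def hessian(x, y):
    # H for f = x³ + y³ − 3xy
    return np.array([[6 * x, -3.0],
                     [-3.0, 6 * y]])

def classify(x, y):
    eig = np.linalg.eigvalsh(hessian(x, y))
    if np.all(eig > 0):
        return "minimum"
    if np.all(eig < 0):
        return "maximum"
    return "saddle"

kinds = {(0, 0): classify(0, 0), (1, 1): classify(1, 1)}
```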

Chapter 6: Probability and Distributions

Exercise 6.1: Probability Basics 🟢

A bag contains 3 red, 4 blue, and 5 green marbles.

Calculate:

  1. P(drawing a blue marble)

  2. P(drawing a red OR green marble)

  3. P(drawing 2 blue marbles without replacement)

  4. Simulate 10,000 draws and verify your probabilities

# Your code here
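One way to set up the simulation: draw two marbles without replacement each trial, so the same run answers both the single-draw and two-blue questions (the exact answers are 1/3, 2/3, and 4/12 · 3/11 = 1/11):

```python
import numpy as np

rng = np.random.default_rng(0)
bag = np.array(["red"] * 3 + ["blue"] * 4 + ["green"] * 5)

p_blue = 4 / 12
p_red_or_green = 8 / 12
p_two_blue = (4 / 12) * (3 / 11)   # without replacement

n = 10_000
draws = np.array([rng.choice(bag, size=2, replace=False) for _ in range(n)])
emp_blue = np.mean(draws[:, 0] == "blue")
emp_two_blue = np.mean((draws[:, 0] == "blue") & (draws[:, 1] == "blue"))
```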

Exercise 6.2: Bayes’ Theorem 🟡

Medical test scenario:

  • Disease prevalence: 1% of population

  • Test sensitivity (true positive rate): 95%

  • Test specificity (true negative rate): 90%

Tasks:

  1. If someone tests positive, what’s the probability they have the disease?

  2. If someone tests negative, what’s the probability they’re healthy?

  3. Implement a function to compute posterior probabilities

  4. Visualize how posterior changes with different prior probabilities

# Your code here
# P(Disease|Positive) = P(Positive|Disease) * P(Disease) / P(Positive)
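A sketch of task 3; the function names are just illustrative. With these numbers the posterior after a positive test is only about 8.8%, because the disease is rare:

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_pos

def posterior_negative_healthy(prior, sensitivity, specificity):
    """P(healthy | negative test)."""
    p_neg = (1 - sensitivity) * prior + specificity * (1 - prior)
    return specificity * (1 - prior) / p_neg

p_disease_given_pos = posterior_positive(0.01, 0.95, 0.90)          # ≈ 0.088
p_healthy_given_neg = posterior_negative_healthy(0.01, 0.95, 0.90)  # ≈ 0.999
```

Sweeping `prior` over a range and plotting `posterior_positive` covers task 4.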

Exercise 6.3: Gaussian Distribution 🟡

Given: μ = 10, σ = 2

Tasks:

  1. Generate 1000 samples from N(10, 4)

  2. Plot histogram and overlay theoretical PDF

  3. Calculate P(8 < X < 12) analytically and empirically

  4. Find the value x such that P(X < x) = 0.95

  5. Compute sample mean and variance, compare to theoretical

# Your code here
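A sketch of the non-plotting tasks using `scipy.stats.norm` for the analytic pieces (P(8 < X < 12) is the classic one-sigma interval, ≈ 0.683):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 10, 2
rng = np.random.default_rng(42)
samples = rng.normal(mu, sigma, size=1000)

p_mid = norm.cdf(12, mu, sigma) - norm.cdf(8, mu, sigma)  # analytic P(8 < X < 12)
p_mid_emp = np.mean((samples > 8) & (samples < 12))       # empirical estimate
x95 = norm.ppf(0.95, mu, sigma)                           # P(X < x95) = 0.95
```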

Exercise 6.4: Multivariate Gaussian 🔴

Create a 2D Gaussian with:

μ = [0, 0]
Σ = [[2, 1],
     [1, 2]]

Tasks:

  1. Generate 500 samples

  2. Plot samples with covariance ellipse

  3. Compute correlation coefficient

  4. Apply transformation to make variables independent (whitening)

  5. Verify independence (compute covariance of transformed data)

# Your code here
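A sketch of sampling and whitening via the eigendecomposition of the sample covariance (the ellipse plot is left to you; building W from the *sample* covariance makes the whitened covariance exactly the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

X = rng.multivariate_normal(mu, Sigma, size=500)
corr = np.corrcoef(X.T)[0, 1]            # theoretical value: 1/2

# Whitening: W = C^(-1/2) for the sample covariance C
lam, P = np.linalg.eigh(np.cov(X.T))
W = P @ np.diag(lam**-0.5) @ P.T
Z = (X - X.mean(axis=0)) @ W             # W is symmetric, so W.T = W

cov_Z = np.cov(Z.T)                      # should be ≈ identity
```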

Chapter 7: Continuous Optimization

Exercise 7.1: Gradient Descent 🟢

Minimize f(x) = x² + 5x + 6 using gradient descent.

Tasks:

  1. Implement gradient descent from scratch

  2. Start from x₀ = 10, use learning rate α = 0.1

  3. Track convergence (plot x vs iteration)

  4. Find optimal x analytically and compare

  5. Experiment with different learning rates (too small, too large)

# Your code here
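A minimal sketch of tasks 1–4; the analytic minimizer comes from f′(x) = 2x + 5 = 0, i.e. x* = −2.5 (learning-rate experiments and plotting left to you):

```python
f = lambda x: x**2 + 5 * x + 6
grad = lambda x: 2 * x + 5

x, alpha = 10.0, 0.1
history = [x]
for _ in range(100):
    x -= alpha * grad(x)     # gradient descent step
    history.append(x)

x_star = -5 / 2              # analytic answer for comparison
```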

Exercise 7.2: Momentum 🟡

For function f(x, y) = x² + 10y²:

Tasks:

  1. Implement vanilla gradient descent

  2. Implement gradient descent with momentum (β = 0.9)

  3. Start from (10, 10), use α = 0.01

  4. Plot trajectories of both methods

  5. Compare convergence speed (iterations to reach tolerance)

# Your code here
# Momentum: v = β*v + (1-β)*gradient; x = x - α*v
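A sketch of both methods using exactly the update in the hint (an exponential-moving-average form of momentum; iteration counts are illustrative, and trajectory plotting is left to you):

```python
import numpy as np

grad = lambda p: np.array([2 * p[0], 20 * p[1]])   # ∇f for f = x² + 10y²

def vanilla(steps=3000, alpha=0.01):
    p = np.array([10.0, 10.0])
    for _ in range(steps):
        p = p - alpha * grad(p)
    return p

def momentum(steps=3000, alpha=0.01, beta=0.9):
    p, v = np.array([10.0, 10.0]), np.zeros(2)
    for _ in range(steps):
        v = beta * v + (1 - beta) * grad(p)   # velocity: EMA of gradients
        p = p - alpha * v
    return p

p_vanilla, p_momentum = vanilla(), momentum()   # both should approach (0, 0)
```

To compare speed, record the iteration at which ‖p‖ first drops below a tolerance for each method.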

Exercise 7.3: Newton’s Method 🔴

Minimize f(x) = x⁴ - 3x³ + 2 using Newton’s method.

Tasks:

  1. Implement Newton’s method (uses Hessian)

  2. Compare with gradient descent

  3. Visualize convergence (quadratic vs linear)

  4. What happens with bad initialization? (Try x₀ = 0.5)

# Your code here
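A sketch of the Newton iteration. From x₀ = 0.5 the curvature f″(0.5) = −6 is negative, so Newton heads toward the degenerate stationary point at x = 0 rather than the minimizer x = 9/4:

```python
f   = lambda x: x**4 - 3 * x**3 + 2
df  = lambda x: 4 * x**3 - 9 * x**2
d2f = lambda x: 12 * x**2 - 18 * x

x = 3.0                        # well-behaved start: positive curvature
for _ in range(50):
    x = x - df(x) / d2f(x)     # Newton step: x − f'(x)/f''(x)
# x converges to the minimiser 9/4 = 2.25

x_bad = 0.5                    # negative curvature: the step is not a descent step
x_bad = x_bad - df(x_bad) / d2f(x_bad)
```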
# Newton: x_new = x_old - H^(-1) * gradient

Exercise 7.4: Constrained Optimization 🔴

Minimize f(x, y) = (x - 3)² + (y - 2)²
Subject to: x + y = 5

Tasks:

  1. Solve using Lagrange multipliers (analytical)

  2. Verify solution satisfies constraint

  3. Visualize: plot objective contours, constraint line, and solution

  4. Use projected gradient descent as alternative method

# Your code here
# Lagrangian: L(x, y, λ) = f(x,y) + λ(x + y - 5)
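Solving the Lagrange conditions 2(x − 3) + λ = 0, 2(y − 2) + λ = 0, x + y = 5 gives (x, y) = (3, 2) with λ = 0: the unconstrained minimum already satisfies the constraint. A sketch of task 4, projected gradient descent, which should land on the same point:

```python
import numpy as np

grad = lambda p: 2 * (p - np.array([3.0, 2.0]))   # gradient of the objective

def project(p):
    """Orthogonal projection onto the line x + y = 5."""
    n = np.array([1.0, 1.0])
    return p - (p @ n - 5) / (n @ n) * n

p = project(np.array([0.0, 0.0]))   # start from a feasible point
for _ in range(200):
    p = project(p - 0.1 * grad(p))  # gradient step, then project back
```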

🎯 Bonus Challenge: Integration Exercise 🔴🔴

Ridge Regression with Gradient Descent

Combine concepts from multiple chapters to implement ridge regression:

Given:

  • Generate synthetic data: y = 3x₁ + 2x₂ + noise

  • Use 100 samples with 2 features

Tasks:

  1. Linear Algebra: Set up normal equations with regularization

  2. Calculus: Derive gradient of ridge objective J(w) = ||Xw - y||² + λ||w||²

  3. Optimization: Implement gradient descent to minimize J(w)

  4. Probability: Add noise to data, analyze effect on estimates

  5. Matrix Decompositions: Solve using SVD as alternative

  6. Comparison: Compare gradient descent vs closed-form solution

This exercise tests your understanding of how all concepts connect in real ML!

# Your code here - Good luck! 🚀
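If you want something to check your implementation against, here is a compact sketch of three of the routes (closed form, gradient descent, SVD); the noise level, λ, learning rate, and iteration count are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
n, lam_reg = 100, 1.0

X = rng.normal(size=(n, 2))
w_true = np.array([3.0, 2.0])
y = X @ w_true + 0.5 * rng.normal(size=n)          # y = 3x₁ + 2x₂ + noise

# Closed form (normal equations with regularization): w = (XᵀX + λI)⁻¹ Xᵀy
w_closed = np.linalg.solve(X.T @ X + lam_reg * np.eye(2), X.T @ y)

# Gradient descent on J(w) = ‖Xw − y‖² + λ‖w‖²;  ∇J = 2Xᵀ(Xw − y) + 2λw
w = np.zeros(2)
alpha = 1e-3
for _ in range(5000):
    w -= alpha * (2 * X.T @ (X @ w - y) + 2 * lam_reg * w)

# SVD route: X = UΣVᵀ  =>  w = V diag(σᵢ / (σᵢ² + λ)) Uᵀ y
U, s, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ (s / (s**2 + lam_reg) * (U.T @ y))
```

All three should agree with each other and sit close to the true weights (3, 2), with a small shrinkage bias from the ridge penalty.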

📊 Self-Assessment

After completing these exercises:

Check your understanding:

  • ✅ Can you explain why each method works, not just how?

  • ✅ Can you identify when to use each technique in ML?

  • ✅ Can you combine concepts from different chapters?

Next Steps:

  1. Review solutions in mml_solutions_part1.ipynb

  2. Redo exercises where you struggled

  3. Move to mml_exercises_part2.ipynb (ML Applications)

  4. Apply these concepts in real ML projects!

Remember: Understanding the math gives you superpowers in machine learning! 🦸

  • Debug models when they fail

  • Design better architectures

  • Understand why things work

  • Read research papers with confidence