# Setup: Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import linalg
from scipy.stats import multivariate_normal
# Set style
sns.set_style('whitegrid')
np.random.seed(42)
print("✅ Libraries loaded successfully!")
Chapter 2: Linear Algebra
Exercise 2.1: Gaussian Elimination 🟢
Solve the following system of linear equations using Gaussian elimination:
2x + 3y - z = 5
4x + 4y - 3z = 3
-2x + 3y - z = 1
Tasks:
Convert to augmented matrix form
Apply row operations to get row echelon form
Back-substitute to find x, y, z
Verify your solution
# Your code here
# Hint: Create augmented matrix [A | b]
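A possible starting sketch (one route among several; the elimination loop below assumes every pivot is nonzero, so it omits the partial pivoting real code would use):

```python
import numpy as np

# Augmented matrix [A | b] for the system
M = np.array([[ 2., 3., -1.,  5.],
              [ 4., 4., -3.,  3.],
              [-2., 3., -1.,  1.]])
n = 3

# Forward elimination to row echelon form (no pivoting: assumes M[i, i] != 0)
for i in range(n):
    M[i] = M[i] / M[i, i]              # normalize pivot row
    for j in range(i + 1, n):
        M[j] = M[j] - M[j, i] * M[i]   # eliminate below the pivot

# Back-substitution
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = M[i, -1] - M[i, i+1:n] @ x[i+1:n]

print(x)  # → [1. 2. 3.]
```

Verifying with `np.linalg.solve` (or multiplying `A @ x`) is a good habit after any hand-rolled solver.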
Exercise 2.2: Linear Independence 🟡
Determine whether the following vectors are linearly independent:
v1 = [1, 2, 3]
v2 = [2, 4, 6]
v3 = [1, 1, 1]
Method: Check if the matrix formed by these vectors has full rank.
# Your code here
# Hint: Use np.linalg.matrix_rank()
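A minimal sketch of the rank check (note v2 = 2·v1, so full rank should fail):

```python
import numpy as np

V = np.array([[1, 2, 3],    # v1
              [2, 4, 6],    # v2 = 2 * v1
              [1, 1, 1]])   # v3
rank = np.linalg.matrix_rank(V)
independent = rank == V.shape[0]   # independent iff rank equals the number of vectors
print(rank, independent)           # → 2 False
```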
Exercise 2.3: Basis and Dimension 🟡
Given the following vectors in ℝ³:
v1 = [1, 0, 1]
v2 = [0, 1, 1]
v3 = [1, 1, 0]
Tasks:
Verify they form a basis for ℝ³
Express the vector [2, 3, 1] in this basis
Convert back to standard basis to verify
# Your code here
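One way to approach this: stack the basis vectors as columns, check invertibility, then solve a linear system for the coordinates (a sketch; variable names are my own):

```python
import numpy as np

v1, v2, v3 = np.array([1, 0, 1]), np.array([0, 1, 1]), np.array([1, 1, 0])
B = np.column_stack([v1, v2, v3])   # basis vectors as columns

# Three vectors in R^3 form a basis iff B is invertible (nonzero determinant)
assert abs(np.linalg.det(B)) > 1e-12

# Coordinates c of w in this basis satisfy B @ c = w
w = np.array([2, 3, 1])
c = np.linalg.solve(B, w)
w_back = B @ c                      # back to the standard basis
print(c, w_back)                    # → [0. 1. 2.] [2. 3. 1.]
```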
Exercise 2.4: Linear Transformations 🟡
Consider the transformation T: ℝ² → ℝ² that:
Scales x-direction by 2
Rotates 45° counterclockwise
Tasks:
Construct the transformation matrix (combine scaling and rotation)
Apply to the unit square vertices: (0,0), (1,0), (1,1), (0,1)
Plot both original and transformed squares
# Your code here
# Hint: Rotation matrix is [[cos θ, -sin θ], [sin θ, cos θ]]
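A sketch of the matrix construction and application (plotting omitted; I read the task as scale first, then rotate, which makes the combined matrix T = R @ S):

```python
import numpy as np

theta = np.deg2rad(45)
S = np.array([[2.0, 0.0],
              [0.0, 1.0]])                       # scale x-direction by 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotate 45° counterclockwise
T = R @ S                                        # scale, then rotate

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]]).T  # vertices as columns
transformed = T @ square
print(transformed)   # e.g. (1, 0) maps to (√2, √2)
```

Plot `square` and `transformed` with `plt.plot` on the same axes to see the effect.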
Chapter 3: Analytic Geometry
Exercise 3.1: Norms and Distances 🟢
Given vectors:
x = [3, 4]
y = [1, 2]
Calculate:
L1 norm (Manhattan) of x
L2 norm (Euclidean) of x
L∞ norm (Maximum) of x
Distance between x and y (L2)
Cosine similarity between x and y
# Your code here
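A compact sketch using `np.linalg.norm` with different `ord` values:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([1.0, 2.0])

l1   = np.linalg.norm(x, 1)        # Manhattan: 7
l2   = np.linalg.norm(x)           # Euclidean: 5
linf = np.linalg.norm(x, np.inf)   # Maximum: 4
dist = np.linalg.norm(x - y)       # L2 distance: 2√2
cos_sim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(l1, l2, linf, dist, cos_sim)
```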
Exercise 3.2: Gram-Schmidt Orthogonalization 🟡
Apply Gram-Schmidt process to orthogonalize:
v1 = [3, 1]
v2 = [2, 2]
Tasks:
Compute orthogonal vectors u1, u2
Normalize to get orthonormal vectors
Verify orthogonality (dot product = 0)
Verify unit length
# Your code here
# Formula: u1 = v1, u2 = v2 - proj_u1(v2)
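The hinted formula translates almost directly into code (a sketch for two vectors; a general version would loop over a list of vectors):

```python
import numpy as np

v1 = np.array([3.0, 1.0])
v2 = np.array([2.0, 2.0])

u1 = v1
u2 = v2 - (v2 @ u1) / (u1 @ u1) * u1   # subtract the projection of v2 onto u1

e1 = u1 / np.linalg.norm(u1)           # normalize to unit length
e2 = u2 / np.linalg.norm(u2)
print(u2, e1 @ e2)                     # u2 = [-0.4, 1.2]; dot product ≈ 0
```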
Exercise 3.3: Projection and Distance 🟡
Given a point p = [5, 5] and a line defined by vector v = [1, 2] passing through origin:
Tasks:
Project p onto the line (find closest point on line)
Calculate distance from p to the line
Visualize: plot p, the line, projection point, and perpendicular distance
# Your code here
# Projection formula: proj_v(p) = (p·v / v·v) * v
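The projection formula in the hint, as code (visualization left as part of the exercise):

```python
import numpy as np

p = np.array([5.0, 5.0])
v = np.array([1.0, 2.0])

proj = (p @ v) / (v @ v) * v       # closest point on the line through the origin along v
dist = np.linalg.norm(p - proj)    # perpendicular distance from p to the line
print(proj, dist)                  # → [3. 6.] and √5
```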
Exercise 3.4: Angles and Orthogonality 🟢
For vectors:
a = [1, 1, 1]
b = [1, -1, 0]
c = [2, 0, -2]
Calculate:
Angle between a and b (in degrees)
Angle between b and c
Are any pairs orthogonal?
Find a vector orthogonal to both a and b (use cross product)
# Your code here
# Angle formula: cos(θ) = (a·b) / (||a|| ||b||)
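A sketch using the hinted angle formula (the `clip` guards against arccos of values a hair outside [−1, 1] from rounding):

```python
import numpy as np

a = np.array([1.0,  1.0,  1.0])
b = np.array([1.0, -1.0,  0.0])
c = np.array([2.0,  0.0, -2.0])

def angle_deg(u, w):
    cos = u @ w / (np.linalg.norm(u) * np.linalg.norm(w))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

ab = angle_deg(a, b)   # 90° — a ⊥ b (and note a · c = 0, so a ⊥ c as well)
bc = angle_deg(b, c)   # 60°
n = np.cross(a, b)     # orthogonal to both a and b: [1, 1, -2]
print(ab, bc, n)
```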
Chapter 4: Matrix Decompositions
Exercise 4.1: Eigenvalues and Eigenvectors 🟢
For matrix:
A = [[4, 1],
[2, 3]]
Tasks:
Compute eigenvalues
Compute eigenvectors
Verify: Av = λv for each eigenvalue/eigenvector pair
Visualize the transformation effect on eigenvectors
# Your code here
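A sketch of the computation and verification step (visualization omitted; note `np.linalg.eig` returns eigenvectors as the *columns* of the second output):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)

# Verify A v = λ v for each eigenpair (columns of eigvecs)
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
print(eigvals)   # eigenvalues 5 and 2 (order may vary)
```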
Exercise 4.2: Eigendecomposition 🟡
Given symmetric matrix:
S = [[5, 2],
[2, 5]]
Tasks:
Decompose S = PDP^T (eigendecomposition)
Verify decomposition by reconstructing S
Compute S² using eigendecomposition (faster than direct multiplication)
Compute matrix square root √S
# Your code here
# S^2 = P D^2 P^T (eigenvalue exponentiation)
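A sketch using `np.linalg.eigh`, which is the right tool for symmetric matrices (it guarantees real eigenvalues and an orthogonal P):

```python
import numpy as np

S = np.array([[5.0, 2.0],
              [2.0, 5.0]])
w, P = np.linalg.eigh(S)                 # eigenvalues 3 and 7; P orthogonal

S_rec  = P @ np.diag(w) @ P.T            # reconstruct S = P D P^T
S_sq   = P @ np.diag(w**2) @ P.T         # S² by squaring eigenvalues
S_sqrt = P @ np.diag(np.sqrt(w)) @ P.T   # matrix square root (eigenvalues > 0 here)
print(w)
```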
Exercise 4.3: Singular Value Decomposition 🟡
Apply SVD to:
M = [[3, 2, 2],
[2, 3, -2]]
Tasks:
Compute SVD: M = UΣV^T
Verify reconstruction
What is the rank of M?
Compute low-rank approximation (rank-1) and calculate reconstruction error
# Your code here
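One possible sketch (using the thin SVD; the rank-1 reconstruction error in the Frobenius norm should equal the dropped singular value):

```python
import numpy as np

M = np.array([[3.0, 2.0,  2.0],
              [2.0, 3.0, -2.0]])
U, s, Vt = np.linalg.svd(M, full_matrices=False)

M_rec = U @ np.diag(s) @ Vt                  # exact reconstruction
rank  = int(np.sum(s > 1e-10))               # numerical rank
M1    = s[0] * np.outer(U[:, 0], Vt[0, :])   # best rank-1 approximation
err   = np.linalg.norm(M - M1)               # equals the discarded singular value
print(s, rank, err)                          # singular values 5 and 3; rank 2
```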
Exercise 4.4: Cholesky Decomposition 🔴
For positive definite matrix:
A = [[4, 2],
[2, 3]]
Tasks:
Verify A is positive definite (check eigenvalues > 0)
Compute Cholesky decomposition A = LL^T
Solve Ax = b for b = [1, 2] using forward and backward substitution
Compare efficiency with direct solve
# Your code here
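A sketch of the two triangular solves using SciPy (the efficiency comparison against `np.linalg.solve` is left to you; for a 2×2 matrix the difference will be in the noise, so try larger matrices):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
assert np.all(np.linalg.eigvalsh(A) > 0)   # positive definite check

L = cholesky(A, lower=True)                # A = L L^T
b = np.array([1.0, 2.0])

y = solve_triangular(L, b, lower=True)     # forward substitution:  L y = b
x = solve_triangular(L.T, y, lower=False)  # backward substitution: L^T x = y
print(x)                                   # → [-0.125  0.75]
```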
Chapter 5: Vector Calculus
Exercise 5.1: Gradients 🟢
For functions:
f(x, y) = x² + y²
g(x, y) = x²y + xy²
Tasks:
Compute gradients analytically (by hand, write formulas)
Implement numerical gradient computation
Evaluate at point (2, 3)
Visualize gradient field with quiver plot
# Your code here
# Numerical gradient: (f(x+h) - f(x-h)) / (2h)
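A sketch of a central-difference gradient, shown for f (g works the same way; the quiver plot is left to you):

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + y**2            # analytic gradient: (2x, 2y)

def num_grad(func, p, h=1e-5):
    # Central differences: (f(p + h e_i) - f(p - h e_i)) / (2h) per coordinate
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (func(p + e) - func(p - e)) / (2 * h)
    return g

g = num_grad(f, [2.0, 3.0])
print(g)   # → [4. 6.]
```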
Exercise 5.2: Jacobian Matrix 🟡
For vector function F: ℝ² → ℝ²:
F([x, y]) = [x² - y², 2xy]
Tasks:
Compute Jacobian matrix J analytically
Evaluate J at point (1, 1)
Implement numerical Jacobian
Verify your analytical result matches numerical approximation
# Your code here
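A sketch of a numerical Jacobian by central differences, column by column (the analytic Jacobian is J = [[2x, −2y], [2y, 2x]]):

```python
import numpy as np

def F(p):
    x, y = p
    return np.array([x**2 - y**2, 2 * x * y])

def num_jacobian(func, p, h=1e-6):
    # Column i of J holds the central difference along coordinate i
    p = np.asarray(p, dtype=float)
    m = func(p).size
    J = np.zeros((m, p.size))
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        J[:, i] = (func(p + e) - func(p - e)) / (2 * h)
    return J

J = num_jacobian(F, [1.0, 1.0])
print(J)   # analytic value at (1, 1): [[2, -2], [2, 2]]
```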
Exercise 5.3: Chain Rule and Backpropagation 🟡
Consider a simple computational graph:
x → a = x² → b = a + 1 → c = b³ → output
Tasks:
Compute forward pass for x = 2
Compute ∂c/∂x using chain rule analytically
Implement backward pass (backpropagation)
Verify numerical gradient
# Your code here
# Chain rule: dc/dx = (dc/db) * (db/da) * (da/dx)
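A sketch of the forward pass and the hand-rolled backward pass, with a finite-difference check:

```python
# Forward pass through x -> a = x² -> b = a + 1 -> c = b³
x = 2.0
a = x**2           # 4
b = a + 1          # 5
c = b**3           # 125

# Backward pass (local derivatives multiplied along the graph)
dc_db = 3 * b**2               # 75
db_da = 1.0
da_dx = 2 * x                  # 4
dc_dx = dc_db * db_da * da_dx  # 300

# Numerical check via central differences
def forward(x):
    return (x**2 + 1)**3
h = 1e-6
num = (forward(x + h) - forward(x - h)) / (2 * h)
print(c, dc_dx, num)   # → 125.0 300.0 and ≈ 300
```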
Exercise 5.4: Hessian Matrix 🔴
For function f(x, y) = x³ + y³ - 3xy:
Tasks:
Compute Hessian matrix (matrix of second derivatives)
Find critical points (where gradient = 0)
Classify critical points using Hessian (minimum/maximum/saddle)
Visualize function with contours and mark critical points
# Your code here
# Hessian: H = [[∂²f/∂x², ∂²f/∂x∂y], [∂²f/∂y∂x, ∂²f/∂y²]]
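A sketch of the classification step, after solving ∇f = (3x² − 3y, 3y² − 3x) = 0 by hand (substituting y = x² gives x = x⁴, so the critical points are (0, 0) and (1, 1); contour plotting is left to you):

```python
import numpy as np

def hessian(x, y):
    # H for f(x, y) = x³ + y³ - 3xy
    return np.array([[6 * x, -3.0],
                     [-3.0, 6 * y]])

for pt in [(0.0, 0.0), (1.0, 1.0)]:
    w = np.linalg.eigvalsh(hessian(*pt))   # sorted ascending
    kind = ('saddle' if w[0] < 0 < w[1]
            else 'minimum' if w[0] > 0
            else 'maximum')
    print(pt, w, kind)   # (0,0) is a saddle; (1,1) is a minimum
```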
Chapter 6: Probability and Distributions
Exercise 6.1: Probability Basics 🟢
A bag contains 3 red, 4 blue, and 5 green marbles.
Calculate:
P(drawing a blue marble)
P(drawing a red OR green marble)
P(drawing 2 blue marbles without replacement)
Simulate 10,000 draws and verify your probabilities
# Your code here
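A sketch of the analytic answers plus a simulation for the single-draw case (the seed and variable names are my own choices):

```python
import numpy as np

red, blue, green = 3, 4, 5
total = red + blue + green   # 12

p_blue = blue / total                                     # 1/3
p_red_or_green = (red + green) / total                    # 2/3
p_two_blue = (blue / total) * ((blue - 1) / (total - 1))  # 4/12 · 3/11 = 1/11

# Simulation check for P(blue) over 10,000 draws with replacement
rng = np.random.default_rng(0)
bag = np.array(['r'] * red + ['b'] * blue + ['g'] * green)
draws = rng.choice(bag, size=10_000)
print(p_blue, p_red_or_green, p_two_blue, np.mean(draws == 'b'))
```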
Exercise 6.2: Bayes’ Theorem 🟡
Medical test scenario:
Disease prevalence: 1% of population
Test sensitivity (true positive rate): 95%
Test specificity (true negative rate): 90%
Tasks:
If someone tests positive, what’s the probability they have the disease?
If someone tests negative, what’s the probability they’re healthy?
Implement a function to compute posterior probabilities
Visualize how posterior changes with different prior probabilities
# Your code here
# P(Disease|Positive) = P(Positive|Disease) * P(Disease) / P(Positive)
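A sketch of the two posterior computations (the prior-sensitivity plot is left to you; note how small the positive-test posterior is despite the 95% sensitivity):

```python
prior = 0.01   # P(Disease)
sens  = 0.95   # P(Positive | Disease)
spec  = 0.90   # P(Negative | Healthy)

# Total probability of a positive test: true positives + false positives
p_pos = sens * prior + (1 - spec) * (1 - prior)
p_disease_given_pos = sens * prior / p_pos        # ≈ 0.088

p_neg = (1 - sens) * prior + spec * (1 - prior)
p_healthy_given_neg = spec * (1 - prior) / p_neg  # ≈ 0.9994
print(p_disease_given_pos, p_healthy_given_neg)
```

The counterintuitive ~8.8% posterior is the classic base-rate effect: at 1% prevalence, false positives from the healthy 99% swamp the true positives.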
Exercise 6.3: Gaussian Distribution 🟡
Given: μ = 10, σ = 2
Tasks:
Generate 1000 samples from N(10, 4)
Plot histogram and overlay theoretical PDF
Calculate P(8 < X < 12) analytically and empirically
Find the value x such that P(X < x) = 0.95
Compute sample mean and variance, compare to theoretical
# Your code here
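A sketch of the sampling and the analytic pieces via `scipy.stats.norm` (histogram/PDF overlay left to you; the seed is my own choice):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 10, 2
rng = np.random.default_rng(0)
samples = rng.normal(mu, sigma, 1000)

p_8_12 = norm.cdf(12, mu, sigma) - norm.cdf(8, mu, sigma)  # P(8 < X < 12) ≈ 0.683
x95 = norm.ppf(0.95, mu, sigma)                            # 95th percentile ≈ 13.29
emp = np.mean((samples > 8) & (samples < 12))              # empirical estimate
print(p_8_12, x95, emp, samples.mean(), samples.var())
```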
Exercise 6.4: Multivariate Gaussian 🔴
Create a 2D Gaussian with:
μ = [0, 0]
Σ = [[2, 1],
[1, 2]]
Tasks:
Generate 500 samples
Plot samples with covariance ellipse
Compute correlation coefficient
Apply transformation to make variables independent (whitening)
Verify independence (compute covariance of transformed data)
# Your code here
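A sketch of the sampling and the whitening step, using Σ^(−1/2) built from the eigendecomposition (one of several valid whitening transforms; plotting and the ellipse are left to you):

```python
import numpy as np

mu = np.zeros(2)
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=500)

# Whitening: W = Sigma^(-1/2), so Z = X W^T has identity covariance
w, P = np.linalg.eigh(Sigma)
W = P @ np.diag(1 / np.sqrt(w)) @ P.T
Z = X @ W.T

corr = np.corrcoef(X.T)[0, 1]   # true correlation is 1/2
print(corr, np.cov(Z.T))        # cov(Z) should be close to the identity
```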
Chapter 7: Continuous Optimization
Exercise 7.1: Gradient Descent 🟢
Minimize f(x) = x² + 5x + 6 using gradient descent.
Tasks:
Implement gradient descent from scratch
Start from x₀ = 10, use learning rate α = 0.1
Track convergence (plot x vs iteration)
Find optimal x analytically and compare
Experiment with different learning rates (too small, too large)
# Your code here
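A minimal sketch of the loop (the analytic optimum is f′(x) = 2x + 5 = 0, i.e. x* = −2.5; plotting `history` and the learning-rate experiments are left to you):

```python
# Gradient descent on f(x) = x² + 5x + 6, with f'(x) = 2x + 5
x = 10.0
alpha = 0.1
history = [x]
for _ in range(100):
    x = x - alpha * (2 * x + 5)
    history.append(x)
print(x)   # converges toward -2.5
```

Try `alpha = 1.1` to see divergence: the update factor |1 − 2α| exceeds 1.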
Exercise 7.2: Momentum 🟡
For function f(x, y) = x² + 10y²:
Tasks:
Implement vanilla gradient descent
Implement gradient descent with momentum (β = 0.9)
Start from (10, 10), use α = 0.01
Plot trajectories of both methods
Compare convergence speed (iterations to reach tolerance)
# Your code here
# Momentum: v = β*v + (1-β)*gradient; x = x - α*v
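A sketch of both loops side by side, using the EMA-style momentum update from the hint (trajectory plots and the iterations-to-tolerance comparison are left to you):

```python
import numpy as np

def grad(p):
    # Gradient of f(x, y) = x² + 10y²
    return np.array([2 * p[0], 20 * p[1]])

alpha, beta = 0.01, 0.9
p_gd = np.array([10.0, 10.0])   # vanilla gradient descent iterate
p_mo = np.array([10.0, 10.0])   # momentum iterate
v = np.zeros(2)
for _ in range(2000):
    p_gd = p_gd - alpha * grad(p_gd)
    v = beta * v + (1 - beta) * grad(p_mo)   # EMA form, as in the hint
    p_mo = p_mo - alpha * v
print(np.linalg.norm(p_gd), np.linalg.norm(p_mo))
```

Track the iterates in lists if you want to plot the two trajectories over the contours of f.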
Exercise 7.3: Newton’s Method 🔴
Minimize f(x) = x⁴ - 3x³ + 2 using Newton’s method.
Tasks:
Implement Newton’s method (uses Hessian)
Compare with gradient descent
Visualize convergence (quadratic vs linear)
What happens with bad initialization? (Try x₀ = 0.5)
# Your code here
# Newton: x_new = x_old - H^(-1) * gradient
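A sketch of the 1-D Newton loop (here the "Hessian" is just f″; I start from x₀ = 3, my own choice, which sits in the basin of the minimum at x = 9/4):

```python
# Minimize f(x) = x⁴ - 3x³ + 2 via Newton's method on f'
def f1(x):
    return 4 * x**3 - 9 * x**2    # f'(x)

def f2(x):
    return 12 * x**2 - 18 * x     # f''(x)

x = 3.0
for _ in range(50):
    x = x - f1(x) / f2(x)         # Newton step: x - f'(x) / f''(x)
print(x)   # → 2.25
```

From x₀ = 0.5, f″ is negative and the iterates drift to the degenerate critical point at x = 0 instead of the minimum: Newton's method finds stationary points of any kind, not minima specifically.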
Exercise 7.4: Constrained Optimization 🔴
Minimize f(x, y) = (x - 3)² + (y - 2)²
Subject to: x + y = 5
Tasks:
Solve using Lagrange multipliers (analytical)
Verify solution satisfies constraint
Visualize: plot objective contours, constraint line, and solution
Use projected gradient descent as alternative method
# Your code here
# Lagrangian: L(x, y, λ) = f(x,y) + λ(x + y - 5)
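The stationarity conditions of the Lagrangian form a linear system, so one way to check the analytic work is to solve it directly (plotting and projected gradient descent are left to you):

```python
import numpy as np

# Stationarity of L: 2(x - 3) + λ = 0, 2(y - 2) + λ = 0, plus x + y = 5
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([6.0, 4.0, 5.0])
x, y, lam = np.linalg.solve(A, rhs)
print(x, y, lam)   # → 3.0 2.0 0.0
```

λ = 0 here because the unconstrained minimum (3, 2) already satisfies x + y = 5, so the constraint is not actually binding.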
🎯 Bonus Challenge: Integration Exercise 🔴🔴
Ridge Regression with Gradient Descent
Combine concepts from multiple chapters to implement ridge regression:
Given:
Generate synthetic data: y = 3x₁ + 2x₂ + noise
Use 100 samples with 2 features
Tasks:
Linear Algebra: Set up normal equations with regularization
Calculus: Derive gradient of ridge objective J(w) = ||Xw - y||² + λ||w||²
Optimization: Implement gradient descent to minimize J(w)
Probability: Add noise to data, analyze effect on estimates
Matrix Decompositions: Solve using SVD as alternative
Comparison: Compare gradient descent vs closed-form solution
This exercise tests your understanding of how all concepts connect in real ML!
# Your code here - Good luck! 🚀
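A possible skeleton covering three of the solution routes (closed form, gradient descent, SVD); the noise level, λ, learning rate, and iteration count are my own choices, and the noise-sensitivity analysis is left to you:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
w_true = np.array([3.0, 2.0])
y = X @ w_true + 0.1 * rng.normal(size=100)
lam = 0.1

# Closed form (regularized normal equations): w = (XᵀX + λI)⁻¹ Xᵀy
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Gradient descent on J(w) = ||Xw - y||² + λ||w||², with ∇J = 2Xᵀ(Xw - y) + 2λw
w = np.zeros(2)
alpha = 1e-3
for _ in range(5000):
    w = w - alpha * (2 * X.T @ (X @ w - y) + 2 * lam * w)

# SVD route: with X = UΣVᵀ, the ridge solution is w = V diag(s / (s² + λ)) Uᵀ y
U, s, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))

print(w_closed, w, w_svd)   # all three should agree, near [3, 2]
```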
📊 Self-Assessment
After completing these exercises:
Check your understanding:
✅ Can you explain why each method works, not just how?
✅ Can you identify when to use each technique in ML?
✅ Can you combine concepts from different chapters?
Next Steps:
Review solutions in mml_solutions_part1.ipynb
Redo exercises where you struggled
Move to mml_exercises_part2.ipynb (ML Applications)
Apply these concepts in real ML projects!
Remember: Understanding the math gives you superpowers in machine learning! 🦸
Debug models when they fail
Design better architectures
Understand why things work
Read research papers with confidence