Chapter 11: Taylor SeriesΒΆ
Approximating Any Function with PolynomialsΒΆ
Taylor series are one of the most powerful ideas in applied mathematics. The core insight is that an analytic function can be reconstructed, within its radius of convergence, from its derivatives at a single point. (Smoothness alone is not enough: the classic counterexample \(e^{-1/x^2}\) is infinitely differentiable at \(0\) but is not equal to its Taylor series there.) Near \(x = a\), the Taylor expansion is:
\[
f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x-a)^n = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots
\]
Each successive term captures finer detail: the constant term matches the value, the linear term matches the slope, the quadratic term matches the curvature, and so on. Truncating at degree \(n\) gives a polynomial that agrees with \(f\) through its first \(n\) derivatives at \(a\).
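To make the truncation concrete, here is a minimal sketch (the helper `taylor_exp` is illustrative, not from the chapter) that builds the degree-\(n\) Maclaurin polynomial of \(e^x\) term by term and watches the error shrink as the degree grows:

```python
import math

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x around a = 0.

    All derivatives of e^x at 0 equal 1, so each term is x^k / k!.
    """
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 0.5
for n in (1, 3, 5, 7):
    approx = taylor_exp(x, n)
    print(f"degree {n}: approx = {approx:.8f}, "
          f"error = {abs(math.exp(x) - approx):.2e}")
```

Each extra term divides by another factorial, which is why the error at a fixed point near \(a\) drops so quickly with the degree.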
Why this matters for ML/AI: Taylor approximations underpin virtually every optimization algorithm. Gradient descent follows a first-order (linear) Taylor model of the loss, Newton's method uses a second-order (quadratic) model, and the truncation error tells you how far a single gradient step can be trusted; this is the theoretical basis for learning rate selection. Activation function design (GELU, Swish) also leverages Taylor-like polynomial smoothness properties.
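The first-order vs. second-order distinction can be sketched on a toy 1-D objective. The function and both step rules below are illustrative choices, not code from the chapter: gradient descent moves against the derivative scaled by a learning rate, while a Newton step divides by the second derivative, i.e. it jumps to the minimum of the local quadratic Taylor model.

```python
import math

# Illustrative convex objective: minimum where f'(x) = 2x + e^x = 0.
f   = lambda x: x**2 + math.exp(x)
df  = lambda x: 2 * x + math.exp(x)   # first derivative
d2f = lambda x: 2 + math.exp(x)       # second derivative

x_gd = x_nt = 1.0
for _ in range(20):
    x_gd -= 0.1 * df(x_gd)            # gradient descent: first-order Taylor model
    x_nt -= df(x_nt) / d2f(x_nt)      # Newton step: second-order Taylor model

print(f"gradient descent: x = {x_gd:.6f}, |f'(x)| = {abs(df(x_gd)):.2e}")
print(f"Newton's method:  x = {x_nt:.6f}, |f'(x)| = {abs(df(x_nt)):.2e}")
```

After the same number of iterations, the Newton iterate sits at the minimizer to near machine precision, while gradient descent is still closing in; the richer Taylor model buys faster local convergence at the cost of computing (and inverting, in higher dimensions) the second derivative.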
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 10)
Building Intuition: Watch the Approximation ImproveΒΆ
The power of Taylor series becomes vivid when you see successive polynomial approximations converging to the true function. Starting with just the constant \(f(a)\), each additional term bends the polynomial to match one more derivative of the target function. The approximation is excellent near \(x = a\) and degrades as you move further away; the radius of convergence determines how far you can go.
The code below computes Taylor polynomials of increasing degree for functions like \(\sin(x)\) and \(e^x\) around \(a = 0\) (a Maclaurin series). Notice how a degree-5 polynomial already captures \(\sin(x)\) remarkably well over a wide range, while \(e^x\), which grows without bound, requires higher-degree terms for accuracy far from the origin. This connects to the idea of local vs. global approximation in ML: a linear model (degree 1) captures the local trend, but you need more complexity (higher degree or nonlinear models) to fit the global pattern.
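One way such a computation might look for \(\sin(x)\) (a minimal sketch; the helper `taylor_sin` is an illustrative name, not from the chapter) is to sum the odd-power Maclaurin terms and compare against `math.sin` at points progressively farther from the expansion point:

```python
import math

def taylor_sin(x, degree):
    """Maclaurin polynomial of sin(x), truncated at the given degree.

    sin has only odd-power terms with alternating signs:
    x - x^3/3! + x^5/5! - ...
    """
    total = 0.0
    for k in range(1, degree + 1, 2):
        total += (-1) ** (k // 2) * x**k / math.factorial(k)
    return total

for deg in (1, 3, 5, 9):
    errs = [abs(taylor_sin(x, deg) - math.sin(x)) for x in (0.5, 1.5, 3.0)]
    print(f"degree {deg:>2}: errors at x = 0.5, 1.5, 3.0 -> "
          + ", ".join(f"{e:.2e}" for e in errs))
```

The printed errors make both claims from the text visible: for a fixed degree, the error grows as \(x\) moves away from \(0\), and for a fixed \(x\), adding terms drives the error down rapidly.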