The Architecture of Mathematics: Analytical vs Numerical Approaches
Source: The Math Behind Artificial Intelligence, Chapter 2
Overview
Mathematics is not a flat collection of formulas; it is a tree, where foundational ideas (logic, set theory) branch into pure math (algebra, topology, analysis) and applied math (statistics, optimization, numerical methods). AI draws from every branch.
This notebook covers:

- The Tree of Mathematics: how everything connects
- Symbolic (Analytical) Math: exact, closed-form solutions
- Numerical Math: approximate, computational solutions
- When to use each approach in AI/ML
- Gödel's Incompleteness: why math has limits
1. The Tree of Mathematics

                      MATHEMATICS
                           │
            ┌──────────────┴──────────────┐
        Pure Math                   Applied Math
            │                             │
   ┌────────┼─────────┐          ┌────────┼────────┐
Algebra  Topology  Analysis    Stats    Optim   Numerics
   │        │         │          │        │        │
Linear   Calculus   Real       Prob  GradDescent NumDE
Algebra Manifolds  Analysis    Bayes    Adam      SVD
   │        │         │          │        │        │
   └────────┴─────────┴ AI / ML ─┴────────┴────────┘
Key insight for AI: AI is applied mathematics. Every ML algorithm maps to a mathematical area:
- Neural networks → Linear algebra + Calculus
- Training → Optimization theory
- Uncertainty → Probability & Statistics
- Transformers → Linear algebra + Information theory
2. Symbolic (Analytical) Math
Exact solutions through algebraic manipulation
Symbolic math manipulates mathematical expressions as formal symbols: variables, operators, and functions remain abstract objects rather than being evaluated to floating-point numbers. The goal is to find exact, closed-form solutions: expressions that give the answer precisely, for all possible inputs.
When you solve \(2x + 3y = 6\) and \(-x + y = 1\) symbolically, you get \(x = 3/5\) and \(y = 8/5\), not \(x \approx 0.60000001\). This exactness is crucial for deriving general theorems, proving convergence bounds, and understanding the structure of a solution. SymPy's solve, diff, and integrate functions perform these manipulations programmatically, applying the same algebraic rules you would use by hand but without arithmetic errors.
In the context of ML, symbolic math is essential for deriving gradient formulas, proving that a loss function is convex, computing closed-form solutions for simple models (e.g., the normal equation for linear regression: \(\theta^* = (X^TX)^{-1}X^Ty\)), and finding critical points of loss landscapes analytically. The limitation is scalability: symbolic expressions grow combinatorially with problem size, making them impractical for the million-parameter models common in deep learning.
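That normal equation can be carried out symbolically on a toy problem; a minimal sketch (the 3×2 design matrix and targets below are illustrative, not from the book):

```python
from sympy import Matrix

# Closed form theta* = (X^T X)^{-1} X^T y, evaluated exactly with SymPy.
# Toy design matrix (intercept column + one feature); data satisfy y = 1 + 2x.
X = Matrix([[1, 0],
            [1, 1],
            [1, 2]])
y = Matrix([1, 3, 5])

theta = (X.T * X).inv() * X.T * y
print(theta)  # Matrix([[1], [2]]): exact rationals, no rounding
```

Because SymPy works over exact rationals, the recovered intercept and slope are literally 1 and 2, not 0.9999998; the same computation in floating point would carry rounding error.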
from sympy import symbols, Eq, solve, diff, integrate, latex, simplify, expand
from sympy import sin, cos, exp, pi, sqrt, oo
import sympy as sp
# --- Example 1: Solving a system of linear equations (from book, Chapter 2) ---
x, y = symbols('x y')
eq1 = Eq(2*x + 3*y, 6)
eq2 = Eq(-x + y, 1)
solution = solve((eq1, eq2), (x, y))
print("System of equations:")
print(" 2x + 3y = 6")
print(" -x + y = 1")
print(f"\nExact symbolic solution: {solution}")
print(f" x = {solution[x]}, y = {solution[y]}")
# --- Example 2: Symbolic differentiation ---
x = symbols('x')
f = x**3 + 3*x**2 - 2*x + 5
df = diff(f, x)
d2f = diff(f, x, 2) # second derivative
print(f"f(x) = {f}")
print(f"f'(x) = {df}")
print(f"f''(x) = {d2f}")
print(f"\nLaTeX: f'(x) = {latex(df)}")
# --- Example 3: Symbolic integration ---
x = symbols('x')
# Indefinite integral
f = x**2 * exp(-x)
F = integrate(f, x)
print(f"∫ {f} dx = {F} + C")
# Definite integral
result = integrate(f, (x, 0, oo))
print(f"∫₀^∞ {f} dx = {result}")
# --- Example 4: Finding critical points (gradient = 0) ---
# Loss function: L(w) = w^4 - 4w^2 + 3
w = symbols('w')
L = w**4 - 4*w**2 + 3
dL = diff(L, w)
critical_points = solve(dL, w)
print(f"Loss: L(w) = {L}")
print(f"dL/dw = {dL}")
print(f"Critical points (dL/dw = 0): {critical_points}")
# Check which are minima/maxima
d2L = diff(L, w, 2)
for cp in critical_points:
    val = d2L.subs(w, cp)
    kind = "minimum" if val > 0 else "maximum"
    # cast SymPy expressions (e.g. sqrt(2)) to float before format specs
    print(f" w={float(cp):6.3f}: L={float(L.subs(w, cp)):.3f} → {kind}")
3. Numerical (Approximate) Math
Computational algorithms for problems that defy exact solutions
Numerical math abandons the quest for exact symbolic answers and instead computes approximate solutions using iterative algorithms on floating-point numbers. Where symbolic math requires that a closed-form solution exists, numerical methods only require that the problem is well-posed and that an algorithm can reduce error to an acceptable tolerance.
The trade-off is precision for scalability. Solving a \(5 \times 5\) linear system symbolically is feasible; solving a \(50{,}000 \times 50{,}000\) system (common in finite element analysis or large-scale ML) is only possible numerically. NumPy's np.linalg.solve uses LU decomposition under the hood, running in \(O(n^3)\) floating-point operations; that is fast even for large matrices, at the cost of rounding errors on the order of machine epsilon (\(\approx 10^{-16}\) for 64-bit floats).
In deep learning, nearly all computation is numerical: forward passes are matrix multiplications on GPU tensors, losses are evaluated at specific data points, and gradients are computed via automatic differentiation at concrete parameter values. The key insight is that we do not need exact answers; we need answers good enough to point the next gradient step in the right direction.
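The scale of those rounding errors is easy to inspect directly; a quick sketch:

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable 64-bit float
eps = np.finfo(np.float64).eps
print(eps)  # 2.220446049250313e-16

# A classic artifact of binary floating point: 0.1 + 0.2 is not exactly 0.3
print(0.1 + 0.2 == 0.3)        # False
print(abs((0.1 + 0.2) - 0.3))  # tiny residual, on the order of eps
```

This is why numerical code compares values with tolerances (np.isclose, np.allclose) rather than with ==.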
import numpy as np
from scipy.linalg import solve as scipy_solve
# --- Example 1: Solving a 5x5 system numerically (from book, Chapter 2) ---
A = np.array([[3, 2, -1, 4, 5],
[1, 1, 3, 2, -2],
[4, -1, 2, 1, 0],
[5, 3, -2, 1, 1],
[2, -3, 1, 3, 4]])
b = np.array([12, 5, 7, 9, 10])
solution = scipy_solve(A, b)
print("5x5 system Ax = b")
print(f"\nNumerical solution: {np.round(solution, 6)}")
# Verify: Ax should equal b
residual = np.linalg.norm(A @ solution - b)
print(f"Residual ||Ax - b|| = {residual:.2e} (should be ~0)")
# --- Example 2: Numerical integration (Riemann sums vs scipy) ---
from scipy import integrate as sci_integrate
import matplotlib.pyplot as plt
# Function: f(x) = x^2 * e^(-x)
def f(x):
    return x**2 * np.exp(-x)
# Numerical integration methods
methods = {}
# Riemann left sum (crude)
n = 1000
x = np.linspace(0, 20, n)
dx = x[1] - x[0]
methods['Riemann (n=1000)'] = np.sum(f(x[:-1])) * dx
# Scipy quad (adaptive, very accurate)
result_quad, error = sci_integrate.quad(f, 0, np.inf)
methods['SciPy quad'] = result_quad
# Exact (symbolic) answer = 2
methods['Exact (SymPy)'] = 2.0
print("∫₀^∞ x² e^(-x) dx")
print("-" * 35)
for name, val in methods.items():
    print(f" {name:25s} = {val:.8f}")
# --- Example 3: Numerical differentiation vs symbolic ---
# Numerical gradient at a point using finite differences
def numerical_gradient(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)  # central difference
f_poly = lambda x: x**3 + 3*x**2 - 2*x + 5
f_prime_exact = lambda x: 3*x**2 + 6*x - 2 # symbolic answer
test_points = [-2, -1, 0, 1, 2, 3]
print("f(x) = x³ + 3x² - 2x + 5 → f'(x) = 3x² + 6x - 2")
print(f"\n{'x':>5} | {'Exact':>12} | {'Numerical':>12} | {'Error':>12}")
print("-" * 48)
for xi in test_points:
    exact = f_prime_exact(xi)
    numerical = numerical_gradient(f_poly, xi)
    error = abs(exact - numerical)
    print(f"{xi:>5} | {exact:>12.8f} | {numerical:>12.8f} | {error:>12.2e}")
4. Analytical vs Numerical: When to Use Each
Choosing the right tool for the mathematical job
The choice between analytical and numerical methods is not about which is "better"; it is about which is appropriate for the problem at hand. Analytical solutions, when they exist, provide exact answers, general formulas, and deep insight into structure. Numerical solutions handle the vast majority of real-world problems where closed forms do not exist.
| Situation | Approach | Why |
|---|---|---|
| Small linear systems (4 vars or fewer) | Symbolic | Exact solution in closed form |
| Large linear systems (millions of params) | Numerical | Symbolic is computationally infeasible |
| Gradient computation in ML | Numerical (autograd) | Model graphs are too complex for symbolic |
| Proving convergence guarantees | Symbolic | Need exact inequalities |
| Bayesian posterior with conjugate prior | Symbolic | Closed-form posterior exists |
| Bayesian posterior (non-conjugate) | Numerical (MCMC/VI) | No closed form |
| PDE/ODE in physics simulation | Numerical (FEM) | Symbolic solution usually doesn't exist |
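The conjugate-prior row has a classic concrete instance, the Beta-Binomial model; a minimal sketch (the prior parameters and coin-flip data below are made up for illustration):

```python
# Beta-Binomial conjugacy: a Beta(a, b) prior on a coin's bias, updated with
# k successes in n flips, gives the posterior Beta(a + k, b + n - k) exactly.
a, b = 2.0, 2.0   # prior pseudo-counts (illustrative)
k, n = 7, 10      # observed data: 7 heads in 10 flips (illustrative)

a_post, b_post = a + k, b + (n - k)
posterior_mean = a_post / (a_post + b_post)
print(f"Posterior: Beta({a_post:g}, {b_post:g}), mean = {posterior_mean:.3f}")
```

No MCMC or variational approximation is needed; the update is pure symbol pushing. Drop conjugacy (say, a logistic link instead of a direct Binomial likelihood) and the closed form disappears, which is exactly when the table's numerical row takes over.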
In deep learning, PyTorch uses automatic differentiation (autograd), a hybrid that tracks the computation symbolically through a directed acyclic graph (DAG) but evaluates everything numerically. Each tensor operation records its inputs and the function that produced it. During backward(), PyTorch walks this graph in reverse, applying the chain rule symbolically (it knows that the derivative of \(x^2\) is \(2x\)) but evaluating the result numerically at the current parameter values. This gives the best of both worlds: the correctness of symbolic differentiation with the speed of numerical evaluation.
# --- Autograd: PyTorch's hybrid approach ---
import torch
# PyTorch tracks the computation graph symbolically, evaluates numerically
x = torch.tensor(2.0, requires_grad=True)
# f(x) = x^3 + 3x^2 - 2x + 5
f = x**3 + 3*x**2 - 2*x + 5
f.backward() # computes df/dx symbolically through the graph
print(f"x = {x.item()}")
print(f"f(x) = {f.item():.4f}")
print(f"f'(x) = {x.grad.item():.4f} (PyTorch autograd)")
print(f"f'(x) = {3*2**2 + 6*2 - 2:.4f} (Exact: 3x²+6x-2 at x=2)")
5. Gödel's Incompleteness Theorems
In 1931, Kurt Gödel proved two theorems that fundamentally limit what mathematics can know about itself:
First Theorem: In any consistent formal system capable of expressing basic arithmetic, there exist true statements that cannot be proven within that system.
Second Theorem: A consistent system cannot prove its own consistency.
Why This Matters for AI
Halting Problem (Turing, 1936): You cannot write a program that always determines whether any given program will halt. This is directly related to Gödel's result: some questions are undecidable.
Generalization Limits: No learning algorithm can perfectly generalize from finite data to all possible inputs (related to No Free Lunch theorem).
AI Completeness: There will always be tasks that AI systems cannot provably solve correctly.
# --- Demonstrating computational limits: the Collatz conjecture ---
# True for all tested numbers (up to 2^68), but never proven for all n
# A concrete example of a simple statement that may be undecidable
def collatz_steps(n):
    """Count steps to reach 1 under Collatz iteration."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
# Test the conjecture (always terminates in practice, never proven in general)
test_ns = [27, 97, 871, 6171, 77031]
print("Collatz Conjecture: every n eventually reaches 1")
print(f"\n{'n':>8} | {'Steps to reach 1':>18}")
print("-" * 30)
for n in test_ns:
    print(f"{n:>8} | {collatz_steps(n):>18}")
print("\n→ Works for all tested values, but no one has proven it for ALL integers.")
print("→ This illustrates Gödel: verifiable statements that may be unprovable.")
6. The History of Mathematics in 5 Milestones

| Era | Milestone | Impact on AI |
|---|---|---|
| ~300 BC | Euclid's Elements: formal proof system | Template for all mathematical reasoning |
| 1666-1687 | Newton & Leibniz invent calculus | Gradient descent (backprop) relies on this |
| 1800s | Gauss, Fourier, Riemann: analysis & geometry | Signal processing, spectral methods, embeddings |
| 1930s | Gödel, Turing, Church: computation limits | Defines what computers (and AI) can/can't do |
| 1940s-now | Von Neumann, Shannon, Wiener: information theory, control | Entropy, KL divergence, feedback control |
Summary

- Symbolic math = exact, algebraic, good for small/tractable problems
- Numerical math = approximate, iterative, scales to massive problems
- AI uses both: symbolic reasoning (logic) + numerical optimization (gradient descent)
- PyTorch autograd is the bridge: symbolic graph tracking + numerical evaluation
- Gödel's theorems set fundamental limits on what any formal system (including AI) can prove
Exercises

1. Use SymPy to solve the system: 3x - y + 2z = 11, x + 2y - z = 3, 2x - y + 3z = 14.
2. Compute the symbolic derivative of f(x) = sin(x²) · e^x. Verify numerically at x = π/4.
3. What is the difference between a numerical gradient (finite differences) and automatic differentiation (autograd)? Which does PyTorch use?
4. Why can't we always use symbolic math for training neural networks? Give a concrete reason.
5. Look up the Halting Problem. How does it relate to Gödel's First Incompleteness Theorem?