The Architecture of Mathematics: Analytical vs Numerical Approaches
Source: The Math Behind Artificial Intelligence, Chapter 2
Overview
Mathematics is not a flat collection of formulas; it is a tree, where foundational ideas (logic, set theory) branch into pure math (algebra, topology, analysis) and applied math (statistics, optimization, numerical methods). AI draws from every branch.
This notebook covers:

- The Tree of Mathematics: how everything connects
- Symbolic (Analytical) Math: exact, closed-form solutions
- Numerical Math: approximate, computational solutions
- When to use each approach in AI/ML
- Gödel's Incompleteness: why math has limits
1. The Tree of Mathematics

                      MATHEMATICS
                           │
            ┌──────────────┴──────────────┐
        Pure Math                   Applied Math
            │                             │
   ┌────────┼─────────┐          ┌────────┼────────┐
Algebra  Topology  Analysis    Stats    Optim   Numerics
   │        │         │          │        │        │
Linear   Calculus   Real       Prob  GradDescent NumDE
Algebra Manifolds  Analysis    Bayes    Adam      SVD
   │        │         │          │        │        │
   └────────┴─────────┴ AI / ML ─┴────────┴────────┘
Key insight for AI: AI is applied mathematics. Every ML algorithm maps to a mathematical area:
- Neural networks → Linear algebra + Calculus
- Training → Optimization theory
- Uncertainty → Probability & Statistics
- Transformers → Linear algebra + Information theory
2. Symbolic (Analytical) Math
Exact solutions through algebraic manipulation
Symbolic math manipulates mathematical expressions as formal symbols: variables, operators, and functions remain abstract objects rather than being evaluated to floating-point numbers. The goal is to find exact, closed-form solutions: expressions that give the answer precisely, for all possible inputs.
When you solve \(2x + 3y = 6\) and \(-x + y = 1\) symbolically, you get \(x = 3/5\) and \(y = 8/5\), not \(x \approx 0.60000001\). This exactness is crucial for deriving general theorems, proving convergence bounds, and understanding the structure of a solution. SymPy's solve, diff, and integrate functions perform these manipulations programmatically, applying the same algebraic rules you would use by hand but without arithmetic errors.
In the context of ML, symbolic math is essential for deriving gradient formulas, proving that a loss function is convex, computing closed-form solutions for simple models (e.g., the normal equation for linear regression: \(\theta^* = (X^TX)^{-1}X^Ty\)), and finding critical points of loss landscapes analytically. The limitation is scalability: symbolic expressions grow combinatorially with problem size, making them impractical for the million-parameter models common in deep learning.
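That normal equation can be carried out symbolically on a toy problem; a minimal sketch (the 3×2 design matrix and targets below are illustrative, not from the book):

```python
from sympy import Matrix

# Closed form theta* = (X^T X)^{-1} X^T y, evaluated exactly with SymPy.
# Toy design matrix (intercept column + one feature); data satisfy y = 1 + 2x.
X = Matrix([[1, 0],
            [1, 1],
            [1, 2]])
y = Matrix([1, 3, 5])

theta = (X.T * X).inv() * X.T * y
print(theta)  # Matrix([[1], [2]]): exact rationals, no rounding
```

Because SymPy works over exact rationals, the recovered intercept and slope are literally 1 and 2, not 0.9999998; the same computation in floating point would carry rounding error.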
from sympy import symbols, Eq, solve, diff, integrate, latex, simplify, expand
from sympy import sin, cos, exp, pi, sqrt, oo
import sympy as sp
# --- Example 1: Solving a system of linear equations (from book, Chapter 2) ---
x, y = symbols('x y')
eq1 = Eq(2*x + 3*y, 6)
eq2 = Eq(-x + y, 1)
solution = solve((eq1, eq2), (x, y))
print("System of equations:")
print(" 2x + 3y = 6")
print(" -x + y = 1")
print(f"\nExact symbolic solution: {solution}")
print(f" x = {solution[x]}, y = {solution[y]}")
# --- Example 2: Symbolic differentiation ---
x = symbols('x')
f = x**3 + 3*x**2 - 2*x + 5
df = diff(f, x)
d2f = diff(f, x, 2) # second derivative
print(f"f(x) = {f}")
print(f"f'(x) = {df}")
print(f"f''(x) = {d2f}")
print(f"\nLaTeX: f'(x) = {latex(df)}")
# --- Example 3: Symbolic integration ---
x = symbols('x')
# Indefinite integral
f = x**2 * exp(-x)
F = integrate(f, x)
print(f"∫ {f} dx = {F} + C")
# Definite integral
result = integrate(f, (x, 0, oo))
print(f"∫₀^∞ {f} dx = {result}")
# --- Example 4: Finding critical points (gradient = 0) ---
# Loss function: L(w) = w^4 - 4w^2 + 3
w = symbols('w')
L = w**4 - 4*w**2 + 3
dL = diff(L, w)
critical_points = solve(dL, w)
print(f"Loss: L(w) = {L}")
print(f"dL/dw = {dL}")
print(f"Critical points (dL/dw = 0): {critical_points}")
# Check which are minima/maxima
d2L = diff(L, w, 2)
for cp in critical_points:
    val = d2L.subs(w, cp)
    kind = "minimum" if val > 0 else "maximum"
    # cast SymPy expressions (e.g. sqrt(2)) to float before format specs
    print(f" w={float(cp):6.3f}: L={float(L.subs(w, cp)):.3f} → {kind}")
3. Numerical (Approximate) Math
Computational algorithms for problems that defy exact solutions
Numerical math abandons the quest for exact symbolic answers and instead computes approximate solutions using iterative algorithms on floating-point numbers. Where symbolic math requires that a closed-form solution exists, numerical methods only require that the problem is well-posed and that an algorithm can reduce error to an acceptable tolerance.
The trade-off is precision for scalability. Solving a \(5 \times 5\) linear system symbolically is feasible; solving a \(50{,}000 \times 50{,}000\) system (common in finite element analysis or large-scale ML) is only possible numerically. NumPy's np.linalg.solve uses LU decomposition under the hood, running in \(O(n^3)\) floating-point operations; that is fast even for large matrices, at the cost of rounding errors on the order of machine epsilon (\(\approx 10^{-16}\) for 64-bit floats).
In deep learning, nearly all computation is numerical: forward passes are matrix multiplications on GPU tensors, losses are evaluated at specific data points, and gradients are computed via automatic differentiation at concrete parameter values. The key insight is that we do not need exact answers; we need answers good enough to point the next gradient step in the right direction.
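The scale of those rounding errors is easy to inspect directly; a quick sketch:

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable 64-bit float
eps = np.finfo(np.float64).eps
print(eps)  # 2.220446049250313e-16

# A classic artifact of binary floating point: 0.1 + 0.2 is not exactly 0.3
print(0.1 + 0.2 == 0.3)        # False
print(abs((0.1 + 0.2) - 0.3))  # tiny residual, on the order of eps
```

This is why numerical code compares values with tolerances (np.isclose, np.allclose) rather than with ==.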
import numpy as np
from scipy.linalg import solve as scipy_solve
# --- Example 1: Solving a 5x5 system numerically (from book, Chapter 2) ---
A = np.array([[3, 2, -1, 4, 5],
[1, 1, 3, 2, -2],
[4, -1, 2, 1, 0],
[5, 3, -2, 1, 1],
[2, -3, 1, 3, 4]])
b = np.array([12, 5, 7, 9, 10])
solution = scipy_solve(A, b)
print("5x5 system Ax = b")
print(f"\nNumerical solution: {np.round(solution, 6)}")
# Verify: Ax should equal b
residual = np.linalg.norm(A @ solution - b)
print(f"Residual ||Ax - b|| = {residual:.2e} (should be ~0)")
# --- Example 2: Numerical integration (Riemann sums vs scipy) ---
from scipy import integrate as sci_integrate
import matplotlib.pyplot as plt
# Function: f(x) = x^2 * e^(-x)
def f(x):
    return x**2 * np.exp(-x)
# Numerical integration methods
methods = {}
# Riemann left sum (crude)
n = 1000
x = np.linspace(0, 20, n)
dx = x[1] - x[0]
methods['Riemann (n=1000)'] = np.sum(f(x[:-1])) * dx
# Scipy quad (adaptive, very accurate)
result_quad, error = sci_integrate.quad(f, 0, np.inf)
methods['SciPy quad'] = result_quad
# Exact (symbolic) answer = 2
methods['Exact (SymPy)'] = 2.0
print("∫₀^∞ x² e^(-x) dx")
print("-" * 35)
for name, val in methods.items():
    print(f" {name:25s} = {val:.8f}")
# --- Example 3: Numerical differentiation vs symbolic ---
# Numerical gradient at a point using finite differences
def numerical_gradient(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)  # central difference
f_poly = lambda x: x**3 + 3*x**2 - 2*x + 5
f_prime_exact = lambda x: 3*x**2 + 6*x - 2 # symbolic answer
test_points = [-2, -1, 0, 1, 2, 3]
print("f(x) = x³ + 3x² - 2x + 5 → f'(x) = 3x² + 6x - 2")
print(f"\n{'x':>5} | {'Exact':>12} | {'Numerical':>12} | {'Error':>12}")
print("-" * 48)
for xi in test_points:
    exact = f_prime_exact(xi)
    numerical = numerical_gradient(f_poly, xi)
    error = abs(exact - numerical)
    print(f"{xi:>5} | {exact:>12.8f} | {numerical:>12.8f} | {error:>12.2e}")
4. Analytical vs Numerical: When to Use Each
Choosing the right tool for the mathematical job
The choice between analytical and numerical methods is not about which is "better"; it is about which is appropriate for the problem at hand. Analytical solutions, when they exist, provide exact answers, general formulas, and deep insight into structure. Numerical solutions handle the vast majority of real-world problems where closed forms do not exist.
| Situation | Approach | Why |
|---|---|---|
| Small linear systems (4 vars or fewer) | Symbolic | Exact solution in closed form |
| Large linear systems (millions of params) | Numerical | Symbolic is computationally infeasible |
| Gradient computation in ML | Numerical (autograd) | Model graphs are too complex for symbolic |
| Proving convergence guarantees | Symbolic | Need exact inequalities |
| Bayesian posterior with conjugate prior | Symbolic | Closed-form posterior exists |
| Bayesian posterior (non-conjugate) | Numerical (MCMC/VI) | No closed form |
| PDE/ODE in physics simulation | Numerical (FEM) | Symbolic solution usually doesn't exist |
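The conjugate-prior row has a classic concrete instance, the Beta-Binomial model; a minimal sketch (the prior parameters and coin-flip data below are made up for illustration):

```python
# Beta-Binomial conjugacy: a Beta(a, b) prior on a coin's bias, updated with
# k successes in n flips, gives the posterior Beta(a + k, b + n - k) exactly.
a, b = 2.0, 2.0   # prior pseudo-counts (illustrative)
k, n = 7, 10      # observed data: 7 heads in 10 flips (illustrative)

a_post, b_post = a + k, b + (n - k)
posterior_mean = a_post / (a_post + b_post)
print(f"Posterior: Beta({a_post:g}, {b_post:g}), mean = {posterior_mean:.3f}")
```

No MCMC or variational approximation is needed; the update is pure symbol pushing. Drop conjugacy (say, a logistic link instead of a direct Binomial likelihood) and the closed form disappears, which is exactly when the table's numerical row takes over.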
In deep learning, PyTorch uses automatic differentiation (autograd), a hybrid that tracks the computation symbolically through a directed acyclic graph (DAG) but evaluates everything numerically. Each tensor operation records its inputs and the function that produced it. During backward(), PyTorch walks this graph in reverse, applying the chain rule symbolically (it knows that the derivative of \(x^2\) is \(2x\)) but evaluating the result numerically at the current parameter values. This gives the best of both worlds: the correctness of symbolic differentiation with the speed of numerical evaluation.
# --- Autograd: PyTorch's hybrid approach ---
import torch
# PyTorch tracks the computation graph symbolically, evaluates numerically
x = torch.tensor(2.0, requires_grad=True)
# f(x) = x^3 + 3x^2 - 2x + 5
f = x**3 + 3*x**2 - 2*x + 5
f.backward() # computes df/dx symbolically through the graph
print(f"x = {x.item()}")
print(f"f(x) = {f.item():.4f}")
print(f"f'(x) = {x.grad.item():.4f} (PyTorch autograd)")
print(f"f'(x) = {3*2**2 + 6*2 - 2:.4f} (Exact: 3x²+6x-2 at x=2)")
5. Gödel's Incompleteness Theorems
In 1931, Kurt Gödel proved two theorems that fundamentally limit what mathematics can know about itself:
First Theorem: In any consistent formal system capable of expressing basic arithmetic, there exist true statements that cannot be proven within that system.
Second Theorem: A consistent system cannot prove its own consistency.
Why This Matters for AI
Halting Problem (Turing, 1936): You cannot write a program that always determines whether any given program will halt. This is directly related to Gödel's result: some questions are undecidable.
Generalization Limits: No learning algorithm can perfectly generalize from finite data to all possible inputs (related to No Free Lunch theorem).
AI Completeness: There will always be tasks that AI systems cannot provably solve correctly.
# --- Demonstrating computational limits: the Collatz conjecture ---
# True for all tested numbers (up to 2^68), but never proven for all n
# A concrete example of a simple statement that may be undecidable
def collatz_steps(n):
    """Count steps to reach 1 under Collatz iteration."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
# Test the conjecture (always terminates in practice, never proven in general)
test_ns = [27, 97, 871, 6171, 77031]
print("Collatz Conjecture: every n eventually reaches 1")
print(f"\n{'n':>8} | {'Steps to reach 1':>18}")
print("-" * 30)
for n in test_ns:
    print(f"{n:>8} | {collatz_steps(n):>18}")
print("\n→ Works for all tested values, but no one has proven it for ALL integers.")
print("→ This illustrates Gödel: verifiable statements that may be unprovable.")
6. The History of Mathematics in 5 Milestones

| Era | Milestone | Impact on AI |
|---|---|---|
| ~300 BC | Euclid's Elements: formal proof system | Template for all mathematical reasoning |
| 1666-1687 | Newton & Leibniz invent calculus | Gradient descent (backprop) relies on this |
| 1800s | Gauss, Fourier, Riemann: analysis & geometry | Signal processing, spectral methods, embeddings |
| 1930s | Gödel, Turing, Church: computation limits | Defines what computers (and AI) can/can't do |
| 1940s-now | Von Neumann, Shannon, Wiener: information theory, control | Entropy, KL divergence, feedback control |
Summary

- Symbolic math = exact, algebraic, good for small/tractable problems
- Numerical math = approximate, iterative, scales to massive problems
- AI uses both: symbolic reasoning (logic) + numerical optimization (gradient descent)
- PyTorch autograd is the bridge: symbolic graph tracking + numerical evaluation
- Gödel's theorems set fundamental limits on what any formal system (including AI) can prove
Exercises

1. Use SymPy to solve the system: 3x - y + 2z = 11, x + 2y - z = 3, 2x - y + 3z = 14.
2. Compute the symbolic derivative of f(x) = sin(x²) · e^x. Verify numerically at x = π/4.
3. What is the difference between a numerical gradient (finite differences) and automatic differentiation (autograd)? Which does PyTorch use?
4. Why can't we always use symbolic math for training neural networks? Give a concrete reason.
5. Look up the Halting Problem. How does it relate to Gödel's First Incompleteness Theorem?