The Architecture of Mathematics: Analytical vs Numerical Approaches¶

Source: The Math Behind Artificial Intelligence – Chapter 2

Overview¶

Mathematics is not a flat collection of formulas β€” it is a tree, where foundational ideas (logic, set theory) branch into pure math (algebra, topology, analysis) and applied math (statistics, optimization, numerical methods). AI draws from every branch.

This notebook covers:

  1. The Tree of Mathematics: how everything connects

  2. Symbolic (Analytical) Math – exact, closed-form solutions

  3. Numerical Math – approximate, computational solutions

  4. When to use each approach in AI/ML

  5. Gödel’s Incompleteness – why math has limits

1. The Tree of Mathematics¶

                    MATHEMATICS
                         │
            ┌────────────┴─────────────┐
        Pure Math                Applied Math
            │                          │
   ┌────────┼─────────┐        ┌───────┼────────┐
Algebra  Topology  Analysis  Stats   Optim   Numerics
   │        │         │        │       │        │
Linear   Calculus   Real     Prob  GradDescent NumDE
Algebra  Manifolds  Analysis Bayes   Adam      SVD
   │        │         │        │       │        │
 ─────────────────── AI / ML ────────────────────

Key insight for AI: AI is applied mathematics. Every ML algorithm maps to a mathematical area:

  • Neural networks β†’ Linear algebra + Calculus

  • Training β†’ Optimization theory

  • Uncertainty β†’ Probability & Statistics

  • Transformers β†’ Linear algebra + Information theory

2. Symbolic (Analytical) Math¶

Exact solutions through algebraic manipulation¶

Symbolic math manipulates mathematical expressions as formal symbols – variables, operators, and functions remain abstract objects rather than being evaluated to floating-point numbers. The goal is to find exact, closed-form solutions: expressions that give the answer precisely, for all possible inputs.

When you solve \(2x + 3y = 6\) and \(-x + y = 1\) symbolically, you get \(x = 3/5\) and \(y = 8/5\) – not \(x \approx 0.60000001\). This exactness is crucial for deriving general theorems, proving convergence bounds, and understanding the structure of a solution. SymPy’s solve, diff, and integrate functions perform these manipulations programmatically, applying the same algebraic rules you would use by hand but without arithmetic errors.

In the context of ML, symbolic math is essential for deriving gradient formulas, proving that a loss function is convex, computing closed-form solutions for simple models (e.g., the normal equation for linear regression: \(\theta^* = (X^TX)^{-1}X^Ty\)), and finding critical points of loss landscapes analytically. The limitation is scalability – symbolic expressions grow combinatorially with problem size, making them impractical for the million-parameter models common in deep learning.

from sympy import symbols, Eq, solve, diff, integrate, latex, simplify, expand
from sympy import sin, cos, exp, pi, sqrt, oo
import sympy as sp

# --- Example 1: Solving a system of linear equations (from book, Chapter 2) ---
x, y = symbols('x y')
eq1 = Eq(2*x + 3*y, 6)
eq2 = Eq(-x + y, 1)

solution = solve((eq1, eq2), (x, y))
print("System of equations:")
print("  2x + 3y = 6")
print("  -x + y  = 1")
print(f"\nExact symbolic solution: {solution}")
print(f"  x = {solution[x]}, y = {solution[y]}")
# --- Example 2: Symbolic differentiation ---
x = symbols('x')
f = x**3 + 3*x**2 - 2*x + 5

df = diff(f, x)
d2f = diff(f, x, 2)   # second derivative

print(f"f(x)   = {f}")
print(f"f'(x)  = {df}")
print(f"f''(x) = {d2f}")
print(f"\nLaTeX: f'(x) = {latex(df)}")
# --- Example 3: Symbolic integration ---
x = symbols('x')

# Indefinite integral
f = x**2 * exp(-x)
F = integrate(f, x)
print(f"∫ {f} dx = {F} + C")

# Definite integral
result = integrate(f, (x, 0, oo))
print(f"∫₀^∞ {f} dx = {result}")
# --- Example 4: Finding critical points (gradient = 0) ---
# Loss function: L(w) = w^4 - 4w^2 + 3
w = symbols('w')
L = w**4 - 4*w**2 + 3

dL = diff(L, w)
critical_points = solve(dL, w)

print(f"Loss: L(w) = {L}")
print(f"dL/dw = {dL}")
print(f"Critical points (dL/dw = 0): {critical_points}")

# Check which are minima/maxima
d2L = diff(L, w, 2)
for cp in critical_points:
    val = d2L.subs(w, cp)
    kind = "minimum" if val > 0 else "maximum"
    print(f"  w={float(cp):6.3f}: L={float(L.subs(w, cp)):.3f} → {kind}")

3. Numerical (Approximate) Math¶

Computational algorithms for problems that defy exact solutions¶

Numerical math abandons the quest for exact symbolic answers and instead computes approximate solutions using iterative algorithms on floating-point numbers. Where symbolic math requires that a closed-form solution exists, numerical methods only require that the problem is well-posed and that an algorithm can reduce error to an acceptable tolerance.

The trade-off is precision for scalability. Solving a \(5 \times 5\) linear system symbolically is feasible; solving a \(50{,}000 \times 50{,}000\) system (common in finite element analysis or large-scale ML) is only possible numerically. NumPy’s np.linalg.solve uses LU decomposition under the hood, computing in \(O(n^3)\) floating-point operations – fast even for large matrices, at the cost of rounding errors on the order of machine epsilon (\(\approx 10^{-16}\) for 64-bit floats).

In deep learning, nearly all computation is numerical: forward passes are matrix multiplications on GPU tensors, losses are evaluated at specific data points, and gradients are computed via automatic differentiation at concrete parameter values. The key insight is that we do not need exact answers – we need answers that are good enough to make the next gradient step in the right direction.
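
The machine-epsilon figure quoted above is easy to verify directly. This stdlib-only snippet (an illustrative aside, not from the book) shows the granularity of 64-bit floats and its classic rounding consequence.

```python
import sys

# Machine epsilon: the gap between 1.0 and the next representable float
eps = sys.float_info.epsilon
print(f"Machine epsilon (64-bit float): {eps:.2e}")

# Increments well below eps are absorbed by rounding near 1.0
print(1.0 + eps / 4 == 1.0)        # True: the addition is lost

# Decimal fractions are not exactly representable in binary
print(0.1 + 0.2 == 0.3)            # False
print(f"0.1 + 0.2 = {0.1 + 0.2:.17f}")
```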

import numpy as np
from scipy.linalg import solve as scipy_solve

# --- Example 1: Solving a 5x5 system numerically (from book, Chapter 2) ---
A = np.array([[3, 2, -1, 4, 5],
              [1, 1,  3, 2, -2],
              [4, -1, 2, 1,  0],
              [5, 3, -2, 1,  1],
              [2, -3, 1, 3,  4]])

b = np.array([12, 5, 7, 9, 10])

solution = scipy_solve(A, b)
print("5x5 system Ax = b")
print(f"\nNumerical solution: {np.round(solution, 6)}")

# Verify: Ax should equal b
residual = np.linalg.norm(A @ solution - b)
print(f"Residual ||Ax - b|| = {residual:.2e}  (should be ~0)")
# --- Example 2: Numerical integration (Riemann sums vs scipy) ---
from scipy import integrate as sci_integrate

# Function: f(x) = x^2 * e^(-x)
def f(x):
    return x**2 * np.exp(-x)

# Numerical integration methods
methods = {}

# Riemann left sum (crude)
n = 1000
x = np.linspace(0, 20, n)
dx = x[1] - x[0]
methods['Riemann (n=1000)'] = np.sum(f(x[:-1])) * dx

# Scipy quad (adaptive, very accurate)
result_quad, error = sci_integrate.quad(f, 0, np.inf)
methods['SciPy quad'] = result_quad

# Exact (symbolic) answer = 2
methods['Exact (SymPy)'] = 2.0

print("∫₀^∞ x² e^(-x) dx")
print("-" * 35)
for name, val in methods.items():
    print(f"  {name:25s} = {val:.8f}")
# --- Example 3: Numerical differentiation vs symbolic ---

# Numerical gradient at a point using finite differences
def numerical_gradient(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)   # central difference

f_poly = lambda x: x**3 + 3*x**2 - 2*x + 5
f_prime_exact = lambda x: 3*x**2 + 6*x - 2   # symbolic answer

test_points = [-2, -1, 0, 1, 2, 3]
print("f(x) = x³ + 3x² - 2x + 5   →   f'(x) = 3x² + 6x - 2")
print(f"\n{'x':>5} | {'Exact':>12} | {'Numerical':>12} | {'Error':>12}")
print("-" * 48)
for xi in test_points:
    exact = f_prime_exact(xi)
    numerical = numerical_gradient(f_poly, xi)
    error = abs(exact - numerical)
    print(f"{xi:>5} | {exact:>12.8f} | {numerical:>12.8f} | {error:>12.2e}")

4. Analytical vs Numerical: When to Use Each¶

Choosing the right tool for the mathematical job¶

The choice between analytical and numerical methods is not about which is “better” – it is about which is appropriate for the problem at hand. Analytical solutions, when they exist, provide exact answers, general formulas, and deep insight into structure. Numerical solutions handle the vast majority of real-world problems where closed forms do not exist.

| Situation | Approach | Why |
|---|---|---|
| Small linear systems (4 vars or fewer) | Symbolic | Exact solution in closed form |
| Large linear systems (millions of params) | Numerical | Symbolic is computationally infeasible |
| Gradient computation in ML | Numerical (autograd) | Model graphs are too complex for symbolic |
| Proving convergence guarantees | Symbolic | Need exact inequalities |
| Bayesian posterior with conjugate prior | Symbolic | Closed-form posterior exists |
| Bayesian posterior (non-conjugate) | Numerical (MCMC/VI) | No closed form |
| PDE/ODE in physics simulation | Numerical (FEM) | Symbolic solution usually doesn’t exist |
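
The first two rows of the table can be felt on linear regression itself. Below is an illustrative sketch (synthetic data, arbitrary hyperparameters) comparing the closed-form normal equation \(\theta^* = (X^TX)^{-1}X^Ty\) mentioned in Section 2 against an iterative gradient-descent solve of the same problem:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic regression data: y = X @ theta_true + small noise
X = rng.normal(size=(200, 3))
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + 0.01 * rng.normal(size=200)

# Analytical: normal equation (solve X^T X theta = X^T y, no explicit inverse)
theta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Numerical: batch gradient descent on the mean-squared-error loss
theta_gd = np.zeros(3)
lr = 0.01
for _ in range(2000):
    grad = X.T @ (X @ theta_gd - y) / len(y)   # gradient of 0.5 * MSE
    theta_gd -= lr * grad

print("closed form :", np.round(theta_closed, 4))
print("grad descent:", np.round(theta_gd, 4))
print("max diff    :", np.max(np.abs(theta_closed - theta_gd)))
```

For three parameters both routes agree to many digits; for millions of parameters only the iterative route remains affordable.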

In deep learning, PyTorch uses automatic differentiation (autograd) – a hybrid that tracks the computation symbolically through a directed acyclic graph (DAG) but evaluates everything numerically. Each tensor operation records its inputs and the function that produced it. During backward(), PyTorch walks this graph in reverse, applying the chain rule symbolically (it knows that the derivative of \(x^2\) is \(2x\)) but evaluating the result numerically at the current parameter values. This gives the best of both worlds: the correctness of symbolic differentiation with the speed of numerical evaluation.

# --- Autograd: PyTorch's hybrid approach ---
import torch

# PyTorch tracks the computation graph symbolically, evaluates numerically
x = torch.tensor(2.0, requires_grad=True)

# f(x) = x^3 + 3x^2 - 2x + 5
f = x**3 + 3*x**2 - 2*x + 5
f.backward()   # computes df/dx symbolically through the graph

print(f"x = {x.item()}")
print(f"f(x)   = {f.item():.4f}")
print(f"f'(x)  = {x.grad.item():.4f}  (PyTorch autograd)")
print(f"f'(x)  = {3*2**2 + 6*2 - 2:.4f}  (Exact: 3x²+6x-2 at x=2)")

5. Gödel’s Incompleteness Theorems¶

In 1931, Kurt Gödel proved two theorems that fundamentally limit what mathematics can know about itself:

First Theorem: In any consistent formal system capable of expressing basic arithmetic, there exist true statements that cannot be proven within that system.

Second Theorem: A consistent system cannot prove its own consistency.

Why This Matters for AI¶

  1. Halting Problem (Turing, 1936): You cannot write a program that always determines whether any given program will halt. This is directly related to Gödel – some questions are undecidable.

  2. Generalization Limits: No learning algorithm can perfectly generalize from finite data to all possible inputs (related to No Free Lunch theorem).

  3. AI Completeness: There will always be tasks that AI systems cannot provably solve correctly.
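
The halting-problem argument in point 1 can even be sketched in code. This toy (purely illustrative; `paradox` and `troublemaker` are made-up names) mirrors Turing's diagonalization: given any claimed halting checker, we build a program that does the opposite of whatever the checker predicts about it.

```python
def paradox(halts):
    """Given a claimed halting-checker, build a program it must misjudge."""
    def troublemaker():
        if halts(troublemaker):
            while True:        # checker said "halts" → loop forever
                pass
        return "halted"        # checker said "loops" → halt immediately
    return troublemaker

# Whatever a checker answers about its own troublemaker, it is wrong:
says_loops = lambda prog: False
print(paradox(says_loops)())   # prints 'halted', contradicting the checker
# A checker answering True would make troublemaker() loop forever instead,
# so no total, always-correct `halts` function can exist.
```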

# --- Demonstrating computational limits: the Collatz conjecture ---
# True for all tested numbers (up to 2^68), but never proven for all n
# A concrete example of a simple statement that may be undecidable

def collatz_steps(n):
    """Count steps to reach 1 under Collatz iteration."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# Test the conjecture (always terminates in practice, never proven in general)
test_ns = [27, 97, 871, 6171, 77031]
print("Collatz Conjecture: every n eventually reaches 1")
print(f"\n{'n':>8} | {'Steps to reach 1':>18}")
print("-" * 30)
for n in test_ns:
    print(f"{n:>8} | {collatz_steps(n):>18}")

print("\n→ Works for all tested values, but no one has proven it for ALL integers.")
print("→ This illustrates Gödel: verifiable statements that may be unprovable.")

6. The History of Mathematics in 5 Milestones¶

| Era | Milestone | Impact on AI |
|---|---|---|
| ~300 BC | Euclid’s Elements – formal proof system | Template for all mathematical reasoning |
| 1666-1687 | Newton & Leibniz invent calculus | Gradient descent (backprop) relies on this |
| 1800s | Gauss, Fourier, Riemann – analysis & geometry | Signal processing, spectral methods, embeddings |
| 1930s | Gödel, Turing, Church – computation limits | Defines what computers (and AI) can/can’t do |
| 1940s-now | Von Neumann, Shannon, Wiener – information theory, control | Entropy, KL divergence, feedback control |

Summary¶

  • Symbolic math = exact, algebraic, good for small/tractable problems

  • Numerical math = approximate, iterative, scales to massive problems

  • AI uses both: symbolic reasoning (logic) + numerical optimization (gradient descent)

  • PyTorch autograd is the bridge: symbolic graph tracking + numerical evaluation

  • GΓΆdel’s theorems set fundamental limits on what any formal system (including AI) can prove

Exercises¶

  1. Use SymPy to solve the system: 3x - y + 2z = 11, x + 2y - z = 3, 2x - y + 3z = 14

  2. Compute the symbolic derivative of f(x) = sin(x²) * e^x. Verify numerically at x = π/4.

  3. What is the difference between a numerical gradient (finite differences) and automatic differentiation (autograd)? Which does PyTorch use?

  4. Why can’t we always use symbolic math for training neural networks? Give a concrete reason.

  5. Look up the Halting Problem. How does it relate to Gödel’s First Incompleteness Theorem?