AI Foundations: Symbolic vs Non-Symbolic AI & Control Theory

Source: The Math Behind Artificial Intelligence, Chapter 3

Overview

Before modern deep learning, AI took two radically different paths, and control theory was doing AI before “AI” was even a field.

This notebook covers:

  1. What is Artificial Intelligence? - a precise definition

  2. Symbolic AI (GOFAI) - rule-based reasoning

  3. Non-Symbolic AI - statistical learning & neural networks

  4. Control Theory as the “First AI” - feedback systems

  5. PID controllers - the math of classical control

  6. From control theory → reinforcement learning

1. What is Artificial Intelligence?

A precise definition and the landscape of AI approaches

AI is the development of systems that perform tasks which, when done by humans, would require intelligence. This deceptively simple definition encompasses everything from a thermostat (which makes decisions based on sensory input) to a large language model (which generates coherent text from a prompt). The field has historically organized itself around three levels of capability:

| Level | Name | Description | Status |
|-------|------|-------------|--------|
| 1 | Narrow AI (ANI) | Solves one specific task (chess, image classification) | Achieved |
| 2 | General AI (AGI) | Performs any intellectual task a human can | In progress |
| 3 | Super AI (ASI) | Surpasses human intelligence in all domains | Theoretical |

Two fundamental paradigms have competed to achieve AI, each with its own mathematical foundations:

  • Symbolic AI: encode intelligence as rules and logic (rooted in formal logic and discrete mathematics)

  • Non-Symbolic AI: learn intelligence from data (rooted in statistics, optimization, and linear algebra)

Understanding both paradigms – and how modern systems increasingly hybridize them – is essential for appreciating why the mathematical foundations covered in this curriculum matter.

2. Symbolic AI (GOFAI: Good Old-Fashioned AI)

Symbolic AI represents knowledge as symbols, rules, and logic: the way humans explicitly describe reasoning.

Key idea: If you can write down the rules, you can build an intelligent system.

Examples:

  • Expert systems (1970s-80s): MYCIN (medical diagnosis), DENDRAL (chemistry)

  • Prolog programs

  • Decision trees with hand-crafted features

  • Chess engines (Deep Blue used hybrid symbolic + search)

Strength: Interpretable, provably correct for defined domains

Weakness: The knowledge acquisition bottleneck: you can’t write rules for everything

# --- Symbolic AI: A simple expert system for loan approval ---

def expert_loan_system(income, credit_score, debt_ratio, employment_years):
    """
    Hand-crafted rule-based expert system.
    Represents symbolic AI: explicit if-then rules.
    """
    reasons = []
    
    # Rule 1: Minimum income
    if income < 30000:
        reasons.append("Income below minimum ($30k)")
    
    # Rule 2: Credit score threshold
    if credit_score < 620:
        reasons.append(f"Credit score {credit_score} below minimum (620)")
    
    # Rule 3: Debt-to-income ratio
    if debt_ratio > 0.43:
        reasons.append(f"Debt ratio {debt_ratio:.0%} exceeds limit (43%)")
    
    # Rule 4: Employment stability
    if employment_years < 2:
        reasons.append(f"Employment history {employment_years}y below minimum (2y)")
    
    # Composite rule: exceptional credit overrides income rule
    if credit_score >= 750 and income >= 25000:
        reasons = [r for r in reasons if "Income" not in r]
    
    approved = len(reasons) == 0
    return approved, reasons


# Test applicants
applicants = [
    {"name": "Alice",   "income": 75000, "credit_score": 720, "debt_ratio": 0.30, "employment_years": 5},
    {"name": "Bob",     "income": 28000, "credit_score": 580, "debt_ratio": 0.50, "employment_years": 1},
    {"name": "Charlie", "income": 26000, "credit_score": 760, "debt_ratio": 0.35, "employment_years": 3},
]

print("Expert System - Loan Approval Decisions")
print("=" * 50)
for a in applicants:
    approved, reasons = expert_loan_system(
        a["income"], a["credit_score"], a["debt_ratio"], a["employment_years"]
    )
    status = "✅ APPROVED" if approved else "❌ DENIED"
    print(f"\n{a['name']}: {status}")
    if reasons:
        for r in reasons:
            print(f"  - {r}")

3. Non-Symbolic AI: Statistical & Neural Learning

Non-symbolic AI learns rules from data rather than having them hand-coded.

Key idea: Given enough examples, a statistical model discovers its own patterns.

Three generations:

  1. Statistical ML (1980s-2000s): SVMs, decision trees, random forests

  2. Deep Learning (2012-present): CNNs, RNNs, Transformers

  3. Foundation Models (2020-present): GPT, BERT, CLIP, Gemini

Strength: Handles perceptual problems (vision, speech, language) that are impossible to rule-encode

Weakness: Requires large data, black-box, can fail unpredictably

# --- Non-Symbolic AI: Same loan problem, learned from data ---
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

# Generate synthetic training data
np.random.seed(42)
n = 1000

income        = np.random.normal(55000, 20000, n).clip(15000, 150000)
credit_score  = np.random.normal(680, 80, n).clip(400, 850)
debt_ratio    = np.random.uniform(0.1, 0.6, n)
emp_years     = np.random.exponential(4, n).clip(0, 20)

# True outcome (based on similar logic to expert system, plus noise)
score = (income/100000)*0.3 + (credit_score/850)*0.4 + (1-debt_ratio)*0.2 + (emp_years/20)*0.1
approved = (score + np.random.normal(0, 0.05, n)) > 0.5

X = np.column_stack([income, credit_score, debt_ratio, emp_years])
y = approved.astype(int)

# Train a logistic regression model
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
model = LogisticRegression()
model.fit(X_scaled, y)

print("Non-Symbolic AI - Learned Model Coefficients")
print("(model discovered these from data, no rules were hard-coded)")
print()
features = ['Income', 'Credit Score', 'Debt Ratio', 'Employment Years']
for feat, coef in zip(features, model.coef_[0]):
    direction = "↑ more → approved" if coef > 0 else "↑ more → denied"
    print(f"  {feat:20s}: coef={coef:+.3f}  ({direction})")

# Predict same applicants
print("\nPredictions on same applicants:")
for a in applicants:
    x = scaler.transform([[a['income'], a['credit_score'], a['debt_ratio'], a['employment_years']]])
    prob = model.predict_proba(x)[0][1]
    status = "✅ APPROVED" if prob > 0.5 else "❌ DENIED"
    print(f"  {a['name']}: {status}  (probability={prob:.2%})")

4. Control Theory: The “First AI”

Control theory studies how to make a system behave the way you want using feedback loops.

Developed in the 1940s-50s (Norbert Wiener’s Cybernetics, 1948), control theory was producing intelligent, adaptive behavior before AI was formalized as a field.

       Goal (setpoint)
           │
           ▼
   ┌── [Controller] ──→ [System/Plant] ──→ Output
   │         ↑                                │
   │      Error                               │
   └──── [Sensor] ←──────────────────────────┘
               (Feedback)

Examples: Cruise control, thermostat, autopilot, insulin pump, rocket guidance
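In equation form, the classic PID control law (discretized in the code cell below) combines three responses to the error \(e(t)\), defined as setpoint minus measured output:

\[
u(t) = K_p\,e(t) \;+\; K_i \int_0^t e(\tau)\,d\tau \;+\; K_d\,\frac{de(t)}{dt}
\]

The proportional term reacts to the present error, the integral term to its accumulated past, and the derivative term to its predicted future.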

Connection to ML: Gradient descent is a feedback control loop:

  • Setpoint = 0 loss

  • Error = current loss

  • Controller = optimizer (SGD, Adam)

  • System = neural network

# --- PID Controller: The classic control theory algorithm ---
import matplotlib.pyplot as plt

class PIDController:
    """
    Proportional-Integral-Derivative (PID) controller.
    The foundational algorithm of control theory.
    
    Output = Kp*error + Ki*integral(error) + Kd*derivative(error)
    """
    def __init__(self, Kp, Ki, Kd, dt=0.1):
        self.Kp = Kp   # Proportional gain: react to current error
        self.Ki = Ki   # Integral gain: react to accumulated past error
        self.Kd = Kd   # Derivative gain: react to rate of error change
        self.dt = dt
        self.integral = 0
        self.prev_error = 0
    
    def step(self, setpoint, measured_value):
        error = setpoint - measured_value
        
        # P: proportional to current error
        P = self.Kp * error
        
        # I: integral of past errors (eliminates steady-state error)
        self.integral += error * self.dt
        I = self.Ki * self.integral
        
        # D: derivative (predicts future error, dampens oscillation)
        derivative = (error - self.prev_error) / self.dt
        D = self.Kd * derivative
        
        self.prev_error = error
        return P + I + D


# Simulate a temperature control system
# Target: reach 70°C from 20°C, system has thermal inertia
def simulate_temperature_control(Kp, Ki, Kd, setpoint=70.0, steps=200):
    pid = PIDController(Kp, Ki, Kd, dt=0.1)
    temp = 20.0
    temps = [temp]
    
    for _ in range(steps):
        control = pid.step(setpoint, temp)
        # System dynamics: temperature changes proportionally to control input
        # but with inertia (slow response) and cooling (drag)
        temp += 0.1 * (control * 0.5 - (temp - 20) * 0.02)
        temps.append(temp)
    
    return temps

# Compare different PID tunings
configs = [
    ("P only (Kp=1.0)",         1.0, 0.0, 0.0, 'blue'),
    ("PD (Kp=1.0, Kd=0.5)",    1.0, 0.0, 0.5, 'green'),
    ("PID (Kp=1.0, Ki=0.1, Kd=0.5)", 1.0, 0.1, 0.5, 'red'),
]

t = np.arange(0, 20.1, 0.1)
fig, ax = plt.subplots(figsize=(12, 5))

ax.axhline(y=70, color='black', linestyle='--', linewidth=2, label='Setpoint (70°C)', alpha=0.7)
for label, Kp, Ki, Kd, color in configs:
    temps = simulate_temperature_control(Kp, Ki, Kd)
    ax.plot(t, temps, color=color, linewidth=2, label=label)

ax.set_xlabel('Time (seconds)', fontsize=12)
ax.set_ylabel('Temperature (°C)', fontsize=12)
ax.set_title('PID Controller: Temperature Control', fontsize=14, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_ylim(15, 85)
plt.tight_layout()
plt.show()

print("\nPID components:")
print("  P (Proportional): Reacts to current error - fast but may overshoot")
print("  I (Integral):     Eliminates steady-state error - fixes offset")
print("  D (Derivative):   Predicts future error - dampens oscillations")

5. From Control Theory to Machine Learning

The mathematical bridge between feedback systems and neural network training

The parallels between control theory and machine learning are not mere analogy – they reflect a deep mathematical kinship. Both fields solve the same fundamental problem: given a system with tunable parameters, how do you adjust those parameters to minimize the discrepancy between desired and actual behavior? The terminology differs, but the equations are strikingly similar.

| Control Theory | Machine Learning |
|----------------|------------------|
| Setpoint / Reference | Target label / desired output |
| Error signal | Loss function |
| Controller | Optimizer (SGD, Adam) |
| Plant / System | Neural network |
| Feedback loop | Backpropagation |
| Stability analysis | Convergence proofs |
| PID gains (Kp, Ki, Kd) | Learning rate, momentum, \(\beta_1\), \(\beta_2\) |
| Integral windup | Gradient accumulation (Adam) |
| Derivative kick | Momentum in optimizers |
The PID controller’s integral term accumulates past errors to eliminate steady-state offset, much like Adam’s first moment \(m_t\) accumulates gradient history. The derivative term anticipates future error by measuring the rate of change, analogous to how momentum dampens oscillations during training. Even modern concepts like learning rate schedules (warm-up, cosine annealing) mirror the gain scheduling techniques that control engineers have used for decades. Reinforcement learning makes this connection explicit: the Bellman equation in RL is a discrete-time version of control theory’s Hamilton-Jacobi-Bellman equation.
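The integral-term/first-moment parallel can be made concrete. A minimal sketch (illustrative, not from the chapter; the function names are hypothetical): both maintain a running accumulation of the error signal, but Adam's exponential weighting keeps the accumulation bounded, while a plain PID integral keeps growing until the error vanishes.

```python
def pid_integral_step(integral, error, Ki, dt=0.1):
    """PID integral term: unweighted accumulation of past errors."""
    integral += error * dt
    return integral, Ki * integral

def adam_first_moment_step(m, grad, beta1=0.9):
    """Adam's first moment: exponentially weighted accumulation of gradients."""
    m = beta1 * m + (1 - beta1) * grad
    return m, m  # (full Adam also bias-corrects and divides by the second moment)

# Feed both accumulators the same constant "error" signal
integral, m = 0.0, 0.0
for _ in range(50):
    integral, I_term = pid_integral_step(integral, error=1.0, Ki=0.1)
    m, m_term = adam_first_moment_step(m, grad=1.0)

print(f"PID integral after 50 steps: {integral:.2f}  (grows without bound)")
print(f"Adam first moment after 50:  {m:.4f}  (saturates near the signal, 1.0)")
```

This bounded-versus-unbounded accumulation is one reason the "integral windup" row in the table above is only an approximate analogy.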

# --- Gradient Descent as a Control Loop ---
# Training a simple model IS running a PID-like feedback loop

# Simple 1D loss landscape: L(w) = (w - 3)^2
# True minimum at w = 3

def loss(w):
    return (w - 3.0) ** 2

def grad_loss(w):
    return 2 * (w - 3.0)

# Compare: plain gradient descent vs momentum (more like PID)
def gradient_descent(w0=10.0, lr=0.1, steps=30):
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - lr * grad_loss(w)
        history.append(w)
    return history

def gradient_descent_momentum(w0=10.0, lr=0.1, beta=0.9, steps=30):
    w = w0
    v = 0  # velocity (integral term)
    history = [w]
    for _ in range(steps):
        v = beta * v + (1 - beta) * grad_loss(w)  # exponential moving average
        w = w - lr * v
        history.append(w)
    return history

gd = gradient_descent()
gd_mom = gradient_descent_momentum()

fig, axes = plt.subplots(1, 2, figsize=(13, 4))

# Parameter trajectory
axes[0].plot(gd, 'bo-', markersize=4, label='Plain GD')
axes[0].plot(gd_mom, 'rs-', markersize=4, label='GD + Momentum')
axes[0].axhline(y=3, color='green', linestyle='--', label='Optimal w=3')
axes[0].set_xlabel('Step')
axes[0].set_ylabel('Parameter w')
axes[0].set_title('Parameter Trajectory')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Loss trajectory
axes[1].semilogy([loss(w) for w in gd], 'bo-', markersize=4, label='Plain GD')
axes[1].semilogy([loss(w) for w in gd_mom], 'rs-', markersize=4, label='GD + Momentum')
axes[1].set_xlabel('Step')
axes[1].set_ylabel('Loss (log scale)')
axes[1].set_title('Loss Convergence')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Gradient descent = feedback control:")
print("  Setpoint: w* = 3.0")
print(f"  Final GD parameter: w = {gd[-1]:.6f}")
print(f"  Final Momentum parameter: w = {gd_mom[-1]:.6f}")

6. Symbolic vs Non-Symbolic: A Modern Perspective

Today’s most powerful AI systems are hybrids:

| System | Symbolic component | Non-symbolic component |
|--------|--------------------|------------------------|
| AlphaGo | Tree search (MCTS) | Value/policy networks |
| GPT + tools | Tool-use reasoning (code execution) | Transformer next-token prediction |
| AlphaCode | Test-based filtering of candidate programs | LLM code generation |
| Theorem provers (Lean+AI) | Formal proof system | Neural proof search |

The trend: Non-symbolic AI (LLMs) is increasingly incorporating symbolic reasoning through chain-of-thought, tool use, and formal verification.
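The generate-and-filter pattern behind systems like AlphaCode can be sketched in a few lines. This is a toy illustration: the hard-coded candidate strings stand in for a neural generator, and the symbolic half is an ordinary test suite that keeps only candidates satisfying hard constraints.

```python
# Hypothetical candidates a neural generator might propose (as source strings)
candidates = [
    "def add(a, b): return a - b",   # wrong
    "def add(a, b): return a + b",   # correct
    "def add(a, b): return a * b",   # wrong
]

def passes_tests(src):
    """Symbolic component: execute the candidate and check hard constraints."""
    ns = {}
    try:
        exec(src, ns)
        return ns["add"](2, 3) == 5 and ns["add"](-1, 1) == 0
    except Exception:
        return False

survivors = [c for c in candidates if passes_tests(c)]
print(f"{len(survivors)} of {len(candidates)} candidates pass the symbolic filter")
```

The neural half supplies breadth (many plausible candidates); the symbolic half supplies correctness guarantees the network alone cannot.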

Summary

  • Symbolic AI: explicit rules, interpretable, brittle for perception tasks

  • Non-symbolic AI: learned from data, powerful for perception, less interpretable

  • Control theory pioneered feedback learning before ML - same math, different framing

  • PID controller: P=react to error, I=fix accumulated error, D=predict future error

  • Gradient descent is a control feedback loop: error = loss, controller = optimizer

Exercises

  1. Modify the expert loan system to add a new rule: if both income > 100k AND credit_score > 750, auto-approve regardless of other factors.

  2. Which kind of AI (symbolic or non-symbolic) would you use for: (a) spam detection, (b) medical diagnosis requiring explanation, (c) real-time language translation? Justify.

  3. In the PID simulation, what happens when you increase Kd (derivative gain) too much? Run the code and observe.

  4. Map gradient descent to the PID framework: what corresponds to P, I, and D in a standard SGD update with momentum?

  5. Reinforcement learning (RL) is often described as control theory + ML. Research: what is the “Bellman equation” and how does it relate to control theory’s Hamilton-Jacobi-Bellman equation?