Chapter 4: Classification
Overview
Classification predicts a qualitative (categorical) response rather than a quantitative one.
Examples:
Email: spam or not spam
Medical: disease or no disease
Customer: will default or not
Transaction: fraudulent or legitimate
Key Methods:
Logistic Regression: Models P(Y=1|X) using logistic function
Linear Discriminant Analysis (LDA): Assumes Gaussian distributions, linear boundary
Quadratic Discriminant Analysis (QDA): Assumes Gaussian distributions, quadratic boundary
Naive Bayes: Assumes feature independence
K-Nearest Neighbors (KNN): Non-parametric, local averaging
Why Not Linear Regression?
For binary outcomes (0/1), linear regression:
Can predict values < 0 or > 1
No probabilistic interpretation
Assumes ordered categories for multi-class
Classification methods provide:
Probabilities: P(Y = k | X)
Proper handling of categorical outcomes
Better decision boundaries
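To see the first point concretely, here is a minimal sketch (synthetic data; the threshold of 5 and all values are hypothetical) showing linear regression producing "probabilities" outside [0, 1] for a binary outcome:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny synthetic binary outcome: y is 0/1, x is a single feature
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = (x.ravel() > 5).astype(int)

lin = LinearRegression().fit(x, y)

# Predictions at the extremes fall outside [0, 1],
# so they cannot be interpreted as probabilities
preds = lin.predict(np.array([[-5.0], [15.0]]))
print(preds)
```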
4.1 Logistic Regression
The Logistic Function
Instead of modeling Y directly, model the probability:
$$p(X) = P(Y=1 \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
Log-Odds (Logit)
$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$
Key Properties:
Output always between 0 and 1
S-shaped (sigmoid) curve
Linear in the log-odds
Interpretation
\(\beta_1 > 0\): Increasing X increases the probability
\(\beta_1 < 0\): Increasing X decreases the probability
\(e^{\beta_1}\): Odds ratio (the multiplicative effect of a one-unit increase in X on the odds)
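A quick numeric check of the odds-ratio interpretation (a minimal sketch; the coefficient values are hypothetical):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps log-odds to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

beta0, beta1 = -2.0, 0.05   # hypothetical coefficients
x = 40.0

p = sigmoid(beta0 + beta1 * x)                 # P(Y=1 | X=40)
odds = p / (1 - p)                             # odds at X=40
p_next = sigmoid(beta0 + beta1 * (x + 1))
odds_next = p_next / (1 - p_next)              # odds at X=41

# A one-unit increase in X multiplies the odds by exactly e^{beta1}
print(odds_next / odds, np.exp(beta1))
```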
Maximum Likelihood Estimation
Coefficients are estimated by maximizing the likelihood function:
$$L(\beta_0, \beta_1) = \prod_{i:y_i=1} p(x_i) \prod_{i:y_i=0} \left(1 - p(x_i)\right)$$
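The likelihood above is usually maximized on the log scale. A minimal sketch of evaluating the log-likelihood for a tiny hand-made dataset (all values hypothetical):

```python
import numpy as np

def log_likelihood(beta0, beta1, x, y):
    """Bernoulli log-likelihood for a simple logistic model."""
    p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])

# Coefficients that separate the data well score higher
print(log_likelihood(0.0, 0.0, x, y))    # baseline: 4*log(0.5) ≈ -2.7726
print(log_likelihood(-5.0, 2.0, x, y))   # better fit → larger (less negative)
```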
# Imports used throughout this chapter
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_curve,
                             roc_auc_score, classification_report)

# Generate binary classification data
np.random.seed(42)
n = 500
# Feature: credit score (300-850)
credit_score = np.random.uniform(300, 850, n)
# True model: P(default) = logistic(10 - 0.015*score)
# Higher credit score → lower default probability
true_beta0 = 10
true_beta1 = -0.015
log_odds = true_beta0 + true_beta1 * credit_score
prob_default = 1 / (1 + np.exp(-log_odds))
# Generate binary outcomes
default = np.random.binomial(1, prob_default)
df_default = pd.DataFrame({
'CreditScore': credit_score,
'Default': default
})
print("Credit Default Dataset")
print(f"\nTotal observations: {n}")
print(f"Default rate: {default.mean():.2%}")
print(f"\nClass distribution:")
print(df_default['Default'].value_counts())
print(f"\nFirst few rows:")
print(df_default.head(10))
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Raw data
axes[0].scatter(credit_score[default==0], default[default==0],
alpha=0.3, label='No Default', s=30)
axes[0].scatter(credit_score[default==1], default[default==1],
alpha=0.3, label='Default', s=30)
axes[0].set_xlabel('Credit Score')
axes[0].set_ylabel('Default (0=No, 1=Yes)')
axes[0].set_title('Raw Classification Data')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# True probability curve
score_range = np.linspace(300, 850, 200)
true_prob = 1 / (1 + np.exp(-(true_beta0 + true_beta1 * score_range)))
axes[1].plot(score_range, true_prob, 'r-', linewidth=3, label='True P(Default)')
axes[1].scatter(credit_score, default, alpha=0.2, s=20)
axes[1].axhline(y=0.5, color='k', linestyle='--', alpha=0.5, label='Decision boundary')
axes[1].set_xlabel('Credit Score')
axes[1].set_ylabel('P(Default | Credit Score)')
axes[1].set_title('Logistic Function (True Model)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Fit logistic regression
X = df_default[['CreditScore']].values
y = df_default['Default'].values
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Fit model
logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)
beta0_hat = logistic_model.intercept_[0]
beta1_hat = logistic_model.coef_[0][0]
print("Logistic Regression Results\n")
print(f"{'Parameter':<20} {'True Value':<15} {'Estimated':<15} {'Difference'}")
print("="*65)
print(f"{'β₀ (Intercept)':<20} {true_beta0:>12.4f} {beta0_hat:>12.4f} {abs(beta0_hat-true_beta0):>10.4f}")
print(f"{'β₁ (CreditScore)':<20} {true_beta1:>12.6f} {beta1_hat:>12.6f} {abs(beta1_hat-true_beta1):>10.6f}")
# Predictions
y_pred_prob = logistic_model.predict_proba(X_test)[:, 1]
y_pred = logistic_model.predict(X_test)
# Metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print("\nModel Performance on Test Set:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f} (Of predicted defaults, how many were correct?)")
print(f"Recall: {recall:.4f} (Of actual defaults, how many did we catch?)")
print(f"F1-Score: {f1:.4f} (Harmonic mean of precision and recall)")
# Visualize fitted model
plt.figure(figsize=(14, 5))
plt.subplot(1, 2, 1)
score_plot = np.linspace(300, 850, 200).reshape(-1, 1)
prob_plot = logistic_model.predict_proba(score_plot)[:, 1]
true_prob_plot = 1 / (1 + np.exp(-(true_beta0 + true_beta1 * score_plot.flatten())))
plt.plot(score_plot, true_prob_plot, 'g--', linewidth=2, label='True P(Default)', alpha=0.7)
plt.plot(score_plot, prob_plot, 'r-', linewidth=2, label='Estimated P(Default)')
plt.scatter(X_test, y_test, alpha=0.3, s=30, label='Test data')
plt.axhline(y=0.5, color='k', linestyle='--', alpha=0.5)
plt.xlabel('Credit Score')
plt.ylabel('P(Default)')
plt.title('Fitted Logistic Regression')
plt.legend()
plt.grid(True, alpha=0.3)
# Decision boundary
plt.subplot(1, 2, 2)
decision_boundary = -beta0_hat / beta1_hat
true_boundary = -true_beta0 / true_beta1
plt.scatter(X_test[y_pred==0], y_test[y_pred==0], c='blue', alpha=0.5,
s=50, label='Predicted: No Default')
plt.scatter(X_test[y_pred==1], y_test[y_pred==1], c='red', alpha=0.5,
s=50, label='Predicted: Default')
plt.axvline(x=decision_boundary, color='r', linestyle='-', linewidth=2,
label=f'Decision boundary ({decision_boundary:.0f})')
plt.axvline(x=true_boundary, color='g', linestyle='--', linewidth=2,
label=f'True boundary ({true_boundary:.0f})')
plt.xlabel('Credit Score')
plt.ylabel('Default')
plt.title('Classification with Decision Boundary')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("\nInterpretation:")
print(f"  • Decision boundary at score = {decision_boundary:.0f}")
print(f"  • Below {decision_boundary:.0f}: Predict default")
print(f"  • Above {decision_boundary:.0f}: Predict no default")
print(f"  • Odds ratio = e^({beta1_hat:.6f}) = {np.exp(beta1_hat):.6f}")
print(f"  • 10-point score increase multiplies odds by {np.exp(10*beta1_hat):.4f}")
4.2 Classification Metrics
Confusion Matrix
              Predicted
              Negative  Positive
Actual  Neg      TN        FP
        Pos      FN        TP
Key Metrics
Accuracy = (TP + TN) / Total
Precision = TP / (TP + FP) → Of predicted positives, how many were correct?
Recall (Sensitivity) = TP / (TP + FN) → Of actual positives, how many were caught?
Specificity = TN / (TN + FP) → Of actual negatives, how many were caught?
F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
ROC Curve
Plot: True Positive Rate (Recall) vs. False Positive Rate (FPR)
FPR = FP / (FP + TN) = 1 - Specificity
AUC (Area Under the Curve): overall performance measure
AUC = 1.0: Perfect classifier
AUC = 0.5: Random guessing
AUC < 0.5: Worse than random
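AUC also has a useful probabilistic reading: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch checking this against sklearn (the scores below are hypothetical):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

# Pairwise comparison: fraction of (positive, negative) pairs
# where the positive is scored higher (ties count as 0.5)
pos = scores[y_true == 1]
neg = scores[y_true == 0]
pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
auc_manual = np.mean(pairs)

print(auc_manual, roc_auc_score(y_true, scores))
```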
# Confusion matrix and detailed metrics
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix\n")
print(" Predicted")
print(" No Default Default")
print(f"Actual No {cm[0,0]:>6} {cm[0,1]:>6} (TN, FP)")
print(f" Yes {cm[1,0]:>6} {cm[1,1]:>6} (FN, TP)")
TN, FP, FN, TP = cm[0,0], cm[0,1], cm[1,0], cm[1,1]
accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP) if (TP + FP) > 0 else 0
recall = TP / (TP + FN) if (TP + FN) > 0 else 0
specificity = TN / (TN + FP) if (TN + FP) > 0 else 0
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
print("\nDetailed Metrics:")
print(f"\nAccuracy: {accuracy:.4f} = ({TP}+{TN}) / {TP+TN+FP+FN}")
print(f"Precision: {precision:.4f} = {TP} / ({TP}+{FP}) [Of predicted defaults, {precision:.1%} were correct]")
print(f"Recall: {recall:.4f} = {TP} / ({TP}+{FN}) [Of actual defaults, caught {recall:.1%}]")
print(f"Specificity: {specificity:.4f} = {TN} / ({TN}+{FP}) [Of actual non-defaults, {specificity:.1%} correct]")
print(f"F1-Score: {f1:.4f}")
# ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
roc_auc = roc_auc_score(y_test, y_pred_prob)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Confusion matrix heatmap
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[0],
xticklabels=['No Default', 'Default'],
yticklabels=['No Default', 'Default'])
axes[0].set_ylabel('Actual')
axes[0].set_xlabel('Predicted')
axes[0].set_title(f'Confusion Matrix\nAccuracy: {accuracy:.2%}')
# ROC curve
axes[1].plot(fpr, tpr, linewidth=2, label=f'Logistic Regression (AUC = {roc_auc:.3f})')
axes[1].plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random Classifier (AUC = 0.5)')
axes[1].set_xlabel('False Positive Rate (1 - Specificity)')
axes[1].set_ylabel('True Positive Rate (Recall)')
axes[1].set_title('ROC Curve')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("\nROC-AUC Interpretation:")
print(f"  • AUC = {roc_auc:.3f} ({roc_auc*100:.1f}%)")
if roc_auc > 0.9:
    print("  • Excellent discrimination")
elif roc_auc > 0.8:
    print("  • Good discrimination")
elif roc_auc > 0.7:
    print("  • Fair discrimination")
else:
    print("  • Poor discrimination")
print("  • Model is much better than random guessing")
4.3 Linear Discriminant Analysis (LDA)
Bayes' Theorem Approach
$$P(Y=k \mid X=x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$$
where:
\(\pi_k\) = P(Y=k) → prior probability of class k
\(f_k(x)\) → density of X within class k
LDA Assumptions
Normal distributions: \(f_k(x) \sim N(\mu_k, \sigma^2)\)
Common variance: the same \(\sigma^2\) for all classes
Decision Rule
Assign to the class k that maximizes:
$$\delta_k(x) = x \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$$
This yields a linear decision boundary between classes.
When to Use LDA
Classes are well-separated
Normality is a reasonable assumption
Small sample sizes (more stable than logistic regression)
Multi-class problems (K > 2 classes)
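The one-dimensional discriminant score δ_k(x) above can be evaluated directly. A minimal sketch with hand-picked class means, shared variance, and priors (all values hypothetical):

```python
import numpy as np

def delta(x, mu, sigma2, prior):
    """1-D LDA discriminant score for one class."""
    return x * mu / sigma2 - mu**2 / (2 * sigma2) + np.log(prior)

mu0, mu1 = 2.0, 5.0        # class means
sigma2 = 1.0               # shared variance
pi0, pi1 = 0.5, 0.5        # equal priors

x = 3.0
d0 = delta(x, mu0, sigma2, pi0)
d1 = delta(x, mu1, sigma2, pi1)
print("assign to class", 0 if d0 > d1 else 1)

# With equal priors and shared variance, the boundary sits at the
# midpoint of the means: delta_0(x) = delta_1(x) => x = (mu0 + mu1) / 2
```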
# Generate 2D data for LDA visualization
np.random.seed(42)
n_per_class = 200
# Class 0: mean=(2, 2)
# Class 1: mean=(5, 5)
# Common covariance
mean0 = np.array([2, 2])
mean1 = np.array([5, 5])
cov = np.array([[1, 0.5], [0.5, 1]]) # Common covariance
X0 = np.random.multivariate_normal(mean0, cov, n_per_class)
X1 = np.random.multivariate_normal(mean1, cov, n_per_class)
X_lda = np.vstack([X0, X1])
y_lda = np.hstack([np.zeros(n_per_class), np.ones(n_per_class)])
# Split data
X_train_lda, X_test_lda, y_train_lda, y_test_lda = train_test_split(
X_lda, y_lda, test_size=0.3, random_state=42)
# Fit LDA
lda_model = LinearDiscriminantAnalysis()
lda_model.fit(X_train_lda, y_train_lda)
# Fit Logistic Regression for comparison
logistic_2d = LogisticRegression()
logistic_2d.fit(X_train_lda, y_train_lda)
# Predictions
y_pred_lda = lda_model.predict(X_test_lda)
y_pred_logistic = logistic_2d.predict(X_test_lda)
acc_lda = accuracy_score(y_test_lda, y_pred_lda)
acc_logistic = accuracy_score(y_test_lda, y_pred_logistic)
print("LDA vs Logistic Regression\n")
print(f"LDA Test Accuracy: {acc_lda:.4f}")
print(f"Logistic Test Accuracy: {acc_logistic:.4f}")
# Visualize decision boundaries
def plot_decision_boundary(model, X, y, ax, title):
    h = 0.1
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
    ax.scatter(X[y==0, 0], X[y==0, 1], c='blue', s=30, alpha=0.6, label='Class 0')
    ax.scatter(X[y==1, 0], X[y==1, 1], c='red', s=30, alpha=0.6, label='Class 1')
    ax.set_xlabel('X₁')
    ax.set_ylabel('X₂')
    ax.set_title(title)
    ax.legend()
    ax.grid(True, alpha=0.3)
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
plot_decision_boundary(lda_model, X_test_lda, y_test_lda, axes[0],
f'LDA Decision Boundary\nAccuracy: {acc_lda:.2%}')
plot_decision_boundary(logistic_2d, X_test_lda, y_test_lda, axes[1],
f'Logistic Regression Boundary\nAccuracy: {acc_logistic:.2%}')
plt.tight_layout()
plt.show()
print("\nObservations:")
print("  • Both produce linear boundaries")
print("  • LDA assumes Gaussian distributions with equal variance")
print("  • Logistic regression makes no distributional assumptions")
print("  • Similar performance when LDA assumptions are met")
4.4 Quadratic Discriminant Analysis (QDA)
Difference from LDA
LDA: assumes a common covariance matrix → linear boundary
QDA: allows a different covariance matrix per class → quadratic boundary
QDA Decision Function
$$\delta_k(x) = -\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) - \frac{1}{2}\log|\Sigma_k| + \log(\pi_k)$$
Quadratic in x → can model more complex boundaries
Bias-Variance Tradeoff
LDA: lower variance, higher bias (fewer parameters)
QDA: higher variance, lower bias (more parameters)
When to Use QDA
The decision boundary is clearly non-linear
Large training set (many parameters to estimate)
Classes have different variances
# Generate data with different covariances (QDA advantageous)
np.random.seed(42)
n_per_class = 200
mean0 = np.array([2, 2])
mean1 = np.array([5, 5])
# DIFFERENT covariances
cov0 = np.array([[1, 0], [0, 1]]) # Circular
cov1 = np.array([[3, 2], [2, 3]]) # Elliptical
X0_qda = np.random.multivariate_normal(mean0, cov0, n_per_class)
X1_qda = np.random.multivariate_normal(mean1, cov1, n_per_class)
X_qda = np.vstack([X0_qda, X1_qda])
y_qda = np.hstack([np.zeros(n_per_class), np.ones(n_per_class)])
# Split
X_train_qda, X_test_qda, y_train_qda, y_test_qda = train_test_split(
X_qda, y_qda, test_size=0.3, random_state=42)
# Fit models
lda_qda = LinearDiscriminantAnalysis()
qda_model = QuadraticDiscriminantAnalysis()
lda_qda.fit(X_train_qda, y_train_qda)
qda_model.fit(X_train_qda, y_train_qda)
# Predictions
acc_lda_qda = accuracy_score(y_test_qda, lda_qda.predict(X_test_qda))
acc_qda = accuracy_score(y_test_qda, qda_model.predict(X_test_qda))
print("LDA vs QDA on Non-Equal Covariance Data\n")
print(f"LDA Test Accuracy: {acc_lda_qda:.4f} (assumes equal covariance)")
print(f"QDA Test Accuracy: {acc_qda:.4f} (allows different covariances)")
print(f"\nImprovement: {(acc_qda - acc_lda_qda)*100:.1f} percentage points")
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
plot_decision_boundary(lda_qda, X_test_qda, y_test_qda, axes[0],
f'LDA (Linear Boundary)\nAccuracy: {acc_lda_qda:.2%}')
plot_decision_boundary(qda_model, X_test_qda, y_test_qda, axes[1],
f'QDA (Quadratic Boundary)\nAccuracy: {acc_qda:.2%}')
plt.tight_layout()
plt.show()
print("\nKey Insights:")
print("  • QDA captures the curved boundary between classes")
print("  • LDA is forced to use a straight line (suboptimal)")
print("  • QDA is more flexible but needs more data")
print("  • Use QDA when classes have different spreads/shapes")
4.5 Naive Bayes
The Assumption
Features are conditionally independent given the class:
$$P(X_1, X_2, \ldots, X_p \mid Y=k) = \prod_{j=1}^p P(X_j \mid Y=k)$$
Classification Rule
Assign to the class k that maximizes:
$$P(Y=k \mid X) \propto \pi_k \prod_{j=1}^p P(X_j \mid Y=k)$$
Pros
Very fast (simple calculations)
Works well with high-dimensional data
Performs surprisingly well even when the assumption is violated
Good for text classification
Cons
Independence assumption is often unrealistic
Can't model feature interactions
Probability estimates can be poor
# Compare all methods on the QDA dataset
models = {
'Logistic Regression': LogisticRegression(),
'LDA': LinearDiscriminantAnalysis(),
'QDA': QuadraticDiscriminantAnalysis(),
'Naive Bayes': GaussianNB(),
'KNN (k=5)': KNeighborsClassifier(n_neighbors=5)
}
results = {}
for name, model in models.items():
    model.fit(X_train_qda, y_train_qda)
    y_pred = model.predict(X_test_qda)
    acc = accuracy_score(y_test_qda, y_pred)
    results[name] = acc
print("Comparison of All Classification Methods\n")
print(f"{'Method':<25} {'Test Accuracy'}")
print("="*45)
best_acc = max(results.values())
for name, acc in sorted(results.items(), key=lambda x: x[1], reverse=True):
    print(f"{name:<25} {acc:>12.4f} {'(best)' if acc == best_acc else ''}")
# Visualize all decision boundaries
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.ravel()
for idx, (name, model) in enumerate(models.items()):
    plot_decision_boundary(model, X_test_qda, y_test_qda, axes[idx],
                           f'{name}\nAccuracy: {results[name]:.2%}')
# Summary in last subplot
axes[5].axis('off')
summary = "Method Selection Guide:\n\n"
summary += "Logistic: General purpose\n"
summary += "LDA: Equal variance, linear\n"
summary += "QDA: Different variance, curved\n"
summary += "Naive Bayes: Fast, high-D\n"
summary += "KNN: Non-parametric, local\n\n"
summary += f"Best here: QDA ({max(results.values()):.2%})"
axes[5].text(0.1, 0.5, summary, fontsize=11, verticalalignment='center',
bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
plt.tight_layout()
plt.show()
4.6 Multi-Class Classification
Extending to K > 2 Classes
Logistic Regression: One-vs-Rest or Softmax (Multinomial)
Softmax: \(P(Y=k \mid X) = \frac{e^{\beta_k^T X}}{\sum_{l=1}^K e^{\beta_l^T X}}\)
LDA/QDA: Natural extension
Compute the discriminant function for each class
Assign to the class with the highest score
KNN: Naturally handles multi-class
Majority vote among the k neighbors
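The softmax formula above can be checked directly. A minimal sketch with hypothetical linear scores \(\beta_k^T x\) for K=3 classes:

```python
import numpy as np

def softmax(z):
    """Softmax: turns K real-valued scores into probabilities summing to 1."""
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical beta_k^T x values
probs = softmax(scores)
print(probs, probs.sum())            # highest score -> highest probability
```

Subtracting the maximum score before exponentiating leaves the result unchanged (the factor cancels in the ratio) but avoids overflow for large scores.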
# Generate 3-class data
np.random.seed(42)
n_per_class = 150
# Three classes with different centers
mean_A = np.array([0, 0])
mean_B = np.array([4, 0])
mean_C = np.array([2, 3.5])
cov_shared = np.array([[0.8, 0], [0, 0.8]])
X_A = np.random.multivariate_normal(mean_A, cov_shared, n_per_class)
X_B = np.random.multivariate_normal(mean_B, cov_shared, n_per_class)
X_C = np.random.multivariate_normal(mean_C, cov_shared, n_per_class)
X_multi = np.vstack([X_A, X_B, X_C])
y_multi = np.hstack([np.zeros(n_per_class),
np.ones(n_per_class),
2*np.ones(n_per_class)])
# Split
X_train_multi, X_test_multi, y_train_multi, y_test_multi = train_test_split(
X_multi, y_multi, test_size=0.3, random_state=42)
# Train LDA for multi-class
lda_multi = LinearDiscriminantAnalysis()
lda_multi.fit(X_train_multi, y_train_multi)
# Predictions
y_pred_multi = lda_multi.predict(X_test_multi)
acc_multi = accuracy_score(y_test_multi, y_pred_multi)
print("Multi-Class Classification (3 Classes)\n")
print(f"Test Accuracy: {acc_multi:.4f}")
print(f"\nClassification Report:")
print(classification_report(y_test_multi, y_pred_multi,
target_names=['Class A', 'Class B', 'Class C']))
# Confusion matrix
cm_multi = confusion_matrix(y_test_multi, y_pred_multi)
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Decision boundaries
h = 0.1
x_min, x_max = X_multi[:, 0].min() - 1, X_multi[:, 0].max() + 1
y_min, y_max = X_multi[:, 1].min() - 1, X_multi[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
Z = lda_multi.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
axes[0].contourf(xx, yy, Z, alpha=0.3, cmap='viridis')
axes[0].scatter(X_test_multi[y_test_multi==0, 0], X_test_multi[y_test_multi==0, 1],
c='red', s=50, alpha=0.6, edgecolors='k', label='Class A')
axes[0].scatter(X_test_multi[y_test_multi==1, 0], X_test_multi[y_test_multi==1, 1],
c='blue', s=50, alpha=0.6, edgecolors='k', label='Class B')
axes[0].scatter(X_test_multi[y_test_multi==2, 0], X_test_multi[y_test_multi==2, 1],
c='green', s=50, alpha=0.6, edgecolors='k', label='Class C')
axes[0].set_xlabel('X₁')
axes[0].set_ylabel('X₂')
axes[0].set_title(f'Multi-Class LDA\nAccuracy: {acc_multi:.2%}')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Confusion matrix
sns.heatmap(cm_multi, annot=True, fmt='d', cmap='Blues', ax=axes[1],
xticklabels=['Class A', 'Class B', 'Class C'],
yticklabels=['Class A', 'Class B', 'Class C'])
axes[1].set_ylabel('Actual')
axes[1].set_xlabel('Predicted')
axes[1].set_title('Confusion Matrix (3 Classes)')
plt.tight_layout()
plt.show()
print("\nMulti-Class Notes:")
print("  • LDA naturally extends to K > 2 classes")
print("  • Creates K-1 linear discriminants")
print("  • The confusion matrix shows per-class performance")
print("  • Can identify which classes are confused")
Key Takeaways
1. Method Comparison

| Method | Boundary | Assumptions | Best For |
|---|---|---|---|
| Logistic | Linear | None | General purpose, interpretable |
| LDA | Linear | Normal, equal Σ | Well-separated, small n |
| QDA | Quadratic | Normal, different Σ | Curved boundary, larger n |
| Naive Bayes | Any | Independence | High-D, text, fast |
| KNN | Any | None | Non-parametric, irregular |
2. Key Metrics
Accuracy: Overall correctness
Precision: Minimize false positives
Recall: Minimize false negatives
F1-Score: Balance precision/recall
ROC-AUC: Overall discrimination ability
3. Decision Framework
Linear boundary?
Yes → Logistic or LDA
No → QDA or KNN
Small sample?
Yes → LDA (more stable)
No → Logistic or QDA
Need probabilities?
Yes → Logistic, LDA, QDA
No → KNN acceptable
High dimensions?
Yes → Naive Bayes, or Logistic with regularization
4. Practical Tips
Always plot the data first (2D/3D if possible)
Check class balance (if imbalanced → adjust metrics/threshold)
Use cross-validation for model selection
Consider the cost of errors (medical: high recall; spam filtering: high precision)
Use the ROC curve for threshold selection
Ensemble methods often beat a single classifier
5. Common Pitfalls
Using accuracy with imbalanced data
Ignoring class prior probabilities
Not standardizing features (for KNN, LDA)
Overfitting with QDA on small samples
Trusting Naive Bayes probability estimates
Next Chapter
Chapter 5: Resampling Methods
Cross-Validation (validation set, LOOCV, k-fold)
Bootstrap
Model selection and assessment
Practice Exercises
Exercise 1: Logistic Regression Interpretation
Given logistic model: \(\log(\text{odds}) = -5 + 0.02 \times \text{Age}\)
What is P(Y=1) for Age=30?
At what age is P(Y=1) = 0.5?
Interpret the coefficient 0.02
Exercise 2: Metrics Calculation
Given the confusion matrix:
          Pred-  Pred+
Actual-     80     20
Actual+     10     90
Calculate: Accuracy, Precision, Recall, Specificity, F1
Exercise 3: Method Selection
For each scenario, choose the best method and explain:
Email spam detection (10,000 word features)
Medical diagnosis (n=50, 5 features, classes overlap)
Customer churn (n=100,000, 20 features, non-linear)
Exercise 4: LDA vs QDA
When would you prefer LDA over QDA? Consider:
Sample size
Number of features
Decision boundary shape
Exercise 5: Implementation
Create synthetic 3-class data and compare:
Logistic Regression (one-vs-rest)
LDA
KNN (try k=1, 5, 10)
Which performs best? Why?