# Phase 17: Debugging & Troubleshooting (Start Here)

Diagnose and fix AI system failures systematically, from data issues to slow inference to hallucinating models.

## Why This Phase Matters

The large majority of AI project failures are not model failures: they are data issues, evaluation mistakes, or infrastructure problems. This phase teaches a systematic debugging mindset for tracking them down.

## Notebooks in This Phase

| Notebook | Topic |
| --- | --- |
| `01_debugging_workflow.ipynb` | Systematic AI debugging methodology |
| `02_data_issues.ipynb` | Data leakage, class imbalance, drift detection |
| `03_performance_profiling.ipynb` | Profiling slow code, CUDA bottlenecks, memory |
| `04_model_debugging.ipynb` | Overfitting, underfitting, gradient issues |
| `05_error_analysis.ipynb` | Confusion matrices, failure-mode analysis |

## Common AI Bugs Taxonomy

| Category | Examples |
| --- | --- |
| Data bugs | Train/test leakage, label noise, class imbalance |
| Training bugs | Wrong loss function, learning rate too high/low, bad batch size |
| Evaluation bugs | Wrong metric, leaky evaluation, benchmark overfitting |
| Inference bugs | Wrong preprocessing, tokenization mismatch |
| LLM-specific | Hallucination, context overflow, prompt injection |
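As a concrete taste of the data-bug category, the sketch below (illustrative only: plain Python on synthetic numbers, not from the notebooks) shows the classic train/test leakage mistake of computing preprocessing statistics on the full dataset before splitting, so the test split influences what the model sees during training.

```python
import random
import statistics

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(100)]
train, test = data[:80], data[80:]

# BUG: normalization statistics computed on the FULL dataset.
# The test split has leaked into the mean used at training time.
leaky_mean = statistics.mean(data)

# FIX: fit preprocessing on the training split only, then reuse
# those same statistics when transforming the test split.
clean_mean = statistics.mean(train)

print(f"leaky mean={leaky_mean:.4f}, clean mean={clean_mean:.4f}")
```

The two means differ, which is exactly the point: any gap between them is test-set information silently baked into training. The same rule applies to scalers, vocabulary building, and feature selection.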

## Prerequisites

- Machine learning basics
- Model evaluation (Phase 16)

## Learning Path

```
01_debugging_workflow.ipynb      ← Start here
02_data_issues.ipynb
03_performance_profiling.ipynb
04_model_debugging.ipynb
05_error_analysis.ipynb
```
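To make the "learning rate too high/low" training bug from the taxonomy tangible before you open the notebooks, here is a minimal gradient-descent sketch on the toy objective f(w) = w² (my own example, not code from this repo): a step size past the stability threshold makes the parameter diverge instead of converge.

```python
def train(lr, steps=50, w0=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # update rule: w <- w - lr * f'(w)
    return w

# Each step multiplies w by (1 - 2*lr), so convergence needs |1 - 2*lr| < 1.
good = train(lr=0.1)  # |1 - 0.2| = 0.8 < 1: w shrinks toward the minimum at 0
bad = train(lr=1.1)   # |1 - 2.2| = 1.2 > 1: each update overshoots and w blows up

print(f"lr=0.1 -> w={good:.2e}, lr=1.1 -> w={bad:.2e}")
```

Real models rarely diverge this cleanly, but the symptom is the same: a loss curve that oscillates or explodes usually means the step size, not the architecture, is the bug.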