Debugging & Troubleshooting Challenges
Progressive challenges to master ML debugging skills.
Challenge 1: The Mystery Bug ⭐⭐
Difficulty: Beginner
Time: 30-45 minutes
Topic: Basic debugging workflow
Scenario
A junior data scientist wrote code that "works" but gives terrible results. Find and fix the bugs!
Code
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Shuffle (for randomness!)
np.random.shuffle(X_train)
np.random.shuffle(y_train)

# Train
model = LogisticRegression()
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.3f}")
```
Requirements
Identify all bugs (there are 2)
Explain why each is a problem
Fix the code
Document expected vs actual behavior
Verify fix with multiple random seeds
Success Criteria
✅ Bugs identified correctly
✅ Clear explanation of issues
✅ Fixed code achieves >85% accuracy
✅ Documented properly
Learning Objectives
Recognize data leakage patterns
Understand feature-label alignment
Apply debugging workflow
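If you get stuck, here is one possible fix as a hedged sketch: calling np.random.shuffle on X_train and y_train separately destroys the feature-label alignment, and shuffling after the split is redundant anyway since train_test_split already shuffles. A shared permutation keeps rows aligned if extra shuffling is wanted:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # fixed seed for reproducibility
)

# If extra shuffling is desired, apply ONE permutation to both arrays
rng = np.random.default_rng(0)
perm = rng.permutation(len(X_train))
X_train, y_train = X_train[perm], y_train[perm]

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.3f}")
```

Rerun with several values of `random_state` to confirm the fix holds across seeds, as the requirements ask.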
Challenge 2: Data Detective ⭐⭐⭐
Difficulty: Intermediate
Time: 1-2 hours
Topic: Data quality issues
Scenario
Your model performs well in training but fails in production. Investigate the dataset!
Dataset
Download: UCI Adult Income dataset
Requirements
Missing Value Analysis
Find all columns with missing data
Calculate missing percentage
Recommend handling strategy per column
Implement and compare 2 strategies
Outlier Detection
Apply Z-score method
Apply IQR method
Compare results
Visualize outliers
Decide on handling approach
Duplicate Detection
Find exact duplicates
Find near-duplicates (optional)
Analyze impact on model
Distribution Shift
Split data by time/group
Test for distribution shift
Quantify the shift
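As a hedged starting point for the outlier-detection step, the Z-score and IQR methods can be sketched as below; the toy column is a stand-in, so adapt the thresholds and column names to the Adult Income schema:

```python
import numpy as np
import pandas as pd

def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask: values more than `threshold` std devs from the mean."""
    z = (s - s.mean()) / s.std()
    return z.abs() > threshold

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Toy example: a roughly normal column with two injected outliers
rng = np.random.default_rng(42)
col = pd.Series(np.append(rng.normal(50, 5, 1000), [200.0, -100.0]))
print("Z-score flags:", zscore_outliers(col).sum())
print("IQR flags:    ", iqr_outliers(col).sum())
```

Comparing the two masks on the same column (how many points each flags, and where they disagree) is a natural way to satisfy the "compare results" requirement.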
Deliverables
Jupyter notebook with full analysis
Visualizations for each issue type
Before/after performance comparison
Summary report with recommendations
Success Criteria
✅ All data issues identified
✅ Multiple detection methods used
✅ Clear visualizations
✅ Quantified improvements
Bonus (⭐)
Detect label noise
Create automated data quality report
Build data validation pipeline
Challenge 3: Speed Demon ⭐⭐⭐
Difficulty: Intermediate
Time: 2-3 hours
Topic: Performance optimization
Scenario
Your ML pipeline is too slow for production. Optimize it!
Code (Slow Pipeline)
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Load data (1M rows)
data = pd.DataFrame(np.random.randn(1000000, 50))
target = np.random.randint(0, 2, 1000000)

# Slow preprocessing
processed = []
for idx, row in data.iterrows():
    normalized = (row - row.mean()) / row.std()
    processed.append(normalized)
data_processed = pd.DataFrame(processed)

# Slow predictions
model = RandomForestClassifier(n_estimators=100)
model.fit(data_processed[:800000], target[:800000])

predictions = []
for i in range(800000, 1000000):
    pred = model.predict(data_processed.iloc[i:i+1])[0]
    predictions.append(pred)
```
Requirements
Profile the code
Use cProfile
Identify top 3 bottlenecks
Calculate time percentage for each
Optimize
Vectorize preprocessing
Batch predictions
Use parallel processing
Cache where applicable
Benchmark
Measure before/after speed
Measure memory usage
Create comparison table
Verify results match
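To illustrate the vectorization requirement, the row-wise `iterrows` normalization can be replaced by a single broadcast expression. This sketch uses a small frame so it runs quickly; the same expression applies to the full 1M-row data:

```python
import numpy as np
import pandas as pd

# Small stand-in for the 1M-row frame
data = pd.DataFrame(np.random.randn(2000, 50))

# Slow: per-row Python loop (as in the original pipeline)
slow = pd.DataFrame(
    [(row - row.mean()) / row.std() for _, row in data.iterrows()]
)

# Fast: one vectorized expression using row-wise statistics
fast = data.sub(data.mean(axis=1), axis=0).div(data.std(axis=1), axis=0)

# Results are identical, so the optimization is safe
print(np.allclose(slow.to_numpy(), fast.to_numpy()))
```

The prediction loop yields to the same idea: `model.predict(data_processed.iloc[800000:])` in one batched call instead of 200,000 single-row calls.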
Success Criteria
✅ Minimum 10x speedup
✅ Memory usage reduced
✅ Results identical to original
✅ Code is readable and documented
Bonus (⭐)
Achieve 50x+ speedup
Use line_profiler
Create performance visualization
Optimize memory further
Challenge 4: Convergence Crisis ⭐⭐⭐⭐
Difficulty: Advanced
Time: 2-3 hours
Topic: Model debugging
Scenario
Your neural network won't converge. Debug and fix it!
Code
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(
    n_samples=10000, n_features=100,
    n_informative=50, random_state=42
)

# Don't scale features
model = MLPClassifier(
    hidden_layer_sizes=(100, 50),
    learning_rate_init=1.0,  # High learning rate
    max_iter=10,             # Too few iterations
    random_state=42,
    verbose=True
)
model.fit(X, y)
print(f"Score: {model.score(X, y):.3f}")
```
Requirements
Diagnose Issues
Identify all problems
Explain impact of each
Prioritize fixes
Fix Systematically
Scale features
Tune learning rate
Adjust iterations
Monitor convergence
Learning Curves
Plot training vs validation loss
Diagnose overfitting/underfitting
Apply regularization if needed
Hyperparameter Tuning
Try different architectures
Test learning rates: [0.001, 0.01, 0.1]
Create validation curves
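One plausible shape for the systematic fixes is sketched below: scale the features, use a modest learning rate, and give the optimizer enough iterations. The specific values are starting assumptions to tune, not the definitive answer:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=10000, n_features=100, n_informative=50, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = make_pipeline(
    StandardScaler(),                # scale features before the network
    MLPClassifier(
        hidden_layer_sizes=(100, 50),
        learning_rate_init=0.001,    # modest learning rate
        max_iter=300,                # enough iterations to converge
        early_stopping=True,         # stop when validation score stalls
        random_state=42,
    ),
)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```

Evaluating on a held-out test set (rather than the training set, as the original code does) is what makes the >85% success criterion meaningful.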
Success Criteria
✅ Model converges properly
✅ No convergence warnings
✅ Test accuracy >85%
✅ Learning curves look healthy
Bonus (⭐)
Implement early stopping
Use GridSearchCV
Compare with other models
Analyze gradient flow
Challenge 5: Error Analyzer ⭐⭐⭐⭐
Difficulty: Advanced
Time: 3-4 hours
Topic: Error analysis
Scenario
Your model has 90% accuracy but fails on critical cases. Analyze and improve!
Dataset
Use MNIST digits or similar multi-class dataset
Requirements
Confusion Matrix Analysis
Generate confusion matrix
Identify top 5 confused pairs
Visualize normalized matrix
Explain patterns
Per-Class Performance
Calculate precision/recall/F1 per class
Identify worst 3 classes
Analyze why they perform poorly
Propose class-specific fixes
Failure Case Analysis
Collect 20+ failure examples
Categorize error types
Visualize failure cases
Find common patterns
Confidence Analysis
Plot confidence distribution
Separate correct/incorrect
Find high-confidence errors
Create calibration curve
Action Plan
Prioritize improvements
Estimate impact
Propose data collection strategy
Suggest model changes
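The confusion-matrix step can be sketched as follows on sklearn's small digits dataset (a stand-in for full MNIST; the classifier choice is an assumption):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)

# Off-diagonal entries are confusions; find the most-confused pair
off = cm.copy()
np.fill_diagonal(off, 0)
i, j = np.unravel_index(off.argmax(), off.shape)
print(f"Most confused: true {i} predicted as {j} ({off[i, j]} times)")

# Per-class precision/recall/F1 for the worst-class analysis
print(classification_report(y_test, y_pred))
```

Sorting the flattened off-diagonal matrix gives the top 5 confused pairs the requirements ask for.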
Deliverables
Complete error analysis report
Visualizations for all analyses
Categorized failure cases
Detailed improvement roadmap
Success Criteria
✅ Comprehensive confusion matrix analysis
✅ All classes analyzed
✅ Failure patterns identified
✅ Actionable recommendations
Bonus (⭐)
Implement one improvement
Show before/after comparison
Create error monitoring dashboard
Challenge 6: The Production Mystery ⭐⭐⭐⭐⭐
Difficulty: Expert
Time: 4-6 hours
Topic: Real-world debugging
Scenario
Your model works perfectly in development but fails in production. Why?
Given Information
Training accuracy: 95%
Test accuracy: 94%
Production accuracy: 65% (after 1 month)
No code changes were made
Different data source in production
Requirements
Hypothesis Generation
List 5+ possible causes
Rank by likelihood
Plan investigation steps
Distribution Shift Analysis
Compare train vs production distributions
Statistical tests (K-S, chi-square)
Visualize differences
Quantify shift magnitude
Feature Drift Detection
Monitor feature statistics
Detect out-of-range values
Identify concept drift
Track label distribution
Root Cause Analysis
Investigate data pipeline
Check preprocessing consistency
Validate assumptions
Document findings
Solutions
Propose fixes
Implement monitoring
Create retraining strategy
Build alerting system
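The Kolmogorov-Smirnov step of the distribution shift analysis can be sketched like this, with synthetic stand-ins for one training feature and its drifted production counterpart:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(0.0, 1.0, 5000)   # development distribution
prod_feature = rng.normal(0.5, 1.2, 5000)    # shifted production sample

# Two-sample K-S test: statistic quantifies the shift,
# p-value tests "drawn from the same distribution"
stat, p_value = ks_2samp(train_feature, prod_feature)
print(f"K-S statistic: {stat:.3f}, p-value: {p_value:.2e}")

if p_value < 0.01:
    print("Distribution shift detected")
```

Running this per feature, and a chi-square test on each categorical column, covers the two statistical tests named in the requirements.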
Deliverables
Investigation report
Distribution analysis
Monitoring dashboard design
Retraining pipeline proposal
Documentation for ops team
Success Criteria
✅ Root cause identified
✅ Distribution shift quantified
✅ Solution implemented
✅ Monitoring in place
✅ Future prevention strategy
Bonus (⭐⭐)
Implement automated retraining
Create A/B testing framework
Build model versioning system
Deploy monitoring dashboard
Challenge 7: Debug the Debugger ⭐⭐⭐⭐⭐
Difficulty: Expert
Time: 5-8 hours
Topic: Comprehensive debugging
Scenario
Build a comprehensive debugging toolkit for ML pipelines!
Requirements
Automated Bug Detection
```python
class MLPipelineDebugger:
    def check_data_leakage(self, pipeline):
        # Detect common leakage patterns
        pass

    def check_feature_target_alignment(self, X, y):
        # Verify alignment
        pass

    def check_scaling_issues(self, scaler, X_train, X_test):
        # Detect scaling problems
        pass

    def check_class_imbalance(self, y):
        # Identify severe imbalance
        pass

    def check_convergence(self, model):
        # Verify model converged
        pass
```
Performance Profiler
CPU time tracking
Memory usage monitoring
Bottleneck identification
Optimization suggestions
Model Health Checker
Overfitting detection
Underfitting detection
Learning curve analysis
Validation curve generation
Error Analyzer
Automatic confusion matrix
Per-class metrics
Failure case collection
Confidence analysis
Report Generator
Create HTML report
Include all analyses
Actionable recommendations
Export to PDF
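As one hypothetical illustration, the `check_class_imbalance` hook from the skeleton might be filled in like this (the threshold and return format are assumptions, not a spec):

```python
from collections import Counter

def check_class_imbalance(y, ratio_threshold=10.0):
    """Flag severe imbalance via the majority/minority class count ratio."""
    counts = Counter(y)
    majority = max(counts.values())
    minority = min(counts.values())
    ratio = majority / minority
    return {
        "counts": dict(counts),
        "imbalance_ratio": ratio,
        "severe": ratio >= ratio_threshold,
    }

# Balanced labels: ratio 1.0, not severe
print(check_class_imbalance([0, 1] * 50))
# 95:5 split: ratio 19.0, severe
print(check_class_imbalance([0] * 95 + [1] * 5))
```

Each checker returning a small dict makes the report generator straightforward: collect the dicts and render them into the HTML report.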
Deliverables
ml_debugger.py - complete toolkit
Test suite with 10+ test cases
Documentation with examples
Sample reports (HTML/PDF)
Tutorial notebook
Success Criteria
✅ All checkers implemented
✅ Catches common bugs
✅ Works with sklearn models
✅ Generates useful reports
✅ Well documented
Bonus (⭐⭐⭐)
Support PyTorch/TensorFlow
Add interactive visualizations
Create CLI tool
Publish as package
Add CI/CD integration
Completion Tracker
Track your progress:
Challenge 1: The Mystery Bug ⭐⭐
Challenge 2: Data Detective ⭐⭐⭐
Challenge 3: Speed Demon ⭐⭐⭐
Challenge 4: Convergence Crisis ⭐⭐⭐⭐
Challenge 5: Error Analyzer ⭐⭐⭐⭐
Challenge 6: The Production Mystery ⭐⭐⭐⭐⭐
Challenge 7: Debug the Debugger ⭐⭐⭐⭐⭐
💡 Tips
Start Simple: Begin with Challenge 1 and progress
Document Everything: Keep notes on what you tried
Measure First: Always establish baseline before optimizing
Test Thoroughly: Verify fixes work with different data
Learn from Failures: Every bug teaches something valuable
Resources
Happy Debugging!