Pre-Quiz: Debugging & Troubleshooting
Test your baseline knowledge before starting Phase 16.
Time: 10 minutes
Questions: 10
Passing Score: 60%
Instructions
Answer each question and check your responses. This helps identify areas to focus on.
Questions
1. What is the first step in debugging an ML model that's not learning?
A) Increase model complexity
B) Reproduce the bug consistently
C) Add more data
D) Try a different algorithm
Correct Answer: B
Explanation: The first step in any debugging workflow is to reproduce the bug consistently. You can't fix what you can't reliably observe. Once you can reproduce the issue, you can then gather data, hypothesize causes, and test fixes.
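For ML code, reproducing a bug usually starts with pinning down randomness. The following is a minimal sketch using only the standard library; `noisy_training_run` is a hypothetical stand-in for a real training run, not part of any framework.

```python
import random


def noisy_training_run(seed: int) -> list[float]:
    """Hypothetical stand-in for a training run that depends on randomness."""
    # A local generator keeps the run independent of global random state.
    rng = random.Random(seed)
    return [round(rng.gauss(0.0, 1.0), 6) for _ in range(5)]


run_a = noisy_training_run(seed=42)
run_b = noisy_training_run(seed=42)
run_c = noisy_training_run(seed=7)

print("same seed reproduces the run:", run_a == run_b)
print("different seed changes the run:", run_a != run_c)
```

In a real project the same idea would extend to every source of randomness in use (data shuffling, weight initialization, and so on).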
2. Which of the following is a sign of data leakage?
A) Test accuracy much lower than training accuracy
B) Test accuracy suspiciously close to or higher than training accuracy
C) Model takes long to train
D) Missing values in the dataset
Correct Answer: B
Explanation: Data leakage occurs when information from the test set (for example, its rows or statistics computed from them) influences training. This typically results in unrealistically high test accuracy that won't generalize. If test accuracy is very close to or exceeds training accuracy, suspect leakage.
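One common way leakage creeps in is preprocessing fitted on the full dataset before splitting. The sketch below uses a toy feature and hypothetical helper names (`fit_scaler`, `apply_scaler`) to contrast leaky and train-only statistics.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Return the (mean, standard deviation) used to standardize a feature."""
    return mean(values), pstdev(values)


def apply_scaler(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


# Hypothetical toy feature, already split into train and test rows.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky: the statistics are computed on train + test, so test rows
# influence how the training data is transformed.
leaky_mu, leaky_sigma = fit_scaler(train + test)

# Safer: fit on the training rows only, then reuse those statistics on test.
train_mu, train_sigma = fit_scaler(train)
test_scaled = apply_scaler(test, train_mu, train_sigma)

print("train-only mean:", train_mu)
print("leaky mean:", leaky_mu)
```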
3. What does a large gap between training and validation accuracy indicate?
A) Underfitting
B) Overfitting
C) Good generalization
D) Data leakage
Correct Answer: B
Explanation: A large gap (training accuracy much higher than validation accuracy) indicates overfitting: the model has memorized the training data but doesn't generalize well. Solutions include regularization, more data, or reducing model complexity.
4. Which tool would you use for CPU profiling in Python?
A) memory_profiler
B) cProfile
C) pdb
D) pytest
Correct Answer: B
Explanation: cProfile is Python's built-in CPU profiler that shows where your code spends time. memory_profiler is for memory, pdb is for interactive debugging, and pytest is for testing.
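A minimal sketch of profiling a function with cProfile and printing the most expensive calls with pstats. The function `slow_sum_of_squares` is a hypothetical workload chosen only for illustration.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical workload: a deliberately plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum_of_squares(200_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # show only the top 5 entries
report = stream.getvalue()
print(report)
```

For a whole script, `python -m cProfile -s cumulative script.py` gives a similar report without modifying the code.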
5. What's the primary benefit of vectorization in ML code?
A) Easier to read
B) Uses less memory
C) Much faster execution
D) Better accuracy
Correct Answer: C
Explanation: Vectorization replaces Python loops with optimized NumPy operations, resulting in significantly faster execution (often a 10-100x speedup). The loop runs in compiled code rather than in the Python interpreter and can use SIMD instructions.
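As an illustration, here is a sketch that computes mean squared error with a Python loop and with a vectorized NumPy expression, then times both. It assumes NumPy is installed; the function names are hypothetical, and the measured speedup will vary by machine and array size.

```python
import timeit

import numpy as np


def mse_loop(y_true, y_pred):
    """Mean squared error with an explicit Python loop."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += (t - p) ** 2
    return total / len(y_true)


def mse_vectorized(y_true, y_pred):
    """Same computation expressed as whole-array NumPy operations."""
    return float(np.mean((y_true - y_pred) ** 2))


rng = np.random.default_rng(0)
y_true = rng.normal(size=10_000)
y_pred = rng.normal(size=10_000)

loop_mse = mse_loop(y_true, y_pred)
vec_mse = mse_vectorized(y_true, y_pred)
print("results agree:", bool(np.isclose(loop_mse, vec_mse)))

loop_time = timeit.timeit(lambda: mse_loop(y_true, y_pred), number=20)
vec_time = timeit.timeit(lambda: mse_vectorized(y_true, y_pred), number=20)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```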
6. Why is it important to scale features before training?
A) To make the model train faster
B) To ensure features are on similar scales for convergence
C) To reduce overfitting
D) To handle missing values
Correct Answer: B
Explanation: Unscaled features with different ranges can cause convergence issues, especially for gradient-based algorithms. Features on different scales can make some weights update much faster than others, preventing proper convergence.
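To make the "some weights update much faster than others" point concrete, the sketch below compares the initial gradient of mean squared error for two features, before and after standardization. The data, the bias-free linear model, and the helper names are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a feature to zero mean and unit standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def initial_gradient(column, targets):
    """Gradient of MSE w.r.t. one weight of a bias-free linear model,
    evaluated with all weights at zero (so every prediction is 0)."""
    n = len(column)
    return -2.0 / n * sum(x * y for x, y in zip(column, targets))


# Hypothetical toy data: two features on very different scales.
income = [30_000.0, 45_000.0, 60_000.0, 120_000.0]
age = [22.0, 35.0, 41.0, 58.0]
target = [1.0, 2.0, 3.0, 5.0]

raw = [abs(initial_gradient(col, target)) for col in (income, age)]
scaled = [abs(initial_gradient(standardize(col), target)) for col in (income, age)]

unscaled_ratio = max(raw) / min(raw)
scaled_ratio = max(scaled) / min(scaled)
print(f"gradient ratio, unscaled: {unscaled_ratio:.1f}")
print(f"gradient ratio, scaled:   {scaled_ratio:.2f}")
```

With a single learning rate, a ratio far from 1 means one weight takes steps that are far too large or the other takes steps that are far too small.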
7. What does the diagonal of a confusion matrix represent?
A) False positives
B) False negatives
C) Correct predictions
D) Total predictions
Correct Answer: C
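Explanation: The diagonal elements of a confusion matrix count correct predictions. In the binary case these are the true positives and true negatives; in a multiclass matrix, each diagonal entry counts the samples of one class that were predicted as that class. Off-diagonal elements are errors (false positives and false negatives in the binary case).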
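A small sketch that builds a confusion matrix by hand and reads accuracy off the diagonal. It assumes the convention that rows are true labels and columns are predicted labels; the labels and predictions are made up for illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true label, columns index the predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical labels and predictions.
labels = ["bird", "cat", "dog"]
y_true = ["cat", "dog", "cat", "bird", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "bird", "dog", "cat"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>4}: {row}")

# The diagonal holds the correct predictions.
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / len(y_true)
print("correct:", correct, "accuracy:", round(accuracy, 3))
```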
8. If your model has precision=0.95 and recall=0.40, what should you do?
A) Model is perfect, do nothing
B) Focus on reducing false negatives
C) Focus on reducing false positives
D) Collect more data
Correct Answer: B
Explanation: Low recall (0.40) means the model misses many positive cases (high false negatives). High precision (0.95) means when it predicts positive, it's usually right. To improve recall, you might lower the decision threshold or address class imbalance.
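The effect of lowering the decision threshold can be sketched with a few hand-made scores. The labels, scores, and thresholds below are hypothetical, chosen so the trade-off between precision and recall is easy to see.

```python
def precision_recall(y_true, scores, threshold):
    """Compute precision and recall for predictions score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels (1 = positive) and model scores.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.45, 0.4, 0.35, 0.5, 0.3, 0.2, 0.1, 0.05]

strict = precision_recall(y_true, scores, threshold=0.6)
lenient = precision_recall(y_true, scores, threshold=0.3)
print("threshold 0.6 -> precision, recall:", strict)
print("threshold 0.3 -> precision, recall:", lenient)
```

Lowering the threshold recovers the missed positives here, at the cost of some false positives.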
9. What's the best way to handle a column in which more than 50% of the values are missing?
A) Fill with mean
B) Fill with median
C) Consider dropping the column
D) Fill with mode
Correct Answer: C
Explanation: When more than 50% of values are missing, the column provides little information and imputation would be mostly guessing. It's often better to drop such columns unless the missingness itself is meaningful.
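A minimal sketch of flagging columns by their missing fraction, using plain dictionaries with `None` for missing values. The rows, column names, and the 0.5 threshold are assumptions for illustration; whether to actually drop a flagged column remains a judgment call.

```python
def missing_fraction(rows, column):
    """Fraction of rows where the column is missing (None)."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def columns_to_consider_dropping(rows, columns, threshold=0.5):
    """Columns whose missing fraction exceeds the threshold."""
    return [c for c in columns if missing_fraction(rows, c) > threshold]


# Hypothetical toy dataset.
rows = [
    {"age": 34, "income": None, "city": "Oslo"},
    {"age": None, "income": None, "city": "Lima"},
    {"age": 51, "income": 72_000, "city": "Pune"},
    {"age": 29, "income": None, "city": None},
]
columns = ["age", "income", "city"]

for column in columns:
    print(column, missing_fraction(rows, column))

flagged = columns_to_consider_dropping(rows, columns)
print("candidates to drop:", flagged)
```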
10. What does a learning curve indicate when both training and validation scores are low?
A) Overfitting
B) Underfitting
C) Perfect fit
D) Data leakage
Correct Answer: B
Explanation: When both training and validation scores are low and not improving with more data, it indicates underfitting (high bias). The model is too simple to capture the patterns. Solutions: increase model complexity or add more features.
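The two patterns from this quiz (both scores low, or a large train/validation gap) can be turned into a rough rule of thumb. The function below is only a sketch: `diagnose_fit` and its thresholds (`low_score`, `max_gap`) are hypothetical and would need tuning for a real task and metric.

```python
def diagnose_fit(train_score, val_score, low_score=0.7, max_gap=0.1):
    """Rough heuristic; the thresholds are illustrative assumptions."""
    if train_score < low_score and val_score < low_score:
        return "underfitting"  # both scores low: high bias
    if train_score - val_score > max_gap:
        return "overfitting"  # large gap: high variance
    return "reasonable fit"


print(diagnose_fit(0.62, 0.60))
print(diagnose_fit(0.99, 0.72))
print(diagnose_fit(0.88, 0.85))
```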
Scoring Guide
Count your correct answers:
9-10 correct: Excellent! You have a strong foundation.
7-8 correct: Good! Review a few concepts before starting.
5-6 correct: Moderate. Focus on weak areas during Phase 16.
0-4 correct: Review prerequisite material before Phase 16.
Key Topics to Review
Based on your score, focus on:
If you missed 1-3:
General debugging workflow
Common ML pitfalls
If you missed 4-5:
Data quality issues
Model evaluation metrics
Performance optimization basics
If you missed 6-8:
Review Phase 15 (Model Evaluation)
Study debugging fundamentals
Practice with simple examples
If you missed 9-10:
Complete prerequisite phases first
Review Python and ML basics
Start with simpler debugging exercises
Next Steps
Review any questions you got wrong
Check the explanations carefully
Read relevant notebook sections
Begin Phase 16 when ready!
Good luck with Phase 16!