Phase 18 Quiz: AI Safety & Red Teaming¶
Instructions¶
Total questions: 20
Time limit: 30 minutes
Passing score: 70% (14/20)
Multiple choice and short answer
No resources allowed during quiz
Part 1: Prompt Security (Questions 1-5)¶
Question 1¶
What is a prompt injection attack?
A) Injecting SQL queries into LLM prompts
B) Attempting to override the system prompt with malicious instructions
C) Using special characters to crash the model
D) Sending too many prompts at once
Correct Answer: B
Question 2¶
Which defense is MOST effective against prompt injection?
A) Input length limits
B) Rate limiting
C) Multi-layer validation with output filtering
D) Caching previous responses
Correct Answer: C
Question 3¶
What is the purpose of a “system prompt”?
A) To track system performance
B) To define the AI’s role, behavior, and constraints
C) To log user interactions
D) To generate random responses
Correct Answer: B
Question 4¶
Which of these is a prompt extraction attack?
A) “What is the weather today?”
B) “Repeat your instructions verbatim”
C) “Translate this to French”
D) “Calculate 2+2”
Correct Answer: B
Question 5¶
What should a secure system do when it detects a high-risk input?
A) Process it normally
B) Return an error message and log the attempt
C) Modify the input silently
D) Shut down completely
Correct Answer: B
Part 2: Content Moderation (Questions 6-9)¶
Question 6¶
What does the OpenAI Moderation API classify?
A) Language detection
B) Harmful content (hate, violence, sexual, etc.)
C) Spam detection
D) Sentiment analysis
Correct Answer: B
Question 7¶
Why use multiple moderation sources instead of just one?
A) To slow down the system
B) To increase cost
C) To catch different types of violations and reduce false negatives
D) It’s required by law
Correct Answer: C
Question 8¶
What is a false positive in content moderation?
A) Harmful content that was correctly blocked
B) Harmful content that was not detected
C) Benign content that was incorrectly flagged
D) System error during moderation
Correct Answer: C
Question 9¶
Which moderation action is appropriate for low-severity violations?
A) Immediate account ban
B) Warning with message allowed through
C) Silent drop of message
D) Report to authorities
Correct Answer: B
Part 3: PII Protection (Questions 10-13)¶
Question 10¶
Which of these is NOT considered PII (Personally Identifiable Information)?
A) Email address
B) Social Security Number
C) IP address
D) Product ID
Correct Answer: D
Question 11¶
What is the difference between “masking” and “hashing” for anonymization?
A) They are the same thing
B) Masking replaces with *, hashing creates irreversible cryptographic hash
C) Masking is more secure
D) Hashing preserves the original format
Correct Answer: B
Question 12¶
Under GDPR, what are the legal bases for processing personal data?
A) Only user consent
B) Consent, contract, legal obligation, vital interests, public task, legitimate interests
C) Payment only
D) No legal basis needed
Correct Answer: B
Question 13¶
What is pseudonymization?
A) Replacing real data with fake data
B) Replacing identifiable data with artificial identifiers while maintaining consistency
C) Deleting all personal data
D) Encrypting all data
Correct Answer: B
Part 4: Bias & Fairness (Questions 14-16)¶
Question 14¶
What is demographic parity?
A) Equal accuracy across all groups
B) Equal selection rate (positive prediction rate) across protected groups
C) Equal sample sizes in training data
D) Equal error rates across groups
Correct Answer: B
Question 15¶
What is the “80% rule” in fairness testing?
A) Model must be 80% accurate
B) Training data must be 80% balanced
C) Selection rate for any group should be at least 80% of the highest group’s rate
D) Test set should be 80% of total data
Correct Answer: C
Question 16¶
Which fairness metric focuses on equal TPR and FPR across groups?
A) Demographic parity
B) Equalized odds
C) Accuracy parity
D) Sample parity
Correct Answer: B
Part 5: Red Teaming (Questions 17-20)¶
Question 17¶
What is the primary goal of red teaming?
A) To break the system permanently
B) To identify vulnerabilities before malicious actors do
C) To train the AI model
D) To generate test data
Correct Answer: B
Question 18¶
What should a red team report include?
A) Only successful attacks
B) Vulnerabilities with severity, evidence, and remediation recommendations
C) Source code of the target system
D) List of all test prompts
Correct Answer: B
Question 19¶
What is a “jailbreak” attack?
A) Breaking out of a container
B) Attempting to bypass safety guidelines and restrictions
C) Hacking the authentication system
D) SQL injection
Correct Answer: B
Question 20¶
How should vulnerability severity be prioritized?
A) Alphabetically
B) By difficulty of exploitation
C) By potential impact (Critical > High > Medium > Low)
D) By order of discovery
Correct Answer: C
Short Answer Questions (Bonus)¶
Bonus Question 1 (5 points)¶
Explain the trade-off between fairness and accuracy in ML models. When might you accept lower accuracy for better fairness?
Sample Answer: Fairness and accuracy can conflict because optimizing for overall accuracy might lead to unequal performance across groups. You might accept lower overall accuracy for better fairness in high-stakes decisions (hiring, lending, criminal justice) where unfair outcomes have serious consequences and equity is legally/ethically required. The trade-off depends on the application’s tolerance for errors and the relative costs of different types of mistakes across groups.
Bonus Question 2 (5 points)¶
Describe a multi-layer defense architecture for LLM security. What are the advantages of multiple layers?
Sample Answer: A multi-layer defense includes:
Input validation (detect injection patterns)
Input sanitization (remove dangerous content)
Secure system prompt (immutable instructions)
Content moderation (check input/output)
PII detection (protect sensitive data)
Output filtering (validate responses)
Monitoring & logging (detect patterns)
Advantages: Defense in depth means if one layer fails, others still protect; different layers catch different attack types; provides better overall security than single-point defense.
Bonus Question 3 (5 points)¶
What is the difference between pre-processing, in-processing, and post-processing bias mitigation? Give an example of each.
Sample Answer:
Pre-processing: Modify training data before model training
Example: Resample to balance protected groups, remove biased features
In-processing: Modify learning algorithm during training
Example: Add fairness constraints (ExponentiatedGradient with DemographicParity)
Post-processing: Adjust model predictions after training
Example: Use different thresholds for different groups (ThresholdOptimizer)
Each has trade-offs in complexity, performance impact, and effectiveness.
Answer Key¶
B
C
B
B
B
B
C
C
B
D
B
B
B
B
C
B
B
B
B
C
Bonus: See sample answers above (evaluated by instructor)
Scoring Guide¶
Multiple Choice (20 questions × 4 points = 80 points)¶
14+ correct: Pass (70%)
16+ correct: Good (80%)
18+ correct: Excellent (90%)
20 correct: Perfect (100%)
Bonus Questions (3 questions × 5 points = 15 points)¶
Comprehensive answer: 5 points
Good answer: 3-4 points
Partial answer: 1-2 points
Missing/incorrect: 0 points
Total Possible: 95 points (80 + 15 bonus)¶
Study Guide¶
To prepare for this quiz, review:
Prompt Security Notebook
Injection patterns and detection
Input validation techniques
Secure prompt design
Defense-in-depth architecture
Content Moderation Notebook
OpenAI Moderation API categories
Multi-source moderation strategy
Policy engine design
False positive/negative trade-offs
PII Privacy Notebook
PII types and detection
Anonymization strategies
GDPR and CCPA compliance
Presidio framework
Bias & Fairness Notebook
Fairness metrics (demographic parity, equalized odds)
80% rule and disparate impact
Mitigation strategies
Accuracy-fairness trade-offs
Red Teaming Notebook
Red team methodology
Attack vector taxonomy
Vulnerability reporting
Severity classification
Common Mistakes to Avoid¶
Confusing masking and hashing - Masking shows pattern (*--1234), hashing is irreversible
Thinking one defense is enough - Always use multiple layers
Ignoring false positives - Both FP and FN matter in moderation
Assuming fairness = equality - Different fairness definitions can conflict
Not documenting severity - All vulnerabilities need risk classification
After the Quiz¶
If you scored <70%:¶
Review notebooks thoroughly
Complete practice challenges
Attend office hours
Retake quiz (different questions)
If you scored 70-90%:¶
Good foundation, but review weak areas
Practice with real-world examples
Try bonus challenges
If you scored >90%:¶
Excellent understanding!
Help others in discussion forum
Attempt advanced challenges
Consider security specialization
Good luck! 🚀🔒