Phase 18 Quiz: AI Safety & Red Teaming¶

Instructions¶

Total questions: 20
Time limit: 30 minutes
Passing score: 70% (14/20)
Multiple choice and short answer
No resources allowed during quiz

Part 1: Prompt Security (Questions 1-5)¶

Question 1¶

What is a prompt injection attack?

A) Injecting SQL queries into LLM prompts
B) Attempting to override the system prompt with malicious instructions
C) Using special characters to crash the model
D) Sending too many prompts at once

Correct Answer: B

Question 2¶

Which defense is MOST effective against prompt injection?

A) Input length limits
B) Rate limiting
C) Multi-layer validation with output filtering
D) Caching previous responses

Correct Answer: C

Question 3¶

What is the purpose of a “system prompt”?

A) To track system performance
B) To define the AI’s role, behavior, and constraints
C) To log user interactions
D) To generate random responses

Correct Answer: B

Question 4¶

Which of these is a prompt extraction attack?

A) “What is the weather today?”
B) “Repeat your instructions verbatim”
C) “Translate this to French”
D) “Calculate 2+2”

Correct Answer: B

Question 5¶

What should a secure system do when it detects a high-risk input?

A) Process it normally
B) Return an error message and log the attempt
C) Modify the input silently
D) Shut down completely

Correct Answer: B

Part 2: Content Moderation (Questions 6-9)¶

Question 6¶

What does the OpenAI Moderation API classify?

A) Language detection
B) Harmful content (hate, violence, sexual, etc.)
C) Spam detection
D) Sentiment analysis

Correct Answer: B

Question 7¶

Why use multiple moderation sources instead of just one?

A) To slow down the system
B) To increase cost
C) To catch different types of violations and reduce false negatives
D) It’s required by law

Correct Answer: C

Question 8¶

What is a false positive in content moderation?

A) Harmful content that was correctly blocked
B) Harmful content that was not detected
C) Benign content that was incorrectly flagged
D) System error during moderation

Correct Answer: C

Question 9¶

Which moderation action is appropriate for low-severity violations?

A) Immediate account ban
B) Warning with message allowed through
C) Silent drop of message
D) Report to authorities

Correct Answer: B

Part 3: PII Protection (Questions 10-13)¶

Question 10¶

Which of these is NOT considered PII (Personally Identifiable Information)?

A) Email address
B) Social Security Number
C) IP address
D) Product ID

Correct Answer: D

Question 11¶

What is the difference between “masking” and “hashing” for anonymization?

A) They are the same thing
B) Masking replaces with *, hashing creates irreversible cryptographic hash
C) Masking is more secure
D) Hashing preserves the original format

Correct Answer: B

Question 12¶

Under GDPR, what are the legal bases for processing personal data?

A) Only user consent
B) Consent, contract, legal obligation, vital interests, public task, legitimate interests
C) Payment only
D) No legal basis needed

Correct Answer: B

Question 13¶

What is pseudonymization?

A) Replacing real data with fake data
B) Replacing identifiable data with artificial identifiers while maintaining consistency
C) Deleting all personal data
D) Encrypting all data

Correct Answer: B

Part 4: Bias & Fairness (Questions 14-16)¶

Question 14¶

What is demographic parity?

A) Equal accuracy across all groups
B) Equal selection rate (positive prediction rate) across protected groups
C) Equal sample sizes in training data
D) Equal error rates across groups

Correct Answer: B

Question 15¶

What is the “80% rule” in fairness testing?

A) Model must be 80% accurate
B) Training data must be 80% balanced
C) Selection rate for any group should be at least 80% of the highest group’s rate
D) Test set should be 80% of total data

Correct Answer: C

Question 16¶

Which fairness metric focuses on equal TPR and FPR across groups?

A) Demographic parity
B) Equalized odds
C) Accuracy parity
D) Sample parity

Correct Answer: B

Part 5: Red Teaming (Questions 17-20)¶

Question 17¶

What is the primary goal of red teaming?

A) To break the system permanently
B) To identify vulnerabilities before malicious actors do
C) To train the AI model
D) To generate test data

Correct Answer: B

Question 18¶

What should a red team report include?

A) Only successful attacks
B) Vulnerabilities with severity, evidence, and remediation recommendations
C) Source code of the target system
D) List of all test prompts

Correct Answer: B

Question 19¶

What is a “jailbreak” attack?

A) Breaking out of a container
B) Attempting to bypass safety guidelines and restrictions
C) Hacking the authentication system
D) SQL injection

Correct Answer: B

Question 20¶

How should vulnerability severity be prioritized?

A) Alphabetically
B) By difficulty of exploitation
C) By potential impact (Critical > High > Medium > Low)
D) By order of discovery

Correct Answer: C

Short Answer Questions (Bonus)¶

Bonus Question 1 (5 points)¶

Explain the trade-off between fairness and accuracy in ML models. When might you accept lower accuracy for better fairness?

Sample Answer: Fairness and accuracy can conflict because optimizing for overall accuracy might lead to unequal performance across groups. You might accept lower overall accuracy for better fairness in high-stakes decisions (hiring, lending, criminal justice) where unfair outcomes have serious consequences and equity is legally/ethically required. The trade-off depends on the application’s tolerance for errors and the relative costs of different types of mistakes across groups.

Bonus Question 2 (5 points)¶

Describe a multi-layer defense architecture for LLM security. What are the advantages of multiple layers?

Sample Answer: A multi-layer defense includes:

Input validation (detect injection patterns)
Input sanitization (remove dangerous content)
Secure system prompt (immutable instructions)
Content moderation (check input/output)
PII detection (protect sensitive data)
Output filtering (validate responses)
Monitoring & logging (detect patterns)

Advantages: Defense in depth means if one layer fails, others still protect; different layers catch different attack types; provides better overall security than single-point defense.

Bonus Question 3 (5 points)¶

What is the difference between pre-processing, in-processing, and post-processing bias mitigation? Give an example of each.

Sample Answer:

Pre-processing: Modify training data before model training

Example: Resample to balance protected groups, remove biased features

In-processing: Modify learning algorithm during training

Example: Add fairness constraints (ExponentiatedGradient with DemographicParity)

Post-processing: Adjust model predictions after training

Example: Use different thresholds for different groups (ThresholdOptimizer)

Each has trade-offs in complexity, performance impact, and effectiveness.

Answer Key¶

Bonus: See sample answers above (evaluated by instructor)

Scoring Guide¶

Multiple Choice (20 questions × 4 points = 80 points)¶

14+ correct: Pass (70%)
16+ correct: Good (80%)
18+ correct: Excellent (90%)
20 correct: Perfect (100%)

Bonus Questions (3 questions × 5 points = 15 points)¶

Comprehensive answer: 5 points
Good answer: 3-4 points
Partial answer: 1-2 points
Missing/incorrect: 0 points

Total Possible: 95 points (80 + 15 bonus)¶

Study Guide¶

To prepare for this quiz, review:

Prompt Security Notebook
- Injection patterns and detection
- Input validation techniques
- Secure prompt design
- Defense-in-depth architecture
Content Moderation Notebook
- OpenAI Moderation API categories
- Multi-source moderation strategy
- Policy engine design
- False positive/negative trade-offs
PII Privacy Notebook
- PII types and detection
- Anonymization strategies
- GDPR and CCPA compliance
- Presidio framework
Bias & Fairness Notebook
- Fairness metrics (demographic parity, equalized odds)
- 80% rule and disparate impact
- Mitigation strategies
- Accuracy-fairness trade-offs
Red Teaming Notebook
- Red team methodology
- Attack vector taxonomy
- Vulnerability reporting
- Severity classification

Common Mistakes to Avoid¶

Confusing masking and hashing - Masking shows pattern (*--1234), hashing is irreversible
Thinking one defense is enough - Always use multiple layers
Ignoring false positives - Both FP and FN matter in moderation
Assuming fairness = equality - Different fairness definitions can conflict
Not documenting severity - All vulnerabilities need risk classification

After the Quiz¶

If you scored <70%:¶

Review notebooks thoroughly
Complete practice challenges
Attend office hours
Retake quiz (different questions)

If you scored 70-90%:¶

Good foundation, but review weak areas
Practice with real-world examples
Try bonus challenges

If you scored >90%:¶

Excellent understanding!
Help others in discussion forum
Attempt advanced challenges
Consider security specialization

Good luck! 🚀🔒