03: Experimental Design
"To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of." - Ronald Fisher
Welcome to the gold standard of causal inference! This notebook explores experimental design - the most reliable way to establish causal relationships. We'll cover Randomized Controlled Trials (RCTs), power analysis, and the principles that make experiments trustworthy.
Learning Objectives
By the end of this notebook, you'll be able to:
Design and analyze Randomized Controlled Trials (RCTs)
Conduct power analysis for sample size determination
Apply blocking and stratification to reduce variance
Handle practical challenges in experimental design
Understand ethical considerations in experimentation
Evaluate experimental validity and potential biases
Randomized Controlled Trials (RCTs)
Randomized Controlled Trial (RCT): The gold standard for causal inference where:
Treatment assignment is random
Control group receives no treatment (or placebo)
Outcomes are measured for both groups
Analysis compares treated vs control
Why RCTs Work
Balance: Randomization balances both observed and unobserved confounders
Unbiased: No systematic differences between groups
Causal: Treatment precedes outcome, assignment is exogenous
RCT Components
Participants: Who is eligible?
Randomization: How are subjects assigned?
Intervention: What treatment is given?
Measurement: How are outcomes assessed?
Analysis: How are results interpreted?
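The five components above can be compressed into a toy end-to-end sketch. Everything here (the helper name, the potential-outcome inputs, and the +8 effect) is illustrative, not part of the study simulated below:

```python
import numpy as np

def run_minimal_rct(outcomes_if_control, outcomes_if_treated, seed=0):
    """Toy RCT: randomize, observe one potential outcome per subject, compare means."""
    rng = np.random.default_rng(seed)
    n = len(outcomes_if_control)
    # Randomization: each eligible participant is assigned with probability 0.5
    assigned = rng.binomial(1, 0.5, n)
    # Measurement: only the outcome under the assigned condition is observed
    observed = np.where(assigned == 1, outcomes_if_treated, outcomes_if_control)
    # Analysis: difference in group means estimates the average treatment effect
    return observed[assigned == 1].mean() - observed[assigned == 0].mean()

# Hypothetical illustration: a constant +8-point effect on noisy scores
rng = np.random.default_rng(42)
y0 = rng.normal(75, 10, 10_000)
effect = run_minimal_rct(y0, y0 + 8)
print(f"Estimated effect: {effect:.2f}")  # close to the true value of 8
```

In a real trial only one potential outcome per subject ever exists; passing both arrays is what makes this a simulation rather than an analysis recipe.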
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from statsmodels.stats.power import TTestIndPower
import warnings
warnings.filterwarnings('ignore')
# Set random seeds
np.random.seed(42)
# Set up plotting
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['font.size'] = 12
print("Experimental Design: Randomized Controlled Trials")
print("=================================================")
def simulate_rct():
"""Simulate a complete RCT from design to analysis"""
print("=== Simulating a Complete RCT ===\n")
# Study: Effect of online tutoring on student test scores
print("Study: Effect of Online Tutoring on Student Test Scores")
print("-" * 55)
n_students = 200
# Generate student characteristics (potential confounders)
baseline_score = np.random.normal(75, 10, n_students) # Pre-test scores
socioeconomic_status = np.random.normal(0, 1, n_students)
study_hours = np.random.normal(5, 2, n_students)
motivation = np.random.normal(0, 1, n_students)
# True causal effect of tutoring
true_tutoring_effect = 8 # 8 point improvement
# Random assignment to treatment (50% chance)
treatment_assignment = np.random.binomial(1, 0.5, n_students)
# Generate post-test scores
# All students improve based on their baseline and characteristics
natural_improvement = (0.7 * baseline_score +
2 * socioeconomic_status +
1.5 * study_hours +
3 * motivation)
# Add treatment effect
treatment_effect = true_tutoring_effect * treatment_assignment
# Add random noise
noise = np.random.normal(0, 5, n_students)
post_test_score = natural_improvement + treatment_effect + noise
# Create DataFrame
rct_data = pd.DataFrame({
'Student_ID': range(n_students),
'Treatment': treatment_assignment,
'Baseline_Score': baseline_score,
'SES': socioeconomic_status,
'Study_Hours': study_hours,
'Motivation': motivation,
'Post_Score': post_test_score,
'Improvement': post_test_score - baseline_score
})
print(f"RCT Design Summary:")
print(f"- Total students: {n_students}")
print(f"- Treatment group: {treatment_assignment.sum()}")
print(f"- Control group: {n_students - treatment_assignment.sum()}")
print(f"- True treatment effect: {true_tutoring_effect} points")
print()
# Check randomization balance
print("Randomization Balance Check:")
balance_vars = ['Baseline_Score', 'SES', 'Study_Hours', 'Motivation']
for var in balance_vars:
treated_mean = rct_data[rct_data['Treatment'] == 1][var].mean()
control_mean = rct_data[rct_data['Treatment'] == 0][var].mean()
diff = treated_mean - control_mean
# T-test for difference
t_stat, p_val = stats.ttest_ind(
rct_data[rct_data['Treatment'] == 1][var],
rct_data[rct_data['Treatment'] == 0][var]
)
print(f"{var:15}: Treated={treated_mean:.2f}, Control={control_mean:.2f}, "
f"Diff={diff:.2f}, p={p_val:.3f}")
print()
# Analyze results
treated_improvement = rct_data[rct_data['Treatment'] == 1]['Improvement'].mean()
control_improvement = rct_data[rct_data['Treatment'] == 0]['Improvement'].mean()
estimated_effect = treated_improvement - control_improvement
# Statistical test
t_stat, p_val = stats.ttest_ind(
rct_data[rct_data['Treatment'] == 1]['Improvement'],
rct_data[rct_data['Treatment'] == 0]['Improvement']
)
print("RCT Results:")
print(f"- Treated group improvement: {treated_improvement:.2f} points")
print(f"- Control group improvement: {control_improvement:.2f} points")
print(f"- Estimated treatment effect: {estimated_effect:.2f} points")
print(f"- True treatment effect: {true_tutoring_effect:.0f} points")
print(f"- T-statistic: {t_stat:.3f}, p-value: {p_val:.4f}")
print(f"- Statistically significant: {'Yes' if p_val < 0.05 else 'No'}")
print()
# Visualize results
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
# Distribution of baseline scores by group
sns.histplot(data=rct_data, x='Baseline_Score', hue='Treatment',
alpha=0.7, ax=ax1)
ax1.set_title('Baseline Score Distribution\n(Randomization Balance)')
ax1.set_xlabel('Baseline Score')
ax1.legend(['Control', 'Treated'])
# Improvement by group
sns.boxplot(data=rct_data, x='Treatment', y='Improvement', ax=ax2)
ax2.set_title('Test Score Improvement by Group')
ax2.set_xticklabels(['Control', 'Treated'])
ax2.set_ylabel('Improvement (points)')
# Scatter plot: baseline vs improvement
sns.scatterplot(data=rct_data, x='Baseline_Score', y='Improvement',
hue='Treatment', alpha=0.7, ax=ax3)
ax3.set_title('Baseline Score vs Improvement')
ax3.set_xlabel('Baseline Score')
ax3.set_ylabel('Improvement (points)')
# Effect size over time (simulated multiple measurements)
# Simulate repeated measures
time_points = ['Pre-test', 'Mid-test', 'Post-test']
treated_scores = [75, 78, 83] # With tutoring effect accumulating
control_scores = [75, 76, 78] # Natural improvement only
ax4.plot(time_points, treated_scores, 'o-', linewidth=3, label='Treated', markersize=8)
ax4.plot(time_points, control_scores, 's-', linewidth=3, label='Control', markersize=8)
ax4.set_title('Learning Trajectories')
ax4.set_ylabel('Test Score')
ax4.legend()
ax4.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("RCT Key Insights:")
print("1. Randomization creates balanced groups")
print("2. Simple comparison gives unbiased causal estimate")
print("3. Effect is precisely estimated and statistically significant")
print("4. Results are interpretable: tutoring improves scores by ~8 points")
print("5. Confidence intervals quantify uncertainty")
return rct_data
# Simulate a complete RCT
rct_results = simulate_rct()
Power Analysis & Sample Size
Power Analysis: Determines the sample size needed to detect an effect of a given size with sufficient statistical power.
Key Concepts
Effect Size: Standardized measure of treatment impact
Power (1-β): Probability of detecting a true effect
Significance Level (α): Probability of a false positive
Sample Size (n): Number of subjects needed
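Cohen's d, the standardized effect size used throughout this section, is just the mean difference divided by the pooled standard deviation. A minimal helper (the function name is ours, not a library API):

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent samples: mean difference over pooled SD."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Example: two groups whose means differ by half a standard deviation
rng = np.random.default_rng(0)
treated = rng.normal(0.5, 1, 5_000)
control = rng.normal(0.0, 1, 5_000)
print(f"Cohen's d: {cohens_d(treated, control):.2f}")  # roughly 0.5, a "medium" effect
```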
Power Analysis Formula
For a two-sample comparison of means (two-sided test), the required sample size per group is:
\(n = \dfrac{2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2}{(d\sigma)^2} = \dfrac{2(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}\)
Where:
\(z_{1-\alpha/2}\): Critical value for the significance level
\(z_{1-\beta}\): Critical value for power
\(\sigma\): Standard deviation of the outcome
\(d\): Effect size (Cohen's d)
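Plugging standard normal critical values into this formula closely reproduces what statsmodels' TTestIndPower computes; the small discrepancy comes from the normal approximation to the t distribution. A quick sketch of the check (the helper name is ours):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

def n_per_group_normal(d, alpha=0.05, power=0.8):
    """Closed-form per-group sample size from the normal approximation."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)  # two-sided critical value
    z_beta = stats.norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2

for d in (0.2, 0.5, 0.8):
    approx = n_per_group_normal(d)
    exact = TTestIndPower().solve_power(effect_size=d, power=0.8, alpha=0.05)
    print(f"d={d}: normal approx n={approx:.1f}, exact t-test n={exact:.1f}")
```

For d = 0.5 the closed form gives about 63 per group, which matches the "properly powered" study size used later in this notebook.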
def power_analysis_demo():
"""Demonstrate power analysis for sample size determination"""
print("=== Power Analysis & Sample Size Determination ===\n")
# Example: Testing a new drug for blood pressure reduction
print("Example: Drug Trial for Blood Pressure Reduction")
print("-" * 45)
# Study parameters
effect_size = 0.5 # Cohen's d (moderate effect)
alpha = 0.05 # Significance level
power = 0.8 # Desired power (80%)
sigma = 10 # Standard deviation of blood pressure
# Calculate required sample size
z_alpha = stats.norm.ppf(1 - alpha/2) # Two-tailed test
z_beta = stats.norm.ppf(power)
# Per-group n for a two-sample test: 2 * (z_alpha + z_beta)^2 * sigma^2 / (d * sigma)^2
n_per_group = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
n_per_group = int(np.ceil(n_per_group))
total_n = 2 * n_per_group
print(f"Study Parameters:")
print(f"- Effect size (Cohen's d): {effect_size}")
print(f"- Significance level (α): {alpha}")
print(f"- Desired power (1-β): {power}")
print(f"- Outcome SD (σ): {sigma} mmHg")
print()
print(f"Sample Size Calculation:")
print(f"- Subjects per group: {n_per_group}")
print(f"- Total subjects needed: {total_n}")
print()
# Demonstrate power curves
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
# Power vs sample size for different effect sizes
effect_sizes = [0.2, 0.5, 0.8] # Small, medium, large
sample_sizes = np.arange(10, 201, 10)
for es in effect_sizes:
powers = []
for n in sample_sizes:
analysis = TTestIndPower()
power_val = analysis.solve_power(effect_size=es, nobs1=n, alpha=alpha)
powers.append(power_val)
ax1.plot(sample_sizes, powers, 'o-', label=f'Effect size = {es}', markersize=4)
ax1.axhline(y=0.8, color='red', linestyle='--', alpha=0.7, label='Target power (80%)')
ax1.set_xlabel('Sample Size per Group')
ax1.set_ylabel('Statistical Power')
ax1.set_title('Power vs Sample Size\n(Different Effect Sizes)')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Power vs effect size for fixed sample size
effect_sizes_range = np.linspace(0.1, 1.0, 50)
fixed_n = 50
powers_fixed_n = []
for es in effect_sizes_range:
analysis = TTestIndPower()
power_val = analysis.solve_power(effect_size=es, nobs1=fixed_n, alpha=alpha)
powers_fixed_n.append(power_val)
ax2.plot(effect_sizes_range, powers_fixed_n, 'b-', linewidth=2)
ax2.axhline(y=0.8, color='red', linestyle='--', alpha=0.7)
ax2.set_xlabel('Effect Size (Cohen\'s d)')
ax2.set_ylabel('Statistical Power')
ax2.set_title(f'Power vs Effect Size\n(n = {fixed_n} per group)')
ax2.grid(True, alpha=0.3)
# Required sample size vs effect size
required_n = []
for es in effect_sizes_range:
analysis = TTestIndPower()
n = analysis.solve_power(effect_size=es, power=power, alpha=alpha)
required_n.append(n)
ax3.plot(effect_sizes_range, required_n, 'g-', linewidth=2)
ax3.set_xlabel('Effect Size (Cohen\'s d)')
ax3.set_ylabel('Required Sample Size per Group')
ax3.set_title('Sample Size Requirements\n(Power = 80%, α = 0.05)')
ax3.grid(True, alpha=0.3)
# Simulate underpowered vs properly powered study
np.random.seed(42)
# Underpowered study (n=20 per group)
n_small = 20
control_small = np.random.normal(120, 10, n_small)
treated_small = np.random.normal(120 - effect_size * 10, 10, n_small) # 5 mmHg effect
# Properly powered study (n=64 per group)
n_large = 64
control_large = np.random.normal(120, 10, n_large)
treated_large = np.random.normal(120 - effect_size * 10, 10, n_large)
# Effect estimates
effect_small = np.mean(control_small) - np.mean(treated_small)
effect_large = np.mean(control_large) - np.mean(treated_large)
# Confidence intervals
se_small = np.sqrt(np.var(control_small)/n_small + np.var(treated_small)/n_small)
se_large = np.sqrt(np.var(control_large)/n_large + np.var(treated_large)/n_large)
ci_small = (effect_small - 1.96*se_small, effect_small + 1.96*se_small)
ci_large = (effect_large - 1.96*se_large, effect_large + 1.96*se_large)
# Plot
studies = ['Underpowered\n(n=20)', 'Properly Powered\n(n=64)']
effects = [effect_small, effect_large]
errors = [1.96*se_small, 1.96*se_large]
ax4.errorbar(range(len(studies)), effects, yerr=errors, fmt='o',
capsize=5, markersize=8, linewidth=2)
ax4.axhline(y=effect_size * 10, color='red', linestyle='--',
label=f'True effect ({effect_size*10} mmHg)')
ax4.set_xticks(range(len(studies)))
ax4.set_xticklabels(studies)
ax4.set_ylabel('Blood Pressure Reduction (mmHg)')
ax4.set_title('Effect Estimation\n(Confidence Intervals)')
ax4.legend()
ax4.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("Power Analysis Insights:")
print("1. Larger effect sizes require smaller samples")
print("2. Small effects (d=0.2) need ~400 subjects per group")
print("3. Underpowered studies have wide confidence intervals")
print("4. Properly powered studies give precise estimates")
print("5. Power analysis prevents wasted resources")
print()
# Interactive power calculator
print("Power Calculator Examples:")
print("-" * 25)
scenarios = [
{'effect': 0.2, 'power': 0.8, 'alpha': 0.05, 'desc': 'Small effect, standard parameters'},
{'effect': 0.5, 'power': 0.9, 'alpha': 0.01, 'desc': 'Medium effect, conservative test'},
{'effect': 0.8, 'power': 0.95, 'alpha': 0.05, 'desc': 'Large effect, high power'}
]
for scenario in scenarios:
analysis = TTestIndPower()
n = analysis.solve_power(
effect_size=scenario['effect'],
power=scenario['power'],
alpha=scenario['alpha']
)
print(f"{scenario['desc']}:")
print(f"  Effect size: {scenario['effect']}, Power: {scenario['power']}, α: {scenario['alpha']}")
print(f" Required n per group: {int(np.ceil(n))}")
print()
return {
'required_n': n_per_group,
'effect_sizes': effect_sizes,
'sample_sizes': sample_sizes
}
# Demonstrate power analysis
power_results = power_analysis_demo()
Blocking & Stratification
Blocking: Grouping similar experimental units together before randomization. Stratification: Ensuring equal representation across important subgroups.
Why Block/Stratify?
Reduce variance: More precise effect estimates
Balance covariates: Ensure groups are comparable
Increase power: Better control of confounding
Subgroup analysis: Effects in different populations
Blocking vs Covariate Adjustment
Blocking: Done before randomization
Covariate adjustment: Done during analysis
Both valid: Blocking often more effective
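Within-block randomization can be sketched in a few lines of pandas: shuffle each block, then send the first half to treatment. The helper name and toy data are illustrative:

```python
import numpy as np
import pandas as pd

def block_randomize(df, block_col, seed=0):
    """Assign treatment within each block so every block is (near) half treated."""
    rng = np.random.default_rng(seed)
    treatment = pd.Series(0, index=df.index)
    for _, idx in df.groupby(block_col).groups.items():
        idx = rng.permutation(np.asarray(idx))  # shuffle subjects within the block
        treatment[idx[: len(idx) // 2]] = 1     # first half of the shuffle -> treatment
    return treatment

# Toy data: 12 subjects in three blocks of unequal size
df = pd.DataFrame({"block": ["A"] * 6 + ["B"] * 4 + ["C"] * 2})
df["treatment"] = block_randomize(df, "block")
print(df.groupby("block")["treatment"].agg(["count", "sum"]))
```

Because the split happens inside each block, treatment counts are balanced per block by construction, not just in expectation.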
def blocking_stratification_demo():
"""Demonstrate blocking and stratification in experimental design"""
print("=== Blocking & Stratification in Experimental Design ===\n")
# Example: Educational intervention study
print("Example: Educational Intervention Study")
print("-" * 38)
n_students = 240
# Create student characteristics for blocking
school_type = np.random.choice(['Urban', 'Suburban', 'Rural'], n_students,
p=[0.4, 0.4, 0.2])
prior_achievement = np.random.choice(['Low', 'Medium', 'High'], n_students,
p=[0.3, 0.4, 0.3])
# Create blocks: combination of school type and achievement
blocks = [f"{school}_{achiev}" for school, achiev in zip(school_type, prior_achievement)]
# Count students per block
block_counts = pd.Series(blocks).value_counts().sort_index()
print("Students per Block:")
for block, count in block_counts.items():
print(f" {block}: {count} students")
print()
# Randomize within blocks
treatment_assignment = np.zeros(n_students, dtype=int)
for block in block_counts.index:
block_mask = np.array(blocks) == block
block_size = block_counts[block]
# Assign half to treatment within each block
treated_in_block = np.random.choice(block_size, block_size // 2, replace=False)
treatment_assignment[block_mask] = 0
block_indices = np.where(block_mask)[0]
treatment_assignment[block_indices[treated_in_block]] = 1
# Generate outcomes
# Base outcome depends on characteristics
base_score = (
(school_type == 'Urban') * 70 +
(school_type == 'Suburban') * 75 +
(school_type == 'Rural') * 72 +
(prior_achievement == 'Low') * 0 +
(prior_achievement == 'Medium') * 5 +
(prior_achievement == 'High') * 10
)
# Treatment effect
treatment_effect = 8 * treatment_assignment
# Final score
final_score = base_score + treatment_effect + np.random.normal(0, 5, n_students)
# Create DataFrame
blocked_data = pd.DataFrame({
'Student_ID': range(n_students),
'Block': blocks,
'School_Type': school_type,
'Prior_Achievement': prior_achievement,
'Treatment': treatment_assignment,
'Final_Score': final_score
})
print(f"Blocked RCT Results:")
print(f"- Total students: {n_students}")
print(f"- Treatment group: {treatment_assignment.sum()}")
print(f"- Control group: {n_students - treatment_assignment.sum()}")
print()
# Check balance within blocks
print("Balance Check by Block:")
balance_check = blocked_data.groupby('Block').agg({
'Treatment': ['count', 'sum', 'mean']
}).round(3)
balance_check.columns = ['Total', 'Treated', 'Treatment_Rate']
print(balance_check)
print()
# Compare blocked vs unblocked design
# Simulate unblocked design (simple randomization)
unblocked_treatment = np.random.binomial(1, 0.5, n_students)
unblocked_scores = base_score + 8 * unblocked_treatment + np.random.normal(0, 5, n_students)
# Calculate effects
blocked_effect = (blocked_data[blocked_data['Treatment'] == 1]['Final_Score'].mean() -
blocked_data[blocked_data['Treatment'] == 0]['Final_Score'].mean())
unblocked_effect = (unblocked_scores[unblocked_treatment == 1].mean() -
unblocked_scores[unblocked_treatment == 0].mean())
# Calculate standard errors
blocked_se = np.sqrt(
blocked_data[blocked_data['Treatment'] == 1]['Final_Score'].var() / treatment_assignment.sum() +
blocked_data[blocked_data['Treatment'] == 0]['Final_Score'].var() / (n_students - treatment_assignment.sum())
)
unblocked_se = np.sqrt(
unblocked_scores[unblocked_treatment == 1].var() / unblocked_treatment.sum() +
unblocked_scores[unblocked_treatment == 0].var() / (n_students - unblocked_treatment.sum())
)
print("Effect Estimation Comparison:")
print(f"Blocked design: {blocked_effect:.2f} Β± {1.96*blocked_se:.2f} (SE = {blocked_se:.2f})")
print(f"Unblocked design: {unblocked_effect:.2f} Β± {1.96*unblocked_se:.2f} (SE = {unblocked_se:.2f})")
print(f"True effect: 8.00")
print()
# Visualize blocking benefits
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
# Distribution by school type
school_balance = blocked_data.groupby(['School_Type', 'Treatment']).size().unstack()
school_balance.plot(kind='bar', ax=ax1)
ax1.set_title('Treatment Assignment by School Type\n(Blocked Design)')
ax1.set_ylabel('Number of Students')
ax1.legend(['Control', 'Treated'])
# Distribution by achievement level
achievement_balance = blocked_data.groupby(['Prior_Achievement', 'Treatment']).size().unstack()
achievement_balance.plot(kind='bar', ax=ax2)
ax2.set_title('Treatment Assignment by Achievement Level\n(Blocked Design)')
ax2.set_ylabel('Number of Students')
# Effect by subgroup
subgroup_effects = []
for school in ['Urban', 'Suburban', 'Rural']:
school_data = blocked_data[blocked_data['School_Type'] == school]
effect = (school_data[school_data['Treatment'] == 1]['Final_Score'].mean() -
school_data[school_data['Treatment'] == 0]['Final_Score'].mean())
subgroup_effects.append(effect)
ax3.bar(['Urban', 'Suburban', 'Rural'], subgroup_effects)
ax3.axhline(y=8, color='red', linestyle='--', label='True effect')
ax3.set_title('Treatment Effect by School Type')
ax3.set_ylabel('Effect Size')
ax3.legend()
# Precision comparison
designs = ['Unblocked', 'Blocked']
effects = [unblocked_effect, blocked_effect]
errors = [1.96*unblocked_se, 1.96*blocked_se]
ax4.errorbar(range(len(designs)), effects, yerr=errors, fmt='o',
capsize=8, markersize=10, linewidth=3)
ax4.axhline(y=8, color='red', linestyle='--', label='True effect')
ax4.set_xticks(range(len(designs)))
ax4.set_xticklabels(designs)
ax4.set_ylabel('Estimated Effect')
ax4.set_title('Precision Comparison\n(Blocked vs Unblocked)')
ax4.legend()
ax4.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("Blocking & Stratification Benefits:")
print("1. Ensures balance across important subgroups")
print("2. Reduces variance and increases precision")
print("3. Allows subgroup-specific effect estimation")
print("4. More efficient than covariate adjustment alone")
print("5. Protects against chance imbalances")
print()
# Stratification example
print("Stratification Example:")
print("-" * 22)
# Ensure equal representation in each stratum
strata_counts = blocked_data.groupby(['School_Type', 'Treatment']).size()
print("Stratified counts:")
print(strata_counts)
print()
print("Stratification ensures:")
print("- Equal treatment allocation within each stratum")
print("- Representative sample across all subgroups")
print("- Ability to estimate effects within subgroups")
print("- Better generalizability of results")
return blocked_data
# Demonstrate blocking and stratification
blocked_results = blocking_stratification_demo()
Practical Challenges & Ethical Considerations
Common Experimental Challenges:
Non-compliance: Subjects don't follow the treatment protocol
Attrition: Subjects drop out of the study
Contamination: Control group gets exposed to treatment
Measurement error: Outcomes not measured accurately
External events: Confounding events during study
Ethical Considerations:
Informed consent: Subjects understand risks and benefits
Risk minimization: Treatments should be safe
Equitable access: Fair treatment assignment
Right to withdraw: Subjects can leave anytime
Data privacy: Protect participant information
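The non-compliance challenge, and why intent-to-treat (ITT) analysis handles it, can be illustrated with a toy simulation. The 70% compliance rate and 10-point effect are assumptions for this sketch, not the values used in the demo below:

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_effect, compliance = 100_000, 10.0, 0.7

assigned = rng.binomial(1, 0.5, n)                # randomized assignment
took = assigned * rng.binomial(1, compliance, n)  # only compliers actually treated
outcome = 50 + true_effect * took + rng.normal(0, 5, n)

# Intent-to-treat: compare by assignment; estimates compliance * true_effect,
# a diluted but unbiased-by-design contrast
itt = outcome[assigned == 1].mean() - outcome[assigned == 0].mean()
# Comparison by treatment received (can be biased when compliance is selective)
as_treated = outcome[took == 1].mean() - outcome[took == 0].mean()
print(f"ITT estimate: {itt:.2f} (expect ~{compliance * true_effect:.1f})")
print(f"As-treated estimate: {as_treated:.2f}")
```

Here compliance is random, so the as-treated contrast happens to be fine; the danger arises when who complies depends on their potential outcomes, which ITT is immune to.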
def experimental_challenges_demo():
"""Demonstrate common experimental challenges and solutions"""
print("=== Practical Challenges in Experimental Design ===\n")
# Example: Online education platform A/B test
print("Example: Online Education Platform A/B Test")
print("-" * 42)
n_users = 10000
# Random assignment
treatment = np.random.binomial(1, 0.5, n_users)
# Simulate various challenges
# 1. Non-compliance: Some treated users don't use the feature
compliance_rate = 0.7 # 70% of treated users actually use it
actual_treatment = treatment * np.random.binomial(1, compliance_rate, n_users)
# 2. Attrition: Some users drop out
attrition_rate = 0.1 # 10% drop out
dropout = np.random.binomial(1, attrition_rate, n_users)
observed = 1 - dropout
# 3. Contamination: Some control users find the feature
contamination_rate = 0.05 # 5% of controls get contaminated
contamination = (1 - treatment) * np.random.binomial(1, contamination_rate, n_users)
actual_treatment = np.maximum(actual_treatment, contamination)
# 4. Measurement error
true_engagement = 50 + 15 * actual_treatment + np.random.normal(0, 10, n_users)
measured_engagement = true_engagement + np.random.normal(0, 5, n_users) # Measurement error
# 5. External events (holidays affect engagement)
holiday_effect = np.random.choice([0, 10], n_users, p=[0.9, 0.1]) # 10% during holidays
final_engagement = measured_engagement + holiday_effect
# Create DataFrame
challenge_data = pd.DataFrame({
'User_ID': range(n_users),
'Assigned_Treatment': treatment,
'Actual_Treatment': actual_treatment,
'Observed': observed,
'Contamination': contamination,
'True_Engagement': true_engagement,
'Measured_Engagement': measured_engagement,
'Final_Engagement': final_engagement,
'Holiday_Effect': holiday_effect
})
# Only analyze observed users
observed_data = challenge_data[challenge_data['Observed'] == 1]
print(f"Study Summary:")
print(f"- Total users: {n_users}")
print(f"- Assigned to treatment: {treatment.sum()}")
print(f"- Actually received treatment: {actual_treatment.sum()}")
print(f"- Observed (no attrition): {observed.sum()}")
print(f"- Contaminated controls: {contamination.sum()}")
print()
# Different analysis approaches
# Intent-to-Treat (ITT): Analyze as assigned
itt_effect = (observed_data[observed_data['Assigned_Treatment'] == 1]['Final_Engagement'].mean() -
observed_data[observed_data['Assigned_Treatment'] == 0]['Final_Engagement'].mean())
# Per-Protocol: Analyze actual treatment received
pp_effect = (observed_data[observed_data['Actual_Treatment'] == 1]['Final_Engagement'].mean() -
observed_data[observed_data['Actual_Treatment'] == 0]['Final_Engagement'].mean())
# As-Treated: Include contamination
at_effect = (observed_data[observed_data['Actual_Treatment'] == 1]['Final_Engagement'].mean() -
observed_data[(observed_data['Actual_Treatment'] == 0) & (observed_data['Contamination'] == 0)]['Final_Engagement'].mean())
print("Effect Estimates Under Different Analysis Approaches:")
print(f"Intent-to-Treat (ITT): {itt_effect:.2f} (conservative, includes non-compliance)")
print(f"Per-Protocol (PP): {pp_effect:.2f} (excludes non-compliant, may be biased)")
print(f"As-Treated (AT): {at_effect:.2f} (includes contamination, realistic)")
print(f"True effect: 15.00")
print()
# Statistical tests
treated_itt = observed_data[observed_data['Assigned_Treatment'] == 1]['Final_Engagement']
control_itt = observed_data[observed_data['Assigned_Treatment'] == 0]['Final_Engagement']
t_stat, p_val = stats.ttest_ind(treated_itt, control_itt)
print(f"Statistical Test (ITT):")
print(f"- T-statistic: {t_stat:.3f}")
print(f"- P-value: {p_val:.4f}")
print(f"- Significant: {'Yes' if p_val < 0.05 else 'No'}")
print()
# Visualize challenges
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
# Non-compliance
compliance_by_assignment = challenge_data.groupby('Assigned_Treatment')['Actual_Treatment'].mean()
compliance_by_assignment.plot(kind='bar', ax=ax1, color=['skyblue', 'lightcoral'])
ax1.set_title('Non-Compliance\n(Assigned vs Actual Treatment)')
ax1.set_xlabel('Assigned Treatment')
ax1.set_ylabel('Actual Treatment Rate')
ax1.set_xticklabels(['Control', 'Treatment'], rotation=0)
# Attrition
attrition_by_assignment = 1 - challenge_data.groupby('Assigned_Treatment')['Observed'].mean()
attrition_by_assignment.plot(kind='bar', ax=ax2, color=['skyblue', 'lightcoral'])
ax2.set_title('Attrition Rates\n(Differential Dropout)')
ax2.set_xlabel('Assigned Treatment')
ax2.set_ylabel('Attrition Rate')
ax2.set_xticklabels(['Control', 'Treatment'], rotation=0)
# Contamination
contamination_rate = challenge_data[challenge_data['Assigned_Treatment'] == 0]['Contamination'].mean()
ax3.bar(['Control Group'], [contamination_rate], color='orange')
ax3.set_title('Contamination\n(Control Group Exposure)')
ax3.set_ylabel('Contamination Rate')
ax3.set_ylim(0, 0.1)
# Effect estimates comparison
methods = ['ITT', 'Per-Protocol', 'As-Treated']
effects = [itt_effect, pp_effect, at_effect]
ax4.bar(methods, effects, color=['blue', 'green', 'red'])
ax4.axhline(y=15, color='black', linestyle='--', label='True effect')
ax4.set_title('Effect Estimates\n(Different Analysis Methods)')
ax4.set_ylabel('Estimated Effect')
ax4.legend()
plt.tight_layout()
plt.show()
print("Solutions to Experimental Challenges:")
print("1. Intent-to-Treat: Analyze as randomized (gold standard)")
print("2. Monitor compliance: Track actual treatment receipt")
print("3. Minimize attrition: Keep study engaging and worthwhile")
print("4. Prevent contamination: Use separate systems/platforms")
print("5. Robust measurement: Use validated, reliable measures")
print("6. Control external factors: Block on time, use covariates")
print()
# Ethical considerations
print("Ethical Considerations:")
print("-" * 23)
ethical_points = [
"Informed Consent: Clear explanation of study purpose and risks",
"Risk-Benefit Analysis: Benefits must outweigh potential harms",
"Equitable Selection: Fair inclusion criteria, no discrimination",
"Right to Withdraw: Participants can leave at any time",
"Data Privacy: Secure storage and anonymized analysis",
"Transparency: Clear reporting of methods and results",
"Independent Review: Ethics committee approval required"
]
for i, point in enumerate(ethical_points, 1):
print(f"{i}. {point}")
return challenge_data
# Demonstrate experimental challenges
challenge_results = experimental_challenges_demo()
Key Takeaways
1. RCTs are the Gold Standard
Randomization balances both observed and unobserved confounders
Simple analysis gives unbiased causal estimates
Results are directly interpretable
2. Power Analysis is Essential
Determines sample size needed for reliable results
Prevents underpowered (wasteful) or overpowered studies
Consider effect size, significance level, and desired power
3. Blocking & Stratification Improve Precision
Ensure balance across important subgroups
Reduce variance and increase statistical power
Enable subgroup-specific analyses
4. Practical Challenges are Common
Non-compliance, attrition, contamination affect results
Intent-to-Treat analysis maintains randomization benefits
Careful monitoring and robust design mitigate issues
5. Ethics are Paramount
Informed consent and risk minimization required
Equitable treatment and data privacy essential
Independent ethical review protects participants
Critical Thinking Questions
When would you choose an RCT over observational methods?
How does non-compliance affect causal interpretation?
What's the difference between statistical and practical significance?
How can you minimize attrition in a long-term study?
What ethical issues arise in A/B testing of user interfaces?
Next Steps
Now that you understand experimental design, you're ready for:
Matching Methods: Propensity score matching for observational data
Instrumental Variables: Natural experiments and exogenous variation
Regression Discontinuity: Quasi-experimental designs
Difference-in-Differences: Before-after comparisons
Remember: Good experimental design is the foundation of trustworthy causal inference. Always prioritize randomization, power, and ethics!
"The most important thing in an experiment is to know what you are trying to find out." - Ludwig Wittgenstein