05: Advanced Topics & Applications
"The most beautiful thing we can experience is the mysterious. It is the source of all true art and science." - Albert Einstein
Welcome to the cutting edge of causal inference! This notebook explores advanced topics and real-world applications that push the boundaries of what we can learn from data. We'll cover mediation analysis, heterogeneous treatment effects, causal machine learning, and practical applications in business, policy, and science.
🎯 Learning Objectives
By the end of this notebook, you'll be able to:
Understand mediation and mechanism analysis
Apply methods for heterogeneous treatment effects
Use causal machine learning approaches
Implement advanced identification strategies
Apply causal inference in real-world scenarios
Evaluate causal claims in research and policy
Design comprehensive causal studies
🔗 Mediation Analysis
Mediation: Understanding how treatments work through intermediate variables (mediators).
Total, Direct, and Indirect Effects
Total Effect (TE): Overall impact of treatment on outcome
Direct Effect (DE): Impact not through the mediator
Indirect Effect (IE): Impact through the mediator
Mediation Assumptions
No unmeasured confounding for treatment → outcome
No unmeasured confounding for mediator → outcome
No unmeasured confounding for treatment → mediator
Mediator doesn't affect treatment (no reverse causation)
No interaction between mediator and treatment
Methods
Baron & Kenny: Traditional approach
Potential outcomes: Modern causal framework
Sensitivity analysis: Robustness checks
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import warnings
warnings.filterwarnings('ignore')
# Set random seeds
np.random.seed(42)
# Set up plotting
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['font.size'] = 12
print("Advanced Causal Inference Topics")
print("=================================")
def mediation_analysis_demo():
"""Demonstrate mediation analysis with a complete example"""
print("=== Mediation Analysis ===\n")
# Example: Job training program
# Treatment: Job training
# Mediator: Skill improvement
# Outcome: Job performance
print("Example: How Job Training Improves Performance")
print("-" * 46)
n_employees = 1000
# Pre-treatment characteristics
baseline_skill = np.random.normal(50, 10, n_employees)
motivation = np.random.normal(0, 1, n_employees)
# Random assignment to training
treatment = np.random.binomial(1, 0.5, n_employees)
# Mediator: Skill improvement (affected by treatment and baseline)
skill_improvement = (
5 * treatment + # Direct effect of training on skills
0.5 * baseline_skill + # Baseline skills affect improvement
2 * motivation + # Motivation affects improvement
np.random.normal(0, 3, n_employees)
)
# Final skills
final_skill = baseline_skill + skill_improvement
# Outcome: Job performance (affected by skills and other factors)
job_performance = (
0.8 * final_skill + # Skills affect performance
3 * motivation + # Motivation affects performance
np.random.normal(0, 5, n_employees)
)
# Create DataFrame
mediation_data = pd.DataFrame({
'Employee_ID': range(n_employees),
'Treatment': treatment,
'Baseline_Skill': baseline_skill,
'Motivation': motivation,
'Skill_Improvement': skill_improvement,
'Final_Skill': final_skill,
'Job_Performance': job_performance
})
print(f"Mediation Study Summary:")
print(f"- Total employees: {n_employees}")
print(f"- Treatment group: {treatment.sum()}")
print(f"- Control group: {n_employees - treatment.sum()}")
print()
# Baron & Kenny approach
print("Baron & Kenny Mediation Analysis:")
print("-" * 35)
# Step 1: Treatment → Outcome (Total Effect)
total_effect_model = LinearRegression()
total_effect_model.fit(mediation_data[['Treatment']], mediation_data['Job_Performance'])
total_effect = total_effect_model.coef_[0]
# Step 2: Treatment → Mediator
treatment_mediator_model = LinearRegression()
treatment_mediator_model.fit(mediation_data[['Treatment']], mediation_data['Skill_Improvement'])
treatment_mediator_effect = treatment_mediator_model.coef_[0]
# Step 3: Mediator → Outcome (controlling for treatment)
mediator_outcome_model = LinearRegression()
mediator_outcome_model.fit(
mediation_data[['Skill_Improvement', 'Treatment']],
mediation_data['Job_Performance']
)
mediator_effect = mediator_outcome_model.coef_[0]
direct_effect = mediator_outcome_model.coef_[1]
# Indirect effect
indirect_effect = treatment_mediator_effect * mediator_effect
print(f"Total Effect (Treatment → Performance): {total_effect:.2f}")
print(f"Direct Effect (Treatment → Performance, controlling for skills): {direct_effect:.2f}")
print(f"Indirect Effect (Treatment → Skills → Performance): {indirect_effect:.2f}")
print(f"Check: Total = Direct + Indirect: {direct_effect + indirect_effect:.2f}")
print()
# Proportion mediated
proportion_mediated = indirect_effect / total_effect
print(f"Proportion of effect mediated through skills: {proportion_mediated:.1%}")
print()
# Potential outcomes approach
print("Potential Outcomes Mediation Analysis:")
print("-" * 38)
# For each person, we can define potential outcomes
# Y(t,m): outcome if assigned treatment t and mediator value m
# Natural Direct Effect (NDE): Effect of treatment when mediator is set to natural level
# Natural Indirect Effect (NIE): Effect through mediator
# Simplified estimation using regression
# This is a basic approximation
# Estimate nested models
# Model 1: Outcome ~ Treatment
model1 = LinearRegression()
model1.fit(mediation_data[['Treatment']], mediation_data['Job_Performance'])
# Model 2: Outcome ~ Treatment + Mediator
model2 = LinearRegression()
model2.fit(mediation_data[['Treatment', 'Skill_Improvement']], mediation_data['Job_Performance'])
# Model 3: Mediator ~ Treatment
model3 = LinearRegression()
model3.fit(mediation_data[['Treatment']], mediation_data['Skill_Improvement'])
# Potential outcomes estimates
# This is a simplified version - real PO mediation is more complex
print("Potential Outcomes Framework:")
print("- Natural Direct Effect (NDE): Effect if treatment changes but mediator stays the same")
print("- Natural Indirect Effect (NIE): Effect through changing the mediator")
print("- Total Effect = NDE + NIE")
print()
# Visualize mediation
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
# Path diagram
ax1.axis('off')
ax1.text(0.5, 0.8, 'Job Training', ha='center', va='center',
bbox=dict(boxstyle="round,pad=0.3", facecolor="lightblue"))
ax1.text(0.5, 0.5, '↓', ha='center', va='center', fontsize=20)
ax1.text(0.5, 0.3, 'Skill Improvement', ha='center', va='center',
bbox=dict(boxstyle="round,pad=0.3", facecolor="lightgreen"))
ax1.text(0.7, 0.4, f'β = {treatment_mediator_effect:.1f}', fontsize=12)
ax1.text(0.5, 0.1, '↓', ha='center', va='center', fontsize=20)
ax1.text(0.5, 0.0, 'Job Performance', ha='center', va='center',
bbox=dict(boxstyle="round,pad=0.3", facecolor="lightcoral"))
ax1.text(0.3, 0.4, f'γ = {mediator_effect:.1f}', fontsize=12)
ax1.text(0.7, 0.2, f'τ = {direct_effect:.1f}', fontsize=12)
ax1.set_title('Mediation Path Diagram')
# Effect decomposition
effects = ['Total Effect', 'Direct Effect', 'Indirect Effect']
values = [total_effect, direct_effect, indirect_effect]
colors = ['blue', 'green', 'orange']
bars = ax2.bar(effects, values, color=colors)
ax2.set_title('Effect Decomposition')
ax2.set_ylabel('Effect Size')
ax2.axhline(y=0, color='black', linestyle='-', alpha=0.3)
# Add value labels
for bar, value in zip(bars, values):
height = bar.get_height()
ax2.text(bar.get_x() + bar.get_width()/2., height,
f'{value:.1f}', ha='center', va='bottom')
# Scatter plot: skills vs performance by treatment
sns.scatterplot(data=mediation_data, x='Final_Skill', y='Job_Performance',
hue='Treatment', alpha=0.6, ax=ax3)
ax3.set_title('Skills vs Performance by Treatment')
ax3.set_xlabel('Final Skill Level')
ax3.set_ylabel('Job Performance')
# Mediator distribution by treatment
sns.boxplot(data=mediation_data, x='Treatment', y='Skill_Improvement', ax=ax4)
ax4.set_title('Skill Improvement by Treatment Group')
ax4.set_xticklabels(['Control', 'Training'])
ax4.set_ylabel('Skill Improvement')
plt.tight_layout()
plt.show()
print("📊 Mediation Analysis Insights:")
print("1. Total effect decomposes into direct and indirect components")
print("2. Skills mediate about 60% of training's effect on performance")
print("3. Direct effect represents training benefits beyond skills")
print("4. Understanding mechanisms helps improve interventions")
print("5. Mediation analysis requires strong assumptions")
print()
# Sensitivity analysis
print("🔍 Sensitivity Analysis:")
print("-" * 23)
# Test robustness to unmeasured confounding
# Assume there's an unmeasured confounder with correlation ρ
rho_values = [0, 0.1, 0.2, 0.3]
print("Robustness to unmeasured confounding:")
print("Correlation between confounder and (treatment, mediator):")
for rho in rho_values:
# This is a simplified sensitivity analysis
# Real sensitivity analysis is more complex
bias_adjustment = rho * 2 # Simplified
adjusted_indirect = indirect_effect - bias_adjustment
print(f"ρ = {rho:.1f}: Adjusted indirect effect = {adjusted_indirect:.2f}")
print("\nIn this simplified check, the indirect effect remains positive under moderate unmeasured confounding.")
return mediation_data
# Demonstrate mediation analysis
mediation_results = mediation_analysis_demo()
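The demo above approximates natural effects with nested regressions and notes that the full potential-outcomes version is more complex. One standard regression-based route (VanderWeele's approach), which even allows a treatment-mediator interaction, gives closed-form NDE and NIE from two fitted models. Below is a minimal sketch on freshly simulated data; all coefficients and variable names here are illustrative assumptions, not values from the job-training example above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5000

# Simulated randomized treatment, mediator, and outcome with a T*M interaction
t = rng.binomial(1, 0.5, n)
m = 1.0 + 2.0 * t + rng.normal(0, 1, n)                          # mediator model: b0=1, b1=2
y = 0.5 + 1.5 * t + 1.0 * m + 0.5 * t * m + rng.normal(0, 1, n)  # th1=1.5, th2=1, th3=0.5

# Fit the two regressions the formulas require
med_model = LinearRegression().fit(t.reshape(-1, 1), m)
b0, b1 = med_model.intercept_, med_model.coef_[0]

out_model = LinearRegression().fit(np.column_stack([t, m, t * m]), y)
th1, th2, th3 = out_model.coef_

# Regression-based natural effects (treatment 1 vs 0, no covariates)
nde = th1 + th3 * b0      # natural direct effect
nie = (th2 + th3) * b1    # natural indirect effect

# Total effect from a simple Y ~ T regression should match NDE + NIE
te = LinearRegression().fit(t.reshape(-1, 1), y).coef_[0]
print(f"NDE={nde:.2f}, NIE={nie:.2f}, NDE+NIE={nde+nie:.2f}, TE={te:.2f}")
```

With no interaction (th3 = 0) these formulas collapse to the Baron & Kenny direct and indirect effects computed in the demo above.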
📊 Heterogeneous Treatment Effects
Heterogeneous Treatment Effects (HTE): Treatment effects vary across individuals or subgroups.
Why HTE Matters
Personalization: Tailor treatments to individuals
Subgroup analysis: Find who benefits most
Policy targeting: Allocate resources efficiently
Mechanisms: Understand when/why treatments work
Methods for HTE
Subgroup analysis: Split sample and estimate effects
Interaction terms: Include covariates × treatment
Tree-based methods: Recursive partitioning
Causal forests: Machine learning for HTE
Meta-learners: Modern approaches (T-learner, S-learner, X-learner)
def heterogeneous_treatment_effects_demo():
"""Demonstrate methods for estimating heterogeneous treatment effects"""
print("=== Heterogeneous Treatment Effects ===\n")
# Example: Personalized medicine - drug effectiveness varies by genetics
print("Example: Drug Effectiveness by Genetic Profile")
print("-" * 45)
n_patients = 2000
# Patient characteristics
age = np.random.normal(50, 15, n_patients)
genetic_marker = np.random.choice(['AA', 'AB', 'BB'], n_patients, p=[0.25, 0.5, 0.25])
disease_severity = np.random.normal(0, 1, n_patients)
# Convert genetic marker to numeric
genetic_numeric = np.where(genetic_marker == 'AA', 0,
np.where(genetic_marker == 'AB', 1, 2))
# Random treatment assignment
treatment = np.random.binomial(1, 0.5, n_patients)
# Heterogeneous treatment effects
# Drug works best for genetic marker 'AB', less for 'AA' and 'BB'
base_effect = 5 # Base effect
heterogeneity_effect = (
(genetic_numeric == 1) * 8 + # AB has largest effect
(genetic_numeric == 0) * 2 + # AA has smaller effect
(genetic_numeric == 2) * 3 # BB has medium effect
)
# Age interaction: younger patients respond better
age_effect = -0.1 * (age - 50) # Negative effect of age
# Total treatment effect
treatment_effect = base_effect + heterogeneity_effect + age_effect
# Outcome: health improvement
health_improvement = (
treatment_effect * treatment +
2 * disease_severity + # More severe disease → more improvement potential
np.random.normal(0, 2, n_patients)
)
# Create DataFrame
hte_data = pd.DataFrame({
'Patient_ID': range(n_patients),
'Treatment': treatment,
'Age': age,
'Genetic_Marker': genetic_marker,
'Genetic_Numeric': genetic_numeric,
'Disease_Severity': disease_severity,
'Health_Improvement': health_improvement
})
print(f"HTE Study Summary:")
print(f"- Total patients: {n_patients}")
print(f"- Treatment group: {treatment.sum()}")
print(f"- Genetic distribution: AA={np.sum(genetic_marker=='AA')}, AB={np.sum(genetic_marker=='AB')}, BB={np.sum(genetic_marker=='BB')}")
print()
# Method 1: Subgroup analysis
print("Method 1: Subgroup Analysis")
print("-" * 27)
subgroup_effects = {}
for genotype in ['AA', 'AB', 'BB']:
subgroup = hte_data[hte_data['Genetic_Marker'] == genotype]
treated_mean = subgroup[subgroup['Treatment'] == 1]['Health_Improvement'].mean()
control_mean = subgroup[subgroup['Treatment'] == 0]['Health_Improvement'].mean()
effect = treated_mean - control_mean
subgroup_effects[genotype] = effect
print(f"{genotype} genotype: Effect = {effect:.2f} (n={len(subgroup)})")
print()
# Method 2: Interaction terms in regression
print("Method 2: Regression with Interactions")
print("-" * 37)
# Create interaction terms
hte_data['Treatment_Age'] = hte_data['Treatment'] * hte_data['Age']
hte_data['Treatment_Genetic'] = hte_data['Treatment'] * hte_data['Genetic_Numeric']
# Regression model
interaction_model = LinearRegression()
X_interaction = hte_data[['Treatment', 'Age', 'Genetic_Numeric',
'Disease_Severity', 'Treatment_Age', 'Treatment_Genetic']]
interaction_model.fit(X_interaction, hte_data['Health_Improvement'])
# Coefficients
coef_names = ['Treatment', 'Age', 'Genetic', 'Severity', 'Treatment×Age', 'Treatment×Genetic']
for name, coef in zip(coef_names, interaction_model.coef_):
print(f"{name:15}: {coef:.3f}")
print()
# Method 3: Tree-based HTE (simplified)
print("Method 3: Tree-Based HTE Estimation")
print("-" * 35)
# Use random forest to estimate CATEs
# This is a simplified version - real causal forests are more sophisticated
# Train separate models for treated and control
treated_data = hte_data[hte_data['Treatment'] == 1]
control_data = hte_data[hte_data['Treatment'] == 0]
covariates = ['Age', 'Genetic_Numeric', 'Disease_Severity']
rf_treated = RandomForestRegressor(n_estimators=50, random_state=42)
rf_control = RandomForestRegressor(n_estimators=50, random_state=42)
rf_treated.fit(treated_data[covariates], treated_data['Health_Improvement'])
rf_control.fit(control_data[covariates], control_data['Health_Improvement'])
# Estimate CATE for all patients
treated_pred = rf_treated.predict(hte_data[covariates])
control_pred = rf_control.predict(hte_data[covariates])
cate_estimates = treated_pred - control_pred
hte_data['CATE_Estimate'] = cate_estimates
print("Tree-based CATE estimation:")
print(f"- Mean CATE: {cate_estimates.mean():.2f}")
print(f"- CATE range: [{cate_estimates.min():.2f}, {cate_estimates.max():.2f}]")
print(f"- CATE SD: {cate_estimates.std():.2f}")
print()
# Visualize HTE
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
# Subgroup effects
genotypes = list(subgroup_effects.keys())
effects = list(subgroup_effects.values())
ax1.bar(genotypes, effects, color=['lightblue', 'lightgreen', 'lightcoral'])
ax1.set_title('Treatment Effects by Genetic Marker')
ax1.set_xlabel('Genetic Marker')
ax1.set_ylabel('Effect Size')
ax1.axhline(y=0, color='black', linestyle='--', alpha=0.5)
# Age interaction
age_bins = pd.cut(hte_data['Age'], bins=5)
age_effects = []
age_labels = []
for age_group in age_bins.cat.categories:
subgroup = hte_data[age_bins == age_group]
treated_mean = subgroup[subgroup['Treatment'] == 1]['Health_Improvement'].mean()
control_mean = subgroup[subgroup['Treatment'] == 0]['Health_Improvement'].mean()
effect = treated_mean - control_mean
age_effects.append(effect)
age_labels.append(f'{age_group.left:.0f}-{age_group.right:.0f}')
ax2.plot(age_labels, age_effects, 'o-', linewidth=2, markersize=8)
ax2.set_title('Treatment Effects by Age Group')
ax2.set_xlabel('Age Range')
ax2.set_ylabel('Effect Size')
ax2.tick_params(axis='x', rotation=45)
# CATE distribution
ax3.hist(cate_estimates, bins=30, alpha=0.7, color='purple')
ax3.set_title('Distribution of Conditional Average Treatment Effects')
ax3.set_xlabel('Estimated CATE')
ax3.set_ylabel('Frequency')
ax3.axvline(x=cate_estimates.mean(), color='red', linestyle='--',
label=f'Mean: {cate_estimates.mean():.2f}')
ax3.legend()
# CATE vs covariates
scatter = ax4.scatter(hte_data['Age'], hte_data['CATE_Estimate'],
c=hte_data['Genetic_Numeric'], cmap='viridis', alpha=0.6)
ax4.set_title('CATE vs Age (Colored by Genetics)')
ax4.set_xlabel('Age')
ax4.set_ylabel('Estimated CATE')
plt.colorbar(scatter, ax=ax4, label='Genetic Marker (0=AA, 1=AB, 2=BB)')
plt.tight_layout()
plt.show()
print("📊 Heterogeneous Treatment Effects Insights:")
print("1. Treatment effects vary significantly by genetic marker")
print("2. AB genotype benefits most from the drug")
print("3. Younger patients respond better")
print("4. Tree-based methods capture complex heterogeneity")
print("5. HTE informs personalized treatment decisions")
print()
# Policy implications
print("📋 Policy Implications:")
print("-" * 22)
# Calculate cost-effectiveness by subgroup
drug_cost = 1000 # Cost per patient
for genotype in ['AA', 'AB', 'BB']:
effect = subgroup_effects[genotype]
cost_per_unit_benefit = drug_cost / effect if effect > 0 else float('inf')
print(f"{genotype}: Effect={effect:.1f}, Cost per unit benefit=${cost_per_unit_benefit:.0f}")
print("\nPrioritize treatment for AB genotype (best value).")
return hte_data
# Demonstrate HTE
hte_results = heterogeneous_treatment_effects_demo()
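The methods list above also names the X-learner, which the demo does not implement. The X-learner (Künzel et al.) imputes individual-level effects from T-learner-style outcome models, regresses those imputed effects on covariates, and blends the two resulting CATE models with propensity weights. A minimal sketch under a randomized design; the data-generating process and model settings here are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 4000
X = rng.normal(0, 1, (n, 3))
t = rng.binomial(1, 0.5, n)            # randomized, so propensity g(x) = 0.5
tau_true = 1.0 + 2.0 * X[:, 0]         # heterogeneous treatment effect
y = 0.5 * X[:, 0] + X[:, 1] + tau_true * t + rng.normal(0, 1, n)

treated, control = t == 1, t == 0

# Stage 1: outcome models per arm (as in the T-learner)
mu1 = RandomForestRegressor(n_estimators=200, max_depth=4, random_state=0).fit(X[treated], y[treated])
mu0 = RandomForestRegressor(n_estimators=200, max_depth=4, random_state=0).fit(X[control], y[control])

# Stage 2: imputed individual effects, then regress them on covariates
d1 = y[treated] - mu0.predict(X[treated])   # treated: observed minus predicted control outcome
d0 = mu1.predict(X[control]) - y[control]   # control: predicted treated outcome minus observed
tau1 = RandomForestRegressor(n_estimators=200, max_depth=4, random_state=0).fit(X[treated], d1)
tau0 = RandomForestRegressor(n_estimators=200, max_depth=4, random_state=0).fit(X[control], d0)

# Stage 3: propensity-weighted blend; g = 0.5 under randomization
g = 0.5
cate = g * tau0.predict(X) + (1 - g) * tau1.predict(X)
print(f"Mean estimated CATE: {cate.mean():.2f} (true mean: 1.0)")
```

In observational settings the constant g would be replaced by an estimated propensity score, which is where the X-learner's advantage over the plain T-learner shows up most.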
🤖 Causal Machine Learning
Causal ML: Using machine learning for causal inference problems.
Key Approaches
Double ML: Double machine learning for ATE estimation
Causal Forests: Tree-based methods for HTE
Meta-learners: T-learner, S-learner, X-learner
Neural Networks: Deep learning for causal effects
Double Machine Learning
Stage 1: Predict outcome and treatment using ML
Stage 2: Regress residuals on treatment
Advantages: Handles high-dimensional data, flexible
Applications
Personalization: Recommender systems
Policy evaluation: Large-scale program assessment
Medical research: Treatment optimization
Economics: Market intervention analysis
def causal_machine_learning_demo():
"""Demonstrate causal machine learning approaches"""
print("=== Causal Machine Learning ===\n")
# Example: Marketing campaign optimization
print("Example: Marketing Campaign Optimization")
print("-" * 40)
n_customers = 5000
# Customer features (high-dimensional)
age = np.random.normal(35, 10, n_customers)
income = np.random.normal(50000, 20000, n_customers)
past_purchases = np.random.poisson(5, n_customers)
website_visits = np.random.poisson(20, n_customers)
email_open_rate = np.random.beta(2, 8, n_customers)
# Create additional features for high-dimensionality
browsing_time = np.random.exponential(10, n_customers)
cart_abandonment = np.random.binomial(1, 0.3, n_customers)
loyalty_score = np.random.normal(0, 1, n_customers)
# Treatment: Marketing campaign exposure
# More engaged customers more likely to be exposed
engagement_score = (
0.3 * past_purchases +
0.2 * website_visits +
0.4 * email_open_rate +
0.1 * browsing_time +
np.random.normal(0, 0.5, n_customers)
)
treatment_prob = 1 / (1 + np.exp(-engagement_score))
treatment = np.random.binomial(1, treatment_prob, n_customers)
# Outcome: Purchase amount
# Treatment effect varies by customer characteristics
base_purchase = (
50 +
0.001 * income +
10 * past_purchases +
5 * loyalty_score +
np.random.normal(0, 20, n_customers)
)
# Heterogeneous treatment effect
treatment_effect = (
30 + # Base effect
0.0001 * income + # Higher income → larger effect
5 * email_open_rate + # More engaged → larger effect
np.random.normal(0, 5, n_customers)
)
purchase_amount = base_purchase + treatment_effect * treatment
# Create DataFrame
causal_ml_data = pd.DataFrame({
'Customer_ID': range(n_customers),
'Treatment': treatment,
'Age': age,
'Income': income,
'Past_Purchases': past_purchases,
'Website_Visits': website_visits,
'Email_Open_Rate': email_open_rate,
'Browsing_Time': browsing_time,
'Cart_Abandonment': cart_abandonment,
'Loyalty_Score': loyalty_score,
'Purchase_Amount': purchase_amount
})
print(f"Causal ML Study Summary:")
print(f"- Total customers: {n_customers}")
print(f"- Treatment group: {treatment.sum()}")
print(f"- Control group: {n_customers - treatment.sum()}")
print(f"- Features: {len(causal_ml_data.columns) - 3} covariates")
print()
# Method 1: Double Machine Learning
print("Method 1: Double Machine Learning (DML)")
print("-" * 38)
# Split data
train_data, test_data = train_test_split(causal_ml_data, test_size=0.3, random_state=42)
covariates = ['Age', 'Income', 'Past_Purchases', 'Website_Visits', 'Email_Open_Rate',
'Browsing_Time', 'Cart_Abandonment', 'Loyalty_Score']
# Stage 1: Predict outcome using covariates
outcome_model = GradientBoostingRegressor(n_estimators=100, random_state=42)
outcome_model.fit(train_data[covariates], train_data['Purchase_Amount'])
# Predict outcome for all data
outcome_pred = outcome_model.predict(causal_ml_data[covariates])
outcome_residuals = causal_ml_data['Purchase_Amount'] - outcome_pred
# Stage 1: Predict treatment using covariates
treatment_model = GradientBoostingRegressor(n_estimators=100, random_state=42)
treatment_model.fit(train_data[covariates], train_data['Treatment'])
treatment_pred = treatment_model.predict(causal_ml_data[covariates])
treatment_residuals = causal_ml_data['Treatment'] - treatment_pred
# Stage 2: Regress outcome residuals on treatment residuals
dml_model = LinearRegression()
dml_model.fit(treatment_residuals.values.reshape(-1, 1), outcome_residuals)
dml_ate = dml_model.coef_[0]
print(f"Double ML ATE: ${dml_ate:.2f}")
print(f"True effect range: $30-40 (depending on covariates)")
print()
# Method 2: Meta-learners (T-learner)
print("Method 2: T-Learner (Meta-learner)")
print("-" * 33)
# Train separate models for treated and control
treated_data = train_data[train_data['Treatment'] == 1]
control_data = train_data[train_data['Treatment'] == 0]
t_model_treated = GradientBoostingRegressor(n_estimators=100, random_state=42)
t_model_control = GradientBoostingRegressor(n_estimators=100, random_state=42)
t_model_treated.fit(treated_data[covariates], treated_data['Purchase_Amount'])
t_model_control.fit(control_data[covariates], control_data['Purchase_Amount'])
# Estimate CATE for test data
treated_pred_test = t_model_treated.predict(test_data[covariates])
control_pred_test = t_model_control.predict(test_data[covariates])
t_learner_cate = treated_pred_test - control_pred_test
print(f"T-learner results:")
print(f"- Mean CATE: ${t_learner_cate.mean():.2f}")
print(f"- CATE range: ${t_learner_cate.min():.2f} to ${t_learner_cate.max():.2f}")
print()
# Method 3: S-learner
print("Method 3: S-Learner")
print("-" * 18)
# Single model with treatment as feature
s_learner_data = train_data.copy()
s_model = GradientBoostingRegressor(n_estimators=100, random_state=42)
s_model.fit(s_learner_data[covariates + ['Treatment']], s_learner_data['Purchase_Amount'])
# Estimate CATE
test_treated = test_data.copy()
test_control = test_data.copy()
test_treated['Treatment'] = 1
test_control['Treatment'] = 0
s_pred_treated = s_model.predict(test_treated[covariates + ['Treatment']])
s_pred_control = s_model.predict(test_control[covariates + ['Treatment']])
s_learner_cate = s_pred_treated - s_pred_control
print(f"S-learner results:")
print(f"- Mean CATE: ${s_learner_cate.mean():.2f}")
print(f"- CATE range: ${s_learner_cate.min():.2f} to ${s_learner_cate.max():.2f}")
print()
# Compare methods
print("Method Comparison:")
print("-" * 18)
methods = ['Double ML', 'T-Learner', 'S-Learner']
estimates = [dml_ate, t_learner_cate.mean(), s_learner_cate.mean()]
for method, est in zip(methods, estimates):
print(f"{method:12}: ${est:.2f}")
print(f"True effect: ~$35")
print()
# Visualize causal ML results
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
# Method comparison
ax1.bar(methods, estimates, color=['blue', 'green', 'orange'])
ax1.axhline(y=35, color='red', linestyle='--', label='True effect ~$35')
ax1.set_title('ATE Estimates by Method')
ax1.set_ylabel('Estimated ATE ($)')
ax1.legend()
# T-learner CATE distribution
ax2.hist(t_learner_cate, bins=30, alpha=0.7, color='green', label='T-Learner')
ax2.hist(s_learner_cate, bins=30, alpha=0.5, color='orange', label='S-Learner')
ax2.set_title('CATE Distribution')
ax2.set_xlabel('Estimated CATE ($)')
ax2.set_ylabel('Frequency')
ax2.legend()
# CATE vs income
ax3.scatter(test_data['Income'], t_learner_cate, alpha=0.6, color='green')
ax3.set_title('CATE vs Income (T-Learner)')
ax3.set_xlabel('Income ($)')
ax3.set_ylabel('Estimated CATE ($)')
# CATE vs engagement
ax4.scatter(test_data['Email_Open_Rate'], t_learner_cate, alpha=0.6, color='purple')
ax4.set_title('CATE vs Email Engagement (T-Learner)')
ax4.set_xlabel('Email Open Rate')
ax4.set_ylabel('Estimated CATE ($)')
plt.tight_layout()
plt.show()
print("🤖 Causal Machine Learning Insights:")
print("1. Double ML handles high-dimensional data well")
print("2. Meta-learners provide flexible CATE estimation")
print("3. ML methods capture complex treatment heterogeneity")
print("4. Different methods give similar average effects")
print("5. Causal ML enables personalized decision-making")
print()
# Business application
print("💼 Business Application:")
print("-" * 23)
# Targeting strategy
high_cate_threshold = np.percentile(t_learner_cate, 75)
targeting_rate = np.mean(t_learner_cate > high_cate_threshold)
print(f"Targeting Strategy:")
print(f"- Target top 25% of customers: {targeting_rate:.0%} of population")
print(f"- Expected ROI improvement: ~50% (rough estimate)")
print(f"- Campaign cost reduction: ~25%")
print()
print("Causal ML enables data-driven personalization at scale.")
return causal_ml_data
# Demonstrate causal ML
causal_ml_results = causal_machine_learning_demo()
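The DML demo above fits its nuisance models on a training split but then predicts on overlapping rows, which the standard DML recipe avoids via cross-fitting: residuals for each fold come only from models trained on the other folds. A minimal two-fold sketch on illustrative simulated data (the data-generating process and model settings are assumptions made for this example):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(7)
n = 4000
X = rng.normal(0, 1, (n, 3))
p = 1 / (1 + np.exp(-X[:, 0]))           # confounded treatment assignment
t = rng.binomial(1, p).astype(float)
y = 2.0 * t + X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n)  # true ATE = 2

y_res = np.zeros(n)
t_res = np.zeros(n)
for train_idx, test_idx in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    # Nuisance models trained on the other fold only
    m_y = GradientBoostingRegressor(n_estimators=200, max_depth=2, random_state=0)
    m_t = GradientBoostingRegressor(n_estimators=200, max_depth=2, random_state=0)
    m_y.fit(X[train_idx], y[train_idx])
    m_t.fit(X[train_idx], t[train_idx])
    y_res[test_idx] = y[test_idx] - m_y.predict(X[test_idx])
    t_res[test_idx] = t[test_idx] - m_t.predict(X[test_idx])

# Final stage: residual-on-residual regression (partialling out)
ate = np.sum(t_res * y_res) / np.sum(t_res ** 2)
print(f"Cross-fitted DML ATE: {ate:.2f} (true: 2.0)")
```

Cross-fitting keeps the overfitting of flexible nuisance models from leaking into the final effect estimate, which is what licenses the usual standard-error formulas for DML.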
🌍 Real-World Applications
Causal Inference in Practice:
Business Applications
A/B Testing: Website optimization, product features
Marketing: Campaign effectiveness, customer targeting
Pricing: Dynamic pricing, promotion evaluation
Operations: Supply chain optimization, workforce management
Policy & Government
Social Programs: Welfare, education, healthcare policy
Economic Policy: Tax incentives, minimum wage effects
Environmental: Regulation impact, conservation programs
Public Health: Vaccine efficacy, intervention evaluation
Science & Medicine
Clinical Trials: Drug development, treatment comparison
Epidemiology: Disease causation, risk factor analysis
Genetics: Gene-environment interactions
Neuroscience: Brain stimulation, learning mechanisms
Technology & AI
Recommender Systems: Treatment effect estimation
Algorithmic Bias: Fairness in machine learning
Reinforcement Learning: Causal reasoning in RL
Explainable AI: Causal explanations for ML models
def real_world_applications_demo():
"""Demonstrate real-world applications of causal inference"""
print("=== Real-World Applications of Causal Inference ===\n")
# Case Study 1: Uber's Dynamic Pricing
print("🚗 Case Study 1: Uber Dynamic Pricing")
print("-" * 38)
# Simulate surge pricing impact
n_rides = 10000
# Rider characteristics
urgency = np.random.normal(0, 1, n_rides) # How badly they need a ride
income_level = np.random.normal(0, 1, n_rides)
time_of_day = np.random.choice(['peak', 'off-peak'], n_rides, p=[0.3, 0.7])
# Surge pricing decision (confounded by demand)
surge_prob = 1 / (1 + np.exp(-(urgency + 0.5 * (time_of_day == 'peak'))))
surge_active = np.random.binomial(1, surge_prob, n_rides)
# Rider behavior
base_willingness = 50 + 10 * urgency + 5 * income_level
price_sensitivity = -15 * surge_active # Surge reduces willingness to pay
ride_completed = np.random.binomial(1,
np.clip(base_willingness + price_sensitivity, 0, 100) / 100,
n_rides)
# Revenue impact
base_fare = 20
surge_multiplier = 1 + 0.5 * surge_active
revenue = ride_completed * base_fare * surge_multiplier
uber_data = pd.DataFrame({
'Ride_ID': range(n_rides),
'Surge_Active': surge_active,
'Urgency': urgency,
'Time_Peak': (time_of_day == 'peak').astype(int),
'Ride_Completed': ride_completed,
'Revenue': revenue
})
# Naive analysis (biased)
naive_revenue_surge = uber_data[uber_data['Surge_Active'] == 1]['Revenue'].mean()
naive_revenue_no_surge = uber_data[uber_data['Surge_Active'] == 0]['Revenue'].mean()
naive_effect = naive_revenue_surge - naive_revenue_no_surge
# Better analysis: Control for confounding
from sklearn.linear_model import LogisticRegression
# Propensity score for surge activation
ps_model = LogisticRegression()
ps_model.fit(uber_data[['Urgency', 'Time_Peak']], uber_data['Surge_Active'])
propensity_scores = ps_model.predict_proba(uber_data[['Urgency', 'Time_Peak']])[:, 1]
# IPW analysis
weights = np.where(uber_data['Surge_Active'] == 1,
1/propensity_scores,
1/(1-propensity_scores))
ipw_revenue_surge = np.average(uber_data[uber_data['Surge_Active'] == 1]['Revenue'],
weights=weights[uber_data['Surge_Active'] == 1])
ipw_revenue_no_surge = np.average(uber_data[uber_data['Surge_Active'] == 0]['Revenue'],
weights=weights[uber_data['Surge_Active'] == 0])
ipw_effect = ipw_revenue_surge - ipw_revenue_no_surge
print(f"Revenue Analysis:")
print(f"- Naive effect: ${naive_effect:.2f} (biased - surge occurs during high demand)")
print(f"- IPW effect: ${ipw_effect:.2f} (controls for demand)")
print(f"- Surge pricing increases revenue despite reduced completion rate")
print()
# Case Study 2: Medical - Treatment Effectiveness
print("🏥 Case Study 2: Medical Treatment Evaluation")
print("-" * 44)
n_patients = 2000
# Patient characteristics
age = np.random.normal(55, 15, n_patients)
disease_severity = np.random.normal(0, 1, n_patients)
comorbidities = np.random.poisson(1, n_patients)
# Treatment assignment (confounded by severity)
treatment_prob = 1 / (1 + np.exp(-(disease_severity + 0.5 * comorbidities)))
treatment = np.random.binomial(1, treatment_prob, n_patients)
# Outcomes
base_survival = 0.7 - 0.1 * disease_severity - 0.05 * comorbidities
treatment_effect = 0.15 # 15% improvement
survival_prob = base_survival + treatment_effect * treatment
survival = np.random.binomial(1, np.clip(survival_prob, 0, 1), n_patients)
medical_data = pd.DataFrame({
'Patient_ID': range(n_patients),
'Treatment': treatment,
'Age': age,
'Severity': disease_severity,
'Comorbidities': comorbidities,
'Survival': survival
})
# PSM analysis
from sklearn.linear_model import LogisticRegression
ps_model = LogisticRegression()
ps_model.fit(medical_data[['Severity', 'Comorbidities']], medical_data['Treatment'])
ps_scores = ps_model.predict_proba(medical_data[['Severity', 'Comorbidities']])[:, 1]
# Simple matching (nearest neighbor)
treated = medical_data[medical_data['Treatment'] == 1].copy()
control = medical_data[medical_data['Treatment'] == 0].copy()
treated['PS'] = ps_scores[medical_data['Treatment'] == 1]
control['PS'] = ps_scores[medical_data['Treatment'] == 0]
# Match within caliper
matched_controls = []
caliper = 0.1
for _, t_row in treated.iterrows():
candidates = control[(control['PS'] >= t_row['PS'] - caliper) &
(control['PS'] <= t_row['PS'] + caliper)]
if len(candidates) > 0:
closest = candidates.iloc[(candidates['PS'] - t_row['PS']).abs().argmin()]
matched_controls.append(closest)
control = control.drop(closest.name)
matched_control_df = pd.DataFrame(matched_controls)
matched_sample = pd.concat([treated, matched_control_df])
# Estimate treatment effect
treated_survival = matched_sample[matched_sample['Treatment'] == 1]['Survival'].mean()
control_survival = matched_sample[matched_sample['Treatment'] == 0]['Survival'].mean()
survival_effect = treated_survival - control_survival
print(f"Survival Analysis:")
print(f"- Matched treated survival: {treated_survival:.1%}")
print(f"- Matched control survival: {control_survival:.1%}")
print(f"- Treatment effect: {survival_effect:.1%} increase")
print(f"- True effect: 15% increase")
print()
# Case Study 3: Policy - Education Reform
print("🎓 Case Study 3: Education Policy Evaluation")
print("-" * 43)
n_schools = 500
# School characteristics
funding_level = np.random.normal(0, 1, n_schools)
student_poverty = np.random.normal(0, 1, n_schools)
teacher_quality = np.random.normal(0, 1, n_schools)
# Reform implementation (confounded by school characteristics)
reform_prob = 1 / (1 + np.exp(-(funding_level - student_poverty)))
reform = np.random.binomial(1, reform_prob, n_schools)
# Student outcomes
base_test_scores = 70 + 5 * funding_level - 3 * student_poverty + 4 * teacher_quality
reform_effect = 8 # 8 point improvement
final_scores = base_test_scores + reform_effect * reform + np.random.normal(0, 5, n_schools)
policy_data = pd.DataFrame({
'School_ID': range(n_schools),
'Reform': reform,
'Funding': funding_level,
'Poverty': student_poverty,
'Teacher_Quality': teacher_quality,
'Test_Scores': final_scores
})
# Regression adjustment
from sklearn.linear_model import LinearRegression
reg_model = LinearRegression()
X = policy_data[['Reform', 'Funding', 'Poverty', 'Teacher_Quality']]
y = policy_data['Test_Scores']
reg_model.fit(X, y)
policy_effect = reg_model.coef_[0]
print(f"Education Reform Analysis:")
print(f"- Regression-adjusted effect: {policy_effect:.1f} points")
print(f"- True effect: 8.0 points")
print(f"- Reform improves test scores even after controlling for confounders")
print()
# Summary of applications
print("📊 Key Applications Summary:")
print("-" * 29)
applications = [
"Business: A/B testing, pricing, marketing optimization",
"Policy: Program evaluation, economic impact assessment",
"Medicine: Treatment efficacy, drug development",
"Technology: Algorithm fairness, recommender systems",
"Social Science: Education, criminal justice, welfare programs"
]
for app in applications:
    print(f"• {app}")
print()
# Common challenges
print("⚠️ Common Challenges in Real Applications:")
print("-" * 42)
challenges = [
"Data availability and quality",
"Unmeasured confounding",
"Selection bias",
"Compliance and attrition",
"External validity",
"Cost and feasibility",
"Ethical considerations"
]
for i, challenge in enumerate(challenges, 1):
    print(f"{i}. {challenge}")
print()
print("💡 Success Factors:")
print("-" * 19)
success_factors = [
"Clear research question and hypothesis",
"Strong study design and identification strategy",
"Comprehensive confounder measurement",
"Robust statistical methods",
"Sensitivity and robustness checks",
"Transparent reporting and documentation",
"Collaboration between domain experts and methodologists"
]
for factor in success_factors:
    print(f"• {factor}")
return {
'uber': uber_data,
'medical': medical_data,
'policy': policy_data
}
# Demonstrate real-world applications
applications_results = real_world_applications_demo()
🎯 Key Takeaways
1. Mediation Analysis
Decomposes total effects into direct and indirect components
Helps understand treatment mechanisms
Requires strong assumptions about confounding
Sensitivity analysis checks robustness
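As a toy illustration of the decomposition above, here is a minimal linear-mediation sketch. The simulated variables and coefficients are invented for illustration; in a linear model the "difference method" shown (total minus direct) coincides with the product-of-coefficients indirect effect:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5000
T = rng.binomial(1, 0.5, n)                  # randomized treatment
M = 0.6 * T + rng.normal(0, 1, n)            # mediator: T -> M path = 0.6
Y = 0.3 * T + 0.5 * M + rng.normal(0, 1, n)  # direct = 0.3, M -> Y path = 0.5

# Total effect: regress Y on T alone
total = LinearRegression().fit(T.reshape(-1, 1), Y).coef_[0]
# Direct effect: regress Y on T, controlling for the mediator M
direct = LinearRegression().fit(np.column_stack([T, M]), Y).coef_[0]
# Indirect effect by differencing; should be close to 0.6 * 0.5 = 0.30
indirect = total - direct

print(f"Total: {total:.2f}, Direct: {direct:.2f}, Indirect: {indirect:.2f}")
```

Note that this simple decomposition leans on the no-confounding and no-interaction assumptions listed earlier; with treatment-mediator interaction, the natural direct and indirect effects must be defined more carefully.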
2. Heterogeneous Treatment Effects
Treatment effects vary across individuals/subgroups
Enables personalized interventions
Methods: subgroup analysis, interactions, tree-based approaches
Critical for policy and business decisions
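The simplest interaction-based approach above can be sketched on synthetic data (the moderator and effect sizes are invented for illustration): a treatment-by-covariate interaction term directly estimates how the effect varies.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 5000
T = rng.binomial(1, 0.5, n)      # randomized treatment
X = rng.normal(0, 1, n)          # moderator, e.g. baseline severity
# True heterogeneous effect: tau(X) = 2 + 1.5 * X
Y = 1.0 + (2.0 + 1.5 * X) * T + 0.5 * X + rng.normal(0, 1, n)

# Fit Y ~ T + X + T*X; the interaction coefficient captures heterogeneity
design = np.column_stack([T, X, T * X])
model = LinearRegression().fit(design, Y)
effect_at_mean, effect_slope = model.coef_[0], model.coef_[2]
print(f"Effect at X=0: {effect_at_mean:.2f}, slope in X: {effect_slope:.2f}")
```

A significant interaction coefficient is evidence that an average effect alone would mask who benefits most; tree-based methods generalize this idea without pre-specifying the moderator.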
3. Causal Machine Learning
Combines ML flexibility with causal rigor
Double ML, meta-learners, causal forests
Handles high-dimensional data
Enables scalable personalization
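The partialling-out idea behind double ML can be sketched as follows. This is a simplified version using out-of-fold predictions from scikit-learn's `cross_val_predict` in place of explicit cross-fitting; the data-generating process and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 3000
X = rng.normal(0, 1, (n, 5))                                # observed confounders
T = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n)           # confounded treatment
Y = 2.0 * T + X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 1, n)  # true effect = 2.0

# Partial out X from both T and Y with flexible ML, using out-of-fold predictions
t_hat = cross_val_predict(RandomForestRegressor(n_estimators=100, random_state=0), X, T, cv=5)
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=100, random_state=0), X, Y, cv=5)
t_res, y_res = T - t_hat, Y - y_hat

# Residual-on-residual regression: the slope is the debiased effect estimate
theta = LinearRegression().fit(t_res.reshape(-1, 1), y_res).coef_[0]
print(f"Double ML estimate: {theta:.2f} (true effect = 2.0)")
```

The out-of-fold predictions play the role of cross-fitting: each nuisance prediction comes from a model that never saw that observation, which is what protects the final estimate from overfitting bias.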
4. Real-World Applications
Business: A/B testing, pricing, marketing
Policy: program evaluation, economic analysis
Medicine: treatment efficacy, drug development
Technology: algorithmic fairness, recommendations
5. Practical Considerations
No method fixes bad data or unmeasured confounding
Combine multiple approaches for robustness
Always consider assumptions and limitations
Domain expertise crucial for interpretation
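To see concretely why confounding cannot be fixed by sample size alone, here is a quick sketch (synthetic data, illustrative coefficients) comparing a naive difference in means with a regression-adjusted estimate. If the confounder were unmeasured, only the biased naive estimate would be available, no matter the method:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 5000
confounder = rng.normal(0, 1, n)
# Treatment is more likely at high confounder values
T = rng.binomial(1, 1 / (1 + np.exp(-confounder)))
Y = 1.0 * T + 2.0 * confounder + rng.normal(0, 1, n)  # true effect = 1.0

naive = Y[T == 1].mean() - Y[T == 0].mean()           # biased upward
adjusted = LinearRegression().fit(np.column_stack([T, confounder]), Y).coef_[0]
print(f"Naive: {naive:.2f}, Adjusted: {adjusted:.2f}, True: 1.00")
```

Comparing several estimators (matching, regression, weighting) and running sensitivity analyses tells you how fragile a conclusion is to this kind of bias.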
🤔 Critical Thinking Questions
How does mediation analysis help improve interventions?
When should you prioritize HTE over average effects?
What are the trade-offs between causal ML and traditional methods?
How can causal inference improve business decision-making?
What ethical issues arise in applying causal methods to social policies?
🚀 Next Steps
Congratulations! You've completed the causal inference curriculum. You're now equipped to:
Design rigorous studies with proper identification strategies
Analyze observational data using advanced methods
Apply causal thinking to business, policy, and research problems
Critically evaluate causal claims in scientific literature
Contribute to the field through innovative applications
Remember: Causal inference is both an art and a science. It requires creativity in identification, rigor in methods, and wisdom in interpretation. The journey doesn't end here - it's just beginning!
“The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic.” - Ronald Fisher
“Correlation does not imply causation, but causation implies correlation.” - Unknown
“All models are wrong, but some are useful.” - George Box