Difference-in-Differences: Causal Inference from Natural Experiments¶

DiD is among the most widely used causal inference methods in economics and in industry. When you can't randomize, look for a 'natural experiment' in which some units happened to be treated and comparable units were not. This notebook covers classic DiD, parallel trends, event studies, synthetic control, and staggered adoption.

1. Setup — Synthetic Policy Evaluation¶

Scenario: 20 cities. 10 adopted a ride-sharing ban in month 13 (treatment group). 10 did not (control group). Monthly traffic accident counts for 24 months. Does the ban reduce accidents?

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)

N_CITIES   = 20
N_MONTHS   = 24
TREAT_TIME = 13        # treatment starts in month 13 (months 13-24 are post)
N_TREATED  = 10
TRUE_EFFECT = -15.0    # ban reduces accidents by 15 per month

rng = np.random.default_rng(42)

cities = [f'city_{i:02d}' for i in range(N_CITIES)]
treated_cities = cities[:N_TREATED]
control_cities = cities[N_TREATED:]

# City-level baseline accident rates (fixed effects)
city_fe = rng.normal(100, 20, N_CITIES)   # mean 100 accidents/month

# Common time trend (same for all cities — parallel trends assumption)
time_trend = np.linspace(0, 10, N_MONTHS) + rng.normal(0, 1, N_MONTHS)

records = []
for c_idx, city in enumerate(cities):
    is_treated = city in treated_cities
    for t in range(1, N_MONTHS + 1):
        post = int(t >= TREAT_TIME)
        effect = TRUE_EFFECT if (is_treated and post) else 0.0
        accidents = (city_fe[c_idx] +
                     time_trend[t - 1] +
                     effect +
                     rng.normal(0, 5))
        records.append({
            'city'       : city,
            'month'      : t,
            'treated'    : int(is_treated),
            'post'       : post,
            'accidents'  : max(0.0, round(accidents, 2)),
        })

panel = pd.DataFrame(records)
panel['treat_post'] = panel['treated'] * panel['post']  # interaction term

print(f'Panel shape: {panel.shape}')
print(f'Treated cities : {N_TREATED}  |  Control cities: {N_CITIES - N_TREATED}')
print(f'Pre-treatment months: {TREAT_TIME - 1}  |  Post-treatment months: {N_MONTHS - TREAT_TIME + 1}')
print(f'True causal effect: {TRUE_EFFECT} accidents/month')
panel.head()

2. Classic DiD — The 2×2 Table¶

The DiD estimator uses a 2×2 table of group means:

|         | Pre-treatment        | Post-treatment        | Difference              |
|---------|----------------------|-----------------------|-------------------------|
| Treated | \(\bar{Y}_{T,pre}\)  | \(\bar{Y}_{T,post}\)  | \(\Delta_T\)            |
| Control | \(\bar{Y}_{C,pre}\)  | \(\bar{Y}_{C,post}\)  | \(\Delta_C\)            |
| DiD     |                      |                       | \(\Delta_T - \Delta_C\) |

Key assumption: Parallel trends — in the absence of treatment, the treated and control groups would have followed the same trend.
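Before computing it from the panel, the arithmetic is worth one pass with made-up numbers (the means below are purely illustrative, not taken from the simulated data):

```python
# Hypothetical 2x2 table of mean accident counts (illustrative numbers only)
y_t_pre, y_t_post = 105.0, 95.0    # treated group: fell by 10 after the ban
y_c_pre, y_c_post = 100.0, 105.0   # control group: rose by 5 over the same period

delta_t = y_t_post - y_t_pre       # -10.0
delta_c = y_c_post - y_c_pre       # +5.0 (the common trend both groups share)
did = delta_t - delta_c            # -15.0: trend-adjusted effect of the ban
print(delta_t, delta_c, did)
```

Subtracting \(\Delta_C\) strips out the shared trend; if the groups would have trended differently even without treatment, that difference gets wrongly attributed to the ban, which is exactly what the parallel trends assumption rules out.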

# 2×2 DiD table
def did_2x2(df):
    means = df.groupby(['treated', 'post'])['accidents'].mean().unstack()
    means.columns = ['Pre', 'Post']
    means.index   = ['Control', 'Treated']
    means['Diff (Post-Pre)'] = means['Post'] - means['Pre']
    return means

table = did_2x2(panel)
print('=== 2x2 DiD Table ===')
print(table.round(3))

delta_treated = table.loc['Treated', 'Diff (Post-Pre)']
delta_control = table.loc['Control', 'Diff (Post-Pre)']
did_estimate  = delta_treated - delta_control

print(f'\nΔ_treated (post - pre for treated group): {delta_treated:.3f}')
print(f'Δ_control (post - pre for control group): {delta_control:.3f}')
print(f'DiD = Δ_treated - Δ_control             : {did_estimate:.3f}')
print(f'True effect                              : {TRUE_EFFECT:.3f}')

3. Regression DiD¶

The regression formulation generalizes DiD to:

  • Control for covariates

  • Include unit and time fixed effects

  • Compute standard errors (including clustered SEs)

\[Y_{it} = \alpha + \beta_1 \text{Treated}_i + \beta_2 \text{Post}_t + \delta (\text{Treated}_i \times \text{Post}_t) + \epsilon_{it}\]

\(\delta\) is the DiD estimate. With unit FE and time FE:

\[Y_{it} = \alpha_i + \gamma_t + \delta (\text{Treated}_i \times \text{Post}_t) + \epsilon_{it}\]
# --- Simple DiD regression (no fixed effects) ---
X_simple = sm.add_constant(panel[['treated', 'post', 'treat_post']])
model_simple = sm.OLS(panel['accidents'], X_simple).fit(
    cov_type='cluster',
    cov_kwds={'groups': panel['city']}
)

print('=== Simple DiD Regression (clustered SE by city) ===')
print(model_simple.summary().tables[1])
print(f'\nDiD coefficient (treat_post): {model_simple.params["treat_post"]:.3f}')
print(f'95% CI: [{model_simple.conf_int().loc["treat_post", 0]:.3f}, '
      f'{model_simple.conf_int().loc["treat_post", 1]:.3f}]')
# --- Two-way fixed effects DiD (unit FE + time FE) ---
# Add city and month dummies
panel_fe = panel.copy()
city_dummies  = pd.get_dummies(panel_fe['city'],  prefix='city',  drop_first=True)
month_dummies = pd.get_dummies(panel_fe['month'], prefix='month', drop_first=True)

X_fe = pd.concat([
    panel_fe[['treat_post']],
    city_dummies,
    month_dummies
], axis=1).astype(float)
X_fe = sm.add_constant(X_fe)

model_fe = sm.OLS(panel_fe['accidents'], X_fe).fit(
    cov_type='cluster',
    cov_kwds={'groups': panel_fe['city']}
)

twfe_coeff = model_fe.params['treat_post']
twfe_se    = model_fe.bse['treat_post']
twfe_ci    = model_fe.conf_int().loc['treat_post']

print('=== Two-Way Fixed Effects DiD ===')
print(f'DiD coefficient (treat_post): {twfe_coeff:.3f}  SE: {twfe_se:.3f}')
print(f'95% CI: [{twfe_ci[0]:.3f}, {twfe_ci[1]:.3f}]')
print(f'True effect: {TRUE_EFFECT:.3f}')
print(f'\nTWFE includes {city_dummies.shape[1]} city FE + {month_dummies.shape[1]} time FE')
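Building the dummy matrices by hand is explicit but verbose. An equivalent route is the formula API with C() categorical terms; a minimal sketch on a small invented panel (the 6 units, 10 periods, and true effect of -8 are all assumptions for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Made-up panel: 6 units x 10 periods; units 0-2 treated from period 6 onward
rows = []
for u in range(6):
    for t in range(10):
        tp = int(u < 3 and t >= 6)
        y = 50 + 5 * u + 0.5 * t + (-8.0) * tp + rng.normal(0, 0.5)
        rows.append({'unit': u, 'period': t, 'treat_post': tp, 'y': y})
df_toy = pd.DataFrame(rows)

# C(unit) and C(period) expand to the same dummies as pd.get_dummies above
m_toy = smf.ols('y ~ treat_post + C(unit) + C(period)', data=df_toy).fit()
print(round(m_toy.params['treat_post'], 2))  # close to the true -8
```

For large panels, explicit dummy expansion gets slow; linearmodels' PanelOLS with entity_effects/time_effects absorbs the fixed effects instead of materializing them.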

5. Event Study — Plot Coefficients for Each Time Period¶

An event study regression includes a separate interaction for each time period relative to treatment. If parallel trends holds, pre-treatment coefficients should be near zero and not statistically significant.

# Event study: interact treatment indicator with time dummies
# Reference period: month just before treatment (month TREAT_TIME - 1)
panel_es = panel.copy()
panel_es['event_time'] = panel_es['month'] - TREAT_TIME  # relative time: -12 to +11

# Create dummies for each event-time period, omit reference period (event_time = -1)
event_times = sorted(panel_es['event_time'].unique())
event_times_no_ref = [t for t in event_times if t != -1]

dummy_cols = {}
for t in event_times_no_ref:
    col = f'et_{t:+d}'.replace('+', 'p').replace('-', 'm')
    panel_es[col] = ((panel_es['event_time'] == t) & (panel_es['treated'] == 1)).astype(float)
    dummy_cols[t] = col

# Also add city and month FE
city_dummies_es  = pd.get_dummies(panel_es['city'],  prefix='c', drop_first=True)
month_dummies_es = pd.get_dummies(panel_es['month'], prefix='m', drop_first=True)

et_cols = list(dummy_cols.values())
X_es = pd.concat([
    panel_es[et_cols],
    city_dummies_es,
    month_dummies_es
], axis=1).astype(float)
X_es = sm.add_constant(X_es)

model_es = sm.OLS(panel_es['accidents'], X_es).fit(
    cov_type='cluster', cov_kwds={'groups': panel_es['city']}
)

# Extract event-study coefficients and CIs
es_results = []
for t, col in dummy_cols.items():
    coef = model_es.params[col]
    ci   = model_es.conf_int().loc[col]
    es_results.append({'event_time': t, 'coef': coef, 'ci_lo': ci[0], 'ci_hi': ci[1]})
# Add reference period (0 by definition)
es_results.append({'event_time': -1, 'coef': 0.0, 'ci_lo': 0.0, 'ci_hi': 0.0})
es_df = pd.DataFrame(es_results).sort_values('event_time').reset_index(drop=True)

# Plot
fig, ax = plt.subplots(figsize=(10, 4))
ax.errorbar(es_df['event_time'], es_df['coef'],
            yerr=[es_df['coef'] - es_df['ci_lo'], es_df['ci_hi'] - es_df['coef']],
            fmt='o', color='steelblue', capsize=4, linewidth=1.5)
ax.axhline(0, color='black', linewidth=0.8, linestyle='--')
ax.axvline(-0.5, color='red', linewidth=1.5, linestyle='--', label='Treatment start')
ax.fill_between(es_df['event_time'].values,
                es_df['ci_lo'].values, es_df['ci_hi'].values,
                alpha=0.15, color='steelblue')
ax.set_xlabel('Months relative to treatment'); ax.set_ylabel('Coefficient estimate')
ax.set_title('Event Study: Pre-treatment coefs ≈ 0 (parallel trends), post-treatment shows effect')
ax.legend(); ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('/tmp/event_study.png', dpi=80, bbox_inches='tight')
plt.close()

pre_coefs  = es_df[es_df['event_time'] < -1]['coef']
post_coefs = es_df[es_df['event_time'] >= 0]['coef']
print('Event study results:')
print(f'  Pre-treatment coefficients (should be ~0): mean={pre_coefs.mean():.3f}, max_abs={pre_coefs.abs().max():.3f}')
print(f'  Post-treatment coefficients (should be ~{TRUE_EFFECT}): mean={post_coefs.mean():.3f}')
print('Plot saved to /tmp/event_study.png')

6. Synthetic Control¶

When you have only one treated unit, DiD is unreliable. Synthetic control constructs a weighted combination of control units that best matches the treated unit's pre-treatment outcomes.

The synthetic counterfactual: \(\hat{Y}_{1t}(0) = \sum_{j=2}^{J} w_j^* Y_{jt}\)

Weights \(w^*\) are found by solving a constrained optimization: minimize the distance between the pre-treatment outcomes of the treated unit and the synthetic control, subject to the weights being non-negative and summing to one.

from scipy.optimize import minimize

# Use city_00 as the single treated unit; all other 19 cities as donors
donor_cities   = cities[1:]   # 19 donor cities
treated_city   = cities[0]     # city_00

# Build matrices: (N_MONTHS, N_DONORS) for donor outcomes, (N_MONTHS,) for treated
pivot = panel.pivot(index='month', columns='city', values='accidents')
Y_treated_full = pivot[treated_city].values                    # (24,)
Y_donors_full  = pivot[donor_cities].values                    # (24, 19)

# Pre-treatment period
pre_idx  = np.arange(0, TREAT_TIME - 1)     # months 1-12 (indices 0-11)
post_idx = np.arange(TREAT_TIME - 1, N_MONTHS)   # months 13-24

Y_tr_pre   = Y_treated_full[pre_idx]
Y_don_pre  = Y_donors_full[pre_idx, :]     # (12, 19)

# Optimization: find weights w (sum to 1, non-negative) minimizing pre-treatment fit
n_donors = len(donor_cities)

def sc_loss(w):
    synth = Y_don_pre @ w
    return np.sum((Y_tr_pre - synth) ** 2)

w0 = np.ones(n_donors) / n_donors   # equal weights init
constraints = [{'type': 'eq', 'fun': lambda w: w.sum() - 1}]
bounds = [(0, 1)] * n_donors

result = minimize(sc_loss, w0, method='SLSQP', bounds=bounds, constraints=constraints,
                  options={'ftol': 1e-10, 'maxiter': 2000})
w_opt = result.x

# Synthetic control series
Y_synth = Y_donors_full @ w_opt    # (24,) — synthetic counterfactual

# Estimated effect: treated - synthetic in post period
effects = Y_treated_full[post_idx] - Y_synth[post_idx]
sc_ate  = effects.mean()

print('=== Synthetic Control ===')
print(f'Pre-treatment RMSPE: {np.sqrt(sc_loss(w_opt) / len(pre_idx)):.4f}')
print(f'Avg estimated effect (post): {sc_ate:.3f}')
print(f'True effect: {TRUE_EFFECT:.3f}')
print(f'\nTop-5 donor cities by weight:')
top_donors = np.argsort(w_opt)[::-1][:5]
for i in top_donors:
    if w_opt[i] > 0.01:
        print(f'  {donor_cities[i]}: {w_opt[i]:.4f}')

# Plot
months = np.arange(1, N_MONTHS + 1)
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(months, Y_treated_full, 'r-o', label='Treated (city_00)', ms=4)
ax.plot(months, Y_synth, 'b--s',       label='Synthetic control', ms=4)
ax.axvline(TREAT_TIME - 0.5, color='k', linestyle='--', label='Treatment')
ax.fill_between(months[post_idx], Y_synth[post_idx], Y_treated_full[post_idx],
                alpha=0.2, color='red', label='Estimated effect')
ax.set_xlabel('Month'); ax.set_ylabel('Accidents')
ax.set_title('Synthetic Control: city_00 vs its synthetic counterfactual')
ax.legend(); ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('/tmp/synthetic_control.png', dpi=80, bbox_inches='tight')
plt.close()
print('\nPlot saved to /tmp/synthetic_control.png')

7. Staggered DiD — When Treatment is Adopted at Different Times¶

In practice, units often adopt treatment at different times (a rollout). The classic TWFE estimator can be biased in staggered settings (Goodman-Bacon 2021) because already-treated units end up serving as controls for later adopters. Modern estimators use "clean" comparisons only.

Callaway & Sant'Anna (2021) estimate cohort-specific ATTs and aggregate them:

\[ATT = \mathbb{E}[Y_{it}(g) - Y_{it}(\infty) \mid G_i = g, t \geq g]\]

# Generate staggered adoption dataset
N_CITIES_STAG = 30
N_MONTHS_STAG = 24
rng3 = np.random.default_rng(99)

# Cohort assignment (fixed, not random):
# 10 never treated, 10 treated early (month 7), 10 treated late (month 15)
city_cohorts = (['never'] * 10 +
                ['early'] * 10 +
                ['late']  * 10)
city_treat_time = {'never': None, 'early': 7, 'late': 15}
city_list_stag  = [f'scity_{i:02d}' for i in range(N_CITIES_STAG)]

city_fe_stag = rng3.normal(100, 20, N_CITIES_STAG)
time_trend_stag = np.linspace(0, 8, N_MONTHS_STAG)

stag_records = []
for c_idx, (city, cohort) in enumerate(zip(city_list_stag, city_cohorts)):
    treat_start = city_treat_time[cohort]
    for t in range(1, N_MONTHS_STAG + 1):
        is_treated_now = (treat_start is not None) and (t >= treat_start)
        effect = TRUE_EFFECT if is_treated_now else 0.0
        accidents = (city_fe_stag[c_idx] + time_trend_stag[t-1] + effect +
                     rng3.normal(0, 5))
        stag_records.append({
            'city': city, 'month': t, 'cohort': cohort,
            'treat_start': treat_start if treat_start else 0,
            'treated_now': int(is_treated_now),
            'accidents': max(0.0, round(accidents, 2))
        })

stag_panel = pd.DataFrame(stag_records)

# Naive TWFE on staggered panel
city_fe_s  = pd.get_dummies(stag_panel['city'],  prefix='c', drop_first=True)
month_fe_s = pd.get_dummies(stag_panel['month'], prefix='m', drop_first=True)
X_stag = pd.concat([stag_panel[['treated_now']], city_fe_s, month_fe_s], axis=1).astype(float)
X_stag = sm.add_constant(X_stag)
m_stag = sm.OLS(stag_panel['accidents'], X_stag).fit(
    cov_type='cluster', cov_kwds={'groups': stag_panel['city']}
)
naive_twfe_stag = m_stag.params['treated_now']

# Cohort-specific ATTs (clean DiD for each cohort vs never-treated)
cohort_atts = {}
never_treated = stag_panel[stag_panel['cohort'] == 'never']

for cohort_name in ['early', 'late']:
    treat_start_c = city_treat_time[cohort_name]
    cohort_data   = stag_panel[stag_panel['cohort'] == cohort_name]
    combined      = pd.concat([cohort_data, never_treated])
    combined      = combined.copy()
    combined['is_cohort'] = (combined['cohort'] == cohort_name).astype(int)
    combined['post']      = (combined['month'] >= treat_start_c).astype(int)
    combined['ip']        = combined['is_cohort'] * combined['post']
    X_c = sm.add_constant(combined[['is_cohort', 'post', 'ip']])
    m_c = sm.OLS(combined['accidents'], X_c).fit(
        cov_type='cluster', cov_kwds={'groups': combined['city']}
    )
    cohort_atts[cohort_name] = m_c.params['ip']

# Aggregate ATT (simple average of cohort ATTs)
agg_att = np.mean(list(cohort_atts.values()))

print('=== Staggered DiD ===')
print(f'Naive TWFE (can be biased with staggered adoption): {naive_twfe_stag:.3f}')
print(f"Cohort-specific ATT (early adopters, treat month {city_treat_time['early']}): {cohort_atts['early']:.3f}")
print(f"Cohort-specific ATT (late adopters,  treat month {city_treat_time['late']}):  {cohort_atts['late']:.3f}")
print(f'Aggregated ATT (Callaway & Sant\'Anna style)           : {agg_att:.3f}')
print(f'True effect                                            : {TRUE_EFFECT:.3f}')

8. Cheat Sheet + Exercises¶

Difference-in-Differences Cheat Sheet¶

| Method            | Formula                                                                   | When to Use                     |
|-------------------|---------------------------------------------------------------------------|---------------------------------|
| Classic 2×2 DiD   | \((\bar{Y}_{T,post} - \bar{Y}_{T,pre}) - (\bar{Y}_{C,post} - \bar{Y}_{C,pre})\) | Two groups, two periods   |
| Regression DiD    | Y ~ treated + post + treated:post (OLS)                                   | More flexibility, controls      |
| TWFE              | Y ~ treated:post + C(city) + C(month)                                     | Panel data, multiple periods    |
| Event Study       | Y ~ Σ_t (treated × 1[time=t]) + FE                                        | Visualize + test parallel trends |
| Synthetic Control | Y_synth = W* Y_donors (constrained optim)                                 | Single treated unit             |
| Staggered DiD     | Cohort-specific ATTs + aggregation                                        | Staggered rollouts              |

# TWFE regression (statsmodels)
import statsmodels.formula.api as smf

# Simple DiD
model = smf.ols('Y ~ treated * post', data=df).fit(
    cov_type='cluster', cov_kwds={'groups': df['unit']}
)
did_effect = model.params['treated:post']

# TWFE (with entity and time fixed effects via dummies or absorb)
# Alternatively with linearmodels:
# from linearmodels.panel import PanelOLS
# mod = PanelOLS(Y, X, entity_effects=True, time_effects=True)

Key assumptions:

  1. Parallel trends — pre-treatment trends are the same

  2. No anticipation — units don't change behavior before treatment starts

  3. SUTVA — treatment of one unit doesn't affect others (no spillovers)

Exercises¶

  1. Placebo test: To test parallel trends, pretend treatment happened 6 months earlier (in month 7 instead of 13) and re-run the DiD using only pre-treatment data (months 1-12). The DiD coefficient should be near zero. Report the coefficient and p-value.

  2. Robustness to control set: Re-run the TWFE regression including a city-level covariate (e.g., simulated city population). Does adding the covariate change the DiD estimate? How large is the change, and why?

  3. Permutation test for synthetic control: Estimate synthetic control separately for each of the 10 never-treated cities (as if they were the treated unit). Plot the distribution of placebo effects. Is the true treated unit’s effect an outlier? This is the standard significance test for synthetic control.

  4. Goodman-Bacon decomposition: In the staggered dataset, the naive TWFE estimate is a weighted average of all 2Γ—2 DiD comparisons (early vs never-treated, late vs never-treated, and early vs late). Implement the Goodman-Bacon decomposition manually: compute each 2Γ—2 estimate and its weight, then verify the weighted sum equals the TWFE estimate.

  5. Interrupted time series (ITS): When there is no control group, DiD reduces to ITS: fit a linear trend before and after treatment for the treated units only. Implement ITS on treated cities, compare the estimated effect to the DiD estimate, and discuss what additional assumptions ITS requires.