
Doubly Robust Estimation

A causal inference technique that combines an outcome model and a propensity model so the estimate remains consistent if either model is correctly specified.

What Is Doubly Robust Estimation?

Doubly robust (DR) estimators combine two weaker methods into one stronger one. You model the outcome (Y given X and T) and the propensity (probability of treatment given X). The DR estimator is consistent if either model is correctly specified — hence "doubly robust." It is the workhorse estimator for observational causal inference and for experiments where compliance or randomization is imperfect.

Also Known As

  • Data science: DR estimator, augmented inverse propensity weighting (AIPW)
  • Growth: "the safer causal estimator"
  • Marketing: bias-corrected attribution
  • Engineering: robust treatment effect estimator

How It Works

You want to estimate the effect of onboarding a user to a premium feature. Randomization was imperfect: power users self-selected into the feature. You fit (1) an outcome model predicting 30-day retention from covariates X and treatment T, and (2) a propensity model predicting treatment given X. The AIPW estimator combines these: for each user, use the outcome model to predict retention under treatment and under control, then add a correction term, the user's observed residual scaled by the inverse propensity (for treated users) or by the inverse of one minus the propensity (for control users). Averaging over users gives the DR estimate of the average treatment effect (ATE).
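The steps above can be written compactly. In standard notation (the symbols are conventional, not from the original text), let μ̂₁ and μ̂₀ be the outcome model's predictions under treatment and control, and ê the estimated propensity:

$$
\hat{\tau}_{\text{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{\mu}_1(X_i)-\hat{\mu}_0(X_i)+\frac{T_i\,(Y_i-\hat{\mu}_1(X_i))}{\hat{e}(X_i)}-\frac{(1-T_i)\,(Y_i-\hat{\mu}_0(X_i))}{1-\hat{e}(X_i)}\right]
$$

The first two terms are the outcome model's effect estimate; the last two are the inverse-propensity-weighted residual corrections.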

If the outcome model is right, the propensity term zeroes out in expectation. If the propensity model is right, the outcome bias cancels. Only if both are wrong do you get biased estimates.
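A minimal sketch of the AIPW computation on synthetic data, assuming scikit-learn for the two component models. The data-generating process, model choices, and variable names here are illustrative, not from the original:

```python
# AIPW sketch: outcome model + propensity model + inverse-propensity correction.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))

# Confounded treatment: propensity depends on the covariates.
p_true = 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))
t = rng.binomial(1, p_true)

# True treatment effect is 2.0 in this synthetic setup.
y = 2.0 * t + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)

# (1) Outcome models, fit separately per treatment arm.
m1 = LinearRegression().fit(x[t == 1], y[t == 1])
m0 = LinearRegression().fit(x[t == 0], y[t == 0])
mu1, mu0 = m1.predict(x), m0.predict(x)

# (2) Propensity model: P(T = 1 | X).
e = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# AIPW: outcome-model contrast plus inverse-propensity-weighted residuals.
aipw = mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
ate = aipw.mean()  # close to the true effect of 2.0
```

Here both component models happen to be correctly specified; the point of DR is that the average would still converge to the true effect if only one of them were.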

Best Practices

  • Use cross-fitting (sample splitting) to avoid overfitting bias when using ML models for either component.
  • Trim extreme propensities (below 0.05 or above 0.95) — they blow up variance.
  • Validate propensity model overlap with histograms per treatment arm.
  • Prefer DR over plain IPW whenever you have decent covariates.
  • Report both naive and DR estimates — large gaps indicate model misspecification or overlap problems.
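The trimming advice above is a one-liner in practice; the 0.05/0.95 thresholds follow the list, and the function is a hypothetical helper:

```python
# Hypothetical helper: clip estimated propensities into [lo, hi] so the
# inverse-propensity weights 1/e and 1/(1-e) stay bounded.
import numpy as np

def trim_propensities(e, lo=0.05, hi=0.95):
    """Clip propensities to [lo, hi]; caps IPW weights at 1/lo."""
    return np.clip(e, lo, hi)

trimmed = trim_propensities(np.array([0.01, 0.30, 0.70, 0.99]))
# 0.01 and 0.99 are pulled to the bounds; interior values pass through.
```

Trimming trades a little bias for a large variance reduction; reporting results with and without it is a cheap sensitivity check.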

Common Mistakes

  • Skipping cross-fitting with ML models. Overfit nuisance models introduce severe bias; the asymptotic guarantees for flexible learners require sample splitting.
  • Using propensity scores with poor overlap. If some users have near-zero propensity, DR cannot fix that — those users are essentially unobserved under treatment.
  • Applying DR to pure RCTs as a bias fix. In a randomized experiment the propensity is known and constant, so AIPW reduces to covariate adjustment; it can still tighten variance, but there is no confounding to correct.
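Cross-fitting, flagged in both lists above, can be sketched as follows: each fold's nuisance models are trained on the other folds only, so no prediction in the AIPW formula comes from a model that saw that row. The function name and model choices are assumptions for illustration:

```python
# Cross-fit AIPW: out-of-fold nuisance predictions avoid overfitting bias.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold

def crossfit_aipw(x, t, y, n_splits=5, seed=0):
    aipw = np.empty(len(y))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(x):
        xt, tt, yt = x[train], t[train], y[train]
        # Nuisance models fit on the training folds only.
        m1 = LinearRegression().fit(xt[tt == 1], yt[tt == 1])
        m0 = LinearRegression().fit(xt[tt == 0], yt[tt == 0])
        e = LogisticRegression().fit(xt, tt).predict_proba(x[test])[:, 1]
        e = np.clip(e, 0.05, 0.95)  # trim extreme propensities
        # AIPW scores for the held-out fold.
        mu1, mu0 = m1.predict(x[test]), m0.predict(x[test])
        yi, ti = y[test], t[test]
        aipw[test] = (mu1 - mu0
                      + ti * (yi - mu1) / e
                      - (1 - ti) * (yi - mu0) / (1 - e))
    return aipw.mean()
```

With simple parametric nuisance models cross-fitting changes little, but with gradient boosting or neural nets in either slot it is what keeps the DR guarantees intact.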

Industry Context

In SaaS/B2B, DR is the right tool for analyzing the impact of opt-in features, self-selected onboarding paths, and quasi-experimental launches where clean randomization wasn't possible. In ecommerce, DR helps with attribution across channels where treatment isn't randomly assigned. In lead gen, it supports lift measurement on channels without clean holdouts.

The Behavioral Science Connection

DR estimation builds humility into the analysis: it acknowledges that your outcome model might be wrong and that your propensity model might be wrong, and it protects you as long as one of the two is right. This is the statistical analog of the forecaster's credo: be suspicious of any single model and hedge by combining perspectives.

Key Takeaway

When randomization is imperfect or observational — which is most of the real world — doubly robust estimation is the sober choice. It trades some simplicity for meaningful protection against model misspecification.