
Holdout Testing

A method for measuring the cumulative impact of all shipped experiments by withholding changes from a small percentage of users.

Holdout testing (also called holdback testing) is how you demonstrate the aggregate value of your experimentation program. For the length of a holdout period, you exclude a small, randomly chosen group of users (typically 5-10%) from every shipped change, then compare their metrics against the group receiving all of the optimizations.

Why Holdout Tests Are Essential

Individual A/B tests measure the impact of single changes. But the cumulative impact of 20 changes shipped over a quarter might not equal the sum of their individual lifts, because of interaction effects between changes, regression to the mean (winners are selected partly on noise, so their measured lifts tend to be overstated), and shifts in user behavior over time. A holdout measures the realized cumulative effect directly.
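To make the gap between expected and realized value concrete, the sketch below uses made-up numbers to contrast three quantities: the naive sum of reported lifts, the compounded lift you would expect if each change multiplied the metric independently, and what we call here a "realization rate" (a hypothetical measured holdout lift divided by the summed lifts). The function names and the example figures are illustrative assumptions, not a standard API.

```python
import math

def naive_sum_of_lifts(lifts):
    """Add the individual relative lifts, e.g. [0.02, 0.03] -> 0.05."""
    return sum(lifts)

def compounded_lift(lifts):
    """Combined lift if every change multiplies the metric independently."""
    return math.prod(1 + lift for lift in lifts) - 1

def realization_rate(measured_holdout_lift, lifts):
    """Share of the summed individual lifts that the holdout actually confirms."""
    return measured_holdout_lift / naive_sum_of_lifts(lifts)

# Hypothetical quarter: 20 shipped winners, each reported at +1% lift,
# and a hypothetical holdout comparison showing +8% overall.
lifts = [0.01] * 20
print(f"sum of lifts: {naive_sum_of_lifts(lifts):.3f}")         # prints 0.200
print(f"compounded:   {compounded_lift(lifts):.3f}")            # prints 0.220
print(f"realization:  {realization_rate(0.08, lifts):.2f}")     # prints 0.40
```

A realization rate well below 1 is the kind of result only a holdout can reveal; the individual tests alone would have suggested roughly +20%.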

How to Set Up a Holdout

  • Randomly assign 5-10% of users to a holdout group, and keep each user's assignment stable for the whole period
  • This group sees the site as it was at the start of the holdout period
  • Ship every winning change only to the remaining 90-95%
  • After 3-6 months, compare key metrics between groups
  • Release the holdout group and start a new holdout period
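The assignment step is often implemented with deterministic hashing, so a user lands in the same group on every visit without storing state. The sketch below assumes string user IDs, a 5% holdout, and a period label used as a salt; the names `in_holdout` and `experience_for` are hypothetical. Changing the period label reshuffles users, which matches releasing the old holdout and starting a new one.

```python
import hashlib

HOLDOUT_FRACTION = 0.05  # assumption: 5%, within the 5-10% range above
BUCKETS = 10_000         # resolution of the split (0.01% per bucket)

def bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it.
    return int.from_bytes(digest[:8], "big") % BUCKETS

def in_holdout(user_id: str, period: str, fraction: float = HOLDOUT_FRACTION) -> bool:
    """True if the user is held out during the given holdout period."""
    return bucket(user_id, salt=f"holdout-{period}") < round(fraction * BUCKETS)

def experience_for(user_id: str, period: str) -> str:
    # Holdout users get the baseline frozen at the start of the period;
    # everyone else gets every shipped winner.
    return "baseline" if in_holdout(user_id, period) else "optimized"

users = [f"user-{i}" for i in range(100_000)]
share = sum(in_holdout(u, "2024-Q1") for u in users) / len(users)
print(f"holdout share: {share:.3f}")  # close to 0.05
```

Hashing on the user rather than the session keeps a returning user's experience consistent, which is what lets the comparison reflect long-term behavior.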

The Political Value of Holdouts

Holdout tests answer the question every CEO eventually asks: "Is our experimentation program actually making a difference?" If you can show, for example, that the optimized experience generates 12% more revenue per session than the holdout, you have a strong argument for continued investment in experimentation.
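A headline figure like "12% more revenue per session" is more convincing with an interval around it. The sketch below computes the relative lift of the optimized group over the holdout with an approximate 95% confidence interval from the delta method on the log of the ratio of means. It assumes observations are independent; because holdouts randomize users rather than sessions, a user-level or cluster-robust analysis is safer in practice. The function name and the simulated revenue data are hypothetical.

```python
import math
import random
import statistics

def relative_lift_ci(optimized, holdout, z=1.96):
    """Relative lift of mean(optimized) over mean(holdout), with an
    approximate CI from the delta method on log(mean_o / mean_h)."""
    m_o, m_h = statistics.fmean(optimized), statistics.fmean(holdout)
    v_o, v_h = statistics.variance(optimized), statistics.variance(holdout)
    # Approximate variance of log(m_o / m_h) for two independent groups.
    se_log = math.sqrt(v_o / (len(optimized) * m_o**2) + v_h / (len(holdout) * m_h**2))
    log_ratio = math.log(m_o / m_h)
    lift = m_o / m_h - 1
    low = math.exp(log_ratio - z * se_log) - 1
    high = math.exp(log_ratio + z * se_log) - 1
    return lift, (low, high)

# Simulated per-session revenue with a 5% / 95% split (illustrative only).
random.seed(0)
holdout = [random.expovariate(1 / 10.0) for _ in range(5_000)]
optimized = [random.expovariate(1 / 11.2) for _ in range(95_000)]
lift, (lo, hi) = relative_lift_ci(optimized, holdout)
print(f"lift {lift:+.1%}, 95% CI [{lo:+.1%}, {hi:+.1%}]")
```

Working on the log scale keeps the interval's lower bound above -100% and reflects that a ratio of means is skewed.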

Challenges

The main challenge is engineering: you need infrastructure that keeps the baseline experience running alongside every newly shipped change. The second challenge is statistical: with only 5% of traffic in the holdout, detecting a modest cumulative difference takes patience. The third is ethical: you are knowingly withholding improvements from some users.
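The statistical cost of a small holdout can be quantified with a standard sample-size formula for a two-sided z-test on a difference in means. With a fraction f of users in the holdout, the variance of the difference scales with 1/f + 1/(1 - f), which is 4 for a 50/50 split and about 21 for a 5% holdout. The sketch below assumes a 2% minimum detectable lift and a coefficient of variation of 3.0 for the per-user metric (revenue is typically high-variance); both are illustrative assumptions, as is the function name.

```python
import math
from statistics import NormalDist

def total_users_needed(holdout_fraction, relative_mde, cv, alpha=0.05, power=0.8):
    """Total users (holdout + optimized) for a two-sided z-test on a
    difference in means, with `holdout_fraction` of users held out.

    relative_mde: smallest relative lift worth detecting, e.g. 0.02 for 2%.
    cv: coefficient of variation (std dev / mean) of the per-user metric.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    f = holdout_fraction
    allocation_penalty = 1 / f + 1 / (1 - f)  # equals 4 for a 50/50 split
    return math.ceil(z**2 * cv**2 * allocation_penalty / relative_mde**2)

n_holdout_5 = total_users_needed(0.05, relative_mde=0.02, cv=3.0)
n_even_split = total_users_needed(0.50, relative_mde=0.02, cv=3.0)
print(f"5% holdout: {n_holdout_5:,} users; 50/50 split: {n_even_split:,} users")
print(f"ratio: {n_holdout_5 / n_even_split:.2f}")  # about 5.26
```

Under these assumptions a 5% holdout needs roughly 5.3 times as many users as an even split to detect the same lift, which is why holdout periods run for months.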