
Holdback Testing

An experimentation strategy where a small percentage of users keep seeing the control experience for an extended period after a winning variant ships, enabling long-term impact measurement.

What Is Holdback Testing?

Holdback testing is the practice of keeping a small group of users (typically 1–5%) on the original experience for an extended period after shipping a winning variant to everyone else. The holdback group serves as a persistent baseline for measuring long-term effects (novelty decay, downstream retention, lifetime value) that a short 2–4 week test cannot capture. It's what turns "we shipped a win" into "we proved a durable lift."
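For the holdback to work as a stable baseline, each user has to land in the same group on every visit. One common way to do that is deterministic hash bucketing, sketched below under some assumptions: a 5% holdback, string user IDs, and a hypothetical experiment key `checkout_redesign`. This shows the general technique, not any particular experimentation platform's API.

```python
import hashlib

HOLDBACK_PERCENT = 5  # assumption: a 5% holdback, inside the 1–5% range above


def in_holdback(user_id: str, experiment_key: str, percent: float = HOLDBACK_PERCENT) -> bool:
    """Return True if this user stays on the original (control) experience.

    Hashing the user ID together with a per-experiment key gives each user a
    stable bucket, so the same user gets the same experience on every visit.
    """
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto 10,000 buckets (0.01% granularity).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


def experience_for(user_id: str) -> str:
    # "checkout_redesign" is a hypothetical experiment key used for illustration.
    return "original" if in_holdback(user_id, "checkout_redesign") else "redesign"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(100_000)]
    share = sum(in_holdback(u, "checkout_redesign") for u in users) / len(users)
    print(f"holdback share: {share:.3f}")  # roughly 0.05 for a 5% holdback
```

Salting the hash with the experiment key keeps holdbacks for different launches independent, so the same 5% of users do not end up in every holdback at once.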

Also Known As

  • Marketing teams call it holdback, control group, or long-term test.
  • Growth teams say holdback, long-term control, or LT measurement.
  • Product teams use holdback group or long-term control.
  • Engineering teams refer to permanent holdout or long-term control.
  • Data science teams call it holdout, long-term holdback, or persistent control.

How It Works

Your checkout redesign won a 14-day test with +8% purchase rate. You ship to 95% of users and keep 5% on the original checkout. After 90 days you compare the shipped group against the holdback: purchase rate is still +8%, average order value is +3%, refund rate is flat, and 60-day retention is +2%. The lift stuck, so you've confirmed a real, durable improvement. If instead 60-day retention had been -5%, you'd know the original test captured a short-term novelty lift that masked long-term damage.
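The 90-day comparison for a rate metric (purchase rate, refund rate, 60-day retention) can be sketched as a relative lift plus a pooled two-proportion z-test. This is a minimal sketch, not a prescribed method: the counts below are hypothetical, chosen only so the lift matches the +8% in the example, and the function name `compare_rates` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def compare_rates(ship_conv: int, ship_users: int, hold_conv: int, hold_users: int):
    """Compare a rate metric between the shipped group and the holdback.

    Returns (relative_lift, two_sided_p_value) from a pooled two-proportion z-test.
    """
    p_ship = ship_conv / ship_users
    p_hold = hold_conv / hold_users
    relative_lift = (p_ship - p_hold) / p_hold

    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (ship_conv + hold_conv) / (ship_users + hold_users)
    se = sqrt(pooled * (1 - pooled) * (1 / ship_users + 1 / hold_users))
    z = (p_ship - p_hold) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return relative_lift, p_value


if __name__ == "__main__":
    # Hypothetical 90-day counts: 95% shipped, 5% holdback.
    lift, p = compare_rates(ship_conv=41_040, ship_users=380_000,
                            hold_conv=2_000, hold_users=20_000)
    print(f"purchase rate lift: {lift:+.1%}, p = {p:.4f}")
```

This only covers proportion metrics. A continuous metric such as average order value would need a different test (for example, Welch's t-test on per-order values).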

Best Practices

  • Keep holdbacks small (1–5%) to minimize opportunity cost.
  • Set an expiration date (30, 60, or 90 days post-ship) and commit to reviewing.
  • Include retention, LTV, and support volume in holdback analysis, not just the original primary metric.
  • Use holdbacks for major changes (redesigns, new features), not every small tweak.
  • Communicate holdbacks to customer support, since they may occasionally get questions from users in the holdback group.
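One way to make these practices hard to skip is to encode them in the holdback's configuration and reject plans that break them. The sketch below assumes a hypothetical `HoldbackPlan` record and hypothetical metric names (`retention`, `ltv`, `support_volume`); the limits mirror the bullets above rather than any standard.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical metric names standing in for retention, LTV, and support volume.
REQUIRED_LONG_TERM_METRICS = {"retention", "ltv", "support_volume"}


@dataclass
class HoldbackPlan:
    feature: str
    holdback_fraction: float  # e.g. 0.05 for a 5% holdback
    ship_date: date
    review_after_days: int    # 30, 60, or 90 days post-ship
    metrics: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Keep holdbacks small to limit opportunity cost.
        if not 0.01 <= self.holdback_fraction <= 0.05:
            raise ValueError("holdback_fraction should be between 1% and 5%")
        # Commit to an expiration and review date up front.
        if self.review_after_days not in (30, 60, 90):
            raise ValueError("review_after_days should be 30, 60, or 90")
        # Look beyond the original primary metric.
        missing = REQUIRED_LONG_TERM_METRICS - set(self.metrics)
        if missing:
            raise ValueError(f"missing long-term metrics: {sorted(missing)}")

    @property
    def review_date(self) -> date:
        return self.ship_date + timedelta(days=self.review_after_days)

    def is_due(self, today: date) -> bool:
        """True once the holdback has reached its scheduled review date."""
        return today >= self.review_date
```

A scheduled job could call `is_due` on every active plan and alert the owning team, which also guards against forgetting the holdback exists.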

Common Mistakes

  • Running holdbacks indefinitely and accumulating a "shadow product" with hundreds of feature differences.
  • Using holdback groups too small (e.g., 0.5% of modest traffic) to detect long-term effects with adequate statistical power.
  • Forgetting the holdback exists and never analyzing the long-term data.
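To check whether a holdback is too small before shipping, you can estimate the smallest relative lift it could reliably detect. The sketch below uses the standard normal approximation for a two-proportion test and assumes a 5% two-sided significance level and 80% power; the function name and the traffic figures in the usage are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_lift(baseline_rate: float, total_users: int, holdback_fraction: float,
                        alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate minimum detectable relative lift for a holdback comparison.

    Uses the normal approximation with the baseline variance in both groups.
    """
    if not 0 < holdback_fraction < 1:
        raise ValueError("holdback_fraction must be between 0 and 1")
    n_hold = total_users * holdback_fraction
    n_ship = total_users - n_hold
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    se = sqrt(baseline_rate * (1 - baseline_rate) * (1 / n_hold + 1 / n_ship))
    return (z_alpha + z_power) * se / baseline_rate


if __name__ == "__main__":
    # Hypothetical: 10% baseline rate, 100,000 users in the review window.
    for fraction in (0.005, 0.01, 0.05):
        mde = min_detectable_lift(0.10, 100_000, fraction)
        print(f"{fraction:.1%} holdback -> detectable lift of about {mde:.0%}")
```

Because the holdback is the smaller group, its size dominates the standard error; shrinking it from 5% to 0.5% makes the detectable lift several times larger, which is why tiny holdbacks often cannot confirm modest long-term effects.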

Industry Context

  • SaaS/B2B: Essential for pricing, onboarding, and activation changes where retention impact is the real question.
  • Ecommerce/DTC: Valuable for checkout and cart changes where repeat purchase is the ultimate metric.
  • Lead gen: Usually not needed — lead-to-sale cycle is shorter than typical holdback horizons.

The Behavioral Science Connection

Holdbacks expose the novelty effect and the primacy effect — both of which decay over time. Users respond temporarily to "something new" (novelty lift) or temporarily to "something unfamiliar" (primacy dip). Only a holdback group that persists past the habituation window lets you separate transient effects from durable ones.
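A simple way to look for these patterns is to compute the lift week by week against the holdback and compare early weeks with late ones. The sketch below is a heuristic under stated assumptions: the function names, the two-week window, and the 0.5 threshold are arbitrary illustrative choices, not an established test, and weekly lifts from a small holdback are noisy, so a label here is a prompt to investigate rather than a verdict.

```python
from statistics import mean


def weekly_lifts(shipped, holdback):
    """Relative lift per week: (shipped_rate - holdback_rate) / holdback_rate.

    shipped and holdback are lists of (conversions, users) tuples, one per
    week since ship, in the same order.
    """
    lifts = []
    for (s_conv, s_users), (h_conv, h_users) in zip(shipped, holdback):
        s_rate = s_conv / s_users
        h_rate = h_conv / h_users
        lifts.append((s_rate - h_rate) / h_rate)
    return lifts


def classify_trend(lifts, window=2, threshold=0.5):
    """Heuristic: compare mean lift in the first and last `window` weeks.

    window and threshold are illustrative defaults, not standards.
    """
    if len(lifts) < 2 * window:
        raise ValueError("need at least 2 * window weeks of data")
    early = mean(lifts[:window])
    late = mean(lifts[-window:])
    if early > 0 and late < early * threshold:
        return "decaying"  # early lift faded: consistent with a novelty effect
    if late > 0 and early < late * threshold:
        return "growing"   # lift arrived later: consistent with a primacy dip
    return "stable"
```

For example, weekly lifts of +12%, +10%, +4%, +3% would be labeled "decaying", while -3%, -1%, +5%, +6% would be labeled "growing".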

Key Takeaway

Pair major ship decisions with a 1–5% holdback and a 30–90 day review; it's the most direct way to confirm that short-term wins translate into long-term value.