What is Holdback Testing?

Atticus Li

Holdback Testing

An experimentation strategy where a small percentage of users permanently see the control experience after a winning variant ships, enabling long-term impact measurement.

Holdback testing is the practice of keeping a small group of users (typically 1–5%) on the original experience indefinitely after shipping a winning variant to the majority. The holdback group serves as a persistent baseline for measuring long-term effects — novelty decay, downstream retention, lifetime value — that a short 2–4 week test cannot capture. It's what turns "we shipped a win" into "we proved a durable lift."

Also Known As

Marketing teams call it holdback, control group, or long-term test.
Growth teams say holdback, long-term control, or LT measurement.
Product teams use holdback group or long-term control.
Engineering teams refer to permanent holdout or long-term control.
Data science teams call it holdout, long-term holdback, or persistent control.

How It Works

Your checkout redesign won a 14-day test with +8% purchase rate. You ship to 95% of users and keep 5% on the original checkout. After 90 days you compare: purchase rate is still +8%, average order value is +3%, refund rate is flat, and 60-day retention is +2%. The lift stuck — you've confirmed a real, durable improvement. If instead retention had been -5%, you'd know the original test captured short-term novelty that masked long-term damage.

Best Practices

Keep holdbacks small (1–5%) to minimize opportunity cost.
Set an expiration date (30, 60, or 90 days post-ship) and commit to reviewing.
Include retention, LTV, and support volume in holdback analysis, not just the original primary metric.
Use holdbacks for major changes (redesigns, new features), not every small tweak.
Communicate holdbacks to customer support — they may occasionally get questions from holdout users.

Common Mistakes

Running holdbacks indefinitely and accumulating a "shadow product" with hundreds of feature differences.
Using too-small holdback groups (0.5%) that can't power long-term measurement.
Forgetting the holdback exists and never analyzing the long-term data.

Industry Context

SaaS/B2B: Essential for pricing, onboarding, and activation changes where retention impact is the real question.
Ecommerce/DTC: Valuable for checkout and cart changes where repeat purchase is the ultimate metric.
Lead gen: Usually not needed — lead-to-sale cycle is shorter than typical holdback horizons.

The Behavioral Science Connection

Holdbacks expose the novelty effect and the primacy effect — both of which decay over time. Users respond temporarily to "something new" (novelty lift) or temporarily to "something unfamiliar" (primacy dip). Only a holdback group that persists past the habituation window lets you separate transient effects from durable ones.

Key Takeaway

Pair major ship decisions with a 5% holdback and a 30–90 day review — it's the only way to confirm short-term wins translate into long-term value.

← Browse All Terms