Sequential Testing
A statistical methodology for monitoring A/B test results as data accumulates in real time, using adjusted significance thresholds to control error rates — solving the notorious peeking problem.
What Is Sequential Testing?
Sequential testing is a family of statistical methods that let you analyze an A/B test repeatedly as data accumulates without inflating false positive rates. Where classical frequentist testing requires a fixed sample size and a single analysis at the end, sequential methods use adjusted significance thresholds at each look, preserving valid error rates across many peeks. It's the rigorous answer to the most common experimentation sin: stopping when the data looks good.
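The inflation from peeking is easy to see in a quick simulation. Below is a minimal sketch (not production code): it models the z-statistic of an A/A test directly as a Brownian path rather than simulating individual users, peeks at four equally spaced looks, and compares the false positive rate of "analyze once at the end" against "stop at the first significant look."

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 20_000
looks = np.array([2_500, 5_000, 7_500, 10_000])  # sample sizes at each peek

# Under the null, the running sum of outcomes behaves like Brownian motion,
# so we draw the increments between looks directly (variance = added samples).
increments = rng.normal(0.0, np.sqrt(np.diff(looks, prepend=0)),
                        size=(n_sims, len(looks)))
z = np.cumsum(increments, axis=1) / np.sqrt(looks)  # z-statistic at each look

fixed = np.mean(np.abs(z[:, -1]) > 1.96)           # analyze once, at the end
peeking = np.mean((np.abs(z) > 1.96).any(axis=1))  # stop at first "significant" look

print(f"fixed-horizon false positive rate: {fixed:.3f}")   # close to 0.05
print(f"peeking false positive rate:       {peeking:.3f}")  # well above 0.05
```

With four looks the peeking rate lands around 12-13%, even though every individual test uses the standard 5% threshold; more looks inflate it further.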
Also Known As
- Marketing teams often call it always-valid testing or continuous monitoring.
- Growth teams say sequential tests or SPRTs.
- Product teams use continuous analysis or adaptive monitoring.
- Engineering teams refer to always-valid inference or sequential probability ratio test (SPRT).
- Statisticians distinguish between SPRT, group sequential designs, and always-valid confidence sequences.
How It Works
You launch an experiment with a planned sample size of 50,000 users. Under classical statistics, checking the result at 10,000, 20,000, and 30,000 users and stopping at the first significant look inflates your false positive rate from 5% to roughly 13% — and daily peeking over a month pushes it past 20%. Sequential testing adjusts the bar at each look. At 10,000 users, you might need p < 0.001 to stop; at 20,000 users, p < 0.003; at the full sample, p < 0.045. The cumulative false positive rate stays at 5% no matter how many peeks you take. Tools like Statsig and Eppo compute these adjusted thresholds automatically.
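For intuition, here is a minimal sketch of the simplest member of the family, Wald's sequential probability ratio test for a conversion rate. The boundary formulas are Wald's classic approximations; the function name and interface are illustrative, not any particular platform's API.

```python
import math

def sprt(stream, p0, p1, alpha=0.05, beta=0.20):
    """Wald's SPRT for a Bernoulli rate: H0: p = p0 vs H1: p = p1 (p1 > p0).

    Processes observations one at a time and stops as soon as the
    log-likelihood ratio crosses either Wald boundary.
    """
    upper = math.log((1 - beta) / alpha)   # cross upward -> accept H1
    lower = math.log(beta / (1 - alpha))   # cross downward -> accept H0
    llr, n = 0.0, 0
    for n, converted in enumerate(stream, start=1):
        # Each success/failure shifts the evidence toward H1 or H0.
        llr += math.log(p1 / p0) if converted else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n

# A run of strong evidence ends the test almost immediately:
print(sprt([1] * 20, p0=0.05, p1=0.12))  # ('accept H1', 4)
```

The appeal is exactly the property described above: the decision rule is valid at every observation, so "when do I look?" stops being a statistical question.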
Best Practices
- Let your experimentation platform handle the math — don't try to compute always-valid p-values by hand.
- Pre-register a maximum sample size even with sequential testing, so the test has a guaranteed end point.
- Monitor guardrail metrics at every peek, not just the primary metric.
- Communicate the tradeoff clearly: sequential testing costs ~20% more traffic than fixed-horizon testing to reach the same power.
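To make the first bullet concrete, this is the flavor of computation the platform does for you. The sketch below computes an always-valid p-value via a mixture SPRT for normal data, following the closed form in Johari et al.'s always-valid inference work; the function name, defaults, and the mixing variance tau are illustrative assumptions, not a specific vendor's implementation.

```python
import math

def always_valid_p(n, sample_mean, theta0=0.0, sigma=1.0, tau=1.0, p_prev=1.0):
    """Always-valid p-value after n observations via a mixture SPRT.

    The likelihood ratio against H0: theta = theta0 is integrated over a
    N(theta0, tau^2) mixing distribution, which has a closed form for normal
    data with known variance sigma^2. The p-value is the running minimum of
    1 / (mixture likelihood ratio), so it is monotone non-increasing and can
    be inspected after every observation without inflating error rates.
    """
    v = sigma**2 + n * tau**2
    log_lr = (0.5 * math.log(sigma**2 / v)
              + n**2 * tau**2 * (sample_mean - theta0)**2 / (2 * sigma**2 * v))
    return min(p_prev, 1.0, math.exp(-log_lr))
```

You reject at level alpha the first time the running p-value drops below alpha; because the mixture likelihood ratio is a martingale under the null, that guarantee holds at every n simultaneously. Getting the mixture calibration right in practice is fiddly, which is why the advice stands: let the platform do it.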
Common Mistakes
- Using classical p-values while peeking daily, effectively running an unprotected sequential analysis.
- Switching between classical and sequential analysis after seeing results, which breaks the error guarantees of both.
- Stopping as soon as significance is reached without checking whether guardrails are stable.
Industry Context
- SaaS/B2B: Useful when low traffic makes every peek tempting — sequential methods let you act on early signal safely.
- Ecommerce/DTC: Revenue-sensitive tests benefit from stopping losers early.
- Lead gen: Moderate value; most lead gen tests have clear fixed-duration windows (weekly pacing).
The Behavioral Science Connection
Sequential testing is a systems-level solution to hyperbolic discounting — our tendency to overvalue immediate outcomes over future ones. Rather than fighting the organizational pressure to peek, sequential methods accommodate it within a rigorous framework. You can check results daily without guilt, because the math has already accounted for your impatience.
Key Takeaway
Sequential testing makes peeking safe — it costs slightly more traffic but removes the single biggest source of false discovery in real-world experimentation programs.