
Sequential Testing

How sequential testing solves the peeking problem in A/B tests, when to use it, and the key methods practitioners should know.

Sequential testing is a statistical methodology that allows you to analyze experiment results continuously — at any point during the test — without inflating your false positive rate. It solves the peeking problem that plagues traditional fixed-horizon A/B tests.

The peeking problem

In a standard A/B test, you calculate a required sample size and commit to reading results only after reaching that number. But in practice, everyone peeks. Stakeholders want updates. You want to stop losers early. The problem: checking results repeatedly with a fixed significance threshold dramatically increases false positives. A 5% alpha can become 20-30% with daily peeking over a multi-week test.
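The inflation is easy to demonstrate with a simulation. Here is a minimal sketch that runs A/A tests (no true difference) with a daily peek against a fixed 1.96 z-threshold; the function name and parameters are illustrative, and unit-variance normal metrics are assumed:

```python
import numpy as np

def peeking_false_positive_rate(n_sims=2000, n_days=14, daily_n=200,
                                z_crit=1.96, seed=0):
    """Simulate A/A tests peeked at once per day against a fixed z
    threshold; return the share that ever 'win' by chance."""
    rng = np.random.default_rng(seed)
    false_positives = 0
    for _ in range(n_sims):
        a = rng.normal(size=(n_days, daily_n))  # control, one row per day
        b = rng.normal(size=(n_days, daily_n))  # treatment, same distribution
        for day in range(1, n_days + 1):
            n = day * daily_n
            diff = a[:day].mean() - b[:day].mean()
            se = np.sqrt(2.0 / n)  # known unit variance in both arms
            if abs(diff) / se > z_crit:
                false_positives += 1
                break  # stop at the first "significant" look
    return false_positives / n_sims
```

With 14 daily looks this typically lands in the vicinity of 20%, roughly four times the nominal 5%.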

Sequential testing addresses this by adjusting the significance boundary as data accumulates, maintaining the overall error rate at your target level regardless of how many times you look.

Key methods

Alpha spending functions. The most common approach. You “spend” your total alpha (e.g., 0.05) across multiple interim analyses using a pre-defined spending function — O’Brien-Fleming (very strict early, close to the nominal threshold late) or Pocock (roughly uniform spending). At each analysis point, you compare your test statistic to an adjusted boundary.
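The two spending functions can be written down directly. A sketch using the Lan-DeMets forms, with illustrative function names and the Python standard library only:

```python
from math import e, log, sqrt
from statistics import NormalDist

_nd = NormalDist()

def obrien_fleming_spend(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function: cumulative
    two-sided alpha spent by information fraction t (0 < t <= 1)."""
    z = _nd.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _nd.cdf(z / sqrt(t)))

def pocock_spend(t, alpha=0.05):
    """Pocock-type spending function: near-uniform spending across t."""
    return alpha * log(1 + (e - 1) * t)
```

At t = 0.2 (the first of five evenly spaced looks), O’Brien-Fleming has spent almost nothing (about 0.00001) while Pocock has already spent about 0.015; both reach the full 0.05 at t = 1.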

Always-valid p-values. Methods like mSPRT (mixture Sequential Probability Ratio Test) produce p-values that are valid at any stopping point, not just pre-planned interim looks. This is what platforms like Optimizely implement under the hood.
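For a Gaussian metric with known variance, the mixture likelihood ratio has a closed form, so an always-valid p-value sequence can be sketched in a few lines. The function name and the default prior scale `tau2` are illustrative assumptions, not any vendor’s implementation:

```python
import numpy as np

def msprt_p_values(x, theta0=0.0, sigma2=1.0, tau2=1.0):
    """Always-valid p-values from a normal-mixture SPRT.

    Data are assumed N(theta, sigma2) with known variance; H0 is
    theta = theta0, and the alternative mixes theta over N(theta0, tau2).
    Returns the running p-value sequence, which is nonincreasing and
    remains valid at any data-dependent stopping time."""
    x = np.asarray(x, dtype=float)
    n = np.arange(1, len(x) + 1)
    s = np.cumsum(x - theta0)                 # centered running sum
    v = sigma2 + n * tau2
    lam = np.sqrt(sigma2 / v) * np.exp(tau2 * s**2 / (2 * sigma2 * v))
    return np.minimum.accumulate(np.minimum(1.0, 1.0 / lam))
```

Rejecting whenever the running p-value drops below alpha controls the false positive rate no matter when, or how often, you look.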

Confidence sequences. A more recent innovation that produces confidence intervals valid at all sample sizes simultaneously. They’re wider than fixed-sample intervals early on but tighten as data accumulates.
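A minimal sketch of one such construction, a normal-mixture confidence sequence for a Gaussian mean with known variance (names and default parameters are illustrative):

```python
import numpy as np

def mixture_conf_seq(x, alpha=0.05, sigma2=1.0, tau2=1.0):
    """Normal-mixture confidence sequence for a Gaussian mean: the
    intervals hold simultaneously at every sample size n with
    probability at least 1 - alpha."""
    x = np.asarray(x, dtype=float)
    n = np.arange(1, len(x) + 1)
    mean = np.cumsum(x) / n                  # running sample mean
    v = sigma2 + n * tau2
    radius = np.sqrt(
        2 * sigma2 * (v / tau2) * np.log(np.sqrt(v / sigma2) / alpha)) / n
    return mean - radius, mean + radius
```

At n = 100 with unit variance, the half-width is about 0.33 versus 0.20 for a fixed-sample 95% interval: wider, but safe to read at any moment.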

When to use sequential testing

Sequential testing is ideal when:

  • You need to stop losing tests early to minimize negative impact
  • Stakeholders demand ongoing visibility into test performance
  • You’re running tests on high-value pages where extended exposure to a bad variant has real cost
  • Test duration is uncertain and traffic patterns are volatile

The tradeoffs

Sequential methods require a larger maximum sample size than fixed-horizon tests to detect the same effect — typically 20-30% more. In exchange, the expected sample size is often lower, because tests with a clear result stop early. You’re trading worst-case statistical efficiency for the flexibility to monitor continuously and act on what you see. For most experimentation programs, this is an excellent trade.

Practical example

You’re testing a new checkout flow. The fixed-horizon approach says you need 30,000 visitors per variant over 3 weeks. With sequential testing, you set up 5 interim analyses (twice per week). After the first week, the new checkout shows a significant improvement at the adjusted boundary — the effect is large enough to be detected early. You stop the test, ship the winner, and start the next experiment a week early. If the test had been flat, you’d have continued to the full sample size.
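A schedule like the five looks above can be turned into per-look z boundaries. The sketch below spends alpha with an O’Brien-Fleming-type function but treats each look’s increment as if the looks were independent; that ignores the correlation between looks, so the boundaries are conservative approximations rather than exact group-sequential values:

```python
from math import sqrt
from statistics import NormalDist

def conservative_boundaries(looks=5, alpha=0.05):
    """Approximate per-look z boundaries from O'Brien-Fleming-type alpha
    spending, treating each look's alpha increment as independent.
    Deliberately conservative; not the exact group-sequential bounds."""
    nd = NormalDist()
    z_half = nd.inv_cdf(1 - alpha / 2)
    spent_prev, bounds = 0.0, []
    for k in range(1, looks + 1):
        t = k / looks  # information fraction at look k
        spent = 2 * (1 - nd.cdf(z_half / sqrt(t)))  # cumulative spend
        inc = max(spent - spent_prev, 1e-12)        # alpha for this look
        bounds.append(nd.inv_cdf(1 - inc / 2))
        spent_prev = spent
    return bounds
```

The boundaries start very high (above z = 4 at the first look) and descend toward the familiar region near 2 by the final look, which is why only a large early effect, like the checkout win above, can stop the test in week one.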
