Test Duration
The length of time an A/B test should run to produce valid results, balancing the need for sufficient data against the risks of stopping too early (novelty effects, unmeasured day-of-week variation) and of running too long (cookie churn, seasonal contamination).
What Is Test Duration?
Test duration is the length of time you let an experiment run before making a decision. It's a function of three things: the sample size required to detect your expected effect with adequate power, the need to capture a full business cycle (usually one week), and the diminishing returns of extending beyond that. Getting duration right is half art, half arithmetic — and teams consistently err on both sides.
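The arithmetic half is simple enough to sketch. Here's a minimal example in Python (the function name and the one-week floor are illustrative, not a standard): it converts a required total sample size into days, then rounds up to whole weekly cycles so every day of the week is sampled equally.

```python
import math

def test_duration_days(n_total: int, daily_visitors: int, min_days: int = 7) -> int:
    """Translate a required total sample size into a test duration in days,
    enforcing a one-week floor and rounding up to whole weekly cycles."""
    days = max(math.ceil(n_total / daily_visitors), min_days)
    return math.ceil(days / 7) * 7  # round up to full weeks

# e.g. ~80,000 total visitors needed at 5,000/day -> 16 raw days -> 21 days
print(test_duration_days(80_000, 5_000))  # 21
```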
Also Known As
- Marketing teams say test runtime or test window.
- Growth teams call it test duration or experiment length.
- Product teams refer to it as experiment window or trial period.
- Engineering teams use runtime or collection window.
- Statisticians distinguish between sample size and duration (the latter being required sample size divided by daily traffic).
How It Works
Your page gets 5,000 daily visitors split 50/50. To detect a 10% relative lift on a 4% baseline conversion rate (4.0% vs. 4.4%) with 80% power at alpha = 0.05 (two-sided), you need roughly 39,500 visitors per arm (~79,000 total). At 5,000/day, that's 16 days, which you round up to 21 days to cover three full weekly cycles. You launch Monday morning. By Thursday you see a significant lift. You ignore it and keep running, because weekend users behave differently from weekday users, and three days of data doesn't include weekend behavior at all.
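The sample-size step above can be checked with the standard normal-approximation formula for a two-sided two-proportion z-test. A minimal sketch using only Python's standard library (the function name is illustrative; use a proper power calculator for production decisions):

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_arm(p1: float, rel_lift: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n for a two-sided two-proportion z-test (normal approximation)."""
    p2 = p1 * (1 + rel_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, two-sided
    z_b = NormalDist().inv_cdf(power)           # power quantile
    pbar = (p1 + p2) / 2
    n = ((z_a * sqrt(2 * pbar * (1 - pbar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) / (p2 - p1)) ** 2
    return ceil(n)

n_arm = sample_size_per_arm(0.04, 0.10)    # ~39,500 per arm
total = 2 * n_arm                          # ~79,000 visitors total
print(n_arm, total, ceil(total / 5_000))   # -> 16 days at 5,000 visitors/day
```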
Best Practices
- Calculate required sample size before launching, and translate it into days based on your traffic.
- Always run at least one full business cycle (usually 7 days) regardless of significance.
- Cap tests at 4–6 weeks to limit cookie churn and external-event contamination.
- Don't extend indefinitely — if you haven't reached significance in 2x expected duration, call it inconclusive.
- Segment by week if running multi-week tests to check for trend changes.
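A minimal sketch of that week-by-week check, using hypothetical weekly aggregates per arm (field names and numbers are illustrative): a lift that shrinks or flips sign across weeks is a warning sign of novelty effects or an external shock.

```python
from collections import defaultdict

# Hypothetical weekly aggregates: (iso_week, arm, visitors, conversions).
records = [
    (1, "control", 2_500, 98),  (1, "variant", 2_500, 112),
    (2, "control", 2_500, 101), (2, "variant", 2_500, 109),
    (3, "control", 2_500, 97),  (3, "variant", 2_500, 111),
]

# Sum visitors and conversions per (week, arm), then compare lift week by week.
totals = defaultdict(lambda: [0, 0])  # (week, arm) -> [visitors, conversions]
for week, arm, visitors, conversions in records:
    totals[(week, arm)][0] += visitors
    totals[(week, arm)][1] += conversions

for week in sorted({w for w, _ in totals}):
    rates = {arm: totals[(week, arm)][1] / totals[(week, arm)][0]
             for arm in ("control", "variant")}
    lift = rates["variant"] / rates["control"] - 1
    print(f"week {week}: control {rates['control']:.2%}, "
          f"variant {rates['variant']:.2%}, lift {lift:+.1%}")
```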
Common Mistakes
- Stopping at three days because significance showed up — missing all weekend behavior.
- Running tests for three months "to be sure," introducing seasonality and cookie-churn bias.
- Confusing test duration with sample size — you need both correct, not just one.
Industry Context
- SaaS/B2B: Low-traffic sites often require 4–8 week tests, which strains organizational patience.
- Ecommerce/DTC: Usually 1–3 weeks; hard cap at 4 to avoid seasonal contamination.
- Lead gen: 2–4 weeks is typical; lead quality data often requires longer attribution windows.
The Behavioral Science Connection
Test duration is where hyperbolic discounting wreaks havoc. A three-day 15% lift feels more exciting than the same test's three-week 4% lift, even though the three-week result is dramatically more reliable. Pre-registering duration is a Ulysses contract: you bind your future self before the siren song of early significance arrives.
Key Takeaway
Run every test for at least one full business cycle, and pre-register the duration so you aren't tempted to stop when the numbers look good.