
Statistical Significance

A measure of how unlikely an observed difference between test variations would be if it were due to random chance alone — typically assessed at the 95% confidence level.

Statistical significance is the cornerstone of valid A/B testing. It tells you whether the difference you observe between your control and variation is real (a true effect) or just noise (random variation in user behavior).

The p-value Explained Simply

A p-value of 0.05 (the standard threshold for 95% significance) means that if there were truly no difference between control and variation, you would see a result at least this extreme only 5% of the time. It does NOT mean there's a 95% chance your variation is better — that's a common and dangerous misinterpretation.
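To make this concrete, here is a minimal sketch of how a p-value is computed for an A/B conversion test, using a pooled two-proportion z-test. The conversion counts are hypothetical, and the function name is ours:

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # two-sided p-value from the standard normal CDF (via erf)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical test: 500/10000 control conversions vs 580/10000 variant
p = two_proportion_p_value(500, 10000, 580, 10000)
print(f"p-value: {p:.4f}")  # below 0.05, so significant at the 95% level
```

The number this prints is the probability of seeing a gap this large between two identical experiences — not the probability that the variant wins.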

The Most Common Mistake

The single biggest mistake in experimentation programs: peeking at results and declaring a winner the moment significance is reached, before the planned sample size is achieved. This inflates your false positive rate dramatically — from the intended 5% to as high as 30-40%.

If you check your results 5 times during a test and stop at the first significant result, you're not running a 95% confidence test. You're running approximately a 60-70% confidence test.
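You can verify this inflation with a quick A/A simulation: both arms have the same true conversion rate, so any "significant" result is a false positive. The checkpoint schedule and conversion rate below are illustrative assumptions:

```python
import math
import random

def z_test_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def false_positive_rate(peek, sims=2000, checkpoints=(200, 400, 600, 800, 1000), p=0.10):
    """A/A test: both arms convert at rate p, so every 'winner' is a false positive."""
    random.seed(42)
    hits = 0
    for _ in range(sims):
        a = [random.random() < p for _ in range(checkpoints[-1])]
        b = [random.random() < p for _ in range(checkpoints[-1])]
        # peeking stops at the FIRST significant interim look
        looks = checkpoints if peek else (checkpoints[-1],)
        for n in looks:
            if z_test_p(sum(a[:n]), n, sum(b[:n]), n) < 0.05:
                hits += 1
                break
    return hits / sims

print(f"peeking 5 times:   {false_positive_rate(peek=True):.3f}")   # well above 0.05
print(f"single final look: {false_positive_rate(peek=False):.3f}")  # close to 0.05
```

The peeking run triggers a false positive far more often than the advertised 5%, exactly because it gives chance five opportunities to cross the threshold instead of one.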

When 95% Isn't Enough

For high-stakes tests (pricing changes, checkout redesigns, anything affecting revenue directly), I recommend 99% significance. The cost of a false positive on a pricing test — shipping a change that doesn't actually work — far exceeds the cost of running the test a few days longer.
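The cost of the stricter threshold is measurable up front. A minimal sketch using the standard two-proportion sample-size approximation — the baseline rate, lift, and 80% power below are hypothetical inputs:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_base, relative_lift, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-proportion test."""
    p_var = p_base * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    n = (z_alpha + z_power) ** 2 * variance / (p_var - p_base) ** 2
    return math.ceil(n)

# Hypothetical pricing test: 5% baseline conversion, detect a 10% relative lift
n95 = sample_size_per_arm(0.05, 0.10, alpha=0.05)  # 95% significance
n99 = sample_size_per_arm(0.05, 0.10, alpha=0.01)  # 99% significance
print(n95, n99)
```

Moving from 95% to 99% significance raises the required sample size by roughly half — a real cost, but usually just extra days of runtime rather than a shipped false positive.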

Practical Guidelines

  • Never stop a test early based on significance alone — wait for planned sample size
  • Use sequential testing methods if you must monitor results continuously
  • Report confidence intervals, not just p-values — the range matters more than the point estimate
  • Remember: statistical significance tells you if an effect exists, not whether it matters (that's practical significance)
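To illustrate the third guideline, here is a sketch of a Wald confidence interval for the difference in conversion rates, with made-up counts chosen so the point estimate looks positive but the interval still crosses zero:

```python
import math
from statistics import NormalDist

def diff_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald confidence interval for the difference in conversion rates (variant - control)."""
    pa, pb = conv_a / n_a, conv_b / n_b
    se = math.sqrt(pa * (1 - pa) / n_a + pb * (1 - pb) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = pb - pa
    return diff - z * se, diff + z * se

# Hypothetical: 500/10000 control conversions vs 560/10000 variant
lo, hi = diff_ci(500, 10000, 560, 10000)
print(f"observed lift: 0.0060, 95% CI: [{lo:.4f}, {hi:.4f}]")
```

Here the interval includes zero, so despite the apparently positive lift the test is inconclusive — and the interval's width tells you whether even the best-case effect would be worth shipping, which a p-value alone never does.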