Regression to the Mean
The statistical phenomenon where extreme measurements tend to be followed by measurements closer to the average — not because of any intervention, but due to natural variability.
Regression to the mean is the silent killer of experimentation programs. It explains why a page that performed terribly last week might improve this week without any changes — and why claiming credit for that improvement is statistically dishonest.
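The effect is easy to reproduce with a simulation. The sketch below is illustrative only — the conversion rate, traffic volume, and trial counts are assumptions, not figures from any real program. It draws noisy weekly measurements around a fixed true rate, picks the worst observed week, and then watches the very next week with no intervention at all:

```python
import random
import statistics

random.seed(42)

TRUE_RATE = 0.05    # the page's stable underlying conversion rate (assumed)
VISITORS = 1000     # weekly traffic (assumed)

def observe_week(rate=TRUE_RATE, n=VISITORS):
    """One week's measured conversion rate: binomial noise around the true rate."""
    conversions = sum(random.random() < rate for _ in range(n))
    return conversions / n

worst_weeks, following_weeks = [], []
for _ in range(400):
    history = [observe_week() for _ in range(8)]
    worst_weeks.append(min(history))          # the "terrible" week that gets noticed
    following_weeks.append(observe_week())    # the very next week; nothing changed

print(f"true rate:                {TRUE_RATE:.4f}")
print(f"avg worst observed week:  {statistics.mean(worst_weeks):.4f}")
print(f"avg week after the worst: {statistics.mean(following_weeks):.4f}")
```

On average, the week after the worst week lands back near the true rate — an "improvement" that required no fix whatsoever.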
Why This Destroys A/B Test Credibility
Teams often launch experiments on pages that are "underperforming." But if you select a page because it had a bad week, natural variability alone means the next week will likely be better regardless. Launch an A/B test at that moment and the control will improve on its own; any lift measured against the depressed pre-test period is inflated, and the natural recovery gets credited to the variant.
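A small simulation makes the trap concrete. In the sketch below (all rates and sample sizes are illustrative assumptions), a page is "selected" for testing only because it just posted its worst week, and a variant with no real effect is then measured two ways: against the depressed pre-test week, and against a proper concurrent control:

```python
import random
import statistics

random.seed(7)

TRUE_RATE = 0.05   # stable underlying conversion rate (assumed)
VISITORS = 1000    # weekly traffic (assumed)

def week(rate=TRUE_RATE, n=VISITORS):
    return sum(random.random() < rate for _ in range(n)) / n

naive_lifts, proper_lifts = [], []
for _ in range(400):
    # The page is selected for testing because it just had its worst week.
    bad_week = min(week() for _ in range(8))
    # The next week an A/B test runs; the variant has NO real effect.
    control, variant = week(), week()
    naive_lifts.append(variant - bad_week)   # vs. the depressed pre-test week
    proper_lifts.append(variant - control)   # vs. a concurrent control

print(f"avg 'lift' vs pre-test bad week: {statistics.mean(naive_lifts):+.4f}")
print(f"avg lift vs concurrent control:  {statistics.mean(proper_lifts):+.4f}")
```

The pre-test comparison reports a sizable "lift" for a do-nothing variant; the concurrent control correctly reports roughly zero.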
The Pre-Test Observation Period
The fix is simple: establish a stable baseline before testing. Observe the page's performance for 2-4 weeks before launching an experiment. If performance was unusually low when you started observing, wait for it to stabilize. Your test should compare against normal performance, not anomalous performance.
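One way to operationalize the observation period is a simple stability gate before launch. The rule below is a heuristic sketch, not a standard — the z-score threshold is an assumption you would tune to your own traffic: the most recent week should sit within a couple of standard deviations of the preceding weeks' mean.

```python
import statistics

def baseline_is_stable(weekly_rates, z_threshold=2.0):
    """Heuristic pre-test gate: flag the baseline as unstable if the most
    recent week sits more than z_threshold standard deviations from the
    mean of the preceding observation weeks."""
    *history, latest = weekly_rates
    mu = statistics.mean(history)
    sd = statistics.stdev(history)
    if sd == 0:
        return latest == mu
    return abs(latest - mu) / sd <= z_threshold

# Four normal weeks, then a crash: don't launch yet.
print(baseline_is_stable([0.051, 0.049, 0.050, 0.052, 0.031]))  # False
# Performance back in its normal band: safe to treat this as the baseline.
print(baseline_is_stable([0.049, 0.050, 0.052, 0.051, 0.050]))  # True
```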
Common Misattribution
I've seen teams celebrate "lifts" that were entirely explained by regression to the mean. The pattern: something breaks, metrics drop, team scrambles to fix it, metrics recover, team claims the fix worked. In many cases, the metrics would have recovered even without the fix. This is why holdout groups and proper experimental design matter.
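A holdout makes this pattern visible. In the simulation below (the rates, cohort setup, and one-week incident are all illustrative assumptions), a transient incident halves conversions for a week and then ends on its own; a cohort that received the "fix" and an untouched holdout cohort both recover by the same amount, revealing that the fix contributed nothing:

```python
import random
import statistics

random.seed(1)

TRUE_RATE = 0.05               # normal conversion rate (assumed)
INCIDENT_RATE = TRUE_RATE / 2  # rate during a transient one-week incident (assumed)
VISITORS = 1000                # weekly traffic per cohort (assumed)

def week(rate, n=VISITORS):
    return sum(random.random() < rate for _ in range(n)) / n

fix_recovery, holdout_recovery = [], []
for _ in range(200):
    # Both cohorts suffer the incident; only one gets the "fix".
    # The incident ends on its own, so both return to TRUE_RATE the next week.
    fix_recovery.append(week(TRUE_RATE) - week(INCIDENT_RATE))
    holdout_recovery.append(week(TRUE_RATE) - week(INCIDENT_RATE))

print(f"avg recovery with the fix: {statistics.mean(fix_recovery):+.4f}")
print(f"avg recovery in holdout:   {statistics.mean(holdout_recovery):+.4f}")
```

Equal recovery in the holdout is the tell: if the fix were responsible, the cohort that never received it should still be depressed.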
Practical Application
Never launch an A/B test in response to a sudden performance drop. Wait for the baseline to stabilize. And always be suspicious of dramatic lifts on tests launched immediately after poor performance — they may be regression to the mean, not real treatment effects.