
Type I Error (False Positive)

Concluding that a variant beat control when in reality there is no true effect — the risk is controlled by the alpha level.

What Is a Type I Error?

A Type I error occurs when your test shouts "winner!" but the truth is that nothing happened — random chance produced a result that looked real. If alpha is 0.05, roughly 1 in 20 null-true experiments will return a false positive. A team that runs 40 tests a year on changes with no real effect, and ships every significant one, is shipping on average two changes annually that do literally nothing — and sometimes worse than nothing.
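
The "two changes annually" figure is just the alpha level times the number of null-true tests — a quick sketch, with the test count from the example above:

```python
# Expected false positives from a testing program, assuming every
# experiment is null-true (no real effects). Illustrative numbers only.
alpha = 0.05            # per-test false positive rate
tests_per_year = 40     # hypothetical program volume

expected_false_positives = alpha * tests_per_year
print(expected_false_positives)  # 2.0
```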

Also Known As

  • Data science: alpha error, false positive, FP
  • Growth: "we thought we won but we didn't"
  • Marketing: phantom lift, ghost winner
  • Engineering: false alarm, spurious detection

How It Works

Run an A/A test where both variants are identical and set alpha to 0.05. Under the null, the p-value is (approximately) uniformly distributed — regardless of sample size — so 5% of fully-run experiments land below 0.05 purely by chance. Now imagine you check your dashboard daily for 14 days. Each peek is another chance to cross the threshold. The cumulative false positive rate under continuous peeking climbs past 25% — five times the nominal rate you think you are getting.
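
The peeking inflation is easy to see in a simulation. This is a minimal sketch, not a production analysis: it assumes a two-sided two-proportion z-test, a hypothetical 5% baseline conversion rate, and made-up traffic numbers (400 users per arm per day). The exact rate it prints varies with those assumptions, but it lands well above the nominal 5%:

```python
import math
import random

random.seed(42)

def z_pvalue(x_a, x_b, n):
    """Two-sided two-proportion z-test p-value with pooled variance."""
    p_a, p_b = x_a / n, x_b / n
    pooled = (x_a + x_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0
    z = (p_b - p_a) / se
    # Normal CDF via erf; p = 2 * P(Z > |z|)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def aa_test_with_peeking(daily_n=400, days=14, rate=0.05, alpha=0.05):
    """Run one A/A test, peeking daily; True = a false positive occurred."""
    x_a = x_b = 0
    for day in range(1, days + 1):
        x_a += sum(random.random() < rate for _ in range(daily_n))
        x_b += sum(random.random() < rate for _ in range(daily_n))
        if z_pvalue(x_a, x_b, day * daily_n) < alpha:
            return True  # stopped at first "significance" -- the mistake
    return False

sims = 400
fp = sum(aa_test_with_peeking() for _ in range(sims))
# Both arms are identical, so every detection is a Type I error.
print(f"false positive rate with daily peeking: {fp / sims:.1%}")
```

Stopping at the first peek that crosses alpha is exactly the "check daily, ship on significance" workflow — and the simulation treats any such stop as a win, which is why the rate compounds.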

Multiplicity compounds this. Ten simultaneous variants at alpha 0.05 have roughly a 40% family-wise error rate.
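
The 40% figure follows from the standard family-wise error rate formula for independent tests, 1 − (1 − α)^k:

```python
# Family-wise error rate for k independent tests at a common alpha:
# P(at least one false positive) = 1 - (1 - alpha)**k
alpha, k = 0.05, 10
fwer = 1 - (1 - alpha) ** k
print(round(fwer, 3))  # 0.401
```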

Best Practices

  • Lock alpha before the test and do not lower the bar mid-flight.
  • Use Bonferroni or Benjamini-Hochberg corrections for multi-variant or multi-metric tests.
  • Adopt sequential testing methods (always-valid p-values, mSPRT) if peeking is unavoidable.
  • Track your shipped-winner replication rate over time — it is the empirical false positive rate of your program.
  • Require a holdout or replication test for high-stakes wins.
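
The two corrections named above can be sketched in a few lines of plain Python (libraries such as statsmodels offer tested implementations; the p-values below are made up for illustration):

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject any hypothesis with p < alpha / m (controls FWER)."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p < alpha / m]

def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls FDR at level q).

    Find the largest rank k with p_(k) <= (k / m) * q, then reject
    every hypothesis whose p-value ranks at or below k.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k = rank
    return sorted(order[:k])

pvals = [0.001, 0.009, 0.039, 0.041, 0.27, 0.60]  # hypothetical results
print("Bonferroni rejects:", bonferroni(pvals))          # [0]
print("BH rejects:", benjamini_hochberg(pvals))          # [0, 1]
```

Bonferroni is the stricter of the two: it divides alpha across all tests, while Benjamini-Hochberg tolerates a controlled fraction of false discoveries in exchange for more power — a common trade-off when a test has many secondary metrics.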

Common Mistakes

  • Peeking daily and stopping at first significance. This is the single largest source of false positives in practice.
  • Claiming a "win" on a tertiary metric when the primary was flat. Multiplicity was not accounted for.
  • Running dozens of segment cuts post-hoc and reporting the ones that look good.

Industry Context

In SaaS/B2B, false positives are especially costly because you usually cannot ship and iterate rapidly — a bad pricing experiment can take a quarter to unwind. In ecommerce, the volume of tests means false positives accumulate into "optimization debt" where the site is a Frankenstein of non-effects. In lead gen, false positives in the top funnel get amplified by downstream teams who attribute pipeline to "the winning variant."

The Behavioral Science Connection

Confirmation bias makes Type I errors invisible. Teams remember the wins, forget the regressions, and build a narrative of continuous progress that is half fiction. False positives also feed the bandwagon effect — once a team "knows" a change worked, they propagate it, fight against removing it, and treat any evidence against it as a measurement problem.

Key Takeaway

Every program has a false positive rate. The question is whether you know what it is and whether you are controlling it deliberately — or letting peeking, multiplicity, and post-hoc slicing set it for you.