Type II Error (False Negative)
Failing to detect a real effect. The probability of a Type II error, written β, equals 1 minus statistical power.
What Is a Type II Error?
A Type II error occurs when a truly better variant loses, ties, or is called inconclusive. With power of 0.80, your Type II rate is 0.20: one in five real winners gets killed. Because teams rarely re-test "losers," Type II errors compound silently: good ideas are buried, stakeholders lose faith in the team's roadmap, and a culture of "nothing ever moves the needle" takes hold.
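The one-in-five arithmetic compounds across a roadmap. A minimal sketch, assuming independent, correctly powered tests and a hypothetical slate of ten genuinely better variants:

```python
# Hypothetical: each test of a real winner detects it with probability 0.80.
power = 0.80
beta = 1 - power  # Type II error rate: one in five winners missed

# Across 10 real winners, the chance of burying at least one
# (assuming independent tests) is already near-certain.
k = 10
p_at_least_one_buried = 1 - power ** k
print(round(beta, 2), round(p_at_least_one_buried, 2))  # 0.2 0.89
```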
Also Known As
- Data science: beta error, false negative, FN, miss
- Growth: "we killed a winner"
- Marketing: missed opportunity, buried lift
- Engineering: undetected change, sensitivity miss
How It Works
Suppose a pricing page truly converts 5% better (relative) in the variant: a 6.0% baseline lifts to 6.3%. You run 30,000 users per arm. Power to detect that lift is roughly 0.33, meaning about a two-in-three chance of calling it flat. You ship control. Two quarters later, a similar idea gets retested at 80,000 users per arm, where power is about 0.70, and wins. The idea was always right; the first test was underpowered.
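The scenario above can be checked with a standard normal-approximation power calculation for a two-proportion test. A minimal sketch, assuming a 6% baseline and a hypothetical 5% relative lift (6.0% to 6.3%):

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test
    (normal approximation; the negligible far tail is ignored)."""
    z = NormalDist()
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_effect = abs(p2 - p1) / se        # standardized true effect
    return z.cdf(z_effect - z_crit)

# Same traffic levels as the example: 30k/arm vs. 80k/arm.
print(round(power_two_proportions(0.06, 0.063, 30_000), 2))  # 0.33
print(round(power_two_proportions(0.06, 0.063, 80_000), 2))  # 0.7
```

More traffic does not change the truth; it only changes your chance of seeing it.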
Best Practices
- Design every test with a target power of at least 0.80 for the smallest effect you would act on.
- Treat flat results as "insufficient evidence," not "no effect." Log the minimum detectable effect (MDE) alongside the result.
- Revisit buried ideas annually using updated traffic and variance assumptions.
- Use variance reduction (CUPED, stratification, pre-exposure baselines) to raise power without more traffic.
- Prefer fewer, larger tests on the roadmap's top bets rather than many underpowered ones.
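Of the variance-reduction techniques listed above, CUPED is the most mechanical: subtract the part of the outcome predicted by a pre-exposure covariate. A minimal sketch on simulated data, with hypothetical pre-period and in-experiment metrics:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated users: pre-exposure metric x correlates with outcome y (hypothetical).
n = 10_000
x = rng.normal(10, 2, n)           # pre-period activity per user
y = 0.7 * x + rng.normal(0, 1, n)  # in-experiment outcome

# CUPED: remove the component of y explained by the pre-period covariate.
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

# The mean is unchanged, but the variance drops, so the same
# traffic buys a smaller MDE and higher power.
print(round(y.var(), 2), round(y_cuped.var(), 2))
```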
Common Mistakes
- Calling a flat result a "loss" and removing the idea from the backlog.
- Running short two-week tests on low-traffic pages where the minimum detectable effect is 30% or more, which guarantees Type II errors on any realistic change.
- Ignoring power entirely and focusing only on p-value discipline. Controlling Type I while letting Type II run wild just means you ship less of everything, including real winners.
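The "detectable effect is 30%+" failure mode is easy to quantify before launching. A minimal MDE sketch under the usual normal approximation, using a hypothetical low-traffic page (4% baseline, roughly 4,000 users per arm in two weeks):

```python
from math import sqrt
from statistics import NormalDist

def mde_relative(p_base, n_per_arm, alpha=0.05, power=0.80):
    """Smallest relative lift detectable at the given power
    (normal approximation; baseline variance used for both arms)."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    se = sqrt(2 * p_base * (1 - p_base) / n_per_arm)
    return z_total * se / p_base

# Only lifts of ~31% or more are reliably detectable at this traffic level.
print(f"{mde_relative(0.04, 4_000):.0%}")  # 31%
```

If no realistic change could clear that bar, the test is a Type II error by design.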
Industry Context
In SaaS/B2B, Type II is the dominant error mode because traffic is scarce. Many B2B "experimentation programs" effectively function as randomized noise generators because typical power hovers around 0.3. In ecommerce, Type II shows up most on revenue metrics, where variance is high. In lead gen, Type II on lead quality is brutal: you ship the variant that won on form fills but killed MQL rate, because you never had power to detect the quality shift.
The Behavioral Science Connection
Type II errors exploit loss aversion in the opposite direction: teams remember failures (shipped losers) more vividly than invisible losses (buried winners). The missing winners do not show up in any dashboard, so they are not grieved. Explicitly logging "effects we could have detected" turns an invisible loss into a visible one and changes investment decisions.
Key Takeaway
False negatives are the silent tax on experimentation programs. They kill your best ideas and nobody notices. Fight them by powering tests correctly and treating flat results as "not yet," not "no."