False Discovery Rate
The expected proportion of rejected null hypotheses that are false positives (true nulls mistakenly rejected), controlled by methods like Benjamini-Hochberg that offer more power than family-wise error rate corrections.
What Is False Discovery Rate?
False Discovery Rate (FDR) is the expected fraction of your "significant" results that are actually noise. Unlike Bonferroni, which controls the probability of any false positive, FDR accepts that some errors will slip through and instead ensures they make up a controlled share of your reported wins.
Also Known As
- Data science teams: FDR, Benjamini-Hochberg procedure, q-value
- Growth teams: portfolio-level false positive rate
- Marketing teams: "what share of our 'winners' aren't real"
- Engineering teams: BH correction, adjusted p-values
How It Works
Imagine a program that runs 100 tests per quarter with 10,000 visitors per variant. With a naive alpha of 0.05 and no correction, you expect up to 5 false positives from chance alone (5% of however many tests have no real effect). If you instead set an FDR target of 10%, the Benjamini-Hochberg procedure adjusts each test's significance threshold so that, in expectation, no more than 10% of your significant results are noise. So if you end up with 20 significant tests, about 2 are expected to be false positives and roughly 18 real. For a continuously running program, that is a far more useful guarantee than Bonferroni's blanket protection against any single false positive.
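The step-up logic described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation, and the p-values are hypothetical:

```python
# A minimal sketch of the Benjamini-Hochberg step-up procedure.
def benjamini_hochberg(p_values, fdr=0.10):
    """Return the indices of tests declared significant at the given FDR."""
    m = len(p_values)
    # Sort p-values ascending, remembering each test's original index.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= (rank / m) * fdr:
            k_max = rank
    # Reject every hypothesis at or below that rank.
    return sorted(order[:k_max])

# Hypothetical p-values from five tests:
pvals = [0.001, 0.008, 0.039, 0.041, 0.45]
print(benjamini_hochberg(pvals, fdr=0.10))  # -> [0, 1, 2, 3]
```

Note that 0.041 is rejected even though a Bonferroni threshold (0.05 / 5 = 0.01) would have kept only the first test; this is the extra power FDR control buys.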
Best Practices
- Do use FDR control (Benjamini-Hochberg) for large experimentation programs.
- Do set an FDR threshold that reflects the cost of shipping a false win (often 5-20%).
- Do track empirical win rates (holdback tests) to validate your FDR estimates.
- Do not treat FDR-adjusted p-values as ordinary p-values; an adjusted value bounds the expected false discovery rate across the whole rejected set, not the false positive probability of a single test.
- Do not mix FDR with Bonferroni; pick one framework and apply it consistently.
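The adjusted p-values (q-values) mentioned in the practices above can be computed directly. A minimal sketch, again on hypothetical p-values; in practice a library routine such as the `fdr_bh` method of statsmodels' `multipletests` does the same job:

```python
def bh_adjusted(p_values):
    """Compute Benjamini-Hochberg adjusted p-values (q-values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity:
    # q_(k) = min over ranks j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q[idx] = running_min
    return q

qvals = bh_adjusted([0.001, 0.008, 0.039, 0.041, 0.45])
# A test is significant at FDR level a exactly when its q-value <= a.
```

A test's q-value is the smallest FDR target at which it would still be declared significant, which is why comparing q-values directly against your FDR threshold is equivalent to running the step-up procedure.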
Common Mistakes
- Excluding non-significant ("failed") tests from the BH calculation; the procedure must run on every p-value, and dropping the failures makes its thresholds too lenient.
- Running FDR correction only after picking winners, which biases the procedure.
- Treating a q-value of 0.08 as "borderline significant"; it is a precise statement that rejecting everything at that level yields an expected 8% of discoveries being false.
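The first two mistakes above can be seen numerically: running BH only on pre-selected "winners" shrinks the number of tests m, which loosens every threshold (k / m) * fdr. A small sketch with made-up p-values:

```python
# Sketch: why excluding failed tests biases Benjamini-Hochberg.
def bh_significant_count(p_values, fdr=0.10):
    """Number of discoveries under the BH step-up rule."""
    ranked = sorted(p_values)
    m = len(ranked)
    passing = [k for k in range(1, m + 1) if ranked[k - 1] <= (k / m) * fdr]
    return max(passing, default=0)

all_tests = [0.004, 0.03, 0.04, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9]
winners_only = [p for p in all_tests if p < 0.05]  # dropping the "failed" tests

print(bh_significant_count(all_tests))     # -> 1 (correct procedure)
print(bh_significant_count(winners_only))  # -> 3 (biased: thresholds loosened)
```

Same data, same FDR target, but the truncated input triples the discovery count; the 10% guarantee no longer holds for the biased version.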
Industry Context
- SaaS/B2B: Smaller programs may not need FDR; a Bonferroni correction across the handful of tests on the primary metric suffices.
- Ecommerce/DTC: High-velocity programs with dozens of tests per month benefit enormously.
- Lead gen/services: FDR lets you run many exploratory tests without drowning in false wins.
The Behavioral Science Connection
FDR is a behavioral commitment: you are accepting imperfection in exchange for discovery power. This mirrors Thaler's "choice architecture" — designing defaults that produce good portfolio outcomes even when individual decisions are imperfect. The replication crisis in psychology was, in part, a failure to think in FDR terms.
Key Takeaway
FDR control is the scalable answer to the multiple comparisons problem for real experimentation programs.