Multiple Comparisons Problem
The increased probability of falsely identifying a significant result when conducting multiple simultaneous statistical tests, as each test carries its own chance of a Type I error.
What Is the Multiple Comparisons Problem?
When you run many statistical tests at once, the probability that at least one returns a false positive grows much faster than intuition suggests. With m independent tests at significance level alpha, the family-wise error rate is 1 - (1 - alpha)^m, not alpha. Left unchecked, multiple comparisons produce bogus wins at alarming rates.
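A quick sketch of the arithmetic (illustrative Python; the thresholds and test counts are arbitrary examples):

```python
# Family-wise error rate (FWER) under m independent tests:
# P(at least one false positive) = 1 - (1 - alpha)**m
alpha = 0.05

for m in (1, 5, 10, 20, 100):
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:>3} tests -> FWER = {fwer:.1%}")
```

Even at 10 tests, the chance of at least one false positive is already around 40%, eight times the nominal 5%.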
Also Known As
- Data science teams: multiplicity, look-elsewhere effect, multiple testing
- Growth teams: peeking problem, variant-count problem
- Marketing teams: "why our winners keep losing in retests"
- Engineering teams: FWER inflation
How It Works
Imagine running an A/B test with 10,000 visitors per variant and checking 10 secondary metrics at alpha = 0.05. The probability that at least one metric is falsely significant under a true null is 1 - (0.95)^10, or about 40%. Now imagine you also segment by device (3 categories), geography (5 regions), and traffic source (4 channels): you have effectively hundreds of tests. Unsurprisingly, you find "significant" segments. Most are illusions.
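The 40% figure above can be checked by simulation. This is a minimal sketch of an A/A-style null experiment: under a true null, each metric's p-value is uniform on [0, 1], so we just draw uniforms and count how often the minimum dips below alpha. The metric count and simulation size are illustrative choices.

```python
import random

random.seed(0)
ALPHA = 0.05
N_METRICS = 10     # the 10 secondary metrics from the example
N_SIMS = 20_000

# Count experiments where at least one null metric looks "significant".
hits = 0
for _ in range(N_SIMS):
    pvals = [random.random() for _ in range(N_METRICS)]
    if min(pvals) < ALPHA:
        hits += 1

print(f"At least one false win in {hits / N_SIMS:.1%} of null experiments")
# Analytically: 1 - 0.95**10, about 40%.
```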
Best Practices
- Do declare a single primary metric before the test and evaluate it at standard alpha.
- Do apply Bonferroni for small sets of planned comparisons.
- Do apply FDR control for large exploratory programs.
- Do not slice data into segments post-hoc and report the best one as a finding.
- Do not peek repeatedly during a test without sequential testing corrections.
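The Bonferroni and FDR corrections above can be sketched in a few lines. This is an illustrative implementation of Bonferroni and the Benjamini-Hochberg step-up procedure, not a reference library; the example p-values are made up. For real analyses, a tested library routine is preferable.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for p-values below alpha / m (controls FWER)."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up procedure (controls FDR for independent tests).

    Find the largest rank k with p_(k) <= (k / m) * alpha, then
    reject the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            reject[i] = True
    return reject

pvals = [0.001, 0.009, 0.012, 0.041, 0.27, 0.60]
print(bonferroni(pvals))          # [True, False, False, False, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, False, False, False]
```

Note the trade-off the example shows: Bonferroni's strict alpha/m threshold rejects only the smallest p-value, while BH tolerates a controlled fraction of false discoveries and keeps three, which is why FDR control suits large exploratory programs.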
Common Mistakes
- Reporting surprise segment wins without multiplicity correction.
- Confusing "exploratory" with "unaccountable"; exploration still needs a controlled false discovery rate.
- Treating mid-test peeking as harmless when it dramatically inflates false positives.
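The peeking mistake above is multiplicity in time, and it can be demonstrated with a small simulation. This sketch runs A/A tests (both arms draw from the same distribution), checks a z-statistic at 10 interim looks, and stops at the first "significant" result; all parameters are illustrative.

```python
import math
import random

random.seed(1)
Z_CRIT = 1.96    # two-sided z threshold for nominal alpha = 0.05
PEEKS = 10       # interim looks at the data
BATCH = 100      # new observations per arm between looks
SIMS = 5_000

def null_experiment():
    """A/A test: declare a win if any interim z exceeds the threshold."""
    diff_sum = 0.0
    n = 0
    for _ in range(PEEKS):
        for _ in range(BATCH):
            diff_sum += random.gauss(0, 1) - random.gauss(0, 1)
            n += 1
        z = diff_sum / math.sqrt(2 * n)  # each difference has variance 2
        if abs(z) > Z_CRIT:
            return True                  # "significant" -> stop and ship
    return False

false_wins = sum(null_experiment() for _ in range(SIMS))
print(f"False-positive rate with {PEEKS} peeks: {false_wins / SIMS:.1%}")
# Well above the nominal 5%; a single pre-planned final look holds at ~5%.
```

Sequential testing methods (e.g. group-sequential boundaries or always-valid inference) exist precisely to make repeated looks safe.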
Industry Context
- SaaS/B2B: Funnel-stage segmentation creates silent multiplicity; watch for it.
- Ecommerce/DTC: Category-level cuts are a common source of false wins.
- Lead gen/services: Long sales cycles encourage repeat peeking, which is multiplicity in time.
The Behavioral Science Connection
This is the "Texas sharpshooter fallacy": firing bullets into a barn wall, then painting a bullseye around the densest cluster. Kahneman's work on how people construct causal stories shows that humans are compelled to explain noise. Multiplicity correction is a discipline that forces the organization to remember how many shots were fired.
Key Takeaway
More tests mean more false positives; either correct for them formally or narrow your scope to a single primary metric.