Alpha Level
The pre-specified threshold for statistical significance — the probability of a Type I error you are willing to accept, conventionally 0.05.
What Is Alpha Level?
Alpha is the line in the sand you draw before running a test. It is the maximum probability of falsely declaring a winner that you are willing to tolerate. If alpha is 0.05, you accept a 5% false positive rate — per test, per metric, per comparison. Alpha is a choice, not a law of nature, and choosing it thoughtfully is one of the few decisions that separates a serious experimentation program from a theater of dashboards.
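That per-test 5% guarantee is easy to verify by simulation: run many A/A tests, where both arms share the same true conversion rate, and count how often a two-sided z-test crosses the bar anyway. A stdlib-only sketch; the arm size, baseline rate, and trial count are arbitrary choices for illustration:

```python
import math
import random

random.seed(42)

def two_sided_p(z):
    """Two-sided p-value from a z statistic via the normal CDF (erf form)."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def aa_test(n=2000, rate=0.10):
    """One A/A test: both arms drawn from the SAME true rate, so any
    'significant' result is by definition a false positive."""
    a = sum(random.random() < rate for _ in range(n))
    b = sum(random.random() < rate for _ in range(n))
    pooled = (a + b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (b / n - a / n) / se if se > 0 else 0.0
    return two_sided_p(z)

alpha = 0.05
trials = 1000
false_positives = sum(aa_test() < alpha for _ in range(trials))
print(f"False positive rate: {false_positives / trials:.3f}")  # hovers near alpha
```

The observed rate fluctuates around 0.05, which is exactly what alpha promises: no more, no less, and only per comparison.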
Also Known As
- Data science: significance level, p-value threshold
- Growth: "the 95% confidence bar"
- Marketing: confidence level (1 - alpha)
- Engineering: decision threshold
How It Works
Run a two-sided test with alpha = 0.05 and baseline 10% conversion. The critical z-value is ±1.96. Any observed z above 1.96 or below -1.96 crosses the bar. At alpha = 0.10 the critical z shrinks to ±1.645 — you ship more winners and accept more false positives. At alpha = 0.01 the bar rises to ±2.576 — fewer false positives, but you need much more sample to clear it.
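The critical values quoted above all come from the inverse normal CDF, and can be checked in a couple of lines with Python's standard library:

```python
from statistics import NormalDist

def critical_z(alpha, two_sided=True):
    """Critical z-value: the bar an observed z statistic must clear."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

for a in (0.10, 0.05, 0.01):
    print(f"alpha={a:.2f}  critical z = ±{critical_z(a):.3f}")
# alpha=0.10  critical z = ±1.645
# alpha=0.05  critical z = ±1.960
# alpha=0.01  critical z = ±2.576
```

Note the asymmetry in cost: moving from 0.05 to 0.10 lowers the bar by about 0.3 standard errors, while moving to 0.01 raises it by about 0.6.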
Alpha interacts with sample size: at fixed effect and variance, tightening alpha from 0.05 to 0.025 requires roughly 20% more data to retain 80% power, and dropping to 0.01 requires roughly 50% more.
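The interaction can be made concrete with the standard two-proportion power approximation. A sketch only: the 10% baseline and 1-point absolute lift are illustrative, and the formula uses a pooled-variance simplification without continuity correction:

```python
from statistics import NormalDist

def n_per_arm(alpha, power, p_base, mde):
    """Approximate per-arm sample size for a two-sided two-proportion z-test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # significance bar
    z_b = NormalDist().inv_cdf(power)          # power requirement
    p_bar = p_base + mde / 2                   # average rate across arms
    return (z_a + z_b) ** 2 * 2 * p_bar * (1 - p_bar) / mde ** 2

n_05 = n_per_arm(0.05, 0.80, p_base=0.10, mde=0.01)
n_025 = n_per_arm(0.025, 0.80, p_base=0.10, mde=0.01)
print(f"alpha 0.05:  {n_05:,.0f} per arm")
print(f"alpha 0.025: {n_025:,.0f} per arm ({n_025 / n_05 - 1:.0%} more)")
```

At 80% power, halving alpha from 0.05 to 0.025 costs roughly 20% more sample, because the cost scales with the square of the sum of the two z-values, not with alpha itself.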
Best Practices
- Set alpha based on the cost of being wrong. A paywall change deserves alpha 0.01; a button-color test can live at 0.10.
- Lock alpha in the test doc before launch. Ex-post alpha-shopping is cheating.
- Use alpha 0.025 one-sided only when a directional hypothesis is pre-registered and defensible.
- Apply corrections for multiple comparisons when testing multiple variants or metrics.
- Report the exact p-value, not just "significant / not significant," so future meta-analyses can be done.
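The multiple-comparisons advice can be sketched with two standard corrections, Bonferroni and Holm step-down. A minimal illustration; the comparison count and p-values are made up:

```python
def bonferroni(alpha, m):
    """Per-comparison threshold controlling the family-wise error rate at alpha."""
    return alpha / m

def holm(p_values, alpha=0.05):
    """Holm step-down: same family-wise guarantee as Bonferroni,
    but uniformly more powerful."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Three variants against control on two metrics = 6 comparisons.
print(f"per-comparison alpha: {bonferroni(0.05, 6):.4f}")  # 0.0083
print(holm([0.001, 0.03, 0.04]))  # [True, False, False]
```

Without a correction, six comparisons at alpha 0.05 carry roughly a 26% chance of at least one false positive, which is why the per-comparison bar has to drop.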
Common Mistakes
- Mistaking the p-value for the probability that the null hypothesis is true. A p-value of 0.04 does not mean there is a 96% chance the variant beats control; that is a Bayesian interpretation of a frequentist number.
- Using alpha 0.05 reflexively on everything regardless of stakes or reversibility.
- Changing alpha mid-test because the result is "almost there."
Industry Context
In SaaS/B2B, alpha 0.10 is often more appropriate than 0.05 given the cost of missed improvements at low traffic. In ecommerce, high-volume teams can afford alpha 0.01 on revenue-moving tests to protect against shipping lossy variants. In lead gen, alpha choice should be tied to downstream economics — the cost of a false positive that degrades lead quality can be 10x the cost of missing a true positive on form fills.
The Behavioral Science Connection
Humans treat 0.05 as a magic number, a phenomenon sometimes called the p-value cliff effect. A p-value of 0.049 feels meaningfully different from 0.051, but the underlying evidence is nearly identical. Pre-registering alpha protects against the motivated reasoning that kicks in when a stakeholder-favored variant lands at p = 0.06.
Key Takeaway
Alpha is a business decision dressed up as a statistical one. Choose it deliberately, tie it to consequences, and lock it down before you see any data.