Z-Score
A standardized score indicating how many standard deviations a data point or test statistic is from the mean, enabling comparison across different scales and distributions.
What Is a Z-Score?
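A z-score converts a raw value into the number of standard deviations it sits above or below its mean: z = (x - mu) / sigma, where x is the raw value, mu is the mean, and sigma is the standard deviation. That conversion puts every metric on a common scale, and when the underlying distribution is approximately normal you can read probabilities directly from the standard normal distribution.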
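The formula can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `z_score` and `standard_normal_cdf` and the input numbers (a value of 130 on a metric with mean 100 and standard deviation 15) are assumptions made up for the example, and the probability it prints only means something if the metric is roughly normal.

```python
import math


def z_score(x, mu, sigma):
    """Return how many standard deviations x sits above or below mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical inputs chosen for illustration only.
z = z_score(x=130.0, mu=100.0, sigma=15.0)
print(z)                                  # 2.0
print(f"{standard_normal_cdf(z):.4f}")    # 0.9772
```

Under the normality assumption, a z-score of 2.0 places the value above roughly 97.7% of the distribution.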
Also Known As
- Data science teams: z-statistic, standardized score, standard score
- Growth teams: significance score
- Marketing teams: standardized number
- Engineering teams: z, normalized deviation
How It Works
Imagine an A/B test with 10,000 visitors per variant. Variant A converts at 3.00%, Variant B at 3.30%. The pooled standard error of the difference is about 0.25 percentage points, so the z-score is (3.30% - 3.00%) / 0.25% = about 1.2. That is below the critical threshold of 1.96 for a two-sided 95% test, so the result is not significant. Now imagine rerunning with 50,000 visitors per variant; the standard error shrinks to about 0.11 percentage points, and the same 0.30-percentage-point lift produces a z-score above 2.7, clearly significant. Same effect, different conclusion, driven entirely by sample size.
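The arithmetic in this example can be reproduced with a short pooled two-proportion z-test. The sketch below is illustrative only: the function name `two_proportion_z` is an assumption, the conversion counts (300 and 330 out of 10,000; 1,500 and 1,650 out of 50,000) are derived from the stated rates, and the two-sided p-value uses the normal approximation. With unrounded intermediate values the first z-score comes out near 1.2 and the second just above 2.7.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, standard_error, two_sided_p)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pool both variants to estimate the shared rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, se, p_value


# 3.00% vs 3.30% at two different sample sizes.
for n in (10_000, 50_000):
    conv_a = round(0.030 * n)
    conv_b = round(0.033 * n)
    z, se, p = two_proportion_z(conv_a, n, conv_b, n)
    print(f"n={n}: se={se:.5f}, z={z:.2f}, p={p:.4f}, significant={abs(z) > 1.96}")
```

The lift is identical in both runs; only the standard error changes, which is what moves the z-score across the 1.96 threshold.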
Best Practices
- Do report z-scores alongside p-values so readers can see how "extreme" a result is.
- Do use z-scores to spot sample ratio mismatch; absolute z > 3 on variant counts is a red flag.
- Do switch to t-statistics for small samples where sigma is estimated rather than known.
- Do not mistake a high z-score for a large business effect; it scales with sample size.
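- Do not rely on a z-test when the expected number of conversions (or non-conversions) per variant is very small, since the normal approximation breaks down; a common rule of thumb is at least 10 of each. Use a continuity correction or an exact test instead.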
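The sample ratio mismatch check mentioned above can be sketched as a z-test on the variant counts. This is one simple way to do it, not the only one: the function name `srm_z`, the intended 50/50 split, and the visitor counts are all assumptions for illustration. Under the intended split, the count in variant A is treated as binomial and compared against its expected value.

```python
import math


def srm_z(n_a, n_b, expected_share_a=0.5):
    """z-score of variant A's visitor count against its intended traffic share."""
    total = n_a + n_b
    expected_a = total * expected_share_a
    # Binomial standard deviation of the count in A under the intended split.
    sd = math.sqrt(total * expected_share_a * (1.0 - expected_share_a))
    return (n_a - expected_a) / sd


# Hypothetical counts from a test that was meant to split traffic evenly.
for n_a, n_b in ((10_000, 10_300), (10_000, 10_500)):
    z = srm_z(n_a, n_b)
    print(f"A={n_a}, B={n_b}: z={z:.2f}, red_flag={abs(z) > 3}")
```

A flagged result points to an assignment or logging problem, so it is worth investigating before reading any conversion metrics from that test.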
Common Mistakes
- Reporting z-score without the corresponding effect size, letting small lifts look dramatic at high traffic.
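- Silently using a one-sided z-test (critical value 1.645 at the 5% level) when convention calls for a two-sided test (critical value 1.96), which makes results look significant sooner.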
- Comparing z-scores across tests with different baselines without accounting for metric scale.
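One way to avoid the first mistake is to report the effect size and its confidence interval next to the z-score. The sketch below assumes a 95% interval for the difference in conversion rates using the unpooled standard error, which is a standard choice for interval estimation; the function name `lift_summary` and the counts (1,500 vs 1,650 conversions out of 50,000 each) are illustrative assumptions.

```python
import math


def lift_summary(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Absolute lift, relative lift, and a normal-approximation CI for the difference."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error: each variant keeps its own variance estimate.
    se = math.sqrt(p_a * (1.0 - p_a) / n_a + p_b * (1.0 - p_b) / n_b)
    return {
        "absolute_lift": diff,
        "relative_lift": diff / p_a,
        "ci_low": diff - z_crit * se,
        "ci_high": diff + z_crit * se,
    }


summary = lift_summary(1500, 50_000, 1650, 50_000)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Reading the interval alongside the z-score shows both how surprising the result is and how large the lift plausibly is in the metric's own units.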
Industry Context
- SaaS/B2B: Low traffic often keeps z-scores small; patience matters more than trying to force significance.
- Ecommerce/DTC: High traffic pushes z-scores quickly, which is why peeking is so damaging.
- Lead gen/services: Sparse conversions drive wide standard errors and dampen z-scores.
The Behavioral Science Connection
Z-scores are cognitively useful because they compress messy information into a familiar scale, similar to how a 1-10 rating compresses a review. But standardization can also disconnect us from the underlying reality; a z-score of 2.0 feels rigorous even when the real-world effect is trivial. Kahneman's term for this tendency to judge from only the information in front of us is "WYSIATI": what you see is all there is.
Key Takeaway
A z-score tells you how surprising a result is under the null hypothesis; pair it with effect size to tell the full story.