Z-Score

A standardized score indicating how many standard deviations a data point or test statistic is from the mean, enabling comparison across different scales and distributions.

What Is a Z-Score?

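A z-score converts a raw value into the number of standard deviations it sits above or below its mean: z = (x - mu) / sigma, where x is the raw value, mu is the mean, and sigma is the standard deviation. That conversion puts every metric on a common scale, so when the underlying quantity is approximately normal you can read probabilities directly from the standard normal distribution.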
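As a minimal sketch of the formula, the snippet below standardizes one value and looks up its cumulative probability under a standard normal. The page-load numbers and the `z_score` helper are hypothetical, chosen only for illustration, and the probability is meaningful only if the metric is roughly normal.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical example: a 130 ms page load when the mean is 100 ms
# and the standard deviation is 20 ms.
z = z_score(130, mu=100, sigma=20)

# Probability of observing a value at or below x,
# assuming the metric is approximately normally distributed.
p_below = NormalDist().cdf(z)

print(f"z = {z:.2f}, P(X <= x) = {p_below:.4f}")
```

Here the value sits 1.5 standard deviations above the mean, which corresponds to roughly the 93rd percentile of a normal distribution.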

Also Known As

  • Data science teams: z-statistic, standardized score, standard score
  • Growth teams: significance score
  • Marketing teams: standardized number
  • Engineering teams: z, normalized deviation

How It Works

Imagine an A/B test with 10,000 visitors per variant. Variant A converts at 3.00%, Variant B at 3.30%. The pooled standard error of the difference is about 0.247 percentage points, so the z-score is (3.30% - 3.00%) / 0.247% ≈ 1.21. That is below the critical value of 1.96 for a two-sided test at the 5% significance level (95% confidence), so the result is not statistically significant. Now imagine rerunning with 50,000 visitors per variant; the same 0.30-percentage-point lift produces a z-score just above 2.7, clearly significant. Same effect, different conclusion, driven entirely by sample size.
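The arithmetic in this example can be checked with a short pooled two-proportion z-test. This is a sketch under the usual large-sample normal approximation; the function name `two_proportion_z` and the conversion counts (300 vs. 330, then 1,500 vs. 1,650) are assumptions derived from the stated rates and sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pooled two-proportion z-test; returns (z, standard error, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pool both variants to estimate the conversion rate under the null hypothesis.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, se, p_value


# 10,000 visitors per variant: 3.00% vs. 3.30%.
z_small, se_small, p_small = two_proportion_z(300, 10_000, 330, 10_000)

# 50,000 visitors per variant: same rates, five times the traffic.
z_large, se_large, p_large = two_proportion_z(1_500, 50_000, 1_650, 50_000)

print(f"n=10,000: SE={se_small:.5f}, z={z_small:.2f}, significant={abs(z_small) > 1.96}")
print(f"n=50,000: SE={se_large:.5f}, z={z_large:.2f}, significant={abs(z_large) > 1.96}")
```

The standard error shrinks with the square root of the sample size, so five times the traffic multiplies the z-score by about 2.24 while the lift itself is unchanged.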

Best Practices

  • Do report z-scores alongside p-values so readers can see how "extreme" a result is.
  • Do use z-scores to spot sample ratio mismatch; absolute z > 3 on variant counts is a red flag.
  • Do switch to t-statistics for small samples where sigma is estimated rather than known.
  • Do not mistake a high z-score for a large business effect; it scales with sample size.
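  • Do not rely on the normal approximation when the expected number of conversions (or non-conversions) per variant is small, commonly fewer than about 10; use an exact test or a continuity correction instead.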
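The sample ratio mismatch check mentioned in this list can be sketched as a one-sample z-test on the assignment counts. The helper name `srm_z`, the intended 50/50 split, and the visitor counts are all hypothetical; many teams use a chi-square test for the same purpose.

```python
from math import sqrt


def srm_z(n_a: int, n_b: int, expected_share_a: float = 0.5) -> float:
    """z-score of variant A's count against its expected share of total traffic."""
    n = n_a + n_b
    expected_a = n * expected_share_a
    # Binomial standard deviation of A's count under the intended split.
    se = sqrt(n * expected_share_a * (1 - expected_share_a))
    return (n_a - expected_a) / se


# Hypothetical counts for a test designed as a 50/50 split.
z_ok = srm_z(10_050, 9_950)    # small imbalance, consistent with chance
z_bad = srm_z(10_300, 9_700)   # imbalance large enough to investigate

for label, z in (("balanced", z_ok), ("suspicious", z_bad)):
    print(f"{label}: z = {z:.2f}, red flag = {abs(z) > 3}")
```

A flagged mismatch usually points to a problem with assignment or logging rather than with the treatment, so the conversion results are best set aside until the cause is found.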

Common Mistakes

  • Reporting z-score without the corresponding effect size, letting small lifts look dramatic at high traffic.
  • Using one-sided z-tests silently when convention calls for two-sided.
  • Comparing z-scores across tests with different baselines or sample sizes as if they measured the size of the effect; a z-score is unitless and carries no information about the metric's scale.

Industry Context

  • SaaS/B2B: Low traffic often keeps z-scores small; patience matters more than trying to force significance.
  • Ecommerce/DTC: High traffic moves z-scores across significance thresholds quickly, which is why peeking (checking repeatedly and stopping at the first significant reading) is so damaging.
  • Lead gen/services: Sparse conversions drive wide standard errors and dampen z-scores.
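To illustrate why peeking is damaging, the simulation below tracks a z-score under the null hypothesis (no real difference) across several interim looks. It is a simplified sketch: it assumes equal-sized traffic batches that each contribute an independent standard normal increment, and the function name, number of looks, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt


def peeking_false_positive_rates(n_sims=2000, n_looks=10, z_crit=1.96, seed=0):
    """Return (rate of crossing z_crit at ANY look, rate of crossing at the FINAL look)."""
    rng = random.Random(seed)
    any_look_hits = 0
    final_look_hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        crossed = False
        z = 0.0
        for k in range(1, n_looks + 1):
            # Each batch adds an independent N(0, 1) increment (assumption).
            running_sum += rng.gauss(0.0, 1.0)
            # z-score of the accumulated data after k batches.
            z = running_sum / sqrt(k)
            if abs(z) > z_crit:
                crossed = True
        any_look_hits += crossed
        final_look_hits += abs(z) > z_crit
    return any_look_hits / n_sims, final_look_hits / n_sims


peek_rate, final_rate = peeking_false_positive_rates()
print(f"False positives when stopping at any significant look: {peek_rate:.1%}")
print(f"False positives when testing once at the end:          {final_rate:.1%}")
```

Because there is no true effect in the simulation, every crossing of 1.96 is a false positive. Testing once at the end keeps the rate near the nominal 5%, while stopping at the first significant look inflates it well beyond that.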

The Behavioral Science Connection

Z-scores are cognitively useful because they compress messy information into a familiar scale, similar to how a 1-10 rating compresses a review. But standardization can also disconnect us from the underlying reality; a z-score of 2.0 feels rigorous even when the real-world effect is trivial. Kahneman's term for judging only from the information in front of us is "WYSIATI": what you see is all there is.

Key Takeaway

A z-score tells you how surprising a result is under the null hypothesis; pair it with effect size to tell the full story.