
Effect Size Calculation

Quantifying the magnitude of a difference between groups, independent of sample size — critical for practical significance and meta-analysis.

What Is Effect Size Calculation?

Effect size tells you how big the difference is, not just whether it is real. Two tests can both return p < 0.05 while one shows a 0.2% relative lift and the other a 20% relative lift: equally "significant" claims, worlds apart for the business. Effect size calculation standardizes the magnitude so you can compare across tests, stack results into meta-analyses, and make decisions that respect practical significance.

Also Known As

  • Data science: Cohen's d, Cohen's h, relative lift, absolute lift
  • Growth: "how big was it?"
  • Marketing: impact size, delta
  • Engineering: effect magnitude, treatment effect

How It Works

For a conversion test with control at 4.0% and variant at 4.3%: absolute effect = 0.3pp, relative effect = 7.5%. Cohen's h (appropriate for two proportions) ≈ 0.015 — a tiny effect in standardized terms. For a continuous metric like ARPU, Cohen's d = (mean_variant - mean_control) / pooled_std. A d of 0.2 is small, 0.5 medium, 0.8 large.
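The arithmetic above fits in a few lines of Python. This is a minimal sketch; `cohens_h` and `cohens_d` are illustrative helper names, not functions from any particular library:

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: standardized difference between two proportions,
    via the arcsine (variance-stabilizing) transform."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def cohens_d(mean_t: float, mean_c: float,
             std_t: float, std_c: float, n_t: int, n_c: int) -> float:
    """Cohen's d for a continuous metric, using the pooled standard deviation."""
    pooled = math.sqrt(((n_t - 1) * std_t**2 + (n_c - 1) * std_c**2)
                       / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled

control, variant = 0.040, 0.043
absolute = variant - control      # 0.003  -> 0.3 percentage points
relative = absolute / control     # 0.075  -> 7.5% relative lift
h = cohens_h(variant, control)    # ~0.015 -> tiny in standardized terms
```

Note that the same test can look impressive as a relative lift (7.5%) and tiny as a standardized effect (h ≈ 0.015); reporting both keeps everyone honest.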

In experimentation, relative lift is most commonly reported because it is interpretable to stakeholders and roughly comparable across baselines. But standardized effect size is what you need for power calculations and cross-test comparison.
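To see why the standardized form matters for power, the usual normal-approximation formula for a two-sided two-proportion test, n per group ≈ (z₁₋α/₂ + z₁₋β)² / h², turns Cohen's h directly into a sample-size requirement. A sketch using only the Python standard library:

```python
import math
from statistics import NormalDist

def n_per_group(h: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size needed to detect Cohen's h in a two-sided
    two-proportion z-test (normal-approximation formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    return math.ceil((z_alpha + z_beta) ** 2 / h ** 2)

n = n_per_group(0.015)  # the tiny effect above needs ~35k users per arm
```

Small standardized effects are expensive: halving h roughly quadruples the required sample size.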

Best Practices

  • Report both absolute and relative effect size with a confidence interval.
  • Define a minimum practical effect size before the test — below this number, even a significant win isn't worth shipping complexity.
  • Use standardized effect sizes (Cohen's d, h) for meta-analysis across experiments.
  • Show the full confidence interval, not just the point estimate. A 3% lift with CI [-1%, 7%] is very different from [2.5%, 3.5%].
  • Log effect sizes in a test database so program-level trends become visible.
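To make the interval-reporting practice concrete, here is a minimal sketch using a normal-approximation (Wald) interval for the absolute lift. The relative-lift bounds are produced by naively scaling by the control rate, which ignores uncertainty in the baseline itself, so treat them as a rough summary rather than a rigorous interval:

```python
import math

def lift_with_ci(conv_c: int, n_c: int, conv_t: int, n_t: int, z: float = 1.96):
    """Absolute lift with a 95% Wald CI, plus a rough relative-lift CI."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    abs_lo, abs_hi = diff - z * se, diff + z * se
    # Naive relative bounds: scale by the control rate (ignores its variance).
    return diff, (abs_lo, abs_hi), (abs_lo / p_c, abs_hi / p_c)

diff, abs_ci, rel_ci = lift_with_ci(400, 10_000, 430, 10_000)
# At 10k users per arm, the interval for the 4.0% vs 4.3% example spans zero:
# the point estimate is a 7.5% relative lift, but it is not yet reliable.
```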

Common Mistakes

  • Reporting relative lift on low baselines without absolute context. A "100% lift" from 0.1% to 0.2% on a low-volume metric is often not worth shipping.
  • Confusing statistical with practical significance. A 0.1% lift at p = 0.01 on a high-traffic site is real but often not worth the integration cost.
  • Comparing effect sizes across baselines without normalization. A 5% relative lift on a 2% baseline is not the same behavioral change as 5% on a 40% baseline.
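The normalization point in the last bullet is exactly what standardized effect sizes are for. Under the arcsine transform behind Cohen's h, the same 5% relative lift represents a much larger behavioral change on a 40% baseline than on a 2% one (illustrative numbers):

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h for two proportions (arcsine transform)."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

h_low = cohens_h(0.021, 0.02)   # 5% relative lift on a 2% baseline
h_high = cohens_h(0.42, 0.40)   # 5% relative lift on a 40% baseline
# h_high is several times h_low: identical relative lifts, very
# different amounts of underlying behavioral change.
```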

Industry Context

In SaaS/B2B, effect sizes tend to be large in percentage terms because baselines are low; this creates an illusion of big wins that don't translate to pipeline. In ecommerce, effect sizes are small in percentage terms but compound to real dollars at scale: a 1% conversion lift on $100M of revenue is $1M. In lead gen, effect size on cost-per-acquisition is the metric that matters, and it requires joining test data with downstream attribution.

The Behavioral Science Connection

Humans are insensitive to magnitude when evaluating significance — a failure mode called scope insensitivity. A p < 0.05 feels like a win regardless of whether the effect is 0.1% or 10%. Forcing effect-size reporting counters this by making magnitude the headline, not a footnote.

Key Takeaway

Significance without magnitude is theater. Effect size is how you separate real business impact from statistical noise dressed up in the language of confidence.