
Effect Size (Cohen's d)

A standardized measure of the magnitude of difference between two groups, calculated as the difference in means divided by the pooled standard deviation, independent of sample size.

What Is Effect Size?

Effect size measures how large a difference actually is, stripped of sample size. Cohen's d, the most common version, divides the mean difference by the pooled standard deviation, giving a scale-free number. Conventions call d = 0.2 small, 0.5 medium, and 0.8 large.
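The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the function name `cohens_d` is our own:

```python
import math
from statistics import mean, stdev  # stdev = sample standard deviation

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighted by degrees of freedom.
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

d = cohens_d([2, 3, 4, 5, 6], [1, 2, 3, 4, 5])  # ≈ 0.632, a "medium" effect
```

Because both samples here have the same standard deviation, the pooled SD equals it, and the result is scale-free: multiply every observation by 10 and d is unchanged.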

Also Known As

  • Data science teams: Cohen's d, standardized mean difference, SMD
  • Growth teams: lift magnitude (in standardized units)
  • Marketing teams: "how big the win actually is"
  • Engineering teams: d, g (Hedges' g for small samples)

How It Works

Imagine an A/B test with 10,000 visitors per variant. Variant A session duration: mean 120s, SD 90s. Variant B: mean 125s, SD 92s. Cohen's d = (125 - 120) / pooled SD of ~91 = 0.055. That is a very small effect — barely a nudge — even though with 20,000 users the p-value is under 0.001. Sample size manufactured the significance; effect size tells you the signal is weak. If your business only profits when d is above 0.1, this test is a "do not ship."
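When you only have summary statistics, as in the A/B test above, d can be computed directly from the means, SDs, and sample sizes. A quick sketch (the helper name is our own) reproducing the article's numbers:

```python
import math

def cohens_d_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d from summary statistics rather than raw observations."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Variant B vs Variant A from the example: 125s/92 vs 120s/90, 10,000 each
d = cohens_d_from_stats(125, 92, 10_000, 120, 90, 10_000)  # ≈ 0.055
```

Note that the sample sizes only enter through the variance pooling weights; doubling both groups leaves d essentially unchanged, which is exactly why it does not "inflate" the way a p-value does.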

Best Practices

  • Do define a minimum detectable effect in Cohen's d terms before launching experiments.
  • Do report effect size with a confidence interval for honest communication.
  • Do use Hedges' g when samples are small (n < 50 per group); it corrects d's bias.
  • Do not celebrate significance without checking whether d is business-meaningful.
  • Do not compare Cohen's d across metrics with very different variance structures without care.
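Two of the practices above, the Hedges' g correction and the confidence interval, are short enough to sketch. This uses the standard small-sample correction factor and a common large-sample approximation for the standard error of d; treat it as an illustration, not a substitute for a stats library:

```python
import math

def hedges_g(d, n1, n2):
    """Hedges' g: Cohen's d with a small-sample bias correction."""
    # Correction factor J ≈ 1 - 3 / (4*df - 1), df = n1 + n2 - 2
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return d * j

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for d (large-sample normal approximation)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

g = hedges_g(0.5, 20, 20)                # shrinks d slightly, ≈ 0.490
lo, hi = d_confidence_interval(0.5, 20, 20)
```

With 20 per group the interval spans zero, which is the honest way to report it: the point estimate looks "medium," but the data cannot rule out no effect at all.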

Common Mistakes

  • Treating small d values as important because p < 0.05.
  • Using standardized effect size where a raw dollar or percent lift is more useful to stakeholders.
  • Ignoring that effect size conventions are rules of thumb, not universal thresholds.

Industry Context

  • SaaS/B2B: Tests often see small d values; cumulative small wins compound meaningfully.
  • Ecommerce/DTC: Seasonality produces artificial d fluctuations, making pre-registration critical.
  • Lead gen/services: Lead-quality effect sizes are notoriously hard to measure without long follow-up windows.

The Behavioral Science Connection

Effect size directly addresses a well-documented analytic bias: treating any statistically significant difference as if it were practically meaningful. By forcing you to ask "how big?" rather than "any difference?", effect size fights the narrative instinct to declare victory at the first green p-value.

Key Takeaway

The p-value tells you whether an effect is likely real; effect size tells you how much it matters. You need both to make a good decision.