
Meta-Analysis of Experiments

Combining results from multiple experiments to estimate program-level effects, heterogeneity, and publication-bias-adjusted truth.

What Is Meta-Analysis of Experiments?

Meta-analysis aggregates results across many experiments to answer questions no single test can: "Does this pattern work on average?", "How much does it vary across contexts?", and "What is the base rate of winners in our program?" For experimentation teams that have run dozens to hundreds of tests, meta-analysis turns the archive from a museum into a prediction engine.

Also Known As

  • Data science: cross-experiment analysis, hierarchical meta-analysis
  • Growth: program-level learning, portfolio analysis
  • Marketing: pattern mining across tests
  • Engineering: aggregated effect estimation

How It Works

Suppose you collect effect sizes and standard errors from 40 past tests on homepage hero changes and fit a random-effects meta-analysis model. The fit returns a pooled effect of +1.8% [95% CI: +0.9%, +2.7%] and a between-test standard deviation tau of 2.5%. Interpretation: on average, hero changes lift conversion by roughly 2%, but heterogeneity is large; any given test could plausibly return anywhere from -3% to +7%. The base rate of "real winners" in this class is ~45%. The next time a stakeholder argues that a new hero change will lift conversion 10%, the prior says otherwise.
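A random-effects fit like the one above can be computed with the classic DerSimonian-Laird estimator. A minimal sketch in plain NumPy; the 40-test "archive" here is simulated data standing in for real logged results, with the program mean and tau chosen to echo the example numbers:

```python
import numpy as np

def random_effects_pool(effects, ses):
    """DerSimonian-Laird random-effects meta-analysis.

    effects : per-test effect estimates (e.g. relative lift in %)
    ses     : their standard errors
    Returns (pooled effect, its SE, tau = between-test SD).
    """
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    w = 1.0 / ses**2                      # fixed-effect (inverse-variance) weights
    fixed = np.sum(w * effects) / np.sum(w)
    # Cochran's Q and the DL moment estimator of tau^2
    q = np.sum(w * (effects - fixed) ** 2)
    df = len(effects) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)         # truncate at zero
    w_re = 1.0 / (ses**2 + tau2)          # random-effects weights
    pooled = np.sum(w_re * effects) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, se, np.sqrt(tau2)

# Hypothetical archive of 40 hero-change tests (lift in %, SE in %)
rng = np.random.default_rng(7)
true_effects = rng.normal(1.8, 2.5, size=40)   # program mean 1.8, tau 2.5
ses = rng.uniform(0.8, 1.5, size=40)
observed = rng.normal(true_effects, ses)

pooled, se, tau = random_effects_pool(observed, ses)
print(f"pooled = {pooled:+.2f}% "
      f"[95% CI {pooled - 1.96*se:+.2f}%, {pooled + 1.96*se:+.2f}%], tau = {tau:.2f}%")
```

In practice you would likely reach for a library instead (e.g. `combine_effects` in statsmodels, or R's metafor), which also offer more robust tau estimators such as REML; the moment estimator above is the simplest correct version.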

Meta-regression extends this by modeling effect as a function of test features (desktop vs mobile, product category, test duration) to find systematic moderators.

Best Practices

  • Log effect size, SE, and metadata for every test — the archive is only as good as its structure.
  • Use random-effects models unless you have strong reason to assume constant effects (you don't).
  • Check for publication bias — are "failed" tests actually logged, or do they disappear?
  • Report heterogeneity (I² or tau) alongside pooled effects. An average hides everything.
  • Use meta-analytic priors for future test planning — this is where meta-analysis repays its cost.
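The last practice deserves a sketch. Using the article's example numbers (pooled +1.8%, SE ~0.46 backed out of the reported CI, tau 2.5%), the meta-analytic predictive distribution for the next test in this class can price a stakeholder's claimed +10% lift:

```python
from statistics import NormalDist

# Meta-analytic predictive prior for the *next* test in this class.
# mu and tau come from the random-effects fit; se_mu ~ 0.46 is backed
# out of the reported 95% CI (0.9 / 1.96).
mu, se_mu, tau = 1.8, 0.46, 2.5

pred_sd = (se_mu**2 + tau**2) ** 0.5      # predictive SD for a new test's true effect
prior = NormalDist(mu, pred_sd)

lo, hi = prior.inv_cdf(0.025), prior.inv_cdf(0.975)
p_ten = 1 - prior.cdf(10.0)               # chance of the claimed +10% lift

print(f"95% predictive interval: [{lo:+.1f}%, {hi:+.1f}%]")
print(f"P(new test > 10%) = {p_ten:.2%}")
```

With these numbers the predictive interval comes out near the article's "-3% to +7%" range, and the claimed +10% lift is priced well under one percent probable, which is the quantitative version of "the prior says otherwise."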

Common Mistakes

  • Combining tests on incommensurable metrics. Adding effects across unrelated metrics gives nonsense.
  • Ignoring heterogeneity. A pooled "+2%" is misleading if half the tests lost and half won by 10%.
  • Not logging failures. Meta-analysis on only "significant" results wildly overestimates program effectiveness.

Industry Context

In SaaS/B2B, meta-analysis tells you which classes of test actually move the needle historically — invaluable for roadmap prioritization under traffic constraints. In ecommerce, it reveals which merchandising patterns are durable across seasons versus seasonal noise. In lead gen, meta-analysis across campaigns separates repeatable lifts from lucky one-offs.

The Behavioral Science Connection

Meta-analysis is the antidote to vividness bias — the tendency to overweight the single memorable win from last year. The archive is boring; the stories are exciting. Meta-analysis turns the archive into stories with honest statistics attached, resetting program priors toward reality.

Key Takeaway

Every mature experimentation program should have a meta-analysis practice. Without it, the team relearns the same lessons every 18 months, loses institutional memory during reorgs, and prices new tests based on the loudest story rather than the historical truth.