Power Analysis
The pre-experiment calculation that determines the sample size required to detect a given effect with a specified probability.
What Is Power Analysis?
Power analysis is the math you do before an experiment to answer: "Given the effect I care about, how many users do I need so that, if the effect is real, I will actually detect it?" It ties together four levers — effect size, sample size, alpha, and power — such that fixing any three determines the fourth. Skipping it is the single most common reason experimentation programs produce ambiguous results.
Also Known As
- Data science: sample size calculation, sensitivity analysis
- Growth: test sizing, "how long do we run this?"
- Marketing: audience planning, campaign sizing
- Engineering: load planning for traffic allocation
How It Works
Imagine a signup page at 12% conversion. You want to detect a 10% relative lift (to 13.2%). With alpha = 0.05 (two-sided) and power = 0.80, the standard two-proportion calculation gives roughly 12,000 users per variant — about 24,000 total. At 2,000 daily signups split 50/50, that is about 12 days. Drop the target to a 5% relative lift and you need roughly 47,000 per variant — a test of seven weeks or more.
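The arithmetic above can be sketched with the textbook two-proportion z-test formula. This is a minimal illustration using only the Python standard library; the function name is mine, not from any particular calculator:

```python
from statistics import NormalDist

def required_n_per_variant(p1, rel_lift, alpha=0.05, power=0.80):
    """Sample size per variant for a two-sided two-proportion z-test."""
    p2 = p1 * (1 + rel_lift)                       # treatment rate under the target lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)        # Bernoulli variances of both arms
    return (z_alpha + z_beta) ** 2 * var_sum / (p2 - p1) ** 2

print(round(required_n_per_variant(0.12, 0.10)))   # 10% relative lift → ~12,000 per variant
print(round(required_n_per_variant(0.12, 0.05)))   # 5% relative lift → ~47,000 per variant
```

Note how halving the target lift roughly quadruples the sample size — required n scales with the inverse square of the effect.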
Power analysis also runs in reverse. Given fixed traffic of 10,000 per variant per week and a two-week window, what lift can you detect? That reverse calculation is how you set realistic expectations with stakeholders.
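The reverse calculation is just the same formula solved for the lift instead of n. A sketch, approximating both arms' variance with the baseline rate (a common simplification):

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_lift(p1, n_per_variant, alpha=0.05, power=0.80):
    """Smallest relative lift a test of this size can reliably detect."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    abs_mde = z * sqrt(2 * p1 * (1 - p1) / n_per_variant)  # absolute lift
    return abs_mde / p1                                     # convert to relative

# 10,000 users per variant per week, two-week window, 12% baseline
print(f"{minimum_detectable_lift(0.12, 20_000):.1%}")       # ~7.6% relative lift
```

That "~7.6%" is the honest answer to give a stakeholder hoping to confirm a 3% lift in two weeks.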
Best Practices
- Always calculate two ways: required sample size for your target lift, and detectable lift for your available sample.
- Use conversion rate from the last 28 days, not all-time averages. Seasonality shifts baselines.
- Power to 0.80 minimum; 0.90 for high-stakes tests like pricing or flagship flows.
- Account for multiple variants — a four-cell test is three comparisons against control, not one, and multiplicity inflates false positives unless you correct for it.
- Add a 15–20% buffer for bot filtering, sample ratio mismatch, and real-world attrition.
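The last two bullets can be sketched together. Here a Bonferroni-style correction handles the three variant-vs-control comparisons in a four-cell test, then a buffer is layered on top; the correction choice and the 20% buffer are illustrative assumptions, not universal rules:

```python
from statistics import NormalDist

def n_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sided two-proportion z-test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return z ** 2 * var_sum / (p2 - p1) ** 2

comparisons = 3                     # four cells = three variants vs. control
alpha_adj = 0.05 / comparisons      # Bonferroni-adjusted per-comparison alpha
base = n_per_variant(0.12, 0.132, alpha=alpha_adj)
buffered = base * 1.20              # 20% buffer for bots, SRM, attrition
print(round(base), round(buffered))
```

With the correction, each cell needs roughly a third more traffic than a simple two-cell test — exactly the multiplicity cost the bullet warns about.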
Common Mistakes
- Using industry benchmarks as your baseline. Your 12% is not the ecommerce average of 3% — use your own data.
- Ignoring variance for revenue metrics. Revenue per user is noisier than binary conversion; the same sample size will detect far smaller lifts on conversion than on ARPU.
- Stopping early when you "hit significance." Peeking invalidates the power calculation and inflates false positives dramatically.
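The revenue-variance point can be made concrete with the two-sample formula for a continuous metric. The ARPU and standard-deviation figures below are hypothetical, chosen only to show the effect of a heavy-tailed metric:

```python
from statistics import NormalDist

def n_per_variant_continuous(sigma, delta, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sample test on a mean,
    given the metric's standard deviation and target absolute lift."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 2 * (z * sigma / delta) ** 2

# Hypothetical: ARPU of $5.00 with sd of $40 (heavy tail), target +10% = +$0.50
print(round(n_per_variant_continuous(sigma=40.0, delta=0.50)))  # ~100,000 per variant
```

Because sigma enters the formula squared, a long-tailed revenue distribution can demand an order of magnitude more traffic than the binary conversion metric on the same page.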
Industry Context
In SaaS/B2B, power analysis often reveals that monthly signup volume cannot support testing modest changes — which should redirect teams to qualitative research, usability testing, or bigger bets. In ecommerce, high traffic enables granular tests but revenue variance still limits what is detectable. In lead gen, power must be sized to MQLs or SQLs (marketing- or sales-qualified leads), not raw form completions, since downstream quality dominates the business decision.
The Behavioral Science Connection
Power analysis fights the planning fallacy. Teams systematically underestimate how long real change takes to prove — they imagine two weeks and get two months. It also counters overconfidence: most "gut feel" estimates of how much lift a change will produce are 2–3x higher than reality. Writing the power calc down turns that overconfidence into a falsifiable plan.
Key Takeaway
Power analysis is the contract between your hypothesis and reality. If your test cannot mathematically detect the effect you care about, no amount of patience or creativity at the readout will rescue it.