Power Analysis
The pre-experiment calculation that determines the sample size required to detect a given effect with a specified probability.
What Is Power Analysis?
Power analysis is the math you do before an experiment to answer: "Given the effect I care about, how many users do I need so that, if the effect is real, I will actually detect it?" It ties together four levers — effect size, sample size, alpha, and power — such that fixing any three determines the fourth. Skipping it is the single most common reason experimentation programs produce ambiguous results.
Also Known As
- Data science: sample size calculation, sensitivity analysis
- Growth: test sizing, "how long do we run this?"
- Marketing: audience planning, campaign sizing
- Engineering: load planning for traffic allocation
How It Works
Imagine a signup page at 12% conversion. You want to detect a 10% relative lift (to 13.2%). With alpha = 0.05 (two-sided) and power = 0.80, the standard two-proportion calculation gives roughly 12,000 users per variant — about 24,000 total. At 2,000 daily signups split 50/50, that is about 12 days. Drop the target to a 5% relative lift and you need roughly 47,000 per variant — a test of seven weeks or more.
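The arithmetic above can be sketched with the textbook two-proportion z-test formula. This is a minimal illustration using only the Python standard library; the function name is mine, not from any particular calculator:

```python
from statistics import NormalDist

def required_n_per_variant(p1, rel_lift, alpha=0.05, power=0.80):
    """Sample size per variant for a two-sided two-proportion z-test."""
    p2 = p1 * (1 + rel_lift)                       # treatment rate under the target lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)        # Bernoulli variances of both arms
    return (z_alpha + z_beta) ** 2 * var_sum / (p2 - p1) ** 2

print(round(required_n_per_variant(0.12, 0.10)))   # 10% relative lift → ~12,000 per variant
print(round(required_n_per_variant(0.12, 0.05)))   # 5% relative lift → ~47,000 per variant
```

Note how halving the target lift roughly quadruples the sample size — required n scales with the inverse square of the effect.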
Power analysis also runs in reverse. Given fixed traffic of 10,000 per variant per week and a two-week window, what lift can you detect? That reverse calculation is how you set realistic expectations with stakeholders.
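The reverse calculation is just the same formula solved for the lift instead of n. A sketch, approximating both arms' variance with the baseline rate (a common simplification):

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_lift(p1, n_per_variant, alpha=0.05, power=0.80):
    """Smallest relative lift a test of this size can reliably detect."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    abs_mde = z * sqrt(2 * p1 * (1 - p1) / n_per_variant)  # absolute lift
    return abs_mde / p1                                     # convert to relative

# 10,000 users per variant per week, two-week window, 12% baseline
print(f"{minimum_detectable_lift(0.12, 20_000):.1%}")       # ~7.6% relative lift
```

That "~7.6%" is the honest answer to give a stakeholder hoping to confirm a 3% lift in two weeks.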
Best Practices
- Always calculate two ways: required sample size for your target lift, and detectable lift for your available sample.
- Use conversion rate from the last 28 days, not all-time averages. Seasonality shifts baselines.
- Power to 0.80 minimum; 0.90 for high-stakes tests like pricing or flagship flows.
- Account for multiple variants — a four-cell test is three comparisons against control, not one, and multiplicity inflates false positives unless you correct for it.
- Add a 15–20% buffer for bot filtering, sample ratio mismatch, and real-world attrition.
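The last two bullets can be sketched together. Here a Bonferroni-style correction handles the three variant-vs-control comparisons in a four-cell test, then a buffer is layered on top; the correction choice and the 20% buffer are illustrative assumptions, not universal rules:

```python
from statistics import NormalDist

def n_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sided two-proportion z-test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return z ** 2 * var_sum / (p2 - p1) ** 2

comparisons = 3                     # four cells = three variants vs. control
alpha_adj = 0.05 / comparisons      # Bonferroni-adjusted per-comparison alpha
base = n_per_variant(0.12, 0.132, alpha=alpha_adj)
buffered = base * 1.20              # 20% buffer for bots, SRM, attrition
print(round(base), round(buffered))
```

With the correction, each cell needs roughly a third more traffic than a simple two-cell test — exactly the multiplicity cost the bullet warns about.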
Common Mistakes
- Using industry benchmarks as your baseline. Your 12% is not the ecommerce average of 3% — use your own data.
- Ignoring variance for revenue metrics. Revenue per user is noisier than binary conversion; the same sample size will detect far smaller lifts on conversion than on ARPU.
- Stopping early when you "hit significance." Peeking invalidates the power calculation and inflates false positives dramatically.
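The revenue-variance point can be made concrete with the two-sample formula for a continuous metric. The ARPU and standard-deviation figures below are hypothetical, chosen only to show the effect of a heavy-tailed metric:

```python
from statistics import NormalDist

def n_per_variant_continuous(sigma, delta, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sample test on a mean,
    given the metric's standard deviation and target absolute lift."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 2 * (z * sigma / delta) ** 2

# Hypothetical: ARPU of $5.00 with sd of $40 (heavy tail), target +10% = +$0.50
print(round(n_per_variant_continuous(sigma=40.0, delta=0.50)))  # ~100,000 per variant
```

Because sigma enters the formula squared, a long-tailed revenue distribution can demand an order of magnitude more traffic than the binary conversion metric on the same page.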
Industry Context
In SaaS/B2B, power analysis often reveals that monthly signup volume cannot support testing modest changes — which should redirect teams to qualitative research, usability testing, or bigger bets. In ecommerce, high traffic enables granular tests but revenue variance still limits what is detectable. In lead gen, power must be sized to MQLs or SQLs (marketing- or sales-qualified leads), not raw form completions, since downstream quality dominates the business decision.
The Behavioral Science Connection
Power analysis fights the planning fallacy. Teams systematically underestimate how long real change takes to prove — they imagine two weeks and get two months. It also counters overconfidence: most "gut feel" estimates of how much lift a change will produce are 2–3x higher than reality. Writing the power calc down turns that overconfidence into a falsifiable plan.
Key Takeaway
Power analysis is the contract between your hypothesis and reality. If your test cannot mathematically detect the effect you care about, no amount of patience or creativity at the readout will rescue it.