The Decision That Determines Everything Else

Before you launch an A/B test, you make one decision that shapes the entire experiment: the minimum detectable effect. It determines how long the test runs, how much traffic you need, and what size improvements you can reliably identify.

Get the MDE wrong and everything downstream breaks. Set it too small and the test runs for months, blocking other experiments. Set it too large and you miss real improvements that would have been worth shipping. Most teams either pick an arbitrary number or skip the decision entirely.

MDE deserves more careful thought than any other parameter in experiment design.

What MDE Actually Means

The minimum detectable effect is the smallest true improvement your test is designed to detect with acceptable reliability. If the real effect is equal to or larger than the MDE, your test has enough power to identify it. If the real effect is smaller than the MDE, your test will likely miss it.

This is not the effect you expect or hope for. It is the threshold below which you are willing to accept the risk of a false negative. Everything smaller than the MDE falls into a zone where your test cannot help you.

Think of it as the resolution of your experiment. A microscope with low magnification can see large objects but misses fine detail. Similarly, a test with a large MDE can detect large effects but is blind to smaller ones.
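To make the resolution analogy concrete, here is a minimal sketch of approximate power for a two-sided, two-proportion z-test, using only the Python standard library. The 5% baseline, 30,000 users per arm, and the lifts tested are illustrative assumptions, not benchmarks:

```python
from statistics import NormalDist

def approx_power(baseline, true_rel_lift, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    norm = NormalDist()
    delta = baseline * true_rel_lift                 # absolute difference in rates
    se = (2 * baseline * (1 - baseline) / n_per_arm) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)             # 1.96 for alpha = 0.05
    return norm.cdf(delta / se - z_crit)

# A test sized for a 10% relative MDE at a 5% baseline (~30k users/arm):
# power is ~80% at the MDE and falls off quickly for smaller true effects.
for lift in (0.10, 0.07, 0.05, 0.03):
    print(f"true lift {lift:.0%}: power ~ {approx_power(0.05, lift, 30_000):.0%}")
```

The drop-off below the MDE is the "resolution limit" in numbers: effects smaller than the MDE are not impossible to detect, just unlikely.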

Why MDE Is a Business Decision, Not a Statistical One

Statisticians can tell you the mathematical relationship between MDE, sample size, power, and significance. But they cannot tell you what MDE to use. That is a business judgment.

The right MDE depends on:

The value of the improvement

What is a one-point improvement in conversion worth annually? What about a half-point? A tenth of a point? Translate the MDE into revenue impact. If detecting a particular level of improvement would generate meaningful annual value, it is worth the traffic investment to detect it.

The cost of the test

Every test has an opportunity cost: the traffic could be used for another experiment, and the team could be working on something else. Longer tests for smaller MDEs cost more. At some point, the cost of running the test exceeds the value of detecting the effect.

The cost of implementation

Some changes are cheap to ship. Others require significant engineering effort, ongoing maintenance, or operational complexity. A small effect might justify a trivial implementation but not a major one.

The traffic available

Low-traffic products cannot detect small effects in reasonable timeframes. The MDE must be calibrated to what is physically possible given your traffic level and testing roadmap.

A Framework for Choosing MDE

Step 1: Calculate the revenue impact

For your primary metric, estimate the annual revenue impact of several improvement levels, translating each candidate MDE into business terms.

This gives you a menu of MDEs with attached business values. You can now see the trade-off in concrete terms.
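One way to sketch this menu in code. Every business input here (annual visitors, baseline conversion, value per conversion) is an assumed example number, not a benchmark:

```python
# Hypothetical inputs: 2M annual visitors, 4% baseline conversion, $60 per order.
ANNUAL_VISITORS = 2_000_000
BASELINE_CVR = 0.04
VALUE_PER_CONVERSION = 60

def annual_value(relative_lift):
    """Extra annual revenue from a given relative lift in conversion."""
    extra_conversions = ANNUAL_VISITORS * BASELINE_CVR * relative_lift
    return extra_conversions * VALUE_PER_CONVERSION

# A menu of candidate MDEs with attached business values:
for lift in (0.01, 0.025, 0.05, 0.10):
    print(f"{lift:.1%} relative lift ~ ${annual_value(lift):,.0f}/year")
```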

Step 2: Estimate the test duration for each MDE

Using your baseline conversion rate and daily traffic, calculate the sample size and duration for each MDE level. Pair this with the revenue impact.

Now the trade-off is explicit: each candidate MDE comes with a test duration and an annual value attached, and you can compare them side by side.
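As a sketch of this calculation, the standard two-proportion sample-size approximation fits in a few lines. The 4% baseline, 8,000 visitors per day, and the candidate MDEs are assumed for illustration:

```python
from statistics import NormalDist

def n_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion test."""
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    delta = baseline * relative_mde                  # absolute difference to detect
    return 2 * baseline * (1 - baseline) * (z / delta) ** 2

DAILY_TRAFFIC = 8_000  # visitors entering the test per day (assumed)

for mde in (0.03, 0.05, 0.10):
    n = n_per_arm(0.04, mde)
    days = 2 * n / DAILY_TRAFFIC                     # two arms share the traffic
    print(f"MDE {mde:.0%}: ~{n:,.0f} per arm, ~{days:.0f} days")
```

Pairing these durations with the revenue estimates from Step 1 turns an abstract statistical knob into a concrete business menu.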

Step 3: Find the sweet spot

The right MDE is the point where the expected value of running the test justifies the cost. Specifically:

  • The test duration should be reasonable for your experimentation roadmap — typically between one and four weeks for most product experiments.
  • The detectable effect should be large enough that the revenue impact justifies the implementation cost.
  • The power should be high enough that you can trust a null result.

For most product teams with moderate traffic, this sweet spot lands at relative MDEs of several percent, not the fractions of a percent that teams sometimes target.

Step 4: Validate against historical data

Look at the actual effect sizes from your past experiments. An MDE well below your median historical effect means most of your real effects will be detectable. An MDE above the median means you will miss more effects than you catch.

Historical effect sizes also serve as a reality check. If no experiment in your program has ever produced a large effect, choosing an MDE that only detects even larger effects means your tests will never show positive results.
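A minimal sketch of this check; the `past_effects` values are invented relative lifts for illustration (negative means the variant lost):

```python
from statistics import median

# Hypothetical relative lifts from past experiments.
past_effects = [0.012, -0.004, 0.031, 0.008, 0.052, -0.015, 0.024, 0.041]

candidate_mde = 0.05
med = median(abs(e) for e in past_effects)           # typical effect magnitude
detectable = sum(abs(e) >= candidate_mde for e in past_effects)

print(f"median |effect|: {med:.1%}")
print(f"{detectable}/{len(past_effects)} past effects at or above a {candidate_mde:.0%} MDE")
```

In this made-up history, a 5% MDE would have covered only one of eight past effects, a sign the candidate MDE is set above what the program actually produces.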

Common Mistakes in Setting MDE

Choosing the smallest possible MDE

Teams sometimes set the MDE as small as their statistics allow, reasoning that detecting smaller effects is always better. But the cost is enormous. Halving the MDE roughly quadruples the required sample size. A test that would take two weeks at a moderate MDE takes eight weeks at half that MDE. Few organizations can afford to run two-month tests.
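The quadratic relationship is easy to verify with the standard sample-size approximation; the 4% baseline and 10% MDE here are illustrative:

```python
from statistics import NormalDist

def n_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion test."""
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    delta = baseline * relative_mde
    return 2 * baseline * (1 - baseline) * (z / delta) ** 2

n_full = n_per_arm(0.04, 0.10)   # 10% relative MDE
n_half = n_per_arm(0.04, 0.05)   # half the MDE
print(f"halving the MDE multiplies sample size by {n_half / n_full:.1f}x")
```

Since the required sample size scales with 1/MDE², halving the MDE multiplies it by exactly four under this approximation.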

Using the expected effect as the MDE

The MDE is not what you expect the effect to be. It is the smallest effect you want to be able to detect. If you set the MDE equal to your expected effect, you have only your design power (commonly 80%) to detect it, which means a one-in-five chance of missing it even if it is real. Set the MDE below the expected effect to give yourself a margin.

Ignoring the MDE entirely

The worst mistake is not choosing an MDE at all. Without it, you have no basis for sample size calculation, no way to determine test duration, and no framework for interpreting null results. You are flying blind.

Using the same MDE for every test

Different tests have different stakes, different implementation costs, and different expected effects. A one-size-fits-all MDE is convenient but suboptimal. Calibrate the MDE to the specific decision.

MDE for Different Types of Tests

Landing page tests

Landing pages typically have moderate conversion rates and receive focused traffic. MDEs for landing page tests can often be set at moderate relative improvements because the traffic is concentrated and the metric is straightforward.

Checkout funnel tests

Checkout changes directly affect revenue, making even small improvements valuable. Teams often want smaller MDEs here, but checkout pages may have lower traffic than top-of-funnel pages. The trade-off between sensitivity and duration is especially sharp.

Feature launches

New features often have uncertain effects. Setting a moderate MDE and running a well-powered test is usually better than trying to detect a tiny effect with an underpowered test. You can always iterate after the initial test.

Pricing experiments

Pricing changes can have large effects (positive or negative), and the stakes are high. Use a moderate MDE but ensure high power, because false negatives on pricing changes are especially costly.

Engagement metrics

Metrics like time on site, pages per session, and return visit rate tend to have high variance. MDEs for these metrics usually need to be larger than for conversion rates, or the test durations become impractical.

The Organizational Dimension

Choosing an MDE is inherently a conversation between data, product, and business teams. Data knows what is feasible given the traffic. Product knows what effects are realistic given the change. Business knows what improvements justify the investment.

When these teams do not talk, you get one of two failure modes:

  • Data sets the MDE without business input. The result is technically sound but disconnected from business value.
  • Business sets the MDE without data input. The result is a test that cannot possibly run long enough to detect the specified effect.

The best organizations treat MDE selection as a collaborative design decision, documented alongside the hypothesis and success criteria before the test launches.

FAQ

Can I change the MDE after the test starts?

Technically yes, but it invalidates the original sample size calculation. If you realize mid-test that the MDE is wrong, it is better to stop the test, recalculate with the new MDE, and restart — or accept the revised MDE and extend the test accordingly.

What if my stakeholders demand a very small MDE?

Show them the math. Translate the small MDE into test duration and opportunity cost. Often, when stakeholders see that detecting a tiny effect requires months of testing, they recalibrate their expectations.

How does MDE relate to the effect I actually observe?

The MDE is a design parameter; the observed effect is a data outcome, and power statements apply to the true effect rather than the observed one. If the true effect is at or above the MDE, your test had good power to find it. If the observed effect lands well below the MDE and the result is null, the test tells you little: an effect of that size was never reliably detectable by design.

Should I use absolute or relative MDE?

Both are valid. Relative MDE (percentage change from baseline) is more intuitive for stakeholders and easier to compare across metrics. Absolute MDE (raw change in the metric) is more precise for statistical calculations. Use whichever is clearer for communication, but be consistent.
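Converting between the two is a one-line calculation; the 4% baseline here is an assumed example:

```python
baseline_cvr = 0.04    # 4% baseline conversion rate (assumed)
relative_mde = 0.10    # "a 10% relative lift"
absolute_mde = baseline_cvr * relative_mde

# A 10% relative MDE at a 4% baseline is a 0.4 percentage-point absolute MDE.
print(f"10% relative at a 4% baseline = {absolute_mde:.1%} absolute "
      f"({baseline_cvr:.1%} -> {baseline_cvr + absolute_mde:.1%})")
```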

Written by Atticus Li

Revenue & experimentation leader — behavioral economics, CRO, and AI. CXL & Mindworx certified. $30M+ in verified impact.