The Test That Lied

Imagine you run an A/B test with a fifty-fifty traffic split. After two weeks, the results look great: the variant outperforms the control on your primary metric with strong statistical significance. You ship the change.

Three weeks later, the metric trend reverses. The improvement vanishes. You re-examine the experiment data and discover something you missed: the traffic split was not fifty-fifty. It was fifty-three to forty-seven. Your randomization was broken.

This is a sample ratio mismatch, and it is one of the most common — and most ignored — threats to experiment validity. When the observed traffic split does not match the expected split, something has gone wrong with the randomization process. And when randomization is broken, every conclusion drawn from the experiment is unreliable.

What Sample Ratio Mismatch Is

Sample ratio mismatch (SRM) occurs when the actual distribution of users across experiment groups differs significantly from the intended distribution.

If you configured a fifty-fifty split and observe a fifty-point-one to forty-nine-point-nine split with millions of users, that is normal statistical fluctuation. If you observe a fifty-two to forty-eight split with the same sample size, the probability of that happening by chance is vanishingly small. Something in your system is systematically sending more users to one group.
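To put a number on "vanishingly small", here is a quick back-of-the-envelope calculation (a sketch; the one-million-user sample size is assumed for illustration). Under a true fifty-fifty split, each group's count follows a Binomial(N, 0.5) distribution, so a fifty-two to forty-eight split over a million users sits forty standard deviations from its expectation:

```python
import math

# Assumed for illustration: one million users, intended 50/50 split.
n = 1_000_000
observed_a = 520_000            # 52% of users landed in group A
expected_a = n * 0.5            # 500,000 expected under true randomization

# Standard deviation of a Binomial(n, p=0.5) count: sqrt(n * p * (1 - p)).
std = math.sqrt(n * 0.25)       # = 500.0

z = (observed_a - expected_a) / std
print(z)  # 40.0 -- a deviation this extreme effectively never happens by chance
```

For comparison, a five-standard-deviation event is already roughly a one-in-three-million occurrence; forty standard deviations is far beyond anything chance can explain.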

Why It Matters

The entire framework of A/B testing relies on one assumption: the treatment and control groups are identical in all respects except the change being tested. Random assignment is what creates this equivalence.

When SRM occurs, the groups are no longer equivalent. Users who were systematically diverted to one group may differ from users in the other group in ways that correlate with the outcome metric. The observed difference in the metric might be caused by the treatment — or it might be caused by the fact that different types of users ended up in different groups.

You cannot tell the difference. That is what makes SRM so dangerous.

Common Causes of Sample Ratio Mismatch

Browser and Bot Filtering

This is the most common cause. Your experiment platform assigns a user to a group, but something between assignment and logging prevents the exposure from being recorded: a browser that fails to execute the logging code, or a bot-filtering rule that discards the session. If this loss affects one group more than the other (because the treatment loads slightly differently, or triggers a different code path), you get SRM.

Example: The treatment variant includes a new JavaScript module. Some older browsers fail to execute it, and the user's experiment exposure is never logged. The control group, using the old code, logs normally. Result: more users appear in the control group.

Redirect-Based Experiments

Experiments that redirect users to different URLs are particularly prone to SRM. Redirects have different latency characteristics. Users on slow connections may abandon during the redirect, and if one variant's redirect is slower than the other, you lose users asymmetrically.

Triggered Experiments

Some experiments only trigger under specific conditions (for example, the user must reach a certain page). If the treatment itself affects whether users reach the trigger point, you get differential exposure.

Example: You are testing a new checkout flow. Users are assigned to the experiment when they reach the cart page. But the treatment variant makes the product page more confusing, so fewer treatment users reach the cart. The treatment group shrinks — not because of randomization failure, but because the treatment changed the behavior that triggers experiment enrollment.

Caching Issues

Caching can cause SRM in subtle ways. If the control version is cached more aggressively than the treatment (or vice versa), the caching layer serves different experiences to different users regardless of their group assignment. Users who receive the cached version may not have their experiment exposure logged correctly.

User Session Issues

If users can appear in multiple sessions (different devices, cleared cookies, private browsing), they might be assigned to different groups in different sessions. When you de-duplicate by user ID, some of these conflicting assignments are resolved — but the resolution may systematically favor one group.

Experiment Interaction Effects

If multiple experiments run simultaneously and share a randomization mechanism, they can interfere with each other. A user in the treatment group of Experiment A might be systematically more likely to end up in the control group of Experiment B, creating SRM in Experiment B.

How to Detect Sample Ratio Mismatch

The Chi-Squared Test

The standard detection method is a chi-squared goodness-of-fit test. Compare the observed distribution of users across groups to the expected distribution.

For a fifty-fifty split with N total users:

  • Expected: N/2 in each group
  • Observed: actual counts in each group
  • Calculate chi-squared: sum of (observed - expected)^2 / expected
  • Compare to the chi-squared distribution with one degree of freedom

If the p-value is below a threshold (typically 0.001 — more conservative than the usual 0.05 because SRM is a validity check, not a treatment effect), you have evidence of SRM.
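The steps above can be sketched in a few lines of Python. This version covers the two-group case, computing the p-value via the identity that a chi-squared variable with one degree of freedom is the square of a standard normal, so P(X > x) = erfc(sqrt(x / 2)). Function and variable names are illustrative, not from any particular platform.

```python
import math

def srm_check(observed_a, observed_b, expected_share_a=0.5, alpha=0.001):
    """Chi-squared goodness-of-fit test for a two-group experiment.

    Returns (chi2, p_value, srm_detected). expected_share_a is the
    intended fraction of traffic in group A (0.5 for a 50/50 split).
    """
    total = observed_a + observed_b
    expected_a = total * expected_share_a
    expected_b = total * (1 - expected_share_a)
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha

# A 52/48 split over just 10,000 users is already strong evidence of SRM:
chi2, p, srm = srm_check(5_200, 4_800)
print(chi2, p, srm)  # chi2 = 16.0, p well below 0.001, srm = True
```

Because the function takes the intended share as a parameter, the same check works for unequal splits (for example, `expected_share_a=0.7` for a seventy-thirty allocation).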

When to Check

Check for SRM:

  • Before analyzing results. SRM invalidates the experiment, so checking after analysis is too late.
  • Daily during the experiment. Early detection allows you to pause and investigate before wasting more time.
  • At the overall level and within segments. SRM might be present only in specific segments (for example, only on mobile, or only for new users).
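As a sketch of the overall-plus-segment check (segment names and counts below are hypothetical), note that two segments imbalanced in opposite directions can cancel out, leaving a clean overall ratio while each segment individually fails the check:

```python
import math

def srm_p_value(a, b):
    """Two-group chi-squared p-value against an intended 50/50 split."""
    expected = (a + b) / 2
    chi2 = (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: desktop over-represents group A, mobile
# over-represents group B, and the totals happen to balance exactly.
segments = {
    "desktop": (41_500, 40_000),
    "mobile":  (9_000, 10_500),
}

overall_a = sum(a for a, _ in segments.values())
overall_b = sum(b for _, b in segments.values())
print("overall:", srm_p_value(overall_a, overall_b))   # looks perfectly healthy
for name, (a, b) in segments.items():
    print(f"{name}: {srm_p_value(a, b):.3g}")          # both fail the 0.001 threshold
```

This is why checking only the top-line ratio is not enough: segment-level SRM can hide behind an overall ratio that looks fine.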

Automated Monitoring

The best practice is to build automated SRM detection into your experiment platform. Every experiment should trigger an alert if SRM is detected above the threshold. This removes the reliance on analysts remembering to check.

What to Do When You Find SRM

Step 1: Stop Interpreting the Results

This is the most important step and the one teams resist most. When SRM is present, the experiment results are unreliable. Any conclusion — positive, negative, or neutral — is potentially wrong.

Do not ship a winning result if SRM is detected. Do not kill a losing result either. The data cannot be trusted.

Step 2: Investigate the Cause

Work through the common causes listed above:

  1. Check for differential logging errors between groups
  2. Check for client-side errors that affect one variant more than the other
  3. Check for bot or crawler activity that disproportionately hits one variant
  4. Check for caching asymmetries
  5. Check for redirect timing differences
  6. Check for interactions with other running experiments

Step 3: Fix and Re-Run

Once you identify and fix the cause, re-run the experiment from scratch. Do not try to salvage the existing data by removing the problematic period or segment. The contamination is systemic — you cannot be sure that removing the obvious cases removes all of them.

Step 4: Prevent Recurrence

Update your experiment platform or process to prevent the same cause from affecting future experiments. This might mean:

  • Adding client-side error handling to prevent logging failures
  • Avoiding redirect-based experiments when possible
  • Improving bot filtering
  • Adding automated SRM checks to the launch process

The Hidden Prevalence of SRM

Research from major experimentation platforms suggests that SRM affects a significant portion of all A/B tests — some estimates put it around ten to fifteen percent. Most teams do not check for it, which means they are making decisions based on invalid data more often than they realize.

The implications are serious. If roughly one in ten of your past experiments had undetected SRM, some portion of the changes you shipped were based on false positives. And some changes you killed might have been effective.

This is why SRM detection should be the first step in any experiment analysis, before you look at any metric. If the randomization is broken, nothing else matters.

SRM in Advanced Experiment Designs

Sequential Testing

Sequential testing — where you analyze results continuously rather than at a fixed endpoint — requires special attention to SRM. The traffic ratio should be stable over time. If it drifts (perhaps because of a deployment that affected one variant), the sequential analysis may be invalidated even if the overall ratio looks acceptable.

Multi-Armed Bandits

Bandit algorithms intentionally shift traffic allocation based on intermediate results. This means the traffic ratio changes over time by design. SRM detection in bandit experiments requires comparing the observed ratio to the expected ratio at each time step, not to the initial allocation.
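A per-step version of the check can be sketched as follows (the daily log, its counts, and the intended shares are hypothetical; a production system would also correct for testing many steps). Each day's observed counts are compared against the allocation the bandit actually intended that day, not the initial split:

```python
import math

def step_srm_p(observed_a, observed_b, expected_share_a):
    """Two-group chi-squared p-value against a time-varying intended split."""
    total = observed_a + observed_b
    ea = total * expected_share_a
    eb = total - ea
    chi2 = (observed_a - ea) ** 2 / ea + (observed_b - eb) ** 2 / eb
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical bandit log: (intended share for arm A, observed A, observed B).
# The intended share shifts by design as the bandit adapts.
daily_log = [
    (0.50, 5_020, 4_980),   # day 1: even split, observed close to intended
    (0.60, 6_050, 3_950),   # day 2: bandit favors arm A, still consistent
    (0.70, 6_300, 3_700),   # day 3: observed lags the intended 70/30 badly
]

for day, (share, a, b) in enumerate(daily_log, start=1):
    p = step_srm_p(a, b, share)
    flag = "  <- investigate" if p < 0.001 else ""
    print(f"day {day}: p = {p:.4g}{flag}")
```

Day three would pass an SRM check against the original fifty-fifty allocation while failing badly against the allocation the bandit actually intended, which is exactly the failure mode this per-step comparison catches.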

Crossover Designs

In crossover experiments, users experience both conditions at different times. SRM can occur if users drop out between phases at different rates depending on their initial assignment. Check for balanced attrition between the phases.

Building SRM Awareness Into Your Culture

SRM is not a topic that generates excitement. It is a technical hygiene issue that prevents embarrassing mistakes. But its impact on decision quality is enormous.

To build SRM awareness:

  • Add SRM checks to your experiment analysis template. Make it the first section, before any metric analysis.
  • Include SRM detection in your experiment platform. Automated checks catch what manual reviews miss.
  • Share SRM stories in team retrospectives. When you catch an SRM, discuss what caused it and what would have happened if you had shipped without checking.
  • Treat SRM as a launch blocker. No experiment result is valid if SRM is present. This should be a team-wide norm, not a suggestion.

The goal is simple: before you celebrate a win or mourn a loss, verify that the experiment actually measured what you think it measured. Sample ratio mismatch is the first and most important validity check.

FAQ

How much deviation from the expected ratio is acceptable?

It depends on sample size. With large samples, even a tiny deviation can be statistically significant. Use the chi-squared test with a conservative threshold (p less than 0.001) rather than eyeballing the ratio.

Can I fix SRM by re-weighting the groups after the fact?

No. Re-weighting corrects the observable imbalance but not the underlying selection bias. The users who were systematically diverted are different from the users they replaced. No statistical adjustment can fix this.

Is SRM more common in client-side or server-side experiments?

Client-side experiments are more prone to SRM because they depend on browser execution, which introduces more failure modes. Server-side randomization is more reliable but can still have SRM from other causes like caching or logging issues.

Should I always use a fifty-fifty split?

Not necessarily. Unequal splits (like seventy-thirty or ninety-ten) are valid when you want to reduce risk. The key is to check for SRM against the intended split, whatever it is.

Written by Atticus Li

Revenue & experimentation leader — behavioral economics, CRO, and AI. CXL & Mindworx certified. $30M+ in verified impact.