Atticus Li leads Applied Experimentation at NRG Energy (Fortune 150), where he runs 100+ experiments per year and generated $30M in verified revenue impact in 2025. He writes about the operational reality of building experimentation programs that survive contact with organizational politics.

I get this message from a new analyst at least once a quarter. The wording changes, but the spirit is always the same: "Hey, I'm looking at the test we launched yesterday. The split is 51.2% to 48.8%. Is something wrong? Should we restart?"

The answer is almost always no. But the fact that they're asking is a good sign. It means they're paying attention to data quality, which is a habit I want to reinforce. The question isn't wrong. It's just premature.

So here's what I tell them.

Random Allocation Is Not a Perfect 50/50 Machine

When you set up a 50/50 A/B test, you're telling the experimentation platform to randomly assign each visitor to one of two groups with equal probability. Equal probability does not mean equal count.

Flip a fair coin 1,000 times. You will almost never get exactly 500 heads and 500 tails. You'll get 487 and 513. Or 504 and 496. Or 519 and 481. All of these are completely normal outcomes from a fair coin.

Your A/B test traffic split works the same way. Random assignment with equal probability will produce slightly unequal groups. This is not a bug. It's basic statistics. The law of large numbers says the ratio will converge toward 50/50 as sample size increases, but at any finite point, there will be variance.
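
If you want to see this for yourself rather than take my word for it, here's a quick simulation sketch in Python (assuming NumPy is installed; this is an illustration, not part of any platform):

```python
# Simulate fair 50/50 assignment at increasing sample sizes.
# The counts are almost never equal, but the share drifts toward 50%.
import numpy as np

rng = np.random.default_rng(seed=7)

for n in (1_000, 10_000, 100_000):
    variant = rng.integers(0, 2, size=n).sum()  # 1 = variant, 0 = control
    control = n - variant
    print(f"n={n:>7}: control={control:>6}, variant={variant:>6}, "
          f"variant share={variant / n:.2%}")
```

Run it with a few different seeds and you'll see the counts bounce around while the share hugs 50% more and more tightly as n grows.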

At 1,000 visitors, a 51/49 split is utterly unremarkable. At 10,000 visitors, it's still within the normal range. At 100,000 visitors, a persistent 51/49 needs investigation: the expected variance at that sample size is so small that a deviation that large is very unlikely to be chance.

The key concept is expected variance. At any given sample size, there's a range of split ratios that are completely consistent with fair random assignment. A 51/49 split at 1,000 users is well within that range. A 55/45 split at 100,000 users is not.

The Acceptable Range

Here's the rough heuristic I give my team. For a 50/50 split, the acceptable range at various sample sizes is approximately:

At 1,000 users: anything from 47/53 to 53/47 is normal. At 10,000 users: 49/51 to 51/49 is typical; by the time you're seeing 48.5/51.5, you're in SRM territory. At 100,000 users: you should be very close to 50/50, and deviations beyond 50.5/49.5 warrant investigation.

These aren't hard cutoffs. They're intuition guides. The actual statistical test for whether a split is problematic is the Sample Ratio Mismatch (SRM) test, which I'll get to in a moment.

The point is: don't panic at small deviations in small samples. Random variance is doing exactly what random variance does.
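
Those ranges aren't magic. They fall out of a normal approximation to the binomial: for a fair 50/50 split, the standard error of the observed share is sqrt(0.25 / n), and roughly 95% of fair splits land within two standard errors of 50%. Here's a small sketch (again, just an illustration in Python):

```python
# Rough "normal range" for a fair 50/50 split at various sample sizes,
# using the normal approximation: standard error = sqrt(0.25 / n).
import math

for n in (1_000, 10_000, 100_000):
    se = math.sqrt(0.25 / n)              # standard error of the variant share
    lo, hi = 0.5 - 2 * se, 0.5 + 2 * se   # ~95% of fair splits land here
    print(f"n={n:>7}: roughly {lo:.2%} to {hi:.2%}")
```

Which is exactly why 51/49 is boring at 1,000 users and a red flag at 100,000.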

When to Actually Worry: Sample Ratio Mismatch

Here's where it gets serious. There are situations where a traffic split deviation is not random variance — it's a signal that something in your test setup is broken. This is called Sample Ratio Mismatch, and it's one of the most important diagnostic checks in experimentation.

SRM happens when the observed traffic split is statistically incompatible with the expected split. Not "slightly off." Incompatible. As in, the probability of seeing this split from a fair random assignment is less than 1%.

When SRM is detected, it means something is systematically skewing the groups: either more traffic is landing in one variant, or users in one variant are being dropped from tracking. Common causes include:

Bot filtering differences. If your bot filter removes more traffic from one variant than the other, you'll see SRM. This can happen when a variant changes page behavior in a way that makes bot traffic easier or harder to detect on that side.

Redirect latency. If one variant requires a redirect and the other doesn't, users who bounce during the redirect are lost from one group but not the other. This is especially common in redirect-based (split-URL) tests where the control is the default experience and only the variant gets redirected.

JavaScript errors. If the variant JavaScript crashes for certain browsers or devices, those users might not get tracked. Your platform thinks they were assigned to the variant, but their events never fire. The split looks off because you're missing data from one group.

Caching issues. CDN caching can serve the wrong variant to users, or cache one variant more aggressively than the other, creating systematic imbalances.

Interaction with other tests. If two tests share traffic and one test's variant affects whether users reach the second test, you can get SRM in the second test.

SRM means your test results are unreliable. The groups are no longer comparable, so any difference in conversion rates could be caused by the imbalance, not by the treatment. When SRM is detected, the test needs to be investigated and likely restarted.

The SRM Diagnostic

The good news is that SRM detection is straightforward. You run a chi-squared test comparing your observed split to your expected split. If the p-value is below 0.01 (or whatever threshold your team uses), you have SRM.
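
If you'd rather script the check than use a calculator, it's only a few lines. Here's a sketch assuming SciPy, with made-up counts for illustration:

```python
# Minimal SRM check: chi-squared test of observed counts vs. a 50/50 split.
# Counts below are illustrative, not from a real test.
from scipy.stats import chisquare

observed = [50_420, 49_580]          # users bucketed into each variant
total = sum(observed)
expected = [total / 2, total / 2]    # what a 50/50 split should give

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-squared = {stat:.2f}, p = {p_value:.4f}")

if p_value < 0.01:
    print("SRM detected: investigate before trusting any results.")
else:
    print("Split is consistent with the expected allocation.")
```

Swap in your real observed counts, and if your test isn't 50/50, adjust the expected counts to match your allocation.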

I have my team run this check on every test within the first 48 hours of launch. It's part of our standard launch QA process. Catching SRM early means you can diagnose and fix the issue before wasting weeks of test runtime on compromised data.

The check takes 30 seconds. Use GrowthLayer's SRM calculator — plug in your expected split, your observed counts, and it tells you whether you have a problem. There's no reason to skip this step.

The Mentoring Moment

When a new analyst flags a 51/49 split, I don't just tell them it's fine. I use it as a teaching moment to explain the full spectrum of split quality.

I walk them through the three buckets. First: normal variance, which is what their 51/49 is. Nothing to worry about, keep monitoring. Second: suspicious but not conclusive, which might be a 50.5/49.5 at 50,000 users. Worth running the SRM test and keeping an eye on. Third: confirmed SRM, which is a split so improbable under fair assignment that something has to be broken. Stop the test, diagnose, fix, restart.
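
If it helps to make that triage mechanical, here's how I'd sketch the three buckets in code (hypothetical thresholds for the "suspicious" band; assumes SciPy):

```python
# Hypothetical triage helper mirroring the three buckets: normal variance,
# suspicious, and confirmed SRM. Thresholds here are illustrative.
from scipy.stats import chisquare

def triage_split(control: int, variant: int,
                 srm_alpha: float = 0.01, watch_alpha: float = 0.10) -> str:
    total = control + variant
    _, p = chisquare([control, variant], [total / 2, total / 2])
    if p < srm_alpha:
        return "confirmed SRM: stop, diagnose, fix, restart"
    if p < watch_alpha:
        return "suspicious: re-run the SRM check as traffic accumulates"
    return "normal variance: keep monitoring"

print(triage_split(512, 488))        # the analyst's 51.2/48.8 at ~1,000 users
print(triage_split(25_250, 24_750))  # 50.5/49.5 at 50,000 users
print(triage_split(26_500, 23_500))  # 53/47 at 50,000 users
```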

The lesson isn't "stop worrying about splits." The lesson is "worry about the right thing." Don't worry about normal variance. Do worry about systematic bias. And use a statistical test — not your gut — to tell the difference.

I also tell them something that took me years to internalize: the anxiety about imperfect splits comes from a good place. It means they care about data quality. I want to channel that instinct, not suppress it. The analyst who flags a 51/49 split today is the same analyst who'll catch a real SRM issue six months from now.

What I Wish Someone Had Told Me

When I started in experimentation, I didn't even know SRM was a concept. I spent weeks agonizing over minor split deviations that were completely normal, and I probably missed at least one real SRM issue because I didn't know what to look for.

The knowledge gap is common. Most experimentation courses and blog posts skip SRM entirely, or mention it in passing. But it's one of the most practically important concepts in applied experimentation. A test with SRM is a test you can't trust, no matter how impressive the results look.

So to every new analyst reading this: your instinct to check the split is correct. Your threshold for concern just needs calibration. A 51/49 split is noise. A failed SRM test is a red flag. Learn the difference, and you'll avoid both false alarms and missed problems.

And run the SRM check on every test. Every single one. It takes 30 seconds and it could save you from making a decision based on compromised data.

The 80/20 of Split Quality

If you take one thing from this article, it's this: don't spend your limited QA time worrying about whether the split is 50.3/49.7. Spend it running the SRM diagnostic on every test at launch. That's where the actual risk lives.

Perfect splits are a myth. Systematic bias is real. Focus on what matters.

---

_Run an SRM check on your current tests right now. GrowthLayer's SRM calculator is free, takes 30 seconds, and tells you whether your traffic split is healthy or hiding a real problem._
