Most articles about experimentation assume the hardest part is writing a good hypothesis. It is not. The hardest part is making a decision when your test has not reached significance, the traffic is lower than you modeled, a marketing campaign polluted the test window, and a stakeholder is asking you for an answer by Friday.

That is the real job. Not running perfect experiments. Making the best decision you can with the data you have.

"A lot of data-driven decision-making is making decisions with incomplete data. There's no perfect world where you have complete data. You have to make the decision based on the data you have — the best decision you can make. It's not the perfect decision." — Atticus Li

The Gap Between Influencer Advice and Company Reality

If you follow CRO on LinkedIn or read the big conference talks, you will see a specific kind of advice: run rigorous tests, wait for significance, never make claims without 95% confidence, calibrate everything. It is all technically correct. It is also useless for most teams.

The people giving that advice are almost always running programs at companies with enormous traffic. When you have millions of users, sample size is not a constraint. You can afford to wait. You can afford to throw out inconclusive tests. You can afford purity.

Most companies cannot. Most companies are running experiments on pages that get a few thousand sessions a week. Most teams are dealing with political constraints, budget constraints, and stakeholders who do not care about Bayesian priors. The trade-offs those influencers never mention are the trade-offs that define the job for everyone else.

"Most companies that are doing CRO are in this not-very-clean area of testing. They might not have enough samples, or no more traffic, or whatever. They're running tests and they're not hitting sample sizes, getting inconclusive results, and they're not sure. You have to make decisions with incomplete data." — Atticus Li

What Incomplete Data Actually Looks Like

Here is what a real week of experimentation looks like for most teams:

  • A test that started two weeks ago is trending positive but will not hit significance for another three weeks at current traffic
  • A marketing team pushed a new ad campaign mid-experiment, doubling traffic on one variant and skewing the comparison
  • A stakeholder wants to ship the variant they like regardless of the data
  • The analytics tool is showing a sample ratio mismatch on what should be a clean 50/50 split (a quick check for this appears just after this list)
  • A product team is blocking the release of a winning variant because it conflicts with their roadmap
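Some of these at least have a mechanical first step. For a suspected sample ratio mismatch, a chi-square test against the planned split tells you whether the skew is real; what to do about a polluted test is where the judgment starts. A minimal sketch in Python, with invented counts and the commonly used 0.001 bar:

```python
# A minimal SRM check, assuming a planned 50/50 split. The counts are
# hypothetical, and the 0.001 p-value bar is a common convention, not a law.
from scipy.stats import chisquare

control_n, variant_n = 5200, 4800      # hypothetical observed assignments
total = control_n + variant_n
expected = [total / 2, total / 2]      # what a clean 50/50 split should produce

stat, p_value = chisquare([control_n, variant_n], f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p_value:.5f}")
if p_value < 0.001:
    print("Likely sample ratio mismatch: debug assignment before reading results.")
else:
    print("Split is consistent with 50/50.")
```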

None of those problems have textbook answers. You will not find a section in any statistics book titled "what to do when a product manager overrides your results." But these are the problems that actually determine whether your experimentation program delivers business value.

The Best Decision You Can Make Right Now

The framing that changed how I run programs is this: the goal is not the perfect decision. It is the best decision you can make right now, given the data you have, the time you have, and the constraints you are operating under.

That means accepting three things most practitioners resist:

1. Directional data is still data. A test that did not reach 95% confidence but shows a consistent 4% lift with no conflicting signals is not worthless. It is evidence. You can make a decision on it while explicitly flagging the confidence level so stakeholders understand what they are acting on.

2. Bayesian thinking is more honest than frequentist purity. Classical frequentist methods make sense when you can run clean experiments with large samples. Most real programs cannot. Bayesian methods let you reason about probability distributions and directional confidence in a way that matches how business decisions actually get made. A minimal sketch of this appears just below.

3. Speed is a cost line nobody tracks. Every week you spend waiting for more data is a week of lost optimization on the winning variant, lost learning, and lost credibility with stakeholders. Accuracy is valuable, but it trades against velocity. Name the trade-off explicitly.
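To make points 1 and 2 concrete, here is a minimal sketch of the Bayesian read, assuming a simple Beta-Binomial model with uniform priors. Every count is invented; the point is that a test that misses a 95% bar can still put the probability of the variant winning somewhere you can act on, as long as you report that number honestly.

```python
# A minimal Beta-Binomial sketch with uniform Beta(1, 1) priors.
# All counts are hypothetical.
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical results: not significant at 95%, but consistently ahead.
control_conv, control_n = 230, 5000
variant_conv, variant_n = 262, 5000

# Draw from each arm's posterior over its conversion rate.
draws = 200_000
control_post = rng.beta(1 + control_conv, 1 + control_n - control_conv, draws)
variant_post = rng.beta(1 + variant_conv, 1 + variant_n - variant_conv, draws)

prob_variant_wins = (variant_post > control_post).mean()
expected_lift = (variant_post / control_post - 1).mean()

print(f"P(variant beats control) = {prob_variant_wins:.1%}")
print(f"Expected relative lift   = {expected_lift:.1%}")
```

With these invented counts the win probability lands around 93%: short of a classical 95% bar, but a long way from a coin flip, and exactly the kind of number a stakeholder can weigh.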

A Framework for Deciding Under Uncertainty

When a test is not clearly conclusive but a decision has to be made, I run through five questions:

  1. What is the directional signal? Is the variant consistently ahead, or oscillating around the control? Consistent directionality is a stronger signal than a single snapshot number.
  2. How much of the variance can I attribute to known confounds? If marketing ran a campaign that affected one arm of the test, I account for it explicitly. I do not pretend the test was clean if it was not.
  3. What is the cost of being wrong in each direction? Shipping a bad variant has a cost. Killing a good variant has a cost. These are rarely equal. A test affecting your checkout flow has asymmetric downside compared to a test on a marketing blog post.
  4. What is the cost of waiting? How much revenue are we leaving on the table per week by not acting? How much stakeholder credibility do we lose?
  5. Can we ship with a holdout? Often the right answer is to ship the variant to 80% of traffic and hold 20% on control, which lets you continue collecting data while capturing most of the lift. (A rough sketch of the arithmetic for questions 4 and 5 follows this list.)
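Questions 4 and 5 are the ones teams most often leave to gut feel, and they are the easiest to put numbers on. Below is a rough sketch with invented figures: the weekly cost of waiting if the observed lift is real, and a deterministic 80/20 holdout assignment. The hashing scheme and every input are illustrative assumptions, not a standard.

```python
# Rough sketch of questions 4 and 5. All figures are hypothetical.
import hashlib

# --- Question 4: what does a week of waiting cost? --------------------------
weekly_sessions = 20_000
baseline_conversion = 0.046        # control conversion rate
estimated_relative_lift = 0.04     # the directional 4% lift, still unproven
revenue_per_conversion = 85.0

extra_conversions = weekly_sessions * baseline_conversion * estimated_relative_lift
weekly_cost_of_waiting = extra_conversions * revenue_per_conversion
print(f"Waiting costs roughly ${weekly_cost_of_waiting:,.0f}/week if the lift is real.")

# --- Question 5: ship to 80%, hold 20% on control ---------------------------
def assign_bucket(user_id: str, holdout_share: float = 0.20) -> str:
    """Deterministically bucket a user: same id gets the same bucket every visit."""
    digest = hashlib.sha256(f"checkout-test-holdout:{user_id}".encode()).hexdigest()
    position = int(digest[:8], 16) / 0xFFFFFFFF  # uniform-ish value in [0, 1]
    return "control-holdout" if position < holdout_share else "variant"

for uid in ("user-1001", "user-1002", "user-1003"):
    print(uid, "->", assign_bucket(uid))
```

The point of the arithmetic is not precision. It is forcing the "wait another few weeks" conversation to name its price.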

This framework is not elegant. It will not satisfy a statistics professor. But it matches how real decisions get made, and it puts structure around what would otherwise be gut calls.

The UX Researcher Who Could Not Ship

The sharpest version of this problem I have seen came from a UX researcher I worked with. She was brilliant: academically rigorous, deeply trained, and insistent on doing everything by the book.

The problem was that stakeholders were not willing to wait as long as her methods required. Budgets did not support the sample sizes she wanted. Timelines did not accommodate the multi-phase research plans she proposed. And the result was predictable: she could not deliver anything stakeholders could actually use, because doing things absolutely perfectly ended up meaning doing nothing at all.

That is the core trap. Most teams do not have the traffic of Netflix. Most do not have the budget of Booking.com. Most do not have the backing of Disney. The methods that work for those companies are not the methods that work in the real world for everyone else.

Pragmatism is not the enemy of rigor. It is the thing that lets rigor survive contact with reality.

FAQ

How do you report an inconclusive test to stakeholders?

Tell them what you know, what you do not know, and what you would recommend if they had to act today. Do not hide behind "inconclusive." That word abdicates the job. Stakeholders still need to make a decision — your job is to inform it, not duck it.

Is it ever acceptable to ship a losing variant?

Yes, if the test was too noisy to trust or if there is a non-revenue reason like compliance, brand, or product strategy. What is not acceptable is shipping a losing variant and calling it a win. Intellectual honesty builds more trust over time than false certainty.

What confidence threshold do you actually use?

It depends on the blast radius. For low-risk tests on non-critical pages, I will act on 80% Bayesian probability. For revenue-critical changes to checkout or pricing, I want 95% or higher plus a post-launch holdout. The threshold should match the cost of being wrong.
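If it helps to see that policy as code, here is a minimal sketch; the tier names and thresholds are illustrative, not a standard.

```python
# Match the evidence bar to blast radius. Tiers and numbers are illustrative.
RISK_THRESHOLDS = {
    "low":      0.80,  # e.g. copy tweaks on non-critical pages
    "medium":   0.90,  # e.g. navigation or landing-page layout
    "critical": 0.95,  # e.g. checkout or pricing; pair with a post-launch holdout
}

def ship_decision(prob_variant_wins: float, risk_tier: str) -> str:
    threshold = RISK_THRESHOLDS[risk_tier]
    if prob_variant_wins >= threshold:
        return f"ship ({prob_variant_wins:.0%} clears the {threshold:.0%} bar for {risk_tier} risk)"
    return f"hold ({prob_variant_wins:.0%} misses the {threshold:.0%} bar for {risk_tier} risk)"

print(ship_decision(0.87, "low"))       # same evidence, different calls
print(ship_decision(0.87, "critical"))
```

Writing the tiers down is the real value: the bar stops moving from stakeholder to stakeholder.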

How do you handle stakeholders who want a yes or no answer?

Give them one. Then tell them your confidence level in that answer and what would change your mind. "I recommend shipping the variant. I am about 70% confident, and I would upgrade to 90% if we see the same pattern hold for two more weeks in a holdout." That is more useful than hedging.

Make Better Decisions Faster

If your team is stuck waiting for statistical perfection that will never come, the problem is not your statistics. It is your decision framework. Most experimentation programs I have worked with have far more data than they are willing to act on.

I built GrowthLayer to operationalize exactly this kind of pragmatic decision-making — pre-test duration calculators, Bayesian result interpreters, and a test repository that lets you reason about directional signals without pretending you have Netflix-level traffic.

If you are building the career skills to run programs like this, browse open CRO and experimentation roles on Jobsolv.

Or book a consultation and I will walk you through how I have structured this at enterprise scale — and how to adapt it to whatever constraints your team is actually operating under.

Atticus Li

Leads applied experimentation at NRG Energy. $30M+ in verified revenue impact through behavioral economics and CRO.