The most common failure mode I see in experimentation programs is not bad statistics. It is bad hypotheses. And almost every bad hypothesis has the same root cause: someone jumped to a solution before they understood the problem.
Solutionizing feels productive. Someone sees a metric they do not like, suggests a fix, and within a week the team is running a test. On paper this is velocity. In practice, it is a near-guarantee that the test will fail to teach you anything useful, regardless of whether it wins or loses.
"We really focus on problem-solution mapping — finding the problem first before we write a hypothesis. Instead of just becoming a sample size of one, coming up with something you think is a solution. Don't solutionize. A lot of teams do that." — Atticus Li
The Sample Size of One Problem
Here is the pattern I see constantly. A product manager looks at a checkout page and says "the CTA button is not prominent enough — we should make it bigger and move it above the fold." A designer implements the change. A CRO manager runs the test. The test is inconclusive. Nobody learns anything.
What actually happened is that one person had a hunch, the team treated that hunch as a hypothesis, and the experiment was really just a backfilled justification for the intuition. This is what I call the sample size of one problem. The hypothesis came from a single person's perspective, with no supporting evidence, and the test was designed to validate the fix rather than to understand the problem.
A real hypothesis starts with the problem — where exactly users are dropping off, why they are dropping off, what the data and the qualitative signals are telling you. Only after you understand the problem can you propose a solution worth testing.
"Often the best solution comes out of multiple people's perspectives — UX designers, UX researchers, the actual designers, different marketing teams, CRO, data analytics, developers. It really comes down to a diverse amount of opinions on what the best way of solving a particular problem is." — Atticus Li
What Problem-First Actually Looks Like
Problem-first hypothesis design has three phases. Each one takes real work, which is why most teams skip them.
Phase 1: Find the leak quantitatively.
Start with the funnel data. Not the top-line conversion rate — the step-by-step drop-off. Which specific step is losing the most users? Is the drop-off concentrated in a segment (mobile vs. desktop, new vs. returning, paid vs. organic)? Is it trending worse over time?
This is unglamorous work. It is reading reports, running cohort analyses, building funnels in Google Analytics or Adobe Analytics, and tracing the path users actually take through your product. But you cannot fix a leak you have not located.
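As a rough sketch of what this phase looks like once the data is exported, here is a minimal funnel drop-off analysis in Python. It assumes a per-user event export with user_id, device, and step columns — those names, the file name, and the funnel steps are placeholders for illustration, not a prescription for any particular analytics tool.

```python
import pandas as pd

# Assumed export: one row per user per funnel step reached.
# Hypothetical columns: user_id, device, step.
events = pd.read_csv("funnel_events.csv")

STEPS = ["product_view", "add_to_cart", "checkout_start", "billing", "purchase"]

def step_conversion(df: pd.DataFrame) -> pd.DataFrame:
    """Users reaching each step, plus the share lost versus the previous step."""
    reached = (
        df.groupby("step")["user_id"].nunique()
        .reindex(STEPS, fill_value=0)
    )
    out = reached.to_frame("users")
    out["pct_of_previous"] = (reached / reached.shift(1)).round(3)
    out["drop_off"] = (1 - out["pct_of_previous"]).round(3)
    return out

# Overall funnel: which step loses the most users?
print(step_conversion(events))

# Same funnel split by segment: is the leak concentrated on mobile?
for device, group in events.groupby("device"):
    print(f"\n--- {device} ---")
    print(step_conversion(group))
```

The output is just a table of where users disappear, overall and by segment. That is the entire point of Phase 1: locate the leak before anyone proposes a fix.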
Phase 2: Find the leak qualitatively.
Quantitative data tells you where users are dropping off. Qualitative data tells you why. This is where heatmaps, session replays, rage clicks, dead clicks, and support ticket themes come in.
AI-powered session analysis tools have made this phase dramatically faster than it used to be. What used to take hours of watching session replays now takes minutes with a good AI summary tool. But the work is still the work: you have to actually look at what users are doing, not just what the dashboard says.
The goal of this phase is to turn the drop-off from a number into a story. "Users are abandoning the checkout at the billing step because the form is asking for information they do not have yet." That is a problem statement. "Checkout conversion is 32% on mobile" is not.
Phase 3: Get diverse opinions before writing the hypothesis.
The best hypotheses come from multiple perspectives. A UX designer sees a different problem than a developer. A data analyst sees a different problem than a copywriter. A customer support rep sees things none of them see.
Before you write the hypothesis, get at least three perspectives on the problem. This is not a focus group. It is a 20-minute conversation with people who see the same page from different angles. You will almost always find that the "obvious" solution in your head is either incomplete, wrong, or not the highest-leverage option.
"Quite a bit of time is spent figuring out where the leak in the funnel is, or why exactly it's happening — not just from a quantitative perspective, but also from a qualitative perspective of where exactly users are dropping off." — Atticus Li
The HiPPO Problem
There is a specific version of solutionizing that is especially destructive: when the hypothesis comes from the highest-paid person in the room. This is the HiPPO problem, and it is the single biggest reason experimentation programs fail to scale.
Here is how it plays out. A VP mentions that they think the hero image should be changed. Within a week, the experimentation team is running a hero image test. The test may win or lose. Either way, the team has spent resources running a test that did not come from the backlog, did not come from data, and did not go through proper prioritization.
When this happens once, it is not a problem. When it happens constantly, the backlog is no longer driven by opportunity. It is driven by whoever has the most political capital. Win rates drop. Learning compounds slowly. The program loses credibility with the people whose ideas are not getting tested.
The fix is not to refuse HiPPO requests outright. You will lose that battle. The fix is to run every idea — including HiPPO ideas — through the same problem-first process. Treat the HiPPO suggestion as a hypothesis under review, not a mandate. Ask: what problem is this trying to solve? Do we have evidence that problem exists? How does it score against other ideas in the backlog?
Sometimes the HiPPO is right. Sometimes the exercise itself redirects them to a better idea. Either way, the process holds.
A Problem-First Hypothesis Template
Here is the template I use for every hypothesis:
Problem: [Specific drop-off, friction, or behavior observed in data]
Evidence: [Quantitative signals + qualitative signals backing the problem statement]
Hypothesis: If we [proposed change], then [specific metric] will [increase/decrease] by [projected MDE] because [behavioral mechanism].
Alternative explanations: [What else could explain the observed problem?]
Success criteria: [Primary KPI, guardrail metrics, and decision rule]
The key fields most teams skip are Evidence and Alternative Explanations. Without Evidence, the hypothesis is just a guess. Without Alternative Explanations, you are not stress-testing your own thinking.
Every test in a good program should be traceable back to a specific line in the Problem field. If you cannot point to the problem the test is solving, you are solutionizing.
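If your backlog lives in a spreadsheet or a tool, the same template can be enforced as a structured record. Here is a minimal sketch in Python: the field names mirror the template above, the filled-in example values are invented for illustration, and the is_testable rule is my own rendering of the "no evidenced problem, no test" discipline rather than any tool's built-in check.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    """Problem-first hypothesis record mirroring the template fields."""
    problem: str                        # specific drop-off, friction, or behavior
    evidence: list[str]                 # quantitative + qualitative signals
    hypothesis: str                     # "If we X, then metric Y will move by Z because..."
    alternative_explanations: list[str]
    primary_kpi: str
    guardrail_metrics: list[str] = field(default_factory=list)
    decision_rule: str = ""

    def is_testable(self) -> bool:
        """Only enters the backlog if the problem is evidenced and stress-tested."""
        return (
            bool(self.problem.strip())
            and len(self.evidence) > 0
            and len(self.alternative_explanations) > 0
        )

# Illustrative record — the numbers and copy are made up for the example.
h = Hypothesis(
    problem="Mobile users abandon checkout at the billing step",
    evidence=[
        "Step drop-off roughly twice as high on mobile as on desktop",
        "Session replays show repeated edits to the VAT field",
    ],
    hypothesis=(
        "If we defer the VAT field to post-purchase, mobile checkout completion "
        "will increase because we remove a blocker users cannot resolve in the moment"
    ),
    alternative_explanations=[
        "Mobile keyboard type makes the field hard to fill",
        "Mobile traffic mix skews toward first-time buyers",
    ],
    primary_kpi="Mobile checkout completion rate",
    guardrail_metrics=["Refund rate", "Support tickets per order"],
    decision_rule="Ship if the primary KPI lift is significant and guardrails are flat",
)
assert h.is_testable()
```

The structure does the arguing for you: a record with an empty Problem or Evidence field visibly fails the check, which is exactly the conversation you want to have before the test is built, not after.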
FAQ
How do you handle stakeholders who do not want to slow down for problem analysis?
Reframe it as risk mitigation. "We can run this test in three days without problem analysis, but our win rate on tests without analysis is 15%. If we spend two extra days on the problem phase, win rate goes up to 35%. That is worth the delay." Use their own language — speed and ROI — to make the case.
What if the data does not clearly point to a single problem?
That is useful information too. It might mean the problem is distributed across multiple steps, or that you need better instrumentation before you can run meaningful tests. Invest in the instrumentation first. You cannot test your way out of a measurement problem.
How many opinions should you gather before writing a hypothesis?
Three to five from distinct roles. More than that and you hit diminishing returns and decision paralysis. The point is diversity of perspective, not volume of input.
Can you skip problem-first analysis for small tests?
For very small tests — copy tweaks, minor styling — you can move faster. But for any test that touches a revenue-critical flow or will inform future strategy, the problem-first work is worth the time. The highest-leverage tests in your backlog deserve the highest-quality analysis.
Build a Backlog of Real Hypotheses
If your experimentation backlog is full of ideas that do not trace back to specific problems, you are running a solution mill, not an experimentation program. The cost is invisible — you will not notice it in any single test — but it compounds over time as your win rate stays flat and your learnings stay thin.
I built GrowthLayer with a problem-first hypothesis template baked in. Every test you add is forced through the same structured fields — problem statement, evidence, hypothesis, alternative explanations — which is the single biggest lever I have found for raising win rates without adding headcount.
If you are hiring or looking for experimentation and CRO roles that value structured hypothesis design, explore open positions on Jobsolv.
Or book a consultation and I will walk you through how to build a problem-first backlog process that works with stakeholders instead of against them.