The Same Idea, Two Executions, One Winner

Here's something that doesn't get talked about enough in the experimentation world: the idea isn't what wins. The execution is.

I see this mistake constantly. A team reads about a behavioral science principle — say, the Endowed Progress Effect — and assumes that simply applying it will move the needle. They ship one version, it works (or it doesn't), and they move on. What they miss is that the same principle, implemented differently, can produce wildly different results.

This is the story of a test where two variants used the exact same psychological mechanism. One drove a measurable lift in transactions. The other did absolutely nothing.

The Test: Two Progress Bars, One Winner

A large energy retailer was losing customers at a critical juncture: the plan selection page. Users would enter their zip code, land on a grid of available plans, and then... leave. The page had solid traffic — over 30,000 visitors across the test period — but the conversion rate to the next step had flatlined.

The team's hypothesis was grounded in solid behavioral science: add a progress bar to make users feel they were already partway through the purchase journey. The bar would show four steps — Enter Zip Code, Select Plan, Enter Details, Confirm Order — with Step 1 already marked as complete.

Two variants were designed. Both included a progress bar. Both leveraged the same psychological principle. Both were placed in the same position on the page.

What Happened

  • Variant A delivered a 3-8% lift in transactions over the 55-day test period. Estimated revenue impact: $250K-$500K.
  • Variant B showed no statistically significant difference from the control. Zero lift. Nothing.

Same principle. Same page. Same placement. Radically different outcomes.

What Went Wrong with Variant B?

This is where most teams stop analyzing. They declare Variant A the winner, ship it, and celebrate. But the real insight — the one that compounds your learning across your entire testing program — lives in understanding why Variant B failed.

The Clarity Gap

In my experience running hundreds of experiments, the most common reason a behaviorally sound idea fails is cognitive load. The principle triggers the right psychological response, but the implementation makes it too hard for the brain to process.

Daniel Kahneman's distinction between System 1 (fast, automatic) and System 2 (slow, deliberate) thinking is essential here. For a progress bar to work, it needs to be processed by System 1. Users should instantly understand where they are and where they're going. The moment a progress indicator requires System 2 thinking — decoding unfamiliar icons, parsing ambiguous labels, figuring out which step is "current" — it fails.

The winning variant almost certainly succeeded because it achieved what Kahneman calls cognitive ease. The steps were labeled in plain language that matched the user's mental model. The visual hierarchy was unmistakable: completed, current, upcoming. No ambiguity. No interpretation required.
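
What does that look like concretely? The case study doesn't publish either variant's markup, so the sketch below is an illustration of the pattern rather than the retailer's implementation: one plain-language label per step, one unmistakable visual state per step, nothing to decode.

```typescript
// Illustrative sketch only: names like StepState and renderProgressBar are
// hypothetical, not taken from the actual test.
type StepState = "completed" | "current" | "upcoming";

interface Step {
  label: string; // plain language that matches the user's mental model
  state: StepState;
}

// The four steps from the test, with Step 1 pre-completed to endow progress.
const steps: Step[] = [
  { label: "Enter Zip Code", state: "completed" },
  { label: "Select Plan", state: "current" },
  { label: "Enter Details", state: "upcoming" },
  { label: "Confirm Order", state: "upcoming" },
];

// One distinct marker per state, so System 1 can read position at a glance.
function renderProgressBar(steps: Step[]): string {
  const markers: Record<StepState, string> = {
    completed: "[x]",
    current: "[>]",
    upcoming: "[ ]",
  };
  return steps
    .map((step, i) => `${markers[step.state]} ${i + 1}. ${step.label}`)
    .join("   ");
}

console.log(renderProgressBar(steps));
// [x] 1. Enter Zip Code   [>] 2. Select Plan   [ ] 3. Enter Details   [ ] 4. Confirm Order
```

In a real UI the state markers would be color, shape, and fill rather than ASCII, but the principle is identical: each state must be distinguishable before the user reads a single word.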

The Mental Model Mismatch

There's a subtler failure mode at play too. For a progress bar to activate the Endowed Progress Effect, the steps need to feel legitimate. Nunes and Drèze's 2006 research on loyalty card completion showed that artificial progress only works when the task structure feels authentic.

If Variant B's step labels didn't match how users actually thought about the purchase process — if the steps felt arbitrary or the language was too technical — then the progress bar wouldn't create the psychological tension that drives completion behavior. Instead of thinking "I'm 25% done, I should keep going," users would think "what does that even mean?" and ignore it entirely.

The Visual Hierarchy Problem

A progress bar is fundamentally a wayfinding device. It answers three questions simultaneously: Where have I been? Where am I now? Where am I going? If any of these three signals is unclear, the entire mechanism breaks down.

Research by Colin Ware on visual attention shows that humans process spatial position, color, and size before they read text. A progress bar with poor visual differentiation between completed, active, and future steps fails at the perceptual level — before the behavioral science even has a chance to work.

The Broader Lesson: Why "Best Practices" Fail

This experiment illustrates a pattern I've seen play out across hundreds of tests: behavioral science principles are necessary but not sufficient. They give you the right hypothesis, but the execution determines the outcome.

The conversion optimization industry has a dangerous habit of treating principles as prescriptions. "Add social proof." "Create urgency." "Show progress." These are starting points for hypotheses, not guaranteed solutions. The difference between a principle that lifts conversion and one that falls flat is almost always in the details:

  • Visual clarity — Can the user process the element in under 200 milliseconds?
  • Mental model alignment — Does the element use language and structure that matches how users think about the task?
  • Cognitive load — Does the element reduce uncertainty, or does it add another thing to parse?
  • Context sensitivity — Is the element appropriate for this specific page, audience, and moment in the journey?

What the Data Tells Us About Execution Quality

Here's what makes this test particularly valuable for practitioners: the A/B/n design with two progress bar variants gives us a natural experiment in execution quality. The control tells us the baseline. Variant A tells us the principle works when executed well. Variant B tells us the principle doesn't work when executed poorly.

This three-way comparison eliminates the most common alternative explanation for failed tests ("maybe the principle just doesn't apply to our context"). It clearly does apply — Variant A proved it. The variable was execution, not theory.

In statistical terms, we have approximately 10,000-15,000 visitors per variant over 55 days. The winning variant reached significance. The losing variant didn't even trend in the right direction. That's not noise — that's a signal about how much execution matters.
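
To make "reached significance" concrete at this traffic level, here's a sketch of the standard two-proportion z-test. The article doesn't publish raw conversion counts, so the baseline rate and lift below are assumptions, chosen to sit at the top of the reported range.

```typescript
// Two-proportion z-test. Visitor counts echo the article's rough figures;
// the conversion counts themselves are hypothetical.
function twoProportionZTest(
  convControl: number, nControl: number,
  convVariant: number, nVariant: number
): { z: number; pValue: number } {
  const p1 = convControl / nControl;
  const p2 = convVariant / nVariant;
  const pPooled = (convControl + convVariant) / (nControl + nVariant);
  const se = Math.sqrt(pPooled * (1 - pPooled) * (1 / nControl + 1 / nVariant));
  const z = (p2 - p1) / se;
  return { z, pValue: 2 * (1 - normalCdf(Math.abs(z))) }; // two-sided
}

// Standard normal CDF via the Abramowitz & Stegun erf approximation (7.1.26).
function normalCdf(x: number): number {
  const z = Math.abs(x) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * z);
  const poly =
    t * (0.254829592 +
    t * (-0.284496736 +
    t * (1.421413741 +
    t * (-1.453152027 +
    t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-z * z);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Hypothetical inputs: 10,000 visitors per arm, a 30% baseline step-conversion
// rate, and 32.4% in the variant (an 8% relative lift).
console.log(twoProportionZTest(3000, 10_000, 3240, 10_000));
// z ≈ 3.66, p ≈ 0.00025: comfortably significant at this traffic level
```

Run the same numbers at a 3% relative lift and the p-value climbs well above 0.05, a useful reminder that the low end of the reported range would need more traffic or a longer run to detect.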

Practical Takeaways: How to Avoid Being Variant B

1. Always Test Multiple Executions

If you're testing a behavioral principle, don't test a single implementation against a control. Test at least two implementations alongside that control, the way this experiment did. The variance between executions of the same principle is often larger than the variance between principles.
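
The article doesn't describe the test tooling, but as a minimal sketch of how a three-arm split like this is commonly assigned, deterministic hash bucketing gives every visitor a stable arm without storing any state:

```typescript
// Hypothetical sketch: arm names and experiment id are illustrative.
const ARMS = ["control", "progress-bar-a", "progress-bar-b"] as const;
type Arm = (typeof ARMS)[number];

// FNV-1a: a small, fast 32-bit hash that buckets uniformly enough for splits.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// The same visitor in the same experiment always lands in the same arm.
function assignArm(visitorId: string, experimentId: string): Arm {
  return ARMS[fnv1a(`${experimentId}:${visitorId}`) % ARMS.length];
}

console.log(assignArm("visitor-1234", "plan-page-progress-bar"));
```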

2. Apply the 200ms Rule

Show your design to someone for 200 milliseconds, then take it away. Can they tell you what step they're on? If not, your progress indicator fails the System 1 test. This applies to any behavioral trigger — social proof, urgency signals, trust badges. If it requires reading and thinking, it's working against itself.
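
A crude version of this test takes a few lines in the browser. As a hypothetical sketch: mount the design hidden, flash it for 200 milliseconds, then ask.

```typescript
// Hypothetical flash-test helper: reveal a hidden element briefly, hide it
// again, then debrief the viewer on what they registered.
function flashTest(element: HTMLElement, durationMs = 200): void {
  element.style.visibility = "visible";
  setTimeout(() => {
    element.style.visibility = "hidden";
    // Then ask: "What step were you on? How many steps were left?"
  }, durationMs);
}

// Usage, assuming a progress bar mounted with style="visibility: hidden":
// flashTest(document.querySelector<HTMLElement>(".progress-bar")!);
```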

3. Mirror the User's Language

Use the exact words your customers use to describe each step. Not your internal terminology, not your UX team's clever labels. Run a quick card sort or review support tickets to find out how customers actually describe the purchase process. Then use those words.

4. Treat Losing Variants as Data, Not Failures

The losing variant in this test is arguably more valuable than the winner. It tells us something specific about what doesn't work. Most teams discard losing variants without analysis. Instead, document why each variant failed and build a knowledge base that compounds across your testing program.
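
One hypothetical shape for that knowledge base, with every field name illustrative rather than taken from the article:

```typescript
// A compounding learnings log: one record per variant, win or lose.
type Outcome = "win" | "flat" | "loss";

interface ExperimentLearning {
  experiment: string;            // e.g. "plan-page-progress-bar"
  principle: string;             // e.g. "Endowed Progress Effect"
  variant: string;               // e.g. "B"
  outcome: Outcome;
  suspectedFailureMode?: string; // the "why", for flat or losing variants
  evidence: string[];            // session replays, survey quotes, heatmaps
  nextHypothesis?: string;       // what this learning suggests testing next
}

const variantB: ExperimentLearning = {
  experiment: "plan-page-progress-bar",
  principle: "Endowed Progress Effect",
  variant: "B",
  outcome: "flat",
  suspectedFailureMode: "ambiguous visual states forced System 2 processing",
  evidence: ["no significant lift vs. control over 55 days"],
  nextHypothesis: "re-test with plain-language labels and explicit step states",
};
```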

The Compound Effect of Execution Intelligence

Here's what separates good testing programs from great ones: the learning from losing variants compounds faster than the learning from winners.

When a variant wins, you know what worked — but you don't always know why. When a variant loses despite being based on sound behavioral science, you learn something specific about the gap between theory and practice. That insight applies to every future test.

This progress bar test didn't just produce a winning variant worth $250K-$500K in revenue. It produced a lesson about execution quality that, applied across an entire testing program, is worth far more. It tells us that the details of visual hierarchy, label clarity, and mental model alignment aren't nice-to-haves. They're the difference between zero impact and six-figure revenue gains.

The next time someone on your team says "let's just add a progress bar" or "let's just add social proof," remember this test. The principle is the easy part. The execution is where the money is.