You ran a clean experiment. The results are in Optimizely. The data is solid. Now you have 30 minutes with your VP of Marketing and her first question is: "So... did it work?"

This is the translation problem that ends more experimentation programs than bad methodology. The gap between what lives in Optimizely and what your leadership team needs to make decisions is enormous—and it's your job to bridge it.

This guide covers every component of that translation: the one-page results template, the revenue math, the stakeholder questions you'll dread, and the system for building institutional memory from your experiments over time.

The Fundamental Translation Problem

Optimizely gives you: statistical significance, confidence intervals, lift percentages, sample sizes, p-values, and a winner/loser verdict.

Your CEO wants to know: did we make more money, should we ship this, and what do we do next?

These are not the same questions. The mistake most practitioners make is trying to answer the CEO's questions with the language of the first set. "We achieved 95% statistical significance on a 6.3% lift in CVR with a p-value of 0.047" is not an answer to "did we make more money." It's a sentence that causes your CEO's eyes to glaze over and makes them wonder if they should outsource CRO.

The translation requires three things:

  1. Connecting your test metric to revenue in plain language
  2. Giving a clear recommendation, not a data dump
  3. Framing the result—win or lose—in terms of what you learned and what comes next

The One-Page Results Template

I've used some version of this template for every stakeholder-facing results presentation I've given. It works. Copy it.

Experiment Name: [Name]
Test Period: [Start date] – [End date]
Page/Feature: [Where the test ran]

What we tested
One sentence: what changed, and where. No jargon. Example: "We tested showing estimated shipping costs on the product page before users initiate checkout, vs. our current experience where shipping costs appear only at the checkout step."

Why we tested it
One to two sentences: what data or insight prompted the test, and what behavioral theory predicted the outcome. Example: "Our funnel data showed 34% of cart abandonment happening on the shipping information page. We hypothesized that unexpected shipping costs were driving this—showing costs earlier would reduce surprise and increase checkout completion."

What happened
Two to three sentences: the plain-language results. Lift percentage, statistical confidence, and whether it's a clear decision. Example: "Showing shipping costs earlier increased checkout completion rate by 7.2% with 97% statistical confidence. Revenue per visitor increased by $0.43 (from $5.91 to $6.34). This is a clear win—recommend shipping immediately."

Revenue translation
One number: what this means in annualized dollars at current traffic. Example: "At current traffic (18,000 monthly sessions), this improvement is worth approximately $92,880 in additional annual revenue."

What we learned
One to two sentences: the insight that persists regardless of whether we ship this variant. Example: "Revealing unexpected costs late in the funnel is a primary driver of cart abandonment for this audience. The magnitude of the effect suggests shipping cost transparency should be a design principle across all purchase flows, not just this page."

What's next
One to two sentences: the follow-on test or action this result generates. Example: "Next test: apply the same transparency principle to tax estimates. We predict a similar effect based on the same behavioral mechanism."

This template is scannable in 90 seconds and answers every legitimate question a stakeholder has. It doesn't require them to understand p-values, confidence intervals, or statistical significance—those details are available in the appendix for anyone who wants them.

**Pro Tip:** Build this template in your company's standard presentation format (Google Slides, PowerPoint, Notion—whatever your team actually reads). A well-formatted one-pager that lives in your normal workflow gets read. A link to a Confluence doc does not.

Translating Lift to Revenue: The Formula

This is the calculation that makes experiments real to finance and leadership:

Annual Revenue Impact = Lift (%) × Baseline CVR × Monthly Traffic × AOV × 12

Or simplified: Lift × Baseline Revenue Per Session × Monthly Sessions × 12

Worked example:

  • Baseline CVR: 2.8%
  • Test variant CVR: 3.02% (7.9% relative lift)
  • Monthly sessions on test page: 22,000
  • AOV: $124
  • Baseline revenue per session: 2.8% × $124 = $3.47

Impact per session: $3.47 × 7.9% = $0.274 additional revenue per session
Monthly impact: $0.274 × 22,000 = $6,028 additional monthly revenue
Annual impact: $6,028 × 12 = $72,336 additional annual revenue

This number is what your leadership team can act on. Not "we achieved a 7.9% lift"—that's meaningless to someone who doesn't spend time in Optimizely. "$72,000 per year" is a business decision.

Caveats to communicate:

  • This assumes traffic stays constant (account for seasonality if relevant)
  • CVR gains sometimes decay post-implementation as the novelty effect wears off; consider a conservative estimate at 70% of observed lift
  • This is in-test traffic extrapolated—actual impact depends on implementation quality

**Pro Tip:** Build a revenue calculator as a shared spreadsheet that pre-fills from Optimizely export data. Input the lift %, baseline CVR, traffic, and AOV, and the sheet spits out annual impact. Send this with every results summary. It takes 5 minutes to fill in and instantly answers the most important stakeholder question.
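
If a spreadsheet isn't how your team works, the same calculator is a few lines of Python. A minimal sketch (the function and its parameter names are illustrative, not an Optimizely export format), with the 70% novelty haircut from the caveats above as an option:

```python
def annual_revenue_impact(
    relative_lift: float,    # e.g. 0.079 for a 7.9% relative lift
    baseline_cvr: float,     # e.g. 0.028 for a 2.8% conversion rate
    monthly_sessions: int,   # traffic on the tested page
    aov: float,              # average order value, in dollars
    haircut: float = 1.0,    # set to 0.70 for a conservative post-novelty estimate
) -> float:
    """Annualized incremental revenue from a CVR lift, assuming constant traffic."""
    baseline_revenue_per_session = baseline_cvr * aov
    incremental_per_session = baseline_revenue_per_session * relative_lift * haircut
    return incremental_per_session * monthly_sessions * 12

# Worked example from above (exact math, so it lands a few dollars off the
# hand-rounded $72,336): about $72,412/year, or $50,688 with the 70% haircut.
print(f"${annual_revenue_impact(0.079, 0.028, 22_000, 124):,.0f}")
print(f"${annual_revenue_impact(0.079, 0.028, 22_000, 124, haircut=0.70):,.0f}")
```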

How to Present Losing Tests as Wins

This is the highest-leverage communication skill in experimentation. Most practitioners dread presenting losing tests. The best practitioners have learned that losing tests, communicated correctly, build more program credibility than easy wins.

The framing: a losing test with a clear learning is a successful experiment. A test that doesn't teach you anything—win or lose—is the failure.

Structure for presenting a loss:

"We tested [X]. It did not win. Here's why that's valuable: our hypothesis was [behavioral mechanism]. The data tells us that [mechanism] is not what's driving [behavior] on this page. This eliminates a major candidate explanation and points us directly toward [next hypothesis]. Our next test will be more targeted because of this result."

Real example:

"We tested adding a security badge next to the checkout CTA, hypothesizing that trust was the primary barrier to checkout completion. CVR was flat—no significant effect. This tells us trust/security is not the friction point on this page (or that users already trust us adequately on this metric). We're re-examining our session recordings and found rage-clicking on the promo code field—suggesting friction around discount expectations may be the real barrier. Next test targets promo code field UX."

This presentation style does several things: it shows intellectual honesty, it demonstrates methodological rigor (you had a theory, you tested it, you updated appropriately), and it keeps momentum going. The next test is already framed.

**Pro Tip:** If you're in a culture where losing tests feel like program failures, address this explicitly at the organizational level. Share a "test velocity + learning velocity" dashboard with leadership that shows both won tests and learned tests as program achievements. The framing shift from "winning tests" to "learning tests" is a culture change, but it's achievable.

The "What About Statistical Significance?" Question

You will get this question. Usually after a test that has a positive trend but hasn't reached significance yet.

Here's how to answer it without being condescending:

"Statistical significance tells us how confident we are that the difference we saw between the variants is real—not just random variation in traffic. At 95% confidence, there's a 1-in-20 chance the difference is just noise. We haven't reached that threshold yet, which means we can't confidently predict this result would hold if we ship it. Shipping at this point would mean making a business decision based on data that has a higher-than-acceptable probability of being misleading."

Then give them the expected completion date and a clear recommendation for what to do in the meantime.

What you're doing: giving them just enough understanding to accept the conclusion without requiring them to become statisticians. You're not explaining p-values—you're explaining why the decision matters.
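
For the analytically curious on your team, the confidence number itself can be sanity-checked outside the tool with a standard two-proportion z-test. A minimal sketch using only the Python standard library; note that Optimizely's stats engine uses more sophisticated sequential methods, so treat this as a rough cross-check on hypothetical counts, not a reproduction of its math:

```python
from statistics import NormalDist

def two_proportion_confidence(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided confidence that the variants' conversion rates truly differ."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return 1 - p_value

# Hypothetical counts: 560 conversions of 20,000 (control) vs 640 of 20,000 (variant)
print(f"{two_proportion_confidence(560, 20_000, 640, 20_000):.1%}")  # ~98.1%
```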

Handling "Can We Just Ship It Anyway?"

This is the hardest conversation. The test shows a positive trend. Leadership wants to act. You know the data isn't conclusive.

When "ship it anyway" might actually be right:

  • The lift is large in magnitude and the risk of the change is low (reverting is easy)
  • The business context makes waiting costly (time-sensitive promotion, competitive pressure)
  • The directional signal is consistent across all segments even if aggregate significance hasn't been reached

When you should hold the line:

  • The metric is revenue-adjacent and a false positive would mean shipping something that actually decreases revenue
  • Reverting after shipping would be technically or organizationally difficult
  • There's meaningful variance in the data—some segments look positive, others negative

How to say "no" without creating conflict: "The trend is promising, and I understand the urgency. Here's my concern: at current confidence, there's roughly a 1-in-7 chance this is noise. If we ship and it actually hurts revenue, we've done real damage and we'll spend two weeks diagnosing a confounded result. The test reaches significance in [X] more days. The expected value of waiting is higher than the expected value of shipping early. If you want, we can extend the traffic allocation to 80% to accelerate completion."

**Pro Tip:** Pre-negotiate the decision framework before the test launches. "We'll ship if we reach 95% confidence. We'll assess early if we see a 15%+ lift before that threshold." Getting agreement on the rules in advance makes the in-the-moment conversation much easier.
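
Those pre-negotiated rules can even be written down as an explicit decision function, so nobody relitigates them mid-test. A toy sketch (the thresholds are the ones from the example above, not universal defaults):

```python
def pre_negotiated_decision(confidence: float, relative_lift: float) -> str:
    """Decision rule agreed with stakeholders before the test launched."""
    if confidence >= 0.95:
        return "ship"           # significance threshold reached
    if relative_lift >= 0.15:
        return "assess early"   # unusually large lift triggers an early review
    return "keep running"       # otherwise, let the test reach significance

print(pre_negotiated_decision(confidence=0.88, relative_lift=0.06))  # keep running
```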

"It Worked in the Test, Why Didn't It Work After Shipping?"

This conversation happens when a winning variant doesn't show the expected revenue lift in the months after implementation. It's uncomfortable. Here's how to address it.

Possible explanations to investigate:

  • Novelty effect: Some changes get temporary lift from being new, not from being better. This is especially common with visual changes. Monitor CVR for 30–60 days post-ship.
  • Segment shift: The test population may not match the post-ship population (e.g., the test ran during a sale period, post-ship is full-price traffic).
  • Implementation differences: Engineering may have implemented something slightly different from the test variant. Compare screenshots.
  • Interaction effects: Another change shipped simultaneously may be counteracting the gain.

The right response is to investigate, not defend. Frame it as: "We're seeing a discrepancy between test results and post-ship performance. Here are the four most likely explanations. Here's how we'll investigate each. We'll have an answer in two weeks."

This keeps you in the role of analyst, not defendant.

Building a Results Library That Compounds

The single best long-term ROI activity in an experimentation program is maintaining a high-quality results library. Not just a list of wins and losses—a searchable, structured record of what you tested, why, what happened, and what you learned.

What to include in each entry:

  • Experiment name and Optimizely link
  • Hypothesis (full structure: observation + change + mechanism + metric + audience)
  • Results (lift, confidence, sample size, runtime)
  • Decision (shipped, rejected, inconclusive)
  • Learning (the insight that persists regardless of outcome)
  • Follow-on hypotheses generated
  • Tags: page type, behavioral mechanism, funnel stage, device, audience segment
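
If your library lives in a structured tool rather than a flat doc, each entry maps naturally to a typed record. A minimal sketch of one possible entry format (field names are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentRecord:
    name: str
    optimizely_url: str
    hypothesis: str           # observation + change + mechanism + metric + audience
    relative_lift: float      # e.g. 0.072 for +7.2%
    confidence: float         # e.g. 0.97
    sample_size: int
    runtime_days: int
    decision: str             # "shipped" | "rejected" | "inconclusive"
    learning: str             # the insight that persists regardless of outcome
    follow_on_hypotheses: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)  # page type, mechanism, funnel stage, device, segment
```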

The compounding effect: After 50+ experiments, your results library becomes your program's biggest competitive advantage. New team members onboard faster. Stakeholders can see program history. And most importantly—you stop testing the same things twice. Teams without libraries routinely repeat tests that have already been run, reach the same conclusions, and build no institutional memory.

**Pro Tip:** Quarterly, run a "learning audit" on your results library. Pull all experiments tagged with a specific behavioral mechanism (e.g., social proof) and look for patterns. Where does it work? Where doesn't it? What's the magnitude range? This is how you build a model of your customers that's actually grounded in data.
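
The audit itself is just a filter-and-summarize pass over the library. A sketch assuming entries shaped like dicts with the fields listed above:

```python
def learning_audit(library: list[dict], mechanism_tag: str) -> dict:
    """Summarize every experiment in the library tagged with one behavioral mechanism."""
    matches = [e for e in library if mechanism_tag in e["tags"]]
    lifts = sorted(e["relative_lift"] for e in matches)
    shipped = [e for e in matches if e["decision"] == "shipped"]
    return {
        "experiments": len(matches),
        "win_rate": len(shipped) / len(matches) if matches else 0.0,
        "lift_range": (lifts[0], lifts[-1]) if lifts else None,
    }

# e.g. learning_audit(library, "social proof") -> where it works, and the magnitude range
```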

Structuring Quarterly Experiment Reviews

A quarterly review is your program's most important stakeholder touchpoint. It's where you translate a quarter of individual test results into strategic narrative.

Structure (60 minutes):

  1. Program velocity (5 min): Tests launched, completed, win rate, velocity trend
  2. Revenue impact summary (10 min): Aggregate annualized value of won tests, shipped vs. pending
  3. Key learnings (20 min): 3–5 insights from the quarter that should influence product/marketing strategy
  4. Roadmap preview (15 min): Top 5 tests for next quarter with hypotheses and expected impact
  5. Program asks (10 min): Resources, access, or decisions you need from leadership

The quarterly review is not a results dump. It's a narrative about what you've learned about your customers this quarter and what you're going to do with it.

**Pro Tip:** Start every quarterly review with a "learning headline"—one sentence that captures the most important thing the program discovered this quarter. "Q1 taught us that our customers respond to testimonials from industry peers but not from famous brands—suggesting our target audience values relevance over authority." This kind of framing keeps leadership engaged and makes the tactical results feel strategic.

What to Do When Leadership Overrides Your Data

It will happen. A test wins conclusively, you recommend shipping, and leadership decides not to—for political, organizational, or aesthetic reasons.

How to handle it professionally:

  1. Document the recommendation and the override. Write it down: "Test X showed Y% lift at 95% confidence. Team recommendation: ship. Decision: not shipped. Reason: [stated reason]." This creates accountability without confrontation.
  2. Quantify the cost. Attach the revenue impact calculation to the override documentation. Not confrontationally—as information. "For reference, this decision represents an estimated $X annualized opportunity cost."
  3. Don't relitigate. Present the data once, clearly. If the decision goes the other way, accept it and move on. You've done your job.
  4. Build toward system change. If overrides are frequent, the problem isn't any single decision—it's the program's authority structure. Use the override documentation to make a case for a "shipping authority" policy: who has decision rights when a test wins, and what's the process for exceptions.

Common Mistakes

Results dumps without recommendations — Sharing raw Optimizely exports with stakeholders and expecting them to draw conclusions. Your job is to have a recommendation. Every results communication should end with "our recommendation is X."

Burying the lede — Starting with methodology before results. Lead with the business impact, then explain the how. "This test is worth $80K annually. Here's how we know that."

Presenting every metric — If your results summary includes 12 metrics, you don't have a primary metric. Lead with one or two numbers. Put everything else in an appendix.

No follow-on story — Results that end with "and that's what we found" leave stakeholders wondering why they should care about the next test. Always end with what this result generates: a new hypothesis, a strategic implication, a product recommendation.

Inconsistent cadence — Sharing results ad hoc when they're positive and going quiet when they're not. Build a regular cadence (monthly digest, quarterly review) regardless of whether any given period was a good one for the program.

What to Do Next

  1. Take your most recent experiment result and write it up using the one-page template above.
  2. Run the revenue translation formula. If you haven't been doing this, the number will clarify why it's worth doing.
  3. Schedule a quarterly experiment review with your key stakeholders if you don't have one on the calendar.
  4. Start your results library today. Add your last five completed experiments with the full structured format.

If your program is still getting started and you want to see how all these pieces fit together—hypothesis writing, metrics, results communication, and roadmap—start with How to Build an Experimentation Roadmap That Actually Gets Used for the full program architecture.

Written by Atticus Li

Revenue & experimentation leader — behavioral economics, CRO, and AI. CXL & Mindworx certified. $30M+ in verified impact.