How to Read Your Optimizely Results Page (A Complete Walkthrough)
You ran your test. You opened the results page. You saw a big green percentage and felt good. Then you shipped the winner — and your conversion rate didn't move.
This is the most common failure mode in experimentation. The Optimizely results page contains a lot of information, and most practitioners look at exactly the wrong things first. This walkthrough covers every element on the page, what it actually means, and the order you should be reading it.
Start Here: The Sample Ratio Check
Before you read a single number on the results page, check one thing: did traffic split the way you intended?
If you set up a 50/50 test and your results page shows 8,200 visitors in control and 6,100 in variation, something is broken. This is called a Sample Ratio Mismatch (SRM), and it invalidates your entire test.
Causes of SRM include:
- JavaScript errors in your variation that crash the page and redirect users
- Bot filtering that applies differently to control vs. variation
- Caching or CDN configurations that serve the control to a disproportionate segment
- Experiment targeting that fires inconsistently
How to check: Divide each variation's visitor count by the total. If any group's share deviates from its expected split by more than 1-2 percentage points, investigate before reading anything else.
**Pro Tip:** A quick SRM sanity check: for a 50/50 test, the smaller group should be no less than 48% of total visitors. If you're at 43/57 or worse, stop and audit your implementation before drawing any conclusions.
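For a more rigorous version of this check than eyeballing percentages, the standard tool is a chi-square goodness-of-fit test on the visitor counts. Here is a minimal two-variation sketch in Python (stdlib only; the function name and the deliberately strict 0.001 cutoff are our choices, not anything Optimizely exposes):

```python
import math

def srm_check(observed, expected_ratios, threshold=0.001):
    """Chi-square goodness-of-fit test for Sample Ratio Mismatch.

    observed: visitor counts per variation, e.g. [8200, 6100]
    expected_ratios: intended split, e.g. [0.5, 0.5]
    Returns (chi2, p_value, srm_detected). A p-value below the
    threshold signals a mismatch worth auditing.
    """
    total = sum(observed)
    chi2 = sum((obs - total * r) ** 2 / (total * r)
               for obs, r in zip(observed, expected_ratios))
    # Survival function of a chi-square with 1 degree of freedom,
    # which is the two-variation case; more arms need more df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < threshold

chi2, p, srm = srm_check([8200, 6100], [0.5, 0.5])  # the broken split above
```

The 8,200 vs. 6,100 example from earlier fails this test overwhelmingly, while an innocuous 5,000 vs. 5,050 split passes.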
The Elements on the Results Page, In Order
Visitors and Conversions
The first columns you'll see are visitor counts and conversion counts per variation. This is raw data — look at it for sanity, not significance.
What to verify:
- Are visitor counts roughly equal to your expected split?
- Are conversion counts plausible? If your baseline normally converts at 3% and you're showing 0.1%, your conversion event may not be firing.
- Are both numbers growing over time? A flat conversion count while visitors grow is a red flag for a broken conversion event.
**Pro Tip:** Cross-check your conversion count in Optimizely against a secondary source like GA4 or your backend. If Optimizely shows 2,400 conversions and GA4 shows 240, your event is probably firing on every page load, not just on purchase confirmation.
Conversion Rate
Conversion rate = conversions / visitors. Simple, but there are two traps here.
Trap 1: Rounding. Optimizely displays conversion rates to two decimal places. At low conversion rates (say, 1.20% vs. 1.31%), the underlying difference might be 50 conversions out of 40,000 visitors. That sounds meaningful but might not be significant at all.
Trap 2: Lifetime counting. Optimizely counts a visitor once — their first experience. If a user visits Day 1 (sees variation), then returns Day 8 (still in variation), Optimizely counts them once. GA4 might count two sessions. This is why your conversion rates will always differ between tools.
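To see why Trap 1 bites, here is a minimal fixed-horizon two-proportion z-test (a sketch for intuition only; Optimizely's Stats Engine computes significance differently). The 480 and 524 conversion counts are illustrative values matching 1.20% and 1.31% on 40,000 visitors per arm:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test on conversion counts.

    Returns (z, p_value) for the difference in conversion rates.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p_value

# 1.20% vs. 1.31% on 40,000 visitors per arm
z, p = two_proportion_z(480, 40_000, 524, 40_000)
```

Despite the tidy-looking gap between the two displayed rates, the p-value here comes out well above 0.05: the difference is nowhere near significant at this sample size.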
The Improvement Column
This is the column most people look at first. Don't.
The improvement percentage tells you the relative lift of variation over baseline. A +15% improvement when your baseline is 2.0% CVR means you're at roughly 2.3% CVR — not 17% CVR.
The improvement number is meaningless without the confidence interval. A +15% improvement with a confidence interval of -8% to +38% tells you almost nothing. A +15% improvement with a confidence interval of +9% to +21% tells you a lot.
**Pro Tip:** When translating improvement to revenue, use the absolute CVR change, not the relative improvement. If baseline CVR is 2.0% and variation is 2.3%, and you get 50,000 monthly visitors at $80 AOV, the revenue impact is: 0.003 × 50,000 × $80 = $12,000/month. Stakeholders understand dollars, not percentages.
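The arithmetic in the tip above can be wrapped in a small helper so readouts always use the absolute CVR change (the function name is ours; the inputs repeat the worked example):

```python
def monthly_revenue_impact(baseline_cvr, variation_cvr, monthly_visitors, aov):
    """Translate an absolute conversion-rate change into monthly revenue.

    Uses the absolute difference in CVR, never the relative
    improvement percentage.
    """
    return (variation_cvr - baseline_cvr) * monthly_visitors * aov

# 2.0% -> 2.3% CVR, 50,000 monthly visitors, $80 AOV
impact = monthly_revenue_impact(0.020, 0.023, 50_000, 80)  # $12,000/month
```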
Confidence Intervals: What the Range Actually Means
The confidence interval is the most important number on the results page and the most misunderstood.
Optimizely shows confidence intervals around the improvement — not around the raw CVR. An interval of "+3% to +18%" means that, based on the data collected, the true improvement of the variation over baseline is somewhere in that range, with the stated confidence level.
Key points:
- When the interval is entirely above zero, the variation is a winner
- When the interval is entirely below zero, the variation is a loser
- When the interval crosses zero, the result is inconclusive — it's not a tie, it's uncertainty
What overlapping confidence intervals do NOT mean: Many practitioners think that if the confidence intervals of two bars overlap on the chart, the result is not significant. This is wrong. Confidence intervals around individual means can overlap even when the difference between them is significant. What matters is whether the confidence interval of the difference crosses zero.
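The overlap fallacy is easy to demonstrate numerically. The sketch below (illustrative counts; simple Wald intervals rather than Optimizely's sequential intervals) constructs a case where the two individual 95% intervals overlap while the interval on the difference sits entirely above zero:

```python
import math

Z95 = 1.96  # two-sided 95% critical value

def wald_ci(conv, n):
    """95% Wald confidence interval for a single conversion rate."""
    p = conv / n
    se = math.sqrt(p * (1 - p) / n)
    return p - Z95 * se, p + Z95 * se

def diff_ci(conv_a, n_a, conv_b, n_b):
    """95% CI for the difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    d = p_b - p_a
    return d - Z95 * se, d + Z95 * se

# Illustrative counts: 400/20,000 (2.00%) vs. 470/20,000 (2.35%)
ci_a = wald_ci(400, 20_000)   # control
ci_b = wald_ci(470, 20_000)   # variation
ci_d = diff_ci(400, 20_000, 470, 20_000)

overlap = ci_a[1] > ci_b[0]   # the individual intervals overlap...
significant = ci_d[0] > 0     # ...yet the difference excludes zero
```

Both flags come out true: the bars on a chart would overlap, and the difference is still significant. Always read the interval on the difference.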
Statistical Significance in Optimizely's Stats Engine
Optimizely uses Stats Engine, which is based on sequential testing rather than classical fixed-horizon statistics. This matters for how you interpret the significance badge.
In classical frequentist A/B testing, you must commit to a sample size before the test begins and only read results once at the end. Peeking at interim results inflates your false positive rate — if you peek 10 times at a test at 95% confidence, your effective confidence level drops to roughly 60%.
Stats Engine solves this with always-valid p-values. You can check your results at any time without increasing your Type I error rate. The tradeoff is that Stats Engine is more conservative early in a test — it requires more evidence before declaring a winner, because it needs to account for all the times you might check.
Practical implication: a result that says "significant" in Optimizely at 95% confidence genuinely means what it says, regardless of when you look. But a result that doesn't say significant yet might just need more data, not more tests.
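Stats Engine's guarantee is easiest to appreciate by simulating what goes wrong without it. This toy Monte Carlo (a sketch, not Optimizely's implementation; all parameters are illustrative) runs A/A tests, where every declared winner is by construction a false positive, and peeks at each interim look with a naive fixed-horizon z-test:

```python
import math
import random

def peeking_false_positive_rate(sims=1000, peeks=10, batch=150,
                                p=0.05, seed=7):
    """Fraction of A/A tests that a naive z-test, checked at every
    interim look, wrongly declares significant at 95% confidence."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 95% critical value
    false_positives = 0
    for _ in range(sims):
        conv_a = conv_b = n = 0
        for _ in range(peeks):
            # Both arms share the same true rate p: no real effect.
            conv_a += sum(rng.random() < p for _ in range(batch))
            conv_b += sum(rng.random() < p for _ in range(batch))
            n += batch
            pooled = (conv_a + conv_b) / (2 * n)
            if pooled in (0.0, 1.0):
                continue  # z-test undefined with zero variance
            se = math.sqrt(pooled * (1 - pooled) * (2 / n))
            if abs(conv_a - conv_b) / n / se > z_crit:
                false_positives += 1
                break  # the test was stopped on a false "winner"
    return false_positives / sims
```

The realized false positive rate lands well above the nominal 5%, which is exactly the inflation Stats Engine's always-valid p-values are built to prevent.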
**Pro Tip:** "Significant" and "winner" are not synonyms in Optimizely's language. A variation can be statistically significant *and* a loser — it just means you have high confidence it loses. "Winner" requires significance *and* positive improvement.
The "Headline Number" Trap
The improvement percentage at the top of the variation column is the first thing your eye goes to. It's also the last thing you should evaluate.
The correct reading order is:
1. Visitor counts (SRM check)
2. Conversion counts (implementation check)
3. Confidence intervals (are they above/below zero?)
4. Statistical significance status
5. Improvement percentage (only meaningful once 1-4 check out)
Teams that lead with the improvement number make two common mistakes: they ship underpowered tests that showed a large lift early (which regresses to the mean after shipping), and they kill tests showing a small improvement while the interval still crosses zero (effects that may have turned out real if the test had run longer).
Secondary Metrics: What to Look For
The primary metric drives your significance call. Secondary metrics tell you if you're breaking anything.
Look at secondary metrics for:
- Directional consistency: If your primary metric (add-to-cart) improves but revenue per visitor drops, the variation may be attracting low-value conversions
- Guardrail metrics: Bounce rate, pages per session, error rate — these tell you if the variation is creating a worse experience that your primary metric doesn't capture
- Downstream metrics: If you're testing a checkout variation, also check order completion rate, not just cart adds
**Pro Tip:** A result where your primary metric wins but two secondary metrics show meaningful negative trends is not a win. Set guardrail thresholds before the test launches — if any guardrail degrades by more than X%, you pause the rollout regardless of primary metric performance.
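One way to make the "set thresholds before launch" rule operational is a small pre-registered guardrail check. The sketch below assumes every metric passed in is higher-is-better (invert bounce-rate-style metrics before passing them), and all metric names and numbers are illustrative:

```python
def check_guardrails(baseline, variation, thresholds):
    """Flag guardrail metrics that degrade past a pre-registered limit.

    thresholds maps metric name to the maximum tolerated relative
    degradation (0.05 means a 5% drop pauses the rollout).
    Returns a list of (metric, relative_change) breaches.
    """
    breaches = []
    for metric, max_drop in thresholds.items():
        base, var = baseline[metric], variation[metric]
        rel_change = (var - base) / base
        if rel_change < -max_drop:
            breaches.append((metric, rel_change))
    return breaches

breaches = check_guardrails(
    baseline={"revenue_per_visitor": 3.20, "pages_per_session": 4.1},
    variation={"revenue_per_visitor": 2.95, "pages_per_session": 4.2},
    thresholds={"revenue_per_visitor": 0.05, "pages_per_session": 0.10},
)
# revenue_per_visitor dropped ~7.8%, breaching its 5% guardrail
```

The point is not the code but the discipline: the thresholds dictionary is written before launch, so the pause decision is mechanical rather than negotiated after the fact.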
How to Spot Implementation Bugs From the Results Page
You don't always need to re-QA your implementation to find bugs. The results page shows you most of them.
Flat conversion count in variation: Your variation loads but your conversion event isn't firing. Check whether the variation's JavaScript is interfering with form submission or checkout flow.
Conversion rate much higher than historical: Your event is firing multiple times per visitor. Look for cases where the event fires on page load rather than on user action.
Visitor count diverges sharply over time: Your targeting condition is changing mid-test — for example, a cookie that expires, or an audience based on a UTM parameter that stops appearing in new traffic.
Conversion rate drops to zero on a specific day: A code push broke your variation or your conversion event. Check your deployment history against the results timeline.
Common Mistakes
Calling the test too early. Even with Stats Engine, a test that hits significance after 2 days will often show a different result at Day 14. Business cycles matter — you need at least one full week, ideally two, to account for weekday/weekend variation.
Ignoring the confidence interval width. A result that's significant but has a wide interval (say, +2% to +40%) means you're very uncertain about the magnitude of the effect. You know it wins; you don't know by how much. Don't build business projections on the top end of a wide interval.
Treating "not significant" as "no effect." An inconclusive result means you don't have enough data to be confident, not that the variation has no effect. A 10% improvement that's not significant yet might become significant with two more weeks of data.
Comparing raw CVRs instead of confidence intervals. Variation has 2.4% CVR, control has 2.1% CVR. That looks like a +14% improvement. But if you have 2,000 visitors in each group, those CVRs could easily flip by next week.
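The instability at 2,000 visitors per arm can be made concrete with a quick resampling sketch (illustrative, stdlib only): treat the observed 2.1% and 2.4% as the true rates and redraw fresh samples of the same size.

```python
import random

def flip_probability(p_control=0.021, p_variation=0.024,
                     n=2000, sims=1500, seed=11):
    """Redraw samples of the same size from the observed rates and
    count how often the variation comes out level with or behind
    the control."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(sims):
        conv_c = sum(rng.random() < p_control for _ in range(n))
        conv_v = sum(rng.random() < p_variation for _ in range(n))
        flips += conv_v <= conv_c
    return flips / sims
```

Even with the variation genuinely 14% better by construction, a substantial fraction of replays show it level with or behind control — which is exactly why the raw CVR comparison at this sample size tells you almost nothing.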
Reading segmented results as if they have experiment-level significance. Optimizely's Stats Engine significance calls apply to the overall result, not to segments. See the segmentation guide for how to handle segment analysis correctly.
What to Do Next
- Open your current running experiment results page right now and check the visitor count ratio — run the SRM check before anything else.
- Find your secondary metrics column and confirm your guardrail metrics are not degrading.
- Before your next results readout with stakeholders, build a one-slide summary that leads with the confidence interval, not the improvement percentage.
- Read the companion article on when to stop your A/B test — significance alone is not a stopping criterion.