Optimizely's official docs will teach you how to click buttons. They won't teach you how to run experiments. That's what this toolkit is for.
After 7+ years managing CRO programs and running 100+ experiments — including work that drove $30M+ in revenue impact at NRG Energy — I've seen the same mistakes repeated across teams at every level. Analysts who stop tests too early because the numbers look good. PMs who don't know what statistical significance actually means. CRO managers who build roadmaps nobody follows.
This toolkit exists to close that gap. Every article is written for practitioners — people who are in Optimizely every day and need answers that go beyond the official documentation.
Who This Is For
This toolkit is useful if you're any of the following:
- An analyst setting up your first A/B test and trying to figure out targeting, traffic allocation, and metrics
- A product manager who needs to understand what you're looking at on the results page — and whether to ship or not
- A CRO manager building a testing program from scratch and trying to make it defensible and scalable
- Anyone who's been using Optimizely for months but still feels uncertain about statistical significance, when to call a test, or why your results keep changing
If you've ever stopped a test at 60% significance because it looked like it was winning, this is for you.
Start Here: Which Path Fits You?
Not everyone needs to read everything. Here's where to start based on where you are.
If you're brand new to Optimizely
Start with the foundations. Before you launch a real experiment, you need to understand what types of tests exist, how long to run them, and whether your setup is actually working.
- Which test type to use: A/B vs. multivariate vs. multipage
- How long to run an A/B test
- How to run an A/A test to validate your setup
- 10 high-impact experiments to launch first
If you're mid-experiment and confused
You launched a test, but something doesn't look right. Results are fluctuating, significance keeps changing, or you're not sure when to stop.
- What statistical significance actually means
- Why your results keep changing
- How to read the Optimizely results page
- When to stop an A/B test
If you're scaling a CRO program
You're past the basics and trying to build something repeatable — a roadmap that survives contact with stakeholders, hypotheses that generate real learning, and results that drive decisions.
- How to build an experimentation roadmap
- How to write A/B test hypotheses
- What to measure based on your revenue model
- How to share results with stakeholders
The 5 Most Common Mistakes Optimizely Users Make
These are the mistakes I see most often — not from beginners, but from people who've been running tests for a year and still haven't fixed them.
1. Stopping tests early when they look like they're winning
A test hits 85% statistical significance on day 4 and the variation is up 12%. You stop it, ship the change, and declare victory. Six months later you realize the revenue impact never materialized.
This is p-hacking. When you stop a test early based on interim results, you're inflating false positives. The 95% confidence threshold only means what it says if you wait until your predetermined sample size is reached. Read when to stop an A/B test before you call your next winner.
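You can see the inflation directly with a quick simulation. The sketch below (Python, with made-up traffic numbers) runs thousands of A/A tests where both variations are identical, using a plain fixed-horizon two-proportion z-test rather than Optimizely's Stats Engine. Checking daily and stopping on the first "win" multiplies the false positive rate several times over.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * norm.sf(abs(z))

def aa_test(days=28, daily_visitors=500, rate=0.05, peek_daily=True):
    """Run one simulated A/A test; return True if it falsely 'wins'."""
    conv = [0, 0]
    n = [0, 0]
    for _ in range(days):
        for i in range(2):
            n[i] += daily_visitors
            conv[i] += rng.binomial(daily_visitors, rate)
        if peek_daily and p_value(conv[0], n[0], conv[1], n[1]) < 0.05:
            return True  # stopped early on an interim "win"
    # Fixed horizon: check exactly once, at the predetermined end.
    return p_value(conv[0], n[0], conv[1], n[1]) < 0.05

sims = 2000
peeking = sum(aa_test(peek_daily=True) for _ in range(sims)) / sims
fixed = sum(aa_test(peek_daily=False) for _ in range(sims)) / sims
print(f"false positives, checking daily and stopping on a win: {peeking:.1%}")
print(f"false positives, waiting for the planned sample size:  {fixed:.1%}")
```

With 28 daily looks, the peeking strategy typically declares a "winner" in roughly a fifth of tests where nothing changed at all, against the roughly 5% you signed up for.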
2. Changing a running experiment
Adjusting traffic allocation mid-test, editing the variation code, changing the goal metric after launch — all of these invalidate your data. Even minor changes break the statistical assumptions your test is built on. The full explanation is here, but the rule is simple: if the experiment is live, don't touch it.
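To make that concrete, here's a sketch with invented numbers showing what a mid-test traffic reallocation can do entirely on its own. Both experiences are identical, conversion dips in week two, and the only change is the split. The result is a large phantom loss, a textbook Simpson's paradox; expected conversion counts are used instead of random draws to keep the arithmetic visible.

```python
# Two identical experiences (true lift = 0). Conversion dips in week 2
# (seasonality, a promo ending, anything). Allocation is changed mid-test
# from 50/50 to 10/90.
weeks = [
    # (conversion_rate, control_visitors, variation_visitors)
    (0.06, 5_000, 5_000),  # week 1: 50/50 split
    (0.03, 1_000, 9_000),  # week 2: reallocated to 10/90
]

n = {"control": 0, "variation": 0}
conv = {"control": 0.0, "variation": 0.0}
for rate, n_control, n_variation in weeks:
    n["control"] += n_control
    n["variation"] += n_variation
    conv["control"] += rate * n_control
    conv["variation"] += rate * n_variation

rate_c = conv["control"] / n["control"]      # 5.50%
rate_v = conv["variation"] / n["variation"]  # 4.07%
print(f"control:   {rate_c:.2%}")
print(f"variation: {rate_v:.2%}  ({rate_v / rate_c - 1:+.0%} phantom 'lift')")
```

The variation's traffic is weighted toward the bad week, the control's toward the good one, and a nonexistent difference shows up as a 26% loss.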
3. Setting MDE too low and waiting forever
Your minimum detectable effect is the smallest lift worth detecting. If you set it to 0.5% on a page that gets 2,000 visits a month, you'll need to run that test for years. Most teams don't think about MDE until their test has been running for three months with no significance — then they realize they never had enough power. Here's how to set MDE correctly.
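For a back-of-the-envelope feel for the relationship, here's the standard normal-approximation sample size formula in Python. The 3% baseline conversion rate is an assumption, and Optimizely's sequential statistics differ in the details, but the order of magnitude is the lesson.

```python
import math
from statistics import NormalDist

def visitors_per_variant(baseline, mde_relative, alpha=0.05, power=0.80):
    """Normal-approximation sample size for detecting a relative lift."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 95% confidence
    z_b = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)

# The page from the example: 2,000 visits/month, assume a 3% baseline rate.
monthly_visits = 2_000
for mde in (0.05, 0.10, 0.25):  # 5%, 10%, 25% relative MDE
    n = visitors_per_variant(0.03, mde)
    months = 2 * n / monthly_visits  # two variants split the traffic
    print(f"MDE {mde:.0%}: {n:,} visitors per variant ≈ {months:,.0f} months")
```

Under these assumptions, even a 25% relative MDE needs around nine months on that page, and a 0.5% MDE is simply out of reach. Sample size scales roughly with the inverse square of the effect you're trying to detect, which is why MDE has to be decided before launch, not discovered afterward.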
4. Using URL targeting when you need audience targeting
URL targeting fires based on what page the visitor is on. Audience targeting fires based on who the visitor is. These are not interchangeable. If you're targeting returning customers, mobile users, or users in a specific funnel stage, you need audience conditions — URL targeting will include everyone and dilute your results. The full breakdown is here.
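The dilution is easy to quantify. A sketch with invented numbers: suppose the change only matters to returning customers, who are 30% of the page's traffic, and URL targeting puts everyone into the test.

```python
# Hypothetical page: 30% of visitors are returning customers (8% conversion),
# 70% are everyone else (4% conversion). The change lifts returning
# customers by +10% and does nothing for anyone else.
share = 0.30
rate_returning, rate_others = 0.08, 0.04
lift_returning = 0.10

# Audience targeting: only returning customers enter, so you measure +10%.
# URL targeting: everyone enters, and the effect is averaged away.
base = share * rate_returning + (1 - share) * rate_others
new = share * rate_returning * (1 + lift_returning) + (1 - share) * rate_others
print(f"observed lift with URL targeting: {new / base - 1:+.2%}")  # ~ +4.6%
```

Since required sample size scales roughly with 1/lift², cutting the observed lift in half roughly quadruples the traffic you need before anything reaches significance.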
5. Ignoring segment analysis and reading only top-line results
A test that shows no overall lift might be hiding a +18% lift on mobile and a 9% drop on desktop. If you read only the top-line number, you ship nothing and learn nothing. Segment analysis is where most of the actual insight lives. Here's how to segment Optimizely results correctly.
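Here's that exact failure mode as a sketch with made-up counts: the top-line is essentially flat while the two segments move hard in opposite directions.

```python
# Invented results: (visitors, conversions) per variation, per segment.
results = {
    "mobile":  {"control": (8_000, 320),  "variation": (8_000, 378)},
    "desktop": {"control": (12_000, 720), "variation": (12_000, 655)},
}

def lift(control, variation):
    (n_c, c_c), (n_v, c_v) = control, variation
    return (c_v / n_v) / (c_c / n_c) - 1

def pooled(variant):
    visitors = sum(results[s][variant][0] for s in results)
    conversions = sum(results[s][variant][1] for s in results)
    return visitors, conversions

print(f"top-line: {lift(pooled('control'), pooled('variation')):+.1%}")  # -0.7%
for segment, cells in results.items():
    print(f"{segment:>8}: {lift(cells['control'], cells['variation']):+.1%}")
# mobile +18.1%, desktop -9.0% -- the flat top-line hides both.
```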
Pro Tip: Before any test goes live, write down your stopping criteria: the minimum sample size, the maximum runtime, and which segments you'll check. If you decide after launch, you're rationalizing, not analyzing.
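One lightweight way to make that binding is to write the plan down as data, not prose. A sketch, with placeholder numbers and segment names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoppingPlan:
    """Stopping criteria pre-registered before the test goes live."""
    min_visitors_per_variant: int
    max_runtime_days: int
    segments_to_check: tuple[str, ...]  # chosen now, not after peeking

    def may_stop(self, visitors_per_variant: int, days_running: int) -> bool:
        """Only two valid reasons to stop: planned sample size or time cap."""
        return (visitors_per_variant >= self.min_visitors_per_variant
                or days_running >= self.max_runtime_days)

# Placeholder plan for a hypothetical checkout test.
plan = StoppingPlan(
    min_visitors_per_variant=9_100,
    max_runtime_days=42,
    segments_to_check=("device", "new_vs_returning"),
)
print(plan.may_stop(visitors_per_variant=4_200, days_running=12))  # False: keep going
```

Freezing the dataclass is the point: once the test is live, the plan can't be quietly edited to fit the results.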
The Full Toolkit Index
All 24 articles, organized by cluster.
Cluster 1: Experiment Foundations
- How long to run an A/B test — Calculate runtime based on your traffic volume and minimum detectable effect
- Why you should never change a running experiment — What mid-test changes actually do to your data and why it's never worth it
- A/B vs. multivariate vs. multipage testing — When to use each test type and what traffic requirements they carry
- 10 high-impact A/B tests to launch — A curated list of experiments with tight hypotheses and realistic benchmarks
- How to run an A/A test — Validate your Optimizely setup before you trust any real experiment data
Cluster 2: Targeting
- Audience conditions targeting guide — How audience conditions work in practice and common configurations that actually matter
- URL targeting vs. audience targeting — The difference between the two, when each applies, and the mistake that dilutes results
Cluster 3: Statistics
- Statistical significance explained — What it means, what it doesn't mean, and why 95% is not a guarantee
- Bayesian vs. frequentist testing in Optimizely — How the two approaches differ and which to use for your program
- Minimum detectable effect guide — How to set MDE so your tests are powered to detect lifts that actually matter
- Why your experiment isn't reaching significance — Five causes and the fix for each
- False discovery rate and multiple testing — Why running many tests simultaneously inflates false positives — and how to control it
- Why results keep changing — The difference between normal fluctuation, novelty effect, and actual trend drift
Cluster 4: Results and Analysis
- Optimizely results page walkthrough — How to read every element of the results page, including what most people skip
- Segmenting A/B test results — How to find the real insights that top-line numbers hide
- When to stop an A/B test — Stopping rules that protect your data from peeking and early calls
- Optimizely vs. Google Analytics data discrepancy — Why the numbers don't match and which source to trust for which decision
Cluster 5: Metrics
- Choosing your primary metric — The framework for selecting a metric that actually measures what your test is trying to move
- Ratio, revenue, and conversion metrics — Which metric type applies to which situation and why it matters for statistical validity
- How Optimizely counts conversions — The exact counting logic and how it affects your reported conversion rates
Cluster 6: Program Building
- Experimentation roadmap guide — How to build a roadmap that gets prioritized, resourced, and actually used
- Writing A/B test hypotheses — The hypothesis structure that generates learning whether the test wins or loses
- Metrics by revenue model — What to measure if you're e-commerce, SaaS, lead gen, or media
- Sharing results with stakeholders — How to communicate test results to people who don't know what a p-value is
Subscribe: Lean Experiments
I publish new practitioner guides like these regularly. If you found this useful, subscribe to Lean Experiments — my weekly newsletter on revenue-focused experimentation. No fluff, no vendor content, just the analysis and frameworks I use in actual CRO programs.