Optimizely's official docs will teach you how to click buttons. They won't teach you how to run experiments. That's what this toolkit is for.

After 7+ years managing CRO programs and running 100+ experiments — including work that drove $30M+ in revenue impact at NRG Energy — I've seen the same mistakes repeated across teams at every level. Analysts who stop tests too early because the numbers look good. PMs who don't know what statistical significance actually means. CRO managers who build roadmaps nobody follows.

This toolkit exists to close that gap. Every article is written for practitioners — people who are in Optimizely every day and need answers that go beyond the official documentation.

Who This Is For

This toolkit is useful if you're any of the following:

  • An analyst setting up your first A/B test and trying to figure out targeting, traffic allocation, and metrics
  • A product manager who needs to understand what you're looking at on the results page — and whether to ship or not
  • A CRO manager building a testing program from scratch and trying to make it defensible and scalable
  • Anyone who's been using Optimizely for months but still feels uncertain about statistical significance, when to call a test, or why your results keep changing

If you've ever stopped a test at 60% significance because it looked like it was winning, this is for you.

Start Here: Which Path Fits You?

Not everyone needs to read everything. Here's where to start based on where you are.

If you're brand new to Optimizely

Start with the foundations. Before you launch a real experiment, you need to understand what types of tests exist, how long to run them, and whether your setup is actually working.

  1. Which test type to use: A/B vs. multivariate vs. multipage
  2. How long to run an A/B test
  3. How to run an A/A test to validate your setup
  4. 10 high-impact experiments to launch first

If you're mid-experiment and confused

You launched a test, but something doesn't look right. Results are fluctuating, significance keeps changing, or you're not sure when to stop.

  1. What statistical significance actually means
  2. Why your results keep changing
  3. How to read the Optimizely results page
  4. When to stop an A/B test

If you're scaling a CRO program

You're past the basics and trying to build something repeatable — a roadmap that survives contact with stakeholders, hypotheses that generate real learning, and results that drive decisions.

  1. How to build an experimentation roadmap
  2. How to write A/B test hypotheses
  3. What to measure based on your revenue model
  4. How to share results with stakeholders

The 5 Most Common Mistakes Optimizely Users Make

These are the mistakes I see most often — not from beginners, but from people who've been running tests for a year and still haven't fixed them.

1. Stopping tests early when they look like they're winning

A test hits 85% statistical significance on day 4 and the variation is up 12%. You stop it, ship the change, and declare victory. Six months later you realize the revenue impact never materialized.

This is p-hacking. When you stop a test early based on interim results, you're inflating false positives. The 95% confidence threshold only means what it says if you wait until your predetermined sample size is reached. Read when to stop an A/B test before you call your next winner.
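To see how much peeking inflates false positives, here is a small simulation — not Optimizely's stats engine, just a sketch of a classic fixed-horizon z-test under repeated interim checks. Both arms have the same true conversion rate, so every "winner" it finds is a false positive. All numbers (10% conversion rate, 10 interim checks, 2,000 visitors) are illustrative assumptions:

```python
import random

random.seed(42)

def peeking_false_positive_rate(n_tests=1000, n_visitors=2000,
                                checks=10, z_crit=1.96):
    """Simulate A/A tests (no real difference between arms) and count
    how often repeated interim checks cross the 95% threshold."""
    false_positives = 0
    check_every = n_visitors // checks
    for _ in range(n_tests):
        a_conv = b_conv = 0
        significant = False
        for i in range(1, n_visitors + 1):
            # Both arms share the same true 10% conversion rate.
            a_conv += random.random() < 0.10
            b_conv += random.random() < 0.10
            if i % check_every == 0:
                p_a, p_b = a_conv / i, b_conv / i
                pooled = (a_conv + b_conv) / (2 * i)
                se = (2 * pooled * (1 - pooled) / i) ** 0.5
                if se > 0 and abs(p_a - p_b) / se > z_crit:
                    significant = True  # would have stopped and shipped here
                    break
        false_positives += significant
    return false_positives / n_tests

print(peeking_false_positive_rate())  # well above the nominal 5%
```

With ten looks at the data, the realized false positive rate lands far above the 5% you thought you were buying — which is exactly why the stopping rule has to be set before launch.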

2. Changing a running experiment

Adjusting traffic allocation mid-test, editing the variation code, changing the goal metric after launch — all of these invalidate your data. Even minor changes break the statistical assumptions your test is built on. The full explanation is here, but the rule is simple: if the experiment is live, don't touch it.

3. Setting MDE too low and waiting forever

Your minimum detectable effect is the smallest lift worth detecting. If you set it to 0.5% on a page that gets 2,000 visits a month, you'll need to run that test for a year. Most teams don't think about MDE until their test has been running for three months with no significance — then they realize they never had enough power. Here's how to set MDE correctly.
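A back-of-envelope calculation makes the trade-off concrete. This sketch uses the standard two-proportion sample-size formula at 95% confidence and 80% power; Optimizely's sequential stats engine computes things differently, but the orders of magnitude hold. The 3% baseline rate is an assumed example:

```python
import math

def sample_size_per_variation(baseline_rate, mde_relative,
                              z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per variation (two-sided test,
    95% confidence, 80% power) to detect a relative lift of
    mde_relative over baseline_rate."""
    p = baseline_rate
    delta = p * mde_relative  # absolute lift worth detecting
    n = 2 * p * (1 - p) * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)

# A 0.5% relative MDE on a 3% baseline demands tens of millions of visitors...
print(sample_size_per_variation(0.03, 0.005))
# ...while a 10% relative MDE needs roughly 51,000 per variation.
print(sample_size_per_variation(0.03, 0.10))
```

At 2,000 visits a month, the first test never finishes. Run the numbers before launch, not in month three.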

4. Using URL targeting when you need audience targeting

URL targeting fires based on what page the visitor is on. Audience targeting fires based on who the visitor is. These are not interchangeable. If you're targeting returning customers, mobile users, or users in a specific funnel stage, you need audience conditions — URL targeting will include everyone and dilute your results. The full breakdown is here.

5. Ignoring segment analysis and reading only top-line results

A test that shows no overall lift might be hiding an 18% lift on mobile and a 9% drop on desktop that cancel out. If you read only the top line, you ship nothing and learn nothing. Segment analysis is where most of the actual insight lives. Here's how to segment Optimizely results correctly.
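Here is what that flat-overall, divergent-segments pattern looks like in code. The visitor and conversion counts are made up for illustration; in practice you would pull them from the Optimizely results page segmented by device:

```python
# Hypothetical per-segment (visitors, conversions) counts.
segments = {
    "mobile":  {"control": (12000, 360), "variation": (12100, 430)},
    "desktop": {"control": (9000, 450),  "variation": (8900, 405)},
}

def relative_lift(seg):
    """Relative lift of the variation's conversion rate over control's."""
    (cn, cc) = seg["control"]
    (vn, vc) = seg["variation"]
    c_rate, v_rate = cc / cn, vc / vn
    return (v_rate - c_rate) / c_rate

for name, seg in segments.items():
    print(f"{name}: {relative_lift(seg):+.1%}")
# mobile: +18.5%
# desktop: -9.0%
```

The top-line rates here are nearly identical, yet the segments tell opposite stories — ship the change on mobile only, and investigate what broke on desktop.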

Pro Tip: Before any test goes live, write down your stopping criteria: the minimum sample size, the maximum runtime, and which segments you'll check. If you decide after launch, you're rationalizing, not analyzing.
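The pro tip above can be captured as a record you commit before launch — a minimal sketch with hypothetical field names and values, whether you keep it in a doc, a ticket, or version control:

```python
# Pre-registered stopping criteria, written down BEFORE the test goes live.
# Every field here is an illustrative example, not an Optimizely setting.
stopping_criteria = {
    "experiment": "checkout-cta-copy",
    "min_sample_per_variation": 50_000,   # from your power calculation
    "max_runtime_days": 28,               # hard cap, even if underpowered
    "primary_metric": "order_conversion_rate",
    "segments_to_check": ["device", "new_vs_returning"],
}
```

If the results page tempts you to stop early, the record is the tiebreaker: either the criteria are met or they aren't.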

The Full Toolkit Index

All 24 articles, organized by cluster.

Cluster 1: Experiment Foundations

Cluster 2: Targeting

Cluster 3: Statistics

Cluster 4: Results and Analysis

Cluster 5: Metrics

Cluster 6: Program Building

Subscribe: Lean Experiments

I publish new practitioner guides like these regularly. If you found this useful, subscribe to Lean Experiments — my weekly newsletter on revenue-focused experimentation. No fluff, no vendor content, just the analysis and frameworks I use in actual CRO programs.

Written by Atticus Li

Revenue & experimentation leader — behavioral economics, CRO, and AI. CXL & Mindworx certified. $30M+ in verified impact.