The question I get more than any other from teams setting up Optimizely for the first time: "Should we use Bayesian or Frequentist?"
It's the right question to ask. The answer changes your UI, your interpretation, your stopping rules, and how you communicate results to stakeholders. Get it wrong and you'll either run tests for too long, call winners too early, or confuse your entire leadership team.
Here's what actually changes between the approaches — and a decision framework for picking the right one for your program.
What Optimizely Actually Offers
First, a clarification most guides skip: Optimizely gives you three statistical engines, not two.
Sequential (Stats Engine) — This is Optimizely's default and proprietary approach. It's built on sequential probability ratio testing with false discovery rate control. Despite common misconceptions, it's closer to frequentist than Bayesian in its mathematical foundations, but it was specifically designed to allow continuous monitoring (the thing classical frequentist testing forbids). When you haven't changed anything, this is what you're running.
Frequentist (Fixed Horizon) — A traditional null hypothesis significance testing (NHST) approach. Requires you to pre-specify your sample size. Produces a p-value and a statistical significance percentage. Results aren't statistically valid until the full pre-specified sample has been collected. This is what your stats professor taught you.
Bayesian — Updates probability estimates as data accumulates. Reports "probability that variant beats control" rather than p-values. No fixed sample size required. Can stop early if one variant is clearly winning or losing.
This article compares all three so you can make a real decision.
Frequentist Fixed Horizon: The Rigorous Option
How it works
You set your sample size before the test starts. You run until you hit that number. You analyze once. The output is a p-value and a statement like "statistically significant at 95%."
The math assumes you've committed to analyzing your data exactly once, at a predetermined endpoint. Checking results midway and stopping early invalidates the statistical guarantees.
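The damage from peeking is easy to see in simulation. Here's a minimal sketch (plain Python, with illustrative traffic numbers of my own choosing, not Optimizely's machinery): run A/A tests where both arms share the same true conversion rate, so every declared winner is a false positive, and compare a single final analysis against ten interim checks.

```python
import random
from statistics import NormalDist

def z_significant(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-proportion pooled z-test: True if the difference looks 'significant'."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return False
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (conv_a / n_a - conv_b / n_b) / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def aa_false_positive_rates(sims=1000, n_per_arm=1000, peeks=10, rate=0.05):
    """Both arms convert at the same true rate, so every 'winner' is a false positive."""
    random.seed(7)
    step = n_per_arm // peeks
    peeking_fp = final_fp = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        peeked_winner = False
        for i in range(1, n_per_arm + 1):
            conv_a += random.random() < rate
            conv_b += random.random() < rate
            if i % step == 0 and z_significant(conv_a, i, conv_b, i):
                peeked_winner = True  # would have stopped and called a winner here
        peeking_fp += peeked_winner
        final_fp += z_significant(conv_a, n_per_arm, conv_b, n_per_arm)
    return peeking_fp / sims, final_fp / sims

peeking, single_look = aa_false_positive_rates()
print(f"10 interim checks: {peeking:.1%} false positives")
print(f"single final analysis: {single_look:.1%} false positives")
```

With these settings the single-look rate stays near the nominal 5%, while checking ten times along the way typically multiplies it several times over.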
What changes in Optimizely
When you select Frequentist (Fixed Horizon) in the Stats Configuration dropdown, Optimizely's sample size calculator becomes your primary tool. You input:
- Metric type (conversion or numeric)
- Baseline metric value (your current conversion rate)
- Minimum Detectable Effect (the smallest lift you want to detect)
- Statistical significance level (typically 90% or 95%)
- Number of variations (including control)
The calculator outputs visitors needed per variation. You copy that into the Sample Size per Variation field. The experiment becomes locked to that endpoint. You can optionally add a minimum duration in days if you want to ensure at least one full business cycle.
The results page shows: statistical significance percentage, p-value, and confidence intervals — the classic frequentist output.
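If you want to sanity-check the calculator, the classic two-proportion sample size formula is a reasonable approximation. Here's a sketch in plain Python (Optimizely's internal formula may differ, for example in how it adjusts for multiple variations):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variation(baseline, mde_relative, alpha=0.05, power=0.80):
    """Visitors needed per variation for a two-sided two-proportion test.

    baseline:      current conversion rate, e.g. 0.05 for 5%
    mde_relative:  smallest relative lift worth detecting, e.g. 0.10 for +10%
    """
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for 95% significance
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 5% baseline, +10% relative MDE, 95% significance, 80% power
print(sample_size_per_variation(0.05, 0.10))  # about 31,000 visitors per variation
```

Note the quadratic sensitivity: halving your MDE roughly quadruples the required sample, which is why an ambitious MDE is the most common cause of tests that never finish.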
When frequentist is the right call
- Regulated industries (pharma, financial services, healthcare) where you need a defensible methodology that outside auditors will recognize
- Compliance and legal contexts where the statistical method may need to be documented and defended
- Academic or rigorous internal research where you want to publish or reference results
- High-stakes irreversible decisions where you want the strictest false positive control and are willing to wait for the required sample size
- Teams with strong statistical expertise who understand NHST deeply and have discipline around pre-registration
**Pro Tip:** If you're running frequentist tests, treat your sample size as a commitment, not a guideline. The moment you start peeking at results and considering early stops, you've compromised your Type I error rate. Either use Fixed Horizon with iron discipline, or switch to Stats Engine which is built for continuous monitoring.
Bayesian: The Intuitive, Fast Option
How it works
Bayesian testing asks a different question than frequentist. Instead of "how surprised would I be if there were no effect?", it asks "given the data I've collected, what's the probability that variant A beats control?"
That probability — called the "chance to beat" or posterior probability — updates continuously as visitors enter your test. When you've seen enough data, the probability stabilizes. You can stop when it's high enough.
Selecting the Bayesian option replaces Optimizely's default Stats Engine with a Bayesian framework. The output is a probability: "87% chance Variant A beats control."
What changes in Optimizely
Configuration is simpler than for Frequentist. In the Stats Configuration dropdown, select "Bayesian." You can optionally adjust the "Chance to beat probability threshold": the minimum probability required before Optimizely marks a variant as a winner. The lowest allowed threshold is 70%; 95% is a common choice.
No sample size pre-specification is required. You monitor the probability and stop when it crosses your threshold — or when the probability stabilizes at a level that tells you the effect is smaller than your MDE.
Results show: probability that each variant beats control, and the estimated lift with credible intervals (Bayesian's version of confidence intervals).
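For conversion metrics, a "chance to beat" is conventionally computed from Beta posteriors. Optimizely doesn't publish its exact implementation, but a standard Beta-Binomial sketch with flat priors (and made-up traffic numbers) shows the idea:

```python
import random

def chance_to_beat(conv_ctrl, n_ctrl, conv_var, n_var, draws=100_000, seed=0):
    """Monte Carlo estimate of P(variant rate > control rate).

    Uses independent Beta(1, 1) (flat) priors, so each posterior is
    Beta(conversions + 1, non-conversions + 1).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_ctrl = rng.betavariate(conv_ctrl + 1, n_ctrl - conv_ctrl + 1)
        p_var = rng.betavariate(conv_var + 1, n_var - conv_var + 1)
        wins += p_var > p_ctrl
    return wins / draws

# Control: 50/1000 (5.0%). Variant: 65/1000 (6.5%). Illustrative numbers only.
print(f"{chance_to_beat(50, 1000, 65, 1000):.0%} chance variant beats control")
```

With these numbers the estimate lands around 92%: strong evidence, but not the same statement as 95% frequentist confidence.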
When Bayesian is the right call
- Fast-paced product teams where shipping speed matters more than statistical precision
- High-traffic pages where you accumulate data quickly and want to make decisions faster
- Stakeholder communication — "there's a 91% chance this version is better" is far more intuitive to executives than "p = 0.04 at 95% confidence"
- Exploratory testing phases where you're learning rather than making high-stakes decisions
- Teams without deep statistical training who need to act on results without getting lost in p-value interpretation
**Pro Tip:** Bayesian probability thresholds and frequentist significance levels are not equivalent. A 95% Bayesian "chance to beat" is not the same as 95% frequentist confidence. Bayesian 95% means "given the data, there's a 95% probability the variant is better." Frequentist 95% means "if there were truly no effect, we'd see a result at least this extreme only 5% of the time." Don't mix the interpretations.
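A concrete example makes the gap visible. The sketch below runs both readings on the same hypothetical data, using normal approximations throughout (a shortcut for the full Beta posterior): the variant shows roughly a 92% chance to beat, yet the two-sided p-value comes out around 0.15, nowhere near significance at 95%.

```python
from statistics import NormalDist

norm = NormalDist()

def both_readings(conv_a, n_a, conv_b, n_b):
    """Same data, two readings: approximate chance to beat vs two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Bayesian-style reading (normal approximation to the posterior difference)
    se_unpooled = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    chance_to_beat = norm.cdf((p_b - p_a) / se_unpooled)
    # Frequentist reading (pooled two-proportion z-test)
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se_pooled
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return chance_to_beat, p_value

ctb, p = both_readings(50, 1000, 65, 1000)
print(f"chance to beat: {ctb:.0%}, p-value: {p:.2f}")  # high chance to beat, yet p well above 0.05
```

The flip side: under flat priors the chance to beat roughly equals one minus the one-sided p-value, which is exactly why the two numbers get conflated despite meaning different things.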
Stats Engine (Sequential): The Default for Most Teams
Optimizely built Stats Engine as a middle path — and for most experimentation programs, it's the right default.
Stats Engine is mathematically a sequential test (frequentist-adjacent), but it's designed to support continuous monitoring without inflating false positive rates. It uses false discovery rate (FDR) control across your experiment's metrics, which reduces the chance of calling a winner on noise when you're tracking multiple metrics simultaneously.
The key practical advantage: you get valid real-time results. You can check the dashboard every day. You can share results in weekly readouts. You can stop early if the evidence is overwhelming. The statistical guarantees hold throughout.
For most CRO teams — especially those without dedicated statisticians — Stats Engine provides the best balance of rigor and flexibility.
The Practical Tradeoffs: A Direct Comparison
Speed to decision: Bayesian > Stats Engine > Frequentist
Bayesian can reach its probability threshold faster, especially when effects are large. Frequentist requires you to collect the full pre-specified sample regardless of how clear the signal is. Stats Engine falls in the middle — it allows early stopping but maintains stricter false positive control than Bayesian.
False positive control: Frequentist (if discipline is maintained) ~ Stats Engine > Bayesian
Classical frequentist gives you exact Type I error control — if you don't peek. Stats Engine provides valid continuous monitoring with FDR control. Bayesian is more flexible but can produce more false positives if teams stop tests too early at low probability thresholds.
Ease of interpretation: Bayesian > Stats Engine > Frequentist
"91% probability the variant is better" needs no explanation. "Statistically significant at 95% confidence" requires unpacking. P-values require a lecture.
Stakeholder communication: Bayesian wins here, consistently.
Required statistical sophistication: Frequentist > Stats Engine > Bayesian
Frequentist demands the most discipline — pre-registration, no peeking, strict adherence to sample size. Stats Engine is more forgiving. Bayesian is the most accessible.
**Pro Tip:** Match your statistical engine to your team's maturity, not to what sounds most rigorous. A team that uses frequentist but peeks at results daily has worse statistical properties than a team running Bayesian with an 85% threshold and genuine discipline around stopping rules.
Decision Table: Which Engine for Your Situation
Use Frequentist (Fixed Horizon) when:
- You're in a regulated industry or compliance context
- You need to document methodology for external review
- Your team has strong statistical training and will not peek at results
- You're running high-stakes tests where you want the tightest false positive control
- Your experiment has a single primary metric
Use Bayesian when:
- Your team needs fast decisions and shipping velocity is a priority
- Stakeholders need intuitive probability statements, not p-values
- You're running exploratory or iterative tests where speed of learning matters more than precision
- Your test is lower stakes (copy, UI, layout) and you're comfortable with slightly higher false positive risk
- You want to stop early when results are clearly going one way
Use Stats Engine (default) when:
- You don't have a specific reason to deviate from the default
- Your team checks results continuously and needs valid real-time inference
- You're tracking multiple metrics and need FDR protection
- You want the best balance of flexibility and rigor without committing to strict pre-registration
**Pro Tip:** You can change the statistical method on a test-by-test basis in Optimizely. Don't feel like you need to pick one approach for your entire program. Run frequentist for your pricing test, Bayesian for your hero section copy test, and Stats Engine for your standard feature experiments.
What Optimizely Recommends
Optimizely's default is Stats Engine, and that's deliberate. The company designed it specifically to solve the two most common failures in classical A/B testing programs: peeking and multi-metric false discovery inflation.
They've subsequently added Frequentist and Bayesian as explicit options because enterprise customers with compliance requirements or specific methodological preferences need them. But for a team starting from scratch, Stats Engine's defaults are solid.
Common Mistakes
Mistake 1: Switching methods mid-test. Don't change your statistical engine after a test starts. The analysis assumes a consistent method from the first visitor.
Mistake 2: Using frequentist but monitoring results daily. This is the worst of both worlds — you get the rigidity of frequentist (must pre-specify sample size) without the validity protection (you're peeking). Either commit to no peeking or switch to Stats Engine.
Mistake 3: Setting Bayesian probability thresholds too low. A 70% chance to beat still leaves a 30% chance you're shipping a variant that's no better than control. If you're making product decisions at 75% Bayesian probability, you're shipping losers regularly. Most teams should not go below 90%.
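To see why low thresholds bite, simulate A/A tests (no real difference) where you ship the variant the first time its chance to beat crosses your threshold. This sketch uses a normal approximation to the chance to beat and illustrative traffic numbers, so the exact rates are not Optimizely's:

```python
import random
from statistics import NormalDist

norm = NormalDist()

def ship_rate(threshold, sims=300, n_per_arm=1000, checks=10, rate=0.05, seed=11):
    """Fraction of A/A tests (identical arms) that cross the threshold at any
    interim check, i.e. the rate of shipping a fake winner."""
    rng = random.Random(seed)
    step = n_per_arm // checks
    shipped = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        for i in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if i % step == 0:
                p_a, p_b = conv_a / i, conv_b / i
                se = (p_a * (1 - p_a) / i + p_b * (1 - p_b) / i) ** 0.5
                if se > 0 and norm.cdf((p_b - p_a) / se) >= threshold:
                    shipped += 1  # would have declared a winner and stopped
                    break
    return shipped / sims

for t in (0.70, 0.90, 0.95):
    print(f"threshold {t:.0%}: shipped a fake winner in {ship_rate(t):.0%} of A/A tests")
```

The exact rates depend on traffic and check frequency, but the ordering is robust: the lower the threshold, the more fake winners you ship.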
Mistake 4: Conflating the three methods' outputs. A 95% confidence Frequentist result, a 95% chance-to-beat Bayesian result, and a 95% Stats Engine result are not the same thing and should not be communicated interchangeably.
Mistake 5: Not documenting the chosen method in your experiment plan. When you revisit an experiment six months later, you need to know what statistical framework was used to interpret the results correctly. Document it.
What to Do Next
- Audit your current default. Open Optimizely's Stats Configuration on your next experiment. Confirm you know which engine is selected and whether it's appropriate for the test stakes.
- Define team standards. Write a one-page policy: which engine for which test type, what thresholds, who can override. This prevents ad hoc methodology choices that produce incomparable results across your program.
- Train stakeholders on your chosen output format. If you're running Bayesian, teach your leadership team to read probability statements. If you're running Stats Engine, give them the "how to interpret a confidence interval" two-minute briefing. Consistent interpretation across the org matters more than the specific method.
- Run a retrospective on your last 10 called experiments. What method was used? Was it appropriate for the stakes? Were results interpreted correctly? This audit often reveals systematic misinterpretation that's been costing you decision quality for months.