Bayesian A/B Testing
An alternative to frequentist A/B testing that uses Bayes' theorem to compute direct probabilities that one variant is better than another.
What Is Bayesian A/B Testing?
Bayesian A/B testing replaces p-values and confidence intervals with posterior probabilities: "there's a 94% probability variant B has higher conversion than A, and the expected uplift is 3.2%." You combine a prior belief about the conversion rate with observed data via Bayes' theorem to produce a full posterior distribution for each variant. Decisions flow directly from the posterior: "ship if P(B > A) > 95%" or "ship if expected loss from choosing B < $500."
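For conversion rates, the update step is simple because the Beta distribution is conjugate to binomial data: add successes to the prior's first parameter and failures to the second. A minimal sketch (the prior and counts below are hypothetical, for illustration only):

```python
from scipy.stats import beta

# Hypothetical numbers, not from a real experiment.
prior_a, prior_b = 1, 1              # weakly informative Beta(1, 1) prior
conversions, visitors = 52, 1000     # one variant's observed data

# Conjugate update: Beta prior + binomial data -> Beta posterior
post = beta(prior_a + conversions, prior_b + visitors - conversions)

print(post.mean())        # posterior mean conversion rate
print(post.interval(0.95))  # 95% credible interval
```

The posterior object is a full distribution, so the same two lines of arithmetic give you means, credible intervals, and samples for any downstream decision rule.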
Also Known As
- Data science: Bayesian inference for experiments, posterior-based testing
- Growth: "94% to win" testing
- Marketing: probability-of-beating-control testing
- Engineering: Bayesian decision framework
How It Works
Suppose baseline conversion is 5%. You set a Beta(50, 950) prior (mean 5%, equivalent to about 1,000 prior observations). After 10,000 visitors per variant with 520 conversions in A and 580 in B, the posteriors are Beta(570, 10430) and Beta(630, 10370) — the prior's counts plus each variant's successes and failures. Draw 10,000 samples from each posterior and count the proportion where B > A: about 0.96. Expected relative lift: about 10.5%. Expected relative loss from choosing B if A were actually better: about 0.08%. Decide based on thresholds agreed before launch.
Bayesian methods also naturally handle continuous monitoring: the posterior always answers "given what I've seen, how likely is B better?" without multiplicity corrections.
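Continuous monitoring amounts to recomputing the posterior as data arrives and checking it against a pre-registered threshold. A sketch under assumed true rates and batch sizes (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
true_a, true_b = 0.050, 0.055   # assumed true rates, illustration only
ca = fa = cb = fb = 0           # running conversion / failure counts

# Re-check the posterior after every batch of traffic; no multiplicity
# correction is applied -- the posterior is simply recomputed.
for day in range(1, 31):
    batch = 500
    a_conv = rng.binomial(batch, true_a)
    b_conv = rng.binomial(batch, true_b)
    ca, fa = ca + a_conv, fa + batch - a_conv
    cb, fb = cb + b_conv, fb + batch - b_conv
    draws_a = rng.beta(1 + ca, 1 + fa, 50_000)  # Beta(1, 1) prior
    draws_b = rng.beta(1 + cb, 1 + fb, 50_000)
    p = (draws_b > draws_a).mean()
    if p > 0.95:  # pre-registered decision threshold
        print(f"day {day}: P(B > A) = {p:.3f} -> ship B")
        break
```

Note that repeated looks at a fixed probability threshold still inflate the chance of a wrong call relative to a single look, which is why the calibration check under Best Practices matters.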
Best Practices
- Pre-register decision rules — "ship if P(B>A) > 95% AND expected loss < X."
- Use weakly informative priors unless you have genuinely strong historical data.
- Report expected loss, not just probability to beat. A 96% P(B>A) with tiny expected uplift is often not worth shipping.
- Validate with simulation that your decision thresholds achieve acceptable false positive rates.
- Be transparent about priors. An overly optimistic prior biases results; reviewers should see the assumption.
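The simulation-based validation suggested above can be sketched as an A/A study: run many experiments where both arms share the same true rate and measure how often the decision rule would wrongly ship. The rates, sample sizes, and repetition count below are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def aa_experiment(p=0.05, visitors=10_000, threshold=0.95, draws=20_000):
    """One A/A test: both arms share the same true conversion rate p."""
    ca = rng.binomial(visitors, p)
    cb = rng.binomial(visitors, p)
    a = rng.beta(1 + ca, 1 + visitors - ca, draws)
    b = rng.beta(1 + cb, 1 + visitors - cb, draws)
    return (b > a).mean() > threshold  # would we (wrongly) ship B?

# Estimated false positive rate of the "P(B>A) > 95%" rule at one fixed look.
fp_rate = np.mean([aa_experiment() for _ in range(400)])
print(fp_rate)
```

This checks a single fixed-sample look; to validate a peeking policy, move the threshold check inside a sequential loop and rerun the same study.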
Common Mistakes
- Assuming Bayesian methods don't need sample size planning. They still do — just in terms of posterior width, not p-values.
- Treating P(B>A) as P(B is a big win). Small sure wins and large uncertain wins can both hit 95% probability.
- Using overly strong priors. This makes early data almost irrelevant and defeats the point of testing.
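The overly-strong-prior failure mode is easy to demonstrate: two priors with the same 5% mean but different pseudo-observation counts react very differently to the same data (all numbers hypothetical):

```python
from scipy.stats import beta

conversions, visitors = 80, 1000  # observed 8% -- hypothetical data

weak = beta(5 + conversions, 95 + visitors - conversions)        # ~100 pseudo-observations at 5%
strong = beta(500 + conversions, 9500 + visitors - conversions)  # ~10,000 pseudo-observations at 5%

print(weak.mean())    # pulled only slightly toward the 5% prior
print(strong.mean())  # data almost ignored; estimate stays near 5%
```

With the strong prior, 1,000 real visitors barely move the estimate off 5% even though the data show 8% — the test is effectively decided before it starts.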
Industry Context
In SaaS/B2B, Bayesian methods are attractive because low traffic makes peeking tempting, and Bayesian methods handle continuous monitoring more gracefully. In ecommerce, Bayesian expected-loss framing aligns naturally with revenue decisions. In lead gen, Bayesian methods let you trade off P(B>A) against expected CPL improvement — richer than binary significance.
The Behavioral Science Connection
Bayesian outputs match how humans actually think about uncertainty: "how likely is it that B is better?" feels natural, while "what is the probability of observing data this extreme if there were no effect?" (the p-value) is both mathematically accurate and cognitively alien. Bayesian framing reduces misinterpretation at readouts.
Key Takeaway
Bayesian A/B testing is not magic — it has its own assumptions (especially priors) and its own risks. But its outputs are more interpretable to stakeholders, and the decision-theoretic framing (expected loss, probability to beat) aligns better with business choices than frequentist thresholds.