
Bayesian A/B Testing

An alternative to frequentist A/B testing that uses Bayes' theorem to compute direct probabilities that one variant is better than another.

What Is Bayesian A/B Testing?

Bayesian A/B testing replaces p-values and confidence intervals with posterior probabilities: "there's a 94% probability variant B has higher conversion than A, and the expected uplift is 3.2%." You combine a prior belief about the conversion rate with observed data via Bayes' theorem to produce a full posterior distribution for each variant. Decisions flow directly from the posterior: "ship if P(B > A) > 95%" or "ship if expected loss from choosing B < $500."
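For conversion data the update has a closed form: with a Beta(a, b) prior and k conversions out of n visitors, the posterior is Beta(a + k, b + n − k). A minimal sketch of the conjugate update (function name and numbers are illustrative):

```python
def beta_posterior(prior_alpha, prior_beta, conversions, visitors):
    """Conjugate Beta-Binomial update: a Beta(a, b) prior plus k conversions
    out of n visitors yields a Beta(a + k, b + n - k) posterior."""
    return prior_alpha + conversions, prior_beta + visitors - conversions

# A Beta(50, 950) prior (mean 5%) updated with 520 conversions in 10,000 visitors:
beta_posterior(50, 950, 520, 10_000)  # -> (570, 10430)
```

Decision rules like "ship if P(B > A) > 95%" are then evaluated against the two posteriors, typically by Monte Carlo sampling.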

Also Known As

  • Data science: Bayesian inference for experiments, posterior-based testing
  • Growth: "94% to win" testing
  • Marketing: probability-of-beating-control testing
  • Engineering: Bayesian decision framework

How It Works

Baseline conversion is 5%. You set a Beta(50, 950) prior (reflecting a 5% belief, moderate strength). After 10,000 visitors per variant with 520 conversions in A and 580 in B, the posteriors are Beta(570, 10430) and Beta(630, 10370): each prior parameter grows by the observed conversions and non-conversions, respectively. Draw 10,000 samples from each posterior and count the proportion where B > A: about 0.96. Expected relative lift: 10.5%. Expected relative loss from choosing B if A were actually better: under 0.1%. Decide based on thresholds agreed before launch.
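The worked example above can be reproduced in a few lines of Monte Carlo (a sketch using NumPy; exact outputs vary slightly with the seed and number of draws):

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 100_000  # posterior samples per variant

# Posteriors: Beta(50, 950) prior updated with 520/10,000 (A) and 580/10,000 (B).
a = rng.beta(570, 10_430, size=n_draws)
b = rng.beta(630, 10_370, size=n_draws)

p_b_beats_a = (b > a).mean()                       # P(B > A)
expected_lift = ((b - a) / a).mean()               # expected relative uplift
expected_loss = (np.maximum(a - b, 0) / a).mean()  # relative loss if we ship B

print(f"P(B > A) = {p_b_beats_a:.2f}")
print(f"expected relative lift = {expected_lift:.1%}")
print(f"expected relative loss from shipping B = {expected_loss:.2%}")
```

The same three numbers drive the decision rule: probability to beat, expected uplift, and expected loss if the choice turns out wrong.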

Bayesian methods also handle continuous monitoring naturally: at any point, the posterior answers "given what I've seen, how likely is B better?" without explicit multiplicity corrections (though optional stopping still affects long-run error rates, which is one reason to validate decision rules by simulation).

Best Practices

  • Pre-register decision rules — "ship if P(B>A) > 95% AND expected loss < X."
  • Use weakly informative priors unless you have genuinely strong historical data.
  • Report expected loss, not just probability to beat. A 96% P(B>A) with tiny expected uplift is often not worth shipping.
  • Validate with simulation that your decision thresholds achieve acceptable false positive rates.
  • Be transparent about priors. An overly optimistic prior biases results; reviewers should see the assumption.
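The simulation check in the points above can be sketched as an A/A test: generate many null experiments (no real difference) and measure how often the ship rule would fire. Parameters, the Beta(1, 19) prior, and the single fixed-horizon look are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def false_positive_rate(true_rate=0.05, n=10_000, threshold=0.95,
                        experiments=200, draws=20_000):
    """Fraction of simulated A/A experiments where P(B > A) exceeds the
    ship threshold at a single fixed-horizon look."""
    fired = 0
    for _ in range(experiments):
        a_conv = rng.binomial(n, true_rate)  # both arms share the true rate
        b_conv = rng.binomial(n, true_rate)
        a = rng.beta(1 + a_conv, 19 + n - a_conv, size=draws)
        b = rng.beta(1 + b_conv, 19 + n - b_conv, size=draws)
        if (b > a).mean() > threshold:
            fired += 1
    return fired / experiments

fpr = false_positive_rate()
print(f"estimated false positive rate at a single look: {fpr:.1%}")
```

For a single look at a 0.95 threshold this lands on the order of 5% for this setup; repeated peeking pushes it higher, which is exactly what the simulation should quantify for your own stopping rule.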

Common Mistakes

  • Assuming Bayesian methods don't need sample size planning. They still do — just in terms of posterior width, not p-values.
  • Treating P(B>A) as P(B is a big win). Small sure wins and large uncertain wins can both hit 95% probability.
  • Using overly strong priors. This makes early data almost irrelevant and defeats the point of testing.
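The sample-size point can be made concrete: plan n so the posterior is narrow enough to act on. A sketch that estimates the 95% credible-interval width at a few sample sizes, assuming a 5% true rate, a weak Beta(1, 19) prior, and conversions landing exactly at the expected rate (all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def ci_width(n, rate=0.05, prior=(1, 19), draws=200_000):
    """Width of the central 95% credible interval for the conversion rate,
    assuming observed conversions land exactly at the expected rate."""
    conv = round(n * rate)
    post = rng.beta(prior[0] + conv, prior[1] + n - conv, size=draws)
    lo, hi = np.quantile(post, [0.025, 0.975])
    return hi - lo

for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7}: 95% credible interval width = {ci_width(n):.4f}")
```

Width shrinks roughly with the square root of n, so a target interval width (or a target expected loss) translates directly into a required sample size, playing the role that power analysis plays in the frequentist setup.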

Industry Context

In SaaS/B2B, Bayesian methods are attractive because low traffic makes peeking tempting, and Bayesian methods handle continuous monitoring more gracefully. In ecommerce, Bayesian expected-loss framing aligns naturally with revenue decisions. In lead gen, Bayesian methods let you trade off P(B>A) against expected CPL improvement — richer than binary significance.

The Behavioral Science Connection

Bayesian outputs match how humans actually think about uncertainty: "how likely is it that B is better?" feels natural, while "what is the probability of observing data this extreme if there were no effect?" (the p-value) is both mathematically accurate and cognitively alien. Bayesian framing reduces misinterpretation at readouts.

Key Takeaway

Bayesian A/B testing is not magic — it has its own assumptions (especially priors) and its own risks. But its outputs are more interpretable to stakeholders, and the decision-theoretic framing (expected loss, probability to beat) aligns better with business choices than frequentist thresholds.