Bayesian A/B Testing
An alternative to frequentist A/B testing that uses Bayes' theorem to compute direct probabilities that one variant is better than another.
What Is Bayesian A/B Testing?
Bayesian A/B testing replaces p-values and confidence intervals with posterior probabilities: "there's a 94% probability variant B has higher conversion than A, and the expected uplift is 3.2%." You combine a prior belief about the conversion rate with observed data via Bayes' theorem to produce a full posterior distribution for each variant. Decisions flow directly from the posterior: "ship if P(B > A) > 95%" or "ship if expected loss from choosing B < $500."
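For conversion rates, the update step is simple because the Beta distribution is conjugate to binomial data: add successes to the prior's first parameter and failures to the second. A minimal sketch (the prior and counts below are hypothetical, for illustration only):

```python
from scipy.stats import beta

# Hypothetical numbers, not from a real experiment.
prior_a, prior_b = 1, 1              # weakly informative Beta(1, 1) prior
conversions, visitors = 52, 1000     # one variant's observed data

# Conjugate update: Beta prior + binomial data -> Beta posterior
post = beta(prior_a + conversions, prior_b + visitors - conversions)

print(post.mean())        # posterior mean conversion rate
print(post.interval(0.95))  # 95% credible interval
```

The posterior object is a full distribution, so the same two lines of arithmetic give you means, credible intervals, and samples for any downstream decision rule.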
Also Known As
- Data science: Bayesian inference for experiments, posterior-based testing
- Growth: "94% to win" testing
- Marketing: probability-of-beating-control testing
- Engineering: Bayesian decision framework
How It Works
Suppose baseline conversion is 5%. You set a Beta(50, 950) prior (mean 5%, equivalent to about 1,000 prior observations). After 10,000 visitors per variant with 520 conversions in A and 580 in B, the posteriors are Beta(570, 10430) and Beta(630, 10370) — the prior's counts plus each variant's successes and failures. Draw 10,000 samples from each posterior and count the proportion where B > A: about 0.96. Expected relative lift: about 10.5%. Expected relative loss from choosing B if A were actually better: about 0.08%. Decide based on thresholds agreed before launch.
Bayesian methods also naturally handle continuous monitoring: the posterior always answers "given what I've seen, how likely is B better?" without multiplicity corrections.
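Continuous monitoring amounts to recomputing the posterior as data arrives and checking it against a pre-registered threshold. A sketch under assumed true rates and batch sizes (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
true_a, true_b = 0.050, 0.055   # assumed true rates, illustration only
ca = fa = cb = fb = 0           # running conversion / failure counts

# Re-check the posterior after every batch of traffic; no multiplicity
# correction is applied -- the posterior is simply recomputed.
for day in range(1, 31):
    batch = 500
    a_conv = rng.binomial(batch, true_a)
    b_conv = rng.binomial(batch, true_b)
    ca, fa = ca + a_conv, fa + batch - a_conv
    cb, fb = cb + b_conv, fb + batch - b_conv
    draws_a = rng.beta(1 + ca, 1 + fa, 50_000)  # Beta(1, 1) prior
    draws_b = rng.beta(1 + cb, 1 + fb, 50_000)
    p = (draws_b > draws_a).mean()
    if p > 0.95:  # pre-registered decision threshold
        print(f"day {day}: P(B > A) = {p:.3f} -> ship B")
        break
```

Note that repeated looks at a fixed probability threshold still inflate the chance of a wrong call relative to a single look, which is why the calibration check under Best Practices matters.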
Best Practices
- Pre-register decision rules — "ship if P(B>A) > 95% AND expected loss < X."
- Use weakly informative priors unless you have genuinely strong historical data.
- Report expected loss, not just probability to beat. A 96% P(B>A) with tiny expected uplift is often not worth shipping.
- Validate with simulation that your decision thresholds achieve acceptable false positive rates.
- Be transparent about priors. An overly optimistic prior biases results; reviewers should see the assumption.
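The simulation-based validation suggested above can be sketched as an A/A study: run many experiments where both arms share the same true rate and measure how often the decision rule would wrongly ship. The rates, sample sizes, and repetition count below are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def aa_experiment(p=0.05, visitors=10_000, threshold=0.95, draws=20_000):
    """One A/A test: both arms share the same true conversion rate p."""
    ca = rng.binomial(visitors, p)
    cb = rng.binomial(visitors, p)
    a = rng.beta(1 + ca, 1 + visitors - ca, draws)
    b = rng.beta(1 + cb, 1 + visitors - cb, draws)
    return (b > a).mean() > threshold  # would we (wrongly) ship B?

# Estimated false positive rate of the "P(B>A) > 95%" rule at one fixed look.
fp_rate = np.mean([aa_experiment() for _ in range(400)])
print(fp_rate)
```

This checks a single fixed-sample look; to validate a peeking policy, move the threshold check inside a sequential loop and rerun the same study.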
Common Mistakes
- Assuming Bayesian methods don't need sample size planning. They still do — just in terms of posterior width, not p-values.
- Treating P(B>A) as P(B is a big win). Small sure wins and large uncertain wins can both hit 95% probability.
- Using overly strong priors. This makes early data almost irrelevant and defeats the point of testing.
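The overly-strong-prior failure mode is easy to demonstrate: two priors with the same 5% mean but different pseudo-observation counts react very differently to the same data (all numbers hypothetical):

```python
from scipy.stats import beta

conversions, visitors = 80, 1000  # observed 8% -- hypothetical data

weak = beta(5 + conversions, 95 + visitors - conversions)        # ~100 pseudo-observations at 5%
strong = beta(500 + conversions, 9500 + visitors - conversions)  # ~10,000 pseudo-observations at 5%

print(weak.mean())    # pulled only slightly toward the 5% prior
print(strong.mean())  # data almost ignored; estimate stays near 5%
```

With the strong prior, 1,000 real visitors barely move the estimate off 5% even though the data show 8% — the test is effectively decided before it starts.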
Industry Context
In SaaS/B2B, Bayesian methods are attractive because low traffic makes peeking tempting, and Bayesian methods handle continuous monitoring more gracefully. In ecommerce, Bayesian expected-loss framing aligns naturally with revenue decisions. In lead gen, Bayesian methods let you trade off P(B>A) against expected CPL improvement — richer than binary significance.
The Behavioral Science Connection
Bayesian outputs match how humans actually think about uncertainty: "how likely is it that B is better?" feels natural, while "what is the probability of observing data this extreme if there were no effect?" (the p-value) is both mathematically accurate and cognitively alien. Bayesian framing reduces misinterpretation at readouts.
Key Takeaway
Bayesian A/B testing is not magic — it has its own assumptions (especially priors) and its own risks. But its outputs are more interpretable to stakeholders, and the decision-theoretic framing (expected loss, probability to beat) aligns better with business choices than frequentist thresholds.