Thompson Sampling
A Bayesian bandit algorithm that allocates traffic to variants based on the probability that each variant is the best, naturally balancing exploration and exploitation.
What Is Thompson Sampling?
Thompson Sampling is a Bayesian multi-armed bandit algorithm that maintains a probability distribution over each variant's true conversion rate and allocates traffic by sampling from those distributions. The variant with the highest sampled value each round gets the next visitor. Early on, when distributions are wide, allocation looks near-random; as data accumulates and distributions narrow, traffic concentrates on the winner — with no manual tuning required.
Also Known As
- Marketing teams call it smart bandit or adaptive optimization.
- Growth teams say Thompson Sampling or Bayesian bandit.
- Product teams use Bayesian bandit or Thompson.
- Engineering teams refer to Thompson Sampling, TS, or posterior sampling.
- Statisticians call it Thompson Sampling or probability matching.
How It Works
Suppose three variants: A, B, C. Each has a Beta posterior distribution over its conversion rate — A is Beta(105, 1995), B is Beta(118, 1982), C is Beta(95, 2005). For each incoming visitor, you draw one sample from each distribution. Say you draw A = 0.0487, B = 0.0621, C = 0.0441. B has the highest sample, so this visitor goes to B. Over millions of visitors, B's posterior tightens and its samples exceed A's and C's more and more reliably, so B gets most of the traffic — but A and C still get a slice whenever randomness in their wider distributions produces a high draw.
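The per-visitor mechanics above can be sketched in a few lines of Python. This is an illustrative sketch, not a production implementation: the Beta parameters are taken directly from the example, and the function names (`assign_visitor`, `record_outcome`) are invented for clarity.

```python
import random

# Beta posteriors from the example above: (alpha, beta) per variant.
posteriors = {
    "A": (105, 1995),
    "B": (118, 1982),
    "C": (95, 2005),
}

def assign_visitor(posteriors):
    """Draw one sample per variant; route the visitor to the highest draw."""
    draws = {name: random.betavariate(a, b) for name, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)

def record_outcome(posteriors, variant, converted):
    """Bayesian update: a conversion increments alpha, a miss increments beta."""
    a, b = posteriors[variant]
    posteriors[variant] = (a + 1, b) if converted else (a, b + 1)

variant = assign_visitor(posteriors)
record_outcome(posteriors, variant, converted=False)
```

Because the draws are random, B wins most rounds but not every round — that residual randomness is exactly what keeps A and C in the exploration mix.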
Best Practices
- Let your experimentation platform handle the math — implementing Thompson Sampling correctly is subtle.
- Use appropriate priors — informative priors (based on historical data) converge faster than uninformative ones.
- Monitor guardrails closely — Thompson Sampling optimizes the primary metric ruthlessly.
- Use contextual Thompson Sampling if different user segments might have different winners.
- Pair with shutdown criteria so exploration can end when you have enough data.
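The prior-choice point above can be made concrete with a small simulation. Everything here is assumed for illustration: the true conversion rates, the visitor count, and the specific prior parameters are invented, and a real platform would handle this loop for you.

```python
import random

random.seed(7)  # reproducible illustration

# Assumed ground-truth rates for the sketch (unknown to the algorithm).
TRUE_RATES = {"A": 0.050, "B": 0.056}

def run_bandit(priors, visitors=5000):
    """Thompson Sampling loop: sample, assign, observe, update."""
    posteriors = dict(priors)
    traffic = {v: 0 for v in posteriors}
    for _ in range(visitors):
        draws = {v: random.betavariate(a, b) for v, (a, b) in posteriors.items()}
        chosen = max(draws, key=draws.get)
        traffic[chosen] += 1
        a, b = posteriors[chosen]
        if random.random() < TRUE_RATES[chosen]:
            posteriors[chosen] = (a + 1, b)
        else:
            posteriors[chosen] = (a, b + 1)
    return traffic

uninformative = {"A": (1, 1), "B": (1, 1)}       # flat Beta(1, 1) priors
informative = {"A": (50, 950), "B": (56, 944)}   # seeded from historical data

flat_traffic = run_bandit(uninformative)
seeded_traffic = run_bandit(informative)
```

With priors seeded from historical data, the posteriors start narrower, so traffic typically concentrates on the stronger variant in fewer visitors than with flat priors — the convergence benefit the best practice describes.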
Common Mistakes
- Rolling your own Thompson Sampling implementation and introducing subtle bugs in the sampling step.
- Using Thompson Sampling on delayed-signal metrics where feedback takes weeks — convergence becomes unstable.
- Treating Thompson Sampling results like A/B test p-values — they aren't equivalent.
Industry Context
- SaaS/B2B: Useful for in-app message and email content optimization.
- Ecommerce/DTC: Strong fit for homepage hero, product recommendation, and promotion selection.
- Lead gen: Dynamic creative optimization in ad platforms uses Thompson-style algorithms under the hood.
The Behavioral Science Connection
Thompson Sampling formalizes how expert practitioners actually think: you don't believe a variant's conversion rate is exactly 3.2% — you believe it's probably between 2.8% and 3.6%, and that range narrows with evidence. The algorithm operationalizes Bayesian cognition at the system level.
Key Takeaway
Thompson Sampling is the most elegant bandit algorithm — adaptive without tuning, principled without complexity, and the right default for high-traffic optimization problems.