
Bandit Algorithm (Multi-Armed Bandit)

An adaptive experiment design that dynamically shifts traffic toward better-performing variants during a test, balancing exploration of new options with exploitation of known winners.

What Is a Bandit Algorithm?

A bandit algorithm (or multi-armed bandit, MAB) is an adaptive experimentation method that continuously reallocates traffic toward better-performing variants while still sending some traffic to underperformers to verify their standing. The name comes from the gambler's dilemma: facing a row of slot machines ("one-armed bandits") with unknown payout rates, how do you maximize winnings over the session? In digital experimentation, it's a way to cut the opportunity cost of long-running tests.

Also Known As

  • Marketing teams often call it adaptive testing or auto-optimizing test.
  • Growth teams say bandit, MAB, or adaptive allocation.
  • Product teams use bandit, optimization algorithm, or adaptive experiment.
  • Engineering teams refer to Thompson Sampling, UCB, or epsilon-greedy (specific bandit types).
  • Data science teams call it MAB, contextual bandit, or reinforcement learning-style allocation.

How It Works

You have three email subject lines to test. A classic A/B/C test splits traffic evenly for the full duration. A bandit starts at 33/33/33 but re-evaluates after every batch of sends. After 1,000 sends, subject A opens at 22%, B at 28%, C at 19%. The next batch reallocates: A gets 20%, B gets 65%, C gets 15%. As more data arrives, traffic concentrates further on the best performer. By the end you've routed far more sends to B than a fixed A/B/C split would have, at the cost of less statistical certainty about the losers.
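The reallocation above can be sketched with Thompson Sampling, one common bandit policy: each variant keeps a Beta posterior over its open rate, and each send goes to the variant whose sampled rate is highest. The open rates and send count are the article's example numbers; everything else (seed, helper names) is illustrative.

```python
import random

# Beta-Bernoulli Thompson Sampling over three subject lines.
# Each arm tracks [opens, non-opens]; its posterior is Beta(opens+1, non_opens+1).
arms = {"A": [0, 0], "B": [0, 0], "C": [0, 0]}

def choose_arm():
    """Sample a plausible open rate from each arm's posterior; pick the max."""
    samples = {name: random.betavariate(opens + 1, misses + 1)
               for name, (opens, misses) in arms.items()}
    return max(samples, key=samples.get)

def record(arm, opened):
    arms[arm][0 if opened else 1] += 1

# Simulate 1,000 sends against the article's observed open rates.
true_rates = {"A": 0.22, "B": 0.28, "C": 0.19}
random.seed(42)
for _ in range(1000):
    arm = choose_arm()
    record(arm, random.random() < true_rates[arm])

# Share of sends each subject line ended up receiving.
allocation = {name: (opens + misses) / 1000 for name, (opens, misses) in arms.items()}
print(allocation)
```

Note that allocation emerges from sampling rather than an explicit 20/65/15 schedule: arms with more uncertain or higher posteriors simply win the draw more often.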

Best Practices

  • Use bandits for optimization, not inference — when you care about maximizing outcomes, not learning.
  • Prefer Thompson Sampling over epsilon-greedy for better long-run performance.
  • Use contextual bandits when different user segments might prefer different variants.
  • Monitor guardrail metrics closely — bandits can lock onto a variant that's good on the primary metric but bad on secondary ones.
  • Don't use bandits for high-stakes infrastructure tests where you need clean causal inference.
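For contrast with Thompson Sampling, here is a minimal epsilon-greedy allocator; the 10% exploration rate and the optimistic default for untried variants are illustrative choices, not fixed parts of the algorithm.

```python
import random

def epsilon_greedy_choice(stats, epsilon=0.1):
    """With probability epsilon, explore a random variant; otherwise exploit
    the variant with the best observed conversion rate so far."""
    if random.random() < epsilon:
        return random.choice(list(stats))
    def observed_rate(name):
        wins, trials = stats[name]
        # Untried arms get an optimistic 1.0 so each is tried at least once.
        return wins / trials if trials else 1.0
    return max(stats, key=observed_rate)

stats = {"A": [0, 0], "B": [0, 0], "C": [0, 0]}  # [conversions, impressions]
true_rates = {"A": 0.22, "B": 0.28, "C": 0.19}
random.seed(7)
for _ in range(1000):
    arm = epsilon_greedy_choice(stats)
    stats[arm][1] += 1
    stats[arm][0] += random.random() < true_rates[arm]
print(stats)
```

The design difference matters: epsilon-greedy keeps wasting a fixed slice of traffic on known losers forever, while Thompson Sampling's exploration shrinks automatically as posteriors tighten, which is why it tends to perform better over long runs.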

Common Mistakes

  • Using bandits on low-traffic tests where early random variation causes premature convergence on a loser.
  • Treating bandit results as statistically significant: adaptive allocation biases naive estimates of each variant's performance, so bandits optimize but don't cleanly prove causation.
  • Running bandits on noisy, delayed metrics (like LTV) where the feedback loop is too slow.
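The low-traffic pitfall is easy to demonstrate by simulation: run many short bandit experiments against two variants whose true rates are close, and count how often the allocator commits most of its traffic to the wrong one. The traffic levels, rates, and run count below are illustrative.

```python
import random

def run_bandit(n_sends, rates, seed):
    """Beta-Bernoulli Thompson Sampling; return the arm that got the most traffic."""
    rng = random.Random(seed)
    arms = {name: [0, 0] for name in rates}  # [wins, losses]
    for _ in range(n_sends):
        pick = max(arms, key=lambda a: rng.betavariate(arms[a][0] + 1, arms[a][1] + 1))
        arms[pick][rng.random() >= rates[pick]] += 1  # index 0 = win, 1 = loss
    return max(arms, key=lambda a: sum(arms[a]))

rates = {"A": 0.10, "B": 0.12}  # B is truly better, but only slightly
runs = 200
wrong_small = sum(run_bandit(100, rates, s) != "B" for s in range(runs)) / runs
wrong_large = sum(run_bandit(5000, rates, s) != "B" for s in range(runs)) / runs
print(f"wrong winner at 100 sends: {wrong_small:.0%}, at 5000 sends: {wrong_large:.0%}")
```

At small sample sizes, early random variation routinely steers traffic toward the weaker variant; with enough volume the error rate drops sharply, which is exactly why bandits suit high-traffic, low-stakes decisions.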

Industry Context

  • SaaS/B2B: Less common; traffic is usually too low and strategic learning matters more than short-term lift.
  • Ecommerce/DTC: Useful for headline, banner, and promotion optimization where traffic is high and decisions are low-stakes.
  • Lead gen: Good fit for ad creative optimization and dynamic landing page headline selection.

The Behavioral Science Connection

Bandits operationalize Herbert Simon's satisficing — accepting "good enough" quickly rather than chasing optimal certainty. They embody a systems-level tradeoff most individual experimenters fail to make: less learning per test, more value captured across the portfolio.

Key Takeaway

Use bandits when you want to capture value during optimization — and use A/B tests when you want to learn why something won.