Epsilon-Greedy Strategy
A bandit algorithm approach that exploits the current best variant most of the time while randomly exploring alternatives a fixed percentage of the time.
What Is Epsilon-Greedy?
Epsilon-greedy is the simplest multi-armed bandit strategy. You pick a small number called epsilon (typically 0.1, or 10%). That fraction of the time, you send traffic to a randomly chosen variant (exploration). The rest of the time, you send traffic to whichever variant is currently winning (exploitation). It's deliberately naive, and that's its strength — it's easy to explain, easy to implement, and easy to debug.
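The rule above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the `rates` dictionary (observed conversion rate per variant) and the function name are assumptions for the example.

```python
import random

def choose_variant(rates, epsilon=0.1):
    """Epsilon-greedy selection.

    `rates` maps variant name -> observed conversion rate (illustrative input).
    With probability epsilon we explore; otherwise we exploit the current best.
    """
    if random.random() < epsilon:
        return random.choice(list(rates))   # explore: uniform random variant
    return max(rates, key=rates.get)        # exploit: current winner
```

Note that exploration here picks uniformly among all variants, including the current winner, which matches the behavior described in the next section.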
Also Known As
- Marketing teams call it the 90/10 split or greedy exploration.
- Growth teams say epsilon-greedy or greedy bandit.
- Product teams use epsilon-greedy or naive bandit.
- Engineering teams refer to it as ε-greedy or greedy exploration.
- Reinforcement learning practitioners call it ε-greedy — it's the textbook starting point.
How It Works
You set epsilon = 0.1 across three variants A, B, C. After initial data, A is winning. For the next 100 users, 90 go to A and 10 are randomly split across A, B, and C (about 3–4 per variant). The winner's traffic share settles around 93% (90% from exploitation plus roughly a third of the 10% exploration budget), and each loser keeps getting about 3% indefinitely. If a loser turns out to actually be better (and early data was misleading), the 10% exploration budget eventually surfaces it — but slowly.
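The allocation described above can be checked with a short simulation. This sketch assumes A is held fixed as the current best so only the traffic split is measured; the function name and parameters are illustrative.

```python
import random

def simulate_allocation(epsilon=0.1, n_users=100_000, seed=42):
    """Measure the traffic share each variant receives when 'A' is the
    current best, under epsilon-greedy with the given exploration rate."""
    rng = random.Random(seed)
    variants = ["A", "B", "C"]
    counts = {v: 0 for v in variants}
    for _ in range(n_users):
        if rng.random() < epsilon:
            counts[rng.choice(variants)] += 1   # explore: uniform random
        else:
            counts["A"] += 1                    # exploit: current best
    return {v: c / n_users for v, c in counts.items()}
```

Running this should show A near 93.3% (0.9 + 0.1/3) and each loser near 3.3%, matching the arithmetic above.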
Best Practices
- Start with epsilon = 0.1 for most use cases; increase to 0.2 if you suspect early data is noisy.
- Re-evaluate the "current best" continuously as exploration data accumulates.
- Use decaying epsilon (start high, decrease over time) when you want more exploration early.
- Monitor the gap between best and second-best — if it's tiny, you need more exploration, not less.
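The decaying-epsilon practice above can be sketched with a simple exponential schedule. The starting value, floor, and decay constant here are illustrative defaults, not recommendations from the source.

```python
import math

def decaying_epsilon(t, eps_start=0.5, eps_min=0.05, decay=0.001):
    """Exploration rate after t users: starts high (heavy exploration),
    decays exponentially, and settles at a small floor so exploration
    never stops entirely."""
    return max(eps_min, eps_start * math.exp(-decay * t))
```

Keeping a nonzero floor (`eps_min`) matters: it preserves the ability to recover if the early "winner" was a fluke, which a schedule that decays to zero gives up.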
Common Mistakes
- Setting epsilon too low (1–2%), making it nearly impossible to recover from early randomness picking a wrong winner.
- Never updating the "current best" after the initial pick, which turns the algorithm into a static bet.
- Using epsilon-greedy on high-variance metrics where early samples are unreliable.
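The second mistake above — freezing the "current best" — is avoided by re-estimating each variant's rate on every decision. A minimal simulation sketch, with hypothetical true conversion rates supplied for testing:

```python
import random

def run_bandit(true_rates, epsilon=0.1, n_users=50_000, seed=7):
    """Epsilon-greedy loop that re-estimates the best arm after every user.

    `true_rates` are hypothetical conversion probabilities used only to
    simulate outcomes; a real system would observe conversions instead.
    """
    rng = random.Random(seed)
    arms = list(true_rates)
    pulls = {a: 0 for a in arms}
    wins = {a: 0 for a in arms}
    for _ in range(n_users):
        if rng.random() < epsilon or min(pulls.values()) == 0:
            arm = rng.choice(arms)  # explore (also seeds each arm once)
        else:
            # exploit the best arm under the *current* estimates
            arm = max(arms, key=lambda a: wins[a] / pulls[a])
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]
    estimates = {a: wins[a] / pulls[a] for a in arms}
    return estimates, pulls
```

Because the `max` over estimates is recomputed inside the loop, an arm that looked weak early can still take over once exploration data corrects its estimate — the property the "static bet" mistake destroys.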
Industry Context
- SaaS/B2B: Useful for low-stakes optimization like email subject lines and in-app messages.
- Ecommerce/DTC: Popular for banner, hero, and promotion optimization on category pages.
- Lead gen: Fits ad creative rotation and dynamic landing page modules.
The Behavioral Science Connection
Epsilon-greedy mirrors the explore-exploit tradeoff humans face daily and usually handle poorly. Research on variety-seeking shows people over-explore when they should exploit (trying new restaurants despite having a favorite) and over-exploit when they should explore (never trying anything new). Epsilon-greedy imposes discipline via a fixed exploration budget.
Key Takeaway
Epsilon-greedy is the training wheels of bandit algorithms — simple, interpretable, and good enough for most low-stakes optimization problems.