Epsilon-Greedy Strategy
A bandit algorithm approach that exploits the current best variant most of the time while randomly exploring alternatives a fixed percentage of the time.
What Is Epsilon-Greedy?
Epsilon-greedy is the simplest multi-armed bandit strategy. You pick a small number called epsilon (typically 0.1, or 10%). That fraction of the time, you send traffic to a randomly chosen variant (exploration). The rest of the time, you send traffic to whichever variant is currently winning (exploitation). It's deliberately naive, and that's its strength — it's easy to explain, easy to implement, and easy to debug.
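The rule above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the `rates` dictionary (observed conversion rate per variant) and the function name are assumptions for the example.

```python
import random

def choose_variant(rates, epsilon=0.1):
    """Epsilon-greedy selection.

    `rates` maps variant name -> observed conversion rate (illustrative input).
    With probability epsilon we explore; otherwise we exploit the current best.
    """
    if random.random() < epsilon:
        return random.choice(list(rates))   # explore: uniform random variant
    return max(rates, key=rates.get)        # exploit: current winner
```

Note that exploration here picks uniformly among all variants, including the current winner, which matches the behavior described in the next section.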
Also Known As
- Marketing teams call it the 90/10 split or greedy exploration.
- Growth teams say epsilon-greedy or greedy bandit.
- Product teams use epsilon-greedy or naive bandit.
- Engineering teams refer to it as ε-greedy or greedy exploration.
- Reinforcement learning practitioners call it ε-greedy — it's the textbook starting point.
How It Works
You set epsilon = 0.1 across three variants A, B, C. After initial data, A is winning. For the next 100 users, 90 go to A and 10 are randomly split across A, B, and C (about 3–4 per variant). The winner's traffic share settles around 93% (90% from exploitation plus roughly a third of the 10% exploration budget), and each loser keeps getting about 3% indefinitely. If a loser turns out to actually be better (and early data was misleading), the 10% exploration budget eventually surfaces it — but slowly.
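The allocation described above can be checked with a short simulation. This sketch assumes A is held fixed as the current best so only the traffic split is measured; the function name and parameters are illustrative.

```python
import random

def simulate_allocation(epsilon=0.1, n_users=100_000, seed=42):
    """Measure the traffic share each variant receives when 'A' is the
    current best, under epsilon-greedy with the given exploration rate."""
    rng = random.Random(seed)
    variants = ["A", "B", "C"]
    counts = {v: 0 for v in variants}
    for _ in range(n_users):
        if rng.random() < epsilon:
            counts[rng.choice(variants)] += 1   # explore: uniform random
        else:
            counts["A"] += 1                    # exploit: current best
    return {v: c / n_users for v, c in counts.items()}
```

Running this should show A near 93.3% (0.9 + 0.1/3) and each loser near 3.3%, matching the arithmetic above.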
Best Practices
- Start with epsilon = 0.1 for most use cases; increase to 0.2 if you suspect early data is noisy.
- Re-evaluate the "current best" continuously as exploration data accumulates.
- Use decaying epsilon (start high, decrease over time) when you want more exploration early.
- Monitor the gap between best and second-best — if it's tiny, you need more exploration, not less.
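The decaying-epsilon practice above can be sketched with a simple exponential schedule. The starting value, floor, and decay constant here are illustrative defaults, not recommendations from the source.

```python
import math

def decaying_epsilon(t, eps_start=0.5, eps_min=0.05, decay=0.001):
    """Exploration rate after t users: starts high (heavy exploration),
    decays exponentially, and settles at a small floor so exploration
    never stops entirely."""
    return max(eps_min, eps_start * math.exp(-decay * t))
```

Keeping a nonzero floor (`eps_min`) matters: it preserves the ability to recover if the early "winner" was a fluke, which a schedule that decays to zero gives up.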
Common Mistakes
- Setting epsilon too low (1–2%), making it nearly impossible to recover from early randomness picking a wrong winner.
- Never updating the "current best" after the initial pick, which turns the algorithm into a static bet.
- Using epsilon-greedy on high-variance metrics where early samples are unreliable.
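The second mistake above — freezing the "current best" — is avoided by re-estimating each variant's rate on every decision. A minimal simulation sketch, with hypothetical true conversion rates supplied for testing:

```python
import random

def run_bandit(true_rates, epsilon=0.1, n_users=50_000, seed=7):
    """Epsilon-greedy loop that re-estimates the best arm after every user.

    `true_rates` are hypothetical conversion probabilities used only to
    simulate outcomes; a real system would observe conversions instead.
    """
    rng = random.Random(seed)
    arms = list(true_rates)
    pulls = {a: 0 for a in arms}
    wins = {a: 0 for a in arms}
    for _ in range(n_users):
        if rng.random() < epsilon or min(pulls.values()) == 0:
            arm = rng.choice(arms)  # explore (also seeds each arm once)
        else:
            # exploit the best arm under the *current* estimates
            arm = max(arms, key=lambda a: wins[a] / pulls[a])
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]
    estimates = {a: wins[a] / pulls[a] for a in arms}
    return estimates, pulls
```

Because the `max` over estimates is recomputed inside the loop, an arm that looked weak early can still take over once exploration data corrects its estimate — the property the "static bet" mistake destroys.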
Industry Context
- SaaS/B2B: Useful for low-stakes optimization like email subject lines and in-app messages.
- Ecommerce/DTC: Popular for banner, hero, and promotion optimization on category pages.
- Lead gen: Fits ad creative rotation and dynamic landing page modules.
The Behavioral Science Connection
Epsilon-greedy mirrors the explore-exploit tradeoff humans face daily and usually handle poorly. Research on variety-seeking shows people over-explore when they should exploit (trying new restaurants despite having a favorite) and over-exploit when they should explore (never trying anything new). Epsilon-greedy imposes discipline via a fixed exploration budget.
Key Takeaway
Epsilon-greedy is the training wheels of bandit algorithms — simple, interpretable, and good enough for most low-stakes optimization problems.