A Bet You Refuse To Take

Suppose I show you two urns. The first urn — call it the transparent urn — contains exactly 50 red balls and 50 black balls. You can count them yourself. The second urn — the opaque urn — contains 100 balls that are some mix of red and black. The composition is unknown. It could be 100 red. It could be 100 black. It could be any split in between.

I offer you a bet: pick a color, draw a ball from an urn, and if the color matches you win $100. Otherwise you win nothing. You choose the urn.

Almost everyone picks the transparent urn.

Fine. Now I offer you the same bet, but for the other color. Pick again.

Almost everyone picks the transparent urn again.

Stop and notice what just happened. If you preferred the transparent urn for red, you implicitly believed the opaque urn contained fewer than 50 red balls. If you also preferred the transparent urn for black, you implicitly believed the opaque urn contained fewer than 50 black balls. Together: fewer than 50 red and fewer than 50 black, in an urn of 100. The probabilities don’t sum to one. Your subjective beliefs are mathematically incoherent.

This is the Ellsberg paradox, published by Daniel Ellsberg in 1961 in the Quarterly Journal of Economics. It is one of the most replicated findings in all of decision science — and it is the reason an entire branch of economic theory exists to model decisions under ambiguity, distinct from decisions under risk. This article is part of the replication crisis hub as an anti-example: a behavioral finding that has survived six decades of scrutiny and stands as a model of what robust experimental economics looks like.

The 1961 Paper And The Two-Urn Problem

Ellsberg, then a young RAND analyst finishing a Harvard PhD in economics (awarded in 1962), was not trying to overturn decision theory. He was working through the implications of Leonard Savage’s The Foundations of Statistics (1954), the canonical text establishing subjective expected utility (SEU) as the rational benchmark for decisions under uncertainty.

Savage’s axioms imply that any rational agent facing uncertainty acts as if they have a single subjective probability distribution over states of the world, and chooses the action that maximizes expected utility under that distribution. This is the foundation of modern microeconomics, finance, and most of statistics that calls itself Bayesian.

Ellsberg constructed his two-urn experiment as a thought experiment first, then ran informal versions on colleagues and seminar attendees — including, famously, Savage himself and Howard Raiffa. The pattern was immediate and overwhelming: highly trained economists, fully aware that their preferences implied probability incoherence, still preferred the transparent urn for both colors. They did not want to bet against an unknown distribution. The pull was strong enough that several admitted to violating their own axioms.

Ellsberg’s published 1961 paper formalized two versions: the two-urn problem (above) and the three-color problem (one urn with 30 red balls and 60 balls that are some unknown mix of black and yellow). Both produce the same violation. Both have been replicated many times since. The full citation:

Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics, 75(4), 643-669. DOI: 10.2307/1884324

The paper is short, conversational, and unusually candid about what it does and does not prove. Ellsberg explicitly framed the finding as a challenge to descriptive accuracy of SEU, not a normative refutation. He left open whether the preferences were “rational” — but argued forcefully that they were stable, predictable, and present in sophisticated decision-makers who knew the axioms cold.
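The three-color problem mentioned above can be checked mechanically. The sketch below is illustrative only. It assumes the pair of strict preferences usually reported for this problem, which the text above does not spell out: betting on red is preferred to betting on black, and betting on "black or yellow" is preferred to betting on "red or yellow". The function name `single_prior_rationalizes` is hypothetical. The code scans every possible belief about the number of black balls and looks for one that supports both choices.

```python
from fractions import Fraction

P_RED = Fraction(30, 90)  # 30 of the 90 balls are known to be red


def single_prior_rationalizes(black_balls: int) -> bool:
    """Check one candidate belief (number of black balls among the 60
    unknown ones) against the assumed pair of strict preferences."""
    p_black = Fraction(black_balls, 90)
    p_yellow = Fraction(60 - black_balls, 90)
    # Choice 1: a bet on red is preferred to a bet on black.
    red_over_black = P_RED > p_black
    # Choice 2: a bet on "black or yellow" is preferred to "red or yellow".
    by_over_ry = (p_black + p_yellow) > (P_RED + p_yellow)
    return red_over_black and by_over_ry


# Try every composition of the 60 unknown balls: 0 black up to 60 black.
consistent_beliefs = [b for b in range(61) if single_prior_rationalizes(b)]
print(consistent_beliefs)  # []
```

The first choice needs fewer than 30 black balls and the second needs more than 30, so no single belief survives. That is the same incoherence as in the two-urn version.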

Why This Violates Subjective Expected Utility

The violation hinges on Savage’s sure-thing principle (axiom P2), which says: if two actions yield the same outcome in some state of the world, that state should not affect your preference between them. From the sure-thing principle, together with Savage’s other axioms, you can derive that preferences must be representable by some subjective probability distribution.

In the two-urn problem, betting on “red from transparent” versus “red from opaque” forces a probability comparison: transparent is exactly 0.5, so preferring it means you assign opaque-red a probability less than 0.5. Betting on “black from transparent” versus “black from opaque” by the same logic means you assign opaque-black a probability less than 0.5. But opaque-red and opaque-black are complementary events. Their probabilities must sum to one. So you cannot consistently assign both less than 0.5 under SEU.
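The argument above can be verified by brute force. This is a minimal sketch, not anything taken from Ellsberg's paper: it assumes a win is worth utility 1 and a loss 0, so an SEU agent simply compares win probabilities, and the function name `seu_prefers_transparent_for_both` is hypothetical. It scans candidate beliefs about the opaque urn for one that makes the transparent urn strictly better for both colors.

```python
from fractions import Fraction

TRANSPARENT_P = Fraction(1, 2)  # known chance of either color in the 50/50 urn


def seu_prefers_transparent_for_both(p_red_opaque: Fraction) -> bool:
    """True if an SEU agent holding this belief strictly prefers the
    transparent urn when betting on red AND when betting on black."""
    p_black_opaque = 1 - p_red_opaque  # complementary events sum to one
    prefers_for_red = TRANSPARENT_P > p_red_opaque
    prefers_for_black = TRANSPARENT_P > p_black_opaque
    return prefers_for_red and prefers_for_black


# Candidate beliefs k/100 about the chance of red in the opaque urn.
candidate_beliefs = [Fraction(k, 100) for k in range(101)]
rationalizing = [p for p in candidate_beliefs
                 if seu_prefers_transparent_for_both(p)]

print(rationalizing)  # []
```

The list comes back empty: any belief below 0.5 for red forces a belief above 0.5 for black, and a belief of exactly 0.5 makes the agent indifferent rather than strictly preferring the transparent urn.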

The standard interpretation: people are ambiguity averse. They distinguish between:

  • Risk — situations where probabilities are known (the transparent urn, a fair coin, a roulette wheel)
  • Ambiguity — situations where probabilities themselves are unknown or vague (the opaque urn, an unfamiliar stock, a new market)

SEU treats these identically — any uncertainty collapses to a single subjective probability. Ellsberg showed people don’t. They demand a premium to bear ambiguity over and above the premium they demand to bear equivalent risk. This is not a small effect: in laboratory bets, the ambiguity premium is typically 10-20% of the stake, and it shows up even when participants are trained economists who can articulate the SEU axioms.

A crucial nuance often missed: Ellsberg’s finding is not that people are bad at probability. The transparent urn is unambiguous — people get the 50/50 right. The violation is specifically about how preferences respond to the quality of probabilistic information, not to its content. This is what makes the paradox theoretically interesting and what made it so hard for SEU defenders to dismiss as a framing error or a cognitive mistake.

Replication Is Robust Across Six Decades

This is where the contrast with the replication crisis sharpens. The Ellsberg paradox has been replicated under essentially every variation researchers have thrown at it. The 1992 review by Camerer and Weber catalogued dozens of replications across stakes, populations, presentation formats, and decision domains:

Camerer, C., & Weber, M. (1992). Recent developments in modeling preferences: Uncertainty and ambiguity. Journal of Risk and Uncertainty, 5(4), 325-370. DOI: 10.1007/BF00122575

Camerer and Weber found ambiguity aversion in studies with real monetary stakes, with hypothetical stakes, with professional traders, with statistics students, with insurance executives, with farmers in developing economies, and in field experiments on actual investment choices. The effect attenuates somewhat with stakes and expertise but does not disappear. A meta-analytic estimate they cite suggests roughly 50-70% of subjects exhibit clear ambiguity aversion across paradigms, with only 10-15% showing ambiguity seeking and the remainder approximately neutral.

A particularly important later replication came from Yoram Halevy in 2007, who ran the Ellsberg two-urn experiment with 142 University of British Columbia undergraduates and explicitly tested whether ambiguity aversion correlated with violations of reduction of compound lotteries (the assumption, standard in expected utility theory though not one of Savage’s axioms, that a multi-stage lottery is valued like its single-stage equivalent). Halevy found that attitudes toward ambiguity were very tightly associated with attitudes toward compound objective lotteries, which suggests that ambiguity aversion is a systematic pattern rather than random error:

Halevy, Y. (2007). Ellsberg revisited: An experimental study. Econometrica, 75(2), 503-536. DOI: 10.1111/j.1468-0262.2006.00755.x

Halevy’s paper is methodologically important because it forecloses one of the main “rationality” defenses of SEU — the idea that Ellsberg subjects are just confused and would correct themselves under proper training. Halevy’s subjects did not.

Compare this evidentiary track record to that of a typical finding that failed to replicate in the Open Science Collaboration’s 2015 reproducibility project: a single underpowered study, never independently confirmed, often with a fragile p-value just under 0.05. Ellsberg has hundreds of replications, robust effect sizes, predicted moderators that hold up, and a theoretical structure that generates new predictions which themselves replicate. This is what real science looks like when it works.

Gilboa-Schmeidler: Maxmin Expected Utility

The theoretical response to Ellsberg took decades to crystallize. The most influential generalization came in 1989 from Itzhak Gilboa and David Schmeidler, who proposed that ambiguity-averse decision-makers act as if they hold a set of probability distributions rather than a single one, and evaluate each action by its worst-case expected utility across the set:

Gilboa, I., & Schmeidler, D. (1989). Maxmin expected utility with non-unique prior. Journal of Mathematical Economics, 18(2), 141-153. DOI: 10.1016/0304-4068(89)90018-9

This is the maxmin expected utility (MMEU) model. Formally: an agent has a closed convex set $\mathcal{C}$ of probability measures over states; given an act $f$, the agent’s evaluation is $\min_{p \in \mathcal{C}} \mathbb{E}_p[u(f)]$. The agent then chooses the act with the highest minimum.

Gilboa and Schmeidler characterized MMEU axiomatically. Working in the Anscombe-Aumann framework, they replaced the independence axiom with a weaker certainty-independence axiom and added an uncertainty-aversion axiom, which gives a precise mathematical statement of what ambiguity aversion implies about preferences. The model predicts exactly the Ellsberg pattern. Normalize the utility of winning to 1 and of losing to 0. In the opaque urn, the set of compatible distributions ranges from “100 red, 0 black” to “0 red, 100 black”; the worst case for betting red is the all-black urn (expected utility 0) and the worst case for betting black is the all-red urn (also 0). So both opaque bets evaluate to 0, while either bet on the transparent urn evaluates to 0.5, and the agent picks the transparent urn for both colors. No contradiction.
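The maxmin evaluation can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: utility is normalized so a win is worth 1 and a loss 0, the helper names `maxmin_value` and `urn_priors` are hypothetical, and the "partly known" urn (between 40 and 60 red balls) is a made-up intermediate case added to show how the size of the prior set drives the result. Because expected utility is linear in the prior, checking each whole-ball composition is enough to find the minimum over the convex set.

```python
from fractions import Fraction


def maxmin_value(payoff_by_state, priors):
    """Worst-case expected utility of a bet across a set of priors."""
    return min(
        sum(prior[state] * payoff_by_state[state] for state in payoff_by_state)
        for prior in priors
    )


def urn_priors(min_red, max_red):
    """One prior per composition of a 100-ball urn holding between
    min_red and max_red red balls (the rest black)."""
    return [
        {"red": Fraction(k, 100), "black": Fraction(100 - k, 100)}
        for k in range(min_red, max_red + 1)
    ]


# Normalized utilities (assumption): win = 1, lose = 0.
bet_red = {"red": 1, "black": 0}
bet_black = {"red": 0, "black": 1}

urns = {
    "transparent": urn_priors(50, 50),    # a single known prior
    "opaque": urn_priors(0, 100),         # every composition is possible
    "partly known": urn_priors(40, 60),   # hypothetical narrower set
}

for label, priors in urns.items():
    print(label, maxmin_value(bet_red, priors), maxmin_value(bet_black, priors))
# transparent 1/2 1/2
# opaque 0 0
# partly known 2/5 2/5
```

The transparent urn beats the opaque urn for both colors without any inconsistency, because the agent is not committed to one prior. The narrower set shows that the penalty shrinks as the range of plausible compositions tightens.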

MMEU is now standard equipment in graduate microeconomics. Variants include the smooth ambiguity model of Klibanoff, Marinacci, and Mukerji (2005), the alpha-MEU model of Ghirardato, Maccheroni, and Marinacci (2004), and the multiplier preferences of Hansen and Sargent used in robust control macroeconomics. All trace their lineage to Ellsberg’s two urns. None would exist without that 1961 paper.
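To give a feel for how one of these variants differs from the worst-case rule, here is a minimal sketch of a smooth-ambiguity style evaluation of the two-urn bet. The specifics are assumptions chosen for illustration: a uniform second-order belief over the 101 possible compositions, utility of 1 for a win and 0 for a loss, and a square-root function as the concave transformation. The name `smooth_value` is hypothetical.

```python
import math


def smooth_value(win_probs, weights, phi):
    """Weighted average of phi(expected utility) across candidate urn
    compositions. With win = 1 and loss = 0, the expected utility of a
    bet under one composition is just its win probability."""
    return sum(w * phi(p) for p, w in zip(win_probs, weights))


# Opaque urn: every composition equally likely (illustrative assumption).
opaque_win_probs = [k / 100 for k in range(101)]
uniform_weights = [1 / 101] * 101

# Transparent urn: one composition, held with certainty.
transparent_win_probs = [0.5]
certain_weight = [1.0]

attitudes = [
    ("ambiguity neutral (linear phi)", lambda x: x),
    ("ambiguity averse (concave phi)", math.sqrt),
]

for label, phi in attitudes:
    v_transparent = smooth_value(transparent_win_probs, certain_weight, phi)
    v_opaque = smooth_value(opaque_win_probs, uniform_weights, phi)
    print(label, round(v_transparent, 3), round(v_opaque, 3))
```

With a linear transformation the two urns come out equal, which is the SEU answer. With a concave one the opaque urn scores lower, and the penalty is graded rather than the all-or-nothing drop to the worst case.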

The intellectual arc here is the model for what theoretical economics is supposed to do: an experimental anomaly produces a stable empirical pattern; the pattern is replicated and characterized across decades; theorists construct axiomatically grounded generalizations that nest the original model as a special case; the generalization makes new predictions that are themselves tested and confirmed. In chronological order: Ellsberg → Gilboa-Schmeidler → Camerer-Weber review → Halevy → modern macrofinance applications. Each step builds on the last.

Real-World Applications: Where Ambiguity Aversion Matters

The reason ambiguity aversion has earned permanent residency in economics is that it explains a half-dozen first-order empirical puzzles that SEU cannot touch.

The equity premium puzzle. Mehra and Prescott (1985) famously showed that the historical excess return on US equities over Treasury bills — roughly 6% per year — is far larger than standard SEU models with plausible risk aversion can rationalize. Models incorporating ambiguity about the true distribution of returns (Epstein-Schneider 2008, Ju-Miao 2012) close most of the gap. Investors demand a premium not just for variance but for not knowing the true model.

Insurance market puzzles. Standard SEU predicts that insurance should be priced at expected loss plus a small risk premium. In reality, insurance for low-probability/high-severity events (earthquakes, terrorism, novel medical conditions) is priced far above expected loss, and often is not offered at all when actuarial uncertainty is high. Ambiguity aversion on the insurer’s side predicts exactly this pattern: insurers demand a premium for ambiguous loss distributions, and refuse coverage when the ambiguity is too great to price.

Home bias in equity portfolios. Investors worldwide hold vastly more of their domestic equities than diversification under SEU would recommend. Evidence in the style of Coval and Moskowitz (1999) shows that even sophisticated institutional investors over-weight the familiar. Ambiguity aversion explains this: foreign markets present greater ambiguity about underlying distributions, so investors apply a higher discount.

Drug development and FDA decisions. Regulators and patients distinguish sharply between drugs with known side-effect profiles and drugs whose side-effect profiles are uncertain. Standard expected-value calculations under-predict how strongly populations avoid novel pharmaceuticals during early adoption. Ambiguity premia on novel technologies are well-documented.

Climate policy. Cost-benefit analysis of carbon policy is highly sensitive to whether one models climate response as risk (known distribution of outcomes) versus ambiguity (deep uncertainty about the distribution itself). Weitzman’s “dismal theorem” formalizes how tail ambiguity over climate sensitivity can produce arbitrarily large willingness-to-pay for mitigation — an argument with direct policy implications.

Strategic contracting and incomplete contracts. When contracting parties face ambiguity about future contingencies, MMEU predicts they will write more rigid contracts that perform tolerably across a wider range of possible futures, rather than contracts optimized for a single expected scenario. This connects ambiguity aversion to the foundations of organizational economics.

In each domain, the alternative to ambiguity aversion is to assume that markets have been systematically irrational, and prices systematically wrong, for decades. Once you take Ellsberg seriously, many of these “puzzles” in finance and decision theory dissolve into a coherent story about agents who care about both risk and the quality of probabilistic information.

What This Means For Strategists And Operators

If you are running a company, building product, or making large investment decisions, the operational implications of ambiguity aversion are concrete.

Customers will pay a premium for clarity even when expected value is identical. A pricing scheme with hidden fees has lower customer willingness-to-pay than a transparent pricing scheme with the same average cost — not because customers can’t do the math, but because they apply ambiguity premia to opaque costs. This is the operational principle behind the success of flat-fee SaaS pricing, all-inclusive resorts, and predictable subscription billing. If you are seeing churn correlated with billing surprises, you are looking at ambiguity aversion in your unit economics.

New product categories require an “ambiguity discount” baked into pricing. When you launch into an unfamiliar category, customers will demand a steeper discount than risk alone justifies. Free trials, money-back guarantees, and freemium tiers are operational tools for reducing the ambiguity premium customers apply to your product — they convert ambiguity into measurable risk that customers can evaluate.

Investors evaluating your fundraise apply ambiguity premia to unfamiliar metrics. If your business has KPIs that look unfamiliar to the investor’s mental model, expect a valuation discount that has nothing to do with the actual risk profile. The solution is not to hide complexity but to do the translation work — present your metrics in the standard frame the investor recognizes, then layer your differentiation on top.

Internal decisions about novel initiatives systematically under-invest relative to expected-utility-optimal levels. Corporate capital allocation processes are notoriously ambiguity-averse: projects with well-understood return distributions get funded; projects with unknown return distributions get cut even when their expected value is higher. If you are a founder pitching an internal sponsor or a manager defending a budget, the operational tactic is to reduce ambiguity about the investment (small pilots, staged commitments, optionality structures) rather than to argue for higher expected value.

Hiring and team-building exhibit ambiguity premia. Candidates from known institutions, known companies, and known training programs get hired at premia that exceed their measurable performance differential. The “Stanford CS grad premium” is partly an ambiguity discount applied to less-legible candidates. Operationally, this is an opportunity: structured work-sample tests and clear performance evaluations reduce ambiguity for candidates from non-traditional backgrounds, and disproportionately surface high-performing talent that the market is mispricing.

The meta-lesson: ambiguity aversion is not a bug to be debugged out of human decision-making. It is a stable feature of how people, including sophisticated decision-makers, process uncertainty. Strategies that account for it outperform strategies that assume it away.

Why This Is The Anti-Example

I am building a replication crisis hub that catalogues findings that did not survive scrutiny — ego depletion, power posing, stereotype threat, various priming effects. Ellsberg is the counterweight.

The contrast is methodologically instructive. Ellsberg published a finding that was:

  1. Theoretically motivated. The experiment was designed to test specific axioms of an existing formal framework, not run inductively until a p-value emerged.
  2. Large in effect size. The preference for the transparent urn is not a 0.2 standard-deviation drift detectable only with thousands of subjects; it is a strong majority pattern visible in any classroom demonstration.
  3. Replicated under stake variation. When other researchers ran the experiment with real money, hypothetical money, large stakes, and small stakes, the pattern held.
  4. Replicated across populations. Economists, business students, farmers, traders, undergraduates, professional decision-makers. The pattern is universal enough to call cognitive, not artifactual.
  5. Generative of new predictions. MMEU and its descendants generated dozens of novel predictions about asset pricing, insurance, and contracting — and those predictions themselves replicated.
  6. Published in a top journal with full methodological transparency. You can read the 1961 paper and run the experiment yourself this afternoon.

This is a different epistemic creature from the typical priming study that fails to replicate. The difference is not luck. It is the combination of theoretical grounding, experimental robustness, conceptual replication across paradigms, and willingness to update the formal framework when the data demand it. This is what good experimental economics looks like, and it is why ambiguity aversion is a permanent fixture in the field while many psychology findings from the same era have quietly disappeared.

If you want a litmus test for whether a behavioral finding is worth building strategy on, the Ellsberg standard is the bar to apply: does it survive replication across populations, stakes, and operationalizations? Does it generate new predictions that also replicate? Has the theoretical community built generalizations that nest the original finding as a special case? Pass those tests and you have something durable. Fail them and you have a fashion.

Sources

  • Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics, 75(4), 643-669. DOI: 10.2307/1884324
  • Camerer, C., & Weber, M. (1992). Recent developments in modeling preferences: Uncertainty and ambiguity. Journal of Risk and Uncertainty, 5(4), 325-370. DOI: 10.1007/BF00122575
  • Gilboa, I., & Schmeidler, D. (1989). Maxmin expected utility with non-unique prior. Journal of Mathematical Economics, 18(2), 141-153. DOI: 10.1016/0304-4068(89)90018-9
  • Halevy, Y. (2007). Ellsberg revisited: An experimental study. Econometrica, 75(2), 503-536. DOI: 10.1111/j.1468-0262.2006.00755.x
  • Savage, L. J. (1954). The Foundations of Statistics. New York: Wiley.
  • Klibanoff, P., Marinacci, M., & Mukerji, S. (2005). A smooth model of decision making under ambiguity. Econometrica, 73(6), 1849-1892. DOI: 10.1111/j.1468-0262.2005.00640.x
  • Mehra, R., & Prescott, E. C. (1985). The equity premium: A puzzle. Journal of Monetary Economics, 15(2), 145-161. DOI: 10.1016/0304-3932(85)90061-3
  • Epstein, L. G., & Schneider, M. (2008). Ambiguity, information quality, and asset pricing. Journal of Finance, 63(1), 197-228. DOI: 10.1111/j.1540-6261.2008.01314.x

FAQ

Is the Ellsberg paradox the same as risk aversion?

No. Risk aversion is a preference for certain outcomes over uncertain ones with the same expected value, and it operates within standard expected utility. Ambiguity aversion is an additional preference for known probability distributions over unknown ones, holding expected utility constant under any single distribution. A risk-averse agent could in principle be ambiguity-neutral; the Ellsberg paradox specifically demonstrates that most people are not.
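The distinction can be made concrete with a small sketch. It models a risk-averse but ambiguity-neutral agent, under assumptions chosen purely for illustration: a square-root utility over dollars, and a single uniform prior over the 101 possible compositions of the opaque urn. The function names are hypothetical.

```python
from fractions import Fraction
import math

PRIZE = 100  # dollars won if the drawn color matches


def utility(dollars):
    """Concave utility, so the agent is risk averse (illustrative choice)."""
    return math.sqrt(dollars)


def expected_utility(p_win):
    """Expected utility of a bet that pays PRIZE with probability p_win."""
    return float(p_win) * utility(PRIZE) + float(1 - p_win) * utility(0)


# Transparent urn: the chance of red is known to be 1/2.
p_transparent = Fraction(1, 2)

# Opaque urn under ONE uniform prior over compositions: the overall
# chance of red is the average of k/100 for k = 0..100.
p_opaque = sum(Fraction(k, 100) for k in range(101)) / 101

eu_transparent = expected_utility(p_transparent)
eu_opaque = expected_utility(p_opaque)
certainty_equivalent = eu_transparent ** 2  # inverts the sqrt utility

print(p_opaque)                    # 1/2
print(eu_transparent, eu_opaque)   # 5.0 5.0
print(certainty_equivalent)        # 25.0
```

The agent values the bet at $25, well below its $50 expected value, so it is clearly risk averse. Yet it is exactly indifferent between the two urns. Risk aversion alone does not produce the Ellsberg pattern.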

Did Ellsberg himself believe his subjects were irrational?

He was deliberately ambiguous on this point. He framed the paradox as a challenge to the descriptive accuracy of subjective expected utility, not a normative refutation. His personal view, expressed in later interviews, was that the preferences were intuitively reasonable and the formal theory should accommodate them — which is exactly what Gilboa, Schmeidler, and others eventually did.

Has the paradox been replicated with real monetary stakes?

Yes, extensively. Camerer and Weber’s 1992 review catalogues replications with real money up to substantial stakes. Halevy’s 2007 Econometrica study used real monetary incentives. The ambiguity premium attenuates slightly with stakes but does not disappear.

What about ambiguity-seeking subjects?

A minority of subjects (typically 10-15%) prefer the opaque urn — they are ambiguity-seeking rather than ambiguity-averse. This minority is real and consistent across studies. Modern theories (notably the smooth ambiguity model and alpha-MEU) accommodate both attitudes within a unified framework.
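As a rough illustration of how one framework can cover both attitudes, the sketch below applies an alpha-MEU style rule to the two-urn bet: the value of a bet is a weighted mix of its worst-case and best-case expected utility, with weight alpha on the worst case. Utility is normalized to 1 for a win and 0 for a loss, and the names `alpha_meu_value` and `attitude` are hypothetical.

```python
def alpha_meu_value(win_probs, alpha):
    """alpha * worst-case + (1 - alpha) * best-case expected utility.
    With win = 1 and loss = 0, expected utility equals win probability."""
    return alpha * min(win_probs) + (1 - alpha) * max(win_probs)


def attitude(opaque_value, transparent_value):
    """Label the agent by which urn it prefers to bet on."""
    if opaque_value < transparent_value:
        return "ambiguity averse"
    if opaque_value > transparent_value:
        return "ambiguity seeking"
    return "ambiguity neutral"


# Opaque urn: any composition from 0 to 100 red balls is possible.
opaque_win_probs = [k / 100 for k in range(101)]
TRANSPARENT_VALUE = 0.5  # known 50/50 urn

for alpha in (1.0, 0.5, 0.0):
    value = alpha_meu_value(opaque_win_probs, alpha)
    print(alpha, value, attitude(value, TRANSPARENT_VALUE))
# 1.0 0.0 ambiguity averse
# 0.5 0.5 ambiguity neutral
# 0.0 1.0 ambiguity seeking
```

Setting alpha to 1 recovers the maxmin rule, and lowering it moves the same agent toward preferring the opaque urn.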

Why didn’t Savage simply revise his axioms after Ellsberg’s paper?

Savage famously admitted in a footnote that he himself preferred the transparent urn in Ellsberg’s experiment, recognized this violated his axioms, but defended SEU on normative grounds — arguing that the axioms should be respected even if intuition rebelled. This was an unusual stance and reflects how strongly the SEU framework was entrenched in 1961. The descriptive theory split off into ambiguity models over the following decades while SEU remained the normative benchmark.

Is ambiguity aversion the same as model uncertainty in econometrics?

The two concepts are closely related but technically distinct. Model uncertainty in econometrics typically refers to the statistician’s uncertainty about which of several specifications generated the data. Ambiguity aversion is a preference of the decision-maker over outcomes whose probability distributions are themselves uncertain. The Hansen-Sargent multiplier preferences framework explicitly bridges the two by treating model uncertainty as a source of ambiguity that the decision-maker prices through robust control.

How does this apply to AI and automated decision-making?

Reinforcement learning agents trained under standard expected utility frameworks will systematically over-invest in known-distribution actions and under-explore ambiguous ones unless ambiguity-aware reward structures are built in. Recent work in robust RL and distributionally robust optimization is essentially MMEU applied to machine learning. If you are building automated decision systems, the Ellsberg-Gilboa-Schmeidler framework is directly relevant to how the system should handle distribution shift.

What’s the strongest argument against ambiguity aversion as a real phenomenon?

The most serious challenge comes from Halevy’s 2007 finding that ambiguity aversion in Ellsberg paradigms correlates almost perfectly with violations of reduction of compound lotteries — suggesting the two phenomena have a common cognitive root. This doesn’t refute ambiguity aversion as a behavioral pattern, but it suggests the underlying mechanism may be a single misperception of compound probabilities rather than two distinct preferences. Either way, the operational implication — that decision-makers respond differently to known-distribution and unknown-distribution prospects — remains robust.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.