In 1985, two economists named Rajnish Mehra and Edward Prescott published a paper in the Journal of Monetary Economics that, more than four decades later, remains one of the most-cited unsolved puzzles in financial economics. The paper was called “The equity premium: A puzzle.” Its claim was deceptively narrow. Mehra and Prescott showed that the historical premium that US stocks had earned over short-term US government debt — roughly six percentage points per year averaged across the period from 1889 to 1978 — was vastly larger than any plausible calibration of the standard consumption-based asset pricing model could reproduce. Specifically: the model, given the actually-observed volatility of US per-capita consumption growth and any reasonable estimate of how risk-averse people are, predicted an equity premium of about 0.35 percent. The data showed roughly 6 percent. The model was off by a factor of roughly seventeen.
The mismatch was so large, so well-documented, and so resistant to obvious fixes that the paper effectively created a sub-field. Forty-plus years of follow-up literature has proposed habit-formation models, disaster-risk models, heterogeneous-agent models, recursive-preference models, ambiguity-aversion models, long-run-risk models, and behavioral models built on Kahneman and Tversky’s prospect theory. Each of these explanations has serious advocates. None of them has produced a consensus. The puzzle itself (that the equity premium exists, that it is large, that it is robust across decades and across countries, and that standard rational-expectations consumption-based asset pricing cannot reproduce it without implausible parameters) has held up remarkably well.
This is a different kind of entry for this hub. Most of the case studies here are about famous findings that failed to replicate, or whose original effect sizes turned out to be wildly overstated, or whose authors committed outright fraud. The equity premium puzzle is the opposite kind of case. The empirical finding has replicated, repeatedly, across many subsequent studies and many international datasets. What has not replicated — what has, in fact, conspicuously failed to materialize despite forty years of effort — is a satisfactory theoretical explanation grounded in standard economic assumptions. The puzzle is a durable anomaly. It is the kind of finding that should make economists humble about the explanatory reach of the canonical models, and it is the kind of finding that has, in the absence of theoretical resolution, pushed serious money toward behavioral explanations that have direct implications for how to design retirement plans, default investment options, and employee stock ownership.
This essay walks through what Mehra and Prescott actually showed in 1985, why their calculation was so devastating, how the puzzle has stood up across decades and countries, and what the leading proposed explanations are — particularly the behavioral one from Shlomo Benartzi and Richard Thaler that has done more than any other to translate the puzzle into practical guidance for institutional investors, plan sponsors, and policymakers.
What Mehra and Prescott Actually Calculated in 1985
The 1985 paper sits inside a particular intellectual context that is worth establishing before the calculations.
By the early 1980s, financial economics had largely converged on the consumption-based capital asset pricing model, often called the consumption CAPM or CCAPM. The intuition is straightforward. A representative consumer chooses how much to consume now versus invest for future consumption. Risky assets — stocks — are valuable insofar as they pay off well in states of the world where the consumer would otherwise be worst off, i.e., in states where consumption is low. Because stocks tend to pay off poorly precisely when the economy is bad and consumption is low, the consumer demands a premium to hold them. The size of that premium depends on two things: how much consumption growth covaries with stock returns, and how risk-averse the consumer is.
Mehra and Prescott took this framework and asked a quantitative question. Given the actual observed time series of US per-capita consumption growth from 1889 to 1978 — average growth around 1.8 percent per year, standard deviation around 3.6 percent — and given a plausible value for the coefficient of relative risk aversion (the model’s key preference parameter), what equity premium does the model predict?
Their answer, derived carefully in the paper: even allowing the coefficient of relative risk aversion to go as high as 10 (itself already at the high end of what most economists considered plausible for individual decision-making), the largest equity premium the model could produce was approximately 0.35 percent per year. The actually-observed equity premium for the period was approximately 6.18 percent per year. The ratio of observed to predicted was more than 17 to 1.
The calculation depends on one technically unobjectionable insight: in the data, aggregate per-capita consumption growth is smooth. It does not vary much from year to year. Even in recessions, consumption falls less than output and far less than stock prices. The standard deviation of annual consumption growth in the US over the relevant period was about 3.6 percent. The standard deviation of annual real stock returns over the same period was about 16.5 percent. Stocks are much more volatile than consumption.
The consumption-based asset pricing model implies that risky-asset premia should be proportional to the covariance between asset returns and consumption growth. Because consumption growth is smooth, that covariance is small. To generate a 6 percent equity premium from such a small covariance, the model needs the consumer to be enormously risk-averse — to dislike the small amount of consumption variance that the equity premium is supposedly compensating for so dramatically that they demand 6 percentage points of extra return per year to bear it. Mehra and Prescott showed that getting the model to match the data required a coefficient of relative risk aversion of approximately 30 or higher.
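To see why a small covariance forces a large risk-aversion coefficient, here is a rough sketch using the textbook lognormal approximation, in which the premium is approximately the risk-aversion coefficient times the covariance of stock returns with consumption growth. This is not Mehra and Prescott's own calibration (their model prices a claim on consumption itself, so this shortcut will not reproduce their 0.35 percent figure), and the correlation values are hypothetical inputs. Only the two standard deviations and the 6.18 percent premium come from the figures above.

```python
# Back-of-envelope consumption-CAPM arithmetic (lognormal approximation):
#   premium ~= gamma * corr * sigma_c * sigma_r
# The correlation values below are assumptions for illustration only.

SIGMA_C = 0.036            # std dev of annual consumption growth (from the text)
SIGMA_R = 0.165            # std dev of annual real stock returns (from the text)
OBSERVED_PREMIUM = 0.0618  # observed equity premium (from the text)


def implied_premium(gamma: float, corr: float) -> float:
    """Premium implied by risk aversion `gamma` and an assumed correlation."""
    return gamma * corr * SIGMA_C * SIGMA_R


def required_gamma(premium: float, corr: float) -> float:
    """Risk aversion needed to justify `premium` at an assumed correlation."""
    return premium / (corr * SIGMA_C * SIGMA_R)


if __name__ == "__main__":
    for corr in (0.2, 0.4, 1.0):  # hypothetical correlations
        g = required_gamma(OBSERVED_PREMIUM, corr)
        print(f"corr={corr:.1f}: required risk aversion ~ {g:.0f}")
```

Under this approximation, even perfect correlation (the most favorable case) requires a coefficient above 10, and more modest assumed correlations push the requirement past 25.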
A coefficient of relative risk aversion of 30 is not a normal number. To put it concretely: it implies that a representative consumer would be indifferent between (a) a coin flip between consuming 50 percent of current consumption and 100 percent of current consumption, and (b) consuming a guaranteed 51.2 percent of current consumption. People do not behave that way in any other domain that economists or psychologists have measured. Risk-aversion coefficients of 1 to 5 are typical estimates from labor-supply studies, lottery-experiment studies, and insurance-purchase studies. 30 is off the chart.
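The coin-flip figure can be checked directly. The sketch below computes the certainty equivalent of a gamble under constant-relative-risk-aversion (CRRA) utility; the function name and the comparison values of 1, 2, and 5 are illustrative choices, not taken from the original paper.

```python
import math


def crra_certainty_equivalent(outcomes, probs, gamma):
    """Certainty equivalent of a gamble under CRRA utility.

    outcomes: consumption levels (as fractions of current consumption)
    probs:    matching probabilities, summing to 1
    gamma:    coefficient of relative risk aversion
    """
    if gamma == 1.0:
        # Log utility is the gamma -> 1 limit of CRRA.
        return math.exp(sum(p * math.log(c) for c, p in zip(outcomes, probs)))
    expo = 1.0 - gamma
    expected = sum(p * c ** expo for c, p in zip(outcomes, probs))
    return expected ** (1.0 / expo)


if __name__ == "__main__":
    gamble = ([0.5, 1.0], [0.5, 0.5])  # the coin flip described above
    for gamma in (1, 2, 5, 30):
        ce = crra_certainty_equivalent(*gamble, gamma)
        print(f"gamma={gamma:>2}: certainty equivalent = {ce:.3f}")
```

At a coefficient of 2 the consumer would accept about 67 percent of current consumption for certain in place of the gamble; at 30 the figure falls to about 51.2 percent, barely above the worst outcome.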
This is what Mehra and Prescott meant by “a puzzle.” It was not that the model gave the wrong answer by a few percent. It was that no plausible calibration of the model produced an answer close to what the data showed. To match the data required either (a) accepting an utterly implausible degree of risk aversion in the representative consumer, or (b) acknowledging that the canonical consumption-based asset pricing model was missing something fundamental about how people actually price risk.
The Puzzle’s Robustness Across Decades and Countries
In the four decades since Mehra and Prescott’s paper, the puzzle has been extended, refined, and tested in many directions. The robustness of the underlying empirical finding has held up remarkably well.
Extensions to later US data. The original 1889-1978 sample period included the unusual mid-20th-century era of strong US equity returns. Subsequent updates extending the data forward through the 1980s, 1990s, 2000s, and 2010s have shown that the puzzle does not depend on the original sample period. The US equity premium has continued to be large — typically estimated in the range of 4 to 7 percent depending on the period and the benchmark — and consumption growth has continued to be smooth. The mismatch persists.
Extensions to international data. The puzzle is not a US-only phenomenon. Studies of equity premia in other developed markets (the UK, Germany, France, Japan, Australia, Canada) have generally found large equity premia and smooth consumption growth in the same way that the US data shows. Philippe Jorion and William Goetzmann’s 1999 analysis of long-run international equity returns documented substantial premia across many markets. Elroy Dimson, Paul Marsh, and Mike Staunton’s “Triumph of the Optimists” project assembled century-long return series for 16 countries and similarly found persistent large premia. The empirical finding generalizes; the theoretical puzzle generalizes with it.
Survival of robustness checks. The standard methodological objections — sample selection, survivorship bias in country selection, peso problems where the data missed disasters that were rationally expected — have been examined extensively. Each can shave some of the puzzle, but none eliminates it. Even after adjusting for plausible amounts of survivorship bias and assuming that investors rationally anticipated rare disasters that did not happen in sample, the residual puzzle is large.
The risk-free rate puzzle as twin. Mehra and Prescott’s 1985 analysis actually exposes two related puzzles. The first is the equity premium puzzle: the gap between stock returns and risk-free returns is too large for the model to explain. The second, which Philippe Weil named the risk-free rate puzzle in 1989, is that given the observed consumption growth and any reasonable risk-aversion parameter, the model predicts a much higher risk-free interest rate than is actually observed. The actual real risk-free rate has historically been around 1 percent. The model with a high risk-aversion parameter would predict a real risk-free rate in the high single digits or above. The two puzzles together mean that the model is failing in two directions at once: stocks return too much and bonds return too little, relative to what the model says they should.
The combination is significant. A theoretical fix that explains one of the puzzles often makes the other worse. Cranking up risk aversion to explain the equity premium tends to predict an even higher risk-free rate, exacerbating the second puzzle. This is part of why the literature has had such trouble producing a clean resolution — the model is failing in a way that constrains the possible fixes.
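The tension between the two puzzles can be illustrated with the standard lognormal CRRA approximation for the real risk-free rate: time preference, plus risk aversion times mean consumption growth, minus a precautionary-saving term that is quadratic in risk aversion. This is a sketch, not the paper's calibration, and the 1 percent rate of time preference is an assumed value; the growth figures are the ones quoted earlier.

```python
G = 0.018      # mean annual consumption growth (from the text)
SIGMA = 0.036  # std dev of annual consumption growth (from the text)
DELTA = 0.01   # rate of time preference: an assumed value, not from the text


def risk_free_rate(gamma: float, delta: float = DELTA) -> float:
    """Approximate real risk-free rate: delta + gamma*g - 0.5*gamma^2*sigma^2."""
    return delta + gamma * G - 0.5 * gamma ** 2 * SIGMA ** 2


if __name__ == "__main__":
    for gamma in (2, 5, 10):
        print(f"gamma={gamma:>2}: implied real risk-free rate = "
              f"{risk_free_rate(gamma):.1%}")
```

With these inputs the implied rate is already above 4 percent at a coefficient of 2 and above 12 percent at 10, against an observed rate of around 1 percent. In this approximation the quadratic term only starts to pull the rate back down beyond roughly g / sigma², which is about 14 here.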
The Benartzi-Thaler Behavioral Explanation: Myopic Loss Aversion
The most well-developed behavioral explanation of the puzzle comes from Shlomo Benartzi and Richard Thaler in their 1995 paper “Myopic loss aversion and the equity premium puzzle,” published in the Quarterly Journal of Economics. Their explanation has become the canonical behavioral resolution and has had more practical influence on retirement plan design than any of the rational-explanation alternatives.
The argument has two ingredients. The first is loss aversion, from Kahneman and Tversky’s prospect theory. Investors evaluate gains and losses relative to a reference point — typically the starting value of their portfolio — and weight losses approximately twice as heavily as equivalent gains. The asymmetry is the well-documented finding from prospect theory, although as the loss aversion entry in this hub discusses, the specific 2-to-1 ratio is more conditional than the canonical version suggests.
The second ingredient is mental accounting and frequent evaluation. Investors do not evaluate their portfolios at infinite horizons. They evaluate them periodically — at least annually, often more frequently — and they experience the psychological costs of losses at each evaluation point. Benartzi and Thaler called the combination of these two ingredients myopic loss aversion: loss-averse investors who evaluate their portfolios myopically over short horizons.
The implication for the equity premium puzzle is direct. If investors are loss-averse and evaluate their portfolios annually, then they are exposed to the year-to-year volatility of equity returns. Equity returns are sufficiently volatile that, evaluated on a one-year horizon, there is a meaningful probability of substantial loss. A loss-averse investor weights those potential losses heavily, requires substantial compensation to bear them, and therefore demands a large equity premium.
Benartzi and Thaler calibrated their model and showed that the combination of plausible loss-aversion parameters (around the 2-to-1 ratio from Tversky and Kahneman 1992) and an annual evaluation horizon produces an equity premium roughly in line with what is observed. The model does not require an implausible risk-aversion coefficient. It requires a plausible loss-aversion coefficient and an evaluation horizon that matches how people actually look at their portfolios.
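A stripped-down version of the mechanism can be sketched as follows. This is not Benartzi and Thaler's calibration: it uses a piecewise-linear value function (no curvature, no probability weighting), assumes returns are independent and normally distributed, and takes a 6 percent mean annual excess return as an illustrative input. The loss-aversion weight of 2 mirrors the ratio described above.

```python
import math

MU = 0.06      # assumed mean annual excess return on stocks
SIGMA = 0.165  # std dev of annual stock returns (from the text)
LAMBDA = 2.0   # losses weighted about twice as heavily as gains


def _norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prospect_value(horizon_years: float, lam: float = LAMBDA) -> float:
    """Expected piecewise-linear prospect value of holding stocks.

    Returns over the horizon are assumed normal with mean MU*T and
    std dev SIGMA*sqrt(T). Gains count one-for-one; losses count `lam` times.
    """
    m = MU * horizon_years
    s = SIGMA * math.sqrt(horizon_years)
    z = m / s
    # E[min(X, 0)] for X ~ Normal(m, s^2): the expected loss component.
    expected_loss = m * _norm_cdf(-z) - s * _norm_pdf(z)
    # E[v(X)] = E[X] + (lam - 1) * E[min(X, 0)]
    return m + (lam - 1.0) * expected_loss


if __name__ == "__main__":
    for label, t in (("1 month", 1 / 12), ("1 year", 1.0), ("5 years", 5.0)):
        print(f"{label:>8}: prospect value = {prospect_value(t):+.4f}")
```

Under these assumptions the same asset has a negative prospect value when evaluated monthly and a positive one when evaluated over multiple years, which is the qualitative point of the myopic-loss-aversion argument.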
The practical importance of the Benartzi-Thaler explanation is that, unlike most of the rational-explanation alternatives, it makes predictions that can be tested in field settings and that have direct implications for institutional design. If the explanation is correct, then changing the frequency with which investors evaluate their portfolios should change their willingness to hold stocks. Less-frequent evaluation should produce more willingness to bear equity risk; more-frequent evaluation should produce less. Subsequent experimental and quasi-experimental work has provided support for this prediction. Investors who receive less-frequent feedback on their portfolios do, on average, hold more aggressive equity allocations and earn higher long-run returns.
This is the explanation that has done the most work in the practical world. Default contribution rates in 401(k) plans, default equity allocations in target-date funds, the structure of quarterly versus annual benefit statements, the way employee stock ownership plans report vesting and value — all of these design decisions can be understood as applications of (or implicit responses to) the myopic-loss-aversion framework. The Benartzi-Thaler explanation is not just a story about why the equity premium exists; it is a story with operational consequences for institutional investors and plan sponsors.
The Barro Disaster-Risk Explanation
A second major class of explanation comes from the disaster-risk literature, most associated with Robert Barro’s 2006 Quarterly Journal of Economics paper “Rare disasters and asset markets in the twentieth century.” Barro built on earlier work by Thomas Rietz from 1988 that had first proposed that rare catastrophic events might explain the equity premium.
The argument is that the historical sample of consumption growth data on which Mehra and Prescott based their analysis was, in important ways, an unrepresentative sample. The US in the 20th century did not experience a true disaster, defined as a peak-to-trough consumption decline of, say, 25 to 50 percent or more. Many other countries did. Barro assembled an international dataset of 35 countries over the 20th century and documented that disasters of that magnitude (wars, depressions, financial collapses, hyperinflations) occurred at a non-trivial frequency. He estimated a roughly 1.7 percent annual probability of a disaster, measured as a decline in real per-capita GDP of 15 percent or more.
In Barro’s model, investors are rationally pricing the possibility of a disaster that did not happen in the US 20th-century sample but that they nonetheless expected. The equity premium compensates them for bearing the disaster risk. Standard risk-aversion parameters in the range of 3 to 4 — much closer to what other behavioral evidence suggests — are sufficient to generate a 6 percent equity premium, once the disaster-risk distribution is accurately modeled.
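The logic can be sketched with a standard approximation from the rare-disaster literature for a claim on consumption: the premium is the usual normal-times term plus a disaster term that depends on the disaster probability and the fraction of consumption lost. The version below is a simplification. It assumes a single disaster size, whereas Barro's calibration uses an empirical distribution of sizes along with other ingredients; the sizes tried here are hypothetical, and the text's consumption volatility is used as a stand-in for normal-times volatility.

```python
def disaster_premium(gamma: float, sigma: float, p: float, b: float) -> float:
    """Approximate equity premium with a rare-disaster term.

    gamma: coefficient of relative risk aversion
    sigma: std dev of consumption growth in normal times
    p:     annual disaster probability
    b:     fraction of consumption lost in a disaster (single assumed size)
    """
    normal_part = gamma * sigma ** 2
    disaster_part = p * b * ((1.0 - b) ** (-gamma) - 1.0)
    return normal_part + disaster_part


if __name__ == "__main__":
    gamma, sigma, p = 4.0, 0.036, 0.017  # p from the text; gamma in the 3-4 range
    for b in (0.15, 0.30, 0.50):         # hypothetical disaster sizes
        prem = disaster_premium(gamma, sigma, p, b)
        print(f"disaster size {b:.0%}: implied premium = {prem:.2%}")
```

The implied premium is extremely sensitive to the assumed disaster size, rising from under 1 percent at a 15 percent contraction to well over 10 percent at a 50 percent contraction. That sensitivity is why the distribution of disaster sizes, rather than any single number, carries so much weight in this literature.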
The disaster-risk explanation has substantial theoretical appeal. It rescues the standard rational-expectations consumption-based model by attributing the puzzle to a measurement problem: the historical US sample was missing the tail risk that investors were actually pricing. Subsequent work has refined the disaster-risk framework substantially, including Emmanuel Farhi and François Gourio’s recent extensions and Xavier Gabaix’s variable-rare-disasters formalization.
The empirical objection to the disaster-risk explanation is that it is essentially unfalsifiable on standard timescales. Disasters severe enough to materially shift the calibration occur at low enough frequency that the available data cannot decisively distinguish between “investors are rationally pricing disasters that have not yet occurred in our sample” and “investors are loss-averse and demanding a large premium for ordinary volatility.” Both stories fit the data. The disaster-risk story is consistent with rational expectations and standard preferences; the behavioral story is consistent with the broader prospect-theory evidence base. Different economists weigh these considerations differently.
A second objection is that the disaster-risk model, when calibrated to match the equity premium, often predicts implications for other asset markets (sovereign debt spreads, option-implied tail-risk premia, cross-country variation in equity premia) that do not always fit the data cleanly. The literature here is technical and ongoing.
The 2003 Retrospect and the Current State of the Literature
In 2003, Mehra and Prescott published a chapter in the Handbook of the Economics of Finance titled “The equity premium in retrospect.” The chapter is a useful summary of the state of the literature roughly twenty years after the original puzzle.
Their assessment was, broadly, that the puzzle had survived. They reviewed the main classes of proposed explanation — disaster risk, habit formation, recursive preferences, heterogeneous agents, behavioral explanations — and concluded that none had produced a fully satisfactory resolution. Their own leanings had shifted somewhat toward heterogeneous-agent models with borrowing constraints, where the representative-agent assumption of the original model is relaxed and a fraction of the population is constrained from holding the optimal portfolio. These models can generate larger equity premia than the original representative-agent model, but they require additional structural assumptions about who is constrained and why.
Twenty more years on, the state of the literature is still that the puzzle is real, the explanations are contested, and no single resolution has consensus support. The leading candidates have all gained empirical refinement. Long-run risk models (Bansal and Yaron 2004), habit-formation models (Campbell and Cochrane 1999), disaster-risk models (Barro 2006 and successors), ambiguity-aversion models (Hansen and Sargent 2007), and myopic-loss-aversion models (Benartzi and Thaler 1995 and successors) all have serious advocates and serious critics. Most working financial economists would say that the resolution probably involves elements from several of these classes — that the equity premium reflects some combination of rational compensation for tail risk, behavioral aversion to short-term volatility, time-varying risk premia, and heterogeneity in who actually bears equity risk.
The honest summary for a non-specialist: the empirical equity premium puzzle is one of the most durable findings in empirical finance. The theoretical explanations are unsettled. If you are evaluating an asset-pricing claim that presents itself as having “solved” the equity premium puzzle, treat that claim with skepticism. The field’s current understanding is that the puzzle has several plausible partial explanations and no decisive resolution.
Why This Anomaly Matters: Strategist Implications
The equity premium puzzle is not a piece of theoretical economics with no practical consequence. The competing explanations have direct implications for how to design retirement plans, default investment options, employee stock ownership programs, and individual investor education.
Implication 1: If the behavioral explanation is even partially correct, evaluation frequency is a design lever. The Benartzi-Thaler model implies that investors who evaluate their portfolios less frequently will be more willing to bear equity risk and will earn higher long-run returns. For plan sponsors, this argues against quarterly performance statements that prominently display recent losses, in favor of annual or longer reporting cycles that allow short-term volatility to wash out. For individual investors, this argues for not logging into the brokerage account every day during a downturn.
This is not a fringe view. Major plan sponsors and asset managers have substantially adjusted statement design and online reporting tools in directions consistent with the myopic-loss-aversion framework. The default reporting frequency for many retirement plans has moved away from quarterly statements with prominent short-term return displays toward annual statements emphasizing long-run cumulative performance. The behavioral explanation has had real institutional effects, regardless of whether it is the complete explanation of the puzzle.
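The evaluation-frequency lever in Implication 1 can be made concrete with a rough calculation of how often an investor would see a loss at different checking intervals. The sketch assumes independent, normally distributed returns and a 6 percent mean annual excess return, both simplifying assumptions; the volatility is the figure quoted earlier.

```python
import math

MU = 0.06      # assumed mean annual excess return
SIGMA = 0.165  # std dev of annual stock returns (from the text)


def prob_loss(horizon_years: float) -> float:
    """P(cumulative return < 0) when returns are Normal(MU*T, SIGMA^2*T)."""
    z = MU * math.sqrt(horizon_years) / SIGMA
    return 0.5 * (1.0 + math.erf(-z / math.sqrt(2.0)))


if __name__ == "__main__":
    horizons = (("1 trading day", 1 / 252), ("1 month", 1 / 12),
                ("1 year", 1.0), ("10 years", 10.0))
    for label, t in horizons:
        print(f"{label:>13}: chance of seeing a loss = {prob_loss(t):.0%}")
```

Under these assumptions a daily checker sees a loss on nearly half of all checks, an annual checker on roughly a third, and a ten-year checker on roughly one in eight. For a loss-averse investor, the checking interval therefore changes how painful the same portfolio feels.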
Implication 2: Default investment options should reflect long-horizon equity risk that individuals would not choose themselves. If a substantial fraction of investors are myopically loss-averse and therefore under-allocate to equities relative to what a long-horizon analysis would recommend, then default investment options — automatic enrollment, target-date funds, default contribution allocations — can be designed to push allocations toward what long-horizon analysis would suggest, working with rather than against the behavioral tendency. Target-date funds with substantial early-career equity allocations that gradually de-risk over time are an implementation of this logic. Default automatic enrollment in retirement plans is another.
The institutional adoption of these design choices over the past two decades is one of the cleanest examples of behavioral finance affecting practice at scale. The mechanism may or may not be myopic loss aversion specifically — it may be inertia, complexity-aversion, or default effects more broadly — but the design choices have moved in directions that the Benartzi-Thaler framework predicts would improve outcomes.
Implication 3: For ESOP design and employee stock holding, frequency of valuation matters. Employee stock ownership plans, restricted stock unit grants, and other employee-equity programs raise the same evaluation-frequency questions. If employees receive frequent updates on the value of their unvested equity, they are likely to experience myopic loss aversion in ways that affect both their satisfaction with the compensation arrangement and their broader risk-taking behavior. Plan sponsors and compensation committees can use this insight to design reporting cadences that match the long-horizon nature of equity grants rather than fighting against it.
Implication 4: The puzzle’s persistence is itself a useful epistemic signal. That a clearly-documented financial anomaly has resisted clean theoretical resolution for forty years is itself worth taking seriously. It is a reminder that the canonical models of financial economics are useful approximations, not complete descriptions, and that practitioners who treat them as complete descriptions are making implicit assumptions that the data does not support. Skepticism about asset-pricing claims that depend on canonical-model implications — particularly claims about the appropriate risk premium for new asset classes, or claims about expected returns for novel portfolios — is a reasonable default posture.
Implication 5: The puzzle disciplines claims about the equity premium going forward. A reasonable analyst’s expectation for the forward-looking US equity premium is not the simple historical average. The historical premium reflects both whatever the “true” risk premium is and whatever lucky outcomes happened in the sample. Most modern analyses of the forward-looking equity premium estimate it as substantially smaller than the 6 percent historical average — typically in the 3 to 5 percent range — precisely because some portion of the historical premium is attributable to unrepeatable factors. For long-horizon financial planning, individual investors and institutions should generally use forward-looking premium estimates rather than the historical average.
What This Means If You’re a Strategist
Three takeaways for anyone making decisions where the equity premium puzzle is relevant.
1. Treat the puzzle as a constraint on theory, not as an asset-management opportunity. The equity premium puzzle is a statement about what the canonical economic theory cannot explain. It is not, by itself, a statement about how to make money in markets. The size of the premium is reasonably well-known; capturing it requires bearing the equity risk for long enough that the premium materializes. Many practical implications follow from accepting the empirical premium, but the puzzle does not give you an edge — it gives you a reason to be skeptical of theoretical claims about how the premium should be priced.
2. The behavioral explanation has the most operational content. Among the competing explanations, the myopic-loss-aversion framework has done the most to translate the puzzle into design choices for retirement plans, default investments, and employee equity programs. If you are designing a benefit program, a savings plan, or an investor-communication strategy, the Benartzi-Thaler framework is the most useful starting point regardless of whether it is the complete theoretical explanation.
3. Use forward-looking equity premium estimates, not the historical average. For any long-horizon financial decision — retirement planning, pension fund liability matching, endowment investment policy — the relevant input is the expected forward equity premium, not the historical average. Modern estimates of the forward-looking premium are typically lower than the 6 percent historical average, reflecting some combination of disaster-risk reasoning, the possibility that risk premia have compressed as more investors participate in equity markets, and the recognition that the historical sample includes unrepeatable factors. Planning with a 3 to 5 percent expected premium rather than a 6 percent historical premium is a more defensible default.
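The third takeaway matters because small differences in the assumed premium compound into large differences in projected wealth. A minimal sketch, assuming a 1 percent real risk-free rate (in line with the historical figure mentioned earlier), a 30-year horizon chosen for illustration, and deterministic compounding at the expected return, which ignores volatility entirely:

```python
def terminal_multiple(real_risk_free: float, premium: float, years: int) -> float:
    """Growth multiple of $1 compounded at (risk-free + premium) per year."""
    return (1.0 + real_risk_free + premium) ** years


if __name__ == "__main__":
    rf, years = 0.01, 30  # assumed inputs for illustration
    for premium in (0.03, 0.05, 0.06):
        mult = terminal_multiple(rf, premium, years)
        print(f"premium {premium:.0%}: $1 grows to about ${mult:.2f} in real terms")
```

Planning on a 6 percent premium projects roughly 7.6 times the starting balance after 30 years, while a 3 percent premium projects roughly 3.2 times. A plan built on the historical average can therefore be badly underfunded if the forward-looking premium turns out to be lower.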
Sources
- [Mehra, R., & Prescott, E. C. (1985). The equity premium: A puzzle. Journal of Monetary Economics, 15(2), 145-161. DOI: 10.1016/0304-3932(85)90061-3](https://www.sciencedirect.com/science/article/abs/pii/0304393285900613) --- the original puzzle paper.
- [Benartzi, S., & Thaler, R. H. (1995). Myopic loss aversion and the equity premium puzzle. Quarterly Journal of Economics, 110(1), 73-92. DOI: 10.2307/2118511](https://academic.oup.com/qje/article-abstract/110/1/73/1894564) --- the canonical behavioral explanation.
- [Mehra, R., & Prescott, E. C. (2003). The equity premium in retrospect. Handbook of the Economics of Finance, 1, 889-938. DOI: 10.1016/S1574-0102(03)01023-9](https://www.sciencedirect.com/science/article/abs/pii/S1574010203010239) --- the authors’ twenty-year retrospective.
- [Barro, R. J. (2006). Rare disasters and asset markets in the twentieth century. Quarterly Journal of Economics, 121(3), 823-866. DOI: 10.1162/qjec.121.3.823](https://academic.oup.com/qje/article-abstract/121/3/823/1917670) --- the disaster-risk explanation.
- [Campbell, J. Y., & Cochrane, J. H. (1999). By force of habit: A consumption-based explanation of aggregate stock market behavior. Journal of Political Economy, 107(2), 205-251. DOI: 10.1086/250059](https://www.journals.uchicago.edu/doi/10.1086/250059) --- the habit-formation explanation.
Related: Other Studies in This Series
This article is part of an ongoing series on famous studies and findings in behavioral and financial economics. Other entries relevant here cover prospect theory, loss aversion, defaults as an anti-example of robust findings, the disposition effect, and hyperbolic discounting. Related entries on the broader finance replication crisis include Harvey-Liu-Zhu 2016 on factor anomalies. The full hub lives at /replication-crisis/.
FAQ
Has the equity premium puzzle been “solved”? No. Forty years after Mehra and Prescott’s original paper, there is no consensus resolution. The leading candidate explanations — myopic loss aversion, disaster risk, habit formation, long-run risk, heterogeneous agents — all have serious advocates and serious critics. Most working financial economists believe the resolution probably involves elements from several of these classes, but no single explanation has consensus support.
Is the equity premium puzzle still real, or did it go away? The empirical premium has continued to hold up. Updates to the original sample period through the 1980s, 1990s, 2000s, and 2010s show that US stocks have continued to earn substantial premia over bonds. The puzzle generalizes to international markets as well. The empirical phenomenon has been one of the most robust findings in empirical finance.
Should I expect future US equity returns to deliver a 6 percent premium? Probably not. Most modern forward-looking estimates of the US equity premium are in the 3 to 5 percent range, somewhat below the long-run historical average. The historical average includes some portion attributable to unrepeatable luck and to compression of risk premia as more investors have participated in equity markets. For long-horizon financial planning, using a forward-looking premium estimate rather than the historical average is generally more defensible.
What is myopic loss aversion in plain terms? It is the combination of two things: investors weight losses approximately twice as heavily as equivalent gains (loss aversion, from prospect theory), and investors evaluate their portfolios over short horizons rather than the long horizons over which equities actually pay off (myopia). The combination means investors experience the year-to-year volatility of stocks as substantially more painful than a fully rational long-horizon evaluation would suggest, and therefore demand a large premium to hold them.
Why does evaluation frequency matter for my retirement plan? If the myopic-loss-aversion explanation is even partially correct, then less-frequent portfolio evaluation should produce more willingness to bear equity risk and therefore higher long-run returns. The practical implication is that checking your portfolio less often, particularly during downturns, is likely to lead to better long-run outcomes. Many plan sponsors have adjusted reporting frequency and statement design in this direction.
Does the disaster-risk explanation mean a market crash is overdue? Not exactly. The disaster-risk model says investors are rationally pricing the possibility of a catastrophic event that has not occurred in the US 20th-century sample. The model does not predict when or whether such an event will occur — only that the probability is high enough to require compensation. Treating the disaster-risk story as a forecasting tool for market crashes is a misreading of the model.
What does this have to do with employee stock ownership plans? ESOPs and other employee-equity programs concentrate risk in a single asset (the employer’s stock) and typically have long vesting horizons. The myopic-loss-aversion framework predicts that frequent valuation updates will cause employees to experience the volatility of their unvested equity disproportionately, potentially leading to suboptimal decisions about portfolio diversification and risk-taking. Plan design choices about valuation frequency, reporting cadence, and diversification options can all be informed by the framework.