In January 2010, the world’s finance ministers were still living inside the wreckage of the 2008 financial crisis. Public debt had ballooned in every major advanced economy. Stimulus programs that had been politically uncontroversial in 2009 were turning into political liabilities. Greece was about to enter what would become a multi-year sovereign-debt crisis. Ireland, Portugal, and Spain were lining up behind it. In Washington, in Brussels, in Frankfurt, the question that was beginning to dominate fiscal-policy debates was no longer “how do we stimulate the economy” but “how much debt is too much debt.”

Into this debate, in the American Economic Association’s January 2010 meeting volume, dropped a short paper by two of the most-cited economists in the world. Carmen M. Reinhart, then at the University of Maryland, and Kenneth S. Rogoff, of Harvard — co-authors of the 2009 bestseller This Time Is Different: Eight Centuries of Financial Folly — published a five-page contribution to the American Economic Review Papers and Proceedings titled “Growth in a Time of Debt.” It was not peer-reviewed in the traditional sense. The AER Papers and Proceedings is a conference-volume venue, and the paper was a short empirical note rather than a full research article.

But the headline finding was politically explosive. In a dataset of 20 advanced economies spanning roughly six decades, Reinhart and Rogoff reported that countries with gross public debt above 90% of GDP experienced sharply lower growth than countries below that threshold — with median growth approximately one percentage point lower, and average (mean) real GDP growth of roughly −0.1% in the highest-debt category. The implication, as the paper put it, was that “median growth rates for countries with public debt over 90 percent of GDP are roughly one percent lower than otherwise; average (mean) growth rates are several percent lower.”

That single finding, compressed into the slogan “high debt makes growth go negative,” became the empirical backbone of the global austerity case. Paul Ryan cited it in the U.S. House Budget Committee’s “Path to Prosperity.” Olli Rehn, the European Commissioner for Economic and Monetary Affairs, cited it in speeches defending eurozone austerity. George Osborne, the British Chancellor of the Exchequer, drew on it in arguing for the Coalition government’s spending cuts. “We can’t let debt go over 90%” became a piece of policy shorthand circulating through finance ministries on both sides of the Atlantic.

Three years later, in April 2013, a graduate student at the University of Massachusetts Amherst named Thomas Herndon tried to reproduce the Reinhart-Rogoff numbers for a class assignment in his applied econometrics course. He couldn’t. After repeated attempts, he and his advisors — Professors Michael Ash and Robert Pollin — asked Reinhart and Rogoff for the underlying spreadsheet. Reinhart and Rogoff, to their credit, sent it.

The spreadsheet, once its problems were corrected, did not yield the published result. It yielded a different one. The gap between the two was 2.3 percentage points, and closing it made the 90% threshold disappear.

This is the story of how the most influential macroeconomic claim of the post-financial-crisis era was built on a paper whose central finding could not be reproduced — and what every strategist who evaluates economic-policy claims should learn from it.

What Reinhart-Rogoff 2010 Claimed

Reinhart and Rogoff (2010), “Growth in a Time of Debt,” American Economic Review, 100(2), 573–578 (DOI: 10.1257/aer.100.2.573), was structured as a short empirical contribution to the conference volume. The paper drew on a dataset Reinhart and Rogoff had built for This Time Is Different — a panel of 20 advanced economies with annual observations on gross public debt as a share of GDP, paired with annual real GDP growth.

The empirical exercise was straightforward. For each country-year, the authors classified the observation into one of four “debt buckets” based on the gross-debt-to-GDP ratio: below 30%, between 30% and 60%, between 60% and 90%, and above 90%. They then computed average (and median) real GDP growth within each bucket.
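The bucketing step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual workflow, which lived in a spreadsheet. The function names, the treatment of ratios that fall exactly on a bucket boundary, and the six sample rows are all assumptions made for the example.

```python
from statistics import mean, median

# Bucket labels follow the text: below 30%, 30-60%, 60-90%, above 90% of GDP.
BUCKETS = ["<30%", "30-60%", "60-90%", ">90%"]


def debt_bucket(debt_to_gdp: float) -> str:
    """Return the debt bucket for a debt/GDP ratio given in percent.

    Assumption: a ratio exactly on a boundary goes to the higher bucket.
    """
    if debt_to_gdp < 30:
        return "<30%"
    if debt_to_gdp < 60:
        return "30-60%"
    if debt_to_gdp < 90:
        return "60-90%"
    return ">90%"


def growth_by_bucket(rows):
    """rows: iterable of (country, year, debt_to_gdp, growth) tuples.

    Returns {bucket: (mean growth, median growth)} for non-empty buckets,
    pooling all country-years in a bucket together.
    """
    grouped = {bucket: [] for bucket in BUCKETS}
    for _country, _year, debt, growth in rows:
        grouped[debt_bucket(debt)].append(growth)
    return {b: (mean(v), median(v)) for b, v in grouped.items() if v}


# Hypothetical country-years, invented purely to exercise the code.
SAMPLE = [
    ("A", 1990, 25.0, 4.0),
    ("A", 1991, 28.0, 3.0),
    ("B", 1990, 45.0, 3.0),
    ("B", 1991, 65.0, 2.5),
    ("C", 1990, 95.0, 2.0),
    ("C", 1991, 110.0, 1.0),
]

summary = growth_by_bucket(SAMPLE)
for bucket in BUCKETS:
    print(bucket, summary.get(bucket))
```

Note that this sketch pools every country-year within a bucket. As discussed later in the piece, that is not the only way to average, and the choice turns out to matter a great deal.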

The headline result was a discontinuity. Growth rates were broadly similar across the three lower buckets — roughly 3 to 4 percent on average. Then, at the 90% threshold, the authors reported a sharp drop. Average real growth in the over-90% bucket was approximately −0.1%. Median growth in the same bucket was approximately 1.6%, against roughly 2.8% in the next-lower bucket.

The paper was careful in its written claims. Reinhart and Rogoff explicitly cautioned that they were documenting an empirical regularity, not a causal mechanism. They noted that “the relationship between government debt and real GDP growth is weak for debt/GDP ratios below a threshold of 90 percent of GDP. Above 90 percent, median growth rates fall by one percent, and average growth falls considerably more.” They acknowledged that causation could plausibly run in either direction. They did not, in the paper itself, prescribe specific austerity policies.

But the structure of the finding (a threshold, expressed as a clean number, with a dramatic effect on the dependent variable) was tailor-made for policy translation. A finding that “the relationship between debt and growth is complicated and varies by country” does not survive contact with a budget committee. A finding that “debt above 90% of GDP cuts median growth by a percentage point and turns average growth negative” does.

And so the finding was translated. Almost immediately.

How The Paper Shaped Real Policy

The 90% threshold did not stay inside the academic literature for long. By 2011, it was appearing in policy documents, congressional testimony, and central-bank speeches as if it were a settled empirical fact rather than a single-paper finding from a conference volume.

Paul Ryan and the U.S. House Budget Committee. Representative Paul Ryan’s 2013 “Path to Prosperity” budget — the document that formed the Republican fiscal-policy platform for that cycle — explicitly cited Reinhart and Rogoff. The relevant passage on page 78 of the budget read: “A well-known study completed by economists Ken Rogoff and Carmen Reinhart confirms this common-sense conclusion. The study found conclusive empirical evidence that gross debt exceeding 90 percent of the economy has a significant negative effect on economic growth.” Herndon, Ash, and Pollin would later note that this was the only empirical citation in the budget for the proposition that high debt threatens growth.

Olli Rehn and the European Commission. Olli Rehn, the European Commissioner for Economic and Monetary Affairs from 2010 to 2014, invoked Reinhart-Rogoff in defending the eurozone’s austerity programs. In an address to the International Labour Organization on April 9, 2013 — just one week before the Herndon critique would be published — Rehn argued that “public debt in Europe is expected to stabilise only by 2014 and to do so at above 90% of GDP. Serious empirical research has shown that at such high levels, public debt acts as a permanent drag on growth.” The “serious empirical research” was Reinhart-Rogoff (2010).

George Osborne and UK austerity. George Osborne, who became British Chancellor of the Exchequer in May 2010, drew on the Reinhart-Rogoff framework in defending the Coalition government’s “Plan A” of accelerated fiscal consolidation. The reasoning was not always cited explicitly to Reinhart-Rogoff, but the broader 90% threshold logic — that the UK had to bring debt below the dangerous threshold or face permanently lower growth — was a feature of his public arguments throughout the early 2010s.

The cultural shorthand. Beyond the named politicians, the 90% threshold became a piece of finance-ministry common sense. Investment-bank research notes cited it. Wall Street Journal and Financial Times opinion columns invoked it. Bond-market commentary referred to it as a kind of benchmark — countries approaching 90% were “approaching the danger zone.” The finding had been absorbed into the working vocabulary of fiscal policy.

This was the policy environment in which Thomas Herndon — a then-28-year-old graduate student who had previously worked as a community organizer in Texas — was assigned, in the spring 2013 semester of Professor Michael Ash’s applied econometrics class at UMass Amherst, to take a published empirical paper and try to replicate it.

The 2013 Replication Attempt

The replication assignment in Ash’s course was a standard pedagogical exercise. Students chose a recent empirical paper, obtained or constructed the data, and tried to reproduce the reported results. The goal was to teach students how published empirical work was actually done — and to give them practice working with real data, which is almost always messier than the published numbers suggest.

Herndon chose Reinhart-Rogoff (2010). The paper was high-profile, the empirical exercise was conceptually simple (compute averages within debt buckets), and the underlying data — a panel of historical macroeconomic series for advanced economies — was largely available from published sources like the IMF and OECD.

Herndon assembled his own version of the dataset and ran the calculations. He couldn’t reproduce the −0.1% average growth figure for the highest-debt bucket. He kept getting positive numbers in the range of 2%. He went back and rechecked his data and his code. He still got the same answer. The published Reinhart-Rogoff number could not be reproduced from publicly available macroeconomic series.

At this point Herndon, working with Ash and Pollin, did what the replication norms of the field encouraged: he wrote to the original authors and requested the underlying spreadsheet. Reinhart and Rogoff complied. They sent the actual Excel file from which the 2010 paper’s numbers had been computed.

This act of sending the file is important to acknowledge. In the social-science publication culture of the early 2010s, sharing your underlying data and code was rare. Many authors did not respond to replication requests at all. Reinhart and Rogoff did. Whatever criticisms attach to the paper itself, on the question of post-publication transparency they behaved better than most of their peers.

What that spreadsheet contained was another matter.

The Three Errors Herndon Found

Herndon, Ash, and Pollin (2014), “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff,” Cambridge Journal of Economics, 38(2), 257–279 (DOI: 10.1093/cje/bet075), documented three distinct problems with the original analysis. The paper was first circulated as a Political Economy Research Institute working paper on April 15, 2013, and the news exploded within 48 hours. It was formally published in the Cambridge Journal of Economics in 2014. Each of the three problems pushed the published number in the same direction — toward making the high-debt category look worse than the data actually warranted.

Error #1: The Excel coding error. In Reinhart and Rogoff’s spreadsheet, the formula that computed the average growth across the 20 countries for the highest-debt bucket did not cover every country’s row. The cell range used in the AVERAGE() formula stopped at row 44 instead of row 49. The effect was that five countries were silently excluded from the average: Australia, Austria, Belgium, Canada, and Denmark. Not all five contributed years to the over-90% bucket, but those that did, Belgium most prominently, had recorded positive average growth in those periods. Excluding them mechanically dragged the average down. By itself, correcting this one Excel cell-range error moved the high-debt average from −0.1% toward roughly +0.3%, a 0.4-percentage-point change in the direction of the corrected finding.

This is the error that attracted the most attention in the press coverage, because it was the cleanest: a single misspecified formula in a spreadsheet, the kind of bug that any working analyst recognizes and fears. The phrase “the Excel error” became shorthand for the whole controversy. But it was not the largest of the three problems.
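The mechanics of a truncated AVERAGE() range are easy to reproduce. The sketch below is a toy model under stated assumptions: the per-country values are synthetic (they are not the Reinhart-Rogoff figures), the helper name `average_range` is hypothetical, and the starting row of 30 is inferred from 20 country rows ending at row 49. Blank cells are represented as `None` because Excel's AVERAGE ignores empty cells.

```python
from statistics import mean

TOP_ROW = 30  # assumed first country row: 20 rows ending at row 49

# Synthetic per-country average growth in the high-debt bucket, one cell per row.
# None marks a country with no observations in the bucket (a blank cell).
cells = [
    -2.0, None, 1.0, None, 0.5, None, None, -1.0, None, 2.0,   # rows 30-39
    None, None, 0.0, None, -1.0,                                # rows 40-44
    2.5, None, 3.0, 2.0, None,                                  # rows 45-49
]


def average_range(values, first_row, last_row, top_row=TOP_ROW):
    """Mimic AVERAGE(first_row:last_row), inclusive, skipping blank cells."""
    window = values[first_row - top_row : last_row - top_row + 1]
    return mean(v for v in window if v is not None)


truncated = average_range(cells, 30, 44)  # the range that stops five rows early
full = average_range(cells, 30, 49)       # the intended range

print(f"rows 30-44: {truncated:.2f}")
print(f"rows 30-49: {full:.2f}")
```

With these invented numbers the truncated range gives a slightly negative average and the full range a positive one. Nothing in the output flags the omission, which is why this class of bug tends to survive until someone reopens the file.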

Error #2: Selective exclusion of post-WWII data. Reinhart and Rogoff had excluded several years of data from several countries during the late 1940s and early 1950s — the period immediately after World War II, when many advanced economies had very high public debt (from war financing) but were also recording rapid postwar recovery growth. The exclusions were:

  • Australia, 1946–1950
  • Canada, 1946–1950
  • New Zealand, 1946–1949

All three countries had debt-to-GDP ratios above 90% during these excluded years, and all three were recording strong positive growth. The published paper did not document a methodological reason for excluding these specific country-years. Reinhart and Rogoff later argued that these data had been compiled late and had not yet been vetted and incorporated at the time of publication. Whatever the explanation, the effect of the exclusions was to remove the high-debt-plus-high-growth episodes from the dataset while retaining other high-debt episodes that showed lower growth.

The New Zealand case is the most extreme illustration. The published paper used only one year of New Zealand data in the high-debt bucket: 1951, when New Zealand recorded GDP growth of approximately −7.6% in the middle of a severe trade strike (the 1951 waterfront dispute). The excluded 1946–1949 years showed strong positive growth at similarly high debt levels. The full available record of New Zealand high-debt years would have shown average growth of approximately +2.6%, not −7.6%.
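The two New Zealand figures quoted above are enough to back out what the excluded years must have averaged. The short calculation below is a sketch that assumes the full high-debt record consists of exactly the four excluded years (1946–1949) plus 1951, and treats the rounded figures from the text as exact. The function name is hypothetical.

```python
# Figures quoted in the text (both approximate).
GROWTH_1951 = -7.6       # the single New Zealand year used in the published average
FULL_RECORD_MEAN = 2.6   # average over all available high-debt years

# Assumption: the full high-debt record is 1946-1949 plus 1951, five years in all.
EXCLUDED_YEARS = 4
TOTAL_YEARS = EXCLUDED_YEARS + 1


def implied_mean_of_excluded(full_mean, kept_value, n_excluded, n_total):
    """Back out the mean of the excluded years from the overall mean and the kept year."""
    total_growth = full_mean * n_total
    return (total_growth - kept_value) / n_excluded


implied = implied_mean_of_excluded(
    FULL_RECORD_MEAN, GROWTH_1951, EXCLUDED_YEARS, TOTAL_YEARS
)
print(f"implied average growth, 1946-1949: {implied:.2f}%")
```

Under those assumptions the excluded years averaged a little over 5% growth, so the one year that was kept sat more than twelve percentage points below the years that were dropped.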

Error #3: The country-equal weighting scheme. This is the most consequential of the three errors, and the one that requires the most explanation. Reinhart and Rogoff’s averaging procedure was not the standard country-year average that most readers would have assumed. Instead, they computed the average growth rate for each country within the high-debt bucket separately, and then averaged across countries — giving each country equal weight regardless of how many years of high-debt observations it contributed.

The mechanical consequence was that the one year of New Zealand data (the −7.6% from the 1951 strike) was weighted as heavily in the final average as the nineteen years of UK data in the high-debt bucket, or the nineteen years of Greek data. A single anomalous country-year, generated by a domestic strike, was given the same influence on the published result as nearly two decades of British post-war fiscal history.

Combined with the New Zealand year-exclusion error, this weighting scheme had outsized effects. Herndon, Ash, and Pollin calculated that simply correcting the New Zealand mistake — using all available high-debt years for New Zealand instead of just 1951 — moved the average by roughly 1.5 percentage points on its own, before any other correction.

Reinhart and Rogoff would later defend the country-equal weighting as a deliberate methodological choice — a way of preventing any single country with a long high-debt history from dominating the result. There is a reasonable argument for that view; it is not an obviously wrong methodological choice in isolation. But the paper did not flag this weighting as unusual or explain the implications, and most readers — including the policymakers who cited the finding — assumed standard country-year averaging.
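The difference between the two averaging schemes is easiest to see side by side. The sketch below uses a two-country toy dataset: the −7.6% single year and the nineteen-year span come from the text, while the 2.5% growth rate for the long-history country and the count of seven countries in the last step are assumptions chosen purely for illustration. The function names are hypothetical.

```python
from statistics import mean


def country_equal_mean(growth_by_country):
    """Average each country's years first, then average the country means."""
    return mean([mean(years) for years in growth_by_country.values()])


def country_year_mean(growth_by_country):
    """Pool every country-year observation and take a single average."""
    return mean([g for years in growth_by_country.values() for g in years])


def shift_from_one_country(old_mean, new_mean, n_countries):
    """Change in a country-equal average when one country's mean is revised."""
    return (new_mean - old_mean) / n_countries


toy = {
    "one_year_country": [-7.6],           # single high-debt year (figure from the text)
    "long_history_country": [2.5] * 19,   # nineteen years at a hypothetical 2.5%
}

equal = country_equal_mean(toy)    # each country counts once
pooled = country_year_mean(toy)    # each country-year counts once

# Assumption: seven countries enter the country-equal average.
shift = shift_from_one_country(old_mean=-7.6, new_mean=2.6, n_countries=7)

print(f"country-equal weighting: {equal:.2f}%")
print(f"country-year weighting:  {pooled:.2f}%")
print(f"shift from revising one country's mean: {shift:.2f} points")
```

In the toy data the same twenty observations average to about −2.6% under country-equal weighting and about +2.0% when pooled. The last line shows why a single country's revision can move a country-equal average so far: the change is divided by the number of countries, not the number of observations. If roughly seven countries were in the average, replacing −7.6% with +2.6% would shift it by a bit under 1.5 points, in the neighborhood of the figure quoted above.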

The Corrected Result

When Herndon, Ash, and Pollin corrected all three problems — fixed the Excel formula, included the omitted post-WWII years, and used standard country-year weighting — the high-debt average growth figure was not −0.1%. It was +2.2%.

The 2.3-percentage-point shift mattered. Average real GDP growth of +2.2% in the high-debt bucket is not dramatically different from average growth of roughly 3% in the lower-debt buckets. The threshold-style discontinuity that had made the 2010 paper politically explosive was largely an artifact of the three errors. Once they were corrected, the data showed a mild and roughly monotonic negative association between debt and growth, broadly consistent with intuition, but no clean nonlinear threshold at 90% and no collapse of growth into negative territory.

This is the central empirical finding of Herndon-Ash-Pollin (2014). Not “Reinhart and Rogoff are wrong about debt and growth.” More precisely: the specific empirical claim that had driven the policy conversation — that there is a 90% threshold above which growth turns negative — does not survive correction of the errors.

It is worth being precise about what was and was not refuted. The corrected data still showed some negative association between higher debt and somewhat lower growth, particularly in the highest debt bracket. What disappeared was the threshold framing — the cliff at 90% — and the dramatic claim that high-debt episodes were associated with growth contractions on average.

Reinhart-Rogoff’s Response

Reinhart and Rogoff responded publicly within days of the Herndon-Ash-Pollin working paper’s release on April 15, 2013.

Their initial response, published the following day, stood behind both the methodology and the substantive conclusions. By the second statement, however, they explicitly acknowledged the Excel error. “We are grateful to Herndon et al. for the careful attention to our AER Papers and Proceedings paper and for pointing out an important correction,” they wrote. “It is sobering that such an error slipped into one of our papers despite our best efforts to be consistently careful.”

On the other two issues — the country-year exclusions and the weighting scheme — Reinhart and Rogoff pushed back. They argued that the omitted post-WWII data for Australia, Canada, and New Zealand had been compiled later and had not yet been incorporated into the dataset at the time of the AER Papers and Proceedings publication. They argued that the country-equal weighting scheme was a legitimate methodological choice, not an error. They pointed out that their own subsequent work using updated and broader datasets — including a 2012 Journal of Economic Perspectives piece and other follow-ups — continued to find a negative association between high debt and growth, even if not the precise threshold at 90%.

In their April 25, 2013 statement to the New York Times, Reinhart and Rogoff emphasized that their broader research program — which included historical work on financial crises across centuries — was not refuted by the specific errors in the 2010 AER Papers and Proceedings contribution. They argued that the policy implications had been overstated by the politicians who cited them, and that they themselves had been more careful in the paper’s actual language about causation and threshold effects.

There is some merit to this defense. The 2010 paper did not, in its written text, claim to have demonstrated a sharp causal threshold. The leap from “we document a regularity in advanced-economy panel data” to “therefore countries must stay below 90% debt or face permanent damage” was substantially performed by the policymakers who cited the paper, not by the paper itself. But the authors also did not push back against that translation when it was being used to justify austerity programs across Europe and the U.S. The result was that a piece of empirical economics whose central finding rested on three correctable errors was allowed to function, for three years, as the principal academic underpinning of the global austerity consensus.

The Subsequent Empirical Consensus

The Herndon-Ash-Pollin critique did not settle the underlying economic question of whether high public debt impairs growth. It refuted one specific empirical claim — the 90% threshold — but the broader question of the debt-growth relationship is harder, and has continued to be researched.

The post-2013 empirical literature has been notably skeptical of clean threshold claims.

Égert (2015), “Public Debt, Economic Growth and Nonlinear Effects: Myth or Reality?,” Journal of Macroeconomics, 43, 226–238 (DOI: 10.1016/j.jmacro.2014.11.006), used threshold-regression methods on a broader and longer-running panel of countries. Égert found that “no simple public debt threshold exists” — the data did not support a clean, common nonlinear breakpoint at 90% or at any other particular debt level. Where any threshold-like effects could be identified, they were sensitive to specification choices, sample composition, and the inclusion or exclusion of particular country-year observations. The headline conclusion: the simple threshold framing was a “myth” in the data.

Eberhardt and Presbitero (2015), “Public Debt and Growth: Heterogeneity and Non-Linearity,” Journal of International Economics, 97(1), 45–58 (DOI: 10.1016/j.jinteco.2015.04.005), used methods designed to allow the debt-growth relationship to vary across countries and to detect nonlinearities heterogeneously rather than imposing a common threshold. They found support for a generally negative association between high debt and growth — consistent with intuition — but explicitly no support for a “common debt threshold within or across countries.” Whatever debt does to growth, it does not do it via a single number that applies uniformly to advanced economies.

Subsequent meta-analyses have continued in this direction. Heimberger (2022), a meta-analysis in the Journal of Economic Surveys, examined the published literature on debt and growth and was “unable to reject the null hypothesis” of no threshold effect after correcting for publication bias.

The overall picture: the broader question of how high public debt affects growth remains an active empirical literature, with reasonable people disagreeing about magnitudes and mechanisms. The specific claim that there is a clean 90% threshold above which growth collapses — the claim that drove the global austerity debate — has not been replicated in independent samples.

What This Case Tells Us About Economic Policy Research

The Reinhart-Rogoff case sits at an awkward angle to the rest of the replication-crisis literature. There was no fraud. There was no p-hacking in the social-psychology sense. There was not even an exotic statistical procedure that turned out to be improperly applied. There was a spreadsheet, with three problems in it, that produced a number that ended up being repeated by finance ministers on both sides of the Atlantic for three years before anybody checked.

Several structural lessons are worth pulling out.

Single-paper findings from elite institutions can shape massive policy with thin reproducibility. The 2010 AER Papers and Proceedings piece was a five-page note in a conference volume. It was not the full peer-reviewed treatment that would be appropriate for a finding of this policy weight. But the prestige of the authors and the journal lent it a credibility that the format did not warrant. By the time the underlying data had actually been checked, the finding had been in circulation for three years.

The “Papers and Proceedings” format is not the place to publish policy-load-bearing claims. The AER Papers and Proceedings is the conference companion volume for the American Economic Association’s annual meetings. Submissions are short, the review process is light, and the venue is explicitly understood within the discipline as a place for preliminary findings and conference summaries rather than for definitive empirical treatments. The 2010 paper was, by the conventions of the field, a working note. The treatment it received in the policy world was the treatment that a fully vetted, peer-reviewed empirical article would deserve.

Code and data sharing should be automatic, not heroic. Reinhart and Rogoff did share their spreadsheet when asked. That is more than most authors in the field would have done in 2013. But the request itself had to come from a graduate student doing a class assignment three years after publication. In a more functional verification infrastructure, the underlying spreadsheet would have been published alongside the paper, would have been examined by reviewers, and the errors would have been caught before — not three years after — the finding entered the policy conversation. The post-2013 movement toward mandatory code and data archiving in economics journals (the AER and Quarterly Journal of Economics both tightened their data-availability requirements in the years that followed) is in part a response to this case.

Empirical claims that translate cleanly into policy slogans should attract more skepticism, not less. The 90% threshold became powerful in the policy conversation precisely because it was a single number with a clean interpretation. “Don’t go over 90%” is the kind of finding that policy needs. It is also exactly the kind of finding that should attract extra scrutiny — because the world rarely organizes itself around clean thresholds, and the temptation to overstate empirical results in their direction is strong on both the author and the consumer side. The 2010 paper’s literal language was more cautious than the political uses to which it was put. But neither the authors nor the field as a whole pushed back when the cautious finding became the load-bearing claim.

What This Means For Strategists Evaluating Economic Claims

You are not a fiscal-policy economist. You do not need to have a calibrated view on whether the U.S. or eurozone needed more or less austerity in 2010 to 2014. What you do need, when you are evaluating an economic claim that is being used to justify a strategic or political decision, is a framework for asking the right questions.

Three calibrations from the Reinhart-Rogoff case.

First: ask what the underlying paper actually said, versus what the policy users said it said. The 2010 paper’s text was substantially more cautious than the policy slogan extracted from it. “We document a regularity in our sample” is not the same claim as “above 90% debt, growth turns negative as a matter of economic law.” When an economic finding is being deployed in your strategic conversation, locate the original paper. Read what the authors literally claimed, with what hedges, on what data, with what acknowledged limitations. The gap between the academic claim and the policy translation is often substantial.

Second: ask whether the claim has been independently replicated. A single empirical paper, even from elite authors at elite institutions, is a single data point. The Reinhart-Rogoff 90% threshold had been cited in policy documents for three years before anybody had attempted to independently reproduce it. When you see an economic claim being deployed, ask: what are the independent replications? Have other research teams, using their own datasets and code, found the same effect? In the case of the 90% threshold, the answer when independent teams checked was: no.

Third: ask whether the data and code are available, and whether anyone has examined them. The Reinhart-Rogoff Excel error was sitting in a spreadsheet that, until 2013, no independent researcher had ever opened. The economics-profession norm of voluntary code-and-data sharing has improved substantially since then. But for any empirical claim that is being used to justify a major decision, the existence of independent examination of the underlying numbers is a real signal of credibility, and the absence of it is a real warning.

The broader frame: macroeconomic policy claims are very often invoked with more empirical confidence than the underlying research warrants. This is not a left-versus-right point. It applies to claims about debt thresholds, to claims about minimum-wage effects, to claims about tax-cut elasticities, to claims about trade-policy impacts. The pattern that played out in the Reinhart-Rogoff case — a single high-profile paper, an elegant numerical claim, rapid adoption into policy slogans, delayed replication checking, eventual disconfirmation of the strong form — is not unique to debt-and-growth research. It is the default failure mode of how high-profile empirical economics interacts with the policy conversation.

When you are evaluating an economic claim in a strategic context, treat it the way you would treat any other empirical claim: with calibration appropriate to the strength of the evidence, not the prestige of the authors or the cleanliness of the number.

Sources

Primary literature:

  • Reinhart, C. M., & Rogoff, K. S. (2010). Growth in a Time of Debt. American Economic Review, 100(2), 573–578. DOI: 10.1257/aer.100.2.573.
  • Herndon, T., Ash, M., & Pollin, R. (2014). Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff. Cambridge Journal of Economics, 38(2), 257–279. DOI: 10.1093/cje/bet075. PERI working paper version (WP 322).
  • Égert, B. (2015). Public Debt, Economic Growth and Nonlinear Effects: Myth or Reality? Journal of Macroeconomics, 43, 226–238. DOI: 10.1016/j.jmacro.2014.11.006.
  • Eberhardt, M., & Presbitero, A. F. (2015). Public Debt and Growth: Heterogeneity and Non-Linearity. Journal of International Economics, 97(1), 45–58. DOI: 10.1016/j.jinteco.2015.04.005.
  • Heimberger, P. (2022). Do Higher Public Debt Levels Reduce Economic Growth? Journal of Economic Surveys. (Meta-analysis of the post-2013 literature on debt-growth thresholds.)

Reinhart-Rogoff response:

Contemporaneous commentary:

  • Konczal, M. (2013, April 16). Researchers Finally Replicated Reinhart-Rogoff, and There Are Serious Problems. Roosevelt Institute / Next New Deal blog.
  • Krugman, P. (2013, April 19). The Excel Depression. The New York Times opinion column.
  • Bhattacharjee, Y. (2013, April 24). After Error is Revealed, Professor Pair Defends Core Conclusions. The Harvard Crimson.
  • Cassidy, J. (2013, April 26). The Reinhart and Rogoff Controversy: A Summing Up. The New Yorker.

Policy citations of the original paper:

  • House Budget Committee. (2013). The Path to Prosperity: A Responsible, Balanced Budget. Fiscal Year 2014 Budget Resolution. U.S. House of Representatives, p. 78 (citing Reinhart-Rogoff for the 90% debt-threshold claim).
  • Rehn, O. (2013, April 9). Address to the International Labour Organization, Geneva (invoking Reinhart-Rogoff in defending European fiscal consolidation).

Frequently Asked Questions

Is high public debt bad for economic growth?

The empirical literature is mixed and continues to evolve. Most studies do find some negative association between very high public-debt levels and somewhat lower subsequent growth, particularly at the upper end of the distribution. What the post-Herndon literature has largely failed to find is a clean nonlinear threshold — a specific debt level (whether 90% or any other number) above which growth dramatically collapses. The relationship is more gradual and more country-specific than the 2010 Reinhart-Rogoff paper implied. Reasonable economists continue to disagree about magnitudes and policy implications.

Does this mean the austerity programs of 2010–2014 were wrong?

The Reinhart-Rogoff critique doesn’t directly answer that question. Whether austerity was the right policy response to the post-2008 environment depends on many considerations beyond a single debt-and-growth correlation — monetary policy, fiscal multipliers, exchange-rate regimes, structural unemployment, financial fragility, and others. What the critique establishes is that one specific empirical claim that was used to justify the austerity case — the 90% threshold — does not survive correction of errors in the underlying paper. Whether that changes the overall policy verdict depends on what weight you placed on that specific claim in the first place. Many of the policymakers who cited Reinhart-Rogoff most heavily would, presumably, have favored austerity for independent reasons; for others, the threshold finding was load-bearing.

Was the Excel error the main problem, or was it a small part of a bigger story?

The Excel error was the most publicized, because it was the cleanest and the easiest to explain. But of the three problems Herndon-Ash-Pollin documented, the country-equal weighting scheme combined with the selective exclusion of post-WWII data did substantially more damage to the published average than the Excel formula error alone. The 2.3-percentage-point swing in the headline number came from all three problems together. The Excel error by itself accounted for roughly 0.4 percentage points of that swing.
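To see why the weighting choice can move an average this much, here is a minimal Python sketch. The growth figures, country labels, and function names are all invented for illustration; they are not the Reinhart-Rogoff data and this is not the authors' spreadsheet logic. The sketch only contrasts two ways of averaging the same observations: country-equal weighting (average each country first, then average the countries) and country-year weighting (pool every observation and average once).

```python
# Hypothetical growth rates (percent) for country-years in a high-debt category.
# These values are made up for illustration only.
growth_by_country = {
    "Country A": [2.5, 2.0, 3.0, 2.5, 2.0, 3.0, 2.5, 2.5],  # 8 high-debt years
    "Country B": [-6.0],                                     # 1 high-debt year
    "Country C": [1.0, 2.0],                                 # 2 high-debt years
}


def country_equal_mean(data):
    """Average each country's years first, then average the country means.

    Every country counts once, no matter how many years it contributes.
    """
    country_means = [sum(years) / len(years) for years in data.values()]
    return sum(country_means) / len(country_means)


def country_year_mean(data):
    """Pool all country-year observations and take a single average.

    Every year counts once, so long episodes carry more weight.
    """
    pooled = [g for years in data.values() for g in years]
    return sum(pooled) / len(pooled)


equal = country_equal_mean(growth_by_country)
pooled = country_year_mean(growth_by_country)

print(f"country-equal weighting: {equal:.2f}%")   # -0.67%
print(f"country-year weighting:  {pooled:.2f}%")  # 1.55%
```

In this toy example, one bad year in Country B counts as much as eight ordinary years in Country A under country-equal weighting, which is enough to flip the sign of the average. Neither scheme is automatically wrong, but the choice needs to be stated, because it can change the headline number.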

Did Reinhart and Rogoff commit research misconduct?

No. The Herndon-Ash-Pollin critique did not allege fraud, and no investigation has ever suggested fabrication of data. The three problems they identified were (1) a clerical spreadsheet error, (2) data-handling choices that had not been documented in the published paper, and (3) a methodological choice (country-equal weighting) that was defensible but should have been flagged as unusual. Reinhart and Rogoff acknowledged the Excel error openly, shared their underlying data when asked, and defended their broader research program. They made a mistake. They did not commit misconduct.

How common are spreadsheet errors in published economics research?

More common than the field has historically wanted to admit. The post-2013 movement toward mandatory data-and-code archiving in major economics journals — including stricter policies at the American Economic Review, the Quarterly Journal of Economics, and several others — was driven in part by the recognition that the verification infrastructure of the field had been thin. A 2018 follow-up study by Chang and Li attempted to replicate 67 empirical economics papers from leading journals and was able to do so for only about one-third without significant assistance from the original authors. Spreadsheet bugs, off-by-one errors, version-control mistakes, and undocumented data transformations are not exotic events; they are the normal background noise of empirical work. The point is not that they should never happen — the point is that the system needs to be designed to catch them before policy is built on top of them.

What about Reinhart and Rogoff’s broader body of work, including the 2009 book This Time Is Different?

This Time Is Different: Eight Centuries of Financial Folly (Princeton University Press, 2009) is a separate scholarly contribution and remains influential in the financial-crisis literature. The Herndon-Ash-Pollin critique was specifically of the short 2010 AER Papers and Proceedings piece, not of the broader Reinhart-Rogoff research program. Reasonable readers should evaluate This Time Is Different and the related papers on their own terms. The narrow point of the 2013 controversy was that one specific empirical claim, in one specific paper, did not survive replication — not that the entire body of work is in question.

What changed in academic economics after this case?

Several things. Major economics journals tightened their data-availability and code-sharing requirements. The American Economic Association moved toward mandatory replication packages for empirical papers in its main journals. Replication exercises moved closer to the center of graduate-level econometrics training (Herndon’s class assignment is now widely cited as a teaching example of why this matters). And the broader empirical-economics community absorbed, somewhat reluctantly, the lesson that high-profile single-paper claims should not function as load-bearing policy evidence without independent reproduction. Whether these reforms have been sufficient is itself debated — Chang and Li’s 2018 replication study suggests that reproducibility in published economics remains imperfect — but the trajectory has been in the right direction.

What’s the strategist’s takeaway?

When an economic claim is being deployed to justify a strategic decision, calibrate your confidence to the strength of its evidence base, not to the prestige of its source. Locate the original paper. Read what it literally said, with what hedges. Ask whether independent teams have replicated the finding. Ask whether the data and code are publicly available, and whether anyone has examined them. The Reinhart-Rogoff case is the canonical illustration of what happens when a finding from elite authors at elite institutions circulates faster than its empirical foundations can support. The cost, when the foundations turn out to be thinner than they appeared, can be measured in policy decisions made for the wrong reasons.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.