For two decades, Brian Wansink was the most-quoted food psychologist in America. He ran the Cornell Food and Brand Lab. He wrote Mindless Eating: Why We Eat More Than We Think (Bantam, 2006), a popular-science bestseller that translated his lab’s findings (bigger plates make you eat more, deep bowls make you serve more soup, bigger buckets make you eat more popcorn even when it is stale) into a unified theory that small environmental cues, not willpower, drive most of what humans put in their mouths. He gave TED talks. He advised PepsiCo, Coca-Cola, and the United States Army. From November 2007 through January 2009, he served as executive director of the USDA’s Center for Nutrition Policy and Promotion, where he oversaw the development of the 2010 Dietary Guidelines for Americans and helped shape what would become the MyPlate visual replacement for the food pyramid. He founded the Smarter Lunchrooms Movement, whose nudge-based recommendations (placing fruit near the cash register, renaming carrots as “X-Ray Vision Carrots”) were adopted by tens of thousands of US schools under Obama-era federal guidance.

By every external measure, his research program was a model of policy-relevant behavioral science: prolific, accessible, intuitive, and translated into real-world interventions at federal scale.

Then, on a Monday morning in November 2016, Wansink published a blog post on his personal site praising a graduate student for her persistence in “salvaging” a failed dataset. Three researchers (Tim van der Zee, Jordan Anaya, and Nick Brown) read the post. What he was describing, in tones of professional admiration, was textbook p-hacking. They began checking his published work. Within fifteen months they had documented thousands of statistical inconsistencies across his bibliography. Within twenty-two months, Cornell University concluded its formal investigation and found that he had committed academic misconduct. He resigned, effective June 30, 2019. As of mid-2026, at least 18 of his papers have been retracted (one of them twice), and at least 16 more have been corrected.

The unusual feature of the Wansink case, the one that distinguishes it from the Stapel fraud or the Bem precognition controversy, is that no whistleblower triggered the investigation. No junior colleague walked into a dean’s office. No collaborator betrayed a confidence. Wansink unmasked himself, in public, in a blog post he believed was an inspiring story about academic mentorship. He did not appear to recognize that what he was describing was misconduct. The statisticians who read his post recognized it immediately.

This is the story of how that happened, what investigation it triggered, and what every strategist who cites “research-backed” claims about consumer behavior should learn from it.

The Blog Post That Ignited Everything

On November 21, 2016, Wansink published a post on his personal blog titled “The Grad Student Who Never Said No.” The blog has since been taken offline, but the post is preserved on the Internet Archive and was quoted extensively in the commentary that followed.

The narrative arc was conventional academic mentorship hagiography. Wansink described welcoming a Turkish visiting graduate student — Özge Siğirci — to his Cornell lab. He offered her a choice of two projects. The first was a “rich and unique” dataset from a self-funded study his lab had previously run: a month of observations at an all-you-can-eat Italian buffet, where some diners were charged half-price and others full price, with the hypothesis that price would affect consumption and satisfaction. The hypothesis had failed. The dataset had produced null results. Wansink, in his own telling, told Siğirci something close to: “This cost us a lot of time and our own money to collect. There’s got to be something here we can salvage because it’s a cool dataset.” The second option was a different, freshly funded project.

Siğirci chose the null dataset. In the blog post, Wansink praises her for not “giving up.” He describes her months of effort slicing the data into subgroup after subgroup — “males, females, lunch goers, dinner goers, people sitting alone, people eating with groups of 2, people eating in groups of 2+, people who order alcohol, people who order soft drinks, people who sit close to the buffet, people who sit far away” — looking for statistical relationships that would yield publishable findings. The post celebrates the result: four published papers eventually emerged from this single null dataset, which Wansink framed as a triumph of persistence over a discouraging start.

To Wansink, this was a story about a hardworking student and a senior advisor’s commitment to extracting value from sunk-cost research effort. To Tim van der Zee, Jordan Anaya, and Nick Brown — three independent researchers with backgrounds in education, computational biology, and psychology, respectively, who had been involved in informal post-publication peer review for years — the post described, in unguarded plain language, the exact methodology that the replication-crisis literature had spent five years documenting as the engine of false-positive findings in social science. Take a dataset. Test enough subgroups. Run enough interactions. Cherry-pick the comparisons that cross the p < 0.05 threshold. Write each surviving finding up as if it had been the primary hypothesis from the start. Publish four papers from a “failed” study.

The technical name for this is p-hacking in combination with HARKing (“Hypothesizing After the Results are Known,” Kerr, 1998) and salami-slicing (splitting one underpowered study into multiple publications). Wansink had publicly described doing all three.

Within a month, the post drew critical commentary on Andrew Gelman’s Statistical Modeling, Causal Inference, and Social Science blog (December 15, 2016: “Hark, hark! the p-value at heaven’s gate sings”). Van der Zee, Anaya, and Brown, meanwhile, pulled all four of the resulting “pizza papers” and began a systematic statistical audit.

What The Statisticians Found

The pizza papers, four studies published between 2014 and 2016 from the buffet dataset, were:

  1. Just, D. R., Sigirci, O., & Wansink, B. (2014). Lower buffet prices lead to less taste satisfaction. Journal of Sensory Studies, 29(5), 362–370.
  2. Just, D. R., Sigirci, O., & Wansink, B. (2015). Peak-end pizza: Prices delay evaluations of quality. Journal of Product & Brand Management, 24(7), 770–778.
  3. Siğirci, O., Rockmore, M., & Wansink, B. (2016). How Vietnam war veterans suffer from food-related life: A reverse-life-perspective. Food Quality and Preference, 53, 17–24. (Note: this one was on veterans, not pizza, but came from a related dataset and was eventually retracted on similar statistical grounds.)
  4. Wansink, B., & Sigirci, O. (2014). Eating heavily: Men eat more in the company of women. Evolutionary Psychology, 12(2), 147470491401200214.

In January 2017, van der Zee, Anaya, and Brown posted a PeerJ preprint titled “Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab.” It was later published in the peer-reviewed journal BMC Nutrition (van der Zee, Anaya, & Brown, 2017, BMC Nutrition, 3(1), 54; DOI: 10.1186/s40795-017-0167-x).

The paper documented, in dispassionate technical language, what the four pizza publications looked like under sustained statistical scrutiny. Their findings were extensive. Among the specific problems they identified:

  • Incongruous sample sizes within and between the four papers. The same restaurant buffet, the same study period, should have produced a consistent count of diners observed. It did not. The four papers reported different sample sizes for what should have been overlapping or identical subsets of the same dataset, with no methodological explanation for the divergence.

  • Impossible degrees of freedom. In some reported analyses, the degrees of freedom of between-participant test statistics were larger than the sample size — a mathematical impossibility. A test statistic cannot have more degrees of freedom than there are participants in the comparison.

  • F and t statistics inconsistent with reported means and standard deviations. Using the same arithmetic any reviewer could apply, the authors showed that many of the reported F-tests and t-tests in the pizza papers could not have been computed from the reported sample means and standard deviations. Either the test statistics were wrong, or the descriptive statistics were wrong, or both.

  • Application of the GRIM test (Granularity-Related Inconsistency of Means). GRIM, developed by Brown and Heathers (2017), checks whether a reported mean is mathematically possible given the sample size and the integer nature of the underlying scale. Many of Wansink’s reported means failed GRIM — they could not have been produced by the sample size and measurement scale described.

  • SPRITE (Sample Parameter Reconstruction via Iterative Techniques). A complementary tool, also developed by the same circle of statistical sleuths, reconstructs plausible underlying datasets from reported summary statistics. Applied to Wansink’s papers, SPRITE could not generate plausible underlying data for many of his reported descriptive statistics.
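The GRIM idea in the list above is simple enough to sketch in a few lines. The following is a minimal illustration, not the auditors' own code: it assumes each response is a single integer, and the function name `grim_consistent` and the example numbers are hypothetical, chosen only to show the arithmetic.

```python
import math


def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if some integer total over n respondents could produce
    the reported mean once rounded to `decimals` places.

    Assumption: every response is one integer (e.g. a single Likert item).
    """
    half_unit = 0.5 * 10 ** -decimals      # largest rounding error allowed
    approx_total = reported_mean * n       # the sum the mean implies
    # Only the two integers bracketing the implied sum can possibly match.
    for total in (math.floor(approx_total), math.ceil(approx_total)):
        if abs(total / n - reported_mean) <= half_unit + 1e-9:
            return True
    return False


# Illustrative values, not taken from any Wansink paper.
print(grim_consistent(3.45, 20))  # 69 / 20 = 3.45, so this mean is possible
print(grim_consistent(3.47, 20))  # no integer total over 20 people gives 3.47
```

The tolerance of half a reporting unit accepts either rounding convention (half-up or half-even), so the sketch errs on the side of not flagging a mean that is merely rounded differently.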

In total, across just these four pizza papers, van der Zee and colleagues documented approximately 150 inconsistencies — impossible values, incorrect ANOVA results, dubious p-values, and discrepancies between text and tables.

This was not a marginal critique. It was a comprehensive demonstration that the four papers, in aggregate, were either built on numbers that were arithmetically impossible or had been so carelessly produced that no review process should have allowed them through.
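Two of the checks described above, degrees of freedom against sample size and test statistics against reported means and standard deviations, need nothing beyond the summary numbers printed in a paper. The sketch below assumes a simple two-group comparison with a pooled-variance Student's t-test. The function names, the tolerance, and the example figures are illustrative assumptions, not values from the pizza papers, and a careful audit would also account for rounding in the reported means and SDs.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t and its degrees of freedom from summary stats."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def check_report(reported_t, reported_df, mean1, sd1, n1, mean2, sd2, n2, tol=0.05):
    """List the ways a reported between-participant t-test disagrees with
    the descriptive statistics reported alongside it."""
    t, df = pooled_t(mean1, sd1, n1, mean2, sd2, n2)
    problems = []
    if reported_df > df:
        problems.append(f"reported df {reported_df} exceeds the maximum {df}")
    if abs(abs(t) - abs(reported_t)) > tol:
        problems.append(f"reported t {reported_t} vs recomputed t {t:.2f}")
    return problems


# Hypothetical report: two groups of 50, means 4.0 and 3.5, both SDs 1.0.
print(check_report(2.50, 98, 4.0, 1.0, 50, 3.5, 1.0, 50))   # consistent
print(check_report(4.10, 120, 4.0, 1.0, 50, 3.5, 1.0, 50))  # two problems
```

For a two-group comparison the F statistic is the square of t, so the same recomputation covers a reported one-way ANOVA with two levels.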

And the four pizza papers were only the beginning. Anaya, Brown, van der Zee, and others extended the audit across Wansink’s broader bibliography in the months that followed, publishing additional analyses as PeerJ preprints and on informal forums throughout 2017. The catalog of problems grew: papers where reported means could not arise from the stated sample sizes; papers where the same dataset appeared, slightly recharacterized, in multiple “independent” publications; papers where the methodology described was incompatible with the analyses reported.

The Patterns That Emerged

Across the broader audit, the patterns clustered into recognizable categories. Each is well-documented in the replication-crisis methodological literature as a marker of unreliable research:

P-hacking. Running enough comparisons on a single dataset that some cross the p < 0.05 threshold by chance, then reporting only the surviving comparisons. The pizza papers were the cleanest example: one dataset, four publications, and no acknowledgment in any individual paper of how many other comparisons had been tested and discarded.

HARKing (Hypothesizing After the Results are Known). Presenting a finding that emerged from exploratory analysis as if it had been the primary hypothesis of the study from the outset. Wansink’s blog post described this practice plainly: the buffet study’s original hypothesis (price affects consumption) had failed, but each of the four resulting publications presented its surviving subgroup finding as the study’s primary research question.

Salami-slicing. Splitting what should have been a single statistical analysis across multiple publications, each presented as an independent study. The buffet dataset was sliced into four papers when, at most, it justified one paper reporting null results.

Recycled data presented as independent studies. Multiple Wansink papers reported what appeared to be the same dataset (same sample sizes, same descriptive statistics, same subjects) characterized as if it were drawn from different studies. When the same data is presented as if from different studies, citation counts and apparent replication evidence are artificially inflated.

Impossible or implausible descriptive statistics. GRIM and SPRITE, applied to Wansink’s papers, repeatedly flagged reported values that could not have arisen from the methodology described, pointing to fabrication, to arithmetic errors, or to systematic carelessness in transcription. The investigation could not always distinguish between these explanations from the published text alone.

Inappropriate authorship. Cornell’s eventual finding (September 2018) included “inappropriate authorship” as one of the four categories of academic misconduct documented — referring to the practices around who was listed as an author on which papers, and what their actual contribution had been.
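The p-hacking pattern at the top of this list is easy to quantify. As a rough sketch, assume the eleven subgroups named in the blog post were each tested once, that the tests were independent, and that no real effect existed. Real subgroups overlap, so independence is a simplifying assumption, and the function names here are hypothetical.

```python
import random


def familywise_rate(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** k


def simulate(k: int, alpha: float = 0.05, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo version of the same quantity."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under a true null hypothesis each p-value is uniform on [0, 1].
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


print(round(familywise_rate(11), 4))  # 0.4312
print(simulate(11))                   # close to the analytic value
```

Under these assumptions, a dataset with no real effect still hands the analyst at least one “significant” subgroup about 43% of the time, and that is before interactions or alternative outcome measures are tried.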

In December 2017, BuzzFeed News reporter Stephanie M. Lee obtained internal Wansink lab emails through public-records requests filed at New Mexico State University, which employed Wansink’s longtime collaborator Collin Payne. Her February 25, 2018 investigation — “Here’s How Cornell Scientist Brian Wansink Turned Shoddy Data Into Viral Studies About How We Eat” — quoted Wansink’s own emails to collaborators in language that paralleled, and amplified, the blog post that had started the audit. “Work hard, squeeze some blood out of this rock, and we’ll see you soon,” Wansink had written to Siğirci about the pizza data. To a different collaborator on a different study: “It looks like stickers on fruit may work (with a bit more wizardry).” Lee documented Wansink directing collaborators to run “400 strategic mediation analyses” in search of a publishable finding, and asking collaborators to re-run analyses with the framing “It seems to me it should be lower” when an initial p-value did not meet the 0.05 threshold.

The emails moved the case from “the published statistics are impossible” to “the email record shows the senior author explicitly directing his lab to manufacture findings from data that did not support them.” This was the moment the case became impossible for Cornell to slow-walk.

Cornell’s Investigation And Conclusion

Cornell University opened a formal misconduct inquiry in April 2017, expanded it in February 2018 after the BuzzFeed reporting, and concluded it in September 2018.

On September 20, 2018, Cornell Provost Michael I. Kotlikoff issued a formal statement reporting the findings of the faculty investigative committee. The conclusion was unambiguous:

“A Cornell faculty committee … found that Professor Wansink committed academic misconduct in his research and scholarship, including misreporting of research data, problematic statistical techniques, failure to properly document and preserve research results, and inappropriate authorship.” — Cornell University Provost statement, September 20, 2018

The four named categories — misreporting of data, problematic statistical techniques, failure to document and preserve results, and inappropriate authorship — corresponded precisely to the patterns that van der Zee, Anaya, Brown, Lee, and the broader audit community had identified in his published work and email record over the preceding twenty-two months.

Cornell removed Wansink from all teaching and research duties for the academic year. He was required to spend the remainder of his time at the university cooperating with a review of his prior research. He submitted his resignation, effective June 30, 2019.

Wansink’s own public statement admitted to mistaken reporting, poor documentation, and “some statistical mistakes” but maintained that there had been “no fraud, no intentional misreporting, no plagiarism, [and] no misappropriation” in his work. This characterization is incompatible with the email record that BuzzFeed had published (emails in which Wansink himself directed collaborators to find significance in null datasets), but it is the position he has maintained publicly. Cornell’s institutional finding was the contrary one.

The day before the Cornell statement, on September 19, 2018, JAMA (the Journal of the American Medical Association) retracted six Wansink papers simultaneously after its own internal investigation reached a similar conclusion about the underlying data quality.

The 19+ Retractions

The retractions accumulated over years as journals worked through their internal evaluations of his published work.

As of mid-2026, at least 18 of Wansink’s papers have been formally retracted, with one paper having been retracted twice (after the initial retraction was found to be insufficient and a more comprehensive retraction notice was issued). At least 16 additional papers have received formal corrections (corrigenda or errata) that materially alter their reported findings without rising to the level of full retraction. Retraction Watch maintains the most comprehensive running tally; since 2017 the count has risen roughly once a year as additional journals complete reviews.

The retracted papers span the major themes of Wansink’s career:

  • The pizza papers (Italian buffet dataset) — the original target of the van der Zee, Anaya, and Brown audit.
  • “Eating heavily: Men eat more in the company of women” (Evolutionary Psychology, 2014) — a viral finding that was widely covered in popular press.
  • “The Joy of Cooking Too Much: 70 Years of Calorie Increases in Classic Recipes” (Annals of Internal Medicine, 2009) — retracted in December 2018 after the Annals editors concluded that “almost every number was different from those in the published article” when they attempted to reproduce the analysis. The paper had been cited approximately 20 times.
  • “Meal Size, Not Body Size, Explains Errors in Estimating the Calorie Content of Meals” (Annals of Internal Medicine, 2006) — retracted after the investigators found the paper reported a mean age for participants when age was not a variable actually collected in the underlying study. Cited approximately 77 times.
  • Six papers retracted by JAMA on September 19, 2018 — including findings on portion size, plate design, and consumer food choice that had been cited heavily in the obesity-intervention literature.

The corrections are arguably as informative as the retractions. A correction means that a journal investigated the paper, found that the published statistics did not match the underlying data, and required the authors to publish revised numbers — but did not deem the underlying claim invalidated to the point of full retraction. In Wansink’s case, this happened more than a dozen times. It indicates, at minimum, a sustained pattern of careless arithmetic across the bibliography.

The damage extended beyond the directly retracted papers. By 2020, citation analyses had documented that approximately a third of Wansink’s broader bibliography had been affected by formal corrections, retractions, or unresolved post-publication challenges. Specific findings that had been cited as the foundation of subsequent research programs were now in question, including findings that had been incorporated into the Smarter Lunchrooms Movement nudge recommendations.

What This Means For “Behavioral Nutrition” And Consumer Research

The Wansink case did not merely retire a single researcher. It forced a reassessment of an entire research program that had been translated into federal nutrition policy and into the operational practice of school districts, restaurants, and consumer product designers across the United States and abroad.

The Smarter Lunchrooms Movement was particularly affected. The movement’s recommendations — placing fruit at eye level near the cash register, renaming healthy items with playful names, redesigning lunch lines to put fruit before chips — were adopted by tens of thousands of US schools under USDA guidance based substantially on Wansink lab research. A 2018 reassessment by the Cochrane Collaboration’s nutrition group and a separate audit by researchers at the University of Connecticut concluded that many of the specific quantitative claims attributed to Smarter Lunchrooms research — “renaming carrots as X-Ray Vision Carrots increased consumption by 66%” — could not be supported by the underlying data and methodology. The Cornell Food and Brand Lab itself was eventually disbanded in March 2019, and the Smarter Lunchrooms Movement website ceased operation. Schools that had adopted the interventions were generally permitted to continue them — they were low-cost and uncontroversial — but the framing that they were “evidence-based” in any rigorous sense was withdrawn.

Restaurant menu and packaging research. Wansink’s findings on portion size, plate size, glass shape, and consumer satiety perception had been incorporated into commercial product design and marketing strategy for two decades. Many of the specific quantitative claims (“a 25cm plate causes diners to serve 35% more food than a 20cm plate”) are now in the same status as the Smarter Lunchrooms claims: not necessarily false, but not rigorously established by the cited research. Marketers and product designers continuing to cite these specific findings should treat them as unreliable until independently replicated.

The broader “behavioral nutrition” research program. Wansink was not the only researcher in this space, but he was its most visible and prolific contributor. The cascading effect of his discrediting was to put adjacent findings — from collaborators, from co-authors, from researchers who had built on his published claims — into a status of heightened scrutiny. Some have replicated; some have not. Some are still being audited.

The broader concept that environmental design influences eating behavior — that defaults, salience, portion sizes, and visual cues shape consumption — is not refuted by the Wansink case. The general framework remains supported by adjacent research streams, including the more rigorous work in behavioral economics by researchers such as Richard Thaler and Cass Sunstein (whose 2008 book Nudge articulated the conceptual framework independent of Wansink’s specific quantitative claims). What is in question is the specific magnitude and replicability of many individual findings that Wansink’s lab had reported.

What’s Honest To Say About Mindless Eating And Choice Architecture Now

The honest summary, as of 2026:

Many of the specific findings reported in Mindless Eating are unreliable. The book draws heavily on the Cornell Food and Brand Lab’s primary research and presents specific quantitative claims — about plate size, soup bowl design, popcorn freshness, candy dish proximity — that depend on individual studies now retracted or corrected. A reader who wants to cite specific numbers from the book should check whether the underlying study has been retracted before doing so.

The conceptual framework of choice architecture is more robust. Thaler and Sunstein’s Nudge (Yale University Press, 2008), Kahneman’s Thinking, Fast and Slow (Farrar, Straus and Giroux, 2011), and the broader behavioral-economics literature articulate the idea that environmental defaults and frictions influence consumer decisions without relying on Wansink’s specific quantitative claims. This conceptual framework has substantial supporting evidence from independent research programs, including the well-replicated finding that opt-in versus opt-out defaults dramatically change participation rates in retirement savings programs, organ donation, and other domains.

Specific quantitative claims should be treated as provisional. Marketers who want to cite, for example, “a 20% reduction in plate size leads to a 22% reduction in food consumption” (a claim of the kind that proliferated in popular nudge writing in the 2010s) should first verify that the underlying study has not been retracted or corrected, and ideally should also find an independent replication of the specific quantitative effect.

Calibration matters more than direction. It may well be true, in many cases, that the directional claims in Wansink’s work were correct — that smaller plates do reduce consumption, on average, by some amount — even when the specific quantitative claims (35% reduction, 22% reduction, etc.) are unreliable. The strategic implication is to make decisions based on the directional logic rather than on the specific magnitudes, and to treat any quantitative claim with caution unless it has been independently replicated.

What This Means For Strategists Evaluating “Research-Backed” Marketing And Wellness Claims

If you are a CEO, consultant, or marketing leader who relies on behavioral-science research to inform decisions, the Wansink case is the cleanest available demonstration that “research-backed” claims about consumer behavior can be built on a foundation of impossible numbers, p-hacked subgroup analyses, and recycled datasets — and that this foundation can persist for two decades, can be translated into federal policy, and can underlie commercial product design before the audit catches up.

The practical implications:

Pattern-recognize “too-clean” behavioral findings. Wansink’s findings were striking partly because the effect sizes were implausibly large for the kind of intervention described (renaming a vegetable; resizing a plate by a few centimeters). When a single small environmental change is reported to produce a 30%+ change in consumer behavior, that magnitude is itself a warning sign. Real consumer behavior is noisy, and real-world interventions typically produce modest effects. A striking, clean, large effect from a single study deserves heightened scrutiny, not heightened citation.

Be suspicious of “many publications from one dataset.” Wansink’s pizza papers were a public demonstration of salami-slicing: one underpowered study split into four publications. If a body of research consists of many publications that share suspiciously similar sample sizes, study sites, or methodology, ask whether they are really independent studies or whether they are slices of a single underlying dataset being presented multiple times.

Check whether headline numbers have been retracted or corrected. Before citing a behavioral-research finding to support a strategic decision, search the paper’s DOI on Retraction Watch (retractionwatch.com) or check the journal’s record for a retraction notice or corrigendum. This takes thirty seconds and can prevent citing a discredited finding.

Weigh independent replications far more than the original paper. The Wansink case, like the broader replication crisis, makes clear that a single published finding (even in a high-impact journal, even from a prestigious institution, even with substantial press coverage) should be treated as unverified until someone else has reproduced it. Findings that have been replicated by independent labs, ideally with preregistration and open data, are materially more credible than findings that have not.

Distinguish concept from quantitative claim. Even when specific numbers from a research program are unreliable, the broader conceptual framework may remain supportable from adjacent research. Choice architecture, environmental defaults, friction reduction — these conceptual frameworks survive the Wansink case. The specific quantitative magnitudes attributed to Wansink’s individual studies often do not. Strategic decisions can often be made on the conceptual framework alone, without committing to specific magnitudes that may not hold up.

Be skeptical of behavioral interventions adopted at scale without subsequent replication. The Smarter Lunchrooms Movement was adopted by tens of thousands of US schools before its underlying evidence had been independently replicated. The interventions were largely benign, but the “evidence-based” framing turned out to be more confident than the underlying research supported. When a behavioral intervention is being marketed to your organization with “evidence-based” framing, ask: which studies, in which journals, when, and what is the independent replication evidence?
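The retraction check recommended above can also be scripted when a team cites many papers. The sketch below assumes you have a locally saved CSV export of retracted and corrected papers with one DOI per row. The column name `OriginalPaperDOI`, the function names, and the sample DOIs are assumptions for illustration, so adjust them to whatever file you actually have.

```python
import csv
import io


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def load_flagged_dois(csv_file, doi_column="OriginalPaperDOI"):
    """Read the DOI column of a retraction list into a set.

    `doi_column` is an assumed header name; rows without a DOI are skipped.
    """
    reader = csv.DictReader(csv_file)
    return {normalize_doi(row[doi_column]) for row in reader if row.get(doi_column)}


def is_flagged(doi: str, flagged: set) -> bool:
    """True if the DOI appears in the loaded retraction list."""
    return normalize_doi(doi) in flagged


# Stand-in for a real file; the DOIs below are placeholders, not real papers.
sample = io.StringIO(
    "Title,OriginalPaperDOI\n"
    "Example A,10.1234/Example.Retracted\n"
    "Example B,\n"
)
flagged = load_flagged_dois(sample)
print(is_flagged("https://doi.org/10.1234/example.retracted", flagged))
print(is_flagged("10.1234/example.fine", flagged))
```

A match means the paper needs a closer look before it is cited. No match is weaker evidence, because a list like this only covers notices that had been issued when the file was saved.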

The Wansink case is the canonical example of a “research-backed” behavioral intervention program that turned out, on rigorous post-publication scrutiny, to be built on a foundation that could not bear the weight that policymakers, practitioners, and consumers had placed on it. The right inference is not cynicism about behavioral science as a whole. Substantial findings in the field have replicated and remain robust. The right inference is calibrated humility about any specific, striking, quantitative claim — particularly one being used to justify a strategic or operational decision that matters.

Sources

  • The Replication Crisis hub — the full set of cases, methods, and decision frameworks for strategists evaluating “research-backed” claims.
  • Diederik Stapel: The 58-Retraction Fraud — the closest analog: a celebrated researcher in social psychology, a long-undetected pattern of misconduct, and the most consequential investigation in the field’s recent history. The Stapel case was outright fabrication; the Wansink case was the systematic mishandling of real data. Both produced 18+ retractions; both reshaped their fields.
  • Daryl Bem And Precognition — the 2011 JPSP paper that, alongside the Stapel and Wansink cases, forced the field to confront whether its publication standards were detecting anything.
  • Money Priming And The Vohs Failures — another behavioral-economics research program whose specific quantitative claims have not survived independent replication, with similar implications for marketers citing the literature.
  • Defaults And The Status-Quo Anti-Example — the conceptual framework of choice architecture remains supported by adjacent research even where specific Wansink-era findings do not. Worth reading as a counterpoint to a fully cynical inference from the Wansink case.

FAQ

Is anything in Mindless Eating still trustworthy?

The book draws heavily on Wansink lab studies that are now retracted or corrected. Specific quantitative claims — about plate size, soup bowl design, popcorn freshness, candy dish placement — should not be cited without first checking whether the underlying study has been retracted or corrected. The broader conceptual framework, that environmental design influences eating behavior, is supported by adjacent research streams (notably the behavioral-economics work of Thaler, Sunstein, and Kahneman) but does not require the specific Wansink magnitudes to hold. A reader who wants the conceptual framework without the unreliable specifics is better served by Nudge (Thaler & Sunstein, 2008) or Thinking, Fast and Slow (Kahneman, 2011), both of which draw on a broader and more rigorous research base.

What about the Smarter Lunchrooms Movement?

The specific quantitative claims that the Smarter Lunchrooms Movement made — that renaming vegetables increased consumption by specific percentages, that placement of fruit at the cash register changed selection by specific amounts — were largely drawn from Wansink lab research that is now in question. Independent reassessments by the Cochrane Collaboration’s nutrition group and by researchers at the University of Connecticut concluded by 2020 that the “evidence-based” framing of the movement’s interventions could not be supported by rigorous standards. The interventions themselves were low-cost and uncontroversial, and many schools have continued them. But schools, districts, and policymakers should not represent them as “evidence-based” in the way the Smarter Lunchrooms Movement originally claimed. The Cornell Food and Brand Lab was disbanded in March 2019, and the Smarter Lunchrooms Movement website ceased operation around the same time.

What about other behavioral-nutrition research?

The field is broader than Wansink. Researchers including Kelly Brownell (Yale), Marlene Schwartz (UConn), and many others have continued to produce behavioral-nutrition research with stronger methodological standards. The Wansink case has accelerated the field’s adoption of preregistration, open data, and independent replication. Findings from contemporary behavioral-nutrition research that are preregistered, openly archived, and ideally independently replicated should be evaluated on those merits, not tarred with the Wansink association. The right inference from the Wansink case is heightened scrutiny of any specific quantitative claim, not blanket dismissal of the field.

How did this go undetected for so long?

The same combination of factors that allowed the Stapel fraud to persist: a research culture that rewarded striking, clean, publishable findings; a publication system that did not require raw data sharing or preregistration; co-authors who trusted the senior researcher with data handling; reviewers who did not check arithmetic against reported sample sizes; and the prestige insulation of a successful, prolific senior figure. The Wansink case differs from Stapel in that nobody inside the lab acted as a whistleblower. The case was caught entirely from outside, by independent statistical analysts who became suspicious because Wansink himself publicly described his methodology in a blog post. Without that post, the audit might never have started.
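The arithmetic check that reviewers skipped can be illustrated in a few lines. The sketch below is a minimal version of a granularity test in the spirit of the GRIM test (Brown and Heathers): when responses are whole numbers, such as Likert ratings or item counts, the mean of n responses must equal some integer total divided by n, so many reported means are simply impossible for a given sample size. The function name, parameters, and tolerance are illustrative assumptions, not the auditors' actual code, and the check only applies to integer-valued data with a modest sample size.

```python
import math


def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if a mean of n integer responses could round to reported_mean.

    Hypothetical helper for illustration. It assumes every underlying
    response is a whole number, and it accepts either rounding direction
    at the half-way point so that it errs on the side of "consistent".
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")

    # Largest gap between a true mean and its rounded report,
    # plus a small allowance for floating-point error.
    tolerance = 0.5 * 10 ** (-decimals) + 1e-9

    # The sum of the responses must be an integer close to mean * n.
    approx_total = reported_mean * n
    candidates = range(math.floor(approx_total) - 1, math.ceil(approx_total) + 2)

    return any(abs(total / n - reported_mean) <= tolerance for total in candidates)


if __name__ == "__main__":
    # With 20 integer responses, means can only move in steps of 0.05.
    print(grim_consistent(3.45, 20))  # 69 / 20 = 3.45, so this is possible
    print(grim_consistent(3.44, 20))  # no integer total over 20 gives 3.44
```

A failed check does not prove misconduct. It can also reflect a typo, an unreported exclusion, or a misstated sample size. It does flag a number that cannot be correct as reported, which is the kind of inconsistency the outside analysts documented.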

Was Wansink’s misconduct deliberate fraud or careless practice?

The Cornell investigation found “academic misconduct” but did not use the word “fraud.” Cornell’s four documented categories — misreporting of data, problematic statistical techniques, failure to properly document and preserve research results, and inappropriate authorship — can in principle describe practices ranging from deliberate fabrication to extreme carelessness. Wansink’s own characterization is that there was “no fraud, no intentional misreporting, no plagiarism, no misappropriation.” The BuzzFeed News email record published by Stephanie Lee — including Wansink’s explicit instructions to collaborators to find significance in null datasets — is in tension with the careless-error framing. Reasonable readers can reach different conclusions about intent. What is not in dispute, and what Cornell formally documented, is the pattern of misconduct in the published record itself.

What happened to Wansink after his resignation?

He left Cornell on June 30, 2019, when his resignation took effect. He has since founded a consulting practice, taken paid speaking engagements, and, as of 2022, published in some lower-impact venues. He has not held a research faculty appointment since leaving Cornell. He maintains his public position that his work involved methodological errors rather than fraud.

What happened to Wansink’s co-authors and graduate students?

Many co-authors saw retractions on papers they had contributed to in good faith. The reputational and career effects have varied. Some former Wansink lab affiliates have continued their research careers in nutrition science and behavioral economics, generally with more rigorous methodological standards in their post-Wansink work. The doctoral students whose dissertations relied on Wansink lab data have faced a difficult position similar to the Stapel case: they performed their own analytical and writing work in good faith on data that was, in retrospect, unreliable. Cornell did not revoke any PhDs as a result of the investigation.

What is the single most important lesson for someone outside academia?

When a behavioral-science finding is being cited to support a strategic, operational, or product-design decision that matters, do three things. First, look up the paper’s DOI in the Retraction Watch Database to verify that it has not been retracted or corrected. Second, ask whether the specific quantitative claim has been independently replicated by a different research group with preregistration. Third, distinguish between the conceptual framework (which often remains robust) and the specific magnitude (which often does not). The Wansink case is the cleanest available proof that a striking quantitative claim from a prestigious lab in a peer-reviewed journal, translated into federal policy and commercial practice, can rest on a foundation of impossible numbers and selectively reported subgroup analyses. The verification was always available: the statisticians did it in months once they knew to look. The lesson is to know to look, before you commit a decision to a number.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.