For two decades, “willpower is a depletable resource you can run out of” was treated as settled science. Then 23 labs tried to replicate it and found nothing. Here is what actually happened to ego depletion, what survived, and what leaders should learn from one of the largest preregistered failures in social-psychology history.

In 1998, Roy Baumeister and three colleagues published a paper in the Journal of Personality and Social Psychology with one of the most memorable experimental setups in social psychology. They sat hungry college students in a room that smelled like fresh-baked chocolate chip cookies. On a table in front of each student were two bowls. One held the cookies. One held radishes. Half the students were told to eat the radishes and ignore the cookies. The other half were told they could eat whatever they wanted.

Then, after the food task, all of the students were given an impossible puzzle to work on. The researchers measured how long each student persisted before giving up.

The radish-eaters gave up much faster.

The interpretation was elegant. Resisting the cookies had cost the radish-eaters something. They had spent some finite quantity of self-control on the cookies, and now they had less of it left over for the puzzle. Self-control, the authors proposed, was a depletable resource --- like a battery or a fuel tank. Use it on one task, and you have less for the next one.

This was the founding study of what came to be called ego depletion, and it launched what would become one of the most prolific research programs in modern psychology. By the early 2010s, hundreds of studies had been published on ego depletion. The construct had migrated out of academia into self-help books, business books, productivity systems, and even congressional testimony. Decision fatigue. Glucose and willpower. Why judges grant fewer paroles before lunch. Why dieters break their diets at night. All of it was tied, conceptually or directly, to the ego depletion framework.

Then in 2016, twenty-three laboratories ran the same experiment at the same time. The combined sample was over two thousand people. They found nothing.

This article is about what happened --- what the original paper actually showed, how the construct grew, what the large preregistered replications found, what survived the collapse, and what leaders should learn from one of the clearest case studies in modern social-psychology methodology.

What the Original Paper Actually Said

The founding paper is Baumeister, Bratslavsky, Muraven & Tice (1998), “Ego depletion: Is the active self a limited resource?” in the Journal of Personality and Social Psychology. It contained four experiments. The famous radish-versus-chocolate study (Experiment 1) had a sample of 67 students. The other three experiments tested variations --- choice fatigue, emotion suppression, persistence after thought suppression --- with sample sizes between 36 and 72.

The headline empirical claim, across the four studies, was that a prior act of self-control (resisting cookies, choosing among many options, suppressing emotions) measurably reduced performance on a subsequent self-control task. The effect sizes reported in the paper were substantial --- Cohen’s d around 0.6 in several conditions, which is considered a medium-to-large effect in psychology.

The theoretical claim was bigger. Baumeister and colleagues proposed that all acts of self-control draw from a single shared resource. That resource is depletable. Once depleted, subsequent self-control attempts fail more often. They called this the “limited resource model” of self-regulation.

Over the next fifteen years, the model was extended. Gailliot, Baumeister and colleagues (2007), also in the Journal of Personality and Social Psychology, proposed that the depletable resource was specifically glucose. The story was that self-control burned through blood sugar, and that sugary drinks could restore self-control performance. This was the popular version that escaped academia: the idea that “willpower is a muscle that runs on glucose.”

A 2010 meta-analysis by Hagger, Wood, Stiff & Chatzisarantis in Psychological Bulletin aggregated 83 ego-depletion studies and reported an overall effect size of d ≈ 0.62 --- strong support for the construct. Ego depletion appeared to be one of the better-established findings in social psychology.

It wasn’t.

The First Cracks

The problems began appearing around 2014. The most influential critique came from Carter, Kofler, Forster & McCullough (2015), in the Journal of Experimental Psychology: General, who conducted a new meta-analysis that explicitly corrected for publication bias. They used a technique called PET-PEESE, which regresses each study’s reported effect on its standard error (a quantity driven largely by sample size) and takes the intercept as the estimated effect for a hypothetical study with no sampling error. Small studies that report large effects are a signature of publication bias, and this regression is designed to strip that pattern out. When they applied this correction to the ego-depletion literature, the average effect size dropped from around d = 0.6 to essentially zero.

This was a serious challenge but not a death blow. Meta-analytic corrections are technical, and reasonable researchers could disagree about whether the correction was too aggressive. The Baumeister camp pushed back, arguing that the construct was real and the methodology of individual studies was sound.
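To make the PET step of that correction concrete, here is a minimal sketch in Python on simulated data. It is not the Carter et al. analysis or their data. The simulation rests on assumptions chosen for illustration: a true effect of exactly zero, a one-sided significance filter for publication, and the rough standard-error approximation sqrt(2/n). The helper names `simulate_published` and `wls_intercept` are hypothetical.

```python
import random
from math import sqrt

Z_CRIT = 1.96  # assumed publication filter: only "significant" positive results appear


def simulate_published(n_attempted=20000, seed=1):
    """Simulate studies of a true zero effect; return (d, se) for the published ones."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_attempted):
        n_per_group = rng.randint(15, 60)   # small two-group studies
        se = sqrt(2.0 / n_per_group)        # rough standard error of Cohen's d
        d_obs = rng.gauss(0.0, se)          # observed effect is pure sampling noise
        if d_obs / se > Z_CRIT:             # the positive-only filter
            published.append((d_obs, se))
    return published


def wls_intercept(ys, xs, weights):
    """Intercept of a weighted least-squares line y = a + b*x."""
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    return y_bar - (sxy / sxx) * x_bar


studies = simulate_published()
ds = [d for d, _ in studies]
ses = [se for _, se in studies]

naive_mean = sum(ds) / len(ds)
# PET: regress effect on standard error, weighting by inverse variance;
# the intercept estimates the effect at se = 0.
pet_estimate = wls_intercept(ds, ses, [1.0 / se**2 for se in ses])

print(f"published studies: {len(ds)}")
print(f"naive mean d:      {naive_mean:.2f}")
print(f"PET intercept:     {pet_estimate:.2f}")
```

In this idealized setup the naive average of the published studies lands well above zero, while the PET intercept sits near zero. Real literatures are messier than this simulation, which is part of why reasonable researchers could argue about how aggressive the correction should be.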

What was needed was a definitive empirical test. A preregistered, multi-laboratory, paradigmatic replication of the kind that no individual study could perform.

That test came in 2016.

The 2016 Registered Replication Report

Hagger, Chatzisarantis, and 117 co-authors (2016), “A multilab preregistered replication of the ego-depletion effect,” appeared in Perspectives on Psychological Science. The design was simple and devastatingly clean.

Twenty-three independent laboratories around the world preregistered an identical protocol with Baumeister’s input. Each laboratory recruited participants and ran the same ego-depletion experiment using a paradigm that the original researchers had approved as a valid test of the construct. The combined sample was 2,141 participants --- far larger than the original 1998 paper and larger than most individual studies in the literature.

The result: the depletion effect was d = 0.04, with a 95% confidence interval from −0.07 to +0.15. The Bayes factor favored the null hypothesis by roughly 4-to-1.

In plain language: across twenty-three independent labs running the same study, with more than two thousand participants, the ego-depletion effect was indistinguishable from zero. The confidence interval was tight enough to rule out anything close to the d ≈ 0.6 reported in the earlier literature.

This was, by the standards of modern social psychology, as decisive an empirical verdict as the field can produce. The construct that had supported a fifteen-year research program, hundreds of papers, and a popular self-help industry did not replicate.
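The headline numbers from the 2016 report can be sanity-checked with a few lines of arithmetic. This sketch assumes the reported interval is a symmetric normal-theory interval and, for the Bayes factor, assumes even prior odds between the null and the alternative. Both are simplifications made here for illustration, not details taken from the report.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# Values quoted in the text above.
d_hat = 0.04
ci_low, ci_high = -0.07, 0.15
bayes_factor_null = 4.0  # "roughly 4-to-1" in favor of the null

# A symmetric 95% interval spans about 1.96 standard errors on each side.
se = (ci_high - ci_low) / (2 * 1.96)
z = d_hat / se
p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))

# With even prior odds (an assumption), posterior odds equal the Bayes factor.
posterior_prob_null = bayes_factor_null / (1.0 + bayes_factor_null)

# How far the earlier literature's d of about 0.6 sits from this estimate, in SEs.
gap_in_se = (0.6 - d_hat) / se

print(f"implied SE:           {se:.3f}")
print(f"z for d = 0.04:       {z:.2f}")
print(f"two-sided p:          {p_two_sided:.2f}")
print(f"P(null | data):       {posterior_prob_null:.2f}")
print(f"gap to d = 0.6 in SE: {gap_in_se:.1f}")
```

Under these assumptions the estimate is less than one standard error away from zero, while the d ≈ 0.6 of the earlier literature sits roughly ten standard errors above it.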

The 2021 Paradigmatic Test

Defenders of ego depletion argued that the 2016 RRR had used a specific task paradigm that wasn’t representative of the broader literature. Some of the failures, they suggested, were specific to that one experimental protocol rather than to the construct itself.

In 2021, Vohs, Schmeichel and 47 co-authors addressed this objection in Psychological Science with “A multisite preregistered paradigmatic test of the ego-depletion effect.” This time the paradigm was different --- the kind of task that Baumeister himself argued was the canonical test of the construct. Thirty-six laboratories. A combined sample of 3,531 participants. Preregistered. Designed in consultation with Baumeister.

The result: d = 0.06, not statistically significant. The Bayes factor again favored the null.

Two paradigmatic, preregistered, multi-laboratory tests, totaling more than 5,500 participants, both null. At this point the empirical case for the classic ego-depletion effect was, by any reasonable scientific standard, over.

What Survived

The collapse of classic ego depletion is one of the clearer stories in the replication crisis, but it’s not the entire story. A few threads survived and are worth understanding precisely, because they shape what you can and can’t reasonably claim about self-control.

Attention-control effects survive in smaller form. Garrison, Finley & Schmeichel (2019), in Personality and Social Psychology Bulletin, ran a preregistered study with more than 1,000 participants using attention-control measures (Stroop tasks, attention-network tests) rather than persistence tasks. They found an effect of d ≈ 0.20 --- smaller than the original claims but statistically reliable. The interpretation: there may be a real but modest effect of prior self-control attempts on subsequent attention-control performance, even if the dramatic “willpower exhaustion” version doesn’t replicate.

The glucose mechanism is dead. The specific claim that willpower is literally fueled by blood glucose, popularized by Gailliot and Baumeister in 2007, has not held up to closer examination. Kurzban (2010) and subsequent metabolic-physiology critiques showed that mental tasks consume trivial amounts of glucose compared to the brain’s baseline consumption, and that the glucose-restoration findings often failed direct replication. Baumeister himself has stepped back from the strong glucose claim.

Subjective experience of effort and fatigue is real. People do report feeling more effort-fatigued after demanding tasks. They do behave differently when fatigued. None of that requires the “depletable resource” model --- it’s consistent with motivational accounts (you’re less willing to expend effort on the second task because you’re less motivated, not because you’re physically out of fuel) and with attention-control accounts (subsequent attention shifts are slower or noisier after a demanding task, but not because of resource depletion).

The honest version of self-control today is something like this: the dramatic “willpower as glucose tank” narrative is wrong, but people do experience effort, do allocate motivation strategically, and do show modest attention-control changes after demanding tasks. The experience is real; the specific mechanism the original literature proposed is not.

Why the Original Looked Real

How did a construct that didn’t survive its first preregistered test accumulate hundreds of supporting studies over fifteen years? The ego-depletion story is one of the cleanest case studies in the replication crisis, and the mechanisms are worth understanding because they generalize.

Small samples plus publication bias. Most individual ego-depletion studies had sample sizes between 40 and 100. At those sample sizes, statistical power for detecting medium effects is genuinely low, and the literature looks systematically biased: studies that happened to find effects got published, studies that didn’t tended not to. The 2015 Carter meta-analysis specifically documented this pattern. When you have many small studies, a positive-only filter, and a flexible enough experimental paradigm to allow alternative analyses, you can build a literature of hundreds of “successes” that aggregates to a publication-bias-corrected effect of zero.
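A short calculation shows how much work those two ingredients can do on their own. The sketch below uses a normal approximation for a two-group comparison, with the standard error of d taken as sqrt(2/n) for n participants per group, and it assumes that only significant results in the predicted direction get published. The function names are hypothetical, and the setup is a simplification rather than a model of any particular study.

```python
from math import erf, exp, pi, sqrt

Z_CRIT = 1.96  # two-sided 5% significance threshold


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)


def approx_power(true_d, n_per_group):
    """Approximate chance of a significant result in the predicted direction."""
    se = sqrt(2.0 / n_per_group)
    return 1.0 - normal_cdf(Z_CRIT - true_d / se)


def mean_published_d_under_null(n_per_group):
    """Average observed d among 'significant' studies when the true effect is zero.

    This is the mean of a normal distribution truncated below at Z_CRIT,
    rescaled from z units back to d units.
    """
    se = sqrt(2.0 / n_per_group)
    return se * normal_pdf(Z_CRIT) / (1.0 - normal_cdf(Z_CRIT))


# Total samples of 40 and 100 correspond to 20 and 50 participants per group.
for n in (20, 50):
    print(
        f"n per group = {n}: "
        f"power for d = 0.5 is {approx_power(0.5, n):.2f}, "
        f"mean published d if the true effect is 0 is "
        f"{mean_published_d_under_null(n):.2f}"
    )
```

Under these assumptions, power to detect a medium effect (d = 0.5) runs from roughly 35% at 20 per group to roughly 70% at 50 per group. More strikingly, if the true effect were exactly zero, the studies that cleared the significance filter would still average a d of roughly 0.47 to 0.74 across that range, which brackets the d ≈ 0.6 the early literature reported.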

Researcher degrees of freedom. Ego-depletion experiments have many decision points --- which task to use, how long to run it, how to score the dependent measure, which participants to exclude, which covariates to include. Each decision point creates an opportunity for choices that subtly favor the predicted result. In a literature where individual studies are small and decision points are many, even unconscious bias can systematically inflate effect sizes.

A charismatic researcher and a memorable demonstration. The radish-versus-chocolate study is unforgettable. So is the limited-resource metaphor. Baumeister is a brilliant writer and communicator. The combination of a vivid demonstration and a clean theoretical framing did enormous cultural work. Even researchers skeptical of the specific glucose claim were inclined to give the broader construct the benefit of the doubt because the framing was so intuitively compelling.

Cultural appetite for a willpower theory. American culture is unusually invested in the idea that self-control is a personal trait that explains success and failure across domains. Ego depletion offered a respectable scientific framework for that intuition. It implied that diets fail because you “run out of willpower,” that you should make important decisions early in the day “before you’re depleted,” and that you can train willpower like a muscle. These ideas were enormously popular and produced a steady demand for ego-depletion research, popular treatments, and policy applications.

No one ran the right test for nearly two decades. The first definitive preregistered multilab replication didn’t appear until 2016, eighteen years after the founding paper. By then, the cultural and academic investment in the construct was enormous. The 2016 RRR landed harder than smaller replication failures would have, but the field had to wait a long time for it. Two decades is roughly the time scale on which the social-psychology replication crisis operates.

The Honest Verdict Today

The classic ego-depletion effect --- that prior self-control on one task measurably reduces persistence on a subsequent task --- does not replicate in preregistered multi-laboratory tests. The specific “glucose mechanism” version of the theory is not supported by metabolic physiology and has failed direct replication. The strong popular claim --- you have a fixed daily quantity of willpower that runs down as you use it --- is not supported by current evidence.

A weaker and more conditional claim survives. People do experience effort and fatigue. Demanding tasks do have downstream effects on attention control and motivation, in small and heterogeneous ways. The mechanisms are probably motivational and attentional rather than a matter of resource depletion. Whether this counts as “the construct survives in modified form” or “the construct is dead but related real phenomena exist” is partly a semantic question, and serious researchers come down in different places.

What is not in serious dispute among working researchers is that the popular version of ego depletion --- the one in business books, productivity systems, and TED talks --- is not supported by current evidence and should not be cited as if it were settled science.

What This Means If You’re a Strategist

Three concrete implications for leaders, founders, and consultants who think about decision-making, productivity, or organizational design.

1. The “decision fatigue” framework is on much shakier ground than business literature treats it. Many popular productivity and leadership frameworks build directly on ego depletion. “Make important decisions early.” “Steve Jobs wore the same outfit to avoid decision fatigue.” “Schedule difficult meetings before lunch because willpower runs down.” All of these claims trade on ego-depletion reasoning. The empirical foundation under them is much weaker than their cultural prominence suggests.

This doesn’t mean cognitive fatigue is fake or that you should run twelve-hour days of high-stakes decisions. People do tire. Decision quality probably does degrade under sustained demand. But the specific mechanism most often invoked --- that you have a finite willpower budget that runs down through the day --- is not well-supported. Plan your day around your actual energy patterns, not around a folk model that the field has moved away from.

This matters most when you’re designing organizational systems that depend on the construct. If you’ve architected your sales-rep schedule around “willpower depletion by 3 PM” or your hiring committee’s interview order around “decision fatigue late in the day,” the underlying construct may not be doing the work you think it is. Whatever genuine fatigue or motivation patterns are present in your team are worth measuring directly rather than assumed from pop-psych theory.

2. Large preregistered tests are the gold standard, and you should weight evidence accordingly. The 2016 and 2021 ego-depletion RRRs are model examples of how modern social science is supposed to test a construct. Preregistered design (no flexibility in analysis after seeing the data). Many laboratories running the identical protocol (no single-lab fluke). Large combined sample (high statistical power). Published regardless of result (no publication bias).

When evaluating a behavioral-science claim that’s relevant to a decision you’re making, the question to ask is not “does the literature support this?” --- the literature can support almost anything in social psychology if you’re willing to ignore publication bias and small-sample noise. The question is: has this construct been tested at all in a preregistered, multi-laboratory, well-powered design? If yes, weight that evidence enormously and downweight the prior small-sample literature. If no, hold the construct provisionally regardless of how many supportive papers exist.

This is uncomfortable for any field where the high-quality tests don’t yet exist. But it’s the actual hierarchy of evidence in 2026, and pretending otherwise is how decision-makers get burned by collapsing constructs.

3. The half-life of “settled” findings in this field is much shorter than people assume. Between 1998 and 2015, ego depletion was treated as one of the more solidly established constructs in social psychology. It had hundreds of supporting studies. It had a meta-analysis showing a strong effect. It had a glucose mechanism that connected it to physiology. It was in textbooks. Then in two papers, separated by five years, it largely disappeared.

This is not a one-off case. It’s the modal pattern in the social-psychology replication crisis. Power posing followed the same arc. So did the marshmallow test’s predictive validity, and parts of the social-priming literature, and the strong version of stereotype threat. The “half-life” of an apparently-established social-psychology finding --- the time before a substantial fraction of the field walks away from it --- appears to be on the order of one to two decades, especially for findings popular in self-help and business literature.

The practical implication is to hold behavioral-science claims provisionally even when they’re widely cited, and to periodically refresh your priors on the constructs you use in your work. The construct you cited confidently five years ago may not be one you should cite confidently today. The discipline of auditing your own behavioral-science assumptions --- perhaps annually, perhaps before each major strategy refresh --- is one of the highest-ROI cognitive habits available to anyone whose work depends on understanding people.

This article is part of an ongoing series on famous behavioral-science studies that did not survive replication. Other entries cover the Stanford Prison Experiment, power posing, the marshmallow test, the bystander effect, and the Mozart Effect. The full hub lives at /replication-crisis/.

FAQ

Is “decision fatigue” real? The narrow claim --- that people behave differently late in a day of demanding decisions than early in it --- has some empirical support, but the most-cited evidence (the Israeli parole-judge study) has been substantially challenged in reanalyses. The broader claim that decision-making degrades steadily through a fixed willpower budget is not supported by the preregistered ego-depletion replications. Treat decision-fatigue claims as plausible but not settled.

What about the famous judge-and-parole study showing parole grants drop before lunch? That study (Danziger, Levav & Avnaim-Pesso, 2011, PNAS) has been challenged on multiple grounds. Critics including Glöckner (2016) and Weinshall-Margel & Shapard (2011) showed the pattern could be substantially explained by case ordering --- defendants represented by attorneys were heard first in each session, and unrepresented defendants (lower grant rate) were heard later. The original interpretation as “decision fatigue” is contested.

Does this mean willpower doesn’t exist? No. People can clearly direct attention, override impulses, persist through difficulty, and so on. The contested question is whether all of these capacities draw from a single depletable resource. The current evidence suggests that framework is wrong, but the underlying capacities are real. “Willpower” as a folk concept survives; “willpower as glucose tank” does not.

What should I do instead of “manage willpower”? Most of the practical advice that was justified by ego-depletion reasoning --- make important decisions early, reduce trivial choices, take breaks during sustained work --- is reasonable advice for reasons unrelated to ego depletion. Cognitive fatigue is real; energy varies through the day; switching costs are real. You can keep most of the practical heuristics. What you should drop is the underlying theory that justified them, because that theory predicts other things that turn out not to be true.

Has Baumeister responded to the replication failures? Yes, in multiple papers. He has acknowledged that the specific glucose mechanism was wrong, but has continued to defend the broader resource framework, often arguing that the replication studies used the wrong paradigm or population. The 2021 paradigmatic test was designed in consultation with him specifically to address this defense, and it was also null. Most working researchers in the field now treat the classic effect as not robust, regardless of which specific theoretical defense is offered.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.