The Overjustification Effect: When Rewards Backfire (And When They Don't)

In 1973, Stanford preschoolers who were promised a "Good Player" award for drawing later spent half as much free time drawing as kids who got nothing. The takeaway became "rewards kill motivation." The real science is more contested — and more useful — than that slogan.

By Atticus Li, May 29, 2026

In the early 1970s, at the Bing Nursery School on the Stanford University campus, a team of psychologists watched preschoolers draw. The researchers had identified children who already loved drawing with multicolored felt-tipped markers — “magic markers” not normally available in the classroom — and they split those children into three groups. One group was shown a fancy “Good Player Award,” a certificate with a gold seal and a bright red ribbon and a space for the child’s name, and asked: would you like to win one of these by drawing some pictures? They drew, and they got the award. A second group drew and then received the same award as a complete surprise, with no prior promise. A third group simply drew, with no award offered or given.

Then the researchers waited one to two weeks, brought the markers back out during ordinary free-play time, and — through one-way mirrors — measured how long each child chose to draw with no reward on the table at all.

The result became one of the most cited findings in the psychology of motivation. The children who had been promised the award and then received it spent dramatically less of their free time drawing than the children in the other two groups. The activity they had freely chosen and enjoyed a week or two earlier had become, for the expected-reward group, something they did noticeably less of once the reward stopped. The unexpected-reward group and the no-reward group kept drawing about as much as ever.

This is the overjustification effect, and the pop-science version of its lesson is brutally simple: rewards kill intrinsic motivation. Pay your kid to read and they’ll read less for pleasure. Bonus your engineers for shipping and they’ll care less about the craft. Gamify a behavior people already enjoy and you’ll extinguish the joy. The finding has been used to argue against grades, against performance pay, against loyalty points, against allowances, and against virtually every incentive scheme a manager or parent might dream up.

The actual scientific story is more interesting, more contested, and far more useful than the slogan. The effect is real. It is also bounded — it depends on the type of reward, whether it was expected, what it was contingent on, and whether the person already found the task interesting. And the question of how big and how general the effect really is touched off one of the most consequential meta-analytic wars in the history of psychology, fought across the pages of the field’s top journals over a decade. For anyone designing compensation, gamification, or incentive programs, the difference between the slogan and the calibrated truth is the difference between a policy that backfires and one that works.

What Lepper, Greene, and Nisbett Actually Found

The study is Lepper, Greene, and Nisbett (1973), “Undermining children’s intrinsic interest with extrinsic reward: A test of the ‘overjustification’ hypothesis,” published in the Journal of Personality and Social Psychology, volume 28, issue 1, pages 129–137. It is worth being precise about what it did and did not show, because the precision is where the practical lessons live.

The final sample was 51 children (19 boys and 32 girls), all selected because baseline observation through one-way mirrors had established that they already found the drawing activity intrinsically interesting. That selection criterion matters enormously and is almost always dropped from the pop summary: the effect was demonstrated only in children who already liked the activity. The reward was a symbolic certificate — the “Good Player Award” with its gold seal and red ribbon — not money. And the dependent measure was not performance or effort during the rewarded session; it was the proportion of free-choice time the child spent on the activity one to two weeks later, when no reward was available and the observation was unobtrusive.

The numbers, from the paper’s Table 1, are the part nobody quotes. In the expected-award condition, children spent an average of 8.59% of their free-choice time on the target drawing activity. In the no-award control condition, that figure was 16.73%. In the unexpected-award condition, it was 18.09%. So the expected-reward children spent roughly half as much free time drawing as the children in the other two conditions — and crucially, the unexpected-award children were statistically indistinguishable from the no-award children. The planned contrast comparing the expected-award group against the other two was statistically significant (the authors reported F = 6.19 on 1 and 48 degrees of freedom, p < .025).
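The "roughly half" claim can be checked directly from the three percentages quoted above. The following is a minimal sketch in Python; the variable names are illustrative, and the combined comparison figure is a simple unweighted mean of the two comparison groups, which is an assumption made here for arithmetic convenience rather than a calculation taken from the paper.

```python
# Mean percentage of free-choice time spent drawing, as quoted from Table 1.
free_choice_pct = {
    "expected_award": 8.59,
    "no_award": 16.73,
    "unexpected_award": 18.09,
}

expected = free_choice_pct["expected_award"]

# Ratio of the expected-award group to each comparison group.
ratio_vs_no_award = expected / free_choice_pct["no_award"]
ratio_vs_unexpected = expected / free_choice_pct["unexpected_award"]

# Unweighted mean of the two comparison groups. This is an illustrative
# simplification; the paper's planned contrast is the authoritative test.
comparison_mean = (
    free_choice_pct["no_award"] + free_choice_pct["unexpected_award"]
) / 2
ratio_vs_comparison = expected / comparison_mean

print(f"vs no-award:    {ratio_vs_no_award:.2f}")
print(f"vs unexpected:  {ratio_vs_unexpected:.2f}")
print(f"vs both (mean): {ratio_vs_comparison:.2f}")
```

All three ratios land close to 0.5, which is where the "half as much" shorthand comes from.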

That single result encodes the entire bounded structure of the effect. Getting a reward did not undermine motivation. Expecting a reward, contracting to do an enjoyable activity in order to obtain it, did. The same gold-sealed certificate produced no detectable harm when it arrived as a surprise. The mechanism the authors proposed, drawing on Daryl Bem’s self-perception theory, was that children in the expected-reward condition inferred their own motivation from the situation: “I’m doing this to get the award” — which, when the award disappeared, left them with less reason to keep going. The unexpected-reward children had no such inference to make; from their point of view they had drawn because they wanted to, and a nice thing happened afterward.

The original study was, for its era, a methodologically careful field experiment with a real-world behavioral outcome measured in a naturalistic setting. It was not a fragile lab artifact. But it was a single study, on preschoolers, with a symbolic reward, on an already-loved activity — and the gap between that carefully scoped finding and “rewards kill motivation” is exactly the gap the next twenty years of research would fight over.

Deci 1971: The Soma Puzzle and the Origin of the Idea

Lepper and colleagues were testing a hypothesis that had been planted two years earlier by Edward Deci (1971), “Effects of externally mediated rewards on intrinsic motivation,” Journal of Personality and Social Psychology, volume 18, issue 1, pages 105–115. Deci gave college students the Soma cube — a genuinely absorbing spatial puzzle — and had them reproduce configurations across three sessions. One group was paid one dollar per puzzle solved; the other was not paid. The real measurement came during a “free choice” period when the experimenter left the room on a pretext and the participant was alone with the puzzles and some magazines, free to do whatever they liked.

The paid students spent less of that free time playing with the puzzle than the unpaid students. Money, introduced into an already-interesting activity, appeared to reduce the willingness to do it for its own sake once the money stopped. Deci built this into what became, with Richard Ryan, cognitive evaluation theory and eventually self-determination theory — the framework holding that external events affect intrinsic motivation through their impact on two psychological needs, autonomy and competence. Rewards experienced as controlling (“I’m doing this because I’m being paid to”) thwart autonomy and undermine intrinsic motivation; rewards or feedback experienced as informational (“this tells me I’m good at this”) satisfy competence and can enhance it.

That theoretical distinction — controlling versus informational — is the load-bearing wall of the entire debate. It predicts exactly the pattern Lepper found: an expected, contingent, controlling reward undermines; a non-contingent or informational one does not. And it set up the central empirical question: when you pool all the studies together, does the undermining effect actually show up reliably, or is it a narrow finding that gets over-generalized?

The Meta-Analytic War: Cameron-Pierce vs. Deci-Koestner-Ryan

By the early 1990s there were dozens of experiments on rewards and intrinsic motivation, pointing in different directions. The obvious move was a meta-analysis. The problem is that two teams ran meta-analyses on overlapping literature and reached nearly opposite conclusions — and the reason they diverged is a master class in why “the meta-analysis says” is never the end of an argument.

The first salvo was Cameron and Pierce (1994), “Reinforcement, reward, and intrinsic motivation: A meta-analysis,” Review of Educational Research, volume 64, issue 3, pages 363–423. Working largely from an operant-conditioning perspective skeptical of the whole “intrinsic motivation” construct, Cameron and Pierce synthesized roughly 96 experiments and concluded that the undermining effect was, in their framing, “a limited phenomenon.” Overall, they argued, reward did not decrease intrinsic motivation. Verbal praise reliably increased it. And the only condition under which they found a negative effect was a narrow one — expected tangible rewards handed out simply for doing a task — and even there the effect on free-time engagement was, in their reading, minimal. Their bottom line for educators and managers was reassuring: go ahead and use rewards; the fear that they destroy motivation is overblown.

This landed like a grenade, because it directly contradicted two decades of work by Deci, Ryan, and their collaborators. The rebuttal came in Deci, Koestner, and Ryan (1999), “A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation,” Psychological Bulletin, volume 125, issue 6, pages 627–668 — a 128-study meta-analysis that remains the most-cited word on the subject. Their core methodological complaint was that Cameron and Pierce had collapsed across reward contingencies that cognitive evaluation theory says behave very differently, washing out a real effect by averaging undermining and non-undermining conditions together. When you separate the conditions the theory says should matter, Deci and colleagues argued, the undermining effect is not limited at all — it is pervasive.

Their numbers, organized by reward type, are the most useful single table in this literature. Reading them as Cohen’s d effect sizes (negative means undermining, positive means enhancing):

Engagement-contingent rewards (you get paid just for working on the task): d = −0.40 on free-choice behavior.
Completion-contingent rewards (you get paid for finishing it): d = −0.36.
Performance-contingent rewards (you get paid for doing it well): d = −0.28.
All tangible rewards and all expected rewards: significantly undermining.
Positive feedback / verbal praise: d = +0.33 on free-choice behavior and d = +0.31 on self-reported interest — that is, praise enhanced intrinsic motivation.

They also found that tangible rewards were more detrimental for children than for college students, and that verbal rewards were less enhancing for children than for adults — a developmental wrinkle with obvious implications for how you think about kids versus employees.
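One way to make these d values concrete is the "probability of superiority": the chance that a randomly chosen rewarded participant spends more free-choice time on the task than a randomly chosen control. The sketch below assumes normally distributed outcomes with equal variance in both groups, which is a textbook simplification and not something the meta-analysis reports. The function and dictionary names are illustrative.

```python
from statistics import NormalDist

# Free-choice effect sizes as quoted above (negative = undermining).
effect_sizes = {
    "engagement_contingent": -0.40,
    "completion_contingent": -0.36,
    "performance_contingent": -0.28,
    "verbal_praise": 0.33,
}


def probability_of_superiority(d: float) -> float:
    """Chance that a random rewarded participant out-persists a random control.

    Assumes both groups are normal with equal variance (a simplifying
    assumption for illustration only).
    """
    return NormalDist().cdf(d / 2 ** 0.5)


for label, d in effect_sizes.items():
    p = probability_of_superiority(d)
    print(f"{label:24s} d = {d:+.2f}  P(rewarded > control) = {p:.2f}")
```

Under those assumptions, d = −0.40 corresponds to roughly a 39% chance that the rewarded participant persists longer, against 50% if the reward made no difference.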

Notice that Deci, Koestner, and Ryan and Cameron and Pierce actually agree on two of the most practically important points: verbal praise helps, and unexpected or non-contingent rewards don’t hurt. Where they disagree is on the magnitude and reach of the tangible-expected-reward undermining effect: Cameron and Pierce call it limited and inconsequential for policy; Deci, Koestner, and Ryan call it pervasive and real. The dispute did not end in 1999. Eisenberger, Pierce, and Cameron published a comment in the same 1999 issue of Psychological Bulletin; the two camps traded further articles in 2001 in Review of Educational Research; and that same year Cameron, Banko, and Pierce published a paper in The Behavior Analyst with the pointed title “Pervasive negative effects of rewards on intrinsic motivation: The myth continues,” signaling that neither side had conceded.

For a strategist, the lesson of the war is not “pick a winner.” It is that two competent teams analyzing the same literature reached different conclusions because they made different, defensible choices about how to group the studies — and that the structure both teams ultimately agree on (it depends on reward type, expectancy, and contingency) is more reliable than either headline.
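The grouping argument can be illustrated with toy arithmetic. The sketch below is not a reconstruction of either team's analysis: it takes two of the free-choice effect sizes quoted above, attaches hypothetical study counts, and shows how a pooled average can sit near zero while the subgroups point in opposite directions. The function name and the counts are assumptions made for illustration.

```python
# Toy illustration of how pooling can hide subgroup effects.
# Effect sizes are the free-choice values quoted above; the study counts
# are hypothetical and chosen only to keep the arithmetic easy to follow.
subgroups = [
    # (label, effect size d, hypothetical number of studies)
    ("engagement-contingent tangible", -0.40, 10),
    ("verbal praise", 0.33, 10),
]


def pooled_effect(groups):
    """Weighted mean of d, weighting each subgroup by its study count."""
    total_weight = sum(n for _, _, n in groups)
    return sum(d * n for _, d, n in groups) / total_weight


overall = pooled_effect(subgroups)
print(f"pooled d = {overall:+.3f}")
for label, d, _ in subgroups:
    print(f"{label}: d = {d:+.2f}")
```

With equal hypothetical weights the pooled value is −0.035, close to zero even though both subgroup effects are sizable. Real meta-analyses weight by precision rather than raw study counts, so this shows only the direction of the argument, not its size.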

The Calibrated Boundary Conditions

Strip away the rhetoric and the actual, replicable structure of the overjustification effect comes down to four dials. The effect is strong when the dials line up one way and absent or reversed when they line up the other.

1. Reward type: tangible vs. verbal. Tangible rewards — money, prizes, certificates, gift cards, points redeemable for stuff — carry undermining risk. Verbal rewards — praise, recognition, informational feedback about competence — generally enhance intrinsic motivation. This is one of the few points both warring meta-analyses endorse. “Nice work, that was a genuinely hard problem and you nailed it” is not the same kind of intervention as a fifty-dollar bonus, and they have opposite expected effects on intrinsic interest.

2. Expectancy: expected vs. unexpected. This is the cleanest result in the entire Lepper study — expected awards undermined, the identical award given as a surprise did not. A reward the person works toward becomes part of their explanation for the behavior. A reward that arrives after the fact, unanticipated, does not rewrite that explanation. Surprise spot-bonuses and unannounced recognition are far safer than dangled, contracted incentives.

3. Contingency: what the reward is tied to. In Deci, Koestner, and Ryan’s data the undermining is largest when the reward is tied to merely doing the task (engagement-contingent, d = −0.40) and somewhat smaller when it is tied to doing the task well (performance-contingent, d = −0.28), because performance rewards carry an informational signal about competence that partly offsets the controlling effect. Non-contingent rewards, given for showing up rather than for the specific activity, are the least undermining of all.

4. Baseline interest. The entire effect was demonstrated on children pre-selected for already loving the activity. You cannot undermine intrinsic motivation that wasn’t there. For genuinely boring, tedious, or unpleasant tasks — the ones nobody does for their own sake — rewards don’t have intrinsic motivation to crowd out, and there is good evidence they straightforwardly help get the behavior done. The undermining risk is concentrated in exactly the activities people would do anyway.

The honest one-sentence summary: expected, tangible rewards contingent on simply doing an already-interesting task reliably reduce free-choice persistence at that task once the reward stops; unexpected rewards, verbal praise, and rewards for boring tasks generally do not, and often help.
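The four dials can be written down as a small checklist function. This is a rough qualitative sketch, not a validated model: the class name, field names, contingency labels, and the "low / moderate / high" wording are all hypothetical, and the ordering of the checks simply mirrors the boundary conditions described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incentive:
    tangible: bool             # money, prizes, points (True) vs. praise or feedback (False)
    expected: bool             # promised in advance (True) vs. a surprise (False)
    contingency: str           # "none", "engagement", "completion", or "performance"
    already_interesting: bool  # does the person do this for its own sake?


def undermining_risk(incentive: Incentive) -> str:
    """Rough qualitative reading of the four dials (illustrative only)."""
    if not incentive.already_interesting:
        return "low: no intrinsic motivation to crowd out"
    if not incentive.tangible:
        return "low: verbal or informational rewards generally enhance interest"
    if not incentive.expected or incentive.contingency == "none":
        return "low: unexpected or non-contingent rewards generally do not undermine"
    if incentive.contingency == "performance":
        return "moderate: competence signal partly offsets the controlling effect"
    return "high: expected, tangible, task-contingent reward on an enjoyed activity"


# Example: announced badges for merely participating in an activity users enjoy.
badge_scheme = Incentive(
    tangible=True, expected=True, contingency="engagement", already_interesting=True
)
print(undermining_risk(badge_scheme))
```

The function deliberately returns a label with a reason rather than a number, since the literature supports the structure of the effect more firmly than any precise magnitude for a given scheme.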

Implications for Compensation, Gamification, and Incentive Design

This is where the calibrated version pays for itself, because the slogan (“never use rewards”) and the dismissal (“rewards are fine, the worry is a myth”) are both wrong in ways that produce real damage.

Compensation design. The overjustification literature is not an argument against paying people. Salary is non-contingent relative to any specific interesting task; nobody’s intrinsic love of coding is being crowded out by their base pay. The risk concentrates in narrow, tightly contingent, piece-rate-style incentives layered on top of work that people already find meaningful. If you bolt a per-feature bonus onto engineers who ship out of craft pride, the theory predicts you may convert “I build this because it’s good work” into “I build this for the bonus,” and then the bonus has to keep growing to sustain behavior that used to be free. The safer pattern, consistent with the evidence, is to pay people well and non-contingently for the role, reserve tangible incentives for genuinely undesirable tasks nobody does for love, and deliver recognition for excellent work verbally and informationally (“this was exceptional, here’s specifically why”) rather than converting every win into a transaction. Performance-contingent pay isn’t poison: its d is the least negative of the tangible contingencies because it signals competence. But it should be designed to maximize the informational signal (“this reflects that you did unusually well”) and minimize the controlling one (“do this or no money”).

Gamification. This is where the boundary conditions are most routinely ignored. The classic gamification mistake is to take an activity users already enjoy (reading, contributing, creating) and bolt on points, badges, and leaderboards as expected, tangible, engagement-contingent rewards. That is the precise recipe that Lepper, Greene, and Nisbett and Deci, Koestner, and Ryan identify as undermining: maximum baseline interest, maximum expectancy, maximum contingency on mere participation. The predictable failure mode is that engagement spikes while the points are novel and then falls below baseline when the program ends or the points lose value, because the activity has been recoded as point-farming. Gamification works best where it adds informational feedback (progress indicators, competence signals, “you’ve mastered X”) rather than controlling extrinsic carrots, and where it targets behaviors users don’t already do for their own sake.

Incentive programs and loyalty. Referral bonuses, loyalty points, and spiffs are most defensible when applied to behaviors people would not perform intrinsically (filling out a tedious survey, completing a clunky onboarding step) and least defensible when applied to behaviors that were already self-sustaining (organic word-of-mouth, community participation). Paying for what people did freely risks teaching them not to do it freely anymore. And because the developmental data show tangible rewards are more corrosive for children, anyone designing rewards in education or family contexts should weight the warning more heavily than in adult workplaces.

The Strategist’s Takeaway

The overjustification effect is a near-perfect case study in how a real, bounded scientific finding gets flattened into a useless slogan — and how recovering the boundary conditions turns it back into a decision tool.

“Rewards kill motivation” is wrong: unexpected rewards don’t, verbal praise actively helps, and rewards for boring tasks are fine. “The worry is overblown” is also wrong: expected tangible rewards for already-interesting work do reliably undermine, with effect sizes (d around −0.3 to −0.4) that are far from trivial. The calibrated truth lives in the four dials — reward type, expectancy, contingency, baseline interest — and a strategist who can name those four dials can predict, for any specific incentive scheme, whether it will help or hurt.

The deeper discipline is the same one this entire hub keeps surfacing. When someone cites “the overjustification effect” to win an argument — for or against a bonus plan, a gamification feature, an allowance — the right response is not to accept or reject the citation but to ask the boundary-condition questions. Is the reward tangible or verbal? Expected or a surprise? Contingent on what? And does the person already do this for its own sake? A finding you can only invoke as a slogan is protecting nobody’s decision. A finding you can decompose into the conditions under which it holds is doing real work. The science here is genuinely contested at the level of “how big and how general” — and genuinely settled at the level of “here is the structure.” For practical purposes, the structure is what you need.

Sources

Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children’s intrinsic interest with extrinsic reward: A test of the “overjustification” hypothesis. Journal of Personality and Social Psychology, 28(1), 129–137. DOI: 10.1037/h0035519
Deci, E. L. (1971). Effects of externally mediated rewards on intrinsic motivation. Journal of Personality and Social Psychology, 18(1), 105–115. DOI: 10.1037/h0030644
Cameron, J., & Pierce, W. D. (1994). Reinforcement, reward, and intrinsic motivation: A meta-analysis. Review of Educational Research, 64(3), 363–423. DOI: 10.3102/00346543064003363
Deci, E. L., Koestner, R., & Ryan, R. M. (1999). A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125(6), 627–668. DOI: 10.1037/0033-2909.125.6.627
Eisenberger, R., Pierce, W. D., & Cameron, J. (1999). Effects of reward on intrinsic motivation — Negative, neutral, and positive: Comment on Deci, Koestner, and Ryan (1999). Psychological Bulletin, 125(6), 677–691. DOI: 10.1037/0033-2909.125.6.677
Deci, E. L., Koestner, R., & Ryan, R. M. (2001). Extrinsic rewards and intrinsic motivation in education: Reconsidered once again. Review of Educational Research, 71(1), 1–27. DOI: 10.3102/00346543071001001
Cerasoli, C. P., Nicklin, J. M., & Ford, M. T. (2014). Intrinsic motivation and extrinsic incentives jointly predict performance: A 40-year meta-analysis. Psychological Bulletin, 140(4), 980–1008. DOI: 10.1037/a0035661

Related Reading

Grit: Real, But Barely Distinguishable From Conscientiousness. Another motivation construct where the popular framing outran the meta-analytic evidence.
The Self-Esteem Movement: What the Evidence Actually Showed. An education-policy intervention built on a finding that didn’t generalize the way reformers assumed.
Growth Mindset: Real Effect, Oversold Magnitude. The closest cousin to this story: a genuine but small effect inflated into a universal lever.
Maslow’s Hierarchy of Needs: The Pyramid Maslow Never Drew. A motivation framework whose pop version diverged sharply from what the research supported.
Cognitive Dissonance: A Robust Core Wrapped in Oversold Extensions. The self-perception-theory rival, and a model for “core robust, extensions overstated.”

FAQ

Does the overjustification effect mean I should never pay people for work they enjoy? No. Base salary is non-contingent with respect to any specific interesting task and does not crowd out intrinsic motivation — nobody loves their craft less because they’re paid a salary. The undermining risk is narrow: it applies to expected, tangible rewards tightly contingent on simply doing an already-interesting activity. Pay people well for the role, reserve transactional incentives for genuinely tedious tasks, and deliver recognition for great work verbally and informationally rather than converting every win into a bonus.

Is the overjustification effect real, or did it fail to replicate? It is real and reasonably robust — but bounded. Unlike several famous social-psychology effects that collapsed in the replication crisis, the undermining of intrinsic motivation by expected tangible rewards has held up across a 128-study meta-analysis (Deci, Koestner, & Ryan, 1999), with effect sizes around d = −0.3 to −0.4. What is genuinely contested is its magnitude and generality, not its existence. Cameron and Pierce (1994) argued it’s a limited phenomenon; Deci, Koestner, and Ryan argued it’s pervasive. Both agree on the structure.

What’s the difference between the two famous meta-analyses? Cameron and Pierce (1994) pooled studies in a way that averaged across reward contingencies and concluded reward mostly doesn’t harm intrinsic motivation. Deci, Koestner, and Ryan (1999) argued that averaging across contingencies washes out a real effect, and that when you separate engagement-, completion-, and performance-contingent rewards, the undermining effect is pervasive for tangible expected rewards. Notably, both teams agree that verbal praise enhances motivation and that unexpected rewards don’t undermine — they disagree mainly about how big and how policy-relevant the tangible-reward effect is.

Does praise also undermine intrinsic motivation? Generally the opposite. Both major meta-analyses found that verbal rewards — praise and positive informational feedback — enhance intrinsic motivation (Deci, Koestner, & Ryan reported d = +0.33 on free-choice behavior). The caveat from cognitive evaluation theory is that praise must be experienced as informational (“here’s specifically what you did well”) rather than controlling (“good, you did what I wanted”). Controlling praise can carry some of the same risk as a controlling tangible reward, and the enhancing effect of praise is weaker in children than in adults.

Why does gamification so often backfire on this? Because the standard gamification move sets all four dials to their riskiest position at once: it takes an activity users already enjoy (high baseline interest) and adds expected, tangible points and badges that are contingent on mere participation. That is the exact recipe the research identifies as maximally undermining. Engagement spikes while points are novel, then often falls below baseline once they lose value, because the activity gets recoded as point-farming. Gamification that adds informational competence feedback, or that targets behaviors users don’t already do intrinsically, avoids the trap.

Were the original findings on children or adults, and does it matter? The Lepper, Greene, and Nisbett (1973) magic-marker study was on preschoolers; Deci (1971) was on college students; both found undermining. It matters because Deci, Koestner, and Ryan’s meta-analysis found tangible rewards are more detrimental for children than for college students, and verbal rewards less enhancing for children. So the warning should be weighted more heavily in education and parenting contexts than in adult workplaces, where well-designed performance-contingent and informational rewards have more room to help.

How should I evaluate someone who cites “the overjustification effect” to justify a decision? Ask the four boundary-condition questions. Is the reward tangible or verbal? Expected or a surprise? Contingent on what — showing up, doing the task, or doing it well? And does the person already do this activity for its own sake? If the citation survives all four — an expected, tangible, engagement-contingent reward layered onto an already-loved activity — the undermining concern is legitimate. If it fails any of them, the person is invoking a slogan, not the science. A finding you can’t decompose into its conditions isn’t protecting the decision.
