For 50 years, the Stanford Prison Experiment was the canonical proof that “ordinary people become evil under bad systems.” Then the audio tapes came out. What strategists, founders, and consultants should learn from one of the most-cited and least-replicable studies in social science.

In August 1971, twenty-four male college students walked into the basement of the Stanford psychology building. Six days later, the experiment was shut down. Half the participants had been “guards.” Half had been “prisoners.” According to the man running the study, things had spiraled so badly out of control that humanity itself was on display --- and what humanity had revealed about itself was ugly.

That story --- “Stanford Prison Experiment: ordinary people will become brutal if you put them in the wrong system” --- became one of the most-cited findings in 20th-century social science. It appeared in every introductory psychology textbook, in countless leadership books, in MBA programs, in arguments about prison reform, in TED talks about institutional culture, and eventually in a feature film starring Billy Crudup. The lesson seemed clear: situations dominate dispositions. Build a bad system, and good people will do bad things.

Almost none of it was true.

Or rather: the version of the story everyone was telling was a story. The thing that actually happened in the basement of Stanford’s Jordan Hall was something quite different --- and the difference matters enormously for anyone whose job involves thinking carefully about evidence, organizational design, or human behavior at scale.

This is the first in a series on famous studies that didn’t survive replication. Stanford Prison goes first because it’s a uniquely instructive case: not just “the data was thin” but “we have audio recordings of how the data was made.” Once you understand what actually went on in Jordan Hall, you’ll have a much sharper eye for the kinds of evidence claims that should make you skeptical.

What Zimbardo Said Happened

The standard textbook version: Philip Zimbardo, then a young psychology professor at Stanford, recruited 24 male volunteers via newspaper ad, paid them $15 a day, screened out anyone with criminal history or psychological issues, and randomly assigned them to be “guards” or “prisoners” in a simulated prison built in the basement of his department building. The plan was a 14-day study.

Within days, Zimbardo reported, the guards had become brutal. They woke prisoners at 2 a.m. for forced exercise. They stripped them. They put bags on their heads. They forced one prisoner to simulate sodomy on another. Several prisoners had emotional breakdowns. One --- “Prisoner #8612,” real name Doug Korpi --- went into an apparent psychotic episode within 36 hours, screaming, and had to be released early. The experiment was shut down on day six.

The conclusion Zimbardo and his coauthors drew, in a 1973 paper published in the International Journal of Criminology and Penology, was that the participants’ behavior arose spontaneously from the situation itself. Ordinary college students, given a uniform and a role, became sadists. Ordinary college students, stripped of dignity and given prisoner numbers, became broken. The implication for prison reform, military culture, and corporate behavior was massive: change the system, and the people change with it.

This story dominated social psychology for half a century. It became foundational to how the public, the media, and many serious institutions understand human nature.

What the Audio Tapes Show

In 2018, a French researcher named Thibault Le Texier obtained access to Stanford’s archive of original SPE materials --- researcher notes, interview transcripts, and most importantly, audio recordings. He published his findings in American Psychologist in 2019 in a paper titled, with admirable directness, “Debunking the Stanford Prison Experiment.”

What the archives showed was not what Zimbardo had been telling people for fifty years.

The guards were coached. The most damaging revelation was an audio recording in which David Jaffe, the study’s “warden” (and a Stanford undergraduate working under Zimbardo), explicitly instructed a “lenient” guard to be tougher. Jaffe told the guard that the experiment depended on the guards being “tough” so the prisoners would react. When the guard pushed back, Jaffe pressured him further. This wasn’t a one-off. The archives showed a pattern of researchers steering guards toward harshness --- including pre-experiment briefings in which Zimbardo himself emphasized that the guards needed to create a sense of “fear, frustration, arbitrariness.”

This single fact destroys the central claim. The famous SPE conclusion --- that brutality emerged spontaneously from the situation --- assumes the guards weren’t told to be brutal. They were.

The “psychotic break” was strategic. Doug Korpi (Prisoner #8612), whose breakdown became the iconic moment of the study, has stated in subsequent interviews --- notably with journalist Ben Blum for a 2018 Medium piece --- that he had faked the episode. He wanted out of the study. He had GRE exams to prepare for and discovered too late that he wasn’t allowed to study during the experiment. So he performed a meltdown to be released. His “breakdown” --- used for fifty years as proof that the situation had genuinely traumatized him --- was demand-characteristic theater.

The guards knew what findings were expected. Several of the guards reported in interviews that they understood what Zimbardo wanted from them and tried to provide it. Some described their behavior as a kind of role-play --- what they thought a researcher studying prison brutality would want to see.

There was no peer review of the original methodology. The 1973 paper was not published in a top peer-reviewed psychology journal. It appeared in the International Journal of Criminology and Penology, a low-tier outlet at the time. Zimbardo’s media appearances did far more to establish the study’s credibility than its actual scientific reception did.

Le Texier’s archival findings have drawn little substantive rebuttal from anyone other than Zimbardo and his collaborators. The audio is real. The interviews are documented. The archival evidence is on the public record.

The Counter-Replication Almost No One Heard About

In December 2001, the BBC funded a serious attempt at a Stanford-style prison experiment, broadcast in 2002 and run by social psychologists Stephen Reicher (St Andrews) and Alex Haslam (Exeter at the time, now Queensland). Fifteen male volunteers, eight days, five randomly assigned guards and ten randomly assigned prisoners, full ethics oversight, and --- critically --- no coaching.

What happened was almost the opposite of Zimbardo’s narrative. The guards didn’t spontaneously brutalize the prisoners. Many of them found the role uncomfortable and tried to be fair. As guard authority weakened, prisoners organized, demanded better treatment, and eventually staged a successful breakout. By the end of the study, prisoners and former guards were collaboratively building a more egalitarian regime. The researchers ended the experiment when the new “commune” began showing signs of becoming authoritarian itself --- but the overall arc was nothing like Stanford’s.

Reicher and Haslam published their findings in the British Journal of Social Psychology in 2006. Their interpretation was that what had happened at Stanford wasn’t spontaneous role-conformity at all. It was what they called “engaged followership”: guards followed instructions from a leader (Zimbardo, Jaffe) whom they identified with and whose project they wanted to support. Take away the engaged leadership, and you don’t get spontaneous brutality. You get awkward people in costumes who eventually figure out how to get along.

The BBC Prison Study didn’t get a feature film. There was no Hollywood version of “Fifteen People In a Mock Prison Try To Be Decent To Each Other.” There was no TED talk. The cultural mass behind the original Stanford story was so heavy that even an explicit, methodologically superior conceptual replication couldn’t dent it.

Why the Original Looked Real

If the Stanford Prison Experiment was so flawed, why did the field believe it for fifty years? Five reasons, all of which generalize to other replication failures.

A charismatic researcher and a media-friendly story. Zimbardo was a brilliant communicator. The SPE story is dramatic, visual, and easy to summarize in one sentence. Many of the heavily cited findings in social psychology that later turned out to be wrong share this property: they’re memorable enough to outrun their evidence.

A vivid n=24 study published in a forgettable journal. Twenty-four participants is not enough to make a strong claim about human nature. The study was never replicated by the original team in a more rigorous form. By the time the field had developed the statistical maturity to demand large preregistered samples, SPE had already become canonical, and challenging it was professionally risky.

The cultural moment. The early 1970s --- post-Milgram, post-My Lai, with the Vietnam War still under way --- was hungry for evidence that ordinary people could be turned into perpetrators by bad systems. SPE provided that evidence, and the cultural appetite for it overwhelmed normal scientific scrutiny.

Confirmation through anecdote. Once SPE was famous, every news story about prison abuse, military atrocity, or corporate malfeasance got framed as “another Stanford Prison.” Each anecdote made the original seem more validated, even though no new data was being added.

The textbook ratchet. Once a finding enters introductory psychology textbooks, it becomes very expensive to remove. Generations of students are trained to believe it. Those students become professors who teach the next generation. Removal requires a generational turnover and a critical mass of contradicting evidence --- and even then, the popular version persists for decades after the academic version is gone.

The Honest Verdict Today

The Stanford Prison Experiment is no longer treated as serious evidence of spontaneous role conformity by working social psychologists. Many modern textbooks have either removed it, demoted it to a historical case study, or added the Le Texier critique. Reicher and Haslam’s “engaged followership” account (developed in their later work under the label “identity leadership”) --- that people behave badly when they’re following leaders they identify with, not when they’re spontaneously becoming bad --- is now a leading academic interpretation.

Outside academia, the original story is still everywhere. Leadership books still cite it. MBA case studies still use it. Pop-psych podcasts still trot it out. The cultural memory has lagged the academic correction by at least a decade and probably more.

This gap --- between what the field believes and what the public believes --- is itself one of the most important things to understand about behavioral science evidence. The version of behavioral science that reaches you through TED talks, business books, and Twitter threads is consistently several years behind the version that exists in the journals, and is heavily filtered for storytelling potential rather than evidential strength.

What This Means If You’re a Strategist

If you’re a leader, founder, consultant, or anyone whose decisions depend on understanding human behavior, the Stanford Prison Experiment story has three concrete implications.

1. Distrust evidence that’s too narratively perfect. The reason SPE became famous wasn’t the strength of its data. It was the strength of its story. When a behavioral science finding is unusually clean, dramatic, and easy to summarize in a sentence, it should make you more skeptical, not less. Real human behavior is messy, contextual, and full of moderators. Findings that conveniently strip out the messiness are often artifacts of methodology or framing rather than discoveries about human nature.

This is a version of what Daniel Kahneman calls “the illusion of validity”: the more coherent a story sounds, the more we believe it, regardless of whether the underlying evidence supports it. For consultants and strategists, this bias is particularly costly because compelling stories are exactly what we get paid to produce. Build the discipline of asking “what’s the actual sample size, what’s the actual effect, what’s the actual replication record” before you treat a story as a reliable input to a decision.

2. The biggest organizational behavior lever isn’t “the system” --- it’s leadership signaling. The most important takeaway from the SPE/BBC Prison Study comparison isn’t “people are basically good.” It’s that people calibrate their behavior to what they perceive their leaders want. The Stanford guards were brutal because Zimbardo and Jaffe signaled that brutality was what the study needed. The BBC guards weren’t brutal because no one signaled they should be.

Translate this to organizational design. If your culture has problems --- sandbagging, blame-shifting, short-termism, customer hostility --- the temptation is to say “the system is causing this” and redesign processes. Sometimes that’s right. But more often, the strongest signal employees are responding to is what they think leadership actually wants, regardless of stated values. Misaligned incentive structures are loud. Leadership signaling is louder. The two together are dispositive.

This has direct implications for any change initiative. Don’t just redesign the org chart. Audit what behaviors leaders implicitly reward and which they implicitly tolerate. Those are the signals being followed.

3. Old findings deserve a fresh credibility check before you build strategy on them. Almost everyone has built a mental model of “how humans work” partly on the back of pop-psych findings from the past half-century. Stanford Prison. Marshmallow Test. Bystander Effect. Power Posing. Stereotype Threat. Most of these have either failed replication outright, had their effect sizes substantially revised downward, or had their famous origin stories corrected in the last fifteen years.

This doesn’t mean behavioral science is useless. It means the half-life of a “finding” in this field is shorter than people assume, and the version of behavioral science you absorbed from pop sources is overdue for an audit. Before you cite a study to support a business decision --- pricing, hiring, organizational design, customer experience --- check the replication record. The five minutes it takes to look up a meta-analysis is the cheapest insurance against making decisions on findings that the field has quietly walked away from.

The strategist’s job is not to know the latest science. It’s to know which evidence claims have earned the right to influence a decision and which haven’t. That distinction is most of the value.

This article is part of an ongoing series on famous behavioral science studies that did not survive replication. Other entries cover power posing, the marshmallow test, ego depletion, the bystander effect, and the Mozart Effect. The full hub lives at /replication-crisis/.

If you’re building organizational, hiring, or pricing strategy on behavioral-science assumptions and want a careful audit of which of those assumptions still hold up, book an evidence review.

FAQ

Was the Stanford Prison Experiment ever peer-reviewed? The 1973 publication appeared in the International Journal of Criminology and Penology, which was a low-tier outlet at the time. The study did not undergo the kind of rigorous peer review now expected for major social-psychology claims. Its credibility was largely established through Zimbardo’s media presence rather than through the academic literature.

Did Zimbardo respond to the Le Texier critique? Zimbardo, who died in 2024, published responses on his personal site and gave media interviews defending the original study. He did not, to our knowledge, fully address the most damaging archival evidence --- particularly the audio recording of David Jaffe coaching guards to be tougher. The audio is publicly available and has been widely discussed since 2018.

Does this mean situations don’t influence behavior? No. Situations clearly do influence behavior --- that part of social psychology is well-supported by other research. What’s not supported is the strong claim that situations alone produce dramatic personality changes in ordinary people without any explicit pressure or leadership cue. The honest version is that situations and leadership signaling jointly shape behavior, and the SPE story dramatically overstated the situational part by hiding the leadership-signaling part.

Where can I read more about the replication crisis in behavioral science? Start with Stuart Ritchie’s Science Fictions (2020) for a general overview. For social psychology specifically, the Center for Open Science hosts large-scale replication efforts, such as the Reproducibility Project: Psychology, on its Open Science Framework. For ongoing coverage of which specific famous findings did and didn’t replicate, the hub page collects them in one place.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.