Milgram Obedience Experiments: What The Yale Archives Actually Show

Atticus Li



The famous "65% of people will shock a stranger to death" finding is a real result from a single condition out of 24. When Gina Perry combed Milgram's Yale archives, the picture got more complicated — and more interesting — than the textbook version.

By Atticus Li May 8, 2026 27 min read


The story is one of the most cited in 20th-century social science. In 1961, a young Yale psychologist named Stanley Milgram recruited ordinary New Haven residents --- schoolteachers, postal workers, salesmen --- through a newspaper ad. He told them they would participate in a study on memory and learning. He paid them four dollars (about $43 in today’s money) and brought them into a basement lab. There, an experimenter in a grey lab coat instructed each participant to deliver electric shocks of escalating intensity to a “learner” --- actually an actor --- strapped into a chair in the next room. The shocks ranged from 15 to 450 volts. The voltage levels were labeled, ending with “XXX” past “Danger: Severe Shock.”

The learner screamed. He complained about a heart condition. He fell silent. The experimenter, calmly, kept saying things like “The experiment requires that you continue.” And 65 percent of participants kept going. All the way. To 450 volts. To a man who had stopped responding entirely.

The conclusion Milgram drew, in his 1963 Journal of Abnormal and Social Psychology paper and then in his 1974 book Obedience to Authority, was epochal: ordinary people, under instruction from a legitimate authority figure, will do terrible things to other human beings. The Holocaust was not the work of monsters. It was the work of bureaucrats and butchers and clerks who, when an authority told them to do something monstrous, did it. “Nazis weren’t special.” Anyone could be made into a torturer.

This finding has shaped how we think about authority, compliance, organizational ethics, and the human capacity for evil for more than sixty years.

The trouble is: the 65 percent figure is real but selectively reported. The “all the way to 450 volts” framing leaves out almost everything interesting that was actually in Milgram’s data. And when the Australian psychologist Gina Perry got into the Yale archives where Milgram had deposited his audio tapes, his unpublished notes, and his post-experiment correspondence with participants, she found a study that looks materially different from the one that became canonical.

This is the sixteenth article in a series on famous behavioral science studies that didn’t survive scrutiny. Unlike most of them, the Milgram case isn’t a clean replication failure. The basic phenomenon --- that people will do harmful things under authority pressure --- is real. What’s wrong is the simplified story we tell about it. The “65 percent will shock to death” version is one condition out of twenty-four, stripped of every moderator, every doubt, every off-script improvisation, and every participant who wasn’t sure the shocks were real. The underlying truth is both less reassuring and less damning than the popular version, and it has a different lesson for anyone thinking about authority and compliance in organizations.

The Original 1963 Experiment

Milgram’s first paper, “Behavioral Study of Obedience,” appeared in the Journal of Abnormal and Social Psychology in October 1963. The setup is the famous one: a participant (the “teacher”) was paired with a confederate (the “learner”) and told the study concerned the effect of punishment on learning. The teacher sat at a control panel of thirty switches labeled from 15 to 450 volts, in 15-volt increments. The labels escalated from “Slight Shock” through “Moderate,” “Strong,” “Very Strong,” “Intense,” “Extreme Intensity,” “Danger: Severe Shock,” and finally two switches simply marked “XXX.” The participant was instructed to deliver a higher shock each time the learner answered incorrectly. The shocks were fake, and the learner’s reactions were scripted. In this first published condition the learner gave no vocal feedback: he pounded on the wall at 300 volts and then stopped responding. The pre-recorded protests, screams, and complaints about a heart condition that feature in most retellings were introduced in later conditions.

When the teacher hesitated, the experimenter --- a stern man in a grey technician’s coat --- used a sequence of four scripted verbal prods, escalating from “Please continue” through “The experiment requires that you continue” and “It is absolutely essential that you continue” to “You have no other choice; you must go on.” If the participant refused after all four prods, the trial ended.

The headline result from this first published condition: of forty male participants aged 20-50, twenty-six (65 percent) continued administering shocks to the maximum 450 volts. Every single participant continued to at least 300 volts. None broke off earlier than that. The result shocked Milgram himself, who had asked colleagues and psychiatrists to predict the outcome in advance --- their predictions had clustered around 1-3 percent obedience.
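To put the headline number in context, it helps to remember that it comes from forty people. The sketch below recomputes the 65 percent from the counts given above and attaches an approximate 95 percent Wilson score interval. The interval is not something reported in the 1963 paper as described here; the function name `wilson_interval` and the choice of the Wilson method are assumptions made for illustration.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Counts as reported above for the first published condition.
obeyed, total = 26, 40
rate = obeyed / total
low, high = wilson_interval(obeyed, total)

print(f"obedience rate: {rate:.1%}")
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

With forty participants the interval runs from roughly 50 to 78 percent, so even this one condition pins the rate down only loosely.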

This is the version the world remembers. It deserves to be remembered, because it really happened, and the rate is genuinely much higher than reasonable people would have guessed. The trouble starts when you treat this single condition’s result as the finding from “the Milgram experiment” --- singular --- rather than as one data point in a sprawling, varied research program with very different results in other configurations.

The 24 Conditions Most People Never Hear About

The 1963 paper reported only one piece of a larger program. In 1961 and 1962, Milgram ran twenty-four distinct experimental conditions, manipulating things like the proximity of the learner, the proximity of the experimenter, the prestige of the institution, the gender of the participant, and the presence of peers. Eighteen of these conditions were reported in his 1974 monograph; the other six appeared only in unpublished records.

A 2014 meta-analysis published in PLOS ONE synthesized 21 of these conditions for which directly comparable obedience data exist (N = 740 participants). The overall obedience rate across those conditions --- meaning the percentage of participants who continued all the way to 450 volts --- was 43.6 percent, not 65 percent. The condition-by-condition variation is enormous:

Condition 1 (no learner feedback): 65% obedience --- the famous baseline.
Condition 2 (voice feedback from the next room): 62.5%.
Condition 3 (learner in the same room): 40%.
Condition 4 (participant had to physically place learner’s hand on the shock plate): 30%.
Condition 7 (group pressure to disobey): 10%.
Condition 13 (participant in a non-trigger role, instructing someone else): 92.5%.
Condition 14 (participant chooses their own shock level --- “carte blanche”): 2.5%.
Condition 20 (Bridgeport, a non-university setting): 47.5%.
Condition 24 (the learner is a friend or family member of the participant): 15%.

The pattern is clear: obedience rates depend enormously on the situation. Move the learner closer, and obedience drops sharply. Take the prestige of Yale away, and it drops as well, though less steeply. Have the participant choose their own shock level with no instruction, and it collapses to almost nothing. Put group pressure against the experimenter, and obedience falls to one participant in ten.
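The spread in the list above is easy to tabulate. This is a minimal sketch using only the nine rates quoted in this section; the dictionary labels are shorthand invented here, and the unweighted mean of these nine hand-picked conditions is not the same thing as the 43.6 percent pooled figure from the 21-condition meta-analysis.

```python
from statistics import mean

# Percent of participants going to 450 volts, for the nine conditions listed above.
# Keys are informal labels made up for this sketch.
rates = {
    "1 no learner feedback": 65.0,
    "2 voice feedback": 62.5,
    "3 learner in same room": 40.0,
    "4 hand on shock plate": 30.0,
    "7 group pressure to disobey": 10.0,
    "13 non-trigger role": 92.5,
    "14 own choice of shock level": 2.5,
    "20 Bridgeport": 47.5,
    "24 friend or family as learner": 15.0,
}

lowest = min(rates, key=rates.get)
highest = max(rates, key=rates.get)
spread = rates[highest] - rates[lowest]
unweighted_mean = mean(rates.values())

print(f"lowest:  {lowest} ({rates[lowest]}%)")
print(f"highest: {highest} ({rates[highest]}%)")
print(f"spread:  {spread} percentage points")
print(f"unweighted mean of these nine: {unweighted_mean:.1f}%")
```

The point of the exercise is the spread rather than the mean: the same paradigm produces results ninety percentage points apart depending on how the situation is arranged.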

The textbook version --- “65% of people will torture a stranger if a man in a lab coat asks them to” --- silently selects one of the highest-obedience conditions and presents it as if it represented humanity. The actual data, considered across all twenty-four conditions, supports a much more conditional claim: that under a specific combination of high institutional prestige, distant victim, absent witnesses, and constant authority pressure, a high proportion of people will comply with harmful instructions. Outside that specific configuration, compliance varies from near-universal (when the participant is insulated from the act) to near-zero (when the participant has any social or physical proximity that breaks the configuration).

This is not a small caveat. It changes the lesson of the study substantially.

Perry’s Archival Investigation

In 2007, the Australian psychologist Gina Perry began a multi-year investigation into the Milgram experiments. The Yale archives, where Milgram had deposited audio recordings, participant interview notes, unpublished pilot data, post-experiment correspondence with participants, and his own working papers, had been available to researchers for decades. Few had used them at scale. Perry did. She also interviewed surviving participants, surviving research assistants, and Milgram’s former students. Her book, Behind the Shock Machine: The Untold Story of the Notorious Milgram Psychology Experiments, was published in 2013.

Four findings from the archives substantially change the picture.

Many participants suspected or knew the shocks were fake. Milgram knew this. In an unpublished analysis from his own files, the relationship between participants’ belief that the shocks were genuine and their obedience was systematically examined --- and the pattern undermined the headline reading. Perry has reported that, on the unpublished data, participants who fully believed the learner was being hurt were substantially more likely to defy the experimenter than participants who suspected the shocks were fake. In other words: the people who went all the way to 450 volts were disproportionately those who thought the whole thing was a setup. Milgram acknowledged this issue privately in correspondence but did not give it substantial weight in his published work, where he largely dismissed the suggestion that participants saw through the deception.

This single finding upends the inference structure of the famous claim. The “65 percent will shock a man to death” framing depends on the participants believing they were genuinely shocking a man. If half or more of them suspected it was a hoax, then the high obedience rate is partly a measure of how many people went along with what they perceived as theater, not how many were willing to kill on instruction.

The experimenter went off-script. Milgram’s published methodology specified the four scripted prods, in order, and only those prods. If the participant refused after the fourth prod, the trial ended. Perry’s analysis of the original audio tapes --- and subsequent rhetorical analyses by Stephen Gibson published in the British Journal of Social Psychology in 2013 --- showed that the experimenter, Mr. John Williams in the original studies, frequently improvised additional pressure beyond the four scripted prods. He delivered them out of sequence. He repeated them. He sometimes invented new ones to keep the participant at the panel. In some sessions, the prods were used not as a four-step ladder leading to a halt but as a continuous coercive sequence. In one condition with a female participant documented by Perry, the experimenter delivered prods fourteen times.

This is not a minor methodological wrinkle. The whole premise of Milgram’s design is that the obedience rate is a clean measure of how participants respond to a fixed, standardized authority script. If the script wasn’t fixed --- if the experimenter was, in effect, ad-libbing to maximize compliance --- then the obedience rate is partly a measure of the experimenter’s persuasive skill, not the participant’s propensity to obey.

Some participants were severely distressed; debriefing was inadequate. Milgram defended the ethics of the study by claiming that participants had been carefully debriefed and that no lasting harm had been caused. Ian Nicholson, in a 2011 paper in Theory & Psychology titled “Torture at Yale,” used the same Yale archive to document a different picture. Some participants wrote distressed post-experiment letters describing weeks or months of distress. Some had not been told the shocks were fake before they left the lab --- full debriefing was delayed in some cases for months, while participants believed they had actually injured another person. By modern IRB standards, the informed consent and debriefing protocols of the Milgram studies would not be approvable. Nicholson’s reading of the archive is that Milgram managed the public narrative about participant well-being far more carefully than he managed the actual well-being of participants.

Milgram selected which conditions to publicize and how. The 1963 paper reports the single 65 percent baseline. The 1974 book reports eighteen conditions but consistently foregrounds the high-obedience configurations and presents the variation across conditions in a framing that preserves the “anyone can be made to do it” lesson. Perry’s reading of the unpublished notes and correspondence suggests Milgram understood the lower-obedience and disbelief-correlated patterns clearly but did not let them disrupt the headline. This is not a fabrication. It is selective emphasis. But the cumulative effect is that the version that reached the public was a curated subset of the data, presented in a frame that the data only partly supports.

Burger’s 2009 Partial Replication

In 2009, Jerry Burger of Santa Clara University published a partial replication of the Milgram baseline condition in American Psychologist. By modern IRB standards, a full replication is not possible --- no ethics committee in the United States or Europe will approve a study that runs participants through to 450 volts under the original protocol. Burger’s design was a careful workaround.

Burger followed Milgram’s procedure exactly up to the 150-volt mark, the point at which the learner first protests verbally and asks to be let out. At 150 volts, the experiment was stopped. Burger’s reasoning, supported by Milgram’s own data, was that 150 volts is an inflection point: of Milgram’s original participants who crossed that threshold, 79 percent continued all the way to 450. So 150-volt compliance is a defensible proxy for full-protocol obedience.

The study used 70 participants (29 men and 41 women), aged 20-81, screened to exclude anyone with substantial psychology coursework or familiarity with Milgram’s work. Participants underwent extensive clinical screening to exclude those at risk of harm. The experimenter was a trained clinical psychologist, present throughout, with authority to halt the study at any sign of distress.

The result: 70 percent of participants continued past 150 volts in Burger’s primary condition, compared to 82.5 percent in Milgram’s comparable condition. The difference was not statistically significant. A condition with a defiant peer reduced obedience modestly but, contrary to expectations, did not eliminate it. There were no significant gender differences.
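For readers who want to see why a gap of 12.5 percentage points can fail to reach significance, here is a rough check using a pooled two-proportion z-test. The group sizes are an assumption: the text gives only percentages, and 40 participants per group is simply a size consistent with both of them (28 of 40 is 70 percent, 33 of 40 is 82.5 percent). Burger's own analysis may have used a different test, and the function name is hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMPTION: 40 participants per group. The text gives only percentages;
# 28/40 = 70% and 33/40 = 82.5% are counts consistent with them.
burger_continued, burger_n = 28, 40
milgram_continued, milgram_n = 33, 40

z, p_value = two_proportion_z_test(
    burger_continued, burger_n, milgram_continued, milgram_n
)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Under these assumed counts the p-value comes out well above the conventional 0.05 cutoff. With groups this small, "not significantly different" is a weak statement: it does not show the two rates are the same, only that the samples cannot tell them apart.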

Burger’s interpretation: the situational forces Milgram identified are still operative nearly half a century later. Average people, in the relevant configuration, will still cross the 150-volt threshold at high rates.

What Burger’s study can and cannot tell us is important to keep straight. It can support the claim that the partial obedience phenomenon --- willingness to continue past 150 volts --- is real and robust over time. It cannot, by design, replicate the full “shock to 450 volts” finding, because no one is allowed to run that study anymore. The popular framing of Burger as “Milgram has been replicated” is partially accurate and partially overstated. The 150-volt result is replicated. The 450-volt result is not replicated and cannot be, and the 79 percent extrapolation from Milgram’s data rests on the original Milgram data, which is itself the subject of Perry’s archival critique.
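The proxy logic can be written out as plain arithmetic. This is a minimal sketch of the extrapolation, under the strong assumption that the 79 percent continuation rate from Milgram's sample carries over to a modern one; the function name is hypothetical.

```python
def implied_full_obedience(rate_past_150, continuation_rate=0.79):
    """Extrapolate a 450-volt rate from a 150-volt rate.

    continuation_rate is the share of Milgram's participants who, having
    passed 150 volts, went on to 450 (79 percent, as quoted above).
    Treating it as fixed across eras is an assumption, not a finding.
    """
    return rate_past_150 * continuation_rate

milgram_implied = implied_full_obedience(0.825)
burger_implied = implied_full_obedience(0.70)

print(f"Milgram, implied 450-volt rate: {milgram_implied:.1%}")
print(f"Burger, implied 450-volt rate:  {burger_implied:.1%}")
```

Applied to Milgram's own 82.5 percent, the multiplication gives about 65 percent, as expected if the 79 percent is derived from that same condition. Applied to Burger's 70 percent it gives about 55 percent, but that number is an inference from 1960s data, not something anyone observed in 2009.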

The Engaged-Followership Reinterpretation

In 2015, Alex Haslam, Stephen Reicher, and colleagues published a paper in the British Journal of Social Psychology titled “‘Happy to have been of service’: The Yale archive as a window into the engaged followership of participants in Milgram’s ‘obedience’ experiments.” It is one of the most important recent reframings of what Milgram’s data actually shows.

Haslam et al. went back to a class of Yale archive material that prior critics had largely overlooked: the post-experiment feedback questionnaires that Milgram sent participants and their written responses. Across the responses they analyzed, just over two-thirds of participants reported being moderately, highly, or extremely engaged in Milgram’s scientific project. Fewer than 10 percent could be characterized as disengaged. Many participants --- including those who had gone all the way to 450 volts --- wrote that they were glad to have participated, that they felt they had contributed to important science, and that they understood and approved of Milgram’s purposes once they had been debriefed. The title quote --- “Briefly, I was happy to have been of service” --- was from a participant who had complied fully.

The interpretation Haslam, Reicher, and colleagues offer is that the standard reading of Milgram is psychologically wrong in a specific way. Participants who continued were not, in most cases, mechanically obeying an authority they wanted to defy. They were identifying with the experimenter and his mission. They believed they were helping advance science. They were not coerced victims of authoritarian pressure but engaged collaborators who had bought into the legitimacy of the experimental project. The relevant psychological state is not “blind obedience” but “engaged followership” --- a willingness to do morally questionable things in service of a cause one believes in, with leaders one identifies with.

This reframing is consistent with three patterns in the data. First, it explains why obedience drops so sharply when participants are physically close to the victim or when peers model defiance --- these are conditions that disrupt the participant’s identification with the scientific project. Second, it explains why participants who suspected the shocks were fake were more likely to continue: if you think it’s theater in service of science, you’ll play your part; if you think you’re actually torturing someone, the followership frame breaks down. Third, it explains the feedback data --- participants wrote that they were glad to have participated because, in their own self-understanding, they had been collaborators in a meaningful enterprise, not victims of pressure.

The engaged-followership reading does not rehabilitate the “65 percent will shock to death” claim. It replaces it with a more textured and, in some ways, more disturbing claim: that the social-psychological recipe for getting ordinary people to do terrible things is not coercive pressure from authority but recruitment into a project they identify with, under leadership they trust, for purposes they accept as legitimate. This is closer to what scholars who study mass atrocity have long argued. Christopher Browning’s Ordinary Men, on the Hamburg police battalion that carried out massacres in occupied Poland, makes essentially the same point: the participants were not coerced into killing. They were enrolled into a project they came, gradually, to accept.

What’s Honest to Say Now

Pulling the threads together, here is what the evidence --- primary data, archival reanalysis, partial replication, and modern reinterpretation --- actually supports.

The obedience phenomenon is real. Under specific configurations involving high-prestige institutional setting, distant victim, absent peers, and constant authority pressure, a high proportion of ordinary people will continue to deliver what they believe are harmful actions on instruction. Burger’s partial replication supports this finding, at least for the 150-volt threshold. The phenomenon is robust enough to be considered a genuine and important social-psychological finding.

The famous 65 percent figure is a single condition’s result, not a universal claim about human nature. Across the twenty-one comparably measured Milgram conditions, average obedience to 450 volts was 43.6 percent, with enormous variation depending on situational factors. The “65 percent” framing selects one of the highest-obedience configurations and presents it as if it represented humanity. The honest summary is that obedience rates vary from near-zero to over 90 percent depending on the configuration, and the average across conditions is much lower than the canonical number.

The participants were not necessarily believers. Milgram’s own unpublished analysis showed that participant skepticism about the reality of the shocks was widespread and was systematically related to obedience --- with skeptics more likely to comply. This complicates the inference from “obedience to fake shocks” to “willingness to harm under authority.”

The experimenter was not standardized. The protocol that defined Milgram’s claim --- four scripted prods, used in order, ending the trial after the fourth --- was not always followed. The audio record shows extensive improvisation. The obedience rate is partly a measure of the experimenter’s persuasive skill, not just the participant’s disposition.

The engaged-followership reading fits the data better than the blind-obedience reading. Participants who continued were largely those who identified with the scientific project and saw themselves as collaborators, not victims of pressure. This reading is consistent with the variation across conditions, with the suspicion/obedience correlation, and with the post-experiment feedback data.

A fair summary of the academic consensus, which has been developing for fifteen years and is now reasonably mainstream in social psychology, is something like: Milgram demonstrated a real and important phenomenon --- that people will do harmful things at the instruction of an authority figure under certain configurations --- but the canonical “65 percent will shock to death” framing oversimplifies the data, ignores the conditional variation, depends on a method that was not reliably standardized, and substitutes a simple authority-coercion model for the more accurate engaged-followership model. The phenomenon is closer to a finding about persuasion and recruitment into a cause than to a universal truth about human nature.

What This Means For Strategists

If you’re a leader, founder, consultant, or anyone whose decisions involve thinking about authority, compliance, organizational ethics, or the design of incentive structures, the Milgram literature in its honest form has three concrete implications.

1. The “anyone can be made to do terrible things” framing is misleading. The accurate framing is “people will do terrible things for projects they identify with under leaders they trust.” This distinction matters enormously for how you think about organizational ethics. If you believe the simple Milgram story, the lever for preventing wrongdoing is to make people less compliant --- train them to resist authority, build whistleblower channels, encourage skepticism of leaders. These are useful but they address the wrong mechanism.

The engaged-followership reading suggests a different lever. Most organizational wrongdoing doesn’t come from people being coerced into actions they morally oppose. It comes from people who have bought into the company’s mission, who identify with its leaders, who believe the project is legitimate and important, and who therefore find ways to rationalize ethically problematic actions in service of that project. Enron, Wells Fargo, Theranos, Volkswagen --- the wrongdoing in each case was carried out largely by enthusiastic employees who believed they were doing important work. The bystanders who could have stopped each scandal were largely engaged followers, not coerced subordinates.

The implication: if you want an ethical organization, “train people to resist bad orders” is a weaker lever than “be deeply careful about what missions your culture engages with and what your senior leaders signal as legitimate.” Engaged followership is amplifying whatever the project actually is. Make sure the project is something that should be amplified.

2. The variation across Milgram’s conditions is the actually useful finding for organizational design. The 24-condition data set is, when you take it seriously, an empirical inventory of what disrupts harmful compliance. Proximity to the consequences of your actions reduces compliance dramatically. Visible peer defiance reduces compliance to about 10 percent. Reduced institutional prestige reduces compliance. The participant choosing their own action with no prescriptive instruction reduces compliance to almost nothing.

Translate this to organizational design. If you want employees to be able to push back on questionable directives, the most effective levers are not training programs or whistleblower hotlines (those help, but they’re weak). The strong levers are: make sure decision-makers see the human consequences of their decisions directly (proximity), ensure that visible respected peers model dissent when warranted (group pressure to disobey), reduce the institutional aura around authority figures so that “the boss said so” carries less weight than the merits of the request, and design decision processes so that individual employees have to take affirmative responsibility for their actions rather than just executing instructions.

These are the things that move compliance in Milgram’s actual data. Most corporate ethics programs work on much weaker levers.

3. Be suspicious of canonical findings that get cited as one number. The “65 percent” finding from Milgram, the “average person has eight seconds of attention” finding from a Microsoft white paper, the “55 percent of communication is body language” finding from Mehrabian --- these are all cases where a single, narrow result from a specific configuration became, in popular treatment, a universal claim about human nature. The original researchers usually understood the limitations. The popular versions strip those limitations away.

When you encounter a single-number behavioral science finding in a presentation, business book, or training program, the first question to ask is: what’s the variation? What’s the range across conditions? What’s the average effect, not just the headline effect? If the answer is “the cited number is from one condition out of many, and the average is substantially different” --- and this is the case for a striking number of canonical findings --- then the cited number is being used rhetorically, not scientifically.

This is the strategist’s discipline: the willingness to ask, of any vivid behavioral-science claim, “what does the full data set look like, not just the headline number.” For Milgram, the answer is that the underlying phenomenon is real but the canonical framing is selectively constructed. For your decisions, that distinction usually changes the implications considerably.

Sources

[Milgram, S. (1963). Behavioral study of obedience. Journal of Abnormal and Social Psychology, 67(4), 371-378. DOI: 10.1037/h0040525](https://psycnet.apa.org/doi/10.1037/h0040525) --- the original 1963 publication reporting Condition 1 (65% obedience).
Milgram, S. (1974). Obedience to Authority: An Experimental View. Harper & Row. --- the book-length monograph reporting 18 of the 24 conditions.
Perry, G. (2013). Behind the Shock Machine: The Untold Story of the Notorious Milgram Psychology Experiments. The New Press. --- primary archival investigation, drawing on the Yale Milgram archive (audio tapes, unpublished pilot notes, participant correspondence).
[Burger, J. M. (2009). Replicating Milgram: Would people still obey today? American Psychologist, 64(1), 1-11. DOI: 10.1037/a0010932](https://psycnet.apa.org/doi/10.1037/a0010932) --- partial replication using the 150-volt threshold (70% compliance, 70 participants).
[Haslam, S. A., Reicher, S. D., Birney, M. E., Millard, K., & McDonald, R. (2015). ‘Happy to have been of service’: The Yale archive as a window into the engaged followership of participants in Milgram’s ‘obedience’ experiments. British Journal of Social Psychology, 54(1), 55-83. DOI: 10.1111/bjso.12074](https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bjso.12074) --- engaged-followership reinterpretation based on post-experiment participant feedback in the Yale archive.
[Gibson, S. (2013). Milgram’s obedience experiments: A rhetorical analysis. British Journal of Social Psychology, 52(2), 290-309. DOI: 10.1111/j.2044-8309.2011.02070.x](https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/j.2044-8309.2011.02070.x) --- discursive analysis of audio recordings documenting experimenter improvisation beyond the four scripted prods.
[Nicholson, I. (2011). “Torture at Yale”: Experimental subjects, laboratory torment and the “rehabilitation” of Milgram’s “Obedience to Authority.” Theory & Psychology, 21(6), 737-761. DOI: 10.1177/0959354311420199](https://journals.sagepub.com/doi/10.1177/0959354311420199) --- archival evidence on participant distress and inadequate debriefing.
[Haslam, S. A., Loughnan, S., & Perry, G. (2014). Meta-Milgram: An empirical synthesis of the obedience experiments. PLOS ONE, 9(4), e93927. DOI: 10.1371/journal.pone.0093927](https://pmc.ncbi.nlm.nih.gov/articles/PMC3976349/) --- meta-analysis of 21 Milgram conditions (N=740) showing 43.6% overall obedience rate.

This article is part of an ongoing series on famous behavioral-science studies that did not survive scrutiny. Other entries cover the Stanford Prison Experiment, power posing, the marshmallow test, ego depletion, the bystander effect and the Kitty Genovese case, and the Mozart Effect. The full hub lives at /replication-crisis/.


FAQ

Did 65 percent of people in the Milgram experiment really shock a man all the way to 450 volts? In Condition 1 of the 1963 study, yes --- 26 of 40 participants continued to 450 volts. But that was one of the highest-obedience configurations among the twenty-four conditions Milgram ran. Across the 21 conditions for which directly comparable data exist (N = 740), the average rate of going to 450 volts was 43.6 percent. Other conditions ranged from 2.5 percent (when participants chose their own shock level) to 92.5 percent (when participants were in a non-trigger role). The “65 percent” headline is real but is one condition’s result, not a universal finding.

Did Milgram fake or manipulate the results? There is no evidence of outright fabrication. What Perry and others have documented is more like selective emphasis: Milgram knew that many participants suspected the shocks were fake (and that the experimenter often went off-script with the four prods), but he largely omitted these complications from his publications and his public framing. The data points are in his archive; they just weren’t featured.

Has the Milgram experiment been replicated? A full replication is not possible under modern IRB rules --- no ethics committee will approve running participants through to 450 volts. Burger’s 2009 partial replication, which stopped at 150 volts, found 70 percent continued past that threshold compared to Milgram’s 82.5 percent. This supports the partial obedience phenomenon but cannot directly replicate the famous “all the way to 450” finding.

What does “engaged followership” mean and how is it different from blind obedience? Engaged followership, the reinterpretation proposed by Haslam and Reicher, holds that Milgram’s participants who continued were not mechanically obeying an authority they wanted to defy --- they were actively identifying with the experimenter’s scientific project and saw themselves as collaborators in important work. The relevant psychology is identification and shared purpose, not coercion. This fits the post-experiment feedback in which participants frequently wrote that they were glad to have participated and felt they had contributed to important science.

If the popular Milgram story is oversimplified, does that mean ordinary people aren’t capable of doing terrible things on instruction? No --- the underlying phenomenon is real and important. Ordinary people will do harmful things at the instruction of legitimate authorities under specific configurations. What’s wrong is the simplified “anyone can be made to do anything” framing. The more accurate version is that compliance with harmful instructions depends heavily on how the situation is configured: proximity to the victim, presence of dissenting peers, perceived legitimacy of the project, identification with leaders. These moderators are large enough to take compliance from 2.5 percent to 92.5 percent in Milgram’s own data.

Were the participants harmed? By modern standards, yes. Some participants experienced significant distress during and after the studies, and debriefing was in some cases delayed for months. Ian Nicholson’s 2011 paper documents post-experiment letters from distressed participants and argues that the informed-consent and debriefing protocols would not be approvable under current IRB rules. Milgram defended the ethics of the study publicly but managed the public narrative about participant well-being more carefully than he managed the actual well-being of participants.

What’s the right way to use Milgram’s findings in organizational ethics or leadership training? The most useful finding is not the 65 percent baseline but the variation across the 24 conditions, which functions as an empirical inventory of what disrupts harmful compliance: proximity to consequences, visible peer dissent, reduced institutional prestige, individual responsibility for choices. These are the strong levers for designing organizations where employees can push back on questionable directives. Training programs and whistleblower hotlines work on weaker levers. And the engaged-followership reading suggests that the more important question is what mission and leadership your culture is amplifying --- not just whether employees can resist explicit bad orders.

Why does the simplified “65 percent” version persist in textbooks, business books, and the popular imagination? The same forces that preserved the Stanford Prison Experiment story: a charismatic researcher who was a brilliant communicator, a vivid finding that summarizes in one sentence, a cultural appetite for an explanation of how ordinary people end up complicit in mass atrocities, and the textbook ratchet effect by which findings, once canonical, are very expensive to revise. The academic consensus has been moving toward the more nuanced reading for fifteen years, but popular treatment lags the academic update by at least a decade. Expect the simplified version to persist in business and self-help writing for some time yet.

replication-crisis milgram-obedience social-psychology research-ethics archival-reinterpretation



Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.
