For more than sixty years, Robber’s Cave was the canonical proof that intergroup conflict emerges spontaneously when groups compete and resolves when they cooperate on a common goal. Then Gina Perry pulled Muzafer Sherif’s notes out of the archive. Here is what strategists building team-building and conflict-resolution frameworks on Sherif should actually know.

In the summer of 1954, twenty-two white middle-class boys, aged eleven and twelve, were taken to a Boy Scout camp in the Sans Bois Mountains of southeastern Oklahoma. They were strangers to each other. Their parents had paid the camp fee. None of them — and apparently none of their parents — knew they were participants in a social-psychology experiment being run by Muzafer Sherif of the University of Oklahoma.

The boys were split into two groups before they ever met. They were given group names: the Rattlers and the Eagles. They lived in separate cabins, ate at separate times, and were not told the other group existed. Within days, each group had developed its own norms, leadership structure, and identity. Then the experimenters introduced the two groups to each other and set up a competition for prizes — a baseball game, a tug-of-war, a treasure hunt — with the losers getting nothing. Within hours, the two groups were burning each other’s flags, raiding each other’s cabins, hoarding rocks for fights, and refusing to eat in the same room. Then the experimenters engineered a series of shared crises — a water-supply problem, a stuck delivery truck — that required both groups to cooperate. Within days, the conflict dissolved. The boys made friends across group lines, asked to ride home on the same bus, and pooled their money to buy malted milks together.

That story — “Robber’s Cave: intergroup conflict emerges spontaneously from competition, and dissolves through cooperation on superordinate goals” — became one of the most-cited findings in 20th-century social psychology. It appears in every introductory textbook. It is foundational to contact theory, to Tajfel and Turner’s social identity theory, to corporate diversity training, to conflict-resolution practice in international NGOs, and to most of what gets sold as “team building” in organizational consulting.

Almost none of the “spontaneous” part was actually spontaneous.

Or rather: the version of the story everyone was telling was a story. The thing that actually happened at Robber’s Cave was something quite different — and the difference matters enormously for anyone whose job involves designing teams, mediating conflict, or evaluating evidence claims about group behavior at scale.

This is part of an ongoing series on famous studies that didn’t survive archival reanalysis. Robber’s Cave belongs in the series for a specific reason: the data weren’t faked, the boys were real, the events happened. What was concealed was the degree of experimenter intervention required to produce them — and the existence of a previous, failed run of the same study in which the boys refused to cooperate with the script.

What Sherif 1954/1961 Published

The standard textbook version comes from two sources: Muzafer Sherif’s 1956 popular-science article in Scientific American titled “Experiments in Group Conflict,” and the more detailed 1961 book Intergroup Conflict and Cooperation: The Robbers Cave Experiment, co-authored with O. J. Harvey, B. Jack White, William Hood, and Carolyn Sherif.

The three-stage design as described in the published account:

Stage 1 — Group Formation (Days 1–6). Twenty-two boys, carefully screened to be from white middle-class Protestant families, of similar age, with no behavioral problems, and unknown to each other. They were transported separately to Robber’s Cave State Park, housed in separate cabins, and kept unaware that another group existed. Each group spontaneously developed its own norms, leadership hierarchy, and identity over the first week — picking group names (Rattlers and Eagles), stenciling them on their shirts, and forming internal status orders through joint activities (hiking, swimming, building a rope bridge).

Stage 2 — Intergroup Conflict (Days 7–13). The two groups were introduced and immediately placed in a tournament — baseball, touch football, tug-of-war, a tent-pitching contest, a cabin-cleanliness inspection, a treasure hunt. The winning group would receive medals and pocket knives. The losing group would get nothing. Within hours of the introduction, the groups began calling each other names. Within a day, they were burning flags, raiding cabins, throwing rocks, and refusing to share a dining hall. Sherif documented escalating hostility — Rattlers calling Eagles “stinkers,” Eagles raiding the Rattler cabin, both groups stockpiling rocks for self-defense.

Stage 3 — Cooperation Through Superordinate Goals (Days 14–20). Sherif first tried “mere contact” — joint meals, a shared movie — and reported that it failed. Hostility persisted; contact alone made things worse. Then he introduced superordinate goals: a deliberately staged crisis with the camp’s water supply (boys from both groups had to inspect the water line together), a stuck delivery truck that required all twenty-two boys pulling on a rope to free, a shared movie that required pooled money. Across several such episodes, the published account reports, hostility dissolved. By the final day, boys had asked to ride home together on the same bus, and several friendships had formed across group lines.

The headline conclusion: intergroup conflict emerges spontaneously when groups are placed in zero-sum competition. It resolves when groups must cooperate on goals that neither can achieve alone. The published methodology presented Sherif and his team as neutral observers — disguised as camp caretakers and counselors — recording behavior that arose naturally from the structure of the situation.

This framing became canonical. It is the foundation of “realistic conflict theory” in social psychology, a standard reference point for work on Allport’s contact hypothesis and its later meta-analytic descendants, and the empirical basis for most team-building exercises that rely on shared-goal interventions to bond groups.

The 1953 Middle Grove Pre-Experiment Perry Found

Australian psychologist and writer Gina Perry spent four years in Sherif’s archives at the University of Akron’s Cummings Center for the History of Psychology. The result was her 2018 book The Lost Boys: Inside Muzafer Sherif’s Robbers Cave Experiment. The most damaging finding wasn’t about Robber’s Cave itself. It was about the experiment that came before.

In the summer of 1953, Sherif ran an earlier version of the protocol (three stages, two groups, intergroup competition, superordinate-goal resolution) at a different summer camp, in upstate New York near Middle Grove. It involved twenty-four boys recruited under the same criteria. The published 1961 monograph mentions this pre-experiment only obliquely. The archives tell the full story.

Middle Grove failed. In a way that contradicted the entire theory.

Sherif’s protocol at Middle Grove had a twist: the boys were allowed to socialize together as one group for the first phase, and then deliberately split into two groups whose composition broke up the friendships they had just formed. The hypothesis was that splitting friends into rival groups would intensify the conflict-creation effect.

What happened instead: the boys, rather than turning on each other along the new group lines, turned on the experimenters. Cross-group friendships were too strong. The boys saw what the adults were doing — engineering rivalry, prodding competition, planting suspicions — and refused to play along. As Perry documents from Sherif’s own notes and from interviews with surviving Middle Grove participants, the boys actively conspired across group lines, sharing food with the “rival” group, refusing to compete in the tournament, and openly accusing the staff of unfairness when the experimenters tried to escalate tensions. One Middle Grove boy, by Perry’s account, told a counselor directly that “you guys are trying to make us fight each other.”

The 1953 Middle Grove pre-experiment, in other words, produced exactly the opposite of Sherif’s theory. Groups in zero-sum competition did not become hostile to each other — they became hostile to the people running the experiment.

Sherif terminated the Middle Grove study without publishing a paper on it. He spent the next twelve months redesigning the protocol. The redesign decisions that produced the 1954 Robber’s Cave version were, by Perry’s archival evidence, explicitly aimed at fixing the “problem” that the 1953 boys had been allowed to form friendships before the rivalry began. The 1954 protocol kept the two groups strictly separate for the entire first stage — they did not know the other group existed, much less have a chance to befriend each other — so that there were no cross-group bonds to override the manufactured rivalry.

The 1953 pre-experiment is not a footnote. It is the single most important piece of context for evaluating the 1954 result. A theory that “spontaneous intergroup conflict emerges from competition” should make the same prediction whether or not boys had three days of pre-rivalry friendship. Sherif’s own data showed the prediction did not hold. He buried that finding and ran a tighter version of the same design until he got the result he expected.

What Perry Documented About The 1954 Run

The 1954 Robber’s Cave study itself, in Perry’s archival reanalysis, also turns out to be far more experimenter-engineered than the published account suggests.

Counselors actively prevented cross-group friendships. The “camp counselors” at Robber’s Cave were research assistants. Perry’s archival material documents that they were instructed to keep the two groups physically separate during Stage 1, to redirect conversations that might lead to discoveries of the other group’s existence, and to discourage individual boys who showed signs of curiosity about the “noises from the other side of the camp.”

Counselors encouraged and accompanied the boys’ “raids.” Several of the famous incidents in the published narrative — cabin raids, flag burnings, rock-stockpiling — were not purely spontaneous. Perry documents counselor involvement in escalating the disputes, including counselors accompanying boys on cross-camp raids, suggesting targets, and in some cases handing the boys the materials needed for the raid.

Counselors planted suspicions and showed strategic favoritism. When intergroup hostility threatened to subside, the staff intervened. Perry’s archival evidence includes notes on counselors making comments to one group designed to inflame suspicion of the other (“did you see what the Rattlers said about you?”), and showing favoritism in competitive judging to keep the competition perceived as unfair.

Sherif himself was disguised as the camp caretaker so he could move around the camp making observations without being recognized as a researcher. The published methodology described this as standard naturalistic observation. The archives document that Sherif was actively directing the staff in real time, often outside the boys’ awareness, including signaling them to escalate or de-escalate specific situations.

The cumulative picture is not of researchers observing spontaneous intergroup conflict. It is of researchers actively engineering a sequence of conflicts that the boys, given a freer hand, might never have produced. The 1953 Middle Grove pre-experiment is the empirical demonstration of exactly this: when the manipulation was less aggressive and the boys had pre-existing cross-group bonds, the same protocol produced no spontaneous intergroup hostility at all.

The Participants’ Accounts

Part of Perry’s project was tracking down the surviving Robber’s Cave participants, by then men in their late sixties and seventies, and asking them what they remembered. None of them, until Perry contacted them, knew they had been in a psychology experiment. Their parents had signed them up for what looked like a summer camp scholarship.

Several recurring themes from those interviews, as Perry reports them:

Confusion about the counselors’ behavior. Multiple participants described episodes in which the counselors did things that, in retrospect, made no sense for actual summer-camp staff. One man recalled being “offered a set of knives” by the staff in a way that struck him later as bizarre. Another remembered counselors actively encouraging the boys to do things — break into the rival cabin, taunt the other group — that he, as a parent decades later, would never have wanted a real camp counselor doing to his own children.

Surprise at the framing. The published account frames the boys as having become genuinely hostile, with deep enmity between Rattlers and Eagles. Several survivors told Perry the experience felt more confusing than hostile — they were following along with what the adults seemed to want, but they didn’t carry lasting hatred for the other group and didn’t recognize themselves in the textbook account.

Lasting emotional residue. Some participants reported lasting discomfort — one developed an aversion to lakes, cabins, and tents that he traced to the camp, and went on to specialize in family law protecting children. Another described undergoing what he called a “personality transformation” that he never fully understood and that left him asking, years later, where he had picked up the idea that fights were a normal way to resolve disputes.

None of these accounts, by itself, proves the published findings were wrong. People’s memories of childhood events some sixty years later are not high-quality scientific evidence. But the combined picture (confusion about staff behavior, awareness that the adults were engineering events, no recognition of the published “spontaneous tribal warfare” framing) matches the archival evidence of experimenter intervention much better than it matches Sherif’s published narrative.

What Sherif Suppressed

The most important thing the published version did not disclose was the 1953 Middle Grove failure. Sherif made brief references to “earlier work” in some publications, but never published a paper describing the Middle Grove design, the failed manipulation, or the boys’ active resistance to the experimenters’ attempts to create conflict. A scientific community evaluating the 1954 Robber’s Cave findings was missing the single most important piece of disconfirming evidence.

Other methodological omissions Perry’s archival work surfaced:

  • The degree of experimenter intervention in producing the conflict episodes. The published account presents conflict as emerging from the structural situation (two groups, zero-sum competition). The archives document active staff orchestration of specific conflict events.

  • The selection of the 1954 cohort. Perry’s reading of Sherif’s notes suggests that the boys recruited for the 1954 run were screened more aggressively for traits that would make them respond to the protocol as designed — including a willingness to follow adult authority and an absence of pre-existing friendships.

  • The role of the cooperation-resolution phase. The dissolution of conflict in Stage 3 was also more orchestrated than the published account suggests. The superordinate-goal crises (water shortage, stuck truck) were not natural events the experimenters happened to observe — they were staged events the staff designed, timed, and managed.

These omissions don’t make Sherif a fraud in the Diederik Stapel sense — he didn’t invent participants or fabricate observational data. What he did, on the evidence Perry assembled, was present a heavily manipulated demonstration as a naturalistic observation, conceal a contradictory prior result, and let the field treat his 1954 finding as evidence for spontaneous intergroup conflict when his own 1953 data had shown the spontaneous version did not produce the predicted result.

What This Means For “Spontaneous Intergroup Conflict” Theory

The narrow version of Sherif’s claim — “intergroup conflict emerges spontaneously from zero-sum group competition” — has weak empirical support from Sherif’s own work. The published evidence for spontaneity is largely an artifact of the experimenter intervention the published version did not disclose, and Sherif’s own 1953 pre-experiment showed the predicted spontaneous conflict did not appear when the manipulation was less aggressive.

This does not mean intergroup conflict isn’t real, or that competition has nothing to do with it. It means Sherif’s specific evidence does not establish the strong “spontaneous emergence” version of the claim. A weaker version — “when groups are placed in zero-sum competition and the situation is actively shaped to escalate hostility, hostility tends to escalate” — is supported by Sherif’s evidence. That weaker version, however, does not have the same explanatory or prescriptive force as the textbook version, because almost any sustained group hostility can be produced if the experimenters are willing to keep engineering it.

The broader theoretical claim — that intergroup contact, structured around cooperation on shared goals, can reduce prejudice — has independent empirical support. The single strongest piece of that evidence is Pettigrew and Tropp’s 2006 meta-analysis in the Journal of Personality and Social Psychology, which pooled 515 studies and 713 independent samples and found that intergroup contact, on average, reduces prejudice — with stronger effects when contact involves cooperation, equal status, common goals, and institutional support. That conclusion is robust to dropping Sherif from the evidence base. Tajfel and Turner’s social identity theory, developed in the late 1970s, also has independent experimental support that does not depend on Robber’s Cave.

So the field is in this position: the broad shape of contact theory and social identity theory has independent support; the dramatic textbook demonstration most people remember as “the evidence” does not.

What’s Honest To Say About Intergroup Contact And Conflict Now

A careful evidence-based summary of the current state, separating Sherif’s specific evidence from the broader theoretical framework it was used to support:

Spontaneous intergroup conflict from competition. Weak evidence. Sherif’s published Robber’s Cave findings are heavily compromised by undisclosed experimenter intervention and by the suppressed 1953 Middle Grove failure. There are real-world cases of intergroup conflict arising under zero-sum competition, but Robber’s Cave should not be cited as a clean demonstration of the mechanism.

Cooperative contact reduces prejudice. Strong evidence, independent of Sherif. Pettigrew and Tropp’s 2006 meta-analysis pools several decades of work across many study designs and finds robust support for contact theory, particularly when contact involves equal status, common goals, cooperation, and institutional sanction. The effect sizes are modest but real and have been replicated across cultures.

Social identity theory. Strong evidence, independent of Sherif. Tajfel’s “minimal group” paradigm, which shows that group identification can produce in-group favoritism even when group assignment is trivial and meaningless, has been replicated many times and provides a much cleaner experimental foundation than Robber’s Cave.

Superordinate goals as conflict-reduction tools. Mixed evidence. The Robber’s Cave demonstration of the mechanism is unreliable, but the underlying idea, that getting groups to cooperate on goals neither can achieve alone tends to reduce intergroup hostility, has independent support in conflict-resolution research, including studies of post-conflict reconciliation programs. The mechanism appears to be real; Robber’s Cave is not the reason to believe it.

This is the kind of evidence map a strategist actually needs — one that distinguishes “the dramatic case study you learned in school” from “the underlying claim that survives” from “the modern evidence base that should actually inform decisions.”

What This Means For Strategists Working On Team-Building And Conflict Resolution

If you’re designing team interventions, leading post-merger integration, working on diversity programs, or building conflict-resolution training, the Robber’s Cave story has three concrete implications.

1. Stop citing the specific Sherif demonstration. It’s a weaker piece of evidence than its cultural status implies. The story is dramatic and easy to tell — “two groups, competition, instant hostility, cooperation healed it” — which is exactly the property that makes it dangerous as a basis for designing real interventions. The published evidence is contaminated by undisclosed experimenter manipulation. The 1953 Middle Grove pre-experiment, when boys had pre-existing cross-group bonds and the experimenters were less aggressive, showed the opposite of Sherif’s prediction. If you’re recommending a team-building exercise on the strength of Robber’s Cave, you’re standing on a story, not a finding.

2. Build your interventions on the stronger evidence base instead. The interventions that look reliable in the modern literature — structured cooperative contact, equal-status interaction, shared institutional goals, cross-cutting social identities — have independent meta-analytic support that doesn’t depend on Sherif. Pettigrew and Tropp’s 2006 meta-analysis is the canonical reference. Design your interventions to satisfy the empirically supported conditions (cooperation, equal status, shared goals, institutional support) and don’t pretend you need Robber’s Cave to justify them.

3. Beware of any intervention whose proponents lean heavily on a single dramatic study. The Robber’s Cave problem generalizes. Stanford Prison Experiment, Bystander Effect, Power Posing, Marshmallow Test, Asch Conformity, Milgram Obedience: each of these is popularly known through a dramatic demonstration that subsequent archival or replication work has qualified or undermined to varying degrees, even when the underlying phenomenon has some independent support. The “single famous study” is rarely the strongest evidence for a phenomenon, and consultants or trainers who lead with “as Sherif’s Robber’s Cave Experiment showed…” are usually selling story, not science.

The strategist’s job is to distinguish “what’s the actual contemporary evidence base for this intervention” from “what’s the famous story this intervention gets sold with.” Those are sometimes the same. They are often not. The version of behavioral science that reaches you through TED talks, leadership books, and corporate training decks is consistently several years behind the version that exists in the meta-analyses, and is heavily filtered for storytelling potential rather than evidential strength.

For Robber’s Cave specifically: the underlying mechanisms (contact, cooperation, superordinate goals) have real but modest empirical support. The dramatic Sherif demonstration does not. Use the former. Discount the latter.

Sources

This article is part of an ongoing series on famous behavioral-science studies that did not survive replication or archival reanalysis. Other entries cover the Stanford Prison Experiment, Asch’s conformity study, the Milgram obedience experiments, Broken Windows theory, the marshmallow test, and ego depletion. The full hub lives at /replication-crisis/.


FAQ

Is intergroup conflict not real? Intergroup conflict is real, observable across cultures and historical periods, and a major subject of legitimate social science. What’s not well-established by Sherif’s specific evidence is the strong claim that intergroup conflict emerges spontaneously from zero-sum competition with no other ingredients. Real-world intergroup conflict tends to involve histories, leadership, identity, and resources in ways that the Robber’s Cave demonstration deliberately stripped out — and even then, Sherif’s archival evidence shows the “spontaneous” emergence required substantial experimenter intervention to produce.

What about contact theory? Did Perry destroy that too? No. Contact theory — the idea that structured cooperative contact between groups under specific conditions tends to reduce prejudice — has independent empirical support. The strongest single piece of that support is Pettigrew and Tropp’s 2006 meta-analysis of 515 studies, which finds robust effects that are independent of Sherif’s specific Robber’s Cave findings. The theory is fine. The Sherif demonstration as the textbook evidence for the theory is the weak link.

What about modern team-building exercises built on Sherif? Most corporate team-building exercises that cite Robber’s Cave are using it as a colorful illustration, not as the actual evidence base. The interventions themselves — cross-functional projects, shared-goal workshops, structured cooperation — have other support and tend to work modestly well for the reasons identified in contact-theory meta-analyses. The pragmatic recommendation: don’t cite Sherif in your training deck; the interventions stand on their own evidence base without him.

Was Sherif a fraud? No, not in the way Diederik Stapel or Brian Wansink were. Sherif didn’t invent participants, fabricate observational data, or make up effects that didn’t exist. What he did, on the basis of Perry’s archival evidence, was: (1) bury a contradictory prior experiment that disconfirmed his theory, and (2) present a heavily manipulated experimental demonstration as a naturalistic observation, concealing the degree of experimenter intervention required to produce the published events. That’s serious methodological misrepresentation. It’s not the same as fraud.

Why did this take so long to come out? Sherif died in 1988, and the published version of Robber’s Cave was textbook-canonical for some sixty years before Perry’s book. Several factors contributed: Sherif’s archives weren’t fully catalogued for decades; surviving participants were never systematically interviewed; the 1953 Middle Grove pre-experiment was only briefly referenced in print, not described; and the field had no strong incentive to dig into a finding everyone considered settled. The pattern matches Stanford Prison, Bystander Effect, and several other classic studies: archival reanalysis tends to lag the original publication by roughly half a century, in part because the people who ran the original studies often hold leadership positions in the field during their lifetimes.

Should I still teach Robber’s Cave to my team? Teach it as a case study in how dramatic findings outlive their evidence, not as a settled scientific finding about group dynamics. The story is genuinely useful as an illustration of why archival reanalysis matters, why the gap between textbook science and current science can be decades wide, and why strategists should be especially skeptical of behavioral-science findings that are unusually clean and dramatic. The deeper lesson — distrust narratively perfect demonstrations — is more useful for a working strategist than the original Sherif takeaway ever was.

Where can I read more about archival reanalysis of classic studies? Start with Gina Perry’s two books — Behind the Shock Machine (2013) on Milgram and The Lost Boys (2018) on Sherif. For the Stanford Prison case, see Thibault Le Texier’s Debunking the Stanford Prison Experiment (2019, American Psychologist). For a broader treatment of the replication crisis in social psychology, see Stuart Ritchie’s Science Fictions (2020). The replication-crisis hub collects studies covered in this series.


Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.