Cargo Cult Science: Feynman's 1974 Caltech Address On Scientific Integrity

Atticus Li

In June 1974, Richard Feynman delivered the Caltech commencement address that coined the phrase "cargo cult science." The talk was about fields that follow the outward forms of science — publication, statistics, terminology — without the underlying integrity that makes the forms mean anything.

May 25, 2026

In June 1974, Richard Feynman walked to the lectern at the California Institute of Technology to give the commencement address. He had won the Nobel Prize in Physics nine years earlier, had spent a career as one of the most public scientific figures in America, and had built a reputation for blunt, unfooled, theatrically plain-spoken honesty about how science actually works. The talk he gave that day became one of the most-quoted speeches in the history of modern science. Its title was “Cargo Cult Science,” and the phrase entered the methodological vocabulary of every field that has had to reckon, in the decades since, with the question of whether its findings would replicate.

The address was published later that year in Caltech’s alumni magazine Engineering & Science (37(7), 10–13). It was reprinted, with light edits, in Feynman’s 1985 memoir Surely You’re Joking, Mr. Feynman!, where it has been read by orders of magnitude more people than the original audience. It runs to about 3,500 words. It is built around a single metaphor, a few historical examples, and a prescription. The metaphor is the cargo cult; the historical examples are Millikan’s 1909 oil-drop experiment and a series of rat-running psychology studies from the 1930s; the prescription is “scientific integrity” defined as the principle of leaning over backwards to expose your own work to disconfirming evidence.

In the half-century since, the phrase “cargo cult science” has been applied to homeopathy, technical analysis in finance, large parts of nutritional epidemiology, much of the social-priming literature in psychology, certain styles of macroeconomic modeling, much of management consulting, the bulk of corporate strategic planning, and substantial portions of conversion-rate optimization practice as it is sold by agencies. The metaphor is so portable, and so unflattering, that fields under accusation generally respond with anger rather than argument. The accusation that one is doing cargo cult science is, in any technical community, a deep insult; it is also, frequently, accurate.

This article walks through what Feynman actually said in 1974, the historical examples he chose and why he chose them, his specific prescription for what scientific integrity looks like in practice, why the speech reads in 2026 as a near-perfect prediction of the replication crisis that would unfold in psychology and biomedicine four decades later, and how a working strategist can use the cargo-cult test as a diagnostic when evaluating any field’s claim to be “evidence-based.”

The Metaphor

Feynman did not open with the metaphor. He began with medieval superstition, witch doctors, and the pseudoscience of his own day, and only then introduced the Pacific Islander cargo cults. These were a real anthropological phenomenon that had been documented by missionaries, colonial administrators, and ethnographers since at least the late nineteenth century, and that had received fresh attention after the Second World War. The most famous variants emerged in Melanesia, particularly on islands in what is now Vanuatu and Papua New Guinea, where American and Japanese forces had built airstrips and supply depots during the Pacific campaign of 1942–1945. The military operations brought, by the standards of the local economies, immense quantities of manufactured goods: canned food, radios, medicines, tools, clothing, jeeps, weapons. After the war ended, the airstrips were abandoned, and the cargo stopped.

Some of the local communities, attempting to understand what had happened and what might bring the cargo back, constructed elaborate ritual imitations of what they had observed the foreigners doing. They built bamboo-and-thatch replicas of control towers. They cleared and maintained the abandoned runways. They lit fires along the edges of the runways at night. They wore wooden ear-pieces and stood in the towers, waving sticks like the controllers they had watched. They drilled in formations that mimicked the foreign troops they had seen.

Feynman’s description was sympathetic in form and devastating in substance. The islanders, he said, had observed the foreigners performing certain actions, had observed that those actions were followed by the arrival of cargo, and had reasonably concluded that the actions caused the cargo. So they performed the actions. They got the forms right. The runways were the right length, the fires were lit at the right times, the bamboo towers were the right shape. What they did not have, and could not have inferred from observation alone, was the underlying causal mechanism — the global supply chain, the logistics command, the industrial production, the war economy — that had actually delivered the cargo. The forms were correct. The mechanism was not. So the cargo did not come.

Feynman’s claim was that several fields in 1974 were doing the same thing with science. They were performing the outward rituals of scientific research — publishing in journals, computing statistics, using technical terminology, holding conferences, awarding doctorates — while failing to do the one thing that actually makes science produce reliable knowledge. He called that one thing “scientific integrity.”

The metaphor is anthropologically imprecise. Subsequent ethnographic work has complicated the popular picture of cargo cults: many of the movements were complex religious and political phenomena, often with anti-colonial dimensions, and the simple “imitate the form, expect the substance” caricature does not capture them as historical movements. But the metaphor was not making an anthropological claim; it was making a methodological one. The methodological claim is what survived, and the methodological claim is, on its own terms, sound.

The Millikan Example

The most-cited historical example in the speech is the story of the oil-drop experiment and the slow correction of the elementary-charge value over the decades following 1909.

Robert Millikan, working at the University of Chicago, designed the oil-drop apparatus in 1909 and over the next several years used it to measure the charge of the electron. The technique involved suspending a charged droplet of oil between two electrodes, balancing electrostatic and gravitational forces, and inferring the charge from the equilibrium voltage. Millikan’s 1913 paper reported a value of e ≈ 1.592 × 10⁻¹⁹ coulombs. This value won him the Nobel Prize in Physics in 1923.

The value was wrong. Millikan had used a value for the viscosity of air that turned out to be slightly off, which biased his electron-charge measurement low by about 0.6 percent. The modern value of the elementary charge is approximately 1.602 × 10⁻¹⁹ coulombs.

The interesting fact, the one Feynman built the example around, is the trajectory of the published value in the years that followed. Subsequent experimenters measured the electron charge using a variety of techniques. The values they reported did not jump cleanly to the modern value. They drifted upward gradually, approaching the true value over roughly two decades. The first measurements after Millikan reported values close to his. Later experiments reported values slightly higher, and experiments later still reported values slightly higher again. Each experimenter, Feynman pointed out, knew Millikan’s value, and knew that the published consensus sat close to it. When an experiment produced a value well above Millikan’s, the experimenter looked for sources of error, found one, corrected it, and reported a value closer to Millikan’s. When an experiment produced a value close to Millikan’s, the experimenter did not look so hard, wrote up the result, and submitted it. The literature converged on the true value not because each individual experiment was unbiased, but because each correction removed only part of the gap: every published value could sit a little above the consensus without drawing scrutiny, and the consensus crept upward with it.

Feynman treated this as a failure of integrity. The experimenters were not lying. They were not fabricating data. They were doing what reasonable scientists, on reasonable assumptions, would do: they were checking outlying results more carefully than confirming results. The asymmetric scrutiny, applied to disconfirming evidence but not to confirming evidence, produced a literature whose published values were systematically biased toward the prior consensus. The result was that the field took roughly two decades to publish a value of the electron charge that matched what we now know to have been true.
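The drift described above can be reproduced with a toy simulation. This is a minimal sketch, not a reconstruction of the historical record. It assumes every raw measurement is unbiased, that the "correction" an experimenter finds is proportional to how far the result sits from the current consensus, and that the consensus is the mean of the last five published values. The names `simulate_literature`, `scrutiny`, and `noise`, and all parameter values, are hypothetical.

```python
import random
import statistics

TRUE_E = 1.602      # modern value in units of 1e-19 C (from the text)
MILLIKAN_E = 1.592  # Millikan's 1913 value, same units (from the text)

def simulate_literature(n_experiments, scrutiny, noise=0.002, seed=0):
    """Return the sequence of published values.

    scrutiny, noise, and the 5-paper consensus window are illustrative
    assumptions, not historical estimates. scrutiny is the fraction of the
    gap between a raw result and the current consensus that gets
    "corrected away" before publication; 0.0 means report as measured.
    """
    rng = random.Random(seed)
    consensus = MILLIKAN_E
    published = []
    for _ in range(n_experiments):
        raw = rng.gauss(TRUE_E, noise)                  # unbiased measurement
        reported = raw - scrutiny * (raw - consensus)   # surprise gets "fixed"
        published.append(reported)
        consensus = statistics.fmean(published[-5:])    # recent literature
    return published

def mean_error(values):
    """Absolute gap between the mean of some published values and the truth."""
    return abs(statistics.fmean(values) - TRUE_E)

honest = simulate_literature(60, scrutiny=0.0)
biased = simulate_literature(60, scrutiny=0.8)

print("first five, honest:", round(mean_error(honest[:5]), 4))
print("first five, biased:", round(mean_error(biased[:5]), 4))
print("last five,  biased:", round(mean_error(biased[-5:]), 4))
```

Under these assumptions no single experimenter does anything dishonest, yet the early biased literature sits far closer to Millikan’s number than to the truth, and only the late values approach the modern figure.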

The example is methodologically important for a reason that goes beyond physics. The Millikan story is an instance of what we would now call confirmation bias operating through publication selection, with the specific mechanism being asymmetric quality-checking of results conditional on whether they agree with prior expectations. The same mechanism operates today, in every field, in subtle and not-so-subtle forms. It is the structural basis of much of the replication crisis. When Daryl Bem’s precognition papers replicated successfully in some labs and failed in others, and when the failures faced higher publication barriers than the successes, the field was running a version of the Millikan effect. When clinical trials of antidepressants returning unfavorable results disappeared from the literature at higher rates than trials returning favorable results, the field was running a version of the Millikan effect. The mechanism is the same. Feynman identified it with a story from his own field, in 1974, and the story was a warning.

The Rat-Running Example

The second extended example in the speech is the rat-running maze experiments of the 1930s, particularly the work of a psychologist named Young, which Feynman dated to 1937. (Feynman did not give a full citation; the work was likely from the Berkeley animal-behavior laboratory of the era and has been attributed to F. A. Young’s published rat maze experiments.) The lesson was even sharper than in the Millikan case.

Psychologists studying learning in rats had used a standard apparatus: a corridor with a series of doors, where a rat was released at one end and learned, over trials, to choose the correct door to receive a food reward. The researchers had assumed that the rats were learning to associate the correct door’s spatial position or its visual appearance with the reward. The behavioral results appeared to confirm this hypothesis.

A more careful experimenter, whom Feynman called “Mr. Young” in the address, set out to find what the rats were actually using as their cue. In Feynman’s telling, he repainted the doors so that their faces looked and felt identical, used chemicals to change the smell after each run, and covered the corridor so the rats could not orient by the lights and layout of the laboratory. The rats could still find the door. He finally traced the cue to the way the floor sounded as the rats ran over it, and could only remove that cue by bedding the corridor in sand. By eliminating one alternative after another, he showed that the rats had been relying on signals the original researchers had not realized were present in the apparatus.

The conclusion Feynman drew from the example was twofold. First, the rats were performing the task, and were performing it reliably, but they were not performing the task the researchers had thought they were performing. The published results were technically correct in the narrow sense — the rats did choose the right door — but the mechanistic interpretation that the published papers built around those results was wrong. Second, and this is the point Feynman drove hard, the subsequent literature in psychology largely ignored Young’s careful work. The standard maze apparatus continued to be used as if his control experiments had never happened. The forms of the experiment continued; the mechanistic interpretations built on those forms continued; the work that should have forced a reconsideration of the entire research program was filed away as a methodological footnote, if it was acknowledged at all.

Feynman’s gloss: this was the central failure of cargo cult science. It is not enough to do the experiments. It is not enough to publish the results. The integrity of the field depends on reading and incorporating the work that undermines the dominant interpretation — and the field had failed to do that. The field had performed the rituals of scientific publication while ignoring the substance of methodological correction. The runways were maintained. The bamboo towers were polished. The cargo did not come.

The Prescription: Lean Over Backwards

The central methodological claim of the address comes about two-thirds of the way through. Feynman attempted to define, in operational terms, what scientific integrity actually consists of. The definition he gave is one of the most-quoted passages in the speech, and it is worth quoting at length because it is the operational core of the whole talk:

It’s a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty — a kind of leaning over backwards. For example, if you’re doing an experiment, you should report everything that you think might make it invalid — not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked — to make sure the other fellow can tell they have been eliminated.

The phrase that has stuck is “leaning over backwards.” It is a deliberately uncomfortable image. The natural posture of a researcher reporting a finding is to lean toward the finding — to emphasize the result, to background the caveats, to present the work in the best light because the rewards of the field flow to clean results. Feynman’s prescription was to do the opposite. The integrity-bearing scientist, on his telling, leans over backwards: she goes out of her way to surface, name, and explain every consideration that could make the result wrong, including considerations that her readers might not have thought of, considerations that she might have been able to bury without anyone noticing.

The reasoning behind the prescription is Bayesian, although Feynman did not state it in those terms. The reader of a research paper is trying to estimate the probability that the reported finding is true. The information that bears on this estimate includes not only the headline result, but also every consideration that might raise the probability of a false positive. If the researcher selectively reports the evidence in favor of the finding while withholding evidence that bears against it, the reader’s estimate of the probability of truth will be systematically biased upward. The author has the information; the reader does not; the asymmetric reporting produces an inflated posterior. Leaning over backwards is the corrective. By going out of her way to surface disconfirming considerations, the researcher hands the reader the full evidence set and so allows the reader to form a correctly calibrated estimate.
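A small worked example makes the size of the distortion concrete. The sketch below uses the odds form of Bayes’ rule and assumes the pieces of evidence are conditionally independent. The prior and the three likelihood ratios are hypothetical numbers chosen for illustration, not estimates from any study.

```python
import math

def posterior(prior, likelihood_ratios):
    """Posterior probability after multiplying prior odds by each ratio.

    Treats the pieces of evidence as conditionally independent, which is
    a simplifying assumption.
    """
    odds = prior / (1 - prior) * math.prod(likelihood_ratios)
    return odds / (1 + odds)

PRIOR = 0.20         # hypothetical prior that the finding is true
HEADLINE = 6.0       # hypothetical ratio: the headline result favors the finding
FAILED_CHECK = 0.5   # hypothetical: a robustness check that did not hold up
RIVAL_CAUSE = 0.4    # hypothetical: an alternative explanation not ruled out

selective = posterior(PRIOR, [HEADLINE])
full = posterior(PRIOR, [HEADLINE, FAILED_CHECK, RIVAL_CAUSE])

print(f"reader sees headline only:  {selective:.2f}")
print(f"reader sees full reporting: {full:.2f}")
```

With these made-up inputs, the reader who sees only the headline lands at about 0.60, while the reader given the full report lands at about 0.23. The data are identical in both cases; only the reporting differs.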

This is not a theoretical claim. It is an operational rule that can be enforced (and, in some fields, increasingly is enforced) through pre-registration, registered reports, open data, open code, and the systematic publication of null results. Each of these institutional practices is a way of making “leaning over backwards” a structural property of the field rather than a personal virtue of individual researchers. Feynman’s 1974 prescription anticipated the entire later infrastructure of open science, four decades before the institutional practices caught up.

Why The Speech Reads As Prophecy

Read the 1974 talk in 2026 and the eerie thing is how cleanly it predicts the replication crisis as it actually unfolded.

Feynman warned that fields can publish, compute statistics, use scientific terminology, and award doctorates while failing to produce reliable knowledge, because the form of science can be reproduced without the substance. The 2010s would bring exactly that diagnosis to social psychology, then to large parts of biomedicine, then to nutritional epidemiology, then to behavioral economics. The Open Science Collaboration’s 2015 Science paper attempted to replicate 100 psychology studies and judged 39 of them to have replicated. Industry attempts to reproduce landmark preclinical cancer studies reported success rates of roughly 11–25%, depending on the study and the definition. The candidate-gene literature of the 1990s, with its thousands of underpowered association studies, replicated at single-digit rates when adequately powered studies finally arrived. Each of these episodes is a real-world demonstration of cargo cult science as Feynman defined it: the forms had been correct, the substance had not been, and the cargo had not come.

Feynman warned that fields can converge on biased consensus values through asymmetric quality-checking, with the Millikan example as the canonical case. The replication crisis literature has documented essentially the same mechanism in dozens of subfields. The 2008 Turner et al. paper in NEJM compared published antidepressant trials against the FDA’s complete set of registered trials: 94 percent of the trials in the published literature appeared positive, against 51 percent in the FDA’s own assessments, with negative trials either unpublished or written up as positive. The mechanism was Millikan’s mechanism: results that confirmed the prior consensus passed through the publication filter; results that did not were re-examined, found wanting, and quietly shelved. The field’s published consensus drifted toward a biased estimate that took years to correct.
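The effect of a publication filter on an apparent effect size can be sketched in a few lines. This is an illustration of the general mechanism only: it models the "unpublished" half of the story, not the recoding of negative trials, and the true effect, standard error, and publish-if-significant rule are assumptions rather than figures from Turner et al.

```python
import random
import statistics

def run_trials(n_trials=200, true_effect=0.3, std_error=0.15, seed=2):
    """Simulate trial estimates; return (all estimates, published estimates).

    true_effect, std_error, and the publish-if-significant rule are
    illustrative assumptions, not figures from Turner et al.
    """
    rng = random.Random(seed)
    registry = [rng.gauss(true_effect, std_error) for _ in range(n_trials)]
    # Publication filter: only estimates significantly above zero get through.
    published = [est for est in registry if est / std_error > 1.96]
    return registry, published

registry, published = run_trials()
print("trials registered:", len(registry), "published:", len(published))
print("mean effect, full registry: ", round(statistics.fmean(registry), 3))
print("mean effect, published only:", round(statistics.fmean(published), 3))
```

Every published estimate here is an honest report of a real trial. The inflation comes entirely from which trials reach print, which is why a reader with access only to the journals cannot detect it.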

Feynman warned that fields can fail to incorporate corrective methodological work, with the rat-running example as the canonical case. The replication crisis literature has documented this too. Paul Meehl’s 1978 Journal of Consulting and Clinical Psychology paper “Theoretical Risks and Tabular Asterisks” laid out, with brutal clarity, most of the methodological problems that would later be rediscovered in the 2010s — and was almost entirely ignored by the field for thirty years. Daniel Lakens has documented similar patterns in social psychology. The methodological corrections were available; the field did not incorporate them; the replication crisis arrived on schedule.

Feynman’s 1974 prescription, that the researcher should “lean over backwards” to surface disconfirming considerations, is the philosophical foundation of the open-science reforms that spread through psychology and neighboring fields in the 2010s. Pre-registration forces the researcher to commit, in writing, to the analyses she will run before she sees the data, which removes the freedom to “lean toward” a finding by selectively reporting analyses post hoc. Registered reports go further: the journal commits to publish the paper based on the quality of the design, before the results are known, which removes the publication-bias asymmetry that drove the Millikan effect. Open data and open code allow other researchers to verify the analyses, which removes the ability to hide methodological choices from view. Each of these is a structural implementation of the principle Feynman named in 1974.

The speech was prophetic not because Feynman saw the future, but because the underlying mechanisms he described are timeless features of how humans do investigative work in social institutions. The mechanisms operate today, in every field, and they will continue to operate tomorrow. The institutional reforms can suppress the mechanisms in the fields that adopt them; they cannot eliminate the mechanisms. In fields that have not adopted the reforms, the mechanisms run at full strength.

Modern Applications

The cargo cult test, applied as a working diagnostic, generates uncomfortable conclusions in a number of fields outside the natural sciences.

Academic research. The replication crisis has demonstrated that substantial portions of social psychology, biomedical bench research, nutritional epidemiology, candidate-gene association studies, and parts of behavioral economics meet the cargo cult definition. The publication infrastructure is intact; the statistical machinery is sophisticated; the terminology is appropriately technical; the institutional rituals are observed. The findings, when subjected to independent replication by teams with no stake in the original, do not survive. In the relevant subfields, the runway lights are lit; the cargo does not come.

Journalism that translates research. The science-journalism pipeline routinely propagates findings from low-replication subfields into the consumer-facing press as if the findings were reliable. The cargo cult diagnosis applies here in a derivative form: the journalist is performing the rituals of “covering the science” — quoting researchers, citing journals, summarizing findings — without engaging with the structural unreliability of the underlying literature. The forms of science journalism are correct; the underlying epistemic mechanism is not. The result is a popular-science discourse that has been systematically miscalibrated for decades on subjects ranging from priming effects in psychology to the supposed health benefits of various dietary patterns.

Management consulting and corporate strategy. Large parts of the consulting industry sell “evidence-based” recommendations whose evidentiary basis is a thin layer of case studies, internal proprietary research, or popular-science extrapolations from low-replication academic findings. The deliverables follow the form: bound reports, executive summaries, frameworks with capitalized names, references to “research.” The substance — the actual epistemic mechanism that would justify the recommendations — is frequently absent. The famous example is Jim Collins’s Good to Great, whose central claims about the practices that distinguished “great” companies from “good” ones have not survived independent scrutiny: the selected companies’ subsequent performance has been mediocre or worse, suggesting that the original framework picked up post-hoc rationalizations of luck rather than reliable causal practices.

Conversion-rate optimization and applied marketing research. Substantial parts of the CRO industry market themselves as “evidence-based” while running underpowered A/B tests, peeking at results, declaring significance early, and reporting selectively. The forms are correct — there are dashboards, statistical significance markers, professional reports — but the underlying replication rate of “winning” tests is, in the rare cases where it has been measured, low enough to be embarrassing. The cargo cult diagnosis applies precisely. The bamboo control tower has a dashboard.
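The cost of peeking can be shown with simulated A/A tests, where both arms share the same true conversion rate and every "significant" result is therefore a false positive. The sketch below is a rough illustration under assumed settings (a 10 percent conversion rate, 1,000 visitors per arm, a look every 100 visitors, a two-proportion z-test at the 1.96 threshold). The function names and parameters are hypothetical.

```python
import math
import random

def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic for equal arm sizes n, using the pooled rate."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return 0.0 if se == 0 else (conv_a / n - conv_b / n) / se

def false_positive_rates(n_sims=400, n_per_arm=1000, step=100, rate=0.10, seed=1):
    """Return (rate with peeking, rate with one look at the planned end).

    Both arms convert at the same true rate, so any declared winner is a
    false positive. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        ever_significant = False
        for visitor in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # Peek at every `step` visitors and stop the clock on any "win".
            if visitor % step == 0 and abs(z_stat(conv_a, conv_b, visitor)) > 1.96:
                ever_significant = True
        peeking_hits += ever_significant
        fixed_hits += abs(z_stat(conv_a, conv_b, n_per_arm)) > 1.96
    return peeking_hits / n_sims, fixed_hits / n_sims

peeking_rate, fixed_rate = false_positive_rates()
print("false positives, declaring a winner at any peek:", peeking_rate)
print("false positives, single look at planned sample: ", fixed_rate)
```

Because the final look is one of the peeks, the peeking rate can never fall below the fixed-horizon rate. In runs like this one it is usually several times higher, which is the gap between the dashboard’s stated confidence and the actual reliability of the declared winners.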

Macroeconomic forecasting. The track record of professional macroeconomic forecasts at horizons longer than a quarter is, by the usual evaluation standards, hard to distinguish from that of naive benchmarks. The infrastructure (forecasting departments, models, conferences, publications) runs at full strength. The cargo continues not to come. The Reinhart-Rogoff “90% debt threshold” episode is a particularly clean case: a single influential 2010 paper with an Excel spreadsheet error and a small dataset became the empirical foundation for an austerity policy debate that affected hundreds of millions of people, and the error went unnoticed until 2013 because the field’s incentives ran toward elaborating the finding rather than auditing it.

In every one of these domains, the cargo cult test is not the only diagnostic. There are honest practitioners. There are well-replicated subfields within each of the broader fields. The diagnosis is not “all academic research is cargo cult science” or “all consulting is cargo cult.” The diagnosis is that the cargo cult mechanism is one of several patterns the field can fall into, and that the fields named above contain large pockets where the pattern dominates.

The Strategist’s Diagnostic

The practical use of Feynman’s framework, for someone evaluating any field’s claim to be “evidence-based,” is to ask a specific question: does this field have institutional mechanisms that force its practitioners to lean over backwards?

The diagnostic question is not “are the published papers in this field statistically rigorous?” The Millikan example shows that statistical rigor can coexist with systematic bias. The question is not “do the practitioners cite each other and run conferences?” The cargo cult example shows that institutional ritual can coexist with epistemic emptiness. The question is whether the field has structural mechanisms that force individual practitioners to expose their work to disconfirming evidence, even when the practitioners would prefer not to.

The mechanisms that satisfy the test are concrete and identifiable:

Pre-registration of confirmatory studies. Does the field require researchers to commit to analyses in writing before they see the data?

Registered reports. Do the field’s top journals publish papers based on design quality before the results are known?

Open data and open code. Can independent researchers re-run the analyses and verify the results?

Adversarial collaboration. When two researchers disagree, does the field have mechanisms for them to jointly design an experiment that one of them must lose?

Routine independent replication. Are findings, before being treated as established, replicated by independent teams with no stake in the original?

Publication of null results. Do the field’s journals publish well-conducted studies with negative findings, or only studies with positive findings?

Honest declaration of researcher degrees of freedom. Do practitioners disclose, in writing, every analytic choice they made and every analysis they considered but did not report?

In a field with most or all of these mechanisms in place — top-tier post-2015 psychology, much of physics, much of clinical-trial methodology — the cargo cult risk is meaningfully reduced. In a field with few or none of them — much of management consulting, much of strategic forecasting, large parts of corporate “thought leadership” — the cargo cult risk is essentially uncapped, and the diagnostic posture should be one of presumptive skepticism. The default credence for any claim from a field without leaning-over-backwards mechanisms should be low until additional evidence accumulates.
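The checklist above can be turned into a rough scoring aid. The sketch below is one possible encoding, not a validated instrument. The cutoffs for "most" (five or more mechanisms) and "few" (two or fewer) are assumptions introduced here, as are the names `MECHANISMS` and `cargo_cult_risk`.

```python
MECHANISMS = (
    "pre-registration",
    "registered reports",
    "open data and code",
    "adversarial collaboration",
    "independent replication",
    "null results published",
    "degrees of freedom disclosed",
)

def cargo_cult_risk(present):
    """Score a field by how many lean-over-backwards mechanisms it has.

    `present` is an iterable of names drawn from MECHANISMS. The cutoffs
    (5 or more, 2 or fewer) are illustrative assumptions.
    """
    found = set(present)
    unknown = found - set(MECHANISMS)
    if unknown:
        raise ValueError(f"unrecognized mechanisms: {sorted(unknown)}")
    score = len(found)
    if score >= 5:
        label = "reduced risk"
    elif score <= 2:
        label = "presumptive skepticism"
    else:
        label = "mixed: check the specific subfield"
    return score, label

# Hypothetical profile of a field that publishes open data and nothing else.
print(cargo_cult_risk(["open data and code"]))
```

A count like this is only a starting point. Whether a mechanism is actually enforced matters more than whether it exists on paper.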

This is not an anti-science posture. It is a calibrated-credence posture, applied to claims based on the structural conditions of the field that produced them. The structural conditions are observable. The calibration follows from the conditions. Feynman’s 1974 framework, half a century later, remains the cleanest articulation of how to do the calibration. The framework is also, not coincidentally, the closest thing the working strategist has to a defense against the substantial portion of “evidence-based” recommendations in modern business discourse that turn out, on inspection, to be cargo cults with PowerPoint decks.

Sources

Feynman, R. P. (1974). Cargo cult science. Engineering & Science, 37(7), 10–13. Caltech archive.
Feynman, R. P. (1985). Surely You’re Joking, Mr. Feynman! (Adventures of a Curious Character). New York: W. W. Norton.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124. DOI: 10.1371/journal.pmed.0020124.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. DOI: 10.1126/science.aac4716.
Turner, E. H., Matthews, A. M., Linardatos, E., Tell, R. A., & Rosenthal, R. (2008). Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358(3), 252–260. DOI: 10.1056/NEJMsa065779.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. DOI: 10.1037/0022-006X.46.4.806.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. DOI: 10.1177/0956797611417632.
Wootton, D. (2006). Bad Medicine: Doctors Doing Harm Since Hippocrates. Oxford: Oxford University Press.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. DOI: 10.1073/pnas.1708274114.
Lindsay, D. S. (2015). Replication in psychological science. Psychological Science, 26(12), 1827–1832. DOI: 10.1177/0956797615616374.
Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531–533. DOI: 10.1038/483531a.

Related Reading

Ioannidis 2005: “Why Most Published Research Findings Are False” Landmark — the Bayesian formalization of the conditions under which cargo cult science produces unreliable literatures.
P-Hacking and Researcher Degrees of Freedom — the modern technical name for the mechanism Feynman warned about in his discussion of selective reporting.
HARKing: Hypothesizing After Results Are Known — the analytical move that perfectly inverts Feynman’s “lean over backwards” principle.
Publication Bias and the File Drawer Problem — the institutional version of the Millikan effect that drives biased consensus in modern fields.
Registered Reports: The Format That Could Solve the Replication Crisis — the structural implementation of Feynman’s prescription in modern journal publishing.

Frequently Asked Questions

Where can I read the original Feynman address?

The speech was first published in Caltech’s alumni magazine Engineering & Science, vol. 37, no. 7, June 1974, pages 10–13. Caltech maintains a freely accessible scan at calteches.library.caltech.edu. The text was reprinted, with light edits, as the closing chapter of Feynman’s 1985 memoir Surely You’re Joking, Mr. Feynman!, which is the version most readers have encountered. Both texts are the same speech; the memoir version is slightly tightened.

Is the cargo-cult anthropology in Feynman’s speech accurate?

It is a simplification, and subsequent ethnographic work has complicated the popular picture of Pacific cargo cults considerably. The cults were complex religious, political, and anti-colonial movements rather than naive imitations of foreign behavior. The historical accuracy of the metaphor is contested. The methodological point the metaphor was being used to make is not affected by the anthropological imprecision; the metaphor would work just as well with any other example of form-without-substance imitation.

Did Feynman name the specific fields he thought were cargo cult science?

He named psychology, education research, and what he called “witch doctor” practices. He was diplomatic about not naming individuals, and his examples were chosen to illustrate the methodological pattern rather than to attack particular subfields. The application of the framework to specific modern fields — social psychology, nutritional epidemiology, management consulting, technical analysis in finance — is the work of later commentators, but follows directly from the criteria Feynman articulated.

How does cargo cult science relate to fraud?

The two are different phenomena. Fraud is the deliberate fabrication or manipulation of data; cargo cult science is the unwitting performance of scientific rituals without the underlying epistemic mechanism. Most cargo cult science is not fraud; the researchers are sincere, the methods are technically defensible, the published papers are honest reports of what the researchers actually believed they had found. The problem is structural rather than ethical. Diederik Stapel committed fraud; the broader social-priming literature he was embedded in was, by Feynman’s criteria, also doing cargo cult science even in its honest portions. The two diagnoses are independent and can apply separately or together.

What is the single most useful operational rule from Feynman’s framework?

Before accepting any “evidence-based” claim, ask whether the field has institutional mechanisms that force individual practitioners to expose their work to disconfirming evidence — pre-registration, registered reports, open data, routine independent replication, publication of null results. If the field does not have these mechanisms, the cargo cult risk is essentially unbounded, and the default posture toward any single claim from that field should be presumptive skepticism. Wait for independent replication. Treat single-study findings as candidate hypotheses rather than as facts. This rule, applied consistently, will save you from most of the strategic mistakes that flow from over-trusting findings produced by fields that have not adopted leaning-over-backwards mechanisms.

Has the situation improved since 1974?

In specific fields, dramatically. Post-2015 social psychology has institutionalized pre-registration, registered reports, and replication initiatives at a scale that would have surprised Feynman. Clinical trial methodology has tightened substantially through the FDA registry, the CONSORT reporting standards, and the systematic-review infrastructure. Physics has continued to do what physics has historically done well. In other fields — much of nutritional epidemiology, large parts of corporate consulting and strategy, much of educational research, substantial portions of management science — the institutional reforms have penetrated less, and the cargo cult mechanism continues to produce unreliable literatures. The diagnosis is uneven and field-specific. The framework helps you identify which fields have done the work and which have not.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified in behavioral economics. Led 100+ in-house experiments at NRG in 2025, with project evidence and limits documented in the case studies.
