For most of the 20th century, the established medical consensus held that peptic ulcers were caused by stress, diet, and excess stomach acid. Treatment focused on suppressing acid: antacids in the 1940s and 1950s, H2-receptor blockers in the late 1970s, proton pump inhibitors from the late 1980s. The cultural archetype was the harried executive popping Tums between meetings, white-knuckling his way through a duodenal ulcer that everyone assumed his job had given him. In 1982, two Australian researchers --- the pathologist Robin Warren and the physician Barry Marshall --- isolated a spiral bacterium from the stomachs of ulcer patients and proposed that the bacterium, not stress, was the cause. The gastroenterology establishment rejected the hypothesis on its face. In 1984, Marshall drank a Petri dish of the bacterial culture himself, developed gastritis within days, and treated it with antibiotics. The field still resisted for years. The 1994 NIH Consensus Conference finally endorsed antibiotic-based ulcer treatment. Marshall and Warren shared the Nobel Prize in 2005.

This is one of the cleanest examples in 20th-century medicine of a confidently-held expert consensus being substantially wrong about a causal mechanism, of a correct mechanism being available in the literature and ignored for a decade, and of the field eventually correcting --- slowly. For anyone whose work involves leaning on “expert consensus” in management, organizational behavior, behavioral economics, or any other field that traffics in causal claims about human systems, the story is worth understanding in detail. The medical case is unusually well-documented because Marshall was an unusually persistent self-experimenter, the bacterium is unusually easy to detect once you know to look for it, and the eventual vindication came with a Nobel Prize that produced extensive retrospective documentation. Most fields are not this lucky.

What The 20th-Century Medical Consensus Said

The framework that dominated peptic ulcer treatment from roughly 1910 through the late 1980s rested on a few interlocking ideas. Hyperacidity --- “no acid, no ulcer” --- was the central tenet, attributed to the surgeon Karl Schwarz in 1910 and elaborated through the century. The stomach lining was thought to be sterile, on the assumption that the acidic environment would kill any bacterium that tried to colonize it. Lifestyle factors --- stress, spicy food, alcohol, smoking, irregular eating --- were treated as the upstream causes of the hyperacidity. The “stressed executive’s ulcer” was such a fixed feature of American business culture that ulcer rates were tracked in occupational health surveys and used as a stand-in for organizational stress.

Treatment followed from the mechanism. Acid suppression became the dominant intervention. The first generation of antacids --- aluminum hydroxide, magnesium hydroxide, calcium carbonate --- worked by neutralizing acid already in the stomach. The breakthrough drug class of the late 1970s was the H2-receptor antagonist: James Black’s cimetidine, marketed as Tagamet by SmithKline starting in 1977, became the first billion-dollar drug. Ranitidine (Zantac, Glaxo, 1981) followed and eventually overtook it. In the late 1980s, omeprazole (Prilosec, Astra, 1988) launched the proton pump inhibitor class, which suppressed acid even more aggressively. By the early 1990s, the global market for acid-suppressing drugs was in the multiple billions of dollars per year. Pharmaceutical companies, gastroenterologists, and patients all had functioning relationships with this treatment paradigm. It worked in the sense that it suppressed symptoms and healed ulcers. It worked less well in the sense that ulcers usually came back once patients stopped taking the drugs, which many eventually did, since the regimen was otherwise lifelong. The recurrences were read as evidence that the underlying lifestyle factors had not changed.

Surgical management of ulcer disease was its own ecosystem. Vagotomy --- cutting the vagus nerve to reduce acid secretion --- was a major elective procedure in the 1950s through 1970s. Partial gastrectomy was used for severe or perforated ulcers. Surgical residents trained extensively in ulcer surgery. Surgical departments derived significant revenue from it. By 1990, ulcer surgery in the US was a substantial line of business.

Within this framework, the idea that a bacterium might be the cause was not just unsupported. It was thought to be impossible. The stomach was acid. Acid killed bacteria. End of argument. That belief was sufficiently entrenched that earlier observations of curved bacteria in human stomachs --- and there were such observations, going back to the late 1800s in European pathology literature --- were dismissed as contaminants or as artifacts of post-mortem overgrowth. A 1954 study by Eddy Palmer at the Walter Reed Army Medical Center reported the examination of more than 1,000 human stomachs and concluded that no bacteria lived in the gastric mucosa under any normal condition. That paper, more than any other single source, closed the question for two decades.

Marshall And Warren’s 1982-1983 Discovery

Robin Warren was a pathologist at the Royal Perth Hospital in Western Australia. Starting around 1979, while examining gastric biopsies under a microscope, he noticed small curved bacteria sitting on the surface of the gastric epithelium in samples from patients with active chronic gastritis. He noticed it consistently. The bacteria were present, in his samples, on roughly half of gastric biopsies. They were not present on the biopsies of patients without gastritis. He thought it was significant. Most of his colleagues thought it was a stain artifact or a contaminant.

In 1981, Warren was joined by Barry Marshall, a young internal-medicine registrar doing a rotation in gastroenterology. Marshall had no particular reason to take an interest in Warren’s curved bacteria --- they were a pathology problem, not a clinical one --- but Warren needed a clinical collaborator who could go back to the patients, take histories, and try to culture the organism. Marshall agreed. The two began a systematic study of 100 patients undergoing endoscopy at the Royal Perth Hospital. They biopsied gastric mucosa, looked for the curved bacteria under the microscope, tried to culture them, and correlated the results with the patients’ diagnoses.

The bacteria were difficult to culture. They are microaerophilic, meaning they grow best in low-oxygen conditions, and they are slow-growing on standard media. Marshall and Warren’s first 34 culture attempts failed. The breakthrough --- and it was a literal accident --- came over Easter 1982. The lab was short-staffed. A set of plates that would normally have been examined after 48 hours and then discarded was instead left in the incubator for five days. When the technicians returned, the plates had visible colonies of the organism. The bacterium needed longer than 48 hours to form visible colonies on the medium they were using. Once Marshall and Warren knew that, they could reliably culture it.

They published their first short report in The Lancet in June 1983, a single-page letter titled “Unidentified curved bacilli on gastric epithelium in active chronic gastritis” (Warren & Marshall, 1983). It was descriptive: it reported the observation that the curved bacteria were consistently present in patients with active chronic gastritis. It did not yet claim causation. A year later, in June 1984, they published the longer paper “Unidentified curved bacilli in the stomach of patients with gastritis and peptic ulceration” in The Lancet (Marshall & Warren, 1984). This paper laid out the 100-patient study, the consistent association of the bacteria with gastritis and ulcer disease, and the explicit hypothesis: the curved bacilli might be a primary cause of these conditions.

The reception was hostile. The hypothesis violated two well-established beliefs. First, that the stomach was sterile. Second, that ulcer disease was caused by acid and stress and treated by acid suppression. Reviewers at journals dismissed the paper. Senior gastroenterologists publicly described the work as implausible. The standard response Marshall received at conferences, in his own later retelling, was that gastritis was not a real disease, that the bacteria were a consequence of gastritis rather than its cause, and that even if the bacteria existed they were obviously secondary colonizers of damaged tissue --- not the cause of the damage.

The bacterium, eventually named Helicobacter pylori, was real. The hypothesis that it caused gastritis and peptic ulcer disease was correct. It would take another decade of evidence accumulation before the field accepted this.

The 1984 Self-Experimentation

The standard scientific path to demonstrating causation in microbiology runs through Koch’s postulates: the organism must be present in cases of the disease and absent in healthy controls (association), the organism must be culturable in pure form (isolation), the cultured organism must produce the disease when introduced into a healthy host (transmission), and the organism must be re-isolated from the experimentally infected host (recovery). Marshall and Warren had the first two postulates. They needed transmission. Transmission, in microbiology, is normally established in an animal model. Marshall tried. The bacterium does not reliably colonize the stomachs of standard laboratory animals. It is too specific to humans and a few related primates. Without a working animal model, the third postulate could not be satisfied through conventional means.

In July 1984, Marshall did something that fit a long tradition in the history of medicine and that is, by the standards of modern research ethics, almost unthinkable. He cultured H. pylori from a patient with gastritis and mixed the culture into a peptone broth. A baseline endoscopy beforehand had documented that his stomach was healthy, with no inflammation. He drank the broth. Within five days, he developed nausea, halitosis, and vomiting. Repeat endoscopy on day 10 showed acute gastritis with the bacteria present in his gastric mucosa. He was then treated with bismuth and tinidazole, an antibiotic regimen, and the symptoms and gastritis resolved.

He published the result in 1985 in the Medical Journal of Australia under the dry title “Attempt to fulfil Koch’s postulates for pyloric Campylobacter” (Marshall, 1985). The paper, which is short and almost matter-of-fact in tone, completed the basic causal chain in the most direct possible way: take a healthy human, introduce the organism, observe the disease develop, treat with the targeted intervention, observe recovery.

The self-experimentation has acquired a kind of mythic quality in retrospect. Marshall himself, in later interviews, has noted that it was less dramatic at the time than it sounds now. He did it without telling his wife in advance, which became a piece of family folklore. He recovered cleanly. The ethical question of whether he should have run the experiment on himself rather than going through a formal trial protocol was, in 1984 Western Australian medicine, less elaborate than it would be today. Self-experimentation had been a respected tradition in medicine for a century at that point. Werner Forssmann had catheterized his own heart in 1929 and won a Nobel Prize for it in 1956. Marshall was working in the same tradition.

What the self-experiment did, in terms of the scientific argument, was foreclose a particular objection. The standard dismissal of Marshall and Warren’s work had been that the bacteria were colonizers of damaged tissue, not the cause of damage. The self-experiment showed that the bacteria, introduced into healthy tissue, were sufficient to produce gastritis. The “secondary colonizer” objection no longer worked. A new objection had to be found, or the hypothesis had to be taken seriously.

The field largely found new objections.

Why The Medical Community Resisted

The resistance to Marshall and Warren’s hypothesis is the part of the story that matters most for anyone trying to learn the meta-lesson. It is tempting to flatten the resistance into a simple morality tale --- stupid doctors refused to accept clear evidence --- but the actual structure of the resistance is more useful than that. Several distinct mechanisms operated simultaneously.

The first was the strength of the prior. “The stomach is sterile” was not a hunch. It was a position derived from a century of histopathology, from the well-understood pharmacology of stomach acid, and from a major reference work --- Palmer’s 1954 paper --- that had specifically examined and rejected the bacterial hypothesis. Reversing a prior of that strength requires more than two papers from a previously unknown junior researcher in Perth. It requires accumulating evidence from multiple independent labs, ideally with different methodologies. That accumulation did happen, but it took years.

The second was incumbent treatment. By the mid-1980s, H2 blockers were a multi-billion-dollar drug class. Acid-suppression-based ulcer treatment was the operational basis of much of gastroenterology and substantial chunks of pharmaceutical R&D. A bacterial hypothesis implied that the right treatment was a one-to-two-week course of antibiotics for $50 rather than lifelong acid suppression for $1,000 per year. The economic exposure of the incumbent paradigm was very large. This is not, by itself, evidence of conscious resistance --- the people running pharmaceutical companies were not all conspiring to suppress Marshall and Warren --- but it is a structural reason that the burden of proof for the new hypothesis felt very high. Existing institutions had real costs to absorb if the hypothesis was correct, and those costs influenced the implicit cost-benefit calculation of how thoroughly to investigate the claim.
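
The size of that exposure is easy to put in numbers. The sketch below is illustrative arithmetic only: it uses the two round dollar figures quoted above, while the function names and time horizons are assumptions made for the example, and it ignores discounting, retreatment, and price changes.

```python
# Illustrative arithmetic only. The two dollar figures are the round numbers
# quoted in the text; the names and time horizons are assumptions.
ANTIBIOTIC_COURSE_COST = 50        # one-time eradication course, USD
ACID_SUPPRESSION_PER_YEAR = 1_000  # ongoing acid suppression, USD per year


def cumulative_cost_suppression(years: int) -> int:
    """Total spend on acid suppression after `years` of continuous treatment."""
    return ACID_SUPPRESSION_PER_YEAR * years


def revenue_gap(years: int) -> int:
    """Per-patient spending that disappears if eradication replaces suppression."""
    return cumulative_cost_suppression(years) - ANTIBIOTIC_COURSE_COST


if __name__ == "__main__":
    for horizon in (1, 5, 10, 20):
        print(
            f"{horizon:>2} years: suppression ${cumulative_cost_suppression(horizon):,} "
            f"vs eradication ${ANTIBIOTIC_COURSE_COST:,} "
            f"(gap ${revenue_gap(horizon):,})"
        )
```

Even on these rough figures, the per-patient gap grows linearly with every year of treatment, which is the structural point: the longer the old paradigm would have kept a patient on therapy, the more the new one costs the incumbent.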

The third was specialty boundaries. Marshall and Warren were a pathologist and an internal-medicine trainee in a regional Australian hospital. They were not gastroenterologists. They were not microbiologists by primary training. Their paper was published in The Lancet rather than in a specialty journal. The senior gastroenterologists who would have needed to take the hypothesis seriously had not trained these two researchers, did not know them socially, and had no reputational investment in their success. The opposite was actually true: the senior figures of gastroenterology had built careers on the acid-and-stress framework, and a junior outsider’s challenge to that framework was, structurally, a threat.

The fourth was the absence of a mechanism. Even if the bacteria were present, even if they could be cultured, even if the self-experimentation showed them sufficient to cause gastritis, the question of how a bacterium could survive in the stomach’s acidic environment had no obvious answer in 1984. The answer turned out to be that H. pylori produces large quantities of urease, which converts urea to ammonia and creates a localized neutral microenvironment around the organism. This was worked out in the late 1980s. Without the mechanism, the bacterial hypothesis required a kind of provisional commitment to a fact that violated known biochemistry. Many researchers were unwilling to make that commitment until the mechanism was understood.

The fifth was the slow pace of independent replication. Replicating Marshall and Warren’s culture results required learning that the plates needed to be incubated for five days rather than two. Many labs that attempted to replicate the work in 1984 and 1985 did so with standard 48-hour protocols and failed to find the organism. This was reported back as a failure to replicate. It took several years for the longer incubation protocol to become standard, and during those years the experimental record was full of negative replications that strengthened the case against the hypothesis. The negative replications were technically wrong --- the labs were not doing the experiment correctly --- but they were broadcasting their results as evidence, and the field was reading them as evidence.

This is the texture of how a correct hypothesis can be resisted for a decade. Not by conspiracy. By a combination of strong prior beliefs, incumbent economic interests, specialty boundaries, an absent mechanism, and a publication record full of false-negative replications. Each of these is the kind of structural feature that recurs in every field where expert consensus is held.

The Slow Validation

Through the second half of the 1980s and into the early 1990s, independent labs began to replicate the Marshall and Warren results once the longer incubation protocol diffused. Treatment trials testing antibiotic regimens against acid suppression began to appear. The pivotal trials --- by Hentschel and colleagues, by Graham and colleagues, by others --- demonstrated repeatedly that antibiotic eradication of H. pylori produced higher ulcer-healing rates than acid suppression alone and, more importantly, dramatically lower recurrence rates. Acid suppression healed the ulcer; antibiotics cured the disease.

The accumulating evidence pushed the field to a tipping point in the early 1990s. In 1994, the US National Institutes of Health convened a Consensus Development Conference on Helicobacter pylori in Peptic Ulcer Disease. The panel’s statement, published in JAMA in July 1994 (NIH Consensus Conference, 1994), endorsed the bacterial hypothesis cleanly: ulcer patients with H. pylori infection should be treated with antimicrobial agents in addition to antisecretory drugs, whether on first presentation or on recurrence. The European Helicobacter Pylori Study Group followed in 1996 with the Maastricht Consensus, recommending eradication therapy for confirmed peptic ulcer disease with H. pylori infection. The 1994 NIH consensus is the moment that mainstream US gastroenterology accepted the new paradigm. It is also a moment one can date, retrospectively, against Warren’s first observation: it took roughly 15 years from Warren’s first 1979 microscope observation to the 1994 consensus, and about 10 years from Marshall’s 1984 self-experiment.

Within a few years, prescribing patterns shifted substantially. Antibiotic-based ulcer eradication --- typically a 7-to-14-day combination of a proton pump inhibitor with two antibiotics (clarithromycin plus either amoxicillin or metronidazole) --- became standard care. Elective ulcer surgery, which had been a major gastroenterology and general-surgery sub-specialty, declined sharply. Cure rates for peptic ulcer disease, which had been essentially zero in the lifelong-acid-suppression era, climbed above 90% with eradication therapy. The acid-suppressing drugs did not go away --- they remain hugely useful for GERD, for gastritis from causes other than H. pylori, and for prevention of NSAID-associated ulcers --- but they stopped being the primary treatment for the primary cause of peptic ulcer disease.

The IARC (International Agency for Research on Cancer) classified H. pylori as a Group 1 carcinogen in 1994, on the strength of accumulated evidence that chronic infection was a major risk factor for gastric cancer and gastric MALT lymphoma. The recognition that a treatable bacterial infection caused some fraction of stomach cancer was a significant public health finding in its own right. Suerbaum and Michetti’s 2002 review in the New England Journal of Medicine (Suerbaum & Michetti, 2002) is the canonical synthesis of where the field stood roughly 20 years after Warren’s original observation: H. pylori is the most common bacterial infection in human beings worldwide, affecting roughly half the global population at the time of the review, and a major cause of upper GI disease.

The 2005 Nobel Prize

The 2005 Nobel Prize in Physiology or Medicine was awarded jointly to Robin Warren and Barry Marshall “for their discovery of the bacterium Helicobacter pylori and its role in gastritis and peptic ulcer disease.” The Nobel committee’s press release and background document, published October 3, 2005, are unusually direct about the historical narrative. They explicitly note that the discovery “encountered scepticism” because it “ran counter to prevailing knowledge.” They describe Marshall’s self-experimentation as one of the demonstrations that helped overcome that scepticism. They credit the discovery with transforming peptic ulcer disease from “a chronic, frequently disabling condition to a disease that can be cured by a short regimen of antibiotics and acid secretion inhibitors.”

The Nobel committee’s framing is worth noting because Nobel committees do not generally adjudicate scientific controversies; they recognize completed work. By 2005, the bacterial hypothesis was the established consensus, and the prize was a recognition of the historical fact of the discovery rather than an intervention in an active scientific debate. The framing of the resistance, in the official documents, is matter-of-fact: it happened, it was wrong, the evidence eventually settled the question. The Nobel materials are the most accessible primary source for the standard narrative of this episode, and they are unusually candid about how long the standard narrative took to become standard.

What’s Honest To Say About “Stress” In Health Now

It would be wrong to take from this story the conclusion that stress is irrelevant to physical health, or that the 20th-century concern about stress was foolish, or that we now know everything we need to know about peptic ulcer disease. None of those is true.

Stress matters, in well-established ways, for a number of physical-health outcomes. Cardiovascular disease risk has well-documented associations with chronic psychological stress, with work strain, and with acute stressors --- the increase in heart attack rates in the 24 hours following a major earthquake is one of the cleaner natural experiments in the literature. Immune function is meaningfully affected by chronic stress, with measurable changes in cytokine profiles and slower wound healing in chronically stressed populations. Mental-health outcomes are obviously stress-related in ways that hardly need citing. Some gastrointestinal conditions other than peptic ulcer disease --- functional dyspepsia, irritable bowel syndrome --- do appear to have stress-responsive components. The category “stress affects health” is not the category that turned out to be wrong.

What turned out to be wrong, specifically, was the claim that stress was the causal agent in peptic ulcer disease. That claim was built on real observations --- ulcer patients did often report high-stress lives; acid secretion does respond to stress; the symptoms of ulcer disease did worsen during stressful periods --- but the observations were being interpreted within a framework that excluded the bacterial cause. Stress modulates how H. pylori-positive patients experience their disease. Stress does not cause the disease.

The most honest contemporary summary is something like: the vast majority of peptic ulcer disease worldwide is caused by chronic H. pylori infection. Most of the rest is caused by chronic use of non-steroidal anti-inflammatory drugs (NSAIDs), which damage the gastric mucosa through a separate mechanism. A small minority of cases have other causes (Zollinger-Ellison syndrome, severe physiological stress in critically ill patients, and a few rarer mechanisms). Psychological stress, in the sense the 20th-century clinical and popular literature meant it, is not a primary cause. The treatment that works for most peptic ulcer disease is antibiotic eradication of the organism. The treatment that does not work, beyond temporary symptom relief, is acid suppression alone in the presence of untreated infection.

This is a clean factual position. It is not anti-stress, not anti-medicine, not anti-anything. It is a description of what the evidence supports.

What This Means For Evaluating Expert Consensus In Behavioral Sciences And Management

For anyone whose professional decisions depend on expert consensus in fields that traffic in causal claims about human systems --- management, organizational behavior, behavioral economics, leadership theory, motivation science --- the H. pylori story is a useful calibration. The discipline being evaluated, in this case, was medicine, which has unusually strong methodological standards, unusually well-funded research infrastructure, and an unusually clear feedback loop (people get better or they don’t). It still got the mechanism of a major disease substantially wrong for most of a century. The relevant question is not whether this can happen in fields less rigorous than medicine. It is how often it does, and how a careful evaluator might detect it.

A few practical patterns travel.

The presence of incumbent treatments is a structural reason that incorrect mechanisms can persist. When a wrong causal model has produced a working industry --- pharmaceutical, consultative, surgical, training-based --- the cost of accepting a correct alternative mechanism is high enough that the burden of proof on the new model gets raised. In management, the parallel is the consulting industry built around a particular framework, or the training certifications built around a particular theory of leadership. When such an industry exists, alternative frameworks face a high evidentiary bar even when the data supports them. The Maslow hierarchy of needs and Goleman’s emotional intelligence are two cases in the behavioral-science literature where the original empirical basis is thin but the consulting and training industry built on the framework is large. The size of the industry is not itself evidence that the framework is correct.

The strength of the prior matters more than the strength of the new evidence. When a field has a long-standing commitment to a particular causal model, new evidence against that model is processed through that model. Marshall and Warren’s 1983 and 1984 papers were perfectly clear. They were read by a field that had decided, on prior grounds, that the bacterial hypothesis could not be correct. The papers were dismissed as artifact. This is the same pattern by which negative replications of priming effects in 2010-2015 were initially dismissed by senior social psychologists who had built careers on those effects: not because the new evidence was weak, but because the prior was so strong. When you encounter a confidently-held framework in management or behavioral economics, the relevant question is how the field would receive a paper that contradicted the framework. If the answer is “it would be dismissed as artifact,” the framework is at risk of being a stress-causes-ulcers situation.
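
The way a strong prior can swamp clear evidence follows directly from Bayes’ rule in odds form: posterior odds equal prior odds multiplied by the likelihood ratio of the evidence. The sketch below is a minimal illustration with invented numbers; none of the probabilities or the likelihood ratio come from the historical record, and the function name is hypothetical.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical: one study whose result is 20 times more likely if the
    # new hypothesis is true than if the old one is.
    strong_study = 20.0
    for prior in (0.5, 0.05, 0.001):
        p = posterior_probability(prior, strong_study)
        print(f"prior {prior:.3f} -> posterior {p:.3f}")
```

Under these assumed numbers, a reader who starts at even odds ends up about 95% convinced, while a reader who starts at one in a thousand ends up at roughly 2%. The same paper is decisive for one and negligible for the other.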

The mechanism question is critical. One reason the bacterial hypothesis was rejected for a decade was that no one could explain how a bacterium could survive in stomach acid. The discovery of the urease mechanism in the late 1980s made the hypothesis biochemically plausible and accelerated its acceptance. When you are evaluating a framework in management or behavioral economics, the parallel question is: what is the proposed causal mechanism, and is it specified precisely enough to be testable? Frameworks that rest on vague causal claims --- engagement drives performance, culture eats strategy for breakfast, purpose-driven employees are more productive --- are particularly vulnerable to the stress-causes-ulcers failure mode, where the framework captures a correlation but mis-identifies the causal direction or the actual mechanism.

The time scale of correction is long. From Warren’s first observation in 1979 to the NIH consensus in 1994 was 15 years. From the first peer-reviewed publication in 1983 to consensus was 11 years. These are time scales over which strategic decisions are made and re-made many times. A reasonable corollary: when a field is in the early years of revising a consensus, the public-facing version of the field still reflects the old consensus. The textbooks lag the literature; the popular press lags the textbooks; the consulting frameworks lag the popular press. If you are reading about behavioral science in the popular press in 2026 and the framing matches what was taught in 2010, the relevant question is whether the underlying evidence has moved since 2010.
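
The lags quoted above are simple differences between dates already given in this article. A small sketch that recomputes them follows; the dictionary labels and the function name are illustrative choices, not terms from the source literature.

```python
# Milestone years as stated in this article.
MILESTONES = {
    "Warren's first microscope observation": 1979,
    "first Lancet letter": 1983,
    "Marshall's self-experiment": 1984,
    "NIH Consensus Conference": 1994,
}

CONSENSUS = "NIH Consensus Conference"


def years_until(event: str, target: str = CONSENSUS) -> int:
    """Years elapsed from `event` to `target`, using the table above."""
    return MILESTONES[target] - MILESTONES[event]


if __name__ == "__main__":
    for event in MILESTONES:
        if event != CONSENSUS:
            print(f"{event}: {years_until(event)} years to consensus")
```

Swapping in the dates for a framework you currently rely on is a quick way to ask how long ago its founding evidence was last seriously revisited.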

Self-experimentation does not always work as fast as it should. Marshall’s 1984 self-experimentation was about as direct a demonstration of causation as biology allows. It did not produce immediate acceptance. The field still required another decade. This is a sobering finding for anyone hoping that a single decisive piece of evidence will move expert consensus quickly. The structural features that protect the consensus --- the incumbent treatments, the prior beliefs, the specialty boundaries, the absent mechanism --- continue to operate even after a clean demonstration. The lesson is not that decisive evidence does not work; it is that it works on a timescale of years, not weeks. The job of a careful evaluator is to recognize when the evidence has moved and to update before the field’s official consensus does.

The strategist’s question is not what does expert consensus say. It is which evidence claims have earned the right to influence a decision, and which are still being carried by the inertia of an earlier consensus. The H. pylori story is the cleanest medical case study of the difference.

Sources

This article is part of an ongoing series on famous claims, frameworks, and studies that did not survive scrutiny. Other entries cover the Stanford Prison Experiment, the Mehrabian 7-38-55 rule, Maslow’s hierarchy of needs, and the Hawthorne effect. The full hub lives at /replication-crisis/.

FAQ

Does stress cause any health problems? Yes. Chronic psychological stress has well-documented associations with cardiovascular disease (the post-earthquake heart-attack literature is the cleanest natural experiment), immune function and wound healing, mental health, and some functional gastrointestinal conditions like irritable bowel syndrome and functional dyspepsia. The category “stress affects health” is real and well-evidenced. The specific 20th-century claim that stress caused peptic ulcer disease was the part that turned out to be wrong. Peptic ulcer disease is caused, in the vast majority of cases, by chronic Helicobacter pylori infection or by chronic NSAID use, with stress modulating symptoms rather than causing the disease.

What about “stress-related” disorders generally? Be careful with this phrase. “Stress-related” can mean stress is a primary cause, an exacerbating factor, a co-occurring feature, or simply something patients report having more of around the time the disease becomes symptomatic. The peptic ulcer story is a cautionary case of patients reporting more stress around the time of disease activity, the field interpreting the reports as evidence of causation, and the actual causal agent being something different. Modern medicine is more careful about distinguishing stress as cause from stress as correlate, but the older framing persists in popular usage.

Why did the field resist Marshall and Warren for so long? A combination of factors operated simultaneously. The prior belief that the stomach was sterile was unusually strong, supported by a major 1954 reference paper that had specifically rejected bacterial colonization. The incumbent treatment paradigm --- H2 blockers and proton pump inhibitors --- was a multi-billion-dollar pharmaceutical industry whose economic model assumed lifelong acid suppression. The bacterium was difficult to culture without the longer incubation protocol, so many early replication attempts failed (because they used standard 48-hour protocols), and the failed replications were broadcast as evidence against the hypothesis. No one understood, until the late 1980s, how the bacterium could survive in stomach acid --- the urease mechanism was not yet worked out. And the original researchers were a pathologist and an internal-medicine trainee in a regional Australian hospital, not senior gastroenterologists, which raised the implicit credibility bar their work had to clear.

What does Marshall’s self-experimentation mean for research ethics now? Marshall’s 1984 self-experiment would not pass a modern Institutional Review Board on first submission. He did not file a protocol in advance. He did not have an external monitor. He did not have a control. He did, importantly, run the experiment only on himself, and that is the aspect of historical self-experimentation to which contemporary research ethics grants the most latitude. The tradition of Werner Forssmann (who catheterized his own heart in 1929) and J. B. S. Haldane (decompression-chamber experiments in the 1930s) continues in attenuated form. Most institutions today would require self-experimentation to go through formal protocol approval rather than be undertaken privately. The Marshall story has its mythic quality partly because it would be procedurally hard to repeat in modern research culture.

Did the medical community accept the discovery once Marshall did the self-experiment? Not immediately. The self-experimentation paper was published in 1985, the bacterium was renamed Helicobacter pylori in 1989, the urease mechanism was worked out in the late 1980s, the antibiotic-treatment clinical trials accumulated through the late 1980s and early 1990s, and the formal US consensus did not come until the 1994 NIH Consensus Conference. A decade elapsed between the self-experiment and the consensus. That is roughly the typical timescale for a major paradigm shift in medicine, but it is much slower than the underlying evidence would have justified. Self-experimentation can be decisive evidence for the individual who runs it; the field updates on its own schedule.

Are there other cases of major medical consensus being substantially wrong? Yes, several well-documented ones. The lobotomy era (Egas Moniz won the Nobel Prize for it in 1949; the procedure was performed tens of thousands of times in the US before being abandoned). Thalidomide as a sedative for pregnant women in the late 1950s, before the teratogenic effects were recognized. Bloodletting as a general therapeutic intervention, which persisted through most of the 19th century. The original framing of postmenopausal hormone replacement therapy as universally beneficial, which the Women’s Health Initiative trial in 2002 substantially revised. Each case has its own structure, but they share the basic shape of a confidently-held consensus, an accumulating contrary evidence base, and a lag, often a decade or more, between when the evidence became clear and when the consensus updated.

What does this mean for current confidently-held frameworks in behavioral economics or management? The honest answer is that some current confidently-held frameworks will eventually be substantially revised, and we cannot know in advance which ones. What we can do is apply structural questions: Does the framework rest on a precisely-specified causal mechanism, or on vague directional claims? Does the framework have an incumbent industry --- consulting, training, software --- that would face high costs if the framework were revised? Does the field have an institutional pathway by which contrary evidence can be processed and propagated, or are skeptics structurally outside the field? When the answers to these questions are bad, the framework is in the stress-causes-ulcers risk zone. When they are good, the framework is more likely to be self-correcting on a faster timescale. The IAT (Implicit Association Test) literature, ego depletion, power posing, and parts of the priming literature have all gone through major revisions in the last 15 years. Other current frameworks --- engagement-as-performance-driver, certain leadership typologies, parts of the EQ literature --- have not yet been seriously stress-tested, and their long-run status remains open.

What is the practical takeaway for someone making strategic decisions today? Two things. First, when a framework is being cited to justify a decision, ask whether the framework is being supported by recent independent evidence or by inertia from an earlier consensus. The two look identical from the outside. Second, when a framework’s incumbent industry is large, the framework is, structurally, being held to a lower burden of proof than it otherwise would be, so your own uncertainty about the framework should be wider than the field’s stated confidence suggests. These are not radical positions. They are the standard discipline of evidence evaluation, applied to fields that often do not hold themselves to that discipline.


Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.