Cancer Screening Overdiagnosis: When Finding Cancer Earlier Doesn't Save Lives

Atticus Li



For decades "screening saves lives" was treated as obviously true. Modern epidemiology has documented a systematic problem: overdiagnosis. The Korean thyroid epidemic, mammography reviews, and PSA evidence all show the same pattern.

By Atticus Li, May 26, 2026

For most of the second half of the twentieth century, “screening saves lives” was treated by patients, physicians, journalists, and public-health agencies as something close to a self-evident truth. The intuition behind it was straightforward and emotionally durable. Cancer kills people. Cancers grow over time. A cancer detected when it is small and localized is in principle easier to remove or treat than the same cancer detected after it has spread. Therefore, screening that detects cancers early should improve outcomes. The campaigns that built the modern cancer-screening infrastructure (mammography for breast cancer, PSA testing for prostate cancer, ultrasound and palpation campaigns for thyroid cancer, and most recently low-dose CT screening for lung cancer in high-risk smokers) all rested on this intuition. The slogans were not technically wrong, but they were missing a piece of biology and a piece of statistics that modern epidemiology has taken several decades to confront. Some of the cancers we find with screening would have killed the patient if not treated. Some of them would not have. The proportion of detected cancers that falls into each category depends on the cancer type, on the screening technology, and on the population. For several major cancers, the proportion of detected cancers that would never have caused symptoms or death, a category the field now calls overdiagnosis, has turned out to be uncomfortably large.

The most concrete demonstration of the problem came not from a controlled trial but from a natural experiment in South Korea. In 1999, the Korean government launched a national cancer-screening initiative. Thyroid cancer was not formally among the cancers the program targeted, but hospitals and clinics began offering thyroid ultrasound as an inexpensive add-on, and it became widely and cheaply available to the general population. Between 1993 and 2011, the recorded incidence of thyroid cancer in South Korea rose approximately fifteen-fold, from roughly 5 per 100,000 to over 70 per 100,000, making thyroid cancer the most commonly diagnosed cancer in Korean women. Surgical removal of the thyroid became a routine procedure on a scale that produced its own medical-economic ecosystem. The natural test was the mortality data. If the rise in detected cases reflected the early discovery of cancers that would, untreated, have eventually killed people, the mortality rate from thyroid cancer should have fallen as the cancers were caught and treated earlier. Mortality from thyroid cancer in South Korea over the same period was essentially flat. The fifteen-fold rise in incidence with no movement in mortality was, in epidemiological terms, definitive: the cases being detected by screening were not, on the whole, cases that would have caused death if left undetected. They were overdiagnoses. The full story was published in 2014 by Hyeong Sik Ahn, Hyun Jung Kim, and H. Gilbert Welch in the New England Journal of Medicine under the title “Korea’s thyroid-cancer ‘epidemic’ --- screening and overdiagnosis.”

H. Gilbert Welch, the third author on that paper, had spent the previous decade building the conceptual and empirical foundations for understanding overdiagnosis as a category. His 2011 book with Lisa Schwartz and Steven Woloshin, Overdiagnosed: Making People Sick in the Pursuit of Health, had argued, on the basis of accumulating evidence from breast, prostate, thyroid, and other cancers, that the pursuit of ever-earlier detection had outrun the underlying biology of which cancers actually matter. The Korean thyroid case became the textbook illustration because the natural-experiment design was so clean: a discrete policy change, a large population, a long follow-up, and outcomes that did not require interpretation. For strategists, executives, clinicians, and policymakers, the Korean thyroid story is the cleanest single case study available for the proposition that “we found more cases” and “we saved more lives” are not the same claim, and that the relationship between them must be demonstrated, not assumed.

This is the story of overdiagnosis as a concept and a body of evidence. It covers Welch’s conceptual framework, the Korean thyroid epidemic, the mammography overdiagnosis debate, the PSA controversy, the statistical traps that make naive analyses of screening data overstate benefit, the contemporary responses (shared decision-making, risk-stratified screening, active surveillance for low-grade cancers), and the broader lesson for any strategist asked to evaluate any claim that detecting something earlier will improve outcomes.

The Welch Framework: What Overdiagnosis Means

The conceptual core of Welch’s framework is the distinction between cancer as a histological category and cancer as a clinical course. The cells that pathologists call “cancer” are defined by a set of microscopic features: abnormal morphology, loss of normal tissue architecture, evidence of invasion across basement membranes. These features are real and they are visible. What is also real, and what was poorly appreciated for most of the history of oncology, is that cells with these features do not all behave the same way over time. Some cancers grow quickly and metastasize and kill. Some cancers grow slowly and never spread. Some cancers grow for a while and then stop, or regress. The behavior of an individual tumor is not fully determined by its histological appearance under a microscope.

The implication is that when a screening program detects a tumor, the detection itself does not tell you which clinical course that tumor would have followed. The pathologist can tell you the cancer is malignant by microscopic criteria. The pathologist generally cannot tell you whether this particular cancer, in this particular patient, would have grown to clinical significance within the patient’s remaining lifespan. The screening program detects all detectable cancers in the screened population --- the lethal ones, the indolent ones, and the ones in between --- and then the clinical system processes them all as if they were the lethal type, because the prudent default response to a malignant histology has always been treatment.

Welch’s argument, developed in Overdiagnosed and a series of subsequent papers, is that the indolent and slow-growing tumors are not a marginal category. For several common cancers, they are a substantial fraction of what screening detects. The overdiagnosed cancers are not detection errors in the conventional sense --- the pathology is correctly read --- but they are clinical-significance errors. The patient is told, accurately, that they have cancer. The patient is treated, often aggressively. The patient is then counted, by the cancer-survival statistics, as a successful early-detection case. The treatment morbidity is real. The cancer that was detected, in the alternative world where it was never detected, would not have caused symptoms or death. The patient lived because they would have lived. The screening program is credited with saving a life it did not save, and the patient bears the costs of a treatment they did not need.

The framework yields a specific prediction. If a screening program produces a large increase in the incidence of a cancer with no corresponding decrease in mortality from that cancer, the new detected cases are, on the whole, overdiagnoses. If the screening program produces a smaller increase in incidence with a substantial decrease in mortality, the new detected cases are, on the whole, real early detections. The data needed to distinguish these scenarios are population-level incidence and mortality data over a sufficiently long period that any real benefit would have time to appear. The data are not always easy to obtain, but for several cancers, they have now been obtained.
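The prediction can be written down as a simple decision rule. The sketch below is an illustration only: the function name, the 10 percent "flat" tolerance, and the mortality figures are assumptions made for the example, not values from Welch's work. The Korean incidence figures are the ones quoted earlier in this article; the mortality value is a placeholder standing in for "essentially flat."

```python
def screening_signature(incidence_before, incidence_after,
                        mortality_before, mortality_after,
                        flat_tolerance=0.10):
    """Classify a before/after pattern of rates (per 100,000 per year).

    flat_tolerance is a hypothetical cutoff: relative changes smaller
    than this are treated as "no change".
    """
    incidence_change = (incidence_after - incidence_before) / incidence_before
    mortality_change = (mortality_after - mortality_before) / mortality_before

    # Incidence up, mortality flat: the extra cases were mostly not lethal.
    if incidence_change > flat_tolerance and abs(mortality_change) <= flat_tolerance:
        return "mostly overdiagnosis"
    # Mortality down: at least compatible with screening catching lethal cases.
    if mortality_change < -flat_tolerance:
        return "consistent with real early detection"
    return "inconclusive"


# Korean-style pattern: incidence 5 -> 70, mortality flat (0.5 is a placeholder).
print(screening_signature(5, 70, 0.5, 0.5))   # mostly overdiagnosis
# Hypothetical pattern: modest rise in incidence, clear fall in mortality.
print(screening_signature(50, 60, 20, 15))    # consistent with real early detection
```

A rule this crude cannot separate the effect of screening from the effect of better treatment over the same period, which is why a fall in mortality is labeled only as "consistent with" early detection.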

Korea’s Thyroid Cancer Epidemic

The Korean thyroid story is the cleanest because the natural experiment was particularly stark. Until 1999, thyroid screening was uncommon in South Korea. In that year, the government launched a national cancer-screening initiative. Thyroid cancer was not formally one of the program's target cancers, but a competitive market of hospitals and clinics began offering thyroid ultrasound at very low cost, often as an add-on to general health screening. The technology, high-frequency ultrasound, is sensitive enough to detect thyroid nodules of a few millimeters, smaller than any nodule that would be palpable on physical examination.

What the technology found, when applied to the general population, was that small thyroid nodules are extremely common. Autopsy studies done long before the screening era had established this: a substantial fraction of adults harbor small papillary thyroid carcinomas, the most common type, that they will go to their graves never knowing about. The ultrasound program detected these. Korean physicians, applying the standard-of-care protocols of the time, biopsied the suspicious nodules and, when the biopsies confirmed papillary thyroid carcinoma, recommended thyroidectomy (surgical removal of the thyroid). The cancer-incidence statistics, which count newly diagnosed cancers per year, captured the result: a fifteen-fold rise between 1993 and 2011, with the vast majority of the new diagnoses being small papillary carcinomas detected by ultrasound rather than by symptoms.

The mortality data, by contrast, were flat. Death rates from thyroid cancer in South Korea remained essentially unchanged across the period when incidence was rising fifteen-fold. The flat mortality curve in the face of the soaring incidence curve was the signature of overdiagnosis. If the new detected cancers had been clinically meaningful --- if they were cases that would otherwise have eventually progressed and killed --- the mortality rate should have declined as those cases were caught and removed earlier. The Ahn, Kim, and Welch 2014 paper that documented the pattern was titled “Korea’s thyroid-cancer ‘epidemic’ --- screening and overdiagnosis” for a reason. The epidemic was an epidemic of diagnosis, not of disease. DOI: 10.1056/NEJMp1409841

The clinical and economic consequences were substantial. Thyroidectomy is not a minor procedure. Patients require lifelong thyroid-hormone replacement therapy. A fraction of patients experience surgical complications including damage to the parathyroid glands or to the recurrent laryngeal nerve, the latter potentially causing permanent voice changes. The Korean experience generated a population of patients with these post-surgical conditions, many of whom would, on the underlying epidemiological evidence, never have developed clinically significant thyroid disease. The Korean Society of Endocrinology and the Korean Thyroid Association revised their screening guidelines in the wake of the evidence, recommending against routine thyroid ultrasound screening of asymptomatic adults. The rate of new thyroid surgeries fell. The episode is now taught in medical schools internationally as the canonical demonstration of how a screening program can generate disease rather than detect it.

Mammography: A More Difficult Case

The breast-cancer screening story is more complicated than the thyroid story because the underlying biology of breast cancer is more heterogeneous, the screening technology has changed over time, and the long history of mammography screening means the relevant trial and observational data are extensive but not unambiguous. The headline empirical question is the same: of the breast cancers detected by mammography, what fraction would never have caused clinically significant disease in the patient’s remaining lifetime? The answers in the published literature span a wide range, but the most cited modern estimates put the overdiagnosis rate at somewhere between 19 and 31 percent of screen-detected invasive breast cancers, depending on the population studied and the methodology used.

The most influential single paper on this question in the US context is “Effect of three decades of screening mammography on breast-cancer incidence,” by Archie Bleyer and H. Gilbert Welch, published in the New England Journal of Medicine in 2012. The paper looked at thirty years of US SEER-program incidence data, comparing the rise in early-stage breast-cancer diagnoses against the corresponding fall in late-stage diagnoses. If mammography were working primarily by catching cancers earlier in their progression, the rise in early-stage diagnoses should have been matched by a roughly equal fall in late-stage diagnoses, as cancers were captured before they had time to progress. What Bleyer and Welch found was that the rise in early-stage diagnoses was much larger than the corresponding fall in late-stage diagnoses. Their estimate was that roughly 31 percent of the breast cancers detected in the screened population were overdiagnoses --- cancers that, absent screening, would never have surfaced clinically. DOI: 10.1056/NEJMoa1206809

The Bleyer-Welch estimate was disputed. Other groups, using different methodologies and different reference populations, produced lower estimates, generally in the 10 to 25 percent range. The independent UK review of breast-cancer screening, commissioned by Cancer Research UK and the Department of Health in England, chaired by the epidemiologist Michael Marmot, and published in 2013 in the British Journal of Cancer under the title “The benefits and harms of breast cancer screening: An independent review,” landed on a working estimate of approximately 19 percent overdiagnosis among the cancers detected in screened women. DOI: 10.1038/bjc.2013.177

The Marmot review’s framing was the most useful for non-specialist audiences. For every 10,000 women aged 50 invited to screening for the following 20 years, the review estimated that mammography would prevent roughly 43 breast-cancer deaths but would also produce roughly 129 overdiagnosed cases, women treated for cancers that would otherwise never have surfaced. The benefit-to-harm ratio, on these numbers, was approximately 1 life saved for every 3 women overdiagnosed. The review did not conclude that screening should be abandoned. It concluded that women considering mammography should be told both numbers. The epidemiologists Mette Kalager, Hans-Olov Adami, and Michael Bretthauer, in a 2014 BMJ piece titled “Too much mammography,” took a similar line, arguing that the cumulative evidence justified more cautious recommendations and greater emphasis on informed patient choice. DOI: 10.1136/bmj.g1403
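The arithmetic behind the Marmot framing is short enough to check directly. The sketch below uses only the three figures quoted above (10,000 women invited, 43 deaths prevented, 129 overdiagnosed). The variable names are illustrative, and the per-woman figures are simple divisions of those numbers, not values taken from the review itself.

```python
WOMEN_INVITED = 10_000    # women aged 50 invited to screening for 20 years
DEATHS_PREVENTED = 43     # breast-cancer deaths prevented (review estimate)
OVERDIAGNOSED = 129       # overdiagnosed cases (review estimate)

# Harm-to-benefit ratio: overdiagnosed women per death prevented.
overdiagnosed_per_death_prevented = OVERDIAGNOSED / DEATHS_PREVENTED

# How many women must be invited for one event of each kind.
invited_per_death_prevented = WOMEN_INVITED / DEATHS_PREVENTED
invited_per_overdiagnosis = WOMEN_INVITED / OVERDIAGNOSED

print(f"Overdiagnosed per death prevented: {overdiagnosed_per_death_prevented:.1f}")
print(f"Invited per death prevented: about {invited_per_death_prevented:.0f}")
print(f"Invited per overdiagnosed case: about {invited_per_overdiagnosis:.0f}")
```

On these inputs the ratio is exactly 3, and an individual woman's chance of being overdiagnosed (129 in 10,000) is about three times her chance of having a breast-cancer death prevented (43 in 10,000).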

The complexity of the breast-cancer story is partly biological. Some of the lesions detected by mammography --- particularly ductal carcinoma in situ (DCIS), a non-invasive precursor that may or may not progress to invasive cancer --- have variable enough natural histories that the overdiagnosis estimates depend heavily on how DCIS cases are counted. Some of the complexity is that breast-cancer mortality has fallen substantially over the period when mammography was rising, but a substantial fraction of that fall is attributable to better treatment (particularly tamoxifen and the modern adjuvant-therapy regimens) rather than to earlier detection. Disentangling the two contributions is difficult, and reasonable epidemiologists have reached different conclusions about how much of the mortality decline to credit to screening versus treatment. What is no longer disputed is that overdiagnosis is a substantial and quantifiable harm, that the harm must be weighed against the benefit, and that women considering screening have a right to know both numbers.

Prostate Cancer And The PSA Controversy

The prostate-cancer story is, in some ways, the most extreme. PSA testing --- a blood test measuring prostate-specific antigen, an enzyme produced by prostate tissue --- was introduced into widespread clinical use in the late 1980s as a screening tool. The intuition was the standard one: detect prostate cancer earlier and treatment would be more effective. What followed was a textbook example of how screening can produce diagnostic-incidence epidemics without corresponding mortality benefit. PSA-driven biopsies found large numbers of prostate cancers in older men, most of which were the slow-growing, well-differentiated forms that autopsy studies had long established are extremely common in older men and that often never progress to clinically meaningful disease. Treatment --- typically radical prostatectomy or radiation therapy --- carried substantial risks of urinary incontinence, erectile dysfunction, and bowel symptoms. The mortality benefit, when finally tested by large randomized trials in the 2000s and 2010s (the European ERSPC trial and the US PLCO trial), turned out to be modest at best and was not even consistently demonstrable across the two trials.

The US Preventive Services Task Force grappled with this evidence for years. In 2008, the USPSTF recommended against PSA screening in men aged 75 and older and judged the evidence insufficient to make a recommendation for younger men. In 2012, it went further and recommended against PSA-based screening for men of all ages, arguing that the modest mortality benefit did not justify the substantial treatment-related harms in the average-risk population. In 2018, after additional trial data and after the rise of active surveillance as a treatment option for low-grade disease, the USPSTF revised its recommendation to a more nuanced position. The 2018 recommendation, published in JAMA as “Screening for Prostate Cancer: US Preventive Services Task Force Recommendation Statement,” was that men aged 55 to 69 should make an individualized decision about PSA screening after a discussion with their clinician about benefits and harms, and that men aged 70 and over should not be routinely screened. DOI: 10.1001/jama.2018.3710

The 2018 framing was deliberate. It did not say “screen everyone” or “screen no one.” It said that the benefit-harm balance for PSA screening was close enough to neutral, at the population level, that the right answer depended on the individual patient’s values and risk profile. A man who placed very high weight on extending life and very low weight on the morbidity of unnecessary treatment might rationally choose screening. A man who placed substantial weight on avoiding incontinence and erectile dysfunction and was willing to accept a small increase in prostate-cancer mortality might rationally choose against screening. The recommendation was an institutional acknowledgment that the underlying evidence did not support a single best answer for all patients in the screening-age range.

The PSA story is the clearest illustration of two related features of the modern screening-evaluation landscape. First, the magnitude of the overdiagnosis problem can be large enough to flip the population-level cost-benefit analysis from positive to ambiguous. Second, even when the population-level balance is ambiguous, individual patients can have rational preferences that point in different directions, and the role of the screening recommendation is to support that individual choice rather than to mandate a single course of action.

Length-Time Bias And Lead-Time Bias

Two statistical features of how screening data are naively analyzed produce a systematic overstatement of the apparent benefit of screening. They are length-time bias and lead-time bias. Both are technical, both have been understood by epidemiologists for decades, and both nonetheless continue to distort popular-press reporting and even some clinical-trial interpretations.

Lead-time bias is the simpler of the two. It applies when screening detects a cancer earlier in its biological course than the cancer would otherwise have been detected by symptoms. If you compare the survival times of patients diagnosed by screening with the survival times of patients diagnosed by symptoms, the screened patients will, by construction, have longer survival times after diagnosis --- because the clock started earlier. This is not a benefit. The screened patient may still die at the same chronological age as the symptom-detected patient. The screened patient just spent more of their pre-death time knowing they had cancer and being treated for it. The five-year-survival rate, a commonly cited metric in cancer reporting, is particularly susceptible to lead-time bias: a screening program can produce dramatic apparent improvements in five-year survival without changing the age at which patients die. The mortality rate, by contrast, is largely immune to lead-time bias, because it counts deaths per unit population per year regardless of when the diagnosis occurred. This is why epidemiologists insist on mortality data rather than survival data for evaluating screening programs.
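A small simulation makes the point concrete. The sketch below is a toy model under loudly stated assumptions: every patient's age at death is fixed in advance and screening does nothing except move the diagnosis three years earlier. The lead time, the age ranges, and the function name are all hypothetical choices for illustration.

```python
import random


def five_year_survival(ages_at_diagnosis, ages_at_death):
    """Fraction of patients still alive five years after diagnosis."""
    survived = sum(
        1 for dx, death in zip(ages_at_diagnosis, ages_at_death)
        if death - dx >= 5
    )
    return survived / len(ages_at_diagnosis)


rng = random.Random(0)
N = 10_000
LEAD_TIME_YEARS = 3.0  # hypothetical: screening finds the cancer 3 years sooner

# Age at death is fixed per patient; screening does not change it in this model.
ages_at_death = [rng.uniform(60, 85) for _ in range(N)]
# Without screening, symptoms prompt a diagnosis 1 to 6 years before death.
symptom_dx = [death - rng.uniform(1, 6) for death in ages_at_death]
# With screening, the same cancers are diagnosed LEAD_TIME_YEARS earlier.
screen_dx = [dx - LEAD_TIME_YEARS for dx in symptom_dx]

surv_symptom = five_year_survival(symptom_dx, ages_at_death)
surv_screen = five_year_survival(screen_dx, ages_at_death)

print(f"Five-year survival, symptom-detected: {surv_symptom:.0%}")
print(f"Five-year survival, screen-detected:  {surv_screen:.0%}")
print("Ages at death are identical in both scenarios.")
```

Under these assumptions five-year survival rises from roughly 20 percent to roughly 80 percent, even though not a single patient lives a day longer. A mortality rate computed from the same list of deaths would be unchanged.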

Length-time bias is more subtle. It applies because screening, by virtue of operating at intervals (annually, biennially, every five years), is more likely to detect slow-growing cancers than fast-growing cancers. A fast-growing cancer that develops, grows, and presents with symptoms in the interval between two screenings will be detected as a symptom-driven case, not as a screen-detected case. A slow-growing cancer that exists in a detectable but asymptomatic state for years is much more likely to be caught at one of the regular screenings. Slow-growing cancers are, on average, less lethal than fast-growing cancers --- they grow slowly, after all, and may never reach clinical significance. The screen-detected population is therefore enriched, by the structure of the screening interval, with the less lethal subset of the underlying cancer population. The screen-detected cohort will have better outcomes than the symptom-detected cohort even if the screening itself contributes nothing, because the two cohorts are drawn from different parts of the cancer-aggressiveness distribution.
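The enrichment effect can also be shown with a toy simulation. The sketch below assumes an underlying population that is half fast-growing and half slow-growing tumors, a two-year screening interval, and fixed detectable-but-asymptomatic windows of 0.5 and 4 years. All of these numbers and names are hypothetical, chosen only to make the mechanism visible.

```python
import math
import random

SCREEN_INTERVAL = 2.0                   # years between screens (hypothetical)
SOJOURN = {"fast": 0.5, "slow": 4.0}    # years detectable before symptoms (hypothetical)


def detected_by_screening(onset, sojourn, interval=SCREEN_INTERVAL):
    """True if a screen at 0, interval, 2*interval, ... falls in [onset, onset + sojourn)."""
    next_screen = math.ceil(onset / interval) * interval
    return next_screen < onset + sojourn


rng = random.Random(1)
screen_detected = {"fast": 0, "slow": 0}
symptom_detected = {"fast": 0, "slow": 0}

for _ in range(20_000):
    kind = rng.choice(["fast", "slow"])   # equal mix in the underlying population
    onset = rng.uniform(0, 20)            # year the tumor becomes detectable
    if detected_by_screening(onset, SOJOURN[kind]):
        screen_detected[kind] += 1
    else:
        symptom_detected[kind] += 1       # surfaces with symptoms between screens

slow_share_screened = screen_detected["slow"] / sum(screen_detected.values())
slow_share_symptomatic = symptom_detected["slow"] / max(1, sum(symptom_detected.values()))

print(f"Slow-growing share of screen-detected cases:  {slow_share_screened:.0%}")
print(f"Slow-growing share of symptom-detected cases: {slow_share_symptomatic:.0%}")
```

With these settings the underlying population is 50 percent slow-growing, but the screen-detected cohort is about 80 percent slow-growing and the symptom-detected cohort contains none. Any outcome comparison between the two cohorts would favor screening before a single patient is treated.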

Both biases will tend to make a naive comparison of “screened versus unscreened patients” overstate the benefit of screening. The only research design that fully escapes both biases is a randomized trial of screening versus no screening, with outcomes measured as mortality in the full intention-to-treat population, including patients who were never diagnosed with cancer at all. This is the design used for the major modern screening trials, and it has produced estimates of screening benefit much more modest than the popular-press numbers had suggested. When you see a claim that “five-year survival has improved dramatically in the screening era,” you should immediately ask whether that claim is based on a comparison of screened versus unscreened patients (susceptible to both biases) or on population-level mortality rates (immune to both). The former is decorative. The latter is what matters.

The Modern Response: Shared Decisions And Active Surveillance

The contemporary response to the overdiagnosis evidence has not been the abandonment of cancer screening but a substantial change in how it is offered and how the resulting diagnoses are treated. Three components of the response are worth describing.

Shared decision-making. Modern screening guidelines increasingly frame the screening decision as a value-laden choice that the patient should make in consultation with a clinician, rather than as a clinical default to be applied uniformly. The 2018 USPSTF prostate-cancer recommendation is the clearest institutional example. The patient is told the magnitude of the mortality benefit, the magnitude of the overdiagnosis and treatment-harm risks, and is invited to weigh them. The decision-aid literature in this area has matured substantially, with structured tools that present the relevant numbers in formats designed for non-specialist audiences. The earlier paternalistic frame --- “you should be screened because screening saves lives” --- has been largely replaced by a more honest frame: “screening has benefits and harms; here is what we know about both; what matters to you?”

Risk-stratified screening. Rather than offering the same screening protocol to everyone in an age range, modern guidelines increasingly stratify by risk profile. High-risk patients --- women with a strong family history of breast cancer, men with a strong family history of aggressive prostate cancer, smokers in the high-risk lung-cancer-screening age range --- have a higher prior probability of clinically significant disease and therefore a more favorable benefit-harm balance for screening. Low-risk patients have a less favorable balance and may rationally choose less intensive screening or no screening at all. The lung-cancer low-dose CT screening guidelines, which are restricted to current and former heavy smokers in a specific age range, are a clear example of this approach: even where screening saves lives, it is offered only to the population whose risk profile makes the benefit likely to exceed the harms.

Active surveillance for low-grade cancers. For prostate cancer in particular, but increasingly for other indolent cancer types including some thyroid and breast lesions, the standard response to a low-grade diagnosis is no longer immediate aggressive treatment. It is active surveillance: regular monitoring with imaging, biopsy, and biomarker testing, with treatment reserved for cases that show evidence of progression. The shift to active surveillance has substantially reduced the treatment-related morbidity associated with the diagnosis without measurable harm to mortality outcomes in the patient populations where it has been adopted. It is, in effect, a clinical recognition that not every histological diagnosis of cancer requires immediate aggressive intervention --- a recognition that the Welch framework would have predicted and that the empirical outcome data have validated.

The three responses are partial. They do not solve the underlying problem that a screening program produces a population of diagnosed patients whose individual cancers cannot be reliably classified into “lethal” and “indolent” at the time of diagnosis. They do reduce the harm that the screening program produces, by giving patients and clinicians better tools for managing the uncertainty.

Strategist Takeaway: Evaluating Any “Early Detection” Claim

For a strategist outside oncology --- someone asked to evaluate a claim that some form of early detection, in any domain, will improve outcomes --- the cancer-screening literature is a calibration manual. The intuition that earlier detection must be better is plausible and emotionally durable, but it is not self-validating, and several specific traps recur across domains.

The first question to ask is whether the comparison being offered is screened-versus-unscreened in the same population, or whether it is implicitly a comparison between a present screened cohort and a historical unscreened cohort. The latter comparison is contaminated by every secular trend in treatment, diagnostics, and underlying disease epidemiology and will systematically overstate the benefit of screening. The reliable comparison is a randomized trial of screening versus no screening with mortality (or its domain-equivalent) measured in the full intention-to-treat population.

The second question is whether the metric being reported is susceptible to lead-time and length-time bias. Survival time from detection, time-to-event from screening, and five-year survival rates are all susceptible. Population-level mortality rates and event rates in the full population are not. If the favorable claim is being made on a susceptible metric, you should expect it to overstate the benefit.

The third question is whether the screening technology is sensitive enough to detect a substantial fraction of the underlying indolent or benign cases. Higher sensitivity is not always better. A technology that detects only the high-grade, clinically meaningful cases has a more favorable benefit-harm profile than a technology that detects the full spectrum of lesions, because the latter generates more overdiagnosis. Modern ultrasound for thyroid nodules and modern high-resolution mammography both have this feature: their sensitivity exceeds what is clinically useful, and the excess sensitivity is harm-producing.

The fourth question is what happens to the patient after detection. If the standard-of-care response to a positive screen is aggressive treatment with substantial side effects, the harm side of the benefit-harm balance is larger than it would be if the response were active surveillance. If the screening program is being introduced into a clinical system whose default response is “treat everything that screens positive,” the overdiagnosis harms will be high. If the system has the infrastructure for active surveillance, the harms can be lower.

For business-domain “early detection” claims (a fraud-detection system, a churn-prediction model, a security-anomaly detector) the same structure applies. The system will produce a set of flagged cases. Some of those cases will be true positives that would have caused real harm if undetected. Some of them will be false positives. Some of them will be technically true positives that would never have developed into the real harm the system is supposed to prevent. The cost of responding to that third group is the analog of the overdiagnosis harm: real, often substantial, and usually invisible in the headline performance metrics of the detection system. The discipline of the cancer-screening field, after thirty years of confronting overdiagnosis, is the discipline that any decision-maker introducing a high-sensitivity detection system needs to import. Detection is not benefit. The chain from detection to outcome is mediated by the underlying base rate of clinically significant cases, by the response to the detection, and by the harms of the response. Each link must be measured, not assumed.
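The chain from detection to outcome can be sketched as a small accounting exercise. Everything below is hypothetical: the class and function names, the case counts for the two detectors, and the cost figures are invented for illustration and describe no real system.

```python
from dataclasses import dataclass


@dataclass
class FlagCounts:
    harmful_true_positives: int    # would have caused real harm if missed
    harmless_true_positives: int   # correct flags that would never have caused harm
    false_positives: int           # flags on cases with nothing wrong


def net_value(counts, harm_averted_per_case, response_cost_per_flag):
    """Benefit from averted harm minus the cost of responding to every flag."""
    total_flags = (counts.harmful_true_positives
                   + counts.harmless_true_positives
                   + counts.false_positives)
    benefit = counts.harmful_true_positives * harm_averted_per_case
    cost = total_flags * response_cost_per_flag
    return benefit - cost


# Hypothetical detectors: the sensitive one "finds" far more true positives.
conservative = FlagCounts(harmful_true_positives=40,
                          harmless_true_positives=10,
                          false_positives=10)
sensitive = FlagCounts(harmful_true_positives=45,
                       harmless_true_positives=300,
                       false_positives=150)

for name, counts in [("conservative", conservative), ("sensitive", sensitive)]:
    value = net_value(counts, harm_averted_per_case=1_000, response_cost_per_flag=100)
    print(f"{name}: net value {value}")
```

In this made-up example the sensitive detector reports 345 true positives against the conservative detector's 50, yet its net value is negative (-4,500 versus 34,000), because almost all of the additional detections trigger a costly response without averting any harm.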

The general principle is the one Welch articulated in Overdiagnosed and has continued to articulate in subsequent work: the question is not whether finding more cases is technically possible, because for almost any condition it is. The question is whether finding more cases improves outcomes that matter, in the population the screening will be applied to, given the standard response to a positive screen. Once you ask the question in that form, “screening saves lives” stops being a slogan and becomes an empirical claim that has to be demonstrated case by case. Sometimes the demonstration succeeds. The major modern lung-cancer screening trial, the US National Lung Screening Trial, did show a mortality benefit for low-dose CT screening in high-risk smokers, even with an overdiagnosis rate of approximately 10 to 20 percent. Sometimes it fails, as the Korean thyroid case spectacularly demonstrated. The discipline is in reserving judgment until the demonstration has been made.

Sources

Ahn, H. S., Kim, H. J., & Welch, H. G. (2014). Korea’s thyroid-cancer “epidemic” — screening and overdiagnosis. New England Journal of Medicine, 371(19), 1765-1767. DOI: 10.1056/NEJMp1409841
US Preventive Services Task Force; Grossman, D. C., Curry, S. J., Owens, D. K., Bibbins-Domingo, K., et al. (2018). Screening for prostate cancer: US Preventive Services Task Force recommendation statement. JAMA, 319(18), 1901-1913. DOI: 10.1001/jama.2018.3710
Bleyer, A., & Welch, H. G. (2012). Effect of three decades of screening mammography on breast-cancer incidence. New England Journal of Medicine, 367(21), 1998-2005. DOI: 10.1056/NEJMoa1206809
Kalager, M., Adami, H. O., & Bretthauer, M. (2014). Too much mammography. BMJ, 348, g1403. DOI: 10.1136/bmj.g1403
Marmot, M. G., Altman, D. G., Cameron, D. A., Dewar, J. A., Thompson, S. G., & Wilcox, M. (2013). The benefits and harms of breast cancer screening: an independent review. British Journal of Cancer, 108(11), 2205-2240. DOI: 10.1038/bjc.2013.177
Welch, H. G., Schwartz, L. M., & Woloshin, S. (2011). Overdiagnosed: Making People Sick in the Pursuit of Health. Boston: Beacon Press.

Related

Hormone Replacement Therapy: The WHI Reversal — Another large-scale medical reversal where the prevailing wisdom failed when finally tested against a properly randomized comparison.
Beta-Carotene And The CARET Trial — A trial that found the intervention being studied increased mortality rather than reducing it, illustrating why the underlying evidence must be tested rather than assumed.
Antidepressant Publication Bias: Turner 2008 — How the published medical literature systematically distorted the evidence base for a major drug class, and what the regulatory ground-truth data showed instead.

Frequently Asked Questions

Q: Does this mean I should stop getting cancer screenings? A: No, and that is not what the overdiagnosis evidence implies. What it implies is that the decision to be screened is a values-and-evidence decision that should be made with full information about both benefits and harms, rather than as a default. For some cancers and some patient profiles (lung cancer screening in heavy smokers, for example), the evidence supports screening fairly clearly. For others (thyroid screening in asymptomatic adults, PSA screening in average-risk older men), the evidence is more ambiguous and the right answer depends on individual risk and preference. Discuss your specific situation with a clinician who is familiar with the modern evidence.

Q: Why has this not been more widely publicized? A: The shared decision-making framing is now the standard approach in major guideline documents (USPSTF in the US, NICE in the UK, the major specialty-society guidelines), but the institutional and cultural momentum behind “screening saves lives” messaging has been substantial, and the more nuanced modern framing has been slow to reach popular-press coverage. Welch's books and a large accompanying literature have argued that the framing of screening should be more honest about overdiagnosis, and these arguments have become increasingly influential, but the public-communications environment around cancer is still well behind the technical evidence.

Q: Isn’t this just an argument against medical progress? A: No. The overdiagnosis framework does not say that diagnostic technology is bad or that early detection is bad. It says that the relationship between detection and outcome is not automatic, and that the harms of overdiagnosis are real and need to be weighed. The same framework supports both the recommendation in favor of lung-cancer screening for high-risk smokers (where the benefit clearly exceeds the harm) and the more cautious recommendation on PSA screening for average-risk men (where the balance is closer to neutral). The framework is a tool for evaluating which screening programs are net beneficial, not a blanket argument against screening.

Q: How would I know if a cancer I was diagnosed with was an overdiagnosis? A: For most cancers, you would not. The biology that distinguishes an indolent cancer from an aggressive one is not generally visible at the time of diagnosis, which is the core epidemiological problem. The implication is that the decision about treatment intensity should be informed by the typical natural history of cancers of the type and stage you have been diagnosed with, by the modern evidence on active surveillance as an alternative to immediate aggressive treatment, and by your individual preferences. For some cancer types (low-grade prostate cancer, certain small thyroid cancers, some DCIS in the breast), active surveillance is now a well-validated option that some patients rationally prefer.

Q: What is the strongest single piece of evidence for overdiagnosis as a category? A: The Korean thyroid epidemic, documented in the Ahn-Kim-Welch 2014 NEJM paper. The combination of a fifteen-fold rise in incidence with a flat mortality curve over the same period is the cleanest demonstration available that screening can produce diagnostic-incidence epidemics without corresponding mortality benefit. The natural-experiment design — a discrete policy change, a large population, long follow-up, unambiguous outcome data — does not require any of the statistical adjustments that complicate the interpretation of other screening evidence. It is the textbook case.

Q: Where does this leave AI-driven medical diagnostics? A: The next generation of detection technology, including AI-driven analysis of imaging and pathology, is likely to be more sensitive than the technology it replaces. The cancer-screening literature is the reason to be cautious about that increased sensitivity. The question is not whether the AI can find more cases, because it generally can. The question is whether finding more cases improves outcomes, given that some of the additional detected cases will be overdiagnoses. The cancer-screening field’s discipline of measuring mortality outcomes in randomized comparisons, rather than accepting detection improvements as automatically beneficial, is exactly the standard that AI-driven diagnostics will need to meet.


Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified in behavioral economics. Led 100+ in-house experiments at NRG in 2025, with project evidence and limits documented in the case studies.



