Base Rate Neglect: The Robust Reasoning Error In Diagnosis And Decisions (Anti-Example)

Atticus Li

Kahneman and Tversky (1973) showed that people ignore prior probabilities when given individuating descriptive information. Casscells and colleagues (1978) found that Harvard physicians made the same mistake on a diagnostic-test problem. Bar-Hillel (1980) systematized the conditions. The effect is robust, well-replicated, and consequential across medicine, hiring, criminal justice, and investing. Here is how to design around it.

By Atticus Li, May 25, 2026

A disease afflicts one in a thousand people in the population. There is a test for it. The test has a false-positive rate of five percent, meaning that five percent of people without the disease test positive. The test has no false negatives, meaning that everyone who actually has the disease tests positive. You take the test and get a positive result. What is the probability that you actually have the disease?

If you said something like ninety-five percent, or eighty percent, or anything in that neighborhood, you are in the company of most educated people who answer this question. You are also wrong by a factor of forty or more. The correct answer is approximately two percent. Of every thousand people, one actually has the disease and tests positive; of the remaining nine hundred ninety-nine, about fifty test positive falsely. A positive test result therefore puts you in a pool of roughly fifty-one people, one of whom actually has the disease. Your posterior probability of having the disease is one in fifty-one, or about two percent.
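The same arithmetic can be written as a direct application of Bayes' rule. This is a minimal sketch; the function name `posterior_given_positive` and its parameter names are chosen here for illustration, and the inputs are the figures from the problem statement above.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' rule."""
    # Probability mass of people who are sick AND test positive.
    true_positives = prevalence * sensitivity
    # Probability mass of people who are healthy AND test positive.
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# One in a thousand prevalence, no false negatives, five percent false positives.
posterior = posterior_given_positive(
    prevalence=0.001, sensitivity=1.0, false_positive_rate=0.05
)
print(f"{posterior:.4f}")  # prints 0.0196
```

The intuitive answer of ninety-five percent would only be roughly right if the disease were about as common as not: with a prevalence of one half, the same function returns a value just above 0.95.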

This is the medical diagnosis problem, a close relative of the mammography problem later popularized by David Eddy, and it is the most famous illustration in the literature on base rate neglect. When Ward Casscells, Arno Schoenberger, and Thomas Graboys posed this problem to Harvard Medical School faculty, residents, and fourth-year medical students in 1978, only about eighteen percent gave the correct answer. The most common answer, given by forty-five percent of respondents, was ninety-five percent. Those respondents had not just confused the answer slightly. They had thrown away the base rate entirely and reported the complement of the false-positive rate as the posterior probability. They were treating the prior probability of the disease as if it did not exist.

This pattern of behavior, using individuating descriptive evidence (the test result) while ignoring the prior probability of category membership (the disease prevalence), is what Amos Tversky and Daniel Kahneman, working in the early 1970s, named base rate neglect. It is one of the most robust findings in cognitive psychology. It replicates across five decades of testing, across populations from undergraduates to medical specialists, across paradigms from verbal description tasks to numerical probability tasks. It survived the replication crisis intact. And it has consequences far beyond the lab: in medical diagnosis, criminal justice, hiring, investing, and any other applied domain in which a classification decision depends on the prior probability of the relevant category.

This article is an anti-example in the replication crisis hub. Unlike the contested findings discussed in other articles, base rate neglect is a phenomenon whose empirical robustness is not in serious dispute. The systematic mechanisms by which it operates have been documented in dozens of paradigms. The mitigations that work --- frequency framing, explicit prior elicitation, structured Bayesian decision aids --- are well-established. For anyone making classification or risk-assessment decisions, the practical question is not whether base rate neglect happens but how to design around it.

What Tversky and Kahneman 1973 Actually Demonstrated

The foundational paper is Kahneman, D., & Tversky, A. (1973). “On the psychology of prediction.” Psychological Review, 80(4), 237–251. DOI: 10.1037/h0034747.

The 1973 paper introduced the engineers-and-lawyers problem, which has become the canonical demonstration of base rate neglect. The setup is straightforward. Subjects are told that a panel of psychologists has interviewed and administered personality tests to one hundred professionals --- some engineers and some lawyers. They are then told the prior probability of category membership: in one condition, the panel consists of seventy engineers and thirty lawyers; in another, it consists of thirty engineers and seventy lawyers. Subjects are then given a brief personality description of one individual, drawn randomly from the panel, and asked to estimate the probability that the individual is an engineer.

The normative answer requires combining two pieces of information: the prior probability of being an engineer (the base rate), and the likelihood of the personality description given the category (the evidence). If the description is uninformative --- if it does not favor either category --- the posterior should equal the prior. If the description favors engineers, the posterior should be higher than the prior. The relative weight of the prior and the evidence is given by Bayes’ rule.
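In odds form the rule is compact. The symbols are introduced here for illustration and are not the paper's notation: E and L stand for "engineer" and "lawyer", and D for the personality description.

```latex
\[
\frac{P(E \mid D)}{P(L \mid D)}
  = \frac{P(D \mid E)}{P(D \mid L)} \times \frac{P(E)}{P(L)}
\]
```

The left side is the posterior odds, the first factor on the right is the likelihood ratio of the description, and the second is the prior odds. With the seventy-engineer panel the prior odds are 7:3, so an uninformative description (likelihood ratio of 1) leaves the posterior probability at 0.70. With the thirty-engineer panel and a description assumed, purely for the sake of example, to be three times as likely for an engineer as for a lawyer, the posterior odds are 3 × 3/7 = 9/7, which is a probability of 9/16, or about 0.56.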

What Tversky and Kahneman found was that subjects almost entirely ignored the prior. When they were given the seventy-engineer panel and asked about an individual whose description was neutral with respect to profession, they gave probability estimates close to fifty percent --- as if the panel composition told them nothing. When they were given the thirty-engineer panel and asked about an individual whose description was clearly engineer-suggestive, they gave probability estimates near the maximum, as if the panel composition did not constrain them. The base rate was being treated as approximately irrelevant once any individuating information was available.

The 1973 paper also introduced the Tom W problem, which is structurally similar but uses a more vivid character description. Tom W is described as a graduate student in an unspecified field, with a personality sketch emphasizing tidiness, mechanical interests, lack of feeling for people, and high intelligence. Subjects are then asked to rank a list of possible graduate fields (engineering, computer science, library science, social work, humanities, and so on) by the probability that Tom is enrolled in each.

The normative ranking should weight two things: how representative the description is of each field, and how many students are enrolled in each field nationally (the base rate). What Tversky and Kahneman found was that subjects ranked fields almost entirely by representativeness. Fields with small national enrollments (computer science in the early 1970s, library science) ranked high if Tom’s description matched the prototype, and fields with large national enrollments (humanities, social sciences) ranked low if the description did not match. The base rate of field enrollment, which should have substantially pulled the rankings toward the larger fields, did essentially nothing.
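A small sketch shows how much the two rankings can differ. Every number below is invented for illustration and none comes from the 1973 paper; the sketch also assumes, as a simplification, that a similarity score can stand in for the likelihood of the description given the field.

```python
# Hypothetical values only. Each entry maps a field to
# (similarity of the sketch to the field's prototype, share of enrollment).
# The shares need not sum to one because other fields are omitted.
FIELDS = {
    "computer science": (0.90, 0.03),
    "engineering": (0.80, 0.09),
    "library science": (0.60, 0.02),
    "humanities and education": (0.20, 0.30),
    "social science and social work": (0.10, 0.20),
}


def rank_by_similarity(fields):
    """Order fields by representativeness alone."""
    return sorted(fields, key=lambda name: fields[name][0], reverse=True)


def rank_by_posterior(fields):
    """Order fields by similarity times base rate, normalized to sum to one."""
    weights = {name: sim * share for name, (sim, share) in fields.items()}
    total = sum(weights.values())
    posterior = {name: w / total for name, w in weights.items()}
    ranking = sorted(posterior, key=posterior.get, reverse=True)
    return ranking, posterior


print(rank_by_similarity(FIELDS))
ranking, posterior = rank_by_posterior(FIELDS)
print(ranking)
```

Under these made-up inputs, computer science tops the similarity ranking, but once enrollment shares are multiplied in, engineering moves to first place and the humanities, despite a poor match to the sketch, move to second.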

The theoretical interpretation Tversky and Kahneman offered was that subjects were substituting a representativeness judgment for a probability judgment. When asked “what is the probability that Tom is in field X,” subjects were actually answering “how representative is Tom of field X.” Representativeness is a similarity judgment between an individual and a category prototype; it does not have a place for base rates in its computation. Probability theory does. The systematic neglect of base rates was therefore a direct consequence of this substitution.

The Tom W Variants and Robustness

The 1973 paper presented multiple variants of the Tom W setup, each designed to test whether the base rate neglect could be eliminated by changing the task in a way that should make the prior more salient.

In one variant, subjects were told explicitly what the base rates were and asked to consider them when making their judgments. The neglect was reduced but not eliminated --- the prior moved the posterior in the right direction, but by far less than Bayes’ rule would predict. In another variant, subjects were asked to estimate the probability that the description was a representative example of each field; here the answers tracked representativeness closely, as expected. In a third variant, subjects were asked to estimate the probability that the description was an unrepresentative or atypical example; here the answers diverged sharply from the probability of category membership, again confirming that the underlying judgment being made was about similarity rather than probability.

The most striking variant, however, came from the engineers-and-lawyers paradigm, in which Tversky and Kahneman gave subjects worthless descriptive information: a sketch containing no diagnostic content with respect to the categories being judged. Subjects given no description at all used the base rate appropriately. Subjects given a deliberately empty description answered close to fifty percent regardless of the panel composition. Adding any individuating information, even uninformative information, suppressed the use of base rates. This was a striking finding, because it suggested that the neglect was not specifically about ignoring priors in favor of strong evidence. It was about being thrown off the prior by the mere act of being given a specific case to think about.

This insensitivity to the diagnosticity of the evidence is one of the most consequential features of base rate neglect for applied decision-making. In real-world settings, decision-makers are routinely presented with individuating information about specific cases --- a patient’s symptoms, a candidate’s interview, a defendant’s appearance, a company’s pitch deck --- and most of that information is not nearly as diagnostic as it appears. The base rate neglect literature predicts that decision-makers will systematically over-weight whatever individuating information is presented to them, regardless of its actual diagnostic value, and will under-weight the prior probability that the underlying classification is true.

Casscells 1978 and the Diagnostic-Test Problem

The applied importance of base rate neglect became hard to ignore after Casscells, W., Schoenberger, A., & Graboys, T. B. (1978). “Interpretation by physicians of clinical laboratory results.” New England Journal of Medicine, 299(18), 999–1001. DOI: 10.1056/NEJM197811022991808.

Casscells, Schoenberger, and Graboys ran what is now the canonical demonstration of base rate neglect in a clinical context. They presented sixty Harvard Medical School faculty, residents, and fourth-year students with a one-question survey. The question was a stripped-down version of the diagnostic-test problem at the top of this article: a disease afflicts one in a thousand people, the test has a five percent false-positive rate and no false negatives, a patient tests positive, what is the probability that the patient has the disease?

The result, published in the New England Journal of Medicine, became one of the most-cited demonstrations of a cognitive bias in the entire medical literature. Of the sixty respondents, only eleven gave the correct answer of approximately two percent. The modal answer, given by twenty-seven of the sixty respondents, was ninety-five percent. The average answer was about fifty-six percent. Those twenty-seven respondents had taken the complement of the false-positive rate (one minus five percent equals ninety-five percent) and reported that as the posterior probability of having the disease, without any consideration of the base rate.

The Casscells finding was particularly consequential because the respondents were physicians at one of the most prestigious medical schools in the world. These were not undergraduate subjects naive to probability. These were people whose professional responsibility included interpreting diagnostic tests on a daily basis. The finding that the great majority of them answered incorrectly, many by a factor of forty or more, was alarming to the medical community in 1978 and remained alarming in subsequent re-tests of the same paradigm.

Casscells and colleagues attributed the finding to base rate neglect of exactly the kind Tversky and Kahneman had documented in the 1973 paper. The disease prevalence (the prior probability of disease, one in a thousand) was being treated as approximately irrelevant once the diagnostic test result was available. The diagnostic test result was being interpreted as direct evidence of disease, ignoring the fact that in a population with a low base rate, even a relatively accurate test produces many more false positives than true positives.

The paper triggered a wave of follow-up work. Replications across different physician populations, different countries, and different clinical scenarios consistently produced the same pattern: most physicians, when given a diagnostic-test problem with a low base rate, would substantially overestimate the posterior probability of disease. The error was not a slight miscalibration. It was a systematic neglect of the prior of a magnitude that, in clinical practice, would translate into substantial over-treatment of patients who in fact did not have the relevant disease.

This finding became the foundation for much of the modern clinical-decision-support literature. Decision aids that present diagnostic information in frequency framing (out of one thousand patients, one has the disease and tests positive; fifty test positive falsely; so a positive result puts you in a pool of fifty-one, of whom one is actually sick) have been shown to dramatically improve physician calibration. The intervention does not require new statistical training; it requires only that the diagnostic information be presented in a format that makes the base rate visible at the point of decision.

Bar-Hillel 1980 and the Systematization

The most influential synthesis of the early base-rate-neglect literature is Bar-Hillel, M. (1980). “The base-rate fallacy in probability judgments.” Acta Psychologica, 44(3), 211–233. DOI: 10.1016/0001-6918(80)90046-3.

Maya Bar-Hillel’s paper, written at the Hebrew University of Jerusalem, did something the earlier Tversky-Kahneman work and the Casscells clinical work had not fully done. It systematized the conditions under which base rate neglect occurs, the conditions under which it is reduced or eliminated, and the relationship between base rate neglect and the broader heuristics-and-biases program.

A central distinction in Bar-Hillel’s synthesis, developed alongside Tversky and Kahneman’s own work on the evidential impact of base rates, is between causal and incidental base rates. A causal base rate is one that has a clear causal connection to the individual being judged. For example, the prevalence of a disease in a population is causally connected to whether a given member of that population has the disease, because the disease distribution is generated by underlying causal processes that affect individuals. An incidental base rate is one that is logically relevant but not causally connected. For example, the proportion of green cabs in a city is logically relevant to whether a witnessed cab was green, but the proportion does not exert direct causal influence on what color any specific cab was painted.

Bar-Hillel argued, and produced experimental evidence to support, that subjects use causal base rates substantially more than incidental base rates. This was a clarification of an apparent paradox in the earlier literature: in some experimental paradigms, subjects appeared to ignore base rates entirely; in others, they appeared to use them. The causal-incidental distinction explained much of the variance. When the base rate was framed in causal language, subjects integrated it (imperfectly, but to a meaningful degree). When the base rate was framed as a mere statistical fact about the population, subjects largely ignored it.

This was a substantively important clarification because it implied that base rate neglect was not a brute insensitivity to numerical information. It was a sensitivity to the kind of information --- specifically, a preference for causally interpretable information over abstract statistical information. The implication was that in any applied setting where base rates needed to be used by humans, the base rate should be presented in a way that made the causal connection between the prior probability and the individual case explicit. This implication has been validated in subsequent applied work in clinical decision support, criminal justice, and other domains.

Bar-Hillel also pulled together a second line of evidence about the relative weight of base rates and individuating information. She showed, across multiple paradigms, that subjects systematically under-weight base rates relative to individuating information by a factor that can be quantified. In her summary, the underweighting was typically by a factor of two to five, depending on the specific paradigm. This quantification gave subsequent researchers a target: any cognitive aid that wanted to address base rate neglect needed to either eliminate the underweighting (by structural intervention) or compensate for it (by deliberately upweighting the base rate in the elicitation).

The Bar-Hillel paper is the standard reference for the conditions under which base rate neglect is more or less severe, and for the theoretical interpretation of why it occurs. It is the bridge between the original Tversky-Kahneman demonstrations and the applied-decision-making literature that built on them.

Applied Domains: Where Base Rate Neglect Matters

The lab paradigms are striking, but the practical importance of base rate neglect comes from its consequences in applied domains. Several lines of work have documented the same pattern playing out in high-stakes professional decisions.

Medical diagnosis. The Casscells finding has been replicated extensively, including in studies of practicing physicians evaluating realistic clinical cases. The systematic pattern is that physicians overestimate the posterior probability of rare diseases given positive diagnostic tests, and underestimate the posterior probability of common diseases given negative or ambiguous tests. The magnitude of the bias is large enough to translate into measurable clinical consequences: over-investigation of rare diagnoses, over-treatment of false positives, and under-investigation of common diagnoses that initially present with atypical features. The clinical-decision-support literature has produced robust interventions, generally based on frequency framing and explicit pre-test probability elicitation, that substantially improve calibration.

Criminal justice. Several lines of research have documented base rate neglect in criminal-justice decision-making. Eyewitness identification, where the base rate of guilt in a lineup is often low, is one well-studied case: witnesses (and the police and jurors evaluating their identifications) systematically overestimate the probability that a positively-identified suspect is guilty, because they over-weight the identification and under-weight the prior probability that the suspect was wrongly placed in the lineup. Forensic-evidence interpretation, particularly in the early DNA-evidence era, exhibited the same pattern: the prosecutor’s fallacy, which presents a match probability as a posterior probability of guilt, is structurally identical to the Casscells error. Modern forensic-reporting guidance accordingly emphasizes likelihood ratios and warns against presenting a match probability as a probability of guilt.
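The gap between a match probability and a posterior probability is easy to see with numbers. The sketch below uses hypothetical figures and strong simplifying assumptions stated in the comments: exactly one true source, a pool of equally likely candidates, no other evidence, and a test that always matches the true source. The function name is illustrative.

```python
def posterior_source_probability(random_match_probability, pool_size):
    """P(suspect is the source | match), under the assumptions noted below."""
    # Assumption: before the match, every member of the pool is equally
    # likely to be the single true source.
    prior = 1.0 / pool_size
    # Assumption: the true source always matches; anyone else matches only
    # by coincidence, with the stated random match probability.
    p_match = prior * 1.0 + (1.0 - prior) * random_match_probability
    return prior / p_match


# Hypothetical case: a one-in-a-million match probability, a pool of one
# million possible sources.
fallacious = 1.0 - 1e-6  # the prosecutor's fallacy reading
correct = posterior_source_probability(1e-6, 1_000_000)
print(f"fallacious: {fallacious:.6f}, Bayesian: {correct:.2f}")
```

With a pool of a million candidates the posterior is about one half, not 0.999999. If other evidence narrows the pool to a hundred people, the same match becomes nearly conclusive, which is the point: the match probability alone does not determine the answer.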

Hiring and personnel decisions. When evaluating job candidates against a target profile, hiring managers routinely substitute representativeness for probability, weighting how well the candidate matches the prototype far more than the base rate of success in the role given any candidate. This pattern shows up in structured-interview research, in resume-screening studies, and in subsequent-performance evaluations. The interventions that work are largely the same as in clinical contexts: structured decision aids that present base rates explicitly at the point of decision, frequency framing of the relevant probabilities, and decomposition of the overall judgment into separate components that can be evaluated independently.

Investing and forecasting. Philip Tetlock’s expert-political-judgment work, and the subsequent prediction-tournament literature, has documented systematic base rate neglect in forecasters making predictions about specific scenarios. Forecasters who pay attention to base rates --- the prior probability of geopolitical events of a given type, the historical frequency of comparable outcomes --- substantially outperform forecasters who reason primarily from the specifics of the current situation. The systematic bias in inexperienced or unstructured forecasting is to over-weight the vivid details of the current scenario and under-weight the historical frequency of comparable outcomes. This is base rate neglect applied to time series and scenarios rather than to individuals, and the effect on forecasting accuracy is large enough to be detectable in standard accuracy metrics.

Risk assessment in general. Any classification or risk-assessment context where decisions depend on the prior probability of the relevant category is a context where base rate neglect can produce systematic errors. Fraud detection, security screening, regulatory compliance, insurance underwriting, credit assessment, and many others share the same structural feature: a low-base-rate category being detected by an imperfect signal, with decisions made by humans who systematically under-weight the prior. The pattern of consequences is also similar: over-action on false positives, under-action on cases that do not match the obvious prototype, and aggregate decision quality below what proper probabilistic reasoning would yield.

The common thread across these domains is that the systematic direction of the bias is consistent: humans under-weight the prior, over-weight the individuating evidence, and produce posterior probability estimates that are too far from the prior in the direction suggested by the evidence. The magnitude of the bias varies with the specifics, but the direction is reliable.

Mitigations That Actually Work

The base rate neglect literature has produced an unusually clear set of mitigations, because the bias is mechanistically well-understood and the structural interventions that address it have been validated across multiple applied domains.

Frequency framing. The single most-validated intervention is to present probability information in frequency rather than probability format. Instead of “the disease has a prevalence of one in a thousand and the test has a five percent false-positive rate,” present “out of one thousand people, one actually has the disease; of the remaining nine hundred ninety-nine, about fifty test positive falsely; so among all fifty-one positive testers, only one actually has the disease.” The frequency framing makes the base rate visible at the point of decision and produces substantially better calibration without requiring any additional training. This effect has been documented across medical, legal, financial, and risk-assessment contexts.
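A frequency statement of this kind can be generated mechanically from the three probabilities. This is a rough sketch, not a validated decision-aid format; the function name, the wording of the output, and the default population of one thousand are all choices made here for illustration. Counts are rounded to whole people, so the population should be large enough that the rare group is not rounded to zero.

```python
def frequency_statement(prevalence, sensitivity, false_positive_rate,
                        population=1000):
    """Restate a diagnostic problem as whole-number counts of people."""
    sick = round(population * prevalence)
    healthy = population - sick
    true_pos = round(sick * sensitivity)
    false_pos = round(healthy * false_positive_rate)
    positives = true_pos + false_pos
    lines = [
        f"People tested: {population}",
        f"Have the disease: {sick} (positive tests among them: {true_pos})",
        f"Do not have the disease: {healthy} "
        f"(positive tests among them: {false_pos})",
        f"All positive tests: {positives}, of which true cases: {true_pos}",
    ]
    return "\n".join(lines)


print(frequency_statement(0.001, 1.0, 0.05))
```

For the problem that opens this article, the last line reads "All positive tests: 51, of which true cases: 1", which puts the base rate in front of the reader without any probability notation.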

Explicit prior elicitation before evidence presentation. A second well-validated intervention is to require the decision-maker to estimate the prior probability of the relevant category before any individuating evidence is presented. This forces engagement with the base rate at a point in the decision process where it cannot be displaced by the vividness of the specific case. Structured clinical-decision-support tools and structured intelligence-analysis protocols routinely use this pattern. The prior estimate then anchors subsequent updates, and the magnitude of the update can be checked against what Bayes’ rule would imply given the diagnostic value of the evidence.

Decomposition of compound probability judgments. When the decision requires a compound probability (the probability of disease given test result, the probability of guilt given evidence, the probability of success given candidate profile), structurally decompose the judgment into its components: the prior, the likelihood of the evidence given each category, and the resulting likelihood ratio. The decomposition forces explicit engagement with each component and prevents the gestalt substitution of representativeness for probability. This is the basis for much of the structured-analytic-techniques work in intelligence analysis and for the structured-judgment work in clinical psychiatry.
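One way to make the decomposition concrete is to record each component as a separate field and derive the posterior from them. The class and field names below are hypothetical, and the hiring numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class DecomposedJudgment:
    prior: float                # P(category) before seeing the case
    p_evidence_if_true: float   # P(evidence | category)
    p_evidence_if_false: float  # P(evidence | not category)

    @property
    def likelihood_ratio(self):
        """How much more likely the evidence is if the category is true."""
        return self.p_evidence_if_true / self.p_evidence_if_false

    @property
    def posterior(self):
        """Combine the components using the odds form of Bayes' rule."""
        prior_odds = self.prior / (1.0 - self.prior)
        posterior_odds = prior_odds * self.likelihood_ratio
        return posterior_odds / (1.0 + posterior_odds)


# Invented hiring example: 20% of hires in the role succeed; a strong
# interview is seen in 60% of eventual successes and 30% of the rest.
hire = DecomposedJudgment(prior=0.20, p_evidence_if_true=0.60,
                          p_evidence_if_false=0.30)
print(f"LR = {hire.likelihood_ratio:.1f}, posterior = {hire.posterior:.2f}")
```

Here a strong interview doubles the odds, which moves a twenty percent prior to about thirty-three percent. Writing the three inputs down separately makes it harder to let the impression of the interview stand in for the final probability.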

Causal framing of base rates. Following Bar-Hillel’s distinction, when base rates need to be communicated to decision-makers, frame them in causally interpretable terms rather than as abstract statistical facts. “Of every thousand patients in this demographic, one develops this condition” lands more effectively than “the population prevalence is 0.1 percent.” The causal framing recruits the cognitive machinery that uses base rates; the statistical framing does not.

Decision aids that compute the posterior automatically. In high-stakes contexts where the cost of base rate neglect is large enough to justify the investment, the most effective intervention is often to remove the probability computation from the human entirely. Bayesian decision-support tools take the prior, the evidence likelihoods, and the observed evidence as inputs, and output the posterior probability. The human decision-maker is freed to evaluate the inputs (which they can do well) and to make the action decision conditional on the posterior (which they can also do well), while the probability arithmetic --- which they reliably do badly --- is performed by the tool.
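The core of such a tool is small. The following is a minimal sketch for a single piece of evidence and a set of mutually exclusive categories; the function name, the category labels, and all numbers are hypothetical, and a real tool would also need validated inputs and handling for multiple, possibly dependent, pieces of evidence.

```python
def posterior_over_categories(priors, likelihoods):
    """Return P(category | evidence) for mutually exclusive categories.

    priors:      dict mapping category -> P(category), summing to one
    likelihoods: dict mapping category -> P(evidence | category)
    """
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("priors must sum to one")
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("evidence is impossible under every category")
    return {c: j / total for c, j in joint.items()}


# Invented example: three candidate diagnoses and one clinical finding.
priors = {"common": 0.90, "uncommon": 0.09, "rare": 0.01}
likelihoods = {"common": 0.05, "uncommon": 0.30, "rare": 0.90}
result = posterior_over_categories(priors, likelihoods)
for name, p in result.items():
    print(f"{name}: {p:.3f}")
```

In this made-up case the finding is eighteen times more likely under the rare diagnosis than under the common one, yet the common diagnosis remains the most probable (about 0.56 against 0.11). The human supplies and scrutinizes the inputs; the tool does the multiplication and normalization.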

Training in Bayesian reasoning, with realistic caveats. Explicit training in Bayesian reasoning, including practice with worked examples, does improve subsequent calibration on similar problems. But the improvement is modest and tends to decay over time. Training is not a substitute for structural interventions. It is a useful complement, particularly for professionals whose role involves frequent probabilistic decisions, but the durable interventions are the structural ones above.

The general pattern, parallel to the conjunction-fallacy literature, is that the effective interventions are structural rather than educational. Telling people about base rate neglect, by itself, does not reliably eliminate it --- the cognitive substitution operates pre-consciously, and explicit knowledge of the bias does not prevent the substitution from occurring in the moment. Designing decision environments that make the base rate visible, that elicit the prior before the evidence, and that decompose compound probability judgments into their components, does reliably improve calibration.

Strategist Implications for Classification and Risk-Assessment Decisions

For strategists, executives, and decision-makers operating in any context where classification or risk-assessment is a recurring activity, the base rate neglect literature has several practical implications.

In any classification context, ask “what is the prior probability before I see this evidence?” This is the single most practically actionable takeaway. Most professionals, including sophisticated ones, will not have thought to ask this question. The act of asking it forces engagement with the base rate at a point in the decision process where it can still influence the outcome. The discipline of asking the question routinely --- before evaluating the candidate, before reading the patient’s labs, before reviewing the investment pitch deck --- is a structural intervention against base rate neglect in your own decision-making.

Be suspicious of any positive signal in a low-base-rate environment. The diagnostic-test problem that opens this article is a parable for any context in which an imperfect signal is being used to detect a rare condition. Fraud-detection signals, security alerts, intrusion-detection alerts, regulatory red flags, and any other low-base-rate detection systems will produce vastly more false positives than true positives, even when the signal is reasonably accurate. The appropriate skeptical posture is not to dismiss the signal but to recognize that its informativeness about the underlying condition is much weaker than its accuracy might suggest, and to seek confirming evidence before treating the signal as a finding.
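How quickly false alarms swamp true hits as the base rate falls can be tabulated in a few lines. The detector figures below (a 99 percent hit rate and a 1 percent false-alarm rate) are hypothetical, as is the function name.

```python
def false_alarms_per_true_hit(base_rate, hit_rate, false_alarm_rate):
    """Expected number of false alarms for each true detection."""
    true_hits = base_rate * hit_rate
    false_alarms = (1.0 - base_rate) * false_alarm_rate
    return false_alarms / true_hits


# Same hypothetical detector, progressively rarer target condition.
for base_rate in (0.1, 0.01, 0.001, 0.0001):
    ratio = false_alarms_per_true_hit(base_rate, hit_rate=0.99,
                                      false_alarm_rate=0.01)
    print(f"base rate {base_rate:<7} -> {ratio:8.2f} false alarms per true hit")
```

With these inputs the detector is trustworthy at a ten percent base rate (about one false alarm per eleven hits), breaks even at one percent, and at one in ten thousand produces roughly a hundred false alarms for every real case, even though nothing about the detector itself has changed.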

Decompose compound judgments before evaluation. When evaluating any decision that depends on multiple probabilistic components (will the deal close, given the indications? will the candidate succeed, given the interview? will the patient recover, given the symptoms?), decompose the overall judgment into its components and evaluate each separately. The compound judgment will be subject to representativeness substitution and base rate neglect; the component judgments are more tractable for explicit probability reasoning. This is the same intervention recommended in the conjunction-fallacy literature, and it works for the same reasons.

In any system that asks humans to produce probability judgments, frame the elicitation in frequency terms. This applies to internal sales forecasting, expert clinical prediction, intelligence analysis, insurance underwriting, market research, and any other context where elicited probability judgments are aggregated or compared across cases. The frequency framing produces substantially better calibration and reduces both base rate neglect and conjunction-fallacy distortion in elicited judgments. This is essentially zero-cost to implement (it is purely a wording change in the elicitation prompt) and the gains in aggregate calibration are large enough to be measurable.

Build Bayesian decision aids for high-stakes recurring decisions. Where the cost of base rate neglect across many decisions is large enough to justify the investment, building a decision aid that performs the Bayesian arithmetic explicitly will outperform human probabilistic judgment for the same set of inputs. The decision aid does not replace the human; it complements the human by removing the specific computation that the human reliably does badly. This is the architecture that underpins most of the modern clinical-decision-support literature, and it generalizes to other high-stakes classification contexts.

Audit your decision processes for base rate visibility. A useful diagnostic question for any recurring decision process is whether the relevant base rate is visible to the decision-maker at the point of decision. If the answer is no, base rate neglect is likely to be operating, and the simplest intervention is often to make the base rate visible (for example, by reporting it on the same screen as the individuating information). This intervention does not require training, does not require new analytics infrastructure, and routinely improves calibration. It is the kind of intervention that has high return on a small investment of design attention.

Distinguish between cases where the base rate dominates and cases where the evidence dominates. Bayes’ rule tells you when the prior should pull the posterior toward itself (when the evidence is weakly diagnostic, or the prior is very strong) and when the evidence should pull the posterior toward itself (when the evidence is strongly diagnostic, or the prior is weak). A useful structural intervention is to develop, for each recurring decision type, a sense of where on this spectrum the decision sits. Most decisions that look like “evaluate this individual against a category” sit closer to the prior-dominated end than untrained intuition suggests, because individuating information is typically less diagnostic than its vividness implies and base rates are typically more constraining than their abstract framing suggests.
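One crude way to locate a decision on this spectrum is to compare the prior and the evidence on the log-odds scale, where Bayes' rule becomes simple addition. The labeling rule below (whichever term has the larger magnitude "dominates") is a heuristic adopted here for illustration, not an established standard, and the function names are hypothetical.

```python
import math


def log_odds(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def regime(prior, likelihood_ratio):
    """Label a decision by which term is larger on the log-odds scale."""
    prior_pull = abs(log_odds(prior))
    evidence_pull = abs(math.log(likelihood_ratio))
    # On the log-odds scale, the posterior is the prior plus the evidence.
    posterior_log_odds = log_odds(prior) + math.log(likelihood_ratio)
    posterior = 1.0 / (1.0 + math.exp(-posterior_log_odds))
    label = ("prior-dominated" if prior_pull > evidence_pull
             else "evidence-dominated")
    return label, posterior


# The opening problem: prior of 0.001, likelihood ratio of 1 / 0.05 = 20.
print(regime(0.001, 20.0))
# Same evidence, but a far less extreme prior of 0.3.
print(regime(0.3, 20.0))
```

The opening diagnostic problem comes out as prior-dominated with a posterior near 0.02, while the same twenty-to-one evidence against a prior of 0.3 is evidence-dominated with a posterior near 0.90. The evidence is identical in both cases; only the starting point differs.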

The general pattern: humans, including sophisticated decision-makers, systematically under-weight prior probabilities and over-weight individuating evidence when making classification or risk-assessment judgments. The bias is robust, is well-documented across applied domains, and is consequential for decision quality. The mitigations that work are structural --- frequency framing, explicit prior elicitation, decomposition, decision aids --- rather than educational. For strategists, the practical move is to design the recurring classification decisions in your organization so that the base rate is visible at the point of judgment and so that the elicitation of probability happens in the format that humans handle best.

Sources

Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80(4), 237–251. DOI: 10.1037/h0034747
Tversky, A., & Kahneman, D. (1982). Evidential impact of base rates. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under Uncertainty: Heuristics and Biases. Cambridge University Press.
Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychologica, 44(3), 211–233. DOI: 10.1016/0001-6918(80)90046-3
Casscells, W., Schoenberger, A., & Graboys, T. B. (1978). Interpretation by physicians of clinical laboratory results. New England Journal of Medicine, 299(18), 999–1001. DOI: 10.1056/NEJM197811022991808
Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under Uncertainty: Heuristics and Biases. Cambridge University Press.
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102(4), 684–704. DOI: 10.1037/0033-295X.102.4.684
Koehler, J. J. (1996). The base rate fallacy reconsidered: Descriptive, normative, and methodological challenges. Behavioral and Brain Sciences, 19(1), 1–17. DOI: 10.1017/S0140525X00041157
Tetlock, P. E. (2005). Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press.

Browse the full Replication Crisis Hub for other behavioral-science findings, including:

The Conjunction Fallacy / Linda Problem: the other major Tversky-Kahneman probability-judgment finding, with a similar structure of robust effect plus contested interpretation
The Availability Heuristic: the third major Tversky-Kahneman judgment heuristic, addressing how ease of recall distorts probability estimates
Gambler’s Fallacy: a related probability-judgment bias involving expectations about sequences of independent events
Hindsight Bias: the systematic distortion of judgments about past probability estimates
Confirmation Bias: a broader pattern of biased evidence evaluation that interacts with base rate neglect in many applied contexts

FAQ

Is base rate neglect a real cognitive bias, or just an artifact of confusing experimental setups?

It is real. Unlike many findings in the heuristics-and-biases literature, base rate neglect has held up across more than fifty years of testing since the 1973 paper, across populations from undergraduates to medical specialists, and across paradigms from verbal description tasks to numerical probability tasks. The effect survived the replication crisis intact, and the magnitude of the bias in clinical and applied settings has been documented in multiple independent lines of work. There is no serious dispute about the empirical existence of the effect.

Does base rate neglect always result in overestimating the posterior probability?

Not always, but typically. The systematic direction of the bias is to under-weight the prior and over-weight the individuating evidence. In low-base-rate contexts (rare diseases, infrequent events), this produces overestimation of the posterior probability of the rare category given positive evidence. In high-base-rate contexts, it can produce underestimation of the posterior probability of the common category given evidence that is ambiguous or atypical. The general pattern is that the posterior is too far from the prior in whatever direction the evidence suggests.
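A small Python sketch of both directions may help. It assumes, purely for illustration, that complete base rate neglect behaves like silently using a fifty-fifty prior; the function names and the high-base-rate numbers are hypothetical, while the low-base-rate numbers come from the opening problem.

```python
def bayes(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """P(hypothesis | evidence) from the prior and the two conditional probabilities."""
    hit = prior * p_evidence_if_true
    miss = (1.0 - prior) * p_evidence_if_false
    return hit / (hit + miss)


def with_neglect(p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Assumption: complete neglect is modeled as silently using a 50/50 prior."""
    return bayes(0.5, p_evidence_if_true, p_evidence_if_false)


# Low base rate (numbers from the opening problem): positive test for a rare disease.
low_correct = bayes(0.001, 1.0, 0.05)
low_neglect = with_neglect(1.0, 0.05)

# High base rate (hypothetical numbers): 90% of cases belong to the common category,
# and the evidence is atypical, half as likely under the common category as otherwise.
high_correct = bayes(0.9, 0.2, 0.4)
high_neglect = with_neglect(0.2, 0.4)

print(f"rare category:   correct {low_correct:.3f}, neglecting prior {low_neglect:.3f}")
print(f"common category: correct {high_correct:.3f}, neglecting prior {high_neglect:.3f}")
```

Under this simplified model the neglected estimate overshoots in the rare-category case and undershoots in the common-category case. In both, the error lies on the side the evidence points toward.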

Why does frequency framing reduce base rate neglect so dramatically?

Two explanations, and both are probably partly right. The first, from the Gigerenzer ecological-rationality program, is that humans evolved to process frequencies of observed events rather than abstract single-event probabilities. Frequency framing recruits cognitive machinery that probability framing does not. The second, simpler explanation is that frequency framing makes the base rate visible at the point of decision in a format the decision-maker can integrate without explicit Bayesian computation. The “one out of a thousand” formulation directly displays both the prior probability and the joint denominator that the posterior depends on. The probability formulation does not.
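As a rough illustration of the second explanation, the sketch below converts a probability-format problem into natural frequencies. The helper names `natural_frequencies` and `frequency_statement` and the default population of 1000 are assumptions made for this example, not part of any cited study's materials.

```python
def natural_frequencies(base_rate: float, sensitivity: float,
                        false_positive_rate: float,
                        population: int = 1000) -> tuple[int, int]:
    """Expected true and false positives in a population, rounded to whole people."""
    affected = base_rate * population
    true_positives = round(affected * sensitivity)
    false_positives = round((population - affected) * false_positive_rate)
    return true_positives, false_positives


def frequency_statement(base_rate: float, sensitivity: float,
                        false_positive_rate: float, population: int = 1000) -> str:
    """Render the same problem as counts, so the denominator is visible to the reader."""
    tp, fp = natural_frequencies(base_rate, sensitivity, false_positive_rate, population)
    return (f"Per {population} people: {tp} true positive(s), {fp} false positive(s); "
            f"about {tp} in {tp + fp} positive results is a true case.")


# Opening problem: 1 in 1000 prevalence, no false negatives, 5% false positives.
print(frequency_statement(base_rate=0.001, sensitivity=1.0, false_positive_rate=0.05))
```

The rounding to whole people is a presentational choice. For very rare conditions a larger `population` keeps the true-positive count from rounding to zero.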

Is base rate neglect reduced by statistical training?

Modestly, and not durably. Subjects with formal training in probability theory commit base rate neglect at lower rates than untrained subjects, but the rate is still well above zero. Practicing physicians, after a career of clinical probability decisions, still commit the Casscells error at high rates. The pattern is consistent with the broader heuristics-and-biases literature: explicit knowledge of a bias does not reliably prevent the bias from operating in the moment, because the cognitive substitution is pre-conscious. The interventions that work durably are structural rather than educational.

How does base rate neglect interact with other probability-judgment biases?

It interacts most directly with the representativeness heuristic, which is the underlying mechanism by which base rate neglect operates: subjects substitute a similarity judgment between the individual and the category prototype for the probability judgment that would require integrating the base rate. It also interacts with the conjunction fallacy (both biases are driven by representativeness substitution and both are reduced by frequency framing), with the availability heuristic (vivid individuating information increases the over-weighting of the evidence relative to the prior), and with confirmation bias (selective attention to evidence consistent with the favored hypothesis amplifies the under-weighting of the prior). In applied settings, these biases typically compound rather than offset.

What is the most important practical takeaway for a working strategist?

In any classification or risk-assessment context, ask “what is the prior probability before I see this evidence?” The act of asking this question forces engagement with the base rate at a point in the decision process where it can still influence the outcome. Most professionals will not have thought to ask the question; the discipline of asking it routinely, before any individuating evidence is considered, is a structural intervention against base rate neglect in your own decision-making. Combined with frequency framing of probabilities and explicit decomposition of compound judgments, it produces measurably better calibration without requiring formal Bayesian training.


Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified in behavioral economics. Led 100+ in-house experiments at NRG in 2025, with project evidence and limits documented in the case studies.
