Only about 10 to 25 percent of people solve the abstract Wason card task. Reframe the same logical structure as a social-contract violation and the success rate jumps to 70 to 90 percent. The empirical pattern is robust. The evolutionary interpretation is not. Here is what holds up and what it means for decision design.

Try this in your head before reading on.

Four cards are placed in front of you. Each card has a letter on one side and a number on the other. The visible faces show:

E     K     4     7

Someone proposes a rule: “If a card has a vowel on one side, then it has an even number on the other side.”

Which cards --- and only which cards --- do you need to turn over to test whether that rule is being violated?

Most adults, including most adults with university educations in technical fields, give the wrong answer to this puzzle. The most common answer is “E and 4,” which is logically incorrect. The smaller share who answer “E and 7” have grasped what the rule’s logical structure actually demands. In the studies where this task has been run on naive subjects --- without coaching, without warm-up examples, with the abstract letters-and-numbers framing --- the proportion that gives the logically correct answer typically sits in a band between roughly 10 and 25 percent of subjects, depending on instructions, sample, and exact materials.

Then a researcher named Leda Cosmides took the same logical problem in 1989 and reframed it. Instead of vowels and even numbers, she handed subjects cards that read: “If you are drinking alcohol, then you must be over 21.” The four cards showed people’s ages or what they were drinking. Subjects were told to imagine they were the bartender enforcing the rule. Which cards do you turn over to catch violators?

Performance on this version, identical in logical structure to the abstract one, jumped to somewhere between 70 and 90 percent of subjects giving the correct answer. That is roughly a three- to ninefold improvement, depending on which ends of the two ranges are compared, on a task that is supposedly purely about formal logic.

That is the Wason selection task. It is one of the most replicated findings in cognitive psychology. It is also one of the most interpretively contested. The abstract version is a real and robust failure of human reasoning; the content effect is a real and robust enhancement of human reasoning under specific conditions; the evolutionary story that gets layered on top of both --- the claim that humans have a specifically evolved “cheater detection module” --- is a more fragile inference whose alternatives fit the same data roughly as well. The Wason selection task is, in this sense, not a story about a finding that failed to replicate. It is a story about a finding that did replicate, repeatedly, but whose most popular interpretation does much more work than the data can support.

For executives and strategists, the practical lesson is the one that survives every interpretive dispute: the form in which you present a decision problem reliably and dramatically changes how well the people you are asking will reason about it. If you want your team to do logic well, frame the problem as something your team already knows how to do.

Wason’s 1968 Original --- The Card Task

The selection task originates with Peter Wason, a cognitive psychologist at University College London who spent most of his career running experiments designed to expose the gap between formal logic and how people actually reason. The foundational paper is:

Wason, P. C. (1968). “Reasoning about a rule.” Quarterly Journal of Experimental Psychology, 20(3), 273–281. DOI: 10.1080/14640746808400161

The setup was deliberately spare. Subjects saw four cards on a table, knew each card had a letter on one face and a number on the other, and were given a conditional rule: If P, then Q. Their job was to identify the minimum set of cards whose hidden faces had to be inspected in order to determine whether the rule was being broken. The cards visible in front of them were laid out as P, not-P, Q, and not-Q --- one example of each logical class.

The logically correct answer for any rule of the form If P, then Q is to check P and not-Q. Turning the P card matters because the rule says P implies Q, so you need to confirm Q is on the other side. Turning the not-Q card matters because the rule’s contrapositive is If not-Q, then not-P; if you find a P on the back of a not-Q card, the rule is broken. The not-P card is irrelevant --- the rule says nothing about what happens when P is absent. The Q card is irrelevant in the same way --- finding a not-P on the other side does not break the rule, because the rule only says P implies Q, not that Q implies P.
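The argument above can be checked mechanically. What follows is a minimal Python sketch, not anything taken from Wason's paper: the function names `violates` and `must_turn`, and the encoding of each card as a (P, Q) pair with `None` standing for the hidden face, are assumptions made purely for illustration. The idea is that a card needs turning only if some possible hidden face would give P true and Q false.

```python
from typing import Optional, Tuple

# Hypothetical encoding: each card is (p, q). A value is True/False when that
# side is visible and None when it is the hidden face.
# The rule under test is "if P then Q".
Card = Tuple[Optional[bool], Optional[bool]]

def violates(p: bool, q: bool) -> bool:
    """'If P then Q' is broken only by a card with P true and Q false."""
    return p and not q

def must_turn(card: Card) -> bool:
    """A card must be turned if some value of its hidden face could break the rule."""
    p, q = card
    p_options = [p] if p is not None else [True, False]
    q_options = [q] if q is not None else [True, False]
    return any(violates(pv, qv) for pv in p_options for qv in q_options)

# The four visible faces from the vowel / even-number version of the task.
cards = {
    "E": (True, None),   # P: vowel showing, number hidden
    "K": (False, None),  # not-P: consonant showing, number hidden
    "4": (None, True),   # Q: even number showing, letter hidden
    "7": (None, False),  # not-Q: odd number showing, letter hidden
}

to_turn = [label for label, card in cards.items() if must_turn(card)]
print(to_turn)  # ['E', '7']
```

The K and 4 cards drop out because no hidden face on either one can produce the single violating combination, which is the same point the prose makes about not-P and Q.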

This is high-school propositional logic. It is the kind of formal-reasoning content that most educated adults will, if you ask them to derive it on paper from a truth table, derive correctly in about thirty seconds. And yet when Wason ran it as a behavioral experiment, with the cards in front of people and instructions to find the cards that needed to be turned, performance collapsed. The modal response in his original studies was “P and Q” --- turning the cards that match the components of the rule, looking for confirmation rather than for falsification. The logically correct “P and not-Q” answer was rare, often under 10 percent of subjects in the cleanest abstract versions, and typically not exceeding 25 percent across variations.

This finding was striking because of who was failing it. Wason’s subjects were not selected for low education; they included university undergraduates, postgraduates, and in some cases logic instructors who had taught the underlying material themselves. The failure was not a knowledge gap. It was something else --- a tendency to interpret the task as a search for evidence consistent with the rule, rather than as a search for evidence that could falsify the rule. Wason connected this back to his 1960 work on the 2-4-6 problem, which had shown the same confirmation-seeking pattern in hypothesis generation. The selection task was the same bias, exposed in a different paradigm.

The 1968 paper established two things that have held up across the next half-century of follow-up work. First, the abstract version of the selection task is genuinely hard for naive subjects, and the difficulty is not a quirk of one experimenter or one undergraduate population. The pattern has replicated across countries, languages, and educational backgrounds; the percentage of correct responders on the abstract version is reliably low across a wide range of laboratory contexts. Second, the difficulty is not random error. Subjects do not produce a flat distribution of wrong answers. They produce a specific wrong answer --- the one that corresponds to confirmation-seeking --- which tells you the underlying cognitive mechanism is doing something systematic rather than something noisy.

What Wason did not yet have, in 1968, was a clean explanation of when subjects would do better. That came later, in a sequence of studies that turned the selection task from a one-paragraph demonstration of bias into a long-running interpretive argument about the architecture of human reasoning.

The Dramatic Content Effect --- Cosmides 1989 And The Social-Contract Reformulation

The content effect on the selection task was not discovered all at once. Through the 1970s and into the 1980s, a series of papers had shown that performance improved when the rule was given more concrete content, but the improvements were inconsistent and the underlying pattern was unclear. Some “realistic” framings improved performance; others did not; the effect seemed to depend on something the literature had not yet pinned down. The breakthrough paper that crystallized the pattern, and that became the canonical reference for the content effect, was:

Cosmides, L. (1989). “The logic of social exchange: Has natural selection shaped how humans reason? Studies with the Wason selection task.” Cognition, 31(3), 187–276. DOI: 10.1016/0010-0277(89)90023-1

Cosmides’s contribution was twofold. First, she gave a sharper account of which content effects worked, by predicting in advance that rules describing social contracts --- in particular, rules of the form “If you take benefit B, then you must pay cost C” --- would elicit much higher rates of correct responding, including in subjects unfamiliar with the specific content. Second, she ran a long series of selection-task experiments designed to test that prediction against alternatives, with the alcohol-and-drinking-age scenario as one of the most widely cited examples.

In the alcohol scenario, subjects saw four cards representing four bar patrons. Two cards showed what each patron was drinking (beer, soda); the other two showed each patron’s age (25, 16). The rule was the bar’s drinking-age rule: If you are drinking alcohol, then you must be over 21. Subjects were asked which cards they would need to turn over to catch anyone violating the rule.

The logically correct answer is the same as in the abstract version: turn the “beer” card (because someone drinking alcohol might be underage), and turn the “16” card (because someone underage might be drinking alcohol). Do not turn the “soda” card (the rule says nothing about soda) and do not turn the “25” card (the rule says nothing about what 25-year-olds drink).

Performance on this version was, in Cosmides’s experiments, dramatically better than on matched abstract versions. The proportion of subjects giving the logically correct answer was reported at roughly 70 to 90 percent for social-contract framings, compared with the 10 to 25 percent typical of abstract versions of the same logical structure. The effect was large enough to be visible in single experiments without elaborate statistics; the modal response shifted from “P and Q” (confirmation-seeking) to “P and not-Q” (the logically correct violation-detection response).

Cosmides went further. She ran variations designed to disentangle the social-contract account from competing explanations like familiarity, deontic content (rules about what one must or must not do), or general-purpose pragmatic reasoning. She included unfamiliar social-contract scenarios where subjects could not have been drawing on prior knowledge of the specific rule. She constructed “switched social contracts” in which the logical antecedent and consequent of the rule were reversed, predicting and finding that subjects would still focus their card-turning on what the social-contract structure required (catching cheaters) rather than what the literal logical structure of the rule required. The pattern she reported was that the cheater-detection-relevant cards were chosen whether or not they corresponded to logically correct P-and-not-Q responses, which she interpreted as evidence that a cheater-detection mechanism was driving the response rather than general logical inference.

The empirical claim --- that social-contract framings dramatically improve selection-task performance compared to matched abstract framings --- has held up across replications and extensions. The pattern is robust enough that it appears in textbook treatments of human reasoning as a well-established finding. The selection task in its social-contract form, in particular the alcohol-and-drinking-age version, has become one of the most-taught examples in cognitive psychology and behavioral economics undergraduate curricula.

What has held up less well is the further inference that Cosmides built on top of the empirical pattern: that the content effect reveals the operation of a specifically evolved, content-dedicated cognitive mechanism for detecting cheaters in social exchanges. That stronger claim is where the interpretive disputes begin.

Competing Explanations --- Cheater Module, Pragmatic Schemas, And Relevance

The empirical fact that needs explaining is the gap between roughly 10-to-25-percent performance on abstract versions of the selection task and roughly 70-to-90-percent performance on social-contract versions of the same logical structure. There are at least three established families of explanation in the literature, and they disagree fundamentally about what kind of cognitive machinery is doing the work.
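To put rough numbers on that gap, the sketch below divides the endpoints of the two ranges quoted above. The ranges are this article's approximate figures rather than data from any single study, and the variable names are illustrative only.

```python
# Approximate correct-response rates, in percent, as quoted in the text.
abstract = (10, 25)   # abstract letters-and-numbers framing
social = (70, 90)     # social-contract framing

# Most conservative comparison: low social rate against high abstract rate.
smallest = social[0] / abstract[1]
# Most generous comparison: high social rate against low abstract rate.
largest = social[1] / abstract[0]
# Midpoint of each range compared with the other.
midpoint = (sum(social) / 2) / (sum(abstract) / 2)

print(round(smallest, 1), round(largest, 1), round(midpoint, 1))  # 2.8 9.0 4.6
```

Any single "N-fold" figure for the content effect therefore depends heavily on which studies and which ends of the ranges are being compared; the safer summary is that the improvement is severalfold.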

The first family is Cosmides’s own evolutionary-psychology account: humans possess a specialized, evolved cognitive adaptation for detecting cheaters in social-exchange situations. On the cheater-detection hypothesis, this adaptation processes social-contract rules differently from abstract logical rules. It activates when the rule has the structure of a contract in which a benefit is taken in exchange for a cost paid, and it produces accurate card-turning behavior because catching cheaters in social exchanges was an adaptively important problem in human evolutionary history. The strength of the account is that it predicts a specific class of content that should improve performance (social contracts, not just any concrete or familiar content), and that prediction has held up. The weakness is that the inference from “this content class improves performance” to “there is a dedicated evolved module” is a much larger step than the data themselves require.

The second family, articulated earlier and developed independently of Cosmides, is the pragmatic reasoning schema account, most prominently:

Cheng, P. W., & Holyoak, K. J. (1985). “Pragmatic reasoning schemas.” Cognitive Psychology, 17(4), 391–416. DOI: 10.1016/0010-0285(85)90014-3

Cheng and Holyoak proposed that human reasoning is organized around domain-general pragmatic schemas: abstract structures that apply to broad classes of situations such as permissions, obligations, and causal relationships, and that are not tied to any one evolutionary problem. A permission schema, on this view, is triggered by rules of the form “If you want to do X, you must satisfy precondition Y,” and once triggered it produces accurate violation-detection behavior across many concrete instantiations of that schema, regardless of whether the specific content is social, biological, or purely conventional. The pragmatic-schema account predicts the same broad pattern of content effects as the cheater-detection account but explains it through a different mechanism: general-purpose deontic reasoning rather than a specifically evolved cheater-detection module.

The empirical work that followed showed that the two accounts make overlapping but not identical predictions. Permission and obligation rules that are not social contracts in Cosmides’s strict sense (e.g., rules about which children must wear safety gear to play on certain equipment) tend to produce the same improvement in selection-task performance as paradigmatic social-contract rules, which is consistent with the pragmatic-schema account but harder to fit cleanly under a narrow cheater-detection module. Defenders of the cheater-detection account have responded by broadening the definition of what counts as a social-exchange context, which makes the account harder to falsify with experimental variations.

The third family, which complicates the dispute further, is the relevance-theoretic account:

Sperber, D., Cara, F., & Girotto, V. (1995). “Relevance theory explains the selection task.” Cognition, 57(1), 31–95. DOI: 10.1016/0010-0277(95)00666-M

Sperber, Cara, and Girotto argued that the content effects in the selection task are not evidence of a specialized cheater-detection module, nor specifically of pragmatic schemas, but are explainable by the general pragmatic principle that listeners interpret communicative material in terms of what is most relevant --- which is to say, what carries the most cognitive payoff for the least processing cost --- given the context. The selection task in its abstract form, on the relevance account, is hard not because subjects lack logical capacity but because the task is communicatively impoverished: subjects struggle to construct a representation of the rule in which the not-Q card is contextually relevant. Reframings that increase the contextual relevance of the not-Q case --- including but not limited to social-contract framings --- produce the performance improvement. The relevance account thus predicts that any reframing which makes the violation case salient should help, whether or not the content is a social contract, and there is empirical support for that broader pattern.

The interpretive state of the field, after several decades of this argument, is roughly as follows. The empirical content effect is robust and well replicated. The specific claim that there is an evolved, domain-specific cheater-detection module is one of several theoretically respectable accounts. The alternatives (pragmatic schemas, relevance theory, and dual-process accounts in which the social-contract framing engages fast System 1 routines and so sidesteps the slow System 2 deduction that usually fails on the abstract version) all fit the data well enough that the selection-task literature alone cannot settle which mechanism is doing the work. The evolutionary story is not refuted by the data; neither is it uniquely supported by them.

This is a familiar pattern in cognitive science. The same robust empirical finding can be consistent with strikingly different theoretical accounts, and the question of which account is right often cannot be resolved within the paradigm that produced the finding. The Wason selection task is one of the clearest cases of this pattern in the field, and it is worth being clear about what is robust and what is interpretively underdetermined.

What’s Robust And What’s Contested

The empirical findings that hold up across replications:

  • The abstract selection task is genuinely hard. Naive subjects fail it at high rates --- typically 75 to 90 percent failure rates depending on instructions and population --- and the failure has a specific structure (confirmation-seeking) rather than being random error. This part of Wason’s 1968 paper has replicated robustly.
  • Content matters dramatically. Reframing the same logical structure in terms that subjects find ecologically meaningful improves performance by large multiples, sometimes 4x to 8x. This has been demonstrated across many specific reframings, not just social contracts.
  • Social-contract framings work particularly well. The Cosmides paradigm, in its alcohol-and-drinking-age and other social-contract variants, consistently produces high correct-response rates. This empirical pattern has replicated extensively.
  • Permission and obligation framings work in similar ways. Deontic content of various kinds --- not strictly limited to Cosmides’s definition of social exchange --- improves performance, which is consistent with both the cheater-detection and pragmatic-schema accounts.

The interpretive claims that are more contested:

  • The existence of a specifically evolved, content-dedicated “cheater detection module.” This is the strong evolutionary-psychology claim. It is consistent with much of the data but is not the only mechanism that can explain those data, and the alternatives (pragmatic schemas, relevance theory, dual-process accounts) fit the same evidence approximately as well.
  • The claim that abstract-logical reasoning is fundamentally separate from social-contractual reasoning. This is a modularity claim about the architecture of cognition. It is theoretically tractable but empirically underdetermined; the same patterns can be produced by a more unified reasoning system whose performance depends on whether the input is in a format the system can recognize.
  • The claim that the content effects reveal anything specific about human evolutionary history. This is the inferential step from a cognitive performance pattern to an evolutionary cause. It is a particularly hard inference to discipline empirically, because almost any cognitive pattern observed in modern humans can be made consistent with some plausible evolutionary story.

The takeaway, for anyone reading the literature carefully, is that the selection task is a story about a robust empirical phenomenon (content effects on reasoning performance) with several theoretically respectable interpretations and one popular interpretation (the cheater-detection module) that goes considerably beyond what the data uniquely support. This is not a replication failure. It is something more subtle: a case where the empirical bedrock is solid but the theoretical superstructure is more fragile than the textbook treatments often suggest.

It is also worth being clear about what the selection task is not evidence for. It is not evidence that people are bad at logic in some general sense; people who can solve the alcohol version of the task are demonstrably doing the right logical work, they are just doing it in a representational format that recruits different inferential routines. It is not evidence that abstract reasoning is impossible or unimportant; it is evidence that the conditions under which humans do abstract reasoning well are narrower than the academic curriculum sometimes suggests. And it is not evidence that “context beats logic” in the strong sense of context overriding logic --- the high-performing social-contract version of the task is logically correct, not logically distinct. The framing recruits the right inference; it does not replace it.

Strategist Implications --- Designing Decision Processes Around Content Sensitivity

The reason the selection task matters outside of cognitive-psychology seminars is that the content effect is not a quirk of the laboratory paradigm. It generalizes. Human reasoning performance, on a wide range of inferential tasks, depends heavily on whether the problem is presented in a representational format that the reasoners can recognize and operate on natively. The selection task is one of the cleanest demonstrations of this principle, but the principle itself is much broader.

For someone designing decision processes --- in an executive role, a product role, a strategy or policy role --- there are several practical implications that follow from this regardless of which theoretical account of the content effect turns out to be right.

Frame the question in terms decision-makers natively reason about. If you need a team to evaluate evidence about whether a product launch is on track, “are there indicators we are off course” is a question they can engage with more reliably than “is the launch on course.” The two questions are logically equivalent --- the same evidence is informative under either framing --- but the first activates a violation-detection orientation that maps onto well-trained inferential routines, while the second invites a confirmation-seeking orientation that maps onto the abstract-Wason-task failure mode. The framing change is small. The performance effect, if the selection-task generalization holds, can be large.

Concrete instances often beat abstract principles. When you want a team to apply a rule consistently, instantiate the rule in concrete cases the team can hold in working memory and reason about with familiar routines, rather than asking them to derive answers from the abstract rule directly. A pricing policy that says “if a customer is in segment X then they get treatment Y” will likely produce more consistent execution if you give the team several worked examples of customers, segments, and treatments than if you give them the abstract policy and ask them to derive applications. This is not because the team is incapable of derivation; it is because deriving answers from an abstract rule is the slow, error-prone mode of reasoning that the abstract selection task exposes, while matching a new case against concrete examples tends to be faster and more reliable.

Premortems and red teams exploit the same kind of reframing deliberately. A premortem asks the team to imagine that a decision has already failed and then to enumerate the reasons why. This converts a forward-looking risk-evaluation task, which is exposed to the abstract-reasoning failure modes including confirmation bias toward the preferred plan, into a backward-looking causal-explanation task, which the same team will tend to execute more accurately because explaining an outcome that has already happened is a familiar cognitive routine. The technique plausibly works for the same family of reasons that the social-contract framing makes the selection task tractable: the underlying problem is the same, but the representational format the team is operating on is much friendlier to how people actually reason.

Watch for places where you are asking your organization to reason in the abstract-Wason mode. Strategic-planning frameworks, OKR templates, scoring rubrics, and risk-assessment matrices all have a tendency to drift toward abstract representational formats that look rigorous but that produce, in practice, the same kind of performance collapse the abstract selection task produces in the laboratory. When you find your team consistently producing the modal-wrong-answer equivalent in some recurring decision process, the highest-leverage fix is usually not to demand more rigor in the abstract format. It is to reframe the question in a more concrete, deontic, or violation-detection-oriented form that engages the cognitive routines your team is actually good at.

Do not over-claim about the mechanism. When you teach this material internally, present the empirical pattern --- abstract framings produce poor reasoning, concrete deontic framings produce much better reasoning, the difference is large and reliable --- rather than the strong evolutionary story about modular cheater-detection adaptations. The empirical pattern is what you can actually rely on. The evolutionary story is intellectually interesting but, as a matter of how the literature has shaken out, more fragile than the popular accounts suggest. Building organizational practices on the robust empirical pattern is safer than building them on a theoretical interpretation that may be revised.

The deeper lesson is that human reasoning is not domain-general in the way the abstract-rationality tradition would predict. Performance on logical and inferential tasks depends on the format of the problem, not just on its underlying logical structure. Designing decision processes that take this content sensitivity seriously is one of the highest-leverage things an organization can do to improve the quality of the decisions it produces, and it does not require taking a position on which of the competing theoretical accounts of the content effect is ultimately right.

Sources

  • Wason, P. C. (1968). Reasoning about a rule. Quarterly Journal of Experimental Psychology, 20(3), 273–281. DOI: 10.1080/14640746808400161
  • Cosmides, L. (1989). The logic of social exchange: Has natural selection shaped how humans reason? Studies with the Wason selection task. Cognition, 31(3), 187–276. DOI: 10.1016/0010-0277(89)90023-1
  • Cheng, P. W., & Holyoak, K. J. (1985). Pragmatic reasoning schemas. Cognitive Psychology, 17(4), 391–416. DOI: 10.1016/0010-0285(85)90014-3
  • Sperber, D., Cara, F., & Girotto, V. (1995). Relevance theory explains the selection task. Cognition, 57(1), 31–95. DOI: 10.1016/0010-0277(95)00666-M
  • Wason, P. C., & Johnson-Laird, P. N. (1972). Psychology of Reasoning: Structure and Content. Harvard University Press.
  • Cosmides, L., & Tooby, J. (1992). Cognitive adaptations for social exchange. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The Adapted Mind: Evolutionary Psychology and the Generation of Culture (pp. 163–228). Oxford University Press.
  • Evans, J. St. B. T. (2002). Logic and human reasoning: An assessment of the deduction paradigm. Psychological Bulletin, 128(6), 978–996. DOI: 10.1037/0033-2909.128.6.978
  • Sperber, D., & Girotto, V. (2002). Use or misuse of the selection task? Rejoinder to Fiddick, Cosmides, and Tooby. Cognition, 85(3), 277–290. DOI: 10.1016/S0010-0277(02)00125-7

FAQ

What is the correct answer to the original Wason card task?

For the rule “If a card has a vowel on one side, then it has an even number on the other side,” the correct cards to turn are the vowel card (E) and the odd-number card (7). The vowel card must be checked because the rule says vowels imply even numbers, so finding an odd number on its back would refute the rule. The odd-number card must be checked because if there is a vowel on its back, the rule is broken. The K card and the 4 card are both irrelevant: the rule says nothing about consonants and nothing about what is on the back of an even-number card. The reason most people answer “E and 4” is that they are searching for evidence that confirms the rule, which is exactly the same confirmation-bias pattern Wason had identified in his 1960 2-4-6 hypothesis-testing studies.

Is the content effect a true exception to confirmation bias, or just a different surface presentation?

It is a case where the same underlying logical structure is handled differently depending on its presentation. The high-performing social-contract version is not bypassing logic; subjects in that version reach the logically correct answer, but they appear to do so by recognizing the situation as a violation-detection problem and applying a familiar routine for catching rule-breakers (whether that routine is a cheater-detection module, a permission schema, or a relevance effect is the contested part). The abstract version forces subjects to derive the answer from the formal logical structure, which is exactly the kind of operation human reasoning is unreliable at. The content effect, in other words, shows that humans can do the right logical work when the problem arrives in a format they can recognize; it does not show that the right content makes confirmation bias disappear.

Has the cheater-detection interpretation been refuted?

Not refuted in the strong sense of having been ruled out by experimental evidence. It remains a theoretically respectable account of the content-effect data and has defenders in the literature. What has happened is that the competing accounts --- pragmatic reasoning schemas, relevance theory, dual-process accounts --- fit the same data approximately as well, so the inference from “social-contract framings improve performance” to “humans have an evolved cheater-detection module” is not as uniquely supported as the popular treatments of Cosmides 1989 sometimes suggest. The empirical content effect is robust; the specific evolutionary-modularity interpretation of that effect is more contested.

How is this different from the Linda Problem or the gambler’s fallacy?

The Wason selection task is a failure of deductive reasoning under abstract framings; the Linda Problem (Tversky and Kahneman’s conjunction fallacy) is a failure of probabilistic reasoning under representativeness cues; the gambler’s fallacy is a failure of probabilistic reasoning about independent events, in which people expect a streak to be “corrected” by the next outcome. All three are robust empirical findings, all three have content-sensitive features (performance changes when the problem is reframed), and all three illustrate that human reasoning is not a domain-general logical engine. They differ in the specific logical operation that fails and in the specific representational reframings that improve performance. The Wason content effect, in particular, is the cleanest single demonstration that deontic and social-contract framings are reliably easier than abstract conditional framings.

Does training people in propositional logic help with the abstract selection task?

Some, but less than you would expect. Formal training in propositional logic does improve performance on the abstract Wason task above the baseline naive-subject rate, but the effect of training is typically smaller than the effect of reframing the same task in a social-contract or permission format. This is the strongest evidence that the failure on the abstract version is not a knowledge gap; it is a representational-access problem. The cognitive system that produces the correct violation-detection response in the alcohol-and-drinking-age version is one most adults possess natively. The system that produces the correct response in the abstract vowels-and-numbers version requires deliberate effortful deduction, and people are bad at deliberate effortful deduction in laboratory conditions, even after training.

Does this mean abstract reasoning training is useless?

No. Abstract reasoning training, including formal logic and mathematical reasoning training, has clear cognitive benefits in the domains it is trained for. What the selection-task literature shows is that the transfer from abstract training to real-world reasoning is weaker than the abstract-rationality tradition predicts. The practical implication is not “abandon teaching abstract reasoning.” It is “do not assume that a team trained in abstract reasoning will reason well when handed abstract-format problems.” Reframe problems into formats that engage natively well-functioning inferential routines wherever possible, and reserve abstract-format reasoning for the narrower set of cases where reframing is genuinely not possible.

Where can I read more about the broader debate?

The Wason and Johnson-Laird 1972 book Psychology of Reasoning is the canonical synthesis of the early work, written before the content-effect literature had crystallized. Cosmides 1989 is the founding paper of the evolutionary-psychology interpretation, and Cosmides and Tooby’s 1992 chapter in The Adapted Mind is the most ambitious theoretical statement of that account. Cheng and Holyoak 1985 is the canonical pragmatic-schemas alternative. Sperber, Cara, and Girotto 1995 is the canonical relevance-theoretic alternative, and the Sperber and Girotto 2002 rejoinder is the clearest summary of why the relevance account considers the cheater-detection inference underdetermined by the data. Evans 2002, in Psychological Bulletin, is a careful retrospective on what the deduction-paradigm literature, including the selection task, has and has not established about human reasoning. For a popular-audience treatment, Steven Pinker’s How the Mind Works (1997) presents the cheater-detection account at chapter length; readers should pair it with one of the critical sources above to get the interpretive disputes in view.

What’s the single most useful thing to take away from this for organizational decision-making?

Reframe abstract evaluation questions as concrete violation-detection questions wherever you can. “Is our launch on track?” is abstract; “If the launch were off track, what would we expect to see, and do we see it?” is concrete and violation-oriented. The two questions call for the same evidence. The first tends to produce confirmation-seeking, the abstract-Wason failure mode, and an unreliable answer. The second engages the violation-detection routines that the selection-task literature shows humans execute much more accurately, and should produce, on average, a better answer. The reframing costs almost nothing. The expected improvement in reasoning quality, if the selection-task result generalizes to organizational settings, is substantial. This is one of the few cases in the behavioral-science literature where a small, cheap intervention reliably produces a large effect in the laboratory, which is exactly the kind of finding worth taking seriously for organizational practice.
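One way to make the reframing routine is to write the "what would we expect to see if it were off track" list down before looking at the data, then check each item. The Python sketch below is purely illustrative: `FailureSignal`, `violations`, the metric names, and every threshold are hypothetical placeholders, not recommendations from the selection-task literature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FailureSignal:
    description: str                               # what an off-track launch would look like
    is_present: Callable[[Dict[str, float]], bool]  # check against observed metrics


def violations(signals: List[FailureSignal], observed: Dict[str, float]) -> List[str]:
    """Descriptions of every pre-declared failure signal that the data actually shows."""
    return [s.description for s in signals if s.is_present(observed)]


# Hypothetical signals, declared before anyone looks at the numbers
signals = [
    FailureSignal("open blocker bugs", lambda m: m["open_blockers"] > 0),
    FailureSignal("milestone slip over 5 days", lambda m: m["days_behind"] > 5),
    FailureSignal("test pass rate under 95%", lambda m: m["test_pass_rate"] < 0.95),
]

# Hypothetical observations
observed = {"open_blockers": 2, "days_behind": 3, "test_pass_rate": 0.97}

found = violations(signals, observed)
print(found)  # ['open blocker bugs']
```

The design choice that matters is the order of operations: the failure signals are fixed first, so the review asks "is any violation present?" rather than "can we find evidence that things are fine?"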

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.