The Ultimatum Game Across Cultures: The Behavioral-Economics Finding That Was Real But Not Universal (Anti-Example)

Atticus Li



The ultimatum game has replicated in every society where researchers have run it. The specific quantitative pattern from Western undergraduates has not. Henrich and colleagues (2001) ran the game in 15 small-scale societies and found enormous variation. The lesson for strategists working across cultures is specific: “behavioral economics says” is not the same as “humans everywhere do.”

By Atticus Li, May 25, 2026

Most behavioral-science findings in this hub got dismantled because they were measurement artifacts, fraud, or failed replications. This article is about a different category: a finding that is robust, replicates everywhere, and whose paradigm has held up across four decades --- but whose specific quantitative pattern from Western undergraduate samples was misread as a universal feature of human nature. The correction came from a single 2001 paper that ran the same experiment in 15 small-scale societies and found enormous variation. That paper became the seed of a much larger critique of behavioral science methodology that is still being absorbed two decades later.

The finding in question is the ultimatum game. The original experiment was published in 1982 by three German economists. Within a decade it had become the canonical demonstration that humans systematically violate the rational-actor predictions of standard economic theory. Within two decades it had been replicated in dozens of WEIRD (Western, Educated, Industrialized, Rich, Democratic) samples, with remarkable consistency: proposers typically offered around 40-50% of the stakes, and responders typically rejected offers below around 30%. Textbooks, popular-science books, and intro economics lectures presented this pattern as evidence that humans have a universal preference for fairness that overrides narrow self-interest.

Then Joseph Henrich and a team of economists and anthropologists ran the same game in 15 small-scale societies on five continents. Mean offers ranged from around 26% in one society to around 58% in another. Rejection behavior varied just as widely: some societies rejected almost no offers, while others frequently rejected even hyper-fair offers (above 50%). The specific pattern that had been read as a “universal preference for fairness” turned out to be culturally constructed, characteristic of people raised in modern market economies with particular norms about anonymous exchange. The underlying paradigm (the experimental setup, the documentation of deviation from narrow self-interest) replicated everywhere. The specific numbers did not.

This is the most useful kind of anti-example. The replication crisis is not just about findings that are false. It is also about findings that are real but whose generalizability was wildly overstated. The lesson is precise, and it matters for any strategist working across cultural contexts: “behavioral economics says” is not “universal human behavior.” Most of the canonical findings in behavioral economics were measured on American college students. They tell you something about American college students, and something useful about the experimental paradigm, and very little about humans in general until cross-cultural validation has been done.

What Güth 1982 Originally Demonstrated

The founding paper is Güth, W., Schmittberger, R., & Schwarze, B. (1982). “An experimental analysis of ultimatum bargaining.” Journal of Economic Behavior & Organization, 3(4), 367-388. DOI: 10.1016/0167-2681(82)90011-7.

Werner Güth, Rolf Schmittberger, and Bernd Schwarze were economists at the University of Cologne working on what was then a niche question: do real human subjects, placed in a sharply defined bargaining situation, behave the way game-theoretic models say they should? The experimental setup they designed has become so canonical that it is now taught in nearly every undergraduate behavioral economics course.

Two players are paired anonymously. A sum of money --- in the original study, between 4 and 10 Deutsche marks --- is allocated to be divided between them. One player is randomly assigned to be the proposer. The proposer offers a split of the money to the other player, the responder. The responder has exactly one decision: accept or reject. If the responder accepts, the money is divided as proposed and both players walk away with their share. If the responder rejects, both players get nothing. The game is one-shot: no repeated interaction, no possibility of building reputation, no possibility of retaliation outside the game itself.
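The rules above are simple enough to state as a payoff function. The following is a minimal Python sketch; the function name `ultimatum_payoffs` and the whole-unit money representation are illustrative assumptions, not part of the 1982 paper.

```python
from typing import Tuple


def ultimatum_payoffs(pie: int, offer: int, accepted: bool) -> Tuple[int, int]:
    """Return (proposer_payoff, responder_payoff) for a one-shot ultimatum game.

    `pie` is the total stake and `offer` is the amount proposed to the
    responder, both in whole money units (an illustrative simplification).
    """
    if not 0 <= offer <= pie:
        raise ValueError("offer must be between 0 and the pie")
    if accepted:
        return pie - offer, offer
    # Rejection destroys the whole pie: both players get nothing.
    return 0, 0


print(ultimatum_payoffs(10, 4, accepted=True))   # (6, 4)
print(ultimatum_payoffs(10, 1, accepted=False))  # (0, 0)
```

The second call shows the cost of rejection that the rest of the article turns on: by refusing a 1-unit offer, the responder gives up that unit in order to deny the proposer 9.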

The standard game-theoretic prediction is brutally simple. Working backwards: any positive offer leaves the responder strictly better off than rejecting (something is better than nothing). A rational, self-interested responder should therefore accept any positive offer, no matter how small. A rational, self-interested proposer, anticipating this, should offer the smallest possible positive amount, keeping nearly all of the money for themselves. The unique subgame-perfect equilibrium of the standard model is approximately (100% to proposer, smallest positive amount to responder).
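The backward-induction argument can be checked mechanically for a discretized pie. This sketch assumes whole money units and a responder who rejects when indifferent (that is, at an offer of zero); both are simplifying assumptions made here for illustration, and the function names are hypothetical.

```python
def responder_accepts(offer: int) -> bool:
    # A purely self-interested responder compares the offer with the
    # rejection payoff of 0 and accepts anything strictly better.
    return offer > 0


def self_interested_offer(pie: int) -> int:
    """Backward induction: choose the offer that maximizes the proposer's
    payoff, given the responder rule above."""
    best_offer, best_payoff = 0, 0  # an offer of 0 is rejected, so it pays 0
    for offer in range(pie + 1):
        payoff = (pie - offer) if responder_accepts(offer) else 0
        if payoff > best_payoff:
            best_offer, best_payoff = offer, payoff
    return best_offer


print(self_interested_offer(100))  # 1, i.e. 1% of a 100-unit pie
```

With a finer money grid the predicted offer shrinks toward zero, which is why the equilibrium is described as "approximately" everything to the proposer.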

Güth and colleagues found that real subjects did not behave this way. In their German student sample, mean offers were close to 37% of the pie among first-time players and close to 32% when subjects played again with experience. Modal offers were 50-50 splits. Low offers, substantially below 30%, were frequently rejected by responders, leaving both players with nothing. The responders were demonstrably willing to incur a cost (forgoing positive money) in order to punish what they perceived as unfair offers. The proposers, in turn, appeared to anticipate this and offer more than the rational-actor model predicted they should.
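One way to see why a proposer who expects rejections might offer more, even if purely self-interested, is to maximize expected payoff against an acceptance curve. The curve below is invented for illustration (acceptance probability rising linearly to certainty at a 40% offer); it is not estimated from Güth's data or from any other study, and all names are hypothetical.

```python
from typing import Callable


def best_offer(pie: int, accept_prob: Callable[[float], float]) -> int:
    """Offer (in whole units) that maximizes the proposer's expected payoff."""
    def expected_payoff(offer: int) -> float:
        return (pie - offer) * accept_prob(offer / pie)
    return max(range(pie + 1), key=expected_payoff)


def rising_acceptance(share: float) -> float:
    # Hypothetical curve: acceptance probability climbs linearly from 0 at a
    # 0% offer to 1.0 at a 40% offer, and stays at 1.0 above that.
    return min(1.0, share / 0.40)


def accept_any_positive(share: float) -> float:
    # The textbook self-interested responder: accepts anything above zero.
    return 1.0 if share > 0 else 0.0


print(best_offer(100, rising_acceptance))    # 40
print(best_offer(100, accept_any_positive))  # 1
```

Under the invented rising curve the payoff-maximizing offer is 40% of the pie, while against a responder who accepts anything positive it collapses to the minimum unit. The point is only qualitative: what the proposer expects the responder to do moves the optimal offer.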

The 1982 paper presented this as a behavioral anomaly worth investigating, not as a refutation of economics. But over the following two decades, dozens of replications in WEIRD samples --- US college students, German college students, British college students, Israeli college students --- produced essentially the same pattern. The finding crystallized into a stylized fact: humans, when placed in ultimatum-game situations, deviate systematically from narrow self-interest in the direction of fairness. Proposers offer more than they have to. Responders reject offers they consider unfair, even at personal cost.

The WEIRD Pattern That Became “Universal”

The two-decade accumulation of WEIRD replications converged on a tight quantitative summary: in standard ultimatum games run with university student subjects in industrialized Western societies, mean offers cluster around 40-50% of the stakes, modal offers are 50-50 splits, and rejection rates for offers below 20-30% are substantial (typically 40-60% of such offers get rejected). This pattern was so consistent across WEIRD samples that it began to function in the literature as a baseline against which other findings were calibrated.

The empirical robustness of the pattern within WEIRD samples encouraged a particular interpretive frame. Standard economic theory predicts narrow self-interest. Real humans deviate from that prediction in a specific way (offering more, rejecting unfair offers). Therefore, the inference went, humans have a universal preference for fairness that overrides narrow self-interest, and standard economic theory needs to be augmented or replaced with models that incorporate other-regarding preferences. This interpretive move is laid out in detail in Colin Camerer’s 2003 textbook, Behavioral Game Theory: Experiments in Strategic Interaction (Princeton University Press), which collected the WEIRD ultimatum-game evidence alongside parallel evidence from dictator games, public-goods games, and trust games into a coherent case that humans have stable, measurable preferences for fairness and reciprocity that operate cross-situationally.

The inferential leap was from “this pattern is robust across WEIRD samples” to “this pattern reflects a universal human preference.” That leap was rarely defended explicitly. It was made by background assumption: of course the pattern would generalize, because it was a feature of human nature being measured through different cultural lenses. The cultural variation, if any, would presumably be at the margin --- shifts in mean offers of a few percentage points, perhaps, not fundamental disagreements about what counts as a fair offer.

That background assumption was wrong. The way it was shown to be wrong is what makes this anti-example useful.

Henrich 2001 AER --- 15 Small-Scale Societies, Enormous Variation

The paper that broke open the cross-cultural question is Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., & McElreath, R. (2001). “In search of Homo economicus: Behavioral experiments in 15 small-scale societies.” American Economic Review, 91(2), 73-78. DOI: 10.1257/aer.91.2.73.

The author list matters. Henrich is an anthropologist (then at Emory, now at Harvard); Boyd is a biological anthropologist at UCLA; Bowles is an economist at the Santa Fe Institute; Camerer is the behavioral economist who wrote the 2003 textbook; Fehr is the experimental economist at Zurich whose own ultimatum-game work in WEIRD samples had been foundational; Gintis is a Santa Fe Institute economist with deep training in evolutionary game theory; McElreath was a graduate student at the time and is now a leading evolutionary anthropologist at Max Planck Leipzig. This was not a fringe-anthropology team trying to score points against economics. It was a coordinated cross-disciplinary effort, including several of the most prominent behavioral economists in the world, designed to bring the ultimatum game out of the university lab and into the kinds of human societies that had never been measured.

The team selected 15 small-scale societies for study, spanning hunter-gatherer, slash-and-burn horticultural, nomadic herder, sedentary smallholder, and other subsistence types, across five continents. The societies included the Machiguenga (Amazonian horticulturalists in Peru), the Aché (forager-farmers in Paraguay), the Hadza (foragers in Tanzania), the Au and the Gnau (horticulturalists in Papua New Guinea), the Lamalera (whale-hunting maritime foragers in Indonesia), the Achuar (forager-horticulturalists in Ecuador), and others. In each society, anthropologists with established field relationships ran the ultimatum game (and in several societies the public-goods game as well) using stakes calibrated to the local economy --- typically equivalent to one or two days’ wages in local terms.

The headline finding was not simply that the canonical self-interest prediction failed. It failed in every society, but in dramatically different directions. The paper reports that the canonical model was not supported in any society studied, and that behavior varied far more across groups than earlier cross-cultural research had found. Mean offers across the 15 societies ranged from approximately 26% in the lowest society to approximately 58% in the highest. Rejection rates ranged from essentially 0% in some societies to over 40% in others. The types of rejections varied as well: some societies rejected low offers but not high ones (the WEIRD-style pattern); some rejected almost nothing; and some rejected both low offers and hyper-fair offers (offers above 50%).

The variation was not noise. The team documented systematic relationships between the ultimatum-game patterns in each society and that society’s broader characteristics: degree of market integration, degree of payoff to cooperation in everyday economic life, presence or absence of institutionalized gift-exchange relationships. The cross-cultural pattern made sense once you understood the local economic and social context. It just did not look anything like the WEIRD pattern that had been treated as the universal baseline.

The Specific Cross-Cultural Findings

The richness of the Henrich 2001 dataset is best illustrated by walking through several of the most striking individual society results.

The Machiguenga of Peru. The Machiguenga are slash-and-burn horticulturalists in the Peruvian Amazon, living in small family-based settlements with minimal cooperation beyond the family unit. Henrich had done his doctoral fieldwork with them and was the lead author on the ultimatum-game data. The mean offer in the Machiguenga sample was approximately 26% of the stakes --- dramatically lower than any WEIRD sample. Even more strikingly, the Machiguenga rejection rate was effectively zero: responders accepted nearly every offer, including very low ones. When Henrich asked Machiguenga participants why they had accepted low offers, the typical response was puzzlement: it seemed absurd to them to reject free money just because the proposer had taken a larger share. The narrow self-interest prediction (low offers, no rejections) was much closer to Machiguenga behavior than to WEIRD behavior --- not because the Machiguenga are uniquely rational, but because their cultural context does not have norms that would make low offers feel like an insult requiring punishment.

The Lamalera whale hunters of Indonesia. The Lamalera live in a small fishing village on the island of Lembata and engage in cooperative whale hunting that requires coordination among large boat crews. Successful hunts produce meat that is shared according to elaborate rules across the entire community. The mean offer in the Lamalera sample was approximately 58% --- substantially higher than any WEIRD sample. A majority of Lamalera proposers offered exactly 50% or more, and a meaningful minority offered well above 50%. The pattern fits the broader Lamalera economic life: in a society where successful subsistence requires constant coordination and equitable sharing across many participants, the ultimatum game is processed through the same norms that govern whale-meat division.

The Au and the Gnau of Papua New Guinea. These two horticulturalist groups in Papua New Guinea produced what is arguably the most theoretically interesting result in the entire 15-society dataset. Au and Gnau proposers frequently offered more than 50% of the stakes --- offers that in WEIRD samples would be accepted at essentially 100% rates. But Au and Gnau responders rejected these hyper-fair offers at substantial rates, sometimes 30% or higher. This is the opposite of the WEIRD pattern. The explanation, documented in Henrich’s later writing on these results, is rooted in Au and Gnau gift-exchange norms. In these societies, accepting a large unsolicited gift creates a serious obligation: the recipient is now socially indebted to the giver, and that debt must be repaid in kind, sometimes at considerable future cost. Receiving a hyper-fair offer in the ultimatum game was processed by Au and Gnau responders as receiving a large gift from an anonymous stranger, and rejecting it was the culturally rational way to avoid incurring an obligation they could not control or repay. The hyper-fair rejection result is the single clearest demonstration in the dataset that the WEIRD interpretation of ultimatum-game data --- “rejection reflects punishment of unfairness” --- is not a universal feature of how humans process the experimental situation.

The rest of the 15-society dataset filled in the picture. Some societies clustered near the WEIRD pattern. Some were dramatically different. The variation correlated systematically with market integration and the importance of cooperation in everyday economic life: societies with more market exposure and more cooperation in subsistence activities tended to have higher offers and more WEIRD-like rejection patterns. The findings were extended in Henrich, J., et al. (2006). “Costly punishment across human societies.” Science, 312(5781), 1767-1770. DOI: 10.1126/science.1127333. That paper examined costly punishment, including third-party punishment, across 15 diverse populations. It found that every population showed some willingness to punish increasingly unequal offers, but that the magnitude of the punishment varied substantially from one population to the next.

The WEIRD Paper

The 2001 AER paper was the empirical foundation. The conceptual extension that made the broader argument explicit came nine years later: Henrich, J., Heine, S. J., & Norenzayan, A. (2010). “The weirdest people in the world?” Behavioral and Brain Sciences, 33(2-3), 61-83. DOI: 10.1017/S0140525X0999152X.

Henrich (Harvard), Heine (UBC), and Norenzayan (UBC) made an argument that went well beyond ultimatum games. They systematically reviewed the comparative literature across multiple domains of behavioral science --- visual perception, fairness and cooperation, spatial cognition, categorization and inferential reasoning, moral reasoning, self-concept, motivations for control and choice --- and documented, for each domain, where cross-cultural data existed showing variation between WEIRD and non-WEIRD samples. The compiled evidence supported a strong claim: across many domains where behavioral science had drawn confident conclusions about “human psychology” from WEIRD samples, the WEIRD pattern was an outlier in the broader cross-cultural distribution, not a representative case.

Two specific WEIRD-paper conclusions are worth restating. First, the authors reported that roughly 96% of subjects in the major experimental psychology journals at that time came from countries holding approximately 12% of the global population, and that most of those subjects were university undergraduates. The behavioral science database was, and to a substantial degree still is, a database of WEIRD undergraduates being treated as a proxy for humanity. Second, the comparative evidence repeatedly showed that on dimensions where WEIRD samples could be compared to other samples, WEIRD subjects were systematically near one extreme of the distribution rather than near the median. Americans were the most extreme WEIRD sample on many of these dimensions: “outliers among outliers,” in the paper’s phrasing.

The WEIRD paper is not an argument that behavioral science is wrong. It is an argument that the inferential move from “we found this pattern in WEIRD undergraduates” to “this is a feature of human psychology” is unjustified by the data, and that the cross-cultural validation that would justify the inference has been done in only a small minority of cases. Henrich’s broader research program, summarized in his 2020 book The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous, offers a historical account of why WEIRD psychology is unusual: a thousand-year accumulation of specific cultural changes in Europe (the Catholic Church’s marriage and family programs, the rise of impersonal market exchange, particular religious traditions) that shifted Western psychology in measurable, documentable directions.

For the purposes of this article, the relevant point is narrower. The ultimatum-game finding was robust within WEIRD samples. The cross-cultural extension showed that the specific WEIRD pattern was not universal. The broader WEIRD critique suggests that this is likely true of many other findings in behavioral science that were established on WEIRD samples and never tested cross-culturally. The default expectation, post-Henrich, should be that a behavioral pattern documented in WEIRD samples is a pattern in WEIRD samples until cross-cultural validation has been demonstrated, not a pattern in human nature awaiting confirmation.

What’s Honest To Say About “Fairness Preferences” Now

The fairness-preference literature is the most direct case study of what the WEIRD critique implies for how to talk about behavioral economics findings.

Pre-2001, the honest summary of the ultimatum-game literature was: “Humans systematically deviate from narrow self-interest in the direction of fairness. Proposers offer about 40-50% of stakes, and responders reject offers below about 30%.” This summary treated the WEIRD pattern as a feature of humans.

Post-2001, the honest summary is more complicated and more useful. Humans across all cultures studied deviate from the narrow self-interest prediction in some direction. That is the robust feature. The specific direction and magnitude of the deviation vary dramatically across cultures. Mean offers range from around 26% to around 58% across small-scale societies; rejection patterns range from essentially no rejections to substantial rejection of even hyper-fair offers; the entire shape of how the game is processed depends on the cultural and economic context the participants come from. The WEIRD pattern (about 40-50% offers, about 30% rejection threshold) is one specific configuration, not a universal baseline, and it correlates with market integration and the importance of cooperation in everyday economic life.

This more nuanced framing is broadly consistent with the meta-analytic evidence as well. Oosterbeek, H., Sloof, R., & van de Kuilen, G. (2004). “Cultural differences in ultimatum game experiments: Evidence from a meta-analysis.” Experimental Economics, 7(2), 171-188 synthesized results from 37 ultimatum-game papers (most from WEIRD samples but including some non-WEIRD studies). It found differences across geographic regions in responder behavior, though not in proposer behavior, and reported that those differences mostly could not be attributed to standard country-level cultural-trait measures. Even within the mostly industrialized-country literature, then, rejection behavior was not uniform. The Henrich 2001 result was a far more dramatic version of variation that was already detectable, in smaller form, in the existing literature.

The honest synthesis, then, is something like: humans have a capacity for fairness-based reasoning and punishment that can be triggered by ultimatum-game-like situations, but the threshold for what counts as unfair, the willingness to incur costs to punish unfairness, and even the direction in which “unfairness” can run (toward stinginess in WEIRD samples; toward burdensome generosity in Au and Gnau) are culturally calibrated. The experimental paradigm is robust. The specific quantitative findings are paradigm outputs in a particular cultural context, not constants of human nature.
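The idea that thresholds, and even the direction of “unfairness,” are culturally calibrated can be pictured as a responder rule with two adjustable bounds. This is a toy sketch: the class name, the parameter values, and the hard cutoffs are all assumptions made here for illustration, not estimates from any of the studies cited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponderNorm:
    min_acceptable_share: float  # offers below this are rejected as too stingy
    max_acceptable_share: float  # offers above this are rejected as too generous

    def accepts(self, offer_share: float) -> bool:
        return self.min_acceptable_share <= offer_share <= self.max_acceptable_share


# Three hypothetical calibrations of the same rule.
low_offer_averse = ResponderNorm(0.30, 1.00)    # punishes stingy offers only
accepts_nearly_all = ResponderNorm(0.01, 1.00)  # rejects almost nothing
gift_averse = ResponderNorm(0.20, 0.50)         # also rejects hyper-fair offers

for share in (0.10, 0.40, 0.70):
    print(
        share,
        low_offer_averse.accepts(share),
        accepts_nearly_all.accepts(share),
        gift_averse.accepts(share),
    )
```

The same 70% offer is accepted under the first two calibrations and rejected under the third, which is the qualitative shape of the hyper-fair rejection result: one paradigm, different parameters.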

What This Anti-Example Tells Us About Behavioral Economics

The reason this anti-example is useful is that it carves out a precise category that the rest of this hub has not addressed directly. Most of the dismantled findings in this hub failed at the paradigm level: the experiments did not replicate, or were fraudulent, or measured something other than what they claimed. The ultimatum-game case is different. The paradigm replicates. The general finding (humans deviate from narrow self-interest) replicates. The specific quantitative finding (the WEIRD pattern) does not generalize beyond WEIRD samples. This is a distinct category of evidential problem, a failure of generalization and not of replication, and recognizing it as such sharpens the calibration this hub is trying to deliver.

The structural lesson is that behavioral economics, as a research program, has produced two distinct kinds of claims that the field has sometimes conflated. The first kind is paradigmatic claims: claims about what happens when you run a specific experimental setup with specific kinds of subjects. The ultimatum game produces a specific WEIRD pattern when run with WEIRD subjects. Prospect theory’s risk attitudes can be elicited with specific gamble-choice paradigms. The default effect can be triggered with specific pre-assignment manipulations. These paradigmatic claims are the durable contribution of the field. They tell you what the experimental tool measures, and that measurement is reproducible.

The second kind of claim is universal psychological claims: claims about what humans in general are like. “Humans care about fairness.” “Humans are loss-averse.” “Humans are present-biased.” These claims, in their universal form, are not directly supported by the experimental evidence. They are inferences from paradigmatic claims to features of human psychology in general, and that inference requires cross-cultural validation that has rarely been done. When it has been done --- as in the ultimatum-game case --- the universal claim has usually had to be revised into something more specific and culturally bounded.

A useful frame for any “behavioral economics says” claim, then, is to ask which kind of claim you are being shown. If the claim is paradigmatic --- “when you run this kind of experimental setup with this kind of subject, you reliably observe this pattern” --- it is probably reliable, modulo the broader replication issues this hub catalogues elsewhere. If the claim is universal --- “humans in general have this preference” or “humans in general behave this way” --- you should ask explicitly: has this been tested cross-culturally? In how many distinct cultural contexts? With what variation in results? In most cases the honest answer will be: it has been tested in WEIRD samples, and the cross-cultural validation is sparse or absent. That answer should make you more cautious about the universal interpretation, not because the underlying paradigm is wrong, but because the inferential leap from “WEIRD samples” to “humans in general” has not been earned.

This is calibration, not nihilism. The ultimatum game is a real thing. Human deviation from narrow self-interest in some direction is a real, cross-culturally robust thing. The fairness-preference literature has produced enormous insight into how people behave in specific institutional contexts. What the WEIRD critique does is tell you the limits of generalization, and those limits matter when you are trying to use behavioral-economics findings to make decisions in contexts that are not WEIRD undergraduate populations.

What This Means For Strategists In Multi-Cultural Business Contexts

For strategists running businesses that touch multiple cultural contexts --- international expansion, global pricing, cross-cultural sales, multi-region negotiation --- the operational implication of the Henrich critique is direct and consequential.

Pricing fairness norms vary across cultures, and the variation is large. The WEIRD-pattern intuition that customers will punish “unfair” prices, the kind of intuition that underlies a lot of behavioral pricing advice, maps to a specific cultural baseline. In other cultural contexts, the threshold for what counts as an unfair price, and the willingness to take costly action against the seller as a result, can be dramatically different. The Machiguenga response to low ultimatum-game offers (“why would I turn down free money?”) is not an exotic behavioral curiosity; it is a real pattern of how some cultural traditions process unequal exchanges. The Au and Gnau response to overly generous offers (rejection because of incurred obligation) is the analog of cultural contexts in which large discounts or “too generous” deals are read as suspicious or socially burdensome instead of welcome. Neither of these patterns shows up in standard WEIRD-derived pricing research, and they should not be ignored when entering culturally distinct markets.

Negotiation tactics drawn from WEIRD-sample behavioral research need calibration before deployment. The standard textbook advice on negotiation --- anchor high, make concessions in a particular pattern, calibrate to the perceived fairness of the offer --- is largely built on research from US and European business contexts. The Lamalera-style expectation that initial offers should be closer to 50-50 maps to negotiation cultures in which lowballing the first offer is read as bad faith rather than as expected strategic play. The hyper-fair-offer rejection pattern maps to negotiation contexts in which a too-generous offer is treated as a setup for future obligation rather than as a windfall. The textbook playbook works well in cultures whose ultimatum-game pattern looks WEIRD. It works less well, and sometimes counterproductively, in cultures whose pattern does not.

“Behavioral economics says” is not a universal authority in cross-cultural product, marketing, or organizational design. When a consulting deck, an academic article, or a popular-press book claims that “humans” respond a particular way to a particular framing, the correct first question for a strategist working internationally is: which humans, measured how, in what cultural context, and has the result been validated outside that context? If the underlying research is WEIRD undergraduate samples and the cross-cultural validation has not been done, the appropriate strategic move is to treat the finding as a hypothesis about your specific market that needs local validation, not as a settled fact you can build on. Field-test the framing in the target market before scaling. Run small experiments with local samples before betting major budget on a behavioral-economics-derived design choice. The cost of localized validation is almost always much smaller than the cost of deploying a WEIRD-calibrated framing into a market where it does not work the way the literature predicted.
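A small local experiment of this kind often comes down to comparing a response rate in the target market against a benchmark. The sketch below uses a standard two-proportion z-test, which is a choice made here and not something the cited research prescribes. The counts are made up for illustration, and with small samples an exact test would be a safer choice.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    hits_a: int, n_a: int, hits_b: int, n_b: int
) -> Tuple[float, float, float]:
    """Return (difference in rates, z statistic, two-sided p-value)."""
    rate_a, rate_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("rates are both 0% or both 100%; the test is undefined")
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return rate_a - rate_b, z, p_value


# Hypothetical counts: customers who accepted a given price framing.
benchmark_accepts, benchmark_n = 120, 400  # market where the framing was developed
pilot_accepts, pilot_n = 45, 300           # small pilot in the new market

diff, z, p = two_proportion_z_test(
    benchmark_accepts, benchmark_n, pilot_accepts, pilot_n
)
print(f"difference={diff:.2f}, z={z:.2f}, p={p:.4g}")
```

A large, statistically clear gap between the pilot and the benchmark is a signal that the framing does not transfer as-is. A small or noisy gap is not proof that it does transfer, only that this pilot could not tell the two markets apart.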

What This Means For Globalization Of Behavioral-Science Frameworks

The broader implication for any organization that deploys behavioral-science frameworks across geographies --- consulting firms, multinational product organizations, global nonprofits, international policy implementers --- is that the default assumption should be that frameworks need cross-cultural validation before deployment, not that WEIRD-validated frameworks transfer automatically.

The post-Henrich behavioral-science landscape has begun to take this seriously in several specific domains. Cross-cultural replication efforts have grown substantially since 2010, with consortia like the Psychological Science Accelerator running coordinated multi-site replications of canonical psychology findings with explicit cross-cultural sampling. The Behavioral Insights Team and similar nudge units operating in international development contexts now routinely run local pilots before scaling behavioral interventions across cultures, with documented cases where interventions that worked in one cultural context failed or produced reversed effects in another. The fairness-and-cooperation literature itself has produced a large secondary corpus of cross-cultural work since the Henrich 2001 paper, refining the original findings and documenting additional patterns.

But the broader pattern across behavioral science is that cross-cultural validation is still the exception rather than the norm. Most “behavioral economics says” claims still rest on WEIRD samples. Most popular behavioral-economics books, including the ones that practicing strategists are most likely to encounter, present WEIRD findings as universal human psychology without flagging the underlying sample composition. The work of translating from “we found this in WEIRD undergraduates” to “this is something humans in general do” is mostly still ahead of us, and in many cases the translation will not survive the work.

The strategic implication is that the Henrich critique is not a one-time cautionary footnote. It is a structural feature of the entire literature that strategists need to internalize as a default skepticism: behavioral-science claims built on WEIRD samples are claims about WEIRD samples until proven otherwise, and the cost of treating them as more than that is the cost of deploying confidently into markets where the underlying assumption was unvalidated. For a thoughtful operator, that is reason for caution, not despair. The behavioral-economics paradigm is genuinely useful. The specific findings derived from WEIRD samples are genuinely informative about WEIRD samples. The mistake is the inferential leap from there to “human nature.” That leap is a research project, not a fact, and it is mostly an incomplete one.

Sources

Güth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum bargaining. Journal of Economic Behavior & Organization, 3(4), 367–388. DOI: 10.1016/0167-2681(82)90011-7.
Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., & McElreath, R. (2001). In search of Homo economicus: Behavioral experiments in 15 small-scale societies. American Economic Review, 91(2), 73–78. DOI: 10.1257/aer.91.2.73.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83. DOI: 10.1017/S0140525X0999152X.
Camerer, C. F. (2003). Behavioral Game Theory: Experiments in Strategic Interaction. Princeton University Press. ISBN: 978-0691090399.
Henrich, J., McElreath, R., Barr, A., Ensminger, J., Barrett, C., Bolyanatz, A., … & Ziker, J. (2006). Costly punishment across human societies. Science, 312(5781), 1767–1770. DOI: 10.1126/science.1127333.
Oosterbeek, H., Sloof, R., & van de Kuilen, G. (2004). Cultural differences in ultimatum game experiments: Evidence from a meta-analysis. Experimental Economics, 7(2), 171–188. DOI: 10.1023/B:EXEC.0000026978.14316.74.
Henrich, J. (2020). The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous. Farrar, Straus and Giroux. ISBN: 978-0374173227.

Related

The Replication Crisis Hub: full index of dismantled, contested, and surviving behavioral-science findings.
Prospect Theory: The Behavioral-Economics Framework That Actually Replicates. Another anti-example in this hub, showing what robust behavioral economics looks like at the framework level.
Endowment Effect: another classic behavioral-economics finding whose interpretation has been substantially revised post-replication-crisis.
Defaults and Status Quo Bias: The Nudge That Actually Holds Up (Anti-Example). The first anti-example in this hub, an empirical regularity that survived publication-bias correction and cross-cultural scrutiny.

FAQ

Should I use behavioral economics in international markets at all?

Yes, but with calibration. Behavioral-economics frameworks generate useful hypotheses about how people might respond in your target market. They do not generate confident predictions until the relevant findings have been validated in your specific cultural context, and most popular behavioral-economics claims have not been. Treat the framework as a source of testable hypotheses, run local pilots before scaling, and discount the WEIRD-derived literature in proportion to the cultural distance between the WEIRD samples it rests on and your market.
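
One way to make "run local pilots before scaling" concrete is to check whether a home-market benchmark is even statistically compatible with what a pilot observed. The sketch below is a minimal illustration in Python, not a prescribed method: it computes a Wilson score interval for the pilot rate and asks whether the benchmark falls inside it. The function names, the 0.30 benchmark, and the pilot counts are hypothetical placeholders, not figures from any study cited here.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% coverage.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return centre - half_width, centre + half_width


def benchmark_consistent_with_pilot(benchmark_rate: float, successes: int, trials: int) -> bool:
    """True if the benchmark rate lies inside the pilot's Wilson interval.

    "Consistent" means "not contradicted by this pilot", not "validated".
    """
    low, high = wilson_interval(successes, trials)
    return low <= benchmark_rate <= high


# Hypothetical example: a textbook benchmark of 0.30 versus a local pilot
# in which 12 of 200 participants showed the behavior.
low, high = wilson_interval(12, 200)
print(f"pilot interval: {low:.3f} to {high:.3f}")
print("benchmark consistent:", benchmark_consistent_with_pilot(0.30, 12, 200))
```

A benchmark that falls inside the interval is only not contradicted by the pilot. With a small pilot the interval is wide and almost any benchmark will pass, so sample size still needs to be planned separately.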

What about pricing fairness norms across cultures?

The WEIRD-pattern intuition that customers will reject “unfair” prices and reward “fair” ones generalizes poorly. The threshold for what counts as unfair, the willingness to take costly action in response, and even the direction in which fairness can fail (stinginess vs. overgenerosity) vary substantially across cultures. Pricing strategies that work in US or European markets often need substantial revision in markets with different exchange norms. Local research --- focus groups, small pilots, willingness-to-pay studies in target markets --- is the correct investment, not extrapolation from US-pricing-psychology textbooks.

What about Henrich’s other work?

The 2001 ultimatum-game paper is the load-bearing empirical foundation. The 2006 Science paper on costly punishment extended the cross-cultural analysis to third-party punishment. The 2010 BBS WEIRD paper made the broader methodological argument. The 2020 book The WEIRDest People in the World offers a long-form historical account of why WEIRD psychology is unusual, grounded in centuries of European cultural change. Henrich’s overall research program is one of the most coherent extensions of the original 2001 finding into a much broader argument about how to do behavioral science responsibly.

Are there any universal behavioral patterns that have survived cross-cultural scrutiny?

A small set. Some perceptual phenomena (basic color discrimination, some forms of depth perception) appear cross-culturally robust. Some basic emotional displays (a subset of the Ekman facial expressions) replicate across cultures. Some statistical regularities (life-history patterns, mating preferences in broad outline) show meaningful cross-cultural stability. The set is smaller than the popular behavioral-science literature implies, and within it effect magnitudes often differ across cultures even when the direction is shared. The honest summary is that genuinely universal psychological patterns exist but are rarer than WEIRD-sample-based intuition suggests.

Does this critique apply to all behavioral economics or just to fairness research?

It applies broadly to any behavioral-economics finding that was established primarily on WEIRD samples and not validated cross-culturally. The fairness-and-cooperation literature is the most thoroughly studied case, but parallel critiques apply to many domains: time preferences (Western patterns of present-bias are not universal); spatial cognition (some spatial-reasoning patterns vary cross-culturally); self-construal (Western individualism is not the human default); even some risk preferences. The general principle: WEIRD evidence is evidence about WEIRD samples until cross-cultural validation has been done.

How is this different from the standard “replication crisis” critique?

The standard replication-crisis critique is that many findings, when re-tested in the same kind of sample, fail to produce the original effect. That is a paradigm-level failure: the experiment does not reliably do what it was claimed to do. The WEIRD critique is different. The paradigm replicates within WEIRD samples; the failure is at the level of generalization from WEIRD samples to humanity. Both critiques are real and both narrow what we can confidently claim, but they identify different kinds of inferential mistakes. A finding can be replication-robust within WEIRD samples and still fail to generalize beyond them. The ultimatum game is precisely such a case.

What’s the simplest one-sentence takeaway for a strategist?

When someone tells you “behavioral economics shows that humans X,” your follow-up should be “which humans, measured in what context, and has that been validated in the cultural context I’m actually deploying in?” The answer determines whether the finding is a fact you can build on or a hypothesis you need to validate locally before committing real resources.


Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified in behavioral economics. Led 100+ in-house experiments at NRG in 2025, with project evidence and limits documented in the case studies.


