The Myers-Briggs (MBTI): A Personality Test Academic Psychology Considers Pseudoscience

The MBTI is used by an estimated 88% of Fortune 500 companies and generates $20+ million/year in revenue, yet academic personality psychology essentially does not use it. Pittenger (1993, 2005) showed ~50% of takers get a different type on retest at five weeks, the 16-type structure has no empirical bimodal support, and three of four dimensions are weaker measurements of Big Five traits. Here is the honest story for executives evaluating personality assessments.

By Atticus Li, May 22, 2026

The Myers-Briggs Type Indicator (MBTI) is the most widely used personality assessment in business — by various industry estimates, 88% of Fortune 500 companies have used it at some point, with the Myers-Briggs Company generating well over $20 million per year in licensing, certification, and assessment revenue. Academic personality psychology essentially does not use it. The two most-cited empirical critiques (Pittenger 1993, 2005) found that about half of people get a different type on retest five weeks later, that the 16-type structure has no bimodal empirical support, and that the dimensions the MBTI measures are weaker versions of three of the Big Five personality traits. Here is the honest story for executives evaluating MBTI-based hiring, team-building, and leadership-development programs.

Picture the offsite. Forty managers in a hotel ballroom, name tags color-coded by four-letter type, breaking into small groups based on shared dichotomies. The facilitator, certified by the Myers-Briggs Company through a four-day training that costs around $2,000, walks the room through what “ENFJs,” “ISTJs,” and “INTPs” tend to bring to a team. Someone learns they are an “INFJ” and feels seen, the way a horoscope reader feels seen by a description of Capricorns. A wall poster lists the sixteen types as if they were sixteen species. The cost to the company for this one-day session, including the assessment, materials, facilitator fee, and certification overhead, runs in the low five figures. The cost across the broader Fortune 500 every year, spanning team retreats, executive coaching engagements, leadership development programs, and HR onboarding workshops, runs into the hundreds of millions of dollars. The Myers-Briggs Company’s revenue figures and industry analyst reports place the global MBTI assessment market well into the tens of millions of dollars in direct licensing alone, with the surrounding consulting ecosystem an order of magnitude larger.

The empirical literature behind that ballroom session (the peer-reviewed research on the MBTI’s reliability, structural validity, and predictive utility) tells a substantially different story from the marketing collateral. The instrument was developed by a mother-daughter team with no formal training in psychology, working from Carl Jung’s untested theoretical writings, and refined through several decades of commercial revisions. The flagship reliability finding, from David Pittenger’s widely cited 1993 critique in Review of Educational Research, is that approximately 50% of people who retake the MBTI five weeks later get a different four-letter type. The flagship structural finding is that when the underlying dimensions are scored continuously rather than dichotomously, the distributions look normal (bell-shaped), not bimodal. There is no natural “type” cutoff in the data; the dichotomies are imposed by the scoring algorithm, not discovered in human nature. The flagship comparative finding, from McCrae and Costa’s 1989 reanalysis, is that three of the four MBTI dimensions overlap heavily with three of the Big Five (E/I with Extraversion, S/N with Openness, T/F with Agreeableness), with J/P overlapping somewhat less strongly with Conscientiousness, and that the MBTI measures them with worse psychometric properties than well-validated Big Five instruments do. The flagship predictive finding, from Pittenger’s 2005 follow-up, is that the evidence does not support the MBTI’s widespread use for career counseling, employee selection, or organizational decision-making.

This article walks through where the MBTI actually came from, the reliability problem, the 16-type structural problem, the validity problem, the comparison to the Big Five, why the test persists in corporate L&D despite the empirical case against it, and what to do instead. The goal is calibration. There are real, measurable personality traits — the Big Five framework is the academic standard, has decades of cross-cultural validation, and predicts work and life outcomes with documented incremental validity. There is no academic case for continuing to use the MBTI as a measurement instrument when better-validated alternatives are commercially available at similar cost. The MBTI’s persistence in corporate practice is a sociological phenomenon — the Barnum effect, the satisfaction of type-belonging, the self-reinforcing ubiquity of an instrument that everyone has already taken — not an empirical one.

Where The MBTI Actually Came From

The MBTI was developed by Katharine Cook Briggs (1875–1968) and her daughter Isabel Briggs Myers (1897–1980), neither of whom had formal training in psychology, psychometrics, or research methodology. Katharine Briggs was an educated homemaker with a longstanding amateur interest in personality typing; she had begun developing her own four-type system in the 1910s, before encountering Carl Jung’s 1921 Psychological Types. After reading Jung, she abandoned her own system and reoriented around his framework, eventually drawing her daughter Isabel into the project. Isabel Briggs Myers, a graduate of Swarthmore with a degree in political science, taught herself the rudiments of psychometric test construction during World War II, when she became interested in developing an instrument that would help women entering the wartime workforce find jobs suited to their personality “type.” The most thorough scholarly history of the instrument’s origins is Merve Emre’s 2018 book The Personality Brokers: The Strange History of Myers-Briggs and the Birth of Personality Testing (Doubleday), which draws on the Myers archives and the Center for Applications of Psychological Type’s records.

The theoretical foundation Briggs and Myers adopted from Jung’s Psychological Types is itself worth examining honestly. Jung’s 1921 book is a synthesis of clinical observation, philosophical reflection, and historical-cultural analysis, in which he proposes the famous distinction between extraversion and introversion and identifies four “functions” (thinking, feeling, sensation, intuition). Jung did not present this typology as an empirically validated taxonomy. He explicitly framed it as a heuristic — useful for organizing his clinical thinking — and warned in the same book against treating the types as fixed categorical labels for individual people. He never developed measurement instruments to operationalize the typology, never ran reliability studies, and (in correspondence and later writings) expressed reservations about how his typology was being used in popular form. The Jungian theoretical foundation for the MBTI is, in other words, untested speculative typology from a clinical-theoretical author who himself was wary of operationalizing it as a sorting tool.

Briggs and Myers added a fourth dichotomy (Judging/Perceiving) that does not appear in Jung’s original typology, partly to make the test mathematically symmetric (four binary dichotomies producing 2^4 = 16 types). They developed the first version of the indicator in the 1940s; it went through multiple revisions and was eventually licensed to the Educational Testing Service and later to Consulting Psychologists Press (now the Myers-Briggs Company / CPP, Inc.) for commercial distribution. By the 1980s and 1990s, the MBTI had become the most widely used personality assessment in industry. By the 2010s, the licensing organization had become a multi-million-dollar commercial enterprise with a global certification network, branded extensions (MBTI Step II, MBTI Step III), and an ecosystem of derivative products.

The professional psychology community’s relationship to this commercial success has been complicated. The American Psychological Association does not endorse the MBTI; psychometric review boards and academic personality psychology journals have published a long series of critiques (Pittenger 1993, Pittenger 2005, McCrae and Costa 1989, Stein and Swan 2019, among many others); and academic personality psychology research essentially does not use the MBTI as a measurement instrument. The instrument exists in two largely disconnected worlds: a commercial world where it is ubiquitous and a research world where it is not used.

The Reliability Problem

The single most damaging empirical finding for the MBTI is its poor test-retest reliability — specifically, the rate at which people get a different four-letter type when they retake the test a few weeks later. David J. Pittenger, “The utility of the Myers-Briggs type indicator,” published in Review of Educational Research, volume 63, issue 4, pages 467–488, in 1993 (DOI: 10.3102/00346543063004467), is the most-cited synthesis of the reliability data and reaches conclusions damaging enough that the paper has become a standard reference in the literature.

Pittenger reviewed multiple test-retest studies of the MBTI conducted from the 1970s through the early 1990s. The pattern across studies was consistent: when respondents retook the MBTI after a gap of several weeks, approximately 50% of them received at least one different letter, and therefore a different four-letter type from the one they received on the first administration. The exact percentage varied across studies and across the four dichotomies (some dimensions were more stable than others), but the overall picture was robust. Pittenger’s summary on page 472: “Some 50% of those who take the test for the second time will be assigned to a different type.” For a test that purports to identify a stable, meaningful personality “type” that should inform career choices, team assignments, and leadership development decisions, this level of instability is fatal. A diagnostic instrument that disagrees with itself half the time five weeks later is not measuring a stable categorical construct; it is largely registering measurement noise around continuous dimensions that the algorithm forces into binary categories.

The MBTI’s defenders sometimes counter that test-retest agreement on the underlying dimensional scores (the continuous scores before they are dichotomized into letters) is much higher than the type-agreement rate suggests. This is a true but largely beside-the-point observation. The MBTI is sold and used as a categorical typing instrument — the entire point of the four-letter type code is to assign a person to one of sixteen discrete categories. If the underlying dimensional scores are reasonably stable but the type categorization is unstable, that means the dichotomization step (turning the continuous score into a letter) is introducing massive instability. People whose true dimensional score on, say, the Thinking-Feeling dimension is near the midpoint will flip between “T” and “F” types essentially at random across test administrations, even if their underlying dimensional score barely moves. This is exactly the failure mode you would expect from a categorical sorting algorithm imposed on continuous data — and exactly the failure mode the MBTI exhibits.
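The dichotomization argument can be checked with a small simulation. The sketch below is illustrative only and rests on assumptions that are not taken from any MBTI dataset: each dimension's test and retest scores are standard bivariate normal with a hypothetical retest correlation r, the cut-point sits at the population midpoint, and the four dimensions are independent with the same r. Under those assumptions the chance that one letter stays the same has the standard closed form 1/2 + arcsin(r)/π, which the simulation should reproduce.

```python
import math
import random


def same_letter_probability(r: float) -> float:
    """Closed-form chance that two standard-normal scores with
    correlation r land on the same side of a cut-point at 0."""
    return 0.5 + math.asin(r) / math.pi


def simulate_type_change_rate(r: float, n_people: int = 20000,
                              n_dims: int = 4, seed: int = 0) -> float:
    """Monte Carlo share of simulated people whose four-letter type changes
    on retest. Assumptions (hypothetical): independent dimensions, equal
    retest correlation r, cut-point at the population midpoint."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(1.0 - r * r)
    changed = 0
    for _ in range(n_people):
        flipped = False
        for _ in range(n_dims):
            first = rng.gauss(0.0, 1.0)
            # Retest score correlates r with the first score by construction.
            second = r * first + noise_sd * rng.gauss(0.0, 1.0)
            if (first > 0) != (second > 0):
                flipped = True  # at least one letter differs on retest
        changed += flipped
    return changed / n_people


if __name__ == "__main__":
    for r in (0.80, 0.90):
        analytic = 1.0 - same_letter_probability(r) ** 4
        simulated = simulate_type_change_rate(r)
        print(f"r={r:.2f}  analytic={analytic:.3f}  simulated={simulated:.3f}")
```

Under these assumptions the closed-form type-change rate is about 0.60 at r = 0.80 and about 0.46 at r = 0.90. In this toy model, continuous scores that look quite stable still produce a different four-letter type for roughly half of respondents, which is the pattern the paragraph above describes.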

For comparison, the Big Five personality traits, when measured by validated instruments like the NEO-PI-R or the IPIP-NEO, show test-retest reliability over similar time intervals in the .80–.90 range for the trait scores and produce stable rank-orderings of people across years. The MBTI’s continuous-score reliability is in a roughly similar range, but the categorical-type reliability is dramatically worse because of the dichotomization. The Big Five framework avoids this problem by simply not dichotomizing — it reports trait scores on a continuous scale, where the measurement noise affects the score by a small amount rather than flipping the person between two categorical labels.

The 16-Types Structural Problem

The conceptual foundation of the MBTI is that there are sixteen real personality types, formed by the combination of four binary dichotomies (E/I, S/N, T/F, J/P). For this to be empirically defensible, the underlying score distributions on the four dichotomies would need to be bimodal — that is, the distribution of, say, Thinking-Feeling scores in the population would need to show two distinct peaks (one cluster of “true Thinkers,” another cluster of “true Feelers”) with relatively few people in the middle. Bimodal distributions are how you identify natural cut-points in continuous data; they are how you justify treating a continuous dimension as if it were a categorical distinction.

The data show normal distributions, not bimodal ones. When MBTI dimensions are scored continuously and the resulting distributions are plotted, the four dimensions look like bell curves — most people are near the middle, fewer people are at the extremes. This has been repeatedly documented in the psychometric literature on the MBTI; it is acknowledged even in some of the technical manuals published by the Myers-Briggs Company, though it is downplayed in the consumer-facing materials. Pittenger (1993) makes this point directly: “the lack of bimodality in the distributions calls into question the basic theoretical assumption underlying type theory” (p. 475). Stein and Swan, “Evaluating the validity of Myers-Briggs Type Indicator theory: A teaching tool and window into intuitive psychology,” published in Social and Personality Psychology Compass, volume 13, issue 2, e12434, in 2019 (DOI: 10.1111/spc3.12434), summarizes this in their critique: there is no empirical basis for treating the MBTI dimensions as categorical types because the distributions show no bimodal cut-points.
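To make the bimodal-versus-normal contrast concrete, here is a minimal sketch comparing how many people sit near the midpoint under each shape. Both distributions are hypothetical: the unimodal case is a standard normal, and the "two true types" case is an equal mixture of two narrow clusters whose centers and spread are picked purely for illustration.

```python
from statistics import NormalDist


def share_near_midpoint_unimodal(half_width: float = 0.5) -> float:
    """Share of a standard-normal population within half_width of the midpoint."""
    dist = NormalDist(0.0, 1.0)
    return dist.cdf(half_width) - dist.cdf(-half_width)


def share_near_midpoint_two_clusters(half_width: float = 0.5,
                                     center: float = 1.5,
                                     spread: float = 0.5) -> float:
    """Same share for a hypothetical 50/50 mixture of two clusters at -center
    and +center, i.e. what a population of two 'true types' might look like."""
    low = NormalDist(-center, spread)
    high = NormalDist(center, spread)
    share_low = low.cdf(half_width) - low.cdf(-half_width)
    share_high = high.cdf(half_width) - high.cdf(-half_width)
    return 0.5 * share_low + 0.5 * share_high


if __name__ == "__main__":
    print(f"unimodal:     {share_near_midpoint_unimodal():.3f}")
    print(f"two clusters: {share_near_midpoint_two_clusters():.3f}")
```

With these illustrative parameters, roughly 38% of the bell-curve population falls within half a standard deviation of the cut-point, against roughly 2% in the two-cluster population. A midpoint cutoff is only a natural boundary in the second case.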

What this means in practice is that the difference between an “INTJ” and an “INTP” — same person on three dimensions, different on the fourth (J vs. P) — is often just a few percentage points on a continuous score that happens to cross the algorithm’s midpoint cutoff. Two people whose underlying personality profiles are nearly identical can get different four-letter types because one of them scored 51% on J and the other scored 49%. The categorical type difference suggests a meaningful qualitative distinction; the underlying data show no such distinction. Conversely, two people who share the same four-letter type can be far apart on the underlying dimensions — one might be a strong I (95th percentile) and the other a weak I (52nd percentile), and the type code does not distinguish them.
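The same point can be shown with a toy scoring function. This is not the MBTI's actual scoring algorithm. It is a hypothetical sketch in which each dimension is a 0 to 100 preference score toward the first pole and the letter is assigned by a cutoff at 50. All scores and names below are invented for illustration.

```python
from itertools import product

# Each pair is (first pole, second pole) of one dichotomy.
DICHOTOMIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def type_code(scores: dict[str, float], cutoff: float = 50.0) -> str:
    """Map hypothetical 0-100 scores (keyed by first-pole letter) to a
    four-letter code by cutting each dimension at `cutoff`."""
    return "".join(
        first if scores[first] >= cutoff else second
        for first, second in DICHOTOMIES
    )


# Four binary dichotomies yield 2**4 = 16 possible codes.
ALL_TYPES = ["".join(letters) for letters in product(*DICHOTOMIES)]

# Nearly identical profiles that straddle the J/P cutoff.
person_a = {"E": 20, "S": 30, "T": 70, "J": 51}
person_b = {"E": 20, "S": 30, "T": 70, "J": 49}

# Very different degrees of introversion, same side of the E/I cutoff.
person_c = {"E": 5, "S": 30, "T": 70, "J": 80}
person_d = {"E": 48, "S": 30, "T": 70, "J": 80}

if __name__ == "__main__":
    print(len(ALL_TYPES))                            # 16
    print(type_code(person_a), type_code(person_b))  # INTJ INTP
    print(type_code(person_c), type_code(person_d))  # INTJ INTJ
```

The two-point gap between the first pair produces different labels, while the 43-point gap between the second pair disappears into one label.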

This is the structural reason why the MBTI’s test-retest reliability for types is so poor, why its predictive validity for outcomes is weak, and why academic personality psychology has rejected the typological framing in favor of continuous trait dimensions. The 16-type structure is an imposed mathematical scheme (four binary dichotomies → 16 cells), not an empirically discovered taxonomy of human personality. Real personality variation is dimensional; the MBTI categorization layer is an artifact of how the test is scored, not a feature of how people actually differ from one another.

The Validity Problem

The two most damaging Pittenger reviews — the 1993 Review of Educational Research paper and the 2005 follow-up — both focus heavily on the question of predictive validity: does knowing a person’s MBTI type let you predict anything important about them? David J. Pittenger, “Cautionary comments regarding the Myers-Briggs type indicator,” published in Consulting Psychology Journal: Practice and Research, volume 57, issue 3, pages 210–221, in 2005 (DOI: 10.1037/1065-9293.57.3.210), is structured explicitly as a warning to consulting psychologists who use the MBTI in client work. Pittenger’s bottom-line conclusion across both papers is that the empirical evidence “does not support” widespread use of the MBTI for the purposes it is commonly deployed for — career counseling, employee selection, team formation, and leadership development.

The specific validity findings Pittenger synthesizes are consistent with the structural problems already discussed. MBTI type does not reliably predict job performance — meta-analytic comparisons of MBTI types and job-performance outcomes show small and often statistically non-significant relationships, dwarfed by the predictive validity of cognitive ability tests and well-validated personality assessments. MBTI type does not reliably predict career choice — the claim that “INTJs” gravitate toward certain professions and “ESFPs” toward others is true at the level of weak statistical tendencies but has little practical predictive utility for individual career decisions. MBTI type does not reliably predict team performance — claims that teams with diverse type compositions outperform homogeneous teams are largely speculative, with much weaker empirical support than the corporate L&D framing suggests. MBTI type does not predict leadership effectiveness, marital compatibility, romantic success, or any of the other outcomes that popular MBTI literature gestures toward.

This does not mean that personality has no predictive validity for these outcomes. The Big Five personality traits — particularly conscientiousness for job performance, extraversion for leadership emergence and sales performance, emotional stability for adaptation to stress — have well-documented predictive validity at meta-analytic scales across the industrial/organizational psychology literature. The point is not “personality is irrelevant to work outcomes.” The point is “the MBTI is not the right instrument for measuring the personality differences that matter; better-validated instruments exist.” Pittenger’s 2005 paper is clear about this distinction: the critique is not anti-personality-measurement, it is anti-MBTI specifically.

A useful diagnostic question for evaluating any personality assessment’s predictive-validity claims is whether the assessment’s developer can cite specific, peer-reviewed, meta-analytic predictive-validity coefficients (in the form of corrected correlations or incremental validity beyond established measures) for the outcomes the assessment is being marketed for. The Big Five has such evidence in abundance — entire meta-analyses on conscientiousness and job performance, on extraversion and leadership emergence, etc. The MBTI does not have comparable evidence; the Myers-Briggs Company’s own technical manuals tend to cite reliability studies and convergence with other typological instruments rather than predictive-validity meta-analyses with job, leadership, or team outcomes. This asymmetry is not an accident. It reflects which assessments have generated bodies of independent academic validation research and which have not.

How MBTI Compares To Big Five

The cleanest analysis of what the MBTI is actually measuring, in the language academic personality psychology uses, is Robert R. McCrae and Paul T. Costa, “Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality,” published in Journal of Personality, volume 57, issue 1, pages 17–40, in 1989 (DOI: 10.1111/j.1467-6494.1989.tb00759.x). McCrae and Costa are the architects of the modern Big Five framework, particularly through the NEO-PI inventory. Their 1989 paper administered both the MBTI and the NEO-PI to a sample of adults and computed the correlations between MBTI dimensions and Big Five traits.

The headline result was that three of the four MBTI dimensions correlate substantially with three of the Big Five traits. Extraversion-Introversion (E/I) on the MBTI correlates strongly with Big Five Extraversion: the MBTI’s E pole corresponds to high Big Five Extraversion, so the sign of the reported coefficient depends only on which pole the MBTI scale is keyed toward. Sensing-Intuition (S/N) correlates strongly with Big Five Openness to Experience (high N corresponds to high Openness). Thinking-Feeling (T/F) correlates moderately with Big Five Agreeableness (high F corresponds to high Agreeableness). The fourth MBTI dimension, Judging-Perceiving (J/P), correlates with Big Five Conscientiousness (high J corresponds to high Conscientiousness), though somewhat less strongly than the other three pairings. The fifth Big Five trait, Neuroticism (the inverse of emotional stability), has no direct counterpart in the MBTI framework, which means the MBTI simply does not measure one of the most important dimensions of personality variation. This is a substantial omission; Neuroticism is one of the best-validated dimensions in personality psychology and has well-documented relevance to stress, adaptation, workplace mental health, and team dynamics.

The practical implication of the McCrae and Costa analysis is that the MBTI is essentially a weaker measurement of four of the Big Five traits, plus a problematic typological wrapper that introduces measurement noise through dichotomization. Whatever predictive validity the MBTI has for work and life outcomes is largely the same predictive validity that the Big Five traits have, but expressed through a noisier categorical type system rather than through clean continuous trait scores. If you want to measure the personality dimensions the MBTI is targeting, a Big Five instrument (the NEO-PI-R, the IPIP-NEO, the BFI-2) measures them more reliably, more validly, and on a more defensible scale. The MBTI’s claim to measure something distinct from the Big Five, a unique typological structure that other instruments miss, is empirically unsupported.
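The cost of expressing a continuous trait as a letter can be sketched numerically. The simulation below is a toy model, not a reanalysis of any study: it assumes a normally distributed trait that correlates with some outcome at a hypothetical rho = 0.30, then replaces the trait with a binary letter cut at the midpoint. For a normal variable split at its median, the standard statistical result is that the correlation shrinks by a factor of about sqrt(2/π), roughly 0.80.

```python
import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_validity(rho: float = 0.30, n: int = 50000,
                      seed: int = 1) -> tuple[float, float]:
    """Return (validity of continuous score, validity of dichotomized letter).
    rho is a hypothetical trait-outcome correlation chosen for illustration."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(1.0 - rho * rho)
    trait, outcome = [], []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        trait.append(t)
        outcome.append(rho * t + noise_sd * rng.gauss(0.0, 1.0))
    # Collapse the continuous trait into a binary "letter" at the midpoint.
    letter = [1.0 if t > 0 else 0.0 for t in trait]
    return pearson(trait, outcome), pearson(letter, outcome)


if __name__ == "__main__":
    r_continuous, r_letter = simulate_validity()
    print(f"continuous: {r_continuous:.3f}")
    print(f"letter:     {r_letter:.3f}")
    print(f"ratio:      {r_letter / r_continuous:.3f}  "
          f"(theory: {math.sqrt(2 / math.pi):.3f})")
```

In this sketch the letter retains only about four fifths of the continuous score's correlation with the outcome, before any other difference between instruments is considered.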

This comparison matters for procurement decisions. The Big Five instruments are commercially available at similar or lower per-respondent cost than the MBTI, are not encumbered by the same licensing-and-certification overhead, and have decades of independent academic validation that the MBTI lacks. An HR or L&D function that has been buying MBTI assessments for years has, by switching to a validated Big Five instrument, an opportunity to upgrade the measurement quality of its personality data without spending more — often while spending less. The corporate L&D market’s continued preference for the MBTI is not driven by superior measurement properties.

Why It Persists In Corporate L&D

If the empirical case against the MBTI is as clear as the academic literature suggests, why does the instrument continue to dominate corporate practice? The honest answer is that the MBTI’s persistence is a sociological phenomenon driven by several reinforcing dynamics, not an empirical one.

The first dynamic is the Barnum effect: the well-documented tendency for people to accept generic, broadly applicable personality descriptions as if they were specifically and uniquely descriptive of themselves. The MBTI type descriptions are written in the language of self-flattering generality (“INFJs are insightful, idealistic, and value deep connections”; “ESTPs are energetic, pragmatic, and thrive in action-oriented environments”). Most people, on reading their type description, recognize themselves in it, partly because the descriptions are flattering, partly because they are vague enough to fit many people, and partly because once a person believes their type, they begin selectively attending to evidence that confirms it. This is the same psychological dynamic that makes horoscopes feel personally accurate; it has nothing to do with whether the underlying typology is empirically valid. The Barnum effect is so strong with the MBTI that participants in corporate training sessions routinely report finding their type description “scarily accurate,” which the facilitator then uses as informal validation of the instrument’s diagnostic power.

The second dynamic is the satisfaction of type-belonging. Knowing one is “an INFJ” or “an ENTP” provides a feeling of identity and community. There are large online communities organized around MBTI types (subreddits, forums, dating apps that filter by type). The four-letter code becomes a shorthand for self-understanding, a way to explain one’s preferences to others, a basis for in-group identification with people of the same type. This is psychologically rewarding in a way that “you scored 72nd percentile on Extraversion” is not. The MBTI’s typological framing offers something that continuous trait scores cannot — a categorical identity. This is also why the MBTI’s empirical problems (poor reliability, lack of bimodal structure, weak predictive validity) tend to bounce off committed users. The instrument’s value to them is not predictive; it is identity-affirming.

The third dynamic is the self-reinforcing ubiquity of an instrument that everyone has already taken. Once an organization has used the MBTI across thousands of employees, the four-letter codes become embedded in the organizational vocabulary. People remember their types. Managers reference them in feedback conversations. Team retrospectives invoke type combinations. The cost of switching to a different framework — re-training facilitators, re-assessing employees, re-learning a new vocabulary, re-organizing existing materials — is high, while the perceived cost of continuing with the MBTI is low (the assessment “works fine,” the workshops are well-received, no one is complaining). The corporate inertia favors the incumbent instrument regardless of its empirical merits. The Myers-Briggs Company has been savvy in maintaining this incumbency through the certification ecosystem, branded extensions, and ongoing marketing to HR functions.

The fourth dynamic is the vendor and certification ecosystem itself. Tens of thousands of consultants, coaches, trainers, and HR professionals have invested in MBTI certification — a several-day, several-thousand-dollar credential — and have built careers around delivering MBTI-based programs. These practitioners have a substantial financial interest in the instrument continuing to be perceived as legitimate. They are not bad people, and many of them are sincerely effective at facilitating useful conversations using the MBTI as a scaffold. But their accumulated financial and professional investment in the MBTI ecosystem creates a powerful constituency for its continued use, somewhat independent of the academic case for or against the instrument.

The fifth dynamic is that the MBTI does, in practice, deliver some legitimate facilitation value as a structured conversation starter, even when the underlying measurement is empirically weak. Sessions that frame conversations around “how different types approach disagreement” or “how to communicate with someone whose preferences differ from yours” can produce useful self-reflection and team awareness — not because the MBTI is correctly diagnosing real personality types, but because the framework provides a shared vocabulary and a non-threatening lens for talking about personality differences. If a team has never had a structured conversation about how its members differ in working styles, an MBTI workshop can be useful in initiating that conversation, even though the same conversation could be initiated with any number of other (more or less empirically validated) frameworks. This is the honest case for the MBTI’s continued utility in corporate L&D: it is a vehicle for conversations that have value even when the vehicle is empirically rickety.

What This Means For Hiring And L&D

For senior leaders evaluating MBTI-based assessment or training investments, the practical implications follow from the empirical picture.

On using MBTI for hiring decisions: do not. The Myers-Briggs Company itself states that the MBTI should not be used for hiring or selection, and the empirical case for this restriction is strong. The instrument has poor test-retest reliability for types, lacks bimodal structural support, and has weak predictive validity for job performance — using it as a hiring screen would expose the organization to legal-defensibility risk (the MBTI is unlikely to survive an adverse-impact challenge with expert testimony) and would not improve selection quality over a well-administered Big Five personality assessment plus a cognitive ability test. If an outside vendor proposes MBTI-based hiring, decline.

On using MBTI for team building and leadership development: treat it as an icebreaker, not a diagnostic. The facilitation value of an MBTI workshop is real, but the diagnostic value is not. If your facilitator can use the four-letter types as a non-threatening conversational scaffold without making strong claims about what the types “mean” or how teams should be composed based on type, the session can produce useful awareness. If the facilitator is making strong claims — “your team needs more Js for execution focus, more Fs for conflict de-escalation, more Ns for innovation” — they are overselling the instrument’s predictive validity, and the organizational decisions that flow from these claims will not be empirically grounded. Set expectations explicitly: the MBTI is a conversation starter, not a measurement instrument; the four-letter codes are useful shorthand, not biological facts; and any organizational decision (team composition, role assignment, succession planning) should be informed by job-relevant data and validated assessments, not by MBTI types.

On vendor selection if you want personality data: use a Big Five-based instrument. The cleanest options for corporate use include the NEO-PI-R (Costa and McCrae, the gold-standard academic instrument, used in clinical and research contexts), the IPIP-NEO (open-source Big Five inventory, available without commercial licensing), the BFI-2 (Soto and John 2017, a shorter validated Big Five inventory), and Hogan Assessments (commercial Big Five-aligned instruments specifically validated for industrial/organizational use, with strong predictive-validity research backing for leadership outcomes). Each of these has substantially better measurement properties than the MBTI and is competitively priced. The procurement decision is straightforward.

On the broader question of “what is corporate L&D for”: the deeper question raised by the MBTI’s persistence is whether the goal of personality-based L&D programs is to generate accurate measurement and individualized development plans, or to generate workshops that participants find enjoyable and team conversations that participants find meaningful. The honest answer is that the latter is often more of the actual goal than the former, and the MBTI is reasonably effective at delivering it. If the organization is going to be honest with itself that the workshop is a structured-conversation vehicle rather than a personality-measurement instrument, the MBTI can persist alongside more rigorous tools for actual selection and assessment decisions. The intellectual dishonesty enters when the MBTI is used or marketed as if it were a validated measurement instrument that should inform individual career, hiring, or team-composition decisions. That use is not defensible.

The honest pitch a thoughtful L&D vendor could make, if they wanted to use the MBTI in a way that holds up to scrutiny, is something like: “We use the MBTI as a familiar, low-threat framework for structured conversations about working-style differences. We do not claim that the four-letter codes are biological facts, that they predict job performance, or that they should inform hiring decisions. The value of the workshop is in the conversation, not in the assessment scores. For actual selection and development decisions, use a Big Five-based instrument with documented predictive validity for your specific roles.” That is a defensible pitch. The current standard pitch — “the MBTI reveals your true personality type and our program will help you and your team work better together based on these scientifically-grounded insights” — is not.

Sources

Pittenger, D. J. (1993). The utility of the Myers-Briggs type indicator. Review of Educational Research, 63(4), 467–488. DOI: 10.3102/00346543063004467
Pittenger, D. J. (2005). Cautionary comments regarding the Myers-Briggs type indicator. Consulting Psychology Journal: Practice and Research, 57(3), 210–221. DOI: 10.1037/1065-9293.57.3.210
McCrae, R. R., & Costa, P. T. (1989). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 57(1), 17–40. DOI: 10.1111/j.1467-6494.1989.tb00759.x
Stein, R., & Swan, A. B. (2019). Evaluating the validity of Myers-Briggs Type Indicator theory: A teaching tool and window into intuitive psychology. Social and Personality Psychology Compass, 13(2), e12434. DOI: 10.1111/spc3.12434
Emre, M. (2018). The Personality Brokers: The Strange History of Myers-Briggs and the Birth of Personality Testing. New York: Doubleday.
Jung, C. G. (1921). Psychologische Typen [Psychological Types]. Zürich: Rascher Verlag. (English translation: Princeton University Press, 1971, in Collected Works vol. 6.)
Soto, C. J., & John, O. P. (2017). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117–143. DOI: 10.1037/pspp0000096
Costa, P. T., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.
Hunsley, J., Lee, C. M., & Wood, J. M. (2003). Controversial and questionable assessment techniques. In S. O. Lilienfeld, S. J. Lynn, & J. M. Lohr (Eds.), Science and pseudoscience in clinical psychology (pp. 39–76). New York: Guilford Press.
Boyle, G. J. (1995). Myers-Briggs Type Indicator (MBTI): Some psychometric limitations. Australian Psychologist, 30(1), 71–74. DOI: 10.1111/j.1742-9544.1995.tb01750.x

Related Articles in This Hub

Replication Crisis Hub — full index — start here for the broader landscape of contested behavioral-science claims that have entered corporate practice.
Goleman’s Emotional Intelligence — the direct structural parallel: a popular personality-adjacent construct that meta-analysis shows is largely a repackaging of Big Five personality traits.
Multiple Intelligences — Howard Gardner’s theory, another typological framework with similar empirical problems and corporate persistence.
Learning Styles — the educational analog: a popular typology (visual/auditory/kinesthetic learners) with no empirical support that nonetheless dominates corporate training-design practice.
Grit: Real, But Barely Distinguishable From Conscientiousness — another popular construct that turns out to be a repackaging of an existing Big Five trait.
Self-Esteem Movement — earlier example of a psychology-to-public-policy export that outran its empirical foundation.

FAQ

Why do so many companies use the MBTI if academic psychology rejects it?

The MBTI’s corporate dominance is a sociological and economic phenomenon, not an empirical one. Five reinforcing dynamics explain its persistence despite the empirical case against it: (1) the Barnum effect — type descriptions are written in self-flattering generalities that most people find personally resonant; (2) the satisfaction of type-belonging — four-letter codes provide a categorical identity that continuous trait scores do not; (3) self-reinforcing ubiquity — once everyone in an organization has taken the MBTI, the switching cost to a different framework is high; (4) the vendor and certification ecosystem — tens of thousands of trained MBTI facilitators have a financial stake in the instrument’s continued use; (5) genuine facilitation value — the MBTI can be useful as a structured-conversation scaffold even when the underlying measurement is weak. None of these dynamics depend on the MBTI being empirically valid; they would sustain the instrument’s market position even if the academic case against it were universally acknowledged.

What about MBTI for team building specifically? Doesn’t it help teams understand each other?

The honest answer is “yes and no, depending on what the facilitator does with it.” Used as a structured-conversation starter (“here’s a framework for talking about how we differ in working styles”), the MBTI can produce useful team awareness without doing any empirical harm. The conversation is the product; the type codes are just the scaffolding for it. Used as a diagnostic instrument (“this team needs more Ns to be innovative, fewer Ts to be empathic, more Ps to be flexible”), the MBTI is being overinterpreted. The four-letter types do not predict team performance, do not justify team-composition decisions, and do not reliably identify each member’s “true” working style (recall that about half of takers get a different type on retest five weeks later). A well-run team-building session that uses the MBTI as conversational scaffolding can be valuable; a session that treats the type codes as diagnostic data about team members is overselling the instrument’s empirical foundation.
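
The gap between a respectable scale-level retest correlation and an unstable four-letter code can be sketched with a few lines of arithmetic. The Python below is an illustration under stated assumptions, not a reanalysis of Pittenger's data: each dimension's underlying score is treated as normally distributed, cut at its midpoint, measured with the same retest correlation, and independent of the other three dimensions. The function names and the correlation values are hypothetical choices for the example.

```python
import math


def same_side_probability(retest_r: float) -> float:
    """Chance that two standard-normal scores with correlation retest_r
    land on the same side of a cut at zero (the scale midpoint).

    Uses the bivariate-normal identity P(same sign) = 1/2 + arcsin(r) / pi.
    """
    return 0.5 + math.asin(retest_r) / math.pi


def same_type_probability(retest_r: float, n_dimensions: int = 4) -> float:
    """Chance that every letter matches on retest.

    ASSUMPTION: the dimensions are independent and share one retest
    correlation. Real scales will not satisfy this exactly.
    """
    return same_side_probability(retest_r) ** n_dimensions


if __name__ == "__main__":
    # Hypothetical per-scale retest correlations, chosen for illustration.
    for r in (0.70, 0.80, 0.90):
        changed = 1 - same_type_probability(r)
        print(f"retest r = {r:.2f}: about {changed:.0%} get a different code")
```

Under these assumptions, a per-scale retest correlation of 0.80 still leaves roughly 60% of respondents with at least one changed letter, because anyone scoring near a cut point can flip on a small shift in score. The instability comes from the forced categories as much as from the noise in the scales.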

What if I find my MBTI type meaningful and accurate? Doesn’t that count as evidence?

It counts as the Barnum effect, not as evidence. The reason MBTI type descriptions feel personally accurate is partly that they are written in self-flattering generalities that fit many people, partly that they invite confirmation bias (once you believe you are an “INFJ,” you selectively notice INFJ-consistent behavior and discount inconsistent behavior), and partly that the dimensions the MBTI measures do correspond to real personality traits, so the descriptions are not arbitrary; they are over-categorized versions of real trait variation. The fact that the descriptions feel right does not establish that the typology is empirically valid. It establishes that you are like most people: capable of finding meaning in a personality description that is broadly applicable. Astrology delivers the same subjective experience of “wow, that’s so me,” and we do not therefore conclude that astrology is empirically valid. The subjective resonance of an MBTI type description is psychologically interesting but is not a substitute for test-retest reliability, structural validity, and predictive validity, and the MBTI falls short on all three.

What should I use instead of the MBTI for personality assessment?

Use a Big Five-based instrument. The Big Five framework (Extraversion, Agreeableness, Conscientiousness, Neuroticism / Emotional Stability, Openness to Experience) is the academic standard for personality measurement, has decades of cross-cultural validation, and predicts work and life outcomes with documented incremental validity. Specific instruments worth considering: the NEO-PI-R (gold-standard academic instrument), the IPIP-NEO (open-source, free), the BFI-2 (shorter validated Big Five inventory, Soto & John 2017), and Hogan Assessments (commercial Big Five-aligned instruments specifically developed for industrial/organizational use, with strong predictive-validity research for leadership outcomes). All of these measure the personality dimensions the MBTI is targeting, with better measurement properties, at similar or lower per-respondent cost, without the licensing-and-certification overhead the MBTI carries.

Is the Myers-Briggs Company a legitimate organization or a pseudoscience operation?

This is the wrong framing. The Myers-Briggs Company is a legitimate commercial organization that has built a substantial business on a personality assessment instrument the academic field considers empirically inadequate. They are not running a fraud — they are not making fabricated claims about supernatural powers, not committing scientific misconduct, not deceiving customers about what they are selling. They are selling an assessment-and-certification ecosystem that customers find valuable for facilitation purposes, with marketing claims about scientific validity that the academic literature does not support. This is more analogous to a successful consumer brand whose product has limited scientific evidence for its specific claims than to outright pseudoscience. The intellectual honesty issue is on the customer side — corporate L&D buyers should know what they are buying and price the engagement accordingly. The Myers-Briggs Company is responding to demand; the demand exists because the buyers find the workshops valuable and have not, in many cases, looked closely at the empirical literature behind the instrument.

Why did Jung not develop a measurement instrument for his typology?

Jung himself was wary of operationalizing his typology as a sorting tool. Psychological Types (1921) is framed as a clinical-theoretical synthesis, not as an empirical taxonomy. Jung explicitly cautioned against treating the types as fixed categorical labels for individual people, and in subsequent writings and correspondence he expressed reservations about how the typology was being used in popular and applied contexts. The typology was a heuristic for organizing his clinical observations and historical-philosophical reflections; it was not presented as a measurement framework, and Jung did not attempt to develop reliability data, structural-validity evidence, or predictive-validity research for it. The MBTI’s operationalization of Jungian typology as a sortable measurement instrument was Briggs and Myers’s project, not Jung’s, and was undertaken without formal psychometric training. This is part of why the empirical foundation has remained weak — the original theoretical author did not consider it a measurement framework, and the people who turned it into one were not equipped with the methodological tools to do so rigorously.

If the MBTI is empirically indefensible, why do some studies show it correlates with outcomes?

Some studies do show statistically significant correlations between MBTI types or dimensional scores and various outcomes — career choices, communication styles, conflict-resolution preferences. These correlations are generally small (weak statistical tendencies, not strong predictive relationships), are explainable by the MBTI’s overlap with Big Five traits (since three of four MBTI dimensions are weaker measures of Big Five Extraversion, Openness, and Agreeableness, any predictive validity is largely inherited from those underlying Big Five traits), and do not justify the typological framing of the instrument or its use for individual-level diagnostic decisions. The academic critique is not that the MBTI predicts nothing; it is that whatever predictive validity exists is (a) modest, (b) inherited from the underlying Big Five traits the MBTI is measuring with worse psychometric properties, and (c) insufficient to justify the categorical type interpretations the instrument is marketed around.
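
The "inherited but weaker" argument can be made concrete with two textbook attenuation formulas. The sketch below is hypothetical: it assumes the trait and the outcome are bivariate normal, that measurement error is unrelated to the outcome, and that the type letter comes from a split at the scale's midpoint. The input numbers (a trait-outcome correlation of 0.30 and a scale reliability of 0.80) are made up for illustration and are not taken from any MBTI study.

```python
import math

# Correlation between a standard-normal score and its own sign,
# i.e. the attenuation from replacing a score with a two-way split.
MEDIAN_SPLIT_FACTOR = math.sqrt(2 / math.pi)  # about 0.80


def letter_outcome_correlation(trait_outcome_r: float,
                               scale_reliability: float) -> float:
    """Expected correlation between a type letter and an outcome.

    Step 1: unreliability shrinks the correlation by sqrt(reliability).
    Step 2: dichotomizing at the midpoint shrinks it again by sqrt(2/pi).
    ASSUMPTION: bivariate normality and error uncorrelated with the outcome.
    """
    continuous_r = trait_outcome_r * math.sqrt(scale_reliability)
    return continuous_r * MEDIAN_SPLIT_FACTOR


if __name__ == "__main__":
    trait_r = 0.30  # hypothetical true trait-outcome correlation
    letter_r = letter_outcome_correlation(trait_r, scale_reliability=0.80)
    print(f"continuous trait: r = {trait_r:.2f}, r^2 = {trait_r ** 2:.3f}")
    print(f"type letter:      r = {letter_r:.2f}, r^2 = {letter_r ** 2:.3f}")
```

With these inputs the letter retains a nonzero correlation with the outcome but explains roughly half the variance that the continuous trait does, which is the pattern the academic critique describes: a real signal, borrowed from the underlying trait and weakened by the way it is measured and categorized.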

If I am already MBTI-certified and have built a career around it, what should I do?

Honest options exist that do not require abandoning your existing practice. (1) Reframe how you describe what the MBTI does: as a structured-conversation framework for working-style awareness, not as a measurement instrument that diagnoses true personality types. (2) Pair MBTI workshops with Big Five-based assessments for clients who want actual measurement-quality personality data; you can deliver both, with appropriate framing for each. (3) Be transparent with clients about the empirical literature when asked; sophisticated buyers will respect the honesty more than the marketing, and unsophisticated buyers will not ask. (4) Develop facilitation skills that do not depend on the MBTI as the diagnostic centerpiece. The value you deliver in workshops is largely facilitation skill, not the specific instrument, and that skill transfers to better-validated frameworks. The MBTI ecosystem is unlikely to collapse anytime soon, and individual practitioners need not walk away from it either; the path forward is honest reframing rather than wholesale rejection of the practice you have built.

replication-crisis, mbti, personality-assessment, corporate-l&d, evidence-evaluation

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified in behavioral economics. Led 100+ in-house experiments at NRG in 2025, with project evidence and limits documented in the case studies.



