Howard Gardner’s 1983 Frames of Mind proposed that human intelligence is not one thing but several. The original book listed seven: linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and intrapersonal. Naturalist was added in the late 1990s, and existential has been floated as a possible ninth. Schools and workplaces, Gardner argued, had been measuring only the first two. The theory transformed K-12 curriculum and corporate L&D for four decades. Visser, Ashton & Vernon (2006) tested it most cleanly, and Waterhouse (2006) reviewed the evidence most comprehensively. Together they show that the cognitive intelligences are not separate at all, because task-based measures of them load on a single general factor (g). They also show that the theory has never had the empirical foundation its educational adoption implies. Here is the honest story.

You have seen the poster. It hangs in the back of an elementary classroom — a colorful chart with eight cartoon heads labeled “Word Smart,” “Number Smart,” “Picture Smart,” “Music Smart,” “Body Smart,” “People Smart,” “Self Smart,” and “Nature Smart.” Below it, a teacher hands out a learner-profile worksheet so each child can identify their dominant intelligence and the school can differentiate instruction accordingly. The same framework, dressed for adults, appears in a corporate L&D deck somewhere this quarter — a leadership program promising to identify and develop “diverse intelligences across your management team,” priced in the six figures, citing Howard Gardner’s Frames of Mind as the scientific foundation. The proposed engagement is built on the idea that traditional cognitive-ability and personality assessments capture only a fraction of human capacity and that the modern enterprise needs to measure and develop a fuller portfolio of intelligences.

The empirical literature behind that pitch is much thinner than its educational and corporate adoption suggests. The single most rigorous direct test of Gardner’s theory is a study by Visser, Ashton, and Vernon at the University of Western Ontario, published in the peer-reviewed journal Intelligence in 2006. It administered two tests for each of Gardner’s eight intelligences. The cognitive intelligences (linguistic, logical-mathematical, spatial, naturalist, interpersonal, intrapersonal) all loaded substantially on a single general factor, indistinguishable from classical g. The same year, Lynn Waterhouse published a comprehensive critical review in Educational Psychologist. It surveyed the evidence base for multiple intelligences, alongside the Mozart effect and emotional intelligence, and concluded that no good empirical evidence for the theory existed more than two decades after its publication. Hattie’s massive 2009 meta-synthesis Visible Learning found that MI-aligned curriculum interventions produce minimal documented effects on learning outcomes. Those effects are smaller than the effects of many cheaper, less ideologically loaded interventions. And Gardner himself has acknowledged, in multiple venues over the years, that he framed his work as a social and educational claim about what should count as intelligent. He did not present it as a strictly empirical-cognitive theory derived from psychometric data.

This article walks through what Gardner actually proposed in 1983, what evidence he did and did not provide, what the Visser et al. 2006 direct test actually found, what Waterhouse’s 2006 review concluded across the broader literature, what Gardner himself has said about the empirical status of his theory, what is actually well-established about intelligence in modern psychometric research (the robust g-factor that MI was meant to replace), and what the practical implications are for school districts, corporate L&D leaders, and CEOs evaluating MI-derived assessment and training products. The goal is calibration, not takedown. Gardner is a careful scholar and the cultural impact of his work is real. What the empirical literature does not support is the popular framing that “multiple intelligences” is a scientifically validated taxonomy of distinct cognitive capacities that education and HR systems should be measuring and developing as separate constructs.

What Gardner Proposed In 1983

The book that launched the multiple-intelligences movement is Howard Gardner, Frames of Mind: The Theory of Multiple Intelligences (Basic Books, 1983). Gardner was a Harvard developmental psychologist working at Project Zero. Philosopher Nelson Goodman founded that research group at the Harvard Graduate School of Education in 1967 to study learning and cognition in the arts. The book argued that what psychologists had long called “intelligence” was a culturally biased and unnecessarily narrow construct. In Gardner’s account, it focused mainly on the skills tested by IQ instruments, which emphasized the linguistic and logical-mathematical capacities most valued in Western academic settings. In place of a single general intelligence, Gardner proposed an initial list of seven distinct intelligences: linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and intrapersonal. He added an eighth, naturalist intelligence, in the late 1990s. He has periodically discussed a possible ninth, existential intelligence, without formally adopting it.

Gardner defined an intelligence as the capacity to solve problems or fashion products that are valued in at least one cultural setting. He proposed eight criteria that a candidate ability had to satisfy to qualify as an intelligence: (1) potential isolation by brain damage; (2) the existence of idiots savants, prodigies, and other exceptional individuals; (3) an identifiable core operation or set of operations; (4) a distinctive developmental history with a definable set of expert “end-state” performances; (5) an evolutionary history and evolutionary plausibility; (6) support from experimental psychological tasks; (7) support from psychometric findings; and (8) susceptibility to encoding in a symbol system. He argued that each of his proposed intelligences satisfied most of these criteria, while other candidates such as “common sense” or “moral intelligence” did not.

The argumentative move is important to understand. Gardner did not conduct a single integrated empirical study testing whether his eight intelligences were statistically separable from each other or from established cognitive-ability constructs. The book is a synthesis — it pulls together neuropsychological case reports of selective brain damage producing selective ability deficits, anthropological observations of culturally valued skills, developmental data on prodigies and savants, evolutionary speculation, and selective references to psychometric research. The intelligences are proposed as a taxonomy supported by this convergent qualitative argument, not as a model derived from or tested by factor analysis of psychometric data. This is a perfectly legitimate way to propose a theory in cognitive psychology. What it is not is empirical validation of the theory’s central claim — that the eight intelligences are statistically independent or even meaningfully separable cognitive capacities.

The book was a runaway success in the education world. By the early 1990s, schools across the United States and internationally were integrating MI into curriculum design. The Key Learning Community in Indianapolis, founded in 1987, became the flagship MI-aligned school. Thomas Armstrong’s Multiple Intelligences in the Classroom (ASCD, 1994, with subsequent editions) became one of the best-selling education books of the 1990s. By the early 2000s, MI had entered teacher-preparation curriculum at most U.S. schools of education. “Differentiated instruction,” the pedagogical movement that grew up around MI and related theories, had become the prevailing orthodoxy in K-12 instructional design. The corporate L&D world followed. Training and development consultancies and leadership-program vendors built MI-derived diagnostic instruments and “develop your dominant intelligence” coaching programs. The theory was, by any cultural measure, a phenomenon.

What Gardner Did NOT Provide

The single most important fact about the empirical status of multiple intelligences is this: in the 1983 book and across his subsequent decades of work, Gardner did not provide empirical evidence — in the standard psychometric sense — that the eight intelligences are separable from each other or from general intelligence. He did not present a factor-analytic study of a battery of MI-targeted tasks showing eight distinct factors. He did not present incremental-validity data showing that MI-based assessment predicts educational or career outcomes above and beyond traditional cognitive ability and personality measures. He did not present test-retest reliability data for MI-based assessments demonstrating that the eight intelligences are stable individual-difference dimensions.

This was not an oversight. Gardner has been explicit that his methodology in Frames of Mind was synthetic and convergent rather than psychometric. In his 1995 reflection paper, “Reflections on Multiple Intelligences: Myths and Messages” (Phi Delta Kappan, 77(3), 200–209), he wrote: “MI theory is in most respects a critique of psychometrics-as-usual.” He has repeatedly described his theory as a “framework” or a “perspective” rather than as a hypothesis to be tested through standard psychometric procedures. In his view, the relevant evidence for MI comes from four sources rather than from factor analysis of task batteries: neuropsychology (selective brain-damage cases), developmental psychology (prodigies and savants), evolutionary plausibility, and cross-cultural observation.

The problem is the absence of psychometric evidence. MI makes three central empirical claims: the intelligences are distinct, individuals have dominant intelligences, and educational interventions targeting specific intelligences improve learning in those areas. Gardner’s own work leaves all three unvalidated, yet the popular and educational adoption assumes they have been validated. Suppose a school adopts an MI framework and tells parents that it is measuring and developing eight distinct intelligences in their child. The school is then making an empirical claim (that these are real, separable, measurable capacities) that Gardner’s theoretical work does not by itself support. The school is borrowing scientific credibility from a synthesis of qualitative evidence and using it as if it were psychometric validation.

The intellectual structure of the situation is worth pausing on. Gardner is not a fraud, did not fabricate data, and did not misrepresent his methodology. He explicitly described his approach as synthetic and convergent. The slippage happened downstream — between Gardner’s careful theoretical work and the educational and corporate adoption of MI as a measurement-and-development framework. The popular framing claims more than the theoretical work supports, and the theoretical work itself does not include the kind of psychometric validation that would justify the popular framing. This is a recurring pattern in the replication-crisis literature: a scholar proposes a careful, nuanced framework; the framework becomes a brand; the brand outruns the evidence; and the evidentiary gap is filled by repetition, marketing, and the genuine appeal of the underlying idea.

Visser, Ashton & Vernon 2006 — The Direct Empirical Test

The most rigorous direct empirical test of Gardner’s core claim — that the intelligences are statistically separable from each other and from g — is Beth A. Visser, Michael C. Ashton, and Philip A. Vernon, “Beyond g: Putting Multiple Intelligences Theory to the Test,” published in Intelligence, volume 34, issue 5, pages 487–502, in 2006 (DOI: 10.1016/j.intell.2006.02.004). Vernon (University of Western Ontario) is one of the most respected psychometricians in intelligence research; the paper is a model of how to fairly test a theory using methods that the theory’s proponents had not themselves employed.

The study administered two ability tests for each of Gardner’s eight intelligences (linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, intrapersonal, and naturalist) to a sample of 200 adults. Some tests came from existing instruments and others were developed for the study. The tests were selected or designed to require the specific cognitive operations Gardner had identified for each intelligence. For example, the musical tasks required pitch discrimination and rhythm reproduction, the spatial tasks required mental rotation and visualization, and the naturalist tasks required categorization of living things. Direct ability measurement was difficult or controversial for the bodily-kinesthetic, interpersonal, and intrapersonal intelligences. For those, the researchers used the best available proxies and were transparent about the measurement limitations.

The headline finding is unambiguous. When the researchers factor-analyzed the task battery, the cognitive intelligences — linguistic, logical-mathematical, spatial, naturalist, interpersonal, intrapersonal — all loaded substantially on a single general factor, with loadings in the range typical of classical g-factor analyses (most above 0.50). The non-cognitive intelligences (bodily-kinesthetic, musical) loaded weakly or not at all on the general factor, which the authors interpreted as evidence that these are not “intelligences” in the cognitive-ability sense at all but rather distinct ability domains (physical skill, musical aptitude) that the MI framework had improperly conflated with cognition. The interpersonal and intrapersonal “intelligences,” when measured with ability-based tasks rather than self-report, were essentially indistinguishable from general cognitive ability — high-g individuals were better at perceiving and reasoning about social and emotional content, just as they are better at most other cognitively demanding tasks.
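To make the idea of a “g-loading” concrete, here is a minimal Python sketch. The correlation matrix is invented for illustration and is not Visser et al.’s data. The test names are hypothetical, and the sketch does not reproduce their analysis. As a simple stand-in for the general factor, it extracts the first principal component by power iteration. Proper factor-analytic estimation, as used in published studies, gives somewhat different numbers. The qualitative pattern is the point: three intercorrelated cognitive tests load heavily on the first component, and a weakly related motor test barely loads at all.

```python
import math

# Hypothetical correlations among four tests (NOT data from Visser et al. 2006).
# Three cognitive tests correlate moderately; a motor test barely correlates with them.
TESTS = ["vocabulary", "arithmetic", "mental_rotation", "finger_tapping"]
CORR = [
    [1.00, 0.56, 0.56, 0.08],
    [0.56, 1.00, 0.49, 0.07],
    [0.56, 0.49, 1.00, 0.07],
    [0.08, 0.07, 0.07, 1.00],
]


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_component(corr, iterations=200):
    """Return (eigenvalue, loadings) for the dominant eigenvector of a correlation matrix.

    Power iteration: repeatedly multiply a positive start vector by the matrix and
    rescale. For a matrix with all positive entries this converges to the dominant
    eigenvector, whose components are all positive.
    """
    v = [1.0] * len(corr)
    for _ in range(iterations):
        w = mat_vec(corr, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv equals the eigenvalue because v has unit length.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(corr, v)))
    # Component loadings are the unit eigenvector scaled by sqrt(eigenvalue).
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings


if __name__ == "__main__":
    eigenvalue, loadings = first_component(CORR)
    for name, loading in zip(TESTS, loadings):
        print(f"{name:16s} {loading:.2f}")
    print(f"share of total variance: {eigenvalue / len(CORR):.0%}")
```

With these invented numbers, the three cognitive tests load at roughly 0.85, 0.82, and 0.82, and the motor test loads at about 0.17. The first component carries about half of the total variance across the four tests.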

The authors’ summary is direct. They collected or developed tests for each of the eight intelligences and administered them to 200 adults. They report a large general factor that was strongly related to measures of general cognitive ability. They explicitly frame the result as a refutation of the strong version of Gardner’s claim: the cognitive intelligences are not separable from g; they mostly are g.

It is fair to note the limitations Visser et al. acknowledge. The bodily-kinesthetic and musical “intelligences” require measurement approaches outside standard cognitive psychometrics, and the researchers’ chosen tasks may not perfectly capture what Gardner had in mind. The sample is modest in size and not nationally representative. The study addresses Gardner’s structural claim about the separability of intelligences without addressing every nuance of his developmental and pedagogical claims. But the central empirical question — are there eight separate intelligences, or one general factor with some non-cognitive ability domains alongside? — is answered cleanly by the study’s data. The answer is the latter. The MI framework as a cognitive-ability taxonomy is not supported by direct psychometric test of its claims.

Gardner’s 2006 response in Educational Psychologist was Howard Gardner and Seana Moran, “The Science of Multiple Intelligences Theory: A Response to Lynn Waterhouse,” volume 41, issue 4, pages 227–232 (DOI: 10.1207/s15326985ep4104_2). It was addressed to Waterhouse’s review rather than to the Visser et al. paper, and it did not contest factor-analytic evidence on its own terms. Instead, Gardner and Moran reiterated that MI theory was not primarily a psychometric claim and should not be evaluated by psychometric standards alone. This is intellectually consistent but sidesteps the point. Educational and corporate adoption of MI does treat the theory as a psychometric claim, and on that ground the theory does not survive direct testing.

Waterhouse 2006 — The Comprehensive Critical Review

The most comprehensive critical review of MI’s empirical status is Lynn Waterhouse, “Multiple Intelligences, the Mozart Effect, and Emotional Intelligence: A Critical Review,” published in Educational Psychologist, volume 41, issue 4, pages 207–225, in 2006 (DOI: 10.1207/s15326985ep4104_1). Waterhouse (College of New Jersey) surveyed the empirical literature on three popular educational-psychology constructs that had achieved massive cultural adoption — MI, the Mozart effect, and emotional intelligence — and asked, for each, what peer-reviewed evidence existed to support the central empirical claims.

For multiple intelligences specifically, Waterhouse’s review documented several telling absences. First, there was no published empirical study (as of 2006, twenty-three years after Frames of Mind) demonstrating that the eight intelligences were statistically separable from each other in a factor-analytic test of an MI-aligned task battery. Visser et al.’s 2006 study, published the same year as Waterhouse’s review, would soon confirm the opposite. Second, there was no published evidence that individuals had stable “dominant intelligences” that predicted differential learning outcomes across instructional formats — the central empirical claim that MI-aligned curriculum design depends on. Third, there was no published evidence that MI-aligned curriculum interventions produced larger learning gains than traditional instruction in the same subject matter; the educational research that did exist tended to be small, methodologically weak, and conducted by researchers with commitments to the framework.

Waterhouse’s broader argument was that MI, the Mozart effect, and EI shared a common pattern. Each had achieved enormous popular and educational adoption on the basis of a small initial set of papers or a single book; each had subsequently been promoted by an industry of consultants, curriculum vendors, and assessment publishers; and each had failed to develop the kind of cumulative peer-reviewed empirical foundation that the popular adoption assumed to exist. Her summary judgment on MI: “There is no empirical evidence that the eight intelligences exist or have any explanatory power” (Waterhouse, 2006, p. 213). This is a strong claim, and Gardner’s response (Gardner & Moran, 2006) contested it on the grounds that Waterhouse had used too narrow a definition of “empirical evidence.” But the substantive point — that the kind of evidence MI would need to support its educational and corporate use does not exist — was not effectively rebutted.

The exchange between Waterhouse and Gardner appeared in the same 2006 issue of Educational Psychologist (volume 41, issue 4), and it illustrates a deeper methodological dispute. Waterhouse, working from a standard scientific framework, asks where the evidence is for the specific empirical claims: eight separable intelligences, stable dominant intelligences, and instructional benefits of MI alignment. Gardner responds that MI was never meant to be evaluated by those standards. In his account it is a “framework” or “perspective” that synthesizes evidence from multiple sources, not a hypothesis to be tested through standard psychometric procedures. Both positions can be defended on their own terms. But the practical question requires the standard scientific answer: should school districts and corporations spend money on MI-based assessment and curriculum products as if MI were a scientifically validated framework? By that standard, MI does not have the empirical backing its adoption implies.

Gardner’s Own Acknowledged Walk-Back

One of the more striking features of the MI literature is that Gardner himself has, in various venues, walked back the strong empirical-cognitive interpretation of his theory in favor of a more modest social and educational interpretation. In his 1995 Phi Delta Kappan paper cited above, he wrote that “MI theory is in most respects a critique of psychometrics-as-usual” — framing the theory as a value-based critique of what should count as intelligent rather than a psychometric claim about cognitive structure. In a 2006 Educational Leadership piece, he acknowledged that “MI theory is, in large part, a critique of psychometric methods” rather than an alternative psychometric framework.

More candidly, in interviews and reflective essays Gardner has framed his theory as much about political and educational reform as about the underlying cognitive science. The argument that schools should value and develop a broader range of human capacities than IQ tests measure is a normative claim about education, defensible on its own terms regardless of whether the underlying psychometric structure looks like one g or eight intelligences. The argument that historically marginalized capacities (musical, bodily-kinesthetic, interpersonal) deserve curricular attention is a values-based educational argument. The argument that human flourishing is better served by recognizing diverse forms of cultural achievement than by ranking everyone on a single IQ scale is a humanistic argument.

These are all reasonable arguments. They are also distinct from the empirical claim that there are eight separable cognitive intelligences. Gardner’s intellectual contribution — the contribution that has had real and lasting cultural value — is the educational and political reframing, not the empirical-cognitive taxonomy. The problem is that the educational and corporate adoption of MI typically treats it as both — borrowing scientific authority from the implicit empirical-cognitive claim while making the practical sale on the basis of the political and values-based appeal.

It is worth being precise about what Gardner has and has not conceded. He has not retracted MI. He has not endorsed the Visser et al. 2006 finding that the intelligences load on a single g-factor. He continues to defend MI as a valid framework. But he has consistently, over many years, declined to defend MI on the psychometric grounds that would justify its educational and corporate use as a measurement-and-development framework. The defenders he relies on are educators and humanists, not psychometricians. The grounds he defends MI on are educational values and convergent qualitative evidence, not factor-analytic structure. This is intellectually honest of him. It is also, for the practical question of whether to invest in MI-based programs, devastating.

What’s Actually Known About Intelligence Now

The empirical alternative to multiple intelligences — the psychometric framework that MI was meant to replace — is in fact one of the most robust and well-replicated findings in all of psychology: the general factor of intelligence (g) and the hierarchical Cattell-Horn-Carroll (CHC) model of cognitive abilities.

The g-factor was first identified by Charles Spearman in 1904. It is the statistical finding that, across virtually any battery of cognitive tasks (vocabulary, arithmetic, spatial reasoning, working memory, processing speed, etc.), performance on any one task is positively correlated with performance on any other — and a single general factor extracted from these correlations explains a large portion of the variance in cognitive performance across the entire battery. This finding has been replicated thousands of times across decades, across cultures, across age groups, and across task batteries. There is genuine scientific debate about the underlying causes of g (neurobiological, developmental, evolutionary), about the precise hierarchical structure of more specific cognitive abilities beneath g, and about the relative importance of g versus more specific abilities for predicting real-world outcomes. There is no serious scientific debate about whether g exists as a statistical regularity. It does.

The modern consensus in cognitive ability research is the Cattell-Horn-Carroll (CHC) hierarchical model, developed through factor-analytic work by Raymond Cattell, John Horn, and John Carroll over several decades. Carroll’s monumental 1993 book Human Cognitive Abilities: A Survey of Factor-Analytic Studies re-analyzed over 460 cognitive-ability datasets going back to the 1920s and provided the empirical foundation for the modern CHC framework. The CHC model posits a three-stratum hierarchy:
- Stratum III: a single general intelligence factor (g) at the top.
- Stratum II: about ten broad cognitive abilities, including fluid reasoning, crystallized intelligence, visual processing, auditory processing, processing speed, working memory capacity, and long-term storage and retrieval.
- Stratum I: dozens of narrow abilities, covering specific skills and knowledge domains.
CHC is the explicit blueprint for major modern batteries such as the Woodcock-Johnson and the Stanford-Binet Fifth Edition. It also informs how others, such as the WAIS-IV, are interpreted, and it is the framework in which contemporary intelligence research operates.

The CHC model is not Multiple Intelligences. It does include multiple specific abilities, but they are organized hierarchically under a common general factor — which is precisely the structure Gardner’s theory was proposing to replace. The CHC model is the empirical answer to the question “are there multiple distinct cognitive abilities?” The answer is yes — there are several broad ability domains and many narrow ones — but they are all positively intercorrelated and substantially explained by a higher-order general factor. The “g-factor versus multiple intelligences” debate is not actually live in modern cognitive psychology; the question has been answered, repeatedly and decisively, by a century of factor-analytic work. The g-factor wins. The hierarchical model of specific abilities under g (the CHC framework) is what intelligence actually looks like in the data.
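A small worked sketch shows why specific abilities in a hierarchical model still correlate with one another. Assume a higher-order factor model with standardized factors and uncorrelated test residuals. The model-implied correlation between two tests is then the product of the loadings along the path that connects them. That path runs through their shared broad ability if they have one, and otherwise up through g. All loadings below are invented for illustration. The ability and test names are hypothetical labels, not values from any published CHC battery.

```python
# Hypothetical higher-order (CHC-style) structure: g -> broad abilities -> tests.
# Every number here is invented for illustration.
BROAD_ON_G = {"fluid_reasoning": 0.90, "processing_speed": 0.60}

TESTS = {
    # test name: (broad ability, loading of the test on that broad ability)
    "matrices": ("fluid_reasoning", 0.80),
    "series_completion": ("fluid_reasoning", 0.70),
    "symbol_search": ("processing_speed", 0.70),
    "coding": ("processing_speed", 0.75),
}


def implied_correlation(test_a: str, test_b: str) -> float:
    """Model-implied correlation between two different tests.

    Tests on the same broad ability correlate through that ability alone
    (product of their two loadings). Tests on different broad abilities can only
    correlate through g, so the path also multiplies in each broad ability's g-loading.
    Assumes standardized factors and uncorrelated test residuals.
    """
    broad_a, load_a = TESTS[test_a]
    broad_b, load_b = TESTS[test_b]
    if broad_a == broad_b:
        return load_a * load_b
    return load_a * BROAD_ON_G[broad_a] * BROAD_ON_G[broad_b] * load_b


def g_loading(test: str) -> float:
    """Indirect loading of a test on g: test->broad loading times broad->g loading."""
    broad, load = TESTS[test]
    return load * BROAD_ON_G[broad]


if __name__ == "__main__":
    names = list(TESTS)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(f"{a} ~ {b}: {implied_correlation(a, b):.2f}")
```

Every implied correlation comes out positive. Pairs that share a broad ability correlate more strongly (0.56 and about 0.53 here) than pairs that meet only through g (roughly 0.26 to 0.32). The result is a positive manifold with clusters, which is the pattern the hierarchical model describes.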

The predictive validity of g for real-world outcomes is also one of the most robust findings in I/O psychology. Schmidt and Hunter’s classic 1998 meta-analysis (Psychological Bulletin, volume 124, issue 2, pages 262–274) found general cognitive ability to be one of the strongest and most broadly applicable single predictors of job performance. Its corrected validity rises with job complexity and is about .51 for jobs of medium complexity. More recent re-analyses have debated the precise magnitude, and some suggest the historical figures were somewhat inflated by selection effects and statistical corrections. The qualitative picture remains: g predicts job performance, educational achievement, income, and many other outcomes better than nearly any other psychological variable studied. This is the empirical reality that MI was proposing to displace, and after four decades it has not displaced it.
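A “corrected” validity coefficient is an observed predictor-criterion correlation adjusted for statistical artifacts. The sketch below applies two textbook corrections. The first disattenuates for criterion unreliability: r / sqrt(r_yy). The second is Thorndike’s Case II correction for direct range restriction: U·r / sqrt(1 + r²(U² − 1)). Here U is the ratio of the predictor’s standard deviation among applicants to its standard deviation among those hired. The input values are hypothetical, and the corrections are applied in one common order purely for illustration. Real meta-analyses differ in which artifacts they correct, in what order, and with which artifact estimates, and that is where the debate about inflation sits.

```python
import math


def correct_for_criterion_unreliability(r_xy: float, r_yy: float) -> float:
    """Disattenuate an observed validity for unreliability in the criterion only."""
    return r_xy / math.sqrt(r_yy)


def correct_for_range_restriction(r: float, u: float) -> float:
    """Thorndike Case II correction for direct range restriction on the predictor.

    u = SD of the predictor among applicants / SD among those hired (u >= 1).
    """
    return (u * r) / math.sqrt(1 + r * r * (u * u - 1))


if __name__ == "__main__":
    observed = 0.30       # hypothetical observed correlation in a hired sample
    criterion_rel = 0.64  # hypothetical reliability of supervisor ratings
    u = 1.5               # hypothetical SD ratio, applicants vs. hires

    step1 = correct_for_criterion_unreliability(observed, criterion_rel)
    step2 = correct_for_range_restriction(step1, u)
    print(f"observed {observed:.2f} -> criterion-corrected {step1:.3f} "
          f"-> range-corrected {step2:.3f}")
```

With these invented inputs, an observed 0.30 becomes 0.375 after the reliability correction and about 0.52 after the range-restriction correction. The assumed artifact values therefore drive much of the final number.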

What This Means For Education And L&D Programs

For school district leaders evaluating MI-aligned curriculum products and corporate L&D leaders evaluating MI-based assessment and training programs, the practical implications follow from the empirical picture.

On MI-based learner profiling: the central premise of “differentiated instruction” tied to MI — that each student has a dominant intelligence and that instruction should be aligned to that dominant intelligence — does not have the empirical support its adoption implies. There is no evidence that students have stable dominant intelligences (the test-retest reliability of MI-profiling instruments is generally weak), and there is no evidence that aligning instruction to a putative dominant intelligence produces better learning outcomes than well-designed traditional instruction. This is closely related to the now well-debunked “learning styles” claim (see the related article in this hub) — the meta-analytic evidence that matching instruction to claimed learning styles improves learning outcomes is essentially nil.
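Test-retest reliability is the correlation between scores from two administrations of the same instrument to the same people. The dominant-intelligence claim raises a second, stricter question: how often does the same person receive the same “dominant” label twice? The minimal sketch below computes both on a tiny invented dataset. The scale names, people, and scores are all hypothetical; the procedure is the point, not the numbers.

```python
import math

# Hypothetical scores on three profile scales for four people, at two sittings.
TIME_1 = {
    "ana": {"verbal": 12, "spatial": 11, "interpersonal": 9},
    "ben": {"verbal": 8, "spatial": 13, "interpersonal": 12},
    "chen": {"verbal": 10, "spatial": 9, "interpersonal": 11},
    "dev": {"verbal": 14, "spatial": 10, "interpersonal": 10},
}
TIME_2 = {
    "ana": {"verbal": 10, "spatial": 12, "interpersonal": 9},
    "ben": {"verbal": 9, "spatial": 12, "interpersonal": 13},
    "chen": {"verbal": 11, "spatial": 10, "interpersonal": 10},
    "dev": {"verbal": 13, "spatial": 11, "interpersonal": 9},
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def retest_reliability(scale, t1=TIME_1, t2=TIME_2):
    """Test-retest correlation for one scale across the same people."""
    people = sorted(t1)
    return pearson([t1[p][scale] for p in people], [t2[p][scale] for p in people])


def dominant(profile):
    """The scale with the highest score (ties go to the alphabetically first scale)."""
    return max(sorted(profile), key=lambda s: profile[s])


def label_agreement(t1=TIME_1, t2=TIME_2):
    """Share of people who get the same 'dominant' label at both sittings."""
    people = sorted(t1)
    same = sum(dominant(t1[p]) == dominant(t2[p]) for p in people)
    return same / len(people)


if __name__ == "__main__":
    for scale in ("verbal", "spatial", "interpersonal"):
        print(f"{scale:14s} retest r = {retest_reliability(scale):.2f}")
    print(f"same dominant label at both sittings: {label_agreement():.0%}")
```

In this invented example, every scale has a retest correlation above 0.8 (about 0.83, 0.87, and 0.89). Even so, only one of the four people keeps the same dominant label, because the gaps between a person’s scale scores are small relative to retest noise. The stability of profile classifications has to be checked directly; it cannot be inferred from scale-level correlations.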

On MI-aligned curriculum redesign: the meta-analytic evidence on curriculum interventions, summarized in John Hattie’s Visible Learning (Routledge, 2009) and updated periodically since, places MI-aligned interventions in the bottom portion of the effect-size distribution for educational interventions. Hattie’s synthesis covers more than 800 meta-analyses of educational research, and the broad pattern is that the highest-effect interventions are direct instructional approaches (formative assessment, reciprocal teaching, mastery learning, teacher clarity), not curriculum frameworks built around contested cognitive taxonomies. Resources spent on MI-aligned curriculum redesign typically produce smaller learning gains than the same resources spent on evidence-based instructional practices like formative assessment, spaced practice, and direct instruction.

On MI-based adult assessment and L&D programs: corporate vendors selling MI-derived diagnostic assessments and “develop your dominant intelligence” coaching programs are selling a framework whose direct empirical test (Visser et al. 2006) refuted its central structural claim. If your enterprise is being pitched an MI-based leadership-development program, the diagnostic questions to ask are similar to those for EI assessments: what is the test-retest reliability of the instrument? what is the incremental validity over a Big Five personality assessment plus a cognitive-ability measure? what independent (non-vendor-funded, non-author-conflicted) peer-reviewed studies validate the instrument for the specific use? Vendors who can answer these questions clearly have something defensible to sell; vendors who pivot to “but the framework is widely used” or “Gardner is from Harvard” are selling brand, not science.
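Incremental validity asks how much a new score adds to the prediction of an outcome once established predictors are already in the model. It is usually reported as the gain in R². With one baseline predictor, the arithmetic follows from the standard two-predictor formula R² = (r₁² + r₂² − 2r₁r₂r₁₂) / (1 − r₁₂²). The sketch below is a simplification that uses a single cognitive-ability baseline rather than a full Big Five plus cognitive battery. All correlations in it are hypothetical.

```python
def r_squared_two_predictors(r1: float, r2: float, r12: float) -> float:
    """Squared multiple correlation of an outcome on two standardized predictors.

    r1, r2: each predictor's correlation with the outcome; r12: their intercorrelation.
    """
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


def incremental_validity(r_base: float, r_new: float, r_overlap: float) -> float:
    """Gain in R^2 from adding the new score to a model containing only the baseline."""
    return r_squared_two_predictors(r_base, r_new, r_overlap) - r_base ** 2


if __name__ == "__main__":
    # Hypothetical: cognitive ability predicts the outcome at 0.50, and a profile
    # score correlates 0.60 with cognitive ability.
    for r_new in (0.30, 0.40):
        gain = incremental_validity(0.50, r_new, 0.60)
        print(f"new score validity {r_new:.2f}: delta R^2 = {gain:.4f}")
```

In the first case the profile score’s validity of 0.30 is exactly what its overlap with cognitive ability implies (0.50 × 0.60). It therefore adds nothing, and the gain is zero up to floating-point rounding. In the second case it adds about 0.016 to R². A score can correlate respectably with an outcome and still add little or nothing once g is in the model.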

On the broader political-educational point: Gardner’s substantive argument — that schools should value a broader range of human capacities than narrow IQ-style testing rewards, that diverse cultural achievements deserve curricular attention, that not all valuable forms of human excellence reduce to verbal and quantitative reasoning — is a defensible educational and humanistic argument. It does not require the MI taxonomy as scientific underpinning. A school that values musical, athletic, social-emotional, and naturalist achievement alongside academic achievement does not need MI to justify that values commitment. The empirical claim that there are eight separable cognitive intelligences is independent of, and not required for, the normative claim that schools should value diverse human capacities. The empirical claim has not survived direct testing. The normative claim is defensible on its own terms, without the contested scientific scaffolding.

On the structural lesson: the MI story is a clear case of a careful theoretical proposal that became a brand, and the brand outran the evidence. Gardner is not at fault for the educational adoption — he proposed the theory honestly, has consistently distinguished framework from psychometrics, and has not personally built the assessment industry. But the educational and corporate adoption treated MI as if it had empirical validation it never had, and the field absorbed the cost in misallocated resources, weakly grounded curriculum design, and the displacement of better-validated alternatives. For CEOs and education leaders, the practical lesson is to be skeptical of any “scientific framework” that the underlying scholar has explicitly declined to defend on psychometric grounds. If the inventor of the theory will not call it a psychometric claim, the vendor selling it as one is selling more than the theory supports.

Sources

  • Gardner, H. (1983). Frames of Mind: The Theory of Multiple Intelligences. New York: Basic Books.
  • Visser, B. A., Ashton, M. C., & Vernon, P. A. (2006). Beyond g: Putting multiple intelligences theory to the test. Intelligence, 34(5), 487–502. DOI: 10.1016/j.intell.2006.02.004
  • Waterhouse, L. (2006). Multiple intelligences, the Mozart effect, and emotional intelligence: A critical review. Educational Psychologist, 41(4), 207–225. DOI: 10.1207/s15326985ep4104_1
  • Gardner, H., & Moran, S. (2006). The science of multiple intelligences theory: A response to Lynn Waterhouse. Educational Psychologist, 41(4), 227–232. DOI: 10.1207/s15326985ep4104_2
  • Gardner, H. (1995). Reflections on multiple intelligences: Myths and messages. Phi Delta Kappan, 77(3), 200–209.
  • Hattie, J. (2009). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. London: Routledge.
  • Carroll, J. B. (1993). Human Cognitive Abilities: A Survey of Factor-Analytic Studies. New York: Cambridge University Press.
  • Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274. DOI: 10.1037/0033-2909.124.2.262
  • Spearman, C. (1904). “General intelligence,” objectively determined and measured. American Journal of Psychology, 15(2), 201–292. DOI: 10.2307/1412107
  • Armstrong, T. (1994, 2009, 2017). Multiple Intelligences in the Classroom. Alexandria, VA: ASCD.

Related articles in this hub

  • Replication Crisis Hub — full index — start here for the broader landscape of contested behavioral-science claims that have entered corporate and educational practice.
  • Learning Styles: The Most Persistent Myth in Education — closely related: another popular educational-psychology framework with no empirical support, often bundled with MI in differentiated-instruction programs.
  • Growth Mindset — overlapping audience (school district and corporate L&D buyers), similar pattern of construct becoming oversold relative to the meta-analytic evidence.
  • Goleman’s Emotional Intelligence — direct structural parallel: a popular intelligence-adjacent construct whose academic foundation does not support its commercial use.
  • Left-Brain / Right-Brain Personality — another popular cognitive-style framework with no empirical basis, often co-marketed with MI.
  • Self-Esteem Movement — earlier example of educational-psychology export to schools and corporations that outran its empirical foundation.

FAQ

Does this mean Howard Gardner was wrong about everything?

No. Gardner’s normative argument (that schools should value a broader range of human capacities than narrow IQ-style testing rewards, that diverse cultural achievements deserve curricular attention, and that human excellence does not reduce to verbal and quantitative reasoning alone) is defensible on educational and humanistic grounds. His empirical-cognitive claim, that there are eight (or nine) statistically separable intelligences that schools and workplaces should measure as distinct constructs, is the part that direct testing (notably Visser, Ashton & Vernon 2006) has not supported. The pattern is similar to several other entries in this hub: a careful scholar proposes a nuanced framework, the framework becomes a brand, and the brand makes stronger empirical claims than the original work supports. Gardner himself has consistently distinguished his framework from psychometric claims; the educational and corporate adoption has not.

What did Visser, Ashton & Vernon 2006 actually do?

They built a battery with two purpose-designed ability tasks for each of eight of Gardner’s intelligences (linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, intrapersonal, naturalist), administered it to 200 adults, and factor-analyzed the results. The central finding: the tasks for the cognitive intelligences loaded substantially on a single general factor (g) that was strongly correlated with classical measures of cognitive ability. The tasks for bodily-kinesthetic and musical intelligence, which depend partly on motor and sensory abilities, loaded more weakly on g; the authors interpreted this as evidence that these are distinct ability domains that MI improperly groups with cognition. The study is among the most direct empirical tests of MI’s structural claim, and within the limits of its methodology its result is clear: the cognitive intelligences are not separable from g. Gardner’s published response did not contest the factor-analytic finding on its own terms; he reiterated that MI was not primarily a psychometric claim.
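To make “loading on a general factor” concrete, here is a minimal Python sketch that extracts first-factor loadings from a correlation matrix by power iteration. It rests on stated assumptions: the correlation values are invented for illustration and are not Visser, Ashton & Vernon’s data, the first principal component used here is a simplification of the factor-analytic methods in the actual study, and the names TASKS, R, and first_factor_loadings are hypothetical.

```python
"""Illustrative sketch: how strongly each task loads on one general factor.

ASSUMPTIONS: the correlation matrix is HYPOTHETICAL (invented for
illustration), not data from Visser, Ashton & Vernon (2006). Loadings come
from the first principal component, a simplification of the factor-analytic
methods used in the real study.
"""
import math

# Hypothetical task labels, one task per intelligence for brevity.
TASKS = ["linguistic", "logical_math", "spatial", "musical", "bodily_kin"]

# Hypothetical correlations: the first three tasks intercorrelate strongly,
# the last two correlate only weakly with everything else.
R = [
    [1.00, 0.60, 0.55, 0.20, 0.10],
    [0.60, 1.00, 0.58, 0.22, 0.08],
    [0.55, 0.58, 1.00, 0.18, 0.12],
    [0.20, 0.22, 0.18, 1.00, 0.15],
    [0.10, 0.08, 0.12, 0.15, 1.00],
]


def first_factor_loadings(r, iterations=500):
    """Return (loadings, eigenvalue) for the largest eigenpair of r.

    Power iteration finds the leading eigenvector v (unit length); each
    task's loading is v_i * sqrt(eigenvalue).
    """
    n = len(r)
    v = [1.0] * n  # start from all-ones, which overlaps the leading eigenvector
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: eigenvalue estimate for the unit-length vector v.
    rv = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * rv[i] for i in range(n))
    return [x * math.sqrt(eigenvalue) for x in v], eigenvalue


if __name__ == "__main__":
    loadings, eig = first_factor_loadings(R)
    for name, loading in zip(TASKS, loadings):
        print(f"{name:>13}: {loading:.2f}")
    print(f"share of variance on the first factor: {eig / len(R):.0%}")
```

With these made-up numbers, the three strongly intercorrelated tasks receive high loadings and the two weakly correlated tasks receive low ones, which is the shape of the pattern described above. A real analysis would use a dedicated factor-analysis routine and the study’s actual correlation matrix.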

What does Gardner himself say about the empirical status of MI?

In multiple venues over decades, Gardner has framed MI as a “framework” or “perspective” rather than a psychometric hypothesis to be tested by standard cognitive-ability methods. In his 1995 Phi Delta Kappan paper he wrote that “MI theory is in most respects a critique of psychometrics-as-usual.” In his 2006 response to Waterhouse, co-authored with Moran, he argued that the relevant evidence for MI comes from neuropsychology, developmental psychology, and cross-cultural observation rather than from factor analysis. He has not retracted MI and continues to defend it as a useful framework, but he has consistently declined to defend it on the psychometric grounds that would justify educational and corporate use as a measurement-and-development framework. This is intellectually honest of him; it is also, for the practical question of whether to invest in MI-based products, an important admission.

Are learning styles the same as multiple intelligences?

They are related but distinct. “Learning styles” theories (visual, auditory, kinesthetic; or VARK; or various other taxonomies) claim that individuals have stable preferred sensory or processing modalities and that instruction matched to the preferred modality produces better learning outcomes. The most widely cited review of that evidence (Pashler et al. 2009, Psychological Science in the Public Interest) found that few studies used a design capable of testing the “matching hypothesis,” and those that did found essentially no support for it: students did not learn better when instruction was matched to their claimed learning style. MI is a broader theory about cognitive capacities rather than sensory modalities, but the educational adoption of MI and learning styles converged in the “differentiated instruction” movement, and both share the empirical problem that the diagnose-and-match premise is not supported by the available evidence. See the related article on learning styles in this hub.

If MI is wrong, what should I think about cognitive ability and intelligence?

The empirically defensible modern framework is the Cattell-Horn-Carroll (CHC) hierarchical model of cognitive abilities. Depending on the version, the CHC model includes roughly ten or more broad cognitive abilities (fluid reasoning, crystallized intelligence, visual processing, auditory processing, processing speed, working memory, and others) and dozens of narrower abilities. All of them are positively intercorrelated and substantially explained by a higher-order general factor (g). The Woodcock-Johnson and Stanford-Binet 5 are explicitly built on the CHC framework, and the Wechsler scales such as the WAIS-IV are commonly interpreted through it. There is genuine ongoing scientific debate about the relative practical importance of g versus more specific abilities for various outcomes, but the basic structural finding (cognitive abilities are positively intercorrelated and partly explained by a general factor) is one of the most replicated findings in psychology. This is what intelligence looks like in the empirical data, and it is the picture MI was proposing to displace.

Should my school district stop using MI-based curriculum products?

The evidence-based answer is that resources spent on MI-aligned curriculum redesign typically produce smaller learning gains than the same resources spent on direct-instruction practices, formative assessment, mastery learning, and other higher-effect-size interventions documented in Hattie’s Visible Learning synthesis. If your district is making fresh curriculum-investment decisions, MI-aligned products are not the highest-leverage choice. If your district already has MI-aligned curriculum in place, the cost of disruption may exceed the cost of continuing — but new investments in MI-specific assessment instruments, teacher training in MI-based differentiation, and MI-aligned content licensing are difficult to justify on the available evidence. The normative argument for valuing diverse human capacities does not require the MI taxonomy; you can have arts, music, athletics, and social-emotional learning in your curriculum without committing to MI as a scientific framework.

Should my company stop using MI-based assessment and leadership programs?

For corporate L&D programs that use MI as a pedagogical framework or narrative scaffolding without making strong empirical claims, the harm is mostly opportunity cost: resources that could have gone to higher-validity development approaches are spent instead on a framework with weak empirical foundations. For programs that use MI-derived diagnostic instruments to make selection or placement decisions, the case for stopping is stronger, because such instruments generally lack published evidence of adequate test-retest reliability, have contested construct validity, and show little demonstrated incremental validity over established personality and cognitive-ability measures. If you are evaluating an MI-based vendor pitch, the diagnostic questions are similar to those for EI assessments. Is the instrument ability-based or self-report? What is its test-retest reliability over six months or more? What incremental validity does it add over the Big Five and cognitive ability for your specific use? Has it been validated in independent, peer-reviewed research? Vendors with defensible answers exist; vendors who pivot to “Gardner is from Harvard” and “the framework is widely used” are selling brand, not science.

What is the broader pattern here? Why do so many educational-psychology constructs end up in this hub?

The recurring structure is: (1) a careful scholar proposes a nuanced framework based on synthesis or initial empirical work; (2) the framework offers a compelling normative or values-based narrative that aligns with broader cultural movements (humanism, anti-IQ-determinism, recognition of diverse human capacities); (3) the framework gets adopted by educational and corporate buyers who treat it as scientifically validated; (4) an industry of consultants, assessment publishers, and curriculum vendors grows up around it; (5) when subsequent direct empirical testing is conducted, the framework either fails the test or is defended by its proponents as not subject to the relevant test; (6) the cultural and commercial adoption continues because of demand-side momentum rather than strengthening evidence. MI fits this pattern almost perfectly, as do learning styles, the self-esteem movement, Goleman’s EI, and grit. The lesson for evidence-driven decision-makers is to be skeptical of frameworks whose original proponents will not defend them on the empirical grounds the practical use implies — and to invest in the well-validated alternatives (CHC cognitive-ability assessment, Big Five personality, direct-instruction pedagogy, formative assessment) that the contested frameworks were proposing to displace.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.