For more than a century, the same scene has been described in every introductory psychology textbook in the English-speaking world. An infant — perhaps eight, perhaps eleven months old, identified in the original paper only as “Albert B.” — sits on a mattress in a laboratory at Johns Hopkins. A researcher places a white laboratory rat in front of him. The infant reaches for the rat curiously, with no apparent fear. As the infant’s hand touches the rat, a second researcher behind the infant strikes a suspended steel bar with a hammer, producing a loud crashing sound. The infant startles, cries, and crawls away. The pairing is repeated. After several pairings, the rat alone — without the noise — produces crying and crawling away. The fear, the textbooks tell us, then generalizes: to a rabbit, to a dog, to a sealskin coat, to a Santa Claus mask, to cotton wool. This is offered as the foundational demonstration that emotional responses in humans can be classically conditioned. This is the proof, the textbooks imply, that John Watson’s behaviorist program — that all human behavior is the product of stimulus-response learning — has empirical support at the most basic level.
The study is real. The paper exists. It is Watson and Rayner (1920), “Conditioned emotional reactions,” published in volume 3 of the Journal of Experimental Psychology. It can still be downloaded from the American Psychological Association archive. What is missing from the textbook accounts is almost everything that would matter if the same study were submitted to a peer-reviewed journal today. There was one subject. There was no control infant. There was no pre-registered protocol. The fear responses Watson reported as clean and generalized were, on careful reanalysis of Watson’s own surviving film footage by later historians, inconsistent and incomplete. And, most damagingly, the subject may have been a neurologically impaired infant whose abnormalities Watson knew about and never disclosed. This article walks through what the 1920 paper actually says, where the methodology fails the basic tests of evidence we would apply to any study cited as foundational, the nearly ninety-year mystery of who “Albert” actually was, the competing 2009 and 2014 identifications, the film reanalysis that punctures the generalization claim, and why, despite all of this, the study remains in the textbooks.
What The Textbook Version Claims
The textbook version of Little Albert is compact and clean, because that is what a textbook needs. Pavlov demonstrated classical conditioning in dogs in 1903. Watson and Rayner extended the demonstration to humans in 1920. They presented a nine-month-old infant with a white rat (the neutral stimulus). They paired the rat with a loud noise (the unconditioned stimulus) that naturally produced fear (the unconditioned response). After several pairings, the rat alone produced fear (the conditioned response). The fear then generalized to similar stimuli — other furry animals, fur-like objects, a Santa Claus beard. The implication, drawn out in subsequent chapters, is that human phobias are conditioned responses, that emotional life is shaped by learning history rather than by innate or unconscious processes, and that — in Watson’s famous 1930 boast — “give me a dozen healthy infants” and behaviorist methods could shape any of them into any kind of adult.
This is the version that has been transmitted, with minor variation, through Hilgard’s Introduction to Psychology, through Myers, through Gray, through Kalat, through nearly every survey textbook of the later twentieth century and most of those published in the twenty-first. It is the version that students leave Psychology 101 with. It is the version that gets cited in pop-science books on phobias, in cognitive-behavioral therapy textbooks introducing exposure therapy, and in popular articles on the origins of fear. It is the citation that underwrites the claim that classical conditioning is established in humans at the same basic level it is established in Pavlov’s dogs.
The actual 1920 paper does not support this version. The actual 1920 paper supports something narrower, less clean, and considerably more ethically compromised. The gap between the textbook version and the actual paper is the entire subject of this article.
What Watson And Rayner 1920 Actually Did
The 1920 paper opens with Watson and Rayner’s stated aim — to investigate whether emotional reactions could be conditioned in humans the way salivary responses had been conditioned in Pavlov’s dogs. They selected as their subject a single infant, identified in the paper only as “Albert B.”, described as the son of a wet nurse at the Harriet Lane Home for Invalid Children, a pediatric hospital affiliated with Johns Hopkins. Albert was described as healthy, “stolid and unemotional,” and as having shown no fear of a wide range of stimuli during baseline testing at approximately nine months of age — including a white rat, a rabbit, a dog, a monkey, cotton wool, burning newspapers, and a variety of masks. He had, however, reliably shown distress at the sound of a steel bar being struck with a hammer behind his head.
At approximately eleven months of age, the conditioning phase began. The procedure, as described in the paper, was as follows. The white rat was presented to Albert. As Albert reached for it, the steel bar was struck. After two such pairings on the first day, conditioning was paused for one week. At eleven months and ten days, the conditioning resumed. Five additional pairings of rat and noise were administered. Then the rat was presented alone. The paper reports that Albert “began to cry almost instantly” and “turned sharply to the left, fell over, raised himself on all fours and began to crawl away so rapidly that he was caught with difficulty before reaching the edge of the table.”
Five days later, at eleven months and fifteen days, Watson and Rayner conducted a generalization test. They presented Albert with a series of stimuli: the rat again, a rabbit, a dog, a sealskin coat, cotton wool, Watson’s own hair, a Santa Claus mask. The paper reports varying degrees of fear response to each of these stimuli, particularly to the rabbit and the dog. Watson and Rayner interpreted this as evidence that the conditioned fear had generalized along a dimension of “furry” or “animal” stimuli.
The final session reported in the paper occurred at twelve months and twenty-one days, about a month after a further session of refreshing pairings. Watson and Rayner reported that Albert’s fear responses had persisted across the interval, though with some loss of intensity. The paper notes that the researchers had intended to attempt deconditioning, that is, to extinguish the fear they had induced, but that Albert was removed from the hospital before this could be undertaken. The paper does not specify, in any detail, what became of the infant after the study ended. It mentions, almost in passing, that he was the son of a wet nurse who had then taken him from the institution.
That is the entirety of the empirical content of the most-cited study of human classical conditioning in the history of psychology.
The Single-Subject Problem
The first and most fundamental methodological objection to Watson and Rayner 1920 is that it studied one infant. Not one infant per condition. One infant total. There was no control infant who received the loud noise without the rat. There was no control infant who received the rat without the noise. There was no comparison infant tested for fear of furry stimuli at matched developmental ages without any conditioning history. The entire empirical foundation of the claim that classical conditioning applies to human emotional responses, as the study is typically cited, rests on the observed behavior of a single child.
Single-subject designs are not, in principle, illegitimate. There is an entire methodological tradition — single-case experimental design, ABA designs, multiple-baseline designs — in which carefully controlled within-subject manipulations on small numbers of participants can yield interpretable causal inferences. But that tradition imposes substantial requirements: stable baselines, repeated reversals, careful operationalization of dependent variables, blinded observers. None of these requirements are met in Watson and Rayner 1920. There was no reversal. There was no extinction phase. The dependent variable was a subjective coding of Albert’s emotional state by the researchers who had a stake in the experiment’s outcome. There was no blinded observer.
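To make the reversal requirement concrete, here is a minimal Python sketch of the logic an ABA design relies on: the measure has to shift when the manipulation is introduced and shift back when it is withdrawn. The function name `reversal_supported`, the `min_shift` threshold, and every score below are hypothetical, invented for illustration; real single-case analysis depends on visual inspection and more careful statistics than a comparison of means. Nothing here is data from Watson and Rayner, whose protocol never included a withdrawal phase to analyze.

```python
from statistics import mean


def reversal_supported(baseline, treatment, withdrawal, min_shift=1.0):
    """Crude ABA check on per-session scores (hypothetical 0-10 ratings).

    Returns True only if the mean rises by at least `min_shift` from the
    first A phase to the B phase AND falls by at least `min_shift` from
    the B phase to the second A phase.
    """
    a1, b, a2 = mean(baseline), mean(treatment), mean(withdrawal)
    shifted = (b - a1) >= min_shift    # effect appears when treatment starts
    reverted = (b - a2) >= min_shift   # effect recedes when treatment stops
    return shifted and reverted


# Invented scores for illustration only; not data from any real study.
full_aba = reversal_supported([1, 2, 1], [6, 7, 6], [2, 1, 2])
no_return = reversal_supported([1, 2, 1], [6, 7, 6], [6, 7, 7])
print(full_aba, no_return)
```

With the invented scores, the first call returns True and the second returns False. A rise without a return to baseline is exactly the pattern that leaves fatigue, maturation, or accumulating distress as live alternative explanations.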
The single-subject problem compounds when the published narrative is examined closely. The fear responses Watson and Rayner reported were not the clean, all-or-nothing responses the textbook summaries imply. On many of the generalization tests, Albert’s responses were ambiguous — sometimes whimpering, sometimes crawling away, sometimes reaching toward the stimulus, sometimes appearing curious. The paper itself, read carefully, notes these inconsistencies but interprets them, broadly, as supporting the conditioning hypothesis. A skeptical reader of the same descriptions would have considerable grounds to question whether what was being observed was conditioned fear or simply a tired, hungry, increasingly distressed infant in a strange laboratory environment with strangers handling unfamiliar animals.
Harris (1979), in a careful historical reexamination titled “Whatever happened to Little Albert?” (American Psychologist, 34, 151-160), made this point sharply. Harris demonstrated that subsequent textbook accounts had progressively cleaned up the original narrative — exaggerating the strength of the conditioning, exaggerating the breadth and consistency of the generalization, and largely omitting the methodological problems. Harris’s analysis showed that the textbook Little Albert had drifted substantially from the paper Little Albert. The drift was in the direction of cleaner, more compelling, more pedagogically useful — and less true.
Beck 2009: The Subject May Have Been Douglas Merritte
For nearly ninety years after the 1920 publication, the identity of “Albert B.” was unknown. Watson and Rayner had used a pseudonym. The records of the Harriet Lane Home from that era were incomplete. The fate of the infant was equally unknown: whether he had carried the implanted fear of furry animals into adulthood, or whether the fear had extinguished on its own.
In 2009, Hall Beck, a psychologist at Appalachian State University, published with Sharman Levinson and Gary Irons a paper titled “Finding Little Albert: A journey to John B. Watson’s infant laboratory” (American Psychologist, 64, 605-614). Beck and colleagues had spent years working through the historical records — birth registers, hospital staff lists, payroll records, photographs from the Watson laboratory — attempting to identify the wet nurse who had borne Albert. Their analysis converged on a candidate: Arvilla Merritte, a wet nurse at the Harriet Lane Home in 1919-1920. Arvilla had a son, Douglas, born approximately on the timeline that would match Albert’s reported age. Beck and colleagues argued, with substantial supporting evidence, that Douglas Merritte was Little Albert.
The finding was significant for the basic question of historical identification, but it became substantially more significant when Beck and colleagues investigated what had become of Douglas Merritte. Douglas Merritte had died at age six, of complications from hydrocephalus — a congenital condition involving abnormal accumulation of cerebrospinal fluid that, in his case, had produced severe neurological impairment. The Merritte family records indicated that Douglas had been a sickly child from infancy. Hospital records suggested that Douglas may have had identifiable neurological abnormalities even at the age at which Watson and Rayner had conducted the conditioning study.
If Beck and colleagues’ identification was correct, the implications were severe. Watson and Rayner had described their subject as healthy, normal, and stolid. If their subject was instead a neurologically impaired infant — with hydrocephalus producing potentially abnormal patterns of arousal, fear response, and learning — then the entire generalizability of the study collapses. A conditioning protocol that produces a particular response pattern in an infant with congenital hydrocephalus tells us little about how typical infants without that condition would respond. And Watson, who had conducted the study at the same hospital where the infant’s medical condition would have been documented, would have either known about the abnormalities and concealed them, or would have failed to assess the basic health of his only subject.
The Beck paper produced substantial controversy within psychology and within the history of psychology. The identification was based on inference from incomplete records. Other interpretations of the same records were possible. But the basic question Beck raised — was Albert healthy as Watson claimed, or was Watson’s only subject neurologically impaired — could not be dismissed simply because the identification was contested.
Powell 2014: An Alternative Identification
In 2014, a research team led by Russell Powell, with Nancy Digdon, Ben Harris, and Christopher Smithson, published “Correcting the record on Watson, Rayner, and Little Albert: Albert Barger as ‘psychology’s lost boy’” (American Psychologist, 69, 600-611). Powell and colleagues, working through additional archival records, proposed an alternative identification: that Little Albert was not Douglas Merritte but William Albert Barger, the son of a different wet nurse at the same institution. William Albert Barger had, on Powell’s analysis, no documented neurological abnormalities. He had lived a long and apparently unremarkable life and had died in 2007. His family had no oral tradition of his having been a participant in a famous psychology experiment.
The Powell identification, if correct, would resolve part of the Beck objection. It would mean that Watson’s subject was a healthy infant after all. It would also shift attention to how Watson had described the subject’s age, weight, and developmental milestones in the 1920 paper, details that Powell and colleagues argued matched Barger’s documented characteristics better than Merritte’s.
The competing identifications have not been definitively resolved. The empirical case for each rests on different aspects of the surviving documentary record. Within the history of psychology, both identifications remain contested. The Beck team has continued to defend the Merritte identification in subsequent papers; the Powell team has continued to defend the Barger identification.
What is settled, regardless of which identification is correct, is that the question of whether Watson’s subject was healthy or impaired was a serious one, was for nearly a century not even asked, and remains contested today. A research enterprise that proposes itself as the empirical foundation of a science of human behavior cannot operate on a single subject whose basic medical status, ninety years later, is still in dispute.
Digdon 2014: The Film Reanalysis
In addition to the published 1920 paper, Watson left behind film footage of portions of the conditioning sessions. The footage was used by Watson during his lifetime in lectures and demonstrations and has been preserved in archival collections. The film is silent, low-resolution by modern standards, and incomplete — it captures portions of various sessions but not the full protocol. It is, nonetheless, a primary-source documentary record of what actually happened in the laboratory.
In 2014, Nancy Digdon, Russell Powell, and Ben Harris published “Little Albert’s alleged neurological impairment: Watson, Rayner, and historical revision” (History of Psychology, 17, 312-324). The paper engaged with the Beck team’s claim that Watson’s subject was neurologically impaired, examining the film footage frame by frame for evidence that would support or undermine that claim. The Digdon analysis was inconclusive on the neurological impairment question itself but produced an arguably more damaging set of observations about the fear conditioning narrative.
The film, on careful frame-by-frame examination, does not show the clean, generalized fear response that Watson and Rayner described in the 1920 paper. The infant’s responses to the rat and to the other furry stimuli are inconsistent across sessions and across stimuli. Sometimes the infant reaches for the rat. Sometimes the infant ignores it. Sometimes the infant whimpers in a way that could be coded as fear but could equally be coded as fatigue or hunger. The “generalization” to the rabbit, the dog, and the sealskin coat — described in the textbook accounts as a clean and orderly cascade — is, on the film, considerably less clean. Some of the generalization responses required the rat to be reintroduced alongside the new stimulus to elicit any fear at all.
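The coding ambiguity can be put in quantitative terms. When two observers independently label the same clips, a standard summary is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal Python illustration; the category labels and both coders’ ratings are invented, and it is not a reconstruction of any coding scheme used by Digdon and colleagues.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same sequence of items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    # Proportion of items on which the two coders gave the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codings of eight film clips; not real data.
coder_1 = ["fear", "fear", "fatigue", "fear", "neutral", "fear", "fatigue", "neutral"]
coder_2 = ["fear", "fatigue", "fatigue", "neutral", "neutral", "fear", "fear", "neutral"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Values near 1 indicate that coders see the same thing; values well below that indicate that “fear” is partly in the eye of the observer. The 1920 paper offers nothing comparable, since the only coders were the researchers themselves.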
The film reanalysis is consistent with the Harris 1979 critique: the textbook version of Little Albert has progressively cleaned up the original observations. But the film analysis goes further than Harris did, because it allows direct examination of the primary behavior rather than reliance on Watson’s written descriptions. The film suggests that Watson, in writing up the 1920 paper, selected and described those observations that supported his hypothesis while downplaying or omitting observations that complicated it. This is not p-hacking in the modern statistical sense, because Watson was not running statistical tests. But it is the methodological cousin of p-hacking: the selective reporting of supportive observations from a body of evidence that, taken as a whole, was considerably more ambiguous than the published account implied.
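The effect of selective reporting on an ambiguous record is easy to simulate. In the minimal Python sketch below, a fixed share of trials shows a fear-like response, every such trial is written up, and the remaining trials are written up only some of the time. The function name `simulate_selective_reporting` and all of its parameter values are assumptions chosen for illustration, not estimates of anything in Watson’s sessions.

```python
import random


def simulate_selective_reporting(n_trials=40, p_fear=0.5,
                                 p_report_nonfear=0.2, seed=0):
    """Return (rate in all trials, rate in reported trials).

    Each trial is fear-like with probability `p_fear`. Fear-like trials
    are always reported; other trials are reported with probability
    `p_report_nonfear`. All values are illustrative assumptions.
    """
    rng = random.Random(seed)
    trials = [rng.random() < p_fear for _ in range(n_trials)]  # True = fear-like
    reported = [t for t in trials if t or rng.random() < p_report_nonfear]
    true_rate = sum(trials) / len(trials)
    reported_rate = sum(reported) / len(reported) if reported else 0.0
    return true_rate, reported_rate


true_rate, reported_rate = simulate_selective_reporting()
print(f"all trials: {true_rate:.2f}  reported trials: {reported_rate:.2f}")
```

Because supportive trials are never dropped and others sometimes are, the reported rate can only match or exceed the rate in the full record. No statistical test is involved, which is the sense in which selective description is a cousin of p-hacking rather than p-hacking itself.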
The Ethics, By 1920 Standards And By Modern Standards
The Watson and Rayner study would not be approved today by any institutional review board in any country with a functioning research ethics infrastructure. Deliberately inducing fear in a non-consenting infant — particularly an infant whose caregivers may not have given fully informed consent and who, per Watson’s own account, was never deconditioned — violates basic principles of beneficence, non-maleficence, and informed consent that are now considered foundational to research with human subjects, particularly with children.
The objection sometimes raised in defense is that 1920 was a different era and that contemporary ethical standards should not be applied retroactively. This is partially correct. The Nuremberg Code did not exist in 1920. The Declaration of Helsinki did not exist. The Belmont Report did not exist. Institutional review boards as we now understand them did not exist. Watson was not violating formal codes that existed at the time.
But the objection only goes so far. Concerns about the welfare of children in research were not unknown in 1920. There was, even in the period, a discernible discomfort within the research community about the conditioning study — a discomfort that Watson himself was aware of and that contributed to his subsequent professional difficulties (which were ultimately precipitated by other factors, notably his affair with Rayner and his consequent dismissal from Johns Hopkins). The post-hoc claim that “nobody knew it was wrong” is contradicted by the fact that even contemporaries found the work uncomfortable. And the failure to decondition — Watson’s explicit acknowledgment that he had intended to extinguish the fear but failed to do so before the infant was removed from the institution — is a failure that does not require modern ethics frameworks to recognize as a failure.
Why The Study Is Still In The Textbooks
If Watson and Rayner 1920 is a single-subject case study with no controls, ambiguous fear responses, possible subject impairment, and serious ethics violations, why does it remain in every introductory psychology textbook a century later? There are several reasons, and none of them are good.
The study is pedagogically convenient. Classical conditioning is one of the foundational concepts taught in introductory psychology. The instructor needs a concrete human example to ground what would otherwise be an abstract animal-learning paradigm. Little Albert is the canonical example. There is no replacement example of comparable narrative compactness. A textbook author who wants to introduce classical conditioning to humans in two pages reaches for Little Albert because there is no two-page replacement.
The contemporary alternatives are mostly fragmented or less famous. There are subsequent, better-controlled demonstrations of classical conditioning: conditioned taste aversions (Garcia 1955, originally demonstrated in rats), eyeblink conditioning (the work of Larry Squire and colleagues with amnesic patients), and conditioned skin conductance responses to fear-relevant stimuli in humans (the work of Arne Öhman). These studies are individually more methodologically sound than Watson and Rayner but are less narratively unified. None of them produces the compact, dramatic, easy-to-remember narrative that Little Albert produces.
The textbook author’s incentive is conservation, not reform. A textbook in its eleventh edition has substantial inertia. The Little Albert section has been in every previous edition. Removing it requires the textbook author to defend the removal to reviewers, to instructors who have built their lecture notes around the section, and to students who took the previous edition and would lose continuity. Adding the methodological caveats — single subject, no controls, contested identification, possible neurological impairment — is the path of least resistance. The result is that the Little Albert section in most contemporary textbooks now has a paragraph noting the controversy, while the canonical narrative of the conditioning and generalization remains intact.
The behaviorist program retained substantial professional momentum. Even after behaviorism’s mid-century decline as a dominant theoretical paradigm, the techniques of behavior therapy, applied behavior analysis, and exposure therapy retained — and still retain — substantial clinical use. Little Albert is the origin story for the inferential chain that runs from classical conditioning of fear to systematic desensitization to in vivo exposure to the modern empirically-supported treatments for specific phobia. The clinical traditions that emerged downstream from Watson have an incentive to preserve the origin story, even as the methodological status of that story has deteriorated.
The honest treatment of Watson and Rayner 1920, in a contemporary introductory psychology textbook, would be a single paragraph noting that the study has historical significance as the first attempt to demonstrate classical conditioning of emotional responses in humans, was conducted on a single infant with no controls, may have been conducted on a neurologically impaired subject whose impairment was not disclosed, and produces fear-response data that — on reanalysis of the surviving film footage — does not cleanly support the generalization claims the paper made. The two-paragraph version that currently appears in most textbooks does not honestly represent the methodological status of the study, and a CEO or founder or consultant who has acquired their model of classical conditioning from those textbooks should treat that model as substantially less well-grounded than the textbook presentation implies.
What This Tells Us About How To Read Foundational Studies
The lesson of Little Albert, like the lesson of Bandura’s Bobo doll, like the lesson of the Stanford Prison Experiment, is that the canonical introductory-textbook version of a foundational study is, almost always, a substantially cleaned-up and pedagogically simplified version of a primary source that — on careful reading — supports a considerably narrower claim. The cleanup happens in steps. The original paper, written by a researcher with a stake in the finding, already presents the strongest case the data supports. The secondary literature, citing the original paper, summarizes that case in compact form. The tertiary literature — including textbooks — further compacts and simplifies. By the time the study reaches the introductory student or the lay reader, it has been smoothed into a clean narrative that the original data, examined directly, does not support.
This pattern is not unique to behaviorism. It is the default failure mode of how scientific findings move from the primary literature into general knowledge. The remedy is the discipline of reading primary sources before citing them. For a CEO or founder making decisions about product, marketing, or organizational design on the basis of behavioral-science findings, the discipline matters concretely: a model of human behavior built on textbook summaries of foundational studies will be substantially more confident, and substantially less accurate, than a model built on careful reading of what the underlying studies actually established.
The remediation, where Little Albert specifically is concerned, is straightforward. Classical conditioning in humans is a real phenomenon. Modern fear-conditioning research — using skin conductance, fMRI, and standardized paradigms with large samples and proper controls — has established it on much sounder empirical ground than Watson and Rayner ever did. The clinical applications, including exposure therapy for specific phobia, have substantial independent empirical support. None of the modern science depends on Watson and Rayner being correct. The study can be honestly described as a flawed historical first attempt — with the methodological problems, the ethics problems, and the contested subject-identification problems acknowledged — without compromising the modern science. The damage done by the textbook version is not in the underlying scientific content but in the credulity it instills: a generation of students who learn that “there’s a famous experiment that proves it” without learning that famous experiments often prove considerably less than they are cited to prove.
Primary Sources
Watson, J. B., & Rayner, R. (1920). Conditioned emotional reactions. Journal of Experimental Psychology, 3(1), 1-14. DOI: 10.1037/h0069608
Harris, B. (1979). Whatever happened to Little Albert? American Psychologist, 34(2), 151-160. DOI: 10.1037/0003-066X.34.2.151
Beck, H. P., Levinson, S., & Irons, G. (2009). Finding Little Albert: A journey to John B. Watson’s infant laboratory. American Psychologist, 64(7), 605-614. DOI: 10.1037/a0017234
Digdon, N., Powell, R. A., & Harris, B. (2014). Little Albert’s alleged neurological impairment: Watson, Rayner, and historical revision. History of Psychology, 17(4), 312-324. DOI: 10.1037/a0037325
Powell, R. A., Digdon, N., Harris, B., & Smithson, C. (2014). Correcting the record on Watson, Rayner, and Little Albert: Albert Barger as “psychology’s lost boy.” American Psychologist, 69(6), 600-611. DOI: 10.1037/a0036854
Related Cases In The Replication Crisis Hub
The Little Albert case sits within a broader pattern of foundational studies in psychology whose textbook versions have drifted substantially from what the primary research established. Several related cases are examined elsewhere in the replication crisis hub:
- Bandura’s Bobo Doll — The 1961 and 1963 imitation studies whose actual findings were considerably more modest than the media-violence inferences drawn from them.
- The Stanford Prison Experiment — Zimbardo’s 1971 simulation whose narrative collapsed under archival reexamination in the 2010s.
- Milgram’s Obedience Studies — The 1961-1962 experiments whose dramatic 65% obedience figure obscures substantial methodological and replication issues.
- Schachter-Singer Two-Factor Theory Of Emotion — The 1962 epinephrine study whose two-factor theory of emotion has not held up in subsequent replication attempts.
- Mirror Neurons — The 1990s primate findings whose extension to human empathy and social cognition has substantially outpaced the supporting data.
Frequently Asked Questions
Was Little Albert really conditioned to fear the white rat?
The 1920 paper claims he was. The surviving film footage, on careful reanalysis by Digdon, Powell, and Harris (2014), shows responses considerably more inconsistent and ambiguous than the paper described. The single-subject design with no controls makes it impossible to know whether what was observed was conditioned fear, normal infant distress at unfamiliar stimuli in a stressful environment, or some combination of both.
Did Little Albert have a lifelong phobia?
This is unknown. Watson and Rayner did not follow up. The infant was removed from the hospital before any deconditioning was attempted. If Powell’s identification of Albert as William Albert Barger is correct, Barger lived an apparently unremarkable life and died in 2007 with no documented animal phobias. If Beck’s identification as Douglas Merritte is correct, Merritte died at age six of complications from hydrocephalus, before any phobia history could be assessed.
Was Watson aware that his subject might have been impaired?
If the Beck identification is correct, the question is whether Watson, conducting the study at the same hospital where Douglas Merritte’s medical condition would have been documented, either knew about the impairment and concealed it, or failed to perform basic medical screening of his only subject. Either possibility is a serious indictment. If the Powell identification is correct, the question is moot, since Barger had no documented impairment, but the broader question of why the study was conducted on a single unscreened subject still stands.
Why is this study still taught if it has these problems?
Pedagogical convenience, textbook inertia, the absence of an equally compact replacement example, and the downstream clinical traditions (behavior therapy, exposure therapy) that have an institutional stake in preserving the origin story. The honest treatment would be a brief acknowledgment of the historical significance alongside an explicit statement of the methodological problems. Most contemporary textbooks include a paragraph of caveats while preserving the canonical narrative — a compromise that does not serve students well.
Does classical conditioning of fear actually occur in humans?
Yes — but the evidence comes from substantially better-designed modern studies using skin conductance, startle responses, fMRI, and properly controlled paradigms with large samples. Modern fear-conditioning research stands on its own empirical foundation and does not require Watson and Rayner to have been correct. The clinical applications, including exposure therapy for specific phobia, have substantial independent empirical support. The damage done by the textbook version of Little Albert is not to the underlying science but to students’ calibration of how strong the evidence for famous foundational studies actually is.
What is the broader lesson for evaluating behavioral science research?
That the textbook version of a foundational study is, almost always, a cleaner and more compelling version than the primary source supports. Reading primary sources, attending to sample sizes, asking about controls, and tracking how a finding has been replicated (or not) in the subsequent decades are not optional steps for anyone making decisions on the basis of behavioral-science findings. The discipline of evidence evaluation is, as much as anything, the discipline of resisting the pedagogically convenient narrative.