For nearly 70 years, methodology textbooks have cited the same story: workers at the Hawthorne Works got more productive when they were watched, regardless of what changed. The data the story was supposedly based on were thought lost for decades. When economists Levitt and List recovered them in 2011, they found that the famous pattern was, in their words, entirely fictional. Here is what the original studies actually showed, who coined the term and when, and why a misread of a 1924 experiment still anchors arguments about open offices and constant feedback in 2026.

There is a story that nearly every business school graduate, organizational psychologist, and management consultant has heard. At a Western Electric factory called the Hawthorne Works, in Cicero, Illinois, in the mid-1920s, researchers tried to figure out whether brighter lighting made workers more productive. They turned the lights up, and productivity rose. Confused, they turned the lights down, and productivity rose again. They turned them very low, almost to moonlight, and productivity still rose. The conclusion: it wasn’t the lighting. It was the fact of being watched. Workers, knowing they were being studied, worked harder regardless of what was changed. The phenomenon got a name, the “Hawthorne effect,” and it became one of the most-cited findings in the social sciences.

That story is used today to justify a remarkable range of management practices. It is cited to defend open-plan offices (“transparency keeps people on task”), constant feedback culture (“being measured is itself motivating”), surveillance-style productivity software (“once you measure it, it improves”), and the methodological caution that any organizational experiment is hopelessly contaminated by the fact that participants know they’re being studied. The story is told in introductory psychology, in HR certification courses, in research methods textbooks, in change-management seminars, and in TED Talks.

Almost none of that empirical claim is supported by the original data. The original illumination experiment data were thought to have been lost for decades. When economists Steven Levitt and John List recovered and reanalyzed them in 2011, what they found in the actual recorded production figures bore little resemblance to the famous narrative. The “every change in lighting raised productivity” pattern, in their words, was “entirely fictional.” Earlier critiques (John Adair in 1984, Stephen Jones in 1992) had already shown that the canonical interpretation could not be derived from the data. The term itself, “Hawthorne effect,” was coined by a sociologist named Henry Landsberger in 1958, about a quarter-century after the studies ended, and was applied retroactively to a reinterpretation of someone else’s published summary.

This article walks through what the studies actually were, what the original investigators published, when and how the now-famous label got attached, what the modern reanalyses have found, and what is honest to say in 2026 about whether being observed changes behavior at work. The bottom line: there is some evidence that research participation can shift behavior in some settings by modest, heterogeneous amounts. There is no single, clean, reliable “Hawthorne effect” of the kind every textbook describes. The story is older than the term, the term is younger than the data, and the interpretation is wider than the evidence.

What the Hawthorne Studies Actually Were (1924–1932)

The studies usually grouped under the “Hawthorne” label were not a single experiment. They were a roughly eight-year sequence of distinct investigations conducted at the Hawthorne Works of the Western Electric Company, a manufacturing plant in Cicero, Illinois, that at its peak employed about 45,000 people producing telephone equipment. The studies are conventionally divided into three main phases.

The Illumination Experiments (1924–1927). Sponsored initially by the National Research Council of the National Academy of Sciences, the illumination experiments tested whether changes in workspace lighting affected worker output. Different test rooms received different lighting conditions and were compared to control rooms. The experiments were the first phase, and they are the studies most often invoked as “evidence” for the Hawthorne effect. They are also the studies whose data were thought lost and which Levitt and List recovered and reanalyzed in 2011.

The Relay Assembly Test Room (1927–1932). Conducted with the involvement of Harvard researchers Elton Mayo and Fritz Roethlisberger, this phase took a small group of five young women who assembled telephone relays, moved them from the main production floor into a special test room, and systematically varied their working conditions: rest breaks, work hours, pay incentives, refreshments. The widely cited claim is that production rose under almost every condition tested, including when conditions were returned to the original baseline.

The Bank Wiring Observation Room (1931–1932). A group of fourteen male workers who assembled terminal banks for telephone exchanges were observed in a dedicated room. Unlike the relay assembly study, the bank wiring study did not vary working conditions experimentally; it was primarily an observational study of group dynamics. The famous finding from this phase was not a productivity bump but the opposite: workers informally enforced group norms that restricted output, with social pressure (“rate busters,” “chiselers,” “squealers”) used to keep individual production in line with what the group considered fair.

This last point is worth pausing on, because it is almost never mentioned in the popularized “Hawthorne effect” story. The bank wiring study, which used essentially the same kind of intensive observation as the relay assembly study, did not produce a productivity bump from being watched. It produced documented evidence that workers under observation actively held their output down through informal social control. If “being watched makes people more productive” were a general phenomenon, the bank wiring room should have shown it. It did not.

The studies were summarized in two influential books written after the fact. Elton Mayo’s The Human Problems of an Industrial Civilization (Macmillan, 1933) was a sweeping interpretive essay that placed the Hawthorne findings into a broader argument about human motivation and the limits of “scientific management.” Fritz Roethlisberger and William Dickson’s Management and the Worker (Harvard University Press, 1939) was a much longer, more detailed report of the experiments themselves, organized around the relay assembly and bank wiring phases. Both books advanced what came to be called the “human relations” school of management --- the idea that worker productivity is shaped less by physical conditions and pay than by social and psychological factors, attention from supervisors, and group dynamics.

Neither Mayo nor Roethlisberger and Dickson actually argued for the “Hawthorne effect” as later researchers came to define it. They argued for the importance of social and supervisory attention. The specific claim that the act of being studied, independent of the conditions changed, raises productivity in a measurable way is a later interpretation, narrower than the original argument, that got attached to the studies long after they were over.

What Roethlisberger and Dickson Published in 1939

Management and the Worker is a 615-page book. It does not contain the simple, vivid story now associated with the Hawthorne effect. It contains a detailed description of the experiments, the working conditions, the production records, the interview program, and the bank wiring observations. Its interpretive thrust is the “human relations” framing: that supervisory attention, group cohesion, and the social meaning of work are major determinants of productivity, often more so than physical conditions or piece-rate pay.

Two things are true about Management and the Worker that the modern reception of the studies has flattened.

The illumination experiments are given limited and ambiguous treatment. The book is principally about the relay assembly and bank wiring phases. The illumination experiments, which preceded the Harvard involvement, are summarized briefly and treated as having produced ambiguous results --- including the observation that productivity sometimes rose under lower illumination, which the authors did not interpret as proof of an “observer effect” but as evidence that illumination, within tolerable ranges, was not the primary driver of output.

The relay assembly findings are not presented as a clean “observation effect.” Roethlisberger and Dickson discuss multiple factors that could have driven the production rises in the relay assembly room: changes in supervisory style, the supportive relationship the women developed with the experimenters, the rest breaks, the changes in pay structure (the relay assembly workers were moved to a smaller piece-rate group, which gave each member a larger share of the group’s output), the selection of the five workers, and the social cohesion of the small team. They do not isolate “being studied” as a discrete causal factor with a measurable effect.

The interpretive leap from “supervisory attention and social factors matter for productivity” to “being studied, in itself, raises productivity in a clean, generalizable way” happened later, in the secondary literature. It is the secondary literature, not the original report, that produced the textbook Hawthorne effect.

When the “Hawthorne Effect” Was Actually Named

The term “Hawthorne effect” does not appear in Mayo (1933). It does not appear in Roethlisberger and Dickson (1939). It does not appear in the contemporaneous discussion of the studies through the 1940s and into the 1950s. The term was coined by sociologist Henry A. Landsberger in his 1958 book Hawthorne Revisited (Cornell University), which was itself a reconsideration of the studies a quarter-century after they ended.

Some scholars credit John R. P. French with using a similar phrase as early as 1953, but Landsberger’s Hawthorne Revisited is the source from which the term entered general use. Landsberger’s definition was modest: a short-term improvement in performance caused by observing worker behavior, which he viewed as a methodological complication researchers needed to be aware of rather than as a major substantive finding about human motivation.

The chronology matters. The data were collected from 1924 to 1932. The first major published interpretations appeared in 1933 (Mayo) and 1939 (Roethlisberger and Dickson). The term that now dominates the discussion was applied retroactively in 1958, by a third author summarizing a particular interpretation of those earlier publications. By the time the term became standard in methodology textbooks in the 1960s and 1970s, two reinterpretive layers separated readers from the original data --- and the original data themselves were nearly impossible to access, because the records were thought to have been lost or destroyed.

This is a quietly remarkable situation. The textbook version of the Hawthorne effect is the popularization of Landsberger’s 1958 framing of Roethlisberger and Dickson’s 1939 framing of data collected between 1924 and 1932. Each layer simplified and sharpened the previous one. The original investigators did not claim what the textbook claims. The eventual textbook claim was anchored in interpretive summaries rather than the raw production records. And for most of the period in which the “Hawthorne effect” was taught as established fact, the raw production records were not available for anyone to check.

Levitt and List 2011: What the Original Data Actually Show

The most important recent development in the Hawthorne literature is Steven D. Levitt and John A. List’s paper “Was There Really a Hawthorne Effect at the Hawthorne Plant? An Analysis of the Original Illumination Experiments,” published in the American Economic Journal: Applied Economics in 2011 (DOI: 10.1257/app.3.1.224). Levitt and List located archived data from the 1924–1927 illumination experiments, data that had been widely assumed to be lost or destroyed, and conducted a modern statistical reanalysis.

Their central finding, in the language of the paper itself, is that “existing descriptions of supposedly remarkable data patterns prove to be entirely fictional.” The famous narrative --- that every change in illumination, including reductions, produced increases in productivity --- cannot be derived from the actual production records. The data do not show the clean, universal “more output regardless of lighting direction” pattern that has been confidently described in textbooks for decades.

What the data do show is more complicated. Production patterns at the Hawthorne plant varied with a number of factors, including the time of day, the day of the week, and --- importantly --- confounding events that coincided with the experimental manipulations. One example Levitt and List flagged: experimental lighting changes were often introduced at specific points in the work week (commonly Mondays), and there was a separate pattern of production being higher on Mondays for reasons unrelated to lighting. Earlier analyses that attributed Monday production jumps to “the new lighting condition” were attributing to one variable what was at least partially driven by another.
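The day-of-week confound can be made concrete with a small simulation. The sketch below is illustrative only: it uses invented numbers, not the Hawthorne production records, and it assumes a six-day work week in which every lighting change lands on a Monday and Mondays carry a fixed output boost. The function name simulate_output and all of its parameters are hypothetical. The true lighting effect is set to zero, yet a naive comparison of change days against all other days still shows a gap.

```python
import random
import statistics


def simulate_output(weeks=40, monday_boost=3.0, lighting_effect=0.0, seed=0):
    """Simulate daily output where lighting changes always begin on a Monday.

    All numbers are hypothetical. `lighting_effect` is the true causal effect
    of a lighting change on that day's output; it defaults to zero.
    """
    rng = random.Random(seed)
    records = []  # tuples of (is_monday, lighting_changed_today, output)
    for week in range(weeks):
        change_this_week = week % 2 == 0  # assumed: a new level every other week
        for day in range(6):  # assumed six-day work week, day 0 is Monday
            is_monday = day == 0
            changed_today = change_this_week and is_monday
            output = 100.0 + rng.gauss(0, 1.0)
            if is_monday:
                output += monday_boost
            if changed_today:
                output += lighting_effect
            records.append((is_monday, changed_today, output))
    return records


records = simulate_output()
change_days = [out for mon, chg, out in records if chg]
other_days = [out for mon, chg, out in records if not chg]
plain_mondays = [out for mon, chg, out in records if mon and not chg]

# Naive comparison: days with a lighting change versus every other day.
naive_gap = statistics.fmean(change_days) - statistics.fmean(other_days)
# Like-for-like comparison: change Mondays versus Mondays without a change.
adjusted_gap = statistics.fmean(change_days) - statistics.fmean(plain_mondays)

print(f"naive gap:    {naive_gap:.2f}")
print(f"adjusted gap: {adjusted_gap:.2f}")
```

In this toy setup the naive gap recovers most of the Monday boost, while the like-for-like gap hovers near zero. That is the shape of the misattribution Levitt and List describe; it says nothing about the size of any effect in the real data.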

Levitt and List did not conclude that no observation effect could ever have existed. They proposed a more careful method for testing for it, comparing responses to experimenter-induced variation against responses to naturally occurring variation in the same workplace, and found suggestive but small effects of this kind in the recovered data. Their overall conclusion was that the original Hawthorne illumination experiments were seriously flawed and that the data warrant “no clear lessons” of the kind commonly drawn from them.

This is the most rigorous statistical look that has been taken at the original data, and it is a substantial deflation of the textbook story. The famous illumination pattern --- the most-cited single piece of evidence for the Hawthorne effect --- is not visible in the original production records when those records are analyzed properly.

What Other Critiques Have Found

The Levitt and List paper is the most recent and most data-grounded critique, but it is not the first. Skepticism about the textbook Hawthorne effect has been part of the methodological literature for decades.

Stephen R. G. Jones (1992), “Was There a Hawthorne Effect?” in the American Journal of Sociology (DOI: 10.1086/230046), reexamined the Relay Assembly Test Room data using more modern statistical techniques. Jones concluded that the data show “slender or no evidence” of a Hawthorne effect in the relay assembly experiments. The production rises that occurred in that test room are better explained by changes in pay structure, the small-group piece-rate incentive, and selection effects than by a generalized “knowing one was being studied” mechanism.

John G. Adair (1984), “The Hawthorne Effect: A Reconsideration of the Methodological Artifact,” in the Journal of Applied Psychology (DOI: 10.1037/0021-9010.69.2.334), surveyed the empirical literature on Hawthorne-like effects in field experiments and found that the term had been used so loosely as to be nearly meaningless. Adair argued that the methodological concept that researchers should worry about --- participants’ reactions to being studied --- was real but did not correspond to the simple “more output regardless of conditions” claim associated with the Hawthorne plant data. Different studies invoking the “Hawthorne effect” were referring to different phenomena, often with little overlap.

Across these critiques, a consistent pattern emerges. The popular “Hawthorne effect” story does not survive careful reanalysis of the data it is supposedly based on. The phenomenon that motivated the term --- research participation affecting behavior --- is real in some forms in some settings, but it is not the clean, universal, large effect the textbooks describe. The story has been told and retold because it is methodologically convenient and rhetorically vivid, not because it accurately summarizes what happened in the Hawthorne plant.

Why the Story Stuck

Several factors help explain why the Hawthorne narrative has been so resilient in spite of decades of critique.

Narrative power. “They turned the lights down and productivity rose” is a perfect classroom story. It is short, vivid, counterintuitive, and morally tidy. It comes with a memorable name and a single iconic location. It explains a complex phenomenon (worker motivation) with a simple mechanism (attention). Stories with those properties propagate far beyond the data that support them.

Methodological convenience. Researchers conducting field experiments need a way to discuss the possibility that participants’ awareness of being studied could affect outcomes. The “Hawthorne effect” provided a ready shorthand. Once the term existed in the methodological vocabulary, it served a function --- flagging a real concern --- that kept it in circulation even as the empirical basis weakened.

Citation laziness. The textbook account of the Hawthorne studies has been copied from textbook to textbook for sixty years. Most secondary citations refer to other secondary sources, not to the original Roethlisberger and Dickson volume, much less to the raw data. Errors and oversimplifications in early summaries became canonical because the citation chain effectively shielded them from primary-source review.

Useful to many sides of management debates. The story is invoked to support pay-for-attention motivational arguments, transparency and feedback cultures, surveillance-based productivity tools, and --- paradoxically --- as a methodological argument that any workplace experiment is hopelessly contaminated by observer effects. A story that supports many different uses gets cited more often than one that supports a narrower argument. The Hawthorne effect’s flexibility has been part of its survival.

The data were inaccessible. For most of the period during which the textbook story solidified, the raw data from the illumination experiments were assumed lost. There was no way for an interested researcher to check. Levitt and List’s 2011 recovery of the data is a relatively recent event in the history of the literature. The story had been settled for decades by then.

What’s Honest to Say About Observation Effects Now

The most useful recent treatment of what the empirical literature actually supports is Jim McCambridge, John Witton, and Diana R. Elbourne (2014), “Systematic Review of the Hawthorne Effect: New Concepts Are Needed to Study Research Participation Effects,” in the Journal of Clinical Epidemiology (DOI: 10.1016/j.jclinepi.2013.08.015). The authors identified nineteen purposively designed studies (randomized controlled trials, quasi-experimental studies, and observational evaluations) that attempted to estimate the size of research participation effects on participant behavior.

The published estimates were heterogeneous. Across the eight randomized controlled trials, the pooled odds ratio was small and not statistically significant (about 1.06, 95 percent CI 0.98–1.14). Across the observational studies, a larger effect was found (about 1.29). The overall pooled estimate across all nineteen studies was a modest 1.17 (95 percent CI 1.06–1.30). Importantly, the studies varied widely in what they operationalized as “being studied”: being directly observed, knowing one’s behavior was being measured, completing baseline questionnaires that focused attention on the behavior, and so on.
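Odds ratios are easy to misread, so a little arithmetic may help. The sketch below does two things with the randomized-trial and overall figures quoted above. First, it checks that each point estimate sits near the geometric midpoint of its interval, which is what one would expect if the interval was built symmetrically on the log scale (an assumption about how the intervals were computed, not something stated in the text). Second, it converts an odds ratio of 1.17 into a change in probability at a few baseline rates. The helper names and the baseline rates are hypothetical, chosen only for illustration.

```python
import math


def implied_point_estimate(ci_low, ci_high):
    """Geometric midpoint of a ratio-scale confidence interval.

    Assumes the interval is symmetric on the log scale, as is typical for
    odds ratios, so the point estimate should be near sqrt(low * high).
    """
    return math.sqrt(ci_low * ci_high)


def shifted_probability(baseline_p, odds_ratio):
    """Probability obtained after multiplying the baseline odds by odds_ratio."""
    odds = baseline_p / (1.0 - baseline_p) * odds_ratio
    return odds / (1.0 + odds)


# Figures quoted in the text: RCTs 1.06 (0.98 to 1.14), overall 1.17 (1.06 to 1.30).
rct_mid = implied_point_estimate(0.98, 1.14)
overall_mid = implied_point_estimate(1.06, 1.30)
print(f"RCT midpoint:     {rct_mid:.3f}")
print(f"overall midpoint: {overall_mid:.3f}")

# Hypothetical baseline rates, to show what an odds ratio of 1.17 means in
# probability terms.
for baseline in (0.10, 0.30, 0.50):
    shifted = shifted_probability(baseline, 1.17)
    print(f"baseline {baseline:.2f} -> {shifted:.3f}")
```

Under these assumptions, an odds ratio of 1.17 moves a behavior that half of participants would show anyway to roughly 54 percent, and the shift in percentage points is smaller at lower baselines. That is a real but modest change, far from the dramatic effect the textbook story implies.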

McCambridge and colleagues drew a careful conclusion: “Consequences of research participation for behaviors being investigated do exist, although little can be securely known about the conditions under which they operate, their mechanisms of effects, or their magnitudes.” And, most clearly: “There is no single Hawthorne effect.”

That is the honest summary of what 90 years of subsequent research can say about the effect the famous studies supposedly gave rise to. There are real, heterogeneous, generally modest effects of various forms of research participation on various behaviors. They depend on the nature of the participation, the behavior being studied, the population, and many other factors. They are not well captured by a single named phenomenon, and they are not large enough or reliable enough to justify the sweeping claims commonly made about workplace observation under the “Hawthorne effect” banner.

What This Means for Leaders and Researchers

The collapse of the textbook Hawthorne effect has practical implications for two distinct audiences.

For leaders making management decisions. If you are designing an open-plan office, choosing a productivity-monitoring tool, deciding how visible team metrics should be, or building a feedback culture, do not anchor the decision on appeals to the Hawthorne effect. The empirical basis for “being watched makes workers more productive” is much weaker than the popular literature suggests. The strongest single piece of cited evidence, the illumination experiments, does not show the pattern when the original data are properly analyzed. The bank wiring observation room, which used similarly intense observation, actually showed workers using social pressure to restrict output. Whatever case you want to make for transparency, feedback, monitoring, or open-plan layouts, make it on the actual merits and tradeoffs of those choices in your context. Do not lean on a contested 1924–1932 dataset that has been repeatedly reinterpreted and now does not seem to show what it was famous for showing.

For researchers designing studies. The methodological concern that participants’ awareness of being studied could affect outcomes is genuine. But it is also heterogeneous and not well captured by a single named effect. McCambridge and colleagues’ recommendation is to be specific: ask which mechanism of “research participation effect” you are worried about for your particular study (being observed? knowing the outcome is being measured? completing a baseline questionnaire that focuses attention?), and design controls that address that specific mechanism. The blanket invocation of “the Hawthorne effect” as a reason to dismiss field-experimental findings, or to attribute any unexpected positive result to “the participants knew they were being watched,” is, at this point, a misuse of a contested concept. There is no single, reliable effect of that name to invoke. There are several different phenomena, each with its own evidentiary status, that deserve separate consideration.

The broader lesson is the one that runs through nearly every entry in the replication crisis hub: a famous finding is often a tidy story imposed on messy data, retold confidently for so long that the original messiness becomes invisible. The Hawthorne effect is a particularly clean example because it has all the canonical features: original data initially lost, a memorable name applied decades after the fact, a confident textbook story that does not match the raw records, and a long delay before anyone could actually go back and check. That checking has now happened. The story does not survive it.

Sources

  • Adair, J. G. (1984). The Hawthorne effect: A reconsideration of the methodological artifact. Journal of Applied Psychology, 69(2), 334–345. DOI: 10.1037/0021-9010.69.2.334
  • Jones, S. R. G. (1992). Was there a Hawthorne effect? American Journal of Sociology, 98(3), 451–468. DOI: 10.1086/230046
  • Landsberger, H. A. (1958). Hawthorne Revisited. Cornell University.
  • Levitt, S. D., & List, J. A. (2011). Was there really a Hawthorne effect at the Hawthorne plant? An analysis of the original illumination experiments. American Economic Journal: Applied Economics, 3(1), 224–238. DOI: 10.1257/app.3.1.224
  • Mayo, E. (1933). The Human Problems of an Industrial Civilization. Macmillan.
  • McCambridge, J., Witton, J., & Elbourne, D. R. (2014). Systematic review of the Hawthorne effect: New concepts are needed to study research participation effects. Journal of Clinical Epidemiology, 67(3), 267–277. DOI: 10.1016/j.jclinepi.2013.08.015
  • Roethlisberger, F. J., & Dickson, W. J. (1939). Management and the Worker. Harvard University Press.

FAQ

Does observation really not affect behavior at all? That is not the conclusion. Observation effects exist in some forms in some settings. The McCambridge 2014 systematic review found a modest pooled effect, an odds ratio of about 1.17 (roughly 17 percent higher odds), with significant heterogeneity across studies. What is not supported is the strong textbook version: that observation reliably raises performance by a large, generalizable amount across contexts. There is no single Hawthorne effect of that kind.

What about clinical trials? Aren’t participants more compliant because they know they’re being studied? Some clinical trials do show modest research participation effects, particularly in observational designs. But the size and direction vary: participants in some trials become more compliant, in some trials become more anxious or hyper-reporting, and in some trials show no measurable change. The honest answer is that trial designers should be specific about which mechanism they’re worried about and design around that specific risk, rather than invoking “the Hawthorne effect” as a generic caveat.

What about users of A/B-tested websites and apps? Are they affected by knowing they’re in a test? Most A/B test participants do not know they are in an experiment. The user-facing condition is usually invisible. The classical Hawthorne concern --- participant awareness of being studied --- generally does not apply. The real methodological concerns with A/B tests are different ones: novelty effects (a new variant looks different and gets attention for that reason, not the design intent), seasonality, selection on observable behaviors, and statistical issues like peeking and multiple comparisons. Invoking the Hawthorne effect in an A/B testing context is usually a category error.
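Of the A/B testing concerns listed above, peeking is the easiest to demonstrate. The sketch below simulates A/A tests, where both arms share the same true conversion rate, and compares two stopping rules: declaring a winner if any interim look crosses the usual 1.96 threshold, versus testing once at the end. The conversion rate, number of looks, and sample sizes are hypothetical, and the two-proportion z-test is just one common choice of test, used here for illustration.

```python
import math
import random


def z_statistic(successes_a, successes_b, n):
    """Two-proportion z statistic for two arms of equal size n."""
    pooled = (successes_a + successes_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 0.0
    return (successes_b / n - successes_a / n) / se


def run_aa_test(rng, rate=0.10, looks=10, users_per_look=200, z_crit=1.96):
    """Run one A/A test and return (significant_at_any_look, significant_at_end)."""
    a = b = n = 0
    hit_any = False
    z = 0.0
    for _ in range(looks):
        for _ in range(users_per_look):
            a += rng.random() < rate  # conversion in arm A
            b += rng.random() < rate  # conversion in arm B, same true rate
        n += users_per_look
        z = z_statistic(a, b, n)
        if abs(z) > z_crit:
            hit_any = True  # a peeking analyst would have stopped here
    return hit_any, abs(z) > z_crit


rng = random.Random(42)
results = [run_aa_test(rng) for _ in range(1000)]
peeking_rate = sum(any_hit for any_hit, _ in results) / len(results)
final_only_rate = sum(final_hit for _, final_hit in results) / len(results)

print(f"false positives with peeking:   {peeking_rate:.1%}")
print(f"false positives, one final test: {final_only_rate:.1%}")
```

There is no real difference between the arms, so every "significant" result is a false positive. Checking at every look typically pushes the false-positive rate well above the nominal 5 percent, which is a statistical problem unrelated to whether participants know they are being studied.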

What about workplace surveys and employee monitoring? Surveys with explicit awareness of measurement can shift responses --- this is a real phenomenon, well-documented in survey methodology under names like social desirability bias and self-presentation effects. But these are specific named phenomena with their own literatures, not generic “Hawthorne effects.” Employee monitoring software, similarly, has been shown to change short-term behavior in some studies, but the effects depend heavily on what’s being measured, how the feedback is structured, and the trust climate. The textbook Hawthorne story is not a useful guide to predicting those outcomes.

If the original data don’t support the effect, why is it still taught? A combination of citation inertia, the rhetorical usefulness of a vivid named effect, and the fact that the original data were inaccessible for most of the period during which the textbook story solidified. The Levitt and List recovery of the data is only fifteen years old. Textbook revisions move slowly. Many introductory psychology and management textbooks still teach the 1924–1927 illumination story as if it were a settled finding.

Did Roethlisberger and Dickson lie about the data in their 1939 book? No. Management and the Worker is a long, careful, generally honest report. The book does not actually contain the simplified textbook story of “productivity rose with every lighting change.” It contains a more measured discussion of multiple causal factors. The textbook simplification happened in secondary sources, particularly after Landsberger applied the term “Hawthorne effect” in 1958. Misreadings by later authors should not be conflated with misconduct by the original investigators.

Should we stop using the term “Hawthorne effect” entirely? A defensible position. McCambridge and colleagues (2014) recommended that researchers move to more specific language --- “research participation effects,” with the particular mechanism named --- rather than relying on a single contested term. In management writing and methods teaching, replacing the Hawthorne effect with more precise language (“observer effects in field experiments,” “social desirability in self-report surveys,” “novelty effects in workplace pilots”) would more accurately convey what we actually know.

What is the strongest single piece of evidence for any observation effect at the Hawthorne plant? Even Levitt and List, who substantially deflated the famous narrative, found suggestive evidence of small observation effects in their reanalysis. The strongest evidence is for modest, context-dependent shifts in behavior under explicit experimental manipulation --- not for the textbook “productivity rose with every change regardless of direction” story. The historical record supports a small, heterogeneous family of effects, not the single dramatic phenomenon the term has come to imply.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.