Broken Windows Theory: The Atlantic Essay That Reshaped Policing On Weak Evidence

Atticus Li

In 1982, two academics published a nine-page essay in The Atlantic Monthly arguing that visible disorder causes serious crime. It was not empirical research. It was a theoretical argument illustrated by Philip Zimbardo's 1969 study — which had a sample of two cars. Within fifteen years, that essay had reshaped policing in New York and dozens of cities worldwide. Here is what the evidence actually shows about broken windows in 2026, and what it teaches strategists about behavioral-science arguments built on memorable stories.

By Atticus Li · May 12, 2026 · 30 min read

In March 1982, The Atlantic Monthly published a nine-page essay called “Broken Windows: The Police and Neighborhood Safety.” Its authors were James Q. Wilson, then a political scientist at Harvard, and George L. Kelling, a criminologist who had worked with the Police Foundation. The essay argued that visible disorder in a neighborhood --- broken windows, graffiti, public drinking, panhandling, loitering --- was not just an aesthetic problem but a cause of serious crime. Left unchecked, the small disorders signaled that nobody was in charge, that informal social control had broken down, and that more serious offenses would follow. Police, the authors argued, should focus on preventing this cascade by attending to the small disorders, not just the felonies.

It was a magazine essay. It was not empirical research. It contained no original data, no field experiments, no statistical analyses. It was a theoretical argument, illustrated by anecdote and by a single 1969 study by Philip Zimbardo in which he abandoned two cars (one in the Bronx, one in Palo Alto) and observed what happened to them. That essay went on to become the most influential criminological argument of the late twentieth century. Within twelve years it had reshaped policing in New York City under William Bratton and Rudy Giuliani. Within twenty years it had been invoked as the intellectual rationale for “quality of life” enforcement, zero-tolerance policing, and the aggressive use of stop-and-frisk in cities from Los Angeles to London. By 2011, NYPD officers were recording roughly 685,000 stops a year, the vast majority of them of Black and Latino men, in the name of the order-maintenance approach that traced its lineage back to that 1982 essay.

The empirical evidence assembled in the decades since is much weaker than the scale of the policy implementation would suggest. The Zimbardo study was not designed to test broken windows theory and had nowhere near the statistical power to support the inferences drawn from it. Bernard Harcourt’s 2001 book Illusion of Order and his 2006 paper with Jens Ludwig found that the relationship between disorder enforcement and crime decline in 1990s New York was weak at best, and that crime drops credited to enforcement were better explained by other forces. Steven Levitt’s 2004 Journal of Economic Perspectives paper attributed the 1990s national crime decline to four factors, none of which was broken-windows policing. A 2008 Science paper by Keizer, Lindenberg, and Steg appeared to vindicate the theory experimentally, but subsequent replication and meta-analytic work found the cross-norm effect was much weaker and less reliable than the original report suggested. The most rigorous meta-analysis of disorder policing, by Anthony Braga and colleagues in 2015, found a modest crime-reduction effect, but the effect was concentrated in community-oriented, problem-solving approaches, not the aggressive zero-tolerance enforcement that policy entrepreneurs took from the original essay.

This is a piece about how a memorable narrative, dressed in the language of behavioral science, can drive consequential policy on empirical evidence that would not survive serious scrutiny --- and what that implies for any strategist evaluating the next round of “behavioral nudges at scale.”

What The 1982 Atlantic Essay Actually Was

Wilson and Kelling’s “Broken Windows” was published in The Atlantic Monthly in March 1982 (volume 249, issue 3, pages 29–38). It is reproduced and excerpted widely; the title and the page numbers are stable across sources. What it is not is original empirical research. There are no data tables. There is no methodology section. There are no regressions, no field experiments designed by the authors, no surveys, no archival analyses. It is an essay in the older sense: a piece of structured argument, drawing on the authors’ reading of the literature, their experience with the Newark Foot Patrol Experiment of the mid-1970s, and a series of illustrative anecdotes.

The structural argument runs roughly as follows. Communities have informal mechanisms of social control --- neighbors who know each other, residents who challenge strangers, business owners who keep an eye on the street. When visible disorder accumulates --- broken windows that go unrepaired, graffiti, public drunks, panhandlers --- those informal mechanisms weaken. Residents withdraw. Predatory criminals interpret the visible disorder as a signal that the area is unprotected. Serious crime rises. The implication for policing is that officers should not wait for serious crime to happen; they should attend to the small disorders, restore the sense of order, and the broader crime problem will recede.

The essay’s central illustration is Philip Zimbardo’s 1969 study of abandoned cars, which Wilson and Kelling describe in the opening pages. They use it as a parable: an abandoned car in a “disorderly” environment (the Bronx) was vandalized within hours; an abandoned car in an “orderly” environment (Palo Alto) sat untouched for a week, until Zimbardo himself attacked it with a sledgehammer --- after which the rest of the community joined in. The lesson Wilson and Kelling drew was that disorder begets more disorder, that the first broken window matters because it signals that nobody is watching.

The Zimbardo “study” cited as the empirical anchor for this argument was, in its actual design, two cars. One car was placed in a Bronx neighborhood, with the license plates removed and the hood up. A second, in the same condition, was placed near the Stanford University campus in Palo Alto. Zimbardo and colleagues observed what happened. The Bronx car was stripped within about a day; the family that arrived first (a father, mother, and young son) removed the radiator and battery. The Palo Alto car sat untouched for more than a week, until Zimbardo himself smashed it with a sledgehammer to provoke a reaction.

This is a memorable anecdote. It is not, by the standards of any field that depends on inference from data, evidence on which to build a national policing doctrine. There is no comparison group. There are no controls. There is no measurement of pre-existing crime rates, pre-existing reporting practices, or pre-existing differences in resident attentiveness between the Bronx and Palo Alto. The sample is two. Zimbardo himself, in the original publication (the 1969 Nebraska Symposium on Motivation, volume 17, pages 237–307), framed the demonstration as evidence about deindividuation and the social conditions under which inhibitions break down; he was not making a claim about an empirically supported causal chain from broken windows to serious crime. Wilson and Kelling lifted the illustration, expanded its implications, and built a theory of policing around it.

The Newark Foot Patrol Experiment, also referenced in the essay, was an actual field study --- but its principal finding was that foot patrols did not reduce crime. They did increase residents’ sense of safety, which Wilson and Kelling cited as evidence that disorder management produces real benefits even if measured offense rates do not move. This is a coherent argument, but it is not evidence that disorder enforcement reduces crime. The 1982 essay used the foot-patrol finding to argue for the broader theory; it did not present new evidence that the theory was right.

The essay was, in other words, a hypothesis. A thoughtful, well-written hypothesis. It was published in a general-interest magazine, not a peer-reviewed journal. It contained no original data. The empirical question it raised --- does aggressive enforcement of minor disorders cause reductions in serious crime --- was an open question in 1982 and would remain open, in the strict scientific sense, for decades.

What Got Built On The Theory

The first major operational adoption of broken windows thinking happened in New York City. In 1990, William Bratton became chief of the New York City Transit Police and applied an order-maintenance approach to the subway system, targeting fare evasion, graffiti, and aggressive panhandling. Subway crime fell. Bratton credited the order-maintenance approach. When Rudy Giuliani was elected mayor of New York City in 1993, he hired Bratton as commissioner of the NYPD and the same logic was extended citywide.

The operational implementation is documented in the city’s own arrest records and in subsequent academic studies. Misdemeanor arrests in New York rose from roughly 133,000 in 1993 to over 205,000 by 1996, an increase of more than fifty percent concentrated in offenses like turnstile jumping, public drinking, public urination, loitering, unlicensed street vending, panhandling, and other “quality of life” infractions. The Compstat system, introduced under Bratton, tracked crime statistics by precinct and held commanders accountable for reductions, creating an organizational incentive to drive enforcement numbers upward. Stop-and-frisk, which had long been a tactical option for street officers, expanded dramatically: between roughly 2002 and 2011, the NYPD’s recorded annual stops climbed from under 100,000 to a peak of approximately 685,000. The overwhelming majority of those stops involved Black and Latino men, and the overwhelming majority resulted in no arrest or summons. The legal and political defense of the stop-and-frisk regime, in court filings and public statements, repeatedly invoked broken windows reasoning: that visible enforcement of small infractions deters larger crimes, and that the constitutional latitude granted by Terry v. Ohio (1968) authorized the tactic.

The theory’s reach extended well beyond New York. Bratton was later hired as chief of the Los Angeles Police Department (2002–2009) and brought the same operational framework. Boston, Newark, Lowell, and dozens of mid-sized American cities adopted some version of order-maintenance policing in the late 1990s and 2000s. The Metropolitan Police in London ran “Operation Athena” and other quality-of-life initiatives invoking broken windows logic. International policing conferences featured Bratton and Kelling as keynote speakers. The theory became, in the language of policy diffusion scholars, isomorphic: adopted in part because of its operational logic and in part because adopting it signaled that a city’s leadership was serious about crime.

Notice what is being built here. A theoretical argument from a magazine essay, anchored by a two-car illustration from a 1969 deindividuation study, became the doctrinal foundation for the daily operational behavior of hundreds of thousands of police officers, affecting tens of millions of citizens, over more than two decades. The chain of inference from “Zimbardo’s car got stripped in the Bronx” to “New York City should arrest 205,000 people a year for fare evasion and open containers” is long, and most of the empirical links in that chain were never established to the standard one would demand for, say, a new drug approval or a structural engineering specification.

What The 1990s Crime Decline Was Actually Driven By

Both the spread of broken-windows policing and the dramatic decline in American crime happened in the 1990s. The decline was real and large. National violent crime rates fell roughly forty percent between 1991 and 2001; property crime fell by roughly the same magnitude. New York City’s declines were even steeper. For a generation, the natural-seeming public inference was that the policing approach caused the decline.

The economics literature does not support that inference. Steven Levitt’s 2004 paper in the Journal of Economic Perspectives, “Understanding Why Crime Fell in the 1990s: Four Factors That Explain the Decline and Six That Do Not” (volume 18, issue 1, pages 163–190; DOI: 10.1257/089533004773563485), synthesized the empirical economics of the 1990s crime decline. The four factors Levitt found best supported by the data were: increases in the number of police, increases in the prison population, the receding of the crack-cocaine epidemic, and the long-term effects of the legalization of abortion in the early 1970s. The six factors he found poorly supported, meaning the data did not show them to be major contributors, were the strong economy, changing demographics (the aging-out of the high-crime cohort), gun control laws, concealed-carry laws, the use of capital punishment, and innovative policing strategies. The category of innovative policing strategies includes broken-windows policing. Levitt’s conclusion was not that those strategies did nothing; it was that the empirical evidence did not support attributing the national crime decline to them in any large way.

Levitt’s “increases in the number of police” factor is sometimes cited as supporting broken-windows policing, because broken-windows departments typically also added headcount. But the mechanism Levitt identifies is the deterrent and incapacitation effect of more officers on the street generally --- not the specific tactical doctrine of enforcing minor disorders. Bigger force, more enforcement, fewer crimes. That is not a vindication of broken windows; it is a finding about the marginal value of police presence.

Harcourt and Ludwig’s 2006 paper in the University of Chicago Law Review, “Broken Windows: New Evidence from New York City and a Five-City Social Experiment” (volume 73, issue 1, pages 271–320), looked more directly at the New York data. Using precinct-level analyses, they found no robust evidence that the precincts with the most aggressive misdemeanor enforcement had larger crime declines than precincts with less aggressive enforcement, once one controlled for regression to the mean (precincts with the highest crime in the late 1980s tended to see the biggest drops, regardless of enforcement strategy). The five-city component drew on data from the Moving to Opportunity (MTO) housing program, in which roughly 4,800 families in five U.S. cities were randomly assigned vouchers to move from high-disorder to lower-disorder neighborhoods. The randomization is what makes MTO useful: it isolates the effect of moving into a more orderly environment from the underlying characteristics of the individuals. Harcourt and Ludwig’s analysis of the MTO data found no significant effect of the move on the criminal behavior of household members. People who moved to less disordered neighborhoods did not commit fewer crimes. The “disorder causes crime” causal arrow that broken windows theory requires did not show up when the experimental design was strong enough to test for it.
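The regression-to-the-mean point is easier to see in a toy simulation than in prose. The sketch below is not Harcourt and Ludwig’s analysis and uses no real data: the precinct count, crime levels, and noise level are all hypothetical, and the function names are made up for illustration. It builds precincts whose underlying crime level never changes and has no enforcement effect at all, then compares the precincts that looked worst in the first period with everyone else.

```python
import random
import statistics


def simulate_precincts(n_precincts=75, noise_sd=25.0, seed=0):
    """Return (before, after) crime levels for precincts with NO enforcement effect.

    Every number here is hypothetical. Each precinct has a stable underlying
    level, and each observation adds independent year-to-year noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_precincts):
        underlying = rng.uniform(50, 150)
        before = underlying + rng.gauss(0, noise_sd)
        after = underlying + rng.gauss(0, noise_sd)
        rows.append((before, after))
    return rows


def mean_change_by_group(rows, top_fraction=0.2):
    """Mean (after - before) for the highest-'before' precincts and for the rest."""
    ranked = sorted(rows, key=lambda row: row[0], reverse=True)
    cut = max(1, int(len(ranked) * top_fraction))

    def mean_change(group):
        return statistics.mean(after - before for before, after in group)

    return mean_change(ranked[:cut]), mean_change(ranked[cut:])


# A large hypothetical sample keeps the illustration stable across seeds.
top, rest = mean_change_by_group(simulate_precincts(n_precincts=5000))
print(f"highest-crime fifth: {top:+.1f}")
print(f"everyone else:       {rest:+.1f}")
```

Because the highest-crime group is selected partly on an unlucky first observation, its second observation tends to come in lower even though nothing was done to it. If aggressive enforcement is also concentrated in those precincts, a naive before-and-after comparison will credit the enforcement with a drop that would have appeared anyway.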

Harcourt’s earlier book, Illusion of Order: The False Promise of Broken Windows Policing (Harvard University Press, 2001), made the broader case at length. Harcourt argued that the empirical evidence for broken-windows policing was much weaker than its advocates claimed, that the New York crime decline was over-attributed to the strategy because the strategy was visible and the alternative explanations were not, and that the costs of the approach --- including the disproportionate burden on Black and Latino communities and the corrosive effect on police-community relations --- were not adequately weighed against benefits that, on close examination, were largely speculative.

None of this means broken-windows-era policing in New York did nothing. The honest summary is more uncertain than that. Some of the crime decline in some places may have been attributable to enforcement intensity; some of the decline was almost certainly attributable to the additional officers, regardless of doctrine; much of the decline tracks national trends that proceeded in cities that did not adopt broken windows. The strongest empirical claims made for the policing approach --- that it caused the New York miracle --- are not supported by the most careful subsequent analyses.

The Keizer 2008 Apparent Vindication

In November 2008, a paper by Kees Keizer, Siegwart Lindenberg, and Linda Steg of the University of Groningen was published in Science (volume 322, issue 5908, pages 1681–1685; DOI: 10.1126/science.1161405), titled “The Spreading of Disorder.” It described six field experiments testing whether visible violations of one norm increased people’s willingness to violate other norms. The studies were widely reported as the long-awaited experimental vindication of broken windows.

The methodology was clever. In one study, an alley in Groningen used for bicycle parking was studied under two conditions. In the “order” condition the alley walls were clean; in the “disorder” condition they were covered in graffiti. A sign prohibiting graffiti was visible. A flyer was attached to the handlebars of parked bicycles. The dependent measure was whether the cyclist, on returning, dropped the flyer on the ground or carried it away. In the clean condition, thirty-three percent littered. In the graffiti condition, sixty-nine percent did. In another study, an envelope visibly containing five euros protruded from a mailbox; in the order condition the mailbox area was clean, in the disorder condition it was surrounded by graffiti or by litter on the ground. The proportion of passers-by who stole the envelope roughly doubled in the disorder conditions. Across the six experiments, the pattern was consistent: visible disorder increased rule-breaking on unrelated norms.

The paper was a methodological improvement over the 1982 essay’s evidentiary basis by orders of magnitude. It was an actual experiment, with random assignment of conditions, conducted in naturalistic settings, with measurable outcomes. It was published in one of the most rigorous scientific journals in the world. Coverage of the paper treated it as decisive: Economist and Nature news pieces presented the results as confirming that “broken windows are real.”

The subsequent replication record is much weaker than the 2008 paper suggested. A 2023 systematic review and meta-analysis by Volker and colleagues in the Journal of Environmental Psychology, “Are broken windows spreading? Evaluating the robustness and strengths of the cross-norm effect using replications and a meta-analysis,” compiled the original Keizer experiments alongside subsequent replication attempts. The authors documented that the original studies suffered from low statistical power (which inflates effect-size estimates when results are significant), used statistical tests that some commentators consider invalid for the design, and reported their methods with insufficient detail for clean replication. The meta-analytic estimate of the cross-norm effect across the studies and their replications was much smaller than the original Keizer estimates, with substantial heterogeneity that depended on the type of norm violation studied, the setting, and the population. The reviewers’ summary characterization was that the current state of broken windows theory, with respect to the cross-norm psychological mechanism, is inconclusive.
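The claim that low power inflates effect-size estimates among significant results can be checked with a short simulation. The sketch below is an illustration under stated assumptions, not a re-analysis of the Keizer experiments: the true violation rates (40% versus 50%), the 40 observations per condition, and the function names are all hypothetical. It runs many small two-condition studies and compares the average estimated effect across all of them with the average across only those that cross the conventional significance threshold.

```python
import math
import random
import statistics


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for the difference between two independent proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (x2 / n2 - x1 / n1) / se


def simulate_studies(p_order, p_disorder, n_per_arm, n_studies, seed=0):
    """Run many small two-arm studies; return (all differences, significant ones)."""
    rng = random.Random(seed)
    all_diffs, significant = [], []
    for _ in range(n_studies):
        x1 = sum(rng.random() < p_order for _ in range(n_per_arm))
        x2 = sum(rng.random() < p_disorder for _ in range(n_per_arm))
        diff = (x2 - x1) / n_per_arm
        all_diffs.append(diff)
        # Keep studies that are significant in the predicted direction.
        if two_proportion_z(x1, n_per_arm, x2, n_per_arm) > 1.96:
            significant.append(diff)
    return all_diffs, significant


# Hypothetical truth: violation rates of 40% vs 50%, a 10-point effect.
all_diffs, significant = simulate_studies(0.40, 0.50, n_per_arm=40, n_studies=5000)
print(f"share of studies reaching significance: {len(significant) / len(all_diffs):.2f}")
print(f"mean estimate, all studies:         {statistics.mean(all_diffs):.3f}")
print(f"mean estimate, significant studies: {statistics.mean(significant):.3f}")
```

With samples this small, a study can only reach significance if its estimate happens to land well above the true effect. A literature that publishes mainly the significant studies will therefore overstate the effect, and larger replications will appear to "shrink" it.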

This is a familiar pattern from the broader replication crisis. A high-profile original study reports a large and clean effect. Subsequent attempts to replicate find a much smaller effect, or no effect at all, or an effect that depends on contextual moderators not specified in the original report. The original study’s contribution to the field is real --- it created a paradigm, generated follow-up research, sharpened the questions being asked --- but its specific quantitative claims do not hold up at the strength originally reported. Keizer 2008 follows this script. The cross-norm psychological mechanism it proposed may exist at some modest magnitude in some settings; it is not the robust general phenomenon the 2008 paper implied.

The Modern Meta-Analytic Verdict (Braga 2015)

The most rigorous quantitative summary of the policing evidence is Anthony A. Braga, Brandon C. Welsh, and Cory Schnell’s 2015 systematic review and meta-analysis in the Journal of Research in Crime and Delinquency, “Can Policing Disorder Reduce Crime? A Systematic Review and Meta-Analysis” (volume 52, issue 4, pages 567–588; DOI: 10.1177/0022427815576576). The Braga team followed Campbell Collaboration protocols, identified thirty randomized experimental and quasi-experimental evaluations of disorder policing, and computed standardized mean-difference effect sizes (Cohen’s d) across the studies.

The headline finding is that disorder policing produces a statistically significant, modest crime-reduction effect on average. That is real, and it is not nothing. But two features of the analysis are more important than the headline.

First, the effect depends heavily on the type of disorder policing. Braga and colleagues separated the studies into two categories: community- and problem-oriented approaches (which engage residents, work with local institutions, and identify the specific drivers of disorder in a place) and aggressive order-maintenance approaches (which rely primarily on enforcement of minor offenses, zero-tolerance arrests, and tactical sweeps). The community- and problem-oriented programs produced a much larger mean effect size than the aggressive enforcement programs. The aggressive enforcement programs --- the operational style most directly associated with the New York implementation of broken windows --- produced effects that were small and not consistently distinguishable from zero in the meta-analysis.

Second, research design mattered. The randomized controlled experiments in the dataset produced smaller and less robust effects than the quasi-experimental studies. This is the standard pattern: weaker designs allow more room for confounding, regression to the mean, and selection effects, which tend to inflate the apparent benefit of an intervention. The closer one looks with rigorous methods, the smaller the disorder-policing effect gets, and the more it concentrates in the community-problem-solving subset.
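For readers unfamiliar with how a subgroup comparison like this is computed, here is a minimal sketch of inverse-variance pooling of standardized mean differences. It is a simplification under stated assumptions: it uses a fixed-effect model, which may differ from the model the published review used, and every effect size and variance below is invented for illustration rather than taken from the Braga dataset. The function names are likewise hypothetical.

```python
import math


def d_variance(d, n_treated, n_control):
    """Common large-sample approximation to the sampling variance of Cohen's d."""
    n_total = n_treated + n_control
    return n_total / (n_treated * n_control) + d ** 2 / (2 * n_total)


def pooled_effect(studies):
    """Inverse-variance weighted mean of (d, variance) pairs.

    Returns (estimate, standard_error) under a fixed-effect model.
    """
    weights = [1.0 / variance for _, variance in studies]
    total = sum(weights)
    estimate = sum(w * d for w, (d, _) in zip(weights, studies)) / total
    return estimate, math.sqrt(1.0 / total)


def confidence_interval(estimate, se, z=1.96):
    """Normal-approximation interval around a pooled estimate."""
    return estimate - z * se, estimate + z * se


# Hypothetical (d, variance) pairs; positive d means less crime in treated areas.
community = [(0.35, 0.010), (0.22, 0.020), (0.30, 0.015)]
aggressive = [(0.08, 0.012), (-0.02, 0.020), (0.06, 0.018)]

for label, studies in [("community/problem-oriented", community),
                       ("aggressive order maintenance", aggressive)]:
    estimate, se = pooled_effect(studies)
    low, high = confidence_interval(estimate, se)
    print(f"{label}: d = {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The design point is that precise studies count for more than noisy ones, and that a pooled estimate is only "distinguishable from zero" when its interval excludes zero. An overall average can be positive while one subgroup carries nearly all of the effect, which is the pattern the review describes.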

A 2024 update by Braga and colleagues in Criminology & Public Policy, “Disorder Policing to Reduce Crime: An Updated Systematic Review and Meta-Analysis” (DOI: 10.1111/1745-9133.12667), extended the analysis to additional studies. The updated meta-analysis confirms the 2015 pattern: a modest aggregate effect, concentrated in community-and-problem-oriented programs, with aggressive enforcement approaches showing weak and unreliable benefits.

The honest reading of the meta-analytic literature is therefore: the proposition that some targeted attention to disorder in some configurations produces some reduction in crime has empirical support. The proposition that aggressive, zero-tolerance, enforcement-heavy operational doctrine --- the version of broken windows that justified mass misdemeanor arrests and the stop-and-frisk era --- produces large crime reductions does not.

What This Means For Policy Founded On Behavioral Science

The broken windows case has an unsettling structure for anyone who designs strategy around behavioral arguments. A small group of articulate academics published a memorable essay in a general-interest magazine. The essay synthesized real observations about urban life and packaged them in a clean causal story. The story spread first through policy circles, then through political leadership, then through operational practice. Within fifteen years it had reorganized the daily activity of police forces serving more than 100 million people. The empirical evidence at every stage was thinner than the confidence with which the policy was implemented. The empirical evidence assembled in the decades since has not vindicated the original strength of the claim.

This is not a parable about academic dishonesty. Wilson and Kelling were not fabricating. They were writing a thoughtful essay in good faith about a serious problem. The structural failure was downstream: the conversion of a hypothesis into a doctrine without the intervening step of accumulating evidence at the resolution the doctrine required. The doctrine treated the hypothesis as if it had been tested; the hypothesis had not been tested at the level of detail necessary to justify the operational behavior it underwrote.

For any strategist evaluating a behavioral-science argument that proposes to change consequential choices at scale --- pricing structures, nudges in default settings, surveillance-as-deterrent claims, “small-thing” theories of organizational culture, gamification systems --- the broken windows trajectory is worth holding next to it. The questions to ask are: What is the original evidence base, in primary sources? Is it an essay, a small study, a single high-profile result, or a body of replicated work? Have the apparent successes of the approach been compared against alternative explanations with comparable rigor? When the meta-analytic verdict comes in, is the effect concentrated in the version of the approach that the advocates favor, or in some different version? How long is the chain of inference from “interesting finding” to “operational doctrine,” and how many of the links in that chain have been independently checked?

These are not rhetorical questions, and they are not anti-evidence. They are the questions one needs to ask when the cost of being wrong is large enough that the answer matters. Broken-windows policing in New York affected the lives of millions of people, generated decades of litigation, helped justify a stop-and-frisk regime that the federal courts eventually ruled unconstitutional in Floyd v. City of New York (2013), and contributed to the slow erosion of trust between police forces and the communities they served. The cost of the operational doctrine, weighed honestly, was very large. The empirical foundation for the doctrine, weighed honestly, was much smaller than the doctrine pretended.

The analog to private-sector strategy is closer than it looks. The leader who reorganizes a hundred sales reps around a behavioral-economics framework she read in a popular book, the consultant who recommends a “small wins” cultural overhaul anchored in a vivid TED talk, the head of product who pushes a redesign rooted in a single high-profile field study --- each is making a broken-windows-shaped move. The evidence for the approach may be perfectly real, in some form, at some magnitude. The question is whether the magnitude justifies the cost of the implementation, and whether the implementation matches the part of the evidence base that is actually robust. The honest answer, when these questions are taken seriously, is often “less than I assumed.”

What’s Honest To Say About Disorder And Crime Now

The most defensible synthesis from the modern criminological literature is roughly this. Crime is concentrated in places. A small number of street segments, addresses, and micro-locations account for a disproportionate share of incidents in any given city --- the “law of crime concentration” documented by David Weisburd and colleagues. Place-based, hot-spot policing strategies that focus officer attention on these high-crime micro-locations have substantial evidence of effect, supported by randomized trials and multiple meta-analyses. The mechanism is partly deterrence (visible police presence at the times and places where crime happens), partly problem-solving (identifying and addressing the specific environmental and social drivers in a location), and partly community engagement (rebuilding informal social control with residents rather than over them).

There is also evidence that the physical environment of a location matters for crime, though the mechanism is more complex than “disorder causes serious crime.” Studies of vacant-lot cleanup and remediation in Philadelphia by Charles Branas and colleagues have found measurable reductions in gun violence and overall crime in randomized comparisons of treated and untreated lots. These findings are sometimes invoked as vindicating broken windows; they are better understood as supporting a more specific claim --- that physical disorder in some forms in some places is causally connected to some crime types, and that physical intervention in the environment (cleanup, lighting, greening) can have measurable benefits without the heavy enforcement footprint of zero-tolerance policing.

The combined picture is not what either the strongest broken-windows advocates or the strongest broken-windows critics typically claim. There is something to the underlying intuition that places matter and that physical and social conditions in a neighborhood interact with crime. There is much less to the operational claim that aggressive enforcement of minor offenses, applied with the demographic pattern that the New York implementation produced, is the appropriate response. Modern criminology has largely moved to the “place-based, problem-oriented, community-engaged” frame as the most empirically supported version of the underlying idea. That frame has real overlap with what Wilson and Kelling were trying to articulate in 1982. It has very little overlap with what 1990s broken-windows policing actually did.

For a CEO, consultant, or policy leader evaluating any behavioral argument about scale and disorder (in a city, in an organization, in a digital product), the lesson is not “behavioral arguments are bunk.” The lesson is that the operational version of the argument typically diverges from the most empirically defensible version, and the divergence usually runs toward something more confident, more aggressive, and more legible to the people implementing it than the evidence will support. The 1982 essay’s actual claim was modest. The implementation built on it was not. Forty years later, the meta-analytic evidence sides with a modest version. The aggressive version, in the form in which it was actually deployed, is closer to a cautionary tale than to a vindication.

Sources

Wilson, J. Q., & Kelling, G. L. (1982). Broken windows: The police and neighborhood safety. The Atlantic Monthly, 249(3), 29–38.
Zimbardo, P. G. (1969). The human choice: Individuation, reason, and order versus deindividuation, impulse, and chaos. Nebraska Symposium on Motivation, 17, 237–307.
Harcourt, B. E. (2001). Illusion of Order: The False Promise of Broken Windows Policing. Harvard University Press.
Harcourt, B. E., & Ludwig, J. (2006). Broken windows: New evidence from New York City and a five-city social experiment. University of Chicago Law Review, 73(1), 271–320.
Levitt, S. D. (2004). Understanding why crime fell in the 1990s: Four factors that explain the decline and six that do not. Journal of Economic Perspectives, 18(1), 163–190. DOI: 10.1257/089533004773563485
Keizer, K., Lindenberg, S., & Steg, L. (2008). The spreading of disorder. Science, 322(5908), 1681–1685. DOI: 10.1126/science.1161405
Volker, B., et al. (2023). Are broken windows spreading? Evaluating the robustness and strengths of the cross-norm effect using replications and a meta-analysis. Journal of Environmental Psychology, 87, 101981.
Braga, A. A., Welsh, B. C., & Schnell, C. (2015). Can policing disorder reduce crime? A systematic review and meta-analysis. Journal of Research in Crime and Delinquency, 52(4), 567–588. DOI: 10.1177/0022427815576576
Braga, A. A., Welsh, B. C., & Schnell, C. (2024). Disorder policing to reduce crime: An updated systematic review and meta-analysis. Criminology & Public Policy. DOI: 10.1111/1745-9133.12667
Floyd v. City of New York, 959 F. Supp. 2d 540 (S.D.N.Y. 2013).
Branas, C. C., South, E., Kondo, M. C., Hohl, B. C., Bourgois, P., Wiebe, D. J., & MacDonald, J. M. (2018). Citywide cluster randomized trial to restore blighted vacant land and its effects on violence, crime, and fear. Proceedings of the National Academy of Sciences, 115(12), 2946–2951. DOI: 10.1073/pnas.1718503115

Related

The Replication Crisis hub: the broader landscape of behavioral-science claims that did not survive serious empirical scrutiny.
The Stanford Prison Experiment: another single-site study whose dramatic narrative outran its evidentiary basis.
The Milgram Obedience Experiments: what the canonical version of an iconic study leaves out.
The Bystander Effect and Kitty Genovese: a parable about urban anonymity built on a misreported anchoring case.
The Hawthorne Effect: when the data behind a foundational story were recovered, the famous pattern was, in the analysts’ word, “fictional.”

FAQ

Did broken-windows policing reduce crime in New York City?

The empirical evidence does not support a confident “yes.” New York’s crime decline in the 1990s was large and real, but it tracked a national crime decline that proceeded in cities that did not adopt broken-windows policing. The most rigorous economic analyses (Levitt 2004) attribute the national decline primarily to factors other than innovative policing strategies. The most rigorous direct examinations of the New York data (Harcourt 2001, Harcourt and Ludwig 2006) find no robust evidence that the precincts with the most aggressive misdemeanor enforcement saw larger crime drops than less aggressive precincts, once one controls for regression to the mean. The honest answer is that some portion of the decline may have been due to enforcement intensity, but the strongest claims made for the strategy are not supported.
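Regression to the mean is the least intuitive step in that argument, so a toy simulation may help. The sketch below is not the Harcourt and Ludwig analysis and uses no real precinct data; every number in it (200 precincts, the noise levels, the top-20% cutoff) is an assumption chosen for illustration. It generates two periods of crime rates with no enforcement effect at all, then compares the change in the precincts that started highest against the change everywhere else.

```python
import random
import statistics


def simulate_precincts(n_precincts=200, seed=0):
    """Return (before, after) crime rates with NO enforcement effect built in.

    All parameters are hypothetical: each precinct has a stable underlying
    rate, and each observed period adds independent noise to it.
    """
    rng = random.Random(seed)
    precincts = []
    for _ in range(n_precincts):
        baseline = rng.gauss(100, 10)         # stable underlying rate
        before = baseline + rng.gauss(0, 15)  # period 1 = baseline + noise
        after = baseline + rng.gauss(0, 15)   # period 2 = baseline + new noise
        precincts.append((before, after))
    return precincts


def mean_change_by_group(precincts, top_fraction=0.2):
    """Average change (after - before) for the highest-'before' group vs. the rest."""
    ranked = sorted(precincts, key=lambda p: p[0], reverse=True)
    cut = int(len(ranked) * top_fraction)
    top, rest = ranked[:cut], ranked[cut:]

    def mean_change(group):
        return statistics.mean(after - before for before, after in group)

    return mean_change(top), mean_change(rest)


top_change, rest_change = mean_change_by_group(simulate_precincts())
print(f"highest-crime precincts, mean change: {top_change:+.1f}")
print(f"all other precincts, mean change:     {rest_change:+.1f}")
```

Under these assumptions the precincts selected for their high starting rates show the larger drop, even though nothing was done to them. If aggressive enforcement is concentrated in exactly those precincts, a naive before-and-after comparison will credit the enforcement with a decline that selection alone would have produced.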

What did Zimbardo’s 1969 study actually show?

Zimbardo placed two cars, one in the Bronx and one in Palo Alto, both with license plates removed and hoods up. The Bronx car was stripped within days; the Palo Alto car sat untouched for over a week until Zimbardo attacked it with a sledgehammer to provoke a reaction. Zimbardo himself framed the demonstration as evidence about deindividuation --- the loss of self-restraint under certain social conditions --- not as evidence for a “disorder causes crime” causal chain. Wilson and Kelling cited the two-car illustration in their 1982 essay as the empirical anchor for broken windows theory. Two cars, in two locations, observed informally, is not a basis on which to construct a national policing doctrine.

Was the Keizer 2008 Science paper a real vindication?

It was the strongest experimental evidence broken windows ever received, and it remains an important piece of work --- but the original effects were larger than subsequent replications and meta-analyses have found. A 2023 Journal of Environmental Psychology meta-analysis by Volker and colleagues compiled the original studies alongside replication attempts and found the cross-norm effect was much smaller than originally reported, with significant heterogeneity by context. The current verdict is “inconclusive” rather than “confirmed.”

What does the Braga 2015 meta-analysis say about disorder policing?

It finds a statistically significant, modest crime-reduction effect on average across thirty randomized and quasi-experimental tests. Critically, the effect is concentrated in community-oriented and problem-solving approaches to disorder policing (engaging residents, identifying specific drivers in a place) rather than in aggressive, enforcement-heavy, zero-tolerance approaches. The 2024 update confirms the same pattern. The operational version of broken windows that 1990s New York deployed is the version the meta-analysis finds least supported.
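To make “concentrated in one subgroup” concrete, here is a minimal sketch of how a pooled estimate can be split by strategy type. The effect sizes and standard errors are invented placeholders, not values from Braga and colleagues, and the strategy labels are hypothetical names. The pooling is plain fixed-effect inverse-variance weighting, a simplification of what a published meta-analysis would do.

```python
import math

# Hypothetical (effect size, standard error, strategy) rows.
# These are made-up illustrative numbers, NOT data from any cited study.
studies = [
    (0.30, 0.10, "community_problem_solving"),
    (0.25, 0.12, "community_problem_solving"),
    (0.20, 0.08, "community_problem_solving"),
    (0.05, 0.09, "aggressive_order_maintenance"),
    (0.02, 0.11, "aggressive_order_maintenance"),
    (0.08, 0.10, "aggressive_order_maintenance"),
]


def pooled_effect(rows):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for _, se, _ in rows]
    total = sum(weights)
    estimate = sum(w * d for w, (d, _, _) in zip(weights, rows)) / total
    return estimate, math.sqrt(1.0 / total)


def pooled_by_strategy(rows):
    """Pool each strategy subgroup separately."""
    groups = {}
    for row in rows:
        groups.setdefault(row[2], []).append(row)
    return {name: pooled_effect(group) for name, group in groups.items()}


overall, overall_se = pooled_effect(studies)
print(f"overall: {overall:.3f} (SE {overall_se:.3f})")
for name, (estimate, se) in pooled_by_strategy(studies).items():
    print(f"{name}: {estimate:.3f} (SE {se:.3f})")
```

The design point is that a single overall average can be positive while hiding two very different subgroup estimates, which is why the subgroup breakdown matters more for policy than the headline number.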

What should police actually do, then?

The strongest current evidence supports place-based, hot-spot policing focused on the small number of micro-locations where crime concentrates, combined with problem-oriented approaches that identify and address the specific drivers in those locations, and community engagement that rebuilds informal social control with residents rather than over them. Physical environmental interventions --- vacant-lot cleanup, lighting, greening --- also have meaningful empirical support, notably from Charles Branas’s randomized work in Philadelphia. None of this is the aggressive misdemeanor-enforcement model the term “broken windows” came to denote.

What about Cure Violence and similar non-policing approaches?

There is a growing evidence base for public-health-style violence-interruption programs that operate outside the policing model, treating gun violence as an epidemic and intervening through trusted community members. These approaches show promise in some evaluations, with mixed results in others. The point is not that any one model has been proven definitively superior; it is that the policy space for reducing serious crime is much larger than the broken-windows era recognized, and the empirical comparative evidence is still being built.

What is the policy alternative to zero-tolerance enforcement?

The contemporary criminological mainstream supports a portfolio that includes hot-spot patrol focused on high-crime micro-locations, problem-oriented policing that identifies and intervenes on specific local drivers, focused-deterrence approaches targeted at the small networks of individuals who drive most violence in many cities, physical environmental interventions, and significant investment in non-policing public-health approaches to violence. This portfolio reduces the operational and constitutional costs that aggressive misdemeanor enforcement imposed without abandoning the goal of crime reduction.

Why has the broken-windows narrative been so resilient despite the weak evidence?

For roughly the same reasons the Hawthorne effect narrative has been resilient: it is a vivid, memorable story with a clean causal mechanism, it travels well across policy and media audiences, it provides a ready justification for visible operational activity, and the alternative explanations (demographics, economics, drug-market shifts, abortion availability, regression to the mean) are statistically subtle and rhetorically less satisfying. A simple story with a named effect almost always beats a complicated story with no name, even when the evidence behind the simple story is much weaker than the story claims.
