In January 2008, the New England Journal of Medicine published a paper titled “Selective publication of antidepressant trials and its influence on apparent efficacy.” The lead author, Erick Turner, was a psychiatrist at the Portland Veterans Affairs Medical Center and at Oregon Health & Science University, with a previous stint as a medical reviewer at the US Food and Drug Administration. That biographical detail mattered, because the paper’s central methodological move was one that a former FDA insider was unusually well placed to make. Turner and his coauthors filed Freedom of Information Act requests and pulled the FDA’s own files on 74 antidepressant trials covering 12 drugs, the newer-generation antidepressants (mostly SSRIs and SNRIs) that had been approved between 1987 and 2004, representing 12,564 patients. They then went to the published medical literature and looked for the matching publications. For each of the 74 trials, they asked two questions: was it published at all, and if so, did the published outcome match the FDA-recorded outcome?
The answers did more than describe a problem; they reshaped the question of what the public literature even is. Of the 38 trials that the FDA had classified as positive (meaning the antidepressant had shown a statistically significant advantage over placebo on the primary outcome measure), 37 were published. Of the 36 trials that the FDA had classified as negative or questionable (meaning the drug had not shown a clear statistically significant advantage), only 3 were published as straightforwardly negative results. The remaining 33 were either not published at all (22 trials) or were published in a way that presented the negative finding as positive (11 trials). Positive trials were published 97% of the time. Negative or questionable trials were published with their negative result intact only 8% of the time. When the authors recomputed effect sizes for each drug using the full FDA dataset versus the published literature, the published literature inflated the apparent efficacy of every drug, with an average inflation of roughly 32%. For some drugs the inflation was much larger.
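The percentages in this paragraph follow directly from the trial counts, and it can help to see which denominator each one uses. The short Python sketch below recomputes them from the counts quoted above; the variable names are illustrative, and the two extra rates (negative trials published in any form, and the share of published trials that read as positive) are derived from the same counts rather than quoted from the text.

```python
# Trial counts as quoted in the text above (Turner et al., 2008).
fda_positive = 38
fda_negative = 36                 # negative or questionable
pos_published = 37
neg_published_as_negative = 3
neg_published_as_positive = 11
neg_unpublished = 22

def pct(numerator, denominator):
    """Return a percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

total_trials = fda_positive + fda_negative
published = pos_published + neg_published_as_negative + neg_published_as_positive

rates = {
    "positive trials published": pct(pos_published, fda_positive),
    "negative trials published as negative": pct(neg_published_as_negative, fda_negative),
    "negative trials published in any form": pct(
        neg_published_as_negative + neg_published_as_positive, fda_negative
    ),
    "published trials that read as positive": pct(
        pos_published + neg_published_as_positive, published
    ),
    "all trials positive per the FDA": pct(fda_positive, total_trials),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```

The contrast between the last two rows is the core of the finding: a reader of the journals sees a literature that is almost uniformly positive, while the regulator's file is split roughly in half.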
This paper did not invent the concept of publication bias, which had been documented in psychiatry and other fields for decades. What it did was demonstrate the bias with a methodology that was essentially unanswerable. The FDA holds the trial results because the sponsors are legally required to file them, regardless of outcome, before drug approval. The FDA’s classification of positive versus negative is based on prespecified primary outcomes filed in the trial protocol before the trial begins. The comparison of FDA records to published literature is therefore not a meta-analysis of selected studies; it is a near-complete audit of one drug class’s evidence base against its own internal ground-truth dataset. That methodological design made the conclusion impossible to dismiss as a sampling artifact, an analytic choice, or a contrarian re-interpretation. The published literature on antidepressants was systematically distorted, and the FDA’s files were the unbiased control.
For anyone who builds a strategic, clinical, organizational, or policy decision on top of a published research literature, the Turner 2008 paper is one of the most important methodological case studies of the last twenty years. It is the cleanest demonstration available of how big the gap can be between what a field has tested and what a field has published. The corollary is that anyone reading a published meta-analysis in any field without preregistered protocols and mandatory result reporting is reading a literature that is, with high probability, distorted in the direction of the hypothesis. The size of the distortion in the antidepressant case, an inflation of roughly a third in effect size on a literature most of medicine had treated as well established, is the calibration number a careful evaluator should carry around when reading any other published literature without the FDA-style audit infrastructure.
This is the story of the Turner paper, the parallel Kirsch FDA reanalysis that landed a few weeks later, the Cipriani 2018 network meta-analysis that integrated these concerns into the modern evidence base on antidepressants, and the AllTrials movement for trial registration that the Turner paper helped catalyze. It is also the story of what a strategist should do when evaluating any medical or health claim, given that the published literature is reliably the wrong place to look first.
The FDA Files And Why Turner Could Get Them
The Turner methodology hinges on a piece of US regulatory machinery that most people outside pharmaceutical research do not know exists. When a pharmaceutical company wants the FDA to approve a new drug, it files a New Drug Application (NDA) that includes, by law, the complete record of every clinical trial the sponsor has conducted on the drug, regardless of whether the trial succeeded or failed and regardless of whether the trial has been published. The NDA includes the protocols, the prespecified primary outcomes, the analytic plans, the statistical results, and the FDA reviewers’ own analyses of those results. The FDA then makes its approval decision and writes its own statistical and medical reviews, which are filed in a document set called the FDA Approval Package. After approval, large portions of the Approval Package become publicly available, though the format and ease of access vary by drug, era, and what the sponsor has marked as confidential commercial information.
Turner had worked as a medical reviewer at the FDA’s Division of Neuropharmacological Drug Products during the late 1990s. He had personally reviewed antidepressant submissions. He knew, in a way that academic psychiatrists generally did not, that the FDA’s files contained trials that had never appeared in the medical literature, and he knew which trials those were. After he left the FDA and returned to academia, he and his coauthors set out to do what amounted to a structured audit: for each newer-generation antidepressant the FDA had approved in the period, pull every trial in the FDA file, identify its prespecified primary outcome and the FDA’s classification of positive or negative based on that outcome, and then go look in PubMed and other databases for the matching publication. The audit covered 12 drugs and formulations: bupropion SR (Wellbutrin SR), citalopram (Celexa), duloxetine (Cymbalta), escitalopram (Lexapro), fluoxetine (Prozac), mirtazapine (Remeron), nefazodone (Serzone), paroxetine (Paxil), paroxetine CR (Paxil CR), sertraline (Zoloft), venlafaxine (Effexor), and venlafaxine XR (Effexor XR), approved between 1987 and 2004. The trial set was 74 trials totaling 12,564 patients. The paper, when it came out, was a forensic accounting exercise rather than a clinical study.
The forensic methodology mattered for two reasons. First, it sidestepped the standard objection to publication-bias arguments, which is that the universe of unpublished trials cannot be reliably enumerated. In most fields, “publication bias” is inferred from indirect evidence --- funnel plots, the distribution of p-values just above and below 0.05, comparison of trials reported in conference abstracts with their eventual publication status. These methods are useful but rebuttable: a defender of the published literature can always argue that the indirect signals are themselves biased. The FDA-file methodology is direct. The denominator is known. Every trial in the FDA file is counted, regardless of whether it was ever submitted for publication. The numerator is the subset that made it to print. No statistical inference about an unobserved population is required.
Second, the FDA’s classification of positive or negative was based on the prespecified primary outcome filed in the protocol before the trial began, not on what the eventual publication chose to emphasize. This is the central control. A trial can be negative on its prespecified primary outcome and still be published with an emphasis on a secondary outcome that happened to reach statistical significance. The Turner team identified 11 trials in their dataset where exactly this had happened: the FDA had classified the trial as negative on the prespecified primary outcome, but the published version of the trial reported the result as positive, typically by switching the analysis to a different outcome or to a different patient subset. These 11 trials represented a separate, distinct distortion mechanism layered on top of the bare publication-bias mechanism. Trials that should have been in the literature as negative were missing entirely. A further set of trials that should have been in the literature as negative were in the literature as positive instead. Both mechanisms inflated the apparent efficacy of the drug class.
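The audit logic described in the last two paragraphs (a known denominator, plus a check of each publication against the FDA’s verdict on the prespecified primary outcome) can be sketched in a few lines of Python. This is a minimal illustration, not the Turner team’s procedure: the `Trial` record, the category labels, and the five sample trials are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trial:
    trial_id: str
    fda_positive: bool                   # FDA verdict on the prespecified primary outcome
    published_positive: Optional[bool]   # None means no matching publication was found

def classify(trial: Trial) -> str:
    """Place one trial from the regulatory file into an audit category."""
    if trial.published_positive is None:
        return "unpublished"
    if trial.published_positive == trial.fda_positive:
        return "published, agrees with FDA"
    if trial.published_positive and not trial.fda_positive:
        return "published as positive, FDA negative"
    return "published as negative, FDA positive"

# Hypothetical records, for illustration only.
trials = [
    Trial("T1", fda_positive=True, published_positive=True),
    Trial("T2", fda_positive=False, published_positive=None),
    Trial("T3", fda_positive=False, published_positive=True),
    Trial("T4", fda_positive=False, published_positive=False),
    Trial("T5", fda_positive=True, published_positive=None),
]

tally = Counter(classify(t) for t in trials)
for category, count in tally.items():
    print(f"{category}: {count} of {len(trials)}")
```

The design point is that every trial in the regulatory file lands in exactly one category, so the two distortion mechanisms (missing trials and rewritten trials) are counted separately and nothing has to be inferred about an unobserved population.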
The combined effect, when the team recomputed effect sizes, was that the published literature reported standardized mean differences (effect sizes) for the drug-versus-placebo comparison that were on average 32% larger than the effect sizes computed from the complete FDA dataset. The inflation varied by drug, from 11% at the low end to 69% at the high end. Reboxetine, an antidepressant that was not in the Turner dataset but was later subjected to a similar audit by the German Institute for Quality and Efficiency in Health Care (IQWiG) in 2010, turned out to have had three-quarters of its patient data unpublished, with the unpublished data showing essentially no benefit over placebo. The Turner dataset was the start of an audit pattern that turned out to generalize.
The 94/14 Finding And How To Read It
The specific numbers from the Turner paper that get repeated most often, 94 and 14, differ from the 97% and 8% above because they use different denominators. The 97% and 8% figures are rates among the FDA’s 38 positive trials and 36 negative-or-questionable trials respectively. The 94% figure takes the published literature as its denominator: of the 51 trials that reached print (37 positive, 3 negative, and 11 negative trials presented as positive), 48 read as positive, which is 94%, against 38 of 74 (51%) in the FDA’s own classification. The 14 is the number of negative-or-questionable trials that were published in any form (3 as negative plus 11 as positive); as a share of the 36, that is 39%, so it is a count rather than a percentage. The fundamental finding does not change with the choice of denominator: positive trials are published at extremely high rates; negative trials are rarely published as negative; the published literature substantially overstates the efficacy of the drug class.
What is worth attention is the interpretation of the inflation. A 32% inflation in standardized mean difference is not a marginal effect. In the Turner data, the pooled standardized mean difference was 0.41 in the published literature and 0.31 in the complete FDA dataset. By Cohen’s convention 0.2 is a small effect and 0.5 a medium one, so both estimates count as small-to-moderate, but the correction moves the estimate toward the range where clinical significance becomes ambiguous, since a difference near 0.2 is often considered too small to be clinically meaningful for most patients. The Turner paper did not argue that antidepressants do not work. It argued that the published literature had created a perception of efficacy substantially stronger than the underlying clinical-trial data could support. The policy and clinical implications (for prescribing patterns, patient counseling, insurance coverage, and the role of psychotherapy as an alternative or complement) were not resolved by the paper but were now being adjudicated on different evidence.
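The inflation figure is a simple ratio, and working it through makes clear what “32%” does and does not mean. The sketch below assumes the pooled estimates reported in the Turner paper (0.41 from journal articles, 0.31 from the FDA data); those two values are not quoted in every summary of the paper and should be checked against the original before being reused. The function names are illustrative.

```python
def inflation(published_smd, complete_smd):
    """Relative overstatement of the published estimate versus the complete-data estimate."""
    return published_smd / complete_smd - 1

def artifact_share(published_smd, complete_smd):
    """Fraction of the published estimate that disappears when all trials are counted."""
    return (published_smd - complete_smd) / published_smd

# Assumed pooled standardized mean differences (see the caveat in the lead-in).
published_smd = 0.41
fda_smd = 0.31

overall_inflation = inflation(published_smd, fda_smd)
overall_artifact = artifact_share(published_smd, fda_smd)

print(f"inflation relative to the FDA estimate: {overall_inflation:.0%}")
print(f"share of the published estimate that is artifact: {overall_artifact:.0%}")
```

The two numbers answer different questions. The published estimate is about a third larger than the complete-data estimate, while about a quarter of the published estimate is attributable to selective publication; mixing the two up is a common source of confusion when the result is summarized.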
The paper was published in January 2008. The reception, unusually for a methodological audit paper, was substantial and immediate. The NEJM published an accompanying editorial by Drazen and colleagues that endorsed the methodology and noted that the findings reinforced the case for mandatory trial registration. The general medical press (Reuters, BBC, the New York Times, the Wall Street Journal) covered the paper in language that made the central finding accessible to non-specialists: publication bias had inflated the apparent benefit of antidepressants by about a third. Psychiatry as a field responded in two directions. Some senior clinicians defended the existing literature and argued that the practical clinical benefit of antidepressants in everyday practice was well established regardless of the published-literature distortion. Other senior clinicians, including some who had been arguing for years that the published efficacy of antidepressants was probably overstated, treated the paper as decisive vindication.
The paper’s methodological influence over the next decade and a half has been larger than its specific quantitative conclusions. Turner’s FDA-file methodology has been replicated for other drug classes, sometimes by Turner himself and sometimes by other groups, with broadly similar findings. The methodology is now a recognized standard for auditing the gap between regulatory datasets and published literature in any drug class where regulatory data can be obtained. The deeper consequence is that the paper made the case for trial preregistration --- that all clinical trials should be registered, in advance, with their prespecified outcomes and analytic plans, in a publicly searchable database --- in the most concrete and policy-relevant form anyone had managed to make it.
The Kirsch Paper, Weeks Later, Same FDA Files
A few weeks after the Turner paper appeared in the NEJM, Irving Kirsch and a research team based at the University of Hull in the UK published a paper in PLOS Medicine titled “Initial severity and antidepressant benefits: A meta-analysis of data submitted to the Food and Drug Administration.” Kirsch’s paper used substantially the same source, the FDA file for antidepressant trials, this time for four specific drugs (fluoxetine, venlafaxine, nefazodone, and paroxetine), but asked a different question. Where Turner had asked about publication bias, Kirsch asked about clinical significance: when patients were stratified by baseline depression severity, in which groups was the drug-versus-placebo difference large enough to be clinically meaningful?
The Kirsch finding was that the average drug-placebo difference across the FDA dataset for those four drugs was approximately 1.8 points on the 52-point Hamilton Depression Rating Scale (HAM-D), substantially below the 3-point difference that the UK’s National Institute for Clinical Excellence (NICE) had specified as a threshold for clinical significance. When the data were stratified by baseline severity, the drug-placebo difference was clinically meaningful only in the most severely depressed patients --- the small subgroup at the upper end of the severity distribution --- and this was driven not by an unusually large response to the drug in those patients but by a smaller placebo response. For the majority of patients in the trials, who had mild to moderate depression, the drug-placebo difference was below the NICE threshold for clinical significance.
The Kirsch paper was the second hammer blow of early 2008. The Turner paper had shown that the published literature was systematically inflated. The Kirsch paper had shown that even the unbiased FDA-file dataset, when analyzed by patient severity, supported a much narrower clinical benefit than the published literature had described. The combination was uncomfortable. Antidepressants were being prescribed across the full range of depression severity, including to patients with mild and moderate symptoms and often as a first-line intervention without referral to psychotherapy, against an evidence base whose support, on careful reading of the underlying data, did not extend that broadly.
The Kirsch paper attracted even more popular-press attention than the Turner paper, partly because it lent itself to a sharper headline (“antidepressants barely work better than placebo”) and partly because Kirsch had been writing critically about antidepressant efficacy for over a decade and was a recognizable name in the public debate. The clinical-psychiatry response was mixed and at times defensive, and the technical objections to the Kirsch paper --- about the choice of severity cutoffs, about the appropriateness of the NICE 3-point threshold, about the difference between average effect and effect for individual patients --- were substantive and continued in the literature for years. What was not disputed, on the technical side, was the underlying FDA-file dataset. Both Turner and Kirsch had used the same regulatory ground-truth source. Both had found that the published literature was overstating the case.
The two papers together became the canonical citation pair for the proposition that the published evidence base for antidepressants required substantial recalibration. Subsequent meta-analyses have refined the picture in both directions: some have argued that the placebo response in modern trials has grown so large that drug-versus-placebo comparisons systematically understate the drug effect; others have argued that the publication-bias and severity-stratification findings hold up. The contemporary state of the question is more nuanced than the early 2008 headlines suggested, but the basic methodological challenge has not been overturned: the published literature on this drug class had been distorted by publication bias, and the unbiased dataset supported a more modest claim about clinical benefit. It has been integrated.
Cipriani 2018: Integrating The Lesson Into A Modern Meta-Analysis
In February 2018, Andrea Cipriani and a large international team published a paper in The Lancet titled “Comparative efficacy and acceptability of 21 antidepressant drugs for the acute treatment of adults with major depressive disorder: a systematic review and network meta-analysis.” The Cipriani paper was the largest meta-analysis of antidepressant trials ever conducted, covering 522 double-blind randomized trials and 116,477 patients across 21 antidepressant drugs. The methodology was a network meta-analysis, a more recent technique that allows simultaneous comparison of multiple treatments by pooling direct and indirect comparisons through a common reference (in this case placebo and a standard reference drug).
Two features of the Cipriani methodology are worth noting in the Turner-Kirsch context. First, the team made substantial efforts to identify and include unpublished trial data, including data obtained directly from pharmaceutical-company clinical-study-report archives that had not appeared in the published literature. This was explicitly an attempt to address the publication-bias problem that Turner had documented. The Cipriani trial set is more nearly complete than any prior antidepressant meta-analysis, which makes its conclusions more robust to the publication-bias critique. Second, the team’s risk-of-bias analysis flagged a substantial portion of the trial set --- the majority, in fact --- as being at moderate or high risk of bias, with the largest single contributor being missing outcome data (a category that includes the unreported-results problem the Turner paper had highlighted).
The Cipriani paper concluded that all 21 antidepressants studied were more effective than placebo, with effect sizes ranging from small to moderate. It also concluded that the differences in efficacy between drugs were small and that the drugs were broadly acceptable to patients, although acceptability (measured by how many patients discontinued treatment) varied from drug to drug. The headline finding cited in the popular press was that all 21 drugs outperformed placebo, with effect sizes that traditional thresholds would consider small-to-moderate. This was a more positive framing of the antidepressant evidence than the Turner-Kirsch papers had been, but it was not in fundamental contradiction with them. The Cipriani meta-analysis, with substantially better completeness of the underlying trial set, found smaller effect sizes than the published-literature-only meta-analyses had reported. The direction of correction matched what Turner had predicted: when you assemble a more complete dataset, the apparent efficacy goes down.
The Cipriani paper is now the canonical reference for the contemporary clinical question of which antidepressant works best for which patients in the acute treatment of major depressive disorder. Its results have been incorporated into national prescribing guidelines, including the updated NICE guidance in the UK. The Turner paper is the methodological precursor that made the Cipriani methodology, with its insistence on chasing unpublished data, the standard rather than the exception. The fact that Cipriani’s team had to invest substantial resources in pursuing unpublished trial data through clinical-study-report archives, in 2018, ten years after the Turner paper, is itself a measure of how slowly the trial-registration and result-reporting infrastructure has matured.
The AllTrials Movement And Trial Registration
The institutional response to the Turner paper, and to the broader publication-bias literature, has been the AllTrials movement. The campaign was launched in January 2013, five years after the Turner paper, by a coalition that included Sense About Science (a UK-based scientific-integrity charity), the BMJ, the Centre for Evidence-Based Medicine, the Cochrane Collaboration, the James Lind Initiative, PLOS, and the Dartmouth Institute for Health Policy and Clinical Practice. Ben Goldacre, a British physician and the author of Bad Pharma (2012), which had built on the Turner findings to argue that publication bias was structural across pharmaceutical research, was a central early advocate. The campaign’s demand was simple: all clinical trials should be registered, prior to enrollment, in a publicly accessible registry, with the full results reported within a year of completion regardless of whether the result was positive or negative.
The legal and regulatory infrastructure for this had been partly in place since the early 2000s. The US had launched ClinicalTrials.gov in 2000, originally as a voluntary registry, then mandated for certain trial types by the FDA Amendments Act of 2007. The International Committee of Medical Journal Editors (ICMJE) had announced in 2004 that its member journals would only publish trials that had been prospectively registered. The World Health Organization had launched its International Clinical Trials Registry Platform in 2006. The EU launched the EU Clinical Trials Register in 2011 and, in 2014, passed a regulation requiring summary results to be reported within a year of trial completion for trials conducted in the EU. The legal frameworks were therefore mostly in place. What the AllTrials movement was demanding was enforcement and extension: enforcement of the existing reporting requirements (which were widely ignored, with audits showing that a substantial fraction of registered trials still had not posted results years after completion) and extension of the requirements to cover older trials whose results had never been reported.
The enforcement gap has been slow to close. Audits of ClinicalTrials.gov reporting compliance, by groups including Goldacre’s TrialsTracker at Oxford, continue to show that a substantial fraction of trials remain unreported past their statutory deadlines. The FDA has imposed civil penalties on sponsors for non-reporting only sparingly. The EU register has had similar enforcement weaknesses. The structural problem that the Turner paper documented --- that a sponsor whose trial returns a negative result has every commercial incentive to keep the result out of the published literature, and now also has weak regulatory pressure to post the result to the registry --- has not been solved. It has been partly mitigated by the existence of the registries, by the ICMJE policies that prevent unregistered trials from being published, and by groups like AllTrials and TrialsTracker that name and shame non-compliant sponsors. The published literature on drug efficacy is still, at the structural level, biased in the direction the Turner paper described, even if the bias is no longer as severe as it was in the era covered by the original Turner dataset.
For the specific case of antidepressants, the major SSRIs and SNRIs whose trials Turner audited in 2008 are now nearly all off-patent. The commercial pressure to suppress negative findings has therefore largely lifted for that drug class, and subsequent re-analyses with more complete data --- including the Cipriani 2018 meta-analysis --- have produced a more honest picture. The pattern Turner documented for antidepressants is the pattern that is, with high probability, operating right now for some currently on-patent drug class whose trials we will be re-evaluating in 2035. The methodological lesson is general.
What This Means For A Strategist Evaluating Medical Or Health Claims
For anyone whose professional decisions depend on evaluating medical, health, or scientific evidence --- as a manager, as a policy-maker, as an investor in health-related companies, as a strategist advising clients in those spaces, or simply as an educated citizen making decisions about your own care or your family’s care --- the Turner paper is one of the most useful methodological calibration points available. The practical implications cluster around several patterns.
The published literature is a biased sample, not a complete record. For any drug class, any therapeutic intervention, any clinical-decision question, the published literature is the subset of trials whose results made it through a filter that systematically favored positive results. Even in 2026, after roughly two decades of trial-registration infrastructure, the filter is still operating. The magnitude of the filter’s effect varies by drug class, by trial era, by sponsor, by therapeutic area, but the direction is consistent: the published literature overstates efficacy. The default prior for any reader of a published meta-analysis should be that the true effect is smaller than the meta-analysis reports, possibly by a substantial margin.
The strongest counter to publication bias is a regulatory-data audit. When the regulatory file --- FDA, EMA, MHRA --- is accessible and includes prespecified primary outcomes for all trials submitted, that file is the closest thing available to an unbiased ground-truth dataset. For drug-class evaluations, the question to ask is whether anyone has done a Turner-style FDA-file audit. If yes, that audit is the most reliable source. If no, the published literature should be treated as suggestive but not definitive. For non-drug interventions --- surgical procedures, behavioral interventions, dietary recommendations --- regulatory ground-truth datasets generally do not exist, and the publication-bias problem is therefore harder to bound. This is a structural reason that non-drug medical claims are typically held to a less rigorous evidentiary standard than drug claims, and a structural reason that healthcare-strategy decisions about non-drug interventions should be made with wider uncertainty bands than equivalent drug-related decisions.
Trial preregistration is now table stakes for any new trial-based evidence claim. Any clinical trial published since roughly 2010 should have a publicly accessible preregistration in ClinicalTrials.gov, the EU Clinical Trials Register, the ISRCTN registry, or a comparable platform. The preregistration should specify the primary outcome, the analytic plan, and the sample-size calculation before enrollment. A trial that is not preregistered, or whose published primary outcome does not match its preregistered primary outcome, is at high risk of being a Turner-type negative-trial-rewritten-as-positive. The check is mechanical: take the published primary outcome, look up the trial in the registry, compare. The discrepancies are usually visible at first read.
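The mechanical check described above (published primary outcome versus registered primary outcome) can be sketched as a small Python helper. This is a rough first-pass filter under stated assumptions: the function names and the sample outcome strings are hypothetical, the registered text is assumed to have been copied by hand from the registry entry, and a plain string comparison will miss rewordings that a human reader would recognize as the same outcome.

```python
import re

def normalize(text):
    """Lowercase and collapse punctuation and whitespace so formatting differences are ignored."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def check_primary_outcome(registered, published):
    """Compare a registry entry's primary outcome with the one reported in the paper."""
    if registered is None:
        return "not preregistered"
    if normalize(registered) == normalize(published):
        return "match"
    return "mismatch: review for outcome switching"

# Hypothetical examples, for illustration only.
same = check_primary_outcome(
    "Change in HAM-D total score at week 8",
    "change in HAM-D total score, at Week 8",
)
switched = check_primary_outcome(
    "Change in HAM-D total score at week 8",
    "Response rate (50% or greater HAM-D reduction) at week 8",
)
missing = check_primary_outcome(None, "Change in HAM-D total score at week 8")

print(same, switched, missing, sep="\n")
```

A “mismatch” result is a prompt to read both documents, not a verdict. Registries also record amendments, so a changed outcome that was registered before unblinding is a different situation from one that changed after the data were in.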
Patient-level data sharing is the next escalation and the most powerful tool. Beyond preregistration of outcomes, the gold-standard transparency mechanism is patient-level (also called individual-participant-data or IPD) sharing, where the underlying anonymized trial data are made available to outside researchers for re-analysis. Several major journals and several large pharmaceutical companies have committed to patient-level data sharing for certain trial sets, and the YODA Project at Yale and the European Medicines Agency’s Policy 0070 on publication of clinical data have built infrastructure for wider data access. (The COMPare project at Oxford addressed the related problem from the previous point, systematically comparing published outcomes with registered ones.) Patient-level data sharing makes Turner-style audits possible at much higher resolution than the FDA-file audit allowed. The current research-integrity reform agenda treats patient-level data sharing as the next major transparency milestone.
For non-drug medical claims, apply structural questions in lieu of regulatory data. When evaluating a claim that does not have a regulatory ground-truth dataset to audit --- a nutritional recommendation, a behavioral health intervention, a public-health policy --- the questions to ask are: How many of the relevant trials were preregistered? How many of the preregistered trials reported their primary outcome as specified? What does a funnel plot of the published meta-analysis look like (asymmetry suggests publication bias)? Is there a discrepancy between trials run by sponsors with a commercial interest and trials run by independent groups? Are the largest trials --- which are hardest to suppress --- consistent with the smaller-trial literature? When these questions cannot be answered favorably, the appropriate response is wider uncertainty bands, not skepticism for its own sake.
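The funnel-plot question has a standard numerical companion, Egger’s regression: regress each study’s standardized effect (effect divided by its standard error) on its precision (one over the standard error), and look at the intercept. An intercept near zero is consistent with a symmetric funnel; an intercept well away from zero means small studies report systematically different effects than large ones. The Python sketch below computes only the intercept, not its significance test, and the two datasets are synthetic, constructed so that the answer is known in advance.

```python
def egger_intercept(effects, std_errors):
    """Intercept from the least-squares regression of (effect / SE) on (1 / SE)."""
    xs = [1.0 / se for se in std_errors]
    ys = [effect / se for effect, se in zip(effects, std_errors)]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar

# Synthetic data, for illustration only. Larger standard error means a smaller study.
std_errors = [0.05, 0.10, 0.15, 0.20, 0.25]

# Symmetric case: every study estimates the same effect regardless of size.
symmetric_effects = [0.30 for _ in std_errors]

# Asymmetric case: smaller studies report larger effects (effect grows with SE).
asymmetric_effects = [0.30 + 1.0 * se for se in std_errors]

symmetric_intercept = egger_intercept(symmetric_effects, std_errors)
asymmetric_intercept = egger_intercept(asymmetric_effects, std_errors)

print(f"symmetric literature:  intercept = {symmetric_intercept:.2f}")
print(f"asymmetric literature: intercept = {asymmetric_intercept:.2f}")
```

Real data are noisy, so an actual analysis would also need the standard error of the intercept and enough studies for the test to have power. Asymmetry also has causes other than publication bias, such as genuine differences between small and large trials, which is why this belongs in the list of structural questions rather than standing in for a regulatory audit.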
The lag between methodological reform and visible improvement in the published literature is long. The Turner paper appeared in 2008. The ICMJE registration policy had been in force since 2005. The FDA Amendments Act expanding registration requirements passed in 2007. The EU regulation requiring results reporting passed in 2014. AllTrials launched in 2013. As of the mid-2020s, audits still show substantial non-compliance with results-reporting requirements, and the published literature for many drug classes still likely contains Turner-type distortions for trials run before the registration era. The strategist’s response should not be to assume the problem has been fixed because the institutional response has been mounted; the response should be to assume that the problem is still operating to some degree in any literature whose underlying trials are more than a few years old, and to apply the audit questions above.
The bigger lesson, the one that travels beyond medical evidence and beyond drug trials, is about what a published literature is and what it is for. The published literature is not a record of what a field has tested. It is a record of what a field has tested and chosen to report. The choice is not neutral. In every field where outcomes can be classified as more or less favorable to the hypothesis under test, the published literature is systematically biased in the direction of the favored hypothesis, by mechanisms that include outright non-publication, post-hoc outcome switching, selective reporting of subgroups, and a thousand smaller analytic discretions. The Turner paper is the cleanest demonstration of the mechanism in any field, because the regulatory infrastructure for drugs makes the ground truth available. The same mechanism is operating in fields that do not have regulatory ground truth, but the bias is harder to bound there. For a strategist, the operating principle is the same: assume the literature is biased, calibrate the size of the assumed bias to what we know from cases like Turner where the bias can be measured, and apply correspondingly wider uncertainty intervals to any decision built on top of the literature.
Sources
- Turner, E. H., Matthews, A. M., Linardatos, E., Tell, R. A., & Rosenthal, R. (2008). Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358(3), 252-260. DOI: 10.1056/NEJMsa065779 --- the primary paper.
- Kirsch, I., Deacon, B. J., Huedo-Medina, T. B., Scoboria, A., Moore, T. J., & Johnson, B. T. (2008). Initial severity and antidepressant benefits: A meta-analysis of data submitted to the Food and Drug Administration. PLOS Medicine, 5(2), e45. DOI: 10.1371/journal.pmed.0050045 --- the parallel FDA-file reanalysis on severity stratification.
- Cipriani, A., Furukawa, T. A., Salanti, G., et al. (2018). Comparative efficacy and acceptability of 21 antidepressant drugs for the acute treatment of adults with major depressive disorder: a systematic review and network meta-analysis. The Lancet, 391(10128), 1357-1366. DOI: 10.1016/S0140-6736(17)32802-7 --- the modern integrative meta-analysis.
- Turner, E. H. (2013). Publication bias, with a focus on psychiatry: Causes and solutions. CNS Drugs, 27(6), 457-468. DOI: 10.1007/s40263-013-0067-9 --- Turner’s own retrospective and policy synthesis five years after the NEJM paper.
- Drazen, J. M., Van Der Weyden, M. B., Sahni, P., Rosenberg, J., Marusic, A., Laine, C., et al. (2009). Uniform format for disclosure of competing interests in ICMJE journals. New England Journal of Medicine, 361, 1896-1897. --- ICMJE editorial pattern that built on the Turner findings.
- Wieseler, B., McGauran, N., Kerekes, M. F., & Kaiser, T. (2010). Reporting of randomised controlled trials of reboxetine in published reports compared with clinical study reports: a meta-analysis. BMJ, 341, c4737. DOI: 10.1136/bmj.c4737 --- the IQWiG reboxetine audit that extended the Turner methodology.
- Goldacre, B. (2012). Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients. Fourth Estate. --- the trade-book synthesis that built on Turner and helped launch AllTrials.
- AllTrials Campaign. (2013, launched). Mission statement and current status. --- the institutional response campaign.
- Goldacre, B., DeVito, N. J., Heneghan, C., Irving, F., Bacon, S., Fleminger, J., & Curtis, H. (2018). Compliance with requirement to report results on the EU Clinical Trials Register: cohort study and web resource. BMJ, 362, k3218. DOI: 10.1136/bmj.k3218 --- the TrialsTracker audit of the EU register’s results-reporting compliance.
Related: Other Studies in This Series
This article is part of an ongoing series on famous claims, frameworks, and studies that did not survive scrutiny. Other entries in the medical, statistical, and methodological-bias cluster cover the publication-bias and file-drawer problem in general, the stress-causes-ulcers myth, the p-hacking and researcher-degrees-of-freedom literature, the saturated-fat / diet-heart hypothesis, and Ioannidis 2005 on why most published research findings are false. The full hub lives at /replication-crisis/.
FAQ
Did Turner’s paper claim that antidepressants do not work? No. The paper claimed that the published literature on antidepressants overstated their efficacy by approximately a third relative to the FDA’s complete trial dataset. The FDA dataset still showed that the drugs as a class were more effective than placebo. The argument was about the magnitude of the effect, not about its existence. The Kirsch paper that appeared the following month, using a subset of the same FDA data, made a related but distinct argument: that the average drug-placebo difference, even in the unbiased dataset, fell below the NICE clinical-significance threshold for patients with mild and moderate depression. The combined picture from the two papers is that antidepressants are not ineffective, but the clinical case for prescribing them across the full severity range of depression, as had been standard US practice, rested on a literature that overstated their benefit. The picture has been refined further by the Cipriani 2018 meta-analysis, which drew on a far more complete dataset and reaffirmed that all 21 antidepressants studied were more effective than placebo, with effect sizes that ranged from small to moderate.
Should I stop taking my antidepressant because of Turner’s paper? No, and any decision about whether to continue, change, or discontinue an antidepressant should be made with the prescribing clinician and not on the basis of a single methodological paper. The Turner paper is a finding about the population-level evidence base for the drug class, not a finding about whether any particular drug works for any particular patient. Individual response to antidepressants varies substantially. Some patients have clear, sustained, clinically meaningful responses. Others do not. The decision to start, continue, switch, or discontinue an antidepressant involves clinical factors --- severity of depression, response history, side-effect tolerance, alternative treatments available, presence of co-occurring conditions --- that the population-level evidence does not directly address. Discontinuation can also have its own risks, including discontinuation syndrome for some drugs and recurrence of depressive symptoms. The Turner paper informs the prescribing question at the population level; it does not adjudicate the decision for any individual.
Is publication bias still as bad as Turner documented? Probably not as bad, but not solved either. The Turner dataset covered antidepressants approved between 1987 and 2004, predating the modern trial-registration infrastructure. Trials run since the 2007 FDA Amendments Act and the ICMJE registration policies are more likely to be registered, more likely to have their primary outcomes specified in advance, and more likely to have their results posted to a public registry. But audits of compliance with the registration and reporting requirements --- including the TrialsTracker work at Oxford --- continue to find that a substantial fraction of trials remain unreported or partially reported past their statutory deadlines. The publication-bias problem has been reduced by the institutional response but has not been eliminated. For older trials that predate the registration era, the Turner-style distortions are likely still embedded in the published literature, with no easy mechanism to correct them.
What is the AllTrials movement and where does it stand now? AllTrials is a global campaign launched in January 2013 by a coalition that included Sense About Science, the BMJ, the Cochrane Collaboration, PLOS, and others. The campaign’s demand is that all clinical trials, past and present, should be registered, with their full results reported. Its petition has been signed by tens of thousands of individuals and hundreds of organizations, including a large number of pharmaceutical companies, patient groups, and academic institutions. Its institutional impact has been substantial in pushing for stronger regulatory enforcement of existing registration and reporting requirements, particularly in the EU. Its remaining structural challenge is that the regulatory enforcement teeth (civil penalties for non-reporting) have been used only sparingly by the FDA and the EU regulators, and many trials still go unreported past their statutory deadlines without consequence. The campaign is ongoing; its central reform agenda is partly accomplished and partly still in progress.
Why does this matter beyond the specific case of antidepressants? The Turner methodology was a forensic audit of one drug class because that was the drug class the lead author had insider access to. The methodology generalizes to any drug or drug class for which a regulatory ground-truth dataset can be obtained. Subsequent audits using similar methods have found similar patterns for other drugs, including reboxetine (Wieseler 2010), oseltamivir / Tamiflu (Doshi and colleagues, 2014), and others. Beyond drugs, the same publication-bias mechanism is operating in any field where outcomes are classified as more or less favorable to a hypothesis and where there is no regulatory mandate to report results regardless of outcome. Psychology, economics, education research, organizational behavior, and many other fields show indirect evidence of publication bias of comparable magnitude to what Turner found directly for antidepressants, but they cannot be audited as cleanly because they lack the FDA-style ground-truth dataset. The Turner case is the calibration point: when the publication-bias inflation can be measured directly, it is on the order of a third of the effect size. When it cannot be measured directly, the appropriate prior is that something in that range is operating.
What is patient-level data sharing and why is it the next escalation? Patient-level data sharing, also called individual-participant-data (IPD) sharing, is the practice of making the underlying anonymized trial data available to outside researchers for re-analysis. It is a more powerful transparency mechanism than results reporting because it allows outside researchers to verify the analyses that the sponsor reported, to conduct sensitivity analyses with different analytic choices, to combine data across trials at the patient level rather than the trial level (which is statistically more efficient), and to ask new questions that the original analysis did not address. Several major journals (including the BMJ and PLOS Medicine) and several large pharmaceutical companies have committed to patient-level data sharing for at least subsets of their trials. Infrastructure projects including the YODA Project at Yale and the Vivli platform have built operational systems to support it, and the European Medicines Agency’s Policy 0070 commits the agency to publishing the clinical reports submitted with marketing applications. A related Oxford effort, the COMPare project, audits outcome switching in published trials rather than sharing data. The reform agenda in research integrity treats patient-level data sharing as the next major milestone after registration and results reporting; the realistic time scale for it to become standard practice is probably the next decade.
What should I do as a strategist when evaluating any medical or health claim? Apply a sequence of structural questions. First, has a regulatory-data audit (Turner-style) been done for this drug class, intervention, or therapeutic area? If yes, that audit is the most reliable evidence base, even if it is older than the most recent published meta-analyses. Second, are the relevant trials preregistered, with primary outcomes specified in advance? Check the registries (ClinicalTrials.gov, EU CTR, ISRCTN). Third, does the published primary outcome match the preregistered primary outcome? Discrepancies suggest Turner-type outcome switching. Fourth, does the published meta-analysis include unpublished trial data (often disclosed in the methods section as “industry-source data” or “clinical study reports”)? If yes, the meta-analysis is more reliable than published-literature-only equivalents. Fifth, is there a funnel plot or other publication-bias diagnostic in the meta-analysis, and what does it look like? Sixth, are the largest trials --- which are hardest to suppress --- consistent in direction and magnitude with the smaller-trial literature? When these questions can be answered favorably, the evidence base is more reliable than the default. When they cannot, apply correspondingly wider uncertainty bands to any decision built on it. None of this is exotic; it is standard evidence-evaluation discipline applied with the calibration that Turner-class cases force on you. The strategic value is that most decision-makers in most fields do not apply this discipline, and the people who do can correctly identify which medical-evidence claims are robust and which are likely to be revised when the underlying literature is audited.
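The fifth question, on funnel plots and other publication-bias diagnostics, can be made concrete with a small sketch. One common numerical companion to a funnel plot is Egger's regression: regress each trial's standardized effect (effect divided by its standard error) on its precision (one over the standard error) and inspect the intercept. An intercept far from zero suggests small-study asymmetry, which is consistent with publication bias but does not prove it. The version below is a minimal pure-Python illustration; the function name and the example numbers are hypothetical, and a real analysis would use an established meta-analysis package and report a proper p-value.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger's regression for funnel-plot asymmetry.

    Fits ordinary least squares of (effect / SE) on (1 / SE) and returns
    (intercept, t_statistic). The t-statistic has n - 2 degrees of freedom.
    """
    if len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must have the same length")
    n = len(effects)
    if n < 3:
        raise ValueError("need at least three trials")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    precision = [1.0 / se for se in std_errors]
    standardized = [e / se for e, se in zip(effects, std_errors)]

    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, standardized))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Standard error of the intercept from the residual variance.
    residual_ss = sum(
        (y - intercept - slope * x) ** 2 for x, y in zip(precision, standardized)
    )
    residual_var = residual_ss / (n - 2)
    se_intercept = math.sqrt(residual_var * (1.0 / n + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, t_stat


if __name__ == "__main__":
    # Hypothetical trials: the smaller (higher-SE) trials report larger effects.
    effects = [0.2, 0.3, 0.5, 0.8]
    std_errors = [0.05, 0.1, 0.2, 0.4]
    intercept, t_stat = egger_intercept(effects, std_errors)
    print("intercept:", round(intercept, 3), "t:", round(t_stat, 2))
```

With only a handful of trials the test has little power, so a quiet intercept is weak reassurance; it complements the registry checks in the other questions rather than replacing them.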