In April 2013, the New England Journal of Medicine published one of the most consequential nutrition trials of the decade. The Spanish PREDIMED study — short for “PREvención con DIeta MEDiterránea” — had randomized 7,447 adults at high cardiovascular risk into three arms: a Mediterranean diet supplemented with one liter of extra-virgin olive oil per week, a Mediterranean diet supplemented with thirty grams of mixed nuts per day, or a low-fat control. After a median 4.8 years of follow-up, the trial was stopped early on the recommendation of its Data Safety Monitoring Board. The two Mediterranean arms had shown roughly a 30 percent relative reduction in major cardiovascular events — a composite endpoint of myocardial infarction, stroke, and death from cardiovascular causes — compared to the low-fat control (Estruch et al., 2013). The hazard ratio for the olive-oil arm was 0.70 (95% CI 0.54–0.92), and 0.72 (95% CI 0.54–0.96) for the nuts arm. A nutritional intervention, the trial argued, could prevent heart attacks at a magnitude approaching statin therapy.

The paper was extraordinary in the sense that earns credit. It was an RCT (not a cohort study, not a case-control study, not a meta-analysis of observational data) testing a whole-diet intervention against a control in a population large enough to detect a hard outcome. Within hours of publication it was on the front page of The New York Times, The Guardian, and Le Monde. Within a few years it was cited in the 2015 Dietary Guidelines for Americans, the European Society of Cardiology prevention guidelines, the American Heart Association statements on dietary patterns, and the WHO’s noncommunicable disease prevention materials. Mediterranean-diet cookbooks topped the bestseller lists in three languages. The olive oil industry in Spain, Italy, and Greece, including the consortium that had partially funded the trial, saw measurable export growth in the two years following publication. PREDIMED became the citation that nutritionists, dietitians, and physicians reached for when a patient asked “is there an RCT on diet and heart disease?” The answer, after PREDIMED, was finally yes.

In June 2018, the same journal retracted the 2013 paper. The retraction notice was unusual: it did not allege fraud, fabrication, or bad faith. It pointed to a methodological problem identified by an anesthetist in Devon, England, whose hobby was statistical pattern-matching across thousands of trials at a time. John Carlisle’s 2017 paper in Anaesthesia had run a chi-squared compatibility test on baseline characteristics from 5,087 RCTs, PREDIMED among them, and flagged the trial as having baseline distributions that were too tightly balanced to be consistent with simple individual randomization (Carlisle, 2017). When the PREDIMED investigators followed up on the flag, they found that roughly one in five trial participants had not been randomized individually. Participants at one recruiting site had been assigned to their arm by clinic: every patient at a given primary-care center got the same diet for about a year before the site rotated to the next arm. At multiple sites, household members of already-enrolled participants were assigned to the same arm as their spouse or relative, on the practical grounds that you cannot ask one half of a couple to cook with olive oil and the other half to avoid fats. A small number of participants were assigned at the supervising physician’s discretion when the randomization envelope was missing.

None of this had been disclosed in the 2013 paper. The original analysis treated all 7,447 participants as independently randomized. The protocol violation was not catastrophic — the affected fraction was a minority, and the deviations had a defensible operational rationale — but it was significant enough that the journal and the authors agreed the paper could not stand as published. The retraction was the right move.

The same day the 2013 paper was retracted, NEJM published the re-analysis: the same dataset and the same endpoints, but with corrected statistical models that accounted for the clustering of non-randomized participants. The corrections were cluster-robust standard errors at the clinic level, household pairs treated as a single unit, and a sensitivity analysis that excluded the non-individually-randomized participants entirely (Estruch et al., 2018). The headline numbers barely moved. The olive-oil arm hazard ratio became 0.69 (95% CI 0.53–0.91). The nuts arm remained 0.72 (95% CI 0.54–0.95). The roughly 30 percent reduction in major cardiovascular events survived. The Mediterranean diet, even under a more conservative statistical lens that handicapped the trial for its real-world randomization failures, still beat the low-fat control.

This is the part of the story whose chronology almost nobody outside evidence-based medicine got right. The retraction made global news. The same-day re-publication did not. For most of the lay press and a non-trivial fraction of the practitioner literature, PREDIMED is filed mentally as “that retracted Mediterranean diet study”: a discredited landmark, a cautionary tale about overhyped nutrition research. That filing is wrong. PREDIMED is a cautionary tale about RCT methodology and about how high-stakes trials get run with imperfect operations under field conditions, but the substantive nutritional claim it tested is intact under the corrected analysis. The most useful frame for a strategist is not “the science is broken” but “the scientific process worked, slowly and in public, and the surviving signal is what you should weight.”

This article walks through the 2013 trial as originally reported, the statistical detective work that produced the 2017 Carlisle paper, how the retraction-and-re-analysis cycle unfolded in the first half of 2018, what specifically broke in the randomization and why it was not fatal to the effect estimate, what the episode reveals about cluster vs. individual randomization in pragmatic nutrition trials, and what it teaches a strategist or operator who needs to evaluate “diet X prevents disease Y” claims from the public-facing science press.

The 2013 Paper: A Landmark Nutrition RCT

The PREDIMED trial was designed in 2003 by a consortium of Spanish nutritionists, epidemiologists, and cardiologists led by Ramón Estruch at the Hospital Clínic in Barcelona and Miguel Ángel Martínez-González at the Universidad de Navarra in Pamplona. The motivation was straightforward and, at the time, unresolved: decades of observational evidence, most prominently the Seven Countries Study (Keys et al., 1986), the EPIC cohort (Trichopoulou et al., 2003), and a long tail of regional cross-sectional analyses, had associated the traditional Mediterranean dietary pattern with lower cardiovascular mortality. But every one of those studies was observational and therefore vulnerable to confounding by lifestyle, income, education, and physical activity. There had never been a randomized controlled trial large enough and long enough to test whether assigning the diet, rather than merely observing it, prevented cardiovascular events.

PREDIMED set out to be that trial. The eligibility criteria targeted high-risk primary prevention: men 55–80 or women 60–80 with either type 2 diabetes or at least three major cardiovascular risk factors (current smoking, hypertension, dyslipidemia, overweight, or family history of premature coronary disease), and no prior cardiovascular event at baseline. Recruitment ran from October 2003 through June 2009 at eleven recruiting centers across Spain — a mix of primary care clinics, hospital outpatient departments, and university nutrition centers. The protocol called for individual randomization, stratified by site, sex, and age, to one of the three arms.

The two Mediterranean intervention arms were operationalized through a combination of dietary education and material provision. Participants in the olive-oil arm received one free liter of extra-virgin olive oil per week, delivered to them by the study staff — enough for their household’s cooking and drizzling needs. Participants in the nuts arm received thirty grams per day of a mixed nut pack (fifteen grams walnuts, 7.5 grams almonds, 7.5 grams hazelnuts) — also delivered for free. Both Mediterranean groups received quarterly individual sessions and group educational sessions with a study dietitian, who handed out a fourteen-point Mediterranean diet adherence checklist and worked with participants on meal planning. The low-fat control group received a leaflet on low-fat dietary recommendations and, after a 2006 protocol amendment, was upgraded to receive equivalent quarterly dietitian sessions covering low-fat dietary education.

The primary endpoint was a composite of nonfatal myocardial infarction, nonfatal stroke, or death from cardiovascular causes. Secondary endpoints included the individual components, all-cause mortality, and incident diabetes. The trial was powered to detect a 25 percent relative risk reduction at 80 percent power over a planned five-year follow-up. The data and safety monitoring board (DSMB), as is standard for cardiovascular outcome trials, was instructed to recommend early stopping if a pre-specified efficacy boundary was crossed.

In July 2011, after a median 4.8 years of follow-up and 288 confirmed primary endpoint events — 96 in the olive-oil arm (3.8%), 83 in the nuts arm (3.4%), and 109 in the control arm (4.4%) — the DSMB recommended early termination on the grounds of clear efficacy in both Mediterranean arms relative to control. The trial was stopped, the final analyses were run, and the manuscript was submitted to NEJM in late 2012. It was published online on February 25, 2013 and in print in the April 4, 2013 issue (Estruch et al., 2013).

The reception was rapid and lopsided. The trial was the lead item on NEJM’s home page for two weeks. The first New York Times piece, by Gina Kolata, appeared the same day the embargo lifted and ran on the front page above the fold. The Guardian called it “potentially the most important study of how diet relates to disease ever published” in a Saturday weekend editorial. Within twelve months, the paper had accumulated over 1,400 citations — an extraordinary velocity for a clinical trial — and was being cited as the cornerstone evidence in seven national dietary guidelines and the WHO’s revised noncommunicable disease prevention framework. Spain’s olive oil export volume rose roughly 9 percent year-over-year in 2014 and 2015, with industry analysts crediting PREDIMED as a “demand-side accelerant” in the trade press of the time.

The published 2013 paper described the randomization in two sentences in the Methods section: participants were “randomly assigned in a 1:1:1 ratio” to the three diets, with random allocation generated by the trial coordinating center and concealed from the recruiting investigators. The paper did not specify cluster randomization, did not flag any deviation, and presented all 7,447 participants as independent units in the primary intention-to-treat analysis. Subsequent inspection — after Carlisle’s 2017 flag — would show that this description was incomplete in a way that mattered.

John Carlisle’s Statistical Pattern-Matching

The person who eventually surfaced the methodological problem was not a nutrition researcher, not a cardiologist, and not a member of any peer-review or regulatory body. John Carlisle is a consultant anesthetist at Torbay Hospital in Devon, England, a working clinician whose academic side-interest is the statistical detection of fabricated, fraudulent, and unreliable randomized controlled trials. Carlisle had been refining his methodology for years, beginning with a 2012 investigation of the work of Japanese anesthesiologist Yoshitaka Fujii — Carlisle’s analysis of baseline data distributions across Fujii’s 168 trials was a key element in the eventual finding that 172 of Fujii’s papers should be retracted, one of the largest fraud cases in medical research history (Carlisle, 2012; Carlisle & Loadsman, 2017).

The method Carlisle used in the Fujii case generalized. Across a large set of supposedly randomized trials by a single author or research group, you can compute the distribution of p-values for a chi-squared compatibility test on the baseline characteristics: variables like age, weight, height, and blood pressure, which are measured before any intervention and should therefore differ between randomized arms only by chance. Under genuine randomization with large enough samples, these baseline-table p-values should themselves be uniformly distributed between zero and one. Deviation in either direction signals trouble. A glut of very high p-values (baseline characteristics suspiciously well-matched, more so than chance allows) implies selective reporting, fabricated data, or a hidden non-random component to the assignment. A glut of very low p-values (baseline characteristics suspiciously poorly matched) implies broken randomization or selection effects in enrollment.
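The uniformity claim can be checked with a small simulation. The sketch below is illustrative only: it uses one synthetic baseline variable (age), a plain large-sample z-test as a stand-in for the compatibility test described above, and a hypothetical balance-seeking allocation (sort by age, then deal participants out alternately) as a contrast case. None of the function names or parameters come from Carlisle’s papers.

```python
import math
import random
import statistics


def two_sample_p(arm_a, arm_b):
    """Two-sided p-value for a difference in means (large-sample z-test)."""
    se = math.sqrt(
        statistics.variance(arm_a) / len(arm_a)
        + statistics.variance(arm_b) / len(arm_b)
    )
    z = (statistics.fmean(arm_a) - statistics.fmean(arm_b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def randomized(ages, rng):
    """Genuine individual randomization: shuffle, then split in half."""
    shuffled = ages[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def balance_seeking(ages, rng):
    """Hypothetical non-random allocation: sort by age, alternate arms."""
    ordered = sorted(ages)
    return ordered[0::2], ordered[1::2]


def simulate_p_values(assign, n_trials=400, n_per_trial=200, seed=0):
    """Baseline p-value for age in each of n_trials synthetic two-arm trials."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_trials):
        ages = [rng.gauss(67, 6) for _ in range(n_per_trial)]
        p_values.append(two_sample_p(*assign(ages, rng)))
    return p_values


def share_above(p_values, cutoff):
    """Fraction of p-values that exceed the cutoff."""
    return sum(p > cutoff for p in p_values) / len(p_values)


random_p = simulate_p_values(randomized)
balanced_p = simulate_p_values(balance_seeking)
for label, ps in [("randomized", random_p), ("balance-seeking", balanced_p)]:
    print(f"{label}: mean p = {statistics.fmean(ps):.2f}, "
          f"share above 0.8 = {share_above(ps, 0.8):.2f}")
```

Under genuine randomization the mean p-value should sit near 0.5, with about a fifth of the p-values above 0.8. The balance-seeking allocation should concentrate the p-values at the high end of the range, which is the “glut of very high p-values” pattern.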

In 2017, Carlisle scaled the method up. His paper in Anaesthesia — “Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals” — applied the baseline-compatibility test to 5,087 RCTs published in eight journals over fifteen years: six anaesthesia journals plus the Journal of the American Medical Association (JAMA) and the New England Journal of Medicine. The analysis flagged trials where the distribution of baseline-table p-values was inconsistent with what genuine individual randomization should produce. Of the 5,087 trials examined, Carlisle identified 90 (1.8 percent) with baseline distributions suggesting either fraud, fabrication, or significant departures from the randomization described in the methods section.

PREDIMED was on the list of flagged trials, not because Carlisle had specifically targeted it, but because his automated pipeline flagged trials with anomalous baseline distributions and PREDIMED’s baseline table tripped the threshold. The trial’s baseline characteristics across the three arms were suspiciously balanced. Continuous variables like age, body mass index, systolic blood pressure, and waist circumference were almost identical across arms, more so than 7,447 independently randomized participants should produce by chance. The pattern was the statistical fingerprint of an assignment process other than the one described in the paper: when some participants are assigned by clinic, by household, or by a physician trying to keep arms balanced, the spread of baseline differences between arms no longer matches what simple individual randomization predicts, and a compatibility test calibrated to individual randomization registers the mismatch.

Carlisle published the list of flagged trials in 2017 and notified the relevant journals. NEJM contacted the PREDIMED investigators in late 2017 to ask for clarification. The investigators, to their credit, did not stonewall, did not litigate the methodology, and did not contest the flag. They did what good scientists do under those conditions: they went back to the original source documents from the eleven recruiting centers and audited their own randomization log against what had been done in the field.

The audit took roughly six months. By early 2018 the PREDIMED team had a clear picture of what had happened.

What Actually Broke in the Randomization

The audit identified three distinct departures from individual randomization, all of which had been present in the field but had not been documented in the 2013 paper.

First, at one of the eleven recruiting centers — Reus, the site coordinated by the Universitat Rovira i Virgili — the local investigators had been performing cluster randomization at the level of the primary care clinic for a substantial portion of recruitment. Rather than randomizing individual patients, the Reus team would assign a primary care clinic to one of the three arms for a period of roughly a year, during which every patient enrolled at that clinic would be on the same diet. The team would then rotate to the next clinic and the next arm. The rationale was operational: the trial staff would visit each primary care clinic in person to deliver olive oil or nuts, run dietary education sessions, and collect adherence data. Running one diet per clinic at a time was significantly cheaper and lower-friction than running three diets in parallel at every clinic, every week, for six years. The Reus site accounted for roughly 467 participants — about 6 percent of total enrollment.

Second, at multiple sites, household members of already-enrolled participants had been assigned to the same arm as their spouse or relative. The rationale was unavoidable: you cannot ask one half of a married couple to cook their meals with extra-virgin olive oil and the other half to cook with the low-fat replacement, in the same kitchen, for five years, and expect either of them to comply with the protocol. The pragmatic solution — assign both spouses to the same arm — was the only operationally viable choice in a free-living nutrition trial. The number of household-clustered participants was estimated at roughly 425 across the trial (about 5.7 percent of enrollment).

Third, at one site (Valencia), there were a small number of participants, fewer than fifty, for whom the randomization envelope was missing or damaged on enrollment day, and the supervising physician assigned the participant to an arm at his or her discretion, with the intent of keeping arm sizes balanced. This is the kind of unconcealed, discretionary allocation that medical statisticians have warned against since the 1948 streptomycin trial. It is not random and it can introduce subtle selection effects, but in this case the number affected was small.

In total, roughly 19 percent of the 7,447 PREDIMED participants were not individually randomized in the strict sense the 2013 paper had implied. About 6 percent were cluster-randomized at the clinic level, about 6 percent were household-clustered, and a small additional fraction were assigned non-randomly when randomization materials were missing.
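A quick arithmetic check on the itemized groups, using only figures stated above (467 at Reus, 425 household-clustered, and 50 taken as an upper bound for Valencia); the variable names are illustrative. On these counts the three groups sum to under 13 percent of enrollment, so they do not by themselves account for the roughly 19 percent total.

```python
TOTAL_ENROLLED = 7447

# Counts as stated in the text; 50 is an upper bound ("fewer than fifty").
itemized_groups = {
    "clinic-clustered (Reus)": 467,
    "household-clustered": 425,
    "contingency-assigned (Valencia, upper bound)": 50,
}

shares = {name: count / TOTAL_ENROLLED for name, count in itemized_groups.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

itemized_share = sum(shares.values())
print(f"itemized total: at most {itemized_share:.1%} of {TOTAL_ENROLLED}")
```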

The implications for the statistical analysis were specific and technical. For the clinic-cluster and household-cluster participants, treating them as independent observations in a standard hazard model artificially inflates the effective sample size and underestimates the standard error of the treatment effect — the trial’s confidence intervals were narrower than they should have been, and its p-values were smaller than they should have been. The point estimate of the hazard ratio is not necessarily biased — the cluster effects average out across the three arms if the cluster sizes and characteristics are balanced — but the precision claim is overstated. For the non-randomly-assigned participants at Valencia, there is a potential selection bias on top of the precision issue: physicians selecting arms to balance enrollment may unintentionally select on observed or unobserved characteristics that predict the outcome.
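The size of the precision problem can be approximated with the standard design-effect formula for equal-sized clusters, 1 + (m − 1) × ICC, where m is the cluster size and ICC is the intra-cluster correlation. The sketch below is a rough illustration under assumed inputs: the cluster sizes and ICC values are hypothetical and are not estimates from PREDIMED.

```python
import math


def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations that n clustered ones are worth."""
    return n / design_effect(cluster_size, icc)


# Hypothetical scenarios: the cluster sizes and ICC values are assumptions
# chosen for illustration, not estimates from the trial.
scenarios = [
    ("household pairs", 425, 2, 0.30),
    ("clinic clusters", 467, 40, 0.02),
]
for label, n, cluster_size, icc in scenarios:
    deff = design_effect(cluster_size, icc)
    n_eff = effective_sample_size(n, cluster_size, icc)
    print(f"{label}: design effect {deff:.2f}, "
          f"effective n {n_eff:.0f} of {n}, "
          f"naive SE too small by a factor of {math.sqrt(deff):.2f}")
```

The point of the formula is that even a small ICC matters when clusters are large, while a large ICC matters little when clusters are pairs.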

None of these patterns is fraud. None is fabrication. All of them are well within the realm of what happens in pragmatic field-deployed nutrition trials, and the underlying decisions (cluster randomization for operational feasibility, household assignment for compliance, contingency assignment when materials are missing) are defensible on their own terms. But the 2013 paper had not disclosed any of them, and the analysis it presented had not accounted for any of them. The trial as published was not an accurate description of the trial as conducted.

The Retraction-And-Re-Analysis Cycle

Once the audit confirmed the picture, NEJM and the PREDIMED investigators agreed on the joint course of action. On June 13, 2018, NEJM published a retraction notice for the 2013 paper. The notice cited “protocol deviations including the enrollment of household members and the assignment of participants to a study group on the basis of the study group assigned to the clinic site” and stated that the 2013 paper “should be replaced by a new article reflecting the recomputed effect sizes.”

On the same day, NEJM published the replacement article online: Estruch et al., “Primary Prevention of Cardiovascular Disease with a Mediterranean Diet Supplemented with Extra-Virgin Olive Oil or Nuts,” NEJM 378(25): e34, in the issue dated June 21, 2018. The replacement article reported the trial with the same patients, same endpoints, and same follow-up, but with three substantive analytical changes:

(1) Cluster-robust standard errors at the clinic level. All hazard ratios were recomputed with sandwich variance estimators clustering at the recruiting-center level — the most conservative defensible adjustment, treating all 7,447 participants as if they were clustered by their site even though most were individually randomized. This widens the confidence intervals.

(2) Household-paired analysis. For the household-clustered participants (the 425 cases of spouse-or-relative co-enrollment), a separate analysis treated the household as the unit of randomization rather than the individual.

(3) A sensitivity analysis excluding the non-individually-randomized participants entirely. This dropped the Reus cluster-randomized participants, the household-clustered participants, and the Valencia contingency-assigned participants from the dataset, leaving only the strictly individually-randomized subset (roughly 81 percent of the original cohort) for an “as randomized” sensitivity check.
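The idea behind change (1) can be illustrated in a much simpler setting than a survival model. The sketch below computes a naive and a cluster-robust (sandwich) standard error for a difference in event proportions, on synthetic data in which every cluster sits entirely within one arm. It is an assumption-laden toy: the data, the cluster counts, and the event risks (far higher than the trial’s) are invented for illustration, and it is not the model used in the replacement article.

```python
import math
import random


def naive_se(arm):
    """SE of the arm's event proportion, treating participants as independent."""
    outcomes = [y for cluster in arm for y in cluster]
    n = len(outcomes)
    mean = sum(outcomes) / n
    variance = sum((y - mean) ** 2 for y in outcomes) / (n - 1)
    return math.sqrt(variance / n)


def cluster_robust_se(arm):
    """Sandwich SE: residuals are summed within each cluster before squaring."""
    outcomes = [y for cluster in arm for y in cluster]
    n = len(outcomes)
    mean = sum(outcomes) / n
    meat = sum(sum(y - mean for y in cluster) ** 2 for cluster in arm)
    return math.sqrt(meat) / n


def simulate_arm(rng, n_clusters, cluster_size, max_risk):
    """Synthetic arm: each cluster draws its own event risk, inducing correlation."""
    arm = []
    for _ in range(n_clusters):
        risk = rng.uniform(0.0, max_risk)
        arm.append([1 if rng.random() < risk else 0 for _ in range(cluster_size)])
    return arm


rng = random.Random(42)
treated = simulate_arm(rng, n_clusters=60, cluster_size=30, max_risk=0.3)
control = simulate_arm(rng, n_clusters=60, cluster_size=30, max_risk=0.4)

# Arms are independent, so the SE of the difference combines in quadrature.
naive_diff_se = math.hypot(naive_se(treated), naive_se(control))
robust_diff_se = math.hypot(cluster_robust_se(treated), cluster_robust_se(control))
print(f"naive SE of risk difference:          {naive_diff_se:.4f}")
print(f"cluster-robust SE of risk difference: {robust_diff_se:.4f}")
```

When outcomes within a cluster are positively correlated, the cluster-robust figure should come out larger than the naive one, which is why this kind of adjustment tends to widen confidence intervals.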

The results, across all three analytical changes, were remarkably stable. The headline hazard ratio for the olive-oil arm vs. the low-fat control became 0.69 (95% CI 0.53–0.91, p = 0.008) — essentially identical to the 0.70 (0.54–0.92, p = 0.009) of the 2013 paper. The nuts arm became 0.72 (95% CI 0.54–0.95, p = 0.02) — essentially identical to the 0.72 (0.54–0.96, p = 0.03) of the 2013 paper. The sensitivity analysis excluding the non-individually-randomized participants produced confidence intervals that were slightly wider but still excluded the null. The conclusion — that the Mediterranean diet, supplemented with either olive oil or nuts, reduced major cardiovascular events by roughly 30 percent relative to a low-fat control — was preserved.
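As a rough cross-check on how little the estimates moved, the reported hazard ratios and confidence intervals can be converted into log-scale standard errors and approximate p-values. The sketch below assumes the intervals are symmetric on the log scale and uses a normal approximation; because the published figures are rounded to two decimals, the recovered p-values should land close to the quoted ones but need not match them exactly.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def log_se_from_ci(lower, upper):
    """Standard error of log(HR) implied by a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def two_sided_p(hr, lower, upper):
    """Approximate two-sided p-value for HR = 1 (normal approximation)."""
    z = math.log(hr) / log_se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


# Hazard ratios and 95% CIs as quoted in the text.
reported = {
    "olive oil, 2013": (0.70, 0.54, 0.92),
    "olive oil, 2018": (0.69, 0.53, 0.91),
    "nuts, 2013": (0.72, 0.54, 0.96),
    "nuts, 2018": (0.72, 0.54, 0.95),
}
for label, (hr, lower, upper) in reported.items():
    print(f"{label}: relative reduction {1 - hr:.0%}, "
          f"log-scale SE {log_se_from_ci(lower, upper):.3f}, "
          f"approx p {two_sided_p(hr, lower, upper):.3f}")
```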

The substantive science survived the methodological retraction. This is the part of the PREDIMED story that, if you internalize only one fact from this article, you should internalize: the retraction was not because the finding was wrong. The retraction was because the analysis did not match the trial as conducted. The corrected analysis preserved the finding.

What This Teaches About RCT Methodology

The PREDIMED episode is unusual in clinical trial history because it cleanly separates two different failure modes that are usually conflated. The first failure mode is “the result was a statistical artifact” — the finding evaporates or shrinks substantially when better analytical methods are applied. The second failure mode is “the methodology was inadequately reported” — the paper as written did not accurately describe the trial as conducted, even though the underlying scientific question was answered correctly. These are different problems with different implications for how readers should update their beliefs.

PREDIMED is unambiguously the second failure mode and not the first. The trial did suffer from real methodological problems (clinic-level clustering, household clustering, contingency assignment), but those problems were not large enough or skewed enough to materially shift the effect estimate. The trial’s conclusion survived a substantially more conservative reanalysis. The trial’s reporting did not. The retraction was the correct editorial move (the published paper described a trial that had not quite been conducted as described); the re-publication was the correct scientific move (the underlying data, properly analyzed, supported the original substantive claim).

The deeper methodological point is about the operational difficulty of strict individual randomization in pragmatic field nutrition trials. Drug trials are operationally easy to individually randomize: you swap one identically-shaped pill for another and the participant cannot tell. Surgical trials and behavioral trials are operationally harder. Whole-diet trials in free-living adults — where the intervention is what the participant cooks and eats for five years in their own household — are operationally extremely hard. The household-clustering problem PREDIMED ran into is genuinely intrinsic to the type of trial it was: it is essentially impossible to assign one spouse to a Mediterranean diet and the other to a low-fat diet in the same kitchen for half a decade without massive contamination of the assigned conditions. The cluster-randomization compromise at the Reus site is, with hindsight, exactly the kind of operational compromise you might expect a budget-constrained academic research group to make and not flag adequately in the protocol description.

For the next generation of pragmatic dietary trials — and there are several large ones in progress — the post-PREDIMED standard is now considerably tighter. Pre-registered protocols are expected to specify the randomization unit (individual, household, clinic) explicitly. Statistical analysis plans are expected to specify cluster-robust methods upfront. Baseline-balance audits using Carlisle-style chi-squared compatibility tests are increasingly part of the pre-publication review process at top journals — NEJM and JAMA both incorporated such audits into their statistical-review workflow in the years following PREDIMED, though neither has publicly described the exact procedure.

The Carlisle audit method itself has become institutionalized in a way that makes it much less likely that the next PREDIMED-class problem will take five years to surface. His original 2017 paper has been followed by extensions to obstetrics journals, oncology journals, and the general medical literature; multiple journals now run automated baseline-table compatibility checks on submitted manuscripts before reviewer assignment. The pattern of “supplied baseline characteristics inconsistent with simple random allocation” is now a routine flag rather than a five-year-delayed retraction trigger.

Strategist Takeaway

For an operator or strategist who needs to make decisions on the basis of nutrition science — whether you are evaluating a product claim, designing a corporate wellness program, writing health content, or deciding what to eat — the PREDIMED episode has three structurally important lessons.

First, the public-press chronology of a retraction-and-re-publication is almost always wrong. The retraction makes the front page; the re-analysis does not. If you are evaluating a claim that hinges on “didn’t they retract that?” — about PREDIMED specifically or about any high-profile trial more generally — the strategically correct move is to find the re-analysis (if one exists) before forming a view. In PREDIMED’s case, the re-analysis is published, peer-reviewed, and reaches the same substantive conclusion. The pop-science framing of “the Mediterranean diet study was retracted” is technically true but conveys almost the opposite of what the underlying body of evidence actually says.

Second, the strength of the Mediterranean-diet-and-cardiovascular-events evidence base does not rest on PREDIMED alone. Even before the 2013 trial, there was a long track record: the Seven Countries Study and the EPIC cohort on the observational side, and the Lyon Diet Heart Study, a smaller 1990s secondary-prevention RCT that found similar effects (de Lorgeril et al., 1999), on the experimental side. The PREDIMED-Plus continuation trial was in progress at the time of this writing. The PREDIMED 2018 re-analysis added the single highest-quality RCT evidence to a body of evidence that was already substantial. If you were giving probability weight to the Mediterranean-diet-prevents-cardiovascular-disease claim before 2013, you should weight it slightly higher after 2018, and you should not let the retraction headline collapse that weight back to zero.

Third, the more general pattern PREDIMED demonstrates is one of the most important features of the scientific process and one of the most under-appreciated by lay audiences: the system has multiple independent error-correction layers. A trial gets published. An external auditor with no connection to the trial and no funding tie identifies a methodological pattern that the original peer reviewers missed. The journal contacts the authors. The authors audit their own data and confirm the issue. The journal retracts. The authors re-analyze. The re-analysis is published. The substantive claim either survives or it does not. This is not the system failing. This is the system working: slowly, in public, and producing a better-validated end state than the original publication. The headlines about retractions are evidence that the system is doing its job, not that it is broken.

For a strategist trying to evaluate diet-and-disease claims at a more general level: the rule of thumb that emerges from PREDIMED is to read the most recent corrected analysis and the registered protocol, not the popular-press summary of the original publication or its retraction. The chain of evidence that matters is what survived the most rigorous re-analysis under the most adverse methodological assumptions. In PREDIMED’s case, the chain held.

Sources

  • Carlisle, J. B. (2012). The analysis of 168 randomised controlled trials to test data integrity. Anaesthesia, 67(5), 521–537. https://doi.org/10.1111/j.1365-2044.2012.07128.x
  • Carlisle, J. B. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia, 72(8), 944–952. https://doi.org/10.1111/anae.13938
  • Carlisle, J. B., & Loadsman, J. A. (2017). Evidence for non-random sampling in randomised, controlled trials by Yuhji Saitoh. Anaesthesia, 72(1), 17–27.
  • de Lorgeril, M., Salen, P., Martin, J. L., Monjaud, I., Delaye, J., & Mamelle, N. (1999). Mediterranean diet, traditional risk factors, and the rate of cardiovascular complications after myocardial infarction: final report of the Lyon Diet Heart Study. Circulation, 99(6), 779–785.
  • Estruch, R., Ros, E., Salas-Salvadó, J., Covas, M. I., Corella, D., Arós, F., Gómez-Gracia, E., Ruiz-Gutiérrez, V., Fiol, M., Lapetra, J., Lamuela-Raventos, R. M., Serra-Majem, L., Pintó, X., Basora, J., Muñoz, M. A., Sorlí, J. V., Martínez, J. A., & Martínez-González, M. A.; PREDIMED Study Investigators. (2013). Primary prevention of cardiovascular disease with a Mediterranean diet. New England Journal of Medicine, 368(14), 1279–1290. https://doi.org/10.1056/NEJMoa1200303 (RETRACTED 2018)
  • Estruch, R., Ros, E., Salas-Salvadó, J., Covas, M. I., Corella, D., Arós, F., Gómez-Gracia, E., Ruiz-Gutiérrez, V., Fiol, M., Lapetra, J., Lamuela-Raventos, R. M., Serra-Majem, L., Pintó, X., Basora, J., Muñoz, M. A., Sorlí, J. V., Martínez, J. A., & Martínez-González, M. A.; PREDIMED Study Investigators. (2018). Primary prevention of cardiovascular disease with a Mediterranean diet supplemented with extra-virgin olive oil or nuts. New England Journal of Medicine, 378(25), e34. https://doi.org/10.1056/NEJMoa1800389
  • Keys, A., Menotti, A., Karvonen, M. J., Aravanis, C., Blackburn, H., Buzina, R., Djordjevic, B. S., Dontas, A. S., Fidanza, F., & Keys, M. H. (1986). The diet and 15-year death rate in the Seven Countries Study. American Journal of Epidemiology, 124(6), 903–915.
  • Pasiakos, S. M., Margolis, L. M., & Orr, J. S. (2015). Optimized dietary strategies to protect skeletal muscle mass during periods of unavoidable energy deficit. The FASEB Journal, 29(4), 1136–1142.
  • Trichopoulou, A., Costacou, T., Bamia, C., & Trichopoulos, D. (2003). Adherence to a Mediterranean diet and survival in a Greek population. New England Journal of Medicine, 348(26), 2599–2608.

FAQ

Was the original PREDIMED trial fraudulent?

No. The retraction did not allege fabrication, fraud, or bad faith on the part of the investigators. The issue was that the published 2013 paper described all 7,447 participants as individually randomized, when in fact roughly 19 percent had been assigned through cluster randomization at one site, household-pair assignment, or supervising-physician contingency assignment when randomization materials were unavailable. The methodological deviations were operationally defensible in a pragmatic free-living nutrition trial; the disclosure of those deviations in the publication was inadequate. NEJM treated this as a methodological-reporting failure significant enough to require retraction-and-replacement, not as misconduct.
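To see why undisclosed clustering is a reporting problem rather than a cosmetic one, it helps to look at the standard design-effect formula, 1 + (m − 1) × ICC, which shrinks the effective sample size when participants are assigned in groups. The sketch below is illustrative only: the cluster sizes and intraclass correlations (ICC) are hypothetical values chosen for the example, not figures from PREDIMED.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from assigning people in clusters of equal size."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations that n clustered ones are worth."""
    return n / design_effect(cluster_size, icc)


# Hypothetical scenarios (assumed values, not taken from the trial).
scenarios = {
    "household pairs": {"n": 400, "cluster_size": 2, "icc": 0.05},
    "clinic clusters": {"n": 400, "cluster_size": 20, "icc": 0.02},
}

for name, s in scenarios.items():
    deff = design_effect(s["cluster_size"], s["icc"])
    ess = effective_sample_size(s["n"], s["cluster_size"], s["icc"])
    print(f"{name}: design effect {deff:.2f}, effective n {ess:.0f} of {s['n']}")
```

An analysis that treats clustered participants as independent uses the nominal n and so reports confidence intervals that are somewhat too narrow; the size of the distortion depends on the true ICC, which this sketch does not attempt to estimate.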

Did the corrected analysis change the conclusions?

Substantively, no. The 2013 paper reported a hazard ratio of 0.70 (95% CI 0.54–0.92) for the olive-oil arm vs. control and 0.72 (95% CI 0.54–0.96) for the nuts arm vs. control. The 2018 re-analysis, which applied cluster-robust standard errors, a household-paired analysis, and a sensitivity analysis excluding all non-individually-randomized participants, produced hazard ratios of 0.69 (95% CI 0.53–0.91) for olive oil and 0.72 (95% CI 0.54–0.95) for nuts. The roughly 30 percent relative reduction in major cardiovascular events for the Mediterranean diet arms survived the more conservative analysis, with point estimates and confidence intervals nearly unchanged.
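The arithmetic behind "roughly 30 percent" and "nearly unchanged" can be checked directly from the reported numbers. The following is a minimal sketch that assumes the confidence intervals are symmetric on the log scale (the usual convention for hazard ratios); the function name `summarize` is introduced here for illustration.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def summarize(hr: float, lo: float, hi: float):
    """Return (relative reduction, SE of log HR, z statistic) from an HR and its 95% CI."""
    se = (math.log(hi) - math.log(lo)) / (2 * Z_95)
    z = math.log(hr) / se
    return 1 - hr, se, z


# Hazard ratios and 95% CIs as quoted in the text above.
reported = {
    "2013 olive oil": (0.70, 0.54, 0.92),
    "2013 nuts": (0.72, 0.54, 0.96),
    "2018 olive oil": (0.69, 0.53, 0.91),
    "2018 nuts": (0.72, 0.54, 0.95),
}

for label, (hr, lo, hi) in reported.items():
    reduction, se, z = summarize(hr, lo, hi)
    print(f"{label}: reduction {reduction:.0%}, SE(log HR) {se:.3f}, z {z:.2f}")
```

Comparing the 2013 and 2018 rows side by side shows how little the implied standard errors moved, which is the quantitative sense in which the finding survived.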

How did John Carlisle find the issue when the peer reviewers did not?

Carlisle’s method is a population-scale statistical screen across thousands of trials, looking for the distributional fingerprint of broken or undisclosed randomization in baseline-characteristic tables. It is not the kind of analysis a peer reviewer of a single manuscript would perform: peer reviewers typically check whether the baseline tables look “reasonable,” not whether the joint distribution of dozens of baseline variables across arms is consistent with the random-allocation model the methods section claims. Carlisle’s contribution was scaling up the chi-squared compatibility test from a single-trial spot-check to an automated 5,087-trial sweep, and tolerating the false positive rate that comes with that scale (most flagged trials turn out to have benign explanations). PREDIMED is one of the cases where the flag pointed to a real problem.
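The general idea of a baseline-table screen can be sketched in a few lines. This is a simplified illustration, not Carlisle’s actual procedure: it assumes continuous baseline variables reported as mean, SD, and n per arm, uses a normal approximation for each comparison, treats the variables as independent (real baseline variables are correlated), and combines the p-values with Fisher’s method. The baseline rows are invented for the example.

```python
import math


def two_sample_p(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def fisher_combined_p(p_values):
    """Fisher's method: -2*sum(ln p) is chi-squared with 2k d.f. under the null.

    For even degrees of freedom the chi-squared tail has a closed form,
    so no external statistics library is needed.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical baseline rows: (mean, SD, n) for arm A, then for arm B.
baseline = {
    "age (y)": ((67.0, 6.2, 2500), (67.3, 6.3, 2450)),
    "BMI (kg/m2)": ((29.9, 3.7, 2500), (30.1, 3.9, 2450)),
    "systolic BP (mmHg)": ((148.0, 19.0, 2500), (149.0, 19.5, 2450)),
}

p_values = [two_sample_p(*a, *b) for a, b in baseline.values()]
# Small values suggest arms that differ more than random allocation predicts.
too_different = fisher_combined_p(p_values)
# Small values suggest arms that are more alike than random allocation predicts.
too_similar = fisher_combined_p([1 - p for p in p_values])
print(f"too different: {too_different:.3f}, too similar: {too_similar:.3f}")
```

Under genuine individual randomization the per-variable p-values should look roughly uniform between 0 and 1, so a screen like this flags trials at either extreme. Run across thousands of trials, some flags will arise by chance alone, which is the false-positive cost described above.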

Should I stop following the Mediterranean diet because of the PREDIMED retraction?

The evidence does not support stopping. The PREDIMED retraction was the right move on methodological-reporting grounds, but the re-analysis preserved the substantive finding, and the broader body of evidence, including the Lyon Diet Heart Study (1999), the Seven Countries Study, decades of European cohort data, and the ongoing PREDIMED-Plus continuation trial, independently supports the cardiovascular benefit of the dietary pattern. A fair lay reading is: “PREDIMED is a methodological case study and a re-published trial whose finding survived; the claim that the Mediterranean diet reduces cardiovascular events is supported by multiple independent lines of evidence and was strengthened, not weakened, by the 2018 re-analysis.”

Why was the re-analysis published the same day as the retraction?

This was a deliberate editorial choice by NEJM to avoid the popular-press distortion in which a retraction headline circulates for months or years before the corrected analysis appears. By coordinating the retraction notice and the replacement article in the same June 13, 2018 publication window, the journal tried to ensure that the headline “NEJM retracts PREDIMED” would be followed within the same news cycle by the headline “NEJM re-publishes PREDIMED with corrected analysis; conclusions stand.” The retraction headline circulated as expected. The follow-up headline largely did not: most general-audience press coverage stopped at the retraction.

Does this prove the Mediterranean diet works?

No single trial proves anything, including PREDIMED. What the 2018 PREDIMED re-analysis does, in combination with the Lyon Diet Heart Study, the observational cohort evidence, and the mechanistic data on monounsaturated fats and polyphenols, is provide a coherent body of evidence at multiple levels of methodological rigor pointing in the same direction: a Mediterranean-pattern diet, with olive oil and nuts as the differentiating fat sources, reduces cardiovascular events in high-risk primary-prevention populations by roughly 30 percent relative to a low-fat alternative. The strength of that conclusion depends on how much weight you put on RCT evidence vs. observational evidence vs. mechanistic plausibility, not on whether PREDIMED itself was retracted.

Are there other nutrition trials that have undergone similar retraction-and-re-analysis cycles?

A handful, though PREDIMED is the most prominent. Carlisle’s 2017 sweep flagged dozens of trials in the nutrition and medical literature; most have since been re-analyzed with updated methodology, formally corrected in errata, or quietly supplemented with cluster-adjusted analyses in subsequent publications. The pattern is most common in pragmatic field trials of behavioral or dietary interventions, where strict individual randomization is operationally difficult. It is much rarer in pharmaceutical trials, where pill-swap randomization is operationally trivial. The PREDIMED episode accelerated the adoption of pre-registration of randomization unit and analytical method in dietary RCTs.

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.