Registered Reports: The Journal Format That Makes Replication Reform Stick

Atticus Li

Registered Reports flip the order of peer review: the methodology is reviewed and accepted before the data are collected, and the journal commits to publishing the eventual paper regardless of results. The empirical record shows what publication bias was actually doing — 96% positive in the standard literature, 44% in registered reports.

By Atticus Li, May 25, 2026

In one corner: the standard psychology literature, where roughly 96% of published empirical papers report a positive finding in support of the authors’ hypothesis. In the other corner: a sample of papers published under the Registered Reports format, where the figure drops to roughly 44%. Same field. Same kinds of questions. Same generation of researchers. The only thing that changed was the order of peer review.

That contrast — first quantified by Scheel, Schijen, and Lakens in a 2021 paper in Advances in Methods and Practices in Psychological Science — is the most direct empirical demonstration of publication bias that the replication-crisis literature has produced. It is not an inference from funnel plots, not a back-of-envelope simulation, not a theoretical Bayesian argument. It is a side-by-side comparison of the positive-result rates under two publication regimes that differ only in when peer review happens. The gap between 96% and 44% is, more or less, what publication bias and analytic flexibility were doing to the psychology literature before the reform.

This article walks through what Registered Reports actually are, the two-stage review process that defines them, what Scheel et al. (2021) measured and why the comparison is informative, how adoption has spread across journals, the Chambers (2017) manifesto that frames RRs as the strongest available institutional response to the replication crisis, where the format has been honestly criticized, and how a working strategist should treat a finding published as a Registered Report versus the same finding published the conventional way.

The Cold Open: 96% and 44%

The 96% number is the easier one to source. Across multiple bibliometric analyses of top psychology journals (Sterling 1959 was the first, Sterling, Rosenbaum, and Weinkam 1995 was the modern update, and Fanelli’s 2010 PLOS ONE analysis is the most-cited recent figure), the proportion of empirical papers in social and behavioral psychology that report a “positive” finding (a result that supports the authors’ headline hypothesis, with p < 0.05) sits at or above 90%. Fanelli’s specific number for psychology and psychiatry was 91.5% positive. Other field-specific and journal-specific analyses produce numbers between 90% and 97%. The shorthand “96%” comes from the specific sample Scheel et al. used to construct their comparison.

The 44% number is what they measured in their Registered Reports sample. Scheel and colleagues identified all empirical psychology papers published under the Registered Reports format up to a 2018 cutoff date — 71 papers in total — and coded each one for whether its primary hypothesis was supported. They compared this against a matched random sample of standard-format psychology papers from the same journals (or comparable journals when an RR-publishing journal had a small standard-format sample), and found a positive-result rate of 44% in the RR sample versus 96% in the matched standard-format sample.

The gap is roughly 52 percentage points. It is the largest single quantitative estimate of the combined effect of publication bias, selective reporting, p-hacking, HARKing, and the other analytic flexibilities that the standard publishing model implicitly tolerates. If you had been a working psychologist reading the literature in the years before the reform and adjusting your priors based on what the published record was telling you, your priors would have been systematically miscalibrated by something on the order of this gap. The world was not 96% in support of the hypotheses tested. The publication regime was simply selecting hypothesis-supporting findings into print and unsupportive findings out of it.

This is the cold-open version of why Registered Reports matter. The format is not just a procedural reform. It is a measurement instrument that reveals, by direct contrast, what the prior regime was doing.

What Registered Reports Are

A Registered Report is a publication format introduced at the journal Cortex in 2013 by then-editor Chris Chambers. The format inverts the timing of peer review. In the conventional model, you design a study, collect the data, analyze them, write up the results, and then submit the completed manuscript to a journal — at which point reviewers evaluate the methodology, the analysis, and (critically) the results together. The decision to publish hinges substantially on whether the results are “interesting,” which in practice means: positive, novel, and supportive of the headline hypothesis.

In the Registered Reports model, the order is different. You design a study, write up the introduction, the hypotheses, the methodology, and the planned analysis as a complete Stage 1 manuscript — but with no data. You submit this Stage 1 manuscript to the journal. Reviewers evaluate it on the merits of the question and the rigor of the methodology: is the hypothesis interesting and well-grounded? Are the methods adequate to test it? Is the planned analysis appropriate, pre-specified, and free of degrees of freedom that would allow post-hoc result-shopping? If the Stage 1 manuscript clears review, the journal issues an “in-principle acceptance” (IPA): a binding commitment to publish the eventual paper regardless of whether the results turn out positive, negative, or null.

You then collect the data. You run the pre-specified analysis. You write up the Stage 2 manuscript, which adds the results and discussion sections to the Stage 1 protocol. The Stage 2 manuscript goes through a final review — but the review is restricted to checking that the authors actually followed the registered protocol, that any deviations are appropriately disclosed and justified, and that the interpretation of the results is calibrated to what was found. The publication decision is no longer contingent on whether the results were positive. The journal committed to publishing at Stage 1.

The format addresses, in a single institutional move, the four largest mechanisms that drive low positive predictive value (PPV, the probability that a published positive finding is actually true) in the standard literature:

Publication bias is eliminated by construction. Null and negative results get published. The selection mechanism that drove the 96% figure is removed.
P-hacking is preempted. The analysis plan is pre-registered as part of the Stage 1 review. Researchers cannot try seventeen specifications and report the one that crosses the significance threshold, because all seventeen would constitute deviations from the registered plan.
HARKing (hypothesizing after results are known) is prevented. The hypothesis is frozen at Stage 1, before any data are collected. Researchers cannot retrofit a “we predicted this all along” framing onto an unexpected finding.
Researcher degrees of freedom are bounded. Subgroup analyses, covariate choices, outcome operationalizations, and statistical model specifications are all pre-specified in the Stage 1 protocol. The implicit multiple-comparisons inflation that drove much of Ioannidis’s Corollary 4 is curtailed.

Each of these four mechanisms is independently documented in the methodological literature as a major contributor to false positives. Registered Reports do not solve any one of them with a new statistical technique. They solve all four with a procedural change: peer review of the methodology before the results exist.
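The p-hacking point can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the specifications are statistically independent tests of a true null, which real, correlated specifications are not, so the result is best read as a rough upper bound. The function names are hypothetical and do not come from any of the cited papers.

```python
import random


def familywise_false_positive_rate(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent tests of a true null has p < alpha."""
    return 1 - (1 - alpha) ** k


def simulate_best_of_k(k: int, alpha: float = 0.05,
                       trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check of the formula above.

    Under a true null each p-value is uniform on [0, 1], so "report the best
    specification" amounts to taking the minimum of k uniform draws.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        smallest_p = min(rng.random() for _ in range(k))
        if smallest_p < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 17):
        analytic = familywise_false_positive_rate(k)
        simulated = simulate_best_of_k(k)
        print(f"k={k:2d}  analytic={analytic:.3f}  simulated={simulated:.3f}")
```

Under these assumptions, seventeen tries at a 0.05 threshold produce at least one "significant" result about 58% of the time even when no effect exists. A frozen analysis plan holds the researcher to a single try.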

The Two-Stage Review, In Practice

The Stage 1 review at most RR-offering journals is structured around four explicit criteria. Reviewers are asked to evaluate: (1) whether the research question is important and the hypotheses are well-grounded; (2) whether the methodology is logically sound and capable of testing the hypotheses with adequate statistical power; (3) whether the proposed analysis plan is sufficiently detailed, pre-specified, and resistant to undisclosed flexibility; and (4) whether any obvious confounds, alternative explanations, or measurement issues have been addressed in the design. Crucially, reviewers are not asked to predict whether the results will be interesting — only whether the methodology, if executed faithfully, would produce informative results regardless of the direction the findings take.
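Criterion (2) usually comes down to an a priori sample-size calculation. As a rough sketch, assuming a two-group comparison of means, a standardized effect size (Cohen's d) chosen by the authors, and the normal approximation to the t-test, the required sample per group can be estimated as below. The function name and the default power and alpha values are illustrative assumptions, not any journal's stated requirement.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Uses the normal approximation; an exact t-based calculation gives a
    slightly larger n.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d={d}: about {n_per_group(d)} participants per group")
```

With these defaults, a medium effect (d = 0.5) needs roughly 63 participants per group and a small effect (d = 0.2) roughly 393, which is why a Stage 1 power requirement tends to push registered sample sizes upward.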

The Stage 1 outcome is typically one of three: rejection (the methodology is not adequate), revision (specific issues need to be addressed before in-principle acceptance), or in-principle acceptance (the methodology is sound and the journal commits to publishing the Stage 2 manuscript). The IPA is a real commitment. Journals that offer RRs have published null and negative findings under the format precisely because they had pre-committed to do so. The published RR literature is, as a consequence, much closer to a random sample of what the underlying research found than the standard literature is.

Between Stage 1 and Stage 2, the authors execute the registered protocol. The pre-specified analysis is run. The Stage 2 manuscript adds the results section, the discussion, and any deviations-from-protocol notes that are appropriate. Common Stage 2 issues include: data collection that fell short of the registered sample size (typically requires a discussion of the implications for power), analysis decisions that turned out to require additional specification not anticipated at Stage 1 (typically allowed with clear disclosure), and exploratory analyses beyond the pre-registered set (allowed, but clearly labeled as exploratory rather than confirmatory).

The Stage 2 review is narrower than a standard peer review. Reviewers check that the registered protocol was followed; they check that the discussion is calibrated to what was actually found; they catch obvious analytic errors. They do not re-litigate the importance of the question, the soundness of the methodology, or the publication-worthiness of the results. Those questions were settled at Stage 1.

The end result is a published paper that looks structurally like a conventional paper but has a fundamentally different provenance. The hypotheses were pre-registered. The methodology was pre-reviewed. The analysis was pre-specified. The publication decision was made before the results existed. A reader of an RR paper can take the result at substantially more face value than the same reader can take a standard-format paper of equal apparent rigor.

What Scheel, Schijen, and Lakens (2021) Measured

The empirical paper that quantified the 96%/44% gap is Scheel, Schijen, and Lakens (2021), “An excess of positive results: Comparing the standard psychology literature with registered reports,” published in Advances in Methods and Practices in Psychological Science (DOI: 10.1177/25152459211007467). The paper is a deliberate side-by-side comparison: it asks what fraction of empirical psychology papers report a positive primary finding under the standard publication regime, and compares this to the fraction in the same field’s Registered Reports literature.

The methodology was straightforward. The authors identified all Stage 2 Registered Reports published in psychology journals up to November 2018, 71 papers across multiple journals. For each RR, two independent coders read the paper and classified the outcome of the primary pre-registered hypothesis test as either “positive” (the predicted effect was found in the predicted direction at the registered significance threshold), “null” (the predicted effect was not found), or “ambiguous” (the test was inconclusive in a way that did not cleanly map to either category). Coder agreement was high (Cohen’s kappa above 0.7); disagreements were resolved by discussion.
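For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed agreement between two coders with the agreement expected by chance from each coder's label frequencies. The sketch below uses made-up codings for ten hypothetical papers; it does not reproduce the study's data, and the function name is an assumption for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders gave the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use a single identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical codings, invented for illustration only.
coder_a = ["positive"] * 4 + ["null"] * 5 + ["ambiguous"] * 1
coder_b = ["positive"] * 4 + ["null"] * 4 + ["ambiguous"] * 2

if __name__ == "__main__":
    print(f"kappa = {cohens_kappa(coder_a, coder_b):.2f}")
```

In this toy example the coders disagree on one paper out of ten, and kappa comes out lower than the raw 90% agreement because some of that agreement would be expected by chance.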

For the comparison sample, the authors drew a matched random sample of 152 standard-format empirical papers from the same journals (where the journal had published enough non-RR papers in the relevant time window) or from comparable peer journals (where the RR-publishing journal did not have a comparable standard sample). The same coding procedure was applied: independent classification of the primary hypothesis test as positive, null, or ambiguous.

The headline result, with the ambiguous category included:

Standard literature: 96% positive (the remaining 4% were null or ambiguous in roughly equal measure).
Registered Reports: 44% positive (with roughly 56% null or ambiguous).

The gap is 52 percentage points. The 95% confidence intervals around the two proportions do not come close to overlapping. The probability that this gap is a sampling artifact, rather than a real difference in the underlying publication distributions, is vanishingly small.
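The non-overlap claim can be checked with a few lines of code. This sketch uses Wilson score intervals, which is a choice made here and not necessarily the interval the authors reported. The counts (146 of 152 and 31 of 71) are back-calculated from the percentages and sample sizes quoted in this article, so treat them as assumptions.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumed counts, back-calculated from the reported rates and sample sizes.
standard_positive, standard_n = 146, 152  # about 96%
rr_positive, rr_n = 31, 71                # about 44%

standard_ci = wilson_interval(standard_positive, standard_n)
rr_ci = wilson_interval(rr_positive, rr_n)
gap = standard_positive / standard_n - rr_positive / rr_n

if __name__ == "__main__":
    print(f"standard: 95% CI {standard_ci[0]:.3f} to {standard_ci[1]:.3f}")
    print(f"RR:       95% CI {rr_ci[0]:.3f} to {rr_ci[1]:.3f}")
    print(f"gap:      {gap:.3f}")
    print("intervals overlap:", rr_ci[1] >= standard_ci[0])
```

With these inputs the upper bound of the RR interval falls well below the lower bound of the standard-literature interval, which is what the text means by "do not come close to overlapping."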

The authors were careful in interpreting the result. The 52-point gap is not solely attributable to publication bias in the narrow sense (file-drawer rejection of null results). It also includes the effects of selective analysis (p-hacking), HARKing (post-hoc hypothesis adjustment to match what was found), selective reporting (writing up the analysis that worked and not the ones that did not), and ordinary publication selection (journals’ implicit preference for “interesting” results). The Registered Reports format addresses all of these simultaneously, and the gap reflects the aggregate effect of removing all of them.

The authors also addressed several obvious objections. Could the RRs be of lower quality, with the null results reflecting underpowered or poorly designed studies? The opposite is more likely: Stage 1 review at most RR-publishing journals requires adequate power and rigorous methodology as a condition of in-principle acceptance, so the RR sample is, if anything, methodologically stronger than the comparison sample. Could the RR authors be self-selecting toward null-result-prone questions? Possible in principle, but the topics covered in the RR sample are recognizably typical of psychology research, and there is no obvious indication that RR authors are testing hypotheses they expect to fail. Could the comparison sample be unrepresentative of the broader literature? The 96% positive rate is in line with the 90%-plus figures that Fanelli and others have found across larger samples of psychology and psychiatry journals, so the comparison sample looks representative of the standard literature.

What the gap measures, then, is the magnitude of the publication-bias-plus-analytic-flexibility distortion in the pre-reform regime. The standard psychology literature was reporting positive findings at a rate that was inconsistent with the actual prevalence of true effects in the underlying research being conducted. The RR literature, which removes the selection and the flexibility, reports positive findings at a rate that is much closer to what the underlying research is finding. The difference is the bias.
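To get a feel for how much filtering a gap of this size implies, here is a deliberately crude model. It assumes the RR rate (44%) is the true rate of supported hypotheses and attributes the entire standard-literature rate (96%) to selective publication of unsupported results. It ignores p-hacking and HARKing, which the authors say also contribute. The function names and the model itself are illustrative assumptions, not anything taken from Scheel et al.

```python
def published_positive_share(true_positive_rate: float, null_publication_ratio: float) -> float:
    """Share of positive results in print under a pure file-drawer model.

    null_publication_ratio is how likely an unsupported result is to be
    published, relative to a supported one (1.0 means no selection).
    """
    positives = true_positive_rate
    nulls = (1 - true_positive_rate) * null_publication_ratio
    return positives / (positives + nulls)


def implied_null_publication_ratio(true_positive_rate: float, observed_share: float) -> float:
    """Invert the model: how strong must the filter be to yield the observed share?"""
    return (true_positive_rate * (1 - observed_share)) / (
        observed_share * (1 - true_positive_rate)
    )


if __name__ == "__main__":
    ratio = implied_null_publication_ratio(true_positive_rate=0.44, observed_share=0.96)
    print(f"implied relative publication chance for nulls: {ratio:.3f}")
    print(f"check: published positive share = {published_positive_share(0.44, ratio):.2f}")
```

Under that simplification, unsupported results would have to reach print at only about 3% of the rate of supported ones. Because analytic flexibility also converts would-be nulls into positives, the real file-drawer filter need not be that severe to produce the same published record.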

This is the single most empirically forceful argument for Registered Reports as an institutional reform. The reform is not merely procedurally cleaner; it directly demonstrates that the prior regime was producing a literature that was systematically misleading. A reader who internalizes the 96%/44% comparison should walk away with a substantially adjusted prior about how much to trust positive findings in standard-format psychology papers from the pre-reform era.

The Origin Story: Chambers (2013) at Cortex

The Registered Reports format was launched at the journal Cortex in 2013 under the editorship of Chris Chambers, a cognitive neuroscientist at Cardiff University. Chambers’s introductory editorial, “Registered reports: A new publishing initiative at Cortex” (DOI: 10.1016/j.cortex.2012.12.016), laid out the format and its rationale in a short piece that has since become its founding document. The arguments in that editorial are worth understanding because they prefigure the empirical findings that came later.

Chambers’s editorial began with the observation that the standard publishing model creates a structural incentive for researchers to engage in practices that, individually and collectively, degrade the reliability of the published record. The selection of “interesting” results into print rewards p-hacking. The expectation that successful papers report supporting findings rewards HARKing. The career-evaluation premium placed on high-impact publications rewards optimization of analytic choices in the direction of publishable significance. The collective consequence is a literature that is systematically more positive, more dramatic, and more cleanly hypothesis-confirming than the underlying research actually warrants.

Chambers’s proposed solution was structural rather than exhortative. Asking researchers to behave better, in his framing, would not work — the incentives were too well-aligned with the problematic behaviors. The reform had to change the incentives. By moving peer review to before data collection and pre-committing to publish regardless of results, the Registered Reports format removes the incentive to massage analyses toward significance and removes the file-drawer destination for null findings. Researchers who follow the RR pathway have no career penalty for publishing null results, because the publication itself was secured at Stage 1. Researchers who attempt to manipulate analyses post-hoc have nowhere to take the manipulated results, because the registered protocol is the protocol of record.

The 2013 editorial was followed in 2014 by a more developed methodological treatment co-authored by Brian Nosek and Daniël Lakens in Social Psychology (DOI: 10.1027/1864-9335/a000192), titled “Registered reports: A method to increase the credibility of published results.” Nosek (founder of the Center for Open Science) and Lakens (methodologist at TU Eindhoven) extended the argument from Chambers’s specific journal-level intervention to a general framework for credibility-enhancing publication formats. The 2014 paper articulated the four-criterion Stage 1 review structure that has since become standard across RR-offering journals, and it explicitly framed RRs as a complement to (rather than a replacement for) other open-science reforms like preregistration, open data, and open materials.

The combination of the Chambers 2013 editorial and the Nosek-Lakens 2014 framework gave the format the institutional traction it needed to spread beyond Cortex. By 2017, dozens of journals had adopted some version of the format. As of the mid-2020s, the Center for Open Science maintains a public list of RR-offering journals that includes more than 300 titles across psychology, biomedical sciences, neuroscience, education, and (increasingly) economics. The format has not displaced standard publishing, and most empirical papers are still published the conventional way, but it has become a recognized and growing alternative pathway that researchers can choose when they want their work to carry the credibility premium that comes with pre-registered, pre-reviewed methodology.

Chambers (2017): The Manifesto

Chambers extended the editorial-length argument of his 2013 piece into a book-length manifesto in 2017: The Seven Deadly Sins of Psychology: A Manifesto for Reforming the Culture of Scientific Practice (Princeton University Press). The book is organized around seven structural pathologies of the working culture of empirical psychology (bias, hidden flexibility, unreliability, data hoarding, corruptibility, internment, and bean counting) and proposes a coordinated set of reforms that would collectively address them. Registered Reports occupy a central position in the proposed reforms: they are the single intervention that addresses the largest number of the seven pathologies simultaneously.

The book is worth reading in full for working scientists and methodologists, but for the strategist who is encountering the framework for the first time, the load-bearing claim is this: structural reforms of the publication pipeline are the only credible path out of the replication crisis, and Registered Reports are the strongest available structural reform. Exhortations to “do better science” are insufficient because they do not change the incentive landscape that produces the seven sins. Statistical reforms (better p-values, Bayesian alternatives, effect-size emphasis) are insufficient because they do not address the selection that determines what gets published in the first place. Reform of the publication regime — making peer review evaluate methodology before the existence of results — is the intervention that does the most institutional work per unit of effort.

Chambers’s argument has been broadly accepted by the methodological community. The methodological literature since 2017 has converged on Registered Reports as a near-consensus best practice for confirmatory research, particularly for studies whose results have the potential to influence policy, practice, or public understanding. The format does not work as well for purely exploratory research (where the hypotheses are not crisp enough to pre-register), for very long-cycle research (where the gap between Stage 1 and Stage 2 may exceed institutional patience), or for fields where the relevant journals have not yet adopted the format. But for the substantial domain where it does work, namely confirmatory empirical research in fields whose journals have signed on, RRs are now widely considered the highest-credibility format available.

Adoption: The Center for Open Science List

The Center for Open Science (COS) maintains a regularly updated public list of journals that offer Registered Reports as a publication format. The list includes the journal name, the field, the date of adoption, the URL of the journal’s RR policy page, and (where available) the journal’s specific implementation of the Stage 1 / Stage 2 review structure. As of recent updates, the list includes more than 300 journals across the empirical sciences.

The adoption pattern is informative. Psychology was the leading adopter — the field where the replication crisis had been most publicly debated and where the institutional pressure for reform was strongest. Neuroscience and cognitive science followed quickly, in part because of the overlap of personnel with the psychology reform community. Biomedical research has adopted more slowly, but a growing number of clinical-trial-focused journals have adopted RR-style protocols (where the format overlaps significantly with the long-established practice of clinical-trial pre-registration on platforms like ClinicalTrials.gov). Education research has been a notable mid-cycle adopter. Economics has been a relatively late adopter, with the Journal of Development Economics and a small number of other journals leading the way; the bulk of mainstream economics journals have not yet adopted the format.

The pattern is consistent with the underlying incentive structure. Fields where the publication regime has been most visibly broken (psychology, followed closely by neuroscience and cognitive science) have been the fastest adopters. Fields where the publication regime is perceived to be working reasonably well, or where the structural impediments to pre-registration are larger (long-cycle research, observational fields with less crisp hypothesis structure, fields with strong financial interests in particular results), have been slower. The trajectory is positive but uneven.

Practical implication for a working reader: when encountering an empirical paper, it is now reasonable to check whether the journal offers Registered Reports and whether the paper in question was published under the RR format. This information is typically prominently labeled on the paper’s first page or in the article-type metadata. A paper published as a Registered Report carries the credibility premium described above. A paper published in the standard format at the same journal does not.

Where Critics Have Pushed Back

Registered Reports are not universally embraced, and the critiques are worth taking seriously. A strategist who relies on the RR premium for evidence evaluation should understand the limits of the format.

RRs are slower. The two-stage review process adds time to the publication pipeline: typically several months between submission of the Stage 1 manuscript and receipt of in-principle acceptance, on top of the normal review timeline for the Stage 2 manuscript. For research questions with short half-lives (rapidly evolving fields, time-sensitive policy questions, hot competitive areas), the additional latency is a real cost. The trade-off is between the credibility premium and the timeliness of the result.

RRs are more rigid. Once the Stage 1 protocol is in-principle accepted, deviations require disclosure and justification, and substantial deviations may not be permitted. This rigidity is the source of much of the format’s credibility — it is what removes researcher degrees of freedom — but it also constrains the researcher’s ability to respond to unexpected complications in data collection, to incorporate new methodological insights that arise mid-study, or to pursue serendipitous findings that the original protocol did not anticipate. The format works best for research questions whose methodology is well-understood at the outset; it works less well for genuinely exploratory work.

Serendipity is constrained. Some of the most important findings in the history of science have come from unexpected observations that researchers were not specifically looking for. The Registered Reports format does not forbid exploratory analyses, but it does require that they be clearly labeled as exploratory rather than confirmatory — and the structural pressure of the format is to do less exploration and more pre-specified hypothesis testing. Critics have argued that this is a real cost in terms of the kinds of insights the format makes harder to surface. Defenders have responded that the cost is overstated, since exploratory work can be done in parallel with confirmatory RRs and reported separately, and that the loss of false-positive serendipity is probably larger than the loss of true-positive serendipity given the underlying base rates.

The “no penalty for nulls” claim is not yet fully tested. The format’s promise depends critically on Stage 1 acceptances being honored at Stage 2 even when the results are null or negative. In practice, the bulk of evidence suggests this commitment is being kept — null RRs are being published — but the institutional pressure to back away from null findings has not disappeared, and the format’s long-run credibility depends on journals continuing to honor their Stage 1 commitments even when the results are unflattering to authors, journals, or sponsors. The track record so far is positive but the structural pressures remain.

Not all fields have crisp enough hypotheses. The format works best when the research question can be reduced to a pre-specifiable confirmatory test. For fields where the hypotheses are inherently more exploratory (ethnographic work, qualitative research, theory-building studies in early-stage fields), the RR format is awkward at best and inapplicable at worst. The format is a strong fit for the confirmatory portion of empirical research; it is not a universal solution to all publishing problems.

Power considerations create a self-selection effect. Stage 1 review typically requires evidence of adequate statistical power, which means RR authors tend to commit to larger sample sizes than they might otherwise. This is good for the credibility of the resulting findings, but it also raises the cost of doing RR research, which may bias the format toward better-resourced labs and away from smaller research groups. The equity implications are not yet well-studied.

Taking the critiques together: Registered Reports are not a panacea. They are a strong institutional response to a specific cluster of problems — publication bias, p-hacking, HARKing, analytic flexibility — that have been the dominant drivers of low PPV in many empirical fields. They are particularly well-suited to confirmatory research where the methodology can be pre-specified and the power requirements can be met. They are less well-suited to exploratory work, to fields with non-confirmatory hypothesis structures, and to research questions where time-to-publication is itself a critical consideration. For the substantial domain where they fit well, they are the strongest available credibility-enhancing format.

Strategist Takeaway: How to Treat RR Findings

For the working professional who reads empirical research to inform decisions, the existence of Registered Reports creates a useful tiered credibility framework. The framework has three practical implications.

First, when a paper is published as a Registered Report, you can take the result more seriously. The Stage 1 review preempted the major mechanisms that drive low PPV in standard-format papers. The hypothesis was frozen before the data were collected. The analysis was pre-specified. The publication commitment was made independent of the results. A null finding in an RR is a real null finding, not the visible tip of an iceberg of suppressed nulls. A positive finding in an RR was generated by methodology that survived rigorous pre-review and is much less likely to be a p-hacked or HARKed artifact. Treat RR findings as carrying roughly the credibility weight of a well-conducted replication study, even when the RR is the first study of its kind.

Second, when a paper is published in standard format at a journal that offers Registered Reports, ask yourself why the authors did not choose the RR pathway. The standard format is still available for genuinely exploratory work, for short-cycle competitive research, and for some kinds of work where RR is awkward. But for confirmatory work in a field where RRs are well-established, the choice to publish in standard format is a (mild) signal worth noticing. It does not condemn the paper, but it does mean the credibility-enhancement mechanisms of the RR pathway are not present.

Third, the 96%/44% gap is your structural baseline for thinking about the pre-reform literature. When evaluating a body of pre-2015 psychology research, or pre-reform empirical work in any field with structural conditions similar to pre-reform psychology, the headline rate of positive findings should be discounted by something like the magnitude of this gap. The published record of those fields was generated under a regime that systematically inflated the appearance of supporting evidence. A finding that was “consistently replicated across multiple studies” in the pre-reform literature may, after the structural discount, look much more like a hypothesis to be tested than an established result. This is the operational shape of the Bayesian discount that the Ioannidis (2005) framework recommends; the Scheel et al. (2021) gap is the empirical magnitude of the discount.
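For readers who want the discount in explicit form, the sketch below implements the positive predictive value formula from Ioannidis (2005), including its bias term u (the fraction of analyses that would not otherwise have been positive but end up reported as positive). The input values are illustrative assumptions, not estimates for any particular field, and the function name is hypothetical.

```python
def ppv(prior_odds: float, power: float = 0.80, alpha: float = 0.05, bias: float = 0.0) -> float:
    """Positive predictive value of a claimed finding, following Ioannidis (2005).

    prior_odds: R, the pre-study odds that a tested relationship is true.
    bias:       u, the share of would-be non-findings reported as findings.
    """
    beta = 1 - power
    r, u = prior_odds, bias
    # Claimed findings that are true: genuine detections plus biased "rescues".
    true_claims = (1 - beta) * r + u * beta * r
    # Claimed findings that are false: type I errors plus biased false claims.
    false_claims = alpha + u * (1 - alpha)
    return true_claims / (true_claims + false_claims)


if __name__ == "__main__":
    # Illustrative inputs only: even pre-study odds, 80% power, alpha of 0.05.
    for u in (0.0, 0.10, 0.30, 0.50):
        print(f"bias u={u:.2f}  PPV={ppv(prior_odds=1.0, bias=u):.2f}")
```

Holding power and alpha fixed, raising the bias term lowers PPV. Removing bias, as the RR format aims to do, moves PPV back toward its unbiased value, which is one way to read the recommendation to weight RR findings more heavily.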

The single most useful operational rule that falls out of the Registered Reports literature is this: if the same finding has been demonstrated under the RR format, it should carry substantially more weight in your decision-making than the same finding demonstrated only under the standard format. The format is the most direct institutional response to the structural drivers of the replication crisis, and the empirical evidence that it works is the most direct evidence that the prior regime was systematically misleading. For strategists evaluating any published research, this is the cleanest contemporary heuristic the methodological literature has produced.

Sources

Chambers, C. D. (2013). Registered reports: A new publishing initiative at Cortex. Cortex, 49(3), 609–610. DOI: 10.1016/j.cortex.2012.12.016.
Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137–141. DOI: 10.1027/1864-9335/a000192.
Scheel, A. M., Schijen, M. R. M. J., & Lakens, D. (2021). An excess of positive results: Comparing the standard psychology literature with registered reports. Advances in Methods and Practices in Psychological Science, 4(2). DOI: 10.1177/25152459211007467.
Chambers, C. (2017). The Seven Deadly Sins of Psychology: A Manifesto for Reforming the Culture of Scientific Practice. Princeton University Press.
Hardwicke, T. E., & Ioannidis, J. P. A. (2018). Mapping the universe of registered reports. Nature Human Behaviour, 2(11), 793–796. DOI: 10.1038/s41562-018-0444-y.
Fanelli, D. (2010). “Positive” results increase down the hierarchy of the sciences. PLOS ONE, 5(4), e10068. DOI: 10.1371/journal.pone.0010068.
Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance. Journal of the American Statistical Association, 54(285), 30–34. DOI: 10.1080/01621459.1959.10501497.
Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa. The American Statistician, 49(1), 108–112. DOI: 10.1080/00031305.1995.10476125.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. DOI: 10.1177/0956797611417632.

Related Reading

Publication Bias and the File-Drawer Problem — the underlying selection mechanism that Registered Reports are specifically designed to neutralize.
The OSC 2015 Reproducibility Project: Psychology’s Empirical Reckoning — the empirical companion to the Ioannidis framework that motivated the broader push for RRs.
P-Hacking and Researcher Degrees of Freedom — the analytic-flexibility problem that pre-registration of analysis plans addresses.
HARKing: Hypothesizing After the Results Are Known — the post-hoc theorizing problem that Stage 1 hypothesis lock-in eliminates.
Many Labs Replication Projects: What Multi-Lab Studies Actually Show — the complementary reform of multi-team replication that pairs with the RR format for the strongest evidence.

Frequently Asked Questions

Does a Registered Report guarantee the finding is true?

No. A Registered Report removes the structural drivers of false positives that come from publication bias, p-hacking, and HARKing. It does not remove the possibility that an honestly designed, honestly analyzed study still produces a result that is wrong due to ordinary sampling variability, an unmeasured confound, or a methodological subtlety that the Stage 1 reviewers missed. An RR finding is much more credible than a standard-format finding of equal apparent rigor, but it is still a single study. The general rule of waiting for independent replication before acting on any single finding still applies; the RR format reduces the discount you apply to the single study, but it does not eliminate the need for replication.

Why is the standard-literature figure 96% rather than the 91.5% Fanelli reported?

Different samples produce different numbers in the 90% to 97% range. Fanelli’s 2010 figure of 91.5% for psychology and psychiatry was drawn from a particular sample of papers and a particular classification of “positive.” Scheel et al. (2021) measured 96% in their specific matched sample from the journals that publish Registered Reports. The exact number depends on the journal mix and the classification rules; the point is that the figure sits in the low-to-mid 90s under the standard regime versus the mid-40s under the RR regime. The gap is 52 points using the Scheel et al. figure and still roughly 48 points if Fanelli’s 91.5% is substituted, so it is robust to reasonable variations in either number.

Can authors get out of their Stage 1 commitment if the results are embarrassing?

In principle, no — that is the whole point. In practice, authors occasionally withdraw RR submissions between Stage 1 and Stage 2, and the field has not yet developed strong norms about whether and how to publicize such withdrawals. The format’s long-run credibility depends on Stage 1 acceptances being honored, and the major RR-offering journals have generally been good about this. But the structural pressure to back away from unflattering results has not been eliminated, and a reader who relies on the RR credibility premium should be aware that the format is a strong institutional improvement rather than a hermetic guarantee.

Are Registered Reports used outside psychology?

Yes, increasingly. The Center for Open Science list includes journals across psychology, neuroscience, cognitive science, biomedical research, education, public health, and (more recently) economics. Psychology was the leading adopter for historical reasons — the replication crisis hit psychology first and hardest — but the format has spread well beyond its original field. The format works best where the methodology can be pre-specified at Stage 1, which is most natural for experimental and quasi-experimental research and somewhat less natural for purely observational or exploratory work.

Is preregistration the same thing as a Registered Report?

No, though they are related. Preregistration is the broader practice of recording your hypotheses and analysis plan publicly before collecting data, typically on a platform like the Open Science Framework. A Registered Report is the specific publication format in which the preregistered protocol is also peer-reviewed and accepted by a journal at Stage 1, with the journal committing to publish the eventual paper regardless of results. Every Registered Report includes preregistration, but not every preregistered study is a Registered Report: many preregistered studies are submitted to journals in the standard format and reviewed only after the results exist. The RR format adds the journal’s commitment to publish, which is the structural feature that takes the results out of the publication decision and so targets publication bias directly.

What should I do with pre-2013 research that obviously could not have been a Registered Report?

Apply the Bayesian discount that the Scheel et al. (2021) gap quantifies. Pre-2013 empirical research in fields with structural conditions similar to pre-reform psychology was produced under a regime that systematically inflated the appearance of supporting evidence. Findings from that era should be treated as candidates for confirmation rather than as established results. If a pre-2013 finding has subsequently been replicated under preregistered or registered-report conditions, the replication carries the credibility weight; the original finding is approximately decorative at that point. If a pre-2013 finding has not been replicated under those conditions, the appropriate treatment is to use it as a hypothesis worth taking seriously but not as a fact worth building strategy on.
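
As a rough illustration of that discount, the sketch below uses the positive predictive value formula from Ioannidis (2005), rewritten in terms of a prior probability rather than pre-study odds. Every input (the prior, the power values, and the bias parameter) is a hypothetical number chosen for illustration, not an estimate from Scheel et al. (2021) or any other source. Feeding the first posterior in as the second prior also assumes the two studies are independent.

```python
def ppv(prior, power, alpha=0.05, bias=0.0):
    """Probability that a reported positive finding is true.

    prior: probability the hypothesis is true before the study
    power: probability of detecting a true effect (1 - beta)
    alpha: false-positive rate of the test
    bias:  share of would-be negative results that end up reported as
           positive anyway (the u term in Ioannidis 2005); 0 means none
    """
    true_pos = prior * (power + bias * (1 - power))
    false_pos = (1 - prior) * (alpha + bias * (1 - alpha))
    return true_pos / (true_pos + false_pos)


# Hypothetical pre-reform finding: long-shot hypothesis, low power, some bias.
original = ppv(prior=0.10, power=0.35, bias=0.30)

# Hypothetical Registered Report replication: high power, bias set to zero.
# The posterior from the original study becomes the prior here.
after_rr = ppv(prior=original, power=0.90, bias=0.0)

print(f"After the original study:  {original:.2f}")
print(f"After the RR replication:  {after_rr:.2f}")
```

With these made-up inputs the original finding on its own is more likely false than true, and most of the final credibility comes from the Registered Report replication. That is the sense in which the original becomes close to decorative once a replication exists.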

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified in behavioral economics. Led 100+ in-house experiments at NRG in 2025, with project evidence and limits documented in the case studies.
