In 2005, the Dutch methodologist Jelte Wicherts and three colleagues did something unusual. They picked up the phone — and then the email, and then the follow-up email, and then the second follow-up email — and tried to obtain the raw data from 141 published studies in mainstream psychology journals. The journals in question had policies on the books requiring authors to share their data on request. The American Psychological Association’s ethics code required it. The implicit norm of science required it. Wicherts had a methodological reason for the request: he wanted to re-examine the analyses for statistical errors of the kind that subsequent meta-research has shown to be widespread in the published literature.

Six months later, after a documented protocol of requests, reminders, and follow-ups, Wicherts and his team had received raw data from 38 of the 141 studies. The other 103 (73%) had failed to provide the data despite repeated written requests, despite the journal policy that required them to do so, and despite the absence of any plausible legitimate reason for refusal in most cases. The result was published in 2006 in American Psychologist as a three-page note titled “The poor availability of psychological research data for reanalysis.” It is one of the founding documents of the open-science reform movement.

The framing question is simple. If a published research finding cannot be independently verified from the underlying data and code, in what sense is it a public scientific result rather than a private testimonial? The open-science answer is: it isn’t. A finding that cannot be reanalyzed is a claim, not a result. And a literature in which most findings cannot be reanalyzed is, structurally, a literature whose self-correction mechanism has been disabled.

The replication crisis as a public phenomenon, meaning the recognition that large fractions of psychology, biomedicine, and economics fail to replicate, has been the diagnostic story. The infrastructure reforms that have followed have been the response: open data mandates from journals, code-sharing requirements from publishers, preregistration platforms run by independent nonprofits, data management plan requirements from funders, and computational reproducibility platforms that bundle data, code, and execution environment into a single citable artifact. The reforms are numerous and unevenly adopted, and they have produced a mixed but real empirical record of effects.

This article walks through what the empirical record actually says: Wicherts 2006 on the prior availability problem, Hardwicke 2018 on what happened when one journal mandated data sharing, Stodden 2018 on the computational-reproducibility version of the same problem, the institutional infrastructure (OSF, TOP guidelines, funder data management plans) that has been built in response, and the practical question that a working strategist should ask when evaluating any evidence-based claim: is the underlying data and code publicly available, and if not, how should I discount the claim?

Wicherts 2006: The 73% Refusal Rate

The Wicherts paper is short and devastating. The methods are about as straightforward as empirical methods get. The authors selected 141 studies from four leading American Psychological Association journals (the Journal of Personality and Social Psychology, the Journal of Consulting and Clinical Psychology, the Journal of Experimental Psychology: Learning, Memory, and Cognition, and Developmental Psychology), published in two issues of each journal in late 2004. They contacted the corresponding author of each study with a request for the raw data, framed as a request for reanalysis purposes. The contact protocol was structured: an initial email, a reminder if no response, and a final follow-up. The window for response was approximately six months.

The headline result was that 73% of authors did not provide the data within the response window. Of the 141 studies, 38 (27%) yielded the requested data. The remaining 103 either refused outright, did not respond after multiple follow-ups, claimed that the data were lost, claimed that the data were in a format that could not be shared, claimed that they were too busy to compile and send the data, or invoked confidentiality concerns that the authors of the original studies had not flagged at the time of publication.

The breakdown of non-responses is, in some ways, more interesting than the headline number. A meaningful fraction of authors did not respond at all to any of the contact attempts — they were not reachable, the email addresses on file were stale, or they had moved institutions and the request did not follow them. Some authors initially agreed to provide data but then stopped responding once the request crossed from agreement into actual data transfer. Some authors flatly refused on grounds ranging from “I’m too busy” to “I’d want to know what you plan to do with it first” — a response that, while reasonable in a vacuum, is incompatible with the structural premise of public science that published findings are publicly verifiable.

The most quietly damning category was the “data are lost” response. In multiple cases, authors of recently published studies — published within the prior year, in some cases — reported that the raw data files were no longer available. The data had been on a graduate student’s laptop that had been replaced. The data had been on a server that had been decommissioned. The data had been in a format that the current statistical software could no longer read. These were not papers from the 1970s; these were papers from 2004, whose data had vanished within months of publication. The vanishing was not the result of any deliberate action; it was the result of the absence of any structural requirement that the data be preserved. In the absence of a requirement, the default was loss.

Wicherts and colleagues later went further. In a follow-up analysis published in 2011, Wicherts, Bakker, and Molenaar checked the statistical results reported in a subset of the original papers and cross-tabulated the willingness to share with the rate of reporting errors. The papers whose authors declined to share data had a higher rate of statistical reporting errors detectable from the published manuscripts than the papers whose authors shared. The sample is small, the direction is what one would predict on selection grounds, and the inferential weight is limited, but the pattern is consistent with subsequent larger-scale work. Authors who are unwilling to expose their analyses to independent reanalysis tend, on average, to have analyses that benefit from not being exposed to independent reanalysis.

The Wicherts paper landed in a methodology community that had been articulating concerns about data sharing for decades, but it converted those concerns into an empirically grounded number that could be cited. “Wicherts 2006” became the short-form citation that anyone in any empirical field could invoke to make the case that the implicit norm of shareable data was, in practice, broken. The paper has been cited thousands of times in the two decades since and remains the canonical empirical reference for what the data-sharing problem actually looks like in the absence of structural enforcement.

Hardwicke 2018: The Cognition Mandate

If Wicherts 2006 measured the baseline rate of data sharing under voluntary norms, Hardwicke and colleagues 2018 measured what happened when a journal switched from voluntary norms to a mandatory policy. The paper, “Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition,” published in Royal Society Open Science in August 2018, is the most rigorous empirical evaluation of a journal-level open data mandate that has been conducted.

The journal Cognition — a leading outlet for experimental cognitive science — adopted a mandatory open data policy in March 2015. Under the new policy, authors of accepted manuscripts were required to make the underlying data publicly available unless they could justify a specific exception (typically involving genuinely sensitive participant information). The policy was enforced as a condition of publication.

Hardwicke and colleagues selected 35 articles published in Cognition in the year after the policy went into effect. For each article, they attempted three things in sequence. First, they attempted to locate the data file from the public repository linked in the manuscript. Second, where data were available, they attempted to download the code used for analysis (the code-sharing component was not formally required by the Cognition policy, but the authors examined the rate of voluntary code sharing as an adjunct). Third, where both data and code were available, they attempted to actually re-run the code on the data and reproduce the headline results reported in the manuscript.

The cascade of attrition at each stage is the central empirical contribution of the paper.

At the first stage — data availability — Hardwicke and colleagues found that 65% of the 35 articles had the underlying data publicly accessible in some form. The remaining 35% had broken links, empty repositories, files in formats that could not be opened, files that were not what they purported to be, or files that were missing the variables needed for the headline analysis. The 65% sharing rate under a mandatory policy was a substantial improvement over the 27% rate Wicherts had documented under voluntary norms — but it was a long way from 100%. A mandatory policy that produced sharing in 65% of cases was leaving a third of the literature unverifiable, even after the journal had formally required sharing as a condition of publication.
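Both sharing rates come from modest samples (141 studies and 35 articles), so it is worth checking how much uncertainty surrounds each percentage. The following is a minimal sketch, not part of either paper's analysis: it computes approximate 95% Wilson score intervals from the figures quoted above. The function name wilson_interval and the choice of the Wilson method are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(p_hat: float, n: int, confidence: float = 0.95) -> tuple:
    """Approximate Wilson score interval for a proportion p_hat observed in n cases."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Figures quoted in the text: 38 of 141 studies under voluntary norms,
# and 65% of 35 articles under the mandatory policy at Cognition.
voluntary = wilson_interval(38 / 141, 141)
mandatory = wilson_interval(0.65, 35)

for label, (low, high) in [("voluntary", voluntary), ("mandatory", mandatory)]:
    print(f"{label}: {low:.1%} to {high:.1%}")
```

Under these assumptions the two intervals do not overlap, which is consistent with reading the mandate as a real improvement, while the width of the mandatory-policy interval shows how little a sample of 35 can pin down. The two studies also used different journals and different retrieval methods, so the comparison is indicative rather than a controlled estimate.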

At the second stage — code availability — the rate dropped further. Of the articles where data were available, code was available in a smaller fraction. Hardwicke and colleagues reported that across the full sample, only about a third of articles had both data and code available in a form that allowed analytic reproduction to even be attempted.

At the third stage, actually re-running the code and reproducing the results, the rate dropped further still. Of the articles where both data and code were available, the team attempted to execute the analysis and check that the results matched what the paper reported. The most cited headline figure is that 36% of the articles produced code that ran and reproduced the reported results when executed on the available data. The remaining articles fell into three groups: code that did not run (due to missing dependencies, version mismatches, hard-coded paths to the original researcher’s machine, or scripts that referenced data files that had not been included), code that ran but produced different numerical results from the published paper, and code whose connection to the published results was unclear enough that the verification attempt could not produce a clean yes/no judgment.

The Hardwicke paper is the empirical foundation for the contemporary phrase “computational reproducibility crisis.” A journal that had taken the deliberate institutional step of mandating data sharing, in a field with a mature open-source statistical computing ecosystem (most of the analyses were in R or MATLAB), was nevertheless producing a published literature where only about a third of analyses could be reproduced from the supplied materials. The other two-thirds, even where the underlying data were available, were unverifiable in a strict sense — a sufficiently determined re-analyst could probably reconstruct what the original authors had done, but the published artifact did not enable mechanical verification.

The lessons that Hardwicke and colleagues drew were structural. A data-sharing policy is necessary but not sufficient. Without a code-sharing requirement, the published analysis is a black box even when the data are open. Without a requirement that the code be executable on the supplied data (including specification of dependencies, software versions, and reproducible execution environments), code sharing is necessary but not sufficient. Without independent verification of computational reproducibility before publication (which would require either editor-side resources or a separate verification infrastructure), even data-plus-code policies leave most of the literature unverified. The Hardwicke paper effectively defined the policy frontier: a journal mandate is the first step, code sharing is the second step, executable-environment standards are the third step, and pre-publication verification is the fourth step. Few journals have moved past the first or second step. The empirical effect on actual reproducibility is correspondingly bounded.

Stodden 2018: The Computational Reproducibility Crisis

The Hardwicke result was framed as a finding about psychology, but the same structural pattern recurs across computational science more broadly. The most rigorous empirical work on the computational version of the reproducibility problem is by Victoria Stodden, a statistician who was at the University of Illinois at Urbana-Champaign when the paper appeared and who has been the leading methodologist of computational reproducibility for a decade. Her 2018 PNAS paper, “An empirical analysis of journal policy effectiveness for computational reproducibility” (DOI: 10.1073/pnas.1708290115), is the most ambitious attempt to measure the effectiveness of data and code sharing policies at the journal level in computational science.

Stodden and colleagues focused on Science as the test case. Science adopted a “data and code on request” policy in February 2011, requiring authors of accepted manuscripts to provide data and code upon request from other researchers. The policy was, on paper, more aggressive than many contemporary policies — it imposed an explicit ongoing obligation on authors. The Stodden team examined whether the policy actually produced the computational reproducibility it was intended to enable.

The team selected 204 articles published in Science between 2011 and 2014 that involved computational analyses. They contacted the authors of each article requesting the data and code needed to reproduce the reported computational results, following the protocol that Science’s own policy invoked. They documented the response rate, the materials that were actually provided, and — where materials were provided — whether the materials were sufficient to reproduce the reported results.

The numbers in Stodden 2018 are the computational analog of the Wicherts and Hardwicke results. Across the 204 articles, the team obtained data and code for 44% of the sample. Working from the materials received, the team judged the findings reproducible for an estimated 26% of the sample. The combined attrition, from “policy requires data and code on request” to “team actually has reproducible computational results,” left the effective reproducibility rate in the sample well under 50%. Stodden’s summary framing has been that for computational science in this period, most of the published literature was effectively unreproducible from the published materials, even at a flagship journal with an on-paper policy that should have produced full reproducibility.

The structural diagnosis in Stodden’s work has converged with the structural diagnosis in Hardwicke’s. Policies that depend on author cooperation after publication are systematically less effective than policies that require submission of materials as a condition of publication. Policies that share data without code are systematically less effective than policies that require both. Policies that require code without specifying the execution environment are systematically less effective than policies that bundle data, code, and environment into a single citable artifact. The frontier of computational reproducibility infrastructure has moved toward platforms — Code Ocean, Whole Tale, mybinder, and others — that bundle the data, the code, and a containerized execution environment into a single object that can be cited, downloaded, and re-run without any cooperation from the original author. Adoption of these platforms is growing but remains a minority of computational publications.

The Stodden work also intersects with a broader concern that has been raised by Lorena Barba, David Donoho, and others: computational science is structurally different from the bench sciences that have historically driven the replication discourse, and the reproducibility standards that apply to bench experiments are not directly portable to computational analyses. In a bench experiment, replication means a different lab runs the same protocol and gets the same result. In a computational analysis, the equivalent is “the same code runs on the same data and produces the same output bits,” which is a much weaker standard but is also one that is, in principle, achievable by mechanical means. The minimum standard for a computational paper should be that the published code, run on the published data, produces the published numbers. Stodden’s work has documented that even this minimum standard is not being met across substantial fractions of the computational literature. This is the version of the reproducibility crisis that is most amenable to infrastructure-based fixes — there is no analog of “the lab equipment is different” or “the population is different” in a computational analysis — and the infrastructure community has been building precisely those fixes.
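The minimum standard described above (published code, run on published data, produces the published numbers) can be checked mechanically once a re-run has produced its own statistics. The sketch below is an illustrative assumption, not a procedure taken from Stodden's paper: the function name compare_to_published, the statistic names, and all numeric values are hypothetical. It treats a recomputed value as a match when it agrees with the published value to the number of decimals the paper reported.

```python
from typing import Mapping, Tuple


def compare_to_published(
    published: Mapping[str, Tuple[float, int]],
    recomputed: Mapping[str, float],
) -> dict:
    """Label each published statistic as 'match', 'mismatch', or 'missing'.

    `published` maps a statistic name to (reported value, decimals reported).
    A recomputed value matches if it lies within half a unit of the last
    reported decimal place of the published figure.
    """
    report = {}
    for name, (reported, decimals) in published.items():
        if name not in recomputed:
            report[name] = "missing"  # the re-run never produced this number
            continue
        tolerance = 0.5 * 10 ** (-decimals) + 1e-12  # small slack for float error
        close = abs(recomputed[name] - reported) <= tolerance
        report[name] = "match" if close else "mismatch"
    return report


# Hypothetical values, for illustration only.
published = {"t_stat": (2.41, 2), "effect_size_d": (0.38, 2), "mean_rt_ms": (512.3, 1)}
recomputed = {"t_stat": 2.4132, "effect_size_d": 0.41}

report = compare_to_published(published, recomputed)
for name, status in report.items():
    print(f"{name}: {status}")
```

Keeping "missing" separate from "mismatch" mirrors the distinction drawn in the text between code that never yields a result and code that runs but disagrees with the paper.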

The Institutional Infrastructure: OSF, TOP, and the Reform Stack

The empirical record on data and code sharing — Wicherts on the baseline, Hardwicke on the journal-mandate response, Stodden on the computational version — has been the diagnostic input to an institutional reform movement that has produced a stack of complementary infrastructure components. The reforms operate at the platform level, the journal level, the funder level, and the field-norm level. They are mutually reinforcing, and the cumulative effect on the actual rate of open and reproducible research is substantially larger than any single component.

The Open Science Framework (OSF). The Open Science Framework is a free, public-good platform operated by the Center for Open Science, a nonprofit co-founded by Brian Nosek and Jeffrey Spies in 2013. OSF provides a hosted environment for preregistration of analysis plans, posting of working papers and preprints, sharing of data and materials, and creation of citable project records that bundle all of the above. The platform is operated as a public good with stable URLs that the organization commits to maintaining indefinitely. The technical implementation supports versioning, DOI assignment, integration with statistical software ecosystems, and granular access controls for embargoed materials.

The practical effect of OSF on research practice has been substantial. Preregistration of analysis plans — once a marginal practice associated mainly with clinical trials — has become a mainstream option for psychology, behavioral economics, organizational research, and increasingly for ecology and biology. The OSF’s preregistration tool has been used for hundreds of thousands of studies in the years since it launched. The platform’s data and materials hosting has become the default location for the supplementary materials of an increasing fraction of published research in the open-science-influenced fields. The OSF is not the only such platform — Figshare, Zenodo, Dryad, and field-specific repositories play similar roles — but it has been the most central to the open-science reform movement in the social and behavioral sciences.

The TOP Guidelines. The Transparency and Openness Promotion (TOP) Guidelines are a set of standards coordinated by the Center for Open Science and published in 2015 (Nosek et al., 2015, Science), articulating eight specific dimensions of research transparency along which journals can adopt graduated standards. The eight TOP standards cover: citation standards, data transparency, analytic methods (code) transparency, research materials transparency, design and analysis transparency, study preregistration, analysis plan preregistration, and replication policy. For each dimension, TOP defines three levels of increasing stringency (level 1, disclosure; level 2, requirement; level 3, verification) that a journal can adopt. A “TOP 3” journal that applies all eight dimensions at the highest level represents the frontier of open-science publishing.

The TOP framework has been adopted by approximately 5,000 journals and 100 societies in the years since launch. Adoption levels are typically lower than “TOP 3 on everything” (most adopters are at level 1 or level 2 on most dimensions), but the framework has provided a portable vocabulary for journal policy negotiations and a coordination mechanism that makes it easier for fields to migrate toward stronger norms incrementally. The Hardwicke 2018 paper’s evaluation of the Cognition mandate is, effectively, an evaluation of a TOP-level-2 data-sharing policy in practice. Subsequent evaluations of stronger policies (level 3, verified at submission) have produced higher computational reproducibility rates, but at the cost of additional editorial overhead that most journals have not taken on.

Funder data management plan requirements. Public funders in most developed-country research systems have adopted data management plan requirements in the years since 2010. The U.S. National Institutes of Health implemented a Data Management and Sharing policy that took effect in January 2023, requiring all NIH-funded research to include a data management and sharing plan and to make resulting data available subject to reasonable constraints. The U.S. National Science Foundation has required data management plans for all proposals since 2011. The European Research Council, the U.K. Research Councils, and most national funding bodies in Western Europe have analogous requirements. The combined effect is that, increasingly, the rule for publicly funded research is that data must be planned for sharing from the moment of grant submission, not arranged post-hoc at the moment of publication.

Funder requirements are arguably the most structurally consequential of the open-science reforms, because they apply at the source of the research rather than at the point of publication. A funder requirement that a project’s data be deposited in a public repository as a condition of continued funding has more enforcement power than a journal requirement that data be shared as a condition of publication, because the funder controls a resource that researchers care about throughout the multi-year arc of a project. The compliance gap on funder requirements is real (the formal requirement to submit a data management plan is widely met; the actual compliance with the plan at project conclusion is enforced unevenly), but the structural direction is toward stronger and more enforced requirements over time.

Computational reproducibility platforms. The Code Ocean, Whole Tale, mybinder, and CodaLab platforms represent the infrastructure response to the specific problem documented in Stodden 2018: that code-and-data sharing without execution-environment specification fails in practice because the code does not run on the reader’s machine. These platforms bundle code, data, and a containerized execution environment (typically using Docker) into a single citable artifact that can be re-run by anyone, anywhere, with one click. The architectural commitment is that computational results should be reproducible mechanically, without any cooperation from the original author, by anyone who has access to the published artifact. Adoption is concentrated in fields with high computational intensity (machine learning, genomics, computational physics) and remains a minority practice across science as a whole. But the underlying architecture is sound, and the infrastructure is in place for fields to migrate to it when journals or funders require it.
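Full containerization is what these platforms provide, but the underlying idea of recording the execution environment next to the results can be illustrated on a much smaller scale. The following is a minimal sketch under stated assumptions: it uses only Python's standard library, the function name environment_manifest and the package list are hypothetical, and a version manifest of this kind is a far weaker guarantee than a container image.

```python
import json
import platform
from importlib import metadata


def environment_manifest(packages: list) -> dict:
    """Record the interpreter, operating system, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Hypothetical package list; a real project would list its own dependencies.
manifest = environment_manifest(["numpy", "pandas", "scipy"])
print(json.dumps(manifest, indent=2))
```

Saving such a manifest alongside the analysis output gives a later reader a starting point for diagnosing the version mismatches and missing dependencies described earlier, even where no container is published.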

Registered Reports and pre-publication peer review of methods. The Registered Reports publication format, pioneered by Cortex in 2013 and now adopted by approximately 300 journals, separates peer review into two stages: methods are reviewed and accepted in principle before the data are collected, and the final paper is published regardless of the direction of the results provided that the pre-registered methods were followed. The format directly addresses the publication-bias and p-hacking problems that the broader open-science movement has documented. Registered Reports are not, strictly, a data-and-code-sharing reform — they are a methods-transparency reform — but they are conceptually adjacent and frequently adopted as part of the same institutional package. The empirical evidence on Registered Reports is encouraging: the rate of “positive” headline findings in Registered Reports is approximately half the rate in conventional publication formats in the same fields, which is what one would expect if the conventional format was producing inflated positive rates through some combination of publication bias and analysis flexibility.

The cumulative effect of the stack — OSF infrastructure, TOP standards, funder data management plans, computational reproducibility platforms, Registered Reports — is that the rate of open and reproducible research has been moving in the right direction across the open-science-influenced fields over the past decade. The rate is still far short of the implicit norm that all publicly funded research should be reproducible from publicly available materials, but the slope has been positive and the institutional infrastructure is now in place to support continued improvement. The McKiernan et al. 2016 eLife paper, “How open science helps researchers succeed,” documented the early evidence that open practices were correlated with citation and career benefits — the structural argument for individual adoption. The contemporary record is that the practices that were marginal in 2010 are now mainstream in 2026 across substantial parts of psychology, biomedicine, behavioral economics, and the computational sciences.

What This Means for Strategists Reading Research

The infrastructure reform movement matters to working strategists in a specific operational way. The fact that data and code sharing has become a publication standard in influential fields means that, increasingly, you can ask a verifiable question of any “evidence-based” claim that you encounter: are the underlying data and code publicly available?

The diagnostic question takes three forms depending on what you find.

If the data and code are publicly available, posted to a stable repository, and the code runs: the claim is in the strongest evidentiary category that is achievable in published research. You can, in principle, re-run the analysis yourself. You can examine the analytic decisions for the researcher-degree-of-freedom problems that the broader replication crisis literature has documented. You can probe the sensitivity of the headline finding to specification choices that the authors made. Even if you do not actually conduct these analyses, the fact that they are possible — that a sufficiently motivated third party could conduct them — is a structural credibility signal that the published claim has not been hand-waved past the verification process.

If the data and code are nominally available but the code does not run, the data are in a non-standard format, or the analysis cannot be reproduced from the available materials: the claim has cleared the lowest bar of openness without clearing the substantive bar of verifiability. This is the category that Hardwicke 2018 documented for two-thirds of the Cognition papers and that Stodden 2018 documented for a large share of computational Science papers. The claim is not fraudulent or even badly intended, but it is not, in a strict sense, verifiable from the published materials. Treat it as you would treat a claim from a field with substantial researcher degrees of freedom: as a credible hypothesis that warrants additional evidence before acting on it.

If the data and code are not available at all: the claim is in the weakest evidentiary category that is achievable in published research, regardless of the prestige of the journal or the credentials of the authors. The claim is structurally unverifiable; the field has decided not to require verifiability for this class of work; and the appropriate strategic response is the discount rate that follows from the broader replication crisis literature on unverifiable findings. This category includes much of the qualitative social-science literature, much of clinical practice based on small-sample studies, much of the management and consulting literature that draws on proprietary or confidential data, and much of the popular-press summarization of “what science says.” None of these are necessarily false, but none of them are verifiable, and the credence you accord them should reflect that.
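The three-way diagnostic above can be written down as a small decision rule. This is a sketch of the article's categories, not an established scoring scheme: the names EvidenceTier and triage_claim are hypothetical, and placing partially available materials (for example, data without code) in the middle tier is an assumption made here.

```python
from enum import Enum


class EvidenceTier(Enum):
    VERIFIABLE = "data and code public, analysis reproduces"
    NOMINALLY_OPEN = "materials posted, but not reproducible as published"
    UNVERIFIABLE = "no data or code available"


def triage_claim(data_public: bool, code_public: bool, reproduces: bool) -> EvidenceTier:
    """Map what can be found about a claim to one of three evidentiary tiers."""
    if not data_public and not code_public:
        return EvidenceTier.UNVERIFIABLE
    if data_public and code_public and reproduces:
        return EvidenceTier.VERIFIABLE
    # Assumption: anything partially available or non-reproducing lands here.
    return EvidenceTier.NOMINALLY_OPEN


for case in [(True, True, True), (True, True, False), (True, False, False), (False, False, False)]:
    print(case, "->", triage_claim(*case).name)
```

The value of writing the rule out is that it forces the three inputs to be checked separately, since a posted repository says nothing by itself about whether the analysis reproduces.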

The strongest operational rule that falls out of the open-science literature is: before acting on any “research-supported” claim that you have not personally verified, ask whether the underlying data and code are publicly available. If they are not, treat the claim as a testimonial rather than a result. This is the equivalent in the data-availability frame of the Ioannidis-derived rule “do not change your behavior based on a single study, in any field, ever.” The two rules are mutually reinforcing: a single study is weak evidence; a single unverifiable study is weaker; a single unverifiable study from a field where data are systematically withheld is weakest of all.

The infrastructure reforms of the open-science movement do not eliminate the replication crisis. They do not retroactively repair the literature of the past fifty years that was built without open-data norms. They do not, by themselves, prevent the next Daryl Bem, the next Diederik Stapel, the next Reinhart-Rogoff. What they do is build, slowly and incompletely, an alternative infrastructure for empirical research in which the default assumption is that published findings should be verifiable from publicly available materials. The strategic implication for working professionals is that the infrastructure now exists, and the appropriate response to any claim that did not pass through it is to apply a structural discount.

Sources

  • Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726–728. DOI: 10.1037/0003-066X.61.7.726.
  • Hardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., et al. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5(8), 180448. DOI: 10.1098/rsos.180448.
  • Stodden, V., Seiler, J., & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proceedings of the National Academy of Sciences, 115(11), 2584–2589. DOI: 10.1073/pnas.1708290115.
  • Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., et al. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425. DOI: 10.1126/science.aab2374.
  • McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., Lin, J., et al. (2016). How open science helps researchers succeed. eLife, 5, e16800. DOI: 10.7554/eLife.16800.
  • Open Science Framework. Hosted by the Center for Open Science. https://osf.io.
  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. DOI: 10.1126/science.aac4716.
  • Stodden, V. (2010). The scientific method in practice: Reproducibility in the computational sciences. MIT Sloan Research Paper No. 4773-10. DOI: 10.2139/ssrn.1550193.
  • Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., et al. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 0021. DOI: 10.1038/s41562-016-0021.
  • Chambers, C. D. (2013). Registered Reports: A new publishing initiative at Cortex. Cortex, 49(3), 609–610. DOI: 10.1016/j.cortex.2012.12.016.

Frequently Asked Questions

Q: If only 65% of papers shared data even under a mandatory policy, are mandatory policies worth it?

A: Yes, and the comparison is the relevant one. The pre-mandate baseline in psychology, documented by Wicherts and colleagues in 2006, was 27% data availability under voluntary norms, with a substantial fraction of authors actively refusing requests. The post-mandate rate at Cognition, documented by Hardwicke and colleagues in 2018, was 65% under the journal’s enforced policy. The mandate therefore more than doubled the sharing rate, a gain of roughly 38 percentage points. An increase of that size is large in absolute terms even when the resulting level is still imperfect. The remaining 35% gap represents the additional enforcement infrastructure that has not yet been built (verification at submission, structured deposits, executable environment requirements), not a failure of the mandate concept. Journals that have moved past the basic-mandate stage to verified-at-submission policies have reported higher rates, at the cost of editorial overhead.
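
The arithmetic behind that comparison can be checked in a few lines. This is a minimal sketch that uses only the figures quoted in this article (38 of 141 datasets for Wicherts, 65% for Cognition); the function name sharing_change is illustrative, not taken from either paper.

```python
def sharing_change(baseline_rate, mandate_rate):
    """Return (gain in percentage points, relative multiplier) between two rates."""
    gain_points = (mandate_rate - baseline_rate) * 100
    multiplier = mandate_rate / baseline_rate
    return gain_points, multiplier


baseline = 38 / 141  # Wicherts et al. (2006): 38 of 141 datasets received
mandate = 0.65       # post-mandate rate quoted in this article for Cognition

gain, ratio = sharing_change(baseline, mandate)
print(f"baseline rate: {baseline:.0%}")
print(f"gain: {gain:.0f} percentage points")
print(f"multiplier: {ratio:.1f}x")
```

The two rates come from different measurement protocols (data supplied on request versus data shared under a journal policy), so the multiplier is best read as an order-of-magnitude comparison rather than a controlled estimate.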

Q: What about confidentiality concerns? Aren’t some data genuinely sensitive?

A: Yes, and the institutional infrastructure has converged on a layered approach. Genuinely sensitive data — individual patient records, identifiable survey responses, proprietary commercial data — can be deposited under access-controlled arrangements that allow verified third-party researchers to apply for access while protecting the underlying confidentiality. Repositories like the Inter-university Consortium for Political and Social Research (ICPSR) and the U.K. Data Service provide this tier. The relevant distinction is between data that genuinely cannot be made public and data that the original researcher would prefer not to make public; the institutional infrastructure increasingly distinguishes the two categories and treats only the former as legitimate grounds for non-sharing.

Q: For applied/industry research where I cannot share proprietary data, does any of this apply?

A: The framework still applies, in a modified form. If you are running an internal experiment in a company and producing a result that will inform business decisions, the data cannot be made public, but the verifiability requirement is internal: are the data, the analysis code, and the methodology documented well enough that an internal third party (a different team, a future hire, a successor) can re-run the analysis and reproduce the result? Most internal experimentation practices fall short of this internal-verifiability bar in the same ways that the published literature falls short of the external-verifiability bar. The same infrastructure patterns — versioned data deposits, executable code in a containerized environment, preregistered analysis plans — apply to internal experimentation programs that are serious about not deluding themselves.
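
One low-cost way to approximate a versioned data deposit internally is to record a cryptographic fingerprint of every input file when the analysis is run, so a later reviewer can confirm they are re-running it on the same data. The sketch below is one possible approach under stated assumptions, not an established standard: the helper names (sha256_of, build_manifest, changed_files) and the file name experiment_results.csv are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file's bytes in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a fingerprint for each input file at analysis time."""
    return {str(path): sha256_of(path) for path in paths}


def changed_files(manifest):
    """List recorded files that are now missing or no longer match their fingerprint."""
    return [
        name for name, recorded in manifest.items()
        if not Path(name).exists() or sha256_of(name) != recorded
    ]


# Demonstration on a throwaway file standing in for an experiment export.
with tempfile.TemporaryDirectory() as workdir:
    data_file = Path(workdir) / "experiment_results.csv"
    data_file.write_text("variant,conversions\nA,120\nB,135\n")

    manifest = build_manifest([data_file])
    unchanged_check = changed_files(manifest)  # nothing has changed yet

    data_file.write_text("variant,conversions\nA,120\nB,140\n")  # a silent edit
    edited_check = changed_files(manifest)     # the edit is detected

print(unchanged_check)
print(len(edited_check))
```

In practice the manifest would be saved next to the analysis code and committed with it, so that the code version and the data version are pinned together.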

Q: How do I, as a non-researcher, actually check whether a paper’s data and code are available?

A: For most published papers from the past decade in open-science-influenced fields, the answer is one of: a “Data Availability” section in the paper itself (typically near the end), a supplementary materials link, or a citation to an OSF, Figshare, Zenodo, or Dryad deposit. If none of these is present, treat the data as not publicly available. For older papers, the typical answer is that the data are not available. For papers from fields that have not adopted open-science norms (much of qualitative social science, much commercially sponsored clinical-trial research, much of management research), the data are typically not available regardless of paper age. The check is a five-minute exercise that produces a clean signal about which evidentiary category the paper falls into.
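
For anyone screening many papers at once, the same check can be roughly automated against the paper's text. The sketch below is a crude heuristic: it assumes the full text is already available as a string, the function name availability_signals is hypothetical, and the repository domains listed are assumptions about how deposits are usually linked. A match shows only that a deposit is mentioned, not that the data are complete or usable.

```python
import re

# Domains commonly used when linking to the repositories named above (assumed).
REPOSITORY_PATTERNS = {
    "OSF": r"osf\.io",
    "Figshare": r"figshare\.com",
    "Zenodo": r"zenodo\.org",
    "Dryad": r"datadryad\.org",
}
SECTION_PATTERN = r"data\s+availability"


def availability_signals(paper_text):
    """Return the availability signals mentioned anywhere in the paper text."""
    text = paper_text.lower()
    found = [
        name for name, pattern in REPOSITORY_PATTERNS.items()
        if re.search(pattern, text)
    ]
    if re.search(SECTION_PATTERN, text):
        found.insert(0, "Data Availability section")
    return found


# The link below is a made-up placeholder, not a real deposit.
sample = "Data Availability: all materials are posted at https://osf.io/xxxxx/."
print(availability_signals(sample))
print(availability_signals("Data are available from the author on request."))
```

An empty result is the signal described above: no section, no repository link, and therefore no public route to the data.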

Q: Does the open-science movement risk creating a false sense of security — that “open” automatically means “true”?

A: Yes, and the methodological community has been explicit about this risk. The Hardwicke 2018 paper itself documents that even where data are available, the analysis is often not actually reproducible. Open data and code are necessary conditions for verifiability; they are not sufficient conditions for truth. An open-data analysis can still be wrong, can still suffer from researcher-degree-of-freedom problems, can still be embedded in a literature with publication bias, and can still be one positive finding in a context where the prior probability of the hypothesis was low. The right framing is that open data and code raise the floor of what verification is possible; they do not raise the ceiling of how reliable the underlying finding actually is. The other replication-crisis frameworks (Ioannidis-style Bayesian discounting, registered reports, multi-team replication) still apply on top of the open-data baseline.
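
The point about low prior probability can be made concrete with Bayes’ rule. The sketch below is a simplified version of the positive-predictive-value calculation associated with Ioannidis, ignoring bias and multiple testing. The default power of 0.8 and significance threshold of 0.05 are conventional assumptions, not figures from any study cited here.

```python
def positive_predictive_value(prior, power=0.8, alpha=0.05):
    """Probability that a statistically significant finding reflects a true effect.

    prior: probability the hypothesis is true before the study is run
    power: probability the study detects a true effect
    alpha: probability the study reports an effect that is not there
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


for prior in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(prior)
    print(f"prior {prior:.2f} -> chance a positive finding is true: {ppv:.2f}")
```

Under these assumptions a hypothesis with a 10% prior yields a true finding only about 64% of the time, and nothing about the openness of the data changes that number.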

Atticus Li

Experimentation and growth leader. CXL-certified CRO practitioner, Mindworx-certified behavioral economist (1 of ~1,000 worldwide). 200+ A/B tests across energy, SaaS, fintech, e-commerce, and marketplace verticals.