Extraordinarily corrupt or statistically commonplace? Reproducibility crises may stem from a lack of understanding of outcome probabilities

Abstract

Failure to consistently reproduce experimental results, i.e. failure to reliably identify or quantify an effect — often dubbed a ‘reproducibility crisis’ when it affects a large number of studies in a given field — has become a serious concern in many communities and is widely believed to be caused by a lack of systematic methodological description, poor experimental practice, or outright fraud. On the other hand, it is common knowledge in scientific practice that replicate experiments, even when performed in the same lab by the same experimenter, rarely show complete quantitative agreement. These two classes of explanation, the widely believed and the commonplace, are not mutually exclusive, but they are incompatible as justifications for irreproducibility: invoking the former implies an anomaly, a crisis, while the latter is statistically expected and therefore amenable to quantification. Interpreting two or more studies as conflicting often reduces to a mechanistic view in which a ground truth exists that must be observed in every properly performed experiment; a slightly less naive view is a frequentist one in which statistical tests should almost always identify a true effect as significant. A broader view, however, may consider that the effect can only be observed as a probability distribution; individual experiments are therefore expected to differ not only through sampling and the power to identify a significant effect, but also through variation in the parameter value itself — that is, it is accepted that some sources of variation, for instance in the environment or in the experimenter, cannot be controlled with infinite precision, or that unknown, uncontrolled factors may introduce biases. Quantitatively, this perspective is consistent with a Bayesian hierarchical formulation, in which the effect parameters lie under a hyperprior and above the individual-experiment parameters. Put another way, the Bayesian hierarchical view reconciles seemingly discordant results by interpreting each experiment as itself a sample from a distribution, which in turn sets the range and probability of expected outcomes for new individual experiments. As a corollary, a large number of replicates increases confidence not only in the expected value but also in the deviation around it. Thus, “validating” an experiment does not mean obtaining the same number every time, but establishing the range and likelihood of outcomes from well-performed experiments. Conversely, once an experiment has been extensively replicated, the effect distribution indicates how much each repetition deviates from expectation — whether it is genuinely extreme, and potentially contains anomalies or misconduct, or is probabilistically unsurprising. This formulation has profound consequences for assessments and claims about reproducibility.
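
As a minimal illustration of the hierarchical view sketched in the abstract (a sketch, not taken from the paper itself), the short Python simulation below treats each experiment's true effect as itself a draw from a population-level distribution; all names and numerical values (mu_pop, tau, sigma, and so on) are hypothetical and chosen only for illustration.

# Minimal sketch, not from the paper: simulate the hierarchical view in which
# each experiment's true effect is itself drawn from a population-level
# distribution, rather than being a single fixed ground truth.
import numpy as np

rng = np.random.default_rng(42)

mu_pop = 0.5           # hypothetical population-level mean effect (hyperparameter)
tau = 0.2              # between-experiment spread of true effects
sigma = 0.3            # within-experiment measurement noise
n_experiments = 8      # number of replicate experiments
n_per_experiment = 30  # observations per experiment

# Each replication draws its own "true" effect from the population level...
true_effects = rng.normal(mu_pop, tau, size=n_experiments)

# ...and then observes noisy data around that experiment-specific effect.
observed_means = [rng.normal(theta, sigma, size=n_per_experiment).mean()
                  for theta in true_effects]

for i, (theta, m) in enumerate(zip(true_effects, observed_means), start=1):
    print(f"experiment {i}: true effect {theta:+.2f}, observed mean {m:+.2f}")

# Disagreement between the observed means reflects both sampling error
# (roughly sigma / sqrt(n_per_experiment)) and genuine between-experiment
# variation (tau), so replicate experiments yielding different numbers is
# statistically expected rather than anomalous.

Under this toy setup, fitting a hierarchical model to the pooled data would recover estimates of both mu_pop and tau, so extensive replication quantifies not only the expected effect but also how far a single well-performed experiment may be expected to deviate from it.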


Similar books and articles

Should We Strive to Make Science Bias-Free? A Philosophical Assessment of the Reproducibility Crisis. Robert Hudson - 2021 - Journal for General Philosophy of Science / Zeitschrift für Allgemeine Wissenschaftstheorie 52 (3):389-405.
Opinion: Reproducibility failures are essential to scientific inquiry. A. David Redish, Erich Kummerfeld, Rebecca Morris & Alan Love - 2018 - Proceedings of the National Academy of Sciences 115 (20):5042-5046.
Re-Thinking Reproducibility as a Criterion for Research Quality. Sabina Leonelli - 2018 - Research in the History of Economic Thought and Methodology 36 (B):129-146.
Gaita on Philosophy, Corruption, and Justification. Joe Mintoff - 2015 - Journal of Philosophical Research 40:97-116.


