Abstract

Benjamin et al. (Nature Human Behaviour 2(1), 6–10, 2018) proposed decreasing the significance level by an order of magnitude, from .05 to .005, to improve the replicability of psychology. This modest, practical proposal has been widely criticized, and its prospects remain unclear. This article defends the proposal against these criticisms and highlights its virtues.


Notes

  1. Supposing that these true negatives are ever observed, an unlikely outcome if people heavily engage in questionable research practices that ensure reaching the significance level (Simmons et al. 2011).

  2. Psychological Science publishes articles in various areas of psychology.

  3. That is, the proportion of replicated studies whose p-values are below .05.

  4. See also Trafimow (2018) for discussion of how to measure replicability.

  5. We do not claim originality for many of the points put forward in our paper.

  6. The Bayes Factor argument does not rely on the ascription of probabilities to hypotheses about parameter values: No probability is assigned to the null model or to the alternative model.

  7. This is only the case when the standard deviation is the same for the null and alternative models. Thanks to Justin Fisher for noting this point. (A sketch of this equal-standard-deviation case appears after these notes.)

  8. Morey (2018) also criticizes this empirical argument, but he misunderstands its point. The argument does not aim to show that there are fewer failed replications for original p’s ≤ .005 than for .005 < original p’s ≤ .05, but rather to give a sense of how much replicability could increase following a reduction of the significance level.

  9. This restriction applies only to direct replications and not to conceptual replications (for a criticism of this distinction, see, however, Machery n.d.).

  10. It is, for example, surprising that critics of null hypothesis significance testing fail to see that, even in the absence of a cutoff, scientists would engage in practices that exaggerate how much evidence they have for their pet hypotheses.

  11. Morey (2018) also shows that according to his own representation of two-sided tails as pairs of one-sided tails, a p-value equal to .02 provides substantial evidence for a directional hypothesis.

  12. Given that many null hypotheses are literally false (there is very often a tiny effect), Lakens and colleagues’ remark challenges the common assumption that by rejecting a point null hypothesis one is also entitled to conclude that the effect is more than negligible (Machery 2014).

  13. One may question this appeal to syncretism since the choice of a .005 level is only justified on Bayesian grounds. A truly syncretic approach would instead justify it on both Bayesian and frequentist grounds. However, first, the appeal to syncretism is meant to undermine the idea that Bayesian considerations are always irrelevant to a frequentist. Even if no frequentist justification is provided, a syncretist can’t dismiss the relevance of Bayesian considerations. Second, the argument from the false discovery rate can be given a frequentist interpretation: It examines the frequency of false positives among significant results for various possible base rates of true null hypotheses, exactly as we would do when we assess whether a medical test is sufficiently sensitive (an illustrative calculation appears after these notes).

  14. This presentation slightly simplifies Crane’s, but nothing of importance is lost (see eq. 9 in Crane n.d.).

  15. This type of objection would undermine various other proposals that take for granted the null hypothesis significance testing framework (e.g., preregistration).

  16. Argamon also flirts with the everything-or-nothing attitude that we criticized earlier when we discussed Trafimow et al. (2018).

  17. No one thinks it is sufficient.

  18. Our proposal is entirely consistent with a meta-analytic approach, and it is unclear why, as Lakens et al. (2018, 169) assert, our proposal would “divert attention from the cumulative evaluation of findings, such as converging results of multiple (replication) studies.”

  19. What if the significance level is used as a publication filter? We then need to distinguish the situations where the null hypothesis is true from those where it is false. When the null hypothesis is true, effect size inflation increases with a decreased significance level, even if the sample size increases to keep power constant. However, when the null is false, such an increase need not occur. P-values are right-skewed when the null is false, and the extent of the skew depends on the sample size for constant population parameters. So, if decreasing the significance level results in an increase in sample size, a larger number of p-values may fall below the smaller significance level. As a result, effect size inflation may decrease rather than increase (a simulation sketch appears after these notes).
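
A minimal sketch of the point in notes 6 and 7, assuming normal models with a common, known standard deviation: when the point alternative is placed at the observed estimate, the likelihood ratio against the point null is maximal and equals exp(z²/2), so the z-values corresponding to p = .05 and p = .005 can be turned into best-case Bayes factor bounds. This is not Benjamin et al.’s calibration, only the equal-standard-deviation case mentioned in note 7; the exact numbers depend on how the alternative is specified.

    # Best-case Bayes factor bound for a point alternative vs. a point null,
    # both normal with the same (known) standard deviation (cf. notes 6-7).
    # Placing the alternative at the observed estimate maximizes the ratio.
    from math import exp
    from scipy.stats import norm

    for p in (0.05, 0.005):
        z = norm.ppf(1 - p / 2)       # two-sided critical z-value
        bound = exp(z ** 2 / 2)       # maximal likelihood ratio against the null
        print(f"p = {p:.3f}: z = {z:.2f}, Bayes factor bound ~ {bound:.1f}")

The contrast (roughly 7 versus 51) only conveys why, on Bayesian grounds, results near p = .05 are treated as weak evidence; any reasonable alternative yields a smaller ratio than this bound.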
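
The frequentist reading of the false discovery rate argument in note 13 can be made concrete with a short calculation. The 80% power figure below is an assumption chosen for illustration; the base rates of true null hypotheses are simply a range of possibilities, as in the note.

    # False discovery rate among significant results for various base rates of
    # true null hypotheses (cf. note 13). Power is fixed at an assumed 80%.
    def false_discovery_rate(base_rate_true_null, alpha, power=0.80):
        false_positives = base_rate_true_null * alpha
        true_positives = (1 - base_rate_true_null) * power
        return false_positives / (false_positives + true_positives)

    for base_rate in (0.5, 0.7, 0.9):
        for alpha in (0.05, 0.005):
            fdr = false_discovery_rate(base_rate, alpha)
            print(f"base rate {base_rate:.1f}, alpha {alpha:.3f}: FDR = {fdr:.3f}")

Read the output exactly as one would read the predictive value of a medical test: the higher the base rate of true nulls, the larger the share of significant results that are false positives, and lowering alpha from .05 to .005 shrinks that share considerably.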
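
The claim at the end of note 19 can be illustrated with a small simulation. All quantities below (a true standardized effect of 0.3, 80% power, a one-sample z-test) are assumptions chosen for the sketch, not values from the paper; the point is only that, when the null is false and sample sizes grow to hold power constant, significant estimates need not become more inflated at the .005 level.

    # Effect size inflation among significant results when the null is false and
    # the sample size is raised to keep power at 80% while alpha drops to .005
    # (cf. note 19). All numbers are illustrative assumptions.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    true_effect, power, n_studies = 0.3, 0.80, 200_000

    for alpha in (0.05, 0.005):
        z_alpha, z_beta = norm.ppf(1 - alpha / 2), norm.ppf(power)
        n = int(np.ceil(((z_alpha + z_beta) / true_effect) ** 2))  # per-study sample size
        estimates = rng.normal(true_effect, 1 / np.sqrt(n), n_studies)  # simulated sample means
        significant = estimates[np.abs(estimates) * np.sqrt(n) > z_alpha]
        print(f"alpha = {alpha:.3f}, n = {n}: mean significant estimate = "
              f"{significant.mean():.3f} (true effect = {true_effect})")

Under these assumptions the mean significant estimate is slightly less inflated at alpha = .005 than at alpha = .05, which is all the note claims: the inflation may decrease rather than increase.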

Acknowledgements

I owe the expression “alpha war” to Simine Vazire. Thanks to John Doris, Felipe Romero, and two reviewers for very helpful feedback.

Corresponding author

Correspondence to Edouard Machery.

Cite this article

Machery, E. The Alpha War. Review of Philosophy and Psychology 12, 75–99 (2021). https://doi.org/10.1007/s13164-019-00440-1
