Abstract
Benjamin et al. (Nature Human Behaviour 2 (1), 6–10, 2018) proposed decreasing the significance level by an order of magnitude, from .05 to .005, to improve the replicability of psychology. This modest, practical proposal has been widely criticized, and its prospects remain unclear. This article defends the proposal against these criticisms and highlights its virtues.
Notes
Supposing that these true negatives are ever observed, an unlikely outcome if researchers heavily engage in questionable research practices that ensure reaching the significance level (Simmons et al. 2011).
Psychological Science publishes articles in various areas of psychology.
That is, the proportion of replicated studies whose p-values are below .05.
See also Trafimow (2018) for discussion of how to measure replicability.
We do not claim originality for many of the points put forward in our paper.
The Bayes Factor argument does not rely on the ascription of probabilities to hypotheses about parameter values: No probability is assigned to the null model or to the alternative model.
This is only the case when the standard deviation is the same for the null and alternative models. Thanks to Justin Fisher for noting this point.
Morey (2018) also criticizes this empirical argument, but he misunderstands its point. The argument does not aim to show that there are fewer failed replications for original p’s ≤ .005 than for .005 < original p’s ≤ .05, but rather to give a sense of how much replicability could increase following a reduction of the significance level.
This restriction applies only to direct replications and not to conceptual replications (for a criticism of this distinction, see however Machery n.d.).
It is, for example, surprising that critics of null hypothesis significance testing fail to see that, even in the absence of a cutoff, scientists would engage in practices that exaggerate how much evidence they have for their pet hypotheses.
Morey (2018) also shows that according to his own representation of two-sided tails as pairs of one-sided tails, a p-value equal to .02 provides substantial evidence for a directional hypothesis.
Given that many null hypotheses are literally false (there is very often a tiny effect), Lakens and colleagues’ remark challenges the common assumption that by rejecting a point null hypothesis one is also entitled to conclude that the effect is at most negligible (Machery 2014).
One may question this appeal to syncretism since the choice of a .005 level is only justified on Bayesian grounds; a truly syncretic approach would instead justify it on both Bayesian and frequentist grounds. However, first, the appeal to syncretism is meant to undermine the idea that Bayesian considerations are always irrelevant to a frequentist. Even if no frequentist justification is provided, a syncretist cannot dismiss the relevance of Bayesian considerations. Second, the argument from the false discovery rate can be given a frequentist interpretation: It examines the frequency of false positives among significant results for various possible base rates of true null hypotheses, exactly as we would do when assessing whether a medical test is sufficiently sensitive.
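This frequentist reading can be made concrete with a minimal sketch (our illustration, not a calculation from Benjamin et al.): it computes the frequency of false positives among significant results for several hypothetical base rates of true null hypotheses, at the .05 and .005 levels, with power fixed at an assumed value of .8.

```python
# Illustrative sketch: false discovery rate (the proportion of significant
# results that are false positives) for assumed base rates of true nulls.
# The base rates and the power of .8 are assumptions for illustration.
def false_discovery_rate(alpha, power, base_rate_true_null):
    false_positives = alpha * base_rate_true_null
    true_positives = power * (1 - base_rate_true_null)
    return false_positives / (false_positives + true_positives)

for base_rate in (0.5, 0.8, 0.9):
    fdr_05 = false_discovery_rate(0.05, 0.8, base_rate)
    fdr_005 = false_discovery_rate(0.005, 0.8, base_rate)
    print(f"base rate {base_rate}: FDR = {fdr_05:.3f} at .05, {fdr_005:.3f} at .005")
```

On these assumptions, lowering the level from .05 to .005 substantially reduces the false discovery rate at every base rate, just as a more stringent criterion improves the positive predictive value of a medical test.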
This presentation slightly simplifies Crane’s presentation, but nothing of importance is lost (see eq. 9 in Crane, n.d.).
This type of objection would undermine various other proposals that take for granted the null hypothesis significance testing framework (e.g., preregistration).
Argamon also flirts with the everything-or-nothing attitude that we criticized earlier when we discussed Trafimow et al. (2018).
No one thinks it is sufficient.
Our proposal is entirely consistent with a meta-analytic approach, and it is unclear why, as Lakens et al. (2018, 169) assert, our proposal would “divert attention from the cumulative evaluation of findings, such as converging results of multiple (replication) studies.”
What if the significance level is used as a publication filter? We then need to distinguish situations where the null hypothesis is true from those where it is false. When the null hypothesis is true, effect size inflation increases as the significance level decreases, even if the sample size increases to keep power constant. However, when the null is false, such an increase need not occur. P-values are right-skewed when the null is false, and the extent of the skew depends on the sample size for constant population parameters. So, if decreasing the significance level results in an increase in sample size, a larger number of p-values may be significant at the smaller significance level. As a result, effect size inflation may decrease rather than increase.
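The point can be illustrated with a small Monte Carlo sketch (our construction, with assumed values for the true effect, power, and test, not a calculation from the paper): studies of a true effect are simulated with a one-sided z-test, only significant results pass the publication filter, and the sample size is adjusted so that power stays constant across significance levels.

```python
# Illustrative simulation: effect size inflation among significant results
# when the null is false, at alpha = .05 vs. .005 with power held constant.
# The true effect (0.3) and power (.5) are assumed values for illustration.
import random
from statistics import NormalDist, mean

random.seed(1)
norm = NormalDist()

def mean_significant_effect(alpha, true_d, power=0.5, n_studies=20000):
    """Average observed effect among simulated studies of a true effect
    true_d that pass the significance filter, with the sample size n
    chosen to give the requested power for a one-sided z-test."""
    z_alpha = norm.inv_cdf(1 - alpha)
    z_power = norm.inv_cdf(power)          # 0 when power = .5
    n = round(((z_alpha + z_power) / true_d) ** 2)
    se = 1 / n ** 0.5                      # standard error, unit-variance data
    significant = []
    for _ in range(n_studies):
        observed = random.gauss(true_d, se)
        if observed / se > z_alpha:        # passes the publication filter
            significant.append(observed)
    return mean(significant)

true_d = 0.3
print(mean_significant_effect(0.05, true_d))   # inflated above 0.3
print(mean_significant_effect(0.005, true_d))  # less inflated: larger n
```

With these assumed numbers, the average significant effect overestimates the true effect of 0.3 at both levels, but less so at .005, because maintaining power at the stricter level forces a larger sample size.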
References
Amrhein, V., and S. Greenland. 2018. Remove, rather than redefine, statistical significance. Nature Human Behaviour 2: 4.
Amrhein, V., F. Korner-Nievergelt, and T. Roth. 2017. The earth is flat (p > 0.05): Significance thresholds and the crisis of unreplicable research. PeerJ 5: e3544.
Amrhein, V., D. Trafimow, and S. Greenland. 2018. Abandon statistical inference. PeerJ Preprints 6: e26857v1. https://doi.org/10.7287/peerj.preprints.26857v1.
Argamon, S. E. (2017). Don’t strengthen statistical significance—Abolish it. https://www.americanscientist.org/blog/macroscope/dont-strengthen-statistical-significance-abolish-it.
Baker, M., and E. Dolgin. 2017. Cancer reproducibility project releases first results. Nature 541: 269–270.
Begley, C.G., and L.M. Ellis. 2012. Drug development: Raise standards for preclinical cancer research. Nature 483: 531–533.
Benjamin, D., Berger, J., Johannesson, M., Johnson, V., Nosek, B., & Wagenmakers, E. J. (2017). Précis by Dan Benjamin, Jim Berger, Magnus Johannesson, Valen Johnson, Brian Nosek, and EJ Wagenmakers. http://philosophyofbrains.com/2017/10/02/should-we-redefine-statistical-significance-a-brains-blog-roundtable.aspx.
Benjamin, D.J., J.O. Berger, M. Johannesson, B.A. Nosek, E.-J. Wagenmakers, R. Berk, K.A. Bollen, B. Brembs, L. Brown, C. Camerer, D. Cesarini, C.D. Chambers, M. Clyde, T.D. Cook, P. De Boeck, Z. Dienes, A. Dreber, K. Easwaran, C. Efferson, E. Fehr, F. Fidler, A.P. Field, M. Forster, E.I. George, R. Gonzalez, S. Goodman, E. Green, D.P. Green, A. Greenwald, J.D. Hadfield, L.V. Hedges, L. Held, T.-H. Ho, H. Hoijtink, J.H. Jones, D.J. Hruschka, K. Imai, G. Imbens, J.P.A. Ioannidis, M. Jeon, M. Kirchler, D. Laibson, J. List, R. Little, A. Lupia, E. Machery, S.E. Maxwell, M. McCarthy, D. Moore, S.L. Morgan, M. Munafò, S. Nakagawa, B. Nyhan, T.H. Parker, L. Pericchi, M. Perugini, J. Rouder, J. Rousseau, V. Savalei, F.D. Schönbrodt, T. Sellke, B. Sinclair, D. Tingley, T. Van Zandt, S. Vazire, D.J. Watts, C. Winship, R.L. Wolpert, Y. Xie, C. Young, J. Zinman, and V.E. Johnson. 2018. Redefine statistical significance. Nature Human Behaviour 2 (1): 6–10.
Bright, L. K. (2017). Supporting the redefinition of statistical significance. http://sootyempiric.blogspot.com/2017/07/supporting-redefinition-of-statistical.html.
Button, K.S., J.P. Ioannidis, C. Mokrysz, B.A. Nosek, J. Flint, E.S. Robinson, and M.R. Munafò. 2013. Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14: 365–376. https://doi.org/10.1038/nrn3475.
Chang, A. C., & Li, P. (2015). Is economics research replicable? Sixty published papers from thirteen journals say ‘usually not’. https://doi.org/10.17016/FEDS.2015.083. Available at SSRN: https://ssrn.com/abstract=2669564 or https://doi.org/10.2139/ssrn.2669564
Cohen, J. 1962. The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology 65: 145–153.
Colquhoun, D. 2014. An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science 1 (3): 140216.
Cox, D.R. 1977. The role of significance tests. Scandinavian Journal of Statistics 4: 49–63.
Crane, H. (n.d.). Why ‘redefining statistical significance’ will not improve reproducibility and could make the replication crisis worse.
de Ruiter, J.P. 2019. Redefine or justify? Comments on the alpha debate. Psychonomic Bulletin & Review 26 (2): 430–433.
Esarey, J. (2017). Lowering the threshold of statistical significance to p < 0.005 to encourage enriched theories of politics. https://thepoliticalmethodologist.com/2017/08/07/in-support-of-enriched-theories-of-politics-a-case-for-lowering-the-threshold-of-statistical-significance-to-p-0-00
Etz, A., and J. Vandekerckhove. 2016. A Bayesian perspective on the reproducibility project: Psychology. PLoS One 11 (2): e0149794.
Fanelli, D. 2010. “Positive” results increase down the hierarchy of the sciences. PLoS One 5 (4): e10068.
Fraley, R.C., and S. Vazire. 2014. The N-pact factor: Evaluating the quality of empirical journals with respect to sample size and statistical power. PLoS One 9 (10): e109019.
García-Pérez, M.A. 2017. Thou shalt not bear false witness against null hypothesis significance testing. Educational and Psychological Measurement 77: 631–662.
Gelman, A. (2017a). When considering proposals for redefining or abandoning statistical significance, remember that their effects on science will only be indirect! http://andrewgelman.com/2017/10/03/one-discussion-redefining-abandoning-statistical-significance/.
Gelman, A. (2017b). Response to some comments on “abandon statistical significance.” http://andrewgelman.com/2017/10/02/response-comments-abandon-statistical-significance/.
Giner-Sorolla, R. (2018). Justify your alpha … for its audience. https://approachingblog.wordpress.com/2018/03/28/justify-your-alpha-to-an-audience/.
Greenland, S. 2010. Comment: The need for syncretism in applied statistics. Statistical Science 25 (2): 158–161.
Greenwald, A.G. 1976. An editorial. Journal of Personality and Social Psychology 33: 1–7.
Guilera, G., M. Barrios, and J. Gómez-Benito. 2013. Meta-analysis in psychology: A bibliometric study. Scientometrics 94 (3): 943–954.
Hamlin, K. (2017). Commentary by Kiley Hamlin. http://philosophyofbrains.com/2017/10/02/should-we-redefine-statistical-significance-a-brains-blog-roundtable.aspx.
Ioannidis, J.P.A. 2005. Why most published research findings are false. PLoS Medicine 2 (8): e124.
Ioannidis, J.P.A. 2016. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. The Milbank Quarterly 94 (3): 485–514.
Lakens, D., F.G. Adolfi, C.J. Albers, F. Anvari, M.A.J. Apps, S.E. Argamon, T. Baguley, R.B. Becker, S.D. Benning, D.E. Bradford, E.M. Buchanan, A.R. Caldwell, B. Van Calster, R. Carlsson, S.-C. Chen, B. Chung, L.J. Colling, G.S. Collins, Z. Crook, E.S. Cross, S. Daniels, H. Danielsson, L. DeBruine, D.J. Dunleavy, B.D. Earp, M.I. Feist, J.D. Ferrell, J.G. Field, N.W. Fox, A. Friesen, C. Gomes, M. Gonzalez-Marquez, J.A. Grange, A.P. Grieve, R. Guggenberger, J. Grist, A.-L. van Harmelen, F. Hasselman, K.D. Hochard, M.R. Hoffarth, N.P. Holmes, M. Ingre, P.M. Isager, H.K. Isotalus, C. Johansson, K. Juszczyk, D.A. Kenny, A.A. Khalil, B. Konat, J. Lao, E.G. Larsen, G.M.A. Lodder, J. Lukavský, C.R. Madan, D. Manheim, S.R. Martin, A.E. Martin, D.G. Mayo, R.J. McCarthy, K. McConway, C. McFarland, A.Q.X. Nio, G. Nilsonne, C.L. de Oliveira, J.-J.O. de Xivry, S. Parsons, G. Pfuhl, K.A. Quinn, J.J. Sakon, S.A. Saribay, I.K. Schneider, M. Selvaraju, Z. Sjoerds, S.G. Smith, T. Smits, J.R. Spies, V. Sreekumar, C.N. Steltenpohl, N. Stenhouse, W. Świątkowski, M.A. Vadillo, M.A.L.M. Van Assen, M.N. Williams, S.E. Williams, D.R. Williams, T. Yarkoni, I. Ziano, and R.A. Zwaan. 2018. Justify your alpha. Nature Human Behaviour 2 (3): 168–171.
Lemoine, N.P., A. Hoffman, A.J. Felton, L. Baur, F. Chaves, J. Gray, Q. Yu, and M.D. Smith. 2016. Underappreciated problems of low replication in ecological field studies. Ecology 97 (10): 2554–2561.
Lindley, D.V. 1957. A statistical paradox. Biometrika 44: 187–192.
Machery, E. 2014. Significance testing in neuroimagery. In New waves in the philosophy of mind, ed. J. Kallestrup and M. Sprevak, 262–277. Palgrave Macmillan.
Machery, E. (n.d.). What is a replication?
Malinsky, D. (2017). Significant moral hazard. https://sootyempiric.blogspot.com/2017/08/significant-moral-hazard.html.
Marsman, M., and E.J. Wagenmakers. 2017. Three insights from a Bayesian interpretation of the one-sided p-value. Educational and Psychological Measurement 77 (3): 529–539.
Mayo, D. (2017a). Commentary by Deborah Mayo. http://philosophyofbrains.com/2017/10/02/should-we-redefine-statistical-significance-a-brains-blog-roundtable.aspx.
Mayo, D. (2017b). Why significance testers should reject the argument to “redefine statistical significance”, even if they want to lower the p-value. https://errorstatistics.com/2017/12/17/why-significance-testers-should-reject-the-argument-to-redefine-statistical-significance-even-if-they-want-to-lower-the-p-value/.
McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2018). Abandon statistical significance. April 9, 2018.
Meehl, P.E. 1990. Why summaries of research on psychological theories are often uninterpretable. Psychological Reports 66: 195–244.
Morey, R.D. (2017). When the statistical tail wags the scientific dog. Should we ‘redefine’ statistical significance? https://medium.com/@richarddmorey/when-the-statistical-tail-wags-the-scientific-dog-d09a9f1a7c63.
Morey, R.D. (2018). Redefining statistical significance: The statistical arguments. https://medium.com/@richarddmorey/redefining-statistical-significance-the-statistical-arguments-ae9007bc1f91.
Oakes, L.M. 2017. Sample size, statistical power, and false conclusions in infant looking-time research. Infancy 22 (4): 436–469.
Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349, aac4716. https://doi.org/10.1126/science.aac4716.
Peters, G. J. (2017). Appropriate humility: Choosing sides in the alpha wars based on psychology rather than methodology and statistics. https://sciencer.eu/2017/08/appropriate-humility-choosing-sides-in-the-alpha-wars-based-on-psychology-rather-than-methodology-and-statistics/.
Schimmack, U. (2017). What would Cohen say? A comment on p < .005. https://replicationindex.wordpress.com/2017/08/02/what-would-cohen-say-a-comment-on-p-005/.
Schmalz, X. (2018). By how much would we need to increase our sample sizes to have adequate power with an alpha level of 0.005? http://xeniaschmalz.blogspot.ca/2018/02/by-how-much-would-we-need-to-increase.html?
Sedlmeier, P., and G. Gigerenzer. 1989. Do studies of statistical power have an effect on the power of studies? Psychological Bulletin 105: 309–316.
Simmons, J.P., L.D. Nelson, and U. Simonsohn. 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22 (11): 1359–1366.
Simonsohn, U., J.P. Simmons, and L.D. Nelson. 2015. Better P-curves: Making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a reply to Ulrich and Miller (2015). Journal of Experimental Psychology: General 144 (6): 1146–1152.
Trafimow, D. 2018. An a priori solution to the replication crisis. Philosophical Psychology 31: 1188–1214.
Trafimow, D., V. Amrhein, C.N. Areshenkoff, C. Barrera-Causil, E.J. Beh, Y. Bilgiç, R. Bono, M.T. Bradley, W.M. Briggs, H.A. Cepeda-Freyre, S.E. Chaigneau, D.R. Ciocca, J. Carlos Correa, D. Cousineau, M.R. de Boer, S.S. Dhar, I. Dolgov, J. Gómez-Benito, M. Grendar, J. Grice, M.E. Guerrero-Gimenez, A. Gutiérrez, T.B. Huedo-Medina, K. Jaffe, A. Janyan, A. Karimnezhad, F. Korner-Nievergelt, K. Kosugi, M. Lachmair, R. Ledesma, R. Limongi, M.T. Liuzza, R. Lombardo, M. Marks, G. Meinlschmidt, L. Nalborczyk, H.T. Nguyen, R. Ospina, J.D. Perezgonzalez, R. Pfister, J.J. Rahona, D.A. Rodríguez-Medina, X. Romão, S. Ruiz-Fernández, I. Suarez, M. Tegethoff, M. Tejo, R. van de Schoot, I. Vankov, S. Velasco-Forero, T. Wang, Y. Yamada, F.C. Zoppino, and F. Marmolejo-Ramos. 2018. Manipulating the alpha level cannot cure significance testing. Frontiers in Psychology 9, article 699. https://doi.org/10.3389/fpsyg.2018.00699.
Vankov, I., J. Bowers, and M.R. Munafò. 2014. On the persistence of low power in psychological science. The Quarterly Journal of Experimental Psychology 67 (5): 1037–1040.
Wegner, D.M. 1992. The premature demise of the solo experiment. Personality and Social Psychology Bulletin 18 (4): 504–508.
Zollman, K. (2017). Commentary by Kevin Zollman. http://philosophyofbrains.com/2017/10/02/should-we-redefine-statistical-significance-a-brains-blog-roundtable.aspx.
Acknowledgements
I owe the expression “alpha war” to Simine Vazire. Thanks to John Doris, Felipe Romero, and two reviewers for very helpful feedback.
Cite this article
Machery, E. The Alpha War. Rev.Phil.Psych. 12, 75–99 (2021). https://doi.org/10.1007/s13164-019-00440-1