The American Statistician 73 (sup1): 168-185 (2019)

William M. Goodman
University of Ontario Institute of Technology
DOI: 10.1080/00031305.2018.1564697

When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a firestorm of discussion, both supporting the decision and defending the utility of NHST in scientific research. At the heart of NHST is the p-value: the probability of obtaining an effect equal to or more extreme than the one observed in the sample data, given the null hypothesis and other model assumptions. Although this is conceptually different from the probability that the null hypothesis is true given the sample, p-values can nonetheless provide evidential information toward making an inference about a parameter. Applying a 10,000-case simulation described in this article, the author found that p-values' inferential signals to reject or not reject a null hypothesis about the mean (α = 0.05) were consistent with the parameter's true location in the sampled-from population for almost 70% of cases. The success rate increases if a hybrid decision criterion, minimum effect size plus p-value (MESP), is used: under MESP, rejecting the null also requires that the difference of the observed statistic from the exact null be meaningfully large, or practically significant, in the researcher's judgment and experience. The simulation compares the performance of several methods, from p-value- and/or effect-size-based to confidence-interval-based, under varying conditions of true location of the mean, test power, and comparative sizes of the meaningful distance and population variability. For any inference procedure that outputs a binary indicator, such as flagging whether a p-value is significant, the output of a single experiment is not sufficient evidence for a definitive conclusion. Yet if a tool like MESP generates a relatively reliable signal and is used knowledgeably as part of a research process, it can provide useful information.
Keywords: Minimum effect size plus p-value criterion; statistical evidence; meaningful effect size; true power; true Type I error rate
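The hybrid MESP criterion from the abstract can be sketched in a few lines: reject the null only when the test is statistically significant AND the observed effect meets a researcher-chosen minimum meaningful size. This is a minimal illustration, not the article's simulation code; the function name, the normal-approximation p-value, and the example data are assumptions for demonstration.

```python
import math
import statistics

def mesp_decision(sample, mu0, min_effect, alpha=0.05):
    """Sketch of the MESP rule for a one-sample test of the mean.

    Reject H0 (mu = mu0) only if BOTH hold:
      1. the two-sided p-value is below alpha, and
      2. the observed effect |x-bar - mu0| is at least min_effect,
         the researcher's minimum meaningful (practically
         significant) difference.
    Returns (reject_H0, p_value).
    """
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    # Test statistic for the mean; p-value via the normal
    # approximation (adequate for illustration at moderate n).
    t = (xbar - mu0) / (s / math.sqrt(n))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    significant = p < alpha        # plain NHST signal
    meaningful = abs(xbar - mu0) >= min_effect  # effect-size signal
    return significant and meaningful, p
```

The second condition is what distinguishes MESP from plain NHST: a tiny but precisely estimated effect can yield p < α, yet MESP still declines to reject because the observed difference is not practically significant.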



Similar books and articles

Précis of Statistical Significance: Rationale, Validity, and Utility. Siu L. Chow - 1998 - Behavioral and Brain Sciences 21 (2): 169-194.
The Undetectable Difference: An Experimental Look at the 'Problem' of P-Values. William M. Goodman - 2010 - Statistical Literacy Website/Papers: www.statlit.org/pdf/2010GoodmanASA.pdf.
The Critics Rebutted: A Pyrrhic Victory. Stephan Lewandowsky & Murray Maybery - 1998 - Behavioral and Brain Sciences 21 (2): 210-211.
The Null-Hypothesis Significance-Test Procedure is Still Warranted. Siu L. Chow - 1998 - Behavioral and Brain Sciences 21 (2): 228-235.
Revisiting the Effect of Population Size on Cumulative Cultural Evolution. Ryan Baldini - 2015 - Journal of Cognition and Culture 15 (3-4): 320-336.
An a Priori Solution to the Replication Crisis. David Trafimow - 2018 - Philosophical Psychology 31 (8): 1188-1214.
Chow's Defense of Null-Hypothesis Testing: Too Traditional? Robert W. Frick - 1998 - Behavioral and Brain Sciences 21 (2): 199.
Effect of Size on Visual Slant. Robert B. Freeman Jr - 1966 - Journal of Experimental Psychology 71 (1): 96.

