The American Statistician 73 (Sup(1)):168-185 (2019)
DOI: 10.1080/00031305.2018.1564697
Abstract |
When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a firestorm of discussion, both supporting the decision and defending the utility of NHST in scientific research. At the heart of NHST is the p-value, which is the probability of obtaining an effect equal to or more extreme than the one observed in the sample data, given the null hypothesis and other model assumptions. Although this is conceptually different from the probability of the null hypothesis being true, given the sample, p-values can nonetheless provide evidential information toward making an inference about a parameter. Applying a 10,000-case simulation described in this article, the authors found that p-values’ inferential signals to either reject or not reject a null hypothesis about the mean (α = 0.05) were consistent with the parameter’s true location in the sampled-from population for almost 70% of the cases. Success increases if a hybrid decision criterion, minimum effect size plus p-value (MESP), is used. Under MESP, rejecting the null also requires the difference between the observed statistic and the exact null to be meaningfully large, or practically significant, in the researcher’s judgment and experience. The simulation compares the performance of several methods, from p-value-based and/or effect-size-based to confidence-interval-based, under various conditions of true location of the mean, test power, and comparative sizes of the meaningful distance and population variability. For any inference procedure that outputs a binary indicator, such as flagging whether a p-value is significant, the output of a single experiment is not sufficient evidence for a definitive conclusion. Yet if a tool like MESP generates a relatively reliable signal and is used knowledgeably as part of a research process, it can provide useful information.
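As a rough illustration of the contrast the abstract draws, the following Python sketch compares a plain p-value rule with a MESP-style rule that additionally requires the observed difference to reach a minimum meaningful size. It is not the authors' simulation: the two-sided z-test with known `sigma`, the uniform draw of the true mean, the sample size, the definition of a "correct" signal, and every function and parameter name here are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, assuming a known population sigma."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def p_value_decision(sample, mu0, sigma, alpha=0.05):
    """Reject H0 whenever the p-value is at or below alpha."""
    return z_test_p_value(sample, mu0, sigma) <= alpha


def mesp_decision(sample, mu0, sigma, min_effect, alpha=0.05):
    """MESP-style rule: reject only if the result is statistically significant
    AND the observed difference from the null is at least min_effect."""
    significant = z_test_p_value(sample, mu0, sigma) <= alpha
    large_enough = abs(fmean(sample) - mu0) >= min_effect
    return significant and large_enough


def simulate(n_cases=10_000, n=30, mu0=100.0, sigma=10.0, min_effect=5.0, seed=0):
    """Share of cases in which each rule's reject / do-not-reject signal agrees
    with the (assumed) truth: the true mean lies at least min_effect from mu0."""
    rng = random.Random(seed)
    hits = {"p_value": 0, "mesp": 0}
    for _ in range(n_cases):
        # Assumed design: true mean drawn uniformly around the null value.
        true_mu = rng.uniform(mu0 - 2 * min_effect, mu0 + 2 * min_effect)
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        truly_different = abs(true_mu - mu0) >= min_effect
        hits["p_value"] += p_value_decision(sample, mu0, sigma) == truly_different
        hits["mesp"] += mesp_decision(sample, mu0, sigma, min_effect) == truly_different
    return {name: count / n_cases for name, count in hits.items()}


if __name__ == "__main__":
    print(simulate())
```

By construction, the MESP-style rule can only reject in cases where the p-value rule also rejects, so any gain in agreement comes from suppressing rejections whose observed effect is too small to matter. The agreement rates printed by `simulate` depend entirely on the assumed design and should not be read as reproducing the article's figures.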
Keywords | Minimum effect size plus p-value criterion; statistical evidence; meaningful effect size; true power; true Type I error rate |
References found in this work
Manipulating the Alpha Level Cannot Cure Significance Testing. David Trafimow, Valentin Amrhein, Corson N. Areshenkoff, Carlos J. Barrera-Causil, Eric J. Beh, Yusuf K. Bilgiç, Roser Bono, Michael T. Bradley, William M. Briggs, Héctor A. Cepeda-Freyre, Sergio E. Chaigneau, Daniel R. Ciocca, Juan C. Correa, Denis Cousineau, Michiel R. de Boer, Subhra S. Dhar, Igor Dolgov, Juana Gómez-Benito, Marian Grendar, James W. Grice, Martin E. Guerrero-Gimenez, Andrés Gutiérrez, Tania B. Huedo-Medina, Klaus Jaffe, Armina Janyan, Ali Karimnezhad, Fränzi Korner-Nievergelt, Koji Kosugi, Martin Lachmair, Rubén D. Ledesma, Roberto Limongi, Marco T. Liuzza, Rosaria Lombardo, Michael J. Marks, Gunther Meinlschmidt, Ladislas Nalborczyk, Hung T. Nguyen, Raydonal Ospina, Jose D. Perezgonzalez, Roland Pfister, Juan J. Rahona, David A. Rodríguez-Medina, Xavier Romão, Susana Ruiz-Fernández, Isabel Suarez, Marion Tegethoff, Mauricio Tejo, Rens van de Schoot, Ivan I. Vankov, Santiago Velasco-Forero, Tonghui Wang, Yuki Yamada, Felipe C. M. Zoppino & Fernando Marmolejo-Ramos - 2018 - Frontiers in Psychology 9.
Similar books and articles
Précis of Statistical Significance: Rationale, Validity, and Utility. Siu L. Chow - 1998 - Behavioral and Brain Sciences 21 (2):169-194.
The Undetectable Difference: An Experimental Look at the ‘Problem’ of P-Values. William M. Goodman - 2010 - Statistical Literacy Website/Papers: Www.Statlit.Org/Pdf/2010GoodmanASA.Pdf.
The Critics Rebutted: A Pyrrhic Victory. Stephan Lewandowsky & Murray Maybery - 1998 - Behavioral and Brain Sciences 21 (2):210-211.
When the Coefficient Hits the Clinic: Effect Size and the Size of the Effect. Brendan Maher - 1998 - Behavioral and Brain Sciences 21 (2):211-211.
The Null-Hypothesis Significance-Test Procedure is Still Warranted. Siu L. Chow - 1998 - Behavioral and Brain Sciences 21 (2):228-235.
Revisiting the Effect of Population Size on Cumulative Cultural Evolution. Ryan Baldini - 2015 - Journal of Cognition and Culture 15 (3-4):320-336.
An a Priori Solution to the Replication Crisis. David Trafimow - 2018 - Philosophical Psychology 31 (8):1188-1214.
Testing a Precise Null Hypothesis: The Case of Lindley’s Paradox. Jan Sprenger - 2013 - Philosophy of Science 80 (5):733-744.
A Closer Look at the Size of the Gaze-Liking Effect: A Preregistered Replication. Jason Tipples & Anna Pecchinenda - 2019 - Cognition and Emotion 33 (3):623-629.
Meta-Analysis, Power Analysis, and the Null-Hypothesis Significance-Test Procedure. Joseph S. Rossi - 1998 - Behavioral and Brain Sciences 21 (2):216-217.
Chow's Defense of Null-Hypothesis Testing: Too Traditional? Robert W. Frick - 1998 - Behavioral and Brain Sciences 21 (2):199-199.
Effect of Size on Visual Slant. Robert B. Freeman Jr - 1966 - Journal of Experimental Psychology 71 (1):96.
The Effect of Fertility Limitation on Intergenerational Social Mobility: The Quality–Quantity Trade-Off During the Demographic Transition. Jan van Bavel - 2006 - Journal of Biosocial Science 38 (4):553-569.
Synthetic Control Method: Inference, Sensitivity Analysis and Confidence Sets. Sergio Firpo & Vitor Possebom - 2018 - Journal of Causal Inference 6 (2).
Analytics
Added to PP index
2020-03-11
Total views
152 ( #75,933 of 2,499,389 )
Recent downloads (6 months)
23 ( #37,647 of 2,499,389 )