The Unreasonable Ineffectiveness of Fisherian “Tests” in Biology, and Especially in Medicine

McCloskey, Deirdre N.; Ziliak, Stephen T.

doi:10.1162/biot.2009.4.1.44

Published: 14 April 2015

Biological Theory, Volume 4, pages 44–53 (2009)

Abstract

Biometrics has done damage with levels of R or p or Student’s t. The damage widened with Ronald A. Fisher’s victory in the 1920s and 1930s in devising mechanical methods of “testing,” against methods of common sense and scientific impact, “oomph.” The scale along which one would measure oomph is particularly clear in the biomedical sciences: life or death. Cardiovascular epidemiology, to take one example, combines with gusto the “fallacy of the transposed conditional” and what we call the “sizeless stare” of statistical significance. Some medical editors have battled against the 5% philosophy, as did, for example, Kenneth Rothman, the founder of Epidemiology. And decades ago a sensible few in education, ecology, and sociology initiated a “significance test controversy.” But grantors, journal referees, and tenure committees in the statistical sciences had faith that probability spaces can substitute for scientific judgment. A finding of p < .05 is deemed to be “better” for variable X than p < .11 for variable Y. It is not. It depends on the oomph of X and Y: the effect size, judged in the light of how much it matters for scientific or clinical purposes. In 1995 a Cancer Trialists’ Collaborative Group, for example, came to a rare consensus on effect size: 10 different studies had agreed that a certain drug for treating prostate cancer can increase patient survival by 12%. An 11th study, published in the New England Journal of Medicine in 1998, dismissed the drug. The dismissal was based on a t-test, not on what William Gosset (the “Student” of Student’s t) had called, against Ronald A. Fisher’s machinery, “real” error.
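
The contrast between p < .05 for X and p < .11 for Y can be made concrete with a small sketch. The numbers below are hypothetical, chosen only for illustration and not taken from the article: X is a tiny effect measured precisely, Y a large effect measured noisily. The p-values use a normal approximation to the test statistic, which is an assumption of this sketch rather than the authors' procedure.

```python
import math


def two_sided_p(estimate, std_error):
    """Two-sided p-value for H0: effect = 0, under a normal approximation."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical estimates (illustrative only, not data from the article).
# "estimate" is an assumed gain in survival, in percentage points.
effects = {
    "X": {"estimate": 0.5, "std_error": 0.25},  # tiny effect, precisely measured
    "Y": {"estimate": 12.0, "std_error": 7.5},  # large effect, noisily measured
}

results = {name: two_sided_p(**v) for name, v in effects.items()}

for name, v in effects.items():
    print(f"{name}: effect = {v['estimate']:5.1f} points, p = {results[name]:.3f}")
```

Under these assumed numbers X clears the 5% bar and Y does not, yet the estimated effect of Y is 24 times that of X. Ranking the two variables by p alone would rank them by precision of measurement, not by how much they matter.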

References

Altman DG (1991) Statistics in medical journals: Developments in the 1980s. Statistics in Medicine 10: 1897–1913.
American Psychological Association (APA) (1952 to 2001 [revisions]) Publication Manual of the American Psychological Association. Washington, DC: APA.
Berger JO (2003) Could Fisher, Jeffreys, and Neyman have agreed on testing? Statistical Science 18: 1–32.
Cohen J (1994) The earth is round (p < 0.05). American Psychologist 49: 997–1003.
David FN, ed (1966) Research Papers in Statistics: Festschrift for J. Neyman. London: Wiley.
Eisenberger MA, Blumenstein BA, Crawford ED, Miller G, McLeod DG, Loehrer PJ, Wilding G, Sears K, Culkin DJ, Thompson IM, Bueschen AJ, Lowe BA (1998) Bilateral orchiectomy with or without flutamide for metastatic prostate cancer. New England Journal of Medicine 339: 1036–1042.
Fidler F (2002) The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement 62: 749–770.
Fidler F, Thomason N, Cumming G, Finch S, Leeman J (2004) Editors can lead researchers to confidence intervals but they can’t make them think: Statistical reform lessons from medicine. Psychological Science 15: 119–126.
Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A 222: 309–368.
Fisher RA (1926) Bayes’ Theorem. Eugenics Review 18: 32–33.
Fisher RA ([1956] 1959) Statistical Methods and Scientific Inference, 2nd ed. New York: Hafner.
Fleiss JL (1986) Significance tests do have a role in epidemiological research: Reaction to AA Walker. American Journal of Public Health 76: 559–600.
Freiman JA, Chalmers T, Smith H, Kuebler RR (1978) The importance of beta, the type II error and sample design in the design and interpretation of the randomized control trial: Survey of 71 negative trials. New England Journal of Medicine 299: 690–694.
Goodman S (1999a) Toward evidence-based medical statistics. 1: The p-value fallacy. Annals of Internal Medicine 130: 995–1004.
Hoover K, Siegler M (2008) Sound and fury: McCloskey and significance testing in economics. Journal of Economic Methodology 15: 1–37.
International Committee of Medical Journal Editors (ICMJE) (1988) Uniform requirements for … statisticians and biomedical journal editors. Statistics in Medicine 7: 1003–1011.
Jeffreys H (1963) Review of L. J. Savage, et al., The Foundations of Statistical Inference (Methuen, London and Wiley, New York, 1962). Technometrics 5: 407–410.
Klein H, Elifson KW, Sterk CE (2003) Perceived temptation to use drugs and actual drug use among women. Journal of Drug Issues 33: 161–192.
Lang JM, Rothman KJ, Cann CI (1998) That confounded p-value. Epidemiology 9: 7–8.
Pearson ES (1990) [posthumously published by Plackett RL, Barnard GA, eds] ‘Student’: A Statistical Biography of William Sealy Gosset. Oxford: Clarendon Press.
Rennie D (1978) Vive la Difference (p < 0.05). New England Journal of Medicine 299: 828–829.
Rossi J (1990) Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology 58: 646–656.
Rothman KJ (1978) A show of confidence. New England Journal of Medicine 299: 1362–1363.
Rothman KJ (1986) Modern Epidemiology. New York: Little, Brown.
Rothman KJ (1990) Writing for epidemiology. Epidemiology 9: 333–337.
Rothman KJ, Johnson ES, Sugano DS (1999) Is flutamide effective in patients with bilateral orchiectomy? Lancet 353: 1184.
Savitz DA, Tolo K, Poole C (1994) Statistical significance testing in the American Journal of Epidemiology, 1970–1990. American Journal of Epidemiology 139: 1047–1052.
Shryock RH (1961) The history of quantification in medical science. Isis 52: 215–237.
Sterne JAC, Davey Smith G (2001) Sifting the evidence—What’s wrong with significance tests? British Medical Journal 322: 226–231.
Zabell S (1989) R. A. Fisher on the history of inverse probability. Statistical Science 4: 247–263.
Zellner A (1984) Basic Issues in Econometrics. Chicago: University of Chicago Press.
Ziliak ST, Hannon J (2006) Public assistance: Colonial times to the 1920s. In Historical Statistics of the United States (Carter SB, Gartner SS, Haines MR, Olmstead AL, Sutch R, Wright G, eds). New York: Cambridge University Press.
Ziliak ST, McCloskey DN (2008) The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. Ann Arbor, MI: University of Michigan Press.

Author information

Authors and Affiliations

University of Illinois at Chicago, Chicago, IL, USA
Deirdre N. McCloskey
Roosevelt University, Chicago, IL, USA
Stephen T. Ziliak

Corresponding author

Correspondence to Deirdre N. McCloskey.

About this article

Cite this article

McCloskey, D.N., Ziliak, S.T. The Unreasonable Ineffectiveness of Fisherian “Tests” in Biology, and Especially in Medicine. Biol Theory 4, 44–53 (2009). https://doi.org/10.1162/biot.2009.4.1.44

Received: 02 October 2008
Accepted: 06 September 2009
Published: 14 April 2015
Issue Date: March 2009
DOI: https://doi.org/10.1162/biot.2009.4.1.44
