The Unreasonable Ineffectiveness of Fisherian “Tests” in Biology, and Especially in Medicine

Biological Theory

Abstract

Biometrics has done damage with levels of R or p or Student’s t. The damage widened with Ronald A. Fisher’s victory in the 1920s and 1930s in devising mechanical methods of “testing,” against methods of common sense and scientific impact, “oomph.” The scale along which one would measure oomph is particularly clear in the biomedical sciences: life or death. Cardiovascular epidemiology, to take one example, combines with gusto the “fallacy of the transposed conditional” and what we call the “sizeless stare” of statistical significance. Some medical editors have battled against the 5% philosophy, as did, for example, Kenneth Rothman, the founder of Epidemiology. And decades ago a sensible few in education, ecology, and sociology initiated a “significance test controversy.” But grantors, journal referees, and tenure committees in the statistical sciences had faith that probability spaces can substitute for scientific judgment. A finding of p < .05 is deemed to be “better” for variable X than p < .11 for variable Y. It is not. It depends on the oomph of X and Y: the effect size, with size judged in the light of how much it matters for scientific or clinical purposes. In 1995 a Cancer Trialists’ Collaborative Group, for example, came to a rare consensus on effect size: 10 different studies had agreed that a certain drug for treating prostate cancer can increase patient survival by 12%. An 11th study published in the New England Journal of Medicine in 1998 dismissed the drug. The dismissal was based on a t-test, not on what William Gosset (the “Student” of Student’s t) had called, against Ronald A. Fisher’s machinery, “real” error.
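
A small numerical sketch may make the X-versus-Y comparison concrete. Every figure in it is hypothetical, chosen only so that X clears p < .05 while Y reaches only p < .11. The `Estimate` class and the normal approximation used for the p-value are assumptions of this illustration, not the authors' method or data.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    name: str
    effect: float     # hypothetical estimated effect, in percentage points of survival
    std_error: float  # hypothetical standard error of that estimate

    def p_value(self) -> float:
        """Two-sided p-value under a normal approximation (an assumption of this sketch)."""
        z = abs(self.effect) / self.std_error
        return math.erfc(z / math.sqrt(2))


# Illustrative numbers only; not taken from the article or from any trial.
x = Estimate("X", effect=0.2, std_error=0.1)    # tiny effect, precisely estimated
y = Estimate("Y", effect=12.0, std_error=7.5)   # large effect, noisily estimated

# Rank the same two variables by "significance" and by size of effect.
by_p_value = min([x, y], key=lambda e: e.p_value())
by_oomph = max([x, y], key=lambda e: abs(e.effect))

for e in (x, y):
    print(f"{e.name}: effect = {e.effect:5.1f} points, p = {e.p_value():.3f}")
print("Smallest p-value:", by_p_value.name)
print("Largest effect:  ", by_oomph.name)
```

Ranking by p-value selects X; ranking by effect size selects Y. Whether a noisily estimated effect of that size matters more than a precisely estimated trivial one is a scientific or clinical judgment that the p-value alone cannot supply.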

References

  • Altman DG (1991) Statistics in medical journals: Developments in the 1980s. Statistics in Medicine 10: 1897–1913.

  • American Psychological Association (APA) (1952–2001 [revisions]) Publication Manual of the American Psychological Association. Washington, DC: APA.

  • Berger JO (2003) Could Fisher, Jeffreys, and Neyman have agreed on testing? Statistical Science 18: 1–32.

  • Cohen J (1994) The earth is round (p < 0.05). American Psychologist 49: 997–1003.

  • David FN, ed (1966) Research Papers in Statistics: Festschrift for J. Neyman. London: Wiley.

  • Eisenberger MA, Blumenstein BA, Crawford ED, Miller G, McLeod DG, Loehrer PJ, Wilding G, Sears K, Culkin DJ, Thompson IM, Bueschen AJ, Lowe BA (1998) Bilateral orchiectomy with or without flutamide for metastatic prostate cancer. New England Journal of Medicine 339: 1036–1042.

  • Fidler F (2002) The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement 62: 749–770.

  • Fidler F, Thomason N, Cumming G, Finch S, Leeman J (2004) Editors can lead researchers to confidence intervals but they can’t make them think: Statistical reform lessons from medicine. Psychological Science 15: 119–126.

  • Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A 222: 309–368.

  • Fisher RA (1926) Bayes’ Theorem. Eugenics Review 18: 32–33.

  • Fisher RA ([1956] 1959) Statistical Methods and Scientific Inference, 2nd ed. New York: Hafner.

  • Fleiss JL (1986) Significance tests do have a role in epidemiological research: Reaction to AA Walker. American Journal of Public Health 76: 559–560.

  • Freiman JA, Chalmers T, Smith H, Kuebler RR (1978) The importance of beta, the type II error and sample design in the design and interpretation of the randomized control trial: Survey of 71 negative trials. New England Journal of Medicine 299: 690–694.

  • Goodman S (1999a) Toward evidence-based medical statistics. 1: The p-value fallacy. Annals of Internal Medicine 130: 995–1004.

  • Hoover K, Siegler M (2008) Sound and fury: McCloskey and significance testing in economics. Journal of Economic Methodology 15: 1–37.

  • International Committee of Medical Journal Editors (ICMJE) (1988) Uniform requirements for … statisticians and biomedical journal editors. Statistics in Medicine 7: 1003–1011.

  • Jeffreys H (1963) Review of L. J. Savage, et al., The Foundations of Statistical Inference (Methuen, London and Wiley, New York, 1962). Technometrics 5: 407–410.

  • Klein H, Elifson KW, Sterk CE (2003) Perceived temptation to use drugs and actual drug use among women. Journal of Drug Issues 33: 161–192.

  • Lang JM, Rothman KJ, Cann CI (1998) That confounded p-value. Epidemiology 9: 7–8.

  • Pearson ES (1990) [posthumously published by Plackett RL, Barnard GA, eds] ‘Student’: A Statistical Biography of William Sealy Gosset. Oxford: Clarendon Press.

  • Rennie D (1978) Vive la Difference (p < 0.05). New England Journal of Medicine 299: 828–829.

  • Rossi J (1990) Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology 58: 646–656.

  • Rothman KJ (1978) A show of confidence. New England Journal of Medicine 299: 1362–1363.

  • Rothman KJ (1986) Modern Epidemiology. New York: Little, Brown.

  • Rothman KJ (1990) Writing for epidemiology. Epidemiology 9: 333–337.

  • Rothman KJ, Johnson ES, Sugano DS (1999) Is flutamide effective in patients with bilateral orchiectomy? Lancet 353: 1184.

  • Savitz DA, Tolo K, Poole C (1994) Statistical significance testing in the American Journal of Epidemiology, 1970–1990. American Journal of Epidemiology 139: 1047–1052.

  • Shryock RH (1961) The history of quantification in medical science. Isis 52: 215–237.

  • Sterne JAC, Davey Smith G (2001) Sifting the evidence—What’s wrong with significance tests? British Medical Journal 322: 226–231.

  • Zabell S (1989) R. A. Fisher on the history of inverse probability. Statistical Science 4: 247–263.

  • Zellner A (1984) Basic Issues in Econometrics. Chicago: University of Chicago Press.

  • Ziliak ST, Hannon J (2006) Public assistance: Colonial times to the 1920s. In Historical Statistics of the United States. (Carter SB, Gartner SS, Haines MR, Olmstead AL, Sutch R, Wright G, eds). New York: Cambridge University Press.

  • Ziliak ST, McCloskey DN (2008) The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. Ann Arbor, MI: University of Michigan Press.

Author information

Corresponding author

Correspondence to Deirdre N. McCloskey.

About this article

Cite this article

McCloskey, D.N., Ziliak, S.T. The Unreasonable Ineffectiveness of Fisherian “Tests” in Biology, and Especially in Medicine. Biol Theory 4, 44–53 (2009). https://doi.org/10.1162/biot.2009.4.1.44

  • DOI: https://doi.org/10.1162/biot.2009.4.1.44
