British Journal for the Philosophy of Science 57 (2):323-357 (2006)
Despite the widespread use of key concepts of the Neyman–Pearson (N–P) statistical paradigm (type I and II errors, significance levels, power, confidence levels), they have been the subject of philosophical controversy and debate for over 60 years. Both current and long-standing problems of N–P tests stem from unclarity and confusion, even among N–P adherents, as to how a test's (pre-data) error probabilities are to be used for (post-data) inductive inference as opposed to inductive behavior. We argue that the relevance of error probabilities is to ensure that only statistical hypotheses that have passed severe or probative tests are inferred from the data. The severity criterion supplies a meta-statistical principle for evaluating proposed statistical inferences, avoiding classic fallacies from tests that are overly sensitive, as well as those not sensitive enough to particular errors and discrepancies.

1 Introduction and overview
  1.1 Behavioristic and inferential rationales for Neyman–Pearson (N–P) tests
  1.2 Severity rationale: induction as severe testing
  1.3 Severity as a meta-statistical concept: three required restrictions on the N–P paradigm
2 Error statistical tests from the severity perspective
  2.1 N–P test T(α): type I, II error probabilities and power
  2.2 Specifying test T(α) using p-values
3 Neyman's post-data use of power
  3.1 Neyman: does failure to reject H warrant confirming H?
4 Severe testing as a basic concept for an adequate post-data inference
  4.1 The severity interpretation of acceptance (SIA) for test T(α)
  4.2 The fallacy of acceptance (i.e., an insignificant difference): Ms Rosy
  4.3 Severity and power
5 Fallacy of rejection: statistical vs. substantive significance
  5.1 Taking a rejection of H0 as evidence for a substantive claim or theory
  5.2 A statistically significant difference from H0 may fail to indicate a substantively important magnitude
  5.3 Principle for the severity interpretation of a rejection (SIR)
  5.4 Comparing significant results with different sample sizes in T(α): large n problem
  5.5 General testing rules for T(α), using the severe testing concept
6 The severe testing concept and confidence intervals
  6.1 Dualities between one and two-sided intervals and tests
  6.2 Avoiding shortcomings of confidence intervals
7 Beyond the N–P paradigm: pure significance, and misspecification tests
8 Concluding comments: have we shown severity to be a basic concept in a N–P philosophy of induction?
Citations of this work
Roger Stanev (2015). Early Stopping of RCTs: Two Potential Issues for Error Statistics. Synthese 192 (4):1089-1116.
Robert D. Cousins (forthcoming). The Jeffreys–Lindley Paradox and Discovery Criteria in High Energy Physics. Synthese:1-38.
Kent Staley & Aaron Cobb (2011). Internalist and Externalist Aspects of Justification in Scientific Inquiry. Synthese 182 (3):475-492.
R. P. Farrell & C. A. Hooker (2009). Error, Error-Statistics and Self-Directed Anticipative Learning. Foundations of Science 14 (4):249-271.
D. Mayo (2014). Some Surprising Facts About Surprising Facts. Studies in History and Philosophy of Science Part A 45 (1):79-86.
Similar books and articles
Johannes Lenhard (2006). Models and Statistical Inference: The Controversy Between Fisher and Neyman–Pearson. British Journal for the Philosophy of Science 57 (1):69-91.
Deborah G. Mayo (1981). In Defense of the Neyman-Pearson Theory of Confidence Intervals. Philosophy of Science 48 (2):269-280.
Peter Godfrey-Smith (1994). Of Nulls and Norms. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1994:280-290.
Stephen Spielman (1974). The Logic of Tests of Significance. Philosophy of Science 41 (3):211-226.
G. William Moore, Grover M. Hutchins & Robert E. Miller (1986). A New Paradigm for Hypothesis Testing in Medicine, with Examination of the Neyman Pearson Condition. Theoretical Medicine and Bioethics 7 (3).
Max Albert (1992). Die Falsifikation Statistischer Hypothesen. Journal for General Philosophy of Science / Zeitschrift für Allgemeine Wissenschaftstheorie 23 (1):1-32.
Deborah G. Mayo (1985). Behavioristic, Evidentialist, and Learning Models of Statistical Testing. Philosophy of Science 52 (4):493-516.
Deborah G. Mayo (2008). How to Discount Double-Counting When It Counts: Some Clarifications. British Journal for the Philosophy of Science 59 (4):857-879.
Andrés Rivadulla (1991). Mathematical Statistics and Metastatistical Analysis. Erkenntnis 34 (2):211-236.
Deborah G. Mayo (1991). Novel Evidence and Severe Tests. Philosophy of Science 58 (4):523-552.
Added to index: 2009-01-28