Counterexamples to a likelihood theory of evidence

Forster, Malcolm R.

doi:10.1007/s11023-006-9038-y

Original Paper
Published: 27 October 2006

Minds and Machines
Volume 16, pages 319–338 (2006)

Abstract

The likelihood theory of evidence (LTE) says, roughly, that all the information relevant to the bearing of data on hypotheses (or models) is contained in the likelihoods. There exist counterexamples in which one can tell which of two hypotheses is true from the full data, but not from the likelihoods alone. These examples suggest that some forms of scientific reasoning, such as the consilience of inductions (Whewell, 1858), cannot be represented within Bayesian and Likelihoodist philosophies of science.

Notes

Terminology varies. In the computer science literature especially, a simple hypothesis is called a model and what I am calling a model is referred to as a model class.
A peculiar thing about the quote from Barnard (above) is that he refers to the likelihood of a simple hypothesis as a probability function. It is not a function except in the very trivial sense of mapping a single hypothesis to a single number.
Akaike (1973), Sakamoto, Ishiguro, and Kitagawa (1986), Forster and Sober (1994) and Burnham and Anderson (2002).
In contrast, the Law of Likelihood (LL) is very specific about how likelihoods are used in the comparison of simple hypotheses. Forster and Sober (2004) argue that AIC is a counterexample to LL. Unfortunately, Forster and Sober (2004) mistakenly describe LL as the likelihood principle, which was pointed out by Boik (2004) in the same volume. For the record, Forster and Sober (2004) did not intend to say anything about the likelihood principle—the present paper is the first publication in which I have discussed LP.
See Forster (2000) for a description of the best known model selection criteria, and for an argument that the Akaike framework is the conceptually clearest framework for understanding the problem of model selection because it clearly distinguishes criteria from goals.
The term ‘predictive accuracy’ was coined by Forster and Sober (1994), where it is given a precise definition in terms of SOS and likelihood fit functions.
I owe this suggestion to Jason Grossman.
The problem is the same one discussed in Forster (1988b).
While the refutation is not refutation in the strict logical sense, the number of data points in the example can be increased without limit, so it becomes arbitrarily close to that ideal.
Fitelson (1999) shows that choice of the difference measure does matter in some applications. But that issue does not arise here.
Causal modeling of this kind has received a great deal of attention in recent years. See Pearl (2000) for a comprehensive survey of recent results, as well as Woodward (2003) for an introduction that is more accessible to philosophers.
The word ‘constraint’ is borrowed from Sneed (1971), who introduced it as a way of constraining submodels. Although the sense of ‘model’ assumed here is different from Sneed’s, the idea is the same.
Myrvold and Harper (2002) criticize the Akaike criterion of model selection (Forster & Sober, 1994) because it underrates the importance of the agreement of independent measurements in Newton’s argument for universal gravitation (see Harper, 2002 for an intriguing discussion of Newton’s argument). While this paper supports their conclusion, it does so in a more precise and general way. The important advance in this paper is (1) to point out that the limitation applies to all model selection criteria based on the Likelihood Principle and (2) to pinpoint exactly where the limitation lies. Nor is it my conclusion that statistics does not have the resources to address the problem.
Wasserman (2000) provides a nice survey.
Hooker (1987) and Norton (1993, 2000) discuss relevant issues and examples; in fact, there is a wealth of good literature in the philosophy and history of science that deserves serious attention from outsiders.

References

Aitkin, M. (1991). Posterior Bayes factors. Journal of the Royal Statistical Society B, 53, 111–142.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov, & F. Csaki (Eds.), 2nd International symposium on information theory (pp. 267–281). Budapest: Akademiai Kiado.
Barnard, G. A. (1947). Review of Wald’s ‘Sequential analysis’. Journal of the American Statistical Association, 42, 658–669.
Berger, J. O. (1985). Statistical decision theory and Bayesian analysis (2nd ed.). New York: Springer-Verlag.
Berger, J. O., & Wolpert, R. L. (1988). The likelihood principle (2nd ed.). Hayward, California: Institute of Mathematical Statistics.
Birnbaum, A. (1962). On the foundations of statistical inference (with discussion). Journal of the American Statistical Association, 57, 269–326.
Boik, R. J. (2004). Commentary. In M. Taper, & S. Lele (Eds.), The nature of scientific evidence (pp. 167–180). Chicago and London: University of Chicago Press.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multi-model inference. New York: Springer Verlag.
Earman, J. (1978). Fairy tales vs. an ongoing story: Ramsey’s neglected argument for scientific realism. Philosophical Studies, 33, 195–202.
Edwards, A. W. F. (1987). Likelihood (Expanded edition). Baltimore and London: The Johns Hopkins University Press.
Fitelson, B. (1999). The plurality of Bayesian measures of confirmation and the problem of measure sensitivity. Philosophy of Science, 66, S362–S378.
Forster, M. R. (1984). Probabilistic causality and the foundations of modern science. Ph.D. Thesis, University of Western Ontario.
Forster, M. R. (1986). Unification and scientific realism revisited. In A. Fine, & P. Machamer (Eds.), PSA 1986 (Vol. 1, pp. 394–405). E. Lansing, Michigan: Philosophy of Science Association.
Forster, M. R. (1988a). Unification, explanation, and the composition of causes in Newtonian mechanics. Studies in History and Philosophy of Science, 19, 55–101.
Forster, M. R. (1988b). Sober’s principle of common cause and the problem of incomplete hypotheses. Philosophy of Science, 55, 538–559.
Forster, M. R. (2000). Key concepts in model selection: Performance and generalizability. Journal of Mathematical Psychology, 44, 205–231.
Forster, M. R. (forthcoming). The miraculous consilience of quantum mechanics. In E. Eells, & J. Fetzer (Eds.), Probability in science. Open Court.
Forster, M. R., & Sober, E. (1994). How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions. British Journal for the Philosophy of Science, 45, 1–35.
Forster, M. R., & Sober, E. (2004). Why likelihood? In M. Taper, & S. Lele (Eds.), The nature of scientific evidence (pp. 153–165). Chicago and London: University of Chicago Press.
Friedman, M. (1981). Theoretical explanation. In R. A. Healey (Ed.), Time, reduction and reality (pp. 1–16). Cambridge: Cambridge University Press.
Glymour, C. (1980). Explanations, tests, unity and necessity. Noûs, 14, 31–50.
Hacking, I. (1965). Logic of statistical inference. Cambridge: Cambridge University Press.
Harper, W. L. (2002). Howard Stein on Isaac Newton: Beyond hypotheses. In D. B. Malament (Ed.), Reading natural philosophy: Essays in the history and philosophy of science and mathematics (pp. 71–112). Chicago and La Salle, Illinois: Open Court.
Hooker, C. A. (1987). A realistic theory of science. Albany: State University of New York Press.
Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford: Clarendon Press.
Mayo, D. G. (1996). Error and the growth of experimental knowledge. Chicago and London: The University of Chicago Press.
Myrvold, W., & Harper, W. L. (2002). Model selection, simplicity, and scientific inference. Philosophy of Science, 69, S135–S149.
Norton, J. D. (1993). The determination of theory by evidence: The case for quantum discontinuity, 1900–1915. Synthese, 97, 1–31.
Norton, J. D. (2000). How we know about electrons. In R. Nola, & H. Sankey (Eds.), After Popper, Kuhn and Feyerabend (pp. 67–97). Kluwer Academic Press.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.
Royall, R. M. (1991). Ethics and statistics in randomized clinical trials (with discussion). Statistical Science, 6, 52–88.
Royall, R. M. (1997). Statistical evidence: A likelihood paradigm. Boca Raton: Chapman & Hall/CRC.
Sakamoto, Y., Ishiguro, M., & Kitagawa, G. (1986). Akaike information criterion statistics. Dordrecht: Kluwer Academic Publishers.
Savage, L. J. (1976). On rereading R. A. Fisher (with discussion). Annals of Statistics, 4, 441–500.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Sneed, J. D. (1971). The logical structure of mathematical physics. Dordrecht: D. Reidel.
Sober, E. (1993). Epistemology for empiricists. In H. Wettstein (Ed.), Midwest studies in philosophy (pp. 39–61). Notre Dame: University of Notre Dame Press.
Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44, 92–107.
Whewell, W. (1858). Novum organon renovatum. Reprinted as Part II of the 3rd ed. of The philosophy of the inductive sciences. London: Cass, 1967.
Whewell, W. (1989). In R. E. Butts (Ed.), Theory of scientific method. Indianapolis/Cambridge: Hackett Publishing Company.
Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford and New York: Oxford University Press.

Author information

Authors and Affiliations

Department of Philosophy, University of Wisconsin-Madison, 5185 Helen C. White Hall, 600 North Park Street, Madison, WI, 53706, USA
Malcolm R. Forster

Corresponding author

Correspondence to Malcolm R. Forster.

Additional information

Thanks go to all those who responded well to the first version of this paper presented at the University of Pittsburgh Center for Philosophy of Science on January 31, 2006, and especially to Clark Glymour. A revised version was presented at Carnegie-Mellon University on April 6, 2006. I also wish to thank Jason Grossman, John Norton, Teddy Seidenfeld, Elliott Sober, Peter Vranas, and three anonymous referees for valuable feedback on different parts of the manuscript.

This paper is part of the ongoing development of a half-baked idea about cross-situational invariance in causal modeling introduced in Forster (1984). I appreciated the encouragement at that time from Jeff Bub, Bill Demopoulos, Michael Friedman, Bill Harper, Cliff Hooker, John Nicholas, and Jim Woodward. Cliff Hooker discussed the idea in his (1987), and Jim Woodward suggested a connection with statistics, which has taken me 20 years to figure out.

Appendix

Theorem

If the maximum likelihood hypothesis in F is $Y=\frac{10}{\sqrt{101}}X+U$ and the observed variance of X is 101, then the observed variance of Y is also 101. Thus, the maximum likelihood hypothesis in B is $X=\frac{10}{\sqrt{101}}Y+Z,$ and they have the same likelihood. Moreover, for any α, β, and σ, there exist values of a, b, and s such that Y = α + β X + σ U and X = a + bY + sZ have the same likelihood.

Partial Proof

The observed X variance of data distributed in two equal-sized Gaussian clusters (N/2 data points each) with unit variance, centered at X = −10 and X = +10, where the observed means of X and Y are 0, is equal to

$$ \hbox{Var}X=\frac{1}{2}\frac{1}{N/2}\sum{x_i^2}+\frac{1}{2}\frac{1}{N/2}\sum{x_j^2}, $$

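where $x_i$ denotes X values in the lower cluster and $x_j$ denotes X values in the upper cluster. If all the $x_i$ were equal to −10, and all the $x_j$ were equal to +10, then Var X would be equal to 100. To that, one must add the effect of the local variances. More exactly, if each cluster's observed mean lies exactly at its center and each cluster's observed variance is exactly 1 (so that the cross terms vanish),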

$$ \hbox{Var}X=\frac{1}{2}\frac{1}{N/2}\sum{((x_i+10)-10)^2}+\frac{1}{2}\frac{1} {N/2}\sum{((x_j-10)+10)^2}=101. $$
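The variance calculation above can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming two equal-sized clusters of simulated data; the function name `mixture_variance_demo`, the sample size and the random seed are illustrative choices, not part of the paper. Because of sampling error, a finite simulation gives a value close to 101 rather than exactly 101.

```python
import numpy as np


def mixture_variance_demo(n_per_cluster=50_000, seed=0):
    """Pooled variance of X (divisor N) for two simulated, equal-sized,
    unit-variance Gaussian clusters centred at X = -10 and X = +10.
    Hypothetical helper: name, sample size and seed are illustrative."""
    rng = np.random.default_rng(seed)
    lower = rng.normal(-10.0, 1.0, n_per_cluster)  # the x_i values
    upper = rng.normal(10.0, 1.0, n_per_cluster)   # the x_j values
    x = np.concatenate([lower, upper])
    # np.var subtracts the observed mean and divides by N (ddof=0 by default)
    return np.var(x)


# Idealized value: 100 from the cluster centres plus 1 from the local variances
print(mixture_variance_demo())  # close to, but not exactly, 101
```

The between-cluster part (the squared distance 10² of each center from the overall mean) and the within-cluster part (the unit local variance) simply add, which is why the idealized value is 100 + 1.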

From the equation $Y=\frac{10}{\sqrt{101}}X+U,$ it follows that $\hbox{Var}Y=\frac{100}{101}101+1=101.$ Standard formulae for regression lines now show that $X=\frac{10}{\sqrt{101}}Y$ is the backwards regression line: its slope is $\hbox{Cov}(X,Y)/\hbox{Var}Y=\frac{10}{\sqrt{101}}\cdot\frac{101}{101}=\frac{10}{\sqrt{101}},$ and its observed residual variance is $\hbox{Var}X-\frac{100}{101}\hbox{Var}Y=101-100=1,$ the same as for the forward line. Therefore, the two hypotheses have the same conditional likelihoods, and the same total likelihoods. It follows that the hypotheses $Y=\frac{10}{\sqrt{101}} X+\sigma U$ and $X=\frac{10}{\sqrt{101}}Y+\sigma Z$ have the same likelihoods for any value of σ. It is also clear that for any α, β, and σ, there exist values of a, b, and s such that Y = α + β X + σ U and X = a + bY + sZ have the same likelihoods.
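The equality of the forward and backward fits can likewise be illustrated by simulation. This is a hedged sketch in Python with NumPy, not the author's code: the helpers `simulate`, `ols` and `max_gauss_loglik`, the sample size and the seed are hypothetical. It fits Y on X (forward) and X on Y (backward) by least squares and compares slopes, residual variances and maximized Gaussian log-likelihoods per datum. In any finite sample the two residual variances differ by exactly the factor Var Y / Var X, so they agree only to the extent that the two observed variances agree. The last lines refit each model within the upper cluster alone; that check is an illustrative assumption added here, not a reconstruction of the argument in the paper.

```python
import numpy as np

BETA = 10 / np.sqrt(101)  # slope in the forward hypothesis Y = (10/sqrt(101)) X + U


def simulate(n_per_cluster=50_000, seed=1):
    """Hypothetical data generator: two unit-variance X clusters at -10 and +10,
    then Y = BETA * X + U with U ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.concatenate([rng.normal(-10.0, 1.0, n_per_cluster),
                        rng.normal(10.0, 1.0, n_per_cluster)])
    y = BETA * x + rng.normal(0.0, 1.0, x.size)
    return x, y


def ols(dep, reg):
    """Least-squares slope of dep on reg (both centred) and residual variance (divisor N)."""
    dep = dep - dep.mean()
    reg = reg - reg.mean()
    slope = np.dot(reg, dep) / np.dot(reg, reg)
    return slope, np.var(dep - slope * reg)


def max_gauss_loglik(resid_var, n):
    """Maximized Gaussian log-likelihood of n residuals whose ML variance is resid_var."""
    return -0.5 * n * (np.log(2 * np.pi * resid_var) + 1.0)


x, y = simulate()
n = x.size
b_fwd, v_fwd = ols(y, x)  # forward model F: Y regressed on X
b_bwd, v_bwd = ols(x, y)  # backward model B: X regressed on Y
print(f"Var X = {x.var():.2f}, Var Y = {y.var():.2f}")
# Both slopes should be near 10/sqrt(101), about 0.995, and both residual variances near 1
print(f"forward: slope {b_fwd:.3f}, residual variance {v_fwd:.3f}")
print(f"backward: slope {b_bwd:.3f}, residual variance {v_bwd:.3f}")
print("max log-likelihood per datum:",
      max_gauss_loglik(v_fwd, n) / n, max_gauss_loglik(v_bwd, n) / n)

# Illustrative extra check (an assumption, not from the paper): refit within one cluster.
# Under the forward model, Cov/Var Y within a cluster is BETA / (BETA**2 + 1), about 0.5.
upper = x > 0
b_fwd_up, _ = ols(y[upper], x[upper])
b_bwd_up, _ = ols(x[upper], y[upper])
print(f"upper cluster: forward slope {b_fwd_up:.3f}, backward slope {b_bwd_up:.3f}")
```

On the pooled data the two fits are practically indistinguishable by their maximized likelihoods, while the within-cluster refit shows one way in which the full data can carry information that the pooled likelihoods do not.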

About this article

Cite this article

Forster, M.R. Counterexamples to a likelihood theory of evidence. Minds & Machines 16, 319–338 (2006). https://doi.org/10.1007/s11023-006-9038-y

Accepted: 14 July 2006
Issue Date: August 2006