Is explanatoriness a guide to confirmation? A reply to Climenhaga

William Roche* and Elliott Sober†

* Department of Philosophy, Texas Christian University, Fort Worth, TX, USA, e-mail: w.roche@tcu.edu
† Department of Philosophy, University of Wisconsin, Madison, Madison, WI 53706, USA, e-mail: ersober@wisc.edu

ABSTRACT: We (2013, 2014) argued that explanatoriness is evidentially irrelevant in the following sense. Let H be a hypothesis, O an observation, and E the proposition that H would explain O if H and O were true. Then Pr(H | O & E) = Pr(H | O). We defended this screening-off thesis (SOT) by discussing an example concerning smoking and cancer. Climenhaga (forthcoming) argues that SOT is mistaken because it delivers the wrong verdict about a slightly different smoking-and-cancer case. He also considers a variant of SOT, called "SOT*", and contends that it too gives the wrong result. We here reply to Climenhaga's arguments. We suggest that SOT provides a criticism of the widely held theory of inference called "inference to the best explanation" (IBE).

Key words: Bayesianism, Climenhaga, confirmation, explanatoriness, inference to the best explanation, screening-off.

1 Introduction

We (2013, 2014) argued that explanatoriness is evidentially irrelevant in the following sense:

    Screening-Off Thesis (SOT): Let H be a hypothesis, O an observation, and E the proposition that H would explain O if H and O were true. Then Pr(H | O & E) = Pr(H | O).

Some comments are in order. First, the sense of evidential irrelevance used in SOT is Bayesian: evidential irrelevance is a matter of probabilistic irrelevance. SOT is neutral on whether there are alternative senses of evidential irrelevance on which explanatoriness is evidentially relevant.¹ Second, the probabilities at issue should be understood as rational credences.
Third, SOT concerns explanatoriness in the sense of E; SOT allows that explanatoriness in some other sense might be evidentially relevant.²

¹ McCain and Poston (2014) miss this point. They grant that SOT is true in certain cases, but contend that even in those cases explanatoriness is evidentially relevant because it affects what they term "weight of evidence", which they say bears on a probability's "resilience".
² McCain and Poston miss this point; see Roche and Sober (2014, p. 197) for discussion. In Section 4, we consider a variant of E.

There is no explicit mention in SOT of background information, so it might seem that SOT is meant to hold regardless of the background information codified in Pr. But SOT then is false. For suppose that Pr(H | O) < 1 and that the background information codified in Pr includes the proposition that ~E ∨ H. Then Pr(H | O & E) = 1 > Pr(H | O). So SOT, if it is to be true, must be restricted. But how? Our answer is that SOT should hold in at least a wide range of realistic cases³ where Pr includes information about sample frequencies, as in the following example.

³ We do not mean "realistic in all respects" since we assume that logical omniscience holds in the cases in question. However, this idealization is harmless, for if the assumption of logical omniscience were dropped, then a thesis similar to SOT would hold. See Roche and Sober (2014, p. 195) for discussion.

Suppose you examine a large random sample of people older than age 50 and observe the following two frequencies:

    FreqS(people who were heavy smokers before age 50) = 0.3
    FreqS(people who were heavy smokers before age 50 | people who got lung cancer after age 50) = 0.7

Here "S" means sample. You then come across Joe. He is more than 50 years old, but he was not in your sample. What value should you assign to Pr(Joe was a heavy smoker before age 50)? And what value should you assign to Pr(Joe was a heavy smoker before age 50 | Joe got lung cancer after age 50)? Suppose that you estimate these probabilities by using your sample frequency information and that, for some 0 < α < β < 1, your estimates are:

    Pr(Joe was a heavy smoker before age 50) = α
    Pr(Joe was a heavy smoker before age 50 | Joe got lung cancer after age 50) = β

You then learn that Joe got lung cancer after age 50, so you increase your credence in Joe's being a heavy smoker before age 50, from α to β. This is rather routine. But now suppose that after you have done all this, you learn that if Joe was a heavy smoker before age 50 and Joe got lung cancer after age 50, then the former would explain the latter. Should this new information lead you to change your credence in the proposition that Joe was a heavy smoker before age 50? For example, should it lead you to increase your credence in that proposition from β to some greater value?⁴ The answer to both these questions, we submit, is negative. What you learned concerning what would explain what is evidentially irrelevant (in the Bayesian sense) to the proposition that Joe was a heavy smoker before age 50. Your credence in that proposition should remain at β. This is the idea behind SOT.

We (2013, 2014) offer no theory of explanation. Does this vitiate our argument for SOT? We think not. Our focus is on causal explanation as in the example just considered. Our claim is that SOT is true on any reasonable understanding of causal explanation. If we are right, this would be a significant result, regardless of whether SOT holds when explanation is non-causal.

Climenhaga (forthcoming) argues against SOT.⁵ He argues that SOT goes wrong when applied to a case like that of Joe. He also considers a variant of SOT, called "SOT*", and argues that it too is mistaken. Climenhaga has causal explanation in mind but he, like us, offers no theory thereof. We explain his arguments in Section 2, critique them in Section 3, and conclude in Section 4.
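The restriction on SOT discussed in this section can be checked with a small numerical model. The sketch below is purely illustrative. It assumes a toy joint distribution over H, O, and E in which E is probabilistically independent of H and O, which is a stronger condition than SOT itself requires. It also borrows the values 0.3 and 0.7 from the sample frequencies as stand-ins for α and β. The function names and the likelihoods 0.49 and 0.09 are hypothetical choices made only so that Pr(H | O) comes out at 0.7.

```python
from itertools import product

def make_joint(pr_h, pr_o_given_h, pr_o_given_not_h, pr_e):
    """Toy joint over worlds (h, o, e); E is independent of H and O (assumption)."""
    joint = {}
    for h, o, e in product([True, False], repeat=3):
        p_h = pr_h if h else 1 - pr_h
        p_o_true = pr_o_given_h if h else pr_o_given_not_h
        p_o = p_o_true if o else 1 - p_o_true
        p_e = pr_e if e else 1 - pr_e
        joint[(h, o, e)] = p_h * p_o * p_e
    return joint

def conditional(joint, target, given):
    """Pr(target | given), where target and given are predicates on worlds."""
    num = sum(p for w, p in joint.items() if given(w) and target(w))
    den = sum(p for w, p in joint.items() if given(w))
    return num / den

def add_background(joint, proposition):
    """Conditionalize the whole distribution on a background proposition."""
    total = sum(p for w, p in joint.items() if proposition(w))
    return {w: (p / total if proposition(w) else 0.0) for w, p in joint.items()}

H = lambda w: w[0]
O = lambda w: w[1]
O_and_E = lambda w: w[1] and w[2]

# Illustrative numbers: Pr(H) = 0.3 and likelihoods chosen so Pr(H | O) = 0.7.
base = make_joint(pr_h=0.3, pr_o_given_h=0.49, pr_o_given_not_h=0.09, pr_e=0.6)
pr_h_given_o = conditional(base, H, O)
pr_h_given_o_and_e = conditional(base, H, O_and_E)

# Background information that includes ~E v H, as in the counterexample.
restricted = add_background(base, lambda w: (not w[2]) or w[0])
restricted_h_given_o = conditional(restricted, H, O)
restricted_h_given_o_and_e = conditional(restricted, H, O_and_E)

print(pr_h_given_o, pr_h_given_o_and_e)
print(restricted_h_given_o, restricted_h_given_o_and_e)
```

In the first model E is screened off from H by O. In the second model, learning E after O raises the probability of H to 1, while Pr(H | O) stays below 1. The difference lies entirely in the background proposition.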
⁴ We assume that explanatoriness can be informative to logically omniscient subjects. This should be uncontroversial. E is not a logical truth in the example about Joe. Note that in cases where E is a logical truth, SOT is trivially true.
⁵ All references to Climenhaga are to Climenhaga (forthcoming).

2 Climenhaga's arguments

Climenhaga's arguments are framed in terms of a case that resembles the example just described about Joe. The probabilities he discusses concern the following propositions (where the labels "Ca" and "Sm" are ours):

    Ca: S gets cancer.
    Sm: S smokes.
    E: If Sm and Ca were true, then Sm would explain Ca.
    C1: Smoking sometimes causes cancer.
    C2: Cancer sometimes causes smoking.
    C3: Smoking and cancer sometimes have a common cause.

C1, C2, and C3 are token-level, not type-level, causal claims. C1, for example, says that at least one token of smoking causes (in the past, present, or future) at least one token of cancer.

2.1 Climenhaga's argument against SOT

Climenhaga points out that SOT entails that

    (6) Pr(Sm | Ca & E) = Pr(Sm | Ca)

Here and below the numbers are his. For ease of expression, we modify notation and suppress reference to background information K. Climenhaga argues against SOT by arguing that (6) is false. His argument is based in part on the following claims:

    (16) Pr(Sm | Ca & E) = Pr(C1 | Ca & E)Pr(Sm | Ca & E & C1) + Pr(~C1 | Ca & E)Pr(Sm | Ca & E & ~C1)
    (17) Pr(Sm | Ca) = Pr(C1 | Ca)Pr(Sm | Ca & C1) + Pr(~C1 | Ca)Pr(Sm | Ca & ~C1)
    (18) Pr(C1 | Ca & E) > Pr(C1 | Ca)
    (19) Pr(Sm | Ca & E & C1) ≥ Pr(Sm | Ca & C1)
    (20) Pr(Sm | Ca & E & ~C1) = 0

Climenhaga takes each of these claims to be true and argues that from them it follows that (6) is true only if

    (*) Pr(~C1 | Ca)Pr(Sm | Ca & ~C1) = Pr(C1 | Ca & E)Pr(Sm | Ca & E & C1) − Pr(C1 | Ca)Pr(Sm | Ca & C1)

The label "(*)" is ours. Is (*) true? If the answer is negative, then, assuming that each of (16)-(20) is true, it follows that (6) is false, and so is SOT.
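The step from (6) and (16)-(20) to (*) is not spelled out in the text. One way to reconstruct it, offered here only as a sketch using the claim numbers given in this section, is the following:

```latex
% Substituting (20) into (16) removes the second summand of (16):
\begin{align*}
\Pr(Sm \mid Ca \,\&\, E)
  &= \Pr(C1 \mid Ca \,\&\, E)\,\Pr(Sm \mid Ca \,\&\, E \,\&\, C1).
\end{align*}
% If (6) holds, the left-hand side equals Pr(Sm | Ca), which (17) expands:
\begin{align*}
\Pr(C1 \mid Ca \,\&\, E)\,\Pr(Sm \mid Ca \,\&\, E \,\&\, C1)
  &= \Pr(C1 \mid Ca)\,\Pr(Sm \mid Ca \,\&\, C1)
   + \Pr(\neg C1 \mid Ca)\,\Pr(Sm \mid Ca \,\&\, \neg C1).
\end{align*}
% Subtracting the first summand of (17) from both sides yields (*):
\begin{align*}
\Pr(\neg C1 \mid Ca)\,\Pr(Sm \mid Ca \,\&\, \neg C1)
  &= \Pr(C1 \mid Ca \,\&\, E)\,\Pr(Sm \mid Ca \,\&\, E \,\&\, C1)
   - \Pr(C1 \mid Ca)\,\Pr(Sm \mid Ca \,\&\, C1).
\end{align*}
```

On this reconstruction, only (16), (17), and (20) are needed to obtain (*) from (6). Claims (18) and (19) together guarantee that the right-hand side of (*) is non-negative, so (*) is not excluded on sign grounds alone.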
Climenhaga addresses (*) in two paragraphs. The first is this:

    Nevertheless, for (6) to be true the second summand in (17) would need to exactly equal the difference between the first summand in (16) and the first summand in (17). While this could be the case, there is no reason to expect it a priori. Hence, far from being a general truth, if (6) is true in this case it is only by fortuitous coincidence. (p. 10)

It might seem odd that Climenhaga is discussing whether (*) is a priori and noncoincidental. The issue, after all, is whether (*) is true. Climenhaga's point, we take it, is that (*) is false in at least some of the cases in which SOT is meant to hold. Why, though, should this point be accepted?

We noted above that Climenhaga addresses (*) in two paragraphs. The second, which immediately follows the first, is this (where some notation and formatting have been modified):

    More importantly, the negative influence of E on Sm sketched above is not the kind of influence that either proponents or opponents of inference to the best explanation have had in mind when disagreeing about whether explanation is relevant to confirmation. And if we build into K information that screens off this influence, then ... [(6) is false]. For example, imagine that we know that nothing besides smoking will give S cancer (and that S will not get cancer for no reason). In this case Pr(~C1 | Ca) = 0 − if the only way for S to get cancer is from smoking, then if S gets cancer it is because of S's smoking, and so C1 is true. Hence the second summand in both (16) and (17) is 0, and the dominance of (16)'s first summand over (17)'s first summand is sufficient for it to be the case that ... Pr(Sm | Ca & E) > Pr(Sm | Ca). (p. 10)

That completes Climenhaga's argument against SOT.

2.2 Climenhaga's argument against SOT*

The thesis that Climenhaga calls SOT* is this:⁶

    Pr(Sm | Ca & C1) = Pr(Sm | Ca)

⁶ Climenhaga's official formulation of SOT* (p. 5) involves the expression "for all K" where K is the background information. We take this to be a slip, for SOT* thus formulated is trivially false.

Climenhaga argues against SOT* by arguing for

    (14) Pr(Sm | Ca & C1) > Pr(Sm | Ca)

and his argument for (14) is based in part on the following claims:

    (4) Pr(Ca | Sm) > Pr(Ca)
    (7) Pr(Ca | Sm & ~[C1 ∨ C2 ∨ C3]) = Pr(Ca | ~[C1 ∨ C2 ∨ C3])
    (9) Pr(Sm | ~[C1 ∨ C2 ∨ C3]) = Pr(Sm)

Climenhaga takes these claims to show that

    (13) Pr(Sm | Ca & [C1 ∨ C2 ∨ C3]) > Pr(Sm | Ca)

Since C1 is stronger than C1 ∨ C2 ∨ C3, it might seem that (13) entails:

    (14) Pr(Sm | Ca & C1) > Pr(Sm | Ca)

However, confirmation is not monotonic, so caution is needed in moving from (13) to (14). Here, in its entirety, is what Climenhaga says about this last step (where some notation and formatting have been modified):

    Presumably any one of C1, C2, and C3 also licenses extrapolation from our frequency data. Consequently, we can replace C1 ∨ C2 ∨ C3 with any one of C1, C2, and C3, and ... (13) will remain true. In particular, it will be true that ... Pr(Sm | Ca & C1) > Pr(Sm | Ca). (p. 8)

That completes Climenhaga's argument against SOT*.

3 Our critiques

3.1 Our critique of Climenhaga's argument against SOT

Realizing that claims (16)-(20) aren't enough to refute SOT, Climenhaga assumes, additionally, that "nothing besides smoking will give S cancer (and that S will not get cancer for no reason)". Let's call this assumption "SOW" (= "Smoking is the Only Way"). Climenhaga is right that it follows from SOW that Pr(~C1 | Ca) = 0, but there are flies in the ointment. First, SOW contradicts (18) by making both Pr(C1 | Ca & E) and Pr(C1 | Ca) equal 1. Second, SOW entails (6) by making both Pr(Sm | Ca & E) and Pr(Sm | Ca) equal 1. Climenhaga's several assumptions are not mutually consistent, but there is more: SOW, far from being a problem for (6), entails it. So SOW is not a problem for SOT.⁷

Suppose, now, that (16)-(20) are all true in at least some of the cases in which SOT is meant to hold.
Is it plausible that (*) is true in all such cases? To answer this question, let's return to the case of Joe. Recall that you start with two sample frequencies (with values 0.3 and 0.7), which you use to estimate Joe's probabilities:

    Pr(Joe was a heavy smoker before age 50) = α
    Pr(Joe was a heavy smoker before age 50 | Joe got lung cancer after age 50) = β

Why are you able to do this? Not because Joe is a member of your sample; he is not. But then, why? The reason why has two parts. First, your sample frequency information (perhaps when supplemented with assumptions that do not say whether smoking and cancer are causally or explanatorily connected) enables you to estimate frequencies in the population from which the sample was drawn.⁸ In particular, it enables you to arrive at the following estimates (where "P" means population):

    FreqP(people who were heavy smokers before age 50) = α
    FreqP(people who were heavy smokers before age 50 | people who got lung cancer after age 50) = β

Second, since, from your perspective, Joe is a random member of the population, you align your probabilities for Joe with your estimated population frequencies.

⁷ The same is true of any alternative assumption on which Pr(~C1 | Ca) equals 0 and Ca entails Sm.
⁸ It is controversial how estimation problems should be solved. Broadly speaking, there are Bayesian and frequentist approaches. See Howson and Urbach (1993) for a Bayesian perspective and Romeijn (2016) for discussion of maximum likelihood estimation. For discussion of statistical decision theory, see Vassend, Sober, and Fitelson (forthcoming).

Now why should your credence in the proposition that Joe was a heavy smoker before age 50 remain at β upon learning that E (= if Joe was a heavy smoker before age 50 and got lung cancer after age 50, then the former would explain the latter)? The reason, in part, is that what you learn should have no impact on your estimated population frequencies. We say "in part" for a reason.
Suppose that E should have no impact on your estimated population frequencies. However, suppose, further, that your background information includes the proposition that ~E ∨ Joe is a member of P*, where P* is a subpopulation of people who got lung cancer after age 50, and where your estimated frequency of people in P* who were heavy smokers before age 50 is greater than β. Then your credence in the proposition that Joe was a heavy smoker before age 50 should increase to a value greater than β upon learning E. This means that part of what is doing the work in the initial version of the case (not the version in this paragraph) is the fact that your background information doesn't include propositions like the disjunctive proposition that ~E ∨ Joe is a member of P*. This illustrates how background assumptions can be cooked up to render E evidentially relevant.

There is still the question of whether it's plausible that (*) holds in all of the cases at issue. Our answer is yes, because, in part, it's plausible that in all such cases E should have no impact on your estimated population frequency of people who smoke among people who get cancer.

These points generalize beyond Joe's smoking and cancer. There is a wide range of realistic cases involving sample frequency information in which E – the proposition that H would explain O if H and O were true – should have no impact on your estimated population frequencies and, in part because of this, should have no impact on your credence in H.

3.2 Our critique of Climenhaga's argument against SOT*

The situation with respect to Climenhaga's argument against SOT* is similar. SOT*, recall, says that Pr(Sm | Ca & C1) = Pr(Sm | Ca).
There is a wide range of realistic cases involving sample frequency information in which C1 should have no impact on your estimate of the population frequency of people who smoke among people who get cancer and, in part because of this, should have no impact on your credence in Sm.⁹

⁹ Climenhaga notes in effect (footnote 9) that it is not true in general that Pr(Sm | Ca) equals the sample frequency of people who smoke among people who get cancer. We are not claiming otherwise. Our claim, rather, is that Pr(Sm | Ca) equals your estimate of the population frequency of people who smoke among people who get cancer.

Where, then, does Climenhaga's argument against SOT* go wrong? Climenhaga claims that the argument for (13) carries over to (14). However, consider Climenhaga's defense of (7) (where the expression "there is no explanatory connection between smoking and cancer" simply means that ~[C1 ∨ C2 ∨ C3]):

    This [viz., (7)] is because on the (extremely unlikely) hypothesis that there is no explanatory connection between smoking and cancer, the observed frequency data is a huge fluke. But we shouldn't expect huge flukes to continue. If the observed association of smoking and cancer is merely coincidental, then we should expect future smokers that we observe to have cancer at the same rate as the rest of the population. (p. 6)

If the argument for (13) carried over to (14), then Climenhaga's defense of (7) would carry over to the following:

    (7*) Pr(Ca | Sm & ~C1) = Pr(Ca | ~C1)

However, Climenhaga's defense of (7) does not carry over to (7*). The supposition that smoking never causes cancer leaves it open that they oftentimes have a common cause and thus leaves it open that it is not "coincidental" or a "fluke" that your sample frequency of people who smoke among people who get lung cancer is greater than your sample frequency of people who smoke.

There is more. Neither (7) nor (13) strikes us as compelling. And, as we aim to show below, Climenhaga's argument for (7), and thus his argument for (13), fails. First, it is not true in general that:

    (**) If there is no explanatory connection between F and G, and if FreqS(G | F) > FreqS(G), then the frequency inequality is a fluke, and it therefore shouldn't be expected to continue to hold in the future.

(**) is doubly mistaken. To see why, consider a made-up case described by Sober (2001).¹⁰ Let F be the class of days in the past 200 years on which British bread prices have been above average, G be the class of days on which Venetian sea levels have been above average, and suppose that both British bread prices and Venetian sea levels have increased monotonically during those two centuries. Then, though there is no explanatory connection between F and G, for a randomly selected day d in those 200 years, FreqS(d is in G | d is in F) > FreqS(d is in G) and the association is no fluke; new samples drawn from those 200 years will reliably exhibit the same pattern. The pattern is no accident, but this doesn't mean that it must persist forever. Second, even if it is required that flukes not continue forever, the example can be modified; suppose there is reason to expect that Venetian sea levels and British bread prices each continue to increase beyond those 200 years, each going up for its own endogenous reasons.¹¹

¹⁰ Sober uses this example against Reichenbach's principle of the common cause. For discussion and references, see, in addition to Sober (2001), Sober (2008, 2015).
¹¹ The assumption of monotonicity is inessential. See Sober (2001, Sec. 3). For a gaggle of real-world counterexamples to (**), see Vigen (2015).

Climenhaga acknowledges in effect that (**) is not always true; indeed, he thinks that the case of British bread prices and Venetian sea levels shows this. He finds it plausible, though, that there are no counterexamples to (**) if the background information includes all relevant temporal information. We disagree. Some counterexamples to (**) are spatial (see Sober 2008, p. 233).

Climenhaga additionally argues that Sober (2001) should accept (**) as applied to the smoking-and-cancer case. He writes:

    ... Sober (2001: 342-43) agrees that separate cause explanations "often" do not predict correlations, and I think he should accept that (7) is such a case. According to Sober, inference to a common cause is often rational because it is frequently the case that a common cause explanation predicts a correlation where a separate cause explanation does not. ... However, it is clearly rational to infer a causal relationship between smoking and cancer from the frequency data in K. Hence, Sober's reasoning would suggest that (7) is a case in which the above principle is true. (pp. 6-7; italics are ours)

It is not true, though, that Sober endorses the italicized statement. Sober is concerned with "favoring" (where favoring is understood in terms of the law of likelihood). His point is that often an observed association O favors a common-cause hypothesis CC over a separate-cause hypothesis SC in the sense that Pr(O | CC) > Pr(O | SC). It does not follow, in the cases in question, that Pr(CC | O) > Pr(SC | O).¹² Moreover, Sober's point, even when conjoined with the claim that it's rational to infer a causal relationship between smoking and cancer, leaves it open that (7) is false because the left side of (7) is greater than the right.

One final point about SOT* is in order. SOT* is logically independent of SOT. So even if, contra what we have been arguing, the argument against SOT* succeeded, it would not follow that the same is true of the argument against SOT.
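The frequency pattern in the bread-price and sea-level case from Section 3.2 can be reproduced in a short simulation. What follows is a minimal sketch under stated assumptions: the two series are made up (a linear trend and a quadratic trend), and the sample size, number of samples, random seed, and all names are hypothetical. It computes Freq(G | F) and Freq(G) over the whole 200-year record and over repeated random samples of days. It illustrates only the frequency inequality and is silent on what explains what.

```python
import random

N_DAYS = 200 * 365  # roughly 200 years of daily records (leap days ignored)

# Hypothetical series that rise independently of each other (assumed shapes).
bread_price = [1.0 + 0.001 * t for t in range(N_DAYS)]   # linear trend
sea_level = [(t / N_DAYS) ** 2 for t in range(N_DAYS)]   # accelerating trend

def above_average(series):
    """Flag each day on which the series exceeds its 200-year mean."""
    mean = sum(series) / len(series)
    return [value > mean for value in series]

in_F = above_average(bread_price)  # F: days with above-average bread prices
in_G = above_average(sea_level)    # G: days with above-average sea levels

def frequencies(days):
    """Return (Freq(G | F), Freq(G)) over the given collection of days."""
    days = list(days)
    f_days = [d for d in days if in_F[d]]
    if not f_days:
        raise ValueError("no days in F; Freq(G | F) is undefined")
    freq_g_given_f = sum(in_G[d] for d in f_days) / len(f_days)
    freq_g = sum(in_G[d] for d in days) / len(days)
    return freq_g_given_f, freq_g

# Frequencies over the full record.
population = frequencies(range(N_DAYS))

# Repeated random samples of 500 days from the same 200 years.
rng = random.Random(0)
samples = [frequencies(rng.sample(range(N_DAYS), 500)) for _ in range(100)]
pattern_holds = all(g_given_f > g for g_given_f, g in samples)

print(population, pattern_holds)
```

Because both series rise over time, above-average days cluster late in the record for each series, so the inequality recurs in sample after sample even though neither series enters into the computation of the other.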
¹² It might seem that Sober (2001, p. 343) believes otherwise. But there he has in mind cases where Pr(CC) ≥ Pr(SC).

4 Conclusion

Our primary point in all this is that explanatoriness (in the subjunctive sense according to which H would explain O if H and O were true) is evidentially irrelevant (in the Bayesian sense according to which evidential irrelevance is a matter of probabilistic irrelevance) in at least a wide range of realistic cases involving sample frequency information. Here, then, is a clear sense in which explanatoriness is not a guide to confirmation.

The reader may wonder whether the case for SOT requires that probabilities be inferred from sample frequencies. It is easy to see how the same result can obtain when probabilities are dictated by a well-established theory. We tell you that Jack and Jill are either full siblings or father and daughter. Mendelian genetics tells you that, either way,

    Pr(Jill has one copy of rare gene G | Jack has one copy of rare gene G) = 0.5

We then tell you something new, that if Jack and Jill both have gene G, then Jack's having the gene would explain why Jill has it. If, as we can suppose, explanation here means causal explanation, it follows that you then know that if Jack and Jill both have gene G, then Jack is Jill's father, not her brother. This should change the probability you assign to Jill's having gene G not one bit.

We have a secondary point. Recall that if your background information includes the proposition that ~E ∨ Joe is a member of P*, where P* is a subpopulation of people who got lung cancer after age 50, and your estimated frequency of people who were heavy smokers before age 50 among members of P* is greater than β, then your credence in the proposition that Joe was a heavy smoker before age 50 should increase to a value greater than β upon learning E. Our secondary point is that there is nothing special about explanatoriness: it is like any other contingent proposition in that whether it is evidentially relevant hinges on background information. If explanatoriness is evidentially relevant in a given case, then this is not because explanatoriness is evidentially relevant in itself.

There is no explicit mention in E or SOT of alternatives to H, but E in SOT can be replaced by a claim about explanatoriness that is comparative:

    C: H would explain O if H and O were true, where H is better than the alternatives as a potential explanation of O.

If the story about Joe were suitably reformulated, then the point would be that when you learn that Joe's being a heavy smoker before age 50 would explain his getting lung cancer after age 50 where this explanation is better than the alternatives, your credence in the proposition that Joe was a heavy smoker before age 50 should remain at β. We thus arrive at the following:

    Comparative Screening-Off Thesis (CSOT): Let H be a hypothesis, O an observation, and C the proposition that H would explain O if H and O were true, where H is better than the alternatives as a potential explanation of O. Then Pr(H | O & C) = Pr(H | O).

CSOT bears on the theory of inference called "inference to the best explanation" (IBE), which can be formalized as follows (see Lycan 2002 and Psillos 2007 for similar formulations):

    O.
    H would explain O if H and O were true.
    H is better than the alternatives as a potential explanation of O.
    Thus, in all probability, H.

IBE-ists, we take it, hold that the second and third premises here are essential in that without them the conclusion would not follow. That is, IBE-ists hold that the probability of H given all three premises is high whereas the probability of H given just the first premise is not.
If IBE is meant to apply to cases of causal explanation in which probabilities are estimated from sample frequencies, it follows that CSOT conflicts with IBE.¹³ The key point is that if CSOT is true, then there is a wide range of realistic cases involving frequency data in which the kind of explanatoriness at issue in IBE is evidentially irrelevant.¹⁴

¹³ The same is true with respect to CSOT and a theory defended by Douven and Wenmackers (forthcoming, p. 5) according to which, roughly, if you learn O, and also learn that H is the best explanation (in a partition of hypotheses) of O, then H gets a probabilistic bonus in that your new probability for H should exceed your old conditional probability for H given O.
¹⁴ IBE-ists often hold that IBE is a successful theory of inference when the explanations in question are causal; see Lipton (2004) and Lycan (2002, p. 413).

Acknowledgments

Thanks to an anonymous referee for a helpful comment on a prior version of the paper.

References

Climenhaga, N. (forthcoming). How explanation guides confirmation. Philosophy of Science.
Douven, I., and Wenmackers, S. (forthcoming). Inference to the best explanation versus Bayes's rule in a social setting. British Journal for the Philosophy of Science.
Howson, C., and Urbach, P. (1993). Scientific reasoning: The Bayesian approach (2nd ed.). Chicago: Open Court.
Lipton, P. (2004). Inference to the best explanation (2nd ed.). London: Routledge.
Lycan, W. (2002). Explanation and epistemology. In P. Moser (Ed.), The Oxford handbook of epistemology (pp. 408-433). Oxford: Oxford University Press.
McCain, K., and Poston, T. (2014). Why explanatoriness is evidentially relevant. Thought, 3, 145-153.
Psillos, S. (2007). The fine structure of Inference to the Best Explanation. Philosophy and Phenomenological Research, 74, 441-448.
Roche, W., and Sober, E. (2013). Explanatoriness is evidentially irrelevant, or inference to the best explanation meets Bayesian confirmation theory. Analysis, 73, 659-668.
Roche, W., and Sober, E. (2014). Explanatoriness and evidence: A reply to McCain and Poston. Thought, 3, 193-199.
Romeijn, J. (2016). Philosophy of statistics. The Stanford Encyclopedia of Philosophy (Spring 2016 Edition), E. Zalta (ed.), URL = <http://plato.stanford.edu/archives/spr2016/entries/statistics/>.
Sober, E. (2001). Venetian sea levels, British bread prices, and the principle of the common cause. British Journal for the Philosophy of Science, 52, 331-346.
Sober, E. (2008). Evidence and Evolution: The logic behind the science. Cambridge: Cambridge University Press.
Sober, E. (2015). Ockham's razors: A user's manual. Cambridge: Cambridge University Press.
Vassend, O., Sober, E., and Fitelson, B. (forthcoming). The philosophical significance of Stein's Paradox. European Journal for the Philosophy of Science.
Vigen, T. (2015). Spurious correlations. New York: Hachette Books.