Coherence and Confirmation Through Causation

Gregory Wheeler and Richard Scheines

To appear in Mind

Abstract

Coherentism maintains that coherent beliefs are more likely to be true than incoherent beliefs, and that coherent evidence provides more confirmation of a hypothesis when the evidence is made coherent by the explanation provided by that hypothesis. Although probabilistic models of credence ought to be well-suited to justifying such claims, negative results from Bayesian epistemology have suggested otherwise. In this essay we argue that the connection between coherence and confirmation should be understood as a relation mediated by the causal relationships among the evidence and a hypothesis, and we offer a framework for doing so by fitting together probabilistic models of coherence, confirmation, and causation. We show that the causal structure among the evidence and hypothesis is sometimes enough to determine whether the coherence of the evidence boosts confirmation of the hypothesis, makes no difference to it, or even reduces it. We also show that, ceteris paribus, it is not the coherence of the evidence that boosts confirmation, but rather the ratio of the coherence of the evidence conditional on a hypothesis to the coherence of the evidence.

1. Introduction

A man is dead and the police are asking questions. Two witnesses believed not to have conferred with one another have implicated Mrs. White in the murder of her employer, Dr. Black. Each of their statements alone is damaging to White, yet both witnesses have given the police the same detailed account of the crime, and it is partly the 'coherence' of their testimonies which lends an additional measure of support to the hypothesis that White killed Black.

The focus of the investigation changes after the police discover that they were wrong about the witnesses not having talked to one another. The second witness, it turns out, was nowhere near the scene of the crime.
She instead simply repeated to the police what the first witness had told her to say. So, in light of these revelations, the second witness's statement provides no reason for thinking that White killed Black, and the coherence of their testimonies, such as it is, lends no additional support whatsoever.

Sometimes coherence appears to amplify the support that individual pieces of evidence confer on a hypothesis, and other times it does not, yet explaining what accounts for this difference is a notoriously difficult problem. Consider the example of Black's murder. The case against White collapses not because of a change in the coherence of the witness testimonies per se, but rather because of a change in our understanding of what produced the coherence. In the first act, White killing Black is a good explanation for the otherwise improbable event of both witnesses reporting that she killed him. In the second act, however, the agreement between the witnesses is not explained by White having killed Black but rather by their collusion. The epistemic moral of the story, it would seem, is that whether or not coherence provides justification depends on what produces the coherence.

Yet critics of the coherence theory of justification, from Alfred Ewing (1934) on, have cautioned against pinning hopes for the coherence theory on intuitive examples of coherence, like our two-act murder mystery, in the absence of a detailed theory of coherence. In this paper we attempt to follow Ewing's counsel by introducing a formal framework to explicate what 'what produces the coherence' means and to explain various examples of coherentist justification, including why independent witness testimony is epistemically better than hearsay, all things considered.
To be more specific, we use the theory of causal Bayesian networks to represent different causal explanations for evidence to cohere, and we show how those causal relationships are a mediating factor in probabilistic accounts of coherence and confirmation. We are not overly concerned with how coherence and confirmation should be modelled. Our interest is the relationship between probabilistic association (coherence) and incremental confirmation, and how this relationship is influenced by probabilistic constraints induced by causal structure. Thus, the paper is foremost an examination of how probabilistic models of coherence, confirmation, and causal systems fit together.[1]

We approach this project in three stages. After presenting basic probabilistic models of coherence and confirmation, we first examine the relationship between coherence and confirmation in purely probabilistic terms (that is, without causal structure) through focused correlation (Myrvold 1996, Wheeler 2009). Focused correlation is a ratio of two quantities, the degree of probabilistic association of a set of evidence and the degree of probabilistic association of that evidence conditional on a specific hypothesis. We offer two results which give conditions under which focused correlation tracks confirmation. Next we look at the role that causal structure plays in regulating the relationship between coherence and confirmation. We consider three basic causal scenarios, each involving three individual pieces of evidence that are individually relevant to a hypothesis but more or less coherent when considered in pairs. In one case the coherence between the evidence sets is the same, as it is in the Black murder example above, but the causal relationship between hypothesis and evidence is different. In another case the coherence of the evidence sets differs but the causal structure is the same. In a third case, evidence sets exhibit distinct levels of coherence and distinct causal structures.
Finally, we discuss how these two components, probability and causal structure, combine to explain when coherence contributes to incremental confirmation and when it does not, ceteris paribus.

The organization of the paper is as follows. In section 2 we identify coherence with probabilistic association and introduce two well-known measures of probabilistic association. In section 3 we introduce a variety of well-known measures of incremental confirmation. In section 4 we present the assumptions and models we will use to give structure to the idea of 'ceteris paribus' when we compare evidence sets that differ in their degree of coherence but are otherwise equal. In section 5 we describe the idea of focused correlation and extend results connecting coherence to confirmation through focused correlation (Wheeler 2009). In section 6 we present the case for making causal beliefs explicit, and trace several consequences for the relationship between coherence and confirmation that arise solely from the causal structure governing the evidence and hypothesis. In section 7 we discuss our results, and contrast our approach to coherence with the approach taken in Bayesian epistemology. We give proofs of the main theorems in an appendix.

[1] Models of coherence or confirmation, or the relation between them, are discussed by Bovens and Hartmann 2003a, 2003b, 2006; Douven and Meijs 2007; Fitelson 2003; Glass 2006; Meijs 2004; Olsson 2002; Shogenji 1999; Wheeler 2009. Causal Bayes Nets, the probabilistic model of causal systems now standard in computer science and statistics, are discussed in Pearl 2000, and in Spirtes, Glymour, and Scheines 2000.

2. Simple Probabilistic Models of Coherence

There are many things one might mean by claiming that a set of propositions is coherent. Perhaps the most common idea is simply that the propositions are associated.
According to this notion, the coherence of a set of propositions rises along with the likelihood of any specific subset being true given that the complement of that subset is true. For example, the heights of biological siblings are associated. This was C. I. Lewis's approach (Lewis 1946, BonJour 1985), and one of its advantages is that it can track logical relations among propositions. For example, let ⌜S#⌝ abbreviate the schema ⌜The die landed # side up⌝, and consider two sets of propositions, T1 and T2, where each describes a set of possible outcomes from rolling a fair die once.

T1: {S1, S2, S5 or S6}
T2: {S1 or S3, S1 or S3 or S5, S1 or S2 or S3}

Clearly the set T2 is more coherent than T1, in Lewis's sense. However, given that the die is fair, the coherence of either set reflects only the logical relations among its propositions: the propositions in T1 are disjoint, whereas those in T2 overlap.

Alternatively, we might consider two individuals, A and B, and two sets of logically unrelated propositions that describe them.

T3: {A is a cowboy, A drinks Bordeaux, A sings karaoke}
T4: {B is a salaryman, B drinks sake, B sings karaoke}

Set T4 is more coherent than T3, again in Lewis's sense of coherence, but none of the coherence (or absence thereof) in either set derives from logical relations among the propositions. Instead, coherence within either set is due to contingent cultural facts about cowboys and salarymen.

While Lewis's definition succumbs to counterexamples (Bovens and Olsson 2000, pp. 688-9), most probabilistic measures of coherence derive from Lewis's general approach. Although we will stick to a probabilistic model of coherence as association, we explicitly exclude logical sources of coherence for two reasons. First, we want to use causal models over sets of propositions (evidence) that might be more or less coherent, and defining causal relations over logically related events or variables is a philosophical minefield.
Second, in our view the technicalities that come with trying to handle logically related propositions are a side issue that has done more to obscure than clarify philosophical questions about coherence.

Notation

We assume throughout that binary variables represent propositions. For example, suppose that E1 is a binary evidence variable representing a witness report, where (E1=true) codes for 'the witness reported that fact 1 is the case', written E1 for short, and (E1=false) codes for 'the witness reports that fact 1 is not the case', abbreviated by ¬E1.

A straightforward account of coherence based on probabilistic association[2] is the deviation from independence measure advanced by Tomoji Shogenji (1999):[3]

\[ S(E_1, E_2, \ldots, E_n) = \frac{P(E_1 \cap E_2 \cap \cdots \cap E_n)}{P(E_1)\,P(E_2) \cdots P(E_n)} \]

Still another measure of association for two variables, X and Y, is Pearson's correlation coefficient, which for binary variables is defined as:

\[ \rho_{X,Y} = \frac{P(X \cap Y) - P(X)P(Y)}{\sigma_X \sigma_Y} = \frac{P(X)\,[P(Y \mid X) - P(Y)]}{\sqrt{P(X)(1 - P(X))\,P(Y)(1 - P(Y))}}, \]

where the variance of a binary variable X is \( \sigma_X^2 = P(X)(1 - P(X)) \).

[2] Other proposals along these lines have been made by Huemer 1997, Cross 1999, Olsson 2002, Fitelson 2003, Glass 2006, and Wheeler 2009.

[3] Although this definition of association is attributed to Shogenji in Bayesian epistemology, it predates him in the general statistics literature by several decades.

3. Confirmation

The debate about how to model confirmation is contentious and might forever remain so. We have no desire to enter this debate here. Our concern is only to examine how popular probabilistic conceptions of incremental confirmation relate to popular, probabilistic notions of coherence.

Several measures of confirmation have been offered.
A few of the more popular options use probability to express how much confirmation an evidence set E provides to a hypothesis H (Eells and Fitelson 2002):[4]

• \( r_1(H,E) =_{df} \log \frac{P(H \mid E)}{P(H)} \)
• \( l(H,E) =_{df} \log \frac{P(E \mid H)}{P(E \mid \neg H)} \)
• \( ko(H,E) =_{df} \frac{P(E \mid H) - P(E \mid \neg H)}{P(E \mid H) + P(E \mid \neg H)} \)

Cohen (1977) and John Earman (1992) define the idea of incremental confirmation of a hypothesis H by E2 after we already know E1:

• \( inc_1(H,E_1,E_2) =_{df} P(H \mid E_1,E_2) - P(H \mid E_1) \),

and there is a similar form based on the ratio measure r1 defined above:

• \( r_2(H,E_1,E_2) =_{df} \log \frac{P(H \mid E_1,E_2)}{P(H \mid E_1)} \).

An extension of incremental confirmation that normalizes for how much room above P(H | E1) there is for E2 to 'boost' the posterior of H is:

• \( inc_2(H,E_1,E_2) =_{df} \frac{P(H \mid E_1,E_2) - P(H \mid E_1)}{1 - P(H \mid E_1)} \).

Although inc1 and inc2 are viewed as stand-alone measures, they also may be combined to comprise measure Z (Crupi et al. 2007) for propositions H and E in unconditional form, where \( inc_2(H,E) =_{df} \frac{P(H \mid E) - P(H)}{1 - P(H)} \) is used if P(H | E) ≥ P(H), and \( inc_1(H,E) =_{df} P(H \mid E) - P(H) \) otherwise.

[4] The measures r1, l, and ko are typically discussed in terms of an evidence proposition E representing an evidence set E of arbitrary size by a single conjunction of the propositions in E. We will restrict our discussion in this paper mainly to evidence sets of size 2, i.e., |E| = 2.

Confirmation and Coherence

Using a measure of coherence (Coh) and a measure of confirmation (Conf) we can ask, all else equal, whether there is a relationship between the coherence of an evidence set and the confirmation that set provides to a hypothesis. More formally, for two evidence sets E and E', a measure of coherence, Coh, and a measure of confirmation, Conf, is it the case, and if so under what conditions is it the case, that more 'coherence' translates into more 'confirmation'?

(CB) Coh(E) > Coh(E') ⇒ Conf(H,E) > Conf(H,E').[5]

As many authors have noted, for measures of coherence involving only association, the answer is clearly no.
It is not the association of the evidence that matters so much as the reason for the association. Return to the Black murder and consider the difference between first-hand, independent testimony and hearsay. Whatever the coherence of two separate witness reports and the coherence of two reports where one of the reports is hearsay, these two evidence sets provide different confirmation to the hypothesis that White killed Black. It is not the presence or absence of coherence (association) between the witness reports alone that matters, but the coherence in conjunction with the reason for the coherence.

Attempts to secure a connection between probabilistic models of coherence, understood as simple association, and probabilistic models of confirmation either smuggle in a reason for the coherence (for example, the partially reliable witness model of Bovens and Hartmann 2003a, 2003b) or rely upon a definition of coherence that is partially built from the confirmation relation, as in Bovens and Hartmann (2003b). We discuss the partially reliable witness model further in sections 6 and 7. Measures of coherence that explicitly include the hypothesis fare better. Accounts of coherence that include the causal explanation of the coherence should fare best of all.

[5] For our results to apply to inc1, we stipulate that inc1(H,E) stands for inc1(H, E1, E2) and inc1(H,E') stands for inc1(H, E1, E3). A similar remark applies for interpreting inc2 and r2, too.

4. Ceteris Paribus

Ideally, we would like to compare the confirmation provided by two sets of evidence that differ in their degree of coherence when all else about the sets and their relationship to the hypothesis is equal. In this section we attempt to formalize this idea.

In what follows we will assume that the domain D = <H,E> consists of the hypothesis H=true and an evidence set E = {E1=true, ..., En=true}, where H and E1, ..., En are propositional (binary) variables, none of which are logically related. A propositional variable conveniently expresses either the content of a proposition, or a witness report of a proposition. Extending what follows to real-valued variables is certainly possible.

By insisting that no logical relations obtain, we mean that there are positive probability distributions over D in which every pair of variables X and Y is probabilistically independent. This is not possible, for example, in a setting in which (E1 = Mrs. White killed Dr. Black), and (E2 = Mrs. White killed Dr. Black or Colonel Mustard killed Dr. Black), for in no positive distribution is E1 independent of E2. We assume this condition in order to activate the theory of causal Bayesian networks, which requires variables that are unrelated logically.[6]

We assume that P(D), a probability distribution over a domain of propositions D = <H,E>,[7] is positive. We say that two distinct pieces of evidence Ei and Ej are equally confirmatory for a hypothesis H iff

• P(H | Ei) = P(H | Ej), and
• P(H | ¬Ei) = P(H | ¬Ej).

Consider two conditions:

(A1) Positive Relevance: all propositions in an evidence set E are positively relevant to H, i.e., ∀Ei ∈ E, P(H | Ei) > P(H) > P(H | ¬Ei).

(A2) Equal Relevance: all propositions in an evidence set E are equally confirmatory, i.e., ∀Ei, Ej ∈ E, P(H | Ei) = P(H | Ej) and P(H | ¬Ei) = P(H | ¬Ej).

We say that an evidence set whose elements satisfy (A1) with respect to H is a positive evidence set for H, and a positive evidence set whose elements satisfy (A2) with respect to H is an equally positive evidence set (epe) for H. To determine whether positive coherence of an evidence set entails positive incremental confirmation of some hypothesis from that evidence set, we consider only positive evidence sets.
To compare the confirmatory power for H of two sets of evidence E and E', where E and E' are identical in all respects except for their coherence, we first look at epe sets and then relax this condition to allow evidence of variable strength.

[6] Witness reports whose contents are logically related are not themselves logically related in this way, for it is perfectly possible to have a measure involving propositional variables V1: (Witness 1 report = Mrs. White did it), and V2: (Witness 2 report = Mrs. White did it or Colonel Mustard did it), in which V1 and V2 are independent.

[7] Probability can be interpreted as credal or objective; we don't care.

5. Focused Correlation

Wheeler (2009) attempted to address the apparent disconnect between Shogenji coherence and confirmation by invoking the idea of the coherence conditional on the hypothesis. Using the ratio of the Shogenji coherence and the conditional Shogenji coherence, a relation first introduced by Wayne Myrvold (1996), Wheeler examined how focused correlation tracks confirmation. For two pieces of evidence, the Shogenji coherence and the conditional Shogenji coherence are

\[ S(E_1,E_2) = \frac{P(E_1 \cap E_2)}{P(E_1)P(E_2)}, \qquad S(E_1,E_2 \mid H) = \frac{P(E_1 \cap E_2 \mid H)}{P(E_1 \mid H)P(E_2 \mid H)}. \]

The focused correlation of a set of evidence E = {E1, ..., En} with respect to a hypothesis H is the ratio of the coherence/association of the evidence conditional on H to the coherence/association of the evidence simpliciter, which can be expressed generally as:

\[ For_H(E_1,\ldots,E_n) =_{df} \frac{S(E_1,\ldots,E_n \mid H)}{S(E_1,\ldots,E_n)} = \frac{P(H \mid E_1,\ldots,E_n)\,P(H)^{n-1}}{P(H \mid E_1) \cdots P(H \mid E_n)}. \]

For cases in which (A1) is satisfied, if the focused correlation of E with respect to H is greater than 1, then there is more association in the evidence set E given H than there is in the evidence alone. So, when (A1) holds and ForH(E) > 1 we say that focused correlation is inflationary, and when (A1) holds and ForH(E) < 1 we say that it is deflationary. If ForH(E) = 1 we say that it is stable.

Wheeler (2009) connected inflationary focused correlation and positive incremental confirmation. Before examining the role of causal structure, we strengthen these connections for the case of evidence sets with two variables.

Consider hypothesis H and evidence sets E = {E1, E2} and E' = {E1, E3} satisfying assumption (A1). For each of the confirmation measures above, the confirmation of H on an evidence set E is positive (greater than 0) if ForH(E) is inflationary.

Proposition 1: If E is a positive evidence set for H, and ForH(E) > 1, then all of the following hold:

• r1(H,E) > 0
• r2(H,E) > 0
• l(H,E) > 0
• ko(H,E) > 0
• inc1(H,E) > 0
• inc2(H,E) > 0.

Proposition 1 says that for any evidence set E in which all the evidence individually confirms H, that is, whenever H and E satisfy (A1), if E has a focused correlation for H above 1, then E provides positive confirmation of H by any of these six popular measures of incremental confirmation. If a set of evidence has more conditional Shogenji coherence on H than it does unconditionally, then the evidence provides positive confirmation to H.

When we further assume that each piece of evidence is equally confirmatory to H individually, that is, when we strengthen the assumptions on evidence to satisfy both (A1) and (A2), then focused correlation tracks confirmation:

Proposition 2: If E = {E1, E2} and E' = {E1, E3}, and E ∪ E' is an equally positive evidence set for H, then all of the following inequalities are equivalent:

• ForH(E) > ForH(E')
• r1(H, E) > r1(H, E')
• r2(H, E) > r2(H, E')
• l(H, E) > l(H, E')
• ko(H, E) > ko(H, E')
• inc1(H, E) > inc1(H, E')
• inc2(H, E) > inc2(H, E').

So, in at least two important respects, focused correlation tracks confirmation and incremental confirmation, whereas simple coherence (association) does not. Looking at the formula for focused correlation, it is immediate that two equally positive evidence sets can have equal association while having unequal focused correlation and thus unequal confirmation.
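To make the last observation concrete, the following Python sketch compares two hypothetical evidence pairs, {E1, E2} and {E1, E3}. All probabilities and function names are our own illustrative assumptions and are not taken from the paper. The two tables agree on P(H) and on P(E1 | H), so they can be read as margins of a single distribution over (H, E1, E2, E3), and each piece of evidence is equally and positively relevant to H. The sketch computes the Shogenji coherence, the focused correlation in both of its forms, and the six confirmation measures for each pair.

```python
from math import isclose, log

def joint(p_h, given_h, given_not_h):
    """Joint table over worlds (h, e1, e2) built from P(H) and P(E1, E2 | H)."""
    table = {}
    for (e1, e2), p in given_h.items():
        table[(1, e1, e2)] = p_h * p
    for (e1, e2), p in given_not_h.items():
        table[(0, e1, e2)] = (1 - p_h) * p
    return table

def prob(table, h=None, e1=None, e2=None):
    """Probability that H, E1, E2 take the given 0/1 values (None = any value)."""
    want = (h, e1, e2)
    return sum(p for world, p in table.items()
               if all(w is None or w == v for w, v in zip(want, world)))

def shogenji(table, h=None):
    """S(E1, E2), or S(E1, E2 | H=h) when h is 0 or 1."""
    z = prob(table, h=h)
    both = prob(table, h, 1, 1) / z
    return both / ((prob(table, h, 1, None) / z) * (prob(table, h, None, 1) / z))

def focused(table):
    """For_H(E1, E2) = S(E1, E2 | H) / S(E1, E2)."""
    return shogenji(table, h=1) / shogenji(table)

def focused_posterior_form(table):
    """Equivalent form P(H | E1, E2) P(H) / (P(H | E1) P(H | E2)) for n = 2."""
    post_12 = prob(table, 1, 1, 1) / prob(table, None, 1, 1)
    post_1 = prob(table, 1, 1, None) / prob(table, None, 1, None)
    post_2 = prob(table, 1, None, 1) / prob(table, None, None, 1)
    return post_12 * prob(table, 1) / (post_1 * post_2)

def measures(table):
    """The six confirmation measures, reading inc1, inc2, r2 as in footnote 5."""
    prior = prob(table, 1)
    post_1 = prob(table, 1, 1, None) / prob(table, None, 1, None)
    post_12 = prob(table, 1, 1, 1) / prob(table, None, 1, 1)
    like_h = prob(table, 1, 1, 1) / prior
    like_not_h = prob(table, 0, 1, 1) / prob(table, 0)
    return {
        "r1": log(post_12 / prior),
        "r2": log(post_12 / post_1),
        "l": log(like_h / like_not_h),
        "ko": (like_h - like_not_h) / (like_h + like_not_h),
        "inc1": post_12 - post_1,
        "inc2": (post_12 - post_1) / (1 - post_1),
    }

P_H = 0.5  # illustrative prior
# Pair E = {E1, E2}: strongly associated given H, weakly associated given not-H.
E_PAIR = joint(P_H,
               {(1, 1): 0.5, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.3},
               {(1, 1): 0.02, (1, 0): 0.28, (0, 1): 0.28, (0, 0): 0.42})
# Pair E' = {E1, E3}: same marginals and same P(E1, E3), different conditionals.
E_PRIME = joint(P_H,
                {(1, 1): 0.4, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.2},
                {(1, 1): 0.12, (1, 0): 0.18, (0, 1): 0.18, (0, 0): 0.52})

for name, table in (("E", E_PAIR), ("E'", E_PRIME)):
    print(name, "Shogenji:", round(shogenji(table), 4),
          "focused correlation:", round(focused(table), 4))
    print("   ", {k: round(v, 4) for k, v in measures(table).items()})
```

With these assumed numbers the two pairs have the same Shogenji coherence, yet only {E1, E2} has inflationary focused correlation, and it receives the higher score on every one of the six measures, which is the pattern Propositions 1 and 2 describe.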
The equal relevance condition is theoretically important for this result because it isolates the role that coherence may or may not play in boosting the confirmation of a hypothesis. But this condition is too restrictive in practice, since positive evidence sets may have unequal strengths. One therefore might worry that Proposition 2 tells us more about the strength of the epe condition than it does about the virtues of focused correlation to track confirmation strength. This worry is overstated, however, since Schlosshauer and Wheeler (2011) have shown how to generalize Proposition 2 for positive evidence sets without (A2) when the ratio of P(H | E2) to P(H | E3) is bounded by a variable relevance condition:

(A2*) Variable Relevance:
\[ \frac{For_H(E_1,E_3)}{For_H(E_1,E_2)} < \frac{P(H \mid E_2)}{P(H \mid E_3)} \le 1. \]

Clearly, Proposition 2 holds as the special case when \( \frac{P(H \mid E_2)}{P(H \mid E_3)} = 1 \). Call a positive evidence set satisfying (A2*) a variable, positive evidence set. Then,

Proposition 2* (Schlosshauer and Wheeler 2011): Suppose E = {E1, E2}, E' = {E1, E3}, and E ∪ E' is a variable, positive evidence set for H, and conf_i ranges over the six incremental confirmation measures above. Then, ForH(E) > ForH(E') if and only if conf_i(H,E) > conf_i(H,E').

Proposition 2* tells us how the equal relevance assumption can be relaxed while preserving the bidirectional tracking between focused correlation and confirmation. The key idea behind replacing (A2) with (A2*) is that, if the individual strengths of relevant evidence remain within the general limits specified by (A2*), this suffices to guarantee bidirectional tracking. If instead one is interested in a specific incremental confirmation measure, or is interested in only unidirectional tracking, even less stringent limits may apply (Schlosshauer and Wheeler 2011).
Although focused correlation captures something about the relationship between coherence and confirmation, it does not represent the whole story, pace Myrvold (2003).[8] Consider again the two witnesses who have not conferred yet provide similar testimony implicating Mrs. White in the murder of Dr. Black. A natural way to make sure that witnesses do not coordinate their testimony about a hypothesis is to ensure that both evidence variables are conditionally independent given the hypothesis variable, a property that is sometimes called evidential independence.

[8] Myrvold (2003) is not concerned with coherence per se, but instead proposes a normalized form of focused correlation as an account of unified evidence for a hypothesis. While it is true that focused correlation controls all the parameters that determine the behavior of the most common incremental confirmation measures (and then some), there are logical / causal structures which regulate the relationships between evidence and hypothesis that escape the scope of focused correlation. Distributions satisfying the properties (A1) and (A3), described below and the motivation for Proposition 3, are an example.

(A3) Evidential Independence: any propositions E1, ..., En ∈ E are evidentially independent with respect to H iff both
(+) P(E1, ..., En | H) = P(E1 | H) × ... × P(En | H), and
(–) P(E1, ..., En | ¬H) = P(E1 | ¬H) × ... × P(En | ¬H).[9]

If we assume (A3) and positive relevance (A1) with regard to E = {E1, E2} and H, then the focused correlation of E1 and E2 with respect to H is strictly less than 1, thus the focused correlation is deflationary. However, the incremental confirmation of the hypothesis may still be positive.[10] Notice that this case is not a counterexample to Proposition 1 since the antecedent is not satisfied. However, it does show that Proposition 1 does not apply in the seemingly ideal case of independent witness testimonies. We return to this point in section 7.
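The deflationary case just described can be checked numerically. The Python sketch below uses assumed, purely illustrative likelihoods for two pieces of evidence that satisfy (A1) and (A3) with respect to H; the variable names are ours. It computes the conditional and unconditional Shogenji coherence, the focused correlation, and inc1(H, E1, E2).

```python
from math import isclose

P_H = 0.5                               # illustrative prior P(H)
GIVEN_H = {"E1": 0.8, "E2": 0.7}        # assumed P(Ei | H)
GIVEN_NOT_H = {"E1": 0.2, "E2": 0.3}    # assumed P(Ei | not-H)

def marginal(name):
    """P(Ei) by the law of total probability."""
    return P_H * GIVEN_H[name] + (1 - P_H) * GIVEN_NOT_H[name]

def posterior(name):
    """P(H | Ei) by Bayes' theorem."""
    return P_H * GIVEN_H[name] / marginal(name)

def posterior_given_not(name):
    """P(H | not-Ei) by Bayes' theorem."""
    return P_H * (1 - GIVEN_H[name]) / (1 - marginal(name))

# (A3): the joint likelihoods factor given H and given not-H.
p_both_given_h = GIVEN_H["E1"] * GIVEN_H["E2"]
p_both_given_not_h = GIVEN_NOT_H["E1"] * GIVEN_NOT_H["E2"]
p_both = P_H * p_both_given_h + (1 - P_H) * p_both_given_not_h

# Shogenji coherence conditional on H (numerator of focused correlation)
# and unconditionally (denominator of focused correlation).
s_given_h = p_both_given_h / (GIVEN_H["E1"] * GIVEN_H["E2"])
s_unconditional = p_both / (marginal("E1") * marginal("E2"))
focused = s_given_h / s_unconditional

# Incremental confirmation of H by E2 once E1 is known.
inc1 = P_H * p_both_given_h / p_both - posterior("E1")

print("S(E1,E2 | H) =", s_given_h)
print("S(E1,E2)     =", round(s_unconditional, 4))
print("For_H(E1,E2) =", round(focused, 4))
print("inc1         =", round(inc1, 4))
```

Under these assumptions the conditional coherence is exactly 1, the unconditional coherence exceeds 1, so the focused correlation falls below 1, while inc1 remains positive.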
Why does focused correlation capture something about the relationship between coherence and confirmation? And why does it work in some circumstances but not in others? The answer to both of these questions, we believe, depends on the causal structure governing the system.

[9] The positive (+) condition together with the negative condition (–) entails that (A3) defines evidentially independent variables with respect to a hypothesis variable; alternatively, we may stick to propositions and talk about each condition as one of two weaker variants of (A3), namely (A3+) and (A3–).

[10] To see that focused correlation is deflationary, notice that the numerator is 1, due to independence (A3), but the denominator is greater than 1, due to positive relevance (A1). Thanks to David Danks for this point. To see that the incremental confirmation is positive in this case, see Proposition 3 in section 6.

6. Causal Structure

The notion that causal relationships between hypothesis and evidence should play an important role in a theory of coherence is not a new one. Olsson (2002) remarks that:

We may safely conclude that coherence is not truth conducive if the reports are entirely dependent on each other .... On the other hand, it is implausible to require full independence for coherence to have the desirable effect; intuitively, a tiny influence of one report on the other does not cancel out the effect of coherence entirely (2002, p. 259).

What Olsson means by 'the reports are entirely dependent on each other' is that they directly cause each other. Similarly, Bovens and Hartmann (2003a, 2003b) describe a witness testimony model incorporating (A3) and an analogue to our (A1), arguing that 'coherence will play a confidence boosting role when the information sources are independent and partially reliable' (2003b, p. 604).
They too have at least a partially specified causal situation in mind:

The coherence of the story is of no consequence when the sources have had a chance to confer or when the sources are reporting what they inferred from the facts that other sources are reporting on.... (2003b, p. 604)

Moreover, even BonJour has remarked on the role that causal facts might play in coherentist justification:

The fact that a belief was caused in this way rather than some other can play a crucial role in a special kind of coherentist justification. The idea is that the justification of these perceptual or observational beliefs, rather than merely appealing to the coherence of their propositional contents with the contents of other beliefs (so that the way that the belief was produced would be justificationally irrelevant), appeals instead to a general belief that beliefs caused in this special way (and perhaps satisfying further conditions as well) are generally true (2002, pp. 206-7).

Our thesis is that causal facts are relevant to coherentism. Our proposal is to represent causal relationships directly within a theory of coherence using causal Bayes nets.

Causal Bayes Nets

The role of causal structure can be made more explicit and formal by using Causal Bayes Nets, which provide all the apparatus needed to represent causal systems,[11] and to characterize the constraints such structures impose on the probability distributions they might produce.

Let a causal graph G = {V,E} be a set of random variables V and a set of directed edges E such that Ei → Ej ∈ E if and only if Ei is a direct cause of Ej relative to V. The direct causes of a variable are its parents. A set of variables V is causally sufficient just in case for every pair of variables Vi, Vj ∈ V, the direct common causes of Vi, Vj are also in V.

[11] See Spirtes, Glymour, and Scheines 2000, and Pearl 2000.
An acyclic causal graph G over a causally sufficient set of variables V and a probability distribution P(V) satisfy the Causal Markov Axiom (Spirtes, Glymour and Scheines 2000) just in case P(V) factors according to the causal graph:

\[ P(\mathbf{V}) = \prod_{X \in \mathbf{V}} P(X \mid parents(X)). \]

This factorization[12] imposes independence constraints on the probability distributions (the set of P(V)'s) that can be generated by the causal graph. Those independence constraints are characterized by the graph-theoretic relation of d-separation (Pearl 1988), and they can be viewed as the non-parametric consequences of qualitative causal structure.

An additional axiom typically applied to causal Bayes nets is the Faithfulness assumption (Spirtes, et al. 2000). A graph G and a probability distribution P(V) over the variables[13] in G satisfy the Faithfulness Axiom just in case the only independence relations in P(V) are those entailed by the Causal Markov axiom.[14]

If causal structure alone plays a mediating role between coherence and confirmation, then that connection should be through the independence constraints in distributions that are Markov and Faithful to the causal graph, which accurately describes the qualitative causal relationships between the propositions comprising the evidence and the hypothesis.

[12] If X has no parents, then P(X | parents(X)) = P(X).

[13] Again, the Faithfulness Axiom applies to causally sufficient sets of variables.

[14] Pearl's d-separation relation characterizes the independence relations entailed by the Causal Markov axiom for any acyclic graph (Pearl 1988).

The Common Cause Model

One easy application of causal Bayes nets to the coherence debate is to causally interpret the model of partially reliable, independent witness reports discussed by Bovens and Hartmann (2003), Olsson (2002), and others. Figure 1, in which each Ri is a binary fact variable, Repi a binary witness report variable, and H a (hidden) binary hypothesis variable, gives the most plausible interpretation of the partially reliable witness report model of Bovens and Hartmann.

[Figure 1: Common Cause Model for Bovens and Hartmann. Nodes: H; R1, R2, ..., Rn; Rep1, Rep2, ..., Repn.]

A simplification of the Bovens-Hartmann model is the single-factor common cause model in Figure 2.

[Figure 2: Single-Factor Common Cause Model. Nodes: H; E1, E2, ..., En. Edge labels: a, b, c.]

Interpreted as a causal Bayes net, this model entails (A3); that is, within a single-factor common cause model, any pair of evidence variables are independent conditional on H: ∀i,j, Ei _||_ Ej | H.[15] How then does the causal structure in such a model mediate the relationship between coherence and confirmation?

[15] Ei _||_ Ej | H is to be read: Ei is independent of Ej conditional on H, where Ei, Ej, and H are random variables, or sets of random variables. If Ei, Ej, and H are naturally interpreted as events, then they can just as easily be represented as random variables with binary outcomes, e.g., Ei=0 for the event did not occur, and Ei=1 for the event occurred.

The answer is that the coherence between pieces of evidence in this model is entirely due to the relationship between the hypothesis and each piece of evidence individually. More precisely, the correlation between any pair of evidence variables Ei and Ej in a single-factor common cause model is just the product of the correlations between Ei and H and Ej and H.[16] In Figure 2, for example, let a parameterize the correlation between the hypothesis H and the evidence E1, b the correlation between H and E2, and c the correlation between H and En. Then, ρE1,E2 = ab, ρE1,En = ac, ρE2,En = bc.
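The product rule for correlations can be verified by brute-force enumeration. The Python sketch below builds a single-factor common cause model with three evidence variables, using the Causal Markov factorization P(H) × ∏ P(Ei | H). The prior and likelihoods are assumed values chosen only for illustration, the third evidence variable plays the role of En, and the helper names are ours.

```python
from itertools import product
from math import isclose, sqrt

P_H = 0.4  # illustrative prior P(H)
# Assumed (P(Ei | H), P(Ei | not-H)) for E1, E2, E3; E3 stands in for En.
LIKELIHOODS = [(0.9, 0.2), (0.7, 0.3), (0.6, 0.5)]

def build_joint():
    """Joint over worlds (h, e1, e2, e3) via P(H) * prod_i P(Ei | H)."""
    joint = {}
    for world in product((1, 0), repeat=4):
        h, evidence = world[0], world[1:]
        p = P_H if h else 1 - P_H
        for e, (given_h, given_not_h) in zip(evidence, LIKELIHOODS):
            q = given_h if h else given_not_h
            p *= q if e else 1 - q
        joint[world] = p
    return joint

def prob(joint, *indices):
    """Probability that every variable at the given positions is true."""
    return sum(p for world, p in joint.items() if all(world[i] for i in indices))

def corr(joint, i, j):
    """Pearson correlation of two binary variables (positions i and j)."""
    p_i, p_j = prob(joint, i), prob(joint, j)
    cov = prob(joint, i, j) - p_i * p_j
    return cov / sqrt(p_i * (1 - p_i) * p_j * (1 - p_j))

JOINT = build_joint()
# Position 0 is H; positions 1, 2, 3 are E1, E2, E3.
a, b, c = corr(JOINT, 0, 1), corr(JOINT, 0, 2), corr(JOINT, 0, 3)

print("rho(E1,E2) =", corr(JOINT, 1, 2), " a*b =", a * b)
print("rho(E1,E3) =", corr(JOINT, 1, 3), " a*c =", a * c)
print("rho(E2,E3) =", corr(JOINT, 2, 3), " b*c =", b * c)
```

Each printed pair should agree up to floating-point error, whatever positive values are substituted for the assumed prior and likelihoods.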
This leads to the conjecture that, in a single-factor common cause model that satisfies positive relevance (A1) and in which the prior probabilities of E2 and E3 are the same, if ρE1,E2 > ρE1,E3, then after knowing E1, the incremental confirmation provided by E2 to H exceeds that provided by E3. More formally, we have the following proposition about the relationship between correlation and confirmation in this class of models:

Proposition 3. If {E1, E2, E3} satisfies positive relevance (A1) and independence (A3) with respect to H, and P(E2) = P(E3), then ρE1,E2 > ρE1,E3 ⇔ inci(H, E1, E2) > inci(H, E1, E3).

In single-factor common cause models, coherence among the evidence arises from the individual relationships between the hypothesis and the evidence. So, for example, it is impossible within this class of models for two sets of equally positive and independent evidence to have different levels of correlation or different levels of Shogenji coherence:

Proposition 4. If E = {E1, E2} and E' = {E1, E3} satisfy positive relevance (A1), equal relevance (A2), and independence (A3) with respect to H, and P(E2) = P(E3), then ρE1,E2 = ρE1,E3 and S(E1, E2) = S(E1, E3).

16 In a singly connected CBN with only binary variables, the correlation of any two variables is the product of the correlations between every pair of variables connected by an edge on the trek between them (Danks and Glymour 2001). Thus, if X, Y, Z occur in a singly connected CBN, with Y on the trek between X and Z, then: ρXZ = ρXY * ρYZ. The idea is simple, but the jargon requires some explanation. A network is singly connected just in case there is at most one undirected path between every pair of variables. A trek from X to Y is either a directed path from X to Y, a directed path from Y to X, or the concatenation of two directed paths from a third variable Z to both X and Y. For example, the only trek between Rep1 and Rep2 in Figure 1 is: Rep1 ← R1 ← H → R2 → Rep2.
In Figure 2, the only trek between E1 and E2 is: E1 ← H → E2. As all single-factor common cause models are singly connected and because all connections are treks, the correlation between any two pieces of evidence Ei and Ej (i≠j) is the product of the correlation between Ei and the hypothesis H and the correlation between H and Ej.

Independence (A3) is necessary for Proposition 4, and single-factor common cause models entail (A3). Interestingly, any model in which the hypothesis d-separates the evidence also entails (A3). So, for example, Figure 3 also satisfies (A3).

Figure 3: Alternative to Common Cause Model

Moreover, if H and all Ei are binary propositional variables, then any probability distribution that can be parameterized by the single-factor common cause structure in Figure 2 can also be parameterized by Figure 3, and vice versa.

One motivation for the common cause model arises from the view that coherence should confirm a hypothesis exactly when the explanation provided by that hypothesis, when true, is the source of the coherence. Since causes explain and common causes produce coherence, common cause models would seem to fit the bill. Jonathan Cohen (1977, p. 98) discusses an explanation-based conception of coherence in which the co-occurrence of a set of propositions is explained by a particular hypothesis. Cohen's explanation-based coherence contrasts the probability of the co-occurrence of the evidence when the hypothesis is true against the probability of that co-occurrence when the hypothesis is false. As this is basically a variation of the measure l from section 3, we might formalize Cohen's idea as follows:

\[ C_1(E_1, E_2, \ldots, E_n, H) = \frac{P(E_1, E_2, \ldots, E_n \mid H)}{P(E_1, E_2, \ldots, E_n \mid \neg H)}. \]
A similar measure assesses the ratio of how much increase in the probability of co-occurrence of the evidence is gained from supposing the hypothesis false to supposing it true, over how much could have been gained:

\[ C_2(E_1, E_2, \ldots, E_n, H) = \frac{P(E_1, E_2, \ldots, E_n \mid H) - P(E_1, E_2, \ldots, E_n \mid \neg H)}{1 - P(E_1, E_2, \ldots, E_n \mid \neg H)}. \]

How does Cohen's explanation-based conception of coherence relate to confirmation in common cause models? In the simplest case, in which two evidence sets that share a common member are compared, E = {E1, E2} and E' = {E1, E3}, this amounts to asking, if the explanation-based association of E is larger than that of E', whether that difference in association entails that E2 provides more incremental confirmation than E3. In other words, the question is, for i = 1, 2, does Ci(E,H) > Ci(E',H) entail inc1(H, E1, E2) > inc1(H, E1, E3)? Interestingly, the answer is no, unless E ∪ E' satisfies (A1), (A2), and (A3),17 in which case the claim is trivially true because the antecedent cannot be satisfied.18

Coherence and Causation

So far we have only considered causal models that entail (A3). But since not every causal model satisfies (A3), it is natural to consider how causal structure can constrain or mediate the relationship between coherence and confirmation in general. To begin to address this question, consider a causal model (Figure 4) that simultaneously represents three important limit cases:

1. Independence (A3): all of the coherence among the evidence is because of the hypothesis (e.g., E = {E1,E2}).
2. None of the coherence among the evidence is because of the hypothesis (e.g., E' = {E1,E3}).
3. The evidence has no coherence, but each piece of evidence is individually relevant to the hypothesis (e.g., E'' = {E1,E4}).

Figure 4: Causal Model of the Murder of Dr. Black

17 They satisfy (A3) in virtue of the causal structure.
18 To compare this result with Olsson's (2005, pp. 126-33), he assumes (A1) and (A2) but not (A3).
Figure 4 node labels: H = White kills Black; E1 = Windfall for Black; E2 = Miss Scarlett reports that H; E3 = The newspaper reports that Black is rich; E4 = White is bankrupt; E5 = Col. Mustard reports that H; E6 = Professor Plum reports that H.

The hypothesis of interest, H, is whether Mrs. White murdered Dr. Black. There are several pieces of evidence relevant to this hypothesis. E1 is whether or not Black receives a large inheritance prior to his death, and E4 is whether or not White is recently bankrupt. We code E1 = 1 as 'windfall' and E4 = 1 as 'bankrupt' so that both are positively relevant to H. Both of these facts are evidence for, but also causes of, the hypothesis of interest. We will assume that whether or not White is recently bankrupt has no causal connection to Black inheriting a fortune, so E1 and E4 are causally and probabilistically independent. Proposition E3 is the published newspaper report that Dr. Black struck it rich. As any reader of newspapers knows, gossip columns are only partially reliable.19 Still, we assume that such a report is an effect of whether or not Dr. Black is in fact wealthy, and probabilistically independent of everything else given the state of his finances. Finally, we have three testimonies on H by three partially reliable witnesses: Miss Scarlett (E2), Colonel Mustard (E5), and Professor Plum (E6).

The independence relations entailed by the Causal Markov axiom applied to this model are numerous:

1. {E1, E3, E4} _||_ {E2, E5, E6} | H.20
2. E2 _||_ E5 | H, E2 _||_ E6 | H, E5 _||_ E6 | H.
3. {E1, E3} _||_ E4.
4. E1 _||_ E4 | E3.
5. {E2, E5, E6} _||_ E3 | any non-empty subset of {E1, H}.
6. H _||_ E3 | any subset of {E1, E2, E4, E5, E6} that contains E1.

We assume that any joint distribution P over these variables is Faithful to the causal graph in Figure 4. That is, no other independence relations over these variables hold in P.21 Consider first the two evidence sets, E = {E1, E2} and E' = {E1, E3}.
The coherence in E arises for the same reason that different effects of a common cause are coherent: any coherence between E1 and E2 is the result of the connection between E1 and H and between H and E2. The evidence set E' marks the other extreme: none of the coherence between E1 and E3 is the result of the correlation between E1 and H and between H and E3. If E and E' have identical coherence, do they afford different degrees of confirmation to H? Since both sets share E1, this reduces to the question of whether the incremental confirmation for H afforded by E2 always exceeds that of E3, or vice versa, or neither. By the causal structure of this model, H and E3 are independent conditional on E1, P(H | E1) = P(H | E1, E3), thus E3 provides zero incremental confirmation after E1. Thus, the question of whether E and E' afford different degrees of confirmation to H reduces to asking whether E2 provides positive incremental confirmation to H conditional on E1, i.e., P(H | E1, E2) > P(H | E1). The answer is yes, and it makes no difference how strong the relationship between H and E2 is, so long as it is positive.

Proposition 5: If E = {E1, E2} and E' = {E1, E3} are positive evidence sets for H, then in any probability distribution P(H, E1, E2, E3) that is Markov and Faithful to the causal graph in Figure 4, inc1(H,E1,E2) > inc1(H,E1,E3).

So coherence plays no role whatsoever in this case. It is the causal structure of the situation that determines the result.

19 In the sense of Bovens and Hartmann (2003a); that is, P(E3 | ¬E1) ≤ P(E3 | E1) < 1.
20 The independence relations entailed by the graph are over variables. As the variables are binary, representing the truth of the propositions they express, this independence actually denotes: E1 _||_ E2 | H & E1 _||_ E2 | ¬H. A similar remark applies for each independence relation in the list that follows.
21 Faithfulness is explained in chapter 3 of (Spirtes, Glymour, and Scheines 2000).
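The comparison behind Proposition 5 can be illustrated with a small numerical example. The Python sketch below is an illustration under stated assumptions rather than the authors' own computation: it restricts Figure 4 to the fragment E1 → H → E2 together with E1 → E3, reads inc1(H, E1, Ek) as P(H | E1, Ek) − P(H | E1), and uses hypothetical parameter values and helper names (`bern`, `joint`, `prob`, `cond`).

```python
from itertools import product
from math import isclose

VARS = ("E1", "H", "E2", "E3")  # order of values in each joint-table key


def bern(p_true, value):
    """Probability that a binary variable with P(1) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true


def joint():
    """Joint for the assumed fragment E1 -> H -> E2 and E1 -> E3."""
    dist = {}
    for e1, h, e2, e3 in product((0, 1), repeat=4):
        dist[(e1, h, e2, e3)] = (
            bern(0.4, e1)                 # P(E1): windfall (hypothetical value)
            * bern((0.1, 0.5)[e1], h)     # P(H | E1): motive raises P(H)
            * bern((0.2, 0.7)[h], e2)     # P(E2 | H): partially reliable witness
            * bern((0.1, 0.9)[e1], e3)    # P(E3 | E1): partially reliable newspaper
        )
    return dist


def prob(dist, **fixed):
    """Marginal probability that the named variables take the given values."""
    idx = {name: i for i, name in enumerate(VARS)}
    return sum(
        p for key, p in dist.items()
        if all(key[idx[name]] == val for name, val in fixed.items())
    )


def cond(dist, target, **given):
    """Conditional probability P(target | given)."""
    return prob(dist, **target, **given) / prob(dist, **given)


dist = joint()
p_h_e1 = cond(dist, {"H": 1}, E1=1)
inc_e2 = cond(dist, {"H": 1}, E1=1, E2=1) - p_h_e1  # witness report after E1
inc_e3 = cond(dist, {"H": 1}, E1=1, E3=1) - p_h_e1  # newspaper report after E1
print(inc_e2, inc_e3)
```

With these hypothetical numbers the newspaper report adds nothing once E1 is known, because E1 screens off E3 from H, while the witness report raises the probability of H regardless of how correlated E3 is with E1.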
No Coherence

Now consider evidence sets E = {E1, E2} and E'' = {E1, E4}. From the causal graph in Figure 4, we know that E1 and E4 are probabilistically independent, so E'' has zero association, which means zero correlation and a Shogenji coherence equal to 1. Is it nevertheless possible for E'' to provide more confirmation to H than E, even though E has positive coherence? The answer, in a surprisingly wide range of cases, is yes.

Proposition 6: In cases for which E and E'' are equally positive evidence (epe) sets for H, then in any probability distribution P(H, E1, E2, E4) that is Markov and Faithful to the causal graph in Figure 4, inc1(H,E1,E4) > inc1(H,E1,E2) if and only if α/β > S(E1, E2), where

\[ \alpha = \frac{P(H \mid E_1, E_4)}{P(H \mid E_1)} \quad \text{and} \quad \beta = \frac{P(H \mid E_4)}{P(H)} = \frac{P(H, E_4)}{P(H)P(E_4)}. \]

The incremental confirmation from an evidence set with no coherence (E'') exceeds the confirmation from an evidence set with positive coherence (E) just in case the ratio of the incremental confirmation provided by E4 after knowing E1 to the confirmation provided by E4 alone is greater than the coherence of E.

Clearly these propositions are just the tip of the iceberg. Most are restricted to simple evidence sets that overlap, others require fairly strong assumptions, and others involve only particular measures of coherence and confirmation. What we hope is clear, however, is that a program in which one directly models the causal reason for coherence will aid in the project of explicating the relationship between coherence and confirmation.

7. Discussion

The results in Propositions 1 and 2, when considered in the context of causal Bayes nets, can appear confusing and counterintuitive. Proposition 1 gives a sufficient condition for positive confirmation of a hypothesis H from an evidence set E.
If the coherence of E, conditional on H, is greater than the coherence of E simpliciter, that is, if ForH(E) > 1, then E confirms H.22 Consider a few simple versions of the structures we have considered earlier, which are displayed in Figure 5, again assuming positive relevance for each piece of evidence Ei. Figure 5: Three Different Hypothesis – Evidence Relationships In graph A, ForH(E) < 1. This is because E1 and E2 are independent, so S(E) = 1, but E1 and E2 are negatively associated conditional on H, so S(E| H) < 1. In graphs B and C, ForH(E) is also less than 1. This is because E1 and E2 are independent conditional on H, so S(E | H) = 1, but E1 and E2 are positively associated, so S(E) > 1. In elaborations of each of these structures that involve adding a causal connection between E1 and E2, it is possible to parameterize the model such that ForH(E) > 1, but in the simple structures pictured in Figure 5, Proposition 1 cannot be activated because the antecedent is false. Beginning with a piece of evidence E1 that is a cause of H, as in graphs A and B, and choosing between getting a new piece of evidence E2 that is a cause of H (graph A) or an effect of H (graph B), where both E1 and E2 are individually equally correlated with H, which structure ought one to prefer if the goal is to maximize the confirmation of H? If E2 is selected to be a cause of H (graph A), then one would be opting for evidence that has no coherence. If E2 is selected to be an effect of H (graph B), then one would be opting for evidence that has all of its coherence through H. As Proposition 6 shows, neither choice dominates; the outcome depends on a subtle inequality. 22 Always assuming positive relevance, (A1), of each member of E. 
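The claims about graphs A and B can be illustrated numerically. The Python sketch below rests on assumptions that should be read as such: it takes graph A to be the collider E1 → H ← E2 and graph B to be the chain E1 → H → E2, takes S to be the Shogenji ratio P(E1, E2) / (P(E1)P(E2)), and takes ForH(E) to be S(E | H) / S(E), as the surrounding discussion suggests. All parameter values and function names are hypothetical, and a different parameterization of graph A need not give the same conditional association.

```python
from itertools import product
from math import isclose


def bern(p_true, value):
    """Probability that a binary variable with P(1) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true


def collider_joint():
    """Graph A (assumed): E1 -> H <- E2, with keys (e1, e2, h)."""
    p_h = {(0, 0): 0.05, (1, 0): 0.5, (0, 1): 0.5, (1, 1): 0.75}
    return {
        (e1, e2, h): bern(0.5, e1) * bern(0.5, e2) * bern(p_h[(e1, e2)], h)
        for e1, e2, h in product((0, 1), repeat=3)
    }


def chain_joint():
    """Graph B (assumed): E1 -> H -> E2, with keys (e1, e2, h)."""
    return {
        (e1, e2, h): bern(0.5, e1) * bern((0.2, 0.7)[e1], h) * bern((0.2, 0.7)[h], e2)
        for e1, e2, h in product((0, 1), repeat=3)
    }


def mass(dist, e1=None, e2=None, h=None):
    """Total probability of the keys matching every value that is not None."""
    want = (e1, e2, h)
    return sum(
        p for key, p in dist.items()
        if all(w is None or w == v for w, v in zip(want, key))
    )


def shogenji(dist, h=None):
    """S(E1, E2) when h is None, otherwise S(E1, E2 | H = h)."""
    z = mass(dist, h=h)  # normalizer: 1 if unconditional, P(H = h) otherwise
    both = mass(dist, 1, 1, h) / z
    return both / ((mass(dist, 1, None, h) / z) * (mass(dist, None, 1, h) / z))


def focused_correlation(dist):
    """For_H(E) read as S(E | H) / S(E)."""
    return shogenji(dist, h=1) / shogenji(dist)


for name, dist in (("A", collider_joint()), ("B", chain_joint())):
    print(name, shogenji(dist), shogenji(dist, h=1), focused_correlation(dist))
```

For these illustrative parameters, graph A gives S(E) = 1 with S(E | H) below 1, and graph B gives S(E | H) = 1 with S(E) above 1, so the focused correlation falls below 1 in both structures.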
Proposition 2 demonstrates that focused correlation tracks confirmation when comparing a pair of equal positive evidence sets for H or, within bounds, a pair of variable positive evidence sets for H.23 Readers familiar with the impossibility theorems of Erik Olsson (2005) and Luc Bovens and Stephan Hartmann (2003a) may wonder how this can be true. Olsson's result, for example, shows that 'there are no informative coherence measures that are truth conducive ceteris paribus in a basic Lewis scenario' (Olsson 2005, p. 213). The ceteris paribus conditions for Olsson's result, partial reliability and independence, are shared with Bovens and Hartmann's witness model, and those conditions correspond to our (A1) and (A3), respectively. There are subtle and important differences between these witness models and our own framework, but we view (A1) and (A3) to be the signature of Bayesian witness models, and our Propositions 3 and 4 show how evidential coherence is completely determined by the strength of individual evidence within Bayesian witness models. Specifically, Proposition 4 is our simplified and generalized version of Olsson's impossibility result. This result shows that there can be no difference in coherence between equally positive relevant evidence sets for H (A2) which satisfy evidential independence and P(Ei) = P(Ej), for all individual pieces of evidence. Proposition 3 shows that any difference in coherence between two positive evidence sets will be directly due to a difference in evidential strength.

This brings us back to the beginning of this essay and how to explain why colluding witnesses offer less compelling testimony than independent witnesses for the claim that White killed Black. Proposition 5 tells us that collusion is always worse than positive evidence offered by independent witness reports. Neither evidential coherence nor our own impossibility result about Bayesian witness models holds any sway.
Our approach to the riddle of coherence is different from Bayesian epistemology in at least four respects. First, we reject a central tenet of Bayesian epistemology, which is that the relationship between coherence and likelihood of truth is fully determined by probability alone (Bovens and Hartmann 2003a, pp. 12 & 27). In our view, it is necessary to take into consideration the causal structure that might regulate the relationships between evidence and a hypothesis. Second, we think that it is a mistake to focus on the specific formulation of 'truth conduciveness' and 'coherence' before understanding the general principles for how association, incremental confirmation, and causal structure fit together. Our strategy has been to start with what we believe are the most common incremental confirmation measures and very basic approaches to measuring probabilistic association and to explore general principles for how the two constrain one another given various causal structures. Third, on the Bayesian view, models of witness testimony are believed to characterize an ideal class of models within which to explore the general relationship between measures of coherence and likelihood of truth. In our view, this has it exactly backwards. What is surprising about Bayesian witness models is that, while designed to capture a pre-theoretic truth about ideal witness testimony which appears to be charitable to coherence theory, they specify conditions that are inimical to understanding how probability and likelihood of truth fit together. Last, although there are some exceptions (Douven and Meijs 2007), most Bayesian coherence measures attempt to combine logical and probabilistic notions of coherence. However, we think that these two notions are better kept separate.

23 For evidence sets of size 2 with a common variable.

8. Conclusion

Explicating notions of coherence and confirmation has occupied philosophers of science for hundreds of years.
Further, most every philosopher since William Whewell who has discussed both notions has connected them. Recently, many have tried to model these ideas and the connection between them using only the probability calculus. Attempts to connect coherence simpliciter to confirmation are bound to fail, as probabilistic models of coherence make no reference to either the reason for coherence or the reason any piece of evidence in a set of evidence should relate to the hypothesis. In our view, any such efforts ought to include, explicitly in the formalism, both the reason the evidence is coherent and how the evidence is causally related to the hypothesis. We have tried to argue that focused correlation and causal structure move in this direction. Since evidence can be causally connected to other evidence and to the hypothesis in virtually any way possible, it turns out to be very useful to explicitly and formally model the causal structure governing the evidence and the hypothesis. Even when one connects causal structure to probability only qualitatively through independence and conditional independence, quite a lot about the relationship between coherence and confirmation can be adduced. In cases in which all the evidence are effects of the hypothesis and otherwise causally independent, coherence and confirmation are tightly connected.24 In cases in which the coherence between the evidence has nothing causally to do with the hypothesis, coherence and confirmation are utterly disconnected. In cases in which pieces of evidence are not caused by the hypothesis nor cause each other, the story is more complicated, but extremely rich nonetheless. 24 Philosophers, statisticians, and computer scientists have learned a lot about how to tell, from data, whether or not a set of measured variables are indeed effects of an unmeasured common cause and otherwise causally independent, and so this case is epistemically particularly exciting. 
See Silva, Scheines, Glymour, and Spirtes (2006), Junker and Ellis (1997), and Glymour (1998). 24 We have not offered a proof that focused correlation and/or causal structure are the only keys to the castle, nor do we think one is forthcoming. Nor have we offered anything approaching a complete theory of coherence and confirmation through focused correlation and causal structure. For one thing, we have concentrated on evidence sets of size 2, and difficulties loom for attempts to make comparisons of larger evidence sets (Bovens and Hartmann 2006). Focused correlation is defined for arbitrary-sized information sets, but confirmation, covariance, and correlation are here conceived of as binary relationships, or ternary in conditional form. Thus, studying the relationship between the focused correlation of evidence sets greater than size two and incremental confirmation, covariance or Pearson's correlation, will require a decision as to how to partition the evidence set. Specifically, there are many incremental confirmation questions that are compatible with one focused correlation problem involving an evidence set of size greater than two. To expect otherwise is a category mistake, and negative results should be no surprise.25 References BonJour, Laurence 1985: The Structure of Empirical Knowledge. Cambridge, MA, Harvard University Press. BonJour, Laurence 1999: 'The Dialectics of Foundationalism and Coherentism'. In Greco and Sosa 1999, pp. 117-42. BonJour, Laurence 2002: Epistemology. Oxford: Rowman and Littlefield. Bovens, Luc and Stephan Hartmann 2003a: Bayesian Epistemology. Oxford: Oxford University Press Bovens, Luc and Stephan Hartmann 2003b: 'Solving the Riddle of Coherence'. Mind, 112, pp. 601-33. Bovens, Luc and Stephan Hartmann 2006: 'An Impossibility Result for Coherence Rankings'. Philosophical Studies, 128, pp. 77-91. 
25 We are grateful to David Danks , Kenny Easwaran, Clark Glymour, Christopher Hitchcock, Rune Nyrup, Teddy Seidenfeld, and Max Schlosshauer for many helpful discussions, and thanks especially to David and Teddy for some key algebraic insights. Thanks also to two anonymous referees for their thorough and very constructive remarks. The research was supported in part by award LogiCCC/0001/2007 from the European Science Foundation. 25 Bovens, Luc and Erik Olsson 2000: 'Coherentism, Reliability, and Bayesian Networks', Mind, 109: 685-719. Breese, J. and D. Koller (eds) 2001: Uncertainty in artificial intelligence: Proceedings of the 17th conference (UAI-2001). San Francisco: Morgan Kaufmann. Carnap, Rudolf 1962: The Logical Foundations of Probability. Chicago: University of Chicago Press. Cohen, L. J. 1977: The Probable and the Provable. Oxford: Clarendon Press. Cross, Charles B. 1999: 'Coherence and Truth Conducive Justification'. Analysis, 59(3), pp. 186-93. Crupi, V., K. Tentori, and M. Gonzalez 2007: 'On Bayesian Measures of Evidential Support: Theoretical and empirical issues'. Philosophy of Science, 74(2), pp. 229-52. Danks, David and Clark Glymour 2001: 'Linearity Properties of Bayes Nets with Binary Variables'. In Breese and Koller 2001, pp. 98-104. Douven, Igor and Wouter Meijs 2007: 'Measuring Coherence'. Synthese, 156(3), pp. 405-25. Earman, John 1992: Bayes or Bust: A Critical Examination of Bayesian Confirmation Theory. Cambridge, MA: MIT Press. Eells, Ellery and Branden Fitelson 2002: 'Symmetries and Asymmetries in Evidential Support'. Philosophical Studies, 107(2), pp. 129-42. Ewing, Alfred C. 1934: Idealism: A Critical Survey. London: Methuen. Fitelson, Branden 2003: 'A Probabilistic Theory of Coherence'. Analysis, 63, pp. 194-99. Greco, John and Ernest Sosa (eds) 1999: The Blackwell Guide to Epistemology. Malden, MA: Blackwell. Glass, D. H. 2006: 'Coherence Measures and their Relations to Fuzzy Similarity and Inconsistency in Knowledge Bases'. 
Artificial Intelligence Review, 26, pp. 227-49. 26 Glymour, Clark 1998: 'What Went Wrong: Reflections on Science by Observation and The Bell Curve'. Philosophy of Science, 65(1), pp. 1-32. Glymour, C., R. Scheines, P. Spirtes, and K. Kelly 1987: Discovering Causal Structure. London: Academic Press. Haenni, R. J.W. Romeyn, G. Wheeler, and J. Williamson 2011: Probabilistic Logic and Probabilistic Networks, Dordrecht: The Synthese Library. Huemer, Michael 1997: 'Probability and Coherence Justification'. The Southern Journal of Philosophy, 35, pp. 463-72. Jeffrey, Richard 1965: The Logic of Decision. New York: McGraw-Hill. Junker, B. W. and J. L. Ellis 1997: 'A Characterization of Monotone Unidimensional Latent Variable Models'. The Annals of Statistics, 25, pp. 1327-43. Klein, Peter and Ted Warfield 1994: 'What Price Coherence?' Analysis, 54(3), pp.129-32. Lewis, C. I. 1946: An Analysis of Knowledge and Valuation. La Salle: Open Court. Meijs, Wouter 2004: 'A Corrective to Bovens and Hartmann's Measure of Coherence'. Philosophical Studies, 133(2), pp. 151-80. Myrvold, Wayne 1996: 'Bayesianism and Diverse Evidence: A Reply to Andrew Wayne'. Philosophy of Science, 63, pp. 661-5. Olsson, Erik J. 2002: 'What is the Problem of Coherence and Truth?' Journal of Philosophy, 94, pp. 246-72. Olsson, Erik J. 2005: Against Coherence: Truth, Probability and Justification. Oxford: Oxford University Press. Schlosshauer, Maximillian and Gregory Wheeler 2011: 'Focused Correlation, Confirmation, and the Jigsaw Puzzle of Variable Evidence'. Philosophy of Science, 78(3), pp. 276-92. Shogenji, Tomoji 1999: 'Is Coherence Truth Conducive?' Analysis, 59, 1999, 338-45. 27 Silva, R., C. Glymour, R. Scheines, and P. Spirtes 2006: 'Learning the Structure of Latent Linear Structure Models'. Journal of Machine Learning Research, 7, pp. 191-246. Spirtes, P., C. Glymour, and R. Scheines 2000: Causation, Prediction, and Search. 2nd edition. Cambridge, MA: MIT Press. 
Wheeler, Gregory 2009: 'Focused Correlation and Confirmation'. The British Journal for the Philosophy of Science, 60(1), pp. 79-100. 28 Appendix Proposition 1. Theorems 1-6 establish that positive focused correlation of a positive evidence set E for H entails positive confirmation of H given E. We omit countermodels falsifying the converse relation. Theorem 1. Let {E1, E2} be a positive evidence set for H. Then, ForH (E1,E2) > 1 ⇒ inc1(H, E1,E2) > 0. Proof: We wish to show that, for any positive evidence set {E1,E2} for H, ForH (E1,E2) > 1 only if P(H | E1,E2) P(H | E1) > 0. Suppose ForH (E1,E2) > 1. Both P(H | E1) > P(H) and P(H | E2) > P(H) by (A1). We now show that P(H | E1, E2) > P(H | E1). € ForH (E1,E2) = P(H | E1,E2) P(H) × P(H)P(E1) P(H,E1) × P(H)P(E2) P(H,E2) > 1 = P(E1,E2,H) P(H,E2) × P(E1) P(H,E1) × P(H)P(E2) P(H,E2) > 1 = P(E1,E2,H) P(H,E2) × P(E1) P(H,E1) ×ε > 1, where ε < 1 by (A1); So, = P(H | E1,E2) ×ε > P(H | E1). Thus, whenever {E1,E2} is positive evidence for H and ForH (E1,E2) >1, then inc1(H, E1,E2) > 0. ♦ Theorem 2. Let {E1,E2} be a positive evidence set for H. Then, ForH (E1,E2) >1 ⇒ inc2(H, E1,E2) > 0. Proof: From Theorem 1, whenever {E1,E2} is positive evidence for H and ForH (E1,E2) >1, it follows immediately that inc2(H, E1,E2) > 0 unless , but P(H | E1) cannot equal 1, by (A1). Thus, whenever {E1,E2} is positive evidence for H and ForH (E1,E2) >1, then inc2(H, E1,E2) > 0. ♦ Theorem 3. Let {E1,E2} be a positive evidence set for H. Then, ForH (E1,E2) >1 ⇒ r1(H, E1,E2) > 0. Proof: By (A1), P(H)/P(H | E1) < 1 and P(H)/P(H | E2) < 1. So, given ForH (E1,E2) >1, P(H | E1, E2) > P(H). It follows immediately that log[P(H | E1, E2) / P(H)] > 0. ♦ € P(H | E1) =1 29 Theorem 4. Let {E1,E2} be a positive evidence set for H. Then, ForH (E1,E2) >1 ⇒ r2(H, E1,E2) > 0. Proof: By theorem 1, if ForH (E1,E2) >1 then P(H | E1, E2) × ε > P(H | E1), and ε < 1. Then P(H | E1, E2) > P(H | E1). Therefore, log[P(H | E1, E2) / P(H | E1)] > 0.♦ Theorem 5. 
Let {E1,E2} be a positive evidence set for H. Then, ForH (E1,E2) >1 ⇒ ko(H, E1,E2) > 0. Proof: By (A1), P(H)/P(H | E1) < 1 and P(H)/P(H | E2) < 1. So given ForH (E1,E2) >1, then P(H | E1, E2) / P(H) > 1. Hence, (i) P(H | E1, E2) > P(H | E2) > P(H) , therefore (ii) P(¬H | E1, E2) < P(¬H | E2) < P(¬H). Now we wish to show that € P(E1,E2 | H) − P(E1,E2 |¬H) P(E1,E2 |¬H) + P(E1,E2 |¬H) > 0. Observe: (iii) € P(H,E1,E2) P(H)P(E1,E2 | H)+P(E1,E2 |¬H) − P(¬H,E1,E2) P(¬H)P(E1,E2 | H)+P(E1,E2 |¬H) > 0, therefore (iv) € P(H,E1,E2) P(H)α − P(¬H,E1,E2) P(¬H)α > 0. Hence, P(E1, E2| H) × 1/α > P(E1, E2|¬H) × 1/α. Therefore, ko(H, E1, E2) iff P(E1, E2| H) > P(E1, E2|¬H) iff (v) € P(H | E1,E2)P(E1,E2) P(H) > P(¬H | E1,E2)P(E1,E2) P(¬H) , which is ensured by (i) and (ii). So, ForH (E1,E2) >1 entails ko(H, E1,E2) > 0 whenever {E1,E2} is positive evidence for H. ♦ Theorem 6. Let {E1,E2} be a positive evidence set for H. Then, ForH (E1,E2) >1 ⇒ l(H, E1,E2) > 0. Proof: By (A1), P(H)/P(H | E1) < 1 and P(H)/P(H | E2) < 1. and by hypothesis we suppose that focused correlation is greater than 1. Therefore, from Theorem 4, since these conditions entail P(E1, E2| H) > P(E1, E2|¬H), it follows immediately that log[P(E1, E2| H) / P(E1, E2|¬H)] > 0. ♦ 30 Proposition 2. Lemma 1. Let {E1,E2} and {E1,E3} be epe-evidence sets for H. Then, (a) If (P(H | E1, E2)P(H) / P(H | E1) P(H | E2)) = ((P(H | E1, E3)P(H)) / P(H | E1) P(H | E3)), then P(H | E1, E2) = P(H | E1, E3) (b) If (P(H | E1, E2)P(H) / P(H | E1) P(H | E2)) = ((P(H | E1, E3)P(H)) / P(H | E1) P(H | E3)), then P(H | E1, E2) = P(H | E1, E3) (c) Now we prove proposition 2 by the following seven theorems. To shorten the proofs, we use the notation 'X >= Y' to abbreviate two cases, (i) when X > Y and (ii) when X = Y. Theorem 7. ForH (E1,E2) ≥ ForH (E1,E3) ⇒ inc1(H, E1,E2) ≥ inc1(H, E1,E3). (i) If {E1,E2} and {E1,E3} be epe-evidence sets for H and ForH (E1,E2) = ForH (E1,E3), then inc1(H, E1,E2) = inc1(H, E1,E3). 
(ii) If {E1,E2} and {E1,E3} be epe-evidence sets for H and ForH (E1,E2) > ForH (E1,E3), then inc1(H, E1,E2) > inc1(H, E1,E3). Proof: By Lemma 1a for equality case and 1b for inequality, P(H | E1,E2) >= P(H | E1,E3). Then P(H | E1,E2) P(H | E1) >= P(H | E1,E3) P(H | E1). So, ForH (E1,E2) ≥ ForH (E1,E3) ⇒ inc1(H, E1,E2) ≥ inc1(H, E1,E3).♦ Theorem 8. inc1(H, E1,E2) ≥ inc1(H, E1,E3) ⇒ inc2(H, E1,E2) ≥ inc2(H, E1,E3) (i) If {E1,E2} and {E1,E3} be epe-evidence sets for H and inc1(H, E1,E2) = inc1(H, E1,E3), then inc2(H, E1,E2) = inc2(H, E1,E3). (ii) If {E1,E2} and {E1,E3} be epe-evidence sets for H and inc1(H, E1,E2) > inc1(H, E1,E3), then inc2(H, E1,E2) > inc2(H, E1,E3). Proof: if P(H | E1,E2) – P(H | E1) >= P(H | E1,E3) – P(H | E1), then by Lemma 1 P(H | E1,E2) >= P(H | E1,E3). Thus, (P(H | E1,E2) – P(H | E1) / 1P(E1)) >= (P(H | E1,E3) – P(H | E1) / 1P(E1). So, inc1(H, E1,E2) ≥ inc1(H, E1,E3) ⇒ inc2(H, E1,E2) ≥ inc2(H, E1,E3).♦ 31 Theorem 9. inc2(H, E1,E2) ≥ inc2(H, E1,E3) ⇒ r1(H, E1,E2) ≥ r1 (H, E1,E3) (i) If {E1,E2} and {E1,E3} be epe-evidence sets for H and inc2(H, E1,E2) = inc2(H, E1,E3), then r1(H, E1,E2) = r1(H, E1,E3). (ii) If {E1,E2} and {E1,E3} be epe-evidence sets for H and inc2(H, E1,E2) > inc2(H,E1,E3), then r1(H, E1,E2) > r1(H, E1,E3). Proof: If P(H | E1,E2) – P(H | E1) / 1P(E1)) >= (P(H | E1,E3) – P(H | E1) / 1P(E1) then P(H | E1,E2) >/= P(H | E1,E3). Thus it follows immediately that log[(P(H | E1,E2) /P(H)] >= log[(P(H | E1,E3) /P(H)]. So, inc2(H, E1,E2) ≥ inc2(H, E1,E3) ⇒ r1(H, E1,E2) ≥ r1 (H, E1,E3).♦ Theorem 10. r1(H, E1,E2) ≥ r1(H, E1,E3) ⇒ r2(H, E1,E2) ≥ r2(H, E1,E3). (i) If {E1,E2} and {E1,E3} be epe-evidence sets for H and r1(H, E1,E2) = r1(H, E1,E3), then r2(H, E1,E2) = r2(H, E1,E3). (ii) If {E1,E2} and {E1,E3} be epe-evidence sets for H and r1(H, E1,E2) > r1(H,E1,E3), then r2(H, E1,E2) > r2(H, E1,E3). 
Proof: (i) if log[(P(H | E1,E2) /P(H)] >/= log[(P(H | E1,E3) /P(H)], then P(H | E1,E2) >= P(H | E1,E3), and immediately log[(P(H | E1,E2) /P(H | E1)] >= log[(P(H | E1,E3) /P(H | E1)]. So, r1(H, E1,E2) ≥ r1(H, E1,E3) ⇒ r2(H, E1,E2) ≥ r2(H, E1,E3). ♦ Theorem 11. r2(H, E1,E2) ≥ r2(H, E1,E3) ⇒ l(H, E1,E2) ≥ l(H, E1,E3) (i) If {E1,E2} and {E1,E3} be epe-evidence sets for H and r2(H, E1,E2) = r2(H, E1,E3), then l(H, E1,E2) = l(H, E1,E3). (ii) If {E1,E2} and {E1,E3} be epe-evidence sets for H and r2(H, E1,E2) > r2(H, E1,E3), then l(H, E1,E2) > l(H, E1,E3). Proof: By hypothesis, r2(H, E1,E2) >= r2(H, E1,E3). So P(H | E1,E2) >= P(H | E1,E3). Observe that log[P(E1,E2 | H) / P(E1,E2 | ¬H)] >= log[P(E1,E3 | H) / P(E1,E3 | ¬H)] reduces to: € log P(H | E1,E2)P(E1,E2)P(H) P(¬H | E1,E2)P(E1,E2)P(¬H) >= log P(H | E1,E3)P(E1,E3)P(H) P(¬H | E1,E3)P(E1,E3)P(¬H) . But this inequality holds if, P(H | E1, E2) >= P(H | E1, E3), which holds by Lemma 1. So, r2(H, E1,E2) ≥ r2(H, E1,E3) ⇒ l(H, E1,E2) ≥ l(H, E1,E3).♦ Theorem 12. l(H, E1,E2) ≥ l(H, E1,E3) ⇒ ko(H, E1,E2) ≥ ko(H, E1,E3) 32 (i) If {E1,E2} and {E1,E3} be epe-evidence sets for H and l(H, E1,E2) = l(H, E1,E3), then ko(H, E1,E2) = ko(H, E1,E3). (ii) If {E1,E2} and {E1,E3} be epe-evidence sets for H and l(H, E1,E2) > l(H, E1,E3), then ko(H, E1,E2) > ko(H, E1,E3). Proof: Let: a = P(E1, E2 | H) b = P(E1, E2 | ¬H) c = P(E1, E3 | H) d = P(E1, E3 | ¬H) Suppose a/b >= c/d, by hypothesis. Hence, ac >= bd. To show that a-b/a+b = c-d/c+d, observe that this equality reduces to € −bc + ad (a +b)(c + d) >= 0, which holds since –bc + ad >= 0, by hypothesis. So, l(H, E1,E2) ≥ l(H, E1,E3) ⇒ ko(H, E1,E2) ≥ ko(H, E1,E3).♦ Theorem 13. ko(H, E1,E2) ≥ ko(H, E1,E3) ⇒ ForH (E1,E2) ≥ ForH (E1,E3) (i) If {E1,E2} and {E1,E3} be epe-evidence sets for H and ko(H, E1,E2) = ko(H, E1,E3), then ForH (E1,E2) = ForH (E1,E3). (ii) If {E1,E2} and {E1,E3} be epe-evidence sets for H and ko(H, E1,E2) > ko(H, E1,E3), then ForH (E1,E2) > ForH (E1,E3). 
Proof: Let:
a = P(H | E1,E2)
1 − a = P(¬H | E1,E2)
b = P(H | E1,E3)
1 − b = P(¬H | E1,E3).
We have ko(H, E1,E2) ≥ ko(H, E1,E3), by hypothesis. Since ko is an increasing function of the likelihood ratio P(E | H) / P(E | ¬H), this gives
1) log[P(E1,E2 | H) / P(E1,E2 | ¬H)] ≥ log[P(E1,E3 | H) / P(E1,E3 | ¬H)], which is equivalent to
2) log{[P(H | E1,E2)P(E1,E2) / P(H)] / [P(¬H | E1,E2)P(E1,E2) / P(¬H)]} ≥ log{[P(H | E1,E3)P(E1,E3) / P(H)] / [P(¬H | E1,E3)P(E1,E3) / P(¬H)]}.
Then, the (in)equality in 2) holds if P(H | E1,E2) ≥ P(H | E1,E3), which follows by Lemma 1. So, ko(H, E1,E2) ≥ ko(H, E1,E3) ⇒ ForH(E1,E2) ≥ ForH(E1,E3). ♦

Proposition 3. If {E1, E2, E3} satisfy positive relevance (A1) and independence (A3) with respect to H, and P(E2) = P(E3), then ρE1,E2 > ρE1,E3 ⇔ inci(H, E1, E2) > inci(H, E1, E3).
Proof of Proposition 3.
1) For binary variables X, Y, H, any distribution P(X, Y, H) in which X _||_ Y | H holds can be parameterized by a causal Bayes network (CBN) with the graph X ← H → Y.
2) Among binary variables X, Y, Cov(X,Y) = P(X,Y) − P(X)P(Y). So, we have that Cov(Ei,H) = P(H,Ei) − P(H)P(Ei) = P(Ei)[P(H | Ei) − P(H)], for i = 1, 2, 3.
3) By (A1), P(H | E1) > P(H). So P(H,E1) > P(H)P(E1), and thus Cov(E1,H) > 0. Similarly, Cov(E2,H) > 0 and Cov(E3,H) > 0.
4) In a singly connected CBN over binary variables, the correlation between any two variables X, Y, Cor(X,Y), is the product of the correlations on the trek from X to Y (Danks and Glymour, 2001).
5) Since Cov(E1,H), Cov(E2,H) and Cov(E3,H) are positive, Cor(E1,H), Cor(E2,H) and Cor(E3,H) are positive. So, both Cor(E1,E2) and Cov(E1,E2) are positive, and both Cor(E1,E3) and Cov(E1,E3) are positive.
(⇒) Suppose ρE1,E2 > ρE1,E3. By 4), Cor(E1,H) Cor(E2,H) > Cor(E1,H) Cor(E3,H). By 5) we know that these correlations are positive, so Cor(E2,H) > Cor(E3,H). By hypothesis P(E2) = P(E3), so Cov(E2,H) > Cov(E3,H). Thus, P(H | E2) > P(H | E3).
(⇐) Suppose inci(H, E1, E2) > inci(H, E1, E3). Then, by (A2) and (A3), P(H | E1,E2) − P(H | E1) > P(H | E1,E3) − P(H | E1) iff P(H | E2) > P(H | E3).
Since P(E2) = P(E3), Cov(E2,H) > Cov(E3,H). Since by 5) the correlations are positive, Cor(E2,H) > Cor(E3,H), and by 4), Cor(E1,H) Cor(E2,H) > Cor(E1,H) Cor(E3,H). So, ρE1,E2 > ρE1,E3. ♦

Proposition 4. If E = {E1, E2} and E' = {E1, E3} satisfy positive relevance (A1), equal relevance (A2), and independence (A3) with respect to H, and P(E2) = P(E3), then ρE1,E2 = ρE1,E3 and S(E1, E2) = S(E1, E3).
Proof of Proposition 4. Suppose P(D) is a positive probability distribution over (H, E1, E2, E3) such that E = {E1, E2} and E' = {E1, E3} satisfy positive relevance (A1), equal relevance (A2), and independence (A3) with respect to H, and P(E2) = P(E3).
Lemma 2: Then the covariance of E2 and H is identical to the covariance of E3 and H, since P(E2)[P(H | E2) − P(H)] = P(E3)[P(H | E3) − P(H)], and (A1) guarantees that the covariance is positive. ♦
Then:
1) By (A1), Cor(H,Ei) is positive for i = 1, 2, 3.
2) From P(E2) = P(E3) and Lemma 2, Cor(H,E2) = Cor(H,E3).
3) So, by the Danks-Glymour product rule (2001), ρE1,E2 = ρE1,E3.
4) Also, since both P(E2) = P(E3) and ρE1,E2 = ρE1,E3, Cov(E1,E2) = Cov(E1,E3).
5) So, S(E1, E2) = S(E1, E3).

Proposition 5. If E = {E1, E2} and E' = {E1, E3} are positive evidence sets for H, then in any probability distribution P(H, E1, E2, E3) that is Markov and Faithful to the causal graph in Figure 3, inc1(H,E1,E2) > inc1(H,E1,E3).
Proof of Proposition 5:
1) P(E1,E2 | H) / P(E1,E2 | ¬H) = [P(E1 | H) / P(E1 | ¬H)] × [P(E2 | H) / P(E2 | ¬H)], by E1 _||_ E2 | H.
2) [P(E1,E2 | H) / P(E1,E2 | ¬H)] / [P(E1 | H) / P(E1 | ¬H)] = P(E2 | H) / P(E2 | ¬H), from dividing both sides of 1 by P(E1 | H) / P(E1 | ¬H).
3) P(E2 | H) / P(E2 | ¬H) > 1, by positive relevance and Bayes' theorem.
4) [P(E1,E2 | H) / P(E1,E2 | ¬H)] / [P(E1 | H) / P(E1 | ¬H)] > 1, by 2 and 3.
5) [P(E1,E2 | H) / P(E1,E2 | ¬H)] / [P(E1 | H) / P(E1 | ¬H)] = {[P(H | E1,E2)P(E1,E2) / P(H)] / [P(¬H | E1,E2)P(E1,E2) / P(¬H)]} / {[P(H | E1)P(E1) / P(H)] / [P(¬H | E1)P(E1) / P(¬H)]}, by Bayes' theorem.
6) [P(E1,E2 | H) / P(E1,E2 | ¬H)] / [P(E1 | H) / P(E1 | ¬H)] = [P(H | E1,E2) / P(¬H | E1,E2)] / [P(H | E1) / P(¬H | E1)], by cancellations in 5.
7) [P(H | E1,E2) / P(¬H | E1,E2)] / [P(H | E1) / P(¬H | E1)] > 1, by 6 and 4.
8) So, P(H | E1,E2) − P(H | E1) > 0, and inc1(H, E1,E2) > 0.
9) Since E3 _||_ H | E1, P(H | E1,E3) − P(H | E1) = 0, and thus inc1(H, E1,E3) = 0.
10) So, inc1(H, E1,E2) > inc1(H, E1,E3). ♦

Proposition 6. If E = {E1, E2} and E'' = {E1, E4} are equally positive evidence sets for H, then in any probability distribution P(H, E1, E2, E4) that is Markov and Faithful to the causal graph in Figure 3, inc1(H,E1,E4) > inc1(H,E1,E2) if and only if S(E1,E2) > a/b, where a = P(H | E4) / P(H) and b = P(H | E1,E4) / P(H | E1).
Proof of Proposition 6.
1) Because E and E'' are epe, inc1(H,E1,E4) > inc1(H,E1,E2) if and only if ForH(E'') > ForH(E).
2) ForH(E1,E4) = [P(E1,E4 | H) / (P(E1 | H)P(E4 | H))] / [P(E1,E4) / (P(E1)P(E4))] = P(E1,E4 | H) / (P(E1 | H)P(E4 | H)), since E1 _||_ E4.
3) ForH(E1,E2) = [P(E1,E2 | H) / (P(E1 | H)P(E2 | H))] / [P(E1,E2) / (P(E1)P(E2))] = P(E1)P(E2) / P(E1,E2), since E1 _||_ E2 | H.
4) ForH(E1,E4) = [P(H | E1,E4)P(E1,E4) / P(H)] / (P(E1 | H)P(E4 | H)), applying Bayes' theorem to the numerator in 2.
5) ForH(E1,E4) = P(H | E1,E4)P(E1,E4)P(H) / [P(H | E1)P(H | E4)P(E1)P(E4)], applying Bayes' theorem to the denominator in 4.
6) ForH(E1,E4) = P(H | E1,E4)P(H) / [P(H | E1)P(H | E4)] × 1, since E1 _||_ E4 entails P(E1,E4) = P(E1)P(E4).
7) ForH(E1,E4) > ForH(E1,E2) iff P(H | E1,E4)P(H) / [P(H | E1)P(H | E4)] > P(E1)P(E2) / P(E1,E2),
8) iff P(H | E1,E4)P(H) / [P(H | E1)P(H | E4)] × P(E1,E2) / [P(E1)P(E2)] > 1,
9) iff [P(H | E1,E4) / P(H | E1)] × S(E1,E2) > P(H | E4) / P(H),
10) iff S(E1,E2) > [P(H | E4) / P(H)] / [P(H | E1,E4) / P(H | E1)] = a/b.
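
The algebra in steps 2 to 10 of the proof of Proposition 6 can be checked numerically. The sketch below is an illustration only, not part of the original proof. It assumes a hypothetical causal graph E1 → H ← E4, H → E2 over binary variables, chosen solely because it yields the two independencies the proof uses (E1 _||_ E4 and E1 _||_ E2 | H); it is not claimed to be the graph in Figure 3, and it does not check the epe condition used in step 1. The function names and parameter values are likewise assumptions made for the example.

```python
from itertools import product

# Position of each variable in a world tuple (e1, e4, h, e2).
IDX = {"E1": 0, "E4": 1, "H": 2, "E2": 3}

def joint(p_e1, p_e4, p_h, p_e2):
    """Joint distribution for the assumed graph E1 -> H <- E4, H -> E2.

    p_h maps (e1, e4) to P(H=1 | e1, e4); p_e2 maps h to P(E2=1 | h).
    """
    dist = {}
    for e1, e4, h, e2 in product((0, 1), repeat=4):
        pr = (p_e1 if e1 else 1 - p_e1) * (p_e4 if e4 else 1 - p_e4)
        pr *= p_h[(e1, e4)] if h else 1 - p_h[(e1, e4)]
        pr *= p_e2[h] if e2 else 1 - p_e2[h]
        dist[(e1, e4, h, e2)] = pr
    return dist

def prob(dist, *names):
    """Probability that every named variable takes the value 1."""
    return sum(p for w, p in dist.items() if all(w[IDX[n]] == 1 for n in names))

def cond(dist, target, given=()):
    """P(target variables all 1 | given variables all 1)."""
    return prob(dist, *target, *given) / prob(dist, *given)

def shogenji(dist, x, y, given=()):
    """S(x, y), optionally conditional: P(x, y | g) / (P(x | g) P(y | g))."""
    return cond(dist, (x, y), given) / (cond(dist, (x,), given) * cond(dist, (y,), given))

def focused(dist, x, y):
    """For_H(x, y): coherence conditional on H divided by unconditional coherence."""
    return shogenji(dist, x, y, ("H",)) / shogenji(dist, x, y)

def check(dist):
    """Return both sides of the biconditional in steps 7 to 10."""
    a = cond(dist, ("H",), ("E4",)) / prob(dist, "H")
    b = cond(dist, ("H",), ("E1", "E4")) / cond(dist, ("H",), ("E1",))
    lhs = focused(dist, "E1", "E4") > focused(dist, "E1", "E2")
    rhs = shogenji(dist, "E1", "E2") > a / b
    return lhs, rhs

P_H = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}
strong = joint(0.3, 0.4, P_H, {0: 0.2, 1: 0.8})    # E2 tracks H closely
weak = joint(0.3, 0.4, P_H, {0: 0.45, 1: 0.55})    # E2 tracks H only weakly

print(check(strong), check(weak))
```

Under these assumed parameters the two sides of the biconditional agree in both cases: when E2 tracks H closely, S(E1,E2) exceeds a/b and ForH(E1,E4) exceeds ForH(E1,E2); when E2 tracks H only weakly, neither inequality holds.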