Two Mistakes Regarding The Principal Principle Christopher J. G. Meacham Published in the British Journal for the Philosophy of Science, 61 (2010): 407-431. Abstract This paper examines two mistakes regarding David Lewis' Principal Principle that have appeared in the recent literature. These particular mistakes are worth looking at for several reasons: the thoughts that lead to these mistakes are natural ones, the principles that result from these mistakes are untenable, and these mistakes have led to significant misconceptions regarding the role of admissibility and time. After correcting these mistakes, the paper discusses the correct roles of time and admissibility. With these results in hand, the paper concludes by showing that one way of formulating the chance-credence relation has a distinct advantage over its rivals. 1 Introduction In "A Subjectivist's Guide to Objective Chance", Lewis (1986) presents the canonical account of the relation between credence and chance, the Principal Principle. A number of different formulations of Lewis' principle can be found in the literature. Most of these differences are reasonable, harmless or both. But some of them are not. Some of them involve substantive mistakes. Two mistakes are of particular interest. The first mistake replaces the reasonable initial credence function that appears in Lewis' original formulation with a subject's current credence function; the second mistake adds a time index to the subject's credences.1 It's worth looking at these mistakes carefully for several reasons. First, the principles that result from these mistakes are problematic. The first mistake leads to inconsistencies, the second makes the Principal Principle too weak to apply to some cases. Second, these are natural mistakes. And getting clear on why they're mistakes will shed light on why certain natural ways of thinking about the relation between credence and chance are problematic. 
Third, and most importantly, these mistakes have led to misconceptions regarding the role of admissible evidence and the role of time in the chance-credence relation. And in the process of correcting these mistakes, the proper role of time and admissibility will become clear.
Examining these mistakes also has an unexpected benefit. In untangling the consequences of these mistakes, various merits and demerits of different formulations of the chance-credence relation will come to light. Once the dust settles, it will become clear that one formulation of the chance-credence relation is strictly better than the alternatives.
1For instances of the first, see Loewer (2001), Loewer (2004), Hajek (2002), Vranas (2002), Vranas (2004) and Lewis (1994). For instances of the second, see Vranas (2002) and Vranas (2004).
This paper will proceed as follows. In the next section I'll briefly sketch some background. In the third section I'll discuss the first mistake and its implications. In the fourth section I'll examine the role of admissible evidence. In the fifth section I'll discuss the second mistake, and its implications regarding the role of time. In the final section I'll explore the merits and demerits of different formulations of the chance-credence relation. With the lessons from the previous discussion in hand, I'll demonstrate that one way of formulating the chance-credence relation has a definite advantage over its rivals.
2 Background
2.1 The Chance Function
The core of Lewis' (1980) metaphysical account of chance can be expressed as a pair of claims about the form of the chance function:
The Function: Every possible chance assignment can be encoded by a function ch of two arguments, a grounding argument and an object argument. The grounding argument G picks out a chance distribution, chG(*). Given G, the object argument A picks out a value x = chG(A).
The Arguments: The grounding argument of ch is a proposition: a conjunction of a complete chance theory, T , and a complete history up to a time at a world where that chance theory holds, H. The object argument of ch is also a proposition,2 and chG(*) assigns a chance to every proposition to which an idealized credence function assigns a value.3 Lewis also provided an alternative characterization of the grounding argument as a time and world pair. But for his purposes, and ours, these two characterizations are interchangeable. When conjoined with the Principal Principle, these two claims about the form of the chance function entail Lewis' other claims about chance, such as that chG(*) is a probability function, that "the past is no longer chancy" (if H⇒ A, chT H(A) = 1), and so on.4 (An exception is Lewis' claim that determinism and chance are incompatible. Although Lewis 2In particular, a proposition about the way the world could be, or what Lewis (1979) calls a "de dicto proposition". This is given as a simplifying assumption in Lewis (1986), but in later work (such as Lewis (2004)) this is retained as a substantive assumption. 3This might seem unfaithful to Lewis, since Lewis explicitly declines to assume that a chance distribution assigns a well-defined value to every proposition. (See Lewis (1986), p.91.) But Lewis notes in the postscript that his reasons for declining stem from mathematical worries concerning whether a well-defined non-standard probability can be assigned to every proposition. And "plainly this reason for caution is no reason at all to think that the domains of chance distributions will be notably sparser than the domains of idealized credence functions." (Lewis (1986), p.132.) So the restriction to propositions that idealized credence functions are defined for keeps the characterization I provide faithful to Lewis. 4I use '⇒' to indicate strict implication, and '⇔' to indicate a strict biconditional. 
For ease of exposition, I will occasionally describe these relations as 'entails' and 'iff ', respectively. (It should be clear from context what I mean.) 2 seems to have thought that this was required by his other commitments, it appears to be an independent assumption.5) 2.2 The Chance-Credence Relation Let up(*) represent a rational subject's "ur-priors", her initial credence function before she's obtained any evidence. This is what Lewis calls a "reasonable initial credence function".6 Let 〈ch(A)= x〉 be the proposition that is true iff some T H obtains such that chT H(A)= x.7 More formally, 〈ch(A) = x〉 ⇔ T1H1∨T2H2∨ ... , for all TiHi's such that chTiHi(A) = x. Intuitively, 〈ch(A) = x〉 can be thought of as the proposition that at some time the chance of A is x. Note that 〈ch(A) = x〉 and 〈ch(A) = y〉 can both be true, since the chance of A can be x at one time and y at another. Similarly, let 〈cht(A) = x〉 be the proposition that is true iff some T H obtains such that chT H(A) = x, and H is a history up to time t, Ht . More formally, 〈cht(A) = x〉 ⇔ T1H1∨T2H2∨ ..., for all TiHi's such that chTiHi(A)= x and Hi =Hti . Intuitively, 〈cht(A)= x〉 can be thought of as the proposition that at time t the chance of A is x. Lewis presents two formulations of the Principal Principle. Lewis' first formulation is: PP1: up(A|〈cht(A) = x〉E) = x, if E is admissible with respect to 〈cht(A) = x〉,8 (1) assuming the terms in question are well-defined. (I'll take the "if well-defined" caveat to be implicit from now on, for both this and the other formulations of the chance-credence relation we'll look at.) When is evidence admissible? Lewis never provides a precise account-"I have no definition of admissibility to offer, but must be content to suggest sufficient (or almost sufficient) conditions"-but he suggests that information about the past and information about which chance theory obtains should both be admissible.9 Lewis' second formulation is: PP2: up(A|T H) = chT H(A). 
(2) Infinity caveats aside, Lewis takes these two formulations of the Principal Principle to be equivalent.10 While PP1 makes use of the notion of admissibility, PP2 does not. This makes Lewis' explicit ambivalence about what counts as admissible evidence puzzling. If PP1 and PP2 5Loewer (2001) argues that it is not required by Lewis' Humean commitments; Meacham (2005) and Hoefer (2007) argue that it is not entailed by the Principal Principle and Lewis' other claims about chance. 6Lewis (1986), p.87. Lewis is not, of course, assuming that every rational subject must have the same ur-priors. 7In order to avoid Miller's Paradox, we should understand x, here and elsewhere, as a variable ranging over real numbers in the unit interval, not as (say) a schematic letter which we can replace with terms that designate these numbers (c.f. Lewis (1986)). 8Lewis (1986) originally assessed whether E was admissible relative to a time. But as Thau (1994) noted, and Lewis (1994) conceded, admissibility is relative to more than just time. 9Lewis (1986), p.92. 10Lewis (1986) notes that the derivations between PP1 and PP2 that he provides only go through for cases where 〈cht(A) = x〉 is equivalent to a finite disjunction of T H terms (see Lewis (1986), p.100). Lewis suggests that this lapse is unlikely to be important, and I'm inclined to agree. In the rest of this paper, I'll follow Lewis in putting this matter aside. 3 are equivalent, then providing a characterization of admissibility should be straightforward: call evidence "admissible" iff doing so is required to make PP1 and PP2 yield the same results. This puzzling feature of Lewis' presentation has led some authors to believe that Lewis must have intended to add an admissibility clause to PP2 as well as PP1.11 We will return to examine the purpose of PP1's admissibility clause in section 4. 2.3 Assumptions Following Lewis, I will assume that it's rational to be Bayesian. 
Letting crE(*) represent the credence function of an agent whose total evidence is E, we can characterize Bayesian agents as agents with probabilistic credences who satisfy the following constraint:12 crE(*) = up(*|E), if defined. (3) In what follows I will restrict my attention to Bayesian agents who satisfy the usual idealizations (logical omniscience, etc.). For convenience, I will also follow Lewis in assuming that the propositions under consideration can be identified with sets of possibilities; in particular, sets of possible worlds. For the purposes of this paper, I will turn a blind eye to the question of whether Lewis' Humean account of chance can be reconciled with the Principal Principle. I will also sidestep the question of whether the "New Principle" proposed by Lewis (1994) and Hall (1994) should replace Lewis' original principle. These issues, although interesting, are orthogonal to the matters we'll be concerned with. So I'll circumvent them in what follows. 3 Assessing the First Mistake 3.1 The First Mistake Recall Lewis' first formulation of the Principal Principle: PP1: up(A|〈cht(A) = x〉E) = x, if E is admissible with respect to 〈cht(A) = x〉. The first mistake replaces the "reasonable initial credence function" up(*) with the current credence function, cr(*): PP∗1: cr(A|〈cht(A) = x〉E) = x, if E is admissible with respect to 〈cht(A) = x〉. (4) 3.2 Motivating the First Mistake There are several reasons why it's natural to make the first mistake. 1. Two reasons stem from Lewis himself. First, in his original paper he uses "c(*)" to represent the "reasonable initial credence function" up(*). When expressed in this notation, it's easy to confuse PP1 with PP∗1. Second, and more importantly, Lewis himself 11For example, see Vranas (2004) and Meacham (2005). 12Where the subject's total evidence is understood here to be the conjunction of all of the evidence the agent has received so far. 4 endorses PP∗1 in one of his later papers. 
13 Since it's reasonable to take Lewis at his word, it's natural to follow him in making this mistake.
2. Lewis' reason for endorsing PP∗1 in his later paper stems from another source. Given Bayesianism, we can reformulate PP1 as:
PP1: cr〈cht(A)=x〉E(A) = x, if E is admissible with respect to 〈cht(A) = x〉. (5)
Since a subject's credence in her evidence is 1, we can equate crE(A) with cr(A|E) as long as we explicitly state in the latter case that E is the subject's total evidence. So we can re-express PP1 as:
PP1: cr(A|〈cht(A) = x〉E) = x, if E is admissible with respect to 〈cht(A) = x〉, (6)
where 〈cht(A) = x〉E is the subject's total evidence. So far, so good. But we must be careful. It's easy to lose track of the total evidence clause and express this as:
PP∗1: cr(A|〈cht(A) = x〉E) = x, if E is admissible with respect to 〈cht(A) = x〉.
Indeed, since PP∗1 looks reasonable at first glance, one may see little reason to keep the total evidence clause, even if one realizes that the resulting rule is not the same as PP1. This appears to be Lewis' reason for adopting PP∗1 in his later paper. Although he explicitly recognizes the distinction between PP∗1 and the formulation of PP1 expressed by (5), he states that the latter is merely a special case of the former, and presents PP∗1 as the full-fledged Principal Principle.14
3.3 Why the First Mistake is Problematic
PP∗1 is problematic. Unlike PP2, PP∗1 does not take one's evidence into account. As a result, it leads to inconsistencies. To see this, pick a T, H and A such that chT H(A) = x ≠ 1, and consider a subject whose total evidence is T HA. Then it follows from PP∗1 that:
1 = crT HA(A) = crT HA〈cht(A)=x〉(A) = crT HA〈cht(A)=x〉(A|〈cht(A) = x〉) = x ≠ 1. (7)
What went wrong? The problem is that PP∗1 doesn't take the subject's evidence into account. If we formulate the principle in terms of ur-priors, as Lewis originally did, then we don't need to worry about this, since the subject has no evidence.
Likewise, if we formulate the principle in terms of a subject's current credence, and we explicitly take her current evidence into account, as (5) and (6) do, we avoid these problems. But if we formulate the chance-credence relation in terms of a subject's current credence, and fail to take her current evidence into account, then we'll run into contradictions.15 13See Lewis (1994), p.237-238. The admissibility clause Lewis provides is slightly different, but this has no bearing on the issues we're concerned with. 14See Lewis (1994), p.237-238. 15One might hope that realistic agents won't get evidence of the kind that will lead to contradictions. Unfortunately, realistic agents will get such evidence. To get such evidence, one merely needs to have observed the outcome of a chance event. Alternatively, one might try to avoid these worries by restricting PP∗1 so that it only applies to subjects at the time picked out by the chance proposition 〈cht(A) = x〉 in question. Then realistic agents won't have 'future' evidence of the kind that will lead to contradictions. This modification of PP∗1 helps to circumvent contradictions, but it leads to problems of a different nature (see section 5). 5 3.4 Consequences of the First Mistake The first mistake skews our assessment of the role of admissibility. Once we make this mistake, it's easy to be led astray about the purpose of admissibility. For example, Vranas (2004) argues that any tenable formulation of the chance-credence relation requires an admissibility clause: "Assume you know (for sure) that ch(A) is 50%. Does it follow that your credence in A should be 50%?... Given [the chance-credence principle without an admissibility clause] it does, but in fact it doesn't, because you may also know some inadmissible proposition", such as the proposition that A occurred.16 With respect to PP∗1, this argument makes sense. Stripping PP ∗ 1 of the admissibility clause gives us: PP∗1-: cr(A|〈cht(A) = x〉) = x. 
(8) And this principle leads to inconsistencies, for the reasons Vranas presents. If we apply this principle to an agent whose total evidence is A, with respect to the proposition 〈cht(A) = 1/2〉 (so that x = 1/2), we get:
1 = crA(〈cht(A) = x〉)/crA(〈cht(A) = x〉) = crA(A〈cht(A) = x〉)/crA(〈cht(A) = x〉) = crA(A|〈cht(A) = x〉) = 1/2 ≠ 1. (9)
Vranas concludes that the purpose of the admissibility clause is to protect us from these inconsistencies. And if we've made the first mistake, it's natural to follow Vranas in thinking that this is why we need admissibility. As we saw in section 3.3, adding the admissibility clause back to PP∗1 doesn't get us out of these inconsistencies, so this can't be why admissibility is needed. But put that aside. We won't even be tempted to follow this line of thought if we start with PP1 instead of PP∗1. If we strip PP1 of the admissibility clause we get:
PP1-: up(A|〈cht(A) = x〉) = x, (10)
or equivalently:
PP1-: cr〈cht(A)=x〉(A) = x. (11)
And these principles are not subject to the kind of worry Vranas raises. These principles only entail that a subject's credence in A should be 1/2 if her total evidence is 〈ch(A) = 1/2〉. If the subject is in possession of further evidence, such as A, then PP1- no longer requires her credence in A to be 1/2. So Vranas' worry won't apply.17 PP1's admissibility clause isn't needed to escape Vranas' worry. But if this isn't what we need admissibility for, what do we need it for? We'll turn to this question next.
16Vranas (2004), p.9.
17The first mistake is also what leads Vranas to criticize Hall's claim that the "New Principle" doesn't require an admissibility clause (see Vranas (2004), section 5). If we understand the New Principle correctly, in terms of ur-priors, then these criticisms won't apply.
4 The Role of Admissibility
Why does PP1 need an admissibility clause? One popular answer is that admissibility is needed to handle crystal ball cases. Another is that admissibility is needed in order to make a chance-credence principle strong enough to be useful. However, nothing about these answers hangs on the particular form of PP1. If they succeed in showing why PP1 needs an admissibility clause, then they will have shown why PP2 needs an admissibility clause as well. So if either of these answers is correct, PP2 should be modified and equipped with an admissibility clause. The correct answer, on the other hand, will demonstrate why only PP1 needs an admissibility clause. Let's look at each of these answers in turn.
4.1 Crystal Balls
One reason given for why admissibility is needed is that it's required in order to deal with crystal ball cases.18 And to handle such cases, both PP1 and PP2 require an admissibility clause. As Hall (1994) points out, this answer is mistaken: no admissibility clause is needed to deal with crystal ball cases. To see this, let's look at how PP2 handles a typical crystal ball case:
Suppose that there are crystal balls: devices which give infallible predictions with respect to the outcomes of future events. And suppose that a crystal ball tells us at t0 that the outcome of some chance event at t1 will be A. Finally, suppose that the chance at t0 of this outcome, A, is 1/2. If our total evidence consists of the chance theory at this world and the history up to t0, T H, what should our credence in A be? The correct answer seems to be 1. But PP2 appears to give the wrong answer:
crT H(A) = up(A|T H) = chT H(A) = 1/2, (12)
unless we add an admissibility clause.
The way in which this argument fails depends on the status of the crystal ball's infallibility. If the crystal ball is actually infallible, then the case described above is incoherent. Since the crystal ball shows us at t0 that A will come about, T H ⇒ A, and thus 1/2 = chT H(A) = up(A|T H) = up(AT H)/up(T H) = up(T H)/up(T H) = 1, which is a reductio of the case described.
If there was an infallible crystal ball that predicted the outcome of a future event, then the history up to the time of the prediction would entail that the event happens. This entails that the chance of the event at that time would have to be 1, contradicting the assumption that the chance of the event at that time is 1/2. On the other hand, if we just believe (mistakenly) that the crystal ball is infallible, then PP2 must declare our beliefs irrational. Let I be the proposition that the crystal ball is infallible. Since we believe that the crystal ball is infallible, our priors must be such that up(I|T H) = 1. Since IT H⇒ A, it follows that: 1/2 = chT H(A) = up(A|T H)≥ up(AI|T H) = up(I|T H) = 1 6= 1/2. (13) So PP2 must declare our priors irrational. But this is the right answer-it is irrational to have a credence of 1 in A when you know that the chance of A is 1/2-so this isn't a mark against PP2. 18For example, see Strevens (1999), Vranas (2004). 7 4.2 Usefulness Another reason given for why admissibility is needed is that it is required in order to make the chance-credence relation strong enough to be interesting. It is then suggested that Lewis' failure to add an admissibility clause to PP2 as well as PP1 was a mistake. This answer stems from something like the following train of thought. Consider how these principles fare without admissibility. PP2 tells us what our credences should be if our total evidence consists of some T H. And PP1-, the principle we get once we strip PP1 of the admissibility clause, tells us what our credences should be when our total evidence consists of some 〈cht(A) = x〉. But this is of little use to people like us, who are never in either of these evidential situations-situations where our total evidence is all and only some T H, or all and only some proposition about the chance of A. 
By contrast, a principle like PP1 is helpful:
PP1: cr〈cht(A)=x〉E(A) = x, if E is admissible with respect to 〈cht(A) = x〉,
since we are often in situations where our total evidence can be expressed as 〈cht(A) = x〉E, for some E.
This answer fails to appreciate the strength of PP1- and PP2 when coupled with Bayesianism.19 Consider PP2. It's true that PP2 only applies directly to our credences when our total evidence is equal to some T H. But PP2 doesn't need to apply directly in order to bear on our credences. Through PP2, the chances place constraints on our ur-priors. And the Bayesian rule generates our credences from our ur-priors and our evidence. So PP2 allows the chances to indirectly constrain our credences, even when our total evidence doesn't equal the grounds of a chance distribution.
For a simple example, consider a subject whose total evidence is B = T1H1 ∨ T2H2, where T1H1 and T2H2 are mutually exclusive. (If they weren't mutually exclusive, then one would entail the other, and we could represent the subject's evidence as just one of the two disjuncts.) Suppose that chT1H1(A) = 1/2 and chT2H2(A) = 1/3. In this case, PP2 won't fix the subject's credence in A. But it will still place the appropriate constraints. In particular, it will require her credence in A to satisfy:20
crB(A) = crB(T1H1)/2 + crB(T2H2)/3. (14)
4.3 The Strength of PP1
Neither of these two answers for why we need admissibility succeeds. So what do we need the admissibility clause for? And why does only PP1 need one? To see why PP1 requires an admissibility clause, consider the chance-credence relation that results once we strip it of this clause, PP1-:
PP1-: up(A|〈cht(A) = x〉) = x.
19Meacham (2005) provides a more detailed defense of the claim that PP2 does not require an admissibility clause.
20In detail: crB(A) = up(A|B) = up(AB)/up(B) = up(AT1H1∨AT2H2)/up(B) = up(AT1H1)/up(B) + up(AT2H2)/up(B) = [up(AT1H1)up(T1H1)]/[up(B)up(T1H1)] + [up(AT2H2)up(T2H2)]/[up(B)up(T2H2)] = up(A|T1H1)up(T1H1)/up(B) + up(A|T2H2)up(T2H2)/up(B) = chT1H1(A)up(T1H1)/up(B) + chT2H2(A)up(T2H2)/up(B) = chT1H1(A)up(T1H1B)/up(B) + chT2H2(A)up(T2H2B)/up(B) = chT1H1(A)up(T1H1|B) + chT2H2(A)up(T2H2|B) = chT1H1(A)crB(T1H1) + chT2H2(A)crB(T2H2) = 1/2 * crB(T1H1) + 1/3 * crB(T2H2).
Given PP2, we can derive PP1-. (The proof is provided in Appendix A.) But given PP1-, we cannot derive PP2. So PP2 is strictly stronger than PP1-.
Furthermore, PP1-'s weakness is problematic. There are intuitive relations between credence and chance that PP1- cannot capture. For example, consider the following case:
Consider a chance theory T1 such that chT1Ht1(A) = 1/2. Recall that 〈cht(A) = 1/2〉 is equivalent to a disjunction of all of the TiHti's for which chTiHti(A) = 1/2. For simplicity, let us assume there are just two TiHti's in the disjunction associated with 〈cht(A) = 1/2〉, T1Ht1 and T2Ht2. So 〈cht(A) = 1/2〉 ⇔ T1Ht1 ∨ T2Ht2. Now consider a subject whose total evidence is T1Ht1. What should her credence in A be?
Since the subject knows only that T1Ht1 obtains, and T1Ht1 entails that the chance at t of A is 1/2, the subject's credence in A should be 1/2. But PP1- is too weak to impose this constraint. We can see this explicitly. Here is an ur-prior function which (i) entails that the subject's credence in A will be 0 instead of 1/2, and yet (ii) satisfies the constraints that PP1- imposes:
up(T1Ht1) = a, up(AT1Ht1) = 0, up(T1Ht1∧T2Ht2) = 0,
up(T2Ht2) = a, up(AT2Ht2) = a.
Given these ur-priors, (i) the subject's credence in A will be 0:
crT1Ht1(A) = up(A|T1Ht1) = up(AT1Ht1)/up(T1Ht1) = 0/a = 0. (15)
But (ii) this ur-prior function is compatible with PP1-, since:
up(A|〈cht(A) = 1/2〉) = up(A|T1Ht1∨T2Ht2) = up(AT1Ht1∨AT2Ht2)/up(T1Ht1∨T2Ht2) = [up(AT1Ht1) + up(AT2Ht2)]/[up(T1Ht1) + up(T2Ht2)] = (0 + a)/(a + a) = 1/2. (16)
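The counterexample above can be checked numerically. The following is only a minimal sketch under stated assumptions: the cell labels, the helper names `prob` and `cond`, the particular value a = 1/4, and the extra "other" cell that absorbs the remaining probability mass are all illustrative choices, not part of the original case. It confirms that the ur-prior satisfies the PP1- constraint for x = 1/2 while yielding a credence of 0 in A on total evidence T1Ht1.

```python
from fractions import Fraction

# Hypothetical toy model: mutually exclusive cells, each labelled by
# (grounding proposition, whether A holds). The value of `a` is an
# arbitrary illustrative choice; any 0 < a <= 1/2 would do.
a = Fraction(1, 4)
ur_prior = {
    ("T1H1", True): Fraction(0),   # up(A & T1H1) = 0
    ("T1H1", False): a,            # hence up(T1H1) = a
    ("T2H2", True): a,             # up(A & T2H2) = a
    ("T2H2", False): Fraction(0),  # hence up(T2H2) = a
    ("other", False): 1 - 2 * a,   # assumed leftover mass outside <ch_t(A) = 1/2>
}


def prob(event):
    """Ur-prior probability of the set of cells satisfying `event`."""
    return sum(p for cell, p in ur_prior.items() if event(cell))


def cond(event, given):
    """Conditional ur-prior up(event | given); `given` needs positive mass."""
    return prob(lambda c: event(c) and given(c)) / prob(given)


def is_A(cell):
    return cell[1]


def is_T1H1(cell):
    return cell[0] == "T1H1"


def chance_half(cell):
    # <ch_t(A) = 1/2> is the disjunction T1H1 v T2H2 in this toy model.
    return cell[0] in ("T1H1", "T2H2")


pp1_minus_value = cond(is_A, chance_half)  # the quantity PP1- constrains
credence_given_T1H1 = cond(is_A, is_T1H1)  # credence on total evidence T1H1

print(pp1_minus_value, credence_given_T1H1)
```

The first value is the prior in A conditional on the whole disjunction, which is all that PP1- looks at; the second is the prior conditional on a single disjunct, which PP1- leaves unconstrained.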
Why isn't PP1- strong enough to yield the desired result? PP1- places a constraint on one's prior in A conditional on T1Ht1 ∨ T2Ht2 ∨ ..., the disjunction associated with 〈cht(A) = x〉. But in order to get the result we want, we need to impose a more fine-grained constraint on one's priors. We need a principle which, like PP2, constrains one's prior in A conditional on each of the individual TiHti's associated with 〈cht(A) = x〉. PP1- constrains one's conditional priors with respect to the disjunction, but we also need to constrain one's conditional priors with respect to each of the individual disjuncts.
This is why PP1 requires an admissibility clause. The admissibility clause makes PP1 strong enough to impose the constraints we want it to impose. Consider the case above, for example, where the subject's total evidence is T1Ht1 and chT1Ht1(A) = 1/2. If we apply PP1 instead of PP1-, we get the answer we want:
crT1Ht1(A) = up(A|T1Ht1) = up(A|T1Ht1〈cht(A) = 1/2〉) = 1/2, (17)
where the last step employs the fact that the T H's associated with 〈cht(A) = x〉 are admissible relative to it. So with the admissibility clause in place, PP1 is strong enough to yield the desired results.
But when, exactly, is evidence admissible? We can use the requirement that PP1 and PP2 should be equivalent to determine the required notion of admissibility:
Admissibility: E is admissible relative to 〈cht(A) = x〉 iff 〈cht(A) = x〉E can be expressed as a disjunction of some subset of the T H's associated with 〈cht(A) = x〉.21
I.e., E is admissible relative to 〈cht(A) = x〉 if E intersects 〈cht(A) = x〉 'cleanly': the outline of their intersection traces the boundaries of T H's. Given this characterization of admissibility, PP1 can be derived from PP2 and vice versa, in the way Lewis (1986) describes. (The proofs are reproduced in Appendix A.)
This characterization of admissibility entails that information about the past is admissible, as is information about which chance theory obtains.
For E to be inadmissible, it needs to exclude some, but not all, of the possibilities in some T H associated with 〈cht(A) = x〉. Consider one of these T H's. An inadmissible E needs to include some further information not entailed by T H. And T H entails everything about the past and the chance theory. So the inadmissible part of E can't be about the past or the chance theory; it must be about some other aspect of the world, such as the future. 4.4 Lewis and Admissibility This leaves us with a final question: why is Lewis so ambivalent about admissibility? Given that his claim about the equivalence of PP1 and PP2 entails a precise characterization of admissibility, why does he claim that "I have no definition of admissibility to offer"?22 Lewis' ambivalence is even more puzzling in light of two further observations: Lewis employs the criterion for admissibility given above as a necessary condition for admissibility in his derivation of PP1 from PP2, and the sufficient condition for admissibility Lewis offers-being about the past or the chance theory-entails that the above criterion is sufficient as well.23 So why doesn't he take this criterion to be a necessary and sufficient condition for admissibility, as I've suggested? Lewis refrains from adopting this characterization of admissibility for two reasons. The first concerns worlds with temporal abnormalities.24 Suppose there are worlds where time can loop back on itself, such as the Gödel universe models of general relativity which admit closed timelike curves. At worlds like these, where the same event can be in both one's future and one's past, it's hard to know what to count as "information about the future" or "a history up to time". Or suppose there are possible worlds without a welldefined notion of global time, such as the Anti-De Sitter models of general relativity which don't admit Cauchy surfaces. At worlds like these, it's hard to know how to make sense of even simple notions like the "chance at t". 
In either case, it's hard to make sense of the characterization of admissibility given above, which relies on these kinds of temporal notions.25 So we can only take this characterization of admissibility to be adequate if we restrict ourselves to worlds without such temporal oddities.26
21This characterization of admissibility has the consequence that E is admissible relative to 〈cht(A) = x〉 in the trivial case where E and 〈cht(A) = x〉 are mutually exclusive. But one could also define things the other way. Nothing of substance hangs on this choice.
22Lewis (1986), p.92.
23See Lewis (1986), p.94-96 and 99-100.
24See Lewis (1986), p.94.
Second, Lewis worries that information about the chances themselves will be inadmissible if one adopts a Humean account of chance.27 If so, then the characterization of admissibility given above will be inadequate. This is an interesting topic, but since I'm side-stepping issues particular to Humeanism, I won't discuss it further here.
So Lewis has good reasons for declining to adopt the characterization of admissibility suggested above. If we restrict ourselves to worlds without temporal anomalies, and put aside issues regarding Humeanism, then this characterization will suffice. But once we relax these restrictions, this characterization may no longer be adequate.
5 Assessing the Second Mistake
5.1 The Second Mistake
Recall Lewis' first formulation of the Principal Principle:
PP1: up(A|〈cht(A) = x〉E) = x, if E is admissible with respect to 〈cht(A) = x〉.
The second mistake is to formulate the principle in terms of time-indexed credences. Let crt;E(*) be the credence at t of a subject whose total evidence is E. (When cr has only one subscript, it should be clear from context which is intended.) In the literature, the second mistake generally appears together with the first mistake:28
crt(A|〈cht(A) = x〉E) = x, if E is admissible with respect to 〈cht(A) = x〉. (18)
That said, the second mistake does not require the first.
We can isolate the second mistake by using Bayesianism to formulate PP1 in terms of a subject's current credences: PP1: cr〈cht(A)=x〉E(A) = x, if E is admissible with respect to 〈cht(A) = x〉, and then adding a time-index to the credence function: PP∗∗1 : crt;〈cht(A)=x〉E(A) = x, if E is admissible with respect to 〈cht(A) = x〉. (19) 5.2 Motivating the Second Mistake Here are two natural trains of thought that lead to this mistake.29 25I thank Phillip Bricker for suggesting this way of understanding Lewis. 26Although Lewis highlighted these worries in his discussion of admissibility, these possibilities also make trouble for Lewis' account of the grounding argument of the chance function. So the impact of these possibilities cuts deeper than just the characterization of admissibility: the intelligibility of Lewis' account of the grounding argument also depends on this restriction of the possibilities under consideration. 27See Lewis (1986), p.130. 28I.e., see Vranas (2002), Vranas (2004). 29Yet another motivation is to time index things in order to circumvent the kind of contradictions brought about by the first mistake. But since I am trying to treat these mistakes separately, I will put this motivation aside. 11 1. Consider a subject whose total evidence consists of what the chance of A will be at several different times, t1-tn. And suppose that the chance of this outcome at each of these times is different. Which of these chances should her credence in A line up with? A natural thought is that we should want our credences to line up with whatever the current chance is. So we should add a time index to the credence function on the left hand side of the Principal Principle, and PP1 should really be: PP∗∗1 : crt;〈cht(A)=x〉E(A) = x, if E is admissible with respect to 〈cht(A) = x〉. 2. In Lewis' formulation of PP1, the chance propositions 〈cht(A)= x〉 are time indexed. But what role is the time index playing? 
Suppose, for example, that we replaced 〈cht(A) = x〉 (the proposition that at t the chance of A is x) with 〈ch(A) = x〉 (the proposition that at some time the chance of A is x). Doing so yields:

PP3: cr〈ch(A)=x〉E(A) = x, if E is admissible with respect to 〈ch(A) = x〉, (20)

where E is admissible with respect to 〈ch(A) = x〉 iff E〈ch(A) = x〉 can be expressed as a disjunction of a subset of the TH's associated with 〈ch(A) = x〉. Is there anything wrong with PP3?

Here's a natural gut-reaction response: PP3 can't be right. Consider an agent who knows nothing except a single fact about chance: that at some time the chance of A will be x. PP3 says that this agent's credence in A ought to be x. But that's crazy! What does it matter if at some time the chance of A is x? Surely we only have reason to set our credences equal to the chances if they're the current chances. With this thought in mind, it seems we should add a time index to the credence function on the left hand side of (5.1), and PP1 should really be:

PP∗∗1: crt;〈cht(A)=x〉E(A) = x, if E is admissible with respect to 〈cht(A) = x〉.

5.3 Why the Second Mistake is Problematic

Unlike PP∗1, PP∗∗1 is not inconsistent. The problem with PP∗∗1 is that it won't allow us to make all of the inferences we'd like to make.

Say you're told about a coin toss in 1900 A.D., before you were born. You know the chance right before the coin toss was 〈ch1900(h) = 1/2〉. And let's stipulate that you don't have any evidence that's inadmissible relative to 〈ch1900(h) = 1/2〉.30 What should your credence be that the coin came up heads?

It seems your credence should be 1/2. And this is what PP1 tells us to believe: you know the chance of the coin coming up heads was 1/2, and you don't have any evidence that's inadmissible relative to that chance. But PP∗∗1 won't deliver this result. You don't know the current chance of the coin coming up heads (you don't know whether it's 0 or 1), so PP∗∗1 won't apply.

30 This is not a realistic case, since someone like us would have inadmissible evidence, information that "cuts across" the TH's associated with 〈ch1900(h) = 1/2〉. But the worry described below still arises, albeit less directly. I.e., consider someone who is told about this coin toss, and who has the kind of inadmissible evidence that we might have. What should her credence be in h? As Lewis (1986) notes in his discussion of a similar case (p.115-116), it depends. PP1 will entail that her prior in h conditional on the admissible part of her evidence should be 1/2. If we further stipulate that the inadmissible part of her evidence is independent of the outcome of the coin toss, then it will follow that her credence should remain 1/2. On the other hand, if no such stipulation is made, then it won't follow that her credence in heads should remain 1/2. This is all as it should be. But PP∗∗1 cannot yield these results, since PP∗∗1 will not entail that her prior in h conditional on her admissible evidence should be 1/2.

It might seem like Bayesianism can rescue PP∗∗1 here. If your credence in heads was appropriately constrained at the time of the coin toss, and you updated using the Bayesian rule, won't you end up with correct credences later on? If so, doesn't this get PP∗∗1 out of the problem?

Yes, you will end up with the right credence, but no, this won't get PP∗∗1 out of the problem. In the case described, the coin is flipped before you were born. So your credence in heads can't have been appropriately constrained at the time of the coin toss. And once that time has passed, it's too late.

It's natural at this point to think that a shift to a constraint on ur-priors is called for. If we formulate the chance-credence principle as an ur-prior constraint, we don't need to worry about when you were born. But this move isn't available to the proponent of PP∗∗1. Because the credences in PP∗∗1 are time indexed, there's no easy way to reformulate it in terms of ur-priors.
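The contrast with an ur-prior constraint can be made concrete with a small Python sketch of the 1900 coin toss. Everything in it is an illustrative assumption rather than part of the argument: the two candidate chance values, the equal prior weight on them, the "report" variable standing in for a piece of inadmissible evidence, and the function names make_ur_prior and credence. The sketch builds an ur-prior that satisfies PP1 by construction and obtains current credences by conditionalizing on total evidence, so no time index on the credence function is needed.

```python
from fractions import Fraction
from itertools import product


def make_ur_prior(reliability):
    """Hypothetical ur-prior over worlds (chance, outcome, report).

    chance  : the 1900 chance of heads (assumed to be 1/2 or 4/5)
    outcome : 'h' or 't'
    report  : a later witness report, correct with probability `reliability`
    """
    prior = {}
    for chance, outcome, report in product(
        (Fraction(1, 2), Fraction(4, 5)), "ht", "ht"
    ):
        p_chance = Fraction(1, 2)  # assumed: both chance hypotheses equally likely
        p_outcome = chance if outcome == "h" else 1 - chance
        p_report = reliability if report == outcome else 1 - reliability
        prior[(chance, outcome, report)] = p_chance * p_outcome * p_report
    return prior


def credence(prior, proposition, evidence):
    """cr_E(A) = up(A | E): conditionalize the ur-prior on total evidence E."""
    p_e = sum(p for w, p in prior.items() if evidence(w))
    p_ae = sum(p for w, p in prior.items() if evidence(w) and proposition(w))
    return p_ae / p_e


def heads(world):
    return world[1] == "h"


def chance_half(world):
    return world[0] == Fraction(1, 2)


def chance_half_and_report_heads(world):
    return chance_half(world) and world[2] == "h"


independent = make_ur_prior(Fraction(1, 2))   # report is independent of the outcome
informative = make_ur_prior(Fraction(9, 10))  # report tracks the outcome

print(credence(independent, heads, chance_half))
print(credence(independent, heads, chance_half_and_report_heads))
print(credence(informative, heads, chance_half_and_report_heads))
```

In this toy model the credence in heads is 1/2 when the total evidence is just the chance proposition, stays at 1/2 when the extra evidence is independent of the outcome, and moves away from 1/2 when it is not. That is the pattern described in the footnote on inadmissible evidence, and nothing in the calculation depends on when the subject holds the credence.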
5.4 The Role of Time

PP∗∗1 incorporates time in two ways: it tracks the time at which a subject has the credence, and it tracks the time at which the chance obtains. As we've just seen, incorporating time in this first way is undesirable. We don't want the time at which the credence is held to matter.

What about incorporating time in the second way? Does it matter whether we track the time at which the chances obtain? No. We can see this in the following way. If we remove the time index from PP1, we get PP3:

PP3: cr〈ch(A)=x〉E(A) = x, if E is admissible with respect to 〈ch(A) = x〉.

And PP1 and PP3 are equivalent: given PP3 you can derive PP1 and PP2, and vice versa. (The proofs are provided in Appendix B.) Thus PP3 adequately captures the relation between credence and chance that Lewis envisioned, even though it doesn't track the time at which the chance obtains. So nothing is lost by discarding the time index that appears in PP1.

To get a better feel for why this is the case, consider again a subject who knows nothing except a single fact about chance, 〈cht(A) = x〉. Regardless of what the time t is, PP1 will require that her credence in A be x. And since 〈ch(A) = x〉 is just the disjunction of 〈cht(A) = x〉's for all t's, PP1 will impose the same constraint if her only evidence was that the chance of A was x at some time. So in this case the time at which the chance obtains is irrelevant.

In more complicated cases the time at which the chance proposition obtains can play a useful role: it can be used as an easy way to gauge the admissibility of different kinds of evidence. But what is and isn't admissible ultimately stems from structural features of the evidence and the chance proposition. And while interesting facts about the relation between admissibility and the time fall out as a consequence of these structural features (as we saw in section 4.3), time itself isn't an integral part of this structure.
This is why we can use the same characterization of admissibility for PP1 and PP3, even though the latter doesn't keep track of the time at which the chance obtains.

We can index chances to times because of Lewis' choices with respect to the grounding argument. But while this plays an important part in Lewis' metaphysical account of chance, it is not a substantive part of Lewis' epistemic account of chance. With respect to the chance-credence relation itself, the role of time is dispensable.

6 Assessing The Chance-Credence Relation

We've seen a number of attempts to formulate the relation between credence and chance. Which of these formulations should we prefer?

We can rule out some of these formulations for reasons we've already seen. Both PP∗1 and PP∗1 are inconsistent. And neither PP1- nor PP∗∗1 is strong enough to impose the desired constraints on our credences.

What about the three formulations of the Principal Principle, PP1, PP2 and PP3? The details of these assessments will depend on the assumptions we make about the grounding argument of chance distributions. So we'll first assess the different formulations of the chance-credence relation given Lewis' account of the grounding argument. Then we'll relax this assumption, and reassess the merits of these formulations once we allow for more general kinds of grounding arguments.

6.1 Take I: Lewis' Grounding Argument

First, let's assess the different formulations of the chance-credence relation given Lewis' account of the grounding argument.

Let's start by considering the two formulations Lewis offers, PP1 and PP2. As we've seen, these two formulations are equivalent: PP1 entails PP2, and vice versa. But this doesn't mean that the two formulations are equally good. PP2, unlike PP1, does not have an admissibility clause. And this makes PP2 much more convenient to work with. We can see this in Lewis' (1986) paper: after introducing the two formulations of the Principal Principle, he works exclusively with PP2.
What about PP3? As we've seen, PP3 is also equivalent to PP2. But, like PP1, PP3 is burdened with an admissibility clause, and this makes PP3 harder to use.

What if we removed the admissibility clause from PP3? This would yield PP3-:

PP3-: up(A|〈ch(A) = x〉) = x. (21)

But PP3- suffers from the same problems as PP1-. PP3- is too weak to capture all of the relations between credence and chance that we want. As a result, PP3- is not a satisfactory formulation of the chance-credence relation.

So given Lewis' account of the grounding argument, PP2 is the most convenient formulation of the Principal Principle.

6.2 Take II: General Grounding Arguments

Now let's assess the merits of these formulations once we allow for more general kinds of grounding arguments.

The grounding arguments Lewis employs have two distinctive features. First, they are time-indexed: every grounding argument is naturally associated with some time t. In particular, TH is associated with the time which H is a history up to. Second, there are no partial overlaps: two grounding arguments overlap iff one is a subset of the other. Consider two TH's. If the T's differ or the H's are incompatible, then they'll be mutually exclusive. If the T's match and the H's are compatible (just of different lengths), then the TH corresponding to the longer history will be a subset of the other TH.

These features fit well with many of our intuitions regarding chance. But in order to accommodate statistical mechanical chances and the like, we might want to allow for grounding arguments which have neither of these features.31 So let's consider more general kinds of grounding arguments, and then reassess the different formulations of the chance-credence relation.

1. What happens if we allow grounding arguments which aren't time indexed? Allowing such arguments creates problems for PP1, because there's no natural way to generalize PP1 in order to accommodate them.
PP2 and PP3, on the other hand, have little trouble accommodating arguments that aren't time indexed. Let's look at each of these principles in turn.

First, consider PP1. To generalize PP1, we need to generalize our characterization of 〈cht(A) = x〉. The natural way to do this is to replace the TH's with some more general grounding argument G, and to characterize 〈cht(A) = x〉 as:

〈cht(A) = x〉 ⇔ G1 ∨ G2 ∨ ..., for all Gi's such that chGi(A) = x and Gi = Gti. (22)

But this definition only makes sense if we're able to associate the grounding argument with a time. And once we allow grounding arguments which aren't time indexed, we no longer have a natural way to do this. Since the intelligibility of PP1 hangs on the intelligibility of 〈cht(A) = x〉, our failure to generalize 〈cht(A) = x〉 means we can't generalize PP1 either.

Next, consider PP3. To generalize PP3, we need to generalize our characterization of 〈ch(A) = x〉. But time doesn't play a special role in the characterization of 〈ch(A) = x〉, so this generalization is unproblematic. We can simply replace the TH's with G's, and characterize 〈ch(A) = x〉 as:

〈ch(A) = x〉 ⇔ G1 ∨ G2 ∨ ..., for all Gi's such that chGi(A) = x. (23)

Of course, we can no longer think of 〈ch(A) = x〉 as the proposition that at some time the chance of A is x, since chances are no longer time-indexed. Instead, 〈ch(A) = x〉 should just be thought of as the proposition that some grounding argument obtains for which the chance of A is x. With these adjustments in hand, we can formulate a generalized version of PP3:

GPP3: up(A|〈ch(A) = x〉E) = x, if E is admissible with respect to 〈ch(A) = x〉, (24)

or, in terms of credences:

GPP3: cr〈ch(A)=x〉E(A) = x, if E is admissible with respect to 〈ch(A) = x〉. (25)

31 For proposals along these lines, and the reason such features need to be abandoned in order to accommodate statistical mechanics, see Arntzenius (1995), Loewer (2001), Meacham (2005), Hoefer (2007) and Nelson (2008).

Finally, consider PP2.
Formulating a generalized version of PP2 is straightforward. We simply replace TH with G, and PP2 becomes:

GPP2: up(A|G) = chG(A), (26)

or, in terms of credences:

GPP2: crG(A) = chG(A). (27)

2. What happens if we also allow grounding arguments which can partially overlap? The effects of this change are more subtle, though no less important. Once we allow such arguments, GPP2 and GPP3 are no longer equivalent. The derivations between PP2 and PP3 rely crucially on the assumption that grounding arguments do not partially overlap.32 Once we allow grounding arguments that can partially overlap, the analogous derivations won't hold between GPP2 and GPP3.

Since these principles are no longer equivalent, we have a substantive choice to make. Which of these two principles, GPP2 or GPP3, should we prefer? A little consideration reveals that GPP2 is preferable to GPP3. The reason is that only GPP2 can accommodate theories with partially overlapping arguments of the kind that we're interested in, like statistical mechanics. If we try to apply GPP3 to such theories, we'll run into contradictions.

These inconsistencies arise for statistical mechanics, but we can see how these problems arise using a simpler theory. Consider a chance theory T which holds at only three worlds, w1, w2 and w3. Let every subset of {w1, w2, w3} correspond to the grounds of some chance distribution chG(*), and let chG(*) assign an equal chance to each of the worlds in G. So the distribution grounded by G = w1 ∨ w2 ∨ w3 will assign a chance of 1/3 to each of the three worlds, the distribution grounded by G = w1 ∨ w2 will assign a chance of 1/2 to w1 and w2, and so on.33

Now consider a subject whose total evidence is w1 ∨ w2 ∨ w3. What should her credence in w1 be? GPP2 delivers the correct result:

crw1∨w2∨w3(w1) = chw1∨w2∨w3(w1) = 1/3. (28)

What about GPP3? First consider the proposition 〈ch(w1) = 1/3〉. 〈ch(w1) = 1/3〉 corresponds to the disjunction of all of the grounding arguments G such that chG(w1) = 1/3.
In this case, there is only one such grounding argument: Ga = w1 ∨ w2 ∨ w3. Thus 〈ch(w1) = 1/3〉 = Ga = w1 ∨ w2 ∨ w3. So GPP3 entails:

crw1∨w2∨w3(w1) = cr〈ch(w1)=1/3〉(w1) = 1/3. (29)

Now consider the proposition 〈ch(w1) = 1/2〉. 〈ch(w1) = 1/2〉 corresponds to the disjunction of all of the grounding arguments G such that chG(w1) = 1/2. In this case, there are two such grounding arguments: Gb = w1 ∨ w2 and Gc = w1 ∨ w3. Thus 〈ch(w1) = 1/2〉 = Gb ∨ Gc = (w1 ∨ w2) ∨ (w1 ∨ w3) = w1 ∨ w2 ∨ w3. So GPP3 entails:

crw1∨w2∨w3(w1) = cr〈ch(w1)=1/2〉(w1) = 1/2. (30)

So GPP3 makes inconsistent demands: it requires the subject's credence in w1 to be both 1/3 and 1/2.

32 In particular, the proofs depend on being able to replace a disjunction of grounding arguments with a disjunction of a subset of these arguments that are mutually exclusive. (See Appendix B.) If we assume there are no partial overlaps, we can always do this: just eliminate the grounding arguments which are proper subsets of other grounding arguments. But if we allow for partially overlapping grounding arguments, we can't always do this, and the proofs no longer work.
33 Recall that many different G's can be true at the same world. (Given Lewis' grounding argument, for example, there will be infinitely many different TH's that are true at a world, one for each time.)

The source of GPP3's problem is that there are chance theories, like statistical mechanics or the simple theory just described, for which propositions asserting different chances are equivalent; i.e., correspond to the same set of worlds.34 And since GPP3 constrains credences via these propositions, this leads it to make inconsistent demands: a subject whose total evidence corresponds to these propositions is required to adopt inconsistent credences.

GPP2 avoids these difficulties because it doesn't constrain credences through propositions about chance. Instead, it employs the grounding argument directly.
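The three-world theory is small enough to check mechanically. The following Python sketch enumerates the grounding arguments, builds the chance propositions 〈ch(w1) = x〉 as unions of grounding arguments, and compares them with the subject's total evidence. The function names are hypothetical, and two modelling choices are assumptions of the sketch rather than claims of the text: the empty set is excluded from the grounding arguments, and chG is taken to assign chance 0 to worlds outside G.

```python
from fractions import Fraction
from itertools import combinations

WORLDS = ("w1", "w2", "w3")


def grounding_arguments():
    """Every non-empty subset of the worlds grounds a chance distribution."""
    for size in range(1, len(WORLDS) + 1):
        for subset in combinations(WORLDS, size):
            yield frozenset(subset)


def chance(g, world):
    """ch_G gives equal chance to each world in G (assumed 0 outside G)."""
    return Fraction(1, len(g)) if world in g else Fraction(0)


def chance_proposition(world, x):
    """<ch(world) = x>: the disjunction of all G with ch_G(world) = x."""
    worlds = set()
    for g in grounding_arguments():
        if chance(g, world) == x:
            worlds |= g
    return frozenset(worlds)


total_evidence = frozenset(WORLDS)

# GPP2 uses the grounding argument directly, so it yields a single value.
gpp2_credence = chance(total_evidence, "w1")

# GPP3 works through chance propositions. Collect every x for which
# <ch(w1) = x> is the very same proposition as the total evidence.
candidate_values = {chance(g, "w1") for g in grounding_arguments()}
gpp3_demands = sorted(
    x for x in candidate_values
    if chance_proposition("w1", x) == total_evidence
)

print(gpp2_credence)
print(gpp3_demands)
```

In this sketch gpp3_demands contains two distinct values, 1/3 and 1/2, because both chance propositions coincide with w1 ∨ w2 ∨ w3, while gpp2_credence is the single value 1/3.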
6.3 The Winner

We've seen a number of formulations of the chance-credence principle. But one formulation:

GPP2: crG(A) = chG(A),

has several advantages over the others.35 It has no admissibility clause. It is strong enough to impose the desired constraints on our credences. It doesn't fail to apply in some cases. And unlike the other formulations of the chance-credence relation that we've looked at, it is intelligible and consistent regardless of what we take the grounding arguments to be.36

34 For those familiar with statistical mechanics, we can construct a statistical mechanical analog to this case in the following way. Take the grounding argument of a statistical mechanical chance distribution to be the conjunction of the chance theory and the relevant macrostate. Consider every statistical mechanical macrostate TMi such that chTMi(A) = 1/2, for some A. This will consist of every union of A with another region of the phase space that has the same Liouville measure as A. So the disjunction of all TMi's will encompass the entire state space, S. Similar reasoning shows that the disjunction of all TMj's such that chTMj(A) = 1/3 will also encompass all of S. It follows from GPP3 that both crS(A) = cr〈ch(A)=1/2〉(A) = 1/2 and that crS(A) = cr〈ch(A)=1/3〉(A) = 1/3. Contradiction.
35 Notational variations aside, this formulation of the chance-credence principle is identical to the formulations endorsed by Arntzenius (1995) (the "Scientific Principle"), Meacham (2005) (the "Basic Principle") and Nelson (2008).
36 I would like to thank Frank Arntzenius, Ned Hall, Barry Loewer, Tim Maudlin and Michael Strevens for helpful discussions on these topics. I would also like to thank Phillip Bricker, Maya Eddon and an anonymous referee for discussion and comments on this paper.

References

Arntzenius, Frank. 1995. "Chance and the Principal Principle: Things Ain't What They Used To Be." Unpublished Manuscript.
Hajek, Alan. 2002. "Interpretations of Probability." The Stanford Encyclopedia of Philosophy.
Hall, Ned. 1994. "Correcting the Guide to Objective Chance." Mind 103:505–517.
Hoefer, Carl. 2007. "The Third Way on Objective Probability: A Skeptic's Guide to Objective Chance." Mind 116:549–596.
Lewis, David. 1979. "Attitudes De Dicto and De Se." The Philosophical Review 88:513–543.
Lewis, David. 1986. "A Subjectivist's Guide to Objective Chance." In Philosophical Papers, Vol. 2. Oxford University Press.
Lewis, David. 1994. "Humean Supervenience Debugged." Mind 103:473–490.
Lewis, David. 2004. "How Many Lives Has Schrödinger's Cat?" Australasian Journal of Philosophy 82:3–22.
Loewer, Barry. 2001. "Determinism and Chance." Studies in History and Philosophy of Modern Physics 32:609–620.
Loewer, Barry. 2004. "David Lewis' Theory of Objective Chance." Philosophy of Science 71:1115–1125.
Meacham, Christopher J. G. 2005. "Three Proposals Regarding a Theory of Chance." Philosophical Perspectives 19:281–307.
Nelson, Kevin. 2008. "On Background: Using Two-Argument Chance." Forthcoming in Synthese.
Strevens, Michael. 1999. "Objective Probability as a Guide to the World." Philosophical Studies 95:243–275.
Thau, Michael. 1994. "Undermining and Admissibility." Mind 103:491–503.
Vranas, Peter. 2002. "Who's Afraid of Undermining? Why the Principal Principle might not contradict Humean Supervenience." Erkenntnis 57:151–174.
Vranas, Peter. 2004. "Have your cake and eat it too: The Old Principal Principle reconciled with the New." Philosophy and Phenomenological Research 69:368–382.

Appendix: Proofs

These derivations employ the following characterization of admissibility: E is admissible relative to 〈cht(A) = x〉 iff 〈cht(A) = x〉E can be expressed as a disjunction of a subset of the TH's associated with 〈cht(A) = x〉. Likewise, E is admissible relative to 〈ch(A) = x〉 iff 〈ch(A) = x〉E can be expressed as a disjunction of a subset of the TH's associated with 〈ch(A) = x〉.
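For simplicity, I will assume in the following derivations that there are only finitely many terms. (The proofs in Appendix A are recreated from Lewis (1986), p.96-100.)

Appendix A: Proofs Regarding the Relation Between PP2 and PP1/PP1-

PP2 ⇒ PP1: If E is admissible relative to 〈cht(A) = x〉, then:

\begin{align*}
\mathrm{up}(A \mid \langle \mathrm{ch}_t(A) = x \rangle E)
  &= \mathrm{up}(A \mid T_1 H_{t,1} \lor \dots \lor T_n H_{t,n}) \tag{31} \\
  &= \frac{\mathrm{up}(A \land (T_1 H_{t,1} \lor \dots \lor T_n H_{t,n}))}{\mathrm{up}(T_1 H_{t,1} \lor \dots \lor T_n H_{t,n})} \\
  &= \frac{\sum_i \mathrm{up}(A T_i H_{t,i})}{\sum_j \mathrm{up}(T_j H_{t,j})} \\
  &= \frac{\sum_i \mathrm{up}(T_i H_{t,i}) \cdot \mathrm{up}(A \mid T_i H_{t,i})}{\sum_j \mathrm{up}(T_j H_{t,j})} \\
  &= \frac{\sum_i \mathrm{up}(T_i H_{t,i}) \cdot \mathrm{ch}_{T_i H_{t,i}}(A)}{\sum_j \mathrm{up}(T_j H_{t,j})} \\
  &= x \cdot \frac{\sum_i \mathrm{up}(T_i H_{t,i})}{\sum_j \mathrm{up}(T_j H_{t,j})} \\
  &= x.
\end{align*}

PP1 ⇒ PP2: Suppose chTHt(A) = x. Then THt ⇒ 〈cht(A) = x〉, and thus:

\begin{align*}
\mathrm{up}(A \mid T H_t)
  &= \mathrm{up}(A \mid \langle \mathrm{ch}_t(A) = x \rangle T H_t) \tag{32} \\
  &= x \\
  &= \mathrm{ch}_{T H_t}(A),
\end{align*}

where the second step employs PP1 and the fact that THt is admissible relative to 〈cht(A) = x〉. (This follows from the characterization of admissibility given above.)

PP2 ⇒ PP1-: Once we replace E with a tautology, the derivation is identical to that given for PP2 ⇒ PP1.

Appendix B: Proofs Regarding PP3/PP3-

PP2 ⇒ PP3: If E is admissible relative to 〈ch(A) = x〉, then:

\begin{align*}
\mathrm{up}(A \mid \langle \mathrm{ch}(A) = x \rangle E)
  &= \mathrm{up}(A \mid E \land (\langle \mathrm{ch}_{t_1}(A) = x \rangle \lor \dots \lor \langle \mathrm{ch}_{t_n}(A) = x \rangle)) \tag{33} \\
  &= \mathrm{up}(A \mid (E \land \langle \mathrm{ch}_{t_1}(A) = x \rangle) \lor \dots \lor (E \land \langle \mathrm{ch}_{t_n}(A) = x \rangle)) \\
  &= \mathrm{up}(A \mid T_1 H_1 \lor \dots \lor T_m H_m) \\
  &= \mathrm{up}(A \mid T_1 H_1 \lor \dots \lor T_r H_r) \\
  &= \frac{\mathrm{up}(A T_1 H_1 \lor \dots \lor A T_r H_r)}{\mathrm{up}(T_1 H_1 \lor \dots \lor T_r H_r)} \\
  &= \frac{\sum_i \mathrm{up}(A T_i H_i)}{\sum_j \mathrm{up}(T_j H_j)} \\
  &= \frac{\sum_i \mathrm{up}(A \mid T_i H_i) \cdot \mathrm{up}(T_i H_i)}{\sum_j \mathrm{up}(T_j H_j)} \\
  &= \frac{\sum_i \mathrm{ch}_{T_i H_i}(A) \cdot \mathrm{up}(T_i H_i)}{\sum_j \mathrm{up}(T_j H_j)} \\
  &= \frac{\sum_i x \cdot \mathrm{up}(T_i H_i)}{\sum_j \mathrm{up}(T_j H_j)} \\
  &= x.
\end{align*}

The fourth step replaces the disjunction T1H1 ∨ ... ∨ TmHm, where the terms need not be mutually exclusive, with a disjunction of a subset of these terms, T1H1 ∨ ... ∨ TrHr, which are mutually exclusive. We can do this because if any two elements TiHi and TjHj of the disjunction aren't mutually exclusive, then one will be a subset of the other, and we can discard it.

PP3 ⇒ PP2: Suppose chTH(A) = x. Then TH ⇒ 〈ch(A) = x〉, and thus:

\begin{align*}
\mathrm{up}(A \mid T H)
  &= \mathrm{up}(A \mid \langle \mathrm{ch}(A) = x \rangle T H) \tag{34} \\
  &= x \\
  &= \mathrm{ch}_{T H}(A),
\end{align*}

where the second step employs PP3 and the fact that TH is admissible relative to 〈ch(A) = x〉.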
(This follows from the characterization of admissibility given above.) PP2⇒ PP3-: Once we replace E with a tautology, the derivation is identical to that given for PP2 ⇒ PP3.
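As a sanity check on the Appendix B derivation, the following Python sketch builds a tiny Lewis-style model and verifies the special case of PP2 ⇒ PP3 in which E is a tautology. The model is an assumption made purely for illustration: a single theory T, two flips of a hypothetical coin with bias 1/3, one grounding argument TH per history of flips, and an ur-prior fixed by the chances at the empty history. The function names are likewise hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

P_HEADS = Fraction(1, 3)  # assumed bias of a hypothetical coin
WORLDS = ["".join(flips) for flips in product("HT", repeat=2)]
HISTORIES = ["", "H", "T"] + WORLDS  # one grounding argument TH per history


def ground(history):
    """The proposition TH: the worlds that begin with the given history."""
    return frozenset(w for w in WORLDS if w.startswith(history))


def chance(history, event):
    """ch_TH(event): chance of the event given the flips settled so far."""
    total = Fraction(0)
    for world in ground(history) & event:
        p = Fraction(1)
        for flip in world[len(history):]:
            p *= P_HEADS if flip == "H" else 1 - P_HEADS
        total += p
    return total


def ur_prior(event, given):
    """up(event | given), with up fixed by the chances at the empty history."""
    return chance("", event & given) / chance("", given)


def events():
    """Every proposition expressible over the four worlds."""
    for size in range(len(WORLDS) + 1):
        for subset in combinations(WORLDS, size):
            yield frozenset(subset)


def no_partial_overlaps():
    """Any two grounding arguments are nested or mutually exclusive."""
    grounds = [ground(h) for h in HISTORIES]
    return all(a <= b or b <= a or not (a & b) for a in grounds for b in grounds)


def pp2_holds():
    """PP2: up(A | TH) = ch_TH(A) for every event and history."""
    return all(
        ur_prior(a, ground(h)) == chance(h, a) for a in events() for h in HISTORIES
    )


def pp3_holds():
    """PP3 with tautologous E: up(A | <ch(A) = x>) = x for every A and x."""
    for a in events():
        for x in {chance(h, a) for h in HISTORIES}:
            # <ch(A) = x>: disjunction of every TH whose chance for A is x.
            proposition = frozenset().union(
                *(ground(h) for h in HISTORIES if chance(h, a) == x)
            )
            if ur_prior(a, proposition) != x:
                return False
    return True


print(no_partial_overlaps(), pp2_holds(), pp3_holds())
```

Because the histories here are nested or mutually exclusive, the step in the derivation that discards redundant disjuncts is always available, and the check for PP3 succeeds whenever the check for PP2 does. The sketch covers only this one small model and is no substitute for the proofs.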