The Principle of Indifference and the Principal Principle are Incompatible

J. Dmitri Gallow †

As I will understand it here, the Principle of Indifference (POI) says that your initial, or ur-prior, credences should be distributed uniformly. The Principal Principle (PP) says that your ur-prior credences should be aligned with the chances, in the following sense: your ur-prior credence in P, given that the chance of P is x, should be x. Pettigrew (2016) appears to accept both of these principles. However, the POI and the PP are incompatible. Abiding by the POI means violating the PP. Bayesians cannot accept both principles; they must choose which, if either, to endorse.

In §§1–2 below, I'll introduce the POI and the PP. These principles both say something about what your credences should be like in the absence of any evidence; that is to say, they both impose constraints on your initial, or ur-prior, credence function. In §3, I'll explain why these constraints contradict each other. In rough outline: the POI requires that your ur-prior credence in the conjunction $A \wedge Ch(A) = x$ is equal to your ur-prior credence in the conjunction $\neg A \wedge Ch(A) = x$, whereas the PP requires that your credence in the former is $x/(1-x)$ times your credence in the latter. Unless $x = 1/2$, these two constraints are incompatible. In §4, I'll consider two moderate responses to this incompatibility. We may adopt a weaker formulation of the POI (§4.1), or we may accept a different formulation of the PP (§4.2).

1 The Principle of Indifference

I'll suppose that you have degrees of belief, or credences, defined over the sentences in some language. In the simplest case, this will be a truth-functional language with a finite number of atomic sentences, $A_1, A_2, \ldots, A_N$. In a language like this, a state description is a conjunction of the form $\pm A_1 \wedge \pm A_2 \wedge \cdots \wedge \pm A_N$, where each $\pm A_i$ is either the atomic sentence $A_i$ or its negation.
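To make the definition concrete, here is a minimal Python sketch that lists the state descriptions of a small truth-functional language. The atom names and the helper functions `state_descriptions` and `render` are illustrative assumptions, not part of the paper's formalism.

```python
from itertools import product

def state_descriptions(atoms):
    """Return every state description as a tuple of (atom, truth_value) pairs."""
    return [tuple(zip(atoms, values))
            for values in product([True, False], repeat=len(atoms))]

def render(description):
    """Write a state description as a conjunction of atoms and negated atoms."""
    return " ∧ ".join(atom if value else "¬" + atom
                      for atom, value in description)

# Hypothetical language with three atomic sentences.
atoms = ["A1", "A2", "A3"]
descriptions = state_descriptions(atoms)

for description in descriptions:
    print(render(description))
```

With three atoms, the sketch prints eight conjunctions, one for each way of choosing between an atom and its negation.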
Let 'Ω' be the set of state descriptions, and let $\omega \in \Omega$ be some particular state description. If our language is truth-functional, a state description settles the truth-value of every other sentence in the language, and every sentence in the language is equivalent to some disjunction of state descriptions.

Draft of September 14, 2020. Word Count: 6,846. Comments appreciated: dmitri.gallow@gmail.com
† Thanks to Kevin Dorst, Daniel Drucker, and an anonymous reviewer for helpful conversations and feedback on this material.

I will also suppose that it makes sense to talk about an initial, or ur-prior, credence function. This function describes the credences which you are disposed to hold in the absence of any evidence. I'll denote this ur-prior credence function with 'C'. I will take it for granted that a rational ur-prior is a probability function.¹

As I'll understand it here, the POI says that a rational ur-prior gives the same credence to every state description. If there are N atomic sentences in your language, there will be $2^N$ state descriptions: $\#\Omega = 2^N$.² In that case, the POI will say that your ur-prior credence in each state description should be $1/2^N$. (Note: there are other principles which have gone by the name 'the principle of indifference'. I will not be arguing that those other principles are incompatible with the Principal Principle. I will discuss those other principles in §4.1 below.)

Defenders of the POI could restrict its application to cases in which the number of atomic sentences is finite, in which case Ω will be finite. But they may also wish to generalize the principle to hold even when there are countably many atomic sentences in your language. If there are countably many atomic sentences, $A_1, A_2, A_3, \ldots$
, then a state description will be an infinitary conjunction $\bigwedge_{i=1}^{\infty} \pm A_i$, and the set of state descriptions will have the size of the continuum, $\#\Omega = 2^{\aleph_0}$.³ The usual way the POI is implemented when you have uncountably many state descriptions involves imposing additional structure on the set Ω. We find some random variable, V, which maps every state description $\omega \in \Omega$ to some real number, $V(\omega) \in \mathbb{R}$. Then, we can assign to each value v in the range of the variable V a credence density, $\rho_V(v)$. This density function doesn't say what your credence that $V = v$ is.⁴ If you abide by the POI, your credence that V takes on any particular value, v, will have to be zero. Instead, $\rho_V(v)$ says how dense your credence is at $V = v$. Think about it like this: for any narrow interval $[v, v+\varepsilon]$, the ratio $C(V \in [v, v+\varepsilon])/\varepsilon$ is the density of your credence over the interval $[v, v+\varepsilon]$. By taking the limit of this ratio as ε goes to zero, we get the density of your credence at the point $V = v$, $\rho_V(v)$.

1. By this I mean: 1) $C(P) \geq 0$ for every P; 2) if P is a tautology, then $C(P) = 1$; and 3) if P and Q are incompatible, then $C(P \vee Q) = C(P) + C(Q)$.
2. Notation: I use '#S' to stand for the cardinality of the set S.
3. Proof sketch: with each state description, we may associate the binary expansion of a real number in the interval [0,1] as follows. Proceed through the atoms on a chosen enumeration. For each k, if $A_k$ is included in the state description (rather than its negation), then let the digit in the $2^{-k}$ place be a 1. If, on the other hand, $\neg A_k$ is in the state description, then let the digit in the $2^{-k}$ place be a 0. This associates with every state description a real number in [0,1] (in binary). This association maps Ω onto [0,1], and it is one-to-one except at the countably many reals which have two binary expansions, so $\#\Omega = \#[0,1] = 2^{\aleph_0}$.
4. Notation: '$V = v$' is the disjunction of state descriptions which V maps to v, $V = v \overset{\text{def}}{=} \bigvee_{\omega \in \Omega : V(\omega) = v} \omega$.

With a credence density function, $\rho_V$, we can determine your credence distribution by integrating over $\rho_V$. For instance, your credence that V is between a and b will be given by $\int_a^b \rho_V(v)\,dv$. And, in general, for any measurable set of values $\mathbf{v}$, your credence that V is within $\mathbf{v}$ is given by $\int_{\mathbf{v}} \rho_V(v)\,dv$.⁵ Then, the POI may be implemented by saying that your credences should have a uniform density. That is: every value of v should have exactly the same credence density. For instance: consider a random variable, U, which tells us what proportion of space is unoccupied. U can take on values between 0 and 1. Then, the POI says that the density of your credence should be uniform over these values. This uniform credence density is shown in figure 1.

Figure 1: The uniform credence density over U. Your credence that U lies in the set $\mathbf{u} = [1/4, 1/2] \cup [3/4, 1]$ is given by the integral $\int_{\mathbf{u}} \rho_U(u)\,du$, which is the area under the curve $\rho_U(u)$ shown in grey.

5. In general, we could characterize the possibilities in Ω with any finite number of real-valued variables, $V_1, V_2, \ldots, V_N$. Then, instead of having a density function on $\mathbb{R}$, we'd have a density function on $\mathbb{R}^N$. However, we won't require these additional complications here.

2 The Principal Principle

Like the POI, David Lewis's (1980) Principal Principle says something about a rational initial, or ur-prior, credence function, C: the credence function it would be rational to have in the absence of any evidence. In particular, it says: if P is any sentence,⁶ t is some future time, $Ch_t(P) = x$ says that the time t chance of P is x, for some real number $x \in [0,1]$, and E is any time t admissible sentence which is compatible with $Ch_t(P) = x$, then

$$C(P \mid Ch_t(P) = x \wedge E) = x$$

The time t won't be important in my discussion, so I'll fix t to be some future time and omit explicit mention of t in the remainder. Likewise, the admissible sentence E won't play any important role. I'll make only two assumptions about admissibility. Firstly, I'll assume that a tautology, ⊤, is admissible at t. The conjunction $Ch(P) = x \wedge \top$ is equivalent to $Ch(P) = x$, so if we set $E = \top$ in the Principal Principle, we get the following:

$$C(P \mid Ch(P) = x) = x \tag{PP1}$$

6. Lewis assumes that the arguments of your credence function are propositions. Since I'm assuming here that the arguments of your credence function are sentences, I've slightly emended his Principal Principle.

Secondly, I will assume that sentences about the time t chance function are admissible at t. A sentence about the time t chance function is of the form '$Ch = ch$', where 'Ch' is the definite description 'the time t chance function', and 'ch' is a name for a particular chance function. Thus, '$Ch = ch$' says that the time t chance function is ch. If $Ch = ch$ is admissible at t, then, so long as $Ch = ch$ is compatible with $Ch(P) = x$, the Principal Principle says that

$$C(P \mid Ch(P) = x \wedge Ch = ch) = x$$

Because $Ch = ch$ specifies the chance of every sentence, $Ch = ch$ will either entail $Ch(P) = x$ or its negation. Therefore, the only way for it to be compatible with $Ch(P) = x$ is for it to entail $Ch(P) = x$. In that case, the conjunction $Ch(P) = x \wedge Ch = ch$ is equivalent to $Ch = ch$, and x is $ch(P)$. So the Principal Principle entails:

$$C(P \mid Ch = ch) = ch(P) \tag{PP2}$$

PP1 and PP2 govern your conditional credences; but I'll suppose that these conditional credences place a constraint on your unconditional credences, via the product rule, which says that, for any sentences P and Q, your credence in $P \wedge Q$ is equal to the product of your credence that P given Q and your credence that Q.

$$C(P \wedge Q) = C(P \mid Q) \cdot C(Q)$$

Then, so long as $C(Q) > 0$, your conditional credence in P, given Q, is the ratio of your unconditional credence in $P \wedge Q$ and your unconditional credence in Q, $C(P \mid Q) = C(P \wedge Q)/C(Q)$. So a constraint on your conditional credences will have consequences for your unconditional credences.
If your credence in P is x, conditional on any sentence whatsoever, then your credence in ¬P must be 1 − x, conditional on that same sentence. So if, conditional on $Ch(P) = x$, your credence in P is x, then, conditional on $Ch(P) = x$, your credence in ¬P must be 1 − x. So, supposing that $C(Ch(P) = x) > 0$ and that $x < 1$ (so that the ratio below is defined),

$$\frac{C(P \mid Ch(P) = x)}{C(\neg P \mid Ch(P) = x)} = \frac{x}{1-x}$$

$$\frac{C(P \wedge Ch(P) = x)/C(Ch(P) = x)}{C(\neg P \wedge Ch(P) = x)/C(Ch(P) = x)} = \frac{x}{1-x}$$

$$\frac{C(P \wedge Ch(P) = x)}{C(\neg P \wedge Ch(P) = x)} = \frac{x}{1-x}$$

$$C(P \wedge Ch(P) = x) = \frac{x}{1-x} \cdot C(\neg P \wedge Ch(P) = x) \tag{1}$$

Equation 1 follows from PP1. It will be important in §3 below.

Similarly: if, conditional on $Ch = ch$, your credence in P is $ch(P)$, then, conditional on $Ch = ch$, your credence in ¬P must be $1 - ch(P)$. So long as $C(Ch = ch) > 0$, a derivation parallel to the one given above establishes that

$$C(P \wedge Ch = ch) = \frac{ch(P)}{1 - ch(P)} \cdot C(\neg P \wedge Ch = ch) \tag{2}$$

Equation 2 follows from PP2. It will be important in §3 below.

Some Humeans do not accept the PP because it conflicts with their metaphysical commitments.⁷ Nonetheless, those Humeans are happy to accept a principle they call 'the new principle'. Where the original Principal Principle implores you to defer to the chances, the new principle implores you to defer to the chances conditional on the true theory of chance. The differences between the new principle and the original Principal Principle won't be relevant to anything I say here. If you favor the new principle, you can simply interpret '$Ch(P) = x$' as saying that the time t chance of P, conditional on the true theory of chance, is x.⁸ So understood, equation 1 will follow from the new principle. Likewise, you may interpret '$Ch = ch$' as saying that the time t chance function, conditioned on the true theory of chance, is ch. So understood, equation 2 will follow from the new principle.
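Equation 1 can be checked on a toy example. The following Python sketch builds a small ur-prior that obeys PP1 and confirms that it also obeys equation 1. The three chance hypotheses and their weights are hypothetical values chosen only for illustration, and exact fractions are used so that the comparison is not disturbed by rounding.

```python
from fractions import Fraction as F

# Hypothetical credences in three chance hypotheses Ch(P) = x (they sum to 1).
hypothesis_weight = {F(1, 4): F(1, 2), F(1, 2): F(1, 3), F(9, 10): F(1, 6)}

# A toy ur-prior over the conjunctions (±P ∧ Ch(P) = x), built to obey PP1:
# C(P ∧ Ch(P) = x) = C(Ch(P) = x) * x.
C = {}
for x, weight in hypothesis_weight.items():
    C[(True, x)] = weight * x
    C[(False, x)] = weight * (1 - x)

def conditional(p_value, x):
    """C(±P | Ch(P) = x), computed with the ratio formula."""
    return C[(p_value, x)] / (C[(True, x)] + C[(False, x)])

for x in hypothesis_weight:
    assert conditional(True, x) == x                      # PP1
    assert C[(True, x)] == x / (1 - x) * C[(False, x)]    # equation 1
```

The same check goes through for equation 2 if each key x is read as naming a whole chance function ch with ch(P) = x.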
Suppose you want your credences to be defined over uncountably many sentences of the form $Ch(P) = x$, one for each of the uncountably many real numbers, x, between 0 and 1. Then, so long as your credences are real-valued, you'll have to assign a credence of zero to uncountably many of the sentences $Ch(P) = x$. If your credence in $Ch(P) = x$ is zero, then the product rule will not impose any constraint on the relationship between $C(P \mid Ch(P) = x)$ and $C(P \wedge Ch(P) = x)$. Lewis was not concerned with this, because he allowed rational credences to take on infinitesimal values.⁹ So he thought that, even when you're spreading your credences over uncountably many state descriptions, you needn't give a credence of zero to any of them. If we agree with him about this, then perhaps the Principal Principle is already general enough. But I've been persuaded that Lewis was wrong to rely upon infinitesimals.¹⁰

7. See Hall (1994), Lewis (1994), and Thau (1994).
8. Cf. Hall and Arntzenius (2003).
9. See Lewis (1980, pp. 267–8).
10. See Williamson (2007), Easwaran (2014), and Hájek (ms, §7).

If, like me, you want your credences to be real-valued, then you should be looking for a natural generalization of the Principal Principle for the case where you have credences over uncountably many chance sentences. Even though your credence that the chance of P is x will be zero for any particular choice of x, your credence that the chance of P lies within an interval of values $[x, x+\varepsilon]$ (with $\varepsilon > 0$) can be non-zero, no matter how small the interval $[x, x+\varepsilon]$.
So a natural generalization of the PP says that a rational ur-prior credence in P, given that the chance of P lies in some interval $[x, x+\varepsilon]$, is within the interval $[x, x+\varepsilon]$:

$$x \leq C(P \mid Ch(P) \in [x, x+\varepsilon]) \leq x + \varepsilon \tag{PP3}$$

If your credence in P, given $Ch(P) \in [x, x+\varepsilon]$, is in the interval $[x, x+\varepsilon]$, then your credence in ¬P, given $Ch(P) \in [x, x+\varepsilon]$, is within the interval $[1-x-\varepsilon, 1-x]$:

$$1 - x - \varepsilon \leq C(\neg P \mid Ch(P) \in [x, x+\varepsilon]) \leq 1 - x$$

Following the same steps from our derivation of equation 1 above, we get that, for any positive ε, no matter how small,

$$C(P \wedge Ch(P) \in [x, x+\varepsilon]) \leq \frac{x+\varepsilon}{1-x-\varepsilon} \cdot C(\neg P \wedge Ch(P) \in [x, x+\varepsilon])$$

and

$$C(P \wedge Ch(P) \in [x, x+\varepsilon]) \geq \frac{x}{1-x} \cdot C(\neg P \wedge Ch(P) \in [x, x+\varepsilon])$$

Divide both sides of these inequalities by ε, and take the limit as ε goes to zero. Thereby, we get that the density of your credence in the conjunction $P \wedge Ch(P) = x$ must be $x/(1-x)$ times the density of your credence in the conjunction $\neg P \wedge Ch(P) = x$,

$$\rho(P \wedge Ch(P) = x) = \frac{x}{1-x} \cdot \rho(\neg P \wedge Ch(P) = x) \tag{3}$$

Equation 3 follows from PP3. It will be important in §3 below.

3 The Incompatibility

In §1 above, I assumed that your credences were defined over a simple truth-functional sentential language. However, if we wish to entertain sentences like '$Ch(P) = x$' or '$Ch = ch$', then we will need a language slightly more complicated than this. We can generate an appropriately rich language by distinguishing two different kinds of atomic sentences, which I'll call non-chancy atoms and chancy atoms. (Throughout, 'atom' means 'atomic sentence'.) We needn't concern ourselves with the form of the non-chancy atoms; call them $A_1, A_2, \ldots, A_N$. Perhaps they have internal structure, as in a first-order language, where the atomic sentences would have the form '$R a_1 \ldots a_m$', for some m-place predicate R and m constants, $a_1 \ldots a_m$. Or perhaps they have no further internal structure, as in a simple sentential language.
The chancy atoms say what the time t objective chance function is, '$Ch = ch$'.¹¹ We then get the full language by taking the union of the chancy atoms and the non-chancy atoms, and closing the resulting set under negation and disjunction (and thereby, under conjunction as well). We may then recover a chancy sentence like $Ch(P) = x$, since it is equivalent to the disjunction $\bigvee_{ch : ch(P) = x} Ch = ch$.

With this richer language, a defender of the POI should change the way they think about a state description. When we were dealing with a simple sentential language, we understood a state description to be a conjunction of (negations of) atoms. But if ch and ch′ are two distinct probability functions, then both $Ch = ch$ and $Ch = ch'$ will be atoms. On the old understanding, there would then be a state description which included both of these sentences as conjuncts. These sentences may be known a priori to be incompatible with each other, so that state description should receive credence zero. However, since the POI requires an ur-prior to give every state description the same credence, it would require your ur-prior credence in this impossible state description to be positive. The solution is to take a state description to be a conjunction of the form $\pm A_1 \wedge \pm A_2 \wedge \cdots \wedge \pm A_N \wedge \pm Ch$, where each $\pm A_i$ is either the non-chancy atom $A_i$ or its negation, and $\pm Ch$ is either one of the chancy atoms $Ch = ch$ or else the negation of every chancy atom, $\bigwedge_{ch} Ch \neq ch$. Again, let 'Ω' be the set of state descriptions. Supposing that there are N non-chancy atoms and M chancy atoms, there will be $2^N(M+1)$ state descriptions: $\#\Omega = 2^N(M+1)$. And the POI will say that your credence in each $\omega \in \Omega$ should be $1/(2^N(M+1))$. Just to save space, let's use 'α' to stand for this credence, $\alpha \overset{\text{def}}{=} 1/(2^N(M+1))$.

Now, suppose that there's at least one non-chancy atom A and at least one chancy atom $Ch = ch$ such that $ch(A) \neq 1/2$. Let $\Omega_{A,ch}$ be the set of all state descriptions which include both A and $Ch = ch$.
Then, the conjunction $A \wedge Ch = ch$ is equivalent to the disjunction of all the state descriptions in $\Omega_{A,ch}$. Since an ur-prior credence is a probability, its credence in $A \wedge Ch = ch$ must be equal to the sum of its credence in each $\omega \in \Omega_{A,ch}$.

$$C(A \wedge Ch = ch) = \sum_{\omega \in \Omega_{A,ch}} C(\omega)$$

If an ur-prior satisfies the POI, then every state description will receive the same credence, α, so each summand in the above sum will be the same. So C's credence in $A \wedge Ch = ch$ will be α times the number of state descriptions in $\Omega_{A,ch}$.

$$C(A \wedge Ch = ch) = \alpha \cdot \#\Omega_{A,ch} \tag{4}$$

11. Here we face a choice point. We could either take the potential chance functions ch to be defined only over the sentences in the language generated from the non-chancy atoms, or we could take them to be defined over every sentence in the language. In the latter case, we would have chancy sentences like $Ch(Ch(P) = x) = y$. Which of these options we choose won't make a difference in any of what follows.

In the same way, let $\Omega_{\neg A,ch}$ be the set of all state descriptions which include both ¬A and $Ch = ch$. An ur-prior credence in the conjunction $\neg A \wedge Ch = ch$ must be equal to the sum of its credence in every state description $\omega \in \Omega_{\neg A,ch}$. By the POI, each of these state descriptions must receive the same ur-prior credence, α. So the ur-prior credence in $\neg A \wedge Ch = ch$ will be α times the number of state descriptions in $\Omega_{\neg A,ch}$.

$$C(\neg A \wedge Ch = ch) = \alpha \cdot \#\Omega_{\neg A,ch} \tag{5}$$

But the number of state descriptions in $\Omega_{A,ch}$ must be equal to the number of state descriptions in $\Omega_{\neg A,ch}$. Take any $\omega \in \Omega_{A,ch}$, replace 'A' with '¬A', and you have a state description $\omega^* \in \Omega_{\neg A,ch}$. This associates each $\omega \in \Omega_{A,ch}$ with a unique $\omega^* \in \Omega_{\neg A,ch}$, so $\#\Omega_{A,ch} \leq \#\Omega_{\neg A,ch}$. Likewise, for every $\omega^* \in \Omega_{\neg A,ch}$, replace '¬A' with 'A', and you have a state description $\omega \in \Omega_{A,ch}$. This associates each $\omega^* \in \Omega_{\neg A,ch}$ with a unique $\omega \in \Omega_{A,ch}$, so $\#\Omega_{\neg A,ch} \leq \#\Omega_{A,ch}$. So $\#\Omega_{A,ch} = \#\Omega_{\neg A,ch}$.
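The counting argument can also be checked by brute force for a small language. The following Python sketch is a minimal illustration under assumed values: N = 3 non-chancy atoms (the first of which plays the role of A) and M = 2 chancy atoms with the hypothetical names `ch1` and `ch2`. The value `None` stands for the negation of every chancy atom.

```python
from fractions import Fraction
from itertools import product

def state_descriptions(n_atoms, chancy_atoms):
    """Each state description fixes every non-chancy atom and one value of ±Ch.

    The value None stands for the negation of every chancy atom.
    """
    chancy_options = list(chancy_atoms) + [None]
    return [(values, ch)
            for values in product([True, False], repeat=n_atoms)
            for ch in chancy_options]

# Hypothetical setup: N = 3 non-chancy atoms and M = 2 chancy atoms.
N = 3
chancy_atoms = ["ch1", "ch2"]

omega = state_descriptions(N, chancy_atoms)
alpha = Fraction(1, len(omega))        # POI: the same credence for every ω

def subset(a_value, ch):
    """State descriptions that include ±A (the first atom) and Ch = ch."""
    return [w for w in omega if w[0][0] == a_value and w[1] == ch]

omega_A = subset(True, "ch1")
omega_not_A = subset(False, "ch1")

credence_A = alpha * len(omega_A)            # equation 4
credence_not_A = alpha * len(omega_not_A)    # equation 5
print(len(omega), len(omega_A), len(omega_not_A), credence_A, credence_not_A)
```

In this setup there are 2³ × (2 + 1) = 24 state descriptions, four in each of the two subsets, so the uniform ur-prior gives both conjunctions a credence of 1/6.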
The equality $\#\Omega_{A,ch} = \#\Omega_{\neg A,ch}$, together with (4) and (5), implies that

$$C(A \wedge Ch = ch) = C(\neg A \wedge Ch = ch) \tag{6}$$

But since we stipulated that $ch(A) \neq 1/2$, (6) is incompatible with the Principal Principle. As we saw in §2, the Principal Principle requires that

$$C(A \wedge Ch = ch) = \frac{ch(A)}{1 - ch(A)} \cdot C(\neg A \wedge Ch = ch) \tag{7}$$

(This follows from equation 2, if we substitute 'A' for 'P'.) The POI requires your initial, ur-prior credence function C to satisfy (6). The PP requires it to satisfy (7). Because $ch(A) \neq 1/2$, and because the POI makes both of these credences positive, it is impossible for C to satisfy both (6) and (7) at once. So the POI and the PP impose incompatible demands on an ur-prior credence.

In the same way, we may show that the POI is incompatible with equation 1. Suppose that there is at least one non-chancy atom A and one chancy atom $Ch = ch$ such that $ch(A) = x \neq 1/2$. Then, let $\Omega_{A,x}$ be the set of all state descriptions which include both A and some chancy atom $Ch = ch$ such that $ch(A) = x$, and let $\Omega_{\neg A,x}$ be the set of all state descriptions which include both ¬A and some chancy atom $Ch = ch$ such that $ch(A) = x$. Then, we may go through exactly the same steps as above, swapping '$\Omega_{A,ch}$' and '$\Omega_{\neg A,ch}$' out for '$\Omega_{A,x}$' and '$\Omega_{\neg A,x}$', respectively, to show that the POI requires

$$C(A \wedge Ch(A) = x) = C(\neg A \wedge Ch(A) = x) \tag{8}$$

whereas (1) requires that

$$C(A \wedge Ch(A) = x) = \frac{x}{1-x} \cdot C(\neg A \wedge Ch(A) = x)$$

So long as $x \neq 1/2$, it is impossible to satisfy both of these requirements at once. So, again, the demands of the POI and the demands of the PP conflict.

Perhaps this incompatibility arises because we have only considered a finite number of chancy atoms. Perhaps we could avoid the incompatibility if we open the door to uncountably many chancy atoms. To keep matters simple, let's suppose that there is just a single non-chancy atom, A. (Including additional non-chancy atoms makes the math more complicated, but everything we say about the simple case will carry over to the more complicated case as well.)
Then, we may have, for each $x \in [0,1]$, a chancy atom $Ch = ch_x$, where $ch_x$ is a probability function defined over the sentences we get by taking the set {A} and closing it under negation and disjunction. Every such sentence will be equivalent to one of the following four: 1) $A \wedge \neg A$, 2) $A \vee \neg A$, 3) $\neg A$, and 4) $A$. Since chance is a probability function, we must have $ch_x(A \wedge \neg A) = 0$, $ch_x(A \vee \neg A) = 1$, and $ch_x(\neg A) = 1 - ch_x(A)$. So we may characterize each potential chance function $ch_x$ with a single parameter, x, which is the probability $ch_x$ assigns to the atom A.

For each $x \in [0,1]$, there are two corresponding state descriptions: $A \wedge Ch = ch_x$ and $\neg A \wedge Ch = ch_x$. So there are uncountably many state descriptions in Ω. To apply the POI, then, we must first parameterize these state descriptions by using an appropriate random variable from Ω to $\mathbb{R}$. We can encode the information of which chancy atom obtains with a variable $Ch_A$, which maps a state description $\omega \in \Omega$ to x iff the chancy atom $Ch = ch_x$ is included in ω. But this variable on its own doesn't tell us everything. Besides the chance of A, we also need to know whether A is true or false. I will encode this information with a variable $1_A$, which maps a state description $\omega \in \Omega$ to the value 1 if A is included in ω, and maps ω to 0 if ¬A is included in ω. We can then put these two pieces of information together with a variable $V = Ch_A + 1_A$. V tells us everything there is to tell about both what the chance of A is and whether A is true or false.¹² If V is between 0 and 1, then A is false and the value of V is the chance of A. If V is between 1 and 2, then A is true and the value of V is the chance of A plus 1.

12. Actually, that's a bit of a fib. V does not tell us everything there is to tell about whether A is true or false, since the value V = 1 is consistent with the following two state descriptions: (a) $\neg A \wedge Ch = ch_1$, and (b) $A \wedge Ch = ch_0$. To get around this problem, we could instead use a variable $2_A$, which is 2 if A is true and 0 if A is false, and then consider $V^* = Ch_A + 2_A$. Nothing of any substance would change if we used $V^*$ in place of V, since the uniform distribution over V and the uniform distribution over $V^*$ don't disagree in probability. I stick with V in the body for two reasons. Firstly, characteristic functions like $1_A$ are more familiar than functions like $2_A$. Secondly, if I used $V^*$, the graphs in figure 3 would get too wide for the margins.

The POI tells you to have a uniform credence density over the potential values of V, as in figure 2. But this is incompatible with PP3 (the generalization of the PP for situations in which the number of potential chance hypotheses is uncountably infinite). For PP3 requires that, for any v between 0 and 1,

$$\rho_V(v+1) = \frac{v}{1-v} \cdot \rho_V(v) \tag{9}$$

(Equation 9 follows from equation 3, which itself follows from PP3, as we saw in §2.) But the uniform credence density shown in figure 2 sets $\rho_V(v+1) = \rho_V(v) = 1/2$ for every value of v between 0 and 1. So the uniform credence density will violate equation 9 for every value of v other than $v = 1/2$. So the uniform credence density does not abide by PP3. (I've shown two sample credence densities which abide by PP3 in figure 3.)

Figure 2: The uniform density over $V = Ch_A + 1_A$ is required by the POI, but is incompatible with PP3.

Figure 3: Two sample credence densities over $V = Ch_A + 1_A$ which abide by PP3, shown in panels (a) and (b). (In figure 3a, $\rho_V$ is $1 - v$ between 0 and 1, $v - 1$ between 1 and 2, and 0 elsewhere. In figure 3b, $\rho_V$ is $2v - 2v^2$ between 0 and 1, $2v^2 - 4v + 2$ between 1 and 2, and 0 elsewhere.)

4 Further Discussion

One kind of reaction to this incompatibility is to reject one or both of the principles and leave it at that. I won't have anything further to say about this kind of reaction. In this section, I'll discuss two more moderate reactions.
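Before turning to those reactions, the claims of §3 about the densities in figures 2 and 3 can be checked numerically. The following Python sketch is only an illustration: the function names, the grid, and the tolerance are assumptions of the sketch, and the densities are transcribed from the description of figures 2 and 3. It tests equation 9 on a grid of values strictly between 0 and 1, and approximates the total credence with a midpoint rule.

```python
def rho_uniform(v):
    """Uniform density over [0, 2], as in figure 2."""
    return 0.5 if 0 <= v <= 2 else 0.0

def rho_3a(v):
    """Density described for figure 3a."""
    if 0 <= v <= 1:
        return 1 - v
    if 1 < v <= 2:
        return v - 1
    return 0.0

def rho_3b(v):
    """Density described for figure 3b."""
    if 0 <= v <= 1:
        return 2 * v - 2 * v ** 2
    if 1 < v <= 2:
        return 2 * v ** 2 - 4 * v + 2
    return 0.0

def satisfies_eq9(rho, steps=99, tol=1e-9):
    """Check rho(v + 1) = v / (1 - v) * rho(v) on a grid strictly inside (0, 1)."""
    grid = [k / (steps + 1) for k in range(1, steps + 1)]
    return all(abs(rho(v + 1) - v / (1 - v) * rho(v)) <= tol for v in grid)

def total_mass(rho, steps=20000):
    """Midpoint-rule approximation of the integral of rho over [0, 2]."""
    width = 2 / steps
    return sum(rho((k + 0.5) * width) for k in range(steps)) * width

for name, rho in [("uniform", rho_uniform), ("3a", rho_3a), ("3b", rho_3b)]:
    print(name, satisfies_eq9(rho), round(total_mass(rho), 6))
```

On this grid, the two densities from figure 3 pass the equation 9 check and the uniform density fails it, while all three integrate to approximately 1.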
The first of these more moderate reactions (discussed in §4.1) is to move to a weaker formulation of the POI which doesn't conflict with the PP. The second (discussed in §4.2) is to emend the PP so that it doesn't conflict with the POI.

4.1 Revising the Principle of Indifference

As I formulated the POI, it says that your credence in any state description must be equal to your credence in any other state description. And as I've understood it, a state description $\omega \in \Omega$ specifies all of the things your language is able to tell you about the world. It describes matters in as precise a detail as your language will permit. If ω is a state description, then any other description of the world (in your language) is either entailed by ω or incompatible with ω. If we adopt a less stringent understanding of a state description, one on which it specifies some, but not all, of the things your language is able to tell you about the world, then you will be able to give each of these ersatz state descriptions the same credence without violating the PP.

Say that a non-chancy state description describes the world in as rich a detail as the non-chancy fragment of your language permits. Assuming a truth-functional language: if $A_1, A_2, \ldots, A_N$ are the non-chancy atoms of your language, then a non-chancy state description is a conjunction of the form $\pm A_1 \wedge \pm A_2 \wedge \cdots \wedge \pm A_N$, where each $\pm A_i$ is either $A_i$ or $\neg A_i$. Then, one way of avoiding the conflict with the PP is to weaken the POI so that it doesn't tell you that your ur-prior credences must be distributed evenly over state descriptions. Instead, the weakened principle says that your ur-prior credences must be distributed evenly over the non-chancy state descriptions.¹³ This restricted principle is compatible with the PP.

So why not simply restrict the POI in this way so as to make it consistent with the PP? We could certainly do so. However, I personally have a hard time seeing the philosophical motivation for such a view.
By way of explanation, let me say something about what kind of constraint the POI imposes on an ur-prior credence, and why its defenders have thought you should satisfy this constraint when you lack evidence. In general, a credence function will encode relations of evidential relevance. If your credence in P given Q is greater than your credence in P, this encodes the fact that you take Q to be evidence for P. The POI imposes a rather demanding constraint on what kinds of evidential relevance relations you're permitted to recognize in the absence of evidence. It forbids taking any atom of your language to be evidence for any other atom of your language in the absence of evidence.

13. Cf. Hawthorne et al. (2017), who formulate a weakened version of the POI, according to which your credence in every non-chancy atom should be 1/2. This weakened version of the POI is strictly weaker than the one considered in the body, though the version considered in the body follows from this principle together with the assumption that every non-chancy atom is probabilistically independent of every other non-chancy atom. Hawthorne et al. claim that the PP implies this weaker principle, though this is not correct. See Pettigrew (2018) and Titelbaum and Hart (2018) for discussion of where Hawthorne et al.'s argument goes awry.

Williamson (2010) justifies the ur-prior recommended by the POI on the grounds that it leads to maximally cautious actions: it "is on average the more cautious policy when it comes to risky decisions", in the sense that it "minimises worst-case expected loss".¹⁴ Similarly, Pettigrew (2016) argues for the POI on the grounds that it is epistemically cautious: it minimizes the worst case with respect to the accuracy of your beliefs.
According to Pettigrew, "what is wrong with assigning greater credence to one possibility [i.e., state description] over another in the absence of evidence is that by doing so you risk greater inaccuracy than you need to risk. [If you violate the POI, then] there is an alternative [ur-prior] credence function, namely the uniform distribution...that has lower inaccuracy in its worst-case scenario than you have in yours."¹⁵ None of these arguments depends in any way upon assumptions about the content of the atomic sentences in your language. So I have a hard time seeing why we should find those arguments any less compelling when some of the atoms of your language concern the chances.

Moreover, if we grant an exemption for atoms about the chances, one wants to know why a similar exemption cannot be granted for other atoms. Suppose your language contains only the atoms $B_1, B_2, \ldots, B_N$, where $B_i$ says that the ith raven is black. Some of us think that, even before receiving evidence, you should take the first k ravens being black to be evidence for the (k+1)st raven being black. Some of us say that, even with this simple language, and even in the absence of evidence, your credence in $B_N$, given $B_1 \wedge B_2 \wedge \cdots \wedge B_{N-1}$, should be greater than your unconditional credence in $B_N$. The POI disagrees. It says that, with this simple language, before you have any evidence, you must not take the fact that the first N − 1 ravens are black to be evidentially relevant to whether the Nth raven is black. It says that your credence in the state description $B_1 \wedge B_2 \wedge \cdots \wedge B_{N-1} \wedge B_N$ (every raven is black) must be the same as your credence in the state description $B_1 \wedge B_2 \wedge \cdots \wedge B_{N-1} \wedge \neg B_N$ (every raven is black except for the last one). And if that's so, then your credence that the Nth raven is black, given that the first N − 1 ravens are black, will be 1/2, which will be the same as your unconditional credence that the Nth raven is black.
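The raven example can be checked directly for a small language. The following Python sketch assumes, purely for illustration, that there are N = 5 ravens; it builds the uniform ur-prior over the state descriptions and computes the two credences being compared. The names `credence`, `first_black`, and `last_black` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

N = 5  # hypothetical number of ravens

# A state description assigns a truth-value to each of B_1, ..., B_N.
omega = list(product([True, False], repeat=N))
alpha = Fraction(1, len(omega))          # POI: uniform credence 1/2^N

def credence(event):
    """Ur-prior credence in the disjunction of state descriptions satisfying event."""
    return sum(alpha for w in omega if event(w))

def first_black(w):
    """B_1 ∧ ... ∧ B_{N-1}: the first N - 1 ravens are black."""
    return all(w[:-1])

def last_black(w):
    """B_N: the Nth raven is black."""
    return w[-1]

unconditional = credence(last_black)
conditional = (credence(lambda w: first_black(w) and last_black(w))
               / credence(first_black))
print(unconditional, conditional)
```

Both values come out as 1/2, so on the uniform ur-prior the blackness of the first four ravens makes no difference to the credence that the fifth is black.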
The unconditional credence is 1/2 because exactly half of the state descriptions include '$B_N$', and exactly half contain its negation. So, if you abide by the POI, then you won't see the blackness of the first N − 1 ravens as evidence for the Nth raven being black.

14. Williamson (2010, pp. 62 & 65). See Williamson (2010, §3.4.4) for more.
15. Pettigrew (2016, p. 164). See Pettigrew (2016, part III) for more.

More generally, the POI requires that, in the absence of evidence, every atomic sentence is given a credence of 1/2, and every atomic sentence is probabilistically independent of every other. So it forbids recognizing evidential relations between atoms, unless you have evidence supporting those evidential relations. This imposes a kind of a priori inductive skepticism. It forbids an ur-prior from recognizing many evidential relations typically recognized by inductive methods. It says that, in the absence of evidence, it is irrational to take 'John testifies that P' or 'It appears that P' to be evidence for 'P'.

The incompatibility of the PP with the POI is simply another application of this a priori inductive skepticism. In exactly the same way that the POI forbids thinking that the atom $B_1$ is evidence for the atom $B_2$, it forbids thinking that the atom $Ch = ch$ is evidence for the atom A, even when $ch(A) = 0.95$. Weakening the POI so that it says only to spread your credences evenly over the non-chancy state descriptions makes an exception to the general rule of not recognizing evidential relations between atoms. Such an exception could, of course, be granted. But the reasons provided for the POI by defenders like Jaynes (1957), Williamson (2010), and Pettigrew (2016) do not seem to motivate such an exemption. Take an ur-prior which satisfies the PP by being more confident in state descriptions in which $A \wedge Ch = ch$ than it is in state descriptions in which $\neg A \wedge Ch = ch$.
This ur-prior builds in more information, and so has lower entropy, than a credence function which satisfies the POI by spreading its credence equally over all state descriptions. If we should minimize prior information about whether nature is uniform, whether testifiers are trustworthy, and whether appearances are deceiving, then why shouldn't we also minimize information about whether the chances are accurate? If the outcome of a risky action depends upon whether A ∧ Ch = ch, the ur-prior which satisfies the PP will lead to less cautious actions than the one which satisfies the POI. If we shouldn't take incautious actions when it comes to whether nature is uniform, whether testifiers are trustworthy, and whether appearances are deceiving, then why should we take incautious actions when it comes to whether the chances are accurate? And, if we should minimize worst-case epistemic risk when it comes to matters about whether nature is uniform, testifiers are trustworthy, and appearances are deceiving, why shouldn't we also minimize worst-case epistemic risk when it comes to whether chance is accurate? I am not contending that there is no reason for a selective a priori inductive skepticism, according to which we have a priori grounds to trust in chance, but no a priori grounds to trust in regularities, testifiers, or our senses. I am contending that, to my knowledge, no such reason has been given.

A somewhat less moderate position would weaken the POI even further by allowing an ur-prior to build in assumptions about the uniformity of nature, as well as the reliability of testifiers and appearances, in addition to the PP. More generally, we could allow in any number of a priori rationality constraints, and understand the POI as saying only that you should spread your ur-prior credences as evenly as possible subject to these constraints.
That is: your ur-prior credences should be spread evenly, except when this conflicts with some other a priori norm of rationality. Let me make five observations about this more relaxed principle.

Firstly, some authors who have defended "the principle of indifference" have something like this more relaxed principle in mind. For instance, White (2009) calls the following "the principle of indifference":

If 'P' and 'Q' are evidentially symmetrical, then your credence in 'P' should equal your credence in 'Q'.

When explaining what it takes for 'P' and 'Q' to be evidentially symmetrical, he makes it clear that this can include a priori reasons to think 'P' is more likely than 'Q'. He writes: "I mean to understand evidence very broadly here to encompass whatever we have to go on in forming an opinion about the matter. This can include non-empirical evidence or reasons, if there are such."16

Secondly, several other authors who have defended "the principle of indifference" have the stronger thesis I've here called 'POI' in mind. For instance, Pettigrew (2016, §12.1) explicitly rejects White's formulation of the POI. In its place, he advocates the following stronger formulation:

PoI: Suppose that F is a finite, rank-complete set of propositions. If an agent has an initial credence function c0 defined on F, then rationality requires that c0 is the uniform distribution on F ...[where the uniform distribution] assigns to each proposition the proportion of the possible worlds at which it is true.17

Pettigrew formulates this principle in a framework where the arguments of your credence function are sets of possible worlds, which he calls 'propositions'. But we may translate between these two frameworks by taking each of our state descriptions to correspond to one of his possible worlds. Each of our sentences is equivalent to some disjunction of state descriptions.
So we may associate each of our sentences with the set of state descriptions in this disjunction, which we may in turn associate with a set of possible worlds in Pettigrew's framework.18 Given this translation scheme, his requirement that F be finite is analogous to requiring that there are finitely many atomic sentences. (The notion of a rank-complete set is a slightly technical notion which is needed for Pettigrew's theorem, but which isn't relevant to our discussion here. Just note that, given our translation scheme, this condition will be satisfied so long as your language is closed under negation and conjunction and you have a credence in every sentence in your language.19) Given this translation, Pettigrew's principle says exactly what POI does: your ur-prior should give every state description the same credence.20

16. White (2009, pp. 161–2).
17. Pettigrew (2016, p. 164).
18. You may worry that, while this translation scheme gives us a surjective function from sentences to propositions (sets of possible worlds), the function is not a bijection. For there will be multiple sentences translated to the same proposition. That's correct, but even so, any two sentences translated to the same proposition are equivalent. Since your ur-prior is a probability, it assigns equivalent sentences the same probability. Consider the equivalence classes of logically equivalent sentences. The proposed translation establishes a bijection between propositions and these equivalence classes. So the probability which an ur-prior gives to a proposition (in Pettigrew's framework) will correspond to the probability which an ur-prior gives to any sentence in the corresponding equivalence class (in our framework). So we may go back and forth between the two frameworks.
19. For the curious: this is what it is for F to be rank-complete: if there is a proposition P ∈ F which contains N possible worlds, then every other set of N worlds is also included in F.
20. I believe that Williamson (2010) also endorses the principle I've called 'POI', though this is more difficult to establish exegetically, since it hinges upon whether Williamson understands 'evidence' to include a priori knowledge, and the text says very little about evidence. In any case, Williamson rejects the Principal Principle (see the discussion in §4.2 below).

Thirdly, while there may be good reason to endorse the weaker principle while rejecting the stronger, the arguments of Williamson (2010) and Pettigrew (2016) do not support this more moderate position. An ur-prior which satisfies the PP will lead to less cautious actions than an ur-prior which satisfies the POI. So adopting the weaker principle does not minimize worst-case expected loss. Since Williamson's justification of the POI appeals to a principle of minimizing worst-case expected loss, that justification cannot be used to support the moderate position. Similarly, an ur-prior which satisfies the PP will lead to less epistemic caution than one which satisfies the POI. As Pettigrew taught us, if you satisfy the PP, then there is an alternative ur-prior, namely the uniform ur-prior, which has a lower inaccuracy in its worst-case scenario than you have in yours. Since Pettigrew's justification of the POI appeals to a principle which says that in the absence of evidence, you must minimize your worst-case inaccuracy, that justification cannot be used to support the moderate position, either.

Fourthly, depending upon what the a priori rationality constraints happen to be, it could turn out that satisfying this kind of norm is impossible. For instance, suppose that it is an a priori requirement of rationality that your credence in B2, given B1, is greater than 1/2. Then, there is no ur-prior which spreads its credence most evenly, subject to this constraint. Choose any ε > 0 and take an ur-prior whose credence in B2, given B1, is 1/2 + ε. This ur-prior satisfies the constraint, but it distributes its credence less evenly than an ur-prior whose credence in B2 given B1 is 1/2 + ε/2. In response to troubles like these, we could take the principle to say only that an ur-prior should spread its credence sufficiently evenly, subject to the a priori requirements of rationality,21 though one would like to hear something both general and substantive about how evenly your credences must be spread for them to count as sufficiently even.

Finally, depending upon how exacting the other a priori norms of rationality are, there may be little to no work left over for the weak variant of the POI to do. For instance, suppose that the other a priori norms of rationality pin down a precise rational credence in every state description. Then, the more relaxed variant of POI would be vacuously satisfied, which is to say that it would impose no constraint at all. There would be no difference between it and a norm which says to spread your credence as unevenly as possible, given the (other) a priori rational norms.

4.2 Revising the Principal Principle

Following Lewis, I have formulated the Principal Principle so that it, just like the POI, constrains your initial, or ur-prior, credences: the credences you have, or are disposed to have, in the absence of evidence. The principle has nothing to do with how your credences are disposed to change upon receiving evidence.
Nonetheless, some defenders of the POI may see hidden in the Principal Principle a vestige of the principle of conditionalization, according to which you should be disposed to update your credences by conditioning on any newly acquired evidence. Those defenders of the POI may wish to reject the Principal Principle as Lewis explicitly formulated it, but accept a nearby principle which, instead of constraining your initial conditional credences, constrains the credences you are disposed to adopt upon learning what the chance of a sentence is. For instance, in response to an unrelated puzzle, Wallmann and Williamson (2020, p. 3) propose a modification of the Principal Principle which says that C_{Ch(P)=x}(P) = x, where C_{Ch(P)=x} is the credence function you are disposed to adopt upon learning that Ch(P) = x and no more. The principle of conditionalization says that C_{Ch(P)=x} should be C conditioned on Ch(P) = x, so this proposed principle agrees with the Principal Principle when conjoined with conditionalization.

Williamson, however, rejects conditionalization. Instead, he says that, when your evidence imposes constraints on your credences, you should adopt a probability which meets those constraints and which otherwise distributes its probability as evenly as possible.22 Following Jaynes (1957), Williamson calls this updating norm the principle of maximum entropy. (The name comes from the fact that the evenness of your credence can be measured by its entropy.)

21. Cf. Williamson (2010).
22. Or, perhaps, sufficiently evenly.

In many cases, the principle of maximum entropy will agree with the principle of conditionalization. If you receive evidence which imposes the constraint that E receive credence 1, then the updating norm of maximum entropy will require that C_E is C conditioned on E.23 That is: if your evidence constrains your credence in E to be 1, then the norm of maximum entropy will tell you to update by conditioning on E.
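This agreement can be illustrated numerically in the special case of a uniform ur-prior. The sketch below is not Seidenfeld's result, only a sanity check under stated assumptions: a three-atom language, a hypothetical piece of evidence E (the disjunction of the first two atoms), and Shannon entropy as the measure of evenness. It conditions the uniform prior on E and then compares the entropy of the result with that of many randomly generated rivals which also give E credence 1. The names `entropy`, `random_rival`, and `E` are illustrative.

```python
import math
import random
from itertools import product


def entropy(dist):
    """Shannon entropy in bits, with 0 * log(0) treated as 0."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# State descriptions for three atoms A1, A2, A3 (an illustrative language).
states = list(product([True, False], repeat=3))
prior = {s: 1 / len(states) for s in states}

# Hypothetical evidence E: the disjunction A1 v A2.
E = [s for s in states if s[0] or s[1]]

# Conditionalization: renormalize the prior over the E-states.
total = sum(prior[s] for s in E)
conditioned = {s: (prior[s] / total if s in E else 0.0) for s in states}

rng = random.Random(0)  # fixed seed so the check is repeatable


def random_rival():
    """A random credence function that also assigns credence 1 to E."""
    weights = [rng.random() for _ in E]
    z = sum(weights)
    rival = {s: 0.0 for s in states}
    for s, w in zip(E, weights):
        rival[s] = w / z
    return rival


rivals = [random_rival() for _ in range(1000)]
max_rival_entropy = max(entropy(r) for r in rivals)
print(entropy(conditioned), max_rival_entropy)
```

No sampled rival beats the conditioned distribution on entropy, as expected: among credence functions certain of E, the one that is uniform over the E-states is the most even. Random sampling is of course no proof, and the sketch says nothing about non-uniform priors.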
So, if learning that Ch(A) = x imposes the constraint that Ch(A) = x be assigned a credence of 1, then C_{Ch(A)=x}(A) will be equal to C(A | Ch(A) = x). In that case, since the POI won't set C(A | Ch(A) = x) equal to x unless x = 1/2, the principle of maximum entropy won't set C_{Ch(A)=x}(A) equal to x unless x = 1/2, either.

Perhaps for this reason, Williamson doesn't actually say that, upon learning that E, you should update by imposing the sole evidential constraint that E be assigned credence 1. That's how things work for the non-chancy sentences. But there's something special about chancy sentences. Learning something about the chances (something like Ch(P) = x) doesn't only impose the evidential constraint that C(Ch(P) = x) = 1. It additionally imposes the evidential constraint that your credence in P be equal to x.24 Then, the proposed revision of the Principal Principle is trivially satisfied: for any sentence P and any real number x, C_{Ch(P)=x}(P) will be x, by stipulation.

What if you just learn that the chance of P lies within some range of values? In that case, Williamson says that your credence that P must lie within that range.25 For illustration, suppose that you begin with the credence distribution shown in figure 2, and you learn that Ch(A) > 1/2. In that case, your credence that A is currently 1/2, so you currently satisfy the constraint to have a credence in the interval [1/2, 1]. Moreover, your credence that A is independent of your credence that Ch(A) > 1/2, so C(A | Ch(A) > 1/2) = 1/2, and you will still satisfy the constraint to have a credence in the interval [1/2, 1] after conditioning on the chance claim Ch(A) > 1/2. So Williamson won't advise you to increase your credence in A at all, even though your expectation of the chance of A has risen from 1/2 to 3/4ths. We could try to get around this problem by insisting that the evidence Ch(A) > 1/2 imposes the constraint that your credence that A equal 3/4ths.
But then, if you were to go on to learn Ch(A) ≤ 3/4, we would presumably want to impose the new constraint that your credence in A be your (new) expectation of the chance of A, C(A) = 5/8. At that point, the evidential constraints on your credences would be inconsistent. It's clear that the constraint C(A) = 3/4 should be ditched and that the constraint C(A) = 5/8 should take its place, though it's less clear whether there's any principled story to be told about why.

23. See Seidenfeld (1986)'s 'Result 1', on page 471.
24. "Learning Ch(P) = x does not merely impose the constraint C(Ch(P) = x) = 1, but also the constraint C(P) = x" (Williamson, 2010, p. 79, with minor notational changes).
25. More generally, he says that your credence that P must lie within the convex hull of the numbers which might, for all your evidence has to say, be the chance of P. See §3.3.1 of Williamson (2010).

In any case, this kind of approach seems to me to confuse evidence (which is the input to an updating rule) with the rational response to that evidence (which is the output of an updating rule). To see why, think about what would happen, had you first learnt that Ch(A) ≤ 3/4, and then learnt that Ch(A) > 1/2. In that case, we would have to say that the evidence Ch(A) ≤ 3/4 imposes the constraint that C(A) = 3/8, and that the evidence Ch(A) > 1/2 imposes the constraint that C(A) = 5/8. But why should changing the order in which you receive the evidence about chance make a difference to the constraint which that evidence imposes? Whether you already know that Ch(A) > 1/2 should make a difference to which credences you adopt when you learn that Ch(A) ≤ 3/4. But I don't see why the evidence you've already received would make any difference to the constraint which the piece of evidence Ch(A) ≤ 3/4 itself imposes on your credences.
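The three numbers in this example can be recovered with a short calculation. Figure 2 is not reproduced in this passage, so the sketch below assumes, as the values 1/2, 3/4, 5/8, and 3/8 suggest, that your credence over the chance of A starts out uniform on [0, 1]. On that assumption, your expectation of the chance after learning that it lies in an interval is just the midpoint of that interval. The function name `expected_chance` is illustrative.

```python
from fractions import Fraction


def expected_chance(lo, hi):
    """Expectation of Ch(A) under a uniform credence restricted to [lo, hi].

    Assumes the credence over Ch(A) was uniform on [0, 1] to begin with,
    so restricting it to an interval leaves it uniform on that interval.
    """
    return (Fraction(lo) + Fraction(hi)) / 2


half = Fraction(1, 2)
three_quarters = Fraction(3, 4)

# Order 1: first learn Ch(A) > 1/2, then learn Ch(A) <= 3/4.
order_1 = [
    expected_chance(half, 1),
    expected_chance(half, three_quarters),
]

# Order 2: first learn Ch(A) <= 3/4, then learn Ch(A) > 1/2.
order_2 = [
    expected_chance(0, three_quarters),
    expected_chance(half, three_quarters),
]

print(order_1)
print(order_2)
```

Both orders end at the same expectation, but the constraint attributed to each individual piece of evidence differs with the order: Ch(A) > 1/2 is paired with 3/4 when learnt first and with 5/8 when learnt second. That is the order-dependence at issue.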
We should be able to specify the constraints imposed by a new piece of evidence in a way which is independent of your prior credences and your pre-existing evidence. For while it is natural to think that a rational agent changes their beliefs to meet the demands of their evidence, it is far less natural to think that the evidence changes its demands to meet the rational agent's beliefs.26

Whether we adopt this proposal or not, if you update your credences in the way Williamson advises, you will violate the rule of conditionalization whenever you stand to learn something about the chances. So, whenever you stand to learn something about the chances, you will be susceptible to a Dutch book strategy, as Teller (1973, 1976) and Lewis (1999) have shown. Williamson recognizes this, but contends that susceptibility to a Dutch book strategy is no vice. He argues for this as follows: suppose you are a juror who is about to listen to the defense. You know that they will only present evidence which supports the defendant's innocence. So, if you know you're rational, then you know that your credence in the defendant's innocence will go up, no matter what you hear. So, after their defense, you'll sell back a bet on the defendant's guilt for less than you paid for it.27 But Williamson is wrong that rationality will compel you to lower your credence in the defendant's guilt no matter what you hear. An exceptionally weak defense should make you more confident that the defendant is guilty (think: that's the best they could do?).

26. Cf. Field (1978)'s discussion of Jeffrey conditioning.
27. See Williamson (2010, §4.4).

5 In Summation

In sum: the POI and the PP are incompatible. There is a minimal way of weakening the POI to render it compatible with the PP, though I have a hard time seeing the philosophical motivation for this ad hoc weakening. There is also a much weaker principle which sometimes goes by the name 'the principle of indifference'.
This principle allows an ur-prior credence distribution to be uneven so long as this unevenness is required by some other a priori requirement of rationality. It says merely that your ur-prior credences should be as even as the other requirements of rationality allow them to be. This principle does not conflict with the PP. While there may be good reason to endorse this more moderate position alongside the PP, this more moderate position is not supported by Williamson's and Pettigrew's arguments for the POI.

Though abiding the POI means violating the PP, you could satisfy both the POI and a surrogate chance deference principle which says that, upon learning that the chance of P is x, you should be disposed to adopt a credence of x in P. Satisfying this surrogate chance deference principle requires you to take evidence about the chances to impose special evidential constraints. Learning information about the chances doesn't just require you to become certain in that information. It also requires you to change your credences in other sentences which you previously took to be probabilistically independent of the information about chances. Though this will allow you to satisfy the surrogate chance deference principle, it will leave you disposed to not change your credence in P, even when your expectation of the chance of P is raised. We could try to get around this problem, but only by customizing evidential constraints on a case-by-case basis. And, even if we did this, the norm of maximum entropy would expose you to diachronic exploitability.

References

Easwaran, Kenny, 2014. Regularity and Hyperreal Credences. The Philosophical Review, 123(1):1–41.
Field, Hartry, 1978. A Note on Jeffrey Conditionalization. Philosophy of Science, 45(3):361–367.
Hájek, Alan, ms. Staying Regular?
Hall, Ned, 1994. Correcting the Guide to Objective Chance. Mind, 103(412):505–517.
Hall, Ned and Arntzenius, Frank, 2003. On What We Know About Chance. The British Journal for the Philosophy of Science, 54(2):171–179.
Hawthorne, James, Landes, Jürgen, and Williamson, Jon, 2017. The Principal Principle Implies the Principle of Indifference. The British Journal for the Philosophy of Science, 68(1):123–131.
Jaynes, E. T., 1957. Information Theory and Statistical Mechanics. Physical Review, 106(4):620–630.
Lewis, David K., 1980. A Subjectivist's Guide to Objective Chance. In Richard C. Jeffrey, editor, Studies in Inductive Logic and Probability, volume II, pages 263–293. University of California Press, Berkeley.
Lewis, David K., 1994. Humean Supervenience Debugged. Mind, 103(412):473–490.
Lewis, David K., 1999. Why Conditionalize? In Papers in Metaphysics and Epistemology, volume 2, chapter 23, pages 403–407. Cambridge University Press, Cambridge.
Pettigrew, Richard, 2016. Accuracy and the Laws of Credence. Oxford University Press, Oxford.
Pettigrew, Richard, 2018. The Principal Principle Does Not Imply the Principle of Indifference. The British Journal for the Philosophy of Science.
Seidenfeld, Teddy, 1986. Entropy and Uncertainty. Philosophy of Science, 53(4):467–491.
Teller, Paul, 1973. Conditionalization and Observation. Synthese, 26(2):218–258.
Teller, Paul, 1976. Conditionalization, observation, and change of preference. In W. L. Harper and C. A. Hooker, editors, Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, volume I, pages 205–253. D. Reidel Publishing Company, Dordrecht.
Thau, Michael, 1994. Undermining and Admissibility. Mind, 103(412):491–503.
Titelbaum, Michael G and Hart, Casey, 2018. The Principal Principle Does Not Imply the Principle of Indifference, Because Conditioning on Biconditionals Is Counterintuitive. The British Journal for the Philosophy of Science. ISSN 0007-0882. doi:10.1093/bjps/axy011.
Wallmann, Christian and Williamson, Jon, 2020. The Principal Principle and Subjective Bayesianism. European Journal for Philosophy of Science, 10(3):1–14.
White, Roger, 2009. Evidential Symmetry and Mushy Credence. In Tamar Szabo Gendler and John Hawthorne, editors, Oxford Studies in Epistemology, pages 161–186. Oxford University Press.
Williamson, Jon, 2010. In Defense of Objective Bayesianism. Oxford University Press, Oxford.
Williamson, Timothy, 2007. How probable is an infinite sequence of heads? Analysis, 67(3):173–180.