A Generalised Lottery Paradox for Infinite Probability Spaces [1]

Martin Smith
University of Glasgow

Many epistemologists have responded to the lottery paradox by proposing formal rules according to which high probability defeasibly warrants acceptance. Douven and Williamson (2006) present an ingenious argument purporting to show that such rules invariably trivialise, in that they reduce to the claim that a probability of 1 warrants acceptance. Douven and Williamson's argument does, however, rest upon significant assumptions – among them a relatively strong structural assumption to the effect that the underlying probability space is both finite and uniform. In this paper, I will show that something very like Douven and Williamson's argument can in fact survive with much weaker structural assumptions – and, in particular, can apply to infinite probability spaces.

[1] I would like to thank Stephan Leuenberger and three anonymous referees for numerous helpful comments and for pointing out several errors in the original version of this paper.

I. INTRODUCTION

A very natural first thought to have about the relationship between rational acceptance and probability is that propositions become rationally acceptable when they are sufficiently likely to be true. This gives us the following:

Basic Rule: A proposition $\varphi$ is rationally acceptable if $\Pr(\varphi) > t$,

where Pr is a probability function over propositions and t is some threshold value close to, but less than, 1. It is also very natural to think that rational acceptability is closed under conjunction. That is:

Closure: If each of $\varphi$ and $\psi$ is rationally acceptable then so is $\varphi \wedge \psi$.

As is well known, however, Closure and the Basic Rule, when combined, yield the result that an inconsistent proposition can be rationally acceptable. This can be made vivid via the so-called 'lottery paradox'. Select an integer $n > 1/(1 - t)$ and consider a fair n-ticket lottery guaranteed to have a single winner. The propositions that ticket #1 will lose, that ticket #2 will lose etc. will each have a probability of $1 - 1/n$ which, given the above inequality, will be greater than t and, thus, qualify as rationally acceptable by the Basic Rule – call these 'lottery propositions'. The conjunction of the lottery propositions, however, is directly inconsistent with the proposition that some ticket will win, which can also be assumed to be rationally acceptable. By Closure, then, the inconsistent proposition that some ticket will win and no ticket will win will be rationally acceptable.

Henry Kyburg, who was the first to draw attention to the lottery paradox, responded by rejecting Closure (Kyburg, 1961, 1970). This solution has not, however, been widely embraced amongst epistemologists – many of whom would rather retain Closure and resolve the paradox by refining the Basic Rule in such a way as to block the rational acceptability of lottery propositions (see, for instance, Lehrer, 1974, chap. 8; Pollock, 1990, pp. 80-81; Ryan, 1996; Nelkin, 2000; Douven, 2002). The refined rules that have been proposed can be shoe-horned into the following general form:

Refined Rule: A proposition $\varphi$ is rationally acceptable if $\Pr(\varphi) > t$, unless defeater D holds of $\varphi$,

where D is some condition satisfied by lottery propositions. The defeaters proposed by Pollock and Douven suffice to give the general flavour:

According to Pollock, a proposition $\varphi$ is rationally acceptable if $\Pr(\varphi) > t$, unless $\varphi$ is a member of a minimally inconsistent set of propositions, each of which has a probability greater than t (Pollock, 1990, pp. 80-81).
According to Douven, a proposition $\varphi$ is rationally acceptable if $\Pr(\varphi) > t$, unless $\varphi$ is a member of a probabilistically self-undermining set of propositions, where a set of propositions is probabilistically self-undermining just in case (i) the probability of each member is greater than t and (ii) the probability of each member conditional upon the remaining members is less than t (Douven, 2002; see also Douven and Williamson, 2006, p. 759).

Any such rule will escape the paradox as it stands. But the ambition behind these rules, of course, is not just to resolve the lottery paradox per se. Generally speaking, a refined rule of rational acceptability aspires to do two things: (i) predict that some propositions that are less than certain can be rationally acceptable and (ii) fail to predict that any inconsistent or otherwise absurd propositions are rationally acceptable, even in combination with Closure. The Basic Rule, of course, fails on the second count. Many of the refined rules that have been proposed, however, have turned out to fail on the first. That is, many of the proposed defeat conditions have turned out to encompass not only lottery propositions but also, on close inspection, all propositions that are less than certain. In this case, the associated rule will reduce to the claim that a probability of 1 is sufficient for rational acceptability.

Although refined rules of rational acceptability have had a rather poor track record, one might simply take this as an invitation to refine further. In 'Generalising the lottery paradox' (2006), however, Igor Douven and Timothy Williamson present an ingenious argument to the effect that a strikingly broad range of refined rules – roughly all of those characterised in logical or probabilistic terms – will either fail on count (i) or on count (ii). This is their 'generalised lottery paradox' and it comes close, I think, to showing that the ambition behind the refined rules simply cannot be realised.
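As a rough illustration of the lottery set-up and of the two defeaters described above, the following Python sketch models a fair n-ticket lottery as a finite uniform space in which world i is the world where ticket i wins. The threshold t = 9/10 and every function and variable name are assumptions made for the example; nothing here is drawn from Pollock's or Douven's own presentations.

```python
from fractions import Fraction
from functools import reduce
import math

t = Fraction(9, 10)                      # illustrative threshold (assumed)
n = math.floor(1 / (1 - t)) + 1          # smallest integer n > 1/(1 - t)
worlds = frozenset(range(n))             # world i: ticket i wins

def pr(prop):
    """Uniform probability: ratio of cardinalities."""
    return Fraction(len(prop), len(worlds))

# Lottery propositions: "ticket i will lose".
lose = [worlds - {i} for i in range(n)]

def conjunction_without(i):
    """Conjunction (intersection) of every lottery proposition except lose[i]."""
    return reduce(frozenset.intersection, lose[:i] + lose[i + 1:])

# Basic Rule: each lottery proposition clears the threshold ...
each_above_t = all(pr(p) > t for p in lose)
# ... yet the conjunction of all of them is the empty (inconsistent) proposition.
conjunction = reduce(frozenset.intersection, lose)

# Pollock-style defeater: the set is inconsistent, but every proper subset
# obtained by dropping one member is consistent (non-empty intersection).
minimally_inconsistent = (conjunction == frozenset()
                          and all(conjunction_without(i) for i in range(n)))

def conditional_on_rest(i):
    """Pr(lose[i] | conjunction of the remaining lottery propositions)."""
    rest = conjunction_without(i)
    return pr(lose[i] & rest) / pr(rest)

# Douven-style defeater: each member is above t, but below t given the rest.
self_undermining = each_above_t and all(conditional_on_rest(i) < t for i in range(n))
```

With these assumed values the lottery has 11 tickets, each lottery proposition has probability 10/11, and both defeat conditions come out satisfied, so both refined rules withhold acceptance from the lottery propositions.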
Douven and Williamson's argument does, though, rest upon significant assumptions – among them a relatively strong structural assumption to the effect that the underlying probability space is both finite and uniform. As Douven and Williamson remark, 'It must be admitted that there is no straightforward generalization to infinite probability spaces' (Douven and Williamson, 2006, p. 775). This, as Douven and Williamson acknowledge, leaves a certain avenue of response open to the refined rule theorist. In this paper, I shall attempt to close this avenue off. By exploiting a result of Villegas (1964), I will show that a close analogue of Douven and Williamson's argument can survive with much weaker structural assumptions – and, in particular, can be generalised to infinite probability spaces.

II. DOUVEN AND WILLIAMSON'S ARGUMENT

Following Douven and Williamson, let propositions be modelled as sets of possible worlds. A probability space is a triple $\langle W, F, \Pr \rangle$ where W is the set of possible worlds, F is a $\sigma$-field on W – that is, a set of subsets of W that includes W itself and is closed under complementation and countable union – and Pr is a probability function taking F into the real interval [0, 1]. Douven and Williamson assume that W is a finite set, that F is equal to $\wp(W)$ and that Pr is a uniform distribution over the members of W – that is, for any $w \in W$, $\Pr(\{w\}) = 1/|W|$ (where $|W|$ is the cardinality of W). With these assumptions in place, it follows that the probability of any proposition in F will be equal to the ratio of its cardinality to that of W – that is, for any $\varphi \in F$, $\Pr(\varphi) = |\varphi|/|W|$.

Call a function f an automorphism of $\langle W, F, \Pr \rangle$ iff f is a 1:1 function from F onto itself that satisfies these conditions:

(i) $f(\varphi \wedge \psi) = f(\varphi) \wedge f(\psi)$
(ii) $-f(\varphi) = f(-\varphi)$
(iii) $\Pr(\varphi) = \Pr(f(\varphi))$

for all $\varphi, \psi \in F$.
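The definitions above can be checked mechanically on a small example. The following Python sketch, under the assumption of a four-world space with F equal to the power set, lifts a permutation of worlds to a map on propositions and tests conditions (i) to (iii). All names (`make_pr`, `induced`, `is_automorphism`, the weights) are hypothetical and chosen for illustration only.

```python
from fractions import Fraction
from itertools import combinations

W = ("a", "b", "c", "d")                 # a small finite set of worlds (assumed)

def powerset(worlds):
    """All subsets of the worlds, i.e. F = P(W)."""
    return [frozenset(c) for r in range(len(worlds) + 1)
            for c in combinations(worlds, r)]

F = powerset(W)

def make_pr(weights):
    """Probability function on F determined by the weight of each world."""
    return lambda prop: sum((weights[w] for w in prop), Fraction(0))

def induced(perm):
    """Lift a permutation of worlds to a map on propositions."""
    return lambda prop: frozenset(perm[w] for w in prop)

def is_automorphism(f, pr):
    top = frozenset(W)
    onto = {f(p) for p in F} == set(F)   # on a finite F, onto implies 1:1
    cond_i = all(f(p & q) == f(p) & f(q) for p in F for q in F)
    cond_ii = all(top - f(p) == f(top - p) for p in F)
    cond_iii = all(pr(p) == pr(f(p)) for p in F)
    return onto and cond_i and cond_ii and cond_iii

swap_ab = induced({"a": "b", "b": "a", "c": "c", "d": "d"})
uniform = make_pr({w: Fraction(1, 4) for w in W})
skewed = make_pr({"a": Fraction(1, 2), "b": Fraction(1, 6),
                  "c": Fraction(1, 6), "d": Fraction(1, 6)})
```

Under the uniform distribution the swap of a and b satisfies all three conditions, whereas under the skewed weights it fails condition (iii). This is one way of seeing why uniformity matters to the argument that follows.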
A property P of propositions is structural with respect to a probability space $\langle W, F, \Pr \rangle$ just in case, for any proposition $\varphi \in F$ and automorphism f of $\langle W, F, \Pr \rangle$, $\varphi$ has P iff $f(\varphi)$ has P. A property P of propositions is structural simpliciter just in case it is structural with respect to all probability spaces. A property P of propositions is aggregative with respect to a probability space $\langle W, F, \Pr \rangle$ just in case for any propositions $\varphi, \psi \in F$, $\varphi \wedge \psi$ has P whenever $\varphi$ has P and $\psi$ has P. A property P of propositions is aggregative simpliciter just in case it is aggregative with respect to all probability spaces.

It's important to note that whether a proposition possesses a property is also something that is probability space relative – a proposition may possess a property P relative to some spaces in which it features, but not others. When it is obvious what probability space we are dealing with, this relativity can be suppressed (and Douven and Williamson do suppress it) – but it will assume some significance in the next section. Given these definitions, Douven and Williamson prove the following:

Theorem 1: Let $\langle W, \wp(W), \Pr \rangle$ be a finite, uniform probability space. If P is a structural property, Q is an aggregative property and P is sufficient for Q then, if there is a proposition $\varphi \in \wp(W)$ such that $\varphi$ has P and $\Pr(\varphi) < 1$, it follows that $\emptyset$ has Q.

Proof: Since $\Pr(\varphi) < 1$, $\varphi \neq W$ and for some $w^* \in W$, $w^* \notin \varphi$. For all $w_i \in W$, let $\pi_i$ be a permutation on the elements of W such that $\pi_i(w_i) = w^*$, $\pi_i(w^*) = w_i$ and $\pi_i(w) = w$ for every other $w \in W$. Define $\pi_i(\varphi)$ as $\{\pi_i(w) \mid w \in \varphi\}$ for all $\varphi \in \wp(W)$. Each such $\pi_i$ evidently meets the first two conditions for an automorphism. Each $\pi_i$ also preserves the cardinality of propositions which, given that $\langle W, \wp(W), \Pr \rangle$ is finite and uniform, ensures that it preserves the probability of propositions. In this case, each $\pi_i$ is an automorphism of $\langle W, \wp(W), \Pr \rangle$.
Observe that, for each i, $w_i \notin \pi_i(\varphi)$ (if $w_i \notin \varphi$, then $\varphi = \pi_i(\varphi)$ and if $w_i \in \varphi$ then $\pi_i(\varphi)$ results from $\varphi$ by exchanging $w_i$ and $w^*$). Since, by stipulation, $\varphi$ has P and P is structural, it follows that, for all i, $1 \leq i \leq |W|$, $\pi_i(\varphi)$ has P and, thus, has Q. Since Q is aggregative, it follows that $\pi_1(\varphi) \wedge \ldots \wedge \pi_{|W|}(\varphi)$ has Q, but $\pi_1(\varphi) \wedge \ldots \wedge \pi_{|W|}(\varphi) = \emptyset$. QED [2]

[2] Douven and Williamson's proof also serves to establish the following, stronger theorem:

Theorem 1*: Let $\langle W, \wp(W), \Pr \rangle$ be a finite, uniform probability space. If P is a structural property with respect to $\langle W, \wp(W), \Pr \rangle$, Q is an aggregative property and P is sufficient for Q then, if there is a proposition $\varphi \in \wp(W)$ such that $\varphi$ has P and $\Pr(\varphi) < 1$, it follows that $\emptyset$ has Q.

Theorem 1* is stronger than theorem 1 on account of the fact that any structural property will be structural with respect to $\langle W, \wp(W), \Pr \rangle$, but the converse need not hold. Theorem 1 is, however, strong enough for their purposes.

The significance of theorem 1 for refined rules of rational acceptability should be clear: Let Q be the property of rational acceptability and P be a sufficient condition for rational acceptability as articulated by a refined rule. The endorsement of Closure amounts, in effect, to the requirement that Q be an aggregative property. Assuming a finite and uniform probability space, if P is structural and satisfied by some proposition that is less than certain it follows, by theorem 1, that $\emptyset$ will satisfy Q.

Douven and Williamson go on to show just how broad a class of potential refined rules articulate structural conditions – including all of those defined in broadly formal (that is, logical or probabilistic) terms – but this aspect of the argument does not depend upon either finiteness or uniformity and need not concern us here. The finiteness and uniformity assumptions do, however, play an essential role in the above proof.
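The swap construction used in the proof of theorem 1 can be sketched as follows. This is a minimal Python illustration, assuming a six-world uniform space and a proposition that excludes one world; `swap_images` and the other names are hypothetical.

```python
from functools import reduce

def swap_images(worlds, phi):
    """Return [pi_i(phi) for each w_i], where pi_i swaps w_i with a fixed w* outside phi."""
    outside = sorted(worlds - phi)
    if not outside:
        raise ValueError("phi must be a proper subset of the worlds (Pr(phi) < 1)")
    w_star = outside[0]
    images = []
    for w_i in sorted(worlds):
        perm = {w: w for w in worlds}            # identity permutation ...
        perm[w_i], perm[w_star] = w_star, w_i    # ... with w_i and w* exchanged
        images.append(frozenset(perm[w] for w in phi))
    return images

worlds = frozenset(range(6))
phi = frozenset({0, 1, 2, 3, 4})                 # Pr(phi) = 5/6 < 1 under uniformity
images = swap_images(worlds, phi)
intersection = reduce(frozenset.intersection, images)
```

Each image has the same cardinality as the original proposition, the i-th image omits the i-th world, and the intersection of all the images is empty. Only the cardinality is guaranteed to be preserved by the swaps, which is why the step from cardinality to probability needs uniformity.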
Without these assumptions, there is no guarantee that the $\pi_i$s, so defined, will be automorphisms of $\langle W, \wp(W), \Pr \rangle$, in which case there is no guarantee that the $\pi_i(\varphi)$s will share the structural properties of $\varphi$.

There are at least some prima facie reasons to think that this is a serious shortcoming. If we take the 'possible worlds' talk at face value, then it seems as though the finiteness assumption, at least, is very much out of place. That is, if W is to be regarded as the totality of possible worlds and possible worlds are to be understood in the familiar way, then W will clearly be an infinite set. Douven and Williamson do suggest that the 'possible worlds' in W not be regarded as maximally specific – rather, W should be thought of as comprising a mutually exclusive and jointly exhaustive set of states that are specific enough to supply all possible answers to the questions that are relevant (Douven and Williamson, 2006, pp. 775-776). It is not entirely clear, though, that even this conception of the members of W will motivate the finiteness assumption – after all, certain questions permit of an infinite number of possible answers (such as those that can be answered with an arbitrarily high degree of precision). Neither, it should be pointed out, does this conception provide any obvious motivation for the uniformity assumption. And, in any case, there is surely something to the thought that Douven and Williamson's argument should be available for the most general and broad kind of probability space – the space in which all questions are relevant, the members of W are maximally fine-grained and the set of propositions modelled is maximised. There is undoubtedly more that one could say here – but I take it there is at least some motivation for wanting a stronger, more general result.

It's important to note that Douven and Williamson do supply a proof of a related theorem that is not restricted to finite probability spaces.
This is significant – but the theorem is, in some respects, weaker than theorem 1 and the proof continues to rely upon a fairly strong descendant of the uniformity condition. I will undertake something similar here. That is, I will prove a slightly weakened version of theorem 1 that holds for infinite probability spaces. The weakening, though, is of a different kind – and a kind that is not, I think, significant. And the proof will not rely upon any uniformity-type restriction.

III. INFINITE PROBABILITY SPACES

The class of probability spaces for which I will prove a modified version of theorem 1 will, naturally, be characterised by a series of structural assumptions. It's worth pointing out that there is no prospect of a 'universal' theorem – it is quite trivial to show that there are probability spaces (both infinite and finite) for which Douven and Williamson's result cannot be obtained. The class of probability spaces in question does, I think, have a special significance in the present context – for it is very plausible that the 'general' probability space mentioned above, in which the members of W are maximally fine-grained, will be a member of this class.

The first structural constraint I will impose is that of countable additivity. A probability function Pr is said to be countably additive iff it meets the following condition: If $\varphi_i$ is an increasing sequence of propositions ($\varphi_1 \subseteq \varphi_2 \subseteq \varphi_3 \ldots$) then $\Pr(\bigvee_i \varphi_i) = \lim_{i \to \infty} \Pr(\varphi_i)$. If the domain of Pr is finite then this condition is automatically met. Countable additivity is a relatively standard constraint to impose once we allow for the possibility of infinite probability spaces – and it was a part of Kolmogorov's initial axiomatisation – but it is not uncontroversial and, thus, certainly worth noting.

Call a proposition $\psi$ a sub-proposition of $\varphi$ just in case $\psi \subseteq \varphi$ and a proper sub-proposition of $\varphi$ just in case $\psi \subset \varphi$.
A proposition $\varphi \in F$ is said to be an atom of the probability space $\langle W, F, \Pr \rangle$ just in case $\Pr(\varphi) > 0$, and for all propositions $\psi \in F$, if $\psi \subset \varphi$ then $\Pr(\psi) = 0$. An atom is a proposition with positive probability that has no proper sub-propositions with positive probability. If a probability space is finite then it must have atoms and, furthermore, every proposition that has positive probability will be the union of some atoms. In the kind of probability spaces that Douven and Williamson consider, the atoms are just the singletons containing the members of W.

If a probability space is infinite, however, then the possibility arises that the space be atomless. A probability space $\langle W, F, \Pr \rangle$ is said to be atomless just in case, for any proposition $\varphi \in F$ such that $\Pr(\varphi) > 0$, there is a proper sub-proposition $\psi$ of $\varphi$, such that $\Pr(\varphi) > \Pr(\psi) > 0$. What atomlessness requires, in effect, is that any proposition with a positive probability has proper sub-propositions with lower positive probability. If $\langle W, F, \Pr \rangle$ is atomless, it follows that, for any $w \in W$ such that $\{w\} \in F$, $\Pr(\{w\}) = 0$.

The second structural constraint that I shall impose is that of atomlessness. There is good reason to think that the most general probability space – in which the members of W are maximally fine-grained, and the set of propositions modelled is maximised – must be an atomless space. If the set of propositions we are considering is maximally rich then, for any proposition with a non-zero probability it is plausible that we will always be able to identify some further statistically independent proposition that also has a non-zero probability. By conjoining the two, we will arrive at a proposition that is less likely than either conjunct, but has a probability greater than zero. Clearly, this could only be satisfied in an atomless probability space. These remarks are merely intended as suggestive – but I won't pursue the matter further here.
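One familiar model in which the atomlessness condition can be seen at work is the unit interval with length as probability. The following Python sketch assumes that model, and considers only propositions that are half-open intervals (a small fragment of the full field); the names `pr` and `proper_sub` are hypothetical.

```python
from fractions import Fraction

def pr(interval):
    """Length of a half-open interval [lo, hi) within [0, 1)."""
    lo, hi = interval
    return max(hi - lo, Fraction(0))

def proper_sub(interval):
    """Left half of the interval: a proper sub-proposition with half the probability."""
    lo, hi = interval
    return (lo, lo + (hi - lo) / 2)

phi = (Fraction(1, 4), Fraction(3, 4))   # an interval with probability 1/2
chain = [phi]
for _ in range(10):
    chain.append(proper_sub(chain[-1]))
probs = [pr(i) for i in chain]           # strictly decreasing, always positive
```

Repeated halving yields a decreasing sequence of sub-propositions whose probabilities stay positive while tending to zero, which is the kind of sequence that an atomless space is meant to supply.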
If the set W is uncountably infinite, then the simplifying assumption that the set of propositions F is equal to $\wp(W)$ becomes problematic – and we drop it here. If W is uncountably infinite then the assumption that every subset of W receives a probability value is incompatible with certain natural constraints upon Pr.

If a probability space is finite and uniform, then the propositions in that space will receive only rational probability values. This follows straightforwardly from the observation made earlier – namely, that the probability of any proposition in a finite uniform probability space will be equal to the ratio of the cardinalities of two finite sets. In an infinite probability space, it will be quite possible for propositions to receive irrational probability values. My proof, however, will continue to be limited to propositions that receive rational values – for reasons that will soon become evident. This is another assumption worth flagging.

Let $\langle W, F, \Pr \rangle$ and $\langle W, F', \Pr' \rangle$ be two probability spaces such that $F' \subseteq F$ and $\Pr'$ is the restriction of Pr to the members of $F'$. Say, in this case, that $\langle W, F, \Pr \rangle$ is a fine-graining of $\langle W, F', \Pr' \rangle$ and $\langle W, F', \Pr' \rangle$ a coarse-graining of $\langle W, F, \Pr \rangle$. Fine-graining, in effect, augments the set of propositions captured by a probability space while coarse-graining diminishes it.

As I mentioned in the previous section, whether a proposition possesses a property is, in general, something that is probability space relative – a proposition can possess a property relative to some probability spaces in which it features, but not others. Say that a property of propositions P is preserved by coarse-graining just in case any proposition that possesses P relative to a probability space must also possess P relative to any coarse-graining of that space in which it features.
More precisely, P is preserved by coarse-graining just in case for any probability spaces $\langle W, F, \Pr \rangle$ and $\langle W, F', \Pr' \rangle$ such that $\langle W, F', \Pr' \rangle$ is a coarse-graining of $\langle W, F, \Pr \rangle$, and any proposition $\varphi \in F'$, if $\varphi$ has P relative to $\langle W, F, \Pr \rangle$ then $\varphi$ has P relative to $\langle W, F', \Pr' \rangle$. Many structural properties will be preserved by coarse-graining – the property of having a probability above a certain threshold is a simple example – but structuralness itself provides no guarantee of this [3]. As can be easily checked, the conditions outlined in both Pollock's and Douven's rules are also properties that are preserved by coarse-graining. In fact, all of the extant rules considered by Douven and Williamson have this feature.

[3] Consider the property of being non-atomic – that is, the property of having a proper sub-proposition with positive probability. As can be easily checked, this property is structural. Let $W = \{a, b, c\}$, $F = \wp(W)$ and Pr be a uniform distribution over the members of W. Let $F' = \{W, \{a, b\}, \{c\}, \emptyset\}$ and $\Pr'$ be the restriction of Pr to $F'$. $\langle W, F', \Pr' \rangle$ is a coarse-graining of $\langle W, F, \Pr \rangle$, but $\{a, b\}$ is non-atomic with respect to $\langle W, F, \Pr \rangle$ and not with respect to $\langle W, F', \Pr' \rangle$.

I think that this is no accident. As I mentioned, all of these refined rules are specifically designed to exempt 'lottery propositions'. But lottery propositionhood, whatever it amounts to exactly, is a kind of extrinsic status that depends upon the availability of further propositions with certain characteristics. Generally speaking, the more fine-grained a probability space, the easier it will be for a proposition to qualify as a lottery proposition and the more difficult it will be for a proposition to satisfy the condition articulated by a refined rule.
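The three-world example used to show that structural properties need not be preserved by coarse-graining can be checked directly. The Python sketch below is only an illustration: it treats a property as a function of a proposition and the field it is evaluated against, and the names `is_non_atomic` and `above` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

W = frozenset("abc")
# Fine-grained field: every subset of W.
F_fine = [frozenset(c) for r in range(4) for c in combinations(sorted(W), r)]
# Coarse-grained field: {a, b} can no longer be split.
F_coarse = [frozenset(), frozenset("ab"), frozenset("c"), W]

def pr(prop):
    """Uniform probability over the three worlds (restricted to whichever field is in use)."""
    return Fraction(len(prop), len(W))

def is_non_atomic(prop, field):
    """True if prop has a proper sub-proposition in the field with positive probability."""
    return any(q < prop and pr(q) > 0 for q in field)

def above(threshold):
    """Property of having probability above a threshold; the field plays no role."""
    return lambda prop, field: pr(prop) > threshold

ab = frozenset("ab")
```

Relative to the fine-grained field, {a, b} counts as non-atomic because {a} is available as a proper sub-proposition. Relative to the coarse-grained field it does not. The threshold property, by contrast, gives the same verdict in both fields.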
That is, generally speaking, if a proposition satisfies the condition articulated by a refined rule relative to a given probability space, then it will satisfy that condition relative to any coarse-graining of that space in which it features.

The theorem that I shall prove will be restricted to properties that are both structural and preserved by coarse-graining. It is in this way that it represents a weakening of theorem 1. To my mind, the result is quite damning for the project of devising refined rules of rational acceptance. But one could, perhaps, view it in a more positive light – as indicating the direction in which the project might be taken forward. After all, there is nothing really preventing the formulation of rules articulating conditions that are not preserved by coarse-graining. I don't have anything to say about such a response here – though it is difficult, at first blush anyway, to see what an independently motivated rule of this kind might look like.

As noted above, my proof will exploit a corollary of a result established by Villegas (1964) (see also Savage, 1972, pp. 37-38) – a corollary to the effect that any proposition within an atomless probability space can always be partitioned into n equiprobable sub-propositions, for any positive integer n. What this means is that, within an atomless probability space, it is always possible to construct a finite, uniform sub-space around a given proposition. This is the rough strategy that will be employed.

This construction will rely upon Zorn's Lemma. Let $(S, \leq)$ be a partially ordered set. A subset C of S is described as a chain iff for all $x, y \in C$, $x \leq y$ or $y \leq x$. The lemma states that, if S is a nonempty, partially ordered set, such that every chain in S has an upper bound, then S has a maximal element. Zorn's Lemma is, famously, set-theoretically equivalent to the Axiom of Choice. I won't comment further upon its use here.

Before giving the proof, I shall introduce some further terminology.
Let W, F, Pr be a probability space with   F a finite and uniform partition of W. Let cl()  F be the closure of  under complementation and union. Call a function  a 11 -automorphism of W, F, Pr just in case  is a 1:1 function from cl() onto itself that satisfies these conditions: (i) (  ) = ()  () (ii) -() = (-) (iii) Pr() = Pr(()) for all ,   cl(). It is important to note that a -automorphism of W, F, Pr need only be partially defined upon F – its domain is cl()  F. Call a property P of propositions structural just in case, for any proposition   cl() and -automorphism ,  has P iff () has P. All structural properties must be -structural, for any  meeting the above conditions. This follows from the fact that cl() is itself a -field on W, in which case all -automorphisms of W, F, Pr will be autmomorphisms simpliciter relative to the coarse-graining W, cl(), Pr (where Pr is the restriction of Pr to the members of cl()). By the definition of a structural property, all structural properties must be preserved by all automorphisms of W, cl(), Pr. With this background, I shall prove the following: Theorem 2 Let W, F, Pr be a countably additive, atomless probability space. If P is a structural property preserved by coarse-graining, Q is an aggregative property and P is sufficient for Q then, if there is a proposition   F such that  has P relative to W, F, Pr and Pr() = r/k, for r and k positive integers with r < k, it follows that  has Q relative to some probability space. Proof Let  be a proposition such that Pr() = r/k, for r, k positive integers with r < k. Call a proposition  an r-minor sub-proposition of  just in case  is a sub-proposition of  such that Pr() > 0 and Pr()  Pr()/r. By atomlessness, there is a decreasing sequence of sub-propositions of , 1, 2 ... 
such that for each n, Pr(n ) > 0 and limn Pr(n) = 0, in which case  is guaranteed to have an r-minor sub-proposition, for any positive integer r. Consider the set R of all r-minor sub-propositions of . This set can be partially ordered by inclusion. If 1, 2... is a chain of elements 12 within this set (such that 1  2 ...) then limn Pr(n)  Pr()/r in which case, by countable additivity, Pr(nn)  Pr()/r. In this case, the union of the members of any chain of r-minor sub-propositions will itself be an r-minor sub-proposition and an upper bound to the chain. By Zorn's lemma, then, the set of r-minor sub-propositions of  must have a maximal member. Let  be one such member. Consider the proposition   ~. By atomlessness, there is a decreasing sequence of sub-propositions of   ~, 1, 2 ... such that, for each n, Pr(n ) > 0 and limn Pr(n) = 0. Since  is a maximal r-minor sub-proposition of  it follows that, for each n,   n is not an r-minor sub-proposition of  (because n is disjoint from  so Pr(  n) = Pr() + Pr(n) > Pr()); thus, for each n, Pr(  n) > Pr()/r. So limn Pr(  n)  Pr()/r. But limn Pr(  n) = limn (Pr() + Pr(n)) = Pr(). Since  is an r-minor sub-proposition of  we have Pr()/r  Pr(). Thus, Pr()  Pr()/r  Pr(), in which case we have Pr() = Pr()/r = 1/k. If r = 2 then 1/k = Pr() = Pr()/2 = (Pr(  ) + Pr(  ~))/2 = (Pr() + Pr(  ~))/2, so 1/k = Pr() = Pr(  ~). If r > 2, we then seek out a maximal (r-1)-minor sub-proposition of   ~ – call it  – which, by the above reasoning, will also have a probability of 1/k. If r = 3 then Pr() = Pr() = Pr(  ~  ~) = 1/k. If r > 3, we seek out a maximal (r-2)-minor sub-proposition of   ~  ~ and so on. After r-1 repetitions of this process,  will be divided into r exclusive and exhaustive sub-propositions, each with a probability of 1/k. Proposition  will be equivalent to the union of these r propositions. 
We then repeat the same process with respect to ${\sim}\varphi$, which, after k-r-1 repetitions, will be divided into k-r exclusive and exhaustive sub-propositions, each with a probability of 1/k. In this case W is divided into k equiprobable, disjoint and exhaustive propositions. We have a uniform partition $\Pi$ of W of cardinality k such that $\varphi \in cl(\Pi)$.

At this point, the proof, in essence, proceeds as before: Since $\Pr(\varphi) < 1$, $\varphi \neq W$ and, for some $\chi^* \in \Pi$, $\chi^*$ is disjoint from $\varphi$. For all $\chi_i \in \Pi$, let $\pi_i$ be a permutation on the elements of $\Pi$ such that $\pi_i(\chi_i) = \chi^*$, $\pi_i(\chi^*) = \chi_i$ and $\pi_i(\chi) = \chi$ for every other $\chi \in \Pi$. Define $\pi_i(\psi)$ as the union of $\{\pi_i(\chi) \mid \chi \in \Pi, \chi \subseteq \psi\}$ for all $\psi \in cl(\Pi)$. Each such $\pi_i$ evidently meets the first two conditions for a $\Pi$-automorphism. Since the elements of $\Pi$ are equiprobable, it also meets the third condition, in which case each $\pi_i$ is a $\Pi$-automorphism of $\langle W, F, \Pr \rangle$.

Since P is preserved by coarse-graining, it follows that $\varphi$ has P relative to $\langle W, cl(\Pi), \Pr' \rangle$, where $\Pr'$ is the restriction of Pr to the members of $cl(\Pi)$. Since P is structural, it follows that P is $\Pi$-structural and, for all i, $1 \leq i \leq k$, $\pi_i(\varphi)$ has P and, thus, has Q. Since Q is aggregative, it follows that $\pi_1(\varphi) \wedge \ldots \wedge \pi_k(\varphi)$ has Q relative to $\langle W, cl(\Pi), \Pr' \rangle$. But $\pi_1(\varphi) \wedge \ldots \wedge \pi_k(\varphi) = \emptyset$. QED [4]

It is possible, then, to modify theorem 1 by adding the requirement that P be preserved by coarse-graining and relaxing the requirement that $\langle W, F, \Pr \rangle$ be finite and uniform, allowing for the additional possibility that it be infinite and atomless (as well as countably additive). I don't for a moment think that this is the strongest such theorem that will be available (an analogue of the argument could certainly be mounted for certain 'mixed' probability spaces – that is, spaces that can be decomposed into atomic and non-atomic parts). Nevertheless, I think the theorem is particularly significant, for the reasons outlined, and makes the prospect of retaining a refined rule by denying structural assumptions a far less attractive one.
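The final stage of the proof can be illustrated in a concrete atomless model. The Python sketch below assumes the unit interval with length as probability and takes the proposition to be the interval [0, r/k), so that the uniform partition can be written down explicitly rather than obtained through Zorn's Lemma as in the general proof. Members of the closure of the partition are represented as sets of cell indices, and all names are hypothetical.

```python
from fractions import Fraction
from functools import reduce

def uniform_partition(r, k):
    """Cells [j/k, (j+1)/k) of [0, 1); phi = [0, r/k) is the union of the first r cells."""
    if not 0 < r < k:
        raise ValueError("need positive integers with r < k")
    cells = [(Fraction(j, k), Fraction(j + 1, k)) for j in range(k)]
    phi = frozenset(range(r))            # phi as a set of cell indices
    return cells, phi

def pr(prop, cells):
    """Probability of a union of cells: the total length of those cells."""
    return sum((cells[j][1] - cells[j][0] for j in prop), Fraction(0))

def cell_swap_images(k, phi):
    """Images of phi under the k maps that swap cell i with a fixed cell outside phi."""
    star = next(j for j in range(k) if j not in phi)
    images = []
    for i in range(k):
        perm = {j: j for j in range(k)}
        perm[i], perm[star] = star, i
        images.append(frozenset(perm[j] for j in phi))
    return images

cells, phi = uniform_partition(3, 5)     # Pr(phi) = 3/5 (illustrative values)
images = cell_swap_images(5, phi)
meet = reduce(frozenset.intersection, images)
```

Because the cells are equiprobable, every image has the same probability as the original proposition, and the conjunction of all k images is empty, mirroring the last step of the proof.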
[4] Interestingly, this proof does not suffice to establish a corresponding extension of theorem 1* mentioned in footnote 2:

Theorem 2*: Let $\langle W, F, \Pr \rangle$ be a countably additive, atomless probability space. If P is a structural property with respect to $\langle W, F, \Pr \rangle$ that is preserved by coarse-graining, Q is an aggregative property and P is sufficient for Q then, if there is a proposition $\varphi \in F$ such that $\varphi$ has P and $\Pr(\varphi) = r/k$, for r and k positive integers with r < k, it follows that $\emptyset$ has Q relative to some probability space.

From the assumption that P is structural it will follow automatically that P is structural with respect to $\langle W, cl(\Pi), \Pr' \rangle$ (for $\Pi \subseteq F$ a partition of W). But this will not follow from the weaker assumption that P is structural with respect to $\langle W, F, \Pr \rangle$. The following demonstration was pointed out to me by Stephan Leuenberger: Let P be the property of being atomic and true – that is, containing a designated 'actual' world. Since $\langle W, F, \Pr \rangle$ is an atomless space, P will not be satisfied by any members of F and, thus, will count as trivially structural with respect to it. Since $\langle W, cl(\Pi), \Pr' \rangle$ is atomic, P will not be structural with respect to it, since truth is not preserved by automorphisms.

References

Douven, I. (2002) 'A new solution to the paradoxes of rational acceptability' British Journal for the Philosophy of Science v53, pp391-410
Douven, I. and Williamson, T. (2006) 'Generalising the lottery paradox' British Journal for the Philosophy of Science v57(4), pp755-779
Kyburg, H. (1961) Probability and the Logic of Rational Belief (Middletown: Wesleyan University Press)
Kyburg, H. (1970) 'Conjunctivitis', in Swain, M. ed. Induction, Acceptance and Rational Belief (Dordrecht: Reidel), pp55-82
Lehrer, K. (1974) Knowledge (Oxford: Clarendon Press)
Nelkin, D. (2000) 'The lottery paradox, knowledge and rationality' Philosophical Review v109, pp373-409
Pollock, J. (1990) Nomic Probability and the Foundations of Induction (Oxford: Oxford University Press)
Ryan, S. (1996) 'The epistemic virtues of consistency' Synthese v109, pp121-141
Savage, L. (1972) Foundations of Statistics (New York: Dover Publications)
Villegas, C. (1964) 'On qualitative probability $\sigma$-algebras' Annals of Mathematical Statistics v35(4), pp1787-