BAYESIAN DECISION THEORY AND STOCHASTIC INDEPENDENCE[1]

by Philippe Mongin (CNRS & HEC Paris)
mongin@greg-hec.com
March 2018

[1] A first version of this paper was presented at a seminar at the Munich Center for Mathematical Philosophy and at the TARK 2017 conference. The present version has particularly benefited from comments made by Donald Gillies, Joseph Halpern, Marcus Pivato, Burkhard Schipper, Paul Weirich and two anonymous TARK referees. Special thanks are due to Richard Bradley and Peter Wakker for their detailed suggestions. We thank the Investissements d'Avenir (ANR-11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047) for supporting our research.

Abstract. Stochastic independence (SI) has a complex status in probability theory. It is not part of the definition of a probability measure, but it is nonetheless an essential property for the mathematical development of this theory, hence a property that any theory on the foundations of probability should be able to account for. Bayesian decision theory, which is one such theory, appears to be wanting in this respect. In Savage's classic treatment, postulates on preferences under uncertainty are shown to entail a subjective expected utility (SEU) representation, and this permits asserting only the existence and uniqueness of a subjective probability, regardless of its properties. What is missing is a preference postulate that would specifically connect with the SI property. The paper develops a version of Bayesian decision theory to fill this gap. In a framework of multiple sources of uncertainty, we introduce preference conditions that jointly entail the SEU representation and the property that the subjective probability in this representation treats the sources of uncertainty as being stochastically independent. We give two representation theorems of graded complexity to demonstrate the power of our preference conditions. Two sections of comments follow, one connecting the theorems with earlier results in Bayesian decision theory, and the other connecting them with the foundational discussion on SI in probability theory and the philosophy of probability.

1 Introduction and preview

The property of stochastic (or statistical, or probabilistic) independence occupies a rather special place in the mathematical theory of probability. It does not belong to the properties that this theory singles out to define a probability measure axiomatically. Indeed, its familiar definitions by the multiplication rule, or the equality of conditional with unconditional probability, do not enter the Kolmogorov axiomatization of a probability measure. Rather, they capture properties of given events (and more generally, of given partitions or random variables) for a given probability measure and can thus be adopted only to model particular situations. At the same time, probability theory obviously makes extensive use of independence assumptions, as evidenced by the Laws of Large Numbers, many theorems on stochastic processes and some core results of statistical theory. For Kolmogorov himself, this property occupies a "central position in the theory of probability" (1933-1956, p. 8), making the latter significantly different from the theory of positive measures. One would thus expect all theories of the foundations of probability to pay careful attention to stochastic independence, but this is not the case. In this paper, we investigate Bayesian decision theory, one of the most influential among these theories, and at the same time an example of neglect of this major property.

As is well known, Bayesian decision theorists have developed a brand of subjective interpretation for the probability calculus. They claim that an agent's uncertain beliefs should be represented by a probability measure, and ground their claim on a pragmatic argument.
They show formally that if the agent's preferences over uncertain prospects (typically, but not exclusively, over monetary bets) obey certain requirements of practical rationality, the agent's beliefs should conform to the axiomatic definition of a probability measure. Bayesian decision theorists hardly go beyond this demonstration. As long as they have nothing to add on stochastic independence, it remains unclear whether they have established an appropriate connection between the probability calculus and rational belief. They can be criticized for handling only the basics of the probability calculus, and not its actual working.

More technically, Bayesian decision theory offers a representation theorem for preferences over uncertain prospects that involves two sets of quantities, utilities (over the consequences of prospects) and probabilities (over the uncertain events), these two items being combined by the familiar rule of expected utility (EU). After Ramsey's and de Finetti's sketches, this strategy was implemented in full axiomatic detail by Savage (1954-1972). At one point, he extends the representation theorem to obtain a posterior probability measure, i.e., one that represents beliefs after a partial resolution of uncertainty, and proves that this posterior obeys Bayes's rule of revision; properly speaking, the "Bayesian" label of the school becomes fully justified only at this stage. This is also where Savage stops. As we will report below, however, he acknowledges that a treatment of stochastic independence should have come next, but his honest admission of unfinished business was lost on most Bayesian decision theorists. The rare exceptions will be discussed below.

What is missing is a further condition put on the agent's preferences under uncertainty that would account for that property specifically, and the aim of the present paper is to provide one.
We set up a framework in which there are two distinct sources of uncertainty $S$ and $T$, and states of nature thus have the form of two-component vectors $(s, t) \in S \times T$. Our proposed condition is stated in terms of the agent's conditional preferences: those defined conditionally on $s$ should be the same for all $s$ in $S$, and those defined conditionally on $t$ should be the same for all $t$ in $T$. What this intuitively says is that if the agent is uncertain on $s$ but not on $t$, or is uncertain on $t$ but not on $s$, then such partial knowledge cannot improve the decisions made by the agent under the residual uncertainty. We thus recover in preference terms one of the standard informal justifications of stochastic independence: if the occurrence of an event $A$ carries no information on the occurrence of $B$, and vice versa, then the two events should be declared to be independent.

An alternative informal justification, which is also common, goes as follows: if the occurrence of an event $A$ does not influence the occurrence of $B$, and vice versa, then the two events should be declared to be independent. The two lines are semantically distinct, but easily get mixed up in probability texts and even some works in the philosophy of probability. In the Bayesian decision theory of this paper, there is no danger of confusion since the information carried by events matters only if it enters the agent's decision process, hence is subjective in character; objective connections holding between events, for instance causal connections, play a role only if the agent considers them relevant. Thus, one merit of the theory is to give the informational reading of stochastic independence a foundation that clearly sets it apart from other possible readings.

Besides this contribution, the theory casts some light on the reflective discussion of stochastic independence in probability theory and the philosophy of probability.
In particular, it has something to say on the symmetry of the multiplicative definition ($A$'s independence of $B$ implying $B$'s independence of $A$), a property that these literatures have often called into question.

That stochastic independence follows from the above-mentioned preference condition can be checked by assuming that the EU formula holds, and seeing what this condition entails for the shape of the subjective probability entering the formula. However, this can only be a heuristic step towards the theoretical work, for Bayesian decision theory makes a strong point of taking the agent's preferences as its only primitive terms. Consistently with this, one should devise a system of preference conditions that jointly delivers the EU rule and the stochastic independence property of the subjective probability for relevant events. The main task of this paper is to set out such a large system. Its EU part is more elementary than Savage's system, but for that reason also much handier, and we will argue that it compares favourably with other variants of this canonical system such as Anscombe and Aumann's (1963). Our technical source is in the recent work by Mongin and Pivato (2015, 2017).

We offer two representation theorems in succession. Both are adapted to the multiple uncertainty framework sketched above, and both deliver an EU formula in which the subjective probability is multiplicative in the values of two sources, hence satisfies stochastic independence. The highly condensed axiomatic system of Theorem 1 makes it possible to reach both the EU formula and the multiplicative property in one go. This brevity is both an advantage and a disadvantage. To derive the EU formula and the multiplicative property in two successive logical steps helps one better understand how each of the preference axioms contributes to the conclusions. The more advanced Theorem 2 is devised for this purpose.

The rest of this paper is organized as follows.
Section 2 introduces the twofold uncertainty framework and the preference conditions for Theorem 1 via a motivating example. Section 3 states Theorem 1 formally, and section 4 does the same for Theorem 2. Section 5 is devoted to technical comparisons within Bayesian decision theory and section 6 to conceptual comparisons within the philosophy of probability. An appendix gives proof details on the two theorems.

2 A motivating example

Given any probability space $(\Omega, \mathcal{A}, P)$, two events $A, B \in \mathcal{A}$ are said to be stochastically independent if $P(A \cap B) = P(A) \cdot P(B)$. Building on this elementary definition, probability theory also defines what it means for sets of events, partitions or random variables to be stochastically independent. Here we will approach stochastic independence (SI) by specializing the state set to be a product set, a standard move in the theory when it comes to working with this property (see, e.g., Halmos, 1974, p. 191-192). The simple example of this section illustrates this framework and the main decision-theoretic concepts of the paper.

Suppose that a corn producer must decide how much land to farm while not knowing what the climatic conditions and the demand for corn will be at the time of the harvest; suppose also that this producer evaluates each farming policy in terms of monetary proceeds and no other criterion. Following the basics of decision theory, we can re-express this example symbolically as follows. There is a set of states of the world, which takes the form of a product set $\Omega = S \times T$, where $S$ represents the set of unknown climatic conditions and $T$ the set of unknown values for demand. There is a set of consequences, which we take to be the set of real numbers $\mathbb{R}$ to represent monetary proceeds. There is a set of uncertain prospects, i.e., mappings from the states of the world to consequences, each of which represents a farming policy, which we take to be $\mathbb{R}^{S \times T}$, the set of real functions on $\Omega$.
Finally, there is a binary relation $\succsim$ on the last set of prospects to capture the producer's preferences among cultivation policies.

Now suppose that this preference relation obeys the EU rule, i.e., there exists a probability function $\pi$ on $\Omega = S \times T$ and a utility function $u$ on $\mathbb{R}$ such that for all uncertain prospects $\mathbf{X}$, $\mathbf{Y}$,

$$\mathbf{X} \succsim \mathbf{Y} \text{ iff } \sum_{s,t} \pi(s,t)\, u(\mathbf{X}(s,t)) \geq \sum_{s,t} \pi(s,t)\, u(\mathbf{Y}(s,t)),$$

and moreover that $\pi$ satisfies the stochastic independence (SI) property with respect to $S$ and $T$, i.e.,

$$\pi(s,t) = p(s)\, q(t),$$

where $p$ and $q$ are probability functions on $S$ and $T$ respectively. This equation, also written as $\pi = p \otimes q$, determines $p$ and $q$ uniquely; these are the marginals of $\pi$ on $S$ and $T$, respectively. The EU rule and SI property are our intended conclusions; in this section, we reason heuristically, working backwards from them to a set of preference conditions that could be proposed as axioms.

Assume for simplicity that $S = \{s_1, s_2\}$ and $T = \{t_1, t_2\}$. Then, the probabilities are given by the table:

$$\begin{array}{c|cc} & t_1 & t_2 \\ \hline s_1 & p_{s_1} q_{t_1} & p_{s_1} q_{t_2} \\ s_2 & p_{s_2} q_{t_1} & p_{s_2} q_{t_2} \end{array}$$

and an uncertain prospect $\mathbf{X}$ is represented by the following table, which gives the consequences of this prospect in each state:

$$\begin{array}{c|cc} \mathbf{X} & t_1 & t_2 \\ \hline s_1 & x_{11} & x_{12} \\ s_2 & x_{21} & x_{22} \end{array}$$

The EU formula for $\succsim$,

$$V(\mathbf{X}) = p_{s_1} q_{t_1} u(x_{11}) + p_{s_1} q_{t_2} u(x_{12}) + p_{s_2} q_{t_1} u(x_{21}) + p_{s_2} q_{t_2} u(x_{22}),$$

can be restated either as:

(*) $V(\mathbf{X}) = p_{s_1} [q_{t_1} u(x_{11}) + q_{t_2} u(x_{12})] + p_{s_2} [q_{t_1} u(x_{21}) + q_{t_2} u(x_{22})]$,

or as:

(**) $V(\mathbf{X}) = q_{t_1} [p_{s_1} u(x_{11}) + p_{s_2} u(x_{21})] + q_{t_2} [p_{s_1} u(x_{12}) + p_{s_2} u(x_{22})]$.

Observe that the bracketed sums in (*) are numerical representations for conditional preferences on the possible values of $s$, and those in (**) are numerical representations for conditional preferences on the possible values of $t$. Thus, the equations entail that (i) conditional preferences are orderings.
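The two restatements of $V(\mathbf{X})$ above, one grouped by $s$ and one grouped by $t$, can be checked numerically. The following is only a minimal sketch: the marginals `p` and `q`, the square-root utility `u` and the prospect `X` are hypothetical values chosen for illustration, and none of them comes from the paper.

```python
import math

# Assumed marginals on S and T (illustrative numbers only).
p = {"s1": 0.3, "s2": 0.7}
q = {"t1": 0.6, "t2": 0.4}


def u(x):
    """An assumed increasing utility function on monetary proceeds."""
    return math.sqrt(x)


# A prospect: X[s][t] is the consequence obtained in state (s, t).
X = {"s1": {"t1": 4.0, "t2": 9.0}, "s2": {"t1": 16.0, "t2": 1.0}}


def eu_joint(X):
    """EU formula with the product probability pi(s, t) = p(s) * q(t)."""
    return sum(p[s] * q[t] * u(X[s][t]) for s in p for t in q)


def eu_grouped_by_s(X):
    """Outer sum over s; each inner sum represents a conditional on s."""
    return sum(p[s] * sum(q[t] * u(X[s][t]) for t in q) for s in p)


def eu_grouped_by_t(X):
    """Outer sum over t; each inner sum represents a conditional on t."""
    return sum(q[t] * sum(p[s] * u(X[s][t]) for s in p) for t in q)


v_joint = eu_joint(X)
v_by_s = eu_grouped_by_s(X)
v_by_t = eu_grouped_by_t(X)
```

The three values agree up to floating-point rounding. The point of the sketch is that the inner sum in `eu_grouped_by_s` uses the same weights `q` whatever the row `s`, and the inner sum in `eu_grouped_by_t` uses the same weights `p` whatever the column `t`.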
Since the same functional form $q_{t_1} u(\cdot) + q_{t_2} u(\cdot)$ appears in the two bracketed sums of (*), and similarly the same functional form $p_{s_1} u(\cdot) + p_{s_2} u(\cdot)$ appears in the two bracketed sums of (**), the equations also entail that (ii) conditional preferences are the same for different $s$, and the same for different $t$. Lastly, from the same equations, if the conditional orderings for both $s_1$ and $s_2$, or the conditional orderings for both $t_1$ and $t_2$, agree to rank prospect $\mathbf{X}$ above prospect $\mathbf{Y}$, then the overall preference $\succsim$ ranks $\mathbf{X}$ above $\mathbf{Y}$. Thus, it also holds that (iii) preferences over prospects are increasing with respect to either family of conditional preferences.

We have stated (i), (ii) and (iii) in the preference language and by abstracting entirely from the specifics of EU. Each of the three can indeed be satisfied by more general theories, and in particular, the dominance property (iii) is well known to apply to most existing alternatives to EU theory (like rank-dependent theory, see, e.g., Wakker, 2010).

In Theorem 1 below, we assume (i), (ii) and (iii), plus some background conditions, thus showing that these conditions are not only necessary, but also sufficient for our desired conclusions. Given the definition of a conditional, which is restated below, it is actually possible to fuse (i) with (iii) and obtain an even more condensed system. One may wonder how such apparently weak conditions can do the mathematical work. The key point is that they apply to $s$ and $t$ at the same time, and this creates the possibility of representing the preference $\succsim$ both in terms of $s$-conditionals and $t$-conditionals. Identifying these representations leads to the results. The equivalence between them thus transpires here from the algebraic equivalence between the two factorizations (*) and (**).

3 A first EU representation theorem involving stochastic independence

Formally, there are two variables of interest, $s \in S$ and $t \in T$, and a state of the world is any pair $(s, t) \in \Omega = S \times T$.
By assumption, $S$ and $T$ are finite with cardinalities $|S|, |T| \geq 2$. We keep the same number of factors in the product set as in the motivating example, but this is only for mathematical simplicity. The next section will illustrate how a larger number of factors can be handled. We take the set of consequences to be $\mathbb{R}$ and the set of prospects to be $\mathbb{R}^{S \times T}$.[2] The sets of all probability functions on $S$, $T$ and $S \times T$ are denoted by $\Delta_S$, $\Delta_T$ and $\Delta_{S \times T}$, respectively.

[2] These two assumptions are only for mathematical simplicity. The theorem below could be proved for smaller domains than $\mathbb{R}$ and $\mathbb{R}^{S \times T}$, so as to pay attention to feasibility constraints on what counts as a consequence and what counts as a prospect.

It is convenient to represent prospects $\mathbf{X}$ as $|S| \times |T|$ matrices, with each $s$ standing for a row and each $t$ standing for a column. We will thus write $\mathbf{X} = [x_s^t]_{s \in S}^{t \in T}$, but sometimes also $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_{|S|})$, where each component is a row vector $\mathbf{x}_s \in \mathbb{R}^T$, or $\mathbf{X} = (\mathbf{x}^1, \ldots, \mathbf{x}^{|T|})$, where each component is a column vector $\mathbf{x}^t \in \mathbb{R}^S$.

By assumption, the agent compares prospects in terms of an ex ante preference relation $\succsim$. As a maintained assumption, we take this relation to be a continuous weak ordering, hence representable by a continuous utility function. The other preference relations are obtained from $\succsim$ as conditionals. There are three families of conditionals to consider, i.e., $\{\succsim_s\}_{s \in S}$, $\{\succsim_t\}_{t \in T}$ and $\{\succsim_{st}\}_{s \in S, t \in T}$. The last family represents ex post preferences, and the first two represent interim preferences, since each relation in these families depends on fixing one variable and letting the other vary, which amounts to resolving only part of the uncertainty.

We now formally define how these conditionals are obtained from their master relation $\succsim$. For example, $\succsim_s$, the conditional of $\succsim$ on $s \in S$, is the relation on $\mathbb{R}^T$ defined by the property that for all $\mathbf{x}_s, \mathbf{y}_s \in \mathbb{R}^T$, $\mathbf{x}_s \succsim_s \mathbf{y}_s$ iff $\mathbf{X} \succsim \mathbf{Y}$ for some $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{S \times T}$ s.t.
$\mathbf{x}_s$ is the $s$-row of $\mathbf{X}$, $\mathbf{y}_s$ is the $s$-row of $\mathbf{Y}$, and $\mathbf{X}$ and $\mathbf{Y}$ are equal outside their $s$-row. The conditional of $\succsim$ on $t \in T$ and the conditional of $\succsim$ on $(s, t) \in S \times T$ are defined in similar ways.

Importantly, the definition of conditionals does not by itself make them weak orderings. By a well-known fact of decision theory, $\succsim_s$ is a weak ordering if and only if the choice of $\mathbf{X}, \mathbf{Y}$ in the definition of $\succsim_s$ is immaterial, or more precisely, if and only if $\mathbf{X} \succsim \mathbf{Y} \Leftrightarrow \mathbf{X}' \succsim \mathbf{Y}'$ when $\mathbf{X}', \mathbf{Y}'$ also satisfy the condition stated for $\mathbf{X}, \mathbf{Y}$ in this definition. When this holds, $\succsim$ is said to be weakly separable in $s$. By another well-known fact, weak separability in a factor (or product of factors) is equivalent to the property that $\succsim$ is increasing with the conditional on this factor (or the conditionals on the product of factors). Thus, for all $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{S \times T}$, if $\mathbf{x}_s \succsim_s \mathbf{y}_s$ for all $s$, then $\mathbf{X} \succsim \mathbf{Y}$; and if moreover $\mathbf{x}_s \succ_s \mathbf{y}_s$ for some $s$, then $\mathbf{X} \succ \mathbf{Y}$.[3] Combining the two facts just stated, we see that conditions (i) and (iii) of the previous section can be fused into the single requirement that all $\succsim_s$ and all $\succsim_t$ are weak orderings.[4]

Since conditionals $\succsim_{st}$ compare real numbers, it makes sense to identify them with the natural order of these numbers. This assumes that they represent desirable quantities, be they money values, as in the producer example, or something else. Thus, as another maintained assumption, we require that for all $(s, t) \in S \times T$ and all $x_s^t, y_s^t \in \mathbb{R}$,

$$x_s^t \succsim_{st} y_s^t \text{ iff } x_s^t \geq y_s^t.$$

Since this equivalence turns the $\succsim_{st}$ into orderings, $\succsim$ is increasing with each of them, hence also with each entry $x_s^t$ of $\mathbf{X}$.

Let us say that the conditionals $\succsim_s$ (resp. $\succsim_t$) are an invariant family if $\succsim_s = \succsim_{s'}$ for all $s, s' \in S$ (resp. $\succsim_t = \succsim_{t'}$ for all $t, t' \in T$). Such requirements capture condition (ii) of the previous section. They are not needed for the $\succsim_{st}$, which are identical relations by construction. We are now ready for the first representation theorem.
[3] By $\succ$, $\succ_s$, $\succ_t$ and $\succ_{st}$ we mean the strict preference relations associated with the weak preference relations $\succsim$, $\succsim_s$, $\succsim_t$ and $\succsim_{st}$.

[4] For these definitions and basic facts, see Fishburn (1970) or Wakker (1989).

Theorem 1 The following conditions are equivalent:

• The conditionals $\succsim_s$ and $\succsim_t$ are weak orderings for all $s \in S$ and all $t \in T$, and either family of conditionals is invariant.

• There are an increasing, continuous function $u : \mathbb{R} \to \mathbb{R}$, and strictly positive probability functions $p \in \Delta_S$ and $q \in \Delta_T$, such that $\succsim$ is represented by the function $V : \mathbb{R}^{S \times T} \to \mathbb{R}$ that computes the $p \otimes q$-expected value of $u$, i.e., by the function defined as follows: for all $\mathbf{X} = [x_s^t]_{s \in S}^{t \in T}$,

$$V(\mathbf{X}) := \sum_{s \in S} \sum_{t \in T} p_s\, q_t\, u(x_s^t).$$

In this format of EU representation, $p$ and $q$ are unique, and $u$ is unique up to positive affine transformations.

The conclusions state both the EU formula and that the two sources of uncertainty satisfy SI, so this theorem extends Bayesian decision theory up to the point that, from the argument made in the introduction, it ought to have reached.

4 A second EU representation theorem involving stochastic independence

In Theorem 1, strong results follow from a short list of assumptions, undoubtedly a feature of mathematical elegance, but also a cause for conceptual dissatisfaction. Would it not be better to expand on the assumptions and separate those which are responsible for the existence of the EU representation from those which account for the SI property occurring in this representation? Such a disentangling would be the more justified since SI is an optional property of a probability measure, hence in need of a preference condition that should be detachable from those underlying the existence of this measure. However, the assumptions of Theorem 1 cannot be so divided, as the following argument shows.
By taking the $\succsim_s$ and $\succsim_t$ to be merely orderings, not invariant orderings, one would get an additively separable representation that does not separate the utility and probability components of the added terms, unlike the EU representation. By taking only one of the two families to satisfy the ordering and invariance assumptions, one would get a representation that is only separable in that family and would be even more remote from the EU representation.[5]

[5] The additively separable representation of the first case reads as $\sum_{s \in S, t \in T} v_{st}(x_s^t)$, with increasing and continuous $v_{st} : \mathbb{R} \to \mathbb{R}$, $s \in S$, $t \in T$. In the second case, if the assumptions only hold for the $\succsim_s$, the separable representation reads as $W(V_1(\mathbf{x}_1), \ldots, V_{|S|}(\mathbf{x}_{|S|}))$, with increasing and continuous $W : \mathbb{R}^S \to \mathbb{R}$ and $V_s : \mathbb{R}^T \to \mathbb{R}$, $s \in S$. These conclusions follow from standard results in separability theory; see, e.g., Blackorby, Primont and Russell (1978).

As it turns out, however, we can obtain a relevant partitioning of assumptions if we enrich the decision-theoretic framework beyond the present two-dimensional stage. Let us suppose that the agent pays attention not only to the uncertainty dimensions $s$ and $t$ of the final consequences, but also to a third dimension $i$, so that these consequences are now represented by real numbers $x_{st}^i$. The added dimension can be thought of in several ways, like time, space or an omitted dimension of uncertainty. Each of these concrete suggestions can fit the motivating example: the added dimension would indicate when, where or under what further unknown circumstance the monetary proceeds of a farming policy accrue to the producer. We will return to the interpretation of the added dimension after stating Theorem 2.

By assumption, $i$ takes its values in a finite set $I$ with cardinality $|I| \geq 2$. Prospects are now defined as mappings from triples $(s, t, i)$ to the real numbers, that is, as three-dimensional arrays, $\mathbf{X} = [x_{st}^i]_{s \in S, t \in T}^{i \in I} \in \mathbb{R}^{S \times T \times I}$.
These may be rewritten as vectors $\mathbf{X} = (\mathbf{X}_1, \ldots, \mathbf{X}_{|S|})$, $\mathbf{X} = (\mathbf{X}_1, \ldots, \mathbf{X}_{|T|})$ or $\mathbf{X} = (\mathbf{X}^1, \ldots, \mathbf{X}^{|I|})$, the components of which are now matrix-valued, i.e., $\mathbf{X}_s = (x_{st}^i)_{t \in T}^{i \in I} \in \mathbb{R}^{T \times I}$, $\mathbf{X}_t = (x_{st}^i)_{s \in S}^{i \in I} \in \mathbb{R}^{S \times I}$ and $\mathbf{X}^i = (x_{st}^i)_{s \in S, t \in T} \in \mathbb{R}^{S \times T}$ respectively.

As before, the agent compares prospects in terms of a preference relation $\succsim$, which is a continuous weak ordering, and this relation gives rise to various families of conditional relations. The $\succsim_s$, $\succsim_t$ and $\succsim^i$ respectively compare matrices $\mathbf{X}_s$, $\mathbf{X}_t$, $\mathbf{X}^i$, as defined above, the $\succsim_{st}$ compare vectors $\mathbf{x}_{st} \in \mathbb{R}^I$, and the $\succsim_{st}^i$ compare real numbers. Similarly as before, we assume that each $\succsim_{st}^i$ coincides with the natural order of real numbers, which makes it an ordering, and furthermore turns the $\succsim_{st}^i$ into an invariant family. The other conditional relations may or may not be weak orderings, and may or may not form invariant families, depending on which assumptions are put on them.

Theorem 2 The following conditions are equivalent:

• The conditionals $\succsim^i$ are weak orderings.

• There are increasing, continuous functions $u^i : \mathbb{R} \to \mathbb{R}$, for all $i \in I$, and a strictly positive probability function $\pi \in \Delta_{S \times T}$, such that $\succsim$ is represented by the function $W : \mathbb{R}^{S \times T \times I} \to \mathbb{R}$ that computes the $\pi$-expected value of $\sum_{i \in I} u^i$, i.e., the function thus defined: for all $\mathbf{X} = [x_{st}^i]_{s \in S, t \in T}^{i \in I}$,

(*) $W(\mathbf{X}) := \sum_{s \in S, t \in T} \sum_{i \in I} \pi_{st}\, u^i(x_{st}^i)$.

In this format of representation, $\pi$ is unique, and the $u^i$ are unique up to positive affine transformations with a common multiplier.

Moreover, the following are equivalent:

• The same assumption on the $\succsim^i$ holds, and the $\succsim_s$ are weak orderings and an invariant family.

• The same conclusions hold, and there are strictly positive probability functions $p \in \Delta_S$ and $q \in \Delta_T$ with $\pi = p \otimes q$, so that (*) becomes: for all $\mathbf{X} = [x_{st}^i]_{s \in S, t \in T}^{i \in I}$,

(**) $W(\mathbf{X}) = \sum_{s \in S} \sum_{t \in T} \sum_{i \in I} p_s\, q_t\, u^i(x_{st}^i)$.

In this format of representation, $p$ and $q$ are unique, while the $u^i$ have the same uniqueness properties as before.
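The role of invariance in the second part of Theorem 2 can be illustrated numerically. Writing $\pi$ for the joint probability on $S \times T$, the first representation of the theorem implies that the conditional on $s$ ranks $T \times I$ matrices by the partial sum over $t$ and $i$ of $\pi_{st} u^i(\cdot)$. The sketch below compares one pair of matrices under a product $\pi$ and under a non-product $\pi$. All numbers, the utilities `u` and the helper names are hypothetical, and a single pair of matrices can only illustrate the theorem, not establish it.

```python
S, T, I = ["s1", "s2"], ["t1", "t2"], ["i1", "i2"]


def u(i, x):
    """Assumed increasing utilities: u^{i1}(x) = x and u^{i2}(x) = 2x."""
    return x if i == "i1" else 2.0 * x


def conditional_value(pi, s, Xs):
    """Partial sum over t and i of pi[s, t] * u^i(x) for a T x I matrix Xs."""
    return sum(pi[(s, t)] * u(i, Xs[t][i]) for t in T for i in I)


def strictly_prefers(pi, s, Xs, Ys):
    """True if the conditional on s ranks Xs strictly above Ys."""
    return conditional_value(pi, s, Xs) > conditional_value(pi, s, Ys)


def product_measure(p, q):
    """Joint probability with independent marginals p and q."""
    return {(s, t): p[s] * q[t] for s in S for t in T}


pi_product = product_measure({"s1": 0.3, "s2": 0.7}, {"t1": 0.6, "t2": 0.4})

# A joint probability that does not factorize: s carries information on t.
pi_correlated = {("s1", "t1"): 0.4, ("s1", "t2"): 0.1,
                 ("s2", "t1"): 0.1, ("s2", "t2"): 0.4}

# A pays 1 in every i on t1 only; B pays 1 in every i on t2 only.
A = {"t1": {"i1": 1.0, "i2": 1.0}, "t2": {"i1": 0.0, "i2": 0.0}}
B = {"t1": {"i1": 0.0, "i2": 0.0}, "t2": {"i1": 1.0, "i2": 1.0}}

same_under_product = (strictly_prefers(pi_product, "s1", A, B)
                      == strictly_prefers(pi_product, "s2", A, B))
same_under_correlated = (strictly_prefers(pi_correlated, "s1", A, B)
                         == strictly_prefers(pi_correlated, "s2", A, B))
```

With the product measure, both conditionals rank `A` above `B`; with the correlated measure, the conditional on `s1` ranks `A` above `B` while the conditional on `s2` ranks `B` above `A`, so the family of conditionals on $s$ is not invariant.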
Unlike Theorem 1, Theorem 2 is in two parts, corresponding to the EU formula and the SI property specifically. What appears to be essential to the latter is that one of the two sources (here conventionally taken to be $S$) gives rise to an invariant family of conditionals. We now reinforce the suggestion that invariance is the crucial condition by a heuristic argument.

Considering for simplicity only four states, suppose that the agent takes $(s_1, t_1)$ to be more likely than $(s_1, t_2)$, and $(s_2, t_1)$ less likely than $(s_2, t_2)$. That is, from knowing how the uncertainty on $s$ is resolved, the agent is able to draw an inference on how the uncertainty on $t$ would be resolved. If the agent reasoned probabilistically, the joint probabilities would of course not decompose multiplicatively. It is easy to conclude that the conditionals on $s$ cannot be invariant. Take $\alpha, 0$ representing desirable quantities, with $\alpha > 0$, and the following prospects in matrix form:

$$\begin{array}{c|cc} \mathbf{X} & t_1 & t_2 \\ \hline s_1 & \alpha & 0 \\ s_2 & 0 & \alpha \end{array} \quad \text{and} \quad \begin{array}{c|cc} \mathbf{Y} & t_1 & t_2 \\ \hline s_1 & 0 & \alpha \\ s_2 & \alpha & 0 \end{array}.$$

The first line of $\mathbf{X}$, which puts the best consequence on the more likely state, should be preferred to the first line of $\mathbf{Y}$, which puts it on the less likely state; that is, $(\alpha, 0) \succ_{s_1} (0, \alpha)$. By a similar comparison, the second line of $\mathbf{X}$ should be preferred to the second line of $\mathbf{Y}$; that is, $(0, \alpha) \succ_{s_2} (\alpha, 0)$. Thus, the two conditional preferences differ.

Contraposing the argument, we see that, if the $\succsim_s$ are an invariant family, then, were the agent to know the particular $s$, he would not use this knowledge to draw information on the unknown $t$. The converse also holds: an agent who is in such an epistemic disposition has no reason for entertaining conditional preferences $\succsim_s$ that are variable with $s$. This argument connects the invariance property of conditional preferences with the informational rendering of SI mentioned in the introduction.

To derive (**), it is unnecessary to assume that both the $\succsim_s$ and the $\succsim_t$ are invariant.
The invariance of the latter relations follows from (**) itself.[6] If it is sufficient to require one direction of invariance, this is because the EU representation (*) holds from the previous stage. In an EU framework, it is impossible to distinguish between $s$ bringing no preferentially relevant information on $t$, and $t$ bringing no preferentially relevant information on $s$. This is formally shown in the appendix.

[6] From (**), the $\succsim_t$, $t \in T$, are represented by $\sum_{s \in S} \sum_{i \in I} p_s\, u^i(\cdot)$, which does not depend on $t$.

We now return to the interpretation of the third dimension $i$ introduced in this section. A very natural decision-theoretic account becomes available when $i$ represents time. Then, the alternatives $\mathbf{X}$ mean contingent plans, i.e., plans for the future whose consequences in a given period depend on the way the uncertainty still represented by $(s, t)$ is resolved in that period. The matrix-valued objects $\mathbf{X}_s$, $\mathbf{X}_t$ and $\mathbf{X}^i$ mean partly contingent plans (for the first two, when one dimension of uncertainty is fixed) or dated prospects (for the last, when the time dimension is fixed). As to the vector-valued objects $\mathbf{x}_{st}$, they mean non-contingent plans, since they take the uncertainty to be entirely resolved.[7]

However, time considerations are extraneous to uncertainty, which is the concern here, and it may be more appropriate to find an interpretation for $i$ that relates to this concern. Suppose we declare $i$ to be a third dimension of uncertainty. We can then add a third part to Theorem 2, which puts on the $\succsim^i$ the same invariance assumption as was imposed on the $\succsim_{st}$ and the $\succsim_t$. From this addition, it can be proved that (**) gives way to the following more specific representation: for all $\mathbf{X} = [x_{st}^i]_{s \in S, t \in T}^{i \in I}$,

(***) $W(\mathbf{X}) = \sum_{s \in S} \sum_{t \in T} \sum_{i \in I} p_s\, q_t\, r_i\, u(x_{st}^i)$,

where $r = (r_i)_{i \in I}$ is a strictly positive probability function on $I$, and the utility function $u$ in the EU formula is now independent of the $i$ index.
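The three-factor representation just stated can be sketched in code as an expected utility taken with respect to a product of three marginals. This is only an illustration under assumed inputs: the marginals `p`, `q`, `r`, the utility `u` and the helper names are hypothetical and do not come from the paper.

```python
import math
from itertools import product

# Assumed strictly positive marginals on S, T and I.
p = {"s1": 0.3, "s2": 0.7}
q = {"t1": 0.6, "t2": 0.4}
r = {"i1": 0.5, "i2": 0.5}


def u(x):
    """An assumed increasing utility, common to all values of i."""
    return math.sqrt(x)


def W(X):
    """Sum over (s, t, i) of p[s] * q[t] * r[i] * u(X[s, t, i])."""
    return sum(p[s] * q[t] * r[i] * u(X[(s, t, i)])
               for s, t, i in product(p, q, r))


# Joint probability on S x T x I induced by the three marginals.
joint = {(s, t, i): p[s] * q[t] * r[i] for s, t, i in product(p, q, r)}

# Marginal of the joint probability on S, recovered by summing out t and i.
marginal_on_S = {s: sum(joint[(s, t, i)] for t in q for i in r) for s in p}

# A constant prospect paying 4.0 in every state (s, t, i).
constant_prospect = {state: 4.0 for state in joint}
value_of_constant = W(constant_prospect)
```

Summing the joint probability over two of the three factors returns the marginal on the remaining one, and a constant prospect is valued at the utility of its constant consequence, as one expects of an EU formula.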
Besides having a semantic advantage, this extension of Theorem 2 carries with it a sense of mathematical generalization. To obtain the SI property for a product space with any finite number of uncertainty factors is no more difficult than to obtain it for $\Omega = S \times T \times I$, but this would impose a heavy notational burden.

[7] These interpretations assume that each period is uncertain in the same way as any other, i.e., no interaction exists between the resolution of uncertainty and the passing of time.

5 Comparisons with decision theory

We start this decision-theoretic section with two comments that Savage makes on SI in his Foundations of Statistics. Having axiomatized a qualitative probability relation, he complains that "the notions of independence and irrelevance have ... no analogues in qualitative probability; this is surprising and unfortunate, for these notions seem to evoke a strong intuitive response" (1954-1972, p. 44). Later, at the end of a well-known passage on "small worlds", Savage restates his complaint as follows: "it would be desirable, if possible, to find a simple qualitative personal description of independence between events" (p. 91).[8]

[8] Savage used to say "personal probability" where later theorists say "subjective probability".

The two comments clearly express the need for Bayesian decision theory to complement its derivation of subjective probability with an account of SI, but point in different directions. The first relates to qualitative probability, which is an auxiliary concept in Savage's construction; he uses it as an intermediary between his preference postulates and his final conclusions, in which subjective probability acquires a numerical form. Today, Savage's remark in respect of this concept is no longer justified. There now exist richer systems of qualitative probability than his, which contain a special relation to express the stochastic independence of two events or two random variables.[9] The second comment does not mention qualitative probability and we read in it a suggestion to base SI directly on a preference foundation. In this respect, Savage's complaint of a lacuna is still essentially justified.
To the best of our knowledge, there are only three earlier works in decision theory that overlap with the present research. Blume, Brandenburger and Dekel (1991, p. 74) introduce a preference condition that is akin to our invariance condition and heuristically stress its connection with SI, but do not include it in their representation theorems. Their topic is anyhow the preference foundations of lexicographic probability, not of standard Kolmogorov probability.[10] In an important follow-up, Battigalli and Veronesi (1996) push the analysis of Blume, Brandenburger and Dekel up to the stage of representation theorems, but these are also concerned with lexicographic probability or related non-standard notions. Neither Theorem 1 nor Theorem 2 appears in these two works. More directly related is the version of Bayesian decision theory proposed by Bernardo, Ferrandiz and Smith (1985). This includes a preference condition relative to two events $E$ and $F$ that will entail the equation $P(E \cap F) = P(E) \cdot P(F)$ at the stage of proving the EU representation theorem. Although evocative of the informational reading of SI, this condition differs from ours, and this difference seems connected with the authors' technical choice of approaching SI in general probability spaces rather than product spaces, as we do here.[11]

[9] See Domotor (1969), Fine (1971), Kaplan and Fine (1977), Luce and Narens (1978), to cite only the earliest accounts of SI in terms of qualitative probability. Fine's 1971 classic, Theories of Probabilities, makes interesting comments on SI, and at some point (p. 36-37) even suggests moving in the direction of a pragmatic, preference-based account of SI.

[10] Generally, a lexicographic probability is a finite sequence $(p_1, \ldots, p_K)$ of probability measures on the same probability space. In the more specific definition privileged by Blume, Brandenburger and Dekel (1991), the supports of the successive $p_i$ are disjoint.

[11] As a further development of Joyce's (1999) representation theorem for pairs of credibility and preference relations, Bradley (2017, p. 104) shows that a weak separability assumption on the credibility relation imposes the SI property on the probability measure representing this relation. Joyce's and Bradley's analyses belong to Jeffrey's (1965) theory of decision, which is several steps removed from the Bayesian decision theory of this paper.

We now compare our axiomatization of EU with those in current use. Being entirely preference-based, the former is like Savage's (1954-1972), but there are important dissimilarities. An obvious one concerns the cardinality of the state set $\Omega$, which is infinite in Savage and finite here. The axiom systems in Theorem 1 and Theorem 2 are highly condensed and do not relate to Savage's seven-postulate system in a transparent way. However, the assumption that certain conditionals are orderings amounts to replacing his postulate P2 (the "sure-thing principle") by a dominance principle, which is weaker and more generally accepted. To make up for this loss, the invariance condition, when it applies at all, bears on all possible prospects and not only on constant prospects, as is the case in Savage's P3 (the "event independence" postulate). Savage has another important postulate, P4, which is a clear step towards the existence of subjective probability and has no analogue here. Our best guess is that P4 is made dispensable by the assumption that consequences are real numbers and conditional preferences respect the order of these numbers. By contrast, Savage puts no restriction at all on his consequence set.

Another comparison to the point is with Anscombe and Aumann's (1963) popular variation on Savage's system. We share with these authors the assumptions of a finite state set and a highly structured consequence set, but they assume the latter to be a set of probabilistically defined lotteries, an assumption we are glad to eschew here.
From a Bayesian decision theory perspective, the Anscombe-Aumann system is open to the objection that it is question-begging to derive a subjective probability by supposing that other probabilities already exist. On this view, all probabilistic items require a preference derivation. The rejoinder that the preexisting probabilities are objective, hence of a different nature from the subjective probability to be derived, is a free commentary without any basis in Anscombe and Aumann's formal system. We do not deny the practical convenience of this system, but ours is no more complicated, while being perhaps easier to defend theoretically.12

6 Connections with foundational discussions

Underlying the axiomatic work of Savage and Bayesian decision theorists generally are two major claims on the foundations of probability: probability measures represent uncertain beliefs in the normatively appropriate way, and what makes the measures in question normatively appropriate is that practical rationality considerations recommend using them in decision making. Both claims have been disputed, with some objections surfacing already before Bayesian decision theory fully took shape.

The first claim can be attacked along at least two different lines. One may question the appropriateness of probabilities on the ground that they are absolute measures, and as an alternative develop a calculus for conditional probabilities taken as primitive terms. This line is historically associated with Popper's (1959-1972, Appendices *iv and *v) axiomatization of probability and has recently been defended by Fitelson and Hájek (forthcoming). The existing conditional probability systems preserve the additivity of probability measures, and an alternative critical line is indeed to question that property.
Decision theory has made thorough contributions here; see Gilboa (2009) and Wakker (2010) for overviews. As to the second foundational claim, it can again be attacked from different sides, one representative example being Joyce's (1998) "nonpragmatic" argument that probabilities are appropriate representations of uncertain beliefs for directly epistemic reasons.13

12 For convenience, both Blume, Brandenburger and Dekel (1991) and Battigalli and Veronesi (1996) use the Anscombe-Aumann system. There are other alternatives to this system than the present one for applications to finite state sets. An early example is Wakker's (1989, ch. IV).

These deep foundational questions arise in connection with the present work, but exceed its limited purpose. We meant to fill a gap in Bayesian decision theory by following its own principles, rather than defend these principles against outside criticism. However, since SI is our focus, we should ask whether this theory, as extended here, may contribute to a better understanding of this property.

There is much conceptual tension in the way probability theorists introduce the definition of SI. For one thing, they usually discuss its informal meaning in terms of a provisional definition of SI by the equality of conditional and unconditional probability: for any two events $A, B \in \mathcal{A}$ with $P(B) > 0$, $P(A \mid B) = P(A)$. Once they have made intuitive sense of this equation, they proceed to the multiplicative equation of section 2 as constituting the proper definition of SI, arguing that the latter avoids the sign restriction $P(B) > 0$ and makes SI symmetric. This argument is unconnected with the intuitions supporting the provisional definition, which makes the whole sequence semantically awkward.14 For another thing, probability theorists informally defend their definitions by resorting to more than one concept of unrelatedness.
Prominent examples are logical independence, causal independence (or alternative forms of objective independence), and informational independence. While some accounts are relatively clear on which concept they privilege, many others are equivocal, and some even fall into amazing confusions between them.15

The Bayesian decision theory developed here contributes nothing to the first problem because it does not allow for 0-probability events, and thus cannot properly distinguish between the provisional and final definition of SI. This limitation is a price to pay for a handy mathematical apparatus (more on this in the appendix). On the second problem, however, the theory has something to say. At the very least, it avoids the equivocations between the different informal accounts by firmly opting for informational independence; the preference condition that Theorem 2 highlights, the invariance property of conditionals, is unquestionably on the pragmatic side of the foundations of probability. The agent's unwillingness to adapt his preferences over t-uncertain prospects to the knowledge of s is the criterion by which we judge that this agent regards information on s as being irrelevant to t. It remains to be said whether the theory contents itself with endorsing one of the available accounts of SI or adds something significant to that account.

13 Leitgeb and Pettigrew (2010) have recently pursued this line of purely epistemic justification with a new derivation of probability from an accuracy requirement.

14 Two examples among many of this two-step definitional sequence are Feller (1950-1968, p. 125) and Hoel, Port and Stone (1971, p. 19).

15 Here is a curious example due to two otherwise excellent scholars: SI means that "the knowledge of (one event) does not affect the other" (Luce and Narens, 1978, p. 226). Naturally, one would expect "the knowledge of the other" instead of "the other".

We credit the theory with two possible contributions.
One is to connect the SI property with the foundations of subjective probability more tightly than is usually done. There has been some vacillation among subjectivists concerning the role of SI assumptions in probability theory. Whereas Savage did not underplay this role, de Finetti considered it with strong reluctance. As Gillies (2000, p. 75-76) explains, quoting from Probabilismo (1931), de Finetti argued against the application of SI to repeated trials of the same experiment on the ground that this assumption blocked the possibility of learning from the successive results of the trials through Bayes's rule of revision. This argument opened the way to de Finetti's alternative to SI, which is exchangeability. Without entering the rich debate well covered by Gillies on the respective merits of the two concepts, we can make the broader point that learning by Bayes's rule is just one particular case to be considered by the subjective theory of probability. It is possible to make perfect subjective sense of the opposite particular case in which no learning occurs; it is actually incumbent on the subject to decide which case is relevant. In other words, there is no logical necessity to associate the subjective theory with a large scope of application of Bayes's rule. Although this point may be clear by itself, it comes out perhaps more clearly after Bayesian decision theory, which is a brand of subjectivism, has offered an account of SI.

Another contribution is to put the symmetry of the definition of SI in perspective. Writers on the foundations of probability have sometimes expressed dissatisfaction with the fact that asymmetric dependence or independence cannot be formulated within the Kolmogorov axiomatic framework; see Fitelson and Hájek (forthcoming) for a recent example.
This is a fair complaint to make in connection with the logical and causal (or more generally objective) readings of SI, but its force as to the informational reading is not so clear, as Fitelson and Hájek concede. In the Bayesian decision theory of this paper, one can assume the s-component of uncertainty to be informationally irrelevant to the component t without assuming the converse irrelevance, for this amounts to requiring invariance from the s-conditionals and not from the t-conditionals. However, we have seen that this logical independence of the two assumptions vanishes once the preference axioms for EU are all in place. This result can be understood in two opposite ways: those who take the preference conditions for EU to be normatively compelling will view it as a justification of the postulated symmetry of SI, whereas others, for whom this symmetry is an arbitrary diktat, will turn the result against the allegedly compelling preference conditions.

7 Conclusions

We have responded to Savage's request to extend the preference apparatus of Bayesian decision theory to the point where it includes an account of stochastic independence. To do so, we have reconstructed this preference apparatus and proved representation theorems that contain both a novel derivation of the expected utility formula and the desired specification that the subjective probability in this formula makes the sources of uncertainty stochastically independent. These theorems call for richer variants that need to be pursued elsewhere. One such variant would relax the finiteness assumption put on the set of states of the world and consistently offer the treatment of 0-probability events, the lack of which had previously been made tolerable by this finiteness assumption. Besides absolute probability as in Kolmogorov, this line of research could more ambitiously concern conditional probability taken as an axiomatic primitive, in Popper's sense.
Each time, the objective would be to map the features of the probability space onto axiomatic preference counterparts. Another project would be to reconsider stochastic independence in relation to the nonadditive measures of uncertainty that decision theorists have introduced since they moved away from a primarily Bayesian outlook. This is the more challenging of the two lines of research, because it requires one not only to find preference counterparts to already defined mathematical features, but also to discover those new mathematical definitions which capture stochastic independence when probability gives way to weaker notions.

8 Appendix

The two theorems of this paper follow from a result proved by Mongin and Pivato (2015, Theorem 1). We restate this result in a simplified form adapted to the purpose of deriving them.

Theorem 3 Granting the maintained assumptions of section 3 on $\succsim$ and the conditionals $\succsim_{st}$, the following conditions are equivalent:

• The conditionals $\succsim_s$ and $\succsim_t$ are weak orderings for all $s \in S$ and all $t \in T$, and the $\succsim_s$ are an invariant family.

• There are increasing, continuous functions $u^t : \mathbb{R} \to \mathbb{R}$ for all $t \in T$, and there is a strictly positive probability function $p \in \Delta_S$, such that $\succsim$ is represented by the function $V : \mathbb{R}^{S \times T} \to \mathbb{R}$ that computes the $p$-expected value of $\sum_{t \in T} u^t$, i.e., by the following function: for all $X = [x^t_s]_{t \in T, s \in S}$,

\[ V(X) := \sum_{s \in S} \sum_{t \in T} p_s \, u^t(x^t_s). \]

In this format of EU representation, $p$ is unique, and the $u^t$ are unique up to positive affine transformations with a common multiplier.

Theorem 1 requires both families $\succsim_s$ and $\succsim_t$ to be invariant. A proof for it results from applying Theorem 3 twice and checking the compatibility of the obtained representations. See Mongin and Pivato (2015, Corollary 1(c)) for mathematical details.

The first part of Theorem 2 is a direct application of Theorem 3, with $i$ playing the role of $s$ and $(s, t)$ playing the role of $t$ in the statement of the latter. The second part is proved below.

Proof.
(Sketch). We first observe that, for every given $s \in S$, the $W(X)$ representation of the first part delivers a function $\mathbb{R}^{T \times I} \to \mathbb{R}$,

\[ \sum_{t \in T} \sum_{i \in I} \pi_{st} \, u^i(x^i_{st}), \]

that represents the weak ordering $\succsim_s$. If we define $\pi'_{st} := \pi_{st} / \sum_{t' \in T} \pi_{st'}$ for all $t \in T$, the function

\[ \sum_{t \in T} \sum_{i \in I} \pi'_{st} \, u^i(x^i_{st}) \]

is also a representation of $\succsim_s$. Now fix $s_0 \in S$. By the invariance of the $\succsim_s$ family, for every $s \in S$, there is a strictly increasing function $\varphi_s$ on $\mathbb{R}$ such that

\[ \sum_{t \in T} \sum_{i \in I} \pi'_{s_0 t} \, u^i(x^i_{st}) = \varphi_s \Bigl( \sum_{t \in T} \sum_{i \in I} \pi'_{st} \, u^i(x^i_{st}) \Bigr). \]

As the $u^i$ are strictly increasing and continuous, and so are the double sums of them, the $\varphi_s$ are continuous, and we can apply a functional equation argument and conclude that the $\varphi_s$ are positive affine transformations. I.e., for all $s \in S$, there exist numbers $\alpha_s > 0$ and $\beta_s$ s.t.

\[ \sum_{t \in T} \sum_{i \in I} \pi'_{s_0 t} \, u^i(x^i_{st}) = \alpha_s \Bigl( \sum_{t \in T} \sum_{i \in I} \pi'_{st} \, u^i(x^i_{st}) \Bigr) + \beta_s. \] 16

After redefining the functions so as to dispense with the constant terms, we see that, for all $s \in S$ and $t \in T$, $\pi'_{s_0 t} = \alpha_s \pi'_{st}$, and in fact (since proportional probability vectors are equal) $\pi'_{s_0 t} = \pi'_{st}$. We thus rewrite (*) as

\[ \sum_{s \in S, \, t \in T} \sum_{i \in I} \pi'_{s_0 t} \Bigl( \sum_{t' \in T} \pi_{st'} \Bigr) u^i(x^i_{st}), \]

which is (**) if one takes $p = (\sum_{t' \in T} \pi_{st'})_{s \in S}$ and $q = (\pi'_{s_0 t})_{t \in T}$. The uniqueness of $p$ and $q$ in this format of representation is easily established.

To show that adding an invariance assumption on the $\succsim_i$ leads to the stronger representation claimed in the text, i.e., (***)

\[ W(X) = \sum_{s \in S} \sum_{t \in T} \sum_{i \in I} p_s \, q_t \, r_i \, u(x^i_{st}), \]

it is enough to reproduce the proof sequence used for (**) mutatis mutandis.

Theorems 1, 2 and 3 all involve strictly positive probability functions. This restriction is due to the assumption that the $\succsim_{st}$ (in Theorems 1 and 3) and the $\succsim_{ist}$ (in Theorem 2) reproduce the natural order of real numbers. It can be checked that this makes the $\succsim_s$, $\succsim_t$ and $\succsim_i$ non-constant preference relations, so there are no "null events" in Savage's (1954-1972, p. 24) sense, hence no 0-probability values either.
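As a numerical aside on the proof sketch, the step that normalizes the weights conditional on each s can be checked on small tables. The following Python sketch is ours and not part of the formal argument. It assumes a strictly positive table of joint weights on S x T that sums to 1, and the function names (`conditional_weights`, `rows_agree`, `is_product`) as well as the numbers are hypothetical. It illustrates only the probabilistic side of the result, namely that the weights obtained by dividing each row of the table by its sum coincide across all s exactly when the joint weights are the product of their two marginals; it does not model preferences.

```python
def conditional_weights(joint):
    """Divide each row joint[s] by its sum: weights on t conditional on s."""
    return [[w / sum(row) for w in row] for row in joint]


def rows_agree(rows, tol=1e-9):
    """True if every row equals the first row, entry by entry, up to tol."""
    first = rows[0]
    return all(abs(a - b) <= tol for row in rows for a, b in zip(row, first))


def is_product(joint, tol=1e-9):
    """True if joint[s][t] equals p[s] * q[t] for the marginals p and q.

    Assumes the entries of joint sum to 1.
    """
    p = [sum(row) for row in joint]        # marginal on S
    q = [sum(col) for col in zip(*joint)]  # marginal on T
    return all(abs(joint[s][t] - p[s] * q[t]) <= tol
               for s in range(len(p)) for t in range(len(q)))


# Illustrative numbers only (assumed, not taken from the paper).
p, q = [0.6, 0.4], [0.2, 0.3, 0.5]
independent = [[ps * qt for qt in q] for ps in p]  # product table
dependent = [[0.30, 0.10, 0.20],
             [0.05, 0.25, 0.10]]                   # same marginal on S, no product form

for name, joint in [("independent", independent), ("dependent", dependent)]:
    same = rows_agree(conditional_weights(joint))
    print(name, "| conditionals agree:", same, "| product form:", is_product(joint))
```

On these assumed numbers, both checks succeed for the product table and both fail for the other one, which is the equivalence exploited in the proof.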
That the mathematics of this paper does not handle 0-probability values is a limited shortcoming given that the state set is taken to be finite and there is a single decision-maker to consider. The classic decision-theoretic move in this case is to prune the state set of the states the conditionals of which are constant.17

17 Even under finiteness assumptions, this pruning move is not available in a game-theoretic context, since 0-probability moves can be invested there with strategic significance.

9 References

Anscombe, F.J. & R.J. Aumann (1963), "A Definition of Subjective Probability", Annals of Mathematical Statistics, 34, p. 199-205.

Battigalli, P. & P. Veronesi (1996), "A Note on Stochastic Independence without Savage-Null Events", Journal of Economic Theory, 70, p. 235-248.

Bernardo, J.M., J.R. Ferrándiz & A.F.M. Smith (1985), "The Foundations of Decision Theory: An Intuitive, Operational Approach with Mathematical Extensions", Theory and Decision, 19, p. 127-150.

Blackorby, C., D. Primont & R.R. Russell (1978), Duality, Separability and Functional Structures: Theory and Applications, New York, North Holland.

Blume, L., A. Brandenburger & E. Dekel (1991), "Lexicographic Probability and Choice Under Uncertainty", Econometrica, 59, p. 61-79.

Bradley, R. (2017), Decision Theory with a Human Face, Cambridge, Cambridge University Press.

de Finetti, B. (1931), Probabilismo, Biblioteca filosofica, Perrella, Napoli, p. 163-219. Reprinted in B. de Finetti, La logica dell'incerto, Milano, Il Saggiatore, 1989, p. 3-70. English tr. as "Probabilism", Erkenntnis, 31, p. 169-223, 1989.

Domotor, Z. (1969), "Probabilistic Relational Structures and Their Applications", Technical Report No. 144, Stanford University, Institute for Mathematical Studies in the Social Sciences.

Feller, W. (1950), An Introduction to Probability Theory and Its Applications, New York, Wiley (3rd ed., 1968).

Fine, T.L. (1973), Theories of Probability, New York, Academic Press.

Fishburn, P.C. (1970), Utility Theory For Decision Making, New York, Wiley.

Fitelson, B. & A. Hájek (forthcoming), "Declarations of Independence", Synthese.

Fleurbaey, M. & P. Mongin (2016), "The Utilitarian Relevance of the Aggregation Theorem", American Economic Journal: Microeconomics, 8, p. 289-306.

Gilboa, I. (2009), Theory of Decision Under Uncertainty, Cambridge, Cambridge University Press.

Gillies, D. (2000), Philosophical Theories of Probability, London, Routledge.

Halmos, P.R. (1974), Measure Theory, New York, Springer.

Hoel, P.G., S.C. Port & C.J. Stone (1971), Introduction to Probability Theory, Boston, Houghton Mifflin Company.

Jeffrey, R. (1965), The Logic of Decision, Chicago, Chicago University Press (2nd ed., 1983).

Joyce, J.M. (1998), "A Non-Pragmatic Vindication of Probabilism", Philosophy of Science, 65, p. 575-603.

Joyce, J.M. (1999), The Foundations of Causal Decision Theory, Cambridge, Cambridge University Press.

Kaplan, M. & T.L. Fine (1977), "Joint Orders in Comparative Probability", Annals of Probability, 5, p. 161-179.

Kolmogorov, A.N. (1933), Grundbegriffe der Wahrscheinlichkeitsrechnung, in Ergebnisse der Mathematik und ihrer Grenzgebiete, 3. Engl. tr. as Foundations of the Theory of Probability, New York, Chelsea (2nd ed., 1956).

Leitgeb, H. & R. Pettigrew (2010), "An Objective Justification of Bayesianism II: The Consequences of Minimizing Inaccuracy", Philosophy of Science, 77, p. 236-272.

Luce, R.D. & L. Narens (1978), "Qualitative Independence in Probability Theory", Theory and Decision, 9, p. 225-239.

Mongin, P. & M. Pivato (2015), "Ranking Multidimensional Alternatives and Uncertain Prospects", Journal of Economic Theory, 157, p. 146-171.

Mongin, P. & M. Pivato (2016), "Social Preference Under Twofold Uncertainty", HEC Paris Research Paper No. ECO/SCD-2016-1154. https://ssrn.com/abstract=2796560

Popper, K.R. (1959), The Logic of Scientific Discovery, London, Hutchinson (6th revised ed., 1972).

Rado, F. & J. Baker (1987), "Pexider's Equation and Aggregation of Allocations", Aequationes Mathematicae, 32, p. 227-239.

Savage, L.J. (1954), The Foundations of Statistics, New York, Wiley (2nd revised ed., 1972).

Wakker, P. (1989), Additive Representations of Preferences, Dordrecht, Kluwer.

Wakker, P. (2010), Prospect Theory, Cambridge, Cambridge University Press.