BAYESIAN DECISION THEORY AND STOCHASTIC INDEPENDENCE [1]

Philippe Mongin
CNRS & HEC Paris
April 2019
To appear in Philosophy of Science

[1] A first version of this paper was presented at a seminar at the Munich Center for Mathematical Philosophy and at the TARK 2017 conference. The present version has particularly benefited from detailed comments made by Richard Bradley, Donald Gillies, Robert Nau, Marcus Pivato, Burkhard Schipper, Peter Wakker, Paul Weirich, two anonymous TARK referees, and two anonymous referees of this journal. We thank the Investissements d'Avenir (ANR-11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047) for supporting our research.

Abstract. Stochastic independence (SI) has a complex status in probability theory. It is not part of the definition of a probability measure, but it is nonetheless an essential property for the mathematical development of this theory, hence a property that any theory on the foundations of probability should be able to account for. Bayesian decision theory, which is one such theory, appears to be wanting in this respect. In Savage's classic treatment, postulates on preferences under uncertainty are shown to entail a subjective expected utility (SEU) representation, and this permits asserting only the existence and uniqueness of a subjective probability, regardless of its properties. What is missing is a preference postulate that would specifically connect with the SI property. The paper develops a version of Bayesian decision theory that fills this gap. In a framework of multiple sources of uncertainty, we introduce preference conditions that jointly entail the SEU representation and the property that the subjective probability in this representation treats the sources of uncertainty as being stochastically independent. We give two representation theorems of graded complexity to demonstrate the power of our preference conditions. Two sections of comments follow, one connecting the theorems with earlier results in Bayesian decision theory, and the other connecting them with the foundational discussion on SI in probability theory and the philosophy of probability. Appendices offer more technical material.

1 Introduction and preview

The property of stochastic (or statistical, or probabilistic) independence occupies a rather special place in the mathematical theory of probability. It does not belong to the properties that this theory singles out to define a probability measure axiomatically. Indeed, its familiar definitions by the multiplication rule, or the equality of conditional with unconditional probability, do not enter the Kolmogorov axiomatization of a probability measure. Rather, they capture properties of given events (and more generally, of given partitions or random variables) for a given probability measure and can thus be adopted only to model particular situations. At the same time, probability theory obviously makes extensive use of independence assumptions, as evidenced by the Laws of Large Numbers, many theorems on stochastic processes and some core results of statistical theory. For Kolmogorov himself, this property occupies a "central position in the theory of probability" (1933-1956, p. 8), making it significantly different from the theory of positive measures.

One would thus expect all theories of the foundations of probability to pay careful attention to stochastic independence, but this is not the case. In this paper, we investigate Bayesian decision theory, one of the most influential among these theories, and at the same time a curious example of neglect of this major property. As is well known, Bayesian decision theorists have developed a brand of subjective interpretation for the probability calculus. They claim that an agent's uncertain beliefs should be represented by a probability measure, and ground their claim on a pragmatic argument.
They show formally that if the agent's preferences over uncertain prospects (typically, but not exclusively, over monetary bets) obey certain requirements of practical rationality, this agent's beliefs should conform to the axiomatic definition of a probability measure. Bayesian decision theorists typically stop their work when they have completed this demonstration. By not explaining how it can be extended to recover the definition of stochastic independence, they are open to the objection that they have not yet fully connected the probability calculus with rational belief. That is, they can be criticized for having handled only the basics of this calculus instead of its actual working.

More technically, Bayesian decision theory offers a representation theorem for preferences over uncertain prospects that involves two sets of quantities, utilities (over the consequences of prospects) and probabilities (over the uncertain events), these two items being combined by the familiar rule of expected utility. After Ramsey's and de Finetti's sketches, this strategy was implemented in full axiomatic detail by Savage (1954). At one point, Savage (1954-1972, p. 44) extends the representation theorem to obtain a posterior probability measure, i.e., one that represents beliefs after a partial resolution of uncertainty, and proves that this posterior obeys Bayes's rule of revision; properly speaking, the "Bayesian" label of the school becomes fully justified only at this stage. This is also where Savage stops. As we will report below, however, he acknowledges that a treatment of stochastic independence should have come next, but his admission of unfinished business was generally lost to later Bayesian decision theorists. The rare exceptions will be discussed below.

What is missing from Savage's axiom system and its variants is a condition put in the preference language of these systems that would account for stochastic independence specifically.
The present paper formulates one such condition. We do not use Savage's system, but a more accessible one, and it is another contribution of the paper to introduce this alternative system.

The desired condition turns out to be highly simple and intuitive. We set up a framework in which there are two distinct sources of uncertainty $S$ and $T$, and states of nature thus have the form of two-component vectors $(s,t) \in S \times T$. In this framework, the condition relates to the agent's conditional preferences and stipulates that those defined conditionally on $s$ should be the same for all $s$ in $S$, and those defined conditionally on $t$ should be the same for all $t$ in $T$. What this effectively means is that if the agent knows $t$ but not $s$, or knows $s$ but not $t$, then this partial knowledge cannot improve the decisions made under the residual uncertainty. This recovers in axiomatic preference terms one of the standard informal justifications of stochastic independence: if the occurrence of an event $A$ carries no information on the occurrence of $B$, and vice versa, then the two events should be declared to be independent.

An alternative informal justification, which is also common, goes as follows: if the occurrence of an event $A$ does not influence the occurrence of $B$, and vice versa, then the two events should be declared to be independent. The two lines are semantically distinct, but easily get mixed up in probability texts and even some works in the philosophy of probability. When Bayesian decision theory is extended so as to encompass stochastic independence, there is no danger of confusion since the information carried by events matters only if it enters the agent's decision process, hence is subjective in character. Objective connections holding between events, for instance causal connections, play a role only if the agent considers them relevant.
Thus, one merit of the extended Bayesian theory is to give the informational reading of stochastic independence a foundation that clearly sets it apart from other possible readings. Besides this semantic contribution, the theory casts some light on the reflective discussion of that property in the philosophy of probability. In particular, it has something to say on the symmetry of the multiplicative definition ($A$'s independence of $B$ implying $B$'s independence of $A$), a property that this literature has sometimes called into question.

We offer two representation theorems in succession. Both are adapted to the multiple uncertainty framework sketched above, and both deliver an expected utility formula in which the subjective probability is multiplicative in the values of two sources, hence satisfies stochastic independence. The system of Theorem 1 has very few axioms, which deliver the two desired outputs jointly. This condensed treatment is both an advantage and a disadvantage. To derive a subjective expected utility formula and the multiplicative property in two successive logical steps helps one better understand how each of the preference axioms contributes to the conclusions. The more advanced Theorem 2 is devised for this purpose.

The rest of this paper is organized as follows. Section 2 introduces the twofold uncertainty framework and the preference axioms for Theorem 1 via a motivating example. Section 3 states this result formally, and section 4 does the same for Theorem 2. Section 5 is devoted to comparisons within Bayesian decision theory, including the earlier sketches of a preference analysis of stochastic independence. Section 6 draws connections with the philosophy of probability. Three appendices collect the more specialized material. Appendix 1 gives proof details on the two theorems. Appendix 2 generalizes Theorem 1 by allowing for 0-probability events. Appendix 3 pursues some of the comparisons of section 5 in more detail.
2 A motivating example

Given any probability space $(\Omega, \mathcal{A}, P)$, two events $A, B \in \mathcal{A}$ are said to be stochastically independent if $P(A \cap B) = P(A) \cdot P(B)$. Building on this elementary definition, probability theory also defines what it means for sets of events, partitions or random variables to be stochastically independent. Here we will approach stochastic independence (SI) by specializing the state set $\Omega$ to be a product set, a standard move in the theory when it comes to working with this property (see, e.g., Halmos, 1974, p. 191-192). The simple example of this section illustrates this framework and the main decision-theoretic concepts of the paper.

Suppose that a corn producer must decide how much land to farm while not knowing what the climatic conditions and the demand for corn will be at the time of the harvest; suppose also that this producer evaluates each farming policy in terms of monetary proceeds and no other criterion. Following the basics of decision theory, we can re-express this example symbolically as follows. There is a set of states of the world, which takes the form of a product set $\Omega = S \times T$, where $S$ represents the set of unknown climatic conditions and $T$ the set of unknown values for demand. There is a set of consequences, which we take to be the set of real numbers $\mathbb{R}$ to represent monetary proceeds. There is a set of uncertain prospects, i.e., mappings from the states of the world to consequences, each of which represents a farming policy; we take this set to be the set $\mathbb{R}^{S \times T}$ of all real functions on $\Omega$. Finally, there is a binary relation $\succsim$ on this set of prospects to capture the producer's preferences among cultivation policies.
Now suppose that this preference relation obeys the expected utility (EU) rule, i.e., there exist a probability function $\pi$ on $\Omega = S \times T$ and a utility function $u$ on $\mathbb{R}$ such that for all uncertain prospects $\mathbf{X}$, $\mathbf{Y}$,

$$\mathbf{X} \succsim \mathbf{Y} \iff \sum_{s,t} \pi(s,t)\, u(\mathbf{X}(s,t)) \geq \sum_{s,t} \pi(s,t)\, u(\mathbf{Y}(s,t)),$$

and suppose moreover that $\pi$ satisfies the stochastic independence (SI) property with respect to $S$ and $T$, i.e.,

$$\pi(s,t) = p(s)\, q(t),$$

where $p$ and $q$ are probability functions on $S$ and $T$ respectively. This equation, also written as $\pi = p \otimes q$, determines $p$ and $q$ uniquely; these are the marginals of $\pi$ on $S$ and $T$, respectively. The EU rule with an axiomatically derived subjective probability (what will be referred to as subjective expected utility, SEU) and the SI property of this subjective probability are the intended conclusions of our theorems. In the present section, we reason heuristically, working backwards from them to a set of preference conditions that could be proposed as axioms.

Assume for simplicity that $S = \{s_1, s_2\}$ and $T = \{t_1, t_2\}$. Then, the probabilities are given by the table:

$$\begin{array}{c|cc} & t_1 & t_2 \\ \hline s_1 & p_{s_1} q_{t_1} & p_{s_1} q_{t_2} \\ s_2 & p_{s_2} q_{t_1} & p_{s_2} q_{t_2} \end{array}$$

and an uncertain prospect $\mathbf{X}$ is represented by the following table, which gives the consequences of this prospect in each state:

$$\begin{array}{c|cc} \mathbf{X} & t_1 & t_2 \\ \hline s_1 & x_{11} & x_{12} \\ s_2 & x_{21} & x_{22} \end{array}$$

The EU formula for $\succsim$, i.e.:

$$V(\mathbf{X}) = p_{s_1} q_{t_1} u(x_{11}) + p_{s_1} q_{t_2} u(x_{12}) + p_{s_2} q_{t_1} u(x_{21}) + p_{s_2} q_{t_2} u(x_{22}),$$

can be restated either as:

$$(*) \quad V(\mathbf{X}) = p_{s_1} \left[ q_{t_1} u(x_{11}) + q_{t_2} u(x_{12}) \right] + p_{s_2} \left[ q_{t_1} u(x_{21}) + q_{t_2} u(x_{22}) \right],$$

or as:

$$(**) \quad V(\mathbf{X}) = q_{t_1} \left[ p_{s_1} u(x_{11}) + p_{s_2} u(x_{21}) \right] + q_{t_2} \left[ p_{s_1} u(x_{12}) + p_{s_2} u(x_{22}) \right].$$

Observe that the bracketed sums in $(*)$ are numerical representations for conditional preferences on the possible values of $s$, and those in $(**)$ are numerical representations for conditional preferences on the possible values of $t$. Thus, the equations entail that (i) these conditional preferences are orderings.
Since the same functional form $q_{t_1} u(\cdot) + q_{t_2} u(\cdot)$ appears in the two bracketed sums of $(*)$, and similarly the same functional form $p_{s_1} u(\cdot) + p_{s_2} u(\cdot)$ appears in the two bracketed sums of $(**)$, the equations also entail that (ii) these conditional preferences are the same for different $s$, and the same for different $t$. Lastly, from the same equations, if the conditional orderings for both $s_1$ and $s_2$, or the conditional orderings for both $t_1$ and $t_2$, agree to rank prospect $\mathbf{X}$ above prospect $\mathbf{Y}$, then the overall preference $\succsim$ ranks $\mathbf{X}$ above $\mathbf{Y}$. Thus, it also holds that (iii) preferences over prospects are increasing with respect to either family of conditional preferences.

We have stated (i), (ii) and (iii) in pure preference language, thus abstracting entirely from the specifics of a numerical representation. In Theorem 1 below, we assume no more than (i), (ii) and (iii), plus some background conditions, and derive a SEU rule with the SI property. Given the definition of a conditional, which is restated below, it is actually possible to reduce (iii) to (i) and obtain an even more condensed system.

One may wonder how such apparently weak conditions can do the mathematical work. One key point of the formal argument is that the weak ordering property, when applied to both $s$ and $t$ as in (i), makes it possible to decompose the preference $\succsim$ not only in terms of $s$- and $t$-conditionals, but also in terms of any other partition of $\Omega$, and from there to get an additive representation for $\succsim$. The other key point is that the invariance property, when applied to both $s$ and $t$, as in (ii), permits not only turning this additive representation into an EU formula, but also giving the probabilities in this formula the multiplicative form.[2]

3 A first representation theorem involving subjective expected utility and stochastic independence

Formally, there are two variables of interest, $s \in S$ and $t \in T$, and a state of the world is any pair $(s,t) \in \Omega = S \times T$.
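The heuristic reasoning of the motivating example can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the marginals `p` and `q`, the utility `u` and the prospect `X` are hypothetical numbers, not taken from the paper, and the function names are ours. It computes the EU value in its joint form and in the two restated forms, and confirms that they coincide when the probability is a product.

```python
import math

def eu_joint(X, p, q, u):
    """Joint form: sum over all states (s, t) of p[s] * q[t] * u(X[s][t])."""
    return sum(p[s] * q[t] * u(X[s][t])
               for s in range(len(p)) for t in range(len(q)))

def row_value(row, q, u):
    """Bracketed sum of the row-wise restatement: value of one row of X.
    It does not depend on s, which is condition (ii) for the s-conditionals."""
    return sum(q[t] * u(row[t]) for t in range(len(q)))

def column_value(col, p, u):
    """Bracketed sum of the column-wise restatement: value of one column of X."""
    return sum(p[s] * u(col[s]) for s in range(len(p)))

def eu_by_rows(X, p, q, u):
    """Row-wise restatement: p-weighted sum of the row values."""
    return sum(p[s] * row_value(X[s], q, u) for s in range(len(p)))

def eu_by_columns(X, p, q, u):
    """Column-wise restatement: q-weighted sum of the column values."""
    columns = [[X[s][t] for s in range(len(p))] for t in range(len(q))]
    return sum(q[t] * column_value(columns[t], p, u) for t in range(len(q)))

# Hypothetical numbers, chosen only for illustration.
p = [0.3, 0.7]                            # marginal on S = {s1, s2}
q = [0.6, 0.4]                            # marginal on T = {t1, t2}
u = lambda x: 1.0 - math.exp(-x / 100.0)  # an increasing, continuous utility
X = [[100.0, 40.0], [25.0, 64.0]]         # rows indexed by s, columns by t

# The three ways of computing V(X) agree up to rounding.
assert math.isclose(eu_joint(X, p, q, u), eu_by_rows(X, p, q, u))
assert math.isclose(eu_joint(X, p, q, u), eu_by_columns(X, p, q, u))
```

Because `row_value` takes no argument for `s`, any two rows are ranked in the same way whichever `s` they are attached to; the same holds for columns and `t`. This is the invariance property that the formal development turns into an axiom.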
By assumption, $S$ and $T$ are finite with cardinalities $|S|, |T| \geq 2$. We keep the same number of sources of uncertainty as in the motivating example, but this is only for mathematical simplicity. The next section will suggest how a larger number can be handled. We take the set of consequences to be $\mathbb{R}$ and the set of prospects to be the set of all mappings $S \times T \to \mathbb{R}$, which is identified with $\mathbb{R}^{S \times T}$.[3] The sets of all probability functions on $S$, $T$ and $S \times T$ are denoted by $\Delta_S$, $\Delta_T$ and $\Delta_{S \times T}$, respectively.

It is convenient to represent prospects $\mathbf{X}$ as $|S| \times |T|$ matrices, with each $s$ standing for a row and each $t$ standing for a column. We will thus write $\mathbf{X} = [x_s^t]_{s \in S}^{t \in T}$, but sometimes also $\mathbf{X} = (\mathbf{x}_1, \dots, \mathbf{x}_{|S|})$, where each component is a row vector $\mathbf{x}_s \in \mathbb{R}^T$, or $\mathbf{X} = (\mathbf{x}^1, \dots, \mathbf{x}^{|T|})$, where each component is a column vector $\mathbf{x}^t \in \mathbb{R}^S$.

We endow the agent with an ex ante preference relation $\succsim$ to compare prospects and specifically assume that

(A) $\succsim$ is a continuous weak ordering.

Hence $\succsim$ is representable by a continuous utility function $U(\mathbf{X})$.[4] The agent's other preference relations are obtained from $\succsim$ as conditionals. There are three families of conditionals of interest, i.e., $\{\succsim_s\}_{s \in S}$, $\{\succsim_t\}_{t \in T}$ and $\{\succsim_{st}\}_{s \in S, t \in T}$. The last family represents ex post preferences, and the first two represent interim preferences, since each relation in these families depends on fixing one variable and letting the other vary, which amounts to resolving only part of the uncertainty.

[2] These two heuristic features underlie the proof of Theorem 3 by Mongin and Pivato (2015), which is our main technical tool in this paper.

[3] Again, these two assumptions are only for mathematical simplicity. Using the full technology of Mongin and Pivato (2015), the theorems of this paper could be proved for smaller domains than $\mathbb{R}$ and $\mathbb{R}^{S \times T}$. This would permit paying attention to feasibility constraints on what counts as a consequence and what counts as a prospect.

[4] A weak ordering (i.e., a reflexive, transitive and complete binary relation) $\succsim$ on some Euclidean space $\mathbb{R}^n$ is continuous if for all $x \in \mathbb{R}^n$, the upper and lower contour sets $\{x' \in \mathbb{R}^n \mid x' \succsim x\}$ and $\{x' \in \mathbb{R}^n \mid x \succsim x'\}$ are closed sets. This definition and the accompanying representation theorem are familiar since Debreu (1954) established them.

We now state formally how conditionals are defined from the master relation $\succsim$.[5] Take any subset of states $I \subseteq S \times T$. The conditional of $\succsim$ on $I$ is the relation $\succsim_I$ on $\mathbb{R}^I$ defined by the property that for all $\mathbf{x}_I, \mathbf{y}_I \in \mathbb{R}^I$,

$\mathbf{x}_I \succsim_I \mathbf{y}_I$ iff $\mathbf{X} \succsim \mathbf{Y}$ for some $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{S \times T}$ s.t. $\mathbf{X}$ is $\mathbf{x}_I$ on $I$, $\mathbf{Y}$ is $\mathbf{y}_I$ on $I$, and $\mathbf{X}$ and $\mathbf{Y}$ are equal outside $I$.

Taking $I = \{s\} \times T$, $I = S \times \{t\}$, and $I = \{(s,t)\}$ in this definition, we obtain the three families of conditionals mentioned above.[6] In terms of the matrix representation of prospects, the definitions of $\succsim_s$ and $\succsim_t$ read as follows:

• for all $\mathbf{x}_s, \mathbf{y}_s \in \mathbb{R}^T$, $\mathbf{x}_s \succsim_s \mathbf{y}_s$ iff $\mathbf{X} \succsim \mathbf{Y}$ for some $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{S \times T}$ s.t. $\mathbf{x}_s$ is the $s$-row of $\mathbf{X}$, $\mathbf{y}_s$ is the $s$-row of $\mathbf{Y}$, and $\mathbf{X}$ and $\mathbf{Y}$ are equal outside their $s$-row,

• for all $\mathbf{x}^t, \mathbf{y}^t \in \mathbb{R}^S$, $\mathbf{x}^t \succsim_t \mathbf{y}^t$ iff $\mathbf{X} \succsim \mathbf{Y}$ for some $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{S \times T}$ s.t. $\mathbf{x}^t$ is the $t$-column of $\mathbf{X}$, $\mathbf{y}^t$ is the $t$-column of $\mathbf{Y}$, and $\mathbf{X}$ and $\mathbf{Y}$ are equal outside their $t$-column.

Importantly, the definition of conditionals does not by itself make them weak orderings. By a well-known fact of decision theory, $\succsim_I$ is a weak ordering if and only if the choice of $\mathbf{X}, \mathbf{Y}$ in the definition of $\succsim_I$ is immaterial, or more precisely, if and only if

$$\mathbf{X} \succsim \mathbf{Y} \iff \mathbf{X}' \succsim \mathbf{Y}',$$

for all $\mathbf{X}', \mathbf{Y}'$ that also satisfy the condition stated for $\mathbf{X}, \mathbf{Y}$ in this definition. In this case, the quantifier "for some" in the definition can be replaced by "for all". When this holds, $\succsim$ is said to be separable in $I$. By another well-known fact, separability in $I$ is equivalent to the property that $\succsim$ is monotonically increasing with the family of conditionals $\{\succsim_i;\ i \in I\}$. We formally illustrate this property on the example of $\{\succsim_s;\ s \in S\}$.
Then, if $\succsim$ is separable in each $s \in S$ (or equivalently, each $\succsim_s$ is a weak ordering), the following holds: for all $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{S \times T}$, if $\mathbf{x}_s \succsim_s \mathbf{y}_s$ for all $s$, then $\mathbf{X} \succsim \mathbf{Y}$; and if moreover $\mathbf{x}_s \succ_s \mathbf{y}_s$ for some $s$, then $\mathbf{X} \succ \mathbf{Y}$.[7] Combining the two basic facts just stated, we see that the monotonicity condition (iii) of last section actually follows from the weak ordering condition (i). This will permit reducing the assumptions of our theorem to a bare minimum.

[5] The formalism below is standard in decision theory. See, e.g., Fishburn (1970) and Wakker (1989), or at a less technical level, Keeney and Raiffa (1993).

[6] Throughout, we will use the standard identification of $s$ with $\{s\} \times T$, $t$ with $S \times \{t\}$, and $(s,t)$ with $\{(s,t)\}$.

[7] By $\succ$, $\succ_s$, $\succ_t$ and $\succ_{st}$ we denote the strict preference relations associated with the weak preference relations $\succsim$, $\succsim_s$, $\succsim_t$ and $\succsim_{st}$. We denote the corresponding indifference relations by $\sim$, $\sim_s$, $\sim_t$ and $\sim_{st}$.

Since conditionals $\succsim_{st}$ compare real numbers, it makes sense to identify them with the natural order of these numbers. This amounts to assuming that, given any realized state $(s,t)$, they are desirable quantities, be they money values, as in the producer example, or something else. Thus, we also assume that

(B) for all $(s,t) \in S \times T$ and all $x_s^t, y_s^t \in \mathbb{R}$, $x_s^t \succsim_{st} y_s^t$ iff $x_s^t \geq y_s^t$.

Since this equivalence turns the $\succsim_{st}$ into weak orderings, $\succsim$ is increasing with each of them, hence also with each entry $x_s^t$ of $\mathbf{X}$.

Let us now say that the conditionals $\succsim_s$ (resp. $\succsim_t$) are an invariant family if $\succsim_s = \succsim_{s'}$ for all $s, s' \in S$ (resp. $\succsim_t = \succsim_{t'}$ for all $t, t' \in T$). These two requirements capture condition (ii) of last section. No such requirement is needed for the conditionals $\succsim_{st}$, since they are identical relations by construction. We are now ready for the first representation theorem.

Theorem 1 Given assumptions (A) on $\succsim$ and (B) on $\succsim_{st}$, the following conditions are equivalent:

• The conditionals $\succsim_s$ and $\succsim_t$ are weak orderings for all $s \in S$ and all $t \in T$, and both families of conditionals are invariant.
• There are an increasing, continuous function $u : \mathbb{R} \to \mathbb{R}$, and strictly positive probability functions $p \in \Delta_S$ and $q \in \Delta_T$, such that $\succsim$ is represented by the function $V : \mathbb{R}^{S \times T} \to \mathbb{R}$ that computes the $p \otimes q$-expected value of $u$, i.e., by the function thus defined: for all $\mathbf{X} = [x_s^t]_{s \in S}^{t \in T}$,

$$V(\mathbf{X}) := \sum_{s \in S} \sum_{t \in T} p_s\, q_t\, u(x_s^t).$$

In this format of EU representation, $p$ and $q$ are unique, and $u$ is unique up to positive affine transformations.

The conclusions state both that the ex ante preference relation can be represented by a SEU formula and that the two sources of uncertainty satisfy SI. Thus, this theorem extends Bayesian decision theory up to the point that, from the argument made in the introduction, it ought to have reached. The conclusion that $p$ and $q$ have only positive values is restrictive, but it can be relaxed at the cost of more complex assumptions. We pursue this technical line in appendix 2.

4 A second representation theorem involving subjective expected utility and stochastic independence

In Theorem 1, strong results follow from a short list of assumptions, undoubtedly a feature of mathematical elegance, but also a cause for conceptual dissatisfaction. Would it not be better to expand on the assumptions and separate those which are responsible for the existence of the SEU representation from those which account for the SI property occurring in this representation? Such a disentangling would be the more justified since SI is an optional property of a probability measure, hence in need of a preference condition that should be detachable from those underlying the existence of this measure.

However, the assumptions of Theorem 1 cannot be so divided, as the following argument shows. By taking the $\succsim_s$ and $\succsim_t$ to be merely orderings, not invariant orderings, one would get an additively separable representation that does not distinguish between the utility and probability components of the added terms, unlike EU representations.
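To make the contrast between a merely additively separable form and the EU form of Theorem 1 concrete, here is a small Python sketch. It is an illustration under assumptions of ours, not the paper's construction: the state-dependent terms `v_state_dependent`, the weights, and the two rows `a` and `b` are hypothetical. With state-dependent terms, each row-conditional relation is still a weak ordering (it has a numerical representation), but the family need not be invariant; with terms of the form $p_s q_t u(\cdot)$, it is.

```python
def additive_value(X, v):
    """Additively separable form: sum over (s, t) of v[s][t](X[s][t])."""
    return sum(v[s][t](X[s][t])
               for s in range(len(v)) for t in range(len(v[s])))

def eu_terms(p, q, u):
    """Special case v[s][t](x) = p[s] * q[t] * u(x), as in Theorem 1."""
    return [[(lambda x, w=ps * qt: w * u(x)) for qt in q] for ps in p]

def row_strictly_preferred(v, s, a, b):
    """True if row a is strictly better than row b conditionally on s."""
    def value(row):
        return sum(v[s][t](row[t]) for t in range(len(row)))
    return value(a) > value(b)

def identity(x):
    return x

def cube(x):
    return x ** 3

# Hypothetical increasing, continuous terms that differ across states.
v_state_dependent = [[identity, cube], [cube, identity]]
# Hypothetical EU terms with a product probability and linear utility.
v_eu = eu_terms([0.5, 0.5], [0.6, 0.4], identity)

a, b = (2.0, 0.0), (0.0, 2.0)

# Additive but state-dependent: the ranking of a and b flips with s.
assert not row_strictly_preferred(v_state_dependent, 0, a, b)
assert row_strictly_preferred(v_state_dependent, 1, a, b)

# EU with a product probability: the ranking is the same for every s.
assert row_strictly_preferred(v_eu, 0, a, b)
assert row_strictly_preferred(v_eu, 1, a, b)
```

The flip in the first pair of checks is what the invariance condition rules out; the second pair shows the same rows ranked identically across `s` once the terms factor into a product probability and a single utility.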
By taking only one of the two families to satisfy the ordering and invariance assumptions, one would get a representation that is separable in that family alone and would be even more remote from the SEU representation.[8]

[8] The additively separable representation of the first point reads as $\sum_{s \in S, t \in T} v_{st}(x_s^t)$, with increasing and continuous $v_{st} : \mathbb{R} \to \mathbb{R}$, for all $s \in S$, $t \in T$. This follows from another theorem of Debreu (1960). As to the next point, if the assumptions only hold for the $\succsim_s$, the separable representation reads as $W(V_1(\mathbf{x}_1), \dots, V_{|S|}(\mathbf{x}_{|S|}))$, with increasing and continuous $W : \mathbb{R}^S \to \mathbb{R}$ and $V_s : \mathbb{R}^T \to \mathbb{R}$, for all $s \in S$. On this representation, see Blackorby, Primont and Russell (1978, p. 108).

As it turns out, however, we can obtain a relevant partitioning of assumptions if we enrich the decision-theoretic framework beyond the present two-dimensional stage. Let us suppose that the agent pays attention not only to the uncertainty dimensions $s$ and $t$ of the final consequences, but also to a third dimension $i$, so that these consequences are now represented by real numbers $x_{st}^i$. The added dimension can be thought of in several ways, like time, space or an omitted dimension of uncertainty. Each of these concrete suggestions can fit the motivating example: the added dimension would indicate when, where or under what further unknown circumstance the monetary proceeds of a farming policy accrue to the producer. We will return to the interpretation of the added dimension after stating Theorem 2.

By assumption, $i$ takes its values in a finite set $I$ with cardinality $|I| \geq 2$. The set of prospects is now the set of all mappings $S \times T \times I \to \mathbb{R}$, i.e., $\mathbb{R}^{S \times T \times I}$. Prospects can be represented as three-dimensional arrays

$$\mathbf{X} = \left[ x_{st}^i \right]_{s \in S, t \in T}^{i \in I} \in \mathbb{R}^{S \times T \times I},$$

or as vectors $\mathbf{X} = (\mathbf{X}_1, \dots, \mathbf{X}_{|S|})$, $\mathbf{X} = (\mathbf{X}_1, \dots, \mathbf{X}_{|T|})$ or $\mathbf{X} = (\mathbf{X}^1, \dots, \mathbf{X}^{|I|})$, the components of which are matrix-valued, i.e.,

$$\mathbf{X}_s = \left[ x_{st}^i \right]_{t \in T}^{i \in I} \in \mathbb{R}^{T \times I}, \quad \mathbf{X}_t = \left[ x_{st}^i \right]_{s \in S}^{i \in I} \in \mathbb{R}^{S \times I} \quad \text{and} \quad \mathbf{X}^i = \left[ x_{st}^i \right]_{s \in S, t \in T} \in \mathbb{R}^{S \times T},$$

respectively. Adapting the formalism of last section, we assume that

(A) the agent's ex ante preference relation $\succsim$ on $\mathbb{R}^{S \times T \times I}$ is a continuous weak ordering,

and from this relation, we define four families of conditionals, i.e., $\{\succsim_s;\ s \in S\}$, $\{\succsim_t;\ t \in T\}$, $\{\succsim^i;\ i \in I\}$ and $\{\succsim_{st}^i;\ s \in S, t \in T, i \in I\}$. The $\succsim_s$, $\succsim_t$ and $\succsim^i$ respectively compare matrices $\mathbf{X}_s$, $\mathbf{X}_t$, $\mathbf{X}^i$, as defined above, the $\succsim_{st}$ compare vectors $\mathbf{x}_{st} \in \mathbb{R}^I$, and the $\succsim_{st}^i$ compare real numbers. Similarly as before, we assume that

(B) each $\succsim_{st}^i$ coincides with the natural order of real numbers,

which makes it an ordering, and furthermore makes the $\succsim_{st}^i$ an invariant family. The other conditional relations may or may not be weak orderings, and may or may not form invariant families, depending on which assumptions are put on them.

Theorem 2 Given assumptions (A) on $\succsim$ and (B) on $\succsim_{st}^i$, the following conditions are equivalent:

• The conditionals $\succsim^i$ are weak orderings.

• There are increasing, continuous functions $u^i : \mathbb{R} \to \mathbb{R}$, for all $i \in I$, and a strictly positive probability function $\pi \in \Delta_{S \times T}$, such that $\succsim$ is represented by the function $W : \mathbb{R}^{S \times T \times I} \to \mathbb{R}$ that computes the $\pi$-expected value of $\sum_{i \in I} u^i$, i.e., the function thus defined: for all $\mathbf{X} = [x_{st}^i]_{s \in S, t \in T}^{i \in I}$,

$$(*) \quad W(\mathbf{X}) := \sum_{s \in S, t \in T} \sum_{i \in I} \pi_{st}\, u^i(x_{st}^i).$$

In this format of EU representation, $\pi$ is unique, and the $u^i$ are unique up to positive affine transformations with a common multiplier.

Moreover, the following are equivalent:

• The above assumption on the $\succsim^i$ holds, and the $\succsim_s$ are weak orderings and an invariant family.

• The above conclusions hold, and there are strictly positive probability functions $p \in \Delta_S$ and $q \in \Delta_T$ with $\pi = p \otimes q$, so that $(*)$ becomes: for all $\mathbf{X} = [x_{st}^i]_{s \in S, t \in T}^{i \in I}$,

$$(**) \quad W(\mathbf{X}) := \sum_{s \in S} \sum_{t \in T} \sum_{i \in I} p_s\, q_t\, u^i(x_{st}^i).$$

In this format of EU representation, $p$ and $q$ are unique, while the $u^i$ have the same uniqueness property as before.

Unlike Theorem 1, Theorem 2 is in two parts, corresponding to the SEU formula and the SI property respectively.
What appears to be essential to the SI property is that one of the two sources (here conventionally taken to be $S$) gives rise to an invariant family of conditionals. We now reinforce the suggestion that invariance is the crucial condition by a heuristic argument. Considering for simplicity only four states, suppose that the agent takes $(s_1, t_1)$ to be more likely than $(s_1, t_2)$, and $(s_2, t_1)$ less likely than $(s_2, t_2)$. That is, from knowing how the uncertainty on $s$ is resolved, the agent is able to draw an inference on how the uncertainty on $t$ would be resolved. If the agent reasoned probabilistically, the joint probabilities would of course not decompose multiplicatively. It is easy to conclude that the conditionals on $s$ cannot be invariant. Take $\alpha, \alpha'$ representing desirable quantities, with $\alpha > \alpha'$, and the following prospects in matrix form:

$$\begin{array}{c|cc} \mathbf{X} & t_1 & t_2 \\ \hline s_1 & \alpha & \alpha' \\ s_2 & \alpha' & \alpha \end{array} \qquad \text{and} \qquad \begin{array}{c|cc} \mathbf{Y} & t_1 & t_2 \\ \hline s_1 & \alpha' & \alpha \\ s_2 & \alpha & \alpha' \end{array}$$

The first row of $\mathbf{X}$, which puts the best consequence on the more likely state, should be preferred to the first row of $\mathbf{Y}$, which puts it on the less likely state; that is, $(\alpha, \alpha') \succ_{s_1} (\alpha', \alpha)$. By a similar comparison, the second row of $\mathbf{X}$ should be preferred to the second row of $\mathbf{Y}$; that is, $(\alpha', \alpha) \succ_{s_2} (\alpha, \alpha')$. Thus, the two conditional preferences differ. Contraposing the argument, we see that if an agent entertains identical $\succsim_s$ and learns which $s$ is realized without yet learning $t$, this agent would not use the knowledge so obtained to draw information on $t$.

The converse claim is problematic. Preference reversals may occur between $\succsim_s$ and $\succsim_t$ simply because the agent's preferences are sensitive to which of $s$ or $t$ is realized, and this evaluative disposition is logically unrelated to the epistemic disposition of not using $s$-information to infer $t$-information. That is, the latter disposition can prevail in preference contexts where the property that the $\succsim_s$ are an invariant family should not be assumed.
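The heuristic argument about the prospects $\mathbf{X}$ and $\mathbf{Y}$ can be replayed numerically. The Python sketch below rests on assumptions of ours: the two joint probability tables, the linear utility and the values standing for the better and the worse consequence are hypothetical, and conditional preferences are computed from an EU representation, as in the argument above. It shows that a joint probability that is not a product of its marginals reverses the conditional ranking between $s_1$ and $s_2$, whereas a product probability does not.

```python
def conditional_value(pi, s, row, u):
    """Unnormalised EU of a row given s: sum over t of pi[s][t] * u(row[t]).
    Dividing by the marginal probability of s would not change any ranking."""
    return sum(pi[s][t] * u(row[t]) for t in range(len(row)))

def prefers(pi, s, a, b, u):
    """True if row a is strictly preferred to row b conditionally on s."""
    return conditional_value(pi, s, a, u) > conditional_value(pi, s, b, u)

def is_product(pi, tol=1e-12):
    """Check whether the joint table equals the product of its marginals."""
    p = [sum(row) for row in pi]
    q = [sum(pi[s][t] for s in range(len(pi))) for t in range(len(pi[0]))]
    return all(abs(pi[s][t] - p[s] * q[t]) <= tol
               for s in range(len(pi)) for t in range(len(pi[0])))

def u(x):
    """Hypothetical linear utility."""
    return x

better, worse = 10.0, 1.0          # hypothetical desirable quantities
best_on_t1 = (better, worse)       # row putting the better consequence on t1
best_on_t2 = (worse, better)       # row putting the better consequence on t2

# (s1, t1) more likely than (s1, t2); (s2, t1) less likely than (s2, t2).
pi_correlated = [[0.4, 0.1], [0.1, 0.4]]
# Product of the marginals p = (0.5, 0.5) and q = (0.7, 0.3).
pi_product = [[0.35, 0.15], [0.35, 0.15]]

# Correlated case: the conditional rankings on s1 and s2 are opposite.
assert not is_product(pi_correlated)
assert prefers(pi_correlated, 0, best_on_t1, best_on_t2, u)
assert prefers(pi_correlated, 1, best_on_t2, best_on_t1, u)

# Product case: the conditional ranking is the same on s1 and s2.
assert is_product(pi_product)
assert prefers(pi_product, 0, best_on_t1, best_on_t2, u)
assert prefers(pi_product, 1, best_on_t1, best_on_t2, u)
```

The sketch only covers the direction from non-multiplicative probabilities to non-invariant conditionals. It holds the utility fixed across states, so it does not address the case where preferences themselves depend on which state is realized.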
The only way to secure a converse is to exclude the troublesome evaluative disposition by fiat, which all ordinary axiomatizations of SEU theory actually do, a strategy that has the definite shortcoming of narrowing the application range of the theory. For lack of a better solution, we reproduce this standard move here.

To derive $(**)$, it was unnecessary to assume that both the $\succsim_s$ and the $\succsim_t$ are invariant. The invariance of the latter family follows from the representation $(**)$ itself.[9] If it is sufficient to require one form of invariance, this is because an EU representation $(*)$ holds from the previous stage. Under this assumption, it is impossible to distinguish between $s$ bringing no relevant information on $t$, and $t$ bringing no relevant information on $s$. This is formally shown in appendix 1 and further discussed in appendix 3.

[9] From $(**)$, the $\succsim_t$ are represented by $\sum_{s \in S} \sum_{i \in I} p_s\, u^i(\cdot)$, which does not depend on $t$.

We now return to the interpretation of the third dimension $i$ introduced in this section. A very natural decision-theoretic account becomes available when $i$ represents time. Then, the alternatives $\mathbf{X}$ mean contingent plans, i.e., plans for the future whose consequences in a given period depend on the way the uncertainty still represented by $(s,t)$ is resolved in that period. The matrix-valued objects $\mathbf{X}_s$, $\mathbf{X}_t$ and $\mathbf{X}^i$ mean partly contingent plans (for the first two, when one dimension of uncertainty is fixed) or dated prospects (for the last, when the time dimension is fixed). As to the vector-valued objects $\mathbf{x}_{st}$, they mean non-contingent plans, since they take the uncertainty to be resolved at each time period.[10]

However, time considerations are extraneous to uncertainty, which is the concern here, and it may be more appropriate to find an interpretation for $i$ that relates to these concerns. Suppose we declare $i$ to be a third dimension of uncertainty.
We can then add a third part to Theorem 2, which puts on the %i the same invariance assumption as was imposed on the %s and the %t. With this addition, it can be proved that () gives way to the following more specic representation: for all X =  xist i2I s2S;t2T , (  ) W (X) := X s2S X t2T X i2I ps qtriu(x i st), where r = (ri)i2I is a strictly positive probability function on I, and the utility function u in the SEU formula is now independent of the i index. Besides having a semantic advantage, this extension of Theorem 2 carries with it a sense of mathematical generalization. To obtain the SI property for a product space with any nite number of sources of uncertainty is no more di¢ cult than to obtain it for = S  T  I, but this would impose a heavy notational burden. 5 Comparisons with decision theory We start this decision-theoretic section with two comments that Savage makes on SI in his Foundations of Statistics. Having axiomatized a qualitative probability relation, he complains that "the notions of independence and irrelevance have ... no analogues in qualitative probability; this is surprising and unfortunate, for these notions seem to evoke a strong intuitive response" (1954-1972, p. 44). Later, at the end of a well-known passage on "small worlds", Savage restates his complaint as follows: "it would be desirable, if possible, to nd a simple qualitative personal description of independence between events" (p. 91).11 The two comments clearly express the need for Bayesian decision theory to complement its derivation of subjective probability with an account of SI, but point in di¤erent directions. The rst relates to qualitative probability, which is an auxiliary concept in Savages construction; he uses it as an intermediary between his preference postulates and his nal conclusions, in which subjective probability acquires a numerical form. 
Today, Savage's complaint in respect of this concept is no longer justified. There now exist richer systems of qualitative probability than his, which contain a special relation to express the stochastic independence of two events or two random variables.12 The second comment does not mention qualitative probability and we read in it a suggestion to base SI directly on the preference relation. In this respect, Savage's complaint is still topical. The present axiomatic work, which it motivated, appears to have been foreshadowed in only three papers, to be discussed now.

10 These interpretations assume that each period is uncertain in the same way as any other, i.e., no interaction exists between the resolution of uncertainty and the passing of time.

11 Savage used to say "personal probability" where later theorists say "subjective probability".

The main connection is with Blume, Brandenburger and Dekel (1991, p. 74). These writers use Anscombe and Aumann's (1963) axiom system to explore the preference foundations of lexicographic probability, a topic apparently remote from the present one. However, their construction includes ordinary Kolmogorov probabilities as a particular case, and we may just focus on this application. To account for SI, they impose invariance of relevant conditionals of their ex ante weak ordering (which they have redefined so that it bears on a Cartesian product). Leaving aside the Anscombe-Aumann features (see our critical comments below), their condition is the same as ours. Their paper does not contain a proof that this condition delivers SI, but we will provide one in appendix 3. In an important follow-up, which still uses Anscombe and Aumann's axiom system, Battigalli and Veronesi (1996) push the analysis of Blume, Brandenburger and Dekel in the direction of conditional probability systems (CPS).
These amount to taking conditional probabilities, rather than absolute probabilities, as being the primitive terms of the probability calculus, a move that philosophers usually associate with the works of Rényi (1955) and Popper (1959). For a suitable definition of what it means for a CPS to satisfy SI, Battigalli and Veronesi (1996, p. 243) connect this property with a form of conditional invariance. Given the dissimilarity of frameworks, the step from this condition to ours is far from trivial.

A comparison is also in order with a little known axiomatization of Bayesian decision theory by Bernardo, Ferrandiz and Smith (1985). These writers' derivation of SEU hinges on the strong assumption that a preexisting probability measure is available on a subalgebra of events; this will serve to calibrate subjective probability values. They define a preference condition relative to two events $E$ and $F$ that entails the equation $P(E \cap F) = P(E) \cdot P(F)$ at the stage of proving the representation theorem. Although evocative of the informational reading of SI, this condition differs from ours, and this difference seems connected with the technical choice of approaching SI in general probability spaces rather than product spaces.13

12 See Domotor (1969), Fine (1971), Kaplan and Fine (1977), Luce and Narens (1978), to cite only the earliest accounts of SI in terms of qualitative probability. Fine's (1973) classic Theories of Probability makes interesting comments on SI, and at some point (p. 36-37) even suggests moving in the direction of a pragmatic, preference-based account of SI.

13 Relatedly, Pfanzagl's (1968, p. 210-213) text on measurement theory states a preference condition for SI that is defined for a general probability space. Also using a calibration process, but with less elaboration than Bernardo, Ferrandiz and Smith, he derives a SEU formula that satisfies this property.

Theorems 1 and 2 should also be compared with the recent axiomatic work on second-order expected utility.14 This work involves considering two sources of uncertainty; however, unlike ours, it aims at establishing a logical hierarchy between these sources. By assumption, uncertainty takes both a primary form and a secondary form, which concerns what the primary form may be. This hierarchy can be interpreted epistemically, with the primary form bearing on the realization of natural events and the secondary form bearing on the realization of the primary form considered as an epistemic event, but it also admits of a temporal interpretation, with the primary and secondary sources corresponding to uncertainties of the second and first periods, respectively. In the present notation, if $T$ represents the uncertain states of the primary form, and $S$ those of the secondary form, a second-order EU representation of ex ante preferences is

$$\sum_{s \in S} p_s \, v\Big(\sum_{t \in T} r_{st} \, u(x^t_s)\Big),$$

where $r_s = (r_{st})_{t \in T}$ is a conditional probability function for $t$ given $s$, $v$ is a continuous and increasing function on $\mathbb{R}$, and $u$ is a function on a consequence set that may be, but is not necessarily, $\mathbb{R}$. This representation is axiomatized by Nau (2006, p. 143). Ergin and Gul's (2009, p. 912) version uses an unconditional probability for $t$:

$$\sum_{s \in S} p_s \, v\Big(\sum_{t \in T} q_t \, u(x^t_s)\Big).$$

15 If one takes $v$ to be a positive affine transformation, the two forms of uncertainty play symmetric roles, and both formulas collapse onto a SEU formula with a subjective probability that satisfies SI; thus, one gets the representation of Theorem 1 as a particular case. How the axioms for second-order EU should be strengthened so as to get this particular case is a relevant question, which we take up in appendix 3.16

We now compare our axiomatization of the SEU formula with those in current use. Being entirely preference-based, the former is like Savage's (1954-1972), but there are important dissimilarities. An obvious one concerns the cardinality of the state set $\Omega$, which is infinite in Savage and finite here.
Also, the highly condensed axiom systems for Theorems 1 and 2 do not relate to Savage's seven-postulate system in a transparent way. However, it is well-known that given postulate P1 (which requires that the ex ante preference relation $\succsim$ be a weak ordering), postulate P2 (the "sure-thing principle") can be restated as the requirement that the conditional $\succsim_E$ on any event $E$ of the state set $\Omega$ be a weak ordering. This restatement facilitates comparison with the present system. We only assume that the conditionals on some distinguished events $E$ of the state set are weak orderings. This weakening is in the spirit of axiomatizations of some non-EU rules that do not entirely relinquish the "sure-thing principle". By contrast, our invariance condition is stronger than P3, the "event independence" postulate, because it bears on all possible prospects and not only on constant prospects, as is the case in that postulate.

14 We are grateful to a referee for bringing out this connection.

15 The classic article by Klibanoff, Marinacci and Mukerji (2005, p. 1858) has a similar representation with integrals instead of finite sums.

16 Here is a final comparison. As a further development of Joyce's (1999) representation theorem for pairs of credibility and preference relations, Bradley (2017, p. 104) shows that a weak separability assumption on the credibility relation imposes the SI property on the probability measure representing this relation. Joyce's and Bradley's analyses belong to Jeffrey's (1965) theory of decision, which is several steps removed from the Bayesian decision theory of this paper and the works just reviewed.

Savage has another important postulate, P4, which is a clear step towards the existence of subjective probability and has no analogue here; it can only be verified from the utility representation itself. Our best guess is that P4 is made dispensable by the assumptions that consequences are real numbers and conditional preferences respect the order of these numbers.
By contrast, Savage puts no restriction at all on his consequence set.

Another comparison to the point is with Anscombe and Aumann's (1963) popular variation on Savage's system. We share with these authors the assumptions of a finite state set and a highly structured consequence set, but they assume the latter to be a set of probabilistically defined lotteries, an assumption we are glad to eschew here. From a Bayesian decision theory perspective, the Anscombe-Aumann system is open to the objection that it is question-begging to derive a subjective probability by supposing that other probabilities already exist. A common rejoinder is that the preexisting probabilities are objective, hence of a different nature from the subjective probability to be derived, but this is a free commentary without any basis in Anscombe and Aumann's formal system. We do not deny the practical convenience of this system, but ours is no more complicated, while being perhaps easier to defend theoretically.17

6 Connections with foundational discussions on probability

Underlying the axiomatic work of Savage and Bayesian decision theorists generally are two major claims on the foundations of probability: probability measures represent uncertain beliefs in the normatively appropriate way, and what makes the measures in question normatively appropriate is that practical rationality considerations recommend using them in decision making. Both claims have been disputed, with some objections surfacing already before Bayesian decision theory fully took shape. The first claim can be attacked along at least two different lines. One may question the appropriateness of probabilities on the ground that they are absolute measures, and as an alternative develop a calculus for conditional probabilities taken as primitive terms. This line has recently been defended by Fitelson and Hájek (2017) in connection with Popper's (1959-1972, Appendices *iv and *v) pioneering work in this area.
All existing conditional probability systems preserve the additivity of probability measures, and an alternative critical line is indeed to question that property. Decision theory has made thorough contributions on this score; see Gilboa (2009) and Wakker (2010) for overviews. As to the second foundational claim, it can again be attacked from different sides, one representative example being Joyce's (1998) "nonpragmatic" argument that probabilities are appropriate representations of uncertain beliefs for directly epistemic reasons.18

17 There are other alternatives to this system than the present one for application to finite state sets. An early example is Wakker's (1989, ch. IV).

These deep foundational questions arise in connection with the present work, but exceed its limited purpose. We meant to fill a gap in Bayesian decision theory by following its own principles, rather than defend these principles against outside criticism. However, since SI is our focus, we should ask whether this theory, as extended here, may contribute to a better understanding of this property.

There is much conceptual tension in the way probability theorists introduce the definition of SI. For one thing, they usually discuss its informal meaning in terms of a provisional definition of SI by the equality of conditional and unconditional probability: for any two events $A, B \in \mathcal{A}$ with $P(B) > 0$, $P(A \mid B) = P(A)$. Once they have made intuitive sense of this equation, they proceed to the multiplicative equation of section 2 as constituting the proper definition of SI, arguing that the latter avoids the sign restriction $P(B) > 0$ and makes SI symmetric. This argument is unconnected with the intuitions supporting the provisional definition, which makes the whole sequence semantically awkward.19 For another thing, probability theorists informally defend their definitions by resorting to more than one concept of unrelatedness.
Prominent examples are logical independence, causal independence (or alternative forms of objective independence), and informational independence. While some accounts are relatively clear on which concept they privilege, many others are equivocal, and some even fall into amazing confusions between them.20

The Bayesian decision theory developed here contributes nothing to the first problem. Only a move from absolute to conditional probability could avoid the discrepancy between the provisional and final definitions of SI, and we have not performed this move here. However, on the second problem, the theory has something to say. At the very least, it avoids the equivocations between different informal accounts by firmly opting for informational independence. The invariance property of conditional preferences is the pragmatic criterion by which one can judge that the agent regards $s$ as carrying no information on $t$.

18 Leitgeb and Pettigrew (2010) have recently pursued this line of purely epistemic justification with a new derivation of probability from an accuracy requirement.

19 Two examples among many of this two-step definitional sequence are to be found in Feller (1950-1968, p. 125) and Hoel, Port and Stone (1971, p. 19).

20 Here is a curious example due to two otherwise excellent scholars: SI means that "the knowledge of (one event) does not affect the other" (Luce and Narens, 1978, p. 226). Naturally, one would expect "the knowledge of the other" instead of "the other".

It remains to be said whether the present theory contents itself with endorsing one of the available accounts of SI or adds something significant to that account. We may credit the theory with two contributions. One is to connect the SI property with the foundations of subjective probability more tightly than is usually done. There has been some vacillation among subjectivists concerning the role of SI assumptions in probability theory.
Whereas Savage did not underplay this role, de Finetti considered it with strong reluctance. As Gillies (2000, p. 75-76) explains, citing from Probabilismo (1931), de Finetti argued against the application of SI to repeated trials of the same experiment on the ground that this assumption blocked the possibility of learning from the successive results of the trials through Bayes's rule of revision. This argument opened the way to de Finetti's alternative to SI, which is exchangeability. Without entering the rich debate, well covered by Gillies, on the respective merits of the two concepts, we can make the broader point that learning by Bayes's rule in repeated trials is just one particular case to be considered by the subjective theory of probability. It is possible to make perfect subjective sense of the opposite particular case in which no learning occurs from one trial to the other; it is actually incumbent on the subject to decide which case is relevant. In other words, there is no logical necessity to associate the subjective theory with a large scope of non-trivial applications of Bayes's rule. Although this point may be clear by itself, it comes out perhaps more clearly after Bayesian decision theory, which is a brand of subjectivism, has offered an account of SI.

Another possible contribution is to put the symmetry of the definition of SI in perspective. Writers on the foundations of probability have sometimes expressed dissatisfaction towards the fact that asymmetric dependence or independence cannot be formulated within the Kolmogorov axiomatic framework; see Fitelson and Hájek (2017) for a recent example. This is a fair complaint to make in connection with the logical and causal (or more generally objective) readings of SI, but its force as to the informational reading is not so clear, as Fitelson and Hájek concede.
In the Bayesian decision theory of this paper, one can assume the $s$-component of uncertainty to be informationally irrelevant to the $t$-component without assuming the converse irrelevance, for this amounts to requiring invariance from the $s$-conditionals and not from the $t$-conditionals. However, we have seen that this logical independence of the two assumptions vanishes once the preference axioms for SEU are all in place. This result can be understood in two opposite ways: those who take the preference conditions for SEU to be normatively compelling will view it as a justification of the postulated symmetry of SI, whereas others, for whom this symmetry is an arbitrary diktat, will turn the result against the allegedly compelling preference conditions.

7 Conclusions

We have responded to Savage's request to extend the preference apparatus of Bayesian decision theory to the point where it includes an account of stochastic independence. To do so, we have reconstructed this preference apparatus and proved representation theorems that contain both a novel derivation of subjective expected utility and the desired addition that the subjective probability makes the sources of uncertainty stochastically independent. These theorems call for richer variants that need to be pursued elsewhere. One such variant would relax the finiteness assumption put on the set of states of the world. Besides absolute probability as in Kolmogorov, this line of research could more ambitiously concern conditional probability taken as an axiomatic primitive, in Popper's sense. Each time, the objective would be to map the features of the probability space onto axiomatic preference counterparts. Another project would be to reconsider stochastic independence in relation to the non-additive measures of uncertainty that decision theorists have introduced since they moved away from a primarily Bayesian conception.
This is the more challenging of the two lines of research, because it requires one not only to find preference counterparts to already defined mathematical properties, but also to discover those new mathematical definitions which capture stochastic independence when probability gives way to weaker notions.

8 Appendix 1: Proofs

In the two theorems of this paper, the direction from the SEU representation to the axiomatic conditions is clear. The other direction follows from a result proved by Mongin and Pivato (2015, Theorem 1). We restate this result in the present notation and a simplified form adapted to the purpose of deriving them.

Theorem 3 Given assumptions (A) on $\succsim$ and (B) on $\succsim_{st}$, the following conditions are equivalent:

• The conditionals $\succsim_s$ and $\succsim_t$ are weak orderings for all $s \in S$ and all $t \in T$, and the $\succsim_s$ are an invariant family.

• There are increasing, continuous functions $u^t : \mathbb{R} \to \mathbb{R}$ for all $t \in T$, and there is a strictly positive probability function $p \in \Delta_S$, such that $\succsim$ is represented by the function $V : \mathbb{R}^{S \times T} \to \mathbb{R}$ that computes the $p$-expected value of $\sum_{t \in T} u^t$, i.e., by the following function: for all $X = [x^t_s]^{t \in T}_{s \in S}$,

$$V(X) := \sum_{s \in S} \sum_{t \in T} p_s \, u^t(x^t_s).$$

In this format of EU representation, $p$ is unique, and the $u^t$ are unique up to positive affine transformations with a common multiplier.

Theorem 1 requires both families of $\succsim_s$ and $\succsim_t$ to be invariant. A proof for it results from applying Theorem 3 twice over and checking the compatibility of the obtained representations. See Mongin and Pivato (2015, Corollary 1) for mathematical details.

The first part of the conclusions of Theorem 2 is a direct application of Theorem 3, with $i$ playing the role of $s$ and $(s, t)$ playing the role of $t$ in the statement of the latter. The second part is proved below.

Proof. We first observe that, for every given $s \in S$, the representation of the first part of the conclusions delivers a function $\mathbb{R}^{T \times I} \to \mathbb{R}$,

$$\sum_{t \in T} \sum_{i \in I} \pi_{st} \, u^i(x^i_{st}),$$

which represents the weak ordering $\succsim_s$.
If we define $\pi'_{st} := \pi_{st} / \sum_{t' \in T} \pi_{st'}$ for all $t \in T$, the function

$$\sum_{t \in T} \sum_{i \in I} \pi'_{st} \, u^i(x^i_{st})$$

is also a representation of $\succsim_s$. Now fix $s_0 \in S$ and take any $s \in S$. By the invariance of the $\{\succsim_s\}_{s \in S}$ family, there is an increasing function $\varphi_s$ on $\mathbb{R}$ such that for all $x^i_{st} \in \mathbb{R}$,

$$\sum_{t \in T} \sum_{i \in I} \pi'_{s_0 t} \, u^i(x^i_{st}) = \varphi_s\Big(\sum_{t \in T} \sum_{i \in I} \pi'_{st} \, u^i(x^i_{st})\Big).$$

Now for all $t \in T$, $i \in I$, define $z^i_{st} := \pi'_{st} \, u^i(x^i_{st})$, so that the previous equation becomes a Pexider equation:

$$\sum_{t \in T} \sum_{i \in I} \frac{\pi'_{s_0 t}}{\pi'_{st}} \, z^i_{st} = \varphi_s\Big(\sum_{t \in T} \sum_{i \in I} z^i_{st}\Big).$$

As the $u^i$ are increasing and continuous, and so are the double sums of them, $\varphi_s$ is continuous and open.21 It follows that the set of possible values for the vector $(z^i_{st})^{i \in I}_{t \in T}$ is connected and open. Thus, we can apply a theorem on Pexider equations and conclude that the $\varphi_s$ are positive affine transformations.22 That is, there exist a number $\alpha_s > 0$, and for all $t \in T$ and $i \in I$, numbers $\beta^i_{st}$ s.t. for all $x^i_{st} \in \mathbb{R}$,

$$\pi'_{s_0 t} \, u^i(x^i_{st}) = \alpha_s \, \pi'_{st} \, u^i(x^i_{st}) + \beta^i_{st}.$$

This entails that for all $t \in T$, $\pi'_{s_0 t} = \alpha_s \pi'_{st}$, and in fact (since proportional probability vectors are equal) $\pi'_{s_0 t} = \pi'_{st}$. We can thus rewrite the representation of the first part as

$$\sum_{s \in S, t \in T} \pi'_{s_0 t} \sum_{i \in I} \Big(\sum_{t' \in T} \pi_{st'}\Big) u^i(x^i_{st}),$$

which is the product-form representation of the second part if one defines $p := (\sum_{t' \in T} \pi_{st'})_{s \in S}$ and $q := (\pi'_{s_0 t})_{t \in T}$. The uniqueness of $p$ and $q$ in this format of representation is easily established.

21 For a proof of continuity, see Fleurbaey and Mongin (2016, Lemma 1). Openness is easily established.

22 This functional equation theorem is due to Rado and Baker (1987, Theorem 1 and Corollary 2). They formally prove it for sums of two terms, but Fleurbaey and Mongin (2016, Lemma 2) make the easy generalization to sums of any number of terms.

To show that adding an invariance assumption on the $\succsim_i$ leads to the stronger representation claimed in the main text, i.e.,

$$W(X) := \sum_{s \in S} \sum_{t \in T} \sum_{i \in I} p_s \, q_t \, r_i \, u(x^i_{st}),$$

it is enough to reproduce the proof sequence used for the two-factor product form mutatis mutandis.
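The normalization step at the end of the proof can be illustrated numerically. The following is only a minimal sketch under stated assumptions: it starts from a given joint probability table rather than from preferences, and it replaces the invariance condition by a direct check that the normalized rows coincide, which is what the Pexider step delivers in the proof. The names `factorize`, `seu_joint` and `seu_product`, the utilities and the numbers are all hypothetical and serve the illustration only.

```python
import math

def factorize(pi, tol=1e-9):
    """Return (p, q) with pi[s][t] ~ p[s] * q[t], or None if pi is not a product.

    p[s] is the row sum of pi (strictly positive by assumption); the normalized
    rows pi'[s][t] = pi[s][t] / p[s] must all coincide with the reference row
    s0 = 0 for pi to factorize.
    """
    p = [sum(row) for row in pi]
    normalized = [[x / p_s for x in row] for row, p_s in zip(pi, p)]
    q = normalized[0]
    for row in normalized[1:]:
        if any(abs(a - b) > tol for a, b in zip(row, q)):
            return None
    return p, q

def seu_joint(pi, utils, X):
    """Sum over s, t, i of pi[s][t] * u^i(x^i_st)."""
    return sum(pi[s][t] * utils[i](X[s][t][i])
               for s in range(len(pi))
               for t in range(len(pi[0]))
               for i in range(len(utils)))

def seu_product(p, q, utils, X):
    """Sum over s, t, i of p[s] * q[t] * u^i(x^i_st)."""
    return sum(p[s] * q[t] * utils[i](X[s][t][i])
               for s in range(len(p))
               for t in range(len(q))
               for i in range(len(utils)))

# Hypothetical data: |S| = 2, |T| = 3, |I| = 2.
p_true, q_true = [0.25, 0.75], [0.5, 0.3, 0.2]
pi = [[ps * qt for qt in q_true] for ps in p_true]
utils = [math.atan, lambda x: x ** 3]  # increasing, continuous u^i
X = [[[1.0, -2.0], [0.5, 3.0], [0.0, 1.5]],
     [[2.0, 0.1], [-1.0, -0.5], [4.0, 2.5]]]

p, q = factorize(pi)
w_joint = seu_joint(pi, utils, X)
w_product = seu_product(p, q, utils, X)

# A joint table whose normalized rows differ does not factorize.
non_product = [[0.4, 0.1], [0.1, 0.4]]
print(factorize(non_product))
```

On a table that factorizes, the two evaluations agree up to rounding, which mirrors the rewriting of the first-part representation into the product form.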
9 Appendix 2: Extension to 0-probability events

Theorems 1, 2 and 3 all involve strictly positive probability functions. This restriction is due to the assumption that all of the $\succsim_{st}$ (in Theorems 1 and 3) and all of the $\succsim^i_{st}$ (in Theorem 2) reproduce the natural order of real numbers. It can indeed be checked that this makes the $\succsim_s$, $\succsim_t$ and $\succsim_i$ non-constant preference relations, so that there are no "null events" in Savage's (1954-1972, p. 24) sense, hence no 0-probability values either. Given that the state set is finite, it is possible to prune it of the null states without creating side-effects on the rest of the framework. This move is often performed in mathematical treatments of decision theory.23 One may complain, however, that it is unfit to some game-theoretic applications: different players can differ in their assignments of 0-probability values, and this fact may be invested with strategic relevance. Moreover, and in closer connection with the aims of this paper, discarding 0-probability values would leave the present extension of Bayesian theory incomplete.24

We now show how Theorem 1 can be modified so as to include null states and 0-probability values. In the formal framework of section 3, we define the conditional $\succsim_I$ of $\succsim$ on $I \subseteq S \times T$ to be constant if for all $x_I, y_I \in \mathbb{R}^I$, $x_I \succsim_I y_I$. When, for a given $s \in S$ (resp. a given $t \in T$), $\succsim_s$ (resp. $\succsim_t$) is constant, we say that $s$ (resp. $t$) is a null component; and when for a given $(s, t) \in S \times T$, $\succsim_{st}$ is constant, we say that $(s, t)$ is a null state. We assume that the subsets $S^* \subseteq S$ and $T^* \subseteq T$ of non-null components satisfy the cardinality restriction $|S^*|, |T^*| \geq 2$.

In this enriched framework, a more general form of Theorem 1 can be proved. In this statement, assumption (A) is unchanged, but (B) is replaced by

(B*) for all $(s, t) \in S^* \times T^*$ and all $x^t_s, y^t_s \in \mathbb{R}$, $x^t_s \succsim_{st} y^t_s$ iff $x^t_s \geq y^t_s$.
Theorem 4 Given assumptions (A) and (B*), the following conditions are equivalent:

• For all $s \in S$ and $t \in T$, the conditionals $\succsim_s$ and $\succsim_t$ are weak orderings, and so are the conditionals $\succsim_{st}$ for all $s \in S \setminus S^*$ and $t \in T \setminus T^*$. Furthermore, the subfamilies of conditionals $\{\succsim_s\}_{s \in S^*}$ and $\{\succsim_t\}_{t \in T^*}$ are invariant.

• There are an increasing, continuous function $u : \mathbb{R} \to \mathbb{R}$, and probability functions $p \in \Delta_S$ and $q \in \Delta_T$, such that $\succsim$ is represented by the function $V : \mathbb{R}^{S \times T} \to \mathbb{R}$ that computes the $p \otimes q$-expected value of $u$, i.e., by the function defined as follows: for all $X = [x^t_s]^{t \in T}_{s \in S}$,

$$V(X) := \sum_{s \in S} \sum_{t \in T} p_s \, q_t \, u(x^t_s),$$

with $p_s = 0$ iff $s \in S \setminus S^*$ and $q_t = 0$ iff $t \in T \setminus T^*$, i.e., iff $s$ (resp. $t$) is a null component.

In this format of EU representation, $p$ and $q$ are unique, and $u$ is unique up to positive affine transformations.

23 See, e.g., Debreu (1959-1983, p. 128) and Wakker (1989, p. 83).

24 We are grateful to a referee for pressing this point.

Proof. The derivation of the axiomatic conditions from the SEU representation is clear. For the converse derivation, let us first observe that, for all $s \in S$ and all $t \in T$, the conditional $\succsim_{st}$ is constant iff either $s \in S \setminus S^*$ or $t \in T \setminus T^*$. In one direction, if $\succsim_{st}$ is non-constant, it follows from the weak ordering assumption put on $\succsim_s$ and $\succsim_t$ that these two conditionals are non-constant. (This assumption ensures that the values outside $(s, t)$ that are used to define $\succsim_{st}$ from $\succsim$ can also be used to define $\succsim_s$ and $\succsim_t$ from $\succsim$, a step needed for the non-constancy conclusion.) In the other direction, if both $\succsim_s$ and $\succsim_t$ are non-constant, (B*) applies and then $\succsim_{st}$ is non-constant. Hence $N := (S^* \times T^*)^c$ is the set of null states.

In the next step, we take prospects $X$ in vector form and decompose them into their $S^* \times T^*$ and $N$ subsets of entries, denoting the corresponding subvectors by $X_{S^* \times T^*}$ and $X_N$ respectively.
We aim at reaching the following equivalence, where $N = (S^* \times T^*)^c$ denotes the set of null states: for all $X = (X_{S^* \times T^*}, X_N)$, $Y = (Y_{S^* \times T^*}, Y_N) \in \mathbb{R}^{S \times T}$, and for all $Z = (z^t_s)_{(s,t) \in N} \in \mathbb{R}^N$,

(+) $X \succsim Y$ iff $(X_{S^* \times T^*}, Z) \succsim (Y_{S^* \times T^*}, Z)$.

The proof uses the property that for all $X' = (x'^t_s)_{(s,t) \in S \times T}$, $Y' = (y'^t_s)_{(s,t) \in S \times T} \in \mathbb{R}^{S \times T}$, if $x'^t_s \sim_{st} y'^t_s$ for all $s \in S$ and $t \in T$, then $X' \sim Y'$. This monotonicity property results from the fact that the $\succsim_{st}$ are weak orderings. Regarding the $\succsim_{st}$ s.t. $(s, t) \in N$, this fact follows from the assumptions and the definition of $N$, and regarding the $\succsim_{st}$ s.t. $(s, t) \in S^* \times T^*$, it can be proved by a technical argument.25 Since the $\succsim_{st}$ are constant for all $s \in S \setminus S^*$ and $t \in T \setminus T^*$, monotonicity entails that $(X_{S^* \times T^*}, X_N) \sim (X_{S^* \times T^*}, Z)$, and that $(Y_{S^* \times T^*}, Y_N) \sim (Y_{S^* \times T^*}, Z)$, whence (+) follows by the transitivity of $\succsim$. Now, we may let $Z$ vary in (+), and from the observation that this does not change the equivalence, conclude that the conditional $\succsim_{S^* \times T^*}$ of $\succsim$ on $S^* \times T^*$ is a weak ordering. In words, (+) means that the particular values taken on the null states make no difference to the ex ante preferences.

25 This argument involves Gorman's (1968) overlapping separability theorem and is part of the proof of Theorem 3 (see Mongin and Pivato, 2015, Lemma 4). The role of (B*) in this argument is to ensure that all subsets of $S^* \times T^*$ are essential in Gorman's sense.

To move from (+) to the representation, we apply Theorem 1 to $\succsim_{S^* \times T^*}$, viewed as a new primitive preference relation on the new set of prospects $\mathbb{R}^{S^* \times T^*}$. It can easily be checked that in this framework, all assumptions needed for Theorem 1 hold. It follows that there are an increasing continuous function $u : \mathbb{R} \to \mathbb{R}$, and strictly positive probability functions $p^* \in \Delta_{S^*}$ and $q^* \in \Delta_{T^*}$, such that $\succsim_{S^* \times T^*}$ is represented by

$$V(X) := \sum_{s \in S^*} \sum_{t \in T^*} p^*_s \, q^*_t \, u(x^t_s).$$

By (+), this is also a representation of $\succsim$. It can equivalently be formulated in terms of $p \in \Delta_S$ and $q \in \Delta_T$, as in the conclusion of the theorem.
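As an aside, the reformulation just described can be checked on a small numerical case. The sketch below is only an illustration under assumptions of our own: the index sets, the probabilities, the utility function and the helper names `pad_with_zeros` and `seu` are hypothetical. It pads strictly positive probabilities on the non-null components with zeros on the null components, and verifies that two prospects differing only on null states receive the same value.

```python
import math

def pad_with_zeros(prob_on_support, support, size):
    """Extend a probability vector given on `support` by zeros elsewhere."""
    out = [0.0] * size
    for prob, index in zip(prob_on_support, support):
        out[index] = prob
    return out

def seu(p, q, u, X):
    """Sum over s, t of p[s] * q[t] * u(X[s][t])."""
    return sum(p[s] * q[t] * u(X[s][t])
               for s in range(len(p))
               for t in range(len(q)))

# Hypothetical setting: S = T = {0, 1, 2}; component 1 of S and
# component 2 of T are null.
S_star, T_star = [0, 2], [0, 1]
p_star, q_star = [0.4, 0.6], [0.7, 0.3]
p = pad_with_zeros(p_star, S_star, 3)
q = pad_with_zeros(q_star, T_star, 3)
u = math.tanh  # an increasing, continuous utility function

# X and Y agree on S* x T* and differ only on null states.
X = [[1.0, 2.0, -5.0], [9.0, -9.0, 0.0], [0.5, -1.0, 7.0]]
Y = [[1.0, 2.0, 100.0], [-3.0, 4.0, 8.0], [0.5, -1.0, -7.0]]

# Value computed on the non-null states only, with p* and q*.
v_restricted = sum(p_star[a] * q_star[b] * u(X[s][t])
                   for a, s in enumerate(S_star)
                   for b, t in enumerate(T_star))

print(seu(p, q, u, X), seu(p, q, u, Y), v_restricted)
```

The three printed values coincide up to rounding, which is the numerical counterpart of (+): entries on null states do not affect the evaluation.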
The uniqueness of the $p^*$ and $q^*$ for the SEU format of representation of $\succsim_{S^* \times T^*}$ carries through to $p$ and $q$ for the SEU format of representation of $\succsim$.

A noticeable difference between Theorems 1 and 4 is that the latter imposes a weak ordering assumption not merely on the $\succsim_s$ and $\succsim_t$, but also on some $\succsim_{st}$. This extra assumption is essential to secure the monotonicity property that drives the above proof.

10 Appendix 3: Technical connections

Blume, Brandenburger and Dekel (1991, p. 74) suggest, but do not formally establish, that the SI property of subjective probability follows from their invariance condition, given their choice of an Anscombe-Aumann axiomatization of SEU. A proof for this claim can be obtained by merely adapting that which appendix 1 gave for the second part of Theorem 2. Let us first assume that the Anscombe-Aumann axiomatization has delivered a SEU representation for $\succsim$ of the form

$$\sum_{s \in S} \sum_{t \in T} \pi_{st} \, u(x_{st}),$$

and then follow the existing proof by ignoring the complication created by the $i$ index. Thus modified, the proof is exactly suitable to Blume, Brandenburger and Dekel's claim when there are only two factors in the Cartesian product domain. The case of any finite number of factors, as in their explicit statement, can be handled by a recursive argument. Notice that it should be enough to require invariance with respect to $n - 1$ factors, as the representation itself should entail invariance for the $n$th one, in the same way as in the two-factor case. More importantly, the present formal argument involves a functional equation step that depends on having a finite set of states of the world, unlike in Savage.

We now discuss how the conclusion of Theorem 1 can be related to the second-order EU representations discussed in section 5. Nau's (2006) framework is most closely related to ours. He considers two sets of finite cardinality $S$ and $T$, and takes consequences to be real numbers, in his case representing money values, so that prospects $X$ are elements of $\mathbb{R}^{S \times T}$.
(We use our own notation.) At a preliminary stage of his investigation, which will be sufficient for our comparative purpose, Nau assumes that the ex ante preference relation $\succsim$ satisfies (A) and is increasing in each $x^t_s$, and he imposes two strong separability conditions. One amounts to assuming the full force of Savage's P2 with respect to $S$-events (i.e., events of the form $S' \times T$, $S' \subseteq S$), and the other, to assuming a conditional form of P2 with respect to $T$-events (i.e., events of the form $S \times T'$, $T' \subseteq T$). In this conditional form, P2 applies to each $\succsim_s$ taken separately rather than the master relation $\succsim$. This asymmetric treatment of the $S$ and $T$ sources of uncertainty is an essential feature of the second-order construction. The following utility representation for $\succsim$ ensues (Nau, 2006, p. 143):

$$W(X) := \sum_{s \in S} v_s\Big(\sum_{t \in T} u_{st}(x^t_s)\Big),$$

where the $v_s$ and $u_{st}$ are continuous and increasing functions on $\mathbb{R}$. To reach the conclusion of Theorem 1 from this formula, one must restrict the $v_s$ to be positive affine transformations. Here is a necessary and sufficient condition for this to obtain: for all $t \in T$, $\succsim_t$ is a weak order. Essentially, when added to the others, this condition secures (P2) on all events of $S \times T$, whence an additive representation follows for $\succsim$, and the result derives from comparing this representation with $W(X)$.26

A comparison is also possible with Ergin and Gul (2009), although their framework is more complex than Nau's and ours. As in Savage, they postulate an infinite set of states of the world and end up with nonatomic probability measures.27 Barring these and other technical differences, their work also makes it possible to relate SI to second-order EU representations. By itself, their formula with unconditional probability for $t$, i.e.,

$$\sum_{s \in S} p_s \, v\Big(\sum_{t \in T} q_t \, u(x^t_s)\Big),$$

does imply that SI holds between $s$ and $t$ (Ergin and Gul, 2009, p. 906).
If v does not degenerate to a positive a¢ ne transformation, this property manifests itself in the separate use of two probability measures on S and T . If v degenerates, it manifests itself in the SI property of a single probability measure on ST , as in Theorem 1. This simplication can be explained in terms of the agents secondorder risk attitude thanks to a theorem in Ergin and Gul (2009, p. 911): v is a 26 In more detail, suppose %t is a weak ordering for all t 2 T . Then, by the same argument as in fn 25, %A is a weak ordering for all subsets A  S  T . Given that % is increasing with each xts, Debreus (1960) theorem applies, and % is represented by P s, t wst(x t s), where the wst are continuous and increasing functions. Fixing t, one compares two representations of %t , i.e., P s wst(x t s) and P s vs (ust(x t s) + ks), where ks = P t02T vs (ust(x t0 s ) for arbitrarily xed values xt 0 s . A functional equation argument in the style of that used to prove Theorem 2 leads one to the conclusion that the vs are positive a¢ ne functions. 27 If Ergin and Guls representation involves nite sums, this is because they restrict attention to prospects with a nite number of distinct consequences. 23 positive a¢ ne transformation if and only if the agent is indi¤erent between risky and non-risky second-order prospects having the same mean. This establishes an intriguing connection between the SI property and the psychology of the agent of Bayesian decision theory. 11 References Anscombe, F.J., & R.J. Aumann (1963), "A Denition of Subjective Probability", Annals of Mathematical Statistics, 34, p. 199-205. Battigalli, P. & P. Veronesi (1996), "A Note on Stochastic Independence without Savage-Null Events", Journal of Economic Theory, 70, p. 235-248. Bernardo, J.M., J.R. Fernandez & A.F.M. Smith (1985), "The Foundations of Decision Theory: An Intuitive, Operational Approach with Mathematical Extensions", Theory and Decision, 19, p. 127-150. Blackorby, C., D. 
Primont & R.R. Russell (1978), Duality, Separability and Functional Structures: Theory and Applications, New York, North Holland.
Blume, L., A. Brandenburger & E. Dekel (1991), "Lexicographic Probability and Choice Under Uncertainty", Econometrica, 59, p. 61-79.
Bradley, R. (2017), Decision Theory with a Human Face, Cambridge, Cambridge University Press.
Debreu, G. (1954), "Representation of a Preference Ordering by a Numerical Function", in Decision Processes, R.M. Thrall, C.H. Combs & R.L. Davis eds, New York, Wiley, p. 159-165. Reprinted as ch. 6 in G. Debreu, Mathematical Economics, Cambridge, Cambridge University Press, 1983.
Debreu, G. (1960), "Topological Methods in Cardinal Utility Theory", in Mathematical Methods in the Social Sciences, 1959, K.J. Arrow, S. Karlin & P. Suppes eds, Stanford, Stanford University Press, 1960, p. 16-26. Reprinted as ch. 9 in G. Debreu, Mathematical Economics, Cambridge, Cambridge University Press, 1983.
de Finetti, B. (1931), Probabilismo, Biblioteca filosofica, Perrella, Napoli, p. 163-219. Reprinted in B. de Finetti, La logica dell'incerto, Milano, Il Saggiatore, 1989, p. 3-70. English tr. as "Probabilism", Erkenntnis, 31, p. 169-223, 1989.
Domotor, Z. (1969), "Probabilistic Relational Structures and Their Applications", Technical Report No. 144, Stanford University, Institute for Mathematical Studies in the Social Sciences.
Ergin, H. & F. Gul (2009), "A Theory of Subjective Compound Lotteries", Journal of Economic Theory, 144, p. 899-929.
Feller, W. (1950), An Introduction to Probability Theory and Its Applications, New York, Wiley (3d ed. 1968).
Fine, T.L. (1973), Theories of Probability, New York, Academic Press.
Fishburn, P.C. (1970), Utility Theory For Decision Making, New York, Wiley.
Fitelson, B. & A. Hájek (2017), "Declarations of Independence", Synthese, 194, p. 3979-3995.
Fleurbaey, M. & P. Mongin (2016), "The Utilitarian Relevance of the Aggregation Theorem", American Economic Journal: Microeconomics, 8, p. 289-306.
Gilboa, I. (2009), Theory of Decision Under Uncertainty, Cambridge, Cambridge University Press.
Gillies, D. (2000), Philosophical Theories of Probability, London, Routledge.
Gorman, W. (1968), "The Structure of Utility Functions", Review of Economic Studies, 35, p. 367-390.
Halmos, P.R. (1974), Measure Theory, New York, Springer.
Hoel, P.G., S.C. Port & C.J. Stone (1971), Introduction to Probability Theory, Boston, Houghton Mifflin Company.
Jeffrey, R. (1965), The Logic of Decision, Chicago, Chicago University Press (2nd ed., 1983).
Joyce, J.M. (1998), "A Non-Pragmatic Vindication of Probabilism", Philosophy of Science, 65, p. 575-603.
Joyce, J.M. (1999), The Foundations of Causal Decision Theory, Cambridge, Cambridge University Press.
Kaplan, M. & T.L. Fine (1977), "Joint Orders in Comparative Probability", Annals of Probability, 5, p. 161-179.
Keeney, R.L. & H. Raiffa (1993), Decisions With Multiple Objectives. Preferences and Value Tradeoffs, Cambridge, Cambridge University Press.
Kolmogorov, A.N. (1933), Grundbegriffe der Wahrscheinlichkeitsrechnung, in Ergebnisse der Mathematik und ihrer Grenzgebiete, 3. Engl. tr. as Foundations of the Theory of Probability, New York, Chelsea (2nd ed., 1956).
Leitgeb, H. & R. Pettigrew (2010), "An Objective Justification of Bayesianism II: The Consequences of Minimizing Inaccuracy", Philosophy of Science, 77, p. 236-272.
Luce, R.D. & L. Narens (1978), "Qualitative Independence in Probability Theory", Theory and Decision, 9, p. 225-239.
Mongin, P. & M. Pivato (2015), "Ranking Multidimensional Alternatives and Uncertain Prospects", Journal of Economic Theory, 157, p. 146-171.
Mongin, P. & M. Pivato (2016), "Social Preference Under Twofold Uncertainty", HEC Paris Research Paper No. ECO/SCD-2016-1154. https://ssrn.com/abstract=2796560
Nau, R.F. (2006), "Uncertainty Aversion with Second-Order Utilities and Probabilities", Management Science, 52, p. 136-145.
Pfanzagl, J. (1968), Theory of Measurement, New York, Wiley.
Popper, K.R.
(1959), The Logic of Scientific Discovery, London, Hutchinson (6th revised ed., 1972).
Rado, F. & J. Baker (1987), "Pexider's Equation and Aggregation of Allocations", Aequationes Mathematicae, 32, p. 227-239.
Rényi, A. (1955), "On a New Axiomatic Theory of Probability", Acta Mathematica Academiae Scientiarum Hungaricae, 6, p. 268-335.
Savage, L.J. (1954), The Foundations of Statistics, New York, Wiley (2nd revised ed., 1972).
Wakker, P. (1989), Additive Representations of Preferences, Dordrecht, Kluwer.
Wakker, P. (2010), Prospect Theory, Cambridge, Cambridge University Press.