Accuracy-first epistemology without additivity

Richard Pettigrew

August 17, 2020

The Bayesian approach to reasoning under uncertainty is now a vast edifice. It contains methods you can use to build a probabilistic model of a phenomenon, some specific to certain sorts of situations and purposes, others more generally applicable. It contains rules for cleaning and incorporating different sorts of data and for updating your model in response to them. And it contains mathematical and computational techniques that help you discover particular properties of those models and implement those modes of inference. However, at its core lies a small number of fundamental principles. Probabilism tells you how your credences in logically related propositions should relate to one another. Conditionalization tells you how to update your credences in response to a specific sort of new evidence, namely, evidence that makes you certain of a proposition. The Principal Principle tells you how your credence about the objective chance of an event should relate to your credence in that event. And, for the so-called objective Bayesians, there are further principles that tell you how to set your prior credences, that is, those you have before you incorporate your data.

The accuracy-first programme in epistemology has sought to provide new foundations for these central principles of Bayesianism (Joyce, 1998; Greaves & Wallace, 2006; Pettigrew, 2016a). The idea is straightforward. We adopt the orthodox Bayesian assumption that our uncertain doxastic states can be represented by assigning precise numerical credences to each proposition we consider. By convention, your credence in a proposition is represented by a single real number at least 0 and at most 1, which we take to measure how strongly you believe that proposition.
We then represent the whole doxastic state by a credence function, which takes each proposition about which you have a credal opinion and assigns to it your credence in that proposition. So far, that's just the representational claim of orthodox Bayesianism. Now for the distinctive claim of accuracy-first epistemology. It is a claim about what makes a credal state, represented by a credence function, better or worse from the epistemic point of view. That is, it says what determines the epistemic or cognitive value of a credal state. It says: a credal state is better the more accurate it is; it is worse the more inaccurate it is. And we might intuitively think of the inaccuracy of a credence function as how far it lies from the ideal credence function, which is the one that assigns maximal credence to each truth and minimal credence to each falsehood. In accuracy-first epistemology, we formulate mathematically precise ways of measuring this epistemic good. We then ask what properties a credence function should have if it is to serve the goal of attaining that good; or, perhaps better, what properties it should not have if it is to avoid being suboptimal from the point of view of that good. And indeed we can find arguments of exactly this sort for the various fundamental principles of Bayesianism that we listed above.

Each of these arguments has the same form. Its first premise gives a set of properties an inaccuracy measure must have.[1] Its second premise provides a bridge principle connecting inaccuracy with rationality. And its third and final premise is a mathematical theorem showing that, for any inaccuracy measure that has the properties demanded by the first premise, if we apply the bridge principle with inaccuracy measured in this way, it follows that any credence function that violates the principle we seek to establish is irrational. We thus conclude that principle.
So, for instance, Jim Joyce lays down a series of properties that legitimate measures of inaccuracy must have; we'll mention them again briefly below (Joyce, 1998). He then formulates his bridge principle connecting inaccuracy and rationality: it says that a credence function is irrational if it is accuracy dominated, that is, if there is an alternative that is guaranteed to be more accurate. Then he proves a mathematical theorem showing that any credence function that is not probabilistic is accuracy dominated. And he concludes Probabilism. Similarly, Hilary Greaves and David Wallace lay down a single property they take to be necessary for a measure of inaccuracy (Greaves & Wallace, 2006): it is Strict Propriety, and it will play a central role in what follows. Then they say that your updating plan is irrational if there is an alternative that your prior credence function expects to be more accurate. And finally they prove that updating rules that proceed by conditioning your prior on your evidence, and only such rules, minimize expected inaccuracy from the point of view of your prior. And they conclude Conditionalization. Pettigrew's arguments for the Principal Principle and the Principle of Indifference, and Pettigrew and Briggs' argument for Conditionalization, have the same structure (Pettigrew, 2013, 2016b; Briggs & Pettigrew, 2018).

Now, an argument is only as strong as its premises are plausible. In this paper, I'd like to consider the first premise in each of these arguments. In these first premises, we lay down what we will demand of an inaccuracy measure. My aim is to take the current best version of this premise and improve it by making it less demanding.

[1] For various reasons, it's become standard in accuracy-first epistemology to work with measures of inaccuracy, rather than measures of accuracy. But there is no substantial difference: a measure of inaccuracy is simply the negative of a measure of accuracy, and vice versa.
There are really six sets of conditions offered in the literature:

(I) In his 1998 argument for Probabilism, Joyce imposes six conditions on measures of inaccuracy: Structure, Extensionality, Normality, Dominance, Weak Convexity, and Symmetry (Joyce, 1998).
(II) In his 2009 argument for a restricted version of Probabilism, he imposes just two: Truth-Directedness and Coherence Admissibility (Joyce, 2009).
(III) In their 2006 argument for Conditionalization, Greaves and Wallace impose just one: Strict Propriety (Greaves & Wallace, 2006).
(IV) In their 2009 argument for Probabilism, Predd, et al. impose three: Continuity, Additivity, and Strict Propriety. These are also the three conditions imposed in Pettigrew's argument for the Principal Principle, and in Briggs & Pettigrew's argument for Conditionalization (Predd et al., 2009; Pettigrew, 2013, 2016b; Briggs & Pettigrew, 2018).
(V) In their 2010 arguments for Probabilism and Conditionalization, Leitgeb and Pettigrew consider three different sets of conditions. For our purposes, the important condition is Global Normality and Dominance, which entails Additivity, the condition we seek to excise here (Leitgeb & Pettigrew, 2010).
(VI) In their 2016 argument, D'Agostino and Sinigaglia impose five: One-Dimensional Value-Sensitivity, Sub-Vector Consistency, Monotonic Order-Sensitivity, Permutation Invariance, and Replication Invariance (D'Agostino & Dardanoni, 2009; D'Agostino & Sinigaglia, 2010).

There is a lot to be said about the relationships between these different sets of necessary conditions for inaccuracy measures, but that's not my purpose here. Here, I want to take what I think are the best accuracy-first arguments for Probabilism, Conditionalization, and the Principal Principle and improve them by weakening the demands they make of inaccuracy measures in their first premises. That is, I want to show that those arguments go through for a wider range of inaccuracy measures than we've previously allowed.
As I will explain below, I take those best arguments to be the ones based on Predd, et al.'s set of conditions: Strict Propriety, Continuity, and Additivity. I will strengthen those arguments by showing that they go through if we impose only Strict Propriety and Continuity. We do not need to impose the condition of Additivity, which says roughly that the inaccuracy of a whole credence function should be the sum of the inaccuracies of the credences that it assigns. That is, we can strengthen those arguments by weakening their first premise.

Why should this interest us? After all, Joyce as well as D'Agostino & Sinigaglia have offered arguments for Probabilism that don't assume Additivity. True, but Patrick Maher (2002) has raised serious worries about Joyce's 1998 characterization, and Pettigrew (2016a, Section 3.1) has built on those. And Joyce's 2009 characterization applies only to credence functions defined over a partition, and not to those defined on a full algebra, so while its premises are weak, so is its conclusion. D'Agostino & Sinigaglia do not assume Additivity, but their characterization does entail it, and the Sub-Vector Consistency requirement they impose is implausibly strong for the same reason that Additivity is implausibly strong, as we'll see in Section 2. Similarly, Leitgeb and Pettigrew's Global Normality and Dominance condition entails Additivity, and so is implausibly strong for the same reason. This suggests that the best existing accuracy-first argument for Probabilism is the one based on Predd, et al.'s results, which assumes Additivity, Strict Propriety, and Continuity. So there is reason to show that Probabilism follows from Strict Propriety and Continuity alone. What about the Principal Principle? Well, the only existing accuracy-first argument for that is Pettigrew's 2013 argument, and that assumes Additivity, Strict Propriety, and Continuity.
So, again, there is reason to show that the principle follows from Strict Propriety and Continuity alone. What about Conditionalization? Here, Greaves and Wallace have offered an argument for Conditionalization based on Strict Propriety alone: it assumes neither Additivity nor even Continuity. True, but their result applies to a very specific case, namely, one in which (i) you know ahead of time the partition from which your evidence will come, (ii) you know that you will learn a proposition iff it is true, and (iii) you form a plan for how you will respond should you learn a particular proposition from that partition. In contrast, Briggs & Pettigrew's result can be generalized to cover many more cases than just this. As we will see, it can be generalized to establish what I will call the Weak Reflection Principle, which entails the restricted version of Conditionalization that Greaves and Wallace consider. But the Briggs & Pettigrew proof currently requires Additivity. So there is reason to explore whether we can weaken the premises of that argument by removing that assumption while retaining its conclusion. As we will see, the Weak Reflection Principle, and thus Conditionalization, follows from Strict Propriety and Continuity alone.

1 Predd, et al.'s conditions

In this section, I describe Predd, et al.'s set of conditions, the ones we numbered (IV) in our list above. This will furnish us with statements of Strict Propriety and Continuity, the assumptions we'll use in our new arguments for Probabilism, the Principal Principle, and Conditionalization; and it will also introduce us to Additivity, the assumption that we're dropping from the existing best arguments for these conclusions. We will explain the problems with Additivity in Section 2 below.

First, let's lay out the framework in which we're working:

• We write $\mathcal{W}$ for the set of possible worlds. We assume $\mathcal{W}$ is finite.[2] So $\mathcal{W} = \{w_1, \ldots, w_n\}$.
• We write $\mathcal{F}$ for the full algebra of propositions built over $\mathcal{W}$.
That is, $\mathcal{F}$ is the set of all subsets of $\mathcal{W}$, where each subset represents the proposition that is true at all and only the worlds it contains.
• We write $\mathcal{C}$ for the set of credence functions defined on $\mathcal{F}$. That is, $\mathcal{C}$ is the set of functions $c : \mathcal{F} \to [0, 1]$.
• We write $\mathcal{P}$ for the set of probabilistic credence functions defined on $\mathcal{F}$. That is, $p$ is in $\mathcal{P}$ iff $p$ is in $\mathcal{C}$ and (i) $p(\top) = 1$ and $p(\bot) = 0$, and (ii) $p(A \vee B) = p(A) + p(B)$ when $A$ and $B$ are mutually exclusive, i.e., when there is no possible world at which $A$ and $B$ are both true.
• Given a credence function $c$, we write $c_i$ for the credence that $c$ assigns to world $w_i$.
• We write $w_i$ for the ideal credence function on $\mathcal{F}$ at world $w_i$. That is, for $X$ in $\mathcal{F}$,
$$w_i(X) = \begin{cases} 1 & \text{if } X \text{ is true at } w_i \\ 0 & \text{if } X \text{ is false at } w_i \end{cases}$$
So, in particular, $w_i(w_j) = 1$ if $i = j$ and $w_i(w_j) = 0$ if $i \neq j$.
• An inaccuracy measure is a function $I : \mathcal{C} \times \mathcal{W} \to [0, \infty]$. For $c$ in $\mathcal{C}$ and $w_i$ in $\mathcal{W}$, $I(c, i)$ is the inaccuracy of $c$ at world $w_i$.

[2] For a consideration of the infinite case, see (Kelley, ms).

Here are the three properties that Predd, et al. demand of inaccuracy measures.

Continuity: For each world $w_i$, $I(c, i)$ is a continuous function of $c$. That is, if $w_i$ is in $\mathcal{W}$, then for all $c$ in $\mathcal{C}$,
$$(\forall \varepsilon > 0)(\exists \delta > 0)(\forall c' \in \mathcal{C})\big[(\forall X \in \mathcal{F})\,[\,|c(X) - c'(X)| < \delta\,] \Rightarrow |I(c, i) - I(c', i)| < \varepsilon\big]$$
Continuity says that I can ensure that the inaccuracy of $c'$ is as close as I wish to the inaccuracy of $c$ by ensuring that each credence that $c'$ assigns lies within a certain distance of the corresponding credence that $c$ assigns.

Additivity: For each $X$ in $\mathcal{F}$, there is a scoring rule $s_X : \{0, 1\} \times [0, 1] \to [0, \infty]$ such that, for all $c$ in $\mathcal{C}$ and $w_i$ in $\mathcal{W}$,
$$I(c, i) = \sum_{X \in \mathcal{F}} s_X(w_i(X), c(X))$$
We say that the scoring rules $s_X$ for $X$ in $\mathcal{F}$ generate $I$. Additivity says that the inaccuracy of a credence function is the sum of the inaccuracies of the individual credences it assigns.

Strict Propriety: For all $p$ in $\mathcal{P}$, $\sum_{i=1}^n p_i I(c, i)$ is minimized uniquely, as a function of $c$, at $c = p$.
That is, for all $p$ in $\mathcal{P}$ and $c \neq p$ in $\mathcal{C}$,
$$\sum_{i=1}^n p_i I(p, i) < \sum_{i=1}^n p_i I(c, i)$$
Strict Propriety says that each probabilistic credence function should expect itself to be most accurate.[3]

A few examples of inaccuracy measures:
• Brier score: $B(c, i) = \sum_{X \in \mathcal{F}} (w_i(X) - c(X))^2$
• Log score: $L(c, i) = -\log c_i$
• Enhanced log score: $L^\star(c, i) = \sum_{X \in \mathcal{F}} \big(-w_i(X) \log c(X) + c(X)\big)$
• Absolute value score: $A(c, i) = \sum_{X \in \mathcal{F}} |w_i(X) - c(X)|$
• Logsumexp score: $LSE(c, i) = -\log\Big(1 + \sum_{X \in \mathcal{F}} e^{c(X)}\Big) + \sum_{X \in \mathcal{F}} \dfrac{(c(X) - w_i(X))\, e^{c(X)}}{1 + \sum_{Y \in \mathcal{F}} e^{c(Y)}}$

Then:

Score   Continuity   Additivity   Strict Propriety
B       ✓            ✓            ✓
L       ✓            ×            ×
L*      ✓            ✓            ✓
A       ✓            ✓            ×
LSE     ✓            ×            ✓

[3] A quick remark: we sometimes assume that the inaccuracy measure $I$ satisfies Additivity, and then assume that the individual scoring rules $s_X$ that generate $I$ satisfy Continuity and Strict Propriety, rather than assuming that $I$ itself satisfies those conditions. The following fact shows that this makes no difference. We say that $s_X$ is continuous if $s_X(1, x)$ and $s_X(0, x)$ are continuous functions of $x$. We say that $s_X$ is strictly proper if, for any $0 \le p \le 1$, $p\,s_X(1, x) + (1 - p)\,s_X(0, x)$ is minimized uniquely, as a function of $x$, at $x = p$. Then:
Proposition 1 Suppose $I$ is additive with $I(c, i) = \sum_{X \in \mathcal{F}} s_X(w_i(X), c(X))$. Then (i) $I$ is continuous iff each $s_X$ is continuous; (ii) $I$ is strictly proper iff each $s_X$ is strictly proper.

Some notes:
• The Brier score is additive. It is generated by using the quadratic scoring rule $q$ for every proposition, where
– $q(1, x) = (1 - x)^2$;
– $q(0, x) = x^2$.
Since $q$ is continuous and strictly proper, so is $B$.
• The log score is neither additive nor strictly proper. A credence function that assigns credence 1 to each world dominates any credence function that assigns less than credence 1 to each world. The log score is, however, strictly $\mathcal{P}$-proper: that is, for all $p$ in $\mathcal{P}$, $\sum_i p_i L(q, i)$ is minimized uniquely, among credence functions $q$ in $\mathcal{P}$, at $q = p$.
• The enhanced log score is additive. It is generated using the enhanced logarithmic scoring rule $l^\star$
for every proposition, where
– $l^\star(1, x) = -\log x + x$;
– $l^\star(0, x) = x$.
Since $l^\star$ is continuous and strictly proper, so is $L^\star$.
• The absolute value score is additive. It is generated by using the absolute scoring rule $a$ for every proposition, where
– $a(1, x) = 1 - x$;
– $a(0, x) = x$.
But $a$ is not strictly proper. If $p < \frac{1}{2}$, then $p\,a(1, x) + (1 - p)\,a(0, x) = p(1 - x) + (1 - p)x$ is minimized at $x = 0$; if $p > \frac{1}{2}$, it is minimized at $x = 1$; if $p = \frac{1}{2}$, it is minimized at any $0 \le x \le 1$. So $A$ is not strictly proper either.
• The logsumexp score is strictly proper, but it is not additive.

My concern in this paper is to strengthen the accuracy-first arguments for Probabilism, the Principal Principle, and Conditionalization based on Continuity + Additivity + Strict Propriety by showing that they go through even if we don't assume Additivity. Thus, for instance, they go through for the logsumexp score as well as the Brier and enhanced log scores. Of course, weakening the premises of an argument always strengthens it. So there seems good reason to note this fact regardless of your view of Additivity: unless you're certain that Additivity is a necessary condition on legitimate measures of inaccuracy, noting that it is inessential will strengthen the argument for you. Nonetheless, in the next section, I explain why you might be suspicious of Additivity. Then, in Section 3, I give the arguments for Probabilism, the Principal Principle, and Conditionalization without appealing to it. In Section 4, I prove the theorems on which those arguments are based. In Section 5, I note briefly that we can also strengthen Pettigrew's argument for linear pooling in cases of judgment aggregation by removing the Additivity condition there as well (Pettigrew, 2019). In Section 6, I conclude.

2 Why Additivity?
I should begin by pointing out that, while Pettigrew appeals to Predd, et al.'s mathematical results in his presentation of the accuracy dominance argument for Probabilism (Pettigrew, 2016a, Part I), those authors weren't themselves working in the accuracy-first framework. What we are calling inaccuracy measures are for them loss functions; and in the context of loss functions, the Additivity assumption is perfectly natural: provided the loss is stated in units of some commodity in which your utility is linear, the total loss to which your credence function is subject is the sum of the individual losses to which your credences are subject. But what about the accuracy-first framework? Is Additivity still so natural there? Pettigrew (2016a, Section 4.1) claims that it is. He begins his argument thus:

[S]umming the inaccuracy of individual credences to give the total inaccuracy of a credence function is the natural thing to do (Pettigrew, 2016a, 49).

His reason? Your credence function is not a single, unified doxastic state, but rather merely the motley agglomeration of all of your individual credal doxastic states. We might mistakenly think of a credence function as unified because we represent it by a single mathematical object, but mathematical functions are in any case just collections of assignments of values to arguments.

Pettigrew goes on to contrast credence functions with melodies. Just as we might ask for the inaccuracy of a credence function at a world, so we might ask for the inaccuracy of a particular rendition of an original melody. If we understand the inaccuracy of a credence function at a world as how far it lies from the perfect credence function, we might understand the inaccuracy of a particular rendition of a melody as how far it lies from the perfect rendition. Now, in the case of melodies, we can quickly see that we don't want additive measures of accuracy. For instance, consider the three melodies below.
The first is the true opening of Bach's Art of Fugue. The second and third are less than perfect renditions of it. It seems clear to me that Rendition 2 is less accurate than Rendition 1.[4] However, an additive measure of melodic inaccuracy cannot capture this. After all, at almost every particular point in time, the note played in Rendition 2 is closer in pitch to the note in the Original than is the note played in Rendition 1. Our intuitive measure of melodic inaccuracy must therefore be non-additive. Such a measure might take into account the 'shape' of the melody, for instance, and assign greater weight to similarity of 'shape' than to moment-by-moment proximity of pitch. More precisely, it might take into account the change in pitch between each note and the next. By doing that, it might capture our sense that Rendition 1 is more accurate than Rendition 2. The 'shape' of a melody, particularly when analysed as the sequence of changes in pitch between each note and the next, is a global feature of it. It depends on the relationships between successive notes and cannot be captured by looking at each moment of the melody individually and aggregating the results.[5] So inaccuracy measures for melodies must be non-additive, because there are global features of melodies that are relevant to their inaccuracy.

[4] Measuring the proximity of one melody to another is no idle game. Claims of plagiarism turn on it. However, copyright law does not enshrine any particular way of computing it, favouring instead the common-sense criterion: does an ordinary person judge them similar?

However, the same might be said of credence functions. For instance, at a world at which two propositions $A$ and $B$ have the same truth value, we might think that it is more accurate to have equal credences in $A$ and in $B$ than to have different credences in them. After all, the ideal credence function will have the same credence in both, namely, 1 in both or 0 in both.
And perhaps resembling the ideal credence function in this respect gets you closer to it, and therefore makes you more accurate. But having the same credence in $A$ and in $B$ is a global feature of a credence function. To determine whether or not a credence function has it, you can't just look at the credences it assigns individually. Interestingly, however, while this is indeed a global feature of a credence function, some additive inaccuracy measures will in fact reward it. Take the Brier score, for instance. If $A$ and $B$ are both true, then the inaccuracy of assigning credences $a$ and $b$ to them respectively is $(1 - a)^2 + (1 - b)^2$. But it's easy to see that, if $a \neq b$, then assigning their average, $\frac{1}{2}a + \frac{1}{2}b$, to both is more accurate. Since $(1 - x)^2$ is a strictly convex function of $x$,
$$(1 - a)^2 + (1 - b)^2 > \left(1 - \left(\tfrac{1}{2}a + \tfrac{1}{2}b\right)\right)^2 + \left(1 - \left(\tfrac{1}{2}a + \tfrac{1}{2}b\right)\right)^2$$
Similarly, if $A$ and $B$ are both false,
$$a^2 + b^2 > \left(\tfrac{1}{2}a + \tfrac{1}{2}b\right)^2 + \left(\tfrac{1}{2}a + \tfrac{1}{2}b\right)^2$$
So, if we measure inaccuracy using the Brier score, which is additive, having the global property in question (assigning equal credences to $A$ and $B$ when $A$ and $B$ are either both true or both false) improves accuracy.

Pettigrew seems to argue that we should simply assume that, for any global feature of a credence function that we consider relevant to accuracy, it must be possible to capture it using additive measures along the lines just sketched:
Just as some hold that risk aversion phenomena in practical decision theory are best understood as the result of doing something other than maximizing expected utility (minimizing regret, for instance, or maximizing the quantity favoured by one of the many non-expected utility theories) and not as having a concave utility function, so any sensitivity to global features of credence functions ought to be understood either as following from their local features or as following from the adoption of an alternative decision principle and not as having a non-additive inaccuracy measure. (Pettigrew, 2016a, 51)

[5] See Jenann Ismael's essay on death, immortality, and the value of life for an argument that how well a human life has gone is also measured by appealing to global features of it, rather than by looking at the welfare obtained at each point in time and aggregating those (Ismael, 2006).

But why? Why think that this is true? I think Pettigrew imagines that it follows from the fact that credence functions are not unified entities. He assumes that this warrants a defeasible assumption in favour of Additivity. That assumption could be defeated if we were to find a global feature we wished to reward but which could not be rewarded by additive measures, as in the case of melodic inaccuracy. But no such feature has presented itself. Perhaps. But even if we grant Pettigrew his move from the disunified nature of credence functions to this defeasible assumption in favour of Additivity, such an assumption is a flimsy basis for an argument, particularly since we have not systematically investigated the sorts of global features we might consider relevant to accuracy, and so have rather sparse evidence that there is no defeater to be found. As we showed above, we can explain why one particular global feature is conducive to accuracy, namely, having the same credence in two propositions that have the same truth value.
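The Brier calculation given earlier in this section can be checked numerically. The following Python sketch (function names are illustrative, not from the paper) compares the inaccuracy of two unequal credences in propositions sharing a truth value with the inaccuracy of replacing both by their average. For contrast it also computes the absolute value score $A$ from Section 1: averaging strictly lowers Brier inaccuracy, but leaves absolute-value inaccuracy unchanged, so not every additive measure strictly rewards this global feature.

```python
"""Sketch: the Brier score rewards equal credences in two propositions
A and B that share a truth value; the absolute value score only ties."""


def brier_pair(a: float, b: float, truth: int) -> float:
    """Brier inaccuracy of credences a, b when A and B both have truth value `truth`."""
    return (truth - a) ** 2 + (truth - b) ** 2


def absolute_pair(a: float, b: float, truth: int) -> float:
    """Absolute-value inaccuracy of the same two credences, for comparison."""
    return abs(truth - a) + abs(truth - b)


def averaged(a: float, b: float) -> tuple[float, float]:
    """Replace both credences by their average (1/2)a + (1/2)b."""
    m = 0.5 * a + 0.5 * b
    return m, m


if __name__ == "__main__":
    a, b = 0.9, 0.3  # arbitrary unequal credences
    for truth in (1, 0):
        print(
            f"truth={truth}: "
            f"Brier {brier_pair(a, b, truth):.2f} -> {brier_pair(*averaged(a, b), truth):.2f}, "
            f"absolute {absolute_pair(a, b, truth):.2f} -> "
            f"{absolute_pair(*averaged(a, b), truth):.2f}"
        )
```

With $a = 0.9$ and $b = 0.3$, for instance, the Brier inaccuracy when both propositions are true falls from 0.50 to 0.32, while the absolute value inaccuracy stays at 0.80.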
And indeed you can view the accuracy dominance argument for Probabilism as furnishing us with another example. After all, Probabilism makes two demands of a credence function. The first is local: the credence function should assign maximal credence to a tautology and minimal credence to a contradiction. The second is global: the credence it assigns to a disjunction of mutually exclusive propositions should be the sum of the credences it assigns to the disjuncts. To tell whether a credence function satisfies this latter condition, you must look at the relationships between the credences it assigns. However, the fact that you can run the accuracy dominance argument for Probabilism using additive inaccuracy measures like the Brier score shows that this global feature of a credence function is conducive to accuracy, without building into your inaccuracy measure an explicit reward for it. Indeed, that is one of the remarkable features of de Finetti's original proof. But that's about it. Those are the only two global features of credence functions whose relevance to accuracy we've succeeded in capturing using additive inaccuracy measures. So, in the end, to the extent that doubt remains over whether we have considered all the global features of credence functions that are relevant to their accuracy, and shown how their relevance can be captured by additive inaccuracy measures, doubt remains over Additivity. And while doubt remains over Additivity, removing that assumption from the arguments for Probabilism, the Principal Principle, and Conditionalization strengthens them.

3 The arguments without additivity

Let me begin this section by spelling out the arguments we wish to give. Then I'll move on to explaining and proving the theorems to which they appeal. As I noted above, each argument has the same form:

(NC) Necessary conditions on being a legitimate inaccuracy measure.
(BP) Accuracy-rationality bridge principle.
(MT) Mathematical theorem.
Therefore, (PR) Principle of Rationality.

The first component will be the same in each argument that we give: we will assume only that inaccuracy measures satisfy Continuity and Strict Propriety. But the accuracy-rationality bridge principles will differ from case to case. In the remainder of this section, I'll state each principle for which we're arguing and then the bridge principle to which we'll appeal in that argument.

To state some of the principles of rationality, we need some moderately technical notions. I'll lay them out here. Suppose $\mathcal{X}$ is a set of credence functions. Then:

• $\mathcal{X}$ is convex if it is closed under taking mixtures. That is, for any $c, c'$ in $\mathcal{X}$ and any $0 \le \lambda \le 1$, $\lambda c + (1 - \lambda) c'$ is also in $\mathcal{X}$.
• $\mathcal{X}$ is closed if it is closed under taking limits. That is, for any sequence $c_1, c_2, \ldots$ of credence functions in $\mathcal{X}$ with limit $c$, $c$ is in $\mathcal{X}$.[6]
• The convex hull of $\mathcal{X}$ is the smallest convex set that contains $\mathcal{X}$. We write it $\mathcal{X}^+$. That is, $\mathcal{X}^+$ is convex, and if $\mathcal{Z}$ is convex and $\mathcal{X} \subseteq \mathcal{Z}$, then $\mathcal{X}^+ \subseteq \mathcal{Z}$.
• The closed convex hull of $\mathcal{X}$ is the smallest closed set that contains the convex hull of $\mathcal{X}$. We write it $cl(\mathcal{X}^+)$. Thus, if $\mathcal{Z}$ is closed and $\mathcal{X}^+ \subseteq \mathcal{Z}$, then $cl(\mathcal{X}^+) \subseteq \mathcal{Z}$.
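To make the framework of Section 1 and the notion of convexity concrete, here is a small Python sketch. It assumes, purely for illustration, that worlds are indexed $0, \ldots, n-1$ and that a proposition is represented as a frozenset of world indices; the helper names are hypothetical. It builds the full algebra over a finite set of worlds, extends a distribution over worlds to a credence function on that algebra, and checks that mixtures of probabilistic credence functions are again probabilistic, i.e., that $\mathcal{P}$ is convex.

```python
"""Sketch: credence functions on the full algebra over a finite set of worlds,
and a check that mixtures of probabilistic ones are probabilistic (P is convex)."""
from itertools import combinations


def full_algebra(n: int) -> list[frozenset]:
    """All subsets of the worlds {0, ..., n-1}; each subset is a proposition."""
    return [frozenset(s) for r in range(n + 1) for s in combinations(range(n), r)]


def from_world_probs(probs: list[float]) -> dict[frozenset, float]:
    """Extend a probability distribution over worlds to every proposition in F."""
    return {X: sum(probs[i] for i in X) for X in full_algebra(len(probs))}


def is_probabilistic(c: dict[frozenset, float], n: int, tol: float = 1e-9) -> bool:
    """Check (i) c(top) = 1 and c(bottom) = 0, and (ii) additivity on disjoint pairs."""
    algebra = full_algebra(n)
    if abs(c[frozenset(range(n))] - 1.0) > tol or abs(c[frozenset()]) > tol:
        return False
    return all(
        abs(c[A | B] - c[A] - c[B]) <= tol
        for A in algebra
        for B in algebra
        if not A & B  # A and B are mutually exclusive
    )


def mixture(c1: dict, c2: dict, lam: float) -> dict:
    """The mixture lam * c1 + (1 - lam) * c2, taken proposition by proposition."""
    return {X: lam * c1[X] + (1.0 - lam) * c2[X] for X in c1}


if __name__ == "__main__":
    p = from_world_probs([0.2, 0.5, 0.3])
    q = from_world_probs([0.6, 0.1, 0.3])
    for lam in (0.0, 0.25, 0.5, 1.0):
        print(lam, is_probabilistic(mixture(p, q, lam), 3))
```

The same `mixture` helper, applied repeatedly, generates every finite mixture of a set of credence functions, which is exactly what the convex hull $\mathcal{X}^+$ of a finite set $\mathcal{X}$ contains.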
[6] We say that $\lim_{n \to \infty} c_n = c$ if, for all $X$ in $\mathcal{F}$, $(\forall \varepsilon > 0)(\exists n)(\forall m > n)[\,|c_m(X) - c(X)| < \varepsilon\,]$.

As well as clarifying the technical notions just presented, I also want to flag that these arguments appeal to a notion of possibility at various points. In setting up the framework, we have already talked of possible worlds and algebras built on those worlds; in accuracy dominance arguments for Probabilism, we will quantify over possible worlds; in the accuracy dominance argument for Conditionalization, we quantify over possible worlds, but also over possible future credence functions that you currently endorse; in expected inaccuracy arguments like Greaves and Wallace's argument for Conditionalization, we will sum over credences in possible worlds to give expectations; and in chance dominance arguments for the Principal Principle, we'll quantify over possible objective chance functions. What is the notion of possibility at play here? Joyce (1998) takes it to be logical possibility; Pettigrew (2020) takes it to be something like epistemic possibility. It is beyond the scope of this essay to argue for one or the other in detail, so I will leave the notion as a placeholder. But my own preference is for epistemic possibility. What makes it irrational to violate Probabilism, for instance, is that there is another credence function that is more accurate than yours at all epistemically possible worlds; that is, you can tell from the inside, so to speak, that this alternative is better than yours, because the only worlds you need to consider are those you consider possible.

3.1 The argument for Probabilism

In the argument for Probabilism, we'll assume a credence function is irrational if there's another that is guaranteed to be more accurate. Here's Probabilism:

Probabilism: Your credence function at each time during your epistemic life should be probabilistic.
That is, if $c$ is your credence function at a given time, then we should have: (i) $c(\top) = 1$ and $c(\bot) = 0$; (ii) $c(A \vee B) = c(A) + c(B)$ for any mutually exclusive $A$ and $B$.

And here's the bridge principle:

Credal Dominance: If there is $c^\star$ such that, for all possible worlds $w_i$, $I(c^\star, i) < I(c, i)$, then $c$ is irrational.

Thus, to move from the claim that an inaccuracy measure must satisfy Continuity and Strict Propriety, together with Credal Dominance, to Probabilism, we need the following theorem:

Theorem 2 Suppose $I$ is continuous and strictly proper. If $c$ is not in $\mathcal{P}$, there is $c^\star$ in $\mathcal{P}$ such that $I(c^\star, i) < I(c, i)$ for all $w_i$ in $\mathcal{W}$.

We prove that in Section 4 below.

3.2 The argument for the Principal Principle

In the argument for the Principal Principle, we'll assume a credence function is irrational if there is another that is guaranteed to have greater objective expected accuracy, that is, greater expected accuracy from the point of view of the objective chance function. The Principal Principle is usually stated as follows, where, for any probability function $ch$, $C_{ch}$ is the proposition that says that $ch$ is the objective chance function:

Principal Principle: If $C_{ch}$ is in $\mathcal{F}$ and $c(C_{ch}) > 0$, then, for all $A$ in $\mathcal{F}$, we should have $c(A \mid C_{ch}) = ch(A)$.

The version we'll consider here is more general than this one. And indeed, the more general version entails the usual version.

Generalized Principal Principle: Suppose $\mathcal{A}$ is the set of possible objective chance functions. Then your credence function should be in the closed convex hull of $\mathcal{A}$.

Thus, for instance, if all the possible objective chance functions consider $A$ and $B$ equally likely, your credence function should consider them equally likely; if all the possible objective chance functions consider $A$ more likely than $B$, then you should consider $A$ at least as likely as $B$; if all the possible objective chance functions consider $A$ to be between 30% and 70% likely, then you should not assign it credence 0.8; and so on.
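Theorem 2 can be illustrated in the smallest non-trivial case. The Python sketch below takes two worlds, so that $\mathcal{F}$ consists of $\emptyset$, $\{w_1\}$, $\{w_2\}$ and $\mathcal{W}$, and uses the logsumexp score, which is continuous and strictly proper but not additive. It assumes the sign convention $LSE(c, i) = -\phi(c) - \nabla\phi(c) \cdot (w_i - c)$ with $\phi(c) = \log(1 + \sum_X e^{c(X)})$, the form under which the score is strictly proper. Given a non-probabilistic $c$, it finds the $c^\star$ in $\mathcal{P}$ that minimizes the divergence $\phi(p) - \phi(c) - \nabla\phi(c) \cdot (p - c)$ by ternary search, and checks that $c^\star$ is less inaccurate than $c$ at both worlds. This projection step is the familiar Bregman-projection construction, offered only as an illustration; it is not a transcription of the proof in Section 4, and all helper names are hypothetical.

```python
"""Sketch: accuracy dominance (Theorem 2) for the non-additive logsumexp score.

Two worlds; a credence function is a 4-tuple over F = (empty, {w1}, {w2}, W).
Assumes LSE(c, i) = -phi(c) - grad_phi(c) . (w_i - c),
with phi(c) = log(1 + sum_X exp(c(X))).
"""
import math

# Ideal credence functions at w1 and w2, as vectors over F.
IDEAL = [(0.0, 1.0, 0.0, 1.0), (0.0, 0.0, 1.0, 1.0)]


def phi(c):
    """Strictly convex generator: log(1 + sum_X exp(c(X)))."""
    return math.log(1.0 + sum(math.exp(x) for x in c))


def grad_phi(c):
    """Gradient of phi: exp(c(X)) / (1 + sum_Y exp(c(Y)))."""
    z = 1.0 + sum(math.exp(x) for x in c)
    return [math.exp(x) / z for x in c]


def lse(c, w):
    """Logsumexp inaccuracy of c at the world whose ideal credence function is w."""
    g = grad_phi(c)
    return -phi(c) - sum(gx * (wx - cx) for gx, wx, cx in zip(g, w, c))


def divergence(p, c):
    """Bregman divergence from p to c generated by phi; positive when p != c."""
    g = grad_phi(c)
    return phi(p) - phi(c) - sum(gx * (px - cx) for gx, px, cx in zip(g, p, c))


def prob(t):
    """The probabilistic credence function giving w1 probability t and w2 1 - t."""
    return (0.0, t, 1.0 - t, 1.0)


def project(c, iters=200):
    """Ternary search for the p in P minimizing divergence(p, c), convex in t."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if divergence(prob(m1), c) < divergence(prob(m2), c):
            hi = m2
        else:
            lo = m1
    return prob((lo + hi) / 2)


def expected(c, weights):
    """Expected LSE inaccuracy of c, weighting w1 and w2 by `weights`."""
    return sum(pi * lse(c, w) for pi, w in zip(weights, IDEAL))


if __name__ == "__main__":
    c = (0.0, 0.5, 0.7, 1.0)  # not probabilistic: c({w1}) + c({w2}) = 1.2
    c_star = project(c)
    for k, w in enumerate(IDEAL, start=1):
        print(f"w{k}: LSE(c) = {lse(c, w):.5f}, LSE(c*) = {lse(c_star, w):.5f}")
```

Running it prints the inaccuracy of $c$ and of $c^\star$ at each world; $c^\star$ lies in $\mathcal{P}$ by construction. The `expected` helper can also be used to spot-check Strict Propriety: for $p = (0, 0.3, 0.7, 1)$ and weights $(0.3, 0.7)$, any other credence function should receive strictly greater expected inaccuracy.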
Suppose $ch$ is a possible objective chance function and $C_{ch}$ is in $\mathcal{F}$. And suppose further that $ch$ is not a self-undermining chance function; that is, $ch$ is certain that it gives the chances: $ch(C_{ch}) = 1$. Then, if you satisfy the Generalized Principal Principle, $c(A \mid C_{ch}) = ch(A)$.[7] So that's the version of the Principal Principle that we'll consider. And here's the bridge principle:

Chance Dominance: If there is $c^\star$ such that, for all possible chance functions $ch$, $\sum_i ch_i I(c^\star, i) < \sum_i ch_i I(c, i)$, then $c$ is irrational.

[7] To see this, note that having this property is closed under taking mixtures and taking limits.

Thus, to move from the claim that an inaccuracy measure must satisfy Continuity and Strict Propriety, together with Chance Dominance, to the Generalized Principal Principle, we need the following theorem:

Theorem 3 Suppose $I$ is continuous and strictly proper. If $c$ is not in $cl(\mathcal{A}^+)$, there is $c^\star$ in $cl(\mathcal{A}^+)$ such that $\sum_i ch_i I(c^\star, i) < \sum_i ch_i I(c, i)$ for all $ch$ in $\mathcal{A}$.

Again, we prove this in Section 4.

3.3 The argument for Conditionalization and the Weak Reflection Principle

In the argument for Conditionalization, we'll assume that it is irrational to have a prior credence function and a set of possible future credence functions if there is some alternative prior and, for each possible future credence function, an alternative to that, such that the sum of the inaccuracy of the prior and the inaccuracy of a possible future credence function is always greater than the sum of the inaccuracy of the alternative prior and the inaccuracy of the corresponding alternative possible future credence function. The version of Conditionalization for which Greaves and Wallace argue, and for which Briggs and Pettigrew also argue, is this:

Plan Conditionalization: Suppose your prior is $c_0$ and you know that your evidence will come from a partition $E_1, \ldots, E_m$. And suppose you will learn a particular cell of this partition iff it is true.
And suppose you plan to adopt credence function $c_i$ in response to evidence $E_i$. Then, for all $E_i$ with $c_0(E_i) > 0$, and for all $X$ in $\mathcal{F}$, we should have
$$c_i(X) = c_0(X \mid E_i) = \frac{c_0(X E_i)}{c_0(E_i)}$$

As we mentioned above, we won't argue for Plan Conditionalization directly, but rather via a more general principle of rationality called the Weak Reflection Principle. Here it is:

Weak Reflection Principle. Suppose $\mathcal{R} = \{c_1, \ldots, c_m\}$ is the set of possible future credence functions you endorse. Then your current credence function should be in the convex hull of $\mathcal{R}$.

Here's the idea behind this principle. A lot might happen between now and tomorrow. You might see new sights, think new thoughts; you might forget things you know today, take mind-altering drugs that enhance or impair your thinking; and so on. So perhaps there is a set of credence functions you think you might have tomorrow. Some of those you'll endorse (perhaps those you'd adopt if you saw certain new things, or enhanced your cognition in various ways). And some of them you'll disavow (perhaps those that you'd adopt if you were to forget certain things, or were to impair your cognition). The Weak Reflection Principle asks you to separate the wheat from the chaff, and, once you've identified the ones you endorse, it tells you that your current credence function should lie within the convex hull of those future ones.

Now, suppose that you are in the situation that Plan Conditionalization covers. That is, (i) you know that you will receive evidence from the partition $E_1, \ldots, E_m$, (ii) you will learn $E_k$ iff $E_k$ is true, and (iii) you form a plan for how to respond to these different possible pieces of evidence: you will adopt $c_1$ if you learn $E_1$, $c_2$ if you learn $E_2$, and so on. Thus, the possible future credence functions that you endorse are $c_1, \ldots, c_m$. Then, by the Weak Reflection Principle, $c_0$ should be in the convex hull of $c_1, \ldots, c_m$.
That is, there are weights $\lambda_1, \ldots, \lambda_m \geq 0$ with $\sum_k \lambda_k = 1$ such that $c_0 = \sum_k \lambda_k c_k$. Now suppose each $c_k$ is a probability function with $c_k(E_k) = 1$, as it will be if you become certain of the evidence you learn. Since the cells of the partition are mutually exclusive, $c_j(E_k) = 0$ for $j \neq k$, so $c_0(E_k) = \lambda_k$ and $c_0(X E_k) = \lambda_k c_k(X E_k) = \lambda_k c_k(X)$. Hence, whenever $c_0(E_k) > 0$, we have $c_k(X) = c_0(X \mid E_k)$, as Plan Conditionalization requires. So that's the Weak Reflection Principle. And here's the bridge principle:

Diachronic Dominance. Suppose $c_0$ is your current credence function and $c_1, \ldots, c_m$ are the possible future credence functions you endorse. Then if there are $c^*_0, c^*_1, \ldots, c^*_m$ such that, for each $1 \leq k \leq m$ and all $w_i$ in $\mathcal{W}$,
$$I(c^*_0, i) + I(c^*_k, i) < I(c_0, i) + I(c_k, i)$$
then you are irrational.

Thus, to move from the claim that an inaccuracy measure must satisfy Continuity and Strict Propriety, together with Diachronic Dominance, to the Weak Reflection Principle, we need the following theorem:

Theorem 4. Suppose $I$ is continuous and strictly proper. If $c_0$ is not in $\{c_1, \ldots, c_m\}^+$, then there are $c^*_0, c^*_1, \ldots, c^*_m$ such that, for each $1 \leq k \leq m$ and all $w_i$ in $\mathcal{W}$,
$$I(c^*_0, i) + I(c^*_k, i) < I(c_0, i) + I(c_k, i)$$

In the next section, we give the proofs of Theorems 2, 3, and 4.

4 The proofs

4.1 Theorems 2 and 3

As will be obvious to anyone familiar with the proof strategy in Predd et al. (2009), many of the ideas used here are adapted from them. Predd et al. proceed by proving a connection between additive and continuous strictly proper inaccuracy measures, on the one hand, and a sort of divergence between credence functions, on the other. A divergence between credence functions defined on $\mathcal{F}$ is a function $D : \mathcal{C} \times \mathcal{C} \to [0, \infty]$ such that (i) if $c \neq c'$, then $D(c, c') > 0$, and (ii) if $c = c'$, then $D(c, c') = 0$. That is, the divergence from one credence function to a distinct one is always positive, while the divergence from a credence function to itself is always zero. Given a continuous scoring rule $s_X$, Predd et al.
define the following function for $0 \leq x \leq 1$:
$$\varphi_X(x) := -x\, s_X(1, x) - (1 - x)\, s_X(0, x)$$
And then, given an additive and continuous strictly proper inaccuracy measure $I$ that is generated by continuous strictly proper scoring rules $s_X$ for $X$ in $\mathcal{F}$, they define a divergence as follows:
$$D(c, c') = \sum_{X \in \mathcal{F}} \varphi_X(c(X)) - \sum_{X \in \mathcal{F}} \varphi_X(c'(X)) - \sum_{X \in \mathcal{F}} \varphi'_X(c'(X))\,\big(c(X) - c'(X)\big)$$
They show that this is a species of divergence known as a Bregman divergence. What's more, using a representation theorem due to Savage (1971), they show that, for any $w_i$ in $\mathcal{W}$ and $c$ in $\mathcal{C}$,
$$D(w_i, c) = I(c, i)$$
That is, the divergence from the ideal credence function at a world to a credence function is the inaccuracy of that credence function at that world. Having established this, Predd et al. can then appeal to various properties of Bregman divergences to establish their dominance result. One of the properties they use is this: if $p$ is in $\mathcal{P}$ and $c$ is in $\mathcal{C}$, then
$$D(p, c) = \sum_i p_i I(c, i) - \sum_i p_i I(p, i) \qquad (\dagger)$$
That is, the divergence from $p$ to $c$ is how much more inaccurate $p$ expects $c$ to be than $p$ itself.

In our proofs, we bypass the construction of the Bregman divergence, since it's less straightforward to do that when you do not assume Additivity. Instead, given a continuous strictly proper inaccuracy measure $I$, we just construct the corresponding divergence $D_I$ directly using the property given in equation ($\dagger$). The following lemma gives four useful initial properties of this divergence.

Lemma 5. Suppose $I$ is a strictly proper inaccuracy measure. Then define $D_I : \mathcal{P} \times \mathcal{C} \to [0, \infty]$ as follows:
$$D_I(p, c) := \sum_i p_i I(c, i) - \sum_i p_i I(p, i)$$
Then:
(i) $D_I$ is a divergence. That is, $D_I(p, c) \geq 0$ for all $p$ in $\mathcal{P}$ and $c$ in $\mathcal{C}$, with equality iff $p = c$.
(ii) $D_I(w_i, c) = I(c, i)$, for all $1 \leq i \leq n$.
(iii) $D_I$ is strictly convex in its first argument.
That is, for all distinct $p, q$ in $\mathcal{P}$, all $c$ in $\mathcal{C}$, and all $0 < \lambda < 1$,
$$D_I(\lambda p + (1 - \lambda) q, c) < \lambda D_I(p, c) + (1 - \lambda) D_I(q, c)$$
(iv) $D_I(p, c) \geq D_I(p, q) + D_I(q, c)$ iff $\sum_{i=1}^{n} (p_i - q_i)\big(I(c, i) - I(q, i)\big) \geq 0$.

Proof of Lemma 5. (i) Since $I$ is strictly proper, $D_I(p, c) = \sum_i p_i I(c, i) - \sum_i p_i I(p, i) \geq 0$, with equality iff $c = p$.

(ii) Since $w_i$, regarded as a credence function, assigns probability 1 to world $w_i$ and 0 to every other world,
$$D_I(w_i, c) = I(c, i) - I(w_i, i) = I(c, i)$$
since $I(w_i, i) = 0$.

(iii) Suppose $p \neq q$ are in $\mathcal{P}$, and suppose $0 < \lambda < 1$. Then let $r = \lambda p + (1 - \lambda) q$, so that $r \neq p$ and $r \neq q$. Then, since $\sum_i p_i I(c, i)$ is uniquely minimized, as a function of $c$, at $c = p$, and $\sum_i q_i I(c, i)$ is uniquely minimized, as a function of $c$, at $c = q$, we have
$$\sum_i p_i I(p, i) < \sum_i p_i I(r, i) \qquad \sum_i q_i I(q, i) < \sum_i q_i I(r, i)$$
Thus
$$\lambda\Big[-\sum_i p_i I(p, i)\Big] + (1 - \lambda)\Big[-\sum_i q_i I(q, i)\Big] > \lambda\Big[-\sum_i p_i I(r, i)\Big] + (1 - \lambda)\Big[-\sum_i q_i I(r, i)\Big] = -\sum_i r_i I(r, i)$$
Now, adding
$$\lambda \sum_i p_i I(c, i) + (1 - \lambda) \sum_i q_i I(c, i) = \sum_i \big(\lambda p_i + (1 - \lambda) q_i\big) I(c, i) = \sum_i r_i I(c, i)$$
to both sides gives
$$\lambda\Big[\sum_i p_i I(c, i) - \sum_i p_i I(p, i)\Big] + (1 - \lambda)\Big[\sum_i q_i I(c, i) - \sum_i q_i I(q, i)\Big] > \sum_i r_i I(c, i) - \sum_i r_i I(r, i)$$
That is, $\lambda D_I(p, c) + (1 - \lambda) D_I(q, c) > D_I(\lambda p + (1 - \lambda) q, c)$, as required.

(iv)
$$\begin{aligned} D_I(p, c) - D_I(p, q) - D_I(q, c) &= \Big[\sum_i p_i I(c, i) - \sum_i p_i I(p, i)\Big] - \Big[\sum_i p_i I(q, i) - \sum_i p_i I(p, i)\Big] - \Big[\sum_i q_i I(c, i) - \sum_i q_i I(q, i)\Big] \\ &= \sum_i (p_i - q_i)\big(I(c, i) - I(q, i)\big) \end{aligned}$$
as required. □

Lemma 6. Suppose $I$ is a continuous strictly proper inaccuracy measure. Suppose $\mathcal{X}$ is a closed convex subset of $\mathcal{P}$. And suppose $c$ is not in $\mathcal{X}$. Then there is $q$ in $\mathcal{X}$ such that
(i) $D_I(q, c) < D_I(p, c)$ for all $p \neq q$ in $\mathcal{X}$.
(ii) For all $p$ in $\mathcal{X}$, $\sum_{i=1}^{n} (p_i - q_i)\big(I(c, i) - I(q, i)\big) \geq 0$.
(iii) For all $p$ in $\mathcal{X}$, $D_I(p, c) \geq D_I(p, q) + D_I(q, c)$.

Proof of Lemma 6. Suppose $c$ is not in $\mathcal{X}$. Then, since $\mathcal{X}$ is a closed convex set, since $D_I(x, c)$ is continuous as a function of $x$ (because $I$ is continuous), and since Lemma 5(iii) shows that $D_I$ is strictly convex in its first argument, there is a unique $q$ in $\mathcal{X}$ that minimizes $D_I(x, c)$ as a function of $x$. So, as (i) requires, $D_I(q, c) < D_I(p, c)$ for all $p \neq q$ in $\mathcal{X}$.
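We now turn to proving (ii). If $p = q$, (ii) holds trivially, so suppose $p \neq q$. We begin by observing that, since $p, q$ are in $\mathcal{X}$, since $\mathcal{X}$ is convex, and since $D_I(x, c)$ is minimized uniquely on $\mathcal{X}$ at $x = q$, if $0 < \varepsilon < 1$, then
$$\frac{1}{\varepsilon}\Big[D_I(\varepsilon p + (1 - \varepsilon) q, c) - D_I(q, c)\Big] > 0$$
Expanding that, we get
$$\frac{1}{\varepsilon}\Big[\sum_i (\varepsilon p_i + (1 - \varepsilon) q_i) I(c, i) - \sum_i (\varepsilon p_i + (1 - \varepsilon) q_i) I(\varepsilon p + (1 - \varepsilon) q, i) - \sum_i q_i I(c, i) + \sum_i q_i I(q, i)\Big] > 0$$
So
$$\frac{1}{\varepsilon}\Big[\sum_i (q_i + \varepsilon(p_i - q_i)) I(c, i) - \sum_i (q_i + \varepsilon(p_i - q_i)) I(\varepsilon p + (1 - \varepsilon) q, i) - \sum_i q_i I(c, i) + \sum_i q_i I(q, i)\Big] > 0$$
So
$$\sum_i (p_i - q_i)\big(I(c, i) - I(\varepsilon p + (1 - \varepsilon) q, i)\big) + \frac{1}{\varepsilon}\Big[\sum_i q_i I(q, i) - \sum_i q_i I(\varepsilon p + (1 - \varepsilon) q, i)\Big] > 0$$
Now, since $I$ is strictly proper and $\varepsilon p + (1 - \varepsilon) q \neq q$,
$$\frac{1}{\varepsilon}\Big[\sum_i q_i I(q, i) - \sum_i q_i I(\varepsilon p + (1 - \varepsilon) q, i)\Big] < 0$$
So, for all $0 < \varepsilon < 1$,
$$\sum_i (p_i - q_i)\big(I(c, i) - I(\varepsilon p + (1 - \varepsilon) q, i)\big) > 0$$
So, since $I$ is continuous, letting $\varepsilon \to 0$ gives
$$\sum_i (p_i - q_i)\big(I(c, i) - I(q, i)\big) \geq 0$$
which is what we wanted to show. And finally, (iii) follows immediately from (ii) and Lemma 5(iv). □

Finally, this allows us to prove Theorems 2 and 3.

Proof of Theorem 2. Suppose $c$ is not in $\mathcal{P}$. Then, by Lemma 6, applied with $\mathcal{X} = \mathcal{P}$, there is $c^*$ in $\mathcal{P}$ such that, for all $p$ in $\mathcal{P}$,
$$D_I(p, c) \geq D_I(p, c^*) + D_I(c^*, c)$$
So, in particular, since each ideal credence function $v_{w_i}$ is in $\mathcal{P}$,
$$D_I(v_{w_i}, c) \geq D_I(v_{w_i}, c^*) + D_I(c^*, c)$$
But, since $c^*$ is in $\mathcal{P}$ and $c$ is not, $c^* \neq c$, and since Lemma 5(i) shows that $D_I$ is a divergence, $D_I(c^*, c) > 0$. So
$$D_I(v_{w_i}, c) > D_I(v_{w_i}, c^*)$$
So, by Lemma 5(ii), for all $w_i$ in $\mathcal{W}$,
$$I(c, i) = D_I(v_{w_i}, c) > D_I(v_{w_i}, c^*) = I(c^*, i)$$
as required. □

Proof of Theorem 3. Suppose $c$ is not in $\mathrm{cl}(\mathcal{A}^+)$. Then, by Lemma 6, applied with $\mathcal{X} = \mathrm{cl}(\mathcal{A}^+)$, there is $c^*$ in $\mathrm{cl}(\mathcal{A}^+)$ such that, for all $p$ in $\mathrm{cl}(\mathcal{A}^+)$,
$$D_I(p, c) \geq D_I(p, c^*) + D_I(c^*, c)$$
So, in particular, since each possible chance function $ch$ is in $\mathrm{cl}(\mathcal{A}^+)$,
$$D_I(ch, c) \geq D_I(ch, c^*) + D_I(c^*, c)$$
But, since $c^*$ is in $\mathrm{cl}(\mathcal{A}^+)$ and $c$ is not, $c^* \neq c$, and since Lemma 5(i) shows that $D_I$ is a divergence, $D_I(c^*, c) > 0$. So
$$D_I(ch, c) > D_I(ch, c^*)$$
Now,
• $D_I(ch, c) = \sum_i ch_i I(c, i) - \sum_i ch_i I(ch, i)$
• $D_I(ch, c^*) = \sum_i ch_i I(c^*, i) - \sum_i ch_i I(ch, i)$,
so
$$\sum_i ch_i I(c, i) > \sum_i ch_i I(c^*, i)$$
as required. □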
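To make Theorem 2 and the roles of Lemmas 5 and 6 concrete, here is a minimal numerical sketch in Python. It assumes a hypothetical three-world setting in which $\mathcal{F}$ consists of the three atoms, so that $\mathcal{P}$ is the probability simplex, and a hypothetical inaccuracy measure $I(c, i) = (v_{w_i} - c)^\top M (v_{w_i} - c)$ for a fixed symmetric positive-definite matrix $M$ with a non-zero off-diagonal entry. A quadratic of this form is continuous and strictly proper, but its cross term between $c(\{w_1\})$ and $c(\{w_2\})$ means it is not additive. The code builds $D_I$ exactly as in Lemma 5, then approximates the minimizer $c^*$ of $D_I(\cdot, c)$ over $\mathcal{P}$ by a brute-force grid search (a swapped-in stand-in for the exact minimization in Lemma 6), and compares inaccuracies world by world.

```python
# Hypothetical setting (not from the text): three worlds w1, w2, w3, and F is
# the set of the three atoms, so a credence function is a triple
# c = (c1, c2, c3) in [0, 1]^3 and P is the probability simplex.

# Hypothetical symmetric positive-definite matrix. Its off-diagonal entry
# couples c({w1}) and c({w2}), so the inaccuracy measure below is not additive.
M = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]


def ideal(i):
    """Ideal credence function at world i: 1 on that world, 0 elsewhere."""
    return [1.0 if j == i else 0.0 for j in range(3)]


def inaccuracy(c, i):
    """I(c, i) = (v_i - c)^T M (v_i - c): continuous and strictly proper."""
    d = [v - x for v, x in zip(ideal(i), c)]
    return sum(d[r] * M[r][s] * d[s] for r in range(3) for s in range(3))


def divergence(p, c):
    """D_I(p, c) = sum_i p_i I(c, i) - sum_i p_i I(p, i), as in Lemma 5."""
    return sum(p[i] * (inaccuracy(c, i) - inaccuracy(p, i)) for i in range(3))


def simplex_grid(steps):
    """Probability functions on three worlds with entries in multiples of 1/steps."""
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            yield (a / steps, b / steps, (steps - a - b) / steps)


# A non-probabilistic credence function: its credences in the atoms sum to 1.2.
c = (0.6, 0.6, 0.0)

# Grid-search stand-in for the minimizer of D_I(., c) over P from Lemma 6.
c_star = min(simplex_grid(200), key=lambda p: divergence(p, c))

print("c* =", c_star)
for i in range(3):
    print(f"world w{i + 1}: I(c*, i) = {inaccuracy(c_star, i):.4f}, "
          f"I(c, i) = {inaccuracy(c, i):.4f}")
```

By hand, the exact minimizer in this example is $(0.5, 0.5, 0)$, which happens to lie on the grid; its inaccuracy is 0.5 rather than 0.56 at $w_1$ and $w_2$, and 2.5 rather than 3.16 at $w_3$, so it accuracy-dominates $c$, as Theorem 2 predicts. The grid search is chosen for readability rather than efficiency; for other measures or coarser grids it may only approximate $c^*$.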
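4.2 Proof of Theorem 4

To prove Theorem 4, we need a divergence not between one credence function and another, but between a sequence of $m + 1$ credence functions and another sequence of $m + 1$ credence functions. We create that in the natural way. That is, given $p^0, p^1, \ldots, p^m$ in $\mathcal{P}$ and $c_0, c_1, \ldots, c_m$ in $\mathcal{C}$, the divergence from the former sequence to the latter is just the sum of the divergences from $p^0$ to $c_0$, $p^1$ to $c_1$, and so on. Thus:

Corollary 7. Suppose $I$ is a strictly proper inaccuracy measure. Then define $D_I : \mathcal{P}^{m+1} \times \mathcal{C}^{m+1} \to [0, \infty]$ as follows:
$$D_I\big((p^0, p^1, \ldots, p^m), (c_0, c_1, \ldots, c_m)\big) := \sum_{k=0}^{m} \Big( \sum_{i=1}^{n} p^k_i I(c_k, i) - \sum_{i=1}^{n} p^k_i I(p^k, i) \Big) = \sum_{k=0}^{m} D_I(p^k, c_k)$$
Then:
(i) $D_I$ is a divergence.
(ii) If $c_1, \ldots, c_m$ are in $\mathcal{P}$, then $D_I\big((w_i, c_1, \ldots, c_{k-1}, w_i, c_{k+1}, \ldots, c_m), (c_0, c_1, \ldots, c_m)\big) = I(c_0, i) + I(c_k, i)$, for all $w_i$ in $\mathcal{W}$ and all $1 \leq k \leq m$.
(iii) $D_I$ is strictly convex in its first argument.

Proof of Corollary 7. These follow immediately from Lemma 5. □

Corollary 8. Suppose $I$ is a continuous strictly proper inaccuracy measure. Suppose $\mathcal{X}$ is a closed convex subset of $\mathcal{P}^{m+1}$. And suppose $(c_0, c_1, \ldots, c_m)$ is not in $\mathcal{X}$. Then there is $(q^0, q^1, \ldots, q^m)$ in $\mathcal{X}$ such that
(i) $\sum_{k=0}^{m} D_I(q^k, c_k) < \sum_{k=0}^{m} D_I(p^k, c_k)$ for all $(p^0, p^1, \ldots, p^m) \neq (q^0, q^1, \ldots, q^m)$ in $\mathcal{X}$.
(ii) For all $(p^0, p^1, \ldots, p^m)$ in $\mathcal{X}$,
$$\sum_{k=0}^{m} \sum_{i=1}^{n} (p^k_i - q^k_i)\big(I(c_k, i) - I(q^k, i)\big) \geq 0$$
(iii) For all $(p^0, p^1, \ldots, p^m)$ in $\mathcal{X}$,
$$\sum_{k=0}^{m} D_I(p^k, c_k) \geq \sum_{k=0}^{m} D_I(p^k, q^k) + \sum_{k=0}^{m} D_I(q^k, c_k)$$

Proof of Corollary 8. The proof strategy is exactly as for Lemma 6. □

To prove Theorem 4, we now need just one more result:

Lemma 9. Given $c_0, c_1, \ldots, c_m$ in $\mathcal{P}$, let
$$\mathcal{X} = \{(w_i, c_1, \ldots, c_{k-1}, w_i, c_{k+1}, \ldots, c_m) : w_i \in \mathcal{W} \text{ and } 1 \leq k \leq m\}$$
Then:
(i) $\mathcal{X}^+ \subseteq \mathcal{P}^{m+1}$.
(ii) If $c_0$ is not in the convex hull of $c_1, \ldots, c_m$, then $(c_0, c_1, \ldots, c_m)$ is not in $\mathcal{X}^+$.

Proof of Lemma 9. (i) holds because each member of $\mathcal{X}$ is in $\mathcal{P}^{m+1}$, which is convex. We prove (ii) by proving the contrapositive. Suppose $(c_0, c_1, \ldots, c_m)$ is in $\mathcal{X}^+$. Then there are $0 \leq \lambda_{i,k} \leq 1$ such that $\sum_{i=1}^{n} \sum_{k=1}^{m} \lambda_{i,k} = 1$ and
$$(c_0, c_1, \ldots, c_m) = \sum_{i=1}^{n} \sum_{k=1}^{m} \lambda_{i,k} (w_i, c_1, \ldots, c_{k-1}, w_i, c_{k+1}, \ldots, c_m)$$
Thus,
$$c_0 = \sum_{i=1}^{n} \sum_{k=1}^{m} \lambda_{i,k} w_i$$
and, for each $1 \leq k \leq m$,
$$c_k = \sum_{i=1}^{n} \lambda_{i,k} w_i + \sum_{i=1}^{n} \sum_{l \neq k} \lambda_{i,l} c_k = \sum_{i=1}^{n} \lambda_{i,k} w_i + \Big(1 - \sum_{i=1}^{n} \lambda_{i,k}\Big) c_k$$
So
$$\Big( \sum_{i=1}^{n} \lambda_{i,k} \Big) c_k = \sum_{i=1}^{n} \lambda_{i,k} w_i$$
So let $\lambda_k = \sum_{i=1}^{n} \lambda_{i,k}$. Then, for $1 \leq k \leq m$,
$$\lambda_k c_k = \sum_{i=1}^{n} \lambda_{i,k} w_i$$
And thus
$$\sum_{k=1}^{m} \lambda_k c_k = \sum_{k=1}^{m} \sum_{i=1}^{n} \lambda_{i,k} w_i = c_0$$
Since each $\lambda_k \geq 0$ and $\sum_{k=1}^{m} \lambda_k = 1$, $c_0$ is in the convex hull of $c_1, \ldots, c_m$, as required. □

Now we can turn to the proof of Theorem 4.

Proof of Theorem 4. Suppose $c_0$ is not in the convex hull of $c_1, \ldots, c_m$. Then, by Lemma 9(ii), $(c_0, c_1, \ldots, c_m)$ is not in $\mathcal{X}^+$; and, by Lemma 9(i), $\mathcal{X}^+$ is a convex subset of $\mathcal{P}^{m+1}$, which is closed because it is the convex hull of finitely many points. So, by Corollary 8(iii), there is $(q^0, q^1, \ldots, q^m)$ in $\mathcal{X}^+$ such that, for all $(p^0, p^1, \ldots, p^m)$ in $\mathcal{X}^+$,
$$D_I\big((p^0, \ldots, p^m), (c_0, \ldots, c_m)\big) \geq D_I\big((p^0, \ldots, p^m), (q^0, \ldots, q^m)\big) + D_I\big((q^0, \ldots, q^m), (c_0, \ldots, c_m)\big)$$
Since $(q^0, \ldots, q^m)$ is in $\mathcal{X}^+$ and $(c_0, \ldots, c_m)$ is not, they are distinct, so, by Corollary 7(i), the last term is positive. Hence
$$D_I\big((p^0, \ldots, p^m), (q^0, \ldots, q^m)\big) < D_I\big((p^0, \ldots, p^m), (c_0, \ldots, c_m)\big)$$
In particular, for $w_i$ in $\mathcal{W}$ and $1 \leq k \leq m$,
$$D_I\big((w_i, c_1, \ldots, c_{k-1}, w_i, c_{k+1}, \ldots, c_m), (q^0, q^1, \ldots, q^m)\big) < D_I\big((w_i, c_1, \ldots, c_{k-1}, w_i, c_{k+1}, \ldots, c_m), (c_0, c_1, \ldots, c_m)\big)$$
But
$$\begin{aligned} I(q^0, i) + I(q^k, i) &= D_I(w_i, q^0) + D_I(w_i, q^k) \\ &\leq D_I\big((w_i, c_1, \ldots, c_{k-1}, w_i, c_{k+1}, \ldots, c_m), (q^0, q^1, \ldots, q^m)\big) \\ &< D_I\big((w_i, c_1, \ldots, c_{k-1}, w_i, c_{k+1}, \ldots, c_m), (c_0, c_1, \ldots, c_m)\big) \\ &= D_I(w_i, c_0) + D_I(w_i, c_k) \\ &= I(c_0, i) + I(c_k, i) \end{aligned}$$
So, setting $c^*_k := q^k$ for $0 \leq k \leq m$, we have the alternatives that Theorem 4 requires. □

5 The argument for linear pooling

So far, we have focussed on the central tenets of Bayesianism: Probabilism, Conditionalization, and the Principal Principle. But there is another principle, with an accuracy argument in its favour that relies on Continuity, Strict Propriety, and Additivity, and we might strengthen that argument using the results above to remove the need for Additivity. Suppose we have a group of individuals, each with credences concerning a range of propositions of interest; and suppose we wish to aggregate their credal judgments in these propositions to give the group's judgment. Perhaps the individuals are experts in the effects of ice sheet collapse on sea levels, the propositions concern those matters, and we wish to provide a summary of expert judgment for policymakers (Bamber et al., 2019).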
Perhaps the individuals are different probabilistic models of the development of a hurricane in the Atlantic Ocean, the propositions concern where it will make landfall, if it does, and what force it will have accumulated by that point, and we wish to aggregate their judgments to set insurance premiums (Roussos, 2020).

Sarah Moss (2011) used accuracy considerations to argue that we should aggregate judgments by the popular method known as linear pooling. That is, if the individuals in the group have probabilistic credence functions $p^1, \ldots, p^m$, the aggregate $p$ should be some weighted average of them: there should be weights $0 \leq \lambda_1, \ldots, \lambda_m \leq 1$ with $\sum_{k=1}^{m} \lambda_k = 1$ such that
$$p = \sum_{k=1}^{m} \lambda_k p^k$$
But her argument assumed something close to what she wished to prove. She assumed that the group's judgment of the expected inaccuracy of a possible aggregate $c$ should be a weighted average of its expected inaccuracies from the point of view of the members of the group. That is, there should be weights $0 \leq \lambda_1, \ldots, \lambda_m \leq 1$ with $\sum_{k=1}^{m} \lambda_k = 1$ such that the group expected inaccuracy of $c$ is
$$\sum_{k=1}^{m} \lambda_k \sum_i p^k_i I(c, i)$$
She then noted that, if $I$ is strictly proper, this group expected inaccuracy is minimized at the linear pool $c = \sum_{k=1}^{m} \lambda_k p^k$, as required, since it equals $\sum_i \big(\sum_{k=1}^{m} \lambda_k p^k_i\big) I(c, i)$, the expected inaccuracy of $c$ by the lights of the linear pool itself.[8]

Pettigrew then offered an argument for the same conclusion but without the almost question-begging assumption that group expected inaccuracy is the weighted sum of individual expected inaccuracy. He showed that, if $p$ is not a weighted average of $p^1, \ldots, p^m$, then there is $p^*$ such that each $p^k$ expects $p^*$ to be more accurate than it expects $p$ to be. That is, for all $1 \leq k \leq m$,
$$\sum_i p^k_i I(p^*, i) < \sum_i p^k_i I(p, i)$$
The argument has the same form as the argument for the Principal Principle that we considered above, but with the possible chance functions replaced by the credence functions of the individual members of the group. Thus, just as we strengthened the argument for the Principal Principle by removing the assumption of Additivity, leaving only Strict Propriety and Continuity, so we can strengthen the argument for linear pooling in the same way. This is helpful because, although Pettigrew's argument improves on Moss's by removing the question-begging assumption about the relationship between the group's and the individuals' expected inaccuracies, it also takes a step back by introducing the assumptions of Additivity and Continuity. Removing the more controversial of these helps the argument.

[8] It is worth noting that this observation can be generalized to show that the divergence $D_I$ we defined in the previous section has a property that is often noted of Bregman divergences: $\sum_{k=1}^{m} \lambda_k D_I(p^k, c)$ is minimized, as a function of $c$, at $\sum_{k=1}^{m} \lambda_k p^k$. That is, the credence function that minimizes the weighted average of divergences from a set of probabilistic credence functions is the corresponding weighted average of those credence functions (Banerjee et al., 2005).

6 Conclusion

Accuracy arguments for the core Bayesian tenets differ mainly in the conditions they place on the legitimate inaccuracy measures. The best existing arguments rely on Predd et al.'s conditions: Continuity, Additivity, and Strict Propriety. In this paper, I showed how to strengthen each argument based on these by showing that the central mathematical theorem on which it depends goes through without assuming Additivity.

References

Bamber, J., Oppenheimer, M., Kopp, R., Aspinall, W., & Cooke, R. (2019). Ice sheet contributions to future sea-level rise from structured expert judgment. Proceedings of the National Academy of Sciences of the U.S.A., 116(23), 11195–11200.
Banerjee, A., Guo, X., & Wang, H. (2005). On the Optimality of Conditional Expectation as a Bregman Predictor. IEEE Transactions on Information Theory, 51, 2664–2669.
Briggs, R. A., & Pettigrew, R. (2018). An accuracy-dominance argument for conditionalization. Noûs.
D'Agostino, M., & Dardanoni, V. (2009). What's so special about Euclidean distance? A characterization with applications to mobility and spatial voting. Social Choice and Welfare, 33(2), 211–233.
D'Agostino, M., & Sinigaglia, C. (2010). Epistemic Accuracy and Subjective Probability. In M. Suárez, M. Dorato, & M. Rédei (Eds.) EPSA Epistemology and Methodology of Science: Launch of the European Philosophy of Science Association (pp. 95–105). Springer Netherlands.
Greaves, H., & Wallace, D. (2006). Justifying Conditionalization: Conditionalization Maximizes Expected Epistemic Utility. Mind, 115(459), 607–632.
Ismael, J. (2006). The Ethical Importance of Death. In C. Tandy (Ed.) Death And Anti-Death, Volume 4: Twenty Years After De Beauvoir, Thirty Years After Heidegger (pp. 181–198). Palo Alto: Ria University Press.
Joyce, J. M. (1998). A Nonpragmatic Vindication of Probabilism. Philosophy of Science, 65(4), 575–603.
Joyce, J. M. (2009). Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief. In F. Huber, & C. Schmidt-Petri (Eds.) Degrees of Belief. Springer.
Kelley, M. (ms). On Accuracy and Coherence with Infinite Opinion Sets. Unpublished manuscript.
Leitgeb, H., & Pettigrew, R. (2010). An Objective Justification of Bayesianism I: Measuring Inaccuracy. Philosophy of Science, 77, 201–235.
Maher, P. (2002). Joyce's Argument for Probabilism. Philosophy of Science, 69(1), 73–81.
Moss, S. (2011). Scoring Rules and Epistemic Compromise. Mind, 120(480), 1053–1069.
Oddie, G. (2019). What Accuracy Could Not Be. British Journal for the Philosophy of Science, 70, 551–580.
Pettigrew, R. (2013). A New Epistemic Utility Argument for the Principal Principle. Episteme, 10(1), 19–35.
Pettigrew, R. (2016a). Accuracy and the Laws of Credence. Oxford: Oxford University Press.
Pettigrew, R. (2016b). Accuracy, Risk, and the Principle of Indifference. Philosophy and Phenomenological Research, 92(1), 35–59.
Pettigrew, R. (2019). On the Accuracy of Group Credences. In T. S. Gendler, & J. Hawthorne (Eds.) Oxford Studies in Epistemology, vol. 6. Oxford: Oxford University Press.
Pettigrew, R. (2020). Logical Ignorance and Logical Learning. Synthese.
Predd, J., Seiringer, R., Lieb, E. H., Osherson, D., Poor, V., & Kulkarni, S. (2009). Probabilistic Coherence and Proper Scoring Rules. IEEE Transactions on Information Theory, 55(10), 4786–4792.
Roussos, J. (2020). Policymaking under Scientific Uncertainty. Ph.D. thesis, London School of Economics.
Savage, L. J. (1971). Elicitation of Personal Probabilities and Expectations. Journal of the American Statistical Association, 66(336), 783–801.