ACCEPTANCE, AGGREGATION AND SCORING RULES

Jake Chandler∗

(Forthcoming in Erkenntnis)

∗Address for correspondence: Center for Logic and Analytic Philosophy, HIW, KU Leuven, Kardinaal Mercierplein 2, 3000 Leuven, Belgium. Email: jacob.chandler[at]hiw.kuleuven.be

Abstract
This article provides a novel perspective on the vexed issue of the relation between probability and rational acceptability, exploiting a recently noted structural parallel with the problem of judgment aggregation. After offering a number of general desiderata on the relation between finite probability models and sets of accepted sentences in a Boolean sentential language, it is noted that a number of these constraints will be satisfied if and only if acceptable sentences are true under all valuations in a distinguished non-empty set W. Drawing inspiration from distance-based aggregation procedures, various scoring-rule-based membership conditions for W are discussed and a possible point of contact with ranking theory is considered. The paper closes with various suggestions for further research.

Keywords: acceptance, distance-based aggregation, model coarsening/refinement, probability, ranking functions, scoring rules

1 Introduction
The formal modeling of doxastic states appears to operate at different levels of granularity. At the finer end of the scale, we encounter, for instance in the so-called Bayesian tradition, 'graded' representations in terms of sets of real-valued functions over some formal language. At the coarser end, we find, most notably in some areas of the belief-revision literature, 'all-or-nothing' representations in terms of sets of sentences or valuations. These differences in modeling choices seemingly map onto a corresponding heterogeneity in folk psychological practice. Belief reports do indeed come both in the form of attributions of degrees of confidence, aka 'credences' (e.g. 'I am pretty sure that he needs help.'), and in the form of unqualified attributions of 'full' belief (e.g. 'He believes in fairies.').

Whilst a unification of these various types of models, and of the phenomena they purport to represent, would be an extremely desirable achievement, it unfortunately seems fair to say that the history of attempts to provide a single, overarching framework (in particular, a reduction of the coarser level of description to the finer one) encourages a certain amount of pessimism.

In what follows, however, I hope to show that this pessimism is premature. The current track record of the unificationist camp is indeed poor; there is no disputing that. But, as Chandler (2010) has argued, the root cause of this mediocre performance is easy to pinpoint: the blame lies in the uncritical endorsement of a rather dubious would-be constraint on the relation between graded and all-or-nothing belief. This putative constraint, which I shall call Independence, turns out to be a close cousin of a homonymous troublemaker discussed in the judgment aggregation literature. Loss of Independence, one might say, paves the way to unification. But since existing proposals invariably satisfy the constraint, it is of interest to see what kinds of alternatives may be on offer. The aim of this article is to outline and briefly discuss a family of such alternatives, inspired by recent work on the problem of judgment aggregation.

The paper will proceed as follows. Section 2 presents the basic framework and notation employed throughout, discusses some desiderata on the mapping between degrees of confidence and full belief and presents some baseline results. Section 3 spells out the relationship between the task at hand and the problem of judgment aggregation, offering a brief overview of a family of distance-based aggregation methods that have enjoyed a certain degree of popularity in the recent computer science literature.
Section 4 then outlines a corresponding family of scoring-rule-based mappings from degrees of confidence onto sets of accepted sentences. To illustrate this general approach, a sample of noteworthy members of this family is presented and some of their respective properties are flagged. Section 5 notes that these distance-based mappings suggest yet a further potential unificatory strategy, this time pertaining to the vexed issue of the relation between probabilistic and ranking-theoretic models of belief. Section 6 briefly concludes with some suggestions for future research.

2 Formal preliminaries
For the purposes of the current paper, the credal state of mind of a rational agent will be assumed to be representable by a probability model M. Whilst more expressive types of representations, such as sets of probability models, are of course available, this simplification will help keep the discussion focused. M will be defined as a pair ⟨L, Pr⟩, where: (i) L is a sentential language constructed from a finite set P of m atomic sentences by means of the usual Boolean connectives ∧, ∨ and ¬, and (ii) Pr is a probability function with domain L, obeying the following standard axioms, where ⊢ denotes the relation of classical consequence and ⊤ and ⊥ respectively denote an arbitrary classical tautology and an arbitrary classical contradiction: for any φ, ψ ∈ L, (i) if φ ⊢ ⊥, then Pr(φ) = 0, (ii) if ⊤ ⊢ φ, then Pr(φ) = 1, (iii) if φ ⊢ ψ, then Pr(φ) ≤ Pr(ψ) and (iv) if φ ∧ ψ ⊢ ⊥, then Pr(φ ∨ ψ) = Pr(φ) + Pr(ψ).

V will denote the set of valuations {v_1, v_2, ..., v_{2^m}} of L, which are total functions from L to the set of classical truth values {0, 1}. Furthermore, ⟦φ⟧ will denote the set of valuations that validate φ (i.e. {v ∈ V : v(φ) = 1}) and, conversely, where W is some set of valuations, φ_W will be used to denote an arbitrary sentence such that W = {v ∈ V : v(φ_W) = 1}. By abuse of notation, we shall sometimes write v for {v} and vice versa.
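To make these definitions concrete, here is a minimal Python sketch of a two-atom probability model. It assumes, for illustration only, that a sentence is represented semantically by a predicate on valuations (so logically equivalent sentences are not distinguished) and that Pr is specified by its values on the valuations, i.e. on the maximally strong consistent sentences. The names `ATOMS`, `truth_set`, `prob` and `phi_W` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-atom language; a valuation is a tuple of truth values for the atoms.
ATOMS = ("p", "q")
V = list(product((True, False), repeat=len(ATOMS)))  # the 2**m valuations

def truth_set(sentence):
    """⟦φ⟧: the valuations at which the sentence (a predicate on valuations) is true."""
    return frozenset(v for v in V if sentence(v))

def prob(sentence, pr):
    """Pr(φ): the sum of the probabilities of the valuations in ⟦φ⟧."""
    return sum((pr[v] for v in truth_set(sentence)), Fraction(0))

def phi_W(W):
    """φ_W: a sentence (here, a predicate) true at exactly the valuations in W."""
    W = frozenset(W)
    return lambda v: v in W

# A sample distribution over valuations (non-negative values summing to 1).
pr = {(True, True): Fraction(1, 2), (True, False): Fraction(1, 4),
      (False, True): Fraction(1, 4), (False, False): Fraction(0)}

p = lambda v: v[0]
q = lambda v: v[1]

print(prob(p, pr))                                          # 3/4
print(truth_set(phi_W({(True, True)})) == {(True, True)})   # True
```

Axiom (iv) then holds by construction, since mutually inconsistent sentences have disjoint truth sets.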
Finally, a rational acceptance function Acc is a particular kind of function that maps pairs of probability models M = ⟨L, Pr⟩ and sentences φ ∈ L onto {0, 1}. The intended interpretation is that Acc(M, φ) = 1 iff it is rationally permissible to fully believe φ given that one's credences are representable by M.¹,² Acc(M, ·) is, in other terms, the 'characteristic function' of the belief set that is rationally permissible given M.

Clearly, in order to ensure that the set of accepted sentences is well behaved, one should require the following:

Zero-Normalisation: For any probability model M = ⟨L, Pr⟩ and sentence φ ∈ L, if φ ⊢ ⊥ then Acc(M, φ) = 0.

Unit-Normalisation: For any probability model M = ⟨L, Pr⟩ and sentence φ ∈ L, if ⊤ ⊢ φ then Acc(M, φ) = 1.

Deductive Closure: For any probability model M = ⟨L, Pr⟩, sentence φ ∈ L and set of sentences Γ ⊆ L, if Acc(M, ψ) = 1 for every ψ ∈ Γ and Γ ⊢ φ, then Acc(M, φ) = 1.

These correspond, respectively, to the claims that it is never permissible to accept a contradiction, always permissible to accept a tautology and always permissible to accept the logical consequences of what one is permitted to accept. It seems, however, that one should not require

Opinionation: For any probability model M = ⟨L, Pr⟩ and sentence φ ∈ L, either Acc(M, φ) = 1 or Acc(M, ¬φ) = 1,

which states that, for any sentence φ that a rational agent S can entertain, S always accepts either φ or its negation. Another important and fairly uncontroversial constraint is the following:

Non-Unanimity: For some probability model M = ⟨L, Pr⟩ and some sentence φ ∈ L, Pr(φ) < 1 and Acc(M, φ) = 1.

This reflects the commonly held intuition that one can accept sentences of whose truth one is not one hundred percent certain.
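Zero-Normalisation, Unit-Normalisation and Deductive Closure are strong enough to be checked exhaustively on a toy language. The following brute-force sketch (an illustration, not a procedure from the paper) assumes two atoms and identifies each sentence with its truth set, encoded as a 4-bit mask over the four valuations. It also assumes that, on this finite language, Deductive Closure amounts to closure under conjunction together with closure under single-premise consequence. It enumerates all 2^16 candidate belief sets and compares the well-behaved ones with those of the form {φ : W ⊆ ⟦φ⟧} for non-empty W, anticipating Theorem 2.2 below. The helper names are hypothetical.

```python
N_VALS = 4                     # valuations of a two-atom language
N_PROPS = 2 ** N_VALS          # 16 propositions, each a bitmask over the valuations
TOP, BOTTOM = N_PROPS - 1, 0   # ⟦⊤⟧ = V and ⟦⊥⟧ = ∅

def members(belief_set):
    """Decode a belief set (a bitmask over the 16 propositions) into its propositions."""
    return [x for x in range(N_PROPS) if (belief_set >> x) & 1]

def well_behaved(belief_set):
    """Zero-Normalisation, Unit-Normalisation and Deductive Closure on a finite language."""
    if (belief_set >> BOTTOM) & 1 or not (belief_set >> TOP) & 1:
        return False
    props = members(belief_set)
    accepted = set(props)
    # Closure under conjunction: ⟦φ ∧ ψ⟧ = ⟦φ⟧ ∩ ⟦ψ⟧.
    if any((x & y) not in accepted for x in props for y in props):
        return False
    # Closure under consequence: every proposition whose truth set includes ⟦φ⟧.
    return all(y in accepted for x in props for y in range(N_PROPS) if x & y == x)

def generated_by(w):
    """The belief set {φ : W ⊆ ⟦φ⟧} determined by a set W of valuations (a bitmask)."""
    return sum(1 << x for x in range(N_PROPS) if x & w == w)

well_behaved_sets = {b for b in range(2 ** N_PROPS) if well_behaved(b)}
from_W = {generated_by(w) for w in range(1, N_PROPS)}   # every non-empty W
print(len(well_behaved_sets), well_behaved_sets == from_W)   # 15 True
```

The count of 15 is simply the number of non-empty subsets of the four valuations.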
Perhaps more controversial, but in my view prima facie plausible, is

Structurality: For any model M = ⟨L, Pr⟩, any automorphism π of M and any sentence φ ∈ L, Acc(M, φ) = Acc(M, π(φ)),

where an automorphism π of a probability model M = ⟨L, Pr⟩ is a one-to-one function from L to L such that, for any φ, ψ ∈ L, (i) π(φ ∧ ψ) = π(φ) ∧ π(ψ), (ii) π(¬φ) = ¬π(φ), (iii) π(φ ∨ ψ) = π(φ) ∨ π(ψ) and (iv) Pr(φ) = Pr(π(φ)). This principle can be understood as stating that the acceptability of a sentence with respect to a model M supervenes on the logical and probabilistic properties of M.

As Douven and Williamson (2006) point out, Structurality, Deductive Closure and Zero-Normalisation jointly entail

Lottery-Proofness: For any finite probability model M = ⟨L, Pr⟩ such that Pr(φ_v) = Pr(φ_{v∗}) for any v, v∗ ∈ V, and any sentence ψ ∈ L, if Pr(ψ) < 1, then Acc(M, ψ) = 0,³

which states that no contingent sentence is rationally acceptable with respect to a finite uniform probability model.⁴ Douven and Williamson appeared to reject this principle. In fact, however, as argued by Chandler (2010, pp. 8–9), this property is intuitively correct, if one grants one of the working assumptions of this paper, namely that acceptability is determined by a credal state modeled as a single probability model. The grounds for Douven and Williamson's objection may have been rooted in a tacit commitment to the following principle:

Monotonicity: For any pair of probability models M = ⟨L, Pr⟩ and M∗ = ⟨L∗, Pr∗⟩ and sentences φ ∈ L and ψ ∈ L∗, if Pr(φ) ≤ Pr∗(ψ) then Acc(M, φ) ≤ Acc(M∗, ψ).

Indeed, it is easy to show the following:

Theorem 2.1. Lottery-Proofness, Monotonicity and Non-Unanimity are jointly inconsistent.

In response to this, it was argued that no clear justification for Monotonicity had been given and that, in view of the prima facie plausibility of Structurality, Deductive Closure and Zero-Normalisation, as well as the independent plausibility of Lottery-Proofness, Monotonicity ought to be rejected.
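The proof of Theorem 2.1 (see the Appendix) needs, for a given sub-unit probability p, a uniform model containing a sentence whose probability lies strictly between p and 1. Since a language with k atoms has 2^k equiprobable valuations in a uniform model, the probability in question must be of the form m/2^k. A minimal sketch of the search, with a hypothetical function name:

```python
from fractions import Fraction
from math import floor

def lottery_witness(p):
    """Return (k, m) such that, in a uniform model over the 2**k valuations of a
    k-atom language, a sentence true at exactly m valuations has p < m / 2**k < 1."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("p must lie in [0, 1)")
    k = 1
    while True:
        n = 2 ** k                 # number of valuations of a k-atom language
        m = floor(p * n) + 1       # smallest m with m / n > p
        if m < n:                  # ensures m / n < 1
            return k, m
        k += 1

k, m = lottery_witness(Fraction(9, 10))
print(k, m, Fraction(m, 2 ** k))   # 4 15 15/16
```

Lottery-Proofness then forbids accepting the witness sentence in the uniform model, whereas Monotonicity would require accepting it, since its probability exceeds p.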
The same line can be taken with respect to the somewhat weaker

Independence: For any pair of probability models M = ⟨L, Pr⟩ and M∗ = ⟨L∗, Pr∗⟩ and sentence φ ∈ L ∩ L∗, if Pr(φ) = Pr∗(φ) then Acc(M, φ) = Acc(M∗, φ).

By a similar chain of reasoning to the one above, Independence can be shown to be incompatible with the conjunction of Lottery-Proofness and a seemingly unobjectionable strengthening of Non-Unanimity to the claim that there exists a rationally acceptable sentence whose probability is rational-valued and sub-unit (call this Non-Unanimity+). But it would seem entirely arbitrary to insist that no sentence with a rational-valued sub-unit probability is rationally acceptable, while maintaining that some sentence with an irrational-valued sub-unit probability is. So denying the stronger version of Non-Unanimity presumably leads to denying the weaker one, which, I submit, is unacceptable. So Independence must surely be given up too.

Finally, to the aforementioned desiderata, it is clear that one should add

Responsiveness: For some pair of probability models M = ⟨L, Pr⟩ and M∗ = ⟨L, Pr∗⟩ and some sentence φ ∈ L, Acc(M, φ) ≠ Acc(M∗, φ).

This states that the underlying probability distribution makes a difference as to whether or not a sentence is rationally acceptable.

Let us briefly take stock. We have argued that the following are to be categorically endorsed: Unit-Normalisation, Zero-Normalisation, Deductive Closure, Non-Unanimity+, Lottery-Proofness and Responsiveness. We have more tentatively endorsed Structurality and have categorically rejected Opinionation. In the light of Theorem 2.1, Independence, and hence Monotonicity, which entails it, has also been categorically rejected, due to the endorsement of Non-Unanimity+ and Lottery-Proofness. Can we find a function that simultaneously satisfies all of these constraints? It turns out that the range of possible avenues is limited by the following elementary result:

Theorem 2.2.
Unit-Normalisation, Zero-Normalisation and Deductive Closure hold iff, for every probability model M, there exists a non-empty set W ⊆ V of valuations such that, for any φ ∈ L, Acc(M, φ) = 1 iff W ⊆ ⟦φ⟧.

So we need a procedure to select this non-empty set W of valuations, and one that will yield an acceptance function that fares well on the remaining desiderata. How should this be done?

3 Taking a cue from judgment aggregation
In a recent article, Douven and Romeijn (2007) hint at the existence of a striking parallel between the issue of individual-level rational acceptability of sentences relative to a probability model and the issue of group-level rational acceptability of sentences relative to the opinions of a set of agents, aka the problem of rational judgment aggregation.

In formal terms, the problem of rational judgment aggregation involves the characterisation of a two-place aggregation function Agg, this time mapping pairs of opinion models M and sentences in L onto {0, 1}, such that Agg(M, φ) = 1 iff φ is rationally acceptable, at the group level, with respect to M. An opinion model is a pair ⟨L, O⟩, where O, known as an 'opinion profile', is an n-tuple ⟨ψ_1, ..., ψ_n⟩ of classically consistent sentences in L. These sentences are known as 'opinions', and represent the respective full beliefs of the members of a group of n rational agents.⁵

In cases in which the range of Pr is a subset of the rational numbers ℚ, we could think of the problem of rational acceptability as a special case of the problem of judgment aggregation in which (i) the opinions aggregated are maximally strong consistent sentences in L and (ii) the aggregation function Agg is subject to the following constraints:

Anonymity: For any opinion models M = ⟨L, ⟨ψ_1, ..., ψ_n⟩⟩ and M∗ = ⟨L, ⟨ψ_{π(1)}, ..., ψ_{π(n)}⟩⟩, with ψ_i ∈ L (1 ≤ i ≤ n), where π is a permutation of {1, ..., n}, and any φ ∈ L, Agg(M, φ) = Agg(M∗, φ).

Duplication: For any opinion models M = ⟨L, ⟨ψ_1, ..., ψ_n⟩⟩ and M∗ = ⟨L, ⟨ψ_1, ..., ψ_n, ψ_1, ..., ψ_n⟩⟩, with ψ_i ∈ L (1 ≤ i ≤ n), and any φ ∈ L, Agg(M, φ) = Agg(M∗, φ).

Indeed, we could view an assignment of rational-valued probabilities to maximally strong consistent sentences as a function returning the relative frequencies of the corresponding opinions in the opinion profile of a maximally opinionated group.

The parallel deepens when we turn to the kinds of constraints that are typically imposed on aggregation functions. We find, for instance, endorsements of precise aggregation-theoretic analogues of Unit-Normalisation, Zero-Normalisation, Deductive Closure, Non-Unanimity+ and Responsiveness. Furthermore, although the analogue of Opinionation is an admittedly widespread constraint, it has also occasionally been waived. Finally, concerns with respect to the obvious analogue of Independence, which is crucially involved in a number of aggregation-theoretic impossibility results,⁶ have also been raised.

In view of all this, it will come as little surprise that recent developments in the aggregation literature suggest an attractive answer to the question posed at the end of the previous section. Konieczny, Lang and Marquis (2004) provide an overview of a large family of aggregation functions based on the notion of distance between valuations and opinion profiles.⁷ The idea behind this class of proposals is strikingly simple: first (i) provide a measure of the distance between the various valuations of L and the tuple of individual-level opinions, then (ii) select, as adopted at the group level, all and only those sentences that are validated by all the valuations that are close enough. As we shall see, Konieczny et al. take 'close enough' to mean 'closest'. It is, however, worth noting that alternative options could be explored. One could, for instance, select all and only those valuations that are situated within some suitably chosen distance t. I shall return to this kind of variant in the next section.⁸
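The correspondence between rational-valued probabilities and opinion profiles can be illustrated with a short sketch. It assumes that valuations stand in for maximally strong consistent sentences and that probabilities are given as exact fractions: scaling by the least common denominator turns a distribution into a profile whose relative frequencies are the original probabilities, and doubling the profile, as in Duplication, leaves those frequencies unchanged. The helper names and the string labels for valuations are hypothetical, and valuations of probability 0 simply do not appear in the profile.

```python
from collections import Counter
from fractions import Fraction
from math import lcm

def profile_from_probabilities(pr):
    """Opinion profile in which each valuation occurs in proportion to its probability."""
    denom = lcm(*(Fraction(x).denominator for x in pr.values()))
    profile = []
    for v, x in pr.items():
        profile.extend([v] * int(Fraction(x) * denom))
    return profile

def relative_frequencies(profile):
    """Recover a probability assignment from an opinion profile."""
    counts = Counter(profile)
    return {v: Fraction(c, len(profile)) for v, c in counts.items()}

# Hypothetical labels for three maximally strong consistent sentences.
pr = {"pq": Fraction(1, 2), "p~q": Fraction(1, 4), "~pq": Fraction(1, 4)}
profile = profile_from_probabilities(pr)
print(profile)                                          # ['pq', 'pq', 'p~q', '~pq']
print(relative_frequencies(profile + profile) == pr)    # True
```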
With some relevant simplifications, the calculation of the distance between a valuation v and an opinion profile O proceeds as follows.

Step 1: For every v∗ ∈ V, provide a measure of the distance between v and v∗.
Step 2: For every ψ in O, construct a measure of the distance between v and ψ, by aggregating the distances between v and the valuations that validate ψ.
Step 3: Construct a measure of the distance between v and O, by aggregating, in turn, the distances between v and each ψ in O.

So we start off with a 'distance' function d mapping pairs of valuations onto ℝ⁺. The requirements imposed by Konieczny and his colleagues are very minimal:

Definition 3.1. d is a distance between valuations iff it is a total function from V × V to ℝ such that, for every v, v∗ ∈ V, (i) d(v, v∗) ≥ 0 (Positivity), (ii) d(v, v∗) = d(v∗, v) (Symmetry), and (iii) d(v, v∗) = 0 iff v = v∗ (Minimality).⁹

Of course, one could just take d as primitive and simply add a further argument to the aggregation function. Primitivism regarding distance between valuations is not unheard of in philosophical circles (e.g. Hilpinen 1976), but it is somewhat mysterious and unparsimonious nevertheless. Indeed, Konieczny et al. do not even discuss the option and focus instead on two particular alternatives, both of which are, incidentally, bounded above by 1, which does not seem to be an undesirable feature. One of these alternatives is what they call the 'drastic distance' (d_D):

Definition 3.2. d_D(v, v∗) := 0 if v = v∗, and 1 otherwise.

The other is the normalised weighted Hamming distance (d_Hq):

Definition 3.3. d_Hq(v, v∗) := ∑_{p∈S} q(p), where S := {p ∈ P : v(p) ≠ v∗(p)} and q is a total function from P to ℝ⁺ such that ∑_{p∈P} q(p) = 1.¹⁰

If we take q to be a constant function, treating all mismatches symmetrically, we obtain the familiar normalised Hamming distance (d_H):

Definition 3.4. d_H(v, v∗) := |S| / |P|, where S is defined as in Definition 3.3.
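Definitions 3.2 to 3.4 translate directly into code. In this sketch, valuations are assumed to be tuples of 0/1 truth values for the atoms in a fixed order, and the weight function q is passed as a tuple indexed by atom position; these representation choices are illustrative only.

```python
from itertools import product

def drastic(v, w):
    """Drastic distance d_D (Definition 3.2)."""
    return 0 if v == w else 1

def weighted_hamming(v, w, q):
    """Normalised weighted Hamming distance d_Hq (Definition 3.3): total weight q[i]
    of the atoms on which v and w disagree; q is assumed to sum to 1."""
    return sum(q[i] for i, (a, b) in enumerate(zip(v, w)) if a != b)

def hamming(v, w):
    """Normalised Hamming distance d_H (Definition 3.4): |S| / |P|."""
    return sum(a != b for a, b in zip(v, w)) / len(v)

V3 = list(product((1, 0), repeat=3))   # valuations of a three-atom language
print(hamming((1, 1, 1), (0, 1, 1)), hamming((1, 1, 1), (0, 0, 0)))   # 0.3333333333333333 1.0
```

Both d_D and d_H satisfy Positivity, Symmetry and Minimality, and d_H coincides with d_Hq for a uniform q.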
Once a measure of the distance between valuations has been settled on, we obtain distances between valuations and opinion profiles in Steps 2 and 3 by means of two successive distance aggregation procedures. Konieczny et al. offer three constraints on such procedures:

Definition 3.5. g is a distance aggregation function iff it is a function from (ℝ⁺)^n to ℝ⁺ such that (i) g is non-decreasing in every argument, (ii) g(r_1, r_2, ..., r_n) = 0 iff r_1 = r_2 = ... = r_n = 0 and (iii) g(r) = r (for a single argument).

To these, one would presumably want to add the following internality requirement: (iv) min{r_1, r_2, ..., r_n} ≤ g(r_1, r_2, ..., r_n) ≤ max{r_1, r_2, ..., r_n}.

Step 2 establishes the distance d′ between valuations and sentences by aggregating the distances between these valuations and the valuations that validate the sentences. Although there is a fair amount that could be said here, given (iv), which is satisfied by Konieczny et al.'s own proposal for d′, the specifics of this step are of no great relevance to the present paper. Indeed, in the cases that we are interested in, the ψ_i are maximally strong consistent sentences and the ⟦ψ_i⟧ are singletons, so that, by (iv), d′(v, ψ_i) is simply the distance between v and the sole member of ⟦ψ_i⟧, whatever the particular d′ we settle on.

Of more central concern to the present case is Step 3, in which we aggregate the distances d′(v, ψ_i) between v and the ψ_i in O to obtain a distance D(v, O) between v and O. There is a large number of available options here, such as the max and min functions, as well as various weighted and non-weighted means.

As mentioned earlier, Konieczny et al. simply make use of distances between valuations and opinion profiles to establish a total preorder on V, taking as acceptable, at the group level, all and only those sentences that are validated by the valuations in the minimal set. This leaves us with:

For any opinion model M = ⟨L, O⟩ and φ ∈ L, Agg(M, φ) = 1 iff v(φ) = 1 for every v ∈ V such that, for any v∗ ∈ V, D(v, O) ≤ D(v∗, O).
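The three steps and the minimisation rule can be sketched as follows, for the case of maximally strong opinions, each represented by its unique valuation. As an assumption of this sketch, Step 2 takes the minimum over the models of a sentence (one choice satisfying internality (iv); for singleton truth sets any such choice agrees), and Step 3 defaults to the arithmetic mean. The function names are hypothetical. The usage lines apply the procedure to the profiles of Example 3.3 below.

```python
from itertools import product
from statistics import mean

def drastic(v, w):
    """Drastic distance d_D."""
    return 0 if v == w else 1

def sentence_distance(v, models, d):
    """Step 2 (d′): here the minimum over the models of the sentence (an assumed choice)."""
    return min(d(v, w) for w in models)

def profile_distance(v, profile, d, g=mean):
    """Step 3: aggregate, with g, the distances from v to each (maximally strong) opinion."""
    return g([sentence_distance(v, [opinion], d) for opinion in profile])

def agg(sentence, valuations, profile, d, g=mean):
    """Agg(M, φ) = 1 iff φ holds at every valuation minimising D(v, O)."""
    scores = {v: profile_distance(v, profile, d, g) for v in valuations}
    best = min(scores.values())
    closest = [v for v, s in scores.items() if s == best]
    return int(all(sentence(v) for v in closest))

V = list(product((1, 0), repeat=2))          # valuations, as truth values of (φ, ψ)
O1 = [(1, 1), (1, 0), (0, 1), (0, 0)]
O2 = [(1, 1), (1, 1), (0, 1), (0, 0)]
disj = lambda v: v[0] or v[1]                # φ ∨ ψ
print(agg(disj, V, O1, drastic), agg(disj, V, O2, drastic))   # 0 1
```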
It is worth taking a brief look at some of the general properties of the resulting proposal, in the case of interest in which the opinions are maximally strong. By the aggregation-theoretic analogue of Theorem 2.2, we already know that Agg will satisfy the analogues of Unit-Normalisation, Zero-Normalisation and Deductive Closure. Responsiveness straightforwardly holds by virtue of Definitions 3.1 and 3.5. Anonymity, which we are keen to preserve in the present context, obviously corresponds to symmetry of D, i.e. its invariance under argument permutation. This fails for weighted means (with a non-uniform weighting function). In the presence of Anonymity, the analogue of Structurality would be secured by the symmetry of the distance matrices of d_D and d_H along their main diagonal. The analogue of Independence, however, needn't hold:

Example 3.3. Let P = {φ, ψ}, O_1 = ⟨φ∧ψ, φ∧¬ψ, ¬φ∧ψ, ¬φ∧¬ψ⟩ and O_2 = ⟨φ∧ψ, φ∧ψ, ¬φ∧ψ, ¬φ∧¬ψ⟩, with M_i = ⟨L, O_i⟩. Let d = d_D and D be the arithmetic mean function. It is easily verified that all valuations are equidistant from O_1 but that the sole member of ⟦φ∧ψ⟧ is uniquely closest to O_2. Hence, although φ ∨ ψ is endorsed by exactly the first three opinions in both profiles, Agg(M_1, φ ∨ ψ) = 0 but Agg(M_2, φ ∨ ψ) = 1. The same example demonstrates that Opinionation can also fail, since Agg(M_1, φ∧ψ) = Agg(M_1, ¬(φ∧ψ)) = 0.

Finally, the analogue of Non-Unanimity fails for some choices of d and D, for instance if we set d = d_H and D = max.

4 Acceptance and scoring rules
We can adapt and generalise the kind of strategy outlined in the previous section in the following manner:

(a) Provide a measure D of the distance between the v ∈ V and Pr, which, we shall assume, satisfies constraints of both positivity and minimality (see Definition 3.1).
(b) Use this to select a set W of 'close enough' valuations, such that, for any v, v∗ ∈ V, if v ∈ W and D(v∗, Pr) ≤ D(v, Pr), then v∗ ∈ W.
(c) Define Acc as follows: for any φ ∈ L, (i) if W = ∅, then Acc(M, φ) = 1 iff ⊤ ⊢ φ, (ii) otherwise Acc(M, φ) = 1 iff W ⊆ ⟦φ⟧.¹¹

Regarding membership conditions for W, we have two salient options. The first of these, the analogue of which was endorsed by Konieczny et al., we shall call 'Min', for 'minimising'. The second, whose analogue has, to the best of my knowledge, yet to be discussed in the aggregation-theoretic literature, we shall call 'Sat', for 'satisficing'.

Min: v ∈ W iff v ∈ V and, for all v∗ ∈ V, D(v∗, Pr) ≥ D(v, Pr).
Sat: v ∈ W iff v ∈ V and D(v, Pr) ≤ t, for some appropriate t ∈ ℝ⁺.

Regarding D, the literature is already replete with various suggestions under the name of 'scoring rules'. It turns out, however, that most proposals in use satisfy

≥Pr-Reversal: D(v∗, Pr) ≥ D(v, Pr) iff Pr(φ_v) ≥ Pr(φ_{v∗}).

These include the family of exponential scores, of which the well-known Brier score is a member, as well as the logarithmic score (D_log), which we will return to in the next section:

Definition 4.1. D_log(v, Pr) := −log(Pr(φ_v)).

Given ≥Pr-Reversal, whatever the particular choice of D, the upshot of Min would be that it is permissible to accept all and only those sentences that are true in all the most probable worlds. Call the resulting function 'Acc_1'. This does have the arguable drawback of allowing for acceptance of sentences in whose truth one has an arbitrarily small degree of confidence. Opting for Sat avoids this consequence, since it permits acceptance of all and only those sentences that are true in all those worlds whose probability meets a certain threshold. Call the resulting function 'Acc_2'.

It is easily checked that both Acc_1 and Acc_2 satisfy both Structurality and Consensus Preservation (defined below), and violate Opinionation. Furthermore, Non-Unanimity holds for Acc_1 and does so for Acc_2 as well, unless we set t to 0.¹²

Interestingly enough, some relevant functions do violate ≥Pr-Reversal.
One particularly interesting example, which isn't found in the scoring-rule literature but whose analogue appears to be taken quite seriously in the aggregation literature, is the following:

D_H(v, Pr) = ∑_{v∗∈V} Pr(φ_{v∗}) d_H(v, v∗).

There are a number of potential issues worth noting here, however. To begin with, as has been famously noted in the verisimilitude debate, the Hamming distance is not invariant under translation (see Miller's (1974) well-known criticism of Tichý (1974, 1976)).

Example 4.1. Let L_1 be built up from the set of atomic sentences P_1 = {φ, ψ, χ} and L_2 from P_2 = {φ, β, γ}. Let the binary relation R pair up sentences in L_1 with their synonymous counterparts in L_2, and let ⟨φ ↔ ψ, β⟩, ⟨φ ↔ χ, γ⟩ ∈ R. We then have, in R, ⟨φ∧ψ∧χ, φ∧β∧γ⟩, ⟨¬φ∧ψ∧χ, ¬φ∧¬β∧¬γ⟩ and ⟨¬φ∧¬ψ∧¬χ, ¬φ∧β∧γ⟩. Now d_H(⟦φ∧ψ∧χ⟧, ⟦¬φ∧ψ∧χ⟧) = 1/3 < d_H(⟦φ∧ψ∧χ⟧, ⟦¬φ∧¬ψ∧¬χ⟧) = 1. However, the ordering is reversed when we substitute the L_2 counterparts: d_H(⟦φ∧β∧γ⟧, ⟦¬φ∧¬β∧¬γ⟧) = 1 > d_H(⟦φ∧β∧γ⟧, ⟦¬φ∧β∧γ⟧) = 1/3.

But this, so the worry might go, is bad news, since it is easy to show that it will result, given either Min (call the resulting function 'Acc_3') or Sat (call the resulting function 'Acc_4'), in an undesirable translation-sensitivity of acceptability.¹³,¹⁴

Setting this issue aside, both Acc_3 and Acc_4 violate the following property, which could be argued to be intuitively compelling:

Consensus Preservation: For any finite probability model M = ⟨L, Pr⟩ and sentence φ ∈ L, if Pr(φ) = 1 then Acc(M, φ) = 1.¹⁵

To illustrate the result for Acc_3:

Example 4.2. Let P = {φ, ψ} and Pr(¬φ∧ψ) = Pr(φ∧¬ψ) = 1/2. Let D = D_H. It is easily verified that all valuations are equidistant from Pr (each lies at distance 1/2) and hence, although Pr(¬(φ ↔ ψ)) = 1, Acc_3(M, ¬(φ ↔ ψ)) = 0.

For these reasons, and pending further discussion of the normative status of the constraints flouted, it seems fair to treat both Acc_3 and Acc_4 with a certain degree of caution.
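The four acceptance functions of this section differ only in the score D and in the rule (Min or Sat) used to select W. The sketch below implements clause (c) with either D_log or D_H. It assumes, for illustration, that Pr is given as a dictionary from valuations (0/1 tuples) to probabilities and that sentences are predicates on valuations; all function names are hypothetical. The usage lines reproduce the drawback of Acc_1 noted above, Example 4.2 and the translation effect of Example 4.1.

```python
import math

def d_log(v, pr):
    """Logarithmic score D_log(v, Pr) = -log Pr(φ_v), infinite when Pr(φ_v) = 0."""
    return math.inf if pr[v] == 0 else -math.log(pr[v])

def hamming(v, w):
    """Normalised Hamming distance d_H."""
    return sum(a != b for a, b in zip(v, w)) / len(v)

def d_hamming(v, pr):
    """D_H(v, Pr) = Σ_{v*} Pr(φ_{v*}) d_H(v, v*)."""
    return sum(p * hamming(v, w) for w, p in pr.items())

def select_W(pr, D, rule, t=None):
    """Min: the valuations with the smallest score; Sat: those scoring at most t."""
    scores = {v: D(v, pr) for v in pr}
    if rule == "min":
        best = min(scores.values())
        return {v for v, s in scores.items() if s == best}
    if rule == "sat":
        return {v for v, s in scores.items() if s <= t}
    raise ValueError(f"unknown rule: {rule}")

def acc(sentence, pr, D, rule, t=None):
    """Clause (c): truth throughout W, or truth at every valuation when W is empty."""
    W = select_W(pr, D, rule, t)
    return int(all(sentence(v) for v in (W or pr)))

conj = lambda v: v[0] and v[1]                                 # φ ∧ ψ
pr = {(1, 1): 0.5, (1, 0): 0.3, (0, 1): 0.2, (0, 0): 0.0}
print(acc(conj, pr, d_log, "min"))                        # Acc_1: 1, although Pr(φ ∧ ψ) = 0.5
print(acc(conj, pr, d_log, "sat", t=-math.log(0.25)))     # Acc_2: 0

# Example 4.2: under D_H every valuation scores 0.5, so Min selects all of V.
pr42 = {(1, 1): 0.0, (1, 0): 0.5, (0, 1): 0.5, (0, 0): 0.0}
xor = lambda v: v[0] != v[1]                                   # ¬(φ ↔ ψ), probability 1
print(acc(xor, pr42, d_hamming, "min"))                   # Acc_3: 0
print(acc(xor, pr42, d_hamming, "sat", t=0.4), acc(xor, pr42, d_hamming, "sat", t=0.6))   # 0 0

# Example 4.1: redescribe (φ, ψ, χ) as (φ, β, γ), with β = φ↔ψ and γ = φ↔χ.
def translate(v):
    phi, psi, chi = v
    return (phi, int(phi == psi), int(phi == chi))

top, w1, w2 = (1, 1, 1), (0, 1, 1), (0, 0, 0)   # φ∧ψ∧χ, ¬φ∧ψ∧χ, ¬φ∧¬ψ∧¬χ
print(hamming(top, w1) < hamming(top, w2))                                        # True
print(hamming(translate(top), translate(w1)) < hamming(translate(top), translate(w2)))  # False
```

With t = 0.4 the set W is empty, so only tautologies are accepted; with t = 0.6 it is all of V. Either way Acc_4 also fails Consensus Preservation in Example 4.2.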
To wrap up this section, it is worth briefly considering an issue that has been somewhat overlooked in the literature on probability and acceptability, namely the behaviour of acceptability under probability model refinement or coarsening. Indeed, it has been put to me that the following properties may be desirable:

Preservation under Refinement: Where M+ = ⟨L+, Pr+⟩ is a refinement of M = ⟨L, Pr⟩, for any φ ∈ L, if Acc(M, φ) = 1, then Acc(M+, φ) = 1.

Preservation under Coarsening: Where M+ = ⟨L+, Pr+⟩ is a refinement of M = ⟨L, Pr⟩, for any φ ∈ L, if Acc(M+, φ) = 1, then Acc(M, φ) = 1.

Here a refinement is defined as follows:

Definition 4.2. M+ = ⟨L+, Pr+⟩ is a refinement of M = ⟨L, Pr⟩, and M a coarsening of M+, iff L ⊂ L+ and, for any φ ∈ L, Pr(φ) = Pr+(φ).

It is worth noting that Smith (forthcoming) appears to have recently, albeit cautiously, endorsed the second of the above principles, which plays a crucial role in the impossibility result that is central to his paper.

Now Independence obviously guarantees the satisfaction of both constraints. However, both are violated, and hence so too is Independence, by both Acc_1 and Acc_2.¹⁶ The following simple example illustrates the point for Acc_1:

Example 4.3. Let P = {φ} and P+ = {φ, ψ}. Furthermore, let Pr(φ) = 0.6, Pr+(φ∧ψ) = 0.3, Pr+(φ∧¬ψ) = 0.3, Pr+(¬φ∧ψ) = 0.4 and Pr+(¬φ∧¬ψ) = 0. It is easily verified that we have Acc_1(M, φ) = 1 and Acc_1(M, ¬φ) = 0, but Acc_1(M+, φ) = 0 and Acc_1(M+, ¬φ) = 1.

But do the principles constitute intuitive desiderata in the first place? This is far from clear. If constraints on the preservation of acceptability under model coarsening or refinement are indeed in order, these should at the very least be restricted to changes in granularity that are not accompanied by the loss or acquisition of further information. The above conditions are simply too strong.
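Example 4.3 can be checked mechanically. This sketch assumes valuations are 0/1 tuples (one entry for M, two for M+) and uses exact fractions; `acc1` is a hypothetical helper implementing Acc_1 as truth at every most probable valuation.

```python
from fractions import Fraction as F

def acc1(sentence, pr):
    """Acc_1: accept iff the sentence holds at every most probable valuation."""
    top = max(pr.values())
    return int(all(sentence(v) for v, p in pr.items() if p == top))

# Coarse model M (atom φ) and its refinement M+ (atoms φ, ψ), as in Example 4.3.
pr = {(1,): F(6, 10), (0,): F(4, 10)}
pr_plus = {(1, 1): F(3, 10), (1, 0): F(3, 10), (0, 1): F(4, 10), (0, 0): F(0)}

phi = lambda v: v[0] == 1
not_phi = lambda v: v[0] == 0

# M+ is a refinement of M: it assigns φ the same probability.
assert sum(p for v, p in pr_plus.items() if phi(v)) == pr[(1,)]

print(acc1(phi, pr), acc1(not_phi, pr), acc1(phi, pr_plus), acc1(not_phi, pr_plus))   # 1 0 0 1
```

Splitting the probability of φ across two fine-grained valuations is enough to make ¬φ ∧ ψ the uniquely most probable world in M+.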
A more promising, weaker constraint would be invariance of acceptability in the presence of a relation of conservative refinement between the two models:

Definition 4.3. M+ = ⟨L+, Pr+⟩ is a conservative refinement of M = ⟨L, Pr⟩ iff (i) M+ is a refinement of M and (ii) for any maximally strong consistent φ ∈ L and any maximally strong consistent φ+_1, φ+_2 ∈ L+ such that φ+_i ⊬ ¬φ (1 ≤ i ≤ 2), Pr+(φ+_1) = Pr+(φ+_2).¹⁷

It is easy to show that the resulting weakened principles are both satisfied by Acc_1. This follows pretty immediately from the fact that, for any maximally strong consistent sentences φ_1, φ_2 ∈ L, the sets of L+-valuations ⟦φ_1⟧ and ⟦φ_2⟧ will have the same cardinality, so that, in a conservative refinement, the probability of each L-valuation is split evenly among the same number of L+-valuations. Matters, however, are somewhat different for Acc_2 when t > 0. The weakening of Preservation under Refinement clearly fails. The weakening of Preservation under Coarsening, however, holds.

5 Probability and ranking theory
For a number of years, Wolfgang Spohn has been floating a fairly influential model of rational degrees of belief that appears, on the face of it, to be a genuine alternative to the probabilistic view. This well-worked-out model, known as 'ranking theory', ships with a number of attractive features, including various credence update procedures, as well as a story regarding the relation between graded and full belief. Ranking functions are defined as follows:

Definition 5.1. A ranking function κ is a function from L to ℝ⁺ ∪ {∞} such that, for any φ, ψ ∈ L, (i) if φ ⊢ ⊥, then κ(φ) = ∞, (ii) if ⊤ ⊢ φ, then κ(φ) = 0, (iii) if φ ⊢ ψ, then κ(ψ) ≤ κ(φ) and (iv) κ(φ ∨ ψ) = min{κ(φ), κ(ψ)}.

A rank of ∞ plays a role that is somewhat analogous to that of a credence of 0 in probabilistic frameworks. In particular, a rank of ∞ is invariant under strict ranking-theoretic conditionalisation, just as a credence of 0 is invariant under strict probabilistic conditionalisation, as the following definition of conditional ranks makes clear:

Definition 5.2.
Where κ is a ranking function with domain L, φ, ψ ∈ L and κ(φ) < ∞, the quantity κ(ψ | φ) = κ(ψ ∧ φ) − κ(φ) is the conditional rank of ψ given φ.¹⁸

Adapting the notation to harmonise with the present paper, the account of acceptability on offer is the following, where Acc this time denotes a function from pairs of ranking models M = ⟨L, κ⟩ and sentences φ ∈ L:

Min Rank: For any ranking model M = ⟨L, κ⟩ and all φ ∈ L, Acc(M, φ) = 1 iff κ(φ) < κ(¬φ).

In a recent article, Spohn (2009) ponders the 'surprisingly complex and fascinating' relation between the ranking-theoretic and probabilistic pictures. He notes some superficial connections and suggests a potential explanation:

. . . translate the sum of probabilities into the minimum of ranks, and the quotient of probabilities into the difference of ranks. Thereby, the probabilistic law of additivity turns into the law of disjunction, the probabilistic law of multiplication into the law of conjunction (for negative ranks), and the definition of probabilities into the definition of conditional ranks. . . [T]ake any probabilistic theorem, apply the above translation to it, and you are almost guaranteed to get a ranking theorem. . . The translation of products and quotients of probabilities suggests that negative ranks simply are the logarithm of probabilities. [ibid., p. 209]¹⁹

He does then, however, note that the suggested picture may not be so clear, mentioning discrepancies regarding conditional independence and 'positive and non-negative instantial relevance', as well as the translation of sums of probabilities:

However, the translation is not fool-proof. . . The issue is not completely cleared up. . . [The view of ranks as logarithms of probabilities] does not seem to fit with the translation of sums of probabilities. But it does fit when the logarithmic base is taken to be some infinitesimal i. . .
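The ranking-theoretic notions introduced above (Definitions 5.1 and 5.2 and Min Rank) can be sketched as follows. The sketch assumes a finite language in which κ is fixed by the ranks of the valuations (at least one of which must be 0) and extended to other sentences by clause (iv), so that κ(φ) is the minimum rank of the valuations in ⟦φ⟧, and ∞ when ⟦φ⟧ is empty. The helper names and sample ranks are hypothetical.

```python
import math

def kappa(sentence, ranks):
    """κ(φ): minimum rank over the valuations at which φ is true; ∞ for contradictions."""
    return min((r for v, r in ranks.items() if sentence(v)), default=math.inf)

def conditional_rank(psi, phi, ranks):
    """Definition 5.2: κ(ψ | φ) = κ(ψ ∧ φ) − κ(φ), defined when κ(φ) < ∞."""
    k_phi = kappa(phi, ranks)
    if k_phi == math.inf:
        raise ValueError("κ(φ) must be finite")
    return kappa(lambda v: psi(v) and phi(v), ranks) - k_phi

def min_rank_accepts(sentence, ranks):
    """Min Rank: accept φ iff κ(φ) < κ(¬φ)."""
    return int(kappa(sentence, ranks) < kappa(lambda v: not sentence(v), ranks))

ranks = {(1, 1): 0, (1, 0): 1, (0, 1): 2, (0, 0): math.inf}   # valuations of (φ, ψ)
phi = lambda v: v[0] == 1
psi = lambda v: v[1] == 1
print(min_rank_accepts(phi, ranks), min_rank_accepts(psi, ranks))   # 1 1
print(conditional_rank(psi, lambda v: not phi(v), ranks))           # 0
```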
But the discussion in the previous section suggests a somewhat more straightforward picture of the relationship between ranks and probabilities. Indeed, it is worth noting the following alternative formulation of Min Rank:

For any ranking model M = ⟨L, κ⟩ and all φ ∈ L, Acc(M, φ) = 1 iff, for any v ∈ W, v(φ) = 1, where W = {v ∈ V : κ(φ_v) = 0}.

It is easily verified that, in virtue of the constraints on ranking functions, W is guaranteed to be non-empty. But now we can see what is a really quite remarkable similarity with the kind of proposal made in the previous section. In both cases, we start with some non-negative (possibly infinite-valued) function f on the elements of V (previous section) or on the set of maximally strong consistent sentences in L (current section). This enables us to select a lower set W of valuations, such that if v ∈ W and f(v∗) ≤ f(v), or f(φ_{v∗}) ≤ f(φ_v), then v∗ ∈ W. Acceptability is then identified with truth in all elements of W.

And the similarity is all the more striking if we recall the logarithmic distance D_log, briefly mentioned earlier on. Indeed, the range of this function is ℝ⁺ ∪ {∞}, just as is the case with ranking functions, with D_log(v, Pr) = −log(Pr(φ_v)) = ∞ iff Pr(φ_v) = 0. This suggests an interpretation of ranks of maximally strong consistent sentences as renormalised negations of logarithms of their probabilities. Given Min Rank, the details of the renormalisation then have to depend on whether we opt for Min or for Sat, to ensure consistency. Here are two simple correspondences that would do the trick:

For any v ∈ V, κ(φ_v) = −log(Pr(φ_v)) − min_{v∗∈V}(−log(Pr(φ_{v∗}))) (for Min)
For any v ∈ V, κ(φ_v) = max{t, −log(Pr(φ_v))} − t (for Sat, with t chosen so that W is non-empty, since otherwise no valuation receives rank 0)

We then straightforwardly recover the ranks for the remainder of the sentences in the language using Definition 5.1, obtaining a many-to-one mapping from probability functions onto ranking functions.
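The two correspondences can be sketched directly, taking the sign convention under which every rank is non-negative, as Definition 5.1 requires, so that the valuations of rank 0 are exactly those selected by Min or by Sat. The probability dictionary, the threshold and the function names are illustrative assumptions.

```python
import math

def neg_log(p):
    """−log Pr(φ_v), infinite when Pr(φ_v) = 0."""
    return math.inf if p == 0 else -math.log(p)

def ranks_min(pr):
    """Min correspondence: κ(φ_v) = −log Pr(φ_v) − min_{v*} (−log Pr(φ_{v*}))."""
    base = min(neg_log(p) for p in pr.values())
    return {v: neg_log(p) - base for v, p in pr.items()}

def ranks_sat(pr, t):
    """Sat correspondence: κ(φ_v) = max{t, −log Pr(φ_v)} − t."""
    if all(neg_log(p) > t for p in pr.values()):
        raise ValueError("t leaves W empty: no valuation would receive rank 0")
    return {v: max(t, neg_log(p)) - t for v, p in pr.items()}

def zero_set(ranks):
    """W = {v : κ(φ_v) = 0}."""
    return {v for v, r in ranks.items() if r == 0}

pr = {(1, 1): 0.5, (1, 0): 0.3, (0, 1): 0.2, (0, 0): 0.0}   # valuations of (φ, ψ)
t = -math.log(0.25)
print(zero_set(ranks_min(pr)))      # {(1, 1)}: the most probable valuation
print(zero_set(ranks_sat(pr, t)))   # the valuations with probability at least 0.25
```

Valuations of probability 0 receive rank ∞ under both correspondences, matching the analogy between a rank of ∞ and a credence of 0.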
Interestingly, this would somewhat vindicate Spohn's intuition about a logarithmic connection between ranks and probabilities. The vindication, however, would only be partial: the connection would not be quite the one anticipated.

6 Concluding comments

Due to space limitations and to the amount of ground to be covered, we have had to keep the model rather simple. The most obvious shortcoming is perhaps the chronic lack of expressiveness of the language. A move from a sentential to a predicate or modal language would yield a significant gain in realism. Another, admittedly somewhat minor, irritant is the current limitation to finitely generated languages. The infinitary case certainly does raise some further issues. For instance, proposals based on distance-minimising, rather than satisficing, need to address the issue of what to say with respect to the acceptability of a contingent sentence in the case in which there is an infinite sequence of ever-closer valuations. On the current proposal, we would wind up with mandatory suspension of judgment, since W would be empty. There may, however, be a case for claiming that this kind of case highlights what one might call a 'deontic blindspot': a point at which rationality fails to yield any recommendation whatsoever. But there are also technical complications prior to that stage, when it comes to computing the distances themselves. It is far from clear, in particular, how to suitably generalise the concept of normalised Hamming distance without running the risk of having W = V.

Acknowledgments

I am grateful to the members of the Formal Epistemology Project, KU Leuven, and to the audience of PROGIC 2009, Groningen, for useful feedback on earlier versions of this paper. I am also indebted to two anonymous referees for this journal for the time and trouble that they took to provide exceptionally detailed and insightful reports.
Part of the research for this article was funded by a Research Foundation – Flanders (FWO) postdoctoral research grant.

Appendix

Proof of Theorem 2.1. Assume Non-Unanimity. It follows that there exist a model M = 〈L, Pr〉 and a sentence φ ∈ L such that Pr(φ) = p < 1 but Acc(M, φ) = 1. Since, as is well known, there exists a rational number between any two distinct real numbers, there exists a rational number q = m/n (with m, n ∈ N) such that p < q < 1. Let M∗ = 〈L∗, Pr∗〉 be a uniform probability model such that the cardinality of the set of valuations of L∗ is equal to n, and let ψ = φ{w1,...,wm} denote a sentence validated by exactly m of these valuations, w1, ..., wm, so that Pr∗(ψ) = m/n = q. Now by Lottery-Proofness, Acc(M∗, ψ) = 0, since Pr∗(ψ) = q < 1. By Monotonicity, however, Acc(M∗, ψ) ≥ Acc(M, φ) = 1. Contradiction. □

Proof of Theorem 2.2. Let S := {φW : W ⊆ V} and let φW ⪰ φW∗ iff W∗ ⊆ W. Let ℒ be the lattice 〈S, ⪯〉. Let B ⊆ S denote the set of φ ∈ S s.t. Acc(M, φ) = 1. Zero-Normalisation is true iff φ∅ = ⊥ ∉ B. Unit-Normalisation and Deductive Closure are true iff B is a filter of ℒ. So the three conditions are true iff B is a proper filter of ℒ. But every proper filter is the intersection of a set of ultrafilters. For each of these ultrafilters u, there is some v ∈ V s.t. u = {φ ∈ S : v(φ) = 1}. Letting W be the set of these valuations, which is non-empty since every proper filter is included in at least one ultrafilter, B is therefore the set of φ ∈ S such that v(φ) = 1 for all v ∈ W. □

Notes

1 Note that the assumption made here, that acceptability is a function of an underlying probability model, whilst commonplace in the literature, is not entirely uncontroversial. It rules out, for instance, the views that the acceptability of a sentence also depends on the practical payoffs associated with true/false negatives/positives (see Rudner 1953) or that it is relative to a specific question, modeled as a partition of the language (Levi 1967). A discussion of these issues is, however, beyond the scope of the present paper.

2 In what follows I will be using the expressions 'full belief in the truth of' and 'acceptance of' interchangeably.
3 The name for this constraint originates in (Chandler 2010), and derives from the fact that, were it to be violated, Structurality and Deductive Closure could be marshalled, lottery-paradox-style, to yield a violation of Zero-Normalisation. The lottery-paradox-proofness involved was dubbed 'weak' in the original paper, for reasons that do not apply to the current model.

4 In fact, Douven and Williamson prove a slightly stronger result, making use of a principle that is strictly weaker than Deductive Closure, namely:
Aggregativity: For any probability model M = 〈L, Pr〉 and sentences φ, ψ ∈ L, if Acc(M, φ) = 1 and Acc(M, ψ) = 1, then Acc(M, φ ∧ ψ) = 1.

5 The reader familiar with the judgment aggregation literature will notice a considerable amount of simplification going on here. For instance, I am rather severely restricting the class of possible opinion models, which, in the aggregation-theoretic literature, notably involve opinion profiles that are tuples of possibly inconsistent sets of sentences in L. For reasons that will become clear shortly, these expository niceties can be dispensed with.

6 See most notably Theorem 2(a) of (Dietrich and List 2008), in which, in contrast to many previous results, Opinionation plays no role.

7 Distance-based approaches to aggregation are also discussed more recently in Miller and Osherson (2009) and Pigozzi (2006).

8 On this option, to ensure that the set of selected valuations isn't empty, one could specify that, in the event that all valuations are further than t, all valuations are selected.

9 Strictly speaking, this is not a distance, as the triangle inequality d(v, v∗) + d(v∗, v∗∗) ≥ d(v, v∗∗) need not be respected.

10 In fact, Konieczny et al. use the non-normalised counterpart; the normalisation is introduced here to harmonise with the range of dD.

11 Clause (i) is crucial here.
Without it, in the event that stage (b) allows for W to be the empty set, we would have a violation of Zero-Normalisation, since, obviously, ∅ ⊆ ⟦φ⟧ for all φ ∈ L, including contradictions.

12 Both suggestions were very briefly mentioned by Chandler (2010), where the connection with distance-based aggregation functions had not been drawn.

13 This is of course not a suitable occasion to address the controversial issue of the normative status of a requirement of translation-invariance. The issues at play are complex and, as far as I can see, remain unresolved at this point. See (Zwart 2001), chapter 5, for a detailed overview of the debate.

14 In the original draft of this paper, I had suggested that the issue of the translation-sensitivity of distance-based methods had been overlooked in the judgment aggregation literature. As an anonymous referee pointed out to me, however, I was wrong. Indeed, the issue is briefly discussed in a recent piece by Cariani et al. (2008). There, they first offer a theorem (Theorem 5, p. 17) to the effect that translation-sensitivity is a property not only of the Hamming distance, but of any distance measure that is not 'trivial', in the following sense:
A measure d of distance between valuations is trivial iff there exists r ∈ R+ such that, for all v, w ∈ V, d(v, w) = r × dD(v, w).
After presenting this result, they then go on to say:
In short, only trivial distance measures are translation-invariant. . . Thus any judgment aggregation procedure that depends on a non-trivial distance measure will fail translation-invariance.
Now the inference from the translation-sensitivity of the distance measure to the translation-sensitivity of the corresponding aggregation procedure seems basically correct, given certain assumptions about the latter. But these assumptions are not provided, and neither is the precise derivation: a little more work is required to establish the result, as Cariani et al. have acknowledged in recent correspondence.
15 The restriction to finite models is important here. Universal quantification over all probability models, finite or otherwise, would yield a principle that is incompatible with the conjunction of Aggregativity and Zero-Normalisation. Note, furthermore, that Consensus Preservation entails Responsiveness.

16 When t > 0, since for t = 0, Independence is satisfied.

17 As an anonymous referee has pointed out to me, there may be grounds to hold that conservative refinements are not as informationally innocent as I have suggested, and hence that a requirement of preservation of acceptability under such refinements may not be in order. Indeed, the standard Bayesian suggestion of modeling a lack of opinionation with respect to a partition P by a uniform probability distribution over P faces a number of apparent difficulties: Bertrand-style paradoxes, counterintuitive prescriptions in Ellsberg's urn decision problem, and so on. Whilst I do share the referee's worries here, a constraint of preservation under conservative refinement remains the best that can be achieved within the orthodox Bayesian framework assumed from the outset of this paper.

18 This is the definition given in (Spohn 2009). Just as the standard ratio definition of conditional probability precludes conditionalisation on probability 0 sentences, it precludes conditionalisation on rank ∞ sentences. There are, however, alternative accounts of both conditional probabilities and conditional ranks that, rightly or wrongly, waive this prohibition. The former will presumably be well known to the reader. Regarding the latter, we have the following proposal from (Huber 2009):
κ(ψ | φ) = κ(ψ ∧ φ) − κ(φ) if ψ ⊬ ⊥;
κ(ψ | φ) = ∞ if ψ ⊢ ⊥.
The reason for setting the conditional rank to ∞ in case ψ ⊢ ⊥ is presumably to prevent ψ from receiving a rank of 0 upon conditionalisation on a sentence φ of rank ∞ (since we would then have κ(ψ ∧ φ) − κ(φ) = ∞ − ∞ = 0), and having κ(· | φ) violate clause (i) of Definition 5.1.
Somewhat curiously, however, note that here, contrary to what was the case in Definition 5.2, we no longer have the result that κ(ψ | ¬ψ) = ∞. Indeed, let κ(¬ψ) = ∞ and ψ ⊬ ⊥. By the above proposal, κ(ψ | ¬ψ) = κ(ψ ∧ ¬ψ) − κ(¬ψ) = ∞ − ∞ = 0. This result could, however, be avoided by simply swapping ψ ⊢ ⊥ (resp. ψ ⊬ ⊥) for ψ ∧ φ ⊢ ⊥ (resp. ψ ∧ φ ⊬ ⊥) in the above definition.

19 In Definition 5.1, we took the range of κ to be R+ ∪ {∞}. In some publications, however, including the one from which this quote was taken, the range is stated to be N ∪ {∞}. But this second option obviously doesn't square with the conjecture that ranks are logarithms of potentially real-valued probabilities.

References

Cariani, F., M. Pauly and J. Snyder (2008). Decision Framing in Judgment Aggregation. Synthese, 163:1–24.
Chandler, J. (2010). The Lottery Paradox Generalised? British Journal for the Philosophy of Science, 61(3):667–679.
Dietrich, F. and C. List (2008). Judgment Aggregation without Full Rationality. Social Choice and Welfare, 31(1):15–39.
Douven, I. and J.-W. Romeijn (2007). The Discursive Dilemma as a Lottery Paradox. Economics and Philosophy, 23:301–319.
Douven, I. and T. Williamson (2006). Generalizing the Lottery Paradox. British Journal for the Philosophy of Science, 57(4):755–779.
Hilpinen, R. (1976). Approximate Truth and Truthlikeness. In Przelecki et al. (eds.), Formal Methods in the Methodology of the Empirical Sciences. Dordrecht: Reidel, 19–42.
Huber, F. (2009). Belief and Degrees of Belief. In F. Huber and C. Schmidt-Petri (eds.), Degrees of Belief. Springer Synthese Library, 1–33.
Konieczny, S., J. Lang and P. Marquis (2004). DA² Merging Operators. Artificial Intelligence, 157(1–2):49–79.
Levi, I. (1967). Gambling with Truth. New York: Knopf.
Miller, D. (1974). On the Comparison of False Theories by their Bases. British Journal for the Philosophy of Science, 25:178–188.
Miller, D. (1976). Verisimilitude Redeflated.
British Journal for the Philosophy of Science, 27(4):363–381.
Miller, M. and D. Osherson (2009). Methods for Distance-Based Judgment Aggregation. Social Choice and Welfare, 32:575–601.
Pigozzi, G. (2006). Belief Merging and the Discursive Dilemma: an argument-based account to paradoxes of judgment aggregation. Synthese, 152:285–298.
Rudner, R. (1953). The Scientist Qua Scientist Makes Value Judgments. Philosophy of Science, 20(1):1–6.
Smith, M. (forthcoming). A Generalised Lottery Paradox for Infinite Probability Spaces. British Journal for the Philosophy of Science.
Spohn, W. (2009). A Survey of Ranking Theory. In F. Huber and C. Schmidt-Petri (eds.), Degrees of Belief. Springer Synthese Library, 185–228.
Tichý, P. (1974). On Popper's Definitions of Verisimilitude. British Journal for the Philosophy of Science, 25:155–160.
Tichý, P. (1976). Verisimilitude Redefined. British Journal for the Philosophy of Science, 27:25–42.
Zwart, S.D. (2001). Refined Verisimilitude. Springer Synthese Library.