Forthcoming in Philosophy of Science

ANOTHER APPROACH TO CONSENSUS AND MAXIMALLY INFORMED OPINIONS WITH INCREASING EVIDENCE

RUSH T. STEWART AND MICHAEL NIELSEN

Date: June 1, 2018.

Abstract. Merging of opinions results underwrite Bayesian rejoinders to complaints about the subjective nature of personal probability. Such results establish that sufficiently similar priors achieve consensus in the long run when fed the same increasing stream of evidence. Initial subjectivity, the line goes, is of mere transient significance, giving way to intersubjective agreement eventually. Here, we establish a merging result for sets of probability measures that are updated by Jeffrey conditioning. This generalizes a number of different merging results in the literature. We also show that such sets converge to a shared, maximally informed opinion. Convergence to a maximally informed opinion is a (weak) Jeffrey conditioning analogue of Bayesian "convergence to the truth" for conditional probabilities. Finally, we demonstrate the philosophical significance of our study by detailing applications to the topics of dynamic coherence, imprecise probabilities, and probabilistic opinion pooling.

Keywords. Conditionalization; consensus; convergence; imprecise probabilities; Jeffrey conditioning; learning; merging of opinions

1. Introduction

Merging of opinions results (e.g., Blackwell and Dubins, 1962; Gaifman and Snir, 1982; Schervish and Seidenfeld, 1990; Kalai and Lehrer, 1994; Huttegger, 2015b) have underwritten Bayesian rejoinders to complaints about the subjective nature of personal probability (e.g., Savage, 1954; Schervish and Seidenfeld, 1990; Earman, 1992). Such results establish that sufficiently similar priors achieve consensus in the long run when fed the same increasing stream of evidence. Initial subjectivity, the line goes, is of mere transient significance, giving way to intersubjective agreement eventually.

Schervish and Seidenfeld (1990) show that Blackwell and Dubins's classic result (1962) can be extended to certain sets of probability functions, while Huttegger (2015b) provides sufficient conditions for merging of opinions under Jeffrey conditioning, a generalization of Bayesian conditionalization. We establish that Huttegger's merging result for Jeffrey conditioning can in turn be extended to sets of probability functions, in analogy to Schervish and Seidenfeld's extension (Section 6). We also extend Huttegger's convergence result for Jeffrey conditioning (convergence to a "maximally informed opinion") to the convergence of a set of probabilities to a shared maximally informed opinion (Section 7). Convergence to a maximally informed opinion is a (weak) Jeffrey conditioning analogue of Bayesian "convergence to the truth" for conditional probabilities.

There are a number of motivations for considering merging and convergence for sets of probabilities. For example, according to proponents of imprecise probabilities (IP), rationality does not always demand numerically precise probability judgments of an agent, to say nothing of the descriptive adequacy of such precise judgments (e.g., Levi, 1974; Walley, 1991). In this context, merging and convergence results show that a single agent with a certain sort of imprecise credal state fully expects to uniformly strengthen her point of view towards a consensus among the set (merging) and for the consensus point of view to stabilize (convergence). In effect, evidence reduces an agent's imprecise credal state towards a precise probability, at least in certain cases. In Section 8.2, we discuss how our results establish that "certain evidence" is unnecessary for the reduction of an imprecise credal state towards a stable, precise probability judgment. Another motivation comes from considering these points in a social setting. In such a setting, a set of probabilities represents the beliefs of the members of some community.
In Section 8.3, we detail an application of our study to probabilistic opinion pooling with imprecise probabilities (Elkin and Wheeler, 2018; Stewart and Ojea Quintana, 2018b). We also discuss Schervish and Seidenfeld's claim that such results for sets of probabilities allow us to relax so-called dynamic coherence to a large extent. In the case of Bayesian updating, for certain asymptotic properties like convergence to certainty and long-run consensus, it is enough that posteriors take any value in a particular set. In principle, our Propositions 1 and 2 establish that a similar point can be made about relaxing dynamic coherence for Jeffrey conditioning. However, we discuss some reservations about the precise significance of this point in Section 8.1.

2. Preliminaries

Let $\Omega$ be a sample space, a set of elementary events or possible worlds. We let $\mathcal{F}$ denote a σ-algebra of subsets of $\Omega$, i.e., a non-empty set of subsets of $\Omega$ closed under complementation and countable unions. Elements of $\mathcal{F}$ can be interpreted as events or propositions. For example, $\Omega$ could be the set of all infinite sequences of tosses of a coin, and $\mathcal{F}$ would be the set of relevant coin tossing events. Included in $\mathcal{F}$ would be propositions describing finite initial segments of a sequence, like the first flip landing heads, as well as propositions describing the limiting behavior of the sequence, such as $\lim_{n \to \infty} H_n/n = 1/4$, where $H_n$ is the total number of heads on the first $n$ flips. Throughout the paper, uppercase blackboard letters like $\mathbb{P}$ and $\mathbb{Q}$ denote (countably additive) probability measures on $(\Omega, \mathcal{F})$.¹ An event $A \in \mathcal{F}$ is said to occur almost surely with respect to $\mathbb{P}$, or a.s. ($\mathbb{P}$), if $\mathbb{P}[A] = 1$.

Let $\mathcal{E}_1, \mathcal{E}_2, \ldots$ be an infinite sequence of (finite) partitions of $\Omega$. We suppose that, for any $n$, $\mathcal{E}_{n+1}$ is a refinement of $\mathcal{E}_n$, i.e., every element of $\mathcal{E}_{n+1}$ is a subset of an element of $\mathcal{E}_n$. For any $n$, the partition $\mathcal{E}_n$ can be thought of as the possible information an agent might receive about the actual world $\omega \in \Omega$.
A single flip of a coin determines a binary partition of the set of all infinite sequences of tosses: the coin lands heads or the coin lands tails. Let $\mathcal{F}_n$ be the algebra generated by $\mathcal{E}_n$. Besides $\emptyset$ and $\Omega$, $\mathcal{F}_n$ contains all the unions of elements of $\mathcal{E}_n$. In the coin tossing example, the algebra $\mathcal{F}_n$ can be thought of as the set of all coin tossing events up to stage $n$. The refinement assumption on $\mathcal{E}_1, \mathcal{E}_2, \ldots$ implies that $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$, $n = 1, 2, \ldots$. We say that the sequence $\{\mathcal{F}_n : n \in \mathbb{N}\}$ of σ-algebras is a filtration. We assume that this filtration increases to the background σ-algebra $\mathcal{F}$, i.e., $\sigma\left(\bigcup_{n \geq 1} \mathcal{F}_n\right) = \mathcal{F}$, so that $\mathcal{F}$ is generated by a countable sequence of subsets of $\Omega$. The informal idea behind the filtration structure is that information is always increasing and eventually captures all propositions of interest.

For any events $A, E \in \mathcal{F}$ with $\mathbb{P}[E] > 0$, we use the standard ratio definition of conditional probability given an event:
$$\mathbb{P}[A \mid E] = \frac{\mathbb{P}[A \cap E]}{\mathbb{P}[E]}. \tag{1}$$
For a sub-σ-algebra $\mathcal{F}_n$ of $\mathcal{F}$, we will also use the standard definition of conditional probability given $\mathcal{F}_n$: $\mathbb{P}[A \mid \mathcal{F}_n]$ is an $\mathcal{F}_n$-measurable function that satisfies
$$\mathbb{P}[A \cap E] = \int_E \mathbb{P}[A \mid \mathcal{F}_n](\omega)\, \mathbb{P}[d\omega] \tag{2}$$
for all $E \in \mathcal{F}_n$.² If we think of $\omega$ as the "actual world" regarding coin tosses, then, after observing $n$ tosses, $\mathcal{F}_n$ is the collection of all propositions that the agent can distinguish as true or false in the actual world. So, just as the conditional probability of $A$ given an event $E$ can be interpreted as the probability of $A$ on the supposition that $E$ is true, the conditional probability of $A$ given $\mathcal{F}_n$ can be interpreted as the probability of $A$ in light of the information provided by $\mathcal{F}_n$.³

In the present framework, conditional probabilities given sub-σ-algebras can be reduced (almost surely) to conditional probabilities given events; the former merely provide a convenient tool for working around conditioning on null events without adopting extra conventions or assumptions. Since each $\mathcal{F}_n$ is generated by a finite partition $\mathcal{E}_n$, for almost every $\omega \in \Omega$ we have
$$\mathbb{P}[A \mid \mathcal{F}_n](\omega) = \mathbb{P}[A \mid \mathcal{E}_n(\omega)], \tag{3}$$
where $\mathcal{E}_n(\omega)$ is the unique cell of $\mathcal{E}_n$ that contains $\omega$. If $\mathbb{P}[\mathcal{E}_n(\omega)] = 0$, then the left-hand side of (3) can be defined arbitrarily.

¹ The role of countable additivity in a normative theory of probability judgment is a contentious issue. Its mathematical role in all of the results discussed here is not: it is presupposed.

² For any sub-σ-algebra $\mathcal{G}$, the existence of the conditional probability $\mathbb{P}[A \mid \mathcal{G}]$ is guaranteed by the Radon-Nikodym theorem. The uniqueness of $\mathbb{P}[A \mid \mathcal{G}]$ is only almost sure. This means that, in general, there are many measurable functions that satisfy equation (2) and differ from each other on a set of $\mathbb{P}$-measure 0. To mark this fact, we say that there are different versions of $\mathbb{P}[A \mid \mathcal{G}]$. Unlike conditional probabilities given events, there may not exist versions of a conditional probability given a sub-σ-algebra such that $\mathbb{P}[\,\cdot \mid \mathcal{G}](\omega)$ is a probability measure on $(\Omega, \mathcal{F})$ for all $\omega \in \Omega$. In this case, we say that $\mathbb{P}[\,\cdot \mid \mathcal{G}]$ is irregular. We discuss regularity further below in connection with the Blackwell-Dubins merging of opinions theorem.

³ As pointed out by Billingsley (2008, p. 438), this heuristic explanation of conditioning on a sub-σ-algebra breaks down when conditional probabilities are not proper (see also Blackwell and Dubins, 1975; Seidenfeld, 2001).

3. Merging of Opinions

What does it mean to say that two opinions merge, or that $\mathbb{P}$ and $\mathbb{Q}$ agree in the limit? Following Blackwell and Dubins (1962), we adopt the total variation distance $d$ as a measure of the distance between $\mathbb{P}$ and $\mathbb{Q}$:
$$d(\mathbb{P}, \mathbb{Q}) = \sup_{A \in \mathcal{F}} \left|\mathbb{P}[A] - \mathbb{Q}[A]\right|.$$
So if $\mathbb{P}$ and $\mathbb{Q}$ are within $\varepsilon$ according to the metric $d$, then they are within $\varepsilon$ for every event $A \in \mathcal{F}$. Now we can formulate a natural sense in which two sequences $\{p_n\}$ and $\{q_n\}$ of probability measures might become (and stay) close. We say that $\{p_n\}$ and $\{q_n\}$ merge if $d(p_n, q_n) \to 0$ as $n \to \infty$. If $\{p_n\}$ and $\{q_n\}$ are updates (not necessarily Bayesian) of $\mathbb{P}$ and $\mathbb{Q}$, respectively, we say that $\mathbb{P}$ and $\mathbb{Q}$ merge if $\{p_n\}$ and $\{q_n\}$ do.

The main result of Blackwell and Dubins (1962) concerns merging of conditional probabilities. If learning goes by conditionalization, and the priors $\mathbb{P}$ and $\mathbb{Q}$ do not disagree too drastically, then the Blackwell-Dubins result says that $\mathbb{P}$ and $\mathbb{Q}$ must assign probability 1 to merging. To make this precise, we say that $\mathbb{Q}$ is absolutely continuous with respect to $\mathbb{P}$, denoted $\mathbb{Q} \ll \mathbb{P}$, if for all $A \in \mathcal{F}$,
$$\mathbb{P}[A] = 0 \implies \mathbb{Q}[A] = 0.$$
So $\mathbb{Q}$ agrees with $\mathbb{P}$ about events bearing probability 0. If $\mathbb{P}$ also agrees with $\mathbb{Q}$ about probability 0 events, so that $\mathbb{P} \ll \mathbb{Q}$, we say that $\mathbb{P}$ and $\mathbb{Q}$ are equivalent or mutually absolutely continuous. In that case, we have, for all $A \in \mathcal{F}$,
$$\mathbb{P}[A] = 0 \iff \mathbb{Q}[A] = 0.$$
It turns out that absolute continuity is sufficient for merging of conditional probabilities.⁴ We write $\mathbb{P}_{\mathcal{F}_n} = \mathbb{P}[\,\cdot \mid \mathcal{F}_n]$ and $\mathbb{Q}_{\mathcal{F}_n} = \mathbb{Q}[\,\cdot \mid \mathcal{F}_n]$.⁵

Theorem 1. (Blackwell and Dubins, 1962) If $\mathbb{Q} \ll \mathbb{P}$, then $d(\mathbb{P}_{\mathcal{F}_n}, \mathbb{Q}_{\mathcal{F}_n}) \to 0$ a.s. ($\mathbb{Q}$) as $n \to \infty$.

With $\mathbb{Q}$-probability 1, $\mathbb{P}$ merges to $\mathbb{Q}$ when $\mathbb{Q} \ll \mathbb{P}$. If $\mathbb{P} \ll \mathbb{Q}$ also, we have merging from the perspectives of both $\mathbb{P}$ and $\mathbb{Q}$. The almost sure qualifications are needed in the statement of Theorem 1 because the conditional probabilities $\mathbb{P}_{\mathcal{F}_n}$ and $\mathbb{Q}_{\mathcal{F}_n}$ in this general setting are random objects, that is, they depend on $\omega \in \Omega$ (see footnote 2 for details).

4. Merging of Opinions for Sets of Probabilities

Schervish and Seidenfeld provide sufficient conditions to secure merging of opinions for sets of probability functions. A set of probability functions $C$ is called convex if $\mathbb{P}, \mathbb{Q} \in C$ implies $\alpha\mathbb{P} + (1-\alpha)\mathbb{Q} \in C$ for all $\alpha \in [0, 1]$. That is, for any two elements of $C$, $C$ includes all convex combinations of those elements. We call $\mathbb{P} \in C$ an extreme point of $C$ if $\mathbb{P} = \alpha\mathbb{Q} + (1-\alpha)\mathbb{R}$ with $\mathbb{Q}, \mathbb{R} \in C$ and $\alpha \in [0, 1]$ implies $\mathbb{Q} = \mathbb{P}$ or $\mathbb{R} = \mathbb{P}$.
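The finite-space content of the definitions in Sections 2 through 4 can be made concrete with a small computation. The Python sketch below assumes, purely for illustration, that the infinite coin-tossing space is truncated to N = 3 tosses so that every event can be listed; the helpers (`iid`, `mixture`, `cond_given_Fn`, `tv_sup`, and so on) are hypothetical names of ours, not the authors' code. It evaluates a version of $\mathbb{P}[A \mid \mathcal{F}_n]$ cell by cell as in equation (3), computes the total variation distance $d$ by brute force, and builds an element $\mathbb{R}$ of the convex hull of two extreme points. Exact fractions are used so that the identities can be checked with `==`.

```python
from fractions import Fraction
from itertools import combinations, product

# Illustrative assumption: truncate the coin-tossing space to N tosses so that
# every event (subset of OMEGA) can be enumerated exactly.
N = 3
OMEGA = ["".join(w) for w in product("HT", repeat=N)]


def iid(bias):
    """Measure under which the N tosses are i.i.d. with chance of heads = bias."""
    return {w: bias ** w.count("H") * (1 - bias) ** w.count("T") for w in OMEGA}


def mixture(weights, measures):
    """Convex combination sum_i weights[i] * measures[i]."""
    return {w: sum(a * m[w] for a, m in zip(weights, measures)) for w in OMEGA}


def prob(P, event):
    return sum(P[w] for w in event)


def cell(n, w):
    """E_n(w): the outcomes that agree with w on the first n tosses."""
    return [v for v in OMEGA if v[:n] == w[:n]]


def conditional(P, E):
    """P[. | E] by the ratio definition (1); assumes P[E] > 0."""
    pe = prob(P, E)
    return {w: (P[w] / pe if w in E else Fraction(0)) for w in OMEGA}


def cond_given_Fn(P, A, n):
    """A version of P[A | F_n], evaluated pointwise as in equation (3)."""
    return {w: prob(conditional(P, cell(n, w)), A) for w in OMEGA}


def events_in_Fn(n):
    """Every member of F_n, i.e., every union of cells of E_n."""
    cells = list({w[:n]: cell(n, w) for w in OMEGA}.values())
    for k in range(len(cells) + 1):
        for chosen in combinations(cells, k):
            yield [w for c in chosen for w in c]


def tv_sup(P, Q):
    """d(P, Q) = sup_A |P[A] - Q[A]|, by brute force over all 2**len(OMEGA) events."""
    return max(abs(prob(P, A) - prob(Q, A))
               for k in range(len(OMEGA) + 1)
               for A in combinations(OMEGA, k))


def tv_half_sum(P, Q):
    """Finite-space shortcut: half the sum of pointwise differences."""
    return sum(abs(P[w] - Q[w]) for w in OMEGA) / 2


# Two extreme points and an element R of their convex hull.
P1, P2 = iid(Fraction(1, 4)), iid(Fraction(3, 4))
beta = Fraction(1, 3)
R = mixture([beta, 1 - beta], [P1, P2])
A = [w for w in OMEGA if w.count("H") >= 2]  # "at least two heads"

# Defining property (2): integrating R[A | F_1] over any E in F_1 gives R[A & E].
f = cond_given_Fn(R, A, 1)
for E in events_in_Fn(1):
    assert sum(f[w] * R[w] for w in E) == prob(R, [w for w in E if w in A])

# The brute-force supremum agrees with the half-sum formula.
assert tv_sup(P1, P2) == tv_half_sum(P1, P2)

# Conditioning R on a cell gives a convex combination of the extreme points'
# conditionals, with a weight alpha that depends on the cell.
c = cell(2, "HHT")
alpha = beta * prob(P1, c) / prob(R, c)
assert conditional(R, c) == mixture([alpha, 1 - alpha],
                                    [conditional(P1, c), conditional(P2, c)])
```

The last check is the fact used in the proof sketch of Theorem 2: on each cell, a conditionalized element of the hull is again a convex combination of the conditionalized extreme points, although the mixing weight $\alpha$ is cell-dependent rather than equal to the prior weight $\beta$.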
The following result is a consequence of Theorem 1.

Theorem 2. (Schervish and Seidenfeld, 1990, Corollary 1) Let $C$ be a closed, convex set of probability functions, all mutually absolutely continuous, and generated by finitely many extreme points. Then, almost surely ($\mathbb{P} \in C$), the conditional probabilities of elements of $C$ merge uniformly, that is,
$$\sup_{\mathbb{P}, \mathbb{Q} \in C} d\big(\mathbb{P}_{\mathcal{F}_n}(\omega), \mathbb{Q}_{\mathcal{F}_n}(\omega)\big) \to 0 \quad \text{as } n \to \infty$$
for almost all $\omega \in \Omega$.

Another way of stating the conclusion of Theorem 2 is that for almost all $\omega$ and all $\varepsilon > 0$ there is some $m$ such that for all $n \geq m$ and all $\mathbb{P}, \mathbb{Q} \in C$, $d(\mathbb{P}_{\mathcal{F}_n}(\omega), \mathbb{Q}_{\mathcal{F}_n}(\omega)) < \varepsilon$. That the same $m$ works for all $\mathbb{P}$ and $\mathbb{Q}$ is what it means for merging to be uniform.

Here is the gist of their proof. Since $C$ is generated by finitely many extreme points, $\mathbb{P}_1, \ldots, \mathbb{P}_k$, the (total variation) distance between the conditional probabilities of any of the extreme points of $C$ is bounded by the maximum distance for pairs of conditional probabilities of extreme points. Moreover, that maximum is a bound for the distance between the conditional probabilities of any points in the whole set. To see this, note two things. First, $d(\mathbb{P}, \mathbb{Q})$ bounds the distances $d(\mathbb{P}, \mathbb{R})$ and $d(\mathbb{R}, \mathbb{Q})$ whenever $\mathbb{R}$ is a convex combination of $\mathbb{P}$ and $\mathbb{Q}$. Second, at any stage $n$, the conditional probabilities of points in $C$ can be written as convex combinations of the conditional probabilities of the extreme points, i.e., for each $\mathbb{R} \in C$ and each $n$ there exist $\alpha_1, \ldots, \alpha_k$ (which may depend on $\mathbb{R}$, $n$, and $\omega$) with $\alpha_i \in [0, 1]$ and $\sum_{i=1}^k \alpha_i = 1$ such that
$$\mathbb{R}_{\mathcal{F}_n} = \sum_{i=1}^k \alpha_i \, \mathbb{P}_{i,\mathcal{F}_n}.$$

Theorem 2 is stronger than it may at first appear. Notice that, by this result, any set of probabilities that is included in a closed, convex set generated by finitely many extreme points merges uniformly.

It is not easy to relax the assumption in Theorem 2 that $C$ is convex and generated by finitely many extreme points. Schervish and Seidenfeld demonstrate this with several other results and examples. For instance, if $C$ is $d$-compact, an assumption that is weaker than the assumptions of Theorem 2, the conditional probabilities of elements of $C$ may not exhibit almost sure uniform merging. However, Schervish and Seidenfeld show that weaker modes of merging can be achieved under weaker assumptions about $C$, such as $d$-compactness (see their Corollaries 2 and 3). It is worth noting that any negative result for Bayesian conditionalization and merging carries over to Jeffrey conditioning, a generalization of conditionalization that we introduce and study below. An immediate consequence of Schervish and Seidenfeld's results, then, is that Jeffrey updates of a $d$-compact set of probability measures may fail to merge uniformly almost surely.

There are a number of uses to which results such as Theorem 2 might be put. We delay explaining some of the significance of Theorem 2 and of Propositions 1 and 2 (to come) until Section 8.

⁴ Kalai and Lehrer show that absolute continuity is not merely a sufficient condition for merging of conditional probabilities; it is also a necessary one (1994, Theorem 2).

⁵ Besides absolute continuity, the Blackwell-Dubins theorem also requires that the conditional probabilities $\mathbb{P}_{\mathcal{F}_n}$ and $\mathbb{Q}_{\mathcal{F}_n}$ be regular (or what Blackwell and Dubins call predictive): $\mathbb{P}_{\mathcal{F}_n}(\omega)$ and $\mathbb{Q}_{\mathcal{F}_n}(\omega)$ are probability measures on $(\Omega, \mathcal{F})$ for each $\omega \in \Omega$. It follows from the fact that each $\mathcal{F}_n$ is generated by a finite partition that $\mathbb{P}_{\mathcal{F}_n}$ and $\mathbb{Q}_{\mathcal{F}_n}$ have regular versions. We assume throughout that all conditional probabilities are regular. For examples of conditional probabilities that are irregular, see Billingsley (2008, Exercise 33.11) and Seidenfeld (2001).

5. Jeffrey Conditioning

Bayesian conditionalization is a putative diachronic norm that specifies how probabilistic learning takes place.⁶ A basic assumption of conditionalization is that it is propositions (or events) that are learned. In other words, learning experiences can always be represented by some $E \in \mathcal{F}$.
When an agent learns $E$, conditionalization says that her posterior probability $\mathbb{P}_E[A]$ for an event $A \in \mathcal{F}$ should be equal to her prior conditional probability of $A$ given $E$ (provided, of course, that this is well defined), i.e., $\mathbb{P}_E[A] = \mathbb{P}[A \mid E]$. It is clear from (1) that conditionalization requires that $E$ have posterior probability 1.

⁶ In this essay, we will not quarrel with the view that conditionalization and Jeffrey conditioning are genuine diachronic norms (outside of indicating how they can be relaxed while still achieving merging and convergence). However, both authors regard diachronic "learning" norms with a good deal of suspicion. We recognize that this is likely at odds with most philosophical writing on probability.

Jeffrey conditioning relaxes both of these fundamental features of conditionalization: it does not assume that learning experiences are always represented by propositions, and it does not require that learning involve assigning propositions probability 1. Proponents of Jeffrey conditioning want to allow for "uncertain learning." Uncertain learning induces a change in the probabilities $\mathbb{P}_n$ assigned to the members of the partition $\mathcal{E}_n$. Jeffrey conditioning applies just in case the change over $\mathcal{E}_n$ is rigid, i.e.,
$$\mathbb{P}_n[A \mid E] = \mathbb{P}_{n-1}[A \mid E] \quad \text{for all } E \in \mathcal{E}_n \text{ and all } A \in \mathcal{F}. \tag{4}$$
The rigidity condition (4) says that the update from $\mathbb{P}_{n-1}$ to $\mathbb{P}_n$ does not change conditional probabilities given members of the partition $\mathcal{E}_n$. This is equivalent to the requirement that, for all members $E$ of the partition $\mathcal{E}_n$ and all subsets $A, B$ of $E$, the update from $\mathbb{P}_{n-1}$ to $\mathbb{P}_n$ does not change the ratio of the probabilities of $A$ and $B$. The law of total probability and (4) yield the familiar Jeffrey conditioning equation, which extends $\mathbb{P}_n$ from $\mathcal{E}_n$ to the entire algebra $\mathcal{F}$:
$$\mathbb{P}_n[A] = \sum_{E \in \mathcal{E}_n} \mathbb{P}_{n-1}[A \mid E]\, \mathbb{P}_n[E] \quad \text{for all } A \in \mathcal{F}. \tag{5}$$
If $\mathcal{E}_n = \{E, E^c\}$ and $\mathbb{P}_n[E] = 1$, then Jeffrey conditioning reduces to standard conditionalization, i.e., $\mathbb{P}_n[A] = \mathbb{P}_{n-1}[A \mid E]$ for all $A \in \mathcal{F}$.⁷

Equation (5) simplifies in another way under our assumption that the sequence of partitions $\mathcal{E}_1, \ldots, \mathcal{E}_n$ is such that $\mathcal{E}_{i+1}$ refines $\mathcal{E}_i$ for all $1 \leq i \leq n-1$. In this case, after $n-1$ applications of equation (4) in equation (5), we have
$$\mathbb{P}_n[A] = \sum_{E \in \mathcal{E}_n} \mathbb{P}[A \mid E]\, \mathbb{P}_n[E] \quad \text{for all } A \in \mathcal{F}. \tag{6}$$
Equation (6) shows that, in our framework, posterior probabilities $\mathbb{P}_n$ are determined by their values on members of the partition $\mathcal{E}_n$ and by prior ($\mathbb{P}$) conditional probabilities given members of $\mathcal{E}_n$. Unless otherwise stated, we assume that the events $E \in \mathcal{E}_n$ have positive prior probability, so that the right-hand side of (6) is well defined.

It is worth pointing out that, in certain situations, Jeffrey conditioning can be represented as standard Bayesian conditionalization in a richer algebra. The cases in which such so-called superconditioning is possible are characterized by Diaconis and Zabell's superconditioning criterion (1982, Theorem 2.1). But the superconditioning criterion is not trivial; it fails to hold in many cases. Hence, Huttegger's merging result for Jeffrey conditioning in the following subsection and our merging result for sets of Jeffrey updates in Section 6 genuinely generalize Blackwell and Dubins's and Schervish and Seidenfeld's results, respectively. (For other reservations about reducing Jeffrey conditioning to Bayesian conditionalization in the setting of merging of opinions, see Huttegger's discussion (2015b, pp. 630–631).)

5.1. Merging. Huttegger proves an analogue of Blackwell and Dubins's merging result for Jeffrey conditioning. Just as we considered random conditional probabilities in the case of conditionalization, we now treat posteriors over the learning partition as random. By random probability, we mean that $\mathbb{P}_n[\,\cdot\,]$ is not a determinate number but rather a (measurable) function on $\Omega$ such that $\mathbb{P}_n[\,\cdot\,](\omega)$ is a probability measure on $\mathcal{E}_n$ for each $\omega \in \Omega$. Let $p_n^E = \mathbb{P}_n[E]$ be the random probability assigned to $E \in \mathcal{E}_n$.
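Setting randomness aside for a moment, and holding the evidence fixed so that the probabilities $p_n^E$ are just numbers, equations (4) through (6) can be checked directly on a small example. The Python sketch below is illustrative only: the four-point sample space, the nested partitions `E1` and `E2`, the prior, and the helper `jeffrey` are assumptions of ours, not taken from the paper. It implements equation (5) one outcome at a time, using exact fractions so that equalities can be tested exactly.

```python
from fractions import Fraction as F

# Illustrative assumptions: a four-point sample space and two nested partitions.
OMEGA = ["a", "b", "c", "d"]
E1 = [["a", "b"], ["c", "d"]]    # coarse partition
E2 = [["a"], ["b"], ["c", "d"]]  # a refinement of E1


def prob(P, event):
    return sum(P[w] for w in event)


def jeffrey(P_prev, partition, new_cell_probs):
    """Jeffrey conditioning, equation (5), outcome by outcome:
    P_n[{w}] = P_{n-1}[{w} | E] * P_n[E] for the cell E containing w.
    Assumes every cell has positive probability under P_prev."""
    P_new = {}
    for E, q in zip(partition, new_cell_probs):
        pe = prob(P_prev, E)
        for w in E:
            P_new[w] = P_prev[w] / pe * q
    return P_new


prior = {"a": F(1, 10), "b": F(3, 10), "c": F(2, 5), "d": F(1, 5)}

# Two successive uncertain-learning episodes over refining partitions ...
P1 = jeffrey(prior, E1, [F(3, 5), F(2, 5)])
P2 = jeffrey(P1, E2, [F(1, 5), F(3, 10), F(1, 2)])
# ... agree with the one-shot formula (6), which uses the prior's conditionals.
P2_direct = jeffrey(prior, E2, [F(1, 5), F(3, 10), F(1, 2)])
print(P2 == P2_direct)  # prints True
```

Because every cell of `E2` lies inside a cell of `E1`, rigidity (4) guarantees that the first update leaves the prior's conditional probabilities within those cells untouched, which is why the second update can equally be computed from the prior, as in (6). Putting all the new mass on one cell, e.g. `jeffrey(prior, E1, [F(1), F(0)])`, recovers ordinary conditionalization on `{a, b}`.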
We treat future probabilities as random quantities because, from an agent's present point of view concerning $E \in \mathcal{E}_n$, the precise value $p_n^E$ is unknown. Because Jeffrey conditioning is less constrained than Bayesian conditionalization, some additional assumptions figure into Huttegger's theorem. The first additional assumption is that the sequence $\{\mathbb{Q}_n\}$ is uniformly absolutely continuous with respect to $\mathbb{Q}$. This condition consists of two requirements: (i) $\mathbb{Q}_n \ll \mathbb{Q}|_{\mathcal{F}_n}$ for all $n$, and (ii) for every $\varepsilon > 0$ there is a $\delta > 0$ such that for all $n$ and all $A \in \mathcal{F}_n$, $\mathbb{Q}[A] < \delta \implies \mathbb{Q}_n[A] < \varepsilon$. Requirement (i) is the usual constraint on Bayesian learning that events that are null for the prior $\mathbb{Q}$ remain null for the posterior $\mathbb{Q}_n$. The additional uniformity requirement (ii) ensures that this relation holds as we pass to the limit (Huttegger, 2015b, p. 623).⁸

⁷ This, of course, requires that we adopt some conventions for conditioning on null events; otherwise the left-hand side of equation (4) may be undefined.

⁸ Here is another way of thinking about the uniform absolute continuity requirement. First, if $\mathbb{Q}_n \ll \mathbb{Q}|_{\mathcal{F}_n}$ for all $n$, then $\{\mathbb{Q}_n\}$ is uniformly absolutely continuous with respect to $\mathbb{Q}$ if and only if the sequence $\{Y_n\} = \{d\mathbb{Q}_n / d\mathbb{Q}\}$ of Radon-Nikodym derivatives is uniformly integrable. The "only if" direction of this statement is shown by Huttegger (2015b, Lemma 12.1), and the "if" direction is not difficult to prove. The uniform integrability of $\{Y_n\}$ in turn implies that $\{Y_n\}$ is uniformly bounded in expectation, i.e., $\sup_n \mathbb{E}[Y_n] < \infty$ (again, this is a standard result). Therefore, if we think of the derivatives $Y_n$ as representing the "rate" at which the posteriors $\mathbb{Q}_n$ are changing with respect to the prior $\mathbb{Q}$, then Huttegger's uniform absolute continuity condition demands that this rate is not expected to diverge to $\infty$.

The other assumption demands a certain sort of stability of probability judgments as the agent updates over increasingly fine partitions. While we treat posterior probabilities as random, condition (M') places significant constraints on sequences of probability judgments. For all $n$ and all $E \in \mathcal{E}_n$,
$$\int_G p_{m+1}^E \, d\mathbb{P} = \int_G p_m^E \, d\mathbb{P} \quad \text{for all } m \geq n \text{ and all } G \in \mathcal{F}_m. \tag{M'}$$
Property (M') is the martingale condition, and it requires that future probabilities equal present probabilities on average. For all $n$ and all $E \in \mathcal{E}_n$, the sequence $p_n^E, p_{n+1}^E, \ldots$ must form a martingale. The martingale condition has been defended as an essential feature of rational learning experiences (Huttegger, 2013, 2015b). In the case of Bayesian conditionalization, future probabilities are fixed at 1 for those events learned. This, of course, is not generally the case for Jeffrey conditioning. Theorem 3 is Huttegger's merging result for Jeffrey conditioning.

Theorem 3. (Huttegger, 2015b, Theorem 9.2) Suppose that $\mathbb{P}_n$ and $\mathbb{Q}_n$, $n = 1, 2, \ldots$, are random sequences of probability measures on $(\Omega, \mathcal{F}_1), (\Omega, \mathcal{F}_2), \ldots$ with $\mathbb{Q}_n = \mathbb{P}_n$, that $\mathbb{Q}_n$, $n = 1, 2, \ldots$, is uniformly absolutely continuous with respect to $\mathbb{Q}$ a.s. ($\mathbb{Q}$), and that $\mathbb{Q} \ll \mathbb{P}$. If (M') holds for $\mathbb{Q}_n$, $n = 1, 2, \ldots$, then $d(\mathbb{P}_n, \mathbb{Q}_n) \to 0$ as $n \to \infty$ a.s. ($\mathbb{Q}$).

6. Merging of Opinions for Jeffrey Conditioning and Sets of Probabilities

We are now in a position to turn to the contributions of the present paper. Our first result extends Huttegger's theorem for merging of opinions via Jeffrey conditioning to sets of probabilities, an analogue of Theorem 2.

Proposition 1. Let $C$ be a closed, convex set of probability measures, all mutually absolutely continuous, and generated by finitely many extreme points. Suppose that almost surely each pair of probabilities $\mathbb{P}, \mathbb{Q} \in C$ satisfies the conditions of Theorem 3. Then almost surely ($\mathbb{P} \in C$) the elements of $C$ merge uniformly.

A proof is included in the Appendix. As Schervish and Seidenfeld indicate, Theorem 2 is a simple corollary of Theorem 1 given that point-by-point conditionalization preserves the convexity of a set. We find Proposition 1 especially interesting in light of the fact that, unlike Bayesian conditionalization, Jeffrey conditioning does not generally preserve the convexity of the initial set, even for convex sets of priors generated by finitely many extreme points. We provide a demonstration in the Appendix (Proposition 3).

7. Maximally Informed Opinions

Another consequence of Doob's martingale convergence theorem is the so-called "convergence to truth" or "convergence to certainty" for conditional probabilities, captured by the following statement: for all $A \in \mathcal{F}$,
$$\lim_{n \to \infty} \mathbb{P}_{\mathcal{F}_n}[A] = \mathbf{1}_A \quad \text{a.s. } (\mathbb{P}),$$
where $\mathbf{1}_A$ is the indicator function of $A$. Put another way, coherence requires assigning probability 1 to converging to "the truth" for any event $A \in \mathcal{F}$ whose truth value is determined by observations. One way of interpreting such convergence is that "[i]t shows that evidence triumphs over prior opinions under the appropriate circumstances" (Huttegger, 2015a, p. 590). Two mutually absolutely continuous priors assign probability 1 not just to merging, but to being certain of the same observational hypotheses in the limit.⁹

Convergence to certainty does not obtain in general for Jeffrey conditioning. However, Huttegger shows that Jeffrey conditioning does lead probabilities to settle down to some limit if we assume (M'), just not necessarily to 0 or 1.¹⁰ In particular, Huttegger's Theorem 9.1 states that sequences of almost surely uniformly absolutely continuous Jeffrey updates that satisfy (M') converge setwise almost surely, i.e., for all $A \in \mathcal{F}$, $\mathbb{P}_\infty[A] := \lim_{n \to \infty} \mathbb{P}_n[A]$ exists almost surely. Skyrms calls this convergence to a "maximally informed opinion" (1996).

Convergence to maximally informed opinions can be extended to sets of probabilities. For all $A \in \mathcal{F}$, all probabilities in the set $C$ converge to the same limit $\mathbb{P}_\infty[A]$ almost surely.
To see this, consider that, for all $A \in \mathcal{F}$, by the result of Huttegger just mentioned, all elements of $C$ are almost surely converging to a limit. Also, all the elements of $C$ are merging (Theorem 3). If two elements $\mathbb{P}, \mathbb{Q} \in C$ converged to different limits (with positive probability) for some $A \in \mathcal{F}$, then there would be some $\varepsilon > 0$ and some $m$ such that $d(\mathbb{P}_n, \mathbb{Q}_n) > \varepsilon$ for all $n \geq m$, which is inconsistent with merging.

In fact, a slightly stronger observation can be demonstrated using our Proposition 1. As indicated in the introduction, dynamic coherence can be relaxed. For convergence to a maximally informed opinion, it suffices that probabilities take their values in the set of Jeffrey updates of elements of $C$.

Proposition 2. Let $C$ be as in Proposition 1, and let $A \in \mathcal{F}$. If $f_n[A] \in \{\mathbb{P}_n[A] : \mathbb{P} \in C\}$ for all $n$, then the sequence $f_n[A]$ converges to the maximally informed opinion $\mathbb{P}_\infty[A]$ a.s. ($\mathbb{P} \in C$).

We omit the proof. Proposition 2 is the Jeffrey conditioning analogue of Schervish and Seidenfeld's Corollary 4 (1990, p. 410).

8. Philosophical Significance

Standard Bayesian learning proceeds by conditionalization, which requires that the event or proposition learned be assigned posterior probability 1. As fans of Jeffrey conditioning sometimes put the point, "learning experiences need not be like that at all" (Huttegger, 2015b, p. 613). Some learning experiences do not lead to certainty. Here, the standard Bayesian framework cannot help. If there are "uncertain learning experiences," then Jeffrey conditioning, and the extension of learning rules to accommodate "uncertain evidence" more generally, are topics of considerable philosophical interest. Similarly, as fans of imprecise probabilities often point out, not all uncertainty is reducible to numerically precise probabilities. Here, too, the standard Bayesian framework cannot help.
In the presence of "deep uncertainty," sets of probability measures, and generalizations of precise probability models more generally, warrant philosophical attention and scrutiny. We find it somewhat surprising, then, that, with very few exceptions (e.g., Škulj, 2006; Stewart and Ojea Quintana, 2018a), studies that treat "uncertain learning" do not do so in the context of uncertainty that is not reducible to a numerically precise probability, nor do studies of IP treat uncertain evidence. A very general motivation for our present study is to redress this lacuna in the literature. We turn now to three more specific points of philosophical interest.

⁹ While Cisewski et al., for example, refer to such convergence results as "desirable" and "laudable" (2017), others are less enthusiastic (e.g., Earman, 1992; Kelly, 1996; Belot, 2013).

¹⁰ For a discussion of this point, see Huttegger (2015b, §7 and §9).

8.1. Relaxing Dynamic Coherence. Earlier, we indicated some consequences of merging results for sets of probabilities. First, Schervish and Seidenfeld claim that results such as Theorem 2 and Proposition 1 have interesting implications for issues related to dynamic coherence. Precisely what they have in mind about the connection between their results and dynamic coherence remains a bit obscure to us (more on this below). The upshot of their discussion, however, seems to be: to achieve asymptotic consensus, agents need not be dynamically coherent; they need not conditionalize or update by Jeffrey conditioning. As Schervish and Seidenfeld put the point with respect to conditionalizing, "We do not impose a constraint of (full) dynamic coherence. For consensus, it suffices that the agents use conditional probabilities arbitrarily chosen from a class C enveloped by finitely many (mutually absolutely continuous) distributions. Under the conditions of [Theorem 2], asymptotic certainty follows from static coherence" (1990, p. 402).
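The Schervish and Seidenfeld point about conditional probabilities "arbitrarily chosen from a class C" can be watched in a small simulation. The Python sketch below rests on simplifying assumptions of our own, not the paper's: every prior is a mixture of i.i.d. coin laws over a finite grid of biases, $C$ is the convex hull of three such priors with full support on the grid (hence mutually absolutely continuous), and the data come from one of the grid biases. Because i.i.d. laws with distinct biases are mutually singular, the total variation distance between two such conditional measures equals the distance between their posterior mixing weights, which is what `tv` computes. The sketch tracks (a) the largest distance between the extreme points' conditional probabilities, which by the argument for Theorem 2 bounds the distance between any two conditionalized elements of $C$, and (b) the distance between a coherent Bayesian with a fixed prior in $C$ and a hypothetical agent who picks a fresh, arbitrary element of $C$ at every stage and conditionalizes it on the data so far.

```python
import math
import random

# Illustrative assumptions: priors are mixtures of i.i.d. coin laws over a finite
# grid of biases; C is the convex hull of three full-support priors.
THETAS = [0.2, 0.4, 0.6, 0.8]
EXTREME = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7], [0.25, 0.25, 0.25, 0.25]]


def posterior(weights, heads, tails):
    """Posterior mixing weights after the observed tosses (log scale avoids underflow)."""
    logs = [math.log(w) + heads * math.log(t) + tails * math.log(1 - t)
            for w, t in zip(weights, THETAS)]
    top = max(logs)
    unnorm = [math.exp(x - top) for x in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def tv(p, q):
    """Total variation between two mixing distributions over THETAS."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


def random_member_of_C(rng):
    """An arbitrary element of C: a random convex combination of the extreme points."""
    betas = [rng.random() for _ in EXTREME]
    s = sum(betas)
    return [sum(b / s * e[i] for b, e in zip(betas, EXTREME)) for i in range(len(THETAS))]


def simulate(n_tosses, true_bias=0.6, seed=0):
    rng = random.Random(seed)
    heads = tails = 0
    history = []
    for n in range(n_tosses + 1):
        if n > 0:  # stage 0 is the prior, before any toss
            if rng.random() < true_bias:
                heads += 1
            else:
                tails += 1
        # Widest gap among conditionalized extreme points: a bound for all of C.
        widest = max(tv(posterior(p, heads, tails), posterior(q, heads, tails))
                     for p in EXTREME for q in EXTREME)
        # A dynamically incoherent agent who re-picks her "prior" from C each stage.
        drifting = posterior(random_member_of_C(rng), heads, tails)
        coherent = posterior(EXTREME[2], heads, tails)
        history.append((n, widest, tv(drifting, coherent)))
    return history


for n, widest, gap in simulate(2000)[::500]:
    print(n, round(widest, 6), round(gap, 6))
```

Both columns shrink towards 0 as the tosses accumulate, and the drifting agent's gap never exceeds the extreme-point bound, even though her successive opinions are not linked by conditionalization.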
Interpreting merging results as claims about agents' unconditional probabilities through time, it is sufficient for achieving asymptotic consensus that each posterior be a conditionalization of some element of $C$. By the results of this paper, this point can be extended to Jeffrey conditioning. For agents with priors contained within a convex polytope $C$ of mutually absolutely continuous probabilities, as long as they take posterior probabilities in the set that results from Jeffrey updating the elements of $C$ under the conditions of Theorem 3, they will achieve consensus. That is, each agent is free to choose her posterior arbitrarily from the set of Jeffrey updates of elements of $C$ without surrendering the assurance of long-run consensus. For the purpose of achieving consensus with elements of $C$, an agent's posterior need not result from Jeffrey updating her prior; it suffices that it be obtained as the result of Jeffrey updating some prior in $C$.

Analogous points hold for convergence. As Schervish and Seidenfeld point out, a consequence of their results is that convergence to the truth does not require conditionalization. Similarly, it follows from Proposition 2 that convergence to a maximally informed opinion does not demand Jeffrey conditioning. Once again, it suffices that an agent's posterior be in the set of Jeffrey updates of elements of $C$.

While Schervish and Seidenfeld's point that dynamic coherence is not necessary for convergence is certainly correct, and while the results discussed in this paper suffice to make this point, the same moral can be gleaned in an even more straightforward way. For example, let $\{f_n : n \in \mathbb{N}\}$ be any sequence of "updates" of $\mathbb{Q}$ that satisfies
$$f_n[A](\omega) \in \left\{x \in \mathbb{R} : \left|x - \mathbb{Q}_{\mathcal{F}_n}[A](\omega)\right| < 1/n\right\}$$
for all $n \in \mathbb{N}$, all $A \in \mathcal{F}$, and $\mathbb{Q}$-almost every $\omega \in \Omega$. Then $f_n[A]$ converges to the truth for all $A \in \mathcal{F}$, $\mathbb{Q}$-almost surely. But $\{f_n : n \in \mathbb{N}\}$ need not be a dynamically coherent sequence of updates.
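The example can be simulated directly. The Python sketch below assumes, for illustration only, a particular prior $\mathbb{Q}$ (an equal-weight mixture of i.i.d. coins with biases 0.3 and 0.6) and takes $A$ to be the event that the limiting frequency of heads is 0.6; the function names are hypothetical. Each $f_n[A]$ is drawn at random strictly within $1/n$ of $\mathbb{Q}_{\mathcal{F}_n}[A]$ (and kept inside $[0, 1]$), so successive values bear no coherent relation to one another.

```python
import math
import random

# Illustrative assumption: Q is an equal-weight mixture of i.i.d. coins with bias
# 0.3 or 0.6, and A is the event "the limiting frequency of heads is 0.6".
THETAS = (0.3, 0.6)
PRIOR = (0.5, 0.5)


def q_cond_A(heads, tails):
    """Q[A | F_n]: the posterior weight of bias 0.6 after the observed tosses."""
    logs = [math.log(w) + heads * math.log(t) + tails * math.log(1 - t)
            for w, t in zip(PRIOR, THETAS)]
    top = max(logs)
    unnorm = [math.exp(x - top) for x in logs]
    return unnorm[1] / sum(unnorm)


def perturbed_updates(n_tosses, true_bias=0.6, seed=1):
    """Pairs (Q[A | F_n], f_n[A]), with f_n[A] an arbitrary value strictly within
    1/n of Q[A | F_n], clipped to [0, 1] (clipping only moves it closer)."""
    rng = random.Random(seed)
    heads = tails = 0
    out = []
    for n in range(1, n_tosses + 1):
        if rng.random() < true_bias:
            heads += 1
        else:
            tails += 1
        q = q_cond_A(heads, tails)
        f = min(1.0, max(0.0, q + rng.uniform(-0.999, 0.999) / n))
        out.append((q, f))
    return out


pairs = perturbed_updates(3000)
print(pairs[0], pairs[-1])
```

When the data come from the 0.6 coin, so that $\mathbf{1}_A = 1$, the perturbed values are forced towards 1 along with $\mathbb{Q}_{\mathcal{F}_n}[A]$, because the allowed window shrinks like $1/n$; nothing in the construction ties $f_{n+1}$ to $f_n$ by any updating rule.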
In view of simple examples of this kind, it is not clear to us that there is an especially interesting or significant connection between dynamic coherence and the asymptotics of sets of probabilities. Whether or not one finds such a connection, it is clear that we need not impose full dynamic coherence in order to achieve the desirable epistemic ends of convergence and merging. Implicit in Schervish and Seidenfeld's focus on these sorts of asymptotic properties, however, is the idea that we may not be able to relax dynamic coherence in this way and retain just any good-making property typically associated with it. For example, if one is compelled by dynamic Dutch books, taking arbitrary values in the set as above will not cut it. And Skyrms has shown that Dutch book arguments can be extended to Jeffrey conditioning (Skyrms, 1987). It bears repeating, though, that the status of dynamic Dutch book arguments is a subject of considerable controversy (e.g., Levi, 1987; Maher, 1992; Skyrms, 1993).

8.2. IP and Inquiry for a Single Agent. Second, if an agent has a credal state properly represented by a set of probability measures, Theorem 2 and Proposition 1 provide sufficient conditions for that indeterminacy to be reduced by evidence in the limit, for both Bayesian conditionalization and Jeffrey conditioning. Standard, precise probability models are often taken as suspect in contexts of "deep" or "Knightean" uncertainty, motivating the appeal to more general models of uncertainty (Ellsberg, 1963; Levi, 1974; Walley, 1991). Sets of probability measures provide one very general representation of uncertainty. The elements of such sets can be interpreted as those probabilities that the agent regards as permissible to use in inference and decision making, that is, those probability functions that she has not ruled out for such purposes.
One way of interpreting the classic distinction between decision making under risk and decision making under uncertainty (Luce and Raiffa, 1957, p. 13) is to identify decision making under risk with contexts in which the decision maker has precise probabilities over the relevant events, and decision making under uncertainty with contexts in which her probabilities may be imprecise. Assuming that agents should always have precise probabilities thus reduces decision making under uncertainty to decision making under risk by fiat. But Schervish and Seidenfeld's Theorem 2 provides conditions under which learning, or the acquisition of evidence, will reduce uncertainty in the long run. Our Proposition 1 establishes that certain evidence is not required to reduce uncertainty to risk eventually; such a reduction can be achieved with uncertain evidence as well. Moreover, for an agent with a credal state of the same form as C, Proposition 2 shows that, for any event $A \in \mathcal{F}$, certain evidence is not required to stabilize on a precise opinion in the limit (say, $P_\infty[A] = 0.8$), despite potentially extensive initial indeterminacy about the event (for example, $\{P[A] : P \in C\} = [0.1, 0.9]$). Jeffrey conditioning can bring about such stability even though it may fail to lead to convergence to probabilistic certainty (0 or 1). This is an important difference between the framework in which Schervish and Seidenfeld work and ours. They show that uncertainty can, in fact, be reduced to probabilistic certainty in the limit, while Jeffrey conditioning guarantees only that uncertainty can be reduced to risk, given the appropriate assumptions.

8.3. Pooling and Learning, and Learning, and Learning ... A third upshot reinterprets the previous point in a social setting. Mathematical aggregation frameworks are general and precise settings in which to study ways of forming a consensus or group point of view from a set of potentially diverse points of view.
Aggregation is an important topic in economics (e.g., allocation aggregation), political science (e.g., social choice theory), and statistics (e.g., opinion pooling), and it also finds application throughout decision theory and epistemology. Aggregation methods can be, and often are, interpreted as delivering a "consensus" position among sets of probabilities, beliefs, or preferences. Various concrete recipes for pooling probability judgments have been studied.¹¹ Some of the most extensively studied are ways of averaging probabilities to arrive at a group probability. For example, we could take linear or geometric averages of profiles of individual probabilities, possibly after weighting the probabilities in the profile differently. In general, pooling allows us to consider ways of arriving at a "consensus" other than updating on a shared stream of evidence.

¹¹ For a good survey, see Genest and Zidek (1986).

One assumption common throughout the pooling literature is that pooling produces a single "group" or "consensus" probability measure. Yet the standard frameworks have significant limitations. A number of results show that certain sets of desirable aggregation properties cannot be jointly satisfied on pain of triviality or inconsistency. Drawing on work on imprecise probabilities, Stewart and Ojea Quintana (2018b) motivate the use of imprecise probabilities in the context of pooling and generalize the canonical mathematical framework to allow for set-valued pooling functions, so that the group or consensus opinion may take the form of a set of probabilities. In this more general setting, they establish a number of simple possibility results, showing that collections of desirable aggregation properties that cannot be jointly satisfied in precise opinion pooling can be jointly satisfied in opinion pooling with imprecise probabilities.
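Weighted linear and geometric averaging can be sketched in a few lines for probabilities on a finite outcome space. The function names and example numbers below are illustrative assumptions. The geometric pool is renormalized to sum to one and assumes strictly positive probabilities.

```python
import math


def linear_pool(profile, weights):
    """Weighted arithmetic average of probability vectors.

    `profile` is a list of probability vectors over the same finite outcomes;
    `weights` are assumed nonnegative and summing to one."""
    n_outcomes = len(profile[0])
    return [sum(w * p[i] for w, p in zip(weights, profile)) for i in range(n_outcomes)]


def geometric_pool(profile, weights):
    """Weighted geometric average, renormalized to sum to one.

    Assumes every entry of every vector is strictly positive."""
    n_outcomes = len(profile[0])
    unnormalized = [
        math.prod(p[i] ** w for w, p in zip(weights, profile)) for i in range(n_outcomes)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

With equal weights on (0.8, 0.1, 0.1) and (0.1, 0.1, 0.8), the linear pool is (0.45, 0.1, 0.45), while the renormalized geometric pool gives the middle outcome about 0.15, more than either individual assigns to it.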
Stewart and Ojea Quintana also characterize a distinguished format of pooling with imprecise probabilities (Stewart and Ojea Quintana, 2018b, Proposition 6): the function that returns the convex hull of any profile of probabilities to be aggregated.¹² The point of departure for concern with the convex hull (and with the use of imprecise probabilities in the context of pooling more generally) is an essay by Isaac Levi (1985). There, Levi distinguishes between consensus as shared agreement, which is available at the outset of inquiry by retaining agreements and suspending judgment on other matters, and consensus as the outcome of inquiry, which emerges when evidence resolves initial disputes. About the role of the convex hull, Levi writes:

"a potential resolution of the conflict between rival credal probability distributions is to be represented by a credal probability distribution which is the weighted average of the distributions in conflict. Hence, the set of all potential resolutions of such a conflict is to be represented by the convex hull (the set of all weighted averages) of the credal distributions initially in contention. My assumption was that this convex set of probability distributions represented the first kind of consensus I regard as important, consensus as shared agreements regarding probability judgment" (Levi, 1985, p. 6).

On this view, a pooling function that returns the convex hull can be regarded as delivering a consensus position at the outset of inquiry. This view of opinion pooling is very different from the one typically presented. One potential complaint about it is that, in many cases, it would identify rather weak consensus positions. According to Levi, this weakness is just what consensus at the outset of inquiry requires: by suspending judgment between conflicting points of view, it avoids begging questions. But that is not the end of his story. Weak points of view can be strengthened through inquiry.
Levi writes:

"if we adopt a consensus as shared agreement on credal probabilities before the acquisition of evidence, we may hope to obtain new data via experimentation and observation which will yield a consensus which resolves the original dispute via inquiry. In typical cases, ample data will lead via Bayes's theorem and conditionalization to a reduction in the indeterminacy in the state of credal probability judgment. Consensus as the outcome of inquiry will be more determinate than the consensus as shared agreements adopted at the outset of inquiry" (Levi, 1985, p. 10, emphasis ours).

¹² The framework in which convex IP pooling is characterized is general, however, subsuming any mapping from a profile of probabilities to a set of probabilities.

One could see merging results as validating Levi's claim here. Under certain conditions, conditionalization does lead to a reduction in indeterminacy. And, happily enough, those conditions are met when the initial consensus position is formed as the convex hull of a finite profile of mutually absolutely continuous probability functions. Furthermore, our Proposition 1 shows that the desired reduction of indeterminacy is not limited to Bayesian conditionalization: under a few further assumptions, Jeffrey conditioning too reduces indeterminacy uniformly as evidence accumulates. At the outset of inquiry, we can "pool" opinions by identifying the group consensus with the convex hull of individual opinions. This weak initial position can subsequently be uniformly strengthened through inquiry into a consensus at the outcome of inquiry, as uncertainty is reduced to simple risk. Given our observations in this essay, Levi's picture of consensus appears to be robust to the choice of updating method.
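The pool-then-learn picture can be illustrated under strong simplifying assumptions: a coin whose bias is one of three hypothetical values, a hypothetical profile of three strictly positive (hence mutually absolutely continuous) priors over those values, and their convex hull as the initial consensus. Conditioning a mixture yields a mixture of the conditioned components, and total variation distance is jointly convex. So the largest pairwise distance among the conditioned profile members bounds the distance between any two conditioned members of the hull. Because each prior predicts future tosses by mixing i.i.d. coins, the same number also bounds their disagreement about future tosses. This is a sketch, not an implementation of Levi's proposal or of the results of this paper, and all names in it are hypothetical.

```python
import math

# Hypothetical setup (assumptions for illustration only).
BIASES = [0.2, 0.5, 0.8]  # candidate coin biases
PROFILE = [  # individual priors over BIASES; all entries > 0
    [0.6, 0.3, 0.1],
    [0.1, 0.3, 0.6],
    [1 / 3, 1 / 3, 1 / 3],
]


def mix(weights, dists):
    """A member of the convex hull of `dists` (weights nonnegative, summing to one)."""
    return [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(len(dists[0]))]


def posterior(prior, heads, tails):
    """Condition a prior over BIASES on a record of heads and tails (Bayes' rule in log space)."""
    logs = [
        math.log(p) + heads * math.log(b) + tails * math.log(1 - b)
        for p, b in zip(prior, BIASES)
    ]
    top = max(logs)
    unnorm = [math.exp(x - top) for x in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def tv(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def spread_bound(heads, tails):
    """Largest pairwise TV distance among the conditioned profile members.

    A conditioned hull member is a mixture of the conditioned profile members,
    and TV is jointly convex, so this bounds the TV distance between any two
    conditioned hull members."""
    posts = [posterior(p, heads, tails) for p in PROFILE]
    return max(tv(p, q) for p in posts for q in posts)
```

With no data the bound is 0.5, reflecting the weak initial consensus; after a record of 400 heads and 100 tails it is negligible, since every member of the hull concentrates on the bias 0.8.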
Proponents of Jeffrey conditioning would reasonably balk at the account if the reduction of indeterminacy were a mere artifact of a learning rule that, in their view, fails to capture many learning experiences. We have shown that this is not the case. It is also worth emphasizing again that any pooling function that outputs a subset of the convex hull of the profile presents a consensus position that is subject to all of the merging and convergence results for sets mentioned in this essay. If Levi is right that the convex hull represents "the set of all potential resolutions" of the conflict in probability judgments, and reasonable pooling functions take values in the power set of the set of such resolutions, then the merging and convergence results discussed here hold for the class of reasonable pooling functions. Merging results, we submit, constitute a partial response to complaints about using imprecise probabilities to identify a consensus at the outset of inquiry in the context of pooling.

Appendix: Proofs

Proof of Proposition 1

Proof. Let $P^1, \dots, P^k$ be the extreme points that generate C. We show that for almost every $\omega \in \Omega$,
$$\sup_{P,Q \in C} d(P_n(\omega), Q_n(\omega)) \to 0 \quad \text{as } n \to \infty.$$
Note that all of our "almost surely" qualifications hold for every $P \in C$ by the mutual absolute continuity assumption. Let $P, Q \in C$ be arbitrary. Following Huttegger (2015a, Appendix), we note that for all $\omega \in \Omega$ the Jeffrey conditioning equation (6) can be written as
$$P_n[\,\cdot\,](\omega) = \int P[\,\cdot \mid \mathcal{F}_n] \, dP_n(\omega). \tag{7}$$
Using (7) and our assumption that $P_n(\omega) = P^1_n(\omega) = Q_n(\omega)$ for almost every $\omega$, we have
$$d(P_n(\omega), Q_n(\omega)) \le \int d(P_{\mathcal{F}_n}, Q_{\mathcal{F}_n}) \, dP^1_n(\omega) = \int d(P_{\mathcal{F}_n}, Q_{\mathcal{F}_n}) \, \frac{dP^1_n(\omega)}{dP^1} \, dP^1 \tag{8}$$
for almost all $\omega$. The function $dP^1_n(\omega)/dP^1$ is the Radon–Nikodym derivative of $P^1_n(\omega)$ with respect to $P^1$ on the measurable space $(\Omega, \mathcal{F}_n)$, which is guaranteed to exist for almost every $\omega$ because $P^1_n(\omega)$ is absolutely continuous with respect to $P^1$ for almost every $\omega$.
Note that $d(P_{\mathcal{F}_n}, Q_{\mathcal{F}_n}) \le \max_{i,j} d(P^i_{\mathcal{F}_n}, P^j_{\mathcal{F}_n})$ almost surely, because $P_{\mathcal{F}_n}$ and $Q_{\mathcal{F}_n}$ are (almost surely) convex combinations of $P^1_{\mathcal{F}_n}, \dots, P^k_{\mathcal{F}_n}$ (Schervish and Seidenfeld, 1990). Therefore, for almost every $\omega$,
$$\int d(P_{\mathcal{F}_n}, Q_{\mathcal{F}_n}) \, \frac{dP^1_n(\omega)}{dP^1} \, dP^1 \le \int \max_{i,j} d(P^i_{\mathcal{F}_n}, P^j_{\mathcal{F}_n}) \, \frac{dP^1_n(\omega)}{dP^1} \, dP^1.$$
From this inequality and (8) we have
$$d(P_n(\omega), Q_n(\omega)) \le \int \max_{i,j} d(P^i_{\mathcal{F}_n}, P^j_{\mathcal{F}_n}) \, \frac{dP^1_n(\omega)}{dP^1} \, dP^1 \tag{9}$$
for almost every $\omega$. Since (9) holds for all $P, Q \in C$, it also holds upon taking a supremum over $P, Q \in C$. Hence, for almost all $\omega$,
$$\sup_{P,Q \in C} d(P_n(\omega), Q_n(\omega)) \le \int \max_{i,j} d(P^i_{\mathcal{F}_n}, P^j_{\mathcal{F}_n}) \, \frac{dP^1_n(\omega)}{dP^1} \, dP^1. \tag{10}$$
We conclude the proof with arguments similar to Huttegger's, referring the reader to his Appendix for more details. First, (M′) implies that the sequence $\{dP^1_n(\omega)/dP^1\}$ is a nonnegative martingale with respect to $P^1$ for almost every $\omega$, and hence converges almost surely to a finite limit for almost every $\omega$. Theorem 2 implies that $\max_{i,j} d(P^i_{\mathcal{F}_n}, P^j_{\mathcal{F}_n}) \to 0$ almost surely. Therefore, for almost every $\omega$,
$$\max_{i,j} d(P^i_{\mathcal{F}_n}, P^j_{\mathcal{F}_n}) \, \frac{dP^1_n(\omega)}{dP^1} \to 0 \quad \text{almost surely as } n \to \infty.$$
Using this fact, we have
$$\lim_{n \to \infty} \int \max_{i,j} d(P^i_{\mathcal{F}_n}, P^j_{\mathcal{F}_n}) \, \frac{dP^1_n(\omega)}{dP^1} \, dP^1 = \int \lim_{n \to \infty} \max_{i,j} d(P^i_{\mathcal{F}_n}, P^j_{\mathcal{F}_n}) \, \frac{dP^1_n(\omega)}{dP^1} \, dP^1 = 0 \tag{11}$$
for almost every $\omega$. The passage of the limit under the integral is justified by the fact that $dP^1_n(\omega)/dP^1$ is uniformly integrable for almost every $\omega$, which in turn implies that $\max_{i,j} d(P^i_{\mathcal{F}_n}, P^j_{\mathcal{F}_n}) \, dP^1_n(\omega)/dP^1$ is uniformly integrable for almost every $\omega$, as it is dominated by $dP^1_n(\omega)/dP^1$ (the distance $d$ being bounded by 1). That $dP^1_n(\omega)/dP^1$ is uniformly integrable for almost every $\omega$ is equivalent to our assumption that $P^1_n(\omega)$ is uniformly absolutely continuous with respect to $P^1$ for almost every $\omega$. See Huttegger (2015a, Proof of Theorem 9.2) for more on this point. Finally, (10) and (11) imply $\sup_{P,Q \in C} d(P_n(\omega), Q_n(\omega)) \to 0$ as $n \to \infty$ for almost every $\omega$, as desired. □

Jeffrey Conditioning Does Not Preserve Convexity

Proposition 3. Let C be as in Proposition 1.
The result of Jeffrey conditioning all elements of C on a common posterior need not be a convex set.

Proof. We sketch a counterexample. Let $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, and consider the following two probabilities.

Table 1. Priors

         ω1      ω2      ω3      ω4
    P    1/4     1/4     1/4     1/4
    Q    1/8     1/2     1/4     1/8

Set $C = \mathrm{conv}\{P, Q\}$. Let $\mathcal{E} = \{E_1, E_2\}$, with $E_1 = \{\omega_1, \omega_2\}$ and $E_2 = \{\omega_3, \omega_4\}$, be a partition of $\Omega$. Jeffrey updating both P and Q using $\hat{P}$, where $\hat{P}(E_1) = 2/3$ and $\hat{P}(E_2) = 1/3$, we obtain the following posteriors.

Table 2. Posteriors

         ω1      ω2      ω3      ω4
    P₁   1/3     1/3     1/6     1/6
    Q₁   2/15    8/15    2/9     1/9

Let $C^{\mathcal{E}}_{\hat{P}}$ be the result of Jeffrey updating each element of C on the common posterior $\hat{P}$ on the partition $\mathcal{E}$. To establish the claim, it suffices to find some $\alpha \in [0, 1]$ such that $\alpha R_1 + (1 - \alpha) R'_1 \notin C^{\mathcal{E}}_{\hat{P}}$ for some $R, R' \in C$ (since clearly $R_1, R'_1 \in C^{\mathcal{E}}_{\hat{P}}$). Let $\alpha = 4/9$ and consider $\alpha P_1 + (1 - \alpha) Q_1$. Suppose for reductio that $C^{\mathcal{E}}_{\hat{P}}$ is convex. Then $\alpha P_1 + (1 - \alpha) Q_1 \in C^{\mathcal{E}}_{\hat{P}}$. This implies that there are some $\beta \in [0, 1]$ and $R^* \in C$ such that $\beta P + (1 - \beta) Q = R^*$ and $R^*_1 = \alpha P_1 + (1 - \alpha) Q_1$. From Table 2, $\alpha P_1(\{\omega_3\}) + (1 - \alpha) Q_1(\{\omega_3\}) = 16/81$. By the definition of Jeffrey conditioning and our assumptions, we have
$$\frac{16}{81} = \sum_{j=1,2} \hat{P}(E_j) \, R^*(\{\omega_3\} \mid E_j) = \left(\frac{2}{3}\right) \frac{R^*(\{\omega_3\} \cap E_1)}{R^*(E_1)} + \left(\frac{1}{3}\right) \frac{R^*(\{\omega_3\} \cap E_2)}{R^*(E_2)}.$$
Since $\{\omega_3\} \cap E_1 = \emptyset$, the first summand on the right-hand side is 0. By algebra and the definition of $R^*$, we obtain
$$\frac{16}{81} = \left(\frac{1}{3}\right) \frac{\beta P(\{\omega_3\}) + (1 - \beta) Q(\{\omega_3\})}{\beta P(E_2) + (1 - \beta) Q(E_2)}.$$
Substituting values from Table 1, the right-hand side equals $2/(3(3 + \beta))$; solving for $\beta$, we have $\beta = 3/8$. However, $\alpha P_1(\{\omega_1\}) + (1 - \alpha) Q_1(\{\omega_1\}) = 2/9$. This implies that $R^*_1(\{\omega_1\}) = 2/9$. But for $R^* = \beta P + (1 - \beta) Q$ with $\beta = 3/8$, this is not the case. Again using the definition of Jeffrey conditioning and our assumptions, $R^*(\{\omega_1\}) = 11/64$ and $R^*(E_1) = 37/64$, so $R^*_1(\{\omega_1\}) = (2/3)(11/37) = 22/111 < 2/9$. It follows that there are no $\beta \in [0, 1]$ and $R^* \in C$ meeting the requirements stated above. □

References

Belot, G. (2013). Bayesian orgulity. Philosophy of Science 80 (4), 483–503.
Billingsley, P. (2008). Probability and Measure. John Wiley & Sons.
Blackwell, D. and L. E. Dubins (1962). Merging of opinions with increasing information. The Annals of Mathematical Statistics 33 (3), 882–886.
Blackwell, D. and L. E. Dubins (1975). On existence and non-existence of proper, regular, conditional distributions. The Annals of Probability 3 (5), 741–752.
Cisewski, J., J. Kadane, M. Schervish, T. Seidenfeld, and R. Stern (2017). Standards for modest Bayesian credences. Philosophy of Science 85 (1) (2018).
Diaconis, P. and S. L. Zabell (1982). Updating subjective probability. Journal of the American Statistical Association 77 (380), 822–830.
Earman, J. (1992). Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. Cambridge, MA: MIT Press.
Elkin, L. and G. Wheeler (2018). Resolving peer disagreements through imprecise probabilities. Noûs 52 (2), 260–278.
Ellsberg, D. (1963). Risk, ambiguity, and the Savage axioms. The Quarterly Journal of Economics 77 (2), 327–336.
Gaifman, H. and M. Snir (1982). Probabilities over rich languages, testing and randomness. The Journal of Symbolic Logic 47 (3), 495–548.
Genest, C. and J. V. Zidek (1986). Combining probability distributions: A critique and an annotated bibliography. Statistical Science 1 (1), 114–135.
Huttegger, S. M. (2013). In defense of reflection. Philosophy of Science 80 (3), 413–433.
Huttegger, S. M. (2015a). Bayesian convergence to the truth and the metaphysics of possible worlds. Philosophy of Science 82 (4), 587–601.
Huttegger, S. M. (2015b). Merging of opinions and probability kinematics. The Review of Symbolic Logic 8 (4), 611–648.
Kalai, E. and E. Lehrer (1994). Weak and strong merging of opinions. Journal of Mathematical Economics 23 (1), 73–86.
Kelly, K. T. (1996). The Logic of Reliable Inquiry. Oxford University Press.
Levi, I. (1974). On indeterminate probabilities. The Journal of Philosophy 71 (13), 391–418.
Levi, I. (1985). Consensus as shared agreement and outcome of inquiry. Synthese 62 (1), 3–11.
Levi, I. (1987). The demons of decision. The Monist 70 (2), 193–211.
Luce, R. D. and H. Raiffa (1957). Games and Decisions: Introduction and Critical Survey. Courier Dover Publications.
Maher, P. (1992). Diachronic rationality. Philosophy of Science 59 (1), 120–141.
Savage, L. (1972, originally published in 1954). The Foundations of Statistics. New York: John Wiley and Sons.
Schervish, M. and T. Seidenfeld (1990). An approach to consensus and certainty with increasing evidence. Journal of Statistical Planning and Inference 25 (3), 401–414.
Seidenfeld, T. (2001). Remarks on the theory of conditional probability: Some issues of finite versus countable additivity. In V. F. Hendricks (Ed.), Probability Theory: Philosophy, Recent History and Relations to Science. Synthese Library, Kluwer.
Škulj, D. (2006). Jeffrey's conditioning rule in neighbourhood models. International Journal of Approximate Reasoning 42 (3), 192–211.
Skyrms, B. (1987). Dynamic coherence and probability kinematics. Philosophy of Science 54 (1), 1–20.
Skyrms, B. (1993). A mistake in dynamic coherence arguments? Philosophy of Science 60 (2), 320–328.
Skyrms, B. (1996). The structure of radical probabilism. Erkenntnis 45 (2–3), 285–297.
Stewart, R. T. and I. Ojea Quintana (2018a). Learning and pooling, pooling and learning. Erkenntnis 83 (3), 369–389.
Stewart, R. T. and I. Ojea Quintana (2018b). Probabilistic opinion pooling with imprecise probabilities. Journal of Philosophical Logic 47 (1), 17–45.
Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. London: Chapman and Hall.