Belief revision generalized: A joint characterization of Bayes's and Jeffrey's rules

Franz Dietrich, Christian List, and Richard Bradley [1]

October 2010, final version December 2015

Preprint of an article in Journal of Economic Theory 162: 352-371, 2016. Official publication: https://www.sciencedirect.com/science/article/pii/S0022053115002008. For extensions to other revision rules (Adams's rule and the dual-Jeffrey rule), see the earlier version:

Abstract

We present a general framework for representing belief-revision rules and use it to characterize Bayes's rule as a classical example and Jeffrey's rule as a non-classical one. In Jeffrey's rule, the input to a belief revision is not simply the information that some event has occurred, as in Bayes's rule, but a new assignment of probabilities to some events. Despite their differences, Bayes's and Jeffrey's rules can be characterized in terms of the same axioms: responsiveness, which requires that revised beliefs incorporate what has been learnt, and conservativeness, which requires that beliefs on which the learnt input is "silent" do not change. To illustrate the use of non-Bayesian belief revision in economic theory, we sketch a simple decision-theoretic application.

Keywords: Belief revision, subjective probability, Bayes's rule, Jeffrey's rule, axiomatic foundations, fine-grained versus coarse-grained beliefs, unawareness

JEL classification: C73, D01, D80, D81, D83, D90

1 Introduction

A belief-revision rule captures how an agent's subjective probabilities should change when the agent learns something new. The standard example is Bayes's rule. Here, the agent learns that some event has occurred, and the response is to raise the subjective probability of that event to 1, while retaining all probabilities conditional on it. Formally, let $\Omega$ be the underlying set of possible worlds (where $\Omega$ is non-empty and, for simplicity, finite or countably infinite).[2] Subsets of $\Omega$ are called events. Beliefs are represented by some probability measure on the set of all events. Bayes's rule says that, upon learning that some event $B \subseteq \Omega$ has occurred (with $p(B) \neq 0$), one should move from the prior probability measure $p$ to the posterior probability measure $p'$ given by

\[ p'(A) = p(A \mid B) \quad \text{for all events } A \subseteq \Omega. \]

[1] F. Dietrich, Paris School of Economics & CNRS; C. List, London School of Economics; R. Bradley, London School of Economics. We are grateful for comments from the editors and referees and from audiences at D-TEA 2010 (HEC & Polytechnique, Paris, June 2010), LSE's Choice Group Seminar (LSE, September 2010), Pluralism in the Foundations of Statistics (University of Kent, September 2010), and Decisions, Games, and Logic 2012 (University of Munich, June 2012). Although this paper is jointly authored, List and Bradley wish to note that the bulk of the mathematical credit should go to Dietrich. Dietrich was supported by a Ludwig Lachmann Fellowship at the LSE and the French Agence Nationale de la Recherche (ANR-12-INEG-0006-01). List was supported by a Leverhulme Major Research Fellowship (MRF-2012-100) and the Franco-Swedish Program in Philosophy and Economics (via a visit to the University of Uppsala). Bradley was supported by the Arts and Humanities Research Council (via a grant on Managing Severe Uncertainty, AH/J006033/1).

[2] We expect that our results can be generalized to a set $\Omega$ of arbitrary cardinality.

In economic theory, belief changes are almost always modelled in this way. The aim of this paper is to draw attention to a more general form of belief revision, which is seldom discussed in economics. We develop a general framework in which different belief-revision rules, Bayesian and non-Bayesian, can be characterized. In this framework, the key difference between different belief-revision rules lies in what they take to be the input prompting the agent's belief change.
Under Bayes's rule, the learnt input is always the occurrence of some event, but this is more restrictive than often recognized, and we show that there is scope for useful generalization.

We begin with an example that will resonate with any international traveller. Kotaro, a junior academic from Japan, is applying for a faculty position in the UK. After his interview, he is telephoned by Terence, the chair of the department, to inform him of the outcome. Kotaro gets a vague impression that he is being offered the job, but struggles to understand Terence's thick Irish accent. At the end of the call, he is still unsure whether he has received an offer. He becomes convinced only after a subsequent conversation with another member of the department.

This example illustrates an instance of belief revision triggered by a noisy signal. Before the telephone conversation with the chair of the department, Kotaro attaches a very low probability to the event of getting the job. After the conversation, he attaches a somewhat higher probability to it, but one that still falls short of certainty. For this, a second conversation (with another person) is needed.

Such cases present challenges to the Bayesian modeller. If the first change in Kotaro's probabilities is to be modelled as an application of Bayes's rule, then it will clearly not suffice to restrict attention to the "naïve" set of possible worlds $\Omega = \{\text{appointed}, \text{not appointed}\}$. Relative to that "naïve" set, a Bayesian belief change could never increase the probability of the event "appointed" without raising it all the way to 1. The modeller will need to enrich the set $\Omega$ to capture the possible sensory experiences responsible for Kotaro's shift in probabilities over the events "appointed" and "not appointed". So, the enriched set of worlds will have to be something like

\[ \Omega' = \{\text{appointed}, \text{not appointed}\} \times \mathcal{A}, \]

where $\mathcal{A}$ is the set of possible analogue auditory signals received by Kotaro's eardrums. The signal he receives from Terence will then correspond to some subset of $\Omega'$, specifically one of the form $B = \{\text{appointed}, \text{not appointed}\} \times A$, where $A \subseteq \mathcal{A}$ is a particular auditory event. Not much less than this representation will do. Even replacing $\mathcal{A}$ with a smaller set of possible verbal messages would not suffice, since Terence's words are subject to a triple distortion: by his thick accent, by imperfections of the telephone line, and by the interpretation of a non-native speaker.

Enriching the set $\Omega$ in this way, however, has definite modelling costs. First, the agent (Kotaro) almost certainly does not have prior subjective probabilities over the events from such a rich set. In light of the huge range of possible signals, the set $\Omega'$ is of dizzying size and complexity, when compared to the naïve set $\Omega$ on which the agent's attention is initially focused. Second, it is doubtful that before the conversation he is even aware of the possibility of such complex auditory signals (probably he has never heard, or even imagined, an accent like Terence's).

So, a Bayesian model of this story, and others like it, must inevitably involve a heavy dose of fiction. It ascribes to the agent greater prior opinionation (ability to assign prior probabilities) and greater awareness (conceptualization or consideration of events) than is psychologically plausible. In a similar vein, Diaconis and Zabell (1982, p. 823) have called the assignment of prior subjective probabilities to "many classes of sensory experiences [...] forced, unrealistic, or impossible" (see also Jeffrey 1957 and Shafer 1981). Of course, whether this is a problem or not will depend on the uses to which the model is put; we are not denying the usefulness of "as-if" modelling in all cases.
But good scientific practice should encourage us to investigate whether other belief-revision rules are better at capturing cases like the present one and how these other rules relate to Bayes's rule. This is what motivates this paper.

One of the most prominent generalizations of Bayes's rule is Jeffrey's rule (e.g., Jeffrey 1957, Shafer 1981, Diaconis and Zabell 1982, Grünwald and Halpern 2003). Here, the agent learns a new probability of some event, for instance a 20% probability of an accident or a 75% probability of a job offer, as perhaps in the example of Kotaro, the junior academic. More generally, the agent learns a new probability distribution of some random variable such as the level of rainfall or GDP. The response, then, is to assign the new distribution to that random variable, while retaining all probabilities conditional on it. Formally, let $\mathcal{B}$ be a partition of the set $\Omega$ into finitely many non-empty events, and suppose the agent learns a new probability $\pi_B$ for each event $B$ in $\mathcal{B}$. The family of learnt probabilities, $(\pi_B)_{B \in \mathcal{B}}$, is a probability distribution over $\mathcal{B}$ (i.e., consists of non-negative numbers with sum-total 1). Jeffrey's rule says that, upon learning $(\pi_B)_{B \in \mathcal{B}}$, one should move from the prior probability measure $p$ to the posterior probability measure $p'$ given by

\[ p'(A) = \sum_{B \in \mathcal{B}} p(A \mid B)\, \pi_B \quad \text{for all events } A \subseteq \Omega. \text{[3]} \]

For instance, suppose the agent learns that it will rain with probability $\frac{1}{2}$, snow with probability $\frac{1}{3}$, and remain dry with probability $\frac{1}{6}$. Then the partition $\mathcal{B}$ (of a suitable set $\Omega$) contains the events of rain ($R$), snow ($S$), and no precipitation ($N$), where $\pi_R = \frac{1}{2}$, $\pi_S = \frac{1}{3}$, and $\pi_N = \frac{1}{6}$. Bayes's rule is the special case where $\mathcal{B}$ partitions $\Omega$ into an event $B$ and its complement $\overline{B}$, with $\pi_B = 1$ and $\pi_{\overline{B}} = 0$. (The complement of any event $B$ is $\overline{B} = \Omega \setminus B$.)

The framework we develop allows us to define and characterize different belief-revision rules. What is being learnt by the agent can take a variety of forms; we call this the learnt input.
It can be interpreted as the constraint that a particular experience, say the receipt of some signal, imposes on the agent's beliefs. Examples of learnt inputs are event occurrences for Bayes's rule and learnt probability distributions for Jeffrey's. But, in principle, the learnt inputs could also be very different, such as new conditional probabilities of certain events.

We show that, despite their differences, Bayes's and Jeffrey's rules can be characterized in terms of the same two axioms, simply applied to different domains of learnt inputs. Our axioms are (i) a responsiveness axiom, which requires that revised beliefs be consistent with the learnt input, and (ii) a conservativeness axiom, which requires that those beliefs on which the input is "silent" (in a sense to be made precise) do not change. The fact that one of the most prominent non-classical belief-revision rules, namely Jeffrey's rule, can be justified in complete analogy to Bayes's rule should assuage some economists' worry that non-Bayesian rules automatically involve costly departures from compelling principles of belief revision. Our characterization can be extended to other non-classical belief-revision rules too, but for expositional simplicity, we set these aside in this paper.

[3] For $p'$ to be well-defined, we must have $\pi_B = 0$ whenever $p(B) = 0$. This ensures that if a term $p(A \mid B)$ is undefined in the displayed formula (because $p(B) = 0$), then this term does not matter (because it is multiplied by $\pi_B = 0$).

We hope that our framework will inspire further work on economic applications, and behavioural implications, of non-Bayesian forms of belief revision. To suggest some steps in this direction, we conclude the paper with a discussion of how Jeffrey revision may be introduced into decision and game theory, especially to capture "unforeseen learning"; we also briefly revisit the issue of unawareness.[4]

Prior literature.
Bayes's and Jeffrey's rules have been axiomatically characterized in previous work, but the existing approaches are less unified than ours. One approach is a "distance-based" one. This consists in showing that a given belief-revision rule generates posterior beliefs that incorporate the information learnt, while deviating as little as possible from prior beliefs, relative to some notion of "distance" between beliefs.[5] Bayes's and Jeffrey's rules have been characterized relative to either the variation distance (defined by the maximal absolute difference in probability, over all events in the algebra), the Hellinger distance, or the relative-entropy "distance" (e.g., Csiszar 1967, 1977, van Fraassen 1981, Diaconis and Zabell 1982, Grünwald and Halpern 2003). The third notion of distance does not define a proper metric, as it is asymmetric in its two arguments. These results, however, do not generally carry over to other belief-revision rules without changing the distance metric,[6] and the interpretation of different distance metrics is often controversial.

Another approach to characterizing belief-revision rules invokes the idea of "rigidity" rather than distance-minimization (e.g., Jeffrey 1957 and Bradley 2005 for some extensions). For example, Bayesian belief revision is "rigid" in the sense of preserving the conditional probability of any event, given the learnt event. Although this approach is closer in spirit to ours, it also lacks unification (since the notion of rigidity is not fully general). Still, one might interpret our present conservativeness axiom as a unified alternative to earlier rigidity axioms, applicable to any belief-revision rule.

For an overview of various forms of probabilistic belief and belief revision, we refer the reader to Halpern's textbook (2003). Since we here deal exclusively with beliefs that are represented by subjective probability measures, we set aside the literature on the revision of beliefs that do not take this form.
[4] For a brief discussion of dynamic-consistency arguments for Jeffrey's rule, similar to classic dynamic-consistency arguments for Bayes's rule, see Vineberg (2011).

[5] Distance is represented by a function $d : \mathcal{P} \times \mathcal{P} \to \mathbb{R}$, where $\mathcal{P}$ is the set of all probability measures over the events from $\Omega$ and $d(p, p')$ is interpreted as the distance between two such measures, $p, p' \in \mathcal{P}$.

[6] For example, Douven and Romeijn (2011) characterize another belief-revision rule, Adams's rule (which we briefly discuss in Section 5), by invoking a different measure of distance, the inverse relative-entropy distance, which differs from ordinary relative-entropy distance in the inverted order of its arguments.

2 A general framework

We can study belief revision in general by specifying (i) a set $\mathcal{P}$ of possible belief states in which a given agent can be, and (ii) a set $\mathcal{I}$ of possible inputs which can influence the agent's belief state. A revision rule maps pairs $(p, I)$ of an initial belief state $p$ in $\mathcal{P}$ and an input $I$ in $\mathcal{I}$ to a new belief state $p' = p_I$ in $\mathcal{P}$. The pair $(p, I)$ belongs to some domain $\mathcal{D} \subseteq \mathcal{P} \times \mathcal{I}$ containing those belief-input pairs that are admissible under the given revision rule. A revision rule is thus a function from $\mathcal{D}$ to $\mathcal{P}$ (see also Dietrich 2012).

Since we focus on beliefs that take the form of subjective probability measures, the set $\mathcal{P}$ of possible belief states is the set of all probability measures over the events from $\Omega$. Formally, a probability measure is a countably additive function $p : 2^{\Omega} \to [0, 1]$ with $p(\Omega) = 1$ (where $\Omega$ is the underlying finite or countably infinite set of worlds).

How can we define the set $\mathcal{I}$ of possible inputs? Looking at Bayes's rule alone, one might be tempted to define them as observed events $B \subseteq \Omega$. But Jeffrey's rule and the other rules introduced below permit different inputs, such as a family $(\pi_B)_{B \in \mathcal{B}}$ of learnt probabilities in Jeffrey's case. Methodologically, one should not tie the notion of a "learnt input" too closely to one particular revision rule, by defining it as a mathematical object that is tailor-made for that rule. This would exclude other revision rules from the outset and thereby prevent us from giving a fully compelling axiomatic characterization of the rule in question. Instead, we need an abstract notion of a learnt input.

We define a learnt input as a set of belief states $I \subseteq \mathcal{P}$, interpreted as the set of those belief states that are consistent with the input. We can think of the input $I$ as the constraint that a particular experience, such as the receipt of some signal, imposes on the agent's belief state. The set of logically possible inputs is $\mathcal{I} = 2^{\mathcal{P}}$. Note that this is deliberately general. An agent's belief change from $p$ to $p_I$ upon learning $I \in \mathcal{I}$ is responsive to the input if $p_I \in I$.

We can now define the inputs involved in Bayesian revision and Jeffrey revision.

Definition 1 A learnt input $I \in \mathcal{I}$ is

• Bayesian if $I = \{p' : p'(B) = 1\}$ for some event $B \neq \emptyset$; we then write $I = I_B$;[7]

• Jeffrey if $I = \{p' : p'(B) = \pi_B \text{ for all } B \in \mathcal{B}\}$ for some probability distribution $(\pi_B)_{B \in \mathcal{B}}$ on some partition $\mathcal{B}$; we then write $I = I_{(\pi_B)_{B \in \mathcal{B}}}$.[8]

Here, and in what follows, we use the term "partition" to refer to a partition of $\Omega$ into finitely many non-empty events. Clearly, every Bayesian input is also a Jeffrey input, while the converse is not true.
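Because a learnt input is simply a set of belief states, Definition 1 can be sketched in code as a membership test. The following is a minimal illustration rather than part of the paper's formal apparatus: the three-world set `OMEGA`, the belief states `q1` and `q2`, and all function names are hypothetical, and belief states are stored as dictionaries of exact fractions.

```python
from fractions import Fraction as F

def prob(p, event):
    """Probability of an event (a set of worlds) under belief state p."""
    return sum(p[w] for w in event)

def bayesian_input(B):
    """Membership test for I_B = {p' : p'(B) = 1}."""
    return lambda q: prob(q, B) == 1

def jeffrey_input(cells, weights):
    """Membership test for {p' : p'(B) = pi_B for every cell B of the partition}."""
    return lambda q: all(prob(q, B) == pi_B for B, pi_B in zip(cells, weights))

# Hypothetical three-world example (not from the paper).
OMEGA = {"w1", "w2", "w3"}
B = {"w1", "w2"}

I_B = bayesian_input(B)
# The same input written as a Jeffrey input on {B, complement of B} with weights (1, 0).
I_B_as_jeffrey = jeffrey_input([B, OMEGA - B], [F(1), F(0)])
# A Jeffrey input that is not Bayesian: B gets probability 3/4.
I_three_quarters = jeffrey_input([B, OMEGA - B], [F(3, 4), F(1, 4)])

q1 = {"w1": F(1, 3), "w2": F(2, 3), "w3": F(0)}     # certain of B
q2 = {"w1": F(1, 2), "w2": F(1, 4), "w3": F(1, 4)}  # assigns B probability 3/4

# On these sample belief states the two representations of I_B agree.
agreement = [(I_B(q), I_B_as_jeffrey(q)) for q in (q1, q2)]
```

On this reading, a belief change is responsive exactly when the membership test returns `True` for the revised belief state.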
Our framework also allows us to represent many other kinds of inputs, for instance $I = \{p' : p'(A \cap B) > p'(A)p'(B)\}$, which captures the constraint that the events $A$ and $B$ are positively correlated, or $I = \{p' : p'(A) \geq 9/10\}$, which captures the constraint that $A$ is highly probable. In general, the smaller the set $I$, the stronger (more constraining) the input. The strongest consistent inputs are the singleton sets $I = \{p'\}$, which require adopting the new belief state $p'$ regardless of the initial belief state. The weakest input is the set $I = \mathcal{P}$, which allows the agent to retain his or her old belief state.

[7] The representation $I = I_B$ is unique, because for any Bayesian input $I$, there exists a unique event $B$ such that $I = I_B$.

[8] For any Jeffrey input $I$, the corresponding family $(\pi_B)_{B \in \mathcal{B}}$ is essentially uniquely determined, in the sense that the subpartition $\{B \in \mathcal{B} : \pi_B \neq 0\}$ and the corresponding subfamily $(\pi_B)_{B \in \mathcal{B} : \pi_B \neq 0}$ are unique. The subpartition $\{B \in \mathcal{B} : \pi_B = 0\}$ is sometimes non-unique. Uniqueness can be achieved by imposing the convention that $|\{B \in \mathcal{B} : \pi_B = 0\}| \leq 1$.

We are now able to define Bayes's and Jeffrey's rules in this framework. (Of course, the framework equally permits the definition of other belief-revision rules.)

Definition 2

• Let $\mathcal{D}_{\text{Bayes}}$ be the set of all pairs $(p, I) \in \mathcal{P} \times \mathcal{I}$ such that $I = I_B$ is a Bayesian input compatible with $p$ (which means $p(B) \neq 0$). Bayes's rule is the revision rule on $\mathcal{D}_{\text{Bayes}}$ which maps each $(p, I_B) \in \mathcal{D}_{\text{Bayes}}$ to $p' \in \mathcal{P}$, where

\[ p'(A) = p(A \mid B) \quad \text{for all events } A \subseteq \Omega. \tag{1} \]

• Let $\mathcal{D}_{\text{Jeffrey}}$ be the set of all pairs $(p, I) \in \mathcal{P} \times \mathcal{I}$ such that $I = I_{(\pi_B)_{B \in \mathcal{B}}}$ is a Jeffrey input compatible with $p$ (which means $\pi_B = 0$ whenever $p(B) = 0$).
Jeffrey's rule is the revision rule on $\mathcal{D}_{\text{Jeffrey}}$ which maps each $(p, I_{(\pi_B)_{B \in \mathcal{B}}}) \in \mathcal{D}_{\text{Jeffrey}}$ to $p' \in \mathcal{P}$, where

\[ p'(A) = \sum_{B \in \mathcal{B}} p(A \mid B)\, \pi_B \quad \text{for all events } A \subseteq \Omega. \tag{2} \]

The domains $\mathcal{D}_{\text{Bayes}}$ and $\mathcal{D}_{\text{Jeffrey}}$ are the maximal domains for which formulas (1) and (2) are well-defined.[9] Jeffrey's rule extends Bayes's, i.e., it coincides with Bayes's rule on the subdomain $\mathcal{D}_{\text{Bayes}}$ ($\subsetneq \mathcal{D}_{\text{Jeffrey}}$).

[9] The definition of each revision rule and its domain relies on the fact that each Bayesian input $I$ is uniquely representable as $I = I_B$ and that each Jeffrey input is "almost" uniquely representable as $I = I_{(\pi_B)_{B \in \mathcal{B}}}$, where any residual non-uniqueness makes no difference to the revised belief state or the criterion for including $(p, I)$ in the domain. For details, see Lemmas 1 and 3 in the Appendix.

3 An axiomatic characterization

We now introduce two plausible axioms that a belief-revision rule may be expected to satisfy and show that they imply that the agent must revise his or her beliefs in accordance with Bayes's rule in response to any Bayesian input and in accordance with Jeffrey's rule in response to any Jeffrey input. As already noted, the same axioms can also be used to characterize other belief-revision rules, but we relegate these extensions to follow-up work (as briefly discussed in Section 5). All proofs are given in the Appendix.

3.1 Two axioms

Let $\mathcal{D} \subseteq \mathcal{P} \times \mathcal{I}$ be the domain of the belief-revision rule. For each belief-input pair $(p, I) \in \mathcal{D}$, we write $p_I \in \mathcal{P}$ to denote the revised belief state. Our first axiom says that the revised belief state should be responsive to the learnt input.

Responsiveness: $p_I \in I$ for all belief-input pairs $(p, I) \in \mathcal{D}$.

Responsiveness guarantees that the agent's revised belief state respects the constraint given by the input. For example, in response to a Bayesian input, the agent assigns probability one to the learnt event.
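Formulas (1) and (2), together with the responsiveness check, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the six-world prior over (weather, traffic) pairs is hypothetical, the learnt weights 1/2, 1/3 and 1/6 are those of the rain/snow/dry example in the Introduction, and Bayes's rule is computed as the special case of Jeffrey's rule with weights 1 and 0.

```python
from fractions import Fraction as F

def prob(p, event):
    """Probability of an event (a set of worlds) under belief state p."""
    return sum(p[w] for w in event)

def jeffrey_rule(p, cells, weights):
    """Formula (2): p'(w) = p(w | B) * pi_B, where B is the cell containing w."""
    posterior = {}
    for B, pi_B in zip(cells, weights):
        p_B = prob(p, B)
        if p_B == 0 and pi_B != 0:
            # Outside the domain: pi_B must be 0 whenever p(B) = 0.
            raise ValueError("input is not compatible with the prior")
        for w in B:
            posterior[w] = p[w] / p_B * pi_B if p_B != 0 else F(0)
    return posterior

def bayes_rule(p, B):
    """Formula (1), computed as the Jeffrey update with pi_B = 1 on {B, complement}."""
    complement = set(p) - set(B)
    cells, weights = [set(B)], [F(1)]
    if complement:
        cells.append(complement)
        weights.append(F(0))
    return jeffrey_rule(p, cells, weights)

# Hypothetical prior over (weather, traffic) worlds; only the weights are from the text.
prior = {
    ("rain", "jam"): F(3, 20), ("rain", "clear"): F(1, 20),
    ("snow", "jam"): F(1, 10), ("snow", "clear"): F(1, 10),
    ("dry", "jam"): F(3, 20), ("dry", "clear"): F(9, 20),
}
cells = [{w for w in prior if w[0] == kind} for kind in ("rain", "snow", "dry")]
weights = [F(1, 2), F(1, 3), F(1, 6)]

posterior = jeffrey_rule(prior, cells, weights)
# Responsiveness: each cell of the partition receives exactly its learnt probability.
responsive = all(prob(posterior, B) == pi_B for B, pi_B in zip(cells, weights))
# Probability of a traffic jam after the update.
p_jam = prob(posterior, {w for w in prior if w[1] == "jam"})
# Bayesian special case: learning that it rains.
after_rain = bayes_rule(prior, cells[0])
```

In this example the revised probability of a jam is 7/12, and conditioning on rain returns the prior's conditional probabilities given rain, as formula (1) requires.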
The second axiom expresses a natural conservativeness requirement: those "parts" of the agent's belief state on which the learnt input is "silent" should not change in response to it. In short, the learnt input should have no effect where it has nothing to say. To define that axiom formally, we must answer two questions: what do we mean by "parts of a belief state", and when is a given input "silent" on them? To answer these questions, note that, intuitively:

• a Bayesian input is not silent on the probability of the learnt event $B$, but is silent on all conditional probabilities, given $B$; and

• a Jeffrey input is not silent on the probabilities of the events in the relevant partition $\mathcal{B}$, but is silent on all conditional probabilities, given these events.

So, the "parts" of the agent's belief state on which Bayesian inputs and Jeffrey inputs are silent are conditional probabilities of some events, given others. The relevant conditional probabilities are preserved by Bayes's and Jeffrey's rules, so that these rules are intuitively conservative. In the next subsection, we define formally what it means for a learnt input to be "silent" on the probability of one event, given another. Once we have this definition, we can formulate our conservativeness axiom as follows.

Conservativeness (axiom scheme): For all belief-input pairs $(p, I) \in \mathcal{D}$, if $I$ is "silent" on the probability of a (relevant) event $A$ given another $B$, this conditional probability is preserved, i.e., $p_I(A \mid B) = p(A \mid B)$ (if $p_I(B), p(B) \neq 0$).

3.2 When is a learnt input silent on the probability of one event, given another?

Our aim is to define when a learnt input $I \in \mathcal{I}$ is "silent" on the probability of one event $A$, given another event $B$ (where possibly $B = \Omega$). Our analysis is fully general, i.e., not restricted to any particular class of inputs, such as Bayesian inputs or Jeffrey inputs. We first note that we need to define silence only for the case in which

\[ \emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I), \]

where $\mathrm{Supp}(I)$ is the support of $I$, defined as $\{\omega \in \Omega : p'(\omega) \neq 0 \text{ for some } p' \in I\}$.[10]

[10] Here, and elsewhere, we write $p'(\omega)$ as an abbreviation for $p'(\{\omega\})$ when we refer to the probability of a singleton event $\{\omega\}$.

There are two plausible notions of silence, which lead to two different variants of our conservativeness axiom. We begin with the weaker notion. A learnt input is weakly silent on the probability of $A$ given $B$ if it permits this conditional probability to take any value. Formally:

Definition 3 Input $I \in \mathcal{I}$ is weakly silent on the probability of $A$ given $B$ (for $\emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I)$) if, for every value $\alpha$ in $[0, 1]$, $I$ contains some belief state $p'$ (with $p'(B) \neq 0$) such that $p'(A \mid B) = \alpha$.

For instance, the learnt input $I = \{p' : p'(B) = 1/2\}$ is weakly silent on the probability of $A$ given $B$. So is the input $I = \{p' : p'(A) \leq 1/2\}$. This weak notion of silence gives rise to the following strong conservativeness axiom:

Strong Conservativeness: For all belief-input pairs $(p, I) \in \mathcal{D}$, if $I$ is weakly silent on the probability of an event $A$ given another $B$ (where $\emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I)$), this conditional probability is preserved, i.e., $p_I(A \mid B) = p(A \mid B)$ (if $p_I(B), p(B) \neq 0$).

Although this axiom may seem plausible, it leads to an impossibility result.

Proposition 1 If $\#\Omega \geq 3$, no belief-revision rule on any domain $\mathcal{D} \supseteq \mathcal{D}_{\text{Jeffrey}}$ is responsive and strongly conservative.

Note that, on the small domain $\mathcal{D}_{\text{Bayes}}$, there is no such impossibility, because Bayes's rule is responsive as well as strongly conservative. On that domain, the present strong conservativeness axiom is equivalent to our later, weaker one. The impossibility occurs on domains on which the two conservativeness axioms come apart.

We weaken strong conservativeness by strengthening the notion of silence. The key insight is that even if a learnt input $I$ is weakly silent on the probability of $A$ given $B$, it may still implicitly constrain the relationship between this conditional probability and others.
Suppose, for instance, that $\Omega = \{0, 1\}^2$, where the first component of a world $(g, j) \in \Omega$ represents whether Richard has gone out ($g = 1$) or not ($g = 0$), and the second whether Richard is wearing a jacket ($j = 1$) or not ($j = 0$). Consider the event that Richard has gone out, $G = \{(g, j) : g = 1\}$, and the event that he is wearing a jacket, $J = \{(g, j) : j = 1\}$. Some inputs are weakly silent on the probability of $J$ (given $\Omega$) and yet require this probability to be related in certain ways to other probability assignments, especially those conditional on $J$. Consider, for instance, the Jeffrey input which says that $G$ is 90% probable, formally $I = \{p' : p'(G) = 0.9\}$. It is compatible with any probability of $J$ and is thus weakly silent on the probability of $J$, given $\Omega$. But it requires this probability to be related in certain ways to the probability of $G$, given $J$. If this conditional probability is 1 (which is compatible with $I$), then the probability of $J$ can no longer exceed 0.9. If it did, the probability of $G$ would exceed 0.9, which would contradict the learnt input $I$. In short, although $I$ does not directly constrain the agent's subjective probability for $J$, it constrains it indirectly, i.e., after other parts of the belief state have been fixed.

A learnt input is strongly silent on the probability of $A$ given $B$ if it permits this conditional probability to take any value even after other parts of the agent's belief state have been fixed. Let us first explain this idea informally. What exactly are the "other parts" of the agent's belief state? They are those probability assignments that are "orthogonal" to the probability of $A$ given $B$. In other words, they are all the beliefs of which the belief state $p'$ consists, over and above the probability of $A$ given $B$.
More precisely, assuming again that $A$ is included in $B$, they are given by the quadruple consisting of the unconditional probability $p'(B)$ and the conditional probabilities $p'(\cdot \mid A)$, $p'(\cdot \mid B \setminus A)$, and $p'(\cdot \mid \overline{B})$.[11] This quadruple and the conditional probability $p'(A \mid B)$ jointly determine the belief state $p'$, because

\[ p' = p'(\cdot \mid A)\underbrace{p'(A)}_{p'(A \mid B)\,p'(B)} + p'(\cdot \mid B \setminus A)\underbrace{p'(B \setminus A)}_{p'(B) - p'(A \mid B)\,p'(B)} + p'(\cdot \mid \overline{B})\underbrace{p'(\overline{B})}_{1 - p'(B)}. \]

If an input $I$ is strongly silent on the conditional probability of $A$ given $B$, then this probability can be chosen freely even after the other parts of the agent's belief state have been fixed in accordance with $I$ (which requires them to match those of some belief state $p$ in $I$).

This idea is illustrated in Figure 1, where a learnt input $I$ is represented in the space whose horizontal coordinate represents the probability of $A$ given $B$ and whose vertical coordinate represents the other parts of the agent's belief state (collapsed into a single dimension for illustration).

[Figure 1: An input's weak or strong silence on some conditional probability. Panels: (a) no silence, (b) weak silence, (c) strong silence. Horizontal axis: probability of $A$ given $B$, from 0 to 1, with a marked value $\alpha$; vertical axis: other parts of the belief state; marked points $p^*$ and $p'$.]

In part (a), input $I$ (represented by the circular region) is not silent at all on the probability of $A$ given $B$, since many values of this probability, such as $\alpha$, are ruled out by $I$. In part (b), input $I$ (represented by the oval region) is weakly but not strongly silent on the probability of $A$ given $B$. This is because $I$ is consistent with any value of that probability, but to combine it with a particular value, such as $\alpha$, other parts of the belief state can no longer be freely chosen. In part (c), input $I$ (represented by the rectangular region) is strongly silent on the probability of $A$ given $B$.
It is consistent with any value of that probability, even after other parts of the belief state have been fixed.

[11] This informal discussion assumes that $p'(A), p'(B \setminus A), p'(\overline{B}) \neq 0$.

To define strong silence formally, we say that two belief states $p'$ and $p$ coincide outside the probability of $A$ given $B$ if the other parts of these belief states coincide, i.e., if $p'(B) = p(B)$ and $p'(\cdot \mid C) = p(\cdot \mid C)$ for all $C \in \{A, B \setminus A, \overline{B}\}$ such that $p'(C), p(C) \neq 0$. Clearly, two belief states that coincide both (i) outside the probability of $A$ given $B$ and (ii) on the probability of $A$ given $B$ are identical.

Definition 4 Input $I \in \mathcal{I}$ is strongly silent on the probability of $A$ given $B$ (for $\emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I)$) if, for all $\alpha \in [0, 1]$ and all $p \in I$, the set $I$ contains some belief state $p'$ (with $p'(B) \neq 0$) which

(a) coincides with $\alpha$ on the probability of $A$ given $B$, i.e., $p'(A \mid B) = \alpha$,

(b) coincides with $p$ outside the probability of $A$ given $B$ (if $p(A), p(B \setminus A) \neq 0$).

In this definition, there is only one belief state $p'$ satisfying (a) and (b), given by

\[ p' := p(\cdot \mid A)\,\alpha\, p(B) + p(\cdot \mid B \setminus A)\,(1 - \alpha)\, p(B) + p(\cdot \cap \overline{B}), \tag{3} \]

so that the requirement that there exists some $p'$ in $I$ satisfying (a) and (b) reduces to the requirement that $I$ contains the belief state (3).[12] For example, the inputs $I = \{p' : p' \text{ is uniform on } \overline{B}\}$ and $I = \{p' : p'(B) \leq 1/2\}$ are strongly silent on the probability of $A$ given $B$, since this conditional probability can take any value, independently of other parts of the agent's belief state (e.g., independently of the probability of $B$).

This strengthened notion of silence leads to a weaker conservativeness axiom, which we call just conservativeness.

Conservativeness: For all belief-input pairs $(p, I) \in \mathcal{D}$, if $I$ is strongly silent on the probability of an event $A$ given another $B$ (for $\emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I)$), this conditional probability is preserved, i.e., $p_I(A \mid B) = p(A \mid B)$ (if $p_I(B), p(B) \neq 0$).
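The belief state in formula (3) can be built explicitly, which makes the idea of "moving only the probability of A given B" concrete. The sketch below is an illustration under assumptions: the four-world prior `p`, the events `A` and `B`, and the function names are hypothetical, and the input tested is the Jeffrey input that fixes the probability of `B` at its prior value.

```python
from fractions import Fraction as F

def prob(p, event):
    """Probability of an event (a set of worlds) under belief state p."""
    return sum(p[w] for w in event)

def shift_conditional(p, A, B, alpha):
    """Belief state (3): set the probability of A given B to alpha, keep the rest.

    Assumes A is a non-empty proper subset of B and p(A), p(B minus A) != 0.
    """
    A, B = set(A), set(B)
    rest = B - A
    p_A, p_rest, p_B = prob(p, A), prob(p, rest), prob(p, B)
    new = {}
    for w in p:
        if w in A:
            new[w] = p[w] / p_A * alpha * p_B           # p(w | A) * alpha * p(B)
        elif w in rest:
            new[w] = p[w] / p_rest * (1 - alpha) * p_B  # p(w | B minus A) * (1 - alpha) * p(B)
        else:
            new[w] = p[w]                               # worlds outside B are untouched
    return new

# Hypothetical example (not from the paper).
p = {"w1": F(1, 10), "w2": F(2, 10), "w3": F(3, 10), "w4": F(4, 10)}
A, B = {"w1"}, {"w1", "w2", "w3"}

def in_input(q):
    """Jeffrey input on the partition {B, complement of B} with q(B) fixed at p(B)."""
    return prob(q, B) == prob(p, B)

checks = []
for alpha in (F(0), F(1, 3), F(1)):
    q = shift_conditional(p, A, B, alpha)
    # For each alpha: q stays inside the input, and q(A | B) equals alpha.
    checks.append((in_input(q), prob(q, A) / prob(q, B) == alpha, sum(q.values()) == 1))
```

Every shifted belief state remains in the input, which is what strong silence of this Jeffrey input on the probability of `A` given `B` amounts to for the sampled values of alpha; the loop is a spot check, not a proof.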
[12] To be precise, this is true whenever $p(A), p(B \setminus A) \neq 0$.

3.3 An alternative perspective on weak and strong silence

Before stating our characterization theorem, we note that there is an alternative and equivalent way of defining weak and strong silence, which gives a different perspective on these notions. Informally, weak silence can be taken to mean that the learnt input implies nothing for the probability of $A$ given $B$. Strong silence can be taken to mean that all its implications are "outside" the probability of $A$ given $B$ (i.e., the input constrains only parts of the agent's belief state that are orthogonal to the probability of $A$ given $B$). To make this more precise, we first define the "implication" of a learnt input $I$ for the probability of $A$ given $B$ and for other parts of the agent's belief state. Again, we assume that $\emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I)$.

[Figure 2: The implications $I_{A|B}$ and $I_{\overline{A|B}}$ derived from input $I$. Horizontal axis: probability of $A$ given $B$, from 0 to 1; vertical axis: other parts of the belief state.]

• The implication of $I$ for the probability of $A$ given $B$ is the input, denoted $I_{A|B}$, which says everything that $I$ says about the probability of $A$ given $B$, and nothing else (see Figure 2). So, $I_{A|B}$ contains all belief states $p'$ which are compatible with $I$ on the probability of $A$ given $B$. Formally, $I_{A|B}$ is the set of all $p'$ in $\mathcal{P}$ such that $p'(A \mid B) = p(A \mid B)$ for some $p$ in $I$ (modulo a non-triviality constraint).[13]

• The implication of $I$ outside the probability of $A$ given $B$ is the input, denoted $I_{\overline{A|B}}$, which says everything that $I$ says outside the probability of $A$ given $B$, and nothing else (see Figure 2). So, $I_{\overline{A|B}}$ contains all belief states which are compatible with $I$ outside the probability of $A$ given $B$. Formally, $I_{\overline{A|B}}$ is the set of all $p'$ in $\mathcal{P}$ which coincide with some $p$ in $I$ outside the probability of $A$ given $B$ (modulo a non-triviality constraint).[14]

Clearly, $I \subseteq I_{A|B}$ and $I \subseteq I_{\overline{A|B}}$. The inputs $I_{A|B}$ and $I_{\overline{A|B}}$ capture two orthogonal components ("sub-inputs") of the full input $I$.
Each component encodes part of the information conveyed by $I$. Weak and strong silence can now be characterized (and thereby alternatively defined) as follows.

Proposition 2 For all inputs $I \in \mathcal{I}$ and events $A$, $B$ (where $\emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I)$),

(a) $I$ is weakly silent on the probability of $A$ given $B$ if and only if $I_{A|B} = \mathcal{P}$ (i.e., $I$ implies nothing for the probability of $A$ given $B$),

(b) $I$ is strongly silent on the probability of $A$ given $B$ if and only if $I_{\overline{A|B}} = I$ (i.e., $I$ implies only something outside the probability of $A$ given $B$).

We can illustrate this proposition by combining Figures 1 and 2. According to part (a), weak silence means that the sub-input $I_{A|B}$, which pertains to the probability of $A$ given $B$, is vacuous. Graphically, it covers the entire area in the plot. According to part (b), strong silence means that the input $I$ conveys no information beyond the sub-input $I_{\overline{A|B}}$, which pertains to those parts of the agent's belief state that are orthogonal to the probability of $A$ given $B$. Graphically, the input $I$ covers a rectangular area ranging from the far left to the far right.

[13] In full precision, $I_{A|B}$ is the set of all $p'$ in $\mathcal{P}$ such that if $p'(B) \neq 0$ then $p'(A \mid B) = p(A \mid B)$ for some $p$ in $I$ satisfying $p(B) \neq 0$.

[14] In full precision, $I_{\overline{A|B}}$ is the set of all $p'$ in $\mathcal{P}$ such that if there is a belief state $p$ in $I$ satisfying [$p(C) \neq 0$ for all $C \in \{A, B \setminus A\}$ such that $p'(C) \neq 0$], then $p'$ coincides with some such $p$ outside the probability of $A$ given $B$.

3.4 The theorem

We have seen that the strong version of our conservativeness axiom, defined in terms of weak silence, leads to an impossibility result. By contrast, its weaker counterpart, defined in terms of strong silence, yields a characterization of Bayes's and Jeffrey's rules.

Theorem 1 Bayes's and Jeffrey's rules are the only responsive and conservative belief-revision rules on the domains $\mathcal{D}_{\text{Bayes}}$ and $\mathcal{D}_{\text{Jeffrey}}$, respectively.
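A numerical spot check can make the two axioms tangible, although it proves nothing. The sketch below assumes a hypothetical five-world prior and a two-cell partition; it checks responsiveness and the preservation of conditional probabilities within partition cells (the conditionals that Section 3.1 identifies as ones a Jeffrey input is silent on), and contrasts Jeffrey's rule with a made-up "uniform spreading" rule that is responsive but fails this check. The full conservativeness axiom covers further conditional probabilities that this sketch does not enumerate.

```python
from fractions import Fraction as F
from itertools import combinations

def prob(p, event):
    """Probability of an event (a set of worlds) under belief state p."""
    return sum(p[w] for w in event)

def jeffrey_rule(p, cells, weights):
    """Jeffrey's rule (2); assumes p(B) > 0 for every cell B, so (p, I) is in the domain."""
    return {w: p[w] / prob(p, B) * pi_B
            for B, pi_B in zip(cells, weights) for w in B}

def uniform_rule(cells, weights):
    """Hypothetical rival rule: spread each learnt weight evenly over its cell."""
    return {w: pi_B / len(B) for B, pi_B in zip(cells, weights) for w in B}

def proper_subsets(B):
    """All non-empty proper subsets A of B."""
    items = sorted(B)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            yield set(combo)

def responsive(q, cells, weights):
    return all(prob(q, B) == pi_B for B, pi_B in zip(cells, weights))

def preserves_within_cells(p, q, cells):
    """q(A | B) == p(A | B) for every cell B and every non-empty proper subset A of B."""
    return all(prob(q, A) / prob(q, B) == prob(p, A) / prob(p, B)
               for B in cells for A in proper_subsets(B))

# Hypothetical example (not from the paper).
p = {"a": F(1, 10), "b": F(2, 10), "c": F(3, 10), "d": F(1, 10), "e": F(3, 10)}
cells = [{"a", "b", "c"}, {"d", "e"}]
weights = [F(1, 4), F(3, 4)]

p_jeffrey = jeffrey_rule(p, cells, weights)
p_uniform = uniform_rule(cells, weights)
```

Here both rules pass `responsive`, but only the Jeffrey posterior passes `preserves_within_cells`, which is consistent with the uniqueness claim of Theorem 1 on this single instance.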
It is also worth noting the following consequence of this result:

Corollary 1 Every responsive and conservative belief-revision rule on some domain $\mathcal{D} \subseteq \mathcal{P} \times \mathcal{I}$ coincides with Bayes's rule on $\mathcal{D} \cap \mathcal{D}_{\text{Bayes}}$ and with Jeffrey's rule on $\mathcal{D} \cap \mathcal{D}_{\text{Jeffrey}}$.

It is easier to prove that if a belief-revision rule on one of these domains is responsive and conservative, then it must be Bayes's or Jeffrey's rule, than to prove the converse implication, namely that each of these rules is responsive and conservative on its domain. To illustrate the easier implication, note, for instance, that if a belief-input pair $(p, I)$ belongs to $\mathcal{D}_{\text{Bayes}}$, say with $I = \{p' : p'(B) = 1\}$, then the new belief state $p_I$ equals $p_I(\cdot|B)$ (since $p_I(B) = 1$, by responsiveness), which equals $p(\cdot|B)$ (by conservativeness, as $I$ is strongly silent on probabilities given $B$). The reason why the converse implication is harder to prove is that one must take care to identify all the conditional probabilities on which a given input is strongly silent; there are more such conditional probabilities than one might expect. Once we have identified all those conditional probabilities, however, we can verify that the corresponding belief-revision rule does indeed preserve all of them, as required by conservativeness.

4 A decision-theoretic application

To show that there is room for non-Bayesian belief-revision rules in economic theory, we now sketch an illustrative application of Jeffrey revision to decision and game theory. Standard dynamic decision and game theory is inherently Bayesian. As is widely recognized, this sometimes entails unrealistic assumptions of forward-looking rationality, which limit the ability to model real-life learning, reasoning, and behaviour. We give an example which illustrates some of these difficulties and shows how a non-Bayesian model can avoid them. The example suggests a new class of dynamic decision problems or games, those with "surprises" or with unforeseen learning inputs.
Ann, an employer, must decide whether to hire Bob, a job candidate. There is no time for a job interview, since a quick decision is needed. Ann is uncertain about whether Bob is competent or not; both possibilities have prior probability $\tfrac{1}{2}$. It would help Ann to know whether Bob has previous work experience, since this is positively correlated with competence, but gathering this information takes time. Bob's type is thus a pair $(\gamma, \varepsilon)$ whose first component indicates whether he is competent ($\gamma = c$) or not ($\gamma = \bar{c}$) and whose second component indicates whether he has work experience ($\varepsilon = e$) or not ($\varepsilon = \bar{e}$). To apply a belief-revision model, let the set of worlds be the set of possible types of Bob, i.e., $\Omega = \{c, \bar{c}\} \times \{e, \bar{e}\}$. Ann's initial beliefs about Bob's type are given by the belief state $p \in \mathcal{P}$ in which $p(c, e) = p(\bar{c}, \bar{e}) = 0.4$ and $p(c, \bar{e}) = p(\bar{c}, e) = 0.1$. Note the positive correlation between competence and work experience. Ann initially seems to face the dynamic decision problem shown in Figure 3:

• First, a chance move determines Bob's type in $\Omega$ according to the probability measure $p$.

• Next, Ann, uninformed of the chance move, can hire Bob ($h$) or reject him ($\bar{h}$) or gather information about whether he has previous work experience ($g$).

• Finally, if Ann chooses $g$, she faces a subsequent choice between hiring Bob ($h$) or rejecting him ($\bar{h}$), but now she has information about $\varepsilon$, i.e., about whether he has work experience.

[Figure 3: Ann's decision problem in its initial form]

Ann is an expected-utility maximizer, and her utility function is as follows: hiring Bob, who is of type $(\gamma, \varepsilon)$, contributes an amount $v(\gamma)$ to her utility, where $v(\gamma) = 5$ if $\gamma = c$ and $v(\gamma) = -5$ if $\gamma = \bar{c}$; and gathering information about $\varepsilon$ reduces her utility by 1. Not hiring Bob yields a utility of 0. Ann has only one rational strategy: first she gathers information ($g$), and then she hires Bob if and only if she learns that Bob has work experience ($\varepsilon = e$). To see why, note the following.
Immediately hiring Bob yields an expected utility of $v(c)\,p(c) + v(\bar{c})\,p(\bar{c}) = 5 \cdot \tfrac{1}{2} + (-5) \cdot \tfrac{1}{2} = 0$. Immediately rejecting Bob also yields an expected utility of 0. Gathering information leads Ann to a Bayesian belief revision:

• If she learns that he has work experience, she raises her probability that he is competent to $p(c|e) = p(c, e)/p(e) = 0.4/0.5 = \tfrac{4}{5}$. So, she hires Bob, since this yields an expected utility of $(v(c) - 1)\,p(c|e) + (v(\bar{c}) - 1)\,p(\bar{c}|e) = 4 \cdot \tfrac{4}{5} + (-6) \cdot \tfrac{1}{5} = 2$, while rejecting Bob would have yielded an expected utility of $-1$.

• If she learns that Bob has no work experience, she lowers her probability that he is competent to $p(c|\bar{e}) = p(c, \bar{e})/p(\bar{e}) = 0.1/0.5 = \tfrac{1}{5}$. So, she rejects him, as this yields an expected utility of $-1$, whereas hiring him would have yielded an expected utility of $(v(c) - 1)\,p(c|\bar{e}) + (v(\bar{c}) - 1)\,p(\bar{c}|\bar{e}) = 4 \cdot \tfrac{1}{5} + (-6) \cdot \tfrac{4}{5} = -4$.

So ex ante the expected utility of gathering information is the average $2\,p(e) + (-1)\,p(\bar{e}) = 2 \cdot \tfrac{1}{2} + (-1) \cdot \tfrac{1}{2} = \tfrac{1}{2}$, which exceeds the zero expected utility of the two other choices. So far, everything is classical.

[Figure 4: Two ways of refining Ann's decision problem]

Now suppose Ann follows her rational strategy. She writes to Bob to ask whether he has work experience. At this point, however, something surprising happens. Bob's answer reveals right from the beginning that his written English is poor. Ann notices this even before figuring out what Bob says about his work experience. In response to this unforeseen learnt input, Ann lowers her probability that Bob is competent from $\tfrac{1}{2}$ to $\tfrac{1}{8}$. It is natural to model this as an instance of Jeffrey revision. Formally, Ann learns the Jeffrey input $I = \{p' \in \mathcal{P} : p'(c) = \tfrac{1}{8}\}$, and by Jeffrey's rule her revised belief state $p_I$ is given by $p_I(c, e) = \tfrac{1}{10}$, $p_I(c, \bar{e}) = \tfrac{1}{40}$, $p_I(\bar{c}, e) = \tfrac{7}{40}$, and $p_I(\bar{c}, \bar{e}) = \tfrac{7}{10}$.
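The figures in Ann's example can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the world labels (`"c"`, `"not_c"`, `"e"`, `"not_e"`) and the helper names `prob`, `bayes`, `jeffrey` and `hire_value` are illustrative assumptions. It recomputes the classical expected utilities, applies the Jeffrey input $p'(c) = \tfrac{1}{8}$, and then conditions the revised state on work experience.

```python
from fractions import Fraction as F

# Worlds are (competence, experience) pairs. The string labels are
# illustrative stand-ins for c, c-bar, e, e-bar in the text.
prior = {
    ("c", "e"): F(4, 10), ("c", "not_e"): F(1, 10),
    ("not_c", "e"): F(1, 10), ("not_c", "not_e"): F(4, 10),
}

def is_c(w): return w[0] == "c"
def is_not_c(w): return w[0] == "not_c"
def is_e(w): return w[1] == "e"
def is_not_e(w): return w[1] == "not_e"

def prob(p, event):
    """Probability of an event given as a predicate on worlds."""
    return sum(q for w, q in p.items() if event(w))

def bayes(p, event):
    """Bayes's rule: condition p on an event of non-zero probability."""
    total = prob(p, event)
    return {w: (q / total if event(w) else F(0)) for w, q in p.items()}

def jeffrey(p, cells):
    """Jeffrey's rule for a list of (event, new_probability) pairs whose
    events partition the worlds. Cells with positive new probability are
    assumed to have non-zero prior probability."""
    revised = {w: F(0) for w in p}
    for event, new_prob in cells:
        if new_prob == 0:
            continue
        old_prob = prob(p, event)
        for w, q in p.items():
            if event(w):
                revised[w] = q / old_prob * new_prob
    return revised

def hire_value(p, info_cost):
    """Expected utility of hiring under beliefs p, net of information cost."""
    return (5 - info_cost) * prob(p, is_c) + (-5 - info_cost) * prob(p, is_not_c)

# Classical analysis: hire at once, or gather information first.
hire_now = hire_value(prior, 0)
after_e, after_not_e = bayes(prior, is_e), bayes(prior, is_not_e)
gather = (prob(prior, is_e) * max(hire_value(after_e, 1), F(-1))
          + prob(prior, is_not_e) * max(hire_value(after_not_e, 1), F(-1)))

# Surprise: Jeffrey input p'(c) = 1/8, then Bayesian learning of e.
revised = jeffrey(prior, [(is_c, F(1, 8)), (is_not_c, F(7, 8))])
final = bayes(revised, is_e)

print(hire_now, gather)                      # classical expected utilities
print(revised)                               # Jeffrey-revised belief state
print(prob(final, is_c), hire_value(final, 1))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the results can be compared directly with the fractions in the text rather than with rounded floats.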
As she reads the rest of Bob's letter, Ann eventually learns that he has previous work experience, which prompts a Bayesian belief revision, so that her final belief state is $p_I(\cdot|e)$ (or equivalently, $(p_I)_{I'}$ where $I'$ is the Bayesian input $I' = \{p' \in \mathcal{P} : p'(e) = 1\}$). Since Ann's posterior probability for Bob's competence is only $p_I(c|e) = \frac{p_I(c, e)}{p_I(c, e) + p_I(\bar{c}, e)} = \tfrac{4}{11}$, she decides not to hire him, despite his work experience.

Can classical decision theory explain this? Of course, the dynamic decision problem shown in Figure 3 is no longer adequate, as it wrongly predicts that Ann hires Bob after learning that he has work experience. The natural response, from a classical perspective, would be to refine the decision problem as shown in Figure 4(a). After Ann's information-gathering move $g$ an additional chance move is introduced, which determines whether Bob's written English is normal ($w$) or poor ($\bar{w}$), where the probability of $w$, denoted $t_{\gamma,\varepsilon}$, is larger if Bob is competent than if he is not, i.e., $t_{c,\varepsilon} > t_{\bar{c},\varepsilon}$. After observing this chance move, Ann makes her hiring decision.

Although this refined classical model predicts that Ann turns down Bob after receiving his poorly written letter, it is inadequate in many ways. It ignores the fact that Ann is initially unaware of or does not consider the possibility that Bob's written English is poor (suppose, for instance, that based on her initial information, she had no reason to doubt, or even to think about, Bob's literacy). It treats the event of a poorly written letter from Bob as a foreseen rather than an unforeseen contingency. As a result, Ann's reasoning at each of her decision nodes is modelled in an inadequate manner:

(i) In her first decision (between $h$, $\bar{h}$, and $g$), Ann is falsely taken to foresee the possibilities of learning $w$ or learning $\bar{w}$, i.e., to reason along the tree displayed in Figure 4(a) rather than that in Figure 3.
This artificially complicates her expected-utility maximization exercise, for instance by assuming awareness of the four parameters $t_{\gamma,\varepsilon}$, for all values of $\gamma$ and $\varepsilon$.

(ii) In her second decision, in case Bob's written English is normal, Ann is taken to have learnt not just the parameter $\varepsilon$, but also the chance move $w$ (normal written English), so that her posterior probability for Bob's competence is now $p(c|\varepsilon, w)$ rather than $p(c|\varepsilon)$. The additional conditionalization on $w$ misrepresents Ann's beliefs, since the absence of linguistic errors in Bob's letter goes unnoticed: it is not an unforeseen event (she had taken Bob's normal literacy for granted). In fact, Ann continues to conceptualize her decision problem as the one shown in Figure 3 rather than the one in Figure 4(a). The additional belief revision (upon learning $w$) departs from, and complicates, Ann's true reasoning.

(iii) In her second decision, in case Bob's written English is poor, Ann's reasoning is again misrepresented. Although it is true that the unforeseen news that Bob's written English is poor implies that Ann cannot uphold her original conceptualization of the decision problem (Figure 3), it does not follow that Ann re-conceptualizes her decision problem in line with Figure 4(a). Our informal description of Ann's reasoning takes her to perform a Jeffrey revision of her beliefs over $\Omega = \{c, \bar{c}\} \times \{e, \bar{e}\}$, whereas Figure 4(a) takes her to perform a Bayesian revision of beliefs over the refined set of worlds $\Omega' = \{c, \bar{c}\} \times \{e, \bar{e}\} \times \{w, \bar{w}\}$.

Arguably, the Bayesian model of Ann's behaviour is not only psychologically inadequate, but its predictive adequacy is also far from clear. Whether the model correctly predicts Ann's behaviour at the various decision nodes depends on the exact calibration of the parameters $t_{\gamma,\varepsilon}$, for all values of $\gamma$ and $\varepsilon$.
Their most plausible (e.g., objective) values might not imply Ann's true decision behaviour, since that behaviour has a rather different psychological origin, which does not involve the parameters $t_{\gamma,\varepsilon}$ at all.

We propose to model Ann's decision problem non-classically as a decision problem with unforeseen inputs or surprises. As illustrated in Figure 4(b), instead of introducing a chance move (selecting $w$ or $\bar{w}$), we introduce a surprise move, which determines whether or not Ann receives a particular unforeseen input (here, the Jeffrey input $I = \{p' \in \mathcal{P} : p'(c) = \tfrac{1}{8}\}$). Then the problems in (i), (ii), and (iii) no longer arise:

• Problem (i) is avoided because Ann does not foresee or conceptualize the surprise move before its occurrence, so that she initially still reasons along the simple decision tree of Figure 3.

• Problem (ii) is avoided because in her second decision, without receiving the unforeseen input (the right branch at the surprise node), Ann only learns $\varepsilon$ and hence reasons as in her second decision in the simple decision problem of Figure 3.

• Problem (iii) is avoided because in her second decision, after receiving the unforeseen input (the left branch at the surprise node), Ann revises her beliefs in response to the Jeffrey input $I$.

In follow-up work, we formally define decision problems (or more generally games) with unforeseen inputs and introduce a corresponding equilibrium notion.[15] The details are beyond the scope of the present paper. Our aim in this section has simply been to illustrate that there is useful room for non-Bayesian belief-revision rules in a decision-theoretic model.

5 Concluding remarks

We have developed a unified framework for the study of belief revision and shown that Bayes's and Jeffrey's rules can be characterized in terms of the same two axioms: responsiveness to the learnt input and conservativeness. The only difference between those rules lies in the domains of learnt inputs to which they apply.
We show in follow-up work that our analysis can be extended to other belief-revision rules, distinct from Bayes's and Jeffrey's rules. One such rule is Adams's rule, inspired by Ernst Adams's work on the logic of conditionals (Adams 1975) and formally introduced by Bradley (2005). It takes the learnt input to be a new assignment of conditional probabilities of some events, given some other events. Another rule is the dual-Jeffrey rule, which stands out for its natural "duality" to Jeffrey's rule. Here, the learnt input is a new conditional probability distribution, given a certain partition. Like Bayes's and Jeffrey's rules, these two rules also uniquely satisfy our two axioms on their respective domains.

Beyond offering a novel formal framework, the programmatic aim of this paper has been to put non-Bayesian belief revision onto the map for economic theorists. No doubt, skeptics will still wonder: why bother about non-Bayesian belief revision? By suitably refining the set of possible worlds, so the objection goes, we can always remodel Jeffrey inputs (as well as other non-Bayesian inputs) in a Bayesian manner. However, as we have noted in our discussions of "Kotaro the junior academic" and "Ann the employer", such Bayesian remodelling comes at a cost:

Over-ascription of opinionation: A significant drawback of the Bayesian remodelling is that we must assume that the agent is able to assign prior probabilities to many complex events. In our initial example, Kotaro must assign prior probabilities to the various possible auditory signals that he might receive over the phone. Similarly, Ann the employer must assign prior probabilities to the various possible Jeffrey inputs she might receive. These may include not only learning that Bob's written English is poor, but also that he is a poetic writer, that he comes across as communicatively awkward in a way that she had not anticipated, and so on.
To accommodate the possibility of belief changes in response to such inputs, we would have to ascribe to the agent beliefs over an ever more refined algebra of events, whose size grows exponentially with the number of belief changes to be modelled. This is not very plausible, since typical real-world agents either have no beliefs about such events or have only imprecise ones. Even on a pure "as-if" interpretation of the Bayesian model, taking an agent to have highly sophisticated beliefs is dubious, given the complexity of their behavioural implications, which may be hard to test empirically. By contrast, once we restrict the complexity of the event-algebra, we may have to invoke non-Bayesian belief revision to capture the agent's belief dynamics adequately.

[15] Ann's equilibrium strategy in her decision problem with unforeseen inputs has the intended form: she first gathers information ($g$), and then hires Bob if and only if she does not receive the unforeseen input $I$ (Bob's poor written English) and learns that he has work experience ($\varepsilon = e$).

Over-ascription of awareness: The literature on unawareness suggests that a belief in an event (the assignment of a subjective probability to it) presupposes awareness of this event, where awareness is understood, not as knowledge of the event's occurrence or non-occurrence, but as conceptualization, mental representation, imagination, or consideration of its possibility (e.g., Dekel et al. 1998; Heifetz et al. 2006; Modica and Rustichini 1999). But as we have noted, it is far from clear whether, prior to the telephone conversation with Terence (the departmental chair with the thick accent), Kotaro even considered the possibility of receiving such incomprehensible auditory signals, or whether Ann the employer would have considered the possibility that Bob's written English was so poor. In these examples, the agents plausibly lacked not only knowledge but also awareness of the surprise events.
Arguably, many real-life belief changes involve the observation or experience of something that was previously not just unknown, but even beyond awareness or imagination.

In sum, an economic modeller often faces a choice between (i) ascribing to an agent Bayesian revision of beliefs over a very complex, fine-grained algebra of events and (ii) ascribing non-Bayesian revision of beliefs over a simpler, more coarse-grained algebra of events. Perhaps because of the elegance of Bayes's rule, many economists tend to assume that the first of these routes is more parsimonious than the second. But this overlooks the loss of parsimony at the level of the event-algebra. If all non-Bayesian belief-revision rules were ad hoc or otherwise unsatisfactory, the choice of route (i) might be understandable. But as we have shown, there are perfectly well-behaved non-Bayesian alternatives. This should make option (ii) at least a contender worth taking seriously.

6 References

Adams, E. (1975) The Logic of Conditionals. Dordrecht and Boston: Reidel.
Bradley, R. (2005) Radical Probabilism and Bayesian Conditioning, Philosophy of Science 72(2): 342-364.
Csiszar, I. (1967) Information-type measures of difference of probability distributions and indirect observations, Studia Scientiarum Mathematicarum Hungarica 2: 299-318.
Csiszar, I. (1977) Information Measures: A Critical Survey, Transactions of the Seventh Prague Conference on Information Theory: 73-86.
Dekel, E., Lipman, B., Rustichini, A. (1998) Standard state-space models preclude unawareness, Econometrica 66(1): 159-173.
Diaconis, P., Zabell, S. L. (1982) Updating subjective probability, Journal of the American Statistical Association 77(380): 822-830.
Dietrich, F. (2012) Modelling change in individual characteristics: an axiomatic framework, Games and Economic Behavior 76(2): 471-494.
Douven, I., Romeijn, J. W. (2011) A new resolution of the Judy Benjamin Problem, Mind 120(479): 637-670.
Grünwald, P., Halpern, J.
(2003) Updating probabilities, Journal of AI Research 19: 243-278.
Halpern, J. (2003) Reasoning about Uncertainty, MIT Press, Cambridge, MA.
Heifetz, A., Meier, M., Schipper, B. C. (2006) Interactive unawareness, Journal of Economic Theory 130(1): 78-94.
Jeffrey, R. (1957) Contributions to the theory of inductive probability, PhD Thesis, Princeton University.
Modica, S., Rustichini, A. (1999) Unawareness and partitional information structures, Games and Economic Behavior 27(2): 265-298.
Shafer, G. (1981) Jeffrey's rule of conditioning, Philosophy of Science 48(3): 337-362.
van Fraassen, B. C. (1981) A Problem for Relative Information Minimizers in Probability Kinematics, British Journal for the Philosophy of Science 32(4): 375-379.
Vineberg, S. (2011) Dutch Book Arguments, Stanford Encyclopedia of Philosophy (Summer 2011 Edition).

A Appendix: Proofs

A.1 Well-definedness of each revision rule

Our two belief-revision rules are well-defined because the mathematical object used in the definition of the new belief state and the rule's domain (i.e., the learnt event $B$ or the learnt family of probabilities $(\pi_B)$) is uniquely determined by the relevant input $I$ (in the case of Bayes's rule) or sufficiently determined (in the case of Jeffrey's rule) so that the definition does not depend on non-unique features. The following two lemmas, which the reader can easily verify, make this precise:

Lemma 1 Every Bayesian input is generated by exactly one event $B \subseteq \Omega$.

Lemma 2 For every Jeffrey input $I$,
(a) all families $(\pi_B)_{B \in \mathcal{B}}$ generating $I$ have the same subfamily $(\pi_B)_{B \in \mathcal{B} : \pi_B \neq 0}$ (especially, the same set $\{B \in \mathcal{B} : \pi_B \neq 0\}$);
(b) in particular, for every (initial) belief state $p \in \mathcal{P}$, the (revised) belief state (2) is either (i) defined and identical for all families $(\pi_B)_{B \in \mathcal{B}}$ generating $I$, or (ii) undefined for all these families.[16]

[16] Footnote 1 specifies when (2) is defined.

A.2 Proposition 1

Proof of Proposition 1. Suppose that $\#\Omega \geq 3$.
Suppose, for a contradiction, that there exists a responsive and strongly conservative revision rule on a domain $\mathcal{D} \supseteq \mathcal{D}_{\text{Jeffrey}}$. Since $\#\Omega \geq 3$, we can find events $A, B \subseteq \Omega$ such that $A \cap B, B \setminus A, A \setminus B \neq \emptyset$. Consider an initial belief state $p$ such that $p(A \cap B) = 1/4$ and $p(A \setminus B) = 3/4$, and define the Jeffrey input $I := \{p' : p'(B) = 1/2\}$. Note that $(p, I) \in \mathcal{D}$. What is the new belief state $p_I$? First, note that $I$ is weakly silent on the probability of $A \cap B$ given $B$. So, by strong conservativeness, we have $p_I(A \cap B|B) = p(A \cap B|B)$ (using the fact that $p(B) \neq 0$ and that $p_I(B) \neq 0$ by responsiveness), i.e.,

(*) $p_I(A|B) = 1$.

Similarly,

(**) $p_I(A|\overline{B}) = 1$.

(This is trivial if $A \cap \overline{B} = \overline{B}$, and can otherwise be shown like (*), using the fact that $I$ is weakly silent on the probability of $A \cap \overline{B}$ given $\overline{B}$.) By (*) and (**), $p_I(A) = 1$. Further, $I$ is weakly silent on the probability of $A \cap B$ given $A$, so that we have $p_I(A \cap B|A) = p(A \cap B|A)$, by strong conservativeness (using the fact that $p_I(A), p(A) \neq 0$). Since $p_I(A) = 1$ and given the definition of $p$, it follows that $p_I(B) = 1/4$. But, by responsiveness, we have $p_I(B) = 1/2$, a contradiction. ∎

A.3 Proposition 2

We begin by offering a convenient reformulation of strong silence; we leave the proof to the reader.

Lemma 3 For all inputs $I$ and all events $\emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I)$, $I$ is strongly silent on the probability of $A$ given $B$ if and only if

• $I$ contains a belief state $p$ with $p(A), p(B \setminus A) \neq 0$, and

• for every such $p \in I$ and every $\pi \in [0, 1]$, $I$ contains the belief state $p'$ which coincides with $\pi$ on the probability of $A$ given $B$ and with $p$ outside that conditional probability, formally $p' \in I$ where
$$p' = p(\cdot|A)\,\pi\,p(B) + p(\cdot|B \setminus A)\,(1 - \pi)\,p(B) + p(\cdot \cap \overline{B}).$$

Proof of Proposition 2. Consider $I \subseteq \mathcal{P}$ and $\emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I)$.

(a) First suppose $I_{A|B} = \mathcal{P}$. Consider any $\pi \in [0, 1]$. As $\emptyset \subsetneq A \subsetneq B$, there exists a belief state $p'$ such that $p'(B) \neq 0$ and $p'(A|B) = \pi$.
As $I_{A|B} = \mathcal{P}$, we have $p' \in I_{A|B}$, so that $I$ contains a $p$ (with $p(B) \neq 0$) such that $p(A|B) = p'(A|B)$, i.e., such that $p(A|B) = \pi$, as required for weak silence.

Now assume that $I$ is weakly silent on the probability of $A$ given $B$. Trivially, $I_{A|B} \subseteq \mathcal{P}$. We show that $\mathcal{P} \subseteq I_{A|B}$. Let $p' \in \mathcal{P}$. If $p'(B) = 0$, then clearly $p' \in I_{A|B}$. Otherwise, by weak silence, applied to $\pi := p'(A|B)$, $I$ contains a $p$ such that $p(B) \neq 0$ and $p(A|B) = p'(A|B)$, so that $p' \in I_{A|B}$.

(b) First, in the trivial case in which $I$ contains no $p'$ such that $p'(A), p'(B \setminus A) \neq 0$, the equivalence holds because strong silence is violated (see Lemma 3) and moreover $I_{\overline{A|B}} \neq I$ because $I_{\overline{A|B}}$ but not $I$ contains a belief state $p'$ such that $p'(A), p'(B \setminus A) \neq 0$.

Now consider the less trivial case in which $I$ contains a $\tilde{p}$ such that $\tilde{p}(A), \tilde{p}(B \setminus A) \neq 0$. First suppose $I_{\overline{A|B}} = I$. To show strong silence, consider any $\pi \in [0, 1]$ and any $p \in I$ with $p(A), p(B \setminus A) \neq 0$. By Lemma 3, it suffices to show that the belief state $p'$ which coincides with $p$ outside the probability of $A$ given $B$ and satisfies $p'(A|B) = \pi$ belongs to $I$. Clearly, $p'$ belongs to $I_{\overline{A|B}}$. Hence, as $I = I_{\overline{A|B}}$, $p'$ belongs to $I$.

Conversely, assume that $I$ is strongly silent on the probability of $A$ given $B$. Trivially, $I \subseteq I_{\overline{A|B}}$. To show the converse inclusion, suppose that $p' \in I_{\overline{A|B}}$. Then there exists $p \in I$ such that $p'$ and $p$ coincide outside the probability of $A$ given $B$ and such that $p(C) \neq 0$ for all $C \in \{A, B \setminus A\}$ with $p'(C) \neq 0$. We distinguish two cases.

First suppose $p(A), p(B \setminus A) \neq 0$. Then $p'(B) = p(B) \neq 0$. By $I$'s strong silence on the probability of $A$ given $B$, $I$ contains a belief state $\tilde{p}$ (with $\tilde{p}(B) \neq 0$) which satisfies $\tilde{p}(A|B) = p'(A|B)$ and coincides with $p$ outside the probability of $A$ given $B$. Note that, since $p(A), p(B \setminus A) \neq 0$, there can be only one belief state that coincides with $p$ outside the probability of $A$ given $B$ and such that the probability of $A$ given $B$ takes a given value. Therefore, $p' = \tilde{p}$, and so $p' \in I$, as required.
Next consider the special case in which $p(C) = 0$ for at least one $C \in \{A, B \setminus A\}$. As $p(C) = 0 \Rightarrow p'(C) = 0$ for each $C \in \{A, B \setminus A\}$ and as $p'(A) + p'(B \setminus A) = p'(B) = p(B) = p(A) + p(B \setminus A)$, it follows that $p'(C) = p(C)$ for each $C \in \{A, B \setminus A, \overline{B}\}$. This and the fact that $p'(\cdot|C) = p(\cdot|C)$ for all $C \in \{A, B \setminus A, \overline{B}\}$ for which $p'(C)$ ($= p(C)$) is non-zero imply that $p' = p$. So again $p' \in I$. ∎

A.4 The main theorem

As a key step towards proving our main theorem, we now answer the following question: on which conditional probabilities are the learnt inputs under Bayes's and Jeffrey's rules strongly silent?

Lemma 4 Any Bayesian input $I$ (of learning that some event $B'$ has occurred) is strongly silent on any conditional probability of $A$ given $B$ such that $\emptyset \subsetneq A \subsetneq B \subseteq B'$.

Lemma 5 For all Jeffrey inputs $I$ (of learning a new probability distribution on a partition $\mathcal{B}$) and all events $\emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I)$, $I$ is strongly silent on the conditional probability of $A$ given $B$ if and only if $B \subseteq B'$ for some $B' \in \mathcal{B}$.

Since Lemma 4 is a special case of Lemma 5, we turn directly to the proof of the latter.

Proof. Let $I$, $\mathcal{B}$, $A$, and $B$ be as specified, and let $(\pi_{B'})_{B' \in \mathcal{B}}$ be the learnt probability distribution on $\mathcal{B}$. First, if $B \subseteq B'$ for some $B' \in \mathcal{B}$, then $I$ is strongly silent on the probability of $A$ given $B$, as one can easily check, using Lemma 3.

Conversely, suppose that $B \not\subseteq B'$ for all $B' \in \mathcal{B}$. For each $D \subseteq \Omega$, we write $\mathcal{B}_D := \{B' \in \mathcal{B} : B' \cap D \neq \emptyset\}$. Note that $\mathcal{B}_B = \mathcal{B}_A \cup \mathcal{B}_{B \setminus A}$, where $\#\mathcal{B}_A \geq 1$ (as $A \neq \emptyset$), $\#\mathcal{B}_{B \setminus A} \geq 1$ (as $B \setminus A \neq \emptyset$), and $\#\mathcal{B}_B \geq 2$ (as otherwise $B$ would be included in a $B' \in \mathcal{B}$). It follows that there are $B' \in \mathcal{B}_A$ and $B'' \in \mathcal{B}_{B \setminus A}$ with $B' \neq B''$. Note that $I$ contains a $p$ such that $p(B' \cap A) = \pi_{B'}$ and $p(B'' \cap (B \setminus A)) = \pi_{B''}$. Since each of $B'$ and $B''$ has a nonempty intersection with $B$, and hence with $\mathrm{Supp}(I)$ ($\supseteq B$), we have $\pi_{B'}, \pi_{B''} \neq 0$. Now $p(B'' \cap A) = p(B'' \cap \overline{B}) = 0$, since
$$p((B'' \cap A) \cup (B'' \cap \overline{B})) = p(B'') - p(B'' \cap (B \setminus A)) = \pi_{B''} - \pi_{B''} = 0.$$
By Lemma 3, if $I$ were strongly silent on the probability of $A$ given $B$, $I$ would also contain the belief state $p'$ which coincides with $p$ outside the probability of $A$ given $B$ and satisfies $p'(A|B) = 1$; i.e., $I$ would contain the belief state $p' := p(\cdot|A)\,p(B) + p(\cdot \cap \overline{B})$. But this is not the case because
$$p'(B'') = p(B''|A)\,p(B) + p(B'' \cap \overline{B}) = 0 \neq \pi_{B''},$$
where the second equality uses the fact that $p(B'' \cap A) = p(B'' \cap \overline{B}) = 0$, which we have shown. Hence, $I$ is not strongly silent on the probability of $A$ given $B$. ∎

We can now complete the proof of our characterization theorem.

Proof of Theorem 1. It suffices to consider Jeffrey's rule, since Bayes's rule is extended by Jeffrey's. We first prove one direction of implication, and then prove the other direction.

Part 1: We consider a responsive and conservative revision rule on the domain $\mathcal{D}_{\text{Jeffrey}}$ and show that the rule is indeed Jeffrey's rule. Suppose $(p, I) \in \mathcal{D}_{\text{Jeffrey}}$, say $I = \{p' : p'(B) = \pi_B \ \forall B \in \mathcal{B}\}$. Then $p_I$ is given by Jeffrey's rule, because we may expand $p_I$ as
$$p_I = \sum_{B \in \mathcal{B} : p_I(B) \neq 0} p_I(\cdot|B)\,p_I(B), \qquad (4)$$
where $p_I(B)$ reduces to $\pi_B$ by responsiveness, and $p_I(\cdot|B)$ reduces to $p(\cdot|B)$ by conservativeness. (Note that, by Lemma 5, $I$ is strongly silent on the probability, given $B$, of any event strictly between $\emptyset$ and $B$.)

Part 2: Conversely, we now show that Jeffrey's rule is responsive and conservative. Responsiveness is obvious. To establish conservativeness, consider any $(p, I)$ in the domain $\mathcal{D}_{\text{Jeffrey}}$ and any events $\emptyset \subsetneq A \subsetneq B \subseteq \mathrm{Supp}(I)$ such that $I$ is strongly silent on the probability of $A$ given $B$ and $p_I(B), p(B) \neq 0$. We have to show that $p_I(A|B) = p(A|B)$. As a Jeffrey input, $I$ takes the form $\{p' : p'(B') = \pi_{B'} \ \forall B' \in \mathcal{B}\}$ for some learnt probability distribution $(\pi_{B'})_{B' \in \mathcal{B}}$ on some partition $\mathcal{B}$. Since $I$ is strongly silent on the probability of $A$ given $B$, we must have $B \subseteq B'$ for some $B' \in \mathcal{B}$, by Lemma 5.
It follows that $p_I(A|B) = p(A|B)$, because
$$p_I(A|B) = \frac{p_I(A)}{p_I(B)} = \frac{p(A|B')\,\pi_{B'}}{p(B|B')\,\pi_{B'}} = \frac{p(A)/p(B')}{p(B)/p(B')} = p(A|B),$$
where the second identity holds by the definition of Jeffrey revision.
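The two properties established in the proof can also be illustrated numerically on a small example. The sketch below is only an illustration under stated assumptions, not a substitute for the proof: the five-world prior, the two-cell partition, the learnt probabilities, and the helper names `prob`, `cond`, `jeffrey` and `nonempty_subsets` are all hypothetical choices. It checks responsiveness, checks that every conditional probability of $A$ given $B$ with $B$ inside a partition cell is preserved, shows one conditional probability spanning two cells that does change, and confirms that Jeffrey revision with probabilities 1 and 0 reduces to Bayesian conditioning.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical prior on five worlds, all with positive probability.
prior = {0: F(1, 10), 1: F(2, 10), 2: F(3, 10), 3: F(1, 10), 4: F(3, 10)}
# Hypothetical partition and learnt probabilities (a Jeffrey input).
partition = [frozenset({0, 1, 2}), frozenset({3, 4})]
new_probs = {partition[0]: F(1, 3), partition[1]: F(2, 3)}

def prob(p, event):
    """Probability of an event given as a set of worlds."""
    return sum(p[w] for w in event)

def cond(p, a, b):
    """Conditional probability of a given b (b must have non-zero probability)."""
    return prob(p, a & b) / prob(p, b)

def jeffrey(p, cell_probs):
    """Jeffrey's rule: rescale p within each cell to its learnt probability."""
    revised = {w: F(0) for w in p}
    for cell, pi in cell_probs.items():
        if pi != 0:
            cell_prior = prob(p, cell)  # assumed non-zero when pi > 0
            for w in cell:
                revised[w] = p[w] / cell_prior * pi
    return revised

def nonempty_subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

revised = jeffrey(prior, new_probs)

# Responsiveness: each cell receives exactly its learnt probability.
responsive = all(prob(revised, cell) == pi for cell, pi in new_probs.items())

# Conservativeness: p(A|B) is unchanged whenever A is a proper non-empty
# subset of B and B lies inside a single partition cell.
conservative = all(
    cond(revised, a, b) == cond(prior, a, b)
    for cell in partition
    for b in nonempty_subsets(cell)
    for a in nonempty_subsets(b)
    if a != b
)

# A conditional probability spanning two cells is not protected.
a, b = frozenset({0}), frozenset({0, 3})
cross_before, cross_after = cond(prior, a, b), cond(revised, a, b)

# Bayes's rule as the special case with learnt probabilities 1 and 0.
bayes_via_jeffrey = jeffrey(prior, {partition[0]: F(1), partition[1]: F(0)})
bayes_direct = {w: (prior[w] / prob(prior, partition[0]) if w in partition[0] else F(0))
                for w in prior}

print(responsive, conservative, cross_before, cross_after)
print(bayes_via_jeffrey == bayes_direct)
```

Because all prior probabilities are positive in this example, no conditional probability is undefined; with zero-probability events the side conditions $p(B), p_I(B) \neq 0$ from the proof would have to be checked explicitly.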