Working draft. Please do not cite or circulate. Comments very welcome.

New boundary lines

Alejandro Pérez Carballo
University of Massachusetts, Amherst

According to Bayesian orthodoxy, a rational agent's epistemic state is essentially a probability function: her credence function. At any point in time, an agent's credence function is defined over a hypothesis space, typically a Boolean algebra of propositions or events. As an agent receives new evidence, her credence function is updated by conditionalizing on her evidence in the following sense: if an agent's credence function at a given point is $C$ and she receives evidence $e$, then (assuming $C(e) \neq 0$) her updated probability function $C'$ is set to:

$$C'(x) = C(x \mid e) = \frac{C(xe)}{C(e)}.$$

Typically, conditionalization is taken to be the only rational epistemic change. And this entails that a rational agent's hypothesis space will remain constant throughout time. This is because for any $e$, if $C(p)$ is not well-defined, neither is $C(p \mid e)$.¹

¹ The same holds of Jeffrey-conditionalization (Jeffrey 1983), for exactly the same reasons. Matters are less straightforward if we take conditional probabilities to be primitive, as in e.g. Hájek 2003, Popper 1959, Rényi 1970. For if we allow $C(x \mid e)$ to be well-defined even when the ratio is not, we can have $C(p \mid e)$ be well-defined even when $C(p)$ is not. But in any case, no 'radical' changes in the hypothesis space are allowed. More precisely, let the generalized hypothesis space consist of those propositions $x$ for which $C(x \mid y)$ is well-defined for some $y$ that is logically independent of $x$. Even if we allow for primitive conditional probabilities, it is a consequence of Bayesian orthodoxy that an agent's generalized hypothesis space will remain constant throughout time.
Furthermore, as will become clear later on, while capturing the phenomenon of rational conceptual change requires the possibility of changes in the hypothesis space, it is not obviously sufficient. The kind of change we will be interested in is rarely (if at all) triggered by new evidence.

dfd9d16 on 2020-01-09 ⋅ branch master

Is this a feature or a bug? Surely, some important changes in our epistemic states are left out of the orthodox picture. Typical examples involve the introduction of new theories, usually via the introduction of new concepts. In acquiring the concept of a gene, say, we enlarged our hypothesis space: for example, we introduced new competing hypotheses to explain inheritance patterns of certain phenotypic traits.

Still, one might think such changes fall outside of the scope of a theory of epistemic rationality. After all, epistemic rationality arguably requires a certain amount of immodesty: if you are rational, then you should take yourself to be doing as well as you can, epistemically, given your available evidence. And it certainly seems as if changing your epistemic state without new evidence would involve moving to an epistemic state that is worse, by your own lights, than your current state. So it is tempting to conclude that a change in one's hypothesis space, at least when it is not the result of acquiring new evidence, can never be epistemically rational. (A good inquirer might be required to act in certain ways; perhaps she will be required not to shield herself from relevant evidence. Yet as far as what her epistemic state should be, one might think, the only sensible thing to say is whatever orthodoxy has to say.)

But one can consistently maintain that epistemic rationality is immodest and that some rational epistemic changes need not be triggered by new evidence. To see what I have in mind, consider the following analogy.

Lars is a fun-maximizer. At any point in time, he always takes himself to be doing the most fun thing he can do.
While having lunch, Lars' perspective on the world is quite striking: I could be doing a number of things right now, but nothing would be as fun as having lunch. Of course, Lars is not stuck having lunch all day long. As he eats, his evidence changes: he acquires evidence that he is no longer hungry. He thus proceeds to do what he takes to be the most fun activity he could do, in light of his new evidence: sit quietly under an oak tree.

Now, if you looked at the specific choices Lars makes throughout his life, you would find them incredibly boring: imagine he spends his waking hours alternating between having lunch and sitting quietly under an oak tree. But this is not because his preferences are much different from yours. Rather, the problem is that Lars lacks imagination. If you could only get him to see that there are many things he could do with his day other than spend it quietly under an oak tree, you know he would be thankful. And this can be so even though Lars is doing as well as he can in order to have fun. Among all the options that ever occur to him as things he could be doing at any particular time, he always takes himself to be doing the one he finds most fun. But this is not because he is always having that much fun: he is just not able to imagine all the fun things he could do instead.

Now consider an agent, Tom, who always takes himself to be doing as well as he can, epistemically. He only has beliefs about a particular issue: whether he is tired. He is very good at responding to the evidence he receives, and at any time, he takes himself to be doing as well as he can with regard to the issue of whether he is tired. To some extent, Tom is doing quite well, epistemically. But he could be doing much better: he could be asking questions about many things, including issues that have little to do with how tired he is.
If Tom lacks imagination, he could take himself to be doing as well as he can, epistemically, because he only considers a limited range of options. If he came to see that he could be in an epistemic state he had not considered, he would pick it in a heartbeat. So if a new issue occurred to him, he could in principle come to have a view on the matter without having acquired any new evidence. And crucially, without taking his earlier self to have been at fault.

We can think of the introduction of new concepts as an increase in our capacity for epistemic imagination. Once new concepts are introduced, epistemic states that were not previously available to us become available. The question arises as to how this affects what epistemic state we ought to be in: is it ever rational for us to move to a new epistemic state just because new concepts have been introduced?

This is a type of epistemic change that orthodox Bayesian theories of epistemic rationality have little to say about.² Indeed, the question cannot even be formulated within the orthodox picture. Here I want to sketch a way of thinking about rational epistemic change that gives us the flexibility to ask that question. The hope is to show how epistemic rationality could constrain expansions of the range of hypotheses we rely on in inquiry.

² Let me flag here an important distinction, one we will return to, between two distinct if related questions. First, assuming new concepts will be introduced, what is the rational way of distributing credence among the resulting set of propositions? Second, assuming new concepts could be introduced, when is it rational to do so? My concern here will mainly be with the second of these two questions. The first, of course, is one I will need to have something to say about. But I will remain largely neutral on what the best way to answer it is. See e.g. Romeijn 2005, Williamson 2003 for discussion.

There will be many moving parts.
It will thus be helpful, before moving on, to give a detailed road map of what is to follow. I start (§1) by making explicit my main working assumptions. I then spell out in §2 a minimal way of modeling conceptual change within a Bayesian picture of our epistemic states. Next (§3), I situate the question of the rationality of conceptual change within the broadly decision-theoretic approach to epistemic rationality sometimes labeled 'epistemic utility theory'. Doing so requires extending the existing epistemic utility framework to allow for comparisons of epistemic states that assign credence to different sets of propositions. I then motivate a richer account of epistemic utility (§4), one that goes beyond accuracy considerations and incorporates explanatory considerations. I offer a particular framework for doing so, one where accuracy with respect to a proposition is valued in a way that is proportional to its explanatory worth. After motivating a particular way of understanding the explanatory worth of a given proposition, I offer in §5 a way of measuring explanatory worth so as to generate the weights needed to define epistemic utility functions that are sensitive to explanatory considerations. Before closing and listing a few issues left outstanding (§7), I turn back in §6 to some of the motivating toy examples and show how the strategy I develop can be used to vindicate some of our intuitive judgments about the epistemic gain that comes from certain instances of conceptual innovation.

1 Methodological assumptions

I will identify propositions with sets of possible worlds. I will speak interchangeably of the conjunction and the intersection of two propositions; similarly for the disjunction or union of two propositions, and the negation or the complement of a proposition. I will also speak interchangeably of a proposition entailing another and of the former being a subset of the latter.

I will be relying on a familiar, broadly Bayesian picture of our cognitive states.
An epistemic agent, for our purposes, is essentially a credence function: a function $C$ that assigns real numbers to propositions, subject to the following constraints:³

bayesian core: $C$ is a probability function defined over a finite Boolean algebra $B_C$ of subsets of $W$, the collection of possible worlds. That is, $C$ is a real-valued function whose domain is a collection of propositions that is closed under conjunction and negation, and that satisfies the following constraints:
a. $C(x) \geq 0$.
b. $C(W) = 1$.
c. $C(x \vee y) = C(x) + C(y)$, whenever $x$ and $y$ are incompatible propositions (i.e., whenever their intersection is empty) in the domain of $C$.

Whenever $C$ satisfies the constraints in bayesian core, I will say that $C$ is probabilistically coherent. I will often refer to the domain of $C$ as its hypothesis space.

Fix such a function $C$. Since $B_C$ is finite, there is a unique subset $\pi_C$ of $B_C$ of atoms: for all $s \in \pi_C$ and $x \in B_C$, either $s$ entails $x$ or $s$ and $x$ are incompatible. Any nonempty member of $B_C$ can thus be written out as the disjunction of elements of $\pi_C$. I will call $\pi_C$ the state space of $C$. I will sometimes refer to the atoms of $B_C$ as $C$-atoms. If $C$ is defined over the entire collection of propositions, its state space will be the collection of singleton propositions, viz. propositions of the form $\{w\}$ for $w \in W$. But in general, $\pi_C$ will be a partition of $W$, that is, a collection of pairwise exclusive and jointly exhaustive subsets of $W$.

When there is no risk of ambiguity, I will drop the subscripts and simply use $\pi$ (and $B$) to talk about the state space (resp. the domain) of $C$. By design, $B$ is the Boolean closure of $\pi$: the smallest Boolean algebra that contains all the members of $\pi$. For any $x \in B$, we have

$$C(x) = \sum_{s \in \pi} C(sx) = \sum_{s \subseteq x} C(s).$$

³ Throughout, I will be assuming we are dealing with credence functions defined over a finite collection of propositions.
As far as I can see, the bulk of what I say survives if we allow for credence functions defined over countably many propositions. The gain in generality, however, would come at the cost of simplicity. (I do not know what would happen if we took into account credence functions defined for uncountably many propositions.)

Thus, in order to specify a given credence function, all we need to know is what values it assigns to elements of its state space. The elements of $\pi$ (the $C$-atoms) can thus be thought of as the possible states of the world that play the role of possible worlds for $C$.⁴

I will call $C_e$ the credence function that results from updating $C$ on evidence $e$. When $e \in B_C$, I will assume:

bayesian update: For $e, x \in B$, if $C(e) \neq 0$,
$$C_e(x) = C(x \mid e) = \frac{C(xe)}{C(e)}.$$

This completes the sketch of our background framework. Now, our concern here is, ultimately, with the rationality of conceptual change. So before we can ask whether conceptual change can ever be epistemically rational, we need a way of thinking about conceptual change within this background picture of our cognitive economy.

2 Modeling conceptual change in a Bayesian framework

Concepts, we are often told, are 'constituents of thoughts'. This presupposes a certain picture of thoughts as having structure: thoughts are 'built out of' concepts, as it were. But even those who want to reject this presupposition can agree that, to the extent that concepts have a role to play in a theory of the mind, what concepts one has constrains what thoughts one can have. Conceptually impoverished creatures, we can say, can only entertain a limited range of propositions. And acquiring new concepts, among other things, increases the range of thoughts an agent can have.⁵ This minimal characterization of the role of concepts in our cognitive economy suggests a straightforward way of thinking about conceptual change in a Bayesian framework.⁶

⁴ Elements of the state space can be thought of as 'small worlds', in the terminology of Savage 1972.
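The background framework of §1 can be illustrated with a small sketch. It represents a credence function by the values it assigns to the atoms of its state space, computes $C(x)$ by summing over the atoms that entail $x$, and applies bayesian update. The three-atom state space, the atom names, and the helper names (`is_coherent`, `prob`, `conditionalize`) are assumptions made up for illustration; nothing here is meant as more than one possible encoding.

```python
from fractions import Fraction

# Hypothetical state space with three atoms (cells of a partition of W).
credence = {"s1": Fraction(1, 2), "s2": Fraction(3, 10), "s3": Fraction(1, 5)}

def is_coherent(c):
    """Check bayesian core at the level of atoms: weights are non-negative and sum to 1."""
    return all(v >= 0 for v in c.values()) and sum(c.values()) == 1

def prob(c, x):
    """C(x) for a proposition x given as a set of atoms: the sum of C(s) over atoms s in x."""
    return sum(c[s] for s in x)

def conditionalize(c, e):
    """Return C_e, which is defined only when C(e) != 0."""
    ce = prob(c, e)
    if ce == 0:
        raise ValueError("C(e) = 0: the conditional credence is undefined")
    return {s: (c[s] / ce if s in e else Fraction(0)) for s in c}

x = {"s1", "s2"}   # a proposition in the domain
e = {"s2", "s3"}   # the evidence
updated = conditionalize(credence, e)

print(prob(credence, x))  # 4/5
print(prob(updated, x))   # 3/5
```

Note that `conditionalize` returns a function over the same atoms as its input, which is the sense in which conditionalization leaves the hypothesis space unchanged.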
⁵ Whether the relation between having a concept and having the ability to entertain the relevant thoughts is one of constitution or not is something I want to remain neutral on for the purposes of this paper.

⁶ For related discussion of how this approach can model talk of concepts in a broadly Bayesian picture, see Yalcin 2018, as well as [redacted for blind review].

2.1 Epistemic transitions

Think of an epistemic transition as a change from one credence function to another. A transition is trivial iff the initial and final credence functions are identical. An epistemic transition is an update if the initial credence function and the final credence function of the transition have the same domain.⁷ An epistemic transition is an expansion if the domain of the initial credence function is a subset of the domain of the final credence function. A credence function $C'$ is an extension of $C$ iff $C'$ is an expansion of $C$ that is conservative in the following sense: for all $x$ in the domain of $C$, $C'(x) = C(x)$ (see Figure 1). Other types of transitions are possible.

It is natural to think of the domain of an agent's credence function as the collection of propositions the agent can entertain.⁸ To fix terminology, then, I will say that an epistemic transition is an instance of conceptual change only if the domain of the final credence function is distinct from the domain of the initial credence function, that is, if there are propositions in the domain of one of the two credence functions that are not in the domain of the other. So expansions, as I understand them, count as instances of conceptual change.

I should emphasize that this is not meant as an analysis of the intuitive notion of conceptual change. Instead, it is a modeling suggestion: if we model epistemic states using credence functions, we can model changes in an agent's conceptual resources as changes in the domain of her credence function.

This modeling choice has the advantage of requiring a minimal adjustment to the orthodox Bayesian picture. Admittedly, it leaves many questions unanswered.⁹ But the resulting picture is flexible enough to model some important features of conceptual change. A couple of choice examples illustrate this point. (These examples will prove to be helpful later on, so I will go through them in a bit more detail than might seem necessary at this point.)

⁷ Hence, not all updates satisfy the constraints in bayesian update.

⁸ More precisely, my proposal is that we think of the domain of the agent's credence function [...] There are other interpretations. Some have identified the domain of an agent's credence function (at a time t) as the collection of propositions the agent is attending to (at t). See e.g. Franke & de Jager 2011, Swanson 2006, Yalcin 2007. For our purposes here, however, I want to ignore the difference between the propositions an agent is attending to and those she is not.

Figure 1: Three kinds of epistemic transitions. (a) An update: cell credences 0.4, 0.6 become 0.2, 0.8. (b) An expansion: cell credences 0.4, 0.6 become 0.25, 0.25, 0.25, 0.25. (c) An extension: cell credences 0.4, 0.6 become 0.2, 0.2, 0.3, 0.3. Think of each square as representing the collection of possible worlds, partitioned along the lines of the state space of a credence function. We can use shades of gray to represent the amount of probability assigned to elements of the corresponding partition: the darker the cell, the more credence it gets. Updates can thus be represented by changes in the degree of gray in the cells of the partition. Expansions may also involve moving to a finer partition. Extensions are expansions that leave the distribution of probability over the state space of the prior function unchanged.
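The definitions of §2.1 can be checked mechanically on small examples. The sketch below represents a credence function as a map from atoms (sets of worlds) to credences and tests which of the labels 'trivial', 'update', 'expansion', and 'extension' apply to a transition. It relies on the fact that, for finite algebras over the same set of worlds, the domain of the initial function is included in the domain of the final one exactly when the final state space refines the initial one. The four-world set, the way the two cells are split, and the function names are assumptions for illustration; the credence values are those reported for Figure 1.

```python
from fractions import Fraction as F

def refines(fine, coarse):
    """True if every cell of `fine` lies inside some cell of `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)

def prob(cred, x):
    """Credence in x, where x is a union of atoms of `cred`."""
    return sum(p for atom, p in cred.items() if atom <= x)

def classify(initial, final):
    """Report which labels from section 2.1 apply to the transition."""
    expansion = refines(final, initial)
    extension = expansion and all(prob(final, a) == p for a, p in initial.items())
    return {
        "trivial": initial == final,
        "update": set(initial) == set(final),  # same state space, hence same domain
        "expansion": expansion,
        "extension": extension,
    }

# Hypothetical worlds 1..4; the initial state space has two cells.
left, right = frozenset({1, 2}), frozenset({3, 4})
cells = [frozenset({w}) for w in (1, 2, 3, 4)]

initial = {left: F(2, 5), right: F(3, 5)}
panel_a = {left: F(1, 5), right: F(4, 5)}
panel_b = dict(zip(cells, [F(1, 4)] * 4))
panel_c = dict(zip(cells, [F(1, 5), F(1, 5), F(3, 10), F(3, 10)]))

for name, final in [("a", panel_a), ("b", panel_b), ("c", panel_c)]:
    print(name, classify(initial, final))
```

On these definitions the labels are not exclusive: any update also counts as an expansion, since a domain is a subset of itself, which is why the function returns a set of flags rather than a single label.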
2.2 Toy example 1: The Reds

You are studying an unfamiliar type of organism, call them 'reds', and their reactions to certain stimuli.¹⁰ You keep your reds inside dark boxes for a little while and then proceed to flash different colored lights on them to see how they react.

On Monday, you notice that some reds start moving faster when exposed to blue lights; others show no change in behavior when exposed to blue light. You also notice that some reds start moving faster after being exposed to red lights; others show no change in behavior when exposed to red light. To your surprise, you notice too that most of the reds that move faster after exposure to red light are among those that move faster after being exposed to blue light, and that most of those that move faster after being exposed to blue light are among those that move faster after exposure to red light.

On Tuesday, it occurs to you that there could be an internal state R such that being in state R makes a red respond to blue and red light by moving faster than normal. Once you bring R into the picture, you can formulate a hypothesis that could be used to explain why some reds respond to blue and red lights the way they do: for example, that they, unlike the rest, are in state R.¹¹

In order to model your epistemic state on Monday we need a credence function whose domain is the Boolean closure of propositions of the form: 'c moves faster after exposure to blue light', 'c moves faster after exposure to red light', 'c's behavior is unaffected by exposure to blue light', 'c's behavior is unaffected by exposure to red light'. To model your epistemic state on Tuesday, we need an expansion of your Monday credence function. We need a credence function whose domain includes the domain of your Monday credence function together with propositions of the form 'c is in state R'. It is largely because of the addition of these new propositions that you can formulate the hypotheses that reds in state R move fast when exposed to red light and that a red in state R stops moving while exposed to blue light.

⁹ For example, take an epistemic transition that, according to our terminology, counts as an instance of conceptual change. Which are the new concepts available in the final stage? Nothing in what I have said thus far allows us to answer that question. To be sure, our modeling suggestion could be refined in many ways. For example, it may be that not all changes in the domain of an agent's credence function correspond to genuine instances of conceptual change. Perhaps only those transitions where the final credence function satisfies some kind of closure condition (say, something like the so-called generality constraint of Evans 1982) should be classified as instances of conceptual change. Further elaboration of these ideas, however, is beyond the scope of this paper.

¹⁰ This example is based on a series of cases discussed in great detail in Sober 1998. See also Forster 1999, for related discussion of how conceptual innovation can be motivated by epistemic considerations.

¹¹ I am not taking a stand on whether this is a good explanation. Whether this amounts to anything beyond a 'dormitive virtue' explanation is not relevant to our present purposes.

2.3 Toy example 2: Cubes and spheres

On Wednesday you discover some critters.¹² They come in two shapes, spheres and cubes. You soon realize that by squeezing two of them together, a new, slightly larger critter appears. You start gathering data about how the shape of the generated critter depends on the shape of the two you pressed against one another, as depicted on Table 1.

                Second parent
First parent    cube      sphere
cube            cube      cube
sphere          cube      sphere

Table 1: Shape of second-generation critters as a function of parents' shape.

Once you press the new, slightly larger critters against one another, things get messier. Spheres pressed together yield even larger spheres. But for every other combination of the slightly larger critters you notice that the shape of the created critter depends on the shapes of the ones you pressed together and of the shapes of the ones those came from (see Table 2).

                      Second parent
First parent          cube, cube parents    cube, mixed parents   sphere
cube, cube parents    100% cube             100% cube             100% cube
cube, mixed parents   100% cube             75% cube              50% cube
sphere                100% cube             50% cube              100% sphere

Table 2: Shape of third-generation critters as a function of ancestors' shape.

Things get even messier when you try to gather data about third generation critters. (As it happens, squeezing together critters of different sizes does nothing to them.)

On Thursday morning it occurs to you that cube critters could be of two kinds, call them pure and hybrid. Once you factor in hybrid cubes, you can accommodate your data using the hypothesis in Table 3.

              Second parent
First parent  pure cube                        hybrid cube                                  sphere
pure cube     100% pure cube                   50% pure cube, 50% hybrid cube               100% hybrid cube
hybrid cube   50% pure cube, 50% hybrid cube   25% pure cube, 50% hybrid cube, 25% sphere   50% hybrid cube, 50% sphere
sphere        100% hybrid cube                 50% hybrid cube, 50% sphere                  100% sphere

Table 3: A new hypothesis about the shape inheritance pattern.

To model your epistemic state on Wednesday, we need a credence function that is defined over the Boolean closure of all propositions of the form: 'c is a cube', 'c is a sphere', 'c is an nth-generation critter', and 'a and b are c's parents'. To model your epistemic state on Thursday, we need an expansion of your Wednesday credence function. Its domain should be the Boolean closure of the domain of your Wednesday credence function together with propositions of the form 'c is a pure cube' and 'c is a hybrid cube'.

¹² The example that follows is essentially due to Arntzenius 1995.
Crucially, the hypothesis that the inheritance pattern of shapes is given by Table 3 is in the domain of your Thursday credence function, but not in the domain of your Wednesday credence function. This allows us to think of the change in your epistemic state that took place on Thursday as an instance of conceptual change.

Figure 2 (left panel: generations 1st, 2nd, 3rd against shapes cube, sphere; right panel: generations 1st, 2nd, 3rd against shapes hybrid cube, pure cube, sphere): Each of the figures above represents a different space of hypotheses about the properties of a given critter. The space of hypotheses on the left only contains propositions of the form 'x is an nth generation critter of shape S', for n ∈ {1, 2, 3} and S one of 'cube' or 'sphere'. The output of the expansion contains a larger space of hypotheses, also generated by propositions of the form 'x is an nth generation critter of shape S', but where S is allowed to range over 'pure cube', 'sphere', and 'hybrid cube'. Note that we're identifying the hypothesis 'x is a cube', in the first hypothesis space, with the hypothesis 'x is a hybrid cube or x is a pure cube' in the second hypothesis space. The proposition that the critter is a second-generation pure cube (in gray) is one that is in the domain of the output credence function, but not in that of the input credence function.

3 Epistemic utility and epistemic rationality

Which epistemic transitions are rational? On a standard Bayesian picture, the only rational epistemic transitions are updates. This is not because Bayesian dynamics has a criterion that is applicable to all epistemic transitions and which rules out as non-rational anything other than updates. Rather, it is because the scope of Bayesian dynamics is limited to updates: the question Bayesian dynamics aims to answer is whether an update is a rational response to the acquisition of a new piece of evidence.
If we are to tackle the question of the rationality of conceptual change, we need a framework in which the general question of the rationality of epistemic transitions can be formulated. Fortunately, we need not look too far. We can adapt standard decision-theoretic tools to build such a framework.¹³

3.1 The epistemic decision-theoretic framework

Start by focusing on a simple form of expected utility theory. We evaluate different alternatives relative to a credence function and a utility function, that is, an assignment of numerical values to each alternative relative to a given state of the world. A particular alternative is better than another if it has a higher expected utility, again relative to a credence function and a utility function.

More precisely, suppose $C$ is a credence function with state space $\pi$ and $U$ assigns a numerical value to each alternative $a \in A$ relative to each $s \in \pi$. Then, the expected utility of $a$, relative to $C$, is given by

$$E_C[U(a)] = \sum_{s \in \pi} C(s) \cdot U(a, s).$$

The canonical application of this framework is to give an account of rational decision theory: to solve decision problems. Typically, a decision problem is just a set of alternative courses of action. We evaluate these relative to a given utility function and a credence function. On one view, an agent's actions are rational just in case they have the highest expected utility (among a relevant set of alternatives) relative to her own credence function and her own utility function.¹⁴

But we can apply this conception of rationality to other situations. Whenever we have a range of options and an assignment of utility to each option relative to each possible state of the world, we can apply expected utility theory to evaluate each of the relevant options. In particular, we can think of decision problems where the alternatives are possible epistemic states one could be in. So long as we have a credence function (defined over possible states of the world) and a utility function defined over the relevant alternatives (relative to a possible state of the world), we can use expected utility to compare different possible epistemic states.

¹³ There is a growing body of literature on so-called epistemic utility theory. See Greaves 2013, Greaves & Wallace 2006, Joyce 1998, 2009, Leitgeb & Pettigrew 2010, Moss 2011, Pettigrew 2012, 2013. As will become apparent below, I will be building upon some of this work. The framework I will develop can be seen as a generalization of the more traditional epistemic utility theory to allow for variability in the domain of the credence functions that are the object of evaluation. For some critical discussion of the particular modeling assumptions of most work in epistemic utility theory, see Caie 2013. For critical discussion of the underlying 'consequentialist' framework, see Berker 2013. For insight on how to understand the epistemic utility framework without the seemingly problematic teleological assumptions of most interpretations of the framework, see Stalnaker 2002.

¹⁴ Since the debate over whether to formulate decision theory in evidential or causal terms is orthogonal to our present purposes, I am skating over some subtle issues here. Everything I go on to say would survive defining expected utility in more evidentialist-friendly ways, since I will be restricting my attention to cases where the different formulations of decision theory give the same verdict.

This way of evaluating alternatives is relativistic in an important sense: it only makes sense to ask whether a given option is better than another relative to a particular credence function and utility function.¹⁵ But this does not prevent us from using it to capture thicker notions of value: we only need to make more restrictions on what is to count as an admissible utility function (ditto for credence functions). In particular, we can use it to capture a notion of epistemic value.
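The expected utility formula of §3.1 can be spelled out as a short computation. The sketch below evaluates $E_C[U(a)]$ for each alternative and picks the one with the highest value. The two-state decision problem, its alternatives, and all of the numbers are hypothetical and serve only to exercise the formula.

```python
def expected_utility(credence, utility, alternative):
    """E_C[U(a)]: the sum over states s of C(s) * U(a, s)."""
    return sum(p * utility[alternative][s] for s, p in credence.items())

# Hypothetical decision problem; the state space has two cells.
credence = {"rain": 0.25, "dry": 0.75}
utility = {
    "take umbrella": {"rain": 5.0, "dry": 3.0},
    "leave umbrella": {"rain": 0.0, "dry": 6.0},
}

scores = {a: expected_utility(credence, utility, a) for a in utility}
best = max(scores, key=scores.get)

print(scores)  # {'take umbrella': 3.5, 'leave umbrella': 4.5}
print(best)    # leave umbrella
```

Nothing in `expected_utility` depends on the alternatives being actions: the keys of `utility` could just as well name candidate credence functions, which is the application the text goes on to pursue.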
Given a credence function $C$, an 'epistemic utility function' $u$, and a set of epistemic states, we say that an epistemic state is better than another, epistemically, iff it has higher expected utility relative to $C$ and $u$.

Now, for this to be of any help, we need to specify what a utility function must be like if it is to count as an epistemic utility function, that is, a utility function that corresponds to an epistemic dimension of evaluation. And you might worry that there is not much content to the notion of purely epistemic value: perhaps any epistemic dimension of evaluation will be somewhat entangled with pragmatic considerations.¹⁶

Still, we can aim to evaluate our beliefs so as to minimize the interference of pragmatic considerations. We can set aside particular idiosyncrasies of our judgments of practical value, and focus instead on how some beliefs more than others help us make sense of the world.

To give you a sense of the kind of evaluation I'm after, consider the following case.¹⁷ An Oracle (which you know to be perfectly reliable) tells you the truth-value of every proposition. She then tells you that you will be put to sleep and your memory will be erased. Fortunately, you can now pick which credence function you will wake up with. If you could pick any credence function, I suppose you would know what to choose: the one that assigns 1 to all and only the true propositions. But here is the catch: you cannot pick any credence function. You will be given a choice among a small set of credence functions which does not include the one you currently have.

¹⁵ Cf. Stalnaker 2002, p. 158.

¹⁶ For more on skepticism about a purely epistemic notion of value, see Gibbard 2008, as well as Arntzenius 2008.

¹⁷ A similar case is used for essentially the same purpose in Moss 2011, p. 1063f. Thanks to [redacted for blind review] for bringing this to my attention.
I suspect you have a rough idea of how you will choose, if you set aside your practical interests for a moment. You will be able to compare at least some epistemic states with each other, in a way that corresponds to an epistemic dimension of evaluation. Intuitively, a utility function will count as an epistemic utility function just in case it corresponds to the way a fully informed agent would rank epistemic states from an epistemic perspective.

To be sure, this cannot be a definition of an epistemic utility function, unless we have a clear enough notion of what an epistemic dimension of evaluation is. But it is a useful heuristic, one that can help motivate plausible conditions on epistemic utility functions.

3.2 Accuracy-based measures of epistemic value

One way in which we can evaluate epistemic states along an epistemic dimension is in terms of accuracy. My credence in $p$ is accurate, relative to the actual world, to the extent that it is close to $p$'s actual truth-value. If $p$ is true, there is a relevant sense in which, again relative to the actual world, a credence function that assigns .9 to $p$ is better, all else equal, than one that assigns .8 to $p$.

This intuitive thought has been made precise in the literature on so-called scoring rules or accuracy measures. An accuracy measure, as I will use the term, is a function that assigns to each credence function and possible state of the world a numerical value: a measure of how accurate the given credence function is relative to that state of the world. An accuracy measure is thus a plausible example of an epistemic utility function.

There is a lively, ongoing debate about how best to characterize measures of accuracy.¹⁸

¹⁸ See, e.g. Joyce 2009, Leitgeb & Pettigrew 2010.

For now, and for the sake of concreteness, let us stick to a particular working example of a measure of accuracy, viz. the Brier score $\beta^*$, defined as:

$$\beta^*(C, s) = -\sum_{t \in \pi} \bigl(C(t) - \mathbf{1}\{s = t\}\bigr)^2,$$

where $\mathbf{1}\{s = t\}$ equals 1 if $s = t$ and 0 otherwise.
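As a quick check on the definition, the Brier score can be computed for the two credence functions mentioned above, one assigning .9 to $p$ and one assigning .8, relative to a state in which $p$ is true. The two-cell state space {p, not p} and the names `brier`, `c_high`, and `c_low` are assumptions for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

def brier(credence, s):
    """beta*(C, s): minus the sum over atoms t of (C(t) - 1{s = t})^2."""
    return -sum((p - (1 if t == s else 0)) ** 2 for t, p in credence.items())

# Two credence functions over the hypothetical state space {p, not p}.
c_high = {"p": F(9, 10), "not p": F(1, 10)}
c_low = {"p": F(8, 10), "not p": F(2, 10)}

print(brier(c_high, "p"))  # -1/50
print(brier(c_low, "p"))   # -2/25
```

Since -1/50 is greater than -2/25, the function that assigns .9 to $p$ scores better at the state where $p$ is true, in line with the intuitive verdict.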
This utility function assigns, to each credence function C and state s ∈ π_C, the sum over t ∈ π_C of the negative square of the distance between C(t) and t's truth-value at s. Since we can think of the negative square of the distance between C(t) and t's truth-value at s as a measure of the accuracy of assigning C(t) to t when s obtains, we can think of β∗(C, s) as telling us how accurate C is at s.

What happens if we assume that epistemic rationality is a matter of maximizing the expected Brier score of our credence function? Remarkably, we can derive a large part of the tenets of a Bayesian picture of epistemic rationality. There is an interesting sense in which, if rationality is a matter of maximizing the expected Brier score of our credal state, then rationality requires that we have a probabilistically coherent credal state and that we update our degrees of belief by conditionalizing on new evidence; in short, that our credence functions satisfy Bayesian Core and Bayesian Update.19,20

3.3 Expansions and epistemic utility

We can formulate the Bayesian picture of epistemic rationality within the framework of epistemic utility theory. But the scope of the epistemic utility framework is broader than that of the orthodox Bayesian picture. So far, the focus in the literature has been on epistemic utility functions defined over a collection of credence functions with a fixed domain. But in principle, an epistemic utility function could be defined so as to compare, relative to a given world, credence functions with different domains.

19 Cf. Greaves & Wallace 2006, Joyce 2009, Leitgeb & Pettigrew 2010. One needs to tread carefully if one is to argue that probabilistic coherence is a requirement of rationality. So far, I have only talked of credal states that are probabilistically coherent, and defined the notion of expected epistemic utility for probability functions.
Joyce offers a more general definition, where it makes sense to talk about the expected epistemic utility of an assignment of degrees of belief relative to another assignment, even if neither of them is probabilistically coherent. Since those issues are orthogonal to our present concerns, I will simply refer the reader to Joyce's paper for the details.

20 Certain complications arise if we look at credence functions defined over propositions about one's own credal assignments. For present purposes, I will simply assume those complications away: any credence function that will be relevant here is one that is undefined for any such 'higher-order' proposition.

Return to the example from §2.3, involving cubes and spheres. Consider your Wednesday credence function, one whose domain does not include propositions about whether a critter is a pure cube. Consider now your Thursday credence function, one whose domain does include those propositions. Here are a few things about it worth noting:

● The hypothesis (call it t1) that the inheritance pattern of shapes among the critters is given by Table 3 is generation independent. It applies equally well to critters of any generation, and so is more general than the hypotheses available to you on Wednesday.
● Our hypothesis t1 (which, again, is in the domain of your Thursday credence function, but not in the domain of your Wednesday credence function) accurately predicts the inheritance pattern of shapes among the critters. And t1 is simpler than any reasonably accurate hypothesis that you could have formulated with your Wednesday credence function.
● On Thursday you are in a position to explain why a particular pair of cubes have some spheres as offspring by appealing to the fact that they are both hybrid cubes.
And this explanation is better, I submit, than any one you could have formulated on Wednesday.21

In short, there is something to be said, epistemically, and relative to the actual world, for your Thursday credence function over your Wednesday credence function. A theory of epistemic utility should be sensitive to such facts.

21 For further discussion of this issue, see §4.3.

Our first order of business is to spell out a way of comparing credence functions with different hypothesis spaces relative to the same state of the world. This will require introducing a few definitions. Let 𝒫 be the collection of all probability functions with a finite domain. To fix terminology, let us stipulate that a utility function is a function u that associates, to each probability function P ∈ 𝒫 and world w ∈ W, a real number u(P, w), which I will call the utility of P in w.

Definition 3.1. A utility function is a real-valued function defined over 𝒫 × W.

Intuitively, any such function must satisfy the following desideratum: if P does not distinguish between w and w′ (that is, if for all x in the domain of P, w ∈ x iff w′ ∈ x) then u(P, w) = u(P, w′). Thus, for any world w, u(P, ⋅) should be constant throughout the π_P cell of w, that is, the unique element of π_P containing w. This amounts to the following constraint:

Definition 3.2. A utility function u is nice iff for each P, the function u(P, ⋅) : W → ℝ is constant throughout any s ∈ π_P. In other words, for each s ∈ π_P and any w, w′ ∈ s, u(P, w) = u(P, w′).

Equivalently, u is nice iff for each P, u(P, ⋅) is P-measurable, i.e. iff for each r ∈ ℝ, {w ∈ W : u(P, w) = r} is in the domain of P.22 From now on, I will assume that all utility functions are nice. If u is a nice utility function, it makes sense to talk of the utility of P 'at s'. More generally:

Remark 3.3. If u is nice and t is a subset of a given P-atom s, we can set u(P, t) to u(P, w) for any w ∈ t, so that u(P, t) = u(P, s) = u(P, w).
To see what a nice utility function looks like, consider the following utility function β:

$\beta(C, x) = -\sum_{t \in \pi_C} (C(t) - \mathbf{1}\{[x]_C = t\})^2$,

where [x]_C is the unique element of π_C containing x. Note that for all C, if [x]_C = [x′]_C, then β(C, x) = β(C, x′). Thus, β is a nice utility function. Note moreover that, when restricted to comparing probability functions of a given domain, our utility function β is just the Brier score, i.e.: β(C, x) = β∗(C, [x]_C).

Nice utility functions can be used to make comparisons among credence functions with different domains. To see how, consider a simple example.

22 In the terminology of Lewis 1981, P-measurability is just the requirement that all value-level propositions be in the domain of P.

Example 3.4. Let P0 be the only probability function defined over the trivial algebra, that is, π_P0 = {W}. Let P1 be such that π_P1 = {s0, s1}, with P1(s0) = P1(s1) = .5. You can think of the state space of P1 as what you get from the state space of P0 if you simply partition W into two cells, s0 and s1; P1 is just the uniform probability distribution over the finer state space. Now, for all w, β(P1, w) = −((.5 − 1)² + (.5 − 0)²) = −.5. Since for all w, β(P0, w) = 0, we have that P0 is doing better than P1 relative to β and any w ∈ W. In this case, it makes sense to ask for the expected score of each of P0 and P1 relative to P1 and β, where as before:

$E_{P_1}[\beta(P)] = \sum_{s \in \pi_{P_1}} P_1(s) \cdot \beta(P, s)$.

In particular: E_P1[β(P0)] = 0 and E_P1[β(P1)] = −.5. Hence, relative to P1 and β, P0 is doing better than P1.23

Since we are dealing with probability functions with different domains, we need to be careful when defining the expected utility of a probability function relative to another. For ease of reference, let us introduce a familiar definition:

23 This might seem surprising to those familiar with epistemic utility theory. After all, one of the notable features of the Brier score is that it is strictly proper (in a sense to be defined in §3.4).
But recall that the claim that β is strictly proper essentially amounts to the claim that any credence function should take itself to be doing better than any other credence function with the same domain. I discuss generalizations of the more familiar notion of propriety in the context of comparing credence functions with different domains below. In the terminology to be introduced in §3.4, we will say that β is not 'downwards proper', even though it is strictly proper.

Definition 3.5. Say that a partition π of W is a refinement of π′ iff for any s ∈ π there is s′ ∈ π′ such that s ⊆ s′. The refinement relation induces a partial ordering on the set of partitions of a set, where π ⪯ π′ iff π′ is a refinement of π.24 If π is a refinement of π′, we will say that π′ is a coarsening of π.

Now note that if P′ is an expansion of P then π_P′ is a refinement of π_P. And a straightforward consequence of Remark 3.3 is that if π_P′ is a refinement of π_P, then for all s′ ∈ π_P′, u(P, s′) is well-defined. Hence, whenever P′ is an expansion of P, we can define the expected utility of P relative to P′ and any nice utility function in the usual way:

$E_{P'}[u(P)] = \sum_{s \in \pi_{P'}} P'(s) \cdot u(P, s)$.

The difficulty arises when π_P′ is not a refinement of π_P, since we cannot guarantee that for all s ∈ π_P′, u(P, s) is well-defined. In particular, if P′ is an expansion of P, we cannot guarantee that u(P′, s) will be well-defined for all s ∈ π_P, since u(P′, ⋅) may not be constant throughout all s ∈ π_P.

We will be particularly interested in evaluating, relative to a probability function P and utility function u, different expansions of P. Although we cannot assume that E_P[u(P′)] will be defined for an arbitrary expansion P′ of P, there are two different but related well-defined quantities that will come in handy.25 If π is a refinement of π_P, let 𝒫_π/P be the extensions of P to π: the collection of all probability functions whose state space is π that agree with P on P's domain.
If P′ is an expansion of P, we will let 𝒫_P′/P stand for 𝒫_π/P with π = π_P′, so 𝒫_P′/P is the collection of extensions of P to the domain of P′.26

24 For the cognoscenti: the partial ordering I will be relying on is the inverse of the partition lattice of W (see e.g. Ellerman 2010). In the present context, the inverse order seems easier to work with, for π ⪯ π′ iff B_π ⊆ B_π′, so that 'moving forward' along the refinement relation amounts to an increase in the set of propositions in the corresponding algebra.

25 Cf. Goldstein 1984, Manski 1981.

26 Note that we can think of 𝒫_P′/P as an imprecise probability function, most naturally identified with a representor in the sense of van Fraassen 1990, viz. a set of probability functions. In this case, we can think of 𝒫_P′/P as an imprecise probability function that assigns precise values to each member of B_P and imprecise values to any other member of B_P′. The definitions to follow can thus be seen as the familiar definition of upper and lower expectation for imprecise probabilities (Gilboa 1987, Satia & Lave 1973). See Troffaes 2007 for a recent survey of these and related works.

Definition 3.6. Let P′ be an expansion of P, and let 𝒫_P′/P be the collection of extensions of P to the domain of P′, that is, the collection of probability functions with the same domain as P′ and which assign the same value as P does to every proposition in P's domain (recall that if P′ is an expansion of P then the domain of P′ includes that of P). The upper expected value of P′, relative to P and u, is

$\overline{E}_P[u(P')] = \sup_{P^+ \in \mathcal{P}_{P'}/P} E_{P^+}[u(P')] = \sup_{P^+ \in \mathcal{P}_{P'}/P} \sum_{s \in \pi_{P'}} P^+(s) \cdot u(P', s)$.

The lower expected value of P′, relative to P and u, is

$\underline{E}_P[u(P')] = \inf_{P^+ \in \mathcal{P}_{P'}/P} E_{P^+}[u(P')] = \inf_{P^+ \in \mathcal{P}_{P'}/P} \sum_{s \in \pi_{P'}} P^+(s) \cdot u(P', s)$.

Note that $\overline{E}_P[u(P')] \geq \underline{E}_P[u(P')]$, with equality if π_P = π_P′.

Example 3.7. Let P0 be as in Example 3.4, viz. the unique probability function defined over the trivial algebra.
Let P2 be such that π_P2 = {s0, s1}, with P2(s0) = 1. Then:

$\beta(P_2, w) = \begin{cases} -((1 - 1)^2 + (0 - 0)^2) = 0 & \text{if } w \in s_0 \\ -((0 - 1)^2 + (1 - 0)^2) = -2 & \text{otherwise,} \end{cases}$

which means β(P2, s0) = 0 and β(P2, s1) = −2. Thus,

$\overline{E}_{P_0}[\beta(P_2)] = \sup_{P^+ \in \mathcal{P}_{P_2}/P_0} \sum_{s \in \pi_{P_2}} P^+(s) \cdot \beta(P_2, s) = \sup_{P^+ \in \mathcal{P}_{P_2}/P_0} P^+(s_1) \cdot (-2) = 0$.

Since E_P0[β(P0)] = 0, we have $E_{P_0}[\beta(P_0)] = \overline{E}_{P_0}[\beta(P_2)]$. Note further that

$\underline{E}_{P_0}[\beta(P_2)] = \inf_{P^+ \in \mathcal{P}_{P_2}/P_0} \sum_{s \in \pi_{P_2}} P^+(s) \cdot \beta(P_2, s) = \inf_{P^+ \in \mathcal{P}_{P_2}/P_0} P^+(s_1) \cdot (-2) = -2$.

Hence $\underline{E}_{P_0}[\beta(P_2)] < E_{P_0}[\beta(P_0)] = \overline{E}_{P_0}[\beta(P_2)]$.

Given a credence function C and a utility function u, we can now compare different expansions of C in different ways. Importantly, as shall become clearer, we can compare different expansions of C defined over different collections of propositions. For example, we can ask which one maximizes $\overline{E}_C[u(\cdot)]$ (maximax), which one maximizes $\underline{E}_C[u(\cdot)]$ (maximin), or perhaps which one maximizes some weighted average of the two. Presumably there will be things to be said in favor of each of these decision rules. Here, though, I hope to avoid getting into that debate. My focus will be first and foremost on the preliminary question of how to motivate epistemic utility functions that allow for such comparisons in the first place. And, fortunately, some of the applications I will briefly review towards the end allow us to remain neutral on exactly how best to choose among different expansions.

3.4 Propriety and expansions

Not all utility functions, in the sense defined above, are admissible as epistemic utility functions. A function that assigns the same value to any probability function relative to any world, for example, cannot count as an epistemic utility function. A minimal requirement on epistemic utility functions, one that has been proposed in a slightly different form in the literature, is this.
Suppose P and P′ are defined over the same state space π and, for all s ∈ π, P(s) is at least as close to the actual truth-value of s as P′(s) is, and sometimes closer. Then the utility of P at the actual world is greater than that of P′. More generally, for any two P and P′ defined over a given state space π, we will say that P is as close to the truth as P′, relative to s∗ ∈ π, iff for all s ∈ π, |P(s) − 1{s∗ = s}| ≤ |P′(s) − 1{s∗ = s}|. We will say that P is closer to the truth than P′, relative to s∗ ∈ π, iff P is as close to the truth as P′, relative to s∗, and for some s0 ∈ π, |P(s0) − 1{s∗ = s0}| < |P′(s0) − 1{s∗ = s0}|.

Definition 3.8. A utility function is truth-directed iff for any two P and P′ defined over a given state space π, (i) if P is as close to the truth as P′, relative to s∗ ∈ π, then u(P, s∗) ≥ u(P′, s∗), and (ii) if P is closer to the truth than P′, relative to s∗ ∈ π, then u(P, s∗) > u(P′, s∗).

Plausibly, any reasonable epistemic utility function will be truth-directed. The Brier score, widely taken to be a reasonable epistemic utility function, is a truth-directed utility function.

There is broad consensus that not all truth-directed utility functions are reasonable epistemic utility functions. For example, consider the absolute distance measure α, where

$\alpha(P, w) = -\sum_{s \in \pi_P} |P(s) - \mathbf{1}\{[w]_P = s\}|$.

This is plausibly the simplest truth-directed utility function. But it has the following property, which many consider to be a bug: if you are slightly more confident in p than in its negation, you should expect that being fully confident in p is better, by the lights of α, than having your current degrees of belief. For example, suppose you have a credence function C whose state space consists of two propositions, p and ¬p, and suppose that C(p) > .5.
Then, with a bit of algebra, we can see that the expected utility of C, relative to C and α, will be lower than that of the function C′, where C′(p) = 1. If epistemic rationality is a matter of maximizing expected epistemic utility, and if α is a reasonable measure of epistemic utility, then it would be rational to jump to extremes. Since jumping to extremes is not epistemically rational, once we accept that epistemic rationality is a matter of maximizing expected epistemic utility, we must conclude that α is not a reasonable measure of epistemic utility.

We can draw a general lesson from this. Suppose it is epistemically rational to have credence function C defined over a state space π. Then someone with that credence function cannot expect to be doing better, epistemically, by assigning different credences to the propositions in π.27 In general, suppose any probabilistically coherent credence function can sometimes be epistemically rational. Then all reasonable epistemic utility functions must be partition-wise proper in the following sense:

Definition 3.9. A utility function u is partition-wise proper iff for all P ≠ P′ defined over the same state space, E_P[u(P)] ≥ E_P[u(P′)]. A utility function is partition-wise strictly proper iff for all P ≠ P′ defined over the same state space, E_P[u(P)] > E_P[u(P′)].

Note the restriction to comparisons of probability functions with the same domain. Since most work on epistemic utility functions has implicitly made this restriction, our definition of partition-wise propriety coincides with the more familiar definition of propriety. The Brier score, for example, is a paradigmatic example of a (strictly) proper utility function, and it is thus a partition-wise (strictly) proper utility function. In what follows, and when there is no risk of confusion, I will sometimes use 'proper' (resp. 'strictly proper') as shorthand for 'partition-wise proper' (resp. 'partition-wise strictly proper').
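The 'bit of algebra' behind the jumping-to-extremes point can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the value C(p) = 0.6, the dictionary encoding of credence functions, and the helper names `absolute_score`, `brier_score` and `expected_score` are all hypothetical choices made for the example.

```python
def absolute_score(credence, actual):
    """alpha: negative sum of absolute distances from the truth-values."""
    return -sum(abs(p - (1.0 if cell == actual else 0.0))
                for cell, p in credence.items())


def brier_score(credence, actual):
    """beta: negative sum of squared distances from the truth-values."""
    return -sum((p - (1.0 if cell == actual else 0.0)) ** 2
                for cell, p in credence.items())


def expected_score(score, judge, candidate):
    """Expected score of `candidate`, with the expectation taken by `judge`.

    Both credence functions are assumed to share the same state space.
    """
    return sum(judge[cell] * score(candidate, cell) for cell in judge)


# Illustrative numbers only: C(p) = 0.6, and C_extreme is certain of p.
C = {"p": 0.6, "not_p": 0.4}
C_extreme = {"p": 1.0, "not_p": 0.0}

alpha_stay = expected_score(absolute_score, C, C)          # about -0.96
alpha_jump = expected_score(absolute_score, C, C_extreme)  # about -0.80
beta_stay = expected_score(brier_score, C, C)              # about -0.48
beta_jump = expected_score(brier_score, C, C_extreme)      # about -0.80
```

By the lights of α, the agent with C expects the extreme credence function to do better than her own; by the lights of the Brier score, she expects her own credence function to do better, as partition-wise strict propriety requires.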
27 What is wrong with α, then, on this picture, is that it is sometimes rational to have a credence function C such that C(p) > C(¬p) > 0. And if α were a reasonable measure of epistemic utility, someone with that credence function C would expect to be doing better, epistemically, by becoming certain of p.

We could introduce a stronger constraint: if someone with credence function C is rational, she cannot expect to be doing better, epistemically, by switching to any other credence function whose domain is a subset of the domain of C. In other words, one should not be able to do better either by changing one's assignment of credence in p or by no longer assigning any credence (not even 0) to p. If again we suppose that any probability function can sometimes be a rational credence function, we would require that all reasonable epistemic utility functions be strictly downwards proper in the following sense:

Definition 3.10. A utility function u is downwards proper iff for all P ≠ P′, if π_P′ is a coarsening28 of π_P (π_P′ ⪯ π_P), then E_P[u(P)] ≥ E_P[u(P′)]. A utility function is strictly downwards proper iff for all P ≠ P′, if π_P′ ⪯ π_P, then E_P[u(P)] > E_P[u(P′)].

As we saw in Example 3.4, β is not downwards proper. If we require that all epistemic utility functions be downwards proper, then the Brier score will not count as an epistemic utility function.29

A related but different constraint we could impose is this. If someone with credence function C is rational, she cannot expect to be doing better, epistemically, by switching to any other credence function whose domain extends the domain of C. In other words, one should not be able to do better either by changing one's assignment of credence to those propositions one currently assigns credence to or by assigning credence to propositions not in the domain of one's current credence function.
If again we suppose that any probability function can sometimes be a rational credence function, we would require that all reasonable epistemic utility functions be strictly upwards proper in the following sense:30

28 See Definition 3.5.

29 Sometimes the Brier score is defined in a different way, viz. $\beta^*(P, w) = -\frac{1}{N} \sum_{s \in \pi_P} (P(s) - \mathbf{1}\{s_w = s\})^2$, where N is the size of π_P. The same example can be used to illustrate that β∗ is not downwards proper.

30 As I show in Appendix A, in the presence of partition-wise propriety, upwards propriety is equivalent to the following condition: for each P ≠ P′, if π_P ⪯ π_P′, then E_P[u(P)] ≥ E_P′[u(P′)].

Definition 3.11. A utility function u is upwards proper iff for all P ≠ P′, if π_P′ is a refinement of π_P (π_P ⪯ π_P′), then $E_P[u(P)] \geq \overline{E}_P[u(P')]$. A utility function is strictly upwards proper iff for all P ≠ P′, if π_P ⪯ π_P′, then $E_P[u(P)] > \overline{E}_P[u(P')]$.

As we saw in Example 3.7, the Brier score is not strictly upwards proper. If we require that all epistemic utility functions be strictly upwards proper, then the Brier score will not count as an epistemic utility function.

Say that a theory of epistemic rationality requires strong immodesty if the only rational epistemic transitions are the result of updating on new evidence. If epistemic rationality is a matter of maximizing expected epistemic utility, strong immodesty requires that any rational credence function judge itself to be doing better, epistemically, than any of its alternatives, regardless of their domain. If we think any probability function can sometimes be a rational credence function, strong immodesty would entail that all epistemic utility functions be strictly universally proper in the following sense:

Definition 3.12. A utility function u is universally proper iff it is upwards proper and downwards proper. A utility function is strictly universally proper iff it is strictly upwards proper and strictly downwards proper.
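The numbers in Examples 3.4 and 3.7 can be reproduced with a short Python sketch. Everything about the encoding is an assumption made for illustration: a two-world set W, cells represented as frozensets, and the helper names `brier`, `expected` and `upper_lower`. The sketch also relies on the observation that, since the expectation is linear in the extension P⁺ and P⁺ may distribute the mass of each coarse cell freely among its subcells, the supremum (infimum) is reached by putting each coarse cell's mass on its best (worst) scoring subcell.

```python
def brier(credence, world):
    """beta(C, w): cells are frozensets of worlds; the true cell contains w."""
    return -sum((p - (1.0 if world in cell else 0.0)) ** 2
                for cell, p in credence.items())


def expected(judge, candidate):
    """E_judge[beta(candidate)], assuming judge's partition refines candidate's."""
    return sum(p * brier(candidate, next(iter(cell)))
               for cell, p in judge.items())


def upper_lower(prior, expansion):
    """Upper and lower expected Brier score of `expansion` relative to `prior`.

    Assumes every cell of `expansion` lies inside some cell of `prior`.
    """
    upper = lower = 0.0
    for coarse, mass in prior.items():
        scores = [brier(expansion, next(iter(fine)))
                  for fine in expansion if fine <= coarse]
        upper += mass * max(scores)
        lower += mass * min(scores)
    return upper, lower


# Hypothetical two-world model: W = {0, 1}, s0 = {0}, s1 = {1}.
W, s0, s1 = frozenset({0, 1}), frozenset({0}), frozenset({1})
P0 = {W: 1.0}            # trivial algebra
P1 = {s0: 0.5, s1: 0.5}  # uniform over the finer state space
P2 = {s0: 1.0, s1: 0.0}  # certain of s0

e_p1_p0 = expected(P1, P0)            # Example 3.4: E_P1[beta(P0)]
e_p1_p1 = expected(P1, P1)            # Example 3.4: E_P1[beta(P1)]
upper_p2, lower_p2 = upper_lower(P0, P2)  # Example 3.7
```

On this encoding, P1 expects P0 to do better than itself (the failure of downwards propriety), while the upper expected value of P2 relative to P0 merely ties with P0's own score (the failure of strict upwards propriety).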
As it turns out, strong immodesty is too strong a requirement. For it rules out all utility functions as reasonable epistemic utility functions:31

Fact 3.13. There are no strictly universally proper utility functions. See Fact A.7.

31 Note that we could have formulated upwards propriety using lower expected value rather than upper expected value (see Definition 3.6). The corresponding requirement would have been that for each P and each P′ ≠ P, if π_P ⪯ π_P′, then $E_P[u(P)] \geq \underline{E}_P[u(P')]$. As we will see, however, this requirement is also incompatible with strict downwards propriety, at least given some fairly harmless assumptions (see Corollary A.10).

What is more, say that a utility function u is partition-wise strictly proper iff for each P and any Q defined over the same domain, if P ≠ Q then E_P[u(P)] > E_P[u(Q)]. (Most work in the literature so far uses the term 'strict propriety' to mean what I'm calling 'partition-wise strict propriety'.) If we assume all epistemic utility functions are partition-wise strictly proper, then not only do we know there are no strictly universally proper epistemic utility functions. We can also show that there are no universally proper epistemic utility functions:

Fact 3.14. If u is universally proper, it is not partition-wise strictly proper.32

These results raise a couple of questions. Should all epistemic utility functions be downwards proper? If not, should they all be upwards proper? (Examples of upwards proper and downwards proper epistemic utility functions can be found in Appendix B.) I think an argument can be made for requiring downwards propriety, much along the lines of a familiar argument for (partition-wise) strict propriety.
Suppose u is not downwards proper. Then there are credence functions that take themselves to be doing strictly worse, epistemically, than one of their restrictions. This means that anyone with such a credence function should think, of some of the issues she has an opinion on, that she would be better off simply having no view whatsoever on the matter. Thus, if we assume that any credence function can sometimes be epistemically rational, we must require that all epistemic utility functions be downwards proper.33

32 See Theorem A.3.

33 Note that this argument does not straightforwardly generalize to an argument for requiring upwards propriety, for reasons spelled out in the introduction (see the discussion of immodesty and epistemic imagination). Richard Pettigrew has shown (building on Carr 2015) that any epistemic utility function that satisfies some plausible constraints will give rise to a dilemma: either (a) for all ε > 0, there is a proposition p such that assigning credence < ε is equally epistemically good relative to worlds in which p is true as it is relative to worlds in which p is false, or (b) some credence functions are strictly dominated by some of their expansions (Pettigrew 2016). Ultimately, Pettigrew thinks we should learn to live with the second horn of the dilemma (see his discussion in section 3.4). I agree that this is the horn we should plump for, but my reasons are slightly different: as long as the agent is not entertaining the propositions in the expanded domain, it may be rational for her to stick to her current credence function even if there is an expansion of her credence function defined over the larger domain that dominates it.

It is not my goal here, however, to make a case for a requirement of downwards propriety. Instead, I will tentatively assume that all ways of measuring accuracy should be downwards proper.
From this and Fact 3.14 we can now conclude that no accuracy measure will be upwards proper, and thus that expansions can sometimes be epistemically rational. There is more to be said, though. In the remainder of the paper, I want to find a way of vindicating the intuitive judgments about our motivating examples, by relying on a principled way of comparing different expansions of a given credence function along an epistemic dimension of evaluation.

Recall, for example, our toy example from §2.3: there you moved from a credence function (your Wednesday credence function) that was not defined over propositions of the form 'c is a pure cube' or 'c is a hybrid cube' to one (your Thursday one) which was. Incorporating such propositions into your posterior, I claimed, was an epistemic improvement, and this is so from the perspective of your prior credence function. Before incorporating the new propositions into your credence function (before assigning particular credences to them and before deploying them in your theorizing) you could have expected that doing so would have some non-trivial epistemic benefits.

To capture these epistemic judgments, we need more than just the assumption that there are epistemic utility functions that are downwards proper. We need a principled way of defining epistemic utility functions which are not just downwards proper, but which also get the judgments in question right. In the next section, I offer a way of doing just that. In particular, I will offer a family of epistemic utility functions according to which, from the point of view of your Wednesday credence function, Thursday's credence function had a higher expected epistemic value not only than your Wednesday credence function itself, but also than other possible expansions of your Wednesday credence function.
In other words, if the utility functions in question do in fact correspond to a genuine epistemic dimension of evaluation, as I will suggest they do, we will have vindicated the intuition that switching to your Thursday credence function was a Good Thing.

4 A richer theory of epistemic utility

We could try to appeal to accuracy considerations, broadly understood, in order to account for why the move to your Thursday credence function was epistemically beneficial. After all, by introducing new distinctions you increased the number of propositions about which you have beliefs. And this could turn out to increase the amount of overall accuracy, on at least some ways of measuring it, of your body of beliefs. For example, we could modify the Brier score so that the score of C at w depends on how large its state space is.34 And we could do this in a way that would vindicate the thought that, relative to the actual world, your Thursday credence function is doing better than your Wednesday credence function. But this would not vindicate the similarly plausible thought that the epistemic gain from incorporating those propositions into your Thursday credence function exceeds that of incorporating (say) propositions of the form 'c is a pure sphere' and 'c is a hybrid sphere'.35 There is epistemic work to be done by the former set of propositions, but not (as much) by the latter. We want our theory of epistemic utility to be sensitive to that fact.

The epistemic gain corresponding to the transition from your Wednesday credence function to your Thursday credence function is (at least in part) due to the content of those propositions themselves. You reaped epistemic benefits from the transition to your Thursday credence function because the added propositions gave you explanatory resources you did not have before, and because you gained the ability to formulate a simpler, more general hypothesis that accurately predicts the inheritance pattern of the traits you were interested in.
And this suggests that our epistemic utility function should not treat accuracy with respect to a given proposition as being just as important as accuracy with respect to any other proposition. It should instead weigh accuracy with respect to a given proposition in a way that is proportional to the explanatory benefits it would provide (if true).36 More precisely, the proposal is this:

34 Cf. Corollary B.4 in the appendix.

35 This is because, like many measures of accuracy, the modified Brier score would be subject to the following constraint (cf. Joyce 2009, p. 273):
Extensionality: Let P and Q be defined, respectively, over π_P = {p_i : 0 ≤ i < m} and π_Q = {q_i : 0 ≤ i < m}. If P(p_i) = Q(q_i) and 1{[w]_P = p_i} = 1{[w′]_Q = q_i} for all i, then μ(P, w) = μ(Q, w′).

36 Cf. fn. 42.

Explanation Sensitivity: Relative to the goal of explaining e, accuracy with respect to p matters more, epistemically, to the extent that p would contribute towards explaining e.

Our first order of business, then, is to define an epistemic utility function that is sensitive to accuracy considerations, but which gives different weight to accuracy with respect to different propositions. Our second and more challenging order of business is to justify a particular assignment of weights, one that assigns to each proposition a number that measures its explanatory benefits. Let us take each in turn.

4.1 Additive epistemic utility functions

Recall our working example of an accuracy measure, the Brier score:

$\beta(C, x) = -\sum_{t \in \pi_C} (C(t) - \mathbf{1}\{[x]_C = t\})^2$.

This utility function treats accuracy about s at w exactly like it treats accuracy about s′ at w: in each case what matters is the distance between the credence assignment in a proposition and that proposition's truth-value. But we can easily define a variant of our epistemic utility function that treats accuracy about s at w differently from accuracy about s′ at w.
Given any real-valued function λ of s, where each λ(s) > 0, we can define:

$\beta_\lambda(C, x) = -\sum_{s \in \pi_C} \lambda(s) \cdot (C(s) - \mathbf{1}\{x \in s\})^2$.

Such a utility function would weigh accuracy with respect to s as a function of λ(s). This is not, of course, the only way of obtaining utility functions that treat accuracy with respect to s in a way that is sensitive to certain features of s. Any additive function, in the following sense, will do:37

37 The definition to follow is a slight variant of the one given in Joyce 2009, p. 272. Note that I'm building niceness (something Joyce need not worry about) into the definition of additive utility functions. I'm also building in truth-directedness, much like Joyce does.

Definition 4.1. A utility function u is additive iff for each partition π and each s ∈ π, there is a function δ^π_s : [0, 1] × {0, 1} → ℝ such that (i) δ^π_s(x, 1) is continuous, twice differentiable, and strictly increasing, (ii) δ^π_s(x, 0) is continuous, twice differentiable, and strictly decreasing, and (iii) for all C with π = π_C,

$u(C, w) = \sum_{s \in \pi} \delta^\pi_s(C(s), \mathbf{1}\{w \in s\})$.

Additive utility functions are epistemic utility functions that can treat accuracy with respect to different propositions quite differently. An additive utility function will be partition-wise strictly proper so long as each δ^π_s is strictly proper, in the sense that, for all r ≠ t ∈ [0, 1]:38

$r \cdot \delta^\pi_s(r, 1) + (1 - r) \cdot \delta^\pi_s(r, 0) > r \cdot \delta^\pi_s(t, 1) + (1 - r) \cdot \delta^\pi_s(t, 0)$.

In particular, for any assignment λ of weight functions λ_π : π → ℝ⁺ to each state space π, the function β_λ is a strictly proper utility function, one which weighs accuracy with respect to s, in a credence function defined over π, in a way that is proportional to λ_π(s).

Of particular relevance to the purposes of this paper is a family of additive utility functions that we will call weighted accuracy measures.
Say that a function δ : [0, 1] × {0, 1} → ℝ is a local accuracy measure iff for each i ∈ {0, 1}, δ(x, i) is a continuous, twice differentiable, strictly decreasing function of |i − x|. (Thus, δ(x, 1) will be strictly increasing and δ(x, 0) strictly decreasing.) We can think of a local accuracy measure as a way of evaluating an assignment of credence in a proposition in a way that is sensitive to the truth-value of the proposition and the distance between the credence assignment and that truth-value.39

Definition 4.2. A utility function u is a weighted accuracy measure iff there is a local accuracy measure δ : [0, 1] × {0, 1} → ℝ and, for each partition π, a weight function λ_π : π → ℝ⁺ such that for all P with π_P = π,

$u(P, w) = \sum_{p \in \pi} \lambda_\pi(p) \, \delta(P(p), \mathbf{1}\{w \in p\})$.

38 This result is stated without proof in Joyce 2009, p. 276, but it is a straightforward consequence of the additivity of expectation, which ensures that (for C′ defined over π_C):

$\sum_{s \in \pi_C} C(s) \sum_{s' \in \pi_C} \delta^\pi_{s'}(C'(s'), \mathbf{1}\{s = s'\}) = \sum_{s \in \pi_C} \left[ C(s) \cdot \delta^\pi_s(C'(s), 1) + (1 - C(s)) \cdot \delta^\pi_s(C'(s), 0) \right]$.

On the assumption that each δ^π_s is proper, we can infer that this sum will be maximized at C′ = C.

39 Compare the definition of a local epistemic utility function in Pettigrew 2016, §1.1.3. Note that, unlike Pettigrew, I do not build into the definition either the requirement that δ be proper or the requirements that δ(0, 1) < 0 < δ(1, 1) and δ(1, 0) < 0 < δ(0, 0). Indeed, as we will see below (cf. Corollary C.2), assuming δ(0, 0) > 0 imposes substantive constraints on epistemic utility functions.

The weighted version of the Brier score, β_λ, is a weighted accuracy measure in this sense.

We can use weighted accuracy measures in order to capture a way of measuring epistemic value that is sensitive to explanatory considerations. Doing so requires that we identify an assignment of weights to different propositions that measures the extent to which they contribute to an explanation of a given explanandum.
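To see how weights can change epistemic rankings, here is a hedged Python sketch of the weighted Brier score β_λ on a three-cell partition. The cell labels, the particular weights, the two credence functions, and the helper names `weighted_brier` and `expected_weighted_brier` are all hypothetical; in particular, the weights are arbitrary numbers rather than measures of explanatory value.

```python
def weighted_brier(credence, weights, actual):
    """beta_lambda: squared distance from the truth, weighted cell by cell."""
    return -sum(weights[cell] * (p - (1.0 if cell == actual else 0.0)) ** 2
                for cell, p in credence.items())


def expected_weighted_brier(judge, candidate, weights):
    """Expected weighted score of `candidate` by the lights of `judge`.

    Both credence functions are assumed to share the same state space.
    """
    return sum(judge[cell] * weighted_brier(candidate, weights, cell)
               for cell in judge)


# Hypothetical partition {a, b, c}; the weights are for illustration only.
flat = {"a": 1.0, "b": 1.0, "c": 1.0}
skewed = {"a": 5.0, "b": 1.0, "c": 1.0}  # accuracy about a matters most

P = {"a": 0.1, "b": 0.5, "c": 0.4}
Q = {"a": 0.3, "b": 0.2, "c": 0.5}

# Suppose c is the true cell. With flat weights Q scores higher than P;
# with the skewed weights, P's greater accuracy about a reverses the ranking.
flat_prefers_q = weighted_brier(Q, flat, "c") > weighted_brier(P, flat, "c")
skewed_prefers_p = weighted_brier(P, skewed, "c") > weighted_brier(Q, skewed, "c")

# Propriety check on this pair: P expects itself to do better than Q.
p_prefers_itself = (expected_weighted_brier(P, P, skewed)
                    > expected_weighted_brier(P, Q, skewed))
```

The reversal illustrates why the choice of weights is substantive: the same two credence functions are ranked differently at the same world depending on which propositions are treated as mattering most, while each credence function still expects itself to do best among those with the same domain.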
Given such an assignment, we can define epistemic utility functions that treat accuracy with respect to a proposition in a way that depends on how much it contributes to explaining a given explanandum. Presumably, a theory of explanation should tell us how much a particular proposition contributes to a (good) explanation of any given explanandum. Now, I do not have a general theory of explanation to offer. But there is a common core to many theories of explanation that we can appeal to in order to illustrate how to go about measuring the explanatory strength of a given proposition (relative to a particular explanatory goal). This will allow us to vindicate the thought that, in an interesting sense, the move from your Wednesday credence function to your Thursday credence function (and the same goes for the transition from Monday to Tuesday) was epistemically rational.

It bears repeating. In order to build explanatory considerations into a theory of epistemic utility, I will need to take a stand on difficult questions about the nature of explanation. But it is not my goal here to defend those answers, and the main conclusions of this paper should not depend on those being the correct answers. Rather, I will rely on them in order to offer a proof of concept: I want to explore how we can extend the Bayesian framework if we enrich our theory of epistemic utility.

Before moving on, I should make explicit two methodological assumptions. The first is that I will presuppose, in order to keep things somewhat simpler, that whenever an agent is interested in explaining e, she is certain that e is true. This is no doubt an oversimplification. For our purposes, however, it will do: none of the main conclusions of the paper hinge on it.

The second is that I will be mostly concerned not with whether (and to what extent) p is a good explanation of e simpliciter.
Rather, I will be concerned with whether (and to what extent) p is a good explanation of e relative to some particular body of beliefs. This is so for two reasons. First: I want to remain neutral on the debate over whether there is such a thing as an 'objective' notion of explanation. I thus want to rely only on judgments of explanatory relevance, which would make sense against a background body of belief even if there is no such thing as 'the' explanation of a given explanandum. Second: in line with the broadly 'internalist' perspective that underlies much work on epistemic decision theory, I think that what epistemic utility function we use to evaluate an agent's epistemic rationality should reflect the agent's judgments about epistemic value.[40] To the extent that an epistemic utility function is meant to be sensitive to explanatory considerations, those considerations will reflect the agent's own judgments about what explains what. And since, again, those judgments may well be sensitive to the agent's background body of beliefs, we are better off focusing on whether p is a good explanation of e relative to a given body of beliefs.

[40] I suspect those who think that explanations are, as Woodward puts it, ''a matter of exhibiting systematic patterns of counterfactual dependence'' (Woodward 2005, p. 191) will agree that explanatory judgments like 'the glass broke because it fell from the table' only deserve to be so-called because they take place against a background set of assumptions which, together with the specific judgment in question, entail facts about the relations of counterfactual dependence among the relevant events.

4.2 Explanation, invariance, and stability

You flip a coin ten times in a row. To your surprise, it lands tails nearly every single time. Here is a possible explanation of what happened:

BIAS: The coin is heavily biased toward tails.

Another possible explanation consists of a specification of each of the starting positions of the coin and your hand, together with a specification of the force with which you flipped it and the wind conditions which, together with the laws of physics, make it extremely likely that the coin landed tails nearly every time. Call this explanation INITIAL, and suppose it is incompatible with BIAS, perhaps because we can derive from the facts cited in INITIAL that the coin is fair.[41]

[41] This example is based on a slightly different example in White 2005, where it's used to illustrate an explanatory virtue called stability. Although the point White goes on to make is different from the one I will make, and although the characterization of stability he provides is not quite the notion of counterfactual resilience that I introduce in this paper, there is much in common to the spirit of both proposals.

To some extent, the first explanation is more satisfying than the second. This is not because it is more or less likely: it may be highly unlikely that the coin you got from the bank was heavily biased towards tails. Rather, it is because if true it would be more satisfying as an explanation than the second one would be, if it happened to be true.[42]

[42] The distinction I'm drawing here is of course reminiscent of Peter Lipton's well-known distinction between the loveliness and likelihood of an explanation (Lipton 2004).

Why would an explanation in terms of BIAS be more satisfying than one in terms of INITIAL? It is not because BIAS makes the explanandum more probable: we would prefer BIAS even if we modified INITIAL so that it entailed the truth of the explanandum, and therefore raised its probability to one. Rather, it is because BIAS has some familiar explanatory virtues that INITIAL lacks.

For example, BIAS is simpler than INITIAL. We want our theories to be simple (we want them to involve no more detail than is necessary) partly because theorizing has cognitive costs, and we would rather not spend cognitive resources on details that promise little in terms of theoretical payoff.[43]

[43] Admittedly, it is tempting to think that simplicity is a virtue not just because of our cognitive limitations, but we needn't take a stance on that issue. For related discussion, see e.g. Baker 2003, Nolan 1997. See also Baker 2011 for a helpful overview of some of the relevant issues.

Another reason for preferring BIAS over INITIAL is that it is more general: it can be applied to many different circumstances. Generality tends to make for good explanations. This is why appealing to beliefs and desires to explain my behavior can be more satisfying than giving a full description of the state of my brain.[44]

[44] Cf. Jackson & Pettit 1988, Strevens 2004, and the discussion of causal relevance in Yablo 1992 for related discussion.

Consider this example, essentially due to Alan Garfinkel.[45] Tom is running late for a meeting, because he had a leisurely breakfast. He gets in his car and drives somewhat recklessly, so much so that he loses control of the car at some point and gets into an accident. A natural explanation of this unfortunate event is that Tom was driving recklessly: that he was speeding, say. Given background assumptions, his speeding makes it very likely that he got into an accident. But this cannot be all it takes for something to be a good explanation of the accident. After all, a fuller description of Tom's morning would also make it highly likely that he got into an accident. And this, I submit, would not be as good an explanation of the accident. The reason (or at least, a reason that many have offered to account for similar cases) is that, unlike the first explanation, the second one is not very portable.[46] Had Tom not had a leisurely breakfast, we couldn't have used this second one as an explanation for why he got into the accident.

[45] Garfinkel 1981, p. 30.
The explanation in terms of his reckless driving, in contrast, is applicable to many other situations: there are many other ways Tom's morning could have been that, given his reckless driving, would have ended up in a car accident.

[46] I have in mind a number of authors who have argued for the explanatory relevance of higher-level, or 'more abstract', properties, e.g. Garfinkel 1981, Jackson & Pettit 1988, 1990, Strevens 2004, 2008, Weslake 2010.

I do not intend to argue that simplicity and generality are explanatory virtues. I will simply take it for granted for the purposes of this paper. What I want to suggest is a way of capturing these explanatory virtues that is more amenable to the framework of epistemic utility theory.

To see how, start by noting that both simplicity and generality have one thing in common. Having a simple, or a very general, explanation makes the explanandum very stable.[47] The simpler the explanation, the fewer stars had to align in just the right way to make the explanandum occur. The same goes for more general explanations: the more circumstances an explanation applies to, the easier it is for the explanandum to occur, given the explanans. This suggests a strategy for coming up with a diagnostic tool for good explanations: a way of assessing how well a given claim contributes to explaining another.

[47] The notion of stability I am after is related to, although distinct from, the notion of resilience discussed in Jeffrey 1983 (when discussing the so-called paradox of ideal evidence), or Skyrms 1977, 1980. Their focus is on stability under conditionalization, or 'indicative supposition'. Mine is on stability under counterfactual supposition.
4.3 Explanatory value and counterfactual resilience

My suggestion, in a nutshell, is this: the contribution of p to having an explanation of e is proportional to the extent to which the truth of p would make e stable.[48]

[48] Of course this will not do, as it stands, when it comes to low-probability events. But these are vexed issues far beyond the scope of this paper. See Woodward 2010 for discussion and references. For all I say in this paper, there may be other explanatory virtues that are not captured by the notion of counterfactual resilience. For my purposes, however, all I need is that there be an important dimension of explanatory value that is captured by the notion of counterfactual resilience.

One way to see that satisfying explanations increase the counterfactual stability of the explanandum is to think about laws of nature. Laws of nature have a high degree of counterfactual stability.[49] They are also some of the best candidates for explanatory bedrock. We all know the explanatory buck has to stop somewhere. We all agree that stopping at the laws of nature is as good a place as any. I say it is no coincidence that their high counterfactual robustness goes hand in hand with their not being in need of an explanation. It is because laws of nature are so counterfactually robust (because they would have obtained (almost) no matter what) that they do not cry out for explanation.[50]

[49] Indeed, some would go so far as to use counterfactual stability in order to characterize what laws of nature are. See, e.g. Lange 2005, 2009.

[50] This is not to say that we cannot explain a given law of nature. The same caveat listed in fn. 48 applies here.

Another way of motivating the connection between counterfactual stability and explanation is to reflect on the plausibility of so-called contrastive accounts of explanation.[51] The idea is simple: any request for explanation takes place against the backdrop of a contrast class. What we want out of an explanation of an event e is a story as to why e rather than some other member of the contrast class occurred. Now, the harder it is to find a natural contrast class, the harder it is to take e to be in need of explanation, on this way of thinking. And the higher e's degree of counterfactual stability, the harder it is to think of e as crying out for an explanation.

[51] See Garfinkel 1981, as well as van Fraassen 1980, Lipton 1990, inter alia.

Counterfactual stability is also a helpful diagnostic tool for simplicity, a highly plausible candidate for an explanatory virtue. The fewer variables are involved in an explanation, the more robust the explanandum will be, and vice versa. The fewer variables we need to fix for the explanation to go through, the more variables we can modify consistent with the explanandum obtaining. And every aspect of the situation that we can counterfactually modify without affecting the explanandum will plausibly correspond to a variable that is not involved in the explanation. Seeking explanations that make the explanandum counterfactually robust is thus likely to lead to simpler explanations.

And explanations that make the explanandum counterfactually robust can be applied, mutatis mutandis, to many different circumstances: they are highly portable. Consider again the explanation of the sequence of nearly ten tails in a row in terms of the coin's initial conditions (together with a specification of the forces involved, wind conditions, etc.), which I called INITIAL. Slight variations in the initial conditions would have made this explanation inapplicable: there are many ways things could have been (ways similar to the way things actually are) where the explanandum might have been false. For example, for all the explanation tells you, if you had held the coin in a slightly different way in one of the tosses, the coin might easily have landed heads. Had someone sneezed nearby, altering the wind conditions, the outcome might have been different.
In contrast, if you had held the coin slightly differently, then according to BIAS the coin would have still landed tails nearly every time. BIAS is applicable to many situations (it wears its portability on its sleeve), not just involving different coins and different initial conditions, but different processes involving binary random variables.[52]

[52] There are some tricky issues I'm skating over. For example, one might think that counterfactuals of the form 'If p had been false, the coin would have landed tails' cannot be true, since BIAS does not rule out entirely the possibility of the coin landing heads; cf. Hájek n.d. For our purposes, however, these complications are best set aside.

It is hard to cash out the notion of counterfactual stability in a more precise way. The number of ways things might have turned out such that, for all that INITIAL says, the explanandum might have been false is infinite. But so is the number of ways things might have turned out such that, for all BIAS says, the explanandum might have been true. We cannot just count the relevant possibilities. And while it is in principle possible to provide a measure that would differentiate between the relevant infinite sets of possibilities, it is not obvious how to motivate one measure over another in a way that will work for all cases.

But assume we can agree on a finite set of relevant suppositions.[53] If the explanandum is more robust under counterfactual suppositions in that set relative to one body of beliefs than another, I submit, that would give us an epistemic reason (albeit a pro tanto one) for preferring the first body of beliefs over the other. This is not to say this is a reason for taking the first body of beliefs to be more accurate than the second one.[54] But it is surely a reason for favoring the one over the other when both are (expected to be) equally accurate.

For example, suppose you are interested in explaining why the outcome of the ten coin tosses is what it is.
You are told your memory will be erased, but you will have some say on what credence function you will have afterwards. In particular, you are given the choice between waking up with a credence function that gets very close to the truth about the bias of the coin, and a credence function that gets very close to the truth about the initial position of each of the coin tosses. If all else is equal, you will prefer the former over the latter. If you can only have a view on one of the two questions, you would rather know what the bias of the coin is than what the particular initial conditions of those ten coin tosses were. After all, you can expect to have more stable beliefs about the outcomes of the tosses of that coin if you know what its bias is than if you only know what the initial conditions of one particular sequence of tosses were. Holding fixed a class of explananda, having a counterfactually robust body of beliefs is better than not, and better, I submit, from an epistemic point of view.[55]

[53] Assume further that all such suppositions should be treated equally, an assumption that we might want to relax at some point.

[54] Although see White 2005, especially §3.

[55] Note that the notion of robustness at issue is different from the one that figures in discussions of the value of knowledge inspired by the Meno, or in theories of knowledge that impose a so-called 'safety condition'. Your belief in p is counterfactually robust, in the sense that is relevant for our present purposes, if you think p would have happened no matter what. This can be so even though it is just a fluke that you happened to have that belief in the first place.
5 Explanation and epistemic utility functions

5.1 Measuring explanatory potential

We want to compare how much a given proposition would contribute to making e well-explained relative to a given body of beliefs, that is, relative to a credence function. This, I have suggested, can be done by measuring how much learning that proposition would increase the counterfactual robustness of e relative to that credence function.

Relative to a particular set of suppositions B, we can measure the counterfactual robustness of e relative to a credence function C as a function of the average amount of variation between C(e) and C(b □→ e) for b ∈ B (as long as C(b □→ e) is well-defined for each b ∈ B).[56] There are different ways of measuring the relevant variation. To fix ideas, let us use the most straightforward option: the counterfactual resilience of e, relative to a credence function C and a finite set of suppositions B, is given by one minus the average difference between C(e) and C(b □→ e), for b ∈ B.[57]

Definition 5.1. The counterfactual resilience of e relative to a credence function C (and a set of suppositions B), $r_B(C, e)$, is given by:

\[ r_B(C, e) = 1 - \frac{1}{|B|} \sum_{b \in B} \bigl| C(e) - C(b \mathrel{\Box\!\!\rightarrow} e) \bigr|. \]

[56] One must tread carefully here. Williams 2012 argues that one cannot identify the credence in φ □→ ψ with the credence one assigns to ψ on the counterfactual supposition that φ. For reasons carefully laid out in Schwarz 2016, I think the argument fails. So I will proceed henceforth on the assumption that whenever a and b are propositions, a □→ b is a proposition. I will postpone the question of how to assign credence to counterfactuals until Appendix D, and note here that everything I say here can be reformulated in terms of imaging (cf. Gärdenfors 1982, Joyce 1999, Lewis 1976) rather than in terms of credences in counterfactual propositions.
[57] This measure is thus similar to Skyrms's notion of resilience (Skyrms 1977, 1980, inter alia), which is defined in terms of conditional probabilities instead of probabilities of counterfactuals, and is essentially the same as $1 - \max_{b \in B} |C(e) - C(e \mid b)|$. An alternative measure of resilience in Skyrms's sense would equal one minus the average difference between C(e) and C(e ∣ b), as opposed to one minus the maximum difference between C(e) and C(e ∣ b).

Thus, $r_B(C, e)$ is a number between 0 and 1 that increases to the extent that the average amount of variation between C(e) and C(b □→ e) decreases. How resilient e is, relative to your credence function, is thus meant to capture how well-explained you take e to be. (In what follows, I will drop explicit relativization to the set of suppositions, and assume throughout that a given explanatory context fixes such a set.)

Note that the resilience of e relative to your credence function says nothing about what you take the explanation of e to be. In other words, the claim

Explanation provides resilience (EPR): r(C, e) measures how well-explained an agent whose credence function is C takes e to be.

is fairly neutral as to what the correct theory of explanation is. After all, EPR is a claim about how well-explained someone with a given credence function takes e to be. So unless we grant some additional, and by no means obvious, assumptions (say, that someone takes e to be well-explained to the extent that they assign high credence to the true answer to the question 'what explains e?'), EPR is compatible with any theory of what, if anything, is the correct explanation of e. Furthermore, EPR does not by itself entail anything about what is the explanation of e, even by the lights of an agent who takes e to be well-explained: for all EPR says, an agent can take e to be well-explained even if there is no answer to the question 'what explains e according to her?'
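Definition 5.1 is simple enough to compute directly. The sketch below is only an illustration: the function name, the supposition labels, and all the numerical credences are assumptions made up for the coin example, not values drawn from the text. It takes the credence in e together with the credence in b □→ e for each supposition b, and returns one minus the average absolute difference.

```python
def resilience(credence_in_e, counterfactual_credences):
    """Counterfactual resilience r_B(C, e) in the sense of Definition 5.1.

    credence_in_e:            C(e)
    counterfactual_credences: dict mapping each supposition b in B to C(b []-> e)
    """
    if not counterfactual_credences:
        raise ValueError("the set of suppositions B must be non-empty")
    gaps = [abs(credence_in_e - c) for c in counterfactual_credences.values()]
    return 1.0 - sum(gaps) / len(gaps)


# Hypothetical credences in 'had b been the case, the coin would still have
# landed tails nearly every time', after learning BIAS and after learning INITIAL.
after_bias = {
    "coin held slightly differently": 0.90,
    "someone sneezed nearby": 0.90,
    "different starting position": 0.95,
}
after_initial = {
    "coin held slightly differently": 0.50,
    "someone sneezed nearby": 0.50,
    "different starting position": 0.40,
}

# The agent is assumed to be certain of the explanandum, so C(e) = 1 in both cases.
r_bias = resilience(1.0, after_bias)
r_initial = resilience(1.0, after_initial)
print(r_bias > r_initial)
```

On these made-up numbers the explanandum comes out more resilient after learning BIAS than after learning INITIAL, which is the pattern the coin example is meant to motivate.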
At the same time, EPR is not completely neutral on some well-known questions about explanation. For instance, according to EPR, there is a sense in which high-level explanations are preferable, all else equal, to low-level explanations. This can be seen more clearly if we focus on causal explanations. A high-level explanation will abstract away from more details of the causal history of a particular explanandum. As such, it will be applicable to a wider variety of events that differ from the target explanandum in the details of their causal history. So, all else equal, a high-level explanation will make the explanandum more stable than a lower-level one.[58]

[58] I am making some nontrivial assumptions here on how the truth-value of certain counterfactuals is affected by the features of a particular explanatory context. In particular, I am assuming that in a context where we are given every single detail of the causal history of some event e, counterfactuals of the form 'if d had not obtained, e would still have obtained', where d is some particular detail that figures in the putative explanation in question, are not true. Thus, even if a low-level explanans e logically entails a high-level explanans E, it could be that, in the explanatory context set up by e, the closest not-d worlds (where not-d is incompatible with e but compatible with E) will include not-E worlds. This is by no means uncontroversial, but it will not be needed in what follows. If there are readings of the relevant counterfactuals on which lower-level explanations come out as providing more stability than higher-level ones, that makes EPR less controversial than I'm taking it to be.

Still, as stated EPR is highly schematic. For any credence function C and any proposition e we can find a set of suppositions relative to which r(C, e) takes its maximum value (i.e. 1). If my purpose here were to provide an account (reductive or not) of what makes for a good explanation, I would be in urgent need of a way of specifying a set of suppositions for any given explanandum. But that is not my purpose here. Rather, I want to offer a way of building some of our views on explanatory value into the framework of epistemic utility theory in order to see how a more flexible theory of Bayesian dynamics could go. Thus, my hope is that we can agree, in any particular case, on a suitable class of suppositions, and look at the consequences that would have for questions about the rationality of conceptual change.

5.2 Combining accuracy and resilience

Suppose we fix a particular explanandum e and a given credence function C. Further suppose $\lambda^e_C$ is a function which assigns to each s a positive number that increases as the amount of change in resilience of e that would come from updating C on s increases. Then, for any local accuracy measure δ,

\[ \delta_C(P, x) = \sum_{s \in \pi_P} \lambda^e_C(s) \cdot \delta\bigl(P(s), \mathbb{1}\{x \in s\}\bigr) \]

is a weighted accuracy measure, whose weight function measures the extent to which s contributes to explaining e relative to C. Moreover, if δ is (strictly) proper, then $\delta_C$ will be partition-wise (strictly) proper.[59]

[59] There is a small wrinkle here I'm ignoring for now. We want perfect accuracy with respect to more important propositions to count for more than perfect accuracy with respect to less important propositions. If our local accuracy measure is always non-positive, then we will want the weight assigned to a proposition to be smaller the more important the proposition. If instead our local accuracy measure is always non-negative, we will want the weight assigned to a proposition to be greater the more important the proposition. (Things get even trickier if our local accuracy measure is sometimes positive and sometimes negative.) Fortunately, if we assume that our local accuracy measure is downwards proper (in a sense to be made precise) and that it is 0/1-symmetric (in that δ(x, 1) = δ(1 − x, 0) for all x ∈ [0, 1]), our local accuracy measure will be non-negative. See Remark C.4.

For concreteness, let us define, for each s ∈ π,

\[ \chi^e_C(s) = 1 + r(C_s, e), \]

where $C_s = C(\,\cdot \mid s)$. For a given s, then, $\chi^e_C(s)$ will be a number between 1 and 2 that increases as the stability of e relative to $C_s$ increases.

For a fixed C and e, we can now define an epistemic utility function, one which is sensitive to both accuracy and explanatory considerations, by combining our $\chi^e_C$ function with any strictly downwards proper local accuracy measure δ:

\[ \varepsilon^\delta_C(P, x) = \sum_{s \in \pi_P} \chi^e_C(s)\, \delta\bigl(P(s), \mathbb{1}\{x \in s\}\bigr). \]

Note that, unlike any other epistemic utility function we've discussed thus far, $\varepsilon^\delta_C$ is defined in terms of a particular probability function C. This is to be expected. After all, the weights given to each of the propositions in question are supposed to measure how much learning the truth about them contributes to the explanatory closure of a particular body of beliefs. If we think of an agent's epistemic utility function as something determined by her epistemic values, we can think of an agent with credence function C who uses $\varepsilon^\delta_C$ as her epistemic utility function as someone who values accuracy as well as the resilience of her body of beliefs, which, given EPR, would be tantamount to valuing both accuracy and explanatory closure.[60]

[60] We could, of course, find some fully objective way of assigning to each proposition a measure of its explanatory potential; cf. the discussion of weighted accuracy measures in [redacted for blind review]. Doing so would allow us to define an epistemic utility function that is sensitive to explanatory considerations in a way that does not depend on the agent's credence function. Further investigation may result in one such measure of explanatory worth, but for our purposes, I will set that strategy aside.

It is also worth noting that, because $\varepsilon^\delta_C$ is defined in terms of a fixed probability function, it can be partial: it may not be defined for all pairs consisting of an arbitrary probability function and a possible world. Specifically, $\varepsilon^\delta_C(P, x)$ will be well-defined only if $r(C_s, e)$ is well-defined for all $s \in \pi_P$.

Now, $r(C_s, e)$ may not be well-defined for one of two reasons. First, it may be that $C_s$ is well-defined, but that $C_s(b \mathrel{\Box\!\!\rightarrow} e)$ is not, for some relevant b. Second, it may be that $C_s$ is not well-defined, perhaps because C(s) is not well-defined. Fortunately, in the cases that will be of interest (when using $\varepsilon^\delta_C$ to compare expansions of C) our epistemic utility function will always be well-defined. In the remainder of this section, I briefly explain why, leaving the details to an appendix.

I start by assuming that counterfactual conditionals are only understood against the backdrop of a similarity function: a function σ that assigns, to each proposition a and possible world w, the set σ(a, w) of those a-worlds that are most similar to w. The counterfactual conditional a □→ b, relative to σ, is thus the set of all worlds w such that all worlds in σ(a, w) are b-worlds. (Henceforth, I will drop the relativization to σ and assume one such function is fixed throughout.)

Now, recall that from the perspective of an agent whose state space is π, we can think of each s ∈ π as playing the role of a possible world: the elements of π are each maximally consistent relative to the agent's credence function, in that each proposition in the domain of the agent's credence function is entailed by or inconsistent with each member of π. Thus, from the perspective of such an agent, we can think of the counterfactual conditional as the proposition that is true in all and only those s ∈ π such that their 'closest worlds' that make a true also make b true.
Intuitively, we can think of $\sigma_\pi(a, s)$ as the set of those elements of π that entail a and which are most like s in all similarity respects the agent can entertain. For a given partition π, $(a \mathrel{\Box\!\!\rightarrow} b)_\pi$ is thus the union of the set of atoms s ∈ π such that their most similar a-atoms all entail b. In other words, $(a \mathrel{\Box\!\!\rightarrow} b)_\pi$ will be the weakest proposition in the Boolean closure of π that entails a □→ b.

In general, $(a \mathrel{\Box\!\!\rightarrow} b)_\pi$ and (a □→ b) are different propositions.[61] After all, $(a \mathrel{\Box\!\!\rightarrow} b)_\pi$ is a proposition that is 'visible' in the algebra generated by π, even though (a □→ b) may not be.

Figure 3: Think of the solid lines as tracing the partition π that generates the algebra containing the proposition that Alice enters the room on Thursday (here labeled as A) and the proposition that Alice had the disease on Friday (here labeled as D). Relative to π, the counterfactual A □→ D ends up being equivalent to A ∩ D. The area traced by the dashed line corresponds to the proposition V that there was a virus in the room. Relative to the refinement π′ of the initial partition that contains this new proposition, the counterfactual A □→ D ends up expressing the proposition corresponding to the grayed out region, i.e. it turns out to be equivalent to (A ∩ D) ∪ V. Consider now w_v, a ¬A ∩ ¬D ∩ V world. If we only consider similarity respects available in π, then w_A and w*_A are both among the closest A-worlds to w_v. If instead we consider similarity respects available in π′, w_A is a closest A-world to w_v, but w*_A is not.

This is as it should be. We can think of $(a \mathrel{\Box\!\!\rightarrow} b)_\pi$ as the proposition that plays the role of (a □→ b) in an agent whose conceptual resources are given by π. This proposition will be true at a world just in case the a-worlds that most agree with that world in all similarity-making respects that the agent can entertain are b-worlds. And it may not be true at a world even though the a-world that most agrees with that world in all similarity-making respects simpliciter is a b-world.
(See Figure 3.)

Take, for example, the following counterfactual:

CAUGHT: If Alice had entered the room, she would have caught the disease.

[61] To avoid clutter, I will no longer make explicit the relativization to σ in what follows, unless it matters. For our purposes, we can assume that we have fixed a selection function, perhaps the one corresponding to what Lewis 1979 calls the 'standard resolution' of the 'vagueness' of counterfactuals.

And suppose π is a partition that does not distinguish between worlds that differ only at the microscopic level (see Figure 3). Consider now some possible world w_v in which there is some virus in the air inside the room. Presumably, CAUGHT is true at that world: on a standard interpretation, CAUGHT is true at w_v just in case the world most similar to w_v in which Alice enters the room is one in which she catches the disease. Since such a world (call it w_A) will agree with w_v as to the presence of the virus, CAUGHT is true in w_v (at least if we assume that Alice's absence is causally independent of the presence of the virus).

But consider now all worlds that are closest to w_v in all respects that are visible in π in which Alice enters the room. Among those worlds is w_A. But so is w*_A, a world that differs from w_A before the time at which Alice enters the room only at the microscopic level, and in which there is no virus in the room. Since the difference between w_A and w*_A is not visible in π, both worlds will count as being equally close in all respects visible in π. As a result, since presumably in w*_A Alice does not catch the disease, if we restrict ourselves to similarity respects that are available in π, CAUGHT comes out false.

In short, there is a principled sense in which, for each agent with state space π and a and b in their algebra B, there will be a proposition $(a \mathrel{\Box\!\!\rightarrow} b)_\pi$ that plays the role of a □→ b for that agent.
It's the proposition that will be true at a world w ∈ s, with s ∈ π, just in case the closest s′ ∈ π that entails a also entails b.

The second reason εδC(P, x) could fail to be well-defined, recall, is that r(Cs, e) may not be well-defined if C(a □→ e ∣ s) is not well-defined for some s ∈ πP. In assessing the epistemic value of an extension P, you need to assign weights to propositions in πP. These weights in turn depend on your credence, given s ∈ πP, of some counterfactuals. Since P is an extension of your credence function, though, we run into a problem. For whenever q is not in the domain of your credence function, your credence conditional on q may not be well-defined, at least not if we assume, as I have been throughout, that the conditional probability of q given p is defined in terms of the ratio of your credence in qp and your credence in p. Fortunately, if p is an element of πP, any proposition in the domain of P is either entailed by p or incompatible with p. Thus, for any probability function defined over πP and any q, the probability of q given p, for p ∈ πP, will be 0 or 1, depending on whether p ⊆ q.

The upshot of these two claims, which I argue for in more detail in Appendix D, is this. Say that P is fully opinionated iff it assigns 0 or 1 to every proposition in its domain. If P is fully opinionated, there is a unique s ∈ πP such that P(s) = 1; in that case, we will say that P is concentrated on s. As I show in Appendix D, our two observations ensure that whenever an agent with credence function C is considering an extension C′ of C, χeC(s′) will in fact be well-defined, for each s′ ∈ π′, since r(Cs′, e) will be. In particular, we can establish the following remark, which will come in handy in what follows:

Remark 5.2. Suppose P is fully opinionated and concentrated on s. If P(e) = 1 then

r(P, e) = 1 − (1/∣B∣) ∑_{b ∈ B} ∣1 − 1{σ(b, s) ⊆ e}∣.

In other words, the resilience of e relative to P is determined by the proportion of b ∈ B such that the atom s on which P is concentrated entails b □→ e.
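The formula in Remark 5.2 can be illustrated with a small sketch. Everything concrete in it is an assumption made for illustration only: atoms are modeled as tuples of binary features, propositions as sets of atoms, and the selection function σ is replaced by a hypothetical stand-in (`closest_atoms`) that picks the atoms differing from s on the fewest features. The paper itself does not commit to any such similarity measure.

```python
def closest_atoms(b, s):
    """Hypothetical stand-in for sigma(b, s): the atoms of b that differ
    from s on the fewest coordinates (Hamming distance)."""
    def distance(t):
        return sum(x != y for x, y in zip(t, s))
    if not b:
        return frozenset()
    best = min(distance(t) for t in b)
    return frozenset(t for t in b if distance(t) == best)


def resilience(suppositions, s, e, sigma=closest_atoms):
    """r(P, e) as in Remark 5.2, for P concentrated on atom s with P(e) = 1."""
    # The indicator is 1 exactly when every closest b-atom entails e.
    indicators = [1 if sigma(b, s) <= e else 0 for b in suppositions]
    return 1 - sum(abs(1 - i) for i in indicators) / len(suppositions)


# Toy state space: atoms are pairs (x, y) of binary features.
atoms = frozenset((x, y) for x in (0, 1) for y in (0, 1))
e = frozenset(t for t in atoms if t[1] == 1)    # explanandum: y holds
s = (1, 1)                                      # P is concentrated on this atom
B = [
    frozenset(t for t in atoms if t[0] == 0),   # suppose x fails
    frozenset(t for t in atoms if t[1] == 0),   # suppose y fails
    atoms,                                      # trivial supposition
]
r = resilience(B, s, e)
```

With these toy inputs, two of the three suppositions leave e in place at the closest atoms, so r comes out as 2/3, which is just the proportion of b ∈ B for which s entails b □→ e.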
It is also worth highlighting the following consequence of Remark 5.2:

Corollary 5.3. Suppose P and Q are fully opinionated and defined over the same partition π. If P(e) = Q(e) = 1, then r(P, e) = r(Q, e).

6 The rationality of conceptual change

Let me take stock. I have taken for granted a particular picture of epistemic rationality. On this picture, epistemic rationality is a matter of maximizing expected epistemic value. It is well known that, if we impose some minimal constraints on the notion of epistemic value, this picture allows us to recover many of the norms of a broadly Bayesian picture of rationality. I have argued that we can generalize familiar ways of thinking about epistemic value so as to allow for comparisons of credence functions defined over different hypothesis spaces. The strategy I recommended, if only as a proof of concept, was this. First, use weighted accuracy measures as epistemic utility functions, where these are functions that treat accuracy with respect to a proposition in a way that depends on what a given weight function assigns to that proposition (§4.1). Second, use weight functions that assign to the relevant proposition something that measures its explanatory value, where this is meant to capture how much it would contribute, if true, to having a good explanation of an antecedently given explanandum (determined by the relevant agent's explanatory goals). Third, measure the explanatory value of a given proposition using the notion of counterfactual resilience: the explanatory value of a given proposition is determined by how much learning that proposition would increase the counterfactual stability of the explanandum (§5.1).
Crucially, how much a given proposition would increase the stability of another proposition depends on what other propositions are available to the agent, for it is those propositions that determine what the similarity respects used to evaluate counterfactuals are (§5.2). The resulting strategy allows us to compare the epistemic value of different expansions of an agent's credence function in terms of how much the added proposition would contribute to the agent's explanatory goals.

My ultimate goal, however, is to show that this generalization provides us with the resources to make sense of questions about the rationality of conceptual change. And to do this, we need to specify some bridge principle yielding norms of epistemic rationality from facts about epistemic value. In the simple case when we're comparing choices among credence functions with the same domain, the familiar injunction to maximize expected value is a reasonable bridge principle. As we saw in §3.3, though, once we allow for options whose domains are refinements of the domain of our prior credence function, the notion of expected value is not well-defined. Instead, we have two related but distinct quantities, the upper expected value and the lower expected value (see Definition 3.6), and at least two distinct but related bridge principles: maximize upper expected value (maximax) and maximize lower expected value (maximin).

To make things slightly more concrete, suppose we could argue that the right way to pick an expansion of P to some refinement π of πP is to pick the credence function Pπ with domain π that maximizes the upper expected value ĒP[u(⋅)] for a given utility function u. In order to compare two distinct refinements π and π′ of πP from P's perspective, we could then simply compare the values of ĒP[u(Pπ)] and ĒP[u(Pπ′)].
Suppose instead we agreed that the right way to pick an expansion of P to π is to pick the credence function Pπ that maximizes the lower expected value E̲P[u(⋅)] for a given u. Then we could compare distinct refinements π and π′ by comparing E̲P[u(Pπ)] and E̲P[u(Pπ′)].62

I do not here have a case to offer in favor of one or another such bridge principle. To that extent, at least, this is just a progress report. Fortunately, sometimes it will turn out that no matter which way we go, one partition comes out ahead of another one. And in those cases, we can avoid settling the vexed question of how to assign credence to new propositions and still declare a choice of one partition over another to be rational. Our two toy examples can illustrate this point.

As we will see, our strategy for both examples will rely on two observations. The first is that, for each C and e ∈ BC, χeC has the following monotonicity property relative to any extension of C:

Remark 6.1. Suppose C′ is an extension of C, s ∈ πC and s′ ∈ πC′. If s′ ⊆ s then χeC(s) ≤ χeC(s′).63

This ensures that, as long as our local accuracy measure is downwards proper, so will be εeC.64

Our second observation requires a little more setup. Suppose your credence function C is defined over π and suppose π1 and π2 are two refinements of π. Fix a weight function λ and assume that for each s ∈ π that is partitioned by π1, there is an equally likely (by your own lights) s′ ∈ π that is partitioned by π2 into the same number of cells. Further suppose that π1 splits s into propositions whose weight is at least as high as those into which π2 splits s′, in the sense that you can send each member t of π1 that is included in s into exactly one member t′ of π2 that is included in s′, so that the λ weight of t is at least as great as, and sometimes greater than, that of t′. Then we say that C and λ rank π1 over π2. More precisely:

62 Again, these are not the only possible ways to go about choosing a distribution over a refinement of a prior credence function.
One could assign, for example, to each such distribution a weighted average of its upper and lower expected values, for fixed weights α and β (this is the so-called Hurwicz criterion, after Hurwicz 1951). Here, however, I will restrict my attention to the two simpler bridge principles, and simply note that in the cases we will be concerned with, the Hurwicz criterion would yield exactly the same results. I do not know what would happen if we relied on other bridge principles instead (e.g. minimax regret).

63 Cf. Remark D.10.

64 See Fact B.9.

Definition 6.2. If π1 and π2 are two refinements of πP we say that λ and P rank π1 over π2 iff there is a bijection f ∶ π1 → π2 such that:
• for all s ∈ π1, λπ1(s) ≥ λπ2(f(s)),
• for some p ∈ πP, max_{s ⊆ p, s ∈ π1} λπ1(s) > max_{s ⊆ p, s ∈ π1} λπ2(f(s)),
• for all p ∈ πP, ⋃_{s ⊆ p, s ∈ π1} f(s) ∈ π2,
• for all p ∈ πP, P(⋃_{s ⊆ p, s ∈ π1} s) ≥ P(⋃_{s ⊆ p, s ∈ π1} f(s)).

Our final observation is that whenever λ and P rank π1 over π2, it makes sense to adopt π1 rather than π2 no matter which of the bridge principles under consideration turns out to be the correct bridge principle. In other words:65

Fact 6.3. Suppose u is a weighted accuracy measure with weight function λ and local accuracy measure δ that is a downwards proper affine transformation of the Brier local accuracy measure. For any two refinements π1 and π2 of πP, if λ and P rank π1 over π2, then
1. max_{C ∈ Pπ1/P} ĒP[u(C)] > max_{C ∈ Pπ2/P} ĒP[u(C)] (upper expected value), and
2. max_{C ∈ Pπ1/P} E̲P[u(C)] > max_{C ∈ Pπ2/P} E̲P[u(C)] (lower expected value).

What we want to show, then, is that in each of our toy examples, introducing the new propositions you did made epistemic sense. This is because, regardless of how exactly we think you should assign probabilities over new propositions, refining the way you did was epistemically rational.

65 Cf. Corollary C.16. In the terminology of the appendix, λ and P rank π1 over π2 iff there is some π that is πP-equivalent to π1 and λ-P-equivalent to π2 such that λ favors π1 over π.
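To make the bridge principles under comparison (maximax, maximin, and the Hurwicz criterion) a little more tangible, here is a minimal sketch. It assumes, in line with the way the appendix treats opinionated extensions, that the upper (lower) expected value of an option is the largest (smallest) expected utility it receives from any extension of the prior; since expected utility is linear in the extension, that extreme is reached by putting all of each prior cell's probability on its best (worst) sub-cell. The dictionary representation, the function names, and the unweighted negative Brier score used as a utility are illustrative assumptions, not the paper's own definitions.

```python
def brier_utility(credence, true_cell):
    """Illustrative utility: negative Brier score of `credence` (a dict from
    fine-grained cells to probabilities) when `true_cell` is the true one."""
    return -sum((prob - (1.0 if cell == true_cell else 0.0)) ** 2
                for cell, prob in credence.items())


def expected_value_bounds(prior, refinement, credence, utility=brier_utility):
    """Return (lower, upper) expected value of `credence` from the standpoint
    of `prior`, where `refinement` maps each prior cell to its sub-cells."""
    lower = upper = 0.0
    for cell, prob in prior.items():
        values = [utility(credence, sub) for sub in refinement[cell]]
        lower += prob * min(values)   # least favorable extension of the prior
        upper += prob * max(values)   # most favorable extension of the prior
    return lower, upper


def hurwicz(prior, refinement, credence, alpha=0.5, beta=0.5):
    """Weighted average of upper and lower expected value, for fixed weights."""
    lower, upper = expected_value_bounds(prior, refinement, credence)
    return alpha * upper + beta * lower


# Toy case: the prior has two cells, and the refinement splits only the first.
prior = {"p1": 0.5, "p2": 0.5}
refinement = {"p1": ["p1a", "p1b"], "p2": ["p2"]}
candidate = {"p1a": 0.4, "p1b": 0.1, "p2": 0.5}   # an extension of the prior

lower, upper = expected_value_bounds(prior, refinement, candidate)
score = hurwicz(prior, refinement, candidate)
```

Maximax would rank candidate extensions by `upper`, maximin by `lower`, and the Hurwicz criterion by `score`; when one refinement comes out ahead on both `lower` and `upper`, any such weighted average with non-negative weights agrees.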
Our strategy will be to show that, in each case, one of the atoms in the posterior credence function gets a strictly greater weight than (that is, has more explanatory value than) the atom of the prior credence function that it entails. This will suffice to show that your epistemic transition was rational, at least when your options were your posterior credence function and any refinement π∗ of your prior such that (a) for each atom of your prior that is split by your posterior, there is an equally probable atom in π∗ that is split into the same number of elements, and (b) none of the atoms of π∗ get greater weight than the atom of your prior that they entail. This is because of the following simple fact, which follows from Remark D.10 in Appendix D.

Remark 6.4. Fix a credence function C with state space π, fix some explanandum E, and let π1 and π2 be two refinements of π such that there is a bijection f ∶ π1 → π2 such that, for all s ∈ π and all t ∈ π1, t ⊆ s ⇔ f(t) ⊆ s. Suppose for each s ∈ π and each t ∈ π2 with t ⊆ s, χEC(s) = χEC(t), and suppose for some s ∈ π there is t ∈ π1 with t ⊆ s and χEC(t) > χEC(s). Then C and χEC rank π1 over π2.

6.1 The Reds, revisited

Your Monday credence function, recall, was defined over the Boolean closure of propositions of the form: c moves faster after exposure to blue light, c moves faster after exposure to red light, c's behavior is unaffected by exposure to blue light, c's behavior is unaffected by exposure to red light. Your Tuesday credence function was one whose domain is the result of adding propositions of the form c is in state R and (again) closing under Boolean operations.

We will only focus on propositions involving a particular red, call it c0. Call Fr (resp. Fb) the proposition that c0 moves faster after exposure to red (resp. blue) light, and let R denote the proposition that c0 is in state R.
For simplicity, let us assume that the only change in behavior we are interested in is whether c0 moves fast, so that Fb is equivalent to the negation of c0's behavior is unaffected by blue light, and similarly for Fr. Let us further assume that, together with everything else you believe, R entails (but is not entailed by) Fb ∪ Fr (see Figure 4). Finally, assume E ∶= Fb is the proposition you are interested in explaining (the reasoning below can be reused, mutatis mutandis, to establish the same conclusion on the assumption that it is Fr instead you are interested in explaining).

Figure 4: The transition from your Monday credence function to your Tuesday credence function. After adding the proposition R and closing under Boolean operations, the result is the algebra generated by the state space that results from taking the state space of your Monday credence function and replacing each of Fb ∩ Fr, Fb ∩ ¬Fr, ¬Fb ∩ Fr, and ¬Fb ∩ ¬Fr with their intersections with R and with ¬R. Note that for worlds in Fbr ∩ R, their closest ¬Fr worlds are also R-worlds. This means that the atom Fbr ∩ R entails the counterfactual ¬Fr □→ Fb, which Fbr does not (since there are Fbr worlds whose closest ¬Fr worlds are ¬Fb worlds).

Call C your Monday credence function, let π denote C's state space, and let π′ denote the coarsest refinement of π whose Boolean closure includes R. Assume C is concentrated on the proposition Fbr = Fb ∩ Fr, reflecting the fact that you are certain that c0 moves faster after exposure to blue and red lights. We want to show that for some s′ ∈ π′, s ∈ π, s′ ⊆ s, χEC(s′) > χEC(s). In other words, that some proposition in π′ has greater weight than the proposition in π that encloses it or, in yet other words, that some proposition in π′ has more explanatory value than the proposition in π it entails. Let X = Fbr ∩ R.
We want to show that X has greater weight than Fbr: roughly, that upon being certain that Fbr is true, learning further that R is true would put you in a better position to explain E (that is, Fb). Since X ⊆ Fbr, X entails any counterfactual that is entailed by Fbr. So in order to show that χEC(X) > χEC(Fbr), we need to show that there is some relevant counterfactual of the form b □→ E that is entailed by X but not by Fbr. For this would show that the proportion of relevant counterfactuals entailed by X is greater than the proportion of those entailed by Fbr, which, given Remark 5.2, is enough to show that χEC(X) > χEC(Fbr).

Take an off-the-shelf theory of similarity, like the one in Lewis 1979. Clearly, ¬Fr-worlds in which R is true (worlds in which c0 does not move faster when exposed to red light but in which c0 is in state R) are closer to any X-world than ¬Fr-worlds in which R is not true. After all, R concerns matters of particular fact prior to the time at which Fr is true.66 Thus, X entails ¬Fr □→ E, since R entails Fb ∪ Fr and thus any R-world in which ¬Fr is true is an Fb-world (since c0 is in state R, worlds in which c0 is in state R are closer to the actual world than worlds in which it isn't; thus, the closest worlds in which c0 does not move faster when exposed to red light are those in which it is also in state R, which means that they are worlds in which it moves faster when exposed to blue light). In contrast, Fbr does not entail ¬Fr □→ E, for there are Fbr-worlds (namely ¬R-worlds) such that their closest ¬Fr-worlds are not Fb-worlds.

Consider now any other refinement of π which splits the same cells as π′ does, and in the same number of ways, but whose weights do not exceed the weights of their corresponding C-cells. For example, consider the refinement of π that results from adding the proposition, call it T, that c0 was tired after moving faster from exposure to either blue light or red light.
Like R, T entails (given everything else you believe) Fb ∪ Fr.67 And it is compatible with but logically independent of any of Fbr, Fr ∩ ¬Fb, and ¬Fr ∩ Fb. Plausibly, however, for any X of the form T ∩ s, with s ∈ {Fbr, Fr ∩ ¬Fb, ¬Fr ∩ Fb}, the closest ¬Fr worlds to X-worlds include both worlds in which E is true and worlds in which it isn't. (After all, had ¬Fr been true, T might have been false, perhaps because c0 wouldn't have moved at all.) Similar reasoning shows that, for any relevant supposition b and any s ∈ πC, if s does not entail b □→ E, neither does T ∩ s. As a result, for each atom t of this new refinement (call it π∗), χEC(t) = χEC(s), where s ∈ πC and t ⊆ s. This shows that χEC ranks π′ over π∗, and thus that regardless of which bridge principle we use, expanding to π′ rather than π∗ would be rational.

66 I'm relying on context to provide the relevant time indices here. Strictly speaking, we should let R be the proposition that c0 is in state R at t0, Fr the proposition that c0's behavior at t1 is unaffected by red light, etc.

67 I am ignoring the possibility that Fb ∪ Fr is not entailed but merely presupposed by T. Adjust accordingly if you think that's something we should not ignore.

6.2 Cubes and spheres, revisited

Let C denote your Wednesday credence function, which was defined over the Boolean closure of propositions of the form: c is a cube, c is a sphere, c is an nth-generation critter, and a and b are c's parents. Let π = πC and let π′ denote the state space of your Thursday credence function, whose domain results from enlarging BC by adding propositions of the form c is a hybrid cube and c is a pure cube and closing under Boolean operations.

Take two second-generation cubes with mixed parents. Let E be the proposition that their offspring is 75% cube and 25% sphere and assume C(E) = 1.
Again, we want to show there is some s ∈ π and some s′ ∈ π′ with s′ ⊆ s such that χEC(s′) > χEC(s), that is, that some proposition in π′ has more explanatory value than the proposition in π it entails. As before, this requires showing that there is s′ ∈ πC′, s ∈ πC, with s′ ⊆ s, and some relevant proposition b such that s′ entails b □→ E but s does not. In other words, a proposition such that, were you to update on it, the resilience of E relative to your credence function would increase.

Let M be the proposition that the cubes are second-generation cubes with mixed parents and let H be the proposition that they are second-generation hybrid cubes. (Note that, by construction, M is an atom of π and H is an atom of π′.) Let T be the proposition that the cubes are third-generation cubes. We want to show that, while H entails T □→ E, M does not. On the assumption that T is a relevant proposition, this would establish that χEC(H) > χEC(M): roughly, that learning H even after being certain of M would increase the resilience of E, and thus leave you with a better explanation of E.

Note now that there are M-worlds in which one of our cubes is not a hybrid cube, and thus there are T-worlds closest to them in which their offspring is not 75% cube, which is to say that M does not entail T □→ E. In contrast, T ∩ M-worlds are closer to any H-world than any T ∩ ¬M-worlds. But given everything else you believe, M entails E, which means that the closest T worlds (which will be T ∩ M worlds) will also be E worlds. We thus have that H entails T □→ E, as desired.

Contrast now π′ with another refinement of π, call it π∗: this is the result of adding to BC the proposition B that one of each cube's parents is a blue sphere. Plausibly, for any relevant supposition b and any s ∈ π, if B ∩ s entails b □→ E, so does s. Thus, for each s ∈ π, χEC(s) = χEC(s ∩ B).
Again, this shows that expanding to π′ is better than expanding to π∗, regardless of which bridge principle we use.

7 Closing

Bayesian epistemology has long remained silent on questions about how best to carve up the space of hypotheses we use in theorizing about the world. But it need not: there is a natural way of generalizing the classical Bayesian framework so as to formulate and perhaps answer such questions. Doing so requires thinking of epistemic rationality as having a broadly decision-theoretic structure. The key is to allow for a more expansive notion of epistemic value to play the role of 'epistemic utility': more expansive, that is, than the accuracy-centered approach that has dominated work on epistemic utility theory thus far. I suggested a simple way to do this: we should use weighted accuracy measures, where the weight assigned to a proposition corresponds to how epistemically important it is. And I offered a proof of concept: a way of assigning weights to propositions that measures their explanatory value. The resulting picture is conservative with respect to standard Bayesian epistemology. But it offers answers to questions that cannot even be formulated in the classical framework.

I have only told the beginning of the story. Before concluding, I want to highlight just a few questions for future work to address. Perhaps the most pressing one is whether we can drop the relativization to particular explananda in our characterization of epistemic utility functions. One could avoid this by making the choice of epistemic utility function even more dependent on the particular agent whose credences we are evaluating. On this view, whatever the agent herself seeks to explain gives rise to the particular weighted accuracy measure which we should use to assess the rationality of her epistemic transitions.
But a more ambitious strategy would be to identify which explananda cry out for explanation relative to a body of beliefs, and find a way of aggregating the different weight functions generated by each explanandum. The resulting function would measure the extent to which a given proposition contributes to explaining that which ought to be explained. Aside from concerns about the possibility of aggregating different weight functions, the main obstacle I foresee for this strategy is that of characterizing what it is for a proposition to be in need of explanation.68

Another issue left outstanding is the status of some of the constraints on epistemic utility functions that I have taken for granted. In particular, it might be worth considering the possibility that downwards propriety is too strong a demand. On the resulting picture, neither downwards propriety nor upwards propriety should be taken as constraints on epistemic utility functions. Rather, we should think that only those credence functions that are 'stable' are fully rational: only those credence functions which take themselves to be doing better than any of their extensions and any of their restrictions. An interesting question would be how to characterize the class of such probability functions relative to a particular epistemic utility function. But a more pressing concern would be to motivate the choice of one such epistemic utility function on independent grounds.

Finally, assuming we abandon both downwards propriety and upwards propriety, the question arises as to whether it can be rational to move from a given probability function to an expansion of it that is not an extension of it: whether, in other words, enlarging the domain of propositions one assigns credence to requires revising one's prior credences. An affirmative answer to this question would promise to shed light on the so-called problem of new theories, one of the big open questions for Bayesian epistemology.
At this stage, however, I cannot tell which answer will turn out to be right.

I have argued here that if we enrich the framework of epistemic utility theory with a more expansive notion of epistemic value, we can better understand how our hypothesis spaces should change, and vindicate the plausible idea that conceptual innovation is rationally constrained. There is still a significant role for epistemic imagination: there may be little we can do but wait for new distinctions to occur to us. But we need to know which such distinctions to take seriously and which to ignore as mere clutter. The framework outlined in this paper can help us do just that.69

68 Some suggestive remarks in White 2005 might be used to characterize what being in need of explanation amounts to in a way that is amenable to the present framework. I discuss this and related ideas in my [redacted for blind review].

69 [acknowledgments omitted for blind review.]

Appendix A

I stated, without proof, that no utility function can be strictly universally proper (Fact 3.13), and that no strictly proper utility function can be universally proper (Fact 3.14). The purpose of this appendix is to provide proofs of these and related results. First, we establish the following lemma:

Lemma A.1. Fix a nice utility function u. Suppose there are P and Q such that Q is an extension of P and EQ[u(Q)] > EQ[u(P)]. Then u is not upwards proper.

Proof. Suppose there are such P and Q, for a given u. Let q and p range over πQ and πP, respectively. Since πP ⪯ πQ, we know from Remark 3.3 that for each q, u(P, q) is well-defined, with u(P, q) = u(P, q′) whenever q and q′ are in the same cell of SP. It thus follows from the probability calculus that70

∑_q Q(q) ⋅ u(P, q) = ∑_p P(p) ⋅ u(P, p).

Now, by definition:

ĒP[u(Q)] ≥ ∑_q Q(q) ⋅ u(Q, q).

And by assumption,

∑_q Q(q) ⋅ u(Q, q) > ∑_q Q(q) ⋅ u(P, q) = ∑_p P(p) ⋅ u(P, p).

Thus, ĒP[u(Q)] > EP[u(P)], which means u is not upwards proper.
70 Since πP is a partition of W,
∑_q Q(q) ⋅ u(P, q) = ∑_q ∑_p Q(q ∣ p) Q(p) ⋅ u(P, q) = ∑_p Q(p) ∑_q Q(q ∣ p) ⋅ u(P, q).
And for each p, since u is nice and Q extends P,
Q(p) ∑_q Q(q ∣ p) ⋅ u(P, q) = P(p) ∑_{q ⊆ p} Q(q ∣ p) ⋅ u(P, q) = P(p) ⋅ u(P, p).

The result stated in Fact 3.13 follows immediately.

Theorem A.2. There are no strictly universally proper epistemic utility functions.

What is more, if we assume that epistemic utility functions must be partition-wise strictly proper, we can show that there are no universally proper epistemic utility functions.

Theorem A.3. If u is universally proper, it is not partition-wise strictly proper.

Before proceeding with the proof of Theorem A.3, let me introduce one more definition, which will come in handy in what follows:

Definition A.4. A probability function Q is an opinionated extension of a probability function P iff Q is an extension of P and for each p ∈ πP there is qp ∈ πQ with qp ⊆ p and Q(qp) = P(p).71

Proof of Theorem A.3. Our result follows immediately from the following lemma:

Lemma A.5. Let u be a strictly proper utility function that is downwards proper. Suppose P is a probability function and π is a refinement of πP such that there is p ∈ πP ∖ π with P(p) ≠ 0. Then there is an extension P∗ of P to π such that

ĒP[u(P∗)] > EP[u(P)].

The proof of Lemma A.5 relies on an observation.

Remark A.6. Suppose Q is an extension of P. For each utility function u there is an opinionated extension Q∗P of P such that

ĒP[u(Q)] = EQ∗P[u(Q)].

71 Note that any probability function is an opinionated extension of itself. Note also that not all opinionated extensions of a probability function are fully opinionated in the sense defined in §5.2. Indeed, only those opinionated extensions of fully opinionated probability functions are themselves fully opinionated.

Proof. For each p ∈ πP, pick qp ∈ πQ with qp ⊆ p and such that u(Q, qp) ≥ u(Q, q) whenever q ⊆ p. Let Q∗P be the unique extension of P such that Q∗P(qp) = P(p).
Clearly, Q∗P is an opinionated extension of P. Furthermore, for any extension Q′ of P, we have

EQ′[u(Q)] ≤ EQ∗P[u(Q)],

and thus

ĒP[u(Q)] ≤ EQ∗P[u(Q)] ≤ ĒP[u(Q)],

as desired.

Proof of Lemma A.5. Suppose u is partition-wise strictly proper and downwards proper. Fix a probability function P, let π be a refinement of πP, and suppose there is pπ ∈ πP ∖ π with P(pπ) ≠ 0. Let Q be an extension of P with Q(q) ≠ P(pπ) for each q ⊆ pπ. We know from Remark A.6 that there is an opinionated extension Q∗P of P to π such that

ĒP[u(Q)] = EQ∗P[u(Q)],

with Q∗P(q) ∈ {P(p), 0} whenever q ⊆ p. By construction, Q∗P ≠ Q. Since u is partition-wise strictly proper, we have

EQ∗P[u(Q)] < EQ∗P[u(Q∗P)].

And by definition, we have

EQ[u(Q)] ≤ ĒP[u(Q)].

Since u is downwards proper, we have

EQ[u(Q)] ≥ EQ[u(P)] = EP[u(P)].

Thus,

EP[u(P)] ≤ ĒP[u(Q)] = EQ∗P[u(Q)] < EQ∗P[u(Q∗P)] ≤ ĒP[u(Q∗P)],

which entails that u is not upwards proper.

It might seem too demanding to require upwards propriety of epistemic utility functions. So it might be tempting to require instead that for each P and each extension Q of P,

EP[u(P)] ≥ EQ[u(Q)].

The thought behind this is that, whereas in order to compare an extension of one's credence function one needs to use some credence function with the same domain as that extension, you would be stacking the deck against your credence function if you evaluated extensions using the credence function that gives the most favorable assessment of it. Perhaps, then, the thing to do is to evaluate each extension not with whoever gives it the most positive evaluation, but with itself. The problem is that, in the presence of partition-wise propriety, this requirement turns out to be equivalent to the requirement of upwards propriety, as made clear by the following fact:

Fact A.7. A partition-wise proper utility function u is upwards proper if and only if, for each P and each opinionated extension Q∗ of P,

EP[u(P)] ≥ EQ∗[u(Q∗)].

Proof.
e le to right direction is straightforward, since by denition EP[u(Q∗)] ≥ EQ∗[u(Q∗)]. For the right to le direction, we rely again on Remark a .6. For take P, let Q be an extension of P, and let Q∗P be such that EP[u(Q)] = EQ∗P [u(Q)]. Since u is partition-wise proper, we have EQ∗P [u(Q ∗ P)] ≥ EQ∗P [u(Q)]. Which gives us what we want, since by assumption, EP[u(P)] ≥ EQ∗P [u(Q ∗ P)]. corollary a .8. Suppose that for each P and any extension Q of P, EP[u(P)] ≥ EQ[u(Q)]. en u is not downwards proper. 59 Let me conclude by noting that, if we make some restrictions on u, we can strengtheneorem a .2 in an interesting way.72 We say that a function δ ∶ [0, 1]Ð→ R is concave i for each x , y, α ∈ [0, 1], δ(λx + (1 − λ)y) ≥ λδ(x) + (1 − λ)δ(y). We say that δ is strictly concave i the above inequality is always strict. Recall now (see Denition 4.1) that a utility function u is additive i for each partition π and each s ∈ π there is δπs ∶ [0, 1] × {0, 1} Ð→ R, with δπs (x , 0) (resp. δπs (x , 1)) continuous, twice dierentiable, and strictly increasing (resp. decreasing), such that for all C with πC = π, u(C ,w) = ∑ s ∈ π δπs (C(s),1{w ∈ s}). We say that an additive utility function u is (strictly) concave i each component function δπs (x , i) (i ∈ {0, 1}) is (strictly) concave. Any utility function which, restricted to probability functions with a given domain, is equivalent to an ane transformation of the Brier score is concave.73 We can now formulate the following theorem (recall that PC′ /C is the set of extensions of C to the domain of C′): theorem a .9. Suppose u is an additive, concave, strictly partition-wise proper utility function. Fix C and let C′ be an expansion of C.en: max P ∈PC′ EC[u(P)] = min Ĉ ∈PC′ /C EĈ[u(Ĉ)], where PC′ is the set of probability functions dened over πC′ . 
72 The restrictions aren't all strictly necessary (an even more general result for pretty much any natural partition-wise proper utility function follows as a corollary of Theorem 6.2 in Grünwald & Dawid 2004), but they allow for a more self-contained presentation of our results.

73 The requirement that a utility function be concave is a generalization of what Joyce calls convexity (Joyce 2009, p. 282). This unfortunate terminological difference is probably due to the fact that, when Joyce formulated a similar requirement in Joyce 1998, he was focused on disutility functions (or inaccuracy measures), and a utility function u is concave iff the corresponding disutility function −u is convex. Requiring concavity rules out any affine transformation of the log score, and while I believe we can extend our results below so as to include the log score and its affine transformations if we allow for local accuracy measures to take on the value −∞ for (0, 1) and (1, 0), I will not explore this issue further here.

Corollary A.10. Suppose u is an additive, concave, strictly partition-wise proper utility function. Suppose for each P and each extension Q of P,

EP[u(P)] ≥ E̲P[u(Q)].

Then u is not strictly downwards proper.

Proof. Pick any such u. Fix P and let P′ be an extension of P. By assumption, we know that

EP[u(P)] ≥ max_{Q ∈ PP′/P} E̲P[u(Q)].

Theorem A.9 thus entails that

EP[u(P)] ≥ min_{Q ∈ PP′/P} EQ[u(Q)].

Letting Q∗ be such that

EQ∗[u(Q∗)] = min_{Q ∈ PP′/P} EQ[u(Q)],

we thus have

EQ∗[u(Q∗)] ≯ EP[u(P)] = EQ∗[u(P)],

which means u is not strictly downwards proper.

In order to prove Theorem A.9, we will rely on the following well-known result:74

Theorem A.11 (von Neumann's minimax theorem). Suppose X ⊆ Rⁿ and Y ⊆ Rᵐ are compact and convex and let f ∶ X × Y → R be concave for a fixed y ∈ Y and convex for a fixed x ∈ X. Then

max_{x ∈ X} min_{y ∈ Y} f(x, y) = min_{y ∈ Y} max_{x ∈ X} f(x, y).

74 The canonical reference here is von Neumann 1928.
For a more accessible proof of this result, see Binmore 2004.

Proof of Theorem A.9. Fix $C$ and let $C'$ be an expansion of $C$ to some partition $\pi$. As before, let $\mathcal{P}_\pi$ be the set of probability functions defined over $\pi$ and let $\mathcal{P}_{C'/C}$ be the set of extensions of $C$ to $\pi$. Fix an enumeration $\{s_i : i \leq n\}$ of $\pi$, with $n = \lVert\pi\rVert$, and identify each credence function $P \in \mathcal{P}_\pi$ defined over $\pi$ with $x_P = \langle P(s_1), \dots, P(s_n) \rangle$. Set $X = \{x_P : P \in \mathcal{P}_{C'/C}\}$ and $Y = \{x_P : P \in \mathcal{P}_\pi\}$. Note that both $X$ and $Y$ are compact (each is a closed subset of $[0,1]^n$) and convex ($Y$ trivially so, and $X$ because any convex combination of extensions of $C$ is itself an extension of $C$). Define $f : X \times Y \to \mathbb{R}$ by
\[ f(x_P, x_Q) = -E_P[u(Q)]. \]
Since $u$ is concave, $f(x_P, x_Q)$ is convex for a fixed $x_P$. And clearly, for a fixed $x_Q$, $f(x_P, x_Q)$ is a linear function of $x_P$, and thus concave. From Theorem A.11 we can thus conclude that
\[ \max_{x_P \in X} \min_{x_Q \in Y} f(x_P, x_Q) = \min_{x_Q \in Y} \max_{x_P \in X} f(x_P, x_Q). \]
Note now that
\[ \min_{x_Q \in Y} \max_{x_P \in X} f(x_P, x_Q) = \max_{x_Q \in Y} \min_{x_P \in X} -f(x_P, x_Q) = \max_{Q \in \mathcal{P}_\pi} E_C[u(Q)]. \]
And since $u$ is proper, we have
\[ \max_{x_P \in X} \min_{x_Q \in Y} f(x_P, x_Q) = \min_{P \in \mathcal{P}_{C'/C}} \max_{Q \in \mathcal{P}_\pi} E_P[u(Q)] = \min_{P \in \mathcal{P}_{C'/C}} E_P[u(P)]. \]
Putting these two observations together, we thus get
\[ \max_{Q \in \mathcal{P}_\pi} E_C[u(Q)] = \min_{P \in \mathcal{P}_{C'/C}} E_P[u(P)]. \]

Appendix B

The purpose of this appendix is to provide examples of downwards proper and of upwards proper epistemic utility functions and to prove two characterization theorems (Theorem B.3 and Theorem B.5) for a simple class of weighted accuracy measures.

Recall that an epistemic utility function is a weighted accuracy measure iff for each partition $\pi$ there is a weight function $\lambda_\pi : \pi \to \mathbb{R}^+$ and a local accuracy measure $\delta : [0,1] \times \{0,1\} \to \mathbb{R}$ such that
\[ u(P,w) = \sum_{p \in \pi_P} \lambda_{\pi_P}(p)\,\delta\bigl(P(p), \mathbf{1}\{w \in p\}\bigr). \]
We will say that $u$ is a simple accuracy measure iff it is a weighted accuracy measure with constant weight function.
Note that if $u$ is a simple accuracy measure, then there is a local accuracy measure $\delta_u$ such that
\[ u(P,w) = \sum_{p \in \pi_P} \delta_u\bigl(P(p), \mathbf{1}\{w \in p\}\bigr). \]
If we restrict our attention to simple accuracy measures, we can provide a general characterization of all downwards proper utility functions. To do that, it will be helpful to have at our disposal the following definition.

Definition B.1. For any $x, y \in [0,1]$, let
\[ E_x\,\delta(y) := x \cdot \delta(y,1) + (1-x) \cdot \delta(y,0). \]
This gives us a slightly more convenient way of rewriting the definition of expected utility.

Remark B.2. Suppose $u$ is a weighted accuracy measure with weight function $\lambda$ and local accuracy measure $\delta$. Then for any partition $\pi$ and any probability functions $P$ and $P'$ defined over $\pi$,
\[ E_P[u(P')] = \sum_{p \in \pi} \lambda_\pi(p)\,E_{P(p)}\,\delta(P'(p)). \]
We can now establish the following result:

Theorem B.3. A simple accuracy measure $u$ is downwards proper iff for all $x, y \geq 0$, if $x + y \leq 1$ then
\[ E_x\,\delta_u(x) + E_y\,\delta_u(y) \geq E_{x+y}\,\delta_u(x+y). \]
It is strictly downwards proper iff for all $x, y \geq 0$, if $x + y \leq 1$, then
\[ E_x\,\delta_u(x) + E_y\,\delta_u(y) > E_{x+y}\,\delta_u(x+y). \]

Proof. The left-to-right direction of each biconditional follows immediately from the definitions. For the right-to-left direction of the first biconditional, suppose $u$ is not downwards proper. Then there are $P$ and $Q$ such that $P$ is an extension of $Q$ with $E_P[u(Q)] > E_P[u(P)]$. In other words,
\[ \sum_{q \in \pi_Q} E_{P(q)}\,\delta_u(Q(q)) > \sum_{q \in \pi_Q} \sum_{p \subseteq q} E_{P(p)}\,\delta_u(P(p)). \]
But this entails that there is $q \in \pi_Q$ such that
\[ E_{Q(q)}\,\delta_u(Q(q)) > \sum_{p \subseteq q} E_{P(p)}\,\delta_u(P(p)), \]
or equivalently that there are non-negative $x_1, \dots, x_k$ such that
\[ E_{x^*}\,\delta_u(x^*) > \sum_i E_{x_i}\,\delta_u(x_i), \]
where $x^* = \sum_i x_i \leq 1$ and $k = |\{p : p \subseteq q\}|$. And this in turn entails there are $x, y \geq 0$ with $x + y \leq 1$ such that
\[ E_x\,\delta_u(x) + E_y\,\delta_u(y) < E_{x+y}\,\delta_u(x+y). \]
Analogous reasoning can be used to establish the right-to-left direction of the second biconditional.

Corollary B.4. Let
\[ \beta_\theta(P,w) = \sum_{p \in \pi_P} \Bigl(\theta - \bigl(P(p) - \mathbf{1}\{w \in p\}\bigr)^2\Bigr). \]
Then $\beta_\theta$ is downwards proper iff $\theta \geq 1/2$.

Proof.
Note that $\beta_\theta$ is a simple accuracy measure with local accuracy measure $b_\theta(x,i) = \theta - (x-i)^2$, with
\[ E_x\,b_\theta(x) = \theta - (x - x^2). \]
From Theorem B.3, we know that $\beta_\theta$ is downwards proper iff for all $x, y \geq 0$ with $x + y \leq 1$,
\[ E_x\,b_\theta(x) + E_y\,b_\theta(y) \geq E_{x+y}\,b_\theta(x+y). \]
Now,
\[ E_{x+y}\,b_\theta(x+y) = \theta - \bigl(x + y - (x+y)^2\bigr) = E_x\,b_\theta(x) + E_y\,b_\theta(y) - \theta + 2xy. \]
So, writing $r = x + y$, $\beta_\theta$ is downwards proper iff $\theta \geq 2x(r - x)$ for all $0 \leq x \leq r \leq 1$. Since $2x(r - x)$ is maximized at $x = r/2$, where it equals $r^2/2 \leq 1/2$, we can conclude that $\beta_\theta$ is downwards proper iff $\theta \geq 1/2$.

We can also provide a characterization of all upwards proper simple accuracy measures, one that includes the Brier score $\beta$.

Theorem B.5. If $u$ is a simple accuracy measure, then $u$ is upwards proper (resp. strictly upwards proper) iff $\delta(0,0) \leq 0$ (resp. $\delta(0,0) < 0$).

Proof. We start with the following fact.

Lemma B.6. Suppose $u$ is a proper, weighted accuracy measure. Then for each $P$ and any opinionated extension $Q$ of $P$, if $\lambda_{\pi_Q}(q) = \lambda_{\pi_P}(p)$ whenever $q \subseteq p$, then
\[ E_Q[u(Q)] = E_P[u(P)] + \sum_{p \in \pi_P} \lambda_{\pi_P}(p) \cdot (\lVert p \rVert_{\pi_Q} - 1) \cdot \delta(0,0). \]

Proof. Since $Q$ is an opinionated extension of $P$, we know that for each $p \in \pi_P$, $Q(q_p) = P(p)$ and that for $q \subseteq p$, $q \neq q_p$ implies $Q(q) = 0$. And since $E_0\,\delta(0) = \delta(0,0)$, we have
\[ E_Q[u(Q)] = \sum_{p \in \pi_P} \Bigl( \lambda_Q(q_p)\,E_{P(p)}\,\delta(P(p)) + \sum_{q \subseteq p,\ q \neq q_p} \lambda_Q(q)\,\delta(0,0) \Bigr). \]
Now, by assumption, we have that for each $p \in \pi_P$ and $q \in \pi_Q$, $q \subseteq p$ entails $\lambda_Q(q) = \lambda_P(p)$. Thus, we can conclude
\[ E_Q[u(Q)] = \sum_{p \in \pi_P} \Bigl( \lambda_P(p)\,E_{P(p)}\,\delta(P(p)) + \lambda_P(p) \cdot (\lVert p \rVert_{\pi_Q} - 1) \cdot \delta(0,0) \Bigr), \]
as desired.

Corollary B.7. If $u$ is a simple accuracy measure, then for each $P$ and any non-trivial opinionated extension $Q$ of $P$ there is $k > 0$ such that
\[ E_Q[u(Q)] = E_P[u(P)] + k \cdot \delta(0,0). \]
Theorem B.5 follows immediately from Corollary B.7 and Remark A.6.

Corollary B.8. For each $\theta \leq 0$,
\[ \beta_\theta(P,w) = \sum_{p \in \pi_P} \Bigl(\theta - \bigl(P(p) - \mathbf{1}\{w \in p\}\bigr)^2\Bigr) \]
is upwards proper. In particular, the Brier score $\beta = \beta_0$ is upwards proper.
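The threshold in Corollary B.4 can be checked numerically. The following is a minimal sketch, not part of the paper: it evaluates the inequality of Theorem B.3 for the local measure $b_\theta(x,i) = \theta - (x-i)^2$ on a finite grid of pairs $(x,y)$ with $x + y \leq 1$. The function names, the grid resolution `steps`, and the tolerance `tol` are all assumptions introduced for illustration, and a grid check is of course only suggestive, not a proof.

```python
def local_brier(x: float, i: int, theta: float) -> float:
    """Local accuracy measure b_theta(x, i) = theta - (x - i)^2."""
    return theta - (x - i) ** 2


def expected_local(x: float, y: float, theta: float) -> float:
    """E_x b_theta(y) = x * b_theta(y, 1) + (1 - x) * b_theta(y, 0) (Definition B.1)."""
    return x * local_brier(y, 1, theta) + (1 - x) * local_brier(y, 0, theta)


def satisfies_b3_condition(theta: float, steps: int = 100, tol: float = 1e-9) -> bool:
    """Grid check of E_x b(x) + E_y b(y) >= E_{x+y} b(x+y) for x, y >= 0, x + y <= 1."""
    for i in range(steps + 1):
        # Integer indices guarantee x + y <= 1 without floating-point drift.
        for j in range(steps + 1 - i):
            x, y, z = i / steps, j / steps, (i + j) / steps
            lhs = expected_local(x, x, theta) + expected_local(y, y, theta)
            rhs = expected_local(z, z, theta)
            if lhs < rhs - tol:
                return False
    return True


if __name__ == "__main__":
    for theta in (0.0, 0.49, 0.5, 1.0):
        print(theta, satisfies_b3_condition(theta))
```

On this grid the condition holds for $\theta = 0.5$ and $\theta = 1$ and fails for $\theta = 0.49$ and for the unshifted Brier score ($\theta = 0$), in line with the corollary; the binding case is $x = y = 1/2$, where $2xy = 1/2$.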
Before concluding this appendix, let me note the following easy fact, which allows us to generate a class of downwards proper weighted accuracy measures from any downwards proper local accuracy measure.

Fact B.9. Suppose $u$ is a weighted accuracy measure with a downwards proper local accuracy measure. Suppose for each $\pi \preceq \pi'$, $p \in \pi$ and $p' \in \pi'$, $p \subseteq p'$ entails $\lambda_\pi(p) \leq \lambda_{\pi'}(p')$. Then $u$ is downwards proper.

Appendix C

The purpose of this appendix is to establish Corollary C.16, a straightforward consequence of which is Fact 6.3. Start by first generalizing the notion of downwards propriety to apply to local accuracy measures:

Definition C.1. A local accuracy measure $\delta : [0,1] \times \{0,1\} \to \mathbb{R}$ is (strictly) downwards proper iff the corresponding simple accuracy measure $u_\delta$, defined by
\[ u_\delta(P,w) = \sum_{p \in \pi_P} \delta\bigl(P(p), \mathbf{1}\{w \in p\}\bigr), \]
is (strictly) downwards proper.

The following is a straightforward consequence of Theorem B.3:

Corollary C.2. If $\delta$ is downwards proper, then $\delta(0,0) \geq 0$. If $\delta$ is strictly downwards proper, then $\delta(0,0) > 0$.

Some local accuracy measures care only about the distance between a credal assignment and truth-value. We will say that such local accuracy measures are normal.

Definition C.3. A local accuracy measure is normal75 iff for each $x \in [0,1]$, $\delta(x,1) = \delta(1-x,0)$.

The following observation, an immediate consequence of the definitions, will come in handy:

Remark C.4. If $\delta$ is normal, then for each $x \in [0,1]$, $E_x\,\delta(x) = E_{1-x}\,\delta(1-x)$.

The combination of normality and downwards propriety ensures that $E_x\,\delta(x)$ is non-negative.

Fact C.5. Suppose $\delta$ is normal and downwards proper. Then for each $x \in [0,1]$, $E_x\,\delta(x) \geq 0$. If, in addition, $\delta$ is strictly downwards proper, then for each $x \in [0,1]$, $E_x\,\delta(x) > 0$.

Proof. Suppose $\delta$ is normal and strictly downwards proper, and suppose for reductio that there is $x \in [0,1]$ such that $E_x\,\delta(x) \leq 0$. Since $\delta$ is normal, we can apply Remark C.4 to conclude that
\[ E_x\,\delta(x) + E_{1-x}\,\delta(1-x) = E_x\,\delta(x) + E_x\,\delta(x) \leq 0. \]
From the normality of $\delta$ we also know that $\delta(0,0) = \delta(1,1)$. Thus, from Theorem B.3 and the fact that $\delta$ is strictly downwards proper we know that
\[ E_x\,\delta(x) + E_{1-x}\,\delta(1-x) > E_1\,\delta(1) = \delta(1,1) = \delta(0,0). \]
Putting all of this together with Corollary C.2, we get
\[ 0 \geq E_x\,\delta(x) + E_{1-x}\,\delta(1-x) > \delta(0,0) \geq 0, \]
a contradiction. Perfectly analogous reasoning (mutatis mutandis) shows that if $\delta$ is downwards proper then $E_x\,\delta(x) \geq 0$.

75 Cf. Joyce 2009, p. 274. Note that a weighted accuracy measure with a normal local accuracy measure need not be normal in Joyce's sense. In Joyce's terminology, what I'm calling normality best corresponds to 0/1-symmetry. But when restricted to local accuracy measures, 0/1-symmetry and normality are equivalent.

For the remainder of this appendix, we will be working with a fixed weighted accuracy measure $u$ whose local accuracy measure $\delta$ is normal and downwards proper. (Note that $u$ may fail to be downwards proper even if its local accuracy measure $\delta$ is.) Our ultimate goal is to establish Fact 6.3. Before getting there, let me introduce a few definitions.

Definition C.6. Suppose $\pi_1, \pi_2$ are two refinements of $\pi$. Say that $\pi_1$ and $\pi_2$ are $\pi$-equivalent iff there is a bijection $f : \pi_1 \to \pi_2$ such that, for all $s \in \pi$ and all $t \in \pi_1$, $t \subseteq s \Leftrightarrow f(t) \subseteq s$.

Remark C.7. If $\pi_1$ and $\pi_2$ are $\pi$-equivalent, then for each $s \in \pi$,
\[ |\{t \in \pi_1 : t \subseteq s\}| = |\{t \in \pi_2 : t \subseteq s\}|. \]

Definition C.8. Suppose $\pi_1$ and $\pi_2$ are $\pi$-equivalent. Say that a weight function $\lambda$ $\pi$-favors $\pi_1$ over $\pi_2$ iff there is a bijection $f : \pi_1 \to \pi_2$ such that for all $s \in \pi$ and all $t \in \pi_1$,
1. $t \subseteq s \Leftrightarrow f(t) \subseteq s$, and
2. $\lambda_{\pi_1}(t) \geq \lambda_{\pi_2}(f(t))$.

Remark C.9. If $\lambda$ $\pi$-favors $\pi_1$ over $\pi_2$, then for each $s \in \pi$,
\[ \max_{t \in \pi_1,\ t \subseteq s} \lambda_{\pi_1}(t) \geq \max_{t \in \pi_2,\ t \subseteq s} \lambda_{\pi_2}(t). \]

Theorem C.10. Suppose $u$ is a weighted accuracy measure with a strictly downwards proper and normal local accuracy measure. Fix $P$ and let $\pi_1$ and $\pi_2$ be two $\pi_P$-equivalent refinements of $\pi_P$.
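If $\lambda$ $\pi_P$-favors $\pi_1$ over $\pi_2$ and for some $p \in \pi_P$ with $P(p) \neq 0$
\[ \max_{q \in \pi_1,\ q \subseteq p} \lambda_{\pi_1}(q) > \max_{q \in \pi_2,\ q \subseteq p} \lambda_{\pi_2}(q), \]
then
1. $\max_{Q \in \mathcal{P}_{\pi_1/P}} E_Q[u(Q)] > \max_{Q \in \mathcal{P}_{\pi_2/P}} E_Q[u(Q)]$, and
2. $\min_{Q \in \mathcal{P}_{\pi_1/P}} E_Q[u(Q)] > \min_{Q \in \mathcal{P}_{\pi_2/P}} E_Q[u(Q)]$.

Proof. Fix $f : \pi_1 \to \pi_2$ witnessing that $\lambda$ $\pi_P$-favors $\pi_1$ over $\pi_2$ and fix $Q$ and $R$ such that:
\[ E_Q[u(Q)] = \min_{P' \in \mathcal{P}_{\pi_1/P}} E_{P'}[u(P')], \qquad E_R[u(R)] = \min_{P' \in \mathcal{P}_{\pi_2/P}} E_{P'}[u(P')]. \]
Let $p$, $q$, and $r$ range over elements of $\pi_P$, $\pi_1$, and $\pi_2$, respectively. For each $p$, fix an enumeration $q^p_i$ ($1 \leq i \leq n_p$) of all subsets of $p$ in $\pi_1$ and let $r^p_i = f(q^p_i)$. To reduce clutter, for each $p$ and $1 \leq i \leq n_p$, let $\hat{q}^p_i = Q(q^p_i)$. Finally, define $R_Q$ over $\pi_2$ by letting $R_Q(r^p_i) = \hat{q}^p_i$. From Fact C.5 we know that for each $x$, $E_x\,\delta(x) > 0$. Since $\lambda$ $\pi_P$-favors $\pi_1$ over $\pi_2$, we thus have
\[ E_Q[u(Q)] = \sum_p \sum_{1 \leq i \leq n_p} \lambda_{\pi_1}(q^p_i)\,E_{\hat{q}^p_i}\,\delta(\hat{q}^p_i) > \sum_p \sum_{1 \leq i \leq n_p} \lambda_{\pi_2}(r^p_i)\,E_{\hat{q}^p_i}\,\delta(\hat{q}^p_i). \]
But
\[ \sum_p \sum_{1 \leq i \leq n_p} \lambda_{\pi_2}(r^p_i)\,E_{\hat{q}^p_i}\,\delta(\hat{q}^p_i) = E_{R_Q}[u(R_Q)], \]
and by construction $E_{R_Q}[u(R_Q)] \geq E_R[u(R)]$, which establishes the second claim.

To establish the first claim we will use the following observation, which follows immediately from the definitions and Fact C.5:

Remark C.11. Suppose $u$ is a weighted accuracy measure with a normal, downwards proper local accuracy measure. Then for each $P$ and each refinement $\pi$ of $\pi_P$,
\[ \max_{Q \in \mathcal{P}_{\pi/P}} E_Q[u(Q)] = \sum_p \Bigl( \max_{q \subseteq p} \lambda_\pi(q)\,E_{P(p)}\,\delta(P(p)) \Bigr) + \sum_p \bigl( k^\pi_p \cdot \delta(0,0) \bigr), \]
where for each $p$,
\[ k^\pi_p = \Bigl( \sum_{q \subseteq p} \lambda_\pi(q) \Bigr) - \max_{q \subseteq p} \lambda_\pi(q). \]

Note now that our assumptions ensure that
\[ \sum_p \max_{q \subseteq p} \lambda_{\pi_1}(q)\,E_{P(p)}\,\delta(P(p)) > \sum_p \max_{q \subseteq p} \lambda_{\pi_2}(q)\,E_{P(p)}\,\delta(P(p)), \]
and that
\[ \sum_p k^{\pi_1}_p > \sum_p k^{\pi_2}_p. \]
Given Remark C.11 and Corollary C.2, we can conclude
\[ \max_{Q \in \mathcal{P}_{\pi_1/P}} E_Q[u(Q)] > \max_{Q \in \mathcal{P}_{\pi_2/P}} E_Q[u(Q)], \]
as desired.

Corollary C.12. Suppose $u$ is a weighted accuracy measure with a strictly downwards proper local accuracy measure that is normal and concave. Fix $P$ and let $\pi_1$ and $\pi_2$ be two $\pi_P$-equivalent refinements of $\pi_P$.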
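If $\lambda$ $\pi_P$-favors $\pi_1$ over $\pi_2$ and for some $p \in \pi_P$ with $P(p) \neq 0$
\[ \max_{q \in \pi_1,\ q \subseteq p} \lambda_{\pi_1}(q) > \max_{q \in \pi_2,\ q \subseteq p} \lambda_{\pi_2}(q), \]
then
1. $\max_{Q \in \mathcal{P}_{\pi_1/P}} E_P[u(Q)] > \max_{Q \in \mathcal{P}_{\pi_2/P}} E_P[u(Q)]$, and
2. $\max_{Q \in \mathcal{P}_{\pi_1/P}} E_P[u(Q)] > \max_{Q \in \mathcal{P}_{\pi_2/P}} E_P[u(Q)]$.

Proof. In light of Theorem C.10 and Theorem A.9, it suffices to establish the following lemma:

Lemma C.13. Suppose $u$ is partition-wise proper. Then for each $P$ and any refinement $\pi$ of $\pi_P$,
\[ \max_{Q \in \mathcal{P}_{\pi/P}} E_P[u(Q)] = \max_{Q \in \mathcal{P}_{\pi/P}} E_Q[u(Q)]. \]

Proof. Fix $\hat{Q}$ such that $E_P[u(\hat{Q})] = \max_{Q \in \mathcal{P}_{\pi/P}} E_P[u(Q)]$, and fix $\tilde{Q}$ such that $E_{\tilde{Q}}[u(\tilde{Q})] = \max_{Q \in \mathcal{P}_{\pi/P}} E_Q[u(Q)]$. From Remark A.6, we know that there is an opinionated extension $Q^*_P$ of $P$ such that $E_P[u(\hat{Q})] = E_{Q^*_P}[u(\hat{Q})]$. By definition, we have
\[ \max_{Q \in \mathcal{P}_{\pi/P}} E_P[u(Q)] \geq E_P[u(\tilde{Q})] \geq E_{\tilde{Q}}[u(\tilde{Q})] = \max_{Q \in \mathcal{P}_{\pi/P}} E_Q[u(Q)]. \]
And since $u$ is partition-wise proper, we have
\[ \max_{Q \in \mathcal{P}_{\pi/P}} E_Q[u(Q)] \geq E_{Q^*_P}[u(Q^*_P)] \geq E_{Q^*_P}[u(\hat{Q})] = E_P[u(\hat{Q})] = \max_{Q \in \mathcal{P}_{\pi/P}} E_P[u(Q)]. \]

To conclude, let me state a slight generalization of Theorem C.10.

Definition C.14. Fix $P$ and let $\pi_1$ and $\pi_2$ be two refinements of $\pi_P$. Say that $\pi_1$ and $\pi_2$ are $\lambda$-$P$-equivalent iff there is a bijection $f : \pi_1 \to \pi_2$ such that:
• for all $p \in \pi_P$, $\bigcup_{q \subseteq p} f(q) \in \pi_P$,
• for all $p \in \pi_P$, $\sum_{q \subseteq p,\ q \in \pi_1} P(f(q)) = \sum_{q \subseteq p,\ q \in \pi_1} P(q)$,
• for all $q \in \pi_1$, $\lambda_{\pi_1}(q) = \lambda_{\pi_2}(f(q))$.

Remark C.15. Suppose $\pi_1$ and $\pi_2$ are two $\lambda$-$P$-equivalent refinements of $\pi_P$. Then:
1. $\max_{Q \in \mathcal{P}_{\pi_1/P}} E_Q[u(Q)] = \max_{Q \in \mathcal{P}_{\pi_2/P}} E_Q[u(Q)]$, and
2. $\min_{Q \in \mathcal{P}_{\pi_1/P}} E_Q[u(Q)] = \min_{Q \in \mathcal{P}_{\pi_2/P}} E_Q[u(Q)]$.

Proof. Fix $f$ witnessing that $\pi_1$ and $\pi_2$ are $\lambda$-$P$-equivalent. Define $\varphi : \mathcal{P}_{\pi_1/P} \to \mathcal{P}_{\pi_2/P}$ by letting $\varphi(Q)(q) = Q(f^{-1}(q))$. Clearly, $\varphi$ is a bijection from $\mathcal{P}_{\pi_1/P}$ to $\mathcal{P}_{\pi_2/P}$. And for each $Q \in \mathcal{P}_{\pi_1/P}$, $E_Q[u(Q)] = E_{\varphi(Q)}[u(\varphi(Q))]$.

Corollary C.16. Suppose $u$ is a weighted accuracy measure with a strictly downwards proper local accuracy measure that is normal and concave. Fix $P$, and let $\pi_1$ and $\pi_2$ be two $\pi_P$-equivalent refinements of $\pi_P$.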
If $\lambda$ $\pi_P$-favors $\pi_1$ over $\pi_2$ and for some $p \in \pi_P$ with $P(p) \neq 0$
\[ \max_{q \in \pi_1,\ q \subseteq p} \lambda_{\pi_1}(q) > \max_{q \in \pi_2,\ q \subseteq p} \lambda_{\pi_2}(q), \]
then, whenever $\pi_3$ and $\pi_2$ are $\lambda$-$P$-equivalent,
1. $\max_{Q \in \mathcal{P}_{\pi_1/P}} E_P[u(Q)] > \max_{Q \in \mathcal{P}_{\pi_3/P}} E_P[u(Q)]$, and
2. $\max_{Q \in \mathcal{P}_{\pi_1/P}} E_P[u(Q)] > \max_{Q \in \mathcal{P}_{\pi_3/P}} E_P[u(Q)]$.

Appendix D

The purpose of this appendix is to argue in more detail for the two claims I relied on in §5.2 in making a case for Remark 5.2, as well as to provide a proof of Remark 5.2 starting from those two claims.

Recall that we are seeking to assess, for a given credence function $C$, any expansion $C'$ of $C$. The proposal was to use a weighted accuracy measure whose weights were sensitive to explanatory considerations. The construction proceeded in two steps. First, we defined the counterfactual resilience of a given explanandum $e$ relative to a function $C$ as follows.

Definition D.1. The counterfactual resilience of $e$ relative to a credence function $C$ (and a set of suppositions $B$), $r_B(C,e)$, is given by:
\[ r_B(C,e) = 1 - \frac{1}{|B|} \sum_{b \in B} \bigl| C(e) - C(b \mathrel{\Box\!\!\to} e) \bigr|. \]
As before, I will drop the relativization to the set of suppositions in what follows.

We then defined a weighted accuracy measure using a weight function defined in terms of $r$.

Definition D.2. Let $\delta$ be a local accuracy measure. Fix $P$ and $e \in \mathcal{B}_P$. For any extension $Q$ of $P$,
\[ \varepsilon^\delta_P(Q,x) = \sum_{s \in \pi_Q} \chi^e_C(s)\,\delta\bigl(Q(s), \mathbf{1}\{x \in s\}\bigr), \]
where for each $s \in \pi_Q$, $\chi^e_C(s) = r(C_s, e)$, with $C_s = C(\,\cdot \mid s)$.

This function will be well-defined, for a given extension $Q$ of $P$, if and only if for all $q \in \pi_Q$, $r(P_q, e)$ is well-defined. And $r(P_q, e)$ will be well-defined if and only if, for each $b$ in the relevant set of suppositions $B$, $P_q(b \mathrel{\Box\!\!\to} e)$ is well-defined. Since $b \mathrel{\Box\!\!\to} e$ is not a Boolean combination of $b$ and $e$, we have no guarantee that $b \mathrel{\Box\!\!\to} e$ is in the domain of $P$ even if $b$ and $e$ are. So we need to say something about how to assign credences to counterfactuals. Further, since for any non-trivial extension $Q$ of $P$ there will be $q \in \pi_Q$ that is not in $\pi_P$, we need to say something about how to understand $P_q$.
Aer all, using the standard ratio denition (as before), the fact that q /∈ BP entails that Pq is not well-dened. In what follows, I want to oer a principled way of addressing each of these concerns. d . 1 Credences in counterfactuals For our purposes, we will rely on the familiar selection function semantics associated with Stalnaker 1968, which is a particular case of the similarity based semantics associated with Lewis 1973. definition d . 3. A selection function is a function σ ∶ ℘(W) ×W Ð→ ℘(W) such that (i) for each a ∈ ℘(W), σ(a,w) ⊆ a; for each a ∈ ℘(W), if w ∈ a then σ(a,w) = {w}; (iii) for all a, b ∈ ℘(W), if σ(a,w) ⊆ b and σ(b,w) ⊆ a, then σ(a,w) = σ(b,w); (iv) σ(a,w) = ∅ only if a = ∅. As usual, we think of σ(a,w) as the closest worlds tow in which a is true, and we introduce the usual denition: definition d .4. e counterfactual conditional relative to σ ,σ , is a binary propositional operator dened as aσ b ∶= {w ∶ σ(a,w) ⊆ b} us, aσ b contains all and only those worlds such that all of their closest a-worlds are b-worlds. Now, recall that from the perspective of an agent whose state space is π, we can think of each s ∈ π as playing the role of a possible world-the elements of π are each maximally consistent relative to the agent's credence function, in that each proposition in the agent's credence function is entailed by or inconsistent with some member of π.us, from the perspective of such an agent, we can think of the counterfactual conditional 74 as the proposition that is true in all and only those s ∈ π such that their 'closest worlds' that make a true, also make b true. For this to make sense, however, we need a function that assigns a set of closest worlds not to a member ofW and a proposition, but rather to a member of π and a proposition. Fortunately, doing so is straightforward. definition d . 5. Given a selection function σ we can dene a function σ∗ ∶ ℘(W) × ℘(W)Ð→ ℘(W) by letting σ∗(a, s) ∶= {σ(a,w) ∶ w ∈ s}. 
Slightly abusing notation, we will identify $\sigma^*$ and $\sigma$, and write $\sigma(a,s)$ instead of $\sigma^*(a,s)$. (Note that even if $\sigma(a,w)$ contains at most one world for all $w$, $\sigma^*(a,s)$ will typically contain more than one world.)

Definition D.6. Fix $\sigma$ and $\pi$. The projection of $a \mathrel{\Box\!\!\to}_\sigma b$ onto $\pi$, written $(a \mathrel{\Box\!\!\to}_\sigma b)_\pi$, is defined as
\[ (a \mathrel{\Box\!\!\to}_\sigma b)_\pi := \bigcup \{s \in \pi : \sigma(a,s) \subseteq b\}. \]
We can think of $(a \mathrel{\Box\!\!\to}_\sigma b)_\pi$ as the proposition that contains all and only those cells in $\pi$ such that their closest cells that entail $a$ also entail $b$.76 More precisely, for each $a, b$ in $\mathcal{B}$, $(a \mathrel{\Box\!\!\to}_\sigma b)_\pi$ is the weakest proposition in $\mathcal{B}$ that entails $a \mathrel{\Box\!\!\to}_\sigma b$.

76 Note that, for a given $s \in \pi$, $\sigma(a,s)$ may be the union of more than one member of $\pi$. Thus, even if the underlying selection function satisfies the so-called uniqueness assumption (so that for all $a \subseteq W$ and $w \in W$, $\sigma(a,w)$ contains at most one world), the selection function filtered through an agent's state space $\pi$ may not.

Remark D.7. Suppose $\pi' \preceq \pi$, and let $a$ and $b$ be in the Boolean closure of $\pi'$. Then $(a \mathrel{\Box\!\!\to} b)_{\pi'} \subseteq (a \mathrel{\Box\!\!\to} b)_\pi$.

Suppose now $C$ is a credence function with state space $\pi$. Suppose $a, b \in \mathcal{B}_C$ but $a \mathrel{\Box\!\!\to} b \notin \mathcal{B}_C$. I submit that $(a \mathrel{\Box\!\!\to} b)_\pi$ has a very strong claim to being the proposition that plays the role of the counterfactual conditional for an agent whose conceptual resources are given by $\pi$. For $(a \mathrel{\Box\!\!\to} b)_\pi$ will be the proposition that is true at a 'world' (where this is just an atom of the agent's state space) if and only if in all 'worlds' 'most similar' to that 'world' in which $a$ is true, $b$ is true, where similarity is understood in the only terms the agent can grasp.

Thus, whenever $a, b \in \mathcal{B}_C$, I will henceforth use $C(a \mathrel{\Box\!\!\to}_\sigma b)$ to denote the value $C$ assigns to $(a \mathrel{\Box\!\!\to}_\sigma b)_{\pi_C}$, a proposition which, by definition, is in $\mathcal{B}_C$. We should accordingly understand $r(C,e)$ as a function of the value that $C$ assigns to $(b \mathrel{\Box\!\!\to} e)_{\pi_C}$.

This addresses our first concern, at least on the assumption that $C_s$ is well-defined. We turn now to the second concern, viz. that $C_s$ may not be well-defined, perhaps because $C(s)$ is not.

D.2 Conditioning beyond one's state space

Given everything we've said thus far, for $C_s = C(\,\cdot \mid s)$ to be well-defined, we need $C(s)$ to be well-defined (and non-zero). This is because we have defined $C(x \mid y)$ using the so-called ratio formula
\[ C(x \mid y) = \frac{C(xy)}{C(y)}. \]
But if $C'$ is a non-trivial extension of $C$, there will be a $C'$-atom $s'$ such that $C(s')$ is not well-defined. We thus need some alternative way of defining $C'_s$, then, in order for $\varepsilon_C(C',w)$ to be well-defined.

Suppose $C$ is a credence function with state space $\pi$ and suppose $s'$ is not in $\pi$. Consider the set $\mathcal{C}'$ of all possible extensions of $C$ whose domain includes $s'$ and $x$. If for all $P, P' \in \mathcal{C}'$ we have $P(x \mid s') = P'(x \mid s') = \alpha$, it makes good sense to assign $C(x \mid s') = \alpha$. After all, an agent with credence function $C$ would be committed to assigning $\alpha$ to $C(x \mid s')$, assuming every extension of $C$ whose domain includes $x$ and $s'$ assigns $\alpha$ to $x$ conditional on $s'$. (That said, we may wish to restrict our extension of the definition of $C(x \mid y)$ so that it is only defined over pairs of propositions that the agent whose credence function is $C$ can grasp.)

Consider now an agent with credence function $C$ that is assessing an extension $C'$ of her credence function, perhaps an agent who just underwent an expansion of her conceptual resources. For most $a$ and $b$ in the domain of $C'$, different extensions of $C$ will assign different values to $b$ conditional on $a$. But for our purposes this turns out not to matter.

Remark D.8. Suppose $s$ is an atom of $P$ and let $a$ be any proposition in the domain of $C$. Then for any probability function $Q$ whose domain includes $s$ and $a$, with $Q(s) \neq 0$, $Q(a \mid s)$ will be either 1 or 0, depending on whether $s \subseteq a$ or not.

We can now see that, whenever an agent with credence function $C$ is considering an extension $C'$ of $C$, $\chi^e_C(s')$ will be well-defined for each $s' \in \pi'$, since $r(C_{s'}, e)$ will be, with
\[ r(C_{s'}, e) = 1 - \frac{1}{|B|} \sum_{b \in B} \bigl| C(e) - \mathbf{1}\{s' \subseteq (b \mathrel{\Box\!\!\to} e)_{\pi'}\} \bigr| = 1 - \frac{1}{|B|} \sum_{b \in B} \bigl| C(e) - \mathbf{1}\{\sigma(b,s') \subseteq e\} \bigr|. \]
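The closed form for $r(C_{s'}, e)$ just given can be illustrated with a small computation. What follows is only a sketch under stated assumptions: worlds are modelled as integers, propositions as frozensets of worlds, and `nearest` is a toy selection function (closest worlds by numerical distance) chosen purely for illustration. None of these names or modelling choices come from the paper.

```python
from typing import Callable, FrozenSet, Sequence

World = int
Prop = FrozenSet[World]
Selection = Callable[[Prop, World], Prop]


def nearest(a: Prop, w: World) -> Prop:
    """Toy selection function (an assumption): worlds in `a` closest to `w`."""
    if not a:
        return frozenset()
    best = min(abs(v - w) for v in a)
    return frozenset(v for v in a if abs(v - w) == best)


def lift(sigma: Selection, a: Prop, cell: Prop) -> Prop:
    """sigma*(a, s): the union of sigma(a, w) over all worlds w in the cell s."""
    out: set = set()
    for w in cell:
        out |= sigma(a, w)
    return frozenset(out)


def resilience(c_e: float, cell: Prop, e: Prop,
               suppositions: Sequence[Prop], sigma: Selection) -> float:
    """1 - (1/|B|) * sum over b in B of |C(e) - 1{sigma(b, s') is a subset of e}|."""
    total = 0.0
    for b in suppositions:
        holds = lift(sigma, b, cell) <= e  # subset test on frozensets
        total += abs(c_e - (1.0 if holds else 0.0))
    return 1.0 - total / len(suppositions)


if __name__ == "__main__":
    e = frozenset({0, 1})
    B = [frozenset({2, 3}), frozenset({1, 2}), frozenset({0, 3})]
    print(resilience(1.0, frozenset({0}), e, B, nearest))
```

In this toy case, with $C(e) = 1$ and $s' = \{0\}$, two of the three suppositions have their closest worlds inside $e$, so the computed resilience is $2/3$: the proportion of suppositions $b$ for which $s' \subseteq (b \mathrel{\Box\!\!\to} e)_{\pi'}$.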
is suces to establish the following remark, which has Remark 5.2 as an immediate consequence: remark d .9. If C(e) = 1, s is a C-atom, π′ is a renement of C and s′ ⊆ s, with s′ ∈ π′, then r(Cs′ , e) = ∣{b ∈B ∶ s′ ⊆ (b e)π′}∣ ∣B∣ And from Remark d . 7 we can then establish the following observation, from which Remark 6.4 follows immediately. remark d . 10. Suppose C′ is an extension of C to π′ ⪰ π = πC . en for each s′ ∈ π′, s ∈ π, if s′ ⊆ s then χeC(s) ≤ χ e C(s ′ ). 77 references Arntzenius, Frank. 1995. A Heuristic for Conceptual Change. Philosophy of Science 62(3). 357–369. Arntzenius, Frank. 2008. Rationality and Self-Condence. In Tamar Szábo Gendler & John Hawthorne (eds.), Oxford Studies in Epistemology, vol. 2, 165–178. Oxford: Oxford University Press. Baker, Alan. 2003. Quantitative Parsimony and Explanatory Power.e British Journal for the Philosophy of Science 54(2). 245–259. Baker, Alan. 2011. Simplicity. In Edward N. Zalta (ed.),e Stanford Encyclopedia of Philosophy, Summer 2011. Berker, Selim. 2013. Epistemic Teleology and the Separateness of Propositions. Philosophical Review 122(3). 337–393. Binmore, Ken. 2004. GuillermoOwen's Proof OfeMinimaxeorem.eory and Decision 56(1). 19–23. Caie, Michael. 2013. Rational Probabilistic Incoherence. Philosophical Review 122(4). 527–575. Carr, Jennifer. 2015. Epistemic Expansions. Res Philosophica 92(2). 217–236. Ellerman, David. 2010.e Logic of Partitions: Introduction to the Dual of the Logic of Subsets.e Review of Symbolic Logic 3(2). 287–350. Evans, Gareth. 1982.e Varieties of Reference. John McDowell (ed.). Oxford: Clarendon Press. Forster, Malcolm R. 1999. How Do Simple Rules 'Fit to Reality' in a Complex World?Minds and Machines 9(4). 543–564. van Fraassen, Bas C. 1980.e Scientic Image. Oxford University Press. van Fraassen, Bas C. 1990. Figures in a Probability Landscape. In J. Michael Dunn & Anil Gupta (eds.), Truth or Consequences: Essays in Honor of Nuel Belnap, 345–356. 
Dordrecht: Kluwer Academic Publishers. Franke, Michael & Tikitu de Jager. 2011. Nowat You Mention It: Awareness Dynamics in Discourse and Decisions. In Anton Benz, Christian Ebert, Gerhard Jäger & Robert van Rooij (eds.), Language, Games, and Evolution (LNAI 6207), 60–91. Heidelberg: Springer. Gärdenfors, Peter. 1982. Imaging and Conditionalization. Journal of Philosophy 79(12). 747–760. Garnkel, Alan. 1981. Forms of Explanation. New Haven: Yale University Press. Gibbard, Allan. 2008. Rational Credence and the Value of Truth. In Tamar Szábo Gendler & John Hawthorne (eds.), Oxford Studies in Epistemology, vol. 2, 143–164. Oxford: Oxford University Press. Gilboa, Itzhak. 1987. Expected Utility with Purely Subjective Non-Additive Probabilities. Journal of Mathematical Economics 16(1). 65–88. Goldstein, Michael. 1984. Turning Probabilities Into Expectations.e Annals of Statistics 12(4). 1551–1557. Greaves, Hilary. 2013. Epistemic Decisioneory.Mind 122(488). 915–952. 78 Greaves, Hilary & David Wallace. 2006. Justifying Conditionalization: Conditionalization Maximizes Expected Epistemic Utility. Mind 115(459). 607– 632. Grünwald, Peter D. & A. Philip Dawid. 2004. Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory.e Annals of Statistics 32(4). 1367–1433. Hájek, Alan. n.d. Most Counterfactuals Are False. Unpublished ms., Australian National University. Hájek, Alan. 2003. What Conditional Probability Could Not Be. Synthese 137(3). 273–323. Hurwicz, Leonid. 1951.e generalized Bayes minimax principle: a criterion for decision making under uncertainty. Cowles Commission Discussion Paper 335. 1950. Jackson, Frank & Philip Pettit. 1988. Functionalism and Broad Content.Mind 96(387). 381–400. Jackson, Frank&Philip Pettit. 1990. ProgramExplanation: AGeneral Perspective. Analysis 50(2). 107–117. Jerey, Richard C. 1983.e Logic of Decision. Chicago: University of Chicago Press. Joyce, James M. 1998. A Nonpragmatic Vindication of Probabilism. 
Philosophy of Science 65(4). 575–603. Joyce, James M. 1999.e Foundations of Causal Decisioneory. New York: Cambridge Univ Press. Joyce, James M. 2009. Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief. In Franz Huber & Christoph Schmidt-Petri (eds.), Degrees of Belief, vol. 342 (Synthese Library), chap. 10, 263–297. Dordrecht: Springer Netherlands. Lange, Marc. 2005. Laws andeir Stability. Synthese 144(3). 415–432. Lange, Marc. 2009. Laws and Lawmakers. New York: Oxford University Press. Leitgeb,Hannes&Richard Pettigrew. 2010. AnObjective Justication of Bayesianism I: Measuring Inaccuracy. Philosophy of Science 77(2). 201–235. Lewis, David. 1973.Counterfactuals. Cambridge, Mass.: Harvard University Press. Lewis, David. 1976. Probabilities of Conditionals and Conditional Probabilities. Philosophical Review 85(3). 297–315. Lewis, David. 1979. Counterfactual Dependence and Time's Arrow. Noûs 13(4). 455–476. Lewis, David. 1981. Causal Decisioneory. Australasian Journal of Philosophy 59(1). 5–30. Lipton, Peter. 1990. Contrastive Explanation. Royal Institute of Philosophy Supplement 27(1). 247–266. Lipton, Peter. 2004. Inference to the Best Explanation. 2nd edn. London: Routledge. 79 Manski, Charles F. 1981. Learning and Decision Making When Subjective Probabilities Have Subjective Domains.e Annals of Statistics 9(1). 59–65. Moss, Sarah. 2011. Scoring Rules and Epistemic Compromise.Mind 120(480). 1053–1069. von Neumann, John. 1928. Zureorie der Gesellschasspiele.Mathematische Annalen 100(1). 295–320. Nolan, Daniel. 1997. Quantitative Parsimony.e British Journal for the Philosophy of Science 48(3). 329–343. Pettigrew, Richard. 2012. Accuracy, Chance, and the Principal Principle. Philosophical Review 121(2). 241–275. Pettigrew, Richard. 2013. A New Epistemic Utility Argument for the Principal Principle. Episteme 10(01). 19–35. Pettigrew, Richard. 2016.e Population Ethics of Belief: In Search of an Epistemiceory X. Noûs. Forthcoming. 
Popper, Karl. 1959.e Logic of Scientic Discovery. London: Hutchinson. Rényi, Alfréd. 1970. Foundations of Probability. San Francisco: Holden-Day. Romeijn, Jan-Willem. 2005.eory Change and Bayesian Statistical Inference. Philosophy of Science 72(5). 1174–1186. Satia, Jay K. & Roy E. Lave. 1973. Markovian Decision Processes with Uncertain Transition Probabilities. Operations Research 21(3). 728–740. Savage, Leonard J. 1972.e Foundations of Statistics. Second. New York: Dover. Schwarz, Wolfgang. 2016. Subjunctive Conditional Probability. Journal of Philosophical Logic. Skyrms, Brian. 1977. Resiliency, Propensities, and Causal Necessity. Journal of Philosophy 74(11). 704–713. Skyrms, Brian. 1980. Causal Necessity. New Haven: Yale University Press. Sober, Elliott. 1998. Black Box Inference: When Should Intervening Variables Be Postulated?e British Journal for the Philosophy of Science 49(3). 469–498. Stalnaker, Robert C. 1968. Aeory of Conditionals. Studies in Logicaleory 2. 98–112. Stalnaker, Robert C. 2002. Epistemic Consequentialism. Aristotelian Society Supplementary Volume 76(1). 153–168. Strevens, Michael. 2004.e Causal and Unication Approaches to Explanation Unied-Causally. Noûs 38(1). 154–176. Strevens, Michael. 2008.Depth: An Account of Scientic Explanation. Cambridge, Mass.: Harvard University Press. Swanson, Eric. 2006. Interactions With Context. PhD dissertation, Massachusetts Institute of Technology. Troaes, Matthias C.M. 2007. Decision making under uncertainty using imprecise probabilities. International Journal of Approximate Reasoning 45(1). 17–29. Weslake, Brad. 2010. Explanatory Depth. Philosophy of Science 77(2). 273–294. 80 White, Roger. 2005. Explanation as a Guide to Induction. Philosopher's Imprint 5(2). 1–29. Williams, J. Robert G. 2012. Counterfactual Triviality: A Lewis-Impossibility Proof for Counterfactuals. Philosophy and Phenomenological Research 85(3). 648–670. Williamson, Jon. 2003. Bayesianism and Language Change. 
Journal of Logic, Language, and Information 12. 53–97. Woodward, James. 2005.Makingings Happen: Aeory of Causal Explanation. Oxford: Oxford University Press. Woodward, James. 2010. Scientic Explanation. In Edward N. Zalta (ed.),e Stanford Encyclopedia of Philosophy, Spring 2010. Yablo, Stephen. 1992. Mental Causation. Philosophical Review 101(2). 245–280. Yalcin, Seth. 2007. Epistemic Modals.Mind 116(464). 983–1026. Yalcin, Seth. 2018. Belief as Question-Sensitive. Philosophy and Phenomenological Research 97(1). 23–47. Early View, 2016.