1 Introduction

Were the dinosaurs killed by an asteroid? I don’t know—and neither do you. How confident ought we to be that this proposition is true?

A plausible answer is that our confidence that the dinosaurs were killed by an asteroid ought to be equal to the probability of that proposition given our evidence. This raises two further questions: what is our evidence, and how is the probability of a proposition given some evidence determined? This paper is a first step in the (very large) project of answering the second of these questions.

The relevant sense of probability here is epistemic probability. The epistemic probability of A given B—notated \(\text{P}(\text{A}|\text{B})\)—is a relation between the propositions B and A. It is the degree to which B supports A, or makes A plausible. Entailment is a limiting case of this relationship; if B entails A, then \(\text{P}(\text{A}|\text{B})=1.\) It constrains rational degrees of belief, in that, if \(\text{P}(\mathrm{A}|\text{B})=n,\) then someone with B as their evidence ought to be confident in A to degree n.Footnote 1

Keynes (1921), Jeffreys (1939), Cox (1946), Carnap (1950), Williamson (2000: ch. 10), Swinburne (2001), Jaynes (2003), Hawthorne (2005), and Maher (2006) offer similar explications of probability.Footnote 2 There is a great deal more that could be said about the nature of epistemic probability. Most of the above authors claim that epistemic probability relations are necessary and knowable a priori. I am sympathetic to these claims, but the approach to the structure of epistemic probabilities I go on to defend could also be accepted by philosophers who conceive of probabilistic support relations in an externalist or subjectivist manner.Footnote 3

Epistemic probabilities conform to the laws of the probability calculus. However, these laws do not suffice to determine the values of epistemic probabilities. We can break down the project of explaining how these values are determined into two parts, which I will call the structural project and the substantive project. The structural project asks what probabilities’ values are determined by the values of other probabilities, and what probabilities’ values are not determined by the values of other probabilities. The substantive project then asks how the values of the latter probabilities are determined. (For example, one traditional answer would be that they are determined by the Principle of Indifference.) In this paper I undertake the structural project, leaving the substantive project for another time.

The premise of the structural project is that just as an object’s weight is determined by its mass and its gravitational acceleration, the values of some probabilities are determined by the values of other probabilities.Footnote 4 I will call probabilities the values of which are determined by the values of other probabilities non-basic. Basic probabilities, by contrast, are the elementary quantities out of which other probabilities are built; they are the ‘atoms’ of probability theory. Given values for basic probabilities, we can compute values for all non-basic probabilities.Footnote 5

So the structural project asks: what probabilities are basic? And the substantive project asks: how are the values of these basic probabilities determined? Although these questions are both metaphysical, they are interesting mainly because of their epistemological upshot. We want to be able to figure out how probable our evidence makes the hypothesis that the dinosaurs were killed by an asteroid. The structural and substantive projects aid us in this to the extent that they help us figure out the values of basic probabilities, and then compute the values of non-basic probabilities (like, I will argue, this one) as a function of those.

In the past, philosophers who have addressed the question of how we might figure out the values of epistemic probabilities have mainly focused on the substantive project, jumping straight to arguing for or against substantive methods like the Principle of Indifference. But what probabilities should we (for example) assign equal values to? We must answer the structural question before we can know how to apply the Principle of Indifference (or some other substantive method).

Some philosophers have suggested that the values of some epistemic probabilities can be directly perceived (e.g., Keynes 1921: ch. II.8). If this is so, it again raises the question: which ones? In Sects. 3.3 and 3.4, I argue that the values of basic probabilities are more epistemically accessible than the values of non-basic probabilities. This means that determining which probabilities are basic can help us more reliably figure out the values of probabilities we care about even in the absence of an answer to the substantive question.

I call my answer to the structural question Explanationism. Informally, Explanationism says that the basic probabilities are the probabilities of atomic propositions conditional on potential direct explanations of those propositions. In Sect. 2, I explain Explanationism in more depth, contrasting it with the Orthodox view about the structure of probabilities. In Sect. 3, I argue for Explanationism against Orthodoxy. In Sect. 4, I explore some philosophical implications of Explanationism. Section 5 concludes with some questions for further research.

2 Rival views on the structure of probabilities

Before us is an urn. We know that it was selected by coin flip from two urns, U1 and U2. U1 contains 1 black ball and 2 white balls, and U2 contains 2 black balls and 1 white ball. We propose to learn about the contents of the urn by sampling from it at random. Let B and W stand for the propositions that the ball we draw is black or white, respectively.

In this problem there are two variables: the contents of the urn, and what color ball we draw. For each value that a variable can take on (e.g., the color of the ball drawn taking on the value black), there is an associated proposition (e.g., the proposition that the ball drawn out of the urn is black). Hence, each variable has an associated partition, that is, a set of mutually exclusive and jointly exhaustive possibilities: {U1, U2}, {B, W}. (For ease of exposition, I will often informally speak of the members of these partitions as the values of their associated variables.)

In this problem we have four atomic propositions: U1, U2, B, and W.Footnote 6 We also have various complex disjunctions and conjunctions of these propositions which we can consider. Of particular interest are the following complex propositions:

  • U1&B

  • U1&W

  • U2&B

  • U2&W

These propositions are state-descriptions—conjunctions in which one member of each partition appears once. State-descriptions are maximally complete descriptions of the world of our problem: they answer all our questions and assign a value to each of our variables. In general, if we have n partitions with m members each, then we have \(m^{n}\) possible state-descriptions.Footnote 7
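As an illustration of this bookkeeping, here is a minimal computational sketch (merely illustrative, not part of the problem itself) that enumerates the state-descriptions of the urn example by taking the Cartesian product of the two partitions:

```python
# A sketch: enumerate the state-descriptions of the urn problem by combining
# one member from each partition. With n partitions of m members each, there
# are m**n such combinations.
from itertools import product

partitions = {
    "Urn":  ["U1", "U2"],
    "Draw": ["B", "W"],
}

state_descriptions = ["&".join(combo) for combo in product(*partitions.values())]
print(state_descriptions)       # ['U1&B', 'U1&W', 'U2&B', 'U2&W']
print(len(state_descriptions))  # 2 partitions of 2 members each: 2**2 = 4
```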

For any pair of propositions in our problem X and Y, we can consider \(\text{P}(\text{X}|\text{Y}).\) We can also consider “unconditional” probabilities like P(X), which is the probability of X conditional only on the background knowledge given in the statement of the problem. (For ease of exposition, I suppress this background in this and the next section, e.g., writing P(U1&B) rather than \(\text{P}(\text{U}_{1}\&\text{B}\,|\,\text{K}).\)) Our question, applied to this problem, is which of these probabilities are basic, and which are non-basic.

2.1 Non-starters

One answer is that all these probabilities are basic. The lack of attention to the structural question suggests that many philosophers tacitly assume this. A second answer is that all the unconditional probabilities are basic. On this view, P(U1) and P(B) are basic, but \(\text{P}(\text{U}_{1}|\text{B})\) and \(\text{P}(\text{B}|\text{U}_{1})\) are not. This view is suggested by Hedden’s (2015b: 470) claim that the “unique rational prior probability function … represents the a priori plausibility of each proposition,” and Williamson’s (2000: 211) remark that evidential probability “measures something like the intrinsic plausibility of hypotheses prior to investigation.” This second view is also implicit in the subjective Bayesian theories of Ramsey (1926) and Jeffrey (1983), which define unconditional degrees of belief first, and then define conditional degrees of belief in terms of these.Footnote 8

Accepting either of these views makes it difficult to give an account of how the values of basic probabilities are determined. Standard answers to the substantive question would lead to probabilistic incoherence if applied to all probabilities, or applied to all unconditional probabilities. For example, the Principle of Indifference tells us to assign equal probabilities to a set of possibilities when our information does not support one over another. But it is impossible to assign equal values to all probabilities, or all unconditional probabilities; doing so will always be probabilistically incoherent. (For example, in the problem at hand, suppose that P(U1) = P(U2) = P(U1&B) = P(U1&W). Since P(U1) = P(U1&B) + P(U1&W), it follows that P(U1) = 2P(U1), and so P(U1) = P(U2) = 0. But this is impossible, since {U1, U2} is a partition, and so P(U1) + P(U2) = 1.) So the Principle of Indifference can never directly determine the values of all (unconditional) probabilities; if it determines the values of these probabilities at all, it must determine some indirectly, by determining the values of others. Or consider a substantive view on which simpler propositions have higher probabilities than more complex propositions. Presumably U1∨B is a more complex proposition than U1. If this criterion of simplicity is applied unrestrictedly, it then implies that P(U1) > P(U1∨B), which is impossible.

I discuss further how answers to the structural question combine with substantive principles for determining the values of basic probabilities in Sect. 3.6.1. For now the important thing to note is that principles like the above were designed to be applied to partitions of propositions, like {U1, U2} and {B, W}. What went wrong in the above examples is that the different propositions being assigned probabilities are not mutually exclusive. I will now consider two structural views on which basic probabilities are assigned across partitions, in a way that makes it easier to combine these views with an answer to the substantive question.

2.2 Orthodoxy

The first of these views focuses on the partition of state-descriptions: in this case, {U1&B, U1&W, U2&B, U2&W}. On this view, the basic probabilities are the unconditional probabilities of state-descriptions: P(U1&B), P(U1&W), P(U2&B), and P(U2&W). This answer to the structural question takes its cue from orthodox mathematical treatments of probability, in which the probabilities of state-descriptions are assigned first, and other probabilities are determined as a function of these. Because of this, I call this view Orthodoxy.

In Kolmogorov’s (1933) axiomatization of probability, the set of different state-descriptions is the “sample space.” The sample space is one of the three basic notions in Kolmogorov’s axiomatization. The second notion is an “algebra” on this sample space, that is, a set of subsets of the sample space. We can understand this as a set of state-descriptions and disjunctions of state-descriptions. The third notion is a “probability function” from members of the algebra to the unit interval [0,1].

While Kolmogorov’s axioms for this probability function do not themselves require that any particular members of the algebra get assigned numbers first, the most standard way to construct a function that obeys these axioms is to begin by assigning probabilities to each member of the sample space (i.e., each state-description) such that these probabilities sum to 1.Footnote 9 (One can think of each state-description as taking up a certain proportion of the total space of possibilities, which has measure 1.) Kolmogorov’s axioms, together with the ratio definition of conditional probability, then determine \({\text{P}}({\text{X}}|{\text{Y}})\) for any pair of propositions in our algebra X and Y, because any such proposition is logically equivalent to a disjunction of state-descriptions.Footnote 10
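To make this order of determination concrete, here is a minimal sketch of the Orthodox construction. The probabilities stipulated for the four state-descriptions are merely illustrative (they happen to match the values derived for the urn example below); everything else is recovered from them by summation and the ratio definition:

```python
# A sketch of the Orthodox construction: stipulate a probability for each
# state-description, then recover any P(X|Y) by summing over state-descriptions
# and applying the ratio definition.
from fractions import Fraction as F

joint = {  # illustrative values; they must sum to 1
    ("U1", "B"): F(1, 6), ("U1", "W"): F(1, 3),
    ("U2", "B"): F(1, 3), ("U2", "W"): F(1, 6),
}

def prob(event):
    """Unconditional probability of an event (a set of state-descriptions)."""
    return sum(p for s, p in joint.items() if s in event)

def cond_prob(x_event, y_event):
    """Ratio definition: P(X|Y) = P(X&Y)/P(Y)."""
    return prob(x_event & y_event) / prob(y_event)

U1 = {("U1", "B"), ("U1", "W")}
B = {("U1", "B"), ("U2", "B")}
print(cond_prob(B, U1))  # (1/6)/(1/6 + 1/3) = 1/3
```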

Since orthodox probability theory assigns unconditional probabilities directly to each state-description, it treats the unconditional probabilities of state-descriptions as basic. Perhaps for this reason, most philosophers who have given precise quantitative (as opposed to merely qualitative) solutions to the substantive problem, including Carnap (1950), Solomonoff (1964), and Williamson (2010), have assumed Orthodoxy in their solutions.Footnote 11

2.3 Explanationism

A final answer to our question, and the one I will defend, is Explanationism.Footnote 12 According to Explanationism, basic probabilities are the probabilities of atomic propositions conditional on propositions directly explanatorily prior to them. Because {U1, U2} is directly prior to {B, W}, and nothing is prior to {U1, U2}, we have here six basic probabilities: P(U1), P(U2), \({\text{P}}({\text{B}}|{\text{U}_{1}}), {\text{P}}({\text{W}}|{\text{U}}_{1}), {\text{P}}({\text{B}}|{\text{U}}_{2}),\) and \({\text{P}}({\text{W}}|{\text{U}}_{2}).\)

[Fig. 1: A two-node DAG with an arrow from the Urn variable to the Draw variable]

Before offering a more formal statement of Explanationism, it will be helpful to go through this reasoning more slowly. According to Explanationism, the first step in determining the values of probabilities is to order the variables/partitions in our problem by their explanatory priority. In our current case, the Urn variable is explanatorily prior to the Draw variable—the contents of the urn influence what ball we draw out, but what we draw from the urn does not influence its (initial) composition. Figure 1 formalizes these priority relations. It has two nodes, representing our two variables, with an arrow from the Urn node to the Draw node because the former is prior to the latter.

After ordering our variables, we take the basic probabilities to be those given to values of a variable by values of the variable(s) immediately prior to it. A basic probability, then, is the probability of a “downstream” proposition conditional on immediately “upstream” propositions. In the current case there are six such probabilities:

  • \(P(U_{1})=1/2\)

  • \(P(U_{2})=1/2\)

  • \(P(B|U_{1})=1/3\)

  • \(P(W|U_{1})=2/3\)

  • \(P(B|U_{2})=2/3\)

  • \(P(W|U_{2})=1/3\)

The Urn node is a root node; that is, there are no nodes pointing into it. As such, the (basic) probabilities of U1 and U2 are represented as unconditional. Really, they are conditional on the suppressed background knowledge given in the statement of the problem.

These six basic probabilities let us calculate any other probabilities we might be interested in. For example, Bayes’ Theorem gives us:

$$P(U_{1} |B) = \frac{{P(U_{1} )P(B|U_{1} )}}{{P(U_{1} )P(B|U_{1} ) + P(U_{2} )P(B|U_{2} )}} = \frac{{\left( {\frac{1}{2}} \right)\left( {\frac{1}{3}} \right)}}{{\left( {\frac{1}{2}} \right)\left( {\frac{1}{3}} \right) + \left( {\frac{1}{2}} \right)\left( {\frac{2}{3}} \right)}} = \frac{1}{3}$$
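Written as a short computational sketch, the same calculation takes the six basic probabilities listed above as inputs and derives the non-basic \({\text{P}}({\text{U}}_{1}|{\text{B}})\) from them:

```python
# A sketch of the Explanationist calculation: from the six basic probabilities
# of the urn network, compute the non-basic P(U1|B) via Bayes' Theorem.
from fractions import Fraction as F

P_U = {"U1": F(1, 2), "U2": F(1, 2)}          # P(U1), P(U2)
P_B_given_U = {"U1": F(1, 3), "U2": F(2, 3)}  # P(B|U1), P(B|U2)

numerator = P_U["U1"] * P_B_given_U["U1"]
denominator = sum(P_U[u] * P_B_given_U[u] for u in P_U)
print(numerator / denominator)                # 1/3
```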

In this simple example, we had only two variables to order. Consider now two modifications of the above case. In the first modification, we make two draws with replacement from the urn. Then our diagram looks like Fig. 2. In the second modification, we make two draws from the urn but do not replace the ball after the first draw. Then our diagram looks like Fig. 3. Alternatively, we could represent the choice to replace or not replace the first draw as a separate variable, as in Fig. 4.

In these diagrams, we include an arrow from one variable to another if we think it possible that the value of the first variable somehow influences the value of the second. If we are sampling with replacement, the outcome of the first draw does not influence the outcome of the second. If we are sampling without replacement, it does; drawing black the first time lowers the probability that we draw it the second time. In Fig. 4, the lack of an arrow from the Draw 1 variable to the Replacement variable represents the assumption that the outcome of the first draw will not influence our choice of whether to put the ball back in the urn.

[Figs. 2–4: DAGs for two draws with replacement (Fig. 2), two draws without replacement (Fig. 3), and two draws with a separate Replacement variable (Fig. 4)]

Figures 1, 2, 3 and 4 are directed acyclic graphs (DAGs). A DAG is a directed graph with no cycles: it consists of a finite number of nodes, with arrows drawn from some nodes to other nodes such that the arrows never form a directed cycle. We can interpret a DAG as giving us the ordering of the variables in our algebra which allows us to determine which probabilities are basic. To do this we employ the language of ancestors and descendants. We say that X is a parent of Y iff there is an arrow from X to Y, and an ancestor of Y iff it is a parent, parent of a parent, etc. (that is, there is a directed path from X to Y). If X is a parent/ancestor of Y, Y is a child/descendant of X.

The variables represented by a DAG are said to obey the Markov condition just in case a variable’s parents screen it off from all non-descendants. For example, in Fig. 2 the Urn variable screens off Draw 1 from Draw 2—if we know what urn we are sampling from, learning the outcome of the second draw provides us no information about the outcome of the first draw, and vice versa. Formally:

  • Markov Condition

  • A DAG obeys the Markov condition iff for all atomic X, X is conditionally independent, given any assignment of values to its parent variables, of any (conjunction of) non-descendants of X.

The Markov condition is intuitively plausible when we think of a DAG as representing causal structure (and there are no relevant causal variables omitted from the DAG). If Y already tells us everything relevant to predicting X in advance, then we can only get more information about whether or not X is true by learning about its effects. For example, if we know that the only thing that directly causally influences one’s getting lung cancer is the amount of tar in one’s lungs, then it is plausible that the amount of tar in one’s lungs screens off getting cancer from one’s smoking habits—that is, \({\text{P}} ({\text{cancer}}\,|\, {\text{tar}}) = {\text{P}} ({\text{cancer}}\,|\, {\text{tar}}\& {\text{smoking}}).\)Footnote 13
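As a numerical check on the screening-off claim about Fig. 2 above, the following sketch builds the joint distribution from the network's basic probabilities and confirms that, given the urn, conditioning on the second draw leaves the probability of the first draw unchanged:

```python
# A sketch: in Fig. 2 (two draws with replacement), the Urn variable screens
# off Draw 1 from Draw 2, i.e. P(B1|U1 & B2) = P(B1|U1).
from fractions import Fraction as F

P_U = {"U1": F(1, 2), "U2": F(1, 2)}
P_black_given_U = {"U1": F(1, 3), "U2": F(2, 3)}

# Joint distribution built by chaining the basic probabilities of the network.
joint = {}
for u in P_U:
    for d1 in ("B", "W"):
        for d2 in ("B", "W"):
            p1 = P_black_given_U[u] if d1 == "B" else 1 - P_black_given_U[u]
            p2 = P_black_given_U[u] if d2 == "B" else 1 - P_black_given_U[u]
            joint[(u, d1, d2)] = P_U[u] * p1 * p2

def cond(pred_x, pred_y):
    """P(X|Y) computed from the joint distribution."""
    p_y = sum(p for s, p in joint.items() if pred_y(s))
    p_xy = sum(p for s, p in joint.items() if pred_x(s) and pred_y(s))
    return p_xy / p_y

print(cond(lambda s: s[1] == "B", lambda s: s[0] == "U1"))                  # 1/3
print(cond(lambda s: s[1] == "B", lambda s: s[0] == "U1" and s[2] == "B"))  # 1/3
```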

A directed network of partitions that obeys the Markov condition is called a Bayesian network. On the Explanationist answer to the structural problem, we start off by ordering the partitions we are interested in in a Bayesian network. The basic probabilities will be those given to an atomic proposition by assignments of values to all its parents. All other probabilities in the network can be determined as a function of those (Pearl 2000: 14–16). For example, in Fig. 4, \({\text{P}}({\text{B}}_{2}\,|\,{\text{R}}\&{\text{B}}_{1}\&{\text{U}}_{1})\) is basic, but \({\text{P}}({\text{B}}_{2}\,|\,{\text{B}}_{1}\&{\text{U}}_{1})\) is not, because the latter probability is not conditioned on all the parents of B2. Rather, \({\text{P}}({\text{B}}_{2}\,|\,{\text{B}}_{1}\&{\text{U}}_{1})\) must be calculated as a weighted average of \({\text{P}}({\text{B}}_{2}\,|\,{\text{R}}\&{\text{B}}_{1}\&{\text{U}}_{1})\) and \({\text{P}}({\text{B}}_{2}\,|\,\sim{\text{R}}\&{\text{B}}_{1}\&{\text{U}}_{1}),\) weighted by \({\text{P}}({\text{R}}\,|\,{\text{B}}_{1}\&{\text{U}}_{1})\) and \({\text{P}}(\sim{\text{R}}\,|\,{\text{B}}_{1}\&{\text{U}}_{1}).\)Footnote 14
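Here is a minimal sketch of that weighted average. The conditional probabilities of B2 follow from the composition of the urn under U1; the value of \({\text{P}}({\text{R}}\,|\,{\text{B}}_{1}\&{\text{U}}_{1})\) is not fixed by the problem, so the figure of 1/2 below is a purely illustrative assumption:

```python
# A sketch of the weighted-average calculation of the non-basic P(B2|B1&U1).
# P(R|B1&U1) = 1/2 is an assumption made only for illustration.
from fractions import Fraction as F

P_R_given_B1_U1 = F(1, 2)       # assumed for illustration
P_B2_given_R_B1_U1 = F(1, 3)    # ball replaced: urn again has 1 black, 2 white
P_B2_given_notR_B1_U1 = F(0)    # black ball removed: only white balls remain

P_B2_given_B1_U1 = (P_R_given_B1_U1 * P_B2_given_R_B1_U1
                    + (1 - P_R_given_B1_U1) * P_B2_given_notR_B1_U1)
print(P_B2_given_B1_U1)         # 1/6, given the assumed value of P(R|B1&U1)
```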

Here, then, is a first statement of Explanationism:

  • Explanationism 1.0

  • \({\text{P}}({\text{X}}|{\text{Y}})\) is basic iff X is atomic, and Y is a conjunction of values for all parents of X in a Bayesian network that includes all variables immediately explanatorily prior to X, and correctly relates all the variables it includes.

Later on, in Sect. 3.6, I will relativize this statement to higher-order hypotheses about Bayesian networks, to allow for uncertainty about the correct Bayesian network. For the moment, I set these complications aside, as there is plenty to unpack here already.

First, in saying that a Bayesian network correctly relates the variables it includes, I mean that it includes an arrow from V1 to V2 iff V1 is immediately explanatorily prior to V2.Footnote 15 By ‘immediately explanatorily prior’ (or ‘directly explanatorily prior’), I mean that V1 is explanatorily prior to V2, and there is no other variable that mediates the explanatory relation between them.
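Given a network that correctly relates its variables in this sense, the basicality condition in Explanationism 1.0 can be stated compactly. The following sketch encodes Fig. 4 as a mapping from each variable to its parents and checks whether a conditional probability is basic; it is an illustration of the definition, not a method for discovering explanatory priority:

```python
# A sketch of Explanationism 1.0's basicality condition, assuming the correct
# network (here, Fig. 4) is given as a mapping from each variable to its parents.
parents = {
    "Urn": set(),
    "Draw1": {"Urn"},
    "Replacement": set(),  # assumed to be a root node (no arrow into it)
    "Draw2": {"Urn", "Draw1", "Replacement"},
}

def is_basic(x_variable, conditioning_variables):
    """P(X|Y) is basic iff Y assigns values to exactly the parents of X's variable."""
    return set(conditioning_variables) == parents[x_variable]

print(is_basic("Draw2", {"Urn", "Draw1", "Replacement"}))  # True:  P(B2|R&B1&U1)
print(is_basic("Draw2", {"Urn", "Draw1"}))                 # False: P(B2|B1&U1)
print(is_basic("Urn", set()))                              # True:  P(U1)
```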

When is V1 explanatorily prior to V2? Causal priority, as in the above urn examples, is one kind of explanatory priority, and the most common kind to which Bayesian networks have been applied. Schaffer (2016) also uses Bayesian networks to formalize metaphysical grounding. Plausibly, causal and metaphysical priority are the only two kinds of direct explanatory priority, so that V1 is directly explanatorily prior to V2 iff it is either directly causally prior to V2 or directly metaphysically prior to V2. But if V1 is metaphysically prior to V2, which is causally prior to V3, then even if V1 is neither metaphysically nor causally prior to V3, it is still explanatorily prior to it: indirect relations of explanatory priority need not be solely metaphysical or solely causal, but can be combinations of both (cf. Lange 2018: 1345).

Whether causal and metaphysical priority are really the only two kinds of direct explanatory priority is disputable. Mathematical priority might be distinct from metaphysical grounding. Huemer (2009: 352–53) discusses temporal, part-whole, in-virtue-of, and supervenience priority. Henderson et al. (2010: 180) speak of more specific theories as being “constructed” out of more general theories, giving examples in which the probability of the specific theory conditional on the general theory is apparently treated as basic by scientists. I leave the question of whether these are really (distinct) kinds of explanatory priority, and whether there are other kinds, as an area for further research.

Although I am aware of no philosopher who has explicitly formulated Explanationism in the above manner, the view has several important predecessors. It sides with defenders of inference to the best explanation (e.g., Thagard 1978; Lipton 2004; Henderson 2014; Hedden 2015a: Sect. 4; Climenhaga 2017a) in holding that explanatory relations are central to uncertain inference. Mathematically, it is indebted especially to Pearl’s (1988, 2000) groundbreaking work on Bayesian networks.Footnote 16 For the most part, philosophers who have applied Bayesian networks to epistemology (e.g., Bovens and Hartmann 2003) do not discuss the foundational issues explored in this essay; the same goes for statisticians such as Gelman et al. (2014: ch. 5) who employ hierarchical Bayesian models (a special case of Bayesian networks)Footnote 17 in statistics. Explanationism can justify these applications of Bayesian networks, as well as (I argue in Sect. 3.4) many other ordinary applications of probability that do not appeal to graphical modeling. The philosophers who have come closest to endorsing Explanationism are Henderson et al. (2010), who defend hierarchical Bayesian modeling in the philosophy of science, and Huemer (2009) and Weisberg (2009: 140–41), who defend the application of the Principle of Indifference to explanatorily basic partitions. Both of these are special cases of Explanationism.Footnote 18

In the next section I give six arguments for Explanationism.Footnote 19 The first is that it fits more naturally with the characteristics of epistemic probability than does Orthodoxy. The second is that in some cases, conditional probabilities may be well-defined while associated state-description probabilities are not, making the latter unavailable as a ground for the former. The third and fourth are that the probabilities that Explanationism identifies as basic are precisely those which we find ourselves able to more easily judge the value of, both in urn-sampling thought experiments and in more realistic applications. The fifth is that Explanationism can be more easily extended to calculate probabilities conditioned on interventions rather than observations. The final, and most important, argument is that plausible substantive methods deliver incorrect results when combined with Orthodoxy, but not when combined with Explanationism.

3 Six arguments for Explanationism

3.1 Explanationism fits better with the nature of epistemic probability

Orthodoxy about a kind of probability may look initially appealing partly because it offers to reduce conditional probabilities to unconditional probabilities. Epistemic probability, though, is a relation between propositions: the degree to which one proposition makes another plausible. This means that all epistemic probabilities are conditional, because only conditional probabilities have two relata. The “unconditional” epistemic probability of a state-description is really the state-description’s probability conditional only on a priori truths (Hájek 2003: 315)—the degree to which a priori truths make that state-description plausible.

On the epistemic interpretation of probability, then, Orthodoxy becomes less motivated: it becomes unclear why we should think that the probabilities that Orthodoxy identifies as basic are basic. If these conditional probabilities can be basic, why must other conditional probabilities be defined in terms of them? What is special about the Orthodox basic probabilities?

By contrast, Explanationism can give a principled explanation of why, say, \({\text{P}}({\text{B}}|{\text{U}}_{1})\) is basic—it is basic because U1 directly gives a probability to B in virtue of the Urn variable being the sole variable influencing B’s truth. U1 (which says that the urn contains 1 black and 2 white balls) directly makes B plausible to degree 1/3 because of the role it plays in explaining the truth or falsity of B. This fits well with a conception of epistemic probability as measuring a quantity (namely, plausibility) that U1 confers on B.

3.2 Conditional probabilities of atomic propositions may be well-defined when associated unconditional state-description probabilities are not

It is controversial whether all probabilities are well-defined. Hájek (2003: 303–05, 309–10) suggests that there may not be well-defined physical or subjective probabilities for some of a person’s future free actions. Similarly, one might think that the unconditional epistemic probabilities of some future free actions are undefined. Consider again the urn example represented in Fig. 4, in which we include a variable for whether we sample with replacement. If the choice whether or not to replace is a free choice, it might be that P(R) is undefined.

It is obvious that \({\text{P}}({\text{B}}_{2}\,|\,{\text{R}}\&{\text{B}}_{1}\&{\text{U}}_{1})\) = 1/3—for U1 says that we are drawing from the urn with 1 black ball and 2 white balls, and R says that we replace our first draw, so that it does not impact the composition of the urn. However, Orthodoxy would have it that this value is determined by the equation

$$P(B_{2} |R\& B_{1} \& U_{1} ) = \frac{{P(B_{2} \& R\& B_{1} \& U_{1} )}}{{P(R\& B_{1} \& U_{1} )}} = \frac{{P(B_{2} \& R\& B_{1} \& U_{1} )}}{{P(B_{2} \& R\& B_{1} \& U_{1} ) + P(W_{2} \& R\& B_{1} \& U_{1} )}}$$

But if P(R) is undefined, then so presumably are these state-description probabilities. So according to Orthodoxy, \({\text{P}}({\text{B}}_{2}\,|\,{\text{B}}_{1}\&{\text{R}}\&{\text{U}}_{1})\) should be undefined too. By contrast, Explanationism identifies \({\text{P}}({\text{B}}_{2}\,|\,{\text{B}}_{1}\&{\text{R}}\&{\text{U}}_{1})\) as basic, and so can easily let it be well-defined.

It is not obvious that some epistemic probabilities are undefined. But it is also not obvious that all epistemic probabilities are well-defined. Orthodoxy would make obviously well-defined conditional probabilities undefined if the unconditional probabilities of some state-descriptions turn out to be undefined. By contrast, Explanationism can allow that these obviously well-defined conditional probabilities are well-defined, even if the unconditional probabilities of the associated state-descriptions turn out to be undefined. Inasmuch as we should leave open the possibility that some epistemic probabilities are undefined, we should prefer a structural theory that does not let potentially undefined probabilities extend their influence too widely.

3.3 The probabilities Explanationism identifies as basic can be more directly perceived than those Orthodoxy identifies as basic

In the above urn cases, you can immediately tell that \({\text{P}}({\text{B}}|{\text{U}}_{1})=1/3\) as soon as you understand what B and U1 say. On the Orthodox treatment of probability, however, \({\text{P}}({\text{B}}|{\text{U}}_{1})\) is not basic, but is instead defined as

$$P(B|U_{1} ) = \frac{{P(U_{1} \& B)}}{{P(U_{1} )}} = \frac{{P(U_{1} \& B)}}{{P(U_{1} \& B) + P(U_{1} \& W)}}$$

However, whereas you can immediately see that \({\text{P}}({\text{B}}|{\text{U}}_{1})=1/3,\) the unconditional probabilities of these two state-descriptions are not immediately obvious. Were you called upon to determine P(U1&B), the way to proceed would be to reduce it to P(U1)\({\text{P}}({\text{B}}|{\text{U}}_{1})=(1/2)(1/3)=1/6.\) But this way of determining its value appeals to \({\text{P}}({\text{B}}|{\text{U}}_{1})=1/3,\) and so cannot be the means by which we gain knowledge of that equality (cf. Pearl 1988: 31, 2000: 4).

It does not follow from the fact that we can more immediately see the value of \({\text{P}}({\text{B}}|{\text{U}}_{1})\) than P(U1&B) that the former is more metaphysically basic than the latter. In many contexts, less metaphysically basic properties are more epistemically accessible. For example, we can more easily determine the weight of an object than its mass, even though the weight depends on the mass. In the a priori case, many of us can readily tell that, if we have four cards with “Beer” or “not-Beer” on one side and “Over 21” or “Under 21” on the other, then in order to make sure that no card violates the rule “If you are drinking beer you are over 21,” we must turn over any card with Beer face up and any card with Under 21 face up. But we cannot as readily tell that, if we have four cards with “P” or “not-P” on one side and “Q” or “not-Q” on the other, then, in order to make sure that no card violates the rule “If P then Q,” we must turn over any card with P face up and any card with not-Q face up.

Nevertheless, the metaphysical basicality of \({\text{P}}({\text{B}}|{\text{U}}_{1})\) is the most plausible explanation of its epistemic directness in the present case. We are able to discover empirical properties without any knowledge of their metaphysical grounds because we can examine the way they affect the environment. For example, we can determine an object’s weight by placing it on a scale. But this is not how we determine the value of \({\text{P}}({\text{B}}|{\text{U}}_{1})\): we do not measure the effects of this value on some external stimulus. Similarly, we can sometimes more readily perceive less basic a priori facts because of our implicit knowledge of the more basic facts which make them true. But our knowledge that \({\text{P}}({\text{B}}|{\text{U}}_{1})=1/3\) does not appear to be based on any implicit grasp of the values of P(U1&B) and P(U1&W), in the way that our knowledge of how to react in the beer-rule example is based on implicit knowledge of how conditionals work.

Instead, in this case we appear to judge that \({\text{P}}({\text{B}}|{\text{U}}_{1})=1/3 \) simply because we understand what B says and we understand what U1 says. If you were to ask a layperson, unfamiliar with Kolmogorov’s axiomatization, why \({\text{P}}({\text{B}}|{\text{U}}_{1})=1/3, \) the most likely answer would appeal to the content of B and U1, and their explanatory relation: “Well, U1 says that 1 out of the 3 balls is black, and B says that we draw a black ball.” (And perhaps: “And we’ve got no reason to think we’re more likely to draw one ball than another.”) So in the present case, it is plausible that we perceive the value of \({\text{P}}({\text{B}}|{\text{U}}_{1}) \) either directly or in virtue of grasping some substantive rule like the Principle of Indifference.

3.4 Explanationism better models actual probabilistic reasoning

In the last sub-section I observed that the propositions Explanationism identifies as metaphysically basic in our urn example are exactly the ones that are most epistemically direct, and argued that their being metaphysically basic is a plausible explanation of their being epistemically direct. You might worry that the urn example is cherry-picked, and that in other examples we can more easily see the values of state-description probabilities. However, when we turn to real-life applications of Bayesian reasoning, we find that—despite orthodox mathematical probability theory’s favoring Orthodoxy—philosophers and scientists reason more in accord with Explanationism than Orthodoxy.

For example, consider Bayes’ Theorem,

$$P(H|E) = \frac{P(H)P(E|H)}{P(H)P(E|H) + P( \sim H)P(E| \sim H)}$$

Expositions of Bayes’ Theorem frequently advocate its use in cases where H is a “hypothesis” or “theory” and E is some “empirical data” “predicted” by H (see, e.g., Howson and Urbach 2006: 20–22; Joyce 2008: Sect. 1; Weisberg 2015: Sect. 1.2.2). These terms connote H’s being explanatorily prior to E, as in Fig. 5. If Fig. 5 is our entire network, then according to Explanationism, the basic probabilities in the network are exactly the ones in Bayes’ Theorem above.Footnote 20

[Fig. 5: A two-node DAG with an arrow from the hypothesis H to the evidence E]

When we turn to examples writers use to illustrate Bayes’ Theorem, they are invariably ones in which the hypothesis H is explanatorily prior to the evidence E. Salmon (1990: 178) illustrates Bayes’ Theorem with an example in which H is the hypothesis that a particular can opener was produced by a machine with a given propensity for producing defective can openers, and E is the (explanatorily downstream) proposition that this can opener is defective. All four examples (drawing balls from an urn, finding a spider in a batch of bananas, hearing a witness report the color of a taxi, and getting a positive result on a medical test) in the “Bayes’ Rule” chapter from Ian Hacking’s introductory textbook (Hacking 2001: ch. 7) likewise conform to this pattern.

[Fig. 6: A DAG with arrows from the hypothesis H and from the auxiliary assumptions Ai into the evidence E]

Again, consider the standard Bayesian treatment of Duhem’s problem that most scientific hypotheses only make definitive predictions when combined with auxiliary assumptions. If H is our hypothesis and E is our empirical data, as before, this amounts to the problem of determining \({\text{P}}({\text{E}}|{\text{H}}) \) and \({\text{P}}({\text{E}}|{\sim}{\text{H}}) \) when applying Bayes’ Theorem. The standard Bayesian resolution is to make explicit different possible auxiliary assumptions {A1, …, An} and incorporate them into Bayes’ Theorem as follows (Howson and Urbach 2006: 103–14):

$$P(H|E) = \frac{\sum\nolimits_{i} P(H\& A_{i})P(E|H\& A_{i})}{\sum\nolimits_{i} \left[ P(H\& A_{i})P(E|H\& A_{i}) + P(\sim H\& A_{i})P(E|\sim H\& A_{i}) \right]}$$

If H and the Ai are independent (relative to any implicit background knowledge), then P(H&Ai) = P(H)P(Ai), and we have:

$$P(H|E) = \frac{\sum\nolimits_{i} P(H)P(A_{i})P(E|H\& A_{i})}{\sum\nolimits_{i} \left[ P(H)P(A_{i})P(E|H\& A_{i}) + P(\sim H)P(A_{i})P(E|\sim H\& A_{i}) \right]}$$

In this context, the Ai are understood to be additional assumptions about, e.g., experimental set-up, the accuracy of our measurements, and any background theory relevant to making predictions about the outcome of our experiment. This suggests (if H and the Ai are independent) the network in Fig. 6. According to Explanationism, in this network, the terms on the right-hand side of the above equation are all basic.Footnote 21
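A computational sketch may make the structure of this resolution plain. The numerical values below are purely illustrative; the point is that the inputs are the terms appearing on the right-hand side of the equation above (with P(~H) obtained as 1 − P(H)):

```python
# A sketch of the Duhem calculation, assuming H and the auxiliaries Ai are
# independent. All numerical values are purely illustrative.
P_H = 0.5
P_A = {"A1": 0.8, "A2": 0.2}               # priors over the auxiliary partition
P_E_given_H_A = {"A1": 0.9, "A2": 0.1}     # P(E|H&Ai)
P_E_given_notH_A = {"A1": 0.3, "A2": 0.1}  # P(E|~H&Ai)

numerator = sum(P_H * P_A[a] * P_E_given_H_A[a] for a in P_A)
denominator = numerator + sum((1 - P_H) * P_A[a] * P_E_given_notH_A[a] for a in P_A)
print(numerator / denominator)             # P(H|E) given the assumed values
```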

The above examples furnish us with another argument for Explanationism: many applications of Bayesian reasoning break down probabilities into precisely those quantities which Explanationism says are basic (or closer to being basic, inasmuch as the above networks approximate the actual evidential situation). As Pearl (1988: 78) notes,

Human performance shows the opposite pattern of complexity [from Orthodoxy]: probabilistic judgments on a small number of propositions … are issued swiftly and reliably, while judging the likelihood of a conjunction of propositions entails much difficulty and hesitancy. This suggests that the elementary building blocks of human knowledge are not entries on a joint-distribution table. Rather, they are low-order marginal and conditional probabilities defined over small clusters of propositions.

Inasmuch as it is plausible that the more metaphysically basic probabilities will also be more epistemically direct, Explanationism explains the way people reason probabilistically in both philosophical and empirical contexts. By contrast, if Orthodoxy is true it is unclear why philosophers and scientists so often apply rules like Bayes’ Theorem to break down complex probabilities into precisely those probabilities which Explanationism identifies as basic.

3.5 Explanationism combines more easily with a probabilistic calculus for causal interventions

Another advantage of Explanationism presents itself when we consider adding the possibility of “direct causal interventions” to our problem. Explanationism, but not Orthodoxy, can easily tell us what probabilities to assign propositions conditional on such interventions.

In his influential 2000 book Causality, Pearl argues that we need to expand the syntax of the probability calculus to include probabilities of the form \({\text{P}}({\text{X}}\,|\,{\text{do}}(\text{Y})),\) where do(Y) says that we directly make Y true, rather than observe that Y is true. Pearl (2000: 110) observes,

By specifying a[n Orthodox] probability function P(s) on the possible states of the world, we automatically specify how probabilities should change with every conceivable observation e, since P(s) permits us to compute (by conditioning on e) the posterior probabilities \(P(E|e)\) for every pair of events E and e. However, specifying P(s) tells us nothing about how probabilities should change in response to an external action do(A).

Constructing a Bayesian network relating X and Y allows us to determine \({\text{P}}({\text{X}}\,|\,{\text{do}}(\text{Y}))\) by simply deleting any arrows going into Y, and calculating \({\text{P}}({\text{X}}|\text{Y})\) in our mutilated network. Consider again the case of sampling twice from our urn with replacement in Fig. 2. Because we are sampling with replacement, the outcome of the first draw does not influence the outcome of the second—hence there is no arrow between them. However, learning that the first draw was black gives us information about the contents of the urn, and so is evidence that the second draw will also be black. By breaking down the value of \({\text{P}}({\text{B}}_{2}|\text{B}_{1})\) into basic probabilities, we can see that B1 raises the probability of B2 by raising the probability of U2 from 1/2 to 2/3:

$$\begin{aligned} P(B_{2} |B_{1} ) = &\, P(U_{1} |B_{1} )P(B_{2} |U_{1} ) + P(U_{2} |B_{1} )P(B_{2} |U_{2} ) \\ = &\, \frac{{P(U_{1} )P(B_{1} |U_{1} )}}{{P(U_{1} )P(B_{1} |U_{1} ) + P(U_{2} )P(B_{1} |U_{2} )}}P(B_{2} |U_{1} ) \\ & + \frac{{P(U_{2} )P(B_{1} |U_{2} )}}{{P(U_{1} )P(B_{1} |U_{1} ) + P(U_{2} )P(B_{1} |U_{2} )}}P(B_{2} |U_{2} ) \\ =\, & \frac{{\left( {\frac{1}{2}} \right)\left( {\frac{1}{3}} \right)}}{{\left( {\frac{1}{2}} \right)\left( {\frac{1}{3}} \right) + \left( {\frac{1}{2}} \right)\left( {\frac{2}{3}} \right)}}\left( {\frac{1}{3}} \right) + \frac{{\left( {\frac{1}{2}} \right)\left( {\frac{2}{3}} \right)}}{{\left( {\frac{1}{2}} \right)\left( {\frac{1}{3}} \right) + \left( {\frac{1}{2}} \right)\left( {\frac{2}{3}} \right)}}\left( {\frac{2}{3}} \right) = \left( {\frac{1}{3}} \right)\left( {\frac{1}{3}} \right) + \left( {\frac{2}{3}} \right)\left( {\frac{2}{3}} \right) \\ = & \, \frac{1}{9} + \frac{4}{9} = \frac{5}{9} \\ \end{aligned}$$

The value of P(B2) can similarly be obtained by summing over U1 and U2 as above. In that calculation the weights P(U1) and P(U2) are both equal to 1/2, so P(B2) is a simple average of \({\text{P}}({\text{B}}_{2}|{\text{U}}_{1})=1/3\) and \({\text{P}}({\text{B}}_{2}|{\text{U}}_{2})=2/3,\) giving \({\text{P}}({\text{B}}_{2})=1/2<{\text{P}}({\text{B}}_{2}|{\text{B}}_{1})=5/9.\)
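The following sketch carries out both calculations from the basic probabilities of Fig. 2, confirming that observing a black first draw raises the probability of a black second draw from 1/2 to 5/9:

```python
# A sketch: compute P(B2|B1) and P(B2) from the basic probabilities of Fig. 2.
from fractions import Fraction as F

P_U = {"U1": F(1, 2), "U2": F(1, 2)}
P_black_given_U = {"U1": F(1, 3), "U2": F(2, 3)}

# Posterior over the urns after observing a black first draw.
P_B1 = sum(P_U[u] * P_black_given_U[u] for u in P_U)
P_U_given_B1 = {u: P_U[u] * P_black_given_U[u] / P_B1 for u in P_U}

P_B2_given_B1 = sum(P_U_given_B1[u] * P_black_given_U[u] for u in P_U)
P_B2 = sum(P_U[u] * P_black_given_U[u] for u in P_U)
print(P_B2_given_B1, P_B2)  # 5/9 1/2
```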

[Fig. 7: The network of Fig. 2 with the arrow from the Urn variable into Draw 1 deleted]

But now suppose that we directly “set” the value of the first draw to black, e.g., we hire someone to look inside the urn and intentionally pull out a black ball. If we then put the ball back in the urn, we learn nothing about the outcome of the second draw. Explanationism can deliver this result if we take \({\text{P}}({\text{B}}_{2}\,|\,{\text{do(B}}_{1}))\) in our original network to be equal to \({\text{P}}({\text{B}}_{2}|{\text{B}}_{1})\) in the mutilated network in Fig. 7. Here \({\text{P}}({\text{B}}_{2}|{\text{B}}_{1})\) = P(B2), because B1 neither raises the probability of B2 directly nor via some intermediary, as in the original network.
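A minimal sketch of the corresponding intervention calculation: once the arrow into Draw 1 is deleted, conditioning on B1 no longer updates the urn hypothesis, and \({\text{P}}({\text{B}}_{2}\,|\,{\text{do(B}}_{1}))\) comes out equal to the prior P(B2):

```python
# A sketch of the graph-surgery idea: in the mutilated network of Fig. 7,
# Draw 1 is disconnected from the Urn variable, so the "posterior" over the
# urns given do(B1) is just the prior, and P(B2|do(B1)) = P(B2) = 1/2.
from fractions import Fraction as F

P_U = {"U1": F(1, 2), "U2": F(1, 2)}
P_black_given_U = {"U1": F(1, 3), "U2": F(2, 3)}

P_U_given_do_B1 = dict(P_U)  # no updating: the arrow into Draw 1 has been deleted

P_B2_given_do_B1 = sum(P_U_given_do_B1[u] * P_black_given_U[u] for u in P_U)
print(P_B2_given_do_B1)      # 1/2, i.e. equal to P(B2)
```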

By contrast, the Orthodox probability distribution over our four state-descriptions imposes no constraints on \({\text{P}}({\text{B}}_{2}\,|\,{\text{do(B}}_{1})).\) The Orthodox probabilist could assign a new probability distribution over a new set of state-descriptions that includes actions like do(B1). But nothing in Orthodoxy requires that this distribution give probabilities like \({\text{P}}({\text{B}}_{2}\,\,|\,\,{\text{do(B}}_{1}))\) the intuitively correct values. If it does give the correct values, this is simply a brute fact about those probability distributions. Inasmuch as Explanationism requires intuitively correct equalities that Orthodoxy must stipulate ad hoc, this gives us reason to prefer Explanationism.

3.6 Substantive methods for determining the values of basic probabilities get the wrong result if applied to Orthodox basic probabilities

Recall that the task of determining the values of epistemic probabilities has two parts. We have been exploring the structural part, which asks which probabilities are basic and which are non-basic. The substantive part involves assigning values to the basic probabilities. Substantive methods will have different implications if applied to different (allegedly basic) probabilities. One of the most important reasons to settle the structural question is to guide the application of substantive methods in probabilistic reasoning. I will now argue that when we combine Orthodoxy and Explanationism with proposed substantive methods and they deliver different results, it is Orthodoxy that goes wrong. The proposed substantive methods I will consider are Maximum Entropy (a generalization of the Principle of Indifference) and assigning higher probabilities to simpler hypotheses.

I should stress that I am not committed to the correctness of these proposed substantive methods. My argument is conditional: if Maximum Entropy or simplicity is the correct substantive criterion for assigning values to basic probabilities, it gets the right result only when combined with Explanationism. I argue, moreover, that the basic problematic phenomenon I identify—the addition of explanatorily posterior variables altering the probabilities of explanatorily prior variables—will arise with any method that assigns probabilities directly to state-descriptions.

My argument in this sub-section will be most effective with objectivists who think there are privileged probability assignments determined by some substantive method or other. However, I would note two points. First, applying Maximum Entropy to Explanationist basic probabilities rather than state-descriptions allows us to avoid many of the paradoxes the Principle of Indifference is often held to lead to (see Huemer 2009). As such, some objections to objectivism may be undermined by my argument in this sub-section. Second, many subjectivists about probability think of the impact of evidence as something individuals are free to determine based on how they weigh conflicting substantive criteria—such as symmetry and simplicity—against each other. So subjectivists who use these criteria to determine their own personal probabilities might still be moved by my arguments in this sub-section, provided that they share my intuitions about which applications of these criteria seem unsatisfying.

3.6.1 Example 1: Maximum Entropy

The Principle of Indifference says that we should assign equal probability to a space of alternatives if our knowledge does not favor any of these alternatives over any other. The Maximum Entropy principle (MaxEnt) generalizes this by telling us to assign probabilities that are as close to equal as is consistent with our knowledge (Williamson 2005: 80, 2010: 28–29).Footnote 22

Orthodox probabilists like Williamson would have us apply MaxEnt to the set of all possible state-descriptions. On Williamson’s version of objective Bayesianism, “the probabilities of the atomic states [i.e., state-descriptions] are basic: all other probabilities can be defined in terms of them” (2010: 27). According to Williamson, when one has no information favoring one state-description over another, one should assign equal probabilities to all of them. If one does have information favoring one state-description over another, one should assign probabilities as close to equal as is consistent with one’s information.

I will now argue that applying MaxEnt to state-descriptions in this way leads to absurd results. Suppose I tell you that I have an urn in front of me that contains 1 black ball and 1 white ball. If I sample from the urn only once and we apply the Principle of Indifference to the partition {B1, W1}, we get the result that P(B1) = P(W1) = 1/2.

But now suppose that I tell you that I am going to sample from the urn twice, and that the outcome of the first draw will influence the outcome of the second one. In particular, if I draw the black ball the first time, I will set it aside, and so be ensured to draw the white ball the second time. If I draw the white ball the first time, I will set it aside, but also add a green ball to the urn.

Now we have two partitions: {B1, W1}, {B2, W2, G2}. This gives us six state-descriptions: {B1&B2, B1&W2, B1&G2, W1&B2, W1&W2, W1&G2}. Your background knowledge that B1 ↔ W2 and W1 ↔ B2∨G2 allows you to eliminate the first, third, and fifth outcomes, leaving you with {B1&W2, W1&B2, W1&G2}. If you apply the Principle of Indifference to those state-descriptions not excluded by your knowledge, they each get 1/3 probability. This implies that, before either draw has been made, P(B1) = 1/3 and P(W1) = 2/3. So without giving you any new knowledge about how I make the first draw and without telling you about any actual (as opposed to merely possible) effects of that draw, I have made it more initially likely for you that the first draw is white.

This is the intuitively wrong result. The outcome of the first draw is determined prior to the outcome of the second. B1 and W1 should both be assigned unconditional probability 1/2, and B2 and G2 should each be assigned equal probability conditional on B1. This gives probabilities of 1/2, 1/4, and 1/4 to our state-descriptions.
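The contrast can be put in a few lines. Under the Orthodox application of MaxEnt, the three surviving state-descriptions each receive probability 1/3, pushing P(B1) down to 1/3; under the assignment just described, P(B1) stays at 1/2:

```python
# A sketch comparing the two assignments over the surviving state-descriptions.
from fractions import Fraction as F

orthodox = {("B1", "W2"): F(1, 3), ("W1", "B2"): F(1, 3), ("W1", "G2"): F(1, 3)}
intuitive = {("B1", "W2"): F(1, 2), ("W1", "B2"): F(1, 4), ("W1", "G2"): F(1, 4)}

def p_first_draw_black(dist):
    """Marginal probability that the first draw is black."""
    return sum(p for (d1, _), p in dist.items() if d1 == "B1")

print(p_first_draw_black(orthodox))   # 1/3
print(p_first_draw_black(intuitive))  # 1/2
```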

Explanationism delivers the intuitively correct result in this case. First, we order our variables: Draw 1 is prior to Draw 2. Then we have the following basic probabilities:

  • \(P(B_{1})= P(W_{1})=1/2,\)

  • \(P(B_{2}|B_{1})= P(G_{2}|B_{1})=0,\)

  • \(P(W_{2}|B_{1})=1,\)

  • \(P(B_{2}|W_{1})=P(G_{2}|W_{1})=1/2,\)

  • \(P(W_{2}|W_{1})=0.\)

The first and fourth lines follow from the application of MaxEnt to {B1, W1} and to {B2, W2, G2} conditional on W1. It follows that

  • \(P(B_{1}\&W_{2})=P(B_{1})P(W_{2}|B_{1})=(1/2)(1)=1/2,\)

  • \(P(W_{1}\&B_{2})=P(W_{1})P(B_{2}|W_{1})=(1/2)(1/2)=1/4,\)

  • \(P(W_{1}\&G_{2})=P(W_{1})P(G_{2}|W_{1})=(1/2)(1/2)=1/4.\)

Williamson (2005: 95–106; cf. 2010: 46–47) recognizes the above problem with applying MaxEnt to state-descriptions when we have causal information, discussing a similar example raised by Pearl (1988). His solution is to introduce causal constraints in addition to the quantitative constraints \({\text{P}}(\text{W}_{2}|{\text{B}}_{1})=1\) and \( {\text{P}}({\rm{B_{2}}}\!\vee\!{\rm{G_{2}}}\,|\, {\rm{W_{1}}}) = 1 \) imposed by the above information. These causal constraints say that if our knowledge tells us that variables {V1, … Vi} are causally ordered from 1 to i, then we begin by maximizing entropy over the propositions in V1, giving us a probability distribution P1. Next, we select the highest entropy probability distribution over V1×V2 (i.e., the Cartesian product of V1 and V2) which is consistent with P1, giving us P2. Then, we select the highest entropy probability distribution over V1×V2×V3 which is consistent with P2, and so on.

In the above case, we begin by maximizing entropy over {B1, W1}, assigning probability 1/2 to each possibility, and then choose the probability distribution which maximizes entropy over {B1&W2, W1&B2, W1&G2} among those distributions consistent with P(B1) = P(W1) = 1/2. This gives us the same result as the Explanationist method.

There are two problems with using the above method to save Orthodoxy. First, it appears to be Orthodox in name only. From the perspective of Orthodoxy, Williamson’s causal constraint looks ad hoc and unmotivated, wheeled in only to stave off counterexamples. If the probabilities of state-descriptions really are basic, then why does our background knowledge require us to conform them to probabilities first assigned to what are, from the perspective of Orthodoxy, disjunctions of state-descriptions (e.g., [W1&B2]∨[W1&G2])?

Indeed, this constraint gets the right results only because it parrots the Explanationist approach. For example, if P(W1) = 1/2, then

  • \(P({W_1}\&{B_2})=P(W_1)P(B_{2}|W_1)=(1/2)P(B_{2}|W_{1})\)

and

  • \(P({W_1}\&{G_2})=P(W_1)P(G_{2}|W_{1})=(1/2)P({G}_{2}|W_{1}).\)

We obtain the most equal distribution over {B1&W2, W1&B2, W1&G2} by setting \(P(B_{2}|W_{1})=P(G_{2}|W_{1})=1/2, \) which gives us the {1/2, 1/4, 1/4} distribution over this partition. At each step we maximize entropy over the new set of variables consistent with the causal constraints precisely by maximizing entropy over the probabilities that Explanationism says are basic.

[Figs. 8–9: The two candidate networks, N1 (Draw 1 explanatorily prior to Draw 2) and N2 (Draw 2 explanatorily prior to Draw 1)]

Second, and more seriously, Williamson’s method only applies to the special case in which we know which variables causally influence which other variables. But consider a modification to the above case. As before, I tell you that an urn will be sampled from twice, and that in the one draw the possible outcomes are {B1, W1} and in the other they are {B2, W2, G2}. And as before, I tell you that B1 ↔ W2 and W1 ↔ B2∨G2. However, now I do not tell you which draw takes place first.

We can continue denoting the draw in which the only possibilities are black and white as Draw 1 and the other as Draw 2, but now these should be understood simply as labels, and not as denoting temporal information. So for all you know, the situation could be as represented in Fig. 8, or it could be as represented in Fig. 9. In this latter scenario, if I draw white in Draw 2 (which is now the first draw), I set the white and green balls aside, ensuring that I draw black in Draw 1; and if I draw black or green in Draw 2, I set the black and green balls aside, ensuring that I draw white in Draw 1.

Williamson’s method only applies when we know what the causal constraints are (2005: 99). As such, it will not preclude the ordinary application of MaxEnt to the state-descriptions {B1&W2, W1&B2, W1&G2}. So we will again assign 1/3 probability to each of these, since that makes our distribution maximally equivocal. But inasmuch as you have no reason to think that either draw comes first, the Principle of Indifference should advise you to assign equal probability to both these possibilities, and then determine how likely each of these possibilities would make each of these state-descriptions. Letting N1 stand for the hypothesis that the network in Fig. 8 is correct (i.e., Draw 1 comes first), and N2 stand for the hypothesis that the network in Fig. 9 is correct (i.e., Draw 2 comes first), this gives us

$$\begin{aligned} P(B_{1} \& W_{2} ) & = P(N_{1} )P(B_{1} \& W_{2} |N_{1} ) + P(N_{2} )P(W_{2} \& B_{1} |N_{2} ) \\ & = P(N_{1} )P(B_{1} |N_{1} )P(W_{2} |B_{1} \& N_{1} ) + P(N_{2} )P(W_{2} |N_{2} )P(B_{1} |W_{2} \& N_{2} ) \\ & = \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)(1) + \left(\tfrac{1}{2}\right)\left(\tfrac{1}{3}\right)(1) = \tfrac{1}{4} + \tfrac{1}{6} = \tfrac{3}{12} + \tfrac{2}{12} = \tfrac{5}{12} \end{aligned}$$
$$\begin{aligned} P(W_{1} \& B_{2} ) & = P(N_{1} )P(W_{1} \& B_{2} |N_{1} ) + P(N_{2} )P(B_{2} \& W_{1} |N_{2} ) \\ & = P(N_{1} )P(W_{1} |N_{1} )P(B_{2} |W_{1} \& N_{1} ) + P(N_{2} )P(B_{2} |N_{2} )P(W_{1} |B_{2} \& N_{2} ) \\ & = \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) + \left(\tfrac{1}{2}\right)\left(\tfrac{1}{3}\right)(1) = \tfrac{1}{8} + \tfrac{1}{6} = \tfrac{3}{24} + \tfrac{4}{24} = \tfrac{7}{24} \end{aligned}$$
$$\begin{aligned} P(W_{1} \& G_{2} ) & = P(N_{1} )P(W_{1} \& G_{2} |N_{1} ) + P(N_{2} )P(G_{2} \& W_{1} |N_{2} ) \\ & = P(N_{1} )P(W_{1} |N_{1} )P(G_{2} |W_{1} \& N_{1} ) + P(N_{2} )P(G_{2} |N_{2} )P(W_{1} |G_{2} \& N_{2} ) \\ & = \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) + \left(\tfrac{1}{2}\right)\left(\tfrac{1}{3}\right)(1) = \tfrac{1}{8} + \tfrac{1}{6} = \tfrac{3}{24} + \tfrac{4}{24} = \tfrac{7}{24} \end{aligned}$$
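The following sketch reproduces these figures by averaging over the two candidate networks. Each network fixes a distribution over the surviving state-descriptions: the {1/2, 1/4, 1/4} distribution under N1, and the uniform distribution under N2 (since under N2 the three-valued draw comes first and the other draw is then determined):

```python
# A sketch of the higher-order averaging: each candidate network fixes a
# distribution over the surviving state-descriptions, and the overall values
# are the average weighted by P(N1) = P(N2) = 1/2.
from fractions import Fraction as F

dist_given_N = {
    "N1": {("B1", "W2"): F(1, 2), ("W1", "B2"): F(1, 4), ("W1", "G2"): F(1, 4)},
    "N2": {("B1", "W2"): F(1, 3), ("W1", "B2"): F(1, 3), ("W1", "G2"): F(1, 3)},
}
P_N = {"N1": F(1, 2), "N2": F(1, 2)}

for sd in dist_given_N["N1"]:
    p = sum(P_N[n] * dist_given_N[n][sd] for n in P_N)
    print(sd, p)  # (B1,W2): 5/12, (W1,B2): 7/24, (W1,G2): 7/24
```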

My initial statement of Explanationism in Sect. 2.3, Explanationism 1.0, assumed that our background information entailed a unique explanatory network. We can amend Explanationism to recommend the above calculation by representing uncertainty about {N1, N2} as higher-order uncertainty about what network is correct, and then taking basic probabilities to be relative to the network endorsed by Ni (cf. Huemer 2009: 363–65, Weisberg 2009: 141)—e.g., \({\text{P}}({\text{B}}_{1}|{\text{N}}_{1})\) and \({\text{P}}({\text{W}}_{2}\,|\,{\text{B}}_{1}\&{\text{N}}_{1})\) in the first equation above.

More formally:

  • Explanationism 2.0

  • \({\text{P}}({\text{X}}\,|\,{\text{Y}}\&{\text{N}}_{i})\) is basic iff X is atomic, and Y is a conjunction of values for all parents of X in a Bayesian network that, according to Ni, includes all variables immediately explanatorily prior to X, and correctly relates all the variables it includes.Footnote 23

Given values for the basic probabilities identified by Explanationism 2.0, we can determine \({\text{P}}({\text{W}}\,|\,{\text{Z}}\&{\text{N}}_{i})\) for any W and Z in Ni. From these we can then obtain \({\text{P}}({\text{W}}|{\text{Z}})\) by averaging over \({\text{P}}({\text{W}}\,|\,{\text{Z}}\&{\text{N}}_{j})\) for all possible networks Nj, weighted by the network-probabilities \({\text{P}}({\text{N}}_{j}|{\text{Z}}),\) as above. These latter probabilities are a function of the prior probabilities of the networks, P(Nj), and the degree to which these networks predict Z, \({\text{P}}({\text{Z}}|{\text{N}}_{j}).\) Explanationism 2.0 can hold that the prior probability of a network P(Nj) is basic by holding that this probability is implicitly relative to a higher-order network which contains a partition of the different possible first-order networks Nj. In the example above, this higher-order network would contain a single node, with the partition {N1, N2}.Footnote 24
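To make the recipe concrete, here is a minimal computational sketch (my own illustration, using only the basic probabilities given in the example above; the variable names are mine) that recovers the three values 5/12, 7/24, and 7/24 by summing over the two candidate networks:

# Minimal sketch: recover the urn-example probabilities by averaging over
# the two candidate networks N1 and N2. All numerical inputs are the basic
# probabilities stated in the text above.
from fractions import Fraction as F

P_N = {"N1": F(1, 2), "N2": F(1, 2)}           # prior probabilities of the networks

def p_given_network(state, net):
    """P(state | N_i), factored along each network's own explanatory order."""
    if net == "N1":                            # N1: Draw 1 first, then Draw 2
        table = {"B1&W2": F(1, 2) * 1,         # P(B1|N1) * P(W2|B1&N1)
                 "W1&B2": F(1, 2) * F(1, 2),   # P(W1|N1) * P(B2|W1&N1)
                 "W1&G2": F(1, 2) * F(1, 2)}   # P(W1|N1) * P(G2|W1&N1)
    else:                                      # N2: Draw 2 first, then Draw 1
        table = {"B1&W2": F(1, 3) * 1,         # P(W2|N2) * P(B1|W2&N2)
                 "W1&B2": F(1, 3) * 1,         # P(B2|N2) * P(W1|B2&N2)
                 "W1&G2": F(1, 3) * 1}         # P(G2|N2) * P(W1|G2&N2)
    return table[state]

for state in ["B1&W2", "W1&B2", "W1&G2"]:
    prob = sum(P_N[n] * p_given_network(state, n) for n in P_N)
    print(state, prob)                         # 5/12, 7/24, 7/24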

Although Williamson, like me, advocates the use of Bayesian networks in calculating probabilities, he cannot accommodate this higher-order uncertainty about networks within his framework. On my view Bayesian networks are logically prior to probability assignments, and basic probabilities are determined by means of them. But for Williamson, Bayesian networks play the purely pragmatic role of simplifying computations (2010: ch. 6), except in the special case in which a Bayesian network is uniquely determined by our causal information. If we do not know which of the Draw variables comes first, Williamson’s method for constructing Bayesian networks (2005: 84–95) would lead to the network represented in Fig. 9, simply because Draw 2 has more possible values than Draw 1, and so placing it prior to Draw 1 on the network and then applying MaxEnt successively, in the way that Explanationism recommends, gives the same result as directly maximizing entropy over the state-descriptions {B1&W2, W1&B2, W1&G2}. So except in the special case in which causal knowledge forces us to adopt a particular network, what Bayesian network to employ is determined by what will maximize entropy over the state-descriptions. By contrast, according to Explanationism what Bayesian network or networks to employ is determined by explanatory relations that are prior to the application of MaxEnt or any other substantive method for determining the values of basic probabilities.
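A quick numerical check (my own, assuming scipy is available and assuming no constraint beyond normalization) confirms that maximizing entropy directly over the three state-descriptions yields the uniform distribution, in contrast to the Explanationist values 5/12, 7/24, 7/24 computed above:

# Sketch: MaxEnt over {B1&W2, W1&B2, W1&G2} with only the normalization
# constraint. The entropy maximizer is the uniform distribution (1/3 each).
import numpy as np
from scipy.optimize import minimize

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)                 # guard against log(0)
    return float(np.sum(p * np.log(p)))

result = minimize(neg_entropy,
                  x0=[0.2, 0.3, 0.5],
                  bounds=[(0.0, 1.0)] * 3,
                  constraints=[{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}])

print(np.round(result.x, 3))                   # approx. [0.333 0.333 0.333]
print([5/12, 7/24, 7/24])                      # the Explanationist values above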

3.6.2 Example 2: Simplicity

I have considered the application of MaxEnt to determining the values of basic probabilities, and argued that it gives us the wrong result if the basic probabilities are those posited by Orthodoxy, whereas it gives us the right result if the basic probabilities are those posited by Explanationism. The same phenomenon occurs if we employ other proposed criteria of basic probability, such as simplicity. For illustrative purposes, let us follow Hesse (1974: 234–36) and Swinburne (2001: 87) in taking one facet of simplicity to be quantitative parsimony, so that a theory is simpler to the extent that it posits fewer entities.Footnote 25

Suppose that we know that either 1 male bird, or 1 male and 1 female bird (of the same species), flew to an island off the coast of the Americas 2 generations ago. We further know that each male–female pair of birds has 5 male and 5 female offspring per generation. Then the total number of birds (in all generations) under the second hypothesis is 2 + 10 + 50 = 62: the original pair, their 10 children, and the 50 grandchildren produced by the resulting 5 pairs. Since, on the first hypothesis, the bird has no mate with which to reproduce, the total number of birds given the first hypothesis is 1.
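The arithmetic generalizes straightforwardly; here is a small sketch (mine, using only the reproduction rate stated above) that computes the totals for any number of generations:

# Sketch: total number of birds (all generations) under each hypothesis,
# assuming each male-female pair has 10 offspring (5 male, 5 female) per
# generation and a lone bird cannot reproduce.
def total_birds(founders, generations=2, offspring_per_pair=10):
    total = current = founders
    for _ in range(generations):
        pairs = current // 2                   # offspring split evenly by sex
        current = pairs * offspring_per_pair
        total += current
    return total

print(total_birds(1))   # 1-bird hypothesis: 1
print(total_birds(2))   # 2-bird hypothesis: 2 + 10 + 50 = 62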

If we read quantitative parsimony as attaching to the total number of entities to which we are committed in our overall worldview, then the 2-bird hypothesis is much less simple than the 1-bird hypothesis: it posits 62 times as many birds! Intuitively, however, the 2-bird hypothesis is only slightly less simple than the 1-bird hypothesis, inasmuch as it posits only one more (comparatively) fundamental entity, namely one more bird. And as with the adding-a-green-ball-to-the-urn example above, comparing the simplicity of state-descriptions would lead to the implausible conclusion that learning that one more generation has gone by should lower the relative probability of the 2-bird hypothesis. It seems, then, that if we want to give preference to simpler hypotheses, we should compare the simplicities of atomic hypotheses on the same level of explanation, and not the simplicities of overall worldviews.

In this sub-section I have considered the application of substantive methods for determining the values of basic probabilities, and argued that they give us the wrong result if we adopt Orthodoxy, and the right result if we adopt Explanationism. The defender of Orthodoxy might object that the methods I have considered are not the correct ones, or have been misapplied. But the same general phenomenon of the addition of explanatorily posterior variables wrongly affecting the probability of explanatorily prior variables will take place with any method that assigns probabilities directly to state-descriptions, unless a safeguard is built into the method to avoid this, as in Williamson’s version of MaxEnt. And such a safeguard will likely, as above, either fail to avoid all counterintuitive consequences, reveal an implicit commitment to the order of explanation as prior to the assignment of probabilities, or both.

4 Why Explanationism matters

In Sect. 3 I gave six arguments for Explanationism over Orthodoxy. First, it is philosophically better motivated than Orthodoxy as a theory of basic epistemic probabilities. Second, it allows for conditional probabilities to be well-defined even when the state-description probabilities to which Orthodoxy would reduce them may not be well-defined. Third, we are more easily able to judge the values of the probabilities Explanationism identifies as basic than those Orthodoxy identifies as basic. Fourth, it better describes actual (good) scientific and empirical reasoning. Fifth, it can more easily be combined with Pearl’s (2000) probabilistic do-calculus. Finally, it leads to more intuitive probability assignments when combined with substantive methods like the Principle of Indifference.

In light of my fourth argument, that applications of Bayesian reasoning tend to conform better to Explanationism than Orthodoxy, you may wonder what Explanationism can really teach us. Even if philosophers don’t explicitly endorse the view, don’t they already tacitly assume it in their reasoning? Unfortunately, while many applications of probability conform to Explanationism, the lack of explicit attention to the structure of probabilities leads to both incorrect expositions of basic concepts and bad reasoning about more complicated examples. This is especially so when it comes to the use of Bayes’ Theorem in calculating probabilities.

$$P(H|E\& K) = \frac{P(H|K)P(E|H\& K)}{P(H|K)P(E|H\& K) + P( \sim H|K)P(E| \sim H\& K)}$$

Philosophers who employ Bayes’ Theorem frequently speak as if the “empirical data” E that enters into it is always “new evidence we have just acquired” (Salmon 1990: 177). Others describe Bayes’ Theorem “as a normative rule for updating beliefs in response to evidence” (Pearl 1988: 32–33, emphasis mine). However, our having just learned a proposition E is neither necessary nor sufficient for Bayes’ Theorem to break \({\text{P}}({\text{H}}\,|\,{\text{E}}\&{\text{K}})\) into more basic quantities. All that is necessary is that the evidence E is explanatorily downstream from the hypothesis H.

The terms in Bayes’ Theorem are often divided into the “priors,” \({\text{P}}({\text{H}}|{\text{K}})\) and \({\text{P}}({\sim}{\text{H}}|{\text{K}}),\) the “likelihoods,” \({\text{P}}({\text{E}}\,|\,{\text{H}}\&{\text{K}})\) and \({\text{P}}({\text{E}}\,|\,{\sim}{\text{H}}\&{\text{K}}),\) and the “posterior,” \({\text{P}}({\text{H}}\,|\,{\text{E}}\&{\text{K}}).\) Many philosophers of probability attach undue metaphysical weight to these divisions, holding that there is a special problem with determining the values of prior probabilities.Footnote 27 Other philosophers have pointed out that the assumption that only prior probabilities are difficult to determine is dubious: for example, Earman (1992: 84) writes that “while much of the attention on the Bayesian version of the [Duhem] problem has focused on the assignments of prior probabilities, the assignments of likelihoods involves equally daunting difficulties.” But the assumption that likelihoods are objective while priors are not is not merely dubious; it cannot be true. There can be no intrinsic difference between prior probabilities and likelihoods, because these terms describe not the intrinsic nature of different probabilities, but their functional role in a particular application of Bayes’ Theorem. In different instances of Bayes’ Theorem, one and the same probability can be both a prior probability and a likelihood.

For example, consider the proposition C: a coin will be flipped to choose between urns U1 and U2. \({\text{P}}({\mathrm{U}}_{1}|{\text{C}})\) will be a “likelihood” if we are calculating the posterior probability of C, \({\text{P}}({\text{C}}|{\mathrm{U}}_{1}),\) and it will be a “prior probability” if we know C and are calculating the posterior probability of U1 given that we draw black, \({\text{P}}({\text{U}}_{1}\,|\,{\text{C}}\&{\text{B}}).\) Either way, \({\text{P}}({\text{U}}_{1}|{\text{C}})\) is a basic probability, and we can see that its value is 1/2. What matters for determining the values of probabilities is not whether they are likelihoods or priors, but whether they are basic or non-basic, and if they are non-basic, what basic probabilities they can be reduced to.
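Writing the two applications out side by side makes the point explicit (this is just my own spelling-out of the two calculations; the values of the remaining terms depend on details of the urn setup left unspecified here):

$$P(C|U_{1}) = \frac{P(C)\,P(U_{1}|C)}{P(C)\,P(U_{1}|C) + P({\sim}C)\,P(U_{1}|{\sim}C)} \qquad P(U_{1}|C \& B) = \frac{P(U_{1}|C)\,P(B|U_{1} \& C)}{P(U_{1}|C)\,P(B|U_{1} \& C) + P(U_{2}|C)\,P(B|U_{2} \& C)}$$

In the first equation \({\text{P}}({\text{U}}_{1}|{\text{C}})\) occupies the likelihood slot; in the second it occupies the prior slot; its value (1/2) and its status as basic are the same in both.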

The assumption that there is a special problem with the objectivity of prior probabilities has led most philosophers who discuss the problem of determining the values of probabilities to misconstrue it as the “problem of the priors.” In turn, most existing solutions to the problem of the priors are based on a false presupposition—namely, that the unconditional, or “intrinsic,” probabilities of hypotheses are basic.Footnote 28 On Explanationism, this amounts to the assumption that when we have no background knowledge, the partition of rival hypotheses being assigned (unconditional) prior probabilities in a problem is a root node in the Bayesian network representing our hypothesis space; that is, it has no parents. Substantive methods like the ones discussed in Sect. 3.6 can then be applied to that partition: for example, a flat (indifferent) distribution can be assigned over the partition, or the hypotheses in the partition can be ranked in order of simplicity, with higher probabilities given to simpler hypotheses.
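Purely for illustration (the three-member partition, the complexity scores, and the 2^(−k) weighting below are all hypothetical choices of mine, not anything argued for in the text), such an assignment over a root partition might look like this:

# Sketch: two substantive methods applied to a root-node partition of
# rival hypotheses. The partition and the complexity scores are hypothetical.
hypotheses = ["H1", "H2", "H3"]

# (a) Indifference: a flat distribution over the partition.
flat_prior = {h: 1 / len(hypotheses) for h in hypotheses}

# (b) Simplicity: weight each hypothesis by 2**(-k) for an illustrative
# complexity score k, then normalize so the weights sum to 1.
complexity = {"H1": 1, "H2": 2, "H3": 3}
weights = {h: 2 ** (-complexity[h]) for h in hypotheses}
z = sum(weights.values())
simplicity_prior = {h: w / z for h, w in weights.items()}

print(flat_prior)         # each hypothesis gets 0.333...
print(simplicity_prior)   # approx. 0.571, 0.286, 0.143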

In idealized cases (including the urn scenarios above) it is often useful to assume that prior probabilities are basic. But in real-life Bayesian reasoning the prior probability of almost any hypothesis is non-basic. This is because there are almost always other theories explanatorily prior to the hypothesis which make a difference to how likely it is to be true.

For example, consider the formulation of Darwin’s theory of evolution by natural selection. The prior probability of Darwinism (i.e., its probability apart from the data explanatorily downstream from it) was not basic. Rather, it was influenced by such considerations as empirical data suggesting that the earth was comparatively young, so that there had not been sufficient time for the speciation required by Darwin’s theory to take place (McGrew et al. 2009: 242). The age of the Earth is explanatorily prior to the origins of Earth’s species, and so in evaluating the prior probability of a theory about the latter we need to sum over different hypotheses about the former and about other relevant higher-level possibilities. For example, the network in Fig. 10 lets us calculate the probability of Darwinism as follows:

$$P\left( {Darwinism|K} \right) = \mathop \sum \limits_{i} \mathop \sum \limits_{j} P\left( {A_{i} |K} \right)P(T_{j} |K)P \left( {Darwinism|A_{i} \& T_{j} \& K}\right)$$
[Fig. 10]
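A sketch of the calculation this network licenses, with entirely illustrative numbers for the higher-level partitions (none of these values is taken from the paper), might look like this:

# Sketch of the double sum in the displayed formula above. All numbers
# are purely illustrative placeholders.
P_A = {"A1": 0.3, "A2": 0.7}                   # hypotheses about the age of the earth
P_T = {"T1": 0.6, "T2": 0.4}                   # other higher-level possibilities
P_D_given = {                                  # P(Darwinism | A_i & T_j & K)
    ("A1", "T1"): 0.01, ("A1", "T2"): 0.02,
    ("A2", "T1"): 0.30, ("A2", "T2"): 0.10,
}

prior_darwinism = sum(
    P_A[a] * P_T[t] * P_D_given[(a, t)]
    for a in P_A for t in P_T
)
print(prior_darwinism)   # a weighted average over the higher-level combinations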

Historically we had empirical data relevant to the higher-level hypotheses in this network. But the structure of the network is not dependent on the existence of these data. Evidence that the earth is young is not necessary for us to see that whether the degree of speciation needed to produce the variety of life on earth today is possible depends on how old the earth is. So even in the absence of such background knowledge, the prior probability of Darwinism would still be a function of its probability on different combinations of higher-order theories like those in Fig. 10, weighted by the prior probability of those combinations. (These priors will be influenced by even more explanatorily basic hypotheses, suggesting that we need to expand the above Bayesian network. How far back we need to expand it, and at what point we reach explanatorily fundamental theories or ultimate explanations, is a large question which I do not have space to address here.)

It follows that how well Darwinism and Special Creationism satisfy proposed criteria of theory choice, such as simplicity, is not directly relevant to their relative prior probabilities, when those simplicities are measured in the absence of potential background explanations. Their prior probabilities are a function of their probabilities conditional on conjunctions of higher-order theories. These conditional probabilities may partially be a function of the simplicity of Darwinism and Special Creationism relative to these conjunctions; but in this case what matters is not how simple the two theories are unconditionally, but how simple they are when we assume the truth of particular higher-order theories.Footnote 29

5 Conclusion

How are the values of epistemic probabilities determined? In this paper I have taken a first step towards answering this persistently difficult question. In particular, I have addressed the structural problem of how to “break down” a non-basic probability into basic probabilities of which it is a function. I have defended a view on which the structure of probabilities is determined by the explanatory structure of the propositions these probabilities relate. We obtain basic probabilities by explanatorily ordering different partitions of propositions, and determining which propositions potentially explain the truth of other propositions. Consideration of both simple thought experiments and actual applications of probabilistic reasoning reveals that we do conceive of basic probabilities in this way.

On the Orthodox approach, the probabilities of complete state-descriptions are basic, and other probabilities are determined as a function of those. Because Orthodoxy ignores the explanatory relations between the conjuncts of state-descriptions, it conflicts with our intuitive judgments about what probabilities are basic. Moreover, when combined with substantive methods for determining probabilities, it delivers the wrong results. By ignoring the asymmetry of explanation, it wrongly allows the addition of future, explanatorily downstream, variables to alter the probability distribution over past, explanatorily upstream, variables.

Explanationism has important implications for many debates in epistemology and philosophy of science. In particular, it sheds light on informal debates about the substantive problem, such as the literature on the so-called “problem of the priors.” According to the Explanationist, these debates are largely misconceived, treating prior probabilities of empirical hypotheses as sui generis, rather than imposed on them by explanatorily prior theories.

There remain significant open questions about how to flesh out the Explanationist picture:

  • Besides causal and metaphysical priority, are there other kinds of explanatory priority relevant to constructing a Bayesian network?

  • Is an infinite explanatory regress possible? Or is the Explanationist committed to there being a first cause/ultimate explanation?

  • In cases of network uncertainty, is an infinite regress of higher-order networks possible? Or is the Explanationist committed to there being some a priori higher-order network that relates all lower-order networks?

All these issues deserve further investigation. In addition, the substantive question of what determines the values of basic probabilities continues to loom large.

Daunting questions remain, then. Nevertheless, the Explanationist picture seriously advances the project of determining the values of epistemic probabilities, laying a foundation for further work and dispelling much of the dust and confusion surrounding this thorny problem.