! Plausibility and Probability in Juridical Proof [forthcoming in The International Journal of Evidence & Proof] ! Marcello Di Bello ! Herbert H. Lehman College, City University of New York ! January 10, 2019 !! Abstract: This note discusses three issues that Allen and Pardo believe to be especially problematic for a probabilistic interpretation of standards of proof: (1) the subjectivity of probability assignments; (2) the conjunction paradox; and (3) the non-comparative nature of probabilistic standards. I offer a reading of probabilistic standards that avoids these criticisms. ! Keywords: Evidence, Probability, Plausibility, Standard of Proof !!! In this and previous work, Allen and Pardo have defended a plausibility-based theory of juridical proof[1] and have leveled many criticisms against the competing probability-based theory. I focus on three issues that they believe to be especially problematic for the probabilistic theory: (1) the subjectivity of probability assignments; (2) the conjunction paradox; and (3) the non-comparative nature of probabilistic standards. I offer a reading of probabilistic standards that avoids these criticisms. My remarks are sympathetic toward the plausibility-based theory, but they also suggest that, when suitably formulated, the probability-based theory can capture many of its insights. ! Preliminarily, it is instructive to formulate different possible rules of decision based on plausibility and probability. For reasons of space, the discussion focuses on civil cases. Plausibility-based decision rules are formulated in terms of the relative plausibility of competing hypotheses (or explanations, in Allen and Pardo's terminology). In a civil case, the rule reads: ! [Pl] If, given the evidence, the hypothesis Hp put forward by the plaintiff is more plausible than the hypothesis Hd put forward by the defense, then the fact-finders should find for the plaintiff; otherwise, they should find for the defendant. ! 
This rule is comparative and holistic: it assesses the relative plausibility of the plaintiff's and the defendant's hypotheses, each considered in its entirety. A probability-based theory, by contrast, gives rise to a family of rules, depending on whether the decision rule is atomistic or holistic, and fixed or comparative. In a civil case, a holistic and fixed (i.e. non-comparative) probability-based rule reads: ! [Pr-holistic-fixed] If, given evidence E, the probability of the hypothesis Hp put forward by the plaintiff exceeds 50%, that is, Pr(Hp|E)>50%, then the fact-finders should find for the plaintiff; otherwise, they should find for the defendant. ! This rule is holistic because it applies to the plaintiff's hypothesis in its entirety. It is fixed (i.e. non-comparative) because the probability of the hypothesis must meet a fixed threshold regardless of the probability of the competing hypothesis put forward by the other party. By contrast, an atomistic and fixed probabilistic rule reads: ! [Pr-atomistic-fixed] If, given evidence E, the probability of every individual claim Cp that the plaintiff is expected to establish exceeds 50%, that is, Pr(Cp|E)>50%, then the fact-finders should find for the plaintiff; otherwise, they should find for the defendant. ! This rule is atomistic because it applies to each individual claim by the plaintiff. It is fixed because the probability of each claim must meet a fixed threshold regardless of the probability of the claims made by the other party. There is also a holistic and comparative probability-based rule: ! [Pr-holistic-comparative] If, given evidence E, the hypothesis Hp put forward by the plaintiff is more probable than the hypothesis Hd put forward by the defense, that is, Pr(Hp|E)>Pr(Hd|E), then the fact-finders should find for the plaintiff; otherwise, they should find for the defendant.[2] ! [1] Ronald J. Allen and Michael S. Pardo, Relative Plausibility and its Critics, The International Journal of Evidence & Proof, forthcoming. ! 
Finally, there is an atomistic and comparative probabilistic rule: ! [Pr-atomistic-comparative] If, given evidence E, for every pair of competing individual claims Cp and Cd put forward by the plaintiff and the defendant, claim Cp is more probable than Cd, that is, Pr(Cp|E)>Pr(Cd|E), then the fact-finders should find for the plaintiff; otherwise, they should find for the defendant. ! These formulations are not exhaustive, but they underscore the fact that the probability-based theory comprises a family of rules. As we shall see, not every criticism that Allen and Pardo level against the probability-based theory applies to every rule. Some criticisms are general, and others are more specific. ! [2] Edward K. Cheng and Michael S. Pardo, Accuracy, Optimality and the Preponderance Standard, 14 Law, Probability and Risk 193 (2015); Edward K. Cheng, Reconceptualizing the Burden of Proof, 122 Yale L. J. 1254 (2013). !! 1. Convergence and Complementarity of Plausibility and Probability ! The first criticism of the probability-based theory applies generally. Allen and Pardo observe that this theory provides no method for assigning probabilities to hypotheses and that, under the subjective interpretation of probability (the most natural one in the context of trial proceedings), probability assignments are not constrained by the evidence. Apart from the constraints imposed by the probability axioms, any number would do. The probability-based theory, in this sense, would be "truly subjective" and could hardly guide trial decisions. ! This is true of a probability-based theory that relies on purely subjective Bayesianism, but not of one that relies on objective Bayesianism. Objective Bayesianism adopts a subject-relative interpretation of probability, say, in terms of degrees of belief, but also requires that frequencies, data and evidence be taken into account when assigning probabilities to hypotheses.
Probability assignments should be well-calibrated to the evidence.[3] The charge that the probability-based theory can only lead to probability assignments that are "truly subjective" should therefore be dismissed. ! Still, as Allen and Pardo point out in their discussion of Dale Nance's epistemic interpretation of probability, the well-calibrated fact-finder is an idealized figure far removed from the reality of trial proceedings. There is no method for calibrating probabilities to the evidence beyond the slogan "look at the evidence." In this respect, Allen and Pardo are correct to underscore the superiority of their theory when it comes to assessing the plausibility of hypotheses. Their theory contains criteria for assessing when a hypothesis (or an explanation) is more plausible than its alternative in light of the evidence. These criteria include, among other things, "consistency, coherence, fit with background knowledge, simplicity, absence of gaps, and the number of unlikely assumptions that need to be made." ! Plausibility criteria, however, need not conflict with the idealized process of calibrating probabilities to the evidence. Allen and Pardo observe that their plausibility-based theory should not lead to selecting a hypothesis that is less probable than its alternative: the more plausible hypothesis should also be the more probable one. In other words, they agree that ! [Pl=>Pr] If H1 is more plausible than H2 given E, then Pr(H1|E)>Pr(H2|E). ! They would probably also agree with the converse. If the converse did not hold, there would be cases in which Pr(H1|E)>Pr(H2|E) but H1 was not more plausible than H2, either (a) because H2 was more plausible than H1 or (b) because there was no plausibility ordering between H1 and H2. If (a), a plausibility-based theory could lead to selecting hypotheses that were less probable than their alternatives.
If (b), probabilities would be defined for more pairs of hypotheses than relations of comparative plausibility are. I will return later to this informationally demanding aspect of probability assignments. For now, let Pr be a partial function, undefined for incomparable hypotheses. This yields the equivalence: ! [Pl<=>Pr] H1 is more plausible than H2 given E if and only if Pr(H1|E)>Pr(H2|E). ! Given this equivalence, one might wonder to what extent the two theories of juridical proof differ from one another. Allen and Pardo insist that the plausibility-based theory does not "collapse" into the probability-based theory because plausibility criteria, not the probability axioms, guide decisions about which hypothesis should be preferred. If anything, plausibility criteria help us arrive at well-calibrated probabilities, not the other way around. In the absence of accessible criteria for assessing the probabilities of hypotheses, plausibility criteria are the only guide. ! [3] Jon Williamson, In Defence of Objective Bayesianism (Oxford: Oxford University Press, 2010). ! This is correct, but it is not the whole story. If plausibility relations diverged dramatically from probabilities, this would be an argument against the plausibility-based theory. Some of Allen and Pardo's critics might in fact allege that a plausibility-based theory permits, or even encourages, probabilistic fallacies such as the prosecutor's fallacy. Allen and Pardo would probably respond that this is false; if it were true, their theory would violate the requirements of rationality codified by probability theory. So while Allen and Pardo are right that plausibility criteria prevail as far as practical feasibility goes, they cannot deny that probabilities have primacy in determining the normative requirements of rationality.[4] To understand the relationship between the plausibility-based and the probability-based theories of juridical proof, a brief mention of psychologist Gerd Gigerenzer's fast and frugal heuristics can be of service. According to Gigerenzer,[5] human beings rely on fast and frugal heuristics as shortcuts for reasoning and decision-making. These heuristics are more efficient and computationally less burdensome than full probabilistic analyses and, in most cases, perform just as well as their probabilistic counterparts. We can interpret Allen and Pardo's criteria for assessing the relative plausibility of hypotheses along similar lines: assessments of relative plausibility are computationally more efficient than complex probabilistic analyses, but the two do not diverge. As Allen and Pardo note, the plausibility of a hypothesis can be used as a "proxy" for its probability. ! If this is correct, the two theories converge and complement one another. They converge because, if a hypothesis is more plausible than another, it will also be more probable than the other. They complement one another because, while plausibility is primary as far as practical feasibility goes, probability is primary as a standard of rationality. This convergence and complementarity is particularly apparent if we compare the plausibility rule [Pl] with the holistic and comparative probabilistic rule [Pr-holistic-comparative]. The two mirror one another closely. ! [4] To escape this conclusion, Allen and Pardo could weaken claim [Pl<=>Pr] so that the equivalence would no longer hold between plausibility and probability, but between plausibility and an impoverished notion of probability that did not obey all the probability axioms. I do not know whether they would want to pursue this line of argument. ! [5] Gerd Gigerenzer, P.M. Todd and ABC Research Group, Simple Heuristics That Make Us Smart (Oxford: Oxford University Press, 2000). ! 2. The Conjunction Paradox ! 
Another objection often leveled against the probability-based theory of juridical proof is the conjunction paradox. Suppose a plaintiff establishes two independent claims to the required probability, say, Pr(C1|E)=70% and Pr(C2|E)=70%. Their conjunction is then not established to the required probability, since Pr(C1 & C2|E)=Pr(C1|E) x Pr(C2|E)=49%. But the law does not require the plaintiff to establish the conjunction to the required probability; it only requires the plaintiff to establish each individual claim. Under an atomistic rule, the plaintiff can thus prevail even though the plaintiff's case as a whole is less probable than not. This is a potential problem for the probability-based theory. ! The key insight of Allen and Pardo's solution to this paradox is the idea of an explanation (or hypothesis) that is assessed holistically in light of the evidence, without considering each claim in isolation. The solution has two parts. Part 1: the hypothesis Hp put forward by the plaintiff, if taken to be true as a whole, makes true all the individual claims that by law the plaintiff must establish in order to prove the defendant's liability. Part 2: the defendant is found liable only if Hp is more plausible than Hd, given the evidence. ! This solution can be adopted by supporters of a probability-based theory of juridical proof, provided they abandon the rules [Pr-atomistic-fixed] and [Pr-atomistic-comparative]. If Allen and Pardo are right, the conjunction paradox is a definitive argument against any probabilistic rule of decision that is atomistic. I will assume they are right in this and will focus in what follows on probabilistic rules that are holistic.[6] According to [Pr-holistic-comparative], the defendant is found liable only if Hp is more probable than Hd, given the evidence. According to [Pr-holistic-fixed], the defendant is found liable only if Hp is more than 50% probable given the evidence. Would the conjunction paradox arise for these two holistic rules? There is no reason to think that it would. ! 
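The arithmetic of the conjunction paradox, and the different verdicts that atomistic and holistic fixed rules return on it, can be sketched in Python. This is a minimal sketch under two assumptions: the claims C1 and C2 are independent given the evidence, as in the 70% example, and, purely for illustration, the plaintiff's hypothesis Hp is treated as equivalent to the conjunction C1 & C2, so that Pr(Hp|E)=Pr(C1 & C2|E). The function names are hypothetical labels for the rules [Pr-atomistic-fixed] and [Pr-holistic-fixed], not notation from the text; exact fractions are used to avoid floating-point rounding.

```python
import math
from fractions import Fraction as F

# The 50% preponderance threshold shared by the fixed rules.
THRESHOLD = F(1, 2)


def atomistic_fixed_verdict(claim_probs):
    """[Pr-atomistic-fixed]: plaintiff iff every Pr(Cp|E) exceeds 50%."""
    return "plaintiff" if all(p > THRESHOLD for p in claim_probs) else "defendant"


def holistic_fixed_verdict(p_hp):
    """[Pr-holistic-fixed]: plaintiff iff Pr(Hp|E) exceeds 50%."""
    return "plaintiff" if p_hp > THRESHOLD else "defendant"


def conjunction_prob(claim_probs):
    """Pr(C1 & ... & Cn | E), ASSUMING the claims are independent given E."""
    return math.prod(claim_probs)


claims = [F(7, 10), F(7, 10)]           # Pr(C1|E) and Pr(C2|E) from the text
p_conj = conjunction_prob(claims)

print(atomistic_fixed_verdict(claims))  # plaintiff: each claim clears 50%
print(p_conj)                           # 49/100
# Illustrative assumption: Hp is equivalent to C1 & C2, so Pr(Hp|E) = p_conj.
print(holistic_fixed_verdict(p_conj))   # defendant
```

On these assumptions the atomistic rule finds for the plaintiff even though the plaintiff's case as a whole is less probable than not, while the holistic rule does not. The sketch only restates the arithmetic; it says nothing about how Pr(Hp|E) would be assessed at trial.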
Suppose the plaintiff must establish two separate claims and two pieces of evidence are offered, say, E1 and E2. Suppose the plaintiff offers hypothesis Hp that, if taken to be true as a whole, makes both claims true. According to [Pr-holistic-fixed], the probability of Hp must exceed 50% for the plaintiff to win. By Bayes' theorem, ! Pr(Hp|E1 & E2) = Pr(E1 & E2|Hp)/Pr(E1 & E2) x Pr(Hp), ! where ! Pr(E1 & E2) = Pr(E1 & E2|Hp) x Pr(Hp) + Pr(E1 & E2|not-Hp) x Pr(not-Hp). ! Suppose E1 and E2 are independent of each other conditional on Hp and conditional on not-Hp. That is, Pr(E2|Hp)=Pr(E2|Hp & E1) and Pr(E2|not-Hp)=Pr(E2|not-Hp & E1). We have: ! Pr(E1 & E2) = Pr(E1|Hp) x Pr(E2|Hp) x Pr(Hp) + Pr(E1|not-Hp) x Pr(E2|not-Hp) x Pr(not-Hp). ! So long as both E1 and E2 are assessed relative to the hypotheses Hp and not-Hp, no conjunction paradox emerges. Each piece of evidence is assessed against the hypotheses in their entirety, not against any particular claim. No multiplication of the probabilities of the individual claims given the evidence takes place. The key insight of Allen and Pardo's solution to the conjunction paradox is therefore preserved. ! [6] Rule [Pr-atomistic-comparative] leads to counterintuitive results; see Ronald J. Allen and Alex Stein, Evidence, Probability, and the Burden of Proof, 55 Ariz. L. Rev. 557 (2013). ! A similar strategy can be adopted for [Pr-holistic-comparative]. The probabilities of the competing hypotheses put forward by the plaintiff and the defense, Hp and Hd, must be compared given the evidence. Bayes' theorem in odds form can be used, that is: ! Pr(Hp|E1 & E2)/Pr(Hd|E1 & E2) = [Pr(E1 & E2|Hp)/Pr(E1 & E2|Hd)] x [Pr(Hp)/Pr(Hd)]. ! By the independence of E1 and E2 relative to Hp and Hd, as before, we have: ! Pr(Hp|E1 & E2)/Pr(Hd|E1 & E2) = [Pr(E1|Hp)/Pr(E1|Hd)] x [Pr(E2|Hp)/Pr(E2|Hd)] x [Pr(Hp)/Pr(Hd)]. ! Once again, since the probabilities of the individual claims are not multiplied, no conjunction paradox should arise.
One difficulty might be that Pr(E1|Hp) and the other conditional probabilities are not easy to determine. This is true, but it does not obviously give rise to the conjunction paradox. ! I should be clear that what I have presented is not a solution to the conjunction paradox. I have only shown that if we accept Allen and Pardo's plausibility-based solution to the conjunction paradox, a probability-based solution can be formulated along the same lines. If Allen and Pardo's solution succeeds (or fails), the parallel probability-based solution should also succeed (or fail). !! 3. Comparative Standards ! Consider now another problem for the probabilistic theory. Allen and Pardo believe that standards of proof are comparative. In civil cases, the hypotheses put forward by the plaintiff and the defense are compared against one another, and the hypothesis more strongly supported by the evidence should prevail. This is aptly captured by the comparative plausibility-based rule of decision [Pl], which assesses the relative plausibility of the competing hypotheses presented by the plaintiff and the defendant. The same cannot be said of probability-based rules that are fixed and non-comparative, such as [Pr-holistic-fixed] or [Pr-atomistic-fixed]. ! But why should rules of decision be comparative? And if they should be, what is the best way to capture their comparative nature? Allen and Pardo show that there are both normative and descriptive reasons for thinking that standards of proof should be comparative. Since atomistic rules make it difficult to address the conjunction paradox, I will focus on what is problematic for rule [Pr-holistic-fixed]. The problem is that if Hp were only 40% probable on the evidence and Hd 20%, this rule would recommend deciding in favor of the defendant because the plaintiff failed to meet the 50% threshold. This is odd from a normative standpoint.
If trial decisions are meant to promote accuracy, the decision should be for the plaintiff, since Hp is more likely to be true than Hd. Rule [Pr-holistic-fixed] is thus normatively inadequate because it does not promote accuracy. On the descriptive side, Allen and Pardo observe at various junctures that, since the parties cannot litigate everything, they must decide to set aside, perhaps tacitly, certain issues and not litigate them, while focusing their efforts on other issues. They call this the master-of-their-own-case principle. The plausibility-based theory of juridical proof complies with this principle because it limits the litigation to the comparison of the plaintiff's and the defendant's hypotheses. Rule [Pr-holistic-fixed] does not. As Allen and Pardo put it, this rule "would have the parties resolve every possible way in which the universe might have been the day in question." ! This can be readily seen by applying Bayes' theorem, that is, ! Pr(Hp|E) = Pr(E|Hp)/Pr(E) x Pr(Hp), ! where Pr(E) equals ! Pr(E|Hp) x Pr(Hp) + Pr(E|not-Hp) x Pr(not-Hp). ! Note that not-Hp is a catch-all alternative hypothesis that describes every alternative way the world could have been. More perspicuously, Pr(E) can be written as ! Pr(E|Hp) x Pr(Hp) + Pr(E|H1) x Pr(H1) + Pr(E|H2) x Pr(H2) + Pr(E|H3) x Pr(H3) + ... + Pr(E|Hk) x Pr(Hk), ! where H1, H2, H3, ..., Hk are all the alternatives to Hp, assumed to be mutually exclusive and, together with Hp, jointly exhaustive. ! As one can see, in order to assess the probability of Hp given E, the fact-finders would have to examine every alternative hypothesis. Besides the computational difficulties this would entail, Allen and Pardo point out that the process of proof at trial does not take every alternative hypothesis into consideration. Given limited time and resources, not every issue can be litigated at trial. This makes [Pr-holistic-fixed] descriptively inadequate. ! 
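The catch-all expansion of Pr(E), and the divergence between [Pr-holistic-fixed] and [Pr-holistic-comparative] in the 40%/20% example, can be illustrated with a short Python sketch. It assumes a toy partition of four hypotheses (Hp, Hd in the role of H1, and two further alternatives H2 and H3) with hypothetical priors and likelihoods picked only to reproduce the text's figures; the function names are likewise hypothetical. Exact fractions keep the printed values free of floating-point rounding.

```python
from fractions import Fraction as F


def posteriors(priors, likelihoods):
    """Bayes' theorem over an explicit partition of hypotheses.

    priors[h] is Pr(h) and likelihoods[h] is Pr(E|h). The denominator Pr(E)
    is the sum of Pr(E|h) x Pr(h) over every hypothesis, as in the text's
    expanded formula, so every alternative must be listed.
    """
    if sum(priors.values()) != 1:
        raise ValueError("priors must sum to 1 over an exhaustive partition")
    pr_e = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / pr_e for h in priors}


def posterior_odds(priors, likelihoods, a, b):
    """Odds form of Bayes' theorem: Pr(a|E)/Pr(b|E), with no need for Pr(E)."""
    return (likelihoods[a] / likelihoods[b]) * (priors[a] / priors[b])


def holistic_fixed_verdict(p_hp, threshold=F(1, 2)):
    """[Pr-holistic-fixed]: plaintiff iff Pr(Hp|E) exceeds the threshold."""
    return "plaintiff" if p_hp > threshold else "defendant"


def holistic_comparative_verdict(p_hp, p_hd):
    """[Pr-holistic-comparative]: plaintiff iff Pr(Hp|E) > Pr(Hd|E)."""
    return "plaintiff" if p_hp > p_hd else "defendant"


# Hypothetical numbers: Hp, Hd and two further alternatives H2 and H3,
# assumed mutually exclusive and jointly exhaustive, chosen so that
# Pr(Hp|E) = 40% and Pr(Hd|E) = 20% as in the text's example.
priors = {"Hp": F(1, 4), "Hd": F(1, 4), "H2": F(1, 4), "H3": F(1, 4)}
likelihoods = {"Hp": F(2, 5), "Hd": F(1, 5), "H2": F(1, 5), "H3": F(1, 5)}

post = posteriors(priors, likelihoods)
print(post["Hp"], post["Hd"])                                # 2/5 1/5
print(posterior_odds(priors, likelihoods, "Hp", "Hd"))       # 2
print(holistic_fixed_verdict(post["Hp"]))                    # defendant
print(holistic_comparative_verdict(post["Hp"], post["Hd"]))  # plaintiff
```

Computing Pr(Hp|E) for the fixed rule required a prior and a likelihood for every hypothesis in the partition, including H2 and H3, which neither party put forward. The comparative rule can instead be decided from the odds form of Bayes' theorem, which uses only the figures for Hp and Hd: here the odds are 2 to 1 in favor of Hp.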
If we pair this criticism with the conjunction paradox, which rules out probabilistic rules that are not holistic, legal probabilists would be left with only one option, the holistic and comparative probabilistic rule [Pr-holistic-comparative]. This rule agrees with the comparative plausibility-based rule [Pl]. By the equivalence [Pl<=>Pr], a rule of decision based on relative plausibility will lead to selecting a hypothesis that is more probable than its alternative. Rules [Pr-holistic-comparative] and [Pl] should therefore mirror one another closely. ! I do think, however, that [Pr-holistic-fixed] can be salvaged if it is appropriately interpreted. It might even outperform the comparative rule in capturing the comparative nature of standards of proof. First, the rule can be interpreted in a way that better accords with the master-of-their-own-case principle. Consider an analogy. It makes sense to say that the probability that some ticket of the New York lottery will win is 100 percent. Of course, this probability is not entirely accurate. There could be a natural disaster; New York State might go bankrupt; and so on. The probability that a ticket of the New York lottery will win is, presumably, extremely high but lower than 100 percent. ! Which one is it? Is Pr(a ticket wins)=100% or Pr(a ticket wins)<100%? These two seemingly contradictory probability statements result from relying on two different probability models that make different simplifying assumptions. A more perspicuous way to write them would be Pr-M1(a ticket wins)=100% and Pr-M2(a ticket wins)<100%, where M1 and M2 encode the different assumptions underlying each probability model. Written this way, the two probability statements no longer contradict each other. ! Model M1 is the more natural one because it leaves out the possibility of a natural disaster, bankruptcy and other calamities.
This is typically appropriate when we talk about the probability that a lottery ticket will win, although it is not a complete representation of all the uncertainties. Given these simplifications, the probability that a lottery ticket will win is said to be 100 percent. ! The choice of a probability model, along with its assumptions and simplifications, is analogous to the decision by the parties at trial to bracket off certain issues and focus the litigation on others. The space of salient possibilities to be litigated, then, need not include every way the world could have been. It need only include those possibilities the parties disagree about. If we represent this restriction by B, rule [Pr-holistic-fixed] can be rewritten as follows: ! [Pr-holistic-fixed-restricted] If, given the evidence E and a suitable set of salient possibilities defined by B, the probability of the hypothesis Hp put forward by the plaintiff exceeds 50%, that is, Pr-B(Hp|E)>50%, then the fact-finders should find for the plaintiff; otherwise, they should find for the defendant. ! It is no longer true that [Pr-holistic-fixed-restricted] would require the parties to litigate everything, because the space of salient possibilities is restricted by B. ! We have just seen that rule [Pr-holistic-fixed-restricted] escapes the descriptive challenge based on the master-of-their-own-case principle. Can it also escape the accuracy-based normative challenge? It can. Consider a case in which the plaintiff's hypothesis is 40% probable on the evidence presented at trial. Crucially, this probability cannot be evaluated in a vacuum, but only relative to a set of restrictions on the issues to be litigated, as represented by B. Given these restrictions, if 40% is the probability of Hp, that is, Pr-B(Hp|E)=40%, then 60% is the probability of the alternative hypotheses left open by the parties, that is, Pr-B(not-Hp|E)=60%.
Siding with the defendant is thus the decision that would best promote accuracy. Allen and Pardo might be right that, without any restriction on the issues to be litigated, accuracy requires that the only comparison be between the probabilities of Hp and Hd and that the more probable of the two should prevail. But this is not so for rule [Pr-holistic-fixed-restricted]. ! This new rule might in fact better capture the comparative nature of standards of proof. Some of Allen and Pardo's critics allege that standards of proof are not comparative because the defense is not required by law to offer an alternative explanation. Allen and Pardo's response is that, in practice, the defense will offer an alternative explanation, if only for tactical reasons. And even if the defense did not, the fact-finders making the decision would entertain their own alternative explanations. ! This response indicates that, when Allen and Pardo talk about alternative explanations (or hypotheses, in the terminology used here), they do not necessarily have in mind actual explanations put forward by the defense. These alternatives are those that a reasonable fact-finder would, or should, entertain. Keeping in mind the restrictions imposed by B on the issues to be litigated, these alternatives are all the hypotheses that rule [Pr-holistic-fixed-restricted] requires the fact-finders to take into account. This rule, then, has the potential to capture the comparative nature of standards of proof in a broader sense. This might be the sense that Allen and Pardo have in mind, at least when they answer their critics. Their critics, in turn, might be more willing to accept this reading of the comparative nature of standards of proof, as exemplified by rule [Pr-holistic-fixed-restricted].
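The role of the restriction B, and the lottery analogy behind it, can be sketched in Python by treating the choice of a probability model as conditioning on a set of salient possibilities. That formalization (keep only the possibilities in B and renormalize) is an assumption of this sketch, not a definition given in the text, and all numbers and labels, including the hypothetical possibilities "no draw", H_alt and H_bracketed, are invented to reproduce the figures Pr-B(Hp|E)=40% and Pr-B(not-Hp|E)=60%.

```python
from fractions import Fraction as F


def restrict(dist, salient):
    """Condition a distribution on the set B of salient possibilities.

    ASSUMPTION of this sketch: the parties' restriction B is modelled as
    conditioning on B, i.e. dropping the bracketed possibilities and
    renormalizing the probability of those that remain.
    """
    mass = sum(dist[h] for h in salient)
    if mass == 0:
        raise ValueError("B must have positive probability")
    return {h: dist[h] / mass for h in salient}


def restricted_fixed_verdict(p_hp_given_b, threshold=F(1, 2)):
    """[Pr-holistic-fixed-restricted]: plaintiff iff Pr-B(Hp|E) > 50%."""
    return "plaintiff" if p_hp_given_b > threshold else "defendant"


# Lottery analogy with hypothetical numbers: model M2 keeps a small chance
# that no ticket wins (say, the draw is cancelled by a calamity); model M1
# brackets that possibility off.
m2 = {"some ticket wins": F(999, 1000), "no draw": F(1, 1000)}
m1 = restrict(m2, ("some ticket wins",))
print(m2["some ticket wins"], m1["some ticket wins"])  # 999/1000 1

# Trial with hypothetical probabilities given E over the full space. H_alt is
# an alternative left open by the parties; H_bracketed is set aside by them.
full = {"Hp": F(1, 5), "Hd": F(1, 10), "H_alt": F(1, 5), "H_bracketed": F(1, 2)}
post_b = restrict(full, ("Hp", "Hd", "H_alt"))

print(post_b["Hp"], 1 - post_b["Hp"])          # 2/5 3/5
print(post_b["Hd"])                            # 1/5
print(restricted_fixed_verdict(post_b["Hp"]))  # defendant
```

Under B the plaintiff's hypothesis rises from 1/5 to 2/5 and remains twice as probable as Hd, yet the possibilities left open by the parties jointly carry 3/5, so the restricted fixed rule finds for the defendant.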