Abstract
Scoring rules measure the deviation between a credence assignment and reality. Probabilism holds that only those credence assignments that satisfy the axioms of probability are rationally admissible. Accuracy-based arguments for probabilism observe that given certain conditions on a scoring rule, the score of any non-probability is dominated by the score of a probability. The conditions in the arguments we will consider include propriety: the claim that the expected accuracy of p is not beaten by the expected accuracy of any other credence c by the lights of p if p is a probability. I argue that if we think through how a non-probabilist can respond to pragmatic arguments for probabilism, we will expect the non-probabilist to accept a condition stronger than propriety for the same reasons that the probabilist gives for propriety, but this stronger condition is incompatible with the other conditions that the probabilist needs to run the accuracy argument. This makes it unlikely for the probabilist’s argument to be compelling.
Notes
This paper is focused on arguments that credences should be probabilities, and hence will pass over Lindley’s (1982) very interesting argument that, given some technical assumptions on an additive scoring rule but notably not including propriety, the “admissible” credence assignments—ones whose scores are not weakly dominated (one outcome is weakly dominated by another provided that in all circumstances the second is at least as good, and in at least one it is better) by the scores of other credence assignments—can be transformed into probabilities while preserving order. To take this argument as establishing probabilism would require thinking that two order-isomorphic patterns of credence assignment are equivalent, so that the admissible credences are essentially probabilities. But order-isomorphism is not sufficient for real equivalence of credence assignments. Imagine that Alice always assigns a credence that is equal to the square root of the credence that Bob assigns. Then although there is an order-preserving transformation between their patterns of credence assignments, their rational betting behavior will be different: when Alice accepts even odds, Bob does not. Nonetheless, showing order isomorphism of admissible credences to probabilities is a significant result and provides some evidence for probabilism.
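The Alice and Bob case can be checked numerically. The following is a minimal sketch, assuming (purely as an illustration, not as part of the original argument) that an agent accepts an even-odds bet on an event exactly when the agent's credence in it is at least 1/2; the function name `accepts_even_odds` and the sample credences are hypothetical.

```python
import math

def accepts_even_odds(credence: float) -> bool:
    # Assumed decision rule: take an even-odds bet iff credence >= 1/2.
    return credence >= 0.5

# Hypothetical credences Bob assigns to four events.
bob_credences = [0.1, 0.3, 0.4, 0.6]
# Alice always assigns the square root of Bob's credence.
alice_credences = [math.sqrt(b) for b in bob_credences]

# The square root is strictly increasing, so both agents rank the events identically.
indices = range(len(bob_credences))
same_order = (sorted(indices, key=lambda i: bob_credences[i])
              == sorted(indices, key=lambda i: alice_credences[i]))

# Events on which the two agents nonetheless bet differently at even odds.
disagreements = [b for b, a in zip(bob_credences, alice_credences)
                 if accepts_even_odds(a) != accepts_even_odds(b)]

print(same_order, disagreements)
```

On these sample values the ordering is shared, yet Alice accepts even odds on the events to which Bob assigns 0.3 and 0.4, while Bob declines both.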
Some of our discussion will then be simplified by not considering negative credences and credences greater than one. The kind of non-probabilist that we will be considering will be one that will place some reasonable constraints on credences, and making credences range from 0 to 1 certainly seems reasonable.
In the setting of Pruss (2021b), a wager (perhaps “wager portfolio” would be a better term) is a sequence of event-payoff pairs, with the utility function defined in terms of these. Different wagers can yield the same utility function: a wager that yields \( \$2\) on heads as well as yielding \( \$3\) on heads-or-tails has the same utility function as a wager that yields \( \$5\) on heads as well as yielding \( \$3\) on tails. By assuming that wagers are compared by their utility functions, we are simplifying and assuming that wagers with the same utility function are interchangeable. This assumption is not satisfied for every method of linking decisions to credences. It is not satisfied, for instance, by the method presupposed by De Finetti’s pragmatic arguments for probabilism (de Finetti, 1937). In the context of preference comparisons derived from previsions, the interchangeability of portfolios with the same utility function corresponds to saying that the prevision is “integral-like” (Pruss, 2021b). Because our task in this paper is to consider how the accuracy arguments for probabilism fare against the most plausible versions of non-probabilism, and because a preference comparison that fails to be indifferent between two portfolios that have exactly the same utility function—say, because the portfolios arrange the wagers in different but logically equivalent ways—is eo ipso problematic, integral-likeness is a reasonable assumption in our context. We should not expect a smart non-probabilist to distinguish equivalent wagers.
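The coin example can be made concrete with a short sketch. It assumes a two-outcome space and represents a wager as a list of event-payoff pairs; the names `OUTCOMES`, `utility_function`, `wager_a` and `wager_b` are illustrative and not taken from Pruss (2021b).

```python
OUTCOMES = ("heads", "tails")

def utility_function(wager):
    """Map a wager (a list of (event, payoff) pairs) to its total payoff in each outcome."""
    utility = {outcome: 0 for outcome in OUTCOMES}
    for event, payoff in wager:
        for outcome in event:
            utility[outcome] += payoff  # the payoff is received whenever the event occurs
    return utility

# $2 on heads together with $3 on heads-or-tails.
wager_a = [({"heads"}, 2), ({"heads", "tails"}, 3)]
# $5 on heads together with $3 on tails.
wager_b = [({"heads"}, 5), ({"tails"}, 3)]

print(utility_function(wager_a), utility_function(wager_b))
```

Both portfolios pay 5 on heads and 3 on tails, so a comparison that works only with utility functions cannot tell them apart.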
Our \({\text {LSI}}_c\) corresponds to \({\text {LSI}}_c^\uparrow \) in Pruss (2021b). We ignore \({\text {LSI}}_c^\pm \), as we would expect a utility prevision to commute with positive affine transformations, since utilities are normally thought to be defined only up to positive affine transformations, and \({\text {LSI}}_c^\uparrow \) commutes in this way if c satisfies Zero and Normalization (Pruss, 2021b, Lemma 1) while \({\text {LSI}}_c^\pm \) does not in general.
Hájek (2008) notes that it is important to also show that if c is a probability, then there is no such p. Since this follows from the propriety assumption which says that \(E_c s(c) \le E_c s(p)\) if c is a probability, we will omit this point for brevity in discussions.
In the case of mathematical expectation for a probability r, we have \(E_r(-f)=-E_r(f)\) and so our condition is equivalent to the more familiar \(E_r s(r) \le E_r s(c)\). However, it is in general not true that \({\text {LSI}}_r(-f)=-{\text {LSI}}_r(f)\): see the Appendix for what is actually true.
Additionally, the result in Pruss (2016) suggests that there may be natural significant credence thresholds at 3/4, 15/16, 255/256, and so on. For when a perfect Bayesian’s credence in p exceeds 3/4, the perfect Bayesian is in a position to favor (i.e., assign credence greater than 1/2) the claim that her credence in p will never dip below 1/2, and when the credence in p exceeds 15/16, she is in a position to assign a credence greater than 3/4 that her credence in p will never dip below 3/4, and so on.
The assumption that at least one regular credence has a finite score is very plausible. First, if a regular probability has a score that’s somewhere infinite, the expected inaccuracy \(E_p s(p)\) of a regular probability p by its own lights will be infinite, and it is implausible that some probability, especially a regular one, would expect itself to be infinitely inaccurate. Hence, it is reasonable to think that every regular probability would have finite score. If every regular probability has an infinite score somewhere, and we have propriety, then every irregular probability would also have an infinite score somewhere, since by propriety we must have \(\infty = E_p s(p) \le E_p s(q)\) for any probability q and any regular probability p. Therefore, the propriety inequality \(E_p s(p) \le E_p s(q)\) would hold only in the degenerate \(\infty \le \infty \) form for any regular probability p and any probability q. Such a scoring rule would not be useful for guiding epistemic practice.
It is worth noting that in the infinite case, there is a third highly technical solution. It is possible to construct strictly proper scoring rules in the infinite contexts if instead of requiring the values of the scores to be extended real numbers, we allow scores to take values in some larger set such as nets of real numbers (Pruss, 2022) (it is not known at present whether an argument for probabilism can be run in infinite contexts using this approach). This way out of the negative results does not appear to have a parallel in our finite-space non-probabilist context.
Nor should it much help the “extremist” case if the only known strictly \((\mathcal E,E)\)-proper scoring rule s on all credences, where \(\mathcal E\) is the extreme probabilities, was such that if c is not an extreme probability, then c is strictly s-dominated by some extreme probability p. For a trivial example, one can let \(s(c)(\omega )=1-c(\{\omega \})\) if c is an extreme probability and \(s(c)(\omega )=2\) otherwise.
I am grateful to Michael Nielsen for drawing my attention to \(\mathcal P_0\) and asking whether there is a strictly \((\mathcal P_0,E)\)-proper score.
Recently Michael Nielsen has suggested to me that only probabilities in \(\mathcal P_0\) are rationally acceptable, and so perhaps the argument in favor of \(\mathcal P_0\) should be taken more seriously, despite the fact that it rules out the fair coin product measure which seems paradigmatically rational.
I am grateful to Michael Nielsen for discussions of these topics, to two anonymous readers for a careful reading, and to one anonymous reader for a number of important observations, the discussion of which has greatly enhanced the paper.
References
Campbell-Moore, C., & Levinstein, B. A. (2021). Strict propriety is weak. Analysis, 81, 8–13.
Dawid, A. P., & Musio, M. (2014). Theory and applications of proper scoring rules. Metron, 72, 169–183.
de Finetti, B. (1937). Foresight: Its logical laws, its subjective sources. In H. E. Kyburg & H. E. Smokler (Eds.), Studies in subjective probability. Krieger Publishing.
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359–378.
Greaves, H., & Wallace, D. (2006). Justifying conditionalization: Conditionalization maximizes expected epistemic utility. Mind, 115, 607–632.
Hájek, A. (2008). Arguments for, or against, probabilism? British Journal for the Philosophy of Science, 59, 793–819.
Heinonen, J. (2012). Lectures on analysis on metric spaces. Springer.
Joyce, J. M. (1998). A nonpragmatic vindication of probabilism. Philosophy of Science, 65, 575–603.
Joyce, J. M. (1999). Accuracy and coherence: Prospects for an alethic epistemology of partial belief. In F. Huber & C. Schmidt-Petri (Eds.), Degrees of belief (pp. 263–97). Springer.
Lindley, D. V. (1982). Scoring rules and the inevitability of probability. International Statistical Review, 50, 1–11.
Molchanov, I. (2006). Theory of random sets. Springer.
Nielsen, M. (2022). On the best accuracy arguments for probabilism. Philosophy of Science, 89, 621–630.
Norton, J. D. (2021). The material theory of induction. University of Calgary Press.
Oyama, D. (2014). On the differentiability of the support function: Mathematical notes for advanced microeconomics. http://www.oyama.e.u-tokyo.ac.jp/notes/diffSuppFunc01.pdf
Pettigrew, R. (2021). Accuracy-first epistemology without the additivity axiom. Philosophy of Science, 89, 128–151.
Pettigrew, R. (2019). On the expected utility objection to the Dutch Book argument for probabilism. Noûs, 55, 23–38.
Pettigrew, R. (2016). Accuracy and the laws of credence. Oxford University Press.
Predd, J. B., Seiringer, R., Lieb, E. H., Osherson, D. N., Poor, H. V., & Kulkarni, S. R. (2009). Probabilistic coherence and proper scoring rules. IEEE Transactions on Information Theory, 55, 4786–4792.
Pruss, A. R. (2016). Being sure and being confident that you won’t lose confidence. Logos and Episteme, 7, 45–54.
Pruss, A. R. (2021a). Proper scoring rules and domination. https://arxiv.org/abs/2102.02260
Pruss, A. R. (2021b). Avoiding Dutch Books despite inconsistent credences. Synthese, 198, 11265–11289.
Pruss, A. R. (2022). Accuracy, probabilism and Bayesian update in infinite domains. Synthese, 200, 444.
Pruss, A. R. (2023a). Necessary and sufficient conditions for domination results for proper scoring rules. Review of Symbolic Logic (forthcoming). https://arxiv.org/abs/2103.00085
Pruss, A. R. (2023b). Domination for finite proper scoring rules (manuscript). http://philsci-archive.pitt.edu/21871
Ramsey, F. P. (1931). Truth and probability. In The foundations of mathematics and other logical essays (pp. 156–198). Routledge and Kegan Paul.
Rockafellar, R. T. (1970). Convex analysis. Princeton University Press.
Schervish, M. J., Seidenfeld, T., & Kadane, J. B. (2009). Proper scoring rules, dominated forecasts, and coherence. Decision Analysis, 6, 202–221.
User “alesia”. (2022). Answer to: “For most directions does the supporting hyperplane meeting a bounded convex set meet it at one point?” Mathoverflow. https://mathoverflow.net/questions/432080
Winkler, R. L., Munoz, J., Cervera, J. L., Bernardo, J. M., Blattenberger, G., Kadane, J. B., Lindley, D. V., Murphy, A. H., Oliver, R. M., & Ríos-Insua, D. (1996). Scoring rules and the evaluation of probabilities. Test, 5, 1–60.
Acknowledgements
No funds, grants, or other support were received.
Ethics declarations
Conflict of interest
The author has no relevant financial or non-financial interests to disclose.
Appendix: Some technical results
1.1 Level set integrals and monotonic credences
Given a credence c, let \(c^*(A)=1-c(\Omega -A)\) (for a probability p we have \(p^*=p\)). Then given Zero and Normalization, we have \({\text {LSI}}_c (-f) = -{\text {LSI}}_{c^*} f\) when f is a function that takes values either in \((-\infty ,\infty ]\) or in \([-\infty ,\infty )\). We only need to check this for f having finite values. Moreover, because \({\text {LSI}}_c (\alpha +f)=\alpha +{\text {LSI}}_c f\) (by the well-definition of \({\text {LSI}}_c f\)), we may suppose f takes values in [0, L] for some finite L and then:
where the first and last equalities used Zero and Normalization (applied to c) respectively, and the move from considering the level set \(\{ \omega : -f(\omega ) \ge -y \}\) to considering the level set \(\{ \omega : -f(\omega ) > -y \}\) depended on the fact that the two level sets are equal except perhaps when y is one of the finitely many values of f.
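The identity \({\text {LSI}}_c (-f) = -{\text {LSI}}_{c^*} f\) can be sanity-checked on a small example. The sketch below assumes a three-point \(\Omega \) and computes the level set integral of a finite function f as the integral over y from 0 to infinity of the credence of the level set where f + α exceeds y, minus α, with α chosen so that f + α is non-negative. This follows the shift-based description used in this appendix, but the code, the helper names `lsi` and `conjugate`, and the sample credence are illustrative assumptions rather than the paper's own construction.

```python
from fractions import Fraction as F

OMEGA = frozenset({1, 2, 3})

def lsi(c, f):
    """Level set integral of a finite f (dict: outcome -> number) for a credence c
    (dict: frozenset -> number) that is assumed to satisfy Zero and Normalization."""
    alpha = -min(f.values())                  # shift so that g = f + alpha >= 0
    g = {w: f[w] + alpha for w in f}
    total, prev = 0, 0
    for v in sorted(set(g.values())):
        # For y in [prev, v) the level set {g > y} equals {g >= v}.
        level = frozenset(w for w in g if g[w] >= v)
        total += (v - prev) * c[level]
        prev = v
    return total - alpha

def conjugate(c):
    """c*(A) = 1 - c(Omega - A)."""
    return {A: 1 - c[OMEGA - A] for A in c}

# A hypothetical non-additive credence satisfying Zero and Normalization.
c = {
    frozenset(): F(0), frozenset({1}): F(1, 10), frozenset({2}): F(1, 2),
    frozenset({3}): F(1, 5), frozenset({1, 2}): F(3, 10), frozenset({1, 3}): F(9, 10),
    frozenset({2, 3}): F(2, 5), OMEGA: F(1),
}
f = {1: 2, 2: -1, 3: 4}
neg_f = {w: -x for w, x in f.items()}

print(lsi(c, neg_f), -lsi(conjugate(c), f), -lsi(c, f))
```

In this example the first two printed values coincide while the third differs, which illustrates why the conjugate credence \(c^*\) is needed on the right-hand side when c is not a probability.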
In Pruss (2021b, Lemma 1g) it is incorrectly stated (with some trivial translation to our setting) that if f is negative and finite, then \({\text {LSI}}_c f = -{\text {LSI}}_c (-f)\). The right hand side should instead be \(- {\text {LSI}}_{c^*} (-f)\). The only place where Pruss (2021b) uses the incorrect claim appears to be in the proof of Theorem 1 in the special case of \({\text {LSI}}^\uparrow _P\), where it is shown that decisions using level set integrals avoid Dutch Books. To fix the problem, in the statement of the theorem in the case of \({\text {LSI}}^\uparrow _P\) one needs to replace Non-Negativity with the axiom that credences have value at most 1, and instead of the argument given in the proof, use the formula \({\text {LSI}}^\uparrow _P f = -{\text {LSI}}^\uparrow _{P^*} (-f)\) to establish that \({\text {LSI}}^\uparrow _P f < 0\) if \(f<0\) everywhere, noting that \(P^*\) satisfies Non-Negativity if \(P\le 1\) everywhere.
The following extends one of the monotonicity results from Pruss (2021b):
Theorem 1
If \(c\in \mathcal M\) and f and g are functions on \(\Omega \) with values in \([-\infty ,\infty ]\) such that \(f<g\) everywhere, then \({\text {LSI}}_c f < {\text {LSI}}_c g,\) with both level set integrals well-defined.
Proof
Since \(f<g\) everywhere, g cannot take the value \(-\infty \) anywhere and f cannot take the value \(+\infty \) anywhere. Let \(M_0=1+\max (\max f,\max (-g))\). This is finite, and if \(M\ge M_0\), then
everywhere. By Pruss (2021b, Theorem 2) we then have
Taking the limit as \(M\rightarrow \infty \), we conclude that \({\text {LSI}}_c f < {\text {LSI}}_c g\). \(\square \)
1.2 Propriety
Theorem 2
Any bounded scoring rule s defined only on the probabilities and proper there can be extended to an \((\mathcal M,{\text {LSI}})\)-proper scoring rule on all the credences.
Proof
The value s(p) of s is a function from \(\Omega \) to \({\mathbb {R}}\) and the set of such functions \({\mathbb {R}}^\Omega \) can be thought of as n-dimensional Euclidean space, where n is the cardinality \(|\Omega |\) of \(\Omega \). Let V be the topological closure of the set \(\{ -s(p) : p \in \mathcal P \}\). For any fixed \(u\in \mathcal M-\mathcal P\), the prevision \({\text {LSI}}_u\) is a continuous function from \({\mathbb {R}}^\Omega \) to \({\mathbb {R}}\) (Pruss, 2021b, Prop. 2) and since V is compact, it attains a maximum at one or more points of V. Choose any one of these points, and call it \(\alpha _u\).
The selection of \(\alpha _u\) for each u can be done as a direct application of the Axiom of Choice, but we can also do it constructively. Identifying \({\mathbb {R}}^\Omega \) with \({\mathbb {R}}^n\), order it lexicographically. The set of points of V where \({\text {LSI}}_u\) attains its maximum is closed (since it’s the pre-image of the closed set \(\{ \max _V {\text {LSI}}_u \}\) under the continuous function \({\text {LSI}}_u\)) and hence compact, and so it will have a lexicographically first element. Let \(\alpha _u\) be that element.
Then let \(s(u)=-\alpha _u\). For any \(u\in \mathcal M-\mathcal P\), the point \(-s(u)\) maximizes \({\text {LSI}}_u\) over V. The same can be seen to be true for \(u\in \mathcal P\) by propriety of our original score s on \(\mathcal P\), the fact that any point of V is a limit of a sequence of values of \(-s\), and the fact that \({\text {LSI}}_u\) agrees with \(E_u\) for u a probability. Then for any \(u,v\in \mathcal M\) we have \({\text {LSI}}_u (-s(u)) \ge {\text {LSI}}_u (-s(v))\) because \({\text {LSI}}_u\) is maximized over V at \(-s(u)\) while \(-s(v)\in V\). Finally, we need to define s(c) where \(c\in \mathcal C-\mathcal M\). The simplest solution is just to let \(s(c)(\omega )=\infty \) for all \(\omega \) (any point that is dominated by some point in V will also work). \(\square \)
Say that a credence c satisfies Subadditivity provided that \(c(A)+c(B)\le c(A\cup B)\) whenever A and B are disjoint. Given that our credences take values in [0, 1], Subadditivity implies Zero and Monotonicity. Recall that c is regular provided that \(c(A)>0\) whenever A is non-empty. Let \(\mathcal S\) be the regular credences that satisfy Normalization and Subadditivity. Say that a member f of \([-\infty ,\infty ]^\Omega \) is finite provided \(|f(\omega )|<\infty \) for all \(\omega \in \Omega \).
Theorem 3
Suppose \(\Omega \) has at least two points. Let \(s:\mathcal S\rightarrow [M,\infty ]^\Omega \) be a \((\mathcal S,{\text {LSI}})\)-proper scoring rule defined on \(\mathcal S\) and suppose that s(u) is finite for at least one \(u\in \mathcal S\). Then there is a probability p in \(\mathcal S\) and a non-probability r in \(\mathcal S\) such that (a) \(s(p)=s(r)\) everywhere, and (b) there is a point \(\omega \in \Omega \) at which p is truer than r.
The proof of the Theorem can actually be used to show that for almost all (in the sense of Lebesgue measure) regular probabilities p with finite score there is a non-probability \(r\in \mathcal S\) such that (a) and (b) are true.
Say that a scoring rule s is probability-distinguishing provided that if \(p\in \mathcal P\) and \(c\in \mathcal C-\mathcal P\), then \(s(p)(\omega )\ne s(c)(\omega )\) for some \(\omega \). Then Theorem 3 shows that no \((\mathcal S,{\text {LSI}})\)-proper scoring rule defined on \(\mathcal S\) with at least one finite score is probability-distinguishing.
Note that quasi-strict propriety makes it impossible for a regular probability p to have an infinite score, since then we would have \(E_p s(p)=\infty \).
Corollary 1
No \((\mathcal S,{\text {LSI}})\)-proper scoring rule on a space with at least two points is quasi-strictly proper or strictly truth-directed.
Write \(v \cdot w\) for the dot product of two vectors. The (convex) support function \(\sigma _K\) of a subset K of \({\mathbb {R}}^n\) is defined by:
\[ \sigma _K(v) = \sup _{z\in K} v\cdot z \]
for \(v\in {\mathbb {R}}^n\). As usual, we say that something happens for almost all members of a set if it happens everywhere except on a set of zero Lebesgue measure.
Lemma 1
Let \(K \subseteq (-\infty ,M]^n\) be a non-empty closed convex set for \(n\ge 2\). Then for almost all v in the positive orthant \((0,\infty )^n,\) there is a unique \(z\in K\) such that \(\sigma _K(v) = v\cdot z\).
We will write \(v_i\) for the ith component of a vector in \({\mathbb {R}}^n\). I am grateful to a Mathoverflow responder (user “alesia”, 2022) for the part of the proof after the reduction to bounded K.
Proof of Lemma 1
Without loss of generality \(0\in K\) (otherwise translate K and change M as needed), so \(\sigma _K(v) \ge 0\) for all v.
Fix \(\varepsilon >0\). Let \(Q_\varepsilon \) be the set of vectors v in the positive orthant such that \(v_i/|v| > \varepsilon \) for all i. We shall show our result restricted to vectors in \(Q_\varepsilon \), and the general result follows since \((0,\infty )^n = \bigcup _{k=1}^\infty Q_{1/k}\).
Next observe that without loss of generality we can take K to be bounded. For suppose that \(v\in Q_\varepsilon \) and \(z\in K\). If \(z_i < -(n-1)M/\varepsilon \) for some i, then
Thus if \(K' = K\cap [-(n-1)M/\varepsilon ,M]^n\), then \(\sigma _{K'}\) and \(\sigma _{K}\) are equal on \(Q_\varepsilon \) and the suprema defining them are attained at the exact same points.
The support function of any closed, bounded and convex set is Lipschitz (Molchanov, 2006, p. 421, Theorem F.1). A Lipschitz function on an open set in \({\mathbb {R}}^n\) is differentiable almost everywhere (Heinonen, 2012, p. 47, Theorem 6.15). And if the support function of K is differentiable at v, then there is a unique \(z\in K\) such that \(\sigma _K(v) = v\cdot z\) (see Rockafellar, 1970, Corollary 25.1.3, or, for a self-contained proof, Oyama, 2014, Theorem 1.1; note that results for the concave support function applied to the negative of the argument vector yield results for our convex support function \(\sigma _K\)). \(\square \)
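Lemma 1 can be illustrated on a polytope, where the supremum defining the support function is attained at a vertex. The sketch below assumes K is the convex hull of three hypothetical points in \((-\infty ,0]^2\); the names `VERTICES`, `support` and `maximizers` are illustrative. The maximizer over K is unique exactly when a single vertex attains the maximum.

```python
def dot(v, z):
    return sum(vi * zi for vi, zi in zip(v, z))

# Hypothetical vertices of a triangle K contained in (-inf, 0]^2.
VERTICES = [(0, -2), (-2, 0), (-3, -3)]

def support(v):
    """sigma_K(v): the maximum of v.z over the vertices equals the supremum over K."""
    return max(dot(v, z) for z in VERTICES)

def maximizers(v):
    """Vertices of K at which v.z attains sigma_K(v)."""
    best = support(v)
    return [z for z in VERTICES if dot(v, z) == best]

for v in [(2, 1), (1, 2), (1, 1)]:
    print(v, support(v), maximizers(v))
```

For this triangle the only directions in the positive orthant with more than one maximizer are the positive multiples of (1, 1), where a whole edge attains the supremum; these form a set of Lebesgue measure zero, in line with the lemma.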
Proof of Theorem 3
Let \(n=|\Omega |\). Suppose without loss of generality that \(\Omega =\{1,\dots ,n\}\). Let \(t=-s\), so \({\text {LSI}}_r t(r) \ge {\text {LSI}}_r t(u)\) for all \(r,u\in \mathcal S\) by the \((\mathcal S,{\text {LSI}})\)-propriety of s.
By abuse of notation, identify members of \({\mathbb {R}}^\Omega \) with members of \({\mathbb {R}}^n\). Let \(U\subset {\mathbb {R}}^n\) be the set of all finite t(u) for \(u\in \mathcal S\). Let K be the closed convex hull of U. By Lemma 1, let v in the positive orthant be such that for a unique \(z\in K\) we have \(\sigma _K(v) = v\cdot z\). Rescaling as needed, suppose \(\sum _{i=1}^n v_i = 1\).
Let p be the probability such that \(p(\{ i \}) = v_i\). Then for any \(w\in U\) we have \(w=t(u)\) for some u and so:
\[ v\cdot t(p) = {\text {LSI}}_p\, t(p) \ge {\text {LSI}}_p\, t(u) = v\cdot w. \]
By continuity and linearity of the inner product, it follows that \(v\cdot t(p) \ge v\cdot w\) for all \(w \in K\). Letting \(w=z\), we see that \(v \cdot z \le v\cdot t(p) \le v\cdot z\), and so by choice of z we must have \(z=t(p)\).
Let \(i_1,\ldots ,i_n\) be an enumeration of \(\{1,\dots ,n\}\) such that \(z_{i_1}\le \dots \le z_{i_n}\). Then for any credence u satisfying Zero and Normalization:
Let r be any credence such that \(r(A)=p(A)\) if \(A\ne \{ i_1 \}\) and \(0<r(\{i_1\})<p(\{i_1\})\). Then r satisfies Zero, Normalization and Subadditivity, and is regular, but is not a probability since \(\sum _{j=1}^n r(\{i_j\})<1\).
Observe that \({\text {LSI}}_p z\) and \({\text {LSI}}_r z\) are equal, because our formula for \({\text {LSI}}_u z\) does not depend on \(u(\{i_1\})\), and \(\{i_1\}\) is the only event p and r disagree on.
Recall that for any \(w\in {\mathbb {R}}^n\) (identified with \({\mathbb {R}}^\Omega \)) and credence u we have:
where \(\alpha \) is chosen so that \(\alpha +w_i \ge 0\) for all i. It follows that \({\text {LSI}}_r w \le {\text {LSI}}_p w\), since \(r(A)\le p(A)\) for every \(A\subseteq \Omega \).
Let \(w=t(r)\). Then
Thus \(v\cdot z = v \cdot w\), and hence by choice of z we must have \(w=z\), so \(s(r)=s(p)\). Moreover, p is truer than r at \(i_1\). \(\square \)
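The key step of the proof, that lowering the credence of the singleton where z is smallest leaves \({\text {LSI}}\, z\) unchanged, can be checked on a three-point example. The sketch below assumes the shift-based computation of the level set integral recalled in the proof; the helper `lsi`, the point masses and the vectors `z` and `w` are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

OMEGA = (1, 2, 3)
SUBSETS = [frozenset(s) for k in range(len(OMEGA) + 1) for s in combinations(OMEGA, k)]

def lsi(c, f):
    """Level set integral of a finite f for a credence c satisfying Zero and Normalization."""
    alpha = -min(f.values())                  # shift so that g = f + alpha >= 0
    g = {w: f[w] + alpha for w in f}
    total, prev = 0, 0
    for v in sorted(set(g.values())):
        level = frozenset(w for w in g if g[w] >= v)   # equals {g > y} for y in [prev, v)
        total += (v - prev) * c[level]
        prev = v
    return total - alpha

# A regular probability p built from hypothetical point masses.
masses = {1: F(1, 2), 2: F(1, 3), 3: F(1, 6)}
p = {A: sum((masses[w] for w in A), F(0)) for A in SUBSETS}

# z is smallest at outcome 1, so i_1 = 1; r agrees with p except on {1}.
z = {1: -3, 2: -1, 3: -2}
r = dict(p)
r[frozenset({1})] = F(1, 4)

# A vector that is largest at outcome 1, where the difference between p and r shows up.
w = {1: 0, 2: -5, 3: -5}

print(lsi(p, z), lsi(r, z), lsi(p, w), lsi(r, w))
```

Here r is regular and satisfies Zero and Normalization, its singleton credences sum to less than 1, and it evaluates z exactly as p does, while on w it gives a strictly smaller value than p.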
1.3 Continuity
Without loss of generality, suppose \(\Omega = \{ 1,\dots ,n \}\) for \(n\ge 2\). We show there is a strictly proper scoring rule that is probability-continuous at every probability other than \(\delta _1\) but where no non-probability is strictly score-dominated by any probability.
Let \(s_0\) be the logarithmic score on the probabilities, defined by
\[ s_0(p)(i) = -\log p(\{i\}) \]
where \(\log 0 = -\infty \). This is strictly proper.
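Strict propriety of the logarithmic score can be spot-checked numerically. The sketch below is an illustration on a three-point space with hypothetical probabilities, not a proof; the names `log_score` and `expected_score` are assumptions introduced here.

```python
import math

def log_score(c):
    """Logarithmic inaccuracy score: -log c({i}) at each outcome i, with -log 0 = +infinity."""
    return [(-math.log(ci) if ci > 0 else math.inf) for ci in c]

def expected_score(p, score):
    # Outcomes with zero probability are skipped, so 0 * infinity is treated as 0.
    return sum(pi * si for pi, si in zip(p, score) if pi > 0)

p = [0.5, 0.3, 0.2]
rivals = [[0.4, 0.4, 0.2], [1 / 3, 1 / 3, 1 / 3], [0.6, 0.3, 0.1], [1.0, 0.0, 0.0]]

own = expected_score(p, log_score(p))
for c in rivals:
    print(c, own < expected_score(p, log_score(c)))
```

By the lights of p, each rival has a strictly larger expected inaccuracy than p itself, and the extreme rival has infinite expected inaccuracy.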
Let s be a tweaked version of the logarithmic scoring rule where
We now check that this is strictly proper, i.e.,
\[ E_p s(p) < E_p s(c) \]
whenever p is a probability and \(p\ne c\). There are four cases.
Case 1: p and c are distinct probabilities other than \(\delta _1\). Then:
by the strict propriety of the logarithmic scoring rule.
Case 2: \(p=\delta _1\) and c is a probability other than \(\delta _1\). Then:
Case 3: \(p=\delta _1\) and c is not a probability. Then:
Case 4: \(p\ne \delta _1\) and either \(c=\delta _1\) or c is not a probability. Then \(p(\{i\})>0\) for some \(i\ne 1\), and \(s(c)(i)=\infty \) regardless of whether c is \(\delta _1\) or a non-probability. Hence, \(E_p s(c)=\infty \), and this is also what \(E_p s_0(\delta _1)\) equals. Furthermore, since \(p\ne \delta _1\), we have \(s(p)=s_0(p)\). Thus by strict propriety of \(s_0\):
All the cases have been checked, and s is strictly proper. But if c is not a probability, then c is not strictly s-dominated by any probability. For \(s(c)(1)=-1\), and all the scores of probabilities other than \(\delta _1\) are non-negative, so the only possible s-dominator of c is \(\delta _1\). But \(s(c)(2)=\infty =s(\delta _1)(2)\), so we cannot have strict domination.
The above example is an unbounded scoring rule.
1.4 Strict truth-directedness
The set of credences \(\mathcal C\) is the space of functions from the powerset of \(\Omega \) to [0, 1] and can be equipped in the natural way with \(2^{|\Omega |}\)-dimensional Euclidean topology. This agrees with the topology on \(\mathcal P\subset \mathcal C\) that was used to define probability–continuity.
If a scoring rule is proper but not probability-distinguishing, then it cannot be quasi-strictly proper and also it cannot satisfy the domination thesis (4). To see the latter point, observe that no score of a probability can be s-dominated by the score of a probability given propriety, since if p were s-dominated by q, then \(E_p s(p) > E_p s(q)\), contrary to propriety. So if the score of a non-probability c equaled that of a probability, we wouldn’t have the domination thesis for c.
Theorem 4
Let s be any proper truth-directed scoring rule defined on the probabilities \(\mathcal P\) on \(\Omega \) where \(|\Omega |=2\). Then s can be extended to a truth-directed, proper but not probability-distinguishing scoring rule defined on all of \(\mathcal C\). Furthermore, the extension can be taken to be a continuous function from \(\mathcal C\) to \([M,\infty ]\) if s is probability-continuous.
Proof
Without loss of generality \(\Omega =\{1,2\}\). Let \(p_\alpha \) be the probability such that \(p_\alpha (\{1\})=\alpha \). Note that \(p_\alpha \) is truer than \(p_\beta \) at 1 if and only if \(\alpha >\beta \) and at 2 if and only if \(\alpha <\beta \).
Let \(\alpha (c)=1/2+(c(\{1\})-c(\{2\}))/2\) for any credence c. Now define
for \(c\in \mathcal C\). Note that this agrees with the original definition on \(\mathcal P\), since if c is a probability, \(\alpha (c) = c(\{ 1\})\). For simplicity, write s in place of \(s'\).
We now need to show that s thus extended is truth-directed and proper but not probability-distinguishing.
Propriety is easy. Let p be any probability and c any credence. If c is a probability, we have \(E_p s(p) \le E_p s(c)\) by propriety restricted to the probabilities. If c is not a probability, we have \(E_p s(p) \le E_p s(p_{\alpha (c)}) \le E_p s(c)\), since \(s(c) \ge s(p_{\alpha (c)})\) everywhere.
Lack of probability distinguishing follows from the fact that if c satisfies Zero and Normalization but is not in \(\mathcal P\), then \(s(c)=s(p_{\alpha (c)})\) everywhere.
We now prove truth-directedness. All we need to prove is that if c is truer than d at 1, then \(s(c)(1) < s(d)(1)\); the case where c is truer than d at 2 is essentially the same. Furthermore, by forming a chain of credences between c and d that differ on only one event, we just need to prove that \(s(c)(1) < s(d)(1)\) in each of the following cases:
(i) c and d agree on all events except \(\varnothing \), where \(c(\varnothing ) < d(\varnothing ),\)
(ii) c and d agree on all events except \(\Omega \), where \(c(\Omega ) > d(\Omega ),\)
(iii) c and d agree on all events except \(\{1\}\), where \(c(\{1\}) > d(\{1\}),\)
(iv) c and d agree on all events except \(\{2\}\), where \(c(\{2\}) < d(\{2\}).\)
The inequality \(s(c)(1) < s(d)(1)\) is obvious in cases (i) and (ii).
Now suppose we have case (iii) or (iv). In both cases we have \(\alpha (c) > \alpha (d)\). Then \(p_{\alpha (c)}\) is truer at 1 than \(p_{\alpha (d)}\), and so by truth-directedness of s on \(\mathcal P\) we have \(s(c)(1) = s(p_{\alpha (c)})(1) < s(p_{\alpha (d)})(1) = s(d)(1)\).
Finally, the continuity claim is clear from our definition of the extension s. \(\square \)
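The two properties of the map \(\alpha \) that the proof relies on can be verified with exact arithmetic. The sketch below takes only the credences in \(\{1\}\) and \(\{2\}\) as inputs; the function name `alpha` mirrors the notation of the proof, and the sample values are hypothetical.

```python
from fractions import Fraction as F

def alpha(c1, c2):
    """alpha(c) = 1/2 + (c({1}) - c({2})) / 2, where c1 = c({1}) and c2 = c({2})."""
    return F(1, 2) + (F(c1) - F(c2)) / 2

# For a probability, c({2}) = 1 - c({1}), so alpha(c) recovers c({1}).
a = F(3, 10)
print(alpha(a, 1 - a))

# Case (iii): c and d differ only on {1}, with c({1}) > d({1}).
print(alpha(F(3, 5), F(1, 2)), alpha(F(2, 5), F(1, 2)))

# Case (iv): c and d differ only on {2}, with c({2}) < d({2}).
print(alpha(F(2, 5), F(1, 5)), alpha(F(2, 5), F(1, 2)))
```

In both cases the credence that is truer at 1 receives the larger value of \(\alpha \), which is what lets truth-directedness on the probabilities carry over to the extension.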
About this article
Cite this article
Pruss, A.R. The dialectics of accuracy arguments for probabilism. Synthese 201, 153 (2023). https://doi.org/10.1007/s11229-023-04145-y
DOI: https://doi.org/10.1007/s11229-023-04145-y