The dialectics of accuracy arguments for probabilism

  • Original Research
  • Synthese

Abstract

Scoring rules measure the deviation between a credence assignment and reality. Probabilism holds that only those credence assignments that satisfy the axioms of probability are rationally admissible. Accuracy-based arguments for probabilism observe that given certain conditions on a scoring rule, the score of any non-probability is dominated by the score of a probability. The conditions in the arguments we will consider include propriety: the claim that the expected accuracy of p is not beaten by the expected accuracy of any other credence c by the lights of p if p is a probability. I argue that if we think through how a non-probabilist can respond to pragmatic arguments for probabilism, we will expect the non-probabilist to accept a condition stronger than propriety for the same reasons that the probabilist gives for propriety, but this stronger condition is incompatible with the other conditions that the probabilist needs to run the accuracy argument. This makes it unlikely for the probabilist’s argument to be compelling.

Notes

  1. This paper is focused on arguments that credences should be probabilities, and hence will pass over Lindley’s (1982) very interesting argument that, given some technical assumptions on an additive scoring rule (notably not including propriety), the “admissible” credence assignments can be transformed into probabilities while preserving order. Here a credence assignment is admissible provided its scores are not weakly dominated by the scores of other credence assignments, where one outcome is weakly dominated by another provided that in all circumstances the second is at least as good, and in at least one it is better. To take this argument as establishing probabilism would require thinking that two order-isomorphic patterns of credence assignment are equivalent, so that the admissible credences are essentially probabilities. But order-isomorphism is not sufficient for real equivalence of credence assignments. Imagine that Alice always assigns a credence that is equal to the square root of the credence that Bob assigns. Then although there is an order-preserving transformation between their patterns of credence assignments, their rational betting behavior will be different: when Alice’s credence in an event is 1/2, so that she accepts even odds on it, Bob’s credence is 1/4, and he does not. Nonetheless, showing order isomorphism of admissible credences to probabilities is a significant result and provides some evidence for probabilism.

  2. Some of our discussion will then be simplified by not considering negative credences and credences greater than one. The non-probabilist we will consider places some reasonable constraints on credences, and requiring credences to range from 0 to 1 certainly seems reasonable.

  3. In the setting of Pruss (2021b), a wager (perhaps “wager portfolio” would be a better term) is a sequence of event-payoff pairs, with the utility function defined in terms of these. Different wagers can yield the same utility function: a wager that yields \( \$2\) on heads as well as yielding \( \$3\) on heads-or-tails has the same utility function as a wager that yields \( \$5\) on heads as well as yielding \( \$3\) on tails. By assuming that wagers are compared by their utility functions, we are simplifying and assuming that wagers with the same utility function are interchangeable. This assumption is not satisfied for every method of linking decisions to credences. It is not satisfied, for instance, by the method presupposed by de Finetti’s pragmatic arguments for probabilism (de Finetti, 1937). In the context of preference comparisons derived from previsions, the interchangeability of portfolios with the same utility function corresponds to saying that the prevision is “integral-like” (Pruss, 2021b). Because our task in this paper is to consider how the accuracy arguments for probabilism fare against the most plausible versions of non-probabilism, and because a preference comparison that fails to be indifferent between two portfolios that have exactly the same utility function (say, because the portfolios arrange the wagers in different but logically equivalent ways) is eo ipso problematic, integral-likeness is a reasonable assumption in our context. We should not expect a smart non-probabilist to distinguish equivalent wagers.

  4. Our \({\text {LSI}}_c\) corresponds to \({\text {LSI}}_c^\uparrow \) in Pruss (2021b). We ignore \({\text {LSI}}_c^\pm \), as we would expect a utility prevision to commute with positive affine transformations, since utilities are normally thought to be defined only up to positive affine transformations, and \({\text {LSI}}_c^\uparrow \) commutes in this way if c satisfies Zero and Normalization (Pruss, 2021b, Lemma 1) while \({\text {LSI}}_c^\pm \) does not in general.

  5. Hájek (2008) notes that it is important to also show that if c is a probability, then there is no such p. Since this follows from the propriety assumption which says that \(E_c s(c) \le E_c s(p)\) if c is a probability, we will omit this point for brevity in discussions.

  6. In the case of mathematical expectation for a probability r, we have \(E_r(-f)=-E_r(f)\) and so our condition is equivalent to the more familiar \(E_r s(r) \le E_r s(c)\). However, it is in general not true that \({\text {LSI}}_r(-f)=-{\text {LSI}}_r(f)\): see the Appendix for what is actually true.

  7. Additionally, the result in Pruss (2016) suggests that there may be natural significant credence thresholds at 3/4, 15/16, 255/256, and so on. For when a perfect Bayesian’s credence in p exceeds 3/4, the perfect Bayesian is in a position to favor (i.e., assign credence greater than 1/2) the claim that her credence in p will never dip below 1/2, and when the credence in p exceeds 15/16, she is in a position to assign a credence greater than 3/4 that her credence in p will never dip below 3/4, and so on.

  8. The assumption that at least one regular credence has a finite score is very plausible. First, if a regular probability has a score that’s somewhere infinite, the expected inaccuracy \(E_p s(p)\) of a regular probability p by its own lights will be infinite, and it is implausible that some probability, especially a regular one, would expect itself to be infinitely inaccurate. Hence, it is reasonable to think that every regular probability would have a finite score. If every regular probability has an infinite score somewhere, and we have propriety, then every irregular probability would also have an infinite score somewhere, since by propriety we must have \(\infty = E_p s(p) \le E_p s(q)\) for any probability q and any regular probability p. Therefore, the propriety inequality \(E_p s(p) \le E_p s(q)\) would hold only in the degenerate \(\infty \le \infty \) form for any regular probability p and any probability q. Such a scoring rule would not be useful for guiding epistemic practice.

  9. It is worth noting that in the infinite case, there is a third highly technical solution. It is possible to construct strictly proper scoring rules in infinite contexts if, instead of requiring the values of the scores to be extended real numbers, we allow scores to take values in some larger set such as nets of real numbers (Pruss, 2022); it is not known at present whether an argument for probabilism can be run in infinite contexts using this approach. This way out of the negative results does not appear to have a parallel in our finite-space non-probabilist context.

  10. Nor should it much help the “extremist” case if the only known strictly \((\mathcal E,E)\)-proper scoring rule s on all credences, where \(\mathcal E\) is the set of extreme probabilities, were such that if c is not an extreme probability, then c is strictly s-dominated by some extreme probability p. For a trivial example, one can let \(s(c)(\omega )=1-c(\{\omega \})\) if c is an extreme probability and \(s(c)(\omega )=2\) otherwise.

  11. I am grateful to Michael Nielsen for drawing my attention to \(\mathcal P_0\) and asking whether there is a strictly \((\mathcal P_0,E)\)-proper score.

  12. Recently Michael Nielsen has suggested to me that only probabilities in \(\mathcal P_0\) are rationally acceptable, and so perhaps the argument in favor of \(\mathcal P_0\) should be taken more seriously, despite the fact that it rules out the fair coin product measure which seems paradigmatically rational.

  13. I am grateful to Michael Nielsen for discussions of these topics, to two anonymous readers for a careful reading, and to one anonymous reader for a number of important observations, the discussion of which has greatly enhanced the paper.
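
Note 3's example of two differently structured wagers with the same utility function can be checked mechanically. The sketch below is illustrative only: it assumes \(\Omega =\{H,T\}\), treats dollar payoffs as utilities, and represents a wager as a list of (event, payoff) pairs; the helper name `utility_function` is hypothetical and not taken from Pruss (2021b).

```python
def utility_function(wager, outcomes):
    """Sum the payoffs of a wager portfolio at each outcome.

    A wager is a list of (event, payoff) pairs; an event is a set of
    outcomes, and its payoff is received at every outcome in the event.
    Payoffs are treated as utilities (an assumption for illustration).
    """
    totals = {w: 0 for w in outcomes}
    for event, payoff in wager:
        for w in event:
            totals[w] += payoff
    return totals


OMEGA = ("H", "T")
wager_1 = [({"H"}, 2), ({"H", "T"}, 3)]  # $2 on heads, $3 on heads-or-tails
wager_2 = [({"H"}, 5), ({"T"}, 3)]       # $5 on heads, $3 on tails

u1 = utility_function(wager_1, OMEGA)
u2 = utility_function(wager_2, OMEGA)
print(u1, u2, u1 == u2)  # {'H': 5, 'T': 3} {'H': 5, 'T': 3} True
```

An integral-like prevision sees only `u1` and `u2`, so on this assumption it must evaluate the two portfolios identically.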
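
The thresholds listed in note 7 fit the recurrence \(t_{k+1}=1-(1-t_k)^2\) starting from \(t_0=1/2\). The sketch below checks this with exact fractions. It additionally assumes, which note 7 does not state, that the relevant guarantee is the standard optional-stopping bound: if credence evolves as a martingale in [0, 1] starting at x, the probability that it never dips below a is at least \((x-a)/(1-a)\). Under that assumption each threshold is exactly the point at which the guaranteed credence reaches the previous threshold. The function names are hypothetical.

```python
from fractions import Fraction


def next_threshold(t):
    # Recurrence fitting note 7: 1/2 -> 3/4 -> 15/16 -> 255/256 -> ...
    return 1 - (1 - t) ** 2


def never_dip_bound(x, a):
    # ASSUMED bound: a [0, 1]-valued martingale started at x stays at or
    # above a forever with probability at least (x - a) / (1 - a).
    return (x - a) / (1 - a)


thresholds = [Fraction(1, 2)]
for _ in range(4):
    thresholds.append(next_threshold(thresholds[-1]))
print([str(t) for t in thresholds])
# ['1/2', '3/4', '15/16', '255/256', '65535/65536']

# At each threshold the bound lands exactly on the previous threshold,
# so any credence strictly above it guarantees strictly more.
for prev, cur in zip(thresholds, thresholds[1:]):
    assert never_dip_bound(cur, prev) == prev
```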
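
The trivial scoring rule in note 10 can be checked by brute force on a small space. The sketch below assumes \(\Omega =\{0,1,2\}\) and tests only a few sample credences (the extreme probabilities, a uniform probability, and an arbitrary non-probability). Since expectation with respect to an extreme probability \(\delta _a\) just reads off the score at a, strict \((\mathcal E,E)\)-propriety reduces to comparing scores at a. Names such as `delta` and `score` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = (0, 1, 2)
EVENTS = [frozenset(e) for k in range(len(OMEGA) + 1)
          for e in combinations(OMEGA, k)]


def delta(w):
    """The extreme probability concentrated on outcome w."""
    return {A: Fraction(int(w in A)) for A in EVENTS}


EXTREME = [delta(w) for w in OMEGA]


def score(c, w):
    """Note 10's rule: 1 - c({w}) on extreme probabilities, 2 otherwise."""
    if c in EXTREME:
        return 1 - c[frozenset([w])]
    return Fraction(2)


uniform = {A: Fraction(len(A), len(OMEGA)) for A in EVENTS}
flat = {A: Fraction(1, 2) for A in EVENTS}  # not a probability
SAMPLES = EXTREME + [uniform, flat]

# Strict (E, E)-propriety: s(delta_a)(a) < s(c)(a) whenever c != delta_a.
for a in OMEGA:
    for c in SAMPLES:
        if c != delta(a):
            assert score(delta(a), a) < score(c, a)

# Every sampled non-extreme credence is strictly dominated by every
# extreme probability.
for c in (uniform, flat):
    for p in EXTREME:
        assert all(score(p, w) < score(c, w) for w in OMEGA)
```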

References

  • Campbell-Moore, C., & Levinstein, B. A. (2021). Strict propriety is weak. Analysis, 81, 8–13.

  • Dawid, A. P., & Musio, M. (2014). Theory and applications of proper scoring rules. Metron, 72, 169–183.

  • de Finetti, B. (1937). Foresight: Its logical laws, its subjective sources. In H. E. Kyburg & H. E. Smokler (Eds.), Studies in subjective probability. Krieger Publishing.

  • Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359–378.

  • Greaves, H., & Wallace, D. (2006). Justifying conditionalization: Conditionalization maximizes expected epistemic utility. Mind, 115, 607–632.

  • Hájek, A. (2008). Arguments for, or against, probabilism? British Journal for the Philosophy of Science, 59, 793–819.

  • Heinonen, J. (2012). Lectures on analysis on metric spaces. Springer.

  • Joyce, J. M. (1998). A nonpragmatic vindication of probabilism. Philosophy of Science, 65, 575–603.

  • Joyce, J. M. (2009). Accuracy and coherence: Prospects for an alethic epistemology of partial belief. In F. Huber & C. Schmidt-Petri (Eds.), Degrees of belief (pp. 263–297). Springer.

  • Lindley, D. V. (1982). Scoring rules and the inevitability of probability. International Statistical Review, 50, 1–11.

  • Molchanov, I. (2006). Theory of random sets. Springer.

  • Nielsen, M. (2022). On the best accuracy arguments for probabilism. Philosophy of Science, 89, 621–630.

  • Norton, J. D. (2021). The material theory of induction. University of Calgary Press.

  • Oyama, D. (2014). On the differentiability of the support function: Mathematical notes for advanced microeconomics. http://www.oyama.e.u-tokyo.ac.jp/notes/diffSuppFunc01.pdf

  • Pettigrew, R. (2021). Accuracy-first epistemology without the additivity axiom. Philosophy of Science, 89, 128–151.

  • Pettigrew, R. (2019). On the expected utility objection to the Dutch Book argument for probabilism. Noûs, 55, 23–38.

  • Pettigrew, R. (2016). Accuracy and the laws of credence. Oxford University Press.

  • Predd, J. B., Seiringer, R., Lieb, E. H., Osherson, D. N., Poor, H. V., & Kulkarni, S. R. (2009). Probabilistic coherence and proper scoring rules. IEEE Transactions on Information Theory, 55, 4786–4792.

  • Pruss, A. R. (2016). Being sure and being confident that you won’t lose confidence. Logos and Episteme, 7, 45–54.

  • Pruss, A. R. (2021a). Proper scoring rules and domination. https://arxiv.org/abs/2102.02260

  • Pruss, A. R. (2021b). Avoiding Dutch Books despite inconsistent credences. Synthese, 198, 11265–11289.

  • Pruss, A. R. (2022). Accuracy, probabilism and Bayesian update in infinite domains. Synthese, 200, 444.

  • Pruss, A. R. (2023a). Necessary and sufficient conditions for domination results for proper scoring rules. Review of Symbolic Logic (forthcoming). https://arxiv.org/abs/2103.00085

  • Pruss, A. R. (2023b). Domination for finite proper scoring rules (manuscript). http://philsci-archive.pitt.edu/21871

  • Ramsey, F. P. (1931). Truth and probability. In The foundations of mathematics and other logical essays (pp. 156–198). Routledge and Kegan Paul.

  • Rockafellar, R. T. (1970). Convex analysis. Princeton University Press.

  • Schervish, M. J., Seidenfeld, T., & Kadane, J. B. (2009). Proper scoring rules, dominated forecasts, and coherence. Decision Analysis, 6, 202–221.

  • User “alesia”. (2022). Answer to: “For most directions does the supporting hyperplane meeting a bounded convex set meet it at one point?” Mathoverflow. https://mathoverflow.net/questions/432080

  • Winkler, R. L., Munoz, J., Cervera, J. L., Bernardo, J. M., Blattenberger, G., Kadane, J. B., Lindley, D. V., Murphy, A. H., Oliver, R. M., & Ríos-Insua, D. (1996). Scoring rules and the evaluation of probabilities. Test, 5, 1–60.

Acknowledgements

No funds, grants, or other support were received.

Author information

Correspondence to Alexander R. Pruss.

Ethics declarations

Conflict of interest

The author has no relevant financial or non-financial interests to disclose.

Appendix: Some technical results

1.1 Level set integrals and monotonic credences

Given a credence c, let \(c^*(A)=1-c(\Omega -A)\) (for a probability p we have \(p^*=p\)). Then given Zero and Normalization, we have \({\text {LSI}}_c (-f) = -{\text {LSI}}_{c^*} f\) when f is a function that takes values either in \((-\infty ,\infty ]\) or in \([-\infty ,\infty )\). We only need to check this for f having finite values. Moreover, because \({\text {LSI}}_c (\alpha +f)=\alpha +{\text {LSI}}_c f\) for every real \(\alpha \) (by the well-definition of \({\text {LSI}}_c f\)), and likewise for \(c^*\), we may suppose f takes values in [0, L] for some finite L, and then:

$$\begin{aligned} {\text {LSI}}_c f&= \int _0^L c(\{ \omega : f(\omega )> y \}) \, dy \\&= \int _0^L (1-c^*(\{ \omega : -f(\omega ) \ge -y \})) \, dy \\&= L - \int _0^L c^*(\{ \omega : -f(\omega ) \ge -y \}) \, dy \\&= L - \int _0^L c^*(\{ \omega : -f(\omega )> -y \}) \, dy \\&= L - \int _0^L c^*(\{ \omega : -f(\omega )> t-L \}) \, dt \\&= L - \int _0^\infty c^*(\{ \omega : L-f(\omega ) > t \}) \, dt \\&= -{\text {LSI}}_{c^*} (-f), \end{aligned}$$

where the first equality used Zero and the extension of the upper limit of integration from L to \(\infty \) in the penultimate equality used Normalization (both applied to c), and the move from considering the level set \(\{ \omega : -f(\omega ) \ge -y \}\) to considering the level set \(\{ \omega : -f(\omega ) > -y \}\) depended on the fact that the two level sets are equal except perhaps when y is one of the finitely many values of f, which does not affect the integral.
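
The identity \({\text {LSI}}_c (-f) = -{\text {LSI}}_{c^*} f\) can be spot-checked numerically. The sketch below assumes \(\Omega =\{1,2,3\}\) and finite-valued f, represents a credence as a dictionary from frozensets to exact fractions, and computes the level set integral in its sorted-values form: \(\min f\) plus, for each pair of consecutive distinct values \(y_k<y_{k+1}\) of f, the term \((y_{k+1}-y_k)\,c(\{\omega : f(\omega )>y_k\})\). It also shows, for a non-probability, that the naive identity \({\text {LSI}}_c(-f)=-{\text {LSI}}_c f\) fails, as note 6 indicates. All function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations

OMEGA = (1, 2, 3)
FULL = frozenset(OMEGA)
EVENTS = [frozenset(e) for k in range(len(OMEGA) + 1)
          for e in combinations(OMEGA, k)]


def lsi(c, f):
    """Level set integral of a finite-valued f (dict outcome -> value)."""
    values = sorted(set(f.values()))
    total = values[0]
    for lo, hi in zip(values, values[1:]):
        total += (hi - lo) * c[frozenset(w for w in f if f[w] > lo)]
    return total


def dual(c):
    """c*(A) = 1 - c(Omega - A)."""
    return {A: 1 - c[FULL - A] for A in EVENTS}


def random_credence(rng):
    """Arbitrary [0, 1] values, then Zero and Normalization imposed."""
    c = {A: Fraction(rng.randint(0, 100), 100) for A in EVENTS}
    c[frozenset()], c[FULL] = Fraction(0), Fraction(1)
    return c


def check_duality(trials, seed):
    rng = random.Random(seed)
    for _ in range(trials):
        c = random_credence(rng)
        f = {w: Fraction(rng.randint(-50, 50), 10) for w in OMEGA}
        minus_f = {w: -v for w, v in f.items()}
        if lsi(c, minus_f) != -lsi(dual(c), f):
            return False
    return True


print(check_duality(1000, seed=0))  # True

# The naive identity fails for a non-probability:
r = {A: Fraction(2, 5) for A in EVENTS}
r[frozenset()], r[FULL] = Fraction(0), Fraction(1)
f = {1: Fraction(1), 2: Fraction(0), 3: Fraction(0)}
print(lsi(r, {w: -v for w, v in f.items()}), -lsi(r, f))  # -3/5 -2/5
```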

In Pruss (2021b, Lemma 1g) it is incorrectly stated (with some trivial translation to our setting) that if f is negative and finite, then \({\text {LSI}}_c f = -{\text {LSI}}_c (-f)\). The right-hand side should instead be \(- {\text {LSI}}_{c^*} (-f)\). The only place where Pruss (2021b) uses the incorrect claim appears to be in the proof of Theorem 1 in the special case of \({\text {LSI}}^\uparrow _P\), where it is shown that decisions using level set integrals avoid Dutch Books. To fix the problem, in the statement of the theorem in the case of \({\text {LSI}}^\uparrow _P\) one needs to replace Non-Negativity with the axiom that credences have value at most 1, and instead of the argument given in the proof, use the formula \({\text {LSI}}^\uparrow _P f = -{\text {LSI}}^\uparrow _{P^*} (-f)\) to establish that \({\text {LSI}}^\uparrow _P f < 0\) if \(f<0\) everywhere, noting that \(P^*\) satisfies Non-Negativity if \(P\le 1\) everywhere.

The following extends one of the monotonicity results from Pruss (2021b):

Theorem 1

If \(c\in \mathcal M\) and f and g are functions on \(\Omega \) with values in \([-\infty ,\infty ]\) such that \(f<g\) everywhere, then \({\text {LSI}}_c f < {\text {LSI}}_c g,\) with both level set integrals well-defined.

Proof

Since \(f<g\) everywhere, g cannot take the value \(-\infty \) anywhere and f cannot take the value \(+\infty \) anywhere. Let \(M_0=1+\max (\max f,\max (-g))\). This is finite, and if \(M\ge M_0\), then

$$\begin{aligned} f_M \le f_{M_0} < g_{M_0} \le g_M \end{aligned}$$

everywhere. By Pruss (2021b, Theorem 2) we then have

$$\begin{aligned} {\text {LSI}}_c f_M \le {\text {LSI}}_c f_{M_0} < {\text {LSI}}_c g_{M_0} \le {\text {LSI}}_c g_M. \end{aligned}$$

Taking the limit as \(M\rightarrow \infty \), we conclude that \({\text {LSI}}_c f < {\text {LSI}}_c g\). \(\square \)

1.2 Propriety

Theorem 2

Any bounded scoring rule s defined only on the probabilities and proper there can be extended to an \((\mathcal M,{\text {LSI}})\)-proper scoring rule on all the credences.

Proof

The value s(p) of s is a function from \(\Omega \) to \({\mathbb {R}}\) and the set of such functions \({\mathbb {R}}^\Omega \) can be thought of as n-dimensional Euclidean space, where n is the cardinality \(|\Omega |\) of \(\Omega \). Let V be the topological closure of the set \(\{ -s(p) : p \in \mathcal P \}\). For any fixed \(u\in \mathcal M-\mathcal P\), the prevision \({\text {LSI}}_u\) is a continuous function from \({\mathbb {R}}^\Omega \) to \({\mathbb {R}}\) (Pruss, 2021b, Prop. 2), and since V is compact (it is closed, and bounded because s is bounded), \({\text {LSI}}_u\) attains a maximum at one or more points of V. Choose any one of these points, and call it \(\alpha _u\).

The selection of \(\alpha _u\) for each u can be done as a direct application of the Axiom of Choice, but we can also do it constructively. Identifying \({\mathbb {R}}^\Omega \) with \({\mathbb {R}}^n\), order it lexicographically. The set of points of V where \({\text {LSI}}_u\) attains its maximum is closed (since it’s the pre-image of the closed set \(\{ \max _V {\text {LSI}}_u \}\) under the continuous function \({\text {LSI}}_u\)) and hence compact, and so it will have a lexicographically first element. Let \(\alpha _u\) be that element.

Then let \(s(u)=-\alpha _u\). For any \(u\in \mathcal M-\mathcal P\), the point \(-s(u)\) maximizes \({\text {LSI}}_u\) over V. The same can be seen to be true for \(u\in \mathcal P\) by propriety of our original score s on \(\mathcal P\), the fact that any point of V is a limit of a sequence of values of \(-s\), and the fact that \({\text {LSI}}_u\) agrees with \(E_u\) for u a probability. Then for any \(u,v\in \mathcal M\) we have \({\text {LSI}}_u (-s(u)) \ge {\text {LSI}}_u (-s(v))\) because \({\text {LSI}}_u\) is maximized over V at \(-s(u)\) while \(-s(v)\in V\). Finally, we need to define s(c) where \(c\in \mathcal C-\mathcal M\). The simplest solution is just to let \(s(c)(\omega )=\infty \) for all \(\omega \) (any point that is dominated by some point in V will also work). \(\square \)
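
The construction in the proof can be imitated on a finite grid. The sketch below is only an illustration under simplifying assumptions not made in the theorem: \(\Omega =\{1,2\}\); the bounded proper score on the probabilities is taken, hypothetically, to be the Brier score; and V is replaced by the finite set of negated Brier scores of the probabilities with \(p(\{1\})\in \{0,1/10,\dots ,1\}\), so compactness is automatic. Each sampled credence u (all satisfy Zero, Normalization and monotonicity) receives \(s(u)=-\alpha _u\), where \(\alpha _u\) is the lexicographically first maximizer of \({\text {LSI}}_u\) over that set. Function names are hypothetical.

```python
from fractions import Fraction

OMEGA = (1, 2)
EMPTY, ONE, TWO, FULL = frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})
GRID = [Fraction(k, 10) for k in range(11)]


def lsi(c, f):
    """Level set integral of a finite-valued f (dict outcome -> value)."""
    values = sorted(set(f.values()))
    total = values[0]
    for lo, hi in zip(values, values[1:]):
        total += (hi - lo) * c[frozenset(w for w in f if f[w] > lo)]
    return total


def credence(x1, x2):
    """Credence with c({1}) = x1, c({2}) = x2, plus Zero and Normalization."""
    return {EMPTY: Fraction(0), ONE: x1, TWO: x2, FULL: Fraction(1)}


def brier(a):
    """Brier score of the probability with p({1}) = a, as outcome -> score."""
    return {1: 2 * (1 - a) ** 2, 2: 2 * a ** 2}


# Finite stand-in for V: negated scores of the grid probabilities.
V = [{w: -x for w, x in brier(a).items()} for a in GRID]


def extended_score(u):
    best = max(lsi(u, z) for z in V)
    maximizers = [z for z in V if lsi(u, z) == best]
    alpha = min(maximizers, key=lambda z: (z[1], z[2]))  # lexicographically first
    return {w: -alpha[w] for w in OMEGA}


CREDENCES = [credence(x1, x2) for x1 in GRID for x2 in GRID]
SCORES = [extended_score(u) for u in CREDENCES]


def is_proper():
    """LSI_u(-s(u)) >= LSI_u(-s(v)) for all sampled u, v."""
    for u, su in zip(CREDENCES, SCORES):
        own = lsi(u, {w: -x for w, x in su.items()})
        for sv in SCORES:
            if lsi(u, {w: -x for w, x in sv.items()}) > own:
                return False
    return True


print(is_proper())  # True
```

On the grid probabilities the extension returns the Brier score itself, mirroring the proof's remark that \(-s(u)\) already maximizes \({\text {LSI}}_u=E_u\) over V when u is a probability.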

Say that a credence c satisfies Subadditivity provided that \(c(A)+c(B)\le c(A\cup B)\) whenever A and B are disjoint. Given that our credences take values in [0, 1], Subadditivity implies Zero and Monotonicity. Recall that c is regular provided that \(c(A)>0\) whenever A is non-empty. Let \(\mathcal S\) be the regular credences that satisfy Normalization and Subadditivity. Say that a member f of \([-\infty ,\infty ]^\Omega \) is finite provided \(|f(\omega )|<\infty \) for all \(\omega \in \Omega \).

Theorem 3

Suppose \(\Omega \) has at least two points. Let \(s:\mathcal S\rightarrow [M,\infty ]^\Omega \) be a \((\mathcal S,{\text {LSI}})\)-proper scoring rule defined on \(\mathcal S\) and suppose that s(u) is finite for at least one \(u\in \mathcal S\). Then there is a probability p in \(\mathcal S\) and a non-probability r in \(\mathcal S\) such that (a) \(s(p)=s(r)\) everywhere, and (b) there is a point \(\omega \in \Omega \) at which p is truer than r.

The proof of the Theorem can actually be used to show that for almost all (in the sense of Lebesgue measure) regular probabilities p with finite score there is a non-probability \(r\in \mathcal S\) such that (a) and (b) are true.

Say that a scoring rule s is probability-distinguishing provided that if \(p\in \mathcal P\) and \(c\in \mathcal C-\mathcal P\), then \(s(p)(\omega )\ne s(c)(\omega )\) for some \(\omega \). Then Theorem 3 shows that no \((\mathcal S,{\text {LSI}})\)-proper scoring rule defined on \(\mathcal S\) with at least one finite score is probability-distinguishing.

Note that quasi-strict propriety makes it impossible for a regular probability p to have an infinite score, since then we would have \(E_p s(p)=\infty \).

Corollary 1

No \((\mathcal S,{\text {LSI}})\)-proper scoring rule on a space with at least two points is quasi-strictly proper or strictly truth-directed.

Write \(v \cdot w\) for the dot product of two vectors. The (convex) support function \(\sigma _K\) of a subset K of \({\mathbb {R}}^n\) is defined by:

$$\begin{aligned} \sigma _K(v) = \sup _{z\in K} v\cdot z \end{aligned}$$

for \(v\in {\mathbb {R}}^n\). As usual, we say that something happens for almost all members of a set if it happens everywhere except on a set of zero Lebesgue measure.

Lemma 1

Let \(K \subseteq (-\infty ,M]^n\) be a non-empty closed convex set for \(n\ge 2\). Then for almost all v in the positive orthant \((0,\infty )^n,\) there is a unique \(z\in K\) such that \(\sigma _K(v) = v\cdot z\).

We will write \(v_i\) for the ith component of a vector in \({\mathbb {R}}^n\). I am grateful to a Mathoverflow responder (user “alesia”, 2022) for the part of the proof after the reduction to bounded K.

Proof of Lemma 1

Without loss of generality \(0\in K\) (otherwise translate K and change M as needed), so \(\sigma _K(v) \ge 0\) for all v.

Fix \(\varepsilon >0\). Let \(Q_\varepsilon \) be the set of vectors v in the positive orthant such that \(v_i/|v| > \varepsilon \) for all i. We shall show our result restricted to vectors in \(Q_\varepsilon \), and the general result follows since \((0,\infty )^n = \bigcup _{k=1}^\infty Q_{1/k}\).

Next observe that without loss of generality we can take K to be bounded. For suppose that \(v\in Q_\varepsilon \) and \(z\in K\). If \(z_i < -(n-1)M/\varepsilon \) for some i, then

$$\begin{aligned} v\cdot z&< -(\varepsilon |v|) (n-1)M/\varepsilon + (n-1)M|v| = 0 \\&\le \sigma _K(v). \end{aligned}$$

Thus if \(K' = K\cap [-(n-1)M/\varepsilon ,M]^n\), then \(\sigma _{K'}\) and \(\sigma _{K}\) are equal on \(Q_\varepsilon \) and the suprema defining them are attained at the exact same points.

The support function of any closed, bounded and convex set is Lipschitz (Molchanov, 2006, p. 421, Theorem F.1). A Lipschitz function on an open set in \({\mathbb {R}}^n\) is differentiable almost everywhere (Heinonen, 2012, p. 47, Theorem 6.15). And if the support function of K is differentiable at v, then there is a unique \(z\in K\) such that \(\sigma _K(v) = v\cdot z\) (see Rockafellar, 1970, Corollary 25.1.3, or, for a self-contained proof, Oyama, 2014, Theorem 1.1; note that results for the concave support function applied to the negative of the argument vector yield results for our convex support function \(\sigma _K\)). \(\square \)
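
For a polytope the conclusion of Lemma 1 can be seen directly. If K is the convex hull of finitely many points, \(\sigma _K(v)\) is the largest dot product of v with those points; if exactly one point attains it, the maximizer over all of K is unique, and a tie requires v to be orthogonal to the difference of two of the points, which confines the exceptions to finitely many hyperplanes (a Lebesgue-null set). The sketch below assumes such a K in \({\mathbb {R}}^2\) with hypothetical generating points; it is an illustration, not part of the proof.

```python
import random


def support(points, v):
    """sigma_K(v) for K = convex hull of points, and the points attaining it.

    A linear function attains its maximum over a convex hull at one of the
    generating points, so a finite max suffices.
    """
    dots = [sum(vi * zi for vi, zi in zip(v, z)) for z in points]
    best = max(dots)
    return best, [z for z, d in zip(points, dots) if d == best]


POINTS = [(0, 0), (1, 0), (0, 1), (-3, -1)]  # hypothetical generators of K

print(support(POINTS, (2, 1)))  # (2, [(1, 0)])          unique maximizer
print(support(POINTS, (1, 1)))  # (1, [(1, 0), (0, 1)])  tie when v1 == v2

# Random directions in the positive orthant essentially never produce a tie.
rng = random.Random(0)
ties = sum(
    len(support(POINTS, (rng.random(), rng.random()))[1]) > 1
    for _ in range(10_000)
)
print(ties)
```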

Proof of Theorem 3

Let \(n=|\Omega |\). Suppose without loss of generality that \(\Omega =\{1,\dots ,n\}\). Let \(t=-s\), so \({\text {LSI}}_r t(r) \ge {\text {LSI}}_r t(u)\) for all \(r,u\in \mathcal S\) by the \((\mathcal S,{\text {LSI}})\)-propriety of s.

By abuse of notation, identify members of \({\mathbb {R}}^\Omega \) with members of \({\mathbb {R}}^n\). Let \(U\subset {\mathbb {R}}^n\) be the set of all finite t(u) for \(u\in \mathcal S\). Let K be the closed convex hull of U. By Lemma 1, let v in the positive orthant be such that for a unique \(z\in K\) we have \(\sigma _K(v) = v\cdot z\). Rescaling as needed, suppose \(\sum _{i=1}^n v_i = 1\).

Let p be the probability such that \(p(\{ i \}) = v_i\). Then for any \(w\in U\) we have \(w=t(u)\) for some u and so:

$$\begin{aligned} v \cdot t(p)&= E_p t(p) = {\text {LSI}}_p t(p)\\&\ge {\text {LSI}}_p t(u) = {\text {LSI}}_p w = E_p w = v \cdot w. \end{aligned}$$

By continuity and linearity of the inner product, it follows that \(v\cdot t(p) \ge v\cdot w\) for all \(w \in K\). Letting \(w=z\), we see that \(v \cdot z \le v\cdot t(p) \le v\cdot z\), and so by choice of z we must have \(z=t(p)\).

Let \(i_1,\ldots ,i_n\) be an enumeration of \(\{1,\dots ,n\}\) such that \(z_{i_1}\le \dots \le z_{i_n}\). Then for any credence u satisfying Zero and Normalization:

$$\begin{aligned} {\text {LSI}}_u z = z_{i_1} + \sum _{j=1}^{n-1} (z_{i_{j+1}}-z_{i_{j}}) u(\{ i_{j+1}, \dots , i_n \}). \end{aligned}$$

Let r be any credence such that \(r(A)=p(A)\) if \(A\ne \{ i_1 \}\) and \(0<r(\{i_1\})<p(\{i_1\})\). Then r satisfies Zero, Normalization and Subadditivity, and is regular, but is not a probability since \(\sum _{j=1}^n r(\{i_j\})<1\).

Observe that \({\text {LSI}}_p z\) and \({\text {LSI}}_r z\) are equal, because our formula for \({\text {LSI}}_u z\) does not depend on \(u(\{i_1\})\), and \(\{i_1\}\) is the only event p and r disagree on.

Recall that for any \(w\in {\mathbb {R}}^n\) (identified with \({\mathbb {R}}^\Omega \)) and credence u we have:

$$\begin{aligned} {\text {LSI}}_u w = -\alpha + \int _0^\infty u(\{i : \alpha +w_i > y \}) \, dy, \end{aligned}$$

where \(\alpha \) is chosen so that \(\alpha +w_i \ge 0\) for all i. It follows that \({\text {LSI}}_r w \le {\text {LSI}}_p w\), since \(r(A)\le p(A)\) for every \(A\subseteq \Omega \).

Let \(w=t(r)\). Then

$$\begin{aligned} {\text {LSI}}_p w \ge {\text {LSI}}_r w \ge {\text {LSI}}_r z = {\text {LSI}}_p z = v \cdot z \ge v \cdot w = {\text {LSI}}_p w. \end{aligned}$$

Thus \(v\cdot z = v \cdot w\), and hence by choice of z we must have \(w=z\), so \(s(r)=s(p)\). Moreover, p is truer than r at \(i_1\). \(\square \)
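
Two steps of this proof use only the structure of the level set integral, not the scoring rule: \({\text {LSI}}_r z={\text {LSI}}_p z\) when r lowers only \(p(\{i_1\})\), and \({\text {LSI}}_r w\le {\text {LSI}}_p w\) for every w when \(r\le p\). The sketch below checks both, together with the facts that r is regular, satisfies Subadditivity in the sense defined above, and is not a probability. It assumes \(\Omega =\{1,2,3\}\), a hypothetical regular probability p, and a hypothetical vector z standing in for t(p), since the s of the theorem is arbitrary.

```python
import random
from fractions import Fraction
from itertools import combinations

OMEGA = (1, 2, 3)
EVENTS = [frozenset(e) for k in range(len(OMEGA) + 1)
          for e in combinations(OMEGA, k)]


def lsi(c, f):
    """Level set integral of a finite-valued f (dict outcome -> value)."""
    values = sorted(set(f.values()))
    total = values[0]
    for lo, hi in zip(values, values[1:]):
        total += (hi - lo) * c[frozenset(w for w in f if f[w] > lo)]
    return total


# Hypothetical regular probability p and stand-in z for t(p).
point_mass = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
p = {A: sum((point_mass[w] for w in A), Fraction(0)) for A in EVENTS}
z = {1: Fraction(-2), 2: Fraction(-5), 3: Fraction(-1)}

i1 = min(OMEGA, key=lambda w: z[w])          # outcome where z is smallest
r = dict(p)
r[frozenset([i1])] = p[frozenset([i1])] / 2  # any value in (0, p({i1})) works

assert lsi(r, z) == lsi(p, z)                       # r({i1}) never enters
assert sum(r[frozenset([w])] for w in OMEGA) < 1    # r is not a probability
assert all(r[A] > 0 for A in EVENTS if A)           # r is regular
assert all(r[A] + r[B] <= r[A | B]                  # Subadditivity as defined
           for A in EVENTS for B in EVENTS if not A & B)

rng = random.Random(0)
for _ in range(1000):
    w = {k: Fraction(rng.randint(-20, 20)) for k in OMEGA}
    assert lsi(r, w) <= lsi(p, w)                   # since r <= p on every event
print(lsi(p, z))  # -17/6, which equals E_p z
```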

1.3 Continuity

Without loss of generality, suppose \(\Omega = \{ 1,\dots ,n \}\) for \(n\ge 2\). We show there is a strictly proper scoring rule that is probability-continuous at every probability other than \(\delta _1\) but where no non-probability is strictly score-dominated by any probability.

Let \(s_0\) be the logarithmic score on the probabilities, defined by

$$\begin{aligned} s_0(p)(i) = -\log p(\{i\}), \end{aligned}$$

where \(\log 0 = -\infty \). This is strictly proper.

Let s be a tweaked version of the logarithmic scoring rule where

$$\begin{aligned} s(c)(i) = \left\{ \begin{array}{cc} -\log c(\{i\})&{} \quad \text {if}\, c\, \text {is a probability, and (i)}~c\ne \delta _1\, \text {or (ii)}~i\ne 1,\\ -2 &{}\text {if }\, c=\delta _1 \text {and } i=1, \\ -1 &{}\text {if }\,c\, \text {is not a probability and}\, i=1,\\ \infty &{}\text {if }\, c\, \text { is not a probability and}\, i\ne 1. \end{array}\right. \end{aligned}$$

We now check that this is strictly proper, i.e.,

$$\begin{aligned} E_p s(p) < E_p s(c) \end{aligned}$$

whenever p is a probability and \(p\ne c\). There are four cases.

Case 1: p and c are distinct probabilities other than \(\delta _1\). Then:

$$\begin{aligned} E_p s(p) = E_p s_0(p) < E_p s_0(c) = E_p s(c) \end{aligned}$$

by the strict propriety of the logarithmic scoring rule.

Case 2: \(p=\delta _1\) and c is a probability other than \(\delta _1\). Then:

$$\begin{aligned} E_p s(p) = -2 < 0 \le s(c)(1) = E_p s(c). \end{aligned}$$

Case 3: \(p=\delta _1\) and c is not a probability. Then:

$$\begin{aligned} E_p s(p) = -2 < -1 = E_p s(c). \end{aligned}$$

Case 4: \(p\ne \delta _1\) and either \(c=\delta _1\) or c is not a probability. Then \(p(\{i\})>0\) for some \(i\ne 1\), and \(s(c)(i)=\infty \) regardless of whether c is \(\delta _1\) or a non-probability. Hence, \(E_p s(c)=\infty \), and this is also what \(E_p s_0(\delta _1)\) equals. Furthermore, since \(p\ne \delta _1\), we have \(s(p)=s_0(p)\). Thus by strict propriety of \(s_0\):

$$\begin{aligned} E_p s(p) = E_p s_0(p) < E_p s_0(\delta _1) = \infty = E_p s(c). \end{aligned}$$

All the cases have been checked, and s is strictly proper. But if c is not a probability, then c is not strictly s-dominated by any probability. For \(s(c)(1)=-1\), and all the scores of probabilities other than \(\delta _1\) are non-negative, so the only possible strict s-dominator of c is \(\delta _1\). But \(s(c)(i)=\infty =s(\delta _1)(i)\) for any \(i\ne 1\), so we cannot have strict domination.

The above example is an unbounded scoring rule.
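
The four-case verification can be mirrored numerically for \(n=2\). The sketch below is a sanity check on a finite grid, not a proof: probabilities are encoded by \(p(\{1\})\in \{0,1/20,\dots ,1\}\), all non-probabilities are represented by a single token because the tweaked rule gives them all the same score, and the convention \(0\cdot \infty =0\) is applied in expectations. The names are hypothetical.

```python
import math

INF = math.inf
NONPROB = "non-probability"          # all non-probabilities share one score here
GRID = [k / 20 for k in range(21)]   # probabilities p with p({1}) = k/20


def neg_log(x):
    return INF if x == 0 else -math.log(x)


def score(c, i):
    """Tweaked log score on Omega = {1, 2}; a probability c is given by c({1})."""
    if c == NONPROB:
        return -1.0 if i == 1 else INF
    if c == 1 and i == 1:  # c is delta_1
        return -2.0
    return neg_log(c if i == 1 else 1 - c)


def expect(a, c):
    """E_p s(c) for p({1}) = a, using the convention 0 * inf = 0."""
    return sum(weight * score(c, i)
               for i, weight in ((1, a), (2, 1 - a)) if weight > 0)


def strictly_proper_on_grid():
    return all(expect(a, a) < expect(a, c)
               for a in GRID for c in GRID + [NONPROB] if c != a)


def dominated_by_some_probability():
    return any(all(score(a, i) < score(NONPROB, i) for i in (1, 2))
               for a in GRID)


print(strictly_proper_on_grid(), dominated_by_some_probability())  # True False
```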

1.4 Strict truth-directedness

The set of credences \(\mathcal C\) is the space of functions from the powerset of \(\Omega \) to [0, 1] and can be equipped in the natural way with \(2^{|\Omega |}\)-dimensional Euclidean topology. This agrees with the topology on \(\mathcal P\subset \mathcal C\) that was used to define probability-continuity.

If a scoring rule is proper but not probability-distinguishing, then it cannot be quasi-strictly proper and also it cannot satisfy the domination thesis (4). To see the latter point, observe that no score of a probability can be s-dominated by the score of a probability given propriety, since if p were s-dominated by q, then \(E_p s(p) > E_p s(q)\), contrary to propriety. So if the score of a non-probability c equaled that of a probability, we wouldn’t have the domination thesis for c.

Theorem 4

Let s be any proper truth-directed scoring rule defined on the probabilities \(\mathcal P\) on \(\Omega \) where \(|\Omega |=2\). Then s can be extended to a truth-directed, proper but not probability-distinguishing scoring rule defined on all of \(\mathcal C\). Furthermore, the extension can be taken to be a continuous function from \(\mathcal C\) to \([M,\infty ]\) if s is probability-continuous.

Proof

Without loss of generality \(\Omega =\{1,2\}\). Let \(p_\alpha \) be the probability such that \(p_\alpha (\{1\})=\alpha \). Note that \(p_\alpha \) is truer than \(p_\beta \) at 1 if and only if \(\alpha >\beta \) and at 2 if and only if \(\alpha <\beta \).

Let \(\alpha (c)=1/2+(c(\{1\})-c(\{2\}))/2\) for any credence c. Now define

$$\begin{aligned} s'(c) = s(p_{\alpha (c)})+c(\varnothing )+1-c(\Omega ) \end{aligned}$$

for \(c\in \mathcal C\). Note that this agrees with the original definition on \(\mathcal P\), since if c is a probability, \(\alpha (c) = c(\{ 1\})\). For simplicity, write s in place of \(s'\).

We now need to show that s thus extended is proper and truth-directed but not probability-distinguishing.

Propriety is easy. Let p be any probability and c any credence. If c is a probability, we have \(E_p s(p) \le E_p s(c)\) by propriety restricted to the probabilities. If c is not a probability, we have \(E_p s(p) \le E_p s(p_{\alpha (c)}) \le E_p s(c)\), since \(s(c) \ge s(p_{\alpha (c)})\) everywhere.

Lack of probability distinguishing follows from the fact that if c satisfies Zero and Normalization but is not in \(\mathcal P\), then \(s(c)=s(p_{\alpha (c)})\) everywhere.

We now prove truth-directedness. All we need to prove is that if c is truer than d at 1, then \(s(c)(1) < s(d)(1)\); the case where c is truer than d at 2 is essentially the same. Furthermore, by forming a chain of credences between c and d that differ on only one event, we just need to prove that \(s(c)(1) < s(d)(1)\) in each of the following cases:

  (i) c and d agree on all events except \(\varnothing \), where \(c(\varnothing ) < d(\varnothing ),\)

  (ii) c and d agree on all events except \(\Omega \), where \(c(\Omega ) > d(\Omega ),\)

  (iii) c and d agree on all events except \(\{1\}\), where \(c(\{1\}) > d(\{1\}),\)

  (iv) c and d agree on all events except \(\{2\}\), where \(c(\{2\}) < d(\{2\}).\)

The inequality \(s(c)(1) < s(d)(1)\) is obvious in cases (i) and (ii).

Now suppose we have case (iii) or (iv). In both cases we have \(\alpha (c) > \alpha (d)\), while c and d agree on \(\varnothing \) and \(\Omega \), so that \(k=c(\varnothing )+1-c(\Omega )=d(\varnothing )+1-d(\Omega )\). Then \(p_{\alpha (c)}\) is truer at 1 than \(p_{\alpha (d)}\), and so by truth-directedness of s on \(\mathcal P\) we have \(s(c)(1) = s(p_{\alpha (c)})(1)+k < s(p_{\alpha (d)})(1)+k = s(d)(1)\).

Finally, the continuity claim is clear from our definition of the extension s. \(\square \)
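
The extension in the proof can be tried out with a concrete choice of s. The sketch below assumes, for illustration only, that s is the Brier score on the probabilities on \(\Omega =\{1,2\}\) (proper and truth-directed there), encodes a credence as the tuple \((c(\varnothing ), c(\{1\}), c(\{2\}), c(\Omega ))\) over a finite grid, and checks propriety, the failure of probability-distinction, and strict truth-directedness at 1 for the single-event changes (i) to (iv) used in the proof. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

VALUES = [Fraction(k, 4) for k in range(5)]
GRID = list(product(VALUES, repeat=4))  # (c(empty), c({1}), c({2}), c(Omega))


def brier(a):
    """Brier score of p_a, the probability with p({1}) = a."""
    return {1: 2 * (1 - a) ** 2, 2: 2 * a ** 2}


def alpha(c):
    return Fraction(1, 2) + (c[1] - c[2]) / 2


def extended(c):
    """s'(c) = s(p_alpha(c)) + c(empty) + 1 - c(Omega)."""
    penalty = c[0] + 1 - c[3]
    return {w: x + penalty for w, x in brier(alpha(c)).items()}


def expect(a, f):
    return a * f[1] + (1 - a) * f[2]


# Propriety: E_p s(p) <= E_p s'(c) for every grid probability p and credence c.
assert all(expect(a, brier(a)) <= expect(a, extended(c))
           for a in VALUES for c in GRID)

# Not probability-distinguishing: a non-probability with Zero and
# Normalization gets exactly the score of p_alpha(c).
c = (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1))
assert extended(c) == brier(alpha(c)) == brier(Fraction(3, 8))

# Truth-directedness at 1: raising c({1}) or c(Omega), or lowering
# c(empty) or c({2}), strictly lowers the score at 1.
TRUE_AT_1 = {0: False, 1: True, 2: False, 3: True}
for c in GRID:
    for k, up_is_truer in TRUE_AT_1.items():
        for x in VALUES:
            if x == c[k]:
                continue
            d = c[:k] + (x,) + c[k + 1:]
            truer, other = (c, d) if (c[k] > x) == up_is_truer else (d, c)
            assert extended(truer)[1] < extended(other)[1]
print("all checks passed")
```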

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pruss, A.R. The dialectics of accuracy arguments for probabilism. Synthese 201, 153 (2023). https://doi.org/10.1007/s11229-023-04145-y

  • DOI: https://doi.org/10.1007/s11229-023-04145-y