
Learning and Pooling, Pooling and Learning


Abstract

We explore which types of probabilistic updating commute with convex IP pooling (Stewart and Ojea Quintana 2017). Positive results are stated for Bayesian conditionalization (and a mild generalization of it), imaging, and a certain parameterization of Jeffrey conditioning. This last observation is obtained with the help of a slight generalization of a characterization of (precise) externally Bayesian pooling operators due to Wagner (2009). These results strengthen the case that pooling should go by imprecise probabilities since no precise pooling method is as versatile.



Notes

  1. Not all merging of opinions results require probabilities to converge to certainty (Blackwell and Dubins 1962). Under certain conditions, Bayesian conditionalizing can bring probabilities close even if they do not converge to 1 or 0.

  2. \(\Omega \) may be thought of as a partition of a space of agent-relative serious possibilities determined by consistency with a state of full belief. Like a state of full belief, \(\Omega \) is open to being revised, refined, etc., as judged appropriate (Levi 1980).

  3. Notice that, due to the way geometric pooling is defined, there are profiles for which \(F(\varvec{p}_1,\ldots , \varvec{p}_n)(\omega ) = 0\) for all \(\omega \in \Omega \)—in violation of the probability axioms. Such a situation arises if for each \(\omega \in \Omega \) there is a \(\varvec{p}_i \in (\varvec{p}_1,\ldots , \varvec{p}_n)\) such that \(\varvec{p}_i(\omega ) = 0\). Circumventing this problem, Wagner restricts the domain of pooling operators to the set of profiles for which this does not happen. That is, the domain of a pooling function is the set of profiles such that there is some \(\omega \in \Omega \) for which \(\varvec{p}_i(\omega ) > 0\) for all \(i=1,\ldots , n\).
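A small numerical sketch of this zero problem and of the domain restriction (our own illustration, using an equal-weight geometric pooling function; the helper name is hypothetical):

```python
import numpy as np

def geometric_pool(profile):
    """Equal-weight geometric pooling: normalized product of the pmfs.

    Raises on profiles outside Wagner's restricted domain, i.e., when
    every omega is assigned zero probability by some member of the profile.
    """
    pooled = np.prod(profile, axis=0) ** (1.0 / len(profile))
    if pooled.sum() == 0:
        raise ValueError("no omega with p_i(omega) > 0 for all i")
    return pooled / pooled.sum()

p1 = np.array([0.5, 0.5, 0.0])   # p_1 rules out omega_3
p2 = np.array([0.0, 0.5, 0.5])   # p_2 rules out omega_1
print(geometric_pool([p1, p2]))  # fine: omega_2 survives -> [0. 1. 0.]

p3 = np.array([1.0, 0.0, 0.0])
p4 = np.array([0.0, 0.5, 0.5])
# geometric_pool([p3, p4]) raises: every omega is zeroed by some p_i
```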

  4. See Schervish and Seidenfeld (1990), Herron et al. (1997) for studies of convergence relevant to IP.

  5. Within the IP research community, convexity is a matter of some controversy. For attacks on the requirement, see Seidenfeld et al. (1989, 2010), Kyburg and Pittarelli (1992). For defenses, see Levi (1990, 2009).

  6. In the IP setting, conditionalization can actually lead to greater uncertainty in the short-run, a very interesting phenomenon known as dilation (Seidenfeld and Wasserman 1993; Pedersen and Wheeler 2014).

  7. For any \(A \in \mathscr {A},\quad \varvec{p}^E(A) = \frac{\varvec{p}(A \cap E)}{\varvec{p}(E)} = \frac{\sum _{\omega \in A \cap E}\varvec{p}(\omega )}{\sum _{\omega \in E}\varvec{p}(\omega )}\). By the definition of a probability measure, \(\varvec{p}(A) = \sum _{\omega \in A} \varvec{p}(\omega )\),   so \(\sum _{\omega \in A} \varvec{p}^\lambda (\omega ) = \frac{\sum _{\omega \in A} \varvec{p}(\omega )\lambda (\omega )}{\sum _{\omega ' \in \Omega } \varvec{p}(\omega ')\lambda (\omega ')}\) gives us \(\varvec{p}^\lambda (A)\). We show that these two fractions are equal by showing the equality of both the numerators and denominators. Since, for all \(\omega \in A\), \(\varvec{p}(\omega )\lambda (\omega ) = \varvec{p}(\omega )\) if \(\omega \in E\) and 0 otherwise, \(\sum _{\omega \in A}\varvec{p}(\omega )\lambda (\omega ) = \sum _{\omega \in A \cap E} \varvec{p}(\omega ) = \varvec{p}(A \cap E)\). Hence, the numerators are equal. And since, for all \(\omega ' \in \Omega , \varvec{p}(\omega ')\lambda (\omega ') = \varvec{p}(\omega ')\) if \(\omega ' \in E\) and 0 otherwise, we have \(\sum _{\omega ' \in \Omega } \varvec{p}(\omega ')\lambda (\omega ') = \sum _{\omega ' \in E} \varvec{p}(\omega ') = \varvec{p}(E)\). Hence, the denominators are equal, too. So, \(\varvec{p}^E = \varvec{p}^\lambda \).
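The identity in this note is easy to confirm numerically. A minimal sketch (our own), with \(\lambda \) the indicator function of E:

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])    # a pmf on Omega = {w1, w2, w3, w4}
lam = np.array([1.0, 1.0, 0.0, 1.0])  # lambda = indicator of E = {w1, w2, w4}

# Bayesian conditionalization on E: restrict to E and renormalize
p_E = p * lam / (p * lam).sum()

# Likelihood update on lambda: p^lambda(w) = p(w)lam(w) / sum_w' p(w')lam(w')
p_lam = p * lam / np.dot(p, lam)

assert np.allclose(p_E, p_lam)        # p^E = p^lambda, as the note shows
print(p_lam)                          # [1/7, 2/7, 0, 4/7]
```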

  8. Thanks to Paul Pedersen for emphasizing this point to us.

  9. Wagner contends that identical learning should be thought of as identical Bayes factors rather than identical posteriors. One alleged reason is that posteriors are tainted by the prior, whereas Bayes factors are an uncontaminated measure of the impact of the evidence. How do Bayes factors measure the impact of the evidence in isolation from the prior? Consider the case in which \(\varvec{q}\) comes from \(\varvec{p}\) by Bayesian conditionalization on E. Then,

    $$\begin{aligned} \varvec{q}(A)/\varvec{q}(B) = \frac{\varvec{p}(A|E)}{\varvec{p}(B|E)} \end{aligned}$$

    and

    $$\begin{aligned} {\mathcal {B}}(\varvec{q}, \varvec{p}; A:B) = \frac{\varvec{p}(A|E)/\varvec{p}(B|E)}{\varvec{p}(A)/\varvec{p}(B)}. \end{aligned}$$

    So, \({\mathcal {B}}(\varvec{q}, \varvec{p}; A:B)\) is a measure of the change the evidence, E, induces in favor of A over B. \({\mathcal {B}}(\varvec{q}, \varvec{p}; A:B)\) can also be rearranged using Bayes’ theorem:

    $$\begin{aligned} \frac{\varvec{q}(A)}{\varvec{q}(B)} = \frac{\varvec{p}(A|E)}{\varvec{p}(B|E)} = \frac{\frac{\varvec{p}(A)\varvec{p}(E|A)}{\varvec{p}(E)}}{\frac{\varvec{p}(B)\varvec{p}(E|B)}{\varvec{p}(E)}} = \frac{\varvec{p}(A)\varvec{p}(E|A)}{\varvec{p}(B)\varvec{p}(E|B)} = \frac{\varvec{p}(A)}{\varvec{p}(B)} \times \frac{\varvec{p}(E|A)}{\varvec{p}(E|B)} \end{aligned}$$

    Dividing now by \(\frac{\varvec{p}(A)}{\varvec{p}(B)}\), the denominator of \({\mathcal {B}}(\varvec{q}, \varvec{p}; A:B)\), gives us

    $$\begin{aligned} {\mathcal {B}}(\varvec{q}, \varvec{p}; A:B) = \frac{\varvec{p}(E|A)}{\varvec{p}(E|B)} \end{aligned}$$

    The quantity \(\varvec{p}(E|A) \big / \varvec{p}(E|B)\) is sometimes referred to as the likelihood ratio. So, the Bayes factor is a ratio of the non-prior quantities involved in Bayes’ theorem, the quantities that revise the prior.
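A quick check of the identity \({\mathcal {B}}(\varvec{q}, \varvec{p}; A:B) = \varvec{p}(E|A)/\varvec{p}(E|B)\) with exact rationals (a toy example of our own):

```python
from fractions import Fraction as Fr

# A pmf over six worlds; A, B, E are events (sets of world indices).
p = [Fr(1, 8), Fr(1, 8), Fr(1, 8), Fr(3, 8), Fr(1, 8), Fr(1, 8)]
A, B, E = {0, 1}, {2, 3}, {1, 3, 5}

prob = lambda S: sum(p[w] for w in S)
cond = lambda S, T: prob(S & T) / prob(T)    # p(S | T)
q = lambda S: cond(S, E)                     # posterior after learning E

bayes_factor = (q(A) / q(B)) / (prob(A) / prob(B))
likelihood_ratio = cond(E, A) / cond(E, B)   # p(E|A) / p(E|B)

assert bayes_factor == likelihood_ratio      # both equal 2/3 here
```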

  10. Wagner’s version of commutativity with Jeffrey conditionalization involves some additional technical assumptions. First, that \(\varvec{p}_i(E_k) > 0\) for all i and all k. Second, that \(b_1 = 1\) and \(\sum _k b_k \varvec{p}_i(E_k) < \infty \) for \(i = 1,\ldots , n\). Third, where \(\varvec{q}_i(\omega ) = \frac{\sum _k b_k \varvec{p}_i(\omega )[\omega \in E_k]}{\sum _k b_k \varvec{p}_i(E_k)}\), it is the case that \(0< \sum _k b_k F(\varvec{p}_1,\ldots , \varvec{p}_n)(E_k) < \infty \). In the IP setting, this last assumption may be adjusted to be a requirement for each \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\).

  11. In finite spaces, any revision method can be represented as conditionalization in a richer space via superconditioning, provided the posterior probability is absolutely continuous with respect to the prior.

  12. A metaphysically deflationary conception of possible worlds has it that a possible world is just a maximally complete set of sentences in some propositional language, instead of a “possible totality of facts.”

  13. Others, however, have offered more uniform accounts of supposition (e.g., Levi 1996).

  14. Though, as Diaconis and Zabell’s aforementioned result shows us, in a range of cases there is no mathematical necessity in adopting Jeffrey conditionalization in order to obtain the results of Jeffrey conditionalization.

  15. Though it is not uncontroversial that conditionalization or some other type of updating represents learning. Isaac Levi, for instance, writes, “All conditions of rationality are equilibrium conditions. In a sense they are synchronic conditions [...] Furthermore, in stating conditions of rational equilibrium, no prescription is made regarding the psychological path to be taken in moving from disequilibrium or from one equilibrium position to another. In other words, there are no norms prescribing rational learning processes” (Levi 1970).

References

  • Arló-Costa, H. (2007). The logic of conditionals. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Summer 2014 ed.). Stanford University: Metaphysics Research Lab.


  • Baratgin, J., & Politzer, G. (2010). Updating: A psychologically basic situation of probability revision. Thinking & Reasoning, 16(4), 253–287.


  • Blackwell, D., & Dubins, L. (1962). Merging of opinions with increasing information. The Annals of Mathematical Statistics, 33, 882–886.


  • Christensen, D. (2009). Disagreement as evidence: The epistemology of controversy. Philosophy Compass, 4(5), 756–767.


  • de Finetti, B. (1964). Foresight: Its logical laws, its subjective sources. In H. E. Kyburg & H. E. Smokler (Eds.), Studies in Subjective Probability. Hoboken: Wiley.

  • Diaconis, P., & Zabell, S. L. (1982). Updating subjective probability. Journal of the American Statistical Association, 77(380), 822–830.


  • Dietrich, F., & List, C. (2014). Probabilistic opinion pooling. In A. Hájek & C. Hitchcock (Eds.), Oxford Handbook of Probability and Philosophy. Oxford: Oxford University Press.


  • Elga, A. (2007). Reflection and disagreement. Noûs, 41(3), 478–502.


  • Elkin, L., & Wheeler, G. (2016). Resolving peer disagreements through imprecise probabilities. Noûs. doi:10.1111/nous.12143.


  • Field, H. (1978). A note on Jeffrey conditionalization. Philosophy of Science, 45, 361–367.

  • Gaifman, H., & Snir, M. (1982). Probabilities over rich languages, testing and randomness. The Journal of Symbolic Logic, 47(03), 495–548.


  • Gaifman, H., & Vasudevan, A. (2012). Deceptive updating and minimal information methods. Synthese, 187(1), 147–178.


  • Gärdenfors, P. (1982). Imaging and conditionalization. The Journal of Philosophy, 79, 747–760.


  • Genest, C. (1984). A characterization theorem for externally Bayesian groups. The Annals of Statistics, 12, 1100–1105.

  • Genest, C., McConway, K. J., & Schervish, M. J. (1986). Characterization of externally Bayesian pooling operators. The Annals of Statistics, 14, 487–501.

  • Genest, C., & Wagner, C. G. (1987). Further evidence against independence preservation in expert judgement synthesis. Aequationes Mathematicae, 32(1), 74–86.


  • Genest, C., & Zidek, J. V. (1986). Combining probability distributions: A critique and an annotated bibliography. Statistical Science, 1, 114–135.


  • Girón, F. J., & Ríos, S. (1980). Quasi-Bayesian behaviour: A more realistic approach to decision making? Trabajos de Estadística y de Investigación Operativa, 31(1), 17–38.

  • Good, I. J. (1983). Good Thinking: The Foundations of Probability and Its Applications. Minneapolis: U of Minnesota Press.


  • Hájek, A., & Hall, N. (1994). The hypothesis of the conditional construal of conditional probability. In E. Eells & B. Skyrms (Eds.), Probability and conditionals: Belief revision and rational decision (pp. 75–112). Cambridge: Cambridge University Press.


  • Hartmann, S. (2014). A new solution to the problem of old evidence. In Philosophy of Science Association 24th Biennial Meeting, Chicago, IL.

  • Herron, T., Seidenfeld, T., & Wasserman, L. (1997). Divisive conditioning: Further results on dilation. Philosophy of Science, 64, 411–444.


  • Huttegger, S. M. (2015). Merging of opinions and probability kinematics. The Review of Symbolic Logic, 8(04), 611–648.


  • Jeffrey, R. (2004). Subjective Probability: The Real Thing. Cambridge: Cambridge University Press.


  • Joyce, J. M. (1999). The Foundations of Causal Decision Theory. Cambridge: Cambridge University Press.


  • Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79–86.


  • Kyburg, H. E. (1987). Bayesian and non-Bayesian evidential updating. Artificial Intelligence, 31(3), 271–293.

  • Kyburg, H. E., & Pittarelli, M. (1992). Some problems for convex Bayesians. In Proceedings of the Eighth International Conference on Uncertainty in Artificial Intelligence (pp. 149–154). Morgan Kaufmann.

  • Leitgeb, H. (2016). Imaging all the people. Episteme. doi:10.1017/epi.2016.14.

  • Levi, I. (1967). Probability kinematics. British Journal for the Philosophy of Science, 18(3), 197–209.


  • Levi, I. (1970). Probability and evidence. In M. Swain (Ed.), Induction, Acceptance, and Rational Belief (pp. 134–156). New York: Humanities Press.


  • Levi, I. (1978). Irrelevance. In C. Hooker, J. Leach, & E. McClennen (Eds.), Foundations and Applications of Decision Theory (Vol. 1, pp. 263–273). Boston: Springer.


  • Levi, I. (1980). The Enterprise of Knowledge. Cambridge, MA: MIT Press.


  • Levi, I. (1985). Consensus as shared agreement and outcome of inquiry. Synthese, 62(1), 3–11.


  • Levi, I. (1990). Pareto unanimity and consensus. The Journal of Philosophy, 87(9), 481–492.


  • Levi, I. (1996). For the Sake of the Argument: Ramsey Test Conditionals, Inductive Inference and Nonmonotonic Reasoning. Cambridge: Cambridge University Press.


  • Levi, I. (2009). Why indeterminate probability is rational. Journal of Applied Logic, 7(4), 364–376.


  • Lewis, D. (1976). Probabilities of conditionals and conditional probabilities. The Philosophical Review, 85, 297–315.


  • Madansky, A. (1964). Externally Bayesian Groups. Santa Monica, CA: RAND Corporation.


  • Nau, R. F. (2002). The aggregation of imprecise probabilities. Journal of Statistical Planning and Inference, 105(1), 265–282.


  • Pedersen, A. P., & Wheeler, G. (2014). Demystifying dilation. Erkenntnis, 79(6), 1305–1342.


  • Raiffa, H. (1968). Decision analysis: Introductory lectures on choices under uncertainty. Random House.

  • Ramsey, F. P. (1990). Truth and probability. In D. H. Mellor (Ed.), Philosophical Papers (pp. 52–109). Cambridge: Cambridge University Press.

  • Russell, J. S., Hawthorne, J., & Buchak, L. (2015). Groupthink. Philosophical Studies, 172(5), 1287–1309.


  • Savage, L. (1972, originally published in 1954). The Foundations of Statistics. New York: Wiley.

  • Schervish, M., & Seidenfeld, T. (1990). An approach to consensus and certainty with increasing evidence. Journal of Statistical Planning and Inference, 25(3), 401–414.


  • Seidenfeld, T. (1986). Entropy and uncertainty. Philosophy of Science, 53, 467–491.


  • Seidenfeld, T., Kadane, J. B., & Schervish, M. J. (1989). On the shared preferences of two Bayesian decision makers. The Journal of Philosophy, 86(5), 225–244.

  • Seidenfeld, T., Schervish, M. J., & Kadane, J. B. (2010). Coherent choice functions under uncertainty. Synthese, 172(1), 157–176.


  • Seidenfeld, T., & Wasserman, L. (1993). Dilation for sets of probabilities. The Annals of Statistics, 21(3), 1139–1154.


  • Skyrms, B. (1986). Choice and Chance: An Introduction to Inductive Logic (3rd ed.). Belmont: Wadsworth Publishing Company.


  • Spohn, W. (2012). The Laws of Belief: Ranking Theory and Its Philosophical Applications. Oxford: Oxford University Press.


  • Stewart, R. T. & Ojea Quintana, I. (2017). Probabilistic opinion pooling with imprecise probabilities. Journal of Philosophical Logic. doi:10.1007/s10992-016-9415-9.


  • van Fraassen, B. C. (1989). Laws and Symmetry. Oxford: Clarendon Press.


  • Wagner, C. (2002). Probability kinematics and commutativity. Philosophy of Science, 69(2), 266–278.


  • Wagner, C. (2009). Jeffrey conditioning and external Bayesianity. Logic Journal of the IGPL, 18(2), 336–345.

  • Williams, P. M. (1980). Bayesian conditionalisation and the principle of minimum information. British Journal for the Philosophy of Science, 31, 131–144.



Acknowledgements

The bulk of this work was done while we were on a Junior Group Visiting Fellowship at the Munich Center for Mathematical Philosophy. The paper benefited from conversations with Stephan Hartmann and Hannes Leitgeb. We would especially like to thank Greg Wheeler for feedback, numerous relevant discussions, and support. We are grateful to Matt Duncan, Robby Finley, Arthur Heller, Isaac Levi, Michael Nielsen, Rohit Parikh, Paul Pedersen, Teddy Seidenfeld, and Reuben Stern for their excellent comments on drafts or presentations of the paper. Finally, thanks to an anonymous referee for his or her meticulous and valuable review.

Author information


Correspondence to Rush T. Stewart.

Appendices

Appendix: Proofs

Proof of Proposition 2

Proof

We follow Wagner’s proof for the precise case (2009, Theorem 3.3), adapting it to IP where necessary.

\((\Rightarrow )\) Assume that \({\mathcal {F}}\) is externally Bayesian, i.e., for all profiles and any likelihood function, \({\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n) = {\mathcal {F}}(\varvec{p}_1^\lambda ,\ldots , \varvec{p}_n^\lambda )\). We want to show that, for all partitions \(\varvec{E} = \{E_k\}\) of \(\Omega \) and all profiles in \({\mathbb {P}}^n\),

$$\begin{aligned} {\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n) &= \left\{ \dfrac{\sum _k b_k \varvec{p}[\cdot \in E_k]}{\sum _k b_k \varvec{p}(E_k)}: \varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\right\} \\ &= {\mathcal {F}}\left( \dfrac{\sum _k b_k \varvec{p}_1[\cdot \in E_k]}{\sum _k b_k \varvec{p}_1(E_k)},\ldots , \dfrac{\sum _k b_k \varvec{p}_n[\cdot \in E_k]}{\sum _k b_k \varvec{p}_n(E_k)}\right) \\ &= {\mathcal {F}}(\varvec{p}_{1J}^{\varvec{E}},\ldots , \varvec{p}_{nJ}^{\varvec{E}}) \end{aligned}$$

where the first and last equalities are definitional. Recall the definition of \(b_k\): \(b_k = {\mathcal {B}}(\varvec{q},\varvec{p};E_k:E_1) = \dfrac{\varvec{q}(E_k)/\varvec{q}(E_1)}{\varvec{p}(E_k)/\varvec{p}(E_1)}\), \(k = 1, 2,\ldots \). Set \(\lambda (\omega ) = \sum _k b_k [\omega \in E_k]\). Wagner observes that the following chain of equalities then obtains for \(\varvec{p}_i\), \(i = 1,\ldots , n\) (2009, (3.10), p. 342):

$$\begin{aligned} (\star )\quad \sum _{\omega \in \Omega } \lambda (\omega )\varvec{p}_i(\omega ) = \sum _{\omega \in \Omega }\varvec{p}_i(\omega )\sum _k b_k [\omega \in E_k] = \sum _k b_k \sum _{\omega \in \Omega }\varvec{p}_i(\omega )[\omega \in E_k] = \sum _k b_k \varvec{p}_i(E_k) \end{aligned}$$

Since each of the terms \(b_k \varvec{p}_i(E_k)\) is positive and \(\sum _k b_k \varvec{p}_i(E_k) < \infty \), \(\lambda \) is a likelihood function for \(\varvec{p}_i\), and \(\varvec{p}_i^\lambda \) is a well-defined updated pmf for \(i = 1,\ldots , n\). Using \((\star )\), we obtain

$$\begin{aligned} {\mathcal {F}}(\varvec{p}_{1J}^{\varvec{E}},\ldots , \varvec{p}_{nJ}^{\varvec{E}}) = {\mathcal {F}}\left( \frac{\varvec{p}_1(\cdot )\lambda (\cdot )}{\sum _{\omega ' \in \Omega }\varvec{p}_1(\omega ')\lambda (\omega ')},\ldots , \frac{\varvec{p}_n(\cdot )\lambda (\cdot )}{\sum _{\omega ' \in \Omega } \varvec{p}_n(\omega ')\lambda (\omega ')}\right) \end{aligned}$$

by substituting, for each \(i=1,\ldots , n\), \(\lambda (\cdot )\) for \(\sum _k b_k [\omega \in E_k]\) in the numerator and \(\sum _{\omega ' \in \Omega } \varvec{p}_i(\omega ')\lambda (\omega ')\) for \(\sum _k b_k \varvec{p}_i(E_k)\) in the denominator. But by definition,

$$\begin{aligned} {\mathcal {F}}\left( \frac{\varvec{p}_1(\cdot )\lambda (\cdot )}{\sum _{\omega ' \in \Omega }\varvec{p}_1(\omega ')\lambda (\omega ')},\ldots , \frac{\varvec{p}_n(\cdot )\lambda (\cdot )}{\sum _{\omega ' \in \Omega } \varvec{p}_n(\omega ')\lambda (\omega ')}\right) = {\mathcal {F}}(\varvec{p}_1^\lambda ,\ldots , \varvec{p}_n^\lambda ) \end{aligned}$$

and by assumption \({\mathcal {F}}(\varvec{p}_1^\lambda ,\ldots , \varvec{p}_n^\lambda )={\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n)\). By definition, \({\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n) = \{\varvec{p}^\lambda : \varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\}\). But, for all \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\), \(\varvec{p}^\lambda = \frac{\sum _k b_k \varvec{p}[\cdot \in E_k]}{\sum _k b_k \varvec{p}(E_k)}\). Hence, \({\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n) = {\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n)\). So, \({\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n) = {\mathcal {F}}(\varvec{p}_{1J}^{\varvec{E}},\ldots , \varvec{p}_{nJ}^{\varvec{E}})\) follows from the assumption.

\((\Leftarrow )\) Suppose that \({\mathcal {F}}\) satisfies \(\textit{CJC}_W\) and that \(\lambda \) is a likelihood function for \(\varvec{p}_i\), \(i = 1,\ldots , n\). Let \((\omega _1, \omega _2,\ldots )\) be a list of all of those \(\omega \in \Omega \) such that \(\lambda (\omega ) > 0\), and let \(\varvec{E} = \{E_1, E_2,\ldots \}\), where \(E_i := \{\omega _i\}\). Setting \(b_k = \frac{\lambda (\omega _k)}{\lambda (\omega _1)}\) for \(k = 1, 2,\ldots \), it follows that \(b_k>0\) and that \(b_1=1\). Since \(\lambda \) is a likelihood for \(\varvec{p}_i\), \(i = 1,\ldots , n\), we have that \(\sum _k b_k \varvec{p}_i(E_k)<\infty \), \(i = 1,\ldots , n\), and that \((\varvec{q}_1,\ldots , \varvec{q}_n) \in {\mathbb {P}}^n\), where \(\varvec{q}_i(\omega ) := \frac{\sum _k b_k \varvec{p}_i(\omega )[\omega \in E_k]}{\sum _k b_k \varvec{p}_i(E_k)}\). From \(\textit{CJC}_W\), it follows (1) that \(0< \sum _k b_k \varvec{p}(E_k) < \infty \) for all \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\), and (2) that \({\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n) = {\mathcal {F}}(\varvec{p}_{1J}^{\varvec{E}},\ldots , \varvec{p}_{nJ}^{\varvec{E}})\). (1) implies that \(0<\sum _{\omega \in \Omega } \lambda (\omega ) \varvec{p}(\omega ) < \infty \) for all \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\), and (2) implies that \({\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n) = {\mathcal {F}}(\varvec{p}_1^\lambda ,\ldots , \varvec{p}_n^\lambda )\) (since substituting the definition of \(b_k\) in terms of \(\lambda \) into \(\frac{\sum _k b_k \varvec{p}_i(\omega )[\omega \in E_k]}{\sum _k b_k \varvec{p}_i(E_k)}\), the formula for obtaining the \(\varvec{q}_i\), reduces that formula to the formula for updating on that \(\lambda \)). \(\square \)
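To see the mechanics of external Bayesianity in miniature, here is a numerical sketch (our own construction, not part of the proof): updating a convex mixture on \(\lambda \) yields a reweighted convex mixture of the \(\lambda \)-updated extreme points, which is why the sets \({\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n)\) and \({\mathcal {F}}(\varvec{p}_1^\lambda ,\ldots , \varvec{p}_n^\lambda )\) can coincide.

```python
import numpy as np

def update(p, lam):
    """Likelihood update: p^lambda(w) = p(w)lam(w) / <p, lam>."""
    return p * lam / np.dot(p, lam)

p1 = np.array([0.1, 0.2, 0.3, 0.4])
p2 = np.array([0.4, 0.3, 0.2, 0.1])
lam = np.array([2.0, 1.0, 0.5, 1.0])     # a likelihood function on Omega

for alpha in np.linspace(0, 1, 11):
    mix = alpha * p1 + (1 - alpha) * p2  # a point of F(p1, p2)
    # The mixing weight gets reweighted by the normalizing constants:
    beta = alpha * np.dot(p1, lam) / (alpha * np.dot(p1, lam)
                                      + (1 - alpha) * np.dot(p2, lam))
    lhs = update(mix, lam)                                       # update the pool
    rhs = beta * update(p1, lam) + (1 - beta) * update(p2, lam)  # pool the updates
    assert np.allclose(lhs, rhs)
# As alpha sweeps [0, 1], so does beta; hence the two sets coincide.
```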

Proof of Proposition 5

Proof

We provide a case in which convex IP pooling and Jeffrey conditionalization as standardly construed do not commute. Let \(\varvec{q}_i\) come from \(\varvec{p}_i\) by Jeffrey conditionalization, and let \(\varvec{q}\) be a common posterior distribution over partition \(\varvec{E}\) for \(\varvec{p}_i\), \(i = 1,\ldots , n\). Let \({\mathcal {F}}_{J}^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) come from \({\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) by Jeffrey conditionalizing each \(\varvec{p}_i\) using \(\varvec{q}\), the common posterior distribution over \(\varvec{E}\). We offer a counterexample to commutativity in which \({\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n) \ne {\mathcal {F}}(\varvec{q}_1,\ldots , \varvec{q}_n)\).

Let \(\Omega = \{\omega _1, \omega _2, \omega _3, \omega _4\}\), and consider the two pmfs listed in Table 2. Let \(\varvec{E} = \{E_1, E_2\}\), with \(E_1 = \{\omega _1, \omega _2\}\) and \(E_2 = \{\omega _3, \omega _4\}\), be a partition of \(\Omega \). Jeffrey updating both pmfs using \(\varvec{q}\), where \(\varvec{q}(E_1) = 2/3\) and \(\varvec{q}(E_2) = 1/3\), we obtain the posteriors listed in Table 3.

Table 2 Priors

|                  | \(\omega _1\) | \(\omega _2\) | \(\omega _3\) | \(\omega _4\) |
|------------------|---------------|---------------|---------------|---------------|
| \(\varvec{p}_1\) | 1/4           | 1/4           | 1/4           | 1/4           |
| \(\varvec{p}_2\) | 1/8           | 1/2           | 1/4           | 1/8           |

Table 3 Posteriors

|                  | \(\omega _1\) | \(\omega _2\) | \(\omega _3\) | \(\omega _4\) |
|------------------|---------------|---------------|---------------|---------------|
| \(\varvec{q}_1\) | 1/3           | 1/3           | 1/6           | 1/6           |
| \(\varvec{q}_2\) | 2/15          | 8/15          | 2/9           | 1/9           |

Consider the 50–50 mixture of \(\varvec{p}_1\) and \(\varvec{p}_2\), \(\varvec{p}^\star = 0.5\varvec{p}_1 + 0.5\varvec{p}_2\). It is clear that \(\varvec{p}^\star \in {\mathcal {F}}(\varvec{p}_1, \varvec{p}_2)\). Jeffrey conditionalizing \(\varvec{p}^\star \) with \(\varvec{q}\) gives us \(\varvec{q}^\star \). In particular, \(\varvec{q}^\star (\omega _1) = 2/9\) and \(\varvec{q}^\star (\omega _3) = 4/21\). It is also clear that \(\varvec{q}^\star \in {\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1, \varvec{p}_2)\). Any \(\varvec{q}_\star \in {\mathcal {F}}(\varvec{q}_1, \varvec{q}_2)\) is of the form \(\varvec{q}_\star = \alpha \varvec{q}_1 + (1 - \alpha ) \varvec{q}_2\) for \(\alpha \in [0, 1]\).

Suppose that \({\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1, \varvec{p}_2) = {\mathcal {F}}(\varvec{q}_1, \varvec{q}_2)\). Then, there is a \(\varvec{q}_\star \in {\mathcal {F}}(\varvec{q}_1, \varvec{q}_2)\) such that \(\varvec{q}^\star = \varvec{q}_\star \). In particular, \(\varvec{q}_\star (\omega _1) = 2/9\) and \(\varvec{q}_\star (\omega _3) = 4/21\). Letting \(\varvec{q}_\star (\omega _1) = 2/9\), we can compute \(\alpha \).

$$\begin{aligned} 2/9 = \varvec{q}_\star (\omega _1) = \alpha \varvec{q}_1(\omega _1) + (1 - \alpha )\varvec{q}_2(\omega _1) = \alpha \cdot 1/3 + (1 - \alpha ) \cdot 2/15 \end{aligned}$$

Solving, we get \(\alpha = 4/9\). However, we are supposed to have \(\varvec{q}_\star (\omega _3) = 4/21\). For \(\alpha = 4/9\), that is not the case.

$$\begin{aligned} \varvec{q}_\star (\omega _3) = \alpha \varvec{q}_1(\omega _3) + (1 - \alpha ) \varvec{q}_2(\omega _3) = (4/9)(1/6) + (5/9)(2/9) = 16/81 > 4/21 = \varvec{q}^\star (\omega _3) \end{aligned}$$

It follows that \({\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1, \varvec{p}_2) \ne {\mathcal {F}}(\varvec{q}_1, \varvec{q}_2)\). \(\square \)
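The arithmetic of the counterexample can be checked mechanically with exact rationals. A sketch (our own) using the pmfs of Tables 2 and 3:

```python
from fractions import Fraction as Fr

def jeffrey(p, qE1, qE2):
    """Jeffrey conditionalization on E = {E1, E2} with E1 = {w1, w2}."""
    pE1, pE2 = p[0] + p[1], p[2] + p[3]
    return [p[0] * qE1 / pE1, p[1] * qE1 / pE1,
            p[2] * qE2 / pE2, p[3] * qE2 / pE2]

p1 = [Fr(1, 4)] * 4                              # Table 2 priors
p2 = [Fr(1, 8), Fr(1, 2), Fr(1, 4), Fr(1, 8)]

q1 = jeffrey(p1, Fr(2, 3), Fr(1, 3))             # Table 3 posteriors
q2 = jeffrey(p2, Fr(2, 3), Fr(1, 3))

# Pool, then update:
p_star = [(a + b) / 2 for a, b in zip(p1, p2)]
q_star = jeffrey(p_star, Fr(2, 3), Fr(1, 3))
assert (q_star[0], q_star[2]) == (Fr(2, 9), Fr(4, 21))

# Update, then pool: matching at omega_1 forces alpha = 4/9 ...
alpha = (q_star[0] - q2[0]) / (q1[0] - q2[0])
assert alpha == Fr(4, 9)
# ... but that alpha misses at omega_3: 16/81 != 4/21.
assert alpha * q1[2] + (1 - alpha) * q2[2] == Fr(16, 81) != Fr(4, 21)
```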

Proof of Proposition 6

Proof

We want to show that \({\mathcal {F}}(\varvec{q}_1,\ldots , \varvec{q}_n) = {\mathcal {F}}_I^E(\varvec{p}_1,\ldots , \varvec{p}_n)\), where \(\varvec{q}_i\) comes from \(\varvec{p}_i\) by general imaging on E, and \({\mathcal {F}}_I^E(\varvec{p}_1,\ldots , \varvec{p}_n)\) comes from \({\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) by general imaging each \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) on E. Again, we show both inclusions. In the proofs, we appeal to the fact that any element of a convex set is some convex combination of the generating extreme points: for any \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\), \(\varvec{p}=\sum _{i=1}^n \alpha _i\varvec{p}_i\), where \(\alpha _i \ge 0\) for \(i = 1,\ldots , n\) and \(\sum _{i=1}^n \alpha _i= 1\) (see, e.g., Stewart & Ojea Quintana 2017, Lemma 1).

Let \(\varvec{q}\in {\mathcal {F}}(\varvec{q}_1,\ldots , \varvec{q}_n)\). So, \(\varvec{q}= \sum _{i=1}^n \alpha _i\varvec{q}_i\). Since \(\varvec{q}\) is a linear pool of \(\varvec{q}_i\) for \(i = 1,\ldots , n\), by Gärdenfors’ result, Theorem 5, \(\varvec{q}\) is also the result of imaging \(\varvec{p}= \sum _{i=1}^n\alpha _i\varvec{p}_i\) on E, because linear pooling and general imaging commute. Since \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\), it follows that \(\varvec{q}\in {\mathcal {F}}_I^E(\varvec{p}_1,\ldots , \varvec{p}_n)\).

For the other direction, assume that \(\varvec{q}\in {\mathcal {F}}_I^E(\varvec{p}_1,\ldots , \varvec{p}_n)\). So, \(\varvec{q}\) is the result of general imaging some \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) on E. For any \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n), \varvec{p}= \sum _{i=1}^n\alpha _i\varvec{p}_i\). By Gärdenfors’ result, \(\varvec{q}= \sum _{i=1}^n \alpha _i \varvec{q}_i\), where the \(\varvec{q}_i\) come from the \(\varvec{p}_i\) by general imaging on E, because general imaging and linear pooling commute. But then it follows that \(\varvec{q}\in {\mathcal {F}}(\varvec{q}_1,\ldots , \varvec{q}_n)\). \(\square \)
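The fact driving Gärdenfors’ theorem is that imaging, unlike conditionalization, acts linearly on pmfs. A minimal sketch (our own), restricted to the special case of Lewis-style imaging in which each world has a unique closest E-world:

```python
import numpy as np

# Omega = {w0, w1, w2, w3}; E = {w0, w1}; closest[w] is the nearest
# E-world to w (the identity on E itself).
closest = [0, 1, 0, 1]

def image(p, closest):
    """Lewis imaging on E: shift each world's mass to its closest E-world."""
    q = np.zeros(len(p))
    for w, mass in enumerate(p):
        q[closest[w]] += mass
    return q

p1 = np.array([0.1, 0.2, 0.3, 0.4])
p2 = np.array([0.25, 0.25, 0.25, 0.25])

for alpha in (0.0, 0.3, 0.7, 1.0):
    pool_then_image = image(alpha * p1 + (1 - alpha) * p2, closest)
    image_then_pool = alpha * image(p1, closest) + (1 - alpha) * image(p2, closest)
    assert np.allclose(pool_then_image, image_then_pool)
# Imaging is a linear map on pmfs, so it commutes with convex mixing --
# and therefore, pointwise, with convex IP pooling.
```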

Rights and permissions

Reprints and permissions

About this article


Cite this article

Stewart, R.T., Quintana, I.O. Learning and Pooling, Pooling and Learning. Erkenn 83, 369–389 (2018). https://doi.org/10.1007/s10670-017-9894-2

