Ken Binmore, Rational Decisions (Princeton: Princeton University Press, 2009)

Ken Binmore’s subtle and thought-provoking book offers a trenchant critique of a number of Bayesian orthodoxies in economics and philosophy. At the same time it offers an elegant (and highly individual) survey of the principal topics, both philosophical and formal, in discussions of probability, rational choice, and Bayesian epistemology. The survey and critique are well integrated. The result deserves lengthy study from anyone interested in theories of rational decision-making.

Binmore’s take-home message is that Jimmie Savage and other pioneers of formal decision theory created a monster that they could not control. Properly understood, decision theory is a tool with limited applications. Savage himself restricted it to what he called “small worlds” – worlds where decision-makers can predict the potential impact of all possible sources of new information. Binmore offers a sustained argument in support of Savage’s restriction, drawing on an eclectic range of resources, from the mathematics of probability theory to the epistemology of Bayesian updating, via a version of the Kaplan-Montague paradox of the knower. In place of unrestricted Bayesianism, Binmore proposes a more modest theory designed to apply to decision-making in larger-than-small worlds (and so to be potentially relevant both to microeconomics and to macroeconomics). He rejects the basic assumption of Bayesian decision theory, which is that rational decision-makers can always assign numerically determinate probabilities, working instead with a theory that accommodates uncertainty and ambiguity by requiring only that decision-makers assign upper probabilities and lower probabilities.

Binmore’s framework is set by the distinction between decision-making under risk and decision-making under uncertainty. We make decisions under risk when we are able to assign probabilities to the different possible states of the world. For Binmore these probabilities have to be objective. He gives an elegant presentation of how suitably consistent choices over a set of actions permit the derivation of a von Neumann-Morgenstern utility function over the consequences of those actions. This utility function is cardinal and has the expected utility property – that is, a suitably consistent agent will choose between actions as though she were maximizing the expected value of that utility function. The presentation of the expected utility theorem in sections 3.4.1 and 3.6.1 is one of the many places in the book where Binmore uses diagrams very effectively.

The expected utility theorem rests upon the mathematics of probability, and Binmore explores the foundations of classical probability theory in Chapter 5. This is a section of the book where mathematical argumentation is particularly prominent, but most of the weight is carried by the general ideas, which are clearly communicated. As Binmore observes, Kolmogorov’s formulation of probability theory is highly abstract. A probability measure is a function defined on the measurable subsets (in the sense of Lebesgue measure theory) of a sample space. The axioms of probability theory place constraints upon the behavior of subsets of the sample space. The constraints are that the space as a whole be measurable; that the complement of a measurable set be measurable; and that the union of a finite or countable set of measurable events be measurable. The Kolmogorov axioms apply only to measurable events.

It is relatively straightforward to show that there are non-measurable sets. Binmore reviews Vitali’s (choice-dependent) illustration of a set on the circle that is not Lebesgue-measurable. The Banach-Tarski paradox is an even more exotic example. From the viewpoint of decision theory, though, Binmore thinks that non-measurability is an everyday phenomenon. In order to appreciate this we need to look more closely at how measurable sets behave (bearing in mind that an event in probability theory is simply a subset of the sample space).

Suppose that S is a set. It has supersets T1, T2, . . . within the sample space. If these supersets are measurable, then their measures bound from above any measure that S could be assigned. The tightest such bound is the outer measure m̄(S): the greatest real number less than or equal to m(Ti) for every measurable superset Ti of S. Similar reasoning allows a lower bound, the inner measure m̲(S), to be derived from the measurable subsets of S. Measurable sets have the following property:

Measurability: m̲(S) = m(S) = m̄(S)

In the case of non-measurable sets, the outer measure m̄(S) and the inner measure m̲(S) diverge. So, one way of thinking about non-measurable events is in terms of probability ranges, identifying the outer measure with an upper probability and the inner measure with a lower probability. Binmore argues very plausibly that many events fit this description – we can assign them probability intervals, but not numerically determinate probabilities. Complete ignorance, where the interval is [0, 1], is the limiting case. For more interesting examples, think about cases of ambiguity, such as those revealed in the Ellsberg paradox.
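The arithmetic of upper and lower probabilities is easy to exhibit in miniature. The following sketch (my own illustration, not an example from the book; the sample space, the coarse algebra of measurable events, and the measure are all hypothetical) computes them by brute force as inner and outer measures:

```python
from itertools import chain, combinations

# Toy sample space with a deliberately coarse algebra of measurable
# events: only unions of the "blocks" {1, 2} and {3, 4} are measurable.
OMEGA = frozenset({1, 2, 3, 4})
BLOCKS = [frozenset({1, 2}), frozenset({3, 4})]
MEASURE = {frozenset(): 0.0, BLOCKS[0]: 0.5, BLOCKS[1]: 0.5, OMEGA: 1.0}

def measurable_events():
    """Yield every union of blocks (including the empty union)."""
    for chosen in chain.from_iterable(
            combinations(BLOCKS, r) for r in range(len(BLOCKS) + 1)):
        yield frozenset().union(*chosen)

def upper_probability(event):
    """Outer measure: the least measure among measurable supersets."""
    return min(MEASURE[e] for e in measurable_events() if event <= e)

def lower_probability(event):
    """Inner measure: the greatest measure among measurable subsets."""
    return max(MEASURE[e] for e in measurable_events() if e <= event)

# {1} is not a union of blocks, hence non-measurable in this algebra:
# its probability is pinned down only to the interval [0, 0.5].
E = frozenset({1})
print(lower_probability(E), upper_probability(E))                  # 0.0 0.5

# A measurable event has coinciding inner and outer measures.
print(lower_probability(BLOCKS[0]), upper_probability(BLOCKS[0]))  # 0.5 0.5
```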
Non-measurable events cannot be usefully discussed within Kolmogorov’s framework – which means that there is not much to be said about them within the framework of von Neumann-Morgenstern utility theory. With this we have reached the nub of the issue between Binmore and the Bayesian orthodoxy. From the orthodox Bayesian point of view, non-measurability applies only when we are looking for objective probabilities. A rational agent will always be able to assign a unique subjective probability to a given event. This unique subjective probability is then updated according to Bayes’s rule as more information comes in. As Binmore lucidly explains in section 7.2, if a decision-maker satisfies a number of basic postulates, then we can prove an analog of the von Neumann-Morgenstern expected utility theorem, so that a suitably consistent decision-maker chooses as if maximizing expected utility relative to her subjective probabilities. If the Bayesian move is admissible, then all events are measurable.

But Binmore is not prepared to accept the move – at least, not when decision-makers confront what he calls large worlds. His argument starts off from the observation that Savage’s theory of subjective probability entails Kolmogorov’s definition of conditional probability. Since Bayes’s rule connecting posterior and prior probabilities is simply a rewriting of that definition, any follower of Savage is left with the problem of explaining where the prior probabilities come from. It is this problem, Binmore argues, that can only be solved in a small world.

The key argument is in section 7.5.1. Binmore offers (on Savage’s behalf) a simple method of identifying prior probabilities. The decision-maker should work backwards from her posterior probabilities to her priors. She should start by asking, for each possible way that things might turn out, what subjective probabilities she would assign in the relevant circumstances. This will give her a set of posterior probabilities that will most likely be inconsistent. Suitable massaging (of the type that Savage himself famously engaged in when he ran into the Allais paradox) will turn these judgments into a consistent set of posteriors. Savage’s own theory then tells us that this consistent set of posteriors can be derived from a single set of priors. So the decision-maker just needs to carry out the derivation.
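The derivation itself is mechanical. Here is a minimal sketch (the states, signals, and numbers are my own hypothetical choices, not Binmore’s) of a decision-maker recovering her prior from a massaged set of posteriors, after which Bayes’s rule does nothing but reproduce the posteriors she started from:

```python
# In a small world the decision-maker can list every signal she might
# observe and decide what her massaged posterior over the states would
# be after each one. The unique prior then falls out by arithmetic.
STATES = ["bull", "bear"]     # hypothetical states of the world
SIGNALS = ["up", "down"]      # hypothetical observable signals

# Massaged (consistent) posteriors p(state | signal), together with the
# marginal probability of observing each signal.
posterior = {"up":   {"bull": 0.8, "bear": 0.2},
             "down": {"bull": 0.3, "bear": 0.7}}
signal_prob = {"up": 0.5, "down": 0.5}

# The law of total probability recovers the prior from the posteriors.
prior = {s: sum(posterior[sig][s] * signal_prob[sig] for sig in SIGNALS)
         for s in STATES}
print(prior)   # bull ≈ 0.55, bear ≈ 0.45

# Bayes's rule now merely re-derives the posteriors we began with.
# First invert to get the likelihoods p(signal | state) ...
likelihood = {sig: {s: posterior[sig][s] * signal_prob[sig] / prior[s]
                    for s in STATES} for sig in SIGNALS}
# ... then apply Bayes's rule to the likelihoods and the prior.
recovered = {sig: {s: likelihood[sig][s] * prior[s] /
                      sum(likelihood[sig][t] * prior[t] for t in STATES)
                   for s in STATES} for sig in SIGNALS}
print(recovered["up"])   # bull ≈ 0.8, bear ≈ 0.2 – as specified at the outset
```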
This account of how priors can be fixed has two consequences. The first is that subjective decision theory can be applied only in small worlds. Only in small worlds can decision-makers get started on the process of considering what their posterior probabilities would look like in all possible contingencies. The second is a very significant demotion of Bayes’s rule, which is no longer a blueprint for inductive reasoning. Instead, as Binmore puts it, “Bayes’s rule is reduced to nothing more than a bookkeeping tool that saves Pandora from having to remember all her massaged posterior probabilities” (p. 132).

So, if this is right, what are we supposed to do in large worlds when we lack information about objective probabilities? If we are in a position of complete ignorance, then there is a range of possible decision rules we can appeal to, including maximin, the Hurwicz criterion, and Savage’s own minimax regret criterion (Binmore unsurprisingly gives short shrift to the Bayesian principle of insufficient reason). Chapter 9 presents and discusses the axiomatic framework that Milnor offers for thinking about these different criteria.

The crucial case, though, is where decision-makers are partially ignorant. They are not completely ignorant, but nor do they have a complete set of determinate subjective probabilities. This is where Binmore proposes his own extension of Bayesian decision theory. He assumes that some form of Savage-like massaging has taken place, with the decision-maker ending up with at least some events to which she is only able to assign upper and lower probabilities. The current literature contains a number of different models of rational decision-making in this sort of situation. Giron and Rios have extended von Neumann-Morgenstern utility theory to probability intervals (on the assumption that the class of possible probability measures is convex), defining a preference relation over acts such that a decision-maker will choose one act over another only if the expected utility of the first exceeds the expected utility of the second for all probability measures deemed possible. As one might imagine, this generates only an incomplete preference relation, since there are likely to be pairs of acts whose expected utilities are ordered differently by different probability measures. The Wald maximin criterion, by contrast, does yield a complete preference relation: the decision-maker works out, for each act, its expected utility under every probability measure deemed possible, and then chooses the act whose worst-case expected utility is highest.

Binmore adopts a different tack. He aims for a complete preference relation (which rules out the Giron and Rios approach) and, moreover, one that is determined solely by the upper and lower probabilities of the relevant events (which rules out Wald maximin). The reasoning behind the latter aim stems from some suggestive remarks of Milnor (quoted on p. 162). Milnor points out that we can transform certain types of decision-making under partial ignorance into decision-making under total ignorance. So, for example, if we have upper and lower probability bounds then we can rule out all probability assignments that fall outside the bounds. Once we have done that we are back in a state of total ignorance.
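Before turning to Binmore’s own proposal, the contrast between the two benchmarks just described can be made concrete with a toy example (my own construction, not the book’s): two acts, two states, and nothing known beyond the probability of the first state lying in the interval [0.3, 0.6].

```python
# Two acts paying off in two states; all that is known is that the
# probability of state 1 lies somewhere in the interval [0.3, 0.6].
payoffs = {"act_a": (10.0, 2.0), "act_b": (5.0, 5.0)}
priors = [(p / 100, 1 - p / 100) for p in range(30, 61)]  # grid of possible measures

def expected_utility(act, prior):
    return sum(u * q for u, q in zip(payoffs[act], prior))

# Giron and Rios: prefer act_a to act_b only if it does better under
# *every* probability measure deemed possible.
dominates = all(expected_utility("act_a", q) > expected_utility("act_b", q)
                for q in priors)
print(dominates)   # False - the two acts are simply incomparable

# Wald maximin: rank each act by its worst-case expected utility,
# which always delivers a complete ordering.
for act in payoffs:
    print(act, min(expected_utility(act, q) for q in priors))
# act_a: 4.4 (at p = 0.3); act_b: 5.0 - so maximin chooses act_b.
```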
Binmore’s own approach essentially applies Milnor’s insight. He assumes (in line with one of Milnor’s axioms – the column duplication axiom) that a decision-maker assigns utility to a gamble on a single event as a function only of that event’s upper and lower probabilities – and also that the von Neumann-Morgenstern postulates apply. So we have a von Neumann-Morgenstern utility function u whose value coincides with the value of a function U whose domain is pairs of upper and lower probabilities. This function U will, he argues, have certain properties (assumptions (1) through (3) on p. 166). From these properties he concludes (in section 9.2.2) that U functions as a multiplicative version of the Hurwicz criterion for decision-making under complete ignorance. The Hurwicz criterion in its standard form maximizes the value of (1 – h)c + hC, where c is the worst pay-off consequent upon a given action; C the best pay-off; and h an index of optimism (see Luce and Raiffa 1957, p. 283, for a way of calculating your own index). Binmore ends up with the multiplicative analog:

U(p, P) = p^(1 – h) P^h

where U is the function described in the previous paragraph and p, P are the lower and upper probabilities respectively. As he shows in section 9.2.3, provided that the Hurwicz coefficient h is less than 0.5, this multiplicative Hurwicz criterion predicts the standard pattern of choices in Ellsberg cases.
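The Ellsberg prediction is easy to verify numerically. Here is a back-of-the-envelope check using the classic three-color urn (30 red balls, 60 balls black or yellow in unknown proportion); the urn and the value h = 0.4 are my own illustrative choices rather than Binmore’s worked example:

```python
def U(p, P, h):
    """Multiplicative Hurwicz score for lower probability p and upper probability P."""
    return p ** (1 - h) * P ** h

h = 0.4   # a Hurwicz coefficient below 0.5, as the text requires

# Lower and upper probabilities of winning on each gamble.
gambles = {
    "bet on red":             (1/3, 1/3),   # unambiguous
    "bet on black":           (0.0, 2/3),   # ambiguous
    "bet on red or yellow":   (1/3, 1.0),   # ambiguous
    "bet on black or yellow": (2/3, 2/3),   # unambiguous
}

for name, (p, P) in gambles.items():
    print(f"{name}: U = {U(p, P, h):.3f}")

# Output: red 0.333 > black 0.000, and black-or-yellow 0.667 >
# red-or-yellow 0.517. That is exactly the ambiguity-averse Ellsberg
# pattern, which no single subjective probability can rationalize.
```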
In section 9.3.1 he illustrates how his approach can be extended to game theory. A mixed strategy in game theory is one where a player chooses according to a probability distribution across the available pure strategies (where a pure strategy is one that prescribes a specific move for every situation). The notion of a mixed strategy can be extended to the notion of a muddled strategy, where muddled strategies specify the upper and lower probabilities with which the pure strategies will be played. Muddled strategies are applied there to the Battle of the Sexes: Binmore shows (with the technical details in section 10.7) that the game has a continuum of symmetric equilibria if muddled strategies are allowed.

* * *

Let me turn now to exploring in more detail the dialectic between Binmore and the subjective decision theorist. Here is what might seem an obvious ad hominem objection to Binmore. The crucial argument for restricting classical Bayesian decision theory to small worlds derives from the problem of the priors. The only plausible way of assigning priors, he argues, is by working backwards from posteriors. We are only in a position to do this in small worlds, because it is only in small worlds that we can consider in advance how all possible new information might impact our probability assignments. But this problem does not go away simply because our posteriors are probability intervals rather than unique subjective probabilities. Binmore himself states that his extended Bayesian decision theory only applies to decision-makers with appropriately massaged posterior probabilities. But surely, a subjective decision theorist is likely to argue, the constraints upon “Savage massaging” mean that Binmore’s own recipe for rational decision-making can itself be followed only in exactly the same worlds as subjective decision theory.

Binmore’s response, I think, would be that he is not offering a refinement of subjective decision theory. He is not proposing subjective probability intervals in place of numerically determinate subjective probabilities. Instead, what he is proposing is a set of techniques for dealing with non-measurable events in terms of events that are measurable. This is clearly stated, for example, on p. 164: “we are now concerned with the case in which the event E isn’t measurable. However, other events are assumed to be measurable, so that it becomes meaningful to talk about the upper probability P = p̄(E) and the lower probability p = p̲(E) of the event E.”

But how should we think about these measurable events? One possibility would be to take the measurable events that fix the lower and upper probabilities to be objectively given, so that the resulting upper and lower probabilities are themselves objective. This would certainly be plausible for the Ellsberg paradox and comparable cases, which are set up to provide clear and objectively derived upper and lower probabilities. We know that the number of black balls in the urn is between 0 and 20. Likewise for the number of white balls. And it is certainly no small achievement to find a descriptively adequate and normatively compelling model of the dominant pattern of choices in Ellsberg cases. But a subjective decision theorist is likely to wonder whether this really adds up to an interesting extension of Bayesian decision theory. The problem is that very little of the decision-making whose rationality concerns us seems susceptible to an objectivist approach. Small worlds of the type envisaged by Savage are rarely encountered outside the laboratory, and the decisions we take in them are rarely of much import. But it is not clear how many significant larger-than-small-world decisions would be tractable, if tractability were to require objectively fixed upper and lower probabilities.

As Binmore brings out in his discussion of von Mises in Chapter 6, objective probabilities typically make sense only relative to what von Mises called collectives – sequences of trials that differ only in certain prespecified attributes. Binmore extends von Mises’s account by introducing what he calls randomizing boxes (and muddling boxes), but the basic point remains. We can only speak of objective probabilities when we are dealing with large classes of suitably homogeneous events – coin tosses, mortality statistics, and so forth. But how many of the truly important decisions that we make (even within the economic and financial realms) depend upon events that are homogeneous in this way? The consequences of buying Treasury bonds, for example, depend critically upon the future course of interest rates. But can we really look to past patterns of interest rate behavior in order to derive upper and lower probabilities? There is a sense in which we can. I’m fairly confident, for example, that interest rates won’t rise above 150% and that they won’t be hovering around 0 for too long. No doubt if I were a more astute observer of central banks and financial markets I could refine these upper and lower bounds. But I don’t think that I would be doing this by thinking in terms of frequencies and objective probabilities. Financial history is simply not homogeneous enough for the notion of frequency to be applicable.
The subjective probability theorist seems at least to have a more accurate way of describing what I would be doing – namely, fixing a subjective probability interval (a range of degrees of belief). The key question, then, is whether Binmore’s approach can be understood in terms of subjective probabilities. He argues persuasively that subjective probabilities cannot be applied in the way that the subjective Bayesian wants to apply them – at least, not outside the small worlds where it is possible to look before one leaps, as he puts it. But, given that he is proposing an extension of Bayesian decision theory to larger-than-small worlds, his position must be that his own arguments do not apply to subjective probabilities that fix upper and lower bounds, as opposed to giving unique degrees of belief. The book does not, however, explain why this should be the case. What’s so special about subjective probability intervals that they should be immune to arguments that are effective against subjective probabilities in the classical sense? As far as I can see, the arguments for restriction to small worlds based on the requirements of Bayesian updating and fixing priors apply just as forcefully in the case where upper and lower probabilities diverge as in the case where they coincide. In any event, this is something that I hope Binmore will clarify in future work.

* * *

As I hope this review has made clear, Rational Decisions contains a wealth of stimulating arguments and thought-provoking claims. It would be an excellent text for an advanced seminar in decision theory, particularly for students with a solid technical background. And no economist, philosopher, or political scientist seriously interested in theories of rational decision-making can afford to ignore Binmore’s controversial and iconoclastic claims.