Multiarm bandit problems have been used to model the selection of competing scientific theories by boundedly rational agents. In this paper, I define a variable-arm bandit problem, which allows the set of scientific theories to vary over time. I show that Roth-Erev reinforcement learning, which solves multiarm bandit problems in the limit, cannot solve this problem in a reasonable time. However, social learning via preferential attachment, combined with individual reinforcement learning that discounts the past, does.
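The difference between plain Roth-Erev reinforcement and reinforcement that discounts the past can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's own model: the arms are fixed Bernoulli arms (the variable-arm setting and the preferential-attachment step are not modelled), and the names `choose_arm`, `update`, `run`, and `WEIGHT_FLOOR` are hypothetical.

```python
import random

# Assumption: a tiny floor keeps choice probabilities defined after heavy discounting.
WEIGHT_FLOOR = 1e-9


def choose_arm(weights, rng):
    """Pick an arm with probability proportional to its accumulated weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def update(weights, arm, payoff, discount=1.0):
    """Scale every weight by `discount`, then add the payoff to the chosen arm."""
    new_weights = [max(discount * w, WEIGHT_FLOOR) for w in weights]
    new_weights[arm] += payoff
    return new_weights


def run(success_probs, steps, discount=1.0, seed=0):
    """Simulate one learner on a fixed set of Bernoulli arms; return final weights."""
    rng = random.Random(seed)
    weights = [1.0] * len(success_probs)  # one unit of initial weight per arm
    for _ in range(steps):
        arm = choose_arm(weights, rng)
        payoff = 1.0 if rng.random() < success_probs[arm] else 0.0
        weights = update(weights, arm, payoff, discount)
    return weights
```

With `discount=1.0` the update is ordinary Roth-Erev reinforcement, where old payoffs never fade. With `discount` below 1 and payoffs of at most 1, the total weight stays bounded by roughly `1 / (1 - discount)`, so recent payoffs dominate the choice probabilities.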
We introduce a dynamic model for evolutionary games played on a network where strategy changes are correlated according to degree of influence between players. Unlike the notion of stochastic stability, which assumes mutations are stochastically independent and identically distributed, our framework allows for the possibility that agents correlate their strategies with the strategies of those they trust, or those who have influence over them. We show that the dynamical properties of evolutionary games, where such influence neighborhoods appear, differ dramatically from those where all mutations are stochastically independent, and establish some elementary convergence results relevant for the evolution of social institutions.
Sender–receiver games, first introduced by David Lewis ([1969]), have received increased attention in recent years as a formal model for the emergence of communication. Skyrms ([2010]) showed that simple models of reinforcement learning often succeed in forming efficient, albeit not necessarily minimal, signalling systems for a large family of games. Later, Alexander et al. ([2012]) showed that reinforcement learning, combined with forgetting, frequently produced both efficient and minimal signalling systems. In this article, I define a ‘dynamic’ sender–receiver game in which the state–action pairs are not held constant over time and show that neither of these two models of learning learns to signal in this environment. However, a model of reinforcement learning with discounting of the past does learn to signal; it also gives rise to the phenomenon of linguistic drift. Contents: 1 Introduction; 2 Dynamic Signalling Games with Reinforcement Learning; 2.1 Introducing new states; 2.2 Swapping state–action pairs; 3 Discounting the Past; 3.1 Learning to signal in a dynamic world; 3.2 An unexpected outcome: linguistic drift; 4 Conclusion; Appendix: A Markov Chain Analysis.
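A minimal sketch of urn-style reinforcement learning in a static sender–receiver game may help fix ideas. It is an illustration under stated assumptions, not the article's model: states are equiprobable, there are as many signals and acts as states, success pays 1, and discounting (when switched on) scales every weight in every urn each round. The dynamic game, in which state–action pairs change over time, is not modelled. The names `draw`, `play_round`, and `simulate`, and the `floor` parameter, are hypothetical.

```python
import random


def draw(urn, rng):
    """Draw an index with probability proportional to the weights in the urn."""
    return rng.choices(range(len(urn)), weights=urn, k=1)[0]


def play_round(sender, receiver, rng, discount=1.0, floor=1e-9):
    """Play one round; reinforce the choices made if the act matches the state."""
    state = rng.randrange(len(sender))   # nature picks a state uniformly
    signal = draw(sender[state], rng)    # sender has one urn per state
    act = draw(receiver[signal], rng)    # receiver has one urn per signal
    if discount < 1.0:
        # Assumption: all weights decay; the floor keeps every urn drawable.
        for urns in (sender, receiver):
            for urn in urns:
                for i, w in enumerate(urn):
                    urn[i] = max(discount * w, floor)
    success = act == state
    if success:
        sender[state][signal] += 1.0
        receiver[signal][act] += 1.0
    return success


def simulate(n=2, rounds=5000, discount=1.0, seed=0):
    """Return the success rate over (at most) the last 1000 rounds."""
    rng = random.Random(seed)
    sender = [[1.0] * n for _ in range(n)]
    receiver = [[1.0] * n for _ in range(n)]
    results = [play_round(sender, receiver, rng, discount) for _ in range(rounds)]
    tail = results[-1000:]
    return sum(tail) / len(tail)
```

Calling `simulate(n=2)` and `simulate(n=2, discount=0.99)` gives a rough way to compare learning with and without discounting in the static game.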
Evolutionary game theoretic accounts of justice attempt to explain our willingness to follow certain principles of justice by appealing to robustness properties possessed by those principles. Skyrms (1996) offers one sketch of how such an account might go for divide-the-dollar, the simplest version of the Nash bargaining game, using the replicator dynamics of Taylor and Jonker (1978). In a recent article, D'Arms et al. (1998) criticize his account and describe a model which, they allege, undermines his theory. I sketch a theory of evolutionary explanations of justice which avoids their methodological criticisms, and develop a spatial model of divide-the-dollar with more robust convergence properties than the models of Skyrms (1996) and D'Arms et al. (1998).
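For readers unfamiliar with the setup, a well-mixed replicator step for divide-the-dollar can be sketched as follows. This is an illustration under stated assumptions, not the spatial model developed in the paper: the update is a discrete-time form of the replicator dynamics, the good has 10 units, and the three demands 4, 5, and 6 are chosen only as an example. The names `payoff` and `replicator_step` are hypothetical.

```python
def payoff(own_demand, other_demand, total=10):
    """Each player receives their demand if the two demands are compatible, else nothing."""
    return own_demand if own_demand + other_demand <= total else 0


def replicator_step(shares, demands, total=10):
    """One discrete-time replicator update: new share = share * fitness / mean fitness."""
    fitness = [
        sum(shares[j] * payoff(d_i, d_j, total) for j, d_j in enumerate(demands))
        for d_i in demands
    ]
    mean_fitness = sum(x * f for x, f in zip(shares, fitness))
    if mean_fitness == 0:
        return list(shares)  # nobody earns anything, so shares do not move
    return [x * f / mean_fitness for x, f in zip(shares, fitness)]


if __name__ == "__main__":
    shares = [1 / 3, 1 / 3, 1 / 3]  # start from equal shares of each demand type
    for _ in range(100):
        shares = replicator_step(shares, [4, 5, 6])
    print(shares)
```

A population in which everyone demands half is a fixed point of this update, since demanding 5 against 5 earns 5 and no share changes.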
Decision theory faces a number of problematic gambles which challenge it to say what value an ideal rational agent should assign to the gamble, and why. Yet little attention has been devoted to the question of what an ideal rational agent is, and in what sense decision theory may be said to apply to one. I show that, given one arguably natural set of constraints on the preferences of an idealized rational agent, such an agent is forced to be indifferent among entire families of goods, and hence cannot choose among them. This result illustrates the dangers of speaking of the choices of an ‘ideal rational agent’ when one does not make precise the exact nature of the idealizing assumptions. The result may also be viewed as providing an upper bound on the kinds of idealizing assumptions which can be made for rational agents, beyond which the very concept of choice becomes attenuated.
The Pasadena game is an example of a decision problem which lacks an expected value, as traditionally conceived. Easwaran (2008) has shown that, if we distinguish between two different kinds of expectations, which he calls ‘strong’ and ‘weak’, the Pasadena game lacks a strong expectation but has a weak expectation. Furthermore, he argues that we should use the weak expectation as providing a measure of the value of an individual play of the Pasadena game. By considering a modified version of the Pasadena game, I argue that weak expectations may provide a very poor measure of the value of an individual play of the game, and hence should not be used to value individual plays unless further information is taken into consideration.
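Why the game lacks an expected value in the traditional sense can be seen from a short calculation. The sketch below assumes the standard formulation of the Pasadena game, which the abstract itself does not spell out: a fair coin is tossed until the first head, and if that head occurs on toss n the payoff is (-1)^{n-1} 2^n / n.

```latex
% Payoff and probability when the first head occurs on toss n:
X_n = (-1)^{n-1}\,\frac{2^n}{n}, \qquad P(N = n) = 2^{-n}.

% Summed in the natural order, the expectation series is the alternating harmonic series:
\sum_{n=1}^{\infty} 2^{-n}\,(-1)^{n-1}\,\frac{2^n}{n}
  = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}
  = \ln 2 .

% The series is only conditionally convergent:
\sum_{n=1}^{\infty} \left| \frac{(-1)^{n-1}}{n} \right|
  = \sum_{n=1}^{\infty} \frac{1}{n} = \infty .
```

Because the series converges only conditionally, rearranging its terms can change the sum, so no order-independent expected value exists. The value ln 2 obtained in the natural order is the figure usually associated with the weak expectation.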
In the course of history, many individuals have the dubious honor of being remembered primarily for an eponym of which they would disapprove. How many are aware that Joseph-Ignace Guillotin actually opposed the death penalty? Another notable case is that of Maria Agnesi, an Italian woman of privileged, but not noble, birth who excelled at mathematics and philosophy during the eighteenth century. In her treatise of 1748, Instituzioni Analitiche, she provided a comprehensive summary of the current state of knowledge concerning both integral calculus and differential equations. Later in life she was elected to the Bologna Academy of Sciences and, in 1762, was consulted by the University of Turin for an opinion on the work of an up-and-coming mathematician named Joseph-Louis Lagrange.
Recent years have seen increased interest in the question of whether it is possible to provide an evolutionary game-theoretic explanation for certain kinds of social norms. I sketch a proof of a general representation theorem for a large class of evolutionary game-theoretic models played on a social network, in hope that this will contribute to a greater understanding of the long-term evolutionary dynamics of such models, and hence the evolution of social norms.
At the very end of the 19th century, Gabriel Tarde wrote that all society was a product of imitation and innovation. This view regarding the development of society has, to a large extent, fallen out of favour, and especially so in those areas where the rational actor model looms large. I argue that this is unfortunate, as models of imitative learning, in some cases, agree better with what people actually do than more sophisticated models of learning. In this paper, I contrast the behaviour of imitative learning with two more sophisticated learning rules in the context of social deliberation problems. I show for two social deliberation problems, the Centipede game and a simple Lewis sender-receiver game, that imitative learning provides better agreement with what people actually do, thus partially vindicating Tarde.
Cheap talk has often been thought incapable of supporting the emergence of cooperation because costless signals, easily faked, are unlikely to be reliable. I show how, in a social network model of cheap talk with reinforcement learning, cheap talk does enable the emergence of cooperation, provided that individuals also temporally discount the past. This establishes one mechanism that suffices for moving a population of initially uncooperative individuals to a state of mutually beneficial cooperation even in the absence of formal institutions.
Rachlin's idea that altruism, like self-control, is a valuable, temporally extended pattern of behavior, suggests one way of addressing common problems in developing a rational choice explanation of individual altruistic behavior. However, the form of Rachlin's explicitly behaviorist account of altruistic acts suffers from two faults, one of which questions the feasibility of his particular behaviorist analysis.
Strong reciprocators possess two behavioural dispositions: they are willing to bestow benefits on those who have bestowed benefits, and they are willing to punish those who fail to bestow benefits according to some social norm. There is no doubt that people's behaviour, in many cases, agrees with what we would expect if people are strong reciprocators, and Fehr and Henrich argue that many people are, in fact, strong reciprocators. They also suggest that strongly reciprocal behaviour may be brought about by specialised cognitive architecture produced by evolution. I argue that specialised cognitive architecture can play a role in the production of strongly reciprocal behaviour only in a very attenuated sense, and that the evolutionary foundations of strong reciprocity are more likely cultural than biological.
In a highly influential work, List and Pettit draw upon the theory of judgement aggregation to offer an argument for the existence of nonreductive group agents; they also suggest that nonreductive group agency is a widespread phenomenon. In this paper, we argue for the following two claims. First, that the axioms they consider cannot naturally be interpreted as either descriptive characterisations or normative constraints upon group judgements, in general. This makes it unclear how the List and Pettit argument is to apply to real world group behaviour. Second, by examining empirical data about how group judgements are made by a powerful international regulatory board, we show how each of the List and Pettit axioms can be violated in ways which are straightforwardly explicable at the level of the individual. This suggests that group agency may best be understood as a pluralistic phenomenon, where close inspection of the dynamics of intragroup deliberation can reveal that what prima facie appears to be a nonreductive group agent is, in fact, reducible.