In many learning or inference tasks human behavior approximates that of a Bayesian ideal observer, suggesting that, at some level, cognition can be described as Bayesian inference. However, a number of findings have highlighted an intriguing mismatch between human behavior and standard assumptions about optimality: People often appear to make decisions based on just one or a few samples from the appropriate posterior probability distribution, rather than using the full distribution. Although sampling-based approximations are a common way to implement Bayesian (...) inference, the very limited numbers of samples often used by humans seem insufficient to approximate the required probability distributions very accurately. Here, we consider this discrepancy in the broader framework of statistical decision theory, and ask: If people are making decisions based on samples—but as samples are costly—how many samples should people use to optimize their total expected or worst-case reward over a large number of decisions? We find that under reasonable assumptions about the time costs of sampling, making many quick but locally suboptimal decisions based on very few samples may be the globally optimal strategy over long periods. These results help to reconcile a large body of work showing sampling-based or probability matching behavior with the hypothesis that human cognition can be understood in Bayesian terms, and they suggest promising future directions for studies of resource-constrained cognition. (shrink)
Shepard has argued that a universal law should govern generalization across different domains of perception and cognition, as well as across organisms from different species or even different planets. Starting with some basic assumptions about natural kinds, he derived an exponential decay function as the form of the universal generalization gradient, which accords strikingly well with a wide range of empirical data. However, his original formulation applied only to the ideal case of generalization from a single encountered stimulus to a (...) single novel stimulus, and for stimuli that can be represented as points in a continuous metric psychological space. Here we recast Shepard's theory in a more general Bayesian framework and show how this naturally extends his approach to the more realistic situation of generalizing from multiple consequential stimuli with arbitrary representational structure. Our framework also subsumes a version of Tversky's set-theoretic model of similarity, which is conventionally thought of as the primary alternative to Shepard's continuous metric space model of similarity and generalization. This unification allows us not only to draw deep parallels between the set-theoretic and spatial approaches, but also to significantly advance the explanatory power of set-theoretic models. Key Words: additive clustering; Bayesian inference; categorization; concept learning; contrast model; features; generalization; psychological space; similarity. (shrink)
Marr's levels of analysis—computational, algorithmic, and implementation—have served cognitive science well over the last 30 years. But the recent increase in the popularity of the computational level raises a new challenge: How do we begin to relate models at different levels of analysis? We propose that it is possible to define levels of analysis that lie between the computational and the algorithmic, providing a way to build a bridge between computational- and algorithmic-level models. The key idea is to push the (...) notion of rationality, often used in defining computational-level models, deeper toward the algorithmic level. We offer a simple recipe for reverse-engineering the mind's cognitive strategies by deriving optimal algorithms for a series of increasingly more realistic abstract computational architectures, which we call “resource-rational analysis.”. (shrink)
Languages are transmitted from person to person and generation to generation via a process of iterated learning: people learn a language from other people who once learned that language themselves. We analyze the consequences of iterated learning for learning algorithms based on the principles of Bayesian inference, assuming that learners compute a posterior distribution over languages by combining a prior (representing their inductive biases) with the evidence provided by linguistic data. We show that when learners sample languages from this posterior (...) distribution, iterated learning converges to a distribution over languages that is determined entirely by the prior. Under these conditions, iterated learning is a form of Gibbs sampling, a widely-used Markov chain Monte Carlo algorithm. The consequences of iterated learning are more complicated when learners choose the language with maximum posterior probability, being affected by both the prior of the learners and the amount of information transmitted between generations. We show that in this case, iterated learning corresponds to another statistical inference algorithm, a variant of the expectation-maximization (EM) algorithm. These results clarify the role of iterated learning in explanations of linguistic universals and provide a formal connection between constraints on language acquisition and the languages that come to be spoken, suggesting that information transmitted via iterated learning will ultimately come to mirror the minds of the learners. (shrink)
If Bayesian Fundamentalism existed, Jones & Love's (J&L's) arguments would provide a necessary corrective. But it does not. Bayesian cognitive science is deeply concerned with characterizing algorithms and representations, and, ultimately, implementations in neural circuits; it pays close attention to environmental structure and the constraints of behavioral data, when available; and it rigorously compares multiple models, both within and across papers. J&L's recommendation of Bayesian Enlightenment corresponds to past, present, and, we hope, future practice in Bayesian cognitive science.
People are adept at inferring novel causal relations, even from only a few observations. Prior knowledge about the probability of encountering causal relations of various types and the nature of the mechanisms relating causes and effects plays a crucial role in these inferences. We test a formal account of how this knowledge can be used and acquired, based on analyzing causal induction as Bayesian inference. Five studies explored the predictions of this account with adults and 4-year-olds, using tasks in which (...) participants learned about the causal properties of a set of objects. The studies varied the two factors that our Bayesian approach predicted should be relevant to causal induction: the prior probability with which causal relations exist, and the assumption of a deterministic or a probabilistic relation between cause and effect. Adults’ judgments (Experiments 1, 2, and 4) were in close correspondence with the quantitative predictions of the model, and children’s judgments (Experiments 3 and 5) agreed qualitatively with this account. (shrink)
Analyzing the rate at which languages change can clarify whether similarities across languages are solely the result of cognitive biases or might be partially due to descent from a common ancestor. To demonstrate this approach, we use a simple model of language evolution to mathematically determine how long it should take for the distribution over languages to lose the influence of a common ancestor and converge to a form that is determined by constraints on language learning. We show that modeling (...) language learning as Bayesian inference of n binary parameters or the ordering of n constraints results in convergence in a number of generations that is on the order of n log n. We relax some of the simplifying assumptions of this model to explore how different assumptions about language evolution affect predictions about the time to convergence; in general, convergence time increases as the model becomes more realistic. This allows us to characterize the assumptions about language learning (given the models that we consider) that are sufficient for convergence to have taken place on a timescale that is consistent with the origin of human languages. These results clearly identify the consequences of a set of simple models of language evolution and show how analysis of convergence rates provides a tool that can be used to explore questions about the relationship between accounts of language learning and the origins of similarities across languages. (shrink)
The tendency to test outcomes that are predicted by our current theory (the confirmation bias) is one of the best-known biases of human decision making. We prove that the confirmation bias is an optimal strategy for testing hypotheses when those hypotheses are deterministic, each making a single prediction about the next event in a sequence. Our proof applies for two normative standards commonly used for evaluating hypothesis testing: maximizing expected information gain and maximizing the probability of falsifying the current hypothesis. (...) This analysis rests on two assumptions: (a) that people predict the next event in a sequence in a way that is consistent with Bayesian inference; and (b) when testing hypotheses, people test the hypothesis to which they assign highest posterior probability. We present four behavioral experiments that support these assumptions, showing that a simple Bayesian model can capture people's predictions about numerical sequences (Experiments 1 and 2), and that we can alter the hypotheses that people choose to test by manipulating the prior probability of those hypotheses (Experiments 3 and 4). (shrink)
Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes nets, and a third through structural learning. This paper focuses on people’s short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset.
Information changes as it is passed from person to person, with this process of cultural transmission allowing the minds of individuals to shape the information that they transmit. We present mathematical models of cultural transmission which predict that the amount of information passed from person to person should affect the rate at which that information changes. We tested this prediction using a function-learning task, in which people learn a functional relationship between two variables by observing the values of those variables. (...) We varied the total number of observations and the number of those observations that take unique values. We found an effect of the number of observations, with functions transmitted using fewer observations changing form more quickly. We did not find an effect of the number of unique observations, suggesting that noise in perception or memory may have affected learning. (shrink)