Marr's levels of analysis—computational, algorithmic, and implementation—have served cognitive science well over the last 30 years. But the recent increase in the popularity of the computational level raises a new challenge: How do we begin to relate models at different levels of analysis? We propose that it is possible to define levels of analysis that lie between the computational and the algorithmic, providing a way to build a bridge between computational- and algorithmic-level models. The key idea is to push the notion of rationality, often used in defining computational-level models, deeper toward the algorithmic level. We offer a simple recipe for reverse-engineering the mind's cognitive strategies by deriving optimal algorithms for a series of increasingly more realistic abstract computational architectures, which we call “resource-rational analysis.”
Is language understanding a special case of social cognition? To help evaluate this view, we can formalize it as the rational speech-act theory: Listeners assume that speakers choose their utterances approximately optimally, and listeners interpret an utterance by using Bayesian inference to “invert” this model of the speaker. We apply this framework to model scalar implicature (“some” implies “not all,” and “N” implies “not more than N”). This model predicts an interaction between the speaker's knowledge state and the listener's interpretation. We test these predictions in two experiments and find good fit between model predictions and human judgments.
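The speaker-model inversion this abstract describes can be sketched in a few lines. Below is a minimal illustration of the rational speech-act recursion for the “some”/“all” scalar implicature; the four-state world, uniform prior, and rationality parameter alpha = 1 are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal RSA sketch for scalar implicature (illustrative parameters).
STATES = [0, 1, 2, 3]                 # how many of 3 items have the property
UTTERANCES = ["none", "some", "all"]

def literal(u, s):
    """Truth-conditional semantics of each utterance."""
    return {"none": s == 0, "some": s >= 1, "all": s == 3}[u]

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def L0(u):
    """Literal listener: condition a uniform prior on the utterance being true."""
    return normalize({s: float(literal(u, s)) for s in STATES})

def S1(s, alpha=1.0):
    """Pragmatic speaker: soft-max of informativeness, P(u|s) ∝ L0(s|u)^alpha."""
    return normalize({u: L0(u)[s] ** alpha for u in UTTERANCES})

def L1(u):
    """Pragmatic listener: Bayesian inversion of the speaker model."""
    return normalize({s: S1(s)[u] for s in STATES})
```

With these assumptions, the pragmatic listener hearing “some” assigns the all-state probability 1/9, down from the literal listener's 1/3: the implicature that “some” suggests “not all.”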
We derive a probabilistic account of the vagueness and context-sensitivity of scalar adjectives from a Bayesian approach to communication and interpretation. We describe an iterated-reasoning architecture for pragmatic interpretation and illustrate it with a simple scalar implicature example. We then show how to enrich the apparatus to handle pragmatic reasoning about the values of free variables, explore its predictions about the interpretation of scalar adjectives, and show how this model implements Edgington’s (1997) account of the sorites paradox, with variations. The Bayesian approach has a number of explanatory virtues: in particular, it does not require any special-purpose machinery for handling vagueness, and it is integrated with a promising new approach to pragmatics and other areas of cognitive science.
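The free-variable machinery can be sketched for a gradable adjective: a listener who jointly infers a degree (height) and the threshold for “tall,” in the style this abstract describes. The height grid, Gaussian-shaped prior, alpha, and utterance cost below are all illustrative assumptions.

```python
import math

H = [round(0.1 * i, 1) for i in range(21)]   # height scale, 0.0-2.0 (arbitrary units)

def h_prior(h):
    # illustrative roughly-Gaussian prior over heights, centered at 1.0
    return math.exp(-((h - 1.0) ** 2) / (2 * 0.25 ** 2))

def L0(u, theta):
    """Literal listener: 'tall' means h >= theta; '' (silence) is always true."""
    w = {h: h_prior(h) * (1.0 if u == "" or h >= theta else 0.0) for h in H}
    z = sum(w.values())
    return {h: v / z for h, v in w.items()}

def S1(h, theta, alpha=4.0, cost=0.5):
    """Speaker: soft-max of informativeness minus utterance cost."""
    scores = {}
    for u, c in [("", 0.0), ("tall", cost)]:
        scores[u] = (L0(u, theta)[h] ** alpha) * math.exp(-alpha * c)
    z = sum(scores.values())
    return {u: s / z for u, s in scores.items()}

def L1(u):
    """Pragmatic listener: jointly infer height h and the free threshold theta."""
    joint = {(h, theta): h_prior(h) * S1(h, theta)[u] for theta in H for h in H}
    z = sum(joint.values())
    return {k: v / z for k, v in joint.items()}

def mean_height(joint):
    return sum(h * p for (h, _), p in joint.items())
```

Because the threshold is inferred rather than fixed, hearing “tall” shifts the height posterior upward while leaving borderline heights with intermediate probability, which is this style of model's handle on vagueness.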
Hierarchical Bayesian models (HBMs) provide an account of Bayesian inference in a hierarchically structured hypothesis space. Scientific theories are plausibly regarded as organized into hierarchies in many cases, with higher levels sometimes called ‘paradigms’ and lower levels encoding more specific or concrete hypotheses. Therefore, HBMs provide a useful model for scientific theory change, showing how higher-level theory change may be driven by the impact of evidence on lower levels. HBMs capture features described in the Kuhnian tradition, particularly the idea that higher-level theories guide learning at lower levels. In addition, they help resolve certain issues for Bayesians, such as scientific preference for simplicity and the problem of new theories. Received July 2009; revised October 2009. To contact the authors, please write to: Leah Henderson, Massachusetts Institute of Technology, 77 Massachusetts Avenue, 32D-808, Cambridge, MA 02139; e-mail: [email protected].
Recent debates over adults' theory of mind use have been fueled by surprising failures of perspective-taking in communication, suggesting that perspective-taking may be relatively effortful. Yet adults routinely engage in effortful processes when needed. How, then, should speakers and listeners allocate their resources to achieve successful communication? We begin with the observation that the shared goal of communication induces a natural division of labor: The resources one agent chooses to allocate toward perspective-taking should depend on their expectations about the other's allocation. We formalize this idea in a resource-rational model augmenting recent probabilistic weighting accounts with a mechanism for (costly) control over the degree of perspective-taking. In a series of simulations, we first derive an intermediate degree of perspective weighting as an optimal trade-off between expected costs and benefits of perspective-taking. We then present two behavioral experiments testing novel predictions of our model. In Experiment 1, we manipulated the presence or absence of occlusions in a director–matcher task. We found that speakers spontaneously modulated the informativeness of their descriptions to account for “known unknowns” in their partner's private view, reflecting a higher degree of speaker perspective-taking than previously acknowledged. In Experiment 2, we then compared the scripted utterances used by confederates in prior work with those produced in interactions with unscripted directors. We found that confederates were systematically less informative than listeners would initially expect given the presence of occlusions, but listeners used violations to adaptively make fewer errors over time. Taken together, our work suggests that people are not simply “mindblind”; they use contextually appropriate expectations to navigate the division of labor with their partner. We discuss how a resource-rational framework may provide a more deeply explanatory foundation for understanding flexible perspective-taking under processing constraints.
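The cost-benefit trade-off behind the “intermediate degree of perspective weighting” can be illustrated with a deliberately simple stand-in: a listener who weights the common-ground perspective by w pays a convex effort cost but avoids egocentric errors. The error rate and cost function below are invented for illustration, not the fitted model from the paper.

```python
# Toy resource-rational trade-off: find the weight w on perspective-taking
# that maximizes communicative benefit minus cognitive cost (assumed forms).
def expected_utility(w, ego_error=0.4, effort_cost=0.3):
    benefit = 1.0 - (1.0 - w) * ego_error   # more perspective-taking, fewer errors
    cost = effort_cost * w ** 2             # but effort grows convexly with w
    return benefit - cost

# grid-search the resource-rational weight over w in [0, 1]
grid = [i / 100 for i in range(101)]
w_star = max(grid, key=expected_utility)
```

With these assumed parameters the optimum lands at an interior weight (analytically w* = 2/3): an intermediate degree of perspective-taking, neither fully egocentric nor fully perspective-taking.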
An important, but relatively neglected, aspect of human theory of mind is emotion inference: understanding how and why a person feels a certain way is central to reasoning about their beliefs, desires, and plans. The authors review recent work that has begun to unveil the structure and determinants of emotion inference, organizing these findings within a unified probabilistic framework.
Learning to understand a single causal system can be an achievement, but humans must learn about multiple causal systems over the course of a lifetime. We present a hierarchical Bayesian framework that helps to explain how learning about several causal systems can accelerate learning about systems that are subsequently encountered. Given experience with a set of objects, our framework learns a causal model for each object and a causal schema that captures commonalities among these causal models. The schema organizes the objects into categories and specifies the causal powers and characteristic features of these categories and the characteristic causal interactions between categories. A schema of this kind allows causal models for subsequent objects to be rapidly learned, and we explore this accelerated learning in four experiments. Our results confirm that humans learn rapidly about the causal powers of novel objects, and we show that our framework accounts better for our data than alternative models of causal learning.
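One way to see the accelerated learning this abstract describes is a toy two-level model in which objects of a category share a causal power theta. The grid, the data, and collapsing the schema to a single shared parameter are simplifications for illustration, not the paper's framework.

```python
# Toy schema learning: objects in a category share a causal power theta.
# Evidence from familiar objects constrains theta, so predictions about a
# brand-new object of the category are sharp before it is ever observed.
GRID = [i / 100 for i in range(1, 100)]   # candidate values of theta

def posterior(observations):
    """observations: list of (successes, trials), one pair per familiar object."""
    scores = []
    for theta in GRID:
        like = 1.0
        for k, n in observations:
            like *= theta ** k * (1 - theta) ** (n - k)
        scores.append(like)                # uniform prior over the grid
    z = sum(scores)
    return [s / z for s in scores]

def predictive(post):
    """Probability that a new object of this category activates the detector."""
    return sum(p * theta for p, theta in zip(post, GRID))

# Three familiar objects each activated the detector 9 times out of 10.
post = posterior([(9, 10)] * 3)
```

After those three objects, the predictive probability for an unseen object of the same category is well above the 0.5 flat-prior baseline: the schema does the work before the new object is observed.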
We combine two recent probabilistic approaches to natural language understanding, exploring the formal pragmatics of communication on a noisy channel. We first extend a model of rational communication between a speaker and listener, to allow for the possibility that messages are corrupted by noise. In this model, common knowledge of a noisy channel leads to the use and correct understanding of sentence fragments. A further extension of the model, which allows the speaker to intentionally reduce the noise rate on a word, is used to model prosodic emphasis. We show that the model derives several well-known changes in meaning associated with prosodic emphasis. Our results show that nominal amounts of actual noise can be leveraged for communicative purposes.
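The listener half of such a model can be sketched with a pure word-deletion channel: each word is independently deleted with probability eps, and the listener inverts the channel over a small set of candidate sentences. The sentences, eps, and deletion-only noise are illustrative assumptions; the paper's full model also covers the speaker's choices and prosodic emphasis.

```python
from itertools import combinations

def deletion_prob(intended, observed, eps=0.2):
    """P(observed | intended) under a channel that independently deletes
    each word with probability eps (word order preserved)."""
    n, k = len(intended), len(observed)
    total = 0.0
    # sum over all ways the observed words could be the surviving subsequence
    for keep in combinations(range(n), k):
        if tuple(intended[i] for i in keep) == tuple(observed):
            total += (1 - eps) ** k * eps ** (n - k)
    return total

def listener(observed, sentences, eps=0.2):
    """Bayesian listener: invert the noisy channel, uniform prior on sentences."""
    scores = {s: deletion_prob(s, observed, eps) for s in sentences}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}
```

Hearing the bare fragment “red,” this listener splits its posterior between the candidate sentences containing “red” and rules out “the blue square,” recovering a sensible interpretation of a sentence fragment.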
Despite their diversity, human languages share consistent properties and regularities. Where does this consistency come from? And does it tell us something about the problem that all languages need to solve? The authors provide an intriguing analysis that focuses on the “communicative function of ambiguity” and propose an equally intriguing “speaker–listener cross-entropy objective for measuring the efficiency of linguistic systems from first principles of efficient language use.”
The language we use over the course of conversation changes as we establish common ground and learn what our partner finds meaningful. Here we draw upon recent advances in natural language processing to provide a finer-grained characterization of the dynamics of this learning process. We release an open corpus (>15,000 utterances) of extended dyadic interactions in a classic repeated reference game task where pairs of participants had to coordinate on how to refer to initially difficult-to-describe tangram stimuli. We find that different pairs discover a wide variety of idiosyncratic but efficient and stable solutions to the problem of reference. Furthermore, these conventions are shaped by the communicative context: words that are more discriminative in the initial context (i.e., that are used for one target more than others) are more likely to persist through the final repetition. Finally, we find systematic structure in how a speaker's referring expressions become more efficient over time: Syntactic units drop out in clusters following positive feedback from the listener, eventually leaving short labels containing open-class parts of speech. These findings provide a higher resolution look at the quantitative dynamics of ad hoc convention formation and support further development of computational models of learning in communication.
Humor plays an essential role in human interactions. Precisely what makes something funny, however, remains elusive. While research on natural language understanding has made significant advancements in recent years, there has been little direct integration of humor research with computational models of language understanding. In this paper, we propose two information-theoretic measures—ambiguity and distinctiveness—derived from a simple model of sentence processing. We test these measures on a set of puns and regular sentences and show that they correlate significantly with human judgments of funniness. Moreover, within a set of puns, the distinctiveness measure distinguishes exceptionally funny puns from mediocre ones. Our work is the first, to our knowledge, to integrate a computational model of general language understanding and humor theory to quantitatively predict humor at a fine-grained level. We present it as an example of a framework for applying models of language processing to understand higher level linguistic and cognitive phenomena. 
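The two measures can be operationalized on toy data. Below, each word carries an assumed support for two candidate meanings; ambiguity is the entropy of the sentence-level meaning posterior, and distinctiveness is a symmetrized KL divergence between the distributions of which words support each meaning. This is a simplified reconstruction consistent with the abstract, not the paper's exact estimator.

```python
import math

def entropy(ps):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def sym_kl(p, q):
    """Symmetrized KL divergence between two distributions over word positions."""
    kl = lambda a, b: sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * (kl(p, q) + kl(q, p))

def measures(word_support):
    """word_support: per word, assumed (P(word | meaning1), P(word | meaning2))."""
    # posterior over the two meanings given all words (naive Bayes, uniform prior)
    m1 = math.prod(a for a, _ in word_support)
    m2 = math.prod(b for _, b in word_support)
    posterior = [m1 / (m1 + m2), m2 / (m1 + m2)]
    ambiguity = entropy(posterior)
    # which words carry each meaning, as distributions over word positions
    z1 = sum(a for a, _ in word_support)
    z2 = sum(b for _, b in word_support)
    q1 = [a / z1 for a, _ in word_support]
    q2 = [b / z2 for _, b in word_support]
    return ambiguity, sym_kl(q1, q2)
```

A pun-like sentence whose words split cleanly between two meanings scores high on both measures; a sentence whose words all support both meanings equally is maximally ambiguous yet has zero distinctiveness, matching the claim that distinctiveness separates exceptional puns from mediocre ones.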
As modern deep networks become more complex, and get closer to human-like capabilities in certain domains, the question arises as to how the representations and decision rules they learn compare to the ones in humans. In this work, we study representations of sentences in one such artificial system for natural language processing. We first present a diagnostic test dataset to examine the degree of abstract composable structure represented. Analyzing performance on these diagnostic tests indicates a lack of systematicity in representations and decision rules, and reveals a set of heuristic strategies. We then investigate the effect of training distribution on learning these heuristic strategies, and we study changes in these representations with various augmentations to the training set. Our results reveal parallels to the analogous representations in people. We find that these systems can learn abstract rules and generalize them to new contexts under certain circumstances—similar to human zero-shot reasoning. However, we also note some shortcomings in this generalization behavior—similar to human judgment errors like belief bias. Studying these parallels suggests new ways to understand psychological phenomena in humans and informs strategies for building artificial intelligence with human-like language understanding.
Machines that learn and think like people must be able to learn from others. Social learning speeds up the learning process and – in combination with language – is a gateway to abstract and unobservable information. Social learning also facilitates the accumulation of knowledge across generations, helping people and artificial intelligences learn things that no individual could learn in a lifetime.