Human and automated tutors attempt to choose pedagogical activities that will maximize student learning, informed by their estimates of the student's current knowledge. There has been substantial research on tracking and modeling student learning, but significantly less attention on how to plan teaching actions and how the assumed student model impacts the resulting plans. We frame the problem of optimally selecting teaching actions using a decision-theoretic approach and show how to formulate teaching as a partially observable Markov decision process planning (...) problem. This framework makes it possible to explore how different assumptions about student learning and behavior should affect the selection of teaching actions. We consider how to apply this framework to concept learning problems, and we present approximate methods for finding optimal teaching actions, given the large state and action spaces that arise in teaching. Through simulations and behavioral experiments, we explore the consequences of choosing teacher actions under different assumed student models. In two concept-learning tasks, we show that this technique can accelerate learning relative to baseline performance. (shrink)
According to rational pedagogy models, learners take into account the way in which teachers generate evidence, and teachers take into account the way in which learners assimilate that evidence. The authors develop a framework for integrating rational pedagogy into models of active exploration, in which agents can take actions to influence the evidence they gather from the environment. The key idea is that a single agent can be both teacher and learner.
As children gradually master grammatical rules, they often go through a period of producing form‐meaning associations that were not observed in the input. For example, 2‐ to 3‐year‐old English‐learning children use the bare form of verbs in settings that require obligatory past tense meaning while already starting to produce the grammatical –ed inflection. While many studies have focused on overgeneralization errors, fewer studies have attempted to explain the root of this earlier stage of rule acquisition. In this work, we use (...) computational modeling to replicate children's production behavior prior to the generalization of past tense production in English. We illustrate how seemingly erroneous productions emerge in a model, without being licensed in the grammar and despite the model aiming at conforming to grammatical forms. Our results show that bare form productions stem from a tension between two factors: (1) trying to produce a less frequent meaning (the past tense) and (2) being unable to restrict the production of frequent forms (the bare form) as learning progresses. Like children, our model goes through a stage of bare form production and then converges on adult‐like production of the regular past tense, showing that these different stages can be accounted for through a single learning mechanism. (shrink)
The enormous scale of the available information and products on the Internet has necessitated the development of algorithms that intermediate between options and human users. These algorithms attempt to provide the user with relevant information. In doing so, the algorithms may incur potential negative consequences stemming from the need to select items about which it is uncertain to obtain information about users versus the need to select items about which it is certain to secure high ratings. This tension is an (...) instance of the exploration–exploitation trade‐off in the context of recommender systems. Because humans are in this interaction loop, the long‐term trade‐off behavior depends on human variability. Our goal is to characterize the trade‐off behavior as a function of human variability fundamental to such human–algorithm interaction. To tackle the characterization, we first introduce a unifying model that smoothly transitions between active learning and recommending relevant information. The unifying model gives us access to a continuum of algorithms along the exploration–exploitation trade‐off. We then present two experiments to measure the trade‐off behavior under two very different levels of human variability. The experimental results inform a thorough simulation study in which we modeled and varied human variability systematically over a wide rage. The main result is that exploration–exploitation trade‐off grows in severity as human variability increases, but there exists a regime of low variability where algorithms balanced in exploration and exploitation can largely overcome the trade‐off. (shrink)