Cross-situational word learning, like any statistical learning problem, involves tracking the regularities in the environment. However, the information that learners pick up from these regularities is dependent on their learning mechanism. This article investigates the role of one type of mechanism in statistical word learning: competition. Competitive mechanisms would allow learners to find the signal in noisy input and would help to explain the speed with which learners succeed in statistical learning tasks. Because cross-situational word learning provides information at multiple (...) scales—both within and across trials/situations—learners could implement competition at either or both of these scales. A series of four experiments demonstrate that cross-situational learning involves competition at both levels of scale, and that these mechanisms interact to support rapid learning. The impact of both of these mechanisms is considered from the perspective of a process-level understanding of cross-situational learning. (shrink)
Two experiments were conducted to examine adult learners' ability to extract multiple statistics in simultaneously presented visual and auditory input. Experiment 1 used a cross-situational learning paradigm to test whether English speakers were able to use co-occurrences to learn word-to-object mappings and concurrently form object categories based on the commonalities across training stimuli. Experiment 2 replicated the first experiment and further examined whether speakers of Mandarin, a language in which final syllables of object names are more predictive of category membership (...) than English, were able to learn words and form object categories when trained with the same type of structures. The results indicate that both groups of learners successfully extracted multiple levels of co-occurrence and used them to learn words and object categories simultaneously. However, marked individual differences in performance were also found, suggesting possible interference and competition in processing the two concurrent streams of regularities. (shrink)
Previous research shows that people can use the co-occurrence of words and objects in ambiguous situations (i.e., containing multiple words and objects) to learn word meanings during a brief passive training period (Yu & Smith, 2007). However, learners in the world are not completely passive but can affect how their environment is structured by moving their heads, eyes, and even objects. These actions can indicate attention to a language teacher, who may then be more likely to name the attended objects. (...) Using a novel active learning paradigm in which learners choose which four objects they would like to see named on each successive trial, this study asks whether active learning is superior to passive learning in a cross-situational word learning context. Finding that learners perform better in active learning, we investigate the strategies and discover that most learners use immediate repetition to disambiguate pairings. Unexpectedly, we find that learners who repeat only one pair per trial—an easy way to infer this pair—perform worse than those who repeat multiple pairs per trial. Using a working memory extension to an associative model of word learning with uncertainty and familiarity biases, we investigate individual differences that correlate with these assorted strategies. (shrink)
Perceptual tasks such as object matching, mammogram interpretation, mental rotation, and satellite imagery change detection often require the assignment of correspondences to fuse information across views. We apply techniques developed for machine translation to the gaze data recorded from a complex perceptual matching task modeled after fingerprint examinations. The gaze data provide temporal sequences that the machine translation algorithm uses to estimate the subjects' assumptions of corresponding regions. Our results show that experts and novices have similar surface behavior, such as (...) the number of fixations made or the duration of fixations. However, the approach applied to data from experts is able to identify more corresponding areas between two prints. The fixations that are associated with clusters that map with high probability to corresponding locations on the other print are likely to have greater utility in a visual matching task. These techniques address a fundamental problem in eye tracking research with perceptual matching tasks: Given that the eyes always point somewhere, which fixations are the most informative and therefore are likely to be relevant for the comparison task? (shrink)
Forensic evidence often involves an evaluation of whether two impressions were made by the same source, such as whether a fingerprint from a crime scene has detail in agreement with an impression taken from a suspect. Human experts currently outperform computer-based comparison systems, but the strength of the evidence exemplified by the observed detail in agreement must be evaluated against the possibility that some other individual may have created the crime scene impression. Therefore, the strongest evidence comes from features in (...) agreement that are also not shared with other impressions from other individuals. We characterize the nature of human expertise by applying two extant metrics to the images used in a fingerprint recognition task and use eye gaze data from experts to both tune and validate the models. The Attention via Information Maximization model quantifies the rarity of regions in the fingerprints to determine diagnosticity for purposes of excluding alternative sources. The CoVar model captures relationships between low-level features, mimicking properties of the early visual system. Both models produced classification and generalization performance in the 75%–80% range when classifying where experts tend to look. A validation study using regions identified by the AIM model as diagnostic demonstrates that human experts perform better when given regions of high diagnosticity. The computational nature of the metrics may help guard against wrongful convictions, as well as provide a quantitative measure of the strength of evidence in casework. (shrink)
Objects in the world usually have names at different hierarchical levels. This research investigates adults' ability to use cross-situational statistics to simultaneously learn object labels at individual and category levels. The results revealed that adults were able to use co-occurrence information to learn hierarchical labels in contexts where the labels for individual objects and labels for categories were presented in completely separated blocks, in interleaved blocks, or mixed in the same trial. Temporal presentation schedules significantly affected the learning of individual (...) object labels, but not the learning of category labels. Learners' subsequent generalization of category labels indicated sensitivity to the structure of statistical input. (shrink)
New efforts are using head cameras and eye-trackers worn by infants to capture everyday visual environments from the point of view of the infant learner. From this vantage point, the training sets for statistical learning develop as the sensorimotor abilities of the infant develop, yielding a series of ordered datasets for visual learning that differ in content and structure between timepoints but are highly selective at each timepoint. These changing environments may constitute a developmentally ordered curriculum that optimizes learning across (...) many domains. Future advances in computational models will be necessary to connect the developmentally changing content and statistics of infant experience to the internal machinery that does the learning. (shrink)
Joint attention has been extensively studied in the developmental literature because of overwhelming evidence that the ability to socially coordinate visual attention to an object is essential to healthy developmental outcomes, including language learning. The goal of this study was to understand the complex system of sensory-motor behaviors that may underlie the establishment of joint attention between parents and toddlers. In an experimental task, parents and toddlers played together with multiple toys. We objectively measured joint attention—and the sensory-motor behaviors that (...) underlie it—using a dual head-mounted eye-tracking system and frame-by-frame coding of manual actions. By tracking the momentary visual fixations and hand actions of each participant, we precisely determined just how often they fixated on the same object at the same time, the visual behaviors that preceded joint attention and manual behaviors that preceded and co-occurred with joint attention. We found that multiple sequential sensory-motor patterns lead to joint attention. In addition, there are developmental changes in this multi-pathway system evidenced as variations in strength among multiple routes. We propose that coordinated visual attention between parents and toddlers is primarily a sensory-motor behavior. Skill in achieving coordinated visual attention in social settings—like skills in other sensory-motor domains—emerges from multiple pathways to the same functional end. (shrink)
Culture is surely important in human learning. But the relation between culture and psychological mechanism needs clarification in three areas: (1) All learning takes place in real time and through real-time mechanisms; (2) Social correlations are just a kind of learnable correlations; and (3) The proper frame of reference for cognitive theories is the perspective of the learner.
Our computational studies of infant language learning estimate the inherent difficulty of Arbib's proposal. We show that body language provides a strikingly helpful scaffold for learning language that may be necessary but not sufficient, given the absence of sophisticated language in other species. The extraordinary language abilities of Homo sapiens must have evolved from other pressures, such as sexual selection.