Bootstrapping the lexicon: A computational model of infant speech segmentation
Introduction
One of the infant's early tasks is to break up continuous streams of speech into more manageable chunks that can be attached to meaning. The problem can be represented schematically:
A successful segmentation – one which locates “words” – is a logically necessary preparation for the more complex language learning which follows. Since each language has different words, and different regularities for word formation, successful segmentation cannot be due to innate knowledge.
That the child succeeds in discovering words early and often is clear. According to Mandel, Jusczyk, and Pisoni (1995), infants as young as 4.5 months can distinguish their own names, said in isolation, from other names which are similar in stress pattern (e.g. Joshua vs. Agatha, Brandon vs. Kevin) and prefer them, as shown by significantly longer looking times. At 6 months English-learning children understand “mommy” and “daddy” to refer to their own parents (Tincoff & Jusczyk, 1999). Although there is wide individual variation, by 1 year 4 months of age most children have a comprehension vocabulary of at least 50 words (Harris & Chasin, 1999).
This first word comprehension, or “the child's dawning appreciation of some of the conventional meaning units of the adult language” (Vihman, 1996, p. 122), is one result of a successful chunking or segmentation process. Various sources of information that the infant might use for word segmentation have been proposed, and behavioral experiments with infants have tested the availability and effectiveness of prosodic information like pauses, stress, and intonational contours, phonetic cues to word boundaries, phonotactics, and the distribution of sounds in the speech stream, as well as two or more of these strategies working in combination. Research in this area has expanded lately to the point where space does not permit a proper review here; for comprehensive surveys, see Jusczyk (1997, 1999) and Aslin, Jusczyk, and Pisoni (1998).
In this paper, I will focus on just one of these sources of information – the distribution of segmental information, or the relative frequency of sounds and sound clusters, and their tendencies to co-occur with each other and with utterance boundaries. Distributional information comes from observing the frequency of events in the environment, a skill available to even the tiniest infant, and indeed to most non-human animals; for reviews of research on the cognitive effects of frequency, see Hasher and Zacks (1984), Alloy and Tabachnik (1984), and Kelly and Martin (1994). In experiments specific to language stimuli, 8-month-old infants successfully segmented an artificial speech stream based solely on distributional information – frequency and order (Saffran et al., 1996a, 1996b) – and the same stimuli drew similar responses from tamarin monkeys (Hauser, Newport, & Aslin, 2001). The infant experiment has been replicated with naturally spoken syllables (Johnson & Jusczyk, 2001).
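To make "frequency and order" concrete, here is a minimal sketch of one common way to operationalize such a distributional cue: forward transitional probabilities between adjacent syllables, with a boundary posited wherever the probability dips. This is an illustration only, not the BootLex procedure. The nonsense words, the fixed presentation order, and the function names are assumptions chosen for the example.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward TP(a -> b) = count(a immediately followed by b) / count(a with a successor)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment_at_dips(syllables, tps):
    """Insert a boundary wherever the TP is a strict local minimum."""
    values = [tps[pair] for pair in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i, tp in enumerate(values):
        left = values[i - 1] if i > 0 else float("inf")
        right = values[i + 1] if i + 1 < len(values) else float("inf")
        if tp < left and tp < right:
            words.append("".join(current))  # close the word before the dip
            current = []
        current.append(syllables[i + 1])
    words.append("".join(current))
    return words

# Hypothetical toy stimuli: four trisyllabic nonsense words with no shared syllables.
WORDS = {
    "A": ["tu", "pi", "ro"],
    "B": ["go", "la", "bu"],
    "C": ["bi", "da", "ku"],
    "D": ["pa", "do", "ti"],
}
ORDER = "ABCDACBDADCB"  # fixed order so the example is deterministic
stream = [syl for word in ORDER for syl in WORDS[word]]
tps = transitional_probabilities(stream)
print(segment_at_dips(stream, tps))
```

On this toy stream every within-word transition has probability 1.0 and every across-word transition has probability 2/3 or lower, so the dips coincide with the word edges. Natural speech offers no such clean separation, which is one reason the models compared in this paper differ in how they use distributional evidence.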
Here we will be concerned not with the behavioral data, but rather with computational models of the use of distributional cues to segment words. First, this paper describes BootLex, a model of early word segmentation which uses the distribution of segments and pauses to discover word boundaries in several corpora from three different languages. Second, several previously reported computer models of the same cognitive process are reviewed and compared to BootLex, not only in terms of the usual quantitative measures of effectiveness, but also by contrasting their more global functional characteristics. I hope to show that comparison of models of this small but critical cognitive process can highlight aspects of the problem – both cognitive and computational – that might otherwise be overlooked.
Section 2 of the paper describes how speech segmentation is modeled by computers, and how the performance of such models has been evaluated quantitatively, and then previews the qualitative characteristics that we will contrast in the several models. Section 3 presents the BootLex algorithm in detail. Section 4 discusses three groups of other computer models, and compares them with BootLex and with each other. Section 5 compares the cognitive plausibility of these models, and considers some broader implications.
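The quantitative measures themselves are not spelled out in this excerpt. As a hedged illustration of how segmentation output is often scored in this literature, the sketch below computes word-level precision and recall by comparing the character spans of proposed words against a gold-standard segmentation. The function names and the toy utterance are assumptions, and the paper's own measures may differ in detail.

```python
def word_spans(words):
    """Map a segmentation to the set of (start, end) offsets of its words."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans

def precision_recall(predicted, gold):
    """A predicted word counts as correct only if both of its edges match a gold word."""
    pred_spans, gold_spans = word_spans(predicted), word_spans(gold)
    hits = len(pred_spans & gold_spans)
    precision = hits / len(pred_spans) if pred_spans else 0.0
    recall = hits / len(gold_spans) if gold_spans else 0.0
    return precision, recall

# Hypothetical example: the model failed to split "thedog".
p, r = precision_recall(["thedog", "barks"], ["the", "dog", "barks"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here one of the two proposed words is correct (precision 0.5) and one of the three gold words is found (recall 1/3), which shows how undersegmentation depresses recall more than precision.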
Distributional models of infant speech segmentation
A number of computational models of the use of statistical cues for infant speech segmentation have been presented recently. These computer models, including BootLex, are inductive, or self-organizing, algorithms. With the significant exception of the categories implicit in the coded input, they have no linguistic knowledge to begin with. That is, there is no lexicon of known words or knowledge of applicable rules or regularities, such as phonotactics. They can only try to discover any
The BootLex algorithm
Olivier (1968) was the first to create a working probabilistic segmentation routine. His algorithm was a deceptively simple exercise in self-organization, using only letter co-occurrence frequencies to segment utterances into words, and the BootLex model is a new implementation based on his idea. Because Olivier's algorithm
Other model strategies
A number of computational models of segmentation using other paradigms have been reported recently, falling into three main groups:
- (i) Three connectionist networks
- (ii) Two algorithms using the minimum description length principle
- (iii) Two algorithms based on a formal statistical model called “Model-based dynamic programming” (MBDP)
From computer model to infant cognition
The previous two sections have presented the BootLex algorithm and compared it in some detail with two other groups of models, both in terms of quantitative performance and more global characteristics of design and function. In this final section, we examine the claims of these computational models to be cognitive models – to go beyond the purely engineering goal of an end product that is comparable with that realized by human infants, and also demonstrate similarities in process.
The relation
Conclusion
A new model, BootLex, was shown to be a conceptually simple and effective segmentation procedure. Based on observation of frequently appearing phoneme clusters and their relationship to utterance boundaries, a lexicon was built incrementally and used to recognize words and parse incoming utterances, with the results fed back to further modify the lexicon. The algorithm was tested on a number of corpora with a variety of characteristics. Then, two other groups of models which have been applied
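The incremental loop summarized above (parse each utterance with the current lexicon, then feed the parse back into the lexicon) can be sketched as follows. This is a generic illustration under stated assumptions, not BootLex's actual scoring: the cost scheme, the penalty constants `unknown_base` and `per_symbol`, and the function names are all invented for the example, and orthographic strings stand in for phonemic transcriptions.

```python
import math
from collections import Counter

def best_parse(utterance, lexicon, unknown_base=10.0, per_symbol=2.0):
    """Dynamic-programming search for the lowest-cost split of `utterance`.

    A known chunk costs -log(relative frequency). An unknown chunk costs
    unknown_base + per_symbol * len(chunk); both constants are made up.
    """
    total = sum(lexicon.values())
    n = len(utterance)
    best = [0.0] + [math.inf] * n  # best[i]: cheapest cost of utterance[:i]
    back = [0] * (n + 1)           # back[i]: start of the last chunk in that parse
    for end in range(1, n + 1):
        for start in range(end):
            chunk = utterance[start:end]
            if chunk in lexicon:
                cost = -math.log(lexicon[chunk] / total)
            else:
                cost = unknown_base + per_symbol * len(chunk)
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    chunks, end = [], n
    while end > 0:                 # walk the back-pointers to recover the parse
        chunks.append(utterance[back[end]:end])
        end = back[end]
    return chunks[::-1]

def learn(utterances):
    """Parse each utterance with the current lexicon, then add its chunks to it."""
    lexicon, parses = Counter(), []
    for utt in utterances:
        parse = best_parse(utt, lexicon)
        lexicon.update(parse)      # feedback step: the parse modifies the lexicon
        parses.append(parse)
    return lexicon, parses

lexicon, parses = learn(["dog", "thedog", "thedogbarks"])
print(parses)
```

With an empty lexicon the first utterance is stored whole. Once "dog" is known, the cheapest parse of "thedog" splits off the unknown remainder "the", which then becomes available for parsing "thedogbarks". The fixed cost per unknown chunk is a design choice that discourages shattering unfamiliar material into single symbols.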
Acknowledgements
The research reported here was conducted as partial fulfillment of the requirements for the degree of Doctor of Philosophy. I thank my thesis supervisor, Virginia Teller, and the members of my committee, Virginia Valian and Martin Chodorow. Portions of this manuscript were written while I was a Foreign Research Fellow of the Japan Society for the Promotion of Science, appointed on the recommendation of the National Science Foundation, and hosted by Nobuo Ohta at the University of Tsukuba.
References (88)
Speech segmentation and word discovery: a computational perspective. Trends in Cognitive Sciences (1999)
Distributional regularity and phonotactic constraints are useful for segmentation. Cognition (1996)
Bootstrapping word boundaries: a bottom-up corpus-based approach to speech segmentation. Cognitive Psychology (1997)
The perception of rhythmic units in speech by infants and adults. Journal of Memory and Language (1997)
Finding structure in time. Cognitive Science (1990)
When prosody fails to cue syntactic structure: nine-month-olds’ sensitivity to phonological vs. syntactic phrases. Cognition (1994)
Segmentation of the speech stream in a non-human primate: statistical learning in cotton-top tamarins. Cognition (2001)
Clauses are perceptual units for young infants. Cognition (1987)
Word segmentation by 8-month-olds: when speech cues count more than statistics. Journal of Memory and Language (2001)
Constraining the search for structure in the input. Lingua (1998)
How infants begin to extract words from speech. Trends in Cognitive Sciences
Infants’ sensitivity to the sound structure of native language words. Journal of Memory and Language
Perception of acoustic correlates of major phrasal units by young infants. Cognitive Psychology
The beginnings of word segmentation in English-learning infants. Cognitive Psychology
Through a narrow window: working memory capacity and the detection of covariation. Cognition
Domain-general abilities applied to domain-specific tasks: sensitivity to probabilities in perception, cognition, and language. Lingua
Discussion: connections and symbols: closing the gap. Cognition
In defense of representation. Cognitive Psychology
Phonotactic cues for segmentation of fluent speech by infants. Cognition
Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology
The TRACE model of speech perception. Cognitive Psychology
A precursor of language acquisition in young infants. Cognition
Maturational constraints on language learning. Cognitive Science
PARSER: a model for word segmentation. Journal of Memory and Language
Words in a sea of sounds: the output of infant statistical learning. Cognition
Statistical learning of tone sequences by human infants and adults. Cognition
Specifying the scope of 13-month-olds’ expectations for novel words. Cognition
Assessment of covariation by humans and animals: the joint influence of prior expectations and current situational information. Psychological Review
Speech and auditory processing during infancy
Computation of conditional probability statistics by 8-month-old infants. Psychological Science
Models of word segmentation in fluent maternal speech to infants
From ‘signal to syntax’: but what is the nature of the signal?
An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning
A first language
Lexical segmentation: the role of sequential statistics in supervised and un-supervised models
Segmenting speech without a lexicon: evidence for a bootstrapping model of lexical acquisition
Statistical language learning
Learning to segment speech using multiple cues: a connectionist model. Language and Cognitive Processes
Do infants perceive word boundaries? An empirical study of the bootstrapping of lexical acquisition. Journal of the Acoustical Society of America
Mechanisms of implicit learning: connectionist models of sequence processing
The Cambridge encyclopedia of language
On the discovery of novel wordlike units from utterances: an artificial-language study with implications for native-language acquisition. Journal of Experimental Psychology: General