
Cognition

Volume 83, Issue 2, March 2002, Pages 167-206

Bootstrapping the lexicon: A computational model of infant speech segmentation

https://doi.org/10.1016/S0010-0277(02)00002-1

Abstract

Prelinguistic infants must find a way to isolate meaningful chunks from the continuous streams of speech that they hear. BootLex, a new model which uses distributional cues to build a lexicon, demonstrates how much can be accomplished using this single source of information. This conceptually simple probabilistic algorithm achieves significant segmentation results on various kinds of language corpora – English, Japanese, and Spanish; child- and adult-directed speech, and written texts; and several variations in coding structure – and reveals which statistical characteristics of the input have an influence on segmentation performance. BootLex is then compared, quantitatively and qualitatively, with three other groups of computational models of the same infant segmentation process, paying particular attention to functional characteristics of the models and their similarity to human cognition. Commonalities and contrasts among the models are discussed, as well as their implications both for theories of the cognitive problem of segmentation itself, and for the general enterprise of computational cognitive modeling.

Introduction

One of the infant's early tasks is to break up continuous streams of speech into more manageable chunks that can be attached to meaning. The problem can be represented schematically:

A successful segmentation – one which locates “words” – is a logically necessary preparation for the more complex language learning which follows. Since each language has different words, and different regularities for word formation, successful segmentation cannot be due to innate knowledge.

That the child succeeds in discovering words early and often is clear. According to Mandel, Jusczyk, and Pisoni (1995), infants as young as 4.5 months can distinguish their own names, said in isolation, from other names which are similar in stress pattern (e.g. Joshua vs. Agatha, Brandon vs. Kevin) and prefer them, as shown by significantly longer looking times. At 6 months English-learning children understand “mommy” and “daddy” to refer to their own parents (Tincoff & Jusczyk, 1999). Although there is wide individual variation, by 1 year 4 months of age most children have a comprehension vocabulary of at least 50 words (Harris & Chasin, 1999).

This first word comprehension, or “the child's dawning appreciation of some of the conventional meaning units of the adult language” (Vihman, 1996, p. 122), is one result of a successful chunking or segmentation process. Various sources of information that the infant might use for word segmentation have been proposed, and behavioral experiments with infants have tested the availability and effectiveness of prosodic information like pauses, stress, and intonational contours, phonetic cues to word boundaries, phonotactics, and the distribution of sounds in the speech stream, as well as tests of two or more of these strategies working in combination. Research in this area has expanded lately to the point where space does not permit a proper review here; for comprehensive surveys, see Jusczyk (1997, 1999) and Aslin, Jusczyk, and Pisoni (1998).

In this paper, I will focus on just one of these sources of information – the distribution of segmental information, or the relative frequency of sounds and sound clusters, and their tendencies to co-occur with each other and with utterance boundaries. Distributional information comes from observing the frequency of events in the environment, a skill available to even the tiniest infant, and indeed to most non-human animals; for reviews of research on the cognitive effects of frequency, see Hasher and Zacks (1984), Alloy and Tabachnik (1984), and Kelly and Martin (1994). In experiments specific to language stimuli, 8-month-old infants successfully segmented an artificial speech stream based solely on distributional information – frequency and order (Saffran et al., 1996a, 1996b) – and the same stimuli drew similar responses from tamarin monkeys (Hauser, Newport, & Aslin, 2001). The infant experiment has been replicated with naturally spoken syllables (Johnson & Jusczyk, 2001).
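To make the idea of segmenting by “frequency and order” concrete, the following is a minimal sketch, not a reconstruction of any of the cited experiments or of BootLex. It assumes that the distributional cue is quantified as the transitional probability between adjacent syllables, and that a boundary is posited wherever that probability falls below a fixed threshold. The three nonsense words, the word order, the two-letter syllable coding, and the threshold value are all invented for illustration.

```python
from collections import Counter

# Hypothetical nonsense words, each made of three two-letter syllables.
order = ["bidaku", "padoti", "golabu", "bidaku", "golabu", "padoti",
         "bidaku", "padoti", "golabu", "padoti", "bidaku", "golabu"]

# The "speech stream": syllables with no word boundaries marked.
stream = [word[i:i + 2] for word in order for i in range(0, len(word), 2)]


def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) for each adjacent pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(syllables, tps, threshold=0.9):
    """Insert a boundary wherever the transitional probability is low.

    The threshold is an assumed value chosen for this toy stream.
    """
    if not syllables:
        return []
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tps[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words


tps = transitional_probabilities(stream)
print(segment(stream, tps))
```

In this toy stream every syllable belongs to exactly one word, so within-word transitions are fully predictable while transitions across word boundaries are not. Natural corpora are far less tidy, which is one reason a fixed threshold is only a convenient simplification here.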

Here we will be concerned not with the behavioral data, but rather with computational models of the use of distributional cues to segment words. In particular, this paper describes BootLex, a model of early word segmentation which uses the distribution of segments and pauses to discover word boundaries in several language corpora from three different languages. Second, several previously reported computer models of the same cognitive process are reviewed and compared to BootLex, not only in terms of the usual quantitative measures of effectiveness, but also by contrasting their more global functional characteristics. I hope to show that comparison of models of this small but critical cognitive process can highlight aspects of the problem – both cognitive and computational – that might otherwise be overlooked.

Section 2 of the paper describes how speech segmentation is modeled by computers, and how the performance of such models has been evaluated quantitatively, and then previews the qualitative characteristics that we will contrast in the several models. Section 3 presents the BootLex algorithm in detail. Section 4 discusses three groups of other computer models, and compares them with BootLex and with each other. Section 5 compares the cognitive plausibility of these models, and considers some broader implications.

Section snippets

Distributional models of infant speech segmentation

A number of computational models of the use of statistical cues for infant speech segmentation have been presented recently. These computer models, including BootLex, are inductive, or self-organizing, algorithms. With the significant exception of the categories implicit in the coded input, they have no linguistic knowledge to begin with. That is, there is no lexicon of known words or knowledge of applicable rules or regularities, such as phonotactics. They can only try to discover any

The BootLex algorithm

Olivier (1968) was the first to create a working probabilistic segmentation routine. His algorithm was a deceptively simple exercise in self-organization, using only letter co-occurrence frequencies to segment utterances into words, and the BootLex model is a new implementation based on his idea. Because Olivier's algorithm

Other model strategies

A number of computational models of segmentation using other paradigms have been reported recently, falling into three main groups:

  • (i) Three connectionist networks

  • (ii) Two algorithms using the minimum description length principle

  • (iii) Two algorithms based on a formal statistical model called “Model-based dynamic programming” (MBDP)

All these models interpret the cognitive problem of word segmentation similarly, as discussed above, but there are significant differences among them in goals and methods. Each

From computer model to infant cognition

The previous two sections have presented the BootLex algorithm and compared it in some detail with two other groups of models, both in terms of quantitative performance and more global characteristics of design and function. In this final section, we examine the claims of these computational models to be cognitive models – to go beyond the purely engineering goal of an end product that is comparable with that realized by human infants, and also demonstrate similarities in process.

The relation

Conclusion

A new model, BootLex, was shown to be a conceptually simple and effective segmentation procedure. Based on observation of frequently appearing phoneme clusters and their relationship to utterance boundaries, a lexicon was built incrementally and used to recognize words and parse incoming utterances, with the results fed back to further modify the lexicon. The algorithm was tested on a number of corpora with a variety of characteristics. Then, two other groups of models which have been applied
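The feedback loop summarized in this section (parse an utterance with the current lexicon, then use the parse to modify the lexicon) can be illustrated with a small sketch. This is not the BootLex algorithm itself: the log-frequency scoring, the penalty for unseen symbols, the cap `MAX_LEN`, and the rule of proposing each adjacent pair of chunks as a new candidate are all assumptions made for the sake of a short runnable example, and the letter strings are invented stand-ins for phonemic transcripts.

```python
import math
from collections import Counter

MAX_LEN = 6  # assumed cap on chunk length, for illustration only


def parse(utterance, lexicon):
    """Split `utterance` into the highest-scoring sequence of chunks.

    Known chunks score log(relative frequency); an unseen single symbol
    gets an assumed fixed penalty, so every utterance can be parsed.
    """
    total = sum(lexicon.values()) + 1
    unknown = math.log(1 / (total * 10))
    # best[i] = (best score for utterance[:i], start index of its last chunk)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(utterance)
    for end in range(1, len(utterance) + 1):
        for start in range(max(0, end - MAX_LEN), end):
            chunk = utterance[start:end]
            if chunk in lexicon:
                score = math.log(lexicon[chunk] / total)
            elif end - start == 1:
                score = unknown
            else:
                continue
            candidate = best[start][0] + score
            if candidate > best[end][0]:
                best[end] = (candidate, start)
    # Walk the back-pointers to recover the chunks.
    chunks, end = [], len(utterance)
    while end > 0:
        start = best[end][1]
        chunks.append(utterance[start:end])
        end = start
    return chunks[::-1]


def learn(utterance, lexicon):
    """Parse one utterance, then feed the result back into the lexicon."""
    chunks = parse(utterance, lexicon)
    lexicon.update(chunks)  # reinforce the chunks that were used
    # Propose each adjacent pair of chunks as a larger candidate chunk.
    lexicon.update(a + b for a, b in zip(chunks, chunks[1:])
                   if len(a + b) <= MAX_LEN)
    return chunks


lexicon = Counter()
for utterance in ["thedog", "thecat", "adog", "acat"] * 3:
    print(utterance, "->", learn(utterance, lexicon))
```

Because nothing in this sketch penalizes long chunks beyond the hard length cap, it tends to merge frequent neighbors aggressively. It is meant only to show the shape of an incremental parse-and-update cycle, not to reproduce the segmentation results reported for BootLex.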

Acknowledgements

The research reported here was conducted as partial fulfillment of the requirements for the degree of Doctor of Philosophy. I thank my thesis supervisor, Virginia Teller, and the members of my committee, Virginia Valian and Martin Chodorow. Portions of this manuscript were written while I was a Foreign Research Fellow of the Japanese Society for the Promotion of Science, appointed on the recommendation of the National Science Foundation, and hosted by Nobuo Ohta at the University of Tsukuba.

References (88)

  • P.W. Jusczyk. How infants begin to extract words from speech. Trends in Cognitive Sciences (1999)
  • P.W. Jusczyk et al. Infants’ sensitivity to the sound structure of native language words. Journal of Memory and Language (1993)
  • P.W. Jusczyk et al. Perception of acoustic correlates of major phrasal units by young infants. Cognitive Psychology (1992)
  • P.W. Jusczyk et al. The beginnings of word segmentation in English-learning infants. Cognitive Psychology (1999)
  • Y. Kareev. Through a narrow window: working memory capacity and the detection of covariation. Cognition (1995)
  • M.H. Kelly et al. Domain-general abilities applied to domain-specific tasks: sensitivity to probabilities in perception, cognition, and language. Lingua (1994)
  • B. MacWhinney. Discussion: connections and symbols: closing the gap. Cognition (1993)
  • A.B. Markman et al. In defense of representation. Cognitive Psychology (2000)
  • S.L. Mattys et al. Phonotactic cues for segmentation of fluent speech by infants. Cognition (2001)
  • S.L. Mattys et al. Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology (1999)
  • J.L. McClelland et al. The TRACE model of speech perception. Cognitive Psychology (1986)
  • J. Mehler et al. A precursor of language acquisition in young infants. Cognition (1988)
  • E.L. Newport. Maturational constraints on language learning. Cognitive Science (1990)
  • P. Perruchet et al. PARSER: a model for word segmentation. Journal of Memory and Language (1998)
  • J.R. Saffran. Words in a sea of sounds: the output of infant statistical learning. Cognition (2001)
  • J.R. Saffran et al. Statistical learning of tone sequences by human infants and adults. Cognition (1999)
  • S.R. Waxman. Specifying the scope of 13-month-olds’ expectations for novel words. Cognition (1999)
  • L.B. Alloy et al. Assessment of covariation by humans and animals: the joint influence of prior expectations and current situational information. Psychological Review (1984)
  • R.N. Aslin et al. Speech and auditory processing during infancy
  • R.N. Aslin et al. Computation of conditional probability statistics by 8-month-old infants. Psychological Science (1998)
  • R.N. Aslin et al. Models of word segmentation in fluent maternal speech to infants
  • Batchelder, E. O. (1997). Computational evidence for the use of frequency information in discovery of the infant's...
  • N. Bernstein Ratner. From ‘signal to syntax’: but what is the nature of the signal?
  • M.R. Brent. An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning (1999)
  • Brent, M. R., & Siskind, J. M. (2000). The role of exposure to isolated words in early vocabulary development. NECI TR...
  • R. Brown. A first language (1973)
  • P. Cairns et al. Lexical segmentation: the role of sequential statistics in supervised and un-supervised models
  • T.A. Cartwright et al. Segmenting speech without a lexicon: evidence for a bootstrapping model of lexical acquisition
  • E. Charniak. Statistical language learning (1993)
  • M.H. Christiansen et al. Learning to segment speech using multiple cues: a connectionist model. Language and Cognitive Processes (1998)
  • A. Christophe et al. Do infants perceive word boundaries? An empirical study of the bootstrapping of lexical acquisition. Journal of the Acoustical Society of America (1994)
  • A. Cleeremans. Mechanisms of implicit learning: connectionist models of sequence processing (1993)
  • D. Crystal. The Cambridge encyclopedia of language (1987)
  • D. Dahan et al. On the discovery of novel wordlike units from utterances: an artificial-language study with implications for native-language acquisition. Journal of Experimental Psychology: General (1999)