Cognitive Science

Volume 25, Issue 5, September–October 2001, Pages 711–731
Levels of representation in the electrophysiology of speech perception

https://doi.org/10.1016/S0364-0213(01)00049-0

Abstract

Mapping from acoustic signals to lexical representations is a complex process mediated by a number of different levels of representation. This paper reviews properties of the phonetic and phonological levels, together with hypotheses about how category structure is represented at each level, and evaluates these hypotheses in light of relevant electrophysiological studies of phonetics and phonology. The paper examines evidence for two alternative views of how infant phonetic representations develop into adult representations, a structure-changing view and a structure-adding view, and suggests that each may be better suited to different kinds of phonetic categories. Electrophysiological results are beginning to provide information about phonological representations, but less is known about how the more abstract representations at this level could be coded in the brain.

Introduction

Speech perception involves a mapping from continuous acoustic waveforms onto the discrete phonological units used to store words in the mental lexicon. For example, when we hear the word cat, we map a complex and continuous pattern of vibration at the eardrum onto a phonological percept which has just three clearly distinct pieces: /k/, /æ/ and /t/. A great deal of evidence indicates that this mapping from sound to words is not a simple one-step mapping, but is instead mediated by a number of different levels of representation. This article reviews studies of how the brain supports these different levels of representation, with a focus on work using the electrophysiological measures electroencephalography (EEG) and magnetoencephalography (MEG), and on how this work relates to hypotheses derived from behavioral and theoretical research in phonetics and phonology.

EEG and MEG provide noninvasive measures of neuronal activity in the brain. Electrodes positioned on the scalp, or magnetic field sensors positioned close to the scalp, measure changes in scalp voltages or scalp magnetic fields, entirely passively and with millisecond resolution. These measures are direct and provide excellent temporal resolution, but only modest localization information. Whereas the other papers in this special issue discuss findings involving a number of different brain regions, most existing electrophysiological findings about speech perception concern evoked responses that occur within 250 ms after a sound is presented and are generated in auditory cortex. Human auditory cortex is situated on the superior plane of the temporal lobe, that is, on the lower side of the Sylvian fissure. It consists of a number of subareas, but most electrophysiological findings about speech perception do not reliably implicate specific subareas of auditory cortex.

A great deal of electrophysiological research on speech has focused on two evoked responses: the auditory N100 and the mismatch response (see Fig. 1). The auditory N100, and its magnetic counterpart M100, are often referred to as exogenous responses, meaning that they are evoked by any acoustic stimulus with a well-defined onset, regardless of the listener’s task or attentional state (Näätänen & Picton, 1987). However, the latency, amplitude and localization of the N100 vary reliably when certain acoustic and perceptual parameters are varied, and there are reports of task-related modulation of M100 (e.g., Poeppel et al., 1996).

The auditory mismatch paradigm has been the most productive paradigm in the electrophysiological study of speech, and has revealed evidence of a number of different levels of representation. When a sequence of identical sounds, known as standards, is interrupted by infrequent deviant sounds, the deviant sounds elicit a characteristic response component known as the Mismatch Negativity (MMN) or Magnetic Mismatch Field (MMF; Näätänen et al., 1978; Näätänen & Winkler, 1999). The mismatch response typically occurs 150–250 ms after the onset of the deviant sound, and when localization information is available, it typically implicates supratemporal auditory cortex (Hari et al., 1984; Scherg et al., 1989; Sams et al., 1991; Alho, 1995). It can only be seen by comparing responses to infrequent deviant sounds with responses to frequent standard sounds. It can be elicited even when the subject does not attend to the stimuli: studies in this paradigm often present sounds while subjects read a book or watch a silent movie. Hence, it is commonly used as a measure of preattentive processing. However, it is also elicited when subjects actively attend to the stimuli.
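
To make the paradigm concrete, here is a minimal sketch of how such a stimulus sequence might be generated. The 15% deviant probability, the no-adjacent-deviants constraint, and all names in the code are illustrative assumptions, not parameters drawn from the studies reviewed here.

```python
import random

def oddball_sequence(n_trials=400, deviant_prob=0.15, seed=1):
    """Generate an oddball sequence of 'standard' and 'deviant' trials.

    The deviant probability and the no-adjacent-deviants constraint are
    illustrative choices, not values from any particular MMN study.
    """
    rng = random.Random(seed)
    seq = []
    for _ in range(n_trials):
        # Never present two deviants in a row; many mismatch studies
        # also enforce a minimum number of intervening standards.
        if seq and seq[-1] == "deviant":
            seq.append("standard")
        elif rng.random() < deviant_prob:
            seq.append("deviant")
        else:
            seq.append("standard")
    return seq

seq = oddball_sequence()
print(seq[:12], "...", f"{seq.count('deviant')} deviants / {len(seq)} trials")
```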

Importantly for research on speech, the amplitude of the mismatch response also tends to increase as the discriminability of the standard and deviant stimuli increases (Sams et al., 1985; Lang et al., 1990; Aaltonen et al., 1993; Tiitinen et al., 1994). For this reason, a number of studies have examined whether mismatch response amplitudes track the discriminability profiles established in behavioral phonetics research, or are affected by the categorial status of pairs of sounds. The focus of this paper is on evidence for different levels of representation and how they might be encoded in the brain.
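
As an illustration of how the mismatch response is typically quantified, the following sketch computes a deviant-minus-standard difference wave and its mean amplitude in the 150–250 ms window mentioned above. It assumes epoched single-channel data stored as NumPy arrays; the array shapes, the function name and the simulated data are hypothetical.

```python
import numpy as np

def mismatch_amplitude(standard_epochs, deviant_epochs, sfreq, tmin=-0.1):
    """Compute a deviant-minus-standard difference wave and its mean
    amplitude in the 150-250 ms post-onset window.

    standard_epochs, deviant_epochs: arrays of shape (n_trials, n_samples)
    sfreq: sampling rate in Hz; tmin: epoch start relative to sound onset (s)
    """
    # Average across trials, then subtract to isolate the mismatch response.
    difference_wave = deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)
    # Convert the 150-250 ms post-onset window to sample indices.
    start = int((0.150 - tmin) * sfreq)
    stop = int((0.250 - tmin) * sfreq)
    return difference_wave, difference_wave[start:stop].mean()

# Toy usage with simulated noise: 100 trials, 500 Hz sampling, 600 ms epochs.
rng = np.random.default_rng(0)
std = rng.normal(0, 1, (100, 300))
dev = rng.normal(0, 1, (100, 300))
wave, amp = mismatch_amplitude(std, dev, sfreq=500, tmin=-0.1)
print(f"mean amplitude in the MMN window: {amp:.3f}")
```

With real recordings, this same subtraction underlies the comparisons of mismatch amplitude against stimulus discriminability described above.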

Section snippets

Multiple levels of representation

It is standard to distinguish at least the levels of acoustics, phonetics, and phonology in the representation of speech sounds, but there is also a good deal of evidence that these levels should be further subdivided.

At one end of the speech perception process, in the peripheral auditory system, is a fairly faithful analog representation of the acoustics of speech, which is most likely not modified by exposure to specific languages.

At the other end of the process are discrete, abstract

Vowels

Vowels are relatively easy to study, because their acoustic properties are well described and easily manipulated. A good deal of information about vowel height, backness and rhotacism is carried in the values of the first three formants of the vowel (F1, F2, F3), respectively. The fundamental frequency (F0) of the vowel primarily conveys information about affect, focus and speaker identity, but is relatively unimportant in the identification of vowel categories.
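
To illustrate how the first two formants alone can separate vowel categories, here is a minimal nearest-prototype classifier in (F1, F2) space. The prototype values are rough averages for adult male speakers of American English, in the spirit of classic formant measurements; the Euclidean distance over raw Hz values is a deliberate simplification, and none of this is a model from the paper.

```python
import math

# Rough average (F1, F2) values in Hz for adult male American English
# speakers; illustrative approximations only.
PROTOTYPES = {
    "i": (270, 2290),   # high front
    "ae": (660, 1720),  # low front
    "u": (300, 870),    # high back
    "a": (730, 1090),   # low back
}

def classify_vowel(f1, f2):
    """Assign a vowel token to the nearest prototype in (F1, F2) space."""
    return min(PROTOTYPES,
               key=lambda v: math.dist((f1, f2), PROTOTYPES[v]))

print(classify_vowel(310, 2200))  # -> 'i'
print(classify_vowel(700, 1150))  # -> 'a'
```

A height difference shows up mainly on F1 and a backness difference mainly on F2, which is why even this crude two-dimensional classifier separates the four corner vowels.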

The structure-changing approach

Phonological representations

Phonetics provides a number of detailed proposals about how representations of speech might be encoded in the brain, and a growing body of electrophysiological results provides evidence that bears upon these hypotheses. The diversity of phonetic coding hypotheses reflects the diversity of the acoustics of phonetic categories. In phonology, on the other hand, existing evidence suggests a need for representations which are fundamentally different from those found in phonetics, representations

Conclusions

Recent years have seen a great increase in findings about how speech is encoded in the human brain, and electrophysiological techniques have played a central role in this. However, the extent of our understanding is still very limited. The greatest progress has been made in areas where there are straightforward hypotheses about acoustics-to-phonetics mappings. Nonlinearities in the acoustics-to-phonetics mapping can be correlated with nonlinearities in ERP or MEG response components. When the

Acknowledgements

I thank three anonymous reviewers, Cindy Brown, Tom Pellathy and especially Bill Idsardi for valuable feedback on the material in this paper. Preparation of this paper was supported in part by grants from the McDonnell-Pew Program in Cognitive Neuroscience, Oak Ridge Associated Universities, and the National Science Foundation (BCS-0196004).

References (92)

  • D. Poeppel et al.

    Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds

    Cognitive Brain Research

    (1996)
  • D. Poeppel et al.

    Processing of vowels in supratemporal auditory cortex

    Neuroscience Letters

    (1997)
  • R.F. Port

The discreteness of phonetic elements and formal linguistics: Response to A. Manaster Ramer

    Journal of Phonetics

    (1996)
  • R.F. Port et al.

    Neutralization of syllable-final voicing in German

    Journal of Phonetics

    (1985)
  • M. Rivera-Gaxiola et al.

    Electrophysiological correlates of cross-linguistic speech perception in native English speakers

    Behavioural Brain Research

    (2000)
  • M. Rivera-Gaxiola et al.

    Electrophysiological correlates of category goodness

Behavioural Brain Research

    (2000)
  • M. Sams et al.

    Auditory frequency discrimination and event-related potentials

Electroencephalography and Clinical Neurophysiology

    (1985)
  • A. Sharma et al.

    Acoustic versus phonetic representation of speech as reflected by the mismatch negativity event-related potential

    Electroencephalography and Clinical Neurophysiology

    (1993)
  • L.M. Slowiaczek et al.

    On the neutralizing status of Polish word-final devoicing

    Journal of Phonetics

    (1985)
  • M. Steinschneider et al.

Speech-evoked activity in primary auditory cortex: Effects of voice onset time

    Electroencephalography and Clinical Neurophysiology

    (1994)
  • J.F. Werker et al.

Cross-language speech perception: Evidence for perceptual reorganization during the first year of life

Infant Behavior and Development

    (1984)
  • I. Winkler et al.

    Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations

    Cognitive Brain Research

    (1999)
  • O. Aaltonen et al.

    Perceptual magnet effect in the light of behavioral and psychophysiological data

    Journal of the Acoustical Society of America

    (1997)
  • K. Alho

    Cerebral generators of Mismatch Negativity (MMN) and its magnetic counterpart (MMNm) elicited by sound changes

Ear and Hearing

    (1995)
  • L. Armstrong

    The phonetic and tonal structure of Kikuyu

    (1967)
  • R. Aulanko et al.

    Phonetic invariance in the human auditory cortex

    Neuroreport

    (1993)
  • S.E. Blumstein et al.

Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of stop consonants

    Journal of the Acoustical Society of America

    (1979)
  • S.E. Blumstein et al.

    Perceptual invariance and onset spectra for stop consonants in different vowel environments

    Journal of the Acoustical Society of America

    (1980)
  • A.E. Carney et al.

    Noncategorical perception of stop consonants differing in VOT

    Journal of the Acoustical Society of America

    (1977)
  • N. Chomsky et al.

The sound pattern of English

    (1968)
  • G. Dehaene-Lambertz

    Electrophysiological correlates of categorical phoneme perception in adults

    Neuroreport

    (1997)
  • G. Dehaene-Lambertz et al.

    A phonological representation in the infant brain

    Neuroreport

    (1998)
  • G. Dehaene-Lambertz et al.

Electrophysiological correlates of phonological processing: A cross-linguistic study

    Journal of Cognitive Neuroscience

    (2000)
  • E. Dupoux et al.

Epenthetic vowels in Japanese: A perceptual illusion?

    Journal of Experimental Psychology: Human Perception and Performance

    (1999)
  • P. Eimas

    Auditory and linguistic processing of cues for place of articulation by infants

    Perception & Psychophysics

    (1974)
  • P. Eimas

Auditory and phonetic coding of the cues for speech: Discrimination of the [r-l] distinction by young infants

    Perception & Psychophysics

    (1975)
  • P. Eimas et al.

    Contextual effects in infant speech perception

    Science

    (1980)
  • P. Eimas et al.

    Speech perception in infants

    Science

    (1971)
  • M.H. Giard et al.

Brain generators implicated in the processing of auditory stimulus deviance: A topographic event-related potential study

    Psychophysiology

    (1990)
  • Govindarajan, K., Phillips, C., Poeppel, D., Roberts, T., & Marantz, A. (1998). Latency of MEG M100 response indexes...
  • P. Jusczyk

    The discovery of spoken language

    (1997)
  • D. Kewley-Port

    Time-varying features as correlates of place of articulation in stop consonants

    Journal of the Acoustical Society of America

    (1983)
  • K.R. Kluender et al.

    Japanese quail can learn phonetic categories

    Science

    (1987)
  • P.K. Kuhl

    Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not

    Perception & Psychophysics

    (1991)
  • P.K. Kuhl

Perception, cognition, and the ontogenetic and phylogenetic emergence of human speech

  • P.K. Kuhl

Language, mind, and brain: Experience alters perception
