Levels of representation in the electrophysiology of speech perception
Introduction
Speech perception involves a mapping from continuous acoustic waveforms onto the discrete phonological units used to store words in the mental lexicon. For example, when we hear the word cat, we map a complex and continuous pattern of vibration at the eardrum onto a phonological percept which has just three clearly distinct pieces: /k/, /æ/ and /t/. A great deal of evidence indicates that this mapping from sound to words is not a simple one-step mapping, but is instead mediated by a number of different levels of representation. This article reviews studies of how the brain supports the different levels of representation, with a focus on work using the electrophysiological measures electroencephalography (EEG) and magnetoencephalography (MEG), and how it relates to hypotheses derived from behavioral and theoretical research in phonetics and phonology.
EEG and MEG provide noninvasive, direct measures of neuronal activity in the brain. Electrodes on the scalp (EEG) or magnetic field sensors just above it (MEG) passively record changes in voltage or magnetic field with millisecond resolution; these measures offer excellent temporal resolution but only modest localization information. Whereas the other papers in this special issue discuss findings involving a number of different brain regions, most existing electrophysiological findings about speech perception concern evoked responses that occur within 250 ms of sound onset and are generated in auditory cortex. Human auditory cortex is situated on the superior plane of the temporal lobe, that is, on the lower bank of the Sylvian fissure. It consists of a number of subareas, but most electrophysiological findings about speech perception do not reliably implicate specific subareas of auditory cortex.
A great deal of electrophysiological research on speech has focused on two evoked responses: the auditory N100 and the mismatch response (see Fig. 1). The auditory N100 and its magnetic counterpart, the M100, are often referred to as exogenous responses, meaning that they are evoked by any acoustic stimulus with a well-defined onset, regardless of the listener’s task or attentional state (Näätänen & Picton, 1987). However, the latency, amplitude and localization of the N100 vary reliably when certain acoustic and perceptual parameters are varied, and there are reports of task-related modulation of the M100 (e.g., Poeppel et al., 1996).
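To make concrete how such evoked components are quantified, the sketch below (an illustration of mine, not drawn from any particular study reviewed here) simulates an averaged evoked response and measures the N100 as the most negative deflection in a window around 100 ms. The sampling rate, measurement window and waveform shape are all assumptions made for the example.

```python
import numpy as np

fs = 1000                                  # sampling rate in Hz (assumed)
t = np.arange(-0.1, 0.4, 1 / fs)           # epoch from -100 ms to 400 ms
rng = np.random.default_rng(0)

# Synthetic averaged evoked response (volts): a negative-going Gaussian
# deflection peaking ~100 ms after stimulus onset, plus residual noise.
evoked = -3e-6 * np.exp(-((t - 0.10) ** 2) / (2 * 0.015 ** 2))
evoked += rng.normal(0.0, 2e-7, t.size)

# Quantify the N100 as the most negative point in an 80-140 ms window.
window = (t >= 0.08) & (t <= 0.14)
peak = np.argmin(evoked[window])
print(f"N100 latency: {t[window][peak] * 1000:.0f} ms, "
      f"amplitude: {evoked[window][peak] * 1e6:.2f} uV")
```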
The auditory mismatch paradigm has been the most productive paradigm in the electrophysiological study of speech, and has revealed evidence of a number of different levels of representation. When a sequence of identical sounds, known as standards, is interrupted by infrequent deviant sounds, the deviant sounds elicit a characteristic response component known as the Mismatch Negativity (MMN) or Magnetic Mismatch Field (MMF; Näätänen et al., 1978; Näätänen & Winkler, 1999). The mismatch response typically occurs 150–250 ms after the onset of the deviant sound, and when localization information is available it typically implicates supratemporal auditory cortex (Hari et al., 1984; Scherg et al., 1989; Sams et al., 1991; Alho, 1995). The response can only be seen by comparing responses to infrequent deviant sounds with responses to frequent standard sounds. It can be elicited even when the subject does not attend to the stimuli: studies in this paradigm often present sounds while subjects read a book or watch a silent movie. Hence, it is commonly used as a measure of preattentive processing, although it is also elicited when subjects actively attend to the stimuli.
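The logic of the paradigm can be expressed in a few lines. The following sketch, again illustrative and under assumed signal parameters rather than taken from any study, simulates standard and deviant epochs, forms the deviant-minus-standard difference wave in which the mismatch response is isolated, and measures its peak in the 150–250 ms window.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 1000
t = np.arange(-0.1, 0.4, 1 / fs)

def simulate_epochs(n_trials, mmn_amp):
    """Single-trial epochs (volts): a shared N100-like deflection plus,
    for deviants only, an extra negativity peaking ~180 ms."""
    n100 = -2e-6 * np.exp(-((t - 0.10) ** 2) / (2 * 0.015 ** 2))
    mmn = mmn_amp * np.exp(-((t - 0.18) ** 2) / (2 * 0.025 ** 2))
    return n100 + mmn + rng.normal(0.0, 1e-6, (n_trials, t.size))

standards = simulate_epochs(400, 0.0)      # frequent standard sounds
deviants = simulate_epochs(60, -1.5e-6)    # infrequent deviant sounds

# The mismatch response appears only in the deviant-minus-standard
# difference wave; its peak is sought 150-250 ms after deviant onset.
difference = deviants.mean(axis=0) - standards.mean(axis=0)
window = (t >= 0.15) & (t <= 0.25)
print(f"MMN peak amplitude: {difference[window].min() * 1e6:.2f} uV")
```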
Importantly for research on speech, the amplitude of the mismatch response also tends to increase as the discriminability of the standard and deviant stimuli increases (Sams et al., 1985; Lang et al., 1990; Aaltonen et al., 1993; Tiitinen et al., 1994). For this reason, a number of studies have examined whether mismatch response amplitudes track the discriminability profiles established in behavioral phonetics research, or whether they are affected by the categorial status of pairs of sounds. The focus of this paper is on evidence for different levels of representation and how they might be encoded in the brain.
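A minimal sketch of this analytic logic, with entirely made-up numbers: if mismatch amplitude tracks discriminability, then per-condition MMN peak amplitudes should correlate strongly with behavioral discriminability scores such as d′ for the same standard/deviant pairs.

```python
import numpy as np

# Hypothetical per-condition values (illustrative only): behavioral
# discriminability (d') for five standard/deviant pairs, and the MMN
# peak amplitude measured for the same pairs (uV; more negative =
# larger mismatch response).
d_prime = np.array([0.5, 1.0, 1.8, 2.5, 3.1])
mmn_peak_uv = np.array([-0.4, -0.9, -1.6, -2.2, -2.8])

# If mismatch amplitude tracks discriminability, the two series should
# be strongly negatively correlated.
r = np.corrcoef(d_prime, mmn_peak_uv)[0, 1]
print(f"r = {r:.2f}")  # close to -1 for these made-up values
```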
Multiple levels of representation
It is standard to distinguish at least the levels of acoustics, phonetics, and phonology in the representation of speech sounds, but there is also a good deal of evidence that these levels should be further subdivided.
At one end of the speech perception process, in the peripheral auditory system, is a fairly faithful analog representation of the acoustics of speech, which is most likely not modified by exposure to specific languages.
At the other end of the process are discrete, abstract phonological representations.
Vowels
Vowels are relatively easy to study, because their acoustic properties are well described and easily manipulated. A good deal of information about vowel height, backness and rhoticity is carried by the values of the first three formants of the vowel (F1, F2 and F3), respectively. The fundamental frequency (F0) of the vowel primarily conveys information about affect, focus and speaker identity, but is relatively unimportant in the identification of vowel categories.
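As a toy illustration of how formant values could support vowel identification, the sketch below classifies a token by its nearest category prototype in (F1, F2) space. The prototype values are rough approximations to published American English averages and, like the example tokens, are assumptions made purely for this illustration.

```python
import numpy as np

# Rough approximations to published average formant values (Hz) for
# four American English vowels; assumed here purely for illustration.
PROTOTYPES = {
    "i (heed)": (270, 2290),    # high front
    "ae (had)": (660, 1720),    # low front
    "a (hod)": (730, 1090),     # low back
    "u (who'd)": (300, 870),    # high back
}

def classify_vowel(f1, f2):
    """Label a token by its nearest prototype in (F1, F2) space."""
    token = np.array([f1, f2], dtype=float)
    return min(PROTOTYPES,
               key=lambda v: np.linalg.norm(token - PROTOTYPES[v]))

print(classify_vowel(320, 2100))   # -> 'i (heed)'
```

Note that Euclidean distance in raw Hertz overweights F2, whose range is much larger than F1's; models in this spirit typically work on a perceptual frequency scale (e.g., mel or Bark) instead.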
The structure-changing approach
Phonological representations
Phonetics provides a number of detailed proposals about how representations of speech might be encoded in the brain, and a growing body of electrophysiological results provides evidence that bears upon these hypotheses. The diversity of phonetic coding hypotheses reflects the diversity of the acoustics of phonetic categories. In phonology, on the other hand, existing evidence suggests a need for representations which are fundamentally different from those found in phonetics.
Conclusions
Recent years have seen a great increase in findings about how speech is encoded in the human brain, and electrophysiological techniques have played a central role in this. However, the extent of our understanding is still very limited. The greatest progress has been made in areas where there are straightforward hypotheses about acoustics-to-phonetics mappings: nonlinearities in the acoustics-to-phonetics mapping can be correlated with nonlinearities in ERP or MEG response components.
Acknowledgements
I thank three anonymous reviewers, Cindy Brown, Tom Pellathy and especially Bill Idsardi for valuable feedback on the material in this paper. Preparation of this paper was supported in part by grants from the McDonnell-Pew Program in Cognitive Neuroscience, Oak Ridge Associated Universities, and the National Science Foundation (BCS-0196004).
References

- et al. Cortical differences in tonal versus vowel processing as revealed by an ERP component called mismatch negativity (MMN). Brain and Language (1993).
- et al. The neurotopography of vowels as mirrored by evoked magnetic field measurements. Brain and Language (1996).
- et al. Responses of the primary auditory cortex to pitch changes in a sequence of tone pips: Neuromagnetic recordings in man. Neuroscience Letters (1984).
- et al. Acoustic and perceptual evidence for complete neutralization of manner of articulation in Korean. Journal of Phonetics (1996).
- et al. Acoustic features and acoustic change are represented by different central pathways. Hearing Research (1995).
- et al. Where the abstract feature maps of the brain might come from. Trends in Neurosciences (1999).
- Learning and representation in speech and language. Current Opinion in Neurobiology (1994).
- On the internal structure of phonetic categories: A progress report. Cognition (1994).
- et al. Early selective attention effect on evoked potential reinterpreted. Acta Psychologica (1978).
- et al. Tonotopic organization of the human auditory cortex revealed by transient auditory evoked magnetic fields. Electroencephalography and Clinical Neurophysiology (1988).
- Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Cognitive Brain Research.
- Processing of vowels in supratemporal auditory cortex. Neuroscience Letters.
- The discreteness of phonetic elements and formal linguistics: Response to A. Manaster Ramer. Journal of Phonetics.
- Neutralization of syllable-final voicing in German. Journal of Phonetics.
- Electrophysiological correlates of cross-linguistic speech perception in native English speakers. Behavioural Brain Research.
- Electrophysiological correlates of category goodness. Behavioural Brain Research.
- Auditory frequency discrimination and event-related potentials. Electroencephalography and Clinical Neurophysiology.
- Acoustic versus phonetic representation of speech as reflected by the mismatch negativity event-related potential. Electroencephalography and Clinical Neurophysiology.
- On the neutralizing status of Polish word-final devoicing. Journal of Phonetics.
- Speech-evoked activity in primary auditory cortex: Effects of voice onset time. Electroencephalography and Clinical Neurophysiology.
- Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development.
- Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations. Cognitive Brain Research.
- Perceptual magnet effect in the light of behavioral and psychophysiological data. Journal of the Acoustical Society of America.
- Cerebral generators of Mismatch Negativity (MMN) and its magnetic counterpart (MMNm) elicited by sound changes. Ear and Hearing.
- The phonetic and tonal structure of Kikuyu.
- Phonetic invariance in the human auditory cortex. Neuroreport.
- Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of stop consonants. Journal of the Acoustical Society of America.
- Perceptual invariance and onset spectra for stop consonants in different vowel environments. Journal of the Acoustical Society of America.
- Noncategorical perception of stop consonants differing in VOT. Journal of the Acoustical Society of America.
- The sound pattern of English.
- Electrophysiological correlates of categorical phoneme perception in adults. Neuroreport.
- A phonological representation in the infant brain. Neuroreport.
- Electrophysiological correlates of phonological processing: A cross-linguistic study. Journal of Cognitive Neuroscience.
- Epenthetic vowels in Japanese: A perceptual illusion? Journal of Experimental Psychology: Human Perception and Performance.
- Auditory and linguistic processing of cues for place of articulation by infants. Perception & Psychophysics.
- Auditory and phonetic coding of the cues for speech: Discrimination of the [r-l] distinction by young infants. Perception & Psychophysics.
- Contextual effects in infant speech perception. Science.
- Speech perception in infants. Science.
- Brain generators implicated in the processing of auditory stimulus deviance: A topographic event-related potential study. Psychophysiology.
- The discovery of spoken language.
- Time-varying features as correlates of place of articulation in stop consonants. Journal of the Acoustical Society of America.
- Japanese quail can learn phonetic categories. Science.
- Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Perception & Psychophysics.
- Perception, cognition, and the ontogenetic and phylogenetic emergence of human speech.
- Language, mind, and brain: Experience alters perception.