Levels of representation in the electrophysiology of speech perception
Introduction
Speech perception involves a mapping from continuous acoustic waveforms onto the discrete phonological units used to store words in the mental lexicon. For example, when we hear the word cat, we map a complex and continuous pattern of vibration at the eardrum onto a phonological percept which has just three clearly distinct pieces: /k/, /æ/ and /t/. A great deal of evidence indicates that this mapping from sound to words is not a simple one-step mapping, but is instead mediated by a number of different levels of representation. This article reviews studies of how the brain supports the different levels of representation, with a focus on work using the electrophysiological measures electroencephalography (EEG) and magnetoencephalography (MEG), and how it relates to hypotheses derived from behavioral and theoretical research in phonetics and phonology.
EEG and MEG provide noninvasive, direct measures of neuronal activity in the brain. Electrodes on the scalp (EEG) or magnetic field sensors just above it (MEG) passively record changes in voltage or magnetic field with millisecond resolution; these measures offer excellent temporal resolution but only modest localization information. Whereas the other papers in this special issue discuss findings involving a number of different brain regions, most existing electrophysiological findings about speech perception concern evoked responses that occur within 250 ms of sound onset and are generated in auditory cortex. Human auditory cortex is situated on the superior plane of the temporal lobe, that is, on the lower bank of the Sylvian fissure. It consists of a number of subareas, but most electrophysiological findings about speech perception do not reliably implicate specific subareas of auditory cortex.
A great deal of electrophysiological research on speech has focused on two evoked responses: the auditory N100 and the mismatch response (see Fig. 1). The auditory N100 and its magnetic counterpart, the M100, are often referred to as exogenous responses, meaning that they are evoked by any acoustic stimulus with a well-defined onset, regardless of the listener’s task or attentional state (Näätänen & Picton, 1987). However, the latency, amplitude and localization of the N100 vary reliably when certain acoustic and perceptual parameters are varied, and there are reports of task-related modulation of the M100 (e.g., Poeppel et al., 1996).
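To make concrete how such evoked components are quantified, the sketch below (an illustration of mine, not drawn from any particular study reviewed here) simulates an averaged evoked response and measures the N100 as the most negative deflection in a window around 100 ms. The sampling rate, measurement window and waveform shape are all assumptions made for the example.

```python
import numpy as np

fs = 1000                                  # sampling rate in Hz (assumed)
t = np.arange(-0.1, 0.4, 1 / fs)           # epoch from -100 ms to 400 ms
rng = np.random.default_rng(0)

# Synthetic averaged evoked response (volts): a negative-going Gaussian
# deflection peaking ~100 ms after stimulus onset, plus residual noise.
evoked = -3e-6 * np.exp(-((t - 0.10) ** 2) / (2 * 0.015 ** 2))
evoked += rng.normal(0.0, 2e-7, t.size)

# Quantify the N100 as the most negative point in an 80-140 ms window.
window = (t >= 0.08) & (t <= 0.14)
peak = np.argmin(evoked[window])
print(f"N100 latency: {t[window][peak] * 1000:.0f} ms, "
      f"amplitude: {evoked[window][peak] * 1e6:.2f} uV")
```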
The auditory mismatch paradigm has been the most productive paradigm in the electrophysiological study of speech, and has revealed evidence of a number of different levels of representation. When a sequence of identical sounds, known as standards, is interrupted by infrequent deviant sounds, the deviant sounds elicit a characteristic response component known as the Mismatch Negativity (MMN) or Magnetic Mismatch Field (MMF; Näätänen et al., 1978; Näätänen & Winkler, 1999). The mismatch response typically occurs 150–250 ms after the onset of the deviant sound, and when localization information is available it typically implicates supratemporal auditory cortex (Hari et al., 1984; Scherg et al., 1989; Sams et al., 1991; Alho, 1995). The response can only be seen by comparing responses to infrequent deviant sounds with responses to frequent standard sounds. It can be elicited even when the subject does not attend to the stimuli: studies in this paradigm often present sounds while subjects read a book or watch a silent movie. Hence, it is commonly used as a measure of preattentive processing, although it is also elicited when subjects actively attend to the stimuli.
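The logic of the paradigm can be expressed in a few lines. The following sketch, again illustrative and under assumed signal parameters rather than taken from any study, simulates standard and deviant epochs, forms the deviant-minus-standard difference wave in which the mismatch response is isolated, and measures its peak in the 150–250 ms window.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 1000
t = np.arange(-0.1, 0.4, 1 / fs)

def simulate_epochs(n_trials, mmn_amp):
    """Single-trial epochs (volts): a shared N100-like deflection plus,
    for deviants only, an extra negativity peaking ~180 ms."""
    n100 = -2e-6 * np.exp(-((t - 0.10) ** 2) / (2 * 0.015 ** 2))
    mmn = mmn_amp * np.exp(-((t - 0.18) ** 2) / (2 * 0.025 ** 2))
    return n100 + mmn + rng.normal(0.0, 1e-6, (n_trials, t.size))

standards = simulate_epochs(400, 0.0)      # frequent standard sounds
deviants = simulate_epochs(60, -1.5e-6)    # infrequent deviant sounds

# The mismatch response appears only in the deviant-minus-standard
# difference wave; its peak is sought 150-250 ms after deviant onset.
difference = deviants.mean(axis=0) - standards.mean(axis=0)
window = (t >= 0.15) & (t <= 0.25)
print(f"MMN peak amplitude: {difference[window].min() * 1e6:.2f} uV")
```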
Importantly for research on speech, the amplitude of the mismatch response also tends to increase as the discriminability of the standard and deviant stimuli increases (Sams et al., 1985; Lang et al., 1990; Aaltonen et al., 1993; Tiitinen et al., 1994). For this reason, a number of studies have examined whether mismatch response amplitudes track the discriminability profiles established in behavioral phonetics research, or whether they are affected by the categorial status of pairs of sounds. The focus of this paper is on evidence for different levels of representation and how they might be encoded in the brain.
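A minimal sketch of this analytic logic, with entirely made-up numbers: if mismatch amplitude tracks discriminability, then per-condition MMN peak amplitudes should correlate strongly with behavioral discriminability scores such as d′ for the same standard/deviant pairs.

```python
import numpy as np

# Hypothetical per-condition values (illustrative only): behavioral
# discriminability (d') for five standard/deviant pairs, and the MMN
# peak amplitude measured for the same pairs (uV; more negative =
# larger mismatch response).
d_prime = np.array([0.5, 1.0, 1.8, 2.5, 3.1])
mmn_peak_uv = np.array([-0.4, -0.9, -1.6, -2.2, -2.8])

# If mismatch amplitude tracks discriminability, the two series should
# be strongly negatively correlated.
r = np.corrcoef(d_prime, mmn_peak_uv)[0, 1]
print(f"r = {r:.2f}")  # close to -1 for these made-up values
```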
Multiple levels of representation
It is standard to distinguish at least the levels of acoustics, phonetics, and phonology in the representation of speech sounds, but there is also a good deal of evidence that these levels should be further subdivided.
At one end of the speech perception process, in the peripheral auditory system, is a fairly faithful analog representation of the acoustics of speech, which is most likely not modified by exposure to specific languages.
At the other end of the process are discrete, abstract phonological representations.
Vowels
Vowels are relatively easy to study, because their acoustic properties are well described and easily manipulated. A good deal of information about vowel height, backness and rhoticity is carried by the values of the first three formants of the vowel (F1, F2 and F3), respectively. The fundamental frequency (F0) of the vowel primarily conveys information about affect, focus and speaker identity, but is relatively unimportant in the identification of vowel categories.
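As a toy illustration of how formant values could support vowel identification, the sketch below classifies a token by its nearest category prototype in (F1, F2) space. The prototype values are rough approximations to published American English averages and, like the example tokens, are assumptions made purely for this illustration.

```python
import numpy as np

# Rough approximations to published average formant values (Hz) for
# four American English vowels; assumed here purely for illustration.
PROTOTYPES = {
    "i (heed)": (270, 2290),    # high front
    "ae (had)": (660, 1720),    # low front
    "a (hod)": (730, 1090),     # low back
    "u (who'd)": (300, 870),    # high back
}

def classify_vowel(f1, f2):
    """Label a token by its nearest prototype in (F1, F2) space."""
    token = np.array([f1, f2], dtype=float)
    return min(PROTOTYPES,
               key=lambda v: np.linalg.norm(token - PROTOTYPES[v]))

print(classify_vowel(320, 2100))   # -> 'i (heed)'
```

Note that Euclidean distance in raw Hertz overweights F2, whose range is much larger than F1's; models in this spirit typically work on a perceptual frequency scale (e.g., mel or Bark) instead.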
The structure-changing approach
Phonological representations
Phonetics provides a number of detailed proposals about how representations of speech might be encoded in the brain, and a growing body of electrophysiological results provides evidence that bears upon these hypotheses. The diversity of phonetic coding hypotheses reflects the diversity of the acoustics of phonetic categories. In phonology, on the other hand, existing evidence suggests a need for representations which are fundamentally different from those found in phonetics.
Conclusions
Recent years have seen a great increase in findings about how speech is encoded in the human brain, and electrophysiological techniques have played a central role in this. However, the extent of our understanding is still very limited. The greatest progress has been made in areas where there are straightforward hypotheses about acoustics-to-phonetics mappings: nonlinearities in the acoustics-to-phonetics mapping can be correlated with nonlinearities in ERP or MEG response components.
Acknowledgements
I thank three anonymous reviewers, Cindy Brown, Tom Pellathy and especially Bill Idsardi for valuable feedback on the material in this paper. Preparation of this paper was supported in part by grants from the McDonnell-Pew Program in Cognitive Neuroscience, Oak Ridge Associated Universities, and the National Science Foundation (BCS-0196004).
References

- et al. Cortical differences in tonal versus vowel processing as revealed by an ERP component called mismatch negativity (MMN). Brain and Language (1993).
- et al. The neurotopography of vowels as mirrored by evoked magnetic field measurements. Brain and Language (1996).
- et al. Responses of the primary auditory cortex to pitch changes in a sequence of tone pips: Neuromagnetic recordings in man. Neuroscience Letters (1984).
- et al. Acoustic and perceptual evidence for complete neutralization of manner of articulation in Korean. Journal of Phonetics (1996).
- et al. Acoustic features and acoustic change are represented by different central pathways. Hearing Research (1995).
- et al. Where the abstract feature maps of the brain might come from. Trends in Neurosciences (1999).
- Learning and representation in speech and language. Current Opinion in Neurobiology (1994).
- On the internal structure of phonetic categories: A progress report. Cognition (1994).
- et al. Early selective attention effect on evoked potential reinterpreted. Acta Psychologica (1978).
- et al. Tonotopic organization of the human auditory cortex revealed by transient auditory evoked magnetic fields. Electroencephalography and Clinical Neurophysiology (1988).
- Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Cognitive Brain Research.
- Processing of vowels in supratemporal auditory cortex. Neuroscience Letters.
- The discreteness of phonetic elements and formal linguistics: Response to A. Manaster Ramer. Journal of Phonetics.
- Neutralization of syllable-final voicing in German. Journal of Phonetics.
- Electrophysiological correlates of cross-linguistic speech perception in native English speakers. Behavioural Brain Research.
- Electrophysiological correlates of category goodness. Behavioural Brain Research.
- Auditory frequency discrimination and event-related potentials. Electroencephalography and Clinical Neurophysiology.
- Acoustic versus phonetic representation of speech as reflected by the mismatch negativity event-related potential. Electroencephalography and Clinical Neurophysiology.
- On the neutralizing status of Polish word-final devoicing. Journal of Phonetics.
- Speech-evoked activity in primary auditory cortex: Effects of voice onset time. Electroencephalography and Clinical Neurophysiology.
- Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development.
- Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations. Cognitive Brain Research.
- Perceptual magnet effect in the light of behavioral and psychophysiological data. Journal of the Acoustical Society of America.
- Cerebral generators of Mismatch Negativity (MMN) and its magnetic counterpart (MMNm) elicited by sound changes. Ear and Hearing.
- The phonetic and tonal structure of Kikuyu.
- Phonetic invariance in the human auditory cortex. Neuroreport.
- Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of stop consonants. Journal of the Acoustical Society of America.
- Perceptual invariance and onset spectra for stop consonants in different vowel environments. Journal of the Acoustical Society of America.
- Noncategorical perception of stop consonants differing in VOT. Journal of the Acoustical Society of America.
- The sound pattern of English.
- Electrophysiological correlates of categorical phoneme perception in adults. Neuroreport.
- A phonological representation in the infant brain. Neuroreport.
- Electrophysiological correlates of phonological processing: A cross-linguistic study. Journal of Cognitive Neuroscience.
- Epenthetic vowels in Japanese: A perceptual illusion? Journal of Experimental Psychology: Human Perception and Performance.
- Auditory and linguistic processing of cues for place of articulation by infants. Perception & Psychophysics.
- Auditory and phonetic coding of the cues for speech: Discrimination of the [r-l] distinction by young infants. Perception & Psychophysics.
- Contextual effects in infant speech perception. Science.
- Speech perception in infants. Science.
- Brain generators implicated in the processing of auditory stimulus deviance: A topographic event-related potential study. Psychophysiology.
- The discovery of spoken language.
- Time-varying features as correlates of place of articulation in stop consonants. Journal of the Acoustical Society of America.
- Japanese quail can learn phonetic categories. Science.
- Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Perception & Psychophysics.
- Perception, cognition, and the ontogenetic and phylogenetic emergence of human speech.
- Language, mind, and brain: Experience alters perception.