Event Abstract

Neural correlates of individual differences in processing of rising tones in Cantonese: Implications for speech perception and production

  • 1 University of Hong Kong, Division of Speech and Hearing Sciences, Hong Kong, SAR China

Introduction Two aspects of F0, the F0 level (high, middle, low) and the F0 contour (static, rising, falling), are generally considered the perceptual correlates of lexical tones in tone languages, including Mandarin Chinese (Gandour, 1983), Cantonese (Khouw & Ciocca, 2007), and Thai (Gandour, Potisuk, & Dechongkit, 1994). Besides the dominant role of spectral information, much attention has recently been paid to the importance of temporal information in parsing the acoustic signal into relevant segments for decoding during auditory/speech processing (Luo & Poeppel, 2012). Acoustic cues from the temporal waveform envelope have also been shown to successfully cue tone perception in Mandarin Chinese (Whalen & Xu, 1992) as well as Cantonese (Zhou, 2012). Of the various cues to the amplitude envelope, rise time, defined as the time taken for a sound to reach its maximum amplitude (Rosen, 1992), has been proposed as an important perceptual cue for representing the envelope. Amplitude rise time has been found to facilitate prosodic and syllable segmentation processes in children (Carpenter & Shahi, 2013; Leong, Hämäläinen, Soltész, & Goswami, 2011), which are arguably critical for the formation of well-specified phonological representations (Goswami, 2011). Hence, one may ask whether the rise time of the amplitude envelope likewise plays a role in processing lexical tones. In other words, efficient tone processing may entail encoding both the spectral and the temporal cues present in the speech signal to derive tone representations. The present study is the first examination of the neural processes underlying the discrimination of the high rising and low rising tones (T2/T5) in Hong Kong Cantonese (HKC) in two groups of typically developed native speakers of HKC with comparable language and musical backgrounds.
The participant groups represented, respectively, the pattern of good perception and good production of all Cantonese tones [+Per+Pro], and that of good perception of all tones but poor production of specifically the T2/T5 distinction [+Per-Pro]. Electrophysiological responses to the contrasts of pitch and amplitude envelope between T2 and T5 were measured to assess the timing and strength of neural activity associated with the auditory stimuli as they unfold over time. Any difference in neural response between the two groups would shed light on how the acoustic cues of pitch and amplitude envelope are differentially represented in their auditory memory, and enable us to consider the relationship between perception and production.

Method A total of 138 native speakers of Cantonese, all born and raised in Hong Kong, were recruited. No speaker reported a history of hearing abnormalities. According to the Edinburgh Handedness Inventory (Oldfield, 1971), all were right-handed. They first participated in a tone perception task and a tone production task. On the basis of their performance on these tasks, 41 participants were invited back to carry out a passive oddball task. The participants were classified into two groups, i.e. [+Per+Pro] (N = 20, female = 8) and [+Per-Pro] (N = 21, female = 13). The EEG experiment employed the passive oddball paradigm and was conducted in a sound-attenuated, electrically shielded booth. Three syllables, /fu1/, /fu2/ and /fu5/, were used. The experiment consisted of four oddball conditions with different Standard/Deviant pairs: T2/T5 and T5/T2 as the two experimental conditions, and two control conditions that paired T2 and T5 with T1 as the common standard, i.e. T1/T2 and T1/T5. For the control conditions, the divergence point was at the vowel onset, where the pitch height of the two stimuli begins to deviate.
For the experimental conditions, as T2 resembled T5 in the early part of the pitch contour, the two began to diverge at 250 ms post stimulus onset. Additionally, in the period of 100 to 250 ms where the pitch contours of T2 and T5 fully overlapped, the amplitude rise time, computed as the duration between the vowel onset and the amplitude peak during the overlapping pitch period (Tarr, 2013), differed between them. The rise time was 120 ms for T2 and 70 ms for T5. In each condition, the standard stimuli were presented on 85% of the trials, and the deviants occurred on 15% of the trials (80 trials) in a quasi-random sequence. The sequence of blocks was rotated across participants. The pre-processed EEG data were analyzed in two ways. First, statistical differences between the true and dummy waves were evaluated by a non-parametric cluster-based random permutation approach implemented in Fieldtrip (see Maris & Oostenveld, 2007). Second, a conventional analysis was performed to examine whether the two groups differed in the ERPs to rise time and in the magnitude and latency of the MMN and P3a to pitch level/contour. To explore the relationship between perception and production, correlations were computed between the T2-T5 production accuracy and the perceptual measures, namely the behavioral response latency on trials involving T2 and T5 in the tone discrimination task and the neural responses to the rise time and pitch height/contour differences between T2 and T5.

Results and Discussion

Behavioral results The tone discrimination task showed that the [+Per-Pro] group had significantly longer response times (RTs) on trials involving T2 and T5 than the [+Per+Pro] group [M[+Per+Pro] = 1046.18 ms, SD = 80.19; M[+Per-Pro] = 1204.54 ms, SD = 177.51; t(39) = -3.57, p = .001, Cohen's d = 1.14], although both groups achieved high accuracy (above 98%).
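As a worked check of the behavioral effect size, the following minimal Python sketch recomputes Cohen's d from the reported group means, standard deviations and group sizes. It assumes the pooled-standard-deviation form of d; the abstract does not state which variant was used, and the function name `cohens_d` is illustrative only.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d for two independent groups, using the pooled SD.

    Assumption: the pooled-SD variant; the abstract does not say
    which definition of d the authors used.
    """
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var)


# Summary statistics as reported for discrimination RT (ms):
# [+Per+Pro]: M = 1046.18, SD = 80.19, N = 20
# [+Per-Pro]: M = 1204.54, SD = 177.51, N = 21
d = cohens_d(1046.18, 80.19, 20, 1204.54, 177.51, 21)
print(round(d, 2))
```

Under this assumption the sketch gives a value that rounds to the reported d of 1.14; the sign depends only on which group is subtracted from which.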
ERP results The cluster-level permutation test revealed several significant clusters in different conditions in the two participant groups (see Figure 1). For clusters that were considered MMNs, the [+Per+Pro] group exhibited the component in the T1/T2, T1/T5 and T2/T5 conditions: between 100 and 166 ms (post-divergence point unless specified otherwise) for T1/T2 (p < .001), between 100 and 166 ms for T1/T5 (p = .006), and between 150 and 200 ms for T2/T5 (p = .015). The [+Per-Pro] group showed MMNs in the T1/T2 (110 – 166 ms, p = .005), T1/T5 (104 – 154 ms, p = .024) and T2/T5 (150 – 200 ms, p = .015) conditions. No significant negative cluster was observed in the T5/T2 condition for either group. For the P3a, only the T1/T2 condition elicited a significant positive cluster immediately following the MMN in both groups, in the time window of 300 to 400 ms for [+Per+Pro] (p = .025) and 342 to 404 ms for [+Per-Pro] (p = .039). For brain responses to rise time, both participant groups exhibited an early positive-going cluster in the T2/T5 condition, between 62 and 154 ms for [+Per+Pro] (p = .015) and between 64 and 144 ms for [+Per-Pro] (p = .022). For the T5/T2 condition, an early negative-going component was observed only in the [+Per+Pro] group, in the time window of 36 to 176 ms (p = .039). T-tests and mixed-model ANOVAs of neural responses at Fz revealed that the [+Per+Pro] group showed a shorter MMN latency than the [+Per-Pro] group [t(39) = -2.305, p < .027, Cohen's d = -.74] in the T1/T2 condition. For the MMNs and P3a in the experimental conditions of T2/T5 and T5/T2, significant main effects of condition were found for the MMN mean amplitude [F(2, 39) = 5.85, p = .020, η2 = .13] and the MMN peak latency [F(2, 39) = 10.83, p = .002, η2 = .22], with stronger MMN responses to T2/T5 than to T5/T2 but longer latency to T2/T5 relative to T5/T2.
For rise time, a mixed ANOVA of the average amplitudes showed main effects of tone condition [F(2, 39) = 47.18, p < .001, η2 = .55] and group [F(2, 39) = 75.89, p = .017, η2 = .14], with T5 eliciting more positive responses than T2, and stronger responses from the [+Per+Pro] group than from the [+Per-Pro] group. Correlations between production accuracy for the two rising tones and the perceptual measures showed that the averaged production accuracy was negatively correlated with the discrimination RT (r = -.502, p = .001), with shorter discrimination RTs associated with higher production accuracy. In addition, production accuracy was positively correlated with the mean amplitude of brain responses to the rise time of T5 (r = .421, p = .006): the larger the response, the higher the production accuracy.

In summary, the present study demonstrated that tone perception is highly dynamic and exploits different acoustic cues at different stages of processing as the auditory signal unfolds over time: rise time at the sensory/perceptual level and pitch features at the cognitive level. Moreover, our findings revealed differential sensitivities between individuals with and without distinctive production of the two rising tones, as evidenced by the differences in discrimination latency for the two tones and in the magnitude of the brain response to the short rise time. The individual differences found in production are proposed to have a perceptual origin, in that less well-defined phonological representations lead to less distinctive production.
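The rise-time measure described in the Method (the duration from vowel onset to the amplitude peak within the overlapping pitch period) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the envelope is synthetic, the 1000 Hz sampling rate and the vowel onset at 100 ms are assumed values, and the helper names `synthetic_envelope` and `rise_time_ms` are hypothetical.

```python
def synthetic_envelope(fs_hz, onset_ms, rise_ms, total_ms):
    """Toy amplitude envelope: silence, a linear rise to 1.0, then a slow decay.

    Purely illustrative; real envelopes would be extracted from the stimuli.
    """
    env = []
    for i in range(int(total_ms * fs_hz / 1000)):
        t = i * 1000 / fs_hz  # time of sample i in ms
        if t < onset_ms:
            env.append(0.0)
        elif t <= onset_ms + rise_ms:
            env.append((t - onset_ms) / rise_ms)
        else:
            env.append(max(0.0, 1.0 - (t - onset_ms - rise_ms) / 400))
    return env


def rise_time_ms(envelope, fs_hz, onset_ms, window_end_ms):
    """Time (ms) from vowel onset to the envelope peak inside the analysis window."""
    start = round(onset_ms * fs_hz / 1000)
    end = round(window_end_ms * fs_hz / 1000)
    segment = envelope[start:end + 1]
    if not segment:
        raise ValueError("analysis window is empty")
    # Index of the first maximum within the window, relative to vowel onset
    peak_offset = max(range(len(segment)), key=segment.__getitem__)
    return peak_offset * 1000 / fs_hz


FS_HZ = 1000      # assumed sampling rate of the envelope
ONSET_MS = 100    # assumed vowel onset (not stated explicitly in the abstract)
WINDOW_END = 250  # end of the period in which the T2 and T5 contours overlap

# Toy envelopes built with the rise times reported for the two tones
t2_env = synthetic_envelope(FS_HZ, ONSET_MS, 120, 600)
t5_env = synthetic_envelope(FS_HZ, ONSET_MS, 70, 600)
print(rise_time_ms(t2_env, FS_HZ, ONSET_MS, WINDOW_END))
print(rise_time_ms(t5_env, FS_HZ, ONSET_MS, WINDOW_END))
```

Restricting the peak search to the overlapping pitch period mirrors the definition in the Method, so that later amplitude maxima outside that window do not affect the measure.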

Figure 1

Acknowledgements

This research was supported by a Small Project Fund at the University of Hong Kong
(Project titled “Neural correlates and cognitive capability associated with individual
variations in tone perception and production in Cantonese – An event-related potential (ERP)
study”). We are grateful to all the participants in this study for their time and effort.

References

Gandour, J. (1983). Tone perception in Far Eastern languages. Journal of Phonetics, 11(2), 149-175.
Gandour, J., Potisuk, S., & Dechongkit, S. (1994). Tonal coarticulation in Thai. Journal of Phonetics, 22(4), 477-492.
Khouw, E., & Ciocca, V. (2007). Perceptual correlates of Cantonese tones. Journal of Phonetics, 35(1), 104-117.
Leong, V., Hämäläinen, J., Soltész, F., & Goswami, U. (2011). Rise time perception and detection of syllable stress in adults with developmental dyslexia. Journal of Memory and Language, 64(1), 59-73.
Luo, H., & Poeppel, D. (2012). Cortical oscillations in auditory perception and speech: evidence for two temporal windows in human auditory cortex. Frontiers in Psychology, 3, 170.

Keywords: amplitude rise time, lexical tone, mismatch negativity (MMN), ERPs (Event-Related Potentials), individual differences, speech perception and production

Conference: Academy of Aphasia 53rd Annual Meeting, Tucson, United States, 18 Oct - 20 Oct, 2015.

Presentation Type: Poster

Topic: Not student first author

Citation: Law S and Ou J (2015). Neural correlates of individual differences in processing of rising tones in Cantonese: Implications for speech perception and production. Front. Psychol. Conference Abstract: Academy of Aphasia 53rd Annual Meeting. doi: 10.3389/conf.fpsyg.2015.65.00048

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

Received: 10 Apr 2015; Published Online: 24 Sep 2015.

* Correspondence: Dr. Sampo Law, University of Hong Kong, Division of Speech and Hearing Sciences, Hong Kong, Hong Kong, SAR China, splaw@hku.hk