The effect of syllable segmentation and phonological neighbours on spoken word recognition in Mandarin Chinese

Abstract

In Indo-European languages, the phonological access points, also known as ‘proximate units,’ are segmental, whereas in Mandarin Chinese, they are entire syllables (O’Seaghdha et al., 2010). There is no uniform approach to how syllables are phonologically segmented in Mandarin Chinese (Neergaard & Huang, 2019). Whether two words count as phonological neighbours depends on which parts of the syllable are regarded as contrastive. Whether the entire rime or each phoneme is treated as a contrastive element, and whether tone is included, affects how phonological neighbourhood density (PND), neighbourhood frequency (PNF), and homophone density (HD) are calculated. Neergaard et al. (2016) created a database of Mandarin Chinese words encoded with more than 16 different syllable segmentation patterns, using different combinations of phonemes, syllable structure, and tonemes. Sharma (2020) examined four such segmentation schemas: onset and rime with tone (C_GVX_T) and without tone (C_GVX); phonemes with tone (C_G_V_X_T) and without tone (C_G_V_X). That study compared the effects of phonological neighbourhood measures based on each of these schemas on the processing of Mandarin Chinese syllables in an auditory lexical decision task (Table 1). While it found certain neighbourhood measures to have stronger predictive power under an Onset-Rime-Tone segmentation pattern, it also pointed out that the findings might depend on the nature of the task. Our study continues this line of research and aims to gain further insight into phonological representation and neighbourhood measures by examining the same four segmentation schemas with a task that requires a different type of spoken word recognition. An open-set speech-in-noise spoken word recognition task was used to elicit identification accuracy for Mandarin Chinese syllables. Thirty normal-hearing native speakers of Mandarin from mainland China were tested.
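To make the dependence of neighbourhood counts on the segmentation schema concrete, the following Python sketch segments a syllable under each of the four schemas and counts neighbours. It is an illustration only, not the procedure used to build the database: the `Syllable` fields, the one-edit neighbour rule (one unit substituted, added, or deleted), the dropping of empty slots, and the four-syllable toy lexicon are all assumptions introduced here.

```python
from typing import Iterable, NamedTuple, Tuple


class Syllable(NamedTuple):
    """Hypothetical container: onset (C), glide (G), vowel (V), coda (X), tone (T)."""
    onset: str
    glide: str
    vowel: str
    coda: str
    tone: str


SCHEMAS = ("C_GVX_T", "C_GVX", "C_G_V_X_T", "C_G_V_X")


def segment(syl: Syllable, schema: str) -> Tuple[str, ...]:
    """Split a syllable into contrastive units under the given schema."""
    if schema not in SCHEMAS:
        raise ValueError(f"unknown schema: {schema}")
    if schema.startswith("C_GVX"):
        # Onset-rime: glide, vowel and coda form one unit.
        units = [syl.onset, syl.glide + syl.vowel + syl.coda]
    else:
        # Phoneme-level: every segment is its own unit.
        units = [syl.onset, syl.glide, syl.vowel, syl.coda]
    if schema.endswith("_T"):
        units.append("T" + syl.tone)  # prefix keeps tones distinct from segments
    # Assumption: empty slots are dropped rather than kept as placeholders.
    return tuple(u for u in units if u)


def one_edit_apart(a: Tuple[str, ...], b: Tuple[str, ...]) -> bool:
    """True if b is a with exactly one unit substituted, added, or deleted."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # substitution at position i
    return a[i:] == b[i + 1:]          # b has one extra unit at position i


def neighbourhood_density(target: Syllable, lexicon: Iterable[Syllable], schema: str) -> int:
    """Count distinct forms in the lexicon that are one edit from the target."""
    t = segment(target, schema)
    forms = {segment(s, schema) for s in lexicon}
    return sum(one_edit_apart(t, f) for f in forms)


# Toy lexicon (illustrative only): ma1, ma3, man1, mian1.
ma1 = Syllable("m", "", "a", "", "1")
ma3 = Syllable("m", "", "a", "", "3")
man1 = Syllable("m", "", "a", "n", "1")
mian1 = Syllable("m", "i", "a", "n", "1")
lexicon = [ma1, ma3, man1, mian1]

for schema in SCHEMAS:
    print(schema, neighbourhood_density(ma1, lexicon, schema))
```

In this toy lexicon, ma3 is a neighbour of ma1 only when tone is a unit (without tone the two forms are identical, i.e. homophones under that schema), and mian1 is a neighbour of ma1 only when the rime is a single unit. The same lexicon therefore yields different densities under each schema.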
An exhaustive set of 1,310 real syllables was tested, of which 1,164 were enriched with lexical statistics (PND, PNF, HD and syllable frequency) based on the four segmentation schemas from the database by Neergaard et al. (2021). The syllables were recorded by native speakers of both genders, intensity-normalised, high-pass filtered and silence-padded. Each syllable was randomly assigned one of three signal-to-noise ratios (SNRs; -5, 0 and +5 dB) and mixed with speech-shaped noise. Each participant was tested on a random sample of 655 target syllables and entered responses on a keyboard. In total, 16,490 recognition trials over the 1,164 real syllables were analysed. Logistic mixed-effects regressions (glmer) were used to predict accuracy from four lexical measures (PND, PNF, HD, syllable frequency) and control variables (trial position, SNR, volume, syllable duration, speaker’s gender). Four models were fitted, one per schema, each with that schema’s set of lexical measures. Participants and items were included as random intercepts. Continuous variables were log10-transformed where needed and then z-transformed. The model with the lowest AIC/BIC was taken to be the best model. PND had an inhibitory effect (negative coefficient); HD had a facilitatory effect (positive coefficient); syllable frequency had a facilitatory effect only under the C_G_V_X schema. Two schemas, C_GVX_T and C_G_V_X, resulted in similarly low AIC and BIC scores (differences in AIC/BIC below 2 were not treated as significant) (Table 2), suggesting that tone needs to be considered only when the syllable is segmented into onsets and rimes. Our findings add to existing findings on neighbourhood effects in relation to word representation and processing in Mandarin Chinese. Our use of speech-in-noise word recognition requires a different, arguably more naturalistic, type of speech processing than auditory lexical decision.
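As an illustration of what presenting a syllable at a given SNR involves, the sketch below scales a noise signal so that the ratio of mean speech power to mean noise power matches a target value in dB, then adds the two. The actual stimulus-preparation procedure is not specified here, so everything in the sketch is an assumption: the function names, the choice to scale the noise rather than the speech, computing power over the whole signal, and the use of a sine tone and white Gaussian noise as stand-ins for a recorded syllable and speech-shaped noise.

```python
import math
import random
from typing import List, Sequence, Tuple


def mean_power(x: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> Tuple[List[float], List[float]]:
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then mix.

    Returns (mixture, scaled_noise). Assumes the noise is at least as long
    as the speech and that neither signal is silent.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = list(noise[:len(speech)])
    p_speech, p_noise = mean_power(speech), mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # Target noise power is P_speech / 10^(SNR/10); amplitude gain is its square root.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * v for v in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Stand-in signals (not real stimuli): a 220 Hz tone and white Gaussian noise.
random.seed(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(1600)]
noise = [random.gauss(0.0, 1.0) for _ in range(1600)]

for snr in (-5, 0, 5):
    mixture, scaled = mix_at_snr(speech, noise, snr)
    achieved = 10 * math.log10(mean_power(speech) / mean_power(scaled))
    print(snr, round(achieved, 6))
```

Scaling the noise keeps the level of the speech constant across conditions; scaling the speech instead would give the same SNR but a different overall presentation level.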
Like Sharma (2020), we found an inhibitory effect of neighbourhood density on spoken word recognition, and C_GVX_T was again one of the best schemas, albeit with a different task. The surprising finding for C_G_V_X might be due to task type and requires a more in-depth investigation across spoken word recognition tasks.
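The identification of the best schemas rests on comparing information criteria across the four models. The sketch below shows the standard AIC and BIC formulas and the rule that models within 2 points of the lowest score are treated as equally good. The function names, the generic model labels, the parameter count, and the log-likelihood values are hypothetical placeholders chosen for illustration; they are not results from the study. Only the number of observations (16,490 trials) is taken from the abstract.

```python
import math
from typing import Dict, List


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


def competitive_models(scores: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Names of models whose score is within `threshold` of the lowest score."""
    best = min(scores.values())
    return sorted(name for name, s in scores.items() if s - best < threshold)


# Placeholder inputs (NOT results from the study).
N_TRIALS = 16490   # number of recognition trials reported in the abstract
K_PARAMS = 10      # hypothetical number of estimated parameters per model
log_liks = {"model_a": -100.0, "model_b": -100.5, "model_c": -104.0, "model_d": -107.0}

aic_scores = {name: aic(ll, K_PARAMS) for name, ll in log_liks.items()}
bic_scores = {name: bic(ll, K_PARAMS, N_TRIALS) for name, ll in log_liks.items()}

print(competitive_models(aic_scores))
print(competitive_models(bic_scores))
```

Because the four models have the same number of parameters and are fitted to the same trials, the penalty terms cancel in the comparison, so AIC and BIC rank the models identically and differ only in their absolute values.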

Publication
The 19th Conference on Laboratory Phonology