Consonant and vowel confusions in speech-weighted noise

J Acoust Soc Am. 2007 Apr;121(4):2312-26. doi: 10.1121/1.2642397.

Abstract

This paper presents the results of a closed-set recognition task for 64 consonant-vowel sounds (16 C X 4 V, spoken by 18 talkers) in speech-weighted noise (-22,-20,-16,-10,-2 [dB]) and in quiet. The confusion matrices were generated using responses of a homogeneous set of ten listeners and the confusions were analyzed using a graphical method. In speech-weighted noise the consonants separate into three sets: a low-scoring set C1 (/f/, /theta/, /v/, /d/, /b/, /m/), a high-scoring set C2 (/t/, /s/, /z/, /S/, /Z/) and set C3 (/n/, /p/, /g/, /k/, /d/) with intermediate scores. The perceptual consonant groups are C1: {/f/-/theta/, /b/-/v/-/d/, /theta/-/d/}, C2: {/s/-/z/, /S/-/Z/}, and C3: /m/-/n/, while the perceptual vowel groups are /a/-/ae/ and /epsilon/-/iota/. The exponential articulation index (AI) model for consonant score works for 12 of the 16 consonants, using a refined expression of the AI. Finally, a comparison with past work shows that white noise masks the consonants more uniformly than speech-weighted noise, and shows that the AI, because it can account for the differences in noise spectra, is a better measure than the wideband signal-to-noise ratio for modeling and comparing the scores with different noise maskers.

MeSH terms

  • Adult
  • Humans
  • Models, Biological*
  • Noise*
  • Phonetics*
  • Speech Perception*