Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm

Front Psychol. 2011 Jun 27;2:130. doi: 10.3389/fpsyg.2011.00130. eCollection 2011.

Abstract

The premise of this study is that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the observed patterns of recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Cascaded cortical oscillations in the theta, beta, and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these oscillations remain phase locked to the auditory input rhythm. A model (Tempo) is presented which is capable of emulating recent psychophysical data on the intelligibility of spoken sentences as a function of "packaging" rate (Ghitza and Greenberg, 2009). The data show that the intelligibility of speech that is time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when the information stream is re-packaged by inserting silent gaps between successive compressed-signal intervals. This finding is counterintuitive and difficult to explain using classical models of speech perception, but it emerges naturally from the Tempo architecture.
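The re-packaging manipulation described above can be illustrated with a minimal sketch. It assumes the time-compressed signal is already available as a list of samples, and it only shows the gap-insertion step. The function names, the interval and gap durations, and the definition of packaging rate as one package per interval-plus-gap period are illustrative assumptions; the abstract does not state the values or the exact procedure used in the study.

```python
def repackage(samples, sample_rate, interval_ms, gap_ms):
    """Cut a (time-compressed) signal into fixed-length intervals and
    insert a silent gap between successive intervals.

    All parameter names and values are hypothetical; the abstract does
    not specify them.
    """
    interval_len = int(round(sample_rate * interval_ms / 1000))
    gap_len = int(round(sample_rate * gap_ms / 1000))
    if interval_len <= 0:
        raise ValueError("interval_ms must cover at least one sample")
    if gap_len < 0:
        raise ValueError("gap_ms must be non-negative")

    out = []
    for start in range(0, len(samples), interval_len):
        if start > 0:
            # Silence (zeros) goes between intervals, not before the first one.
            out.extend([0.0] * gap_len)
        out.extend(samples[start:start + interval_len])
    return out


def packaging_rate_hz(interval_ms, gap_ms):
    """Packages per second, assuming one package = one interval plus one gap."""
    return 1000.0 / (interval_ms + gap_ms)


if __name__ == "__main__":
    # Toy signal: 6 samples at 1 kHz, 2 ms intervals, 1 ms gaps.
    print(repackage([1, 2, 3, 4, 5, 6], 1000, 2, 1))
    # Placeholder durations, chosen only to show the arithmetic.
    print(packaging_rate_hz(40, 80))
```

Lengthening the gap lowers the packaging rate without changing the acoustic content of the compressed intervals, which is the variable the abstract relates to intelligibility.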

Keywords: brain rhythms; cascaded cortical oscillations; decoding; decoding time; memory access; parsing; phase locking; speech perception.