Neuroscience
Volume 258, 31 January 2014, Pages 292-306

Detection and identification of speech sounds using cortical activity patterns

https://doi.org/10.1016/j.neuroscience.2013.11.030

Highlights

  • New classifier uses cortical activity to locate and identify consonants in real time.

  • The classifier predicts the behavioral performance of rats on speech discrimination.

  • Rats can identify a target sound in a stream presented at up to 10 syllables per second.

  • The temporal processing limit of rats for speech discrimination mimics that of humans.

  • Spatial smoothing of neural data can compensate for uncorrelated neural recordings.

Abstract

We have developed a classifier capable of locating and identifying speech sounds using activity from rat auditory cortex with an accuracy equivalent to behavioral performance and without the need to specify the onset time of the speech sounds. This classifier can identify speech sounds from a large speech set within 40 ms of stimulus presentation. To compare the temporal limits of the classifier to behavior, we developed a novel task that requires rats to identify individual consonant sounds from a stream of distracter consonants. The classifier successfully predicted the ability of rats to accurately identify speech sounds for syllable presentation rates up to 10 syllables per second (up to 17.9 ± 1.5 bits/s), which is comparable to human performance. Our results demonstrate that the spatiotemporal patterns generated in primary auditory cortex can be used to quickly and accurately identify consonant sounds from a continuous speech stream without prior knowledge of the stimulus onset times. Improved understanding of the neural mechanisms that support robust speech processing in difficult listening conditions could improve the identification and treatment of a variety of speech-processing disorders.
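The information rate quoted above (17.9 ± 1.5 bits/s at 10 syllables per second) can be sanity-checked with a standard estimator. The sketch below uses the Wolpaw information-transfer-rate formula from the brain-computer-interface literature with an assumed accuracy value; it is an illustration of the magnitude involved, not necessarily the estimator used in the paper.

```python
import math

def wolpaw_itr(n_classes: int, accuracy: float, decisions_per_sec: float) -> float:
    """Information transfer rate (bits/s) under the Wolpaw formula,
    which assumes equiprobable classes and uniformly spread errors.
    An illustrative estimator, not necessarily the paper's method."""
    if accuracy >= 1.0:
        return math.log2(n_classes) * decisions_per_sec
    if accuracy <= 1.0 / n_classes:
        return 0.0
    bits_per_decision = (
        math.log2(n_classes)
        + accuracy * math.log2(accuracy)
        + (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    )
    return bits_per_decision * decisions_per_sec

# 9 consonant classes at 10 syllables/s; 75% accuracy is an assumed value.
print(f"{wolpaw_itr(9, 0.75, 10.0):.1f} bits/s")  # ~16.1 bits/s
```

With nine classes, a decision rate of 10 per second, and roughly three-quarters of decisions correct, this formula yields a rate of the same order as the value reported in the abstract.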

Introduction

Speech sounds evoke unique spatiotemporal patterns in the auditory cortex of many species (Kuhl and Miller, 1975, Eggermont, 1995, Engineer et al., 2008). Primary auditory cortex (A1) neurons respond to most consonants with short, transient bursts of activity, but the spatiotemporal pattern of that activity differs across sounds (Engineer et al., 2008). For example, the consonant /d/ evokes activity first in neurons tuned to high frequencies, followed by neurons tuned to lower frequencies. The sound /b/ causes the opposite pattern, such that low-frequency neurons fire approximately 20 ms before the high-frequency neurons (Engineer et al., 2008, Shetake et al., 2011, Perez et al., 2012, Ranasinghe et al., 2012b). These patterns of activity can be used to identify the evoking auditory stimulus in both human (Steinschneider et al., 2005, Chang et al., 2010, Pasley et al., 2012) and animal auditory cortex (Engineer et al., 2008, Mesgarani et al., 2008, Huetz et al., 2009, Bizley et al., 2010, Shetake et al., 2011, Perez et al., 2012, Ranasinghe et al., 2012a, Centanni et al., 2013a).

Rats are a good model of human speech sound discrimination, as these rodents have neural and behavioral speech discrimination thresholds similar to those of humans. Rats can discriminate isolated human speech sounds with high levels of accuracy (Engineer et al., 2008, Perez et al., 2012, Centanni et al., 2013a). Rats and humans have similar thresholds for discriminating spectrally degraded speech sounds, down to as few as four bands of spectral information (Ranasinghe et al., 2012b). Rats and humans are both able to discriminate speech sounds presented at a 0-dB signal-to-noise ratio (Shetake et al., 2011).

In both rats and humans, sounds that evoke different patterns of neural activity are more easily discriminated behaviorally than sounds that evoke similar patterns of activity (Engineer et al., 2008, Shetake et al., 2011, Ranasinghe et al., 2012b). Speech sounds presented in background noise evoke neural response patterns with longer latency and lower firing rate than speech presented in quiet, and the extent of these differences is correlated with behavioral performance (Martin and Stapells, 2005, Shetake et al., 2011). Neural activity patterns in anesthetized rats also predict behavioral discrimination of temporally degraded speech sounds (Ranasinghe et al., 2012b).

The relationship between neural activity and associated behavior is often analyzed using minimum-distance classifiers, but the classifiers used in previous studies typically differ from behavioral processes in one key aspect: they were provided with the stimulus onset time, which greatly simplifies the problem of speech classification (Engineer et al., 2008, Shetake et al., 2011, Perez et al., 2012, Ranasinghe et al., 2012a, Centanni et al., 2013a, Centanni et al., 2013b). During natural listening, stimulus onsets occur at irregular intervals. One possible correction allows a classifier to search through an entire recording sweep, rather than only considering activity immediately following stimulus onset. The classifier then guesses the location and identity of the sound post hoc by picking the location most similar to a template (Shetake et al., 2011). While this method is highly accurate and predicts behavioral ability without being given the onset time, it cannot be implemented in real time and assumes that a stimulus was present. We expected that large numbers of recording sites would be able to accurately identify a sound's onset, since the A1 onset response to sound is well documented (Anderson et al., 2006, Engineer et al., 2008, Dong et al., 2011, Centanni et al., 2013b). We hypothesized that, with many recording sites, A1 activity could also be used to identify the sound after a delay brief enough to be consistent with behavioral performance in humans and animals.
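To make the sliding-template idea concrete, here is a minimal sketch of minimum-distance template matching run continuously over a population recording. The array layout, the Euclidean metric, and the single fixed threshold are our assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def sliding_template_classify(activity, templates, threshold):
    """Slide stored class templates across a binned population response
    (rows = recording sites, columns = time bins) and report the onset
    bin and label wherever the closest template is within `threshold`.

    Illustrative only: the distance metric, normalization, and the
    single fixed threshold are assumptions, not the paper's settings.
    """
    n_sites, n_bins = activity.shape
    width = next(iter(templates.values())).shape[1]  # template width in bins
    detections = []
    for start in range(n_bins - width + 1):
        window = activity[:, start:start + width]
        # Euclidean (Frobenius) distance from this window to each template
        dists = {label: np.linalg.norm(window - tmpl)
                 for label, tmpl in templates.items()}
        best = min(dists, key=dists.get)
        if dists[best] < threshold:
            detections.append((start, best))
    return detections
```

Running the templates continuously over the sweep removes the need for a known onset time, at the cost of having to decide, at every window position, whether any stimulus is present at all.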

Section snippets

Speech stimuli

For this study, we used the same stimuli as several previous studies in our lab (Engineer et al., 2008, Floody et al., 2010, Porter et al., 2011, Shetake et al., 2011, Ranasinghe et al., 2012b). We used nine English consonant–vowel–consonant (CVC) speech sounds differing only in the initial consonant (/bad/, /dad/, /gad/, /kad/, /pad/, /sad/, /tad/, /wad/, and /zad/), spoken by a female native English speaker and recorded in a double-walled, soundproof booth. The spectral envelope was

Neural activity patterns predict stimulus identity

Our classifier was tested using previously published neural activity evoked by nine different consonant sounds (Engineer et al., 2008) (Fig. 5). The first test of the classifier used 2-ms temporal bins over an 80-ms sliding window (an analysis window of 40 bins), similar to the temporal parameters used in previous studies (Engineer et al., 2008). Overall, this classifier performed at chance levels (10% chance vs. 10.7 ± 0.6% correct; unpaired t-test, p = 0.86; Fig. 2A). We
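The 40-bin window follows directly from the arithmetic 80 ms / 2 ms = 40. A minimal sketch of that binning step for one recording site, assuming spike times in milliseconds (the function and variable names are ours, not the paper's):

```python
import numpy as np

def bin_spikes(spike_times_ms, window_start_ms, win_ms=80.0, bin_ms=2.0):
    """Count spikes from one site in fixed-width bins spanning one
    analysis window; 80 ms / 2 ms yields the 40-bin vector described
    in the text."""
    edges = np.arange(window_start_ms,
                      window_start_ms + win_ms + bin_ms, bin_ms)
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    return counts  # shape (40,)
```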

Calculation of decision thresholds

In our study, we designed a classifier that sweeps through neural activity searching for the pattern of activity evoked by a speech sound and decides which sound caused that activity using predetermined decision thresholds. Our results support the idea that A1 contains sufficient information to perform speech sound identification (Steinschneider et al., 1995, Engineer et al., 2008, Bizley et al., 2010, Shetake et al., 2011, Perez et al., 2012, Ranasinghe et al., 2012b). This information may also be present in
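This excerpt does not show how the decision thresholds were predetermined. One plausible calibration, sketched below purely as an assumption, places each class's threshold between the template distances measured on held-out windows that did and did not contain that sound:

```python
import numpy as np

def calibrate_threshold(target_dists, nontarget_dists):
    """Assumed heuristic, not the paper's procedure: set the decision
    threshold midway between the mean template distance of windows
    containing the sound and the mean distance of windows that do not,
    so a match is declared only for unusually close windows."""
    return 0.5 * (np.mean(target_dists) + np.mean(nontarget_dists))
```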

Conclusion

In the current study, we developed a classifier that can locate the onset of and identify consonant speech sounds using population neural data acquired from A1. Our classifier successfully predicted the ability of rats to identify a target CVC speech sound in a continuous stream of distracter CVC sounds at rates up to 10 syllables per second, which is comparable to human performance. The classifier was just as accurate when using data recorded from awake rats. We also demonstrate that smoothing neural data along the spatial dimension can compensate for uncorrelated neural recordings.
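As a sketch of the smoothing idea mentioned above (the kernel width and the ordering of sites by characteristic frequency are our assumptions), a simple moving average across neighboring recording sites looks like this:

```python
import numpy as np

def smooth_across_sites(activity, kernel_sites=3):
    """Moving-average smoothing across neighboring recording sites
    (rows assumed ordered, e.g., by characteristic frequency); each
    time bin is averaged over `kernel_sites` adjacent sites, which can
    help compensate for sites that were not recorded simultaneously."""
    kernel = np.ones(kernel_sites) / kernel_sites
    return np.apply_along_axis(
        lambda column: np.convolve(column, kernel, mode="same"),
        axis=0, arr=activity)
```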

Acknowledgments

We would like to thank N. Lengnick, H. Shepard, N. Moreno, R. Cheung, K. Im, S. Mahioddin, C. Rohloff and A. Malik for their help with behavioral training, as well as A. Reed, D. Gunter, C. Mains, M. Borland, E. Hancik and Z. Abdulali for help in acquiring neural recordings. We would also like to thank K. Ranasinghe for suggestions on earlier versions of this manuscript. This work was supported by the National Institute on Deafness and Other Communication Disorders at the National Institutes of Health.

References (64)

  • R. Rennaker et al., A comparison of chronic multi-channel cortical implantation techniques: manual versus mechanical insertion, J Neurosci Methods (2005)
  • F. Sengpiel et al., The role of activity in development of the visual system, Curr Biol (2002)
  • A.M. Sloan et al., Frequency discrimination in rats measured with tone-step stimuli and discrete pure tones, Hear Res (2009)
  • M. Steinschneider et al., Tonotopic organization of responses reflecting stop consonant place of articulation in primary auditory cortex (A1) of the monkey, Brain Res (1995)
  • S. Treue, Neural correlates of attention in primate visual cortex, Trends Neurosci (2001)
  • J.A. Winer et al., Auditory thalamocortical transformation: structure and function, Trends Neurosci (2005)
  • E. Ahissar et al., Speech comprehension is correlated with temporal response patterns recorded from auditory cortex, Proc Natl Acad Sci (2001)
  • M.F. Bear et al., A physiological basis for a theory of synapse modification, Science (1987)
  • J.K. Bizley et al., Neural ensemble codes for stimulus periodicity in auditory cortex, J Neurosci (2010)
  • L. Brillouin, Science and information theory (2013)
  • D.V. Buonomano et al., Cortical plasticity: from synapses to maps, Annu Rev Neurosci (1998)
  • D.V. Buonomano et al., Temporal information transformed into a spatial code by a neural network with realistic properties, Science (1995)
  • T. Centanni et al., Knockdown of the dyslexia-associated gene KIAA0319 impairs temporal responses to speech stimuli in rat primary auditory cortex, Cereb Cortex (2013)
  • T.M. Centanni et al., Cortical speech-evoked response patterns in multiple auditory fields are correlated with behavioral discrimination ability, J Neurophysiol (2013)
  • E.F. Chang et al., Categorical speech representation in human superior temporal gyrus, Nat Neurosci (2010)
  • G.B. Christianson et al., Depth-dependent temporal response properties in core auditory cortex, J Neurosci (2011)
  • S. Cohen-Cory, The developing synapse: construction and modulation of synaptic structures and circuits, Science (2002)
  • O. Creutzfeldt et al., Thalamocortical transformation of responses to complex auditory stimuli, Exp Brain Res (1980)
  • C. Dong et al., Neural responses in the primary auditory cortex of freely behaving cats while discriminating fast and slow click-trains, PLoS One (2011)
  • J.J. Eggermont, Representation of a voice onset time continuum in primary auditory cortex of the cat, J Acoust Soc Am (1995)
  • C.T. Engineer et al., Cortical activity patterns predict speech discrimination ability, Nat Neurosci (2008)
  • O. Ghitza et al., On the possible role of brain rhythms in speech perception: intelligibility of time-compressed speech with periodic and aperiodic insertions of silence, Phonetica (2009)