Detection and identification of speech sounds using cortical activity patterns
Introduction
Speech sounds evoke unique spatiotemporal patterns in the auditory cortex of many species (Kuhl and Miller, 1975, Eggermont, 1995, Engineer et al., 2008). Primary auditory cortex (A1) neurons respond to most consonants with short, transient bursts of activity, but the spatiotemporal pattern of that activity differs from sound to sound (Engineer et al., 2008). For example, the consonant /d/ evokes activity first in neurons tuned to high frequencies, followed by neurons tuned to lower frequencies. The sound /b/ causes the opposite pattern, such that low-frequency neurons fire approximately 20 ms before high-frequency neurons (Engineer et al., 2008, Shetake et al., 2011, Perez et al., 2012, Ranasinghe et al., 2012b). These patterns of activity can be used to identify the evoking auditory stimulus in both human (Steinschneider et al., 2005, Chang et al., 2010, Pasley et al., 2012) and animal auditory cortex (Engineer et al., 2008, Mesgarani et al., 2008, Huetz et al., 2009, Bizley et al., 2010, Shetake et al., 2011, Perez et al., 2012, Ranasinghe et al., 2012a, Centanni et al., 2013a).
Rats are a good model of human speech sound discrimination because their neural and behavioral discrimination thresholds are similar to those of humans. Rats can discriminate isolated human speech sounds with high accuracy (Engineer et al., 2008, Perez et al., 2012, Centanni et al., 2013a). Rats and humans have similar thresholds for discriminating spectrally degraded speech sounds, down to as few as four bands of spectral information (Ranasinghe et al., 2012b), and both species can discriminate speech sounds presented at a 0 dB signal-to-noise ratio (Shetake et al., 2011).
In both rats and humans, sounds that evoke different patterns of neural activity are more easily discriminated behaviorally than sounds that evoke similar patterns (Engineer et al., 2008, Shetake et al., 2011, Ranasinghe et al., 2012b). Speech presented in background noise evokes neural responses with longer latencies and lower firing rates than speech presented in quiet, and the extent of these differences correlates with behavioral performance (Martin and Stapells, 2005, Shetake et al., 2011). Neural activity patterns in anesthetized rats also predict behavioral discrimination of temporally degraded speech sounds (Ranasinghe et al., 2012b).
The relationship between neural activity and the associated behavior is often analyzed using minimum distance classifiers, but the classifiers used in previous studies typically differ from behavioral processes in one key respect: they were provided with the stimulus onset time, which greatly simplifies the problem of speech classification (Engineer et al., 2008, Shetake et al., 2011, Perez et al., 2012, Ranasinghe et al., 2012a, Centanni et al., 2013a, Centanni et al., 2013b). During natural listening, stimulus onsets occur at irregular intervals. One possible correction allows a classifier to search an entire recording sweep, rather than only the activity immediately following stimulus onset, and to guess the location and identity of the sound post hoc by picking the position most similar to a template (Shetake et al., 2011). While this method is highly accurate and predicts behavioral ability without being given the onset time, it cannot be implemented in real time and assumes that a stimulus was present. We expected that large numbers of recording sites would be able to accurately identify a sound's onset, since the A1 onset response to sound is well documented (Anderson et al., 2006, Engineer et al., 2008, Dong et al., 2011, Centanni et al., 2013b). We hypothesized that, with many recording sites, A1 activity can also be used to identify the sound itself with a very brief delay, consistent with behavioral performance in humans and animals.
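To make the post hoc search concrete, the sketch below shows a minimal sliding template (minimum distance) classifier over continuous multi-site activity. This is our own illustration rather than the implementation used in any of the cited studies; the site count, bin size, and toy Poisson data are assumptions.

```python
import numpy as np

def sliding_template_classifier(activity, templates):
    """Scan continuous binned activity (sites x bins) and return the
    template (candidate sound) and window position with the smallest
    Euclidean distance -- a post hoc search that assumes some stimulus
    occurred somewhere in the sweep."""
    n_sites, n_bins = activity.shape
    win = templates[0].shape[1]  # window length in bins
    best_dist, best_pos, best_label = np.inf, None, None
    for start in range(n_bins - win + 1):
        window = activity[:, start:start + win]
        for label, template in enumerate(templates):
            d = np.linalg.norm(window - template)
            if d < best_dist:
                best_dist, best_pos, best_label = d, start, label
    return best_label, best_pos, best_dist

# Toy demo: 10 recording sites, 500 bins, two 40-bin response templates.
rng = np.random.default_rng(0)
templates = [rng.poisson(2.0, (10, 40)).astype(float) for _ in range(2)]
sweep = rng.poisson(1.0, (10, 500)).astype(float)
sweep[:, 300:340] += templates[1]  # embed sound 1 at bin 300
print(sliding_template_classifier(sweep, templates))
```

Because every window position is compared against every template, this search only runs after the full sweep is recorded, which is precisely why it cannot operate in real time.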
Section snippets
Speech stimuli
For this study, we used the same stimuli as several previous studies in our lab (Engineer et al., 2008, Floody et al., 2010, Porter et al., 2011, Shetake et al., 2011, Ranasinghe et al., 2012b). We used nine English consonant–vowel–consonant (CVC) speech sounds differing only in the initial consonant (/bad/, /dad/, /gad/, /kad/, /pad/, /sad/, /tad/, /wad/, and /zad/), recorded in a double-walled soundproof booth and spoken by a female native English speaker. The spectral envelope was…
Neural activity patterns predict stimulus identity
Our classifier was tested using previously published neural activity evoked by nine different consonant sounds (Engineer et al., 2008) (Fig. 5). The first test of the classifier used 2-ms temporal bins over an 80-ms sliding window (creating a 40-bin analysis window), similar to the temporal parameters used in previous studies (Engineer et al., 2008). Overall, this classifier performed at chance levels (10% chance vs. 10.7 ± 0.6% correct; unpaired t-test, p = 0.86; Fig. 2A)…
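For reference, converting spike times into this 40-bin representation amounts to simple histogramming. The sketch below is our own illustration with hypothetical spike times; the function name and example values are not from the paper.

```python
import numpy as np

def bin_spikes(spike_times_ms, onset_ms, bin_ms=2.0, window_ms=80.0):
    """Convert one recording site's spike times into a PSTH-style vector:
    the 80 ms after onset divided into 2-ms bins yields 40 counts."""
    edges = np.arange(onset_ms, onset_ms + window_ms + bin_ms, bin_ms)
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    return counts  # length 40

# e.g., spikes 5, 6.5, and 41 ms after a stimulus at t = 0
print(bin_spikes(np.array([5.0, 6.5, 41.0]), onset_ms=0.0))
```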
Calculation of decision thresholds
In our study, we designed a classifier that sweeps through neural activity searching for the pattern evoked by a speech sound and decides which sound caused that activity using predetermined decision thresholds. Our results support the idea that A1 contains sufficient information to perform speech sound identification (Steinschneider et al., 1995, Engineer et al., 2008, Bizley et al., 2010, Shetake et al., 2011, Perez et al., 2012, Ranasinghe et al., 2012b). This information may also be present in…
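The snippet cuts off before the threshold procedure is described, but one common way to set such predetermined thresholds is from the distribution of template distances measured on stimulus-free activity. The sketch below assumes that approach; it is our own illustration, not the paper's exact method, and the percentile and toy distances are assumptions.

```python
import numpy as np

def decision_threshold(spont_distances, percentile=1.0):
    """Set a per-template decision threshold from the distribution of
    template distances on spontaneous (stimulus-free) activity: only
    windows that match the template better than nearly all spontaneous
    windows trigger a detection."""
    return np.percentile(spont_distances, percentile)

rng = np.random.default_rng(1)
spont = rng.normal(50.0, 5.0, 10_000)  # distances on spontaneous sweeps
thr = decision_threshold(spont, percentile=1.0)
print(f"detect when distance < {thr:.1f}")  # fires on ~1% of silent windows
```

Choosing the percentile trades off false alarms during silence against missed detections of genuine speech-evoked patterns.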
Conclusion
In the current study, we developed a classifier that can locate the onset of and identify consonant speech sounds using population neural data acquired from A1. Our classifier successfully predicted the ability of rats to identify a target CVC speech sound in a continuous stream of distracter CVC sounds at rates of up to 10 sps, which is comparable to human performance. The classifier was just as accurate when using data recorded from awake rats. We also demonstrate that smoothing neural data along…
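As an illustration of the kind of smoothing referred to in the truncated sentence above, the sketch below applies a Gaussian kernel to a binned response. This is our own sketch: the kernel shape and width are assumptions, and since the sentence does not specify the axis, we assume smoothing along time.

```python
import numpy as np

def smooth_psth(counts, sigma_bins=2.0):
    """Smooth a binned response with a Gaussian kernel along the time
    axis, trading temporal precision for robustness to spike-time jitter."""
    radius = int(3 * sigma_bins)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma_bins) ** 2)
    kernel /= kernel.sum()
    return np.convolve(counts, kernel, mode="same")

psth = np.zeros(40)
psth[10], psth[11] = 6.0, 3.0  # a sharp onset response
print(smooth_psth(psth))       # the burst is spread over neighboring bins
```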
Acknowledgments
We would like to thank N. Lengnick, H. Shepard, N. Moreno, R. Cheung, K. Im, S. Mahioddin, C. Rohloff and A. Malik for their help with behavioral training, as well as A. Reed, D. Gunter, C. Mains, M. Borland, E. Hancik and Z. Abdulali for help in acquiring neural recordings. We would also like to thank K. Ranasinghe for suggestions on earlier versions of this manuscript. This work was supported by the National Institute for Deafness and Communication Disorders at the National Institutes of Health.
References (64)
- Anderson et al. Response to broadband repetitive stimuli in auditory cortex of the unanesthetized rat. Hear Res (2006)
- Floody et al. Effects of damage to auditory cortex on the discrimination of speech sounds by rats. Physiol Behav (2010)
- Foffani and Moxon. PSTH-based classification of sensory stimuli using ensembles of single neurons. J Neurosci Methods (2004)
- Langers et al. Spectrotemporal features of the auditory cortex: the activation in response to dynamic ripples. Neuroimage (2003)
- Langers et al. Representation of lateralization and tonotopy in primary versus secondary human auditory cortex. Neuroimage (2007)
- Lehongre et al. Altered low-gamma sampling in auditory cortex accounts for the three main facets of dyslexia. Neuron (2011)
- Poirazi et al. Arithmetic of subthreshold synaptic summation in a model CA1 pyramidal cell. Neuron (2003)
- Poirazi et al. Pyramidal neuron as two-layer neural network. Neuron (2003)
- Porter et al. Discrimination of brief speech sounds is impaired in rats with auditory cortex lesions. Behav Brain Res (2011)
- Ranasinghe et al. Speech discrimination after early exposure to pulsed-noise or speech. Hear Res (2012)