Abstract
Understanding speech in background noise is challenging for every listener, including those with normal peripheral hearing. This difficulty is attributable in part to the disruptive effects of noise on neural synchrony, which degrade the representation of speech at cortical and subcortical levels, as reflected in electrophysiological responses. These problems are especially pronounced in clinical populations such as children with learning impairments. Given the established effects of noise on evoked responses, we hypothesized that listening-in-noise problems are associated with degraded processing of timing information at the brainstem level. Participants (66 children; ages, 8–14 years; 22 females) were divided into groups based on their performance on clinical measures of speech-in-noise (SIN) perception and reading. We compared brainstem responses to speech syllables between top and bottom SIN groups and between top and bottom reading groups, in the presence and absence of competing multitalker babble. In the quiet condition, neural response timing was equivalent between groups. In noise, however, the bottom groups exhibited greater neural delays than the top groups. These group-specific timing delays occurred only in response to the noise-vulnerable formant transition, not to the more perceptually robust steady-state portion of the stimulus. These results demonstrate that background noise disrupts neural timing and that greater disruptions are associated with poorer speech perception in challenging listening conditions.
Introduction
Speech consists of rapidly changing elements that require fine-grained neural representation of temporal information, especially in background noise. Temporal cues are important components of auditory object formation (Shinn-Cunningham and Best, 2008), a necessary element of auditory stream segregation. Early stages of auditory stream segregation occur subcortically (Pressnitzer et al., 2008; Parbery-Clark et al., 2009a), and timing cues needed for speech perception and auditory stream segregation are preserved in the brainstem via neural synchrony (Kraus and Nicol, 2005; Akhoun et al., 2008; Hornickel et al., 2009; Tzounopoulos and Kraus, 2009). It is well established that neural synchrony is degraded in noise, leading to delayed and reduced auditory evoked responses from cortical (Warrier et al., 2004; Billings et al., 2009; Russo et al., 2009) and brainstem structures (Hall, 1992; Cunningham et al., 2001; Burkard and Sims, 2002; Russo et al., 2004). In the auditory brainstem response (ABR), background noise disrupts the representation of temporal aspects of the stimulus, delaying the scalp-recorded far-field responses to the time-varying features of a speech stimulus (e.g., the onset and formant transition).
Children with language-based learning disabilities are known to have difficulty understanding speech in background noise (Bradlow et al., 2003; Ziegler et al., 2005). In children with dyslexia, perceptual deficits in noise occur despite normal perception in quiet (Ziegler et al., 2009), indicating that the deficit may be located central to the cochlea. Consistent with the idea of centrally located noise-induced deficits, children with dyslexia can exhibit atypical cortical (Warrier et al., 2004; Wible et al., 2005) and brainstem (Cunningham et al., 2001; Russo et al., 2005) responses to speech sounds presented in white noise. However, it is not known how these noise-induced neural deficits relate to speech-in-noise (SIN) perception. In experiment 1, we tested the hypothesis that children with poor SIN perception have greater temporal delays in noise than children with good SIN perception. We predicted that children performing below the 50th percentile on a behavioral SIN task would have disproportionate neural delays in multitalker babble, particularly in the response region corresponding to the formant transition of the stimulus, because this segment is the most perceptually vulnerable (Tallal and Stark, 1981; Banai et al., 2009; Hornickel et al., 2009). In experiment 2, using the same dataset, we examined whether children with reading impairments show degraded neural responses in noise compared with typically developing children, as predicted by the noise exclusion deficit hypothesis (Sperling et al., 2005).
Materials and Methods
Participants
Sixty-six children (ages, 8–14 years; mean, 10.9; SD, 1.70; 22 females) were recruited from public and private schools in the Chicago area as part of an ongoing study examining neural encoding of speech in children who are typically developing or have learning impairments. Thirty-six of these children had external diagnoses of learning impairments (29 with reading impairments and 7 with nonverbal learning impairments), and 30 children were typically developing. Audiometric thresholds were measured at octave intervals from 250 to 8000 Hz; all participants had pure-tone thresholds <20 dB, with no conductive hearing loss present at two or more frequencies in either ear. Inclusion criteria also required normal click-evoked ABR wave V latencies and normal cognitive abilities, defined as standard scores ≥85 on the verbal, performance, and overall scores of the WASI (Wechsler Abbreviated Scales of Intelligence) (Zhu and Garcia, 1999). All experimental procedures were approved by the Northwestern University Institutional Review Board.
Behavioral measures
Speech understanding in noise was evaluated with the Hearing in Noise Test (HINT) (Bio-Logic Systems), which uses the Bamford–Kowal–Bench (BKB) (Bench et al., 1979) phonetically balanced sentences appropriate for children at the first-grade reading level and above. Age-normed percentile HINT scores were used in the analysis.
To evaluate the relationship between SIN performance and literacy, reading ability was evaluated using the Test of Word Reading Efficiency–Total (TOWRE-T) (Torgesen et al., 1999), a standard test of reading efficiency. The TOWRE-T combines measures of the ability to sound out nonwords and to recognize real words quickly and accurately.
Participant groups
Top and bottom SIN groups were formed based on HINT-Front scores; in this HINT condition, the target sentences and masking noise emanate from the same loudspeaker located 1 m directly in front of the participant. The top SIN group (N = 30) had HINT scores ≥50th percentile (mean, 78.26; SD, 15.92; range, 50–100), and the bottom SIN group (N = 36) had scores <50th percentile (mean, 20.28; SD, 16.20; range, 0.02–47.50). The SIN groups did not differ significantly in pure-tone audiometric thresholds from 250 to 8000 Hz (p = 0.858, independent t test; top SIN group: mean, 3.24 dB; SD, 4.33 dB; bottom SIN group: mean, 3.48 dB; SD, 4.95 dB), click-ABR latencies (p = 0.333, independent t test; top SIN group: mean, 5.89 ms; SD, 0.16 ms; bottom SIN group: mean, 5.84 ms; SD, 0.14 ms), or reading scores (p = 0.555, independent t test; top SIN group: mean, 101.82; SD, 21.51; bottom SIN group: mean, 99.06; SD, 15.67).
Top and bottom reading groups were formed based on TOWRE-T reading scores and external diagnosis of reading impairment. Children in the bottom reading group (N = 28) had an external diagnosis of reading impairment as well as a TOWRE-T score <100 (mean, 83.5; SD, 0.944; range, 58–96), and children in the top reading group (N = 27) were typically developing and had a TOWRE-T score ≥100 (mean, 116.48; SD, 11.08; range, 101–138). The reading groups did not differ significantly in pure-tone audiometric thresholds from 250 to 8000 Hz (p = 0.591, independent t test; top readers: mean, 3.01 dB; SD, 4.49 dB; bottom readers: mean, 3.87 dB; SD, 4.78 dB), click-ABR latencies (p = 0.921, independent t test; top readers: mean, 5.88 ms; SD, 0.13 ms; bottom readers: mean, 5.87 ms; SD, 0.19 ms), or HINT scores (p = 0.375, independent t test; top readers: mean, −0.66; SD, 1.64; bottom readers: mean, −0.66; SD, 1.32).
Electrophysiology
Stimulus and recording.
The speech syllable [da] was a six-formant, 170 ms syllable [described by Parbery-Clark et al. (2009b)] synthesized at a 20 kHz sampling rate using a Klatt synthesizer (Klatt, 1980). The [da] was presented to the right ear in interleaved alternating polarities with a 60 ms interstimulus interval at 80 dB sound pressure level (SPL) through an electromagnetically shielded insert earphone (ER-3; Etymotic Research), using the stimulus presentation software NeuroScan Stim2 (Sound module; Compumedics). Before each recording session, the [da] was calibrated to 80 dB SPL using a Bruel & Kjaer 2238 Mediator sound level meter coupled to an insert earphone adaptor, with the SPL averaged over 60 s. This intensity was below the levels tolerated by typically developing children and children with attention deficit disorders (Lucker et al., 1996). Responses were recorded with NeuroScan Acquire4 at a sampling rate of 20 kHz using a vertical montage (Cz to earlobe, forehead as ground). During electrophysiological testing, participants watched movies of their choice while seated in a comfortable reclining chair. The left ear was unoccluded, enabling the participant to hear the soundtrack played at <40 dB SPL, a level insufficient to mask the stimulus. Watching movies helped participants sit quietly and cooperatively through the 2 h sessions.
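Together with the 3000 artifact-free sweeps per polarity described under Data analysis, the presentation timing implies a minimum stimulation time per condition. The sketch below works through that arithmetic. It assumes the 60 ms interstimulus interval runs from stimulus offset to the next onset (so onsets are 170 + 60 = 230 ms apart) and ignores sweeps lost to artifact rejection, which can only lengthen a block; the function name is hypothetical.

```python
def minimum_block_duration(stim_ms=170.0, isi_ms=60.0,
                           sweeps_per_polarity=3000, n_polarities=2):
    """Return (onset-to-onset interval in ms, presentation rate in Hz, minimum minutes).

    Assumes the interstimulus interval is measured offset-to-onset and that no
    sweeps are rejected; real blocks run longer because rejected sweeps are replaced.
    """
    soa_ms = stim_ms + isi_ms                    # onset-to-onset spacing
    rate_hz = 1000.0 / soa_ms                    # presentations per second
    total_sweeps = sweeps_per_polarity * n_polarities
    minutes = total_sweeps * soa_ms / 1000.0 / 60.0
    return soa_ms, rate_hz, minutes


soa, rate, minutes = minimum_block_duration()
print(f"{soa:.0f} ms onset-to-onset, {rate:.2f} Hz, at least {minutes:.1f} min per condition")
```

Under these assumptions, the quiet and babble blocks together need at least about 46 min of stimulation, which fits within the 2 h sessions.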
The [da] was presented in two blocks: quiet (i.e., no background babble) and six-talker babble background noise. The six-talker babble (four female and two male voices) was created by mixing six tracks of sentences in Cool Edit Pro, version 2.1 (Syntrillium Software, 2003), into a 4.7 s babble track. The signal-to-noise ratio (SNR) was set at +10 dB relative to the [da], based on the root mean square (RMS) amplitude of the entire track. The Stim2 Sound program automatically tracks the level of the babble relative to the [da] syllable, keeping the SNR constant at +10 dB.
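An SNR of +10 dB defined on RMS amplitudes means that the RMS of the [da] is 10^(10/20) ≈ 3.16 times the RMS of the babble. The following is a minimal sketch of RMS-based level setting under that definition. It is not Stim2's internal algorithm, the function names are hypothetical, and the target and babble are synthetic placeholders (a 100 Hz tone and Gaussian noise), not the actual stimuli.

```python
import numpy as np


def rms(x):
    """Root-mean-square amplitude of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))


def snr_db(target, noise):
    """SNR in dB from RMS amplitudes: 20 * log10(rms(target) / rms(noise))."""
    return 20.0 * np.log10(rms(target) / rms(noise))


def scale_noise_to_snr(target, noise, snr=10.0):
    """Rescale `noise` so that snr_db(target, scaled noise) equals `snr` dB."""
    wanted_noise_rms = rms(target) / 10.0 ** (snr / 20.0)
    return np.asarray(noise, dtype=float) * (wanted_noise_rms / rms(noise))


fs = 20_000                                     # Hz, the stimulus sampling rate
t = np.arange(int(0.170 * fs)) / fs             # 170 ms, the [da] duration
target = 0.1 * np.sin(2 * np.pi * 100 * t)      # placeholder tone at the 100 Hz F0, not the real [da]
rng = np.random.default_rng(0)
babble = rng.normal(0.0, 0.5, int(4.7 * fs))    # placeholder noise standing in for the 4.7 s babble
scaled = scale_noise_to_snr(target, babble, snr=10.0)
print(round(snr_db(target, scaled), 6))         # 10.0
```

Because the scale factor depends only on the two RMS values, the same function works for any babble track length, which mirrors computing the RMS over the entire track.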
Data analysis.
Electrophysiological responses were bandpass filtered off-line from 70 to 2000 Hz (12 dB/octave, zero phase shift) to minimize low-frequency myogenic noise and cortical activity while retaining the energy expected in the brainstem response given its phase-locking limits (Chandrasekaran and Kraus, 2010a; Skoe and Kraus, 2010). Responses were then averaged over a window from −40 to 190 ms, with time 0 corresponding to stimulus onset. An artifact rejection criterion of ±35 μV was applied, and for each stimulus polarity, 3000 artifact-free responses were averaged together.
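The filtering, epoching, rejection, and averaging steps can be sketched as follows. Assumptions, all hypothetical: the recording is a 1-D array in microvolts, stimulus onsets are known sample indices, and the 12 dB/octave zero-phase filter is approximated by a first-order Butterworth band-pass (about 6 dB/octave per skirt) run forward and backward with `scipy.signal.sosfiltfilt`, which cancels phase shift and doubles the roll-off. The acquisition software's filter may be implemented differently. `average_sweeps` would be called once per polarity; how the two polarity averages were subsequently combined is not specified here, so the sketch leaves that step out.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 20_000  # Hz, recording sampling rate given in the text


def bandpass_70_2000(x, fs=FS):
    """Zero-phase 70-2000 Hz band-pass (first-order Butterworth, forward-backward)."""
    sos = butter(1, [70.0, 2000.0], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=-1)


def average_sweeps(eeg_uv, onsets, fs=FS, tmin=-0.040, tmax=0.190,
                   reject_uv=35.0, n_keep=3000):
    """Epoch a filtered recording, reject sweeps exceeding +/- reject_uv,
    and average the first n_keep accepted sweeps.

    Returns (times in s, averaged waveform, number of sweeps averaged).
    """
    start, stop = int(round(tmin * fs)), int(round(tmax * fs))
    accepted = []
    for onset in onsets:
        a, b = onset + start, onset + stop
        if a < 0 or b > eeg_uv.shape[-1]:
            continue                      # epoch runs off the recording
        epoch = eeg_uv[a:b]
        if np.max(np.abs(epoch)) > reject_uv:
            continue                      # artifact: discard this sweep
        accepted.append(epoch)
        if len(accepted) == n_keep:
            break
    if not accepted:
        raise ValueError("no artifact-free sweeps")
    times = np.arange(start, stop) / fs
    return times, np.mean(accepted, axis=0), len(accepted)


rng = np.random.default_rng(1)
raw = rng.normal(0.0, 1.0, 5 * FS)                  # 5 s of synthetic "EEG" in microvolts
onsets = np.arange(2000, raw.size - 10_000, 4600)   # hypothetical onsets, 230 ms apart
times, avg, n_used = average_sweeps(bandpass_70_2000(raw), onsets)
```

Filtering the continuous record before epoching, as in the text, avoids filter edge transients at the boundaries of each 230 ms epoch.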
The SNR of the final averaged response was calculated by dividing the RMS amplitude of the response region of the waveform (0–190 ms) by the RMS amplitude of the prestimulus region (−40 to 0 ms). This metric was used to ensure that the response was adequately free of myogenic and electrical noise. All participants had a response SNR of at least 1.5 in the quiet condition and at least 1.35 in the noise condition.
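This response SNR is a ratio of amplitudes rather than a decibel value, and it is distinct from the +10 dB stimulus SNR. A minimal sketch of the calculation and of the minimum values given above follows. It assumes times in seconds and half-open windows [start, stop); the function names are hypothetical.

```python
import numpy as np

MIN_RESPONSE_SNR = {"quiet": 1.5, "noise": 1.35}  # minimum values reported in the text


def rms(x):
    """Root-mean-square amplitude."""
    return float(np.sqrt(np.mean(np.square(x))))


def response_snr(avg, times, response=(0.0, 0.190), prestim=(-0.040, 0.0)):
    """RMS of the response window divided by RMS of the prestimulus window."""
    avg = np.asarray(avg, dtype=float)
    times = np.asarray(times, dtype=float)
    resp = avg[(times >= response[0]) & (times < response[1])]
    pre = avg[(times >= prestim[0]) & (times < prestim[1])]
    return rms(resp) / rms(pre)


def meets_snr_criterion(snr, condition):
    """True if the response SNR reaches the minimum for 'quiet' or 'noise'."""
    return snr >= MIN_RESPONSE_SNR[condition]


times = np.arange(-800, 3800) / 20_000           # -40 to 190 ms at 20 kHz
avg = np.where(times < 0, 1.0, 2.0)              # toy waveform: response twice the baseline RMS
print(response_snr(avg, times), meets_snr_criterion(1.4, "quiet"))
```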
Measurement of the brainstem response
The brainstem response evoked by this 170 ms [da] syllable has three time-domain regions, the onset, transition, and steady state, which reflect the corresponding portions of the stimulus (see Fig. 1, middle). The onset response typically has a latency of 8–11 ms and is analogous to wave V of the click-evoked response (Song et al., 2006; Chandrasekaran and Kraus, 2010a). The transition response specific to this [da] token occurs between 20 and 60 ms and corresponds to the consonant-to-vowel formant transition. Both the transition and the steady state are characterized by large, periodic peaks occurring every 10 ms, corresponding to the period of the syllable's 100 Hz fundamental frequency.
The peaks of the brainstem response thought to be most critical to speech perception are those that reflect important speech features, such as the time-varying formant transition (Johnson et al., 2008; Hornickel et al., 2009). In this study, SIN- and reading-group differences were most apparent within the time-varying transition region of the response (20–60 ms). The analysis therefore focused on the latencies of peaks in the transition region (three pairs of positive- and negative-going peaks at mean latencies of ∼32, 34, 42, 43, 52, and 53 ms) in both the quiet and the six-talker babble recordings (see Fig. 1, bottom). For comparison, the 12 peak pairs occurring every 10 ms within the steady-state region (60–180 ms) were also evaluated, as were the early peaks at 9 and 10 ms (the response to the syllable onset) and at 23 and 24 ms (the response to the voicing onset). Peaks are named by their approximate latency (e.g., peak 32 refers to the peak occurring at ∼32 ms). Peaks were identified by the primary author and by a second peak picker who was blind to group membership; in cases of disagreement, a third peak picker was consulted. Inter-rater reliability between peak pickers was 92%. Although the syllable-onset peaks (9 and 10 ms) and voicing-onset peaks (23 and 24 ms) were consistently present in the quiet condition, their amplitudes did not exceed the noise floor in the babble condition, so these peaks were excluded from the analysis.
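The peaks in this study were picked visually by two raters. As a hypothetical, automated stand-in that makes the latency measure concrete, the sketch below takes the largest positive (or most negative) point within an assumed ±1 ms window around each expected latency and reports the noise-minus-quiet latency shift. The window width, function names, and synthetic Gaussian "peaks" are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

# (label, expected latency in ms, polarity) for the six transition peaks named in the text
TRANSITION_PEAKS = [("32", 32.0, "pos"), ("34", 34.0, "neg"), ("42", 42.0, "pos"),
                    ("43", 43.0, "neg"), ("52", 52.0, "pos"), ("53", 53.0, "neg")]


def peak_latency(wave, times_ms, expected_ms, polarity, half_width_ms=1.0):
    """Latency (ms) of the maximum ('pos') or minimum ('neg') within expected_ms +/- half_width_ms."""
    wave = np.asarray(wave, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    idx = np.flatnonzero(np.abs(times_ms - expected_ms) <= half_width_ms)
    if idx.size == 0:
        raise ValueError("search window lies outside the recording")
    seg = wave[idx]
    best = idx[np.argmax(seg)] if polarity == "pos" else idx[np.argmin(seg)]
    return float(times_ms[best])


def latency_shifts(quiet, noise, times_ms, peaks=TRANSITION_PEAKS):
    """Noise-minus-quiet latency shift (ms) for each (label, expected_ms, polarity)."""
    return {label: peak_latency(noise, times_ms, ms, pol) - peak_latency(quiet, times_ms, ms, pol)
            for label, ms, pol in peaks}


times_ms = np.arange(-800, 3800) * 0.05          # -40 to 190 ms in 0.05 ms steps (20 kHz)


def bump(center, width=0.3):
    """Synthetic Gaussian deflection used only for illustration."""
    return np.exp(-((times_ms - center) / width) ** 2)


quiet = bump(42.0) - bump(43.2)                  # toy peak pair 42/43 in quiet
noise = bump(42.6) - bump(43.8)                  # the same pair delayed by 0.6 ms in noise
print(latency_shifts(quiet, noise, times_ms, [("42", 42.0, "pos"), ("43", 43.0, "neg")]))
```

A fixed window can miss peaks that shift by more than its half-width, which is one reason visual picking by blinded raters, as used here, remains common for these responses.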
Statistical analyses
In experiment 1, we compared ABRs in the quiet and babble conditions using a two-way mixed-model multivariate analysis of covariance (ANCOVA) in SPSS, with group (top SIN vs bottom SIN) as the between-subjects factor and condition (quiet vs noise) as the within-subjects factor. Reading scores were entered as a covariate to control for reading ability, given that previous studies have demonstrated SIN deficits in children with learning impairments (Bradlow et al., 2003; Ziegler et al., 2009). Positive peaks at ∼32, 42, and 52 ms and negative peaks at ∼34, 43, and 53 ms served as dependent variables in the analysis of the transition region of the response. The positive and negative peaks from 60 to 180 ms served as dependent variables in the analysis of the steady-state region.
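Because each factor has two levels, the group × condition interaction for a single peak reduces to comparing the two groups' quiet-to-noise latency shifts: without covariates, the interaction F(1, N − 2) equals t² from a pooled-variance independent-samples t test on the shift scores. The sketch below illustrates only that univariate, covariate-free simplification with hypothetical latencies; it is not the multivariate ANCOVA run in SPSS, which analyzed the six transition peaks jointly and adjusted for a covariate. The function name and data are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def interaction_from_shifts(quiet_a, noise_a, quiet_b, noise_b):
    """Group x condition interaction for one peak in a 2 x 2 mixed design, no covariates.

    Computed as a pooled-variance t test on noise-minus-quiet shifts;
    returns (F = t**2, error df = n_a + n_b - 2, two-sided p).
    """
    shift_a = np.asarray(noise_a, float) - np.asarray(quiet_a, float)
    shift_b = np.asarray(noise_b, float) - np.asarray(quiet_b, float)
    t, p = stats.ttest_ind(shift_b, shift_a, equal_var=True)
    return float(t) ** 2, shift_a.size + shift_b.size - 2, float(p)


rng = np.random.default_rng(0)
quiet_top = rng.normal(42.0, 0.4, 30)                  # hypothetical peak-42 latencies (ms)
noise_top = quiet_top + rng.normal(0.3, 0.3, 30)       # hypothetical small noise-induced shift
quiet_bottom = rng.normal(42.0, 0.4, 36)
noise_bottom = quiet_bottom + rng.normal(0.7, 0.3, 36) # hypothetical larger shift
F, df_err, p = interaction_from_shifts(quiet_top, noise_top, quiet_bottom, noise_bottom)
print(f"F(1,{df_err}) = {F:.2f}, p = {p:.4g}")
```

Adding a covariate such as reading score would turn this into a regression of the shift scores on group and the covariate rather than a plain t test.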
In experiment 2, we compared ABRs in the quiet and babble conditions using a two-way mixed-model multivariate ANCOVA in SPSS, with group (top readers vs bottom readers) as the between-subjects factor and condition (quiet vs noise) as the within-subjects factor. HINT scores were entered as a covariate to control for SIN performance. A Pearson's correlation between HINT-Front and TOWRE-T scores was calculated for the entire sample (N = 66).
Results
Across all children, background noise significantly delayed the brainstem response (Fig. 1); however, children in the bottom SIN and bottom reading groups had greater delays in the transition region than children in the corresponding top groups. Means and SDs for each peak pair in the transition region and for the first two peak pairs in the steady state are provided in Table 1.
Greater timing delays in poor SIN performers (experiment 1)
A two-way mixed-model ANCOVA including the six transition peaks revealed a main effect of condition (F(6,58) = 14.984; p < 0.001), indicating that noise had the expected overall effect of prolonging neural responses in both groups (Fig. 2). There was also a significant SIN group × condition interaction (F(6,58) = 3.288; p = 0.007). Post hoc analyses indicated significant group × condition interactions for peaks 42 (F(1,63) = 7.879; p = 0.007) and 43 (F(1,63) = 11.157; p = 0.001). Figure 2 (inset) illustrates this interaction for peak 42: peak latencies are essentially equivalent between the groups in quiet, but in noise the bottom SIN group is significantly delayed relative to the top SIN group.
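A p value for an F test is the upper-tail probability of the F distribution at the stated degrees of freedom. As a sketch of that relationship, the snippet below recomputes p from the F values and degrees of freedom reported above for experiment 1, using `scipy.stats.f.sf`; because the reported F values are rounded, the recomputed values can differ from the reported ones in the last digit.

```python
from scipy import stats

# (description, F, numerator df, denominator df) as reported for experiment 1
REPORTED = [
    ("condition, six transition peaks", 14.984, 6, 58),
    ("SIN group x condition", 3.288, 6, 58),
    ("peak 42 group x condition", 7.879, 1, 63),
    ("peak 43 group x condition", 11.157, 1, 63),
]

for label, f_value, df1, df2 in REPORTED:
    p = stats.f.sf(f_value, df1, df2)  # upper-tail probability of F(df1, df2)
    print(f"{label}: F({df1},{df2}) = {f_value} -> p = {p:.4f}")
```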
A two-way mixed-model ANOVA on the peaks in the steady-state portion of the response (60–180 ms) revealed no significant difference between the SIN groups in the quiet-to-noise latency shifts (F(24,41) = 1.109; p = 0.400). Group differences were thus restricted to the formant transition in noise. Nor did the group differences reflect differences in the overall magnitude of neural activity: independent t tests calculated over the entire response (0–190 ms) showed no SIN group differences in response SNR (p = 0.889) or RMS amplitude (p = 0.357) in the quiet condition, or in SNR (p = 0.504) or RMS amplitude (p = 0.769) in the noise condition.
Greater timing delays in poor readers (experiment 2)
A two-way mixed-model ANCOVA including the six transition peaks revealed a significant main effect of condition (F(6,58) = 10.611; p < 0.001) as well as a significant main effect of group (F(6,58) = 2.320; p = 0.048), with the bottom reading group showing greater peak timing delays than the top reading group (Fig. 3). Post hoc analyses indicated a significant group × condition interaction for peak 52 (F(1,63) = 4.959; p = 0.030) (Fig. 3, inset). In general, the poor readers had greater neural delays in noise. Across all children, HINT-Front and TOWRE-T scores were weakly but significantly correlated (r = 0.277; p = 0.024).
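The significance of a Pearson correlation follows from the t statistic t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The sketch below applies that standard formula to the reported r = 0.277 for the full sample of 66 children; the helper name is hypothetical.

```python
import math

from scipy import stats


def pearson_p(r, n):
    """Two-tailed p value for a Pearson correlation r from n pairs (t test, df = n - 2)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, 2.0 * stats.t.sf(abs(t), df)


t, p = pearson_p(0.277, 66)
print(f"t(64) = {t:.2f}, p = {p:.3f}, r^2 = {0.277 ** 2:.3f}")
```

The squared correlation (about 0.077) indicates that the two measures share under 8% of their variance, consistent with describing the relationship as weak.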
A two-way mixed-model ANOVA on the peaks in the steady-state portion of the response (60–180 ms) revealed no significant effect of group (F(24,30) = 0.921; p = 0.577). The reading group differences were not attributable to differences in response SNR (p = 0.147) or RMS amplitude (p = 0.178) in the quiet condition, or in SNR (p = 0.327) or RMS amplitude (p = 0.593) in the noise condition, and therefore did not reflect differences in overall neural magnitude.
Discussion
To summarize, in comparing brainstem responses between top and bottom SIN perceivers and between top and bottom readers, we found greater noise-induced timing delays in both bottom groups. As predicted, these delays occurred in the response to the formant transition of the evoking syllable, its most perceptually vulnerable segment (Tallal and Stark, 1981). These results are consistent with our prediction that deficient SIN perception and reading are associated with decreased neural synchrony, leading to impaired processing of timing information in noise.
Our finding of greater noise-induced delays in bottom SIN perceivers supports the importance of temporal resolution in perception (Benasich and Tallal, 2002; Tremblay et al., 2002; Hornickel et al., 2009). The role of temporal resolution was also recently demonstrated in a study comparing brainstem timing in musicians and nonmusicians (Parbery-Clark et al., 2009a). Musicians, who show a perceptual advantage for SIN perception (Parbery-Clark et al., 2009b), also have more robust brainstem timing in background noise than nonmusicians. Temporal information is important for object identification and subsequent sound segregation (Shinn-Cunningham and Best, 2008), and our results suggest that disproportionate noise-induced neural delays may impede the listener's ability to extract the desired signal from background noise, interfering with stream segregation at brainstem and cortical levels and ultimately leading to poorer SIN perception.
The children whose responses showed the greatest noise-induced decreases in temporal resolution may be exhibiting the auditory analog of a noise exclusion deficit (Sperling et al., 2005). Sperling et al. showed that dyslexic and nondyslexic children performed differently on a visual task only when the visual stimulus was embedded in noise. Similarly, we found that children with poor SIN perception or poor reading had timing delays in the noise condition but not in quiet, suggesting that a noise exclusion deficit may also be present in the auditory system. The finding of noise-induced neural timing delays in children with either poor SIN perception or poor reading raises the possibility of a common mechanism, such as a noise exclusion deficit, contributing to impairments in both populations.
The cause of these brainstem timing deficits is still a matter of debate. Computational models have placed the loci of SIN deficits at lower levels of the auditory system, including the brainstem, auditory nerve, and cochlea (Shamma and Klein, 2000; Carney et al., 2002). Suprathreshold deficits in cochlear temporal processing may be responsible for downstream deficits in the brainstem. Cochlear damage has also been found to degrade neural phase-locking cues in regions of audiometrically normal hearing (Lorenzi et al., 2006, 2009). However, recent work suggests that lifelong experience may result in top–down modulation of brainstem responses in noise (Parbery-Clark et al., 2009a). Together, this work suggests that an interplay of top–down and bottom–up processing may be needed to overcome the deleterious effects of challenging listening conditions.
These findings have implications for auditory training and other forms of intervention. Previous work has demonstrated improvements in brainstem activity after short-term auditory training in both children (Russo et al., 2005) and adults (de Boer and Thornton, 2008; Song et al., 2008), as well as improvements in cortical responses in children (Warrier et al., 2004) and adults (Tremblay et al., 2002). A number of commercially available adaptive auditory training programs (Tallal, 2004; Henderson Sabes and Sweetow, 2007; Smith et al., 2009) use exaggerated temporal cues to facilitate learning. Our results suggest that temporal training may aid SIN perception. Recent work has also shown that musical training is associated with an enhanced ability to hear speech in background noise (Parbery-Clark et al., 2009a; Chandrasekaran and Kraus, 2010b). Future and ongoing work will help determine which aspects of auditory training best enhance spectrotemporal representation and identify objective measures of training efficacy and predictors of success.
Footnotes
-
This work was supported by National Institutes of Health Grant R01 DC01510 and The Hugh Knowles Center of Northwestern University. We thank Trent Nicol, Alexandra Parbery-Clark, and Jane Hornickel for their helpful comments and suggestions regarding this manuscript and Alexandra Parbery-Clark for her help with data analysis. We especially thank the children and their families who participated in the study.
- Correspondence should be addressed to Dr. Nina Kraus, 2240 Campus Drive, Evanston, IL 60208. nkraus{at}northwestern.edu