Older adults frequently report they can hear what is said but cannot understand the meaning, especially in noise. This difficulty may arise from the inability to process rapidly changing elements of speech. Aging is accompanied by a general slowing of neural processing and decreased neural inhibition, both of which likely interfere with temporal processing in auditory and other sensory domains. Age-related reductions in inhibitory neurotransmitter levels and delayed neural recovery can contribute to decreases in the temporal precision of the auditory system. Decreased precision may lead to neural timing delays, reductions in neural response magnitude, and a disadvantage in processing the rapid acoustic changes in speech. The auditory brainstem response (ABR), a scalp-recorded electrical potential, is known for its ability to capture precise neural synchrony within subcortical auditory nuclei; therefore, we hypothesized that a loss of temporal precision results in subcortical timing delays and decreases in response consistency and magnitude. To assess this hypothesis, we recorded ABRs to the speech syllable /da/ in normal hearing younger (18–30 years old) and older (60–67 years old) adult humans. Older adults had delayed ABRs, especially in response to the rapidly changing formant transition, and greater response variability. We also found that older adults had decreased phase locking and smaller response magnitudes than younger adults. Together, our results support the theory that older adults have a loss of temporal precision in the subcortical encoding of sound, which may account, at least in part, for their difficulties with speech perception.
As the aging population grows, the communication problems experienced in later stages of life are becoming more prevalent. Older adults often report that they can hear what their communication partner is saying but that they cannot understand the meaning. In other words, the problem is not simply one of audibility but of clarity. Accurate speech discrimination requires precise temporal processing, an ability that is compromised in older adults compared with younger adults. Speaking at slower rates improves intelligibility for older adults (Craig, 1992), supporting the idea that overall age-related slowing contributes to problems with speech perception (Wingfield et al., 1999; Tremblay et al., 2002). Neurophysiological evidence of delayed neural timing and decreased temporal processing ability with age has been found in animals (Walton et al., 1998; Burkard and Sims, 2001; Finlayson, 2002; Recanzone et al., 2011) and humans (Burkard and Sims, 2001; Caspary et al., 2005; Lister et al., 2011; Vander Werff and Burns, 2011; Wang et al., 2011; Konrad-Martin et al., 2012; Parbery-Clark et al., 2012). For example, Burkard and Sims (2001) found prolonged latencies in the auditory brainstem response (ABR) to clicks in older relative to younger adults but only at fast stimulation rates, affirming the idea that temporal processing of rapid stimuli is compromised in older adults. What remains unknown is the effect of aging on subcortical processing of specific speech cues, such as timing, frequency, and harmonics, in older participants who are relatively free of hearing loss.
Older adults' neural responses may be affected by decreased levels of inhibitory neurotransmitters, a consistent finding in the dorsal cochlear nuclei (Caspary et al., 2005; Wang et al., 2009), inferior colliculi (ICs) (Caspary et al., 1995), and auditory cortices (de Villers-Sidani et al., 2010; Hughes et al., 2010; Juarez-Salinas et al., 2010) of aging animals. Speech is temporally dynamic (Liederman et al., 2005); accurate processing of the rapidly changing aspects of speech depends in part on the sharpening of neural responses through inhibitory mechanisms (Walton et al., 1998; Caspary et al., 2002, 2008). The synchronous neural firing of the auditory brainstem results in a precise temporal representation of incoming sound (Kraus et al., 2000), but this precision may be degraded by decreased inhibition or other factors associated with aging, such as variability in neural firing (Turner et al., 2005; Yang et al., 2009) or temporal jitter (Pichora-Fuller et al., 2007). Indeed, temporal jitter or lack of neural response consistency is thought to contribute to impairments in binaural processing (Pichora-Fuller and Schneider, 1992) and speech perception deficits in older adults (Pichora-Fuller et al., 2007).
We hypothesized that the consequences of reductions in inhibitory neurotransmitters in older adults include the following: (1) delayed neural timing (i.e., delayed peak latencies); (2) greater variability in neural firing, reflected in decreased intertrial response consistency; and (3) decreased phase locking and spectral magnitudes. Here, we compared subcortical processing of a speech syllable in younger and older adults using the ABR to complex sounds (cABR), a method that has proven useful in older adults and other populations (Anderson et al., 2011; Vander Werff and Burns, 2011). Establishing the biological consequences of age-related neural slowing on subcortical speech encoding will help elucidate inferred mechanisms underlying impaired speech-in-noise (SIN) perception in older adults.
Materials and Methods
Participants comprised 17 young adults (18–30 years old; mean ± SD, 23.18 ± 3.52 years; four males) and 17 older adults (60–67 years old; mean ± SD, 62 ± 2.24 years; three males) recruited from the Chicago area. All participants had clinically normal hearing, defined as follows: (1) air conduction thresholds ≤20 dB HL from 125 to 4000 Hz bilaterally (≤25 dB HL at 8000 Hz); (2) air-bone gaps ≤10 dB HL; and (3) no interaural asymmetry (defined as a ≥15 dB HL difference at two or more frequencies). Additionally, all participants had normal click-evoked brainstem response latencies (wave V < 6.8 ms), measured with a 100 μs click stimulus presented at 80 dB SPL at a rate of 31.4 Hz. No participants reported a history of neurologic conditions. Because extensive musical training and bilingualism can affect auditory brainstem processing across the lifespan (Parbery-Clark et al., 2009; Bidelman and Krishnan, 2010; Krizman et al., 2012; Parbery-Clark et al., 2012; for review, see Kraus and Chandrasekaran, 2010), anyone who reported a history of musical training (>3 years) or was a non-native English speaker was excluded.
Participants had normal IQ scores [young adults, ≥85 on the Test of Nonverbal Intelligence (Brown et al., 1997); older adults, ≥85 on the Wechsler Abbreviated Scale of Intelligence (Zhu and Garcia, 1999)]. Different IQ screening tests were used for these age groups because we combined datasets from two studies for the aging analysis. In addition, the older adults were screened for dementia using a cutoff score of 22/30 on the Montreal Cognitive Assessment (MOCA) (Nasreddine et al., 2005). Groups were sex matched (Fisher's exact test, p = 0.656) and matched audiometrically through 2000 Hz (t tests, all p > 0.1). Although there were significant group differences for hearing thresholds at 4000 and 8000 Hz (p < 0.001), the difference at 4000 Hz was within the margin of error of the audiogram (∼5 dB HL), and the difference at 8000 Hz was ∼12 dB HL. See Table 1 for participant characteristics in each group and Figure 1 for average audiometric thresholds and click latencies. All procedures were reviewed and approved by the Institutional Review Board of Northwestern University. Participants gave informed consent and were paid for their time.
Stimulus and recording.
A 170 ms speech syllable, /da/, was synthesized at a 20 kHz sampling rate with a Klatt-based synthesizer (Klatt, 1980). After an initial 5 ms stop burst, voicing remained constant with a fundamental frequency (F0) of 100 Hz. During the 50 ms transition from the /d/ to the /a/, the lower three formants shifted (F1, 400 → 720 Hz; F2, 1700 → 1240 Hz; F3, 2580 → 2500 Hz) but stabilized for the 120 ms steady-state vowel portion. The fourth through sixth formants (F4–F6) remained constant over 170 ms at 3300, 3750, and 4900 Hz, respectively. The /da/ was chosen because it combines a transient (the /d/) and a periodic (the /a/) segment, two acoustic features that have been extensively studied in cABRs (Skoe and Kraus, 2010). Additionally, stop consonants pose challenges to both young and old listeners (Miller and Nicely, 1955) and thus provide a more ecologically valid stimulus than other complex sounds, such as tones. Finally, the /da/ has been shown to elicit a robust and replicable cABR across the lifespan (Hornickel et al., 2009; Parbery-Clark et al., 2012). A waveform of the /da/ is presented in Figure 2A, along with average responses in younger and older adults (Fig. 2C). A spectrogram is shown in Figure 2B. The /da/ was presented binaurally using Neuroscan Stim2 (Compumedics) with alternating polarities at 80 dB SPL at a rate of 3.95 Hz through electromagnetically shielded insert earphones (ER-3; Etymotic Research), which were used to reduce stimulus and noise artifact. Subcortical responses were digitized at 20 kHz. A vertical montage of four Ag–AgCl electrodes (Cz active, forehead ground, earlobe references) was used with all impedances <5 kΩ. Continuous responses were recorded with Neuroscan Acquire 4.3. During the recording session (∼28 min), participants sat in a recliner and watched a silent, captioned film of their choice to facilitate a relaxed yet wakeful state.
Six thousand artifact-free sweeps were recorded from each participant.
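The stimulus parameters above can be cross-checked with a short Python sketch. It is illustrative only: the helper names are hypothetical, and it assumes that F1–F3 move linearly across the 50 ms transition, since the Klatt parameter tracks themselves are not given here. It also shows which harmonics of the 100 Hz F0 fall inside the F1 range and that one stimulus period at 3.95 Hz (≈253 ms) matches the −40 to 213 ms epoch window used for averaging.

```python
import numpy as np

FS = 20_000        # synthesis and recording sampling rate (Hz), from the text
DUR_MS = 170.0     # syllable duration (ms)
TRANS_MS = 50.0    # formant transition duration (ms)
F0 = 100.0         # fundamental frequency (Hz)

# Start and end frequencies (Hz) of the three moving formants, from the text.
FORMANTS = {"F1": (400.0, 720.0), "F2": (1700.0, 1240.0), "F3": (2580.0, 2500.0)}


def formant_tracks():
    """Time axis (ms) and F1-F3 tracks, ASSUMING linear movement over the transition."""
    t_ms = np.arange(int(FS * DUR_MS / 1000)) / FS * 1000.0
    frac = np.clip(t_ms / TRANS_MS, 0.0, 1.0)  # 0 at onset, 1 from 50 ms onward
    tracks = {name: start + (end - start) * frac for name, (start, end) in FORMANTS.items()}
    return t_ms, tracks


def harmonics_in_band(lo_hz, hi_hz, f0=F0):
    """Harmonic numbers n whose frequency n * f0 lies within [lo_hz, hi_hz]."""
    return [n for n in range(1, int(hi_hz // f0) + 1) if lo_hz <= n * f0 <= hi_hz]


t_ms, tracks = formant_tracks()
print(harmonics_in_band(400, 720))     # [4, 5, 6, 7]
print(round(1000 / 3.95, 1))           # 253.2 ms between stimulus onsets
print(round(6000 / 3.95 / 60, 1))      # 25.3 min of presentation for 6000 sweeps
```

The harmonics H4–H7 are the ones examined later for first-formant encoding, and ≈25 min of presentation for 6000 sweeps is consistent with the ∼28 min recording session.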
Responses were digitally bandpass filtered offline from 70 to 2000 Hz using Neuroscan Edit. This passband attenuates slower cortical activity while maximizing signal-to-noise ratio (SNR) and the detection of transient peaks (such as the onset). A Butterworth filter with a 12 dB/octave rolloff and zero phase shift was applied, and responses were then epoched using a −40 to 213 ms time window referenced to the stimulus onset. Any sweep with amplitude greater than ±35 μV was considered artifact and rejected before averaging. Final averages comprised 6000 sweeps (3000 of each polarity). Two final average responses were created for subsequent analysis. In the first, the two polarities were added to minimize the influence of cochlear microphonic and stimulus artifact on the response (Gorga et al., 1985; Campbell et al., 2012) and to maximize the envelope response. In the second, the two polarities were subtracted to enhance the temporal fine structure of the formant-related harmonics (Aiken and Picton, 2008; Hornickel et al., 2012). Added responses were used to analyze latency, amplitude, frequency representation of the fundamental frequency and lower harmonics, phase-locking factor, and response consistency; subtracted responses were used to analyze the frequency representation of the first formant (400–720 Hz). Before analysis, responses were baseline corrected relative to the prestimulus period.
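The preprocessing chain can be sketched with SciPy as follows. This is not the Neuroscan Edit implementation: the function names are hypothetical, the sketch filters each epoch rather than the continuous recording for brevity, and the "first-order design run through filtfilt" is an assumed way of obtaining zero phase shift with roughly 12 dB/octave skirts. The added average keeps polarity-invariant (envelope) activity, whereas the subtracted average keeps only activity that inverts with stimulus polarity.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 20_000  # sampling rate (Hz)


def bandpass_zero_phase(x, lo_hz=70.0, hi_hz=2000.0, fs=FS):
    """Zero-phase Butterworth band-pass along the last axis.

    A first-order design applied forward and backward (filtfilt) gives roughly
    12 dB/octave skirts; this approximates, and is not, the Neuroscan Edit filter.
    """
    b, a = butter(1, [lo_hz, hi_hz], btype="bandpass", fs=fs)
    return filtfilt(b, a, x, axis=-1)


def baseline(sweeps, t_ms):
    """Subtract the mean prestimulus (t < 0 ms) amplitude along the last axis."""
    return sweeps - sweeps[..., t_ms < 0].mean(axis=-1, keepdims=True)


def polarity_averages(sweeps, polarity, reject_uv=35.0):
    """Reject sweeps exceeding +/- reject_uv (microvolts), then average by polarity.

    sweeps: (n_sweeps, n_samples); polarity: +1 or -1 for each sweep.
    Returns (added, subtracted) averages.
    """
    keep = np.max(np.abs(sweeps), axis=1) <= reject_uv
    pos = sweeps[keep & (polarity > 0)].mean(axis=0)
    neg = sweeps[keep & (polarity < 0)].mean(axis=0)
    added = (pos + neg) / 2.0       # cancels components that invert with stimulus polarity
    subtracted = (pos - neg) / 2.0  # keeps the polarity-inverting part (fine structure, CM, artifact)
    return added, subtracted
```

Averaging each polarity separately before combining them keeps the two halves equally weighted even if slightly more sweeps of one polarity survive rejection.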
To analyze the effects of age on neural timing, we manually identified peaks in the subcortical responses. The identification provides the latency and amplitude of each peak. Peaks were labeled according to a reference latency (e.g., a peak occurring ∼33–34 ms after onset would be called “peak 33”; see Fig. 3). The onset peak was identified as peak 9, transition peaks were 33, 43, 53, and 63, and steady-state peaks were 73, 83, 93, …, 163. Note that we included the 63 ms peak in the transition analysis, because we observed that the periodicity in the older adults' responses did not become stable until 70 ms. Two trained peak pickers, blind to participant group, identified each major peak of interest in the onset, transition, and steady-state portions of the response. An additional trained peak picker confirmed each peak identification. We also identified peaks for waves I, III, and V of click-evoked responses.
We computed a measure of response consistency over the length of the recording period. Filtered, epoched, and baselined responses were reaveraged offline in MATLAB (MathWorks). Two “paired” averages from an individual response were computed. Each average comprised 3000 randomly selected non-overlapping sweeps from the response (for example, if one average had sweeps 1, 2, 4, 7, …, and 6000, its companion average would have sweeps 3, 5, 6, 8, …, and 5999). These two averages were correlated to compute a Pearson's correlation coefficient. This process was performed 300 times for each individual response, each time selecting random subsets of sweeps; the mean correlation of these 300 sets of paired averages was computed. Pearson's r scores were Fisher transformed to z′-scores for statistical purposes (Cohen and Cohen, 1975). Correlations were performed over three time ranges corresponding to the entire response (5–180 ms), the transition (20–60 ms), and the steady state (60–170 ms).
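A minimal NumPy sketch of this split-half procedure is shown below. It is a Python analog written for illustration, not the MATLAB code used in the study; the function name, the fixed random seed, and the inclusive window edges are assumptions. Each iteration draws a fresh random permutation of sweep indices, so the two halves never share a sweep.

```python
import numpy as np


def response_consistency(sweeps, t_ms, window_ms=(5.0, 180.0), n_iter=300, seed=0):
    """Split-half response consistency.

    sweeps: (n_sweeps, n_samples) filtered, epoched, baselined sweeps.
    Each iteration randomly splits the sweeps into two non-overlapping halves,
    averages each half, and correlates the two averages within window_ms.
    Returns the mean Pearson r over iterations and its Fisher z' transform.
    """
    rng = np.random.default_rng(seed)
    in_win = (t_ms >= window_ms[0]) & (t_ms <= window_ms[1])
    half = sweeps.shape[0] // 2
    rs = np.empty(n_iter)
    for i in range(n_iter):
        order = rng.permutation(sweeps.shape[0])
        avg_a = sweeps[order[:half]].mean(axis=0)[in_win]
        avg_b = sweeps[order[half:]].mean(axis=0)[in_win]
        rs[i] = np.corrcoef(avg_a, avg_b)[0, 1]
    r_mean = rs.mean()
    return r_mean, np.arctanh(r_mean)  # Fisher z' = arctanh(r)
```

For the transition and steady state, the same call would use `window_ms=(20, 60)` and `window_ms=(60, 170)`, respectively.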
To obtain a measure of phase-locking ability, we performed a wavelet analysis in MATLAB. This provides a measure of trial-to-trial phase coherence (“phase-locking factor”) of encoding of the speech syllable within a specified time × frequency range of the response (Fell, 2007). The analysis procedure is similar to the wavelet analysis described by Tallon-Baudry et al. (1996). A complex Morlet (Gaussian-localized) wavelet function was convolved with the response at each time point to provide an amplitude (arbitrary units) and phase (radians) value for each time × frequency bin in the response. The amplitude corresponds to the energy at that point in the signal and the phase to the direction of a momentary deflection in the EEG signal. The amplitude was normalized to 1 to weight each trial equally, regardless of its magnitude (i.e., voltage). The resulting unit vectors were averaged across all 6000 trials of the /da/ to compute a complex value that represents the phase distribution at each time × frequency point; the modulus (i.e., absolute magnitude) of this value is the phase-locking factor. A phase-locking factor of 1 indicates identical phase in every trial, whereas a phase-locking factor of 0 indicates no phase coherence across trials. In practice, the range of phase-locking factors is small (up to ∼0.15–0.20) because a single trial of a far-field brainstem recording has a relatively poor SNR and contains multiple aberrant oscillatory deflections (cf. Ruggles et al., 2011, 2012). Phase-locking factors were calculated across the frequency range of 70–800 Hz for the entire response (5–180 ms), the transition (20–60 ms), and the steady state (60–170 ms). Phase-locking factors were also computed for restricted frequency bins corresponding to the F0 (100 Hz) and its integer harmonics (200–700 Hz) across the same three time ranges. Additionally, a three-dimensional plot was generated (see Fig. 5) with a color spectrum representing phase-locking power (warmer colors indicating stronger phase locking).
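The phase-locking factor computation can be sketched in Python as follows. This is not the MATLAB code used in the study: the wavelet width (`n_cycles = 7`), its ±3.5 SD support, and the function names are assumptions chosen for illustration. Each trial is convolved with a complex Morlet wavelet, every coefficient is scaled to unit magnitude so that only phase remains, and the modulus of the across-trial mean is taken at each time × frequency point.

```python
import numpy as np


def morlet(freq, fs, n_cycles=7.0):
    """Complex Morlet wavelet at `freq` Hz; n_cycles (an assumed value) sets its width."""
    sigma_t = n_cycles / (2 * np.pi * freq)
    t = np.arange(-3.5 * sigma_t, 3.5 * sigma_t, 1.0 / fs)
    w = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    return w / np.sum(np.abs(w))


def phase_locking_factor(sweeps, fs, freqs, n_cycles=7.0):
    """Phase-locking factor for each frequency and time point.

    sweeps: (n_trials, n_samples); each trial must be longer than the wavelet,
    because np.convolve(mode="same") returns max(len(trial), len(wavelet)) samples.
    Returns an array of shape (len(freqs), n_samples) with values in [0, 1].
    """
    plf = np.empty((len(freqs), sweeps.shape[1]))
    for k, f in enumerate(freqs):
        w = morlet(f, fs, n_cycles)
        coeffs = np.array([np.convolve(trial, w, mode="same") for trial in sweeps])
        unit = coeffs / np.maximum(np.abs(coeffs), 1e-12)  # discard amplitude, keep phase
        plf[k] = np.abs(unit.mean(axis=0))                 # modulus of the across-trial mean
    return plf
```

Averaging the rows for 70–800 Hz and the columns for 20–60 ms would then give a single broadband transition value.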
Response and spectral magnitudes.
Root mean square (RMS) amplitude was used to objectively quantify the overall magnitude of response and prestimulus (i.e., nonresponse) activity. RMS amplitudes were computed for the prestimulus period (−40 to 0 ms), the entire response (5–180 ms), the transition (20–60 ms), and the steady state (60–170 ms).
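The RMS windows above map directly onto array slices. A minimal sketch, assuming the averaged response and its time axis (in ms) are NumPy arrays; the function name and the inclusive window edges are assumptions.

```python
import numpy as np

# Analysis windows (ms re: stimulus onset), from the text.
WINDOWS_MS = {
    "prestimulus": (-40.0, 0.0),
    "entire": (5.0, 180.0),
    "transition": (20.0, 60.0),
    "steady_state": (60.0, 170.0),
}


def rms_by_window(avg, t_ms, windows=WINDOWS_MS):
    """RMS amplitude of an averaged response within each time window."""
    out = {}
    for name, (lo, hi) in windows.items():
        seg = avg[(t_ms >= lo) & (t_ms <= hi)]
        out[name] = float(np.sqrt(np.mean(seg ** 2)))
    return out
```

The prestimulus value is the quantity used as a covariate when comparing response RMS amplitudes between groups.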
Average spectral amplitudes were calculated from each response; for each time region, zero padding was applied before a Fourier analysis to increase the resolution of the spectral display to 1 Hz/point. For statistical analyses, average amplitudes were computed over 20 Hz bins around the frequencies of interest, which included the fundamental frequency (F0) and its integer harmonics up to 1000 Hz (H2–H10). This procedure was performed over the entire response, the transition, and the steady state for added and subtracted polarity averages.
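The spectral procedure can be sketched with NumPy's real FFT. Zero-padding each segment to as many points as the sampling rate yields 1 Hz spacing between FFT bins (interpolating the spectrum rather than adding information), and the amplitude is then averaged over a 20 Hz bin centered on each harmonic. The function name, the one-sided amplitude scaling, and the absence of a taper are assumptions, since the text does not specify them.

```python
import numpy as np


def harmonic_amplitudes(avg, t_ms, window_ms, fs=20_000, f0=100.0, n_harm=10, bin_hz=20.0):
    """Mean spectral amplitude in bin_hz-wide bins centered on F0 and H2..H{n_harm}.

    The segment is zero-padded to fs points so FFT bins are 1 Hz apart
    (segments must therefore be shorter than 1 s). No taper is applied.
    """
    seg = avg[(t_ms >= window_ms[0]) & (t_ms <= window_ms[1])]
    n_fft = int(fs)                                            # 1 Hz per FFT bin
    spec = 2.0 * np.abs(np.fft.rfft(seg, n=n_fft)) / seg.size  # one-sided amplitude
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    out = {}
    for h in range(1, n_harm + 1):
        center = h * f0
        in_bin = (freqs >= center - bin_hz / 2) & (freqs <= center + bin_hz / 2)
        out["F0" if h == 1 else f"H{h}"] = float(spec[in_bin].mean())
    return out
```

Applied to the subtracted average, the H4–H7 entries correspond to the first-formant range.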
All statistical analyses were conducted in SPSS version 18.0 (SPSS). Multivariate ANOVA was used for group comparisons (young vs old) of multiple peaks, multivariate analysis of covariance was used for RMS amplitudes, the F0, and its harmonics, and one-way ANOVAs were used for latencies of individual peaks, response consistency, and phase-locking factor. Levene's test was used to ensure homogeneity of variance for all measures, and Shapiro–Wilk tests were used to ensure that all variables were normally distributed. The assumption of homogeneity of variance was violated for the latency analysis; therefore, Pillai's trace, which is relatively robust to such violations, was used as a conservative test statistic. Bonferroni corrections for multiple comparisons were applied as appropriate; p values reflect two-tailed tests.
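For a single measure and two groups, the univariate steps above can be sketched with SciPy. This is an illustrative Python analog of the SPSS analysis, not a reproduction of it: the multivariate (Pillai's trace) tests and covariate adjustments are omitted, and the function name and Bonferroni handling are assumptions. With two groups, a one-way ANOVA is equivalent to a pooled-variance t test (F = t²), so F(1, N − 2) and t(N − 2) carry the same information.

```python
import numpy as np
from scipy import stats


def compare_groups(young, old, n_comparisons=1, alpha=0.05):
    """Univariate young-vs-old comparison of one measure.

    Checks homogeneity of variance (Levene) and normality (Shapiro-Wilk per
    group), runs a one-way ANOVA, and tests it against a Bonferroni-adjusted
    alpha. Multivariate (Pillai's trace) tests are not sketched here.
    """
    young, old = np.asarray(young, float), np.asarray(old, float)
    f_stat, p_anova = stats.f_oneway(young, old)
    t_stat, p_t = stats.ttest_ind(young, old)  # pooled-variance t test
    return {
        "levene_p": stats.levene(young, old)[1],
        "shapiro_p": (stats.shapiro(young)[1], stats.shapiro(old)[1]),
        "F": f_stat,
        "df": (1, young.size + old.size - 2),
        "p": p_anova,
        "t": t_stat,
        "p_t": p_t,
        "significant": p_anova < alpha / n_comparisons,  # Bonferroni
    }
```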
As predicted, younger adults had earlier peak latencies but only for the region of the response corresponding to the onset of the syllable (F(1,33) = 75.436, p < 0.001) and rapidly changing formant transition (overall: F(4,29) = 11.683, p < 0.001; peaks: 33 ms, F(1,33) = 13.597, p = 0.001; 43 ms, F(1,33) = 5.039, p = 0.038; 53 ms, F(1,33) = 7.626, p = 0.009; 63 ms, F(1,33) = 5.779, p = 0.022). In contrast, the response peaks corresponding to the unchanging steady-state region of the syllable (73–163 ms) were equivalent between groups (overall: F(10,23) = 1.506, p = 0.200; no individual peak differed, all p > 0.1). For a graphic representation of latency differences between groups, see Figure 3 and Table 2.
We also compared click-evoked ABRs to determine whether these latency differences were driven by peripheral changes in the auditory system. The groups were closely matched on wave I (t(33) = 1.069, p = 0.293) and wave III (t(33) = 1.603, p = 0.119); however, consistent with the /da/ results, evidence of delayed neural timing was seen in a trending difference in wave V latencies (t(33) = 1.792, p = 0.083). To ensure that these results were not driven by high-frequency hearing differences, we reran the analysis covarying for thresholds at 4000 and 8000 Hz and found that the trending difference in wave V latencies was still present (t(30) = 1.872, p = 0.064). See Table 1 for mean click latencies and Figure 1 for a comparison of average click-evoked waveforms for each group.
There were striking differences in response consistency between younger and older adults, with older adults having substantially less consistent responses. This was the case for the entire response (F(1,33) = 18.792, p < 0.001), the transition (F(1,33) = 12.167, p = 0.001), and the vowel (F(1,33) = 17.692, p < 0.001). To ensure that this effect was not driven by differences in levels of resting neural noise, we reran the analyses covarying for prestimulus RMS amplitude; the results were unchanged. Mean correlation values and SDs are reported in Table 3.
Younger adults had better neural phase locking to the stimulus. When the phase-locking factor was averaged across a broad frequency range (70–800 Hz), there were substantial differences in the transition (F(1,33) = 16.252, p < 0.001), the steady state (F(1,33) = 16.287, p < 0.001), and the entire response (F(1,33) = 18.468, p < 0.001; Table 4). Average phase-locking factor is displayed for each group in Figure 4.
Response magnitude: time and frequency domains
In the time domain, younger adults had greater RMS amplitudes than older adults for regions of the cABR corresponding to both the transition and the steady state of the stimulus (transition: F(1,32) = 5.694, p = 0.023; steady state: F(1,32) = 8.947, p = 0.005; entire: F(1,32) = 8.511, p = 0.007), when covarying for prestimulus RMS amplitude. Although the response regions were smaller in older adults, their prestimulus amplitude was larger (t(33) = 3.295, p = 0.002) (Fig. 5A). The more robust response magnitudes in young adults were also reflected in the frequency domain as greater representation of the fundamental frequency (F0) and lower harmonics in the added responses calculated over the entire response (overall: F(3,30) = 3.434, p = 0.030; F0: F(1,33) = 3.881, p = 0.058; H2: F(1,33) = 7.458, p = 0.010; H3: F(1,33) = 10.548, p = 0.003). However, these effects were weaker when the transition (F0: F(1,33) = 2.818, p = 0.103; H2: F(1,33) = 5.421, p = 0.043; H3: F(1,33) = 4.434, p = 0.043) and steady state (F0: F(1,33) = 0.158, p = 0.694; H2: F(1,33) = 5.235, p = 0.029; H3: F(1,33) = 2.025, p = 0.164) were analyzed separately.
Spectral magnitudes were also calculated from the subtracted-polarity averages to obtain an estimate of fine structure representation that is less influenced by the speech envelope (Aiken and Picton, 2008; Skoe and Kraus, 2010; Hornickel et al., 2012). Spectral magnitudes corresponding to the first formant (400–700 Hz) of the stimulus were smaller in older versus younger adults when covarying for RMS amplitude in the prestimulus period (overall: F(4,29) = 6.707, p = 0.001; H4: F(1,33) = 14.617, p = 0.001; H5: F(1,33) = 3.359, p = 0.076; H6: F(1,33) = 16.994, p < 0.001; H7: F(1,33) = 3.225, p = 0.082). The comparison was also significant for the steady state (overall: F(4,29) = 5.737, p = 0.002; H4: F(1,33) = 10.372, p = 0.003; H5: F(1,33) = 4.003, p = 0.054; H6: F(1,33) = 15.889, p < 0.001; H7: F(1,33) = 1.619, p = 0.212) but was not for the transition (overall: F(4,29) = 1.148, p = 0.354; H4: F(1,33) = 1.690, p = 0.203; H5: F(1,33) = 2.201, p = 0.148; H6: F(1,33) = 1.234, p = 0.275; H7: F(1,33) = 3.760, p = 0.061). Average added and subtracted spectral magnitudes are displayed in Figure 5.
Repeat analyses excluding individuals with below normal scores on the MOCA
Five of 17 older participants had scores below 27 on the MOCA (scores: 24, 25, 25, 26, 26), which can indicate the possibility of mild cognitive impairment (Nasreddine et al., 2005). To ensure that the cABR differences were not driven by cognitive impairment, we repeated the analyses excluding these five participants. All effects held (entire response unless noted): (1) timing differences in the onset and transition (F(5,23) = 13.670, p < 0.001); (2) response consistency (F(1,31) = 31.11, p < 0.001); (3) phase locking (F(1,31) = 9.773, p = 0.004); (4) RMS amplitude (F(1,30) = 4.991, p = 0.013); (5) spectral magnitudes of the added responses (low frequencies; F(3,24) = 3.965, p = 0.020); and (6) spectral magnitudes of the subtracted responses (first formant; F(4,23) = 3.021, p = 0.039). We conclude, therefore, that the observed age-related declines were not driven by individuals who may be experiencing mild cognitive impairment.
Our results indicate delayed neural timing in the brainstem response to the onset and formant transition of a speech syllable in older adults compared with younger adults. This lack of neural precision was corroborated by decreased response consistency and phase-locking power. We found smaller RMS and spectral magnitudes in the responses of older adults. Our results replicate and extend previous findings of age-related latency and amplitude changes in response to a 40 ms stimulus encompassing only the onset and the transition (Vander Werff and Burns, 2011). The current results are also relatively free of the audiometric threshold confounds often found in comparisons of auditory performance in younger and older adults.
Using far-field electrophysiological techniques, we tested the inferred consequences of age-related decreases in inhibition on speech processing. Blocking inhibition in the IC reduces the selectivity required for discrimination of conspecific vocalizations in animal models (Klug et al., 2002), and GABAergic and glycinergic inhibition sharpens tuning for frequency modulations in the IC (Koch and Grothe, 1998). The IC is the putative generator of the cABR (Chandrasekaran and Kraus, 2010); therefore, we propose that a reduction in inhibition alters subcortical temporal processing of the spectrotemporally dynamic formant transition.
The rapidly changing formants make the response to the transition vulnerable to noise, in contrast to the unchanging steady-state vowel (Anderson et al., 2010). Therefore, timing delays in the transition region likely lead to speech perception difficulties. The delays were 1.5 ms for the onset and 0.4–0.8 ms for the transition. To put these results in context, interaural brainstem timing differences as small as 0.2 ms can be considered clinically significant when screening for cerebellopontine angle tumors (Grayeli et al., 2008), and differences on the order of 0.5–0.7 ms have separated good from poor perceivers of speech in noise (Anderson et al., 2010). Our finding that the timing delays were confined to the onset and transition, sparing the steady state, indicates that aging selectively affects the encoding of dynamic rather than static speech features. These selective timing differences may also arise from longer neural adaptation in older adults (Schneider and Hamstra, 1999). In the syllable /da/, the transition is much shorter than the steady state (50 vs 120 ms) and falls within the period during which the older adults' responses were still stabilizing (their periodicity did not become stable until ∼70 ms); a longer adaptation period would therefore delay the response to the transition while leaving the later steady-state response largely intact.
The lack of differences in click-evoked wave I and III latencies suggests that these deficits occur farther along the auditory pathway than the cochlea and auditory nerve. Evidence for age-related changes in brainstem structures is not new; amplitude reductions and lengthened wave I–V intervals in the click-evoked response were found in studies performed in the 1970s and 1980s (Fujikawa and Weber, 1977; Jerger and Hall, 1980; Maurizi et al., 1982).
Response consistency and neural synchrony
A decrease in the synchrony of neural firing may account for delayed neural timing and an overall decrease in representation of speech cues. We therefore assessed neural response consistency, expecting greater intertrial variability in responses in older adults. Indeed, older adults had less consistent brainstem responses than younger adults. Numerous studies of auditory perception have documented greater between-subject variability in older adults and have attributed this variability to differences in cognitive function (Harris et al., 2008; Rossi-Katz and Arehart, 2009; Arehart et al., 2011; Fogerty et al., 2010). Our analysis documents greater within-subject variability. Similar increases in response variability have been found in the primary visual cortex and middle temporal visual area in older monkeys (Yang et al., 2009) and the auditory cortex of older rats (Turner et al., 2005). These results support the idea raised by Pichora-Fuller et al. (2007) that temporal jitter in older adults contributes to difficulties with SIN perception. Additionally, response variability may have been a factor in studies showing decreased amplitudes (Parthasarathy et al., 2010) and a loss of temporal precision (Schatteman et al., 2008) in older versus younger animals. Indeed, Schatteman et al. (2008) conjectured that inhibitory neurotransmitters play a role in the synchronization of the brainstem to amplitude modulations, thus accounting for the loss of temporal resolution in older adults (Gordon-Salant and Fitzgibbons, 1993; Strouse et al., 1998; Schneider and Pichora-Fuller, 2001; Tremblay et al., 2002).
Variable neural firing in the auditory system leads to an imprecise representation of an auditory object, decreasing the listener's ability to selectively attend to a specific object from other auditory sources (Shinn-Cunningham and Best, 2008). When listening in noise, older adults draw on cognitive resources such as attention and memory more than younger adults (Wong et al., 2009), and cognitive demands increase reliance on perceptual cues (Kuo et al., 2011), activating efferent pathways (Nahum et al., 2008; Bajo et al., 2010; for review, see Kraus and Chandrasekaran, 2010). These findings, together with our results, suggest that older adults' perceptual impairments may arise in part from impaired subcortical representations that are modulated by cognitive influences. Perceptual problems may be exacerbated by age-related declines in higher-order cognitive functioning, reducing the compensatory gain afforded by cognitive influences (Davis et al., 2012).
The phase-locking analysis demonstrates deficits in the consistency of representation of stimulus phase. In the auditory cortex, phase tracking of speech envelopes is posited as a mechanism of speech intelligibility (Luo and Poeppel, 2007; Giraud and Poeppel, 2012). Our finding that older adults have lower phase locking suggests that this decline may be one mechanism contributing to poor SIN perception in older populations. Similar to the effects of decreased inhibition on encoding of rapidly changing temporal events, deterioration of phase locking may also arise from decreased inhibitory neurotransmitter levels (Koch and Grothe, 1998).
Response magnitude: time and frequency domains
In a previous study, we found that older adults with good SIN perception have larger RMS and F0 amplitudes than age- and hearing-matched adults with relatively poor SIN perception (Anderson et al., 2011); we conjectured that the older participants with poor SIN perception had experienced greater effects of biological aging, although they were matched for chronological age. Here, we directly tested that conjecture by comparing older and younger adults. We confirmed that aging affects RMS and F0 amplitudes, reflecting decreased encoding of the speech envelope. Our results are consistent with previous findings indicating aging effects on phase coherence and amplitudes in brainstem responses to tone bursts (Clinard et al., 2010). In addition, the older adults' deficits in response to the stimulus fine structure provide a neural basis for the finding of age-related perceptual performance deficits in the use of fine structure cues (Vongpaisal and Pichora-Fuller, 2007; Grose and Mamo, 2010; Ben-David et al., 2011; Hopkins and Moore, 2011; Russo et al., 2012).
We found greater neural noise levels in our older adults during the prestimulus period. Similar results were found for click ABRs (Spivak and Malinoff, 1990): older adults had greater low-frequency background noise in their responses. This increased noise may be attributable to a number of factors, including exogenous factors, such as greater muscle tension (Brown et al., 1999), and endogenous factors, such as increased spontaneous activity in the IC (Willott et al., 1988). Greater spontaneous neural activity has also been found in the auditory cortices of older rats (Hughes et al., 2010), especially layers I–III, in which higher numbers of inhibitory neurons are found (Prieto et al., 1994). Increased neural noise will decrease the SNR, providing at least a partial explanation for older adults' speech perception difficulties.
The older adults in our study all had normal hearing (thresholds ≤20 dB HL through 4000 Hz and ≤25 dB HL at 8000 Hz), potentially indicating that they are “biologically younger” than the majority of older people who experience some degree of peripheral presbycusis (Yueh et al., 2003). Despite this, our group of normal-hearing older adults still exhibited striking deficits in temporal processing; therefore, management of hearing difficulties in older adults must involve strategies extending beyond enhancing audibility. The benefits of training for improving the synchrony of neural firing have been demonstrated in rats (de Villers-Sidani et al., 2010), which achieved a partial reversal of behavioral deficits after being trained on a frequency discrimination task. Our results suggest that similar strategies should be considered when developing management plans for older adults who report difficulty hearing in noise. Studies with older musicians reinforce the idea that sustained, repeated cognitive engagement, such as is achieved through intensive musical training, offsets age-related subcortical timing delays (Parbery-Clark et al., 2012) and benefits SIN perception and temporal processing (Parbery-Clark et al., 2011; Zendel and Alain, 2012). Therefore, auditory training that engages memory and attention while training auditory discrimination is likely to improve auditory perception.
Although both groups had normal audiometric thresholds through 4000 Hz (≤20 dB HL), there were significant group differences in the high frequencies (4000 and 8000 Hz), which may have influenced our findings. Given that average hearing at these frequencies is highly related to age (ρ = 0.746, p < 0.001), controlling for thresholds would factor out our primary independent variable (age) and so was not a feasible resolution. Therefore, although we cannot fully rule out between-group differences in high-frequency hearing, we argue that these effects are, at least in part, driven by age-related biological declines.
We found evidence of diminished neural precision in the cABRs of older adults who had normal hearing across the audiometric frequency range, affirming the theory that suprathreshold temporal processing deficits contribute to the perceptual difficulties experienced by older adults (Ruggles et al., 2011; Shamma, 2011). The age-related decreases in response consistency and phase locking support the hypothesis that variability in neural firing leads to temporal processing deficits in older adults. Decreased inhibitory neurotransmission may contribute to the reduced ability to process rapid changes in speech stimuli. The neural timing delays were seen only in response to the onset and formant transition, not the steady-state vowel, suggesting a selective timing deficit rather than a pervasive delay. Imprecise representation of an auditory signal reduces the ability to selectively attend to that signal and extract meaning from it. Work is currently underway to determine whether auditory training can partially offset these age-related changes in neural precision.
This work is supported by National Institutes of Health Grants T32 DC009399 and R01 DC010016. We thank Sarah Drehobl and Emily Hittner for their assistance with data collection and analysis and members of the Auditory Neuroscience Laboratory for their feedback. We express our appreciation to Adam Tierney and Trent Nicol for developing the code used for the phase-locking and response consistency analyses, respectively. We also thank Erika Skoe, Jane Hornickel, and Trent Nicol for their comments on this manuscript.
- Correspondence should be addressed to Dr. Nina Kraus, 2240 Campus Drive, Evanston, IL 60208.