Abstract
Aging results in pervasive declines in nervous system function. In the auditory system, these declines include neural timing delays in response to fast-changing speech elements; this causes older adults to experience difficulty understanding speech, especially in challenging listening environments. These age-related declines are not inevitable, however: older adults with a lifetime of music training do not exhibit neural timing delays. Yet many people play an instrument for a few years without making a lifelong commitment. Here, we examined neural timing in a group of human older adults who had nominal amounts of music training early in life, but who had not played an instrument for decades. We found that a moderate amount (4–14 years) of music training early in life is associated with faster neural timing in response to speech later in life, long after training stopped (>40 years). We suggest that early music training sets the stage for subsequent interactions with sound. These experiences may interact over time to sustain sharpened neural processing in central auditory nuclei well into older age.
Introduction
Over the past several years, evidence has emerged to suggest that musicians have nervous systems distinct from nonmusicians, potentially due to training-related plasticity (Moreno et al., 2009; Kraus and Chandrasekaran, 2010; Bidelman et al., 2011; Herholz and Zatorre, 2012). One of the most provocative findings in this literature is that lifelong music training may offset age-related declines in cognitive and neural functions (Hanna-Pladdy and MacKay, 2011; Parbery-Clark et al., 2012a; Zendel and Alain, 2012; Alain et al., 2013). To date, the majority of work on music training—including in older adults—has focused on individuals who play an instrument continuously throughout their lives. These are special cases, as many adults have dabbled with an instrument on and off, especially during childhood. An outstanding question is whether limited training early in life leaves a trace in the aging nervous system, affecting neural function years after training has stopped.
Aging results in pervasive declines throughout the nervous system, including in the auditory system. Even older adults with preserved peripheral function face a loss of central auditory function (Pichora-Fuller et al., 2007; Anderson et al., 2012; Ruggles et al., 2012). These age-driven deficits compound with declines in neocortical function that affect sensory and cognitive circuits (Gazzaley et al., 2005; Recanzone et al., 2011). Together, these declines reduce the ability to form a clear and veridical representation of the sensory world.
In speech, age-related deficits affect the neural encoding of fast-changing sounds, such as consonant–vowel (CV) transitions (Anderson et al., 2012). CV transitions pose perceptual challenges across the lifespan due to fast-changing spectrotemporal content and relatively low amplitudes (compared to vowels; Tallal, 1980). Precise coding of CV transitions supports auditory-cognitive and language-based abilities (Strait et al., 2013; Anderson et al., 2010). Yet age-related declines are not inevitable: older adults with lifelong music training do not exhibit neural timing delays in response to CV transitions (Parbery-Clark et al., 2012a). Moreover, the typical auditory system that has experienced this age-related decline is not immutable: short-term computer training in older adult nonmusicians partially reverses age-related delays in neural timing (Anderson et al., 2013b). This finding is consistent with emerging evidence from humans and animals suggesting a maintained potential for neuroplasticity into older age (Linkenhoker and Knudsen, 2002; Berry et al., 2010; de Villers-Sidani et al., 2010).
In young adults, neural enhancements from limited music training during childhood persist 5–10 years after training has stopped (Skoe and Kraus, 2012). The question remains, however, whether enhancements are preserved into older adulthood. Here, we hypothesized that older adults with a moderate amount of instrumental music training during childhood and/or young adulthood retain a neural trace of this early training, manifest as less severe age-related neural timing delays. To test this hypothesis, we measured scalp-recorded auditory brainstem responses to speech in a cohort of older adults who reported varying levels of music training early in life, but with no training after age 25. We predicted that there would be faster neural timing, indicating a more efficient auditory system, in older adults with more years of past music training.
Materials and Methods
Subjects.
Forty-four older adults (ages 55–76; 25 female) were recruited from the Chicago area. Audiometric thresholds were measured bilaterally at octave intervals from 0.125 to 8 kHz, including interoctave intervals at 3 and 6 kHz. Subjects had either normal hearing [thresholds from 0.25 to 8 kHz ≤20 dB hearing level (dB HL) bilaterally] or no more than a mild-to-moderate sensorineural hearing loss; all pure-tone averages (average threshold from 0.5 to 4 kHz) were ≤45 dB HL bilaterally. No individual threshold was >40 dB HL at or below 4 kHz or >60 dB HL at 6 or 8 kHz; no asymmetries were noted (>15 dB HL difference at two or more frequencies between ears). Stimulus presentation levels for electrophysiology were corrected for hearing loss (see Stimuli, below) and our subject cohort was a mix of older adults with normal hearing and hearing loss, as our group has previously published (Anderson et al., 2013b, 2013c). All subjects had normal wave V click-evoked latencies (<6.8 ms in response to a 100 μs click presented at 80 dB SPL at 31.25 Hz), and no history of neurological disorders. All subjects passed the Montreal Cognitive Assessment, a screening for mild cognitive impairment (Nasreddine et al., 2005). One subject was a statistical outlier and so was excluded (see Neural timing, below); all reported statistics, including descriptive statistics for each group, exclude this individual. Informed consent was obtained from all subjects in accordance with Northwestern University's Institutional Review Board.
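The audiometric criteria above can be expressed as a short screening routine. The following is a minimal sketch for a single ear, not the authors' procedure. The function names are hypothetical, and it assumes the pure-tone average is taken over the octave frequencies 0.5, 1, 2, and 4 kHz, since the text gives only the 0.5 to 4 kHz range. The interaural asymmetry check is omitted.

```python
from statistics import mean

# Assumption: the pure-tone average (PTA) uses the octave frequencies
# between 0.5 and 4 kHz; the text states only the range.
PTA_FREQS_KHZ = (0.5, 1, 2, 4)


def pure_tone_average(thresholds):
    """Mean threshold (dB HL) over the PTA frequencies for one ear."""
    return mean(thresholds[f] for f in PTA_FREQS_KHZ)


def classify_ear(thresholds):
    """Classify one ear as 'normal', 'hearing loss', or 'excluded'.

    `thresholds` maps frequency in kHz to threshold in dB HL.
    """
    # Normal hearing: every threshold from 0.25 to 8 kHz is <= 20 dB HL.
    if all(level <= 20 for f, level in thresholds.items() if 0.25 <= f <= 8):
        return "normal"
    # Mild-to-moderate loss: PTA <= 45 dB HL, no threshold > 40 dB HL at or
    # below 4 kHz, and no threshold > 60 dB HL at 6 or 8 kHz.
    within_limits = (
        pure_tone_average(thresholds) <= 45
        and all(level <= 40 for f, level in thresholds.items() if f <= 4)
        and all(thresholds[f] <= 60 for f in (6, 8))
    )
    return "hearing loss" if within_limits else "excluded"


# Hypothetical audiogram for one ear (illustrative values only).
example_ear = {0.125: 15, 0.25: 20, 0.5: 25, 1: 30, 2: 35, 3: 40, 4: 40, 6: 55, 8: 60}
print(pure_tone_average(example_ear), classify_ear(example_ear))
```

A subject would meet the inclusion criteria only if both ears return "normal" or "hearing loss".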
Training groups.
Participants were divided into three groups based on self-report of formal instrumental music training: “None” (0 years, N = 15), “Little” (1–3 years, N = 13), and “Moderate” (4–14 years, N = 13). Music training was private and/or group instrumental instruction. Although the groups were separated based upon total years of music training, there were converging factors that motivated dividing the music-trained groups into 1–3 years versus 4–14 years. In the United States, 1–3 years of training corresponds approximately to training during middle school or junior high school. Many of the subjects in the Little group completed their music training during this time, whereas many subjects in the Moderate group continued training into high school or the beginning of college. In general, subjects in the Little group also rated themselves as less proficient on their instruments than those with ≥4 years of training. Therefore, we believe that this cutoff is meaningful and reflects the length, intensity, and success of subjects' music training. No subject reported any instrumental practice, performance, or instruction after age 25.
The groups did not differ on sex distribution, age, hearing, intelligence quotient (IQ), educational attainment, current levels of exercise, or age of training onset. Since groups were matched on these criteria, we did not model them as covariates in comparisons of neural function. IQ was measured with the Wechsler Abbreviated Scale of Intelligence vocabulary (verbal) and matrix reasoning (nonverbal) subtests (Zhu and Garcia, 1999). Educational attainment was inferred from a four-item Likert scale (highest academic grade completed: 1, middle school; 2, high school/equivalent; 3, college; 4, graduate or professional degree). To guard against group differences driven by general health or lifestyle factors that may support better auditory function (Anderson et al., 2013c), current levels of exercise were evaluated by having subjects report the frequency (number of times per week) with which they engaged in physical activities (cycling, walking, gardening, etc.); these frequencies were then averaged. See Tables 1 and 2 for group characteristics.
Electrophysiology.
To compare groups' neural representation of speech, we used scalp electrodes to measure auditory brainstem responses to a synthesized speech sound [da] (Fig. 1). This subcortical electrophysiologic response is a variant of the auditory brainstem response that is elicited in response to complex sounds instead of simple clicks or tones (Skoe and Kraus, 2010) and is generated by synchronous firing of midbrain nuclei, predominantly inferior colliculus (IC; for review, see Chandrasekaran and Kraus, 2010). Despite its subcortical origin, this response is malleable with experience and similar techniques have revealed changes due to developmental (Johnson et al., 2008; Anderson et al., 2012) and experience-dependent plasticity (Krishnan et al., 2009; Anderson et al., 2013b), including plasticity resulting from music training (Kraus and Chandrasekaran, 2010; Bidelman et al., 2011; Strait et al., 2013).
Stimuli.
A 170 ms six-formant speech syllable [da] was synthesized using a Klatt-based synthesizer at a 20 kHz sampling rate, with an initial 5 ms stop burst and a steady fundamental frequency (F0 = 100 Hz). During the first 50 ms (transition between the stop burst /d/ and the vowel /a/), the first, second, and third formants change (F1, 400 → 720 Hz; F2, 1700 → 1240 Hz; F3, 2580 → 2500 Hz) but stabilize for the subsequent 120 ms steady-state vowel. The higher formants are stable in frequency throughout the entire 170 ms (F4, 3300 Hz; F5, 3750 Hz; F6, 4900 Hz). The syllable was presented alone (“quiet” condition) and masked by a two-talker babble track (“noise” condition, adapted from Van Engen and Bradlow, 2007). In cases of hearing loss (N = 31 total; None group, N = 13; Little group, N = 7; Moderate group, N = 11; see Subjects for hearing classifications) the [da] stimulus was selectively frequency-amplified with the NAL-R (National Acoustic Laboratories—Revised) algorithm (Byrne and Dillon, 1986), a procedure our group has shown improves the morphology and replicability of the response while retaining important timing properties (Anderson et al., 2013a).
Recording parameters.
Auditory brainstem responses were recorded at a 20 kHz sampling rate using four Ag-AgCl electrodes in a vertical montage (Cz active, Fpz ground, and earlobe references). Stimuli were presented binaurally in alternating polarities (stimulus waveform was inverted 180°) with an 83 ms interstimulus interval through electromagnetically shielded insert earphones (ER-3A, Etymotic Research) at 80 dB SPL. During the recording session, participants were seated in a sound-attenuated booth and watched a muted, captioned movie of their choice to facilitate an alert but restful state. Sixty-three hundred sweeps were collected.
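The presentation parameters imply a few derived quantities that are easy to check. The sketch below assumes the interstimulus interval runs from stimulus offset to the next onset, which the text does not state explicitly. Under that assumption, the onset-to-onset period is 253 ms, the same length as the epoch window (−40 to 213 ms) described under Data reduction.

```python
STIMULUS_MS = 170          # duration of the [da] syllable (from the text)
ISI_MS = 83                # interstimulus interval (from the text)
SAMPLING_RATE_HZ = 20_000  # recording sampling rate (from the text)
N_SWEEPS = 6300            # sweeps collected (from the text)

# Assumption: the ISI is measured from stimulus offset to the next onset.
onset_to_onset_ms = STIMULUS_MS + ISI_MS

# Stimuli presented per second.
presentation_rate_hz = 1000 / onset_to_onset_ms

# Samples recorded during one onset-to-onset period.
samples_per_cycle = onset_to_onset_ms * SAMPLING_RATE_HZ // 1000

# Duration of one run of 6300 sweeps, in minutes.
run_minutes = N_SWEEPS * onset_to_onset_ms / 1000 / 60

print(onset_to_onset_ms, round(presentation_rate_hz, 2), samples_per_cycle, round(run_minutes, 1))
```

Under these assumptions, one run of 6300 sweeps takes roughly 27 minutes at a rate of about 3.95 stimuli per second.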
Data reduction.
Responses were bandpass filtered offline from 70 to 2000 Hz (12 dB/octave, zero-phase shift). The responses were epoched using a −40 to 213 ms time window referenced to stimulus onset (0 ms). Sweeps with amplitudes greater than ±35 μV were considered artifact and rejected, resulting in 6000 response trials for each participant. Responses to the two polarities were added to limit the influence of cochlear microphonic and stimulus artifact. Amplitudes of responses were baseline-corrected to the prestimulus period.
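The data-reduction steps after filtering can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the authors' pipeline. The function names are hypothetical, sweeps are plain lists of amplitudes in μV that have already been filtered and epoched, and the two polarity averages are summed without rescaling because the text says only that they were "added". The bandpass filter itself is omitted.

```python
from statistics import mean

ARTIFACT_LIMIT_UV = 35.0  # sweeps exceeding ±35 μV are rejected (from the text)


def reject_artifacts(sweeps, limit=ARTIFACT_LIMIT_UV):
    """Keep only sweeps whose absolute amplitude never exceeds the limit."""
    return [s for s in sweeps if max(abs(v) for v in s) <= limit]


def average_sweeps(sweeps):
    """Point-by-point average across sweeps of equal length."""
    n = len(sweeps)
    return [sum(column) / n for column in zip(*sweeps)]


def reduce_response(pos_sweeps, neg_sweeps, n_baseline):
    """Average each polarity, add the averages, and baseline-correct.

    `n_baseline` is the number of prestimulus samples at the start of the
    epoch (800 samples for a 40 ms prestimulus period sampled at 20 kHz).
    """
    pos = average_sweeps(reject_artifacts(pos_sweeps))
    neg = average_sweeps(reject_artifacts(neg_sweeps))
    # Adding the two polarities limits cochlear microphonic and stimulus artifact.
    added = [p + q for p, q in zip(pos, neg)]
    baseline = mean(added[:n_baseline])
    return [v - baseline for v in added]


# Tiny hypothetical example: the third positive-polarity sweep is an artifact.
pos = [[1, 1, 3, 5], [1, 1, 3, 5], [50, 0, 0, 0]]
neg = [[1, 1, 1, -1]]
print(reduce_response(pos, neg, n_baseline=2))
```

The sketch raises an error if every sweep of one polarity is rejected; a real pipeline would need to handle that case explicitly.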
Neural timing.
To gauge the timing of subjects' neural responses to speech, response peaks corresponding to the CV transition of the stimulus (peaks occurring ∼23, 33, 43, 53, 63 ms) and the steady-state sustained vowel period (peaks occurring ∼73, 83, 93, 103, 113, 123, 133, 143, 153, 163 ms) were identified by an automated program. Peak selections were confirmed by a rater blind to the subject group or condition and checked by a secondary peak picker. “Relative” peak timings (Figs. 2, 3) were computed by subtracting the expected latency (e.g., 43 ms) from the group average peak latency (for example, were 43.69 ms the mean latency in a given group for a peak occurring ∼43 ms, that group would be plotted as having a “relative latency” of 0.69 ms). One subject with 8 years of music training was a statistical outlier (four of five latencies for noise transition peaks were >2.5 SDs beyond the mean) and was excluded from analysis.
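The relative-latency computation and the outlier description can be illustrated as follows. This is a minimal sketch with hypothetical function names and made-up latencies. It assumes the mean and SD for each peak are computed across a reference cohort supplied by the caller, since the text does not say which subjects entered that calculation.

```python
from statistics import mean, stdev


def relative_latency(latencies_ms, expected_ms):
    """Group mean latency minus the expected (nominal) peak latency, in ms."""
    return mean(latencies_ms) - expected_ms


def count_deviant_peaks(subject, cohort, sd_limit=2.5):
    """Count peaks where a subject lies more than sd_limit SDs from the cohort mean.

    `subject` maps nominal peak (ms) to that subject's latency;
    `cohort` maps nominal peak (ms) to a list of reference latencies.
    """
    deviant = 0
    for peak, latency in subject.items():
        values = cohort[peak]
        z = (latency - mean(values)) / stdev(values)
        if abs(z) > sd_limit:
            deviant += 1
    return deviant


# Hypothetical latencies (ms) for the peak near 43 ms in one group.
print(round(relative_latency([43.5, 43.88], expected_ms=43), 2))
```

In the study, the excluded subject deviated on four of the five transition peaks in noise, so a count of 4 from a function like `count_deviant_peaks` would correspond to that description.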
Statistical analyses.
To compare neural timing between groups and conditions, peak latencies were submitted to a repeated-measures ANOVA (RM-ANOVA) with condition (quiet/noise) as a within-subjects factor and group (None/Little/Moderate) as a between-subjects factor. Main effects of group are reported for each condition. Normality was confirmed using the Shapiro–Wilk test, and homogeneity of variance was confirmed using Levene's test. The RM-ANOVA was run with α = 0.05, and all follow-up tests were strictly Bonferroni-corrected for multiple comparisons; all p values refer to two-tailed tests. In the Little and Moderate groups, nonparametric Spearman's correlations were used to relate years of training to neural timing, since the distribution of years of training was slightly skewed. Analyses were performed in SPSS 20.0 (SPSS).
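Two of the procedures named above, Spearman's rank correlation and Bonferroni correction, are simple enough to sketch directly. The analyses in the study were run in SPSS; the code below is only a standard-library illustration with hypothetical function names and made-up data. It computes Spearman's ρ as the Pearson correlation of average ranks and does not compute p values.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Hypothetical data: years of training and latency (ms) of one peak.
years = [1, 2, 3, 4, 8, 10, 14]
latency_ms = [23.9, 23.8, 23.9, 23.6, 23.5, 23.3, 23.2]
print(round(spearman_rho(years, latency_ms), 3))
```

A negative ρ here would indicate that more years of training go with earlier (faster) peaks, which is the direction of the associations reported in the Results.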
Results
Summary of results
A greater amount of music training early in life was associated with more efficient auditory function decades after training stopped. Older adults in the Moderate training group had the fastest neural timing in response to the [da] presented in quiet and noise. The Moderate group was also the most resilient to noise-induced timing delays. Group differences were only seen in the region of the response corresponding to the CV transition of the syllable (the region between 20 and 60 ms in the response); in the steady-state vowel portion of the response, the groups were equivalent.
Neural resistance to noise degradation on response timing
In a typical auditory system, noise slows neural responses to speech (Hall, 2006; Anderson et al., 2010), although musicians experience less drastic timing delays than nonmusicians (Parbery-Clark et al., 2009; Strait et al., 2012; Kraus and Nicol, 2013). To compare the degrading effect of noise on neural timing across all three groups, latencies in quiet and noise for the CV transition region of the responses (phase-locked peaks occurring at ∼23, 33, 43, 53, and 63 ms) were compared. There was a significant main effect of group, revealing fastest timing in the Moderate group in both quiet and noise (F(1,71) = 2.85, p = 0.005). There was also a significant group × condition interaction, demonstrating that the Moderate group was least affected by latency delays due to noise (F(1,71) = 2.17, p = 0.030). There were no effects of group in response to the steady-state vowel (all p > 0.1) meaning that past music training was only associated with improved timing in the CV transition.
Absolute neural timing in response to speech in quiet
There was a significant effect of group membership on peak timing in response to speech in quiet, with the Moderate training group having the fastest timing, on average, followed by the Little and None groups, respectively (F(1,71) = 2.13, p = 0.033). See Table 3 for mean peak latencies in quiet and Figure 2 for responses in the transition and an illustration of mean neural timing by group.
Follow-up tests revealed that this difference was driven by the Little versus Moderate comparison (F(1,20) = 3.28, p = 0.025), with no significant group differences in the None versus Moderate (F(1,22) = 1.74, p = 0.168) or None versus Little comparisons (F(1,22) = 1.49, p = 0.237). However, Bonferroni-corrected assessments at an individual peak level revealed differences between the None and Moderate groups for peaks 43 (t(26) = 2.79, p = 0.036) and 53 (t(26) = 3.02, p = 0.018) with trending differences for peaks 23 (t(26) = 2.24, p = 0.067) and 63 (t(26) = 2.26, p = 0.075). This suggests that there was a None versus Moderate difference in quiet but that our sample was underpowered to bring this out at a multivariate level. This is consistent with previous work that suggests training effects in quiet are relatively weak compared to those in noise (Russo et al., 2010; Anderson et al., 2013b).
There were no effects of group on timing in response to the steady-state vowel (overall: F(1,62) = 0.82, p = 0.685; all individual peaks p > 0.1), which is consistent with previous findings establishing a selective effect of music training for encoding spectrotemporally dynamic regions of speech (Parbery-Clark et al., 2009, 2012a; Strait et al., 2012, 2013).
Absolute neural timing in response to speech in noise
There was a significant effect of group membership on peak timing in response to speech in noise, with the Moderate training group having the fastest timing, on average, followed by the Little and None groups, respectively (F(1,71) = 3.35, p = 0.001). See Table 4 for mean peak latencies in noise and Figure 3 for responses in the transition and an illustration of mean neural timing by group.
Follow-up tests revealed that this difference was driven by the None versus Moderate and Little versus Moderate group comparisons (None vs Moderate: F(1,22) = 4.94, p = 0.004; Little vs Moderate: F(1,20) = 3.57, p = 0.018), with no significant group difference in the None versus Little comparison (F(1,22) = 2.07, p = 0.107).
As in quiet, there were no effects of group on timing in response to the steady-state vowel (overall: F(1,62) = 0.97, p = 0.504; all individual peaks p > 0.1). Again, this is consistent with previous findings suggesting a selective enhancement conferred by extensive music training.
Linear relationship between years of training and neural timing
In addition to group-wise comparisons, we correlated years of training and neural timing within subjects with some degree of music training to determine whether neural timing scaled with the number of years of training. For every transition peak in quiet and noise there was a negative association between years of training and neural timing, indicating that subjects with more years of music training had faster neural responses to speech. However, in quiet, the correlation only reached statistical significance for peak 23 (ρ(26) = −0.600, p = 0.001) and in noise for peaks 23 (ρ(26) = −0.438, p = 0.025) and 33 (ρ(26) = −0.494, p = 0.010). A scatterplot relating years of training to these peak latencies in noise is presented in Figure 4.
Discussion
We compared neural responses to speech in three groups of older adults who reported varying degrees of music training early in their lives. The group with the most music training displayed the fastest neural timing in the response region corresponding to the information-bearing and spectrotemporally dynamic region in a speech syllable, the CV transition, most notably in noise. While this neural enhancement has been observed in older adults with lifelong music training (Parbery-Clark et al., 2012a), and to a lesser extent in those who have undergone intensive short-term computer training (Anderson et al., 2013b), here we observe this benefit in adults with past music experience ∼40 years after training stopped.
There are a number of potential mechanisms driving these group differences decades after training. It may be that early music instruction instills a fixed change in the central auditory system that is retained throughout life. Early acoustic experience can have lasting consequences for neural function. For example, rearing rat pups in noise decreases auditory cortical synchrony into adulthood, even after noise exposure has stopped (Zhou and Merzenich, 2008), whereas enrichment early in life promotes auditory processing in adulthood (Engineer et al., 2004; Threlkeld et al., 2009; Sarro and Sanes, 2011). Similar effects are observed in owl auditory midbrain, specifically in central nucleus of IC (Linkenhoker et al., 2005). These principles may apply to the group differences observed here: a moderate amount of training early in life changed subcortical auditory function such that the system responded with faster timing many years later. There are, however, other mechanisms to consider, especially in light of recent evidence that sensory and neocortical systems retain the potential for substantial plasticity into older adulthood (Smith et al., 2009; Berry et al., 2010; Anderson et al., 2013b), including in response to passive stimulus exposure (Kral and Eggermont, 2007).
In addition to instilling fixed changes early in life, music instruction may “set the stage” for future interactions with sound, driving the timing differences we observed between groups (i.e., fastest timing in the group with the greatest amount of past training). Past experiences mapping sounds to meaning—as through music training—may prime the auditory system to interact more dynamically with sound. This could account for the selective relationship between extent of past music training and response timing to the CV transition in speech but not the steady-state vowel. By priming the auditory system to encode sound according to informational saliency and acoustic complexity (Strait et al., 2009; Kraus and Chandrasekaran, 2010), increased neural resources could be devoted to the most relevant acoustic features of auditory scenes. In fact, this may account for the inconsistency in group effects for neural timing in response to speech presented in quiet versus speech presented in noise: whereas the group effect in noise was driven by the Moderate group having faster timing than the Little and None groups, in quiet this was driven by poorer timing in the Little group. Previous studies on musicians have identified stronger training effects in the neural encoding of speech in noise, which may explain these results. This “setting of the stage” may apply especially to degraded acoustic environments that place increased demands on sensory circuitry for accurate target signal transcription.
The nature of an individual's interactions with sound can alter auditory processing; enriched auditory experiences may improve subcortical auditory function. Assistive listening devices that enhance classroom signal-to-noise ratios, for example, improve the stability of children's subcortical speech-sound processing even after the children stop using the devices (Hornickel et al., 2012). Such devices may assist students in directing attention to important auditory streams even when audibility is returned to normal signal-to-noise levels. Music training may function similarly; after all, a key element of playing music is dynamically directing attention to certain auditory objects or features. Using similar reasoning, Bavelier, Green, and colleagues have pursued the use of action video games to train attentional systems (Green and Bavelier, 2003), arguing that video games enhance attention and thus provide the means to modify neural function (Green and Bavelier, 2012).
Evidence in older adults supports the hypothesis that early or limited training experiences influence subsequent cognitive function and/or learning. For example, older adults with past music training rely more on cognitive mechanisms (such as memory and attention) to understand speech in noise than do older adults with no past music training (Anderson et al., 2013c). In the current study, the members of the Moderate training group may engage with sound using subtly different cognitive mechanisms that modulate auditory processing (cf. Gaab and Schlaug, 2003; Wong et al., 2009). Future work should explore differences in the relative mechanisms used to achieve similar performance on perceptual tasks as a function of previous training experiences. These mechanisms may dictate the nature of interactions with sound after training has stopped to reinforce modulations to subcortical processing in the long term (as reviewed in Kraus and Chandrasekaran, 2010; Strait and Kraus, 2013). Work in the human perceptual learning literature has explored what happens to learning when active training has stopped. Taking breaks from learning (Molloy et al., 2012) or interspersing active and passive exposure (Wright et al., 2010) may facilitate “latent learning” and improve end performance.
If initial training experiences set the stage for subsequent learning, past music training may promote auditory learning ability or increase the benefits accrued from short-term experiences. In fact, adults with previous music experience perform better on statistical learning tasks (Shook et al., 2013; Skoe et al., 2013), which model naturalistic environmental language learning. Similarly, initial exposure to different statistical “languages” benefits subsequent novel language learning (Graf Estes et al., 2007). Together, this work supports the proposal that limited, initial training experiences interact with later learning to guide the final course of nervous system function. There is also physiologic evidence for early training's influence on later learning in adulthood. Prior experiences prime the capacity for and efficiency of future plasticity in neocortex (Zelcer et al., 2006; Abraham, 2008; Hofer et al., 2009). Acoustic experiences early in life may also be complemented or counteracted by later training experiences (Threlkeld et al., 2009; Zhou and Merzenich, 2009). Music training may then prime the auditory system to benefit from subsequent auditory experiences, complementing enhancements that are primary effects of the training itself, thereby recapitulating past experiences to guide future interactions with sound (cf. Salimpoor et al., 2013; Skoe and Kraus, 2013).
Up to now, we have proposed that group differences were driven by an initial enhancement from music training that was reinforced throughout life due to music leading to more effortful and meaningful interactions with sound in other contexts. Of course there are other factors that may drive our group differences. For example, we cannot rule out innate differences in central auditory physiology; individuals with better subcortical function may be drawn to play a musical instrument for a longer period of time. We think this unlikely, however, since there are many other factors (family history, personal interest, environment, etc.) that prompt an individual to study music. We also strove to carefully match our groups demographically, especially on factors such as IQ and education (Table 1). Finally, we find it particularly striking that the group difference we found (neural timing in response to CV transitions in speech presented in noise) has also been seen in older adults who were randomly assigned to complete short-term training (Kraus and Nicol, 2013; Anderson et al., 2013b); this may be a metric especially suited to demonstrate learning-related enhancements, which would further suggest that the neural timing differences were due to training, and were not innate. We do note, however, that the neural enhancements in the Moderate group, which we attribute to music training, are not as pervasive as those seen in lifelong older adult musicians (Parbery-Clark et al., 2012b; Zendel and Alain, 2012).
It is worth noting that there was a large degree of heterogeneity in the Moderate training group. In this group there was a range of 10 years of training (minimum 4, maximum 14) and a wide array of instruments and means of instruction (classroom, private teacher, etc.) that subjects reported. Unfortunately, it is difficult to quantify the precise amount of training (i.e., number of hours per week) that these subjects pursued since we have to rely on recollections about training many decades ago. That said, it is striking that despite the diversity of this group there was an effect of past training on neural timing many years after the training stopped, with the extent of past training correlating with the extent of neural response enhancement (Fig. 4).
Our findings have important consequences for education and social policy. Today, music education is at high risk for being cut from schools in the United States. School districts with limited financial resources prioritize science, math, and reading since music is often considered a nonessential component of the curriculum (Rabkin and Hedberg, 2011). Here, we show a neural enhancement linked to a moderate amount of music training many decades after training has stopped. Importantly, this enhancement was for encoding CV transitions in speech, which are especially vulnerable to the effects of age, yet important for everyday communication. These findings support current efforts to reintegrate arts education into schools (President's Committee on the Arts and the Humanities, 2011; Yajima and Nadarajan, 2013) by suggesting that music training in adolescence and young adulthood may carry meaningful biological benefits into older adulthood.
Footnotes
- Received June 17, 2013.
- Revision received August 22, 2013.
- Accepted September 18, 2013.
This work was supported by the National Institutes of Health (R01 DC010016 to N.K., F31 DC011457-01 to D.S.), the Hugh Knowles Center (to N.K.), and the Northwestern Cognitive Science program (to K.W.C.). We thank Hee Jae Choi for her assistance with data analysis and Trent Nicol, Elaine Thompson, and Jennifer Krizman for critical reviews of the manuscript.
- Correspondence should be addressed to Nina Kraus, 2240 Campus Dr., Evanston, IL 60208. nkraus@northwestern.edu
- Copyright © 2013 the authors 0270-6474/13/3317667-08$15.00/0