Abstract
Musicianship in early life is associated with pervasive changes in brain function and enhanced speech-language skills. Whether these neuroplastic benefits extend to older individuals more susceptible to cognitive decline, and for whom plasticity is weaker, has yet to be established. Here, we show that musical training offsets declines in auditory brain processing that accompany normal aging in humans, preserving robust speech recognition late into life. We recorded both brainstem and cortical neuroelectric responses in older adults with and without modest musical training as they classified speech sounds along an acoustic–phonetic continuum. Results reveal higher temporal precision in speech-evoked responses at multiple levels of the auditory system in older musicians, who were also better at differentiating phonetic categories. Older musicians also showed a closer correspondence between neural activity and perceptual performance, suggesting that musicianship strengthens brain-behavior coupling in the aging auditory system. Last, “neurometric” functions derived from unsupervised classification of neural activity established that early cortical responses could accurately predict listeners' psychometric speech identification and, more critically, that neurometric profiles were organized more categorically in older musicians. We propose that musicianship offsets age-related declines in speech listening by refining the hierarchical interplay between subcortical and cortical auditory brain representations, allowing more behaviorally relevant information to be carried within the neural code, and supplying more faithful templates to the brain mechanisms subserving phonetic computations. Our findings imply that the robust neuroplasticity conferred by musical training is not restricted by age and may serve as an effective means to bolster speech-listening skills that decline across the lifespan.
- auditory event-related brain potentials (ERPs)
- brainstem frequency-following response (FFR)
- categorical speech perception
- cognitive aging
- experience-dependent plasticity
- music-to-language transfer effects
Introduction
Cognitive aging is associated with declines in auditory brain function that often manifest behaviorally in poorer speech comprehension abilities (Humes et al., 2013). These deficits include a reduced ability to parse, sequence, and identify acoustic features of the speech signal (Alain et al., 2012; Hutka et al., 2013; Bidelman et al., 2014a). That age-related speech deficits persist in the absence of measurable hearing loss (Anderson et al., 2011; Bidelman et al., 2014a) challenges the conventional, longstanding view that speech intelligibility is determined solely by signal audibility (Humes et al., 2013). Instead, such findings highlight the importance of central brain mechanisms in determining successful speech perception later in life.
The ubiquity of perceptual–cognitive deficits with age has kindled interest in the biomedical sciences to identify lifestyle choices and training regimens that promote successful aging (Kramer et al., 2004). Aside from isolated reports (Anderson et al., 2013), programs that endorse improvement in central auditory processing and speech-listening skills have experienced limited success or remain to be validated (Owen et al., 2010; Henshaw and Ferguson, 2013). Cognitive decline in the elderly is, however, successfully offset with engagement in intellectually stimulating activities (Scarmeas and Stern, 2003). For example, higher educational/occupational achievements (Qiu et al., 2001) and bilingualism attenuate the negative effects of age on cognitive abilities (Bialystok et al., 2007). Furthermore, recent studies indicate that musical training positively modifies brain mechanisms to provide robust, long-lasting improvements in speech-language function in young adults (Kraus and Chandrasekaran, 2010; Moreno and Bidelman, 2014). Interestingly, the neuroplasticity afforded by musical training in childhood is retained into early adulthood, even when lessons are terminated in adolescence (Skoe and Kraus, 2012). Enriched human experience in the form of music instruction may therefore engender a “cognitive reserve” that could delay, postpone, or even reverse age-related cognitive declines (Alain et al., 2014). Although it is clear that musicianship impacts neural function in young developing brains (e.g., Fujioka et al., 2006; Hyde et al., 2009; Strait et al., 2012; Chobert et al., 2014), whether these benefits extend throughout the course of a lifespan remains largely unexplored (cf. Zendel and Alain, 2012). Music-related plasticity documented for younger adults and children (Chobert et al., 2014; Moreno and Bidelman, 2014) may not yield the same degree of benefit in the older, “less plastic” nervous system (Stiles, 2000; Alain et al., 2014).
We investigated the hierarchy of auditory neuroplasticity afforded by musicianship and whether functional enhancements in brain processing counteract the poorer speech reception observed in older individuals (e.g., Bidelman et al., 2014a). We predicted that older musicians would (1) show more efficient/robust neural encoding of speech in the subcortical–cortical pathways subserving speech and (2) have neural representations that organize more categorically and couple more tightly to behavior than in musically naive listeners. Our hypotheses were based on evidence in younger adults showing that musicianship strengthens brainstem–cortical speech processing in concert to enhance categorical perception (CP) of speech (Bidelman et al., 2014b). Demonstrating similar speech identification benefits in older individuals would establish that robust auditory neuroplasticity extends across the lifespan. Verifying that musical training impacts the aged brain could also have important societal implications, as it could offer a viable rehabilitation program to benefit speech-language behaviors in the aging population (e.g., Kraus et al., 2014).
Materials and Methods
Participants.
Twenty older adults (10 females) participated in the experiment: 10 English-speaking musicians and 10 nonmusicians. All participants were right-handed and had normal hearing sensitivity, defined as audiometric thresholds ≤25 dB HL at octave frequencies from 500 to 4000 Hz. Consistent with previous reports (Wong et al., 2007; Zendel and Alain, 2009; Bidelman et al., 2011b; 2014b; Parbery-Clark et al., 2012b), musicians (Ms) were defined as amateur instrumentalists who had received ≥5 years of continuous private instruction on their principal instrument (mean ± SD: 11.40 ± 5.8 years), beginning before age 14 (10.8 ± 2.5 years). Beyond formal private or group lessons, each was currently active in music practice or ensemble engagement in their daily lives. Nonmusicians (NMs) had no more than 2 years of self-directed music training (0.3 ± 0.7 years) throughout their lifetime. The two groups were otherwise closely matched in age (M: 70.1 ± 7.1 years, NM: 69.6 ± 8.5 years; p = 0.91) and years of formal education (M: 18.3 ± 3.8 years, NM: 15.2 ± 3.7 years; p = 0.08). All participants reported no history of neuropsychiatric disorders. To rule out additional deficits in cognitive function, older adults were screened for dementia and cognitive impairment using the Montreal Cognitive Assessment (Nasreddine et al., 2005). Participants were compensated for their time and provided written, informed consent in compliance with a protocol approved by the Baycrest Centre Research Ethics Committee.
Speech vowel continuum stimuli.
We used a synthetic, five-step vowel continuum from /u/ to /a/ (Bidelman et al., 2013, 2014b). Adjacent tokens were separated by equidistant acoustic steps, yet the continuum as a whole was perceived categorically. Each token was 100 ms, including 10 ms of rise/fall time to reduce spectral splatter, and contained identical voice fundamental (F0), second (F2), and third (F3) formant frequencies (100, 1090, and 2350 Hz, respectively). Only a single acoustic dimension varied across stimuli: first formant frequency (F1) was parameterized over five equal steps between 430 and 730 Hz such that the resultant stimulus set spanned a perceptual phonetic continuum from /u/ to /a/ (Bidelman et al., 2013, 2014a, b).
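For illustration, the following minimal Python sketch generates a comparable five-step continuum via simple source-filter (formant) synthesis. It is not the synthesizer used to create the actual stimuli; the sample rate, formant bandwidths, and impulse-train source are assumptions, whereas the duration, ramps, F0, F1 steps, and F2/F3 values follow the parameters given above.

```python
# Minimal source-filter sketch of a five-step /u/-/a/ continuum (illustrative only).
# Assumed: sample rate, formant bandwidths, impulse-train glottal source.
# From the text: 100 ms duration, 10 ms ramps, F0 = 100 Hz, F1 = 430-730 Hz in
# 5 equal steps, F2 = 1090 Hz, F3 = 2350 Hz.
import numpy as np
from scipy.signal import lfilter

FS = 20000                                    # assumed sample rate (Hz)
DUR, RAMP = 0.100, 0.010                      # token duration and ramp (s)
F0, F2, F3 = 100.0, 1090.0, 2350.0
F1_STEPS = np.linspace(430.0, 730.0, 5)       # five equal F1 steps
BW = {"F1": 60.0, "F2": 90.0, "F3": 120.0}    # assumed formant bandwidths (Hz)

def resonator(x, freq, bw, fs=FS):
    """Klatt-style second-order resonator (single formant filter)."""
    r = np.exp(-np.pi * bw / fs)
    c, b = -r ** 2, 2.0 * r * np.cos(2.0 * np.pi * freq / fs)
    a0 = 1.0 - b - c                          # unity gain at DC
    return lfilter([a0], [1.0, -b, -c], x)

def make_token(f1):
    n = int(DUR * FS)
    source = np.zeros(n)
    source[::int(FS / F0)] = 1.0              # impulse train at F0 = 100 Hz
    y = source
    for freq, bw in [(f1, BW["F1"]), (F2, BW["F2"]), (F3, BW["F3"])]:
        y = resonator(y, freq, bw)            # cascade the three formant filters
    ramp = int(RAMP * FS)                     # 10 ms raised-cosine onset/offset
    win = np.ones(n)
    win[:ramp] = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
    win[-ramp:] = win[:ramp][::-1]
    return (y * win) / np.max(np.abs(y * win))

continuum = [make_token(f1) for f1 in F1_STEPS]   # vw1 (/u/) ... vw5 (/a/)
```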
Data acquisition and preprocessing.
Data acquisition was identical to previous reports from our laboratory (Bidelman et al., 2013, 2014a, b). Stimuli were delivered binaurally at an intensity of 83 decibels sound pressure level (dB SPL) through insert earphones (ER-3A, Etymotic Research). Extended acoustic tubing (50 cm) was used to eliminate electromagnetic stimulus artifact from contaminating neural (brainstem) responses (Aiken and Picton, 2008). During event-related brain potential (ERP) recording, listeners heard 200 randomly ordered exemplars of each token and were asked to label them with a binary response as quickly as possible (“u” or “a”) (Bidelman et al., 2013). Following the participant's behavioral response, the interstimulus interval was jittered randomly between 400 and 600 ms (20 ms steps, rectangular distribution). An additional 2000 trials (interstimulus interval = 150 ms) were then collected to measure sub-microvolt brainstem ERPs (Bidelman and Krishnan, 2010; Bidelman et al., 2013). Participants watched a self-selected movie with subtitles during these blocks to facilitate a calm yet wakeful state.
Continuous EEGs were recorded differentially between an electrode placed on the high forehead at the hairline (∼Fpz) and linked mastoid reference electrodes; a third electrode on the mid-forehead served as the common ground. Inter-electrode impedances were maintained ≤3 kΩ. This montage is optimal for recording evoked responses of both subcortical and cortical origin (Musacchia et al., 2008; Bidelman et al., 2013, 2014b). EEGs were digitized at 20 kHz and bandpass filtered online between 0.05 and 3500 Hz (SynAmps2, Compumedics NeuroScan). Traces were then segmented (cortical ERP: −100 to 600 ms; brainstem ERP: −40 to 210 ms), baselined to the prestimulus interval, and averaged in the time domain to obtain ERPs for each condition. Trials exceeding a ±50 μV threshold were rejected automatically before averaging. Grand average evoked responses were then bandpass filtered in different frequency bands to highlight brainstem (80–2500 Hz) and cortical (1–30 Hz) activity, respectively (Bidelman et al., 2013, 2014a, b).
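The following Python/NumPy sketch illustrates the general logic of this preprocessing (epoching, baseline correction, artifact rejection, averaging, and band-specific filtering). It is a stand-in for the acquisition software actually used; the zero-phase Butterworth filters, their order, and the simulated data are assumptions, while the epoch windows, rejection threshold, and filter bands follow the text.

```python
# Generic single-channel epoching/averaging sketch (not the authors' pipeline).
import numpy as np
from scipy.signal import butter, filtfilt

FS = 20000.0  # EEG sample rate (Hz), per the text

def epoch_and_average(eeg, onsets, t_pre, t_post, reject_uv=50.0):
    """Cut epochs around stimulus onsets, baseline to the prestimulus interval,
    reject epochs exceeding +/-reject_uv, and return the time-domain average."""
    n_pre, n_post = int(t_pre * FS), int(t_post * FS)
    epochs = []
    for on in onsets:
        seg = eeg[on - n_pre : on + n_post].copy()
        seg -= seg[:n_pre].mean()                 # prestimulus baseline
        if np.max(np.abs(seg)) <= reject_uv:      # +/-50 uV artifact rejection
            epochs.append(seg)
    return np.mean(epochs, axis=0)

def bandpass(x, lo, hi, order=2):
    """Zero-phase Butterworth bandpass (filter type/order assumed)."""
    b, a = butter(order, [lo / (FS / 2), hi / (FS / 2)], btype="band")
    return filtfilt(b, a, x)

# Example with simulated single-channel EEG (microvolts) and hypothetical onsets:
eeg = np.random.randn(int(FS * 60))
onsets = (np.arange(1, 55, 0.5) * FS).astype(int)

cortical_erp = bandpass(epoch_and_average(eeg, onsets, 0.100, 0.600), 1, 30)
brainstem_erp = bandpass(epoch_and_average(eeg, onsets, 0.040, 0.210), 80, 2500)
```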
Behavioral data analysis.
Psychometric identification functions were constructed by computing the proportion of trials (for each token) identified as a single vowel class (i.e., /a/). Behavioral speech labeling speeds (i.e., reaction times [RTs]) were computed as the listener's mean response latency across trials for a given condition. RTs outside 250–1000 ms were deemed outliers and excluded from further analysis (Bidelman et al., 2013; Bidelman et al., 2014a).
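A brief sketch of this behavioral scoring is given below (pandas-based; the column names and simulated trial data are hypothetical). It computes the psychometric function as the proportion of /a/ responses per token and the mean RT per token after excluding RTs outside 250–1000 ms.

```python
# Sketch of the behavioral scoring described above (hypothetical column names).
import numpy as np
import pandas as pd

# trials: one row per trial with columns 'token' (1-5), 'response' ('u'/'a'), 'rt_ms'
trials = pd.DataFrame({
    "token":    np.random.randint(1, 6, 1000),
    "response": np.random.choice(["u", "a"], 1000),
    "rt_ms":    np.random.uniform(200, 1200, 1000),
})

# Psychometric function: proportion of trials identified as /a/ per token
psychometric = trials.groupby("token")["response"].apply(lambda r: (r == "a").mean())

# Labeling speed: mean RT per token after excluding outliers (250-1000 ms)
valid = trials[trials["rt_ms"].between(250, 1000)]
mean_rt = valid.groupby("token")["rt_ms"].mean()
```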
Brainstem ERP response analysis.
Fast Fourier transforms (FFTs) were computed from the steady-state portion of brainstem time-waveforms (0–100 ms). We measured the magnitude of the response F0 to quantify the degree of neural phase-locking to the speech envelope frequency, a neural correlate most reflective of “voice pitch” encoding (Bidelman and Krishnan, 2010; Bidelman et al., 2011a, b; Parbery-Clark et al., 2013). Onset latency was also measured from each brainstem ERP as the peak positivity between 7 and 12 ms, the expected onset latency of the brainstem response (Galbraith and Brown, 1990; Bidelman et al., 2011c). Together, F0 amplitude and onset latency were used to assess the overall sustained magnitude (F0) and neural synchrony/precision (onset latency) of older adults' spectrotemporal encoding of speech as a function of musical experience.
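The sketch below illustrates how these two brainstem measures could be extracted from an averaged response (the input array and its time alignment are assumptions for illustration; the analysis windows follow the text).

```python
# Sketch of the brainstem-response measures described above (assumed inputs).
import numpy as np

FS = 20000.0   # sample rate (Hz)
F0 = 100.0     # voice fundamental of the stimuli (Hz)

def f0_amplitude(ffr, t0=0.0, t1=0.100):
    """FFT magnitude at the voice fundamental over the 0-100 ms steady state."""
    seg = ffr[int(t0 * FS):int(t1 * FS)]
    spec = np.abs(np.fft.rfft(seg)) / len(seg)
    freqs = np.fft.rfftfreq(len(seg), d=1.0 / FS)
    return spec[np.argmin(np.abs(freqs - F0))]

def onset_latency_ms(ffr, lo_ms=7.0, hi_ms=12.0):
    """Latency (ms) of the peak positivity within the expected onset window."""
    lo, hi = int(lo_ms / 1000 * FS), int(hi_ms / 1000 * FS)
    return (lo + np.argmax(ffr[lo:hi])) / FS * 1000.0

# ffr: averaged brainstem response assumed to begin at stimulus onset (microvolts)
ffr = np.random.randn(int(0.210 * FS))
print(f0_amplitude(ffr), onset_latency_ms(ffr))
```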
Cortical ERP response analysis.
Peak amplitude and latency were measured for prominent deflections of the cortical ERPs (P1, N1, P2, P3). Functionally, the early waves (P1-N1-P2) are presumed to reflect activity generated in thalamus, primary, and secondary auditory cortex and thus index the early cortical processing of complex sounds (Näätänen and Picton, 1987; Picton et al., 1999). In particular, the N1-P2 complex is highly sensitive to speech perception tasks (Tremblay et al., 2001; Alain et al., 2007; Bidelman et al., 2013, 2014a) and prone to the neuroplastic effects of speech sound training (Tremblay et al., 2001; Alain et al., 2007) and long-term musical experience (Shahin et al., 2003; Seppänen et al., 2012; Bidelman et al., 2014b). In contrast, the P3 component is thought to reflect later cognitive operations and has been linked to attentional reorienting, stimulus evaluation, and memory (Knight et al., 1989). Guided by prior literature (e.g., Irimajiri et al., 2005; Bidelman et al., 2014b), P1 was taken as the positivity between 60 and 80 ms, N1 the negative-going trough between 90 and 110 ms, P2 as the positive-going peak between 130 and 170 ms, and P3 as the positivity between 220 and 320 ms. The overall magnitude of the N1-P2 complex, computed as the voltage difference between the two individual waves, was used as an index of early (exogenous) perceptual speech processing; P3 amplitudes served as an index of later cognitive (endogenous) processing related to speech sound classification. ERP analysis and automated peak selection were performed using custom routines coded in MATLAB (The MathWorks).
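Peak selection was performed with custom MATLAB routines; the Python sketch below is an illustrative stand-in showing how the windowed peak picking and the N1-P2 magnitude described above could be implemented.

```python
# Illustrative peak picking; windows follow the text: P1 60-80 ms, N1 90-110 ms
# (trough), P2 130-170 ms, P3 220-320 ms; N1-P2 = voltage difference of the peaks.
import numpy as np

FS = 20000.0

def peak_in_window(erp, lo_ms, hi_ms, polarity=+1):
    """Return (amplitude, latency_ms) of the extremum within a window.
    The ERP array is assumed to start at time zero (stimulus onset)."""
    lo, hi = int(lo_ms / 1000 * FS), int(hi_ms / 1000 * FS)
    idx = int(np.argmax(polarity * erp[lo:hi]))
    return erp[lo + idx], (lo + idx) / FS * 1000.0

def erp_components(erp):
    p1 = peak_in_window(erp,  60,  80, +1)
    n1 = peak_in_window(erp,  90, 110, -1)    # negative-going trough
    p2 = peak_in_window(erp, 130, 170, +1)
    p3 = peak_in_window(erp, 220, 320, +1)
    return dict(P1=p1, N1=n1, P2=p2, P3=p3, N1P2=p2[0] - n1[0])

erp = np.random.randn(int(0.600 * FS))        # simulated post-stimulus ERP
print(erp_components(erp))
```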
Neurometric identification functions derived from cortical ERPs.
We constructed “neurometric” identification functions derived from the cortical ERPs using a data-driven approach (i.e., unsupervised classification and clustering) that implements the definition of CP (Chang et al., 2010; Bidelman et al., 2013). Within-category speech sounds are perceived as belonging to the same class and therefore should elicit similar neural activity patterns, whereas across-category tokens are heard as dissimilar and, thus, should elicit more divergent neural responses (e.g., Chang et al., 2010). In speech perception, psychophysical differences are often explored using confusion matrices representing the perceptual dissimilarity/similarity between speech sounds. Analogous “neural dissimilarity” matrices (Chang et al., 2010; Bidelman et al., 2013) were computed separately for each group within a 20 ms time window centered on the cortical P2 wave of the ERP. We chose this analysis time window because our previous study demonstrated that categorical neural organization in the ERPs of younger adults did not emerge before the P2 wave (Bidelman et al., 2013). Within this window, the standardized Euclidean distance was computed between the raw voltage waveforms of all pairwise response combinations and used to construct each dissimilarity matrix (see Fig. 5a). Each matrix cell quantifies the degree to which neuroelectric activity differs between a given pair of vowel stimuli.
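A minimal sketch of this computation is shown below (the simulated responses and the assumed P2 latency of 150 ms are for illustration only); pairwise standardized Euclidean distances within the 20 ms P2-centered window yield the 5 × 5 neural dissimilarity matrix.

```python
# Sketch of the neural dissimilarity computation (simulated inputs).
import numpy as np
from scipy.spatial.distance import pdist, squareform

FS = 20000.0

# erps: 5 x n_samples array of vowel-evoked cortical ERPs (vw1-vw5), time zero at onset
erps = np.random.randn(5, int(0.600 * FS))

p2_ms = 150.0                                   # P2 latency (assumed here)
lo = int((p2_ms - 10) / 1000 * FS)
hi = int((p2_ms + 10) / 1000 * FS)
segments = erps[:, lo:hi]                       # 20 ms window centered on P2

# 5 x 5 neural dissimilarity matrix (standardized Euclidean distance)
dissimilarity = squareform(pdist(segments, metric="seuclidean"))
```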
Neural dissimilarity matrices were then submitted to a multidimensional scaling (MDS) analysis to visualize differences in neurophysiological responses across stimuli in a two-dimensional Euclidean space. MDS is a popular tool for examining perceptual dissimilarities between auditory stimuli (Shepard, 1980; Borg and Groenen, 2005); we use it here to similarly examine the dissimilarity of neural responses elicited by a speech continuum. Graphically, points in MDS space represent empirical distances between brain responses akin to geographic distances between cities on a map. Within-category speech sounds are more difficult to discriminate and, thus, elicit responses that are positioned closer together in MDS space; across-category tokens are easier to discriminate and, hence, are well separated geometrically (Chang et al., 2010; Bidelman et al., 2013). A stress value <0.1, representing the reconstruction's “badness of fit,” was obtained with an MDS solution of only two dimensions, indicating an adequate fit to the data (Borg and Groenen, 2005).
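The sketch below illustrates this step using scikit-learn's metric MDS as a stand-in for whichever MDS implementation was used, with Kruskal's stress-1 (one common "badness-of-fit" index) computed explicitly so the <0.1 criterion can be checked; the example dissimilarity matrix is synthetic.

```python
# MDS sketch with an explicit Kruskal stress-1 check (illustrative only).
import numpy as np
from sklearn.manifold import MDS

# Example 5 x 5 symmetric dissimilarity matrix (stand-in for the neural matrix)
rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 3))
dissimilarity = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarity)          # 5 points in 2-D Euclidean space

# Kruskal stress-1: sqrt( sum (d_fit - d)^2 / sum d^2 ) over the upper triangle
fitted = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
iu = np.triu_indices_from(dissimilarity, k=1)
stress1 = np.sqrt(((fitted[iu] - dissimilarity[iu]) ** 2).sum()
                  / (dissimilarity[iu] ** 2).sum())
print(coords, stress1)                             # stress1 < 0.1 => adequate fit
```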
We applied k-means clustering (k = 2) to the MDS solution to test whether ERPs grouped in a way that paralleled the psychophysical classification based solely on differences between the evoked activity generated by each vowel sound. The choice of two clusters was based on the a priori knowledge that, perceptually, our stimuli fall into one of two distinct phonetic categories (i.e., /u/ and /a/) (compare Chang et al., 2010; Bidelman et al., 2013) and the fact that our behavioral task forced listeners to make a binary decision when labeling tokens. Objects falling within a given cluster were considered to be representatives of the same vowel identity (i.e., phonetic category) (see Fig. 5b).
“Neurometric” identification functions were then constructed for each phonetic category by computing the normalized distance between each of the individual responses (as represented in MDS space) and each of the two cluster means (representing a neural exemplar, or “template,” for each vowel category) (for details, see Chang et al., 2010; Bidelman et al., 2013). Generated with respect to both cluster means, the resulting functions estimate how well the neural activity evoked by each vowel stimulus fits into one of the two discrete phonetic categories (see Fig. 5c). The rationale behind this approach is that categorical speech sounds are perceived as belonging to the same class and therefore should elicit similar neural activity patterns, whereas across-category tokens are heard as dissimilar and, thus, should elicit more disparate neural responses in the MDS solution (Chang et al., 2010; Bidelman et al., 2013). Given the inherent symmetry in a binary-choice CP task, we report only one side of these identification functions (i.e., identification of /u/). Comparisons between the degree to which older musicians' and nonmusicians' neurometric functions paralleled their psychometric functions (i.e., via correlations) allowed us to assess whether or not older musicians show more pronounced categorical encoding/organization for speech.
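The following sketch combines the k-means step (k = 2) with the derivation of neurometric identification values from normalized distances to the two cluster means. It is a simplified reading of the cited procedure (Chang et al., 2010; Bidelman et al., 2013); the exact normalization and the example MDS coordinates are assumptions.

```python
# Sketch: k = 2 clustering of MDS coordinates and a neurometric identification
# function from normalized distances to the two cluster means (illustrative).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
coords = rng.normal(size=(5, 2))               # example MDS coordinates of vw1-vw5

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
centers = km.cluster_centers_                  # neural "templates" for the two categories

# Distance of each vowel's response to each cluster mean (5 x 2)
d = np.linalg.norm(coords[:, None, :] - centers[None, :, :], axis=-1)

# Normalized (relative) distance -> identification function for one category
# (here cluster 0; in practice the /u/ cluster is identified by which tokens it
# contains, and only one side of the function is reported, as in the text).
neurometric_u = d[:, 1] / d.sum(axis=1)        # closer to cluster 0 => larger value
print(km.labels_, neurometric_u)
```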
Statistical analyses.
Two-way, repeated-measures ANOVAs were conducted on all dependent variables. Group (2 levels; M, NMs) functioned as the between-subjects factor; vowel stimulus [5 levels; vowel (vw) 1–5] as the within-subjects factor. Tukey-Kramer multiple comparisons controlled Type I error inflation. A priori significance level was set at α = 0.05.
Multiple least-squares regression was used to determine the extent to which brainstem and cortical ERPs could predict each group's behavioral CP for speech (Bidelman et al., 2014b). We constructed a regression model (per group) consisting of both simple main effects and an interaction term: Ψ_ID = β1·BS_ERP + β2·C_ERP + β3·(BS_ERP × C_ERP), where Ψ_ID represents a listener's behavioral speech classification speed, BS_ERP is the magnitude of brainstem F0 encoding, and C_ERP is the cortical N1-P2 response to speech. N1-P2 responses were selected for the regression analyses (as opposed to the P2 amplitudes used in the neural dissimilarity analysis) to be consistent with, and allow direct comparison to, metrics used in our previous reports (Bidelman et al., 2013, 2014b). β1–β3 represent to-be-estimated scalar coefficients, computed via least squares, that weight each of these neural factors in the regression model. Regression coefficients were standardized (total variance = 1) to equate the scales between variables and allow us to estimate their individual predictive power on human behavior. Adjusted R2 was used to assess model fits; it increases only if additional terms improve the model more than expected by chance. Additionally, pairwise correlations were used to explore the link between subcortical and cortical speech representations (brainstem: F0 amplitude; cortical: N1-P2 magnitude) and behavioral speech identification performance.
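For concreteness, a minimal least-squares sketch of this model (standardized predictors, interaction term, and adjusted R2) is given below; the data are random placeholders and the variable names are hypothetical.

```python
# Sketch of RT ~ b1*BS + b2*C + b3*(BS x C) with standardized variables
# and adjusted R^2 (placeholder data; illustrative only).
import numpy as np

rng = np.random.default_rng(2)
n = 50                                        # e.g., observations per group
rt = rng.normal(size=n)                       # behavioral classification speed
bs = rng.normal(size=n)                       # brainstem F0 amplitude
cx = rng.normal(size=n)                       # cortical N1-P2 magnitude

z = lambda v: (v - v.mean()) / v.std(ddof=1)  # standardize to equate scales
y, x1, x2 = z(rt), z(bs), z(cx)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])   # intercept, mains, interaction
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
r2 = 1 - resid.var() / y.var()
p = X.shape[1] - 1                                   # number of predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(beta[1:], adj_r2)                              # standardized b1-b3, model fit
```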
Results
Behavioral speech identification
Despite continuous changes in stimulus acoustics, listeners heard a clear perceptual shift in the phonetic category (/u/ vs /a/) near the continuum's midpoint (Fig. 1a). The overall location of the perceptual boundary did not differ between groups nor did vowel identification scores (F(1,18) = 0.01, p = 0.99). However, we found a main effect of group on vowel RTs (F(1,18) = 11.06, p = 0.0013), indicating that Ms were faster at speech sound classification than NMs. In both groups, participants were slower at classifying speech tokens near the CP boundary (vw3) relative to others in the continuum, consistent with previous reports examining speeded phonetic classification (Pisoni and Tash, 1974; Bidelman et al., 2013, 2014b).
Brainstem ERPs to speech
Speech-evoked brainstem waveforms and spectra illustrate the phase-locked, neurophonic nature of the brainstem response, which preserves the spectral information of the speech stimuli with high fidelity (Fig. 2). No group differences were observed in brainstem F0 amplitudes (F(1,18) = 2.97, p = 0.10), suggesting that musicianship does not offset normal age-related degradation in the magnitude of subcortical speech encoding (e.g., Parbery-Clark et al., 2012a; Bidelman et al., 2014a) (Fig. 2b). Yet, pooled across stimuli, Ms showed earlier brainstem response latencies (independent samples t test; t(18) = −1.92, p = 0.036) than their NM peers, indicating more efficient (i.e., faster) subcortical speech processing in the aged brain of Ms (Fig. 2c).
Cortical ERPs to speech
Figure 3 shows waveforms and response properties of the cortical ERPs elicited by the vowel continuum. Cortical ERP amplitude was characterized by considerable variability, and no group differences were observed in the magnitudes of the individual waves of the P1-N1-P2 complex (P1 wave: F(1,18) = 0.82, p = 0.37; N1 wave: F(1,18) = 0.17, p = 0.68; P2 wave: F(1,18) = 0.05, p = 0.83; P1-N1 complex: F(1,18) = 0.28, p = 0.60). N1-P2 amplitudes were, however, strongly modulated by the vowel stimulus (F(4,72) = 5.58, p < 0.001) (Fig. 3c) (group × vowel: F(4,72) = 0.69, p = 0.60). Pooling both groups, follow-up contrasts revealed that this stimulus effect was largely attributable to listeners having weaker responses to the ambiguous vowel token (vw3) than to the exemplar vowel stimuli (vw1, vw5) (t(19) = 3.52, p = 0.0023). Higher amplitude for easily categorized versus phonetically ambiguous sounds supports the notion that cortical activity distinguishes speech information based on its phonetic rather than acoustic properties (Bidelman et al., 2013, 2014a).
Compared with the early cortical ERPs, P3 showed a prominent vowel × group interaction (F(1,72) = 3.60, p = 0.009). This was driven by Ms having larger P3 responses, particularly for vw5 (/a/). A priori contrasts revealed that P3 responses of Ms differentiated ends of the vowel continuum (/u/ vs /a/ response, paired-samples t test: t(9) = 3.22, p = 0.01), whereas NM responses did not (t(9) = 0.11, p = 0.91) (Fig. 3a, insets). Larger P3 in older musicians suggests enhanced attentional reorienting within the soundscape and/or improved neural differentiation of speech with musical experience.
In contrast to evoked potential amplitudes, latency measures showed more marked group effects. Overall, Ms had shorter N1 latencies than NMs (F(1,18) = 5.12, p = 0.03) (Fig. 3d). The analysis of P2 revealed a vowel × group interaction (F(1,72) = 2.41, p = 0.05) (Fig. 3e), which was driven by Ms having earlier responses for vw3 and vw5. No group differences were observed in P1 (F(1,18) = 0.31, p = 0.58) or P3 (F(1,18) = 0.03, p = 0.88) latencies. Collectively, findings indicate earlier and more robust cortical encoding of speech in older adults with musical experience.
Brain-behavior correspondences in speech processing
Pairwise correlations probed the connection between brainstem, cortical, and behavioral responses underlying CP (Fig. 4). We found that both brainstem and cortical ERPs predicted listeners' speed in speech sound classification but did so differentially depending on group membership. In both Ms and NMs, stronger brainstem encoding of the speech envelope (i.e., voice pitch) was positively correlated with RTs: stronger responses predicted slower classification speeds in both groups to nearly the same degree (M: r = 0.39, p = 0.0042; NM: r = 0.37, p = 0.0076) (Fig. 4a,b). In contrast, N1-P2 cortical response amplitudes showed a strong negative correlation with behavior in Ms (r = −0.40, p = 0.0042), an effect not observed for NMs (r = 0.00, p = 0.99). That is, larger N1-P2 amplitudes were associated with faster speech RTs in older Ms but not NMs. Collectively, our results show that, in older adults with musical experience, larger cortical responses (but smaller brainstem responses) predicted faster behavioral speech classification performance. The dissociation between subcortical and cortical responses in predicting behavior (brainstem ERPs: positive correlation; cortical ERPs: negative correlation) further highlights that encoding at lower- and higher-order stages of the auditory pathway might differentially influence behavior. Similar dissociations between levels of auditory processing and speech behaviors have been reported in older adults (Bidelman et al., 2014a). These effects have been interpreted as reflecting an age-related change in central inhibition and its differential impact on subcortical versus cortical levels of the auditory pathway.
Multiple least-squares regression was used to determine the extent to which brainstem and cortical ERPs could predict each group's behavioral RTs (Bidelman et al., 2014b). We used N1-P2 amplitudes for the cortical regressor given that (1) these responses showed a profile of categorical encoding (e.g., Fig. 3c) and (2) they provided a single, parsimonious measure of listeners' total cortical response to speech. F0 amplitudes were used as the brainstem regressor as this index captures the overall brainstem encoding of speech. The weighting coefficient (β value) computed for each variable reflects the degree to which that neural measure predicts behavior. The resultant regression functions are shown in Table 1. Model fits show that brainstem and cortical responses were robust predictors of Ms' behavioral performance in the CP task (adjusted R2 = 0.22, p < 0.01). This same combination of neural markers was much weaker (unreliable) in predicting behavior for NMs (adjusted R2 = 0.09, p = 0.06). Only brainstem responses held significant predictive power for the NM group. The higher correspondence between multiple brain responses and perception suggests that, in older musicians, neural representations carry more behaviorally relevant information within the auditory brain and lead to better behavioral outcomes (cf. younger musicians: Bidelman et al., 2014b).
Neural speech classification from cortical ERPs
Having established that musicianship modulates and enhances speech processing across the aged brain, we aimed to more definitively evaluate whether older Ms showed a more pronounced categorical organization for speech than their age-matched NM peers. Using raw voltage differences between responses, we constructed neural dissimilarity matrices, analogous to perceptual confusion matrices, to quantify the degree to which each group's brain activity could differentiate vowel stimuli (Fig. 5a). MDS applied to dissimilarity scores provides a visualization of response dissimilarities in a common Euclidean space; distances between objects quantify the magnitude of neural response dissimilarity (Fig. 5b). Consistent with our findings in young adults (Bidelman et al., 2013), examination of MDS “maps” showed that within-category speech sounds elicited similar patterns of neural activity and appeared in closer proximity in geometric space; across-category tokens elicited divergent activity and were mapped farther apart. Clustering performed on MDS solutions revealed that cortical responses generated across the speech continuum could be meaningfully segregated into two distinct groupings (i.e., /u/ and /a/ clusters) mimicking the two phonetic classes heard by listeners. Ms' MDS maps revealed that speech sounds clustered in a consistent manner that paralleled perception (e.g., vw1–2 grouped near one another but were remote from vw4–5). In contrast, NMs' maps revealed patterns of misclassification in which perceptually similar vowels were erroneously assigned to opposing clusters (e.g., vw1 and vw2).
To further evaluate correspondences between brain and behavior, we derived neurometric identification functions using the distance between the evoked responses (as represented in MDS space) elicited by each vowel stimulus (i.e., Fig. 5b) and the two cluster means (representing a neural “template” for each vowel category). Neurometric functions provided estimates of how well each group's pattern of neural activity evoked by each vowel fit into one of the two discrete phonetic categories. At the group level, Ms' neural classification functions were strikingly similar to behavioral identification scores and closely mirrored their psychometric counterparts (Fig. 5c; r = 0.93, p = 0.02). In contrast, NMs' neural responses were less reliable in predicting their behavioral CP (r = 0.77, p = 0.13). Brain-behavior correlations were less robust at the individual level in both groups (Fig. 5c, inset). Nevertheless, Ms showed higher correlations between their neurometric and psychometric identification than NMs (t(9) = 2.14, p = 0.03). These findings demonstrate that, in older adults, cortical speech representations are arranged according to a categorical (i.e., phonetic) rather than purely acoustic code, corroborating recent data in younger listeners (Bidelman et al., 2013). More critically, they provide convincing evidence that the neural organization of speech is more categorical in older adults who have engaged in musical training.
In summary, the current study yields three main observations: relative to hearing- and age-matched peers, older adults with musical training show (1) improved (i.e., faster, more robust) neural encoding and differentiation of speech at both subcortical and cortical levels of the auditory system; (2) neural representations that are more strongly coupled to perception; and (3) an enhanced ability to classify (i.e., categorically identify) speech information. These findings add to a growing body of literature suggesting that musicianship tunes multiple aspects of the auditory nervous system, which, in turn, benefit speech-listening abilities (Parbery-Clark et al., 2009; Bidelman and Krishnan, 2010; Bidelman et al., 2011b, 2014b; Parbery-Clark et al., 2012a; Skoe and Kraus, 2012; Zendel and Alain, 2012, 2013; Chobert et al., 2014; Zendel and Alain, 2014).
Discussion
Our results offer important new insight into experience-dependent plasticity by demonstrating that a concert of neuroplasticity boosts speech-listening skills and counteracts the declines in speech understanding that emerge with age. These findings establish that musicianship engenders coordinated functional plasticity throughout the auditory system. Moreover, these results demonstrate that robust auditory neuroplasticity is not restricted to younger, more “plastic” brains (Stiles, 2000) but, rather, extends across the lifespan.
Effective speech understanding requires that the auditory system faithfully transcribe acoustic information and maintain these neural representations through various signal transformations from periphery to percept. Under normal circumstances, neural representations along the ascending auditory pathway are made less redundant (i.e., more abstract) at successive stages so as to allow for easier readout in higher-level structures (Chechik et al., 2006; Bidelman et al., 2014a). Comparing brainstem and cortical speech evoked potentials in the same listeners, we have shown that this normal hierarchy along the neuroaxis becomes more redundant (i.e., more correlated) with increasing age (Bidelman et al., 2014a). Increased redundancy in the neural code reduces flexibility within the aging auditory system and ultimately impairs the acoustic–phonetic mapping necessary for robust speech understanding (Bidelman et al., 2014a). In the current study, we found more efficient and robust neurophysiological processing of speech at multiple tiers of auditory processing, paralleling the enhancements reported in younger musicians (Bidelman et al., 2014b). Moreover, older musicians' brain responses showed higher correlations between processing at brainstem and cortical levels and behavior. These results imply stronger, more pervasive coupling between auditory brain areas subserving speech in musically trained listeners (e.g., Bidelman et al., 2014b). They also extend previous investigations documenting younger musicians' increased intracerebral connectivity between auditory cortices (Kühnis et al., 2014) and enhanced preattentive speech processing (Chobert et al., 2014) by demonstrating (1) similar functional enhancements between subcortical and cortical auditory brain areas and (2) improved speech processing and perception in older individuals. Critically, older musicians' coordinated brainstem-cortical plasticity was directly related to behavioral outcomes, as these listeners also showed improved speed for phonetic speech classification. Together, our findings support the notion that weakened and delayed neural responses to speech observed with age (Parbery-Clark et al., 2012b; Bidelman et al., 2014a) are counteracted with musical experience (e.g., Parbery-Clark et al., 2012b; Zendel and Alain, 2013).
Classic models of speech perception often include “distortion” factors to account for the effects of aging on speech perception (Plomp, 1986). Such distortions may result from the biological declines in neural inhibition (Caspary et al., 2008) and response timing precision (Anderson et al., 2012; Parbery-Clark et al., 2012b; Bidelman et al., 2014a) that accompany normal aging. Age-related changes in rapid acoustic processing are thought to partially underlie older adults' deficits in speech understanding (Gordon-Salant and Fitzgibbons, 1993; Frisina, 2010). Here, we show that older adults with musical training have faster, more efficient neural responses at both subcortical and cortical levels of speech processing.
With few exceptions (P3 wave), improved neural timing was observed in the absence of overall changes in ERP amplitude. Indeed, older musicians tended to have smaller brainstem response amplitudes (although the group effect was not significant) that occurred earlier than their nonmusician peers, an effect apparent but not discussed in previous reports (Parbery-Clark et al., 2012a). A plausible explanation of the slightly more exaggerated and slower brainstem responses in our nonmusician cohort is a decrease in neural inhibition that accompanies normal aging (Caspary et al., 2008). We have previously shown a general slowing of the nervous system's responsiveness to speech in older compared with younger adults who have never received musical training (Bidelman et al., 2013). Here, we show that musical experience counteracts this normal age-related decline in neural inhibition. Neurophysiologically, this maintenance of neural precision confers central auditory processing benefits in older musicians, resulting in neural responses to speech that begin to approach those of younger individuals (Zendel and Alain, 2012; Bidelman et al., 2014b). Together, our findings suggest that musicianship largely combats the overall slowing of auditory processing observed with age rather than the overall responsiveness of the brain, per se. Higher efficacy and/or improved temporal precision of neural processing may account for the faster speech sound identification we observed behaviorally in older musicians.
More critically, comparison between neurometric and psychometric speech identification revealed a stronger categorical representation of speech in older musicians (i.e., neural organization more closely mirrored perception) (Fig. 5). These findings imply a differential pattern of speech processing between groups whereby older musicians' neural code carries more behaviorally relevant information about the speech signal than that of older nonmusicians. Neural representations for speech are distorted and become less categorical with advancing age (Bidelman et al., 2014a). It is thus noteworthy that we find musicianship endows elderly listeners with more refined mental representations of the phonemic inventory of their native vowel space. We argue that older musicians' more robust and selective internalized representations for speech across the auditory pathway supply more faithful phonemic templates to the decision mechanisms governing speech sound identification (Bidelman et al., 2014b). Evidence that enhanced exogenous representations feed more robust decision-related processes comes from older musicians' larger P3 responses and greater neural differentiation of speech, the P3 being a component thought to reflect attentional reorienting. Interestingly, enhanced attentional processing resulting from musicianship is evident only in older (Zendel and Alain, 2014) but not younger adult musicians (Baumann et al., 2008; Bidelman et al., 2014b; Zendel and Alain, 2014). However, it should be noted that, in young children, musical expertise can sometimes facilitate both passive (preattentive) and active (attentional) processing of speech information (Chobert et al., 2011, 2014). Collectively, converging studies corroborate the notion that exogenous sensory processing of acoustic information (1) is weakened by normal aging (Bidelman et al., 2014a), (2) is strengthened in the brains of musicians (Parbery-Clark et al., 2012b; Skoe and Kraus, 2012; Bidelman et al., 2014b; Zendel and Alain, 2014; current study), and (3) is further bolstered by improved endogenous, compensatory mechanisms in the musical brain (current study; Zendel and Alain, 2014). Together, these results establish a plausible neurobiological basis to account for older musicians' heightened acoustic–phonetic decisions and their enhanced speech identification performance.
Presumably, these neuroplastic effects might be mediated by a stronger interplay between auditory brain regions tuned through long-term experience analyzing complex sounds. Stronger exchange between brainstem and cortical levels in the aged brain would tend to reinforce feedforward and feedback information transfer throughout auditory and nonauditory brain areas (Bajo et al., 2010; Bidelman et al., 2014b). Enhanced links between lower- and higher-order brain regions may act to refine signal representations and ultimately sharpen older adults' behavioral acuity for speech signals as observed in musicians of this and previous reports (Parbery-Clark et al., 2009; Bidelman and Krishnan, 2010; Zendel and Alain, 2012; Bidelman et al., 2014b). It is reasonable to infer that the higher brainstem-cortical-behavioral correlations observed for young (Bidelman et al., 2014b) and now older musicians (current study) might reflect enhanced functional connectivity and increased reciprocity between subcortical and cortical hubs of auditory processing (Bidelman et al., 2014b). Although our study is limited to examining experience-dependent plasticity above the level of the inferior colliculus (the putative generator of the human brainstem ERP), recent work suggests changes in musicians' brain function as early as the cochlea (Bidelman et al., 2014c; Perrot and Collet, 2014). Thus, it is conceivable that older musicians' functional benefits for speech result from signal enhancements as early as the auditory periphery.
In conclusion, we found that long-term music engagement tunes a hierarchy of brain processing, increases flexibility along the auditory pathway, and renders more accurate neural representations that yield important speech percepts. That these neuroplastic effects are observed in older adults (cf. young adults: Bidelman et al., 2014b; Chobert et al., 2014; Kraus et al., 2014) demonstrates that musicianship has long-lasting benefits to brain and behavioral function and provides robust neuroplasticity across the lifespan. The longevity (i.e., “sticking power”) of most auditory–cognitive training regimens is undefined or remains to be fully validated (e.g., Owen et al., 2010; Henshaw and Ferguson, 2013). Musicianship, on the other hand, enhances the neural encoding of speech in early childhood (Strait et al., 2012), and auditory processing advantages are retained into adulthood, even when music lessons are interrupted in adolescence (Skoe and Kraus, 2012; White-Schwoch et al., 2013). Together with the current data, we argue that intense musical engagement early in life yields life-long listening advantages that improve speech-language skills (Skoe and Kraus, 2012; Zendel and Alain, 2012). However, given that our study was limited to examining semi-expert listeners, it remains to be determined whether musical training programs mitigate perceptual–cognitive declines in older individuals who begin training later in life (e.g., Bugos et al., 2007; Hanna-Pladdy and MacKay, 2011).
Recent longitudinal studies are promising in this regard. Randomized controlled studies have shown that enriched music training programs (∼1–2 years) boost speech processing in children at risk for language and learning problems (Kraus et al., 2014; see also Fujioka et al., 2006; Chobert et al., 2014). Future training studies are needed to characterize the time course of these neuroplastic effects and determine whether the same functional benefits observed here in highly experienced older adults (and previously in younger adults) are achieved with short-term music training programs. Nevertheless, our study provides direct neurobiological evidence that musicianship impacts the neural architecture supporting critical speech-listening skills in older individuals. Our findings therefore underscore the importance of arts and music instruction in school and rehabilitative programs to complement traditional curricula (President's Committee on the Arts and the Humanities, 2011). Musical activities are arguably more engaging than typical cognitive “brain training” programs and, thus, may offer similar reward with lower rates of attrition.
Footnotes
This work was supported in part by Canadian Institutes of Health Research Grant CIHR MOP 106619 to C.A. and the GRAMMY® Foundation awarded to G.M.B. We thank Michael Weiss and Joshua Villafuerte for their assistance in data collection.
The authors declare no competing financial interests.
- Correspondence should be addressed to Dr. Gavin M. Bidelman, School of Communication Sciences & Disorders, University of Memphis, 807 Jefferson Avenue, Memphis, TN 38105. g.bidelman@memphis.edu