Abstract
Musicians are often reported to have enhanced neurophysiological functions, especially in the auditory system. Musical training is thought to improve nervous system function by focusing attention on meaningful acoustic cues, and these improvements in auditory processing cascade to language and cognitive skills. Correlational studies have reported musician enhancements in a variety of populations across the life span. In light of these reports, educators are considering the potential for co-curricular music programs to provide auditory-cognitive enrichment to children during critical developmental years. To date, however, no studies have evaluated biological changes following participation in existing, successful music education programs. We used a randomized control design to investigate whether community music participation induces a tangible change in auditory processing. The community music training was a longstanding and successful program that provides free music instruction to children from underserved backgrounds who stand at high risk for learning and social problems. Children who completed 2 years of music training had a stronger neurophysiological distinction of stop consonants, a neural mechanism linked to reading and language skills. One year of training was insufficient to elicit changes in nervous system function; beyond 1 year, however, greater amounts of instrumental music training were associated with larger gains in neural processing. We therefore provide the first direct evidence that community music programs enhance the neural processing of speech in at-risk children, suggesting that active and repeated engagement with sound changes neural function.
Introduction
Community music programs provide an exciting model to offer widespread music training, especially to underserved children. Whereas private music lessons are prohibitively expensive, community programs bring together groups of children, channeling their creativity and energy away from damaging alternatives. Reports of programs such as El Sistema (Caracas, Venezuela) suggest these programs accomplish more than providing children with an enjoyable activity—participants stay in school, do well in school, and pursue postsecondary education more frequently than their peers (Majno, 2012). To date, however, few studies have asked whether these community music programs have a biological impact on the developing nervous system.
Myriad cross-sectional studies have reported behavioral and neurophysiological differences between musicians and non-musicians (Bidelman et al., 2011; Parbery-Clark et al., 2012; Seppänen et al., 2012; for review see Strait and Kraus, 2014); these “musician effects” are predominantly attributed to training-related plasticity. This interpretation is supported by evidence from humans and animals that the nervous system has profound potential for functional reorganization following auditory training, imparting a positive impact on everyday communication (Recanzone et al., 1993; Blake et al., 2006; Kilgard, 2012; Anderson et al., 2013; Anguera et al., 2013; Heim et al., 2013; Engineer et al., 2014). It is thought that music training can effect structural and functional neural changes (i.e., experience-dependent plasticity; Kraus and Chandrasekaran, 2010; Patel, 2011; Herholz and Zatorre, 2012; Zatorre, 2013) because music engages widely distributed sensory, cognitive, and reward networks in the brain, the very networks whose integration drives neuroplasticity. However, only a small number of longitudinal studies have described a direct effect of music training (Fujioka et al., 2006; Moreno et al., 2009; Johnson et al., 2013; Tierney et al., 2013; Chobert et al., 2014), and debate persists over whether musician effects reflect innate differences between musicians and non-musicians or a causal role for music training (Corrigall et al., 2013; Zatorre, 2013). There is, nevertheless, encouraging longitudinal evidence that music training can improve automatic sound processing in children in this age range (Putkinen et al., 2014).
These music enhancements do not only manifest neurophysiologically: musicianship is associated with a host of cognitive benefits for listening and learning. These include auditory memory and attention (Koelsch et al., 1999; Strait et al., 2010; Kraus et al., 2012), general intelligence and executive functions (Schellenberg, 2004; Moreno et al., 2011), understanding speech in noisy environments (Parbery-Clark et al., 2009b; Zendel and Alain, 2012), language processing (Milovanov et al., 2008), and literacy skills (reviewed in Tierney and Kraus, 2013). Therefore, large-scale community interventions have the potential to instill salient behavioral benefits in children that can set them up for better learning in and out of the classroom.
Motivated by cross-sectional studies of music training (Elbert et al., 1995; Gaser and Schlaug, 2003; Bidelman et al., 2011), and the overlap of biological mechanisms of speech and music (Patel, 2011, 2010), here we asked whether participation in an established community music program changes auditory neurophysiology. We hypothesized that participation improves the neural processing of speech syllables. To test this hypothesis, we used a randomized control design in collaboration with Harmony Project (Los Angeles, CA), a longstanding and successful community music program that has provided free music instruction to >1000 children from Los Angeles gang-reduction zones. We measured neural responses to contrastive speech sounds before and after training, and in light of cross-sectional studies of childhood musical training (Strait et al., 2014), we predicted that music training improves the neural differentiation of speech.
Materials and Methods
Subjects.
Forty-four children, aged 80–112 months (mean 99 months; 8.25 years; 25 girls) at Year 1, participated in a hybrid randomized control design. All were public-school pupils living in Los Angeles gang-reduction zones. Subjects were randomly assigned either to defer their participation in music lessons for 1 year and then undergo training (“Group 1,” N = 18, 1 year of total music) or begin music lessons immediately (“Group 2,” N = 26, 2 years of total music), all following Harmony Project's curriculum (see below). Targeted group assignment was conducted for the last few subjects to ensure that the two groups were age- and sex-balanced. Thus at Year 2, Group 2 had 1 year of music training; at Year 3, Group 2 had 2 years of music training and Group 1 had 1 year. At Year 1, groups were matched on age (t(42) = 1.196, p = 0.239), hearing thresholds (t(42) = 0.289, p = 0.774), maternal education (t(40) = 0.799, p = 0.429), IQ (t(41) = 0.419, p = 0.677), and proportion of females and males (p > 0.1). All subjects came from Harmony Project's waitlist, meaning the groups were equally motivated to pursue music training.
Intervention.
The musical training followed Harmony Project's standard curriculum. All children first attend group introductory musicianship classes (1 h per session, 2 sessions per week) consisting of instruction in fundamental skills such as pitch and rhythm identification, performance, notation, and basic recorder playing. Subjects generally progress to group instrumental instruction after 6 months, depending on instructor judgment of their proficiency in musicianship class and on access to instruments (provided at no cost to subjects). The details of instrumental and ensemble training vary across the program with instructor and seat availability, but comprise ≥4 h/week of group instruction. Instruments include strings, woodwinds, and brass winds.
Neurophysiological protocol.
At each test session (annually in July of 2011, 2012, and 2013) all subjects received a neurophysiological test battery consisting of click- and speech-evoked auditory brainstem responses administered using the SmartEP platform (Intelligent Hearing Systems). The click-evoked response was conventionally administered (Hall, 2006) and all children were within normal limits for response latency. The speech-evoked responses combine neural responses to transient and sustained (frequency-following) features in speech that, together, offer insight into the precision of automatic auditory processing (Skoe and Kraus, 2010). Despite their subcortical origin, these responses reflect short- and long-term influences from auditory cortical and nonauditory regions, because the brainstem is an integrative “hub” of auditory processing (Kraus and Nicol, 2014). Two synthesized, voiced consonant–vowel syllables, [ba] and [ga], differing only in the onset frequency of the second formant, were delivered to the right ear via insert earphones at 80 dB SPL. See Hornickel et al. (2009) for a complete acoustic description of the syllables. Each syllable was presented 6000 times in alternating polarity at a rate of 4.35/s. Responses were recorded from vertex (Cz) referenced to right earlobe, digitized at 13.333 kHz, and filtered on-line from 0.05 to 3 kHz. Responses to the two presentation polarities were averaged separately and subtracted to enhance the spectral component of the response (Aiken and Picton, 2008).
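The polarity-subtraction step can be illustrated with a minimal sketch. It is not the authors' processing code: it assumes a toy response made of one component that keeps its sign when the stimulus polarity flips and one that inverts with it, and the function name, the test frequencies, and the scaling by one half are all illustrative assumptions.

```python
import math

def subtract_polarities(avg_pos, avg_neg):
    """Half the difference between the two polarity averages.

    Components that invert with stimulus polarity are kept and components
    that do not invert cancel. The 1/2 scaling is an assumption of this sketch.
    """
    return [(p - n) / 2.0 for p, n in zip(avg_pos, avg_neg)]

FS_HZ = 13333.0   # digitization rate reported in the protocol
N_SAMPLES = 200   # arbitrary length for the toy example
times_s = [k / FS_HZ for k in range(N_SAMPLES)]

# Hypothetical components of an averaged response (not real data).
invariant = [math.sin(2 * math.pi * 100.0 * t) for t in times_s]         # same sign for both polarities
inverting = [0.5 * math.sin(2 * math.pi * 1000.0 * t) for t in times_s]  # flips with polarity

avg_pos = [a + b for a, b in zip(invariant, inverting)]
avg_neg = [a - b for a, b in zip(invariant, inverting)]

subtracted = subtract_polarities(avg_pos, avg_neg)
```

In this toy case the subtracted waveform equals the polarity-inverting component, which is the sense in which subtraction emphasizes the spectral component.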
Cross-phaseogram procedure.
A time-frequency cross-phaseogram approach, first described by Skoe et al. (2011), was used to quantify the difference in response timing between the two evoking consonants. This technique comprises computing a short-term cross-phase spectrum resulting in a time-frequency matrix of phase differences. With this pair of stimuli, the response to [ga] phase leads the response to [ba] in a typically operating auditory system. This is because [ga] has higher frequency content in the first 50 ms of the syllable; higher frequencies activate more basal regions of the cochlea initiating an earlier neural volley. When depicted in graphical form as in Figure 1, the phaseogram's abscissa is time, in milliseconds (0 = stimulus onset), the ordinate is frequency, in Hertz, and the phase difference in radians is depicted in color. Green represents no phase difference; warm colors indicate the response to [ga] leading the response to [ba] and cool colors indicate [ba] leading [ga].
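The idea of a short-term cross-phase spectrum can be sketched as follows. This is an illustrative simplification, not the implementation of Skoe et al. (2011): the Hann window, the window length, the hop size, and all function names are assumptions. It relies on the fact that a pure lead of Δt seconds at frequency f appears as a phase difference of 2πfΔt radians.

```python
import cmath
import math

def dft_at(segment, freq_hz, fs_hz):
    """Hann-windowed DFT coefficient of a segment at a single frequency."""
    n = len(segment)
    total = 0j
    for k, x in enumerate(segment):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1))
        total += x * window * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
    return total

def cross_phase(resp_ga, resp_ba, freq_hz, fs_hz):
    """Cross-spectrum phase in radians; positive when resp_ga leads resp_ba."""
    cross = dft_at(resp_ga, freq_hz, fs_hz) * dft_at(resp_ba, freq_hz, fs_hz).conjugate()
    return cmath.phase(cross)

def cross_phaseogram(resp_ga, resp_ba, freqs_hz, fs_hz, win_len, hop):
    """Time-frequency matrix: one row per window, one column per frequency."""
    rows = []
    for start in range(0, len(resp_ba) - win_len + 1, hop):
        seg_ga = resp_ga[start:start + win_len]
        seg_ba = resp_ba[start:start + win_len]
        rows.append([cross_phase(seg_ga, seg_ba, f, fs_hz) for f in freqs_hz])
    return rows

FS_HZ = 13333.0   # digitization rate reported in the protocol
LEAD_S = 0.0001   # hypothetical 0.1 ms lead of the [ga] response
TONE_HZ = 1000.0  # hypothetical test frequency

# Synthetic "responses": identical tones, with the [ga] version shifted earlier.
resp_ba = [math.sin(2 * math.pi * TONE_HZ * (k / FS_HZ)) for k in range(800)]
resp_ga = [math.sin(2 * math.pi * TONE_HZ * (k / FS_HZ + LEAD_S)) for k in range(800)]

phaseogram = cross_phaseogram(resp_ga, resp_ba, [TONE_HZ], FS_HZ, win_len=266, hop=133)
expected_phase = 2 * math.pi * TONE_HZ * LEAD_S  # about 0.63 rad
```

With these synthetic signals every window yields a positive phase difference close to `expected_phase`, matching the convention that positive values correspond to [ga] leading [ba].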
Statistical analysis.
The dependent variable was an arithmetic mean of phase differences in a time-frequency “region of interest” (ROI) defined as 15–45 ms poststimulus onset and 0.9–1.5 kHz (Strait et al., 2014). This ROI corresponds to the second formant frequency over the time of maximal difference between the stimuli. Outlying data (>2 SDs from the group mean) were adjusted to exactly 2 SDs before analysis (Group 2, N = 4; Group 1, N = 2). Repeated-measures analyses of covariance (RMANCOVA) were computed, with age in months as a covariate. The repeated factor was test time (Year 1, Year 2, and Year 3) and the between-groups factor was participant group (Group 2, music training between all test times; Group 1, music training only between Year 2 and Year 3). Follow-up RMANCOVAs were conducted for each study group. Sphericity was confirmed for all within-subjects comparisons (Mauchly's ps > 0.750) and post hoc tests were Bonferroni corrected.
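The ROI average and the outlier adjustment can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and example values are hypothetical, the ROI bounds are treated as inclusive, and the sample standard deviation is used because the text does not say which SD estimator was applied.

```python
import statistics

def roi_mean(phase_matrix, times_ms, freqs_hz,
             t_range=(15.0, 45.0), f_range=(900.0, 1500.0)):
    """Arithmetic mean of phase differences inside the time-frequency ROI.

    phase_matrix[i][j] is the phase difference at times_ms[i] and freqs_hz[j].
    """
    values = [
        phase_matrix[i][j]
        for i, t in enumerate(times_ms) if t_range[0] <= t <= t_range[1]
        for j, f in enumerate(freqs_hz) if f_range[0] <= f <= f_range[1]
    ]
    return sum(values) / len(values)

def clip_to_2sd(group_values):
    """Move values lying more than 2 SDs from the group mean to exactly 2 SDs."""
    mean = statistics.mean(group_values)
    sd = statistics.stdev(group_values)  # sample SD: an assumption of this sketch
    low, high = mean - 2 * sd, mean + 2 * sd
    return [min(max(v, low), high) for v in group_values]

# Toy phase matrix (made-up numbers): rows are time bins, columns are frequency bins.
times_ms = [10.0, 20.0, 30.0, 50.0]
freqs_hz = [500.0, 1000.0, 1400.0, 2000.0]
phase_matrix = [[10.0 * i + j for j in range(4)] for i in range(4)]
roi_value = roi_mean(phase_matrix, times_ms, freqs_hz)  # averages the four in-ROI cells

# Made-up per-child ROI means for one group, with one outlying value at the end.
group = [0.10, 0.20, 0.15, 0.12, 0.18, 0.14, 0.16, 0.13, 0.17, 1.50]
adjusted = clip_to_2sd(group)
```

Only the last value in `group` exceeds the 2-SD bound, so it alone is moved to the bound; the other values pass through unchanged.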
Results
We observed a progressive enhancement of neurophysiological function with community music training when controlling for age (i.e., development). Children with 2 years of training (Group 2) showed a marked improvement in the neural differentiation of the syllables [ba] and [ga]. Across both groups, more music training was associated with larger enhancements in neural function.
We found an improvement in the neurophysiological distinction of contrastive speech sounds in children who participated in 2 years of music lessons, but not those who participated in only 1 year (Group × Year interaction, F(2,80) = 3.709, p = 0.029). Neurophysiological distinction of the syllables [ba] and [ga] is displayed in Figure 1 for each group, at each session, in a cross-phaseogram format. These figures provide an objective illustration of the timing differences between responses to the two speech syllables. Both groups evinced a moderate distinction of the syllables at Year 1, illustrated by the red swatch in a time-frequency bin corresponding to acoustic differences between the syllables (i.e., in their consonant–vowel transitions in a frequency region corresponding to the second formant; see Materials and Methods). This distinction is strengthened after 2 years of training, illustrated by larger and deeper red contrast at Year 3 in Group 2 (within-group main effect of year, F(2,48) = 6.670, p = 0.003). This strengthening occurred following the second year of music training (Year 2 vs Year 3, p = 0.010) with overall stronger distinction after 2 years (Year 1 vs Year 3, p = 0.025).
In Group 1 there was no change in neurophysiological distinction across the 3 years (within group main effect of year, F(2,32) = 1.634, p = 0.211). While there was no overall group difference (main effect of group, F(1,40) = 0.559, p = 0.459), there was a trending difference present at the third assessment (F(1,40) = 3.688, p = 0.062), with Group 2 having better neural differentiation than Group 1.
The group analysis suggested that more music training led to greater enhancements in neurophysiological function. We therefore asked whether there was a direct relationship between extent of music training (i.e., total hours of instrumental music practice over the 2 years) and extent of neurophysiological improvement. Indeed, we found that increasing hours of instrumental training predicted larger improvements in neural differentiation (r = 0.481, p = 0.001; Fig. 2). Together, these results suggest that community musical training improves neural differentiation of speech syllables and that more training leads to larger gains in neurophysiological function.
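A correlation of this kind can be computed as in the minimal sketch below. It assumes the reported r is a Pearson product-moment coefficient, which the text does not state explicitly, and the input values are made up for illustration; they are not study data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    return covariance_sum / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical values: hours of instrumental training and change in the
# ROI phase difference for five imaginary children.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
improvement = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(hours, improvement)
```

A positive r indicates that children with more hours tended to show larger improvements; it does not by itself establish the direction of the effect.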
Discussion
We show that 2 years of participation in a community music program improves the neurophysiological distinction of similar speech sounds. This is the first demonstration of biological changes in auditory processing following participation in community music programs using a randomized longitudinal design. These changes were in the neurophysiological distinction of contrastive speech syllables during passive listening, after active music training had stopped. This suggests that music training transferred to non-music listening settings to influence automatic auditory processing. Importantly, these improvements were in processes that are important for everyday communication: previous investigations have revealed that, as groups, children who are better readers and children who hear better in noise show stronger neural distinctions of these same syllables (Hornickel et al., 2009; Skoe et al., 2011; White-Schwoch and Kraus, 2013). These findings therefore provide support for the efficacy of community and co-curricular music programs to engender improvements in nervous system function. These children are from underserved backgrounds and stand at high risk for academic and social problems; this impoverishment carries concomitant biological insults (Bradley and Corwyn, 2002; Skoe et al., 2013). Our finding reveals the potential for neuroplasticity in the impoverished human brain (Neville et al., 2013), paralleling an effect shown in a rat model (Zhu et al., 2014). Moreover, our finding has a clear pragmatic implication by showing that community music programs may stave off certain language-based challenges.
What mechanisms drive these changes? We propose that the improvements observed in neurophysiological distinction of speech sounds were driven by top-down modifications to automatic auditory processing, with music training directing children's attention to meaningful sounds of their environment. This interpretation is consistent with Patel's OPERA hypothesis (overlap, precision, emotion, repetition, and attention; Patel, 2011), which stresses the importance of attentional involvement during training. Patel also identifies the importance of repetition during training; we see a strong role for the prolonged repetition of music practice, because 1 year of training was insufficient to affect nervous system function. In addition to OPERA, our view is broadly consistent with other theories of learning that impute a major role for directed attention to modulate future automatic sensory processing (Ahissar and Hochstein, 2004; Kraus and Chandrasekaran, 2010; Green and Bavelier, 2012).
The neural responses we measured are generated predominantly by auditory midbrain (Warrier et al., 2011). Midbrain plasticity is mediated by a large network of descending corticofugal fibers (Bajo et al., 2010) and other projections that cross-innervate midbrain and brainstem nuclei with motor (Molinari et al., 2007), reward (Bajo and King, 2012), and prefrontal cortices (Raizada and Poldrack, 2007)—the very centers that are actively engaged by music (Kraus and Chandrasekaran, 2010; Chanda and Levitin, 2013; Salimpoor et al., 2013). These influences converge to make auditory midbrain a hub of cognitive, motor, and sensory processing. We speculate that top-down attentional and cognitive modulations caused an activity-driven enhancement in midbrain function, which progressively (i.e., with more training) drove the changes we observed (Polley et al., 2006; Hornickel et al., 2009; Bajo et al., 2010; Kraus and Chandrasekaran, 2010; Bajo and King, 2012). Uniquely, making music engages these systems in a positive, reinforcing, and active manner that offers neuroplastic potential beyond everyday listening experiences.
Since music integrates the perception and production of meaningful sounds in a communicative context, music training has the potential to generalize to language and speech, as has been argued previously (Kraus and Chandrasekaran, 2010; Patel, 2011). By directing children's attention to meaningful acoustic cues in their environments, music training may have facilitated the sound-meaning connections that drive neural plasticity, observed here as an improvement in the neural distinction of speech syllables. Converging evidence from animals and humans suggests that attention to past sounds influences automatic processing of sounds during future listening experiences (Krishnan et al., 2005; Zhou and Merzenich, 2008; Threlkeld et al., 2009; Ortiz-Mantilla et al., 2010; Sarro and Sanes, 2011; Krizman et al., 2012; White-Schwoch et al., 2013), such as the neurophysiological improvement observed here.
A previous cross-sectional study, using the same neurophysiological methods, showed that school-aged children with at least 3 years of music training had stronger distinctions of these speech syllables than non-musician children—a finding paralleled in preschool age children and adults (Parbery-Clark et al., 2012; Zuk et al., 2013; Kraus and Nicol, 2014; Strait et al., 2014). Here we show this enhancement with 2 years of training longitudinally, suggesting that the musician enhancement established through cross-sectional differences is indeed, at least in part, due to music training, and not innate differences between musicians and non-musicians. Children who underwent only 1 year of music training did not have stronger neural processing of these speech sound differences. Neural changes from music training may take longer to emerge than those from other forms of auditory training, such as computerized training programs. However, previous investigations suggest that these neural enhancements from music training persist for decades after training stops (Skoe and Kraus, 2012; White-Schwoch et al., 2013). Therefore, even if these enhancements take relatively long to emerge, they may be long lasting.
Our finding is also evocative of research on training attentional systems using action video games: an interpretation of this line of research is that video games allow individuals to “learn how to learn,” and functional enhancements follow this prerequisite (Bavelier et al., 2011; Green and Bavelier, 2012). Here, the first year of music training may have facilitated more active engagement with sound in a meaningful context to promote efficient auditory processing (Strait et al., 2009; Parbery-Clark et al., 2009a). During the second year this new mode of active listening may have been brought to bear, allowing the children to make sound-meaning connections that modulated neural function (Fritz et al., 2003; Kraus and Chandrasekaran, 2010).
A number of longitudinal studies have used scientifically developed training materials based on the principles of perceptual learning elucidated in decades of animal and human studies (Tallal et al., 1996; Temple et al., 2003; Moore et al., 2005; Moreno et al., 2009; Anderson et al., 2013). These training regimens are carefully designed to be delivered in a short time span in the laboratory or on a computer, and are associated with improvements in perceptual and neurophysiological functions after only a few short weeks of training; yet training benefits often do not generalize far beyond the training material (Hayes et al., 2003; Song et al., 2012; Anderson et al., 2013; Anderson et al., 2014). However, there have been studies that have found biological enhancements in auditory processing following participation in informal music activities during early childhood (Putkinen et al., 2013).
Here, we show an improvement in auditory processing that emerges after a 2-year course of music training. Neural enhancements that generalize to automatic processing of stimuli that were not explicitly trained, such as we show here, may take longer to emerge than those from focused computer training. We still find merit in music training as a mechanism to improve neural function. After all, music is an inherently fun activity for most people, likely providing children emotional satisfaction throughout their training (Dube and Le Bel, 2003), even if that training continues over several years. That said, it remains an open question whether and how scientifically inspired training regimens may be combined with ecologically valid music programs to provide the most effective improvements in communicative skills. An additional question is what would be seen with other types of enrichment. We did not have an active control group in this study, meaning some or all of the training-related enhancements we observed might be attributed to providing these children with any kind of enrichment as opposed to an effect of music per se (Moreno et al., 2009; but see Anderson et al., 2013). It also bears mentioning that, although significant, our training effects were relatively small. It will be important to replicate these findings to strengthen the case for these sorts of community-based interventions. There are also several factors that may contribute to the amount of music instruction a child received (Fig. 2), including the availability of instruments, missed classes (due to illness, home trouble, etc.), and Harmony Project faculty's judgments of the child's progress in the curriculum. And since Group 1 students started ∼1 year later, we cannot rule out interactions with development that may have biased training benefits toward Group 2 (Bailey and Penhune, 2013). Future work will have to evaluate the intersections of age and training that dictate final outcomes.
However, in cross-sectional studies of musicianship Strait et al. (2009, 2013) have found that musician enhancements for timing aspects of neural processing, including the distinction of contrastive speech syllables, are linked to the extent of music training and not age of onset.
Cross-sectional studies of musicians, on the one hand, and longitudinal studies of computerized or private music training on the other hand, offer little concrete evidence for policymakers and community organizers interested in enacting broad-based youth programs. By providing objective biological evidence that music programs improve the neurophysiological processing of speech sound contrasts, our findings support efforts to expand community and co-curricular opportunities for at-risk children during critical developmental years. Future work should follow children in similar programs to ascertain whether these neurophysiological changes eventually lead to salient behavioral outcomes for learning, listening, and literacy skills, and whether music training can counteract learning and auditory processing difficulties in clinical populations. These efforts are especially important for children from underserved populations, such as those who participated in the current study. Our findings support efforts to reintegrate music into public schooling as an important complement to science, technology, math, and reading instruction (Rabkin and Hedberg, 2011; President's Committee on the Arts and the Humanities, 2011). In addition to providing children with a personally satisfying afterschool activity, community music programs offer the potential to engender biological changes in neural processes important for everyday communication.
Footnotes
This work is supported by the National Association of Music Merchants, the Grammy Foundation, and the Hugh Knowles Center. We are grateful to S.R. O'Connell, S. Bhatia, J. Thompson, E. Spitzer, E. Skoe, and J. Krizman for their assistance with the study. We also express our appreciation to Harmony Project founder Margaret Martin, Dr. P.H., M.P.H., executive director Myka Miller, and staff Monk Turner, Sara Flores, and Jeremy Drake (www.harmony-project.org), and the children and their families for their participation.
The authors declare no competing financial interests.
Correspondence should be addressed to Nina Kraus, 2240 Campus Drive, Evanston, IL 60208. nkraus{at}northwestern.edu