Abstract
Auditory perception is fundamental to human development and communication. However, no long-term studies have been performed on the plasticity of the auditory system as a function of musical training from childhood to adulthood. The long-term interplay between developmental and training-induced neuroplasticity of auditory processing is still unknown. We present results from AMseL (Audio and Neuroplasticity of Musical Learning), the first longitudinal study on the development of the human auditory system from primary school age until late adolescence. This 12-year project combined neurologic and behavioral methods including structural magnetic resonance imaging (MRI), magnetoencephalography (MEG), and auditory tests. A cohort of 112 typically developing participants (51 male, 61 female), classified as “musicians” (n = 66) and “nonmusicians” (n = 46), was tested at five measurement timepoints. We found substantial, stable differences in the morphology of auditory cortex (AC) between musicians and nonmusicians even at the earliest ages, suggesting that musical aptitude is manifested in macroscopic neuroanatomical characteristics. Maturational plasticity led to a continuous increase in white matter myelination and systematic changes of the auditory evoked P1-N1-P2 complex (decreasing latencies, synchronization effects between hemispheres, and amplitude changes) regardless of musical expertise. Musicians showed substantial training-related changes at the neurofunctional level, in particular more synchronized P1 responses and bilaterally larger P2 amplitudes. Musical training had a positive influence on elementary auditory perception (frequency, tone duration, onset ramp) and pattern recognition (rhythm, subjective pitch). The observed interplay between “nature” (stable biological dispositions and natural maturation) and “nurture” (learning-induced plasticity) is integrated into a novel neurodevelopmental model of the human auditory system.
Significance Statement We present results from AMseL (Audio and Neuroplasticity of Musical Learning), a 12-year longitudinal study on the development of the human auditory system from childhood to adulthood that combined structural magnetic resonance imaging (MRI), magnetoencephalography (MEG), and auditory discrimination and pattern recognition tests. A total of 66 musicians and 46 nonmusicians were tested at five timepoints. Substantial, stable differences in the morphology of auditory cortex (AC) were found between the two groups even at the earliest ages, suggesting that musical aptitude is manifested in macroscopic neuroanatomical characteristics. We also observed neuroplastic and perceptual changes with age and musical practice. This interplay between “nature” (stable biological dispositions and natural maturation) and “nurture” (learning-induced plasticity) is integrated into a novel neurodevelopmental model of the human auditory system.
- auditory cortex
- auditory evoked fields
- learning-induced plasticity
- maturation
- morphology
- musical practice
Introduction
Despite public interest in how playing musical instruments and singing affects brain and behavior in children and adolescents, relatively little is known from a neuroscientific perspective about the interplay between a priori dispositions, biological maturational processes, and learning-induced plasticity through musical training over the lifespan. Active music making involves numerous neural processes of perception, cognition, brain connectivity, and behavior (Kraus and Chandrasekaran, 2010; Herholz and Zatorre, 2012; Tierney et al., 2015; Habibi et al., 2018). Several cross-sectional studies (limited to one measurement timepoint) describe age-related neuroanatomical and neurofunctional characteristics of the auditory system (Ponton et al., 2000; Long et al., 2018) and associated perceptual abilities (Litovsky, 2015).
With regard to neuroanatomy, gray matter (GM) and white matter (WM) have been shown to be relevant markers. GM mainly contains neural cell bodies with their dendrites, as well as axon terminals and all synapses. WM predominantly consists of long-range myelinated axon bundles that transmit signals to different brain regions (Purves et al., 2008; Zatorre et al., 2012). From childhood to adulthood, the myelination of long-range bundles of axons increases, which is associated with a higher speed of signal transmission and neural efficiency (Su et al., 2008). Giedd et al. (1999) observed a gradual increase in GM density and WM myelination in the temporal lobe from infancy to middle adolescence (∼16 years), followed by a slight decrease thereafter. Extensive musical practice during childhood was found to be linked to increased WM connectivity in the corpus callosum (Bengtsson et al., 2005; Steele and Zatorre, 2018).
Auditory cortex (AC), located in the superior temporal plane, includes primary core areas on Heschl’s gyrus (HG) subserving elementary sound processing, and adjacent secondary belt areas including planum temporale (PT) that are relevant for more complex auditory pattern recognition (Schneider et al., 2005; Benner et al., 2023). The neural efficiency of AC is reflected in characteristic auditory evoked response patterns. The P1-N1-P2 complex is relevant for developmental and learning-induced changes (Sharma et al., 1997; Seither-Preisler et al., 2014; Benner et al., 2017; Schneider et al., 2022). A cross-sectional EEG study on age-related differences in children and adolescents reported a significant reduction of response latencies with increasing age (Ponton et al., 2000). Several other cross-sectional studies have examined the neurofunctional specificities of musically experienced children (Jentschke and Koelsch, 2009; Putkinen et al., 2014) and adults (Münte et al., 2002; Gaser and Schlaug, 2003; Tervaniemi et al., 2016; Vuust et al., 2022). Although these studies provide valuable insights into musical neural processing, they do not allow reliable conclusions about the extent to which observed differences are because of predisposition (aptitude), acquired expertise (learning), or the interplay of both (Herholz and Zatorre, 2012).
For this reason, longitudinal studies (multiple measurement timepoints with the same sample) have been designed to examine the effects of musical practice (Müllensiefen et al., 2022; Worschech et al., 2022). Neurologic longitudinal studies with children (Moreno et al., 2009; Chobert et al., 2014; Habibi et al., 2018) and adults (Herdener et al., 2014) are rare and limited to a typical observation period of three to five years. Some long-term studies focused on near-transfer or far-transfer effects of musical training (Bigand and Tillmann, 2022), the former directly relating to the auditory domain (e.g., elementary sound processing, complex pattern recognition, audio-motor control, and language skills), the latter affecting nonauditory domains (e.g., visuospatial, arithmetic processing, etc.). Language-related near transfer effects were observed by Moreno et al. (2009) and Patel (2011). Other studies reported that individuals who played music in childhood and later stopped sometimes showed benefits for auditory perception and neural information processing even in late adulthood (Skoe and Kraus, 2012; Strait and Kraus, 2014).
Still, little is known about the interplay between stable neurologic dispositions and age-dependent maturational plasticity (“nature”) and learning-induced plasticity (“nurture”) through musical training from childhood to late adolescence. Our 12-year longitudinal study AMseL (Audio and Neuroplasticity of Musical Learning) aimed to fill this gap in understanding of human auditory system development.
Materials and Methods
AMseL study
In a 12-year long-term study comprising five measurement timepoints (MT1–5), we monitored the influence of musical practice on structural AC morphology, functional auditory evoked fields (AEFs), and behavioral auditory skills from childhood into late adolescence. The longitudinal project was part of funded research accompanying the German cultural education program JeKi (“Jedem Kind ein Instrument”/“An Instrument for Every Child”).
Participants
A total of 112 typically developing children (51 male, 61 female) participated in the study. At MT1, the children were in primary school (8.1 ± 0.8 years); at MT5, they were late adolescents (17.7 ± 0.8 years). The mean timespan between subsequent MTs was 2.4 ± 0.9 years. All participants showed normal hearing levels (≤20 dB pure-tone thresholds) and no history of neurologic disorders. All were fully informed about the aims of the study and gave written informed consent before participation.
Musical status
An index of cumulative musical practice (IMP) was created to assess musical training. The IMP was defined as the product of the number of years of formal music education and the number of hours per week spent practicing a musical instrument (Seither-Preisler et al., 2014). As the longitudinal study involved an exceptionally long observation period of ∼10 years for every participant, a certain fluctuation of musical engagement over time was inevitable. It was therefore not possible to recruit a sample of nonmusicians with a mean IMP of 0. Rather, two groups were defined, one with no or little and one with moderate to high musical practice. Consistent with our previous study (Serrallach et al., 2016), participants were classified as “musicians” if their IMP_3 exceeded 4; 66 subjects satisfied this criterion. The remaining 46 participants (mean age: 11.7 ± 0.8 years) with no or very little musical training (IMP_3 ≤ 4) were classified as “nonmusicians.” Most of them participated at all five measurement timepoints (47 musicians and 26 nonmusicians).
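For illustration, the following minimal sketch (Python; the function and variable names are ours and purely illustrative, not part of the study materials) shows how the IMP and the resulting group assignment can be computed from years of formal training and weekly practice time:

```python
# Minimal sketch (hypothetical function and variable names) of the index of
# cumulative musical practice (IMP) and the group assignment described above.

def cumulative_practice_index(years_of_training: float, hours_per_week: float) -> float:
    """IMP = years of formal music education x weekly practice time in hours."""
    return years_of_training * hours_per_week

def classify_musical_status(imp_at_mt3: float, threshold: float = 4.0) -> str:
    """Participants with an IMP_3 above the threshold of 4 were classified as musicians."""
    return "musician" if imp_at_mt3 > threshold else "nonmusician"

# Example: 3 years of lessons with 2 h of weekly practice -> IMP = 6 -> "musician"
print(classify_musical_status(cumulative_practice_index(3, 2)))
```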
The following mean IMP values were obtained at the five MTs: (1) nonmusicians: IMP_1 = 0.3 ± 0.1, IMP_2 = 0.9 ± 0.2, IMP_3 = 1.5 ± 0.2, IMP_4 = 2.8 ± 0.7, IMP_5 = 6.4 ± 1.6; and (2) musicians: IMP_1 = 5.3 ± 0.7, IMP_2 = 7.8 ± 0.8, IMP_3 = 16.7 ± 1.2, IMP_4 = 26.7 ± 2.3, IMP_5 = 39.7 ± 3.6. Figure 1A displays the longitudinal IMPs of the two groups for the age span of 8–18 years along with cross-sectional IMPs of adult nonmusicians and musicians, as described in our previous studies (Bücher et al., 2023). The latter were in young adulthood (YA: 19–29 years) and in middle-aged adulthood (MA: 30–67 years), respectively.
Social variables
The children’s socioeconomic background was investigated using a comprehensive questionnaire for parents. As already outlined in Seither-Preisler et al. (2014), three relevant social factors derived from principal component analysis were considered: (1) educational environment (including the mother’s and father’s highest professional degree and the number of books at home); (2) parental support (including the amount of parent–child communication, the frequency of common participation in cultural events, and the parents’ personal interest in children’s activities); and (3) resources and leisure activities (including courses in sports and arts, and children’s resources such as their own room or personal computer). Parental income also loaded on factors 1 and 3.
Magnetic resonance imaging (MRI)
A T1-weighted structural MRI was performed to investigate the anatomy of AC (MT1+2: Siemens Trio, 3 Tesla; MT3–5: Siemens TrioTim, 3 Tesla). All data were acquired using 12-channel head coils and a standardized scanning protocol (MPRAGE, 176 DICOM slices, sagittal orientation; slice thickness 1 mm; field of view: 256 × 256; matrix size 128 K (16 bit); repetition time (TR) = 1930 ms; echo time (TE) = 3.47 ms; flip angle 15°). An individual approach of three-dimensional (3D) gray matter surface reconstruction of auditory subareas (HG, PT) was applied to account for individual morphology and gyrification patterns (Schneider et al., 2005, 2009; Wengenroth et al., 2014; Turker et al., 2019; Dalboni da Rocha et al., 2020; Rus-Oswald et al., 2022; Serrallach et al., 2022). For segmentation, the Brain Voyager QX 2.8 software (Brain Innovation, B.V.) was used. All brain images were adjusted in contrast and brightness, precisely corrected for inhomogeneity, and rotated into the direction of the anterior–posterior commissural line. The superior temporal plane, including HG, anterior superior temporal gyrus (aSTG), and PT, was segmented in sagittal MRI slices along the Sylvian fissure using the standard anatomic landmarks of AC and established additional criteria. The first complete Heschl’s sulcus with a large mediolateral extent (>97%) and pronounced depth was used as the posterior boundary, and the crescent-shaped first transverse sulcus as the anterior boundary of HG, thereby dividing AC into two parts: (1) an anterior stream including HG and aSTG and (2) a posterior stream including PT. HG was separated from aSTG by an anterior borderline at y = 0 (Schneider et al., 2005). The range of included image gray values was calculated individually. A box was marked around left and right AC to generate intensity histograms of these areas. The “gray value inclusion range,” which was used for surface reconstruction and morphometry, was defined on the basis of two criteria: (1) the gray matter peak value multiplied by a factor of 0.28, which provides an appropriate cutoff for separating cerebrospinal fluid from gray matter tissue; and (2) a fixed cutoff point of (GM + 2*WM)/3, i.e., a point two-thirds of the way toward the WM peak, which was set to separate gray matter from WM voxels. The WM and GM voxels were marked and used for 3D reconstruction. Furthermore, the ratio of white to gray matter (WG ratio) was calculated, which signifies the relative proportion of WM in the areas of interest and hence is an indicator of the degree of myelination in the respective structures (Emmorey et al., 2003). Next, four boxes centered on right and left HG and PT were defined as regions of interest to compute individual WG ratios (coordinates in stereotaxic space in mm: right HG: x = [40 60], y = [0 −20], z = [0 15]; left HG: x = [−40 −60], y = [0 −20], z = [0 15]; right PT: x = [45 65], y = [−20 −40], z = [10 25]; left PT: x = [−45 −65], y = [−20 −40], z = [10 25]).
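The following sketch (Python/NumPy) illustrates the intensity-based voxel classification described above; the peak values, array layout, and function names are our own illustrative assumptions rather than the Brain Voyager workflow actually used:

```python
# Hedged sketch of the "gray value inclusion range" and the WG ratio: the lower
# cutoff 0.28 * GM peak separates CSF from gray matter, the upper cutoff
# (GM + 2*WM)/3 separates gray from white matter inside a region-of-interest box.
import numpy as np

def wg_ratio(roi_intensities: np.ndarray, gm_peak: float, wm_peak: float) -> float:
    """White-to-gray matter ratio inside a region-of-interest box."""
    lower = 0.28 * gm_peak                       # CSF / gray matter cutoff
    upper = (gm_peak + 2.0 * wm_peak) / 3.0      # gray / white matter cutoff
    gm_voxels = np.count_nonzero((roi_intensities >= lower) & (roi_intensities < upper))
    wm_voxels = np.count_nonzero(roi_intensities >= upper)
    return wm_voxels / gm_voxels if gm_voxels else np.nan

# Example with synthetic T1 intensities drawn around assumed GM and WM peak values
rng = np.random.default_rng(0)
roi = np.concatenate([rng.normal(110, 10, 4000), rng.normal(160, 10, 1200)])
print(round(wg_ratio(roi, gm_peak=110, wm_peak=160), 2))  # ~0.28 for this synthetic histogram
```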
Magnetoencephalography (MEG)
AEFs were measured with a whole-head MEG system comprising 122 planar gradiometers (Neuromag-122; Hämäläinen et al., 1993) in response to seven different sampled instrumental sounds (piano, guitar, flute, bass clarinet, trumpet, violin, and percussion) and five artificial simple harmonic complex tones that were created using MATLAB software, as in previous studies (Schneider et al., 2005; Seither-Preisler et al., 2014; Serrallach et al., 2016; Groß et al., 2022). Before measurement, four reference coils were attached to the participant’s head (left and right temples and left and right mastoids) with skin-friendly adhesive tapes. An electronic digitizing pen and a sensor on the forehead were first used to scan three points on the head surface that define the head coordinate system (nasion, right and left preauricular points). In addition, 32 other points on the head surface were digitized. For the MEG measurement, the participants were placed under the MEG dewar in a relaxed posture. To avoid an overlaying influence of task-specific changes in the auditory evoked responses, participants were measured without a task. The stimuli were presented binaurally via 90-cm plastic tubes through foam ear pieces placed in the ear canal and connected to small shielded transducers that were fixed in boxes next to the chair. Participants were instructed to listen attentively to the binaurally presented sounds in a relaxed state and to keep their eyes open while watching a silent movie. The participants could choose among a preselection of four to five appropriate movies. As there was no systematic assignment of movies to participants or groups and the auditory stimuli were played in pseudorandomized order with pseudorandomized interstimulus intervals, no systematic cross-modal interactions between visual and auditory stimulation were expected. The position coils were then calibrated and their position relative to the MEG dewar was determined. The stimuli were presented in a continuous sequence for 17 min to guarantee a high signal-to-noise ratio [noise reduction: √(1200) = 34.6]. Each of the 12 stimuli was presented 100 times (N = 1200) in pseudorandomized order (stimulus length 500 ms, interstimulus interval 300–400 ms). All stimuli had the same length and superimposed onset and offset ramps (duration: 10 ms, using a Hanning window) to avoid clicks. The intensity of the stimulation was adjusted at the output of the foam ear pieces to 70 ± 2 dB SPL, as determined by a Brüel and Kjaer artificial ear (type 4152) with an additional 2-ml coupler.
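As a brief numerical illustration of the noise-reduction figure quoted above (averaging N artifact-free epochs attenuates uncorrelated noise by √N), consider the following sketch; the noise model is a deliberately simplistic assumption, not a model of real MEG sensor noise:

```python
# Averaging N epochs reduces the standard deviation of uncorrelated noise by sqrt(N);
# with N = 1200 epochs this corresponds to a factor of sqrt(1200) ~ 34.6.
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_samples = 1200, 500
noise = rng.normal(0.0, 1.0, (n_epochs, n_samples))   # unit-variance noise per epoch

print(round(float(np.sqrt(n_epochs)), 1))              # 34.6
print(round(float(noise.std()), 3))                    # ~1.0 (single-epoch noise level)
print(round(float(noise.mean(axis=0).std()), 3))       # ~0.029, i.e., ~1/sqrt(1200)
```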
The sensor waveforms were recorded with a sampling rate of 1000 Hz, corresponding to a lowpass filter of 330 Hz [filter range 0.00 (DC) to 330 Hz]. Data analysis was conducted with the BESA Research 6.0 software (MEGIS Software GmbH). AEFs were calculated post hoc from the ongoing changes of the field distributions. Before averaging, data were inspected with the BESA Research Event-Related Field Module to automatically exclude three to seven noisy (bad) channels, ∼10% of all epochs exceeding a gradient of 600 fT/cm × s, and amplitudes either exceeding 3000 fT/cm or falling below 100 fT/cm. Signal strength was calculated relative to a 100-ms prestimulus baseline. The responses of each participant were collapsed into a grand average (on average 1100 artifact-free epochs) in a 100-ms prestimulus to 400-ms poststimulus time window. Based on a standard single-sphere head model (Hämäläinen and Sarvas, 1987; Sarvas, 1987; Scherg, 1990), spatiotemporal source modeling was performed in normalized coordinates, independent of individual brain anatomy. Subsequently, the source activity of the primary and secondary auditory cortex was fitted using a two-dipole model with one equivalent dipole in each hemisphere (Schneider et al., 2005; Seither-Preisler et al., 2014; Zoellner et al., 2019; Groß et al., 2022). Source modeling was done on an individual basis before group-averaging of the source waveforms. Since the head position of the participants under the dewar of the MEG was not the same at the five MTs, source localizations and orientations were fitted separately at each MT, using exactly the same fitting parameters. The fitting intervals of the P1-N1-P2 complex were individually adjusted in four steps: (1) the dipoles were converted to a regional source in each hemisphere and the center of gravity (Yvert et al., 2001) was localized using an individually adjusted fitting interval between the P1 peak and P2 peak; (2) the regional sources were converted back to single dipoles; (3) the orientation of the primary P1 response was fitted individually around its lower and upper half-sidelobes (from the middle of the ascent to the middle of the descent) and directed toward the vertex before analyzing P1 latencies and amplitudes; (4) the orientations of the N1 and P2 responses were fitted toward their lower and upper half-sidelobes while maintaining the direction of the P1 toward the vertex. Subsequently, the N1 and P2 latencies and amplitudes were taken from the source waveforms, with the N1 amplitude usually having negative values. The described procedure is well established and has been used similarly in our earlier studies (Benner et al., 2017; Schneider et al., 2022). In addition, absolute asynchronies |peak latency {right – left}| and absolute amplitude asymmetries |peak amplitude {right – left}| were calculated to assess how well the latencies and amplitudes match between hemispheres.
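The interhemispheric measures can be sketched as follows (Python; the peak-picking window and array layout are illustrative assumptions, whereas in the study the peak values were taken from the individually fitted source waveforms):

```python
# Illustrative computation of the interhemispheric measures: peak latency and
# amplitude are read from each hemisphere's source waveform within a
# component-specific search window (here, a hypothetical P1 window of 50-120 ms),
# and the absolute asynchrony |right - left| (ms) and amplitude asymmetry
# |right - left| (nAm) are derived from them.
import numpy as np

def peak_in_window(t_ms, source_nAm, window=(50, 120)):
    """Return (latency_ms, amplitude_nAm) of the largest deflection inside the window."""
    mask = (t_ms >= window[0]) & (t_ms <= window[1])
    idx = np.argmax(np.abs(source_nAm[mask]))
    return t_ms[mask][idx], source_nAm[mask][idx]

def hemispheric_measures(t_ms, right_nAm, left_nAm, window=(50, 120)):
    lat_r, amp_r = peak_in_window(t_ms, right_nAm, window)
    lat_l, amp_l = peak_in_window(t_ms, left_nAm, window)
    return {"asynchrony_ms": float(abs(lat_r - lat_l)),
            "asymmetry_nAm": float(abs(amp_r - amp_l))}

# Example with two synthetic source waveforms peaking at 78 ms (right) and 84 ms (left)
t = np.arange(-100, 400)
right = 30 * np.exp(-((t - 78) / 15.0) ** 2)
left = 26 * np.exp(-((t - 84) / 15.0) ** 2)
print(hemispheric_measures(t, right, left))  # {'asynchrony_ms': 6.0, 'asymmetry_nAm': 4.0}
```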
Auditory discrimination tests
For the audiometric and psychoacoustic tests, the stimuli were presented binaurally using an RME Hammerfall DSP Multiface system and closed dynamic headphones (Sennheiser HAD 200) designed for high-quality hearing tests. These headphones provide ∼30 dB of passive attenuation in the frequency region of the stimuli used. The intensity was controlled not to exceed 75 dB SPL. The auditory testing battery included the assessment of auditory discrimination abilities (KLAWA test; Christiner et al., 2022; Schneider et al., 2022) and of subjective pitch perception (Schneider et al., 2005; Schneider and Wengenroth, 2009).
The KLAWA test (“Klangwahrnehmungstest”/“Test of Subjective Sound Perception”) is a German inhouse computer-based threshold test for children, adolescents, and adults based on an “alternative forced-choice” procedure (Jepsen et al., 2008). In this procedure, which automatically adapts to the participants’ performance, difference thresholds are calculated. The KLAWA measures the sensitivity for discriminating different acoustic parameters, namely intensity (dB; soft vs loud), frequency (semitones/ST; low vs high), onset ramp (ms; sharp vs mellow), and tone duration (ms; short vs long). The discrimination thresholds for these parameters may vary widely among participants (by more than a factor of 100). In the frequency subtest, the standard is a 500-Hz pure tone and the difference between tones varies randomly by up to two STs. In the intensity subtest, the standard is fixed at 65 dB SPL, while the test tones vary between 45 and 65 dB SPL. In the onset ramp subtest, the standard has a linear rise time of 15 ms, a continuous segment of 735 ms, and a linear fall time of 50 ms, while the rise times of the test tones vary logarithmically up to 300 ms. In the duration subtest, the standard has a duration of 400 ms and the comparison tones are varied logarithmically from 400 to 600 ms. Since its development in 2013, the KLAWA test battery has been used consistently for MT3–5. For MT1 and MT2 (recorded 2010–2012), the comparable Dinosaur threshold estimation program was used to measure individual loudness, frequency, ramp, and duration thresholds of pure tones (Sutcliffe and Bishop, 2005; Seither-Preisler et al., 2014). The experimental design of the Dinosaur program is comparable to the KLAWA test, but uses fixed thresholds. Rhythmic abilities were assessed with 24 pairs of rhythmic sequences that had to be classified as same or different.
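As a sketch of the kind of adaptive alternative forced-choice procedure such threshold tests rely on, the following Python fragment implements a common 2-down/1-up staircase converging on the ~70.7%-correct point; the adaptation rule, step size, and stopping criterion are our assumptions, not the exact KLAWA or Dinosaur parameters:

```python
# Adaptive 2-down/1-up staircase: the stimulus difference ("delta") is reduced after
# two consecutive correct responses and increased after each error; the threshold is
# estimated as the mean of the last reversal points. All parameters are illustrative.
import random

def staircase_threshold(respond, start_delta=2.0, min_delta=0.01, step=0.7, n_reversals=8):
    delta, streak, direction, reversals = start_delta, 0, 0, []
    while len(reversals) < n_reversals:
        if respond(delta):                       # True if the listener answered correctly
            streak += 1
            if streak == 2:                      # two correct in a row -> make it harder
                streak = 0
                if direction == +1:
                    reversals.append(delta)
                direction = -1
                delta = max(delta * step, min_delta)
        else:                                    # error -> make it easier
            streak = 0
            if direction == -1:
                reversals.append(delta)
            direction = +1
            delta = min(delta / step, start_delta)
    return sum(reversals[-6:]) / len(reversals[-6:])   # mean of the last reversals

# Simulated listener whose percentage correct grows with the frequency difference (in ST);
# the track converges near the listener's ~70.7%-correct point.
sim = lambda d: random.random() < 0.5 + 0.5 / (1.0 + (0.3 / max(d, 1e-6)) ** 2)
print(round(staircase_threshold(sim), 2))
```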
The pitch perception preference (ppp) test includes 144 different pairs of harmonic complex tones. Each pair consists of two consecutive tones (duration: 500 ms, 10-ms rise-fall time, interstimulus interval 250 ms). Each test tone includes two, three, or four adjacent harmonics, omitting the fundamental frequency (Schneider et al., 2005). For each individual, an “index of pitch perception preference (ppp index)” δ = (fSP – f0)/(fSP + f0) was computed (fSP: number of perceived spectral pitches; f0: number of perceived fundamental pitches; see Schneider et al., 2005).
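The index itself reduces to a one-line computation, sketched below in Python (the trial counts in the example are invented for illustration):

```python
# Pitch perception preference index: delta = (f_SP - f_0) / (f_SP + f_0), where f_SP is
# the number of trials decided by the spectral (overtone) pitch and f_0 the number of
# trials decided by the missing fundamental; -1 indicates a pure fundamental listener,
# +1 a pure spectral listener.

def ppp_index(n_spectral: int, n_fundamental: int) -> float:
    total = n_spectral + n_fundamental
    return (n_spectral - n_fundamental) / total if total else 0.0

print(ppp_index(n_spectral=36, n_fundamental=108))   # -0.5 -> fundamental pitch listener
```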
We also included the musical aptitude tests Intermediate Measures of Music Audiation (IMMA; Gordon, 1986) and Advanced Measures of Music Audiation (AMMA; Gordon, 1998). These tests measure the ability to internalize musical structures and to detect tonal or rhythmic modifications in sequentially presented patterns (Schneider et al., 2002).
Data analyses
With regard to the neuroanatomical MRI data, four-way ANOVAs were calculated for the independent variables “MT” (1–5), “Region” (HG, PT), “Hemisphere” (R, L), and “Musical expertise” (Mus, Non). Separate analyses were performed for the dependent variables GM volumes and WG ratios.
For the neurofunctional MEG data, three-way ANOVAs were calculated for the independent variables “MT” (1–5), “Hemisphere” (R, L), and “Musical expertise” (Mus, Non). Separate analyses were performed for the dependent variables “latency” and “amplitude” of P1, N1 and P2. Moreover, the absolute P1, N1 and P2 asynchronies and absolute amplitude differences measured in the right and left hemisphere were considered as measures of functional lateralization in corresponding two-way ANOVAs.
Likewise, performance in each of the psychoacoustic tests was analyzed in two-way ANOVAs, with the discrimination thresholds for “loudness,” “frequency,” “onset ramp,” and “tone duration,” as well as “subjective pitch” and “rhythm score,” as dependent variables.
For all ANOVAs, post hoc tests were adjusted for multiple comparisons by Bonferroni correction.
For correlational analyses, we used Pearson’s coefficients if, according to the Kolmogorov–Smirnov test, both tested variables were normally distributed. Otherwise, the nonparametric Spearman’s rho was used.
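This decision rule can be sketched as follows (Python/SciPy); standardizing the variables before a one-sample Kolmogorov–Smirnov test is a simplification of the procedure, and all data in the example are synthetic:

```python
# Hedged sketch of the correlation strategy: Pearson's r when both variables pass a
# normality check, otherwise Spearman's rho. The variable names are illustrative.
import numpy as np
from scipy import stats

def correlate(x: np.ndarray, y: np.ndarray, alpha: float = 0.05):
    def is_normal(v):
        z = (v - v.mean()) / v.std(ddof=1)          # standardize before the KS test
        return stats.kstest(z, "norm").pvalue > alpha
    if is_normal(x) and is_normal(y):
        r, p = stats.pearsonr(x, y)
        return "pearson", r, p
    rho, p = stats.spearmanr(x, y)
    return "spearman", rho, p

# Example: correlate a skewed practice index with synthetic GM volumes (mm^3)
rng = np.random.default_rng(1)
imp5 = rng.gamma(2.0, 15.0, 112)
gm_rhg = 3500 + 20 * imp5 + rng.normal(0, 300, 112)
print(correlate(imp5, gm_rhg))
```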
Supplementary data from previous studies
In order to obtain a comprehensive view of the neuroplastic development of the auditory system in a lifetime perspective, we complemented our longitudinal data (8–18 years) with cross-sectional data from our previous studies with samples of 43 young adults (19–29 years; 19 musicians, 24 nonmusicians) and 42 middle-aged adults (30–67 years; 23 musicians, 19 nonmusicians; Bücher et al., 2023). The means and SEMs of these data are also included in Figures 2-4.
Ethics declarations
The study was reviewed and approved by the Medical Faculty of Heidelberg (approvals S-475/2007, S-616/2015, and S-778/2018). Written informed consent to participate in the study was obtained from the participants or (for children) from their parents.
Results
The aim of the 12-year longitudinal study was to investigate the plasticity of the auditory system from childhood to adulthood, as a function of both natural maturation and musical training. A total of 112 typically developing children, including 66 musically engaged and 46 untrained individuals, were studied from childhood (seven to nine years) to late adolescence (17–19 years) at five measurement timepoints (MT1–5). In the following, these two groups will be referred to as “musicians (mus)” and “nonmusicians (non),” respectively. For the sake of compactness, only means, significances (p-values) and effect sizes (partial η2) are reported. The findings of all performed ANOVAs including standard errors of the mean (SEMs) and F values are summarized in Tables 1–6 (n.s.: not significant).
Long-term plasticity of gray and white matter in human auditory cortex
Gray matter volume
Over the total observation period, the mean GM volumes of HG and PT decreased slightly but significantly from MT1 (3943 mm3) to MT5 (3876 mm3); p = 1.6E-4, partial η2 = 0.12. On average, GM volumes were larger in HG (4333 mm3) than in PT (3508 mm3; p = 6.7E-5, partial η2 = 0.20) and larger in the left (4267 mm3) than in the right (3574 mm3) hemisphere; p = 3.9E-10, partial η2 = 0.43. While this hemispheric difference was small for HG (R: 4144 mm3, L: 4522 mm3; p = 0.010), it was pronounced for PT (R: 3004 mm3, L: 4012 mm3; p = 7.8E-7). While HGs were substantially larger than PTs in musicians (5058 vs 2924 mm3), the reverse was true for nonmusicians (HG: 3607 mm3 vs PT: 4092 mm3); p = 3.7E-9, partial η2 = 0.39.
However, the complete picture unfolded when additionally taking into account hemispheric dominance. The three-way interaction “Musical expertise × Region × Hemisphere” (p = 0.011, partial η2 = 0.09) revealed that musicians were characterized by extraordinarily enlarged HGs in both hemispheres (R: 5008 mm3, L: 5109 mm3) and smaller PTs, especially on the right side (R: 2209 mm3, L: 3639 mm3). In contrast, nonmusicians showed a weak dominance of PTs over HGs, which was more pronounced in the right hemisphere (right: HG: 3279 mm3, PT: 3799 mm3; left: HG: 3935 mm3, PT: 4385 mm3). As shown in Figure 2, the very different neuroanatomical characteristics of the musicians’ and nonmusicians’ brains were remarkably stable over time and already observed at the beginning of our study. The complete statistical results on GM volume are summarized in Table 1.
White to gray matter ratio
The ratio of white to gray matter (WG ratio) signifies the relative proportion of WM in the areas of interest and hence is an indicator of the degree of myelination in the respective structures (Emmorey et al., 2003). In general, the myelination of HGs and PTs increased continuously from MT1 (0.25) to MT5 (0.35); p = 2.8E-27, partial η2 = 0.54. Myelination was higher in PTs (0.32) than in HGs (0.27; p = 2.0E-8, partial η2 = 0.36), and higher in the right (0.31) than in the left (0.27; p = 2.3E-5, partial η2 = 0.23) hemisphere. The interaction “MT × Region” (p = 5.7E-12) revealed that during childhood (MT1 to MT3) the slopes of the WG ratio increase were comparable for HGs and PTs. Thereafter, PTs showed slightly steeper slopes, which suggests a somewhat stronger plasticity of this structure in adolescence in both hemispheres. Furthermore, there was a prominent interaction “Region × Hemisphere” (p = 5.8E-18, partial η2 = 0.65), which was because of a strikingly higher myelination of the right PT (0.37) as compared with the left PT (0.27) and both HGs (R: 0.26, L: 0.28). This surprising pattern was seen throughout the total observation period.
Most importantly, our findings on white to gray matter ratio show a strong maturational plasticity with age, but no learning-induced plasticity through musical training. The complete statistical results on WG ratio are summarized in Table 2.
Long-term plasticity of the neurofunctional activation of auditory cortex
The three major AEF components (P1, N1, P2) form a functional fingerprint of the AC (Schneider et al., 2022). Our data show that children initially have a markedly dominant P1 response (Fig. 3). The N1 becomes clearly visible at the age of ∼10 years, first on the right and later on the left side. The P2 shows its characteristic expression at the age of ∼15 years. A balanced pattern of these three subcomponents is only evident after completion of the maturation processes in adulthood, particularly in musicians.
The primary P1 response showed a substantial decrease in latency throughout the measurement period, which was continuous from MT1 (92.3 ms) to MT5 (72.0 ms); p = 4.1E-28, partial η2 = 0.56. P1 latencies were shorter in the right (80.0 ms) than in the left hemisphere (82.0 ms); p = 0.006, partial η2 = 0.11. This lead of the right P1 was only significant up to an age of ∼12 years (MT1–3), thereafter showing an increasing bilateral alignment (p = 0.05, partial η2 = 0.04). The mean absolute asynchrony of the P1 between hemispheres [|R-L| (ms)] decreased over the total measurement period from 8.1 to 5.0 ms (p = 0.01, partial η2 = 0.05). Absolute P1 asynchrony was smaller in musicians (5.3 ms) than in nonmusicians (7.6 ms); p = 0.02, partial η2 = 0.08. The P1 amplitudes showed a strong and continuous decrease from MT1 (36.9 nAm) to MT5 (17.2 nAm); p = 1.1E-27, partial η2 = 0.46. P1 amplitudes were slightly higher in the left (28.7 nAm) than in the right (26.5 nAm) hemisphere; p = 0.01, partial η2 = 0.10. The absolute asymmetry of P1 amplitudes [|R-L| (nAm)] decreased substantially from MT1 (10.8 nAm) to MT5 (4.9 nAm); p = 7.9E-8, partial η2 = 0.15.
N1 latencies decreased substantially and continuously from MT1 (227.2 ms) to MT5 (123.9 ms); p = 3.2E-46, partial η2 = 0.67. There was a minor lead of the right (167.4 ms) as compared with the left (172.3 ms) N1 (p = 0.01, partial η2 = 0.09). The absolute asynchrony of the N1 between hemispheres decreased over time and was about halved from MT1 (22.6 ms) to MT5 (10.8 ms); p = 0.04, partial η2 = 0.04. N1 amplitudes showed a decrease in negativity from MT1 (−23.7 nAm) to MT5 (−12.1 nAm); p = 1.4E-7, partial η2 = 0.16. Moreover, the N1 amplitude was more negative in the right (−20.0 nAm) than in the left hemisphere (−14.2 nAm); p = 1.9E-6, partial η2 = 0.29, indicating a right-hemispheric predominance. The absolute asymmetry of N1 amplitudes decreased over the observation period from 13.3 nAm at MT1 to 9.1 nAm at MT5 (p = 0.02, partial η2 = 0.05).
Similar to the P1 and N1, the latencies of the P2 component decreased markedly from MT1 (314.9 ms) to MT5 (196.5 ms); p = 3.1E-44, partial η2 = 0.64. There was also a small but significant lead of the right (241.1 ms) as compared with the left (247.3 ms) hemisphere; p = 0.007, partial η2 = 0.11. There was a strong effect of MT on P2 amplitude (p = 4.9E-30, partial η2 = 0.49), showing a continuous increase from MT1 (−13.9 nAm) to MT5 (8.5 nAm). On average, P2 amplitudes were clearly larger in the left (−2.1 nAm) than in the right (−7.2 nAm) hemisphere (p = 1.1E-4, partial η2 = 0.21). The interaction “MT × Hemisphere” (p = 1.3E-5, partial η2 = 0.12) revealed that this was only the case for MT1–3. Moreover, P2 amplitudes were larger in musically trained (−1.5 nAm) as compared with untrained (−7.8 nAm) participants; p = 0.05, partial η2 = 0.06. The absolute asymmetry of P2 amplitude was fairly constant from MT1 to MT3 (mean: 12.3 nAm) and then showed a rapid decrease (MT4: 8.3 nAm; MT5: 6.9 nAm); p = 5.6E-4, partial η2 = 0.08, indicating a leveling of bi-hemispheric amplitude differences. The complete statistical results regarding the P1, N1, and P2 responses can be found in Tables 3–5.
The neurofunctional MEG parameters were found to be sensitive to both biological maturation from MT1 to MT5 and musical training. Thus, they can be classified as indicators both of maturational plasticity (“developmental nature”) and training-induced plasticity (“nurture”).
Effects on auditory perception
The long-term comparisons revealed a substantial improvement of auditory perception from childhood to adolescence both for musicians and nonmusicians. With regard to frequency, discrimination thresholds improved substantially from 1.03 ST at MT1 to 0.29 ST at MT5 (p = 1.3E-20, partial η2 = 0.39). For intensity, the thresholds were rather stable from MT1 to MT3 (mean value 1.8 dB) and then improved toward MT4 (1.2 dB) and further toward MT5 (0.8 dB); p = 2.9E-16, η2 = 0.30. For onset ramp, thresholds decreased almost exponentially from 105 to 34 ms over the observation period (p = 1E-6, η2 = 0.18). For tone duration, between MT1 and MT3 thresholds decreased from 75 to 49 ms and then remained fairly stable (p = 1.2E-5, η2 = 0.12). For rhythm, the proportion of correct responses increased from 68.6% at MT1 to 84.4% at MT5 (p = 2.2E-28, η2 = 0.44). There was no significant difference in performance between MT4 and MT5, signifying that rhythm perception did not further improve after mid-adolescence. For subjective pitch, the ability to perceive the fundamental pitch increased slightly and gradually from MT1 (ppp index: −0.17) to MT5 (−0.51); p = 1.1E-17, η2 = 0.35.
Except for intensity discrimination, musicians outperformed nonmusicians in all perceptual domains (frequency: p = 1.1E-5, η2 = 0.25; onset ramp: p = 0.005, η2 = 0.11; tone duration: p = 5E-4, η2 = 0.17; rhythm: p = 0.002, η2 = 0.13; subjective pitch: p = 0.01, η2 = 0.09). Moreover, for frequency discrimination there was an interaction “MT × Musical experience” (p = 0.001, η2 = 0.08), showing that the difference between musicians and nonmusicians was largest at MT1 and then decreased (Fig. 4A). All statistical results regarding the auditory tests can be found in Table 6.
Our findings show that the auditory skills we measured were dependent on both maturational plasticity (“developmental nature”) and training-induced plasticity (“nurture”), with musical training having a positive influence both on elementary auditory perception (frequency, tone duration, onset ramp) and on pattern recognition (rhythm, subjective pitch).
Correlational findings
As the degree of musical experience at the five MTs was metrically scaled, we also performed cross-sectional correlational analyses between IMP and all measured variables. This enabled higher accuracy with regard to individual musical experience at the expense of potential longitudinal effects (evident from the above ANOVAs).
In the following, the correlational neurologic findings are provided for IMP_5, where accumulated musical practice was greatest. When describing auditory correlational findings, we refer to MT4, because the AMMA test was only performed at MT4.
GM volumes showed a strong positive correlation with musical practice, which was most pronounced in right HG (ρ = 0.66, p = 1.5E-10). After partialing out a possible influence of social background (factors “educational environment,” “parental support,” and “resources and leisure activities”; Seither-Preisler et al., 2014), the correlation between the GM volume of right HG and IMP_5 was still very high (right HG: ρ = 0.40, p = 3.2E-4), consistent with the relevance of musical status for individual AC morphology. With regard to WG ratios, no correlations with musical practice were observed. For the MEG parameters, there were significant correlations with P1 latency (right: ρ = −0.24, p = 0.025; left: ρ = −0.31, p = 5.9E-3), absolute P1 asynchrony (ρ = −0.11, p = 0.046), left N1 amplitude (ρ = −0.28, p = 0.012), and left P2 amplitude (ρ = 0.28, p = 0.012). Regarding effect sizes, the latter effects are typical for P1 latencies, small for absolute P1 asynchrony, and relatively large for left N1 and P2 amplitudes (Gignac and Szodorai, 2016).
In the auditory tests, musical experience (IMP_4) correlated significantly with discrimination abilities for frequencies (ρ = −0.42, p = 9.2E-5), onset ramps (ρ = −0.24, p = 0.028), tone durations (ρ = −0.37, p = 6.3E-4), rhythms (ρ = 0.35, p = 1.4E-4), the perception of the pitch at the fundamental of complex tones (ρ = −0.40, p = 1.5E-4), as well as the tonal (ρ = 0.37, p = 5.0E-4) and rhythmic scores (ρ = 0.35, p = 8.2E-4) of the musical aptitude test AMMA. Figure 1B graphically displays the correlation of the total AMMA score, including both the tonal and rhythmic dimension, with musical experience. According to Gignac and Szodorai (2016), correlational effects were typical for onset ramp discrimination and relatively large for frequency discrimination, tone duration discrimination, rhythm and fundamental pitch perception, and the tonal and rhythmic scores of the AMMA test.
There was a positive correlation between the tonal AMMA score and the GM volume of HG, especially on the right side (ρ = 0.24, p = 0.046; Fig. 5A). This corroborates earlier findings on the importance of right HG for frequency and melody processing (Schneider et al., 2005). The AMMA total score was correlated with right (ρ = −0.40, p = 1E-3) and left (ρ = −0.37, p = 3E-3) P1 latency, right (ρ = −0.31, p = 0.012) and left (ρ = −0.32, p = 9E-3; Fig. 5B) N1 amplitude, the pitch perception preference index (ρ = −0.32, p = 3E-3; Fig. 5C), frequency discrimination threshold (ρ = −0.27, p = 0.012; Fig. 5D), and duration threshold (ρ = −0.26, p = 0.015; Fig. 5E). These findings demonstrate that musical aptitude according to Gordon’s (1989) concept of audiation is manifested in neuroanatomical characteristics of AC, neurofunctionally at the level of primary and secondary auditory processing, and in dimensions of auditory perception. The variable “educational environment” was positively correlated with “absolute P1 asynchrony,” suggesting that social background influences the neurofunctional activation of AC (Fig. 5F).
We also calculated correlations between IMP_5 and developmental progress over the total observation period (MT5 minus MT1) for the variables of interest, which reflect training-induced changes from age ∼8–18 years. Increasing musical practice led to larger training-induced changes for the MEG parameters right P2 latency (ρ = 0.36, p = 1.6E-3) and left P2 latency (ρ = 0.27, p = 0.014). In particular, musical expertise was related to a faster maturation of late P2 subcomponents. This probably reflects more complex response patterns in musicians because of a better separation of anterior and posterior P2 areas in the STG, which is characteristic of higher auditory pattern recognition skills (Benner et al., 2023). Moreover, the increase of left P2 amplitude over time accelerated with musical training (ρ = 0.22, p = 0.037). These findings are consistent with the high sensitivity of the P2 to musical practice and signify an increasing training-induced neural efficiency from MT1 to MT5.
As already indicated by the significant “MT × Musical experience” ANOVA interaction for frequency discrimination, the correlational analyses revealed that the time-dependent changes in frequency discrimination decreased with increasing musical practice (ρ = 0.43, p = 1.2E-4). This is probably because of the poor initial performance of nonmusicians, who caught up from MT1 to MT2.
The extent of musical practice at the end of the study (IMP_5) was positively correlated with educational environment (ρ = 0.50, p = 5.6E-7), but not with the other two socioeconomic variables. Accordingly, the two groups differed in educational background (t(df=69.5) = 3.7, p = 3.7E-4). Obviously, children from educated families tend to engage more strongly in musical activities.
In summary, these findings provide evidence that musical expertise is unequivocally related to GM volumes, unrelated to WG ratios, clearly related to auditory performance, and weakly related to certain MEG parameters. Although playing a musical instrument is often associated with an advantageous social background, particularly with the educational environment, our findings demonstrate that musicianship per se is crucial for the described neurologic and perceptual effects.
Discussion
The presented findings provide, for the first time, a comprehensive overview of the interplay between stable neurologic dispositions, age-dependent maturational plasticity, and learning-induced plasticity through musical training in the human auditory system from childhood to late adolescence; this interplay is graphically represented in an integrative neurodevelopmental model (Fig. 6).
First, GM volumes of AC revealed high interindividual variability, but almost no intraindividual variability, over the entire observation period. Musicians showed significantly higher volumes of gray matter in HG, particularly in the right hemisphere. In contrast, in nonmusicians the PTs were slightly larger than the HGs, consistent with the findings of our previous cross-sectional studies (Schneider et al., 2005; Wengenroth et al., 2014; Benner et al., 2023). The GM morphology remained fairly stable from MT1 to MT5, although musical experience increased substantially in the musicians group (IMP_MT1 = 2.5; IMP_MT5 = 33.7). The fact that the morphology, shape, and gyrification of AC were fully matured at a timepoint when most participants had very little formal musical training suggests that these characteristics are determined by dispositional factors, which in turn predict how much participants will or will not engage in musical activities in their future lives. In other words, the individual macroanatomical shaping of the AC reflects what is commonly regarded as musical aptitude (Fig. 6, gray circle, bottom left).
Second, analysis of WG ratios revealed a systematic increase in myelination of AC up to MT5. This signifies a natural maturation of WM and an increase in neuronal connectivity between different auditory brain regions with age, which is independent of musical training (Fig. 6, left path). The myelination of HGs entered a saturation phase at MT3, whereas it continued to grow in PTs until adulthood (Fig. 2C–F). This suggests that HGs, which are specialized in the processing of elementary sound features (Schneider et al., 2005), mature faster than PTs. The latter are involved in more integrative, multimodal and higher cognitive functions (Griffiths and Warren, 2002; Krumbholz et al., 2005). These different developmental trajectories are in line with general evidence for longer individual maturation times of brain regions with higher complex functions that arose later in phylogenetic evolution (Coward, 2012).
Interestingly, myelination of the right PT was strikingly higher than that of the left PT and both HGs. From fMRI experiments on auditory motion processing there is evidence for a differential innervation of the right and left PT (Krumbholz et al., 2005). While the left PT predominantly responded to contralateral stimulation from the right ear, the right PT was similarly sensitive to inputs from both ears. This suggests a prevalent contralateral innervation of the left PT, but a balanced contralateral and ipsilateral innervation of the right PT, the latter subserving bilateral integration. To our knowledge, this is the first neuroanatomical evidence that the right PT functions as an integrational hub in human AC. This makes it worthwhile to further investigate the functional significance of this particular structure, its importance for auditory pathways (Rauschecker and Scott, 2009), and its function as a gateway to parieto-frontal areas (Zatorre et al., 2007; Wengenroth et al., 2014; Bücher et al., 2023).
Third, auditory evoked responses showed distinct neuroplastic changes influenced both by age-related biological maturation (Fig. 6, blue circle) and the individual’s auditory-musical learning history (Fig. 6, red circle). Thus, they can be classified as indicators both of maturational plasticity (“developmental nature”) and training-induced plasticity (“nurture”). Latencies decreased systematically from MT1 to MT5 for all components of the P1-N1-P2 complex. P1 and P2 latencies were shorter and decreased faster over time in the right than in the left hemisphere. As short AEF latencies are an indicator of high neural efficiency (Seither-Preisler et al., 2014; Schneider et al., 2022), this signifies an earlier and faster maturation of right AC. There is evidence that this bilateral asymmetry starts before birth. Prenatal studies revealed that during fetal maturation the right hemisphere develops faster in general (Chiron et al., 1997), and that the right HG develops one to two weeks earlier than the left (Chi et al., 1977).
Furthermore, the absolute asynchronies between the right and left P1 and N1 responses decreased systematically over time. Thus, a mature state of auditory processing is characterized by a fast responsiveness and synchronization between the two hemispheres. In agreement with earlier studies (Sharma et al., 1997; Ponton et al., 2000), our current data suggest a continuous decrease in P1 amplitude from childhood to adulthood. We suggest that this effect is mainly because of the simultaneous increase of the N1 response. Since the onsets of the P1 and N1 largely overlap and their orientation is inverted, the emerging N1 inevitably leads to a reduction of P1 amplitude. At the histologic level, the emerging N1 has been linked to the maturation of superficial cortical layers and their intracortical connections in later childhood (Moore and Linthicum, 2007).
Consistent with the above reported results on the myelination of AC, our neurofunctional findings show that the development of the primary P1 component originating from mediolateral HG is completed earlier than that of the secondary N1 and P2 components arising from PT and the anterior superior temporal gyrus (aSTG; Benner et al., 2017, 2023). Accordingly, the development of more complex auditory functions involving integrative higher-order processing and auditory pattern recognition (Griffiths and Warren, 2002) requires more time.
In the following, we will address the relationship between musical aptitude (Fig. 6, gray circle bottom left) and musical experience (Fig. 6, yellow circle, center). As already outlined, the observed substantial differences in GM volumes between musicians and nonmusicians (Fig. 2C,D) qualify the morphology of AC as a neuroanatomical marker of musical aptitude (Fig. 6, gray circle, bottom left).
Conversely, musical experience is reflected in specific neurofunctional changes (Fig. 6, right path). The AEFs of musicians showed significantly shorter latencies, especially in the right hemisphere. In addition, P1 responses were more synchronous in musicians, signifying better bilateral integration at least at the primary level. Furthermore, P2 amplitudes were clearly larger in both hemispheres in musically trained participants. This demonstrates that musical engagement also has a positive effect on more complex auditory information processing, which is relevant for priming multimodal integration and higher cognitive functions. Moreover, the individual P1, N1, and P2 responses were approximately equally weighted in musicians, suggesting a balanced representation of primary and secondary processes. Different from GM volume and auditory evoked responses, the development of WG ratios was unaffected by musicality (Fig. 2E,F) and hence has to be considered an exclusive marker of maturational plasticity, which continues into adulthood (Fig. 6, left path).
Musicianship relies on basic auditory abilities, such as the discrimination of frequencies, timbre, loudness, and tone duration, as well as higher-order pattern recognition skills, such as the perception of subjective pitch, meter, rhythmical and melodic structures. In addition, playing a musical instrument requires extensive multisensory integration, such as the planning of finger movements and the simultaneous processing of sensorimotor and visual information. The psychoacoustic findings of our study show that all assessed auditory skills significantly improved over time. Furthermore, musical training had a positive influence on almost all dimensions of auditory perception except intensity. The discrimination of tone duration and rhythm perception did not further improve in adolescence (Fig. 4D,E). For subjective pitch we found a U-shaped curve, suggesting three distinct developmental stages of complex sound perception (predominance of timbral, fundamental pitch and analytic overtone perception; Schneider et al., 2005; Seither-Preisler et al., 2007; Schneider and Wengenroth, 2009; Fig. 4F).
Remarkably, the effect sizes related to musical training were moderate at the neurofunctional level, but relatively large at the behavioral level. This suggests that specific auditory skills are only partly reflected in the latencies and amplitudes of the P1-N1-P2 response complex. This finding raises the question of which factors are most important for neuroplastic changes of auditory evoked responses. Our recent findings have demonstrated that the P1-N1-P2 complex exhibits strong learning-induced plasticity in response to an active listening training of only 20 h over two weeks (Schneider et al., 2022), suggesting that in principle these components are highly neuroplastic.
Does this mean that conventional musical training does not optimally stimulate the neural circuits of auditory cortex, so that the potential of learning-induced plasticity is not fully exploited? Although this is possible, before drawing such a general conclusion, it is worthwhile to have a closer look at the musical experience of our longitudinal samples. As evident from Figure 1A, the amount of musical experience in the musicians group increased substantially toward the end of the study, but not in the nonmusicians group (IMP_5: Mus: 39.7 ± 3.6; Non: 6.4 ± 1.6). However, in comparison to young and middle-aged adult professional musicians and nonmusicians (IMP YA: Mus: 147.6 ± 20.3; Non: 9.7 ± 3.6; IMP MA: Mus: 491.1 ± 45.7; Non: 27.9 ± 6.1), who have been tested in our previous studies (Bücher et al., 2023), the group differences in our current samples are not very high. Therefore, the moderate differences in MEG parameters and relatively large differences in auditory skills are likely to result from the limited accumulated practice of our young “musicians.”
The observed longitudinal progression on the neuroanatomical (WG ratio), neurofunctional, and behavioral level from childhood to late adolescence is consistent with earlier cross-sectional findings in young and middle-aged adults reported in our previous studies (Bücher et al., 2023). As evident from Figure 2C–F, the progression of WG ratios is continuous over the lifespan. Moreover, our three studies (current AMseL longitudinal study and two cross-sectional studies) reflect the same group-specific differences between musicians and nonmusicians (Figs. 2C,D, 3C,D, 4A–E).
From the first prenatal auditory impressions to culturally shaped listening experiences starting in infancy and persisting throughout life, our auditory system is permanently challenged (Fig. 6, gray circle, bottom right). Early listening experiences also influence our musical expectations and preferences (Thompson, 2015), which, together with our individual musical aptitude profile (Benner et al., 2017; gray circle, bottom left), lay the foundation for further musical interests and activities (yellow circle, center). In a recent study we found that only one to two weeks of active listening training, challenging the auditory system to complement missing information, induced surprisingly large neuroplastic and perceptual short-term effects (Schneider et al., 2022). Specifically, the listening training led to a significant bilateral synchronization of the AEFs. Furthermore, the amplitude of the P2 increased 3-fold, a change in magnitude usually requiring three to four years of musical education (Fig. 3C). Clearly, the functional activation of the human auditory system is exceptionally plastic as a result of listening experience (gray circle, bottom right). It can be assumed that natural listening experiences in early childhood prepare the ground for later more sophisticated musical learning. Taken together, this suggests that the human auditory system has an extraordinary potential for learning-induced plasticity over the lifespan that is only partly exploited by conventional musical training.
Footnotes
This work was supported by German Federal Ministry of Education and Research (BMBF) Grants 01KJ0809/10 and 01KJ1204 (collaborative project “AMseL: Audio and Neuroplasticity of Musical Learning” in cooperation with the University of Graz) and by the German Research Foundation (DFG) through the Heisenberg fellowship program “Sound perception between outstanding musical abilities and auditory dysfunction: The neural basis of individual predisposition, maturation, and learning-induced plasticity in a lifespan perspective” (Grant SCHN 965/7-1). We thank A. Rupp and M. Bendszus for providing the MEG and 3T-MRI facilities in Heidelberg.
The authors declare no competing financial interests.
- Correspondence should be addressed to Annemarie Seither-Preisler at annemarie.seither-preisler{at}uni-graz.at or Peter Schneider at schneider{at}musicandbrain.de