Abstract
Absolute pitch (AP), the ability of some musicians to precisely identify and name musical tones in isolation, is associated with a number of gross morphological changes in the brain, but the fundamental neural mechanisms underlying this ability have not been clear. We presented a series of logarithmic frequency sweeps to age- and sex-matched groups of musicians with or without AP and controls without musical training. We used fMRI and population receptive field (pRF) modeling to measure the responses in the auditory cortex in 61 human subjects. The tuning response of each fMRI voxel was characterized as Gaussian, with independent center frequency and bandwidth parameters. We identified three distinct tonotopic maps, corresponding to primary (A1), rostral (R), and rostral-temporal (RT) regions of auditory cortex. We initially hypothesized that AP abilities might manifest in sharper tuning in the auditory cortex. However, we observed that AP subjects had larger cortical area, with the increased area primarily devoted to broader frequency tuning. We observed anatomically that A1, R and RT were significantly larger in AP musicians than in non-AP musicians or control subjects, which did not differ significantly from each other. The increased cortical area in AP in areas A1 and R was primarily low frequency and broadly tuned, whereas the distribution of responses in area RT did not differ significantly. We conclude that AP abilities are associated with increased early auditory cortical area devoted to broad-frequency tuning and likely exploit increased ensemble encoding.
SIGNIFICANCE STATEMENT Absolute pitch (AP), the ability of some musicians to precisely identify and name musical tones in isolation, is associated with a number of gross morphological changes in the brain, but the fundamental neural mechanisms have not been clear. Our study shows that AP musicians have significantly larger volume in early auditory cortex than non-AP musicians and non-musician controls and that this increased volume is primarily devoted to broad-frequency tuning. We conclude that AP musicians are likely able to exploit increased ensemble representations to encode and identify frequency.
Introduction
Absolute pitch (AP), also referred to as perfect pitch, is the ability to identify or recreate a given note or collection of notes in the absence of a reference note (Deutsch, 2013). It is not simply a better ability to hear, but the ability to mentally classify sounds into remembered categories. AP is rare, with an estimated prevalence of <1 in 10,000 persons (Bachem, 1955; Profita and Bidder, 1988; Deutsch, 2013), and affects both genders equally (Deutsch et al., 2006). Additionally, AP ability is rare even among expert musicians who have had the same amount of musical training and have spent tens of thousands of hours practicing and reading scores (Deutsch et al., 2009). Many noted musicians, such as Mozart, Bach, and Beethoven, had AP, whereas many other equally prominent musicians, such as Wagner and Schumann, lacked it (Sacks, 2007).
The contributions of genetics and experience to the development of AP are still debated, but it seems that there is a critical period during which musical training must occur (Levitin and Zatorre, 2003; Russo et al., 2003; Miyazaki and Ogawa, 2006). Several brain imaging studies investigating AP have identified differences in cortical thickness and connectivity in sound-, music-, and memory-processing regions (Loui et al., 2011; Dohn et al., 2015). Increased activation has been observed in the left superior temporal sulcus in AP musicians compared with controls during a pitch memory task (Schulze et al., 2009), and increased functional activation has been observed in the superior temporal gyrus (STG), bilateral Heschl's gyrus (HG), and middle temporal gyrus (MTG) in an AP group compared with a control group during a music-listening task (Loui et al., 2012). However, the fundamental neural mechanisms underlying AP are not well understood.
We wondered whether differences in the precision of low-level frequency representation might account for the special abilities in AP. The human auditory cortex is organized into tonotopic maps, although the exact orientations of the primary gradients in the primary auditory cortex (A1) in HG are subject to debate (Saenz and Langers, 2014). We developed a population receptive field (pRF) model of the frequency-tuning response of individual voxels in the auditory cortex and compared center frequency (CF) and tuning sharpness (Q) in A1, rostral (R), and rostral-temporal (RT) areas among groups of AP musicians (AP group), matched non-AP musicians (MUS group), and controls without musical training (CON group). We hypothesized that the AP group would have sharper frequency tuning than the MUS and CON groups, which could explain the high accuracy of their pitch discrimination and identification (Bidelman et al., 2014).
Materials and Methods
Subjects
A total of 61 subjects were tested, including 20 AP [mean age (±SD) 25.2 ± 7.6 years, 13 males], 20 MUS (mean age 25.5 ± 7.4 years, 13 males), and 20 CON (mean age 25.4 ± 7.4 years, 13 males) subjects. An additional musician subject scored high on the AP test but did not realize she had AP and was denoted as quasi-AP and omitted from the main groups. She reported using a tonal reference (middle C) and relative pitch comparisons on the AP test. AP and MUS subjects were recruited from notices advertised in university music departments and by word of mouth. Although AP is rare in the general population, AP subjects were readily identified; 18/20 of the AP subjects had music-related professions. A comprehensive auditory questionnaire was collected for each subject that pertained to musical background, education, primary instrument/voice, age of onset of musical training, and AP (Table 1). Subjects in each group were matched for age (F(2,59) = 0.011, p = 0.99), gender (F(2,59) = 0, p = 1), handedness (F(2,59) = 0.38, p = 0.69), and number of languages spoken (F(2,59) = 0.66, p = 0.52). Each group had three to four subjects who spoke a tonal language (e.g., Mandarin). AP and MUS subjects were matched on their primary instrument, onset age of musical training (F(1,39) = 1.3, p = 0.27), and the number of hours of musical training per week (F(1,39) = 2.4, p = 0.44). The ability to judge one note in relation to another given a reference tone is known as relative pitch (RP). RP is very common and all musicians (both AP and MUS) reported having RP. In the CON group, minimal musical training was defined as not having any current musical training in any instrument and having <3 years of any musical training and exposure overall. Of the 20 CON subjects, 11 had no musical training or exposure to any instrument, whereas 9 had minimal exposure, <3 years of musical training, and practiced <6 h/week during that time.
Table 1. Subject demographics and musical experience
Before the collection of data, written informed consent was obtained from each subject after detailed explanation of the experimental procedure. All subjects were screened for normal hearing, had normal structural MRI scans, and did not report any hearing impairments or neurological disorders. The study was approved by the Human Participants Review Committee at York University.
Behavioral tests
AP test.
A standardized AP test (http://www.musicianbrain.com/aptest/), developed in the laboratory of Gottfried Schlaug, permitted objective classification of AP status. The AP test consisted of 24 sine wave tones drawn from the chromatic scale (C4–B4 repeated twice and randomized per trial). Data were collected on four trials (for a total of 96 tones presented). AP ability was confirmed if the accuracy within one semitone was 90% or above (Miyazaki, 1988; Zatorre and Beckett, 1989; Hamilton et al., 2004). Normal audiometric thresholds were confirmed for each subject using an audiometer.
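The classification rule above (a response counts as correct if it falls within one semitone of the target, and AP is confirmed at 90% or above) can be sketched as follows. This is a minimal illustration, not the scoring code of the standardized test: the function names are hypothetical, notes are assumed to be coded as pitch classes 0 to 11 (C = 0, B = 11), and the distance is assumed to wrap at the octave so that B and C are one semitone apart.

```python
def semitone_error(target: int, response: int) -> int:
    """Smallest distance in semitones between two pitch classes (0-11).

    Assumption: distance wraps around the octave (B to C counts as 1).
    """
    diff = abs(target - response) % 12
    return min(diff, 12 - diff)


def ap_score(targets, responses, tolerance=1):
    """Percentage of responses within `tolerance` semitones of the target."""
    hits = sum(semitone_error(t, r) <= tolerance
               for t, r in zip(targets, responses))
    return 100.0 * hits / len(targets)


def has_ap(targets, responses, criterion=90.0):
    """Apply the 90% criterion described in the text."""
    return ap_score(targets, responses) >= criterion


# 96 tones in total: 24 tones (C4-B4 twice) on each of four trials.
targets = list(range(12)) * 8
exact = list(targets)                          # every tone named correctly
two_off = [(t + 2) % 12 for t in targets]      # every answer two semitones off

print(ap_score(targets, exact), has_ap(targets, exact))
print(ap_score(targets, two_off), has_ap(targets, two_off))
```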
Just noticeable difference (JND) test.
A JND frequency test was administered to determine the smallest detectable difference between two pitches. Two independent tests were run using different base frequencies: 1000 Hz (which does not correspond to a musical note) and 987.76 Hz (the equitempered tone of B5). The JND threshold for pitch depends on the frequency of the tone and sound level, as well as duration and the suddenness of the frequency change. We used a 250 ms pure tone with onsets and offsets ramped with a 10 ms cosine. The experiment was programmed in MATLAB (The MathWorks) using the Psychoacoustics Toolbox (Soranzo and Grassi, 2014). Two tones, the base frequency and a higher frequency, were presented in random order and the subjects pressed a key to indicate which pitch was higher. The higher tone was initially 100 Hz above the base frequency and changed on each trial based on the subjects' correct or incorrect responses according to a maximum likelihood procedure that efficiently estimated the difference threshold (Grassi and Soranzo, 2009). For each base frequency, 5 blocks were presented with 30 trials per block using a p-target (sweet point) = 80.9% and β = 0.5 (slope of the logistic psychometric function).
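For illustration, a 250 ms pure tone with 10 ms cosine onset and offset ramps, as described above, could be synthesized as in the following sketch. The original stimuli were generated in MATLAB; this Python version is only an approximation of the description, and the sampling rate (44.1 kHz), the raised-cosine ramp shape, and the function name are assumptions.

```python
import numpy as np


def ramped_tone(freq_hz, dur_s=0.250, ramp_s=0.010, fs=44100):
    """Pure tone with raised-cosine onset and offset ramps.

    fs (sampling rate) is an assumed value; it is not given in the text.
    """
    n = int(round(dur_s * fs))
    t = np.arange(n) / fs
    tone = np.sin(2.0 * np.pi * freq_hz * t)

    # Raised-cosine ramp rising smoothly from 0 toward 1 over ramp_s seconds.
    n_ramp = int(round(ramp_s * fs))
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))

    envelope = np.ones(n)
    envelope[:n_ramp] = ramp          # fade in
    envelope[-n_ramp:] = ramp[::-1]   # fade out
    return tone * envelope


# The two base frequencies used in the JND test.
base_1000 = ramped_tone(1000.0)
base_b5 = ramped_tone(987.76)
```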
Melody mistuning (amusia) test.
To determine whether any of the controls had amusia, an additional behavioral test was administered pertaining to melody mistuning/tone deafness detection developed by Dr. Mandell (http://jakemandell.com/tonedeaf/). The test involved 36 short musical phrases, each presented twice. After hearing each pair of phrases, the participant indicated whether the two melodies were the same or different. The test was made purposefully difficult such that expert musicians on average scored 75% correct. The melodic excerpts varied in musical timbre, duration, and tempo.
The behavioral tests were analyzed with SPSS (Mac version 23; IBM, RRID:SCR_002865) and the α value was Bonferroni adjusted for multiple comparisons for post hoc comparisons between groups.
Imaging
All images were acquired using a 3 T Siemens Trio MRI scanner with a 32-channel head coil at York University. To reduce head motion, cushions were placed around the subjects' heads. A high-resolution T1-weighted 3D MPRAGE scan of the entire head was collected with the following parameters: TR = 1.9 s, TE = 2.52 ms, 1 mm thick slices, 256 × 256 matrix (1 mm3 isotropic voxel size). Ten functional runs with 160 time points each were collected in a single session with the following parameters: echo-planar, gradient echo sequence, 192 mm field of view, 128 matrix (1.5 × 1.5 × 2 mm voxel size), TR = 2 s, TE = 30 ms, 22 slices of 2 mm thickness, flip angle = 90°, partial phase Fourier = 6/8, iPAT = GRAPPA, acceleration factor = 3. A whole head echo-planar image (WHEPI) was collected with the same parameters, except with 77 slices and TR = 7 s, to aid in registration.
Stimuli
Auditory stimuli were presented on MRI-compatible on-ear piezo headphones (MR Confon). Subjects wore attenuating earplugs under the headphones and, to ensure a consistent balance between the background scanner noise and the stimulus, the volume settings of the computer and amplifier were fixed across subjects; all subjects reported clearly hearing the stimuli throughout the scanning procedures. Human hearing ranges from 20 Hz to 20 kHz, with the greatest sensitivity in the 200–2000 Hz range, which occupies up to two-thirds of the basilar membrane (Kollmeier et al., 2008). The stimuli consisted of pure tone logarithmic sweeps (chirps with unity amplitude and continuous phase) (De Martino et al., 2013) ranging from 20 Hz to 10 kHz, including 6 ascending and 6 descending 24 s sweeps in varying order interleaved with four 8 s blank periods. Subjects were instructed to fixate on a cross on the visual display and to remain still while listening to the auditory stimulus.
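As a rough sketch of the stimulus described above, a unity-amplitude logarithmic sweep with continuous phase can be generated by integrating an exponentially changing instantaneous frequency. The sampling rate and function names below are assumptions and are not taken from the original stimulus code. The final lines check that the stated run structure (12 sweeps of 24 s plus four 8 s blanks) sums to 320 s, which corresponds to the 160 time points per run at TR = 2 s.

```python
import numpy as np


def sweep_frequency(t, f_start, f_end, dur_s):
    """Instantaneous frequency (Hz) of a logarithmic sweep at time t (s)."""
    return f_start * (f_end / f_start) ** (np.asarray(t) / dur_s)


def log_sweep(f_start, f_end, dur_s, fs=44100):
    """Unity-amplitude logarithmic chirp with continuous phase.

    The phase is the analytic integral of 2*pi*f(t), where
    f(t) = f_start * (f_end / f_start) ** (t / dur_s).
    fs (sampling rate) is an assumed value.
    """
    t = np.arange(int(round(dur_s * fs))) / fs
    k = f_end / f_start
    phase = 2.0 * np.pi * f_start * dur_s / np.log(k) * (k ** (t / dur_s) - 1.0)
    return np.sin(phase)


ascending = log_sweep(20.0, 10000.0, 24.0)    # 20 Hz -> 10 kHz
descending = log_sweep(10000.0, 20.0, 24.0)   # 10 kHz -> 20 Hz

# Run length implied by the description: 6 ascending + 6 descending sweeps
# of 24 s each, interleaved with four 8 s blank periods.
run_seconds = 12 * 24 + 4 * 8      # 320 s
volumes = run_seconds / 2.0        # at TR = 2 s -> 160 volumes
```

On a logarithmic sweep, equal time intervals span equal frequency ratios, so the midpoint of an ascending 20 Hz to 10 kHz sweep lies at the geometric mean of the endpoints (about 447 Hz) rather than at the arithmetic mean.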
Analysis
Data were preprocessed using MATLAB (RRID:SCR_001622), FreeSurfer (RRID:SCR_001847), and AFNI (RRID:SCR_005927). Surface reconstructions were derived from a single T1 collected at the beginning or end of each experimental scanning session. Cortical data were visualized in AFNI with the surface mapper SUMA. The anatomy was individually analyzed on surface inflated brains in the native space of each subject by a single rater blinded to group membership. HG was readily apparent in this space and parcellations of A1, R, and RT regions of interest (ROIs) were based on the anatomy and directions of tonotopic gradients. For subjects with a bifurcated or trifurcated HG, the most anterior portion was chosen using low-to-high gradients as landmarks, consistent with previous studies (Kaas and Hackett, 2000; Humphries et al., 2010; Moerel et al., 2014). Anatomical statistics were calculated using SPSS (Mac version 25; IBM, RRID:SCR_002865).
Each functional run was aligned by composing a two-step affine registration of the EPI slab registered to the WHEPI and the WHEPI registered to the surface anatomy. The first four volumes of each functional scanning run were discarded. To compensate for subject head movement, the remaining volumes were registered to a single volume obtained during the same scanning session. In addition to motion correcting the functional imaging data, we also used a volume-censoring procedure (Power et al., 2012). The framewise displacement is an aggregate measure of the translational and rotational head movement gleaned from the motion correction transformation for each volume in the functional series. Volumes with a framewise displacement >0.4 mm were flagged for censoring and were not included in the mean functional series. The functional data were time shifted and de-obliqued and the linear trend was removed from the time series of each voxel. The functional data were resampled to 0.75 mm3 and smoothed in surface space with a 3 mm full width at half maximum (FWHM) kernel and combined into a single mean run.
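The volume-censoring step can be sketched as below. This follows the commonly used definition of framewise displacement from Power et al. (2012), in which the absolute frame-to-frame changes in the six rigid-body parameters are summed after converting rotations to arc length on a 50 mm sphere. The input layout (three translations in mm followed by three rotations in radians) and the function names are assumptions; the actual pipeline used AFNI and may differ in detail.

```python
import numpy as np


def framewise_displacement(motion, radius_mm=50.0):
    """Framewise displacement per volume from rigid-body motion parameters.

    motion: array of shape (n_volumes, 6); columns 0-2 are translations in mm
    and columns 3-5 are rotations in radians (assumed layout).
    Rotations are converted to mm as arc length on a sphere of radius_mm.
    The first volume has no predecessor and is assigned FD = 0.
    """
    motion = np.asarray(motion, dtype=float)
    deltas = np.abs(np.diff(motion, axis=0))
    deltas[:, 3:] *= radius_mm
    return np.concatenate([[0.0], deltas.sum(axis=1)])


def keep_mask(fd, threshold_mm=0.4):
    """True for volumes retained; volumes with FD > threshold are censored."""
    return np.asarray(fd) <= threshold_mm


# Toy example: the third volume jumps 0.5 mm along x and is censored.
motion = np.zeros((4, 6))
motion[2:, 0] = 0.5
fd = framewise_displacement(motion)
keep = keep_mask(fd)
percent_censored = 100.0 * (~keep).mean()
```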
Data were analyzed using an open source (DeSimone et al., 2016) adaptation of the pRF model (Dumoulin and Wandell, 2008). pRF maps were thresholded at r2 = 0.0625 (r = 0.25), p = 0.01. We used a 1D Gaussian pRF model to describe the BOLD response of each voxel in terms of stimulus-referred CF and bandwidth (BW), which was the SD of the Gaussian. The model fitting consisted of two phases: a brute force grid search of a sparsely sampled stimulus space to find the initial guess and a subsequent gradient descent search in a more finely sampled space. In addition to the two parameters of the Gaussian frequency response, we also modeled the baseline and amplitude of the voxelwise BOLD signal to account for its arbitrary units. We allowed the model to explore the frequency space 2 Hz–10 kHz. Q was computed as the ratio between CF and the FWHM of the Gaussian pRF response model, where FWHM = 2.355 * SD. Therefore, sharper tuning relates to a higher Q value, whereas broader tuning relates to a lower Q value.
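To make the relationship between the fitted parameters and Q concrete, the following sketch evaluates a 1D Gaussian tuning curve and computes Q = CF / FWHM with FWHM = 2.355 × SD, as defined above. It assumes the Gaussian is defined on a linear frequency axis in Hz and omits the baseline, amplitude, and hemodynamic components of the full model; the function names are illustrative and do not correspond to the API of the open-source package used in the study.

```python
import numpy as np

FWHM_PER_SD = 2.355  # FWHM of a Gaussian expressed in units of its SD


def gaussian_tuning(freq_hz, cf_hz, bw_hz):
    """Unit-height Gaussian frequency tuning curve.

    cf_hz: center frequency (CF); bw_hz: bandwidth (BW), the SD of the Gaussian.
    Assumption: the Gaussian is defined on a linear frequency axis.
    """
    freq_hz = np.asarray(freq_hz, dtype=float)
    return np.exp(-0.5 * ((freq_hz - cf_hz) / bw_hz) ** 2)


def q_factor(cf_hz, bw_hz):
    """Tuning sharpness Q = CF / FWHM, with FWHM = 2.355 * SD."""
    return cf_hz / (FWHM_PER_SD * bw_hz)


# Two hypothetical voxels with the same CF but different bandwidths.
sharp_q = q_factor(1000.0, 400.0)    # narrower tuning, higher Q
broad_q = q_factor(1000.0, 2000.0)   # broader tuning, lower Q
```

With these definitions, a voxel whose FWHM equals its CF has Q = 1, so Q values below 1 describe voxels whose tuning curve is wider than their center frequency.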
Results
Behavioral measures
A one-way ANOVA revealed a significant main effect of group (F(2,59) = 1248.6, p < 0.001) among the AP, MUS, and CON groups for the AP test. Post hoc tests between groups revealed significantly increased scores for AP (99.58 ± 0.29%, mean ± SEM) compared with both MUS (10.0 ± 9.1%, p < 0.001) and CON (7.9 ± 6.7%, p < 0.001), but no significant difference between MUS and CON (p = 0.33) (Fig. 1).
Figure 1. Behavioral test scores. AP: open black circles, MUS: open triangles, CON: open squares. n = 20 per group. A, Absolute pitch test scores. “X” represents the quasi-AP subject who scored 67% correct on the AP test. B, JND thresholds. C, Melody mistuning detection (amusia) test results. Error bars indicate SEM. **p < 0.01; ***p < 0.001.
JNDs were submitted to a one-way repeated-measures ANOVA with base frequency (1000 or 987.76 Hz) as the repeated measure. A significant main effect of group (F(2,57) = 13.0, p < 0.001) was observed but no main effect of frequency (F(1,57) = 2.6, p = 0.11) or interaction between group and frequency (F(2,57) = 0.4, p = 0.67). Comparisons between groups showed that both AP (5.1 ± 2.1 Hz) and MUS (9.3 ± 2.1 Hz) showed smaller JNDs than CON (19.6 ± 2.1 Hz, p < 0.001 and p = 0.001, respectively), but did not significantly differ from each other (p = 0.15). Of particular interest was whether the AP subjects would differ on the JND test between frequencies because only one of the frequencies corresponded to a named musical note, but a paired-samples t test revealed no significant effect of frequency on JND in this group (t(19) = 0.53, p = 0.61) (Fig. 1B).
A one-way ANOVA revealed a significant main effect of group (F(1,59) = 9.27, p < 0.001) for the melody mistuning test. Post hoc tests revealed significantly increased scores for AP compared with CON (p < 0.001) and MUS compared with CON (p = 0.021). There were no significant differences between AP and MUS (p = 0.47). The average score on the melody mistuning test was 81.1 ± 1.4% in AP participants, 77.4 ± 2.1% in MUS, and 70.0 ± 1.8% in CON (Fig. 1C). CON subjects ranged in test scores from 50% to 83% correct and were within the distribution of >61,000 participants who took the test online.
Imaging
Anatomy
A total of 58.3% of all subjects collapsed across hemispheres and groups had an intact HG, whereas 31.7% had their HG partially bifurcate and 10% had their HG partially trifurcate. Interestingly, the left hemisphere had fewer bifurcations or trifurcations (30%) than the right hemisphere (53.3%). We subjected the total volume of HG (A1 + R + RT) to a one-way repeated-measures ANOVA with hemisphere as the within-subjects factor. We found a significant main effect of group (F(2,57) = 10.7, p < 0.001). Planned comparisons revealed that the HG volume in the AP (2661 ± 127 mm3) was significantly larger than in either MUS (1915 ± 127 mm3, p < 0.001) or CON (1973 ± 127 mm3, p < 0.001), which did not differ significantly from each other (p = 0.75). In addition, there was a significant main effect of hemisphere (F(1,57) = 9.6, p = 0.003) with the left HG (2326 ± 104 mm3) significantly larger than the right (2041 ± 64 mm3), but there was no significant interaction between group and hemisphere (F(2,57) = 0.38, p = 0.69). A Pearson product-moment correlation coefficient (two-tailed) was computed to assess the relationship between the volume of HG in the AP and MUS groups and the mean number of practice hours per week and the onset of age of musical training, but no significant correlations were found.
The volumes of the individual cortical areas A1, R, and RT were each subjected to a one-way repeated-measures ANOVA, with hemisphere as the repeated measure (Fig. 2). For A1, there was a significant main effect of group (F(2,57) = 21.8, p < 0.001). Planned comparisons revealed that A1 volume was greater (p < 0.001) in AP (1284 ± 53 mm3) than in MUS (832 ± 53 mm3) or CON (870 ± 53 mm3), but did not significantly differ between MUS and CON (p = 0.63). The main effect of hemisphere was also significant (F(1,57) = 16.6, p < 0.001), with the left hemisphere larger than the right, 1068 ± 41 vs 922 ± 29 mm3, but no significant interaction between group and hemisphere (F(2,57) = 0.49, p = 0.61). For R, there was a significant main effect of group (F(2,57) = 5.12, p = 0.009). Planned comparisons revealed that R volume was greater in AP (719 ± 40 mm3) than in MUS (549 ± 40 mm3, p = 0.004) or CON (579 ± 40 mm3, p = 0.016), but did not significantly differ between MUS and CON (p = 0.61). The main effect of hemisphere was also significant (F(1,57) = 4.57, p = 0.037), with the left hemisphere larger than the right, 655 ± 34 vs 576 ± 24 mm3, but no significant interaction between group and hemisphere (F(2,57) = 0.19, p = 0.83). For RT, there was a significant main effect of group (F(2,57) = 6.08, p = 0.004). Planned comparisons revealed that RT volume was greater in AP (690 ± 40 mm3) than in MUS (530 ± 40 mm3, p = 0.006) or CON (515 ± 40 mm3, p = 0.003), but did not significantly differ between MUS and CON (p = 0.78). The main effect of hemisphere was only marginally significant (F(1,57) = 2.91, p = 0.093), with the left hemisphere larger than the right, 607 ± 32 vs 549 ± 25 mm3, and there was no significant interaction between group and hemisphere (F(2,57) = 0.35, p = 0.70).
Figure 2. Volumes of A1, R, and RT, collapsed across hemispheres. AP: open black circles, MUS: open black triangles, CON: open black squares. n = 20 per group. Error bars indicate SEM. *p < 0.05; **p < 0.01; ***p < 0.001.
Function
We conducted a one-way ANOVA comparing the percentage of time points censored for excessive motion. Across groups, 4.32 ± 0.70% of time points were removed due to a framewise displacement of >0.4 mm, but this percentage did not significantly vary among the three groups (F(2,59) = 0.074, p = 0.93). The frequency tuning of voxels in the three cortical areas was generally well described by the Gaussian pRF model. Across all subjects and areas, 33.2% of voxels exhibited below-threshold responses to the auditory stimuli. In 34.0% of the remaining activated voxels, the fitted CF and/or BW parameters fell outside 100 Hz–10 kHz. Therefore, of the voxels located within the anatomically defined auditory areas, 44.1% were well activated and well characterized by the Gaussian pRF model and their parameters were subjected to further analysis. To confirm that there were no biases in activation across groups, the percentage of activated voxels in each area was subjected to a repeated-measures ANOVA, with hemisphere and area as within-subjects factors. The main effect of group was not significant (F(1,57) = 2.06, p = 0.14), but there was a significant effect of hemisphere (F(1,57) = 4.40, p = 0.040), with the right hemisphere better activated than the left (46.2 ± 2.6 vs 41.8 ± 2.6%). There was also a significant interaction between hemisphere and area (F(1,57) = 6.91, p = 0.011), with the asymmetry most pronounced in area RT, with 49.0 ± 3.6% of voxels activated in the right hemisphere compared with 39.4 ± 3.5% in the left.
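The 44.1% figure follows from applying the two exclusion steps in sequence, as this short check shows. The variable names are illustrative only.

```python
# Fractions reported in the text.
below_threshold = 0.332             # voxels with below-threshold responses
out_of_range_given_active = 0.340   # of the remaining voxels, CF and/or BW out of range

active = 1.0 - below_threshold                          # 66.8% pass the threshold
analyzed = active * (1.0 - out_of_range_given_active)   # 66.8% x 66.0%

print(f"{100 * analyzed:.1f}% of voxels analyzed")  # prints "44.1% of voxels analyzed"
```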
Examining the topography of CF representation, we found consistent tonotopic maps with a high-to-low frequency gradient in A1, reversing to low-to-high frequency in R and reversing again to high-to-low in RT; the gradients extended into surrounding areas (Moerel et al., 2014). However, our results may also be interpreted as parallel high-to-low gradients within each auditory region (Saenz and Langers, 2014). The topography of Q was less well defined but approximately followed the gradient of CF, with narrow-to-broad tuning oriented in the same direction as high-to-low frequency. Typical examples for each group of the distributions over the early auditory cortex of CF and Q are shown in Figure 3.
Figure 3. pRF maps of tonotopy (center frequency) and Q in auditory cortex in representative subjects from each group. A, Left surface inflated hemispheres. B, Left zoomed in view of the pRF map for A1, R, RT and belt regions. Solid black lines indicate boundaries between tonotopic maps and black arrows indicate direction of tonotopic gradient (low-high) consistent with previous interpretations (Kaas and Hackett, 2000; Moerel et al., 2014). C, Left magnified view of the Q maps.
To qualitatively compare the auditory representations among the three groups, we examined the distribution of CF and Q (Fig. 4). Voxels were pooled from all subjects and hemispheres in each group and CF and Q were sorted into 100 bins, ranging from 100 to 10,000 Hz and Q = 0–1, respectively. We measured the statistical significance of the difference at each bin in the distributions by calculating a paired t test that included 5 bins on either side, adjusted for multiple comparisons (100 bins) using a reduced α = 0.0005. Planned comparisons were made between the AP and MUS groups and between the MUS and CON groups. In Figure 4, the bins in the distribution that were significantly different from each other are indicated by circular markers (red for AP vs MUS and green for MUS vs CON). As can be seen, the increase in cortical area in A1 for the AP group compared with MUS was mostly due to a larger number of voxels with CF < 1000 Hz and those with broad tuning, Q < 0.6. Interestingly, the AP group had significantly fewer voxels with CF in the 1250–2000 Hz range compared with the MUS group, which in turn had significantly more voxels in this range than the CON group. Similar, but smaller differences between AP and MUS were observed in area R for CF in the range of 400–800 Hz and for Q = 0.1–0.2. There were few differences between AP and MUS in area RT for either CF or Q. There were few differences at all in Q between the MUS and CON groups in any area, but the MUS group had a significantly larger number of voxels than the CON group with CF ∼500–1000 Hz in areas R and RT.
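The binning and the multiple-comparisons threshold described above can be sketched as follows. Two points are assumptions rather than details given in the text: the bins are taken to be linearly spaced, and the reduced α = 0.0005 is taken to be a Bonferroni correction of a family-wise α = 0.05 across 100 bins. The sliding 11-bin paired t test is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

N_BINS = 100


def bin_counts(values, lo, hi, n_bins=N_BINS):
    """Histogram of voxel parameters over [lo, hi] in n_bins equal-width bins.

    Assumption: linear bin spacing. Values outside [lo, hi] are ignored.
    """
    edges = np.linspace(lo, hi, n_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    return counts, edges


def bonferroni_alpha(family_alpha=0.05, n_tests=N_BINS):
    """Per-test alpha after Bonferroni correction (assumed family alpha 0.05)."""
    return family_alpha / n_tests


# Hypothetical CF values (Hz) and Q values for a handful of voxels.
cf_counts, cf_edges = bin_counts([150.0, 450.0, 9999.0, 50.0], 100.0, 10000.0)
q_counts, q_edges = bin_counts([0.05, 0.15, 0.55], 0.0, 1.0)
alpha = bonferroni_alpha()
```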
Figure 4. Center frequency and Q distributions for A1, R, and RT in the AP group (red), the MUS group (green), and the CON group (blue). Each distribution is across all subjects in that group. The red dots indicate points at which the AP distribution differs significantly from the MUS distribution and the green dots indicate points at which the MUS distribution differs significantly from CON.
Discussion
This is the first study to measure tonotopic organization and tuning sharpness in large samples of AP, MUS, and CON subjects within the A1, R, and RT subdivisions of HG. We observed that A1, R, and RT were significantly larger in AP than MUS or CON subjects, which did not differ significantly from each other. The increased A1 and R area in AP was primarily due to increases in the volume devoted to low-frequency and broadly tuned responses, whereas the distribution of tuning responses in area RT did not differ significantly.
Previous studies in nonhuman primates found sharper (narrower) neural tuning within A1 core regions and broader tuning in belt regions (Rauschecker et al., 1995; Hackett et al., 1998; Rauschecker and Tian, 2004; Kajikawa et al., 2005; Kuśmierek and Rauschecker, 2009). Tuning widths were also reported narrower in the human auditory core regions compared with nonhuman primates based on electrophysiological recordings (Bitterman et al., 2008) and a filter model of cochlear responses suggested sharper tuning in musicians (Bidelman et al., 2014). Whereas broader frequency tuning might be counterintuitive, it is suggestive of a greater utilization of ensemble encoding of frequency, with more cortical machinery involved in the encoding of frequencies in AP subjects. Ensemble encoding can allow more efficient and thus higher bandwidth representations (Cohen et al., 2016). Additionally, frequency tuning within a voxel in AP subjects may be broader either due to the underlying neurons themselves having broader tuning or due to more scatter among the neurons within the voxel. Given that there are smooth tonotopic gradients in auditory cortex, the former seems more likely.
We found consistent tonotopic map interpretations of the cortex that matched previous studies: a high-to-low progression in A1, followed by a reversal gradient of low-to-high in R, followed by a gradient of high-to-low in RT, with extended gradients into neighboring belt regions as found in neuroimaging studies in humans (Da Costa et al., 2011; Moerel et al., 2012, 2014; De Martino et al., 2013) and in microelectrode studies in nonhuman primates (Morel et al., 1993; Kaas and Hackett, 2000). The auditory cortical anatomical model in the monkey has been well defined predominantly based on neuro-electrical recordings (Kaas, 2011). However, there exist many differences in the human compared with monkey auditory cortex, including larger cortical surface areas, additional gyri, more interindividual variability, and sharper frequency tuning in humans (Galaburda et al., 1978; Bitterman et al., 2008). Therefore, it may not be straightforward to apply the monkey model to the human brain for direct comparison. A variety of parcellation schemes have been proposed for the human auditory cortex. Barton et al. (2012) suggest a cloverleaf parcellation of auditory cortex based on their periodotopy measurements. However, Herdener et al. (2013) suggested a different parcellation using periodotopy, and Leaver and Rauschecker (2016) found no evidence to support large-scale organized periodotopy at all. For further review of the inconsistencies in parcellation schemes, see Moerel et al. (2014). Our results are not strongly dependent on the precise parcellation scheme because we observed a broad and gradually decreasing trend from A1 to R to RT.
Based on our behavioral tests in this study, we report that AP and MUS had significantly smaller JND thresholds than their CON counterparts. Our findings are consistent with a previous study finding no differences in JND thresholds between AP and MUS groups (Fujisaki and Kashino, 2002), although musicians overall have done significantly better than non-musician controls (Micheyl et al., 2006). In addition, a previous study found sharper cochlear tuning in a high 4 kHz frequency region in musicians compared with non-musicians using both forward- and simultaneous-masked psychophysical tuning curves and a relationship between years of musical training and sharper tuning (Bidelman et al., 2014).
An ancillary finding of this study was that 20% of the AP subjects who were recruited did not have any musical training before the age of 7 years (the critical period window) and only started any formal musical training and note association labeling in their early to late teens. This finding does not support the critical period hypothesis suggesting that a child must be exposed to musical training in note labeling before the age of seven for AP ability to emerge (Levitin and Zatorre, 2003; Russo et al., 2003; Miyazaki and Ogawa, 2006). Although debated, further claims that AP is linked to a critical period suggest that musical training after the age of 9 years very seldom leads to AP emergence, which is additionally supported by the absence of reported cases of adults successfully developing it (Brady, 1970; Cohen and Baird, 1990; Ward, 1999; Levitin and Rogers, 2005). Our findings suggest that genetics may play a more salient role for AP ability to emerge in neurodevelopment as opposed to a critical period alone.
To our knowledge, this is the first study to separately extract the volume of auditory ROIs comprising HG (A1, R, and RT) in humans categorized by pitch perception attributes. Most notably, A1 volume was significantly larger in both hemispheres in AP compared with MUS and CON subjects. Our results are consistent with reported findings that A1 occupies approximately half of the HG volume (Rademacher et al., 2001). Within A1, neurons exhibit characteristic responses to harmonic spectral stimuli and periodic temporal modulations (Wang, 2013). Rats with bilateral A1 inactivation showed impairments in their response to pure tone frequency changes (Talwar et al., 2001). It therefore seems that A1 contributes to auditory pitch discrimination. However, it is not clear whether only a subset of A1 neurons participates in pitch encoding, with other neurons analyzing temporal or spectral components of sound, or whether pitch is preferentially encoded in secondary cortical fields beyond A1. Nonetheless, A1 is implicated in AP, suggesting that its enhanced volume may be related to AP ability in pitch categorization and perception.
Areas outside of the auditory core, including the belt, parabelt, and regions beyond, may play a more relevant role in pitch perception. For example, a previous fMRI study reported cortical activation in response to pitch height that extended beyond auditory core regions into the posterior planum temporale, whereas cortical activation in response to pitch chroma (i.e., pitch class, where a set of pitches are related to each other by octave) extended to the planum polare, a region just anterior to A1; a hierarchical stream of pitch processing was proposed to account for these findings, with areas beyond the primary auditory cortex having specialized perceptual roles (Warren et al., 2003). More anterior regions to the core were responsive to object-independent auditory stimuli, whereas more posterior regions to the core including the planum temporale were more responsive to object identification. Future studies need to account for these extended regions involved in pitch processing, including the read-out of lower-level representations, which we found to be markedly different in AP compared with MUS and CON subjects.
We report that all auditory ROIs in left HG trended larger than those in the right HG, with A1 and R being significantly larger. Several studies have reported that left HG was ∼10–30% larger than the right using MRI (Penhune et al., 1996; McCarley et al., 2002; Sumich et al., 2002; Dorsaint-Pierre et al., 2006; Takahashi et al., 2006; Golestani et al., 2007; Salisbury et al., 2007) and postmortem measurements (Chance et al., 2008; Smiley et al., 2013), whereas other neuroimaging studies, including those with large sample sizes, did not report hemispheric asymmetry in HG (Kulynych et al., 1995; Frangou et al., 1997; Schneider et al., 2002; Knaus et al., 2006). We used the stringent boundary delineations of the recent working model of the human auditory cortex that only includes A1, R, and RT in HG (Moerel et al., 2014). The reported discrepancies on asymmetry in HG are likely due to various interpretations and methods of defining the borders of HG. A few studies included surrounding areas of the planum temporale and planum polare within HG, which led to considerable overestimation of the auditory core size and potentially biased the asymmetry measurements (Da Costa et al., 2011; Herdener et al., 2013; Langers, 2014).
In conclusion, we found that each of the auditory areas in HG was significantly larger in subjects with AP compared to MUS and CON groups, and that this increased cortical area was primarily broadly tuned to frequencies below 1000 Hz.
Footnotes
The authors declare no competing financial interests.
- Correspondence should be addressed to Keith A. Schneider at keithas{at}udel.edu