Abstract
In normal listeners, the tonal rules of music guide musical expectancy. In a minority of individuals, known as amusics, the processing of tonality is disordered, which results in severe musical deficits. It has been shown that the tonal rules of music are neurally encoded, but not consciously available in amusics. Previous neurophysiological studies have not explicitly controlled the level of attention in tasks where participants ignored the tonal structure of the stimuli. Here, we test whether access to tonal knowledge can be demonstrated in congenital amusia when attention is controlled. Electric brain responses were recorded while asking participants to detect an individually adjusted near-threshold click in a melody. In half the melodies, a note was inserted that violated the tonal rules of music. In a second task, participants were presented with the same melodies but were required to detect the tonal deviation. Both tasks required sustained attention, thus conscious access to the rules of tonality was manipulated. In the click-detection task, the pitch deviants evoked an early right anterior negativity (ERAN) in both groups. In the pitch-detection task, the pitch deviants evoked an ERAN and P600 in controls but not in amusics. These results indicate that pitch regularities are represented in the cortex of amusics, but are not consciously available. Moreover, performing a pitch-judgment task eliminated the ERAN in amusics, suggesting that attending to pitch information interferes with perception of pitch. We propose that an impaired top-down frontotemporal projection is responsible for this disorder.
Introduction
In the majority of individuals, musical knowledge, such as tonality, guides perception and performance. Awareness of tonal violations results from conscious access to tonal knowledge that was acquired implicitly through exposure to music throughout life. Despite normal exposure to music, people with congenital amusia fail to detect tonal violations. As a consequence, amusics may not experience music or enjoy musical activities as most humans do (Peretz, 2013).
There is increasing evidence that amusics have some tonal knowledge. For example, tonal pitch structure can facilitate the discrimination of speech phonemes and acoustic timbre in amusics (i.e., priming), which suggests the presence of sophisticated tonal knowledge (Omigie et al., 2012a; Tillmann et al., 2012). Similarly, electrical brain responses elicited by tonal violations indicate that amusics possess tonal knowledge but are unable to use it. Peretz et al. (2009) presented amusics and controls with melodies that occasionally contained a tonal violation. Amusics could not perceptually distinguish the in-tune melodies from the melodies with a pitch deviant; however, the pitch deviants evoked an N200 that was related to the automatic detection of a melodic tonal violation [i.e., early right anterior negativity (ERAN); Koelsch, 2011]. In the controls, the ERAN was followed by a P600 that indexes the attentive process of integrating the incongruous note into the melodic context (Besson and Faita, 1995; Patel et al., 1998; Brattico et al., 2006). No P600 was observed in amusics (Peretz et al., 2009). Similarly, Omigie et al. (2013) found that the N1 was augmented for tones that were less expected based on the tonal hierarchy (Krumhansl and Kessler, 1982). The N1 modulation was observed in both controls and amusics while detecting a timbral deviant (Omigie et al., 2013). Overall, these studies demonstrate that the amusic brain is sensitive to tonality but lacks access to this knowledge.
Critically, in previous studies, attention was not systematically controlled. Attention modulates the emergence of a P600 in addition to early processing of acoustic information. For example, the N1, mismatch negativity, and ERAN are all enhanced when the incoming stimulus is attended (Näätänen and Picton, 1987; Näätänen et al., 2007; Koelsch, 2011). This modulation is believed to be part of a selection process that arises from frontal areas of the brain, as these attention-selection processes are reduced after frontal lobe damage (Knight et al., 1981; Chao and Knight, 1998). In amusia, it is likely that aspects of these frontal attentional mechanisms are disrupted. For example, amusia is associated with decreased white matter concentration in the inferior frontal gyrus (IFG; Hyde et al., 2006; Mandell et al., 2007), decreased volume of the arcuate fasciculus (Loui et al., 2009), and reduced functional connectivity between the IFG and the auditory cortex (Hyde et al., 2011; Albouy et al., 2013). An abnormal propagation of pitch information along this pathway may cause fluctuations in both the early-negative and late-positive brain responses associated with attentional mechanisms related to the detection of pitch deviances. Importantly, it is likely that the disordered attentional mechanisms in amusia relate to conscious access (Loui et al., 2008; Peretz et al., 2009).
Materials and Methods
Rationale
To control for attention and manipulate conscious access to the rules of tonality, participants were given two tasks. First they were asked to detect an individually adjusted near-threshold click embedded in a melody. This task allowed us to record cortical responses to tonal violations when attention was sustained on an auditory task, but focused away from tonal information. Critically, using individual thresholds for the click helped maintain a similar level of attention in all participants. Second, participants were asked to detect a tonal violation in a melody. This task allowed us to record cortical responses to tonal violations when attention was focused on tonal information. In control participants, we expect both tonal violations to evoke an ERAN in the click-detection and pitch-detection tasks, and a P600 in the pitch-detection task only. If conscious access is the core deficit of amusia, and not sustained attention or auditory processing, a normal ERAN and N1 should be observed in amusics. In contrast, the P600 should be absent because it is related to conscious access of a tonal violation. One limitation with this task is that amusics have known difficulties detecting tonal violations, and thus their overall level of attention during the pitch-detection task may differ from their overall level of attention during the click-detection task. To address this issue, a second analysis was done that compared ERPs between the pitch-detection and click-detection tasks within each group. Differences in these ERPs would highlight attentional differences between the two conditions. Specifically, it is well known that there is an increased negativity during the N1-P2 epoch (i.e., 100–300 ms) that is related to sustained auditory attention (Näätänen and Picton, 1987). 
Given that sustained attention is required for both the click-detection and pitch-detection tasks, we expect little difference between the ERPs in the pitch-detection and click-detection tasks when the target note is in tune. Furthermore, we expect an increased positivity related to the P600 when the note violates the tonal structure of the melody during the pitch-detection task in controls only. This pattern of results would demonstrate similar levels of attentional focus between the two tasks, and demonstrate that detecting a near-threshold click requires a similar level of attention as detecting a pitch deviant.
Participants
Twenty people were recruited for the study and provided formal informed consent according to the Research Ethics Council for the Faculty of Arts and Sciences at the Université de Montréal. Nine were amusic (six female and three male) and 11 were nonamusic controls (eight female and three male). All participants were nonmusicians, and the control and amusic groups were matched in terms of age and education level (Table 1). Amusics were identified by their scores on the Montreal Battery of the Evaluation of Amusia (MBEA; Peretz et al., 2003). All amusic participants had a global MBEA score that was two SDs below the mean of the controls. In addition, amusics were severely impaired in the detection of out-of-key notes in melodies and less impaired in the detection of off-beat notes in the same melodies when assessed with the online test (Peretz et al., 2008).
Material and procedure
Participants were presented with 40 novel melodies constructed from the Western major scale. All melodies were four bars long and played using a synthesized piano tone. The root-mean square (RMS) amplitude of each note was equated. On average, melodies had 10.3 notes (range: 7–15 notes) and lasted 5.4 s (range: 2.8–12 s). They were randomly mixed with the same melodies in which 40 target tones were played out-of-key [±100 cents (1 semitone)] and 40 target tones that were out-of-tune [±50 cents (one-half semitone)]. All melodies were presented through Etymotic (ER-2) insert earphones at 75 dB SPL. The interstimulus interval was 2270 ms. The target tone was always on the first beat of the third bar and was always 500 ms long. The melodies were the same as the melodies from a previous study on amusia (Peretz et al., 2009). The difference in material between the prior study and this one concerns the insertion of a click in half the melodies. The clicks were 0.023 ms and were individually adjusted in amplitude so the click would be just above threshold. When present, the click occurred ≥4 notes (average 7.2 notes) after the target tone, and was calibrated in amplitude such that a listener could detect ∼75% of the clicks. Accordingly, there were six versions of each melody (i.e., in-tune, click; in-tune, no click; out-of-tune, click; out-of-tune, no click; out-of-key, click; out-of-key, no click) yielding a total of 240 melodies. For all participants, two distinct sets of 120 stimuli were used with each task comprising three randomly mixed versions of each melody. In both tasks, each type of target tone (in-key, out-of-tune, or out-of-key) occurred in one-third of the melodies and half of the melodies had a click. A sample stimulus is illustrated at the top of Figure 1.
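The stimulus arithmetic above can be checked with a short sketch. It is illustrative only: the names NOTE_TYPES, cents_to_ratio, and build_stimulus_list are hypothetical and not taken from the study's materials, and only the magnitude of each pitch deviation is modeled because the direction (sharp or flat) of each deviant is not specified per melody here.

```python
from itertools import product

# Magnitude of the target-tone deviation in cents (hypothetical labels).
NOTE_TYPES = {"in-key": 0, "out-of-tune": 50, "out-of-key": 100}


def cents_to_ratio(cents):
    """Frequency ratio for a pitch shift in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)


def build_stimulus_list(n_melodies=40):
    """Enumerate every (melody, note type, click present) combination."""
    melodies = range(1, n_melodies + 1)
    return list(product(melodies, NOTE_TYPES, (True, False)))


stimuli = build_stimulus_list()
print(len(stimuli))  # 40 melodies x 3 note types x 2 click states
print(round(cents_to_ratio(100), 4), round(cents_to_ratio(50), 4))
```

Under these assumptions, a 100-cent shift corresponds to a frequency ratio of about 1.0595 and a 50-cent shift to about 1.0293.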
First, participants completed the click-detection task. To determine the starting amplitude of the click, participants were presented with melodies that contained a 76 dB SPL click. Participants were asked whether they could hear the click. The amplitude of the click was reduced in steps of 10, 5, 5, 3, and 1 dB until the participant could no longer detect the click. The amplitude was then increased in 1 dB steps until they could detect the click. This amplitude was chosen as the starting click amplitude in the click-detection task. Across all participants, the average level was 69.3 dB SPL (SD, 5.7). After the click threshold was determined, participants were asked whether they heard a click in the melody and, if so, how sure they were that they heard the click. After each melody, participants could respond, “click, sure,” “click, not sure,” “no click, not sure,” or “no click, sure” by pressing a button on a computer keyboard. Participants were not informed that the melodies could contain an out-of-key or out-of-tune note during the click-detection task. To maintain individual accuracy level at ∼75% correct, the intensity of the clicks was adjusted during the task. Based on the accuracy of the eight previous trials, the intensity of the click was decreased by 2.25 dB if the accuracy was eight out of eight and by 0.75 dB if the accuracy was seven out of eight. The intensity was increased by 0.75 dB if accuracy was five out of eight, and by 1.5 dB if accuracy was four out of eight or below.
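The adjustment rule used during the task can be written as a small function. This is a minimal sketch, not the presentation software used in the study. The function name adjust_click_level is hypothetical, and the handling of six correct out of eight (no change, since 6/8 matches the 75% target) is an assumption because that case is not stated explicitly.

```python
def adjust_click_level(level_db, last_eight_correct):
    """Return the click level (dB SPL) to use on the next trial.

    last_eight_correct: eight booleans, True where the response was correct.
    """
    if len(last_eight_correct) != 8:
        raise ValueError("expected exactly eight trial outcomes")
    n_correct = sum(bool(x) for x in last_eight_correct)
    if n_correct == 8:
        return level_db - 2.25  # too easy: make the click quieter
    if n_correct == 7:
        return level_db - 0.75
    if n_correct == 6:
        return level_db  # assumption: 6/8 (75%) leaves the level unchanged
    if n_correct == 5:
        return level_db + 0.75
    return level_db + 1.5  # four or fewer correct: make the click louder


# Example: seven of the last eight trials were correct at 70 dB SPL.
print(adjust_click_level(70.0, [True] * 7 + [False]))
```

The steps are larger at the extremes (eight or at most four correct), so the level moves back toward the ∼75% target faster when performance is far from it.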
Next, participants completed the pitch-detection task. In this task, participants were asked whether they heard an incongruous note in the melody, and how sure they were that there was an incongruous note. After each melody, participants could respond “wrong note, sure,” “wrong note, not sure,” “no wrong note, not sure,” or “no wrong note, sure” by pressing a button on a computer keyboard. For the pitch-detection task, participants were told to ignore the clicks. For both tasks, the response choices were displayed on the screen (Fig. 1) and the position of the response buttons on the keyboard was counterbalanced across participants. Before each test, there were 12 practice trials that included performance feedback. No feedback was provided for the experimental trials. The pitch-detection and click-detection tasks were not counterbalanced, so participants remained blind to the presence of the out-of-key and out-of-tune notes, allowing for attention to be focused on detecting the click and not on tonal anomalies.
Electric brain activity was digitized continuously from 70 active electrodes at a sampling rate of 256 Hz, with a high-pass filter set at 0.1 Hz, using a Biosemi ActiveTwo system (Biosemi). Five electrodes were placed bilaterally at mastoid, inferior ocular, and lateral ocular sites (M1, M2, IO1, LO1, LO2). All averages were computed using Brain Electrical Source Analysis (BESA; version 5.2). ERPs were averaged to the onset of the target note (i.e., out-of-tune note, out-of-key note, or in-key note). The analysis epoch included 200 ms of prestimulus activity and 900 ms of poststimulus activity. Continuous EEG was then averaged separately in the click-detection and pitch-detection task for each melody type (i.e., in-tune, out-of-tune, and out-of-key) and each electrode site. Prototypical eye blinks and eye movements were extracted from the continuous EEG. A principal component analysis of these averaged recordings provided a set of components that best explained the eye movements. These components were then decomposed into a linear combination along with topographical components that reflected brain activity. This linear combination allowed the scalp projections of the artifact components to be subtracted from the experimental ERPs to minimize ocular contamination, such as blinks and vertical and lateral eye movements, for each individual (Berg and Scherg, 1994). After this correction, trials with >120 μV of activity were considered artifacts and excluded from further analysis. Overall, an average of 13.8% of trials were rejected. To determine whether there was an impact of Group, Note type, or Listening condition, a mixed-design ANOVA was calculated on the number of rejected trials. Only the effect of listening condition was significant (F(1,18) = 5.81, p = 0.027), with more trials being rejected during the pitch-detection task compared with the click-detection task (17.3 vs 10.4%). 
There was no impact of Group or Note type on the number of rejected trials (p > 0.1). Averaged ERPs were then bandpass filtered to attenuate frequencies <0.1 Hz and >15 Hz and referenced to the linked mastoid.
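The epoching and amplitude-based rejection steps can be illustrated with NumPy. The study used BESA for these steps, so this is only a sketch under stated assumptions: the input is assumed to be ocular-corrected EEG in microvolts, the criterion ">120 μV of activity" is interpreted as any absolute sample value above 120 μV on any channel, and the function name epoch_and_reject is hypothetical.

```python
import numpy as np

FS = 256            # sampling rate in Hz, as reported
PRE_MS = 200        # prestimulus interval
POST_MS = 900       # poststimulus interval
REJECT_UV = 120.0   # rejection criterion in microvolts


def epoch_and_reject(eeg_uv, onsets, fs=FS):
    """Cut epochs around target-note onsets and drop high-amplitude trials.

    eeg_uv: array of shape (n_channels, n_samples), in microvolts.
    onsets: sample indices of target-note onsets.
    Returns (kept_epochs, n_rejected), where kept_epochs has shape
    (n_kept, n_channels, n_epoch_samples).
    """
    pre = int(round(PRE_MS * fs / 1000))
    post = int(round(POST_MS * fs / 1000))
    kept, n_rejected = [], 0
    for onset in onsets:
        start, stop = onset - pre, onset + post
        if start < 0 or stop > eeg_uv.shape[1]:
            continue  # epoch runs off the recording; skipped, not counted
        epoch = eeg_uv[:, start:stop]
        # Assumption: reject if any channel exceeds the absolute threshold.
        if np.max(np.abs(epoch)) > REJECT_UV:
            n_rejected += 1
        else:
            kept.append(epoch)
    if kept:
        return np.stack(kept), n_rejected
    return np.empty((0, eeg_uv.shape[0], pre + post)), n_rejected
```

Averaging the kept epochs over the first axis, separately for each task and note type, would then give one ERP per condition and electrode.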
Data analysis
Behavioral.
Behavioral data were analyzed separately for the pitch-detection and click-detection tasks. The ratings (1–4) were separated into accuracy and confidence scores. A trial was considered accurate if the person made the correct judgment, regardless of his or her confidence. If correct, it was scored as 1; if incorrect, it was scored as 0. Raw accuracy was the overall percentage of correct responses. For group comparisons, accuracy was further calculated as hits minus false alarms (HFAs). In the click-detection task, a false alarm corresponded to the hearing of a click when there was none. Similarly, in the pitch-detection task, a false alarm corresponded to reporting a wrong note when there was none. The HFA scores were calculated separately for in-key (click-detection task only), out-of-key, and out-of-tune melodies. HFAs could not be calculated for the in-key note during the pitch-detection task because responses to the in-key note were needed to calculate the false-alarm rate for out-of-key and out-of-tune notes. Confidence was quantified by separating “sure” from “not sure” responses regardless of the judgment. “Sure” responses were coded as 1. “Not sure” responses were coded as 0. Confidence was the percentage of trials reported as “sure.” These responses were analyzed using mixed-design ANOVAs that included Group (amusic, control) and Note type [in-key (click-detection task only), out-of-key, out-of-tune].
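The scoring scheme can be sketched as follows. The mapping of ratings 1 to 4 onto the four response options is an assumption (1 = "present, sure" through 4 = "absent, sure", following the order in which the options are listed), and HFA is returned as a proportion rather than a percentage. The function names are hypothetical.

```python
def split_rating(rating):
    """Split a 1-4 rating into (reported_present, sure).

    Assumed coding: 1 = present/sure, 2 = present/not sure,
    3 = absent/not sure, 4 = absent/sure.
    """
    if rating not in (1, 2, 3, 4):
        raise ValueError("rating must be 1, 2, 3, or 4")
    return rating in (1, 2), rating in (1, 4)


def hits_minus_false_alarms(target_present, target_absent):
    """HFA from 'present' reports on target-present and target-absent trials.

    Each argument is a non-empty list of booleans (True = reported present).
    """
    if not target_present or not target_absent:
        raise ValueError("both trial lists must be non-empty")
    hit_rate = sum(target_present) / len(target_present)
    false_alarm_rate = sum(target_absent) / len(target_absent)
    return hit_rate - false_alarm_rate


def confidence(ratings):
    """Proportion of trials rated 'sure', regardless of the judgment."""
    return sum(split_rating(r)[1] for r in ratings) / len(ratings)


# Example: 3 of 4 hits and 1 of 4 false alarms.
print(hits_minus_false_alarms([True, True, True, False],
                              [True, False, False, False]))
```

In the pitch-detection task, the target-absent list would hold the responses to in-key melodies. This is why no separate HFA score can be computed for the in-key note in that task.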
EEG.
To quantify the EEG data, a series of pairwise permutation tests was done using BESA statistics (version 1.0; Maris and Oostenveld, 2007; Maris, 2012). The analysis was entirely data driven and included every time point at each electrode in the analysis. The first part of the analysis focused on within-subject effects by comparing the ERPs recorded to the out-of-tune and out-of-key notes to the in-key note in both groups. These comparisons were performed to identify the ERAN and P600. The ERAN was defined as a difference in the ERP evoked by the out-of-key or out-of-tune note that was more negative than the in-key note during the 100–300 ms epoch at frontocentral electrodes. The P600 was defined as a difference in the ERP evoked by the out-of-key or out-of-tune note that was more positive than the in-key note during the 400–800 ms epoch at posterior electrodes. These epochs were chosen because previous studies have demonstrated that the ERAN and P600 occur within these timeframes (Besson and Faita, 1995; Koelsch, 2011). Before comparing ERPs between amusics and controls, a series of t tests was calculated that compared the amplitude of ERPs for the in-key note to the ERPs for the out-of-key or out-of-tune note at every electrode and time point. From this analysis, clusters of electrodes and time points where there were differences between the two melody types were identified. Clusters were formed over time by grouping adjacent time points that each had a significant t test (i.e., p < 0.05). Clusters were formed over space by grouping electrodes within 4 cm of each other (i.e., adjacent electrodes) that had significant t tests at the same time point. Accordingly, clusters were dynamic; that is, the electrodes that formed a cluster could change over the identified epoch. Critically, the formation of these clusters was entirely data driven. However, given the number of multiple comparisons, it was likely that some of the identified clusters were actually type I errors.
To account for the multiple comparisons, a permutation approach was used to determine the probability of the differences being real. This permutation test involved comparing the clusters identified in the previous step by randomly assigning participants or conditions into two groups or conditions, and repeating the multiple t tests. If the effect of group or condition is real, the t tests comparing the randomly permuted groups or conditions should yield nonsignificant results (Maris and Oostenveld, 2007). To derive a probability estimate, 1000 different permutations were calculated. The percentage of permutations where the largest t value in the cluster was significant provides an estimate of likelihood of the original difference being due to chance alone (i.e., a p value; Maris and Oostenveld, 2007). For example, if 100 of the 1000 random permutations were significant, then the p value would be 0.1; if 800 of the 1000 permutations were significant, the p value would be 0.8. Accordingly, to achieve a p value <0.05, fewer than 50 of the 1000 permutations could be significant. All significant clusters are reported by p values; clusters with the lowest p value are reported first.
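The logic of the permutation step can be illustrated with a simplified sketch. It is not the BESA Statistics implementation and differs from the procedure described here in three labeled ways. It handles a single electrode, so clusters form over time only and the 4 cm spatial adjacency rule is omitted. It summarizes each cluster by the summed absolute t value (cluster mass), one of the statistics discussed by Maris and Oostenveld (2007), rather than by the largest t value. It permutes within-subject condition labels by randomly flipping the sign of each participant's difference wave. The cluster-forming threshold t_thresh must be supplied by the caller (for example, the two-tailed critical t for the sample size), and all function names are hypothetical.

```python
import numpy as np


def paired_t(diff):
    """t value at each time point for differences of shape (n_subj, n_times)."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))


def max_cluster_mass(t_vals, t_thresh):
    """Largest summed |t| over runs of adjacent same-sign supra-threshold points."""
    best, current, prev_sign = 0.0, 0.0, 0
    for t in t_vals:
        sign = int(np.sign(t)) if abs(t) > t_thresh else 0
        if sign != 0 and sign == prev_sign:
            current += abs(t)      # extend the current cluster
        elif sign != 0:
            current = abs(t)       # start a new cluster
        else:
            current = 0.0          # below threshold: cluster ends
        best = max(best, current)
        prev_sign = sign
    return best


def cluster_permutation_p(cond_a, cond_b, t_thresh, n_perm=1000, seed=0):
    """Return (observed cluster mass, permutation p value) for paired ERPs."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(cond_a, float) - np.asarray(cond_b, float)
    observed = max_cluster_mass(paired_t(diff), t_thresh)
    n_extreme = 0
    for _ in range(n_perm):
        # Swapping condition labels within a participant flips the sign
        # of that participant's difference wave.
        flips = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        if max_cluster_mass(paired_t(diff * flips), t_thresh) >= observed:
            n_extreme += 1
    return observed, n_extreme / n_perm
```

The returned p value is the fraction of permutations whose largest cluster is at least as large as the observed one, which mirrors the arithmetic above: 100 such permutations out of 1000 give p = 0.1.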
Importantly, we focused our analysis on two cluster types based on our hypotheses, and we did not interpret nonhypothesis-driven clusters. The first cluster type was the ERAN and the second cluster type was a P600. To determine group differences, a second analysis compared the ERAN and P600 between controls and amusics. First, difference waves were calculated separately between the out-of-key/out-of-tune melody and the in-key melody. This isolated the impact of pitch deviance in each participant, and allowed for permutation testing of this effect between groups. To focus the analysis on the ERAN and P600, only time points that were considered part of the ERAN or P600 in controls during the first analysis were compared in each condition. This was done because the epoch of the ERAN and P600 clusters was longer in controls and the ERAN in controls completely overlapped the ERAN observed in amusics. The electrode location of differences between the groups remained data driven. Significance was determined using the same permutation method reported above. This analysis tests for the difference in the ERAN or P600 between amusics and controls.
Finally, to determine whether attentional deployment was similar in both the pitch-detection and click-detection tasks, a second set of comparisons was calculated. This compared ERPs from the pitch-detection task and from the click-detection task separately for each Group for all three Note types. Other than the ERPs being compared, the analysis was the same as the within-subject analysis described above.
Results
Behavioral data
Click-detection task
As can be seen in Figure 2A, pitch deviance had an impact on click-detection accuracy HFA (F(2,36) = 18.26, p < 0.0001). Accuracy was lower after the presentation of an out-of-tune note compared with an out-of-key or in-key note (p < 0.0001 for both by pairwise comparison). Overall, amusics tended to be less accurate than controls but this difference failed to reach significance (F(1,18) = 3.36, p = 0.08). The Group × Note type interaction was not significant (p = 0.55). The confidence data were not impacted by amusia either, as the main effect of Group, and its interactions with Note type and Click were not significant (p = 0.96, 0.39, 0.9, 0.97). Like accuracy, the Note type had an impact on confidence as there was a significant interaction between Note type and Click (F(2,36) = 5.33, p = 0.009). Follow-up simple main effects revealed that Note-type had no effect on confidence when there was no click present (F(2,36) = 0.88, p = 0.42), but affected confidence when a click was present (F(2,36) = 4.33, p = 0.02). Pairwise comparisons revealed that confidence was lowest when detecting a click after an out-of-tune note (p = 0.007).
Given that there were no significant group differences in accuracy and confidence during the click-detection task, correlations were calculated using the group average across stimulus types (i.e., click-present and click-absent) on raw accuracy data (i.e., percentage correct) and confidence data. Participants were more confident when they were more accurate (r(19) = 0.50, p = 0.03).
Pitch-detection task
Pitch-detection accuracy (HFA) is shown along with individual data points in Figure 3A. As expected, controls were far more accurate than amusics in detecting pitch deviants (F(1,18) = 51.57, p < 0.0001). Accuracy was similar for both out-of-key and out-of-tune notes as the main effect of Note type and its interaction with Group were not significant (p = 0.48 and 0.15, respectively). Accuracy (HFA) of two amusics approached the tail of the distribution of the controls (A06 and A07). While A06 and A07 performed below controls, they managed to correctly identify 60.9 and 62.6% of the stimuli (Fig. 3B).
Confidence in judgments was unrelated to performance in amusics. The correlation was close to zero (r(8) = −0.11, p = 0.78). In contrast, confidence and accuracy were positively correlated in the control group, although the correlation failed to reach significance (r(10) = 0.47, p = 0.14). Accordingly, control participants were more confident when they were more accurate, while amusics' confidence was unrelated to performance.
Electrophysiology
The first step in the data analysis was to isolate the early negativity associated with detecting pitch violations (i.e., the ERAN) and later positive activity related to the conscious detection of the violation (i.e., the P600). This analysis was done separately for each group, in both the click-detection and pitch-detection tasks, and was done by comparing the ERPs from the in-key melodies to the ERPs from the out-of-tune melodies or the ERPs from the out-of-key melodies. Given that clusters were data driven, they are presented in their order of probability, such that the cluster with the lowest p value (the most statistically significant result), as determined by the permutation testing, is presented first. Illustrations of the topography of each statistical effect (Figs. 4–6) are presented to highlight a sample of the scalp distribution at the peak of the effect. These topographies are dynamic during the cluster. Therefore the peak was chosen as a representative distribution of the effect.
Click-detection task
Controls.
One significant cluster was identified when comparing the in-key and out-of-tune notes while controls were monitoring the melodies for the presence of a near-threshold click. For this cluster, the ERP for the out-of-tune note was more negative than the in-key note from 132 to 288 ms at frontocentral electrodes (p < 0.0001). The scalp distribution of the electrodes that form this cluster at 150 ms is presented in Figure 4A along with the ERP waveforms at electrode FC4. Given the latency and topography, this cluster is likely an ERAN.
Three significant clusters were identified when comparing the out-of-key to the in-key notes. For the first cluster, the ERPs for the out-of-key note were more negative than the in-key note from 139 to 283 ms at frontoright electrodes (p = 0.015). The scalp distribution of the electrodes that form this cluster at 155 ms is presented on the left of Figure 4B, while the ERPs are presented on the right. Again, this cluster is likely an ERAN. For the second cluster, the ERPs for the out-of-key note were also more negative but later, from 603 to 691 ms at posterior and frontoright electrodes (p = 0.017). For the third cluster, the ERPs for the out-of-key note were again more negative than the in-key note over left frontal electrodes from 329 to 539 ms (p = 0.026). The second and third clusters do not correspond to an ERAN or a P600 and will not be discussed further.
Amusics.
Essentially, the same pattern of results was found in amusics compared with controls. The ERPs for the out-of-tune note were more negative than the ERPs for the in-key note from 157 to 231 ms at frontocentral electrodes (p = 0.002; Fig. 4C). Given the latency and topography, this cluster is likely an ERAN. Similarly, the ERP for the out-of-key note was more negative compared with the in-key note from 199 to 270 ms at central-right electrodes (p = 0.035; Fig. 4D). Given the latency and topography, this cluster is likely an ERAN.
Pitch-detection task
Controls.
Two significant clusters were identified when comparing ERPs for in-key and out-of-tune notes for controls during the pitch-detection task. For the first cluster, the ERPs for the out-of-tune note were more positive than those for the in-key note from 553 to 709 ms at central-posterior electrodes (p < 0.00001). The scalp distribution of the electrodes that form this cluster at 600 ms is presented in Figure 5B along with the ERP waveforms at electrode Pz. Given the latency and topography, this cluster is likely a P600. For the second cluster, the ERPs for the out-of-tune note were more negative than ERPs for the in-key note from 135 to 232 ms at frontoright electrodes (p = 0.011). The scalp distribution of the electrodes that form this cluster at 160 ms and the ERP waveforms at electrode FC4 is presented in Figure 5A. Given the latency and topography, this cluster is likely an ERAN.
Two significant clusters were identified when comparing the out-of-key to the in-key notes. For the first cluster, the ERPs for the out-of-key note were more positive than those for the in-key note from 439 to 637 ms at central-posterior electrodes (p < 0.00001). The scalp distribution of the electrodes that form this cluster at 600 ms and the ERP waveforms are presented in Figure 5D. Given the latency and topography, this cluster is likely a P600. For the second cluster, the ERPs for the out-of-key note were more negative than those for the in-key note from 779 to 891 ms at anteriofrontal electrodes (p = 0.046). This cluster is unlikely to be an ERAN or a P600. A third cluster was identified where the ERPs for the out-of-key notes were more negative than the ERPs for the in-key notes from 129 to 223 ms at frontocentral electrodes, suggesting the presence of an ERAN; however, this cluster failed to reach significance (p = 0.12).
Amusics.
The pattern of results differed in amusics. Only one significant cluster was identified, where the ERPs for the out-of-tune note were more positive than the ERPs for the in-key note from 762 to 856 ms at frontoright electrodes (p = 0.008). Although this cluster could be a delayed P600, it is unlikely given the frontal distribution of the effect. No other clusters approached statistical significance for the comparison between out-of-tune and in-key notes; the p value for the second-most-significant cluster was 0.56. In addition, no significant clusters were identified for the comparison between in-key and out-of-key notes in amusics. Accordingly, there was no evidence of an ERAN or a P600 in amusics when asked to detect a pitch deviant.
Inspection of the ERPs elicited in the two amusic participants (A06, A07) who were able to detect the note deviance above chance revealed no evidence of an ERAN or of a P600.
Amusics versus controls
No significant differences between controls and amusics were found for the ERAN for either the out-of-tune or the out-of-key notes during the click-detection task, as there were no clusters of electrodes or time points that differed significantly between the groups.
In the pitch-detection task, the ERAN evoked by out-of-tune notes was larger in controls compared with amusics from 135 to 232 ms (p = 0.03). The scalp distribution of the electrodes identified as significant during the permutation testing and the difference waves for each group at electrode F4 are presented in Figure 6A. The P600 evoked by out-of-tune notes was also larger in controls compared with amusics from 553 to 709 ms (p = 0.02). The scalp distribution of the electrodes identified as significant during the permutation testing and the difference waves for each group at electrode Pz are presented in Figure 6B. Similarly, the ERAN evoked by the out-of-key note was larger in controls compared with amusics from 129 to 223 ms, although the permutation test failed to reach significance (p = 0.08; Fig. 6C). The P600 evoked by the out-of-key note was also larger in controls from 439 to 637 ms (p = 0.003; Fig. 6D).
Impact of task demands
Comparisons between the pitch-detection task and the click-detection task were calculated for each stimulus type (in-key, out-of-key, and out-of-tune) in both groups. In controls, for the in-key notes, there were no significant differences between the pitch-detection and click-detection tasks. For the out-of-key notes, two clusters were significant. For the first cluster, ERPs from the pitch-detection task were more positive than ERPs from the click-detection task from 256 to 545 ms, starting at frontal electrodes, progressing to most of the electrodes on the scalp, and finishing at posterior electrodes (p < 0.0001). For the second cluster, ERPs from the pitch-detection task were more positive than ERPs from the click-detection task at posterior electrodes from 555 to 690 ms (p = 0.017). For the out-of-tune notes, there was one significant cluster where ERPs from the pitch-detection task were more positive than ERPs from the click-detection task at posterior electrodes from 555 to 720 ms (p = 0.015). These differences likely reflect the P600 reported above.
In amusics there were no significant differences between the ERPs for the pitch-detection and click-detection tasks for any of the stimulus types. This suggests that sustained attention was similar in both tasks. Moreover, the lack of late positivities in the pitch-detection task likely reflects a lack of P600 in this group.
Results summary
The main finding from this study was that abnormal neural responses to pitch deviance in a melody emerged in the amusic brain only when detection of the deviance was required. This result cannot be due to variability in attentional deployment, as both the click-detection and pitch-detection tasks required sustained attention and the only within-subject differences between the ERPs from the two tasks were related to a P600 evoked by the out-of-key and out-of-tune notes in controls during the pitch-detection task. When task difficulty was equated across participants by maintaining the detection of a click near threshold, pitch deviance elicited brain responses that were similar in amusics and controls. This indicates that the amusic brain has implicit knowledge of pitch irregularities in music. Most critical were the differences between amusics and controls in their brain responses to pitch deviants when asked to detect an incongruous note. There was no evidence that amusics consciously differentiated between incongruous and in-key notes, whereas controls did. The results support the idea that knowledge of tonal structure is intact in amusics (Omigie et al., 2012a; Tillmann et al., 2012), but that conscious use of this knowledge is disturbed. Reduced connectivity along the right frontotemporal pathway is the likely origin of this perturbation.
Discussion
The impact of amusia on the detection of tonal violations was only evident when amusics were asked to attend to pitch information. Attention is a critical factor in EEG studies because it modulates most long-latency ERP components. Controlling for attention is critical for the laboratory study of amusic adults because they are aware that music processing is a challenge for them. Since the near-threshold click appeared after any pitch incongruity could have occurred, attention to pitch incongruities was unintentional and equivalent for all participants. Neurophysiologically, the only difference between the pitch-detection and click-detection tasks was the emergence of a P600 in controls during the pitch-detection task. An increased N1 amplitude would be expected if deployment of attention differed between the two tasks, as the N1 is sensitive to attention (Näätänen and Picton, 1987). There was no evidence that the N1 was affected by the task, suggesting that deployment of attention was similar in both tasks. Interestingly, the presence of an out-of-tune note disturbed click-detection accuracy similarly for amusics and controls, despite the finding that most amusics were unable to detect the mistuned notes above chance. The interference caused by the mistuned note probably occurred without awareness, likely due to an auditory attentional blink (Raymond et al., 1992). The auditory attentional blink is a processing deficit that occurs after a distracter sound (i.e., the mistuned note), and is largest when the target (i.e., the click) differs from the distracter and when there are other distracter sounds following the target (i.e., the in-key notes that end the melody; Duncan et al., 1997; Shen and Mondor, 2006, 2008). The presence of this interference effect in both groups supports the idea that melodic structure is tracked normally by amusics.
Observation of an ERAN in both controls and amusics was consistent with the interference effect found in behavior. Moreover, the ERAN was evoked by both the out-of-tune and out-of-key notes. Representation of the out-of-tune note can be derived from the acoustical regularities of the preceding pitch intervals in the melody, while recognition of the out-of-key note requires acquired knowledge of tonality. Understanding tonal structure requires top-down knowledge of how notes are typically used in a melody. Knowledge of tonality may have been acquired through implicit exposure to music. Indeed, several amusics engage with and appreciate music (Omigie et al., 2012b) and lack of musical exposure is unlikely to be the cause of amusia (Mignault Goulet et al., 2012). One interesting finding was that the clusters identified as an ERAN were shorter in amusics compared with controls during the click-detection task. This difference may have been related to the statistical analysis, as no significant differences were found during the click-detection task in a direct comparison between controls and amusics. Alternatively, it is possible that the ERAN was delayed in amusics, and this may be due to abnormalities along the frontotemporal pathway (Hyde et al., 2011; Albouy et al., 2013). Most critically, these findings demonstrate that tonal schemata are present in the amusic brain.
The amusic deficit in accessing tonal knowledge becomes apparent when amusics are asked to make musical pitch judgments. In controls, both the out-of-key and out-of-tune notes evoked an ERAN and a P600 in the pitch-detection task, although the ERAN for the out-of-key note failed to reach statistical significance. In amusics, neither an ERAN nor a P600 was observed, and the incongruous notes were not reliably distinguished. The lack of a P600 in amusics is consistent with previous electrophysiological studies that reported a lack of late positivities related to conscious detection of a pitch deviant (e.g., P3, P600; Peretz et al., 2005, 2009; Moreau et al., 2013). Interestingly, Peretz et al. (2009) found an ERAN using a similar task. However, the ERAN was only present in response to an out-of-tune note, not to an out-of-key note. In the current study, the ERAN was not evoked by either incongruity. The difference between the two studies could be related to task differences. Peretz et al. (2009) asked participants to make a judgment on the whole melody based on a seven-point scale ranging from “highly incongruous” to “highly congruous,” while the current study focused on identifying a single “wrong note.” The more general question in Peretz et al. (2009) may have allowed for increased nonconscious access to pitch information derived from acoustical regularities, which is needed to detect an out-of-tune note, while access to the tonal rules of music, which would be needed to detect an out-of-key note, remained unavailable. We therefore propose that the absence of an ERAN and P600 in the current study reflects a genuine defect in frontal connectivity to the temporal lobes.
Unlike control listeners, amusics do not know what a wrong note is. Even obviously dissonant music and isolated dissonant chords sound pleasant to them (Ayotte et al., 2002; Cousineau et al., 2012). Attempts in our laboratory to use explicit task feedback and practice have not been successful in improving music perception in amusics. Their seemingly intact tonal knowledge is disconnected from conscious experience. In support of this, amusics' confidence was not related to their accuracy when detecting a pitch deviant, whereas confidence and accuracy were related when detecting a click. In a previous study, Omigie and Stewart (2011) found that amusics were less confident during a tonal statistical learning task, despite performing at a similar level to controls. Tillmann et al. (2014) reported that amusics had longer reaction times to detect a familiar song compared with an unfamiliar song, suggesting decreased confidence in their judgments. Overall, this pattern of results suggests that amusics have low confidence when making judgments about pitch information, but not when making other auditory judgments.
In neural terms, a lack of awareness of pitch information suggests that the frontal areas and their connections to other structures are the principal structural abnormality in amusics, not primary/secondary auditory areas in the superior temporal gyrus (STG). In support of this proposal, Hyde et al. (2011) observed a decreased BOLD response in amusics in the right IFG (BA 47/11) as a function of pitch distance in a melody, while controls had increased BOLD activity. Furthermore, connectivity analyses revealed that the auditory cortex was functionally connected to the right IFG in the normal brain but showed decreased functional connectivity in the amusic brain (Hyde et al., 2011). Dynamic causal modeling of the N100m revealed decreased backwards connectivity along the right frontotemporal pathway in amusics compared with controls (Albouy et al., 2013). These indications of reduced connectivity along the right frontotemporal pathway in response to tones may have at least two causes that are not mutually exclusive: altered backwards connectivity from the IFG to the auditory cortex and altered activity in the IFG itself.
The IFG is critically involved in tasks requiring the detection of musical key violations (Maess et al., 2001; Tillmann et al., 2003) and in working memory tasks related to pitch (Zatorre et al., 1994; Gaab et al., 2003; Gosselin et al., 2009; Koelsch et al., 2009). The IFG is likely the core structural deficit in amusia because, in addition to the pitch perception deficit, pitch memory is severely impaired in amusics (Gosselin et al., 2009; Williamson and Stewart, 2010; Tillmann et al., 2012; Albouy et al., 2013). Moreover, the ERAN has sources in the IFG (Maess et al., 2001), and the ERAN was abolished when amusics were asked to make a tonal judgment, but not when asked to detect a click. Accordingly, pitch irregularities are likely detected normally in amusics using bottom-up information traveling from the STG to the IFG without top-down interference, resulting in a normal ERAN as observed in the click-detection task. In contrast, a top-down search for irregularities would interfere with, or even inhibit, the proper functioning of the pitch deviance detection system in the STG. This interference may be a result of an abnormal backwards propagation of information along the frontotemporal pathway. This account emphasizes the importance of top-down projections that involve higher-order cortices for conscious perception. Accordingly, the role of the IFG may be to amplify the auditory deviance detection system in the STG (Opitz et al., 2002), as the process of cognitive amplification is an important aspect of consciousness (Dehaene and Changeux, 2011).
The current characterization of congenital amusia as a disorder of connectivity between an intact auditory perceptual system in the STG and impaired higher-level processing in the IFG bears striking similarities to other disorders. Congenital disorders in both reading (i.e., dyslexia) and face recognition (i.e., prosopagnosia) are also believed to be disorders of connectivity (Eimer et al., 2012; Boets et al., 2013). This conclusion is best illustrated by a recent fMRI study conducted in adults with dyslexia. Boets et al. (2013) found that phonetic representations in the STG are similar between controls and dyslexics. It is the functional and structural connectivity between the STG and the left IFG that is reduced in dyslexics. One of the most relevant findings related to the present study is the possibility that patients suffering from disorders of consciousness due to disordered connectivity have preserved bottom-up but impaired top-down processing along the frontotemporal pathway (Boly et al., 2011). The similarities among these disorders suggest that an alteration in consciousness can give rise to severe cognitive impairments by disconnecting core perceptual systems from the IFG. Hence, congenital amusia can serve as a valuable model for the characterization of the neural correlates of consciousness in the human brain.
Footnotes
This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada, the Canadian Institutes of Health Research, and the Canada Research Chairs program. We thank Mihaela Felezeu for technical assistance.
The authors declare no competing financial interests.
Correspondence should be addressed to Benjamin Rich Zendel, International Laboratory for Brain, Music and Sound Research, Suite 0-120, Pavillon 1420 boul. Mont Royal, Université de Montréal, C.P. 6128, Station Centre Ville, Montreal, Quebec H3C 3J7, Canada. benjamin.rich.zendel@umontreal.ca