Abstract
Processing auditory sequences involves multiple brain networks and is crucial to complex perception associated with music appreciation and speech comprehension. We used time-resolved cortical imaging in a pitch change detection task to detail the underlying nature of human brain network activity, at the rapid time scales of neurophysiology. In response to tone sequences presented to the participants, we observed slow inter-regional signaling at the pace of tone presentations (2-4 Hz) that was directed from auditory cortex toward both inferior frontal and motor cortices. Symmetrically, motor cortex manifested directed influence onto auditory and inferior frontal cortices via bursts of faster (15-35 Hz) activity. These bursts occurred precisely at the expected latencies of each tone in a sequence. This interdependency between slow/fast neurophysiological activity yielded a form of local cross-frequency phase-amplitude coupling in auditory cortex, whose strength varied dynamically and peaked when pitch changes were anticipated. We clarified the mechanistic relevance of these observations in relation to behavior by including a group of individuals afflicted by congenital amusia, as a model of altered function in processing sound sequences. In amusia, we found a depression of inter-regional slow signaling toward motor and inferior frontal cortices, and a chronic overexpression of slow/fast phase-amplitude coupling in auditory cortex. These observations are compatible with a misalignment, absent in controls, between the respective neurophysiological mechanisms of stimulus encoding and internal predictive signaling. In summary, our study provides a functional and mechanistic account of neurophysiological activity for predictive, sequential timing of auditory inputs.
SIGNIFICANCE STATEMENT Auditory sequences are processed by extensive brain networks, involving multiple systems. In particular, fronto-temporal brain connections participate in the encoding of sequential auditory events, but their study has so far been limited to static depictions. This study details the nature of oscillatory brain activity involved in these inter-regional interactions in human participants. It demonstrates how directed, polyrhythmic oscillatory interactions between auditory and motor cortical regions provide a functional account for predictive timing of incoming items in an auditory sequence. In addition, we show the functional relevance of these observations in relation to behavior, with data from both normal-hearing participants and a rare cohort of individuals afflicted by congenital amusia, which we considered here as a model of altered function in processing sound sequences.
- audition
- congenital amusia
- neural oscillations
- phase-amplitude coupling
- pitch discrimination
- predictive coding
Introduction
Pitch is a fundamental perceptual feature of sound that is a form-bearing dimension of music and an important cue for understanding speech (Bregman, 1990). Meaningful pitch changes are perceived by combining auditory inputs with contextual priors and by engaging attentional focus (Garrido et al., 2007). Hence, pitch perception engages a network of brain regions involved in both auditory prediction and perceptual decision-making (Peretz and Zatorre, 2005). Here, we sought to determine the regional and network neurophysiological mechanisms and dynamics crucially involved in brain pitch processing.
One classic approach to assess the neural processing of pitch is via the oddball paradigm (Näätänen et al., 2007), which consists of the presentation of sequences of identical tones that are infrequently interrupted by a deviant stimulus that differs in pitch. Scalp EEG and MEG oddball studies have shown that, when attention is directed away from auditory stimuli, the brain response to deviant pitch is marked by an early event-related component (mismatch negativity) (Näätänen et al., 2007). Attention toward the deviant sound and its detection elicit later event-related components (>300 ms, P3a and P3b, respectively). The P3a is thought to be generated by frontal circuits typically involved in attention orienting and novelty processing tasks. The P3b is observed when the stimulus is task-relevant and has been shown to originate from temporo-parietal regions associated with attention (Polich, 2007; for related fMRI studies, see also Opitz et al., 2002; Schönwiesner et al., 2007).
The nature of neurophysiological signaling within and between these extended sets of brain regions and networks remains elusive. Yet, Tse et al. (2018) demonstrated the key role of fronto-temporal network connectivity in pitch change detection using transcranial magnetic stimulation. Further, Morillon and Baillet (2017) emphasized that frontal motor circuits are actively involved in an auditory attention task, issuing a directed influence on temporal auditory regions, an observation compatible with the notion of active inference in sensory perception (Schroeder et al., 2010). Effects involving oscillatory brain activity are consistently reported in the recent literature concerning temporal attention and the predictive inference of sensory inputs (Morillon and Baillet, 2017; Chang et al., 2018; Haegens and Zion Golumbic, 2018; Nobre and van Ede, 2018). Yet, the mechanistic role of oscillatory activity within and between brain regions and frequency bands remains to be understood, which was the objective of the present study.
To clarify the mechanistic relevance of our observations to behavior, we enrolled typical listeners and participants affected by congenital amusia, a pitch-specific neurodevelopmental disorder (Peretz, 2016). Amusic individuals are impaired at detecting pitch deviations that are smaller than two semitones. Therefore, congenital amusia affects the processing of pitch in oddball tasks, as in the present study (Hyde and Peretz, 2004; Peretz et al., 2005). Pitch deviations as small as an eighth of a tone (25 cents) elicit normal mismatch negativity responses in amusic brains, yet without conscious perception (i.e., absence of P3b) (Omigie et al., 2012; Moreau et al., 2013). This functional gap is currently attributed to altered brain connectivity between the superior temporal gyrus and the inferior frontal gyrus (IFG) (Albouy et al., 2013; Peretz, 2016; Albouy et al., 2019), with empirical, neurophysiological effects reported in the α (8-12 Hz) (Tillmann et al., 2016) and γ (40-80 Hz) (Albouy et al., 2013) frequency bands.
In the present study, we bring together these apparently disparate findings and provide a cohesive description of the neurophysiological network mechanisms that are essential to pitch change detection, considering amusia as a specific model of perturbed conscious perception. We proceed by proposing a more integrative view of the neurophysiological dynamics involved, assessing the time-resolved variations of interdependent brain oscillations and the directional connectivity between key nodes of the fronto-temporal network recruited by pitch processing. The network nodes were selected based on previous pitch processing studies (Zatorre et al., 1992; Albouy et al., 2013; Peretz, 2016; Morillon and Baillet, 2017) and comprise, bilaterally, the superior temporal gyrus in auditory cortices, the posterior portion of the inferior frontal gyrus, and the precentral motor cortices (see Materials and Methods).
Materials and Methods
Details of the statistical inference tests used are provided in the text where the corresponding effects are reported.
Participants
Sixteen right-handed participants took part in the study. Eight amusics were recruited according to the Montreal Protocol for Identification of Amusia (Vuvan et al., 2018). The other eight participants formed a control group (i.e., typical listeners) who were matched with the amusic group in terms of age, gender, and years of education. The experimental paradigm was reviewed and approved by the ethics review board of McGill University Health Center (Protocol NEU-12-023). All participants gave written informed consent to take part in the study.
All participants had their hearing assessed using age-normalized audiometric guidelines (Walker et al., 2013). Although some participants had heightened audiometric thresholds (which is common in older adults), all participants reported being able to hear the experimental trials without effort. Table 1 displays demographic information for all participants and reports their performances on a battery of musical tasks. The amusic and control participants were matched in terms of demographic variables but differed significantly in musical performance. Specifically, amusics scored significantly lower than controls on the melodic tasks (AMUSIA Scale, AMUSIA Out-of-Key, MBEA Melodic). In contrast, they scored equivalently to controls on the AMUSIA Off-Beat test, confirming the specificity of their disorder to musical pitch.
Experimental design
The study paradigm was adapted from Hyde and Peretz (2004) and Peretz et al. (2005). Each trial consisted of a sequence of five pure tones: tones 1, 2, 3, and 5 were identical and played at the pitch level of C6 (1047 Hz; standard pitch). Tone 4 was the target tone, played at five different pitch levels across trials. In half of the trials, the target tone was played at the standard C6 (1047 Hz) pitch (“standard” trials). In the other half of the trials (“deviant” trials), the target tone was played with a deviation of 25, 50, 100, or 200 cents (100 cents correspond to 1 semitone) from the standard tone. Each tone was presented for 100 ms, and the time interval between two consecutive tone onsets in a sequence (intertone interval) was 350 ms. The total duration of a sequence was 1.4 s (see Fig. 1A). EEG and MEG signals were recorded simultaneously from 1.5 s before the target tone (tone 4) presentation to 1.5 s after it, and were analyzed separately. We refer to the interval before target tone presentation (−1.5 to 0 s) as the pretarget interval, and to the interval after target tone presentation (0 to 1.5 s) as the post-target interval (see Fig. 1A). All source-level analyses in this study were performed exclusively on MEG recordings.
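For illustration only, the trial structure described above can be summarized in a minimal synthesis sketch (Python). The audio sampling rate, the upward direction of the pitch deviation, and all function names are assumptions of this example and are not taken from the study.

```python
import numpy as np

FS = 48_000            # audio sampling rate in Hz (assumed for this example)
STANDARD_HZ = 1047.0   # C6, standard pitch
TONE_DUR = 0.100       # each pure tone lasts 100 ms
SOA = 0.350            # onset-to-onset interval between consecutive tones (350 ms)

def cents_to_hz(base_hz, cents):
    """Shift a frequency by a pitch interval in cents (100 cents = 1 semitone)."""
    return base_hz * 2.0 ** (cents / 1200.0)

def make_trial(deviance_cents=0.0):
    """Synthesize one five-tone sequence; tone 4 carries the optional pitch deviation."""
    n_total = int(round((4 * SOA + TONE_DUR) * FS))    # buffer spanning all five tones
    seq = np.zeros(n_total)
    t = np.arange(int(TONE_DUR * FS)) / FS
    for k in range(5):
        # tones 1, 2, 3, 5 at the standard pitch; tone 4 (k == 3) is the target
        f = cents_to_hz(STANDARD_HZ, deviance_cents) if k == 3 else STANDARD_HZ
        onset = int(round(k * SOA * FS))
        seq[onset:onset + t.size] += np.sin(2 * np.pi * f * t)
    return seq

standard_trial = make_trial(0.0)    # "standard" trial: target at C6
deviant_trial = make_trial(25.0)    # smallest deviant: 25 cents from the standard
                                    # (deviation direction assumed upward here)
```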
Ten minutes of resting state were recorded from all participants (eyes open) at the beginning of the session. Participants were then asked to listen to tone sequences and to press a button with one of their index fingers to indicate whether the presented sequence comprised a standard or a deviant target sound. The laterality of the motor response alternated between runs for each participant. They were instructed to keep their gaze fixed on a cross displayed on a back-projection screen positioned at a comfortable distance. The mapping of right- versus left-hand responses to standard versus deviant trials was also counterbalanced across participants. All subjects received 40 training trials before MEG data collection. A total of 640 tone sequences were then presented to every participant, in 10 blocks of 64 trials, which resulted in a total of 320 standard tone sequences and 80 deviant trials per pitch deviance level. Each trial started 1 s (±<50 ms jitter) after the subject's response to the previous trial. No feedback was provided to participants on their performance.
Data acquisition
MEG data were collected during resting state and task performance in the upright position using a 275-channel CTF MEG system, with a sampling rate of 2400 Hz. Simultaneous EEG data were also recorded with the CTF system at a 2400 Hz sampling rate from four standard 10/20 electrode positions (FZ, FCZ, PZ, and CZ; reference on the right mastoid). Audio presentation triggers, button presses, and heartbeat and eye movement electrophysiological signals (ECG and EOG, respectively) were also collected in synchronization with the MEG. Head position was monitored and controlled using three coils attached to the subject's nasion and both pre-auricular points (150 Hz sampling rate). The head coil locations and 100 scalp points were digitized prior to MEG recordings for each individual, using a Polhemus 3-D digitizer system (https://polhemus.com/scanning-digitizing/digitizing-products/). We obtained a T1-weighted MRI volume for each participant (1.5-T Siemens Sonata, 240 × 240 mm FOV, 1 mm isotropic, sagittal orientation) for cortically constrained MEG source imaging (Baillet, 2017).
Data preprocessing and source modeling
Contamination from system and environmental noise was attenuated using CTF's built-in third-order gradient compensation. All further data preprocessing and modeling were performed with Brainstorm (Tadel et al., 2011) following good-practice guidelines (Gross et al., 2013). The recordings were visually inspected, with segments contaminated by excessive muscle artifacts, head movements, or remaining environmental noise marked as bad and discarded from further analysis. Powerline artifacts at 60 Hz and harmonics up to 240 Hz were reduced using notch filtering. Signal-space projectors were designed using Brainstorm's default settings to attenuate the electrophysiological contamination from heartbeats and eye blinks.
The MRI data were segmented using the default FreeSurfer pipeline (Dale et al., 1999). For distributed source imaging, we used Brainstorm to downsample the cortical surface tessellations produced by FreeSurfer to 15,000 vertices. We derived individual forward MEG head models using the overlapping-sphere analytical approach (with Brainstorm default settings). We then obtained a weighted minimum-norm kernel (Brainstorm with default settings) for each participant to project sensor-level preprocessed data onto the 15,000 vertices of the individual cortical surface. The empirical covariance of sensor noise was estimated for weighted minimum-norm kernel modeling from a 2 min empty-room MEG recording collected at the beginning of each session (i.e., for each participant). All source maps were obtained from MEG sensor data exclusively.
ROIs
In all participants, we defined six brain ROIs using an MEG functional localizer. The right and left auditory cortices (rAud and lAud) were identified as the regions presenting the strongest M100 (within 100-120 ms after stimulus onset) event-related average response peak to all tones, restricted to 3 cm2 of surface area per region. We defined rIFG and lIFG as portions of Brodmann area 45 (BA45) identified from the Brodmann cortical atlas in FreeSurfer, registered to individual anatomy. The spatial extent of the rIFG and lIFG ROIs was based on the maximum differential activity observed between the brain responses to deviant and standard tones at ∼100 ms after “target tone” presentation (Florin et al., 2017). The resulting surface areas varied between ROIs as driven by the strength of the event-related response observed and were typically ∼1.3 cm2. Left and right cortical motor regions (lMot and rMot, respectively) were defined following Morillon and Baillet (2017) over a surface area of ∼3 cm2 at the precentral locations of the largest M50 responses (∼50 ms latency) following right and left index finger button presses, respectively (see Fig. 1D).
Posterior alpha band activity measurement
A cluster of five posterior MEG channels presenting the highest levels of pretarget [8, 12] Hz alpha band activity across subjects (MZP01, MLP31, MRP31, MLP32, MRP32) was selected. The power of MEG signals at these sensor locations over the pretarget period ([−1.5, 0] s) of each trial was computed: we used the even-order linear-phase FIR filter in Brainstorm (bandpass: [8, 12] Hz, stop-band attenuation: 40 dB, 99% energy transient: 0.402 s) and computed the root-mean-square signal strength across the sensor cluster for each trial. The same approach was applied to EEG, restricted to electrode CZ.
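As a rough sketch of this measurement outside of Brainstorm, the band-pass filtering and root-mean-square steps could be reproduced as follows (Python/SciPy). The filter design (a zero-phase Butterworth rather than Brainstorm's linear-phase FIR) and the array layout are illustrative assumptions of this example.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 2400  # MEG sampling rate (Hz)

def alpha_cluster_power(pretarget, fs=FS, band=(8.0, 12.0)):
    """
    pretarget: array of shape (n_trials, n_sensors, n_times) holding the
    [-1.5, 0] s segments of the five posterior sensors of the cluster.
    Returns one alpha-band RMS value per trial.
    """
    # simple zero-phase band-pass; the study used Brainstorm's linear-phase FIR design
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, pretarget, axis=-1)
    # root-mean-square over time and across the sensor cluster, per trial
    return np.sqrt(np.mean(filtered ** 2, axis=(-2, -1)))
```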
Phase-amplitude coupling (PAC)
We used the time-resolved measure of PAC (tPAC) between colocalized slow and fast cortical signal components, as published by Samiee and Baillet (2017) and distributed with Brainstorm. tPAC measures the temporal fluctuations of the coupling between the phase of slower activity (at frequency fP) and the amplitude of faster signal components (at frequency fA). Briefly, the instantaneous amplitude of faster signals (AfA(t)) in a sub-band of the fA band of interest was extracted using the Hilbert transform. Power spectral analysis was used to identify the frequency of strongest oscillation in AfA(t) (in the fP band of interest), coinciding with an oscillation in the original time series. This frequency was then labeled as the fP frequency coupled to the current fast fA frequency. The coupling strength between AfA(t) and the instantaneous phase of the signal filtered around fP was then calculated. For further methodological details concerning tPAC, see Samiee and Baillet (2017).
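A much-simplified, single-window version of this procedure could look as follows (Python/SciPy). The filter choices and the Canolty-style mean-vector-length coupling index are stand-ins for the exact tPAC estimator distributed with Brainstorm, and all names are assumptions of this sketch.

```python
import numpy as np
from scipy.signal import butter, hilbert, periodogram, sosfiltfilt

def tpac_window(x, fs, fa_band=(15.0, 35.0), fp_band=(2.0, 12.0)):
    """
    Simplified single-window tPAC estimate:
    1) extract the amplitude envelope A_fA(t) of the fast band,
    2) find the dominant slow frequency fP of that envelope,
    3) measure coupling between A_fA(t) and the phase of x filtered around fP.
    Returns (coupling_strength, fp).
    """
    # 1) fast-band amplitude envelope
    sos = butter(3, fa_band, btype="bandpass", fs=fs, output="sos")
    env = np.abs(hilbert(sosfiltfilt(sos, x)))

    # 2) dominant slow frequency of the envelope within the fP band of interest
    freqs, pxx = periodogram(env - env.mean(), fs=fs)
    in_band = (freqs >= fp_band[0]) & (freqs <= fp_band[1])
    fp = freqs[in_band][np.argmax(pxx[in_band])]

    # 3) phase of x filtered around fP, then mean-vector-length coupling index
    half_bw = max(0.5, fp / 4.0)                 # heuristic narrow band around fP
    sos = butter(3, (fp - half_bw, fp + half_bw), btype="bandpass", fs=fs, output="sos")
    phase = np.angle(hilbert(sosfiltfilt(sos, x)))
    coupling = np.abs(np.mean(env * np.exp(1j * phase))) / env.mean()
    return coupling, fp
```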
We then extracted comodulograms to identify the strongest modes of (fP, fA) coupling over time windows of 1.5 s that contained the entire tone sequence at every trial, testing 20 candidate fA frequencies linearly distributed within the 15-250 Hz band. The frequency band of interest for fP was 2-12 Hz. We found that the strongest (fP, fA) mode of coupling in the resulting comodulograms was with fP in the 2-4 Hz band and fA in the 15-35 Hz band. The temporal dynamics of tPAC coupling were extracted between these two bands of interest (reflecting the dominant mode of coupling) from 700-ms time windows with 50% overlap over the entire trial duration. Since previous studies reported right-hemisphere dominance in similar pitch discrimination tasks (Zatorre et al., 1992; Peretz, 2016), we analyzed PAC in the right-hemisphere ROIs only.
Stimulus–brain coupling
We assessed whether the auditory stimulus tone sequence induced modulations of beta activity in the auditory cortex. The goal was to replicate previous observations of stimulus-induced β-amplitude modulations in auditory cortex in similar conditions (Fujioka et al., 2012; Cirelli et al., 2014; Chang et al., 2018). This analysis clarifies whether beta band activity is driven by the auditory tone sequence in addition to local δ activity in the auditory cortex, and whether β bursts occur preferentially at the expected latencies of tone presentations, as a predictive form of signaling.
Following the method used by Morillon and Baillet (2017), we generated a reference sinusoidal signal at 2.85 Hz (i.e., the rate of the tone presentation every 350 ms), with its peaks aligned with the onset of each tone presentation. We then estimated the tPAC cross-frequency coupling between the phase of this reference signal and the amplitude of β oscillations in the right auditory cortex. We tracked the variations in time of this coupling using tPAC with a sliding window length of two cycles of the tone presentation rate (700 ms) with 50% overlap, following the specifications derived by Samiee and Baillet (2017). We then identified the preferred phase of tPAC coupling along the cycle of the stimulus sinusoid reference signal. Finally, we converted the corresponding phase angle into a time latency, as a fraction of the 350 ms stimulus presentation cycle.
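A simplified sketch of this stimulus-to-brain coupling analysis is given below (Python/SciPy); the sliding-window tPAC machinery is omitted here, and the variable names and filter settings are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

FS = 2400                   # MEG sampling rate (Hz)
TONE_RATE = 1.0 / 0.350     # ≈2.85 Hz tone presentation rate

def stimulus_beta_coupling(raud, tone_onsets, fs=FS, beta=(15.0, 35.0)):
    """
    Couple the phase of a tone-rate reference sinusoid (peaks aligned with tone
    onsets) with the beta amplitude envelope of a right auditory cortex source
    signal, and convert the preferred coupling phase into a latency within the
    350 ms stimulus cycle. raud: 1-D time series; tone_onsets: onsets in seconds.
    """
    t = np.arange(raud.size) / fs
    ref_phase = 2 * np.pi * TONE_RATE * (t - tone_onsets[0])   # phase 0 at tone onsets

    sos = butter(3, beta, btype="bandpass", fs=fs, output="sos")
    beta_env = np.abs(hilbert(sosfiltfilt(sos, raud)))

    vec = np.mean(beta_env * np.exp(1j * ref_phase))
    coupling = np.abs(vec) / beta_env.mean()
    preferred_phase = np.angle(vec)                            # radians in [-pi, pi]
    latency_s = (preferred_phase % (2 * np.pi)) / (2 * np.pi) / TONE_RATE
    return coupling, latency_s
```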
Functional and effective connectivity
We estimated frequency-specific functional connectivity between ROIs using coherence (Walter et al., 1966; Thatcher et al., 1986; Fries, 2005), which is a measure of amplitude and phase consistency between cortical signals. We also measured signs of directional, effective connectivity between ROIs with phase-transfer entropy (PTE) (Lobier et al., 2014), adopting the approach by Morillon and Baillet (2017). PTE measures narrowband phase leading/lagging statistics to derive estimates of effective connectivity between regions. Importantly for our study, PTE has shown better performance than coherence in detecting signs of interdependence between signals of relatively short duration (Bowyer, 2016). In more detail, PTE measures effective connectivity based on the respective instantaneous phases of pairs of narrow-band neurophysiological signals. The sign of directed PTE (dPTE) values indicates the estimated direction of effective connectivity. For example, for two regions A and B, positive dPTE values indicate information transfer from A to B, and negative values indicate transfer from B to A. We used the dPTE code openly shared by Hillebrand et al. (2016), which we have made available in Brainstorm. For dPTE calculation, the first principal component across all vertices of each ROI (filtered in the frequency band of interest) was used for the analysis.
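To make the logic of PTE/dPTE concrete, a simplified, histogram-based sketch is shown below (Python). It is not the Hillebrand et al. (2016) implementation: the interaction delay, the number of phase bins, and the signed normalization (positive values meaning x→y transfer, mirroring the sign convention described above) are assumptions of this example.

```python
import numpy as np
from scipy.signal import hilbert

def joint_entropy(*variables):
    """Entropy of the joint distribution of discrete variables (natural log)."""
    joint = np.stack(variables, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def phase_transfer_entropy(x, y, delay, n_bins=8):
    """Transfer entropy from the instantaneous phase of x to that of y."""
    bins = np.linspace(-np.pi, np.pi, n_bins + 1)
    dx = np.digitize(np.angle(hilbert(x)), bins) - 1
    dy = np.digitize(np.angle(hilbert(y)), bins) - 1
    y_future, y_past, x_past = dy[delay:], dy[:-delay], dx[:-delay]
    # TE = I(y_future ; x_past | y_past), written out with joint entropies
    return (joint_entropy(y_future, y_past) + joint_entropy(y_past, x_past)
            - joint_entropy(y_past) - joint_entropy(y_future, y_past, x_past))

def dpte(x, y, delay, n_bins=8):
    """Signed directionality index: > 0 means x drives y, < 0 the reverse."""
    te_xy = phase_transfer_entropy(x, y, delay, n_bins)
    te_yx = phase_transfer_entropy(y, x, delay, n_bins)
    return (te_xy - te_yx) / (te_xy + te_yx)
```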
Informed by tPAC frequency ranges, we derived dPTE connectivity measurements between the ROIs and between hemispheres. We evaluated interhemispheric coherence and dPTE connectivity between homologous regions bilaterally, in the δ (2-4 Hz) and β (15-30 Hz) frequency bands, over the baseline resting state period, the [−1500, 0] ms pretarget, and the post-target [0, 1500] ms time segments.
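For completeness, and as a companion to the dPTE sketch above, band-averaged coherence between two ROI time series could be estimated across trials as follows (Python/SciPy); the segment length, band edges, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.signal import csd, welch

def band_coherence(x_trials, y_trials, fs=2400, band=(2.0, 4.0)):
    """
    Magnitude-squared coherence between two ROI signals, estimated across trials
    (inputs of shape (n_trials, n_times)), then averaged within a frequency band.
    """
    nper = x_trials.shape[-1]                      # one spectral segment per trial
    f, sxy = csd(x_trials, y_trials, fs=fs, nperseg=nper, axis=-1)
    _, sxx = welch(x_trials, fs=fs, nperseg=nper, axis=-1)
    _, syy = welch(y_trials, fs=fs, nperseg=nper, axis=-1)
    coh = np.abs(sxy.mean(axis=0)) ** 2 / (sxx.mean(axis=0) * syy.mean(axis=0))
    sel = (f >= band[0]) & (f <= band[1])
    return coh[sel].mean()

# e.g., delta- and beta-band coherence between right auditory and right IFG epochs:
# delta_coh = band_coherence(raud_epochs, rifg_epochs, band=(2.0, 4.0))
# beta_coh  = band_coherence(raud_epochs, rifg_epochs, band=(15.0, 30.0))
```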
In reported results, the significance of observed dPTE values in each group and condition was statistically corrected for all comparisons performed (18 comparisons for the following: 3 pairs of regions × 2 frequency bands × 3 states).
Statistical analyses
In all cases with a relatively large number of data points (>250), parametric tests (e.g., t tests against zero-mean, paired t tests, repeated-measures ANOVAs) were used, with p = 0.05 as the significance threshold. Tukey's tests were used for post hoc analyses and corrections for multiple comparisons. The distributions of event-related potentials (see Fig. 2) were tested for zero-mean using t tests and reported with corrections for multiple comparisons considering false discovery rates (FDRs). tPAC values were assessed for statistical significance using a nonparametric resampling approach (Samiee and Baillet, 2017): for each trial, we generated 500 surrogates using block-resampling. Each surrogate was produced by randomly selecting five time points in the trial epoch to subdivide the instantaneous phase signal into five blocks. These blocks were then randomly shuffled, and tPAC was estimated from the resulting block-shuffled phase signal and the original instantaneous amplitude time series. This resampling technique provides reference surrogate signals with PAC at chance levels and with minimum phase distortion (Samiee and Baillet, 2017). The tPAC values obtained from surrogate data were normally distributed (Shapiro–Wilk test, p > 0.8). tPAC values from each original trial were z-scored with respect to the empirical distribution of tPAC values obtained from the surrogate data generated from the same trial (Eq. 1), as follows: z = (tPAC − μsurrogate) / σsurrogate, where μsurrogate and σsurrogate denote the mean and standard deviation of the surrogate tPAC distribution for that trial.
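A compact sketch of this block-resampling procedure is shown below (Python). The coupling_fn argument stands for any single-window PAC estimator (such as the tPAC sketch given earlier); the number of cut points and the random seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def block_shuffled_phase(phase, n_cuts=5):
    """Cut the instantaneous-phase series at random time points and shuffle the blocks."""
    cuts = np.sort(rng.choice(np.arange(1, phase.size - 1), size=n_cuts, replace=False))
    blocks = np.split(phase, cuts)
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order])

def tpac_zscore(phase, amp_env, coupling_fn, n_surrogates=500):
    """
    z-score the observed coupling of one trial against surrogates obtained by
    block-resampling the phase signal while leaving the amplitude envelope intact.
    coupling_fn(phase, amp_env) must return a scalar coupling value.
    """
    observed = coupling_fn(phase, amp_env)
    surrogates = np.array([coupling_fn(block_shuffled_phase(phase), amp_env)
                           for _ in range(n_surrogates)])
    return (observed - surrogates.mean()) / surrogates.std()
```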
Results
Behavior
Participants listened to a sequence of five pure tones and were asked to categorize the trial as standard or deviant, based on the fourth target tone in the sequence (Fig. 1A; for details, see Materials and Methods). The deviant trials had four levels of difficulty depending on the pitch difference between the fourth tone and the other tones in the sequence presented at a standard pitch (first to third + fifth tone). The pitch differences used amounted to 25, 50, 100, and 200 cents (1 cent = 1/100 of a semitone). In standard trials, all tones in the sequences were presented at the same pitch. We measured the hit rate (HR) to capture behavioral performance, defined as the proportion of correctly categorized trials, for each of the five deviance levels (the four possible deviant pitch levels and the standard, Fig. 1B). A two-factor (group × deviance level) between-subject ANOVA of observed performance accuracy revealed a significant interaction between groups and deviance levels (F(1,4) = 19.1, p < 0.001). Post hoc analysis indicated no difference in HR between deviance levels of 25 and 50 cents (post hoc Tukey test, t(70) < 1.1, pTukey > 0.98), and between deviance levels of 100 and 200 cents (t(70) <= 0.28, pTukey = 1), but all other pairwise comparisons were significant (pTukey < 0.01). We therefore combined the trials with deviance levels of 25 and 50 cents into a single condition (small deviance), and all trials with deviance levels of 100 and 200 cents into another condition (large deviance). We used d' to measure sensitivity when assessing behavioral performance in both deviance conditions (Fig. 1C). d' combines the HR and the false alarm (FA) rate, with d' = z(HR) − z(FA), where z denotes the inverse of the standard normal cumulative distribution function. We found an interaction between the level of target pitch deviance and groups (F(1,1)=20.3, p < 0.001). Further, both typical listeners and amusics showed higher sensitivity to large deviance (controls: t(60) = 4.1, pTukey < 0.001, amusics: t(60) = 10.5, pTukey < 0.001). Typical listeners were more sensitive than amusics in the small deviance condition (t(60) = 7.5, pTukey < 0.001) but not in the large deviance condition (t(60) = 1.2, pTukey = 0.66).
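For reference, the sensitivity index can be computed from the hit and false alarm rates as sketched below (Python/SciPy); the numerical values are illustrative only, not data from the study.

```python
from scipy.stats import norm

def d_prime(hit_rate, fa_rate):
    """Sensitivity index: d' = z(HR) - z(FA), with z the inverse normal CDF."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# illustrative values only: 80% hits on deviant trials, 15% false alarms on standards
print(round(d_prime(0.80, 0.15), 2))   # -> 1.88
```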
Nonparametric ANOVA testing (Kruskal–Wallis test) on reaction times reflected no significant difference between groups (χ2(1) = 0.0313, p = 0.86) and no interaction between groups and any other factor (correctness of answer or deviance level). However, there was a main effect of deviance level (χ2(2) = 18.6, p < 0.001): both groups responded faster to high deviance (easy) trials compared with low deviance (hard) trials (post hoc Tukey's test: W = 5.39, p < 0.001) and compared with standard trials (post hoc Tukey's test: W = 4.90, p = 0.002).
Event-related responses and power-spectrum density estimates
As in previous literature, event-related responses were investigated using EEG recordings. Group average event-related responses to target tone presentations are shown for electrode CZ in Figure 2. There was a clear N1 component around 110 ms following the onset of the target tone in both groups and all three conditions. In line with previous reports (Peretz et al., 2005), both groups produced a P3 component in the high deviance condition (significantly different from zero-mean, t(7) > 4.8, p < 0.002). The P3 response was weaker for standard tones in both groups. In the low deviance condition, amusics showed responses similar to those to standard tones (not significantly different from zero-mean, t(7) < 2.3, pFDR > 0.05), while controls produced a P3 whose amplitude was intermediate between those in the standard and high deviance conditions (significantly different from zero-mean, t(7) > 3.5, p < 0.01).
Figure 2B shows the normalized power spectral density of activities in three ROIs during the pitch discrimination task (−1.5 to 1.5 s; Fig. 1A), in a representative subject. The normalized power spectrum is the power spectrum density scaled by the total signal power. There are prominent peaks in δ (2-4 Hz) and β (15-35 Hz) frequency bands (highlighted with green and purple shadows, respectively).
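The normalization used here simply scales the Welch power spectrum by the total power; a minimal sketch (Python/SciPy) is given below, with the segment length and frequency range as illustrative choices.

```python
import numpy as np
from scipy.signal import welch

def normalized_psd(roi_signal, fs=2400, fmax=60.0):
    """Welch power spectral density scaled by total power, i.e., relative power."""
    freqs, pxx = welch(roi_signal, fs=fs, nperseg=2 * fs)
    pxx = pxx / pxx.sum()              # scale by total signal power
    keep = freqs <= fmax
    return freqs[keep], pxx[keep]
```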
Monitoring of vigilance: posterior alpha band activity
We measured the posterior normalized alpha power as a proxy for vigilance (Valentino et al., 1993), attention (Aftanas and Golocheikine, 2001), and the cognitive demand of the task (Gevins and Smith, 2000; Ciesielski et al., 2007). Higher α levels could indeed account for lower task performance and confound the interpretation of our data. There were main effects of group and of response accuracy (Fig. 2). Amusics produced lower levels of posterior alpha activity (F(1)=28.37, p < 0.001), which could be indicative of the task requiring higher attentional demands from this group (Gevins et al., 1979; Smith et al., 1999). Posterior alpha activity was reduced in correct trials (F(1)=8.1, p = 0.004), which is consistent with its negative association with attention and vigilance. The interaction between response accuracy and group was significant (F(1)=5.86, p = 0.01), with a post hoc Tukey's test showing lower alpha power in correct trials in amusics (t(9,262)=−4.46, pTukey<0.001) and significantly lower posterior α levels in amusics compared with controls in both correct and incorrect trials (correct: t(9,262) = 3.98, pTukey < 0.001; incorrect: t(9,262) = 4.16, pTukey < 0.001). We made similar observations from EEG recordings at electrode Pz (not shown).
Coupling between slow and fast neural dynamics
We investigated potential cross-frequency interactions between oscillatory activities with a PAC analysis of neurophysiological signals in all ROIs. We derived the strength of PAC between the phase of slow oscillations in the 2-12 Hz range and the amplitude of faster rhythms in the 15-250 Hz range. Across participants and for both groups, we observed the strongest PAC for the entire sequence between the phase of δ-band activity at 2-4 Hz and the amplitude of neurophysiological signals in the β frequency range at 15-35 Hz in the rAud (Fig. 3A). This observation reflects the modulation of the amplitude of beta band oscillations by the phase of slower rhythms in the δ frequency range. tPAC strength in rAud over the tone sequence is shown in Figure 3B. The five data points in Figure 3B report tPAC values during the presentation of each of the five auditory tones. The last two tPAC data points correspond to subsequent time windows during which there was no tone presentation. We found in both groups and across all tested time windows that the strength of PAC was above chance levels (z > 3.4, pcorrected<0.01). Overall, coupling was stronger in amusics than in typical listeners (F(1)=11.1, p < 0.001), with no effect of response accuracy (F(1)=0.02, p = 0.88) or pitch deviance (F(1)=0.94, p = 0.33; Fig. 3C). There was also a main effect of time (F(6)=6.5, p < 0.001): in both groups, a post hoc analysis showed that PAC increased after the onset of the tone sequence (p=0.0006) and decreased after the occurrence of the target tone (over the three subsequent time windows: p = 0.019, p = 0.013, and p < 0.0001, respectively).
In the rIFG, the strongest PAC was also observed between the phase of regional δ activity and the amplitude of beta band fluctuations (Fig. 3D). Time-resolved tPAC analysis in that region revealed a main effect for groups (F(1)=43.95, p < 0.0001; Fig. 3E): as in rAud, amusics expressed stronger PAC levels than controls (p < 0.001, Fig. 3E,F). We also observed a main effect of deviance level (F(1)=5.8, p = 0.0157) and an interaction between actual deviance and accuracy of pitch change detection (F(1,1)=13.1, p < 0.001). Indeed, in controls, PAC was stronger in rIFG when target tones were perceived as deviant than when reported as standards (pcorrected=0.007; Fig. 3F), regardless of response correctness.
We performed a two-factor ANOVA (group × perceived deviance) of PAC in rIFG, which confirmed a main effect of group (F(1)=77.82, p < 0.0001) and of perceived deviance (F(1)=7.05, p = 0.0079). In the right auditory cortex, there was only a main effect of group (F(1)=21.35, p < 0.0001) and no effect of perceived deviance (F(1)=2.25, p = 0.13). These observations point at a neurophysiological marker in the inferior frontal cortex of the individual's perception of the target tone as deviant, regardless of accuracy. There was no such effect in other tested regions in both groups.
We also derived PAC statistics in the baseline resting state before the auditory-testing session, with the objective of evaluating a possible predictive relation with the values observed during task performance (Fig. 3G). Resting state PAC between ongoing δ and β was above chance level in rAud for both groups (p < 0.05), but only marginally in rIFG (p > 0.07). We found a main effect of group (amusics stronger, F(1)=13.93, p = 0.0002), region (rAud stronger, F(1)=411.44, p < 0.0001), and state (resting state weaker than task performance, F(1)=2241.1, p < 0.0001), with a significant interaction between region and state (F(1,2)=93.97, p < 0.0001). Post hoc analysis of the interaction showed that PAC in rAud during task performance was stronger than in rIFG (p < 0.0001) and stronger than during the resting state in rAud (p < 0.0001) in both groups.
Beta bursts are temporally aligned with tone presentations in a sequence
We also derived measures of phase-amplitude stimulus-to-brain coupling in the right auditory cortex. Our observations reproduced previously reported findings (Fujioka et al., 2012; Cirelli et al., 2014; Chang et al., 2018) of stronger coupling in amusics compared with controls between the phase of a reference sinusoid adjusted to the tone sequence and the amplitude of β signaling (F(1)=60.5, p < 0.001; Fig. 3H, left). There was no significant effect of time (F(6)=1.16, p = 0.32), accuracy (F(1)=2.08, p = 0.14), or pitch deviance (F(1)=0.41, p = 0.53). Overall, neurophysiological δ-to-β PAC was stronger than stimulus-to-β coupling in the tested region (t(119,985)=69.45, p < 0.001).
For each trial, we also extracted the latency of β amplitude bursts with respect to the corresponding tone presentation in the sequence. We found in both groups that, after the first tone in the sequence was presented, the amplitude of β bursts was maximal at the expected latency of auditory inputs reaching the auditory cortex (i.e., ∼50 ms after tone onset; Fig. 3H, right).
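A minimal sketch of how such burst latencies could be extracted from a source time series is given below (Python/SciPy); the filter settings, peak-detection parameters, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, find_peaks, hilbert, sosfiltfilt

def beta_burst_latencies(raud, tone_onsets, fs=2400, beta=(15.0, 35.0)):
    """Latency (s) of beta-envelope peaks relative to the preceding tone onset."""
    sos = butter(3, beta, btype="bandpass", fs=fs, output="sos")
    env = np.abs(hilbert(sosfiltfilt(sos, raud)))
    peaks, _ = find_peaks(env, distance=int(0.1 * fs))   # at most one burst per 100 ms
    peak_times = peaks / fs
    latencies = []
    for p in peak_times:
        previous = tone_onsets[tone_onsets <= p]         # most recent tone onset
        if previous.size:
            latencies.append(p - previous[-1])
    return np.array(latencies)

# e.g., with tone_onsets = np.array([0.0, 0.35, 0.70, 1.05, 1.40]),
# burst latencies are expected to cluster near ~0.05 s after each onset
```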
Frequency-specific network interactions
We measured the coherence between all pairs of ROIs. We observed a main effect of the pair of ROIs (Aud-IFG presented stronger coherence than Mot-Aud and Mot-IFG, F(2) = 120.29, p < 0.0001), frequency band (δ-band coherence was stronger than beta band's, F(1) = 49.7, p < 0.0001), and laterality (with right-hemisphere coherence being stronger than left-hemisphere, F(1) = 17.91, p < 0.0001), but not for group (F(1) = 1.56, p = 0.21) or state (rest, vs pretarget vs post-target, F(2) = 0.03, p = 0.96) (Fig. 4A). We also observed significant interactions between frequency-band and group (F(1,1) = 6.16, p = 0.013) with stronger δ coherence in controls than in amusic participants (pcorrected=0.04).
We then assessed manifestations of frequency-specific directed interactions between ROIs, using PTE (Lobier et al., 2014). The analysis showed that in the resting state of controls, beta band activity was directed from motor cortex to Aud (t(15)=−6.48, p < 0.001) and from motor cortex to IFG (t(15)=−6.31, p < 0.001; Fig. 4B, left). We also found similar expressions of directed connectivity during task performance in both pretarget (Aud: t(15)=−12.42, p < 0.001, IFG: t(15)=−8.42, p < 0.001) and post-target segments (Aud: t(15)=−7.33, p < 0.001, IFG: t(15)=−4.6, p = 0.006). There was a reversed directed connectivity transfer in the δ range, from the auditory cortex to IFG and motor regions in pretarget (IFG: t(15)=4.06, p = 0.018, Mot: t(15)=4.07, p = 0.017) and post-target (IFG: t(15)=3.71, p = 0.038, Mot: t(15)=4.66, p = 0.006) segments, but not in the resting state.
In typical listeners, a three-factor ANOVA (pairs of ROIs × state × frequency bands) of PTE measures confirmed a significant main effect of the frequency band (F(1) = 4.57, p < 0.001) with opposite directions of connectivity transfer for δ versus β band activity. Interactions showed that directed connectivity transfer of δ-band activity from bilateral auditory regions to inferior frontal cortices was increased during pretarget tone presentations, compared with baseline resting state (t(90) = 4.73, p < 0.001). δ-band transfer was also stronger from bilateral auditory to motor regions over the entire tone sequence (pre-target and post-target) compared with the baseline resting state (pre: F(90) = 5.39, p < 0.001, post: F(90) = 4.27, p < 0.001). Reversed directed connectivity transfers were observed in the β band from motor to auditory regions, and from motor to inferior frontal cortices. PTE was stronger over the pretarget segment compared with post-target (from motor to auditory regions: F(90) = 3.65, p = 0.006; from motor to inferior frontal cortices: F(90) = 3.52, p = 0.009). All these observations were identical for both hemispheres, with the exception of δ transfer from auditory to inferior frontal cortex, which was stronger on the right side (post hoc: hemisphere × frequency band interaction: t(84)=−2.7, pcorrected=0.04).
Qualitatively, directed connectivity measures were similar between typical listeners and amusics (Fig. 4B). There was no significant main effect of group or interaction of group with other factors. We noted, however, greater variability around zero dPTE in the group of amusic participants for δ-band connectivity between the auditory and inferior frontal cortices, and between the auditory and motor cortices, which did not indicate a clear directionality during tone sequence presentation (both pre-target and post-target segments, t test against zero, p > 0.099).
As in typical listeners, top-down β-range directed transfer in amusics was significant from motor to auditory cortices (resting state: t(15)=−6.49, p < 0.001, pre: t(15)=−7.67, p < 0.001, post: t(15)=−8.11, p < 0.001), and from motor to inferior frontal cortices (resting state: t(15)=−6.31, p = 0.005, pre: t(15)=−6.80, p < 0.001, post: t(15)=−4.44, p = 0.009). There was no difference between the right and left hemispheres in amusics (F < 1.33, p > 0.25).
Discussion
We used noninvasive neurophysiological measures of local and inter-regional brain dynamics to study the neurophysiological mechanisms of pitch change detection. These analyses were performed in typical listeners and in congenital amusics, to resolve the mechanistic elements that are essential to pitch perception, and deficient in amusia. Our data report effects within and between brain ROIs from previous pitch processing studies that used a variety of functional techniques (Zatorre et al., 1992; Albouy et al., 2013; Peretz, 2016; Morillon and Baillet, 2017). These regions comprise, bilaterally, the superior temporal gyrus in auditory cortices, the posterior portion of the inferior frontal gyrus, and the precentral motor cortices.
Our behavioral results confirmed that amusics had more difficulty than controls in detecting small pitch variations of up to 50 cents (Fig. 1B). This result was consistent across participants and indicates that the paradigm can serve as a simple diagnostic tool (Fig. 1C). Reaction times did not differ significantly between the two groups at any deviance level. We show in Figure 2 that the lower accuracy of amusic participants was not related to a lack of vigilance while performing a task that was difficult and possibly frustrating to them (Ciesielski et al., 2007). Indeed, we found that posterior alpha band activity, as a proxy marker of decreased vigilance, was lower in amusics than in controls during the task (Valentino et al., 1993; Aftanas and Golocheikine, 2001). This observation could be attributed to amusics experiencing the task as more difficult, hence soliciting more vigilance on their part, marked by lower alpha power. The effects on event-related responses were in line with previous reports (Moreau et al., 2013) of the mismatch negativity component not being followed by a marker of conscious sensory processing (P3b) in the amusic group.
During the presentation of tone sequences, we found local expressions of cross-frequency coupling between the phase of δ-band activity and the amplitude of β-band signal components in the right auditory and inferior frontal cortices in both groups. The frequency of δ-band activity was similar to the presentation rate of tones in the auditory sequences (2.85 Hz), which is typical of cortical tracking at the dominant rate of auditory signals (Doelling and Poeppel, 2015; Morillon and Baillet, 2017; Puschmann et al., 2019). By boosting neural signals in response to regular sensory inputs, cortical tracking increases signal-to-noise ratio and improves the detection of genuine PAC effects (Aru et al., 2015; Samiee and Baillet, 2017). There was no δ-to-β coupling above chance level in the absence of tone-sequence presentation, namely, during baseline resting state in IFG (Fig. 3G). The fact that we observed δ-band tracking in IFG with the task (Fig. 3B,C) is compatible with this region being a downstream node of the ventral auditory pathway (Zatorre et al., 1992; Gaab et al., 2003; Albouy et al., 2013, 2019). Expressions of beta band activity during pitch processing have been previously reported in auditory regions (Fujioka et al., 2012; Cirelli et al., 2014), including during the pretarget time period (Florin et al., 2017).
In both groups, δ-β PAC was elevated in auditory and inferior frontal cortices during task performance compared with baseline resting state (Fig. 3G). This observation is in line with reports of higher transient PAC levels during task performance, such as with working memory (Axmacher et al., 2010), associative learning (Tort et al., 2009; van Wingerden et al., 2014), and visual attention (Szczepanski et al., 2014).
A striking overall effect between groups was that δ-to-β coupling in the auditory and inferior frontal cortices was higher in amusics than in controls, both during tone-sequence presentations and at baseline in the resting state. These observations of elevated ongoing PAC are the first reported in amusia. They contribute to converging evidence that chronically elevated PAC levels could be brain signal indicators of impaired neurophysiological function, as previously shown in, for example, epilepsy (Amiri et al., 2016; Samiee et al., 2018), Parkinson's disease (de Hemptinne et al., 2013; van Wijk et al., 2016), and autism spectrum disorders (Berman et al., 2015). In amusic participants, we found increased PAC levels in regions that have been reported as structurally abnormal in congenital amusia with MRI (Albouy et al., 2013), and as functionally abnormal with fMRI (Hyde et al., 2011; Albouy et al., 2019) and electrophysiology (Albouy et al., 2013, 2015; Tillmann et al., 2016). Our perspective is that this observation is compatible with previous reports of stronger expressions of slow (δ-range) prediction error signaling in the auditory cortex of amusics, during presentation of tone sequences (Albouy et al., 2015). Recent data on neurophysiological processing of natural speech also reported that δ signaling is enhanced in auditory cortex by words and phonemes that are less predictable in the sentence flow (Donhauser and Baillet, 2020). Mechanistically, we propose that, although PAC is expressed ubiquitously and dynamically in the human brain (Florin and Baillet, 2015), overexpressions of PAC coupling may reflect a lack of flexibility in the adjustment of the phase angle where fast frequency bursts are nested along slow frequency cycles (Lennert et al., 2021). This phase angle is related to the level of net excitability of the underlying cell assemblies and has been discussed as an essential parameter for the neural registration of sensory inputs (Gips et al., 2016; Lennert et al., 2021). High levels of PAC may reduce opportunities for registering, and therefore encoding and processing, incoming sensory inputs with sufficient temporal flexibility and adaptation to prediction errors (Arnal and Giraud, 2012). These considerations may inspire future studies in the field.
δ-to-β coupling was stronger in auditory regions than in inferior frontal cortex in both groups (Fig. 3G), which we interpret as reflecting the tracking of cortical stimulus inputs by auditory δ activity, expected to be more direct there than in downstream regions. Yet, another marked difference between groups was that there were modulations of δ-to-β PAC in the inferior frontal cortex of typical listeners depending on their individual report of the target tone being perceived as deviant, regardless of accuracy. Such percept-dependent increases are compatible with the known involvement of the inferior frontal cortex in pitch detection (Alain et al., 2001; Doeller et al., 2003; Florin et al., 2017) and in integrating auditory events that are presented sequentially (Tillmann et al., 2006; Albouy et al., 2013, 2017). There was no such modulation in amusics, which is in line with the absence of P300 event-related responses in this population for small pitch deviations (Fig. 2), with IFG as a contributing generator (Albouy et al., 2013, 2015; Florin et al., 2017).
We derived tPAC over time windows around the occurrence of each of the tones in the sequence. In the auditory cortex of both groups, there was an increase of cross-frequency coupling immediately after the onset of the tone sequence (Fig. 3B), which culminated at the expected latency of the target tone presentation. This was confirmed by a time-resolved analysis of stimulus-to-β coupling in the auditory cortex, which showed that stronger phasic β activity occurred at the expected latency of the auditory tones in the sequence (Fig. 3H). This observation is compatible with the signaling of predictive inferences concerning the timing of the next expected tone presentation in the sequence, which Morillon and Baillet (2017) showed to be emphasized by temporal attention. These effects were observed both in typical listeners and amusics. Such coupling between the timing of tone presentations (i.e., the actual physical stimulus) and modulations of beta band signal amplitude in auditory cortex was previously observed by Chang et al. (2018): They showed that stimulus-to-β coupling in right auditory cortex was associated with the predictability of pitch changes in a sequence. In our study, pitch changes occurred systematically on the fourth tone of the sequence and in only 50% of the trials. Hence, the predictability of the timing of pitch changes was high, but their actual occurrences were poorly predictable from trial to trial. In that respect, our results are consistent with those of Chang et al. (2018), as our participants presented lower levels of stimulus-to-β coupling than endogenous δ-to-β, with no temporal modulations along the presentation of the tone sequence (Fig. 3H).
We interpret the lower levels of stimulus-to-β relative to δ-to-β coupling as reflecting the fact that the dominant δ-band neurophysiological activity did not exactly match the tone presentation rate. This is indicative of phase and frequency jitters between the regular auditory inputs of the tone sequence and the induced neurophysiological responses. We also observed (Fig. 3G) that the beta band activity in auditory cortex was modulated by target-tone presentations more strongly in amusics than in controls. At the present time, we can only speculate that this may reflect the allocation of greater neural computation resources locally in primary auditory regions for tone prediction in amusics.
Our observations of functional connectivity between ROIs at different rhythmic frequencies of neural activity provide further insight into both the neurophysiological processes of typical pitch change detection and the impaired processing in amusia. Coherence measures revealed stronger δ-band effects, right-hemisphere dominance, and auditory-IFG connectivity in typical listeners than in amusics, confirming previously published observations (Albouy et al., 2013, 2015, 2019; Peretz, 2016).
Directed connectivity analyses during resting state baseline revealed bilateral influences in the beta band from the motor cortex, directed to the auditory and inferior frontal cortices. These interactions persisted during task performance and were emphasized during the pretarget segment of each trial. These results are in line with reports of dynamically structured and anatomically organized beta band activity in the resting state (Brookes et al., 2011; Bressler and Richter, 2015). They are also concordant with strong emerging evidence that beta band activity is a vehicle for top-down signaling in brain systems during sensory processing (Engel et al., 2001; Engel and Fries, 2010; Bressler and Richter, 2015; Bastos et al., 2015; Michalareas et al., 2016; Morillon and Baillet, 2017; Chao et al., 2018). This body of empirical evidence is in support of the theoretical framework of predictive coding (Rao and Ballard, 1999; Friston and Kiebel, 2009) and predictive timing (Arnal and Giraud, 2012; Morillon and Baillet, 2017) in sensory perception. In this context, beta band top-down activity would channel predictive information concerning the expected nature and temporal occurrence of incoming sensory information to primary systems (Fontolan et al., 2014; Baillet, 2017). In essence, the theoretical principles posit sensory perception as an active sensing process, in which the motor system would play a key role especially in predicting the timing of expected sensory events (Schroeder et al., 2010). In audition, for instance, we previously showed that, even in the absence of overt movements, beta band oscillations issued in the motor cortex had influence on auditory cortices and contributed to the temporal prediction of tone occurrences in complex auditory sequences (Morillon and Baillet, 2017). Our present data confirm and extend these observations: the modulations of beta band activity in the auditory cortex peaked at the expected and effective occurrences of the tones in the sequence, which is compatible with the involvement of the motor cortex in driving inter-regional signals for predictive sensory timing. This top-down signaling mechanism was not affected in amusic participants.
During tone-sequence presentations, we found, only in controls, a bottom-up form of directed connectivity issued from the auditory cortices toward both the inferior frontal and motor cortices (Fig. 4B). These connections were not significantly expressed in amusics and were not present during the baseline resting state in either group. δ-band oscillatory activity contributed to the mode of maximum regional PAC and encompassed the stimulus presentation rate of 2.85 Hz. Coherence analysis showed a hemispheric asymmetry, with stronger connectivity in the right hemisphere (Fig. 4A), in line with previous reports (Zatorre et al., 1992). Such bottom-up connectivity transfer is also compatible with the principles of predictive coding and timing, which posit that primary sensory regions propagate prediction-error signals downstream in brain networks, for ongoing updates of internal predictive and decision models (Rao and Ballard, 1999; Friston and Kiebel, 2009; Baillet, 2017), as recently shown in natural speech processing (Donhauser and Baillet, 2020). This observation is consistent with published dynamical causal models of impaired directed connections between auditory and inferior frontal cortices in amusics (Albouy et al., 2013, 2015) and other neurophysiological disorders (Omigie et al., 2013). These previous results were not specific to narrowband oscillatory signals: they were obtained from event-related signals in response to tone-sequence presentations. Our findings are also in line with fMRI data showing reduced functional, not directed, connectivity in amusic participants between the same ROIs (Hyde et al., 2011; Albouy et al., 2019). Loui et al. (2009) also reported reduced anatomic connections via the arcuate fasciculus in amusic participants using diffusion-weighted imaging and tractography, although more recent results have been mixed (Chen et al., 2015; Wilbiks et al., 2016). In our data, directed connectivity measures were qualitatively similar between amusics and controls. The interindividual variability of directed connectivity statistics was greater in amusics, which may explain why both the strength and directionality of connections were deemed not significant in this group. We acknowledge that our sample size was small because of the relative rarity of the amusia syndrome. Yet the motifs of directed connectivity are compatible with the large effects we observed in behavior, local PAC statistics, and functional connectivity statistics (coherence) reflecting stronger δ-band interactions in typical listeners than in amusics.
In conclusion, we provide evidence that pitch discrimination from a sequence of pure tones engages a distributed network of cortical regions comprising at least the auditory, inferior frontal and lateral motor cortices. We also show that the motor cortex issues beta band signals directed to inferior frontal and auditory regions, which are present by default in the resting state, and whose timing during auditory presentation marks the expected occurrences of tones in the sequence. The auditory cortex is entrained at a rate around the physical pace of the tone sequence, and this signal is propagated in a bottom-up fashion further downstream to the motor system and along the ventral pathway to the inferior frontal cortex. These poly-frequency phenomena interact locally through PAC, which increases in auditory regions at the onset of the tone sequence and culminates at the expected occurrence of the target tone before returning to baseline levels. Our data identify two cross-frequency mechanisms as crucial to pitch-change detection, when contrasting amusic participants with typical listeners. First, δ-to-β PAC is elevated in the auditory and inferior frontal regions of amusics. Second, bottom-up signaling along the ventral auditory pathway and to the motor cortex is depressed in this group. In sum, our findings point to an alteration of pitch encoding in the auditory regions of amusics, which may depress prediction error signaling toward motor and inferior frontal regions and eventually lead to poorer perceptual detection. The predictive timing functions seem to be preserved in amusics, at least in the present context of highly predictable and regular pacing of the tone sequence.
Together, we believe these findings advance a more complete and dynamic view of the sensory processing of tone sequences in audition. We anticipate that some of these new observations will generalize to other sensory modalities and that the cross- and poly-frequency neurophysiological markers of impaired auditory processing will be pertinent to other functional deficits in sensory perception.
Footnotes
S.S. was supported by McGill University Integrated Program in Neuroscience. I.P. was supported by Natural Sciences and Engineering Research Council of Canada, Canadian Institutes of Health Research, and Canada Research Chair program. S.B. was supported by Natural Science and Engineering Research Council of Canada Discovery Grant 436355-13, Canada Research Chair of Neural Dynamics of Brain Systems, National Institutes of Health Grant 1R01EB026299-01, Healthy Brains for Healthy Lives Canada Excellence Research Fund, and Brain Canada Foundation Platform Support Grant PSG15-3755. Pilot data for this study were collected with support from Research Incubator Grant from McGill University's Center for Research on Brain Language & Music.
The authors declare no competing financial interests.
- Correspondence should be addressed to Sylvain Baillet at sylvain.baillet{at}mcgill.ca