Abstract
The detection of temporally unpredictable visual targets depends on the preceding phase of alpha oscillations (∼7–12 Hz). In audition, however, such an effect seemed to be absent. Due to the transient nature of its input, the auditory system might be particularly vulnerable to information loss that occurs if relevant information coincides with the low-excitability phase of the oscillation. We therefore hypothesized that effects of oscillatory phase in audition will be restored if auditory events are made task irrelevant and information loss can be tolerated. To this end, we collected electroencephalography (EEG) data from 29 human participants (21F) while they detected pure tones at one sound frequency and ignored others. Confirming our hypothesis, we found that the neural response to task-irrelevant but not to task-relevant tones depends on the prestimulus phase of neural oscillations. Alpha oscillations modulated early stages of stimulus processing, whereas theta oscillations (∼3–7 Hz) affected later components, possibly related to distractor inhibition. We also found evidence that alpha oscillations alternate between sound frequencies during divided attention. Together, our results suggest that the efficacy of auditory oscillations depends on the context they operate in and demonstrate how they can be employed in a system that heavily relies on information unfolding over time.
Significance Statement
The phase of neural oscillations shapes visual processing, but such an effect seemed absent in the auditory system when confronted with temporally unpredictable events. We here provide evidence that oscillatory mechanisms in audition critically depend on the degree of possible information loss during the oscillation's low-excitability phase, possibly reflecting a mechanism to cope with the rapid sensory dynamics that audition is normally exposed to. We reach this conclusion by demonstrating that the processing of task-irrelevant but not task-relevant tones depends on the prestimulus phase of neural oscillations during selective attention. During divided attention, cycles of alpha oscillations seemed to alternate between possible acoustic targets similar to what was observed in vision, suggesting an attentional process that generalizes across modalities.
Introduction
Confronted with a dynamic environment, our brain constantly engages in the selection and prioritization of incoming sensory information. Previous research posits that neural oscillations, rhythmic fluctuations in neural excitability, are instrumental for this purpose (Schroeder and Lakatos, 2009). One fundamental assumption in this line of research is that the sensory information that coincides with the high-excitability phase of an oscillation is processed more readily than that occurring during the low-excitability phase, leading to perceptual or attentional rhythms (VanRullen, 2016b).
Previous studies in the visual modality have confirmed this assumption, demonstrating that the detection of temporally unpredictable targets depends on the prestimulus phase of alpha oscillations in the EEG (Busch et al., 2009; Mathewson et al., 2009; Dugué et al., 2011, 2015). This phasic effect was only found for the detection of attended, but not unattended visual targets (Busch and VanRullen, 2010).
Studies in the auditory modality, however, revealed a more ambivalent role of neural oscillations in auditory perception (VanRullen et al., 2014). On the one hand, the detection of near-threshold auditory tones, presented at unpredictable moments in quiet, does not depend on pre-target neural phase (Zoefel and Heil, 2013; VanRullen et al., 2014). This result seems to question the assumption of an auditory perception that is inherently rhythmic. On the other hand, it is clear that neural oscillations aligned (“entrained”) to rhythmic stimuli (Henry and Obleser, 2012; Obleser and Kayser, 2019; van Bree et al., 2021) or transcranial current stimulation (Neuling et al., 2012; Riecke et al., 2015; Zoefel et al., 2020; van Bree et al., 2021) serve a mechanistic role in auditory attention and perception. Rhythmicity in auditory processing can also be observed after a cue like the onset of acoustic noise, assumed to reflect a phase reset of oscillations in the theta range (Ho et al., 2017; Wöstmann et al., 2020; Lui et al., 2023).
We here tested a hypothesis that can reconcile these apparently discrepant findings. This hypothesis is based on the fact that the auditory environment is particularly dynamic and transient (Kubovy, 1988; VanRullen et al., 2014). Losing critical auditory information that coincides with the low-excitability phase of the oscillation may be too costly for successful comprehension of auditory input. To avoid such a loss of information, the brain may therefore suppress neural oscillations in the auditory system and operate in a more “continuous mode” (Schroeder and Lakatos, 2009) if incoming auditory stimuli are relevant (e.g., attended) but their timing is unknown. This assumption predicts two scenarios in which a “rhythmic mode” can be restored (Zoefel and VanRullen, 2017). First, if the timing of relevant events is known, the phase of the oscillation can be adapted accordingly, and a loss of critical information during the low-excitability phase can be avoided. As explained above, such an effect is fundamental for the field of “neural entrainment” (Lakatos et al., 2008, 2019). A second scenario remained unexplored and was tested here: The temporary suppression of input processing during the low-excitability phase can be tolerated if expected events are irrelevant to perform a task, even if their timing is unpredictable. In this scenario, the processing of irrelevant (but not relevant) events would be modulated by the oscillatory phase.
We measured participants’ EEG and asked them to detect pure tones at one sound frequency (task-relevant tone) and ignore pure tones at another sound frequency (task-irrelevant tone), presented at unpredictable moments (Fig. 1A). We predicted that the processing of the task-irrelevant, but not that of the task-relevant tone, depends on the pretone phase of neural oscillations. In a condition where both tones needed to be detected, we tested how divided attention impacts the role of spontaneous oscillations in auditory processing. On the one hand, as only task-relevant tones were present, the system might use a continuous mode of processing that avoids a loss of information at the low-excitability phase, similar to what we hypothesized for selective attention to specific sound frequencies. In such a case, the processing of neither tone would depend on the preceding oscillatory phase. On the other hand, the presence of multiple task-relevant tones might require a rhythmic alternation of attentional focus between these tones—and consequently, a phasic modulation of detection even for task-relevant tones—as previously demonstrated for the visual system (Fiebelkorn et al., 2013; Helfrich et al., 2018) and after a cue-induced “reset” in the auditory system (Ho et al., 2017, 2019).
Materials and Methods
Participants
Thirty native French-speaking participants took part in the experiment after providing informed consent and received a monetary reward of €25. The data of one participant were excluded due to technical issues; thus, 29 participants (21 females; mean age, 22.34; SD = 1.2) were included in the final data analyses. All experimental procedures were approved by the CPP (Comité de Protection des Personnes) Ouest II Angers (protocol number 21.01.22.71950/2021-A00131-40).
Experimental design
Participants performed a tone-in-noise detection task where they were presented with pure tones at two different sound frequencies (440 and 1,026 Hz), embedded at unpredictable moments into a continuous stream of pink noise (Fig. 1A). They were instructed to press a button when they heard a tone at the to-be-attended, task-relevant sound frequency and to ignore the other one. A correct detection was defined as a button press within 1 s after pure tone onset throughout the experiment. All tones were 20 ms in duration with a rise-and-fall period of 5 ms. The continuous pink noise was presented at ∼70 dB SPL. Prior to the main experiment, the sound level of the pure tones was titrated individually so that ∼50% of tones were detected in the main task (see below, Adaptive staircase procedure). In total, 504 pure tones at each sound frequency were presented. These were divided into 12 experimental blocks, each ∼5 min long.
In “selective attention” blocks, participants had to detect tones at one of the two sound frequencies and to ignore the other. In “divided attention” blocks, they had to detect tones at both sound frequencies. While the “selective attention” blocks included both task-relevant and task-irrelevant tones, the “divided attention” blocks only included task-relevant tones. The order of the tones was pseudorandomized with the constraint that all probabilities of repetition (i.e., low-low, high-high) and alternation (i.e., low-high, high-low) were between 0.24 and 0.26. This constraint ensured that the identity of upcoming tones was not predictable. The stimulus-onset asynchrony (SOA) between tones was randomized between 2 and 5 s with a uniform distribution to maximize temporal unpredictability. Due to an increase in hazard rate, the timing of tones presented later in this interval was slightly more predictable than that of earlier tones. Nevertheless, in a study using a similar range of SOAs (Zoefel and Heil, 2013), the prestimulus phase of neural oscillations did not affect the perception of near-threshold tones. As the phase of neural oscillations can influence auditory perception in temporally predictive scenarios (Lakatos et al., 2019; Obleser and Kayser, 2019), we conclude that the wide range of SOAs decreased predictability to a degree that participants could not use it to anticipate upcoming tones. Note also that, even if some predictability were present, it would not differ between task-relevant and task-irrelevant tones and would therefore not affect our main hypothesis of a phasic effect only in the latter.
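The sequence constraint and SOA randomization described above can be sketched as follows (a Python illustration, not the authors' code; the rejection-sampling approach, function names, and random seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, for reproducibility of the sketch

def make_tone_sequence(n_tones=1008, lo=0.24, hi=0.26):
    """Redraw a random low(0)/high(1) tone order until every transition
    probability (low-low, low-high, high-low, high-high) lies between
    0.24 and 0.26, the constraint described above."""
    while True:
        seq = rng.integers(0, 2, n_tones)
        probs = [np.mean((seq[:-1] == a) & (seq[1:] == b))
                 for a in (0, 1) for b in (0, 1)]
        if all(lo <= p <= hi for p in probs):
            return seq, probs

def make_soas(n_tones):
    """Uniformly distributed SOAs between 2 and 5 s."""
    return rng.uniform(2.0, 5.0, n_tones)
```

With 1,008 tones (504 per sound frequency), a few redraws typically suffice to satisfy the transition-probability constraint.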
We adopted a rolling adaptive procedure to ensure that participants would detect the tone at threshold level (50%) throughout the experiment. After each block, if the participant's detection probability was lower than 40% or higher than 60%, the sound level of the tone at the corresponding pitch was increased or decreased by 1 dB, respectively. The block order (selective attention–low pitch, selective attention–high pitch, divided attention) was counterbalanced between participants.
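The rolling between-block adjustment amounts to a simple rule (Python sketch; the function name is hypothetical):

```python
def adjust_level(level_db, hit_rate):
    """Rolling adaptive rule applied after each block: nudge the tone level
    by 1 dB toward 50% detection if performance leaves the 40-60% band."""
    if hit_rate < 0.40:
        return level_db + 1.0  # too hard -> increase tone level
    if hit_rate > 0.60:
        return level_db - 1.0  # too easy -> decrease tone level
    return level_db
```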
Stimulus presentation was done via Matlab 2019a (MathWorks) and Psychtoolbox (Brainard, 1997). The auditory stimuli were presented using Etymotic ER-2 inserted earphones and a Fireface UCX soundcard. The same sound card was used to send triggers to the EEG system, ensuring synchronization between sound and EEG.
Adaptive staircase procedure
Individual detection thresholds were determined separately for each of the two pure tones with a 1-up-1-down adaptive staircase procedure as implemented in the Palamedes toolbox (Prins and Kingdom, 2018). In each adaptive trial, one pure tone was embedded randomly between 0.5 and 4.5 s after the onset of a 5 s pink noise snippet. The participant had to press a button as soon as they detected the pure tone. At the start of the procedure, both tone and pink noise were presented at −30 dB relative to the maximal output of the soundcard (∼100 dB SPL), which corresponded to ∼70 dB SPL. The sound level of the tone then decreased in steps of 1 dB if the participant correctly detected it within a 1 s time window or increased accordingly if they missed the pure tone within that time period. The pink noise remained at 70 dB SPL throughout the entire adaptive procedure so that the signal-to-noise ratio between tone and noise would decrease if participants responded correctly. The adaptive procedure ended after 10 reversals, and the final six reversals were used to calculate the threshold. The convergence of the staircase procedure was examined by visual inspection to determine whether the threshold would be used in the following main experiment. Convergence was considered to have failed if there remained a visible slope in the staircase during the last six reversals. If convergence failed, the adaptive procedure was repeated. The average thresholds for high- and low-frequency tones were −9.67 dB (SD = 1.21 dB) and −7.10 dB (SD = 1.26 dB) relative to the pink noise, respectively, resulting in ∼50% detected tones during both selective and divided attention (Fig. 1B).
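A minimal sketch of the 1-up-1-down logic (in Python rather than the Palamedes toolbox; the simulated-observer interface `respond(level)` is an assumption for illustration):

```python
import numpy as np

def staircase_1up1down(respond, start_db=-30.0, step_db=1.0,
                       n_reversals=10, n_last=6):
    """1-up-1-down adaptive staircase: the tone level (dB re noise) decreases
    after each detected tone and increases after each miss. The procedure
    stops after `n_reversals` reversals, and the threshold is the mean level
    at the final `n_last` reversals. `respond(level)` returns True if the
    simulated (or real) listener detected the tone at that level."""
    level = start_db
    last_dir = 0  # +1 = level went up (miss), -1 = level went down (hit)
    reversal_levels = []
    while len(reversal_levels) < n_reversals:
        direction = -1 if respond(level) else +1
        if last_dir != 0 and direction != last_dir:
            reversal_levels.append(level)  # direction changed -> reversal
        last_dir = direction
        level += direction * step_db
    return np.mean(reversal_levels[-n_last:])
```

For a deterministic observer with a hard threshold, the staircase converges to levels straddling that threshold, so the reversal average lands between the two levels.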
EEG recording and data processing
EEG was recorded using a Biosemi Active 2 amplifier (Biosemi). Sixty-four active electrodes were positioned according to the international 10-10 system. The sampling rate of the EEG recording was 2,048 Hz. In place of typical reference and ground electrodes, the Biosemi system employs a “Common Mode Sense” active electrode and a “Driven Right Leg” passive electrode located in the central-parietal region for common mode rejection purposes. The signal offsets of all electrodes were kept under 50 µV.
All EEG preprocessing steps were conducted using Matlab 2021a (MathWorks) and the FieldTrip toolbox (Oostenveld et al., 2011). EEG data were rereferenced to the average of all electrodes. Then, the data were high- and low-pass filtered (fourth-order Butterworth filter, cutoff frequencies 0.5 and 100 Hz, respectively). Noisy EEG channels were identified by visual inspection and were interpolated. Artifacts such as eyeblinks, eye movements, muscle movements, and channel noise were detected in an independent component analysis (ICA) applied to downsampled EEG data with a sampling rate of 256 Hz. Contaminated components were detected by visual inspection and removed from data at the original sampling rate. The continuous EEG data were segmented from −2 to +2 s relative to each tone onset, termed “trials” in the following. Trials with an absolute amplitude that exceeded 160 µV were rejected.
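The filtering, epoching, and amplitude-based rejection steps might look as follows in Python (a sketch, not the FieldTrip pipeline: the separate high- and low-pass filters are combined into one band-pass for brevity, and channel interpolation and ICA are omitted):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_filter(data, fs, lo=0.5, hi=100.0, order=4):
    """Fourth-order Butterworth band-pass (0.5-100 Hz), applied along time.
    `data` has shape (n_channels, n_samples)."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, data, axis=-1)

def epoch_and_reject(data, fs, onsets_s, tmin=-2.0, tmax=2.0, thresh_uv=160.0):
    """Cut -2 to +2 s trials around tone onsets and drop any trial whose
    absolute amplitude exceeds 160 uV on any channel (data assumed in uV)."""
    n0, n1 = int(tmin * fs), int(tmax * fs)
    trials = []
    for t in onsets_s:
        s = int(round(t * fs))
        if s + n0 >= 0 and s + n1 <= data.shape[-1]:
            trials.append(data[:, s + n0 : s + n1])
    trials = np.stack(trials)  # (n_trials, n_channels, n_times)
    keep = np.abs(trials).max(axis=(1, 2)) <= thresh_uv
    return trials[keep]
```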
We did not measure participants’ subjective perception of task-irrelevant tones as this would have rendered them relevant. Instead, we used a neural proxy to infer how readily these tones were processed and how their processing depended on prestimulus phase. In line with previous work (Busch and VanRullen, 2010), we used the global field power (GFP) evoked by tones as such a proxy. GFP corresponds to the spatial variance of evoked activity (Skrandies, 1990; Murray et al., 2008) and leverages the fact that components of event-related potentials (ERPs) typically consist of simultaneous positive and negative deflections in different electrodes (reflecting underlying “dipoles”). Consequently, the GFP is an assumption-free (as information from all electrodes is used and no choice of electrodes is required) and possibly more sensitive measure of the neural response to a stimulus. As both are evoked by the stimulus, the timings of peaks in GFP and ERP are correlated as long as the global response (across the electrode montage) is of interest. In practice, ERPs were calculated for each participant, separately for correctly detected (hits) and missed targets (misses) and for each condition (task-relevant and task-irrelevant tones in selective attention blocks, task-relevant tones in divided attention blocks). For the selective attention condition, ERPs for trials where participants correctly did not respond to the task-irrelevant tone (correct rejections; CR) were also calculated. GFP was then extracted as the standard deviation of the ERPs across EEG channels (Lehmann and Skrandies, 1980; Skrandies, 1990; Murray et al., 2008; Busch and VanRullen, 2010; Wisniewski et al., 2020). A low-pass filter (cutoff frequency 10 Hz) was applied to the grand average GFP.
Three relevant time lags for tone processing were then extracted as local maxima (i.e., peaks identified with the “findpeaks” Matlab function) in the low-pass filtered (cutoff frequency 10 Hz) grand average GFP between 0 and 1 s after tone onset, separately for selective and divided conditions (Fig. 1C). As the aim of this step is the identification of relevant time lags for tone processing, we restricted the analysis to detected task-relevant tones (Fig. 1C,D). Time windows of interest for the analysis of phasic effects (see below) were selected as ±30 ms around each of these three peaks. Single-trial GFP amplitudes were obtained by averaging the GFP amplitude across time points within each time window of interest. This was done separately for each experimental condition, including those without a behavioral response (i.e., the task-irrelevant conditions).
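Assuming ERPs stored as NumPy arrays of shape (channels × time), the GFP computation and peak extraction can be sketched as follows (a Python counterpart to the MATLAB “findpeaks” step; the filter order and the optional prominence threshold are assumptions not specified in the text):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, find_peaks

def gfp(erp):
    """Global field power: spatial standard deviation of the ERP across
    channels at each time point. `erp` has shape (n_channels, n_times)."""
    return erp.std(axis=0)

def gfp_peaks(g, fs, fmax=10.0, t0=0.0, t1=1.0, prominence=None):
    """Low-pass the grand-average GFP at 10 Hz and return peak latencies (s)
    between 0 and 1 s after tone onset, mirroring the 'findpeaks' step.
    `g` starts at tone onset (time 0)."""
    sos = butter(4, fmax, btype="lowpass", fs=fs, output="sos")
    g_lp = sosfiltfilt(sos, g)
    i0, i1 = int(t0 * fs), int(t1 * fs)
    peaks, _ = find_peaks(g_lp[i0:i1], prominence=prominence)
    return (peaks + i0) / fs
```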
We used a fast Fourier transform (FFT) with Hanning tapers and sliding windows centered from −800 to 0 ms before tone onset (20 ms steps) to extract EEG phases at frequencies from 2 to 20 Hz (1 Hz steps) from single trials and channels. The window length for phase estimation scaled linearly from 2 cycles of the corresponding frequency (at 2 Hz) to 5.6 cycles (at 20 Hz). The subsequent analytical steps were restricted to phases estimated from windows that do not include poststimulus EEG data (compare Figs. 1E, 2A). This avoids a potential contamination with stimulus-evoked responses that can lead to spurious phase effects (Zoefel and Heil, 2013; Vinao-Carl et al., 2024).
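A Python sketch of this sliding-window phase estimation for one channel (illustrative only; the exact window placement and taper handling of the authors' implementation may differ):

```python
import numpy as np

def prestim_phase(trial, fs, onset_idx, freqs=None, centers=None):
    """Hanning-tapered sliding-window FFT phase estimates, as described
    above: window centers from -800 to 0 ms in 20 ms steps, frequencies
    2-20 Hz, window length scaling linearly from 2 cycles (2 Hz) to 5.6
    cycles (20 Hz). `trial` is one channel's time course with tone onset
    at sample `onset_idx`; it must extend far enough to cover all windows
    (trials here span -2 to +2 s). Windows overlapping poststimulus data
    would be masked in the subsequent analysis."""
    if freqs is None:
        freqs = np.arange(2, 21)                # Hz, 1 Hz steps
    if centers is None:
        centers = np.arange(-0.8, 1e-9, 0.02)   # s, relative to onset
    cycles = np.interp(freqs, [2, 20], [2.0, 5.6])
    phases = np.empty((len(freqs), len(centers)))
    for fi, (f, c) in enumerate(zip(freqs, cycles)):
        n = int(round(c / f * fs))              # window length in samples
        taper = np.hanning(n)
        k = int(round(f * n / fs))              # FFT bin nearest f
        for ti, tc in enumerate(centers):
            start = onset_idx + int(round(tc * fs)) - n // 2
            seg = trial[start : start + n] * taper
            phases[fi, ti] = np.angle(np.fft.rfft(seg)[k])
    return phases
```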
Statistical analysis
To address our main hypothesis, we tested whether the magnitude of the stimulus-evoked response (as GFP; see previous section) varies with prestimulus neural phase (Fig. 1E). We used a statistical approach that a previous simulation study (Zoefel et al., 2019) showed to be particularly sensitive to such phasic effects (“sine fit binned” method in that study). For each condition, participant, EEG channel, frequency, and time point separately, single trials were divided into eight equally spaced bins according to their phase (Fig. 1EI) and the average GFP amplitude extracted for each phase bin. We then fitted a sine function to the resulting phase-resolved GFP amplitude (Fig. 1EII). The amplitude of this sine function (Fig. 1EII, a) indexes how strongly tone processing is modulated by EEG phase whereas its phase (Fig. 1EII, p) reflects “preferred” and “non-preferred” phases for GFP (leading to highest and lowest GFP, respectively). To quantify phase effects statistically, we compared sine fit amplitudes with those obtained in a simulated null distribution, i.e., in the absence of a phasic modulation of tone processing. This null distribution was obtained by randomly assigning EEG phases to single trials and recomputing the amplitude of the sine 1,000 times for each condition, EEG channel, frequency, and time point (VanRullen, 2016a). For each combination of these factors, the sine amplitude from the original data was compared with the null distribution to obtain group-level z-scores:
z = (a − μ_null) / σ_null, where a is the sine amplitude from the original data and μ_null and σ_null denote the mean and standard deviation of the surrogate distribution, respectively.
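The binning, sine fit, and permutation z-score can be sketched for a single condition as follows (a Python illustration; the per-participant averaging and group-level variants described below are omitted, and all bins are assumed nonempty):

```python
import numpy as np

def phase_modulation_z(phases, gfp_amp, n_bins=8, n_perm=1000, seed=0):
    """'Sine fit binned' analysis: bin single trials by prestimulus phase,
    average GFP per bin, fit a sine to the binned means, and z-score the
    observed sine amplitude against a surrogate distribution obtained by
    shuffling the trial-phase assignment."""
    rng = np.random.default_rng(seed)

    def sine_amp(ph, amp):
        edges = np.linspace(-np.pi, np.pi, n_bins + 1)
        idx = np.clip(np.digitize(ph, edges) - 1, 0, n_bins - 1)
        binned = np.array([amp[idx == b].mean() for b in range(n_bins)])
        centers = (edges[:-1] + edges[1:]) / 2
        # least-squares amplitude of a sine over equally spaced bin centers
        return np.abs(2 * np.mean(binned * np.exp(1j * centers)))

    a_obs = sine_amp(phases, gfp_amp)
    a_null = np.array([sine_amp(rng.permutation(phases), gfp_amp)
                       for _ in range(n_perm)])
    return (a_obs - a_null.mean()) / a_null.std()
```

A sinusoidal dependence of GFP on phase yields a large z-score; phase-independent GFP yields a z-score near zero.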
One advantage of the statistical method used is that it makes explicit assumptions about whether participants have consistent “preferred” EEG phases, reflected in the phase of the sine fitted to individual participants (Fig. 1EII). If these phases are uniformly distributed (i.e., inconsistent across participants), the sine fit amplitude is extracted separately for each participant and then averaged before the comparison with the surrogate distribution. In this way, the z-score defined above is independent of individual preferred EEG phases. If phases are nonuniformly distributed (i.e., consistent across participants), the phase-resolved GFP (Fig. 1EI) is first averaged across participants and the sine function is fitted to the resulting average (Fig. 1EII) before the comparison with the surrogate distribution. In this way, the z-score is only high when the preferred phase is consistent across participants. To test which version of the test is appropriate in our case, we applied Rayleigh's test for circular uniformity (Circular Statistics Toolbox; Berens, 2009) to the distribution of individual preferred EEG phases at each time–frequency point. We found a prestimulus cluster of significant phase consistency across participants (compare Results) and adapted our statistical method accordingly (using the second version described).
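The Rayleigh test statistic and its standard approximation are straightforward to compute (a Python sketch of the logic implemented in circ_rtest of the Circular Statistics Toolbox, using Zar's p-value approximation):

```python
import numpy as np

def rayleigh_test(angles):
    """Rayleigh test for circular uniformity. R is the resultant length of
    the unit phase vectors; the test statistic is z = R^2 / n and the
    p-value uses Zar's approximation. Small p indicates that the phases
    cluster around a common direction (i.e., are consistent)."""
    n = len(angles)
    R = n * np.abs(np.mean(np.exp(1j * np.asarray(angles))))
    z = R**2 / n
    p = np.exp(np.sqrt(1 + 4 * n + 4 * (n**2 - R**2)) - (1 + 2 * n))
    return z, p
```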
We adapted this statistical approach to test whether task-relevant and task-irrelevant tones differ in their phasic modulation. In this version, we contrasted the difference in averaged sine fit amplitudes between the two conditions (relevant vs irrelevant) with another surrogate distribution for which the condition label was randomly assigned to trials. This procedure yielded another z-score which was calculated as described above.
In the divided attention condition, we additionally tested whether the processing of the low- and high-frequency tone has a different preferred phase by comparing the phase difference between the two tone conditions against zero (circular one-sample test against angle of 0; circ_mtest.m in the Circular Statistics Toolbox).
Source localization of the phase dependence effect
We also explored the neural origins of the effects found in the analysis of EEG phase effects, using standard procedures implemented in the FieldTrip toolbox. For this purpose, we used a standard volume conduction model and electrode locations to calculate the leadfield matrix (10 mm resolution). Then, for the selective attention condition, we calculated a spatial common filter using the LCMV beamformer method (lambda = 5%; Van Veen et al., 1997) on the 20 Hz low-pass filtered EEG data from −1 to −0.5 s relative to tone onset. The chosen time window encompasses all of the observed phase effects (compare Results). This resulted in 2,015 source locations that were inside the brain.
Single-trial EEG data from individual participants were projected onto the source space with the spatial common filter. The analysis of phasic effects was then applied to data from each source location as described above for the sensor level. Due to the large computational demand, we used 100 permutations for the construction of surrogate distributions (z-score defined above), a number shown to be sufficient in the past (VanRullen, 2016a). The voxels with the 1% largest z-scores were selected as the origin of the corresponding effects on the sensor level. Note that, due to the low spatial resolution of EEG, we explicitly treat these source-level results as explorative.
Results
Overview
Participants were presented with tones at two different sound frequencies (Fig. 1A). In some experimental blocks, they were asked to detect one of them (task-relevant tone in the selective attention condition) and ignore the other (task-irrelevant tone in the selective attention condition). In other blocks, they were asked to detect both of them (divided attention condition).
A, Schematic of the tone-in-noise detection task. Purple and yellow rectangles denote task-relevant and task-irrelevant tones, respectively. In the main experiment, low and high tones served as task-relevant and task-irrelevant tones in different blocks. The gray line shows the continuous pink noise. B, Behavioral performance for selective attention (left) and divided attention (right) conditions. Black lines show the mean across participants. C, D, Global field power (GFP) for hit (blue; relevant tones), miss (red; relevant tones), and correct rejection (CR; yellow; irrelevant tones) in the selective (C) and divided (D) attention conditions. Gray areas indicate the time window selected for the phase dependence analysis. Insets show topographies of GFP at each time window for hit trials. E, Illustration of the analysis pipeline for the phase dependence analysis. EI, Extraction of single-trial phase estimates for individual participants. The black line in the left panel shows the EEG data; gray-shaded areas denote the time windows used to extract the prestimulus phase and poststimulus neural response, respectively, for the example shown. These measures were combined to quantify how strongly GFP in a given poststimulus time window depends on prestimulus phase in each subject (right panel). EII, Group-level analysis. The individual phase effects, shown for one subject in the right panel of EI, are now illustrated for all individual subjects as thin gray lines in the left panel. The hypothesized phase effect was quantified on the group level by fitting a sine function to the averaged data (bold black line) and contrasting the amplitude a of this fit against that obtained in a permutation distribution (N = 1,000). This analysis assumes that the phase p of individual sine functions is consistent across participants, an assumption that we verified statistically (see Materials and Methods; Results).
On average, participants detected 51.18% (SD = 0.07%) and 51.84% (SD = 0.04%) of task-relevant tones during selective and divided attention, respectively (Fig. 1B), demonstrating successful titration of individual thresholds (see Materials and Methods).
We did not ask participants to detect irrelevant tones as this would have rendered them task relevant. Instead, we used a neural correlate of the detection of task-relevant tones to infer how strongly individual task-irrelevant tones had been processed. During both attentional conditions, task-relevant tones produced a strong increase in global field power (GFP) if they were detected but not if they were missed (Fig. 1C,D). We therefore used the grand-average evoked GFP as a proxy for tone processing and identified three time lags with local GFP maxima for further analyses (Fig. 1C,D, gray). The time lags for “early,” “medium,” and “late” evoked GFP were 119, 227, and 598 ms for the selective attention condition and 159, 243, and 457 ms for the divided attention condition, respectively. The lags identified resemble those from other related studies (Busch and VanRullen, 2010; Wisniewski et al., 2020). The GFP at each of the three time lags was significantly larger for detected than for missed task-relevant tones during both selective (early: t(28) = 7.81, p < 0.001; medium: t(28) = 7.89, p < 0.001; late: t(28) = 10.67, p < 0.001) and divided attention (early: t(28) = 7.46, p < 0.001; medium: t(28) = 7.22, p < 0.001; late: t(28) = 8.44, p < 0.001), demonstrating the perceptual relevance of this measure of neural processing.
Having identified critical time lags from responses to task-relevant tones, we extracted the GFP—our measure of tone processing—evoked by both task-relevant and task-irrelevant single tones. We then tested how strongly GFP at each of the three lags depends on prestimulus EEG phase in the different conditions (task-relevant vs irrelevant, selective or divided attention). Following previous work (Zoefel et al., 2019; Lui et al., 2023), we fitted a sine function to GFP as a function of EEG phase (Fig. 1E) and used the amplitude of this fit (Fig. 1EII, a) as a measure of phasic modulation strength. Statistical reliability of the phase effects was tested by comparison with a simulated null distribution (as z-score; see Materials and Methods).
In the following, we illustrate results separately for task-relevant (Fig. 2) and task-irrelevant tones (Fig. 3) in the selective attention condition, as well as for the divided attention condition, where only task-relevant tones were present (Fig. 4). We only display results for early and late GFP, as no phasic modulation was found for the medium time lag in any of the conditions.
Results for task-relevant tones in the selective attention condition. The color shows how strongly GFP (A, B) and hit rate (C) depend on EEG phase, expressed relative to a surrogate distribution, and averaged across channels. Time 0 corresponds to tone onset. In A and B, insets show relevant time lags for the analysis (early GFP: +119 ms; late GFP: +598 ms). Time–frequency points “contaminated” by poststimulus data (which is “smeared” into prestimulus phase estimates during spectral analysis) are masked. Prestimulus EEG phase does not predict hit rate or GFP evoked by task-relevant tones in any of the selected time windows.
Results for task-irrelevant tones in the selective attention condition. A, D, Same as Figure 2A,B, but for task-irrelevant tones, and for channels selected for their significant phasic modulation of GFP (p < 0.05 after FDR correction). Black contours show the time–frequency points with significant phase effects. Bold black contours show the cluster with the largest summed z-score. Top insets on the two panels show the topographical maps of z-scores in the corresponding time–frequency clusters. Bottom insets show the 1% voxels with the largest source-projected z-scores in the same clusters. B, E, Distribution of individual phases of the sine function fitted to phase-resolved GFP (Fig. 1EII, p), at the time–frequency–channel combination with strongest phasic modulation (B: 11 Hz, −0.64 s, C5; E: 4 Hz, −0.76 s, FT7). C, F, GFP as a function of EEG phase from the same time–frequency–channel combination. The bold line shows the group-level average, the shaded area shows its standard error. Insets next to the titles show the GFP from Figure 1C with the time windows at which the analysis was performed.
Results for task-relevant tones in the divided attention condition. A, D, As in Figures 2A,B, 3A,D, the color shows how strongly GFP in early (A) and late (D) poststimulus time windows depends on the prestimulus phase of neural oscillations. Thin black contours in A correspond to the time–frequency points with significant phase effects. Inset shows the topographical map of z-scores between 3 and 8 Hz from −0.46 to −0.34 s. B, Distribution of individual differences in the prestimulus phases that evoked maximal GFP (related to p in Fig. 1EII) for low- and high-frequency tones, respectively, at the frequency (3 Hz), time (−0.42 s), and channel (FC4) with the peak effect. C, GFP as a function of EEG phase for low- and high-frequency tones separately.
Neural response evoked by task-irrelevant but not task-relevant tones depends on phase of neural oscillations during selective attention
We found that prestimulus EEG phase did not predict GFP evoked by task-relevant tones at any of the three time lags (all p > 0.05 after FDR correction; Fig. 2A,B). Consistent with this result, the probability of detecting these tones was independent of prestimulus phase (all p > 0.05 after FDR correction; Fig. 2C). In contrast, both early (Fig. 3A–C) and late (Fig. 3D–F) GFP evoked by task-irrelevant tones depended on prestimulus phase.
For the early lag, the phasic modulation was maximal at 10 Hz and 0.8 s preceding tone onset (z = 5.41, FDR-corrected p = 0.003). The EEG phase leading to maximal GFP at that time–frequency point was consistent across participants (Rayleigh's test; z = 6.21, FDR corrected p = 0.006; Fig. 3B). The largest cluster of significant z-scores (FDR-corrected p < 0.05) was identified at ∼10–11 Hz, in the left central channels, and between −0.7 and −0.62 s relative to tone onset (summed z = 63.2, 14 time–frequency–channel points; Fig. 3A). Explorative source localization revealed that the phasic modulation originated from the left superior temporal cortex (Fig. 3A, inset).
For the late lag, the phasic modulation was maximal at 5 Hz and 0.7 s preceding tone onset (z = 5.49, FDR-corrected p = 0.001). The EEG phase leading to maximal GFP at that time–frequency point was also consistent across participants (z = 6.24, FDR corrected p = 0.006; Fig. 3E). The largest cluster of significant z-scores was identified at ∼4–5 Hz and between −0.78 and −0.74 s relative to tone onset (summed z = 47.45, 11 time–frequency–channel points; Fig. 3D). This effect was localized to the right superior frontal gyrus and, to a lesser extent, the right inferior parietal cortex (Fig. 3D, inset).
Contrasting amplitudes of the fitted sine functions between task-relevant and task-irrelevant tones, we found a stronger phasic modulation for the task-irrelevant tones at their relevant time–frequency points (Fig. 3A,D) that concerned both early GFP (z = 4.08, p < 0.001; paired t test) and late GFP (z = 2.92, p = 0.004). However, neither of these outcomes survived correction for multiple comparisons (p > 0.05 after FDR correction).
Together, our results confirm previous findings that the processing of task-relevant auditory information is independent of the phase of neural oscillations (Zoefel and Heil, 2013) and extend them by demonstrating that such a phasic modulation reappears when the information is made irrelevant. Both alpha and theta oscillations, through their correspondence with different stages of neural processing, seem to contribute to rhythmic effects on unattended information during selective attention.
Early but not late response evoked by task-relevant tones depends on phase of neural oscillations during divided attention
In the divided attention condition, only task-relevant tones were present. According to our principal hypothesis, the auditory system should suppress oscillations and instead operate in a continuous mode of processing to avoid a loss of information at the low-excitability phase. However, an alternative possibility is that the presence of multiple target tones requires a rhythmic alternation of attentional focus between these tones as previously demonstrated for the visual system (Fiebelkorn et al., 2013; Helfrich et al., 2018). Such a case would lead to a phasic modulation of tone processing, similarly to what we observed for task-irrelevant tones in the selective attention condition.
Figure 4 shows how strongly the evoked GFP at early (A) and late (D) time lags depended on prestimulus EEG phase in the divided attention condition. We found a phasic modulation of tone processing only for the early time lag. This effect was maximal at 3 Hz and 0.42 s preceding tone onset (z = 5.09, FDR-corrected p = 0.01). However, we could not identify a cluster of significant z-scores, suggesting that these did not group across neighboring electrodes, frequencies, or time points as clearly as in the selective attention condition. EEG phases leading to the strongest early GFP were similar for low- and high-frequency tones (Fig. 4C), supported statistically by a distribution of their phase difference (Fig. 4B) that did not significantly differ from zero (mean angle = 0.23, p = 0.71; circular one-sample test against angle of 0). The probability of detecting tones did not depend on prestimulus phase during divided attention (all p > 0.05 after FDR correction; results for the time–frequency point with strongest effect in Fig. 4A: z = 0.89, p = 0.37).
Together, our results show that a rhythmic mode of processing reappears in the auditory system when confronted with multiple targets, but only affects early stages of target processing. In the presence of two target tones, the frequency of modulation is approximately halved compared with a single tone, and the two target tones have similar preferred EEG phases for their processing. These results speak for a mechanism that processes each of the two tones in consecutive cycles of a faster rhythm, as we explain in the Discussion.
Discussion
The current study aimed to unveil the rhythm of auditory perception during selective and divided attention. To this end, we asked participants to perform a target-in-noise detection task where they had to attend to tones at one sound frequency and ignore another (selective attention) or had to attend to both (divided attention).
In line with previous work (Zoefel and Heil, 2013; VanRullen et al., 2014) and our own hypothesis, we found that neural and behavioral responses to task-relevant tones do not depend on the prestimulus phase of neural oscillations during selective attention. Conversely, early and late neural responses to task-irrelevant tones were modulated by the phase of prestimulus alpha and theta oscillations, respectively. These results demonstrate that while neural oscillations seem to be suppressed during attentive selection of single auditory targets, there exists a rhythmic mode of perception in the auditory system that is applied to unattended sensory information. Finally, we found evidence that this mode is also active when confronted with multiple auditory targets, although restricted to early stages of their processing.
An inattentional rhythm in audition
It is a striking difference between modalities that selective attention increases the effect of neural phase on the processing of temporally unpredictable targets in the visual domain (Busch and VanRullen, 2010) but decreases it in the auditory one (Zoefel and Heil, 2013; current study). Confirming previous speculations (Zoefel and VanRullen, 2017), we here demonstrate that a rhythmic mode of auditory processing is restored when stimuli become irrelevant and information loss is tolerable. This “inattentional rhythm” that seems specific to audition may arise from specific requirements on the auditory system during dynamic stimulus processing.
In contrast to the relatively stable visual environment, auditory inputs are often transient and dynamic. Therefore, periodic sampling of the external environment may be more detrimental for audition when temporally unpredictable information is important for goal-directed behavior. In this case, the auditory system may engage in a desynchronized cortical state in the auditory cortex that is associated with the active processing of incoming sensory inputs (Pachitariu et al., 2015). As much as this “continuous mode” prevents the loss of information by suppressing periodic moments of low excitability, it is likely to be metabolically demanding (Schroeder and Lakatos, 2009). Therefore, the auditory system may limit the use of such a mode to scenarios in which a loss of information is likely (such as the expectation of relevant events whose timing cannot be predicted). This notion can also explain the prevalence of rhythm in acoustic information (music, speech, etc.): If relevant events are presented regularly, then their timing can be predicted and the oscillatory phase adapted accordingly (Lakatos et al., 2008). Such a mechanism would enable a “rhythmic mode” of processing even for task-relevant stimuli.
Based on these results, we propose that—due to its highly dynamic environment—the auditory system always needs to be “one degree more attentive” to sensory information than the visual one. We illustrate this idea in Figure 5A, which can be summarized as follows: In the presence of temporally unpredictable, relevant information, the auditory system needs to operate in a continuous mode of high excitability, whereas the visual system can sample rhythmically, due to its considerably slower input dynamics. A similar rhythmic mode of processing is used in the auditory system to sample unattended input, whereas unattended input is processed in a mode of continuous low sensitivity in the visual case. The latter explains why we observed a phasic modulation of task-irrelevant tones in the current study, in contrast to an absence of such an effect in the visual modality (Busch and VanRullen, 2010). Our model is also supported by the finding that auditory distractors are more distracting than visual distractors (Berti and Schröger, 2001), even when the primary task is in the visual modality (Lui and Wöstmann, 2022). This might be because the auditory system exhibits periodic moments of high sensitivity to distractors and is therefore also more sensitive to potentially threatening stimuli that warrant immediate action.
Figure 5. Hypothetical “modes” of processing that do or do not rely on the phase of neural oscillations during selective (A) and divided attention (B). A, If the timing of relevant events is unknown, the auditory system might need to suppress neural oscillations to avoid a loss of information at the low-excitability phase and operate in a mode of continuous high excitability (continuous purple line), whereas the visual system can operate rhythmically (dashed purple), due to its slower sensory dynamics. If events become irrelevant, the auditory system might change to a mode of periodic high sensitivity, reflected in a rhythmic sampling of irrelevant information (continuous yellow). The visual system might not need these high-sensitivity moments for irrelevant information, resulting in a continuous mode of low excitability (dashed yellow). B, Three hypothetical modes of processing during auditory divided attention. When multiple targets need to be processed, the auditory system might operate in a continuous mode of processing to avoid loss of information at a low-excitability phase (I, left). Such a mode would lead to a detection of these targets that is independent of phase (I, right). Alternatively, the presence of multiple targets might require an alternation of attentional focus between possible sound frequencies that relies on neural phase at the frequency f. This might be achieved by prioritizing different sound frequencies at different neural phases (II, left), leading to a target detection probability that depends on the phase at f and a preferred phase for detection that changes with sound frequency of the target (II, right). In an alternative rhythmic mode, possible sound frequencies are processed at the same (high-excitability) phase of f, but in consecutive cycles (III, left), leading to a phase effect at f/2 and to similar preferred phases across sound frequencies (III, right). The latter is what we have observed in the current study (compare Fig. 4A).
Alpha and theta oscillations modulate distinct processing steps of irrelevant events
We found that the prestimulus phase of alpha oscillations predicts a relatively early response to task-irrelevant tones whereas the prestimulus phase of theta oscillations predicts later responses (Fig. 3). We speculate that this finding can be attributed to distinct steps in the processing of task-irrelevant events that depend on different oscillatory frequency bands.
The phase of alpha oscillations is posited to gate perception via pulsed inhibition (Jensen and Mazaheri, 2010) at early stages of cortical processing where the encoding of sensory events takes place (Klimesch et al., 2011). Indeed, the phasic modulation of the early evoked response in the alpha band seemed to originate from relatively early stages of the auditory cortical hierarchy in our study (Fig. 3B). The timing of the early evoked GFP (∼119 ms) is well in line with components of stimulus-evoked neural responses (e.g., P1, N1) that have been associated with stimulus encoding (Näätänen and Picton, 1987). Although imaging methods with higher spatial resolution are required to validate this hypothesis, we speculate that alpha oscillations phasically modulate the encoding of task-irrelevant events (e.g., distractors).
Stimulus-evoked neural responses at later delays have been associated with higher-level cognitive operations, such as distractibility (Chao and Knight, 1995) as well as response execution and inhibition (Bokura et al., 2001). Theta oscillations in the frontal cortex have been considered a neural proxy of executive control (Mizuhara and Yamaguchi, 2007; Sauseng et al., 2007). A previous study provided evidence for a theta rhythm in distractibility by demonstrating that perceptual sensitivity depends on predistractor theta phase (Lui et al., 2023). It is thus possible that the propensity to ignore task-irrelevant events depends on prestimulus theta oscillations. The later timing of the theta-phase modulation in our study as well as its localization to more frontal brain regions is in line with this assumption (Fig. 3D). This effect may therefore reflect the inhibition of the processing of task-irrelevant events that occurs after their encoding. The fact that during divided attention only the early phasic effect was present and the later theta-phase modulation was absent (Fig. 4) further supports this assumption, as no distractors needed to be inhibited in that condition.
It remains an open question why the strongest phase effect occurred relatively early before tone onset (∼−800 to −600 ms) and earlier than what has previously been reported (Busch and VanRullen, 2010; Harris et al., 2018; Zazio et al., 2021). On the one hand, the closer to stimulus onset, the stronger is the “contamination” of phase estimates by poststimulus data (Zoefel and Heil, 2013; Vinao-Carl et al., 2024), potentially obscuring maxima closer to tone onset. On the other hand, the earliest time points that remain unaffected by temporal smearing can be estimated precisely and do not show the strongest effects (Figs. 2–4). Other factors might therefore play a role and need to be identified in future work. For example, it is possible that the perception and suppression of task-irrelevant auditory events are achieved through connectivity with other brain regions that eventually cascades down to the auditory system at stimulus onset.
A rhythmic mode in auditory divided attention
We found evidence for a rhythmic mode of processing during auditory divided attention, and our results provide insights into a mechanistic implementation of such a mode. The phasic modulation of the early GFP evoked by the two tones (Fig. 4A) contradicts our initial hypothesis that neural oscillations are suppressed during divided attention to task-relevant tones (Fig. 5BI). Nevertheless, the two tones (low and high sound frequency) could be processed in the same oscillatory cycle but at different phases (Fig. 5BII), as often proposed in the context of neural oscillations (Jensen et al., 2014; Gips et al., 2016), or in consecutive cycles (Gaillard and Ben Hamed, 2022) and at a similar phase (Fig. 5BIII). Based on predicted result patterns that can distinguish these alternatives (Fig. 5B, right panels), our results favor the latter alternative (Fig. 5BIII), as (1) the frequency of the early modulation is divided approximately by two as compared with the processing of a single tone (compare Figs. 3A, 4A) and (2) phases do not differ between the low- and high-frequency tones (Fig. 4B,C). Therefore, our results suggest that alpha oscillations modulate not only the processing of task-irrelevant information but also the early stages of task-relevant processing during divided attention, alternating between possible sound frequencies of targets.
This conclusion is well in line with previous research. For instance, the frequency of visual perception decreases with increasing number of to-be-attended features (Holcombe and Chen, 2013; Schmid et al., 2022). The spotlight of attention has been posited to alternate between two locations when both are attended, dividing an overall ∼8 Hz rhythm into a ∼4 Hz fluctuation in perceptual sensitivity per location (Landau and Fries, 2012; Song et al., 2014; Zoefel and Sokoliuk, 2014). In the auditory modality, a similar alternation between the two ears has been reported during divided attention (Ho et al., 2017). We here extend this mechanism to an alternation between sound frequencies, supporting the previous observation that oscillatory mechanisms follow the tonotopic organization of the auditory cortex (Lakatos et al., 2013; L’Hermite and Zoefel, 2023).
Conclusion
By showing that the processing of task-irrelevant but not task-relevant tones depends on the prestimulus phase of neural oscillations during selective attention, we here provide evidence that oscillatory mechanisms in audition critically depend on the degree of possible information loss. We propose that this effect represents a crucial difference from the visual modality, which might not need to be equally responsive to sensory information (Fig. 5). During divided attention, cycles of alpha oscillations seem to alternate between possible targets, similar to what was observed in vision, suggesting an attentional process that generalizes across modalities.
Footnotes
This work was supported by a grant from the Agence Nationale de la Recherche (ANR-21-CE37-0002). We thank Quentin Busson for help with data collection.
The authors declare no competing financial interests.
Correspondence should be addressed to Troby Ka-Yan Lui at trobylui@gmail.com or Benedikt Zoefel at benedikt.zoefel@cnrs.fr.