Abstract
Rhythmic entrainment echoes, rhythmic brain responses that outlast rhythmic stimulation, can demonstrate endogenous neural oscillations entrained by the stimulus rhythm. Here, we tested for such echoes in auditory perception. Participants detected a pure tone target, presented at a variable delay after another pure tone that was rhythmically modulated in amplitude. In four experiments involving 154 human (female and male) participants, we tested (1) which stimulus rate produces the strongest entrainment echo and, inspired by the tonotopical organization of the auditory system and findings in nonhuman primates, (2) whether these are organized according to sound frequency. We found the strongest entrainment echoes after 6 and 8 Hz stimulation, respectively. The best moments for target detection (in phase or antiphase with the preceding rhythm) depended on whether sound frequencies of entraining and target stimuli matched, which is in line with a tonotopical organization. However, for the same experimental condition, best moments were not always consistent across experiments. We provide a speculative explanation for these differences that relies on the notion that neural entrainment and repetition-related adaptation might exert competing, opposite influences on perception. Together, we find rhythmic echoes in auditory perception that seem more complex than those predicted from initial theories of neural entrainment.
SIGNIFICANCE STATEMENT Rhythmic entrainment echoes are rhythmic brain responses that are produced by a rhythmic stimulus and persist after its offset. These echoes play an important role for the identification of endogenous brain oscillations, entrained by rhythmic stimulation, and give us insights into whether and how participants predict the timing of events. In four independent experiments involving >150 participants, we examined entrainment echoes in auditory perception. We found that entrainment echoes have a preferred rate (between 6 and 8 Hz) and seem to follow the tonotopic organization of the auditory system. Although speculative, we also found evidence that several, potentially competing processes might interact to produce such echoes, a notion that might need to be considered for future experimental design.
Introduction
Rhythmic stimulation, whether sensory or electrical, produces rhythmic patterns in neural and perceptual measures that are synchronized with the rhythm of stimulation (Walter and Walter, 1949; Lakatos et al., 2008; Fröhlich and McCormick, 2010). This effect is often called neural entrainment (Lakatos et al., 2019; Obleser and Kayser, 2019) and is assumed to involve neural oscillations, that is, brain activity that is endogenously rhythmic (Lakatos et al., 2008). This assumption is difficult to verify during stimulation, as any rhythmicity in neural or perceptual responses can arise from the rhythmicity of the stimulus without involving endogenous neural oscillations (Keitel et al., 2014; Zoefel et al., 2018; Helfrich et al., 2019).
Rhythmic entrainment echoes are rhythmic brain responses that are produced by a rhythmic stimulus and persist after its offset. Endogenous oscillations should linger for some time after having been entrained, similar to a swing that has been pushed, whereas other evoked brain activity will disappear rapidly when no stimulus is present (Thut et al., 2011). Rhythmic entrainment echoes therefore play a crucial role in distinguishing entrained neural oscillations from other brain responses that are not endogenously rhythmic.
Several studies have reported rhythmic entrainment echoes (sometimes called forward entrainment; Saberi and Hickok, 2022a). These have been observed after visual flicker (Spaak et al., 2014) and speech sounds (Kösem et al., 2018; van Bree et al., 2021) in human neurophysiological recordings, after regular tone sequences in the auditory cortex of nonhuman primates (Lakatos et al., 2013), and even after transcranial alternating current stimulation in speech perception (van Bree et al., 2021).
Here, we examined rhythmic entrainment echoes in human auditory perception. We presented participants with a rhythmically amplitude-modulated (AM) pure tone, followed by a target tone that they were asked to detect (Fig. 1). We hypothesized that the AM tone would entrain oscillations and produce an entrainment echo. The target was presented at different delays relative to the AM tone and was thus used to probe this echo.
In a similar paradigm, Hickok et al. (2015) showed that AM noise, presented at 3 Hz, produces a 3 Hz oscillation in the detection of a subsequently presented target tone. In that study, targets were most likely to be detected when presented in antiphase with the preceding AM noise. We followed up on this finding, guided by two principal questions. First, what is the preferred rate (eigenfrequency) of oscillations in audition? Stimulus rates that are closest to the natural frequency of neural oscillations should produce the strongest entrained oscillations (Fröhlich, 2015) and, consequently, the strongest entrainment echoes. A single previous study (Farahbod et al., 2020) reported that these echoes are strongest for relatively slow rates (∼2–3 Hz), but sample size was low (N = 3–5). We varied the rate of the AM tone and tested which of those rates produces the strongest oscillation in subsequent target detection.
Second, are entrainment echoes tonotopically organized? Previous work in nonhuman primates (Lakatos et al., 2013; O’Connell et al., 2014) showed that rhythmically presented pure tones with a sound frequency f entrain neural activity in most of primary auditory cortex (A1). However, neuronal ensembles in parts of A1 that are tuned to (i.e. respond most strongly to) f aligned their high-excitability phase to the tones, whereas those in other parts aligned the opposite low-excitability phase. This suggests that neural entrainment, and consequently entrainment echoes, are organized according to sound frequency. We independently varied the sound frequency of AM tone (
Materials and Methods
Participants
We tested for entrainment echoes in four independent experiments. Experiment 1 was run in the laboratory. Sixteen participants (9 female; mean, 26.2 years; range, 23–34 years) completed the experiment after giving written informed consent.
Experiments 2–4 were run online in the chronological order in which they are described here. Fifty-five, 59, and 50 participants, respectively, were recruited from Prolific (https://www.prolific.co; cf. Peer et al., 2016) for those three experiments and gave informed consent by clicking on a button to confirm they wanted to participate. Eight, 10, and 8 participants, respectively, were excluded, either for failing a test designed to ensure they were wearing headphones or because they did not respond accurately during practice trials (see below, Experimental design). Consequently, data from 47 (10 female; mean, 29.1 years; range, 19–46 years), 49 (22 female; mean, 32.8 years; range, 19–46 years), and 42 (15 female; mean, 28.7 years; range, 19–44 years) participants were included in subsequent analyses for experiments 2, 3, and 4, respectively. This study was approved by the Comité de Protection des Personnes Ouest II Angers (protocol number 2021-A00131-40).
Stimuli
In all experiments, participants were presented with AM pure tones at a certain rate, followed by a target tone they were asked to detect. The target tone was present in 50% (experiment 1) or 33.33% (experiments 2–4) of the trials, and its level was adapted to an individual threshold (see below, Experimental design). The duration of the AM tone was always five cycles of the presentation rate, whereas the duration of the target tone was 0.25 cycles of the presentation rate. The delay between the two was defined as the onset of the target tone relative to the final peak of the AM tone (Fig. 1). This delay was variable and the critical experimental manipulation to reveal entrainment echoes. If such echoes existed, then the probability of target detection should depend on the time of target presentation relative to the offset of the AM tone. Several acoustic parameters differed between experiments and are summarized in Table 1. These are the rate of the AM tone, the sound frequencies of both AM tone (
In experiment 1, we tested which of five different rates produces the strongest entrainment echo. We only used a single combination of sound frequencies
For experiment 2, we therefore restricted the rate of AM tones to 6 Hz and used longer delays. We used different combinations of sound frequencies (divided into the two categories
Whereas
In experiment 1, the rates tested for the AM tone covered a relatively wide but coarsely sampled range (Table 1). In experiment 4, we again varied AM rate, but in a narrower range and with a finer resolution, centered around the one that produced the strongest echo in experiment 1 (6 Hz). This manipulation had several purposes. It allowed us (1) to estimate the preferred rate for entrainment echoes with a finer resolution; (2) to vary rate and sound frequencies (
Experimental design
The participants’ task was fairly simple and similar across experiments. They were asked to indicate with a button press whether they perceived a target tone. In experiments 1 and 3, this corresponded to a simple yes/no (forced) choice. In experiments 2 and 4, they had to choose between high pitch, low pitch, and no target present. This was done to obtain false alarms that can be interpreted more easily but are not feasible in the other experiments as these included experimental blocks with only one possible
Experiments 2–4 were conducted over the Internet using the jsPsych JavaScript library (de Leeuw, 2015) and Cognition experiment management software. Mobile phones and tablets were ineligible, which was ensured with a JavaScript-based device check. Participants were asked to complete the experiment in a quiet room and wear headphones.
All experiments began with a calibration sound (pure tone) to verify that the audio could be heard clearly and to allow participants to adjust their volume to a comfortable level. Participants were instructed not to adjust the volume on their computer for the remainder of the experiment.
For online experiments only, this was followed by a test designed by Woods et al. (2017) to ensure that participants were wearing headphones. This test can be easily completed when wearing headphones but not otherwise. A detailed description can be found in Zoefel et al. (2023).
In all experiments, participants then received detailed instructions about the task and listened to example sounds. They were asked to complete practice trials in which the target was clearly audible. Participants received feedback on whether they responded correctly after each practice trial. In online experiments, participants were only able to continue with the main part of the experiment if they responded correctly in at least four of five practice trials. In case of failure, they were allowed to repeat practice once. In the lab experiment, all participants completed practice successfully.
Subsequently, the level of the target tone was adjusted to individual participants’ detection thresholds. Participants were presented with the same stimuli that were used for the main experiment, with a randomly chosen delay between AM tone and target tone in each trial. The level of the target tone decreased or increased in each trial. Participants were instructed to press a button when they could no longer hear the tone (for the decreasing level sequences) and when they started to hear it again (for the increasing level sequences). Decreasing and increasing level sequences were used in alternation. Participants’ detection threshold was defined as the average level of the target tone during the last four button presses. This adaptation procedure was run separately for each rate (experiments 1 and 4) and for the combination of
Finally, participants completed the main experimental task, as described above. Participants completed 1440 trials in experiment 1, 4720 trials in experiment 2, and 1080 trials in experiment 3. Because of an unequal number of conditions tested, experiments also differed in the number of targets per condition and delay. These are shown in column N (targets) in Table 1. They correspond to the number of trials that contributed to each data point shown in the results (Figs. 3, 5, 6, 8). In each trial, (1) the level of the target tone (−2, 0, or +2 dB relative to the individual threshold; in experiment 4, only 0 dB was used), (2) the delay between AM tone and target tone, (3) the rate of the AM tone (only in experiments 1 and 4), and (4) the combination of
Statistical analyses
For each participant, delay, and experimental condition (e.g., AM rate), we computed the proportion of correctly detected targets (
We tested for entrainment echoes in sliding windows each consisting of four delays (i.e., one full cycle), and using a step size of one delay (i.e., one-fourth cycle). These four delays corresponded to the peak, trough, and the two zero crossings of the preceding AM rhythm (Fig. 1). This allowed us to estimate entrainment echoes with a finer temporal resolution compared with an approach that combines all delays tested into a single estimate.
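As an illustration only, the windowing described above can be sketched in Python. The function name, the zero-based delay indices, and the example of eight delays are assumptions introduced here, not details taken from the analysis code used in the study.

```python
def sliding_windows(n_delays, window=4, step=1):
    """Return the delay indices of every complete analysis window.

    window=4 corresponds to one full cycle sampled at quarter-cycle spacing;
    step=1 shifts the window by one delay (one-fourth cycle).
    """
    last_start = n_delays - window
    return [list(range(start, start + window))
            for start in range(0, last_start + 1, step)]


# Hypothetical example: eight delays, i.e., two cycles at quarter-cycle spacing.
windows = sliding_windows(8)
for indices in windows:
    print(indices)
```

Each printed list holds the delays that enter one peak/trough and zero-crossing comparison; which index falls on the peak depends on the phase at which the window starts.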
Our hypothesis makes clear predictions about the best and worst moments for target detection (Fig. 2B). In particular, for
We used two additional analytical steps to rule out other factors that could have produced the expected peak/trough performance difference without the cyclic pattern of performance (Fig. 2B) that characterizes entrainment echoes. First, we corrected for linear changes in detection performance over time by removing a linear fit from each cycle (i.e., separately for each analysis window) before computing performance differences described above. Second, we used the fact that apart from the positive or negative difference in performance at peak and trough (Fig. 2C, orange and black) our hypothesis predicts a near-zero difference between the two zero crossings (Fig. 2C, green). We tested for this by contrasting (via paired t tests), after correction for linear trends, the two corresponding differences (peak/trough vs zero crossings). As performance should be similar for both zero crossings, the order of the two performance values (first minus second zero crossing or vice versa) was chosen randomly.
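The two control steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names, the position of peak, trough, and zero crossings within a window, and the `flip` argument (standing in for the random ordering of the two zero crossings) are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def detrend(values):
    """Subtract the least-squares line from one window of performance values."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values)) / sxx
    return [y - (mean_y + slope * (x - mean_x)) for x, y in enumerate(values)]


def window_contrasts(values, peak=0, trough=2, zeros=(1, 3), flip=False):
    """Return (peak minus trough, zero-crossing difference) after detrending.

    The index positions are assumptions; they depend on where the window starts.
    """
    resid = detrend(values)
    peak_trough = resid[peak] - resid[trough]
    zero_diff = resid[zeros[0]] - resid[zeros[1]]
    return peak_trough, (-zero_diff if flip else zero_diff)


def paired_t(a, b):
    """Paired t statistic (df = n - 1) for two equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

In use, `window_contrasts` would be applied per participant and window, and `paired_t` to the two resulting lists of differences across participants. Adding any linear trend to `values` leaves both contrasts unchanged, which is the purpose of the detrending step.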
We here note that our assumption that the highest
For each participant and condition, we also determined the proportion of false alarms (
Note that
Results
In four experiments, we tested whether the detection of a target tone fluctuates rhythmically at the rate of a preceding AM tone (Fig. 1), revealing entrainment echoes in auditory perception.
The simplest version of the neural entrainment theory predicts best perception in phase with a rhythmic stimulus as high-excitability moments of the oscillatory cycle synchronize with the expected timing of upcoming events (Lakatos et al., 2008; Schroeder and Lakatos, 2009). Studies on nonhuman primates in primary auditory cortex (Lakatos et al., 2013; O’Connell et al., 2014) confirmed this assumption when neural activity was measured in areas tuned to the sound frequency of the entraining (rhythmic) stimulus. However, the opposite effect seems to occur (high excitability in antiphase with the rhythmic stimulus) for other sound frequencies (Fig. 2A).
We quantified rhythmic entrainment echoes by testing two predictions that follow from these previous findings and from the expected cyclic shape of the perceptual modulation (Fig. 2B). We first computed proportions of detected targets that were presented in phase and in antiphase with the preceding AM tone, respectively (i.e., at its peak vs trough, had it continued). The difference between in- and antiphase target detection (peak/trough difference) should differ from zero (Fig. 2C), whereas the sign of the difference reflects the phase of the echo (positive and negative for in-phase and antiphase entrainment echoes, respectively). We then compared this difference (peak/trough; Fig. 2C, orange and black) with the corresponding difference between two zero crossings of the putative rhythmic echo, which we predicted to be near zero (Fig. 2C, green).
In a first lab experiment, we tested which stimulus rate produces the strongest entrainment echoes. In three follow-up online experiments, we tested the hypothesis that entrainment echoes are frequency specific (Fig. 2), leading to simultaneous best and worst moments for the detection of a target, depending on whether its sound frequency matches or differs from that of the entraining stimulus (AM tone). In experiment 4, we also estimated the preferred stimulus rate for entrainment echoes with a finer resolution.
Experiment 1 (N = 16, lab experiment)
In experiment 1, we varied the rate of the amplitude modulation of the entraining tone. The same sound frequencies were used in each trial and differed between AM tone
On average, participants detected 50.0% (±13.1% SD) of the targets and made 4.6% (±3.4%) false alarms (resulting in an average d′ of 1.88 ± 0.37). There was no difference in the proportion of detected targets across rates (F(4,60) = 1.24, p = 0.30; repeated-measures ANOVA), indicating that target levels were successfully adapted to individual detection thresholds for all rates (see above, Experimental design). There was, however, a difference in false alarm probability (F(4,60) = 3.04, p = 0.02), with fewer false alarms after 10 Hz AM tones than after 24 and 40 Hz AM tones. (All other post hoc comparisons were nonsignificant.) The d′ measures did not differ across rates (F(4,60) = 0.52, p = 0.72).
For each rate, we then tested for rhythmic entrainment echoes in different temporal windows, each one cycle long, after the offset of the AM tone (see above, Statistical analyses). We found an entrainment echo, exclusively after 6 Hz stimulation and in the last cycle tested. Figure 3A shows the relevant statistical comparisons for this cycle, whereas Figure 3B shows perceptual outcomes for all delays after the 6 Hz AM tone. The 6 Hz entrainment echo (Fig. 3B, continuous line) was illustrated by a negative difference between peak and trough performance that was reliably different from zero [t(15) = 4.43; false discovery rate (FDR) corrected, p = 0.006; effect size, Cohen’s d = 1.11]. This peak/trough difference (Fig. 3A, black) was also significantly different (t(15) = 4.82; FDR corrected, p = 0.006; Cohen’s d = 1.20) from the difference between zero crossings (Fig. 3A, blue), as predicted from a cyclic pattern in perceptual outcomes (see above, Statistical analyses; Fig. 2C). Importantly, the highest number of targets were detected in antiphase with the preceding AM tone (Fig. 3B). This is predicted from tonotopic entrainment as
This result was confirmed by contrasting differences in peak/trough performance across conditions (Fig. 3A, black). Most importantly, we found an interaction of rate and delay (F(16,240) = 2.22, p = 0.005), reflecting echoes that are only present for certain rates (6 Hz) and for later delays tested.
Performance measures like d′ are a more complete indicator of participants’ perceptual sensitivity as they combine the proportion of correctly detected targets (hits) with that of target-absent trials that were incorrectly labeled as target present (false alarms). In the current paradigm, however, false alarms are not defined for individual delays as no target is present. Nevertheless, Saberi and Hickok (2022b) showed that entrainment echoes in hits and d′ can differ, even if the same (average) proportion of false alarms is used for each delay. Results for entrainment echoes in d′ are shown in Figure 4A. Results were very similar to those for hits, with an antiphase entrainment echo only after 6 Hz stimulation and in later delays tested (FDR-corrected p values < 0.01).
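For illustration, d′ per delay can be computed from delay-specific hit rates and a single average false alarm rate, as described above. The sketch below assumes the standard equal-variance signal detection definition, d′ = z(hits) − z(false alarms); the function names, the clipping bounds, and the example values are assumptions added here and are not taken from the study.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def clip_rate(p, lo=0.01, hi=0.99):
    """Keep a proportion away from 0 and 1, where z is undefined.

    The bounds are an arbitrary choice made for this sketch.
    """
    return min(max(p, lo), hi)


def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance d′ from a hit rate and a false alarm rate."""
    return _z(clip_rate(hit_rate)) - _z(clip_rate(false_alarm_rate))


# Hypothetical hit rates at four delays; one average false alarm rate for all.
hits_by_delay = [0.45, 0.52, 0.58, 0.50]
average_false_alarms = 0.046
d_by_delay = [d_prime(h, average_false_alarms) for h in hits_by_delay]
```

Because z is a nonlinear transform, equal steps in hit rate do not map onto equal steps in d′, so echoes measured in hits and in d′ need not be identical even with a constant false alarm rate.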
Experiment 2 (N = 47, online experiment)
Based on results from experiment 1, we fixed the rate of the AM tone to 6 Hz. As the entrainment echo was most apparent in the second cycle tested (Fig. 3B), we delayed possible presentation times of the target by half a cycle (Table 1). The sound frequency of the AM tone (
On average, participants detected 61.4% (±14.0%) of the targets and made 1.6% (±2.5%) false alarms (resulting in an average d′ of 2.81 ± 0.66). There was no significant difference in target detection across conditions (
Figure 5A shows how target detection fluctuated after the offset of the rhythmic AM stimulus, separately for
An ANOVA on peak/trough differences revealed a main effect of temporal window (F(4,184) = 2.93, p = 0.02), driven by a change in maximal detection from in phase to antiphase (Fig. 5B, compare the two temporal windows). However, there was no main effect of condition (
It is of note that (1) maximal detection for
Experiment 3 (N = 49, online experiment)
Experiments 1 and 2 differed in the predictability of sound frequencies;
Overall performance followed a similar pattern to that observed in experiment 2. Participants detected 63.0 ± 14.5% of the targets, made 3.0 ± 3.2% false alarms, leading to a d′ of 2.51 ± 0.67. Conditions did not differ in the proportion of detected targets (
Figure 6, A and B, shows main results from experiment 3, reminiscent of those obtained in experiment 2 (Fig. 5). Again, the proportion of detected targets peaked approximately in phase with the preceding AM tone for
Despite some statistically reliable results, entrainment echoes were relatively weak in experiments 2 and 3. Given similar results in the two experiments, we therefore pooled their subjects (ignoring the differences in predictability of sound frequencies for this analysis). Results are shown in Figure 6C, confirming peaks in the proportion of detected targets when they were presented in phase
We also used the pooled dataset to test for interindividual differences in entrainment echoes. In particular, previous research suggested two distinct groups of participants that do or do not spontaneously entrain motor output to acoustically presented speech, resulting in a bimodal distribution of audiomotor synchronization (Assaneo et al., 2019). As shown in Figure 7A, we did not find such a bimodal distribution in entrainment echoes (here expressed as peak/trough performance difference) in auditory perception. It is possible that the bimodal distribution is specific to audiomotor synchronization, leading to a more normal distribution in the current study as no such synchronization was required for the task.
Experiment 4 (N = 42, online experiment)
Results from experiment 3 showed that a difference in the predictability of sound frequency cannot explain the opposite phases of entrainment echoes in experiment 1 versus experiments 2 and 3. In experiment 4, we tested an alternative explanation for this effect. A repeated presentation of a given stimulus, such as the pure tones used here, leads to a progressive reduction of neural responses to this stimulus if it occurs at the expected time (Lange, 2009; Costa-Faidella et al., 2011; Herrmann et al., 2013), whereas any deviance in time or identity (e.g., sound frequency) produces a stronger response and possibly enhanced detection (Ulanovsky et al., 2003; Khouri and Nelken, 2015; see below, Discussion). In our case, an enhanced detection of unexpected events would be visible as the highest proportion of detected targets in antiphase with the preceding rhythm for
Importantly, in experiment 1, but not in experiments 2 and 3, the rate of the entraining AM tone varied across trials. It is possible that changes in rate prevented a repetition-related suppression of responses, leading to perceptual outcomes that are instead dominated by entrainment effects. We tested this hypothesis in experiment 4. Again, we varied the AM rate across single trials. Instead of using a wide range of rates as for experiment 1, we used smaller steps around the rate that turned out to be optimal in experiment 1 (4–8 Hz, in steps of 1 Hz). This also allowed us to test for preferred rates for entrainment echoes with a higher resolution.
Participants detected 58.3 ± 14.6% of the targets, made 1.0 ± 1.1% false alarms, leading to a d′ of 2.70 ± 0.48. There was a main effect of rate on target detection (F(4,164) = 3.92, p = 0.005, repeated-measures ANOVA) that was driven by more detected targets after 4 and 5 Hz AM tones (59.7 and 59.6%, respectively) than after 8 Hz tones (56.6%). There was also a main effect of condition (F(1,41) = 14.90, p < 0.001), with more targets detected for
Figure 8 shows main results from experiment 4. We again found an entrainment echo, this time exclusively after 8 Hz stimulation and for
We also examined interindividual differences in preferred rates for entrainment echoes. Figure 7B shows the distribution of these preferred rates, quantified as the rate leading to the largest peak/trough difference in performance for individual subjects, measured in the cycle with the strongest average effect (Fig. 8B, continuous line) and for
Discussion
Summary: rhythmic entrainment echoes in auditory perception
Rhythmic brain responses that outlast rhythmic stimulation, termed rhythmic entrainment echoes (Hanslmayr et al., 2014; van Bree et al., 2021) or forward entrainment (Saberi and Hickok, 2022a), not only play an important role in demonstrating the involvement of endogenous brain oscillations (Thut et al., 2011; Zoefel et al., 2018) but can also give us insights into whether and how participants predict the timing of events (Saberi and Hickok, 2022a), a fundamental notion for the fields of neural entrainment (Lakatos et al., 2019; Obleser and Kayser, 2019) and temporal attending (Large and Jones, 1999; Bauer et al., 2015).
Here, in four independent experiments, we examined entrainment echoes in auditory perception. Specifically, we asked (1) which stimulus rate leads to the strongest echoes in the detection of a subsequent auditory target and (2) whether these effects are organized tonotopically (Fig. 2). For the latter, we hypothesized that pure tone targets are detected best if they are presented in phase with a preceding entraining stimulus at the same sound frequency, whereas detection is most likely in antiphase when sound frequencies of target and entrainer differ. Notably, some previous studies reported peaks in performance or neural activity in phase with a preceding rhythmic stimulus (Jones et al., 2002; de Graaf et al., 2013), whereas for others they occurred in antiphase (Spaak et al., 2014; Hickok et al., 2015). These results seemed to support our hypothesis, but it remained to be tested whether sound frequency is indeed a decisive factor for these seemingly opposing findings.
Indeed, not only did we find fluctuations in target detection that depended on the rhythm of the preceding AM stimulus, supporting the existence of entrainment echoes, these also changed their phase depending on whether sound frequencies of target and entraining stimulus matched. Surprisingly, however, for the same experimental condition (
Entrainment versus repetition-related adaptation—two opposing processes?
Although the following does entail some speculation, differences between the four experiments conducted here can provide us with some clues about the origins of these seemingly opposite patterns in auditory perception. Such opposite findings (e.g., in-phase vs antiphase echoes) were most prominent between experiments 1 and 2–3 (antiphase vs in-phase echo for
A regular repetition of a stimulus often leads to a progressive reduction of neural responses to that stimulus. Importantly, such habituation or adaptation effects are temporally and spectrally specific: Neural activity is suppressed only in response to the expected stimulus and at the expected moment in time (Ulanovsky et al., 2003; Micheyl et al., 2005; Costa-Faidella et al., 2011; Herrmann et al., 2013). It is likely that such effects stem from circuits designed to detect novel information and therefore deviants in both timing and identity (Ulanovsky et al., 2003; Khouri and Nelken, 2015). In our case, highest sensitivity to novel information would predict the highest number of detected targets in phase and in antiphase with the preceding AM tone for
If a change in stimulus rate is sufficient to prevent repetition-related adaptation, then this might explain why we found results to be consistent with neural entrainment only in experiments 1 and 4 (although some remained inconclusive;
Online experiments enabled us to collect data from a high number of subjects but ruled out simultaneous neural recordings. Nevertheless, our results and their speculative interpretation yield precise hypotheses to be tested in future electro/neurophysiological experiments. (1) Repetition-related adaptation during rhythmic acoustic stimulation is stronger (i.e., the amplitude of stimulus-aligned activity is smaller) when stimulus rate is constant across trials than when it is not. Adaptation might be weakest at the start of the experiment and might increase over time, although the impact of participants’ expectation also needs to be considered and tested (e.g., responses might habituate as soon as a certain rate is expected). Other means of preventing adaptation (e.g., a more variable, jittered stimulus rhythm) can also be tested. (2) The phase of stimulus-aligned activity is shifted when comparing scenarios assumed to favor repetition-related adaptation (constant rate) and entrainment (variable rate or jitter), respectively. This phase shift goes along with changes in preferred moments for target detection as observed here. A change in phase is also expected after the offset of the rhythmic stimulus (i.e., in the rhythmic entrainment echo). (3) It is possible that repetition-related adaptation and entrainment are driven by different neural sources, and the spatial resolution of conventional EEG might not suffice to distinguish them. Collection of MEG data would be preferable for this purpose; separating activity from distinct neural sources with potentially differing properties (e.g., eigenfrequency) might also reconcile other inconsistencies or individual differences in our behavioral data (e.g., slight differences in preferred rate in experiments 1 and 4). Assuming distinct sources exist, their relative strength would depend on the experimental scenario (favoring adaptation or entrainment, respectively) and their phase lag would differ. 
In addition, intracranial recordings would allow us to test the tonotopic organization we observed in perceptual data and that neither EEG nor MEG can reveal.
If confirmed, these hypothesized findings would have important consequences for future experimental design as mixing two counteracting processes might lead to false-negative or conflicting outcomes. It might also explain why previous results on rhythmic facilitation of perception and attention (Haegens and Zion Golumbic, 2018) have not always been unambiguous or straightforward to interpret (Bauer et al., 2015; Haegens and Zion Golumbic, 2018; Barne et al., 2022; Vilà-Balló et al., 2022), leading to a debate on the concept of entrainment echoes (Lin et al., 2022; Saberi and Hickok, 2022b,c; Sun et al., 2022). Intriguingly, opposite perceptual effects have also been observed in other research fields that examine consequences of stimulus repetition (including forward masking and repetition suppression) and that have reported both beneficial and detrimental effects on perception or behavior (Jesteadt et al., 1982; Ulanovsky et al., 2003; Lange, 2009; Todorovic and de Lange, 2012; Segaert et al., 2013; Sohoglu and Chait, 2016; Saberi and Hickok, 2022a).
Preferred rates for rhythmic entrainment echoes
Preferred rates (eigenfrequencies) for auditory neural or perceptual responses have been described before (for review, see Zoefel and Kösem, 2022). Humans are most sensitive to acoustic spectrotemporal modulations between ∼2 and 5 Hz (Edwards and Chang, 2013). Neural activity measured with electroencephalography (EEG) or magnetoencephalography (MEG) follows such modulations most reliably if they occur in the theta (∼4–7 Hz) or gamma (∼30–45 Hz) range but not in between (Teng et al., 2017; Teng and Poeppel, 2020). It has been speculated that these eigenfrequency ranges reflect the specialization of the auditory system to process speech (Poeppel and Assaneo, 2020; Zoefel and Kösem, 2022), which contains amplitude modulations and changes in linguistic patterns at similar rates (Ding et al., 2017). However, most studies have almost exclusively focused on responses during the rhythmic stimulus, when endogenous oscillations are difficult to identify (Keitel et al., 2014). Indeed, the preferred theta and gamma ranges might simply produce the strongest neural responses because they correspond to stimulus rates that lead to maximally overlapping evoked responses at different cortical levels (Edwards and Chang, 2013). It remained, therefore, unclear whether the same preferred rates apply for entrainment echoes, which can be more reliably interpreted as a product of entrained endogenous oscillations.
We here identified the preferred rate for rhythmic entrainment echoes as 6 Hz (in experiment 1, using a wider but coarser range of rates; Fig. 3A) and 8 Hz (in experiment 4, using a narrower but finer range), respectively. These rates are in line with findings by Ho et al. (2017), who found that the onset of broadband noise produces rhythmic fluctuations in auditory perception at similar and equally variable frequencies (6–8 Hz). Additional research is required to understand why 6 Hz stimulation did not produce a reliable echo on the group level in experiment 4. Nonetheless, both 6 and 8 Hz are within the dominant AM range of human speech, which, despite a peak at 4–5 Hz, contains significant energy at 6–8 Hz (Ding et al., 2017; Varnet et al., 2017). Moreover, we found interindividual differences in the preferred rate, with an equal number of subjects preferring 5 and 8 Hz in experiment 4 (Fig. 7). Our findings are therefore compatible with the auditory system being tuned to process speech. The fact that eigenfrequencies are located at the upper limit of this range is in line with the suggestion that they decrease along the auditory hierarchy. As we used tones, not speech, to entrain neural oscillations, perceptual outcomes might have been determined by activity at an earlier stage of cortical processing, where eigenfrequencies are likely to be higher than in regions processing more complex linguistic structures (Giraud et al., 2000; Edwards and Chang, 2013; Zoefel and Kösem, 2022).
We found preferred rates for rhythmic entrainment echoes that differ from those described by Farahbod et al. (2020) at ∼2 Hz. However, in their study only a few (three to five) subjects were tested. These subjects were tested extensively, each completing several thousand trials. This allowed the authors to estimate individual responses very reliably; however, the low number of subjects makes it difficult to draw conclusions on the population level. It is possible that the selected subjects had unusually low preferred rates. How strongly those rates vary on the interindividual level remains to be investigated. Finally, we add that other studies have reported low natural rates (1–2 Hz) for audition, but these were often linked to beat perception and audiomotor interactions (Zalta et al., 2020; Weineck et al., 2022).
Limitations, open questions, and future directions
In all experiments, rhythmic entrainment echoes were relatively short and often restricted to a single cycle (with the exception of subjects with a preferred rate of 8 Hz in experiment 4; Fig. 7C). Although this observation is common in the literature (Hickok et al., 2015; van Bree et al., 2021; Saberi and Hickok, 2022a), it does raise some questions. One might wonder whether a rhythmic process that lasts one cycle can be considered a true oscillation. This issue is more problematic for standard time-frequency analyses that estimate phase, power, and frequency of the putative oscillation and are therefore more prone to misinterpreting nonoscillatory signals as oscillations. In the current study, however, our analysis of entrainment echoes was guided by clear predictions about both frequency and phase of the hypothesized rhythmic process (Fig. 2). It is relatively unlikely that a nonoscillatory process would have led to best and worst detection in phase and in antiphase (or vice versa) with the preceding rhythmic stimulus and intermediate performance in between. It is also noteworthy that the entrainment echo appeared at delays that are relatively consistent across experiments (∼2–3 cycles after the final peak of the AM tone). The absence of an echo in the first cycle after the offset of the rhythmic stimulus (Fig. 3B) can be explained by various effects that are related to this offset (e.g., omission or orientation response; Hughes et al., 2001). Such effects should be maximal close to stimulus offset and might have masked entrainment echoes.
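To illustrate what a test with a fixed frequency and a predicted phase can look like, the following is a minimal sketch and not the analysis used in this study. It assumes that detection performance is available as a hit rate per target delay, that delays are measured from the final peak of the AM tone, and that a least-squares sinusoidal fit is an acceptable summary. The function name `fit_echo` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_echo(delays_s, hit_rate, rate_hz):
    """Least-squares fit of a sinusoid with a FIXED frequency (the stimulus rate).

    Model: hit_rate ~ offset + A * cos(2*pi*rate_hz*delay - phase)
    Returns (A, phase, offset). With delays measured from the final stimulus
    peak, phase near 0 means best detection in phase with the rhythm and
    phase near +/-pi means best detection in antiphase.
    """
    w = 2 * np.pi * rate_hz * np.asarray(delays_s, dtype=float)
    # Linear regressors: a*cos(w) + b*sin(w) + offset
    X = np.column_stack([np.cos(w), np.sin(w), np.ones_like(w)])
    (a, b, offset), *_ = np.linalg.lstsq(X, np.asarray(hit_rate, dtype=float), rcond=None)
    amplitude = np.hypot(a, b)
    phase = np.arctan2(b, a)  # a*cos(w) + b*sin(w) = A*cos(w - phase)
    return amplitude, phase, offset

# Synthetic example (made-up numbers): 6 Hz rhythm, 8 delays per cycle, 3 cycles,
# with detection best in antiphase with the preceding rhythm.
rate = 6.0
delays = np.arange(24) / (8 * rate)
hits = 0.5 + 0.1 * np.cos(2 * np.pi * rate * delays - np.pi)

amp, phase, offset = fit_echo(delays, hits, rate)
print(f"amplitude={amp:.3f}, |phase|={abs(phase):.3f} rad, offset={offset:.3f}")
```

Because the frequency is fixed and only two phases (0 or π) are predicted, a fitted phase far from both values would speak against an entrainment echo, which is a stricter criterion than detecting spectral power at any frequency.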
Independent of this possibility, another factor to be considered is the usefulness of echoes for the auditory system and the consequence for their duration. If these echoes indeed reflect participants’ expectation about stimulus timing, induced by the rhythmicity of the AM stimulus, then this expectation was violated in the current paradigm as the target was presented at random delays. This would lead to targets randomly coinciding with high- or low-excitability phases of the oscillatory cycle, thus losing the advantage of relevant information being boosted at the high-excitability phase (Zoefel and VanRullen, 2017). In such scenarios, it is possible that entrainment echoes are suppressed so that targets are not missed if they occur at the suppressive (low-excitability) phase. The fact that we did observe some rhythmic effects on perception suggests that these echoes were not, or could not be, stopped immediately after entrainer offset. Nevertheless, it might explain their relatively short duration. The notion that longer or stronger echoes occur only when they are useful for perception (e.g., when most targets are presented at the high-excitability phase of the echo) needs to be tested in future work. Together, it is likely that target detection is determined by a sophisticated interaction between potentially automatic adaptation processes related to stimulus repetition, higher-level effects of temporal expectation, and attentional mechanisms that have been shown to reverse such expectation effects (Kok et al., 2012). All these factors might also interact with interindividual differences. Disentangling them is beyond the scope of the current study and is an exciting endeavor for future work.
Conclusion
We here demonstrate that the detection of a pure tone target depends on the rhythm of a preceding stimulus if (and only if) the latter is presented between 6 and 8 Hz. The best moments for stimulus detection depended on whether sound frequencies of target and entraining stimulus match, supporting the notion of tonotopy in rhythmic entrainment echoes. Nevertheless, these echoes seem more complex than those predicted from initial theories of neural entrainment. This complexity might arise partly because neural entrainment and repetition-related adaptation exercise competing and opposite influences on perception.
Footnotes
B.Z. was supported by Agence Nationale de la Recherche Grant ANR-21-CE37-0002 and Fondation pour l’Audition Grant FPA-RD-2021-10. We thank Andrea Alamia, Florian Kasten, and Jules Erkens for discussions.
The authors declare no competing financial interests.
Correspondence should be addressed to Benedikt Zoefel at benedikt.zoefel@cnrs.fr