Our brains continuously track the temporal structure of incoming information to anticipate upcoming events (Haegens and Zion Golumbic, 2018). Temporal expectations allow us to efficiently distribute attention over time (Haegens and Zion Golumbic, 2018), and they play an important role in the perception of speech (ten Oever and Martin, 2021) and music (Haegens and Zion Golumbic, 2018). At the neural level, the processing of temporal structure in the environment is thought to be implemented through entrainment: aligning the phase of low-frequency neural oscillations to temporal regularities in the external input (Obleser and Kayser, 2019). Such phase alignment induces heightened neural excitability precisely when external events are most expected, thereby optimizing neural processing. For natural auditory stimuli, such as music and speech, regularities in the delta range (0.5–4 Hz) have been argued to be most informative for neural processing (Ding et al., 2017; Haegens and Zion Golumbic, 2018). Indeed, better behavioral performance is observed when neural oscillations in this frequency range are aligned with an external target stimulus (Henry and Obleser, 2012).
Whether auditory cortex passively entrains to sensory input as a self-sustaining oscillator, or whether its entrainment is under active, top-down control, remains debated (Rimmele et al., 2018; Obleser and Kayser, 2019; Meyer et al., 2020). Many aspects of processing rhythmic sound streams (including the formation of temporal expectations and beat perception, both of which have been linked to entrainment) can occur under passive-listening conditions (Bouwer et al., 2020). However, a seminal study in monkeys showed entrainment only for attended, not unattended, sound streams (Lakatos et al., 2013), raising the possibility that entrainment requires top-down modulation.
To address this question, Pesnot Lerousseau et al. (2021) examined whether entrainment was present under passive-listening conditions in humans, using both MEG and stereotactic (intracranial) EEG (sEEG). Participants were presented with rhythmic streams of 16 tones with two different pitches (fundamental frequencies of 60 and 80 Hz). The interval between consecutive tone onsets was fixed at 390 ms, yielding isochronous, metronome-like sequences with a regular pace of 2.6 Hz (i.e., in the delta range). Sounds were played while participants listened passively (sEEG) or watched a silent movie (MEG).
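The stated pace follows directly from the fixed 390 ms inter-onset interval (IOI; the symbols below are labels introduced here). As a worked check, and noting that 15 intervals separate the first and last onsets of a 16-tone stream:

```latex
f_{\mathrm{stim}} = \frac{1}{\mathrm{IOI}} = \frac{1}{0.390\,\mathrm{s}} \approx 2.56\,\mathrm{Hz} \approx 2.6\,\mathrm{Hz},
\qquad
t_{\mathrm{last\ onset}} - t_{\mathrm{first\ onset}} = 15 \times 0.390\,\mathrm{s} = 5.85\,\mathrm{s}
```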
Of note, during presentation of a rhythmic sound stimulus, regular transient evoked responses elicited by temporally structured input (“neural tracking”) can appear highly similar to the alignment of ongoing endogenous low-frequency oscillations to external stimuli (“real” entrainment), because both produce phase-locked activity in the delta range when measured with electrophysiology (Haegens and Zion Golumbic, 2018; Obleser and Kayser, 2019). However, only entrainment, not neural tracking, is characterized by an aligned oscillation that lingers for a few cycles after the input ceases, before the oscillation returns to its endogenous phase (Obleser and Kayser, 2019) and endogenous frequency (i.e., the eigenfrequency). Importantly, Pesnot Lerousseau et al. (2021) examined persistent oscillatory responses after the rhythmic streams had ended, thereby sidestepping this confound between evoked responses and entrainment.
Surprisingly, and contrary to their own expectations, Pesnot Lerousseau et al. (2021) found persistent entrainment in the gamma range (60/80 Hz, i.e., at the fundamental frequencies of the tones) in a wide cortical network including the auditory processing pathways, but not at the presentation rate of the tones (2.6 Hz). To differentiate between neural populations with different dynamic properties, they fitted a linear damped harmonic oscillator model to the sEEG data. This model has three free parameters: the endogenous frequency of the oscillation (i.e., the eigenfrequency), a time delay that accounts for transmission time in the auditory system, and a damping ratio. The damping ratio determines what happens once external stimulation stops: with low damping (underdamping), the oscillation keeps ringing for several cycles while gradually decaying, whereas with high damping (overdamping), activity returns to rest without oscillating. Fitting the model yielded three clusters of electrodes with distinct optimal parameter estimates. All clusters consisted of electrodes belonging to the auditory pathways, located in bilateral auditory cortices and in associative regions of the superior and medial temporal gyri, precentral gyrus, and inferior frontal gyrus. However, the clusters differed in their oscillatory dynamics: two clusters had low eigenfrequencies (0.73 and 2.1 Hz) and high damping, and one cluster had a high eigenfrequency (60 Hz) and low damping. The two strongly damped clusters could account for the immediate loss of power and phase locking at 2.6 Hz after the offset of the rhythmic stream, whereas the weakly damped cluster could account for the persistent entrainment at higher frequencies. Thus, the model showed that the complex neural response to rhythmic streams measured with electrophysiology originated from several neural populations in the auditory system with distinct dynamic properties. Together, the results show that under passive-listening conditions, low-frequency neural oscillations in the delta range do not show persistent entrainment to auditory rhythmic stimulation.
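To make the role of the damping ratio concrete, here is a minimal sketch that simulates a linear damped harmonic oscillator in its standard textbook form, x'' + 2ζω₀x' + ω₀²x = u(t − τ), with eigenfrequency f₀ = ω₀/2π, delay τ, and damping ratio ζ. This particular form, the sinusoidal drive at the eigenfrequency, the 50 ms delay, and the damping ratios (0.05 and 2.0) are assumptions chosen for illustration; only the eigenfrequencies (60 and 2.1 Hz) come from the text, and the code does not reproduce the authors' fitting procedure. The hypothetical helper `persistence` compares the peak response one to two cycles after the input stops with the peak during the last driven cycle.

```python
import numpy as np


def simulate(f0, zeta, delay, n_drive_cycles=40, n_free_cycles=3, spc=200):
    """Integrate x'' + 2*zeta*w0*x' + w0**2 * x = u(t - delay).

    The input u is a unit sinusoid at the eigenfrequency f0 (Hz) that is
    switched off after n_drive_cycles; spc sets samples per eigen-cycle.
    Returns the response x and the sample index at which the delayed input ends.
    """
    w0 = 2.0 * np.pi * f0
    dt = 1.0 / (f0 * spc)                  # fixed step: spc samples per cycle
    shift = int(round(delay / dt))         # transmission delay in samples
    n_drive = n_drive_cycles * spc
    n_total = shift + n_drive + n_free_cycles * spc
    u = np.zeros(n_total)
    u[shift:shift + n_drive] = np.sin(w0 * dt * np.arange(n_drive))
    x = np.zeros(n_total)
    v = 0.0
    for n in range(1, n_total):
        # Symplectic (semi-implicit) Euler: update velocity, then position
        a = u[n - 1] - 2.0 * zeta * w0 * v - w0 ** 2 * x[n - 1]
        v += a * dt
        x[n] = x[n - 1] + v * dt
    return x, shift + n_drive


def persistence(f0, zeta, delay=0.05, spc=200):
    """Peak |x| one to two cycles after input offset, relative to the last driven cycle."""
    x, off = simulate(f0, zeta, delay, spc=spc)
    driven = np.max(np.abs(x[off - spc:off]))
    free = np.max(np.abs(x[off + spc:off + 2 * spc]))
    return free / driven


# Eigenfrequencies from the text; damping ratios are hypothetical.
print(f"60 Hz, zeta = 0.05 (low damping):  {persistence(60.0, 0.05):.2f}")
print(f"2.1 Hz, zeta = 2.0 (high damping): {persistence(2.1, 2.0):.2f}")
```

With weak damping, the oscillation retains most of its amplitude a full cycle after the input stops; with strong damping, it relaxes toward rest without ringing. The fixed-step symplectic Euler integrator is used only because it is short and stable at 200 samples per cycle; a library ODE solver would serve equally well.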
In two previous studies, however, persistent entrainment in the delta range was found for speech stimuli when the rhythmic input was task relevant (Kösem et al., 2018; van Bree et al., 2021). Together, these results suggest that persistent entrainment is influenced by a task-related, top-down process. The nature of such a process is unclear, but two distinct possibilities are discussed below.
First, as suggested by Pesnot Lerousseau et al. (2021), it is possible that top-down attentional control directly affects entrainment in sensory areas, as attention can modulate sensory gain. That is, auditory selective attention, likely implemented by a frontoparietal attention network (Ross et al., 2010), could lead to stronger early auditory responses, and hence, stronger entrainment in auditory cortex (Lakatos et al., 2019). This would be in line with the hypothesis that entrainment mediates attentional selection (Lakatos et al., 2013). Speculatively, attention may affect not only the strength, but also the dynamics of cortical oscillations in sensory regions. When modeling entrainment using nonlinear oscillators (Doelling and Assaneo, 2021), even small changes to one parameter can push the model across a bifurcation point, with the output changing radically and qualitatively. For example, a small change in a damping parameter could push the system from a self-sustaining oscillatory state to a stimulus-following regime, in which activity quickly returns to rest once a stimulus ceases (Doelling and Assaneo, 2021). Possibly, in addition to increasing the gain on early sensory responses, attention could also influence entrainment by affecting the damping behavior of cortical oscillations. Future research incorporating both electrophysiology and modeling, as done by Pesnot Lerousseau et al. (2021), is necessary to test this hypothesis.
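The bifurcation argument can be illustrated with the Stuart-Landau oscillator (the normal form of a Hopf bifurcation), swapped in here as a generic example rather than one of the specific models discussed by Doelling and Assaneo (2021). When this oscillator is driven at its eigenfrequency and viewed in a frame rotating at that frequency, its amplitude r obeys dr/dt = λr − r³ + F, where λ acts as negative damping and F is the drive. The parameter values, the dimensionless time units, and the helper name `envelope_after_offset` are hypothetical. In this sketch, nudging λ from −0.1 to +0.1 switches the post-stimulus behavior from a return to rest (stimulus following) to a self-sustaining oscillation of amplitude √λ.

```python
import math


def envelope_after_offset(lam, drive=0.5, t_drive=50.0, t_free=150.0, dt=0.01):
    """Amplitude r of a resonantly driven Stuart-Landau oscillator.

    Integrates dr/dt = lam*r - r**3 + F with forward Euler, where F = drive
    until t_drive and F = 0 afterwards (the stimulus stream has ended).
    Returns r at the end of the undriven period (dimensionless time units).
    """
    n_drive = round(t_drive / dt)
    n_total = n_drive + round(t_free / dt)
    r = 0.0
    for n in range(n_total):
        force = drive if n < n_drive else 0.0
        r += dt * (lam * r - r ** 3 + force)
    return r


# lam < 0: activity returns to rest once input stops (stimulus following).
# lam > 0: a self-sustaining oscillation of amplitude sqrt(lam) persists.
for lam in (-0.1, 0.1):
    r_end = envelope_after_offset(lam)
    print(f"lam = {lam:+.1f}: amplitude after offset = {r_end:.4f} "
          f"(sqrt(max(lam, 0)) = {math.sqrt(max(lam, 0.0)):.4f})")
```

Close to λ = 0, decay toward rest becomes slow (critical slowing down), which is why the undriven period here spans many multiples of 1/|λ|.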
A second possibility is that entrainment as measured in sensory areas is not directly stimulus induced, but is the consequence of attention-dependent top-down signals provided by frontal and motor areas (Rimmele et al., 2018). Endogenous processes in frontal language areas may exhibit rhythmic structure (Meyer et al., 2020) and could drive phase resets in sensory areas, resulting in regular, phase-locked responses to speech (Rimmele et al., 2018). This could explain why persistent entrainment to rhythmic speech streams was found only for intelligible, not unintelligible, speech, although the temporal structure of the speech streams was identical (van Bree et al., 2021). Of importance for nonlinguistic auditory signals such as music, top-down-guided phase resets in auditory cortex could also result from the motor system signaling temporal predictions to auditory cortex along the dorsal auditory pathway (Cannon, 2021). Both the supplementary motor area (SMA) and the basal ganglia are thought to be crucial for generating predictions about the timing of events (Grahn and Rowe, 2009), possibly through a sequential process in which neural trajectories in SMA precisely index temporal intervals, which are strung together in rhythmic sequences in the dorsal striatum (Cannon and Patel, 2021). Efferent connectivity from the SMA and the basal ganglia to auditory cortex could then lead to phase locking of auditory oscillations to the temporal structure of rhythmic sequences as indexed by these motor regions. If entrainment in sensory regions indeed relies on top-down information conveyed by frontal language and motor areas, the absence of sustained entrainment in passive-listening conditions (Pesnot Lerousseau et al., 2021) could be explained by a lack of top-down information when stimuli are not task relevant.
Besides task demands, individual differences in entrainment may help explain why sustained cortical oscillations have proven so elusive. Behaviorally, persistent effects of rhythmic input, as predicted by entrainment theories, have been hard to replicate (Lin et al., 2021), possibly because of large interindividual differences (Sun et al., 2021). Indeed, on a task measuring spontaneous synchronization of speech to external input, participants' performance followed a bimodal distribution: some people seemingly entrained their speech automatically, whereas others were not perturbed by the external input (Assaneo et al., 2019). Thus, subgroups of individuals may show entrainment even under passive-listening conditions.
In conclusion, by combining electrophysiology and modeling, Pesnot Lerousseau et al. (2021) show that under passive-listening conditions there are no sustained oscillations in the delta range in response to auditory rhythmic stimulation. This is inconsistent with the hypothesis that auditory cortex passively entrains to input as a self-sustaining oscillator and opens up interesting new questions for future research concerning the effects of listening conditions and efferent connectivity from frontal and motor areas on entrainment. The modeling approach used by Pesnot Lerousseau et al. (2021) provides invaluable information about the possible neural computations giving rise to entrainment effects observed with electrophysiology. Such modeling approaches can move the field beyond the question of whether there is entrainment to addressing the nature of the underlying computations (Doelling and Assaneo, 2021), thereby furthering our understanding of a process that is thought to underlie speech and music processing, but is still ill defined.
Footnotes
Editor's Note: These short reviews of recent JNeurosci articles, written exclusively by students or postdoctoral fellows, summarize the important findings of the paper and provide additional insight and commentary. If the authors of the highlighted article have written a response to the Journal Club, the response can be found by viewing the Journal Club at www.jneurosci.org. For more information on the format, review process, and purpose of Journal Club articles, please see http://jneurosci.org/content/jneurosci-journal-club.
This work was supported by Veni Grant VI.Veni.201G.066 awarded by the Dutch Research Council NWO. I thank Professor Heleen Slagter, Dr. Atser Damsma, and Tom Kaplan for helpful feedback on an earlier version of this manuscript.
- Correspondence should be addressed to Fleur L. Bouwer at fleurbouwer{at}hotmail.com