Research Articles, Behavioral/Cognitive

“What” and “When” Predictions Jointly Modulate Speech Processing

Ryszard Auksztulewicz, Ozan Bahattin Ödül, Saskia Helbling, Ana Böke, Drew Cappotto, Dan Luo, Jan Schnupp and Lucía Melloni
Journal of Neuroscience 14 May 2025, 45 (20) e1049242025; https://doi.org/10.1523/JNEUROSCI.1049-24.2025
Ryszard Auksztulewicz
1 Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht 6229 ER, The Netherlands
2 Centre for Cognitive Neuroscience Berlin, Freie Universität Berlin, Berlin 14195, Germany
Ozan Bahattin Ödül
3 Department of Brain and Behavioral Sciences, Università di Pavia, Pavia 27100, Italy
Saskia Helbling
4 Ernst Strüngmann Institute, Frankfurt am Main 60528, Germany
Ana Böke
2 Centre for Cognitive Neuroscience Berlin, Freie Universität Berlin, Berlin 14195, Germany
Drew Cappotto
5 Ear Institute, University College London, London WC1X 8EE, United Kingdom
Dan Luo
6 Department of Otorhinolaryngology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
Jan Schnupp
7 Gerald Choa Neuroscience Institute, Chinese University of Hong Kong, Hong Kong SAR, Hong Kong
8 Department of Otorhinolaryngology, Head and Neck Surgery, Chinese University of Hong Kong, Hong Kong SAR, Hong Kong
Lucía Melloni
9 Research Group Neural Circuits, Consciousness and Cognition, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main 60322, Germany
10 Predictive Brain Department, Research Center One Health Ruhr, Faculty of Psychology, University Alliance Ruhr, Ruhr University Bochum, Bochum 44801, Germany

Abstract

Adaptive behavior rests on predictions based on statistical regularities in the environment. Such regularities pertain to stimulus contents (“what”) and timing (“when”), and both interactively modulate sensory processing. In speech streams, predictions can be formed at multiple hierarchical levels of contents (e.g., syllables vs words) and timing (faster vs slower time scales). Whether and how these hierarchies map onto each other remains unknown. Under one hypothesis, neural hierarchies may link “what” and “when” predictions within sensory processing areas: with lower versus higher cortical regions mediating interactions for smaller versus larger units (syllables vs words). Alternatively, interactions between “what” and “when” regularities might rest on a generic, sensory-independent mechanism. To address these questions, we manipulated “what” and “when” regularities at two levels—single syllables and disyllabic pseudowords—while recording neural activity using magnetoencephalography (MEG) in healthy volunteers (N = 22). We studied how neural responses to syllable and/or pseudoword deviants are modulated by “when” regularity. “When” regularity modulated “what” mismatch responses with hierarchical specificity, such that responses to deviant pseudowords (vs syllables) were amplified by temporal regularity at slower (vs faster) time scales. However, both these interactive effects were source-localized to the same regions, including frontal and parietal cortices. Effective connectivity analysis showed that the integration of “what” and “when” regularity selectively modulated connectivity within regions, consistent with gain effects. This suggests that the brain integrates “what” and “when” predictions that are congruent with respect to their hierarchical level, but this integration is mediated by a shared and distributed cortical network.

  • audition
  • effective connectivity
  • magnetoencephalography
  • predictive processing
  • source reconstruction
  • speech processing

Significance Statement

This study investigates how the brain integrates predictions about the content (“what”) and timing (“when”) of sensory stimuli, particularly in speech. Using magnetoencephalography (MEG) to record neural activity, researchers found that temporal regularities at slower (vs faster) time scales enhance neural responses to unexpected disyllabic pseudowords (vs single syllables), indicating a hierarchical specificity in processing. Despite this specificity, the involved brain regions were common across different hierarchical levels of regularities and included frontal and parietal areas. This suggests that the brain uses a distributed and shared network to integrate “what” and “when” predictions across hierarchical levels, refining our understanding of speech processing mechanisms.

Introduction

Speech comprehension relies on auditory predictions, enabling the brain to anticipate and efficiently process sounds (Hickok, 2012; Hovsepyan et al., 2020; Poeppel and Assaneo, 2020; Caucheteux et al., 2023; Mai and Wang, 2023). Accurate predictions reduce bottom-up processing, optimizing other aspects of language comprehension (Federmeier, 2007; Sohoglu and Davis, 2016; Hakonen et al., 2017; Ryskin and Nieuwland, 2023). Inaccurate predictions draw cognitive resources, causing error signals (increases in amplitude) or delays in the responses of brain regions including the superior temporal gyrus (STG), superior parietal lobule (SPL), and prefrontal cortex (Shain et al., 2020; Caucheteux et al., 2023). The timing of their activity suggests that predictions span multiple levels, from low-level acoustic to high-level semantic representations.

During speech processing both the content (“what”) and the timing (“when”) of speech can be predicted (Gómez Varela et al., 2024). For instance, when listening to a familiar speaker, our brain not only anticipates what words might come next but also when they will be spoken, allowing us to follow rapid conversations effortlessly. This predictive mechanism explains why we can still understand speech even in noisy environments—our brain fills in the gaps based on expected sounds and meanings.

In turn, unexpected contents, even in artificial speech, trigger error signals which typically lead to further processing, increasing activity in the inferior frontal gyrus (IFG; Petersson et al., 2012; Wilson et al., 2015) and superior temporal cortex (Ling et al., 2022). “What” predictions span multiple levels, ranging from phonemes to phrases (Su et al., 2023), suggesting a hierarchical organization where higher-level predictions guide lower-level processing (Heilbron et al., 2022). Similarly, “when” predictions operate across several time scales, from phoneme onsets to phrase boundaries (Donhauser and Baillet, 2020; Schmitt et al., 2021). The rhythmic patterns of speech synchronize neural oscillations (Ding et al., 2016), with speech tracking linked to phase locking and increased activity (Obleser and Kayser, 2019) in auditory regions like the STG (Keitel et al., 2017), as well as (pre)motor regions (Morillon et al., 2019) and subcortical structures including the basal ganglia (Merchant et al., 2015) and the cerebellum (Kotz et al., 2014). Increased synchronization is thought to aid speech processing, particularly with predictable temporal structures (Kösem et al., 2018; Riecke et al., 2018; Zoefel et al., 2018). In contrast, irregular or unpredictable speech sequences show weaker neural tracking (Klimovich-Gray et al., 2021). However, the extent to which speech tracking reflects oscillatory mechanisms as opposed to evoked responses to each new sound is still debated (Doelling et al., 2019; Zou et al., 2021; Oganian et al., 2023).

While “what” and “when” predictions are both crucial for speech comprehension, they likely rely on different mechanisms (Arnal and Giraud, 2012; Auksztulewicz et al., 2018). Theoretically, it has been suggested that the brain performs separate predictive computations for “what” and “when” information, allowing it to both process them independently (factorization) and integrate them when needed (conjunction; Friston and Buzsáki, 2016). This separation has the advantage of optimizing cognitive resources efficiently. Empirically, a study using invasive electrophysiological measurements and computational modeling suggested that “what” predictability relies on modulated connectivity between sensory, prefrontal, and premotor regions, while “when” predictability mainly involves gain modulation in sensory regions (Auksztulewicz et al., 2018). Increased gain due to rhythmic “when” regularity (Auksztulewicz et al., 2019) may also amplify mismatch responses to violated “what” predictions (Todd et al., 2018; Lumaca et al., 2019; Jalewa et al., 2021).

Given the interactive effects of “what” and “when” predictability on neural activity, how do they modulate the processing of hierarchically organized stimulus sequences such as speech streams? One hypothesis suggests that hierarchies of predictions map onto neural hierarchies, such that lower cortical regions like the STG (Oganian and Chang, 2019) handle interactions between “what” and “when” regularities for single chunks (e.g., syllables) and faster time scales (syllable onsets), while higher (e.g., frontal) regions (Rimmele et al., 2023) handle longer segments (e.g., words) and slower time scales (word onsets). Alternatively, interactions between “what” and “when” regularities might occur through sensory-independent mechanisms involving attention-related regions, such as the left parietal cortex, which integrates content- and time-based speech information (Orpella et al., 2020).

This study examines the neural correlates of “what” and “when” regularities across levels of artificial speech processing. We independently manipulated “what” and “when” regularity of syllables and disyllabic pseudowords, while recording neural responses using magnetoencephalography (MEG). We first quantified the phase locking of MEG responses to different time scales of “when” regularities. Then, we used source reconstruction of evoked responses to test if “what” and “when” regularities at different levels interactively modulate activity in hierarchically distinct regions or in shared networks. Finally, computational modeling of evoked responses allowed us to infer network connectivity patterns mediating the effects across the cortical hierarchy.

Materials and Methods

Participant sample

A total of 24 participants took part in the study after providing written informed consent. Two participants did not complete the study, leaving data from 22 participants for analysis (13 females, 9 males; median age, 28 years; range, 21–35 years; all right-handed). The study adhered to protocols approved by the Ethics Board of the Goethe-University Frankfurt am Main. All participants confirmed normal hearing in their self-reports, and none reported any current or past neurological or psychiatric disorders.

Stimulus and task description

The experimental paradigm used auditory sequences which were manipulated with respect to “what” and “when” regularity at two levels each. “What” regularity was established by presenting sequences of pseudowords drawn from a set of six items and violated at the level of single syllables versus disyllabic pseudowords. “When” regularity was established by presenting sequences with isochronous timing and violated at a faster time scale of 4 Hz versus a slower time scale of 2 Hz. The details of both of these manipulations will be described below. By independently manipulating the regularity of stimulus identity and stimulus timing, we were able to analyze how “what” and “when” regularities interacted at each level of processing speech sequences.

Prior to the experimental task, participants implicitly learned six disyllabic pseudowords (“tupi,” “robi,” “daku,” “gola,” “zufa,” “somi”; Fig. 1A). The syllables were taken from a database of consonant-vowel syllables (Ives et al., 2005) and resynthesized using the open-source vocoder STRAIGHT (Kawahara, 2006) for MATLAB R2018b (MathWorks) to match their duration (166 ms), fundamental frequency (F0 = 150 Hz), and sound intensity.

Figure 1.

Experimental paradigm. A, Participants listened to sequences composed of pseudowords, drawn from a set of six items to which they had been passively exposed in a training session. B, After participants had implicitly learned the six pseudowords, they engaged in a syllable repetition detection task. During the task, “what” regularity was manipulated across three experimental conditions (at a single-trial level): pseudoword sequences could be composed of only legal pseudowords (“standard” trials), or they could contain a pseudoword with a deviant word-final syllable (“deviant syllable” trials, whereby the pseudoword starts in an expected manner but ends with a violation), or they could contain a pseudoword with a deviant word-initial syllable substituted from a final syllable of another pseudoword (“deviant pseudoword” trials, whereby the pseudoword starts with a violation). C, Sequences were blocked into three temporal conditions: an “isochronous” condition, in which the SOA between all syllables was fixed at 0.25 s; a “beat-based” condition, in which the SOA between pseudoword onsets was fixed at 0.5 s but the SOA between the initial and final syllables of each pseudoword was jittered, such that only the timing of the deviant pseudoword (but not of the deviant syllable) could be predicted; and a “single-interval” condition, in which the SOA between pseudoword onsets was jittered but the SOA between the initial and final syllable of each pseudoword was fixed at 0.25 s, such that only the timing of the deviant syllable (but not of the pseudoword) could be predicted.

In the implicit learning task (administered outside of the MEG scanner but immediately prior to the MEG recording session), participants were exposed to continuous auditory streams of the six pseudowords presented in a random order. The stimulus onset asynchrony (SOA) between consecutive syllables was set to 250 ms, resulting in an isochronous syllable rate of 4 Hz. The stream was 120 s long, amounting to 80 occurrences of each pseudoword. Following exposure to the continuous stream, participants listened to pairs of pseudowords and were asked to discriminate “correct” pseudowords (e.g., “tupi”) from “incorrect” pseudowords (e.g., “pitu,” “turo,” “tuku”). Each participant performed 60 trials of the pseudoword discrimination task.

Following the implicit learning task, participants were exposed to the main experimental paradigm in the MEG scanner. They were asked to listen to continuous sequences of four unique pseudowords (e.g., “tupirobidakugola”). As a cover task to monitor participants' attention, we asked them to detect immediate syllable repetitions (e.g., “tupirobidakugogo”), present in 6.6% of sequences, by responding immediately after repetition onset. These trials were excluded from subsequent analyses of neural data.

The sequences presented in the MEG scanner were independently manipulated with respect to “what” and “when” regularity. “What” regularity manipulations had three levels (Fig. 1B): (1) in standard sequences (66.6%), only correct (regular) pseudowords were used (e.g., “tupirobidakugola”); (2) in “deviant syllable” sequences (13.3%), the final syllable of the sequence was replaced with a syllable belonging to a different pseudoword (e.g., “tupirobidakugofa”), such that upon hearing the syllable “go,” the prediction of the subsequent syllable “la” is violated by the syllable “fa”; (3) in “deviant pseudoword” sequences (13.3%), the penultimate syllable of the sequence was replaced with a syllable which should be a final syllable of a different pseudoword (e.g., “tupirobidakufala”), creating an irregular disyllabic pseudoword (“fala”). “What” regularities were manipulated at the sequence level, such that the three different types of sequences were presented in a random order. Each trial consisted of seven seamlessly concatenated sequences, amounting to 14 s per trial. Trials were separated by an intertrial interval of 1 s. To ensure that differences between deviants and standards were not confounded by differences in the stimulus positions and/or timing in the sequences, each deviant syllable/word was matched with one designated standard syllable/word. As such, the standards were drawn from the same positions (i.e., penultimate or final syllable) and had the same timing as their matched deviants.
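
For illustration, the deviant construction described above can be sketched in a few lines of MATLAB. This is our own helper logic and naming, not the authors' stimulus code; it assumes each pseudoword consists of two consonant-vowel syllables of two characters each:

```matlab
% Sketch: build standard and deviant sequences from the six pseudowords.
% Assumed helper logic, not the authors' actual stimulus code.
words = {'tupi','robi','daku','gola','zufa','somi'};
idx   = randperm(numel(words), 4);       % four unique pseudowords
seq   = [words{idx}];                    % standard, e.g., 'tupirobidakugola'

other = setdiff(1:numel(words), idx);    % pseudowords not in the sequence
donor = words{other(randi(numel(other)))};
finalSyl = donor(3:4);                   % word-final syllable of the donor

% Deviant syllable: replace the sequence-final syllable.
devSyllable = [seq(1:end-2), finalSyl];            % e.g., '...dakugofa'

% Deviant pseudoword: replace the penultimate syllable with a word-final
% syllable, creating an illegal disyllable (e.g., 'fala').
devPseudoword = [seq(1:end-4), finalSyl, seq(end-1:end)];  % e.g., '...dakufala'
```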

Independently, “when” regularity manipulations also had three levels (Fig. 1C): (1) in isochronous sequences (33% of the trials), the SOA between consecutive syllables was fixed at 250 ms; (2) in “beat-based” sequences (33% of the trials), the SOA between pseudoword-initial syllables was fixed at 500 ms, but the SOA between the initial and the final syllable of each pseudoword was jittered between 167 and 333 ms, resulting in irregular timing of the final syllables of each pseudoword (i.e., at a faster time scale, corresponding to the syllable rate) but regular timing of pseudoword onsets (i.e., at a slower time scale, corresponding to the pseudoword rate); (3) in “single-interval” sequences (33% of the trials), the SOA between the initial and the final syllable of each pseudoword was fixed at 250 ms, but the interval between the final syllable of one pseudoword and the onset of the next was jittered between 167 and 333 ms, resulting in irregular timing of pseudoword onsets (i.e., at the slower time scale) but regular timing of the final syllables of each pseudoword (i.e., at the faster time scale). Here, we use the term “single-interval” in line with previous literature using similar temporal manipulations (Breska and Ivry, 2018), although we acknowledge that other authors refer to similar manipulations with terms such as “interval-based” (Merchant and Honing, 2013), “memory-based” (Bouwer et al., 2020), and “duration-based” (Teki et al., 2011) timing. “What” regularity was manipulated at the trial level (seven repetitions of a four-pseudoword sequence), while “when” regularity was manipulated at the block level (20 trials per block). Each “when” condition was administered in four blocks, resulting in 12 blocks in total. Blocks were presented in a pseudorandom order, such that no immediate repetitions of the same “when” condition were allowed.
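
A minimal sketch of the three timing schemes follows. The 167–333 ms bounds are from the text; uniform jitter and the variable names are our assumptions, since the authors' sampling scheme is not specified:

```matlab
% Sketch: syllable-onset times (s) for one four-pseudoword sequence under
% the three "when" conditions (uniform jitter is our assumption).
nWords = 4; withinSOA = 0.250; wordSOA = 0.500;
jit = @(n) 0.167 + rand(1, n) * (0.333 - 0.167);   % 167-333 ms jitter

% Isochronous: every syllable 250 ms apart.
onsetsIso = (0:2*nWords-1) * withinSOA;

% Beat-based: word onsets fixed at 500 ms; word-final syllables jittered.
wordOn    = (0:nWords-1) * wordSOA;
onsetsBB  = sort([wordOn, wordOn + jit(nWords)]);

% Single-interval: within-word SOA fixed at 250 ms; the gap between a
% word's final syllable and the next word onset jittered, so word onsets
% fall at irregular intervals (417-583 ms).
gaps      = jit(nWords - 1);
wordOnSI  = [0, cumsum(withinSOA + gaps)];
onsetsSI  = sort([wordOnSI, wordOnSI + withinSOA]);
```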

To prevent differences in baseline duration from affecting the MEG analysis of stimulus-evoked responses, all syllables preceding deviant syllables and the designated standard syllables were set to a fixed 250 ms SOA. This adjustment was necessary because comparing stimuli with fixed SOA versus random SOA could introduce baseline contamination, potentially leading to the rejection of a large number of trials. Therefore, temporal regularity was manipulated at the sequence level, affecting only syllables surrounding the analyzed syllables (deviants and designated standards), but not the syllables immediately preceding them. The global deviant probability was 4.16% (including repetitions) or 3.32% (excluding repetitions) of all syllables, resulting in up to 75 deviant stimuli (12 blocks × 20 trials × 7 sequences × 0.133 deviant sequence probability × 0.333 temporal condition probability) per combination of “what” regularity violation (deviant pseudoword vs syllable) and “when” regularity (isochronous, beat-based, single-interval).
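
The quoted per-cell trial count follows directly from the design probabilities; as a quick arithmetic check (variable names ours):

```matlab
% Expected number of deviants per combination of deviant type ("what")
% and temporal condition ("when"), cf. the counts quoted in the text.
nBlocks = 12; nTrialsPerBlock = 20; nSeqPerTrial = 7;
pDeviantSeq = 0.133;     % deviant-syllable (or deviant-pseudoword) sequences
pWhenCond   = 0.333;     % probability of each temporal condition
nPerCell = nBlocks * nTrialsPerBlock * nSeqPerTrial * pDeviantSeq * pWhenCond
% ~74.5, i.e., up to 75 deviants per cell
```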

Behavioral analysis

Behavioral analysis focused on accuracy and response time (RT) data derived from participant responses in the syllable repetition detection task. Single-trial RTs exceeding each participant's median + 3 standard deviations were removed from the analysis. The remaining RTs, derived exclusively from correct trials, underwent a log transformation to achieve an approximately normal distribution and were subsequently averaged. To test for the effect of “when” regularity on behavioral performance in the repetition detection task, accuracy and mean log RTs were separately subjected to repeated-measures ANOVAs, incorporating the within-subjects factor of “when” regularity (isochronous, beat-based, single-interval). Since the behavioral task was limited to detecting immediate syllable repetitions (i.e., did not differentiate between repeated pseudowords or syllables), “what” regularities were not included as a factor in these analyses. Post hoc comparisons were performed with paired t tests in MATLAB, with corrections for multiple comparisons (three for accuracy and three for RTs) using a false discovery rate of 0.05.
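
A minimal sketch of this RT-cleaning pipeline on synthetic data (the simulated distribution and variable names are ours):

```matlab
% Sketch: per-participant RT cleaning as described above (synthetic data).
rt  = exp(-0.7 + 0.3*randn(1, 60));   % 60 simulated correct-trial RTs (s)
cut = median(rt) + 3*std(rt);         % participant-specific outlier cutoff
rt  = rt(rt <= cut);                  % remove slow outliers
meanLogRT = mean(log(rt));            % log-transform, then average
% One such value per participant and "when" condition would then enter a
% repeated-measures ANOVA at the group level.
```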

MEG data acquisition and preprocessing

Participants were seated in a 275-channel whole-head CTF MEG system with axial gradiometers (Omega 2005, VSM MedTech). The data were acquired at a sampling rate of 1200 Hz with synthetic third-order gradient noise reduction (Vrba and Robinson, 2001). To monitor eyeblinks and heartbeats, four electrooculogram (EOG) and two electrocardiogram (ECG) electrodes were placed on the participant's face and clavicles. Continuous head localization was recorded throughout the session.

Auditory stimuli were generated by an external sound card (RME) and transmitted into the MEG chamber through sound-conducting tubes linked to plastic ear molds (Promolds, Doc's Proplugs). The sound pressure level was adjusted to ∼70 dB SPL. Visual stimuli, consisting of the instructions between blocks and a fixation cross during acoustic stimulation, were presented using a PROPIX projector (VPixx ProPixx) and back-projected onto a semitransparent screen positioned 60 cm from the participant’s head. Participants responded to stimuli by operating a MEG-compatible button response box (Cambridge Research Systems) with their right hand. Short breaks were administered between runs.

The continuous MEG recordings were high-pass filtered at 0.1 Hz and notch filtered between 48 and 52 Hz, down-sampled to 300 Hz, and further subjected to a low-pass filter at 90 Hz (including antialiasing). All filters were fifth-order zero-phase Butterworth and implemented in the SPM12 toolbox for Matlab. Based on continuous head position measurement inside the MEG scanner, we calculated six movement parameters (three translations and three rotations; Stolk et al., 2013), which were regressed out from each MEG channel using linear regression. Eyeblink artifacts were automatically detected based on the vertical EOG and removed by subtracting the two top spatiotemporal principal components of eyeblink-evoked responses from all MEG channels (Ille et al., 2002). Heartbeat artifacts were automatically detected based on ECG and removed in the same manner. The cleaned signals were subsequently subjected to separate analyses in the frequency domain (phase locking) and the time domain (event-related fields).
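
For illustration, the movement-parameter regression could look like the following sketch on synthetic data (ordinary least squares per channel; schematic, not the exact SPM12 implementation):

```matlab
% Sketch: regress six movement parameters out of each MEG channel.
nSamp = 3000; nChan = 275;
move = randn(nSamp, 6);                     % 3 translations + 3 rotations
meg  = randn(nSamp, nChan) + move * randn(6, nChan);  % contaminated data
X    = [ones(nSamp, 1), move];              % design matrix with intercept
beta = X \ meg;                             % least-squares fit per channel
megClean = meg - X * beta;                  % residuals = cleaned signal
```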

MEG analysis: phase locking

To investigate whether speech sequences elicit distinct spectral peaks in neural responses at both the syllable rate (4 Hz) and the pseudoword rate (2 Hz), we conducted a frequency domain analysis. Following previous research (Ding and Simon, 2013), we chose the intertrial phase coherence (ITPC) as a metric for tracking temporal regularity and assessing the frequencies characteristic of different speech units (syllables and pseudowords). The continuous data were segmented into epochs spanning from the onset to the offset of each trial (speech sequence). For each participant, channel, and sequence, we computed the Fourier spectrum of MEG signals recorded during that specific sequence. To assess phase consistency within each condition, we computed ITPC for each temporal condition (isochronous, beat-based, single-interval) using the following equation:

$$\mathrm{ITPC}_f = \frac{\sqrt{\left[\sum_{n=1}^{N}\cos\phi_{f,n}\right]^{2} + \left[\sum_{n=1}^{N}\sin\phi_{f,n}\right]^{2}}}{N}$$

where φ_{f,n} denotes the Fourier phase at frequency f on trial n, and N denotes the number of trials (here, sequences; 80 per condition).
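
This expression equals the magnitude of the mean resultant vector of single-trial phases; a minimal MATLAB sketch on synthetic single-channel data (variable names ours):

```matlab
% Sketch: ITPC at each frequency from single-trial Fourier phases.
fs = 300; nSamp = fs * 14; nTrials = 80;   % 14 s sequences, 80 per condition
x    = randn(nTrials, nSamp);              % trials x samples (one channel)
X    = fft(x, [], 2);                      % Fourier transform per trial
phi  = angle(X);                           % phase, trials x frequencies
itpc = abs(mean(exp(1i * phi), 1));        % = sqrt(sumcos^2 + sumsin^2) / N
f    = (0:nSamp-1) * fs / nSamp;           % frequency axis (Hz)
```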

We computed ITPC in the aperiodic conditions as well, based on previous findings (Breska and Ivry, 2018) that temporally consistent slow ramping neural activity (such as the contingent negative variation) can produce significant ITPC values even in aperiodic (single-interval) sequences. We also assessed the phase consistency of the stimulus itself, for both periodic and aperiodic conditions, by calculating the ITPC of the raw stimulus waveform.

In the initial analysis, ITPC estimates were averaged across MEG channels. To assess the presence of statistically significant spectral peaks, ITPC values at the syllable rate (4 Hz) and pseudoword rate (2 Hz) were compared against the mean of ITPC values at their respective neighboring frequencies (syllable rate: 3.93 and 4.07 Hz; pseudoword rate: 1.929 and 2.071 Hz) using paired t tests.
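
Note that with 14 s sequences the spectral resolution is 1/14 ≈ 0.071 Hz, which matches the neighboring frequencies quoted above (4 ± 0.07 Hz; 2 ± 0.07 Hz). A sketch of the peak-versus-neighbors test on placeholder data (ttest requires the Statistics Toolbox):

```matlab
% Sketch: compare ITPC at the syllable-rate bin against the mean of its
% two neighboring frequency bins (placeholder group data).
f = 0:1/14:50;                             % 1/14 Hz resolution, 14 s trials
itpcAll = rand(22, numel(f));              % subjects x frequencies (placeholder)
[~, iPk] = min(abs(f - 4));                % syllable-rate (4 Hz) bin
neighbors = mean(itpcAll(:, [iPk-1, iPk+1]), 2);   % adjacent-bin mean
[h, p, ~, stats] = ttest(itpcAll(:, iPk), neighbors);  % paired t test
```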

Additionally, to examine whether spectral peaks at the syllable rate and pseudoword rate observed at individual MEG channels exhibited modulations due to temporal regularity, spatial topography maps of single-channel ITPC estimates were transformed into 2D images. These images were then smoothed with a 5 × 5 mm full-width at half-maximum (FWHM) Gaussian kernel to ensure that the data conform to the assumptions of the statistical inference approach, namely, cluster-based correction based on random field theory (Litvak et al., 2011). The smoothed images were subjected to repeated-measures ANOVAs, separately for syllable-rate and pseudoword-rate ITPC estimates, incorporating a within-subjects factor of Time (isochronous, beat-based, single-interval). This analysis was implemented in SPM12 as a general linear model (GLM). To address multiple comparisons and ITPC correlations across neighboring channels, statistical parametric maps were thresholded at p < 0.005, a conservative cluster-forming threshold chosen to avoid inflating the false-positive ratio (Eklund et al., 2016; Flandin and Friston, 2019; Henson et al., 2019) and corrected for multiple comparisons over space at a cluster-level pFWE < 0.05, following random field theory assumptions (Kilner et al., 2005). Repeated-measures parametric tests were selected based on previous literature using ITPC (Sokoliuk et al., 2021), assuming that differences in ITPC values between conditions follow a normal distribution. Post hoc tests were conducted at a Bonferroni-corrected FWE threshold (0.05/3 pairwise comparisons per rate).

MEG analysis: event-related fields

For the analysis in the time domain, the data were segmented into epochs spanning −50 to 250 ms relative to the onset of deviant/standard syllables. To prevent contamination from the temporally structured presentation, baseline correction was applied from −25 to 25 ms, following a previously published approach (Fitzgerald et al., 2021). The data were then denoised using the “Dynamic Separation of Sources” algorithm (de Cheveigné and Simon, 2008) to minimize the impact of noisy channels. Condition-specific event-related fields (ERFs) corresponding to syllable and pseudoword deviants and their respective standards, presented in each of the three temporal conditions, were computed using robust averaging, a standard method for obtaining ERFs in SPM that iteratively calculates the weighted average across trials based on the deviation of each trial from the median (Litvak et al., 2011). A main advantage of robust averaging is that it downweights outlier trials, improving the accuracy of the averaged signal and thereby minimizing the impact of artifacts. This analysis was conducted with the SPM12 toolbox and included a low-pass filter at 48 Hz (fifth-order zero-phase Butterworth) to derive the final ERFs. The ERFs were then subjected to univariate analyses to assess the effects of “what” and “when” regularity on evoked responses.
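
The idea behind robust averaging can be sketched as follows. The weighting function here is a generic robust weight chosen for illustration, not SPM12's exact implementation:

```matlab
% Schematic robust averaging: iteratively downweight trials that deviate
% from a robust reference time course (one channel; synthetic data).
data = randn(100, 90);                    % trials x time points
erf  = median(data, 1);                   % initialize with the median
for it = 1:8
    dev = sqrt(mean((data - erf).^2, 2)); % per-trial deviation from reference
    s   = median(dev);                    % robust scale estimate
    w   = 1 ./ (1 + (dev / (3*s)).^2);    % larger deviation, smaller weight
    erf = sum(w .* data, 1) / sum(w);     % re-estimate the weighted average
end
```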

The ERF data were transformed into 3D images (2D for spatial topography; 1D for time) which underwent spatial smoothing using a 5 × 5 mm FWHM Gaussian kernel. Subsequently, the smoothed images were entered into a GLM, which implemented a 3 × 3 repeated-measures ANOVA with within-subject factors “what” (standard, deviant syllable, deviant pseudoword) and “when” regularity (isochronous, beat-based, single-interval). In addition to testing for the two main effects and a general 3 × 3 interaction, we also tested for the following planned contrasts: (1) deviant versus standard “what” conditions, (2) isochronous versus nonisochronous “when” conditions, and (3) an interaction contrast isolating the congruence effect. This last contrast aimed to investigate whether “when” regularity specifically influenced the amplitude of mismatch responses evoked by “what” deviants presented at a congruent time scale—i.e., deviant syllables in the single-interval condition and deviant pseudowords in the beat-based condition. This involved testing for a 2 × 2 interaction between “what” regularity violation (deviant syllable, deviant pseudoword) and “when” manipulation (single-interval, beat-based).
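
For concreteness, the planned contrasts can be written as weight vectors over the nine design cells. The cell ordering below ("what" as the slow factor, "when" as the fast factor) is our assumption:

```matlab
% Sketch: contrast weights over the 3 x 3 cells, ordered as
% "what" (standard, deviant syllable, deviant pseudoword) x
% "when" (isochronous, single-interval, beat-based).
cDeviantVsStd = kron([-2 1 1]/2, [1 1 1]/3);   % (1) deviants vs standards
cIsoVsNoniso  = kron([1 1 1]/3, [2 -1 -1]/2);  % (2) isochronous vs nonisochronous
% (3) congruence interaction: deviant syllable x single-interval and
% deviant pseudoword x beat-based vs the incongruent pairings.
cCongruence   = [0 0 0,  0 1 -1,  0 -1 1];
```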

To address multiple comparisons and ERF amplitude correlations across neighboring channels and time points, statistical parametric maps were thresholded at p < 0.005 and corrected for multiple comparisons over space and time at a cluster-level pFWE < 0.05, following random field theory assumptions (Kilner et al., 2005).

MEG analysis: source reconstruction

Source reconstruction was conducted under group constraints (Litvak and Friston, 2008), enabling the estimation of source activity at the individual participant level by assuming that activity is reconstructed within the same subset of sources for each participant, thereby reducing the impact of outliers. An empirical Bayesian beamformer (Wipf and Nagarajan, 2009; Belardinelli et al., 2012; Little et al., 2018) was employed for estimating sources based on the entire poststimulus time window (0–250 ms). Given the principal findings identified in the ERF analysis—specifically, a difference between ERFs elicited by deviants and standards; a difference between ERFs elicited in isochronous and nonisochronous temporal conditions; and an interaction between deviant type and temporal condition—we focused on comparing source estimates related to these effects.

For the analysis of the difference between deviants and standards, as well as the difference between stimuli presented in isochronous and nonisochronous sequences, source estimates were extracted for the 33–250 ms time window (based on the results of the ERF analysis; see below). These estimates were converted into 3D images with three spatial dimensions and then smoothed using a 5 × 5 × 5 mm full-width at half-maximum (FWHM) Gaussian kernel. The smoothed images were entered into a GLM that implemented a 3 × 3 repeated-measures ANOVA with within-subjects factors of “what” (standard, deviant syllable, deviant pseudoword) and “when” regularity (isochronous, beat-based, single-interval). Parametric tests based on a GLM were employed as an established method for analyzing MEG source reconstruction maps (Litvak et al., 2011).

For the analysis of the interaction between deviant type and temporal condition, source estimates were extracted for the 127–250 ms time window (based on the results of the ERF analysis) and processed as described above. Smoothed images were then entered into a GLM implementing a 2 × 2 repeated-measures ANOVA with within-subjects factors of content (deviant syllable, deviant pseudoword) and time (single-interval, beat-based). To address multiple comparisons and source estimate correlations across neighboring voxels, statistical parametric maps were thresholded at p < 0.005 (minimum voxel extent, 64 voxels) and corrected for multiple comparisons over space at a cluster-level pFWE < 0.05, adhering to random field theory assumptions (Kilner et al., 2005). Source labels were assigned using the Neuromorphometrics probabilistic atlas implemented in SPM12.

MEG analysis: dynamic causal modeling

Dynamic causal modeling (DCM) was employed to estimate connectivity parameters at the source level, specifically related to the general processing of mismatch responses (deviant vs standard) and the contextual interaction between “what” and “when” regularity (deviant syllable in the single-interval condition and deviant pseudoword in the beat-based condition vs deviant syllable in the beat-based condition and deviant pseudoword in the single-interval condition). DCM, a form of effective connectivity analysis, utilizes a generative model to map sensor-level data (in this case, ERF time series across MEG channels) to source-level activity (David et al., 2005). The generative model encompasses several sources representing distinct cortical regions, forming a sparsely interconnected network. DCM is an explanatory method designed to investigate the underlying neural connectivity that mediates observed effects in sensor space (here, ERFs) and/or source space (here, source reconstruction). As such, DCM takes the observed significant effects as a starting point (Stephan et al., 2010) and aims to disambiguate between alternative hypotheses regarding the connectivity patterns mediating those effects. Since no sources were identified in the source reconstruction of the main effect of “when” regularity (see Results), this effect was not included in the model. Therefore, the analysis focused on disambiguating neural activity patterns mediating the observed significant effects, namely, the main effect of “what” regularity violation (deviants vs standards) and the interactive congruence effect of “what” and “when” regularity (deviant syllables in single-interval vs beat-based temporal sequences; deviant pseudowords in beat-based vs single-interval sequences).

Each source’s activity is explained by neural populations based on a canonical microcircuit (Bastos et al., 2012), modeled using coupled differential equations describing changes in postsynaptic voltage and current in each population. In our study, the microcircuit consisted of four populations (superficial and deep pyramidal cells, spiny stellate cells, and inhibitory interneurons), each with a unique connectivity profile, including ascending, descending, and lateral extrinsic connectivity (connecting different sources) as well as intrinsic connectivity (connecting different populations within each source). The canonical microcircuit’s form and connectivity profile followed procedures established in the previous literature on the subject (Auksztulewicz and Friston, 2015; Auksztulewicz et al., 2018; Rosch et al., 2019; Fitzgerald et al., 2021; Todorovic and Auksztulewicz, 2021).
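
The coupled voltage/current equations in such models follow standard second-order synaptic kinetics. The toy single-population simulation below illustrates only this basic dynamic; parameter values are illustrative, and this is not the four-population microcircuit used in the study:

```matlab
% Schematic: second-order synaptic dynamics of a single neural population,
% v'' = (H/tau)*u - (2/tau)*v' - (1/tau^2)*v, integrated with Euler steps.
H = 4; tau = 0.004; dt = 1e-4; T = 0.25;   % gain, time constant (s), step, duration
t = 0:dt:T; v = 0; curr = 0; out = zeros(size(t));
u = double(t >= 0.01 & t < 0.02);          % brief input pulse at 10 ms
for k = 1:numel(t)
    dcurr = (H*u(k) - 2*curr - v/tau) / tau;   % postsynaptic current kinetics
    curr  = curr + dt * dcurr;
    v     = v + dt * curr;                     % voltage integrates current
    out(k) = v;
end
plot(t, out)                               % impulse-response-like kernel
```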

Crucially, a subset of intrinsic connections represented self-connectivity parameters, describing the neural gain of each region. Both extrinsic connectivity and gain parameters were allowed to undergo condition-specific changes to model differences between experimental conditions (deviants vs standards and the congruence between “what” and “when” regularity). In the canonical microcircuit, prior connection weights are excitatory for all ascending and lateral connections and inhibitory for all descending and intrinsic connections, based on previous literature linking descending connections to predictive suppression and intrinsic connections to self-inhibitory gain control (Bastos et al., 2012). However, these priors can be overridden by the data likelihood, potentially resulting in ascending/lateral inhibition and descending/intrinsic excitation if this maximizes model evidence.

In this study, we employed DCM to fit individual participants’ ERFs for each condition within the 0–250 ms time window. DCM was applied to the responses evoked by the same stimuli as those analyzed in the ERF comparisons and source reconstruction, that is, deviant syllables, deviant disyllabic pseudowords, and the designated standards, each embedded in the three temporal conditions (isochronous, beat-based, single-interval). The 0–250 ms window was chosen to cover the entire time course of the syllable/pseudoword-evoked response while avoiding contamination by the subsequent stimulus.

Drawing on the results of source reconstruction (refer to the Results section) and prior research (Garrido et al., 2008), we integrated 10 sources into the cortical network. These included eight regions identified in the source reconstruction (Fig. 5), based on their peak MNI coordinates: bilateral superior temporal gyrus (STG; left, [−60 −18 −16]; right, [62 −32 8]), left angular gyrus ([−44 −70 34]), right supramarginal gyrus ([44 −26 48]), bilateral superior parietal lobule (left, [−30 −64 54]; right, [22 −50 66]), and bilateral inferior frontal gyrus (left, [−48 28 8]; right, [42 34 14]). Additionally, we included two regions corresponding to bilateral primary auditory cortex (A1) for anatomical plausibility (Garrido et al., 2008; MNI coordinates: left, [−56 −12 −2]; right, [60 −14 18]). The A1 coordinates were based on local maxima in the source reconstruction contrast maps for which probabilistic labeling returned “planum polare” (left) or “planum temporale” (right). To evaluate model fits, we utilized the free-energy approximation to model evidence, which penalizes model complexity.

The analysis followed a sequential approach: initially, model parameters encompassing only extrinsic connections were estimated based on all experimental conditions, without modeling differences between conditions. The aim of this initial step was to find the optimal connectivity pattern between sources, providing the best fit of the data. In a second step, condition-specific changes in both extrinsic and intrinsic connections were optimized at the individual participant level. In both steps, models were fitted to individual participants’ data. Significant parameters (connection weights) were inferred at the group level using parametric empirical Bayes (PEB) and models were optimized using Bayesian model reduction (BMR; Friston et al., 2016), as described below.

At the individual participant level, models were fitted to ERF data considering two factors: “what” regularities (all deviants vs standards) and the interaction between “what” and “when” regularity (deviant syllable in the single-interval condition and deviant pseudoword in the beat-based condition vs deviant syllable in the beat-based condition and deviant pseudoword in the single-interval condition). These two effects were modeled in parallel, such that the model space consisted of models where these two effects could independently and factorially influence different subsets of connections. At this stage, all extrinsic and intrinsic connections were included in the network, representing a “full” model. Due to the potential for local maxima in model inversion within DCM, the group level analysis implemented PEB. This involved inferring group-level parameters by (re)fitting the same “full” models to individual participants’ data. The assumption underlying this step was that model parameters should be normally distributed in the participant sample, helping to mitigate the impact of outlier participants. For model comparison, BMR was applied, contrasting the “full” models against a range of “reduced” models where certain parameters were fixed to zero. Therefore, the model space also included a “null” model in which neither the “what” main effect nor the “what”/“when” interactive effect could modulate connectivity. This null model served as a baseline against which the other models were compared, and it would be favored if the remaining models were penalized for complexity or overfitting the data. This approach led to the creation of a model space encompassing different combinations of parameters.

In the first step (optimizing extrinsic connections independent of conditions), we used BMR to prune the extrinsic connectivity matrix. The free-energy approximation to log-model evidence was employed to score each model with a given extrinsic connectivity parameter set to 0, relative to the full model. This approach resulted in Bayesian confidence intervals for each parameter, indicating the uncertainty of parameter estimates. Parameters with 99.9% confidence intervals spanning either side of zero (equivalent to p < 0.001) were deemed statistically significant.

In the second step (optimizing the modulation of extrinsic and intrinsic connections by experimental conditions), a total of 64 models were generated. The model space was designed in a factorial manner, such that the following six groups of connections were set as free parameters or fixed to zero independent of each other: (1) ascending connectivity modulation by “what” regularities; (2) descending connectivity modulation by “what” regularities; (3) intrinsic connectivity modulation by “what” regularities; (4) ascending connectivity modulation by “what” and “when” congruence; (5) descending connectivity modulation by “what” and “when” congruence; and (6) intrinsic connectivity modulation by “what” and “when” congruence. The resulting 64 (2^6) models were fitted using BMR (switching off subsets of parameters of the full model) and compared using Bayesian model selection. Since a single model was identified as winning (see Results), its posterior parameters were inspected. Parameters with 99.9% nonzero confidence intervals were treated as statistically significant.
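
The factorial model space can be enumerated directly; a sketch (group labels ours):

```matlab
% Sketch: enumerate the 2^6 = 64 models. Each column is one group of
% connections; each row specifies which groups are free to vary.
groups = {'asc-what','desc-what','intr-what', ...
          'asc-congr','desc-congr','intr-congr'};
modelSpace = dec2bin(0:63) == '1';       % 64 x 6 logical matrix
nullModel  = modelSpace(1,  :);          % all off: the "null" model
fullModel  = modelSpace(end,:);          % all on: the "full" model
```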

Results

Prior to MEG measurements, participants (N = 22) implicitly learned six disyllabic pseudowords (“tupi,” “robi,” “daku,” “gola,” “zufa,” “somi”; Fig. 1A) in a 2 min block of passive exposure (see Materials and Methods for more details). After participants had learned the set of pseudowords, they listened to a continuous stream of syllables, engaging in a syllable repetition detection task while MEG was measured. The stimulus sequences were independently manipulated with respect to “what” and “when” regularity at two different levels: single syllables and disyllabic pseudowords. “What” regularity manipulations had three levels (Fig. 1B): (1) standard sequences (66.6%), in which only “correct” pseudowords were used; (2) “deviant syllable” sequences (13.3%), in which the final syllable of the sequence was replaced with a syllable belonging to a different pseudoword; (3) “deviant pseudoword” sequences (13.3%), in which the penultimate syllable of the sequence was replaced with a syllable which should be a final syllable of a different pseudoword, resulting in an irregular disyllabic pseudoword. The remaining 7% of trials contained task-related repetitions that were discarded from MEG analysis. Independently, “when” regularity manipulations also had three levels (Fig. 1C): (1) isochronous sequences (33.3%), in which the SOA between consecutive syllables was fixed at 250 ms; (2) “beat-based” sequences (33.3%), in which the timing of pseudoword onsets was regular but the timing of word-final syllables was irregular; and (3) “single-interval” sequences (33.3%), in which the timing of pseudoword onsets was irregular but the timing of word-final syllables was regular.

The aims of the analysis were twofold: first, to examine the impact of “when” regularity on responses to syllables and on neural tracking at faster versus slower time scales; and second, to assess how these “when” regularity manipulations modulate neural signatures associated with lower- and higher-level “what” regularities, specifically the mismatch responses (MMRs). The latter analysis focused on testing interactions between “what” and “when” regularity, with a specific emphasis on whether MMRs exhibit contextually specific modulation based on temporal regularity. For instance, the analysis probed whether faster (vs slower) “when” regularity selectively influenced MMRs to violations of “what” regularities concerning syllables (vs pseudowords). To draw inferences about the putative mechanisms underlying the interactions between “what” and “when” regularity observed at the sensor level, we performed source reconstruction and model-based analysis (dynamic causal modeling).

Behavioral results

We first tested if participants implicitly learned the pseudowords. In line with that hypothesis, we observed that pseudoword discrimination following initial training far exceeded chance level, reaching 71.35% accuracy (SEM: 1.81%; one-sample t test against chance level: t(20) = 11.77, p < 0.001; Fig. 2A) and a mean d’ of 1.49 (SEM: 0.25; one-sample t test against chance level: t(20) = 5.88, p < 0.001). This indicates that participants could learn the pseudowords after short passive exposure to a continuous syllable stream.

Figure 2.

Behavioral results. A, Accuracy in the training session in the pseudoword discrimination task. B, Left panel, Reaction times during the main continuous sequence where subjects performed a repetition detection task; right panel, accuracy during the repetition detection task. Rain cloud plots denote individual participants’ data points. Outliers are shown in red. Box plots show median values and interquartile ranges. Whiskers show data variability outside of the interquartile ranges, excluding outliers. Violin plots show the data density across participants. Asterisks denote p < 0.05.

Participants performed well and comparably across conditions in the syllable repetition detection task (F(2,40) = 1.730, p = 0.190; Fig. 2B). Yet we observed a clear effect of temporal regularity on RTs (F(2,40) = 5.188, p = 0.009): participants were faster in the isochronous condition (mean ± SEM: 488 ± 26 ms, after exponentiating log RTs) than in both the single-interval (mean ± SEM: 550 ± 34 ms; t(20) = −2.475, p = 0.022, FDR-corrected) and beat-based conditions (mean ± SEM: 544 ± 38 ms; t(20) = −3.429, p = 0.003, FDR-corrected; Fig. 2B). To quantify the prevalence of these effects at the level of single participants, we calculated two-sample t tests on single-trial RTs for each condition (median df = 48, accounting for trials removed from the analysis; see Materials and Methods). For the comparison between isochronous and single-interval conditions, the median t statistic was −0.283 (interquartile range, −1.420 to −0.001; 76% of participants below 0), while for the comparison between isochronous and beat-based conditions it was −0.564 (interquartile range, −1.246 to 0.161; 62% of participants below 0), suggesting that the majority of participants showed results consistent with the grand average, albeit with considerable variability at the single-trial level. Taken together, these results indicate that participants capitalized on the temporal regularity, adjusting their responses based on the temporal context information.

MEG results: phase locking

The stimulus spectrum, quantified as intertrial phase coherence (ITPC) of the sound waveform across 80 unique sequences per condition, showed pronounced differences between the three “when” conditions (Fig. 3A,C). Specifically, (1) in isochronous sequences, both a prominent syllable-rate (4 Hz) and a pseudoword-rate (2 Hz) peak were found; (2) in beat-based sequences, the pseudoword-rate (2 Hz) peak was largely preserved, and a syllable-rate (4 Hz) peak was relatively weaker but still present; (3) in single-interval sequences, both peaks were relatively weaker compared with the other conditions. All pairwise differences for the 2 Hz rate as well as for the 4 Hz rate were significant (all p < 0.001, all t(21) > 4.95).

Figure 3.

Phase-locking results. A, Stimulus spectrum. Prominent peaks are observed at the 2 Hz (pseudoword rate) and 4 Hz (syllable rate). B, MEG spectrum, averaged across channels (see panels E and F for channel topographies). Similar peaks are observed as in the stimulus spectrum. C, Pseudoword (2 Hz) and syllable-rate (4 Hz) peaks based on the stimulus spectrum, showing differences between conditions. Rain cloud plots denote individual participants’ data points. Outliers are shown in red. Box plots show median values and interquartile ranges. Whiskers show data variability outside of the interquartile ranges, excluding outliers. Violin plots show the data density across participants. All pairwise comparisons for the 2 Hz peaks as well as for the 4 Hz peaks were significant (see Results). Since stimuli were generated pseudorandomly for each subject, and to facilitate comparisons with MEG spectra, error bars denote SEM across participants. D, Rain cloud, box, and violin plots of 2 Hz and 4 Hz peaks based on the MEG spectrum, showing differences between conditions. Plot legend as in C. All pairwise comparisons for the 2 Hz peaks as well as for the 4 Hz peaks were significant (see Results). Error bars denote SEM across participants. E, MEG channel topography of significant differences in the pseudoword-rate 2 Hz peak between conditions. Color bar denotes F statistic. The transparency mask shows significant topography clusters (p < 0.05, FWE-corrected). Significant clusters are also outlined on the topography maps. F, MEG channel topography of significant differences in the syllable-rate 4 Hz peak between conditions. Figure legend as in E. Please note the difference in colormap scales between panels E and F. In F, nearly all channels show significant effects.

The MEG spectrum (ITPC averaged across channels) also showed prominent syllable-rate (4 Hz) and pseudoword-rate (2 Hz) peaks for all three temporal conditions (isochronous, beat-based, single-interval; paired t tests of the peaks of interest vs neighboring frequencies; syllable rate: all t(21) > 2.87, all p < 0.009; pseudoword rate: all t(21) > 7.68, all p < 0.001; Fig. 3B). The syllable-rate ITPC was stronger than the pseudoword-rate ITPC when averaging across channels and conditions (paired t test: t(21) = 5.98, p < 0.001). “When” regularity had a significant main effect both on the syllable-rate peaks (Fmax = 83.57, Zmax = 7.82, pFWE < 0.001) and on the pseudoword-rate peaks (Fmax = 13.29, Zmax = 3.98, pFWE < 0.001; Fig. 3D). However, the syllable-rate differences had a broad, distributed MEG topography and were significant for virtually all channels (Fig. 3F), while the pseudoword-rate differences were significant only over left-lateralized anterior and posterior channels (Fig. 3E). Post hoc tests revealed that, at the syllable rate, ITPC was higher in the isochronous than in the beat-based condition (t(21) = 9.62, p < 0.001) and higher in the beat-based than in the single-interval condition (t(21) = 4.16, p = 0.004). The same pattern of results was found for the pseudoword-rate ITPC (pairwise comparisons: isochronous vs beat-based, t(21) = 2.83, p = 0.009; beat-based vs single-interval, t(21) = 5.37, p < 0.001). Thus, the phase-locking analysis showed a close correspondence between the spectral characteristics of the stimulus waveform and the MEG responses; however, in sensor MEG data, sensitivity to pseudoword-rate peaks was largely limited to left-lateralized channels. This suggests that neural activity did not merely follow the stimulus spectrum but was sensitive to the temporal structure of the syllable streams with a degree of topographic specificity.

MEG results: event-related fields and source reconstruction

To test for modulations of stimulus-evoked activity by “what” and “when” regularity, we analyzed MEG data in the time domain and subjected ERFs to a general linear model with within-subjects factors “what” (standard, deviant syllable, deviant pseudoword) and “when” regularity (isochronous, beat-based, single-interval). First, we found that the omnibus main effect of “what” regularity (Fmax = 12.13, Zmax = 4.22, pFWE = 0.03) corresponded to a significant difference between deviants (pooled across deviant syllables and pseudowords) and standards (133–250 ms over left central/posterior channels; Fmax = 24.18, Zmax = 4.60, pFWE = 0.003; Fig. 4A), such that ERFs evoked by deviant stimuli were stronger than those evoked by standard stimuli (Tmax = 4.92). To quantify the prevalence of this effect among single participants, for each participant we calculated a two-sample t test between single-trial evoked response amplitudes following deviants versus standards, measured at the channel and time point of the peak group effect. The median t statistic was 1.218 (median df, 944; interquartile range, 0.381–3.245; 77% of participants above 0), indicating that a robust majority of participants showed results consistent with the grand average. No significant differences were found between ERFs evoked by deviant syllables and deviant pseudowords, indicating that “what” regularities alone had a relatively coarse effect on neural response amplitude, differentiating only between deviants and standards.

Figure 4.

Event-related fields. A, Main effect of “what” regularity violation (deviant vs standard). Left panels, Time courses of ERFs averaged over the spatial topography clusters shown in the right panels. For each participant and condition, the ERF was based on an average of up to 450 stimuli (before discarding trials with artifacts). Shaded area denotes SEM across N = 22 participants. Black horizontal bar denotes pFWE < 0.05. Right panels, Spatial distribution of the main effect. Color bar, F values. B, Main effect of “when” regularity (isochronous vs single-interval vs beat-based). For each participant and condition, the ERF was based on an average of up to 300 stimuli (before discarding trials with artifacts). Figure legend as in A. C, Congruency/interaction between “what” (deviant syllable vs deviant pseudoword) and “when” regularity (single-interval vs beat-based). For each participant and condition, the ERF was based on an average of up to 75 stimuli (before discarding trials with artifacts).

To source-localize the ERF difference between deviants and standards, we reconstructed the topography of evoked responses in source space and tested for differences in 3D source maps using a GLM (see Materials and Methods, MEG analysis: source reconstruction). Overall, averaged across stimulus types, source reconstruction explained 88.34 ± 3.40% of sensor-space variance (mean ± SEM across participants). We identified stronger activity estimates for deviant versus standard stimuli (collapsed across deviant types) in a range of sources (Table 1; Fig. 5A), including bilateral STG, SPL, and IFG, the left angular gyrus (ANG), and the right supramarginal gyrus (SMG), reflecting a distributed network sensitive to auditory deviance.

Figure 5.

Source reconstruction. A, Regions showing a significant main effect of “what” regularities (deviant vs standard) after applying a binary significance mask. Insets show unthresholded Z-maps. B, Regions showing a significant congruency effect between “what” (deviant syllable vs deviant pseudoword) and “when” regularity (single-interval vs beat-based). Legend as in A. STG, superior temporal gyrus; ANG, angular gyrus; SPL, superior parietal lobule; SMG, supramarginal gyrus; IFG, inferior frontal gyrus. Left and right hemispheres are shown in separate columns.

Table 1.

Source reconstruction results. Summary statistics of all clusters showing significant differences between conditions (pFWE < 0.05)

Second, we tested for the effect of “when” regularity on ERF amplitude. While the overall three-way difference between conditions was not significant (all pFWE > 0.05), in a planned contrast we identified a significant difference between isochronous and nonisochronous (pooled over beat-based and single-interval) conditions (Fig. 4B; Fmax = 16.92, Zmax = 3.84, pFWE < 0.025). Here, ERF amplitudes were stronger for isochronous versus nonisochronous conditions (Tmax = 4.11) between 33 and 250 ms over right posterior channels. A prevalence analysis of this effect (conducted in the same way as for deviants vs standards) showed that the median t statistic across participants was 1.014 (median df: 944; interquartile range of t: 0.143 to 3.359; 77% of participants above 0), indicating that a robust majority of participants showed results consistent with the grand average. Source reconstruction of this ERF effect, however, did not reveal any significant source-level clusters after correction for multiple comparisons (all pFWE > 0.05), suggesting that the ERF effect did not map systematically onto underlying sources.

Finally, we tested for an ERF interaction between “what” regularities (i.e., the type of deviant stimulus) and “when” regularity (i.e., the type of nonisochronous temporal regularity). We observed a significant interaction, such that deviant syllables and pseudowords were associated with stronger ERFs when their timing was regular (i.e., in the single-interval and beat-based conditions, respectively) than when their timing was irregular (i.e., in the beat-based and single-interval conditions, respectively). This effect localized to left posterior channels (Fig. 4C; time extent: 127–250 ms, Fmax = 13.10, Zmax = 3.36, pFWE = 0.036). Post hoc tests showed that the interaction was driven primarily by deviant pseudowords presented in the beat-based versus single-interval conditions (t(21) = 2.899, p = 0.009). A prevalence analysis of this effect (conducted in the same way as for the main effects) showed that the median t statistic across participants was 0.229 (median df: 156; interquartile range of t: −0.961 to 2.169; 59% of participants above 0), indicating that a majority of participants showed results consistent with the grand average, albeit with considerable variance at the single-trial level. The remaining pairwise comparisons did not reach significance after correction for multiple comparisons (deviant syllables presented in the beat-based vs single-interval condition: t(21) = −1.552, p = 0.013, uncorrected; deviant syllables vs pseudowords presented in the beat-based condition: t(21) = −1.197, p = 0.244; deviant syllables vs pseudowords presented in the single-interval condition: t(21) = 2.505, p = 0.021, uncorrected). Taken together, this interaction indicates that the effects of “what” regularities were stronger when they were congruent with “when” regularity.

To source-localize the interaction of “what” and “when” regularity, we compared sources of activity evoked by deviants whose timing was regular (deviant syllables presented in single-interval blocks and deviant pseudowords presented in beat-based blocks) against deviants whose timing was irregular (deviant syllables presented in beat-based blocks and deviant pseudowords presented in single-interval blocks). This contrast revealed significant differences in two regions (Fig. 5B): the left SPL and the right IFG (see Table 1 for region coordinates and statistical information). A third cluster, whose most probable anatomical label was “unknown” (peak MNI [−64 −4 22], voxel extent 833, in the vicinity of the left postcentral/precentral gyrus), was excluded from further analysis. All sources showed weaker activity for deviant stimuli presented in temporally congruent versus incongruent conditions (Tmin = −8.44). In summary, the interaction of “what” and “when” regularity mapped onto a more limited set of brain regions than the main effect of “what” regularities.

MEG analysis: dynamic causal modeling

To infer the connectivity patterns mediating the effects of “what” and “when” regularity on speech processing, we used dynamic causal modeling (DCM), a Bayesian model of effective connectivity fitted to individual participants' spatiotemporal patterns of syllable-evoked ERFs (David et al., 2005; Auksztulewicz and Friston, 2015). DCM models ERFs as arising in a network of sources. Network structure is quantified by extrinsic connections (linking distinct sources) and intrinsic connections (linking distinct populations within the same source and amounting to neural gain modulation at each source). The analysis consisted of two steps. In the first step, we created a fully interconnected model based on the eight regions identified in the source reconstruction (see above) as well as bilateral AC (see Materials and Methods) and optimized its extrinsic connectivity using Bayesian model reduction (BMR). This procedure pruned 75% of the connections, retaining 19 of the 76 connections of the fully interconnected model (p < 0.001; Fig. 6A). This indicates that the model space accommodated both overly complex and overly simple models, which were appropriately penalized.

Figure 6.

Dynamic causal modeling. A, Anatomical model including the eight regions identified in the source reconstruction analysis (bilateral superior temporal gyrus, STG; superior parietal lobule, SPL; inferior frontal gyrus, IFG; left angular gyrus, ANG; right supramarginal gyrus, SMG) as well as bilateral auditory cortex (AC). The figure depicts a model with reduced anatomical connectivity based on Bayesian model reduction, used for subsequent modeling of condition-specific effects. Black arrows, excitatory connections; red arrows, inhibitory connections. Intrinsic (self-inhibitory) connections are not shown. B, Top panel, Model space showing models on the horizontal axis and groups of connections included as free parameters (gray) or switched off (black) in each model on the vertical axis. The winning model, allowing “what” regularities to modulate ascending and descending (but not intrinsic) connections and the congruence between “what” and “when” regularity to modulate intrinsic (but not ascending or descending) connections, is shown as a white column. Bottom panel, Bayesian model comparison; posterior probability per model. C, Posterior model parameters sensitive to “what” regularity violations. Only significant parameters (posterior probability >99.9%) are shown. Black, excitatory; red, inhibitory; solid lines, stronger connectivity; dashed lines, weaker connectivity for deviants versus standards. D, Posterior model parameters sensitive to the congruence between “what” and “when” regularity. Self-inhibitory intrinsic connections showed region-dependent increases (solid) or decreases (dashed) for deviants temporally predicted at congruent versus incongruent time scales.

In a second step, we took the reduced model and allowed its connections to vary systematically across “what” and “when” conditions. Specifically, since the source reconstruction only identified significant differences in source maps related to (1) all deviants versus standards and (2) congruent versus incongruent “what” and “when” regularities, we considered these two factors as possible modulators of extrinsic and/or intrinsic connectivity. Bayesian model comparison of the 64 resulting models revealed a single winning model, in which “what” regularities modulated only extrinsic connections, while their congruence with “when” regularity modulated only intrinsic connections (Fig. 6B). The difference in the free-energy approximation to log model evidence between the winning model and the next-best model (log Bayes factor) was 3.701, corresponding to a 97.53% probability that the winning model outperforms the next-best model. The winning model therefore provides strong evidence for distinct effective connectivity patterns sensitive to “what” regularity and to its interaction with “when” regularity.
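For reference, posterior model probabilities follow from a softmax over the free-energy approximations to log model evidence, and the pairwise probability reported above follows from the log Bayes factor alone; the sketch below is generic (not tied to the SPM implementation used here).

    # Model-comparison sketch: posterior probabilities from free energies.
    # `F`: vector of free-energy approximations to log evidence, one per model.
    import numpy as np

    def model_posteriors(F):
        F = np.asarray(F, dtype=float)
        w = np.exp(F - F.max())      # subtract max for numerical stability
        return w / w.sum()

    # Pairwise case: for a log Bayes factor dF between two models,
    # p(winning) = 1 / (1 + exp(-dF)); dF ~ 3.7 gives approximately 0.98,
    # in line with the value reported above.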

Based on the winning model, we then inferred its significant connectivity parameters. “What” regularity significantly modulated a subset of extrinsic connections (Fig. 6C; see Table 2 for statistical information). Specifically, deviants were linked to increased ascending connectivity bilaterally at multiple levels of the auditory hierarchy, from AC via STG to SPL. Conversely, ascending connectivity decreased for cross-hemispheric connections at higher levels of the hierarchy, from SPL to IFG and from the left STG via ANG to the right IFG. Deviants also modulated descending connectivity, such that top-down inhibition increased at higher levels of the hierarchy (from bilateral IFG to SPL, ANG, and SMG) and decreased at lower levels (from STG to AC). In summary, deviant processing differentially affected ascending and descending connections, leading to a net ascending drive, especially within hemispheres and at lower levels of the hierarchy.

Table 2.

Dynamic causal modeling results. Summary of significant condition-specific effects on connectivity estimates (p < 0.001)

Finally, the congruence of “what” and “when” regularity exclusively modulated a subset of intrinsic connections (Fig. 6D; see Table 2 for statistical information). Specifically, deviant syllables and pseudowords that were predictable in time were linked to increased gain (decreased self-inhibition) in bilateral AC and the right SMG, and to decreased gain (increased self-inhibition) in most other sources in the network, with the exception of the right STG and left IFG, for which no significant self-connectivity modulation was found. This model explained 81.44 ± 1.62% of the variance of spatiotemporal ERF patterns (mean ± SEM across participants). This pattern indicates that “what” regularities congruent with “when” regularity increased gain at low levels of the hierarchy and decreased it at higher levels.
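In DCMs of this family, intrinsic (self-inhibitory) connections are conventionally parameterized as log-scaling coefficients of a fixed negative self-connection, so that gain increases correspond to weakened self-inhibition; stated under that (assumed) convention,

    g_i = g_0 \, e^{\theta_i}, \qquad g_0 < 0,

where a condition-specific estimate \theta_i < 0 weakens self-inhibition (increases gain) at source i, and \theta_i > 0 strengthens it (decreases gain).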

Discussion

In the current study, we observed a contextual modulation of neural responses to deviations from predicted speech contents, dependent on their temporal regularity. This modulation showed that faster “when” (single-interval) regularity amplified responses to deviant syllables, whereas slower “when” (beat-based) regularity amplified responses to deviant disyllabic pseudowords. This implies a congruence effect in the processing of “what” and “when” regularity across hierarchical levels of speech processing, whereby smaller processing units (syllables) are modulated at faster rates and larger processing units (words) at slower rates. However, the interactive effects of “what” and “when” regularities on evoked neural responses did not differentiate between hierarchical levels and were instead linked to a shared network of sources including the left SPL and right IFG. In the connectivity analysis, these modulatory effects of “when” regularity on “what” mismatch responses were best explained by widespread gain modulations, including stronger sensitivity of early auditory regions bilaterally and of the right SMG, as well as weaker sensitivity of most of the temporo-fronto-parietal network. Thus, our analysis of evoked responses (as well as the subsequent source reconstruction and computational modeling) suggests that the interactions between “what” and “when” regularities, while contextually specific due to their congruent effects, are facilitated by a common and distributed cortical network, independent of the hierarchical level of these regularities.

Mismatch responses to unpredicted speech contents are well documented in neuroimaging studies and have been found in a range of cortical regions including the STG (Rothermich and Kotz, 2013), the SMG (Celsis et al., 1999), and, in the case of nonsense words, bilateral IFG, largely matching our findings (Wilson et al., 2015). Beyond these regions, we also found sensitivity to “what” regularity violations in the angular gyrus, consistent with its role in phonological processing and novel word acquisition (Seghier, 2023), and in the SPL, part of the dorsal attentional network previously linked to statistical learning of artificial speech streams (Sengupta et al., 2019). Our dynamic causal modeling of mismatch responses suggested a consistent pattern of connectivity modulations, with qualitative differences between lower and higher levels of the cortical hierarchy. At lower levels (from AC to STG and from STG to SPL), speech deviants were linked to stronger ascending excitation and weaker descending inhibition, consistent with increased forward signaling of prediction errors in response to short-term violations of speech predictions by auditory regions in the temporal lobe (Caucheteux et al., 2023). Conversely, at higher levels of the hierarchy (in the frontoparietal network), speech deviants were associated with weaker ascending and stronger descending connectivity, possibly reflecting internal attentional orienting to unpredicted speech contents (Reiche et al., 2013; Lückmann et al., 2014).

Temporal regularity of speech sounds modulated behavior in an incidental task (with RTs shorter following isochronous than nonisochronous sequences), consistent with previous findings (Morillon et al., 2016). We also found that temporal regularity (in isochronous sequences) increased sound-evoked ERF amplitude (Bouwer et al., 2016), to a similar extent relative to both beat-based and single-interval nonisochronous sequences. A previous EEG study comparing these two types of temporal regularity found largely comparable behavioral and neural effects, with differences limited to sounds presented at unexpected times (but not sounds presented at expected times) in beat-based sequences (Bouwer et al., 2020). In our study, differences between these two types of sequences emerged in the frequency-domain analysis of “when” regularity. Frequency-domain analyses generally indicated a close alignment between the MEG spectrum and the stimulus spectrum, although neural tracking of syllables (as quantified using ITPC) was stronger than that of longer chunks, consistent with previous results (Har-Shai Yahav and Zion Golumbic, 2021). However, while the syllable-rate effect was distributed over a large number of channels, the pseudoword-rate effect was predominantly observed over the left hemisphere, suggesting that statistical learning of speech sequences can exert asymmetric effects. This is consistent with previous reports of left-hemispheric contributions to speech segmentation based on statistical regularities (Cunillera et al., 2009; López-Barroso et al., 2013). Similarly, studies using phase-locking measures to quantify tracking at suprasyllabic time scales showed more pronounced differences in the left hemisphere (Ding et al., 2017; Har-Shai Yahav and Zion Golumbic, 2021), although these pertained to multiword phrases rather than single words and were based on familiar disyllabic words rather than recently learned pseudowords. Interestingly, the left-hemispheric dominance found for word-rate tracking in the current study contrasts with the right-hemispheric dominance found for tone-pair tracking in a similar study based on nonspeech musical sequences rather than speech stimuli (Cappotto et al., 2023), suggesting differential tracking of speech and nonspeech sequences.

Besides their distinct individual effects on neural activity, regularities related to speech contents and timing showed interactive effects, such that temporally regular deviant sounds yielded stronger ERF amplitudes than temporally irregular deviant sounds. This interaction was specific to the hierarchical level of speech organization and its respective time scale, such that deviant pseudowords evoked stronger responses following beat-based regularities, whereas deviant syllables evoked stronger responses following single-interval regularities. These results build upon prior research indicating that temporal regularities increase MMRs (Yabe et al., 1997; Takegata and Morotomi, 1999; Todd et al., 2018; Lumaca et al., 2019; Jalewa et al., 2021) and demonstrate that these modulatory effects align consistently with expected time points, irrespective of the specific nature of the “when” regularity (whether single-interval or beat-based). However, there were no significant differences between the interactive effects observed at the lower versus higher level of the hierarchy. Instead, the interactive ERF effects were source-localized to shared regions, the right IFG and left SPL, where deviants presented at regular latencies were linked to lower source activity estimates than deviants presented at irregular latencies. Identifying these two regions as mediating interactions between “what” and “when” regularities extends previous studies in which the left parietal cortex was found to integrate “what” and “when” information in speech processing (Orpella et al., 2020) and the right IFG was found to be inhibited by regular speech timing (metrical context; Rothermich and Kotz, 2013). Our DCM results suggest that the interactive effects of “what” and “when” regularities were subserved primarily by gain modulation, including weaker gain in hierarchically higher regions (the right IFG and left SPL), as identified in the source reconstruction. This relative attenuation at hierarchically higher levels was accompanied by stronger gain at the lowest levels of the network, including bilateral AC, consistent with previously reported gain-amplifying effects of temporal orienting on auditory processing (Auksztulewicz and Friston, 2015; Morillon et al., 2016; Auksztulewicz et al., 2019), as well as in the SMG, previously shown to be involved in processing temporal features of speech (Geiser et al., 2008).

This MEG study of speech sequences aligns closely with findings from Cappotto et al. (2023), who used similar methods to examine EEG responses to musical sequences. In both cases, faster and slower “when” regularity intensified deviant responses to unexpected elements at the appropriate hierarchical levels: for speech, syllables versus disyllabic pseudowords; for music, single tones versus tone pairs. Additionally, both studies found that the interactive effects of “what” and “when” regularities are associated with the left superior parietal lobule (SPL). However, our study also implicates the right inferior frontal gyrus (IFG) as a potential source of these sensor-level effects. Furthermore, both studies reported that the sensor-level effects could be attributed to increased gain in bilateral auditory cortices and reduced gain in other network nodes. The EEG study further linked these effects to changes in forward connectivity. Collectively, these findings suggest that the interactions between “what” and “when” regularities are consistent across stimulus domains (speech and music) and data modalities (MEG and EEG), revealing largely overlapping cortical sources and connectivity modulations. Therefore, despite the inherent differences in temporal modulation patterns and acoustic features between music and speech (Siegel et al., 2012; Ding et al., 2017; Albouy et al., 2020; Zatorre, 2022), these common underlying mechanisms support the hypothesis that interactions between “what” and “when” regularities generalize broadly across stimulus characteristics.

Taken together, our study complements recent model-based reports of cortical hierarchies aligning with speech processing hierarchies (Schmitt et al., 2021; Caucheteux et al., 2023) and suggests that while “what” and “when” predictions may jointly modulate speech processing, their interactions are not necessarily expressed at different levels of the cortical hierarchy. Instead, the effects of temporal regularities on unexpected speech sounds may be subserved by a common set of frontoparietal regions, reflecting attention-like amplification of mismatch responses by temporal predictions (Auksztulewicz and Friston, 2015; Auksztulewicz et al., 2019), irrespective of the contents of the mispredicted stimuli. Rather than requiring dedicated resources to integrate “what” and “when” features separately at each stage of hierarchical speech processing, such a generic mechanism may help integrate streams of information across hierarchical levels.

Footnotes

  • This work was supported by the European Commission’s Marie Skłodowska-Curie Global Fellowship (750459 to R.A.); a grant from the European Commission/Hong Kong Research Grants Council Joint Research Scheme (9051402 to R.A. and J.S.); and a grant from the German Science Foundation (AU 423/2-1 to R.A.).

  • The authors declare no competing financial interests.

  • Correspondence should be addressed to Ryszard Auksztulewicz at ryszard.auksztulewicz@maastrichtuniversity.nl.

SfN exclusive license.

References

  1. Albouy P, Benjamin L, Morillon B, Zatorre RJ (2020) Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science 367:1043–1047. https://doi.org/10.1126/science.aaz3468
  2. Arnal LH, Giraud A-L (2012) Cortical oscillations and sensory predictions. Trends Cogn Sci 16:390–398. https://doi.org/10.1016/j.tics.2012.05.003
  3. Auksztulewicz R, Friston K (2015) Attentional enhancement of auditory mismatch responses: a DCM/MEG study. Cereb Cortex 25:4273–4283. https://doi.org/10.1093/cercor/bhu323
  4. Auksztulewicz R, Myers NE, Schnupp JW, Nobre AC (2019) Rhythmic temporal expectation boosts neural activity by increasing neural gain. J Neurosci 39:9806–9817. https://doi.org/10.1523/JNEUROSCI.0925-19.2019
  5. Auksztulewicz R, Schwiedrzik CM, Thesen T, Doyle W, Devinsky O, Nobre AC, Schroeder CE, Friston KJ, Melloni L (2018) Not all predictions are equal: “what” and “when” predictions modulate activity in auditory cortex through different mechanisms. J Neurosci 38:8680–8693. https://doi.org/10.1523/JNEUROSCI.0369-18.2018
  6. Bastos AM, Usrey WM, Adams RA, Mangun GR, Fries P, Friston KJ (2012) Canonical microcircuits for predictive coding. Neuron 76:695–711. https://doi.org/10.1016/j.neuron.2012.10.038
  7. Belardinelli P, Ortiz E, Barnes G, Noppeney U, Preissl H (2012) Source reconstruction accuracy of MEG and EEG Bayesian inversion approaches. PLoS One 7:e51985. https://doi.org/10.1371/journal.pone.0051985
  8. Bouwer FL, Honing H, Slagter HA (2020) Beat-based and memory-based temporal expectations in rhythm: similar perceptual effects, different underlying mechanisms. J Cogn Neurosci 32:1221–1241. https://doi.org/10.1162/jocn_a_01529
  9. Bouwer FL, Werner CM, Knetemann M, Honing H (2016) Disentangling beat perception from sequential learning and examining the influence of attention and musical abilities on ERP responses to rhythm. Neuropsychologia 85:80–90. https://doi.org/10.1016/j.neuropsychologia.2016.02.018
  10. Breska A, Ivry RB (2018) Double dissociation of single-interval and rhythmic temporal prediction in cerebellar degeneration and Parkinson’s disease. Proc Natl Acad Sci U S A 115:12283–12288. https://doi.org/10.1073/pnas.1810596115
  11. Cappotto D, Luo D, Lai HW, Peng F, Melloni L, Schnupp JWH, Auksztulewicz R (2023) “What” and “when” predictions modulate auditory processing in a mutually congruent manner. Front Neurosci 17:1180066. https://doi.org/10.3389/fnins.2023.1180066
  12. Caucheteux C, Gramfort A, King J-R (2023) Evidence of a predictive coding hierarchy in the human brain listening to speech. Nat Hum Behav 7:430–441. https://doi.org/10.1038/s41562-022-01516-2
  13. Celsis P, Boulanouar K, Doyon B, Ranjeva JP, Berry I, Nespoulous JL, Chollet F (1999) Differential fMRI responses in the left posterior superior temporal gyrus and left supramarginal gyrus to habituation and change detection in syllables and tones. Neuroimage 9:135–144. https://doi.org/10.1006/nimg.1998.0389
  14. Cunillera T, Càmara E, Toro JM, Marco-Pallares J, Sebastián-Galles N, Ortiz H, Pujol J, Rodríguez-Fornells A (2009) Time course and functional neuroanatomy of speech segmentation in adults. Neuroimage 48:541–553. https://doi.org/10.1016/j.neuroimage.2009.06.069
  15. David O, Harrison L, Friston KJ (2005) Modelling event-related responses in the brain. Neuroimage 25:756–770. https://doi.org/10.1016/j.neuroimage.2004.12.030
  16. de Cheveigné A, Simon JZ (2008) Denoising based on spatial filtering. J Neurosci Methods 171:331–339. https://doi.org/10.1016/j.jneumeth.2008.03.015
  17. Ding N, Melloni L, Zhang H, Tian X, Poeppel D (2016) Cortical tracking of hierarchical linguistic structures in connected speech. Nat Neurosci 19:158–164. https://doi.org/10.1038/nn.4186
  18. Ding N, Patel AD, Chen L, Butler H, Luo C, Poeppel D (2017) Temporal modulations in speech and music. Neurosci Biobehav Rev 81:181–187. https://doi.org/10.1016/j.neubiorev.2017.02.011
  19. Ding N, Simon JZ (2013) Power and phase properties of oscillatory neural responses in the presence of background activity. J Comput Neurosci 34:337–343. https://doi.org/10.1007/s10827-012-0424-6
  20. Doelling KB, Assaneo MF, Bevilacqua D, Pesaran B, Poeppel D (2019) An oscillator model better predicts cortical entrainment to music. Proc Natl Acad Sci U S A 116:10113–10121. https://doi.org/10.1073/pnas.1816414116
  21. Donhauser PW, Baillet S (2020) Two distinct neural timescales for predictive speech processing. Neuron 105:385–393.e9. https://doi.org/10.1016/j.neuron.2019.10.019
  22. Eklund A, Nichols TE, Knutsson H (2016) Cluster failure: why fMRI inferences for spatial extent have inflated false-positive rates. Proc Natl Acad Sci U S A 113:7900–7905. https://doi.org/10.1073/pnas.1602413113
  23. Federmeier KD (2007) Thinking ahead: the role and roots of prediction in language comprehension. Psychophysiology 44:491–505. https://doi.org/10.1111/j.1469-8986.2007.00531.x
  24. Fitzgerald K, Auksztulewicz R, Provost A, Paton B, Howard Z, Todd J (2021) Hierarchical learning of statistical regularities over multiple timescales of sound sequence processing: a dynamic causal modeling study. J Cogn Neurosci 33:1549–1562. https://doi.org/10.1162/jocn_a_01735
  25. Flandin G, Friston KJ (2019) Analysis of family-wise error rates in statistical parametric mapping using random field theory. Hum Brain Mapp 40:2052–2054. https://doi.org/10.1002/hbm.23839
  26. Friston K, Buzsáki G (2016) The functional anatomy of time: what and when in the brain. Trends Cogn Sci 20:500–511. https://doi.org/10.1016/j.tics.2016.05.001
  27. Friston KJ, Litvak V, Oswal A, Razi A, Stephan KE, van Wijk BCM, Ziegler G, Zeidman P (2016) Bayesian model reduction and empirical Bayes for group (DCM) studies. Neuroimage 128:413–431. https://doi.org/10.1016/j.neuroimage.2015.11.015
  28. Garrido MI, Friston KJ, Kiebel SJ, Stephan KE, Baldeweg T, Kilner JM (2008) The functional anatomy of the MMN: a DCM study of the roving paradigm. Neuroimage 42:936–944. https://doi.org/10.1016/j.neuroimage.2008.05.018
  29. Geiser E, Zaehle T, Jancke L, Meyer M (2008) The neural correlate of speech rhythm as evidenced by metrical speech processing. J Cogn Neurosci 20:541–552. https://doi.org/10.1162/jocn.2008.20029
  30. Gómez Varela I, Orpella J, Poeppel D, Ripolles P, Assaneo MF (2024) Syllabic rhythm and prior linguistic knowledge interact with individual differences to modulate phonological statistical learning. Cognition 245:105737. https://doi.org/10.1016/j.cognition.2024.105737
  31. Hakonen M, May PJC, Jääskeläinen IP, Jokinen E, Sams M, Tiitinen H (2017) Predictive processing increases intelligibility of acoustically distorted speech: behavioral and neural correlates. Brain Behav 7:e00789. https://doi.org/10.1002/brb3.789
  32. Har-Shai Yahav P, Zion Golumbic E (2021) Linguistic processing of task-irrelevant speech at a cocktail party. Elife 10:e65096. https://doi.org/10.7554/eLife.65096
  33. Heilbron M, Armeni K, Schoffelen J-M, Hagoort P, de Lange FP (2022) A hierarchy of linguistic predictions during natural language comprehension. Proc Natl Acad Sci U S A 119:e2201968119. https://doi.org/10.1073/pnas.2201968119
  34. Henson RN, Abdulrahman H, Flandin G, Litvak V (2019) Multimodal integration of M/EEG and f/MRI data in SPM12. Front Neurosci 13:300. https://doi.org/10.3389/fnins.2019.00300
  35. Hickok G (2012) The cortical organization of speech processing: feedback control and predictive coding the context of a dual-stream model. J Commun Disord 45:393–402. https://doi.org/10.1016/j.jcomdis.2012.06.004
  36. Hovsepyan S, Olasagasti I, Giraud A-L (2020) Combining predictive coding and neural oscillations enables online syllable recognition in natural speech. Nat Commun 11:3117. https://doi.org/10.1038/s41467-020-16956-5
  37. Ille N, Berg P, Scherg M (2002) Artifact correction of the ongoing EEG using spatial filters based on artifact and brain signal topographies. J Clin Neurophysiol 19:113–124. https://doi.org/10.1097/00004691-200203000-00002
  38. Ives DT, Smith DRR, Patterson RD (2005) Discrimination of speaker size from syllable phrases. J Acoust Soc Am 118:3816–3822. https://doi.org/10.1121/1.2118427
  39. Jalewa J, Todd J, Michie PT, Hodgson DM, Harms L (2021) Do rat auditory event related potentials exhibit human mismatch negativity attributes related to predictive coding? Hear Res 399:107992. https://doi.org/10.1016/j.heares.2020.107992
  40. Kawahara H (2006) STRAIGHT, exploitation of the other aspect of VOCODER: perceptually isomorphic decomposition of speech sounds. Acoust Sci Technol 27:349–353. https://doi.org/10.1250/ast.27.349
  41. Keitel A, Ince RAA, Gross J, Kayser C (2017) Auditory cortical delta-entrainment interacts with oscillatory power in multiple fronto-parietal networks. Neuroimage 147:32–42. https://doi.org/10.1016/j.neuroimage.2016.11.062
  42. Kilner JM, Kiebel SJ, Friston KJ (2005) Applications of random field theory to electrophysiology. Neurosci Lett 374:174–178. https://doi.org/10.1016/j.neulet.2004.10.052
  43. Klimovich-Gray A, Barrena A, Agirre E, Molinaro N (2021) One way or another: cortical language areas flexibly adapt processing strategies to perceptual and contextual properties of speech. Cereb Cortex 31:4092–4103. https://doi.org/10.1093/cercor/bhab071
  44. Kösem A, Bosker HR, Takashima A, Meyer A, Jensen O, Hagoort P (2018) Neural entrainment determines the words we hear. Curr Biol 28:2867–2875.e3. https://doi.org/10.1016/j.cub.2018.07.023
  45. Kotz SA, Stockert A, Schwartze M (2014) Cerebellum, temporal predictability and the updating of a mental model. Phil Trans R Soc Lond B 369:20130403. https://doi.org/10.1098/rstb.2013.0403
  46. Ling X, Sun P, Zhao L, Jiang S, Lu Y, Cheng X, Guo X, Zhu X, Zheng L (2022) Neural basis of the implicit learning of complex artificial grammar with nonadjacent dependencies. J Cogn Neurosci 34:2375–2389. https://doi.org/10.1162/jocn_a_01910
  47. Little S, Bonaiuto J, Meyer SS, Lopez J, Bestmann S, Barnes G (2018) Quantifying the performance of MEG source reconstruction using resting state data. Neuroimage 181:453–460. https://doi.org/10.1016/j.neuroimage.2018.07.030
  48. Litvak V, Friston K (2008) Electromagnetic source reconstruction for group studies. Neuroimage 42:1490–1498. https://doi.org/10.1016/j.neuroimage.2008.06.022
  49. Litvak V, Mattout J, Kiebel S, Phillips C, Henson R, Kilner J, Barnes G, Oostenveld R, Daunizeau J, Flandin G (2011) EEG and MEG data analysis in SPM8. Comput Intell Neurosci 2011:852961. https://doi.org/10.1155/2011/852961
  50. López-Barroso D, Catani M, Ripollés P, Dell’Acqua F, Rodríguez-Fornells A, de Diego-Balaguer R (2013) Word learning is mediated by the left arcuate fasciculus. Proc Natl Acad Sci U S A 110:13168–13173. https://doi.org/10.1073/pnas.1301696110
  51. Lückmann HC, Jacobs HIL, Sack AT (2014) The cross-functional role of frontoparietal regions in cognition: internal attention as the overarching mechanism. Prog Neurobiol 116:66–86. https://doi.org/10.1016/j.pneurobio.2014.02.002
  52. Lumaca M, Trusbak Haumann N, Brattico E, Grube M, Vuust P (2019) Weighting of neural prediction error by rhythmic complexity: a predictive coding account using mismatch negativity. Eur J Neurosci 49:1597–1609. https://doi.org/10.1111/ejn.14329
  53. Mai G, Wang WS-Y (2023) Distinct roles of delta- and theta-band neural tracking for sharpening and predictive coding of multi-level speech features during spoken language processing. Hum Brain Mapp 44:6149–6172. https://doi.org/10.1002/hbm.26503
  54. Merchant H, Grahn J, Trainor L, Rohrmeier M, Fitch WT (2015) Finding the beat: a neural perspective across humans and non-human primates. Phil Trans R Soc Lond B 370:20140093. https://doi.org/10.1098/rstb.2014.0093
  55. Merchant H, Honing H (2013) Are non-human primates capable of rhythmic entrainment? Evidence for the gradual audiomotor evolution hypothesis. Front Neurosci 7:274. https://doi.org/10.3389/fnins.2013.00274
  56. Morillon B, Arnal LH, Schroeder CE, Keitel A (2019) Prominence of delta oscillatory rhythms in the motor cortex and their relevance for auditory and speech perception. Neurosci Biobehav Rev 107:136–142. https://doi.org/10.1016/j.neubiorev.2019.09.012
  57. Morillon B, Schroeder CE, Wyart V, Arnal LH (2016) Temporal prediction in lieu of periodic stimulation. J Neurosci 36:2342–2347. https://doi.org/10.1523/JNEUROSCI.0836-15.2016
  58. Obleser J, Kayser C (2019) Neural entrainment and attentional selection in the listening brain. Trends Cogn Sci 23:913–926. https://doi.org/10.1016/j.tics.2019.08.004
  59. Oganian Y, Chang EF (2019) A speech envelope landmark for syllable encoding in human superior temporal gyrus. Sci Adv 5:eaay6279. https://doi.org/10.1126/sciadv.aay6279
  60. Oganian Y, Kojima K, Breska A, Cai C, Findlay A, Chang E, Nagarajan SS (2023) Phase alignment of low-frequency neural activity to the amplitude envelope of speech reflects evoked responses to acoustic edges, not oscillatory entrainment. J Neurosci 43:3909–3921. https://doi.org/10.1523/JNEUROSCI.1663-22.2023
  61. Orpella J, Ripollés P, Ruzzoli M, Amengual JL, Callejas A, Martinez-Alvarez A, Soto-Faraco S, de Diego-Balaguer R (2020) Integrating when and what information in the left parietal lobe allows language rule generalization. PLoS Biol 18:e3000895. https://doi.org/10.1371/journal.pbio.3000895
  62. Petersson K-M, Folia V, Hagoort P (2012) What artificial grammar learning reveals about the neurobiology of syntax. Brain Lang 120:83–95. https://doi.org/10.1016/j.bandl.2010.08.003
  63. Poeppel D, Assaneo MF (2020) Speech rhythms and their neural foundations. Nat Rev Neurosci 21:322–334. https://doi.org/10.1038/s41583-020-0304-4
  64. Reiche M, Hartwigsen G, Widmann A, Saur D, Schröger E, Bendixen A (2013) Involuntary attentional capture by speech and non-speech deviations: a combined behavioral-event-related potential study. Brain Res 1490:153–160. https://doi.org/10.1016/j.brainres.2012.10.055
  65. Riecke L, Formisano E, Sorger B, Başkent D, Gaudrain E (2018) Neural entrainment to speech modulates speech intelligibility. Curr Biol 28:161–169.e5. https://doi.org/10.1016/j.cub.2017.11.033
  66. Rimmele JM, Sun Y, Michalareas G, Ghitza O, Poeppel D (2023) Dynamics of functional networks for syllable and word-level processing. Neurobiol Lang 4:120–144. https://doi.org/10.1162/nol_a_00089
  67. Rosch RE, Auksztulewicz R, Leung PD, Friston KJ, Baldeweg T (2019) Selective prefrontal disinhibition in a roving auditory oddball paradigm under N-methyl-D-aspartate receptor blockade. Biol Psychiatry Cogn Neurosci Neuroimaging 4:140–150. https://doi.org/10.1016/j.bpsc.2018.07.003
  68. Rothermich K, Kotz SA (2013) Predictions in speech comprehension: fMRI evidence on the meter-semantic interface. Neuroimage 70:89–100. https://doi.org/10.1016/j.neuroimage.2012.12.013
  69. Ryskin R, Nieuwland MS (2023) Prediction during language comprehension: what is next? Trends Cogn Sci 27:1032–1052. https://doi.org/10.1016/j.tics.2023.08.003
  70. Schmitt L-M, Erb J, Tune S, Rysop AU, Hartwigsen G, Obleser J (2021) Predicting speech from a cortical hierarchy of event-based time scales. Sci Adv 7:eabi6070. https://doi.org/10.1126/sciadv.abi6070
  71. Seghier ML (2023) Multiple functions of the angular gyrus at high temporal resolution. Brain Struct Funct 228:7–46. https://doi.org/10.1007/s00429-022-02512-y
  72. Sengupta P, Burgaleta M, Zamora-López G, Basora A, Sanjuán A, Deco G, Sebastian-Galles N (2019) Traces of statistical learning in the brain’s functional connectivity after artificial language exposure. Neuropsychologia 124:246–253. https://doi.org/10.1016/j.neuropsychologia.2018.12.001
  73. Shain C, Blank IA, van Schijndel M, Schuler W, Fedorenko E (2020) fMRI reveals language-specific predictive coding during naturalistic sentence comprehension. Neuropsychologia 138:107307. https://doi.org/10.1016/j.neuropsychologia.2019.107307
  74. Siegel M, Donner TH, Engel AK (2012) Spectral fingerprints of large-scale neuronal interactions. Nat Rev Neurosci 13:121–134. https://doi.org/10.1038/nrn3137
  75. Sohoglu E, Davis MH (2016) Perceptual learning of degraded speech by minimizing prediction error. Proc Natl Acad Sci U S A 113:E1747–E1756. https://doi.org/10.1073/pnas.1523266113
  76. Sokoliuk R, Degano G, Banellis L, Melloni L, Hayton T, Sturman S, Veenith T, Yakoub KM, Belli A, Noppeney U, Cruse D (2021) Covert speech comprehension predicts recovery from acute unresponsive states. Ann Neurol 89:646–656. https://doi.org/10.1002/ana.25995
  77. Stephan KE, Penny WD, Moran RJ, den Ouden HEM, Daunizeau J, Friston KJ (2010) Ten simple rules for dynamic causal modeling. Neuroimage 49:3099–3109. https://doi.org/10.1016/j.neuroimage.2009.11.015
  78. Stolk A, Todorovic A, Schoffelen J-M, Oostenveld R (2013) Online and offline tools for head movement compensation in MEG. Neuroimage 68:39–48. https://doi.org/10.1016/j.neuroimage.2012.11.047
  79. Su Y, MacGregor LJ, Olasagasti I, Giraud A-L (2023) A deep hierarchy of predictions enables online meaning extraction in a computational model of human speech comprehension. PLoS Biol 21:e3002046. https://doi.org/10.1371/journal.pbio.3002046
  80. Takegata R, Morotomi T (1999) Integrated neural representation of sound and temporal features in human auditory sensory memory: an event-related potential study. Neurosci Lett 274:207–210. https://doi.org/10.1016/S0304-3940(99)00711-9
  81. Teki S, Grube M, Kumar S, Griffiths TD (2011) Distinct neural substrates of duration-based and beat-based auditory timing. J Neurosci 31:3805–3812. https://doi.org/10.1523/JNEUROSCI.5561-10.2011
  82. Todd J, Petherbridge A, Speirs B, Provost A, Paton B (2018) Time as context: the influence of hierarchical patterning on sensory inference. Schizophr Res 191:123–131. https://doi.org/10.1016/j.schres.2017.03.033
  83. Todorovic A, Auksztulewicz R (2021) Dissociable neural effects of temporal expectations due to passage of time and contextual probability. Hear Res 399:107871. https://doi.org/10.1016/j.heares.2019.107871
  84. Vrba J, Robinson SE (2001) Signal processing in magnetoencephalography. Methods 25:249–271. https://doi.org/10.1006/meth.2001.1238
  85. Wilson B, Kikuchi Y, Sun L, Hunter D, Dick F, Smith K, Thiele A, Griffiths TD, Marslen-Wilson WD, Petkov CI (2015) Auditory sequence processing reveals evolutionarily conserved regions of frontal cortex in macaques and humans. Nat Commun 6:8901. https://doi.org/10.1038/ncomms9901
  86. Wipf D, Nagarajan S (2009) A unified Bayesian framework for MEG/EEG source imaging. Neuroimage 44:947–966. https://doi.org/10.1016/j.neuroimage.2008.02.059
  87. Yabe H, Tervaniemi M, Reinikainen K, Näätänen R (1997) Temporal window of integration revealed by MMN to sound omission. Neuroreport 8:1971–1974. https://doi.org/10.1097/00001756-199705260-00035
  88. Zatorre RJ (2022) Hemispheric asymmetries for music and speech: spectrotemporal modulations and top-down influences. Front Neurosci 16:1075511. https://doi.org/10.3389/fnins.2022.1075511
  89. Zoefel B, Archer-Boyd A, Davis MH (2018) Phase entrainment of brain oscillations causally modulates neural responses to intelligible speech. Curr Biol 28:401–408.e5. https://doi.org/10.1016/j.cub.2017.11.071
  90. Zou J, Xu C, Luo C, Jin P, Gao J, Li J, Gao J, Ding N, Luo B (2021) θ-Band cortical tracking of the speech envelope shows the linear phase property. eNeuro 8:ENEURO.0058-21.2021. https://doi.org/10.1523/ENEURO.0058-21.2021
Keywords

  • audition
  • effective connectivity
  • magnetoencephalography
  • predictive processing
  • source reconstruction
  • speech processing
