Abstract
Human listeners possess an innate capacity to discern patterns within rapidly unfolding sensory input. Core questions, guiding ongoing research, focus on the mechanisms through which these representations are acquired and whether the brain prioritizes or suppresses predictable sensory signals. Previous work, using fast auditory sequences (tone-pips presented at a rate of 20 Hz), revealed sustained response effects that appear to track the dynamic predictability of the sequence. Here, we extend the investigation to slower sequences (4 Hz), permitting the isolation of responses to individual tones. Stimuli were 50 ms tone-pips, ordered into random (RND) and regular (REG; a repeating pattern of 10 frequencies) sequences; Two timing profiles were created: in “fast” sequences, tone-pips were presented in direct succession (20 Hz); in “slow” sequences, tone-pips were separated by a 200 ms silent gap (4 Hz). Naive participants (N = 22; both sexes) passively listened to these sequences, while brain responses were recorded using magnetoencephalography (MEG). Results unveiled a heightened magnitude of sustained brain responses in REG when compared to RND patterns. This manifested from three tones after the onset of the pattern repetition, even in the context of slower sequences characterized by extended pattern durations (2,500 ms). This observation underscores the remarkable implicit sensitivity of the auditory brain to acoustic regularities. Importantly, brain responses evoked by single tones exhibited the opposite pattern—stronger responses to tones in RND than REG sequences. The demonstration of simultaneous but opposing sustained and evoked response effects reveals concurrent processes that shape the representation of unfolding auditory patterns.
Significance Statement
Humans excel at detecting predictable patterns within sound sequences, a process crucial for listening, language processing, and music appreciation. However, questions persist about the underlying neural mechanisms and the specific information monitored by the brain. Our study addresses these questions by analyzing magnetoencephalography (MEG) signals from participants exposed to predictable and unpredictable tone-pip patterns. We found that the MEG signal simultaneously captures two crucial aspects of predictability tracking. Firstly, sustained MEG activity, tracking the sequence's evolution, dynamically assesses pattern predictability, shedding light on how the brain evaluates reliability. Secondly, phasic MEG activity, reflecting responses to individual events, shows reduced activity to predictable tones, aligning with the idea that the brain anticipates and efficiently encodes upcoming events in predictable contexts.
Introduction
The physical rules that govern the environment and impose constraints on its agents result in statistically structured, predictable sensory signals. The brain is hypothesized to have developed the capacity to rapidly detect and track the regularities within these signals (de Lange et al., 2018; Press et al., 2020). This ability plays a crucial role in the comprehension of our surroundings, facilitating efficient recognition and processing of incoming information, to empower us to respond rapidly and adaptively to changing circumstances.
The auditory system, in particular, has demonstrated remarkable tuning to regularities across various time scales and dimensions (Bendixen, 2014; Carbajal and Malmierca, 2018; Heilbron and Chait, 2018; Asokan et al., 2021; Fitzgerald and Todd, 2020). This plays a crucial role in our ability to understand spoken language (Arnal and Giraud, 2012), appreciate the nuances of musical compositions (Koelsch et al., 2019) and make sense of the complex soundscape that surrounds us. However, core questions regarding the mechanisms through which regularity is discovered and tracked remain elusive. In particular, pivotal issues revolve around whether the brain chooses to prioritize or suppress predictable sensory signals (Press et al., 2020).
Barascud et al. (2016) provided insight into the brain's automatic ability to detect the emergence of predictable acoustic structure by examining low-frequency activity in the M/EEG signal (Sohoglu and Chait, 2016; Southwell et al., 2017; Herrmann and Johnsrude, 2018; Herrmann et al., 2019; Zhao et al., 2024). Using rapidly unfolding (20 Hz) tone-pip sequences that contained transitions from a random (RND) to a regularly repeating pattern (REG), they observed that a gradual increase in sustained power accompanies the emergence of repeating structures. The timing of the differentiation between REG and RND sequences (3 tones after the first cycle) was consistent with that predicted by an ideal observer model (Pearce, 2005; Harrison et al., 2020), demonstrating statistically efficient processing of structure even when not behaviorally relevant (Barascud et al., 2016).
The sustained response effect is interesting for several reasons: Firstly, it suggests that the brain encodes the inherent state of the stimulus (RND vs REG) rather than merely registering changes in the environment. Secondly, the observed increase in sustained power during structure discovery challenges our understanding of how the brain processes and represents predictability. Specifically, it appears to contradict expectations derived from predictive coding frameworks (e.g., Rao and Ballard, 1999; Friston, 2005, 2009), where predictable information is typically associated with reduced neural activity, as the brain can efficiently encode and predict upcoming events (de Lange et al., 2018). Barascud et al. showed that the sustained response, underpinned by activation in the auditory cortex (AC), hippocampus (HP), and inferior frontal gyrus (IFG), increases with the predictability of the ongoing stimulus sequence. This prompted the hypothesis that it might reflect the process of tracking the inferred reliability of the unfolding input (“precision”; the accuracy, or conversely the “expected uncertainty” with which future inputs can be predicted, Yon and Frith, 2021; O’Reilly et al., 2013) whereby predictable sensory streams are associated with heightened sensitivity (see also Zhao et al., 2024).
Several issues need to be addressed for a better interpretation of the sustained response. Firstly, it is important to consider that the effects observed may be specific to the rapid sequences used in Barascud et al. (2016). Other research (e.g., reviewed by de Lange et al., 2018; Heilbron and Chait, 2018) has focused on slower patterns, which may elicit different neural responses. Secondly, it is crucial to determine whether the observed effect primarily reflects a shift in background neural activity or if it also extends to modulations of responses to individual events due to their integration within the structured sequence.
To address these questions, the current study expands upon the original stimulus by introducing silent gaps between successive tones (Figs. 1, 2). We aim to explore the generality of the sustained-response effects across different temporal scales and provide a clearer understanding of the mechanisms involved in the processing of structured auditory sequences.
Materials and Methods
Experiment 1—online behavioral study
The behavioral study was designed to probe how the introduction of silent gaps between tones affects explicit pattern detection. We sought to pinpoint an optimal gap duration that is sufficiently long to allow us to isolate responses to individual tones, yet brief enough to maintain high-performance levels in pattern detection.
Stimuli
Stimuli were sequences of 50 ms tone-pips (gated on and off with 5 ms raised cosine ramps) drawn from a pool of 20 values equally spaced on a logarithmic scale between 222 and 2,000 Hz (12% steps). The order in which these tone-pips were successively distributed defined two different sequence types. RND sequences consisted of 20 tone-pips (sampled from the full pool) arranged in random order. Each tone-pip occurred equi-probably across the sequence duration. RNDREG sequences contained a transition between a RND sequence, and a regularly repeating pattern (REG). REG consisted of 10 different tone-pips, randomly chosen from the full pool on each trial and repeated in three identical cycles. The RND to REG transition always occurred after 30 tone-pips. Opting for this method, as opposed to a variable transition time, ensured a consistent context (in terms of frequency information available) both preceding each transition and across different gap duration conditions. RND and RNDREG sequences were generated anew for each trial and presented equi-probably throughout the experiment. Therefore, the occurrence of a transition in any given trial was unpredictable. The amplitude of each tone-pip was normalized to yield an approximately similar perceived loudness (Moore, 2014). Across blocks, the inter-ton intervals were manipulated to form four conditions (Fig. 1A): Gap0 (continuous presentation), Gap100 (a 100 ms gap inserted between tones), Gap200 (a 200 ms gap inserted between tones), Gap500 (a 500 ms gap inserted between tones).
Two control stimuli were also included: sequences of contiguous (no silent gap) tone-pips of a fixed frequency (CONT) that lasted 4,000 ms, and sequences with a step change in frequency partway through the trial (STEP, the change always occurred after 2,000 ms). These were used to measure individuals’ response time to simple acoustic changes and served as “catch trials” to assess task engagement.
Procedure
The experiment was implemented online using the Gorilla Experiment Builder (www.gorilla.sc). Before the main task, participants completed a headphone screening task (Milne et al., 2020) to ensure they were using appropriate audio equipment. They then received an explanation of the task and completed a practice session. Due to length constraints, the experiment was divided into two parts, performed by two different groups of participants. Experiment 1a contained the Gap0, Gap100, and Gap200 conditions along with the control stimuli (STEP and CONT; see above). Experiment 1b contained the Gap0, Gap100, and Gap500 conditions, along with the control stimuli.
Participants were instructed to respond, by pressing a keyboard button, as soon as possible once they had detected a RNDREG transition or a STEP. To motivate participants to focus on the task, they were given feedback on their accuracy and speed after each trial. A small monetary bonus was given for each correct response (Bianco et al., 2021).
In each experiment, three blocks of 40 trials were delivered. Each block contained the following sequence types: 15 RNDREG, 15 RND, 5 STEP, and 5 CONT. The first block always presented the Gap0 condition. This block lasted 5 min. Thereafter, listeners completed the other two blocks (Gap100 and Gap200 in experiment 1a, Gap100 and Gap500 in experiment 1b) in random order. Starting with Gap0 ensured that all participants experienced the easiest condition first and had adequate opportunity to practice the regularity detection task, reducing the likelihood of frustration and dropout that may occur if participants are immediately faced with the most difficult condition. The main task in experiment 1a lasted about 20 min, and that in experiment 1b lasted about 30 min.
Participant rejection criteria
Previous work (Barascud et al., 2016; Bianco et al., 2020) demonstrated that participants are sensitive to the emergence of regularity in RNDREG sequences, exhibiting high sensitivity and rapid detection time (usually responding within two regularity cycles). Due to the online nature of the present experiments and associated reduced control over participants’ environments, equipment, and engagement (Bianco et al., 2021), it was important to implement a series of rejection criteria to make sure that data reflect true sequence tracking sensitivity. Therefore, subject data were excluded from the experiment following the below (a priori determined) criteria:
Failure on the headphone screen: We used the task introduced by Milne et al. (2020). Participants who did not pass the screening procedure did not proceed to the main experiment.
Low performance in the practice run: To ensure participants understand the task, 24 trials with no gap (10 RNDREG, 10 RND, 2 CONT, and 2 STEP) were given. Participants did not proceed to the main task if their correct response rate was below 80% in the practice task (see also Bianco et al., 2023). This ensured that those participants who proceeded to the main experiment could detect the REG transitions, thus allowing us to focus on how performance is affected by increasing the gaps between tones. Our previous experience with similar stimuli in lab settings (see e.g., Barascud et al., 2016; Bianco et al., 2020) suggests that the vast majority of young participants can achieve ceiling performance. We, therefore, reasoned that those online participants who performed below 80% are likely not sufficiently engaged with the task (i.e., distracted, not following instructions, etc.).
Of those participants who completed the full experiment, we rejected the data from those subjects who failed to respond to STEP trials (allowing at most one miss per block) or whose RT to STEP trials fell above 2 STDEV relative to the group mean. Failure to respond quickly to the (easy) STEP trials indicated low task engagement.
Participants
Two participant groups were recruited via the Prolific platform (https://www.prolific.co/). Based on previous work with a similar task (Barascud et al., 2016; Bianco et al., 2020), we aimed to recruit N = 25 valid datasets per group.
In Experiment 1a, 43 participants completed the experiment. Data from 14 participants were rejected due to failure to respond to STEP trials or because responses to STEP trials were too slow (rejection criterion #3, above). Data from 29 subjects are included in the analysis below (seven females; average age, 24.3 ± 4.79 years). Additionally, 42 subjects did not proceed to the main task due to not passing the pre-determined performance threshold in the practice task (rejection criterion #2, above). This number is much higher than that normally encountered in the lab (e.g., Bianco et al., 2020) and likely reflects variable engagement by online participants.
In experiment 1b, 34 participants completed the experiment. Data from seven participants were rejected due to failure to respond to STEP trials or because responses to STEP trials were too slow (rejection criterion #3, above). Data from 27 subjects are included in the analysis below (six females; average age: 22 ± 4.69 years). Additionally, 13 subjects did not proceed to the main task due to low performance in the practice task (rejection criterion #2, above).
In both experiments, about 30% of the participants who initially accessed the experiment did not pass the headphone screen and therefore did not proceed further. This is a similar fail rate to that reported in Milne et al. (2020).
Experiment 2—MEG in naive passively listening participants
Stimuli
Stimuli (Fig. 2) were generated similarly to those in Experiment 1. To reduce the duration of the (passive listening) MEG experiment, we focused on REG and RND sequences, without transitions. Sensitivity to regularity is investigated by comparing brain responses to the onset of REG and RND sequences. During the initial portion of the sequence (first cycle in REG), responses to the two sequence types should be identical, with differences emerging as soon as the auditory system has discovered that the pattern is repeating. Ideal observer modeling (Barascud et al., 2016; Harrison et al., 2020) suggests that about 3 tones, following the first cycle, are needed for the transition to be statistically detectable. REG sequences were generated by randomly selecting (without replacement) 10 frequencies from the pool and iterating that order to create a regularly repeating pattern. RND sequences consisted of a random succession of 10 tones, newly selected on each trial. All stimuli contained 60 tone-pips. Two timing conditions were used: in “Fast” sequences, tone-pips were presented in direct succession (20 Hz rate; 500 ms REG cycle duration; 3 s overall sequence duration); in “Slow” sequences, tone-pips were separated by a 200 ms silent gap (4 Hz rate; 2,500 ms REG cycle duration; 15 s overall sequence duration). One hundred instances of each condition were presented. Sequences were generated anew for each trial such that each stimulus was created of the same frequency “building blocks” (random selection of 10 out of 20 frequencies). Condition presentation was fully randomized.
Procedure
The experiment was controlled with the Psychophysics Toolbox extension in MATLAB (Kleiner et al., 2007). All auditory stimuli were presented binaurally via tube earphones (EARTONE 3A 10 Ω; Etymotic Research) inserted into the ear canal, with the volume set at a comfortable listening level, adjusted for each participant.
The experiment lasted 40 min. Participants listened passively to the stimuli (presented in random order with an ISI jittered between 1,400 and 1,800 ms) and engaged in an incidental visual task. The task consisted of landscape images, grouped in triplets (the duration of each image was 5 s, with 2 s ISI between trials during which the screen was blank). Participants were instructed to fixate on a cross in the center of the screen and press a button whenever the third image was identical to the first image (10% trials). The visual task served as a decoy task for diverting subjects’ attention away from the auditory stimuli. Participants were naive to the nature of the auditory stimuli and encouraged to focus on the visual task. Feedback was displayed at the end of each block. The experimental session was divided into six 12 min blocks. Participants were allowed a short break between blocks but were required to remain still.
Participants
Twenty-three naive subjects participated in the study. One participant's data were discarded due to excessive noise in the data. Data from 22 participants (11 females; average age, 25.14 ± 4.61 years) are reported below.
Data recording and pre-processing
Magnetic signals were recorded using CTF-275 MEG system (axial gradiometers, 274 channels; 30 reference channels; VSM MedTech). The acquisition was continuous, with a sampling rate of 600 Hz. Offline low-pass filtering was applied at 30 Hz (all filtering in this study was performed with a two-pass, Butterworth filter with zero phase shift). All pre-processing and time domain analyses were performed using the fieldtrip toolbox (Oostenveld et al., 2011). To analyze time domain data, we selected the 40 most responsive channels for each subject. This was done by collapsing across all conditions and identifying the M100 component of the onset response (Näätänen and Picton, 1987; Stufflebeam et al., 1998; Näätänen et al., 2011; Gorina-Careta et al., 2021), as a source–sink pair located over the temporal region of each hemisphere. For each subject, the 40 most strongly activated channels at the peak of the M100 (20 in each hemisphere; 10 in each sink/source) were considered to best reflect auditory activity and thus selected for all subsequent time-domain analyses. This procedure served the dual purpose of enhancing the relevant response components and compensating for any channel misalignment between subjects.
We report two time-domain analysis pipelines.
Whole sequence analysis
Initially, we focused on responses to the entire sequence. Low-frequency activity is of prime importance as a possible marker of predictability tracking (Barascud et al., 2016; Southwell et al., 2017). Therefore, no high-pass filter was used. Data were segmented into epochs from 200 ms before onset to 1,000 ms post offset (yielding epochs of 4,200 ms and 16,200 ms in “Fast” and “Slow” conditions, respectively). Epochs containing artifacts were removed (based on within trial variance summary statistics) using Fieldtrip's manual visual artifact rejection function. Around 5% of epochs, containing large shifts in magnetic activity, were removed from each subject's data (range 0–10%). The remaining epochs were then averaged by condition. To help denoise the data from “Slow” conditions (low-frequency drift artifacts) denoising source separation (DSS) analysis was applied to maximize reproducibility across trials (Särelä and Valpola, 2005; de Cheveigné and Simon, 2008; de Cheveigné and Parra, 2014). For each subject, the three most significant components (i.e., the three “most reproducible” components across trials) were kept and projected back into sensor space.
The single-tone response analysis
A subsequent analysis focused on responses to individual tones in REG versus RND sequences in the “Slow” sequences. To identify activity associated with individual tone-evoked responses which might be masked by the sustained activity, the raw data were high-pass filtered at 2 Hz. Filtered data were then cut into individual tone epochs, from 50 ms before the onset of the tone, to 200 ms post onset. Responses from tones within each cycle were averaged, yielding six time series per condition per subject (tones in Cycle#1, Cycle#2, etc.). Time series were baselined based on pre-tone onset activity.
Statistical analysis
The time domain data are summarized as root mean square (RMS) across the 40 selected channels for each subject (see above). The RMS is a useful summary signal, reflecting the instantaneous power of the neural response irrespective of its polarity. Group RMS (RMS of individual subject RMSs) is plotted; statistical analysis was always performed across subjects.
To evaluate differences between conditions (RND vs REG), the RMS differences at each time point were computed for each participant, and a bootstrap re-sampling (Efron and Tibshirani, 1998) was applied (10,000 iterations) on the entire epoch. Significance was inferred by inspecting the proportion of bootstrap iterations that fell above or below zero, here p = 0.01 (and p = 0.05 for the sustained response for “Slow” sequences; Fig. 4) was used as a threshold.
Source analysis
To estimate the brain sources that underlie the observed time domain effects at the sensor level, we performed source reconstruction using the standard approach implemented in SPM12 (Litvak and Friston, 2008; López et al., 2014; Bartha-Doering et al., 2015) Sensor-level data were converted from Fieldtrip to SPM. By using three fiducial marker locations, the data were co-registered to a generic 8196-vertex inverse-normalized canonical mesh warped to match the SPM's template head model based on the MNI brain (Ashburner and Friston, 2005). This had the advantage of providing a one-to-one mapping between the individual's source-space and the template space, facilitating group analyses (Litvak and Friston, 2008). The forward model was solved with a single shell forward head model for all subjects. Source reconstruction was performed using the multiple sparse priors (MSP) model that assumes that activity can be expressed in multiple patches or covariance components, each of which has an associated hyperparameter (Litvak and Friston, 2008; López et al., 2014; Bartha-Doering et al., 2015). These were optimized with greedy search (GS) technique (Litvak and Friston, 2008) by iterating over successive partitions of MSP to find the set yielding the best fit (here we specify a total of 512 dipoles). The MSP model was used to identify distributed sources of brain activity; hence, the two conditions (RND and REG) were inverted together.
We were interested in capturing the sources underlying two aspects of the data:
The discovery of regularity (REG vs RND). This analysis was performed on the “Fast” sequence evoked response. The analysis used DSSed data (de Cheveigné and Parra, 2014), with the three most reproducible components projected back into sensor space and used for the inversion. Trials were averaged by condition and the inverse estimates were obtained for the two conditions together using an interval of 300 ms between 665 and 965 ms poststimulus onset. The interval was chosen to coincide with the timing of divergence between the REG and RND conditions as seen in the time domain analysis (Fig. 3). An attempt was made to analyze the “Slow” sequences (between 3,500 and 6,000 ms post stimulus onset, coinciding with the timing of divergence between REG and RND conditions), but no significant sources were identified. This lack of findings can be attributed to several factors, primarily the weaker sustained response effect (see below). Memory constraints probably further exacerbated the issue, resulting in substantial variability across participants when tracking the slow sequences. Furthermore, the opposing effects observed for the sustained and tone-evoked responses (see “results”) likely contributed to a net cancellation of effects, making it challenging for the source model to discern meaningful patterns in the “slow” sequence evoked activity.
The effect of regularity (REG vs RND) on the individual tone responses in “Slow” sequences. A similar analysis pipeline as that described above was used. This analysis focused on the interval between 5 and 15 s—from the third cycle of the REG until offset, i.e., where the regularity in REG stimuli was well established (theoretically, and, as seen in the time domain data, regularity is discovered partway through the second cycle and well established by the third cycle). The filtered raw signal (2–30 Hz), epoched over 0–200 ms post tone onset and averaged across tone presentations, was used for the inversion. The interval was chosen to coincide with the largest possible time window post tone onset to allow the algorithm to encompass all brain sources responsible for generating the response (Henson et al., 2011).
After inversion, source activity for each condition was projected to a three-dimensional source space and smoothed [12-mm full width at half maximum (FWHM) Gaussian smoothing kernel] to create Neuroimaging Informatics Technology Initiative (NIfTI) images of source activity for each subject. At the second level of statistical analysis, the two conditions (REG vs RND) were modeled with the within-subject factor Regularity (REG/RND). Statistical maps of the contrast were thresholded at a level of p = 0.05 uncorrected (F contrasts) across the whole-brain volume. Relevant brain regions were identified using the AAL3 toolbox (https://www.oxcns.org/aal3.html).
Results
Behavioral performance reveals good sensitivity to regularity even following the introduction of silent gaps between tones
We tested how pattern detection ability is affected by the introduction of a silent gap of increasing length between successive tone-pips. Figure 1B shows performance (quantified as d’ sensitivity score) for each condition in experiments 1a and 1b. With increasing gap duration, an overall gradual worsening of performance was observed. A repeated measures ANOVA over the three gap duration conditions in experiment 1a confirmed a main effect of condition [F(2, 56) = 3.814, η2 = 0.123, p = 0.026]. Post hoc tests (Bonferroni corrected) indicated a significant difference between Gap0 and Gap100 conditions [p = 0.034] and between Gap0 and Gap200 conditions [p = 0.026]. No difference between Gap100 and Gap200 was seen [p = 1]. In general, most participants achieved a d’ above 2 in the Gap200 condition, revealing a largely conserved sensitivity even though the duration of the pattern increased five-fold from 500 ms in Gap0 to 2,500 ms in Gap200. In experiment 1b we further tested the performance for silence gaps of 500 ms. A repeated measures ANOVA with factor Gap (0, 100, 500 ms) confirmed a main effect of condition [F(2,52) = 33.687, η2 = 0.564, p < 0.001]. Post hoc (Bonferroni corrected) comparisons indicated significantly worse performance in Gap100 [p = 0.025] and Gap500 [p < 0.001] compared to Gap0, and between Gap100 and Gap500 [p < 0.001].
Overall, the pattern of results is consistent with a slow decline in performance for gaps up to 200 ms and a steeper drop thereafter. We, therefore, selected the 200 ms gap duration for the MEG experiments (in naive distracted listeners) below.
The emergence of regularity is associated with an increase in sustained MEG activity
The Group RMS (mean of all subjects’ RMSs) of the evoked response to the “Fast” sequences is shown in Figure 3A,B. The brain response presents prototypical onset activity, followed by a subsequent rise to a sustained response that persists until offset. A pronounced offset response is seen about 100 ms after sound cessation. Fluctuations at 20 Hz, reflecting the tone presentation rate, are visible in the sustained portion of the response. In line with previous observations (Barascud et al., 2016; Southwell et al., 2017; Southwell and Chait, 2018), REG shows an increased sustained response when compared with RND. The timing at which the response to REG diverges from RND is considered to reflect the information required to detect the regularity. A significant difference between conditions emerged after 665 ms (13 tone-pips, 1.3 cycles). This estimate is consistent with previous modeling work (Barascud et al., 2016; Harrison et al., 2020) which demonstrated that an ideal observer model required 3–4 tones following the first cycle to detect the emergence of regularity.
Figure 3C displays the source analysis, applied over a 300 ms interval over which the REG and RND conditions begin to diverge (yellow shading in Fig. 3A). The activation map (F contrast, REG > RND, p = 0.05, see Table 1) demonstrates increased activity in the AC (bilaterally), IFG (bilaterally) and HP (right hemisphere only). No areas were identified by using the opposite (RND > REG, p = 0.05) contrast. Overall, the source data are largely consistent with what was previously shown by Barascud et al. (2016) for similar stimuli, confirming a distributed network spanning auditory, frontal and hippocampal sources which underlies sensitivity to regular patterns.
Responses to the “Slow” (Gap200) sequences are shown in Figure 4A. Pronounced fluctuations at 4 Hz, reflecting the tone presentation rate, are clearly visible on top of the sustained response. Similar to what was observed for the “Fast” sequences, a difference in sustained response emerges between REG and RND when the REG pattern begins to repeat (after 2,500 ms). This effect is much smaller, however. To separate the sustained response from phasic activity associated with tone-evoked responses, the data were low-pass filtered (0–2 Hz; Fig. 4B). A significant difference between conditions emerged after 13 tones (3,266 ms) consistent with the observations from the “Fast” sequence above. This suggests that irrespective of the rate at which tones are presented (at least within the range tested here), regularity detection requires a constant amount of information (as measured in number of tones pips). However, it is notable that the sustained difference between REG and RND conditions in the “Slow” sequences is smaller and rather noisier (e.g., as reflected by the discontinuous significance, see Fig. 4) than in the “Fast” sequences. A repeated measures ANOVA on the difference between mean sustained response power in REG and RND (as shown in Figs. 3B, 4B) confirmed a significantly smaller difference between REG and RND in the “Slow” sequences (F(1, 42) = 18.31, η2 = 0.3036, p < 0.001).
Overall, the MEG results demonstrate that passively elicited brain responses to REG relative to RND sequences are associated with significantly stronger sustained response magnitude, including when pattern durations are long (2,500 ms in “slow” sequences).
Responses to individual tones are decreased in REG relative to RND sequences
To focus on phasic activity associated with responses to individual tones, sequence-evoked responses were high-pass filtered at 2 Hz (Fig. 4C) and tone-centered epochs were extracted (from 50 ms pre-tone onset to 200 ms post tone onset). The main analysis (Fig. 5), focused on tones presented in each cycle of the REG sequences (see indicated in Fig. 4C; 0–2.5 s; 2.5–5 s; 5–7.5 s; 7.5–10 s; 10–12.5 s; 2.5–15 s), and corresponding tones in RND sequences. As expected, no differences between conditions are seen in the first cycle (Cycle#1) (Fig. 5A). In contrast, clear differences between tones presented in REG versus RND contexts are seen in Cycle#2 onwards (Fig. 5B; Cycle#6 also plotted; Fig. 5C). Critically, REG tones evoke reduced responses relative to RND tones. This effect appears to be specific to the latter part of the tone-evoked response: from ∼100 ms post tone onset, i.e., during the tone-evoked M100 peak.
An additional repeated measures ANOVA on response magnitude (mean power between 100 and 200 ms post tone onset) with regularity (REG vs RND) and tone position in the second to sixth cycles (i.e., from tone #11 to tone #60) as factors revealed a main effect of regularity only (F(1,21) = 4.634, η2 = 0.181, p = 0.043), with no effect of tone position (F(1,49) = 1.063, η2 = 0.048, p = 0.359) or interaction of the two factors (F(1,49) = 0.937, η2 = 0.043, p = 0.599). Though clearly noisy, this tone-by-tone analysis reveals a sustained, stable difference between REG and RND conditions. As a control analysis, a repeated measures ANOVA on the first 10 tones in the sequence (Cycle#1) indicated a main effect of tone position (F(1,21) = 9.877, η2 = 0.32, p < 0.001) only. Post hoc tests indicated that the responses to the first two tones are significantly different from the third through tenth tones (p < 0.01) in both REG and RND sequences, reflecting increased responses at sequence onset. Neither condition (F(1,9) = 2.647, η2 = 0.112, p = 0.119) nor the interaction of condition by tone position (F(1,9) = 0.556, η2 = 0.026, p = 0.832) was significant. Together, these analyses confirm no difference between REG and RND during the first cycle, with a sustained difference between conditions emerging during the second cycle onwards.
To further understand whether and how the tone-evoked responses in REG and RND contexts changed over time, we computed the mean evoked field differences between tones presented in the first and subsequent cycles in REG and RND conditions. Because responses to the initial couple of tones (first two tones in Cycle#1) were affected by onset-response activity, we focused this analysis on the last eight tones of each cycle (Cycle#1, tone 3-10; Cycle#2, tone 13–20; and so on). The mean tone-evoked response (computed between 100 and 200 post onset) during Cycle#1 was subtracted from that of Cycle#2 to Cycle#6 to understand how the presence of regularity affects tone responses. The data are plotted in Figure 5D. A repeated measures ANOVA with the condition and cycle number as factors yielded a main effect of condition only (F(1, 21) = 4.723, η2 = 0.184, p = 0.041). No effect of cycle number (F(4, 84) = 1.078, η2 = 0.049, p = 0.373) or interaction of those two factors (F(4,84) = 1.087, η2 = 0.049, p = 0.368) was observed. This indicates a sustained difference between REG and RND conditions that does not change over time. A one-sample t test (uncorrected) confirmed that such differences for cycles#2-#6 in the REG condition were below zero, i.e., consistently reduced relative to cycle 1 [Cycle#2 t(1,21) = −3.102, d = −0.661, p = 0.003; Cycle#3 t(1,21) = −3.288, d = −0.701, p = 0.002; Cycle#4 t(1,21) = −3.702, d = −0.789, p < 0.001; Cycle#5 t(1,21) = −2.161, d = −0.461, p = 0.021; Cycle#6 t(1,21) = −2.478, d = −0.528, p = 0.011]. In contrast, the same analysis for RND indicated nonsignificant effects [Cycle#2 t(1,21) = −1.051, d = −0.224, p = 0.153; Cycle#3 t(1,21) = −1.7, d = −0.363, p = 0.052; Cycle#4 t(1,21) = −1.604, d = −0.342, p = 0.062; Cycle#5 t(1,21) = −1.829, d = −0.390, p = 0.041; Cycle#6 t(1,21) = −0.125, d = −0.027, p = 0.451].
Overall, the tone-evoked analysis demonstrates a consistent difference between tones presented in REG relative to RND contexts, the effect emerges early during the second regularity cycle (i.e., when the regularity has been established) and is manifested as a reduction in responses to REG tones, while responses to RND tones remain stable throughout the stimulus period.
Source localization (see Fig. 5F and Table 1) for the contrast RND > REG (p = 0.05) during the tone-evoked response (full epoch – 0–200 ms; extracted from the 3rd cycle until sequence offset; 5–15 s; i.e., after the regularity in REG has been established; see Figs. 4B, 5E) identified sources in bilateral temporal lobe (superior temporal gyrus, Heschel's gyrus) and bilateral IFG that underlie the time-domain effect. The opposite contrast (REG > RND, p = 0.05) yielded no significant activations.
No significant correlation between tone-evoked and sustained-response effects
To investigate a potential link between the sustained response and tone-evoked responses we correlated (spearman) the difference in the tone-evoked response (REG-RND; mean power between 100 and 200 ms post tone onset) with a difference in the sustained response (REG-RND; low-pass filtered as in Fig. 4B) during Cycle#2 and Cycle#6 across subjects. Both analyses yielded nonsignificant effects (p > 0.2).
We also conducted a more complex ridge regression analyses over single trial data using linear mixed effects models implemented by the lme4 package in R (Bates et al., 2015). Likelihood estimation was conducted using the anova function in the stats package. Models were run separately on Cycle#2 (where we expect initial divergence between REG and RND conditions) and Cycle#6 (where we expected the regularity in REG to be fully established) data predicting the tone-evoked response with the sustained response as a continuous predictor in interaction with condition (REG/RND) as a categorical predictor. Both models included random intercepts by trial number and subjects. No significant effects were observed (p > 0.29).
However, it is essential to acknowledge that the absence of a correlation between tone-evoked and sustained-response effects in the current dataset may stem from the relatively modest effects observed in both the sustained and tone-evoked responses. A more comprehensive exploration of this relationship could be achieved through a systematic examination of data with an increased number of trials per subject or a larger sample size. This approach would provide a more robust foundation for drawing conclusions about potential correlations between these response types.
Discussion
We demonstrated that an increased sustained response to REG compared to RND patterns previously observed in rapid tone sequences (20 Hz; 500 ms cycle duration), also occurs in slower sequences (4 Hz; 2,500 ms cycle duration). This confirms the auditory brain's remarkable implicit sensitivity to complex patterns. Critically, brain responses evoked by single tones exhibited the opposite effect - lower responses to tones in REG compared to RND sequences. The observation of opposing sustained and evoked response effects reveals parallel processes that shape the representation of unfolding auditory patterns.
Sustained brain responses track pattern emergence even in slow sequences
Increased brain responses to predictable, relative to random patterns have previously been documented in many contexts (Barascud et al., 2016; Sohoglu and Chait, 2016; Southwell et al., 2017; Herrmann and Johnsrude, 2018; Herrmann et al., 2019; Zhao et al., 2024). A greater amplitude for REG over RND stimuli is not easily interpretable as a response to physical attributes of the signal. Adaptation, for example, would result in the opposite pattern (Megela and Teyler, 1979; Pérez-González and Malmierca, 2014). Instead, the dynamics of this response, including when it diverges between REG and RND stimuli, suggest that the brain is sensitive to changes in the predictability of sound sequences. Previous observations regarding how the sustained response is modulated by sequence predictability have prompted the hypothesis that it might reflect the coding of precision, or inferred reliability, of the incoming sensory information (Barascud et al., 2016; Heilbron and Chait, 2018; Zhao et al., 2024).
Here we showed that sustained response effects persist even when sequences are presented at a slower rate (4 Hz). Despite the fivefold increase in pattern duration, the divergence between REG and RND conditions occurred roughly at the same time (three tones into the second cycle), in ‘Slow’ and ‘Fast’ sequences, consistent with ideal observer benchmarks (Pearce, 2005; Barascud et al., 2016; Harrison et al., 2020).
It is noteworthy that the sustained response was diminished in the ‘Slow’ compared to ‘Fast’ sequences. This could be attributed, at least in part, to limitations in human listeners’ memory capacity. Indeed, Barascud et al. (2016) observed a reduced sustained response to REG sequences consisting of cycles of 15 tones relative to 10 tones. This was interpreted as indicative of a threshold in encoding patterns that emerges when detecting longer repeating cycles. Similarly, Herrmann et al. (2019) reported reduced sustained responses in older individuals compared to younger participants, hypothesizing that this reduction could stem from age-related decline in tracking regularity patterns (Bianco et al., 2023). To detect the emergence of regularity, the auditory system must presumably maintain and update a statistical model of the auditory input, registering tone repetitions, and decide at which point there is sufficient evidence to indicate a regular pattern. The efficiency of this process relies on the interplay between echoic and short-/long-term memory capacity (Bianco et al., 2020; Harrison et al., 2020). In our study, the introduction of gaps between consecutive tones and the subsequent increase in cycle duration from 500 ms to 2,500 ms likely strained short-term memory, leading to less precise memory encoding and therefore overall lower encoding fidelity for the 'Slow' sequences. The behavioral results indeed indicate a decline in pattern detection (Fig. 1). However, it is crucial to emphasize that, despite this decline, the mean performance level remained high, underscoring the largely preserved sequence tracking capacity.
The brain mechanisms underlying the sustained response remain unclear. Source analysis suggests that the amplified response is driven by cortical activation in auditory, inferior frontal (IFG) and hippocampal sources (Barascud et al. (2016)). A similar network involving the AC and IFG has been implicated in the generation of the Mismatch Negativity response (Näätänen et al., 2012) and has been postulated to represent the circuit responsible for maintaining an auditory model and conveying predictions to lower processing levels (Garrido et al., 2009; Heilbron and Chait, 2018).
According to one interpretation, the sustained response might reflect an excitatory processing mechanism, characterized by an increase in gain, potentially via neuromodulation, on units responsible for encoding reliable sensory information (Feldman and Friston, 2010; Auksztulewicz et al., 2017). In particular, tonic acetylcholine (ACh) has been shown to be modulated by environmental uncertainty (Dalley et al., 2001; Yu and Dayan, 2005; Bland and Schaefer, 2012). However, this interpretation may be less tenable, as it predicts heightened responses to tones within the REG sequences, which is contrary to our observed findings (see below). Alternatively, the sustained response may indicate an enhancement in the inhibition of neuronal units that convey low information content. This is consistent with prior research, albeit involving simpler stimuli, where an increase in inhibitory activity linked to the presence of predictable information has been documented (Natan et al., 2015, 2017; Schulz et al., 2021; Richter and Gjorgjieva, 2022; Yarden et al., 2022). A specific role for inhibition, instead of excitation, in governing responses to predictable sensory stimuli, is also supported by indirect evidence from dynamic causal modeling (Lecaignard et al., 2022) and behavioral findings: rather than capturing attention, predictable patterns are more easily ignored (Southwell et al., 2017) and are linked to reduced arousal (Milne et al., 2021). It is important to emphasize that M/EEG (or BOLD) do not readily differentiate between inhibitory and excitatory activity. Therefore, further advancement in understanding this phenomenon necessitates focused investigations at the cellular level.
Reduced responses to tones in REG relative to RND patterns
Introducing temporal gaps between successive pips allowed us to disentangle the neural responses elicited by individual tones. Results revealed a reduction in neural activity in response to tones embedded within regularly repeating relatively to random patterns. This effect appears to be driven by relatively stable responses to tones in random patterns but declining responses in the REG context. The dynamics of this effect are consistent with a step change in response magnitude during the second cycle (after the regularity has been introduced) that is then fixed for the remainder of the sequence.
Reduced response to REG tones is consistent with predictive coding theories (Rao and Ballard, 1999; Lee and Mumford, 2003; Friston, 2005, 2009). According to these models, top-down expectations, derived from statistical regularities in the external world, play a crucial role in suppressing anticipated sensory input. This mechanism serves as an efficient neural coding scheme, optimizing the allocation of neural resources and enabling the brain to prioritize the processing of novel or unexpected information, which may hold greater relevance (Olshausen and Field, 1996, 2004; Friston, 2005, 2009; Tang et al., 2018). Empirical support for these predictions, often referred to as “expectation suppression,” has been mounting across sensory modalities, (Baldeweg, 2006; Summerfield et al., 2008; Alink et al., 2010; den Ouden et al., 2010; Todorovic et al., 2011; Kok et al., 2012; Todorovic and de Lange, 2012; Barbosa and Kouider, 2018; Heilbron and Chait, 2018). In the auditory domain, Todorovic and de Lange (2012) demonstrated that when tones were expected based on the probability structure of tone transitions, they elicited suppressed auditory activity within a specific time window of 100–200 ms. This suppression was uniquely attributable to the phenomenon of expectation suppression and distinct from adaptation (repetition suppression) effects.
Notably, the effects we report manifest within this same time window (100–200 ms; during the M100 phase of the response). While it is difficult to exclude low-level processes such as adaptation, several patterns in the dynamics of the development of these effects suggest that simple adaptation is unlikely to be a main factor. Firstly, the effects require processes that persist for 2,500 ms (duration of a cycle). Secondly, we do not see a gradual reduction in responses to REG tones that builds up over cycles. Rather there is a step change in the second cycle that is then consistent for the remainder of the sequence.
Multiplexed representation of sequence predictability
The challenge faced by sensory systems is to accurately and swiftly represent information to support adaptive behavior and facilitate interaction with the environment. A fundamental question pertains to whether the brain primarily encodes predictable or novel information (Press et al., 2020). Bayesian cognitive models propose that our predisposition to perceive what we expect enhances the fidelity of our sensory experiences (Wyart et al., 2012; Summerfield and de Lange, 2014; Kaiser et al., 2019). In contrast, cancellation models suggest that our perceptual system prioritizes unexpected stimuli, as they carry an informative value (Blakemore et al., 1998; Meyer and Olson, 2011; Richter et al., 2018). In line with these considerations, predictive coding models (Rao and Ballard, 1999; Friston, 2005, 2009) postulate the existence of two functionally distinct subpopulations of neurons within the brain. One encodes the conditional expectations of perceptual causes, while the other encodes prediction error.
Our findings confirm the coexistence of these facets of regularity coding within the MEG signal: the sustained response is consistent with the encoding of the predictability of the signal, whereas responses to individual tones appear to correspond to the coding of prediction error, as indicated by the reduced responses to predictable tones. Intriguingly, our results underscore the active involvement of the same neural network, encompassing the AC and the IFG, in both discovering structural patterns within auditory sequences and dampening responses to anticipated stimuli. However, the spatial resolution limitations inherent to MEG source analysis prevent definitive conclusions about the precise co-localization of these neural processes.
Indeed, the question of whether these manifestations stem from a singular process exhibiting differential characteristics in sustained and tone-evoked responses or whether they represent two distinct mechanisms, as proposed in previous works (Rao and Ballard, 1999; Friston, 2005, 2009), emerges as a crucial avenue for future exploration. For example, it is possible that the sustained response reflects activity linked to a tonic inhibitory drive (implementing gain control) onto sensory units, resulting in a diminished evoked response to individual stimuli. Notably, our study did not reveal a correlational relationship between tone-evoked and sustained responses. While this may tentatively suggest no direct linkage between the two mechanisms, it's essential to consider the possibility that this observation could be influenced by the inherent noise in MEG measurements. We anticipate that more nuanced insights will be gleaned with the application of sensitive invasive tools in future investigations.
Data Availability Statement
The data reported in this manuscript alongside related analysis scripts are available at doi: 10.5522/04/25122383.
Footnotes
We thank Theofilos Petsas for contributing to data collection. This work was supported by a BBSRC project grant to M.C. R.B. is funded by the European Union MSCA (101064334). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
The authors declare no competing financial interests.
- Correspondence should be addressed to Maria Chait at m.chait{at}ucl.ac.uk.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.