Repetition suppression, the phenomenon that the second presentation of a stimulus attenuates neural activity, is typically viewed as an automatic consequence of repeated stimulus presentation. However, a recent neuroimaging study has suggested that repetition suppression may be driven by top-down expectations. Here we examined whether and when repetition suppression can be modulated by top-down expectation. Participants listened to auditory stimuli in blocks where tone repetitions were either expected or unexpected, while we recorded ongoing neural activity using magnetoencephalography. We found robust repetition suppression in the auditory cortex for repeated tones. Interestingly, this reduction was significantly larger for expected than unexpected repetitions, both in terms of evoked activity and gamma-band synchrony. These findings indicate a role of top-down expectation in generating repetition suppression and are in line with predictive coding models of perception, in which the difference between expected and actual input is propagated from lower to higher cortical areas.
When a stimulus is repeated, the neural activity evoked by its second appearance is reduced. This phenomenon, known as repetition suppression (RS), is robust and has been observed across a range of stimulus properties, time scales, sensory modalities, and brain areas (Grill-Spector et al., 2006). Owing to this robustness, RS has become an indispensable tool in cognitive neuroscience for characterizing the nature of neural representations.
Despite its wide use, the neural mechanisms underlying RS are still poorly understood. One hypothesis holds that RS results from changes in the responsivity of the relevant neurons, due to stimulus-induced adaptation of the neuronal pool involved (Grill-Spector et al., 2006). Implicit in this view is that RS is an automatic effect, triggered by repeated sensory stimulation. In contrast, a recent study suggests that RS may instead be a consequence of top-down perceptual expectations (Summerfield et al., 2008). In this study, participants viewed pairs of face stimuli in which the second face could repeat the first. Crucially, the probability of a repetition was either high or low. The fusiform face area response was strongly reduced when repetitions were expected, but only moderately reduced when they were unexpected (i.e., RS was attenuated). This suggested that RS may reflect a relative reduction in perceptual “prediction error” when an expected stimulus is processed.
Although this explanation of RS in terms of perceptual expectation is appealing, a recent study that attempted to replicate the results of Summerfield et al. (2008) using electrophysiological recordings in the inferotemporal cortex of macaques found no such effect (Kaliukhovich and Vogels, 2010): the authors observed robust RS, but it was not modulated by expectation. There are important differences between the two studies, both in task requirements [subjects in the Kaliukhovich and Vogels (2010) study passively fixated, whereas subjects in the Summerfield et al. (2008) study monitored the stimuli for occasional targets] and in the measures of neural activity [Kaliukhovich and Vogels (2010) recorded electrophysiological signals, whereas Summerfield et al. (2008) measured hemodynamic signals]. Nevertheless, this negative finding potentially casts doubt on the account of RS in terms of perceptual expectations.
In the current study, we quantified electrophysiological correlates of auditory RS in humans to assess whether RS is modulated by top-down expectation. We measured neural activity over auditory cortex using magnetoencephalography (MEG) while participants listened to repeated tones, which could be either expected or unexpected, and responded to rare deviant tones. In short, we found strong RS for repeated tones in the auditory cortex, which was markedly reduced when the repetition was unexpected. Our results thus provide empirical support for a role of top-down expectation in RS (Baldeweg, 2006; Summerfield et al., 2008).
Materials and Methods
Sixteen healthy participants (10 female, age 23 ± 3 years, mean ± SD) took part in the experiment after signing an informed consent form in accordance with the Declaration of Helsinki. All participants had normal hearing and no history of neurological or psychiatric disorders. The study was approved by the regional ethics committee.
Stimuli and experimental design.
The experimental stimuli consisted of brief auditory tones (frequency 1000 Hz, duration 5 ms, ∼70 dB SPL), which were presented binaurally via MEG-compatible air tubes. Stimuli were presented using a PC running Presentation software (Neurobehavioral Systems).
Each trial started with the presentation of a small central fixation cross on the screen for 2–4 s. Then, an auditory tone was presented, which was either repeated after 500 ms (repetition trial) or not (nonrepetition trial). This was followed by an additional period during which the fixation cross was presented (0.5–1 s), and a short period in which the participants could freely move their eyes and blink (1.5–2 s), resulting in a 4–6 s intertrial interval (defined as the interval between the last tone of the current trial and the first tone of the next trial). Participants were instructed to listen to the tones and press a button with their right hand if they heard a deviant tone (frequency 1200 Hz). Each block consisted of 90% standard tones and 10% deviant tones.
We manipulated the expectation of repetition by creating different types of blocks, each consisting of 86 trials. In some blocks, tone repetitions occurred frequently (75% of trials) and were therefore expected, whereas in other blocks they occurred rarely (25%) and were therefore unexpected. Because unexpected repetitions were relatively rare (a logical consequence of the design), we doubled the number of blocks in this context to obtain a sufficient number of trials for statistical analysis. In total, there were two blocks with expected repetitions and four with unexpected repetitions. We also manipulated the temporal position of the task-relevant rare deviant tones, which occurred either at the first or at the second position of a tone pair throughout a whole block. Since this manipulation did not induce overall differences in the processing of standard tones, we collapsed over this factor. Before each block, participants were informed whether tone repetitions would be frequent or infrequent, and at which temporal position the deviant tones would occur. The experiment lasted 1 h and was preceded by an 8 min practice session of 60 trials.
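The rationale for doubling the number of unexpected-repetition blocks can be made concrete with the nominal trial counts implied by the design. The Python sketch below computes expected-value counts from the parameters stated above (86 trials per block; repetition probabilities of 75% and 25%; two and four blocks) and draws one illustrative block. The function names and the independent Bernoulli draw per trial are assumptions for illustration, not the actual Presentation script. The counts also ignore deviant tones and artifact rejection, so the analyzable numbers are lower.

```python
import numpy as np

TRIALS_PER_BLOCK = 86


def nominal_counts(p_repeat, n_blocks, trials_per_block=TRIALS_PER_BLOCK):
    """Expected numbers of repetition and non-repetition trials in one context."""
    total = n_blocks * trials_per_block
    return total * p_repeat, total * (1.0 - p_repeat)


def draw_block(p_repeat, rng, trials_per_block=TRIALS_PER_BLOCK):
    """Hypothetical block: True marks a repetition trial (tone pair), False a single tone."""
    return rng.random(trials_per_block) < p_repeat


# Expected context: 2 blocks at 75% repetitions; unexpected context: 4 blocks at 25%.
expected_rep, _ = nominal_counts(0.75, n_blocks=2)
unexpected_rep, _ = nominal_counts(0.25, n_blocks=4)
print(expected_rep, unexpected_rep)         # 129.0 86.0
print(nominal_counts(0.25, n_blocks=2)[0])  # 43.0 without doubling

block = draw_block(0.25, np.random.default_rng(seed=1))
print(block.sum(), "repetition trials in one simulated unexpected block")
```

Without doubling, the unexpected context would nominally yield only 43 repetition trials; with four blocks it yields 86, still fewer than the 129 expected repetitions, which is one reason trial numbers have to be matched before conditions are compared.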
Ongoing brain activity was recorded using a whole-head MEG with 275 axial gradiometers (VSM/CTF Systems) in a magnetically shielded room. Head localization was monitored continuously during the experiment using coils that were placed at the cardinal points of the head (nasion and left and right ear canals). As an aid for eye blink and heartbeat artifact rejection, an electro-oculogram (EOG) was recorded from the supraorbital and infraorbital ridge of the left eye, and an electrocardiogram (ECG) was recorded, both using 10-mm-diameter Ag–AgCl surface electrodes.
MEG data analysis.
The data were analyzed using the FieldTrip toolbox developed at the Donders Institute for Brain, Cognition, and Behaviour (Oostenveld et al., 2011) in Matlab 7 (MathWorks). Analyses were restricted to trials containing only standard tones in which a response was correctly withheld. Data epochs of interest were checked for artifacts using a semiautomatic routine that helped detect and reject trials containing muscle artifacts and jumps in the MEG signal caused by the SQUID electronics. Subsequently, independent component analysis (Bell and Sejnowski, 1995) was used to partial out any remaining variance attributable to eye blinks and heartbeat artifacts (Jung et al., 2000). Finally, the data were visually inspected, and any remaining trials with artifacts were removed manually.
Before calculating event-related fields (ERFs), data were low-pass filtered using a two-pass Butterworth filter (filter order 6, cutoff frequency 40 Hz). ERFs were baseline corrected using the interval from 500 to 400 ms before the first tone. A planar gradient transform was then calculated (Bastiaansen and Knösche, 2000), which simplifies the interpretation of sensor-level data because it typically places the maximal signal directly above the source (Hämäläinen et al., 1993). To avoid differences in noise level when comparing conditions with different numbers of trials, we matched trial numbers by randomly selecting a subsample of trials from the condition with more trials.
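The ERF preprocessing steps above can be sketched in Python with NumPy and SciPy. The sketch assumes a sampling rate of 1200 Hz (not stated in the text), uses SciPy's `butter` and `sosfiltfilt` as a stand-in for FieldTrip's two-pass Butterworth option, and represents trials as arrays with trials on the first axis and time on the last. The planar gradient transform is omitted, and `match_trial_counts` is a hypothetical helper illustrating the random subsampling.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1200.0  # assumed sampling rate in Hz; the text does not state it


def lowpass_twopass(data, fs=FS, cutoff=40.0, order=6):
    """Zero-phase Butterworth low-pass (forward and backward pass) along the last axis."""
    sos = butter(order, cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, data, axis=-1)


def baseline_correct(data, times, window=(-0.5, -0.4)):
    """Subtract the mean over the window 500-400 ms before the first tone (times in s)."""
    mask = (times >= window[0]) & (times <= window[1])
    return data - data[..., mask].mean(axis=-1, keepdims=True)


def match_trial_counts(trials_a, trials_b, rng):
    """Randomly subsample the larger set (trials on axis 0) to the size of the smaller set."""
    n = min(len(trials_a), len(trials_b))

    def pick(trials):
        keep = np.sort(rng.choice(len(trials), size=n, replace=False))
        return trials[keep]

    return pick(trials_a), pick(trials_b)
```

Running the filter forward and backward cancels its phase delay, so ERF peak latencies are not shifted, at the cost of squaring the magnitude response.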
We calculated time–frequency representations (TFRs) using a Fourier transform applied to short sliding time windows. Before the Fourier transform, each time window was multiplied by one or more tapers, and the resulting power estimates were averaged across tapers. Power was calculated separately for the horizontal and vertical components of the planar gradient, and the two were then summed. We then took the median of these power estimates across all trials within a condition. For frequencies of 5–35 Hz, we used a single Hanning taper with an adaptive time window of four cycles per frequency (ΔT = 4/f), resulting in an adaptive frequency smoothing of Δf = 1/ΔT. For higher frequencies (35–140 Hz), we used a fixed window length of 200 ms with Δf = 20 Hz frequency smoothing (Percival and Walden, 1993). Percentage change in power was calculated relative to a baseline window centered on the interval from 500 to 400 ms before the first tone, with the same length as the time windows of interest. Based on the average spectral activity profile, we restricted our analyses to the theta/alpha band (5–12 Hz) and the gamma band (50–100 Hz).
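The two-regime spectral analysis can be sketched for a single time window and frequency. The Python sketch below follows the parameters in the text (a Hanning taper over four cycles for 5–35 Hz; a 200 ms window with 20 Hz smoothing above 35 Hz). For the high-frequency regime it assumes DPSS (Slepian) tapers, as described by Percival and Walden (1993), and treats the 20 Hz as a half-bandwidth, following the convention of FieldTrip's `tapsmofrq` option; this gives seven tapers for a 200 ms window. Function names are hypothetical, and sliding the window across time is left out for brevity.

```python
import numpy as np
from scipy.signal.windows import dpss


def window_params(freq):
    """Window length (s), frequency smoothing (Hz) and taper family for one frequency."""
    if freq <= 35:                  # 5-35 Hz: one Hanning taper, four cycles per window
        dt = 4.0 / freq
        return dt, 1.0 / dt, "hanning"
    return 0.2, 20.0, "dpss"        # 35-140 Hz: fixed 200 ms window, 20 Hz smoothing


def tapers_for(n_samples, fs, freq):
    """Taper matrix of shape (n_tapers, n_samples)."""
    _, smoothing, kind = window_params(freq)
    if kind == "hanning":
        return np.hanning(n_samples)[None, :]
    nw = (n_samples / fs) * smoothing          # time-halfbandwidth product
    return dpss(n_samples, nw, Kmax=int(round(2 * nw - 1)))


def segment_power(segment, fs, freq):
    """Power at `freq` in one data segment, averaged over tapers."""
    n = segment.size
    kernel = np.exp(-2j * np.pi * freq * np.arange(n) / fs)
    coeffs = (tapers_for(n, fs, freq) * segment) @ kernel
    return float(np.mean(np.abs(coeffs) ** 2))


def planar_power(horizontal, vertical, fs, freq):
    """Sum of power over the two orthogonal planar-gradient components."""
    return segment_power(horizontal, fs, freq) + segment_power(vertical, fs, freq)


def percent_change(active_trials, baseline_trials):
    """Median over trials, then percentage change relative to the baseline window."""
    active, base = np.median(active_trials), np.median(baseline_trials)
    return 100.0 * (active - base) / base
```

The fixed 200 ms window trades temporal for spectral resolution differently from the four-cycle windows: at high frequencies a fixed window keeps temporal resolution usable, and the multiple tapers buy the broad smoothing that suits spectrally diffuse gamma-band responses.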
Sources of evoked activity were identified using a time-domain beamforming approach (linearly constrained minimum variance, LCMV) applied to the axial sensor data. We localized the average activity elicited by the standard tone between 50 and 150 ms after stimulus onset. For the 15 of 16 participants for whom structural MRI images had been acquired, we created a realistic single-shell head model based on the brain surface from their individually segmented MRIs (Nolte, 2003). The brain volume of each participant was discretized to a grid with 1 cm resolution, and the lead field matrix was calculated for each grid point according to the head position in the system and the forward model. A spatial filter was then constructed for each grid point from the covariance and lead field matrices. Source strength was calculated in the activation period and normalized to unit strength for each participant. Individual source estimates were overlaid on the corresponding anatomical MRI, after which the anatomical and functional data were spatially normalized to the MNI (Montreal Neurological Institute) template using SPM8 (Statistical Parametric Mapping; http://www.fil.ion.ucl.ac.uk/spm).
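At its core, the LCMV beamformer computes, for each grid point, a spatial filter W = (L^T C^-1 L)^-1 L^T C^-1 that passes activity from that location with unit gain while minimizing the filter's output variance. The NumPy sketch below shows this step for one grid point with a three-orientation lead field L and sensor covariance C. The 5% Tikhonov regularization and the reading of "normalized to unit strength" as dividing each participant's map by its maximum are assumptions. FieldTrip offers the full procedure as `ft_sourceanalysis` with `cfg.method = 'lcmv'`.

```python
import numpy as np


def lcmv_filter(leadfield, cov, reg=0.05):
    """Unit-gain LCMV spatial filter for one grid point.

    leadfield: (n_channels, 3) forward solution for three dipole orientations.
    cov: (n_channels, n_channels) sensor covariance of the data window.
    reg: Tikhonov regularization as a fraction of mean sensor power (assumed).
    """
    n = cov.shape[0]
    cov_reg = cov + reg * np.trace(cov) / n * np.eye(n)
    cinv = np.linalg.inv(cov_reg)
    # W = (L^T C^-1 L)^-1 L^T C^-1, shape (3, n_channels)
    return np.linalg.solve(leadfield.T @ cinv @ leadfield, leadfield.T @ cinv)


def source_power(weights, cov):
    """Power passed by the filter, trace(W C W^T), summed over orientations."""
    return float(np.trace(weights @ cov @ weights.T))


def normalize(power_map):
    """Scale one participant's source map so its maximum equals 1 (assumed normalization)."""
    return power_map / np.max(power_map)
```

The unit-gain constraint (W L = I) is what makes the filter "linearly constrained": activity from the target location passes unchanged, while the variance minimization suppresses contributions from elsewhere.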
Identification of auditory activation.
We performed all statistical analyses on the average activity of 20 sensors (10 over each hemisphere) that showed maximal auditory activation when averaged across all trial types, conditions, and tone presentations. Specifically, we selected the 10 left-hemisphere and 10 right-hemisphere sensors that showed maximal activity in the 50–150 ms period following tone presentation (Fig. 1A). The evoked (Fig. 1B) and oscillatory (Fig. 1C) activity of this set of sensors constituted the measure of auditory activation that served as the dependent variable for all subsequent analyses.
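The sensor selection can be expressed as a ranking within each hemisphere. The sketch assumes CTF-style channel labels, in which left- and right-hemisphere sensors begin with `ML` and `MR`, and an ERF array of combined planar gradient amplitude averaged over all conditions and both tone positions; the helper names are hypothetical. Because the ranking uses the grand average, it does not favor either side of any later contrast.

```python
import numpy as np


def select_auditory_sensors(erf, times, labels, n_per_hemi=10, window=(0.05, 0.15)):
    """Pick the n strongest sensors per hemisphere in the 50-150 ms window.

    erf: (n_channels, n_times) combined planar-gradient ERF averaged over all
         conditions and tone positions.
    labels: channel names; CTF-style 'ML...'/'MR...' prefixes are assumed.
    """
    mask = (times >= window[0]) & (times <= window[1])
    strength = erf[:, mask].mean(axis=1)
    chosen = []
    for prefix in ("ML", "MR"):
        idx = np.array([i for i, lab in enumerate(labels) if lab.startswith(prefix)])
        # Sort descending by strength and keep the top n in this hemisphere.
        chosen.extend(idx[np.argsort(strength[idx])[::-1][:n_per_hemi]])
    return sorted(int(i) for i in chosen)


def auditory_timecourse(data, sensor_idx):
    """Average the selected sensors; data has shape (..., n_channels, n_times)."""
    return data[..., sensor_idx, :].mean(axis=-2)
```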
Evoked and oscillatory auditory activity in the different conditions was statistically compared using nonparametric cluster-based permutation t tests (Maris and Oostenveld, 2007). This type of test controls the type I error rate in the context of multiple comparisons by identifying clusters of significant differences over space, time, and/or frequency, instead of performing a separate test for each sensor, sample, and frequency pair. For all analyses, we averaged over the spatial (channel) dimension, based on the independent localization of the 10 left and 10 right channels that showed the most robust tone-related auditory activity (Fig. 1). Our statistical analysis therefore considered one-dimensional (temporal, for evoked activity differences) or two-dimensional (spectrotemporal, for oscillatory activity differences) clusters. All cluster-level statistics, defined as the sum of t values within each cluster, were evaluated under the permutation distribution of the maximum (minimum) cluster-level statistic, which was approximated by drawing 5000 random permutations of the observed data. The resulting p values represent the probability under the null hypothesis (no difference between conditions) of observing a maximum (minimum) cluster-level statistic larger (smaller) than the observed cluster-level statistic. We used this method to assess whether there were significant temporal (ERF) or spectrotemporal (TFR) clusters of differential activity.
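For a paired design, permuting condition labels within a participant amounts to flipping the sign of that participant's difference wave. The Python sketch below implements a one-dimensional (temporal) version of the cluster-based permutation test with NumPy only. The cluster-forming threshold (the two-sided 5% critical t value for 15 degrees of freedom), the separate positive and negative tails, and the +1 correction in the p value are assumptions, not settings reported in the text; the two-dimensional spectrotemporal case would additionally require connected-component labeling over time and frequency.

```python
import numpy as np


def paired_t(diff):
    """t statistic across participants at each time point; diff has shape (n_subjects, n_times)."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))


def find_clusters(tvals, threshold):
    """Contiguous supra-threshold runs, split by sign, as (cluster sum, start, stop)."""
    clusters = []
    for sign in (1.0, -1.0):
        start = None
        # Appending False closes a cluster that runs to the last sample.
        for i, above in enumerate(np.append(sign * tvals > threshold, False)):
            if above and start is None:
                start = i
            elif not above and start is not None:
                clusters.append((float(tvals[start:i].sum()), start, i))
                start = None
    return clusters


def cluster_permutation_test(cond_a, cond_b, threshold=2.131, n_perm=5000, seed=0):
    """Paired cluster-based permutation test along time via sign flipping.

    threshold: cluster-forming |t|; 2.131 is the two-sided 5% critical value for
    15 degrees of freedom (16 participants), an assumed choice.
    Returns a list of (cluster sum, start, stop, p) for the observed clusters.
    """
    diff = np.asarray(cond_a, float) - np.asarray(cond_b, float)
    observed = find_clusters(paired_t(diff), threshold)
    rng = np.random.default_rng(seed)
    max_null = np.zeros(n_perm)
    min_null = np.zeros(n_perm)
    for k in range(n_perm):
        # Swapping condition labels within a participant flips the sign of its difference.
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        sums = [c[0] for c in find_clusters(paired_t(diff * signs), threshold)]
        max_null[k] = max((s for s in sums if s > 0), default=0.0)
        min_null[k] = min((s for s in sums if s < 0), default=0.0)
    results = []
    for csum, start, stop in observed:
        extreme = max_null >= csum if csum > 0 else min_null <= csum
        p = (np.sum(extreme) + 1) / (n_perm + 1)   # counts the observed labelling
        results.append((csum, start, stop, float(p)))
    return results
```

Because each permutation contributes only its largest (or smallest) cluster sum, a single comparison against this distribution controls the family-wise error rate across all time points.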
The participants' task was to press a button whenever a deviant tone was presented. Participants correctly responded to virtually all (96.0 ± 0.05%, mean ± SD) of the deviant tones and correctly refrained from responding to virtually all (99.6 ± 0.06%, mean ± SD) of the standard tones. Subjects responded faster to the deviant tone when it was the second tone (710 ms) compared to when it was the first tone (916 ms) of the pair (F(1,15) = 34.8, p < 0.001). Expectation of stimulus repetition did not affect response time (F < 1, p > 0.10).
Neural activity elicited by the auditory stimuli
Auditory tones elicited strong neural activity over bilateral temporal cortex (Fig. 1A), which was maximal between 50 and 150 ms after stimulus (Fig. 1B). A time–frequency representation of the power in the signal showed that the auditory stimulus elicited an increase in low-frequency power related to the phase-locked evoked response, as well as an increase in oscillatory activity in the gamma band (60–90 Hz) (Fig. 1C). Source localization of neural activity between 50 and 150 ms indicated a bilateral source distribution along the superior temporal sulcus (Fig. 1D), in the vicinity of the primary auditory cortex (Rademacher et al., 2001).
Expectation of repetition reduces auditory activity for repeated tones
We compared neural responses to expected and unexpected tone repetitions in the sensors that showed the strongest auditory activity (Fig. 1B). While neural activity elicited by the first tone did not differ as a function of repetition expectation (all p > 0.10), repetition expectation strongly modulated the activity elicited by the second tone (Fig. 2A). When the repetition was expected, the second tone evoked lower activity than when it was unexpected (100–500 ms after stimulus, p < 0.001). Analysis of the time–frequency representations yielded similar results, revealing a significant spectrotemporal cluster of larger power for unexpected repetitions in the low frequencies (0–350 ms after stimulus, frequency range 5–9 Hz, p < 0.001), as well as in the gamma band (200–300 ms after stimulus, frequency range 80–95 Hz, p < 0.05) (Fig. 2B).
Expectation of repetition increases auditory activity for omitted tones
Interestingly, similar effects of expectation were observed when the second tone was omitted (Fig. 3). When subjects expected a tone repetition, a tone omission resulted in a stronger evoked field (100–150 ms after omission, p < 0.05) (Fig. 3A). Similarly, analysis of oscillatory activity showed larger gamma band power for omitted tones when subjects expected a tone repetition (200–400 ms after omission, frequency range 60–75 Hz, p < 0.05) (Fig. 3B).
Since we observed larger activity for both unexpected repetitions (Fig. 2A) and unexpected omissions (Fig. 3A), we asked whether these expectation-related activity differences were related. To this end, we correlated the activity difference between expected and unexpected repetitions (averaged over the 100–500 ms window following the second tone) with the activity difference between expected and unexpected omissions (averaged over the same window) (Fig. 4). Individual differences in the activity difference between expected and unexpected repetitions were indeed correlated with the activity difference between expected and unexpected omissions (r = 0.43, p < 0.05), suggesting that the two phenomena may be mediated by the same neural mechanism.
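The across-participant correlation above uses one difference score per participant and event type. The sketch below averages each difference wave over the 100–500 ms window, computes Pearson's r, and converts r to a t statistic with n − 2 degrees of freedom. The sign convention (positive values meaning more activity for the unexpected event) and the helper names are assumptions. For n = 16 participants, r = 0.43 corresponds to t(14) ≈ 1.78.

```python
import numpy as np


def window_mean(x, times, window=(0.1, 0.5)):
    """Average over the 100-500 ms window (times in s) along the last axis."""
    mask = (times >= window[0]) & (times <= window[1])
    return x[..., mask].mean(axis=-1)


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays of per-participant scores."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))


def r_to_t(r, n):
    """t statistic (n - 2 degrees of freedom) for testing a Pearson correlation against zero."""
    return r * np.sqrt(n - 2) / np.sqrt(1.0 - r ** 2)


# Hypothetical usage with arrays of shape (n_subjects, n_times):
# rep_effect = window_mean(unexpected_rep - expected_rep, times)
# omit_effect = window_mean(unexpected_omit - expected_omit, times)
# r = pearson_r(rep_effect, omit_effect); t = r_to_t(r, n=len(rep_effect))
```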
In this study, we examined whether the reduced neural activity for repeated events (RS) is modulated by the expectation of repetition. We observed that expecting a repetition of auditory events strongly increased repetition suppression in the auditory cortex: the more expected a repeated tone was, the more its evoked response was suppressed. This effect was visible both in evoked activity from 100 ms after stimulus onset and in gamma-band synchrony 200–300 ms after stimulus onset. The effect of expectation was also present in the absence of a physical stimulus: unexpected omissions resulted in stronger evoked activity and gamma-band synchrony over auditory cortex. Finally, individual differences in the activity increase for unexpected repetitions were correlated with the activity increase for unexpected omissions, suggestive of a common neural mechanism for both phenomena.
Our findings favor a top-down account of RS, as previously suggested (Baldeweg, 2006; Summerfield et al., 2008). In particular, predictive coding models (Rao and Ballard, 1999; Lee and Mumford, 2003; Friston, 2005, 2009) posit that top-down expectations, derived from statistical regularities in the world, help to suppress expected input, thereby constituting an efficient neural coding scheme (Olshausen and Field, 1996, 2004; Friston, 2005, 2009). In this view, feedforward stimulus-evoked activity reflects the mismatch between top-down expectation and sensory input, i.e., prediction error. In our paradigm, the (temporally unpredictable) occurrence of the first tone may set up an expectation about the occurrence of the second tone, which follows at a fixed temporal lag. This expectation depends on the statistical regularities observed within a block: the more often repetitions occur, the stronger the prediction that a second tone will follow. Consequently, the occurrence of the second tone is associated with reduced prediction error, and hence attenuated neural activity, whereas its omission is associated with increased prediction error, and hence increased neural activity (see den Ouden et al., 2009 for similar results). While our findings support this framework, an alternative (and not mutually exclusive) interpretation of the activity induced by unexpected omissions is that it reflects the prediction signal itself. Indeed, a recent study suggests that population responses to expected and unexpected events may best be explained by a combination of prediction- and prediction-error-related responses (Egner et al., 2010).
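The qualitative predictions of this account can be illustrated with a deliberately simple toy model: the block's repetition probability serves as the prediction for the second slot of a trial, and the unsigned prediction error is the difference between that prediction and the outcome (1 if the tone occurs, 0 if it is omitted). This is an illustrative assumption, not one of the predictive coding models cited above and not a model fitted to our data.

```python
def prediction_error(p_repeat, second_tone_present):
    """Toy unsigned prediction error for the second slot of a trial.

    The prediction is the block's repetition probability; the outcome is 1 if
    the second tone occurs and 0 if it is omitted.
    """
    outcome = 1.0 if second_tone_present else 0.0
    return abs(outcome - p_repeat)


for context, p in (("repetition expected", 0.75), ("repetition unexpected", 0.25)):
    print(f"{context}: repeated tone -> {prediction_error(p, True):.2f}, "
          f"omitted tone -> {prediction_error(p, False):.2f}")
```

The toy reproduces the direction of both reported effects: when repetitions are expected, a repeated tone yields a small error and an omitted tone a large one, and the pattern reverses when repetitions are unexpected.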
Generally, our findings are in line with earlier work showing that stimulus probability can have large effects on early cortical processing in the auditory cortex, both in nonhuman animals (Ulanovsky et al., 2003) and in humans (Haenschel et al., 2000, 2005; Weiland et al., 2008; Valentini et al., 2011).
A potential limitation of our experimental design is that the total number of tones differed between the expectation conditions: blocks in which repetitions were expected contained more tones overall than blocks in which repetitions were unexpected, which could lead to generally larger adaptation effects. However, such an effect should be equally (or even more strongly) present for the first tone of the pair, whereas our results showed that the auditory activity elicited by the first tone was indistinguishable between conditions, while there were large and robust differences related to the second tone. Moreover, intersubject variability in the activity increase for unexpected repetitions was correlated with variability in the activity increase for unexpected omissions. This is in line with both effects stemming from a single neural mechanism (increased prediction error), but does not seem consistent with a low-level adaptation account.
In our study, we manipulated the expectation of tone repetition while keeping the pitch of the tones constant. It could therefore be argued that our study jointly manipulated the expectation of the occurrence of an event in general (information-bound surprise) and the expectation of a particular stimulus (stimulus-bound surprise). In this regard, our study differs from the RS design of Summerfield et al. (2008), in which expectations were induced about the identity of a face (stimulus-bound surprise), but not about whether a face would be presented at all (information-bound surprise). However, the fact that our expectation modulation was early (100 ms) and confined to sensory regions (auditory cortex) may suggest that it reflects sensory (stimulus-bound) predictions rather than more general (information-bound) predictions about event occurrence. At later processing stages (∼300 ms), both forms of surprise are associated with a more widespread increase in activity that has been localized to a frontoparietal network (McCarthy et al., 1997; Mars et al., 2008; Bekinschtein et al., 2009). A follow-up study that specifically modulates the predictability of a particular tone while keeping the probability of tone occurrence constant could help to further dissociate these effects.
The observed expectancy modulation of repetition effects is in line with earlier human work showing increased neural activity in primary sensory cortex following both the surprising presence and the surprising absence of sensory stimulation (Garrido et al., 2007; den Ouden et al., 2009), which is also manifest in the gamma band (Gurtubay et al., 2006; Wyart and Tallon-Baudry, 2008). Interestingly, recent neurophysiological work has indicated that gamma-band oscillations are particularly strong in the superficial layers of the cortical column (Lakatos et al., 2005; Maier et al., 2010, 2011), which have dense “forward” connections to higher-order areas (Thomson and Bannister, 2003). It is therefore tempting to view the larger gamma-band activity we observed for the unexpected presence and absence of sensory stimulation in auditory cortex as a “prediction error” signal that is fed forward from early auditory cortex to higher-order regions (Rao and Ballard, 1999; Friston, 2005). This response should not be confused with the expectancy state itself, which is expressed in the temporal structure of activity patterns before the appearance of stimuli (Engel et al., 2001; van Ede et al., 2011).
While we observed a strong modulation of RS by expectation, this does not preclude that RS is also partly driven by automatic stimulus-driven mechanisms such as fatigue (Miller and Desimone, 1994; Grill-Spector and Malach, 2001) or sharpening (Desimone, 1996). In fact, a recent study that aimed to replicate the fMRI findings of Summerfield et al. (2008) measured neural activity (local field potentials and spike rates) in the inferior temporal (IT) cortex while monkeys viewed complex visual stimuli (fractals and natural stimuli), and found robust RS without any modulation by expectation (Kaliukhovich and Vogels, 2010). While the absence of an expectation effect on RS in their study is puzzling, there are marked differences between their study and ours. First, they measured neural activity in a higher-order area (IT cortex), whereas we analyzed responses in early sensory cortex. It is quite possible that expectation-related modulations are more pervasive at early sensory than at later stages of cortical processing (Rao and Ballard, 1999). Second, the presence or absence of selective attention may be important. Previous studies have observed that RS is markedly increased by selective attention (Eger et al., 2004; Murray and Wojciulik, 2004; Yi and Chun, 2005). While the monkeys in the study by Kaliukhovich and Vogels (2010) passively fixated the screen, the participants in our study (as in the study by Summerfield et al., 2008) were required to monitor the stimuli for occasional targets to which they had to respond. The increased attentional state may therefore have enabled the expectation effects to emerge. This notion remains speculative, however, and could be a fruitful topic for future research.
Of note, while we find a strong modulation of RS by expectation, it is highly plausible that expectation is not the only mechanism by which RS can occur. Indeed, RS may partly reflect intrinsic cellular properties of the system (Zucker, 1989; Farley et al., 2010).
In conclusion, we provide evidence for a top-down mediation of RS by expectation in early auditory cortex. These findings are of importance for studies that use RS as a tool to probe the functional representation of neuronal populations, since activity reductions due to repetition may be related to the predictive relationship between the first and second stimulus, rather than the repetition of a particular representation.
F.P.d.L. received funding from the Netherlands Organisation for Scientific Research (NWO VENI).
Correspondence should be addressed to Dr. Floris de Lange, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen, 6500 HB Nijmegen, The Netherlands.