Abstract
Repeated stimulus exposure alters the brain's response to the stimulus. Recording fMRI data from both men and women viewing 120 presentations of two Gabor patches (each Gabor repeating 60 times), we evaluated support for two prominent models of stimulus repetition, the fatigue model and the sharpening model. Our results uncovered a two-stage learning process in the primary visual cortex. In Stage 1, univariate BOLD activation in V1 decreased over the first 14 repetitions of the stimuli, replicating the well known effect of repetition suppression. Applying moving-window multivoxel pattern analysis decoding, we found that (1) the decoding accuracy between the two Gabors decreased from the above-chance level (∼60 to ∼70%) at the beginning of the stage to the chance level at the end of the stage (∼50%). This result, together with the accompanying weight map analysis, suggested that the learning dynamics in Stage 1 were consistent with the predictions of the fatigue model. In Stage 2, univariate BOLD activation for the remaining 46 repetitions of the two stimuli exhibited significant fluctuations but no systematic trend. The moving-window decoding accuracy between the two Gabor patches was at the chance level initially and became progressively higher as stimulus repetition continued, rising above and staying above the chance level starting at the ∼36th repetition. Thus, results from the second stage supported the notion that sustained and prolonged stimulus repetition prompts sharpened representations. Additional analyses addressed (1) whether the neural patterns within each learning stage remained stable and (2) whether new neural patterns were evoked in Stage 2 relative to Stage 1.
- fatigue model
- fMRI
- multivariate pattern analysis
- perceptual learning
- repetition suppression
- sharpening model
- visual cortex
Significance Statement
Repeated stimulus exposure alters the brain's response to the stimulus. The underlying neural mechanisms remain debated. Formulating the fatigue and sharpening models in multivariate terms, we applied moving-window decoding and weight map analyses to fMRI data evoked by two repeating Gabor patches in V1 to evaluate the support for the two models. Our findings reveal a two-stage learning process: an initial phase of repetition suppression driven by a fatigue mechanism, followed by a longer-term learning phase characterized by a sharpening effect. A further cross-stage decoding analysis indicates that the two learning stages evolved distinct neural representations of the same physical stimuli. These results shed new light on the brain's continuous adaptation to repeated stimuli over both short and longer timescales.
Introduction
Repeated stimulus elicits decreased neural response. Repetition suppression (Henson, 2003) has been observed in many brain regions in both humans and nonhuman primates (Buckner et al., 1995; Demb et al., 1995; Stern et al., 1996; McMahon and Olson, 2007; De Baene and Vogels, 2010). In nonhuman primates, single-unit responses to simple stimuli such as gratings decrease in the primary visual cortex (V1) after repeated stimulations (Makino and Komiyama, 2015; Kaliukhovich and Op de Beeck, 2018). Repeated simple stimuli also evoke decreased fMRI responses in V1 as well (Fang et al., 2005; Kourtzi and Huberle, 2005; Sapountzis et al., 2010; Larsson et al., 2016).
A number of neural models have been proposed to account for repetition suppression (Grill-Spector et al., 2006, Segaert et al., 2013). According to the fatigue model, repetition suppression occurs as the result of decrease in the firing rates of the stimulus-selective neurons (Miller and Desimone, 1994; Müller et al., 1999; Westerberg et al., 2019), whereas the stimulus-nonselective neurons are less impacted (Grill-Spector et al., 2006), leading to decreased population-level activities. The sharpening model, in contrast, proposes that repetition prompts strong suppression of stimulus-nonselective neurons, with the stimulus-selective neurons being less impacted, resulting in decreased population-level activities (Desimone, 1996; Martens and Gruber, 2012). This model has also been linked to the formation of increasingly sparse, efficient networks for representing the repeated feature (Gruber et al., 2002; Li and Keil, 2023). Both models, fatigue and sharpening, have been mainly tested with single-unit recordings in monkeys where stimulus selectivity can be precisely defined. In humans, the typical univariate fMRI analysis involves the averaging of BOLD activity across voxels in a brain region of interest (ROI) and, as such, does not have the spatial resolution to test these models.
The advent of multivoxel pattern analysis (MVPA) offers a new perspective on testing the neural models of repetition suppression. Consider two repeating stimuli A and B (e.g., two Gabor patches of orthogonal orientations). Suppose that each stimulus evokes a distinct neural pattern in a cortical area (e.g., V1) and an SVM (support vector machine) classifier can be trained to decode the two patterns. From the SVM classifier, one can derive not only the decoding accuracy indexing the difference between the A pattern and the B pattern but also a weight map, which divides the voxels into those that prefer A and those that prefer B. Treating the A-preferring voxels and B-preferring voxels as “neurons” that are selective for A and for B, we can then examine the temporal dynamics of those voxels. Our first objective in this work is to apply the MVPA approach to fMRI data to test the competing models of repetition suppression in V1.
The number of stimulus repetitions in typical repetition suppression studies is often small, on the order of 5–10 repetitions, after which the response typically becomes stabilized (Grill-Spector and Malach, 2001). Behavioral studies have shown that perceptual abilities continue to improve over prolonged stimulus exposure, even in cases where the stimuli are not behaviorally relevant (Watanabe et al., 2001, 2002). Neurophysiologically, single-unit activities in the primary visual cortex continue to evolve with repeated presentations of the same stimuli over several days (Henschke et al., 2020). It is thus likely that the univariate fMRI approach lacks the ability to identify the learning processes taking place over longer time horizons. Our second objective in this work is to uncover these putative learning processes by applying the MVPA approach.
fMRI data were recorded from subjects passively viewing repeated presentations of two Gabor patches (45 vs 135) in a random order. Each Gabor patch was presented 60 times for a total of 120 trials. Both univariate and multivariate approaches were applied to analyze the data from V1. Particular emphasis was placed on (1) identifying the neural mechanisms of repetition suppression at the multivoxel pattern level and (2) exploring learning processes beyond the timescale used in typical repetition suppression studies.
Materials and Methods
Participants
The experimental protocol was approved by the Institutional Review Board of the University of Florida. Eighteen healthy individuals (nine females and nine males; mean age, 21.3 ± 4.5 years) gave written informed consent and participated in the study.
Experimental paradigm
The stimuli (20 × 20 cm), subtending ∼5° of the visual angle, were presented on a back-illuminated screen (60 × 60 cm) placed 230 cm away from the participant's head. The participant viewed the stimuli through a set of prismatic glasses attached to the radio-frequency head coil. Two Gabor patches with orthogonal orientations (45 and 135°), referred to as Gabor 1 (45°) and Gabor 2 (135°), were used as the stimuli. In each trial, the stimulus was displayed for 1 s, which was followed by a randomized intertrial interval (3, 5, or 7 s; Fig. 1A). The subject was asked to passively view the stimulus. A total of 120 trials, 60 trials for each of the two Gabor patches, were presented in a single scanning run; the order of stimulus presentation was randomized.
Experimental paradigm and ROI definition. A, Timeline of stimulus presentation. There were two Gabor patches, Gabor 1 (45°) and Gabor 2 (135°), and each was presented 60 times for a total of 120 trials. The order of stimulus presentation was randomized. B, Primary visual cortex (V1), defined according to a retinotopic atlas of the visual cortex, was chosen to be the ROI. ITI, intertrial interval.
We note that the data analyzed here came from the habituation session of a pavlovian fear conditioning paradigm consisting of three sessions: a habituation session (the present data), an acquisition session in which the CS+ (Gabor 1) was occasionally paired with a human scream (the unconditioned stimulus or US), and an extinction session in which 120 trials of the two Gabor patches were presented in a random order without the US. Two manuscripts have been published on these data to investigate the neural mechanisms of fear learning and extinction (Yin et al., 2018, 2020). The questions addressed here have not been addressed in those previous studies.
Data acquisition
The fMRI data were acquired on a 3 Tesla Philips Achieva MRI scanner (Philips Medical Systems) with the following parameters: TR of 1.98 s, TE of 30 ms, flip angle of 80°, field of view of 224 mm, 36 slices, and 3.5 × 3.5 × 3.5 mm voxels. A T1-weighted structural image was also obtained from each participant.
Data preprocessing
The fMRI data were preprocessed with SPM. Slice timing correction was first done using the interpolation method. The images were spatially realigned to the first stable volume, normalized and registered to the Montreal Neurological Institute template, and resampled to a spatial resolution of 3 by 3 by 3 mm. The transformed images were spatially smoothed with a 7 mm full-width at half-maximum Gaussian kernel. The BOLD time series were high-pass filtered with a cutoff frequency at 1/128 Hz to remove low-frequency temporal drifts.
ROI
The primary visual cortex (V1) was chosen as the ROI. The V1 ventral and V1 dorsal of both hemispheres from a probabilistic visual retinotopic atlas (Wang et al., 2015) were combined to form the V1 ROI (Fig. 1B). In this atlas, each voxel is assigned probabilities of belonging to various visual areas. A voxel was included in V1 here if it had the highest probability of being part of V1.
Single-trial estimation of BOLD response
The single-trial BOLD activation in each voxel was estimated by the beta series method based on the general linear model approach to event-related fMRI data analysis (Rissman et al., 2004). Specifically, in the general linear model, each trial was represented by a separate regressor, and six head motion regressors were included as regressors of no interest to account for potential movement artifacts. This method yielded 120 single-trial BOLD responses for every voxel in V1. Across all voxels in V1, 120 single-trial–level BOLD activation patterns enabled the trial-by-trial analysis of neural activity change with learning both at the univariate activation level and at the multivariate representation level.
Univariate analysis of fMRI data
For each subject, BOLD activation within V1 for each trial was averaged across all V1 voxels, which were then further averaged across 18 subjects to generate trial-by-trial population–level BOLD activation for V1. Systematic changes in univariate activation over trials were examined for evidence of repetition suppression and/or possibly other forms of learning.
MVPA analysis of fMRI data
Moving-window approach
The differences in multivoxel responses to Gabor 1 and Gabor 2 were examined using MVPA applied at the subject level. To study the temporal dynamics of multivoxel representations, we adopted a moving-window approach, in which a time period of interest (e.g., the time period of repetition suppression) was divided into overlapping analysis windows with each window consisting of an equal number of consecutive Gabor 1 and Gabor 2 trials and two neighboring windows separated by one trial (i.e., stepping the analysis window forward one trial at a time). The reason for the moving-window approach is that, on the one hand, decoding analysis requires a set of trials from each condition so that some trials can be used to build the classifier and the remaining trials can be used to test the classifier to yield decoding accuracy, and, on the other hand, to investigate learning dynamics, we need to examine how decoding accuracy evolves over time. The moving-window approach accommodates both requirements. Figure 2A shows an example where the window size is 7 and the step size is 1, namely, the first analysis window consisted of Gabor 1 Repetitions 1–7 (G1R1 to G1R7) and Gabor 2 Repetitions 1–7 (G2R1 to G2R7), and the second analysis window consisted of Gabor 1 and 2 Repetitions R2 to R8 and so on. The middle repetition was used to denote the time of the window. For example, the first analysis window in Figure 2A included Repetitions 1–7, and it was given a time stamp of 4.
Moving-window MVPA analysis. A, Definition of moving windows (G1R1 indicates first presentation of Gabor 1, G2R1 first presentation of Gabor 2, and so on). B, SVM classification and weight map within a window. In addition to decoding accuracy, the SVM classifier also generates a weight map, which defines two types of voxels: Gabor 1 voxels (red) and Gabor 2 voxels (blue). C, Gabor 1 voxels and Gabor 2 voxels are selective for Gabor 1 and Gabor 2, respectively.
Within each analysis window, the SVM method as implemented in the LibSVM package (Chang and Lin, 2011) was combined with leave-one-out cross–validation to yield a decoding accuracy for classifying Gabor 1 versus Gabor 2 based on the patterns evoked by the two stimuli. Specifically, suppose there are N repetitions of Gabor 1 and N repetitions of Gabor 2 in the window. N − 1 repetitions of Gabor 1 and N − 1 repetitions of Gabor 2 are used to train the classifier which is then tested on the remaining 1 repetition of Gabor 1 and 1 repetition of Gabor 2. Repeat the process N times, and average the N decoding accuracies to obtain the decoding accuracy for the analysis window. As the analysis window was stepped forward, the above process was repeated for the new analysis window, yielding another decoding accuracy. Across all windows a decoding accuracy time course was obtained for the subject. The population-level decoding accuracy time course was obtained by averaging the subject-level decoding accuracy time course across all subjects and taken to indicate learning-related changes in multivariate representations of the Gabors.
Statistical assessment of decoding accuracy
The Wilcoxon sign-rank test was utilized to assess the statistical significance of the decoding accuracy. Specifically, the decoding accuracy from each analysis window was compared against the chance level of 50%, and the resulting p values were corrected for multiple comparisons using a false discovery rate control threshold of p < 0.05. The cluster size (≥3 points) was further imposed to enhance statistical rigor.
Stage identification analysis
We applied the univariate and multivariate analyses in one continuous stream from 1 to 60 repetitions and tested whether “staging” can be consistently identified by combining both univariate activation and multivariate decoding methods. For the continuous-stream moving–window MVPA analysis, the window size was seven trials and the step size was one trial. To segment the univariate BOLD time course and the multivariate decoding accuracy time course, we applied a method called multivariate adaptive regression splines (MARS; Friedman, 1991). Breakpoints in the BOLD signal and decoding accuracy curves were identified as transition points between two linear segments. For the decoding accuracy time course, given that it is based on moving windows, an allowance of ±2 trial is granted for each breakpoint. By comparing the consistency of breakpoints between the two curves, we determined the number of distinct stages present in the learning process.
Weight map analysis
Voxel-level stimulus selectivity
In addition to detecting the differences in stimulus-evoked patterns, the MVPA also makes possible the determination of stimulus selectivity at the voxel level, opening the possibility for testing the models of repetition suppression using fMRI. For an SVM classifier, the weight vector is the vector that is normal to the hyperplane that maximally separates the data points of two different classes (e.g., Gabor 1 and Gabor 2). This vector has the same dimension as the number of voxels in a ROI (e.g., V1). The physiological interpretation of the classifier weight vector is not straightforward, however. Voxels that do not contain task-related signals can be given high weight whereas voxels that contain task-related signals may be given low weight (see below). Haufe et al. (2014) proposed a transformation that transforms the weight vector from the SVM classifier into a new vector according to A = covariance(X) * W, where the X is the recorded data, W is the weight vector from the SVM classifier, and A represents the transformed weight vector. The new weight vector, referred to as the weight map here, is shown to reflect more accurately the generative neurophysiological processes behind stimulus-evoked patterns (Haufe et al., 2014; Grootswagers et al., 2017).
To illustrate, consider a two-voxel ROI where voxel 1 [x1(n)] contains both the signal of interest, s(n), and noise, d(n), while voxel 2 [x2(n)] contains only noise, d(n). A simple approach to recover the signal of interest is to compute the difference x1(n) − x2(n) = wTx(n), where w = [1, −1]T is the weight vector of the model. However, this weight vector does not account for the physiological significance of the voxels, as Voxel 2 does not contain information about the signal of interest and yet it is given the same weight as Voxel 1. Assume that the signal of interest, s(n), is sampled from a normal distribution with a mean of 3 and a standard deviation of 1, while the noise, d(n), follows a normal distribution with a mean of 0 and a standard deviation of 1. The covariance matrix of the two-voxel system is
Figure 2B demonstrates the use of the weight map to identify stimulus selectivity at the voxel level. Specifically, the sign of the weight in each voxel obtained from the weight map provides information about which of the two stimuli (Gabor 1 vs Gabor 2) the voxel is selective for, and the magnitude of the weight denotes the strength of the selectivity. Suppose that Gabor 1 is assigned the sign of +1 and Gabor 2 the sign of −1. A voxel with a positive weight indicates that the voxel has higher activation for Gabor 1 compared with Gabor 2 and is thus deemed to be selective for Gabor 1. A voxel with a negative weight will have an opposite activation profile and is thus deemed to be selective for Gabor 2. See Figure 2C for an illustration. In both cases, the magnitude of the weight determines the magnitude of differential activation. To reduce the possible adverse effects of noise, we rank-ordered the voxels according to the magnitude of the weight and took the top 80% of voxels to compute the BOLD activation of two types of voxels. It is worth noting that the results were robust to the choice of the threshold. The same findings were obtained for threshold ranging from 70 to 100%.
MVPA formulation of repetition suppression models
Temporal dynamics of decoding accuracy
The fatigue and sharpening models make distinct predictions regarding the dynamics of the decoding accuracy between the two Gabors. According to the fatigue model, the decoding accuracy between the two repeating Gabors should decline as learning progresses, as stimulus-selective neurons reduce their response amplitude and the activity of stimulus-nonselective neurons remains relatively stable, leading to less distinctively different neural patterns between the two stimuli; see Figure 3A, top. In contrast, according to the sharpening model, the decoding accuracy should increase with increasing stimulus repetition, due to the fact that the stimulus-nonselective neurons decrease in activity with learning and the activity of stimulus-selective neurons stays relatively stable, leading to more distinctively different patterns between the two stimuli; see Figure 3A, bottom.
MVPA formulation of fatigue and sharpening models. A, The fatigue model predicts that the two patterns evoked by Gabor 1 and Gabor 2 become less distinct with stimulus repetition, resulting in decreased decoding accuracy (top). The sharpening model, in contrast, predicts that two patterns evoked by Gabor 1 and Gabor 2 become more distinct with stimulus repetition, resulting in increased decoding accuracy (bottom). B, The fatigue model predicts that BOLD activity in stimulus-selective (stimulus-nonselective) voxels decreases (stays unchanged) with stimulus repetition (left panel). The sharpening model predicts that BOLD activity in stimulus-nonselective (stimulus-selective) voxels decreases (stays unchanged; right panel).
To test these predictions, the decoding accuracy as a function of moving windows would be generated according to the procedure outlined above. Linear regressions were fitted to the decoding accuracy time course and the slope was subjected to statistical analysis.
Temporal dynamics of BOLD response in stimulus-selective and stimulus-nonselective voxels
The fatigue and sharpening models make distinct predictions regarding the response dynamics of stimulus-selective and stimulus-nonselective voxels. According to the fatigue model, the BOLD activation in stimulus-selective voxels should exhibit decline, whereas the BOLD activation in stimulus-nonselective voxels should show minimal change (Fig. 3B, left). In contrast, the sharpening model predicts a decline in the BOLD activation in stimulus-nonselective voxels, whereas the BOLD activation in stimulus-selective voxels is expected to exhibit minimal change (Fig. 3B, right).
To test these predictions, for a given Gabor patch, the BOLD activation in voxels selective for the Gabor patch was first averaged for a trial, which was then averaged across subjects; the resulting value was then displayed as a function of trials. Similar temporal activation function of trials can be constructed for the stimulus-nonselective voxels. Linear regressions were fitted to the BOLD activation time courses of two types of voxels, and the slope was subjected to statistical analysis.
Results
fMRI data were recorded while subjects passively viewed the repeated presentation of two Gabor patches (Gabor 1 = 45°; Gabor 2 = 135°). Each Gabor was presented 60 times for a total of 120 trials. The order of stimulus presentation was randomized. We analyzed the fMRI data using both univariate and multivariate methods to assess learning and plasticity in the primary visual cortex (V1).
Learning stage identification
We applied both univariate and multivariate analyses in one continuous stream to Repetitions 1–60 and tested whether learning stages can be consistently identified across the two different types of analysis methods. Univariate analysis was done at the single repetition level. For each Gabor 1 trial, the BOLD activation was averaged across voxels of V1 and then across all subjects. Repeating the procedure for all 60 repetitions resulted in the BOLD activation time course for Gabor 1. The BOLD activation time course for Gabor 2 was similarly obtained. The two time courses were then averaged. MVPA analysis was done using the moving-window approach with the window size being 7 repetitions for each Gabor (a total of 14 trials) and the step size being 1 repetition (Fig. 2A). Within each window, we decoded the differences between the multivoxel responses evoked by Gabor 1 and Gabor 2, and the decoding accuracy was obtained by combining SVM with leave-one-trial-out cross–validation and then averaged across all subjects. Figure 4, A and B, shows the univariate BOLD activation as a function of repetitions and the MVPA decoding accuracy as a function of moving windows with each window indexed by the middle repetition. Applying MARS (Friedman, 1991), six linear segments were identified for each of the two time courses in Figure 4, A and B, as determined by the generalized cross-validation criterion (Friedman, 1991; Hastie et al., 2009).
Learning stages in V1 revealed by univariate and multivariate analyses. A, Univariate BOLD activation as a function of stimulus repetition and the time course segmentation by multivariate adaptive regression. B, Moving-window decoding accuracy time course and the time course segmentation by multivariate adaptive regression. C, Multivariate adaptive regressing fittings for univariate and multivariate time courses in A and B. Break points were marked by small circles. Only the first segment (repetition suppression) was consistently identified by both the univariate and the multivariate methods. D, Break point comparison between model fittings of univariate BOLD activation and multivariate decoding accuracy. Again, only the first breakpoint was consistently identified across methods. E, Two-stage segmentation of learning dynamics.
For a model with six linear segments, there are five breakpoints, with each breakpoint being the repetition at which two neighboring segments join. For decoding accuracy time course, given that it is based on moving windows, each of which contains seven repetitions, an allowance of ±2 repetitions is granted for each breakpoint. The breakpoints of the univariate BOLD time course are repetitions 14, 25, 32, 38, and 54 (Fig. 4A). The breakpoints of the decoding accuracy time course are windows centered at Repetition 14, 18, 26, 43, and 46 (Fig. 4B). To facilitate comparison, we plotted together the multisegment linear fits in Figure 4, A and B, in Figure 4C, where the break points were marked by small colored circles. As one can see, except for the first segment, which starts at Repetition 1 and ends at Repetition 14, there are no consistently identified breaking points between the two methods. To demonstrate this another way, in Figure 4D, for each breakpoint, the repetitions from the univariate analysis and from the MVPA analysis were plotted. The repetitions at which Breakpoint 1 occurred from the two methods overlapped, suggesting that Stage 1, the repetition suppression stage, was consistently identified by both the univariate and the multivariate methods. Beyond the first segment, however, there is no consistent segmentation between the univariate BOLD activation and the multivariate decoding time courses. Given these results, and the lack of prior knowledge regarding the cortical dynamics beyond the initial repetition suppression, we concluded that a two-stage model is appropriate for the data with Repetitions 1–14 being Learning Stage 1 and Repetitions 15–60 being Learning Stage 2 (Fig. 4E).
Univariate analysis of Stage 1 and Stage 2
For the first 14 repetitions of the stimuli, labeled S1 (Stage1) in Figure 4E, the repetition suppression effect in the BOLD activation time course (Fig. 4A) was apparent, whereas for the remaining repetitions, labeled S2 (Stage 2), there were significant fluctuations but no clear trend. Quantitatively, for S1, linear regression models were fit to each subject's data, and the slopes of regression models (−0.081 ± 0.038) were found to be significantly negative at p = 0.024. Furthermore, the BOLD activation of the 14th repetition was significantly lower than that of the first presentation at p = 0.01. For S2, the slopes of the linear regression models were not significantly different from 0 (p = 0.93). See also Figure 6A where the smoothed BOLD activation time course from S2 was shown, which again revealed no systematic trend in S2.
Multivariate analysis: Stage 1
Figure 5 shows more detailed multivariate analysis of Stage 1 learning. The moving-window decoding accuracy time course in Figure 5A showed that the decoding accuracy in Window 1 (Repetition 4) was 59%, which is significantly higher than the chance level of 50% (p < 0.01), increased to 73% (p < 0.001) in Window 2 (repetition 5), and then decreased to 53% at the end of S1, which was no longer significantly above chance (p = 0.3). The generally decreasing trend of the decoding accuracy shows that at the multivoxel pattern level, Gabor 1 and Gabor 2 evoked less distinguishable neural patterns in V1 toward the end of repetition suppression. Regression fits to the data yielded a slope of −2.2 ± 0.79 which was significantly <0 at p = 0.006 (Fig. 5A). The diminishing decoding accuracy, according to the model formulation in multivoxel terms in Figure 3A, is consistent with the prediction of the fatigue model.
More detailed MVPA analysis of Stage 1 learning dynamics. A, Moving-window decoding accuracy in V1 across stimulus repetition. The asterisk indicates decoding accuracy significantly above the chance level (50%) at p < 0.05. The repetition number is the middle repetition of each window. B, BOLD activation time course in stimulus-selective voxels. C, BOLD activation time course in stimulus-nonselective voxels. D, Weight map correlation for all pairwise combinations of moving windows. Only weight maps in decodable windows were used. All correlations are significantly positive (p < 0.05). E, Weight maps from Window 1 (Repetition 4) and Window 5 (Repetition 8) of one-sample participant are shown on the flattened brain. The warmer color (yellow) indicates voxels that are selective for Gabor 1, and the cooler color (blue) indicates voxels that are selective for Gabor 2.
To test this idea further, we utilized the weight map analysis and defined two types of voxels: one that selectively responds to Gabor 1 and one that selectively responds to Gabor 2. For stimulus-selective voxels, responses of Gabor 1-selective voxels to Gabor 1 and responses of Gabor 2-selective voxels to Gabor 2 were averaged and plotted as a function of stimulus repetition. As shown in Figure 5B, the responses of the stimulus-selective voxels exhibited a statistically declining trend, with the slope being −0.14 ± 0.05 which is significantly <0 at p = 0.007. For the stimulus-nonselective voxels, responses of Gabor 1-selective voxels to Gabor 2 and responses of Gabor 2-selective voxels to Gabor 1 were averaged and plotted as a function of stimulus repetition. The slope was −0.016 ± 0.045 which was not significantly different from 0 (p = 0.36; Fig. 5C). Moreover, the slopes in Figure 5, B and C, are significantly different at p = 0.013. Based on Figure 3B, these weight map results demonstrated that the fatigue model accounted for the neural dynamics underlying Stage 1 learning, consistent with the conclusion drawn from the decoding accuracy dynamics in Figure 5A.
Whether the neural representations of the two stimuli were stable during the course of repetition suppression was tested by computing correlation across weight maps for all pairwise combinations of moving windows. As shown in Figure 5D, all the correlations were significantly positive (p < 0.05), demonstrating that the patterns underlying the two stimuli stayed relatively stable. Figure 5E shows the weight maps from Window 1 (Repetition 4) and Window 5 (Repetition 8) for a typical subject. The consistency between the two maps was readily noticeable.
Multivariate analysis: Stage 2
It can be seen that the moving-window decoding accuracy time courses for Stage 2 learning in Figure 4, B and E, displayed significant fluctuations. To see the long-term trend more clearly, we adopted a smoothing approach in which the moving-window size was increased to 20 repetitions and the step size remained at 1 repetition. As shown in Figure 6A, at the beginning of S2, the decoding accuracy was at the chance level, but with further repetitions of the stimuli, the decoding accuracy began to trend up, reaching statistical significance at Window 13 (Repetition 36) and staying above the chance level until the end of the experiment. Linear regression fitting of the data yielded a slope of 0.5 ± 0.1 which was significantly larger than 0 at p < 0.0001 (Fig. 6A). Thus, beyond the initial repetition suppression (Stage 1), there was a second stage of perceptual learning, in which the neural representations of the two Gabor patches become more distinct from one another with learning, in contrast to what was observed during Stage 1 where the neural representations of the two Gabors became less distinct with learning. According to the multivoxel formulation of the models in Figure 3A, the above temporal behavior of the decoding accuracy is consistent with a sharpening process. The smoothed univariate BOLD activation, however, showed no systematic trend (Fig. 6A), suggesting that learning may take place at the multivariate representational level without affecting population-level univariate activation.
MVPA analysis of Stage 2 learning dynamics with a larger window size. A, Moving-window decoding accuracy and smoothed univariate BOLD activation in V1 across stimulus repetition. The asterisk indicates decoding accuracy significantly above the chance level (50%) at p < 0.05. The repetition number denotes the middle repetition of each window. All correlations are significantly positive (p < 0.05). B, BOLD activation time course in stimulus-selective voxels. C, BOLD activation time course in stimulus-nonselective voxels. D, Weight map correlation for all pairwise combinations of moving windows. Only weight maps in decodable windows were used. E, Weight maps from Window 13 (Repetition 36) and Window 27 (Repetition 50) of one-sample participant are shown on the flattened brain. The warmer color (yellow) indicates voxels that are selective for Gabor 1, and the cooler color (blue) indicates voxels that are selective for Gabor 2.
We examined the sharpening idea further using the weight map analysis. For stimulus-selective voxels, responses of Gabor 1-selective voxels to Gabor 1 and responses of Gabor 2-selective voxels to Gabor 2 were averaged and plotted as a function of stimulus repetition. As shown in Figure 6B, the responses exhibited an increasing trend, where the slope was 0.03 ± 0.01 which was statistically significantly larger than 0 at p = 0.02. For stimulus-nonselective voxels, responses of Gabor 1-selective voxels to Gabor 2 and responses of Gabor 2-selective voxels to Gabor 1 were averaged and plotted as a function of stimulus repetition. A declining trend was seen although the slope was marginally statistically significant (p = 0.054; Fig. 6C). The slopes in Figure 6, B and C, are significantly different (p < 0.0001). Thus, the weight map analysis results are in line with the conclusion from decoding accuracy analysis, suggesting that the sharpening model accounted for the neural dynamics underlying Stage 2 learning. It is worth noting that the sharpening process seen here is slightly different from the one depicted in Figure 3B, in that the activity in the stimulus-selective voxels exhibited significant increase rather than remaining relatively constant. The opposing trends in the two types of voxels (Fig. 6B,C) may explain why the overall univariate BOLD activation, which takes the average across the two types of voxels, did not show any systematic trend (Fig. 6A).
Whether the neural representations of the two Gabor patches were stable during the course of repetition sharpening was tested by computing correlation across weight maps for all pairwise combinations of moving windows. The result in Figure 6D shows that the neural representations generally stayed stable during the course of S2 with the correlations between all pairwise window combinations being significantly positive (p < 0.05). Figure 6E shows the weight maps from Window 13 (Repetition 36) and Window 27 (Repetition 50) for a typical subject. The similarity between the two weight maps was readily noticeable.
Multivariate analysis: comparison of representations between Stage 1 and Stage 2
For much of Stage 1 and Stage 2, the two Gabor patches are decodable based on their distinct neural activation patterns in V1. The question is whether the decodable representations of Gabor 1 and 2 are shared across the two stages of learning. If the representations of Gabor 1 and Gabor 2 are similar between the two stages, then MVPA classifiers constructed on data from one stage would decode the data from another. However, as shown in Figure 7A, the cross-decoding accuracy was at the chance level for all possible pairwise combinations of S1 windows and S2 windows, indicating that during Stage 2 learning, new representations for the two Gabor patches evolved, which were different from that during Stage 1. The weight map correlation analysis further supports this observation, showing that the weight maps from S1 and S2 are not correlated for any of the possible S1 and S2 window combinations (Fig. 7B), in contrast to the within-stage weight map correlation in Figures 5D and 6D. To further visualize this, we plotted the weight maps from S1 and S2 for two sample participants. In these weight maps, the warmer color (yellow) represents voxels selective for Gabor 1, and the cooler color (blue) represents voxels selective for Gabor 2. Visually, it is apparent that voxels that are selective for Gabor 1/Gabor 2 in Stage 1 differ from those selective for Gabor 1/Gabor 2 in Stage 2 (Fig. 7C), which accorded with the chance-level cross–decoding result and the zero weight map correlation result above. Furthermore, as can be observed in Figure 7C, the dissimilarity in neural patterns is not only evident between learning stages within the same participant but also evident between different participants within the same learning stage, suggesting that even for simple stimuli such as Gabor patches and in the early visual cortex such as V1, the neural representations are individual-specific.
Cross-stage decoding and weight map analysis. A, Classifiers trained on analysis windows in Stage 1 were applied to test data from Stage 2. The cross-decoding accuracy was at the chance level (50%). The repetition number indicates the middle repetition in each window. B, Weight map correlations for all cross-stage pairwise combinations of analysis windows were not significantly different from 0. C, Weight maps from two participants in V1 are shown on the flattened hemispheres. The warmer color (yellow) indicates voxels that are selective for Gabor 1, and the cooler color (blue) indicates voxels that are selective for Gabor 2.
Discussion
Participants viewed 120 presentations of two Gabor patches (each repeating 60 times) in a random order, while their fMRI was recorded. Two types of learning behavior were found in V1. In Stage 1 learning, BOLD activation decreased over the first 14 repetitions of the stimuli, replicating the well known effect of repetition suppression, and the moving-window decoding accuracy between the two Gabors went from being above the chance level (∼60 to ∼70%) at the beginning to being at the chance level at the end of the stage (∼50%), which, along with the attendant weight map analysis, suggested that the learning dynamics in Stage 1 were consistent with the predictions of the fatigue model. In Stage 2 learning, BOLD activation for the remaining 46 repetitions exhibited significant fluctuations but no systematic trend, whereas the moving-window decoding accuracy between the two Gabor patches was at the chance level initially and became progressively higher as stimulus repetition continued, rising above the chance level at ∼36th repetition and staying above the chance level until the end of the experiment. This result, along with the attendant weight map analysis, suggested that a sharpening process underlies the learning dynamics in Stage 2.
Stage 1 learning: repetition fatigue
Repetition suppression is a robust index of short-term perceptual learning (Sobotka and Ringo, 1996; Naccache and Dehaene, 2001). For example, a single-unit study in nonhuman primates has reported that neurons in inferior temporal cortex (IT) show a continual reduction in the firing rate with each stimulus presentation for up to 6–10 repetitions, after which the response became stable (Li et al., 1993; Sawamura et al., 2006). fMRI studies have found that the first few stimulus repetitions produced a substantial reduction in signal strength in LOC, and no further reduction was observed upon continued repetitions of the stimuli (Grill-Spector and Malach, 2001; Sayres and Grill-Spector, 2006). Our univariate BOLD activation results, showing repetition suppression for the first 14 repetitions of the stimuli, replicate these findings in V1.
Among the models that have been proposed to account for the neural mechanisms of repetition suppression, the fatigue model posits that stimulus repetition reduces the responsiveness of neurons selective for the repeating stimuli (Carandini and Ferster, 1997; Carandini, 2000; Grill-Spector et al., 2006), whereas the sharpening model, in contrast, posits that stimulus repetition sharpens neuronal tuning by reducing the activity of neurons not selective for the repeating stimuli; both models can explain the reduction of responses at the population level (Wiggs and Martin, 1998; Grill-Spector et al., 2006). Evidence in support of one or the other model has appeared in single-unit studies. For example, Freedman et al. and Baker et al. both demonstrated that the repetition of stimuli leads to increased selectivity in the neural responses with a concurrent attenuation of activity in neurons that are less selective (Baker et al., 2002; Freedman et al., 2006), aligning with the predictions of the sharpening model. Conversely, Li et al. (1993) reported that the most significant reduction in neuronal response occurs for those stimuli that initially elicited the largest response, a finding that lends support to the fatigue model. In terms of terminology, we note that alternative terms such as scaling have been used in place of fatigue (McMahon and Olson, 2007), which place more emphasis on the empirical pattern and less emphasis on the underlying physiological mechanism.
In humans, the testing of these models has proven to be not straightforward (Jiang et al., 2007; Gotts et al., 2015; Rideaux et al., 2023). Despite that repetition suppression has been consistently observed with fMRI in a variety of brain regions, ranging from V1 to higher-order visual regions such as IT to executive structures in the frontal lobe (Buckner et al., 1998; Kourtzi and Kanwisher, 2001; Ewbank et al., 2005; Sawamura et al., 2005; Weigelt et al., 2008; Pourtois et al., 2009), the commonly applied univariate fMRI approach, which involves the averaging of activities across voxels in a given ROI, lacks the requisite spatial resolution and response specificity for testing the models of repetition suppression. Weiner et al. (2010) examined neural responses of stimulus-selective voxels under novel and repeated stimulus conditions, revealing a reduction in the fMRI signal for repeated stimuli. Their study, however, did not consider activities in stimulus-nonselective voxels; in addition, the lack of single-trial data limited the investigation of neural representation dynamics.
Over the past two decades, the advent of the MVPA method has opened new avenues for addressing challenging neuroscience questions, yielding insights not possible with the univariate approach (Said et al., 2010; Visser et al., 2013; Bo et al., 2021, 2022; Čeko et al., 2022). In a typical MVPA analysis, two experimental conditions are compared by training an SVM classifier on the neural data evoked by the two experimental conditions, with above-chance classification or decoding accuracy taken as the evidence that the two experimental conditions evoke distinct neural patterns. For the present study, the two experimental conditions were the two Gabor patches that were both repeating. By applying a moving-window MVPA approach to single-trial data, we examined how decoding accuracy evolves as a function of repetition. For the fatigue model to hold, the decoding accuracy between the two Gabors should decrease with stimulus repetition, whereas for the sharpening model to hold, the decoding accuracy between the two Gabors should increase with stimulus repetition. The result from the MVPA analysis, demonstrating decreasing decoding accuracy, suggests that the repetition suppression observed during Stage 1 results from a fatigue process. Similar findings have been made in a computational modeling study (Alink et al., 2018) and in the lateral ventral temporal cortex using complex visual stimuli (Weiner et al., 2010).
The weight map is another tool that can be derived from the SVM classifier. In particular, the transformation proposed by Haufe et al. (2014) enables the dividing of voxels in a ROI into those that are selective for Gabor 1 and for Gabor 2, respectively. By assessing the neural response in voxels that are selective for a given stimulus (stimulus-selective voxels) and the neural responses in voxels that are not selective for a given stimulus (stimulus-nonselective voxels), we are then equipped with the ability to test the models of repetition suppression in a way that is analogous to the single-unit studies in nonhuman primates. For the fatigue model to hold, neural activity in the stimulus-selective voxels should decrease and that in the stimulus-nonselective voxels should remain steady, whereas for the sharpening model to hold, neural activity in the stimulus-selective voxels should remain steady and that in the stimulus-nonselective voxels should decrease. The result from the weight map analysis, demonstrating decreasing neural activity in stimulus-selective voxel and steady neural activity in stimulus-nonselective voxels, supports the conclusion above that repetition suppression in Stage 1 results from a fatigue process.
Stage 2 learning: repetition sharpening
During Stage 2 learning (Repetition 15 to Repetition 60), univariate BOLD activity from V1 exhibited significant fluctuations but no systematic trend. Suspecting that the univariate analysis lacks the sensitivity to uncover the underlying learning effects (Op de Beeck et al., 2006), we applied the MVPA approach to complement the univariate analysis. The moving-window decoding accuracy indicated that the neural response patterns to Gabor 1 and Gabor 2 continued to evolve, becoming more distinct from one another, as evidenced by increasing decoding accuracy. From the weight map analysis, the BOLD response in stimulus-selective voxels increased with learning, whereas the BOLD response in stimulus-nonselective voxels decreased with learning, which agrees with the conclusion that a sharpening process underlies the second stage of repetition learning.
Past work has noted that the number of stimulus repetitions can impact the mechanisms of learning. Studies reporting sharpening have typically involved larger numbers of repetitions (Baker et al., 2002; Freedman et al., 2006), whereas studies reporting fatigue have typically involved fewer repetitions (McMahon and Olson, 2007; De Baene and Vogels, 2010). Our findings that in the same experiment, with increasing stimulus repetition, the initial repetition fatigue is followed by a subsequent repetition sharpening can be seen to advance the extant literature.
Methodological considerations
The findings reported here are made possible by several features of our methodology. First, using the two repeating stimuli paradigm, we were able to reformulate the models of repetition suppression in terms of multivariate representations, enabling the testing of the repetition suppression models in fMRI data. Second, decoding accuracy alone is insufficient for a comprehensive evaluation of different models, as identical decoding dynamics can result from different underlying voxel-level dynamics. The weight map analysis provides the necessary insights to disambiguate and thus complements the decoding accuracy dynamics. Third, the MVPA results during Stage 2, while suggesting a sharpening process, are not strictly in accordance with the definition of sharpening in the literature: both enhancement of activation in stimulus-selective voxels and suppression of activation in stimulus-nonselective voxels were observed in our data, whereas the traditional definition of sharpening places emphasis on the decreased activity in stimulus-nonselective neurons. The weight map analysis is again essential in establishing this finding.
Footnotes
This work was supported by National Institutes of Health Grant MH125615.
The authors declare no competing financial interests.
- Correspondence should be addressed to Mingzhou Ding at mding{at}bme.ufl.edu.