Abstract
Visual working memory (VWM) is used to maintain visual information available for subsequent goal-directed behavior. The content of VWM has been shown to affect the behavioral response to concurrent visual input, suggesting that visual representations originating from VWM and from sensory input draw upon a shared neural substrate (i.e., a sensory recruitment stance on VWM storage). Here, we hypothesized that visual information maintained in VWM would enhance the neural response to concurrent visual input that matches the content of VWM. To test this hypothesis, we measured fMRI BOLD responses to task-irrelevant stimuli acquired from 15 human participants (three males) performing a concurrent delayed match-to-sample task. In this task, observers were sequentially presented with two shape stimuli and a retro-cue indicating which of the two shapes should be memorized for subsequent recognition. During the retention interval, a task-irrelevant shape (the probe) was briefly presented in the peripheral visual field, which could either match or mismatch the shape category of the memorized stimulus. We show that this probe stimulus elicited a stronger BOLD response, and allowed for increased shape-classification performance, when it matched rather than mismatched the concurrently memorized content, despite identical visual stimulation. Our results demonstrate that VWM enhances the neural response to concurrent visual input in a content-specific way. This finding is consistent with the view that neural populations involved in sensory processing are recruited for VWM storage, and it provides a common explanation for a plethora of behavioral studies in which VWM-matching visual input elicits a stronger behavioral and perceptual response.
SIGNIFICANCE STATEMENT Humans heavily rely on visual information to interact with their environment and frequently must memorize such information for later use. Visual working memory allows for maintaining such visual information in the mind's eye after termination of its retinal input. It is hypothesized that information maintained in visual working memory relies on the same neural populations that process visual input. Accordingly, the content of visual working memory is known to affect our conscious perception of concurrent visual input. Here, we demonstrate for the first time that visual input elicits an enhanced neural response when it matches the content of visual working memory, in terms of both signal strength and information content.
Introduction
Humans navigate in a dynamic visual environment. Consequently, it is often necessary to maintain a visual representation “in the mind's eye” after the visual input has disappeared or changed. Visual working memory (VWM) is used to keep this visual information available for subsequent goal-directed behavior. During VWM maintenance, however, the visual system continues to receive visual input. This raises the question of how the processing of visual input is affected by the concurrent contents of VWM.
Behavioral experiments have demonstrated that visual input matching the content of VWM attracts attention (Soto et al., 2005; Olivers et al., 2006; for review, see Soto et al., 2008) and eye movements (Hollingworth and Luck, 2009; Hollingworth et al., 2013; Schneegans et al., 2014; Silvis and Van der Stigchel, 2014). Along similar lines, visual input that matches the content of VWM gains preferential access to visual awareness compared with visual input that fails to match VWM content (Gayet et al., 2013, 2015, 2016; Scocchia et al., 2013; Pan et al., 2014; van Moorselaar et al., 2017). Thus, perceptual and behavioral advantages for visual input that matches the contents of VWM are well established. The neural mechanisms that enable these advantages, however, remain largely unknown.
Here, we consider the possibility that visual representations elicited by VWM and visual representations elicited by retinal input rely upon a shared neural substrate, a view known as sensory recruitment. This view is supported by recent fMRI studies, in which multivariate decoding of stimulus identity generalized across stimulus viewing conditions and VWM maintenance conditions (Harrison and Tong, 2009; Serences et al., 2009; Stokes et al., 2009; Riggall and Postle, 2012; Albers et al., 2013), as well as between stimulus viewing conditions and visual imagery conditions (Cichy et al., 2012). Sensory recruitment also finds support in a recent study in which the reported motion of transcranial magnetic stimulation-induced phosphenes was modulated by a concurrently memorized motion pattern (Silvanto and Cattaneo, 2010). Sensory recruitment would allow the content of VWM to enhance visual input selectively when it matches, but not when it fails to match, the concurrently memorized content (Reynolds and Chelazzi, 2004; for similar views, see Chelazzi et al., 2011).
The aim of the present study was to investigate whether the neural response to visual input is enhanced when it matches the content of VWM. To manipulate the content of VWM, we used a delayed match-to-sample task in which participants were retro-cued to memorize one of two sequentially presented geometrical shapes drawn from three categories (rectangle, ellipse, and triangle) for subsequent recognition. During the retention interval, another shape was briefly presented (hereafter referred to as the “probe”). This probe was task-irrelevant, and could match the shape category of the cued (memorized) shape, of the uncued (discarded) shape, or of neither (unrelated). We expected that neural populations that respond to the presentation of the probe would show a stronger response when the probe matches rather than fails to match the content of VWM (i.e., memorized > discarded). In addition to enhancing the overall signal strength, we also set out to investigate whether VWM could enhance the information content of matching visual input. That is, if the same processing areas are recruited for perceiving a rectangle (or triangle) and for maintaining a rectangle (or triangle) in VWM, then the pattern of neural activity elicited by triangles and rectangles should be more distinct when the perceived and the memorized shape match, compared with when they mismatch. Hence, we conducted multivariate pattern analyses (MVPA) to investigate whether classification of shape category would yield higher classifier performance for visual input that matches compared with visual input that fails to match the content of VWM.
Materials and Methods
Participants.
Fifteen students (three males; mean age 24 years, SD = 4) participated for monetary reimbursement. All participants had normal or corrected-to-normal vision, and had participated in at least one behavioral version of this experiment before taking part in the fMRI experiment. All participants gave their informed written consent before participating in this study, which complied with the guidelines set out in the Declaration of Helsinki and was approved by the local ethics committee.
Stimuli.
The shape stimuli (Fig. 1D) consisted of filled rectangles, isosceles triangles, and ellipses with a surface area of 1 square degree of visual angle, thereby equating overall stimulus luminance. They were presented in dark gray (10% Weber contrast) on a lighter gray background (30 cd/m²). Within each of these three shape categories, nine shape variants were created by varying the height-to-width ratio between 0.75 and 1.25, with smaller steps at the extreme ends (Fig. 1D). The stimuli presented during the retention interval (i.e., the probes) were rectangles, triangles, and ellipses with a height-to-width ratio of 1. These stimuli were presented at a fixed eccentricity of 3° of visual angle at one of seven equally spaced locations on the left and right arcs of an imaginary circle, delimited by its main diagonals (i.e., at ±45, 60, 75, 90, 105, 120, and 135°; Fig. 1B). Six other shape variations of each shape category were used for the memory task and, unlike the peripheral probes, were always presented at fixation. As such, the cued (i.e., memorized) and uncued (i.e., discarded) memory items were never identical to the probe stimulus, and were always presented at a retinal location different from that of the probe stimulus. Finally, the test items presented during the recognition task were presented left and right of fixation at an eccentricity of 1.5° of visual angle. These test items consisted of the cued (memorized) memory item, and an item drawn from the same shape category but with a slightly different height-to-width ratio (i.e., either one step left or one step right; Fig. 1D).
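For concreteness, the following sketch computes the 14 possible probe locations from this polar description. It is an illustration only: the convention that angles are measured from the upper vertical meridian (so that 90° falls on the horizontal meridian) is our assumption, as are the function and variable names.

```python
import numpy as np

ECCENTRICITY = 3.0                    # degrees of visual angle
ANGLES_DEG = np.arange(45, 136, 15)   # 45, 60, 75, 90, 105, 120, 135

def probe_locations(eccentricity=ECCENTRICITY):
    """Return (x, y) offsets from fixation for all 14 probe locations."""
    locations = []
    for side in (-1, +1):             # left and right hemifield
        for a in np.deg2rad(ANGLES_DEG):
            # Assumed convention: angle measured from the upper vertical
            # meridian, so a = 90 deg lies on the horizontal meridian.
            x = side * eccentricity * np.sin(a)
            y = eccentricity * np.cos(a)
            locations.append((x, y))
    return locations

print(len(probe_locations()))         # -> 14
```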
A, Schematic depiction of a trial in the memorized condition. Each trial started with two centrally presented shape stimuli from different shape categories (the memory items), followed by a retro-cue indicating which of the two should be memorized for a later recognition task. During the delay interval, a different shape stimulus (the probe) was presented for 1 s at an unpredictable timing and location. After the delay interval, two test stimuli were presented, which were both drawn from the shape category of the cued (i.e., memorized) memory item. Participants were required to report which of these was the exact shape they had been cued to memorize at the start of the trial. B, Congruence conditions. The probe (here an ellipse) could either match the shape category of the cued memory item (memorized condition, as depicted here), the uncued memory item (discarded condition), or neither (unrelated condition). C, The memory items were presented at fixation, the probe could be presented at one of seven locations on each side of fixation, and the test items (during the recognition task) were presented at intermediate distances on both sides of fixation. D, The stimulus set, including the height-to-width (h/w) ratio of each shape used in this experiment. There were three shape categories, displayed on separate rows: rectangles, triangles, and ellipses. The memory items were drawn from a collection of six distinct variations per shape category, varying in h/w. The probe was a rectangle, a triangle, or an ellipse with an h/w ratio of 1. All stimuli depicted in this image could be presented in the recognition task of the experiment (i.e., the test phase).
Procedure.
Participants completed 144 experimental trials during the functional scans (divided into eight runs), and 6 min of practice trials during the preceding structural scan. Each trial (Fig. 1A) started with a fixation bull's-eye, which turned blue to indicate that the memory task would begin in 1 s. Participants were then successively presented with two shapes (the memory items), drawn from two different shape categories, for 400 ms each. This was followed by a 400 ms interval after which a retro-cue was presented for 800 ms. This retro-cue, either the number “1” or “2,” instructed participants to memorize either the first or the second memory item for later recognition. After a randomly jittered delay of 4.5 ± 1.5 s, a task-irrelevant shape stimulus (the probe) was presented for 1 s at one of 14 possible locations (seven left of fixation, seven right of fixation). Crucially, the probe could either match the shape category of the cued memory item (hereafter referred to as the memorized condition), it could match the shape category of the uncued memory item (discarded condition), or it could match the shape category that was not used on that trial (unrelated condition; Fig. 1B). After another delay of 7.5 ± 2 s, two test stimuli appeared left and right of fixation for 1.5 s, one of which was identical to the cued (i.e., memorized) memory item, and one of which had a slightly different height-to-width ratio (one step in either direction). By pressing a button with the left-hand or right-hand index finger, participants reported which of these two test stimuli (left or right of fixation) was identical to the cued memory item. After participants gave a response, or after 3.5 s had passed, the fixation bull's-eye changed from blue to red to indicate that the trial had ended. After an intertrial interval of 3 ± 1.5 s, the fixation bull's-eye turned blue again, to indicate that the next trial would begin after 1 s.
Experimental design.
The experimental design comprised the within-subject factor Congruence (probe matches memorized, discarded, or unrelated shape category) as a main factor of interest. Factors of no interest included the shape of the probe (rectangle, triangle, or ellipse), the hemifield to which the probe was presented (left or right of fixation), the retro-cue (instruction to memorize first or second shape), and the correct answer in the memory task (test item left or right of fixation is identical to the cued memory item). A number of other factors of no interest were also equally (and randomly) distributed over the entire experiment, but were not counterbalanced with the other factors: the exact shape (i.e., the height-to-width ratio; Fig. 1D) of the cued and uncued memory items, the exact shape of the incorrect answer in the memory-recognition task (higher or lower height-to-width ratio than the cued memory item), and the exact angular position of the probe (one of seven positions within each hemifield; Fig. 1C).
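For concreteness, fully crossing the factor of interest with the four counterbalanced factors of no interest yields 3 × 3 × 2 × 2 × 2 = 72 design cells, so each cell occurred exactly twice across the 144 trials. The following is a hypothetical reconstruction of such a trial list, not the original experiment code:

```python
import itertools
import random

# The five counterbalanced factors named above.
congruence   = ("memorized", "discarded", "unrelated")
probe_shape  = ("rectangle", "triangle", "ellipse")
hemifield    = ("left", "right")
retro_cue    = (1, 2)
correct_side = ("left", "right")

# Full crossing gives 72 cells; each occurs twice over 144 trials.
cells = list(itertools.product(congruence, probe_shape,
                               hemifield, retro_cue, correct_side))
assert len(cells) == 72
trials = cells * 2                 # 144 trials in total
random.shuffle(trials)             # randomized order across the eight runs
```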
Functional localizer.
We conducted a separate functional localizer run after the experimental runs. The aim of the functional localizer run was to locate the brain regions responsive to the presentation of our stimuli, relative to baseline (i.e., compared with a situation in which fixation was maintained on the background, but no additional shape stimuli were presented) and relative to scrambled versions of the shape stimuli. The scrambled stimuli were obtained by randomly rearranging an eight-by-eight tiling of a square-shaped area containing the shape stimuli. The localizer comprised a miniblock design with presentation of intact shapes, scrambled shapes, and fixation-only (baseline) as blocked conditions. Each of these three miniblocks lasted 29.4 s, and was separated by interblock intervals of 3.5 ± 2 s. The sequence of three miniblocks was repeated eight times in random order. Within a single miniblock, each of the 21 different shapes was presented once to the left hemifield and once to the right hemifield, for 450 ms followed by a fixation interval of 250 ms. Within each hemifield, each of the seven possible locations was used twice in random order. Participants were instructed to maintain fixation during the entire run and to press a button whenever they perceived the same shape twice in succession. This would occur at an unpredictable moment, three times per miniblock.
MRI data acquisition and preprocessing.
Functional MRI data were acquired on a 3 tesla Trio MRI system (Siemens) equipped with a 12-channel head coil, using a T2-weighted gradient-echo EPI sequence. The fMRI session comprised eight experimental runs and a functional localizer run. In each of the experimental runs, 213 whole-brain volumes were acquired. In the functional localizer run, 242 whole-brain volumes were acquired. The fMRI runs (2000 ms repetition time; 25 ms echo time; 78° flip angle; voxel size, 3 mm isotropic; 33 slices acquired in descending order; 0.75 mm interslice gap) were preceded by a high-resolution T1-weighted MPRAGE structural scan (192 sagittal slices, 1900 ms repetition time, 2.52 ms echo time, 9° flip angle, 256 mm field of view).
Preprocessing was performed using SPM12 (www.fil.ion.ucl.ac.uk/spm) and included slice-time correction, spatial realignment and coregistration with the structural image, and field-map correction. Additionally, normalization to the standard Montreal Neurological Institute template and smoothing with an 8 mm Gaussian kernel were applied for the univariate analyses.
Regions of interest.
The present study aimed to investigate whether the neural response to visual input is enhanced when it matches the content of VWM. Considering this research question, we constrained our analyses to those voxels responsive to the visual presentation of our stimuli, and asked whether these voxels would show an enhanced response when the probe matched (as opposed to mismatched) the content of VWM. For this purpose, we created our regions of interest (ROIs) based on the set of voxels that showed a significant response (pFWE < 0.05) in the stimulus > baseline contrast of the functional localizer run. The resulting activation pattern (1583 voxels) comprised the lateral occipital cortex, the inferior and superior parietal lobules, and the posterior part of the frontal lobe (Fig. 3C). We defined separate ROIs for these three regions by intersecting the localizer-based activations with anatomical masks derived from the Automated Anatomical Labeling Atlas (Tzourio-Mazoyer et al., 2002), respectively comprising all occipital masks (label numbers 49–54), all parietal masks (59–62), and frontal/precentral masks (1–16). Importantly, we avoided the circularity issues associated with “double dipping” by using a separate functional run as the basis for our ROI selection (Kriegeskorte et al., 2009). That is, the dataset on which the ROI analyses were conducted (experimental runs) was independent from the dataset from which the ROIs were constructed (functional localizer run). Note that, as the intact > scrambled contrast of the functional localizer run was unsuccessful in targeting shape-responsive voxels in visual cortex, this contrast was not used for generating ROIs.
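As an illustration of this ROI construction, here is a minimal sketch using nilearn (the study itself used SPM12). The file name "localizer_mask.nii" is hypothetical and stands for the binarized stimulus > baseline map (pFWE < 0.05); the exact AAL label set corresponding to mask numbers 49–54 is likewise our assumption:

```python
from nilearn import datasets, image

aal = datasets.fetch_atlas_aal()                 # AAL atlas (maps + labels)
localizer = image.load_img("localizer_mask.nii")  # hypothetical binary mask

def roi_from_aal(label_names):
    """Intersect the localizer activation with a set of AAL regions."""
    vals = [int(aal.indices[aal.labels.index(name)]) for name in label_names]
    expr = " + ".join(f"(atlas == {v})" for v in vals)
    anat = image.math_img(f"({expr}) > 0", atlas=aal.maps)
    anat = image.resample_to_img(anat, localizer, interpolation="nearest")
    return image.math_img("mask * anat", mask=localizer, anat=anat)

occipital_roi = roi_from_aal(["Occipital_Sup_L", "Occipital_Sup_R",
                              "Occipital_Mid_L", "Occipital_Mid_R",
                              "Occipital_Inf_L", "Occipital_Inf_R"])
```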
Classification accuracies (and SEMs) for dissociating between the three shape categories used in this experiment (i.e., rectangles, ellipses, and triangles), within our three ROIs and combinations thereof. A leave-one-run-out cross-validation procedure on the eight miniblocks of the separate functional localizer run revealed that the compound occipital–parietal ROI, consisting of the occipital and parietal ROIs, yielded the highest classification accuracies. This was also the only ROI in which each individual classification of shape pairs was above chance. Hence, we chose this compound occipital–parietal ROI for conducting multivariate analyses on the fMRI data from the experimental runs. *p < 0.05, **p < 0.005. Black stars indicate significant classification for pairs of shape categories (e.g., rectangle vs triangle), whereas white stars indicate significant classification averaged over the three pairs of shape categories.
For the multivariate analyses, we were interested in targeting those brain regions most likely to contain visual representations of our stimuli. For this purpose, we first investigated which of our three (i.e., occipital, parietal, and frontal) ROIs, or combinations thereof, yielded the highest classification accuracies for dissociating between the different shape categories (i.e., rectangles, triangles, or ellipses) in the separate functional localizer run. For each subject, we first estimated a general linear model (GLM) based on the unsmoothed and non-normalized data. This GLM included six motion regressors and 24 regressors of interest—one for each of the three shape categories and each of the eight miniblocks. The estimated β images from the GLM were used for support vector machine (SVM) classification. SVM classification was performed with The Decoding Toolbox (Hebart et al., 2014), using a linear SVM (libsvm). Classification was performed following a leave-one-run-out cross-validation procedure (in which the different “runs” were effectively different miniblocks of the same functional localizer run). On each iteration, the classifier was trained on the β maps of seven miniblocks and tested on the β maps of the remaining eighth miniblock. Classification was done separately for the three pairs of shape categories (rectangle vs triangle; rectangle vs ellipse; and triangle vs ellipse) at the subject level, and subsequently compared between ROIs at the group level. These analyses revealed that classification accuracies were highest (and most consistent across pairs of shape categories) in the compound occipital–parietal ROI (61.7%, SE = 2.5; t(14) = 4.61, p < 0.001). As can be seen in Figure 2, the pattern of results generally suggested that the frontal ROI did not contribute to the classifier's ability to distinguish between shape categories. Hence, we used the compound occipital–parietal ROI as the primary ROI for our multivariate analyses. For exploratory purposes, we will also report multivariate analyses conducted within the separate occipital, parietal, and frontal ROIs. Again, and importantly, the ROIs were created (and selected) on the basis of a different dataset than the dataset on which the eventual analyses were conducted, to avoid the issues associated with double dipping.
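For illustration, a minimal sketch of such a leave-one-miniblock-out classification in Python with scikit-learn (the study used The Decoding Toolbox with libsvm; the placeholder data and variable names below are ours):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# X: one beta pattern per miniblock and shape category of a given pair
#    within an ROI (8 miniblocks x 2 categories = 16 patterns).
# y: shape-category labels; groups: miniblock index of each pattern.
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 500))      # placeholder beta patterns
y = np.tile([0, 1], 8)                  # e.g., rectangle vs triangle
groups = np.repeat(np.arange(8), 2)     # miniblock of each pattern

clf = LinearSVC(C=1.0)                  # linear SVM, as in the study
accuracy = cross_val_score(clf, X, y, groups=groups,
                           cv=LeaveOneGroupOut()).mean()
print(f"cross-validated accuracy: {accuracy:.2f}")
```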
Univariate fMRI data analysis.
To investigate how the neural response to the probe was affected by its match with the content of VWM, we estimated a first-level GLM. This GLM included three regressors tied to the onset of the probe, one for each level of the factor Congruence (memorized, discarded, unrelated), as well as regressors of no interest tied to the onsets of both memory items, the retro-cue, and the two test items (regardless of the different factor levels). These regressors were modeled as stick functions (i.e., duration set to zero) and were convolved with the canonical hemodynamic response function provided in SPM12. Additionally, six regressors for head motion—from the spatial realignment procedure—were included in the GLM. The whole-brain maps of parameter estimates from the GLM were used to compute separate contrast images for the response to the probe in the memorized, discarded, and unrelated conditions against baseline.
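As an illustration of this regressor setup, the following sketch builds an analogous design matrix with nilearn (the study used SPM12; the onsets and motion values are placeholder assumptions). Zero-duration events yield the stick-function regressors described above:

```python
import numpy as np
import pandas as pd
from nilearn.glm.first_level import make_first_level_design_matrix

t_r = 2.0
frame_times = np.arange(213) * t_r            # 213 volumes per run

events = pd.DataFrame({
    "onset":      [10.0, 45.0, 80.0],         # hypothetical probe onsets
    "duration":   [0.0, 0.0, 0.0],            # stick functions
    "trial_type": ["probe_memorized", "probe_discarded", "probe_unrelated"],
})
motion = pd.DataFrame(np.zeros((213, 6)),     # realignment parameters
                      columns=[f"motion_{i}" for i in range(6)])

design = make_first_level_design_matrix(
    frame_times, events,
    hrf_model="spm",                          # canonical HRF, as in SPM12
    drift_model=None,                         # drift handling omitted here
    add_regs=motion)
```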
We addressed our research question with two complementary univariate analysis approaches. First, we performed an exploratory whole-brain analysis to investigate which brain regions were modulated by the match between the probe and the content of VWM. For this purpose, we estimated a one-way repeated-measures ANOVA for the three Congruence conditions (memorized, unrelated, and discarded). Subsequent pairwise t contrasts at the group level were conducted between each pair of Congruence conditions (memorized vs discarded; memorized vs unrelated; discarded vs unrelated) to investigate the nature of the main effect of Congruence. Significance was determined at the cluster level (familywise error corrected; i.e., pcFWE < 0.05, for a cluster-defining threshold of puncorrected < 0.001).
Second, we aimed to more directly test our hypothesis that the response to the probe (within the voxels identified by the stimulus > baseline contrast from the localizer) is enhanced when it matches the content of VWM. With this aim, we extracted the average parameter estimates for each of the three Congruence conditions within the three different ROIs described above (occipital, parietal, and frontal) and their combination. A repeated-measures ANOVA with the factors ROI and Congruence was performed to test for a main effect of Congruence across ROIs. The critical analyses were subsequent repeated-measures ANOVAs with the factor Congruence within each ROI, followed by pairwise comparisons testing for a higher average parameter estimate in the memorized condition compared with the discarded and unrelated conditions.
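A minimal sketch of such a ROI-by-Congruence repeated-measures ANOVA using statsmodels (the data frame below is simulated, one row per subject, ROI, and Congruence cell; the actual analysis was run on the extracted parameter estimates):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated placeholder data: 15 subjects x 3 ROIs x 3 Congruence levels.
rng = np.random.default_rng(0)
rois = ["occipital", "parietal", "frontal"]
conds = ["memorized", "unrelated", "discarded"]
rows = [(s, r, c, rng.normal())
        for s in range(15) for r in rois for c in conds]
df = pd.DataFrame(rows, columns=["subject", "roi", "congruence", "estimate"])

# Repeated-measures ANOVA with two within-subject factors.
anova = AnovaRM(df, depvar="estimate", subject="subject",
                within=["roi", "congruence"]).fit()
print(anova)
```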
Multivariate fMRI data analyses.
A first multivariate approach was undertaken to investigate whether shape-specific information could be retrieved from our primary ROI. This is important because brain regions that contribute to a congruency effect between memorized shapes and perceived shapes should contain shape-specific information about both. Hence, we first investigated whether we could obtain above-chance classification accuracy for the content of VWM throughout the retention period within our primary ROI (ignoring congruency for now). For this purpose, we estimated for each subject a GLM based on the unsmoothed and non-normalized data. The GLM included six motion regressors, and shape-specific regressors (i.e., rectangle, ellipse, or triangle shape categories) for the two memory items, for the to-be-memorized (cued) memory item during the retention interval, for the probe, and for the two test stimuli. Regressors for the memory items, probe, and test stimuli were modeled as stick functions (i.e., duration set to zero), whereas the regressor for the to-be-memorized memory item was modeled with a duration extending from retro-cue onset (instructing participants which memory item should be memorized) to probe presentation. All regressors were convolved with the canonical hemodynamic response function provided in SPM12. Choosing the onset of probe presentation as the endpoint of the retention interval allowed for assessing whether a memory signal was present preceding probe onset, while ensuring that the timing of the regressor was jittered with respect to the timing of the subsequent test stimuli (which were always of the same shape category as the to-be-memorized memory item).
A second multivariate approach was undertaken to further address our main research question: whether VWM enhances the neural response to matching visual input not only in terms of signal strength (i.e., an increased univariate response), but also in terms of information content. For this purpose, we aimed to investigate whether classifier performance increases when the content of VWM (the to-be-memorized item) matches rather than mismatches the concurrent visual input (i.e., the probe). According to the idea of sensory recruitment, the same pattern of neural activity should represent a specific shape (say, an ellipse) regardless of whether it stems from visual input or from VWM. Hence, the patterns of activity associated with two different shapes (say, an ellipse and a rectangle) should be more distinct (and therefore easier to dissociate) when the probe and the memorized item match compared with when they mismatch. Using MVPA, we expected that this increased distinctness would be measured as higher classification accuracies between shape categories when the shape category of the probe (i.e., the visual input) matched rather than mismatched the shape category of the memorized item (i.e., the content of VWM).
For these MVPA analyses, we first estimated for each subject a GLM based on the unsmoothed and non-normalized data. The GLM included six motion regressors, and shape-specific regressors (i.e., rectangle, ellipse, or triangle shape categories) for the two memory items and for the two test stimuli. All regressors were modeled as stick functions (i.e., duration set to zero) and were convolved with the canonical hemodynamic response function provided in SPM12. For the probe, nine regressors were included, one for each combination of probe shape and VWM content (e.g., probe is a triangle while VWM content is a rectangle). This allowed us to compare classification accuracy for probes that matched the content of VWM (e.g., rectangle probe with rectangle in VWM vs triangle probe with triangle in VWM) with classification accuracy for probes that mismatched the content of VWM (e.g., rectangle vs triangle probe, both with an ellipse in VWM). To assess whether classifier performance in the former case does not merely reflect classification of VWM content during probe presentation, we also performed shape classification of probe-mismatching VWM content (e.g., rectangle vs triangle in VWM, both with an ellipse as probe) during probe presentation. Thus, if VWM enhances the neural response to matching visual input in terms of information content, this would be revealed by an increased classifier performance for VWM-matching probes that cannot be explained by classifier performance for either VWM-mismatching probes or probe-mismatching VWM content.
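The following sketch makes the mapping between the nine probe regressors and the three classification schemes explicit for one pair of shape categories (the β-map naming is hypothetical):

```python
from itertools import product

# Nine probe regressors: one per (probe shape, VWM shape) combination.
shapes = ("rectangle", "triangle", "ellipse")
betas = {(probe, vwm): f"beta_probe-{probe}_vwm-{vwm}"
         for probe, vwm in product(shapes, shapes)}

# 1) Matching: probe and VWM agree on the to-be-classified category.
matching = [betas[("rectangle", "rectangle")],
            betas[("triangle", "triangle")]]
# 2) VWM-mismatching probes: classify the probe, VWM held at ellipse.
mismatch_probe = [betas[("rectangle", "ellipse")],
                  betas[("triangle", "ellipse")]]
# 3) Probe-mismatching VWM: classify VWM content, probe held at ellipse.
mismatch_vwm = [betas[("ellipse", "rectangle")],
                betas[("ellipse", "triangle")]]
```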
For both multivariate approaches, the estimated β images from the GLM were used for SVM classification, using a linear SVM (libsvm) in The Decoding Toolbox (Hebart et al., 2014). Classification was initially performed within our primary ROI (the compound occipital–parietal ROI, based on the separate functional localizer run), following a leave-one-run-out cross-validation procedure. On each iteration, the classifier was trained on the β maps of seven runs and tested on the β maps of the remaining eighth run. Next, statistical analyses of classifier performance were conducted at the group level. We conducted two-sided one-sample t tests (i.e., α = 0.05) to establish above-chance classifier performance in the different conditions (i.e., classification accuracy >50%). For our main analysis, chance-level classifier performance was further analyzed with directional Bayesian one-sample t tests (using the standard Cauchy prior width of 0.707) or Bayesian correlations (using the standard β prior width of 1) in JASP (JASP Team, 2016) to distinguish between experimental insensitivity (BF01 < 3) and robust support for the null hypothesis (BF01 > 3; Dienes, 2014). We also assessed robustness to wider priors (note that very narrow priors disadvantage not only H1, but also H0).
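For illustration, the group-level test of classifier performance against chance could be sketched as follows in Python (the study used JASP; here we assume pingouin's bayesfactor_ttest, which computes a JZS Bayes factor from the t statistic, and simulated subject-level accuracies):

```python
import numpy as np
from scipy import stats
from pingouin import bayesfactor_ttest

# `acc` stands for the 15 subject-level classification accuracies;
# the values here are simulated placeholders.
rng = np.random.default_rng(42)
acc = rng.normal(loc=0.50, scale=0.05, size=15)

t, p = stats.ttest_1samp(acc, popmean=0.5)       # two-sided t test vs chance
bf10 = bayesfactor_ttest(t, nx=len(acc),         # directional JZS Bayes
                         alternative="greater",  # factor, Cauchy prior
                         r=0.707)                # width r = 0.707
print(f"t(14) = {t:.2f}, p = {p:.3f}, BF01 = {1 / float(bf10):.1f}")
```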
Results
Behavioral results
Participants were 62% (SD = 6) accurate in reporting which of two shape variations was identical to the cued (i.e., memorized) shape, which is above chance according to a one-sample t test against chance-level performance of 50% (t(14) = 8.498, p < 0.001). Thus, the task was feasible but demanding, which is a known requirement for delayed match-to-sample tasks to draw upon VWM, as simpler memory tasks allow participants to assign verbal labels to the different stimuli (Olivers et al., 2006, their Exp. 1). A paired-samples t test revealed that participants' performance did not depend on whether they were cued to memorize the first or the second memory item (accuracy difference of 2.1%, SD = 9; t(14) = 0.910, p = 0.378). Also, within-subject repeated-measures ANOVAs revealed that participants' accuracy on the recognition task did not significantly differ between experimental runs (ranging between 60 and 64%, SD ranging between 7 and 9; F(8,112) = 0.728, p = 0.541). Similarly, accuracy did not differ between the rectangle, triangle, and ellipse shape conditions (ranging between 61 and 64%, SD ranging between 7 and 9; F(2,28) = 1.381, p = 0.268). Finally, accuracy did not differ between the memorized, discarded, and unrelated Congruence conditions (ranging between 61 and 63%, SD ranging between 7 and 8; F(2,28) = 0.614, p = 0.548). Thus, recognition task difficulty was comparable both across conditions and over the course of the experiment.
fMRI univariate results
To assess the influence of the contents of VWM on the neural response to the probe, we first conducted a mass univariate analysis to reveal clusters of voxels in which the factor Congruence could explain variance in the BOLD response. A whole-brain repeated-measures ANOVA showed a main effect of Congruence in eight different clusters (pcFWE < 0.05, for a cluster-defining threshold of puncorrected < 0.001). These included the left and right lateral occipital cortices, the left and right superior parietal lobes, the right insular cortex, and three frontal regions (Table 1). Next, we evaluated the individual pairwise contrasts between the Congruence conditions. Following our hypothesis, we expected that the probe would elicit a stronger BOLD response when it matched, compared with when it mismatched, the shape category of a concurrently memorized shape. We found an extensive network of brain regions that showed enhanced responses for the memorized condition both compared with the discarded (Fig. 3A, top) and the unrelated (Fig. 3A, bottom) conditions (pcFWE < 0.05). This network comprised the visual processing region in the inferior division of the left (and right for the memorized > unrelated contrast) lateral occipital cortex extending into the inferior temporal gyrus, and bilateral superior parietal lobe extending into the superior division of the lateral occipital cortex. In addition, the network included frontal regions along the left and right precentral sulci, extending into the right dorsolateral prefrontal cortex (a region corresponding to the pars triangularis of the right inferior frontal gyrus).
Results of univariate t contrasts between Congruence conditions at the group level
Results of the univariate analyses of BOLD response to the probe. A, Clusters of significant voxels (pcFWE < 0.05, for cluster-defining voxel threshold puncorrected < 0.001) from two contrasts, projected on an inflated surface of the standard Montreal Neurological Institute (MNI) template brain. None of the other contrasts (i.e., unrelated > discarded; discarded > unrelated; unrelated > memorized; and discarded > memorized) yielded significant clusters of voxels. B, A coronal slice encompassing visual processing areas in the occipital and parietal cortices (y-coordinate, 35 mm, MNI) on which the significant clusters of voxels from A are binarized and superimposed (80% opacity) on the significant clusters of voxels from the stimulus > baseline contrast of the functional localizer run (at the same voxel threshold, for illustrative purposes). C, Depiction of our ROIs. These ROIs comprised the significant voxels (pFWE < 0.05) in the stimulus > baseline contrast of the functional localizer run, separated into occipital (red), parietal (green), and frontal voxels (blue). D, Average parameter estimates for each Congruence condition (memorized, unrelated, discarded) against baseline, for each of the ROIs, as well as for the compound ROI comprising the occipital, parietal, and frontal ROIs together. Error bars represent the SEM. *p < 0.05, **p < 0.005, ***p < 0.0005.
The reciprocal contrasts (unrelated > memorized; discarded > memorized) revealed no significant difference in BOLD response in any brain region. Also, there was no differential activation between probes in the discarded and unrelated conditions (discarded > unrelated and unrelated > discarded contrasts). Finally, because cluster-level analyses lead to higher rates of false positives when lower (i.e., more liberal) cluster-defining thresholds are used (Friston et al., 1994; Eklund et al., 2016), we replicated our cluster-level analyses with a cluster-defining threshold of puncorrected = 0.0001 (rather than puncorrected = 0.001). All the clusters observed in our previous analyses survived this more stringent analysis. Together, the present results indicate that visual input elicits a stronger neural response—in higher-level occipital areas as well as in frontal and parietal regions—when it matches rather than mismatches the content of VWM.
Next, we investigated whether the neural response to probe stimuli was modulated by the content of VWM within our functionally defined ROIs. Submitting the average parameter estimates within our localizer-based compound ROI (i.e., composed of the occipital, parietal, and frontal ROIs) to a one-way repeated-measures ANOVA with the factor Congruence (memorized, unrelated, discarded), we found a significant main effect of Congruence (F(2,13) = 12.33, p = 0.001). Subsequent t tests revealed that the BOLD response in these voxels was stronger for probes in the memorized condition than in either the discarded (t(14) = 4.30, p = 0.001) or the unrelated (t(14) = 4.56, p < 0.001) condition, but did not differ between the discarded and the unrelated conditions (t(14) = 0.31, p = 0.764). Next, we considered the separate occipital, parietal, and frontal ROIs that constitute the compound ROI. A three-by-three repeated-measures ANOVA was conducted on the average parameter estimates, with the factors ROI (occipital, parietal, and frontal) and Congruence (probe is in the memorized, unrelated, or discarded condition). This revealed a main effect of ROI (F(2,28) = 16.020, p < 0.001), a main effect of Congruence (F(2,28) = 15.194, p < 0.001), and a marginal interaction between ROI and Congruence (F(4,46) = 3.261, p = 0.081). Post hoc Tukey's tests revealed that, across ROIs, parameter estimates were higher in the memorized condition than in both the unrelated (t = 4.891, ptukey < 0.001) and discarded (t = 4.648, ptukey < 0.001) conditions, with no difference between these two conditions (t = 0.234, ptukey = 0.968). This pattern of results was further investigated within each of the individual ROIs. Repeated-measures ANOVAs performed for each ROI individually showed that the main effect of the factor Congruence was significant in the occipital ROI (F(2,13) = 6.86, p = 0.009), in the parietal ROI (F(2,13) = 13.76, p < 0.001), and in the frontal ROI (F(2,13) = 10.28, p = 0.002). Subsequent paired-samples t tests confirmed that probes elicited a stronger BOLD response in the memorized condition than in the discarded condition within the occipital ROI (t(14) = 2.83, p = 0.013), the parietal ROI (t(14) = 4.95, p < 0.001), and the frontal ROI (t(14) = 3.97, p = 0.001). Similarly, probes elicited a stronger BOLD response in the memorized condition than in the unrelated condition within the occipital ROI (t(14) = 3.76, p = 0.002), the parietal ROI (t(14) = 4.74, p < 0.001), and the frontal ROI (t(14) = 4.68, p < 0.001). Again, the BOLD response to probes in the unrelated and discarded conditions did not differ in any of the ROIs (all p's > 0.4). Average parameter estimates for all ROIs are depicted in Figure 3D. Overall, these findings indicate that those brain regions that respond to the presentation of our shape stimuli show a larger BOLD response when these stimuli match rather than mismatch the content of VWM.
fMRI multivariate results
The neural response to visual input is enhanced when it matches the content of VWM. Brain regions that contribute to this effect are thus expected to contain shape-specific information on both the probe and the content of VWM. Following this line of reasoning, we investigated whether the compound occipital–parietal ROI contains (1) shape-specific information on the to-be-memorized shape during the retention interval leading up to the presentation of the probe, and (2) shape-specific information on the shape category of the probe. As expected, linear classification between shape categories revealed above-chance classification accuracy for the shape category maintained in VWM during the retention interval (57.4%, SE = 2.4; t(14) = 3.13, p = 0.007), as well as above-chance classification accuracy for the shape category of the probe (54.2%, SE = 1.8; t(14) = 2.31, p = 0.037; Fig. 4). The compound occipital–parietal ROI was chosen due to its superior classification performance in the independent functional localizer data (see Materials and Methods). Exploratory analyses within the separate occipital, parietal, and frontal ROIs revealed that both VWM (55.7%, SE = 1.5; t(14) = 3.90, p = 0.002) and probe-shape category (58.2%, SE = 2.4; t(14) = 3.40, p = 0.004) yielded above-chance decoding accuracies within the occipital ROI, but not within the parietal and frontal ROIs (all p's > 0.05). Exploratory correlational analyses revealed that classifier performance for the shape category of VWM and the shape category of the probe were correlated within the parietal ROI (r = 0.73, p = 0.002) but not in the occipital or the frontal ROI (all p's > 0.8). These exploratory correlational analyses suggest that VWM representations and representations elicited by visual input are still dissociated in the lateral occipital cortex, whereas they draw upon a more overlapping neural substrate in the superior parietal cortex.
Classification accuracies (and SEMs) for dissociating between shape categories maintained in VWM, and for dissociating between shape categories of the probe within the compound occipital–parietal ROI (left), and the three separate occipital, parietal, and frontal ROIs. *p < 0.05, **p < 0.005.
The univariate data revealed that VWM-matching visual input is enhanced in terms of signal strength. Next, we asked whether VWM-matching visual input is also enhanced in terms of information content. Following the sensory recruitment stance, we hypothesized that visual representations of different shape categories should be more distinct when visual input matches compared with when it mismatches the content of VWM. Thus, we investigated whether linear classification of shape category yielded higher classification accuracies when the probe and the content of VWM were of the same shape category compared with when they were not (Fig. 5). Within the compound occipital–parietal ROI, the classifier was able to successfully distinguish between shape categories when VWM and probe were of the same shape category (56.4%, SE = 2.0; t(14) = 3.15, p = 0.007), which is significantly higher than when classifying between shape categories of probes that failed to match the content of VWM (t(14) = 2.231, p = 0.043), and marginally higher than when classifying between shape categories maintained in VWM that mismatched the shape of the probe (t(14) = 2.109, p = 0.053). In fact, the classifier was unable to distinguish between shape categories of VWM-mismatching probes (49.3%, SE = 1.9; t(14) = 0.38, p = 0.713; BF0+ = 6.7; BF0+ = 13.0 with an ultrawide prior width of 2), nor was it able to distinguish between probe-mismatching shape categories maintained in VWM (50.6%, SE = 2.4; t(14) = 0.23, p = 0.822; BF0+ = 4.3; BF0+ = 8.1 with an ultrawide prior width of 2). Subsequent exploratory analyses in the three separate occipital, parietal, and frontal ROIs revealed that the pattern of significant results described above emerged only in the parietal ROI. In sum, classifier performance for VWM-matching visual input cannot be accounted for by either classification of the probe shape category alone, or by classification of the shape category in VWM alone. Rather, the content of VWM enhanced the neural representation of matching visual input.
Classification accuracies (and SEMs) for dissociating between shape categories with a constant (mismatching) shape in VWM (left bar), between matching VWM and probe shape categories (middle bar), and between VWM shape categories with a constant (mismatching) probe shape (right bar). Classification was performed within the compound occipital–parietal ROI, and within the three separate occipital, parietal, and frontal ROIs. Classifier performance observed in the middle bar (i.e., matching), but not in the left and right bars (i.e., mismatching), indicates enhanced classifier performance for VWM-matching probes, which cannot be accounted for by classification of either the shape of the probe when it mismatches VWM (left bar) or the VWM content when it mismatches the probe (right bar). *p < 0.05, **p < 0.005.
Discussion
In many situations, the human brain must maintain information in VWM for subsequent behavior, while simultaneously continuing to process visual input. This raises the question of how visual information maintained in VWM affects the processing of concurrent visual input. Here, we demonstrate that when visual input matches rather than mismatches the content of VWM, it elicits an enhanced neural response. Specifically, those brain regions responsive to the presentation of the stimuli showed an enhanced BOLD response to shape stimuli when a shape of the same category was concurrently maintained in VWM. Increased activity levels were observed in the inferior lateral occipital cortex, in the inferior parietal lobule, and along the precentral sulcus. The enhanced neural response to VWM-matching visual input provides a common explanation for a plethora of behavioral phenomena, such as attentional capture and preferential access to awareness (Soto et al., 2008; Gayet et al., 2013). In addition to the enhanced univariate neural response observed for VWM-matching visual input, we also observed an enhanced multivariate neural response: the pattern of neural activity elicited by different geometrical shapes was more distinct when visual input and the content of VWM were in accordance, compared with when they were in discordance. This finding shows that VWM enhances the neural response to matching visual input not only in terms of signal strength, but also in terms of information content. This enhanced multivariate response is also in line with a sensory recruitment stance on VWM: visual representations of geometrical shapes relied upon the same pattern of neural activity whether they were memorized (i.e., in VWM) or perceived (i.e., presented as a probe). On the one hand, we advocate caution in interpreting these multivariate analyses, which were statistically less robust than the univariate analyses. On the other hand, these multivariate analyses corroborate the findings of the univariate analyses by showing that visual input that matches the content of VWM is enhanced in terms of both signal strength and information content.
The question remains at what stage of the visual-processing hierarchy the content of VWM affects concurrent visual input. One potential candidate is V1, where qualitatively similar neural traces have been observed during VWM maintenance and visual stimulation with oriented gratings (Serences et al., 2009; Harrison and Tong, 2009). Although the retinal distance between memory items and probes in our study exceeded the receptive field sizes typically observed in V1 (Harvey and Dumoulin, 2011), orientation representations in V1 are not necessarily spatially selective (Ester et al., 2009). Therefore, the retinal nonoverlap of our memory items and probes does not rule out contributions of V1 to the present findings. Shape-category information, however, is arguably less likely to be stored in V1 than orientation information (for review, see Christophel et al., 2017).
In our study, lateral occipital areas and superior parietal areas both showed an enhanced response to visual input that matched the content of VWM. In lateral occipital areas, we were able to classify the shape category of both the shape maintained in VWM and the shape presented on the screen, based on the pattern of neural activity. This supports the idea that lateral occipital areas served a content-based role in VWM maintenance in our study, which fits well with the proposed role of such brain areas as the lateral occipital complex (LOC) in representing categorical visual object information. More specifically, the LOC was found to be sensitive to differences between stimulus categories, while being relatively insensitive to noncategorical properties, such as viewpoint (Guggenmos et al., 2015), object size, and, most importantly, stimulus exemplars within categories (Grill-Spector et al., 1999; Eger et al., 2008). VWM maintenance of objects likewise has been related to the LOC (Xu and Chun, 2006) and, in line with the findings of the present study, also to superior parietal areas (Song and Jiang, 2006; Xu and Chun, 2006; Christophel et al., 2012; Ester et al., 2015; Bettencourt and Xu, 2016). In our exploratory analyses, a correlation was observed between probe and VWM shape classification in the parietal (but not in the occipital) ROI. Possibly, the neural activity elicited by VWM content (centrally presented shapes) and by probes (peripherally presented shapes) is more similar in parietal areas, which might code for stimuli in a more abstract manner. This could explain why the enhanced response to VWM-matching probes was most pronounced in the superior parietal cortex, although shape-classification accuracy was higher in the occipital cortex.
Finally, our frontal ROI also showed an increased univariate response to visual input that matched the content of VWM. The multivariate analyses, however, revealed no content-based activity in our frontal ROI, which is in line with the idea that frontal areas serve a control (rather than storage) function in VWM maintenance (Sligte et al., 2013). Our frontal ROI corresponds to two subregions of the precentral sulcus—the superior and inferior precentral sulci—that are characterized by topographical organization (Hagler and Sereno, 2006; Kastner et al., 2007; Jerde et al., 2012). In these studies, the inferior and superior precentral sulci are related to covert attention and saccade preparation toward peripheral stimuli, which is in line with the nature of the functional localizer run in our experiment. The finding that these areas respond more strongly to peripheral stimuli that match rather than mismatch the content of VWM could reflect that VWM-matching stimuli attract attention and elicit saccade preparation to a greater extent than mismatching stimuli. Behavioral findings have demonstrated that this is indeed the case (Soto et al., 2008; Silvis et al., 2014).
Our current findings add to the existing evidence for sensory recruitment in VWM maintenance (Serences et al., 2009; Harrison and Tong, 2009; Silvanto and Cattaneo, 2010). From this perspective, a possible interpretation of our findings is that the content of VWM preactivates neural populations, such that subsequent matching visual input, tapping upon the same neural circuitry, enjoys a priori elevated activity levels (Reynolds and Chelazzi, 2004; for a similar view, see Chelazzi et al., 2011). An alternative account of the present findings is that the stronger neural response observed here does not reflect preactivation, but top-down amplification of visual input that matches the content of VWM. From that perspective, the influence of VWM content on concurrent visual input does not rely on a putative shared neural substrate. Rather, identifying a match between the content of VWM and the concurrent visual input could occur at a nonsensory processing level, allowing for retroactive amplification of matching visual input. In a recent behavioral study, we were able to adjudicate between these two possibilities (Gayet et al., 2016). In this study, sequential sampling models (for review, see Ratcliff and Smith, 2004) were used to model the perceptual processes leading up to the preferential detection of VWM-matching over VWM-mismatching stimuli. Model comparisons favored a model in which VWM-matching visual input enjoys an a priori bias (as expected under preactivation) over a model in which it benefits from faster accumulation of perceptual evidence (as expected under top-down amplification). Thus, the present results are more likely to be explained by a (bottom-up) preactivation account than by a (top-down) amplification account. The present results are also in line with a matched-filter account, according to which the response to matching visual input is enhanced due to alterations in the tuning properties of neurons that optimize responsiveness to the feature of interest (David et al., 2008).
Our current findings seem at odds with the finding that visual input that matches expectations elicits a weaker neural response (den Ouden et al., 2009; Alink et al., 2010; Kok et al., 2012). This is especially surprising, considering that an expected stimulus elicits a sustained cortical visual representation (Kok et al., 2014) akin to that of a stimulus maintained in VWM. Future research is needed to investigate how stimulus-specific delay activity can sometimes elicit an enhanced neural response to matching visual input (as is the case for VWM), and sometimes a reduced one (as is the case for expectancy). Together, these phenomena allow for favoring potentially relevant visual input (i.e., unexpected visual input, or visual input that matches the current task goals) over irrelevant visual input.
To conclude, our results demonstrate that the neural response to visual input is enhanced in terms of both signal strength and information content when it matches the content of VWM. Considering that content-specific neural responses were observed in lateral occipital and superior parietal areas, we conclude that the observed interaction between visual input and VWM originated in high-level visual-processing areas. The present results add to the existing evidence that a common neural substrate underlies the processing of visual representations, regardless of whether their origin is retinal or mnemonic.
Footnotes
This work was supported by Grants 404.10.306 (to S.V.d.S. and C.L.E.P.) and 452.13.008 (to S.V.d.S.) from the Netherlands Organization for Scientific Research, and by a seed money grant from Neuroscience and Cognition Utrecht (to S.G.). M.G. and P.S. were supported by the German Research Foundation (Grants STE 1430/6-2 and STE 1430/7-1).
The authors declare no competing financial interests.
Correspondence should be addressed to Surya Gayet, Department of Experimental Psychology, Utrecht University, Heidelberglaan 1, 3584 CS Utrecht, The Netherlands. surya.gayet@gmail.com