Abstract
How content is stored in the human brain during visual short-term memory (VSTM) is still an open question. Different theories postulate storage of remembered stimuli in prefrontal, parietal, or visual areas. Aiming at a distinction between these theories, we investigated the content-specificity of BOLD signals from various brain regions during a VSTM task using multivariate pattern classification. To participate in memory maintenance, candidate regions would need to have information about the different contents held in memory. We identified two brain regions where local patterns of fMRI signals represented the remembered content. Apart from the previously established storage in visual areas, we also discovered an area in the posterior parietal cortex where activity patterns allowed us to decode the specific stimuli held in memory. Our results demonstrate that storage in VSTM extends beyond visual areas, but no frontal regions were found. Thus, while frontal and parietal areas typically coactivate during VSTM, maintenance of content in the frontoparietal network might be limited to parietal cortex.
Introduction
Visual short-term memory (VSTM) refers to the ability to temporarily store visual information. Despite substantial research, its neural mechanisms have remained unclear. One common view is that visual content is re-represented in prefrontal cortex during short-term memory (Goldman-Rakic, 1995; Constantinidis and Procyk, 2004; Courtney, 2004; Funahashi, 2006). Other theories suggest that visual short-term storage is implemented in brain regions involved in visual perception and that prefrontal cortex carries out mainly control and gating functions but does not store the actual visual contents (Postle, 2006; Jonides et al., 2008). A third position argues that higher-order areas in posterior parietal cortices represent VSTM content (Todd and Marois, 2004; Edin et al., 2009). Here we aimed at a distinction between these theories.
In human neuroimaging studies, various brain regions have been shown to alter their activity during short-term memory, including occipital, parietal, temporal, and frontal cortices (McCarthy et al., 1994; Courtney et al., 1997; Ungerleider et al., 1998; for review, see Wager and Smith, 2003). While changes in overall BOLD activity indicate the involvement of a given brain region in short-term memory, they do not allow researchers to assess whether an area stores the specific content of short-term memory. It is possible that overall increases in activity during short-term memory tasks could reflect stimulus-unspecific processes that merely subserve short-term memory rather than representing the remembered stimulus. To demonstrate storage of short-term memory content directly, it is necessary to show that BOLD activity is content-specific.
Here, we used multivariate decoding to investigate content specificity of VSTM signals in different brain regions. In particular, we searched for areas from which we could reliably predict the identity of several remembered stimuli during a phase of active maintenance. If activity patterns in these regions carried information about the content of VSTM, this would point toward active storage in these neural populations. To avoid contributions of nonvisual semantic processes, we used complex artificial stimuli that did not belong to a semantic category (Fig. 1). Importantly, these complex stimuli differed from each other in multiple visual features such as shape, color, and spatial arrangement. In that way, we avoided the use of geometrically simple stimuli that can easily be represented using a verbal code (e.g., “red square” vs “green circle”). In addition, we adapted an experimental design (Sperling, 1960; Oberauer and Kliegl, 2001; Lepsien et al., 2005; Harrison and Tong, 2009) that allowed us to investigate storage independent of any low-level residual persistence of the stimulus.
Materials and Methods
Participants.
Nineteen healthy right-handed subjects with normal or corrected-to-normal vision participated in the experiment. The study was approved by the ethics committee of the Max-Planck Institute for Human Cognitive and Brain Sciences and conducted according to the Declaration of Helsinki. Two subjects were excluded due to poor task performance. The final sample consisted of five male and 12 female subjects (mean age, 26.9 years; SEM, ±0.96 years).
Task and stimuli.
Subjects executed a demanding VSTM task while positioned in an MRI scanner. On each trial, they were required to remember a complex artificial stimulus and to compare it with two test stimuli presented after a delay. The task was to identify which of the two test stimuli was more similar to the remembered stimulus. The stimuli were multicolored random fields (size: 4°) that had no semantic content. To allow for subsequent fMRI classification of the stored VSTM content, the number of sample stimuli for each subject to remember in the scanner was limited to four (Fig. 1B), but a different set was generated for each subject (i.e., 68 stimuli were used overall). These four images were equidistant in similarity space as defined by the maximal difference in the Fisher z transforms of the correlations between the samples (Δz < 0.15).
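The similarity measure can be illustrated with a minimal sketch. It assumes that each stimulus is available as a flat list of pixel intensities, and it reads the equidistance criterion as the spread (largest minus smallest) of the pairwise z values within a set. Both the function names and this reading are assumptions made for illustration; they are not taken from the original stimulus-generation code.

```python
import math
from itertools import combinations

def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def fisher_z(r):
    """Fisher's z transform, z = atanh(r); undefined for |r| = 1."""
    return math.atanh(r)

def similarity_z(stim_a, stim_b):
    """Fisher z-transformed correlation between two stimuli."""
    return fisher_z(pearson_r(stim_a, stim_b))

def pairwise_z_spread(stimuli):
    """Largest minus smallest pairwise z value within a stimulus set.

    Assumed reading of the equidistance criterion: a set would be
    accepted if this spread stays below a tolerance such as 0.15.
    """
    zs = [similarity_z(a, b) for a, b in combinations(stimuli, 2)]
    return max(zs) - min(zs)
```

With a set of four stimuli, `pairwise_z_spread` evaluates all six pairs, so a small return value means that no pair is markedly more similar than any other.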
Task, stimuli, and instruction were designed to enforce an encoding of the stimuli that was based on their perceptual features rather than verbal terms. We used stimuli that had no explicit semantic content, and the similarity task we used strongly focused on the visual features of the stimuli. Also, we instructed the subjects not to use a verbal code to remember the stimuli and trained them using a very large set of memory stimuli, which discourages verbal representations even more. After this training, all subjects were confident that they could execute the task without using semantic terms as part of their strategy.
At the beginning of each trial, two of the four sample stimuli were presented consecutively (Fig. 1A). Each was shown for 0.8 s followed by a 0.2 s fixation period. This was followed by the presentation of a retro-cue for 0.5 s. This cue consisted of either the digit “1” or “2”, indicating whether the first or the second stimulus was to be remembered. This procedure was chosen to disentangle the different contributions of mere stimulus presentation from processes specific to short-term memory (Sperling, 1960; Oberauer and Kliegl, 2001; Lepsien et al., 2005; Harrison and Tong, 2009), with the not-remembered sample serving as a control. The retro-cue was presented on top of a colored pattern mask that was introduced to suppress residual sample-related visual activity. The cue was followed by a fixation marker for the rest of the delay phase. The delay period lasted 12 s, beginning with the onset of the retro-cue and lasting until the onset of the test stimuli. The experiment included a number of catch trials with shorter delay times (10, 8, 6, 4, or 2 s) to make the onset of the test stimuli less predictable.
At the end of the delay, two test stimuli were shown simultaneously for 1 s left and right of fixation (2° offset). Subjects were asked to indicate which of the two test stimuli was more similar to the memorized sample. Similarity was estimated by the difference in z transformed correlations (z > 0.6; Fig. 1C). In addition, one of the two test stimuli (either the target or the nontarget) was more similar to the not-remembered sample (z > 0.6) to control for visual confounds. Responses (left or right index finger) to this highly challenging task were recorded and analyzed up to 4 s after the onset of the test stimuli, although subjects were trained to respond within 2 s.
The fMRI scanning session consisted of four runs. Each run contained 48 experimental trials and eight catch trials. The catch trials were not included in further analyses. With four memory samples, there were a total of six possible pairings of memory stimuli per trial. All of these pairs appeared equally often, and the design was counterbalanced for the order in which sample stimuli appeared and which of the two items was cued. Thus, every sample had to be remembered in 12 trials per run. The trial order was fully randomized. Behavioral training took place on one of the 2 d before the scanning session. In the training runs, memory sample sets were different from those used in the scanner to avoid long-term consolidation of the memory items. The training procedure started with a short exercise of the similarity task without any memory demand to make sure that participants understood the instructions and were able to execute the task. The actual training consisted of three or four runs identical to the runs in the MRI scanner, with two exceptions. All but the last training run included feedback to facilitate learning of the task, and the delay duration for the training was reduced to 4 s. On the scanning day, the four memory samples for the scanning session were introduced with a short similarity task exercise and one training run with feedback.
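The counterbalancing arithmetic can be checked with a short sketch. It enumerates every pairing, presentation order, and cued position for four samples. The sample labels and the assumption that each of the 24 resulting combinations occurs exactly twice per run (to reach 48 experimental trials) are illustrative inferences, not details stated in the text.

```python
from collections import Counter
from itertools import combinations

SAMPLES = ["A", "B", "C", "D"]  # hypothetical labels for the four samples

def build_run(repeats=2):
    """Enumerate the experimental trials of one run.

    Crosses 6 pairings x 2 presentation orders x 2 cued positions
    (24 combinations). `repeats=2` is an assumption chosen so that
    the run has 48 experimental trials.
    """
    trials = []
    for first, second in combinations(SAMPLES, 2):
        for order in ((first, second), (second, first)):
            for cued_position in (0, 1):
                trials.append({
                    "first": order[0],
                    "second": order[1],
                    "remembered": order[cued_position],
                })
    return trials * repeats

run = build_run()
remembered_counts = Counter(trial["remembered"] for trial in run)
```

In this enumeration every sample is the remembered item in 12 of the 48 trials, matching the count given above. Randomizing the trial order (for example with `random.shuffle`) would be a separate step.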
FMRI acquisition and analysis.
MRI data were acquired on a 3T TIM Trio scanner (Siemens) and all analyses were performed using SPM8, a linear support vector machine (LIBSVM), and in-house software. We acquired functional BOLD images (T2*-weighted gradient-echo EPI: 32 contiguous slices; whole brain; TR = 2000 ms; TE = 30 ms; slice thickness = 3 mm; gap = 0.75 mm; ascending order; flip angle = 90°; FOV = 192 mm) and structural MRI data (T1-weighted MPRAGE: 192 sagittal slices, TR = 1900 ms, TE = 2.52 ms, flip angle = 9°, FOV = 256 mm). In each run, we collected 545 functional images time-locked to the stimulus presentation. This stimulus-locked acquisition procedure allows for a more accurate estimation of the hemodynamic response by omitting the acquisition of intermediate time-points. To preserve this fine-grained spatiotemporal structure of fMRI activity, no smoothing, normalization, or slice-time correction was performed, and preprocessing was limited to spatial realignment.
To identify brain areas in which changes in activity reflect changes in VSTM content, we used a time-resolved multivariate decoding technique (Soon et al., 2008). First, we used a general linear model (GLM) to estimate the responses to the different stimulus conditions (four remembered samples) for each of the six images recorded during the delay (finite impulse response design). We then conducted a decoding analysis for each time point to identify brain regions that carry spatially distributed information about the identity of the remembered sample. For this, we used a searchlight approach (Kriegeskorte et al., 2006; Haynes et al., 2007), which examines the information in small spherical voxel clusters at each position in the brain. This approach allowed us to extract information from locally distributed fMRI patterns without potentially biasing prior voxel selection to specific brain regions.
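The finite impulse response design can be pictured as a set of indicator regressors, one per remembered sample and delay image. The following sketch builds such a matrix under the assumption that each trial's delay starts exactly on a scan (plausible given the stimulus-locked acquisition). The function name and arguments are hypothetical, and the sketch omits nuisance regressors and everything else that SPM8 would add.

```python
def fir_design_matrix(n_scans, onsets, conditions, n_conditions=4, n_bins=6):
    """Build an n_scans x (n_conditions * n_bins) matrix of 0/1 indicators.

    onsets:     scan index at which each trial's delay period begins
    conditions: index (0..n_conditions-1) of the remembered sample per trial
    Column cond * n_bins + b marks the b-th delay image of condition cond.
    """
    matrix = [[0] * (n_conditions * n_bins) for _ in range(n_scans)]
    for onset, cond in zip(onsets, conditions):
        for b in range(n_bins):
            scan = onset + b
            if scan < n_scans:  # ignore bins that fall past the end of the run
                matrix[scan][cond * n_bins + b] = 1
    return matrix

# Two toy trials: sample 1 remembered from scan 0, sample 3 from scan 10.
X = fir_design_matrix(n_scans=20, onsets=[0, 10], conditions=[1, 3])
```

Fitting this matrix to a voxel's time series yields one parameter estimate per sample and time bin, which is the kind of input the time-resolved decoding operates on.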
For a given voxel, vi, we first defined a small spherical cluster (radius: 4 voxels) centered on vi. In this parcel of data at a given time point, we separately estimated content-specific information for the remembered sample and for the stimulus that was not cued to be remembered. We did this using pairwise classification, meaning that we estimated information for each possible pair of two samples and then averaged this information estimate across all pairs. For each pair, we trained a linear support vector machine (LIBSVM) to classify between data for the two samples using standardized parameter estimates from three of the four runs. The data from the remaining run were used to test the classifier for generalization. The training and testing procedure was repeated until every single run had been used as a test set (fourfold cross-validation).
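As a sketch of how such a spherical cluster can be defined, the code below lists all voxel offsets within a given radius of a center voxel. It assumes that the radius is measured as Euclidean distance in voxel index space (ignoring any anisotropy of the voxels) and that voxels falling outside the volume are simply dropped. Both choices, and the function names, are assumptions for illustration.

```python
def sphere_offsets(radius=4):
    """Integer offsets (dx, dy, dz) within `radius` voxels of the origin."""
    r2 = radius * radius
    axis = range(-radius, radius + 1)
    return [
        (dx, dy, dz)
        for dx in axis for dy in axis for dz in axis
        if dx * dx + dy * dy + dz * dz <= r2
    ]

def searchlight_voxels(center, shape, radius=4):
    """Voxel coordinates of the sphere around `center`, clipped to the volume."""
    cx, cy, cz = center
    voxels = []
    for dx, dy, dz in sphere_offsets(radius):
        x, y, z = cx + dx, cy + dy, cz + dz
        if 0 <= x < shape[0] and 0 <= y < shape[1] and 0 <= z < shape[2]:
            voxels.append((x, y, z))
    return voxels
```

Under these assumptions a complete sphere of radius 4 contains 257 voxels; searchlights near the edge of the volume contain fewer.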
The prediction accuracy for a given voxel vi was averaged across runs and pairwise comparisons. A different option for aggregating pairwise classifications is to use voting tables to generate one decision per label. The voting-table approach binarizes the output of the pairwise classifiers and feeds it into a vote, whereas the individual accuracies are simply averaged here. We chose this averaging approach because omitting the binarization better retains the information from the pairwise classifiers.
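The cross-validation and averaging scheme can be sketched as follows. Note that a nearest-class-mean classifier stands in for the linear support vector machine here, purely to keep the example free of external dependencies; it is not the classifier used in the study. The data layout (`data[run][sample]` holding one pattern of voxel values per run and sample) and all names are hypothetical, and standardization of the parameter estimates is omitted.

```python
from itertools import combinations

def mean_pattern(patterns):
    """Voxel-wise mean of several equally long patterns."""
    return [sum(vals) / len(patterns) for vals in zip(*patterns)]

def sq_dist(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pairwise_cv_accuracy(data):
    """Mean accuracy over all sample pairs and leave-one-run-out folds.

    data[run][sample] -> list of voxel values for one searchlight.
    Stand-in classifier: assign a test pattern to the nearer class mean
    (an assumption replacing the linear SVM).
    """
    runs = sorted(data)
    samples = sorted(data[runs[0]])
    fold_accuracies = []
    for a, b in combinations(samples, 2):
        for test_run in runs:
            train_runs = [r for r in runs if r != test_run]
            centroid = {
                label: mean_pattern([data[r][label] for r in train_runs])
                for label in (a, b)
            }
            correct = 0
            for true_label, other_label in ((a, b), (b, a)):
                pattern = data[test_run][true_label]
                if sq_dist(pattern, centroid[true_label]) < sq_dist(pattern, centroid[other_label]):
                    correct += 1
            fold_accuracies.append(correct / 2)
    # Accuracies are averaged directly; no voting across pairwise classifiers.
    return sum(fold_accuracies) / len(fold_accuracies)
```

With four runs and four samples the function averages 24 fold accuracies (6 pairs x 4 folds) into the single value assigned to the center voxel.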
For each subject, the distribution of decoded information across the whole brain was represented by six accuracy maps, one for each time point. In a second analysis, this searchlight classification procedure was also used to estimate the amount of information in the brain about the behaviorally irrelevant control sample. This was done to estimate whether any low-level residual persistence of the visual presentation could have contributed to our findings.
For group analyses, the accuracy maps were transformed to MNI space using unified segmentation (Ashburner and Friston, 2005) and smoothed (5 mm FWHM) to account for individual differences. Using one-way ANOVAs with six levels (one for each time point) and extra sum of squares F-contrasts (Ollinger et al., 2001; Motulsky and Christopoulos, 2004), we tested whether a given voxel contained information against chance (50%) without any assumptions about the temporal unfolding of information. We report significant voxels at a threshold of p < 0.05, family-wise error corrected for multiple comparisons. For this correction, the smoothness of the statistical maps was estimated using standardized residuals from the group-level general linear model (Kiebel et al., 1999). This GLM was based on maps of decoding accuracies and the residual maps were affected in their smoothness, both by the searchlight procedure itself and through the postclassification Gaussian smoothing. Please note that the smoothness estimation has been shown to be more robust for higher levels of smoothing (Kiebel et al., 1999). In addition, we used a cluster size threshold of 20 voxels. This allows for a more conservative identification of significant voxels without changing the p threshold. Specifically, cluster correction is intended to remove small clusters that are difficult to interpret in a physiological way. For descriptive purposes, we also show time series of decoded information within the peak of each cluster. Please note that although these time series provide additional information, the absolute value of the decoding accuracies in the time series is not an unbiased estimator of the true information.
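The logic of an extra sum of squares F test against chance can be sketched for a single voxel. The sketch compares a null model in which accuracy equals chance at every time point (no free parameters) with a full model that fits one mean per time point. It treats all observations as independent, which is a simplifying assumption; the repeated-measures structure and the variance modeling of the actual SPM8 analysis are not reproduced, and the function name is hypothetical.

```python
def extra_ss_f(acc_by_time, chance=0.5):
    """Extra sum-of-squares F statistic for 'any time point differs from chance'.

    acc_by_time: one list of subject accuracies per time point.
    Returns (F, (df_numerator, df_denominator)).
    """
    n_total = sum(len(values) for values in acc_by_time)
    n_params = len(acc_by_time)  # one fitted mean per time point

    # Null model: every accuracy is predicted to equal chance.
    ss_null = sum((y - chance) ** 2 for values in acc_by_time for y in values)

    # Full model: each time point gets its own mean.
    ss_full = 0.0
    for values in acc_by_time:
        mean = sum(values) / len(values)
        ss_full += sum((y - mean) ** 2 for y in values)

    df_null = n_total
    df_full = n_total - n_params
    f_value = ((ss_null - ss_full) / (df_null - df_full)) / (ss_full / df_full)
    return f_value, (df_null - df_full, df_full)
```

Because the full model may deviate from chance at any subset of time points, a large F value indicates information somewhere in the delay without committing to a particular time course.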
We also investigated univariate increases in activity during the working memory delay. Delay period activity was assessed using a hemodynamic response function (HRF)-based GLM modeling approach (Curtis and D'Esposito, 2003) collapsed across the different contents for higher sensitivity. Parameter estimates for the delay period regressor were tested for significance using a one-sample t test.
We next tested whether areas with increased activation also carry information about the working memory content. For this purpose, we generated spherical ROIs of four voxels radius centered at the cluster peaks and tested the information content of these ROIs as described above. Finally, we also used coordinates from the literature to investigate whether there is information at typical sites of working memory-related activation, again using spherical ROIs of four voxels radius.
Results
Behavioral performance
The similarity detection task was intended to be demanding to enforce detailed encoding of the sample stimuli. Two subjects failed to perform above chance in at least one of the four runs and were excluded from further analysis. In the remaining sample of 17 subjects, the average percentage of correct responses was 77.07% (SEM ± 1.9%; Fig. 1D). Trials with no response were discarded (1.08% of all trials).
Decoding short-term memory content
Using a time-resolved searchlight approach (Kriegeskorte et al., 2006; Soon et al., 2008), we examined where in the brain distributed fMRI patterns contained information about the content of VSTM. More specifically, we searched for areas that showed above chance classification of four sample stimuli that each individual subject held in memory during the delay phase in the experiment. We used an extra sum of squares F test that tested information at each of the six time points of the delay period against chance (Ollinger et al., 2001; Motulsky and Christopoulos, 2004). The critical advantage of this method is that it makes no assumption about the temporal unfolding of the delay period signal. Figure 2A depicts areas (green) from which the memorized sample could be decoded during the delay period (pFWE < 0.05, k = 20), shown on a rendered representation of the human brain from a posterior point of view.
We found two areas involved in VSTM storage bilaterally: posterior parietal cortex (PPC) and early visual cortex. The left PPC cluster (MNIpeak: [−26 −66 56]; F: 12.89; corrected pFWE < 0.001) is shown in sagittal, coronal, and axial slices in Figure 2B (top). The coronal slice also shows the right PPC cluster (MNIpeak: [20 −66 64]; F: 8.01; corrected pFWE < 0.05). Figure 2B (bottom) shows sagittal, coronal, and axial slices of an early visual cluster in the right hemisphere (MNIpeak: [34 −88 2]; F: 10.23; corrected pFWE < 0.001). The coronal slice in Figure 2B (bottom) and the rendered representation in Figure 2A show that the left PPC cluster extends into left early visual areas as well. The pattern of significant voxels shows some hemispheric asymmetry, which might suggest a lateralization of working memory storage. To investigate this directly, we conducted three 6 (time) × 2 (hemisphere) repeated-measures ANOVAs for accuracy values at the three peak voxels reported above and their contralateral counterparts. Importantly, none of the ANOVAs showed a significant main effect of hemisphere, nor a hemisphere × time interaction. Thus, we cannot conclude that storage in early visual cortex or parietal cortex is lateralized. We interpret the asymmetry as an effect of thresholding. With a relaxed threshold of pFWE < 0.4, we see more extended regions in bilateral occipitoparietal cortex.
Notably, the group analyses performed cannot rule out information present elsewhere in the brain in single subjects. In addition, as each of the subjects remembered a unique set of four stimuli and as we report the average of all pairwise comparisons between these stimuli, these results indicate storage of a general class of stimuli. On the other hand, these results cannot be seen as evidence for storage of each and every stimulus. To test whether a bias for a subset of stimuli existed, we ran two analyses. First we extracted the decoding accuracies of all 102 pairwise comparisons (17 subjects × 6 pairwise comparisons per subject) for the three peak searchlights and plotted their distribution. As can be seen in Figure 3, these distributions were unimodal and largely symmetric, thus not indicating any bias toward a subset of stimuli.
We further tested whether a strong bias for a subset of stimuli existed by iteratively probing the information content of single pairwise comparisons. For each iteration, we randomly picked one of the six pairwise comparisons per subject and ran a one-sample t test on accuracy against chance at the group-level. We then repeated this bootstrapping procedure 10,000 times. Of the 10,000 t tests per region, 97% in left PPC and 98.7% in right early visual cortex showed significant above chance classification (p < 0.05), thus suggesting storage not only for a subset, but for a large number of different stimuli.
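The resampling procedure can be sketched as follows. On each iteration the code picks one pairwise accuracy per subject at random, computes a one-sample t statistic against chance, and counts how often it exceeds a critical value. The critical value is left as a parameter because the text does not state whether the test was one- or two-sided; 1.746 (one-sided, p < 0.05, 16 degrees of freedom) is used below only as an assumed example, and all function names are hypothetical.

```python
import math
import random
import statistics

def one_sample_t(values, mu):
    """One-sample t statistic of `values` against the reference mean `mu`."""
    sd = statistics.stdev(values)
    diff = statistics.mean(values) - mu
    if sd == 0:
        # Degenerate case: no variance across subjects.
        return math.copysign(math.inf, diff) if diff != 0 else 0.0
    return diff / (sd / math.sqrt(len(values)))

def fraction_significant(acc, t_crit, n_iter=10000, chance=0.5, seed=0):
    """Fraction of resampling iterations with t above `t_crit`.

    acc[subject] -> list of pairwise decoding accuracies at one searchlight.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        picked = [rng.choice(pair_accuracies) for pair_accuracies in acc]
        if one_sample_t(picked, chance) > t_crit:
            hits += 1
    return hits / n_iter

# Toy data: 17 subjects x 6 pairwise accuracies, all above chance.
toy_acc = [[0.55 + 0.01 * s + 0.005 * p for p in range(6)] for s in range(17)]
share = fraction_significant(toy_acc, t_crit=1.746, n_iter=1000)
```

A high share indicates that above-chance decoding does not hinge on a particular subset of pairwise comparisons.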
Time course of information
We also examined the time course of stored information in PPC and early visual cortex (Fig. 4). We extracted the time courses of voxels at the statistical peaks of the clusters reported above and averaged across subjects. The time series data are shown at the exact time points of data acquisition of the respective slices. The slice acquisition offsets relative to image acquisition onsets were 1.462 s (SEM: 18 ms) in left PPC, 1.58 s (SEM: 16 ms) in right PPC, and 0.620 s (SEM: 22 ms) in right visual cortex.
We found no information about the remembered content at the beginning of the delay, but information peaked directly after this gap and remained above chance during the whole delay. This is generally consistent with typical time courses of BOLD activity during working memory tasks (Linden et al., 2003) and previous multivariate decoding studies (Harrison and Tong, 2009), which all include some offset between the beginning of the memory process and reliable differences in fMRI-based measures. Notably, the delay between stimulus encoding and the rise of content-specific signals is slightly larger in the current study than in prior work. This could be a result of the more accurate estimation of the time courses in the present study that did not include any smearing due to slice-timing correction. In addition, this could be a consequence of perceptual activity induced by the pattern mask that might have obscured content-specific activity at the beginning of the delay. Finally, it cannot be ruled out that information was stored elsewhere in the brain or in a format that is undetectable for our methods during the initial moments of the delay period.
Decoding the not remembered stimulus
To control for any contamination of our results by visual persistence of the memory samples, we conducted a control analysis in which we tried to identify areas containing information about the other, visually presented but not remembered sample stimulus that was irrelevant in a specific trial. If evidence for storage of the remembered sample were attributable to the visual exposure, an identical effect should be present for the not-remembered sample. All methods and parameters used in this analysis were identical to the ones used for the decoding of the remembered sample. The test did not reveal any voxels that showed significantly (pFWE < 0.05) above-chance decoding accuracies. Thus, we ensured that our results are specific to VSTM and not confounded by visual stimulation.
Activation increases during the memory delay
We also investigated activation increases using a standard HRF-based design (Curtis and D'Esposito, 2003). Several regions showed an increased univariate response during the delay period (pFWE < 0.05). In particular, we found areas in angular gyrus (MNIpeak: [−50 −70 36]), precuneus (MNIpeak: [−4 −50 34]) and medial prefrontal cortex (MNIpeak: [−4 42 14]). In an additional step, we tested whether these areas contain information. Importantly, none of these regions contained information about the remembered content even at uncorrected thresholds (all p > 0.05). Finally, to further consider storage in the prefrontal cortex, we investigated whether areas that are typically activated during working memory in other studies also represent remembered visual stimuli. From a cluster analysis of studies investigating passive storage (Wager and Smith, 2003, their Table 3), we selected the two cluster centroids in prefrontal cortex (MNI: [−33 32 12] and MNI: [33 31 12]) and generated spherical ROIs with a radius of four voxels at these positions. Surprisingly, there was no significant information in these two areas (both p > 0.05).
Notably, dissociations between univariate, overall signal increases and decodable information are frequently observed. For example, information could be retrieved from data where activity increases were minimal (Harrison and Tong, 2009), multivariate decoding allowed for dissociations for which mere BOLD differences were insensitive (Lewis-Peacock and Postle, 2012), and effect size estimates obtained from multivariate decoding and univariate analyses were shown to be uncorrelated in many areas (Jimura and Poldrack, 2012). A possible interpretation is that the activity in these univariately defined ROIs reflects processes that are not content-selective.
Discussion
In the present study, we investigated how the contents of VSTM are stored in the human brain. Specifically, our aim was to distinguish between models with prefrontal storage of visual content (Goldman-Rakic, 1995; Courtney, 2004; Funahashi, 2006) and models with storage in parietal (Todd and Marois, 2004; Edin et al., 2009) and visual (Postle, 2006; Jonides et al., 2008) areas. We found that activity patterns in regions of early visual cortex and in PPC encoded information about the memorized artificial visual stimuli during VSTM storage. Our findings are consistent with previous fMRI decoding work that demonstrated storage for orientation and color in early visual cortex (Harrison and Tong, 2009; Serences et al., 2009; Riggall and Postle, 2012). Importantly, these previous studies did not investigate areas beyond early visual cortex, especially not areas in prefrontal or parietal cortex.
Identifying content-specific signals in PPC (more specifically, areas at the intraparietal sulcus) shows that storage of VSTM contents extends beyond early visual areas and therefore beyond regions that are primarily involved in sensory coding. The storage of visual information thus involves regions that play a crucial role in visually guided action (Culham et al., 2006) and visual attention (Kastner and Ungerleider, 2000). These areas might at first sight seem unlikely candidates for the temporary storage of visual content. Visual processing in the dorsal visual stream is typically thought to encode where objects are in space (Mishkin et al., 1983) or how one can interact with them (Goodale and Milner, 1992), rather than the stimulus identity, which is believed to be encoded in the ventral stream. Two possibilities might explain the content-selectivity of this parietal brain region. For one, several studies have shown that the dorsal stream is involved in processing of visual objects (Sereno and Maunsell, 1998; Konen and Kastner, 2008). Thus, the memory-related information in PPC could reflect the temporary storage of such shape-like information. Another possibility is that PPC stores information about regions in the stimulus that are particularly conspicuous or salient. Consistent with this, PPC is believed to be involved in encoding of attention and saliency processing (Bisley and Goldberg, 2006; Bogler et al., 2011). It has been shown that PPC contains a retinotopic representation of the contralateral visual field (Sereno et al., 2001) and even that it has information about spatial attention in the ipsilateral visual field (Kalberlah et al., 2011). Our findings suggest that these contralateral and ipsilateral representations of visual space may also be capable of representing visual working memory content. Specifically, these maps could encode the spatial distribution of the colored contours (or their salient aspects) that defined the stimuli.
This is consistent with previous studies that reported changes in PPC activity during VSTM. For example, PPC activity is greater for face working memory and spatial memory compared with control tasks (Ungerleider et al., 1998) and PPC activity correlates with set size during VSTM (Todd and Marois, 2004). In addition, PPC predicts individual differences in VSTM capacity (Vogel and Machizawa, 2004; Todd and Marois, 2005). Finally, single-cell activity in PPC has been shown to encode the spatial position of a cue during a spatial working memory task (Constantinidis and Steinmetz, 1996).
Attention and working memory signals can be distinguished experimentally (Stokes, 2011). For example, a recent study (Lewis-Peacock et al., 2012) showed that attention shifts during working memory impair performance only slightly while the information extractable from delay-period activity is drastically reduced. In the current study, the time courses of content-specific signals (Fig. 4) are compatible with theoretical assumptions regarding the temporal unfolding of VSTM delay activity (Curtis and D'Esposito, 2003), but are not in line with an interpretation based on preparatory attention. Nonetheless, attention may play a role during VSTM not in the form of preparatory attention, but rather as attention-based rehearsal consistent with the location of our findings (Awh and Jonides, 2001; Corbetta and Shulman, 2002; Postle et al., 2004).
In attention and working memory experiments, PPC and the dorsolateral prefrontal cortex (DLPFC) typically coactivate during the allocation and maintenance of spatial attention (Hopfinger et al., 2000; Corbetta et al., 2002), forming the dorsal frontoparietal attention network. In addition, in short-term memory studies, DLPFC shows responses similar to PPC during the storage of visual content. For example, activity in DLPFC is increased during face working memory (Courtney et al., 1997; Ungerleider et al., 1998) and spatial working memory (McCarthy et al., 1994). DLPFC activity correlates with VSTM set size (Linden et al., 2003), but correlations with individual differences in VSTM capacity showed no significant effects (Todd and Marois, 2005). While previous work indicates a conjoint function of DLPFC and PPC in the dorsal frontoparietal network (Corbetta and Shulman, 2002), our findings suggest that content-selective short-term storage beyond visual cortex is specific to the PPC. Instead of storing visual content, PFC might control access to VSTM (McNab and Klingberg, 2008; Edin et al., 2009). Importantly, we cannot exclude the possibility that by using even more sensitive methods, information about visual working memory content can be retrieved from prefrontal cortex. Thus, future studies with higher resolution might still reveal content-selective working memory processing in prefrontal cortex.
Conclusion
Short-term memory storage extends beyond the visual cortex into visual field maps of the dorsal visual stream that are also involved in visual attention and saliency. Although frontal and parietal areas typically coactivate during VSTM, our results suggest that content storage in the frontoparietal network is specific to parietal cortex.
Footnotes
This work was supported by the Bernstein Computational Neuroscience Program of the German Federal Ministry of Education and Research Grant 01GQ0411, the Excellence Initiative of the German Federal Ministry of Education, and German Research Foundation Grants GSC86/1-2009 and HA 5336/1-1.
Correspondence should be addressed to either Thomas Christophel or John-Dylan Haynes, Bernstein Center for Computational Neuroscience, Charité Universitätsmedizin Berlin, Philippstrasse 13, Haus 6, 10115 Berlin, Germany. thomas.christophel@bccn-berlin.de or haynes@bccn-berlin.de