Abstract
Contemporary models of episodic memory posit that remembering involves the reenactment of encoding processes. Although encoding-retrieval similarity has been consistently reported and linked to memory success, the nature of neural pattern reinstatement is poorly understood. Using high-resolution fMRI in human subjects, we obtained clear evidence for item-specific pattern reinstatement in the frontoparietal cortex, even when the encoding-retrieval pairs shared no perceptual similarity. No item-specific pattern reinstatement was found in the ventral visual cortex. Importantly, the brain regions and voxels carrying item-specific representation differed significantly between encoding and retrieval, and the item specificity for encoding-retrieval similarity was smaller than that for encoding or retrieval, suggesting that the nature of the representations differs between encoding and retrieval. Moreover, cross-region representational similarity analysis suggests that the representation encoded in the ventral visual cortex was reinstated in the frontoparietal cortex during retrieval. Together, these results suggest that, in addition to reinstatement of the originally encoded pattern in the brain regions that perform encoding processes, retrieval may also involve the reinstatement of a transformed representation of the encoded information. These results emphasize the constructive nature of memory retrieval, which helps to serve important adaptive functions.
SIGNIFICANCE STATEMENT Episodic memory enables humans to vividly reexperience past events, yet how this is achieved at the neural level is barely understood. A long-standing hypothesis posits that memory retrieval involves the faithful reinstatement of encoding-related activity. We tested this hypothesis by comparing the neural representations during encoding and retrieval. We found strong pattern reinstatement in the frontoparietal cortex, but not in the ventral visual cortex, which represents visual details. Critically, even within the same brain regions, the nature of the representation during retrieval was qualitatively different from that during encoding. These results suggest that memory retrieval is not a faithful replay of past events but rather involves additional constructive processes that serve adaptive functions.
Introduction
Episodic memory is characterized by vivid reexperience of past events through mental time travel (Tulving, 1984). A long-standing hypothesis in memory research is that memory retrieval involves the reinstatement of encoding-related activity patterns (Alvarez and Squire, 1994; McClelland et al., 1995; Norman and O'Reilly, 2003), and/or the reenactment of encoding operations (Kolers, 1973, 1976; Kolers and Roediger, 1984). Consistently, imaging studies have shown that successful memory retrieval is accompanied by reinstatement of task- and category-level information from encoding (Kuhl et al., 2011; Staresina et al., 2012; Gordon et al., 2014). This activation reinstatement precedes memory retrieval (Polyn et al., 2005) and is associated with performance in free recall (Gelbard-Sagiv et al., 2008) and cued retrieval (Kuhl et al., 2011). Recent studies using representational similarity analysis have revealed item-specific or event-specific reinstatement (Ritchey et al., 2013; Kuhl and Chun, 2014; Yaffe et al., 2014; Wing et al., 2015; Zhang et al., 2015), which provides the critical evidence linking pattern reinstatement and the retrieval of individual events.
Given the important role of item-specific reinstatement in advancing a mechanistic understanding of memory retrieval, the existing evidence deserves closer examination. In most of the previous studies that reported item-specific pattern reinstatement, the encoding and retrieval stages shared perceptual information. It is thus unclear whether the item-specific encoding-retrieval similarity (ERS) reflected overlapping perceptual or conceptual information. A recent study successfully removed the perceptual overlap by calculating the similarity between activation patterns during a recall test (where only the word cue was presented) and those in a recognition test (where only the associated picture was presented) (Kuhl and Chun, 2014). However, this result speaks more to item-specific retrieval than to ERS. It is also unclear whether their results reflected the reinstatement of the word (during recognition), the picture (during recall), or the word-picture association (in both stages).
More importantly, the nature of pattern reinstatement is poorly understood. Several lines of evidence suggest that reactivation/replay with perfect temporal and spatial precision is neither possible nor necessary (Kent and Lamberts, 2008). First, because of the stochastic nature of visual perception, replay of the same perceptual processes, such as the fixation scanpath, would be difficult (Kent and Lamberts, 2006). Second, what is reactivated during encoding and retrieval depends on the task requirements. Under many circumstances, mnemonic decisions could be made with the reinstatement of partial information. Third, EEG and iEEG data suggest that the replay could be compressed in time (Euston et al., 2007; Yaffe et al., 2014) and in reversed temporal order (Foster and Wilson, 2006). Late replay of early encoding processes has also been reported (Jafarpour et al., 2014). Finally, many studies have shown that memory retrieval is a constructive process (Schacter et al., 1998), in which a new integrated representation is formed by integrating old experiences with new information, resulting in memory updating (Kuhl et al., 2012) and memory integration (Zeithamova et al., 2012; Backus et al., 2016). To the best of our knowledge, no existing study has directly compared the neural representational spaces during encoding and retrieval; it is thus unclear whether the observed ERS reflects the precise reinstatement of the encoded representation, or a transformed representation as suggested by constructive memory models.
The present study aimed to address these questions with fMRI and a design that enabled us to simultaneously examine item-specific encoding, retrieval, and ERS (see Fig. 1). Participants were asked to study word-scene associations and later to retrieve the scene associated with a word cue. Importantly, each scene was paired with two different word cues, which enabled us to examine the item-specific scene representation during encoding and during retrieval. This also allowed us to examine ERS using encoding-retrieval pairs that did not share the same word cue, removing the confound of perceptual overlap. A high-resolution whole-brain scan was used to more finely assess the neural representations, and to explore the role of hippocampal subregions in memory encoding and retrieval.
Materials and Methods
Participants
Twenty healthy college students (11 males, mean age = 20.95 ± 1.96 years, range 18–25 years) participated in this study. All participants were right-handed and had normal or corrected-to-normal vision. They were free of neurological or psychiatric history. Informed written consent was obtained from the participants before the experiments. The fMRI study was approved by the institutional review board of Peking University and the State Key Laboratory of Cognitive Neuroscience and Learning at Beijing Normal University in China.
Materials
Participants were instructed to learn word cue-picture pairs. The pictures were 48 well-known scenes, including 32 architectural scenes (half from China and half from abroad) and 16 natural landscapes (half depicting water landscapes and the other half depicting terrestrial landscapes). The cues were 96 two-character Chinese verbs. Each picture was associated with two different word cues (cue set 1 and cue set 2). Words and pictures were randomly paired across subjects.
Experiment design and procedure
Prescan training.
Unlike previous studies in which the associations were exposed only once during encoding (e.g., Staresina et al., 2012; Wing et al., 2015; Danker et al., 2016), participants were trained 1 d before the fMRI scan, first to become familiar with all pictures and then to remember all 96 word-picture associations. This overtraining paradigm was used to make sure that participants could recall the visual details of the pictures during retrieval. However, it would also introduce retrieval processes during the encoding period in the scanner and thus reduce the differences between encoding and retrieval. In addition, the resulting high retrieval accuracy would prevent us from linking pattern reinstatement with behavioral performance.
During picture familiarization, participants were instructed to pay attention to the details of each picture to form a vivid mental image of it. During word-picture association learning, participants were asked to memorize the associations using a self-paced procedure. The order of pairs was randomized during learning and across participants. Participants were encouraged to learn as many details of the pictures as they could so that they could retrieve a vivid image during the memory test phase. They could test their memory in a self-test task, in which they were asked to report the category of the picture, i.e., foreign architecture (FA), Chinese architecture (CA), water landscape (WL), or terrestrial landscape (TL), by pressing one of four keys. Once the accuracy was >95%, participants were asked to orally report the details of the picture associated with each word cue. The training ended once participants could correctly report the category and four details of the picture associated with each cue. On average, participants spent 2 h in this session, during which each word-picture association was studied or tested for approximately 95 s (SD 11.4 s).
fMRI scan.
During the fMRI scan session, participants were asked to restudy the word cue-picture pairs (encoding run) and then to recall the details of the pictures associated with the word cues (retrieval run) (see Fig. 1A). A slow event-related design (16 s for each trial) was used to obtain better estimates of single-trial BOLD responses associated with each item for both encoding and retrieval. During encoding, each trial started with a 4 s presentation of the word cue-picture association, and participants were asked to remember as many details as they could (i.e., encoding stage). The frame of the picture then turned green for 2 s (i.e., category decision stage), during which participants were asked to judge the category of the picture but to withhold their response until the frame turned red and the response labels appeared on the screen. The response labels, representing the four possible picture categories, were introduced to prevent participants from planning a motor response during the category judgment stage. Specifically, each response key/button corresponded to one of the four label locations (lined up from left to right) rather than to the picture category, and the order of the four category labels was randomized across trials. Participants had another 2 s to make the response (i.e., response stage) according to the response labels. To prevent further processing of the word cue-picture association, participants then performed a perceptual orientation judgment task for 8 s. During this task, an arrow pointing either left or right was presented on the screen, and participants were asked to judge the orientation of the arrow as quickly as possible. A self-paced procedure was used to keep the task engaging.
The retrieval stage was similar to the encoding stage, except that, for the first 4 s, only the retrieval cue was presented, and participants were asked to retrieve the visual details of the associated picture (see Fig. 1A). For both the encoding and retrieval stage, participants were told explicitly to focus on the details and not simply the category of the pictures.
The 48 pictures were divided into two groups (to keep the scanning time of each run within 7 min). For each group, each picture was paired with cue set 1 in one encoding-retrieval session and then with cue set 2 in the following session. In each run, the word-picture pairs were presented in a random order. In total, there were four encoding-retrieval sessions (see Fig. 1B).
Postscan memory test.
After the scan, participants finished an oral test outside the scanner to report the details of the picture associated with each cue. Pictures correctly recalled with more than four different kinds of details (e.g., objects in the scene, color, structure, name, and so on) were scored as remembered with details.
fMRI image data acquisition and preprocessing
Scanning was carried out in the MRI Center at Peking University, using a MAGNETOM Prisma 3.0T MRI scanner (Siemens Healthcare) with a 64-channel head-neck coil. High-resolution functional images were acquired using a prototype simultaneous multislice EPI sequence (FOV = 224 mm × 224 mm; matrix = 112 × 112; slice thickness = 2 mm; TR/TE/θ = 2000 ms/30 ms/90°; slice acceleration factor = 2). Sixty-four contiguous axial slices parallel to the AC-PC line were obtained to cover the whole cerebrum and partial cerebellum. High-resolution structural images were acquired for the whole brain using a 3D T1-weighted MPRAGE sequence (FOV = 256 mm × 256 mm; matrix = 256 × 256; slice thickness = 1 mm; TR/TE/θ = 2530 ms/2.98 ms/7°). A high-resolution T2-weighted image was also acquired using a T2-SPACE sequence for use in medial temporal lobe (MTL) segmentation. The image plane was perpendicular to the main hippocampal axis and covered the whole MTL region (FOV = 220 mm × 220 mm; matrix = 512 × 512; slice thickness = 1.5 mm; TR/TE/θ = 13150 ms/82 ms/150°; 60 slices).
Image preprocessing and statistical analyses were performed using FEAT (FMRI Expert Analysis Tool), version 5.98, implemented in FSL (RRID: SCR_002823). The first 10 images from each run were automatically discarded by the scanner to allow scanner equilibrium. Functional images were realigned and temporally filtered (nonlinear highpass filter with a 90 s cutoff). The EPI images were first registered to the first volume of the fifth run and then registered to the MPRAGE structural volume using Advanced Normalization Tools (RRID: SCR_004757) (Avants et al., 2011). Registration from structural images to the standard space was further refined using Advanced Normalization Tools nonlinear registration SyN (Klein et al., 2009). All fMRI analyses were performed in each subject's native space and then transformed to standard space for group analysis.
Single-trial response estimate
GLMs were created separately for each of the 96 encoding and 96 retrieval trials to estimate the single-trial response. A least-squares single (LSS) approach was used, in which the target trial was modeled as one EV and all other trials were modeled as another EV (Mumford et al., 2012). The trial was modeled at its presentation time, convolved with a canonical hemodynamic response function (double gamma). The entire first 4 s was modeled during both encoding and retrieval for each trial. Following existing studies (Kuhl et al., 2011; Danker et al., 2016; Staresina et al., 2016), we believe 4 s was enough to capture the retrieved representations. We did not include the category judgment stage because the category labels varied across trials, which would introduce additional confounds. The t statistics were used for representational similarity analysis to increase reliability by normalizing for noise (Walther et al., 2016).
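As a concrete illustration of the least-squares single logic, the sketch below builds one GLM per trial, with the target trial as one EV and all remaining trials collapsed into a second EV, and keeps the resulting t map. The original analysis was carried out in FSL; this Python/nilearn version is only a hedged stand-in, and run_img, events, and the high-pass value are placeholders drawn from the surrounding text rather than the authors' code.

```python
# Minimal LSS sketch (not the authors' pipeline): one GLM per trial, target
# trial as one EV, all other trials as a second EV, returning a t-map per trial.
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

def lss_single_trial_tmaps(run_img, events, tr=2.0):
    """events: DataFrame with 'onset', 'duration' (4 s here), 'trial_type'."""
    tmaps = []
    for i in range(len(events)):
        ev = events.copy()
        # relabel: trial i becomes the target EV, everything else one nuisance EV
        ev["trial_type"] = ["target" if j == i else "others" for j in range(len(ev))]
        model = FirstLevelModel(t_r=tr, hrf_model="glover",   # double-gamma-like HRF
                                high_pass=1.0 / 90, noise_model="ar1")
        model.fit(run_img, events=ev)
        # the t statistic (rather than beta) is carried forward to the RSA
        tmaps.append(model.compute_contrast("target", output_type="stat"))
    return tmaps
```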
In another analysis, we examined whether differences in timing between the encoding and retrieval phases might have led to low ERS in the visual cortex. To examine this possibility, we extracted the BOLD responses for each of the 3 TRs (4 s stimulus presentation/retrieval plus 2 s category judgment) after stimulus onset during encoding and retrieval (with a 4 s delay to account for the slow BOLD response), and calculated the ERS for each TR combination between encoding and retrieval.
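The sketch below illustrates this TR-by-TR ERS computation under the assumption that the extracted responses are stored as (trials × TRs × voxels) arrays; enc_bold, ret_bold, and the two pair masks are hypothetical variable names, not the authors' code.

```python
# Schematic TR-by-TR item-specific ERS: for every encoding-TR/retrieval-TR
# combination, correlate encoding and retrieval patterns and contrast the
# matched C-P+ pairs against the matched C-P- pairs.
import numpy as np

def tr_wise_ers(enc_bold, ret_bold, cpp_mask, cpm_mask):
    """enc_bold, ret_bold: (n_trials, 3 TRs, n_voxels) BOLD responses.
    cpp_mask, cpm_mask: boolean (n_enc_trials, n_ret_trials) matrices marking
    the C-P+ and matched C-P- encoding-retrieval pairs, respectively."""
    n_enc, n_tr, _ = enc_bold.shape
    ers = np.zeros((n_tr, n_tr))
    for t_enc in range(n_tr):
        for t_ret in range(n_tr):
            corr = np.corrcoef(enc_bold[:, t_enc, :], ret_bold[:, t_ret, :])
            cross = np.arctanh(corr[:n_enc, n_enc:])   # encoding x retrieval block, Fisher z
            ers[t_enc, t_ret] = cross[cpp_mask].mean() - cross[cpm_mask].mean()
    return ers
```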
Representational similarity analysis (RSA)
We used RSA to examine the similarity of activation patterns across trials (Kriegeskorte et al., 2008). Both region of interest (ROI)-based and searchlight methods (5 × 5 × 5 voxel cubes) were used (Kriegeskorte et al., 2006). For a given anatomic ROI (see below) or searchlight sphere, the multivoxel response pattern (t statistics) for each of the 96 cue-picture associations was extracted for each participant separately (see Fig. 1C). Pattern similarities were estimated by calculating the pairwise Pearson correlations among the trials' response patterns. To examine item-specific encoding, we compared different cues, same picture (C−P+) pairs with different cues, different pictures (C−P−) pairs from the encoding runs. The C−P− pairs were matched to the C−P+ pairs in terms of memory performance (only remembered items were used), category (all within-category), and lag (all cross-run). Similarly, item-specific retrieval was examined by comparing the C−P+ pairs with the C−P− pairs during retrieval. To examine ERS, we compared the pattern similarity among the three types of encoding-retrieval pairs: same cue, same picture (C+P+) pairs; different cues, same picture (C−P+) pairs; and different cues, different pictures (C−P−) pairs. All similarity scores were transformed into Fisher's z scores for further statistical analysis (see Fig. 1C).
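A minimal sketch of the C−P+ versus C−P− contrast within one ROI is given below; patterns, picture_id, category, run_id, and remembered are placeholder names for the single-trial t maps and trial labels described above, not the authors' variables.

```python
# Schematic item-specificity score for one ROI: mean Fisher-z similarity of
# cross-run, remembered, within-category pairs, C-P+ minus C-P-.
import numpy as np

def item_specificity(patterns, picture_id, category, run_id, remembered):
    """patterns: (n_trials, n_voxels) single-trial t values for one ROI."""
    z = np.arctanh(np.corrcoef(patterns))          # pairwise Pearson r -> Fisher z
    same_pic, diff_pic = [], []
    n = len(picture_id)
    for i in range(n):
        for j in range(i + 1, n):
            if run_id[i] == run_id[j]:             # cross-run pairs only (matched lag)
                continue
            if not (remembered[i] and remembered[j]):
                continue                           # remembered items only
            if category[i] != category[j]:
                continue                           # within-category pairs only
            if picture_id[i] == picture_id[j]:
                same_pic.append(z[i, j])           # C-P+: different cues, same picture
            else:
                diff_pic.append(z[i, j])           # C-P-: different cues, different pictures
    return np.mean(same_pic) - np.mean(diff_pic)
```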
The searchlight analysis was conducted in the native space for each subject and then transformed into standard space for group analysis. A random-effects model was used for group analysis. Because no first-level variance was available, an ordinary least square model was used. Group images were thresholded using cluster detection statistics, with a height threshold of z > 3.1 and a cluster probability of p < 0.05, corrected for whole-brain multiple comparisons using Gaussian Random Field Theory.
Definition of cortical ROIs
Following previous studies (Ritchey et al., 2013; Kuhl and Chun, 2014; Wing et al., 2015; Danker et al., 2016), we focused our analysis on the visual cortex, parietal lobe, and frontal lobe. Fourteen ROIs were defined based on the Harvard-Oxford probabilistic atlas (thresholded at 25% probability) (RRID: SCR_001476), including the bilateral ventral visual cortex (VVC; containing the ventral lateral occipital cortex, occipital fusiform gyrus, temporal occipital fusiform cortex, and parahippocampal cortex), angular gyrus (AG), supramarginal gyrus (SMG), inferior frontal gyrus (IFG), middle frontal gyrus (MFG), and superior frontal gyrus (SFG), as well as the medial prefrontal cortex (mPFC) and the posterior cingulate cortex (PCC) (see Fig. 2).
Segmentation of MTL subregions
Several MTL subregions were obtained either by automatic segmentation or based on a standard atlas to examine the role of the MTL in item-specific representation. Using automatic hippocampal subfield segmentation (RRID: SCR_005996) (Yushkevich et al., 2015) based on each subject's high-resolution T2-weighted MRI image, the hippocampus was segmented into CA1, CA2, DG, CA3, and subiculum (Sub), and the anterior portion of the parahippocampal gyrus was divided into perirhinal cortex (PRc) and entorhinal cortex (ERc). Because of the limited number of voxels in CA2 (mean ± SD, 0.864 ± 0.7) and CA3 (1.65 ± 1.04), only CA1 (226.55 ± 29.931), DG (167.7 ± 26.495), Sub (54.95 ± 8.101), PRc (67.05 ± 12.124), and ERc (77.35 ± 9.917) were included in the current study (see Fig. 5A). Single-trial estimates were then obtained within these five ROIs for each subject for further analysis.
RSA with feature selection
Both the ROI and searchlight analyses assumed that the feature representations were locally distributed and that all voxels within a region contributed to the item-specific representations. We therefore used feature selection to examine whether we could obtain clearer evidence for item-specific representation. To do this, we ranked each voxel's contribution to item-specific representation using the following procedure (Finn et al., 2015). For each voxel v, we first z-scored its single-trial response estimates (β estimates in this analysis) along the trial dimension and then calculated the trial-wise products Cij = Vi × Vj (i, j index trials). For a given picture i, if voxel v is sensitive to its item-specific information, then Cii (i.e., the product between the two trials showing the same picture, C−P+ pairs) should be larger than Cij (j ≠ i, C−P− pairs). We thus calculated the probability Pi = P(Cii > Cij) for picture i, and then the mean probability across all pictures: Pv = (Σi=1..48 Pi)/48. The index Pv indicates the proportion of comparisons in which voxel v showed higher similarity for C−P+ pairs than for C−P− pairs, with a higher Pv indicating a higher load of item-specific information. The top N voxels with the highest Pv were chosen for further pattern similarity analysis. A cross-validation procedure was used: feature selection was based on one group of pictures, and the selected voxels were used to calculate the item-specific encoding, item-specific retrieval, and/or ERS on the other group of pictures. To compare the isomorphism of the representations during encoding and retrieval, we also performed the feature selection on encoding pairs and applied it to retrieval pairs, and vice versa. If the same voxels contributed to item-specific representation during both encoding and retrieval, we should observe significant item specificity when using the voxels selected from the encoding stage to estimate the item specificity during retrieval, and vice versa.
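The voxel-ranking index Pv can be sketched as below, assuming the selection half of the data is arranged as a (48 pictures × 2 cues × voxels) array; in the actual analysis the C−P− comparisons were additionally matched for category and lag, which this simplified, hypothetical illustration omits.

```python
# Schematic computation of the per-voxel item-specificity probability P_v.
import numpy as np

def voxel_item_specificity_prob(enc):
    """enc: (48 pictures, 2 cues, n_voxels) single-trial estimates (selection half)."""
    n_pic, _, n_vox = enc.shape
    flat = enc.reshape(n_pic * 2, n_vox)
    z = ((flat - flat.mean(axis=0)) / flat.std(axis=0)).reshape(n_pic, 2, n_vox)
    same = z[:, 0, :] * z[:, 1, :]                 # C-P+ products: the two trials of one picture
    p_v = np.zeros(n_vox)
    for i in range(n_pic):
        # C-P- products: picture i's second trial against other pictures' first trials
        other = np.delete(z[:, 0, :], i, axis=0) * z[i, 1, :]
        p_v += (same[i] > other).mean(axis=0)      # P_i: fraction of C-P+ > C-P-
    return p_v / n_pic                             # P_v, averaged over the 48 pictures

# The top-N voxels with the highest P_v would then be carried to the held-out
# half of the pictures to estimate item-specific encoding, retrieval, or ERS.
```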
Representational connectivity analysis
To examine the cross-region reinstatement hypothesis, we examined representational connectivity among the 10 cortical structural ROIs by calculating the correlation between the representational similarity pattern during encoding and that during retrieval (ER-RS) (Kriegeskorte et al., 2008). For each ROI, the representational similarity pattern was obtained by calculating the pairwise Pearson correlations among the neural activation patterns of the remembered pictures. To reduce the effect of intrinsic fluctuations on between-region representational similarity (Henriksson et al., 2015), we excluded the within-run pairs. We also excluded the C−P+ pairs, which might skew the distribution because they had higher similarity than C−P− pairs. As a result, the representational connectivity results should not be driven by a stepwise difference between the C−P+ and C−P− pairs.
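The ER-RS computation for one pair of regions can be sketched as follows; enc_A, ret_B, and pair_mask are hypothetical names, and the second-order correlation is shown as Pearson purely for illustration.

```python
# Schematic encoding-retrieval representational connectivity between two ROIs:
# correlate region A's encoding similarity structure with region B's retrieval
# similarity structure over the retained picture pairs.
import numpy as np
from scipy.stats import pearsonr

def er_representational_connectivity(enc_A, ret_B, pair_mask):
    """enc_A, ret_B: (n_pictures, n_voxels) patterns for remembered pictures.
    pair_mask: boolean (n_pictures, n_pictures) matrix keeping only cross-run,
    different-picture pairs (within-run and C-P+ pairs excluded)."""
    sim_enc = np.arctanh(np.corrcoef(enc_A))   # similarity structure at encoding
    sim_ret = np.arctanh(np.corrcoef(ret_B))   # similarity structure at retrieval
    r, _ = pearsonr(sim_enc[pair_mask], sim_ret[pair_mask])
    return r
```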
Mixed-effects model
A mixed-effects model was applied to test the relationship between activation in hippocampal subregions and cortical item specificity. Concretely, the activation of hippocampal subregions and the item specificity were obtained for each picture and each subject. Activation in hippocampal subregions was then used as a predictor of the item specificity in cortex during encoding and retrieval, respectively. Participant was included as a random effect. Mixed-effects modeling was implemented with lme4 in R (RRID: SCR_001905) (Bates et al., 2014). We used the likelihood ratio test to compare the models (with vs without the predictor) to determine the effect of the predictor.
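The analysis itself was run with lme4 in R; the snippet below is only a Python (statsmodels) stand-in for the same logic, namely a hippocampal activation predictor of cortical item specificity with subject as a random effect, compared against an intercept-only model by a likelihood-ratio test. The data frame and column names are illustrative, not the authors' code.

```python
# Stand-in for the lme4 analysis: likelihood-ratio test for a hippocampal
# activation predictor of cortical item specificity, random intercept per subject.
import statsmodels.formula.api as smf
from scipy.stats import chi2

def hippocampal_predictor_test(df):
    """df columns (illustrative): 'item_specificity', 'ca1_activation', 'subject'."""
    full = smf.mixedlm("item_specificity ~ ca1_activation", df,
                       groups=df["subject"]).fit(reml=False)
    null = smf.mixedlm("item_specificity ~ 1", df,
                       groups=df["subject"]).fit(reml=False)
    lr = 2 * (full.llf - null.llf)             # likelihood ratio statistic
    p = chi2.sf(lr, df=1)                      # one added fixed-effect parameter
    return lr, p
```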
Multiple-comparisons correction
For pattern similarity analysis, all the ROI-based analyses were conducted for 14 predefined ROIs. Bonferroni correction was performed to correct for multiple comparisons (corrected p value: 0.05/14 = 0.0036). We report the uncorrected p values and indicate the results that survived correction. The same approach was applied in the representational connectivity analysis, where the correction was performed according to the number of multiple comparisons (10 comparisons for within-region reinstatement and 16 comparisons for cross-region reinstatement).
Results
Item-specific pattern reinstatement revealed by ERS
Participants performed very well during the memory test in the scanner (hits 94.3 ± 5.1%). The postscan test further showed that participants could correctly report more than four details associated with each retrieved picture. Together, the behavioral results suggest that overtraining before the scan was effective.
We then examined whether the item-specific representation during encoding was reactivated during retrieval. To make sure that the item-specific reinstatement was not caused by the use of the same word cue during encoding and retrieval, we first compared the ERS of different-cue-same-picture (C−P+) pairs with that of different-cue-different-picture (C−P−) pairs. The C−P− pairs were matched with the C−P+ pairs on memory performance, lag, and picture category (all from the same category) (Fig. 1). Brain regions engaged in item-specific pattern reinstatement should show higher pattern similarity for C−P+ pairs than for C−P− pairs. Within the 14 predefined anatomic ROIs (Kuhl and Chun, 2014), we found significantly greater ERS for C−P+ pairs than for C−P− pairs in the left angular gyrus (LAG) (F(1,19) = 9.048, p = 0.0072), the left supramarginal gyrus (LSMG) (F(1,19) = 8.433, p = 0.0091), the LMFG (F(1,19) = 8.562, p = 0.0087), and the PCC (F(1,19) = 8.738, p = 0.0081), but not in the VVC (Fig. 2). These results did not survive Bonferroni correction for multiple comparisons (corrected threshold: p = 0.05/14 = 0.0036). Direct comparison across regions showed that the item specificity for ERS (C−P+ minus C−P−) was significantly greater in the left inferior parietal cortex than in the VVC (p = 0.046).
Experiment paradigm. A, A slow event-related design (16 s per trial) was used to better estimate brain responses associated with single items. A self-paced orientation judgment task was administered during the 8 s intertrial interval to prevent further encoding of the word cue-picture association. B, The arrangement of scanning runs. There were four encoding-retrieval sessions in total. C, Strategies to examine item-specific encoding (left), retrieval (middle), and ERS (right). The item specificity was obtained by comparing similarities between different cues-same picture (C−P+) pairs and different cues-different picture (C−P−) pairs (matched with C−P+ pairs in memory performance, category, and lag). The words used as cues were actually presented in Chinese.
ROI results for item-specific encoding, retrieval, and ERS. A, The locations of the predefined anatomic ROIs. B, ROI results for item-specific encoding. C, ROI results for item-specific retrieval. D, ROI results for item-specific ERS. Error bars indicate within-subject error. After Bonferroni correction for 14 ROIs (p < 0.0036), the effect of item-specific encoding survived in the bilateral VVC, and item-specific retrieval survived in the bilateral AG and MFG, LSMG, and LIFG. ***p < 0.001/14. **p < 0.01/14. *p < 0.05/14.
Because most previous studies calculated ERS based on the C+P+ pairs, we further examined whether the use of the same word cue contributed to the item-specific ERS. Although numerically greater ERS was found for C+P+ pairs than for C−P+ pairs in the bilateral VVC and the IFG, direct comparison revealed no significant differences (p > 0.3; uncorrected). These results suggest that the use of the same word cue and the specific word-scene association contributed little to the ERS in these regions under the current experimental conditions.
Different regions carrying item-specific information during encoding and retrieval
We then examined item-specific encoding and retrieval by comparing pattern similarity between C−P+ pairs and C−P− pairs during the encoding and retrieval phases, respectively (Fig. 1). During encoding, C−P+ pairs showed significantly greater mean pattern similarity than C−P− pairs in the bilateral VVC (LVVC: F(1,19) = 40.069, p < 0.0001; RVVC: F(1,19) = 53.018, p < 0.0001) (Fig. 2; Table 1). Both regions survived Bonferroni correction. During retrieval, item-specific representation was found in the bilateral VVC, AG, SMG, IFG, MFG, SFG, mPFC, and PCC. After Bonferroni correction for multiple comparisons, the effects in the bilateral AG and MFG, LSMG, LIFG, and PCC remained significant (Fig. 2; Table 1).
ROI-based results for item-specific encoding, retrieval, and ERSa
The above analysis suggests that the VVC and the inferior parietal lobule (IPL)/PFC/PCC showed item-specific patterns of neural activity during encoding and retrieval, respectively. To directly test this dissociation, we performed a brain region (bilateral VVC/AG/SMG/IFG/MFG and mPFC/PCC) by phase (encoding/retrieval) two-way repeated-measures ANOVA on the item specificity (C−P+ minus C−P− pairs). This analysis revealed a significant region × phase interaction (F(13,247) = 9.218, p < 0.001). Further planned simple-effect tests found greater item specificity in the encoding phase than in the retrieval phase in the left (t(19) = 3.585, p = 0.0019) and right VVC (t(19) = 3.318, p = 0.0036), which survived Bonferroni correction (0.05/14 = 0.0036). We also found marginally greater item specificity in the retrieval phase than in the encoding phase in the LAG (t(19) = 1.909, p = 0.07) and LMFG (t(19) = 1.859, p = 0.08), but not in other IPL or frontal regions (p > 0.14).
To examine whether there was a temporal mismatch between the representations during encoding and retrieval, we extracted the BOLD responses for each of the 3 TRs after stimulus onset during encoding and retrieval (with a 4 s delay to account for the slow BOLD response), and calculated the ERS between the representation at each TR during encoding and that at each TR during retrieval. The results showed that the strongest ERS occurred at the same TR after stimulus onset in the left AG and SMG, and no significant ERS was found in the VVC for any TR combination (Fig. 3).
Time-resolved ERS between encoding and retrieval. The BOLD responses for each of the 3 TRs after stimulus onset during encoding and retrieval (with 4 s delay to account for the slow BOLD response) were extracted, and the ERS was calculated between representation at each of the TRs during encoding and that during retrieval. Values in the heat maps are the mean item-specific ERS [ERS(C−P+) − ERS(C−P−)] across subjects.
Whole-brain searchlight analysis
We also performed a whole-brain searchlight analysis to examine whether other brain regions showed item-specific representation during encoding and retrieval. These results were consistent with the ROI-based results, emphasizing item-specific encoding in sensory cortex and item-specific retrieval in higher-order cortices. In particular, item-specific ERS was found (after whole-brain correction, z > 3.1, p < 0.05) in the bilateral lateral occipital cortex (LOC) when comparing C−P+ trials with C−P− trials, but no significant results were found when comparing C+P+ trials with C−P− trials (Fig. 4; Table 2). No significant difference was found between C+P+ trials and C−P+ trials.
Whole-brain searchlight results. Item-specific representation for encoding and retrieval, their direct comparisons, and ERS were rendered onto a population-averaged surface atlas (Xia et al., 2013). All activations were thresholded using cluster detection statistics, with a height threshold of z > 3.1 and a cluster probability of p < 0.05, corrected for whole-brain multiple comparisons using Gaussian Random Field Theory.
Searchlight results for item-specific encoding, retrieval, and ERSa
Item-specific representation during encoding was found in bilateral LOC and occipital fusiform, whereas item-specific representations during retrieval were found in the bilateral IFG/MFG, left IPL, left LOC, and left temporal occipital fusiform. Direct comparison showed that the bilateral ventral LOC showed greater item specificity during encoding than retrieval, whereas no brain region showed significantly greater item specificity during retrieval than during encoding (Fig. 4; Table 2).
Item-specific representation in MTL subregions
Although MTL subregions contribute to pattern completion, the fMRI evidence remains inconclusive regarding whether they carry item-specific information (Chadwick et al., 2010; Bonnici et al., 2012; LaRocque et al., 2013; Liang et al., 2013). Because these anatomical structures are small, the whole-brain searchlight results might not be able to characterize the functional dissociation in MTL subregions. We therefore tested for item-specific encoding, retrieval, and ERS in anatomically defined MTL subregions. For ERS, we found no significantly greater ERS for C−P+ pairs than for C−P− pairs (all p values >0.18), except for a reversed pattern in the entorhinal cortex (F(1,19) = 10.391, p = 0.004; survived Bonferroni correction). No significant difference was found when comparing C+P+ pairs with C−P− pairs (all p values >0.27) (Fig. 5). No subregion showed a significant difference between C−P+ pairs and C−P− pairs during encoding (p > 0.25). Item-specific representation during retrieval was found in CA1 (F(1,19) = 4.681, p = 0.043), DG (F(1,19) = 5.333, p = 0.032; uncorrected), and Sub (F(1,19) = 4.726, p = 0.043). A two-way ANOVA on item specificity with region and phase as factors revealed no significant main effect of phase (F(1,19) = 0.957, p = 0.34) or region × phase interaction (F(4,76) = 1.591, p = 0.185) (Fig. 5).
Item-specific representation in the MTL subregions. A, The segmentation of the MTL from one sample subject. The hippocampus was segmented into five subregions: CA1, CA2, DG, CA3, and subiculum. CA2 and CA3 were not included in further analysis because of their relatively limited number of voxels. The anterior portion of the parahippocampal gyrus was divided into two parts: the PRc (further divided into BA35 and BA36) and the ERc. B, Item-specific representation in these regions for encoding (left), retrieval (middle), and ERS (right). Error bars indicate within-subject error. *p < 0.05, uncorrected.
CA1 and DG activation correlated with item-specific representation
Computational models posit that pattern reinstatement could be attributed to pattern completion supported by subregions of the hippocampus. Consistently, several studies have found that hippocampal activity correlated with the fidelity of pattern reinstatement (Staresina et al., 2012; Wing et al., 2015; Danker et al., 2016). To test this hypothesis, we used mixed-effects models to evaluate the relationship between activity in hippocampal subregions (i.e., CA1 and DG) and cortical item-specific reinstatement. The results showed that ERS in the RAG was marginally positively correlated with retrieval activity in CA1 (χ²(1) = 3.424, p = 0.064) and DG (χ²(1) = 3.304, p = 0.069). No significant correlation was found between hippocampal activity during encoding and ERS (all p values >0.15).
We also tested the relationship between hippocampal activity and item-specific representation. The results revealed that, during encoding, item specificity in the VVC was positively correlated with neural activation in the CA1 and the DG (p < 0.007). During retrieval, item-specific representation in the bilateral VVC, the LIFG, and the LMFG were positively correlated with activation levels in the CA1 (p < 0.021), and item-specific representation in the bilateral VVC, IFG and MFG was correlated with the activation levels in the DG (p < 0.05) (Table 3). The CA1 and DG correlations with bilateral VVC during encoding survived Bonferroni correction of 20 comparisons. These results suggest that CA1 and DG activity contributed to item-specific encoding in the VVC.
Correlations between hippocampal subregion activity and cortical item-specific representationsa
Different voxels carrying item-specific information during encoding and retrieval
In the above analyses, it was assumed that all voxels within an ROI or searchlight sphere contributed to item-specific representation, which might not necessarily be true. In a further analysis, we used a feature selection algorithm to select voxels carrying item-specific representation. The purposes of this analysis were twofold. First, with this more sensitive method, we might obtain stronger evidence for item-specific encoding or retrieval. Second, it allowed us to examine the neural dissociation of item-specific representation during encoding and retrieval at a finer spatial scale. That is, if the same voxels within a region contributed to item-specific representation during both encoding and retrieval, we should observe significant item specificity when using the voxels selected from the encoding stage to estimate item specificity during retrieval, and vice versa.
We used a cross-validation procedure to choose voxels from half of the pictures (i.e., the selection dataset) and then used these voxels to estimate the item-specific representation on the other half of the pictures (i.e., the validation dataset). Focusing on the visual cortex (VVC) and the frontoparietal cortex (FPC; bilateral IFG/MFG/AG/SMG), we found, not surprisingly, that stricter feature selection produced higher item specificity in the selection dataset (e.g., the mean ± SD encoding item specificity increased from 0.018 ± 0.011 when using all voxels in the VVC to 0.446 ± 0.021 when choosing the top 1% of voxels). We then selected the most informative voxels to estimate the item-specific representation in the validation dataset: the top 0.8% in the VVC (mean ± SD number of voxels = 86.775 ± 8.057) and the top 0.5% in the FPC (81.925 ± 7.399 voxels), to keep the number of voxels comparable across the VVC and FPC. In the VVC, compared with using all voxels in this region, feature selection based on encoding data led to a significant increase in item specificity (C−P+ minus C−P−) for encoding (t(19) = 3.488, p = 0.002), and the item specificity was higher than the highest item specificity obtained in the ROI-based analysis (i.e., the RVVC; t(19) = 3.284, p = 0.004). In the FPC, feature selection based on retrieval data did not increase the item specificity for retrieval compared with using all FPC voxels (t(19) = −1.006, p = 0.327), and the item specificity was not higher than the highest item-specific retrieval obtained in the ROI analysis either (i.e., the LIFG; t(19) = −0.637, p = 0.532). Finally, feature selection based on ERS failed to increase item-specific ERS in the VVC (t(19) = 1.056, p = 0.304) or in the FPC (t(19) = −0.300, p = 0.767).
To test the spatial dissociation of item-specific representation between encoding and retrieval, we performed a two-way ANOVA to examine the effects of selection stage (encoding vs retrieval) and validation stage (encoding vs retrieval) on the item specificity, as well as their interaction. In the VVC, we found a significant main effect of validation stage (F(1,19) = 29.72, p < 0.001), indicating that the VVC contained item-specific information during encoding (p < 0.001) but not retrieval (p > 0.1). In addition, there was a significant selection stage × validation stage interaction (F(1,19) = 10.55, p = 0.004): the item specificity for encoding was significantly lower when using voxels selected based on retrieval data than when using voxels selected based on encoding data (t(19) = 2.977, p = 0.008) (Fig. 6).
Feature selection results. Left, Feature selection results in the VVC. Right, Feature selection results in the FPC. Error bars indicate within-subject error. EE, Item-specific encoding in voxels selected based on encoding data; ER, item-specific retrieval in voxels selected based on encoding data; RE, item-specific encoding in voxels selected based on retrieval data; RR, item-specific retrieval in voxels selected based on retrieval data.
Consistent with the fact that feature selection in the FPC did not increase item specificity for retrieval, we found only a similar but statistically nonsignificant pattern in the FPC: the item specificity for retrieval was numerically greater than that for encoding (F(1,19) = 1.356, p = 0.259), and the item specificity for retrieval was numerically higher when using voxels selected based on retrieval data than based on encoding data (t(19) = 0.670, p = 0.511) (Fig. 6). Together, these results suggest that different voxels carry item-specific information during encoding and retrieval.
Distinct nature of representation during encoding and retrieval
The above analyses suggest that the brain regions/voxels containing item-specific information were largely nonoverlapping during encoding and retrieval. Here we asked a further question regarding the nature of representation between encoding and retrieval: Does ERS reflect a precise reinstatement of encoded representation during retrieval? If the same representation during encoding was reinstated during retrieval, one should expect that the size of item specificity for ERS should be comparable with that for encoding and retrieval.
Contrary to this hypothesis, we found that the perceptual information present during encoding was not precisely reinstated during retrieval, as reflected by the low item-specific ERS in the VVC. The item specificity for ERS was not significantly greater than zero (LVVC: t(19) = 2.243, p = 0.151; RVVC: t(19) = 1.047, p = 0.319) and was smaller than the item specificity during encoding (LVVC: t(19) = 4.666, p < 0.001; RVVC: t(19) = 5.499, p < 0.001). More interestingly, although the item specificity for ERS in the LAG was significantly greater than zero, it was smaller than the item specificity for retrieval (t(19) = 2.177, p = 0.042). The lower item specificity for ERS compared with both encoding and retrieval suggests a different nature of representations between encoding and retrieval, rather than the retrieved representations simply being noisier and less accurate per se.
Cross-region reinstatement of item-specific representation
The above results suggest that the item-specific information during encoding and retrieval were represented in the VVC and FPC, respectively. Moreover, the item specificity for ERS in VVC was smaller than that during encoding. It is thus logical to hypothesize that the VVC representation during encoding may be reinstated in the frontoparietal cortex during retrieval. Because the ERS can only measure within-region reinstatement, we used representational connectivity analysis (Kriegeskorte et al., 2008) to examine this cross-region reinstatement hypothesis.
First, we examined whether the encoded representational structure was reinstated in the same region during retrieval. Significant within-region correlation was found in several brain areas, including the LVVC, the RVVC, the RAG, the RSMG, the LIFG, the RIFG, the LMFG, and the RMFG, but not in the LAG or the LSMG (Fig. 7; Table 4). The bilateral VVC, RAG, left IFG, and bilateral MFG survived Bonferroni correction for 10 ROIs.
Cross-region pattern reinstatement. Representational similarity matrices for the encoding (A) and retrieval (B) phases. Data are from one ROI (i.e., the LVVC) of one participant (Subject 3). For each of the 48 pictures, the representational similarity matrix was obtained by calculating the pairwise Pearson correlation of the activation pattern between that picture and the other 23 pictures from a different run (same-picture pairs were removed), separately for the encoding and retrieval phases. C, Heat map of group-averaged encoding-retrieval representational connectivity within and across brain regions. D, Bar graphs of within-region encoding-retrieval representational connectivity (i.e., within-region reinstatement). E, F, Bar graphs of across-region encoding-retrieval representational connectivity (i.e., cross-region reinstatement) for the left and right VVC, respectively. Error bars indicate within-subject error.
Within-region reinstatement of representational structure
We then tested whether the representational structure during encoding in the VVC was reinstated in frontoparietal regions during retrieval (i.e., cross-region reinstatement). The results provided positive evidence in several frontoparietal regions. The VVC-IPL reinstatement was mainly found in the right IPL, including LVVC-RAG, RVVC-RAG, LVVC-RSMG, and RVVC-RSMG, with much weaker effects in the left IPL, such as LVVC-LAG, RVVC-LAG, LVVC-LSMG, and RVVC-LSMG (Fig. 7; Table 5). We also found significant VVC-IFG reinstatement in both hemispheres, including LVVC-LIFG, LVVC-RIFG, RVVC-LIFG, RVVC-RIFG, LVVC-LMFG, LVVC-RMFG, RVVC-LMFG, and RVVC-RMFG (Fig. 7; Table 5). The RVVC-RSMG, LVVC-LIFG, LVVC-RIFG, RVVC-RIFG, LVVC-LMFG, LVVC-RMFG, RVVC-LMFG, and RVVC-RMFG effects survived Bonferroni correction across 16 comparisons.
Cross-region reinstatement (from the VVC to the frontoparietal cortex) of representational structure
More importantly, the ipsilateral VVC-SMG reinstatement was greater than the SMG-SMG reinstatement in both the left (t(19) = 3.354, p = 0.002) and right (t(19) = 2.446, p = 0.012) hemispheres. There was also a trend toward smaller LAG-LAG reinstatement than LVVC-LAG reinstatement (t(19) = 1.546, p = 0.069) and RVVC-LAG reinstatement (t(19) = 1.422, p = 0.086) (Fig. 7).
The above results provide evidence for cross-region reinstatement. Notably, we also found significant within-region reinstatement in the VVC, although no significant item-specific ERS was found in these regions. This is consistent with the idea that the representations during retrieval might be transformed from those during encoding. In particular, the representational connectivity analysis examines whether the internal representational structure/space across all stimuli is maintained across regions, but, unlike the ERS analysis, it cannot specify whether the same features are maintained. In other words, high representational connectivity could be observed when the relationships across stimuli remain similar even though the representation itself is significantly transformed and the ERS is low. Although the exact mechanisms remain to be examined, it is tempting to speculate that, during retrieval, the encoded representation in the visual cortex was transformed and reinstated in the frontoparietal lobe. Via top-down modulation, this transformed representation could also be represented in the visual cortex during retrieval. As a result, although the representational structure was maintained across encoding and retrieval, the exact features could differ across stages. Nevertheless, this interpretation is highly speculative, and future studies are required to further examine this important issue.
Discussion
The present study used fMRI to examine the item specificity of neural activation patterns during memory encoding and retrieval, and the relationship between encoding and retrieval (i.e., the ERS). By pairing each picture with two different word cues, we could dissociate the contributions of the retrieval cues and the retrieved associations to the observed ERS. We observed clear evidence for item-specific pattern reinstatement, even in the absence of perceptual overlap. Consistent with previous observations (Kuhl and Chun, 2014), item-specific pattern reinstatement was found only in the parietal lobule but not in the visual cortex. We found no significantly stronger similarity for C+P+ pairs than for C−P+ pairs, suggesting that the word cue and the specific word-picture association might contribute little to the ERS in the VVC and IPL regions. This might be because participants were overtrained on the associations and the details of the pictures were emphasized; thus, the pictures might have contributed substantially to the representations. This approach could be used to examine the contributions of cues and associated contents when the associations are not well learned (Staresina et al., 2012; Wing et al., 2015; Danker et al., 2016).
More importantly, the present study directly examined the nature of representation during encoding and retrieval. We obtained several lines of evidence suggesting qualitatively different representations between the two processing stages. First, item-specific representation at encoding and retrieval was found in different brain regions, and direct comparison revealed a significant region by processing stage interaction. Unlike in the VVC, the FPC voxels carrying item-specific information for one group of materials did not generalize to another group of materials, suggesting that the representation in this region might be sparser than that in the VVC. Second, the item specificity for ERS was lower than the item specificity during encoding and retrieval, suggesting that the nature of the representations during encoding and retrieval differed. Third, the feature selection results revealed that, although feature selection improved item specificity when applied to within-stage data, it reduced item specificity when applied to cross-stage data, providing additional evidence that different voxels might contain item-specific information during encoding and retrieval. Finally, we obtained preliminary evidence that the structure of the representation could be reinstated across regions; in particular, the encoded representation in the VVC was reinstated in the IPL and IFG during retrieval. Together, these results suggest that item-specific information is represented in different regions during encoding and retrieval and that, within the same brain region, the representation during retrieval is qualitatively different from that during encoding. This transformation may serve as a pathway for the human brain to construct individual knowledge networks. Consistently, a recent study reported greater between-subject similarity in the recalled representations of the same events than within-subject encoding-recall similarity (Chen et al., 2016). Our study further suggests that the encoded representation might be transformed and reinstated across brain regions. Future studies should examine the mechanisms underlying this cross-region reinstatement.
In support of the different nature of representation during encoding and retrieval, recent evidence suggests that the IPL and VVC might carry distinct information: whereas the representation in the IPL is abstract and identity-specific, invariant to viewpoint and other features, the VVC representation contains perceptual details (Jeong and Xu, 2016). Consistently, it has been proposed that the IPL lies at the convergence of multiple perceptual processing streams, enabling the progressive abstraction of conceptual knowledge from perceptual experience (Binder et al., 2011). The representation in the IPL, but not in the visual cortex, is modulated by semantic similarity (Ye et al., 2016). As a result, retrieval of an abstract representation could lead to high item-specific representation in the IPL but not in the VVC. Future studies should examine the nature of the representations (e.g., their abstractness) during retrieval.
The present study failed to reveal item-specific representation in the visual cortex during retrieval. Consistently, a recent study found only category-level reinstatement in the VVC during retrieval (Kuhl and Chun, 2014), and another found it was not possible to reconstruct the perceived images from occipitotemporal representations during working memory (Lee and Kuhl, 2016). It should be noted that, in the current study, participants were extensively trained on the word-picture associations and could specify at least four detailed features associated with each picture. This suggests that the lack of item-specific reinstatement in the VVC was not due to a lack of mnemonic details. In addition, item-specific representations were found during encoding, suggesting that participants focused on the visual features during encoding.
In contrast to long-term memory, mounting evidence suggests that item-specific information can be decoded from the visual cortex during working memory (Harrison and Tong, 2009; Serences et al., 2009; Christophel et al., 2012; Riggall and Postle, 2012) and mental imagery (Stokes et al., 2009; Albers et al., 2013; Vetter et al., 2014). Several factors could account for this discrepancy. First, because there was a relatively long delay between encoding and retrieval, the perceptual information might have decayed. Second, it is possible that, although we emphasized the retrieval of perceptual details, participants relied on abstract information to make their mnemonic decisions. Third, most short-term memory and mental imagery studies used only a small set of simple visual stimuli, so participants might have relied on a few simple visual features to perform the task. When complex stimuli are used for visual imagery, the early visual cortex may carry only categorical but not precise pictorial information (Vetter et al., 2014). Consistent with the episodic memory results, some studies suggest that the representations during perception and imagery might be quantitatively different. In one study, classifiers trained on perceptual data did a better job of classifying perception data than imagery data (Zeithamova et al., 2012). Similarly, the performance of a "perceptual" classifier was lower than that of an "imagery" classifier in classifying imagery data (Albers et al., 2013).
Using high-resolution fMRI, the present study also examined the MTL subregions during encoding and retrieval. The MTL plays an important role in pattern completion during retrieval (Leutgeb et al., 2007; Bakker et al., 2008), and its level of activation has been correlated with the ERS in cortical regions, leading to the reactivation of cortical patterns from initial encoding (Staresina et al., 2012; Ritchey et al., 2013; Gordon et al., 2014; Wing et al., 2015; Danker et al., 2016). Consistently, we found that DG and CA1 activation was significantly correlated with item-specific encoding and retrieval in the visual cortex. Nevertheless, the representation in hippocampal regions is very sparse (Quiroga et al., 2008; Wixted et al., 2014), making it hard to probe the content of these representations given the limited spatial resolution of fMRI. Consistent with previous observations (LaRocque et al., 2013; Liang et al., 2013), we found no item-specific representation in MTL subregions during encoding. Interestingly, weak item-specific representation in MTL subregions was found during retrieval, although no significant difference was found between encoding and retrieval. Future studies should further examine the MTL representations during encoding and retrieval.
The different nature of representation during encoding and retrieval is consistent with the idea that it is neither possible nor necessary to precisely reinstate the encoding process (see Introduction). These findings have both methodological and theoretical implications. Methodologically, they suggest that using a perceptual classifier to examine the content of episodic memory retrieval might be less sensitive. Theoretically, they support the notion that memory retrieval is not a faithful replay of past events, but rather involves additional abstraction and construction processes. Whereas the abstraction processes enable the formation of conceptual knowledge from perceptual experience to support highly complex functions, such as language, creative thinking, and problem solving (Binder et al., 2011), the construction processes allow the flexible use of past information to serve current and future goals (Schacter, 2012).
Our results also emphasize the importance of examining the functional necessity of process reenactment or neural pattern reinstatement for successful memory retrieval. In support of a functional role of eye-movement reenactment in memory, forcing fixation during retrieval reduced episodic memory performance (Laeng and Teodorescu, 2002; Johansson et al., 2012; Johansson and Johansson, 2014; Laeng et al., 2014), whereas showing the elements serially according to the original scan path's sequence yielded significantly better recognition than a shuffled condition (Bochynska and Laeng, 2015). However, there is evidence that fixing eye movements during encoding does not affect eye movements during retrieval (Johansson et al., 2012), suggesting that the reenacted processes may be a consequence of activating the episodic representation and/or a strategy for facilitating retrieval (Ferreira et al., 2008), but not necessarily a replay of the encoding processes.
In conclusion, humans are capable of vividly reexperiencing past events. With an innovative design, the present study obtained compelling evidence for item-specific pattern reinstatement. Beyond that, we also revealed important differences between the neural representations during encoding and retrieval. These results suggest a more abstract and constructive nature of human episodic memory, which is essential to support its adaptive function. Future studies could use our method to compare the representations involved in different mental processes, including perception, short-term maintenance, retrieval, and imagery, which could help to achieve a deeper understanding of how the brain creates, maintains, and reexperiences memories.
Footnotes
This work was supported by the National Natural Science Foundation of China 31130025, 973 Program 2014CB846102, 111 Project B07008, and the NSFC Project 31521063. We thank Tianyi Tian and Dingxin Wang for developing the fMRI acquisition sequence.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Gui Xue, State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute of Brain Research, Beijing Normal University, Beijing 100875, PR China. guixue@gmail.com