Abstract
The striatum is a central part of the dopaminergic mesolimbic system and contributes both to the encoding and retrieval of long-term memories. In this regard, the co-occurrence of striatal novelty and retrieval success effects in independent studies underlines the structure's double duty and suggests dynamic contextual adaptation. To test this hypothesis and further investigate the underlying mechanisms of encoding and retrieval dynamics, human subjects viewed pre-familiarized scene images intermixed with new scenes and classified them as indoor versus outdoor (encoding task) or old versus new (retrieval task), while fMRI and eye tracking data were recorded. Subsequently, subjects performed a final recognition task. As hypothesized, striatal activity and pupil size reflected task-conditional salience of old and new stimuli, but, unexpectedly, this effect was not reflected in the substantia nigra and ventral tegmental area (SN/VTA), medial temporal lobe, or subsequent memory performance. Instead, subsequent memory generally benefitted from retrieval, an effect possibly driven by task difficulty and activity in a network including different parts of the striatum and SN/VTA. Our findings extend memory models of encoding and retrieval dynamics by pinpointing a specific contextual factor that differentially modulates the functional properties of the mesolimbic system.
SIGNIFICANCE STATEMENT The mesolimbic system is involved in the encoding and retrieval of information but it is unclear how these two processes are achieved within the same network of brain regions. In particular, memory retrieval and novelty encoding were considered in independent studies, implying that novelty (new > old) and retrieval success (old > new) effects may co-occur in the striatum. Here, we used a common framework implicating the striatum, but not other parts of the mesolimbic system, in tracking context-dependent salience of old and new information. The current study, therefore, paves the way for a more comprehensive understanding of the functional properties of the mesolimbic system during memory encoding and retrieval.
Introduction
Dopaminergic projections from the substantia nigra and ventral tegmental area (SN/VTA) of the midbrain to the hippocampus can enhance synaptic plasticity and have been implicated in selective facilitation of long-term memory encoding (Jay, 2003; Shohamy and Adcock, 2010; Lisman et al., 2011). This mechanism may constitute the upward arm of a loop between SN/VTA and hippocampus which supports the entry of novel information into memory (Lisman and Grace, 2005). In this framework, novelty is initially detected in the hippocampus by comparison of sensory input with predictions based on stored information. A novelty signal is then conveyed through the ventral striatum and pallidum to the SN/VTA where it triggers firing of dopaminergic neurons projecting back to the hippocampus. In line with this model, neuroimaging studies have revealed novelty signals (i.e., higher BOLD for new than old stimuli) in the hippocampus, striatum, pallidum and SN/VTA (Bunzeck and Düzel, 2006; Stoppel et al., 2009; Bunzeck et al., 2014; Hawco and Lepage, 2014).
In contrast, activity increases in the striatum were demonstrated in recognition memory tasks for correctly recognized old compared with correctly rejected new stimuli (Spaniol et al., 2009). The precise role of the striatum during retrieval, however, remains debated. It may be actively engaged in retrieval (e.g., by gating working memory content; Scimeca and Badre, 2012), ascribe value to retrieved information (Han et al., 2010; Scimeca and Badre, 2012), or reinforce successful retrieval strategies (Scimeca and Badre, 2012; Scimeca et al., 2016). Interestingly, retrieval success effects lie in close proximity to the novelty effects mentioned above (Scimeca and Badre, 2012). This apparent contradiction suggests that the striatum is neither directly involved in retrieval itself, nor does it act as a mere relay for novelty signals. Instead, the striatum might combine information about item novelty with contextual and goal-related information (Lisman and Grace, 2005), possibly received via afferent projections from the prefrontal cortex (PFC; Ferry et al., 2000).
Whereas striatal retrieval success effects can be observed during explicit retrieval tasks, novelty effects in the striatum were reported during an indoor/outdoor discrimination task (Bunzeck et al., 2014). We suggest that these different task contexts induce retrieval versus encoding mode, respectively, and thereby modulate striatal old/new effects. Specifically, during perceptual categorization, the striatum might be engaged in detecting novel items, which are biologically more relevant in this context and hence selected for future storage. During explicit retrieval, in turn, old information is particularly salient and might lead to increased striatal engagement. Differential salience of old and new stimuli during encoding and retrieval mode might also be reflected in pupil size and visual exploration, two oculomotor markers sensitive to memory and salience (Smith et al., 2006; Sharot et al., 2008; Goldinger and Papesh, 2012; Kafkas and Montaldi, 2012; Wang and Munoz, 2015).
The hippocampal-VTA model (Lisman and Grace, 2005) suggests that the proposed activity pattern within the striatum is relayed to the SN/VTA and hippocampus and affects memory (re-)encoding (Scimeca and Badre, 2012). In this case, novel stimuli should subsequently be better remembered when presented during encoding mode, whereas old stimuli should be better remembered when presented during retrieval mode. Alternatively, memory (re-)encoding may generally benefit from retrieval (Roediger and Karpicke, 2006; Rowland, 2014). Such a “testing effect” might relate to the inherent difficulty of retrieval tasks (Bjork, 1999; Rowland, 2014) and depend on the hippocampal-VTA loop (Roediger and Butler, 2011). Indeed, retrieval mode (van den Broek et al., 2013) and task difficulty per se (Boehler et al., 2011) activate a similar brain network including the striatum and SN/VTA.
To investigate the neural mechanisms underlying encoding and retrieval of long-term memories, subjects viewed pre-familiarized scene images intermixed with new scenes in two different task contexts. Subjects discriminated the images according to either indoor/outdoor (ENC, encoding mode) or old/new (RET, retrieval mode) status while fMRI and eye tracking data were recorded. A final recognition task followed outside the scanner to assess subsequent memory performance (Fig. 1).
Materials and Methods
Participants.
Twenty-six right-handed healthy human subjects (11 male; age: 22–34 years, M = 26.1, SD = 3.5) gave written informed consent and participated in the experiment. No participant reported a history of neurological or psychiatric disorders. Two subjects were excluded from fMRI data analysis due to excessive movement and two subjects were excluded from eye tracking data analysis due to many partial blinks. For one subject behavioral data were not recorded for phase 3 (see below for details on data exclusion). The study was approved by the local ethics committee (Medical Association Hamburg).
Experimental design and task.
The experiment was programmed and run in MATLAB R2014b (MathWorks; RRID:SCR_001622), using the Psychtoolbox-3 (Brainard, 1997; Kleiner et al., 2007; RRID:SCR_002881). It consisted of three experimental phases (Fig. 1). In phase 1, participants performed a target detection task, which familiarized them with a set of 160 scene images (80 indoor, 80 outdoor). Initially, participants were presented two additional target images (1 indoor, 1 outdoor) for 12 s, which were drawn at random from the full set of 162 images for each subject. Subsequently, the 160 images were presented three times each in pseudorandom order intermixed with 9% of target trials (i.e., 48 target and 480 non-target trials). All images were presented for 1 s followed by an interstimulus interval of 1.5 s (fixation cross) and subjects had to press a button for target stimuli within 2 s. Participants could make a self-paced pause every 96 trials.
Phase 2 was performed in the MR scanner with simultaneous acquisition of eye movement data, ∼20 min after phase 1. Here, images from phase 1 were presented randomly intermixed with 160 new images in two task contexts, which should induce encoding and retrieval mode, respectively. In the encoding task (ENC) subjects categorized images as indoor/outdoor. In the retrieval task (RET), participants gave an old/new recognition judgment. The combination of the two factors (ENC/RET and OLD/NEW) resulted in a 2 × 2 design with 80 items per condition. Tasks randomly alternated in blocks of 8 trials and each block contained four OLD and four NEW stimuli in random order. Before each block an instruction screen informed participants about the upcoming task. Images were presented for 1 s with an interstimulus interval of 2.3 s (fixation cross) and participants gave a response using their right index or middle finger. Button presses were recorded for 2.8 s from stimulus onset. Button-response mappings were counterbalanced across participants. Participants could make a self-paced pause every 64 trials.
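The block structure of phase 2 can be sketched as follows. This is a minimal illustration in Python, not the MATLAB/Psychtoolbox code used in the experiment. The function name `make_phase2_sequence`, the seed handling, and the balancing of ENC and RET blocks through a shuffled list are assumptions; the text specifies only that tasks alternated randomly in blocks of eight trials with four OLD and four NEW stimuli each.

```python
import random

def make_phase2_sequence(old_items, new_items, block_size=8, seed=None):
    """Return a list of blocks; each block is (task, [(item, status), ...])."""
    rng = random.Random(seed)
    old_items, new_items = list(old_items), list(new_items)
    rng.shuffle(old_items)
    rng.shuffle(new_items)
    half = block_size // 2                 # 4 OLD + 4 NEW per block
    n_blocks = len(old_items) // half      # 160 / 4 = 40 blocks
    # Assumption: equal numbers of ENC and RET blocks in shuffled order.
    tasks = ["ENC", "RET"] * (n_blocks // 2)
    rng.shuffle(tasks)
    blocks = []
    for b, task in enumerate(tasks):
        trials = [(item, "OLD") for item in old_items[b * half:(b + 1) * half]]
        trials += [(item, "NEW") for item in new_items[b * half:(b + 1) * half]]
        rng.shuffle(trials)                # random order within the block
        blocks.append((task, trials))
    return blocks

# Hypothetical item identifiers: 0-159 pre-familiarized, 160-319 new.
blocks = make_phase2_sequence(range(160), range(160, 320), seed=1)
```

With 160 OLD and 160 NEW items this yields 40 blocks, that is, 80 items in each cell of the 2 × 2 design.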
In phase 3, subjects performed a final recognition task 10 min after leaving the MR scanner. The 320 images from phase 2 were intermixed with 160 new images and presented with a visual analog scale (certainly old to certainly new; middle point) on which participants indicated their response with a mouse device. Images were presented for 1 s with an interstimulus interval of 3 s (fixation cross and analog scale) during which participants could still give their response. Participants could make a self-paced pause every 60 trials.
Images were randomly assigned to the different phases and conditions for each subject. Images were presented at a size of ∼16° horizontal and 9.76° vertical. To control for effects of illumination, mean luminance on each color channel (R, G, B) was set to 127 (scale from 0 to 255) and images were presented on a gray background of equal luminance. Before each phase of the experiment, participants completed a brief training session. Images used during the training phase were different from those used during testing.
Behavioral data analyses.
Data were analyzed using MATLAB R2014b (MathWorks; RRID:SCR_001622). Accuracy of behavioral responses was assessed using signal detection theory (Stanislaw and Todorov, 1999). For calculating d′, hits were defined as correctly detected targets in phase 1. For old/new categorization during phases 2 and 3, hits were defined as old items correctly classified as old. For indoor/outdoor categorization during phase 2, hits were defined as indoor images correctly classified as indoor. d′ was calculated by subtracting the inverse φ (conversion of probabilities into z-scores according to the normal cumulative distribution function) of the false alarm rate from the inverse φ of the hit rate for each subject and condition. Because the inverse φ of 0 and 1 is −∞ and ∞, respectively, 0.5 was added to the number of hits and false alarms and 1 was added to the number of signal and no signal trials (Stanislaw and Todorov, 1999) to avoid hit or false alarm rates of 0 or 1, and hence d′ of ±∞. Memory performance in phase 3 was measured on a continuous visual analog scale. Here, we will not only report d′ (based on dichotomized responses) but also the raw memory strength rating per condition. Additionally, reaction times (RTs) will be reported for phases 1 and 2. Data were statistically analyzed using a 2 (OLD/NEW) × 2 (ENC/RET) repeated-measures ANOVA or a dependent samples t test where appropriate. For one subject, data acquisition crashed during phase 3; this subject was therefore only considered in data analyses of phases 1 and 2.
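The d′ computation with the correction for extreme rates can be illustrated with a short sketch, written here in Python rather than MATLAB. It uses the standard definition d′ = z(hit rate) − z(false alarm rate); the function name `d_prime` and the example counts are hypothetical.

```python
from statistics import NormalDist

def d_prime(hits, false_alarms, n_signal, n_noise):
    """d' = z(hit rate) - z(false alarm rate) with the loglinear correction."""
    # Add 0.5 to the counts and 1 to the trial numbers so that rates
    # of exactly 0 or 1 (and hence infinite z-scores) cannot occur.
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    z = NormalDist().inv_cdf  # inverse of the normal cumulative distribution
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for illustration only: 70 hits and 10 false alarms
# out of 80 old and 80 new items.
example = d_prime(70, 10, 80, 80)
```

Because of the correction, even perfect performance (for example, 80 hits and no false alarms) yields a finite value.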
MRI data acquisition and preprocessing.
Acquisition of MRI data was performed using a 3-Tesla Siemens Trio MRI scanner and a 32-channel head coil. For functional imaging, 42 slices (aligned to the AC–PC line) covering the whole brain were collected per volume using a T2*-weighted EPI sequence (continuous slice acquisition, 2 × 2 × 2 mm; slice gap: 1 mm; TR = 2600 ms; TE = 26 ms; flip angle: 80°). Images were acquired in five runs. At the beginning of each run, six initial volumes were recorded to allow for steady-state magnetization, which were excluded from the analyses.
At the end of the experiment, high-resolution anatomical images of each subject's brain were obtained using an MPRAGE sequence (1 × 1 × 1 mm; slice gap: 0.5 mm; TR = 2300 ms; TE = 2.98 ms; flip angle: 9°) for T1 weighted and a FLASH sequence [1 × 1 × 1 mm; slice gap: 0.2 mm; TR = 24 ms; TE = (2.2, 4.7, 7.2, 9.7, 12, 15) ms; flip angle: 6°] for MT-weighted images, as these are well suited to localize the SN/VTA.
EPI images were first realigned between and within runs to the first image in the time series using a six-parameter rigid body transformation. In the same step, images were unwarped to remove residual movement related variance. To prevent remaining movement artifacts from affecting further analyses, we set a cutoff for scan-to-scan motion of 1 mm or 1° in more than half of the runs. This led to the exclusion of two subjects from further fMRI analyses. Subsequently, the T1-weighted anatomical scan was coregistered to the functional images, again using a rigid body transformation. All images were then normalized to MNI space based on normalization parameters derived from a segmentation of T1-weighted images into white matter, gray matter, and CSF using default tissue probability maps. Finally, all images were smoothed using a 6 mm FWHM Gaussian kernel.
MRI data analyses.
MRI data were analyzed using SPM12 (http://www.fil.ion.ucl.ac.uk/spm/software/spm12/, RRID:SCR_007037). For first-level analysis, onset regressors were included for new images shown in the encoding (indoor/outdoor) task (ENC-NEW), old images shown in the encoding task (ENC-OLD), new images shown in the retrieval (old/new) task (RET-NEW), and old images shown in the retrieval task (RET-OLD). Error trials were not excluded to avoid introducing a bias by selecting images based on mnemonic performance only in the recognition condition (exclusion in encoding blocks could only be based on indoor/outdoor judgment because memory status is unknown). To characterize subsequent memory effects (SMEs), memory strength per trial as obtained from the continuous visual analog scale in phase 3 was used as a parametric modulator. It should be noted that this procedure is conceptually similar to categorizing images into remembered and forgotten and calculating the contrast between both. Importantly, the current approach should have higher power as it is based on a continuous measure rather than dichotomization (Cohen, 1983). Furthermore, memory performance in some conditions (esp. RET-OLD) was very high, leading to small variance and few forgotten trials, which renders this analysis more sensitive.
Stick-functions were convolved with the canonical HRF and a high-pass filter (cutoff 1/128 Hz) was applied. Contrasts for each condition of interest versus baseline were entered into a flexible factorial model with the factors ENC/RET, OLD/NEW, and subject at the second level. The uncorrected cluster-forming threshold for all analyses was p < 0.001. Clusters were considered significant at the familywise error-corrected threshold of p(FWE) < 0.05 (whole brain) at the cluster level. Based on theoretical considerations, small volume correction (SVC) was used for the NEW > OLD contrast using one bilateral hippocampus mask (aal-atlas; Knight, 1996; Tulving et al., 1996; Dolan and Fletcher, 1997; Grunwald et al., 1998; Strange et al., 1999; Lisman and Otmakhova, 2001; Lisman and Grace, 2005; Jeewajee et al., 2008; Ben-Yakov et al., 2014; Kafkas and Montaldi, 2014) and for the RET > ENC contrast using one bilateral SN/VTA mask (manually drawn on mean MT image; Boehler et al., 2011; van den Broek et al., 2013; Wing et al., 2013). Here, clusters with p(FWE) < 0.05 at the peak voxel were considered significant.
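The construction of an onset regressor and its parametric modulator can be sketched as follows. This is a simplified illustration in Python, not SPM12 code. The double-gamma function only approximates a canonical HRF, and its shape parameters, the 0.1 s time grid, the mean-centering of the modulator, and all function names are assumptions made for the example.

```python
import math

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1 / 6):
    """Approximate HRF: a gamma-shaped peak minus a scaled, later undershoot."""
    if t <= 0:
        return 0.0
    def gamma_pdf(k):
        return t ** (k - 1) * math.exp(-t) / math.gamma(k)
    return gamma_pdf(peak_shape) - ratio * gamma_pdf(under_shape)

def build_regressors(onsets, modulator, n_scans, tr=2.6, dt=0.1):
    """Return (main, parametric) regressors sampled once per scan."""
    n_fine = int(round(n_scans * tr / dt))
    mean_mod = sum(modulator) / len(modulator)
    sticks_main = [0.0] * n_fine
    sticks_mod = [0.0] * n_fine
    for onset, m in zip(onsets, modulator):
        i = int(round(onset / dt))
        if 0 <= i < n_fine:
            sticks_main[i] += 1.0
            sticks_mod[i] += m - mean_mod  # assumption: mean-centered modulator
    kernel = [double_gamma_hrf(k * dt) for k in range(int(32 / dt))]

    def convolve(sticks):
        out = [0.0] * n_fine
        for i, height in enumerate(sticks):
            if height == 0.0:
                continue
            for k, h in enumerate(kernel):
                if i + k >= n_fine:
                    break
                out[i + k] += height * h
        return out

    def sample(signal):
        return [signal[int(round(s * tr / dt))] for s in range(n_scans)]

    return sample(convolve(sticks_main)), sample(convolve(sticks_mod))

# Hypothetical example: two trials with memory-strength ratings of +1 and -1.
main, mod = build_regressors([0.0, 33.0], [1.0, -1.0], n_scans=30)
```

The main regressor captures the average response to a condition, whereas the modulated regressor captures only the trial-by-trial variation in memory strength within that condition.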
Eye movement data acquisition and preprocessing.
Eye movement data were simultaneously acquired using an EyeLink 1000 fiber optic camera (SR Research; RRID:SCR_009602) and a first-surface reflection mirror. The right eye was tracked with a sampling rate of 1000 Hz. Default cognitive parsing thresholds were used to detect fixations, saccades, and blinks. A 13-point calibration was performed before the scan for all except for three subjects for which a five-point calibration was used instead.
Because we were interested in pupil size and viewing behavior, we did not require central fixation during stimulus presentation. However, recorded pupil size depends on the angle between camera and pupil and changes in fixation location might therefore influence recorded pupil size without any actual dilation or constriction of the pupil (Hayes and Petrov, 2016). To get an estimate of pupil size changes due to changes in fixation location, we presented 12 additional fixation targets (1.5°) before the task (see below for details on how these data were used to correct pupil size measurements). These fixation targets were evenly distributed across the area of image presentation and consecutively presented two times for 2 s each in random order.
Data were preprocessed using MATLAB R2014b (MathWorks; RRID:SCR_001622). Pupil size data were extracted from time points during steady fixation and epoched from −240 to 2760 ms. Visual inspection revealed many partial blinks in data from two subjects. As these lead to incorrect pupil size and gaze measurements, these subjects were excluded from further analyses. To correct for fixation location, a matrix of correction factors for each pixel within the image area was calculated using the fixation data acquired before the task. To do so, all pupil size samples within 1° horizontal and vertical distance from the respective fixation target were averaged to get one estimate of pupil size per fixation target. The 12 estimates (i.e., 1 per target) were then used to create a map covering the entire image area (i.e., a matrix with equal size and image resolution as the stimuli) using spline interpolation. Each number in this matrix was divided by the matrix's midpoint (corresponding to the center of the stimuli) to obtain the relative change in pupil size associated with eccentric fixation. Each sample acquired during task performance was then divided by the correction factor corresponding to the subject's current fixation location. Outlying samples were excluded by calculating mean and SD across all trials and per trial. Samples >3 SDs above or below the mean were excluded. Subsequently, data were baseline corrected using a relative baseline from −240 ms to stimulus onset to account for prestimulus fluctuation and adjust the scale across subjects. Because cognitive pupil responses are slow (frequency <4 Hz; Kloosterman et al., 2015), data were finally low-pass filtered using a Butterworth IIR filter (passband frequency: 8 Hz, stopband frequency: 12 Hz, passband ripple: 0.2 dB, stopband attenuation: 60 dB, mirror padding: 200 ms), and down-sampled to 500 Hz. The number of fixations was calculated for each image presentation (1 s). Fixations were merged if intervening saccades had an amplitude <0.5° to exclude periods misidentified as saccades due to noise from the scanner environment.
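Three of the pupil preprocessing steps (correction for fixation location, outlier exclusion, and relative baseline correction) can be sketched for a single trial as follows. This Python sketch is illustrative only. It assumes that the correction factor for each sample has already been looked up from the interpolated map, applies the 3 SD criterion within the trial only, and interprets the relative baseline as division by the mean prestimulus pupil size; the function name `preprocess_trial` is hypothetical.

```python
from statistics import mean, stdev

def preprocess_trial(pupil, correction, times_ms, sd_limit=3.0):
    """Correct, clean, and baseline one trial of pupil samples.

    pupil, correction, times_ms: lists of equal length; correction holds
    the gaze-dependent factor for each sample (assumed to be given).
    Returns a list in which rejected samples are replaced by None.
    """
    # 1) Divide each sample by the factor for the current fixation location.
    corrected = [p / c for p, c in zip(pupil, correction)]
    # 2) Reject samples more than sd_limit SDs from the trial mean.
    m, sd = mean(corrected), stdev(corrected)
    cleaned = [x if abs(x - m) <= sd_limit * sd else None for x in corrected]
    # 3) Express the data relative to the mean prestimulus pupil size.
    baseline = mean(x for x, t in zip(cleaned, times_ms)
                    if t < 0 and x is not None)
    return [None if x is None else x / baseline for x in cleaned]
```

After this step a value of 1.1 corresponds to a pupil that is 10% larger than during the prestimulus baseline, which makes the scale comparable across subjects.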
Eye movement data analyses.
Pupil size was analyzed with a nonparametric cluster-based permutation test (Maris and Oostenveld, 2007) as implemented in FieldTrip (Oostenveld et al., 2011; RRID:SCR_004849) in a time window from 0–2.7 s. Main effects of ENC/RET and OLD/NEW and their interaction were analyzed by calculating a t statistic for each sample across the time dimension. Clusters were formed by adjacent significant samples (p < 0.05). A Monte Carlo estimate of the permutation p value was calculated by randomly permuting condition labels (N = 5000) and comparing the cluster statistic (sum of t values) of each cluster found in real data with the maximal cluster statistic found in surrogate data. This procedure controls for multiple comparisons; the p value of a cluster is the proportion of surrogate datasets in which the maximal cluster statistic exceeds that observed in real data, evaluated here at α = 0.05. The number of fixations was analyzed using repeated-measures ANOVA.
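The logic of the cluster-based permutation test can be sketched for a paired comparison as follows. This is a minimal Python illustration, not the FieldTrip implementation. It operates on per-subject difference time courses, implements the permutation of condition labels as random sign flips of these differences, and takes the critical t value as an argument (for example, approximately 2.069 for 23 degrees of freedom at a two-sided α of 0.05); all function names are hypothetical.

```python
import math
import random

def t_values(diffs):
    """One-sample t statistic per time point; diffs[subject][sample]."""
    n = len(diffs)
    out = []
    for col in zip(*diffs):
        m = sum(col) / n
        var = sum((x - m) ** 2 for x in col) / (n - 1)
        out.append(m / math.sqrt(var / n) if var > 0 else 0.0)
    return out

def cluster_sums(tvals, t_crit):
    """Sum t over runs of adjacent supra-threshold samples of equal sign."""
    sums, current, sign = [], 0.0, 0
    for t in tvals:
        s = (t > t_crit) - (t < -t_crit)   # +1, -1, or 0
        if s != 0 and s == sign:
            current += t                    # extend the running cluster
        else:
            if sign != 0:
                sums.append(current)        # close the previous cluster
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        sums.append(current)
    return sums

def cluster_permutation_test(diffs, t_crit, n_perm=1000, seed=0):
    """Return [(cluster_sum, p_value), ...] for the observed clusters."""
    rng = random.Random(seed)
    observed = cluster_sums(t_values(diffs), t_crit)
    null_max = []
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            s = rng.choice((1, -1))         # swap condition labels per subject
            flipped.append([x * s for x in row])
        sums = cluster_sums(t_values(flipped), t_crit)
        null_max.append(max((abs(c) for c in sums), default=0.0))
    return [(c, sum(m >= abs(c) for m in null_max) / n_perm) for c in observed]
```

Comparing each observed cluster with the distribution of the maximal cluster statistic, rather than testing every sample separately, is what controls the familywise error rate across the time window.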
Results
Behavioral data
Accuracy of target detection in phase 1 was high [d′ (M ± SD) = 5.37 ± 0.42] confirming that subjects were able to perform the task. Mean reaction time (±SD) for hits was 0.56 ± 0.05 s.
For accuracy (d′) during phase 2 (Fig. 2a), there was an effect of ENC/RET (t(1,25) = 14.92, p < 0.001), with more accurate responses during ENC than RET. As d′ for the retrieval condition combines responses for old and new images, these data cannot be analyzed in a 2 × 2 design. For RT during phase 2 (Fig. 2b), there was a main effect of ENC/RET (F(1,25) = 122.63, p < 0.001, ηp2 = 0.83), a main effect of OLD/NEW (F(1,25) = 10.27, p = 0.004, ηp2 = 0.29), and an interaction of both factors (F(1,25) = 5.55, p = 0.027, ηp2 = 0.18). Participants responded faster during ENC than RET (NEW: t(1,25) = 10.15, p < 0.001; OLD: t(1,25) = 9.13, p < 0.001). The interaction was driven by faster responses for OLD compared with NEW images only during RET (ENC: t(1,25) = 1.14, p = 0.264; RET: t(1,25) = 2.96, p = 0.007).
During phase 3, there was a main effect of ENC/RET (F(1,24) = 38.69, p < 0.001, ηp2 = 0.62) and OLD/NEW (F(1,24) = 120.88, p < 0.001, ηp2 = 0.83), and an interaction of both factors (F(1,24) = 11.88, p = 0.002, ηp2 = 0.33) on memory performance (d′; Fig. 3a). Participants were more accurate in their memory judgment for images which were old during phase 2 (i.e., images shown three times during phase 1 and one time during phase 2) than for images which were new during phase 2 (i.e., images shown once during phase 2) regardless of ENC/RET context (ENC: t(1,24) = 10.19, p < 0.001; RET: t(1,24) = 9.32, p < 0.001). Moreover, memory performance was higher for images shown during RET than ENC. This effect was stronger for NEW than OLD images (NEW: t(1,24) = 9.24, p < 0.001; OLD: t(1,24) = 2.60, p = 0.016). Analyzing raw memory ratings (Fig. 3b), as derived from the visual analog scale, instead of d′ revealed the same pattern of results (ENC/RET: F(1,24) = 71.63, p < 0.001, ηp2 = 0.75; OLD/NEW: F(1,24) = 138.26, p < 0.001, ηp2 = 0.85; Interaction: F(1,24) = 15.95, p = 0.001, ηp2 = 0.40).
Hit rates and false alarm rates for all phases can be found in Table 1.
Eye tracking data
During phase 2, a main effect of OLD/NEW on pupil size (Fig. 4b) was significant from 654 to 2700 ms poststimulus (∑t(1,23) = 39992, p < 0.001) with larger pupils for OLD than NEW stimuli. A main effect of ENC/RET was significant from 1530 to 2330 ms (∑t(1,23) = 1154, p = 0.030) with stronger dilation during RET than ENC. Importantly, the interaction between both factors was significant from 996 to 2700 ms (∑t(1,23) = 2663, p = 0.003). Post hoc comparisons of data extracted and averaged over this time period revealed a simple effect OLD > NEW during RET (t(1,23) = 6.52, p < 0.001) but not ENC (t(1,23) = 1.37, p = 0.182) and a simple effect RET > ENC for OLD (t(1,23) = 3.65, p = 0.001) but not NEW (t(1,23) = 0.29, p = 0.771) stimuli.
For the number of fixations (Fig. 4a), there was a main effect of OLD/NEW (F(1,23) = 25.33, p < 0.001, ηp2 = 0.52) and ENC/RET (F(1,23) = 105.77, p < 0.001, ηp2 = 0.82) but no interaction (F(1,23) = 0.39, p = 0.540, ηp2 = 0.02). Subjects fixated on more different spots for NEW than OLD stimuli and during RET compared with ENC.
Given that task context had a significant effect on both memory strength in phase 3 and the number of fixations in phase 2, we additionally used a within-subject regression analysis to delineate the independent contributions of retrieval mode (dummy coded as RET = 1 and ENC = 0) and viewing behavior (number of fixations) to memory strength (1 for certainly old, −1 for certainly new) in phase 3. This analysis should reveal whether effects of retrieval mode on memory are fully mediated by overt visual attention. Testing estimates of this model against the null at the group level essentially eliminates shared variance of both variables and hence provides a test for independent contributions of the predictor variables. We obtained an average β of 0.190 (±0.022 SEM) for task context and 0.081 (±0.013 SEM) for the number of fixations. Both are significant at the group level with p < 0.001.
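The two-step logic of this analysis (one regression per subject, followed by a group-level test of the resulting coefficients) can be sketched as follows. The Python code is an illustration under assumptions: predictors are entered unstandardized together with an intercept, coefficients are obtained from the normal equations, and the function names `ols` and `one_sample_t` are hypothetical.

```python
def ols(y, predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] for one subject."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][k] for i in range(k)]

def one_sample_t(values):
    """t statistic for testing the mean of per-subject coefficients against 0."""
    n = len(values)
    m = sum(values) / n
    sd = (sum((v - m) ** 2 for v in values) / (n - 1)) ** 0.5
    return m / (sd / n ** 0.5)

# Hypothetical single-subject data: task (RET = 1, ENC = 0), fixation counts,
# and memory strength constructed without noise for illustration.
task = [0, 1, 0, 1, 0, 1]
fixations = [3, 4, 5, 3, 6, 4]
memory = [0.1 + 0.2 * t + 0.05 * f for t, f in zip(task, fixations)]
betas = ols(memory, [task, fixations])
```

Because both predictors enter the same model, each coefficient reflects only the variance in memory strength that is not shared with the other predictor.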
Furthermore, we asked whether SMEs on pupil size depend on task context, as previously suggested (Goldinger and Papesh, 2012). To this end, pupil size was averaged around the peak of the cognitive effect on pupil size shown in Figure 4b (1.5–2.5 s poststimulus onset) for each trial. We then regressed subsequent memory strength on pupil size within each subject and condition. At the group level, the slopes of these regressions were significantly different from zero in all but the ENC-NEW condition [ENC-NEW: 0.026 (±0.041 SEM, p = 0.543), ENC-OLD: 0.505 (±0.052 SEM, p < 0.001), RET-NEW: 0.281 (±0.047 SEM, p < 0.001), RET-OLD: 0.621 (±0.042 SEM, p < 0.001)], indicating stronger dilation being associated with better subsequent memory performance. ANOVA on the slopes revealed a significant effect of ENC/RET (F(1,23) = 61.77, p < 0.001) with higher slopes for RET than ENC, OLD/NEW (F(1,23) = 164.66, p < 0.001) with higher slopes for OLD than NEW and a significant interaction (F(1,23) = 14.78, p < 0.001).
fMRI data
During phase 2, OLD compared with NEW stimuli were associated with higher activity in bilateral lateral prefrontal and parietal cortex, precuneus, medial PFC, middle temporal gyrus (MTG), thalamus, and right SN/VTA. The reversed contrast (NEW > OLD) was significant in right middle occipital gyrus and fusiform gyrus extending into the posterior parahippocampal gyrus. SVC with a bilateral hippocampus mask additionally revealed higher activity in left hippocampus for NEW compared with OLD stimuli (Table 2; Fig. 5a).
Retrieval versus encoding mode (RET > ENC) was associated with increased activity in bilateral insula, medial prefrontal and mid-cingulate cortex, visual cortex, thalamus, left cerebellum, right middle frontal gyrus and orbitofrontal cortex, striatum and, using SVC, in SN/VTA. Higher BOLD for ENC than RET was found in a large cluster spanning bilateral lateral temporal, parietal and prefrontal cortex, precuneus and medial PFC, cingulate cortex, as well as in right temporal pole and bilateral ventral striatum/basal forebrain (Table 3; Fig. 5b).
We were specifically interested in a modulation of OLD/NEW effects by task context. An interaction (RET-OLD + ENC-NEW > ENC-OLD + RET-NEW) was found in a cluster including right ventral striatum and pallidum. It was driven by stronger responses to NEW images during ENC and the reversed pattern, i.e., stronger responses to OLD images, during RET (Table 4; Fig. 6). The reversed contrast did not yield any significant results. Although this effect was significant using FWE correction at the cluster level, we additionally used a combined aal mask of right caudate, putamen, and pallidum to confirm localization of the effect. Thirty-two voxels lay within this mask and the effect at the peak voxel within the mask was significant at p(FWE) = 0.006.
Because the observed interaction mirrored effects seen in pupil size, we explored whether both effects were correlated. To this end, the difference in contrast estimates and pupil size [(RET-OLD + ENC-NEW) − (ENC-OLD + RET-NEW)] was extracted from the clusters showing the interaction effect. The correlation was trend-level significant (r = 0.409, p = 0.059, N = 22; Fig. 6).
To explore the neural mechanisms underlying subsequent memory performance, we furthermore used subsequent memory strength as a parametric modulator. It should be noted here that using one parametric modulator per condition controls for activation differences between the conditions (which may be associated with subsequent memory) and only captures variance in memory performance within each condition. However, BOLD signal was not significantly modulated by subsequent memory across experimental conditions (F-contrast of interest) in any brain region. SMEs also did not differ as a function of ENC/RET and OLD/NEW.
Discussion
In everyday life, the encoding of new information and the retrieval of already existing memories can switch on a short time scale. The current study sought to elucidate how the mesolimbic system, in particular the striatum, achieves this double duty. Using fMRI and a novel task including two contexts (ENC/RET), we show that old/new effects in the ventral striatum and pallidum depend on explicit retrieval demands, with higher activity for old stimuli in the retrieval context and higher activity for new stimuli in the encoding context. These results empirically confirm (Lisman and Grace, 2005) that striatal activity depends on incoming old/new signals and modulatory variables other than external rewards (Han et al., 2010). In particular, our results suggest that the salience of old and new stimuli naturally reverses in a context that induces encoding mode. Whenever retrieval is not relevant to the current task, the mesolimbic system might operate in a mode which emphasizes the detection of novel information to enhance memory encoding and exploration (Düzel et al., 2010).
An interaction effect of task and OLD/NEW status in pupil size data, which parallels striatal/pallidal activity, further supports our salience-based account, although the OLD > NEW effect during RET did not cross over during ENC in pupil size data. This result is in line with previous reports of pupil old/new effects (Võ et al., 2008; Heaver and Hutton, 2011; Otero et al., 2011) and suggests that these effects vanish when subjects are engaged in a perceptual rather than a retrieval task. Furthermore, the SME analysis shows that pupil dilation is associated with subsequent memory performance only when subjects are actively engaged in retrieval (Goldinger and Papesh, 2012) or are viewing familiar stimuli. Transient pupil dilation has been tightly linked to stimulus salience and arousal, a mechanism that might temporarily increase visual sensitivity and is mainly controlled by the superior colliculus and the locus ceruleus (Wang and Munoz, 2015; Joshi et al., 2016). The superior colliculus receives modulatory input from the basal ganglia (Wang and Munoz, 2015), rendering it biologically plausible that effects on striatal and pallidal activity are reflected in pupil size. Salience or arousal does not, however, seem to suffice to explain the full pattern of results. Future research may therefore further address the task dependence of memory-related effects on pupil size.
Based on the assumptions of a hippocampal-VTA loop (Lisman and Grace, 2005), one could expect the striatum to relay combined information about stimulus novelty and task context to the SN/VTA and hippocampus to drive memory formation. However, the current results do not support this claim (i.e., no interaction between task and OLD/NEW status was observed in these areas). Instead, the hippocampus and higher visual areas solely signaled novelty (NEW > OLD) independent of context. This is consistent with previous reports and models of the hippocampus as a novelty detector (Knight, 1996; Tulving et al., 1996; Grunwald et al., 1998; Strange et al., 1999; Lisman and Otmakhova, 2001; Lisman and Grace, 2005; Yonelinas et al., 2005; Daselaar et al., 2006; Duncan et al., 2012; Ben-Yakov et al., 2014; Kafkas and Montaldi, 2014). Likewise, novelty effects in higher visual areas and posterior parahippocampal gyrus confirm previous findings (Li et al., 1993; Xiang and Brown, 1998; Köhler et al., 2002; Howard et al., 2011; Bunzeck et al., 2014; Kafkas and Montaldi, 2014) and might relate to a sharpening of stimulus representations with repetition (Desimone, 1996; Wiggs and Martin, 1998; Ranganath and Rainer, 2003). Interestingly, these neural novelty effects were accompanied by enhanced visual exploration as indexed by a greater number of different fixation locations for NEW stimuli, in line with previous studies (Smith et al., 2006; Smith and Squire, 2008; Bradley et al., 2011). In this regard, enhanced visual activity for NEW stimuli might also relate to a more diverse visual input generated by more shifts in fixation location.
The SN/VTA, in turn, responded preferentially to OLD stimuli and was more active during retrieval than encoding mode. This latter finding is particularly interesting in light of the effects on subsequent memory performance during phase 3. Specifically, subsequent memory did not resemble the effects of task-conditional stimulus salience in the striatum (i.e., there was no evidence for NEW and OLD items being better remembered when presented during ENC and RET, respectively) but instead was generally enhanced during retrieval mode, both for OLD and NEW images. Although most studies investigating the testing effect did not include novel stimuli during initial testing or a restudy control condition, one study did so and reported similar results (Jacoby et al., 2005), suggesting that not successful retrieval but retrieval mode per se boosts memory encoding (Rowland, 2014). Although such an interpretation (i.e., higher activity in the SN/VTA during retrieval drives memory performance) fits the role of dopamine in learning and plasticity, it remains unclear why the SN/VTA did not exhibit novelty signals in the encoding context. One apparent difference between this and previous work (Bunzeck and Düzel, 2006; Wittmann et al., 2007; Bunzeck et al., 2014) lies in the relatively short encoding blocks, which alternated with retrieval contexts. Therefore, future studies may explore whether task switching comes at the cost of changes in SN/VTA novelty signals.
Higher activity during retrieval mode was observed not only in the dopaminergic midbrain but also in striatum, thalamus, lateral and medial PFC, cingulate cortex, insula, visual cortex, and cerebellum. Higher activity during encoding mode, in turn, was found in a large set of brain areas that overlaps with the default mode network (Raichle et al., 2001; Raichle, 2015). This finding resonates well with previous studies investigating the neural basis of the testing effect using word pairs (van den Broek et al., 2013; Wing et al., 2013). As outlined above, test-enhanced learning (and activations in the RET > ENC contrast) may (partly) be explained by the inherent difficulty of retrieval tasks (Bjork, 1999; Rowland, 2014). Following this rationale, the network of brain areas differentiating between RET and ENC also resembled results from a study investigating the effects of task difficulty in a visual detection task (Boehler et al., 2011). Furthermore, reaction times were increased and accuracy was decreased during RET compared with ENC. Subjects also fixated on more distinct locations during RET than ENC, indicating more extensive sampling of information during decision formation. This change in viewing behavior may explain some of the widespread main effects of task (e.g., in visual areas). It did not, however, fully account for improved subsequent memory in phase 3, suggesting that effects on memory performance are not fully mediated by overt visual attention.
Conclusion
Encoding versus retrieval mode enhances the salience of new and old information, respectively. This context-conditional salience is reflected in striatal activity and pupil size. Contrary to predictions derived from theories on the role of the mesolimbic system in memory processes, this signal was not relayed to the dopaminergic midbrain to boost memory (re-)encoding. Instead, subsequent memory benefitted from retrieval versus encoding mode, an effect that may relate to higher task difficulty inherent to retrieval and was associated with increased activity in a large set of areas including but not limited to the striatum and SN/VTA.
Footnotes
This work was supported by Grants from the German Research Foundation (Deutsche Forschungsgemeinschaft, BU 2670/2-1) to N.B. We thank Andreas Sprenger for helpful discussions on eye tracking data acquisition and analyses.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Nico Bunzeck, Institute of Psychology I, University of Lübeck, Maria-Goeppert-Strasse 9a, 23562 Lübeck, Germany. nico.bunzeck{at}uni-luebeck.de