Abstract
Research in reversal learning has mainly focused on the functional role of dopamine and striatal structures in driving behavior on the basis of classic reinforcement learning mechanisms. However, recent evidence indicates that, beyond classic reinforcement learning adaptations, individuals may also learn the inherent task structure and anticipate the occurrence of reversals. A candidate structure to support such task representation is the hippocampus, which might create a flexible representation of the environment that can be adaptively applied to goal-directed behavior. To investigate the functional role of the hippocampus in the implementation of anticipatory strategies in reversal learning, we first studied, in 20 healthy individuals (11 women), whether the gray matter anatomy and volume of the hippocampus were related to anticipatory strategies in a reversal learning task. Second, we tested 20 refractory temporal lobe epileptic patients (11 women) with unilateral hippocampal sclerosis, who served as a hippocampal lesion model. Our results indicate that healthy participants were able to learn the task structure and use it to guide their behavior and optimize their performance. Participants' ability to adopt anticipatory strategies correlated with the gray matter volume of the hippocampus. In contrast, hippocampal patients were unable to grasp the higher-order structure of the task as successfully as controls. The present results indicate that the hippocampus is necessary to respond in an appropriately flexible manner to high-order environments, and disruptions in this structure can render behavior habitual and inflexible.
SIGNIFICANCE STATEMENT Understanding the neural substrates involved in reversal learning has provoked a great deal of interest in recent years. Studies with nonhuman primates have shown that, through repetition, individuals are able to anticipate the occurrence of reversals and, thus, adjust their behavior accordingly. The present investigation examines the role of the hippocampus in such strategies. Importantly, our findings show that the hippocampus is necessary to anticipate the occurrence of reversals, and disruptions in this structure can render behavior habitual and inflexible.
Introduction
To accomplish their goals, humans adapt their behavior on the basis of learning stimulus–reward contingencies. A powerful explanation for reward-guided learning is provided by reinforcement learning (RL) theories (e.g., Sutton and Barto, 1998; Daw et al., 2005). RL models achieve behavioral adaptation by reinforcing successful actions and devaluating erroneous or nonrewarded actions through the computation of prediction errors (Watkins and Dayan, 1992). However, the main limitation of these classic RL models is that they work without using any representation of the task structure, such as rules describing the relationship between actions, rewards, and external variables (e.g., time) (Sutton and Barto, 1998; Daw et al., 2011). Indeed, in real life, reward contingencies often change in a predictable manner. For instance, in natural settings, some types of food are only available in a particular period of the year. It would be advantageous to take this into account to anticipate the occurrence of such periods and then initiate and exploit foraging behavior only in this particular temporal context.
Previous studies using the reversal learning task have shown that humans (Hampton et al., 2006) and other animals (Huh et al., 2009; Costa et al., 2015; Jang et al., 2015) can learn such high-order structure and use this knowledge to improve their decision-making. During this type of task, participants are requested to learn specific stimulus–response associations to attain certain rewards (e.g., money). After a variable number of trials (e.g., 16–24), the stimulus–reward associative structure changes. Although participants cannot fully predict the occurrence of a reversal, there is an underlying structure that can be learned. Specifically, the rule never changes during the first trials after a reversal (in the example given, the first 15 trials), but the reversal likelihood increases after that. Studies with nonhuman primates have shown that, through repetition, individuals are able to identify these two temporal contexts, anticipate the occurrence of reversals and, thus, adjust their behavior accordingly (Costa et al., 2015; Jang et al., 2015). Importantly, this high-order structure of the environment is not captured by classic RL models and is acquired more slowly than stimulus–reward mappings, as it involves creating an internal model of the task structure (Costa et al., 2015).
One brain region that may be especially involved in learning such structures is the hippocampus. Existing evidence suggests that the hippocampus supports rapid encoding of multiple items, sequences, and events in relation to the temporal context (Allen et al., 2016) by creating a flexible representation of the environment (Howard and Kahana, 2002; Konkel and Cohen, 2009; Hsieh et al., 2014), which can be later used to guide performance (Eichenbaum, 2000; Squire et al., 2004; Aggleton et al., 2007). Thus, the hippocampus might be critical for maintaining the representations of the contingency structure of a task given a particular context, which can be then adaptively applied to anticipate potential changes in the environment (Bornstein and Daw, 2013).
To unravel the functional role of the hippocampus in the implementation of anticipatory strategies in reversal learning, we first studied whether the gray matter (GM) anatomy and volume of the hippocampus were related to participants' ability to implement strategies based on the predictability of reversals. Additionally, we included a group of refractory temporal lobe epileptic patients with unilateral hippocampal sclerosis (who served as a hippocampal lesion model), to test whether the use of these strategies was hippocampus-dependent. In that case, we hypothesized that lesions within the hippocampus should impair the use of strategies determined by the temporal context, leading behavior to be dependent on classic RL mechanisms.
Materials and Methods
Participants.
Initially, 20 healthy subjects were recruited for the first part of the study (11 women, 55 ± 51.0%; age: 42.1 ± 14.3 years; education: 11.6 ± 4.2 years). However, the final sample was reduced to 19 participants (10 women, 53 ± 51.3%; age: 40.7 ± 13.3 years; education: 11.8 ± 4.1 years). One subject was excluded because of poor performance (he never changed his choice during the reversal learning task). The temporal lobe epileptic group with unilateral sclerotic hippocampus (TLE-UHS) consisted of 20 patients (11 women, 55 ± 51.0%; age: 41.8 ± 12.3 years; education: 11.9 ± 4.0 years), 10 suffering from left and 10 from right medically refractory TLE. All patients were recruited after a presurgical evaluation at the Hospital of Bellvitge. Patient diagnoses were established according to clinical EEG and MRI data (Cendes, 2005; Malmgren and Thom, 2012; Tatum, 2012). All of the patients underwent a standardized neurological and neuropsychological examination, prolonged interictal and ictal video-EEG monitoring, and brain MRI assessed by an expert neurologist and a neuroradiologist. All patients were evaluated before and 3 months after an anterior temporal lobe resection for the relief of medically intractable TLE. The surgery, performed by the same neurosurgeon, consisted of en bloc resection of the mesiotemporal structures. The resection line extended 4.0–5.0 cm from the temporal pole in the nondominant hemisphere and 3.5–4.0 cm in the dominant hemisphere. Hippocampal sclerosis was confirmed in all patients with an anatomopathological study by the same pathologist. None of the patients suffered a seizure during the experimental task or 24 h before it, and all of the patients were on habitual antiepileptic drug regimens. 
In the current study, the chronic temporal lobe epileptic patients were matched in gender (χ2 (1, N = 39) = 0.02, p = 0.88), age (t(37) = −0.25, p = 0.81), and years of education (t(37) = −0.01, p = 0.99) with the healthy control group (demographic data are summarized in Tables 1, 2). The experiment was performed in accordance with the guidelines of the Declaration of Helsinki and was approved by the Ethical Committee of Bellvitge University Hospital, Spain. Written informed consent was obtained from all of the patients and controls.
Demographic data for temporal lobe epilepsy (left and right) patients and controls included in the study
Demographic data for controls and TLE+UHS patients included in the study
Procedure.
Regarding behavioral acquisition, an initial reversal learning session was performed for each group. The neuropsychological battery was administered only once for each group, 1 d before the initial reversal learning session. The MRI acquisition session was held before the reversal learning session. To test the effects of unilateral medial temporal lobe (MTL) removal on reversal learning, patients performed an additional session. This session (postsession) was undertaken a minimum of 6 months after the initial session, to avoid learning effects, and at least 3 months after the surgery.
Experimental design.
We used a modified version of the probabilistic reversal learning task of Jocham et al. (2009) (Fig. 1, task illustration). The experiment was divided into 63 blocks (pseudorandom order), with 16–24 trials each, which resulted in 1260 trials. In every trial, two gray squares, located in the middle of the screen on either side of a central fixation point, were presented over a black background for 1000 ms. All participants had to select one of the two gray squares by pressing the right or left button of the mouse. A feedback stimulus indicating a win (happy face) or loss (sad face) of 0.06€ appeared 700 ms after the response. It was presented in the middle of the screen for 800 ms. To avoid automatic responses, the intertrial interval was randomly set to 500–900 ms. If participants did not respond within the established 1000 ms after the presentation of the stimulus, a question mark appeared in the middle of the screen. Every 3 blocks, a self-paced resting break was given between the 7th and 10th trial of the block, so that breaks could not be used to guess the rule change. During these breaks, the amount of money accumulated to date was shown.
Schematic illustration of the task design. For each trial, participants had to choose one of the two squares placed on each side of the screen by pressing the corresponding mouse button. After a delay period of 700 ms, feedback was presented. An example of a sequence of trials in the probabilistic reversal learning task is given. The probability of obtaining positive or negative feedback on each side is stated for each sequence. Note the differences between the two negative feedbacks displayed. In the first, the participant responded correctly (choosing the most rewarding action) but got a spurious negative feedback. In contrast, in the second, the participant made an incorrect choice as the rule changed and then kept selecting the previous correct choice.
Participants were informed that the probability of winning (75% to win on one side, and 25% to win on the other) remained constant during some trials, but after an unspecified period, the probability reversed. On each trial, participants had to choose the stimulus leading to the reward outcome. Reward contingencies reversed (rule reversal) after a randomly jittered block length of 16–24 trials (the rule change always occurred between the 16th and 24th trial), and participants had to switch their selection to the new rewarded alternative. Responses to the side with the highest probability of winning were considered correct choices; likewise, responses to the side with the lowest probability of winning were considered failures. In the present study, the accuracy rate was calculated as the number of correct responses divided by the total number of responses. Because of the established probabilities, participants had to monitor the meaning of the different feedbacks and thus recognize when a correct choice was nonrewarded or when the negative feedback signaled the need for behavioral change because the rule had reversed (Fig. 1).
Importantly, participants were instructed to change the response pattern only when they were completely sure that the rule had changed. A brief training session was provided at the beginning of the encounter to ensure comprehension of the task. Participants were encouraged to win as much money as possible during the task.
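The contingency schedule and the accuracy measure described in this section can be illustrated with a minimal Python sketch. The function names, the 0/1 coding of the two sides, and the uniform sampling of block lengths are assumptions made for illustration only; the original task used a pseudorandom block order with a fixed total of 1260 trials, which this sketch does not reproduce.

```python
import random

def make_schedule(n_blocks=63, min_len=16, max_len=24, seed=0):
    """Return the better side (0 = left, 1 = right) for every trial."""
    rng = random.Random(seed)
    better = rng.randint(0, 1)
    schedule = []
    for _ in range(n_blocks):
        # Jittered block length; uniform sampling is an assumption.
        block_len = rng.randint(min_len, max_len)
        schedule.extend([better] * block_len)
        better = 1 - better  # rule reversal at the block boundary
    return schedule

def feedback(choice, better_side, rng, p_high=0.75):
    """Return True for a win: the better side wins with p_high, the other with 1 - p_high."""
    p_win = p_high if choice == better_side else 1.0 - p_high
    return rng.random() < p_win

def accuracy(choices, schedule):
    """Number of correct responses (better side chosen) divided by all responses."""
    correct = sum(c == b for c, b in zip(choices, schedule))
    return correct / len(choices)
```

With this coding, a negative feedback after choosing the better side corresponds to a spurious negative feedback, and a negative feedback after choosing the other side corresponds to an incorrect negative feedback.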
RL model.
A Q-learning model (Watkins and Dayan, 1992) was implemented. The model uses the reward prediction error to update the weights associated with each stimulus and probabilistically chooses the stimulus with the higher weight. After each outcome, the weight of the selected stimulus is updated using the following algorithm:
$$Q_{t+1}(a) = Q_t(a) + \alpha\,\delta_t, \qquad \delta_t = r_t - Q_t(a)$$
where α is the learning rate and δ represents the prediction error, calculated as the difference between the outcome ($r_t$) and the expectancy or weight of the selected figure ($Q_t(a)$). Softmax action selection was used to compute the probability of choosing one of the two options:
$$P_t(a) = \frac{e^{\gamma Q_t(a)}}{e^{\gamma Q_t(a)} + e^{\gamma Q_t(b)}}$$
where γ is the inverse of the temperature parameter and $b$ denotes the alternative option. γ determines the degree to which differences in reward values between two potential actions are translated into a more deterministic choice.
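The update and choice rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, outcomes are assumed to be coded as 1 (win) and 0 (loss), and both weights are assumed to start at 0.

```python
import math

def softmax_prob(q_chosen, q_other, gamma):
    """Probability of picking the option with weight q_chosen (two-option softmax)."""
    # Written as a logistic function of the weight difference, which is
    # algebraically identical to the two-option softmax.
    return 1.0 / (1.0 + math.exp(-gamma * (q_chosen - q_other)))

def q_update(q, choice, outcome, alpha):
    """Update the weight of the chosen option in place and return the prediction error."""
    delta = outcome - q[choice]   # prediction error: outcome minus expectancy
    q[choice] += alpha * delta    # only the selected option is updated
    return delta

# Usage: one rewarded choice of option 0, starting from assumed weights of 0.
q = [0.0, 0.0]
q_update(q, choice=0, outcome=1.0, alpha=0.5)
p_repeat = softmax_prob(q[0], q[1], gamma=2.0)
```

A larger γ makes `p_repeat` approach 1 for the same weight difference, which is the sense in which γ makes choices more deterministic.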
The model was fitted separately for each subject by maximizing the log likelihood estimate with the fmincon function of MATLAB R2008 (The MathWorks). The fit was run 10 times from random initial values, and the parameters α and γ with the best log likelihood estimate were selected. Once α and γ were individually calculated, model predictions could be determined on a trial-by-trial basis.
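The fitting logic can be illustrated with a short Python sketch. Note that it swaps in a plain grid search for the constrained optimizer with random restarts used in the study, so it only conveys the idea of selecting the α and γ with the best log likelihood. The function names, the grid values, the 0/1 coding of choices and outcomes, and the initial weights of 0 are assumptions.

```python
import math

def neg_log_likelihood(alpha, gamma, choices, outcomes):
    """Negative log likelihood of a choice sequence under Q-learning with softmax."""
    q = [0.0, 0.0]                       # assumed initial weights
    nll = 0.0
    for c, r in zip(choices, outcomes):
        # Probability the model assigns to the choice actually made.
        p = 1.0 / (1.0 + math.exp(-gamma * (q[c] - q[1 - c])))
        nll -= math.log(max(p, 1e-12))   # guard against log(0)
        q[c] += alpha * (r - q[c])       # prediction-error update
    return nll

def fit_grid(choices, outcomes, alphas, gammas):
    """Return (alpha, gamma, nll) with the lowest negative log likelihood on the grid."""
    best = None
    for a in alphas:
        for g in gammas:
            nll = neg_log_likelihood(a, g, choices, outcomes)
            if best is None or nll < best[2]:
                best = (a, g, nll)
    return best
```

Once the best-fitting pair is known, the same loop yields the trial-by-trial choice probabilities, which is how model predictions can be compared with observed behavior.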
MRI data acquisition.
Controls and TLE-UHS patients underwent MRI whole-brain structural scans, including T1 weighted and FLAIR acquisition protocols. The high-resolution T1 weighted image (slice thickness = 1 mm; no gap; number of slices = 240; TR = 2300 ms; TE = 3 ms; matrix = 256 × 256; FOV = 244 mm; voxel size = 1 × 1 × 1 mm) was acquired with a 3.0 tesla Siemens Trio MRI system at the Hospital Clinic of Barcelona. The FLAIR image (slice thickness = 5.2 mm; no gap; number of slices = 19; TR = 7295 ms; TE = 12 ms; matrix = 256 × 256; FOV = 230 mm; voxel size = 0.89 × 0.89 × 5.2 mm) was acquired on a 1.5 Philips Intera scan at the University Hospital of Bellvitge. Both structural scans were assessed by an expert neurologist who found no structural abnormalities in addition to unilateral hippocampal sclerosis in TLE-UHS patients.
Voxel-based morphometry analyses.
To evaluate the possible neural correlates between “anticipatory switching” and the hippocampus in healthy participants, regions of interest (ROIs) for the left and right hippocampus were defined based on the Automated Anatomical Labeling Atlas in MNI space using the WFU pickatlas tool (Tzourio-Mazoyer et al., 2002; Maldjian et al., 2003; Maldjian et al., 2004). Two extra ROIs covering the caudate and the ventral striatum were also created. These regions were selected given the strong connections between the striatum and the hippocampus, and their shared role in learning and memory (Lisman and Grace, 2005; Haber and Knutson, 2010). The caudate ROI was built using the WFU pickatlas toolbox and the Automated Anatomical Labeling Atlas, as with the hippocampal ROI. The ventral striatum ROI was created using the results from an independent monetary gambling task from a previous experiment, which was conceived to functionally localize this structure (contrast: gains > losses, thresholded at FWE-corrected p < 0.001) (Ripollés et al., 2014). These ROIs served as control regions to further assess the specificity of our findings (as in Ripollés et al., 2016). Voxel-based morphometry (VBM) within these ROIs (Ashburner and Friston, 2000) was performed using Statistical Parametric Mapping software (SPM8; Wellcome Department of Imaging Neuroscience, University College, London; www.fil.ion.ucl.ac.uk/spm). Specifically, New Segment (Ashburner and Friston, 2005) was applied to the structural T1 weighted images of each subject. The resulting GM tissue probability maps were imported and fed into Diffeomorphic Anatomical Registration using Exponentiated Lie algebra (DARTEL) (Ashburner, 2007) to achieve spatial normalization in MNI space (using “modulation” to compensate for the effect of spatial normalization). Normalized images were smoothed using an isotropic spatial filter (FWHM = 8 mm) to reduce residual interindividual variability.
The individual smoothed GM volume images were entered into a second-level analysis using a one-sample t test and were correlated with the behavioral measures of anticipatory switching (deviation from RL in “anticipatory switching,” see Behavioral results).
VBM was also used to calculate the differences in GM volume between patients and controls within the hippocampus. Patients' T1 weighted images (before surgery) underwent the same processing pipeline described above. Individual smoothed GM volume images for controls and left TLE-UHS patients (LTLE-UHS) were entered into a two-sample t test, and two contrasts of interest were calculated: Controls > LTLE-UHS and LTLE-UHS > Controls. The same analysis was repeated using the controls and the right TLE-UHS patients (RTLE-UHS).
Contrasts were thresholded at a p < 0.005 uncorrected threshold at the voxel level with a cluster extent of >50 contiguous voxels (Lieberman and Cunningham, 2009). Corrected peaks are reported using a p < 0.05 FWE small volume correction (SVC) threshold.
Volumetric analyses of the hippocampus.
Following previous studies showing that hippocampal volume is a good predictor of behavior in TLE patients (Fuentemilla et al., 2013) and in other types of patients with hippocampal damage (Horner et al., 2012), we used Freesurfer (http://surfer.nmr.mgh.harvard.edu, version 5.0.0) to compute the volume of the hippocampus in both patients and controls. Freesurfer volume estimates of the hippocampus have been previously validated both in controls (Morey et al., 2009) and in TLE patients with hippocampal sclerosis (Pardoe et al., 2009). The Freesurfer pipeline includes removal of nonbrain tissue (Ségonne et al., 2004), automated Talairach transformation, intensity normalization (Sled et al., 1998), segmentation of the subcortical white matter and deep GM (Fischl et al., 2002, 2004), tessellation of the GM/white matter boundary, automated topology correction (Fischl et al., 2001; Ségonne et al., 2007), and surface deformation to detect GM/white matter and GM/CSF boundaries (Fischl and Dale, 2000). A deformable procedure was applied to parcellate the cerebral cortex into different regions according to gyral and sulcal structure information (Desikan et al., 2006).
Volumes for the hippocampus in both hemispheres were extracted and corrected for total intracranial volume (TIV) (Horner et al., 2012; Fuentemilla et al., 2013). First, a linear regression with the extracted volumes as the dependent variable and TIV as the independent variable was performed. Using the unstandardized β weight of this regression, the volumes were corrected according to the equation Vic = Vi + (meanTIV − TIVi) × β, where Vic is the corrected volume for a particular structure and subject, Vi is the raw volume as calculated by Freesurfer, meanTIV is the mean TIV for a particular group, TIVi is the specific TIV of the subject, and β is the β weight estimated in the linear regression. For the subsequent analyses, we used the sum of the left and the right hippocampal corrected volumes, given that half of the patients have damage to one hemisphere specifically (similar to Fuentemilla et al., 2013). Hippocampal volumes were first compared between patients and controls using an independent-samples t test. Then, hippocampal volume was correlated with deviation from RL in “anticipatory switching” by using Pearson's r.
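The TIV correction can be illustrated with a minimal Python sketch of the equation Vic = Vi + (meanTIV − TIVi) × β. The function names are hypothetical, and the sketch assumes that the regression and the mean TIV are computed within a single group passed to the function; the text does not fully specify this, so it should be read as one possible interpretation.

```python
def ols_slope(x, y):
    """Unstandardized slope of a simple linear regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def correct_for_tiv(volumes, tivs):
    """Apply V_ic = V_i + (meanTIV - TIV_i) * beta to every subject of one group."""
    beta = ols_slope(tivs, volumes)       # volume is the dependent variable
    mean_tiv = sum(tivs) / len(tivs)
    return [v + (mean_tiv - t) * beta for v, t in zip(volumes, tivs)]

# Usage with made-up numbers (not data from the study): volumes that depend
# perfectly linearly on TIV collapse to a single corrected value.
corrected = correct_for_tiv([7.0, 7.5, 8.0], [1400.0, 1500.0, 1600.0])
```

After the correction, the corrected volumes no longer covary linearly with TIV, so remaining differences between subjects are not attributable to head size.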
Neuropsychological scoring.
Neuropsychological data for all participants were obtained only for the “pre” sessions (presession for the control group, and presurgery for the patient group). The neuropsychological data are summarized in Table 2. All participants (patients and controls) completed the Logical Memory I (immediate verbal memory) and Logical Memory II (delayed verbal memory), the Visual Reproduction I (immediate visual memory) and Visual Reproduction II (delayed visual memory), the Digits Span, and the Letters and Numbers subtests of the Wechsler Memory Scale III (Wechsler, 2004); the Vocabulary subtest of the Wechsler Adult Intelligence Scale (Wechsler, 1999); the Rey Auditory Verbal Learning Test (Rey, 1941; Schmidt, 1996); the Trail Making Test (Reitan, 1955; Davies, 1968); the Semantic Fluency and the Phonemic Fluency subtest of the Barcelona Test-R (Peña-Casanova, 2005); and the Rey-Osterrieth Complex Figure (copy and memory) (Rey, 1941; Osterrieth, 1944; Peña-Casanova, 2005). For each neuropsychological subtest, we used independent-sample t tests to elucidate the differences between groups of participants (Controls, TLE-UHS). The mean scores and statistical analyses of the neuropsychological test for patients and controls are summarized in Table 2. Even though most of the verbal and visuospatial functions were in some regards preserved before the surgery in the TLE-UHS group (e.g., correct performance on Rey Auditory Verbal Learning Test, in Logical Memory I and II, in Visual Reproduction I, and in copy Rey-Osterrieth Complex Figure), some effects were observed, as previously reported (Hermann et al., 1995, 1997; Helmstaedter, 2002; Pereira et al., 2010), in figural memory (low scores in Visual Reproduction II and in Rey-Osterrieth Complex Figure memory) and in verbal aspects (low scores on the Vocabulary subtest).
Statistical analysis.
An analysis comparing the behavioral data from the left and the right TLE-UHS patients was performed to control for any effect of the laterality of the lesion or surgery. As no differences were found in any behavioral measure regarding the side of the lesion, we decided to collapse the results of both the right and the left TLE-UHS patients. Therefore, in the subsequent analysis, patients are treated as a single TLE-UHS group. Statistical effects regarding comparison within groups were obtained using univariate or repeated-measures ANOVA with age as a covariate. The same procedure was used to study differences between groups, adding group as a between-subject factor. To obtain an “anticipatory switching” measure, we used a principal components analysis. Factor coefficient scores for individual subjects were estimated using the Anderson-Rubin method (Anderson and Rubin, 1956).
For all statistical effects involving two or more degrees of freedom in the numerator, the Greenhouse-Geisser ε was used to correct possible violations of the sphericity assumption. p values after this correction are reported.
Results
Behavioral analyses in healthy subjects
Healthy subjects presented, on average, an accuracy rate (number of correct responses divided by the total number of responses) of 73.0% (SD = 5.3%) and earned 18.6€ (SD = 5.1€). To examine whether participant performance was captured by classic RL assumptions, a standard RL model, following the Q-learning procedure of Watkins and Dayan (1992), was fitted to participants' behavioral performance (pseudo-R2 = 0.49, SD = 0.13). Participants had a mean learning rate of 0.71 (SD = 0.14) and a mean inverse temperature value of 2.12 (SD = 0.73).
In Figure 2A, we show the performance of healthy subjects across trials and the performance predicted by the RL model. Although the model predicts most of the participants' performance, during early trials participants show faster adaptation than what the model predicts. In particular, the predictions of the model differed from the performance observed in healthy subjects in the first trials of each block (from trial 4 to 9, all p values < 0.05, false discovery rate [FDR] corrected). A similar pattern of behavior has been previously reported in other studies (Mas-Herrero and Marco-Pallarés, 2014). These differences could be driven by participants' ability to predict the occurrence of a reversal, which in turn may allow them to adapt faster. If so, participants should be more likely to switch from the current correct response to the current incorrect response following a negative feedback by the end of the block, when reversals are more likely. In contrast, once in the current incorrect response, they should be less likely to switch to the current correct response following a negative feedback by the end of the block than at the beginning.
A, Learning curves during reversal learning. y-axis indicates the percentage of trials in which participants selected the rewarding choices. x-axis indicates the trial number. Blue lines indicate control participants' behavior. Gray-pointed line indicates the prediction of the model given participants' data. B, Mean percentage of switch following spurious negative feedback (negative feedback with current correct action) and incorrect negative feedback (negative feedback with current incorrect action). Blue bars represent control participants' behavior. Gray bars represent the prediction of the model given participants' data. C, Mean percentage of anticipatory switching following spurious negative and incorrect negative feedback across blocks. D, Correlation between deviation from RL model in “anticipatory switching” and “accuracy.”
Thus, we analyzed the probability of a behavioral switch following spurious negative feedback (negative feedback with current correct action) and incorrect negative feedback (negative feedback with current incorrect action) during the first trials (from 1 to 12) and the last trials of each block (from 13 to 24), where the occurrence of a reversal is more likely. As expected, individuals switched more following a spurious negative feedback at the end than at the beginning of the block (t(18) = −3.66, p = 0.002), whereas they stayed more in the current incorrect target following an incorrect negative feedback in the last trials of the block (t(18) = 2.59, p = 0.019). That is, participants were anticipating the occurrence of reversals, as shown by the increase in the probability of selecting the current incorrect target along the block, in line with the increasing probability of a reversal trial. Furthermore, we computed, for each individual, the mean difference in the percentage of switching between early and late trials of each block following spurious and incorrect negative feedback and compared the resulting values with those predicted by the model. The probabilities of being rewarded are the same during these trials, and thus the fact that the probability of choosing the current incorrect response following negative feedback increases across the block cannot be accounted for by a devaluation of the current correct action. Indeed, differences were found between the model's predictions and the participants' behavior following both spurious (t(18) = 4.57, p < 0.001) and incorrect (t(18) = −5.13, p < 0.001) negative feedback (Fig. 2B).
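The switching measure described above can be sketched as follows. This is an illustrative Python example rather than the analysis code used in the study: the trial representation (a list of dictionaries with hypothetical keys) and the function name are assumptions, while the split into early (1 to 12) and late (13 to 24) trials follows the text.

```python
def switch_rates(trials):
    """Proportion of switches after negative feedback, by feedback type and phase.

    trials: list of dicts in presentation order with keys 'pos' (1-based trial
    position within the block), 'choice' and 'better' (0/1), and 'win' (bool).
    Returns {(feedback_type, phase): proportion of switches on the next trial}.
    """
    counts = {}
    for cur, nxt in zip(trials, trials[1:]):
        if cur["win"]:
            continue                      # only negative feedback is analyzed
        # Spurious: the better side was chosen but the outcome was a loss.
        kind = "spurious" if cur["choice"] == cur["better"] else "incorrect"
        phase = "early" if cur["pos"] <= 12 else "late"
        n_switch, n_total = counts.get((kind, phase), (0, 0))
        switched = nxt["choice"] != cur["choice"]
        counts[(kind, phase)] = (n_switch + int(switched), n_total + 1)
    return {key: s / n for key, (s, n) in counts.items()}
```

The difference between the late and early rates for each feedback type gives the two per-subject values that are then compared with the same quantities computed from the model's simulated choices.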
We also examined to what extent this strategy was learned through the 63 blocks of the task. Figure 2C shows differences in switching between early and late trials of each block throughout the task for both kinds of feedback. As observed in the figure, difference in percentage switching between early and late trials increases through participants' experience following spurious (main effect of time, F(42,714) = 1.65, p = 0.007) and incorrect negative feedback (main effect of time, F(42,756) = 1.94, p < 0.00) but in opposite directions. Notably, difference in switching between early and late trials following spurious and incorrect negative feedback was not different from 0 during the first third of the task (t test against 0: p > 0.05, FDR corrected) but increased significantly during the second and final thirds (t test against 0: p < 0.05, FDR corrected). These results indicate that individuals learned to predict the occurrence of reversals through experience.
Notably, both measures (difference in percentage switching between early and late trials following spurious and incorrect negative feedback) were highly correlated (r(19) = −0.59, p = 0.008). This correlation further confirms that both patterns of behavior reflect the same strategy: anticipation of reversals. To have a single measure of this strategy, we performed a principal component analysis, including both measures. The resulting component was called “anticipatory switching.” Similarly, we performed a second principal component analysis with the predictions driven by the RL model. Differences between the two resulting factors were computed for each individual to assess the deviation from the RL predictions. If anticipating the occurrence of reversals, as measured by the “anticipatory switching” factor, is an optimal strategy, we hypothesized that individuals showing higher “anticipatory switching” than that predicted by the model would show better accuracy. To assess the contribution of anticipatory switching in accuracy enhancement, we correlated the deviation from RL in anticipatory switching with deviation from RL in accuracy (the difference between the observed accuracy and the one predicted by the model; Fig. 2D). The correlational analysis revealed a significant positive relationship between both measures (r(19) = 0.63, p = 0.004), indicating that this strategy was optimal to perform the task beyond classic trial-and-error adaptations.
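How two correlated measures collapse into a single component score can be illustrated with a short Python sketch. For two standardized variables, the first principal component weights both equally, with the sign of the second weight following the sign of their correlation. The sketch returns scores rescaled to unit variance; treating these as comparable to the Anderson-Rubin scores used in the study is an assumption, and the function names are hypothetical.

```python
import math
import statistics

def zscores(x):
    """Standardize a list of values using the sample standard deviation."""
    m, s = statistics.mean(x), statistics.stdev(x)
    return [(v - m) / s for v in x]

def first_component_scores(x, y):
    """Unit-variance scores on the first principal component of two variables."""
    zx, zy = zscores(x), zscores(y)
    n = len(x)
    r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)   # Pearson correlation
    sign = 1.0 if r >= 0 else -1.0                     # loadings are (1, sign) / sqrt(2)
    scale = math.sqrt(2.0 * (1.0 + abs(r)))            # SD of zx + sign * zy
    return [(a + sign * b) / scale for a, b in zip(zx, zy)]
```

Because the two switching measures were negatively correlated, the component in this sketch would contrast them (one minus the other). The overall sign of a principal component is arbitrary, so the direction of the resulting score has to be fixed by convention before computing the deviation from the model-based factor.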
Voxel-based morphometry analysis of the hippocampus and control regions in healthy subjects
Next, we used VBM to examine to what extent individual differences in GM volume in the hippocampus predicted “anticipatory switching.”
In healthy controls, deviation from RL in “anticipatory switching” positively correlated with GM volume in the hippocampus bilaterally, although only the peak at the left hippocampus survived correction (Fig. 3A; 232 voxels at the left hippocampus, main peak at x = −30, y = −3, z = −29, t(17) = 3.92, p < 0.05 FWE-corrected using SVC; 215 voxels at the right hippocampus, main peak at x = 30, y = −10, z = −21, t(17) = 3.33, p = 0.1 FWE-corrected using SVC). These results suggest that individuals with greater hippocampal GM volume showed greater anticipatory switching. As expected, no significant correlation was found for any voxel within the control ROIs for caudate and ventral striatum.
A, Correlates between GM volume in healthy participants and deviation from RL in “anticipatory switching.” Results for hippocampus (red-yellow) are shown at a p < 0.005 uncorrected threshold at the voxel level (main peaks in the left cluster survived a FWE-corrected SVC, p < 0.05 threshold). Neurological convention is used with MNI coordinates at the bottom right of each slice. Scatter plots represent the correlation between mean GM volume and the deviation from RL in “anticipatory switching.” B, Left, Difference between total hippocampal volume (mean ± SD in milliliters) between controls (blue) and TLE+UHS patients (black). Right, Significant correlations between total hippocampal volume (in milliliters) and the deviation from RL in “anticipatory switching” in patients (black) and controls (blue). *p = 0.002. L, Left hemisphere; R, right hemisphere.
Between-group VBM comparison of the hippocampus
To confirm the relationship between hippocampal volume and the capacity to anticipate predictable changes in the environment, we analyzed a group of 20 TLE patients with unilateral hippocampal sclerosis (10 suffering from left and 10 from right medically refractory TLE). To further confirm the hippocampal lesions in the TLE-UHS group, we calculated the differences in GM volume between patients and controls within these regions. As expected, the right TLE-UHS patients group showed lower GM volume in the right hippocampus (one cluster with 408 voxels; x = 32, y = −16, z = −17; t(28) = 3.57, p < 0.039 FWE-corrected using SVC; Fig. 4A) compared with healthy controls. Similarly, left TLE-UHS patients also showed lower GM volume in the left hippocampus compared with healthy participants (one cluster with 1621 voxels; x = −32, y = −15, z = −17; t(28) = 6.85, p < 0.001 FWE-corrected using SVC; Fig. 4B). These results further confirm that the TLE-UHS group presented less hippocampal GM volume than controls.
Increased GM volume in healthy participants compared with (A) right TLE+UHS and (B) left TLE+UHS patients. Results for hippocampus (red-yellow) are shown at an auxiliary p < 0.005 uncorrected threshold at the voxel level (main peaks in both clusters survived an FWE-corrected SVC, p < 0.05 threshold). Neurological convention is used with MNI coordinates at the bottom right of each slice. L, Left hemisphere; R, right hemisphere.
Behavioral comparisons between left and right TLE-UHS
No behavioral differences in any measure were found between left and right TLE-UHS. The two groups presented, on average, similar accuracy rates (F(1,17) = 1.50, p = 0.24) and accumulated similar amounts of money (F(1,17) = 2.54, p = 0.13). RL parameters, such as the learning rate (F(1,17) = 0.70, p = 0.41), the inverse temperature (F(1,17) = 0.89, p = 0.36), and the model fit (pseudo-R2: F(1,17) = 0.71, p = 0.41), were also similar for the two groups. Both groups also showed similar percentages of switching between early and late trials following both spurious (F(1,17) = 0.01, p = 0.92) and incorrect negative feedback (F(1,17) = 0.06, p = 0.81). Given the lack of behavioral differences between the left and right TLE-UHS groups, both were collapsed into one TLE group.
Behavioral comparisons between controls and TLE-UHS
As no behavioral differences in any measure were found between left and right TLE-UHS, all patients were included in a single TLE-UHS group, which showed a mean accuracy rate of 69.8% (SD = 5.2%) and mean earnings of 15.8€ (SD = 4.7€). Patients were less accurate than controls (F(1,36) = 4.46, p = 0.042) and consequently gained less money (F(1,36) = 4.34, p = 0.044).
The RL model was fitted to patients' behavioral performance (pseudo-R2 = 0.43, SD = 0.19), leading to a mean learning rate of 0.77 (SD = 0.20) and a mean inverse temperature value of 1.89 (SD = 0.98). None of these measures differed between control participants and patients (all p values > 0.20).
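For readers less familiar with how a learning rate, an inverse temperature, and a pseudo-R2 relate to one another, the following is a minimal illustrative sketch, not the fitting code used in this study. It assumes a two-option task, a standard prediction-error (Q-learning) update with action values initialized at zero, a softmax choice rule, and a pseudo-R2 defined relative to chance-level responding. The function names (`softmax_prob`, `log_likelihood`, `pseudo_r2`) and the outcome coding are hypothetical.

```python
import math


def softmax_prob(q_chosen, q_other, beta):
    """Probability of the chosen option under a two-option softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))


def log_likelihood(choices, rewards, alpha, beta):
    """Log-likelihood of a choice sequence under a simple Q-learning model.

    choices: list of 0/1, index of the option chosen on each trial
    rewards: list of outcomes, one per trial (coding is an assumption, e.g. +1 / -1)
    alpha:   learning rate
    beta:    inverse temperature
    """
    q = [0.0, 0.0]  # assumed initial action values
    ll = 0.0
    for c, r in zip(choices, rewards):
        ll += math.log(softmax_prob(q[c], q[1 - c], beta))
        q[c] += alpha * (r - q[c])  # prediction-error update of the chosen option
    return ll


def pseudo_r2(choices, rewards, alpha, beta):
    """Improvement in log-likelihood over chance (p = 0.5 on every trial)."""
    ll_model = log_likelihood(choices, rewards, alpha, beta)
    ll_chance = len(choices) * math.log(0.5)
    return 1.0 - ll_model / ll_chance
```

Under these assumptions, a pseudo-R2 of 0 means the model explains choices no better than chance, and values closer to 1 indicate a better fit. In practice, the learning rate and inverse temperature would be estimated per participant by maximizing `log_likelihood` with a numerical optimizer.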
In Figure 5A, we show the performance of TLE-UHS patients across trials and the performance predicted by the RL algorithm. In contrast to healthy subjects, no differences were found between the performance predicted by the model and the observed behavior in the TLE-UHS group (all p values > 0.15, FDR corrected). These results might suggest that patients from the TLE-UHS group were not able to take advantage of the predictability of the environment and could not anticipate potential reversals. This also means that they might be guiding their choices only according to standard RL principles.
A, Learning curves during reversal learning. y-axis indicates the percentage of trials in which participants selected the rewarding choices. x-axis indicates the trial number. Blue lines indicate control participants' behavior. Gray-pointed line indicates the prediction of the model given TLE+UHS patients' data. Black lines indicate TLE+UHS patients' behavior. B, Mean percentage of switching between early and late trials following spurious negative feedback (negative feedback with current correct action) and incorrect negative feedback (negative feedback with current incorrect action). Black bars represent TLE+UHS patients' behavior. Blue bars represent control participants' behavior. Gray bars represent the prediction of the model given participants' data. C, Mean percentage of anticipatory switching following spurious negative and incorrect negative feedback across blocks for TLE+UHS (solid black line) patients and controls (solid blue line). D, Mean percentage of accuracy for the first third (1–21) and the last third of blocks of the task (43–63), for control participants (solid blue line) and TLE-UHS patients (solid black line).
We then analyzed the pattern of switching developed by the TLE-UHS group following spurious and incorrect negative feedback. Similar to controls, although less pronounced, the switching pattern of the TLE-UHS group between early and late trials of each block following either spurious (t(19) = 2.35, p = 0.03) or incorrect negative feedback (t(19) = −2.59, p = 0.02) also differed from the predictions of the RL model. Compared with the model, TLE-UHS patients switched more following a spurious negative feedback and stayed more following an incorrect negative feedback by the end of each block, when reversals are more likely (Fig. 5B). However, controls developed greater “anticipatory switching” than patients following spurious (F(1,36) = 4.42, p = 0.04) and incorrect negative feedback (F(1,36) = 5.40, p = 0.027). In addition, the “anticipatory switching” strategy of the TLE-UHS group (Fig. 5C) remained stable across blocks following both spurious (time effect: F(42,756) = 0.87, p = 0.71) and incorrect negative feedback (time effect: F(42,714) = 1.04, p = 0.41), in contrast to the control group (interaction group × time following spurious: F(42,1512) = 1.4, p = 0.048; and incorrect negative feedback: F(42,1386) = 1.88, p = 0.001). These results further suggest that patients were not able to learn the high-order structure of the task through experience with the same success as healthy participants. Based on this, we also hypothesized that, whereas controls' performance should improve through experience, patients' performance should remain stable across the task. To test this, we compared the accuracy of both groups during the first and last third of the task. As observed in Figure 5D, there was a clear interaction between time and group (F(1,36) = 6.37, p = 0.016). The performance of controls improved between the first and the last third (F(1,5) = 78.9, p < 0.001), something that did not occur in patients (F(1,4) = 0.08, p = 0.79).
Most importantly, controls and patients did not differ in their performance during the first third of the task (F(1,36) = 0.3, p = 0.59), whereas controls surpassed patients in the last third (F(1,36) = 18.2, p < 0.001). Finally, to study whether these changes in accuracy were driven by changes in learning rate across blocks rather than by the anticipation of reversals, we fitted the model separately for the first and the second half of the task. The resulting learning rate and inverse temperature parameters did not differ between the first and second half of the task, in either controls or patients (all p values > 0.15). No interaction or group effect was found either (all p values > 0.25). This lack of effect indicates that the differences in accuracy and anticipatory switching found across blocks and between groups cannot be explained by changes in learning rate.
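The early-versus-late switching comparison reported above can be illustrated with a short sketch. This is not the analysis code used in the study: the definition of a switch (choosing a different option on the trial after negative feedback), the within-block cut point `split`, and the names `switch_rate` and `late_minus_early` are all assumptions made for illustration. The caller is assumed to mark, in `neg_mask`, only the negative-feedback trials of the type of interest (spurious or incorrect).

```python
def switch_rate(choices, neg_mask, positions, lo, hi):
    """Percentage of marked negative-feedback trials, at within-block
    positions in [lo, hi), that are followed by a change of choice."""
    n_neg = 0
    n_switch = 0
    for t in range(len(choices) - 1):  # the last trial has no following choice
        if neg_mask[t] and lo <= positions[t] < hi:
            n_neg += 1
            if choices[t + 1] != choices[t]:
                n_switch += 1
    return 100.0 * n_switch / n_neg if n_neg else float("nan")


def late_minus_early(choices, neg_mask, positions, split):
    """Late-trial switch rate minus early-trial switch rate (hypothetical index)."""
    early = switch_rate(choices, neg_mask, positions, 0, split)
    late = switch_rate(choices, neg_mask, positions, split, float("inf"))
    return late - early


# Toy example: six trials of one block, negative feedback on trials 0, 2 and 4.
choices = [0, 0, 0, 1, 1, 0]
neg_mask = [True, False, True, False, True, False]
positions = [0, 1, 2, 3, 4, 5]
index = late_minus_early(choices, neg_mask, positions, split=2)
```

Applying the same function to choices simulated from the fitted RL model and subtracting the result from the observed index would give one possible measure of deviation from RL; for simplicity, this sketch does not handle block boundaries.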
Volumetric analyses of the hippocampus
To confirm the relationship between the use of anticipatory strategies and intact hippocampal volume in the TLE-UHS group (as measured with Freesurfer; for a similar procedure, see Horner et al., 2012; Fuentemilla et al., 2013), we correlated the total hippocampal volume with the deviation from RL in “anticipatory switching” in both groups. First, as expected, patients showed significantly lower total hippocampal volume than controls (7.68 ± 0.97 ml and 8.72 ± 0.93 ml for patients and controls, respectively; t(37) = 3.37, p < 0.002; Fig. 3B). Crucially for our hypothesis, hippocampal volume predicted anticipatory switching in both patients (r(20) = 0.60, p < 0.005) and controls (r(19) = 0.47, p < 0.05; Fig. 3B; note that the regression line is similar in both groups).
Behavioral comparisons between preoperatory and postoperatory sessions
The TLE-UHS group was also tested 3 months after an anterior temporal lobe resection. Patients showed similar performance after the surgery. They presented, on average, similar accuracy rates (F(1,18) = 0.42, p = 0.53) and accumulated similar amounts of money (F(1,18) = 0.76, p = 0.39) as during the first session. The values of the parameters derived from the RL model were also similar between sessions (learning rate: F(1,18) = 0.001, p = 0.98; inverse temperature value: F(1,18) = 0.15, p = 0.70; and model fit: F(1,18) = 0.008, p = 0.93). In both sessions, the TLE-UHS group presented similar behavioral switching patterns between early and late trials following spurious negative (t(19) = 1.20, p = 0.25) and incorrect negative feedback (t(19) = 1.08, p = 0.30). The lack of differences in any of the aforementioned measures suggests that the damage induced by the surgery does not further affect the already diminished capacity of TLE-UHS patients to learn the high-order structure of the reversal learning task.
Discussion
In the present study, we investigated the role of the hippocampus in subserving anticipatory strategies, which rely on the correct identification of differential temporal contexts in reversal learning. To test this hypothesis, a probabilistic reversal learning task was administered to healthy participants and to a group of refractory temporal lobe epileptic patients with unilateral hippocampal sclerosis (who served as a hippocampal lesion model). Our results provide evidence in favor of a key role for the hippocampus in the creation of an internal model of the task structure and in the implementation of optimal strategies given this knowledge.
It has been proposed that two control systems might be used in reversal learning. The first involves learning which actions are optimal through direct evaluation of their rewarding consequences using prediction error signals (Sutton and Barto, 1998). The second, which arises later in time after some training, represents the subject's belief that reversals or changes in reward contingency can occur. Thus, individuals generate a likelihood estimate that a reversal can occur, which in turn allows them to anticipate and rapidly adapt to the rule change (Costa et al., 2015; Jang et al., 2015). This idea has been supported by work using the reversal learning task in extensively trained nonhuman primates. Using a Bayesian model, Costa et al. (2015) showed that animals were able to adjust their behavior according to the structure of the task and that they also tended to switch before the reversal trial, especially in more uncertain conditions. The authors proposed that this behavior could be a combination of priors and evidence obtained during task performance. In agreement with this line of reasoning, we have shown that healthy controls went beyond the behavioral adaptations predicted by model-free learning models during the reversal learning task. Notably, our results indicate that participants could learn the task's structure and could anticipate the moment in time at which the rule was most likely to change, thus using this information to guide their choices (Fig. 2B,C). These findings are also consistent with a previous study that showed how healthy individuals may take advantage of the underlying structure of the reversal learning task to guide their choices, rather than only relying on the individual reward history of each action as captured by classic RL algorithms (Hampton et al., 2006). Additionally, we observed that participants with more anticipatory switching showed better accuracy during the task.
These results converge with previous evidence suggesting that, in volatile environments, learning the sequential contingencies of events and actions (learning the task structure), and using them to guide decisions, improves individual performance (Acuña and Schrater, 2010) compared with standard RL strategies.
Importantly, in another study using the same Bayesian approach, Jang et al. (2015) demonstrated that nonhuman primates with lesions in the hippocampus (as well as orbitofrontal cortex, amygdala, and rhinal cortices) showed an impairment in a reversal learning task, which was explained by a reduction of the initial values of prior beliefs in reversals, rather than by an impairment in trial-and-error adaptations. Accordingly, early animal studies already indicated a relevant participation of inferior temporal lobe and MTL regions (rhinal, parahippocampal, and hippocampal) in the modulation of reversal learning (Murray et al., 1998; Bussey et al., 2002; Browning et al., 2007; Morellini et al., 2010). In particular, previous findings indicate that lesions in the inferior temporal lobe and hippocampal regions disrupted reversal learning in monkeys (Murray et al., 1998; Bussey et al., 2002; Browning et al., 2007) and rats (for a review, see Whishaw, 1998). Conversely, rats with a higher number of neurons in the dentate gyrus of the hippocampus showed faster adaptation to reversal learning (Morellini et al., 2010). In this vein, here we show a clear association between GM volume in the hippocampus and the use of anticipatory strategies relying on the temporal context, pointing to the key role of this region in goal-directed behaviors (Fig. 3A,B). Additionally, to further confirm the role of the hippocampus, we tested a group of refractory temporal lobe epileptic patients with unilateral hippocampal sclerosis, who served as a hippocampal lesion model. First and in convergence with the VBM analysis, volumetric measures of the hippocampus predicted anticipatory strategies in both patients and controls (Fig. 3B; as expected, patients had less hippocampal volume than healthy individuals; Fig. 4).
In addition, patients with hippocampal sclerosis were not able to anticipate the occurrence of the reversals in the same manner as controls, suggesting impairment in learning the structure of the task (Fig. 5). Indeed, patients' performance across trials fit the predictions generated by a model-free RL model, even at the beginning of the task when control behavior clearly differed from that of the model. These findings indicate that the behavior of TLE-UHS patients was not affected by the temporal context, relying mainly on RL strategies to complete the task. As expected, such a behavioral pattern was less optimal (Acuña and Schrater, 2010), which was reflected by the fact that patients had lower accuracy rates and accumulated less money compared with the control group. Previous studies have already shown that MTL damage does not have a significant impact on individuals' abilities to learn reward contingencies, but rather on their ability to generalize learned information. This deficit prevents patients from using previously learned knowledge when facing novel contexts with overlapping features of previous experiences (Meyers et al., 2003). We extend these results by showing that MTL damage also alters strategies that rely on the processing of contextual information.
Furthermore, converging evidence indicates that the hippocampus and surrounding medial temporal cortices subserve successful encoding (Davachi et al., 2003; Ranganath et al., 2004; Staresina and Davachi, 2006), retrieval (Cansino et al., 2002; Hannula and Ranganath, 2008; Diana et al., 2013), and representation (Bonnici et al., 2012) of contextual information, including temporal context (Jenkins and Ranganath, 2010; Ekstrom et al., 2011; Tubridy and Davachi, 2011). Such capacity might be related to the ability to distinguish between temporally distinct events that have overlapping elements (Hsieh et al., 2014). The capacity to use the representation of contextual information may be crucial to adaptively guide future actions and decisions in temporally structured sequences. Indeed, studies in rodents have shown that damage to the hippocampus results in impairments in the anticipatory selection of maze arms experienced in sequence (DeCoteau and Kesner, 2000). Hippocampal amnesic patients are impaired in tasks involving imagination of contextual information and representation of future events (Hassabis et al., 2007; Addis et al., 2009; Zeidman and Maguire, 2016). Similarly, the TLE-UHS group in the current study was not able to anticipate the occurrence of a reversal, leading to lower performance than the control group. In humans, there is no previous specific evidence regarding the role of the hippocampus in reversal learning. Here, we extend all these findings by showing that changes in reversal learning performance may be driven by the role of the hippocampus in processing and representing contextual information to be used later for goal-directed behaviors.
Finally, the TLE-UHS group did not show differences before and after the resection of the hippocampus in their ability to solve the reversal learning task. We hypothesize that the damage present in the hippocampus before the surgery (reduced GM volume shown by patients in Figs. 3B, 4) was already impairing patients' ability to use high-order strategies. Thus, the removal of part of the hippocampus might not further hinder their already disrupted system. Additionally, the fact that the surgery does not further disrupt their performance in reversal learning may contribute to improved clinical management.
Footnotes
This work was supported by Spanish Government Grants PSI2011-29219 and PSI2015-69178-P to A.R.-F. and Catalan Government (Generalitat de Catalunya) Grant 2009 SGR 93. A.V.-B. was supported by Predoctoral IDIBELL Grant 06/IDB-001. E.M.-H. was supported by FPI Program BES-2010-032702 and Montreal Neurological Institute Jeanne Timmins Costello Fellowship. P.R. was supported by FPU program AP2010-4179. We thank all the patients and control participants for their great collaboration in the present project; and A. Suades for help with data collection.
The authors declare no competing financial interests.
- Correspondence should be addressed to either Dr. Adrià Vilà-Balló or Dr. Antoni Rodríguez-Fornells, Cognition and Brain Plasticity Unit (Campus Bellvitge), Department of Basic Psychology, University of Barcelona, L'Hospitalet de Llobregat, Feixa Llarga, Barcelona 08907, Spain. adriavilaballo{at}gmail.com or arfornells{at}gmail.com