Abstract
Learning to predict upcoming outcomes based on environmental cues is essential for adaptive behavior. In monkeys, midbrain dopaminergic neurons code two statistical properties of reward: a prediction error at the outcome and uncertainty during the delay period between cues and outcomes. Although the hippocampus is sensitive to reward processing, and hippocampal–midbrain functional interactions are well documented, it is unknown whether the hippocampus also codes the statistical properties of reward information. To address this question, we recorded local field potentials from intracranial electrodes in the human hippocampus while subjects learned to associate cues of slot machines with various monetary reward probabilities (P). We found that the amplitudes of negative event-related potentials covaried with uncertainty at the outcome, being maximal for P = 0.5 and minimal for P = 0 and P = 1, regardless of whether the trial was rewarded. These results show that the hippocampus computes an uncertainty signal that may constitute a fundamental mechanism underlying the role of this brain region in a number of functions, including attention-based learning, associative learning, probabilistic classification, and binding of stimulus elements.
Introduction
The ability to make predictions about potentially rewarding situations has been the focus of conditioning theories explaining how animals learn the predictive relationships between conditioned stimuli (CSs) and reinforcers. Most of these theories propose that learning emerges through the computation of a prediction error between predicted and actual rewards (Rescorla and Wagner, 1972). Other theories propose that learning is driven by attention to stimuli: the association between a CS and its outcome is enhanced when the prediction associated with this stimulus is uncertain, whereas a stimulus loses its association with a reinforcer when its consequences are accurately predicted (Pearce and Hall, 1980; Yu and Dayan, 2003). Dopamine is closely associated with reward processing (Schultz, 2007). In monkeys, midbrain dopaminergic neurons exhibit a phasic reward prediction error signal that varies monotonically with reward probability (P) at the time of the outcome, and a sustained reward uncertainty signal that appears between the cue and the outcome and follows an inverted U-shaped relationship with reward probability, peaking at maximal reward uncertainty (P = 0.5) (Fiorillo et al., 2003).
Midbrain dopamine neurons broadcast reward-related signals to the ventral striatum and the orbitofrontal cortex. Although the functions of these structures in reward processing have been extensively investigated, the role of the hippocampus in this domain has received little attention. However, a growing body of experimental data supports the existence of a functional loop between the ventral tegmental area (VTA) and the hippocampus (Thierry et al., 2000; Floresco et al., 2001, 2003; Lisman and Grace, 2005). In rodents, the novelty-induced activation of the VTA depends on the activation of hippocampal neurons (Legault and Wise, 2001), possibly via the nucleus accumbens–ventral pallidum–VTA pathway (Floresco et al., 2003; Lodge and Grace, 2006), and dopamine release in the hippocampus and prefrontal cortex enhances synaptic plasticity and learning in these regions (Frey et al., 1990). Moreover, several studies have linked the hippocampus and the VTA both in schizophrenia and in rodent models of this disease (Laruelle and Innis, 1996; Lipska et al., 2003; Harrison, 2004; Lodge and Grace, 2007). In humans, functional magnetic resonance imaging (fMRI) studies have shown that the midbrain and hippocampus are coactivated during reward-motivated memory formation (Adcock et al., 2006), and that prefrontal–hippocampal functional coupling during memory processing is strongly modulated by the catechol-O-methyltransferase Val158Met polymorphism (Bertolino et al., 2006). Reward also modulates hippocampal activity in rodents (Hölscher et al., 2003) and monkeys (Watanabe and Niki, 1985; Rolls and Xiang, 2005).
The hippocampus may also receive reward-related information from the amygdala and orbitofrontal cortex, which project to it and to the entorhinal/perirhinal cortex (Van Hoesen et al., 1975; Amaral and Cowan, 1980; Suzuki and Amaral, 1994). Together, these data suggest that reward-related information may reach the human hippocampus via several pathways.
Yet, it is still unknown whether the hippocampus codes a prediction error and/or uncertainty during learning of probabilistic cue–reward associations. To address these questions, we recorded hippocampal activity in epileptic patients implanted with depth electrodes while they learned to associate cues (images of different slot machines) with distinct probabilities of monetary reward.
Materials and Methods
Subjects.
Three male volunteers (ages 20, 40, and 27) suffering from drug-refractory partial epilepsy performed the experiment. They were stereotaxically implanted with depth electrodes as part of a presurgical evaluation. All subjects were fully informed about the brain recordings for the present study and gave their informed consent. The procedure did not entail any additional risk for the subjects and was thus ethically acceptable according to French regulations. The target structures implanted with depth electrodes to identify potential epileptogenic foci before possible functional surgery were defined on the basis of noninvasive video-scalp EEG recordings, structural MRI, [18F]fluorodeoxyglucose (18FDG) positron emission tomography (PET), and ictal single photon emission computed tomography (SPECT) [for a complete description of the rationale of electrode implantation, see the study by Isnard et al. (2004)]. Structural MRI and 18FDG PET scans showed no hippocampal atrophy or hypometabolism in any of the three subjects. Subject 1 suffered from right temporal lobe epilepsy, and subjects 2 and 3 suffered from left temporal lobe epilepsy. The hippocampus was among the explored sites. Subject 1 had a unilateral implantation in the right hippocampus, subject 2 had a unilateral implantation in the left hippocampus, and subject 3 had bilateral hippocampal implantations. In subject 3, intracranial EEG recordings showed permanent paroxysmal activity in the inner part of the left temporal lobe, suggesting a focal dysplasia of the left hippocampus. Consequently, the recordings from this subject's left hippocampus were discarded from our study, and only the activity from his right hippocampus was analyzed. In all subjects, EEG recordings from the epileptic temporal lobe showed that the hippocampus participated in seizure propagation but was not part of the primary epileptogenic zone.
The epileptogenic trigger zones were located in the right superior parietal lobule in subject 1, in the external part of the left temporobasal neocortex in subject 2, and in the left amygdala in subject 3. Subject 1 is awaiting surgery. Subjects 2 and 3 were cured by cortectomy sparing the hippocampus and are now seizure free.
Stereotaxic implantation and electrode location.
Recording electrodes were 0.8-mm-diameter multicontact cylinders (DIXI Medical). They were implanted into the brain perpendicular to the midsagittal plane, according to Talairach and Bancaud's stereotaxic technique (Talairach and Bancaud, 1973), as previously done by our group (Krolak-Salmon et al., 2004). Contacts (5–15 per electrode) were 2 mm long and spaced 1.5 mm apart. Electrode locations were measured from x-ray images obtained on a stereotaxic frame and registered on the corresponding structural magnetic resonance images using a custom-designed Matlab program (MathWorks).
Behavioral task.
The experimental paradigm was implemented with the software Presentation (version 9, Neurobehavioral Systems). Subjects were presented with eight runs of five blocks with the same elementary structure. In each block, a single slot machine was presented on a computer screen for 20 consecutive trials. Each slot machine was made visually unique by displaying a particular fractal image on top of it.
In each run, five types of slot machines were presented in random order and, unbeknownst to the subjects, assigned five reward probabilities [P = 0 (P0), P0.25, P0.5, P0.75, and P1]. A total of 8 × 5 = 40 different slot machines were presented in eight runs. Rewarded and unrewarded trials were pseudorandomized (Fig. 1).
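The block structure can be illustrated with a short Python sketch. It assumes, hypothetically, that each block contains exactly round(P × 20) rewarded trials in shuffled order; the text states only that rewarded and unrewarded trials were pseudorandomized, so the constraint actually used may differ. The function names are illustrative.

```python
import random

# Reward probabilities of the five slot machines in each run.
PROBABILITIES = [0.0, 0.25, 0.5, 0.75, 1.0]
TRIALS_PER_BLOCK = 20


def make_block_schedule(p, n_trials=TRIALS_PER_BLOCK, rng=random):
    """Return a shuffled list of outcomes (True = rewarded) for one block.

    Hypothetical constraint: exactly round(p * n_trials) trials are rewarded.
    """
    n_rewarded = round(p * n_trials)
    outcomes = [True] * n_rewarded + [False] * (n_trials - n_rewarded)
    rng.shuffle(outcomes)
    return outcomes


def make_run_schedule(rng=random):
    """One run: five blocks, one per probability, in random order."""
    order = PROBABILITIES[:]
    rng.shuffle(order)
    return [(p, make_block_schedule(p, rng=rng)) for p in order]


if __name__ == "__main__":
    rng = random.Random(0)                               # fixed seed
    runs = [make_run_schedule(rng) for _ in range(8)]    # 8 runs x 5 blocks
    for p, block in runs[0]:
        print(f"P = {p:.2f}: {sum(block)} of {len(block)} trials rewarded")
```

Under this assumption, the P0.25, P0.5, and P0.75 machines deliver 5, 10, and 15 rewards per block, and only those three machines produce both outcome types.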
The subjects' task was to estimate, on each trial, the reward probability of the slot machine presented, based on all of that machine's previous outcomes up to that trial (i.e., an estimate of the cumulative probability since the first trial). The task was not to predict whether the slot machine would deliver a reward on the current trial. To perform the task, subjects pressed one of two response buttons: one indicating that, overall, the slot machine had a high winning probability and the other indicating that, overall, the slot machine had a low winning probability. Subjects were told that their current estimate had no influence on subsequent reward occurrence. During the task, subjects received no feedback on whether their estimate of the winning probability was correct. Finally, at the end of each block of 20 successive presentations of a single type of slot machine, they were asked to classify this slot machine on a scale from 0 to 4 according to their global estimate of reward delivery.
Recordings and signal averaging.
The experiment started at least 8 d after electrode implantation. At that time, anticonvulsive drug treatment had been drastically reduced for at least 1 week to allow recording of spontaneous epileptic seizures during continuous video-scalp EEG recordings performed in specially equipped rooms. The three subjects were under the following antiepileptic therapies: subject 1, lamotrigine (300 mg/24 h) and topiramate (100 mg/24 h); subject 2, carbamazepine (1400 mg/24 h) and clobazam (10 mg/24 h); and subject 3, oxcarbazepine (1200 mg/24 h), valproate (1000 mg/24 h), and alprazolam (0.75 mg/24 h). The experiment took place 48, 96, and 12 h after the occurrence of a seizure for subjects 1, 2, and 3, respectively. Continuous depth EEGs were recorded on a 128-channel device (Brain Quick System Plus; Micromed), amplified, filtered (0.1–200 Hz bandwidth), sampled at 512 Hz, and stored together with digital markers of specific task events for subsequent off-line analysis. These markers included five markers at the cue [appearance of the slot machine (S1)], one for each of the five reward probabilities (P0, P0.25, P0.5, P0.75, and P1), and eight markers at the outcome [when the third spinner stopped spinning (S2)], the event that fully informed the subject of subsequent reward or no reward delivery. The eight outcome markers corresponded to the eight possible outcomes (three slot machines with either rewarded or unrewarded trials, one with only rewarded trials, and one with only unrewarded trials). The intrahippocampal EEG was referenced to another electrode contact located outside the brain, near the skull. In subjects 1 and 2, this reference was the most superficial contact (outside brain tissue) of the hippocampal electrode, and in subject 3 it was located on another electrode on the side contralateral to the recording electrode. EEG was low-pass filtered (30 Hz) and visually inspected.
Trials showing epileptic spikes and artifacts were discarded. Signals were processed with the software package for electrophysiological analyses (ELAN-Pack) developed at the Inserm U821 laboratory (Lyon, France; http://u821.lyon.inserm.fr). Averaging and analysis of the EEG were performed on 3500 ms epochs (from −1500 to +2000 ms relative to the markers placed at the cue and at the outcome), with baseline correction over the −1500 to 0 ms interval preceding these markers. We chose this long baseline because no hippocampal activity linked to the rotation of the spinners emerged during the delay period, when the spinners rolled around, and a baseline of this length is long enough to average out electrical noise.
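The preprocessing steps described above (30 Hz low-pass filtering of signals sampled at 512 Hz, −1500 to +2000 ms epochs, baseline correction over −1500 to 0 ms) can be sketched in Python with NumPy and SciPy. The filter family and order (a zero-phase, fourth-order Butterworth built with scipy.signal.butter and applied with scipy.signal.filtfilt) are assumptions, as is the synthetic data in the usage example; the analyses reported here were carried out with ELAN-Pack, not with this code.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 512.0      # sampling rate (Hz), as reported
CUTOFF = 30.0   # low-pass cutoff (Hz), as reported


def lowpass(x, fs=FS, cutoff=CUTOFF, order=4):
    """Zero-phase low-pass filter along the last axis.

    Assumption: a 4th-order Butterworth run forward and backward;
    the text only specifies the 30 Hz cutoff.
    """
    b, a = butter(order, cutoff, btype="low", fs=fs)
    return filtfilt(b, a, x, axis=-1)


def epoch_and_baseline(signal, marker_samples, fs=FS, tmin=-1.5, tmax=2.0):
    """Cut -1500..+2000 ms epochs around markers and subtract the baseline.

    signal: 1-D array from one contact; marker_samples: sample indices of
    S1 or S2 (assumed to lie at least 1.5 s after the start and 2 s before
    the end of the recording). Baseline = mean over -1500..0 ms.
    Returns (epochs, times), epochs of shape (n_markers, n_samples).
    """
    start = int(round(tmin * fs))        # -768 samples at 512 Hz
    stop = int(round(tmax * fs))         # +1024 samples at 512 Hz
    times = np.arange(start, stop) / fs  # seconds relative to the marker
    epochs = np.stack([signal[m + start:m + stop] for m in marker_samples])
    in_baseline = times < 0.0
    epochs = epochs - epochs[:, in_baseline].mean(axis=1, keepdims=True)
    return epochs, times


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(0.0, 10.0, 60 * int(FS))   # 60 s of synthetic "EEG"
    markers = [5000, 12000, 20000]              # hypothetical S2 samples
    epochs, times = epoch_and_baseline(lowpass(raw), markers)
    erp = epochs.mean(axis=0)                   # average over trials
    print(epochs.shape, erp.shape)
```

Filtering the continuous signal before epoching, as in this sketch, avoids filter edge effects at the epoch boundaries.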
Behavioral data analysis.
The percentages of correct estimations of the high/low probability of winning for each slot machine were analyzed as a function of trial rank (1–20), averaged over subjects and runs. For the slot machines with low reward probabilities (P0 and P0.25), estimations were defined as correct if subjects identified them as “low winning”; for the slot machines with high reward probabilities (P0.75 and P1), estimations were defined as correct if subjects identified them as “high winning.” The slot machine with a reward probability of P0.5 had neither a “low” nor a “high” winning probability. Because the choice was binary, the correct estimate for this slot machine corresponded to 50% “high” (or, symmetrically, 50% “low”) responses.
For the probabilities P0, P0.25, P0.75, and P1, the trial rank at which learning occurred was defined as the first trial rank with at least 70% correct responses after which the percentage of correct estimations did not fall below this limit for the remaining trials. For the probability P0.5, it was defined as the trial rank at which ∼50% of the responses indicated a “high” (or “low”) winning probability, with responses then oscillating around this value for the remaining trials. In addition, subjects' trial-by-trial estimations during the 20 successive presentations of a single type of slot machine within runs were compared with their classifications made at the end of each block.
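The learning criteria above can be written as a small Python sketch operating on per-trial percentages (trials 1–20). The ±10 percentage-point band used to operationalize “oscillating around 50%” for P0.5 is a hypothetical choice, since the text gives no tolerance; the function names are illustrative.

```python
def first_stable_trial(values, criterion):
    """Return the 1-based rank of the first trial from which every remaining
    value satisfies `criterion`, or None if no such trial exists."""
    for i in range(len(values)):
        if all(criterion(v) for v in values[i:]):
            return i + 1
    return None


def learning_trial(percent_correct, threshold=70.0):
    """P0, P0.25, P0.75, P1: at least 70% correct from this trial onward."""
    return first_stable_trial(percent_correct, lambda v: v >= threshold)


def learning_trial_p05(percent_high, tolerance=10.0):
    """P0.5: 'high' responses stay within 50% +/- tolerance (hypothetical band)."""
    return first_stable_trial(percent_high, lambda v: abs(v - 50.0) <= tolerance)


if __name__ == "__main__":
    print(learning_trial([40, 60, 75, 65, 80, 90]))
    print(learning_trial_p05([90, 70, 55, 45, 52]))
```

Here learning_trial returns 5 rather than 3: trial 3 reaches 75%, but trial 4 falls back to 65%, so the criterion is only met stably from trial 5. learning_trial_p05 returns 3.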
Response time (RT) (time elapsed between the machine's appearance and the subject's response) was analyzed as a function of the reward probabilities of the slot machines and the trial rank.
Electrophysiological data analysis.
Trials containing epileptic spikes or artifacts were rejected. No trials were discarded from subject 1, whereas 30% and 16% of the trials were discarded from subjects 2 and 3, respectively (the percentages of rejected trials per condition are reported in supplemental Table 1, available at www.jneurosci.org as supplemental material).
For each subject, the mean peak amplitudes of the event-related potentials (ERPs) at S1 and S2 were computed over all trials for each of the five types of slot machines, separately for rewarded and unrewarded trials. First, at S1, subjects 1 and 3 showed ERPs with constant amplitudes regardless of reward probability, whereas subject 2 showed no ERP in the hippocampus. Because ERPs at S1 were not reproducible across subjects and were unrelated to the reward probabilities of the slot machines, they were not analyzed further.
Next, we examined the statistical significance of the ERPs at S2 with respect to the baseline signal (−1500 to 0 ms) using Wilcoxon tests performed on single trials for each probability, over 3500 ms epochs (−1500 to +2000 ms from the markers), with a 20 ms moving time window shifted in 2 ms steps. We then investigated the relationship between ERP peak amplitudes and reward probability for each subject with a multifactorial ANOVA, with reward probability and trial outcome (rewarded/unrewarded) as independent factors. Post hoc comparisons were then made using Tukey's HSD tests to further assess significant differences in ERP peak amplitudes as a function of probability and outcome.
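A sketch of the sliding-window test in Python (NumPy and SciPy), applied to one probability condition at a time. How single trials were compared with the baseline is not detailed in the text; here each trial's mean amplitude in a 20 ms window is paired with the same trial's mean baseline amplitude and tested with scipy.stats.wilcoxon (signed-rank test). That pairing, and scanning only post-marker windows by default (t_start = 0), are assumptions.

```python
import numpy as np
from scipy.stats import wilcoxon


def sliding_wilcoxon(epochs, times, win=0.020, step=0.002, t_start=0.0):
    """Sliding-window Wilcoxon signed-rank tests against the baseline.

    epochs: (n_trials, n_samples) single trials for one condition;
    times: seconds relative to the marker (baseline = all times < 0).
    Assumed pairing: each trial's mean amplitude in the window vs the same
    trial's mean baseline amplitude. Returns (window_centres, p_values).
    """
    baseline = epochs[:, times < 0.0].mean(axis=1)
    n_windows = int(np.floor((times[-1] - win - t_start) / step)) + 1
    centres = np.empty(n_windows)
    pvals = np.empty(n_windows)
    for k in range(n_windows):
        t0 = t_start + k * step                      # window start (s)
        mask = (times >= t0) & (times < t0 + win)    # samples in the window
        amplitude = epochs[:, mask].mean(axis=1)
        pvals[k] = wilcoxon(amplitude, baseline).pvalue
        centres[k] = t0 + win / 2
    return centres, pvals


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    times = np.arange(-768, 1024) / 512.0              # 512 Hz, -1.5..+2.0 s
    epochs = rng.normal(0.0, 1.0, (30, times.size))    # 30 synthetic trials
    epochs[:, (times >= 0.2) & (times < 0.3)] -= 5.0   # injected negative deflection
    centres, p = sliding_wilcoxon(epochs, times)
    print(centres[p < 0.001].min(), centres[p < 0.001].max())
```

At 512 Hz, a 20 ms window spans about 10 samples and a 2 ms step is roughly one sample, so successive windows overlap heavily and their p values are strongly correlated.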
Finally, because the behavioral analysis showed that the learning criterion was reached by approximately the ninth trial for all reward probabilities, the first 10 trials of each block were discarded to rule out a possible effect of learning on the ERP peak amplitudes, and the same analysis of ERP peak amplitudes was then performed on the last 10 trials only.
Moreover, for each subject, we determined the mean onset latencies, peak latencies, and durations of the ERPs, time-locked to the moment the third spinner stopped (S2), for the five types of slot machines for rewarded and unrewarded trials.
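The text does not state how onset latency and duration were measured. One possible operationalization, sketched in Python under that caveat, takes the most negative point of the averaged waveform within a search window as the peak and defines onset and offset as the points where the waveform crosses half of the peak amplitude. The 0–800 ms search window, the 50% criterion, and the function name are hypothetical.

```python
import numpy as np


def negative_erp_measures(erp, times, search=(0.0, 0.8), frac=0.5):
    """Peak amplitude/latency, onset latency and duration of a negative ERP.

    erp: averaged waveform (uV); times: seconds relative to S2.
    Hypothetical definitions: the peak is the most negative sample within
    `search`; onset/offset are the first/last samples, contiguous with the
    peak, at which the waveform is at or below `frac` x peak amplitude.
    """
    idx = np.flatnonzero((times >= search[0]) & (times <= search[1]))
    peak = idx[np.argmin(erp[idx])]
    level = frac * erp[peak]              # e.g. -50 uV for a -100 uV peak
    onset = peak
    while onset > idx[0] and erp[onset - 1] <= level:
        onset -= 1
    offset = peak
    while offset < idx[-1] and erp[offset + 1] <= level:
        offset += 1
    return {
        "peak_amplitude": float(erp[peak]),
        "peak_latency": float(times[peak]),
        "onset_latency": float(times[onset]),
        "duration": float(times[offset] - times[onset]),
    }


if __name__ == "__main__":
    times = np.arange(-768, 1024) / 512.0                # -1.5 s to +2.0 s
    erp = -100.0 * np.exp(-((times - 0.3) / 0.05) ** 2)  # synthetic negative ERP
    print(negative_erp_measures(erp, times))
```

Applied to each subject's averaged waveform per probability and outcome, such a function would yield the quantities entered into the latency and duration ANOVAs, although the exact definitions used in the study may differ.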
Results
Behavior
Estimation of reward probability
A multifactorial ANOVA performed on the percentage of correct estimates of the probability of winning (low likelihood of winning for P0 and P0.25, high likelihood of winning for P0.75 and P1, and 50% of each alternative for P0.5) showed that both reward probability (P) and trial rank (R) influenced the percentage of correct estimations (FP(4,500) = 96.48, p < 0.000001; FR(19,500) = 4.44, p < 0.000001) and that the trial rank at which learning occurred depended on reward probability (FR×P(76,500) = 1.87, p < 0.00004). For P0 and P1, the learning criterion was reached after the second trial (>80% correct estimations), whereas for P0.25 and P0.75 it was reached between the 4th and the 12th trial for P0.25 (7th trial, 91.6% correct estimations) and between the 5th and the 16th trial for P0.75 (9th trial, 70.8% correct estimations). For P0.5, the learning criterion was reached after the ninth trial (estimations oscillating around 50% “high” or “low” probability of winning) (Fig. 2A,B).
The fact that subjects learned the actual reward probability of each slot machine at asymptote was confirmed by their additional classification of the slot machines at the end of each block on a scale from 0 to 4 (96% correct estimations for P0, 100% for P1, 87% for P0.25, 83% for P0.75, and 92% for P0.5).
RTs
The mean RTs ± SEM for all the reward probabilities and trials were 809.20 ± 25 ms for subject 1, 612.90 ± 14.90 ms for subject 2, and 832.60 ± 27.27 ms for subject 3. Subject 2 had a significantly shorter RT than the other two subjects (p = 0.00002). RTs were analyzed over all subjects with two multifactorial ANOVAs.
First, an RT analysis was performed with the reward probability (P) of the slot machines and the trial rank (R) as independent factors. There was a main effect of trial rank (FR(19,2279) = 4.22, p < 0.0000001) and no main effect of probability (FP(4,2079) = 1.63, p = 0.16). Although the ANOVA did not reveal any effect of reward probability on RT, there was a trend for RT to decrease with increasing reward probability (Fig. 2C). The effect of trial rank on RT was driven by the first trial, which was slower for all subjects and all reward probabilities (1200 ± 13.54 ms, Tukey's HSD post hoc test, p < 0.0001).
Second, RTs were analyzed with an ANOVA, with trial outcome (reward/no reward) (O) and reward probabilities of the slot machines (P) as independent factors, followed by Tukey's HSD post hoc test. RTs did not vary with trial outcome (FO(1,2373) = 0.28, p = 0.59); values were 777.09 ± 24.76 ms for rewarded trials and 743.03 ± 23.50 ms for unrewarded trials (Fig. 2D).
Electrophysiology
Electrode location
In each subject, at least three contiguous contacts were located in the hippocampus. In subjects 1 and 3, they were located in the right hippocampus and in subject 2 in the left hippocampus. The Talairach coordinates of the hippocampal electrode contacts from the deepest to the most superficial were the following: for subject 1, x = 20–34 (five contacts), y = −22, z = −12; for subject 2, x = −25 to −34 (four contacts), y = −22, z = −10; and for subject 3, x = 25–32 (three contacts), y = −31, z = −5. These coordinates correspond to the rostral and dorsal parts of the hippocampus in subjects 1 and 2 and to the medial and dorsal parts of the hippocampus in subject 3 (Figs. 3, 4).
Hippocampal ERP amplitudes
Regardless of whether the trial was rewarded, a robust negative ERP emerged in the hippocampus of all three subjects, 256.5 ± 16.5 ms after the outcome (S2) and before the actual outcome presentation (picture of a bill or no reward) (Fig. 4). This signal was observed at three of the four hippocampal contacts in subject 1, at one of the four contacts in subject 2, and at two of the three contacts in subject 3. Here we report results from the contact yielding the largest potential in each subject. Contacts adjacent to this one yielded a smaller signal, no signal, or a polarity inversion, suggesting that the generator of the observed ERP was close to this contact (supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
For each subject and for each type of slot machine (i.e., reward probability), this emerging signal was significantly different from baseline during a time window varying from 56 to 431 ms around the maximal amplitude (Wilcoxon tests, p values varying from <0.0001 to <0.048).
Importantly, for each subject, the mean peak amplitude of these ERPs (−28 to −112 μV) followed an inverted U-shaped relationship with reward probability: it varied nonlinearly with reward probability, being maximal when reward uncertainty was highest (P0.5) and minimal when reward uncertainty was lowest (P0 and P1), for both rewarded and unrewarded trials. Peak amplitudes did not differ between rewarded and unrewarded trials (ANOVA with probability and outcome as independent factors). For subject 1, FP(3,800) = 6.44, p < 0.0003, and FO(1,800) = 0.027, p = 0.87, no interaction, FP×O(3,800) = 0.75, p = 0.52; for subject 2, FP(3,486) = 4.71, p < 0.003, and FO(1,486) = 0.09, p = 0.76, no interaction, FP×O(3,486) = 0.12, p = 0.95; for subject 3, FP(3,632) = 7.70, p < 0.00005, and FO(1,632) = 7.70, p < 0.00005, no interaction, FP×O(3,632) = 0.29, p = 0.83 (Fig. 5). We therefore performed the same multifactorial ANOVA at the group level, with subject (S), probability, and type of outcome (reward or no reward) as independent factors. The factor subject had no effect: FP(3,1918) = 17.55, p < 0.000001; FO(1,1918) = 0.089, p = 0.76; FS(2,3630) = 0.12, p = 0.88, no interaction, FP×O(3,1918) = 0.06, p = 0.98, FP×O×S(6,7195) = 0.46, p = 0.83 (Fig. 6A).
Finally, to rule out a possible influence of early-stage learning of the reward probability on the amplitude of these ERPs, we also performed an additional analysis on the ERPs for the last 10 trials of each block. A similar inverted U-shaped relationship was observed between reward probability and the amplitudes of hippocampal ERPs (ANOVA with probability and outcome as independent factors: FP(3,750) = 10.71, p < 0.000001; FO(1,750) = 0.5, p = 0.47; no interaction, FP×O(3,750) = 0.25, p = 0.85) (Fig. 6B).
Hippocampal ERP latencies and durations
Multifactorial ANOVAs on the mean onset latencies, peak latencies, and durations of ERPs time-locked to S2, with reward probability, outcome (rewarded/unrewarded), and subject as independent factors, showed no significant effect of reward probability (P) or outcome (O) on onset latencies (FP(4,17) = 0.79, p = 0.51; FO(1,17) = 0.0005, p = 0.98), peak latencies (FP(3,17) = 0.55, p = 0.65; FO(1,17) = 0.018, p = 0.89), or durations (FP(3,17) = 2.06, p = 0.14; FO(1,17) = 1.11, p = 0.30). A significant effect of subject was observed on ERP onset latencies, peak latencies, and durations. Subject 1 had significantly longer ERP onset latencies (301.77 ± 10.47 ms) than subjects 2 and 3 [225.36 ± 24.91 ms, p = 0.04, and 242.24 ± 11.74 ms, p < 0.02, respectively; Fisher's least significant difference (LSD) test], whereas subject 2 had significantly longer peak latencies for unrewarded trials (475.10 ± 54.67 ms versus 407.23 ± 11.28 ms for rewarded trials, p < 0.005; Fisher's LSD test) and significantly longer ERP durations, regardless of whether the trial was rewarded (526.57 ± 37.10 ms), than subjects 1 and 3 (300.65 ± 23.11 ms and 316.38 ± 34.51 ms, respectively, p < 0.0001; Fisher's LSD test) (supplemental Table 2, available at www.jneurosci.org as supplemental material). These slight individual differences in ERP latencies and durations do not affect the analysis of the hippocampal ERP amplitudes reported here.
Discussion
This study provides the first direct evidence that the anterior hippocampus codes the uncertainty of cue–outcome associations in humans. It shows that when subjects learned to associate cues of slot machines with various monetary reward probabilities (P), the amplitude of negative ERPs recorded in the anterior hippocampus followed an inverted U-shaped relationship with outcome probability, regardless of whether the trial was rewarded.
This inverted U-shaped relationship is incompatible with prediction error, novelty, or surprise coding, which would have predicted a negative monotonic correlation between ERP amplitudes and increasing reward probability (Fiorillo et al., 2003; Dreher et al., 2006).
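To make the contrast concrete, a minimal Python sketch: with reward magnitude normalized to 1 and the expected value at the outcome taken as P (a simplifying, Rescorla–Wagner-style assumption, not a model fitted in this study), the signed prediction error is 1 − P on rewarded trials and −P on unrewarded trials. Both decrease monotonically with P, and neither peaks at P = 0.5, unlike the observed ERP amplitudes.

```python
PROBABILITIES = [0.0, 0.25, 0.5, 0.75, 1.0]


def prediction_error(p, rewarded, magnitude=1.0):
    """Signed prediction error at the outcome: obtained minus expected reward.

    Simplifying assumption: expected reward = p * magnitude.
    """
    obtained = magnitude if rewarded else 0.0
    return obtained - p * magnitude


if __name__ == "__main__":
    for p in PROBABILITIES:
        # P0 machines never pay out and P1 machines always do.
        win = prediction_error(p, True) if p > 0 else None
        loss = prediction_error(p, False) if p < 1 else None
        print(f"P = {p:.2f}  PE(reward) = {win}  PE(no reward) = {loss}")
```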
Also, the signal we observed at the outcome cannot reflect negative error feedback (such as an error-related negativity), because no feedback about the subject's estimate was delivered on the current trial and because the task was not to predict the outcome of the current trial (but to estimate the cumulative reward probability since the first trial).
Moreover, despite the well-established role of the hippocampus in learning, we believe that the signal we observed codes uncertainty and cannot be interpreted as a learning signal alone, because it also occurred when we restricted our analysis to the last 10 trials of each block, when all subjects had learned the winning probability of each slot machine.
In a previous fMRI study using a similar paradigm (Dreher et al., 2006), no hippocampal activation linked to reward uncertainty was seen at the outcome. That study used a much longer delay period (14 s) than the present experiment (2 s, equal to the delay used in the monkey electrophysiology experiment), which may explain why the short-lasting hippocampal uncertainty signal observed here at the outcome (∼300 ms) was not detected in the fMRI study.
Our current results extend to associative learning the findings of human neuroimaging studies showing that the BOLD (blood oxygen level dependent) response in the anterior hippocampus increases with the uncertainty of probabilistic sequential events (Strange et al., 2005), although other studies have reported opposite results (Harrison et al., 2006).
Two important characteristics distinguish uncertainty coding in the hippocampus from the uncertainty signal recorded in monkey dopaminergic neurons (Fiorillo et al., 2003). First, the signal recorded in the hippocampus is transient. Second, it occurs at the outcome rather than during the delay between the cue and the outcome, and therefore is not linked to reward expectation. These two modes of uncertainty coding may serve different functions during associative learning: the sustained mode of midbrain activity may be related to a sustained form of attention to reinforcers, motivation, or exploratory behavior (Fiorillo et al., 2003; Dreher et al., 2006), whereas the transient mode of hippocampal activity may code a posteriori the degree of uncertainty of cue–outcome associations and signal selective attention to the informative outcome (Pearce and Hall, 1980). Providing information about trial outcome may be a fundamental computational operation of the hippocampus, as suggested by similar findings in other domains (Watanabe and Niki, 1985; Wittmann et al., 2007).
Both forms of uncertainty coding are compatible with the concept of Shannon's entropy from information theory (Shannon, 1948), which measures an ensemble's average information content, or uncertainty, and which, for a binary outcome, is maximal when the outcome has a 50% chance of occurring. Thus, we believe that the hippocampal signal recorded at the outcome may help to adjust attention to the level of outcome uncertainty regardless of reward. In summary, these findings extend to the probabilistic domain early views that the hippocampus is involved in decreasing attention to unimportant events (Douglas, 1967) and further support the idea that it can increase attention to relevant stimuli (Pearce and Hall, 1980). This general computation of cue–outcome uncertainty may represent the underlying mechanism responsible for the involvement of the hippocampus in associative learning, probabilistic classification (Squire and Zola, 1996), binding of stimulus elements (Gluck and Granger, 1993), and transitive inference (Dusek and Eichenbaum, 1997; Frank et al., 2003). Indeed, in all these hippocampus-dependent functions, the encoding of item relationships is based on the strength of their associations, which can be efficiently quantified by their degree of uncertainty. This a posteriori encoding of the uncertainty of item associations by the hippocampus may participate in a feedback process that updates these relationships, enabling dynamic adaptation to the current event.
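For a binary outcome occurring with probability P, Shannon entropy is H(P) = −P log2 P − (1 − P) log2(1 − P) bits (with 0 log2 0 = 0), which is 0 for P0 and P1 and reaches its maximum of 1 bit at P0.5. The short Python sketch below tabulates it for the five probabilities used here, alongside the outcome variance P(1 − P), another common inverted-U measure of uncertainty; the variance is included for comparison only and is not an analysis performed in this study.

```python
import math


def binary_entropy(p):
    """Shannon entropy (bits) of an outcome that occurs with probability p."""
    if p in (0.0, 1.0):
        return 0.0                      # convention: 0 * log2(0) = 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def outcome_variance(p):
    """Variance of a Bernoulli(p) outcome: also maximal at p = 0.5."""
    return p * (1 - p)


if __name__ == "__main__":
    for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
        print(f"P = {p:.2f}  H = {binary_entropy(p):.3f} bits  "
              f"var = {outcome_variance(p):.4f}")
```

For P = 0, 0.25, 0.5, 0.75, and 1, H is 0, 0.811, 1, 0.811, and 0 bits, and the variance is 0, 0.1875, 0.25, 0.1875, and 0: both are symmetric around P0.5, matching the shape of the hippocampal ERP amplitudes.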
This hippocampal uncertainty signal might either be computed by the hippocampus itself, independently of dopaminergic neuron firing, or result from reciprocal hippocampal–midbrain connections. Indeed, the integration by the hippocampus of the tonic dopaminergic signal during the delay between the cue and the outcome might result in a phasic signal at the time of the outcome. Regardless of the precise contribution of dopaminergic neurons to the present findings, different representations of uncertainty arising from the hippocampus and VTA may be conveyed to postsynaptic dopaminergic projection sites, such as the orbitofrontal cortex and the striatum, allowing the further computations required for decision making under uncertainty (Hsu et al., 2005). Previous findings clearly show that uncertainty is coded in many regions of the human brain, particularly the ventral striatum, insula, anterior cingulate cortex, and orbitofrontal cortex (Hsu et al., 2005; Dreher et al., 2006; Preuschoff et al., 2006, 2008; Tobler et al., 2007), and the present study reveals that the hippocampus also participates in uncertainty processing. Future studies are needed to pinpoint the specific roles of each structure in computing uncertainty in different contexts.
Together, our findings have crucial implications for understanding the basic neural mechanisms used by the brain to extract structural relationships from the environment when learning cue–outcome associations. They also have important consequences regarding impairment of these mechanisms in neuropsychiatric disorders involving dysfunctions of the dopaminergic–hippocampal loop (e.g., schizophrenia).
Footnotes
- We thank Dr. M. Guénot for surgical implantation of epileptic patients, Dr. A. Cheylus for help with programming the experimental paradigm, and Drs. E. Procyk, S. Wirth, and K. Reilly for helpful comments on an early version of this manuscript.
- Correspondence should be addressed to either Dr. Giovanna Vanni-Mercier or Dr. Jean-Claude Dreher, Reward and Decision Making Group, Cognitive Neuroscience Center, CNRS, UMR 5229, 67 boulevard Pinel, 69675 Bron, France. g.vanni-mercier@isc.cnrs.fr or dreher@isc.cnrs.fr