Midbrain dopaminergic neurons projecting to the ventral striatum code for reward magnitude and probability during reward anticipation and then indicate the difference between actual and predicted outcome. It has been questioned whether such a common system for the prediction and evaluation of reward exists in humans. Using functional magnetic resonance imaging and a guessing task in two large cohorts, we are able to confirm ventral striatal responses coding both reward probability and magnitude during anticipation, permitting the local computation of expected value (EV). However, the ventral striatum only represented the gain-related part of EV (EV+). At reward delivery, the same area shows a reward probability- and magnitude-dependent prediction error signal, best modeled as the difference between actual outcome and EV+. In contrast, loss-related expected value (EV−) and the associated prediction error were represented in the amygdala. Thus, the ventral striatum and the amygdala distinctively process the value of a prediction and subsequently compute a prediction error for gains and losses, respectively. Therefore, a homeostatic balance of both systems might be important for generating adequate expectations under uncertainty. Prevalence of either part might render expectations more positive or negative, which could contribute to the pathophysiology of mood disorders like major depression.
In nonhuman primates, mesolimbic dopaminergic neurons are involved in the representation of reward probability and reward magnitude (Schultz et al., 1997; Fiorillo et al., 2003; Tobler et al., 2005). In humans, these response properties have been observed in the ventral striatum (Pagnoni et al., 2002; McClure et al., 2003; O'Doherty et al., 2003; Ramnani et al., 2004), a region known to receive afferent input from midbrain dopaminergic neurons (Haber et al., 1995). The ventral striatum responds to a conditioned stimulus predicting reward delivery (McClure et al., 2003; O'Doherty et al., 2003) and shows a strong outcome-related response when a reward occurs unexpectedly or an activity decrease when an expected reward is omitted (Pagnoni et al., 2002; McClure et al., 2003). These findings suggest that ventral striatal activations resemble a prediction error signal similar to the dopaminergic midbrain signal in the primate (Schultz and Dickinson, 2000).
Reward processing in the human has also been investigated using other incentive tasks containing a guessing or gambling component (Rogers et al., 1999; Elliott et al., 2000; Knutson et al., 2000; Breiter et al., 2001; Delgado et al., 2003; Ernst et al., 2004; Matthews et al., 2004; Abler et al., 2006). However, in contrast to reinforcement learning, a proper model for the prediction signal that combines reward magnitude and probability has not been established. “Expected value” (EV), defined as the product of reward magnitude times reward probability (Machina, 1987), is a likely basis for such a model (Knutson et al., 2005). In a guessing task with two possible outcomes (i.e., gain or loss), the total EV is the sum of the gain-related EV (EV+) and the loss-related EV (EV−). The former is the probability of a gain times the magnitude of the gain, whereas the latter is the probability of a loss times the magnitude of the loss. Previous studies investigating the neuronal basis of EV have used tasks with gain versus no-gain outcomes or loss versus no-loss outcomes (Knutson et al., 2005; Dreher et al., 2006). In the former case, EV equals EV+, simply because no loss can occur (i.e., EV− = 0), and in the latter case EV equals EV− (i.e., EV+ = 0).
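The decomposition of EV described above can be illustrated with a short sketch. It assumes that gain and loss have the same absolute magnitude and that the loss-related part carries a negative sign, so that EV+ and EV− sum to the total EV; the function name and the example values are illustrative only.

```python
def expected_values(p_gain: float, magnitude: float) -> dict:
    """Split the expected value of a two-outcome gamble into gain and loss parts.

    Assumption: gain and loss share the same absolute magnitude, and the
    loss-related part is signed negative so that EV = EV+ + EV-.
    """
    ev_gain = p_gain * magnitude              # EV+: P(gain) times gain magnitude
    ev_loss = (1.0 - p_gain) * (-magnitude)   # EV-: P(loss) times signed loss magnitude
    return {"EV+": ev_gain, "EV-": ev_loss, "EV": ev_gain + ev_loss}


# A mixed gamble: 50% chance to win 5 Euro, otherwise lose 5 Euro.
mixed = expected_values(0.5, 5.0)
```

For this mixed gamble the total EV is zero although EV+ and EV− are not, which is why a task with both gains and losses is needed to tell the two components apart.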
We used a factorial design in combination with functional magnetic resonance imaging (fMRI) in which volunteers could gain and lose different amounts of money with different probabilities in each trial. This allowed us to explicitly test whether EV+ and EV− are processed in the same or different brain areas. Based on recent data (Bayer and Glimcher, 2005), showing a limited dynamic range of dopaminergic midbrain neurons, we expected that in such a task, the ventral striatum would only be able to signal EV+ but not EV−, and that an additional system exists that represents EV−. Because the amygdala has been implicated in the prediction of aversive events (Büchel et al., 1998; LaBar et al., 1998; Breiter et al., 2001; Kahn et al., 2002; Glascher and Buchel, 2005; Trepel et al., 2005), this structure is a possible candidate for such a system.
Materials and Methods
Forty-two healthy male volunteers, 27.3 ± 5.5 years of age (mean age ± SD), participated in the main study. A second cohort of 24 healthy male volunteers, 24.9 ± 4.9 years of age (mean age ± SD), was investigated for replication purposes. We concentrated our investigation on male volunteers to minimize the influence of differences in hormonal state during the menstrual cycle. Gonadal steroids have a regulatory influence on the reward system in female rats (Bless et al., 1994), and estradiol in particular has been shown to modulate dopamine (DA) release, synthesis, and receptor binding in the striatum (Pasqualini et al., 1996).
The local ethics committee approved the study and all participants gave written informed consent before participating. Volunteers were evaluated with a structured psychiatric interview (the Mini-International Neuropsychiatric Interview) (Sheehan et al., 1998) and with a gambling questionnaire (Kurzfragebogen Glücksspielverhalten) (Petry, 1996) to exclude psychiatric diseases and pathological gambling. All underwent a urine drug screening to exclude cocaine, amphetamine, cannabis, and opiate abuse.
The paradigm was a simple guessing task subdivided into two phases: anticipation and outcome. Each trial began with the presentation of the backs of eight playing cards (Fig. 1a,b, top). In the initial phase, volunteers had to place money on individual playing cards. In some trials, they could place the money on the corners of four adjacent cards (Fig. 1a) and in others on a single card (Fig. 1b). This manipulation allowed us to control reward probability (low for a single card and high for four cards). Altogether, volunteers played a series of 200 trials. Because of trial randomization, the actual gain probability was 26% for the low-probability trials and 66% for the high-probability trials. This deviation from the graphically expected probabilities of one-eighth (i.e., 0.125) and four-eighths (i.e., 0.5) was necessary to avoid a rapid decrease in balance resulting from the unfavorable average gain/loss ratio of 31.25/68.75% that arises when the individual gain probabilities are 12.5 and 50%. The inclusion of a third (i.e., very high) probability of seven-eighths (i.e., 0.875) as an alternative was dismissed, because it would have increased the total number of conditions from 8 to 12.
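The average gain/loss ratio quoted above can be checked with a few lines of arithmetic. The sketch assumes equally many low- and high-probability trials, which is consistent with the reported 31.25% figure but is not stated explicitly; the variable names are illustrative.

```python
# Gain probabilities suggested by the card layout and those actually realized.
graphical = {"low": 1 / 8, "high": 4 / 8}
actual = {"low": 0.26, "high": 0.66}


def mean_gain_probability(probs: dict) -> float:
    # Assumes equally many low- and high-probability trials.
    return sum(probs.values()) / len(probs)


graphical_gain = mean_gain_probability(graphical)  # average gain share from the layout
actual_gain = mean_gain_probability(actual)        # average gain share as realized
```

Under this assumption the layout alone implies gains on 31.25% of trials, whereas the realized probabilities raise this share to about 46%.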
In summary, this can be seen as a 2 × 2 × 2 factorial design with the factors probability (high or low), magnitude (one or five Euro) and outcome (gain or loss), resulting in eight different conditions.
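As a minimal sketch, the eight cells of this factorial design can be enumerated programmatically; the factor labels below are illustrative and do not come from the original analysis scripts.

```python
from itertools import product

# The three two-level factors of the design.
factors = {
    "probability": ("low", "high"),
    "magnitude_eur": (1, 5),
    "outcome": ("gain", "loss"),
}

# One dictionary per cell of the 2 x 2 x 2 design.
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]
```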
Initial credit was set to 20 Euro and continuously displayed on the screen. The money presented was either a one Euro coin (Fig. 1a) or a five Euro bill (Fig. 1b). Volunteers were able to place their bet using a magnetic resonance (MR) compatible optical mouse for 3034 ms. After placing the bet, the display was kept constant during an additional anticipation period of 4207 ms, after which all cards were flipped, and the volunteers could immediately see the outcome of the trial. Another 2015 ms later, the continuously visible credit display was updated and another 3006 ms (in 171 trials) or 12262 ms (in 29 trials) later, the next trial began. This resulted in 171 trials with an interstimulus interval (ISI) of 12.26 s and 29 trials with a longer ISI (21.46 s), introducing 14.6% null-events.
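The within-trial timeline of a standard trial follows directly from the phase durations given above. The sketch below covers only the 171 standard trials; the phase names are labels chosen for illustration.

```python
# Phase durations of a standard trial, in milliseconds.
PHASES_MS = [
    ("bet", 3034),            # volunteers place the bet
    ("anticipation", 4207),   # display held constant
    ("outcome", 2015),        # cards flipped, result visible
    ("credit_update", 3006),  # updated credit shown until the next trial
]


def phase_onsets(phases):
    """Return the onset of each phase relative to trial start and the trial length."""
    onsets, elapsed = {}, 0
    for name, duration in phases:
        onsets[name] = elapsed
        elapsed += duration
    return onsets, elapsed


onsets, trial_length_ms = phase_onsets(PHASES_MS)
```

The outcome is revealed 7241 ms after trial onset, and a standard trial lasts 12262 ms, matching the 12.26 s interstimulus interval.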
Seven of eight cards were black, the remaining one was a red ace (Fig. 1a,b, bottom row). If the red ace was touched by the bet (Fig. 1a, bottom), the volunteer gained the amount of money and otherwise lost the money (Fig. 1b, bottom). The order of trials was pseudorandomized and predetermined (i.e., the volunteer had no influence on the probability and the magnitude of each individual trial).
Before entering the scanner, subjects received a standardized verbal description of the task and completed a practice session, including all possible combinations of probability, magnitude, and outcome.
Volunteers were told explicitly before the experiment that they would receive their balance in cash. In case of a negative balance, they were told that the amount would be deducted from the payment offered for participating in this study. Volunteers ended the game with a negative balance of eight Euro, which was waived.
MR scanning was performed on a 3T MR Scanner (Siemens Trio; Siemens, Erlangen, Germany) with a standard headcoil. Thirty-eight continuous axial slices (slice thickness, 2 mm) were acquired using a gradient echo echo-planar T2*-sensitive sequence (repetition time, 2.22 s; echo time, 25 ms; flip angle, 80°; matrix, 64 × 64; field of view, 192 × 192 mm). High-resolution (1 × 1 × 1 mm voxel size) T1-weighted structural MRI was acquired for each volunteer using a three-dimensional FLASH sequence.
A liquid crystal display video-projector back-projected the stimuli on a screen positioned behind the head of the participant. Subjects lay on their backs within the bore of the magnet and viewed the stimuli comfortably via a 45° mirror placed on top of the head coil that reflected the images displayed on the screen. To minimize head movement, all subjects were stabilized with tightly packed foam padding surrounding the head.
The task presentation and the recording of behavioral responses were performed with Cogent 2000v1.24 (http://www.vislab.ucl.ac.uk/Cogent/index.html) and Matlab 6.5 (MathWorks, Natick, MA).
Image processing and statistical analyses were performed using SPM2 (www.fil.ion.ucl.ac.uk/spm). All volumes were realigned to the first volume, spatially normalized (Friston et al., 1995) to an echoplanar imaging template in a standard coordinate system (Evans et al., 1994), resampled to a voxel size of 3 × 3 × 3 mm and finally smoothed using a 10 mm full-width at half-maximum isotropic Gaussian kernel.
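For readers more used to thinking in standard deviations, the smoothing kernel can be converted from full width at half maximum (FWHM) with the standard Gaussian relation sigma = FWHM / sqrt(8 ln 2). The sketch below is plain arithmetic and not part of the SPM2 pipeline.

```python
import math


def fwhm_to_sigma(fwhm: float) -> float:
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm / math.sqrt(8 * math.log(2))


sigma_mm = fwhm_to_sigma(10.0)  # kernel width in millimetres
sigma_vox = sigma_mm / 3.0      # same width in units of the 3 mm voxels
```

A 10 mm FWHM kernel thus corresponds to a standard deviation of roughly 4.2 mm, or about 1.4 voxels at the resampled resolution.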
All eight conditions of the paradigm were modeled separately in the context of the general linear model as implemented in SPM2. We used two different models to characterize the data. In the first model, the anticipation and the outcome phase were modeled as individual hemodynamic responses (beginning of a trial and 7241 ms after trial onset), leading to 16 regressors (2 × 2 × 2 conditions times 2 regressors). The anticipation-related response was modeled as a small box-car with a duration of 7241 ms, and the outcome-related response was modeled as a single hemodynamic response. An additional covariate was incorporated into the model, representing the anticipation response modulated by the total amount of mouse movements in the choice period of this trial. This ensured that movement-related activation during the early trial period is modeled independently from the regressors of interest (Knutson et al., 2005).
To average the poststimulus BOLD response for display purposes, we defined a second model using a finite impulse response (FIR) basis function with a bin width of 2 s, modeling a total of 10 bins from 0 to 20 s poststimulus. This results in 10 regressors for each condition and 80 regressors for all conditions. Intuitively, this basis set considers each time bin after stimulus onset individually to model the BOLD response and can capture any possible shape of response function up to a given frequency limit. In this model, the parameter estimate for each time bin represents the average BOLD response at that time. In Figures 2–5, we therefore labeled the y-axis as “parameter estimates a.u.” Importantly, these parameter estimates are directly proportional to the BOLD signal. This additional analysis was only conducted to display activation time courses.
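The FIR idea can be sketched as a small design matrix with one column per time bin. This is a simplified illustration, not the SPM2 implementation: it assigns each scan to a bin by its acquisition time and ignores slice timing and the finer time grid SPM uses internally; the function name and the single example onset are hypothetical.

```python
def fir_design(onsets_s, n_scans, tr_s=2.22, bin_width_s=2.0, n_bins=10):
    """Build an FIR design matrix (scans x bins) for one condition.

    A scan gets a 1 in column k if it was acquired between k and k + 1
    bin widths after any event onset of this condition.
    """
    design = [[0] * n_bins for _ in range(n_scans)]
    for scan in range(n_scans):
        scan_time = scan * tr_s
        for onset in onsets_s:
            lag = scan_time - onset
            if 0 <= lag < n_bins * bin_width_s:
                design[scan][int(lag // bin_width_s)] = 1
    return design


# One event at time zero, twelve scans at the repetition time of 2.22 s.
X = fir_design(onsets_s=[0.0], n_scans=12)
```

With 10 such columns per condition, the eight conditions yield the 80 regressors mentioned above, and each fitted weight estimates the mean signal in its poststimulus bin.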
Data were analyzed for each subject individually (first-level analysis) and for the group (second-level analysis). At the single-subject level, we applied a high-pass filter with a cutoff of 120 s to remove baseline drifts. All 16 parameter estimate images for the first analysis and all 80 parameter estimate images for the second analysis (FIR) were subsequently entered into a random effects analysis. The problem of nonindependent data within subjects as well as error variance heterogeneity was addressed by performing a nonsphericity correction.
For all analyses, the threshold was set to p < 0.05 corrected for multiple comparisons. For reasons of brevity, we focus our report on subcortical and frontal areas. Based on previous data, correction for hypothesized regions was based on volumes of interest. In particular, correction for the ventral striatum was based on an 18-mm-diameter sphere centered on x, y, z: ±15, 9, −9 mm (O'Doherty et al., 2004). Magnitude-dependent activation during the anticipation phase was expected in the orbital frontal cortex (Knutson et al., 2005), and correction was based on a 60-mm-diameter sphere centered on x, y, z: ±21, 42, −9 mm.
The involvement of the amygdala in predicting aversive events (i.e., losses) has been reported previously (Glascher and Buchel, 2005), and correction for multiple comparisons was based on the amygdala regions of interest provided by the Anatomical Automatic Labeling project at http://www.cyceron.fr/freeware/ (Tzourio-Mazoyer et al., 2002). Correction for hypothesized ventromedial prefrontal cortex activation (Knutson et al., 2003) was based on an anatomically defined 36-mm-diameter sphere centered between the genu of the corpus callosum and the anterior pole (center: x, y, z = 0, 52, −3).
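A spherical volume of interest such as the one used for the ventral striatum can be sketched as the set of voxel centers within the sphere. The example assumes that the sphere center lies on the 3 mm voxel grid and only builds the mask; it does not perform the multiple-comparison correction itself. The function name is illustrative.

```python
def sphere_voxels(center_mm, diameter_mm, voxel_mm=3):
    """List voxel centers (in mm) inside a sphere, assuming the center is on the grid."""
    radius = diameter_mm / 2
    reach = int(radius // voxel_mm) + 1
    cx, cy, cz = center_mm
    voxels = []
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            for k in range(-reach, reach + 1):
                dx, dy, dz = i * voxel_mm, j * voxel_mm, k * voxel_mm
                if dx * dx + dy * dy + dz * dz <= radius * radius:
                    voxels.append((cx + dx, cy + dy, cz + dz))
    return voxels


# 18-mm-diameter spheres around the left and right ventral striatum centers.
right_vs = sphere_voxels((15, 9, -9), 18)
left_vs = sphere_voxels((-15, 9, -9), 18)
```

On this grid each sphere contains 123 voxels, and ventral striatal peaks such as (12, 9, −3) and (−12, 6, −3) fall inside the respective volumes.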
We were interested in regions showing signal changes for prediction during the anticipation phase and a prediction error during the outcome phase. This commonality constraint was incorporated by using a conjunction analysis comprising the contrasts for prediction and prediction error. Intuitively, the ensuing conjunction analysis only shows areas in which both contrasts individually reach significance (Nichols et al., 2005).
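The logic of such a conjunction can be sketched as a minimum-statistic test: a voxel survives only if the smaller of its two statistics exceeds the threshold. The statistic values and the threshold below are made-up numbers for illustration, not results from this study.

```python
def conjunction(stat_a, stat_b, threshold):
    """Flag voxels where both contrasts exceed the threshold (minimum statistic)."""
    return [min(a, b) >= threshold for a, b in zip(stat_a, stat_b)]


# Hypothetical Z values for four voxels in the two contrasts.
z_prediction = [5.2, 4.1, 1.0, 3.5]
z_prediction_error = [4.8, 2.0, 6.3, 3.6]
surviving = conjunction(z_prediction, z_prediction_error, threshold=3.1)
```

Only the first and last voxel survive here, because the second and third are significant in just one of the two contrasts.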
Prediction error model.
In fMRI studies of reinforcement learning, predictions and prediction errors have been used to model fMRI data (O'Doherty et al., 2003). The prediction error represents the difference between the actual outcome and the prediction. In reinforcement learning, this prediction error is then used to update future predictions. Although there is nothing to be learned per se in guessing tasks, the concept of predictions and prediction errors can also be applied. Using a guessing task with fixed probabilities, we can express the prediction error δ as follows:
$$\delta = R - V, \quad V = p$$
where V is the prediction, R is the actual outcome, and p is the gain probability. This model can now be extended to also incorporate reward magnitude x into the prediction term V, which then becomes the expected value as follows:
$$\delta = R - EV$$
where EV indicates the predicted outcome (i.e., expected value). The prediction error δ is now the difference between actual outcome R and expected value EV.
EV can be further divided into gain-related (EV+) and loss-related (EV−) EV as follows:
$$EV = EV^{+} + EV^{-}, \quad EV^{+} = p \cdot x, \quad EV^{-} = (1 - p) \cdot (-x)$$
Here, 1 − p is the loss probability and −x is the signed magnitude of the loss. The concept of expected value has been unable to explain some phenomena in human choice behavior, and thus more general forms of the value function have been derived (Edwards, 1955; Kahneman and Tversky, 1991). In these models, x and p do not directly enter into the estimation but rather nonlinear functions of both (Machina, 1987; Kahneman and Tversky, 2000; Trepel et al., 2005). However, the deviation from linearity of these functions is most pronounced at the extremes. Analogous to previous studies (Knutson et al., 2005), we assumed local linearity and based the predictions on the expected value to explain BOLD responses in the human brain.
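To make the two candidate prediction error signals concrete, the sketch below computes the difference between the actual outcome and either EV+ or EV− for every cell of the design and then mean-centers the EV+-based values into contrast weights. It assumes a signed outcome (plus or minus the stake), a negatively signed EV−, and the graphically visible probabilities; the resulting numbers are illustrative and are not claimed to reproduce Table 1.

```python
def prediction_errors(p_gain, magnitude, outcome):
    """Prediction errors relative to the gain- and loss-related expected value."""
    ev_gain = p_gain * magnitude          # EV+
    ev_loss = -(1 - p_gain) * magnitude   # EV- (signed negative by assumption)
    actual = magnitude if outcome == "gain" else -magnitude
    return {"delta_EV+": actual - ev_gain, "delta_EV-": actual - ev_loss}


# All eight cells: probability x magnitude x outcome.
cells = [(p, x, o) for p in (0.125, 0.5) for x in (1, 5) for o in ("gain", "loss")]
raw = [prediction_errors(p, x, o)["delta_EV+"] for p, x, o in cells]

# Mean-centering turns the model predictions into contrast weights summing to zero.
mean_raw = sum(raw) / len(raw)
contrast = [value - mean_raw for value in raw]

example_gain = prediction_errors(0.5, 1, "gain")
example_loss = prediction_errors(0.5, 1, "loss")
```

In this sketch the EV+-based error is larger in absolute size for losses than for gains, whereas the EV−-based error shows the opposite asymmetry.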
Dynamic model using trial-based probabilities.
The true average probability of all trials was different from what could be guessed from the visual card layout. We therefore created a model that iteratively updates the probabilities for the high- and low-probability conditions on a trial-by-trial basis. For the beginning of the experiment (i.e., before the first gain trial), the graphically visible probabilities (12.5 and 50%) were used. Figure 6a shows the traces of both (high and low) probabilities over the course of the experiment. This dynamic probability trace was then used to calculate trial-specific gain- and loss-related expected values and prediction errors, which were used to explain the fMRI data. The basis functions used for the anticipation and the outcome regressor were identical to the original model. However, in contrast to the original model, we entered gain- and loss-related expected value and the respective prediction errors as parametric modulations. In analogy to the first analysis, the parameter estimates for EV+, EV−, and the respective prediction errors were subsequently entered into a random effects analysis.
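The exact update rule is not spelled out here, so the following is only one plausible sketch of a trial-by-trial probability trace. It assumes that the graphical probability is used until the first gain has been observed in a condition and that the running proportion of gains is used afterwards; the function name and the example sequence are hypothetical.

```python
def dynamic_probabilities(outcomes, prior):
    """Return the gain-probability estimate in force at the start of each trial.

    Assumed rule: keep the graphical prior until a gain has been seen,
    then use the proportion of gains among the trials seen so far.
    """
    estimates, gains = [], 0
    for n_seen, won in enumerate(outcomes):
        estimates.append(gains / n_seen if gains > 0 else prior)
        gains += int(won)
    return estimates


# Hypothetical low-probability sequence: loss, loss, gain, loss.
trace = dynamic_probabilities([False, False, True, False], prior=0.125)
```

Each estimate in the trace would then replace the fixed probability when computing that trial's expected values and prediction errors.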
We continuously monitored all mouse movements during the choice period and could therefore compare the amount of mouse movement between conditions. We observed a negative main effect of reward magnitude (Z = 2.5; p < 0.05), i.e., more mouse movement for one-Euro trials (294.1 ± 18.0 pixels; mean ± SEM) than for five-Euro trials (276.1 ± 17.5 pixels; mean ± SEM). Not surprisingly, more mouse movement was also observed (Z = 9.1; p < 0.05) for low-probability trials (318.3 ± 17.3 pixels; mean ± SEM) than for high-probability trials (251.9 ± 17.6 pixels; mean ± SEM), attributable to more degrees of freedom in placing the bet in low-probability trials. No significant interaction was observed.
All eight conditions (all possible combinations of two reward probabilities, two reward magnitudes, and two outcomes; i.e., gain/loss) of the paradigm were modeled separately. To test for signal differences during anticipation, parameter estimates for the first hemodynamic response (i.e., modeling the anticipation phase of each trial) were compared. In addition, the total amount of mouse movements was modeled as a condition-specific nuisance covariate removing movement-related signal changes.
Reward magnitude-related activation
Bilateral ventral striatum showed a main effect of magnitude (i.e., stronger BOLD signal for trials with five Euro as opposed to one Euro) (Fig. 2a), with peaks at x, y, z: −12, 3, 0 mm (Z = 5.6) and x, y, z: 12, 6, 0 mm (Z = 5.2; both p < 0.05, corrected). Cortical areas showing a main effect of magnitude during anticipation comprised bilateral anterior insula (peak: x, y, z: −33, 21, −6 mm, Z = 5.5; peak: x, y, z: 33, 24, −6 mm, Z = 6.7; both p < 0.05, corrected) and bilateral anterior orbitofrontal cortex (peak: x, y, z: −39, 57, 3 mm, Z = 4.0; peak: x, y, z: 36, 60, −3 mm, Z = 4.6; both p < 0.05, corrected).
Reward probability-related activation
The bilateral ventral striatum showed a main effect of probability (i.e., stronger BOLD signal for more likely gains) (Fig. 2b). The peak of this activation was located in the anterior ventral striatum (peak: x, y, z: −12, 15, −3 mm, Z = 3.4; peak: x, y, z: 15, 15, −6 mm; Z = 4.2; both p < 0.05, corrected). Additional reward probability-related activation was observed in ventromedial prefrontal cortex (peak: x, y, z: 3, 51, −6 mm; Z = 3.3; p < 0.05, corrected).
Main effect of gain-related expected value
BOLD responses that strongly covaried with the linear model of EV+ (Fig. 1d), but not total EV (Fig. 1c), were observed in bilateral ventral striatum (peak: x, y, z: 12, 9, −3 mm, Z = 5.2; peak: x, y, z: −12, 6, −3 mm, Z = 5.2; both p < 0.05, corrected) (Fig. 3a) and the right orbitofrontal cortex (peak: x, y, z: 36, 63, 0 mm; Z = 4.8; p < 0.05, corrected). We replicated this important finding in an additional cohort of 24 volunteers. Peak signal changes that correlate with EV+ were observed in bilateral ventral striatum (peak: x, y, z: 12, 9, −3 mm, Z = 5.8; peak: x, y, z: −12, 6, −3 mm, Z = 5.3; both p < 0.05, corrected) (Fig. 3b).
The outcome phase was defined as a BOLD response evoked by neuronal activity at the moment when the result of the trial was revealed (i.e., the cards were flipped).
Gain-related activation (i.e., gain > loss) was observed in bilateral ventral striatum (peak: x, y, z: 12, 9, −3 mm, Z = 11.8; peak: x, y, z: −12, 9, −3 mm; Z = 11.7; both p < 0.05, corrected) and in bilateral orbitofrontal cortex (peak: x, y, z: 48, 39, −18 mm, Z = 4.7; peak: x, y, z: −45, 45, −15 mm, Z = 5.7; both p < 0.05, corrected).
Prediction error-related responses
Because we observed ventral striatal responses during anticipation that were correlated with the linear model of gain-related expected value, we tested the hypothesis that a prediction error signal is computed as the difference between outcome and EV+ (see Materials and Methods) and therefore created a contrast according to mean corrected predictions from this model (Table 1). Most importantly, we were interested in identifying areas coexpressing both patterns, i.e., signal changes correlated with EV+ during the anticipation phase (Fig. 1d) and signal changes correlated with the prediction error based on EV+ during the outcome phase (Fig. 1e) as predicted by nonhuman primate data (Schultz et al., 1997). A conjunction analysis was used to identify such areas. Based on this conjunction analysis, we detected signal changes in the bilateral ventral striatum (peak: x, y, z, −12, 6, −3 mm, Z = 5.2; peak: x, y, z, 12, 9, −3 mm, Z = 5.2; both p < 0.05, corrected) that closely resemble EV+ during reward anticipation and an EV+-based prediction error signal during the outcome phase (Fig. 4a). We replicated this important finding in an additional cohort of 24 volunteers (Fig. 4b). Voxels that coexpress signal changes related to EV+ during anticipation and the related prediction error during outcome were observed in bilateral ventral striatum (peak: x, y, z, 12, 9, −3 mm, Z = 5.8; peak: x, y, z, −12, 6, −3 mm, Z = 5.3; both p < 0.05, corrected) (Fig. 4b). Interestingly, the time course in the left ventral striatum (Fig. 4c) shows more pronounced deactivations for loss trials (cyan) than activations for gain trials (magenta) in accordance with the EV+-based prediction error model (Fig. 1e). Because the actual gain probabilities (26 and 66%) were slightly higher compared with the graphically expected probabilities (12.5 and 50%), we replicated this result with an analysis using dynamic probabilities on a trial-by-trial basis (see Materials and Methods for details). This analysis showed activation patterns in the ventral striatum that were almost indistinguishable from the original analysis (peak: x, y, z, 12, 9, −3 mm, Z = 4.4; peak: x, y, z, −12, 3, −3 mm, Z = 4.7; both p < 0.05, corrected) (Fig. 6b).
Loss-related expected value and the associated prediction error
Analogous to our model driven analysis for EV+ and the associated prediction error, the same analysis was performed for loss-related expected value, EV−. Areas showing both EV−-related signal changes during anticipation (Fig. 1f) and an EV−-associated prediction error response during the outcome phase (Fig. 1g) were again identified using a conjunction analysis. In contrast to EV+, EV−-related activations showed a maximum in bilateral amygdala (peak x, y, z, 30, −3, −12 mm, Z = 5.4; peak x, y, z, −27, −3, −18 mm, Z = 3.9; both p < 0.05, corrected) (Fig. 5a). Again, this finding was replicated in an independent cohort of 24 volunteers (amygdala peak x, y, z, 27, −3, −18 mm, Z = 4.3; peak x, y, z, −24, −3, −15 mm, Z = 4.1; both p < 0.05, corrected) (Fig. 5b). Compared with the prediction error based on EV+ in the ventral striatum, the time course in the amygdala (Fig. 5c) shows less pronounced or no deactivations for loss trials (cyan), in accordance with the EV− based prediction error model (Fig. 1g). Analogous to the analysis of EV+-related responses, we replicated this analysis with an analysis using dynamic probabilities on a trial-by-trial basis. This analysis showed activation patterns in the amygdala that were similar to the original analysis (peak x, y, z, 27, 0, −18 mm, Z = 4.8; peak x, y, z, −27, −3, −18 mm, Z = 3.3; both p < 0.05, corrected) (Fig. 6c).
We systematically varied the characteristics of reward-related processing using a factorial design that allowed for all possible combinations of reward magnitude, reward probability, and outcome in combination with fMRI. In two large cohorts of healthy volunteers, we were able to show ventral striatal responses coding expected value (i.e., the product of reward probability and magnitude) during anticipation. Importantly, ventral striatal responses did not express the full range of expected value but only gain-related expected value (EV+). At reward delivery, the same area showed a reward probability- and magnitude-dependent prediction error signal, parsimoniously modeled as the difference between actual outcome and EV+. Conversely, loss-related expected value (EV−) and the associated prediction error were identified in the amygdala.
Most previous fMRI studies have either varied reward magnitude (Knutson et al., 2000, 2001a,b; Delgado et al., 2003) or reward predictability (Berns et al., 2001; Abler et al., 2006) or used a fixed combination of probability and magnitude (Rogers et al., 1999; Ernst et al., 2004; Matthews et al., 2004; Coricelli et al., 2005; Dreher et al., 2006). In most of these studies, volunteers had the choice between different gambles, and the designs therefore did not include the combination of low gain probability and low magnitude, because this combination is less lucrative than the others and normal volunteers would not choose such a gamble. More recent studies investigated different magnitudes and probabilities but restricted the analysis to the anticipation phase (Knutson et al., 2005) or did not use a full factorial design (Dreher et al., 2006). Based on these studies, we decided to independently manipulate anticipated reward magnitude and probability by presenting guessing scenarios with fixed probabilities and magnitudes. As in previous studies (Elliott et al., 2004; Zink et al., 2004; Knutson et al., 2005), volunteers were engaged in the task. Differences in motor behavior were included in the statistical model and thus are unlikely to confound the observed effects (Knutson et al., 2005).
During the anticipation phase, we were able to demonstrate a robust relationship between ventral striatal activation and reward magnitude. This finding is in accord with previous reports showing magnitude-dependent activation in the ventral striatum (Knutson et al., 2003). In addition, we observed a weaker main effect of probability showing more activation in the ventral striatum during the anticipation of more probable rewards consistent with a recent fMRI study (Abler et al., 2006). It is not surprising that these responses were observed in the ventral striatum rather than the midbrain, because the BOLD response reflects presynaptic input and processing (Logothetis et al., 2001). Therefore, spiking activity of dopaminergic midbrain neurons is expected to change the BOLD signal in areas to which these neurons project, such as the ventral striatum.
Prediction error-related responses
The observation that ventral striatal responses are stronger after delivery of a less likely gain is in agreement with the hypothesized role of the ventral striatum in encoding a reward-related prediction error. Previous studies have suggested that ventral striatal responses are correlated with a prediction error signal by either using Pavlovian or instrumental conditioning tasks (McClure et al., 2003; O'Doherty et al., 2003, 2004; Ramnani et al., 2004) or showing that the omission of reward leads to a deactivation in the ventral striatum (Pagnoni et al., 2002).
Our study confirms recent data (Abler et al., 2006) showing that in the ventral striatum, the positive response after reward delivery (i.e., gain trials) was greater if the reward was less likely to occur. Importantly, our data extend these findings in two ways. First, they show a stronger deactivation in loss trials when the loss was less likely to occur. Second, they show a decrease of the BOLD signal below baseline in loss trials. As a consequence of omitted but predicted rewards, a decrease in neuronal firing has been observed in dopaminergic midbrain neurons in nonhuman primates (Schultz et al., 1997). The ventral striatum is one target of those dopaminergic midbrain neurons, and one might expect less presynaptic input and processing in the ventral striatum after omitted rewards. This reduction of presynaptic input can lead to a negative BOLD signal, as has been shown recently (Shmuel et al., 2006).
Prediction error signal scaled by magnitude
Primate data have suggested that dopaminergic midbrain neurons should be able to signal the magnitude of a prediction error (Tobler et al., 2005). In agreement with these data, we observed a prediction error signal that was modulated not only by the probability of the reward but also by its magnitude. Intuitively, this modulation is biologically plausible, because it is important for an organism to register whether an error in prediction concerns a small or a large reward. We note that in a previous study, a magnitude-related outcome signal was observed in the dorsal rather than the ventral striatum (Delgado et al., 2003). However, the investigation of a prediction error signal was not the goal of that study.
Loss-related expected value and prediction error
We found a colocalization of EV− during anticipation and the associated prediction error during outcome in the amygdala, in accord with previous data (Breiter et al., 2001; Kahn et al., 2002; Glascher and Buchel, 2005), showing that the amygdala was involved in expressing predictions of aversive events.
Another study on classical conditioning using appetitive and aversive outcomes has shown the amygdala to play a role in signaling appetitive prediction errors and the lateral orbitofrontal and genual anterior cingulate cortex in prediction errors concerning aversive outcomes (Seymour et al., 2005), which seems to disagree with our findings.
However, this might be related to differences in the tasks used. In the study by Seymour et al. (2005), two specific conditioned stimuli (CSs) were predictive of either an appetitive (i.e., pain relief) or an aversive (i.e., pain exacerbation) outcome; the alternative outcome was no change in state. In contrast, our paradigm used mixed gambles, i.e., a certain stimulus configuration could be considered as a single CS that can predict either an appetitive (i.e., gain) or an aversive (i.e., loss) outcome. A gambling task analogous to the learning paradigm by Seymour et al. (2005) would be one in which the outcome is either a gain versus nothing or a loss versus nothing. Such a task has been used previously (Knutson et al., 2005), and the ventral striatum was found to express expected value. However, it should be noted that in designs in which the alternative to an appetitive outcome is no change in state, total EV and EV+ are identical. Therefore, such a paradigm cannot be used to disentangle the two possibilities.
Model for prediction error signal
Our data show that the same parts of the ventral striatum that signal gain-related expected value during reward anticipation code the prediction error at outcome. The peak activations for EV+ and the related prediction error are almost identical, and the activated clusters overlap at p < 0.001. Moreover, our data lend support to the notion that not total EV but only EV+ represents the “prediction” against which outcomes are compared that generate the ventral striatal prediction error signal.
A recent primate study (Bayer and Glimcher, 2005), as well as a study on Parkinson's disease (PD) patients (Frank et al., 2004), has already hinted at the possibility that only gain-related predictions and the associated prediction errors might be expressed in the ventral striatum. The primate study showed that dopamine spike rates in the postreward interval seem to only encode positive reward prediction errors, and dopamine was therefore attributed to the positive reward prediction error term of reinforcement learning models (Bayer and Glimcher, 2005). In addition, it has been shown that PD patients, who have a dopaminergic deficit in the midbrain, are better at learning to avoid choices that lead to negative outcomes than learning from positive outcomes. Dopamine medication reversed this bias and made patients more sensitive to positive than negative outcomes (Frank et al., 2004). This finding might be related to our observation that the ventral striatum, which receives dopaminergic inputs from the midbrain, is predominantly expressing gain-related predictions.
With respect to the neurotransmitter system involved in loss-related predictions, it has been advocated recently that the serotonergic system, which directly projects to the ventral striatum, is involved in this effect (Daw et al., 2002). However, an indirect effect through the amygdala, as would be expected from our data, is equally likely, given the presence of 5-HT receptors in the amygdala (Aggleton, 2000).
In summary, our data represent evidence for two dissociable value systems for gains and losses. The ventral striatum generates value predictions based on possible gains against which actual outcomes are compared. Conversely, the amygdala makes predictions concerning possible losses and, similar to the ventral striatum, compares these predictions against actual outcomes.
J.Y. was supported by the National Council of Technological and Scientific Development–CNPq, Brazil. J.G. was supported by the Studienstiftung des Deutschen Volkes. C.B. was supported by Volkswagenstiftung, the German Bundesministerium für Bildung und Forschung, and the Deutsche Forschungsgemeinschaft. We thank Eszter Schoell for helpful suggestions on a previous draft of this manuscript. We declare that we do not have any competing financial interest.
Correspondence should be addressed to Christian Büchel, NeuroImage Nord, Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Building S10, Martinistrasse 52, D-20246 Hamburg, Germany.
- Abler et al., 2006.
- Aggleton, 2000.
- Bayer and Glimcher, 2005.
- Berns et al., 2001.
- Bless et al., 1994.
- Breiter et al., 2001.
- Büchel et al., 1998.
- Coricelli et al., 2005.
- Daw et al., 2002.
- Delgado et al., 2003.
- Dreher et al., 2006.
- Edwards, 1955.
- Elliott et al., 2000.
- Elliott et al., 2004.
- Ernst et al., 2004.
- Evans et al., 1994.
- Fiorillo et al., 2003.
- Frank et al., 2004.
- Friston et al., 1995.
- Glascher and Buchel, 2005.
- Haber et al., 1995.
- Kahn et al., 2002.
- Kahneman and Tversky, 1991.
- Kahneman and Tversky, 2000.
- Knutson et al., 2000.
- Knutson et al., 2001a.
- Knutson et al., 2001b.
- Knutson et al., 2003.
- Knutson et al., 2005.
- LaBar et al., 1998.
- Logothetis et al., 2001.
- Machina, 1987.
- Matthews et al., 2004.
- McClure et al., 2003.
- Nichols et al., 2005.
- O'Doherty et al., 2004.
- O'Doherty et al., 2003.
- Pagnoni et al., 2002.
- Pasqualini et al., 1996.
- Petry, 1996.
- Ramnani et al., 2004.
- Rogers et al., 1999.
- Schultz and Dickinson, 2000.
- Schultz et al., 1997.
- Seymour et al., 2005.
- Sheehan et al., 1998.
- Shmuel et al., 2006.
- Tobler et al., 2005.
- Trepel et al., 2005.
- Tzourio-Mazoyer et al., 2002.
- Zink et al., 2004.