Abstract
During decision making, valuation of different types of rewards may involve partially distinct neural systems, but efficient choice behavior requires a common neural coding of stimulus value. We addressed this issue by measuring neural activity with functional magnetic resonance imaging while volunteers processed delayed and probabilistic decision options. Behaviorally, participants discounted both types of rewards in a hyperbolic manner, and discount rates, reflecting individual preferences, varied considerably between participants. Ventral striatum and orbitofrontal cortex showed a domain-general coding of subjective value regardless of whether rewards were delayed or probabilistic, strongly implicating these regions in the implementation of a common neural currency of value. In contrast, fronto-polar and lateral parietal cortex, as well as a region in the posterior cingulate cortex only correlated with the value of delayed rewards, whereas superior parietal cortex and middle occipital areas only represented the value of probabilistic rewards. These results suggest a mechanism for the neural coding of subjective value in the human brain that is based on the combination of domain-general and domain-specific valuation networks.
Introduction
A central aim in the field of decision neuroscience is to understand the neural mechanisms underlying human decision making and reward valuation. Although choice preferences are subjective, many studies have focused on objective properties of decision options, such as reward magnitude or probability. Subjective choice preferences have been extensively studied in the domains of probabilistic and intertemporal decision making (Green and Myerson, 2004; Kalenscher and Pennartz, 2008), i.e., delay discounting (DD) and probability discounting (PD), referring to the phenomena that the subjective value of rewards declines in a hyperbolic manner with increasing delay-to-reward and decreasing reward probability.
Until now, only two studies directly investigated the neural system representing the subjective value of discounted monetary rewards (Kable and Glimcher, 2007; Pine et al., 2009) and revealed that activity in ventromedial prefrontal cortex (PFC) and striatum correlated with subjective value, whereas posterior cingulate activity was only observed in the study by Kable and Glimcher (2007). Because these studies only investigated delay discounting, a number of open questions regarding the generality of the observed effects remain, in particular regarding risky decision making. For example, it is unclear whether the identified regions are specific for delayed reward valuation or are involved in coding for other aspects of rewards (Ballard and Knutson, 2009). Also, delayed and probabilistic rewards may be conceptually related (Hayden and Platt, 2007), because delay discounting may arise at least partly from the future being inherently more uncertain than the present (Rachlin et al., 1991). Therefore, the system identified by Kable and Glimcher, including the orbitofrontal cortex (OFC) (Wallis, 2007), which encodes the subjective pleasantness of stimuli (Kringelbach et al., 2003; O'Doherty et al., 2003a) and goal values during decision making (Padoa-Schioppa and Assad, 2006; Plassmann et al., 2007; Hare et al., 2008), may represent a generic system for subjective valuation of both delayed and probabilistic rewards (Tom et al., 2007).
Alternatively, different networks may integrate reward/delay (Kable and Glimcher, 2007) and reward/probability information, with possible overlap being confined to domain-general reward valuation networks. This would be in line with data that argue against a unitary process account of DD and PD (Green et al., 1999; Green and Myerson, 2004; Chapman and Weber, 2006).
Here we used functional magnetic resonance imaging (fMRI) to investigate this issue. In experiment 1, volunteers first completed a range of behavioral testing sessions of DD and PD, during which we confirmed both short-term (∼4 d) and long-term (∼4 months) stability of subjective preferences. Subsequently, they participated in DD and PD tasks during fMRI, which involved the possibility of real gains. We analyzed changes in the blood oxygenation level-dependent (BOLD) signal varying parametrically with the subjective value of delayed and probabilistic monetary rewards, allowing us to investigate both overlapping and distinct neural networks involved in subjective valuation during DD and PD. We then conducted a second behavioral experiment to show that subjects are indifferent between delayed and probabilistic rewards of equal subjective value (i.e., the intrinsic value of the two types of rewards is the same).
Materials and Methods
Subjects.
In total, data from 22 subjects (mean age of 26.3 years; eight male) were included in experiment 1. Subjects were reimbursed for participation and provided informed written consent, and the study procedure was approved by the local ethics committee.
Behavioral pretests.
To confirm short-term stability of discounting and to adequately compute choice offers for the fMRI session, all participants completed two behavioral sessions before scanning (median time between behavioral sessions was 9 d). All experiments were implemented using the Presentation Software package (Neurobehavioral Systems). Median time between the second behavioral session and fMRI was 4 d. Additionally, 13 subjects completed tests of long-term stability (conducted on average 4 months apart). During the behavioral tests, subjects made repeated choices between €20 available immediately and greater amounts available at different delays or probabilities. The amount of the delayed/probabilistic option was reduced in a stepwise manner after two successive choices of the delayed/probabilistic reward and increased in a stepwise manner after two successive choices of the immediate reward. The algorithm terminated as soon as the difference between the accepted/rejected amounts reached a delay/probability-specific cutoff value, ranging from €0.5 to €4.
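The adjusting-amount procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' Presentation script: the function name, the starting amount, the initial step size, and the rule of halving the step at each preference reversal are all assumptions, since the text specifies only the two-successive-choices rule and the cutoff criterion.

```python
def titrate_indifference(chooses_larger, start, step, cutoff,
                         floor=20.5, max_trials=500):
    """Estimate the indifference amount for one delay or probability.

    chooses_larger(amount) returns True if the (simulated) subject takes the
    delayed/probabilistic amount over the immediate 20 euros.
    """
    amount = start
    lowest_accepted = None    # smallest delayed/probabilistic amount chosen so far
    highest_rejected = None   # largest delayed/probabilistic amount refused so far
    last_choice = None
    run = 0                   # identical successive choices since the last adjustment
    for _ in range(max_trials):
        choice = chooses_larger(amount)
        if choice:
            lowest_accepted = amount if lowest_accepted is None else min(lowest_accepted, amount)
        else:
            highest_rejected = amount if highest_rejected is None else max(highest_rejected, amount)
        # Stop once accepted and rejected amounts bracket the reversal closely enough;
        # the estimate is the average of the two bracketing amounts.
        if lowest_accepted is not None and highest_rejected is not None:
            if lowest_accepted - highest_rejected <= cutoff:
                return (lowest_accepted + highest_rejected) / 2.0
        if last_choice is not None and choice != last_choice:
            step /= 2.0       # assumption: halve the step at each preference reversal
            run = 1
        else:
            run += 1
        last_choice = choice
        if run == 2:          # two successive identical choices trigger an adjustment
            amount = max(floor, amount - step) if choice else amount + step
            run = 0
    raise RuntimeError("no convergence within max_trials")


# Hypothetical deterministic subject who accepts any offer of at least 33 euros.
estimate = titrate_indifference(lambda a: a >= 33.0, start=50.0, step=8.0, cutoff=1.0)
```

With a noisy rather than deterministic subject, the accepted and rejected amounts can cross; the sketch then stops at the first crossing, which is one of several reasonable choices.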
Indifference amounts were calculated by averaging the amounts of the delayed/probabilistic options that included the point of preference reversal, corresponding to the amount at which subjects were indifferent between the immediate and the delayed/probabilistic option. Indifference amounts were then converted into proportions of the fixed reward, and Equations 1 and 2 were fit to these data to obtain discount rates k using Matlab (MathWorks):
$$SV = \frac{1}{1 + kD} \qquad (1)$$
$$SV = \frac{1}{1 + k\theta} \qquad (2)$$
where SV is subjective (discounted) value, D is delay in days, and θ is reward probability P following odds-against-winning transformation:
$$\theta = \frac{1 - P}{P} \qquad (3)$$
The best-fitting discount rate k describes an individual's choice behavior. In the case of DD, smaller values of k reflect patient behavior and less discounting of future rewards, and greater values of k reflect steeper discounting and thus more impulsive choice behavior. For PD, smaller values of k reflect the willingness to take risky choices, whereas larger values of k reflect risk aversion.
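As an illustration of how a discount rate k might be recovered from indifference data, the sketch below fits the standard hyperbolic form SV = 1/(1 + kx), with x being either the delay D or the odds against winning θ = (1 − P)/P. The authors used Matlab; the golden-section search over log10(k), the function names, the conversion of indifference amounts to fractions as 20/amount, and the example delays and probabilities are assumptions made for this sketch.

```python
import math


def odds_against(p):
    """Odds-against-winning transformation of a reward probability."""
    return (1.0 - p) / p


def hyperbolic_fraction(k, x):
    """Discount fraction for a delay D (days) or odds against theta."""
    return 1.0 / (1.0 + k * x)


def fit_discount_rate(xs, fractions, log_lo=-6.0, log_hi=3.0, iters=200):
    """Least-squares estimate of k via golden-section search over log10(k)."""
    def sse(log_k):
        k = 10.0 ** log_k
        return sum((hyperbolic_fraction(k, x) - y) ** 2 for x, y in zip(xs, fractions))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log_lo, log_hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)


# Hypothetical indifference amounts (euros) judged equal to 20 euros now,
# generated here from k = 0.01 so that the fit can be checked.
delays = [1, 7, 30, 90, 180]
amounts = [20.0 * (1.0 + 0.01 * d) for d in delays]
fractions = [20.0 / a for a in amounts]     # assumed conversion to proportions
k_dd = fit_discount_rate(delays, fractions)

# Hypothetical probabilities; fractions generated from k = 2.
probabilities = [0.9, 0.75, 0.5, 0.25, 0.1]
thetas = [odds_against(p) for p in probabilities]
pd_fractions = [hyperbolic_fraction(2.0, t) for t in thetas]
k_pd = fit_discount_rate(thetas, pd_fractions)
```

Searching on a logarithmic scale is a convenience here because discount rates span several orders of magnitude across subjects; the search assumes the error surface has a single minimum.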
fMRI task.
During scanning, participants made choices between a fixed, immediate reward of €20 and delayed or probabilistic offers of larger rewards. Based on the behavioral pretests, individual offers were computed for each participant to ensure that participants chose the delayed/probabilistic offer in ∼50% of trials. More specifically, the maximum amount of the delayed/probabilistic option was set to €80, and the minimum amount was set to €20.5. From this range of magnitudes, trials were constructed by selecting an equal, uniformly distributed number of offers with an estimated subjective value below and above the indifference point (based on the pretest data). In cases in which the indifference point was larger than €50, an equal number of trials with a subjective value below and above €50 were created. Delays and probabilities used (Rachlin et al., 1991) are listed in supplemental Table 1 (available at www.jneurosci.org as supplemental material).
Participants were instructed that the fixed, immediately available reward would not be displayed, and they would only be shown the alternative delayed or probabilistic offer. A green dot was shown for 500 ms (Fig. 1), signaling the start of the trial. Then, the delayed or probabilistic offer was shown for 2500 ms, followed by a red dot (jitter) that was shown for a random duration between 3 and 7 s, drawn from a uniform distribution. Then, a red “X” and a green check mark were shown (randomly assigned to either side of the screen). Participants pressed the red X to choose the fixed reward of €20 and the check mark to choose the delayed/probabilistic offer. After response feedback, another 3–7 s jitter preceded the start of the next trial. Delay and probability trials were randomly intermixed. Subjects completed two sessions, which each lasted ∼22 min and comprised 48 delay trials and 48 probability trials, yielding a total of 96 trials per condition for each subject. Before scanning, participants were told that one of their choices would randomly be selected after scanning and that they would receive the reward amount as an e-mail gift certificate for an online shop (www.amazon.de) with the stated delay/probability. The average amount that participants received was €30.1 (range of €20–70).
Figure 1. Schematic illustration of the paradigm for experiment 1. Subjects made repeated choices between a fixed certain and immediate €20 reward and larger amounts of money available later (delay discounting) or with a reduced probability (probability discounting).
Analysis of the behavioral data from the scanning session used the same procedures as during the pretests. Reaction times (RTs) from the scanning session were additionally analyzed as a function of subjective value. First, trials were split into those in which the delayed/risky option had a higher subjective value than the reference amount of €20 and those in which its subjective value was lower than the reference amount. The two-thirds of low-value trials with the lowest subjective values were then classified as “lower trials,” whereas the two-thirds of high-value trials with the highest subjective values were classified as “higher trials.” The remaining trials (in which subjective values were closest to the reference amount) were classified as “similar trials.”
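The three-way trial classification can be sketched as below. The function name, the rounding of the two-thirds cut (integer floor), and the handling of trials exactly equal to the reference amount (treated as similar) are assumptions, because the text does not specify them.

```python
def classify_trials(subjective_values, reference=20.0):
    """Label each trial 'lower', 'similar', or 'higher' relative to the reference."""
    labels = ["similar"] * len(subjective_values)
    # Trials below the reference, ordered from lowest subjective value upward.
    low = sorted((i for i, v in enumerate(subjective_values) if v < reference),
                 key=lambda i: subjective_values[i])
    # Trials above the reference, ordered from highest subjective value downward.
    high = sorted((i for i, v in enumerate(subjective_values) if v > reference),
                  key=lambda i: subjective_values[i], reverse=True)
    for i in low[:(2 * len(low)) // 3]:
        labels[i] = "lower"
    for i in high[:(2 * len(high)) // 3]:
        labels[i] = "higher"
    return labels


# Hypothetical subjective values (euros) for six trials.
example_labels = classify_trials([10.0, 12.0, 19.0, 21.0, 28.0, 30.0])
```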
fMRI data acquisition.
MRI data were acquired on a 3 T system (Siemens TIM-TRIO) using a 12-channel head coil. Five hundred volumes, aligned to the line connecting anterior and posterior commissures, were acquired for each session, and the first five volumes were discarded to allow for the BOLD signal to stabilize. Each volume comprised 40 slices with a voxel size of 2 × 2 × 2 mm and 1 mm gap (repetition time, 2.38 s; echo time, 25 ms). An additional magnetization-prepared rapid-acquisition gradient echo structural image was acquired for anatomical overlay (voxel size of 1 × 1 × 1 mm, 240 slices). Subjects viewed the experiment through a head-coil mounted mirror.
fMRI data analysis.
Data preprocessing and analysis were performed using SPM5 (Wellcome Department of Cognitive Neurology, University College London, London, UK). Functional images were slice time-corrected to the onset of the middle slice and spatially realigned using a six-parameter affine transformation. The high-resolution T1 image was then coregistered to the functional images and segmented into gray matter, white matter, and CSF using the VBM toolbox included in SPM5. Functional images were spatially normalized to Montreal Neurological Institute space using the normalization parameters obtained from the segmentation procedure and subsequently smoothed with a Gaussian kernel of 8 mm full-width at half-maximum.
Data analysis was performed using the general linear model (GLM) implemented in SPM5. The presentation of each type of option (delayed or probabilistic) was modeled by convolving the event train of stimulus onsets with the canonical hemodynamic response function separately for each session. In the model, for each event, a parametric regressor was included coding for the subjective value of the decision option, i.e., the objective amount multiplied by the experimentally derived discount fraction for this participant. Additional parametric regressors for reward probability/inverse delay-to-reward and reward amount were included in the model and orthogonalized with respect to the subjective value regressor. Apart from this model, we investigated three additional models. In the first one, the order of orthogonalization was changed such that the subjective value regressor was orthogonalized with respect to delay/probability and magnitude. Finally, we investigated two models including only reward magnitude or inverse delay-to-reward/reward probability as parametric regressors. Error trials and button presses were modeled separately. To account for residual variance caused by subject movement, the realignment parameters were included as additional regressors at the first level. For each subject, contrast images were computed for the parametric regressor coding for the subjective value of each option. We then entered these contrast images into a second-level random effects model using the flexible factorial design of SPM5. The model included a subject factor and the factor trial type (delayed/probabilistic).
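To make the role of orthogonalization concrete, the sketch below builds a subjective value regressor and then removes from the other parametric regressors whatever variance they share with it. This is only an illustration of the principle using plain Gram-Schmidt projection, not SPM5 code; the trial values, the discount rate, and the order in which inverse delay and amount are processed are assumptions.

```python
def mean_center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def orthogonalize(target, basis):
    """Remove from target its projection onto each (mutually orthogonal) basis vector."""
    residual = list(target)
    for vec in basis:
        norm = dot(vec, vec)
        if norm > 0.0:
            coef = dot(residual, vec) / norm
            residual = [r - coef * v for r, v in zip(residual, vec)]
    return residual


# Hypothetical delay trials for one subject with discount rate k = 0.01.
k = 0.01
amounts = [25.0, 40.0, 60.0, 80.0, 30.0]      # euros
delays = [7.0, 30.0, 90.0, 180.0, 1.0]        # days

# Subjective value = objective amount x subject-specific discount fraction.
sv = [a / (1.0 + k * d) for a, d in zip(amounts, delays)]
sv_reg = mean_center(sv)

# Later regressors keep only the variance not already explained by earlier ones.
inv_delay_reg = orthogonalize(mean_center([1.0 / d for d in delays]), [sv_reg])
amount_reg = orthogonalize(mean_center(amounts), [sv_reg, inv_delay_reg])
```

Because shared variance is assigned to whichever regressor comes first, reversing the order changes which effects are credited to subjective value, which is why the alternative ordering is examined as a separate model.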
For all analyses, the threshold was set to p < 0.05 corrected for multiple comparisons (based on the familywise error rate). Correction for multiple comparisons was performed using spherical search volumes centered at peak coordinates from previous studies. Correction for the posterior cingulate (Kable and Glimcher, 2007) and right OFC (Hare et al., 2008) was performed using 12 mm spheres. Correction for frontal pole (Addis et al., 2007) and left lateral parietal cortex (Kable and Glimcher, 2007) used 8 mm spheres, because the coordinates were ∼8 mm from the cortical surface (supplemental Table 3, available at www.jneurosci.org as supplemental material).
For display purposes, we used a threshold of p < 0.001, uncorrected, with 10 contiguous voxels throughout this report. Plots of contrast estimates for the parametric regressors were created using the toolbox rfxplot for SPM5 (Gläscher, 2009). All activations are shown projected onto the mean structural scan of all participants.
Results
Experiment 1
Long-term and short-term stability of discount rates
Long-term stability of discount rates was assessed in a subset of participants (n = 13). Discount rates were tested on average 119 d apart (range of 79–210 d) and showed good stability between testing sessions (test–retest DD, r = 0.69, p = 0.009; PD, r = 0.76, p = 0.0026) (Fig. 2a,c), providing strong evidence for a trait-like stability of individual preferences (Ohmura et al., 2006; Kable and Glimcher, 2007; Ballard and Knutson, 2009). Across all subjects (n = 22), discount rates from a behavioral pretest shortly before the fMRI session (median interval between this pretest and scanning was 4 d) were highly correlated with discount rates observed during fMRI (test–retest DD, r = 0.78, p = 0.000019; PD, r = 0.74, p = 0.00008) (Fig. 2b,d), indicating that the individual propensity for participants to be impulsive or risk-averse was well preserved during fMRI.
Figure 2. Discount rates (k parameters) showed long-term stability across 4 months (n = 13 subjects; a, delay discounting, p = 0.009; c, probability discounting, p = 0.0026) and between behavioral and fMRI sessions (n = 22 subjects; b, delay discounting, p = 0.000019; d, probability discounting, p = 0.00008).
Behavioral data during fMRI
Choice preferences during scanning were again well characterized by hyperbolic functions (median R²: DD, 0.81; PD, 0.93). The hyperbolic discounting model accurately predicted participants' choices (mean ± SD accuracy: DD, 84.82 ± 8.27%, one-sample t test vs chance, t(21) = 48.01, p < 0.001; PD, 86.75 ± 6.59%, one-sample t test vs chance, t(21) = 61.75, p < 0.001) and accuracy was comparable for DD and PD (paired t test, t(21) = −0.964, p = 0.346). Discount rates showed considerable between-subject variability [median (range) discount rates: DD, 0.007 (0.0003–0.061); PD, 4.232 (0.824–17.999)], and there was a negative but nonsignificant correlation between the DD and PD discount rates (r = −0.24, p = 0.28). The median discount function (based on the median indifference points across subjects) is depicted in Figure 3b. Individual, subject-specific discount functions are depicted in Figure 3c–e, illustrating the between-subject variability in choice preferences that was observed.
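One plausible way to score how well a fitted discount rate predicts choices is sketched below: the model is taken to predict the delayed/probabilistic option whenever its discounted value exceeds the €20 reference. The text does not describe the exact scoring rule, so this decision rule, the function names, and the example trials are assumptions.

```python
def predicted_choice(amount, x, k, reference=20.0):
    """True if the model predicts choosing the delayed/probabilistic offer.

    x is the delay in days (DD) or the odds against winning (PD).
    """
    return amount / (1.0 + k * x) > reference


def prediction_accuracy(trials, k, reference=20.0):
    """Fraction of trials on which the predicted and observed choices agree."""
    hits = sum(predicted_choice(amount, x, k, reference) == chose_offer
               for amount, x, chose_offer in trials)
    return hits / len(trials)


# Hypothetical delay trials: (amount in euros, delay in days, offer chosen?).
example_trials = [
    (30.0, 30.0, True),
    (25.0, 90.0, False),
    (80.0, 180.0, True),
    (22.0, 7.0, False),   # model predicts acceptance, subject declined
]
accuracy = prediction_accuracy(example_trials, k=0.01)
```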
Figure 3. Behavioral data from experiment 1 for DD (red) and PD (blue). a, RTs for trials with an option value lower, similar, or higher than the reference amount (see Materials and Methods). RTs were shorter for trials with higher-value options. b, Median discount functions from the fMRI sessions. c–e show single-subject data: subject 01 was close to the group median, subject 09 was the most impulsive subject, and subject 21 was the most patient subject. Single-subject panels show (from left to right) subjective (discounted) value as a proportion of objective value and fitted hyperbolic discount functions, monetary amounts subjectively equivalent to an immediate/certain reward of €20 (indifference amounts), and proportions of delayed/risky choices for the different delays/probabilities. (Note that missing data points, e.g., the missing 180 d data point in c, indicate that the subject never chose a reward with a delay of 180 d. Missing data points in b indicate that more than 3 subjects never chose a reward with the particular delay/probability.)
We analyzed RTs as a function of the subjective value of the delayed/risky option (lower, similar, or higher than the reference amount; see Materials and Methods) (Fig. 3a) in a condition (DD/PD) × value (lower/similar/higher) ANOVA. Reaction times were comparable for the two conditions (mean ± SD RT: DD, 791 ± 86 ms; PD, 809 ± 95 ms; main effect condition, F(1,21) = 0.022, p = 0.833). There was a main effect of value (F(1.912,40.161) = 20.731, p < 0.001) but no interaction with condition (condition × value interaction, F(1.947,40.161) = 1.794, p = 0.18), reflecting the fact that RTs were generally faster in higher value trials relative to lower and similar valued trials.
fMRI data
We then searched for brain regions in which the amplitude of the hemodynamic response showed a positive correlation with subjective (discounted) value. Subjective value was calculated by multiplying the objective amount of the delayed/probabilistic reward by the subject-specific discount fraction and was included as a parametric regressor in a general linear model using SPM5. Inverse delay-to-reward (for DD trials) and probability (for PD trials) in addition to the absolute amount were included as additional parametric regressors, which were orthogonalized with respect to subjective value.
Subjective valuation during delay discounting
Figure 4 (left) shows regions in which brain activity showed a positive correlation with the subjective value of delayed rewards. A complete list of activations can be found in supplemental Table 2 (available at www.jneurosci.org as supplemental material). We replicated findings previously obtained in a smaller sample (Kable and Glimcher, 2007), observing correlations with subjective value in posterior cingulate cortex [peak coordinates x, y, z (in mm): −2, −46, 32; z value = 4.33], medial PFC/frontal pole (−8, 52, 16; z value = 4.05), bilateral lateral parietal cortex (left: −58, −46, 36; z value = 5.72; right: 54, −42, 26; z value = 4.98), and left ventral striatum (−8, 4, −8; z value = 5.29). Figure 4 (right) and supplemental Table 3 (available at www.jneurosci.org as supplemental material) show regions in which the subjective value correlation was significantly greater for DD than PD. With the exception of the ventral striatum, the same set of regions showed this pattern, indicating that the regions identified by Kable and Glimcher (2007), apart from the ventral striatum, correlate better with the subjective value of delayed than of probabilistic monetary rewards.
Figure 4. Brain regions in which activity showed a positive correlation (display threshold, p < 0.001, uncorrected) with subjective value of delayed rewards (left) and that showed a significantly better correlation with subjective value during delay discounting than during probability discounting (right). The parameter estimate of the subjective value regressor was positive for DD, whereas it was ∼0 for PD in both FP/mPFC (mean ± SEM parameter estimate: DD, 1.59 ± 0.32; PD, −0.48 ± 0.32) and PCC (mean ± SEM parameter estimate: DD, 1.86 ± 0.42; PD, −0.57 ± 0.42). FP, Frontal pole; mPFC, medial prefrontal cortex; VS, ventral striatum; LPC, lateral parietal cortex; PCC, posterior cingulate cortex; L, left; R, right.
Subjective valuation during probability discounting
Figure 5 (left) shows that a network of regions, the most pronounced being located in the right superior/inferior parietal lobule (42, −38, 44; z value = 5.26) and the left middle occipital gyrus (−48, −62, −10; z value = 5.10), along with ventral striatum (−8, 4, −8; z value = 5.13), correlated with subjective value during PD (for a complete list, see supplemental Table 4, available at www.jneurosci.org as supplemental material). A subset of these regions showed significantly better correlations with subjective value during PD than during DD (Fig. 5, right) (supplemental Table 5, available at www.jneurosci.org as supplemental material), the most pronounced clusters being located in the right superior/inferior parietal lobule and left middle occipital gyrus.
Figure 5. Brain regions in which activity was positively correlated with the subjective value of the probabilistic option (display threshold, p < 0.001, uncorrected). Regions of the intraparietal sulcus, bilateral posterior parietal, prefrontal and inferior temporal cortices, as well as the ventral striatum showed this pattern (left). A similar set of regions, with the exception of the ventral striatum, showed a better correlation with subjective value during PD than during DD, including the intraparietal sulcus and middle occipital gyrus (right). Parameter estimates of the subjective value regressor were positive for PD and below or ∼0 for DD, for both the IPS (mean ± SEM parameter estimate: DD, −0.85 ± 0.38; PD, 2.92 ± 0.38) and MOG (mean ± SEM parameter estimate: DD, −0.11 ± 0.49; PD, 3.49 ± 0.49). IPS, Intraparietal sulcus; MOG, middle occipital gyrus; L, left; R, right.
A core network for subjective reward valuation
We then performed a conjunction analysis (Nichols et al., 2005) searching for regions that correlate with subjective value during both DD and PD. Note that this conjunction analysis requires that a given voxel exceeds the threshold in both contrasts independently. Left ventral striatum (−8, 4, −8; z value = 5.13; mean ± SEM parameter estimate: DD, 3.12 ± 0.39; PD, 2.95 ± 0.40) and right central OFC (26, 18, −16; z value = 3.62; mean ± SEM parameter estimate: DD, 1.68 ± 0.35; PD, 1.51 ± 0.35) coded for the subjective value of both delayed and probabilistic rewards, strongly implicating this network in domain-general reward valuation (Fig. 6) (supplemental Table 6, available at www.jneurosci.org as supplemental material). At a reduced uncorrected threshold of p < 0.005, this conjunction also revealed activity in a region of the ventromedial PFC (−4, 34, −6; z value = 3.09; mean ± SEM parameter estimate: DD, 1.27 ± 0.36; PD, 1.27 ± 0.36).
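The logic of requiring a voxel to exceed threshold in both contrasts can be sketched with a minimum statistic: the conjunction value at each voxel is the smaller of the two z values. This is consistent with the ventral striatum conjunction z value (5.13) equalling the smaller of its DD (5.29) and PD (5.13) peak values. The function names, the threshold of 3.09 (roughly the z value for one-tailed p = 0.001), and all voxel values other than the ventral striatum entry are illustrative assumptions.

```python
def conjunction_statistic(z_first, z_second):
    """Voxelwise minimum of two z maps."""
    return [min(a, b) for a, b in zip(z_first, z_second)]


def conjunction_mask(z_first, z_second, threshold):
    """True where both contrasts independently exceed the threshold."""
    return [z > threshold for z in conjunction_statistic(z_first, z_second)]


# First entry: ventral striatum peak z values reported in the text (DD 5.29, PD 5.13).
# The other two entries are invented: one delay-specific, one probability-specific voxel.
z_dd = [5.29, 4.00, 0.40]
z_pd = [5.13, 0.20, 5.00]

min_z = conjunction_statistic(z_dd, z_pd)
mask = conjunction_mask(z_dd, z_pd, threshold=3.09)
```

Only the first voxel survives, because the domain-specific voxels each fail in one of the two contrasts.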
Figure 6. Regions in which the correlation with subjective value was significant (display threshold, p < 0.001, uncorrected) for both DD and PD in left ventral striatum (VS; a) and right OFC (b). L, Left; R, right.
Additional models
For completeness, we report the results for the delay/probability and magnitude regressors orthogonalized with respect to subjective value in supplemental Tables 7–10 (available at www.jneurosci.org as supplemental material).
Results from an additional GLM in which the order of orthogonalization was changed (i.e., the subjective value regressor was orthogonalized with respect to delay/probability and magnitude) are provided in supplemental Tables 11 and 12 (available at www.jneurosci.org as supplemental material). We also investigated two additional GLMs including only single parametric regressors, one in which only inverse delay-to-reward/reward probability was included and one including only reward magnitude (Kable and Glimcher, 2007). The results from these analyses can be found in supplemental Tables 13–16 (available at www.jneurosci.org as supplemental material). Subjective value correlated better with the fMRI data than reward magnitude, inverse delay-to-reward, or reward probability alone in all of the above-mentioned regions from the primary GLM, supporting our interpretation of the fMRI data in terms of subjective value rather than other aspects of the rewards.
Experiment 2
Experiment 1 was based on the assumption that delayed and probabilistic rewards are equally valuable if their discounted value is the same. To directly test this assumption, we conducted an additional behavioral experiment.
Participants (n = 18, 13 also participated in experiment 1) made repeated choices between €20 available with a given delay and €20 available with a given probability (supplemental Methods, available at www.jneurosci.org as supplemental material). As in experiment 1, delays and probabilities were computed based on previous behavioral testing sessions such that, in half the trials, the delayed option had the greater subjective value, and in the remaining trials the probabilistic option had the greater subjective value. If subjective value is sufficient to account for choice behavior in this setting, this would indicate that the two types of rewards have comparable intrinsic values. In contrast, if either delayed or probabilistic rewards are systematically preferred in cases of similar discounted value, this would indicate that the two types of rewards have different intrinsic values.
We first computed the proportion of trials in which subjects chose the delayed reward, as a function of the difference in subjective (discounted) value between the delayed and the probabilistic options. Group data are plotted in Figure 7a, and, on average, subjects were equally likely to choose the delayed or the probabilistic option at a value difference of 0. To quantify this effect, we fit logistic functions to individual subject data (Fig. 7b), and the subjective value difference at which participants were indifferent between the delayed and probabilistic options was derived (FitzGerald et al., 2009). The mean indifference point is depicted in Figure 7c, showing that participants were indifferent between delayed and probabilistic rewards at a value difference of −0.003, which is not significantly different from 0 (one-sample t test against zero, t(17) = −0.074, p = 0.942), indicating no systematic preference for either type of reward.
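The derivation of an indifference point from a logistic fit can be sketched as follows. For a logistic function P(delayed) = 1/(1 + exp(−(b0 + b1·x))), the choice probability equals 0.5 where b0 + b1·x = 0, so the indifference point is −b0/b1. The fitting routine used by the authors is not stated; plain gradient ascent on the log-likelihood, the function names, and the example data are assumptions.

```python
import math


def fit_logistic(xs, ys, iters=20000):
    """Fit P(delayed) = 1 / (1 + exp(-(b0 + b1 * x))) by gradient ascent.

    ys may be 0/1 choices or choice proportions per value-difference bin.
    """
    # Step size 1/L, where L = 0.25 * sum(1 + x^2) bounds the curvature
    # of the log-likelihood, so each step is guaranteed not to overshoot.
    rate = 4.0 / sum(1.0 + x * x for x in xs)
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += rate * g0
        b1 += rate * g1
    return b0, b1


def indifference_point(b0, b1):
    """Value difference at which both options are chosen equally often."""
    return -b0 / b1


# Hypothetical subject: proportions of delayed choices per value difference
# (delayed minus probabilistic), generated from b0 = 0.3 and b1 = 1.5.
value_diffs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
proportions = [1.0 / (1.0 + math.exp(-(0.3 + 1.5 * x))) for x in value_diffs]

b0, b1 = fit_logistic(value_diffs, proportions)
point = indifference_point(b0, b1)
```

An indifference point below zero, as in this invented example, would indicate a slight preference for the delayed option at equal discounted value; the group mean reported above did not differ from zero.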
Figure 7. Behavioral data from experiment 2. Subjects (n = 18) made choices between delayed and probabilistic offers of €20. Plotted are proportions of delayed choices as a function of the subjective value difference (delayed − probabilistic) between the two options for group average data (a), logistic functions fitted to data from three individual subjects (b), and the value difference (based on individual logistic fits) at which subjects were indifferent between the delayed and probabilistic decision options (c).
Discussion
We investigated the neural coding of subjective value in the context of intertemporal (delay discounting) and risky decision making (probability discounting). Behaviorally, our data show that participants discounted monetary value over both time and probability in a hyperbolic manner. Furthermore, individual discount rates were highly stable over a time period of ∼4 months. Analysis of the neuroimaging data (experiment 1) revealed two main findings. First, a common system including ventral striatum and OFC coded for subjective value of both delayed and probabilistic monetary rewards, strongly implicating this network in providing the neural basis of a common neural currency of stimulus value. Second, we also identified regions in which value coding was specific for delayed or probabilistic rewards, and experiment 2 suggests that these effects are unlikely to be attributable to different intrinsic values associated with delayed and probabilistic rewards.
Ventral striatum and OFC activity correlated with subject-specific estimates of stimulus value, derived from a hyperbolic model of reward discounting. Orbitofrontal cortex is known to code for the value of decision options (Padoa-Schioppa and Assad, 2006; Hare et al., 2008; De Martino et al., 2009) as well as value differences between decision options (FitzGerald et al., 2009). OFC activation also increases as a function of subjective characteristics of stimuli, such as pleasantness or attractiveness (Kringelbach et al., 2003; O'Doherty et al., 2003a; Plassmann et al., 2008). A similar role has been suggested for the ventromedial PFC (Kable and Glimcher, 2007; Hare et al., 2008, 2009; Gläscher et al., 2009), and we also observed a common value signal in this region, albeit at a reduced uncorrected threshold. The role of the ventral striatum in reward valuation is somewhat more controversial. A number of studies have reported that ventral striatum activity scales with subjective value (Kable and Glimcher, 2007), subjective preferences for certain products (Knutson et al., 2007, 2008), or subject-specific distortions in the probability weighting function (Hsu et al., 2009). Neurophysiological data also suggest the existence of value signals in the striatum (Samejima et al., 2005; Lau and Glimcher, 2008). Conversely, it has been argued recently (Hare et al., 2008) that some of these findings might be attributable to the ventral striatum coding a prediction error rather than a value signal. Decision options with a high subjective value (i.e., a subjective value greater than average) may elicit a positive prediction error, whereas options with a low subjective value may elicit a negative prediction error (compared with the average option). 
These possibilities are difficult to differentiate experimentally (but see Hare et al., 2008), in particular because a prediction error signal requires the existence of a representation of stimulus value, which can be compared with the expected (predicted) value. Given that value and prediction errors are often highly correlated, it is not surprising that both OFC and ventral striatum have been reported previously to code for prediction errors (O'Doherty et al., 2003b). Nonetheless, based on previous findings implicating the OFC in valuation (Kringelbach et al., 2003; O'Doherty et al., 2003a; Plassmann et al., 2007, 2008) and the ventral striatum in prediction error coding (McClure et al., 2003; Abler et al., 2006; Hare et al., 2008), one possibility would be that, in our data, the OFC codes for stimulus value, whereas the ventral striatum codes a related prediction error (or alternatively a prediction error and a value signal). However, regardless of this distinction, the present findings show that the ventral striatum and OFC are part of an integrated system, jointly supporting reward processing in a manner that is independent of the precise nature of the decision option that is being evaluated.
In line with previous findings (Platt and Glimcher, 1999; Dorris and Glimcher, 2004), we observed valuation signals in the parietal cortex. However, as illustrated in supplemental Figure 1 (available at www.jneurosci.org as supplemental material), a more lateral parietal region coded for value during DD, whereas a more superior parietal region coded for value during PD, with minimal overlap. Our data therefore confirm a role of the parietal cortex in representing stimulus value but suggest possible differences as a function of decision option.
Regarding domain-specific valuation systems, our results indicate a specific role of the frontal pole and a subregion of the posterior cingulate cortex in coding for value in the context of delays. The posterior cingulate region that we identified as delay specific overlaps with a region identified previously (Kable and Glimcher, 2007), but some caution is warranted because other parts of posterior cingulate cortex showed probability-specific value coding (supplemental Table 5, available at www.jneurosci.org as supplemental material), and another study did not observe value signals in the posterior cingulate during DD (Pine et al., 2009). However, the latter finding may be attributable to methodological differences such as the fact that choices were always between two delayed rewards in that study. Nonetheless, the delay-specific regions overlap with a system implicated in episodic future thought (Addis et al., 2007; Schacter et al., 2007; Szpunar et al., 2007). In particular, the frontal pole is more involved in elaboration of future scenarios than elaboration of past scenarios (Addis et al., 2009) and shows a positive correlation with the amount of detail generated during future thinking (Addis and Schacter, 2008). One speculative hypothesis regarding these effects may thus be that this system supports the subjective valuation of delayed rewards by means of its role in generating internal simulations of future outcomes associated with the anticipated reward delivery. This would be in line with a recent study (Luhmann et al., 2008) that showed activity increases in posterior cingulate, frontal pole, and lateral parietal cortex with increasing anticipated delay until reward delivery. In line with these findings, another study (Weber and Huettel, 2008) suggested that posterior cingulate and ventral striatum are more involved in intertemporal choices, whereas regions in the posterior parietal cortex and lateral PFC are more involved in risky choices.
However, the interpretation of these data is limited because, in contrast to the present and another study (Kable and Glimcher, 2007), the authors did not investigate neural value signals. Our data, conversely, suggest that, over and above a possible role of these regions in mentally simulating impending waiting times during intertemporal choice (Luhmann et al., 2008), these regions may specifically integrate reward magnitude and delay (but not probability) into a neural code of subjective value.
In contrast to delay discounting, activity correlated better with subjective value during probability discounting in a range of regions (see supplemental Results, available at www.jneurosci.org as supplemental material), including the bilateral superior parietal cortex, middle occipital regions, and bilateral dorsolateral PFC. Because we did not have specific a priori hypotheses regarding probability-specific networks, we will focus on the most pronounced clusters. We note that the peaks in the right superior parietal cortex and the left middle occipital gyrus are remarkably close to regions implicated in a range of different tasks that involve the processing of numerosities and magnitudes, regardless of format of presentation (Pinel et al., 2004; Piazza et al., 2007), suggesting an abstract coding of magnitudes in these regions. Only during PD is it possible to compute expected values, and a preliminary interpretation would be that such computations may underlie the increased recruitment of neural structures involved in numerosity processing during PD.
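As a rough illustration of the expected value computation that is available only for probabilistic options, the sketch below compares expected value (amount times probability) with a hyperbolic probability-discounting model in its common odds-against form, SV = A / (1 + h * theta) with theta = (1 - p) / p. The parameter name `h` and all numerical values are assumptions made for illustration and are not taken from this study.

```python
def expected_value(amount, probability):
    """Objective expected value of a probabilistic reward."""
    return amount * probability

def probability_discounted_value(amount, probability, h):
    """Hyperbolic probability discounting over the odds against winning."""
    odds_against = (1.0 - probability) / probability
    return amount / (1.0 + h * odds_against)

# Hypothetical option: 40 units delivered with probability 0.5.
amount, probability = 40.0, 0.5

ev = expected_value(amount, probability)

# With h = 1 the hyperbolic model reduces to expected value.
sv_neutral = probability_discounted_value(amount, probability, h=1.0)

# With h > 1 the option is valued below its expected value (risk aversion).
sv_averse = probability_discounted_value(amount, probability, h=2.0)
```

Under this parameterization, subject-specific deviations from expected value are captured entirely by `h`, which is why a magnitude computation of the form amount times probability is a natural reference point for probabilistic, but not delayed, rewards.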
The observation of these extensive domain-specific valuation networks is in line with the hypothesis that distinct processes underlie DD and PD (Green and Myerson, 2004). Nonetheless, intertemporal and risky preferences may still be inversely related with respect to the personality traits of impulsivity and risk aversion (Hayden and Platt, 2007). This may potentially provide a common conceptual framework for future studies of discounting.
The present findings thus suggest a possible mechanism for the coding of subjective value in the human brain that is based on two distinct types of value signals. On the one hand, subjective value coding involves mechanisms that are specific to the type of decision option that is evaluated. In the present setting, these mechanisms included frontal pole, a subregion of the posterior cingulate, and lateral parietal cortex for the valuation of delayed rewards and, among others, middle occipital gyrus and superior parietal cortex for the valuation of probabilistic rewards. Both networks may encode domain-specific representations of a particular choice outcome. On the other hand, our data suggest that such different decision options may compete for behavioral control via a common neural value signal instantiated in the ventral striatum and OFC (Hare et al., 2008). Although, in particular in the light of the present data, such a common value signal appears to be a plausible basis for adaptive decision making, it should be pointed out that it is not a necessary prerequisite, because domain-specific value signals may also compete directly for the control of behavior.
Together, our data dissociate neural mechanisms of reward valuation into domain-specific and domain-general networks. Ventral striatum and OFC coded for subjective value in a domain-general manner, integrating results from domain-specific valuation systems into a common neural currency of value, through which decision options from different domains may compete for neurocognitive resources.
Footnotes
- This study was funded by the Department of Systems Neuroscience.
- Correspondence should be addressed to Dr. Jan Peters, Neuroimage Nord, Department of Systems Neuroscience, University Medical Center Hamburg–Eppendorf, Martinistraße 52, 20246 Hamburg, Germany. jpeters{at}uke.uni-hamburg.de