Abstract
A characteristic marker of impulsive decision making is the discounting of delayed rewards, demonstrated via choice preferences and choice-related brain activity. However, delay discounting may also arise from how subjective reward value is dynamically represented in the brain when anticipating an upcoming chosen reward. In the current study, brain activity was continuously monitored as human participants freely selected an immediate or delayed primary liquid reward and then waited for the specified delay before consuming it. The ventromedial prefrontal cortex (vmPFC) exhibited a characteristic pattern of activity dynamics during the delay period, as well as modulation during choice, that is consistent with the time-discounted coding of subjective value. The ventral striatum (VS) exhibited a similar activity pattern, but preferentially in impulsive individuals. A contrasting profile of delay-related and choice activation was observed in the anterior PFC (aPFC), but selectively in patient individuals. Functional connectivity analyses indicated that both vmPFC and aPFC exerted modulatory, but opposite, influences on VS activation. These results link behavioral impulsivity and self-control to dynamically evolving neural representations of future reward value, not just during choice, but also during postchoice delay periods.
Introduction
Intertemporal decision making requires evaluation of future outcomes that vary both in their magnitude and time of delivery. A reliable finding is that individuals discount the value of rewards that are delayed in time (Mischel et al., 1989; Frederick et al., 2002; Ainslie, 2005). This phenomenon, known as delay discounting, also exhibits substantial individual differences, reflecting a critical aspect of behavioral impulsivity (Madden and Bickel, 2009; Peters and Büchel, 2011). While impulsivity is characterized by steep discounting of delayed rewards and/or overvaluation of immediate rewards (IRs) (Kirby et al., 1999; Kable and Glimcher, 2007), patience may arise from self-control processes that bias attention toward, and/or enhance the valuation of, delayed rewards (McClure et al., 2004; Luo et al., 2009; Figner et al., 2010).
Prior studies of intertemporal decision making have identified brain regions associated with the time-discounted coding of subjective value (SV), such as ventromedial prefrontal cortex (vmPFC) and ventral striatum (VS), but have focused exclusively on activation occurring at the time of choice (for review, see Montague et al., 2006; Carter et al., 2010; Peters and Büchel, 2010a). In contrast, the examination of postchoice neural activity has been neglected, although the choice of a delayed reward is, by definition, always followed by a waiting period. This may be because previous work has used hypothetical rewards and/or delay timescales that were not amenable to examination in a controlled manner. The current study directly addresses these limitations by using a decision-making paradigm for primary rewards that were delivered and consumed after an experienced, unfilled delay interval of 30–60 s (Jimura et al., 2009, 2011) (see Figs. 1A,B). A key goal of this study was to monitor the dynamics of delay-related activation unconfounded by residual decision-related activity (cf. Berns et al., 2006; McClure et al., 2007).
An important reason for examining postdecision brain activity is that the subjective value of a future reward might be continuously estimated during this delay interval. Specifically, it seems likely that subjective value is represented in a temporally evolving manner, with value estimates increasing in a nonlinear fashion as the time to reward delivery decreases (Montague and Berns, 2002; Green and Myerson, 2004; Kalenscher and Pennartz, 2008; Rangel et al., 2008). Additionally, the postdecision period may engage a distinct dynamic valuation component related to anticipation of the chosen reward. In particular, some economic models postulate that anticipation of a future reward itself confers current utility (Loewenstein, 1987; Berns et al., 2006). This component, known as anticipatory utility (AU), entails complementary characteristics to the time-discounted goal value, because of the intertemporal trade-off between them (Loewenstein, 1987).
The examination of postdecision neural activity dynamics provides an important opportunity to explore how impulsivity and self-control are related to the dynamic representation of reward value. Specifically, the patience needed to select delayed over immediate rewards might be associated with representations of anticipatory utility during the postdecision period, with such representations being more prominent in patient individuals. Conversely, impulsivity in decision making might be reflected in a steeper rise of reward valuation while awaiting future chosen rewards.
Materials and Methods
Human participants performed an intertemporal decision-making task that, in previous behavioral studies, we have demonstrated evokes a robust and stable form of delay discounting across short timescales (i.e., 10–60 s) (Jimura et al., 2009, 2011). We used this paradigm in conjunction with fMRI to focus on individual differences in delay discounting and associated brain activity dynamics, testing whether impulsivity and self-control are reflected in the evolving pattern of subjective value representation occurring during the window between choice and reward delivery.
Participants.
Participants (N = 43; mean age, 23.0 years; range, 18–35 years; 20 male, 23 female) were right handed and free from any history of psychiatric or neurological disorders. Each participant provided written informed consent after additional screening for physical or medical conditions affecting eligibility for fMRI. The study protocol was approved in accordance with guidelines instituted by the Washington University Human Research Protection Office. Participants were compensated for their participation ($10 per hour for the behavioral session, $25 per hour for the fMRI session). Of the 45 participants recruited into the study, two were eliminated due to the small number of choices (<10) for the delayed option in the fMRI session. No participants who had taken part in our previous experiments (Jimura et al., 2009, 2011) were recruited.
Apparatus.
E-Prime programs (Psychology Software Tools) controlled the behavioral task as well as the delivery of liquid rewards via a syringe pump (SP210iw; World Precision Instruments). Liquid from two 60 ml plastic syringes mounted on the pump was merged into a single plastic tube that delivered it to the participant's mouth. The simultaneous use of two syringes allowed for a comfortable flow rate of 2.0 ml/s. The reward was delivered in 0.4 ml squirts but was experienced as a continuous flow. The amount of reward a participant received in a given trial was determined by the number of squirts.
Behavioral procedure.
The experiment consisted of two sessions (behavioral and fMRI) administered on separate days. In both of the sessions, participants were instructed not to drink any liquid for 4 h before the experiment. The behavioral session provided an estimate of each participant's delay-discounting rate. Previous work demonstrated that discounting behavior in this task is stable across sessions (Jimura et al., 2011). The results of the behavioral session were used for characterization of individual differences and optimization of the amount of immediate reward in the fMRI session (see fMRI session procedure, below). Before the experiment, participants were asked to choose one favorite drink that would serve as the reward from a list consisting of apple, orange, grape, grapefruit, and cranberry juices, lemonade, and water. No participants requested to change the reward drink in the fMRI session.
Behavioral task.
In both the behavioral and fMRI sessions, participants performed the intertemporal decision-making task that we developed previously (Fig. 1A,B) (Jimura et al., 2009, 2011). At the beginning of each trial, two alternatives were presented at left and right locations on the screen: one involving a larger amount of reward (20 or 40 squirts) available after a delay (10, 30, or 60 s), and the other a variable smaller amount available immediately. Participants were instructed to press one of two corresponding response buttons to indicate their preference. If the smaller, immediate amount was chosen, then a message appeared on the screen indicating that reward delivery could now begin. If the delayed reward was chosen, the participant had to wait to receive the reward. The choice period lasted from the onset of the choice presentation until the participant's response, with the delay period starting immediately after the response. During the delay, a fixation cross was presented at the center of the screen. The background color of the screen was blue while the remaining time was 30–60 s, and red while the remaining time was 0–30 s. This color change was intended to minimize errors in estimating the time of reward delivery in the long delay condition.
Behavioral paradigm. A, Participants made a choice between a larger amount of liquid available after a delay and a smaller amount of liquid available immediately. B, In each trial, participants waited for the indicated delay after making a decision, and then drank the liquid before the next trial. The procedure was identical in the behavioral and fMRI sessions. C, Model-based value representation of future reward during the delay period. The subjective value model (orange) represents a gradual increase from the start of the delay toward reward outcome, whereas the anticipatory utility model (blue) represents an inverse pattern, a gradual decrease. The dotted and solid lines indicate economic and BOLD convolution models, respectively.
At the time of reward delivery, participants saw a visual message indicating the reward was ready. Importantly, participants were able to control the rate of liquid flow. Reward delivery continued as long as the button was held down; if the button was released, delivery paused and then resumed when the button was pressed again. During reward delivery, the amount remaining (in squirts) was displayed below a red horizontal bar whose length corresponded to the number of squirts still available. After the participant finished drinking, a fixation cross was presented.
Task trials had a constant onset asynchrony of 150 s independent of participants' choices. This procedure was implemented to prevent participants from selecting a strategy of repeatedly choosing the immediate option to maximize the reward attained per unit time (Fig. 1B). Because of this constant onset asynchrony, the optimal strategy to maximize reward rate is to always choose the delayed option. The duration of each trial thus varied, depending on participants' choice and the length of the delay period, from ∼24 to ∼84 s. Then, a variable intertrial interval (ITI) occurred, comprising both fixation and distractor task periods distributed across the ITI. The distractor tasks were included to minimize the effects of previous decisions, i.e., encouraging the treatment of each trial as a “one-shot” decision. Specifically, each task trial was followed by four pseudorandomized distractor trials, two trials of a Sternberg-type working memory task (Jimura et al., 2010, 2011) and two trials of a second delay-discounting task using hypothetical monetary rewards (Jimura et al., 2011). To inform participants of the type of task to be performed on the upcoming trial, a cue (i.e., “MONEY,” “JUICE,” or “MEMORY”) was presented for 2 s before each trial. Data from these distractor trials, concerning the relationship between working memory and delay discounting, will be described in a forthcoming report.
Behavioral session procedure.
Each participant performed the task sitting on a chair. To estimate individuals' delay-discounting rates, the current study used three delay conditions (10, 30, 60 s) for the larger amount (40 squirts), and two delay conditions (10, 30 s) for the smaller amount (20 squirts) (Jimura et al., 2009). The conditions were presented pseudorandomly, and participants made three choices for each delay condition of the larger amount and two choices for each delay condition of the smaller amount. On the first trial of each delay condition, the choice was between the larger delayed amount and an immediate reward that was half of the delayed amount. For each delay condition, after the first trial, the amount of immediate reward was adjusted based on the participant's preceding choice. If the participant had chosen the smaller, immediate reward on the preceding trial, then the amount of the immediate reward was decreased by half (i.e., by 10 and 5 squirts for the 40 and 20 squirt conditions, respectively); if the participant had chosen the larger, delayed reward on the preceding trial, then the amount of the immediate reward was increased by half (Jimura et al., 2009, 2011). The adjustment amount was five squirts on the third trial of the 40 squirt condition. The subjective value of the delayed reward was estimated to be 1 ml (i.e., 2.5 squirts) more or less than the amount of immediate reward available on the last trial (third and second trials in the 40 and 20 squirt conditions, respectively), depending on whether the delayed or immediate reward had been chosen on that trial. Although satiety and timing constraints resulted in only a few trials at each delay and amount level for subjective value estimation, our previous work suggests that delay discounting in this paradigm is both robust and highly reliable at the individual level (test–retest reliability, r = 0.92) (Jimura et al., 2011).
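For illustration only, here is a minimal sketch (not the authors' E-Prime task code) of the adjusting-amount logic just described, for a single delay condition; the function name, choice encoding, and use of fractional squirt values are hypothetical.

```python
# Sketch of the adjusting-amount procedure: the first immediate offer is half
# the delayed amount, the adjustment step starts at half of that first offer
# and halves on each subsequent trial, and the indifference point is taken as
# 2.5 squirts above/below the final offer depending on the final choice.
def estimate_indifference_point(choices, delayed_amount=40):
    """choices: sequence of 'immediate' / 'delayed' responses, one per trial."""
    offer = delayed_amount / 2          # first trial: half the delayed amount
    step = delayed_amount / 4           # adjustment applied before trial 2
    for choice in choices[:-1]:         # adjust the offer after each choice but the last
        offer += step if choice == "delayed" else -step
        step /= 2
    # estimated subjective value of the delayed reward (in squirts)
    return offer + (2.5 if choices[-1] == "delayed" else -2.5)

# e.g., three choices in the 40-squirt condition: delayed, immediate, delayed
print(estimate_indifference_point(["delayed", "immediate", "delayed"]))  # 27.5
```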
Each experimental session began with two forced-choice trials and two practice trials, which familiarized participants with the choice procedure as well as with the rewards and delays. The syringes were refilled after every six trials, and the maximum amount of liquid that could be obtained in the session was 248 ml.
After the participants completed the task, they practiced drinking liquid through a plastic tube in a supine position in a mock scanner. They were instructed not to move their head while drinking, but were encouraged to move their jaws and use the muscles around the mouth to swallow the liquid. Importantly, participants could fully control liquid flow into their mouth by holding down and releasing the button (see Behavioral task, above), which enabled the flow rate to be kept at a comfortable level without excess movement. After they completed five practice trials (three 16 ml and two 8 ml trials), they were asked whether they had any difficulties in drinking, and whether the practice drink was comfortable. No participants reported difficulties in drinking in the supine position or significant changes in comfort relative to a sitting position.
fMRI session procedure.
During fMRI scanning, participants engaged in intertemporal decision-making trials that were similar to those of the behavioral session. The primary difference was that the choice options for each trial were prespecified (rather than adjusted across the session), but set in an individualized manner based on the discounting profile estimated from the behavioral session. Three conditions (60 s/40 squirts; 30 s/40 squirts; 30 s/20 squirts) were used to measure brain activity during the delay period. The value of the IR was systematically manipulated such that, across trials, it was set to one of four levels: 20, 40, 60, or 80% of the subjective value of the delayed reward, estimated for each participant based on their choice profile in the behavioral session. For example, for a participant whose estimated subjective value of the 40 squirt reward at 60 s was 25 squirts, the immediate option on those trials would be 5, 10, 15, or 20 squirts. This IR manipulation biased decisions toward delayed options, as the IR value was always smaller than the subjective value of the delayed reward, providing more opportunity to measure brain activity during the delay period. Additionally, because the IR value was parametrically manipulated, it allowed examination of the effect of this variable on choice performance and brain activation.
When drinking liquid rewards, the participants were encouraged to use jaw movements and mouth muscles for swallowing, but not to move their head (a skill that had been practiced previously in a mock scanner setup; see Behavioral session procedure, above). Analyses of drinking related movements and their effects on image quality indicated that these were minimal (see Liquid delivery in fMRI session, below).
Each scanning run included six trials, with three or four scanning runs performed in the session, depending on the satiety of the participant. All trial variables (IR value, delay, amount of delayed rewards) were pseudorandomly intermixed within and across scans, with the constraint that the two middle IR levels were presented twice for each condition. Each experimental session began with two forced-choice trials and two practice trials, which familiarized participants with the choice procedure as well as with the rewards and delays. The maximum amount of liquid that could be obtained in the fMRI session was 344 ml.
It is important to note that the experimental design and procedure minimized any trial-by-trial learning and adjustment of choice biases across trials within the session. First, at the time of the scanning session, participants were highly familiar with the decision-making paradigm and associated reward consumption, having already experienced a previous behavioral session, mock-scanner training, and presession familiarization trials. Second, as described above, we previously found choice preferences to be highly stable across sessions (Jimura et al., 2011). Finally, the ITI (duration from the offset of liquid delivery to the onset of the next choice) was long (>1 min) and filled with distractor trials that disrupted memory for previous trial choices and encouraged participants to treat each trial as a one-shot decision.
Imaging procedures.
fMRI scanning was conducted on a whole-body Siemens 3T Trio System. A pillow and tape were used to minimize head movement in the head coil. Headphones dampened scanner noise and enabled communication with participants. Both anatomical and functional images were acquired from each participant. High-resolution anatomical images were acquired using an MP-RAGE T1-weighted sequence [repetition time (TR), 9.7 ms; echo time (TE), 4.0 ms; flip angle (FA), 10°; slice thickness, 1 mm; in-plane resolution, 1 × 1 mm²]. Functional [blood oxygen level-dependent (BOLD)] images were acquired using a gradient echo-planar imaging sequence (TR, 2.0 s; TE, 27 ms; FA, 90°; slice thickness, 4 mm; in-plane resolution, 4 × 4 mm²; 34 slices) oriented parallel to the anterior–posterior commissure line, allowing complete brain coverage at a high signal-to-noise ratio. Each functional run involved 512 volume acquisitions.
Liquid delivery in fMRI session.
Liquid was delivered from the outside of the scanner room through a plastic tube, which enabled participants to consume the liquid during fMRI administration. The liquid consumption period partially covers the BOLD window associated with the (end of) delay period (Fig. 1C). Head motion was in fact greater during liquid consumption compared to button pressing during the money-discounting distractor task. However, the absolute magnitude of head motion during consumption was small, with maximum translations of (0.06, 0.09, 0.28 mm) ± (0.05, 0.05, 0.19 mm) (mean ± SD), and maximum rotations of (0.25, 0.08, 0.06) ± (0.17, 0.06, 0.03) degrees, along x-, y-, and z-axes, respectively.
Nonetheless, jaw movements are known to produce significant instability in echoplanar images, based on findings from nonhuman primate scanning (Keliris et al., 2007). However, inspection of our pilot scans showed that such effects were minimal, owing to the instructions and practice participants had with drinking in a supine position. Indeed, EPI quality during the consumption period was comparable to that during the fixation period, suggesting that such instability was not present in the current study. Finally, when examining BOLD activation during the consumption period, robust activations were observed, with prominent foci in the somatotopic mouth regions of primary motor and sensory cortices, and in primary gustatory cortex. Given these results, we felt confident in assuming that movement-derived contamination during liquid consumption was minimal in the current study. Detailed results and activation maps of movement-related effects and consumption-related activity can be provided upon request.
Assessment of individual differences in delay-discounting effects.
From the behavioral session, the main effect of delay discounting was analyzed, first by conducting a repeated-measures one-way ANOVA with delay as a factor. A planned contrast on (log-) delay was then tested (Jimura et al., 2009, 2010). Participants were sorted into three groups [steep (STP), shallow (SHL), and intermediate (INT)] based on visual inspection of their discounting rates. The group effect was formally tested by entering group as an additional factor in the ANOVA. This test for group differences is more stringent than tests based on model fits of the data because it directly tests delay discounting (i.e., that the delay effect on subjective value interacts with group) without relying on any assumption regarding the direction or form of the effect.
Individual differences in delay discounting were quantified by calculating the area under the discounting curve (AuC) (Myerson et al., 2001; Sellitto et al., 2010; Jimura et al., 2011). The AuC represents the area under the curve of observed subjective values plotted as a function of delay; more specifically, the AuC is calculated as the sum of the trapezoid areas under the indifference points, normalized by amount and delay (Myerson et al., 2001). Both subjective value and delay are normalized for purposes of calculating the AuC, which, as a result, ranges between 0.0 (maximally steep discounting) and 1.0 (no discounting). It has been argued that the AuC is the best measure of delay discounting to use for individual difference analyses, because it is theoretically neutral (i.e., assumption free) and also psychometrically reliable (Myerson et al., 2001).
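As a concrete illustration, a minimal sketch of this computation is shown below; the indifference points are invented, and the inclusion of a zero-delay anchor point (value equal to the full amount) is an assumption about the exact implementation.

```python
# Sketch: area under the discounting curve (AuC) from indifference points,
# in the spirit of Myerson et al. (2001). Delay is normalized by the maximum
# delay and subjective value by the nominal amount; the trapezoid areas
# between successive delays are then summed.
import numpy as np

def discounting_auc(delays, indifference_values, amount):
    d = np.asarray(delays, dtype=float) / max(delays)           # normalized delay
    v = np.asarray(indifference_values, dtype=float) / amount   # normalized value
    d = np.concatenate(([0.0], d))   # assumed anchor: zero delay,
    v = np.concatenate(([1.0], v))   # undiscounted value
    return np.trapz(v, d)            # 0.0 = maximally steep, 1.0 = no discounting

# hypothetical 40-squirt indifference points at 10, 30, and 60 s delays
print(discounting_auc([10, 30, 60], [34, 29, 24], amount=40))
```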
An alternate approach to characterizing the delay-discounting profile is to fit the discount function (either at the group or individual level) to a specific theoretical model. For example, the hyperbolic model of delay discounting, which is very popular and well validated (Rachlin et al., 1991; Green and Myerson, 2004; Kable and Glimcher, 2007), specifies that delay discounting can be characterized by the k parameter in the function A/(1 + kD), where A is amount and D is delay. The current study used AuC rather than k to characterize individual differences in delay discounting, because AuC is more psychometrically stable than k (Myerson et al., 2001) (see above). Nevertheless, the two measures tend to be highly similar in their characterization of individual differences in delay-discounting profiles. Indeed, this assumption was validated by a Spearman rank-order correlation coefficient of 0.99 between individuals' discounting factor k and AuC in the current study. We did fit the group average discounting function with the hyperbolic model (using maximum likelihood estimation) so that a group-level k value could also be estimated; this value was subsequently used for model-based fMRI analyses of delay-period activity dynamics. However, the k value was not used in any individual difference analyses.
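For orientation, the sketch below fits the hyperbolic function to group-mean indifference points. It uses ordinary least squares via SciPy's curve_fit rather than the maximum likelihood procedure reported here, and the data points are invented.

```python
# Sketch: fit SV = A / (1 + k*D) to group-mean indifference points.
import numpy as np
from scipy.optimize import curve_fit

delays = np.array([10.0, 30.0, 60.0])     # s
mean_sv = np.array([34.0, 30.0, 25.0])    # hypothetical group means (squirts)
A = 40.0                                   # delayed amount (squirts)

def hyperbolic(D, k):
    return A / (1.0 + k * D)

(k_hat,), _ = curve_fit(hyperbolic, delays, mean_sv, p0=[0.01])
print(k_hat)   # group-level discount rate estimate (per second)
```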
AuC was smaller in the smaller amount (20 squirts) condition than in the larger amount (40 squirts) condition (t(42) = 2.31; p < 0.05; based on two choices) (Jimura et al., 2009), replicating the amount effect observed in our previous report (Jimura et al., 2009). However, AuCs for the larger and smaller amounts were also correlated across participants (r = 0.61, t(41) = 4.91, p < 0.001), suggesting that individual differences in AuC were consistent across amounts and thus largely unrelated to the amount effect. Thus, for each participant, we averaged the AuCs for the larger and smaller amount conditions, and this averaged AuC was used in the imaging analysis to represent individual differences in delay discounting. Finally, it is important to note that AuC values and assignment to discounting group were determined solely by performance in the behavioral session; therefore, individual differences analyses of the fMRI data (which were based on AuC) were unbiased.
Imaging analysis.
All functional images were first temporally aligned across the brain volume, corrected for movement using a rigid-body rotation and translation correction, and then registered to the participant's anatomical images to correct for movement between the anatomical and functional scans. The data were then intensity normalized to a fixed value, resampled into 3 mm isotropic voxels, and spatially smoothed with a 9 mm full-width at half-maximum Gaussian kernel. Participants' anatomical images were transformed into standardized Talairach atlas space (Talairach and Tournoux, 1988) using a 12-dimensional affine transformation. The functional images were then registered to the reference brain using the alignment parameters derived for the anatomical scans.
A general linear model (GLM) approach was used to separately estimate parameter values for each event occurring in the task. Events were convolved with a canonical hemodynamic response function. The focus of the analysis was on trials in which the delayed option was chosen. Consequently, both the distractor tasks and choice trials in which the immediate option was chosen were coded as events of no interest and not analyzed further.
On delayed option trials, a set of independent regressors coded for each of three trial periods: choice, delay, and reward delivery. The choice period was estimated by an event beginning with visual presentation of choice options and ending when the choice response was made. The duration of this period was thus equivalent to the choice reaction time. An additional parametric regressor during this period was used to code the IR value associated with the trial (four levels, normalized). Reward delivery was estimated with two events, a first transient event of no interest that coded the period during presentation of the instruction screen that indicated liquid was available, and a second longer event that coded (with a boxcar function) when the liquid was consumed (beginning with the first button press and ending when liquid delivery was completed).
The delay period before reward delivery was estimated by an event coded with two model-based regressors (Fig. 1C; additionally, on 60 s delay trials, a transient event of no interest was used to code the change in display color that occurred at the 30 s delay point). The first modeled a hypothesized dynamic increase in subjective value across the delay period, consistent with many neurocomputational and neuroeconomic models (Montague and Berns 2002; Berns et al., 2006; Kalenscher and Pennartz, 2008). We used a hyperbolic function to capture this pattern of delay-related increase, since this function has been used successfully in many theoretical accounts of delay discounting (Rachlin et al., 1991; Green and Myerson, 2004). To model delay-related SV dynamics with this function, we used SV = 1/(1 + kt), where t represents the time remaining until reward delivery. The k parameter value was taken from the group mean discounting data in the behavioral session to provide a more stable fit of the model to delay-period activity. As illustrated in Figure 1C, the subjective value function gradually increases from the start of the delay toward the reward outcome. The second regressor coded AU, which reflects the positive utility derived from anticipation of a future reward (Loewenstein, 1987). Previous empirical and theoretical studies have postulated that AU is a complementary construct to SV, and thus maximal at the start of the waiting period and decreasing as the time to reward delivery grows near (Loewenstein, 1987; Berns et al., 2006). For simplicity, and to reflect the complementarity between AU and SV, we defined AU = 1 − SV. Importantly, although SV and AU are anticorrelated, when convolved with a hemodynamic response function they produce dissociable BOLD regression functions (Fig. 1C). Specifically, the correlations between the SV and AU regressors were 0.18 and 0.51 for the 60 and 30 s conditions, respectively, allowing reasonable dissociation (cf. Otten et al., 2002). We also empirically double-checked this dissociability in separate control GLM analyses in which only one of the two components was modeled. Highly similar results were obtained when compared with the primary analyses, confirming that multicollinearity was not an issue.
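To make the construction of these model-based regressors concrete, the sketch below (not the authors' analysis code) builds the SV and AU time courses for a 60 s delay and convolves them with a standard double-gamma HRF; the HRF parameterization and the sampling grid are assumptions.

```python
# Sketch: delay-period SV and AU regressors for a 60 s delay, sampled at the
# TR and convolved with a canonical double-gamma HRF.
import numpy as np
from scipy.stats import gamma

TR = 2.0        # s, from the imaging parameters
k = 0.015       # group-level hyperbolic discount rate (per second)
delay = 60.0    # s, long-delay condition

t = np.arange(0.0, delay, TR)            # time elapsed since delay onset
t_remaining = delay - t                  # time until reward delivery
sv = 1.0 / (1.0 + k * t_remaining)       # rises hyperbolically toward the outcome
au = 1.0 - sv                            # anticipatory utility: complement of SV

def canonical_hrf(tr, duration=32.0):
    """Standard double-gamma HRF (peak ~5 s, undershoot ~15 s), unit area."""
    x = np.arange(0.0, duration, tr)
    h = gamma.pdf(x, 6) - gamma.pdf(x, 16) / 6.0
    return h / h.sum()

hrf = canonical_hrf(TR)
sv_reg = np.convolve(sv, hrf)[: len(t)]  # BOLD-convolved SV regressor
au_reg = np.convolve(au, hrf)[: len(t)]  # BOLD-convolved AU regressor

# Before convolution SV and AU are perfectly anticorrelated; after convolution
# they are not, which is what allows the GLM to dissociate them.
print(np.corrcoef(sv_reg, au_reg)[0, 1])
```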
It is worth noting that the form of the SV and AU representations incorporated in the analysis reflects a theoretical assumption, as some alternative models have postulated other functions to describe delay-discounting phenomena, such as exponential (Rachlin et al., 1991; Green and Myerson, 2004; Rangel et al., 2008) or subadditive (Read, 2001; Kable and Glimcher, 2010), rather than hyperbolic. Likewise, some accounts of AU postulate a linearly decreasing (Berns et al., 2006) or nonmonotonic delay function (i.e., peaking at some mid-delay point) (Loewenstein, 1987). Furthermore, some computational models do not even postulate that value representations are continuously maintained during delay periods (e.g., Brown et al., 1999; O'Reilly, 2006), and/or postulate that value updates occur only in a cue-triggered fashion (O'Doherty et al., 2003; Redish, 2004; Seymour et al., 2004). Unfortunately, in the current data set we did not have sufficient statistical power to appropriately test and compare these alternative models in terms of their respective fits to the data. Thus, our use of the hyperbolic SV model, and the complementary AU model, is one of convenience, since these capture the broad qualitative characteristics of the functions we are interested in; we do not make any strong claims about their fit. Nevertheless, given the theoretical importance ascribed to distinctions between different discounting models, future research efforts could optimize the experimental design pioneered here to specifically focus on model comparison.
Parameter estimates for each subject were submitted to a group analysis using a voxelwise random-effects model. A whole-brain exploratory analysis was first performed to identify brain regions exhibiting subjective value and anticipatory utility effects during the delay period. Specifically, at each voxel, parameter estimates were identified that were significantly greater than zero at the group level (i.e., increased relative to baseline; p < 0.001), using cluster size thresholding to correct for multiple comparisons via the AlphaSim Monte Carlo procedure (http://afni.nimh.nih.gov/afni/). Significant voxel clusters were identified that surpassed a whole-brain corrected threshold of p < 0.05. Since this approach identified a large cluster that included both the vmPFC and VS (see Results) (Table 1), the VS effect was tested separately within an anatomically defined region of interest in VS created from the TTatlas+tlrc dataset (http://afni.nimh.nih.gov/afni/doc/misc/afni_ttatlas). Specifically, the VS was defined by voxels that fell within the region labeled as nucleus accumbens. Significance was then assessed at a threshold of p < 0.05, corrected within the VS volume.
Brain regions showing significant effects of subjective value and anticipatory utility during delay period
The analysis of IR value effects aimed to test whether the regions showing delay-period effects also showed choice-related effects in intertemporal decision making, as the IR manipulation provides a good estimate of decision value. Accordingly, to test for IR value effects, voxel clusters (p < 0.05, uncorrected) were identified within the regions of interest (ROIs) that showed delay effects. The vmPFC and VS ROIs were defined as regions showing the subjective value effect (p < 0.05, uncorrected), and anterior PFC (aPFC) ROIs were defined as regions showing the anticipatory utility effect (p < 0.05, uncorrected). Then, the size of each cluster was tested, correcting for multiple comparisons within each ROI using a Monte Carlo procedure. Significant clusters were then reported using a volume-corrected threshold of p < 0.05. For individual difference effects, voxelwise Pearson correlation coefficients were computed between behavioral measurements of the delay-discounting profile (i.e., AuC) and BOLD GLM parameter estimates for the choice period, IR value, delay period, and reward delivery effects. The significance of these voxel clusters was assessed using the same method as just described for IR value (for similar approaches, see Jimura and Braver, 2010; Jimura et al., 2010).
To illustrate the subjective value and anticipatory utility effects and their individual differences, the delay-related time courses were directly estimated (rather than fit to a specific model function; Figs. 3, 4). This was done via an independent first-level GLM analysis that removed the subjective value and anticipatory utility regressors from the model. After removing these regressors, the residual time courses were extracted from the delay period to visualize the effects that were present during this period. Before visualization, the time series was first subjected to a bandpass filter (high cut, 15 s; low cut, 150 s) to remove transient activity and low-frequency signal drifts.
Functional connectivity analysis: primary model.
A multilevel mixed effect general linear model (Raudenbush and Bryk, 2002) was used to examine functional connectivity relationships between vmPFC, VS, and aPFC. This type of multiple regression analysis does not directly test the causal pathway linking the regions, but instead assumes a particular connectivity pattern in which VS receives inputs from aPFC and vmPFC. We think this pattern is more plausible than other logically possible patterns. Specifically, as suggested by previous work, VS may represent the reward-related estimation error or discrepancy (Hare et al., 2008; Rolls et al., 2008; Peters and Büchel, 2010b; Daw et al., 2011) between the representation of SV signaled from vmPFC (Kable and Glimcher, 2007; Ballard and Knutson, 2009; Sellitto et al., 2010) and the current utility due to anticipation (i.e., the AU) signaled by aPFC (Koechlin and Hyafil, 2007; Schacter et al., 2007; Glimcher, 2009; Peters and Büchel, 2010b; Benoit et al., 2011).
The model included two levels, one for within-subject effects (i.e., IR value), and the other for between-subject effects (i.e., AuC). The within-subject level was modeled as follows:
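One plausible rendering, inferred from the predictors described in the following sentence (the coefficient numbering is an assumption, and subject indices and error structure are simplified), is

$$X_{\mathrm{VS}} = \beta_0 + \beta_1 X_{\mathrm{aPFC}} + \beta_2 X_{\mathrm{vmPFC}} + \beta_3 Y_{\mathrm{IR}} + \beta_4\,(Y_{\mathrm{IR}} \times X_{\mathrm{aPFC}}) + \beta_5\,(Y_{\mathrm{IR}} \times X_{\mathrm{vmPFC}}) + \varepsilon \tag{1}$$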
where β values indicate regression coefficients, X values indicate brain activity of each trial (see below), Y_IR denotes the amount of the IR on each trial, and ε is an error term. In this level, the VS activation was predicted by the aPFC and vmPFC activations (i.e., X_aPFC and X_vmPFC), the immediate reward (Y_IR), and their interactions with IR (i.e., Y_IR × X_aPFC and Y_IR × X_vmPFC).
Then, the higher-level between-subject effects were modeled as follows:
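A plausible form of this level, consistent with the description below, is sketched here (the equation numbering, chosen to match the references to Equations 4 and 5 in the text, and the omission of equations for the intercept and IR-only terms are assumptions):

$$\beta_1 = \gamma_{10} + \gamma_{11}\,\mathrm{AuC} \tag{2}$$
$$\beta_4 = \gamma_{40} + \gamma_{41}\,\mathrm{AuC} \tag{3}$$
$$\beta_2 = \gamma_{20} \tag{4}$$
$$\beta_5 = \gamma_{50} \tag{5}$$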
where γ values indicate regression coefficients that decompose the β values modeled in the lower-level model (Equation 1) into AuC-related and constant terms. In this level, the aPFC-related parameters were further predicted by the degree of delay discounting (AuC) of individuals. Because vmPFC did not show any individual differences effect of AuC (see Results), AuC-related influences were not included in Equations 4 and 5.
The ROIs were defined in a manner that was unbiased with respect to these effects: VS was defined anatomically in the left hemisphere based on the TTatlas+tlrc dataset (see Imaging analysis, above); aPFC was defined in the left hemisphere as a sphere with a 6 mm radius centered on the peak voxel of mean signal magnitude for the group-level AU effect; and vmPFC was defined similarly, but for the SV effect (for a similar approach, see Jimura et al., 2010).
Then in those ROIs, the fMRI time courses relative to fixation baseline were extracted from a 92 s epoch (62 s in short delay trials): 62 s (32 s) before the start of reward delivery to 30 s after the start of reward delivery. This time window fully covered the trial period from the onset of choices to when liquid consumption was complete. The activation data were high-pass filtered (cutoff, 150 s) to remove low-frequency signal drift. In the first analysis, all of the time points within the epoch were averaged together to create a single value for each trial. These values were extracted for each ROI, in each trial, and for each participant, and then included into the model, along with relevant coefficients coding the IR value for that trial and the AuC for that participant.
In a separate analysis, different models were constructed for each trial period separately (choice, delay, reward delivery). The choice period was defined as an epoch spanning from 62 s (or 32 s in short delay trials) to 52 s (or 22 s in short delay trials) before the start of reward delivery. The delay period was defined as 50 s (20 s) before the start of reward delivery to the start of reward delivery. The reward delivery period was defined as 2 s after the start of reward delivery to 30 s after the start of reward delivery. In the delay and reward delivery models, IR value effects were excluded as coefficients (under the assumption that IR value did not influence these epochs). For each of the models, unknown parameters were all simultaneously estimated using the lme4 package in R (http://www.r-project.org).
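The authors estimated these models with lme4 in R; purely as an illustrative stand-in, the sketch below specifies an analogous (simplified) model in Python with statsmodels, treating the cross-level AuC terms as fixed-effect interactions and using only a per-subject random intercept. The file and column names are hypothetical.

```python
# Sketch: simplified multilevel model predicting trial-wise VS activity from
# vmPFC and aPFC activity, IR value, and their AuC interactions.
import pandas as pd
import statsmodels.formula.api as smf

# one row per delayed-choice trial, with columns:
# subj, vs, vmpfc, apfc, ir (normalized IR level), auc (per-subject AuC)
df = pd.read_csv("trial_roi_activity.csv")   # hypothetical data file

model = smf.mixedlm(
    "vs ~ vmpfc + apfc + ir + vmpfc:ir + apfc:ir + apfc:auc + apfc:ir:auc",
    data=df,
    groups=df["subj"],        # random intercept per participant
)
result = model.fit()
print(result.summary())
```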
Functional connectivity analysis: control models.
To examine the specificity of model predictors in the multilevel mixed-effect analysis, the original model (i.e., VS activation influenced by aPFC and vmPFC inputs, and further modulated by IR and the AuC: Model I; see Functional connectivity analysis: primary model, above) was modified in the following ways and then tested.
In the first modification (Model II), vmPFC activation and interaction effects involving vmPFC were eliminated from the original model. In the second modification (Model III), aPFC activation and interaction terms were instead eliminated from the original model. These control models allowed us to examine whether the observed effects were due to multicollinearity between the vmPFC and aPFC effects. They also allowed us to test whether inclusion of the aPFC and vmPFC improved model fit in terms of Akaike's information criterion (AIC; Akaike, 1974) and the Bayesian information criterion (BIC; Schwarz, 1978).
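For reference, both criteria penalize the maximized log-likelihood by model complexity, so lower values indicate a better trade-off between fit and parsimony; with p free parameters, n observations, and maximized likelihood $\hat{L}$:

$$\mathrm{AIC} = 2p - 2\ln\hat{L}, \qquad \mathrm{BIC} = p\ln n - 2\ln\hat{L}$$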
A second set of control models tested whether the results could be explained by averaging the epoch into a single value, or by the high degrees of freedom present in the model. To examine this, we tested models that replaced the aPFC- and vmPFC-related terms with plausible "substitute" brain regions that could serve as controls. Model IV replaced vmPFC with a lateral occipitotemporal region (LOT) that showed the SV effect during the delay period, with a pattern similar to that of vmPFC (Table 1). Model V instead replaced aPFC with a medial temporal lobe (MTL) region that showed the AU effect during the delay period with a pattern similar to that of aPFC (Table 1). The regions of interest in LOT and MTL were defined by a procedure homologous to that used in the original model, as spheres with 6 mm radii centered on the peak voxel of mean signal magnitude for the subjective value (LOT) or anticipatory utility (MTL) effect across participants.
Results
Behavioral results
In the behavioral session, participants showed significant delay discounting for liquid rewards, which occurred on the order of seconds (delay effect, F(1,42) = 20.2, p < 0.0001; Fig. 2A), consistent with our previous reports (Jimura et al., 2009, 2011). The group mean data were then fit to a standard hyperbolic model of delay discounting, and k was estimated at 0.015 (based on a maximum likelihood estimation fitting procedure). This k value was used in model-based analyses of the fMRI data from the delay period (see Delay period activation, below).
A, In the behavioral session, the participants showed robust delay discounting. Each participant was classified into one of three groups based on the steepness of their discounting [shallow (SHL; self-controlled), steep (STP; impulsive), and intermediate (INT)]. B, In the fMRI session, participants' choices were consistently biased toward the delayed option, but choice probability was modulated by the relative immediate reward (IR) value. Error bars indicate SEM.
Participants were then classified into three groups, based on their delay-discounting pattern as quantified by AuC (see Materials and Methods) and assessed in the behavioral session (Fig. 2A): steep discounters (N = 15), shallow discounters (N = 15), and an intermediate group (N = 13). A two-way ANOVA with group (STP, INT, and SHL) and delay (10, 30, and 60 s) as factors confirmed a significant main effect of group (F(2,40) = 54.0, p < 0.0001) and an interaction of delay and group (F(2,40) = 4.01, p < 0.05). Although the groups also differed significantly in AuC (STP, 0.58 ± 0.10, mean ± SD; INT, 0.76 ± 0.04; SHL, 0.88 ± 0.04; F(2,40) = 78.6, p < 0.0001), it is important to note that group classification was done only for ease of description and visualization; the group boundaries were arbitrarily defined. As such, primary analyses of individual differences used continuous correlations with AuC as the independent variable.
The fMRI session focused on activation associated with selection of delayed rewards; consequently, to bias participant choice behavior toward the delayed option, the liquid amount associated with the immediate option was individually adjusted according to the subjective value of the delayed reward estimated in the behavioral session (see Materials and Methods). This manipulation was successful, as participants exhibited a significant preference for the delayed option (t(42) = 15.6, p < 0.001). However, this choice bias was modulated by the relative IR value (expressed as a proportion of the estimated subjective value of the delayed reward; see Materials and Methods): on trials with higher IR, the preference for the delayed option was lower (t(42) = 3.21, p < 0.01; Fig. 2B). Nevertheless, IR effects on choice bias did not interact with individual differences in discounting (r = 0.22, p = 0.16). This finding indicates that the individualized adjustment procedure was successful in equating the subjective value of the IR amount across participants.
Delay-period activation
Most models of intertemporal choice behavior postulate that the subjective value of a delayed reward decays rapidly as the time to its delivery increases, following a nonlinear function (Rachlin et al., 1991; Green and Myerson, 2004; Rangel et al., 2008). A corollary of this postulate is that individuals can continuously estimate the subjective value of a future reward (Hare et al., 2008; Peters and Büchel, 2010b) (often referred to as the goal value). This value estimate may dynamically evolve from the time of choice until reward delivery in a nonlinear manner, as suggested by delay-discounting models (Montague and Berns, 2002; Kalenscher and Pennartz, 2008). Thus, brain regions coding this value estimate may show a dynamic increase in activation during the waiting period for the chosen delayed reward (Roesch et al., 2006; Kalenscher and Pennartz 2008; Gregorios-Pippas et al., 2009).
The current study directly tested this hypothesis by continuously monitoring brain activation while participants waited for their upcoming reward (Fig. 1B). Although the exact form of the time-dependent anticipation function has been controversial (Read, 2001; Green and Myerson, 2004; Rangel et al., 2008), hyperbolic functions have repeatedly been shown to describe discounting patterns well (Rachlin et al., 1991; Kable and Glimcher, 2007). Thus, we used a model-based approach to explore whether activation dynamics could be explained by a hyperbolically increasing pattern during the delay period. Each of the two delay durations (30 and 60 s) was modeled by a standard hyperbolic delay-discounting function, convolved with a canonical hemodynamic response function (Fig. 1C; see Materials and Methods).
A number of brain regions were identified, including predominant foci within the vmPFC and VS (Fig. 3A,B; Table 1). Even when defining the VS through anatomical rather than functional criteria, the effect was still significant (left, z = 4.13, p < 0.001, corrected; right, z = 3.66, p < 0.001, corrected). The time courses of the delay-period activations confirmed that in both the VS and vmPFC there was a gradual increase in activation that accelerated in magnitude toward the reward consumption period and was present for both long and short delay durations (Fig. 3A,B, right).
Brain regions showing dynamic activation time courses related to subjective value and anticipatory utility effects during the delay period. A–C, Statistical significance maps (left) illustrate ventromedial prefrontal cortex (A) and ventral striatum (B) showing the subjective value effect, and anterior prefrontal cortex (C) showing the anticipatory utility effect. Time courses (right) demonstrate a gradual increase (A, B) and decrease (C) toward the reward outcome in both the short and long delay trials. Note that choice- and consumption-related effects are subtracted out of these time courses. The color bars indicate significance levels. Gray arrowheads indicate background color changes in 60 s conditions. A, Anterior; L, left.
Examination of the postdecision period also enabled identification of a distinct dynamic component of subjective value related to anticipation. It has been suggested that anticipation of a future reward can itself provide positive utility, with characteristics complementary to the time-discounted goal value (Loewenstein, 1987; Prelec and Loewenstein, 1991; Berns et al., 2006, 2007) (see Materials and Methods). Therefore, in brain regions coding for anticipatory utility, activation would be expected to be maximal at the beginning of the delay, when anticipation is strongest, and to decrease thereafter as the time to reward delivery approaches (Loewenstein, 1987; Prelec and Loewenstein, 1991; Berns et al., 2007). We tested for regions fitting a delay-related function modeled as the complement of the subjective value function (Fig. 1C) (see Materials and Methods). This formulation is simple, but still retains the basic characteristics of an anticipatory utility function. A whole-brain exploratory analysis identified multiple brain regions fitting this pattern, including ventral parts of the aPFC bilaterally (Fig. 3C; Table 1). The time courses exhibited an initial rise in activation, followed by a gradual decrease during the delay period (Fig. 3C, right).
These time courses are based on residuals after removing choice- and consumption-related effects (see also Materials and Methods). Nonetheless, it is important to emphasize that the interpretability of the delay-period effects is strengthened by the delay lengths used in the design (i.e., 30 and 60 s), since these are substantially longer than the canonical hemodynamic response epoch (cf. Gregorios-Pippas et al., 2009). Specifically, while the subjective value effect showed the primary acceleration of activity beginning at least 20 s after the cue, the anticipatory utility effect was still continuing to decay during this period. By this point in the delay period, all residual cue-related (i.e., choice-period) hemodynamic activity would have dissipated.
Individual differences effects
The identification of regions demonstrating delay-related activation dynamics that are consistent with the coding of subjective value (vmPFC and VS) and anticipatory utility (aPFC) raises the additional possibility that these regions reflect individual differences in impulsivity and self-control, respectively. If this hypothesis is correct, patient (self-controlled) individuals should show increased activation in brain regions coding for anticipatory utility during postchoice delay periods. Conversely, impulsive individuals should exhibit a steeper activity rise in value-related brain regions while waiting for a delayed reward, as they show steeper discounting effects in those brain regions during choice (Mischel et al., 1989; Kirby et al., 1999; Green and Myerson 2004; Berns et al., 2007; Kable and Glimcher, 2007, 2010; Ballard and Knutson 2009; Madden et al., 2009).
To test this hypothesis, we examined whether the ROIs showing delay-related activity also showed sensitivity to individual differences in delay-discounting pattern, quantified as the AuC value in the delay-discounting plot (see Materials and Methods). It is important to note that these analyses were unbiased, since individual differences were assessed purely on the basis of discounting behavior in the out-of-scanner behavioral session, and thus were independent of in-scanner behavior and brain activity.
No individual differences effects were observed in vmPFC; however, in the left VS region, a significant negative correlation was observed between AuC and delay-period activation dynamics [Fig. 4A; (−13, 8, −6); 12 voxels; p < 0.001, corrected within the anatomically defined VS region]. The negative direction of the correlation indicates that the increasing pattern of activation dynamics was preferentially present in steep discounters. As shown in the time courses displaying the patterns in the three discounting groups (Fig. 4A, right), steep discounters showed a more prominent increase during both the long and short delay-period trials. This pattern is consistent with the interpretation that individual differences in impulsivity are associated with a more strongly discounted subjective value representation (Kirby et al., 1999; Berns et al., 2007; Kable and Glimcher, 2007, 2010; Ballard and Knutson, 2009; Madden and Bickel, 2009), which then increases during the delay period.
Individual differences in delay-period dynamics. A, Ventral striatum showed a correlation between the subjective value effect and individual delay discounting rates. The statistical map (left) showed a negative correlation, indicating a larger subjective value effect in steep discounters. The time courses (right) demonstrate larger subjective value effects in steep discounters in both of the delay conditions. B, Anterior prefrontal cortex showed a correlation between anticipatory utility effect and individuals' delay discounting. Complementary to the ventral striatum, the statistical maps (left) showed a positive correlation, indicating the larger effect in shallow discounters. White closed lines indicate brain regions that showed anticipatory utility effect (p < 0.05, uncorrected). The time courses (right) demonstrate greater activation in the initial part of the delay period in shallow discounters. Note that choice- and consumption-related effects are subtracted out of these time courses. Gray arrowheads indicate background color changes in 60 s conditions. SFG, Superior frontal gyrus; MFG, middle frontal gyrus; IFG, inferior frontal gyrus; GR, gyrus rectus.
Interestingly, the anticipatory utility effect in aPFC was also modulated by individual differences in delay discounting, with a significant positive correlation in left aPFC observed between AuC and the anticipatory effect [(−31, 47, −10); 27 voxels; p < 0.05, corrected; Fig. 4B; see Materials and Methods]. This positive correlation suggests a larger anticipatory utility effect in shallower discounters. Critically, this individual difference effect was opposite to that observed in the VS region, in that the delay-related aPFC activity was associated with greater patience rather than impulsivity.
Choice-period effects
To test whether these regions play a central role in intertemporal decision making, we next examined whether the vmPFC, VS, and aPFC ROIs also exhibited consistent patterns of activation and individual difference effects during choice periods. We first examined the effects of decision value via the trial IR variable, since this factor was found to significantly modulate choice bias (Fig. 2B). It is important to note that because this analysis was restricted to the ROIs identified previously solely on the basis of delay-related activity (i.e., unbiased by choice effects; see Materials and Methods), any further modulation by choice would provide strong indication of functional involvement in both choice and postchoice delay-related processes.
It was found that during choice of the delayed option, the relative IR value modulated both VS and vmPFC, with stronger activation on trials with higher IR [vmPFC, (1, 36, −2), 75 voxels, p < 0.05, corrected; Fig. 5A; left VS, (−12, 7, −7), 17 voxels, p < 0.001, corrected; Fig. 5B]. This pattern is again consistent with the interpretation that the IR manipulation reflected a change in the subjective value of the immediate reward (Kirby et al., 1999; Berns et al., 2007; Kable and Glimcher, 2007, 2010; Ballard and Knutson, 2009; Madden and Bickel, 2009) or the total reward available on the trial (Cai et al., 2011).
Brain activity during choice period. A, B, Choice-period activations correlated with immediate reward value in ventral striatum (A) and ventromedial prefrontal cortex (B). C, Ventral striatum activity also showed a negative correlation with individual differences in delay discounting rate. D, These ventral striatum patterns are illustrated by presenting estimated signal magnitudes for the high and low immediate reward values in each of the discounting groups. Conversely, anterior prefrontal cortex showed positive correlation between activation and degree of delay discounting (E, blue arrowhead), indicating larger activation in shallower discounting individuals (F). The formats are similar to those in Figure 4B (left). Error bars indicate SEM. VS, ventral striatum; aPFC, anterior prefrontal cortex.
In the left VS, choice-related activation was also sensitive to individual differences in delay discounting, such that steep discounters showed stronger choice-related activation [(−13, 9, −7), 17 voxels, p < 0.01, corrected; Fig. 5C]. However, the individual difference effects were related to IR, such that on high-IR trials individual differences were significant (AuC correlation, r = −0.32, p < 0.05), while on low-IR trials the effects were not (r = −0.12, p = 0.45; Fig. 5D). One possible interpretation of this pattern is that impulsive individuals respond more strongly to the value of the immediate reward, but that this effect is particularly prominent when the immediate reward is close in subjective value to the delayed reward (i.e., on high-IR trials).
Finally, choice-related individual difference effects were also observed in left aPFC [(−29, 54, −4); 44 voxels; r = 0.42; p < 0.05, cluster-size corrected; Fig. 5E]; however, the correlation was in the opposite direction to that seen in VS, such that shallow discounters showed increased choice-related activation relative to steep discounters (Fig. 5F).
Functional connectivity analysis
The complementary patterns of activation dynamics and individual differences found in aPFC, VS, and vmPFC suggest that these regions subserve distinct functional roles during intertemporal decision making. First, the aPFC data are consistent with an anticipatory utility account (Loewenstein, 1987). Second, the results also suggest that vmPFC and VS may code different aspects of subjective value, since only the VS showed consistent effects associated with impulsivity. This finding is consistent with previous reports indicating that vmPFC appears to reliably track goal value, i.e., the forecasted and time-discounted value of the reward outcome, while VS might be more accurately characterized as coding the discrepancy between prior expectations regarding the goal value and the currently experienced reward utility (i.e., a reward prediction error) (Hare et al., 2008; Rolls et al., 2008; Peters and Büchel, 2010b; Daw et al., 2011). Putting these hypotheses together, one possible model of functional connectivity related to valuation is that VS activity dynamics arise from modulation by both vmPFC and aPFC inputs, reflecting both the goal value and potential sources of utility derived from anticipation of a delayed reward.
We directly examined this hypothesis in a multiple regression analysis, implemented with a hierarchical mixed-effect general linear model (Raudenbush and Bryk, 2002) that tested whether VS activation could be predicted by joint inputs from aPFC and vmPFC, along with their modulation by both trial-by-trial variability related to IR value and between-subjects variability related to delay discounting (see Materials and Methods). As shown in Table 2, the following predictors explained significant variance in VS activation: (1) vmPFC activation (t(651) = 5.87, p < 0.001); (2) the interaction of vmPFC activity and IR value (t(651) = 3.28, p < 0.01); (3) the interaction of aPFC and IR value (t(651) = −3.80, p < 0.001); and (4) the interaction of aPFC and the AuC (t(651) = −2.30, p < 0.01). Figure 6 also provides a graphical depiction of the statistical model of functional connectivity relationships. The first two effects indicate that VS activation was increased when vmPFC activation increased, and this effect was further amplified on high-IR trials. The last two effects were negative terms indicating that VS activation was decreased when aPFC activity increased, with amplified effects on high-IR trials and in high-AuC individuals (shallow discounters).
Table 2. Statistical values of predictor effects modeled in the mixed-effects GLM analysis
Figure 6. Path diagram of the functional connectivity relationships among VS, vmPFC, and aPFC. All within- and between-subject effects were estimated in a multilevel mixed-effects GLM. The numbers beside the paths indicate parameter estimates, with SEMs in parentheses (from the full-trial model). The vmPFC inputs (solid line) have a positive influence on VS activation, with the effect further amplified by IR value (stronger influence on high-IR trials). Conversely, the aPFC inputs (dashed line) have a negative influence on VS activation, with the effect amplified both by IR value and by individual differences in delay-discounting rate (stronger influence for high-AuC individuals). *p < 0.05; **p < 0.01; ***p < 0.001.
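As a rough illustration of the connectivity model described above, the sketch below fits a hierarchical (multilevel) mixed-effects regression to simulated data: trial-level VS, vmPFC, and aPFC signals nested within subjects, a trial-level IR regressor, and a subject-level AuC covariate, with a random intercept per subject. The data frame, column names, and effect sizes are hypothetical, and the specification is only a schematic stand-in for the model detailed in Materials and Methods.

```python
# Schematic sketch of a hierarchical mixed-effects GLM predicting trial-level
# VS activation from vmPFC and aPFC inputs, modulated by trial-wise IR value
# and subject-level discounting (AuC). Data and effect sizes are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_trials = 30, 24
rows = []
for s in range(n_subj):
    auc = rng.uniform(0.2, 0.9)              # subject-level discounting (AuC)
    for t in range(n_trials):
        ir = rng.choice([0.0, 1.0])          # low- vs high-IR trial
        vmpfc, apfc = rng.normal(), rng.normal()
        vs = (0.5 * vmpfc + 0.2 * vmpfc * ir             # positive vmPFC influence
              - 0.3 * apfc * ir - 0.2 * apfc * auc       # negative aPFC influence
              + rng.normal(scale=0.5))
        rows.append(dict(subject=s, VS=vs, vmPFC=vmpfc, aPFC=apfc, IR=ir, AuC=auc))
df = pd.DataFrame(rows)

# Random intercept per subject; fixed effects include the region inputs and
# their interactions with IR (within subject) and AuC (between subjects).
model = smf.mixedlm(
    "VS ~ vmPFC + aPFC + IR + vmPFC:IR + aPFC:IR + vmPFC:AuC + aPFC:AuC",
    data=df, groups="subject")
fit = model.fit(reml=False)                  # ML fit so that AIC/BIC are comparable
print(fit.summary())
```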
In a second stage of the analysis, we examined functional interactions among the three regions during the choice and delay periods separately (i.e., in two separate multilevel mixed-effects models; Table 2). Although this second-stage analysis had much lower power and was therefore more conservative, very similar patterns were obtained: in both the choice and delay periods, vmPFC exerted a significant positive influence on VS activity, while aPFC exerted a significant negative influence.
A key implication of the functional connectivity analysis is that aPFC and vmPFC exert the same modulatory influences on VS activation during decision making (the choice period) as during the delay period after the decision has been made, suggesting that these regions serve similar functional roles across both phases. As such, the results provide evidence for continuity in the representation of value information across different phases of the decision-making process. The functional role of such value representations during the postchoice delay may be the maintenance of both the utility associated with anticipation and a continuously updated prediction of the value of the upcoming reward.
Control analyses: alternate functional connectivity models
A series of control models was tested to examine the specificity of the effects observed in the functional connectivity analysis from the primary model (i.e., the full-trial model; hereafter, Model I). In one set of control models, one of the input regions was removed to examine its effect on the prediction of VS activity. These models indicated that when the vmPFC input was removed (Model II), the aPFC effects were still present (aPFC by AuC, t(653) = −3.34, p < 0.001; aPFC by IR, t(653) = −2.00, p < 0.05), and when the aPFC input was removed (Model III), the vmPFC effects were still present (vmPFC, t(656) = 5.01, p < 0.001; vmPFC by IR, t(656) = 2.56, p < 0.01). This result confirms that the inputs from these two regions were statistically independent predictors of VS activation, and that the effects observed in the full model were not an artifact of multicollinearity introduced by jointly including the two predictors. More importantly, we used the model-fit criteria AIC (Akaike, 1974) and BIC (Schwarz, 1978) to quantitatively compare these control models to the original (Model I); poorer fits (i.e., higher AIC/BIC values) were found in both cases (Model I: AIC = 1889, BIC = 1996; Model II: AIC = 2020, BIC = 2074; Model III: AIC = 2001, BIC = 2038).
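The model-comparison step can be sketched in the same framework, continuing from the hypothetical data frame df built in the sketch above. Reduced control models that drop the vmPFC or aPFC terms are refit by maximum likelihood and compared with the full model via AIC (2k − 2 log L) and BIC (k log n − 2 log L), where lower values indicate a better trade-off between fit and complexity; the parameter count used below is a simplification for illustration.

```python
# Illustrative model comparison via AIC/BIC, reusing the hypothetical `df`
# from the previous sketch. Models are fit by maximum likelihood (reml=False)
# so that the criteria are comparable across fixed-effect structures.
import numpy as np
import statsmodels.formula.api as smf

def aic_bic(fit, n_obs):
    # Approximate parameter count for illustration:
    # fixed effects + random-intercept variance + residual variance.
    k = len(fit.fe_params) + 2
    return 2 * k - 2 * fit.llf, k * np.log(n_obs) - 2 * fit.llf

formulas = {
    "I (full)":      "VS ~ vmPFC + aPFC + IR + vmPFC:IR + aPFC:IR + vmPFC:AuC + aPFC:AuC",
    "II (no vmPFC)": "VS ~ aPFC + IR + aPFC:IR + aPFC:AuC",
    "III (no aPFC)": "VS ~ vmPFC + IR + vmPFC:IR + vmPFC:AuC",
}
for name, formula in formulas.items():
    fit = smf.mixedlm(formula, data=df, groups="subject").fit(reml=False)
    aic, bic = aic_bic(fit, len(df))
    print(f"Model {name}: AIC = {aic:.1f}, BIC = {bic:.1f}")
```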
In a second set of control models, we replaced each of the regions with a control region that showed a similar dynamic activation effect during the delay period (see Materials and Methods). Specifically, we replaced the vmPFC region with a region in extrastriate visual cortex (LOT; Model IV) that was also identified in the SV analysis (Table 1), and, in a separate model, we replaced the aPFC region with an MTL region (Model V) that was also identified in the AU analysis (Table 1). In both of these models, the replacement region did not significantly predict VS activation, while the effects associated with the nonreplaced region remained unchanged (Model IV: aPFC by AuC, t(651) = −2.78, p < 0.01; aPFC by IR, t(651) = −2.50, p < 0.05; LOT-related effects, all |t| < 1.00, all p > 0.32; Model V: vmPFC, t(651) = 6.11, p < 0.001; vmPFC by IR, t(651) = 3.62, p < 0.001; MTL-related effects, all |t| < 1.38, all p > 0.17). These control results suggest that the significant effects observed in the original model were not attributable to using an averaged value for the trial epoch, or to the high degrees of freedom in the model, but instead were specifically related to trial-by-trial and between-subject variability in aPFC and vmPFC activation. Last, AIC and BIC for these models (Models IV and V) were again greater than those of the original model (Model IV: AIC = 1912, BIC = 1989; Model V: AIC = 2010, BIC = 2087), confirming that the combination of vmPFC, aPFC, and AuC best accounted for the observed data.
Discussion
The current study provides new insights regarding the core neural mechanisms of intertemporal decision making by focusing on brain activity dynamics during the postchoice delay period, in addition to choice-related activity. Impulsivity in decision making was associated with increased VS activation not only during choice, but also during the delay period, while waiting for the chosen future reward. In contrast, patient individuals exhibited increased aPFC activity both during choice and also during the early periods of waiting. Last, the time-discounted subjective value of delayed rewards corresponded with a pattern of vmPFC activity that dynamically tracked the time to reward delivery. These three brain regions were also found to show a particular pattern of functional connectivity, in which VS activity was positively influenced by vmPFC input, but negatively by aPFC, with the nature of this influence further dependent on trial-by-trial (immediate reward value) and individual difference (delay-discounting rate) factors. Importantly, these effects were observed not just during the delay period, but also during choice, suggesting functional continuity in the role of these regions in value representation. Together, the findings provide support for the idea that intertemporal decision making involves complementary neural mechanisms in the vmPFC, VS, and aPFC.
Many previous investigations have suggested key roles for vmPFC and VS in valuation-related processes (Montague and Berns, 2002; O'Doherty, 2004; Kim et al., 2008; Rangel et al., 2008; Hare et al., 2009; Jocham et al., 2011), findings that are corroborated and extended here. Replicating previous work, both of these regions were sensitive to changes in reward value during the choice phase of decision making, exhibiting greater activation on trials in which the value of the immediate (or total) reward was increased. The data are thus consistent with standard accounts of goal value representation, in which the subjective value of rewards is represented in a "common currency" code that can be accessed on the basis of predictive cues (Montague and Berns, 2002; Rangel et al., 2008; Peters and Büchel, 2010a; Padoa-Schioppa, 2011). Critically, however, the results provide novel support for the idea that such value representations are not just available at the time of choice, but also dynamically increase during the interval while waiting for the reward, peaking at the time of reward delivery. Although this pattern was predicted by neuroeconomic models (Montague and Berns, 2002; Berns et al., 2006; Kalenscher and Pennartz, 2008), the present data provide the first direct evidence in humans that subjective reward value is coded in a time-discounted and continuously evolving manner.
The findings also support and extend previous research indicating a link between VS activation and behavioral impulsivity, both in terms of intertemporal decision making (Tanaka et al., 2004; Hariri et al., 2006; Kable and Glimcher, 2007; Ballard and Knutson, 2009; Pine et al., 2009) and other trait markers (Dalley et al., 2007; Forbes et al., 2009; Buckholtz et al., 2010). Consistent with this previous research, we found that impulsive individuals (indexed by steep discounting) exhibited greater VS activation during intertemporal choice, particularly on trials in which the immediate reward had high relative subjective value. This pattern suggests that impulsive individuals were more sensitive to immediate reward value. Yet an additional, novel finding of our study was that these individuals also exhibited a differential pattern of VS activation dynamics (i.e., more sharply accelerated) during the postchoice period. This pattern would be expected if impulsive individuals more steeply discounted the subjective value of the chosen delayed reward at the time when it was selected.
A novel contribution of the current study is the finding that ventral aPFC regions may encode an anticipatory utility signal associated with delayed rewards, that is, the extra utility derived from the pleasure of waiting for a reward delivered in the future (Loewenstein, 1987; Berns et al., 2006, 2007). Previous economic models have postulated a role for anticipatory utility in intertemporal choice, but this is the first demonstration that such utility has a direct neural correlate during human reward-based decision making. Support for the role of aPFC in anticipatory utility was demonstrated not only in terms of the characteristic pattern of delay-related activation dynamics, but also in the finding that this signal was stronger in more patient individuals. Importantly, the increased aPFC activity was associated with reduced VS activity observed in such individuals, suggesting functional connectivity between aPFC and VS during intertemporal choice behavior. Interestingly, this finding is consistent with previous work demonstrating negative functional connectivity from this same ventral aPFC region to VS, with connectivity strength predicting both behavioral success at resisting impulsive choices and reduced trait impulsivity (Diekhof and Gruber, 2010).
The localization of an anticipatory utility signal within aPFC is consistent with previous work and theoretical accounts of aPFC function. In particular, there is a growing consensus that anterior regions of PFC play a critical role in episodic prospection, that is, the consideration of future outcomes (Koechlin and Hyafil, 2007; Glimcher, 2009; Benoit et al., 2011; Roesch et al., 2012). Lateral aPFC regions are reliably engaged by prospective memory (Reynolds et al., 2009; Burgess et al., 2011) (i.e., maintenance of intentions to be carried out in the future), while medial/polar regions are engaged by episodic future thought (Schacter et al., 2007; Peters and Büchel, 2010b; Benoit et al., 2011) (i.e., mental time travel). These processes are thought to be major components of planning, imagination, and affective forecasting (Koechlin and Hyafil, 2007). Anterior PFC may thus enable the internal representation of anticipated future states within working memory (Shamosh et al., 2008), such that these states can be evaluated to bias action selection (cf. Braver et al., 2003; Sakai and Passingham, 2006; Jimura and Braver, 2010). More speculatively, this aPFC influence on decision making may extend beyond the current trial: delay-period aPFC activity could leave a trace, in the form of persistent activity or plasticity, that biases subsequent decisions. Our data suggest that the current aPFC region, located at the intersection of medial and lateral sectors, might enable mental simulation of the future states expected from delivery of an upcoming reward.
Thus, a plausible interpretation of the aPFC findings is that, in patient individuals, the current utility experienced from anticipation of a future reward matched well with the current estimate of the subjective value of the reward while waiting for it (Loewenstein, 1987; Berns et al., 2007). In other words, individuals showing delay-related aPFC activity may have experienced greater anticipatory utility during the delay period (i.e., stronger hedonic benefits conferred by the act of waiting) (cf. Kahneman et al., 1997). Nevertheless, additional evidence would be needed to directly support this putative functional role of delay-related aPFC activation. Moreover, because the current study did not examine causal relations between decision- and delay-related effects, it remains unclear whether delay-period activations influence the decision per se. Such evidence could be obtained through delay-of-gratification paradigms (Mischel et al., 1989) in which participants have an opportunity to "defect" to an immediate reward after making an initial choice to wait. Alternatively, perturbation studies (e.g., with transcranial magnetic stimulation; cf. Figner et al., 2010) would be useful for demonstrating that delay-related aPFC activity (or its disruption) alters choice preferences toward immediate rewards. Finally, it is possible that aPFC signals of anticipatory utility could be examined equally well, or even more sensitively, with conventional delay-discounting paradigms that focus on choice-period effects and use abstract (e.g., monetary) rewards delivered over longer timescales.
At a minimum, the characteristic patterns of activation dynamics and individual differences observed during the postchoice delay period in aPFC and VS indicate unique and dissociable neural signatures in these two regions that are linked to individual differences in impulsivity and self-control. Although further work is certainly needed, the current study clearly highlights the utility and productivity of experimental approaches to decision making that shift focus toward the previously neglected temporal window between choices and their subsequent outcomes.
Footnotes
This work was supported by NIH Grant R01 MH66078 (T.S.B.) and a research fellowship from the Uehara Memorial Foundation (K.J.). We thank Drs. Russell A. Poldrack, Paul W. Glimcher, Antonio Rangel, Camillo Padoa-Schioppa, Junichi Chikazoe, and Teppei Matsui for comments and advice on early drafts of this manuscript. We also thank Bruna Martins, Carol Cox, Joseph Hilgard, and Dionne Clarke for administrative and technical assistance.
The authors declare no competing financial interests.
Correspondence should be addressed to Koji Jimura, Precision and Intelligence Laboratory, Tokyo Institute of Technology, 4259-J3-10 Nagatsutacho, Midoriku, Yokohama 226-8503, Japan. koji.jimura@gmail.com