Abstract
Human decisions are guided by “desire” or “reason,” which control actions oriented toward either proximal or long-term goals. Here we used functional magnetic resonance imaging to assess how the human brain mediates the balance between proximal reward desiring and long-term goals, when actions promoting a superordinate goal preclude exploitation of an immediately available reward option. Consistent with the view that the reward system interacts with prefrontal circuits during action control, we found that behavior favoring the long-term goal, but counteracting immediate reward desiring, relied on a negative functional interaction of anteroventral prefrontal cortex (avPFC) with nucleus accumbens (Nacc) and ventral tegmental area. The degree of functional interaction between avPFC and Nacc further predicted behavioral success during pursuit of the distal goal, when confronted with a proximal reward option, and scaled with interindividual differences in trait impulsivity. These findings reveal how the human brain accomplishes voluntary action control guided by “reason,” suggesting that inhibitory avPFC influences Nacc activity during actions requiring a restraint of immediate “desires.”
Introduction
Humans voluntarily engage in many difficult and unappealing tasks with little or no immediate gratification to achieve long-term goals. One popular theoretical view proposes that this may be accomplished by a restraint of the inborn behavioral bias oriented toward immediate reward (“desire”) through a cognitive mechanism mediating self-control (“reason”). This may ensure that human behavior conforms with superordinate goals, social norms, and laws, even when challenged by instantly available incentives (Freud, 1927; Hume, 2000).
Neuroscientific evidence has implicated two different neural systems mediating the behavioral bias toward either the pursuit of immediate rewards (desire) or the accomplishment of long-term goals (reason). On the one hand, regions of the mesolimbic dopamine system [in particular the nucleus accumbens (Nacc) and the ventral tegmental area (VTA)] showed increased responses to predictors of reward (Schultz, 2000; Knutson et al., 2001) and were activated by decisions favoring immediate or high reward (McClure et al., 2004; Yacubian et al., 2007). On the other hand, the prefrontal cortex (PFC) has repeatedly been associated with cognitive control and the ability to maintain, manipulate, and integrate goal-relevant information over temporal delays (Owen, 1997; Duncan and Owen, 2000; Miller and Cohen, 2001). This may enable future-oriented behavior that is decoupled from impulsive desires in situations that may be conceptualized as “desire–reason dilemmas.”
Still, relatively little is known about the functional mechanisms that allow human beings to abstain from immediate rewards in favor of long-term goals. The PFC projects to both Nacc and VTA (Öngür and Price, 2000; Ferry et al., 2000; Frankle et al., 2006; Haber et al., 2006) and promotes goal-directed behavior through corticostriatal loops (Tzschentke and Schmidt, 2000; Del Arco and Mora, 2008). One possibility is that regulative influences from PFC counteract reward-related activity in the reward system in situations in which a superordinate future goal contradicts the proximal reward bias. This study sought to investigate the dynamic interactions between brain regions representing reason or desire in vivo, when actions oriented toward a long-term task goal precluded exploitation of an immediately available reward option.
Eighteen subjects (10 females) underwent functional magnetic resonance imaging (fMRI) while performing a novel sequential forced-choice task, in which they had to collect or reject a series of stimuli according to a predefined superordinate task goal to achieve a high reward after successful task completion. Before scanning, subjects had completed an operant conditioning task, in which acceptance of certain stimuli was directly associated with small immediate reward (see Materials and Methods). In the scanner, these conditioned stimuli were presented in different contexts, which allowed us to (1) identify reward signals elicited by these stimuli (see Fig. 1A) and (2) assess the downregulation of these reward signals in situations of a “desire–reason dilemma” (see Fig. 1B).
Materials and Methods
Subjects.
Subjects were 18 healthy volunteers (10 females) recruited from an academic environment. Ethical approval from the local ethics committee and written informed consent were obtained before investigation. Subjects were paid for participation (€15 plus an additional bonus of up to €30).
Task procedure.
In the conditioning phase before scanning, squares in six different colors, each repeated 30 times, were presented in shuffled order. Button choice was free, and subjects were encouraged to explore the response–reward contingencies of the colors to maximize the overall outcome. Pressing button 1 meant that a certain color was collected, whereas button 2 indicated its rejection. Squares remained on screen until a button press took place. Decisions were immediately followed by feedback displayed directly on the respective square, which indicated whether the decision for or against a color led to an immediate reward (+1 point) or not. Most colors led to a neutral outcome regardless of button choice. The goal of the operant conditioning procedure was to acquire and establish stimulus–response–reward contingencies.
During the second phase of the experiment, subjects had to perform a sequential forced-choice task in the MR scanner. Stimulus material remained the same as in the conditioning phase, but on most trials, subjects were no longer allowed to decide freely. Instead, in the second phase, subjects had to pursue a superordinate long-term task goal during blocks of four to six trials (for an example with four trials, see Fig. 1A). The superordinate task goal of a certain block was indicated by a cue showing the three target colors that had to be collected once (button 1), after their first appearance within the current block. In case a target color appeared a second or third time within the same block, subjects were not allowed to collect it again but had to reject it (button 2) to fulfill the task requirements and reach the superordinate task goal (Fig. 1B). Subjects were hence required to maintain the three target colors over the complete block of trials and had to keep track of their previous decisions to achieve the superordinate long-term goal. Most importantly, in some blocks, conditioned stimuli could also be targets. In these cases, subjects were required to follow the superordinate long-term goal, even if the conditioned reward association contradicted this goal (i.e., during a desire–reason dilemma) (Fig. 1B). Behavioral choice was free only in situations in which subjects encountered a color that was not one of the target colors in the current block (i.e., a nontarget stimulus). In these cases, the optimal strategy was to collect colors associated with immediate reward (Fig. 1A). Subjects were encouraged to collect these free-choice bonuses whenever they had the chance, because bonuses were added to the overall outcome after successful accomplishment of the superordinate goal.
Failure to implement the superordinate task goal (e.g., during a desire–reason dilemma) or to answer within 900 ms led to termination of the current block and zero outcome (goal failure), which meant that the overall block reward as well as all free-choice bonuses acquired during the block were lost. In contrast, successful completion of the superordinate task goal was rewarded with four points (high reward) plus potential free-choice bonuses from the current block (Fig. 1A). For this reason, it was always advantageous to stick to the long-term task goal over the course of a block.
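The block rules described above can be summarized in a short sketch. The point values, the 900 ms deadline, and the button mapping are taken from the text; the names `Trial`, `required_button`, and `score_block`, as well as the color labels used for illustration, are hypothetical. The sketch further assumes that a block completed without any error counts as goal success and that a response at exactly 900 ms is still in time.

```python
from dataclasses import dataclass

HIGH_REWARD = 4          # points for reaching the superordinate goal (from the text)
BONUS = 1                # points per collected conditioned nontarget (from the text)
DEADLINE_MS = 900        # response deadline (from the text)

COLLECT, REJECT = 1, 2   # button 1 collects, button 2 rejects


@dataclass
class Trial:
    color: str
    button: int          # COLLECT or REJECT
    rt_ms: float         # response time in milliseconds


def required_button(color, targets, already_collected):
    """Return the button demanded by the goal, or None when choice is free."""
    if color not in targets:
        return None                  # nontarget: free choice
    if color in already_collected:
        return REJECT                # repeated target must be rejected
    return COLLECT                   # first appearance must be collected


def score_block(trials, targets, conditioned):
    """Return the block outcome in points (0 on goal failure)."""
    collected = set()
    bonus = 0
    for trial in trials:
        if trial.rt_ms > DEADLINE_MS:
            return 0                 # too slow: block terminated, all bonuses lost
        demand = required_button(trial.color, targets, collected)
        if demand is None:
            # Free choice: only collecting a conditioned color pays a bonus.
            if trial.button == COLLECT and trial.color in conditioned:
                bonus += BONUS
        elif trial.button != demand:
            return 0                 # goal violated (e.g., in a desire-reason dilemma)
        elif demand == COLLECT:
            collected.add(trial.color)
    return HIGH_REWARD + bonus
```

Under these assumptions, a block in which every target rule is followed and one conditioned nontarget is collected yields 4 + 1 = 5 points, whereas collecting a repeated conditioned target ends the block with zero points.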
The cue was always presented for 1500 ms, whereas individual squares within a block appeared for 1000 ms and were followed by a feedback (duration, 700 ms) that indicated the direct effect of the response chosen on a single trial (e.g., that the selection of a conditioned nontarget was immediately rewarded with one point). A general feedback, which indicated the overall outcome within a block, was presented for 3900 ms after subjects had either successfully accomplished the superordinate task goal at the end of a block or committed an error that led to termination of the current block (goal failure).
Assessment of trait impulsivity.
Subjects completed the Barratt Impulsiveness Scale (BIS) (Patton et al., 1995) and the Novelty-Seeking Scale of the Temperament and Character Inventory (TCI-NS) (Cloninger et al., 1993) to assess interindividual differences in trait impulsivity. BIS scores are considered a measure of motor and decision impulsiveness (e.g., acting without thinking or making decisions “on the spur of the moment”) and also reflect the inability to plan ahead. There is some evidence that high levels of impulsivity, as measured by BIS, are inversely correlated with serotoninergic responsivity. Novelty seekers are also characterized as impulsive, disorderly, and easily bored, but TCI-NS has been considered a measure of the functional integrity of the dopamine system of the brain and was thus intended to complement the BIS in the current study.
Behavioral data analyses.
Statistical analyses of the behavioral data were done using the software package SPSS for Windows (version 13.0; SPSS Inc.). The influence of “motivational association” on the percentage of collected bonuses in a free-choice situation was examined with a paired t test (two-tailed significance).
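As an illustration of the paired comparison described above, the following minimal sketch computes the t statistic and degrees of freedom for paired observations. It is not the SPSS procedure used in the study; the function name `paired_t` and any input values are hypothetical, and the two-tailed p value would still have to be looked up in a t distribution with n − 1 degrees of freedom.

```python
import math
import statistics


def paired_t(x, y):
    """Return (t, df) for a paired t test on two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    diffs = [a - b for a, b in zip(x, y)]   # within-subject differences
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)       # sample SD (n - 1 in the denominator)
    # Note: identical differences give sd_diff == 0 and raise ZeroDivisionError.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1
```

Here `x` and `y` would hold, per subject, the percentage of collected conditioned and neutral stimuli in free-choice situations.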
fMRI data acquisition and analyses.
The experiment was performed on a 3 T MRI scanner (Siemens TRIO). Twenty-seven axial slices (voxel size, 3 × 3 × 3 mm³; gap, 20%) parallel to the anterior commissure–posterior commissure plane were acquired in ascending direction. Using a gradient echo planar imaging (EPI) sequence (interscan interval, 2 s; echo time, 33 ms; flip angle, 70°; field of view, 192 mm), a total of 1273 image volumes was acquired over the course of three sessions. Four initial “dummy” volumes were discarded from each session to allow for T1 equilibration effects. A high-resolution structural scan (three-dimensional magnetization-prepared rapid-acquisition gradient echo) was obtained for each subject. Head motion was restricted by small cushions.
Functional images were preprocessed and analyzed with SPM2 (Wellcome Department of Cognitive Neurology, University College London, London, UK). Preprocessing comprised coregistration, correction of movement-related artifacts (realignment and unwarping), corrections for slice-time acquisition differences and low-frequency fluctuations, normalization into standard stereotactic space [skull-stripped EPI template by the Montreal Neurological Institute (MNI)], and spatial smoothing with an isotropic Gaussian kernel filter of 12 mm full-width half-maximum.
Statistical analyses used a general linear model. A vector representing the temporal onsets of stimulus presentation was convolved with a canonical hemodynamic response function (hrf) to produce a predicted hemodynamic response to each experimental condition. Linear t contrasts were defined for assessing the specific effects elicited by conditioned stimuli. To test for reward-related activation in the absence of a competition between the distal goal and the proximal reward option, we compared the selection of conditioned (rewarding) stimuli and neutral stimuli when, and only when, these stimuli were not part of the target set and thus subjects were free to choose the response. We refer to this as “desire context,” because subjects were free to follow their desire to choose the immediate reward, if conditioned (rewarding) stimuli were nontargets (Fig. 1A). To assess the corresponding reward-related activations in the presence of competition between the immediate reward option and the long-term goal, we compared the correct rejection of conditioned (rewarding) stimuli and neutral stimuli, only when these stimuli were targets but had to be rejected because of their repeated occurrence within the same block. We refer to this as desire–reason dilemma, because subjects had to reject the conditioned (rewarding) stimulus, which was a repeated target, to achieve the long-term goal (Fig. 1B). Single-subject contrast images were then taken to the second level to assess group effects with random-effects analyses. As the standard statistical criterion, we used a threshold of p < 0.001, uncorrected. Corrections for multiple comparisons were performed using the false discovery rate at p < 0.05. For brain regions with a specific a priori hypothesis, it was justified to use small volume corrections (Worsley et al., 1996) at p < 0.05 for 4 mm spheres at previously reported activation foci (O'Doherty et al., 2004). 
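The construction of a predicted hemodynamic response from stimulus onsets can be sketched as follows. This is a simplified illustration and not the SPM2 implementation: the double-gamma shape (peak component with shape 6, undershoot with shape 16, ratio 1/6, 32 s length) is an assumed approximation of a canonical hrf, onsets are rounded to the nearest scan, and all function names are hypothetical. Only the interscan interval of 2 s is taken from the text.

```python
import math

TR = 2.0  # interscan interval in seconds (from the text)


def double_gamma_hrf(t):
    """Assumed double-gamma response at time t (seconds after onset)."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)          # gamma density, shape 6
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)  # gamma density, shape 16
    return peak - undershoot / 6.0


def hrf_kernel(tr=TR, length_s=32.0):
    """Sample the response every `tr` seconds and scale it to unit sum."""
    n = int(length_s / tr) + 1
    kernel = [double_gamma_hrf(i * tr) for i in range(n)]
    total = sum(kernel)
    return [v / total for v in kernel]


def predicted_response(onsets_s, n_scans, tr=TR):
    """Convolve a stick function of stimulus onsets with the sampled kernel."""
    sticks = [0.0] * n_scans
    for onset in onsets_s:
        idx = int(round(onset / tr))   # simplification: nearest scan
        if 0 <= idx < n_scans:
            sticks[idx] += 1.0
    kernel = hrf_kernel(tr)
    out = [0.0] * n_scans
    for i, s in enumerate(sticks):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < n_scans:
                out[i + j] += s * k
    return out
```

One such regressor per experimental condition would enter the general linear model, and linear t contrasts would then compare the resulting parameter estimates.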
We also report statistical effects at more lenient statistical criteria in supplemental Table S1 (available at www.jneurosci.org as supplemental material). This was done because we wanted to ascertain whether functional interactions of the Nacc and VTA with prefrontal regions were indeed lateralized or simply weaker in one hemisphere. Parameter estimates from the Nacc and VTA were extracted using marsbar (Brett et al., 2002) (supplemental Fig. S2, available at www.jneurosci.org as supplemental material).
Subsequent psychophysiological interaction analyses (PPI) (Friston et al., 1997) sought to identify the currently unknown prefrontal subregion(s) that controlled the downregulation of reward activation in Nacc and VTA, when the immediate reward contingency and the distal goal competed for action control (i.e., during the desire–reason dilemma). Seed areas for the PPI were selected if they (1) were part of the reward system, as indicated by increased activation caused by the collection of a conditioned stimulus for an immediate reward (i.e., in the desire context) and (2) also showed a significant downregulation of reward-related activity attributable to the requirement to reject the (same) conditioned stimulus to achieve the long-term goal during the desire–reason dilemma. This applied to two maxima in the left and right Nacc [MNI coordinates (x, y, z): −12, 12, −4 and 12, 12, −4] and to two seeds located in the left and right VTA (−4, −16, −20 and 4, −16, −20) (Table 1). Individual blood oxygenation level-dependent (BOLD) signal time courses were extracted from these four local activation maxima, which served as physiological vectors in four PPI. In each case, the psychological vector consisted of the above described contrast that tested for reward-related activations in the presence of competition between the immediate reward option and the long-term goal during the desire–reason dilemma. Using Matlab and SPM2, the hemodynamic signals were first deconvolved using a parametric empirical Bayesian formulation (Gitelman et al., 2003) and mean-corrected. Then the PPI term was built separately for each of the four regions by multiplying the deconvolved and mean-corrected BOLD signal with the psychological vector. 
After convolution with the hrf, mean correction, and orthogonalization, the three regressors (PPI term, physiological vector, and psychological vector) went into the statistical analysis to determine context-dependent changes of functional connectivity over and above any main effect of task or any main effect of activity in the corresponding brain areas. In the PPI contrasts, the PPI term was computed against implicit baseline. Random-effects analyses were performed on single-subject PPI contrast images (p < 0.001, uncorrected). PPI parameter estimates from the left anteroventral PFC (avPFC) were extracted from 4 mm spheres with marsbar (Brett et al., 2002) to assess the correlation between the individual coupling strength and behavioral performance data and personality scores.
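The core step of building the PPI regressor, multiplying the mean-corrected physiological signal by the psychological vector, can be sketched as below. The sketch assumes that the deconvolved seed signal is already available (the parametric empirical Bayesian deconvolution is not reproduced here) and omits the subsequent convolution with the hrf and the orthogonalization. The function names and the coding of the psychological vector (for example, +1 for rejected conditioned stimuli, −1 for rejected neutral stimuli, 0 elsewhere) are assumptions for illustration.

```python
def mean_center(values):
    """Subtract the mean so the series has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ppi_term(neural, psych):
    """Element-wise product of the mean-corrected seed signal and the task vector."""
    if len(neural) != len(psych):
        raise ValueError("signal and psychological vector must have equal length")
    centered = mean_center(neural)
    return [n * p for n, p in zip(centered, psych)]


def ppi_design(neural, psych):
    """Rows of (PPI term, physiological vector, psychological vector)."""
    centered = mean_center(neural)
    interaction = ppi_term(neural, psych)
    return list(zip(interaction, centered, psych))
```

Because the physiological and psychological vectors enter the model alongside the interaction term, the estimate for the PPI term reflects context-dependent coupling over and above the two main effects.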
Results
In the desire context, previous positive reinforcement affected behavior by significantly increasing the acceptance rate of conditioned (rewarding) stimuli compared with neutral stimuli (p = 0.0001) (supplemental Fig. S1, available at www.jneurosci.org as supplemental material). On the neural level, these decisions that satisfied the desire for immediate reward were associated with significant bilateral activation of the Nacc and VTA (Table 1; Fig. 2A,B), which is consistent with previous studies that implicated Nacc and VTA in reward prediction (Knutson et al., 2001; O'Doherty et al., 2004), subjective preferences (O'Doherty et al., 2006), and choice of immediate reward (McClure et al., 2004; Kable and Glimcher, 2007). Compared with this, in the context of the desire–reason dilemma, which precluded acceptance of conditioned stimuli, activation in these regions was significantly attenuated (Table 1) (supplemental Fig. S2, available at www.jneurosci.org as supplemental material).
How does reward-related activation in the Nacc and VTA interact with prefrontal activation when direct reward desiring and the superordinate long-term goal compete for action control? To examine whether the attenuation of activation of the Nacc and VTA in the context of the desire–reason dilemma may be explained by regulatory influences from prefrontal regions, we assessed the functional connectivity of the Nacc and VTA. We found that action choice guided by reason (i.e., choosing according to the long-term goal at the expense of an immediately available reward) was accompanied by an increased negative functional interaction between the Nacc and the avPFC (i.e., inferior frontomarginal cortex and adjacent parts of the anterior lateral orbitofrontal cortex) (supplemental Table S1, available at www.jneurosci.org as supplemental material) (Fig. 3A,B). At a more lenient statistical criterion, a negative coupling was also observed between the avPFC and the two seeds in the left and right VTA (supplemental Table S1, available at www.jneurosci.org as supplemental material). Most interestingly, the extent of the negative functional interaction between the left avPFC and the right Nacc correlated with interindividual differences in behavioral success (Fig. 3C) and trait impulsivity as indexed by the BIS and TCI-NS (Fig. 3D). Subjects with a higher inverse coupling between these regions were more successful in rejecting conditioned stimuli if required by the long-term goal and exhibited lower trait impulsivity scores, suggesting a key role for the avPFC during action control guided by reason.
Discussion
To our knowledge, this is the first in vivo neuroimaging study showing that human behavior oriented toward a long-term goal, but counteracting the competing desire for immediate reward, relies on dynamic interactions between avPFC and regions of the reward system. Hemodynamic responses to conditioned (rewarding) stimuli in both the Nacc and the VTA were significantly attenuated during the desire–reason dilemma (Table 1; Fig. 2A,B), probably as a consequence of an increased negative functional interaction with the avPFC that enabled unrewarded actions in favor of the distal goal. It has been demonstrated recently that reductions in the BOLD response can be directly related to decreases in neural activity and may reflect neuronal inhibition (Stefanovic et al., 2004; Shmuel et al., 2006). In line with these findings, our data further support the concept that reward signals in the Nacc and VTA may be under inhibitory control exerted by prefrontal cortices (Duvauchelle et al., 1992; Jackson et al., 2001; Del Arco and Mora, 2008). The observed inverse relationship between anterior prefrontal and reward-related activation in the Nacc and VTA is thereby consistent with compelling animal data indicating that prefrontal activation modulates dopamine turnover and neural activity in mesolimbic pathways, which may lead to reduced sensitivity to afferent activation elicited by reward predictors (Grace, 1991; Carr and Sesack, 2000; Jackson et al., 2001; Grace et al., 2007; Dalley et al., 2008; Del Arco and Mora, 2008; Goto and Grace, 2008) (for a related fMRI study, see Meyer-Lindenberg et al., 2002). In the rat, the PFC may modulate accumbens output to the ventral pallidal system (Grace et al., 2007) either directly through projections to the Nacc (Sesack and Pickel, 1992) or indirectly through GABA interneurons in the VTA that in turn influence mesoaccumbens dopamine neurons (Carr and Sesack, 2000; Del Arco and Mora, 2008).
The present results provide the first evidence that human beings may engage a similar mechanism, because they demonstrate an increased negative coupling between avPFC and both Nacc and VTA when desire collided with reason. In this way, prefrontal influences on activation in the Nacc and VTA may decouple behavior from the impact of immediately rewarding stimuli and may promote behavioral flexibility in favor of long-term goals. This observation is also compatible with recent fMRI data in humans that demonstrated increased activation of the avPFC when humans explored alternative reward options at the expense of the currently exploited reward option (Daw et al., 2006), suggesting a crucial role for the avPFC in the capacity to override the reflexive desire to exploit immediately available reward. The present data complement these findings in an important respect by showing that the successful decoupling of behavior from immediate reward desiring during desire–reason dilemmas may depend on the strength of the negative functional connectivity between avPFC and Nacc. Because trait impulsivity was also strongly correlated with individual coupling parameters between avPFC and Nacc (Fig. 3D), the degree of negative connectivity between these regions may underlie a general and supposedly stable disposition for an impulsive behavioral phenotype. A lower degree of negative functional interaction between avPFC and Nacc may thus be indicative of a reduced ability to control one's desires and behavioral impulses.
In summary, our findings support the idea that reasonable behavior in the face of instant gratification requires a suppression of reflexive reward desiring. The present data suggest that the human capacity for inhibiting the “inborn” bias toward immediate reward may critically depend on the functional interaction between avPFC and subcortical regions of the reward system. Anteroventral PFC may thereby modulate original stimulus–reinforcement associations in the Nacc and VTA when they are not consistent with a long-term goal. The observed relationship between the ability to control the desire for immediate reward and the strength of coupling between prefrontal cortex and Nacc may have strong implications for neuropsychiatric research and may open new perspectives on the role of prefronto-mesolimbic pathways in different types of impulse control disorders. The present results help to understand the mechanistic basis of prefronto-accumbens interactions during goal-directed behavior and could in the future help to uncover the nature of disturbances in various neuropsychiatric disorders.
Footnotes
- Received July 15, 2009.
- Revision received November 25, 2009.
- Accepted December 7, 2009.
- We thank Dr. F. Koenigstein for programming the test protocols and I. Pfahlert, Dr. J. Baudewig, and P.D. Dr. P. Dechent for help with data acquisition.
- Correspondence should be addressed to Dr. Esther K. Diekhof, Center for Translational Research in Systems Neuroscience and Psychiatry, Department of Psychiatry and Psychotherapy, Georg August University, Von-Siebold-Strasse 5, D-37075 Goettingen, Germany. e.diekhof{at}med.uni-goettingen.de
- Copyright © 2010 the authors 0270-6474/10/301488-06$15.00/0