Abstract
How does the brain translate information signaling potential rewards into motivation to get them? Motivation to obtain reward is thought to depend on the midbrain [particularly the ventral tegmental area (VTA)], the nucleus accumbens (NAcc), and the dorsolateral prefrontal cortex (dlPFC), but it is not clear how the interactions among these regions relate to reward-motivated behavior. To study the influence of motivation on these reward-responsive regions and on their interactions, we used dynamic causal modeling to analyze functional magnetic resonance imaging (fMRI) data from humans performing a simple task designed to isolate reward anticipation. The use of fMRI permitted the simultaneous measurement of multiple brain regions while human participants anticipated and prepared for opportunities to obtain reward, thus allowing characterization of how information about reward changes physiology underlying motivational drive. Furthermore, we modeled the impact of external reward cues on causal relationships within this network, thus elaborating a link between physiology, connectivity, and motivation. Specifically, our results indicated that dlPFC was the exclusive entry point of information about reward in this network, and that anticipated reward availability caused VTA activation only via its effect on the dlPFC. Anticipated reward thus increased dlPFC activation directly, whereas it influenced VTA and NAcc only indirectly, by enhancing intrinsically weak or inactive pathways from the dlPFC. Our findings of a directional prefrontal influence on dopaminergic regions during reward anticipation suggest a model in which the dlPFC integrates and transmits representations of reward to the mesolimbic and mesocortical dopamine systems, thereby initiating motivated behavior.
Introduction
Motivation translates goals into action. The initiation and organization of motivated behavior is thought to depend on the mesolimbic and mesocortical dopamine systems of the brain (Salamone et al., 2007), that is, the projections from the ventral tegmental area (VTA) to the nucleus accumbens (NAcc) and the prefrontal cortex (PFC) (Berridge and Robinson, 1998; Wise, 2004). Dopamine in the NAcc and PFC is indeed critical to the normal function of these regions (Goldman-Rakic, 1998; Durstewitz et al., 2000; Hazy et al., 2006; Alcaro et al., 2007). However, much remains unknown about how interactions among these regions relate to motivated behavior. VTA neurons respond to anticipated reward, but from where do they get information about reward cues in the environment? What dynamic changes in network function, triggered by reward anticipation, occur during motivation?
The VTA, NAcc, and PFC are each strongly implicated in motivation to obtain reward: VTA neurons respond to reward cues and increase their activity before goal-directed behavior (Ljungberg et al., 1992; Schultz, 1998; Fiorillo et al., 2003). The NAcc is essential for translating motivational drive into motor behavior (for review, see Goto and Grace, 2005). The PFC, specifically the dorsolateral prefrontal cortex (dlPFC), is involved in the representation and integration of goals and reward information (Miller and Cohen, 2001; Wagner et al., 2001; Watanabe and Sakagami, 2007).
Physiological relationships among the VTA, NAcc, and PFC are similarly thought to be crucial for executing reward-motivated behavior. These include dopamine release in the NAcc following VTA activation (Fields et al., 2007; Roitman et al., 2008) and modulation of VTA responsivity by both the NAcc (Grace et al., 2007) and PFC (Gariano and Groves, 1988; Svensson and Tung, 1989; Gao et al., 2007). Importantly, studies of the role of the PFC in driving VTA activity have been conducted in anesthetized rodent models, imposing two important constraints on the extant evidence. First, because rodents lack an expanded prefrontal cortex, the physiological role of the dlPFC in modulating the VTA remains unknown (Frankle et al., 2006). Second, and more fundamentally, characterization of physiological interactions in anesthetized animals cannot address the relationship between physiology and motivated behavior. Despite their well documented interactions, the dynamics of the network linking the dlPFC, VTA, and NAcc have yet to be investigated during motivated behavior. Specifically, it is unclear where information signaling potential reward enters this network and how it affects the relationships among these reward-responsive regions (but see Bromberg-Martin et al., 2010).
To investigate how this network supporting motivated behavior responds to information about potential rewards, we used functional magnetic resonance imaging (fMRI) to measure activations in the dlPFC, VTA, and NAcc during a rewarded reaction time task. This task allowed us to isolate activations associated with motivation, which occurred after the presentation of reward-informative cues and before the execution of goal-directed behavior. Using dynamic causal modeling (DCM), an analysis technique optimized to model causal relationships in fMRI data (Friston et al., 2003), we identified where reward information enters and how it modulates this dopamine-dependent system.
Materials and Methods
Subjects and behavioral task.
The data analyzed in this study were originally collected to examine the anticipation of either gaining or losing monetary rewards for either oneself or a charity, as described in detail in a previous report (Carter et al., 2009). Twenty young adults completed the original study. Four subjects were excluded because of poor data acquisition (i.e., signal dropout, poor coverage), one was excluded because of a Beck Depression Inventory score indicating depression, and three were excluded because of insufficient activation in at least one of the regions of interest (see below, Dynamic causal modeling), leaving 12 subjects in the data reported here (mean age, 23.9 years; SD, 3.8 years; six males).
To experimentally manipulate subjects' motivational state, we used a modified monetary incentive delay task (Knutson et al., 2001; Carter et al., 2009). The current analyses used only the data acquired while participants were anticipating monetary gains for themselves, resulting in 20 trials per condition. During these trials, initial cues marked the start of the trial and indicated whether individuals could earn either $4 (cue, figure on a red background) or $0 (cue, figure on a yellow background) for a fast reaction time to an upcoming target. After a variable delay (4–4.5 s), a response target (target, a white square) appeared, indicating that participants were to press a button using their right index finger as quickly as possible. Participants earned the amount indicated by the cue if they responded in time or earned nothing otherwise. Using information about response times on previous trials in the same condition, an adaptive algorithm set reaction time thresholds so that subjects won ∼65% of the time.
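The specific update rule of the adaptive algorithm is not detailed here. As an illustration only, a percentile-based threshold over recent reaction times in the same condition produces the desired ~65% success rate; the sketch below, with hypothetical function and variable names, shows one such scheme.

```python
import numpy as np

def update_threshold(prev_rts, target_rate=0.65):
    """Hypothetical percentile-based update: return a reaction time cutoff
    such that roughly `target_rate` of the observed RTs fall below it."""
    if len(prev_rts) == 0:
        return 0.250  # assumed starting cutoff, in seconds
    return float(np.percentile(prev_rts, target_rate * 100))

# Illustrative recent RTs (in seconds) from earlier trials in the same condition
rts_gain4 = [0.212, 0.198, 0.231, 0.187, 0.205]
print(update_threshold(rts_gain4))  # cutoff applied to the next $4 trial
```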
fMRI data acquisition.
A 3T GE Signa MRI scanner was used to acquire blood oxygen level-dependent (BOLD) contrast images. Each of the two runs comprised 416 volumes (TR, 1 s; TE, 27 ms; flip angle, 77°; voxel size, 3.75 × 3.75 × 3.75 mm) of 17 axial slices positioned to provide coverage of the midbrain while also including the striatum and dlPFC (for an image of coverage, see Carter et al., 2009). This restricted volume, which sacrificed superior parietal cortex, permitted a TR of 1 s for increased sensitivity in regions of interest where susceptibility artifact can be problematic. The GE Signa EPI sequence automatically passes images through a Fermi filter with a transition width of 10 mm and a radius of one-half the matrix size, which results in an effective smoothing kernel of 4.8 mm. At the beginning of the scanning session, we collected localizer images to identify the participant's head position within the scanner. Additionally, we acquired inversion recovery spoiled gradient recalled (IR-SPGR) high-resolution whole-volume T1-weighted images (voxel size, 1 × 1 × 1 mm) and 17 IR-SPGR images, coplanar with the BOLD contrast images, for use in registration, normalization, and anatomical specification of regions of interest (see below).
fMRI preprocessing.
Images were skull-stripped using the BET tool of FSL (Smith, 2002). Preprocessing was performed using SPM8 software (http://www.fil.ion.ucl.ac.uk/spm). Images from separate runs for each subject were concatenated and then realigned using a fourth-degree B-spline, with the mean image as a reference. Images were then smoothed using a 4 mm Gaussian kernel, yielding a cumulative effective smoothing kernel of 6.25 mm FWHM. This 4 mm kernel was previously tested against 2, 6, and 8 mm kernels for subcortical and medial temporal regions of interest, and it gave both the maximum Z scores and a more limited spatial extent of activation, appropriate to the anatomy of these regions. For within-subject analysis, images were coregistered in two steps: first, the whole-volume IR-SPGR T1-weighted images were coregistered to the 17-slice IR-SPGR images using a normalized mutual information function; functional data were then coregistered to the resliced T1 images. For between-subject analyses, high-resolution structural images were normalized to MNI space, and the normalization parameters were applied to the coregistered functional images.
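Because the widths of successive Gaussian smoothing steps combine in quadrature, the cumulative smoothness quoted above follows from the 4.8 mm effective filter and the 4 mm applied kernel:

$$\mathrm{FWHM}_{\mathrm{total}} = \sqrt{4.8^2 + 4.0^2}\ \mathrm{mm} \approx 6.25\ \mathrm{mm}.$$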
The first-level (within-subject) statistical models were analyzed using a general linear model (GLM). Regressors included 2 task-related regressors of interest, 13 task-related regressors of no interest, 6 motion regressors of no interest, and 3 session effects as covariates of no interest. The task-related effects were modeled with 15 columns in the design matrix that represented the anticipation and outcome possibilities for self, charity, and control trials. Both cue and outcome periods were modeled with 1 s boxcars at cue and outcome onset. This allowed for isolation of neural activity related to processing of the cue and preparation to execute motivated behavior. Task-related regressors of interest modeled the cue/anticipation periods for $4 and $0 gain trials. Task-related regressors of no interest included the outcome regressors for $4 and $0 self trials, as well as both cue and outcome regressors for the charity and control trials. The β estimates were then calculated using a general linear model with a canonical hemodynamic response basis function. Contrast images for the $4–$0 cue regressors were computed for each subject and entered into a between-subject random-effects analysis. Statistical thresholds were set to p < 0.01, false discovery rate (FDR) corrected, with a cluster extent of six voxels for the group-level GLM analyses (Genovese et al., 2002). For construction of DCMs, an additional first-level analysis was run that included a column with the combined anticipation periods for $4 and $0 gain trials.
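As a concrete illustration of the design described above, the following sketch builds a single cue regressor: a 1 s boxcar at each cue onset convolved with a simplified canonical (double-gamma) hemodynamic response. The onsets, HRF parameters, and function names are illustrative assumptions, not the study's actual design files.

```python
import numpy as np
from scipy.stats import gamma

TR = 1.0       # s, as acquired
N_VOLS = 416   # volumes per run

def canonical_hrf(tr, duration=32.0):
    """Simplified double-gamma hemodynamic response function sampled at the TR."""
    t = np.arange(0.0, duration, tr)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return hrf / hrf.sum()

def cue_regressor(onsets, tr=TR, n_vols=N_VOLS, duration=1.0):
    """1 s boxcar at each cue onset, convolved with the canonical HRF."""
    frame_times = np.arange(n_vols) * tr
    box = np.zeros(n_vols)
    for onset in onsets:
        box[(frame_times >= onset) & (frame_times < onset + duration)] = 1.0
    return np.convolve(box, canonical_hrf(tr), mode="full")[:n_vols]

# Hypothetical onsets (in seconds) of $4 cues within a run
reg_gain4_cue = cue_regressor(onsets=[10, 42, 75])
```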
Dynamic causal modeling.
All DCM analyses were conducted in DCM8 as implemented by SPM8. Dynamic causal modeling uses generative models of brain responses to infer the hidden activity of brain regions during different experimental contexts (Friston et al., 2003). A DCM is composed of a system of nodes that interact via unidirectional connections. Experimental manipulations are treated as perturbations in the system, which operate either by directly influencing the activity of one or more nodes (driving inputs), or by influencing the strength of connection between nodes (modulatory input). The latter effect of exogenous input represents how the coupling between two regions varies in response to experimental manipulations.
DCM represents the hidden neuronal population dynamics in each region with a state variable x. For inputs u, the state equation for DCM is:

$$\dot{x} = \Bigl(A + \sum_{j} u_j B^{(j)}\Bigr)x + Cu.$$
Matrix A represents the strengths of context-independent or intrinsic connections, matrix B represents the modulation of context-dependent pathways, and matrix C represents the driving input to the network. The state equation is transformed into a predicted BOLD signal by a biophysical forward model of hemodynamic responses (Friston et al., 2000; Stephan et al., 2007). Model parameters are estimated using variational Bayes under the Laplace approximation, with the objective of maximizing the negative free energy as an approximation to the log model evidence, a measure of the balance between model fit and model complexity.
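To make the roles of the A, B, and C matrices concrete, the following sketch integrates the bilinear neuronal state equation for a toy three-node network with a single input. The coupling values are arbitrary illustrations; in practice these parameters are estimated by SPM's variational Bayes scheme rather than specified by hand.

```python
import numpy as np

def simulate_dcm_states(A, B, C, u, dt=0.1):
    """Euler integration of the bilinear DCM neuronal state equation:
    dx/dt = (A + sum_j u_j(t) * B[j]) x + C u(t).
    A: (n, n) intrinsic coupling; B: (m, n, n) modulatory matrices;
    C: (n, m) driving inputs; u: (T, m) input time series."""
    T, m = u.shape
    n = A.shape[0]
    x = np.zeros(n)
    states = np.zeros((T, n))
    for t in range(T):
        J = A + sum(u[t, j] * B[j] for j in range(m))
        x = x + dt * (J @ x + C @ u[t])
        states[t] = x
    return states

# Toy three-node network (ordered dlPFC, VTA, NAcc) with one input (reward cue);
# all coupling values are arbitrary illustrations.
A = np.array([[-1.0, 0.0, 0.0],
              [ 0.3, -1.0, 0.5],
              [ 0.2,  0.6, -1.0]])
B = np.array([[[0.0, 0.0, 0.0],
               [0.4, 0.0, 0.0],
               [0.4, 0.0, 0.0]]])         # cue strengthens dlPFC -> VTA and dlPFC -> NAcc
C = np.array([[1.0], [0.0], [0.0]])        # driving input only to dlPFC
u = np.zeros((200, 1)); u[50:60, 0] = 1.0  # brief reward cue
states = simulate_dcm_states(A, B, C, u)
```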
Selection of volumes of interest.
Volumes of interest (VOIs) were defined by taking the intersection of anatomical boundaries and significant functional activations. For the NAcc and VTA, anatomical boundaries were hand drawn using AFNI software on high-resolution anatomical images in individual space (afni.nimh.nih.gov/afni). The NAcc was drawn according to the procedure outlined by Breiter et al. (1997). The ventral tegmental area was drawn in the sagittal section and was identified with the following boundaries: the superior boundary was the most superior horizontal section containing the superior colliculus; the inferior boundary was the most inferior horizontal section containing the red nucleus; the lateral boundaries were drawn in the sagittal plane as vertical lines connecting the center of the colliculus and the peak of curvature of the interpeduncular fossa; the anterior boundary was clearly visible as CSF; and the posterior boundary was a horizontal line bisecting the red nucleus in the axial plane. For the dlPFC, anatomical boundaries were defined on the MNI template (fmri.wfubmc.edu/cms) as the intersection of left BA 46 and the middle frontal gyrus, and back-transformed into individual space.
The objective of DCM is to formulate and compare different possible mechanisms by which an established effect (local response) may have arisen. How this effect is initially detected and established, before the DCM analysis, depends on the particular question that is being asked. In the context of a GLM analysis, as in our case, this is usually done by requiring that there is a certain degree of activation in each region considered. Here, we operationalized this by requiring that, for each subject, each region showed an activation at the level of p < 0.05, uncorrected, with a cluster extent threshold of >3 voxels in subcortical/midbrain structures and 5 voxels in cortical structures. This criterion eliminated three subjects from the DCM analysis (two for NAcc; one for dlPFC). The time series for both the VTA and the NAcc were extracted from the peak voxel within each subject's VOI, following the study by Adcock et al. (2006). For the dlPFC, the VOI was an 8 mm sphere around each individual's peak activation. Mean coordinates ([x y z]) in MNI space for the three VOIs were as follows: left NAcc, [−8 14 −1]; right NAcc, [12 14 −3]; left VTA, [−2 −15 −13]; right VTA, [3 −17 −12]; dlPFC, [−44 38 19].
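A minimal sketch of the peak-voxel extraction described above, assuming hypothetical file names for the preprocessed BOLD series, the hand-drawn anatomical mask, and the first-level statistical map; the actual extraction was performed within SPM8.

```python
import numpy as np
import nibabel as nib

def peak_voxel_timeseries(bold_path, mask_path, tmap_path):
    """Return the time series of the voxel with the highest statistic
    inside an anatomical mask (file names are hypothetical)."""
    bold = nib.load(bold_path).get_fdata()       # shape (x, y, z, t)
    mask = nib.load(mask_path).get_fdata() > 0   # hand-drawn VOI
    tmap = nib.load(tmap_path).get_fdata()       # first-level statistic map
    tmap_masked = np.where(mask, tmap, -np.inf)
    i, j, k = np.unravel_index(np.argmax(tmap_masked), tmap_masked.shape)
    return bold[i, j, k, :]

# ts_vta = peak_voxel_timeseries("func_preproc.nii.gz", "vta_mask.nii.gz", "spmT_gain.nii.gz")
```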
Construction of DCMs.
To answer the questions of where information about potential reward enters the system and how the system is modulated by reward anticipation, the model space included all possible configurations of driving input and several possible combinations of context-sensitive connections. Driving inputs represented cue information about all reward types (high and low), whereas modulatory inputs represented only high reward cues. All models assumed reciprocal intrinsic connections between all regions, given the evidence from the rodent literature of either monosynaptic pathways or direct functional relationships among the VTA, NAcc, and PFC (see Introduction). To reduce the number of models, we used the simplifying assumption that the bidirectional VTA–NAcc connections are modulated by the reward context. This is a reasonable assumption given the strong evidence that these two regions are critical for supporting a motivational state (for review, see Haber and Knutson, 2010). The context sensitivity of all connections to and from the dlPFC was systematically varied, resulting in 16 models. We then crossed these models with all 7 possible driving input configurations. In total, 112 models per subject were fitted using a variational Bayes scheme, and posterior means [MAP (maximum a posteriori)] and posterior variances were estimated for each connection of each model.
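The size of this model space follows directly from crossing the driving-input and modulatory configurations; a small enumeration (region and connection labels are ours) makes the count explicit.

```python
from itertools import combinations, product

regions = ("dlPFC", "VTA", "NAcc")

# 7 driving-input configurations: every nonempty subset of the three regions
driving_configs = [c for r in (1, 2, 3) for c in combinations(regions, r)]

# 16 modulatory configurations: the context sensitivity of the four
# connections to and from the dlPFC is switched on or off independently
# (VTA <-> NAcc modulation is fixed on in all models).
dlpfc_connections = ("dlPFC->VTA", "dlPFC->NAcc", "VTA->dlPFC", "NAcc->dlPFC")
modulatory_configs = list(product((False, True), repeat=len(dlpfc_connections)))

model_space = list(product(driving_configs, modulatory_configs))
print(len(driving_configs), len(modulatory_configs), len(model_space))  # 7 16 112
```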
Bayesian model selection.
Bayesian model selection (BMS), in combination with family-level inference and Bayesian model averaging, was used to determine the most likely model structure given our data (Stephan et al., 2009, 2010; Penny et al., 2010). In a first comparison step, the model evidence, which balances model fit and complexity, was computed for each model using the negative free-energy approximation to the log-model evidence. Models were compared at the group level using a novel random-effects BMS procedure (Stephan et al., 2009). The models were assessed using the exceedance probability, the probability that a given model explains the data better than all other models considered. In a second level of comparison, we used family-level inference with Gibbs sampling on seven families (Penny et al., 2010), each family comprising all models that shared a given driving input configuration. Models in the winning family were then subjected to the random-effects BMS procedure. Finally, we analyzed the winning family using Bayesian model averaging, a procedure that provides a measure of the most likely parameter values for an entire family of models across subjects (Stephan et al., 2009, 2010). Parameter significance was assessed by the fraction of samples in the posterior density that were greater than zero (posterior densities were sampled with 10,000 points), and parameters were considered significant at a posterior probability threshold of 95%.
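As an illustration of how exceedance probabilities can be obtained once the random-effects Dirichlet parameters over model frequencies have been estimated (here simply assumed; SPM estimates them variationally), a Monte Carlo sketch:

```python
import numpy as np

def exceedance_probabilities(alpha, n_samples=10_000, seed=0):
    """Monte Carlo exceedance probabilities for random-effects BMS: given
    Dirichlet parameters over model (or family) frequencies, the exceedance
    probability of model k is the probability that its frequency exceeds
    that of every other model."""
    rng = np.random.default_rng(seed)
    samples = rng.dirichlet(alpha, size=n_samples)   # (n_samples, n_models)
    winners = samples.argmax(axis=1)
    return np.bincount(winners, minlength=len(alpha)) / n_samples

# Toy example with three candidate models; the alpha values are illustrative.
print(exceedance_probabilities(np.array([8.0, 3.0, 1.0])))
```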
Results
Behavioral data
To assess subjects' motivational state, we examined reaction times to target stimuli. An algorithm calibrated response time thresholds separately within each condition so that participants were successful ∼65% of the time [$4: mean (M), 64%; SD, 7%; $0: M, 63%; SD, 6%], to ensure equivalent reinforcement rates across conditions. For all included participants, reaction times to the target in $4 gain trials (M, 201 ms; SD, 42 ms) were faster than in $0 gain trials (M, 226 ms; SD, 61 ms; p < 0.001), signifying that participants were more motivated to perform in the $4 condition. Behavioral data for the loss and charity trials not included in this analysis can be found in the study by Carter et al. (2009).
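The reported reaction time difference corresponds to a paired comparison across the 12 subjects; one plausible form of such a test is sketched below with purely illustrative per-subject values, not the study's data.

```python
import numpy as np
from scipy.stats import ttest_rel

# Purely illustrative per-subject mean RTs (seconds); not the study's data.
rt_gain4 = np.array([0.19, 0.21, 0.18, 0.22, 0.20, 0.17, 0.24, 0.19, 0.21, 0.20, 0.23, 0.18])
rt_gain0 = np.array([0.22, 0.24, 0.20, 0.25, 0.23, 0.19, 0.27, 0.21, 0.24, 0.22, 0.26, 0.20])

t_stat, p_val = ttest_rel(rt_gain4, rt_gain0)  # paired comparison across subjects
print(f"t = {t_stat:.2f}, p = {p_val:.4f}")
```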
GLM analyses
To identify neural substrates of motivation to obtain reward, we used a GLM analysis of the period beginning with the presentation of the cue ($4 or $0); hereafter, we refer to this period of anticipation of the opportunity to obtain reward as "reward motivation." GLM analyses of the contrast $4 > $0 during this period revealed that reward motivation was associated with significant activations (p < 0.01, FDR corrected, six-voxel cluster extent threshold) in all of the regions of interest used for the DCM analysis: bilaterally in the VTA, NAcc, and dlPFC. Additional significant activations were observed bilaterally in the midbrain (surrounding the VTA), dorsal striatum, ventral striatum (surrounding the NAcc), posterior parietal cortex, inferior parietal lobule, insula, ventrolateral prefrontal cortex, cerebellum, and ventral visual stream, as well as in the left hippocampus.
DCM analysis
We used DCM with BMS and model space partitioning to examine a network consisting of the VTA, NAcc, and dlPFC during reward motivation. Our DCMs estimate the strength of driving inputs, whereby information signaling an upcoming opportunity to obtain reward directly influences neural activity in a region; intrinsic connectivity, whereby regions influence each other in the absence of reward information; and modulatory inputs, whereby information signaling the opportunity for future reward changes the strength of coupling between regions (Friston et al., 2003). We used BMS to compare the relative evidence of alternative DCMs that varied both the location of driving inputs into the system and the pattern of modulatory inputs, as specified below. We then applied the recently developed approach of model space partitioning, which allowed us to compare families of DCMs that varied with respect to driving input, factoring out all other differences in the models (Penny et al., 2010).
Using random-effects BMS, we compared 112 models that differed both in where information signaling potential reward entered the network and in how this information influenced connectivity (Fig. 1). We tested a subset of the full model space that included all possible combinations of driving inputs and full intrinsic connectivity, and that varied all possible combinations of modulatory connections among the regions, except for the connections between the VTA and NAcc (for details, see Materials and Methods). The exceedance probabilities (the probability that a model explains the data better than all others considered) of the top eight models together summed to 81% (Fig. 2). These top eight models all shared the feature of driving input to the dlPFC.
Figure 1. The full tested model space. Sixteen models that systematically varied the context sensitivity of the connections represented by blue arrows were constructed. The black arrows represent connections that were allowed to be context sensitive in all models. These 16 models were crossed with all seven possible combinations of driving inputs (red). Driving inputs represent cue information about all reward types (high and low), while modulatory inputs represent only high reward cues.
Figure 2. Bayesian model selection results for the full model space. The Bayesian model selection indicates the most likely model for the full model space. The top eight models account for 81% of the exceedance probability (the exceedance probabilities for all 112 models sum to 1). All eight of the best models have the driving input solely at the dlPFC. Context-sensitive connections for the top eight models are shown in the inset.
We next used model space partitioning to compare families of models defined by their driving input configuration. This analysis determined the most likely target of driving input, regardless of modulatory connectivity. We grouped our 112 models into 7 families of 16 models, each with the same driving input configuration, and compared these families using Bayesian family-level inference. This analysis yielded very strong evidence (exceedance probability of 0.93) that the family of models with the driving input solely to the dlPFC provided a better fit to the data (i.e., had higher evidence) than the six other families considered, including those with driving inputs to both the dlPFC and other regions (Fig. 3). Because exceedance probabilities across families sum to 1, the relative probabilities of families are more informative than the absolute probabilities (Penny et al., 2010). In the present study, the ratio of the exceedance probability of the best to the second-best family was 21. The equivalent ratio for the expected posterior probability, the expected likelihood of obtaining a model in a particular family from any randomly selected subject, was 3.3. Thus, the exceedance probability of 0.93 observed here represents very strong evidence that reward information was best modeled as entering the system at the dlPFC.
Figure 3. Exceedance probabilities for each family of models sharing a driving input configuration. The ratio of exceedance probabilities of the best to the second-best family is 21, indicating with very high certainty that reward information enters the modeled network solely at the dlPFC.
The models with driving inputs to the dlPFC were then compared with one another using BMS within the winning family. As anticipated, no single model within this group emerged as superior. Notably, the top four models within the winning family, together accounting for 69% of the exceedance probability, all had modulatory connections from the dlPFC to the VTA, indicating that the influence of the dlPFC on the VTA was also important for determining model fit.
To determine how reward information entering the dlPFC affects network dynamics across subjects, we used Bayesian model averaging to compute the weighted average of modulatory changes in connectivity for the winning family of models (Stephan et al., 2009) (Fig. 4, Table 1). Parameter estimates were considered significant at a posterior probability threshold of 95% that the posterior mean differed from zero. At baseline, intrinsic connection strengths were strongest from the VTA to the NAcc and from the NAcc to the VTA. These connections were not significantly modulated by reward motivation, even though all of the models tested allowed for context-dependent modulation of them; this lack of significant modulation is unexpected given the role of these regions in supporting motivation. We emphasize, however, that all intrinsic connections were included in all models, so the finding of strong intrinsic connectivity is independent of our constraints on the model space.
Figure 4. Connectivity determined by Bayesian model averaging of models in the winning family, which all shared driving inputs solely to the dlPFC (Occam's window, 11.8 models; SD, 5.3). All solid connections are significant at a posterior probability threshold of 95% that the posterior mean is larger than zero. The dotted connections are not significant. Driving inputs represent cue information about all reward types (high and low), while modulatory inputs represent only high reward cues. Left, The intrinsic (baseline) connectivity for each connection. Right, Modulation of connectivity during reward motivation. Modulation of the blue connections was varied in the model space. Connection strengths are indicated on each arrow (in hertz). Only the connections from the dlPFC to the VTA and NAcc are significantly modulated by reward motivation.
Table 1. Means and SDs of all the parameter estimates for the averaged model
Reward motivation induced significant increases in connection strength only in the connections from the dlPFC to the NAcc and to the VTA. This is not an artifact of the driving input entering at the dlPFC; in fact, it has been suggested that, when a driving input to a region and a modulatory input to its efferents are correlated (as they are here), the constraints on the prior variances of the driving and modulatory connections result in an underestimation of the strength of the modulatory effect (SPM listserv, 022036). The modulatory effect of reward motivation on these two connections from the dlPFC was very strong: the connection from the dlPFC to the VTA increased from zero, and the modulatory effect from the dlPFC to the NAcc was 290% of the intrinsic connection strength. Thus, reward motivation engaged previously weak or inactive pathways from the dlPFC to the VTA and NAcc, without significantly altering connectivity throughout the rest of the system.
Discussion
We investigated the impact of reward motivation on the dynamics of mesolimbic and mesocortical dopaminergic regions in a network comprising VTA, NAcc, and dlPFC. Importantly, our task structure permitted us to isolate neural activity concurrent with onset of information signaling potential reward and distinct from processing related to reward outcomes, allowing the observation of motivation preceding the execution of goal-directed behavior. Our results indicate that, during a simple rewarded reaction time task, information about expected reward entered this network solely at the dlPFC. This reward information increased the modulation by the dlPFC of the VTA and the NAcc, structures that are known to influence the physiology and plasticity of networks supporting motivated behavior, attention, and memory throughout the brain. Together, these findings suggest that, in response to goal-relevant information, the PFC harnesses these modulatory pathways to generate physiological states that correspond to expectancy and motivation.
To characterize the network of brain regions involved in motivation, we used DCM. Importantly, DCM does not assume that temporal precedence is necessary for causality. Because the lag between neural activity and BOLD activation can theoretically vary across brain regions, due to vascular factors, DCM is particularly appropriate for detecting network interactions in BOLD data. In addition, DCM allows for inference about causal interactions between regions that depend on the experimental context. These inferences can be tested across a theoretically unlimited model space, here allowing us to test among all possible driving input configurations.
Our DCM analysis of the VTA–NAcc–dlPFC network during reward motivation indicated that the driving input was exclusively to the dlPFC. This means that, in this behavioral context and within the modeled network, information signaling potentially available reward entered the dopamine system at the dlPFC and not at the other regions in the model. Furthermore, driving input unique to the dlPFC appeared to be the feature of the models that was most important for determining model fit. Our findings demonstrate that reward cues directly increased dlPFC activation, and only influenced activation in the VTA or NAcc indirectly, via connections from the dlPFC.
In addition to the regions modeled in our analysis, previous research has identified other candidate regions, such as the medial prefrontal cortex, orbitofrontal cortex, and habenula (Staudinger et al., 2009; Bromberg-Martin et al., 2010) that could plausibly initiate motivated behavior. The question of how these regions interact with this network, especially the dlPFC, is an important avenue of future research. However, it is important to note that, if any of these regions were driving the modeled network via efferents to the VTA or NAcc, one would expect to see this influence expressed in our data as a driving input to the VTA or NAcc. Thus, our finding of a unique driving input to the dlPFC indicates that, in this behavioral context, information signaling potential reward entered the modeled network neither via subcortical relays nor other prefrontal cortical inputs to the VTA, but rather via the dlPFC.
In addition to demonstrating PFC modulation of the VTA in awake subjects during motivated behavior, the current findings are, to our knowledge, the first demonstration of a prefrontal influence on the VTA in humans or nonhuman primates. Bayesian model averaging revealed strong modulation of the VTA by the dlPFC specifically during reward motivation; this dlPFC–VTA pathway was not engaged intrinsically. Moreover, there was a nearly threefold increase in connectivity strength from the dlPFC to the NAcc during reward motivation. Conversely, intrinsic VTA–NAcc connectivity was significant but was not modulated by reward. This result could indicate that connectivity between the VTA and NAcc is always strong regardless of the level of motivation. However, based on prior research showing that reward information affects VTA modulation of the NAcc (Bakshi and Kelley, 1991; Ikemoto and Panksepp, 1999; Parkinson et al., 2002), this interpretation is unlikely to be correct. More plausible is that changes in VTA–NAcc connectivity existed but, because their effect size was small relative to that of dlPFC connectivity, did not contribute significantly to the model evidence, further suggesting that dlPFC modulation was highly influential for the function of this network.
The finding that there was no increase in the connection strength from the VTA to the dlPFC may seem to conflict with the physiology literature demonstrating dopaminergic modulation of the PFC (Williams and Goldman-Rakic, 1995; Durstewitz et al., 2000; Gao and Goldman-Rakic, 2003; Paspalas and Goldman-Rakic, 2004; Seamans and Yang, 2004; Wang et al., 2004; Gao et al., 2007). Although we found a modest intrinsic influence of the VTA on the PFC, this influence did not change with motivational state. However, we do not believe our findings contradict this literature, as the two modulatory influences may contribute to separate, but strongly interacting, behavioral processes. We believe the modulatory role of the PFC over the VTA contributes to goal-directed, instrumental components of behavior, whereas the modulatory role of the VTA over the PFC may be especially important to other behavioral processes, such as updating or task switching, which were not manipulated in our paradigm. Thus, our findings, in the context of the previous literature, suggest that paradigms evoking motivated executive behaviors would reveal bidirectional modulation between the VTA and dlPFC.
Building on the wealth of previous research outlining the influence of midbrain dopamine on target regions, the current findings suggest a model in which the dlPFC integrates information about potential reward and implements goal-directed behavior by tuning mesolimbic dopamine projections. This interpretation is consonant with evidence from the rodent literature showing that the PFC is the only cortical region that projects to dopamine neurons in the VTA (Beckstead et al., 1979; Sesack and Pickel, 1992; Sesack and Carr, 2002; Frankle et al., 2006). The findings fill a critical gap in this literature: stimulation of the PFC has been shown to regulate the firing patterns of dopamine neurons in rodents (Gariano and Groves, 1988; Svensson and Tung, 1989; Gao et al., 2007), and multisite recordings demonstrate phase-coherence between the PFC and the VTA that mediates slow-oscillation burst firing (Gao et al., 2007), but there has been no demonstration that these physiological relationships are driven by motivation and goal-directed behavior. Furthermore, the absence of an expanded frontal cortex in the rodent makes an appropriate rodent correlate of primate dlPFC unclear. Although there is evidence in primates for excitatory projections from the PFC to midbrain dopamine neurons (Williams and Goldman-Rakic, 1998; Frankle et al., 2006), the functional significance of these relatively sparse projections has been questioned. Our findings showing a physiological relationship between prefrontal cortex and VTA in humans thus fill a second critical gap in the extant literature on human (and nonhuman primate) motivation.
Within the PFC, dlPFC is well situated to orchestrate motivated behavior because of its role in planning and goal maintenance. Primate physiology studies have demonstrated that, while both the orbitofrontal cortex and the dlPFC encode reward information, only dlPFC activity predicts which behaviors a monkey will execute (Wallis and Miller, 2003). Furthermore, the dlPFC maintains goal-relevant information during working memory (Levy and Goldman-Rakic, 2000; Wager and Smith, 2003; Owen et al., 2005), updates this information as goals dynamically change during task switching (Dove et al., 2000; Kimberg et al., 2000; MacDonald et al., 2000; Rushworth et al., 2002; Crone et al., 2006; Sakai, 2008; Savine et al., 2010), and arbitrates between conflicting goals during decision making (MacDonald et al., 2000; McClure et al., 2004, 2007; Ridderinkhof et al., 2004; Boettiger et al., 2007; Hare et al., 2009). These previous findings suggest a role for the dlPFC in implementing behavioral goals, but they do not characterize the nature and direction of interactions between dlPFC and other regions supporting motivated behavior. Computational and neuroimaging work has posited a role for the dlPFC in modulating the striatum in the context of instructed reward learning (Doll et al., 2009; Li et al., 2011). Our results corroborate these recent findings and further implicate the dlPFC in initiating motivated behavior, via the novel demonstration of a directed influence on the VTA. Transcranial magnetic stimulation of the dlPFC changes the valuation of both cigarettes (Amiaz et al., 2009) and food (Camus et al., 2009), and also induces dopamine release in the striatum (Pogarell et al., 2006; Ko et al., 2008); however, these results do not reveal how dlPFC activation affects network activation or dynamics. The current findings directly demonstrate dlPFC influence over not only the NAcc but also the VTA during reward-motivated behavior, as postulated by prior work.
In summary, we found that motivation to obtain reward is instantiated by a transfer of information from the dlPFC to the NAcc and VTA; we saw no evidence of the reverse. These findings show that the dlPFC can orchestrate the dynamics of this neuromodulatory network in a contextually appropriate manner. Furthermore, by suggesting an anatomical source for information about expected reward that activates dopaminergic regions, the findings also shed light on the fundamental question of how dopamine neurons define value. Finally, because of the widespread effects of VTA activation and resultant dopamine release, this interaction represents a candidate mechanism whereby dorsolateral prefrontal cortex modulates physiology and plasticity throughout the brain to support goal-directed behavior.
Footnotes
This work was supported by NIMH Grant 70685 (S.A.H.), NINDS Grant 41328 (S.A.H.), National Alliance for Research on Schizophrenia and Depression (R.A.A.), and The Dana Foundation (R.A.A.). R.M.C. is supported by NIH Fellowship NIH51156. S.A.H. was supported by an Incubator Award from the Duke Institute for Brain Sciences. R.A.A. is supported by the Alfred P. Sloan Foundation and The Esther A. and Joseph Klingenstein Fund Foundation. We gratefully acknowledge K. E. Stephan's help with methodological considerations related to the DCM analyses.
The authors declare no competing financial interests.
Correspondence should be addressed to R. Alison Adcock, B203 Levine Science Research Center, Duke University, Box 90999, Durham, NC 27708. alison.adcock@duke.edu