Abstract
Mesocorticolimbic dopamine (DA) has been implicated in cost/benefit decision making about risks and rewards. The prefrontal cortex (PFC) and nucleus accumbens (NAc) are two DA terminal regions that contribute to decision making in distinct manners. However, how fluctuations of tonic DA levels may relate to different aspects of decision making remains to be determined. The present study measured DA efflux in the PFC and NAc with microdialysis in well trained rats performing a probabilistic discounting task. Selection of a small/certain option always delivered one pellet, whereas another, large/risky option yielded four pellets, with probabilities that decreased (100–12.5%) or increased (12.5–100%) across four blocks of trials. Yoked-reward groups were also included to control for reward delivery. PFC DA efflux during decision making decreased or increased over a session, corresponding to changes in large/risky reward probabilities. Similar profiles were observed from yoked-rewarded rats, suggesting that fluctuations in PFC DA reflect changes in the relative rate of reward received. NAc DA efflux also showed decreasing/increasing trends over the session during both tasks. However, DA efflux was higher during decision making on free- versus forced-choice trials and during periods of greater reward uncertainty. Moreover, changes in NAc DA closely tracked shifts in choice biases. These data reveal dynamic and dissociable fluctuations in PFC and NAc DA transmission associated with different aspects of risk-based decision making. PFC DA may signal changes in reward availability that facilitates modification of choice biases, whereas NAc DA encodes integrated signals about reward rates, uncertainty, and choice, reflecting implementation of decision policies.
Introduction
Efficient cost/benefit decision making entailing risk/reward assessment requires coordination of a variety of cognitive operations. These include evaluation of the relative objective and subjective values of different options, monitoring changes in action/outcome contingencies over time, and tracking variations in reward frequency to modify choice biases. Studies using animal models indicate that these functions may be subserved in part by interactions between different nodes within mesocorticolimbic dopamine (DA) circuitry (Floresco et al., 2008; St. Onge et al., 2012). For example, systemic drug manipulations of DA transmission markedly alter risk-based decision making in rats, as assessed with a probabilistic discounting procedure. Stimulation or blockade of DA receptors can increase or decrease preferences for small/certain versus large, yet risky rewards, respectively (St. Onge and Floresco, 2009; St. Onge et al., 2010).
The medial prefrontal cortex (PFC) and the nucleus accumbens (NAc) are two DA terminal regions linked to distinct roles in risk-based decision making across species. For example, the prelimbic PFC enables rats to adapt to changes in reward probabilities required to bias their choices toward more advantageous options. In contrast, the NAc contributes information about reward magnitude and relative value to bias choice toward larger, uncertain rewards, particularly in response to recently rewarded actions (St. Onge and Floresco, 2010; Stopper and Floresco, 2011). Moreover, blockade of D1 or D2 receptors in the PFC produces dissociable effects on probabilistic discounting, by either accelerating or retarding shifts in decision biases as reward probabilities change (St. Onge et al., 2011). Thus, DA inputs to these forebrain regions appear to provide an important modulatory signal that can refine decision biases.
Increases in phasic DA signaling in the NAc measured with in vivo voltammetry have been linked to tracking of expected reward magnitude, effort, and delay-related costs (Day et al., 2010; Gan et al., 2010; Wanat et al., 2010; Sugam et al., 2012). In comparison, recent conjecture about the function of less dynamic, tonic DA levels has posited that this mode of DA transmission may signal variations in the net rate of reward that aids in determining optimal modes of responding (Niv, 2007; Niv et al., 2007). The manner in which tonic DA levels fluctuate when animals are engaged in decision making, as well as their relation to changes in behavior, remain poorly understood. Increases in DA efflux in the PFC measured with microdialysis have been observed during performance of a delay-discounting task and were attributed to tracking the amount of food reward received during the session rather than choice behavior, as rats in a yoked-control group that received food passively without making choices showed a similar DA profile (Winstanley et al., 2006).
To our knowledge, no studies have directly examined and compared DA release in the PFC or NAc during cost/benefit decisions about probabilistic outcomes. To this end, the present study used in vivo microdialysis to measure extracellular DA efflux in the PFC and NAc during risk-based decision making and to identify specific factors related to decision making that may be encoded by such changes in DA efflux.
Materials and Methods
Animals.
Male Long–Evans rats (Charles River Laboratories), weighing 275–300 g at the beginning of behavioral training, were used for the experiment. Upon arrival, rats were given 1 week to acclimatize to the colony and food was restricted to 85–90% of their free-feeding weight for an additional 1 week before behavioral training. Rats were given ad libitum access to water for the duration of the experiment. Feeding occurred in the rats' home cages at the end of the experimental day and body weights were monitored daily to ensure a steady weight loss during food restriction and maintenance or weight gain for the rest of the experiment. All testing was conducted in accordance with the Canadian Council of Animal Care and approved by the Animal Care Committee of the University of British Columbia.
Apparatus.
Behavioral testing for all experiments described here was conducted in operant chambers (30.5 × 24 × 21 cm; Med-Associates) enclosed in sound-attenuating boxes. The boxes were equipped with a fan to provide ventilation and to mask extraneous noise. Each chamber was fitted with two retractable levers, one located on each side of a central food receptacle where food reinforcement (45 mg; Bioserv) was delivered via a pellet dispenser. The chambers were illuminated by a single 100 mA house light located in the top-center of the wall opposite the levers. All experimental data were recorded by an IBM personal computer connected to the chambers via an interface.
Lever pressing training.
Our initial training protocols have been described previously (St. Onge and Floresco, 2010; St. Onge et al., 2011, 2012). On the day before their first exposure to the chambers, rats were given ∼25 sugar reward pellets in their home cage. On the first day of training, two to three pellets were delivered into the food cup and crushed pellets were placed on a lever before the animal was placed in the chamber. Rats were first trained under a fixed-ratio 1:1 schedule to a criterion of 60 presses in 30 min, first for one lever, and then the other (counterbalanced left/right between subjects). Rats were then trained on a simplified version of the full task. These 90-trial sessions began with the levers retracted and the operant chamber in darkness. Every 40 s, a trial was initiated with the illumination of the house light and the insertion of one of the two levers into the chamber. If the rat failed to respond on the lever within 10 s, the lever was retracted, the chamber darkened, and the trial was scored as an omission. If the rat responded within 10 s, the lever retracted and a single pellet was delivered with 50% probability. This procedure was used to familiarize the rats with the probabilistic nature of the full probabilistic discounting task. In every pair of trials, the left or right lever was presented once, and the order within the pair of trials was random. Rats were trained for ∼5–6 d to a criterion of 80 or more successful trials (i.e., ≤10 omissions).
Decision making task.
Risk-based decision making was assessed with a probabilistic discounting task that has been described previously (Ghods-Sharifi et al., 2009; St. Onge and Floresco, 2009, 2010; St. Onge et al., 2010). Rats received daily sessions consisting of 72 trials, separated into four blocks of 18 trials, with an intertrial interval of 47 s. The entire session took 57 min to complete, and animals were trained 6–7 d per week. The intertrial interval length was increased from the original task design (40 s) to enable sampling from each forced- and free-choice block (see below). A session began in darkness with both levers retracted (the intertrial state). A trial began every 47 s with the illumination of the house light and, 3 s later, insertion of one or both levers into the chamber (the format of a single trial is shown in Fig. 1). One lever was designated the Large/Risky Reward lever, the other the Small/Certain Reward lever, which remained consistent throughout training (counterbalanced left/right). If the rat did not respond by pressing a lever within 10 s of lever presentation, the chamber was reset to the intertrial state until the next trial (omission). When a lever was chosen, both levers retracted. Choice of the Small/Certain lever always delivered one pellet with 100% probability; choice of the Large/Risky lever delivered four pellets but with a particular probability (see below). After food was delivered, the house light remained on for another 4 s, after which the chamber reverted to the intertrial state. Multiple pellets were delivered 0.5 s apart. Each of the four blocks comprised eight forced-choice trials in which only one lever was presented (four trials for each lever, randomized in pairs), permitting animals to learn the amount of food associated with each lever press and the respective probability of receiving reinforcement over each block. These were followed by 10 free-choice trials, in which both levers were presented and the animal chose either the Small/Certain or the Large/Risky lever.
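The session structure described above can be illustrated with a minimal Python sketch. It is not the control program used in the study; the function and label names (`build_block`, `build_session`, `forced_large_risky`, and so on) are hypothetical, and only the trial counts and the 47 s intertrial interval are taken from the text.

```python
import random

ITI_S = 47             # seconds between trial onsets (from the text)
BLOCKS = 4             # probability blocks per session
FORCED_PER_BLOCK = 8   # 4 per lever, randomized in pairs
FREE_PER_BLOCK = 10    # both levers presented


def build_block(rng):
    """Return trial types for one block: forced-choice pairs, then free-choice trials."""
    trials = []
    for _ in range(FORCED_PER_BLOCK // 2):
        pair = ["forced_small_certain", "forced_large_risky"]
        rng.shuffle(pair)  # order within each pair is random
        trials.extend(pair)
    trials.extend(["free"] * FREE_PER_BLOCK)
    return trials


def build_session(seed=0):
    """Return a list of four blocks, each a list of 18 trial labels."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    return [build_block(rng) for _ in range(BLOCKS)]


session = build_session()
n_trials = sum(len(block) for block in session)
duration_min = n_trials * ITI_S / 60
print(n_trials, round(duration_min, 1))  # 72 56.4
```

With one trial starting every 47 s, the 72 trials span 56.4 min, which agrees with the reported session length of approximately 57 min.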
Probabilistic discounting task design. A, Cost/benefit contingencies associated with responding on either lever. B, Format of a single free-choice trial.
Animals were trained on one of two versions of the discounting task. In the descending version, the probability of obtaining four pellets after selecting the Large/Risky lever decreased across trial blocks: it was initially 100%, then decreased to 50, 25, and 12.5% for each successive block. In the ascending version, the probabilities increased across trial blocks from 12.5, 25, 50, to 100%. We did not include a condition where changes in reward probabilities occurred in a more random manner because previous studies have shown that rats take considerably longer to learn this type of task, and show less discounting of the large/risky option (St. Onge et al., 2010). For each session and trial block, the probability of receiving the large reward was drawn from a set probability distribution. Therefore, on any given day, the probabilities in each block may have varied, but averaged across many training days, the actual probability experienced by the rat approximated the set value. In the three probabilistic trial blocks of this task, selection of the larger reward option carried with it an inherent risk of not obtaining any reward on a given trial. Rats were trained on the task until as a group, they (1) chose the Large/Risky lever during the first trial block (100% probability) on at least 80% of successful trials, and (2) demonstrated stable baseline levels of choice, assessed using a standard procedure described in previous studies (St. Onge and Floresco, 2010). In brief, data from three consecutive sessions were analyzed with repeated-measures ANOVA with two within-subjects factors (Day and Trial Block). If the effect of Trial Block was significant at the p < 0.05 level but there was no main effect of Day or Day × Trial Block interaction (at p > 0.1 level), animals were judged to have achieved stable baseline levels of choice behavior.
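As an illustration of the contingencies in the two task versions, the following Python sketch computes the expected number of pellets per Large/Risky choice in each block and simulates individual outcomes. The names (`expected_pellets`, `large_risky_outcome`) are hypothetical, and the expected-value comparison is simple arithmetic on the stated probabilities rather than an analysis reported by the authors.

```python
import random

PROBS_DESCENDING = [1.0, 0.5, 0.25, 0.125]           # Large/Risky probability by block
PROBS_ASCENDING = list(reversed(PROBS_DESCENDING))   # same values, opposite order
LARGE_PELLETS, SMALL_PELLETS = 4, 1


def expected_pellets(p_large):
    """Expected pellets per Large/Risky choice at reward probability p_large."""
    return LARGE_PELLETS * p_large


def large_risky_outcome(p_large, rng):
    """Simulate one Large/Risky choice: four pellets with probability p_large, else none."""
    return LARGE_PELLETS if rng.random() < p_large else 0


expected = [expected_pellets(p) for p in PROBS_DESCENDING]
print(expected)  # [4.0, 2.0, 1.0, 0.5]

# Outcomes within a single block vary, but the long-run mean approaches the expected value.
rng = random.Random(1)  # hypothetical seed
draws = [large_risky_outcome(0.25, rng) for _ in range(10_000)]
print(sum(draws) / len(draws))
```

Under these contingencies, the Large/Risky option has the higher expected yield in the 100% and 50% blocks, matches the single certain pellet in the 25% block, and yields less in the 12.5% block.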
Yoked-reward task.
We incorporated a control condition into the overall experimental design to ascertain how changes in DA efflux during probabilistic discounting related to delivery of food reward versus the making of risk/reward judgments. Rats in the yoked-reward groups were subjected to training wherein food reward was delivered passively on a schedule similar to that experienced by rats performing the decision making task. Rats in this group were initially familiarized with operant chambers and food delivery over 12 d. During this training, they were placed in the chamber with the house light off. Every 47 s, the house light illuminated and 3 s later, the food dispenser delivered four, one, or zero food reward pellets, similar to food delivery experienced by rats performing the decision making task. Thus, illumination of the house light served as a cue that some food may be delivered on a trial, but no levers were extended and no explicit cues or actions were associated with the amount of reward delivered on any trial. Instead, a pseudorandom schedule of reward was designed to approximate the pattern and amount of reward delivery experienced by well trained rats performing the probabilistic discounting task. Separate groups of rats were trained on variants where the schedule of reward delivery over a session mimicked those experienced by rats performing the descending or ascending versions of the discounting task (i.e., rats received progressively less or more food per sample over time, respectively).
We chose a representative subset of rats implanted with PFC and NAc probes that were trained on the descending and ascending versions of the discounting task to serve as the reference group. During microdialysis test days, the food delivery program for the yoked animals was set to ensure that reward delivered on each trial identically matched the pattern of reward experienced by a rat from the reference group that had performed the task.
Surgery.
Once rats trained on the probabilistic discounting task displayed stable levels of choice, they were provided food ad libitum and 2 d later were subjected to surgery. Rats trained for the yoked-reward experiments underwent surgery before training. Rats were anesthetized with either 100 mg/kg ketamine hydrochloride and 7 mg/kg xylazine or isoflurane and implanted with bilateral stainless steel guide cannulae [19 gauge, 15 mm, nitric acid passivated] positioned either over both sides of the medial prelimbic PFC [anteroposterior (AP) = +3.5 mm; medial-lateral (ML) = ±0.6 mm from bregma; and dorsoventral (DV) = −1.0 mm from dura] or with one over the medial PFC in one hemisphere and the other over the contralateral NAc (AP = +1.8 mm; ML = ±1.1 mm from bregma; and DV = −1.0 mm from dura). For all surgical preparations, the mouth bar was set to −3.3 mm (flat skull). An additional guide cannula mounted on the skull (training post) with dental acrylic was used to tether a 30 cm length stainless steel coil to the head during training sessions, thereby allowing the animals to habituate to the dialysis assembly before probe implantation. Twenty-three-gauge obturators flush with the end of guide cannulae remained in place until the microdialysis experiments took place. Rats were given at least 7 d to recover from surgery before testing. During this recovery period, animals were handled for at least 5 min each day and food restricted to 85% of their free-feeding weight. Body weights were monitored daily to ensure a steady weight loss or maintenance during this recovery period. Rats were subsequently retrained on the probabilistic discounting task for at least 5 d while tethered to the steel coil until, as a group, they displayed stable levels of choice behavior.
Microdialysis procedure.
Fourteen to 16 h before the test session, concentric microdialysis probes (2 mm membrane length, 340 μm outer diameter, 65,000 Dalton molecular weight cutoff; Filtral 12; Hospal) with silica inlet–outlet lines were implanted via the guide cannulae in the medial PFC (−4.6 mm below dura) or NAc (−7.8 mm below dura). The use of right and left hemispheres was counterbalanced within groups. Probes were perfused at 1 μl/min with a modified Ringer's solution (10 mM sodium phosphate, 1.2 mM CaCl2, 3.0 mM KCl, 1.0 mM MgCl2, 147.0 mM NaCl, pH 7.4) by means of a 2.5 ml gas-tight syringe (Hamilton) and a syringe pump (Model 22; Harvard Apparatus). Typical in vivo recoveries of DA conducted at room temperature were 17% of a standard DA solution. Perfusion was continuous from the time of probe implantation until the end of the experiment. Implanted rats remained overnight in the test chamber and were given ad libitum access to water and their daily food allowance. The next morning, dialysis samples were collected from the medial PFC and NAc at 7 min intervals and analyzed immediately with high pressure liquid chromatography (HPLC) with electrochemical detection. Baseline sampling continued until four consecutive samples showed <5% fluctuation in DA content (∼1 h); subsequently, the test session began (either probabilistic discounting or yoked-reward). During the test session, eight on-task samples were obtained. The 7 min sampling interval used in these experiments permitted us to monitor changes in DA efflux during the forced-choice and free-choice portions of each trial block separately. In some experiments, animals with cannulae implanted above both brain regions were used for PFC and NAc microdialysis experiments on separate test days, which were separated by 2–3 d of retraining on their respective task.
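The baseline stability criterion can be sketched in Python as follows. The text does not define how "fluctuation" was computed, so this sketch assumes that each of the four samples must lie within 5% of the mean of that four-sample window; the function name `stable_baseline_start` and the example concentrations are hypothetical.

```python
def stable_baseline_start(da_values, window=4, tol_pct=5.0):
    """Return the index of the first run of `window` consecutive samples whose
    values all lie within tol_pct of the window mean, or None if no run qualifies.

    Assumption: 'fluctuation' is the deviation of each sample from the window mean.
    """
    for start in range(len(da_values) - window + 1):
        run = da_values[start:start + window]
        mean = sum(run) / window
        if all(abs(v - mean) / mean * 100 < tol_pct for v in run):
            return start
    return None


# Hypothetical baseline DA concentrations (nM), one value per 7 min sample.
baseline = [0.14, 0.12, 0.11, 0.10, 0.102, 0.099, 0.101]
print(stable_baseline_start(baseline))  # 3

# Eight on-task samples at 7 min each cover 56 min of the session.
print(8 * 7)  # 56
```

In this example the first three windows are rejected because an early, elevated sample deviates by more than 5% from the window mean, and the window beginning at index 3 is the first to qualify.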
HPLC.
Analysis of DA content in the medial PFC and NAc dialysates involved separation by reverse-phase HPLC and quantification by electrochemical detection. The system consisted of an Antec Leyden pump (Alexis LC 110), a pulse damper (Scientific Systems), a Rheodyne Inert manual injector (model 9125i, 20 μl injection loop), a Tosoh Bioscience Super ODS TSK column (2 μm particle, 2 mm × 10 mm), and an Antec Leyden Intro Electrochemical detector and VT-03 flow cell with a Ag/AgCl reference electrode (Vapplied = +700 mV; Leyden). The mobile phase [70 mM sodium acetate buffer, 40 mg/l EDTA and 50 mg/l SDS (adjustable), pH 4.0, 9% methanol] flowed through the system at 0.17 ml/min. EZChrome Elite software (Scientific Software) was used to acquire and analyze chromatographic data. A three-point calibration curve of external DA standards was used to convert the area under the curve of DA peaks into concentration values.
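The conversion of peak area to concentration can be illustrated with a minimal Python sketch. It assumes a linear calibration fitted by ordinary least squares, which the text does not specify, and the standard concentrations and peak areas shown are hypothetical values chosen for illustration, not data from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def area_to_concentration(area, slope, intercept):
    """Invert the calibration line to obtain concentration from peak area."""
    return (area - intercept) / slope


# Hypothetical three-point calibration: DA standards (nM) and their peak areas.
standards_nM = [0.5, 1.0, 2.0]
peak_areas = [1000.0, 2000.0, 4000.0]

slope, intercept = fit_line(standards_nM, peak_areas)
sample_nM = area_to_concentration(3000.0, slope, intercept)
print(round(sample_nM, 3))  # 1.5
```

Because the three hypothetical standards lie exactly on a line through the origin, a sample peak area of 3000 maps to 1.5 nM; real standards would scatter around the fitted line.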
Histology.
After completion of all behavioral testing, rats were killed either in a carbon dioxide chamber or with isoflurane. Brains were removed and fixed in a 4% formalin solution. The brains were frozen and sliced into 50 μm sections before being mounted and stained with Cresyl Violet. Placements were verified with reference to the neuroanatomical atlas of Paxinos and Watson (2005). The locations of acceptable probe placements in the PFC and the NAc are presented in Figure 2.
Location of the microdialysis probes in the PFC (top) and NAc (bottom). Vertical lines represent length of the 2 mm dialysis probes.
Data analyses.
Neurochemical data were transformed into percentage of change from baseline (i.e., 100% representing the average concentration of the three samples preceding the final baseline sample). Neurochemical data obtained from the discounting experiments were analyzed using either two- or three-way between-/within-subjects ANOVAs, with either sample or probability block (collapsing across forced-/free-choice samples) and forced-/free-choice trials (averaging across blocks) as one or two within-subjects factors. The between-subjects factor was probability order (i.e., descending or ascending task variants). For these analyses, the data from on-task samples were aligned with respect to the particular forced- and free-choice portion of the probability block from which they were obtained. Accordingly, across the two task versions, the first and second on-task samples from the descending groups (forced and free choice, 100% block) corresponded to the seventh and eighth samples (respectively) obtained from rats in the ascending groups; the third and fourth samples from the descending groups corresponded to the fifth and sixth samples from the ascending groups, and so forth. Data from the yoked-reward experiments were analyzed in a similar manner. The primary behavioral measure of interest for the probabilistic discounting task was the proportion of choices directed toward the Large/Risky lever for each block of free-choice trials, factoring in trial omissions. For each block, this was calculated by dividing the number of choices of the Large/Risky lever by the total number of successful trials. Trial omission and response latency data were analyzed with repeated-measures one-way ANOVAs for rats in the PFC or NAc experiments separately.
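The transformations described in this section can be summarized in a short Python sketch: normalization to baseline, alignment of on-task samples across the descending and ascending task versions, and the choice proportion. All function names and example values are hypothetical; the sketch assumes on-task samples are numbered 1 to 8 in order of collection, alternating forced- and free-choice portions within each block.

```python
def percent_of_baseline(values, baseline_samples):
    """Express each value as a percentage of the mean baseline concentration."""
    base = sum(baseline_samples) / len(baseline_samples)
    return [100.0 * v / base for v in values]


def ascending_sample_for(descending_sample):
    """Map a 1-based descending on-task sample number to the ascending-version
    sample taken from the same probability block and trial type."""
    block, trial_type = divmod(descending_sample - 1, 2)  # trial_type: 0 forced, 1 free
    return (3 - block) * 2 + trial_type + 1


def proportion_large_risky(n_large_risky, n_small_certain):
    """Large/Risky choices divided by successful (non-omitted) free-choice trials."""
    successful = n_large_risky + n_small_certain
    return n_large_risky / successful if successful else None


print([ascending_sample_for(s) for s in range(1, 9)])  # [7, 8, 5, 6, 3, 4, 1, 2]

# Hypothetical block with 7 Large/Risky choices, 2 Small/Certain choices, 1 omission.
print(proportion_large_risky(7, 2))

# Hypothetical concentrations (nM): two on-task samples against three baseline samples.
print(percent_of_baseline([0.15, 0.20], [0.10, 0.10, 0.10]))
```

The mapping reproduces the correspondence stated in the text: descending samples 1 and 2 pair with ascending samples 7 and 8, samples 3 and 4 with 5 and 6, and so forth.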
Results
Behavioral performance
Rats trained on the descending (n = 24) and ascending (n = 21) versions of the probabilistic discounting task required an average of 30–35 d of training before showing stable choice behavior and proceeding to microdialysis experiments. For PFC microdialysis tests, rats performing either the ascending or descending task version showed significant changes in preference for the Large/Risky lever across probability blocks (F(3,63) = 11.47, p < 0.001), with no differences between the two versions of the task in discounting behavior (F(1,21) = 0.43, n.s.) or interactions between task and probability block (F(3,63) = 2.03, n.s.; Fig. 3A). Similarly for NAc microdialysis tests, rats performing the descending and ascending versions also showed significant discounting of the Large/Risky lever across probability blocks (F(3,60) = 20.57, p < 0.001) with no main effect of task version (F(1,20) = 1.88, n.s.) or task-by-block interaction (F(3,60) = 2.56, n.s.; Fig. 3B). There were no differences in choice behavior between rats in the PFC versus NAc dialysis groups (all Fs < 1.4, n.s.; Fig. 3C). Furthermore, the amount of food obtained over the course of the discounting task on test days did not differ between rats in the PFC or NAc groups, which by definition did not differ from the amount received by rats used in the yoked-reward experiments (all Fs < 1.0, n.s.). Note that with this schedule, animals tended to receive substantially more food in the 100% block compared with that obtained in the other three blocks when delivery of the larger reward occurred with a lower probability (Fig. 3D).
Choice behavior and reward data obtained during microdialysis tests. A, Percentage choice of the Large/Risky lever across four blocks of free-choice trials from rats in the PFC groups tested on either the descending (circles, n = 12) or ascending (squares, n = 11) variants of the probabilistic discounting tasks. Choice of the Large/Risky lever (y-axis) is plotted as a function of the Large/Risky reward probability by block (x-axis) in the manner in which reward probabilities changed for the different groups. B, Choice data displayed by rats in the NAc groups (descending, n = 13; ascending, n = 10). C, Choice data averaged across descending and ascending variants of the probabilistic discounting task for rats in the PFC (squares) and NAc (circles) groups. D, The amount of reward pellets obtained over the different probability blocks for rats in the PFC and NAc groups tested on either the probabilistic discounting or yoked-reward tasks. Note that animals obtained a substantially greater number of pellets in the 100% and 50% blocks compared with the other blocks.
The number of trial omissions on microdialysis test days was generally low and was comparable between the descending and ascending versions of the probabilistic discounting task for both medial PFC tests (F(1,21) = 3.07, n.s.; Table 1) and NAc tests (F(1,20) = 0.13, n.s.; Table 1). The average latency to choose a lever was also very similar between the descending and ascending versions of the task for both medial PFC tests (F(1,21) = 1.86, n.s.; Table 1) and NAc tests (F(1,20) = 1.60, n.s.; Table 1).
Mean response latency (in seconds) and trial omission data recorded while animals performed the descending and ascending versions of the probabilistic discounting task during medial PFC and NAc microdialysis tests
Microdialysis
The data from 80 experiments from which we were able to detect measurable levels of DA are reported here. The PFC probability discounting analysis included data from 23 rats (12 trained on the descending version, 11 on the ascending version), while the NAc discounting analysis included data from 23 rats (13 descending, 10 ascending). For the yoked-reward experiments, we analyzed PFC microdialysis data from 18 rats (10 descending, 8 ascending) and NAc data from 15 rats (8 descending, 7 ascending).
In the PFC, average extracellular concentrations of DA (uncorrected for probe recovery) during the baseline period were 0.10 ± 0.02 (SEM) nM per sample (Descending Discounting), 0.13 ± 0.03 nM (Ascending Discounting), 0.29 ± 0.06 nM (Descending Yoked-Reward), and 0.07 ± 0.01 nM (Ascending Yoked-Reward). In the NAc, concentrations of DA were 1.09 ± 0.10 nM per sample (Descending Discounting), 1.02 ± 0.16 nM (Ascending Discounting), 1.06 ± 0.37 nM (Descending Yoked-Reward), and 1.38 ± 0.28 nM (Ascending Yoked-Reward). A one-way ANOVA revealed no significant differences in baseline DA across the four tasks in the NAc (F(3,36) = 0.60, n.s.). However, the analysis of PFC baseline DA did reveal a significant main effect of task (F(3,40) = 8.57, p < 0.01), driven by the Descending Yoked-Reward group, which had slightly more elevated baseline DA levels than the other three groups (p < 0.01).
DA efflux in the medial PFC
Decision making
As expected, DA efflux increased significantly above baseline at task onset and remained elevated over the duration of the test session. We were particularly interested in comparing the profiles of changes in DA efflux in animals tested on the descending versus ascending versions of the task. As displayed in Figure 4A, rats tested on the descending version displayed a robust (160–200%) increase in PFC DA extracellular levels during the first (100%) probability block, and then showed a relatively rapid decrease across the subsequent blocks. In contrast, those rats trained on the ascending version showed the opposite profile, with PFC DA levels increasing as reward probability on the Large/Risky lever (and the amount of food obtained per sample) increased over a session. Our initial analysis used two prebaseline, eight on-task, and two postbaseline samples and compared DA efflux between the two versions. A two-way, between-/within-subjects ANOVA with sample per probability block as a within-subjects factor, and probability order (descending or ascending) as a between-subjects factor, was used. Thus, in this analysis, we compared DA efflux across the two tasks for each specific probability block, rather than the specific time points from which the samples were obtained. The ANOVA revealed a significant main effect of sample (F(11,231) = 11.22, p < 0.001) but, importantly, no main effect of probability order (F(1,21) = 0.88, n.s.) or order-by-sample interaction (F(11,231) = 1.36, n.s.). Therefore, PFC DA levels were comparable across the different probability blocks, regardless of the order in which the animal experienced changes in reward probabilities over a session. These findings suggest it is unlikely that changes in DA efflux were attributable to satiety or rundown of the DA signal that may have occurred during a session. 
As summarized in Figure 4B (circles), PFC DA efflux was highest in the 100% probability block, and changed proportionally over the remaining blocks, with lower values of PFC DA associated with both lower likelihood of obtaining the Large/Risky reward and relatively fewer reward pellets per sample.
Fluctuations in PFC DA efflux during decision making track changes in the relative rate of reward received. A, Percentage change in basal DA extracellular levels in the PFC for rats trained on the descending (yellow circles, n = 12) and ascending (blue circles, n = 11) variants of the probabilistic discounting task, plotted as a function of 7 min sample number. Star denotes p < 0.01 versus baseline for all samples in both groups. Rats tested on the descending version displayed an initial increase in DA that diminished as Large/Risky reward probabilities decreased, whereas those trained on the ascending version showed the opposite profile. B, Change in PFC DA efflux for all rats trained on both variants of the probabilistic discounting task (circles, n = 23) plotted as a function of probability block. Gray and green circles correspond to DA levels obtained during forced- and free-choice portions of each block. Combined data from rats in the yoked-reward experiment (squares, n = 18) are also plotted. Cross denotes p < 0.05 significant difference in DA from samples obtained during the 25% and 12.5% blocks, relative to the 100% and 50% blocks for both groups.
Next, we performed a targeted analysis on the eight on-task samples to further explore the profile of change in PFC DA efflux as Large/Risky reward probabilities varied, and also to determine whether there were any differences in DA efflux during forced- versus free-choice trials. A three-way ANOVA, with probability order as a between-subjects factor and probability block and forced-/free-choice trials as within-subjects factors, did not yield a significant main effect of order or interactions with the within-subjects factors (all Fs < 2.1, n.s.). However, the analysis did reveal a significant main effect of probability block (F(3,63) = 13.12, p < 0.01), indicating that PFC DA varied in accordance with changes in the Large/Risky reward probability across blocks. Multiple comparisons further revealed that PFC DA efflux was significantly (p < 0.05) lower during the 25% and 12.5% probability blocks relative to the 100% and 50% blocks (which did not differ from each other; Fig. 4B). Interestingly, this analysis did not yield a significant main effect of forced-/free-choice (F(1,21) = 0.005, n.s.) or an interaction between this factor and block (F(3,63) = 1.09, n.s.). Therefore, even though PFC DA efflux varied across probability blocks, within each particular block, average DA levels during forced-choice trials (149 ± 9%) were comparable to those observed when animals were required to make a choice (150 ± 6%).
Yoked-reward condition
Rats in the yoked-reward experiments displayed a profile of PFC DA efflux that was strikingly similar to that observed during decision making. Rats yoked to a reward schedule experienced by those trained on the descending version of the discounting task again showed a robust increase in PFC DA extracellular efflux at the start of the session, followed by a rapid decrease across subsequent blocks. Likewise, rats yoked to those trained on the ascending version showed the opposite profile, with PFC DA efflux increasing as the amount of food obtained per sample increased over the session. Analyzing these data with an ANOVA model similar to that used on the discounting task again revealed a significant main effect of sample (F(15,240) = 14.28, p < 0.01), but no main effect of probability order (F(1,16) = 0.84, n.s.) or sample-by-order interaction (F(15,240) = 1.58, n.s.). As displayed in Figure 4B (squares), regardless of the reward schedule rats experienced, PFC DA levels were significantly higher during samples where more food reward was delivered (i.e., the equivalent of the 100 and 50% blocks, corresponding to ∼54 and 31 food pellets, respectively) compared with the 25 and 12.5% equivalent blocks (corresponding to ∼17 and 11 pellets, respectively). As was observed in the discounting experiment, PFC DA levels were again significantly (p < 0.05) lower during the 25% and 12.5% probability blocks relative to the 100% and 50% blocks (which did not differ from each other).
We then directly compared DA efflux from animals performing the discounting task to animals on the yoked-reward task (Fig. 4B, circles vs squares). Again, the main effect of sample was significant (F(7,273) = 13.38, p < 0.01). However, we did not observe significant effects of task (discounting vs yoked-reward; F(1,39) = 0.076, n.s.) or a sample-by-task interaction (F(7,273) = 0.59, n.s.). This confirms that the profile and magnitude of change in PFC DA release exhibited by rats engaged in decision making for food reward was statistically indistinguishable from that of rats that were not required to make instrumental responses or risk-based decisions, but did passively receive a comparable amount of food delivered on a similar schedule. This finding suggests that the fluctuations in PFC DA transmission during either condition correspond primarily to changes in the relative rate of reward received.
DA efflux in the NAc
Decision making
As observed in the PFC, DA efflux in the NAc increased in both the descending and ascending versions of the discounting task compared with baseline (Fig. 5A). Analysis of the neurochemical data with a two-way ANOVA revealed a significant main effect of sample (F(11,231) = 17.48, p < 0.01) but no main effect of probability order (F(1,21) = 0.03, n.s.) or order-by-sample interaction (F(11,231) = 0.70, n.s.). Therefore, fluctuations in NAc DA showed a similar pattern for the descending and ascending versions of the task when matched for probability block. As observed in the PFC for rats tested on the descending version, DA efflux in the NAc showed an initial increase at task onset and then a trend of decreasing values as the probabilities decreased across blocks. Conversely, for rats tested on the ascending version, DA release increased progressively over the session.
Figure 5. Fluctuations in NAc DA efflux track multiple factors related to decision making. A, Percentage change in basal DA extracellular efflux in the NAc for rats trained on the descending (yellow circles, n = 13) and ascending (blue circles, n = 10) variants of the probabilistic discounting task, plotted as a function of sample number. All other conventions are the same as Figure 4A. B, Change in NAc DA efflux for all rats trained on both variants of the probabilistic discounting task (n = 23), plotted as a function of probability block. All other conventions are the same as Figure 4B. DA levels during probabilistic discounting did not differ across the 100% to 25% blocks (n.s.), whereas levels during the 12.5% block were significantly lower than the other three blocks (p < 0.05, denoted by the dagger). Note the sawtooth pattern of change during the discounting task, where DA levels tended to be higher during free-choice (green circles) versus forced-choice (gray circles) trials. Squares represent combined data from rats in the yoked-reward experiments (n = 15). Cross denotes p < 0.05 significant difference in DA from samples obtained during the 50%, 25%, and 12.5% blocks, relative to the 100% blocks for rats in the yoked-reward groups. Dashed outline highlights data points used for a targeted analysis comparing DA levels during the discounting versus yoked-reward conditions (averaged in E). C, Percentage change in NAc DA efflux displayed by rats in the yoked experiments that received reward schedules matched to subjects trained on the descending (yellow triangles, n = 8) and ascending (blue triangles, n = 7) variants of the discounting task. Star denotes significant difference at p < 0.05 versus baseline. Cross denotes p < 0.05 for a particular sample in the descending group compared with a sample obtained from the ascending group during the equivalent probability block. All other conventions are the same as Figure 5A. D, Average change in NAc DA concentrations obtained during forced-choice (gray bar) and free-choice (green bar) portions of the discounting task. E, Average DA levels relative to baseline taken from the 50–12.5% blocks (highlighted in B) from all rats in the discounting and yoked-reward experiments. During this portion of the tasks, DA levels were higher in rats performing the discounting task (p < 0.05, star) compared with those receiving a comparable schedule of reward, but not making decisions.
Importantly, distinct differences were observed in the profile of changes in NAc DA efflux relative to the PFC. Figure 5B illustrates that the rate of change in the NAc DA curve across probability blocks was more gradual than that observed in the PFC, and also exhibited a sawtooth-like pattern corresponding to forced- versus free-choice trials (Fig. 5B, gray vs green circles). An ANOVA of the eight on-task samples yielded a significant effect of probability block (F(3,63) = 10.15, p < 0.01). Multiple comparisons further confirmed that NAc DA efflux did not differ across the 100, 50, and 25% probability blocks, but was significantly lower (p < 0.05) during the 12.5% block compared with the other three blocks. This is in keeping with the indication that changes in NAc DA levels across blocks occurred at a slower rate relative to PFC DA under similar conditions. In addition, the analysis exposed a main effect of forced-/free-choice trials (F(1,21) = 4.63, p < 0.05). Inspection of Figure 5A reveals that for the descending condition (yellow circles), DA levels were higher during the second microdialysis sample (i.e., the on-task free-choice portion) relative to the first sample (i.e., forced-choice portion); the DA values in the fourth sample were again higher than the third, a pattern that repeats whenever samples from the free-choice condition are compared with the forced-choice situation. A similar pattern can be seen in the ascending condition (Fig. 5A, blue circles), although when plotted in this manner, the sawtooth profile is masked by a steady increase in DA efflux over the entire session. When averaged across the session, DA levels were proportionally 15% higher (∼6% difference) on trials when rats had to make a choice compared with trials where only one lever was presented (Fig. 5D).
Yoked-reward condition
Analysis of data obtained from the yoked-reward experiment again yielded no main effect of probability order (F(1,13) = 0.92, n.s.). A main effect of sample was observed (F(15,195) = 9.23, p < 0.01), confirming that when collapsed across reward schedules, DA efflux changed in accordance with the relative amount of food received per block, being highest during the equivalent 100% probability block, and progressively lower in the other blocks (Fig. 5B, squares). Multiple comparisons further showed that the rate of change in DA efflux over reward-equivalent blocks was steeper than that observed during the decision-making task. Specifically, DA values across samples in the 50, 25, and 12.5% blocks were significantly lower when compared with the 100% block (p < 0.05; Fig. 5B, squares). Curiously, the analysis also produced a significant sample-by-probability order interaction (F(15,195) = 3.03, p < 0.01; Fig. 5C). This interaction was driven primarily by the fact that for rats yoked to descending schedules, the magnitude of increase in DA efflux, relative to baseline, during the 100–50% blocks was greater (p < 0.05) than that observed during the same reward-equivalent blocks from rats yoked to ascending schedules. Notably, the amount of food obtained during these blocks did not differ between groups [descending = 3.9 ± 0.2 g (87 pellets); ascending = 3.6 ± 0.2 g (79 pellets); t(14) = 1.07, n.s.]. Therefore, rats that received this quantity of food at the start of the test session showed greater increases in NAc DA efflux compared with rats that received a comparable amount near the end of a session. While the reason for this difference is unclear, it may merely be attributable to the fact that rats in the ascending group had received ∼32 reward pellets earlier in the session. 
As such, this difference in the magnitude of DA efflux may reflect that the incentive salience of these larger amounts of food was greater when received at the start of the test session, compared with the end of a test session. It is important to note that no such difference in the relative magnitude of DA efflux was observed in rats actively engaged in the two versions of the discounting task. Despite this difference, the primary conclusion from the yoked-reward experiment is that NAc DA efflux varies in proportion to changes in the relative amount of reward received, with higher DA levels associated with periods in which relatively greater amounts of food were available.
When we directly compared changes in NAc DA efflux displayed by rats in the decision-making and yoked-reward conditions, the overall analysis of the eight on-task samples did not yield a significant main effect of task (F(1,36) = 0.96, n.s.) or sample-by-task interaction (F(7,252) = 0.59, n.s.). Thus, overall, changes in NAc DA efflux across the two conditions did not differ significantly. Recall, however, that when these data were analyzed separately, changes in NAc DA efflux across probability blocks during the discounting task were smaller than those observed during the yoked-reward experiment. Furthermore, inspection of the two curves presented in Figure 5B shows that although DA efflux was nearly identical between the two conditions during the 100% probability block (or equivalent), values tended to be higher in rats that performed the discounting task during the other blocks. Specifically, DA efflux in the discounting condition was higher initially during the free-choice portion of the 50% probability block and remained higher than that in the yoked-reward group over the rest of the session. It is notable that during these phases of the discounting task, rats had to choose levers under conditions of reward uncertainty, compared with other blocks, where they were either always guaranteed some reward (100% block) or did not have to make a choice (50%, forced-choice sample). In light of these considerations, we conducted an exploratory comparison of the average DA values obtained during the 50% through 12.5% blocks between the decision-making and yoked-reward tasks (Fig. 5B, dashed outline). This comparison showed that for rats in the discounting group, average DA values tended to be higher during this period compared with the yoked-reward group (t(36) = 1.82, p < 0.05, one-tailed; Fig. 5E). 
Viewed collectively, these results suggest that in addition to the amount of reward obtained, other factors may have influenced the manner in which NAc DA levels fluctuated during decision making, such as whether a choice is required or the amount of reward uncertainty associated with those choices.
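To make the exploratory comparison above concrete, the following is a minimal sketch of how a one-tailed p value can be obtained from the reported statistic, t(36) = 1.82. It is not the authors' analysis code; the function names and the Simpson's-rule integration are illustrative assumptions, chosen only so that the example needs nothing beyond the Python standard library.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the Simpson's-rule integral of the pdf on [0, t].

    `steps` must be even (hypothetical helper, not from the paper).
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Values reported in the text: t(36) = 1.82
p_one_tailed = upper_tail_p(1.82, 36)
p_two_tailed = 2 * p_one_tailed
print(f"one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
```

Under these assumptions the one-tailed p value falls below 0.05 while the two-tailed value does not, which is consistent with the comparison being reported as exploratory and directional.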
Comparison of patterns of changes in DA efflux during choice and reward
The preceding analyses revealed systematic differences between patterns of DA efflux in the PFC and NAc during decision making. Changes in PFC DA efflux during the discounting task closely resembled the pattern of release observed in animals receiving matched amounts of food, suggesting that fluctuations in DA transmission in this region may signal changes in the relative rate of reward received. In contrast, although DA efflux in the NAc shared some similarities in both the probabilistic discounting and yoked-reward tasks, some notable distinctions emerged. DA efflux in the NAc increased during the free-choice trials compared with the forced-choice trials (an effect not observed in the PFC), suggesting that DA in the NAc may be associated in a specific manner with selection of a particular option with a specific probability of reward.
For the probabilistic discounting task, two key variables changed across probability blocks that covaried with changes in DA efflux: the proportion of choices rats made of the Large/Risky option and the amount of food reward they obtained per block. To compare directly the rate of change in DA efflux in the PFC and NAc to these other two variables and thereby clarify which factors were related more closely to fluctuations in the DA signal, each of these datasets was transformed to the same metric, so we could compare the slopes of the curves. Specifically, for each individual animal's data, the change score relative to the 100% block was computed for each variable (DA, choice, reward) at the three remaining blocks. Both the DA and choice data obtained during the free-choice portion of each block were expressed according to the following formula: Value[X] − Value[100% block] + 1, where Value[X] is the value of the variable (DA, choice, number of pellets) obtained at a particular probability block. Following these transformations, the value at the 100% block was always 1. Change scores obtained from the food data were converted to percentages by dividing them by the amount of reward obtained in the first block, thereby normalizing them on the same relative scale as the other data.
Next, we computed the linear slopes of the curves of each variable for each animal. The products of these transformations were three curves (DA, choice, food) for each animal that were identical to the raw data in terms of their relative rate of change (i.e., slopes), but were now on comparable scales (e.g., compare choice curves in Fig. 6 to those in Fig. 3C). Dependent-sample t tests were used to compare the average slope of the DA curve with the slopes of the choice and food-received curves across probability blocks.
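The transformation and slope comparison described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: it assumes that DA and choice values are expressed as proportions, that blocks are spaced evenly on the x-axis (indices 1 to 4), and that the food change scores are divided by the first-block amount before the +1 offset is applied. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def change_scores(values):
    """Value[X] - Value[100% block] + 1, so the first (100%) block is always 1."""
    ref = values[0]
    return [v - ref + 1.0 for v in values]

def food_change_scores(pellets):
    """Pellet change scores scaled by the first-block amount, then offset by 1.

    Assumed reading of the text: (pellets[X] - pellets[100%]) / pellets[100%] + 1.
    """
    ref = pellets[0]
    return [(p - ref) / ref + 1.0 for p in pellets]

def slope(ys, xs=(1, 2, 3, 4)):
    """Ordinary least-squares slope of ys against block index (assumed x-axis)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def paired_t(a, b):
    """Dependent-sample t statistic and degrees of freedom for paired slopes."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

if __name__ == "__main__":
    # Hypothetical per-animal data (100%, 50%, 25%, 12.5% blocks); not from the paper.
    da = [[1.60, 1.50, 1.40, 1.30], [1.50, 1.45, 1.35, 1.20], [1.70, 1.50, 1.45, 1.35]]
    food = [[54, 31, 17, 11], [50, 30, 18, 10], [56, 32, 16, 12]]
    da_slopes = [slope(change_scores(animal)) for animal in da]
    food_slopes = [slope(food_change_scores(animal)) for animal in food]
    t, df = paired_t(da_slopes, food_slopes)
    print(f"t({df}) = {t:.2f}")
```

Because adding a constant does not change a slope, the change-score step only aligns the curves at 1 for the 100% block; it is the division by the first-block amount that puts the food data on a scale comparable to the DA and choice data. With one slope pair per animal, the paired test has n − 1 degrees of freedom, which matches the t(22) values reported for 23 animals.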
Figure 6. Comparison of changes in choice biases, reward rates, and DA efflux in the PFC and NAc. Graphs depict change scores relative to the 100% block for DA concentration during free-choice blocks (circles), choice of the Large/Risky option (diamonds), and the amount of food reward received (triangles). Lines display the relative slopes for each of the curves. A, Changes in DA efflux in the PFC during decision making closely corresponded to changes in the amount of food reward received across blocks, as the slopes of these two curves were not significantly different (n.s.). In contrast, changes in risky choice occurred at a significantly slower rate (p < 0.05, star). B, Changes in DA efflux in the NAc closely tracked changes in choice behavior across blocks (n.s.), but not reward rates (p < 0.05, star).
The results of these analyses are presented in Figure 6. As displayed in Figure 6A, changes in PFC DA levels across blocks (circles) occurred at a substantially faster rate when compared with changes in choice of the Large/Risky reward option (diamonds), and an analysis of these data revealed that the slopes of the two curves were significantly different from each other (t(22) = 8.98, p < 0.01). In contrast, changes in the amount of food reward obtained over blocks appeared to correspond much more closely to changes in the DA signal, with a statistical comparison revealing no difference in the slopes of these two curves (t(22) = 0.78, p > 0.40). This result further supports the notion that changes in PFC DA transmission during decision making are associated closely with changes in the relative rate of reward received.
The opposite profile was observed for the NAc data (Fig. 6B). Here, the relative change in the DA signal, as revealed by the slope, was substantially slower compared with the slope of the amount of food reward obtained across blocks. Analysis of the two curves confirmed that their slopes were significantly different (t(22) = 7.64, p < 0.01). In comparison, changes in choice of the Large/Risky lever across blocks were nearly identical to relative changes in DA release at the same time points, with the analysis revealing that the slopes of these curves were indistinguishable statistically (t(22) = 0.48, p > 0.60). Thus, unlike the PFC, DA efflux in the NAc closely tracked changes in choice during decision making as the odds of obtaining a Large/Risky reward varied over the course of a decision-making session.
Response latencies
In these experiments, direct measures of locomotor activity were not collected. However, during the discounting tasks, response latencies were recorded, which may be used as an indirect measure of task-related/-unrelated activity. For example, greater activity during the intertrial interval would be expected to displace animals from the response levers, thereby increasing response latencies. We therefore analyzed changes in response latencies to assess whether variations in motor activity across probability blocks were related to fluctuations in PFC and NAc DA efflux. In the PFC experiment, response latencies were longer during forced- versus free-choice trials (F(1,22) = 82.52, p < 0.01; Fig. 7A, white vs gray circles). However, response latencies did not vary significantly across blocks (F(3,66) = 2.10, n.s.). Rats implanted with probes in the NAc also showed longer response latencies during forced- versus free-choice trials (F(1,22) = 105.44, p < 0.01; Fig. 7B). This effect may be related to the higher levels of NAc DA observed during free- versus forced-choice trials (Fig. 5D). However, once again response latencies did not vary across blocks (F(3,66) = 1.86, n.s.). The observation that PFC and NAc DA efflux changed consistently across probability blocks, whereas response latencies did not, suggests that variations in DA transmission during choice behavior are unlikely to reflect systematic changes in motor behavior that differ between high- and low-probability blocks. This notion is consistent with previous observations that DA efflux in the NAc shows no changes during forced locomotion on a rotorod task (Damsma et al., 1992).
Figure 7. Changes in response latencies across probability blocks. Graph depicts latencies to respond during forced-choice (open circles) and free-choice (gray circles) trials, and the average of the two values (squares), plotted as a function of probability block, for rats with microdialysis probes in the PFC (A) and the NAc (B).
Discussion
Here we report that dynamic fluctuations in PFC and NAc DA transmission are associated with specific aspects of cost/benefit decision making entailing risk/reward assessment. Changes in PFC DA track changes in the relative rate of reward received. In comparison, DA efflux in the NAc exhibits more complex patterns of change related to multiple factors, including whether a choice is required, the degree of reward uncertainty, and relative rates of reward delivery. Thus, dynamic changes in DA efflux in these regions may signal aspects of reward value or the relative probability of obtaining a reward, which in turn could influence decision biases and response selection in the face of changes in reward probabilities.
Changes in PFC DA as a running-rate meter of reward availability
DA efflux in the PFC showed similar changes during decision making, regardless of whether the odds of obtaining the Large/Risky reward decreased or increased. Rather than increasing at task onset and remaining elevated, variations in DA closely corresponded to changes in amounts of food obtained per sample. Furthermore, animals tested on the yoked-reward tasks showed a similar DA profile. This observation, combined with the finding that the rate of change in PFC DA efflux was comparable to that of the amount of food obtained per block, strongly suggests that dopaminergic afferents to the frontal lobes may convey information about changes in the relative amount of reward availability.
Increased PFC DA efflux has been associated with consumption of primary reward, lasting up to 30 min (Cenci et al., 1992; Phillips et al., 1993; Feenstra and Botterblom, 1996; Bassareo and Di Chiara, 1997, 1999). Moreover, DA efflux in the PFC appears to reflect the relative incentive value of the food consumed, as demonstrated by sensory-specific satiety manipulations (Ahn and Phillips, 1999). Therefore, DA release in the PFC may encode information about food consumption and its relative subjective value that may facilitate motivated behaviors guided by different cognitive processes. In this regard, changes in PFC DA efflux have been observed during performance of different cognitive tasks, including working memory (Phillips et al., 2004), set-shifting (Stefani and Moghaddam, 2006), and, in particular, delay-discounting (Winstanley et al., 2006).
In the latter study, PFC DA was measured while animals selected either a small/immediate reward or a large/delayed reward. Increases in PFC DA efflux were observed during performance of the task, and also during a yoked-reward condition similar to that used here. This finding suggested that “DA function may increase in this region in response to the earning or delivery of reward regardless of the type of … contingencies involved” (Winstanley et al., 2006, p 111). The present findings expand on this conclusion by showing that DA efflux in the PFC is finely tuned to variations in reward availability and may serve as a running-rate meter for reward, thereby informing the PFC about changes in reward rates. This notion is consistent with the conjecture that the absolute value of tonic DA provides an integrated and continuous estimate of the net rate of reward (Niv et al., 2007). Although this function has been attributed to mesoaccumbens DA, the present data suggest that changes in tonic DA levels within the PFC also represent this type of information.
The decision-making task used here requires animals to continuously monitor rewarded and nonrewarded actions to determine which options are becoming relatively more or less profitable, and modify choice behavior accordingly. The prelimbic PFC plays a critical role in facilitating these functions; inactivation of this region impairs the ability to modify biases when reward probabilities are volatile (St. Onge and Floresco, 2010). The present data provide insight into the mechanisms by which the PFC may register changes in reward probabilities, as they indicate that variations in mesocortical DA transmission may encode information about changes in reward availability that can be used to modify choice accordingly. It follows that interfering with this DA signal would cause a discrepancy between the perceived versus actual rates of reward obtained, thereby leading to decreases or increases in risky choices. Indeed, manipulations of PFC D1 receptors can induce these predicted effects; D1 blockade induces risk aversion and increases negative feedback sensitivity, whereas D1 agonists tend to increase risky choice (St. Onge et al., 2011). Interestingly, D2 blockade has the opposite effect of D1 receptor antagonism, increasing risky choices when reward probabilities decrease, suggesting that modulation of decision making by PFC D2 receptors is less dependent on variations in extracellular DA levels. This may be related to the high affinity of D2 receptors for DA, or to their effect on a population of neurons distinct from those modulated by D1 activity (Gee et al., 2012; Seong and Carter, 2012). As such, fluctuations in PFC DA signaling likely modify decision-making biases primarily via actions on D1 receptors.
Changes in NAc DA during decision-making encode multiple factors
In contrast to the PFC, variations in mesoaccumbens DA during decision making displayed a more complex profile. As observed in the PFC, NAc DA efflux decreased or increased across blocks in accordance with variation in the probability of Large/Risky reward. Comparison of DA efflux between the discounting and yoked-reward conditions showed similar (but not identical) patterns of change, suggesting that delivery of varying amounts of food can contribute to fluctuations in NAc DA release (Heffner et al., 1980; Radhakishun et al., 1988; Salamone et al., 1994). However, closer examination revealed other factors related specifically to decision making that also appeared to contribute to the profile of DA efflux in the NAc.
NAc DA efflux during decision making tended to be greater than in the yoked-reward experiments, specifically during the free-choice portion of the 50% block and throughout the 25–12.5% probability blocks. When compared with the other blocks, these portions of the task represented circumstances in which rats were required to make choices under conditions of greater reward uncertainty. Notably, midbrain DA neurons are particularly sensitive to reward uncertainty. Stimuli predictive of potential rewards evoke sustained increases in neural firing of VTA DA neurons that are maximal during periods of greatest reward uncertainty (Fiorillo et al., 2003). Therefore, it is reasonable to propose that choice conditions associated with relatively greater reward uncertainty may augment DA neuron activity that in turn contributes to greater mesoaccumbens DA levels compared with food delivery alone. Thus, aspects of the DA signal in the NAc may also convey information about the relative uncertainty of upcoming rewards.
The magnitude of NAc DA efflux was also greater during free- versus forced-choice trials. These data complement recent findings from studies using subsecond recordings of phasic NAc DA signaling when rats chose between smaller or larger rewards associated with a variety of costs (Day et al., 2010; Gan et al., 2010; Sugam et al., 2012). DA transients were of lesser magnitude when rats could only select a small reward, compared with free-choice or large-reward forced-choice trials. The magnitude of choice-related phasic DA events was correlated with the animal's preferred option, regardless of which reward was eventually selected, suggesting that DA encodes potential availability of preferred rewards (Sugam et al., 2012). Accordingly, differences in NAc dialysate DA levels observed here during forced-/free-choice trials may reflect an integration of these smaller/larger phasic events summed across 7 min samples. The relatively small difference in magnitude of DA efflux during forced-/free-choice trials is consistent with findings that phasic DA signaling makes a relatively minor contribution to tonic levels measured with microdialysis (Floresco et al., 2003). Nevertheless, these data show that choice situations activate NAc DA transmission to a greater degree than when animals respond for a potential reward without the benefit of a choice, which may signal opportunities to select larger/preferred rewards.
Another striking observation was how closely changes in NAc DA efflux tracked choice behavior. The rate of change of NAc DA across trial blocks was virtually identical to the manner in which rats shifted their preference away from/toward the Large/Risky option. Indeed, comparing slopes of the three curves (choice, food, DA) revealed that changes in NAc DA and choice occurred at the same rate, whereas changes in the amount of food obtained over blocks had a significantly steeper slope. The NAc has been implicated in promoting bias toward large/uncertain rewards (Stopper and Floresco, 2011), a function critically dependent on both D1 receptors and input from the basolateral amygdala (Stopper et al., 2010; St. Onge et al., 2012), which in turn can influence mesoaccumbens DA release (Floresco et al., 1998, 2001; Howland et al., 2002; Jones et al., 2010). Thus, in addition to variations in midbrain DA neuron activity, NAc DA transmission during decision making may also be modulated by signals from the amygdala that influence choice behavior. Moreover, variations in extracellular DA concentration (and corresponding activation of D1 receptors) would be expected to decrease/increase bias toward Large/Risky rewards in accordance with changes in reward probabilities. Together, these findings suggest that fluctuations in tonic DA transmission in the NAc represent an integration of multiple types of information relevant to decision making, including reward uncertainty, opportunities to select preferred rewards, overt choice behavior, and changes in reward availability. Viewed in a broader context, dynamic fluctuations in PFC DA may serve as a reward-rate meter that signals changes in reward availability used to modify decision biases, whereas variations of tonic DA in the NAc may encode a combination of factors that facilitate implementation of decision policies. 
Elucidation of the dopaminergic mechanisms that underlie different aspects of risk/reward judgments provides insight into the complex neurochemical modulation of normal decision making, and may also inform our understanding of pathophysiology contributing to numerous disorders associated with perturbations in reward processing, decision making, and DA function.
Footnotes
This work was supported by grants from the Canadian Institutes of Health Research (MOP 89861 to S.B.F. and MOP 38069 to A.G.P.). S.B.F. is a Michael Smith Foundation for Health Research Senior Scholar and J.R.S.O. is the recipient of scholarships from the Natural Sciences and Engineering Research Council of Canada and the Michael Smith Foundation for Health Research. We are indebted to Kitty So for her assistance with microdialysis.
Correspondence should be addressed to Dr. Stan B. Floresco, Department of Psychology and Brain Research Center, University of British Columbia, 2136 West Mall, Vancouver, B.C. V6T 1Z4, Canada. floresco@psych.ubc.ca