Abstract
Reward-seeking behavior depends critically on processing of positive and negative information at various stages such as reward anticipation, outcome monitoring, and choice evaluation. Behavioral and neuropsychological evidence suggests that processing of positive (e.g., gain) and negative (e.g., loss) reward information may be dissociable and individually disrupted. However, it remains uncertain whether different stages of reward processing share certain neural circuitry in frontal and striatal areas, and whether distinct but interactive systems in these areas are recruited for positive and negative reward processing. To explore these issues, we used a monetary decision-making task to investigate the roles of frontal and striatal areas at all three stages of reward processing in the same event-related functional magnetic resonance imaging experiment. Participants were instructed to choose whether to bet or bank a certain number of chips. If they decided to bank or if they lost a bet, they started over betting one chip. If they won a bet, the wager was doubled in the next round. Positive reward anticipation, winning outcome, and evaluation of right choices activated the striatum and medial/middle orbitofrontal cortex, whereas negative reward anticipation, losing outcome, and evaluation of wrong choices activated the lateral orbitofrontal cortex, anterior insula, superior temporal pole, and dorsomedial frontal cortex. These findings suggest that the valence of reward information and counterfactual comparison more strongly predict a functional dissociation in frontal and striatal areas than do various stages of reward processing. These distinct but interactive systems may serve to guide human reward-seeking behavior.
Introduction
Being able to adequately process reward information is essential to our physical, mental, and socioeconomic well-being (Fellows, 2004). Alterations in the reward system have been associated with various neuropsychiatric disorders, including depression (Drevets, 2001), pathological gambling (Goudriaan et al., 2004), substance abuse (Volkow et al., 2003; Bechara, 2005; Garavan and Stout, 2005), eating disorders and obesity (Volkow and Wise, 2005), and schizophrenia (Chau et al., 2004).
Recent neuroimaging studies on human reward circuitry have implicated many brain regions, including the orbitofrontal cortex (OFC) and striatum. These structures are the main projection areas of two distinct dopaminergic pathways, the mesocortical and mesolimbic pathways, respectively. Although it has been suggested that midbrain dopamine neurons play a major role in reward processing (Schultz, 2002, 2006) in that they code prediction errors between actual and anticipated reward (Holroyd and Coles, 2002; Montague and Berns, 2002; Montague et al., 2006), it remains unclear how dopamine neurons modulate frontal and striatal areas at various stages of reward processing. For example, potentially because of differences in experimental paradigms, there have been mixed results as to whether the striatum and OFC are responsible for reward anticipation or reward outcome (Breiter et al., 2001; Knutson et al., 2001b; McClure et al., 2003; Ramnani et al., 2004; Rogers et al., 2004; Tanaka et al., 2004; Delgado et al., 2005). Another issue is whether there are functionally distinct systems to process reward information of positive (e.g., gain) or negative (e.g., loss) valence (Kringelbach, 2005). Whereas some studies suggest that medial areas (e.g., medial OFC and striatum) are sensitive to relative gains (O'Doherty et al., 2001; Nieuwenhuis et al., 2005) and lateral areas (e.g., lateral OFC and anterior insula) to loss or punishment (O'Doherty et al., 2003a; Ullsperger and von Cramon, 2003), other studies found that activity of the caudate nucleus and insula was independent of the valence of outcomes (Elliott et al., 2000; Delgado et al., 2003).
To a certain extent, these mixed results highlight some of the important distinctions in human decision-making research, such as expected values and utilities (von Neumann and Morgenstern, 1944; Knutson et al., 2005; Tobler et al., 2006), framing and prospect theory (Kahneman and Tversky, 1979; Tversky and Kahneman, 1981; Trepel et al., 2005), and cognitive-affective interaction (Loomes and Sugden, 1982; Mellers, 2000; Ursu and Carter, 2005).
To better understand the functions of reward circuitry at different stages, the present functional magnetic resonance imaging (fMRI) study used one experiment to examine three reward processes: reward anticipation, outcome monitoring, and choice evaluation. In a computerized gambling task, participants chose to bet or bank a certain number of chips and experienced the consequences of their choices (Fig. 1). With this interactive task, we were able to examine brain activation patterns for positive and negative conditions at various phases of reward processing. We found a functional dissociation in frontal and striatal areas for processing of positive and negative reward information and counterfactual comparison. These distinct but interactive systems may serve to guide human reward-seeking behavior.
Reward flowchart (top) and task procedures (bottom). Participants were informed of the result (win or lose) even if they decided to bank on a trial.
Materials and Methods
Participants.
Seventeen right-handed, native English speakers (age range, 18–45 years; average age of 26 ± 8; nine women) were recruited from the local community. All participants had normal or corrected-to-normal vision. We screened the participants for neurological, psychiatric, and medical conditions through self-report. Potential volunteers were excluded if they reported taking any medications that affected CNS function (in consultation with a doctor associated with our protocol). None of the participants in the present study reported taking such medications, and none reported any major neurological, psychiatric, or medical conditions (including substance abuse and learning disabilities). We did not ask questions with regard to menstrual cycle for female participants. A signed informed consent form approved by University of Kentucky Institutional Review Board was obtained from each participant before the experiment. Two male participants were excluded from data analysis because of excessive head motion (absolute displacement with regard to the reference scan exceeded half a voxel size, 1.75 mm).
Design and task.
The task was a gambling game in which participants decided whether to bet or bank a certain number of chips on each trial. As an incentive, the final compensation for their participation included a bonus of $5.00 to $15.00 based on the total number of chips they earned at the end of the experiment, in addition to the hourly payment.
The game proceeded as follows (Fig. 1A). Each participant started off with 10 complimentary chips in the bank and used one as the first wager. In each trial, the participant had to decide whether to bank or bet the current wager. If she decided to bank, she would immediately earn the current wager (i.e., the wager would be added to the bank). If she decided to bet, the betting result would depend on a subsequent dice roll. If she lost the bet, the wager was confiscated, and she needed to get another chip out of her bank to continue the next trial. If she won the bet, the wager would be doubled and she could continue with a doubled wager. When the current wager reached 16 chips, the chips would be banked automatically and the game started over. As shown in Figure 1A (the dotted “look” line), one critical manipulation of the current experimental design was that even when she decided to bank, she would still witness the following dice-throwing process and be informed of the outcome. Note that this outcome is counterfactual in that it represents what would have happened if she had chosen to bet. One of our major interests was to investigate whether there are distinct brain activation patterns in factual versus counterfactual conditions.
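The rules above can be sketched as a small state machine. This is only an illustrative reading of the task description, not the code used in the experiment. The class name `Game` and its fields are hypothetical, and the sketch assumes that every fresh one-chip wager (after banking, after a loss, and after automatic banking) is drawn from the bank, which the text states explicitly only for losses.

```python
from dataclasses import dataclass

MAX_WAGER = 16   # wager at which chips are banked automatically (from the text)
START_BANK = 10  # complimentary chips at the start of the game (from the text)


@dataclass
class Game:
    """Hypothetical model of the bet/bank game; not the authors' code."""
    bank: int = START_BANK - 1  # one chip is used as the first wager
    wager: int = 1

    def _new_wager(self) -> None:
        # Assumption: each fresh one-chip wager is taken out of the bank.
        self.bank -= 1
        self.wager = 1

    def step(self, choice: str, win: bool) -> None:
        """Advance one trial. `choice` is "bet" or "bank"; `win` is the dice outcome."""
        if choice == "bank":
            # The outcome is still shown but is counterfactual: it does not change the bank.
            self.bank += self.wager
            self._new_wager()
        elif choice == "bet":
            if win:
                self.wager *= 2
                if self.wager >= MAX_WAGER:
                    self.bank += self.wager  # automatic banking at 16 chips
                    self._new_wager()
            else:
                self._new_wager()  # wager confiscated; start over with one chip
        else:
            raise ValueError("choice must be 'bet' or 'bank'")


game = Game()
game.step("bet", True)    # win: wager doubles to 2
game.step("bank", False)  # bank the 2 chips; the losing dice roll is counterfactual
print(game.bank, game.wager)
```

Under this reading, a banked trial changes the bank identically whether the dice roll wins or loses, which is what makes the displayed outcome counterfactual.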
There were two levels of risk involved. For the low-risk condition, participants were told the following: “The risk of the bet is relatively low. If the dice is 1 or 2, you lose. Otherwise you win. Your chance of winning against losing the bet is 2–1.” For the high-risk condition, they were told the following: “The risk of the bet is relatively high. If the dice is 1, 2, or 3, you lose. Otherwise you win. Your chance of winning against losing the bet is 1–1.” The outcome sequence of the trials (winning or losing) was predetermined by the computer in a random manner but constrained by the risk level. To make the dice roll result more salient, the border of the dice was depicted as green if it was a “win” trial or red if it was a “lose” trial. To ensure that participants understood the procedures, they were trained for 10 trials of each risk level before they performed the task in the scanner.
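The stated odds follow directly from the losing faces of a six-sided dice. A minimal sketch of that arithmetic, with hypothetical helper names, and assuming a fair dice purely for illustration (in the experiment the outcome sequence was predetermined):

```python
from fractions import Fraction


def win_probability(losing_faces, sides=6):
    """Probability of winning when the listed faces lose (fair dice assumed)."""
    return Fraction(sides - len(set(losing_faces)), sides)


def odds_of_winning(p_win):
    """Odds of winning against losing, e.g. 2 means 2-1."""
    return p_win / (1 - p_win)


low_risk = win_probability({1, 2})      # lose on 1 or 2
high_risk = win_probability({1, 2, 3})  # lose on 1, 2, or 3

print(low_risk, odds_of_winning(low_risk))
print(high_risk, odds_of_winning(high_risk))
```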
Procedures.
An event-related design was implemented. A trial consisted of four different events in a fixed order: a chip number event indicating the current wager (i.e., the chips at stake), a response-collecting event asking participants to bank or bet the chips, an outcome event indicating a winning or losing dice roll regardless of the participant's choice to bet or bank, and a blank event with a central fixation (Fig. 1B). Each event within a trial lasted 2 s, except for the blank event, which was displayed for either 2, 4, or 6 s randomly. The first four scans (8 s) of each run were used as a buffer. The remainder of each run consisted of a total of 42 trials. Four runs were acquired for each participant, two for the high-risk condition and two for the low-risk condition. The order of the risk conditions was counter-balanced across the four runs within each participant as well as across participants.
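The timing of one run can be sketched as follows. The durations come from the description above. The function name `build_schedule`, the seed, and the use of a uniform random choice for the blank duration are assumptions made for illustration; the actual randomization scheme is not specified in the text.

```python
import random

EVENT_S = 2                # duration of the chip, response, and outcome events (s)
BLANK_CHOICES = (2, 4, 6)  # possible durations of the blank fixation event (s)
BUFFER_S = 8               # four buffer scans at the start of each run
N_TRIALS = 42              # trials per run


def build_schedule(seed=0):
    """Return (list of per-trial onsets in seconds, total run length in seconds)."""
    rng = random.Random(seed)
    t = BUFFER_S
    schedule = []
    for _ in range(N_TRIALS):
        blank = rng.choice(BLANK_CHOICES)  # assumption: uniform choice of jitter
        schedule.append({
            "chips": t,
            "response": t + EVENT_S,
            "outcome": t + 2 * EVENT_S,
            "blank": t + 3 * EVENT_S,
            "blank_duration": blank,
        })
        t += 3 * EVENT_S + blank
    return schedule, t


schedule, run_length = build_schedule()
print(schedule[0], run_length)
```

With these durations a trial lasts 8, 10, or 12 s, so a run lasts between 344 and 512 s including the buffer. Only the intertrial blank is jittered, while the three events within a trial remain in fixed order.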
One potential issue with such a fixed order of events was that neuronal responses evoked by different components within a trial might become highly correlated and difficult to estimate separately. We considered, as an alternative, a jittered design (i.e., jittering every event within a trial), which would significantly increase the trial length and the length of the session. However, this could compromise data quality by introducing more motion artifact resulting from longer scanning sessions. Another alternative was to have a jittered design but reduce the number of trials, keeping the session length the same as in the present design. However, reducing the number of trials would reduce detection power because each experimental condition would be repeated fewer times. For these reasons, we chose the present design, but we address the concern of potential correlations among events in the fMRI data analysis and behavioral results below.
Data acquisition.
A 3T Siemens (Erlangen, Germany) Trio magnetic resonance imaging system located at the University of Kentucky Magnetic Resonance Imaging and Spectroscopy Center equipped for echo-planar imaging (EPI) was used for data acquisition. EPI images were acquired with an eight-channel head coil using the blood oxygen level-dependent (BOLD) technique (repetition time, 2000 ms; echo time, 29 ms; flip angle, 76°), each consisting of 34 contiguous axial slices (matrix, 64 × 64; in-plane resolution, 3.5 × 3.5 mm²; thickness, 3.5 mm; no gap), parallel to the inside curve of each participant's OFC to reduce the signal loss and distortion in this region (Deichmann et al., 2003). A high-resolution T1-weighted magnetization-prepared rapid gradient echo (MP-RAGE) anatomical set (192 sagittal slices of full head; matrix, 224 × 256; field of view, 224 × 256 mm²; slice thickness, 1 mm; no gap) was collected for each participant.
Stimuli were presented using a high-resolution rear projection system with responses recorded via two fiber-optics response pads, each with one button. A computer running E-Prime controlled stimulus presentation and recording of responses. In addition, the timing of the stimulus presentation was synchronized with magnet trigger pulses.
Image analysis.
Before statistical analysis, the first four volumes of each run were discarded to allow the MR signal to reach steady state. The remaining images in each participant's time series were motion corrected using the MCFLIRT module of FSL (FMRIB Software Library, version 3.2) package (http://www.fmrib.ox.ac.uk/fsl). Images in the data series were then spatially smoothed with a three-dimensional Gaussian kernel (full-width half-maximum, 7 × 7 × 7 mm³) and temporally filtered using a high-pass filter (90 s). The FEAT (FMRIB Expert Analysis Tool) module of FSL package was used for these steps and later statistical analysis.
Customized square waveforms were generated for each individual. These waveforms were convolved with a double gamma hemodynamic response function (HRF). For each participant, we used FILM (FMRIB Improved Linear Model), with local autocorrelation correction, to estimate the hemodynamic parameters for four explanatory variables (the number of chips, bank or bet, loss or win, and wrong or right) and generate statistical contrast maps of interest. Given the concern about the fixed order of events within a trial, we expected that the right/wrong regressor would be correlated with the other two regressors because this variable was determined by the combination of the other two variables (bet-win and bank-loss constituted the right choices, whereas bet-loss and bank-win constituted the wrong choices). However, we expected that the bet/bank regressor would not highly correlate with the win/loss regressor because of the randomness and unpredictability of the outcomes. Considering the nature of these regressors, we constructed a general linear model, orthogonalizing the right/wrong regressor with regard to the win/loss regressor. This is similar in spirit to a partial correlation analysis and serves to isolate the brain activation patterns uniquely explained by each regressor. It should be noted that a significant correlation between these regressors limits the sensitivity to discriminate between different phases.
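The orthogonalization step can be illustrated with a toy example. This is a minimal sketch of the general technique (removing from one regressor the part linearly predicted by another), not FSL's implementation. It operates on unconvolved trial-wise codes for simplicity, and the short bet/bank and win/loss sequences are made-up values. With +1/−1 coding, the right/wrong code is simply the product of the bet/bank and win/loss codes.

```python
def demean(x):
    """Subtract the mean from every element."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def orthogonalize(target, reference):
    """Return demeaned `target` with its projection onto demeaned `reference` removed."""
    t, r = demean(target), demean(reference)
    denom = dot(r, r)
    if denom == 0:
        raise ValueError("reference regressor has no variance")
    beta = dot(t, r) / denom
    return [ti - beta * ri for ti, ri in zip(t, r)]


# Hypothetical trial codes: bet = 1, bank = -1; win = 1, loss = -1.
bet_bank = [1, 1, -1, 1, -1, 1, 1, -1]
win_loss = [1, -1, -1, 1, 1, -1, 1, 1]
# bet-win and bank-loss are "right" (1); bet-loss and bank-win are "wrong" (-1).
right_wrong = [b * w for b, w in zip(bet_bank, win_loss)]

right_wrong_orth = orthogonalize(right_wrong, win_loss)
print(dot(right_wrong_orth, demean(win_loss)))  # approximately zero
```

After this step, any variance shared by the two regressors is attributed to win/loss, so the right/wrong estimate reflects only what win/loss cannot explain.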
Each of the four runs for each participant was analyzed separately, and the average of these four runs for each individual was obtained through a higher level analysis using the FLAME (FMRIB Local Analysis of Mixed Effects) module (stage 1 only). Contrast maps were warped into common stereotaxic space before mixed-effects group analyses were performed. The normalization procedure involved registering the average EPI image to the MP-RAGE image from the same participant and then to the ICBM152 T1 template, using the FLIRT (FMRIB Linear Image Registration Tool) module.
To identify the regions of brain activation, we defined the regions of interest (ROIs) by clusters of 30 or more contiguous voxels (Xiong et al., 1995) in which there was significant difference in brain activity across conditions (Z > 2.81; p < 0.005, two-tailed). Using the Mintun peak algorithm (Mintun et al., 1989), we further located the local peaks (maximal activation) within each ROI. Additional ROI analyses were performed using the average signals extracted from these clusters.
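The correspondence between the Z threshold and the two-tailed p value can be checked with the standard normal distribution. A minimal sketch using only the Python standard library (the function name is ours):

```python
import math


def two_tailed_p(z):
    """Two-tailed p value for a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2))


p = two_tailed_p(2.81)
print(f"Z = 2.81 -> two-tailed p = {p:.4f}")
```

The result falls just under 0.005, consistent with the threshold reported above.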
Results
Behavioral results
We present the descriptive statistics according to different reward processes (e.g., frequency and proportion of positive versus negative reward anticipation and choice evaluation). Because the frequencies of reward outcomes (e.g., win vs loss) were predefined, we report the net earning of chips for each run instead.
Reward anticipation and decision making: bet (positive) versus bank (negative)
The ideal ratio of bet versus bank should be 1:1 for the high-risk condition and 2:1 for the low-risk condition based on the predefined ratios of win versus loss in these conditions. However, there was a strong tendency for participants to bet the chips, χ²(1) = 17.922, p < 10⁻⁴ (against no bias) (Fig. 2A). The ratio of bet versus bank was 2.72 overall. This bias to bet was stronger for the low-risk condition than for the high-risk condition, as indicated by a significant interaction between the choices (bet vs bank) and the levels of risk (high vs low), χ²(1) = 37.578, p < 10⁻⁹ (against no bias across the levels of risk). For the low-risk condition, the ratio of bet versus bank was 3.58, χ²(1) = 5.006, p = 0.025 (against the ideal ratio of 2:1); and for the high-risk condition, the ratio was 2.13, χ²(1) = 10.906, p = 0.001 (against the ideal ratio of 1:1). Furthermore, within a risk condition, there was a significant interaction between the choice (bet vs bank) and the outcome of the previous trial (win vs loss). Specifically, for the high-risk condition, participants were equally likely to bet or bank after a win under the win–loss ratio of 1:1 (bet–bank ratio of 0.97, χ²(1) = 0.009, p = 0.925) but were more likely to bet after a loss (bet–bank ratio of 5.97 against the ideal ratio of 1:1, χ²(1) = 20.769, p < 10⁻⁵), χ²(1) = 20.778, p < 10⁻⁵ (Fig. 2B). For the low-risk condition, participants were twice as likely to bet as to bank after a win under the win–loss ratio of 2:1 (bet–bank ratio of 2.25, χ²(1) = 0.156, p = 0.693) but were biased to bet after a loss (bet–bank ratio of 17.59 against the ideal ratio of 2:1, χ²(1) = 9.588, p = 0.002), χ²(1) = 9.744, p = 0.002 (Fig. 2C). These findings indicated that participants tended to take risks to maximize monetary gain, particularly when they were losing. A relatively lower risk level further promoted such risk-seeking behavior.
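Tests of an observed bet/bank split against an ideal ratio can be sketched as a chi-square goodness-of-fit test with one degree of freedom. This is an assumption about the form of the test, since the text does not name it. The counts below are made up, because the raw frequencies are not given here. The p-value helper, however, can be applied to the reported statistics.

```python
import math


def chi_square_gof(observed, ideal_ratio):
    """Goodness-of-fit chi-square of observed counts against an ideal ratio."""
    total = sum(observed)
    ratio_sum = sum(ideal_ratio)
    expected = [total * r / ratio_sum for r in ideal_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: 60 bets and 40 banks tested against a 1:1 ideal ratio.
print(chi_square_gof([60, 40], [1, 1]))

# Reported statistics from the text, converted to p values.
for chi2 in (5.006, 10.906):
    print(chi2, round(p_value_df1(chi2), 3))
```

The two reported statistics convert to p values of about 0.025 and 0.001, matching the values given in the text.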
Behavioral results. The figures show the frequency of bank versus bet (A), conditional probability of bank and bet based on the outcome of previous trial (B, C), frequency of right and wrong choices (D, E), and number of chips earned for each run (F). There were 84 trials across two runs for each risk level.
Choice evaluation: right (positive) versus wrong (negative)
When participants bet and won or when they banked and avoided a potential loss, the choices were “right” and these conditions represented positive choice evaluation. When they bet but lost or when they banked but could have won if they had decided to bet, the choices were “wrong” and these conditions represented negative choice evaluation. As seen in Figure 2D, participants made approximately equal numbers of right and wrong choices during the high-risk condition, χ²(1) = 1.190, p = 0.275, and relatively more right choices during the low-risk condition, χ²(1) = 5.902, p = 0.015. In fact, they made right choices approximately two times as frequently as wrong ones, which was about the same as the win–loss ratio of 2:1 for the low-risk condition, χ²(1) = 0.440, p = 0.507. Further breakdown of right and wrong choices is shown in Figure 2E. These patterns suggested that participants implicitly adopted the ideal strategy, which was prescribed by the predefined ratios of win versus loss, although they were biased toward risk-taking after a loss.
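The right/wrong classification described above reduces to a sign rule when choices and outcomes are coded as +1/−1. A minimal sketch, with a hypothetical function name:

```python
CHOICE_CODE = {"bet": 1, "bank": -1}
OUTCOME_CODE = {"win": 1, "loss": -1}


def evaluate_choice(choice, outcome):
    """Classify a trial as a "right" or "wrong" choice from the choice and dice outcome."""
    sign = CHOICE_CODE[choice] * OUTCOME_CODE[outcome]
    return "right" if sign > 0 else "wrong"


for choice in ("bet", "bank"):
    for outcome in ("win", "loss"):
        print(choice, outcome, evaluate_choice(choice, outcome))
```

The two banking cases are the counterfactual ones: the bank changes by the same amount either way, and only the evaluation of the choice differs.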
Number of chips earned
Figure 2F shows the total number of chips earned for each of the high- and low-risk runs. Participants broke even during the high-risk condition and gained chips during the low-risk condition. Repeated-measures ANOVA with the risk levels and runs (within-subject) and orders of the risk levels (between-subject) revealed that the number of chips gained was significantly affected by the risk levels [F(1,13) = 86.886, mean squared error (MSE) = 28.550, p < 0.001], the runs (F(1,13) = 9.6859, MSE = 17.704, p = 0.008), and marginally by the orders of the risk levels (F(1,13) = 4.061, MSE = 67.401, p = 0.065). This further confirmed that participants performed within the reasonable range, such that they did not gain or lose when the risk was at the chance level and profited when the odds were favorable.
Correlation between decision, outcome, and evaluation
There were moderate correlations among the variables for different stages of reward processing. As expected, correlations between decision (bet vs bank) and outcome (win vs loss) were moderate, given that of 60 total correlation analyses (i.e., 15 subjects × 4 runs/subject) between the bet/bank and win/loss variables, only nine were significant. However, correlations between the win/loss and right/wrong variables were significant in 48 of 60 cases. After convolving with the HRF, correlations among these regressors became slightly higher, with 17 of 60 significant for the bet/bank and win/loss regressors and 54 of 60 for the win/loss and right/wrong regressors. It should be noted that most of the significant correlations between bet/bank and win/loss regressors were negative, suggesting that it was not caused by the positive correlation between bet/bank (1/−1) and win/loss (1/−1) within a trial, but by the negative correlation between consecutive trials. As shown in Figure 2, B and C, the choice of the current trial was affected by the outcome of the preceding trial. Specifically, participants were more likely to bet after a loss than a win. As a consequence, jittering the intertrial interval helped alleviate the correlation between the bet/bank and win/loss regressors.
Imaging results
We report the imaging data related to three stages of reward processing separately (i.e., reward anticipation and decision making, reward delivery and outcome monitoring, and choice evaluation). Although we also manipulated the levels of risk in the study, this factor interacted with different reward processes to a minimal degree. This was possibly a result of separating the risk levels into different runs, which reduced the statistical power to detect differences caused by the risk levels. Nevertheless, several frontal regions, including the superior and middle frontal cortex, anterior cingulate cortex (ACC), and supplementary motor areas, as well as the inferior and superior parietal cortex, were differentially activated by the high- and low-risk conditions. The areas modulated by the risk levels are listed in supplemental Table 1 (available at www.jneurosci.org as supplemental material).
Given that the two risk levels produced otherwise similar brain activation patterns, the following analyses were based on pooled data of both high- and low-risk conditions. The focus of the results and discussion will be limited to the frontal cortex and striatum, although there are interesting patterns of activation observed in the posterior visual processing areas as well (supplemental Table 2, available at www.jneurosci.org as supplemental material).
Areas sensitive to reward anticipation and decision making (bet vs bank)
The contrast between bet and bank choices illustrated expectation of winning (positive anticipation) versus losing (negative anticipation). Presumably, participants would take the risk and bet if they thought they were likely to win. In contrast, they would bank the wager to avoid a loss if they expected the outcome was against them. As shown in Table 1, the caudate nucleus and medial OFC were significantly more active when participants made the bet than when they banked (Fig. 3A). In contrast, the lateral OFC, inferior frontal, and superior medial frontal cortex as well as the anterior insula showed greater activation when participants banked the chips in anticipation of losing the trial, compared with when they chose to bet in anticipation of winning (Fig. 3B).
Brain areas activated during the stage of reward anticipation and decision making
Imaging results. A, C, and E show the striatum (top) and medial/middle OFC (bottom), and B, D, and F show the lateral OFC and anterior insula/superior temporal pole (top) and dorsomedial frontal cortex (bottom). The right side of the image is right side of the brain.
According to many decision theories, the choice made at the bet/bank decision stage may result from a combination of factors, such as probability assessment and risk tolerance, framing of the context, and the expected values (objective) and utilities (subjective) of different outcomes. We do not subscribe to any particular theory. Instead, we use “reward anticipation” to summarize the net effect of all of these factors. Choosing a risky bet and forgoing the sure gain of banking reflects a win of positive reward anticipation over the negative fear of loss. A calculation of expected values shows that participants were unlikely to have made their decisions based on the expected values of the two choices. The design of the task was such that the winning payoff was doubled whenever participants bet and won. Therefore, in the case of the high-risk condition where the odds were 1:1 for win versus loss (50% to win), the expected values for both decisions (bank and bet) were the same on each specific trial. For example, when the chip count was 1, betting the chip had the expected value of 1 (= 2 × 50% + 0 × 50%), whereas banking the chip also had the expected value of 1 (= 1 × 100%). If participants won the first trial and had two chips to wager, the expected values of the “bet” and “bank” decisions were still matched (4 × 50% + 0 × 50% for bet; 2 × 100% for bank). Therefore, the decision to “bet” or “bank” could not have been based solely on choosing the option with the larger expected value and must instead have relied on the outlook of the reward outcome. In the case of the low-risk condition where the odds were 2:1 for win versus loss (67% to win), the expected value for the “bet” decision (e.g., 2 × 67% + 0 × 33% for a one-chip wager) was always higher than that of the “bank” decision (1 × 100%). This payoff scheme could not explain why participants chose to bank in some trials if they made their decisions solely based on expected values.
Therefore the logic behind the decision was likely to be driven by anticipation of the outcome, instead of the choice between two alternative expected values. However, we could not rule out that their decisions to bank or bet may also be influenced by other factors such as expected utilities of two alternatives, participants' framing and assessment of the risk involved in a specific trial, their perceived randomness of the outcome (e.g., they may underestimate the risk after a loss trial), etc.
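The expected-value comparison in the preceding argument can be written out explicitly. A minimal sketch with a hypothetical function name, using exact fractions (so the low-risk win probability is 2/3 rather than the rounded 67%):

```python
from fractions import Fraction


def expected_values(wager, p_win):
    """Return (expected value of betting, expected value of banking) for one trial."""
    bet = 2 * wager * p_win + 0 * (1 - p_win)  # win doubles the wager, loss yields 0
    bank = Fraction(wager)                     # banking keeps the wager for certain
    return bet, bank


for label, p_win in (("high risk", Fraction(1, 2)), ("low risk", Fraction(2, 3))):
    for wager in (1, 2, 4, 8):
        bet, bank = expected_values(wager, p_win)
        print(label, wager, bet, bank)
```

In the high-risk condition the two values are equal for every wager, and in the low-risk condition betting is worth 4/3 of the wager, so expected value alone never favors banking.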
Areas sensitive to reward delivery and outcome monitoring (win vs loss)
The contrast between win and loss trials illustrated positive and negative outcome monitoring. As shown in Table 2, the caudate nucleus, middle OFC, and left middle frontal cortex as well as the superior/posterior insula became more active when participants won than when they lost (Fig. 3C). In contrast, the inferior frontal and superior medial frontal cortex, anterior insula, superior temporal pole, and midbrain were significantly activated when the outcome was against the participants (Fig. 3D). It is notable that there is a mediolateral distinction in the ventrofrontal cortex between positive and negative outcome monitoring, similar to the one observed during positive versus negative reward anticipation.
Brain areas activated during the stage of reward delivery and outcome monitoring
We did not, however, observe activation of the amygdala and surrounding areas for processing of negative reward information, which was found in previous studies (Breiter et al., 2001; O'Doherty et al., 2003a). We speculate that the emotional response elicited by the negative consequences (i.e., losing chips) was not strong enough to activate the amygdala, given that only one participant ever had a cumulative chip count lower than zero. Nevertheless, according to one of the principles of prospect theory (Kahneman and Tversky, 1979), the outcome of a gamble is framed as a gain or loss with respect to a neutral point instead of the cumulative asset. Therefore, it is legitimate to compare winning versus losing on each trial to examine the reward processes involved in monitoring positive or negative outcomes. It should be noted, however, that because of the absence of strong aversive consequences (punishment), the mediolateral distinction observed above may only be present within the intensity range of the outcomes tested in this study. Given the lack of this distinction observed in the literature using strong negative reinforcers (e.g., pain), additional research is needed to assess how broadly such a distinction may apply to strong rewarding or aversive stimuli.
Areas sensitive to evaluation of choices (right vs wrong)
Evaluation of choices can also be distinguished by the feedback provided to the participants. Specifically, when the outcome of reward matched reward expectation (“right” decision), the choice was positively evaluated (e.g., rejoicing and glad). This included both banking-and-lost and betting-and-won. In contrast, when the outcome did not match expectations (“wrong” decision), the choice was negatively evaluated (e.g., regretting and sad). This included both banking-and-won and betting-and-lost. As shown in Table 3, positive choice evaluation significantly activated the striatum, middle and superior OFC, as well as the middle frontal cortex compared with negative choice evaluation (Fig. 3E). In contrast, the inferior frontal and dorsomedial frontal cortex, including the ACC, superior temporal pole, and anterior insula, became more active when participants negatively evaluated their decision compared with when they made the right choice (Fig. 3F).
Brain areas activated during the stage of choice evaluation
Similar and unique regions for different reward processing stages and distinct areas for positive and negative valences of these processes
As illustrated in a conjunctive overlay in Figure 4A, the positive aspect of each of the three reward processes recruited similar regions along the medial aspects of the brain, including the caudate nucleus, and medial/middle OFC. Figure 4B displays the details of overlapping activation in the left and right striatum. In contrast, the negative aspect of each of the three processes demonstrated similar neural profiles in the ventrolateral areas, such as the lateral OFC, inferior frontal cortex, anterior insula, and superior temporal pole, as well as the dorsomedial frontal cortex (Fig. 4C). Figure 4D displays the details of overlapping activation in the left and right OFC and anterior insula regions.
Conjunctive overlay of positive (A) and negative (C) aspects of reward processing and detailed coronal slices of overlapping regions of positive reward processing in the striatum (B) and negative reward processing in the lateral OFC and anterior insula (D).
We also performed direct contrasts between different phases to determine brain regions uniquely activated in a specific phase more than the others (supplemental Table 3, available at www.jneurosci.org as supplemental material). In examining brain activation patterns resulting from these direct contrasts, we noted, however, that these patterns may not be specific to a certain reward process per se. For example, the decision stage (bet/bank) involved a motor response and simple visual input, whereas the outcome stage (win/loss) involved complex visual input but no motor response. The direct contrast revealed that the decision stage activated the motor cortex to a higher degree, whereas the outcome stage significantly recruited the visual cortex. However, these activation patterns were not specific to reward processing. Also, because the processes involved in different stages were not controlled, direct contrasts between these phases could not pinpoint specific reward-related functions.
Region of interest analysis of the striatal and lateral OFC/insula areas
Through conjunctive masking, we identified two medial striatal areas (Fig. 4B) and two lateral OFC/insula areas (Fig. 4D) commonly activated by the valence contrast (positive vs negative) across various stages. ANOVA analyses were performed on average signal intensity extracted from these ROIs (Fig. 5). We confirmed that there were significant valence effects across these ROIs. Additionally, we found that although the valence effects did not differ across different phases (no significant valence by stages interaction), the main effect of reward phases indicated that these three stages engaged these ROIs differentially. Activation patterns were similar between the decision and outcome phases, whereas the profile for the evaluation phase presented a slightly different pattern.
Regional BOLD response of the conjunctive ROIs in the left and right striatum (positive processes > negative processes) and left and right OFC and anterior insula (negative processes > positive processes).
To illustrate common and distinct involvement of these medial (left and right striatum) and lateral (lateral OFC/anterior insula) ROIs in different aspects of reward processing, we plotted their respective time courses for different conditions. As shown in Figure 6, positive anticipation of reward (bet) activated the bilateral striatum more than negative anticipation (bank), at 4–6 s after the choices were made. In parallel, positive outcome (win) also activated these areas to a higher degree than negative outcome (loss), at 8–10 s after the outcomes were revealed. Similar time courses of the striatum for choice evaluation were also observed. Positive evaluation of right choices significantly activated the striatum, compared with negative evaluation of wrong choices, at 4–10 s after the consequence of the choice became clear. In contrast, the lateral OFC and anterior insula regions displayed greater activation for negative anticipation, outcome, and choice evaluation (Fig. 6).
Time course plots of the conjunctive ROIs for various stages of reward processing of positive (solid lines) and negative (dashed lines) information. The left and right striatum show greater responses to the positive aspects of reward processing, whereas the left and right OFC and anterior insula areas show greater responses to the negative aspects of reward processing.
Another interesting finding from this analysis concerns different neural substrates for factual and counterfactual reward processes. Counterfactual reasoning, often in the form of “if only I had acted differently,” is an important type of reasoning underlying human causal inference. In the current context, counterfactual thinking manifested itself most clearly when we compared the two banking conditions, in which the actual gain was not affected by the outcomes. However, having participants witness the outcome even after they banked prompted them to realize that “this would have happened if only I had bet.” An added value of examining counterfactual comparison in the reward task is that it helps disentangle the effects of attention and reward-related processes on the neurophysiological measures (e.g., fMRI signals) as mentioned by Maunsell (2004). Because counterfactual comparison reflects the interaction between different stages of reward processing such that reward expectancy modifies the experience of an outcome, it helps rule out the possibility that the signals observed in these reward-related processes are potentially caused by an alteration of attention allocation.
To investigate counterfactual comparison, we examined choice evaluation more closely by breaking down the right and wrong choices into four different conditions and plotting the time courses of the conjunctive ROIs for these conditions (Fig. 7). Activity in the striatum was determined by the combination of decision and outcome for making the right choice. Both the bet-win and bank-loss conditions (correct choices that maximized gain or prevented loss) activated the bilateral striatum significantly more than the two incorrect choices (i.e., suffering a loss in the bet-loss condition and failing to profit in the bank-win condition). Although the actual gain was not affected by the outcomes after a banking choice, participants devalued their banking choice when they would have won and doubled their chips had they decided to bet. The striatal response for this bank-win condition ranked even lower than that for the bet-loss condition, in which participants suffered an actual loss (Fig. 7A,B).
A–D, Time course plots for the breakdown of the right and wrong choices. Activity in the striatum was sensitive to the combination of decision and outcome for making the right choice (solid lines). Although the actual gain was not affected by the outcome after a banking choice, the reduced striatal activity for the bank-win scenario clearly demonstrated counterfactual comparison. Participants devalued the gain when they could have doubled their chips had they bet. Activity in the lateral OFC and anterior insula was driven by counterfactual reasoning after a banking choice (square markers). “What if I had bet” produced counterfactual regret (bank-win, “I could have doubled the chips”) or relief (bank-loss, “I could have lost”), compared with the factual regret (bet-loss, “I lost”) or relief (bet-win, “I won”).
In contrast, a different counterfactual comparison pattern was observed in the lateral OFC and anterior insula. These regions were significantly activated by both counterfactual regret in the bank-win scenario and counterfactual relief in the bank-loss scenario (Fig. 7C,D). For example, one comparison was between bet-loss and bank-win conditions, both of which were “wrong” and led to a negative feeling of regret. However, the regret in the former case was indicative (“I bet but I lost”), whereas the regret in the latter case was counterfactual (“I would have won had I bet”). A similar comparison was between bet-win and bank-loss. Both scenarios were “right” and led to a positive feeling of relief. Again, the former was indicative (“I bet and won”), whereas the latter was counterfactual (“I would have lost had I bet”). Alternatively, this pattern of activity in the OFC and anterior insula may simply be driven by the decision to bank. However, the responses of the OFC and anterior insula in Figure 7, C and D, were quite different from those shown in Figure 6 in two aspects. First, the time courses were time locked to the onsets of the events. The responses peaked at 4–6 s after a decision was made, whereas they peaked at 4–8 s after the consequence of the decision became evident, which itself was at least 2 s after a decision was made. Second, the differences in decision (bet vs bank) were driven by more negative activation for the “bet” choice, whereas the differences in evaluation (factual vs counterfactual) were driven by more positive activation for the counterfactual comparison.
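The four-way breakdown of choice evaluation described above can be summarized in a short sketch. The function and type names are hypothetical and introduced only for illustration; the mapping itself follows the text: a choice is "right" when betting is followed by a win or banking by a loss, and the comparison is counterfactual whenever the participant banked.

```python
from typing import NamedTuple


class Evaluation(NamedTuple):
    choice_quality: str  # "right" or "wrong"
    comparison: str      # "factual" or "counterfactual"
    feeling: str         # "relief" or "regret"


def evaluate_trial(decision: str, outcome: str) -> Evaluation:
    """Classify one trial by its decision (bet/bank) and outcome (win/loss)."""
    if decision not in {"bet", "bank"}:
        raise ValueError(f"unknown decision: {decision!r}")
    if outcome not in {"win", "loss"}:
        raise ValueError(f"unknown outcome: {outcome!r}")

    # Right choices: bet-win (gain maximized) and bank-loss (loss prevented).
    is_right = (decision == "bet") == (outcome == "win")
    # After banking, the outcome only shows what would have happened.
    comparison = "factual" if decision == "bet" else "counterfactual"
    return Evaluation(
        choice_quality="right" if is_right else "wrong",
        comparison=comparison,
        feeling="relief" if is_right else "regret",
    )
```

For example, `evaluate_trial("bank", "win")` yields a wrong choice with counterfactual regret, the condition in which the striatal response was lowest.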
Discussion
A range of decision theories, including classical expected utility theory (von Neumann and Morgenstern, 1944), prospect theory (Kahneman and Tversky, 1979), and regret theory (Loomes and Sugden, 1982; Mellers, 2000), could each offer insight into why people behaved as they did in this gambling task. For instance, although the expected value of banking was equal to that of betting in the high-risk condition (both equal to the number of chips at stake), people who preferred betting to banking were presumably maximizing expected utility. However, it is important to recognize that no single factor is able to explain the full range of results reported here. It seems clear that the adopted strategy depends on multiple factors, including the participant's attitude toward risk, framing of the outcome, emotional response, and even the context (e.g., reward history). By contrasting different conditions (bank/bet, win/loss, right/wrong, factual/counterfactual), the current study allows us to examine the possible functional distinction in frontal and striatal areas for reward processing across different stages.
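The equal-expected-value point can be written out as a short worked sketch. The symbols are assumptions introduced here, not the authors' notation: n is the number of chips at stake, p is the probability of winning a bet, a winning bet is taken to return 2n chips and a losing bet 0 chips, and banking keeps n chips with certainty.

```latex
\[
\mathbb{E}[\text{bank}] = n, \qquad
\mathbb{E}[\text{bet}] = p \cdot 2n + (1 - p) \cdot 0 = 2pn,
\]
\[
\mathbb{E}[\text{bet}] = \mathbb{E}[\text{bank}] \iff p = \tfrac{1}{2}.
\]
```

Under these assumptions the two options are matched in expected value only when p = 1/2, so a preference for betting in that condition would have to come from something other than expected value, such as risk attitude or the shape of the utility function.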
One distinction has to do with the relationship between reward anticipation and reward delivery. Some neuroimaging studies have suggested that different brain regions are involved in reward expectation and reward delivery (Knutson et al., 2001b). Knutson and colleagues found that the nucleus accumbens was significantly activated when participants were anticipating reward (Knutson et al., 2001a), whereas the mesial prefrontal cortex was preferentially recruited during reward delivery (Knutson et al., 2003). However, results from other studies question such a distinction. Rogers et al. (2004) found that positive outcomes activated both the striatum and medial OFC more than negative outcomes at the phase of reward delivery. Delgado et al. (2005) reported that both anticipation and outcome stages of reward recruited the caudate nucleus. Moreover, Breiter et al. (2001) found that both the nucleus accumbens and OFC were similarly activated during reward anticipation and reward delivery.
Single-cell recording studies in animals have also shown that neurons in both OFC and striatum increased firing during both expectation and delivery of reward [Schultz et al. (2000), their Fig. 9], under phasic modulation of midbrain dopamine neurons (Schultz, 1998). Such an animal model has been supported by neuroimaging studies in humans as well (Braver and Brown, 2003). The striatum (Pagnoni et al., 2002; McClure et al., 2003) and OFC (O'Doherty et al., 2003b) are the primary brain regions targeted by such dopaminergic modulation in reward processing.
Consistent with this neuroimaging and neurophysiological evidence, we found that both reward expectation and reward delivery recruited similar brain regions in the medial/middle OFC and striatum when the outlook of reward processing was positive. In addition, when outcome did not match expectation, which resulted in a negative prediction error, activity in these regions attenuated and even dropped below the baseline. Deactivation of these regions for negative prediction errors has been documented in previous studies (Knutson et al., 2001b; McClure et al., 2003; O'Doherty et al., 2003b). These findings suggest that both medial/middle OFC and striatum are commonly recruited in reward expectation, outcome monitoring, and choice evaluation. This idea is in accordance with the concept of cognitive-affective interaction in choice behavior (Mellers, 2000). Choices are made based on expected utilities. Comparison between the obtained and alternative outcomes affects the anticipated feelings of obtaining a certain outcome (e.g., regret or relief). These anticipated feelings, in turn, modify the utility function. Therefore, it is reasonable for these different reward processes to share certain neural circuitry.
A second distinction is related to processing of positive versus negative reward information. The current results corroborated the mediolateral distinction within the OFC for positive and negative reward processing (O'Doherty et al., 2001) [see a meta-analysis by Kringelbach and Rolls (2004)]. When participants expected to win, actually won the chips, and positively evaluated their choices, the medial/middle areas of the OFC became more active, compared with when the outlook of reward processing was negative. The striatum showed activation patterns similar to those of the medial OFC. In contrast, the lateral areas, including the lateral OFC, anterior insula, and superior temporal pole, were significantly activated for negative reward processes during anticipation, outcome, and choice evaluation. Previous studies have shown that the anterior insula is involved in negative emotion and reward-related processing (for review, see Phan et al., 2002), given its close reciprocal connection with the amygdala. Critchley et al. (2001), using a card-guessing reward task, found that activity of the anterior insula and lateral OFC positively correlated with the risk involved. Paulus et al. (2003) reported that activity of the insula became stronger when participants selected a riskier choice versus a safer one. The right insula was also significantly activated by the punishing trials. Kuhnen and Knutson (2005) found that relative loss between the chosen and unchosen stocks activated the anterior insula. However, it should be noted that studies using physiological stimuli (e.g., taste) suggest that activity in both the insula and ventral striatum may be driven by the stimulus intensity as well as the valence of stimuli (Small et al., 2003) or subjective preference (O'Doherty et al., 2006).
Another distinction is related to choice evaluation in reward processing. Emotional responses associated with choice assessment may exert a significant influence on future reward behavior. Usually choice evaluation involves determining whether the anticipated reward is realized or not (i.e., prediction error) as well as whether the alternative is better or worse (i.e., counterfactual comparison). When there is no negative prediction error, people likely evaluate their choice positively and choose to maintain their decision-making strategy. Otherwise, they negatively evaluate their choice and adjust their future responses. In the present study, we found that positive evaluation of right choices activated brain areas similar to those engaged by other positive reward processes, such as the middle OFC and striatum. In contrast, negative evaluation of wrong choices significantly recruited the bilateral superior temporal pole extending to the anterior insula. The role of the anterior insula in negative emotions such as regret (Kuhnen and Knutson, 2005) and disgust (Sanfey et al., 2003) in reward behavior may affect people's decision strategy and lead them to adjust their future choice behavior.
Counterfactual comparison also plays a critical role in choice evaluation and activity of the reward circuitry. This process involves comparison of the obtained outcome and the outcome of an alternative choice. When the alternative yields a better outcome than the executed choice does, people usually experience negative emotion such as regret and reevaluate utilities associated with different options. For example, the “what if” scenario resulting from the bank-win condition caused participants to devalue their banking decision, because they would have profited more from the alternative choice. Although the actual gain did not depend on the outcome after the banking decision, the striatum showed significant deactivation for the bank-win condition, compared with the bank-loss condition. This result confirms a similar finding (Breiter et al., 2001), in which activity in the nucleus accumbens was sensitive to counterfactual comparison. In contrast, activity in the lateral OFC and anterior insula was driven by both counterfactual regret in the bank-win scenario and counterfactual relief in the bank-loss scenario (Fig. 7C,D). The counterfactual conditions showed higher activation than the corresponding factual conditions, indicating that the bilateral OFC and anterior insula were closely related to counterfactual reward processing. This result is consistent with another fMRI study on counterfactual comparison (Ursu and Carter, 2005).
Positive and negative evaluation of choices may have different effects in guiding future decision-making behavior. The present behavioral results suggested that participants adopted different strategies based on the outcomes from the previous trials. When they won, the likelihood of choosing to bet versus bank implicitly followed the “ideal” ratios of win versus loss. However, after a loss, they completely abandoned this strategy and became more likely to bet. This win-stay-lose-switch strategy is often observed in reinforcement learning in both animals and humans. The mediolateral distinction we observed suggests roles for the medial areas in maintaining response strategy and for the lateral areas in adjusting choice behavior (Cools et al., 2002; O'Doherty et al., 2003a).
The dorsomedial frontal activation observed for negative reward processes in the current study also supports the role of the ACC in response switching (Bush et al., 2002; O'Doherty et al., 2003a). Event-related potential studies on error-related negativity point to the ACC as the source for negative reward prediction errors (Nieuwenhuis et al., 2004; Yeung et al., 2005), which may be responsible for switching behavior. Kennerley et al. (2006) found that ACC lesion in the nonhuman primate impaired the animal's ability to sustain rewarded responses and suggested that the ACC was critical in guiding choice behavior based on the consequences of previous actions.
In conclusion, the current study revealed a functional distinction in frontal and striatal areas for processing of positive and negative reward information at various phases. A better understanding of common and distinct involvement of these regions in reward processing will not only help model complex reward-related decision making but also aid in developing treatments targeted toward disruption of different components of reward circuitry.
Footnotes
- This work was supported by grants from the Internal Service Center fund from the University of Kentucky and by National Institutes of Health Grants R01 MH063817 and P20 RR015592. We thank two anonymous reviewers for their constructive criticism and insightful comments.
- Correspondence should be addressed to Dr. Xun Liu, Department of Anatomy and Neurobiology, University of Kentucky, Lexington, KY 40536-0098. xun.liu{at}uky.edu