Responses to monetary reward in humans have been assessed in a number of recent functional imaging studies, and it is clear that the neuronal substrates of financial reinforcement overlap extensively with regions responding to primary reinforcers, such as food. Money has the practical advantage of being an objectively quantifiable reinforcer. In this study, we exploit this advantage using a parametric functional magnetic resonance imaging design to look at the patterns of responding to systematically varying reward values. Twelve healthy volunteers were scanned during performance of a rewarded target detection task, in which the reward value varied between task blocks. We observed three distinct patterns of responding in different regions. Amygdala, striatum, and dopaminergic midbrain responded to the presence of rewards, regardless of value. In contrast, premotor cortex showed a linear increase in response with increasing reward value. Finally, medial and lateral foci of orbitofrontal cortex responded nonlinearly, such that response was enhanced for the lowest and highest reward values relative to the midrange. These results suggest functional distinction in response patterns within a distributed reward system.
Electrophysiological studies in animals have revealed much about neural systems mediating reward. Functional neuroimaging now allows these systems to be investigated in the human brain. Studies using pleasant sensory stimuli (Rolls et al., 1997; O'Doherty et al., 2001, 2002), drugs (Breiter et al., 1997; Stein et al., 1998; Volkow et al., 1999), or financial reward (Delgado et al., 2000; Elliott et al., 2000; Knutson et al., 2000; Breiter et al., 2001) have demonstrated roles for distributed neural systems in mediating human reward processing. Key components include midbrain, amygdala, striatum, thalamus, and regions of prefrontal cortex, in particular orbitofrontal and anterior cingulate cortices. These regions parallel those identified in an extensive animal literature (Koob, 1992; Robbins and Everitt, 1992, 1996; Schultz, 2000), and it is striking that abstract rewards in humans (winning money, success in a fictitious competition, and symbolic reward) are associated with neuronal responses in the same regions that respond to primary reinforcers.
Schultz and others have proposed detailed theoretical models of the functional divisions within extended reward systems based on electrophysiological and lesion evidence in animals (Schultz, 2000). Ventral striatal neurons fire in response to actual rewards but also during anticipation of predicted rewards (Schultz et al., 1992, 1993; Hollerman and Schultz, 1998). In contrast, orbitofrontal cortex (OFC) appears to code relative, rather than absolute, values of rewards (Tremblay and Schultz, 1999; Watanabe, 1999). Other regions within the system also play functionally distinct roles; for example, the amygdala is critical in associative learning, relating stimuli to rewards (Hatfield et al., 1996; Holland and Gallagher, 1999).
Dissociable functions within human reward systems are less clearly understood, although evidence from functional magnetic resonance imaging (fMRI) has started to suggest important distinctions. In a previous study (Elliott et al., 2000), we used a simple gambling paradigm to show that total winnings correlated with hemodynamic response in ventral striatum. In contrast, OFC responses correlated with the most extreme outcomes, whether winning or losing, a finding also reported by Breiter et al. (2001). In a different approach, O'Doherty et al. (2001) showed that magnitude of symbolic monetary reward received in a reversal-learning task was correlated with neuronal response in medial OFC, whereas magnitude of punishment correlated with response in lateral OFC.
The aim of the present experiment was to explicitly dissociate the responses of human reward systems, specifically foci of midbrain, ventral and dorsal striatum, amygdala, and prefrontal cortex, to varying magnitudes of financial reward using a parametric study design. Parametric designs have proved valuable in exploring relationships between systematically varying experimental parameters and physiological responses (Buchel et al., 1998). We used a simple target detection task in which correct responses were financially rewarded. The size of reward was varied across blocks of the task to detect different patterns of response in relation to reward value. Specifically, it allowed us to distinguish between regions in which the response to reward was a simple on–off function, regions in which there was a linear response to increasing reward, and regions in which the response related to reward value in a nonlinear manner.
Materials and Methods
Subjects. Twelve right-handed subjects, six male and six female, were recruited to participate in this experiment. All subjects were students at the University of Manchester (mean age, 23.6 years) and were not wage earners. The financial rewards used were therefore likely to have a similar value for all subjects. Subjects with self-reported neurological or psychiatric history were excluded, and subjects were asked not to use recreational drugs or drink excessive alcohol in the 48 hr before scanning. The Beck Depression Inventory was used to screen subjects for clinically significant depression. Subjects who were color-blind were also excluded.
fMRI scanning. Subjects were scanned using a Philips (Eindhoven, Holland) 1.5T Gyroscan ACS NT, retrofitted with Powertrak 6000 gradients, operating at software level 6.1.2. One hundred two single-shot echo-planar volume images were acquired, with a repeat time of 5 sec and an echo time of 40 msec. Each volume comprised 40 axial slices with 3.5 mm spacing and in-plane resolution of 3 × 3 mm. The first two volumes of each run were acquired to allow for T1 equilibration effects and were discarded before analysis. A T1-weighted structural scan was also acquired for each subject, and these were examined by a consultant radiologist to exclude any structural abnormality; no such abnormality was reported for any of the 12 subjects.
Cognitive task. Subjects were scanned during performance of a simple target detection task. Different colored squares were presented on a screen at a rate of one every 1.33 sec. Subjects were told to respond by squeezing a pneumatic bulb with their right hand every time they saw a green or blue square. The study was divided into blocks 40 sec long; each block contained 22 colored squares, eight of which were targets, interspersed randomly among nontargets. When subjects responded to a target, they saw a reward stimulus comprising an image of a coin with the monetary value superimposed. Each reward stimulus was also displayed for 1.33 sec. The value of the reward was constant within blocks, and the amount to be won for correct responses was displayed continuously at the bottom of the screen.
Four levels of reward were used in different blocks [10p (pence), 20p, 50p, and £1 (pound)], and there were also blocks in which responses elicited no reward; in these blocks, subjects saw a blank circle after squeezing the bulb. In between the 40 sec blocks were 10 sec rest blocks. These were included partly to give subjects a break and partly to allow nonspecific drift in fMRI signal to be modeled out of the data.
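The block structure described above can be checked with simple arithmetic. The following Python sketch uses only figures stated in the text; it assumes (this is not stated explicitly) that each reward stimulus occupies its own 1.33 sec slot in the sequence and that every target is detected. The function names are illustrative only.

```python
# Timing and payout arithmetic for one task block.
# Assumption: squares and reward stimuli are shown one after another,
# each for 1.33 sec, and all eight targets are detected.

STIMULUS_DURATION_S = 1.33    # each colored square and each reward stimulus
SQUARES_PER_BLOCK = 22        # colored squares per block
TARGETS_PER_BLOCK = 8         # targets, each followed by one reward stimulus
BLOCK_S = 40                  # nominal task block length
REST_S = 10                   # rest block length
TR_S = 5                      # repeat time from the scanning description
REWARD_LEVELS_PENCE = [0, 10, 20, 50, 100]  # no reward, 10p, 20p, 50p, 1 pound


def block_duration_s():
    """Total time filled by the squares plus the reward stimuli in one block."""
    return (SQUARES_PER_BLOCK + TARGETS_PER_BLOCK) * STIMULUS_DURATION_S


def volumes(duration_s):
    """Number of whole image volumes acquired in a period of the given length."""
    return duration_s // TR_S


def max_block_earnings_pence(reward_pence):
    """Amount won in a block if every target is detected."""
    return TARGETS_PER_BLOCK * reward_pence


if __name__ == "__main__":
    print("stimulus time per block (sec):", round(block_duration_s(), 1))
    print("volumes per task block:", volumes(BLOCK_S))
    print("volumes per rest block:", volumes(REST_S))
    for level in REWARD_LEVELS_PENCE:
        print(level, "p reward ->", max_block_earnings_pence(level), "p per block")
```

Under these assumptions, the 30 stimuli account for almost exactly the 40 sec block length, and each task block spans eight image volumes.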
Data analysis. Data were analyzed using SPM99 (K. J. Friston, The Wellcome Department of Cognitive Neurology, London, UK). Images were first realigned, using the first image as a reference. They were then normalized into a standard stereotactic space, using Montreal Neurological Institute templates and the coordinate system of Talairach and Tournoux (1988), and smoothed using an isotropic Gaussian kernel filter of 10 mm full-width half-maximum to facilitate intersubject averaging.
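For readers less familiar with the full-width half-maximum (FWHM) convention, the short Python sketch below converts the 10 mm FWHM quoted above into the SD of the equivalent Gaussian, using the standard relation FWHM = 2·sqrt(2·ln 2)·σ. It is an illustration of the convention only, not of the SPM99 smoothing routine, and the function names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian full-width half-maximum to its SD (same units)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_weight(distance_mm, fwhm_mm):
    """Relative kernel weight (peak = 1) at a given distance from the kernel center."""
    sigma = fwhm_to_sigma(fwhm_mm)
    return math.exp(-(distance_mm ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # SD of the 10 mm FWHM kernel, in mm
    print(round(fwhm_to_sigma(10.0), 2))
    # By definition, the weight falls to one-half at a distance of FWHM / 2
    print(round(gaussian_weight(5.0, 10.0), 3))
```

A 10 mm FWHM kernel therefore corresponds to an SD of approximately 4.25 mm.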
Statistical analysis was performed with a random effects model. A parametric design was used, as discussed by Buchel et al. (1998), that allowed us to model nonlinear as well as linear hemodynamic responses using orthogonalized polynomial expansion functions. First-level analysis was performed on each subject to generate a single mean image corresponding to each term of the polynomial expansion. These mean images were then combined in a second-level analysis using one-sample t tests to investigate group effects. Statistical maps were thresholded at p < 0.001 uncorrected, and small volume corrections (Worsley et al., 1996) were applied to a priori regions of interest: amygdala, dorsal and ventral striatum, and medial and lateral OFC.
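The second-level step can be illustrated for a single voxel. In a random effects analysis of this kind, each subject contributes one value per polynomial term, and a one-sample t test asks whether the group mean differs from zero. The Python sketch below is a minimal illustration under that assumption; it is not the SPM99 implementation, and the 12 values are invented purely to make the example runnable (they are not data from this study).

```python
import math
from statistics import mean, stdev


def one_sample_t(values, null_mean=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    standard_error = stdev(values) / math.sqrt(n)  # sample SD (n - 1 denominator)
    return (mean(values) - null_mean) / standard_error, n - 1


if __name__ == "__main__":
    # Hypothetical first-level estimates at one voxel, one per subject (n = 12).
    # These numbers are made up for illustration only.
    hypothetical_estimates = [0.8, 1.1, 0.4, 1.5, 0.9, 0.2,
                              1.3, 0.7, 1.0, 0.6, 1.2, 0.5]
    t_value, df = one_sample_t(hypothetical_estimates)
    print("t =", round(t_value, 2), "with", df, "degrees of freedom")
```

With 12 subjects, each such test has 11 degrees of freedom; the resulting t value at every voxel is then compared with the chosen threshold.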
Results
All subjects correctly detected all targets. Response latencies did not differ significantly under the different reward conditions, although there was a nonsignificant trend for subjects to respond more quickly for larger rewards (p < 0.1).
For clarity, we focus here on positive associations between reward size and neuronal responses. We had no clear predictions about regions that would be more responsive for smaller rewards, and there were no responses observed in negative contrasts that survived correction for multiple comparisons.
Regions responding to reward compared with no reward
This represents the zeroth-order term in the parametric analysis and corresponds to those regions in which there is an on–off or all-or-nothing response to the presence of reward (Table 1).
Neuronal responses significant at p < 0.05, corrected for multiple comparisons, were observed in the bilateral lingual gyrus, left postcentral gyrus (BA 3), anterior medial prefrontal cortex (BA 9), and left putamen. Responses significant at p < 0.001 uncorrected were seen in bilateral superior temporal gyrus, right insula, right premotor cortex (BA 6), dopaminergic midbrain, right putamen, right ventral striatum, and right amygdala. Applying a small volume correction to the hypothesized regions of midbrain, striatum and amygdala, these regions were significant at p < 0.05 corrected (Fig. 1). There was also a response in the left lateral OFC (BA 11), significant at p < 0.001 uncorrected, but this did not survive small volume correction.
Regions responding linearly to increasing reward
This represents the first-order term in the parametric analysis and identifies those regions in which neuronal response increases linearly with increasing reward (Table 1). The main region involved was a large cluster with the voxel of maximal response in right premotor cortex (BA 6) (Fig. 2).
Regions responding nonlinearly to increasing reward
This represents the second-order term in the parametric analysis. Because of the orthogonalization of the polynomial expansion terms, the form of the model was a U-shaped curve (Buchel et al., 1998). Thus, this actually represented regions in which the response was maximal at the lowest (zero) and highest (£1) levels of reward and less so at the intermediate levels. The regions involved were the anterior medial frontal cortex (BA 8), in which response survived correction for multiple comparisons, and the medial (BA 10) and lateral orbitofrontal cortex bilaterally (BA 47) (Fig. 3A,B). The medial focus survived small volume correction at p < 0.05. Although the individual lateral OFC foci did not, the fact that this response was bilateral and symmetrical argues against a type 1 error.
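Why orthogonalization produces a U-shaped second-order term can be seen in a small numerical sketch. The Python code below builds constant, linear, and quadratic columns over five reward levels and orthogonalizes each against the lower-order columns by Gram-Schmidt. It is an illustration only: it assumes the levels are coded by their monetary value in pence (0, 10, 20, 50, 100), which may differ from the coding actually used, and it does not attempt to reproduce the SPM99 regressors or the handling of the no-reward blocks in the zeroth-order term. The function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_polynomials(levels, order=2):
    """Gram-Schmidt on the columns levels**0, levels**1, ..., levels**order.

    Each column is orthogonalized against all lower-order columns, so the
    k-th result captures only what lower-order terms cannot explain.
    """
    basis = []
    for k in range(order + 1):
        column = [float(x) ** k for x in levels]
        for previous in basis:
            coefficient = dot(column, previous) / dot(previous, previous)
            column = [c - coefficient * p for c, p in zip(column, previous)]
        basis.append(column)
    return basis


if __name__ == "__main__":
    # Assumed coding: reward value in pence, including the no-reward level.
    reward_levels_pence = [0, 10, 20, 50, 100]
    constant, linear, quadratic = orthogonal_polynomials(reward_levels_pence)
    for level, lin, quad in zip(reward_levels_pence, linear, quadratic):
        print(level, round(lin, 1), round(quad, 1))
```

Under this coding, the orthogonalized quadratic column is positive at the zero and £1 levels and lower at the intermediate levels, which is the U-shaped profile described above; a region loading positively on this term therefore responds most at the extremes of the reward range.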
Discussion
This study confirmed roles for dorsal and ventral striatum, amygdala, and medial and ventral prefrontal regions in human reward processing. As in previous studies (Elliott et al., 2000; Knutson et al., 2000; Breiter et al., 2001), it is striking that regions responsive to monetary reinforcement overlap extensively with those responsive to primary reinforcers in animals. The key finding of the present study was of differential patterns of responsiveness in different regions. Amygdala and striatum showed an all-or-nothing response to reward, whereas premotor cortex responded linearly to increasing reward and anterior medial frontal and OFC foci responded in a more complex, nonlinear manner.
The simple on–off striatal response to reward is, at first sight, contradictory to our previous finding (Elliott et al., 2000) of increased ventral striatal response associated with cumulative amount of money won. However, in that study, the amount won was confounded with the number of reward experiences. Each individual reward had the same value, and high accumulated winnings reflected more rewards experienced. In the present study, the number of rewards experienced is constant across rewarded blocks; it is the value of individual rewards that varies. It is therefore possible that striatal signal reflects the number of reward experiences to a greater extent than their value. Breiter et al. (2001) also reported increases in ventral striatal response associated with increasing reward value in a design in which value and number of rewards were not confounded. However, it is striking that, in their study comparing $0, $2.50, and $10 rewards, the difference in ventral striatal signal between $0 and $2.50 appeared much greater than the difference between $2.50 and $10, which would be reasonably consistent with the pattern of responding observed here.
Although a role for striatum in processing monetary reward has been reliably demonstrated, amygdala response has been less consistently observed. Here, the pattern of amygdala response is similar to that seen in ventral striatum. In neuropsychological studies of gambling (Bechara et al., 1999), patients with bilateral amygdala damage fail to show the normal emotional responses to monetary reward, clearly suggesting a role for this region in financial reward processing. However, imaging studies of gambling have not always reported amygdala activation, perhaps reflecting a relatively transient signal in this region. An important consideration is that, in most imaging studies of gambling, rewards are not fully predictable. For rewarded blocks in the present study, there is a 100% contingency between target stimuli and rewards. The task therefore has similarities with secondary reinforcement and associative learning paradigms in animals, which critically implicate the amygdala (Hatfield et al., 1996; Schoenbaum et al., 1998; Holland and Gallagher, 1999). Conditioned reinforcement is likely to occur to a greater extent here than in tasks in which relationships between cues, responses, and rewards are not completely predictable.
Perhaps the most striking finding of this study is of dissociable patterns of responding in striatum–amygdala compared with OFC. This corroborates studies in animals (Tremblay and Schultz, 1999; Watanabe, 1999) that suggest that patterns of neuronal firing associated with reward are different in striatal and OFC neurons. Although both regions contain neurons that respond during the expectation and detection of reward, OFC neurons additionally code relative values of different rewarding stimuli. The on–off pattern of striatal response observed here is clearly consistent with the proposal that this region is involved in expectation and detection of rewards. Rewards are expected with the same probability and detected with the same frequency in all of the rewarded blocks; what varies is the value of the reward. The exact pattern of responding in OFC regions is such that response is maximal to the zero reward and £1 reward conditions and lowest to the midrange values. A region that responds to extremes of the reward range may be best equipped to code relative values.
In a previous study (Elliott et al., 2000), we reported that OFC regions (although exclusively lateral ones in that study) responded under the most extreme situations of winning or losing in a gambling task. Similarly, Breiter et al. (2001) demonstrated OFC responses (both medial and lateral) that reflected either the worst or best possible outcomes in a probabilistic task, including foci that coded both extremes rather than intermediate situations. Neuropsychological studies (Bechara et al., 1999, 2000) in patients with OFC lesions suggest that the deficits in decision making shown by these patients are attributable to impaired ability to weigh up consequences of actions rather than hyposensitivity or hypersensitivity to good or bad outcomes. Again, this suggests a more relative than absolute coding of reward value in the OFC.
The finding of a U-shaped relationship between reward value and OFC function is not, however, consistent with the results of O'Doherty et al. (2001), demonstrating a positive correlation between medial OFC response and reward value but a negative correlation between lateral OFC and reward value (expressed as a positive correlation with punishment). Although (with the eye of faith) there is some evidence from our study (Fig. 3) that the U-shaped function observed is skewed toward the positive extreme in medial OFC and the negative extreme (actually zero) in lateral OFC, the dissociation observed by O'Doherty et al. is not borne out here. A possible explanation for the discrepancy is that we only used rewards, whereas both rewards and punishments were used by O'Doherty et al. Lateral orbitofrontal responses have been particularly associated with behavioral inhibition and perceptual set shifting (Dias et al., 1996; Bechara et al., 2002), and negative outcomes may act as cues to elicit such behavioral change. Financial penalties were not included in the present design, and it is possible that the prospect of negative outcomes may have led to a clearer functional dissociation between medial and lateral regions.
The differential pattern of responding in OFC relative to limbic–striatal structures observed here was predictable on the basis of previous research. More surprising was the linear pattern of responding in premotor cortex. This finding should be interpreted with caution because the observed response did not survive correction for multiple comparisons, and, because it was not predicted a priori, use of a region of interest approach was not appropriate. However, response in this region was spatially extensive, and we therefore believe that it is likely to represent a genuine effect. It is interesting that the linear relationship between increasing reward value and premotor response is paralleled by a trend toward a linear decrease in reaction time. Subjects tend to respond quicker when targets predict higher reward value. Premotor responses may reflect increased motor preparedness to respond to stimuli predicting larger rewards. In a framework proposed by Schultz (2000), dorsal and lateral prefrontal regions, including premotor cortex, are suggested to be particularly involved in using information about expected rewards to mediate the goal-directed behavior that elicits reward delivery.
Unlike several recent studies (Knutson et al., 2000; Breiter et al., 2001; Critchley et al., 2001; O'Doherty et al., 2001), we adopted here a blocked rather than event-related approach. An event-related study in which reward magnitudes are varied would inevitably have introduced an element of unpredictability. Our approach allowed us to look at responses to reward magnitudes that were fully predictable within blocks and thus unaffected by the confound of expectation. This is an important point, because Breiter et al. (2001) have shown that responses to reward value are critically modulated by subjects' expectancy. However, by choosing the blocked approach, we are unable to specify whether the responses observed reflect reward anticipation, reward detection, or a combination of the two. It is possible that differential responses to reward value in these regions would be accompanied by differential temporal patterning of response in relation to cues, responses, and rewards, as previous studies in both animals (Schultz, 2000) and humans (Breiter et al., 2001) would predict.
This discussion has focused on the responses of amygdala, striatum, premotor cortex, and OFC. Other regions in which there were significant reward-related responses included occipital areas, showing an all-or-nothing response and perhaps reflecting more varied visual input in the reward conditions in which colored squares were interspersed with coin images. Also, a dorsomedial prefrontal region above the anterior cingulate showed a similar pattern of responding to the OFC. A corresponding region, with sensitivity to reward value, was reported by O'Doherty et al. (2001). This region has been implicated in studies of internal generation of emotional states (Reiman et al., 1997), independent of emotional valence, and may reflect enhanced emotive responses to the best and worst outcomes.
In conclusion, this study has shown that different components of human reward processing systems respond differentially to monetary value. Regions including midbrain, striatum, and amygdala were more responsive to the presence or occurrence of reward than its value. Premotor cortex responded linearly to increasing reward value, perhaps reflecting the increasing potency with which larger rewards control goal-directed behavior. Finally, a more subtle pattern of responding was seen in medial and lateral parts of OFC, whereby response was greatest for the lowest and highest rewards. This is consistent with a role for orbitofrontal cortex in coding relative, rather than absolute, values of rewards.
This work was supported by the University of Manchester Research Support Fund and the Medical Research Council.
Correspondence should be addressed to Dr. Rebecca Elliott, Neuroscience and Psychiatry Unit, Room G907, Stopford Building, University of Manchester, Oxford Road, Manchester M13 9PT, UK.