A great deal of human behavior and motivation is based on the intrinsic emotional significance of rewarding or aversive events, as well as on the associations formed between such emotional events and concurrent environmental stimuli. Recent functional neuroimaging studies have implicated the ventral striatum, orbitofrontal cortex (OFC), and amygdala in the representation of reward values and/or in the anticipation of rewarding events. Here, we use functional magnetic resonance imaging to compare brain activation during the presentation of reward with that during presentation of (conditioned) stimuli that have been paired previously with reward. Specifically, we aimed to investigate conditioned reward in the absence of explicit reward anticipation. Twenty-two healthy volunteers were scanned while monochrome visual patterns were incidentally associated with reward or negative feedback in the context of a simple card game. In the subsequent session, visual patterns, including the conditioned stimuli, were presented without reward or negative feedback, and the affective valence of these stimuli was assessed behaviorally. The presentation of reward compared with negative feedback activated the ventral striatum and OFC. Activation in the same OFC region was observed when, in the subsequent session, subjects passively viewed the stimuli that had been paired with reward, without the administration of reward and with subjects being essentially unaware of the conditioning manipulation. These findings suggest that the OFC in humans plays an important role in the representation of both rewarding stimuli and conditioned stimuli that have acquired reward value.
- appetitive conditioning
- nucleus accumbens
- prefrontal cortex
- preference judgments
- financial reward
Novel stimuli in the environment can acquire affective and motivational significance via their association with rewarding events. This process of pavlovian conditioning has consequences for behavior, because these conditioned stimuli can act as rewards in their own right, eliciting behavior and affecting mood. Conditioned reward is not a unitary phenomenon, however (Cardinal et al., 2002), and at least two kinds of learning can be distinguished. First, the conditioned stimulus (CS) comes to share affective properties with the reward: it takes on a reward value of its own (here referred to as intrinsic reward value). Second, the CS comes to explicitly predict the occurrence of a rewarding outcome. In this study, we examine the brain substrates of conditioned reward. We begin to decouple the rewarding and predictive values of a CS by using a procedure that minimizes the degree to which volunteers can explicitly predict reward when presented with a CS.
Behavioral procedures in animals have shown how a CS acquires incentive value and have implicated the amygdala and orbitofrontal cortex (OFC) in this process (Holland and Gallagher, 1999; Parkinson et al., 2000, 2001; Chudasama and Robbins, 2003; Pears et al., 2003). In addition, neurophysiological work in animals has demonstrated that the ventral striatum plays an important role in processing rewarding stimuli and in the anticipation of predicted, as well as conditioned, rewards (Everitt et al., 1999; Schultz, 2000).
Recent functional neuroimaging studies in humans have yielded results that are consistent with a wealth of animal research (Gallagher et al., 1999; Schoenbaum et al., 1999; Schultz, 2000; Parkinson et al., 2001; Baxter and Murray, 2002) in demonstrating that the amygdala, ventral striatum, and OFC are involved in aspects of reward (Delgado et al., 2000; Elliott et al., 2000; J. O'Doherty et al., 2001a; Small et al., 2001) and conditioned reward (Gottfried et al., 2002b, 2003; Arana et al., 2003; Kirsch et al., 2003; J. P. O'Doherty et al., 2003). However, it is unclear in these studies of conditioned reward to what extent the results are explained by (explicit) reward anticipation or by other components of reward conditioning, such as the representation of the affective value of the CS.
Our previous work demonstrates that preferences for previously neutral stimuli can be conditioned through repeated pairings with rewarding events, in the absence of explicit reward anticipation (Johnsrude et al., 1999, 2000; Cox et al., 2002). However, the brain system supporting this preference learning has not yet been explored in detail. Here, we use a procedure, based on that of Johnsrude et al. (1999), during which abstract visual patterns are consistently associated with monetary and auditory reward. In a subsequent session, the patterns are presented alone and their (conditioned) affective valence is assessed behaviorally. Functional magnetic resonance imaging (fMRI) scans are acquired continuously throughout both experimental sessions. Subsequent debriefing confirms that volunteers are mostly unaware of the reward contingencies. This design allows us to study neural responses to reward, as well as to conditioned reward, in the absence of explicit reward anticipation.
Materials and Methods
Twenty-two right-handed, healthy volunteers (12 males, 10 females; 18-30 years of age) took part in this study. The participants, all of whom were students, were recruited from the Medical Research Council Cognition and Brain Sciences Unit volunteer panel. All of the participants gave informed consent. This study was approved by the Local Research Ethics Committee (Cambridge, UK).
Participants were scanned while performing two separate tasks, which were presented in a fixed order. The first task (conditioning) consisted of two components: (1) a cover task, which was a simple card game, and (2) a conditioning procedure, concurrent with the card game, in which visually discriminable abstract patterns were associated with reward or negative feedback. The card game served to mask the relationships between the visual patterns and reward to prevent the subjects from being aware of the conditioning manipulation.
During the second task (judgment), these abstract patterns were presented on their own, without reward or negative feedback, and subjects were asked to indicate their preference for individual patterns via a paired-comparison procedure.
Cover task. A schematic drawing of a set of trials in the conditioning phase is shown in Figure 1. Subjects were presented with virtual playing cards, shown face up, one at a time (duration, 500-2000 ms), on a computer screen. Subjects were told that each card had a number written on the “back” of it and were to guess whether that number was higher or lower than the value shown. They responded by pressing a key on a button box with either the right (higher) or left (lower) index finger. After the response and an interstimulus interval of 1500 ms, the back of the card was shown (duration, 2000 ms), and feedback (reward or negative feedback) was given simultaneously. On “correct” trials, accompanied by reward, the number superimposed on the back of the card was presented in green. On “incorrect” trials, accompanied by negative feedback, the number was presented in red. Reward consisted of £0.25 and a melodic flourish (duration, 2670 ms) played through the headphones. Negative feedback was a loss of £0.22 and a buzzer sound (duration, 960 ms). After an intertrial interval of 1000 ms, the following trial was initiated. A “reward bar,” continuously visible on the right side of the computer screen, represented the cumulative winnings. This bar rose or fell simultaneously with the presentation of reward or negative feedback. The face values of the playing cards ranged from 4 to 9; the outcome numbers on the backs ranged from 1 to 12.
Conditioning procedure. Unknown to the subjects, the outcome of a trial (correct or incorrect) was prespecified, and the outcome numbers were (pseudorandomly) chosen after the subject's response to conform to this prespecified outcome. Thus, if the face of the card was a 4, the subject pressed “higher,” and the trial outcome was supposed to be correct, then the number written on the back of the card would be between 5 and 12. Conversely, if that trial was specified to be incorrect, the number on the back would be 1, 2, or 3. Five visually discriminable, abstract patterns (Petrides and Milner, 1982; Johnsrude et al., 1999) were presented on the backs of the cards, and the outcome number was superimposed on these. These patterns were paired with reward and negative feedback in predetermined ways at different contingencies. The patterns were not mentioned in the instructions and were irrelevant to the card game cover task. The “positive pattern” was associated with reward on 90% of the trials in which it appeared and with negative feedback on the other 10% of trials. The “negative pattern” was accompanied by reward on 10% of the trials in which it appeared and by negative feedback on the other 90%. The other three patterns were each associated with reward on 50% of the trials in which they appeared and with negative feedback on the other 50% (“bivalent” patterns). These bivalent patterns were included to obscure the pattern-reward/pattern-negative feedback contingencies. To avoid stimulus-specific effects, five versions of the task were prepared, and subjects were randomly assigned to a particular version. In each version, different patterns were assigned to each of the reinforcement contingencies. Thus, the individual patterns that served as positive, negative, and bivalent patterns were different for different subjects.
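The outcome-selection logic described above can be sketched as follows. This is a minimal illustration rather than the original task code; the function name and interface are hypothetical.

```python
import random

def choose_outcome_number(face_value, guess, prespecified_outcome, rng=random):
    """Draw the number 'on the back of the card' after the subject responds.

    The number is chosen so that the trial ends with the prespecified
    outcome regardless of the subject's guess. Face values run from 4 to 9
    and outcome numbers from 1 to 12, as in the task described above.
    """
    higher = list(range(face_value + 1, 13))  # numbers that make "higher" correct
    lower = list(range(1, face_value))        # numbers that make "lower" correct
    correct_pool = higher if guess == "higher" else lower
    incorrect_pool = lower if guess == "higher" else higher
    pool = correct_pool if prespecified_outcome == "correct" else incorrect_pool
    return rng.choice(pool)
```

For example, with a face value of 4, a "higher" response, and a prespecified "correct" outcome, the function returns a number between 5 and 12; for a prespecified "incorrect" outcome, it returns 1, 2, or 3, matching the example in the text.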
Two runs of 120 trials each were presented (120 correct and 120 incorrect trials in total). The positive and negative patterns and one of the three bivalent patterns were presented 60 times each; the other two bivalent patterns were presented 30 times each. Trial order was pseudorandom and fixed. Trial length depended on subjects' reaction times and varied from 5000 to 6500 ms. The total length of the task was ∼30 min.
Judgment task. Eight different patterns were used in this part of the experiment: the five patterns from the conditioning task plus three novel patterns (Johnsrude et al., 1999). Each trial consisted of a set of two patterns. Within each trial, the patterns were first presented one at a time, in succession, in the center of the screen (stimulus duration, 1000 ms; interstimulus interval, 500 ms), and then both were presented simultaneously, one on either side of the screen (Fig. 1B). When the two patterns appeared together, subjects indicated with a button press which of them they preferred (they were instructed to “indicate the one you like better: don't think about it too hard, just go with your first impression”). Stimuli remained on the screen until a response was made, again using either the right or left key on the button box. After an intertrial interval of 1000 ms, the next pair of stimuli was presented. Importantly, no reward or negative feedback was given during the judgment task. A total of 72 trials were presented. Each pattern was shown in 18 trials: 9 times on the right and 9 times on the left. In 30 trials, the positive, negative, and bivalent patterns were paired with each other, which allowed us to investigate conditioning effects independent of differences in exposure. In 10 of these 30 trials, the positive pattern was paired directly with the negative pattern. Another 30 trials were presented in which each of the novel patterns (those stimuli presented only during judgment) was paired with each of the five patterns shown during the conditioning task. Finally, 12 trials were presented in which each of the novel patterns was paired with another novel pattern. Trial length depended on subjects' reaction times and varied from 4500 to 6000 ms. The total length of the task was ∼7 min.
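The trial counts in this design are mutually consistent, which can be checked with simple arithmetic. The following is a sketch of the design bookkeeping only, not the original stimulus schedule:

```python
# Judgment-task design arithmetic, as described in the text.
n_conditioned, n_novel = 5, 3

trials_cond_pairs = 30     # conditioned patterns paired with one another
trials_novel_vs_cond = 30  # each novel with each conditioned pattern, twice
trials_novel_pairs = 12    # each of the 3 novel-novel pairs, 4 times
total_trials = trials_cond_pairs + trials_novel_vs_cond + trials_novel_pairs
assert total_trials == 72

# Appearances per pattern (each should be 18: 9 left, 9 right).
per_novel = n_conditioned * 2 + (n_novel - 1) * 4                  # 10 + 8
per_conditioned = (trials_cond_pairs * 2) // n_conditioned + n_novel * 2  # 12 + 6
assert per_novel == per_conditioned == 18

# Total stimulus slots match: 72 trials x 2 patterns = 8 patterns x 18 showings.
assert total_trials * 2 == (n_conditioned + n_novel) * 18
```

Each conditioned pattern thus appears 12 times paired with other conditioned patterns and 6 times paired with novel patterns; each novel pattern appears 10 times with conditioned patterns and 8 times with the other two novels.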
Debriefing. Participants were questioned carefully after scanning to determine the degree to which they were aware of the experimental contingencies and the purpose of the experiment. All eight of the patterns seen during the judgment condition were presented simultaneously on the screen (Fig. 1C). A number on top of each pattern indicated how many times (of a possible maximum of 18) it had been chosen during the judgment condition. Subjects were asked the following questions to assess their perceptions of their preferences and their awareness of the conditioning manipulation: (1) for the two most-preferred patterns, “You chose this pattern most. Why?”; (2) for the two least-preferred patterns, “You chose this pattern least. Why?”; (3) “Did you see any of these pictures during the card game?” If the subjects answered “yes,” they were then asked “Which ones?”; and (4) “Did you notice anything in particular when you saw them?” If the subjects did not say anything spontaneously, “for example, with a particular number or color?” was used as a prompt.
Finally, participants were fully informed about the procedure and purpose of the experiment, reimbursed for their time, and were also paid the money that they had “won” during the card game (£3.60).
Experimental stimulus delivery
Custom software was written in Visual Basic version 6.0 (Microsoft, Redmond, WA). Stimuli were back projected onto a translucent screen positioned within the bore of the magnet behind each subject's head, visible in a mirror placed just above each subject's head. Subjects also wore magnet-compatible, high-fidelity electrostatic earphones, mounted in sound-attenuating ear defenders (Palmer et al., 1998).
Imaging data were collected using a 3 tesla Bruker MedSpec MR system (Ettlingen, Germany). Gradient-echo T2*-weighted echo-planar images (EPIs) were acquired continuously during conditioning and judgment, with a repetition time of 3 s. Whole-brain volumes of 21 slices (4 mm thick, with a 1 mm interslice gap and an in-plane resolution of 1.95 × 1.95 mm) were collected in an axial-oblique orientation, angled slightly away from the eyes. In addition, gradient-echo non-EP magnetic-field map images were acquired to calculate the deviations of the magnetic field from the desired homogeneous field; this information was used to geometrically undistort the EPIs. Finally, a T1-weighted high-resolution anatomical scan of each volunteer was acquired. The start of each run was synchronized with the onset of scanning. The first six EPIs in each session were discarded to avoid T1 equilibrium effects.
fMRI data were preprocessed and analyzed using statistical parametric mapping 99 (SPM99) (Wellcome Department of Cognitive Neurology, London, UK) and statistical nonparametric mapping (SnPM) (Nichols and Holmes, 2002).
Before analysis, images were corrected for slice timing, using the first slice of each scan as a reference. To correct for motion, images were realigned with respect to the first image, and a mean realigned image was created. EPIs were then geometrically undistorted using field maps (Cusack et al., 2003), followed by spatial normalization to the standard Montreal Neurological Institute (Montreal, Quebec, Canada) (International Consortium for Brain Mapping 152) EPI template. To prevent the normalization from being influenced by areas of signal dropout, a mask was applied to areas in which dropout was observed in the mean EPI images during the calculation of normalization parameters. Finally, our data were spatially smoothed using a 10 mm (full-width at half-maximum) Gaussian kernel.
Data from the conditioning and judgment tasks were modeled and analyzed separately. For both sessions, a general linear model was applied to fit each voxel with a combination of functions derived from convolving the standard hemodynamic response with the time series of the events. Low-frequency “noise” in the time series was removed using a high-pass filter (cutoffs: conditioning, 32 s; judgment, 120 s).
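The modeling step can be illustrated with a minimal sketch: event onsets are convolved with a canonical double-gamma hemodynamic response function (HRF) to form GLM regressors. The HRF parameters below are standard SPM-like defaults assumed for illustration, not values taken from the study:

```python
import numpy as np
from math import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma canonical HRF sampled at the repetition time (TR).

    Response peak at ~6 s, undershoot at ~16 s (SPM-like parameters).
    """
    t = np.arange(0, duration, tr)
    peak = t ** 5 * np.exp(-t) / gamma(6)
    undershoot = t ** 15 * np.exp(-t) / gamma(16)
    return peak - undershoot / 6.0

def event_regressor(onsets_s, n_scans, tr):
    """Delta-function event time series convolved with the canonical HRF."""
    ts = np.zeros(n_scans)
    for onset in onsets_s:
        ts[int(round(onset / tr))] += 1.0  # mark the scan nearest each onset
    return np.convolve(ts, canonical_hrf(tr))[:n_scans]
```

With a 3 s repetition time, a regressor for an event at t = 0 peaks about 6 s later and shows the characteristic post-peak undershoot; one such regressor is built per event type, and the model is fit voxelwise.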
Conditioning. The hemodynamic response function during conditioning was modeled for 12 different events: (1) the start of each trial, that is, the presentation of the playing card; (2) the response (“high” or “low” button press); and (3) the onset of 10 different kinds of outcome (each of the five card-back patterns combined with either reward or negative feedback).
Six movement regressors were added to the model to take variation attributable to movement into account. The presentation time of the face of the card varied over trials, because this was dependent on how quickly the subjects responded on that trial (mean reaction time, 1245 ms; SD, 657 ms). After an interstimulus interval of 1500 ms, the back of the card, outcome number, and feedback (reward or negative feedback) were presented for 2000 ms. This was followed by an intertrial interval of 1000 ms. To reduce the chance of observing an effect resulting from a correlation between events, the start of the trial and the response were orthogonalized with respect to the outcome of the trial. This meant that variance, which could be attributable to the trial outcome, the high/low response, or the start of the next trial (because these three events were correlated in time), was in fact attributed to the trial outcome (Andrade et al., 1999). The events of interest were those in which feedback was presented; blood oxygenation level-dependent (BOLD) signal in response to the presentation of reward was compared with that in response to the presentation of negative feedback. The single-subject contrast images, generated by pairwise comparisons among the parameter estimates for these different events, were then taken to a second-level random-effects analysis using SnPM (www.fil.ion.ucl.ac.uk/spm/snpm). A standard nonparametric randomization/permutation testing procedure was used, because this approach makes minimal assumptions about the distribution of the data (Nichols and Holmes, 2002). As a result, the voxel variance can be smoothed over neighboring voxels to calculate pseudo-t statistics. This type of variance smoothing reduces noise from the statistic image and adds degrees of freedom, resulting in increased power of the design. 
Here, pseudo-t-statistic maps were calculated using 8 mm full-width at half-maximum smoothing of the variance images (this results in smoothing the noise, which is different from spatial smoothing of voxel signal, as described above) and were thresholded at p < 0.05. Because this nonparametric analysis provides a measure of exchangeability of labels across all of the possible comparisons, the calculated p value of this permutation distribution represents an exact, as opposed to estimated, measure and is thus corrected for multiple comparisons across the whole brain. Activation clusters that survived significance testing were then used as regions of interest (ROIs) in the analysis of data from the judgment condition.
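The permutation procedure can be sketched as follows: per-subject contrast values are sign-flipped to build a null distribution of the maximum pseudo-t statistic, whose denominator uses a smoothed variance, mirroring the SnPM approach. This is a simplified 1-D illustration with assumed parameters, not the SnPM implementation itself:

```python
import numpy as np

def pseudo_t_permutation(contrasts, n_perm=1000, smooth=1, seed=0):
    """One-sample sign-flip permutation test with pseudo-t statistics.

    contrasts: (n_subjects, n_voxels) array of per-subject contrast values
    (a 1-D stand-in for contrast images). The voxelwise variance is smoothed
    with a moving average over `smooth` neighbours on each side (zero-padded
    at the edges for simplicity), and the familywise-corrected p value is
    taken from the maximum-statistic null distribution.
    """
    rng = np.random.default_rng(seed)
    n_sub, _ = contrasts.shape

    def pseudo_t(data):
        mean = data.mean(axis=0)
        var = data.var(axis=0, ddof=1)
        kernel = np.ones(2 * smooth + 1) / (2 * smooth + 1)
        var_sm = np.convolve(var, kernel, mode="same")  # smooth the noise only
        return mean / np.sqrt(var_sm / n_sub)

    observed = pseudo_t(contrasts)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(n_sub, 1))  # relabel each subject
        max_null[i] = pseudo_t(contrasts * flips).max()
    # Corrected p per voxel: fraction of permutations whose maximum
    # statistic meets or exceeds the observed statistic at that voxel.
    p_corrected = (max_null[None, :] >= observed[:, None]).mean(axis=1)
    return observed, p_corrected
```

Because each voxel is compared against the null distribution of the whole-image maximum, the resulting p values are corrected for multiple comparisons across all voxels, which is what licenses the whole-brain-corrected threshold of p < 0.05.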
Judgment. Ten different event types were modeled during this condition: (1-8) the presentation of each of the eight patterns individually (the positive, negative, three bivalent, and three novel patterns); (9) the presentation of the two patterns simultaneously; and (10) the response. Six movement regressors were added to the model to take into account variation attributable to movement. The individual patterns were presented for 1000 ms each, with an interstimulus interval of 500 ms. The presentation time of the pair of stimuli varied, depending on the reaction time of the participants (mean reaction time, 814 ms; SD, 431 ms).
Contrast images were generated using the events that indexed individually the presentation of each of the eight stimuli. More specifically, activation while subjects were passively viewing the positive pattern was compared with that during viewing of the negative pattern. To exclude any BOLD signal explained by the variance of adjacent events, the onset of the presentation of the pairs of stimuli and the response were orthogonalized with respect to the presentation of the individual stimuli (Andrade et al., 1999).
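The orthogonalization step can be sketched with ordinary least squares: the component of an adjacent regressor that is explained by the regressor of interest is removed, so that shared variance is credited to the regressor of interest. This is a generic illustration of serial orthogonalization, not the SPM99 code:

```python
import numpy as np

def orthogonalize(x, wrt):
    """Return x with the component explained by the columns of `wrt` removed,
    so that variance shared between them is attributed to `wrt`."""
    wrt = np.column_stack([wrt])                      # ensure 2-D (n_scans, k)
    beta, *_ = np.linalg.lstsq(wrt, x, rcond=None)    # least-squares projection
    return x - wrt @ beta                             # residual regressor
```

After orthogonalization, the residual regressor is exactly orthogonal to the regressor(s) it was orthogonalized against, so any temporally correlated variance is modeled by the regressor of interest (here, the presentation of the individual stimuli).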
The contrast images of each individual subject were taken into a random-effects analysis using SPM. Statistical maps were thresholded using an ROI analysis using MarsBaR (Brett et al., 2002), which applied the statistical model to the mean signal of each ROI. ROIs were defined by the significant activation clusters observed in the reward versus negative feedback contrast during the conditioning phase.
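The ROI step can be sketched as follows: each subject's contrast image is averaged over the voxels of an ROI mask, and the group test is then run on those per-subject means. This is a simplified stand-in for the MarsBaR procedure, with hypothetical array shapes:

```python
import numpy as np

def roi_mean_contrast(contrast_imgs, roi_mask):
    """Average each subject's contrast image over an ROI and run a
    one-sample t test on the ROI means.

    contrast_imgs: (n_subjects, n_voxels) array of contrast values.
    roi_mask: boolean (n_voxels,) array defining the ROI.
    Returns (t statistic, per-subject ROI means).
    """
    means = contrast_imgs[:, roi_mask].mean(axis=1)   # one value per subject
    n = means.size
    t = means.mean() / (means.std(ddof=1) / np.sqrt(n))
    return t, means
```

Testing the mean ROI signal rather than each voxel separately reduces the number of comparisons to the number of ROIs, which is why an effect too weak to survive whole-brain correction can still be detected in independently defined ROIs.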
Results

As a result of technical problems during the judgment condition, data on two subjects had to be excluded. Reward conditioning was assessed by computing a preference score, which compared the judgment score for the pattern paired most often with reward (positive pattern) with that for the pattern paired least often with reward (negative pattern). To exclude confounding effects in this measure, such as the possible influence on preference of unequal exposure to the patterns (i.e., novel vs seen before), the judgment score was extracted only from those trials in which the positive pattern was directly paired with the negative pattern and the subject had to choose which of these two patterns he or she preferred. The preference score was calculated as the number of times the positive pattern was chosen over the negative pattern, out of a maximum of 10; a score of 5 indicated chance performance. The mean across subjects was 6.6. A sign test was used to assess whether preference scores across subjects were, on average, reliably >5. The results show that subjects significantly preferred the positive pattern to the negative pattern (p = 0.048).
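The sign test used here can be reproduced in outline. This is a minimal sketch; the per-subject scores in the example are illustrative, not the study's data:

```python
from math import comb

def sign_test_above_chance(scores, chance=5):
    """One-sided sign test: are scores reliably above chance?

    `scores` holds, per subject, the number of times the positive pattern
    was chosen over the negative pattern (out of 10 direct pairings).
    Subjects scoring exactly at chance are dropped, as in a standard sign test.
    """
    above = sum(s > chance for s in scores)
    below = sum(s < chance for s in scores)
    n = above + below
    # Exact one-sided binomial probability P(X >= above) with p = 0.5.
    return sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n
```

For example, if 14 of 20 subjects scored above chance, 4 below, and 2 exactly at chance, the test would run on the 18 informative subjects and yield p ≈ 0.015.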
Other factors, such as a priori preferences, may have played a role in preference formation, and these may be more subjectively salient. When asked why they had chosen these patterns most/least (debriefing question 1 and 2), all of the participants attributed their liking/disliking to the physical characteristics of the patterns and not to previous experience of the patterns with reward/negative feedback during the conditioning phase (for example, responses for a positive and subsequently preferred pattern included “looks like a roller coaster”; responses for a negative and subsequently not preferred pattern included “looks like broken bones”).
In fact, none of the participants was completely aware of the experimental contingencies (i.e., able to correctly identify the two patterns most closely associated with reward and negative feedback). Five of 20 people did not recognize any patterns in response to debriefing question 3 (“Did you see any of these pictures during the card game?”). Two people recognized some of the patterns when asked this question but included neither their positive nor their negative pattern. Seven people recognized either their positive or their negative pattern, but not both; of these seven, four noticed the association of the positive pattern with positive feedback, as revealed by debriefing question 4 (“Did you notice anything in particular when you saw the stimuli during the card game?”). None of the people who had recognized their negative pattern was able to report the relationship between it and negative feedback in response to question 4. Six people recognized both their positive and negative patterns in response to question 3. Of these six, one claimed that both patterns were associated with negative feedback, and the other five did not report any association when prompted at question 4. Thus, only 5 of 20 volunteers were partially aware of the experimental contingencies: they were able to identify either the positive or the negative pattern as associated with reward or negative feedback, but not both. The mean preference score among these five partially aware participants was 5.4, which was not significantly different from that of the unaware volunteers (mean, 6.93; t = 0.72). None of the participants was aware of the purpose of the experiment.
The majority of subjects appeared to be unaware of the experimental contingencies in the conditioning phase, and all of the subjects were unaware of the effects of their experience in the conditioning phase on their own subsequent behavior. The conditioning effect clearly contributes to observed preference but in a relatively subtle way, because most subjects preferred at least one other pattern to the positive pattern. When preference scores were calculated per pattern and ranked within subjects in order of preference, the positive pattern was one of the two most-preferred patterns in 10 of 20 subjects (and it obtained the highest score in only 6 of 20 subjects). The negative pattern was one of the two least-preferred patterns in 10 of 20 subjects (and the lowest-scoring pattern in only 3 of 20 subjects).
Subjects took, on average, 814 ms (SD, 431 ms) to make a judgment response. There was no cutoff for subjects' reaction time (maximum reaction time, 4094 ms), and there were no missed trials.
Conditioning phase: reward versus negative feedback
This contrast reveals the areas involved in response to the presentation of reward compared with the presentation of negative feedback during the conditioning phase.
Significant activations at p < 0.05, surviving whole-brain correction for multiple comparisons, are listed in Table 1 (Fig. 2). Each set of coordinates represents the peak activation of a cluster, where a cluster comprises all adjacent voxels that survive the significance threshold; only one peak per cluster is presented in the table.
In this contrast, we observed activation in the ventral striatum, primarily in the putamen and nucleus accumbens, extending into the amygdala in the left hemisphere (−26, −10, −16) (Fig. 2). Significant effects were also detected in the OFC on the midline, the left OFC, the anterior cingulate cortex, the dorsal thalamus, and bilaterally in the superior temporal gyrus, in what is probably auditory cortex. These auditory activations may result from acoustic differences between the sounds used to signal the outcome (melodic flourish vs buzzer): the two sounds differed in duration and in spectrotemporal structure.
Judgment condition: activation related to conditioned patterns
In the judgment task, we compared BOLD signal during passive viewing of the positive pattern (presented on its own in the middle of the screen) with that during passive viewing of the negative pattern (also presented on its own). Because of technical problems during the judgment condition, data from two subjects had to be excluded from the analysis (the same two subjects excluded from the behavioral analysis). Differences in activation between these two conditions in the remaining 20 subjects, from a whole-brain analysis at p < 0.001 uncorrected, are shown in Figure 3; these areas did not reach significance after whole-brain correction for multiple comparisons. An ROI analysis was therefore applied, based on the significant activations (p < 0.05) observed in the reward versus negative feedback contrast from the conditioning task (Table 1). Only activation in the OFC (peak: 0, 44, −8) was significant. Activation in the retrosplenial cortex reached significance at an uncorrected level but did not survive correction for multiple comparisons across all of the resels of the ROI clusters.
Discussion

Although post-test debriefing revealed that a minority of participants may have been partially aware of the relationship between particular patterns and reward or negative feedback, no subject was fully aware. All of the subjects attributed their preferences to the physical characteristics of the patterns and not to previous experience with the stimuli during the card game. We conclude that our participants were mostly unaware of the conditioning manipulation during testing. We observed activation in the same OFC region both during reward presentation and when subjects viewed conditioned patterns without the administration of a physical reward. Because subjects could not explicitly anticipate a rewarding outcome while viewing the CS, we conclude that the intrinsic affective properties evoked by the CS must primarily underlie the BOLD response in the OFC.
Our observation of OFC activation when subjects viewed stimuli associated previously with reward is consistent with theories suggesting that the OFC codes information regarding the reward value of a stimulus. Neurophysiological studies in primates indicate that neurons in the OFC detect and discriminate among rewards of different value and thus may play an important role in identifying or perceiving reward (Schultz et al., 2000). Furthermore, neurons in the monkey OFC cease to respond to food rewards when a monkey is satiated, so that the food no longer holds a rewarding value (Rolls, 2000). Compatible results have been reported in functional neuroimaging studies in humans. These implicate the OFC in the representation of relative reward value (Elliott et al., 2003; Knutson et al., 2003) and incentive value of stimuli (Arana et al., 2003).
The results also fit in with an alternative view of OFC function: that it is involved in the generation of “somatic markers” for affective states and the integration of this information into subsequent behavior (Schoenbaum et al., 1998; Bechara et al., 2000). According to the somatic marker hypothesis (Damasio, 1996), stimuli evoke a somatic state (measured by an increased skin-conductance response) that is associated with pleasurable or aversive somatic markers (e.g., the anticipation of affective outcomes).
Paulus and Frank (2003) suggest that the brain substrate of responses related to somatic markers, which includes regions overlapping with our OFC activation, is also involved in preference judgments. They conducted an fMRI study in which subjects made preference judgments on pictures of soft drinks, using a paired-comparison procedure, which was compared with a visual-discrimination task. The authors suggest that competition between appetitive and aversive somatic markers, processed in the ventromedial prefrontal cortex (VMPFC), is critical for decision making and may guide preference judgments even in the absence of rewarding or aversive outcomes associated with the response. Thus, the VMPFC appears to play a role in the representation of the complex reward values involved in preference judgments (Paulus and Frank, 2003) (but see Erk et al., 2002).
Our results are consistent with and extend these findings. In the judgment condition, the affective information of the CS was important for the behavioral response later in the trial. After the presentation of the CS, subjects may have been preparing for a response by extracting affective information to be able to use it subsequently. However, no response selection was required while subjects were viewing the CS (the response was modeled separately from stimulus presentation). In fact, response selection at the time of CS presentation was impossible for approximately half of the trials, when the CS was presented first in the pair, because no decision could be made until both stimuli were known. This finding provides further evidence that the OFC is specifically implicated in coding, and perhaps extracting, the affective information from conditioned stimuli to prepare for subsequent decision making, in the absence of response selection. Thus, conditioned stimuli may trigger the cascade of somatic responses via the OFC independently of any decision being required. However, additional studies are required to investigate the exact relationship between somatic responses and neural markers in response to rewarding stimuli. A role for the human OFC in the representation of reward value, as well as in guiding behavior based on this information, is consistent with functional neuroimaging studies that investigated responses to rewarding stimuli in the presence or absence of response selection (Arana et al., 2003; J. O'Doherty et al., 2003). The present study, however, is the first to show similar OFC responses to stimuli that have acquired affective value via associative conditioning.
Rewarding properties of conditioned stimuli have been investigated in rats and nonhuman primates, using conditioned reinforcement and autoshaping procedures. These studies suggest a role for the OFC in processing the incentive value of a conditioned reward (Chudasama and Robbins, 2003; Pears et al., 2003). The activation in the OFC in response to conditioned reward observed in the present study provides compelling evidence for a similar role for the human OFC in the representation of intrinsic affective value of the CS after CS-unconditioned stimulus (US) pairings.
OFC activation in humans has been observed in only a few other neuroimaging studies of reward conditioning (Gottfried et al., 2002b, 2003; J. P. O'Doherty et al., 2003). For example, Gottfried et al. (2002b) paired faces with a pleasant, unpleasant, or neutral odor or with no olfactory stimulus. Participants had to decide the sex of each face, and the odor was administered while each face was on the screen. Activation in the OFC was observed when faces associated with a pleasant odor were compared with neutral faces never paired with an odor. A direct comparison between positively and negatively conditioned faces implicated the ventral striatum and the amygdala. Recently, the same authors reported an fMRI study using a postconditioning devaluation procedure and implicated both the OFC and amygdala in the representation of the current value of reward (odor) predicted by the CS (abstract visual stimulus), as shown by decreased activation in these areas in response to the CS after reward devaluation (Gottfried et al., 2003). A crucial difference between these procedures and the paradigm used here concerns anticipation of the CS. Although subjects in the study by Gottfried et al. (2002b) had not explicitly been told about the CS-US contingencies, there was no attempt to make them difficult to perceive, and the authors did not report any measures of awareness. Adjusted responses to the CS after postconditioning reward devaluation may reflect both a change in reward prediction and a change in intrinsic reward value of the CS. Finally, J. P. O'Doherty et al. (2003) reported activation in the ventral striatum and the OFC in response to reward-prediction errors in the context of appetitive pavlovian conditioning. Reward-prediction errors index the magnitude of the discrepancy between reward expectancy and outcome. 
Although this error signal clearly plays an important role in the acquisition of motivational significance by the CS via CS-US pairings, it emphasizes the explicit predictive value of the CS. In contrast, the current study reveals OFC activation in the absence of explicit reward prediction, suggesting a role for this region in processing the intrinsic reward properties of stimuli that have acquired affective significance via pavlovian conditioning.
A network of areas, including the ventral striatum, amygdala, and anterior cingulate, was activated in response to the presentation of reward compared with negative feedback, consistent with results from both neurophysiological studies in animals and other human functional neuroimaging studies of reward (Delgado et al., 2000; Elliott et al., 2000, 2003; Knutson et al., 2000; Schultz et al., 2000; Zalla et al., 2000; J. O'Doherty et al., 2001a,b, 2003; Small et al., 2001; Gottfried et al., 2002a). However, unlike other studies of appetitive conditioning (Gottfried et al., 2002b, 2003; J. P. O'Doherty et al., 2003), we did not observe activation in the amygdala or ventral striatum during the presentation of conditioned reward. This may be because the affective valence of the conditioned stimuli was too subtle to evoke a persistent BOLD response in these regions, or because the smoothing filter that we used was not optimally sensitive to subcortical activation foci. An alternative explanation is the lack of explicit anticipation of a physical reward during the judgment condition; this distinguishes our paradigm from studies in which amygdala and ventral striatal activation was found in response to the anticipation of both abstract and primary rewards (Knutson et al., 2001a,b; J. P. O'Doherty et al., 2002; Hommer et al., 2003). In those studies, reward anticipation was induced by the presentation of differential cues that explicitly signaled subsequent administration of a rewarding, neutral, or aversive outcome. Although a cue predictive of a rewarding outcome may also acquire intrinsic affective value, a CS need not be endowed with the affective properties of the US for the animal to use it to anticipate the occurrence of the US.
In conclusion, we showed distinct patterns of neural responses after the presentation of reward compared with negative feedback, implicating the ventral striatum, amygdala, and OFC. An overlapping OFC region was also activated by the presentation of stimuli associated with reward, without the administration of an immediate reward and without participants anticipating a reward. These results suggest that the OFC plays an important role in coding the intrinsic affective properties of both rewarding events and conditioned stimuli that have acquired a rewarding value in the absence of explicit anticipation of a reward.
This work was supported by the Medical Research Council and the University of Maastricht (S.M.L.C.). A.A. was funded by a Marie Curie Individual Fellowship. We thank the radiographers from the Wolfson Brain Imaging Center (Cambridge, UK) for their help with data acquisition and Matthew Brett and Tom Nichols for advice on statistical analysis. We are also grateful to John Parkinson for helpful comments and discussion.
Correspondence should be addressed to Sylvia M. L. Cox, Department of Psychiatry, McGill University, 1033 Pine Avenue West, Montreal, Quebec, Canada H3A 1A1. E-mail:.
Copyright © 2005 Society for Neuroscience 0270-6474/05/252733-08$15.00/0