Abstract
Multiple features of the environment are often imbued with motivational significance, and the relative importance of these can change across contexts. The ability to flexibly adjust evaluative processes so that currently important features of the environment alone drive behavior is critical to adaptive routines. We know relatively little about the neural mechanisms involved, including whether motivationally significant features are obligatorily evaluated or whether current relevance gates access to value-sensitive regions. We addressed these questions using functional magnetic resonance imaging data and a task design where human subjects had to choose whether to accept or reject an offer indicated by visual and auditory stimuli. By manipulating, on a trial-by-trial basis, which stimulus determined the value of the offer, we show choice activity in the ventral striatum solely reflects the value of the currently relevant stimulus, consistent with a model wherein behavioral relevance modulates the impact of sensory stimuli on value processing. Choice outcome signals in this same region covaried positively with wins on accept trials, and negatively with wins on reject trials, consistent with striatal activity at feedback reflecting correctness of response rather than reward processing per se. We conclude that ventral striatum activity during decision making is dynamically modulated by behavioral context, indexed here by task relevance and action selection.
Introduction
In a laboratory environment it is common that reward contingencies depend upon a single feature of the environment (for example, the pitch of a tone). In more ecological contexts multiple features of the environment potentially signal reward, and the relative importance of these features varies according to context. Despite its importance, the mechanisms by which task relevance modulates behavior are poorly understood (Wilson and Niv, 2011). One possibility, suggested by theoretical models, is that behavioral relevance is controlled by selective attention (Dayan et al., 2000; Gershman et al., 2010). Here a single value prediction is generated, based upon a combination of stimulus features weighted by their behavioral importance, and this drives behavior (Fig. 1). Alternatively, different stimulus features might be automatically evaluated (Pessiglione et al., 2008), with a relevance modulation reflected in the strength of the connections between value representations and effector regions (Fig. 1). Functional neuroimaging can discriminate between these accounts. If relevance modulates the influence of stimulus features on reward representations, then only the value of relevant features should be signaled in reward-related areas, such as the ventral striatum (Schultz et al., 1992; Knutson et al., 2001; O'Doherty et al., 2004; Kable and Glimcher, 2007). If, in contrast, behavioral relevance exerts an effect on the links between reward signaling and effector areas, then the value of all stimulus features should be simultaneously represented.
We devised a paradigm in which subjects were presented simultaneously with one of two visual and auditory cues. The (explicitly signaled) reward contingencies for each trial were determined either by the visual cue, the auditory cue, or a combination of the two (the “cross-modal” condition). The relevant cue indicated whether the subject was presented with either a “good” or a “bad” offer, and subjects then indicated a choice whether to accept this or not. Feedback was provided on each trial, but only if the subject accepted the offer did outcomes contribute to their winnings. Using this design we were able to test a hypothesis that activity in ventral striatum reflects the value only of behaviorally relevant stimuli, consistent with an effect of relevance on evaluation processes themselves (Dayan et al., 2000).
In addition, our task allowed us to test whether outcome signals in the ventral striatum indexed an updating of action policies (Klein-Flügge et al., 2011; Li and Daw, 2011), or rather an updating of the value assigned to particular actions (Watkins and Dayan, 1992). This is important for understanding how human subjects learn (Dayan and Daw, 2008; Friston et al., 2009). Critically, the distinct accounts outlined above make opposite predictions about trials where subjects choose to reject an offer. If striatal signals reflect a direct updating of policies, as previously suggested (Li and Daw, 2011), then a foregone win represents a mistaken action and we would then expect to see a negative response in ventral striatum. If, on the other hand, striatal activity reflects an updating of action (or stimulus) values themselves then this predicts foregone rewards should be associated with a positive signal.
Materials and Methods
Subjects.
Twenty-five (17 female) right-handed subjects, age range 19–48, all free of psychiatric or neurological disease, participated in the study. The study was approved by the Joint National Hospital for Neurology and Neurosurgery (University College London Hospitals NHS trust) and Institute of Neurology (University College London) Ethics Committee. Subjects were paid according to their performance during the task (receiving from £17.40–39.80).
Stimuli and task.
On each trial of the experiment, subjects were asked to decide either to accept or reject an offer made to them (Fig. 2). They were instructed that whatever they decided, the outcome of the trial (either a win or a loss) would be shown, but only if they had chosen to accept the offer would it impact on their winnings in the session. The task comprised three experimental conditions: a “visual” condition in which offer value was determined solely by the visual cue (one of two colored boxes), an “auditory” condition in which it was determined solely by the auditory cue (one of two short clips of synthesizer pads), and a “cross-modal” condition where the combination of the auditory and visual cues dictated the offer value. Each trial consisted of a concurrent presentation of both auditory and visual cues. Stimuli were thus identical between conditions, and the individual trials differed only according to which stimulus features were behaviorally relevant for that trial. Conditions were presented in a pseudorandomized order, and were explicitly signaled to the subject by text presented for 1500 ms before stimulus presentation (Fig. 2).
Offers could be either good or bad, with good offers indicating an 80% win probability and a 20% probability of loss, and vice versa for bad offers. Two visual cues were used, one of which was fixed as the good cue throughout the experiment, with a similar arrangement for the auditory cues. This meant that there were four possible cue combinations in the cross-modal condition. For each subject we specified that good offers in the cross-modal condition were indicated either by congruent stimuli (both cues good or both cues bad) or incongruent stimuli, and this arrangement was counterbalanced across subjects so as to decorrelate the effects of value and congruence in the cross-modal condition. In subjects for whom congruent stimuli indicated good offers, the text for the cross-modal condition read “Congruent,” while for the other subjects it read “Incongruent.”
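The mapping from cues and condition to offer value described above can be sketched in a few lines. This is an illustrative sketch only, not the task code used in the study: the function names, the string labels for conditions, and the `congruent_is_good` flag (standing in for the counterbalancing group) are all assumptions introduced here.

```python
def offer_is_good(condition, visual_good, auditory_good, congruent_is_good=True):
    """Return True if the trial's offer is good under the stated rules.

    condition: "visual", "auditory", or "cross-modal" (hypothetical labels).
    visual_good / auditory_good: whether each cue is the fixed "good" cue.
    congruent_is_good: counterbalancing group (assumed flag name).
    """
    if condition == "visual":
        return visual_good
    if condition == "auditory":
        return auditory_good
    if condition == "cross-modal":
        congruent = visual_good == auditory_good
        return congruent if congruent_is_good else not congruent
    raise ValueError(f"unknown condition: {condition!r}")


def win_probability(is_good):
    """Good offers win with probability 0.8, bad offers with 0.2."""
    return 0.8 if is_good else 0.2


# Enumerate the four cue combinations in the cross-modal condition
# for a subject in the "Congruent" group.
for visual_good in (True, False):
    for auditory_good in (True, False):
        good = offer_is_good("cross-modal", visual_good, auditory_good)
        print(visual_good, auditory_good, good, win_probability(good))
```

Because the cross-modal value depends only on whether the two cues match, each individual cue is good on exactly half of the good cross-modal offers, which is what allows the cross-modal value to be separated from the visual and auditory values.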
Stimuli were presented for 2000 ms (Fig. 2) and subjects were instructed to make an accept or reject response via a functional magnetic resonance imaging (fMRI)-compatible button box (response keys were counterbalanced across subjects). Outcomes (text reading “Win” or “Lose”) were presented visually for 1200 ms after a delay, which varied between 2000 and 8000 ms. The outcome of each trial was presented regardless of whether subjects chose to accept or reject an offer. In trials where they had chosen accept, a bar at the bottom of the screen indicating their cumulative earnings either increased or decreased in length by equal amounts according to whether they won or lost. If they had chosen to reject, the text was presented with a line through it indicating that it did not impact on their earnings, and accordingly the earnings bar did not change in length. At the start of each session the indicator bar started at a length corresponding to winnings of £3.60. This was implemented to ensure losses were meaningful even in early trials of the session. On trials where subjects failed to make either an accept or reject response, the words “Too Slow” were displayed, and subjects lost an amount equal to trials where they accepted an offer and a lose outcome occurred. Note that these latter trials (1.4%) were excluded from our behavioral and neuroimaging analyses.
Before scanning, subjects were trained on the value of the auditory and visual stimuli separately using a simple instrumental learning procedure in which auditory and visual stimuli were presented alone in separate blocks of 24 trials. As in the main task, subjects chose to accept or reject offers indicated by the stimulus present on each trial, and feedback was presented in relation to the outcome of both accepted and rejected gambles. They then underwent at least one training session consisting of 60 trials of the task proper (altered to reduce the gap between action selection and offer presentation to a maximum of 2500 ms; the length of training varied between subjects according to their speed of learning). During scanning subjects performed two sessions each consisting of 120 trials. We present behavioral data acquired during scanning alone.
Behavioral analysis.
To check that subjects were able to adequately acquire and maintain the reward contingencies during the task, we analyzed the mean probability of selecting the correct action (accepting on trials when a good stimulus was presented and rejecting on trials where a bad one was presented) in each of the three conditions. We compared mean accuracy rates between conditions by taking the mean rates for each subject as summary statistics, testing for differences using a two-tailed Wilcoxon signed rank test.
To test whether the valence of stimuli in task-irrelevant modalities affected behavioral responding, we performed a logistic regression for each subject with separate regressors for the valences of task-relevant and task-irrelevant stimuli in each of the three conditions (giving us a total of six regressors). These were used to predict whether subjects accepted the offer (or not) on each trial (as reflected by a positive regression coefficient). We then performed group-level statistics using the single subject regression coefficients and one-tailed signed rank tests. This reflected our strong prior hypothesis that the presence of positively valenced stimuli should make subjects more likely to accept on any given trial.
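One way to lay out the six regressors is sketched below. This is a hedged illustration, not the authors' analysis code: the +1/−1 valence coding, the zero-filling of regressors that do not apply on a given trial, and the names `PROPERTIES` and `design_row` are assumptions made here. Fitting the logistic regression itself is left to a statistics package.

```python
# Stimulus properties whose valence can be relevant or irrelevant on a trial.
PROPERTIES = ("visual", "auditory", "cross-modal")


def design_row(condition, valence):
    """Build one trial's row of six regressors.

    condition: the task-relevant property on this trial.
    valence: dict mapping each property to +1 (good) or -1 (bad);
             this coding is an assumption of the sketch.
    Column order: for each property, [relevant, irrelevant].
    """
    if condition not in PROPERTIES:
        raise ValueError(f"unknown condition: {condition!r}")
    row = []
    for prop in PROPERTIES:
        is_relevant = prop == condition
        row.append(valence[prop] if is_relevant else 0)
        row.append(0 if is_relevant else valence[prop])
    return row


# Example: visual condition, good visual cue, bad auditory cue,
# and a bad cross-modal combination.
example = design_row("visual", {"visual": 1, "auditory": -1, "cross-modal": -1})
print(example)
```

With this layout, a positive coefficient on any column means that a good stimulus in that role made acceptance more likely, which matches the direction tested by the one-tailed signed rank tests.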
fMRI data acquisition and preprocessing.
Gradient-echo T2*-weighted echo-planar (EPI) images were acquired on a 3 T Trio Siemens scanner with a resolution of 3 mm isotropic. Scanner settings (echo time, 30 ms; repetition time, 3.36 s; 48 slices acquired in descending order at an angle of 30° in the anterior–posterior axis) were designed to optimize sensitivity in the orbital frontal cortex (Deichmann et al., 2003). In each session, at least 469 images were collected (∼27 min each, two per subject). The first five images from the task sessions were discarded to allow for T1 equilibration effects, and the fMRI time series realigned and unwarped to correct for both motion-related and static distortions (Hutton et al., 2002). Whole-brain 1 mm × 1 mm × 1 mm T1-weighted structural images were acquired and coregistered with mean EPI images. Functional and structural data were then spatially normalized to Montreal Neurological Institute (MNI) space and smoothed with a 6 mm3 full-width at half-maximum (FWHM) Gaussian using the DARTEL toolbox (Ashburner, 2007). Respiration and heart rate were recorded using a breathing belt and pulse oximeter (Hutton et al., 2011).
Region of interest selection.
Based on previous literature we defined regions of interest (ROIs) for our contrasts of interest in the ventral striatum (6 mm radius spheres centered at [11 11 −2] and [−11 11 −2]; Guitart-Masip et al., 2011) and the ventromedial prefrontal cortex (vmPFC; 8 mm spheres centered at [6 50 −11] and [−6 50 −11]; Wright et al., 2013). These ROIs were used for small volume correction in our fMRI analyses.
fMRI univariate analysis.
We created a general linear model (GLM) containing separate events for each offer condition (visual, V; auditory, A; or cross-modal, C), modeled as 2 s duration boxcars. Each of these event regressors was modulated by three additional parametric regressors, reflecting the value (indicated by a zero or one) of stimuli in visual (Vv), auditory (Va), and cross-modal (Vc) modalities. We modeled outcome presentation separately for both accept and reject conditions, using a stick function, with each of these regressors in turn modulated by an additional parametric regressor indicating whether a win or a loss was signaled on that trial, giving four regressors in total. Regressors reflecting condition presentation time (when the text indicating which sensory modality was relevant was presented to subjects); the six motion regressors produced by the realignment stage of preprocessing; and physiological noise regressors consisting of six cardiac regressors, six respiratory regressors, and two regressors for heart rate change and change in respiratory volume (Hutton et al., 2011) were included as regressors of no interest. Unless otherwise stated, we report results that were significant at a threshold of p < 0.05, familywise error corrected, either for the whole brain (pwb) or using small volume correction for one of our regions of interest (psvc).
To compare the overall effect of behaviorally relevant properties of the stimuli with behaviorally irrelevant ones, we created appropriately weighted contrast images for each subject encoding the mean activity for relevant values (Vrel = ((Va|A) + (Vv|V) + (Vc|C))/3), irrelevant values (Virr = ((Vv|A) + (Vc|A) + (Va|V) + (Vc|V) + (Va|C) + (Vv|C))/6), and their difference (Vrel − Virr). We performed a second-level analysis using a summary statistics approach. To test whether the overall effects we observed were present in each of the three task conditions, we also performed similar analyses for each task condition (V, A, or C) separately.
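The contrast weights implied by these definitions can be written out explicitly. The following is a minimal sketch assuming the nine parametric regressors are indexed by (value, condition) pairs; the dictionary layout and the names `RELEVANT` and `contrast_weights` are illustrative and are not taken from the analysis software.

```python
from fractions import Fraction

CONDITIONS = ("V", "A", "C")   # visual, auditory, cross-modal trials
VALUES = ("Vv", "Va", "Vc")    # value of visual, auditory, cross-modal property
RELEVANT = {"V": "Vv", "A": "Va", "C": "Vc"}  # relevant value per condition


def contrast_weights():
    """Return weights for Vrel, Virr, and Vrel - Virr.

    Keys are (value, condition) pairs, e.g. ("Va", "A") for (Va|A).
    """
    v_rel, v_irr = {}, {}
    for cond in CONDITIONS:
        for val in VALUES:
            is_relevant = RELEVANT[cond] == val
            v_rel[(val, cond)] = Fraction(1, 3) if is_relevant else Fraction(0)
            v_irr[(val, cond)] = Fraction(0) if is_relevant else Fraction(1, 6)
    v_diff = {key: v_rel[key] - v_irr[key] for key in v_rel}
    return v_rel, v_irr, v_diff


v_rel, v_irr, v_diff = contrast_weights()
for key in sorted(v_diff):
    print(key, v_diff[key])
```

The relevant and irrelevant weights each sum to one, so the difference contrast sums to zero and compares the two means on an equal footing.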
To test for intersubject correlations between activity in the ventral striatum and the effects of task-relevant and task-irrelevant value signals on behavior, we extracted the average parameter estimates from our striatal ROIs for relevant and irrelevant value signals (averaged across modalities), and compared these with parameter estimates from our logistic regression (averaged across modalities).
For approximately half of the subjects in our study (n = 12) incongruent cross-modal stimuli were specified as good, while for the remaining subjects (n = 13) this was the case for congruent stimuli. To rule out the possibility that additional computations in one of these groups, for example, the configural demands in the incongruent group, altered neuronal representations of value in such a way as to eliminate behaviorally irrelevant value signals that would have otherwise been present, we tested for relevant and irrelevant value effects separately in both congruent and incongruent groups, using the mean parameter estimates for each contrast extracted from our striatal ROIs.
To assess outcome processing, for each subject we created separate contrasts for the parametric regressors encoding wins or losses in both the accept (Wacc) and reject (Wrej) conditions. The mean activity across accept and reject conditions reflecting correct outcomes was calculated as Wacc − Wrej, reflecting the fact that losses in the reject condition signaled that a subject had made the correct decision. (Error signaling across conditions was thus reflected by Wrej − Wacc.)
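The logic behind the Wacc − Wrej contrast, and the opposing predictions of the two accounts outlined in the Introduction, can be made explicit as follows. This is an illustrative sketch: the function names and the "policy" and "value" labels are introduced here, and the returned signs stand for the predicted direction of the striatal outcome response rather than any fitted quantity.

```python
def feedback_is_correct(response, outcome):
    """A win after accepting, or a loss after rejecting, signals a correct choice."""
    if response not in ("accept", "reject"):
        raise ValueError(f"unknown response: {response!r}")
    if outcome not in ("win", "lose"):
        raise ValueError(f"unknown outcome: {outcome!r}")
    return (response == "accept") == (outcome == "win")


def predicted_sign(account, response, outcome):
    """Predicted sign of the outcome response under each account.

    "policy": the signal tracks whether the chosen action was correct.
    "value":  the signal tracks the outcome itself, obtained or foregone.
    """
    correct = feedback_is_correct(response, outcome)  # also validates inputs
    if account == "policy":
        return 1 if correct else -1
    if account == "value":
        return 1 if outcome == "win" else -1
    raise ValueError(f"unknown account: {account!r}")


# The two accounts agree on accept trials and disagree on reject trials.
for response in ("accept", "reject"):
    for outcome in ("win", "lose"):
        print(response, outcome,
              predicted_sign("policy", response, outcome),
              predicted_sign("value", response, outcome))
```

Only reject trials separate the accounts: a foregone win is scored negatively under the policy account and positively under the value account.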
fMRI multivariate decoding analysis.
To explore brain regions that might contain information about stimulus properties even on task-irrelevant trials, we applied a multivariate searchlight decoding approach to our data using a linear support vector machine (SVM; Kamitani and Tong, 2005; Norman et al., 2006; Kriegeskorte and Bandettini, 2007; Chadwick et al., 2012). For each subject, using unsmoothed native space data, we first estimated a GLM containing a separate 2 s boxcar regressor encoding stimulus presentation for each trial, together with regressors encoding outcomes, condition presentation, movement, and physiological noise as described above. We then calculated a single T statistic image for each trial, and used these for decoding analysis using a linear C-SVM implemented in LIBSVM (Chang and Lin, 2011). Analysis was based on T statistic images rather than contrast images as these downweight the effects of noisy voxels and have been shown to be advantageous for multivariate analysis (Misaki et al., 2010).
For each stimulus property (visual, auditory, cross-modal), we attempted to decode which stimulus (good or bad) was present on each trial, separating trials where the stimulus property was relevant from those where it was irrelevant giving a total of six separate decoding analyses (for the cross-modal condition we attempted to classify between congruent and incongruent combinations alone, not between all four individual pairings, since these depended on the visual and auditory stimuli themselves). To assess classification accuracy at each voxel we first extracted T values for each voxel within a spherical searchlight with 6 mm radius centered on it (31 voxels), and then performed classification with 10-fold cross-validation. Trials were randomly separated into 10 partitions, one partition was removed from the dataset, the classifier was then trained on the remaining nine, and then accuracy was assessed on the tenth. This was repeated 10 times, using a different partition each time, and the resulting estimates were averaged to give a single decoding accuracy value for each voxel. Classification accuracy images were normalized to MNI space and smoothed with an 8 mm3 FWHM Gaussian using DARTEL, and second-level inference performed using SPM.
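The cross-validation scheme described above can be sketched independently of the classifier. The code below is a hedged illustration using only the Python standard library: `train_and_predict` is a hypothetical placeholder for any classifier (the study used a linear SVM from LIBSVM, which is not reproduced here), and the sketch assumes there are at least as many trials as folds.

```python
import random


def ten_fold_partitions(n_trials, n_folds=10, seed=0):
    """Randomly split trial indices into n_folds disjoint partitions."""
    indices = list(range(n_trials))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validated_accuracy(features, labels, train_and_predict,
                             n_folds=10, seed=0):
    """Mean held-out accuracy across folds.

    train_and_predict(train_x, train_y, test_x) -> list of predicted labels;
    it is a stand-in for the classifier and is an assumption of this sketch.
    """
    accuracies = []
    for held_out in ten_fold_partitions(len(labels), n_folds, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        predictions = train_and_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in held_out],
        )
        n_correct = sum(p == labels[i] for p, i in zip(predictions, held_out))
        accuracies.append(n_correct / len(held_out))
    return sum(accuracies) / len(accuracies)


# Toy usage: 40 trials whose single feature equals the label, with a trivial
# "classifier" that reads the label straight off the feature.
toy_labels = [0, 1] * 20
toy_features = [[label] for label in toy_labels]
toy_accuracy = cross_validated_accuracy(
    toy_features, toy_labels,
    lambda train_x, train_y, test_x: [x[0] for x in test_x],
)
print(toy_accuracy)
```

In a searchlight analysis this procedure would be repeated once per voxel, with `features` holding the T values from the sphere centered on that voxel.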
Modality-specific responses.
We also examined whether activity reflecting offer values or trial outcomes varied depending upon the task condition (whether visual, auditory, or cross-modal stimuli were task relevant), effects we hypothesized might be observed in visual, auditory, and multisensory areas for the three experimental conditions based on recent demonstrations that rewarding feedback activates sensory cortices (Pleger et al., 2008, 2009; Weil et al., 2010; FitzGerald et al., 2013). Accordingly we compared relevant value signals between conditions in the model described above, and created an additional model in which separate regressors were used for outcome events in the three conditions. Despite performing a number of different analyses using different functional and anatomical ROIs, we found no clear evidence of offer or trial outcome-related activity that differed between conditions, and we do not discuss the results of these analyses any further below.
Results
Behavior
For all three conditions, subjects showed a strong preference for selecting the correct action (accepting on trials with a positive expected utility and rejecting on trials with a negative expected utility; Visual: μ = 0.97, σ = 0.03; Auditory: μ = 0.93, σ = 0.06; Cross-modal: μ = 0.89, σ = 0.10). Accuracy was significantly higher in the visual compared with both the auditory (p = 0.004 signed rank test) and cross-modal conditions (p < 0.001 signed rank test). Accuracy was significantly higher in the auditory than the cross-modal condition (p = 0.032 signed rank test). No significant differences were observed for rates of correct responding to good and bad stimuli, suggesting accuracy was unaffected by whether subjects were making a decision about positively or negatively valenced stimuli.
The results of our logistic regression analysis suggest the effect of stimulus valence in nonrelevant modalities was much smaller than in relevant modalities. However, these effects were significantly greater than zero for all three sensory modalities, reflecting a positive effect of value on acceptance likelihood (Visual relevant: μ = 47.0, p < 0.001; Visual irrelevant: μ = 8.58, p = 0.004; Auditory relevant: μ = 31.4, p < 0.001; Auditory irrelevant: μ = 6.89, p = 0.008; Cross-modal relevant: μ = 19.4, p < 0.001; Cross-modal irrelevant: μ = 6.62, p = 0.020; all signed rank test). There were no significant differences in the effect of task-irrelevant valence between the distinct modalities.
Offer value signals in the ventral striatum
Across all three conditions, the value of behaviorally relevant stimulus features correlated positively with activity in bilateral ventral striatum (Right: [12 9 −6], Z = 4.31, psvc = 0.0003; Left: [−9 12 −6], Z = 3.90, psvc = 0.002; Fig. 1). No other region showed significant positive or negative correlations with behaviorally relevant value; activity in vmPFC correlated positively with value, but this did not survive correction for multiple comparisons (peak voxel: [3 45 −6], Z-score: 2.53). Focusing on activity in ventral striatum alone, we were unable to show either a significant positive or negative correlation with behaviorally irrelevant value (Fig. 1).
To unambiguously demonstrate that ventral striatal activity reflects the value of behaviorally relevant, more than behaviorally irrelevant, stimuli it is necessary to show not just a difference in significance between conditions but also a significant difference as reflected in the (Vrel − Virr) contrast. This was exactly what we observed in bilateral ventral striatum (Right: [12 9 −6], Z = 4.66, psvc < 0.0001; Left: [−12 9 −6], Z = 3.84, psvc = 0.002), again consistent with this region being preferentially engaged by the value of behaviorally relevant stimulus features. We also tested whether this difference, in our analysis pooled across V, A, and C, was also evident in each of these conditions considered separately. For the visual condition, the (Vrel − Virr) contrast showed a significant positive correlation with activity in bilateral striatum (Right: [12 12 −6], Z = 3.74, psvc = 0.003; Left: [−12 12 −9], Z = 2.89, psvc = 0.043). Similar effects were seen in the cross-modal condition (Right: [12 6 −3], Z = 4.44, psvc = 0.0001; Left: [−9 9 −6], Z = 3.47, psvc = 0.008) while in the auditory condition a significant correlation was seen in the right striatum alone ([9 9 −3], Z = 2.86, psvc = 0.044), with activity in the left striatum ROI not surviving correction for multiple comparisons ([−9 12 0], Z = 2.20, psvc = 0.158, p = 0.0137 uncorrected). These data are consistent with the idea that activity in ventral striatum reflects the value of behaviorally relevant stimuli alone, regardless of whether these features are visual, auditory, or conjoint visual and auditory modalities.
Between-subject effects
Although we found strong positive correlations between the size of effect of task-relevant value on behavior and activity in bilateral ventral striatum (Right: r = 0.581, p = 0.002; Left: r = 0.552, p = 0.004), we failed to observe a similar relationship for task-irrelevant value (Right: r = −0.243, p = 0.241; Left: r = −0.183, p = 0.382). This indicates that the effects of task-irrelevant value on behavior are not mediated by signals in the ventral striatum, but we also acknowledge the alternative possibility that it could reflect the fact these signals are small in amplitude.
Both groups of subjects, namely those who performed the task under conditions where congruent stimuli represented good offers and those for whom this was the case for incongruent stimuli, showed significantly greater striatal responses to relevant than irrelevant value (Congruent right: μ = 0.289, p = 0.003; Congruent left: μ = 0.266, p = 0.004; Incongruent right: μ = 0.339, p = 0.021; Incongruent left: μ = 0.266, p = 0.039; all signed rank test). No significant responses to irrelevant value were observed in either condition, though a trend was observed in left ventral striatum for the Incongruent group (Congruent right: μ = −0.097, p = 0.971; Congruent left: μ = −0.058, p = 0.916; Incongruent right: μ = 0.043, p = 0.235; Incongruent left: μ = 0.079, p = 0.055; all signed rank test). This suggests that differential representation of task-relevant and -irrelevant value was largely unaffected by the valence of congruent stimuli in the cross-modal condition.
Offer value signals in the rest of the brain
To examine whether other brain regions might represent the value of behaviorally irrelevant stimuli, we generated whole-brain activation maps. No regions showed activity that survived correction for multiple comparisons, and even at a very liberal threshold (p < 0.01 uncorrected, minimum cluster size 5 voxels) we did not observe activation either in sensory cortex or in regions typically associated with value, such as the vmPFC or striatum (Table 1). This is consistent with the hypothesis that behavioral relevance gates the flow of information into value-sensitive regions, but like all negative findings it should be interpreted with caution.
In addition, in an exploratory analysis we performed a multivariate decoding analysis to see whether stimulus representations not evident in localized mean signal changes were present when they were task irrelevant (because we used only two visual and auditory stimuli, our decoding analysis is unable to distinguish between representations of stimulus value and representations of the stimulus per se, if indeed these are different; nonetheless, it can provide information about whether and where some stimulus features are represented). Visual stimulus properties could be decoded from visual cortex on both task-relevant and task-irrelevant trials (Relevant: [30 −84 −6], Z = 5.57, pwb = 0.001; Irrelevant: [−36 −81 −6], Z = 5.19, pwb = 0.006), and conjunction analysis at a threshold of p < 0.001 uncorrected revealed a large overlap in bilateral visual cortex (Fig. 3), consistent with the hypothesis that the flow of information from sensory regions representing individual stimuli to the ventral striatum is gated by task relevance (Fig. 1). No significant differences between relevance conditions were found, even using a small volume correction for the results of the conjunction analysis.
For auditory and cross-modal stimuli no regions showed decoding accuracy that survived correction for multiple comparisons in either the task-relevant or -irrelevant conditions. This may reflect properties of the stimuli themselves, or else of the neuronal responses in regions processing auditory and cross-modal stimuli.
Outcome signals in the ventral striatum
In the accept condition, activity in bilateral ventral striatum correlated positively with rewarded outcomes (Right: [9 9 −6], Z = 2.82, psvc = 0.045; Left: [−9 9 −6], Z = 2.95, psvc = 0.033), as predicted by previous findings (Seymour et al., 2004). In the reject condition, striatal activity showed the opposite pattern, manifesting a significant negative pattern of responding (Right: [9 12 −6], Z = 2.85, psvc = 0.047; Left: [−12 12 −9], Z = 3.09, psvc = 0.027; Fig. 4). This supports the idea that activity in this region at outcome presentation time is best explained in terms of a signal needed for implementation of a successful behavioral policy (Klein-Flügge et al., 2011; Li and Daw, 2011) rather than one used in updating an action value using a fictive reward signal (Watkins and Dayan, 1992; Lohrenz et al., 2007).
Outcome signals in the rest of the brain
For correct actions, significant positive correlations with obtained rewards were evident in the right caudate ([21 18 21], Z = 4.86, pwb = 0.033), the left superior parietal cortex ([−15 60 69], Z = 4.78, pwb = 0.050), and vmPFC (Right: [9 51 −6], Z = 4.08, p = 0.002; Left: [−3 48 −9], Z = 3.75, psvc = 0.007). Negative correlations with forgone rewards were found in the left supplementary motor area ([−6 −15 51], Z = 5.00, pwb = 0.016), the right precentral gyrus ([24 −9 51], Z = 4.92, pwb = 0.024), and the right vmPFC ([9 48 −9], Z = 3.54, psvc = 0.014). The finding that the vmPFC responds to feedback about correct decisions in both accept and reject conditions is interesting, as it suggests that, as for the striatum, it is concerned with evaluating the quality of action outcomes rather than processing outcomes themselves, consistent with the recent finding that vmPFC activity encodes information about specific actions (FitzGerald et al., 2012).
Foregone rewards were positively correlated with activity in the right lateral prefrontal cortex ([48 27 30], Z = 5.00, pwb = 0.016), bilateral anterior insula (Right: [30 21 −3], Z = 4.99, pwb = 0.017; Left: [−27 21 0], Z = 4.86, pwb = 0.033), and the dorsomedial prefrontal cortex ([6 30 42], Z = 4.84, pwb = 0.036; Fig. 4). A similar pattern of activity was observed for obtained losses, which were positively correlated with activity in the dorsomedial prefrontal cortex ([6 30 39], Z = 5.52, pwb = 0.001), bilateral anterior insula (Right: [48 18 −9], Z = 5.16, pwb = 0.007; Left: [−30 18 −3], Z = 5.47, pwb = 0.001), right middle temporal gyrus ([60 −30 −6], Z = 5.14, pwb = 0.008), and right lateral prefrontal cortex ([48 12 21], Z = 4.81, pwb = 0.043; Fig. 4). These results are consistent with findings of previous studies of error monitoring, which implicate the dorsomedial (or possibly anterior cingulate) and insular cortices (Ridderinkhof et al., 2004; Klein et al., 2007), as well as a recent study examining counterfactual outcome processing, which showed similar activity in the dorsomedial prefrontal cortex (Boorman et al., 2011).
Differences between conditions in the processing of correct responses and errors
No region showed significantly stronger activity when comparing correct responses in the accept and reject conditions. While it is unwise to infer conclusions from a negative result, it is worth observing that this is consistent with the idea that similar processes are implemented in ventral striatum (and other areas) during the processing of feedback indicating correct or incorrect action selection, whether or not that feedback itself indicates either a positive or a negative outcome.
Because of subjects' high levels of accuracy on the task, we were unable to compare differences in activity between outcome signals when subjects chose to accept the good offer versus those when they chose to accept the bad offer (and similarly for reject trials). Future work examining this could shed useful light on the sort of outcome signals present during performance of this task.
Discussion
We show that value-correlated activity in the ventral striatum reflects the behavioral relevance of stimulus features. This is consistent with the idea that selective and appropriate responding to valenced stimuli depends upon modulation that occurs before or during the process of evaluation (Dayan et al., 2000) rather than changes in the influence of automatically generated value representations on effector responses (Fig. 1). In addition, the results of our multivariate decoding analysis suggest that (at least for the visual modality) stimuli are represented in a similar manner in sensory cortex both when they are relevant and when they are irrelevant, and that task relevance is thus likely to modulate the influence of sensory areas on regions processing value. Also in accord with previous findings (Klein-Flügge et al., 2011; Li and Daw, 2011; Guitart-Masip et al., 2012), we show that ventral striatal activity in response to rewarding outcomes is critically sensitive to features of an action (correct versus incorrect), a profile that resembles a policy update signal rather than a reward prediction error per se.
By manipulating whether a visual cue, a simultaneously presented auditory cue, or a combination of both determined the value of a trial offer, we show that striatal activity was influenced solely (or only detectably) by stimulus value in the behaviorally relevant modality. Although we are unable to directly test what mechanisms are responsible for this modulation by behavioral context, our findings are in keeping with theoretical proposals that selective attention gates inputs to value-processing regions and weights value predictions appropriately (Dayan et al., 2000; Yu and Dayan, 2005; Gershman et al., 2010), and with the recent finding that spatial attention affects value comparison signals during binary choice (Lim et al., 2011). If true, this is a clear example of a case in which selective attention, rather than reflecting limited computational resources, is in accord with the demands of optimal inference (Rao, 2005; Yu et al., 2009; Feldman and Friston, 2010; Dayan and Solomon, 2010). Indeed we see this as a promising area for future study, particularly the question of whether diverting attention through exogenous cuing or attentional load manipulations impairs subjects' ability to focus on task-relevant stimuli, and whether there are consequential effects on ventral striatal reward signals.
Our results build on a considerable existing literature implicating ventral striatum in anticipated reward (Schultz et al., 1992; Knutson et al., 2001; O'Doherty et al., 2004; Kable and Glimcher, 2007; Bartra et al., 2013; Clithero and Rangel, 2013). Of particular relevance here is a recent finding that responses in the ventral striatum during a binary decision-making task are sensitive to exogenous manipulations of spatial attention (Lim et al., 2011), which altered the sign of a signal reflecting the difference in value between the two options. In our experiment any attentional shifts are endogenous, and occur not between spatial locations but between sensory modalities. This implies that value signals in ventral striatum are subject to flexible and dynamic modulations by attention, a finding that makes sense given the key role they are likely to play in adaptive choice.
For simplicity, we explicitly manipulated which stimulus features determined the value of the offer on a particular trial, but the issue of how subjects infer which features are currently relevant for behavior is also of considerable interest (Dayan et al., 2000). Recently, sophisticated behavioral modeling has been deployed to test hypotheses about how subjects infer which stimulus features they should attend to (Gershman et al., 2010; Wilson and Niv, 2011), but the neural substrates of this process remain to be elucidated, something for which model-based neuroimaging approaches seem ideally suited.
A question unresolved in our analysis is which brain regions mediate the (admittedly small) effects of task-irrelevant stimulus properties on behavior. One possibility is that task-irrelevant value signals are indeed present in the ventral striatum, but that these are either so small in magnitude or so transient that they were not discernible in the blood oxygenation level-dependent signal. This is suggested by a Bayesian perspective (Dayan et al., 2000), where task relevance is encoded as a context-dependent probability weight, rather than a binary on/off switch (on this account subjects are never completely certain about which context they are in, explaining the weak effect of irrelevant value on behavior). Alternatively, it may be that these effects are mediated by other structures, perhaps through some sort of valence-based priming of approach (acceptance) behavior (Tucker and Ellis, 2004; Guitart-Masip et al., 2011) as, for example, seen in Pavlovian-instrumental transfer (Balleine and Killcross, 2006; Bray et al., 2008; Talmi et al., 2008).
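To make the contrast between a graded probability weight and a binary switch concrete, the following is a minimal sketch and not a model fitted to our data. The function name, the stimulus values, and the 0.95 relevance probability are hypothetical choices made purely for illustration.

```python
def predicted_value(stimulus_values, relevance_weights):
    """Combine per-feature values into one prediction.

    Each weight is the (assumed) probability that the corresponding
    feature is the one that currently determines the offer's value.
    """
    if len(stimulus_values) != len(relevance_weights):
        raise ValueError("one weight is required per stimulus value")
    return sum(w * v for w, v in zip(relevance_weights, stimulus_values))


# Hypothetical trial: the visual cue signals a good offer (+1),
# the auditory cue signals a bad offer (-1), and vision is relevant.
values = [1.0, -1.0]  # [visual, auditory]

# Binary on/off switch: the irrelevant feature contributes nothing.
gated = predicted_value(values, [1.0, 0.0])

# Graded weight: the subject is 95% (illustrative figure) sure that
# vision is relevant, so the auditory value leaks in weakly.
graded = predicted_value(values, [0.95, 0.05])

print(f"binary gate: {gated:.2f}, graded weight: {graded:.2f}")
```

Under the graded scheme the irrelevant value shifts the prediction only slightly, which is the kind of small behavioral effect, and potentially sub-threshold neural effect, described above.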
In principle, subjects could learn and maintain appropriate responding on our task either by directly learning a policy (which action they should perform given a particular state), or by basing choices on the representation of specific action values (Watkins and Dayan, 1992; Sutton and Barto, 1998; Li and Daw, 2011). These alternatives make similar predictions about the type of feedback signals we expect to see when subjects accepted the offer made to them (obtained outcomes), but critically they make opposite predictions about signals when subjects rejected the offer (foregone outcomes). If subjects solve the task by encoding and maintaining action values, this should be reflected in outcome signals with the same sign in the case of both real and foregone outcomes. If, in contrast, subjects learn policies (correct actions) directly, the real and fictive learning signals should have opposite signs, because a good foregone outcome provides evidence against the current policy (Li and Daw, 2011).
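The opposite predictions of the two accounts can be written out as a small sketch. It is an illustration under simplifying assumptions, not the analysis model used here: outcomes are coded as +1 (win) and -1 (loss), the prior expectation is fixed at 0, and both function names are hypothetical.

```python
def action_value_signal(offer_outcome, expected=0.0):
    """Action-value account: the signal tracks how good the offer turned
    out to be, whether the outcome was obtained or foregone."""
    return offer_outcome - expected


def policy_update_signal(offer_outcome, accepted, expected=0.0):
    """Policy account: the signal tracks whether the chosen action was
    correct, so its sign flips when the offer was rejected."""
    delta = offer_outcome - expected
    return delta if accepted else -delta


# Enumerate the four trial types: accept/reject crossed with win/loss.
for accepted in (True, False):
    for outcome in (1.0, -1.0):
        choice = "accept" if accepted else "reject"
        result = "win" if outcome > 0 else "loss"
        print(
            f"{choice:6s} {result:4s} "
            f"value={action_value_signal(outcome):+.0f} "
            f"policy={policy_update_signal(outcome, accepted):+.0f}"
        )
```

On accept trials the two signals coincide. On reject trials they diverge: a foregone win yields a positive action-value signal but a negative policy signal, which is the contrast that distinguishes the two accounts.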
As reported previously (Li and Daw, 2011; for review, see Lohrenz et al., 2007), our data strongly suggest that activity in ventral striatum resembles a policy update signal rather than a value prediction signal. Interestingly, this may simply be a special case of a more general role of outcome signals in the ventral striatum in signaling the accuracy or correctness of a recent action. In a recent study where participants were asked to judge the time at which rewards were delivered, striatal activity reflected the accuracy of predictions about timing, and was clearly dissociated from activity in the midbrain, which showed a pattern of activity more consistent with a reward prediction error (Klein-Flügge et al., 2011). This leads us to hypothesize that, at least in certain contexts, the key role of ventral striatum in outcome processing is to signal the success of a particular action, and hence the desirability of repeating it in the future. This may explain why outcome signals in the ventral striatum are profoundly reduced when subjects have no ability to determine the outcome through their actions (Zink et al., 2004; Coricelli et al., 2005; Nicolle et al., 2011); similar results have also been reported in the dorsal striatum (Tricomi et al., 2004).
Our results should not be interpreted as indicating that outcome signals in the ventral striatum never reflect value updating (O'Doherty et al., 2004; Seymour et al., 2004; Hare et al., 2008; Rangel et al., 2008; Kim et al., 2009). Instead we take a more nuanced view that under certain circumstances subjects adopt a strategy based upon a direct updating of policies (Sutton and Barto, 1998; Li and Daw, 2011). Plausibly human subjects are able to employ different types of learning strategy depending on the precise nature of the task, and understanding when they do so relates to the broader issue of characterizing the strategies used in particular environments. When considering neuronal responses that positively covaried with outcomes indicating erroneous actions (accepted losses or foregone wins), we observed activity in networks of regions similar to those previously associated with error processing (Ridderinkhof et al., 2004; Klein et al., 2007). How this error signaling relates to the mirror image correctness signal we find in the ventral striatum is an interesting area for future studies to explore.
Combining the results of our two key analyses suggests that, rather than simply reflecting the input of simple cue-dependent reward prediction error-like signaling from the dopaminergic midbrain, activity in ventral striatum during instrumental learning is tuned to which features of the environment are relevant for action, and to the desirability or otherwise of repeating a particular action regardless of whether a subject received reward (Klein-Flügge et al., 2011). This represents a further step toward understanding the contribution the basal ganglia make to value-based decision making, and therefore human decision processes themselves.
Footnotes
This work was supported by Wellcome Trust Senior Investigator Award 098362/Z/12/Z to R.J.D. The Wellcome Trust Centre for Neuroimaging is supported by core funding from Wellcome Trust Grant 091593/Z/10/Z. We thank the Functional Imaging Laboratory radiographers for their time and patience.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Thomas H.B. FitzGerald, Wellcome Trust Centre for Neuroimaging, 12 Queen Square, London WC1N 3BG, UK. thomas.fitzgerald{at}ucl.ac.uk