Abstract
Modulations of the feedback-related negativity (FRN) event-related potential (ERP) have been suggested as a potential biomarker in psychopathology. A dominant theory about this signal contends that it reflects the operation of the neural system underlying reinforcement learning in humans. The theory suggests that this frontocentral negative deflection in the ERP 230–270 ms after the delivery of a probabilistic reward expresses a prediction error signal derived from midbrain dopaminergic projections to the anterior cingulate cortex. We tested this theory by investigating whether FRN will also be observed for an inherently aversive outcome: physical pain. In another session, the outcome was monetary reward instead of pain. As predicted, unexpected reward omissions (a negative reward prediction error) yielded a more negative deflection relative to unexpected reward delivery. Surprisingly, unexpected pain omission (a positive reward prediction error) also yielded a negative deflection relative to unexpected pain delivery. Our data challenge the theory by showing that the FRN expresses aversive prediction errors with the same sign as reward prediction errors. Both FRNs were spatiotemporally and functionally equivalent. We suggest that FRN expresses salience prediction errors rather than reward prediction errors.
Introduction
When a reward deviates from the one that was expected, dopaminergic neurons produce an error signal (prediction error) proportional to the magnitude of the deviation, eventually leading to learning or extinction (Schultz, 2007). Prediction error signals are expressed in the midbrain in which dopaminergic neurons increase their phasic firing rate for unexpected reward (a positive prediction error) and pause when reward is unexpectedly omitted (a negative prediction error). A dominant theory, reinforcement-learning error-related negativity (RL-ERN; Holroyd and Coles, 2002), contends that the event-related potential (ERP) feedback-related negativity (FRN), 230–270 ms after feedback is received (Nieuwenhuis et al., 2004), reflects the operation of the neural system underlying reinforcement learning in humans. RL-ERN claims that the FRN expresses a reward prediction error signal derived from dopaminergic projections to the anterior cingulate cortex (ACC), as confirmed recently by a review of source-localization studies (Walsh and Anderson, 2012). Specifically, decreases in phasic dopamine firing are thought to disinhibit while increases inhibit ACC neurons, resulting in a more negative or positive signal, respectively. FRN, the difference wave computed by subtracting positive from negative outcomes, is thought to guide optimal decision making on a trial-by-trial basis (Cohen and Ranganath, 2005) and changes with personality, age, and psychopathology (Cohen et al., 2011).
RL-ERN has typically been tested in situations in which reward attainment is feasible. An exception is an investigation of loss avoidance (Holroyd et al., 2004) which supported RL-ERN. We tested this theory by investigating whether FRN will be observed for an inherently aversive outcome: physical pain. In our experiment, participants experienced blocks in which an aversive outcome (pain) was delivered or omitted and separate blocks in which a rewarding outcome (money) was delivered or omitted. RL-ERN should predict that the usual FRN will be obtained in our aversion blocks, with unexpected pain omission coded similarly to unexpected reward delivery (positive prediction error) and unexpected pain delivery coded similarly to unexpected reward omission (negative prediction error).
At the time when RL-ERN was developed, neural recording data supported a primary role for phasic dopaminergic firing in coding prediction errors associated with motivational value, consistent with the computational reward prediction error signal (Mirenowicz and Schultz, 1996; Ungless, 2004). It is now recognized that this response of dopamine neurons is more diverse (Bromberg-Martin et al., 2010; Ungless et al., 2010). Phasic increase in firing in different populations of dopamine neurons may code unexpected reward, unexpected aversion (Joshua et al., 2008; Anstrom et al., 2009; Lammel et al., 2011), or both rewarding and aversive events (Matsumoto and Hikosaka, 2009; Lammel et al., 2011). The latter population is thought to code motivational salience (Bromberg-Martin et al., 2010). Given these recent findings, it is possible that FRN codes prediction errors associated with motivational salience rather than motivational value. A prominent hypothesis is that ERPs are typically more negative for losses than rewards because losses are more motivationally salient (Oliveira et al., 2007). If so, and given that rewarding and aversive outcomes were presented separately in our design, FRN should reverse in our aversion block, with unexpected pain delivery coded similarly to unexpected reward delivery, and both more positive than unexpected pain and reward omission. This is exactly the pattern we obtained, presenting an important challenge for RL-ERN.
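The contrast between the two hypotheses can be made concrete with a toy calculation. The sketch below is illustrative only: it assumes outcomes coded on a unit scale (reward = +1, pain = −1, omission = 0) and assumes that a delivered outcome has salience 1 and an omitted outcome salience 0, irrespective of valence. The function names are hypothetical and none of the numbers come from the data reported here.

```python
# Toy comparison of signed (value) and salience prediction errors.
# Assumption: unit-scale coding; these are not values from the study.

def value_prediction_error(outcome_type, delivered, p_delivery):
    """Signed prediction error: obtained value minus expected value."""
    unit_value = 1.0 if outcome_type == "reward" else -1.0  # pain is negative
    obtained = unit_value if delivered else 0.0
    expected = p_delivery * unit_value
    return obtained - expected


def salience_prediction_error(delivered, p_delivery):
    """Salience prediction error: obtained minus expected salience.

    Assumption: delivery has salience 1 and omission 0, for reward and
    pain alike.
    """
    obtained = 1.0 if delivered else 0.0
    return obtained - p_delivery


# Unexpected delivery follows a 0.25 cue; unexpected omission a 0.75 cue.
for outcome_type in ("reward", "pain"):
    for delivered, p in ((True, 0.25), (False, 0.75)):
        label = "delivery" if delivered else "omission"
        print(outcome_type, label,
              value_prediction_error(outcome_type, delivered, p),
              salience_prediction_error(delivered, p))
```

Under these assumptions the two errors agree in sign for reward but take opposite signs for pain, which is why the aversion blocks can discriminate between the two accounts.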
Materials and Methods
Participants
Twenty undergraduate students aged 18–35 (12 females) received course credits and won £10 in the reward condition of the study. Participants had no history of neurological, neuropsychological, psychiatric, or chronic pain conditions, were not taking centrally acting medications, and had normal or corrected-to-normal vision. The study was approved by the University of Manchester ethics committee.
Procedure
On arrival, a transcutaneous electrical nerve stimulator that delivered electrical pulses (2 ms in duration) was fitted on the participants' left index finger. Participants incrementally increased the pulse current level to establish subjective “low” (just painful) and “high” (but tolerable) pain intensity levels. It was explained to participants that “low” corresponded to a score of 4 and “high” to 7 on a 1- to 10-point scale, where 1 indicated non-painful and 10 intolerably painful stimulation.
The study used a 2 (type: reward/pain) × 2 (expectancy: unexpected/expected) × 2 (magnitude: high/low) × 2 (outcome: delivered/omitted) within-subjects design. Expectancy was manipulated by varying outcome probability, which was either 0.25 or 0.75, corresponding to unexpected and expected outcomes (Fig. 1). The task comprised four reward blocks and four aversion blocks, each with 120 trials; each cell in the design included 60 trials. Block order was counterbalanced between participants. Trial order was randomized for each participant.
Design and paradigm. The boldface rows in the table indicate which conditions are illustrated in the timeline diagrams.
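As a quick consistency check on the factorial design described above, the cells can be enumerated and the trial counts compared. This is a minimal sketch; the variable names are ours and only the factor levels and trial counts are taken from the text.

```python
from itertools import product

# Factor levels as described in the text.
FACTORS = {
    "type": ("reward", "pain"),
    "expectancy": ("unexpected", "expected"),
    "magnitude": ("high", "low"),
    "outcome": ("delivered", "omitted"),
}

# One tuple per cell of the 2 x 2 x 2 x 2 design.
cells = list(product(*FACTORS.values()))

N_BLOCKS = 4 + 4          # four reward blocks and four aversion blocks
TRIALS_PER_BLOCK = 120
TRIALS_PER_CELL = 60

total_trials = N_BLOCKS * TRIALS_PER_BLOCK
print(len(cells), total_trials, len(cells) * TRIALS_PER_CELL)
```

The 16 cells at 60 trials each account for all 960 trials across the eight blocks.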
Each trial began with a fixation cue (625 ± 125 ms) followed by a chance cue (1000 ms) that indicated the type, probability, and magnitude of the outcome that may follow (Fig. 1). The chance cue was a pie chart that signaled outcome magnitude by one of two colors (blue for high and green for low, counterbalanced across participants), outcome probability by the colored portion of the chart, and outcome type by a picture of a coin or a flash. In the reward condition, outcome magnitude was also signaled by the coin, a 1- or 50-pence piece. All chance cues had the same luminance. A truth cue (750 ms) was presented at the offset of the chance cue, indicating with 100% contingency whether the outcome was to be delivered (the entire chart was colored) or omitted (none of the chart colored). All truth cues had the same luminance. Outcomes were presented at the offset of the truth cue. Aversive outcomes were presented as an electric stimulation of the hand. Rewards were presented visually (an image of the relevant coin, 100 ms). Participants were informed that, at the end of each block, they would receive rewards in the form of tokens, which would be traded for money once all data were collected, but were not told what the exchange rate would be. For ethical reasons, all participants received the same amount of money (£10). To maintain concentration, participants were asked at the end of each block if outcome delivery corresponded to the probabilities signaled by the anticipation cue.
EEG recording
Continuous EEG was recorded from 32 Ag/AgCl scalp electrodes placed according to the 10–20 system (Synamps; Neuroscan) fitted on an elasticated cap (EASYCAP) and referenced to the vertex electrode Cz with FPz as the ground. Bandpass filters were set at 0.1–100 Hz, with a sampling rate of 500 Hz and gain of 500. Impedances were kept at 5 kΩ or less. The experiment was conducted in a quiet, dimmed room.
EEG data analyses
Preprocessing.
Data analysis focused on the signal associated with the truth cue. Data were analyzed using SPM8 (Litvak et al., 2011). The electrophysiological signals were re-referenced to the mean of all electrodes, downsampled to 125 Hz, and filtered with a fifth-order Butterworth filter between 0.5 and 20 Hz. ERPs were computed for the epoch 100 ms before truth cue onset until 500 ms later. Artifact rejection proceeded in three steps. First, trials with flat segments were rejected, as were trials in which signal in posterior electrodes exceeded a lenient threshold of 110 μV. Second, eyeblink confounds were corrected by a signal space projection method (Nolte and Hämäläinen, 2001) implemented in the MEEGTools toolbox distributed with SPM8. Artifact subspace was defined per session by principal component analysis of session-averaged data epoched around eyeblinks, using one principal component. Third, trials in which signal in any of the electrodes exceeded 80 μV were rejected. On average, 7% of trials were removed across participants and conditions, and electrodes were rejected when the proportion of rejected trials exceeded 15%. Single-trial ERPs were averaged separately for each of the 16 conditions using the “robust averaging” method in SPM8 (Litvak et al., 2010). This method considers the distribution of values over trials for each channel and time point, and the outliers are down-weighted when computing the average. This makes it possible to neutralize artifacts restricted to narrow time ranges without rejecting whole trials. Moreover, a clean average can be computed even when no single trial is clean, provided that the artifacts do not consistently overlap and corrupt only different parts of trials. Averages were then filtered between 0.1 and 20 Hz to remove high frequencies introduced by the robust averaging method.
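The principle behind robust averaging can be illustrated with a short sketch. It is not the SPM8 implementation: the weighting function (Tukey bisquare), the tuning constant, and the iteration count are assumptions chosen for illustration, and the function names are hypothetical. The sketch treats one channel at a time and down-weights outlying trials separately at each time point.

```python
from statistics import median


def robust_mean(values, tuning=4.685, n_iter=20):
    """Iteratively reweighted mean with Tukey bisquare weights.

    Assumption: bisquare weights stand in for whatever weighting the
    toolbox actually uses; `tuning` and `n_iter` are illustrative.
    """
    estimate = median(values)
    for _ in range(n_iter):
        residuals = [v - estimate for v in values]
        # Median absolute deviation, scaled to match a Gaussian SD.
        scale = 1.4826 * median(abs(r) for r in residuals)
        if scale == 0:
            break  # at least half of the values equal the estimate
        weights = []
        for r in residuals:
            u = r / (tuning * scale)
            # Values far from the estimate receive zero weight.
            weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate


def robust_average(trials):
    """Average trials (lists of samples) separately at each time point."""
    return [robust_mean(list(samples)) for samples in zip(*trials)]


# Hypothetical single-channel data: four trials, two time points.
# The last trial carries an artifact at the second time point only.
trials = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 50.0]]
average = robust_average(trials)
print(average)
```

In this example the artifact affects only one sample of one trial, so the remaining sample of that trial still contributes to the average, whereas threshold-based rejection would discard the whole trial.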
For each subject and condition, a three-dimensional image (channel space × time) was created by projecting, for each sample point, the electrode locations onto a plane following a linear interpolation to a 64 × 64 pixel grid (pixel size, 2.13 × 2.69 mm). The resulting images were smoothed using a Gaussian kernel with a full-width at half-maximum of 8 mm/ms.
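For readers more used to specifying Gaussian kernels by their standard deviation, the conversion from full-width at half-maximum is shown below. This is a generic identity rather than anything specific to the software used here; the helper name is ours.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full-width at half-maximum to a standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


# Same units as the FWHM: mm in the spatial dimensions, ms in time.
sigma = fwhm_to_sigma(8.0)

# Sample spacing after downsampling to 125 Hz.
ms_per_sample = 1000.0 / 125.0

print(round(sigma, 3), ms_per_sample)
```

Note that, at 125 Hz, one sample spans 8 ms, so the temporal width of the kernel is equal to a single sample interval.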
Statistical analyses.
These images were masked temporally between 100 and 350 ms after truth cue onset and entered into two three-way ANOVA models with the factors expectancy, magnitude, and outcome. Because our manipulations of reward and aversion were so different from each other (see Discussion), we used separate models for the reward and the aversion conditions and never compared them statistically.
The FRN is defined as the difference between delivered and omitted outcomes; here, outcome delivery and omission were operationalized as the onset of the corresponding truth cue. Therefore, we first examined which regions expressed the main effect of outcome. For this purpose, we used a t test, with a statistical threshold of familywise error (FWE) <0.05 and a cluster size threshold exceeding an extent of 100 mm/ms. The FRN is also known to be sensitive to outcome expectancy, so that in our experiment the difference between the delivered and omitted outcomes should be greater for outcomes that are expected 25% of the time relative to those expected 75% of the time. Thus, the FRN can be further defined as signal that is sensitive not only to the main effect of outcome but also to the interaction between outcome and expectancy. To extract such signal, we used a t test to examine the interaction contrast but searched only within the limited set of spatiotemporal regions that expressed the orthogonal main effect of outcome (FWE <0.05). We did not want to impose the FWE criterion twice, to avoid false negatives. To ensure that the results reported for the interaction contrast were robust, we conducted a Monte Carlo simulation (Song et al., 2011) that showed that a cluster extent of 100 mm/ms, in combination with an uncorrected p < 0.001, corresponds to FWE < 0.001. The Monte Carlo simulation formalizes the theory that 100 contiguous voxels are more likely to reflect true activation than one significant voxel.
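The two contrasts described above can be written as weight vectors over the four expectancy × outcome cells. The sketch below collapses over magnitude for brevity, uses hypothetical cell means, and adopts an omitted-minus-delivered sign convention; none of this is taken from the actual SPM design matrix.

```python
# Cell order: (expectancy, outcome). Magnitude is collapsed for brevity.
CELLS = [
    ("unexpected", "delivered"),
    ("unexpected", "omitted"),
    ("expected", "delivered"),
    ("expected", "omitted"),
]

# Main effect of outcome: omitted minus delivered, summed over expectancy.
MAIN_OUTCOME = [-1, 1, -1, 1]

# Interaction: the omitted-minus-delivered difference for unexpected
# outcomes minus the same difference for expected outcomes.
INTERACTION = [-1, 1, 1, -1]


def apply_contrast(weights, cell_means):
    """Weighted sum of cell means under a contrast vector."""
    return sum(w * m for w, m in zip(weights, cell_means))


# The two contrasts are orthogonal: their dot product is zero.
dot = sum(a * b for a, b in zip(MAIN_OUTCOME, INTERACTION))

# Hypothetical cell means (arbitrary units), in the order of CELLS.
cell_means = [3.0, -1.0, 1.5, 0.5]
main_effect = apply_contrast(MAIN_OUTCOME, cell_means)
interaction = apply_contrast(INTERACTION, cell_means)
print(dot, main_effect, interaction)
```

Because the contrasts are orthogonal, restricting the interaction test to regions selected by the main effect does not bias the interaction estimate toward significance.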
Results
The main effect of outcome yielded a large frontocentral cluster in both reward (x = −13, y = 10; peak FWE <0.001; extent, 2559 pixels) and pain (x = −9, y = −3; peak FWE <0.001; extent, 10,003 pixels) conditions. The reward cluster extended 205–250 ms after the truth cue, and the pain cluster extended 194–289 ms after the truth cue. The topographies that corresponded to the main effect of outcome (omitted outcomes > delivered outcomes) in the reward and the aversion conditions resembled each other in exhibiting marked central–frontal negativity (Fig. 2A,B). To determine the peaks of the corresponding waveforms, we extracted time courses from cluster maxima and computed the FRN by subtracting all outcomes with positive value (reward delivery, aversion omission) from all outcomes with negative value (reward omission, aversion delivery; Fig. 2C). We observed an FRN in the reward condition, as hypothesized and previously demonstrated, but a positivity in the aversion condition. The morphologies of the FRN were similar in both conditions. The reward FRN peaked 236 ms after the truth cue. The pain “FRN” positivity peaked 246 ms after the truth cue.
The main effect of outcome showing the reward FRN and its reversal in the aversion condition. The topographies correspond to the main effect of outcome (omitted outcomes > delivered outcomes) 240 ms after the truth cue in the reward (A) and the aversion (B) conditions. Time courses (C) were extracted from the cluster maxima for this contrast in SPM (224 ms in the reward condition, 232 ms in the aversion condition). Difference waves, time locked to the truth cue, were computed following the FRN literature by subtracting outcomes with positive value (reward delivery, aversion omission) from outcomes with negative value (reward omission, aversion delivery). An FRN was obtained in the reward condition (black) and a positivity in the aversion condition (dashed). The morphologies of the difference waves that correspond to the main effect of outcome in the reward (black; same as the FRN computation) and aversion (gray) conditions bear striking resemblance to each other.
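The difference-wave convention described above explains why a similar underlying waveform yields an FRN in one condition and a positivity in the other. The sketch below uses hypothetical amplitudes, not the recorded data, and the function name is ours.

```python
def frn_difference_wave(delivered, omitted, outcome_type):
    """Negative-value outcomes minus positive-value outcomes, per sample."""
    if outcome_type == "reward":
        # Reward omission is the negative-value outcome.
        negative, positive = omitted, delivered
    elif outcome_type == "pain":
        # Pain delivery is the negative-value outcome.
        negative, positive = delivered, omitted
    else:
        raise ValueError(f"unknown outcome type: {outcome_type!r}")
    return [n - p for n, p in zip(negative, positive)]


# Hypothetical amplitudes (microvolts) at three samples; the same
# delivered and omitted waveforms are assumed in both conditions.
delivered = [1.0, 4.0, 2.0]
omitted = [1.0, 1.5, 1.5]

reward_frn = frn_difference_wave(delivered, omitted, "reward")
pain_frn = frn_difference_wave(delivered, omitted, "pain")
print(reward_frn, pain_frn)
```

When delivered outcomes are more positive than omitted outcomes in both conditions, the value-based subtraction necessarily produces difference waves of opposite sign.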
The interaction between outcome and expectancy, masked inclusively by the main effect of outcome, yielded a single cluster at a frontocentral scalp region in the reward condition that extended 205–225 ms after the truth cue (x = −10, y = 18; peak FWE <0.06; extent, 318 pixels) and a single cluster in a similar location in the aversion condition that extended 260–290 ms after the truth cue (x = 0, y = 13; peak FWE <0.05; extent, 695 pixels). Parameter estimate plots extracted from the maxima of each cluster show that both clusters expressed the difference between delivered and omitted outcomes and that this difference was greater for unexpected outcomes (Fig. 3). A similar pattern was obtained when data were extracted from electrode Fz (see Notes), which was used to measure the FRN in previous studies (Cohen et al., 2007).
The interaction of outcome and expectancy in the reward and aversion conditions. Parameter estimates were extracted from cluster maxima (208 ms in the reward condition, 288 ms in the aversion condition) for all eight conditions in the design: expectancy (unexpected/expected) × magnitude (high/low) × outcome (delivered/omitted), separately for the reward (left) and the aversion (right) conditions. Insets, Statistical parametric maps overlaid over the glass brain showing where the interaction of outcome and expectancy (masked by the main effect of outcome) was maximally significant. Error bars represent SEs.
Figure 3 suggested that the interaction was larger for high-magnitude outcomes regardless of motivational value. For both reward and aversion, we analyzed the outcome × expectancy interaction in two separate models, when the outcome magnitude was high or low. Only the high-magnitude data yielded significant effects: central–frontal clusters with maxima that resembled the clusters reported above (reward, 208 ms; aversion, 288 ms). We compared the interaction regressors by extracting the value of the interaction contrast from the same spatiotemporal locations using a mask from the significant clusters in high-magnitude outcomes. There was no significant difference between high and low magnitude (reward, t(38) = 1.03, p > 0.3; aversion, t(38) = 1.05, p > 0.29).
For completeness, Figure 4 shows that the waveforms for delivered and omitted outcomes, as well as for the difference between them, had a similar morphology in the reward and aversion conditions during the FRN time window. A similar pattern was obtained when the reward and aversion analyses were both restricted to data from electrode Fz (see Notes), which was used to measure the FRN in previous studies (Cohen et al., 2007).
Time courses in the reward (top) and aversion (bottom) conditions, time locked to the truth cue, separately for high- and low-magnitude outcomes that were expected (75% likely) or unexpected (25% likely). Left, Delivered outcomes; middle, omitted outcomes; right, the difference wave (omitted − delivered outcomes). The waveform morphologies in the reward and aversion condition bear striking resemblance to each other.
Finally, we analyzed the signal that was time locked to the chance cues using the same methodology. Although it would be somewhat challenging to compute the prediction error at that time point, we argue that the prediction error would be larger when the cue signals high- than low-magnitude outcomes. In line with the hypothesis that the FRN expresses salience, rather than reward, prediction errors, the signal we observed within the time window of the FRN in electrode Fz was more negative for high- than low-magnitude outcomes regardless of their valence (see Notes).
Discussion
Both reward and aversion conditions yielded frontal–central clusters that expressed both the main effect of outcome—the difference between delivered and omitted outcomes—and the interaction of outcome and expectancy. The main effect of outcome yielded clusters with similar location and maxima in the reward and the aversion condition. The interaction also yielded clusters with similar location and parameter estimates in the reward and aversion condition, but it was expressed earlier in the reward than in the aversion condition. In both reward and aversion conditions, the interaction stemmed from a smaller difference between delivered and omitted outcomes when outcomes were expected, relative to when they were unexpected. Parameter estimates were larger for unexpected delivered outcomes relative to unexpected omitted outcomes, whereas expected outcomes—either delivered or omitted—were expressed in middling values.
Contrary to the findings of Holroyd et al. (2004), the time courses show that the ERPs we obtained were more positive for delivered outcomes than omitted outcomes in both reward and aversion conditions. The waveform morphologies were strikingly similar in the reward and aversion conditions. Therefore, our data agree with the prediction of RL-ERN that FRN will be obtained in the reward condition but refute its prediction that FRN will be obtained in the aversion condition. Instead, when we computed the FRN according to the literature—by subtracting outcomes with positive value (reward delivery, aversion omission) from outcomes with negative value (reward omission, aversion delivery)—we obtained a “reverse” FRN (a positivity) in the aversion condition.
A computational model of salience suggests that a signal for certain reward is more salient than a signal for certain nonreward (Esber and Haselgrove, 2011). This implies that the truth cue that signaled outcome delivery was more salient than the one that signaled outcome omission. A parsimonious account of our results is that the ERPs we observed express salience prediction errors as a positivity. This claim can be validated only if it can be demonstrated that the signals share the same generator. This would extend RL-ERN, reflecting an updating of the theory given recent evidence for a diversity in what dopaminergic neurons code for (Bromberg-Martin et al., 2010). Of note, our data suggest that salience prediction errors occur even in the absence of explicit opportunity for action.
The present data extend our previous results (Talmi et al., 2012), in which the separate examination of signal associated with wins and losses challenged the interpretation of the FRN as a neural marker of reward prediction errors. Our data agree with previous fMRI data (Metereau and Dreher, 2013) that showed that both the ACC and the striatum (both targets of dopaminergic projections, which according to RL-ERN cause the FRN) code salience rather than reward prediction errors, and they support the suggestion that the medial prefrontal cortex codes discrepancies between expected and unexpected outcomes regardless of their valence (Alexander and Brown, 2011). However, because we did not analyze the generators, we cannot ascertain that the scalp signals we report originate from the same neural source.
The hypothesis that the FRN codes salience, rather than reward, prediction errors is not new. For example, Oliveira et al. (2007) asked participants to evaluate their own response as either correct or incorrect and then gave them real feedback on their performance. The FRN was obtained whenever the feedback was surprising, regardless of its motivational value, namely for both unexpected “correct” and unexpected “incorrect” feedback. Unfortunately, the absence of neutral outcomes makes it difficult to ascertain that participants in that study perceived their errors as more salient than their correct evaluations. Therefore, it was possible that the signal Oliveira et al. (2007) observed merely reflected the deviation of the outcomes from expectation rather than a salience prediction error, defined as a response to unexpected, motivationally salient events. Our results extend those of Oliveira et al. (2007) and support their conclusions.
A recent review supported RL-ERN and rejected the hypothesis that the FRN expresses motivational salience (Walsh and Anderson, 2012). In response, we note that they appear to confuse the salience of an outcome with its likelihood. For example, they discuss the observation of Holroyd and Krigolson (2007) that ERPs for errors were more negative than ERPs for correct responses even in “hard” blocks, when errors were more frequent. In our opinion, this does not contradict the salience hypothesis of the FRN if we assume that errors are more salient than correct responses even when the former are more frequent. Future research should seek to corroborate independently which outcomes are more salient.
In our experiment, participants were exposed to blocks in which an aversive outcome was delivered or omitted and to separate blocks in which a rewarding outcome was delivered or omitted. This is different from typical paradigms in studies of RL-ERN, in which positive outcomes are always a possibility. For example, participants may be asked to choose between stimuli that are associated with either rewards or punishments (Cohen et al., 2007), presented with a gamble in which they may win or lose (Talmi et al., 2012), or given positive or negative feedback on task performance (Holroyd and Krigolson, 2007; Oliveira et al., 2007). Separating the reward and aversion blocks in the present study renders the omission of pain in the aversion blocks the best possible outcome, therefore maximizing our chance to observe FRN for purely aversive outcomes. In line with Konorski's opponent model (Konorski, 1987), in the aversion blocks, participants should perceive pain omission as a positive outcome and its delivery as a negative outcome. Had pain omission trials been mixed with reward delivery trials, both pain delivery and omission may have been coded as negative outcomes.
A weakness of our design is that the reward and aversion conditions differed in two important ways in addition to their motivational value: (1) the nature of the outcomes themselves, a primary reinforcer in the aversion condition and a secondary reinforcer in the reward condition; and (2) the timing of their delivery, immediately after the truth cue (aversion) or after a prolonged delay (reward). For this reason, we never compare the reward and aversion condition statistically or make claims related to their differential strengths or latency, although we speculate that these differences account for the more robust signal in the aversion condition. Our main finding, that the FRN “reverses” when the outcome is aversive, stands even without such a comparison.
Modulation of the FRN has been suggested as a potential biomarker in psychopathology (Olvet and Hajcak, 2008) and individual differences (Sosic-Vasic et al., 2012). For example, although feedback processing is relatively intact in schizophrenia, the representation of value is impaired (Morris et al., 2011). Further, Mason et al. (2012) have reported reduced FRN in a hypomania-prone group, suggesting positive evaluation bias and impaired reinforcement learning. Reaching a clearer understanding of the functional significance of reinforcement learning error-related signals is crucial if their modulation is to be interpreted as a biomarker or risk indicator. Our results suggest that FRN may serve to optimize not only decisions about reward but also decisions about punishment, which may help explain why it is altered in psychopathology.
Notes
Supplemental material for this article is available at www.psych-sci.manchester.ac.uk/staff/talmi/. This material has not been peer reviewed.
Footnotes
The study was partially funded by an Economic and Social Research Council First Grant (D.T.). We thank K. Birchall and H. Balmforth for help with data collection and R. Mars and L. Fuentemilla for their helpful comments.
Correspondence should be addressed to Deborah Talmi, School of Psychological Sciences, University of Manchester, Oxford Road, Manchester, M13 9PL, UK. Deborah.Talmi@manchester.ac.uk