Abstract
The human mediofrontal cortex, especially the anterior cingulate cortex, is commonly assumed to contribute to higher cognitive functions like performance monitoring. How exactly this is achieved is currently the subject of lively debate but there is evidence that an event's valence and its expectancy play important roles. One prominent theory, the reinforcement learning theory by Holroyd and colleagues (2002, 2008), assigns a special role to feedback valence, while the prediction of response–outcome (PRO) model by Alexander and Brown (2010, 2011) claims that the mediofrontal cortex is sensitive to unexpected events regardless of their valence. However, paradigms examining this issue have confounded valence and expectancy, making it impossible to separate their contributions.
In the present study, we tested the two competing theories of performance monitoring by using an experimental task that separates valence and unexpectedness of performance feedback. The feedback-related negativity of the event-related potential, which is commonly assumed to be a reflection of mediofrontal cortex activity, was elicited not only by unexpected negative feedback, but also by unexpected positive feedback. This implies that the mediofrontal cortex is sensitive to the unexpectedness of events in general rather than their valence and thereby supports the PRO model.
Introduction
How do we learn from past events? Although there is general agreement that monitoring and evaluating the consequences of our behavior is important for action selection and learning, there is disagreement about how exactly this is achieved in the human brain and whether learning is primarily driven by an event's valence or unexpectedness.
A number of studies have demonstrated that the mediofrontal cortex, especially the anterior cingulate cortex (ACC), plays a crucial role in performance monitoring and behavioral adjustment. One prominent theory, the reinforcement learning (RL) theory (Holroyd and Coles, 2002), postulates that when events are “worse than expected” decreases in dopamine activity train the ACC to adjust control of the motor system. Supporting evidence comes from studies showing that unexpected negative events elicit negativities in the event-related potential (ERP) [e.g., the feedback-related negativity (FRN)], which are related to learning processes and generated in ACC (Miltner et al., 1997; Gehring and Willoughby, 2002; Holroyd and Coles, 2002; Luu et al., 2003; Gehring et al., 2012). A growing body of evidence now suggests a specific role of positive feedback for behavioral adjustments (Potts et al., 2006; Eppinger et al., 2008; Opitz et al., 2011). Also, Holroyd and colleagues (2008) have argued that unexpected positive feedback elicits electrophysiological activity distinct from the FRN. They assume this feedback correct-related positivity (fCRP) is produced by dopaminergic activity when events are “better than expected.”
An alternative view, the prediction of response-outcome (PRO) theory by Alexander and Brown (2010, 2011), suggests that the key function of the ACC is predicting the likely outcomes of actions and signaling unexpected nonoccurrences of those events. This includes detecting not only unexpected undesirable outcomes, such as negative feedback, but also unexpected desirable or rewarding outcomes. Consistent with this view, researchers found larger ACC activation in imaging studies after unexpected events, signaling the need for increased control (Braver et al., 2001; Aarts et al., 2008). One study even found increased ACC activation when these events were rewards (Jessup et al., 2010).
The RL theory thus predicts a difference in ERPs to unexpected positive and negative feedback. Unexpected negative feedback should increase FRN amplitude while unexpected positive feedback should elicit a fCRP. In contrast, the PRO model predicts an FRN after unexpected outcomes regardless of valence. The primary aim of this study was to test these different claims. Note that our approach is agnostic as to whether the activity we measure originates solely from dopaminergic influences on the ACC [as Holroyd et al. (2008) presume] or extends beyond the ACC to include activity from other regions of the mediofrontal cortex and possibly other neurotransmitters (Alexander and Brown, 2011; Godlove et al., 2011; Gehring et al., 2012; Reinhart et al., 2012). Rather, our concern is with the differing predictions the theories make about the functional properties of the FRN response. We additionally examined another ERP correlate of performance monitoring, the P300, which has been shown to vary with the frequency and valence of feedback stimuli (Bellebaum and Daum, 2008). For this purpose, the electroencephalogram was recorded during a time-estimation task with positive, negative, and intermediate feedback. To disentangle feedback valence and expectancy, feedback was given via an adaptive mechanism tied to each participant's performance. This mechanism ensured that intermediate feedback occurred frequently and therefore was expected, whereas positive and negative feedback occurred only rarely and thus were rather unexpected. An important innovation of our approach is that positive feedback was truly positive performance feedback: it occurred after exceptionally good performance and was not elicited by false-positive feedback after erroneous behavior as was the case in earlier investigations (cf. Oliveira et al., 2007).
Materials and Methods
Participants.
Twenty-four volunteers participated in the experiment, which was conducted in accordance with the ethical guidelines of the Declaration of Helsinki. All subjects were right-handed and had normal or corrected-to-normal vision. All signed informed consent before the experiment and were paid €8 per hour.
One participant had to be excluded because he confused the meaning of the feedback colors. Three more were excluded because the adaptive mechanism did not succeed in adjusting the time windows fast enough and, as a result, the frequency distributions of positive and negative feedback differed reliably from the intended distribution (for a detailed description, see Data Analyses). Consequently, all analyses were based on the data of 20 subjects (11 female and 9 male; ages, 20–27 years; mean age, 22.4 years).
Task, stimuli, and procedure.
After subjects filled out a short demographic questionnaire, they performed the time estimation task (Fig. 1a). The task started with a fixation cross and participants were instructed to press a response button 2.5 s after the cross vanished from the screen. They received positive, negative, or intermediate feedback about their estimation accuracy in the form of a yellow, purple, or blue rectangle 5 s after the fixation cross disappeared. We used simple colored rectangles as feedback stimuli to avoid differences in P2 amplitudes due to perceptual processing between feedback conditions (Liu and Gehring, 2009). The assignment of colors to the type of feedback was counterbalanced across subjects. Variable presentation times were used for fixation cross (250, 500, or 750 ms) and intertrial interval (750, 1000, or 1250 ms) to prevent mere rhythmic responses. Overall, subjects completed 300 trials and had a short break every 60 trials.
Figure 1. a, Trial procedure for the time-estimation task. Participants were instructed to press a response button 2.5 s after a fixation cross vanished from the screen. Positive, negative, or intermediate feedback about the accuracy of the time estimation was given in the form of colored rectangles 5 s after the fixation cross had vanished. Variable presentation times were used for the fixation cross (250, 500, or 750 ms) and the intertrial interval (750, 1000, or 1250 ms) to prevent mere rhythmic responses. b, To keep the rate of positive and negative feedback low and thus unexpected, and the rate of intermediate feedback high and expected, an adaptive procedure was used to adjust the feedback to participants' performance. During the first 20 trials, negative feedback was given if the participant's response (blue line) was faster than 2000 ms or slower than 3000 ms. Intermediate feedback was given if the response occurred between 2000 and 2400 ms or between 2600 and 3000 ms. Positive feedback was presented if the response occurred between 2400 and 2600 ms. In the following trials, the inner (green lines) and outer (red lines) time windows were adjusted independently of each other whenever negative or positive feedback occurred in <20% or >20% of cases.
To keep the rate of intermediate feedback high (at ∼60%) and therefore highly expected, and the rate of positive and negative feedback low (at ∼20% each) and thus rather unexpected, we applied an adaptive procedure to adjust the feedback to the participants' performance (for practical reasons, we refer to these conditions as “expected” and “unexpected” feedback in the following although referring to differences on a continuum of expectancy). In the first 20 trials, negative feedback was given if the participant's response was faster than 2000 ms or slower than 3000 ms, intermediate feedback was given if the response occurred between 2000 and 2400 ms or between 2600 and 3000 ms, and positive feedback was presented if the response occurred between 2400 and 2600 ms. In the following trials, outer time windows (2000 and 3000 ms) were adjusted every 20 trials by adding (to the lower time limit) and subtracting (from the upper time limit) 75 ms whenever negative feedback occurred in <20% of the last 20 trials and overall or by adding (to the upper time limit) and subtracting (from the lower time limit) 75 ms whenever negative feedback occurred in >20% during the last 20 trials and overall. Inner time windows (2400 and 2600 ms) were adjusted by adding or subtracting 15 ms whenever positive feedback occurred in <20% or >20% of the last 20 trials and overall (Fig. 1b).
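The adaptive procedure described above can be illustrated with a minimal Python sketch. It is not the software used in the experiment. The names `Windows`, `classify`, and `adjust` are hypothetical, the treatment of responses that fall exactly on a window boundary is an assumption, and so is the direction of the inner-window adjustment (widening the window when positive feedback is too rare), which the text does not spell out.

```python
from dataclasses import dataclass

TARGET_RATE = 0.20    # intended share of positive and of negative feedback
OUTER_STEP_MS = 75    # step for the outer limits (from the text)
INNER_STEP_MS = 15    # step for the inner limits (from the text)


@dataclass
class Windows:
    outer_low: float = 2000.0
    inner_low: float = 2400.0
    inner_high: float = 2600.0
    outer_high: float = 3000.0


def classify(rt_ms, w):
    """Map a response time to a feedback type (boundary handling is assumed)."""
    if w.inner_low <= rt_ms <= w.inner_high:
        return "positive"
    if rt_ms < w.outer_low or rt_ms > w.outer_high:
        return "negative"
    return "intermediate"


def rate(history, label):
    """Proportion of trials in `history` that received feedback `label`."""
    return sum(1 for h in history if h == label) / len(history)


def adjust(w, history, block=20):
    """Return updated windows after a completed block of trials.

    A limit moves only if the rate deviates from the target in the same
    direction both in the last block and over all trials so far.
    This sketch does not guard against outer and inner limits crossing.
    """
    last = history[-block:]
    new = Windows(w.outer_low, w.inner_low, w.inner_high, w.outer_high)

    neg_last, neg_all = rate(last, "negative"), rate(history, "negative")
    if neg_last < TARGET_RATE and neg_all < TARGET_RATE:
        # too little negative feedback: narrow the outer window
        new.outer_low += OUTER_STEP_MS
        new.outer_high -= OUTER_STEP_MS
    elif neg_last > TARGET_RATE and neg_all > TARGET_RATE:
        # too much negative feedback: widen the outer window
        new.outer_low -= OUTER_STEP_MS
        new.outer_high += OUTER_STEP_MS

    pos_last, pos_all = rate(last, "positive"), rate(history, "positive")
    if pos_last < TARGET_RATE and pos_all < TARGET_RATE:
        # too little positive feedback: widen the inner window (assumed direction)
        new.inner_low -= INNER_STEP_MS
        new.inner_high += INNER_STEP_MS
    elif pos_last > TARGET_RATE and pos_all > TARGET_RATE:
        # too much positive feedback: narrow the inner window (assumed direction)
        new.inner_low += INNER_STEP_MS
        new.inner_high -= INNER_STEP_MS
    return new
```

In this sketch, a block of 20 trials containing only intermediate feedback would narrow the outer window to 2075–2925 ms and widen the inner window to 2385–2615 ms, whereas a rate of exactly 20% leaves the corresponding limits unchanged.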
Subjects were instructed by using a colored time line similar to the one shown in Figure 1b. They were told that they would receive “excellent” feedback when their time estimation had been very good and very close to 2.5 s and “bad” feedback when their button press was very late or very early and thus far away from 2.5 s. It was also explained to them that they would probably receive intermediate “ok” feedback most of the time because this type of feedback is easiest to get. Participants were given nine practice trials to familiarize themselves with the task. Afterward, they were told they should try to get positive feedback as often as possible and avoid negative feedback, but that they would play against the computer, which would try to make the task more difficult for them when they succeeded too often. This was included in the instruction to prevent participants from getting the impression that the feedback was not valid because the adaptive mechanism could change the time windows according to participants' performance over the course of the experiment.
Electroencephalogram recording.
Subjects were seated in a dimly lit, electrically shielded, and sound-attenuated chamber. While performing the experiment, the electroencephalogram (EEG) was recorded from 58 Ag/AgCl electrodes embedded in an elastic cap and amplified from DC to 100 Hz at a sampling rate of 500 Hz. The left mastoid served as reference. To control for vertical and horizontal eye movements, the electrooculogram (EOG) was recorded from the outer ocular canthi and the right suborbital and supraorbital ridges. Impedances for all electrodes were kept <15 kΩ. Further off-line data processing included a digital bandpass filter from 0.5 to 30 Hz in case of low-frequency signal drifts or high-frequency noise in the EEG channels. Recording epochs, including eye movements, were corrected by using a linear regression approach (Gratton et al., 1983), and epochs with other recording artifacts were rejected before averaging whenever the SD in a 200 ms time interval exceeded 30 μV in any EOG channel.
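The rejection criterion stated above (SD within a 200 ms interval exceeding 30 μV in any EOG channel) can be sketched as follows. This is an illustration only, not the processing pipeline that was used. It assumes a window that slides in steps of one sample and the population SD, neither of which is specified in the text, and the function names are hypothetical.

```python
from statistics import pstdev

SAMPLING_RATE_HZ = 500   # from the recording description
WINDOW_MS = 200          # interval length (from the text)
THRESHOLD_UV = 30.0      # SD threshold in microvolts (from the text)


def window_samples(rate_hz=SAMPLING_RATE_HZ, window_ms=WINDOW_MS):
    """Number of samples covered by one window."""
    return int(round(rate_hz * window_ms / 1000))


def exceeds_sd(channel, n, threshold):
    """True if any n-sample window of `channel` has an SD above threshold.

    Assumption: the window slides one sample at a time and the
    population SD is used.
    """
    return any(
        pstdev(channel[i:i + n]) > threshold
        for i in range(len(channel) - n + 1)
    )


def reject_epoch(eog_channels, threshold=THRESHOLD_UV):
    """True if the epoch should be rejected based on any EOG channel."""
    n = window_samples()
    return any(exceeds_sd(ch, n, threshold) for ch in eog_channels)
```

At a 500 Hz sampling rate, each 200 ms window spans 100 samples, so a step-like deflection of 100 μV inside a window would yield an SD of up to 50 μV and the epoch would be rejected.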
Data analyses.
Statistical analyses of behavioral data were based on mean feedback frequencies. Analyses of EEG data were based on ERPs time-locked to feedback presentation. For behavioral and EEG data, we excluded timeout trials from further analyses. Time windows for ERP analyses were selected according to previous studies and on the basis of visual inspection of the waveforms. To separate the FRN from other ERP activity in the same time range, the FRN was analyzed by means of the peak-to-peak difference between the positivity in a time window from 200 to 260 ms and the following negativity in a time window from 250 to 310 ms after feedback presentation (Holroyd et al., 2006). ERPs in the time range from 230 to 330 ms were also analyzed using mean amplitudes. Because the FRN is usually most pronounced at frontocentral sites (Miltner et al., 1997; Gehring and Willoughby, 2002; Holroyd and Coles, 2002), these analyses were performed at electrode FCz. P300 was examined by means of mean amplitudes in a time window between 340 and 440 ms at the recording site where it is largest (Pz). A 100 ms prestimulus baseline was used for all ERP averages. For topographical analyses, mean amplitude data were normalized using the vector scaling procedure as described by McCarthy and Wood (1985). Electrodes F3, Fz, F4, C3, Cz, C4, P3, Pz, and P4 were used, and, in addition to the experimental factors of interest, the factors laterality (left, middle, and right electrodes) and anterior–posterior (frontal, central, and parietal electrodes) were formed.
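The peak-to-peak and mean-amplitude measures described above can be sketched in Python as follows. The sketch assumes that an averaged ERP at one electrode is a list of voltages sampled at 500 Hz and that the epoch begins with the 100 ms prestimulus baseline. All function names are hypothetical. As a simplification, the negativity is taken as the minimum in its window without checking that it follows the positive peak in time.

```python
SAMPLING_RATE_HZ = 500   # from the recording description
EPOCH_START_MS = -100    # assumed: epoch starts with the 100 ms baseline


def ms_to_index(t_ms):
    """Convert a latency relative to feedback onset into a sample index."""
    return int(round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000))


def baseline_correct(erp):
    """Subtract the mean of the prestimulus samples from every sample."""
    n = ms_to_index(0)
    baseline = sum(erp[:n]) / n
    return [v - baseline for v in erp]


def segment(erp, start_ms, end_ms):
    """Samples from start_ms to end_ms, both inclusive."""
    return erp[ms_to_index(start_ms):ms_to_index(end_ms) + 1]


def peak_to_peak_frn(erp):
    """Most positive value in 200-260 ms minus most negative in 250-310 ms.

    Larger values indicate a larger FRN. Simplification: no check that
    the negative peak occurs after the positive peak.
    """
    positivity = max(segment(erp, 200, 260))
    negativity = min(segment(erp, 250, 310))
    return positivity - negativity


def mean_amplitude(erp, start_ms, end_ms):
    """Mean voltage in the given window, e.g. 230-330 ms or 340-440 ms."""
    seg = segment(erp, start_ms, end_ms)
    return sum(seg) / len(seg)
```

A typical call order under these assumptions would be `baseline_correct` on the averaged waveform, followed by `peak_to_peak_frn` at FCz and `mean_amplitude(erp, 340, 440)` at Pz.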
Behavioral and ERP data were analyzed using repeated measures ANOVAs with an α level of 0.05. The Greenhouse–Geisser correction for nonsphericity was applied whenever appropriate and ε-corrected p values are reported together with uncorrected degrees of freedom and Greenhouse–Geisser ε values. Contrast analyses were conducted to test our specific predictions [i.e., expected feedback was compared with unexpected (positive and negative) feedback and positive feedback was compared with negative feedback].
To ensure that the reported frequencies were actually due to an equal distribution of positive and negative feedback in each subject, mean feedback frequencies for the first five blocks of the experiment were subjected to a χ2 test for each individual participant. For this assessment, we used the first third of experimental blocks because the beginning of the experiment should be most critical for expectancy formation. For three participants, the adaptive mechanism did not succeed in adjusting the time windows fast enough and, as a result, the frequency distributions of positive and negative feedback differed reliably from the intended distribution (χ2(1) = 4.00, χ2(1) = 9.49, and χ2(1) = 5.05, respectively; α = 0.05). Those participants were excluded from all further analyses.
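The per-participant check can be illustrated with a small goodness-of-fit sketch. It assumes that the test compared each participant's counts of positive and negative feedback against an equal split, which is one reading of a test with df = 1. The exact expected frequencies are not given in the text. The function names are hypothetical, and 3.841 is the standard critical χ2 value for df = 1 at α = 0.05.

```python
CRITICAL_CHI2_DF1 = 3.841   # critical value for df = 1, alpha = 0.05


def chi_square_equal_split(n_positive, n_negative):
    """Goodness-of-fit statistic against an equal split of the two counts.

    Assumption: the expected frequency of each feedback type is half of
    the combined number of positive and negative trials.
    """
    total = n_positive + n_negative
    if total == 0:
        raise ValueError("at least one positive or negative trial is required")
    expected = total / 2
    return (
        (n_positive - expected) ** 2 + (n_negative - expected) ** 2
    ) / expected


def deviates_from_equal_split(n_positive, n_negative):
    """True if the statistic exceeds the critical value (exclude participant)."""
    return chi_square_equal_split(n_positive, n_negative) > CRITICAL_CHI2_DF1
```

For example, hypothetical counts of 30 positive and 10 negative trials give a statistic of 10.0 and would flag the participant, whereas 20 and 20 give 0.0 and would not.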
Results
Behavioral results
Participants were able to estimate the target time of 2.5 s fairly well (mean estimation time = 2509.8 ms; SD = 205.4 ms). Moreover, the behavioral results show that the adaptive mechanism succeeded in generating the intended frequency distribution for the three feedback types. An ANOVA with the factors feedback type (positive, negative, intermediate) and experimental third (first, second, third) yielded a main effect of feedback type (F(2,38) = 802.55, p < 0.01). Overall, intermediate feedback (mean = 58.25%; SE = 0.66) was more frequent than the mean of positive and negative feedback (F(1,19) = 1502.69, p < 0.01, η2 = 0.99), while positive and negative feedback were rare (mean = 20.82%; SE = 0.73 and mean = 20.52%; SE = 0.47, respectively) and did not differ from each other (p = 0.77). Feedback frequencies did not change over the course of the experiment (p > 0.28).
ERPs in the FRN time range
Figure 2a shows the ERPs elicited by the three feedback types at representative frontocentral and parietal recording sites. Unexpected negative and unexpected positive feedback generated larger peak-to-peak FRNs than expected intermediate feedback (F(1,19) = 8.34, p < 0.01, η2 = 0.31) at FCz, while positive and negative feedback did not differ (p = 0.94).
Figure 2. a, Feedback-locked ERPs are displayed for all three types of feedback at electrodes FCz and Pz. The time window used to compute the peak-to-peak FRN is highlighted at electrode FCz and the window for P300 mean amplitude is highlighted at Pz. b, The peak-to-peak FRN at electrode FCz is larger for unexpected positive and unexpected negative feedback than for expected intermediate feedback. c, Mean P300 amplitude at Pz is largest for unexpected positive feedback and larger for unexpected negative feedback than for expected intermediate feedback. Bars depict SEM.
Mean amplitudes at FCz in the time range from 230 to 330 ms were more positive-going for unexpected negative and positive feedback than for expected intermediate feedback (F(1,19) = 15.50, p < 0.01), and more positive-going for positive than for negative feedback (F(1,19) = 8.41, p < 0.01).
P300
P300 amplitude was analyzed at Pz in the time range from 340 to 440 ms. It was found to be larger for the mean of positive and negative feedback than for intermediate feedback (F(1,19) = 31.33, p < 0.01) and larger for positive than for negative feedback (F(1,19) = 12.61, p < 0.01).
Topographical analyses
To examine whether the distributions of mean amplitudes in the early (230–330 ms) and late (340–440 ms; P300) time windows are qualitatively different, an ANOVA with factors time window (early/late), feedback, laterality, and anterior–posterior was calculated. It revealed an interaction between time window, laterality, and anterior–posterior (F(4,76) = 3.06, p < 0.05). Separate analyses for the two time windows showed that while P300 mean amplitude was largest at midparietal sites, mean amplitude in the early time window showed a similar parietal maximum but was slightly more left-lateralized [smaller amplitudes at left than midline electrodes for the late (F(1,19) = 36.52, p < 0.01) but not the early time window (p = 0.14)].
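The vector scaling applied before this topographical comparison (McCarthy and Wood, 1985) divides the amplitudes of one condition by the length of the amplitude vector across electrodes, so that conditions differing only in overall magnitude yield identical normalized distributions. A minimal sketch follows. The function name is hypothetical, and whether scaling was applied per participant or to group means is not stated in the text.

```python
import math


def vector_scale(amplitudes):
    """Scale amplitudes across electrodes to unit vector length.

    `amplitudes` holds one condition's mean amplitudes, one value per
    electrode (e.g. F3, Fz, F4, C3, Cz, C4, P3, Pz, P4).
    """
    length = math.sqrt(sum(a * a for a in amplitudes))
    if length == 0:
        raise ValueError("cannot scale an all-zero amplitude vector")
    return [a / length for a in amplitudes]
```

After scaling, two distributions with the same shape but different overall size, such as `[1, 2, 2]` and `[2, 4, 4]`, map onto the same values, so a remaining interaction with electrode factors points to a difference in shape rather than in strength.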
Discussion
The goal of this study was to examine whether the FRN, as a reflection of ACC activity, is mainly driven by the feedback's valence, as suggested by the RL theory, or by its unexpectedness, as proposed by the PRO model. For this purpose, we developed a time-estimation task with positive, negative, and intermediate feedback. Because an adaptive procedure ensured that the occurrence of positive and negative feedback was rare and thus unexpected, a comparison without the confounding influence of expectancy differences was possible.
Unexpected feedback generated larger FRNs than expected feedback regardless of valence. In the present experimental task, the similarity between positive and negative feedback conditions and their difference from intermediate feedback lie in expectancy. Positive and negative feedback are equally rare and thus unexpected, while intermediate feedback is frequent and expected. Therefore, we feel confident in concluding that the size of the peak-to-peak FRN reflects an effect of expectancy violation. Positive feedback and negative feedback generate FRNs of the same size because both are equally unexpected.
A so far neglected problem in ERP research on performance monitoring is that it mostly focuses on negative events. As a consequence, feedback valence and expectancy in most studies are confounded. In studies investigating feedback processing in learning tasks, positive feedback is unexpected and negative feedback is expected in the beginning, and vice versa after learning has taken place (Holroyd and Coles, 2002; Nieuwenhuis et al., 2002; Cohen et al., 2007; Holroyd et al., 2008; Opitz et al., 2011). This general criticism is not always applicable to studies using gambling tasks because this confound is easy to avoid in those paradigms. However, they face a different problem. In gambling, feedback is not useful for behavioral adjustments and therefore participants may not generate expectancies. At best, their expectancies are hard to predict (Hajcak et al., 2007; Holroyd et al., 2009). Additionally, in many gambling studies explicitly addressing the role of valence, positive and negative feedback both occur in 50% of cases and thus are equally expected (Gehring and Willoughby, 2002; Hajcak et al., 2006). These differences in study design may explain why FRNs for unexpected positive feedback have not been found in earlier studies.
To our knowledge, although some studies have attempted to examine positive feedback effects (Donkers and van Boxtel, 2005), there are only two that tried to avoid the confound between expectancy and valence. Oliveira and colleagues (2007) asked their participants to estimate the time when a moving light would reach a covered target position and, after that, to judge the accuracy of their own estimation. The trial ended with feedback on how accurate their estimation had actually been. This multiple judgment procedure made it possible to compare unexpected positive and negative feedback (subjects thinking their timing had been bad when it had been good and vice versa). Contrary to the RL theory (Holroyd et al., 2008), Oliveira and colleagues (2007) found that unexpected positive as well as unexpected negative feedback elicited an FRN. Nevertheless, in the first experiment of their study, it is not clear whether unexpected positive feedback was actually perceived as “better than expected” because the expectancy could have been determined either by participants' time estimation or by their own judgment about their estimation. The order of time estimation, judgment, and feedback makes it plausible to assume that expecting negative but getting positive feedback is an expectancy violation in the sense of “worse than expected” because it means that the participant's judgment about their performance had been inaccurate. Oliveira and colleagues (2007) acknowledged that problem and performed a second experiment that attempted to overcome it by using “false-positive” feedback (i.e., positive feedback occasionally occurred after erroneous estimates). Nevertheless, this has other drawbacks. False-positive feedback has the undesirable (and unusual) property that it does not afford useful information for adjusting behavior. It also has the potential to conflict with subjects' veridical assessment of their own performance: it can convey that “something was wrong” about the subjects' self-monitoring. In that sense, the feedback presents negative information, undercutting the argument that the FRN it elicits is simply a response to an unexpected positive outcome. [Oliveira and colleagues (2007) acknowledge an additional problem with false feedback in that subjects may come to disbelieve the feedback.] In our study, positive feedback truly represented an unexpected positive outcome resulting from exceptionally good performance.
In a second study attempting to avoid the valence/expectancy confound, Jessup and colleagues (2010) encouraged participants to choose a low-probability gamble over a sure win by manipulating the win value (gamble: 64 cents; sure win: 3 cents). They found that ACC is more active for rewards than losses when rewards are unlikely. However, the interpretation of their finding is limited because win value and probability were not independent of each other and thus the result could be due to the high win value instead of unexpectedness. Our present findings are consistent with the findings of the studies mentioned above. Moreover, our design enables us to contrast positive and negative feedback without the confounding influence of expectancy and uses a time-estimation task that provides feedback useful for modifying behavior.
Mean amplitude for unexpected positive feedback in the FRN time range was more positive-going than that for unexpected negative feedback and both were more positive-going than that for intermediate feedback. This means that positive feedback and negative feedback show the same waveform shape, as revealed by the peak-to-peak measures, but positive feedback results in more positive mean amplitudes. Because, in the current design, the only difference between the positive and negative feedback conditions is their valence, we infer that the difference in mean amplitudes in the FRN time range reflects an effect of feedback valence. In addition, an explanation is needed for the fact that the ERP for intermediate feedback generated the least positive-going waveform. One possible reason for this is that intermediate feedback was not only of intermediate valence but also frequent. Thus, it differs from positive and negative feedback in both valence and expectancy, and therefore a direct comparison of the three feedback types is difficult.
However, another explanation for the early positive deflection comes to mind when looking at the P300, which was larger for positive than for negative feedback and smallest for intermediate feedback. In contrast to the FRN, which represents an initial and fast evaluation of an event, the P300 is thought to reflect a later, higher-order form of performance monitoring associated with the processing of unexpected events and is often linked to the evaluation of task relevance and working memory updating. Because P300 amplitude is sensitive to the frequency of events (Donchin and Coles, 1988; Mecklinger and Ullsperger, 1995; Polich, 2004, 2007), it is not surprising that intermediate feedback elicited the smallest P300. Interestingly, the difference between positive and negative feedback occurred although both feedback types were equally rare. This may reflect the fact that positive feedback had a greater task relevance because it signaled that the intended goal (avoid negative and get positive feedback) had been achieved or alternatively that its subjective probability was experienced as being lower (Johnson, 1986). Hints in favor of the latter interpretation come from the interviews we conducted after the experiment. When asked about feedback frequencies, 11 of 20 participants believed that positive feedback had been less frequent than negative feedback, while only four subjects believed the opposite. Although previous studies examining reward processing have led to inconsistent results, linking the P300 to reward magnitude regardless of valence (Yeung and Sanfey, 2004; Sato et al., 2005) or negative valence (Frank, 2005), the present results are in line with findings that P300 is larger after positive events (Hajcak et al., 2007; Bellebaum and Daum, 2008).
In addition to the aforementioned implications, the P300 results may also offer an explanation for the early valence effect because it is possible that an early P300 onset modulated amplitudes due to component overlap. Apart from the fact that the two effects show the same pattern of results, it has been shown before that the P300 can occur relatively early for perceptually simple and highly salient stimuli (Polich, 2007), such as our feedback stimuli. Partial support for this idea is also provided by the comparison of the topographical distributions in the early and late mean amplitudes, revealing highly similar posterior distributions for both effects. Therefore, it seems reasonable to assume that mean amplitude differences in the early time range reflect component overlap from an early-onsetting P300.
In conclusion, our results suggest that predictions regarding future events are made and discrepancies between actual and expected outcomes are reflected by the peak-to-peak FRN. This includes negative expectancy violations, like negative feedback, as well as positive expectancy violations, like unexpectedly good outcomes. Thus, our results are consistent with the PRO model (Alexander and Brown, 2010, 2011), which suggests that the ACC is sensitive to the unexpectedness of events regardless of their valence. Future research will need to determine the specific functional role of these expectancy violations in behavior. The generalizability of our findings to other types of expectancy violations and the relationship to the bottom-up detection of new or unattended stimuli remain open issues and need to be examined in future studies.
Footnotes
This research was supported by the German Research Foundation (Grant International Research Training Group 1457). We thank Bertram Opitz for helpful comments and Aljoscha Becker and Dennis Stabler for data collection.
Correspondence should be addressed to Nicola K. Ferdinand at the above address. E-mail: n.ferdinand@mx.uni-saarland.de








