Abstract
Perceptual decision making is the process by which information from sensory systems is combined and used to influence our behavior. In addition to the sensory input, this process can be affected by other factors, such as reward and punishment for correct and incorrect responses. To investigate the temporal dynamics of how monetary punishment influences perceptual decision making in humans, we collected electroencephalography (EEG) data during a perceptual categorization task whereby the punishment level for incorrect responses was parametrically manipulated across blocks of trials. Behaviorally, we observed improved accuracy for high relative to low punishment levels. Using multivariate linear discriminant analysis of the EEG, we identified multiple punishment-induced discriminating components with spatially distinct scalp topographies. Compared with components related to sensory evidence, components discriminating punishment levels appeared later in the trial, suggesting that punishment affects primarily late postsensory, decision-related processing. Crucially, the amplitude of these punishment components across participants was predictive of the size of the behavioral improvements induced by punishment. Finally, trial-by-trial changes in prestimulus oscillatory activity in the alpha and gamma bands were good predictors of the amplitude of these components. We discuss these findings in the context of increased motivation/attention, resulting from increases in punishment, which in turn yields improved decision-related processing.
Introduction
Perceptual decision making is the process by which information gathered from sensory systems is combined and used to guide our behavior (Gold and Shadlen, 2007; Heekeren et al., 2008). It is influenced primarily by the quality of incoming sensory evidence but remains susceptible to external factors, such as the presence and the relative amount of reward or punishment associated with potential choices (Liston and Stone, 2008; Pleger et al., 2008; Feng et al., 2009; Fleming, 2009; Fleming et al., 2010; Nomoto et al., 2010; Mulder et al., 2012).
Several studies investigated the neural correlates of perceptual decisions in primates and humans, providing valuable information about the underlying neural mechanisms (Gold and Shadlen, 2007; Heekeren et al., 2008). Neural correlates of time-dependent accumulation of stimulus evidence have been localized in parietal cortex with additional decision-making processing identified in prefrontal cortices (Kim and Shadlen, 1999; Shadlen and Newsome, 2001; Heekeren et al., 2004; Ploran et al., 2007, 2011; Tosoni et al., 2008; Philiastides et al., 2010, 2011; Rorie et al., 2010; Bennur and Gold, 2011; Ding and Gold, 2012), as well as in the superior colliculus (Horwitz and Newsome, 1999) and striatum (Basten et al., 2010; Ding and Gold, 2010; Forstmann et al., 2010; Green et al., 2012). Similarly, functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) experiments have shown that reward and punishment have a strong effect on choice behavior and the accompanying neural processes (O'Doherty et al., 2001; Yeung and Sanfey, 2004; Knutson and Cooper, 2005; Talmi et al., 2009; Hare et al., 2010; Summerfield and Koechlin, 2010; Dambacher et al., 2011; Harris et al., 2011).
However, only a few studies have investigated how reward and punishment modulate perceptual decision making (Pleger et al., 2008, 2009; Fleming et al., 2010; Pessoa and Engelmann, 2010; Weil et al., 2010; Laufer and Paz, 2012). Although fMRI studies reported value-related modulations throughout spatially selective areas in visual (Serences, 2008; Serences and Saproo, 2010; Weil et al., 2010) and primary somatosensory (Pleger et al., 2008; Pleger et al., 2009) cortices, high temporal resolution information of this modulatory activity is lacking. Specifically, it is currently unclear whether reward and punishment influence early sensory processing (Shulman et al., 1997), later postsensory decision-related activity (Small et al., 2005), motor preparation and execution, or a combination of these processes (Engelmann et al., 2009; Pessoa, 2009). Moreover, the role of attention and prestimulus state on these reward/punishment-induced effects remains unknown.
Here, we investigated the temporal dynamics of the influence of punishment on perceptual decision making using single-trial analysis of the EEG collected during a perceptual categorization task in which the amount of punishment and sensory evidence were experimentally manipulated. We used a machine learning approach (Philiastides and Sajda, 2006b, 2007; Philiastides et al., 2006) to identify linear spatial weightings of the EEG sensors for specific temporal windows, which optimally discriminated along the punishment and sensory evidence dimensions. We analyzed the relative timing of the resulting discriminating activity to identify whether punishment-related activations temporally overlap with early sensory processing or whether they reflect primarily postsensory processing. In addition, we looked at their temporal profile and the extent to which they predict psychophysical performance on a trial-by-trial basis. Finally, we tested for differences in prestimulus oscillatory activity across punishment levels and the extent to which they predict poststimulus processing.
Materials and Methods
Participants.
Twenty-four subjects participated in the study (11 males; mean ± SD age, 25.89 ± 2.95 years). They had normal or corrected-to-normal vision and reported no history of neurological problems. Written informed consent was collected from all participants according to procedures approved by the local ethics committee of the Charité, University Medicine Berlin. In addition to a fixed sum of €15 paid for participation, participants could earn extra money (maximum another €15) depending on their performance in the task. Two participants were excluded as a result of excessive movement artifacts in the EEG that caused the amplifiers to saturate on multiple occasions. All analyses are based on the remaining 22 subjects.
Stimuli.
We used a set of 20 face (face database; Max Planck Institute for Biological Cybernetics, Tuebingen, Germany) and 20 car (retrieved from the internet) grayscale images (image size, 512 × 512 pixels, 8 bits/pixel). All images were equated for spatial frequency, luminance, contrast, and magnitude spectra. Their corresponding phase spectra were manipulated using the weighted mean phase technique to generate a set of noisy images characterized by their percentage phase coherence (i.e., amount of sensory evidence) (Dakin et al., 2002). For the training session, each image had six different phase coherence values (27.5, 30, 32.5, 35, 40, and 45%). For the main experiment, two coherence levels per category were selected for each subject, corresponding to ∼75 and 90% correct performance during training. A Dell Precision 360 Workstation with nVidia Quadro FX500/FX600 graphics card and Presentation software (Neurobehavioral Systems) controlled the stimulus display. Images were presented on a Dell 2001FP TFT monitor (resolution, 1024 × 768 pixels; refresh rate, 60 Hz). Each image subtended 8° × 8° of visual angle.
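The phase manipulation can be illustrated with a minimal sketch. It assumes one common reading of the weighted mean phase idea (the output phase is the angle of a weighted sum of the original and a random unit phasor); the function name, the 64 × 64 random test image, and the choice to keep only the real part of the inverse transform are illustrative assumptions, not the procedure used to generate the actual stimuli.

```python
import numpy as np

def phase_coherence_image(img, coherence, rng):
    """Blend the image phase spectrum with random phase (illustrative sketch)."""
    spectrum = np.fft.fft2(img)
    amplitude = np.abs(spectrum)          # magnitude spectrum is left untouched
    phase = np.angle(spectrum)
    noise_phase = rng.uniform(-np.pi, np.pi, size=img.shape)
    # Weighted mean of the two unit phasors; coherence = 1 keeps the original phase
    mixed = np.angle(coherence * np.exp(1j * phase)
                     + (1.0 - coherence) * np.exp(1j * noise_phase))
    # Assumption: the random phase is not conjugate-symmetric, so only the
    # real part of the inverse transform is kept
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * mixed)))

rng = np.random.default_rng(0)
img = rng.random((64, 64))                        # stand-in for a grayscale image
noisy = phase_coherence_image(img, 0.35, rng)     # 35% phase coherence
clean = phase_coherence_image(img, 1.0, rng)      # 100% coherence returns the image
```

Lower coherence values move the output closer to pure phase noise while the magnitude spectrum fed into the inverse transform stays the same.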
Behavioral paradigm.
We used a face-versus-car categorization task with a 2 × 3 factorial design (2 phase coherence levels × 3 punishment levels). Subjects had to discriminate noisy images of faces and cars presented on a computer monitor. The different punishment levels were as follows: possible loss of €0, €5, and €10 for an incorrect answer. Punishment levels were implemented block-wise. For the final payout, subjects were told that three answers, one from each type of block, would be chosen randomly at the end of the experiment. If these answers were incorrect, subjects would lose €0, €5, and €10, depending on the type of block from which the incorrect answer was drawn.
A schematic representation of the behavioral paradigm is given in Figure 1A. The participants sat in a dark, soundproof, electrically shielded cabin at a distance of 1 m from a computer screen. Each image was presented for 50 ms, followed by a blank screen lasting a maximum of 1000 ms, during which subjects had to make a choice by pressing one of two (left or right) buttons on a response box. This was followed by another delay period, randomized in the range 1750–2250 ms (mean delay, 2 s). Consequently, the average total interstimulus interval (ISI) was 3 s. Subjects were instructed to respond as soon as they knew (or believed they knew) the correct answer. Participants had to press the left button for a face choice and the right button for a car choice, using their right index and middle fingers, respectively.
Each subject performed a total of nine blocks (i.e., three blocks of each punishment level) while EEG was recorded simultaneously. One block consisted of a total of 80 trials (40 trials for each of the two levels of sensory evidence, with an equal number of face and car images). The order of punishment levels was quasi-randomized across the nine punishment blocks, not allowing the same punishment level in two subsequent blocks. Specifically, we used four different orders (three blocks of 10-0-5, 5-0-10, 0-5-10, or 10-5-0). Participants were distributed equally across orders. Additionally, subjects had to pay 10¢ for responses that were too slow (longer than 1 s), independent of block type. This was implemented to prevent subjects from becoming unmotivated and from withholding responses to avoid punishments. Trials in which subjects failed to respond within the allocated time of 1 s were excluded from additional analyses.
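As a minimal sketch of the block schedule, the following assumes that each participant's order triplet was simply repeated three times to give nine blocks (our reading of "three blocks of" each order) and that participants were assigned to orders in equal numbers. The function names and the seed are hypothetical.

```python
import random

# The four punishment orders listed in the text (values in euros)
ORDERS = [(10, 0, 5), (5, 0, 10), (0, 5, 10), (10, 5, 0)]

def block_sequence(order, repeats=3):
    """Repeat one order triplet; assumption: this yields the nine blocks."""
    seq = list(order) * repeats
    # No punishment level may occur in two subsequent blocks
    assert all(a != b for a, b in zip(seq, seq[1:]))
    return seq

def assign_orders(n_participants, seed=0):
    """Distribute participants equally across the four orders, then shuffle."""
    rng = random.Random(seed)
    orders = [ORDERS[i % len(ORDERS)] for i in range(n_participants)]
    rng.shuffle(orders)
    return [block_sequence(o) for o in orders]

schedule = assign_orders(24)   # 24 recruited participants, 6 per order
```

Because every triplet starts and ends with different levels, repeating it never places the same punishment level in adjacent blocks.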
We used a design with block-wise manipulation of punishment to avoid a bias for image location (Feng et al., 2009) and image property (Kiss et al., 2009). Therefore, incorrect answers were punished independently from image category. Furthermore, compared with an event-related paradigm, a blocked design with predictable and constant monetary punishment should evoke sustained activation instead of rapid alternation and therefore be most robust for examining continuous electrophysiological activations (Goldstein et al., 2006). In addition, trial-wise changes of payoffs are only moderately successful in inducing changes in choice behavior (Diederich, 2008; Simen et al., 2009; Bogacz et al., 2010).
During the main experiment, we provided no feedback about correct or incorrect answers to avoid any interference attributable to feedback processing. Moreover, without feedback, we expected smaller learning effects during the experiment and no motivational effects caused by positive feedback. In contrast to the main experiment, during training, subjects received feedback about correct or incorrect answers, and they were not punished for incorrect answers. This was done to facilitate learning the task in a reasonable amount of time.
Note that we included a manipulation of the amount of sensory evidence (i.e., phase coherence) in our paradigm because we wanted to compare the influence of punishment and phase coherence level on the decision-making process separately and compare the relative timing of the resulting activations. In addition, we decided to use punishment instead of reward, expecting clearer behavioral and neurophysiological effects attributable to a higher impact of loss than of gain (see prospect theory by Kahneman and Tversky, 1979; negativity bias by Taylor, 1991).
EEG data acquisition.
Scalp electrophysiological data were recorded using Brain Products amplifiers (BrainVision; Brain Products) with a sampling rate of 1000 Hz from 74 Ag/AgCl scalp electrodes in equidistant positions according to the 10% system (EasyCap). Two electrodes at the outer canthi of the eyes and one electrode below the left eye recorded the ocular activity, and the chin electrode served as ground. Impedances were kept below 10 kΩ, and all channels were referenced to left mastoid. Data underwent online filtering with a bandpass filter of 0.1–250 Hz. A software-based 0.5 Hz high-pass filter was applied to the data in addition to 50 and 100 Hz notch filters to minimize line-noise artifacts. These filters were designed to be linear phase to minimize delay distortions. Subsequently, data were re-referenced to the average of all channels including the second mastoid. Finally, data were downsampled to 500 Hz.
To obtain accurate event triggers, we placed a custom-made photodiode on the screen to detect the onset of the stimuli. An external response device was used to collect response times (RTs). Both signals were collected on two external channels of the EEG amplifiers to ensure synchronization between stimulus events, responses, and the EEG data.
Before the main experiment, subjects completed an eye-movement calibration task. They were instructed to blink repeatedly on the appearance of a white-on-black fixation cross and to then make several horizontal and vertical saccades according to the position of the fixation cross on the screen. The fixation cross subtended 0.6° × 0.6° of visual angle. Horizontal saccades subtended 20°, and vertical saccades subtended 15°. The timing of these visual cues was recorded with EEG. This enabled us to determine linear components associated with eye blinks and saccades (using principal component analysis) that were subsequently projected out of the EEG data recorded during the main experiment (Parra et al., 2005).
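The idea of estimating artifact components from calibration data and projecting them out can be sketched as follows. This is a simplified illustration that assumes a single orthonormal principal component and an orthogonal projector; the exact estimator in Parra et al. (2005) may differ, and all data here are synthetic.

```python
import numpy as np

def eye_artifact_projector(calib, n_components=1):
    """calib: channels x samples calibration EEG; returns a channels x channels projector."""
    centered = calib - calib.mean(axis=1, keepdims=True)
    # Left singular vectors are the principal spatial directions of the calibration data
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    a = u[:, :n_components]
    # Orthogonal projector onto the complement of the artifact subspace
    return np.eye(calib.shape[0]) - a @ a.T

rng = np.random.default_rng(1)
n_channels, n_samples = 8, 2000
blink_topography = rng.normal(size=n_channels)          # synthetic blink scalp pattern
calib = (np.outer(blink_topography, 50.0 * rng.normal(size=n_samples))
         + rng.normal(size=(n_channels, n_samples)))
P = eye_artifact_projector(calib)

# "Main experiment" data contaminated by the same blink pattern
main = (np.outer(blink_topography, 50.0 * rng.normal(size=n_samples))
        + rng.normal(size=(n_channels, n_samples)))
cleaned = P @ main
```

The projector is estimated once from the calibration recording and then applied unchanged to the task data.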
Behavioral data analyses.
To test the behavioral effects of punishment and the amount of sensory evidence on accuracy and RTs, we used separate two-factor repeated-measures ANOVA, with factors punishment and image phase coherence.
Single-trial analyses.
We used linear discriminant analysis (LDA) (Parra et al., 2002, 2005; Philiastides and Sajda, 2006b; Philiastides et al., 2006; Ratcliff et al., 2009; Blankertz et al., 2011) to perform binary discriminations between conditions of interest. Specifically, we performed discrimination along a punishment dimension [i.e., 10 (highest) vs 0 (lowest) punishment trials] and along a sensory evidence dimension (i.e., high vs low image sensory evidence trials). Data from punishment level 5 served as an “unseen” dataset to establish a parametric modulation of neural activity across all punishment levels (e.g., 0 < 5 < 10; for details, see Results). The analysis was repeated for each subject separately.
Unlike conventional, univariate, trial-average event-related potential (ERP) analysis techniques, LDA algorithms are designed to spatially integrate information across the multidimensional sensor space such that trial-to-trial variability is preserved while at the same time ensuring that the resulting discriminating components have higher signal/noise ratio (SNR) compared with ERP data from individual or small subsets of sensors. Specifically, for each binary comparison, the method tries to identify, within short predefined time windows of interest, a projection in the multidimensional EEG space that maximally discriminates between each of the relevant conditions. Here, we defined time windows of interest with duration δ and onset time τ and used regularized Fisher discriminant analysis (FDA) (Duda et al., 2001; Blankertz et al., 2011) to estimate weighting vectors wδ,τ (spatial filters) to generate one-dimensional projections yτ(t) from D channels (indexed by c) in the EEG data, denoted with x(t):
$$y_\tau(t) = w_{\delta,\tau}^{T}\, x(t) = \sum_{c=1}^{D} w_{\delta,\tau,c}\, x_c(t) \qquad (1)$$
such that yτ(t) is maximally discriminating between conditions of interest (i.e., 10-versus-0 punishment levels and high-versus-low sensory evidence). Specifically, the projection vector wδ,τ (Duda et al., 2001; Blankertz et al., 2011) is defined as follows:
$$w_{\delta,\tau} = S_c^{-1}(m_2 - m_1),$$
where $m_i$ is the estimated mean of condition i, and $S_c = \tfrac{1}{2}(S_1 + S_2)$ is the estimated common covariance matrix [i.e., the average of the condition-wise empirical covariance matrices, $S_i = \frac{1}{n-1}\sum_{j=1}^{n}(x_j - m_i)(x_j - m_i)^{T}$, where n is the number of trials]. However, for multidimensional data and relatively few data points/trials, the estimation of the empirical covariance matrices might become imprecise (attributable to the quadratic nature of the covariance estimate).
To counterbalance potential estimation errors, we replaced the condition-wise covariance matrices with regularized versions of these matrices: $\tilde{S}_i = (1 - \lambda) S_i + \lambda \nu I$, where λ ∈ [0, 1] is the regularization term, and ν is the average eigenvalue of the original Si [i.e., trace(Si)/D, with D being the dimensionality of the feature space, here the number of EEG channels] (Duda et al., 2001; Blankertz et al., 2011). Note that λ = 0 yields unregularized FDA and λ = 1 assumes a spherical covariance matrix. Here, we optimized λ for each participant based on discriminator performance (see below) using grid search in increments of 0.01.
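The regularized discriminant described above can be sketched in a few lines. The sketch assumes one feature vector per trial (for example, the EEG averaged within the training window, which is an assumption about feature construction), a fixed λ rather than the grid search, and synthetic data; function and variable names are illustrative.

```python
import numpy as np

def shrink_cov(x, lam):
    """x: trials x channels. Returns (1 - lam) * S + lam * nu * I."""
    s = np.cov(x, rowvar=False)                 # empirical covariance, 1/(n - 1) scaling
    nu = np.trace(s) / s.shape[0]               # average eigenvalue of S
    return (1.0 - lam) * s + lam * nu * np.eye(s.shape[0])

def fda_weights(x1, x2, lam):
    """Spatial filter w = S_c^{-1} (m2 - m1) with shrinkage-regularized S_c."""
    s_c = 0.5 * (shrink_cov(x1, lam) + shrink_cov(x2, lam))
    return np.linalg.solve(s_c, x2.mean(axis=0) - x1.mean(axis=0))

rng = np.random.default_rng(2)
n_trials, n_channels = 120, 16
pattern = rng.normal(size=n_channels)           # synthetic condition difference
x_low = rng.normal(size=(n_trials, n_channels))
x_high = rng.normal(size=(n_trials, n_channels)) + 0.8 * pattern

w = fda_weights(x_low, x_high, lam=0.1)         # lam fixed here; the study used grid search
y_low, y_high = x_low @ w, x_high @ w           # one-dimensional projections per trial
```

Solving the linear system instead of explicitly inverting the covariance matrix is a numerical convenience; with λ = 1 the filter reduces to a scaled difference of the condition means.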
For all binary comparisons, we used a training window length δ = 60 ms and either stimulus- or response-locked EEG data. Note that yτ(t) is an aggregate representation of the data over all sensors (i.e., we are collapsing the multidimensional sensor space into a single representation). Compared with individual channel data, the resulting “discriminating component” yτ(t) is a better estimator of the underlying neural activity and is often thought to have better SNR and reduced interference from sources that do not contribute to the discrimination (Parra et al., 2005). We use the term “component” instead of “source” to make it clear that this is a projection of all the activity correlating with the underlying source.
To quantify the discriminator performance, we used the area under the receiver operating characteristic (ROC) curve, referred to as Az, with a leave-one-out cross-validation approach (Duda et al., 2001). We used the ROC Az metric to characterize the discrimination performance at multiple time points (relative to stimulus and response) by sliding our discriminator training window across time (varying τ). Finally, to assess the significance of the resulting discriminating component, we used a bootstrapping technique to compute the Az value corresponding to a significance level of p = 0.01. Specifically, we performed the leave-one-out test after randomizing the true trial labels of the relevant conditions and repeated this randomization process 1000 times to produce an Az randomization distribution, from which we derived the Az value leading to a significance level of p = 0.01.
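A compact sketch of the leave-one-out Az estimate and the label-randomization threshold is given below. It assumes one feature vector per trial, a fixed λ, and only 100 randomizations to keep the example fast (the analysis above used 1000); the data are synthetic and the helper names are hypothetical.

```python
import numpy as np

def fda_weights(x1, x2, lam=0.1):
    def shrink(x):
        s = np.cov(x, rowvar=False)
        return (1 - lam) * s + lam * np.trace(s) / s.shape[0] * np.eye(s.shape[0])
    return np.linalg.solve(0.5 * (shrink(x1) + shrink(x2)), x2.mean(0) - x1.mean(0))

def auc(scores, labels):
    """Area under the ROC curve: P(class-1 score > class-0 score), ties count 0.5."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def loo_az(x, labels):
    """Train on all trials but one, project the held-out trial, then compute Az."""
    scores = np.empty(len(labels))
    for i in range(len(labels)):
        keep = np.arange(len(labels)) != i
        xt, lt = x[keep], labels[keep]
        scores[i] = x[i] @ fda_weights(xt[lt == 0], xt[lt == 1])
    return auc(scores, labels)

def permutation_threshold(x, labels, n_perm, rng, alpha=0.01):
    """Az value exceeded by only a fraction alpha of label-randomized runs."""
    null = [loo_az(x, rng.permutation(labels)) for _ in range(n_perm)]
    return float(np.quantile(null, 1 - alpha))

rng = np.random.default_rng(3)
labels = np.repeat([0, 1], 30)
x = rng.normal(size=(60, 6))
x[labels == 1] += 0.8                      # synthetic condition effect on all channels
az = loo_az(x, labels)
threshold = permutation_threshold(x, labels, n_perm=100, rng=rng)
```

In the sliding-window analysis this computation would be repeated for every window onset τ, giving an Az time course that is compared against the randomization threshold.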
Given the linearity of our model, we also computed scalp topographies of the discriminating components resulting from Equation 1 by estimating a “forward model” for each component:
$$a_\tau = \frac{X\, y_\tau}{y_\tau^{T}\, y_\tau} \qquad (2)$$
where the EEG data and discriminating components are now in a matrix and vector notation, respectively, for convenience (i.e., time is now a dimension of X and yτ). Intuitively, aτ can be seen as a linear spatial projection of a one-dimensional component (yτ) back onto the surface electrodes. That in turn allows one to visualize the spatial distribution of component activity on the scalp. A strong projection indicates low attenuation of the component and can be visualized as the intensity of the “sensor projections” aτ (in units of microvolts). Therefore, the intensity of sensor projections aτ indicates proximity/correlation of the discriminating component to the sensors. Red represents positive correlation between the sensors and the discriminating component, whereas blue represents negative correlation. The color intensities on these maps can be thought of as representing “differential activity” between the conditions of interest (i.e., 10-versus-0 punishment and high-versus-low sensory evidence), as well as an index of how much each of the sensors contributes to discriminability. Crucially, the sign in aτ is arbitrary and depends on the class labels assigned during discrimination (i.e., discriminating 0-versus-10 for punishment instead of 10-versus-0 would have reversed the sign on these maps; similarly for sensory evidence). All scalp maps were plotted using EEGLAB (Delorme and Makeig, 2004).
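The forward model amounts to a least-squares regression of each sensor's signal onto the component time course. A minimal sketch, assuming X is arranged as channels × samples and using a synthetic four-channel example:

```python
import numpy as np

def forward_model(x, y):
    """x: channels x samples EEG, y: component time course. Returns a = X y / (y^T y)."""
    return x @ y / (y @ y)

rng = np.random.default_rng(4)
true_map = np.array([1.0, -0.5, 0.25, 0.0])   # synthetic scalp projection
y = rng.normal(size=500)                      # synthetic component activity
x = np.outer(true_map, y)                     # noise-free sensor data for illustration
a = forward_model(x, y)                       # recovers true_map
a_flipped = forward_model(x, -y)              # swapping class labels flips the sign
```

The sign reversal in the last line mirrors the point made above that the polarity of the scalp maps depends on the labeling of the two classes.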
In cases in which we identified sustained discriminating activity, we used these forward model estimates to identify temporal transitions between different components based on differences in scalp distribution, which are naturally suggestive of changes/differences in the underlying cortical sources. Specifically, we used a simple k-means clustering algorithm using a Euclidean distance metric (Duda et al., 2001) on the intensities of vector aτ for the entire time range of interest and optimized k (i.e., the number of different time windows with similar scalp topographies) using silhouette values (Rousseeuw, 1987).
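The clustering step can be illustrated with a small self-contained sketch. It uses a plain k-means with a deterministic farthest-point initialization (an assumption made for reproducibility, not necessarily the initialization used in the study) and selects k by the mean silhouette value; the "scalp maps" are synthetic.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain k-means with Euclidean distance; points: windows x channels."""
    centers = [points[0]]
    for _ in range(1, k):   # farthest-point initialization (illustrative choice)
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def mean_silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over points; singleton clusters score 0."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = []
    for i, li in enumerate(labels):
        same = labels == li
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = dist[i, same].sum() / (same.sum() - 1)     # mean distance within own cluster
        b = min(dist[i, labels == lj].mean()           # nearest other cluster
                for lj in set(labels.tolist()) if lj != li)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(5)
prototypes = np.eye(3, 8)                              # three synthetic topographies
maps = np.repeat(prototypes, 6, axis=0) + 0.02 * rng.normal(size=(18, 8))
silhouettes = {k: mean_silhouette(maps, kmeans(maps, k)) for k in range(2, 6)}
best_k = max(silhouettes, key=silhouettes.get)
```

Each row of `maps` stands for the forward model aτ of one training window, so contiguous runs of identical labels correspond to time intervals with a stable topography.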
To visualize the temporal evolution of the discriminating components, we constructed temporal profiles of relevant discriminant components (as seen in Figs. 3D, 5D, 7). Specifically, after aligning trials to the appropriate experimental event (stimulus or response), the optimal projection vector wδ,τ estimated for a given window τ was applied across an extended time window. Trials were then divided based on the relevant conditions [punishment (Figs. 3D, 5D) or sensory evidence (Fig. 7) level] and averaged together to yield an average temporal profile for each of the components of interest. Note that the polarity of these components is arbitrary and depends on the directionality of class labels during discrimination.
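Constructing a temporal profile reduces to applying one fixed spatial filter to every time sample of every trial and averaging within condition. The sketch below assumes epochs stored as trials × channels × samples and uses synthetic ramp-like activity whose gain scales with punishment level; the gains and the filter are illustrative, not estimated from real data.

```python
import numpy as np

def temporal_profiles(epochs, levels, w):
    """epochs: trials x channels x samples. Returns {level: mean projected time course}."""
    y = np.einsum("c,nct->nt", w, epochs)      # apply the same filter at every sample
    return {int(lv): y[levels == lv].mean(axis=0) for lv in np.unique(levels)}

rng = np.random.default_rng(6)
times = np.linspace(-0.2, 0.6, 81)             # seconds relative to stimulus onset
ramp = np.clip(times, 0.0, None)               # synthetic poststimulus ramp
pattern = np.ones(4)                           # synthetic scalp pattern
levels = np.repeat([0, 5, 10], 30)
gain = {0: 0.0, 5: 1.0, 10: 2.0}               # arbitrary illustrative gains

epochs = 0.5 * rng.normal(size=(90, 4, 81))
for i, lv in enumerate(levels):
    epochs[i] += gain[int(lv)] * pattern[:, None] * ramp

w = pattern / (pattern @ pattern)              # stand-in for a trained filter
profiles = temporal_profiles(epochs, levels, w)
```

In the analysis described above, the filter would be trained on the 10-versus-0 discrimination only, so the profile for level 5 comes from trials the discriminator never saw.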
Prestimulus spectral analyses.
Because we used a design with block-wise manipulation of punishment, we wanted to formally test whether differences in prestimulus (baseline) oscillatory activity across punishment levels exist and the extent to which they predict poststimulus activity revealed by our multivariate single-trial analysis. Specifically, for each trial, we computed the amplitude spectrum of the EEG in the 500 ms preceding stimulus onset at each electrode by Fourier analysis [i.e., using fast Fourier transform (FFT) as implemented in MATLAB (MathWorks)]. For each of four different frequency bands [theta (1–4 Hz), alpha (8–12 Hz), beta (12–36 Hz), and gamma (36–100 Hz)], we computed spectral amplitudes (FFTA_f0) for each of the three punishment levels. We then performed a linear fit through these data points (using linear regression) to estimate a slope through the different punishment levels. To establish significant parametric modulation as a function of punishment, we required that the slopes across participants were significantly different from zero. Finally, we used linear regression to test whether trial-by-trial changes in prestimulus activity (from frequency bands and sensors that showed punishment-induced effects) were predictive of trial-by-trial fluctuations in poststimulus component activity. Once again, we tested whether the resulting regression coefficients were significantly different from zero.
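The per-participant part of this analysis (band amplitudes from the prestimulus FFT, then a slope across punishment levels) can be sketched as follows. The sketch uses NumPy rather than MATLAB, a single electrode, the band limits given above, and synthetic data in which 10 Hz amplitude grows with punishment level in an arbitrary direction chosen only for illustration.

```python
import numpy as np

BANDS = {"theta": (1, 4), "alpha": (8, 12), "beta": (12, 36), "gamma": (36, 100)}

def band_amplitudes(prestim, fs):
    """prestim: trials x samples (one electrode). Mean FFT amplitude per band and trial."""
    freqs = np.fft.rfftfreq(prestim.shape[1], d=1.0 / fs)
    amp = np.abs(np.fft.rfft(prestim, axis=1)) / prestim.shape[1]
    return {name: amp[:, (freqs >= lo) & (freqs <= hi)].mean(axis=1)
            for name, (lo, hi) in BANDS.items()}

def punishment_slope(amplitudes, levels):
    """Slope of a straight line through the mean amplitude at levels 0, 5, and 10."""
    means = [amplitudes[levels == lv].mean() for lv in (0, 5, 10)]
    slope, _ = np.polyfit([0, 5, 10], means, deg=1)
    return slope

rng = np.random.default_rng(8)
fs = 500                                        # Hz, after downsampling
t = np.arange(250) / fs                         # 500 ms prestimulus window
levels = np.repeat([0, 5, 10], 40)
phase = rng.uniform(0, 2 * np.pi, size=levels.size)
# Synthetic 10 Hz rhythm whose amplitude depends on level (illustrative only)
prestim = ((1 + 0.1 * levels)[:, None]
           * np.sin(2 * np.pi * 10 * t[None, :] + phase[:, None])
           + 0.5 * rng.normal(size=(levels.size, t.size)))

amps = band_amplitudes(prestim, fs)
alpha_slope = punishment_slope(amps["alpha"], levels)
beta_slope = punishment_slope(amps["beta"], levels)
```

Group-level inference would then test the set of per-participant slopes against zero, and the single-trial amplitudes in `amps` could serve as regressors for the poststimulus component amplitudes.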
Results
Behavioral performance
The analysis of the behavioral data revealed that accuracy increased significantly with punishment level (F(2,21) = 5.5920, p = 0.007; Fig. 1B), whereas RTs did not differ between punishment conditions (F(2,21) = 0.0231, p = 0.9771; Fig. 1C). Post hoc paired t tests showed that the accuracy during the “no-punishment condition” differed significantly from both “punishment conditions” (0 vs 5 punishment, t(21) = 2.1634, p = 0.0422; 0 vs 10 punishment, t(21) = 2.9366, p = 0.0079). Although, on average, the accuracy for the highest punishment condition (i.e., pun 10) was higher than that for the intermediate one (i.e., pun 5), the two punishment conditions did not differ significantly (5 vs 10 punishment, t(21) = 0.7283, p = 0.4745), likely because of interindividual differences as well as behavioral ceiling effects at the highest punishment level.
In contrast to the punishment manipulation, the amount of sensory evidence had, as expected, a significant effect on both accuracy and RTs (Philiastides and Sajda, 2006b, 2007; Philiastides et al., 2006). Accuracy was significantly decreased (F(1,21) = 79.3214, p < 1 × 10^−7; Fig. 1D) in the low relative to the high sensory evidence condition, and RTs were significantly increased in the low relative to the high sensory evidence condition (F(1,21) = 84.5821, p < 1 × 10^−8; Fig. 1E). The number of slow responses (>1 s) did not differ between the three punishment conditions (nonparametric Friedman's test, χ²(2) = 0.9259, p = 0.6294). There were no interaction effects of punishment and sensory evidence.
Neural components associated with punishment
To identify EEG activity related to our punishment manipulation, we initially used our multivariate discriminant analysis to classify components that discriminated between the 10 (highest) and 0 (lowest) punishment levels. Having identified components that discriminated between the two extreme punishment levels, we subsequently applied the resulting discriminating projection vectors (wδ,τ) to “unseen” trials from the intermediate punishment level (pun 5) to establish whether the resulting activity is parametrically modulated by the amount of punishment (e.g., 0 < 5 < 10) or whether it reflected an “all-or-none” effect of punishment instead (e.g., 0 < 5 ≈ 10; see below). Furthermore, we tested the extent to which our ability to discriminate between punishment levels based on EEG data correlated with the amount of behavioral improvements seen in individual participants. Finally, to characterize the temporal evolution of the resulting discriminating components, we constructed temporal profiles of the relevant discriminant components. We performed this analysis on both stimulus- and response-locked EEG data.
The stimulus-locked analysis revealed sustained significant discrimination performance of the 10-versus-0 punishment levels in the time range 200–470 ms after stimulus onset (Fig. 2A). The gradual evolution of the scalp distributions (aτ) of the resulting discriminating activity within this time range suggests a possible cascade of events in a rather distributed network (Fig. 2A). A closer inspection of these scalp topographies revealed at least four different spatial component distributions. To quantify the time range for each of the four components, we used a simple k-means clustering algorithm on the scalp map data [k = 4, mean silhouette value = 0.71 (Duda et al., 2001)], which revealed the following four component intervals: 205–255, 265–305, 315–395, and 405–465 ms after stimulus (Figs. 2A, 3A). Although the associated scalp maps were spatially distinct, they also revealed a seemingly gradual transition of component activity from centrofrontal to parietal sensors, from the second component onward (Fig. 3A).
Importantly, the discrimination did not only reflect a simple punishment absent/present (i.e., all-or-none) effect. Instead, the mean discriminator output (yτ) was parametrically modulated across the three punishment levels (Fig. 3B). The reconstructed discriminator output for the intermediate punishment condition [yτ(5)] was situated between and differed significantly from the 0 and 10 punishment conditions, for all four components (paired t tests, all p < 1 × 10^−4; Fig. 3B).
Next, we wanted to capitalize on the fact that there is often a sizable variability across participants in how strongly they respond to simple reward/punishment manipulations in the laboratory (Diederich and Busemeyer, 2006; Diederich, 2008; Bogacz et al., 2010) to establish that our four punishment components do in fact reflect quantitative information necessary to influence behavior. Specifically, we hypothesized that overall accuracy changes attributable to the presence of punishment [i.e., accuracy(pun 10 + 5) − accuracy(pun 0)] in individual participants should correlate positively with our ability to discriminate between the relevant conditions in each of these participants using their electrophysiological data [i.e., yτ(5 + 10) − yτ(0)]. Correlations were highly significant for all four components of interest (mean r = 0.7225, all p < 1 × 10^−3; Fig. 3C), demonstrating that, the more an individual was affected by our punishment manipulation, the greater the modulation of the relevant electrophysiological correlates.
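The across-participant relationship is a Pearson correlation between two difference scores per participant. The sketch below uses entirely synthetic values for 22 participants, with an arbitrary built-in dependence, purely to show the computation; none of the numbers are taken from the reported results.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two per-participant vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(9)
n_participants = 22
# Synthetic neural effect: mean discriminator output under punishment (5 and 10) minus level 0
neural_effect = rng.normal(1.0, 0.5, size=n_participants)
# Synthetic behavioral effect: accuracy under punishment minus accuracy at level 0
accuracy_gain = 0.02 * neural_effect + rng.normal(0.0, 0.003, size=n_participants)

r = pearson_r(accuracy_gain, neural_effect)
```

One such correlation would be computed per discriminating component, using that component's window to obtain the per-participant discriminator outputs.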
To visualize the temporal evolution of the four discriminating components, we constructed temporal profiles for each of the components by aligning trials to the onset of the stimulus and applying the optimal projection vector wδ,τ estimated for a given component across an extended time window (200 ms before to 600 ms after the stimulus). Although we expected the difference between the 10-versus-0 punishment conditions to be maximal within this window, we constructed these profiles to primarily visualize the temporal shape of the components and report changes in slope versus amplitude across the three different punishment conditions. The temporal profiles of all four components confirmed the parametric effect of punishment on the neural data and provided preliminary support for the roles of each of the components in the decision process (Fig. 3D). The first three components exhibited a ramp-like profile (especially pronounced in the third component), with the slope of this activity seemingly being modulated by the amount of punishment, potentially consistent with a process of sensory evidence integration over time (Gold and Shadlen, 2001; Ploran et al., 2007; Philiastides et al., 2011). Although the ramp-like profile of these components might be a side effect of averaging over trials and subjects, temporal profiles from individual participants (for an example, see Fig. 3D, inset) showed qualitatively a very similar pattern. Interestingly, the fourth component showed plateauing activity shortly before the subjects' response with only amplitude differences between the different punishment levels, possibly indicating commitment to a choice with different decision thresholds or more generally the level of confidence in the impending response (Domenech and Dreher, 2010).
As with any decision-making paradigm involving a manipulation of reward or punishment, attention is bound to play a major modulatory role, whether to signal for the allocation of additional resources or to more directly affect decision-related processing (Maunsell, 2004; Peck et al., 2009; Anderson et al., 2011; Litt et al., 2011; Louie et al., 2011). To decipher which of our four punishment components were more related to overall changes in attention/alertness across the different punishment levels as opposed to actual decision-related processing, we exploited the single-trial variability in the EEG data, afforded by our multivariate analysis approach. We hypothesized that trial-by-trial changes in neural activity from decision-related components should be more predictive of choice behavior than components representing primarily global changes in attention and general arousal. Importantly, however, this does not preclude a potential influence of attention on the identified decision-related components themselves.
To test this formally, we first removed the overall influence of punishment from individual trials (and hence potential effects of overall arousal) by z-transforming the single-trial discriminator output values [yτ(z-scored)] for each punishment level separately. We then used these trial-by-trial fluctuations around the mean response as predictors of behavioral accuracy in a single (pooling data over all subjects) multiple logistic regression model [i.e., $P_{\mathrm{correct}} = 1/\left(1 + e^{-(\beta_0 + \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_3 + \beta_4 y_4)}\right)$]. Importantly, we found no multicollinearities for the four punishment predictors [variance inflation factors (VIFs) were 1.3, 1.5, 1.4, and 1.2, respectively; all values were <5; multicollinearity is considered high if VIF >5–10]. In addition, for individual participants (without any transformation of y values), the analysis confirmed that there were no multicollinearities between predictors [max(VIF) = 3.4721 < 5]. The third and fourth components were found to be significantly predictive of participants' probability of correct choice (β3/4 significantly greater than 0, t test, both p < 0.05) (Fig. 4). Interestingly, the fourth (latest) punishment component remained significantly predictive of choice behavior even when we expanded the logistic regression model to include the influence of two additional components identified when discriminating along the sensory evidence dimension (see below, Neural components associated with sensory evidence). In contrast, the first two components were not significantly predictive of behavioral accuracy (β1/2 not significantly greater than 0, t test, both p > 0.05). Importantly, the regression coefficients for the last two components were significantly greater than those of the first two components (β1/2 < β3/4, t test, p < 0.05).
Including the subject variable as an additional random factor to the regression revealed qualitatively the same results (β3/4 significantly greater than 0, p < 0.05; β1/2 not significantly greater than 0, p > 0.05; and β1/2 < β3/4, p < 0.05). Together, these findings suggest that the later discriminating components were more tightly associated with behavior than the early discriminating components.
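The two steps of this analysis (z-scoring the discriminator output within each punishment level, then regressing accuracy on the resulting fluctuations) can be illustrated with a minimal sketch. It is not the original analysis code: the data are simulated, only two hypothetical components (`y_early`, `y_late`) are used instead of four, and the model is fit by plain gradient ascent rather than a standard statistics package. All function and variable names are assumptions made for illustration.

```python
import math
import random
from statistics import mean, pstdev


def zscore_within_level(y, levels):
    """Z-score discriminator outputs separately for each punishment level."""
    out = [0.0] * len(y)
    for lev in set(levels):
        idx = [i for i, l in enumerate(levels) if l == lev]
        vals = [y[i] for i in idx]
        mu, sd = mean(vals), pstdev(vals)
        for i in idx:
            out[i] = (y[i] - mu) / sd
    return out


def fit_logistic(X, correct, lr=0.5, n_iter=1000):
    """Fit P(correct) = 1 / (1 + exp(-(b0 + sum_k b_k * x_k))) by gradient ascent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, c in zip(X, correct):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            p = 1.0 / (1.0 + math.exp(-eta))
            grad[0] += c - p
            for j, x in enumerate(row):
                grad[j + 1] += (c - p) * x
        # Step along the mean gradient of the log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Simulated trials (illustration only): both components carry a
# punishment-level offset, but only y_late drives accuracy.
rng = random.Random(0)
levels = [rng.choice([0, 5, 10]) for _ in range(300)]
y_early = [0.1 * lev + rng.gauss(0, 1) for lev in levels]
y_late = [0.1 * lev + rng.gauss(0, 1) for lev in levels]

z_early = zscore_within_level(y_early, levels)
z_late = zscore_within_level(y_late, levels)

correct = [
    1 if rng.random() < 1.0 / (1.0 + math.exp(-(1.0 + 1.5 * z))) else 0
    for z in z_late
]

beta = fit_logistic(list(zip(z_early, z_late)), correct)
print("beta0, beta_early, beta_late:", [round(b, 2) for b in beta])
```

Because the z-scoring is done within each level, the block-wise mean shift induced by punishment is removed before the regression, so the coefficients reflect only trial-by-trial fluctuations around each level's mean.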
The response-locked analysis revealed significant discrimination performance (10-versus-0 punishment levels) in the time range of 200 ms before to 250 ms after the response (Fig. 2B). Similar to the stimulus-locked analysis, clustering on the scalp topographies (aτ) of the resulting discriminating activity revealed at least three different response-locked components: (1) a first component 190–120 ms before the response; (2) a second component 70 ms before to 50 ms after the response; and (3) a third component 90–250 ms after the response (Figs. 2B, 5A). Interestingly, the scalp topographies of the first two response-locked components are qualitatively very similar to the last two stimulus-locked components, suggesting that punishment modulation emerged locked to the stimulus onset and persisted until the response, reaffirming that these components are more likely to be associated with the actual process of making the decision itself.
Analogous to the stimulus-locked analysis, for each of the three components, the mean discriminator output (yτ) was parametrically modulated by the amount of punishment and did not reflect a mere punishment absent/present effect (Fig. 5B). The discriminator output for the intermediate punishment condition [yτ(5), estimated by applying the spatial projection vectors resulting from the 10-versus-0 punishment discrimination] was situated between the 0 and 10 punishment conditions and was significantly different from each one of them, for all three components (paired t tests, all p < 1 × 10⁻⁴; Fig. 5B). A significant correlation between the overall accuracy changes attributable to the presence of punishment [i.e., accuracy(pun 10 + 5) − accuracy(pun 0)] in individual participants and the performance of the discriminator in separating the relevant conditions based on the neural data [i.e., yτ(10 + 5) − yτ(0)] was also present (mean r = 0.7633, all p < 1 × 10⁻³; Fig. 5C), confirming that the degree of behavioral adaptation was reflected in the degree of modulation of the relevant neural components.
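The two operations described here, projecting trials from the intermediate condition through weights learned on the 10-versus-0 discrimination and correlating neural with behavioral differences across participants, can be sketched as follows. The weights, sensor values, and per-participant numbers are made up for illustration and are not taken from the study; `project` and `pearson_r` are hypothetical helper names.

```python
import math


def project(trials, w, b=0.0):
    """Discriminator output y = w . x + b for each trial.

    Each trial x is a list of sensor amplitudes (one value per sensor).
    """
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in trials]


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((c - mv) ** 2 for c in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical two-sensor weights standing in for the 10-versus-0 discriminator
w = [0.5, -0.5]
trials_pun5 = [[1.0, 0.0], [2.0, 1.0]]
y_pun5 = project(trials_pun5, w)

# Made-up per-participant differences (punishment minus no punishment)
acc_diff = [-0.02, 0.01, 0.03, 0.08, 0.11]  # behavioral accuracy change
y_diff = [0.1, 0.3, 0.4, 0.9, 1.2]          # discriminator output change
r = pearson_r(acc_diff, y_diff)
print("y(5):", y_pun5, "r:", round(r, 3))
```

The key point is that the weights are never refit on the intermediate condition, so any ordering of yτ(0), yτ(5), and yτ(10) reflects the data rather than the training labels.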
The temporal profile of all response-locked components confirmed the parametric effect of punishment (Fig. 5D), whereas the gradual buildup of activity in the earlier components suggests that the process of evidence accumulation leading up to the decision is likely to be modulated by punishment. Finally, the strong punishment-induced effects arising well into the post-response period (third response-locked component) might point to a postdecision expected “reward signal” that is likely the result of an improved expected performance. Indeed, previous research showed that rewards and punishment are relatively coded, so that when all outcomes are losses, smaller losses (or no losses at all) will be perceived as rewards (Kim et al., 2006).
Neural components associated with sensory evidence
To test whether punishment influences early sensory processing or primarily later postsensory and decision-related activity, we also analyzed our data along the sensory evidence dimension (i.e., high vs low sensory evidence discrimination) to identify whether component activity associated with early sensory processing temporally overlaps with any of our punishment-related activations.
Our stimulus-locked analysis revealed two components in line with previous reports by Philiastides and colleagues: an early component, temporally consistent with the well-known N170 ERP component (Jeffreys, 1996; Halgren et al., 2000; Liu et al., 2000; Rossion et al., 2003; Philiastides and Sajda, 2006a,b; Philiastides et al., 2006), which is associated with early stimulus encoding, and a more persistent, postsensory component later in the trial (later than 340 ms after stimulus onset) reflecting the quality of the evidence entering the decision process itself (Figs. 6A, 7A). Importantly, the early component, which is associated with early sensory processing, appeared before the earliest punishment-induced effects, which in turn suggests that all of our punishment components are likely to represent postsensory processing stages such as top-down influences of attention and decision-related processing.
Consistent with previous reports (Philiastides and Sajda, 2006a,b, 2007), the later sensory evidence component, which has been shown to index, on a trial-by-trial basis, the quality of evidence used in the decision stage itself (Ratcliff et al., 2009), appeared in our response-locked analysis as well (peak discrimination performance ∼100 ms before the response; Fig. 6A), as evidenced by the similarities in component scalp topographies across the stimulus- and response-locked analyses (Figs. 6B, 7B). This finding suggests that this component activity starts out as being stimulus-locked but persists until the response (see its temporal profile in Fig. 7B), consistent with the notion that decision evidence is used/accumulated continuously until one commits to a choice. This late sensory evidence component followed the third punishment component in the stimulus-locked analysis (compare Figs. 2A, 6A). The order as well as the time difference (∼100 ms) of these two stimulus-locked components was also evident in the response-locked analysis. Note that, although in the response-locked analysis the first punishment component appeared before the first component of sensory evidence, the stimulus-locked analysis showed that the earliest stimulus-locked punishment component emerged clearly after the earliest stimulus-locked sensory-related component.
To provide additional support that late punishment effects might in fact represent decision-related activity, we performed an additional analysis, in which we capitalized on the fact that our late sensory evidence component is already known to be associated with the decision process itself (Philiastides and Sajda, 2006b; Philiastides et al., 2006; Ratcliff et al., 2009). Specifically, we applied the discriminating projection for the late sensory evidence component to the punishment trials and tested for potential punishment effects (Fig. 8). On average, component activity appeared to be modulated by punishment. Although there was no significant effect of punishment in a one-factor, within-subject repeated-measures ANOVA (F(2,21) = 1.771, p = 0.1826), post hoc t tests revealed that component amplitudes for the high punishment condition (10) were significantly higher than the no-punishment condition (one-tailed, paired t test t(21) = 1.9087, p = 0.035). This finding could be cautiously interpreted as a sign that punishment effects were present during the late sensory evidence component, which in turn would provide additional support to the notion that late punishment effects modulate decision-related activity. Although the scalp maps for the late sensory evidence and the late punishment component look different (possibly attributable to signal multiplexing from other sources/processes; for more details, see Discussion), these results point to partially overlapping neuronal sources echoing both punishment and sensory evidence effects.
Prestimulus oscillatory activity
In this study, we used a design with block-wise manipulation of punishment, and therefore we wanted to test whether differences in prestimulus (baseline) oscillatory activity across punishment blocks exist and the extent to which they predict the poststimulus punishment-related effects revealed by our multivariate discriminant analysis. Specifically, for each trial, we computed the amplitude spectrum of the EEG in the 500 ms preceding stimulus onset at each electrode by Fourier analysis. To establish whether there exists a parametric modulation in the amplitude of different oscillatory rhythms across punishment levels, we used linear regression to estimate the slope of change in spectral amplitudes as a function of the amount of punishment, on a sensor-by-sensor basis. We found a significant reduction in alpha spectral amplitudes with increases in punishment in a distributed set of frontal and occipitoparietal sensors (slopes significantly <0 across participants, t tests, all p < 0.05; Fig. 9A). In addition, gamma band amplitude decreased in frontal and increased in occipitotemporal sensors (slopes significantly different from 0, t tests, all p < 0.05; Fig. 9B).
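As a minimal sketch of this procedure for a single sensor, the code below computes a Fourier amplitude spectrum for a 500 ms prestimulus window, averages it within the alpha band, and estimates the slope of alpha amplitude across punishment levels by least squares. The sampling rate (200 Hz), the alpha band limits (8–12 Hz), and the simulated 10 Hz signals are assumptions made for illustration; they are not reported in this section. In the study, slopes were estimated per participant and then tested across participants.

```python
import cmath
import math


def amplitude_spectrum(x, fs):
    """Single-sided DFT amplitude spectrum; returns (freqs_hz, amplitudes)."""
    n = len(x)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # DC and Nyquist bins are not doubled in a single-sided spectrum
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        freqs.append(k * fs / n)
        amps.append(scale * abs(coef) / n)
    return freqs, amps


def band_amplitude(x, fs, lo, hi):
    """Mean spectral amplitude over frequency bins in [lo, hi] Hz."""
    freqs, amps = amplitude_spectrum(x, fs)
    band = [a for f, a in zip(freqs, amps) if lo <= f <= hi]
    return sum(band) / len(band)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


fs = 200      # hypothetical sampling rate (Hz)
n = fs // 2   # 500 ms prestimulus window -> 100 samples
levels = [0, 5, 10]
true_amp = {0: 3.0, 5: 2.0, 10: 1.0}  # simulated 10 Hz amplitude per level

# One simulated prestimulus segment per punishment level
trials = [
    [true_amp[lev] * math.sin(2 * math.pi * 10 * t / fs) for t in range(n)]
    for lev in levels
]

alpha = [band_amplitude(x, fs, 8.0, 12.0) for x in trials]
slope = ols_slope(levels, alpha)
print("alpha amplitude per level:", [round(a, 3) for a in alpha])
print("slope:", round(slope, 4))
```

A negative slope in this toy example corresponds to the reported reduction in alpha amplitude with increasing punishment.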
Next, to test whether prestimulus activity (in the alpha and gamma bands separately) is predictive of poststimulus punishment effects, we extracted single-trial spectral amplitudes (FFTAα or FFTAγ) from sensors of interest (those that exhibited significant effects above) and used these values to predict trial-by-trial variability in discriminator output (yτ) for each of the four stimulus-locked punishment components reported previously (i.e., yτ = β0 + β1 × FFTA, with FFTA taken from either the alpha or the gamma band). We found that baseline oscillatory phenomena from both the alpha and gamma frequency bands were predictive of trial-by-trial changes in poststimulus component activity in all four stimulus-locked punishment components (β1 values significantly different from 0 across participants, t test, all p < 0.01). These findings suggest that, at least for block-wise manipulations of punishment, baseline activity that likely reflects changes in attentional states is used to influence later, poststimulus processing.
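The single-trial regression and the across-participant test on its slope can be sketched as below. The data are simulated, the number of participants and trials and the sign and size of the simulated effect are arbitrary assumptions, and only the t statistic is computed (a p value would additionally require the t distribution with n − 1 degrees of freedom).

```python
import math
import random
from statistics import mean, stdev


def ols_fit(x, y):
    """Return (beta0, beta1) of y = beta0 + beta1 * x by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def one_sample_t(values):
    """t statistic for H0: mean == 0 (degrees of freedom: len(values) - 1)."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


rng = random.Random(1)
slopes = []
for _ in range(10):  # hypothetical participants
    # Simulated single-trial prestimulus alpha amplitudes
    ffta = [rng.gauss(5.0, 1.0) for _ in range(200)]
    # Simulated discriminator output that depends (arbitrarily) on alpha
    y = [0.2 - 0.3 * a + rng.gauss(0.0, 1.0) for a in ffta]
    _, b1 = ols_fit(ffta, y)
    slopes.append(b1)

t_stat = one_sample_t(slopes)
print("mean beta1:", round(mean(slopes), 3), "t:", round(t_stat, 2))
```

Fitting one regression per participant and testing the resulting β1 values against zero treats participants as the unit of inference, matching the "across participants" wording above.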
Discussion
We used EEG and a perceptual categorization task in which the degree of monetary punishment was manipulated to identify temporally distinct neural components affected by punishment. We used multivariate single-trial discriminant analysis to discriminate between low and high punishment conditions. Similarly, we manipulated the stimulus phase coherence to identify neural components that discriminate the amount of sensory evidence. Punishment-related activations followed early sensory processing and punishment-induced components correlated with punishment-induced accuracy improvements, both across and within participants. Finally, punishment induced differences in prestimulus oscillatory activity, which in turn predicted poststimulus trial-by-trial changes in neuronal responses.
Punishment-induced effects on decision-related activity appeared in four temporally distinct poststimulus components (Figs. 2A, 3A), whereas sensory evidence-induced effects appeared in two components (Figs. 6A, 7A). The comparison of the modulations induced by punishment and sensory evidence confirmed that punishment-induced components follow early sensory processing. The earliest sensory evidence-induced component (170–205 ms, consistent with the N170 ERP component reflecting early stimulus encoding; Jeffreys, 1996; Halgren et al., 2000; Liu et al., 2000; Rossion et al., 2003) preceded the earliest punishment-induced component (205–255 ms). After this component (and within the same early time window) a component showing greater response to low than high sensory evidence (temporal profile in Fig. 7A) resembled the “difficulty component” (Philiastides et al., 2006) that was shown to reflect top-down influence of attention on decision making.
The later component modulated by sensory evidence (350–460 ms) is also consistent with previously reported activity (Philiastides and Sajda, 2006b, 2007; Philiastides et al., 2006; Ratcliff et al., 2009) representing postsensory processing reflecting the quality of decision evidence. The timing of this component overlapped with the last two punishment components (315–395 and 405–465 ms), suggesting that these punishment activations are linked to decision-related information processing. Each of our components does not necessarily represent a single neuronal source but instead an aggregate of parallel-implemented processing stages (Heekeren et al., 2008; Engelmann et al., 2009; Cisek and Kalaska, 2010; Otto and Mamassian, 2012) correlating with the respective dimension of interest (e.g., evidence accumulation and top-down influence of attention). Consistent with this interpretation, punishment-induced effects as captured in the temporal profiles of the four punishment components (Fig. 3D) appeared persistent and overlapping in time. Because of this potential multiplexing of neuronal sources, dipolar fields projected onto the scalp could look different across punishment and sensory evidence components despite the fact that they might, in part, capture activity from similar sources.
The timing of the first two punishment components is consistent with the N2pc ERP component reported in attention studies (Mazza et al., 2009; Hickey et al., 2010; Sänger and Wascher, 2011). The N2pc is considered an index of covert attention (Hickey et al., 2010) and is enhanced by monetary reward during selection of competing stimuli (Sänger and Wascher, 2011). The timing and scalp distribution of these components are also consistent with the “difficulty” component reported previously (Philiastides et al., 2006), suggesting that punishment effects are mediated by improved, higher-level mechanisms involving top-down influences of attention (Heekeren et al., 2004; Egner and Hirsch, 2005; Philiastides and Sajda, 2006a, 2007; Zanto et al., 2011; Siegel et al., 2012).
In contrast, a multiple regression analysis revealed that only trial-by-trial changes in neural activity from the last two punishment components predicted trial-by-trial accuracy and that neuronal variability in these components was significantly more predictive of subjects' choices than the first two. This suggests that the last two components more likely reflect decision-related processing. The third component exhibited ramp-like activity (Fig. 3D) with the slope seemingly modulated by punishment, pointing to a potential influence of punishment on the process of temporal evidence accumulation (Gold and Shadlen, 2001, 2007; Ploran et al., 2007; Heekeren et al., 2008; Liu and Pleskac, 2011; Philiastides et al., 2011). The fourth component peaked near the response and exhibited a plateauing profile with primarily amplitude differences between punishment levels, possibly indicating commitment to a choice with different decision thresholds or confidence in the impending response (Domenech and Dreher, 2010).
Analogous to the stimulus-locked effects, we identified three response-locked punishment components. The temporal profile of all response-locked components confirmed the parametric effect of punishment (Fig. 5D). The third component appeared after the response, and its spatiotemporal profile corresponds to the postmotor potential (Makeig et al., 1996, 1999). This component might represent a postdecision “expected reward signal” in response preparatory structures, likely resulting from an improved expected performance (Iyer et al., 2010). In turn, this would suggest that, for optimal response selection, punishment affects premotor/motor cortex such that consequences associated with success or failure are appraised accordingly (Brown et al., 2011; Klein-Flügge and Bestmann, 2012).
Analogous to previous studies, we also found considerable inter-individual differences in punishment-induced behavioral effects (Bogacz et al., 2006; Diederich and Busemeyer, 2006; Diederich, 2008; Simen et al., 2009). The 10th, 50th, and 90th percentiles of the behavioral effects, defined as the difference in accuracy between the punishment and no-punishment conditions, were −2.63, 1.65, and 11.45%, respectively, indicating that, in some subjects, punishment impaired performance, whereas in others, it caused substantial behavioral improvement. Our ability to discriminate punishment and no-punishment neural activity tracked these inter-individual differences (Figs. 3C, 5C), demonstrating that the more one was affected by punishment, the greater the modulation of the relevant electrophysiological signatures. This established that punishment components reflect quantitative information necessary to influence behavior.
In contrast to the identified EEG components, accuracy appeared to be modulated by punishment in an all-or-none manner that could be attributable to a nonlinear relationship between observed behavior and neural activation, as indicated in previous studies (Gold and Shadlen, 2007). Additionally, the effect of punishment on performance could be limited by the stimulus quality (available evidence); that is, participants could have reached maximal performance given stimulus quality in the intermediate punishment condition, such that increases in motivation or arousal would not additionally improve performance. The fact that we did not sample the punishment dimension more finely, coupled with the considerable inter-individual differences, precludes strong inferences about the general nature of the observed behavioral effects on accuracy.
Our block-wise manipulation of punishment allowed us to test whether differences in baseline activity across different punishment blocks exist. Punishment modulated the spectral amplitudes of alpha and gamma bands, supporting the notion that it boosts attention as reduced amplitude/power in the alpha band is associated with increased levels of attention and top-down processing (van Dijk et al., 2008; Romei et al., 2010; Gould et al., 2011; Hanslmayr et al., 2011). Increased amplitude/power in the gamma band is consistent with increased representations of sensory evidence in visual areas (Siegel et al., 2007) and improved communication between areas attributable to attention (Andino et al., 2005; Siegel et al., 2008; Wyart and Tallon-Baudry, 2008; Gregoriou et al., 2009). Furthermore, both alpha and gamma frequency bands were predictive of trial-by-trial changes in stimulus-locked punishment components, suggesting that changes in prestimulus attentional states influence poststimulus processing during perceptual decision making.
The scalp topographies of these punishment-induced changes in spectral amplitudes resemble the human attentional network (Desimone and Duncan, 1995; Hopfinger et al., 2000; Corbetta and Shulman, 2002; Corbetta et al., 2008). Prefrontal and parietal cortices have consistently been implicated in top-down executive control and control of attention (Desimone and Duncan, 1995; Shulman et al., 1997; Kastner et al., 1999; Egner and Hirsch, 2005; Corbetta et al., 2008) that could be involved in controlling the baseline rhythms seen here because punishment affects attention and motivation (Seymour et al., 2007). Alternatively, the trial-by-trial relationship between prestimulus oscillatory activity and poststimulus punishment components could reflect an increased level of general arousal that persisted throughout the trial attributable to higher punishment expectations (Roesch and Olson, 2003; Maunsell, 2004; Knutson and Greer, 2008).
However, attention-mediated punishment effects on motivation could be specific to circumstances in which stimuli are presented very briefly and slow responses are penalized. This penalty could explain why we did not find differences in RTs across punishment conditions because participants might not have used additional time to make more cautious decisions (Potts, 2011). Another possibility for the absence of RT effects could be that, although sensory information was accumulated faster, subjects simultaneously increased their internal decision threshold for the response (Wrase et al., 2007; Nomoto et al., 2010).
Starting at the second stimulus-locked punishment component, activity gradually transitions from centrofrontal to parietal sensors (Fig. 3A). This is consistent with previous findings showing earlier frontal-only and later parietal attention-orienting activity (Grent-'t-Jong and Woldorff, 2007), suggesting that attention might exert control throughout the information-processing stream, including late decision-related activity. Additionally, punishment is likely to be involved in integrating a number of distinct representation, learning, and action systems (Seymour et al., 2007). Consistent with this view, recent evidence suggests that the amygdala plays a major modulatory role in punishment-induced motivation (Murty et al., 2012) and that reinforcement and punishment can cause broadly distributed activations throughout the brain (Vickery et al., 2011). Here, we focused primarily on the timing information provided by the EEG signals and their relationship to behavioral output to provide an interpretation of our punishment-induced neuronal components. Future work using simultaneously acquired EEG–fMRI data could provide a more comprehensive spatiotemporal characterization of the influence of punishment on perceptual decision making.
In conclusion, our results indicate that, during perceptual decision making, punishment increases attention/motivation, which in turn yields more efficient decision processing. The very nature of our design, in which errors were punished independently of stimulus category, provides additional support that punishment modulated decision-making efficacy rather than inducing specific criterion shifts to either one of the perceptual categories used in the task. In line with this interpretation, our data revealed postsensory punishment-induced effects in a highly distributed network, in which top-down influences of attention appear to play a major modulatory role on decision making.
Footnotes
This work was supported by the Max Planck Society, German Research Foundation Grant HE 3347/2-1 (H.R.H.), and Royal Society Research Grant RG110054 (M.G.P.).
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Marios G. Philiastides, University of Nottingham, School of Psychology, NG7 2RD, UK. marios.philiastides{at}gmail.com