A Postdecisional Neural Marker of Confidence Predicts Information-Seeking in Decision-Making

Theoretical work predicts that decisions made with low confidence should lead to increased information-seeking. This is an adaptive strategy because it can increase the quality of a decision, and previous behavioral work has shown that decision-makers engage in such confidence-driven information-seeking. The present study aimed to characterize the neural markers that mediate the relationship between confidence and information-seeking. A paradigm was used in which 17 human participants (9 male) made an initial perceptual decision, and then decided whether or not they wanted to sample more evidence before committing to a final decision and confidence judgment. Predecisional and postdecisional event-related potential components were similarly modulated by the level of confidence and by information-seeking choices. Time-resolved multivariate decoding of scalp EEG signals first revealed that both information-seeking choices and decision confidence could be decoded from the time of the initial decision to the time of the subsequent information-seeking choice (within-condition decoding). No above-chance decoding was visible in the preresponse time window. Crucially, a classifier trained to decode high versus low confidence predicted information-seeking choices after the initial perceptual decision (across-condition decoding). This time window corresponds to that of a postdecisional neural marker of confidence. Collectively, our findings demonstrate, for the first time, that neural indices of confidence are functionally involved in information-seeking decisions. SIGNIFICANCE STATEMENT Despite substantial current interest in neural signatures of our sense of confidence, it remains largely unknown how confidence is used to regulate behavior. Here, we devised a task in which human participants could decide whether or not to sample additional decision-relevant information at a small monetary cost. Using neural recordings, we could predict such information-seeking choices based on a neural signature of decision confidence. Our study illuminates a neural link between decision confidence and adaptive behavioral control.


Introduction
Humans seek information adaptively to improve the quality of their decisions, for example, by requesting expert advice when they lack the relevant domain knowledge or discriminating evidence (for review, see Bonaccio and Dalal, 2006). This tendency has been documented for everyday financial (Hung and Yoong, 2010) and medical (Butler et al., 2009) decisions, as well as in more carefully controlled laboratory settings (Sniezek and Van Swol, 2001). Recent evidence suggests that information-seeking depends crucially on explicit representation of decision confidence (Desender et al., 2018). Theoretically, decision confidence has been treated as an internal evaluation signal that can be used to adapt behavior in the absence of external feedback (Yeung and Summerfield, 2012;Meyniel et al., 2015). When confidence in a decision is low, this implies that the probability of a decision being correct is also low, and seeking additional information before committing to a decision might be particularly beneficial. In a previous behavioral study, participants engaged more in information-seeking in conditions associated with low compared with high confidence, despite equal accuracy in both (Desender et al., 2018).
At present, however, it remains unclear how the neural coding of confidence informs the decision to engage in additional information-seeking. It has been argued that decision confidence reflects the strength of the evidence in favor of a decision (Vickers, 1979;Zylberberg et al., 2012), stressing the importance of predecisional evidence in the computation of confidence (Kiani and Shadlen, 2009). On such a view, neural signals involved in the decision process itself should be predictive of informationseeking (e.g., the P3 component of the scalp-recorded EEG) (O'Connell et al., 2012;Twomey et al., 2015). Alternatively, decision confidence has been quantified as a function of continued evidence accumulation following a decision (Pleskac and Busemeyer, 2010;Moran et al., 2015), stressing the importance of postdecisional signals in the computation of confidence. Recently, a postdecisional centroparietal positivity was found in scalp EEG recordings that reflects postdecisional neural evidence accumulation, informing judgments about the accuracy of the preceding decision . This signal is commonly referred to as the error positivity (Pe) (Nieuwenhuis et al., 2001) and has been shown to reflect fine-grained variations in decision confidence (Boldt and Yeung, 2015).
The current study aimed to identify neural signatures of confidence that are predictive of information-seeking behavior. In our experimental paradigm, on each trial, participants made an initial perceptual decision about the mean color of eight visual elements, and then decided whether or not to sample additional evidence (at a small cost) before committing to a final decision. Electrophysiological recordings allowed us to evaluate which neural signatures of confidence were related to informationseeking choices. To do so, we relied on time-resolved decoding of EEG data to test for shared neural coding of confidence and decisions to seek additional information. Specifically, we tested whether a multivariate classifier trained to decode confidence from EEG data would be predictive of information-seeking choices. Such between-condition generalization isolates neural processes integral to translating decision confidence into the overt decision to sample additional information. Our main question of interest was whether information-seeking choices could be predicted based on neural markers of confidence observed predecisional (P3), postdecisional (Pe), or both.

Materials and Methods
Participants Seventeen participants (9 males, mean Ϯ SD age: 24.1 Ϯ 3.0 years; range: 21-32 years) took part at Oxford University for monetary compensation (£20 plus up to £4.92 dependent on performance, range of the rounded actual payments: £22-£24). The data of 2 participants were excluded because accuracy of the primary responses was at chance level (48.9% and 49.6% correct); thus, the final sample comprised 15 participants. All provided written informed consent, reported normal or corrected-tonormal vision, and were naive with respect to the hypothesis. All procedures were approved by the local ethics committee.

Stimuli and apparatus
Stimuli were presented on a gray background on a 20-inch CRT monitor with a 75 Hz refresh rate, using the MATLAB toolbox Psychtoolbox3. Each stimulus consisted of eight colored shapes spaced regularly around a fixation point (radius 2.8°visual arc). To manipulate task difficulty, the mean and the variance of the eight elements varied across trials. The mean color of the eight shapes was determined by the variable C; the variance across the eight shapes was determined by the variable V. The mean color of the stimuli varied between red (1, 0, 0) and blue (0, 0, 1) along a linear path in RGB space (C, 0, 1 Ϫ C). At the start of the experiment, C could take four different values: 0.450, 0.474, 0.526, and 0.550 (from blue to red, with 0.5 being the category boundary), and V could take two different values: 0.0333 and 0.1000 (low and high variance, respectively). The current task was used because previous research has shown it can dissociate subjective confidence from objective accuracy, by matching the difficulty of two conditions that either have a low mean (i.e., average color close to the category boundary) or a high variance (i.e., a varied mix of colors across the eight shapes in the display), with confidence found to be systematically lower in the latter case (Boldt et al., 2017a;Desender et al., 2018). On every trial, the color of each individual element was pseudo-randomly selected with the constraint that the mean and variance of the eight elements closely matched the mean of C and its variance V, respectively. Each combination of C and V values occurred equally often. The individual elements did not vary in shape. Responses were made using a USB mouse and a standard QWERTY keyboard. Figure 1 shows an example trial during the main part of the experiment. After a 200 ms fixation interval, the stimulus was flashed for 200 ms. Participants were instructed to respond as quickly as possible, deciding whether the average color of the eight elements was more blue or more red, by clicking one of two mouse buttons. The mapping between color and response was counterbalanced between participants. After a 200 ms postresponse interval, there was a choice phase consisting of two condi- Figure 1. Timeline of an experimental trial. A stimulus was presented for 200 ms, and participants made a speeded response with the mouse, deciding whether the average color of the eight elements was red or blue. On free-choice trials (75%), participants subsequently used a vertical slider to choose either to see the stimulus again in an easier version (by moving the gray cursor toward S) or to give their response (by moving the gray cursor toward R). When the stimulus is shown again, the mean of the eight elements is more clearly red and the variance is smaller (note that the displayed change is exaggerated for illustration purposes). On no-choice trials (25%), participants could only choose to give their response. Finally, on all trials, participants jointly indicated their final response and level of confidence on a horizontal continuous response scale. Being accurate was rewarded (5 points), errors were punished (Ϫ5 points), and there was a small cost associated with sampling more information (Ϫ1 point).

Procedure
tions: free-choice or no-choice. On free-choice trials (75% of trials), the letters R and S appeared, indicating that participants could either choose to request additional evidence by seeing the stimulus again in an easier version (S) or to give their response (R). They indicated their choice by moving a gray slider up or down with their mouse toward their choice, and confirmed by pressing the space bar (locations of R and S were fixed across trials and counterbalanced across participants). On no-choice trials (25% of the trials), only an R appeared, and participants were forced to select the option to give their response.
When participants chose to see the stimulus again, the values of the stimulus were slightly altered so that the distance to the boundary was higher (CЈ ϭ C Ϯ 0.01) and the variability lower (VЈ ϭ V Ϫ0.0167), making the discrimination easier. They were presented with a fixation point for 400 ms, this easier stimulus for 200 ms, and a fixation point for 400 ms. When participants opted (or were forced) to give their response without seeing the stimulus again, they simply viewed the fixation point for the same total amount of time (1000 ms).
Afterward, a horizontal response scale appeared (0.4°high and 9.0°w ide) with a slider (0.4°high and 0.1°wide) in the center. The left-hand side of the bar was labeled as "sure blue," and the right-hand side was labeled as "sure red" (depending on the counterbalancing of the response mapping). The location of the slider on the scale was translated into a numerical score, ranging from Ϫ50 (sure blue) to 50 (sure red), with every three screen-pixel increments (0.09°) resulting in a difference of 1 confidence point. Participants moved the cursor with their mouse to indicate jointly their response and their level of confidence, and confirmed by pressing the space bar. A response could not be given when the cursor was exactly in the middle (0 on the scale), so participants were always forced to make the categorical judgment between red or blue. They were instructed to make this judgment at their own pace. Accuracy for the final response was scored as a binary variable (i.e., ignoring confidence level). Confidence was scored as the absolute value on the response scale. To account for between-participant variation in use of the confidence scale and drift in confidence judgments over the course of the experiment, ratings were z-scored separately for each participant and each block.
Participants gained 5 points for correct answers and lost 5 points for errors. They could win up to an additional £4.92 by scoring points (650 points ϭ £1). Choosing to see the stimulus again in an easier version cost 1 point, giving participants an incentive to sample more information only when the benefit of doing so (in terms of increasing the probability of making a correct choice) outweighed this cost. Participants were explicitly instructed that they could score more points by strategic use of the see again option.
The main part of the experiment comprised 10 blocks of 64 trials, with balanced numbers of trials for each combination of mean and variance separately for each trial type (free-choice vs no-choice), in pseudorandomized order. Each block started with 8 additional practice trials in which the free-versus no-choice phase was omitted and participants received auditory feedback on the accuracy of their responses. This was done to maintain a stable color criterion over the course of the experiment. Before the main part of the experiment, several practice blocks were administered. In the first block (64 trials), participants practiced the color judgment task, and only were to give a speeded response with the mouse, with auditory feedback to signal decision accuracy. In Blocks 2 and 3 (64 trials each), the second response (including the confidence judgment) was added to the task. No feedback was delivered during these blocks, which served to familiarize participants with the confidence rating scale. Finally, practice Block 4 was identical to the main part of the experiment.
At the end of each block (starting from Block 2), the C value in the low-mean condition was adjusted depending on performance in that block. Specifically, an inverse efficiency score (median RT/p(correct)) was calculated for the condition with low mean and low variance and the condition with high mean and high variance (across all trials). When the difference between the two was Ն100/50/10 in absolute value, the C value of the low-mean condition was adjusted by 0.0025/0.0012/0.0005, respectively, depending on the sign of the difference to match performance in these two conditions. This manipulation was performed with the in-tention of creating two conditions with equated performance but different levels of confidence (Boldt et al., 2017a;Desender et al., 2018). However, because in the current data there were small but consistent performance differences between conditions and no significant difference in confidence (see Results), our key analyses were performed regardless of condition. Importantly, however, additional analyses were performed to rule out that our findings are driven by task difficulty.

EEG recording and preprocessing
Participants sat in a dimly lit, electrically shielded room. EEG data were recorded using a fabric cap (QuickCap, Neuroscan) with 32 channels, all referenced to the right mastoid online. Vertical and horizontal electrooculogram was measured from above and below the left eye and the outer canthi of both eyes. Impedance was kept at Ͻ50 k⍀. The data were continuously recorded using SynAmps2 amplifiers (Neuroscan), sampled at 1000 Hz. Stimulus-locked data were baselined Ϫ100 ms to 0 ms before stimulus onset. In addition, these data were aligned to the time of the primary response while keeping the same prestimulus baseline. Independently from this, the raw data were also locked to the primary response with a baseline Ϫ100 to 0 ms before response onset. For analyses locked to the information-seeking decision, the response-locked data (keeping the same preresponse baseline) were realigned to the onset of the information-seeking decision (i.e., the space bar press confirming the decision). In preprocessing, segments containing gross artifacts were first identified by visual inspection and removed. Next, eye blinks were removed using independent component analysis, and segments containing values Ϯ200 V were excluded using extreme value rejection. Bad (noisy) channels were replaced by an interpolated weighted average from surrounding electrodes using the EEGLAB toolbox (Delorme and Makeig, 2004) in MATLAB (The MathWorks). Finally, segments containing further artifacts, identified by visual inspection, were removed before averaging. For plotting purposes only, data were filtered using a 10 Hz low pass filter.

Statistical analyses
Behavioral analysis. To test whether our different measures of performance lawfully scaled with difficulty, indices of performance were calculated separately for the factors mean and variance. This was done for median response times (RTs) on correct trials and mean accuracy of the primary response (calculated based on all trials), mean accuracy of the secondary response, mean confidence on the secondary response (calculated on no-choice data), and the proportion of see again choices (calculated on free-choice data). A repeated-measures ANOVA with the factors mean (high or low) and variance (high or low) was then performed on all these indices.
Next, we fitted a mixed-regression model to the data to test whether confidence predicts information-seeking over and above other factors. This analysis cannot be performed at the trial level because confidence and see again choices are measured on separate parts of the data (nochoice vs free-choice, respectively). Therefore, for all variables, we computed eight data points (2 levels of mean ϫ 2 levels of variance ϫ 2 colors), separately for each participant. Specifically, we computed (1) mean confidence based on the no-choice data, (2) the proportion of see again choices based on the free-choice data, (3) mean accuracy of the primary response based on all data, and (4) median RTs on the primary response based on all data, separately for the factors evidence variance (high or low), evidence mean (high or low), and color (red or blue). Values were calculated separately for each color to partition the data in a more fine-grained manner, but this variable was not taken into account in the analyses. For ease of interpretation, low variability and high mean were dummy coded as reference categories so that a positive effect of each factor corresponds to an increase in difficulty. We used mixed-regression modeling (using the lme4 package in R) (Bates et al., 2015) to construct models of increasing complexity. For each model, random slopes were added for all variables for which this significantly increased the fit compared with that model without random slopes. When required, degrees of freedom were estimated using Satterthwaite's approximation (using the lmerTest package) (Kuznetsova et al., 2014).
Event-related potentials (ERPs). In a first set of analyses, we wanted to confirm that our data showed the usual modulation of predecisional and postdecisional event-related components as a function of decision con-fidence. Building on previous work, ERP indices of confidence and see again choices were tested at electrode CPz (Boldt and Yeung, 2015). Significant time windows during which ERPs for high and low confidence trials (or for respond and see again trials) differed from each other were identified using a standard two-tailed within-subjects cluster-based permutation test using custom code in MATLAB. Elements that were adjacent and significant (element-level p Ͻ 0.05) were collected in a cluster. Cluster-level test statistics consisted of the absolute sum of t values within each cluster, and these were compared with a null distribution of test statistics created by drawing 1000 random permutations of the observed data. A cluster was considered significant when its (clusterlevel) p value was Ͻ 0.05. To examine whether the modulation of the ERPs by confidence extended over and above our manipulation of task difficulty, multivariate regression analyses were performed on each time point (separately for each participant) predicting single-trial EEG amplitude at electrode CPz by the factors mean (high or low), variance (high or low), the interaction between mean and variance, and confidence (high or low). We then performed cluster-based permutation tests on the t values associated with each factor to examine whether confidence explains unique variance in EEG data after task difficulty is accounted for.
Time-resolved decoding. To test for overlap in the neural coding of confidence and see again choices, a classifier was trained separately for each participant using single-trial logistic regression based on the linear derivation method introduced by Parra et al. (2005). This approach identifies the spatial distribution of scalp EEG activity in a given time window that maximally distinguishes two classes to deliver a scalar estimate of component amplitude. Three sets of analyses were performed. First, it was tested how well information-seeking choices and how well decision confidence can be decoded from EEG data (within-condition decoding). These analyses were a first step in identifying the time window during which information-seeking choices and decision confidence are decodable. Confidence judgments on no-choice trials were median split into high and low values. In both analyses, both the training and testing sets were restricted to trials with correct responses on the primary decision only, to dissociate the coding of information-seeking and confidence from the coding for errors. A 10-fold cross-validation approach was used to avoid overfitting. To reduce noise due to the random assignment of single trials in each fold, we repeated this procedure 100 times, resulting in 1000 classifier predictions per time point. These scores were then averaged, and below we report these averaged classification values. For the third set of analyses, we tested how well see again choices can be decoded from a classifier trained to discriminate EEG activity associated with differing levels of confidence. The decoder was trained to discriminate high versus low confidence judgments (on correct trials only, from no-choice trials) and tested on see again versus respond choices on correct trials from free-choice data; thus, there was no overlap between training and testing data. This approach was repeated 1000 times. Variability across iterations arose due to the random selection of data necessary to obtain an equal number of high and low confidence trials in the training data and an equal number of see again and respond trials in the free-choice data.
To increase the signal-to-noise ratio of data used for decoding, classifiers were trained on time-averaged signals within discrete temporal windows (window width of 106 ms, moving in 10 ms increments along entire epochs aligned to stimulus onset, initial decisions, and see again choices). The ability to successfully classify individual trials was quantified by calculating the Az score, which gives the area under the receiver operating characteristic curve, derived from signal detection theory (Stanislaw and Todorov, 1999). Classifiers were trained and tested on each time point of the EEG data. Thus, this method produces a 2D (training time ϫ testing time) decoding performance matrix. Specific dynamics of mental representations can be unraveled by evaluating the shape of the decoding matrix (King and Dehaene, 2014). The statistical reliability of this decoding was determined via a bootstrap procedure, comparing actual classification with classifier performance on trials with randomized condition labels (1000 iterations with different randomizations), to provide an estimate of the null classification. Clusters were formed via paired-samples t tests for the entire 2D matrix, comparing true and null classifications. Neighboring elements that passed a threshold value corresponding to an (element-level) p value of 0.01 (two-tailed) were collected into a separate cluster. The same results were obtained when using a more liberal thresh-old of 0.05. Elements were considered as neighbors when they were (cardinally or diagonally) adjacent. Cluster-level test statistics consisted of the absolute sum of t values within each cluster, and these were compared with a null distribution of test statistics created by drawing 1000 random permutations of the observed data. A cluster was considered significant when its (cluster-level) p value was Ͻ0.05.

Behavioral results
Performance, confidence, and information-seeking We first confirmed that all behavioral measures (performance, confidence, and information-seeking) scaled with task difficulty (Fig. 2). For the primary perceptual decision (speeded color discrimination) data, a repeated-measures ANOVA showed that both median RTs on correct trials and mean accuracy were significantly affected by color mean (RTs: F (1,14) ϭ 14.96, p ϭ 0.002; accuracy: F (1,14) ϭ 72.94, p Ͻ 0.001) and by color variance (RTs :  F (1,14) ϭ 27.89, p Ͻ 0.001; accuracy: F (1,14) ϭ 38.37, p Ͻ 0.001), but not by their interaction (RTs: F Ͻ 1; accuracy: F Ͻ 1). Mean accuracy of the final response (regardless of the level of confidence), calculated on the data of the no-choice condition in which participants were forced to respond without viewing the stimulus again, was likewise affected by both color mean (F (1,14) ϭ 45.55, p Ͻ 0.001) and color variance (F (1,14) ϭ 23.56, p Ͻ 0.001), but the interaction failed to reach significance (F (1,14) ϭ 4.32, p ϭ 0.056). Mean confidence, again calculated on the data of the no-choice condition (to avoid effects of seeing the stimulus again), was affected by color mean (F (1,14) ϭ 32.18, p Ͻ 0.001) and color variance (F (1,14) ϭ 21.73, p Ͻ 0.001), but not by their interaction (F Ͻ 1). Finally, on free-choice trials, participants on average chose to see the stimulus again in an easier version on 43.2% (range 2%-94%) of the trials. The proportion of see again choices was affected by mean (F (1,14) ϭ 21.75, p Ͻ 0.001) and variance (F (1,14) ϭ 20.02, p Ͻ 0.001), with no significant interaction between these factors (F (1,14) ϭ 3.22, p ϭ 0.09). The large variation in see again choices did not correlate with individual differences in overall mean confidence on forced choice trials (r (13) ϭ 0.22, p ϭ 0.428, Bayes Factor ϭ 0.26) or accuracy on the primary response (r (13) ϭ 0.36, p ϭ 0.18, BF ϭ 0.47). Replicating previous work (Desender et al., 2018), participants chose more often to see the stimulus again in the high mean/high variance compared with the low mean/low variance condition (t (14) ϭ 2.47, p ϭ 0.027), even though, if anything, they made numerically fewer errors in the former condition ( p ϭ 0.189). A one-way ANOVA on see again proportions across the two medium-difficulty conditions, including the accuracy difference as a covariate, confirmed the main effect of condition (F (1,13) ϭ 8.92, p ϭ 0.011), which was not modulated by differences in accuracy (F (1,13) ϭ 2.75, p ϭ 0.139). RTs were slower in the high mean/high variance compared with the low mean/low variance condition (t (14) ϭ Ϫ2.819, p ϭ 0.014), and unexpectedly, there was no difference in confidence between these conditions ( p ϭ 0.570). Because the two conditions of medium difficulty did not differ in terms of confidence, we analyzed the EEG data combined across conditions and then performed additional analyses to demonstrate that the observed effects are not explained by task difficulty.

The relation between decision confidence and information-seeking
To further interrogate potential sources of variability in observed information-seeking behavior, mixed-regression models of increasing complexity were fit to the data predicting variation in the proportion of see again choices across conditions of the experi-mental design (see Materials and Methods) by five predictors: the factors color variance (high or low) and color mean (high or low), and the variables median primary RTs (both on free-choice and no-choice), mean accuracy of the primary response (both on free-choice and no-choice), and mean confidence (on no-choice trials). A model building strategy was used (Table 1). The experimental variables mean and variance explained a significant part of the variance in information-seeking (Model 1), and measures of primary task performance (accuracy and RT) significantly increased the fit (Model 2). Crucially, adding confidence to a model that already contained these four variables provided the best fit to the data. As predicted, in this final model (Model 3), there was a clear negative effect of confidence on see again choices (␤ ϭ Ϫ0.08, t (85.63) ϭ Ϫ3.09, p ϭ 0.003), whereas the effects of RT (␤ ϭ 0.15, p ϭ 0.065) and variance (␤ ϭ 0.03, p ϭ 0.069) were no longer statistically significant. The effects of accuracy and mean were not significant either (both p values Ͼ 0.10). Thus, although mean, variance, RTs, and accuracy explained a significant amount of variation in information-seeking in simpler models, their statistical contributions were largely accounted for by confidence.

EEG analysis ERP markers of confidence
To examine how ERP waveforms are modulated by confidence, correct trials in the no-choice condition were split into high and low confidence bins via median-split separately for each participant. As can be seen in Figure 3A, at electrode CPz, the stimuluslocked ERP showed significant modulation by confidence from 414 to 581 ms ( p ϭ 0.016, cluster level), corresponding to a typical P300 component (Hillyard et al., 1971;Polich, 2007). Mean amplitude was more positive for trials on which participants later indicated high (vs low) confidence. By contrast, the ERP aligned to and following the primary response showed the opposite pattern: more negative amplitudes for high confidence trials were observed from 403 ms after response until the end of the analyzed epoch (700 ms; p ϭ 0.008, cluster level), corresponding to a typical Pe component (Ridderinkhof et al., 2009). The topographies associated with these significant time windows showed that these effects had a similar centroparietal scalp distribution.

Postdecisional ERPs are modulated by confidence over and above task condition
We next examined whether the modulation of the ERPs by confidence extended over and above our manipulation of task difficulty. To do so, we used multivariate regression analyses to examine whether confidence explains unique variance in EEG data after task difficulty is accounted for. Figure 4 shows the  average t value for each predictor from the multivariate regression on each time point. As can be seen, after controlling for task difficulty, there is no consistent effect of confidence in the stimulus-locked data (no significant elements in the clusterforming step). None of the factors capturing task difficulty was significant ( p values Ͼ0.079). In the response-locked data, we observed a modulation of the EEG data by confidence, even when controlling for task difficulty, occurring between 481 and 615 ms after response ( p ϭ 0.042). Slightly earlier in time, there also was a significant main effect of variance between 225 and 517 ms after response ( p ϭ 0.021). Finally, in the data locked to the information-seeking choice, there was no effect of confidence (no significant elements in the cluster-forming step), but a significant main effect of mean between Ϫ196 ms before and 13 ms after the information-seeking choice ( p ϭ 0.007). In sum, in the stimuluslocked P3 cluster, we found no reliable modulation by confidence after controlling for task difficulty; whereas in the responselocked Pe cluster, we did observe a modulation by confidence beyond the effect of experimentally induced task difficulty. Figure 3B shows the ERP waveforms at CPz, separately for see again and respond directly choices, again averaged over correct trials only. The results closely mirror the modulation of the ERPs by confidence. The stimulus-locked ERPs showed more positive amplitudes on respond compared with see again trials, from 357 until 534 ms ( p ϭ 0.028, cluster level). The response-locked ERPs showed more negative amplitudes on respond compared with see again trials, from 250 ms until the end of the epoch (700 ms, p Ͻ 0.001, cluster level). Finally, the latter difference remained significant up until 29 ms before the information-seeking decision ( p ϭ 0.008, cluster level). Similar to confidence, all significant effects had a clear centroparietal scalp distribution, although the stimulus-locked P3 component for information-seeking choices has a slightly more anterior scalp distribution than that for confidence.

ERP markers of information-seeking choices
In sum, analysis of the ERPs provides preliminary evidence for a link between confidence and information-seeking, given that both processes have very similar neural markers. In the following, we use multivariate single-trial decoding to provide a more rigorous appraisal of this link, testing the informational content of these neural markers (i.e., whether information-seeking choices can be decoded reliably) and whether neural markers of confidence are predictive of information-seeking behavior.

Time-resolved decoding
Within-condition decoding of information-seeking choices and decision confidence We first tested whether information-seeking choices can be decoded over time from the EEG data. This analysis is a first step in identifying the time window during which information-seeking can be decoded. In this analysis, classifiers were both trained and tested on data from free-choice trials, using 10-fold crossvalidation as described above. Classifiers were trained and tested on each point in time, thus shedding light on the generalization of the discriminate pattern over time. In the stimulus-locked matrix (Fig. 5A), no robust decoding was possible in the prestimulus period (all cluster p values Ͼ0.143). The decoder was only trained on data up until 400 ms after stimulus (13th percentile of RTs) to avoid contamination from postresponse data (from trials with short RTs). To complement this, the same analysis was repeated after the data were realigned to the time of the response (but keeping the same prestimulus baseline). Again, no decoding was visible before response (all p values Ͼ0.156), but there was a significant cluster that largely fell after response ( p ϭ 0.040; Fig.  5B). Next, we decoded EEG data measured in the postresponse period, using a conventional preresponse baseline. This analysis revealed that, in marked contrast to the preresponse epoch, it was possible to classify see again choices reliably across the entire epoch from the initial task response to the subsequent see again choice: There was a highly significant cluster that began just before primary response execution ( Fig. 5C; p Ͻ 0.001, cluster level) and peaked just before the decision of whether or not to sample more information (Fig. 5D; p Ͻ 0.001, cluster level). Decoding performance was strongest along the diagonal, indicating best decoding when the cross-validation test data came from the same time period as the classifier training data. Nevertheless, both clusters also displayed significant off-diagonal decoding, indicating that the cross-validation test data could be predicted above chance level, even when classifier training data were obtained from a different time window. This strong temporal generalization suggests a neural activity pattern that is consistent and sustained over time during the formation of information-seeking choices.
Next, we used the same approach to examine whether decision confidence could be decoded from EEG data. This analysis used no-choice trials only, to avoid any contaminating effects of the see again choice itself. In the stimulus-locked matrix (Fig. 5E), no robust decoding was observed (all cluster p values Ͼ0.785). Complementing this, when these preresponse data were re- aligned to the response (keeping the same prestimulus baseline), no significant clusters were observed (all cluster p values Ͼ0.195; Fig. 5F ). In contrast, in the postresponse period, there was a highly significant cluster ( p ϭ 0.002) during which decoding of confidence was possible (Fig. 5G). Although this cluster was limited to ϳ400 ms after response, it extended up to 550 ms after response when using a more liberal cluster threshold of 0.05. Finally, in the data locked to the information-seeking decision (Fig. 5H ), there were no significant clusters ( p values Ͼ0.410).
Collectively, the preceding analyses suggest more robust decoding of confidence and see again choices based on postresponse EEG than preresponse EEG data. A final set of analyses aimed to test these postresponse vs preresponse differences more directly. To this end, we first identified clusters in the preresponse matrices that came closest to significance in the permutation analyses (comprising neighboring pixels with a significant effect, uncorrected), as representing the strongest observed decoding in the preresponse period. We then selected a corresponding number of pixels in the postresponse cluster, centered on its peak, and compared decoding accuracies between clusters. When decoding see again choices, decoding accuracy in the postresponse cluster (Fig.  5C) was significantly higher than the closest-to-significant cluster in the stimulus-locked matrix ( Fig. 5A; 5 pixels, t (14) ϭ 15.14, p Ͻ 0.001) and the preresponse matrix with a prestimulus baseline ( Fig. 5B; 34 pixels, t (14) ϭ 26.83, p Ͻ 0.001). The same was true when decoding confidence (higher decoding than in the closest-to-significance stimulus-locked cluster in Fig. 5E; 3 pixels, t (14) ϭ 3.69, p ϭ 0.002; and the closest-to-significance cluster in the preresponse data with a prestimulus baseline in Fig. 5F; 23 pixels, t (14) ϭ 5.52, p Ͻ 0.001). To confirm that these results were not specific to the particular cluster sizes used, we confirmed that corresponding results were observed when taking clusters of 3 ϫ 3, 5 ϫ 5, 7 ϫ 7, and 9 ϫ 9 pixels around the peak pixel in each matrix.
In sum, both information-seeking choices and decision confidence could be decoded from EEG data, but only during the time window following the primary task response. No abovechance decoding was visible when classifiers were trained on EEG data from the preresponse window.

Across-condition decoding of information-seeking choices by confidence
While the previous analyses already hinted at the importance of postdecisional neural activity for information-seeking choices, those analyses are uninformative about the role of confidence in this process. Our final set of analyses provided a more direct test of the hypothesis that confidence underpins adaptive information-seeking. Specifically, we tested whether an EEG classifier trained to decode confidence on no-choice trials (i.e., in which participants were not asked to decide whether or not to seek further information before their final decision) would be able to predict see again choices on free-choice trials (i.e., on the separate set of trials in which participants had the option of seeking or declining additional information). Importantly, on nochoice trials participants did not make an information sampling choice themselves but were forced to select the option to give their response. This rules out the possibility that our classifier is decoding incidental processes related to this choice, such as motor-related neural activity resulting from the informationseeking choice. Moreover, to ensure that our classifier was decoding decision confidence, and not achieving above-chance classification by exploiting other correlated features of the data (e.g., trials in which participants changed their mind), only trials on which participants were correct in both their initial and their final decision were used to train the classifier.
In the resulting stimulus-locked matrix, there was no sign of above-chance decoding (all p values Ͼ0.289; Fig. 6A). A complementary analysis in which these data were realigned to the time of the response (keeping the same prestimulus baseline) also showed no reliable preresponse decoding (all p values Ͼ0.218; Fig. 6B). The response-locked data, by contrast, showed significant decoding from ϳ350 to 700 ms after the response ( p ϭ 0.008, cluster level; Fig. 6C). Thus, a decoder trained to predict confidence from EEG data from no-choice trials was able to predict information-seeking decisions from EEG data recorded on free-choice trials, specifically in this postresponse window. As becomes clear from the associated topography in Figure 6C, the scalp projections show a very similar scalp distribution to the Pe that was observed in the ERPs. Importantly, across-condition decoding was not driven by a joint relation of confidence and information-seeking to task difficulty (e.g., stimulus variance): the same postresponse cluster was found ( p ϭ 0.022) when the factors mean (high or low), variance (high or low), and their interaction were first regressed out from the EEG data (from both the training and test datasets) on each time point at each electrode, separately for each participant. Similar results were observed in a further control analysis where these factors were regressed out from participants' confidence ratings before running the EEG decoding analysis. Conversely, after confidence (high or low) was regressed out from the EEG data, a classifier trained to decode task difficulty itself (specifically, whether a trial had an easy high mean/low variance stimulus vs a difficult low mean/ high variance stimulus) did not robustly generalize across conditions to predict information-seeking choices (p values Ͼ0.101), nor was there reliable decoding of task difficulty itself (p values Ͼ0.405; within-condition decoding). Together, these findings provide further evidence that confidence predicts information-seeking behavior, even when controlling for possible confounding effects of objective task difficulty. Significant decoding was not observed in analysis of data time-locked to the information sampling decision ( Fig. 6D; all cluster p values Ͼ0.256).
Our final set of analyses directly compared decoding accuracy in postresponse versus preresponse periods. These analyses revealed that decoding accuracy in the postresponse cluster ( Fig.  6C) was significantly higher than the closest-to-significance cluster in the stimulus-locked matrix ( Fig. 6A; 1 pixel, t (14) ϭ 3.29, p ϭ 0.005), and numerically higher compared with a cluster in the preresponse data with a prestimulus baseline ( Fig. 6B; 36 pixels, t (14) ϭ 1.56, p ϭ 0.140). To check that these results were not specific to the particular cluster sizes used, we confirmed that corresponding results were observed when taking clusters of 3 ϫ 3, 5 ϫ 5, 7 ϫ 7, and 9 ϫ 9 pixels around the peak pixel in each matrix.

Discussion
The current study examined whether neural markers of decision confidence are predictive of information-seeking behavior. To this end, we recorded scalp EEG while participants performed a task in which they first made an initial decision about a stimulus, then chose whether or not to sample more information, before providing their final response and level of confidence. Information-seeking choices and confidence similarly modulated predecisional (P3) and postdecisional (Pe) ERP components. Using multivariate classification, we then showed that information-seeking choices could be decoded from EEG data from the time of the initial decision to the time of the subsequent information-seeking choice (within-condition decoding). However, no above-chance decoding was visible preceding the initial decision. Crucially, a classifier trained to decode high versus low confidence generalized to prediction of information-seeking choices (across-condition decoding), and this too was restricted to a post-RT window. The time period during which we observed robust across-condition generalization (i.e., from no-choice trial confidence ratings to free-choice trial information-seeking behavior) corresponds to that of a postdecisional neural marker of decision confidence, suggesting that the latter reflects a neural process integral to translating one's subjective sense of confidence into overt decisions to sample more information.
When confronted with difficult decisions, humans seek further information to improve the quality of their decisions. Unsolicited information does not affect decisions, whereas decisions are more accurate when additional information is actively solicited (Hung and Yoong, 2010). This suggests that the act of seeking further information is driven by an internal evaluation. A straightforward hypothesis is that humans internally compute the probability of making a correct decision; and when this probability is low, they seek additional information. Indeed, in a recent behavioral study, we were able to demonstrate that explicitly represented decision confidence predicts information-seeking, even across conditions matched for objective difficulty (Desender et al., 2018). In contrast to that finding and other previous work (Spence et al., 2016;Boldt et al., 2017a), in the current study we did not observe a difference in confidence between the two medium difficulty conditions (high mean/high variance vs low mean/low variance) of the same task. This discrepancy might in part reflect the failure of our psychophysical staircase to match behavioral performance across these conditions. Participants made perceptual decisions significantly more slowly and numerically more accurately with high variance stimuli, suggesting a more cautious response strategy that would tend to inflate confidence (Vickers and Packer, 1982). Moreover, in the current study, confidence was only calculated from no-choice data, which constituted 25% of all data, thus making our design less sensitive to find subtle differences between two specific conditions. Despite this, using mixed modeling, we were able to demonstrate that decision confidence, not objective accuracy or stimulus difficulty, was the main variable predicting information-seeking.
The current work significantly extends our previous behavioral findings by characterizing the neural signatures integral to translating decision confidence into overt information-seeking choices. In particular, we could predict information-seeking behavior based on confidence; however, we could do so only in postdecisional EEG activity, even though averaged ERP waveforms varied significantly as a function of confidence and see again choices also in the predecisional period. As such, our findings converge with theoretical work arguing that postdecisional evidence accumulation plays a critical role in confidence judgments (Pleskac and Busemeyer, 2010;Moran et al., 2015;Fleming and Daw, 2017) and subsequent actions. Depending on the strength of the postdecisional evidence (i.e., reflecting the degree of confidence), participants will seek additional information or not.
The postdecisional neural marker observed in the current work closely resembles the classical Pe component of the ERP (Ridderinkhof et al., 2009). This neural marker has been suggested to reflect an evidence accumulation signal evaluating the likelihood that the just-executed response was incorrect (Steinhauser and Yeung, 2010;Murphy et al., 2015). Accordingly, the amplitude of this signal (reflecting the amount of evidence for an error) has been shown to scale inversely with decision confidence (Boldt and Yeung, 2015). Our observation of this signal after participants made a primary decision in both the no-choice and free-choice conditions suggests that, in both conditions, evidence is accumulated about the likelihood of this decision being correct. This evaluation process could, in principle, be used to guide both immediate, binary information-seeking choices (by imposing a single threshold on the evolving tally of error evidence)  and later confidence reports (by translating the final continuous evidence tally into a correspondingly graded expression of confidence). The present finding of significant across-condition decoding provides important support for this idea by showing that both confidence reports and information-seeking choices appear to be reflected in the same underlying neural signal.
An alternative interpretation of our results could be that the postdecisional neural marker we describe reflects the internal decision to seek more information, rather than a common signal that can be leveraged for making both information-seeking decisions and graded confidence reports. In other words, perhaps participants made internal information-seeking decisions on nochoice trials despite there being no explicit requirement to do so, and the neural signal that is central to our analyses reflects this. Such an explanation of our findings seems unlikely for two reasons. First, our postdecisional neural signal highly resembles the well-characterized Pe component, both in time and scalp distribution (Ridderinkhof et al., 2009). The Pe is reliably observed in paradigms that do not require any overt postdecisional response, which makes it unlikely that it reflects a signal specifically related to this additional response. Second, if this postdecisional neural signal directly reflects the information sampling choice, it should be related to systematic biases that are observed in such choices. We indeed observed large interindividual differences in the tendency to prefer one of the information-seeking options, with some participants biased toward respond choices (N ϭ 10, on average 23% see again choices, RT see again ϭ 1245 ms vs RT respond ϭ 800 ms, t (9) ϭ 4.18, p ϭ 0.002) and others toward see again choices (N ϭ 5, on average 73% see again choices, RT see again ϭ 797 ms vs RT respond ϭ 953 ms, t (4) ϭ Ϫ2.02, p ϭ 0.11). For both subgroups, however, the postdecisional neural marker was of higher amplitude when participants chose to see more information (replicating Fig. 3B; both p values Ͻ0.001). This observation is hard to reconcile with the idea that postdecisional neural activity reflects processes related specifically to the information-sampling choice, but are compatible with our interpretation that this activity indexes an evaluation of the accuracy of the just-executed response, which then informs explicit confidence reports and information-seeking choices.
Neither within-condition nor across-condition decoding analyses yielded above-chance decoding of confidence or information-seeking choices in the time window occurring before the response. This lack of decoding is striking given that predecisional neural activity (corresponding to the well-studied P3 component) was found to be sensitive to confidence and information-seeking in univariate analyses. In this regard, it is important to note that not all neural signals that are modulated by confidence are also involved in the representation of explicit (i.e., subjective) confidence (Pouget et al., 2016). Several studies showed predecisional neural markers that are sensitive to the level of confidence (Kiani and Shadlen, 2009;Philiastides, 2015, 2018;Odegaard et al., 2018). For example, Odegaard et al. (2018) demonstrated that opt-out decisions (presumably reflecting low confidence) are associated with weak traces of neural evidence in monkey superior colliculus occurring before the response. However, using a positive-evidence manipulation that dissociates evidence quality from confidence, they were able to show that the predecisional neural activity tracks evidence quality, not subjective confidence (both of which are typically closely associated). A similar observation was made in the current work. Significant modulation of the predecisional neural marker by confidence disappeared once stimulus variability (i.e., a measure of evidence quality) was taken into account. Together, this might explain why predecisional neural markers are modulated by confidence but do not predict informationseeking choices.
Our findings are of relevance to research on the role of uncertainty in action control (Daw et al., 2005;Yu and Dayan, 2005).
From a Bayesian perspective, uncertainty (i.e., the inverse of confidence) can be used as a cue for behavioral control (Daw et al., 2005). From this, it follows that decision confidence should predict strategic decisions, such as whether or not to sample more evidence (Meyniel et al., 2015). Our findings are the first empirical demonstration that decision confidence and the adaptive act of information-seeking share the same neural signal. Another relevant connection to our work is research on exploitationexploration dilemmas. When forced to decide from which of several patches to harvest, participants are faced with the dilemma between exploiting a known patch or exploring unknown but potentially more rewarding patches. Empirical and modeling work has demonstrated that participants' uncertainty about these choices arbitrates between the two strategies. Consistent with normative theories (Sutton and Barto, 2018), participants actively explore unknown patches to gain information and reduce their uncertainty (Badre et al., 2012;Cavanagh et al., 2012). Interestingly, recent research has indicated an important role for subjective confidence in this process (Boldt et al., 2017b). Specifically, when confidence in value representations was low, participants actively explored unknown patches. Although the underlying construction of confidence may differ (confidence in value representation vs confidence in the accuracy of a decision), in both cases, low confidence triggers the need for information-seeking.
In conclusion, it was shown that a classifier trained to decode confidence based on EEG data was able to predict informationseeking decisions, specifically in the time window following a speeded decision about the stimulus. This suggests that postdecisional neural processes are integral to translating decision confidence into overt decisions to seek further information.

Notes
All raw data can be freely accessed at https://osf.io/z7umt/.