Abstract
Some decisions, such as selecting a food item in a novel menu, are not based on rational norms, or on trained habits, but on subjective preferences. How the human brain makes these preference-based decisions is still debated in cognitive neuroscience. Classical models focus on the comparison mechanism that achieves the selection of the option with the best expected value. Recent models suggest that estimates of option values are refined until reaching sufficient confidence in the considered choice. Neuroimaging studies in humans and electrophysiology studies in animals have gathered evidence that value and confidence estimates are both represented in the medial and lateral regions of the orbitofrontal cortex (OFC). Here, we took advantage of electrodes implanted within the OFC of human patients with pharmacoresistant epilepsy (14 women, 12 men) to investigate whether value and confidence estimates can be dissociated in electrophysiological activity recorded during preference-based binary decisions. The overall value (likeability ratings summed over options) and choice confidence (selection probability of the chosen option) were identified in low-frequency (4–8 Hz) OFC activity. These value and confidence signals were time-locked to the decision, showed opposite signs of correlation, and were recorded in separate sites. This pattern of results is not consistent with the simulations of an attractor neural network model implementing a comparison of option values. However, it is compatible with the notion of a neural network generating sparse representations of option values and choice confidence estimates, based on which decisions can be made.
Significance Statement
The orbitofrontal cortex (OFC) is known to play a critical role in decisions based on subjective preferences, such as choosing between food items in a menu. However, the information provided by the human OFC has remained elusive, due to limitations of neuroimaging techniques. Here, taking advantage of electrodes implanted in patients for clinical purposes, we present a rare dataset of electrophysiological activity recorded during preference-based decisions. Our analyses suggest that the OFC signals two distinct constructs on which decisions could be based: the subjective value of available options and the confidence in the intended choice.
Introduction
Cheese or dessert? Bowling or dancing? Art or science? Many choices, from mundane activities to career paths, are based on subjective preferences rather than normative principles. How the human brain makes these so-called preference-based decisions is a central question in cognitive neuroscience.
At the computational level, the decision process is traditionally decomposed into (1) valuation of choice options and (2) selection of the best option. Valuation is understood as an integration over attributes of choice options, and selection is reduced to a comparison between option values (Glimcher and Rustichini, 2004; Rangel et al., 2008; Padoa-Schioppa, 2011). However, these simple computational accounts only solve the problem of which option should be selected, not when the decision should be made. To address that issue, evidence accumulation models were borrowed from theories of perceptual decision and applied to preference-based decisions (Krajbich et al., 2010; Philiastides et al., 2010). These models assume that the relative preference for a given option is accumulated over time, until it reaches a bound that triggers the choice. Although passive accumulation makes sense for perceptual evidence, which is a noisy indicator of some external feature (such as light), it cannot capture the formation of subjective preference, which must be internally constructed from information stored in memory. For this reason, accumulation models were reframed as a tradeoff between the time invested in deliberation and the confidence gained in preference estimation (Lee and Daunizeau, 2021; Bénon et al., 2024). In this perspective, the brain would need to estimate both the value of choice options and the confidence in the forthcoming decision.
At the neural level, value and confidence estimates were both linked to activity in the orbitofrontal cortex (OFC). In humans, fMRI studies identified the ventromedial prefrontal cortex (vmPFC) as a key valuation node, notably during likeability rating or even during distractive tasks (Lebreton et al., 2009; Levy et al., 2011; Suzuki et al., 2017; Shenhav and Karmarkar, 2019). In nonhuman primates, single-unit electrophysiology studies found value signals in both the vmPFC and the lateral OFC (Tremblay and Schultz, 1999; Padoa-Schioppa and Assad, 2006; Strait et al., 2014; Abitbol et al., 2015). Bridging across techniques and species, we previously recorded intracerebral EEG (iEEG) activity from electrodes implanted in human patients with pharmacoresistant epilepsy and found evidence for value signals during likeability rating, in both the vmPFC and the lateral OFC (lOFC), which we designate together by the global label “OFC” (Lopez-Persem et al., 2020). Confidence was also related to vmPFC activity by human fMRI studies, during both perceptual decisions (Bang and Fleming, 2018; Gherman and Philiastides, 2018; Rouault et al., 2023) and preference-based decisions (De Martino et al., 2013).
However, the dissociation of value and confidence signals is not straightforward. It is easier during rating tasks, in which confidence varies as a quadratic (U-shaped) function of value (Lebreton et al., 2015; De Martino et al., 2017). It is trickier during choice tasks, in which OFC activity typically reflects the difference between chosen and unchosen option values (Gläscher et al., 2009; Chau et al., 2014; Gherman and Philiastides, 2018). Despite being related to value comparison, this difference signal is close to a notion of confidence, defined as the subjective probability of making the right choice. A seminal MEG study (Hunt et al., 2012) concluded that the vmPFC reflects first the sum and then the difference of option values. Using fMRI, we generalized this distinction to identify brain regions signaling value and confidence across rating and choice tasks (Clairis and Pessiglione, 2022). There was an overlap, with value being represented in more posteroventral and confidence in more anterodorsal parts of the medial prefrontal cortex. Thus, fMRI recordings suggest a spatial gradient, whereas MEG recordings suggest a temporal dissociation between value and confidence signals.
Leveraging iEEG spatial and temporal resolution, our aim here was to test whether value and confidence signals, recorded within the OFC while patients performed preference-based decisions, could be dissociated in space and/or time.
Materials and Methods
Participants
Thirty-five participants (37.9 ± 10.7 years, 21 females) were informed and gave written consent to their inclusion in the study. Participants were the same as described in our previous study (Lopez-Persem et al., 2020). They were patients with pharmacoresistant focal epilepsy who were stereotactically implanted with multilead depth electrodes as part of a preresection procedure. Surgical implantations and iEEG recordings took place in three different epilepsy departments: Lyon (n = 18), Grenoble (n = 6), and Paris (n = 11). Procedures were approved by the French Ethics Committee (CPP 09-CHUG-12, study 0907 for Lyon and Grenoble; CPP Paris VI, Pitié-Salpêtrière Hospital, INSERM C11–16 for Paris).
Recordings
Intracerebral activity was recorded in patients after the stereotactical implantation of multilead depth electrodes (as described in Lachaux et al., 2003; Lopez-Persem et al., 2020). In Grenoble and Lyon, 12–18 semirigid 0.8-mm-wide electrodes, with 6–18 leads of 2 mm positioned 1.5 mm apart (Dixi), were implanted in each patient, depending on the targeted region. Anatomical localization of electrode contacts was determined by positioning individual stereotactic scheme in the proportional atlas of Talairach and Tournoux (1988), after adjustment for brain size. Brain activity was recorded using an audio–video–EEG monitoring system (Micromed), equipped with 128 or 256 depth-EEG channels sampled at 512 Hz (0.1–200 Hz bandwidth). A contact located in the white matter served as a reference.
In Paris, brain activity was recorded using a NeuraLynx system (ATLAS, NeuraLynx), via 4–12 platinum contact, 1-mm-wide, 1.6-mm-long, nickel–chromium wired electrodes (AdTech). The contacts were anatomically localized on the basis of postimplant CT scans coregistered to preimplant 1.5 T MR scans. A bandpass filter (0.1–1,000 Hz) was used, and the least active electrode (in the white matter whenever possible) was set as reference. Anatomical localization in the MNI space was automatically recovered using the EpiLoc toolbox (v.V1, STIM engineering facility at the Paris Brain Institute; García-Pérez et al., 2015).
Before preprocessing, all contacts were rereferenced to their nearest neighbor on the same electrode, yielding bipolar derivations for signal analysis.
Experimental tasks
Participants completed two tasks: a likeability rating task, in which they rated how much they would like to receive each of the items presented sequentially on screen, and a binary choice task, in which they chose which of two simultaneously presented items they would prefer to receive. The results related to the rating task have already been reported (Lopez-Persem et al., 2020).
Behavioral tasks used in Paris were programmed on a PC using Matlab 2013 and the Cogent 2000 (Wellcome Department of Imaging Neuroscience) library of Matlab functions for stimulus presentation. Behavioral tasks used in Lyon and Grenoble were programmed using Presentation software (v.16.5, Neurobehavioral Systems). A set of 60 food items was used in Paris (for 60 rating and 60 choice trials) and 120 food items in Lyon and Grenoble (for 120 rating and 120 choice trials). All trials started with a fixation cross lasting for 1,500 ± 500 ms in the rating task and for 2,500 ± 500 ms in the choice task. There was no time limit for making the response (rating or choice).
In the likeability rating task (Fig. 1A, top panel), participants rated how much they liked food items, presented one by one in a random order, on a 21-step scale (from −10 to 10). For each rating, the initial position of the cursor was randomized. Using their right hand, participants could move it by pressing the left and right arrows on the keyboard and then validate its final position by pressing the space bar.
Behavioral tasks and results. A, Example task trials. Successive screenshots are shown from left to right, with durations in milliseconds (except for responses, which were self-paced). Top panels, in the rating task (short version), participants rated the likeability of food items by moving a cursor on a visual analog scale as reported in Lopez-Persem et al. (2020). The cursor was moved using the left and right arrows of the keyboard, and the rating was validated by pressing the space bar. Bottom panels, in the choice task (short version), participants selected their preferred food item by pressing the left or right arrow of the keyboard. B, Behavioral results of participants with electrodes implanted in the orbitofrontal cortex (OFC, n = 26). Top graph, Choice rate was fitted using a logistic regression against the signed decision value (difference between left and right option values). Bottom graph, Response time (RT) was fitted using a linear regression against the unsigned decision value. In both graphs, error bars and shaded areas are interparticipant SEM of observed data and model fits, respectively. See Extended Data Figure 1-1 for the distributions of value, confidence, and RT plotted as a function of distance and conditioned on choice consistency.
Figure 1-1
Distributions of value, confidence and RT conditioned on choice consistency. Val, Conf and RT are plotted as a function of distance (unsigned difference between option values), with light and dark colors for consistent versus inconsistent choices (i.e., for when the best-rated option was chosen versus not chosen). Each dot is a choice trial. Lines show group-level means within two bins obtained by median-splitting the distance, separately for consistent and inconsistent trials. Error bars are standard error of the mean.
In the binary choice task (Fig. 1A, bottom panel), participants expressed their preference between food items presented two by two. Likeability ratings were used to pair the food items such that in half the trials, the mean option value was varied while the distance was kept constant, and vice versa for the other half. The mean value is the average of likeability ratings and distance the unsigned difference between the two. The position of food items within a pair (left or right of fixation cross) and the presentation order of option pairs within a block (constant mean or constant distance) were pseudorandomized. Participants indicated which item they preferred by pressing the left or right arrow of the keyboard.
Behavioral data analysis
Only trials with 0.1 < response time (RT) < 10 s were included in the analysis (15 trials removed out of 2,398 trials in total). Likeability ratings (z-scored) were taken as option values for the analysis of binary choices. We first tested the psychometric properties of choice behavior to check that participants had understood the tasks, with (1) a logistic regression of choice against the signed difference between option values and (2) a linear regression of RT against the distance (unsigned difference) between option values. Unless otherwise specified, all regressions were conducted at the individual level and tested for significance at the group level, using two-tailed, paired t tests (random-effect analysis). All statistical analyses were performed using Matlab Statistical Toolbox (Matlab R2020a, The MathWorks).
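The two psychometric checks described above can be illustrated for a single participant with a minimal, standard-library Python sketch. It is not the authors' Matlab code: the function names (`fit_logistic`, `fit_slope`) and the toy trial data are hypothetical, and the logistic model is fitted here by Newton-Raphson, which is an assumption about the fitting procedure.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * xi     # gradient w.r.t. slope
            h00 += w                # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def fit_slope(x, y):
    """Least-squares slope of y against x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical trials: signed decision value (left minus right, z-scored ratings),
# choice (1 = left chosen) and response time in seconds.
signed_dv = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
chose_left = [0, 0, 1, 0, 1, 0, 1, 1]
rt = [1.4, 1.8, 1.9, 2.2, 2.1, 1.7, 1.8, 1.5]

b0, b1 = fit_logistic(signed_dv, chose_left)   # choice ~ signed difference
distance = [abs(d) for d in signed_dv]         # unsigned difference
rt_slope = fit_slope(distance, rt)             # RT ~ distance
```

In the analysis described above, the individual slopes (one per participant) would then be tested against zero at the group level.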
ROI definition
The automated anatomical labeling (AAL; Tzourio-Mazoyer et al., 2002) atlas was restructured as explained in our previous study (Lopez-Persem et al., 2020). The vmPFC ROI was defined by merging the regions labeled as gyrus rectus and frontal medial orbital and the lOFC ROI by merging the frontal superior orbital and the frontal middle orbital regions. Coordinates of recording sites were calculated by averaging the MNI coordinates of the two contacts composing the bipolar derivation. The original dataset included a total of 4,273 recording sites in 35 participants. Among the 3,440 recording sites remaining after removal of those with low-quality signal, 204 sites were located within one of our OFC ROIs (vmPFC + lOFC; Fig. 2).
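The coordinate rule for bipolar derivations is simple enough to state in code. This is a minimal sketch with a hypothetical function name and made-up contact coordinates:

```python
def bipolar_site_coordinates(contact_a, contact_b):
    """MNI coordinates of a recording site: midpoint of its two contacts."""
    return tuple((a + b) / 2.0 for a, b in zip(contact_a, contact_b))

# Hypothetical pair of adjacent contacts on the same electrode (x, y, z in mm).
site = bipolar_site_coordinates((10.0, 40.0, -20.0), (13.5, 40.0, -20.0))
```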
Anatomical localization of recording sites in the orbitofrontal cortex (OFC). Each recording site was positioned in the MNI space and labeled according to the AAL atlas, as explained in Lopez-Persem et al. (2020). Colored voxels show the location of recording sites in the ventromedial prefrontal cortex (vmPFC, 66 sites in cyan) and the lateral orbitofrontal cortex (lOFC, 138 sites in magenta). Frontal slices at the top and axial slices at the bottom correspond to planes illustrated with vertical and horizontal bars, respectively, on the lateral view of the brain (in the top left corner). Numbers indicate their MNI y and z coordinates (in millimeters).
Electrophysiological signal processing
Recorded iEEG activity was analyzed using the FieldTrip Matlab toolbox for electrophysiological analysis (http://www.ru.nl/neuroimaging/fieldtrip; Oostenveld et al., 2011) as well as homemade Matlab scripts. Derivations were computed between adjacent recording sites, from the same electrode, yielding a bipolar montage. Because contributions from nonlocal assemblies were canceled out, the signal was considered as originating from a cortical volume centered in between the two contacts, referred to as the recording site (Jerbi et al., 2009). The signal was bandpass filtered (2–200 Hz), and the 50 Hz line noise was notch-filtered out.
Next, the iEEG signal was decomposed using a “multitapering” time–frequency transform (Slepian tapers, lower frequency range, 2–32 Hz, six cycles and 3 tapers per window; higher frequency range, 32–200 Hz, fixed time windows of 240 ms, 4–31 tapers per window). To enable more precise power estimation, smoothing was adaptively increased across frequencies by using a constant number of cycles across frequencies up to 32 Hz (hence a time window that shortens when frequency increases) and a fixed time window with an increasing number of tapers above 32 Hz.
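To make the low-frequency window rule concrete: with a constant number of cycles, the window duration is the number of cycles divided by the frequency, so it gets shorter as frequency increases. The helper below is a hypothetical illustration, not part of the FieldTrip API.

```python
def window_duration(freq_hz, n_cycles=6):
    """Duration (s) of a time window spanning n_cycles at freq_hz."""
    return n_cycles / freq_hz

low_edge = window_duration(4.0)    # six cycles at 4 Hz
high_edge = window_duration(32.0)  # six cycles at 32 Hz
```

Under this rule, six cycles span 1.5 s at 4 Hz but only 187.5 ms at 32 Hz.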
For the frequency-specific analysis, continuous iEEG signals were first bandpass filtered in sub-bands of 1 Hz width using a zero-phase shift, noncausal, finite impulse response filter with 0.5 Hz roll-off. The envelope of each sub-band was then computed using the standard Hilbert transform. The resulting envelope signal, i.e., time-varying amplitude, was downsampled to 64 Hz (duration of a time sample, 15.625 ms) and normalized (divided by its mean across the entire recording session and multiplied by 100). A single time series was then estimated by averaging envelopes across sub-bands. Of note, the mean value of time series over the entire session is 100 for each band, by construction. This procedure was meant to counteract the bias favoring lower frequencies induced by the 1/f drop-off in amplitude.
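The normalization and averaging steps can be sketched as follows, assuming the sub-band envelopes have already been extracted (filtering and the Hilbert transform are omitted). Function names and the toy amplitudes are hypothetical.

```python
def normalize_envelope(envelope):
    """Express an envelope as a percentage of its mean over the session."""
    mean_amp = sum(envelope) / len(envelope)
    return [100.0 * v / mean_amp for v in envelope]

def band_time_series(subband_envelopes):
    """Average the normalized envelopes across sub-bands, sample by sample."""
    normalized = [normalize_envelope(e) for e in subband_envelopes]
    n_bands = len(normalized)
    return [sum(samples) / n_bands for samples in zip(*normalized)]

# Two hypothetical sub-bands: the lower one has a much larger raw amplitude.
lower = [8.0, 12.0, 10.0, 10.0]
higher = [1.0, 1.0, 0.5, 1.5]
band = band_time_series([lower, higher])
```

Because each sub-band is rescaled to a mean of 100 before averaging, the sub-band with the larger raw amplitude does not dominate the resulting time series.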
Electrophysiological data analysis
Analyses of iEEG activity were first focused on the standard θ band (4–8 Hz), because it matches both the low-frequency range in which value and confidence signals were initially observed (Hunt et al., 2012) and the main cluster of choice-evoked increase in power (Fig. 3A). Frequency-specific time series of iEEG activity recorded from each site were epoched on trial durations and time-locked either to stimulus onset (option display) or to response selection (button press). For each time point, the signal was then regressed across trials against a general linear model (GLM):
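As a rough illustration of this time-resolved regression, the sketch below fits an ordinary least-squares GLM at each time point, after orthogonalizing Conf with respect to Val. It is a minimal pure-Python sketch, not the authors' Matlab code: the function names are hypothetical, the orthogonalization order (Val first, then Conf) is an assumption, and the regressors of no interest (baseline, trial number, RT) are passed as optional extra columns.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(columns, y):
    """Least-squares betas for y ~ intercept + columns (normal equations)."""
    design = [[1.0] * len(y)] + [list(c) for c in columns]
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in design] for ci in design]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in design]
    return solve_linear_system(gram, rhs)

def orthogonalize(target, reference):
    """Mean-center target and remove the part linearly explained by reference."""
    mt, mr = sum(target) / len(target), sum(reference) / len(reference)
    tc = [t - mt for t in target]
    rc = [r - mr for r in reference]
    k = sum(t * r for t, r in zip(tc, rc)) / sum(r * r for r in rc)
    return [t - k * r for t, r in zip(tc, rc)]

def glm_time_course(power, val, conf, nuisance=()):
    """power[trial][time] -> list of (beta_val, beta_conf), one per time point."""
    conf_orth = orthogonalize(conf, val)  # assumption: Val enters first
    betas_over_time = []
    for t in range(len(power[0])):
        y = [trial[t] for trial in power]  # one value per trial at time t
        betas = ols([val, conf_orth, *nuisance], y)
        betas_over_time.append((betas[1], betas[2]))
    return betas_over_time
```

The resulting per-site time courses of regression estimates would then be tested against zero across recording sites.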
Val and Conf signals in low-frequency iEEG activity. For each recording site located in the OFC, trial-by-trial iEEG time series were time-locked either to stimulus onset (left panels) or button press (right panels). The time window is restricted to the interval around choice (RT95% is the duration at which choice is still ongoing in 95% of trials; −RT95% is the time at which options are already on screen in 95% of trials). The same analysis was applied to each site and results were then averaged across sites. A, Time–frequency decomposition of the evoked response. Power at each frequency is corrected for baseline activity (mean over a 1 s prestimulus time window). Contour lines indicate significant clusters surviving Bonferroni’s correction for multiple comparisons. For the results of the same analysis considering each separate subregion (the vmPFC and the lOFC), see Extended Data Figure 3-1. B, Time course of Val and Conf regression estimates (β). At each time point, power in the θ range (4–8 Hz) was regressed against Val and Conf, which were orthogonalized and included in the same GLM. Significance level (uncorrected p-value) is estimated using a t test of regression estimates against zero, across recording sites. Horizontal red dotted lines indicate the Bonferroni-corrected statistical threshold; p-values are highlighted in bold when they survive correction based on random-field theory (RFT). For the results of the same analysis considering each separate subregion (the vmPFC and the lOFC), see Extended Data Figure 3-1. C, Variants of the response-locked analysis presented in B, with Val and Conf regression coefficients estimated from two separate GLMs (top graph) or with explained variance instead of regression estimates (bottom graph). In the latter case, chance level under the null hypothesis was estimated using permutations (see Materials and Methods).
Figure 3-1
Val and Conf signals in vmPFC versus lOFC activity. For each recording site located in either the vmPFC (n = 66, left panels) or the lOFC (n = 138, right panels), trial-by-trial iEEG time series were locked either to stimulus onset or button press. Vertical dotted lines indicate the time at which choice is still ongoing in 95% of trials (RT95%). The same analysis was applied to each site and results were then averaged across sites. A) Time-frequency decomposition of the evoked response. Power at each frequency is corrected for baseline activity (mean over a 1-s pre-stimulus time window). Contour lines indicate significant clusters surviving Bonferroni correction. B) Time course of Val and Conf regression estimates and significance levels. At each time point, power in the θ range (4-8 Hz) was regressed against Val and Conf. Significance level (uncorrected p-value) is estimated using a t-test of regression estimates against zero, across recording sites. Horizontal red dotted lines indicate the Bonferroni-corrected statistical threshold; p-values are highlighted in bold when they survive correction based on random-field theory (RFT).
Statistical significance of regression estimates was computed across recording sites (n = 204), using two-tailed one-sample t tests. Two corrections for testing multiple time–frequency points were implemented. The first is Bonferroni’s correction, which consists of dividing the significance threshold by the number of data points. The second correction integrates statistical dependencies across frequencies and time points, which were estimated based on random-field theory, using the VBA toolbox (available at http://mbb-team.github.io/VBA-toolbox/; Daunizeau et al., 2014).
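The first of these two steps (one-sample t statistic across sites, then Bonferroni's correction) can be sketched in standard-library Python. Function names and numbers are hypothetical; the random-field-theory correction relies on the VBA toolbox and is not reproduced here.

```python
import math
import statistics

def one_sample_t(values):
    """t statistic of the mean against zero (degrees of freedom = n - 1)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))

def bonferroni_survivors(p_values, alpha=0.05):
    """Flag p-values below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Hypothetical regression estimates from five sites at one time point.
t_stat = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0])
# Hypothetical uncorrected p-values at four time-frequency points.
flags = bonferroni_survivors([0.001, 0.02, 0.04, 0.2])
```

With four tests and a nominal threshold of 0.05, only p-values below 0.0125 survive, which is why this correction becomes very conservative over a full time–frequency map.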
Several control analyses were conducted to assess the robustness of the findings. To ensure that results were not driven by particular individuals, an intercept per participant was added as a random factor in a mixed-effect GLM (using Matlab function fitglme) and residual regression estimates within a periresponse [−0.5 s, 0.5 s] time window were tested against zero. Another control analysis was run to address the issue that Val and Conf constructs are partially correlated, as they share a common term (the chosen option value). The regressors were orthogonalized in the GLM, but this orthogonalization itself could generate spurious correlations. To mitigate potential artifacts due to the orthogonalization of regressors in the main GLM (Eq. 1), the analysis was repeated with a GLM that included only Val (without Conf) or only Conf (without Val):
We also investigated the extent to which Val and Conf signals originate from the same recording sites. To that end, we computed a single regression estimate per site by fitting the main Val/Conf GLM (Eq. 1) to the mean θ signal over the periresponse [−0.5 s, 0.5 s] time period. The correlation of the resulting Val and Conf regression estimates was tested across recording sites using Pearson's coefficient. Significance at the site level was assessed by comparing regression estimates against zero using a t test. This test served to identify the sites exhibiting a significant correlation with either Val or Conf or both. The figures that display the localization of Val and Conf signals on the frontal and sagittal slices of the anatomical MNI brain template were made using the FSL function fslmaths (to create the anatomical mask of the recording sites; Jenkinson et al., 2012) and MRIcroGL (to superimpose the anatomical mask onto the MNI template; www.nitrc.org/projects/mricrogl).
The spatial distribution of Val and Conf signals within the OFC was investigated to test for a dissociation between the vmPFC and lOFC regions (along the mediolateral x-axis), and for a posteroventral-to-anterodorsal gradient (along the y–z axis) that was previously observed for value and confidence in the medial prefrontal cortex (De Martino et al., 2017; Clairis and Pessiglione, 2022). Periresponse Val and Conf signals (i.e., regression estimates across trials) were regressed against MNI coordinates of each axis separately, as well as the composite y–z axis:
Additionally, the mean θ activity over the periresponse [−0.5 s, 0.5 s] time period was compared between trials presenting easy versus hard choices (sorted by the median split of the distance between option values) and between trials ending with consistent versus inconsistent choice (consistent meaning that the best-rated option was selected).
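A median-split comparison of this kind can be sketched as follows. The helper names and toy numbers are hypothetical, and assigning trials that fall exactly on the median to the "low" group is an assumption.

```python
import statistics

def median_split(values):
    """True for trials strictly above the median, False otherwise."""
    med = statistics.median(values)
    return [v > med for v in values]

def mean_by_group(signal, is_high):
    """Mean signal in the high and low groups, returned as (high, low)."""
    high = [s for s, h in zip(signal, is_high) if h]
    low = [s for s, h in zip(signal, is_high) if not h]
    return statistics.mean(high), statistics.mean(low)

# Hypothetical trials: distance between option values and mean periresponse theta.
distance = [0.2, 1.5, 0.7, 2.0]
theta = [110.0, 104.0, 108.0, 102.0]
is_easy = median_split(distance)              # large distance = easy choice
theta_easy, theta_hard = mean_by_group(theta, is_easy)
```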
Value comparison model simulations
We simulated the choices that an attractor network model would make when offered the same options as our participants. The model used to simulate the value comparison process is a mean-field reduction detailed in Wong and Wang (2006) of a spiking neuronal network described in Wang (2002) that was adapted to the case of economic binary choice by Hunt et al. (2012). The network is reduced to two units, each receiving external input currents proportional to the value of one specific option, as well as noisy background input that resembles endogenous noise in the cortex. Synaptic connections include an excitatory recurrent coupling onto each unit and an effective inhibitory coupling to the other unit. The firing rate of each unit is then calculated as a monotonic function of the total synaptic input (integrating external and internal currents). We kept the specifics of the script kindly provided by Laurence Hunt, except for adjustments to our task and participants:
The number of simulated datasets was increased to 26 (the number of participants included in the present study) and the number of trials to 60 or 120 (depending on the version of the task that was performed).
Actual likeability ratings were used for option values, after normalization to match the range of values used in the initial simulations (Hunt et al., 2012).
The duration was increased ([−1 s, 5 s], stimulus-locked) to ensure that a decision was reached. Stimulation (input values) was suppressed 1 s after it started, to match the RT95% of participants.
The input strength parameter was adjusted (kopt = 0.225) to match the average choice consistency (i.e., the proportion of trials in which the best-rated option is selected) observed across our participants.
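The consistency criterion used to adjust the input strength can be computed as in the sketch below. The function name is hypothetical, and skipping trials with equal option values (for which no best-rated option exists) is an assumption.

```python
def choice_consistency(values_left, values_right, chose_left):
    """Proportion of trials in which the best-rated option was selected."""
    hits = 0
    total = 0
    for v_left, v_right, left in zip(values_left, values_right, chose_left):
        if v_left == v_right:
            continue  # assumption: ties carry no best-rated option
        total += 1
        if left == (v_left > v_right):
            hits += 1
    return hits / total

# Hypothetical trials: ratings of the left and right options and the choice made.
consistency = choice_consistency(
    [1.0, 0.0, 2.0, -1.0],
    [0.0, 1.0, 1.0, 1.0],
    [True, True, True, False],
)
```

In a simulation, the same function could be applied to the model's choices for each candidate input strength, keeping the value that brings the simulated consistency closest to the observed one.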
Simulations were done separately for each participant and were then averaged across participants. We then added a couple of extra simulations. First, we time-locked the activity to the decision, defined as the time when one of the units reached a fixed threshold of 30 Hz, as was done in Hunt et al. (2012). The total activity of the network (synaptic current summed over the two units) was then regressed across trials against the same GLM as done with iEEG activity (with left-to-right serial orthogonalization):
Results
Choice behavior
For our purposes, we restricted our sample to the subgroup of patients with electrical contacts in the OFC (n = 26; age, 37.2 ± 10.2 years; 14 females), who performed both likeability rating and binary choice tasks (Fig. 1A). Likeability ratings were used as proxies for the values assigned to options in the analysis of choice behavior. Choice rate (frequency of left choice) was significantly related to decision value (difference between the left and right option value) in a logistic regression model (βVl−Vr = 1.89 ± 0.21, tVl−Vr(25) = 9.07, pVl−Vr= 2.23⋅10−9; Fig. 1B, top panel). Choice response time (RT) was significantly related to choice easiness (i.e., negatively correlated to the distance between option values) in a linear regression model (β|Vl−Vr| = −0.21 ± 0.04, t|Vl−Vr|(25) = −5.47, p|Vl−Vr| = 1.12⋅10−5; Fig. 1B, bottom panel). Thus, standard psychometric measures confirmed that patients were making decisions based on their subjective preferences.
We also checked the psychometric properties of our two key constructs for value and confidence, denoted Val (for the sum of option values) and Conf (for the probability that the chosen option is best). The variations of Val and Conf with distance, conditioned on choice consistency (i.e., on whether or not the best-rated option was chosen), are shown in Extended Data Figure 1-1. Val globally decreased with distance (mean r = −0.27), irrespective of choice consistency. The distribution of choice trials reflects the variations along horizontal and vertical lines, corresponding to the blocks with constant mean and constant distance, respectively. Conf globally increased with distance (mean r = 0.49), but the slope depended on choice consistency: it was positive for consistent choices and negative for inconsistent choices. This pattern illustrates a well-known property of confidence, which increases with evidence when choice is correct and decreases when choice is incorrect (Sanders et al., 2016; Urai et al., 2017; Rouault et al., 2023). The reverse pattern was observed with RT, as could be expected from the negative relationship between Conf and RT (mean r = −0.29), which has been repeatedly observed (De Martino et al., 2013; Clairis and Pessiglione, 2022).
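The two constructs, and the way Conf depends on choice consistency, can be illustrated with the sketch below. It assumes that Conf is read out from a fitted logistic choice model as the selection probability of the chosen option; the function name is hypothetical, and the slope of 1.89 is only the group-mean estimate reported above, used here for illustration.

```python
import math

def val_and_conf(v_left, v_right, chose_left, slope, intercept=0.0):
    """Return (Val, Conf) for one trial under a logistic choice model."""
    p_left = 1.0 / (1.0 + math.exp(-(intercept + slope * (v_left - v_right))))
    val = v_left + v_right                         # overall value: sum over options
    conf = p_left if chose_left else 1.0 - p_left  # probability of the chosen option
    return val, conf

slope = 1.89  # illustrative value only
# Same option pair, best-rated option chosen (consistent) or not (inconsistent).
_, conf_consistent = val_and_conf(1.0, 0.0, True, slope)
_, conf_inconsistent = val_and_conf(1.0, 0.0, False, slope)
# A larger distance raises Conf for consistent choices and lowers it otherwise.
_, conf_consistent_far = val_and_conf(2.0, 0.0, True, slope)
_, conf_inconsistent_far = val_and_conf(2.0, 0.0, False, slope)
```

Under this definition, Conf rises with distance when the best-rated option is chosen and falls with distance when it is not, which is the pattern described above.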
iEEG activity
We took advantage of electrodes implanted in a total of 26 patients to record iEEG activity from 204 recording sites (Fig. 2) located in the OFC, defined as in our previous study (Lopez-Persem et al., 2020) using the automated anatomical labeling (AAL, see Materials and Methods) by merging the vmPFC (n = 66 sites: 1–9 sites per patient, mean of 2.64 ± 0.37 sites) and lOFC (n = 138 sites: 1–16 sites per patient, mean of 5.75 ± 0.80 sites).
We first examined which frequency bands would show power modulation during the choice process relative to baseline. The time–frequency decomposition of trial-wise iEEG signals (full range, 2–150 Hz) revealed a significant power increase in low-frequency activity (Fig. 3A), around the θ range (4–8 Hz), that started with stimulus onset and ended with a button press. There was also a significant power decrease in higher frequency bands, around the β range (13–30 Hz), that lasted after button press and was previously described as movement-related desynchronization (Meyniel and Pessiglione, 2014). When doing this analysis separately for vmPFC and lOFC recording sites (Extended Data Fig. 3-1A), we observed a qualitatively similar pattern in the two regions but a weaker θ increase in the vmPFC relative to lOFC, probably due to a lower number of electrodes implanted in this region.
To investigate the neural correlates of value and confidence, we focused on the θ range, because it showed the highest evoked response in our dataset and because the electrophysiological correlates of value sum and difference were previously observed in a low-frequency range (Hunt et al., 2012). The parametric modulation of iEEG activity across trials was investigated by fitting to every time point a GLM that included our regressors of interest (Val and Conf) together with a trial-wise baseline (mean signal between −1 and 0 s prestimulus), as well as trial number and response time (RT). Surprisingly, regression estimates were more clearly modulated when time-locking iEEG signals to the response and showed opposite signs for Val and Conf (Fig. 3B). Indeed, θ power was negatively correlated with Val (from −0.61 to 0.25 s postresponse, with peak βVal = −0.032 ± 0.009, tVal(203) = −3.65, pVal = 3.38⋅10−4) and positively correlated with Conf (from 0.016 to 0.41 s postresponse, with peak βConf = 0.028 ± 0.007, tConf(203) = 4.01, pConf = 8.53⋅10−5). Of note, this result remained significant when including patient identity as a random factor in the group-level analysis (within the [−0.5 s, 0.5 s] time window: βVal = −0.014 ± 0.004, tVal(203) = −3.48, pVal = 6.22⋅10−4; βConf = 0.010 ± 0.005, tConf(203) = 2.01, pConf = 4.59⋅10−2), suggesting that it was not driven by a minority of individuals.
When doing this analysis separately for the two OFC subregions (Extended Data Fig. 3-1B), we observed that the association with Val was significant in the vmPFC and the association with Conf was significant in the lOFC sites. However, we could not conclude in favor of a dissociation, because the effects were qualitatively similar in the two regions (just passing the threshold in one case and not the other). They both showed a stronger association (negative for Val and positive for Conf) with iEEG activity when time-locked to responses rather than stimuli.
These results confirm the presence of information about overall value and choice confidence in low-frequency OFC activity, over a periresponse time period. To check that the results were not an artifact of the orthogonalization of Val and Conf regressors when included in a single GLM, we repeated the regression analysis with a GLM that contained only Val (without Conf) or only Conf (without Val), plus the regressors of no interest. The results (Fig. 3C, top panel) were virtually unchanged, showing that Val and Conf were capturing separate parts of variance in iEEG activity. Then we conducted an analysis that is agnostic about the sign of correlation by reversing the logic and computing how much variance in value and confidence would be explained by iEEG activity (Fig. 3C, bottom panel). Significant information about Val was found from −0.48 to 0.17 s postresponse (peak r2Val = 1.92 ± 0.21%, tVal(203) = 3.39, pVal = 4.20⋅10−4), and significant information about Conf was found from −0.53 to −0.094 s postresponse (peak r2Conf = 1.91 ± 0.21%, tConf(203) = 3.05, pConf = 1.30⋅10−3). This analysis confirmed the presence of both value and confidence representations in OFC low-frequency activity, peaking at about the same time just before the response.
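The sign-agnostic analysis can be sketched as follows, under the simplifying assumption that the variance explained is measured as the squared Pearson correlation between the behavioral variable and power at each time point. The function name `explained_variance` is hypothetical, and the actual analysis may have included additional regressors.

```python
import numpy as np

def explained_variance(power, target):
    """Squared Pearson correlation between `target` and power, per time point.

    power  : (n_trials, n_times) theta power for one recording site
    target : (n_trials,) behavioral variable (e.g. Val or Conf)
    Returns r2 of shape (n_times,), insensitive to the sign of correlation.
    """
    p = power - power.mean(axis=0)   # center power at each time point
    t = target - target.mean()       # center the behavioral variable
    cov = t @ p                      # unnormalized covariance, per time point
    denom = np.sqrt((t @ t) * (p * p).sum(axis=0))
    return (cov / denom) ** 2
```

Because the correlation is squared, a site with a negative association and a site with a positive association contribute equally, which is the property that makes the analysis agnostic about sign.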
To explore whether associations with Val and Conf would be present in other frequency bands, we extended the GLM analysis to the full time–frequency space (Fig. 4A). Although the negative correlation with Val and positive correlation with Conf are visible in the low-frequency area (4–8 Hz), no cluster survived correction for multiple comparisons, due to the high number of time–frequency data points. However, a simple median-split analysis, comparing trials with high versus low Val and high versus low Conf, yielded globally similar maps but with significant clusters (Fig. 4B). This analysis extends our conclusions based on θ-range activity, the negative association with Val being also observed in δ-range, α-range, and β-range clusters and the positive association with Conf being observed in α-range clusters, all emerging within a periresponse time window (i.e., [−0.5 s, 0.5 s] around choice).
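A median-split contrast of this kind can be sketched as below. The function name `median_split_contrast` and the array layout are assumptions for illustration. Assigning trials that fall exactly on the median to the low group is an arbitrary choice of this sketch.

```python
import numpy as np

def median_split_contrast(tf_power, variable):
    """Mean time-frequency power for high minus low trials.

    tf_power : (n_trials, n_freqs, n_times) power for one recording site
    variable : (n_trials,) trial-wise Val or Conf
    Returns a (n_freqs, n_times) contrast map.
    """
    variable = np.asarray(variable, dtype=float)
    high = variable > np.median(variable)  # ties go to the low group
    return tf_power[high].mean(axis=0) - tf_power[~high].mean(axis=0)
```

The resulting maps would then be averaged across sites and submitted to cluster-level statistics, which are not shown here.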
Val and Conf signals across frequency bands of iEEG activity. For each recording site located in the OFC, trial-by-trial iEEG time series were time-locked to button press and epoched around choice (−RT95% is the time at which options are already on screen in 95% of trials). The same analysis was applied to each site, and results were then averaged across sites. A, Regression estimates obtained for Val (left map) and Conf (right map). B, Contrasted power between high versus low Val (left map) or high versus low Conf (right map). High and low trials were identified using a median split. Contour lines indicate significant clusters surviving Bonferroni’s correction for multiple comparisons.
Then we examined the spatial distribution of Val and Conf signals, meaning Val and Conf regression weights in explaining θ activity, extracted from the [−0.5 s, 0.5 s] time window. There was no particular pattern in the distribution of Val and Conf signals (Fig. 5A) along the mediolateral axis, which distinguishes vmPFC and lOFC regions. Among the 204 recording sites, 16 showed a significant correlation with Val only, 17 with Conf only, and 3 with both (Fig. 5B). This is not different from what could be expected if the two signals were independent of each other (χ2 = 1.01, p = 0.32). Although correlation was negative with Val and positive with Conf over the entire set of recordings, small clusters of recording sites showed significant opposite signals (positive with Val or negative with Conf). However, these clusters also failed to show significant correlations of the same sign with both Val and Conf (i.e., both Val-positive and Val-negative clusters failed to signal Conf, and both Conf-positive and Conf-negative clusters failed to signal Val). Recording sites with regression weights of the same sign for Val and Conf should cluster in the bottom left and top right quadrants of the 2D distribution, which was not observed (Fig. 5B). Overall, there was no correlation between Val and Conf signals across recording sites (r = 0.06, p = 0.42).
Distribution of Val and Conf signals across recording sites. A, Localization of Val and Conf signals. Significant recording sites (blue for Val, orange for Conf, black for both) are superimposed on frontal and sagittal slices of the anatomical MNI brain template (taken at y = 40 mm and x = 5 mm, respectively). Significance was assessed using a GLM fitted to the mean signal in the θ range (4–8 Hz), extracted from the [−0.5 s, 0.5 s] time window surrounding choice (button press). B, Correlation between Val and Conf β regression estimates across recording sites. Dots are recording sites, and r is Pearson's correlation coefficient. The dark gray line shows the linear regression fit and bars the marginal distributions of β regression estimates. C, Gradient of Val and Conf signals along a posteroventral-to-anterodorsal (y–z) axis. Dots are recording sites and βy–z is the estimate from the regression of the difference between Conf and Val β estimates against the coordinates projected onto the y–z axis. For the gradient of Val and Conf along each separate axis (x, y, z), see Extended Data Figure 5-1.
Figure 5-1
Spatial gradients of Val and Conf signals. Spatial gradients of Val and Conf regression coefficients (top and bottom rows, respectively) were tested within the OFC along the medial-lateral (|x|), the posterior-anterior (y), and the ventral-dorsal (z) axes (left, middle, and right columns, respectively). For each recording site, β regression coefficients were estimated across trials from a GLM fitted to the mean signal in the θ range (4–8 Hz), extracted from the [−0.5 s, 0.5 s] time window surrounding choice (button press). Then, the gradients were estimated using linear regressions of β estimates against MNI coordinates, separately for each of the three axes. Dots are recording sites; colors indicate significant associations (with Val in blue, with Conf in orange, with both in black); dark grey lines show the linear fits. The bold regression coefficient and p-value indicate the only significant gradient surviving Bonferroni correction (Conf signal along the z-axis).
To test for the presence of a Val-to-Conf gradient from posteroventral to anterodorsal regions (De Martino et al., 2017; Clairis and Pessiglione, 2022), we regressed the difference between Conf and Val signals against MNI coordinates projected along the y–z axis (Fig. 5C). This gradient was indeed significant (βy–z = 4.11·10−3 ± 1.70·10−3, ty–z(202) = 2.42, py–z = 1.66·10−2). However, a systematic regression of Val and Conf signals against each of the three axes separately (Extended Data Fig. 5-1) showed that the gradient was driven by Conf representation being more dorsal, this being the only association that survived Bonferroni’s correction for multiple tests (βz = 6.30·10−3, tz(202) = 3.03, pz = 2.80·10−3).
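The gradient test can be sketched as a simple linear regression of the site-wise difference between β estimates against coordinates projected onto an oblique axis. How the y–z axis was defined is not specified in the text, so the equal weighting of y and z used below (`direction=(1.0, 1.0)`) is an assumption, as is the function name `gradient_along_axis`.

```python
import numpy as np

def gradient_along_axis(beta_diff, coords_yz, direction=(1.0, 1.0)):
    """Regress a site-wise beta difference on projected MNI coordinates.

    beta_diff : (n_sites,) e.g. Conf beta minus Val beta for each site
    coords_yz : (n_sites, 2) MNI y and z coordinates in mm
    direction : orientation of the projection axis in the y-z plane
                (hypothetical default: posteroventral to anterodorsal diagonal)
    Returns (slope, standard error, t statistic, degrees of freedom).
    """
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)                      # unit vector of the axis
    proj = np.asarray(coords_yz, dtype=float) @ d
    X = np.column_stack([np.ones_like(proj), proj])
    coef, *_ = np.linalg.lstsq(X, beta_diff, rcond=None)
    resid = beta_diff - X @ coef
    dof = len(proj) - 2                         # intercept and slope
    sigma2 = resid @ resid / dof                # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], se, coef[1] / se, dof
```

With 204 recording sites, the regression has 202 degrees of freedom, in line with the statistics reported above.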
To further specify Val and Conf signals, we analyzed separately the pool of recording sites that were driving the main global results: significant negative association with Val (“Val−” sites, n = 15) or significantly positive correlation with Conf (“Conf+” sites, n = 10). When using our main GLM, the dissociation was confirmed (Fig. 6A, top panels): Val− sites were insensitive to confidence, and Conf+ sites were insensitive to value. When replacing Val and Conf in the GLM by the values of chosen and unchosen options (VCh and VUnch), regression estimates (Fig. 6A, middle panels) confirmed that Val− sites were signaling the two options with the same sign (for VCh, peak βCh = −0.18 ± 0.031, tCh(14) = −5.72, pCh = 5.29⋅10−5; for VUnch, peak βUnch = −0.15 ± 0.024, tUnch(14) = −6.19, pUnch = 2.36⋅10−5), whereas Conf+ sites were signaling the two options with opposite signs (for VCh, βCh = 0.12 ± 0.022, tCh(9) = 5.57, pCh = 3.48⋅10−4; for VUnch, peak βUnch = −0.13 ± 0.027, tUnch(9) = −4.65, pUnch = 1.20⋅10−3).
Decomposition of Val and Conf signals. The left and right columns show results pooled over recording sites driving Val and Conf signals, meaning sites for which θ-range (4–8 Hz) activity extracted from the [−0.5 s, 0.5 s] time window surrounding choice (button press) is negatively associated with Val and positively associated with Conf (Fig. 5, blue and orange sites). A, Time course of Val and Conf β regression estimates. At each time point, power in the θ range (4–8 Hz) was regressed against a GLM including as regressors either Val and Conf (same as in Fig. 3; top panels), chosen and unchosen option values (not orthogonalized; middle panels), or response time (RT; bottom panels). Significance level (uncorrected p-value) is estimated using a t test of regression estimates against zero, across recording sites. Horizontal red dotted lines indicate the Bonferroni-corrected statistical threshold; p-values are highlighted in bold when they survive correction based on random-field theory (RFT). B, Effects of choice consistency and choice easiness. θ-range (4–8 Hz) activity extracted from the [−0.5 s, 0.5 s] time window surrounding choice (button press) was averaged separately for consistent and inconsistent trials (when the best-rated option was chosen and not chosen) and for easy and hard trials (sorted by a median split on the distance between option values). Dots are recording sites, and bars and error bars are means and standard errors of the mean. Bold p-values indicate the differences that survived Bonferroni’s correction.
When replacing Val and Conf by RT in the GLM (Fig. 6A, bottom panels), we observed no significant correlate in Val− sites, but a significant negative association in Conf+ sites (peak βRT = −0.18 ± 0.061, tRT(9) = −3.00, pRT = 1.49⋅10−2). This strengthens the interpretation that Conf+ sites reflect confidence, given that confidence is lower when RT is longer. As shown in Extended Data Figure 1-1, confidence-signaling activity is expected to increase with both choice easiness (distance between option values) and choice consistency (selection of best-rated option). We tested these predictions on θ activity recorded in Val− sites and Conf+ sites (Fig. 6B): we found no effect in Val− sites, but θ activity in Conf+ sites was higher when choices were easier (teasiness(11) = 4.30, peasiness = 1.27⋅10−3) and consistent (tconsistency(11) = 6.30, pconsistency = 5.82⋅10−5), following the expected signature of a confidence signal.
Model simulation
Finally, we examined whether the value and confidence signals observed here could arise from a comparison process, as was previously argued by Hunt and colleagues in their seminal paper (2012). Indeed, low-frequency MEG activity reconstructed from a vmPFC region was found to signal the sum and difference of option values in a binary choice task, which was interpreted as reflecting the activity of a neural network implementing the comparison between options. This comparison mechanism can be simulated in an artificial neural network (ANN; Wang, 2002; Wong et al., 2007), where two distinct excitatory units take as input the values of the two options and compete with each other through reciprocal inhibition (Fig. 7A). When the two options are made available, a competition emerges until one pyramidal unit overruns and silences the other, while reaching a plateau that triggers the selection of the associated option. We adapted the ANN used by Hunt and colleagues to reproduce the mean rate of consistent choices and the mean RT of our patients, using their likeability ratings. Simulated activity of the ANN was then analyzed with the same GLM as was done with observed OFC iEEG activity. Results replicate the pattern reported by Hunt and colleagues and extend to Val and Conf signals (Fig. 7B), which was expected given that Val is nothing but the sum of option values and Conf is a sigmoid transformation of the difference between chosen and unchosen option values.
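To make the two regressors and the competition mechanism concrete, the sketch below computes Val and Conf from option values and simulates a toy two-unit rate model with self-excitation and mutual inhibition. This is a deliberately simplified illustration, not the network of Hunt and colleagues or of Wong and Wang: the function names, the `temperature` parameter, the sigmoid gain, and all weights and time constants are hypothetical values chosen for this example.

```python
import numpy as np

def val_conf(v_chosen, v_unchosen, temperature=1.0):
    """Val is the sum of option values; Conf is a sigmoid of their difference.

    `temperature` is a hypothetical scaling parameter for this sketch.
    """
    v_chosen = np.asarray(v_chosen, dtype=float)
    v_unchosen = np.asarray(v_unchosen, dtype=float)
    val = v_chosen + v_unchosen
    conf = 1.0 / (1.0 + np.exp(-(v_chosen - v_unchosen) / temperature))
    return val, conf

def simulate_competition(i1, i2, w_self=0.5, w_inh=1.0,
                         tau=0.1, dt=0.001, t_max=1.0):
    """Toy two-unit rate model (noise-free), integrated with Euler steps.

    Each unit receives its option value as input, excites itself and
    inhibits the other. Returns firing rates of shape (n_steps, 2).
    """
    n_steps = int(round(t_max / dt))
    rates = np.zeros((n_steps, 2))

    def gain(x):
        # Sigmoid transfer function (slope and offset are arbitrary)
        return 1.0 / (1.0 + np.exp(-4.0 * (x - 0.5)))

    for step in range(1, n_steps):
        r1, r2 = rates[step - 1]
        drive = np.array([i1 + w_self * r1 - w_inh * r2,
                          i2 + w_self * r2 - w_inh * r1])
        rates[step] = rates[step - 1] + dt / tau * (-rates[step - 1] + gain(drive))
    return rates
```

In this toy model, the unit receiving the larger input ends with the higher rate, and the summed activity of the two units could be regressed against Val and Conf in the same way as recorded activity.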
Simulations of the attractor network model. A, The model (borrowed from Hunt et al., 2012) simulates a competition, via mutual inhibition, between two self-excitatory units, each taking as input one of the two option values. The model has been adapted to the choices made by each of the 26 participants, using likeability ratings as option values. A decision is made when the difference in activity between the two units reaches a plateau. The input is stopped 1 s after the onset of choice options, to roughly match the average RT95% (time at which the choice is still ongoing in 95% of trials) observed in our participants. B, At each time point, the total activity of this simple network has been regressed against Val and Conf, meaning the overall value (Ch + Unch) and the choice probability, which is a sigmoid transform of the value difference (Ch – Unch). The left and right panels show the time course of regression estimates (top) and significance levels (bottom) when activity is locked to stimulus onset and to button press, respectively. Significance level (uncorrected p-value) is estimated using a t test of regression estimates against zero, across the 26 simulated participants. Horizontal red dotted lines indicate the Bonferroni-corrected statistical threshold; p-values are highlighted in bold when they survive correction based on random-field theory (RFT). Vertical black dotted lines indicate median RT across trials (RT50%).
However, there were key differences between simulated and real data. First, Val and Conf signals only appear when time-locked to the response in OFC data, whereas time-locking does not matter in simulated data. Second, Val and Conf signals have opposite signs in OFC data, whereas they are both positive in simulated data. Third, different recording sites in the OFC only pick up either Val or Conf signal, whereas an electrode in the vicinity of the ANN would necessarily pick up both Val and Conf signals.
Thus, low-frequency iEEG activity in the OFC was more compatible with sparse coding of value and confidence estimates that build up in the course of choice deliberation, rather than reflecting a single mechanism comparing ready-made option values.
Discussion
In this study, we took advantage of electrodes implanted in the OFC of patients with pharmacoresistant epilepsy to investigate value and confidence signals during preference-based decisions. We observed that low-frequency OFC activity reflected both constructs Val (sum of option values) and Conf (probability that chosen option is best).
The correlation with Val is consistent with the general idea that the OFC is part of the brain valuation system (Schultz, 2006; Padoa-Schioppa, 2007; Rangel et al., 2008; Bartra et al., 2013), whose signals may provide a common neural currency for ordering choice options (Montague and Berns, 2002; Levy and Glimcher, 2012). However, this correlation does not tell whether OFC activity signals the overall value of the option set, as previously suggested (Shenhav and Karmarkar, 2019), or represents separate estimates for the two option values, which would be aggregated in the recorded signal. When introduced as separate regressors in the general linear model, option values were both reflected in the activity of Val-signaling sites, with similar temporal dynamics. This pattern seems to suggest that the two value estimates are generated simultaneously, but it could also come from averaging even if options are evaluated alternately. Indeed, if the sequence of option exploration differs between participants or between trials, the temporal alignment of option value signals would be blurred. The issue might be solved by monitoring eye movements, if we assume that the OFC signals the value of the option that is looked at, as suggested in some versions of sequential sampling models (Krajbich et al., 2010). The value signal observed here is indeed compatible with a neural network model that transforms a set of attributes into a distribution of activity that can be decoded by downstream neurons to infer the value of attended and unattended options (Pessiglione and Daunizeau, 2021). To further test this model, it would be necessary to know which option is attended at any time point and to simultaneously record a distribution of neural activities. With the present dataset, multivariate decoding was not feasible because recording sites were sampled in different patients making different choices.
The correlation with Conf is consistent with a wealth of studies that reported representation of confidence in OFC activity, not only during preference-based decision or judgment (De Martino et al., 2013; Lebreton et al., 2015; Bobadilla-Suarez et al., 2020; Lopez-Persem et al., 2020; Shapiro and Grafton, 2020) but also during perception-based or memory-based decisions (Chua et al., 2006; Bang and Fleming, 2018; Gherman and Philiastides, 2018; Morales et al., 2018; Rouault et al., 2023). Further investigation showed that the pattern of low-frequency activity recorded by Conf sites was consistent with a notion of confidence: it increased with chosen option value and decreased with unchosen option value, it was higher both when choice was easier (more distance between option values) and when choice was right (the best-rated option being selected), and it was also associated with shorter choice RT. Yet, one limitation of these results is that we use a proxy for confidence and not the confidence reported by the participant. Previous studies nevertheless showed that our confidence proxy is tightly correlated to confidence rating (Clairis and Pessiglione, 2022) and that both were correlated with iEEG activity in the OFC (Lopez-Persem et al., 2020).
It could be argued that the inclusion of Val and Conf regressors in the same GLM might induce spurious correlations since they share one variable (chosen option value). This was not the case, however, because the same result was obtained whether the two regressors were orthogonalized and tested in the same GLM or tested in separate GLMs. Regarding the dynamics, Val signals were observed slightly before Conf signals in the parametric regression analysis (with Val just before and Conf just after the response), but not in the explained variance analysis (in which the two signals peaked before the response). We note that results are also mixed in fMRI studies (which may have insufficient time resolution), some claiming that values are signaled before confidence (Shapiro and Grafton, 2020) and others that the two variables are simultaneously represented in vmPFC activity (Lebreton et al., 2015). The precedence of value over confidence makes sense in the view that values are provided for the comparison and selection to take place, but not in the view that option values are constructed on the fly, simultaneously with the decision-making process. A recent account of decision-making (Lee and Daunizeau, 2021) suggested that option values are refined until the tradeoff between the expected confidence gain and the required deliberation time becomes disadvantageous. For the implementation of such a tradeoff mechanism, option values and choice confidence would need to be simultaneously represented.
Regarding the frequency range, the negative correlation between option value and low-frequency activity was already reported in iEEG studies during rating or bidding tasks where options were presented one at a time (Lopez-Persem et al., 2020; Shih et al., 2023). When extending the analysis to the full frequency spectrum, we observed that the negative/positive correlation with value/confidence was globally true, with significant clusters in higher frequency ranges (classically labeled α and β). The absence of significant correlates in high frequencies may be surprising, as most iEEG studies reported equivalents of BOLD responses in broadband γ-activation (Jerbi et al., 2009; Lachaux et al., 2012), including recent studies using similar recording and analytic methods, but a task requiring just one value estimate per trial (Lopez-Persem et al., 2020; Cecchi et al., 2022; Shih et al., 2023). One explanation could be that value and confidence signals follow gaze fixation (i.e., which option is considered for selection) and therefore would only appear in low-frequency activity because temporal smoothing would compensate for misalignment of visual saccades.
Regarding the spatial distribution, OFC signals appeared to shift from value to confidence along the posterior-to-anterior and ventral-to-dorsal axis. This replicates, within the OFC, the value-to-confidence gradient that was previously documented along the medial prefrontal cortex (De Martino et al., 2017; Clairis and Pessiglione, 2022). This gradient might reflect some overlapping, but independent, distributed codes for value and confidence across OFC subregions. It should be noted that, as in our previous iEEG study (Lopez-Persem et al., 2020), there was no clear dissociation between vmPFC and lOFC subregions, both containing recording sites with significant value and confidence signals. This is at odds with fMRI studies, which typically find correlates of value and confidence in ventromedial but not ventrolateral prefrontal regions (Bartra et al., 2013; Clithero and Rangel, 2014; Vaccaro and Fleming, 2018). However, electrophysiological recordings in monkeys also identified value signals in lateral parts of the OFC (Rich and Wallis, 2016; Hunt et al., 2018; Pastor-Bernier et al., 2021). Thus, iEEG recordings in patients may offer a bridge between human fMRI and monkey electrophysiology results. It remains unclear why fMRI studies generally fail to demonstrate correlates of value in the lOFC. This might come from a lack of sensitivity due to the proximity of air-filled sinuses, to a more stringent correction for multiple comparisons in whole-brain analyses given the large number of voxels, or to a higher individual variability in the location of value-signaling voxels that would weaken group-level statistics.
Finally, we compared the value and confidence signals observed in OFC activity with those generated by an attractor network model implementing a comparison between option values (Hunt et al., 2012). While both expressed value and confidence signals, real OFC activity differed from simulated activity on several key points. First, value and confidence signals only appeared when time-locking OFC activity to response onset (not stimulus onset), suggesting that these representations were not input to the system but constructions that triggered the decision when fully achieved. Second, the sign of correlation with OFC activity was opposite for value and confidence, contradicting the predictions of the comparison mechanism. Third, value and confidence signals were recorded in distinct locations, violating the assumption that they could arise from the same attractor neural network. Overall, the pattern of OFC activity observed here was more compatible with the idea of a large neural population transforming the attributes of choice options into sparsely coded value and confidence estimates that may be read out by different downstream neurons (Pessiglione and Daunizeau, 2021).
Although patients tested here suffered from epilepsy, we treated iEEG activity recorded in their OFC as stemming from a normal brain. This is reasonable given that epileptic foci were located outside the OFC, that epileptic artifacts were removed from raw recordings, and that epileptic events were unlikely to coincide with value and confidence constructs anyway. Yet our findings may provide insight into the consequences of OFC damage for decision-making. Rather than being unable to compare values, patients with partial OFC lesions may have distorted values and confidence signals, depending on which neurons are impaired with respect to the readout codes. If confidence is indeed driving the metacognitive control of decisions, lesioned patients may be unable to adjust the deliberation process, hence making poor decisions with high confidence or vice versa.
Footnotes
We thank Jean Daunizeau and Sébastien Bouret for their insightful comments and Laurence Hunt for providing the code of his neural network model. T.L. was supported by the École de l’INSERM Liliane Bettencourt and funded by the Fondation pour la Recherche Médicale (ECO201906008985, FDT202204014807). The study was supported by the Investissements d’Avenir program (ANR-10-IBHU-0003).
The authors declare no competing financial interests.
- Correspondence should be addressed to Mathias Pessiglione at mathias.pessiglione{at}gmail.com.