Abstract
Deciding how long to keep waiting for uncertain future rewards is a complex problem. Previous research has shown that choosing to stop waiting results from an evaluative process that weighs the subjective value of the awaited reward against the opportunity cost of waiting. Activity in the ventromedial prefrontal cortex (vmPFC) tracks the dynamics of this evaluation, while activation in the dorsomedial prefrontal cortex (dmPFC) and anterior insula (AI) ramps up before a decision to quit is made. Here, we provide causal evidence of the necessity of these brain regions for successful performance in a willingness-to-wait task. Twenty-eight participants (20 female and 8 male) with lesions to different regions of the frontal lobe were tested on their ability to adaptively calibrate how long they waited for monetary rewards. We found that participants with lesions to the vmPFC waited less overall, while participants with lesions to the dmPFC and anterior insula were specifically impaired at calibrating their level of persistence to the environment. These behavioral effects were accounted for by systematic differences in parameter estimates from a computational model of task performance.
Significance Statement
Achieving positive outcomes in education, health, or personal finance often involves pursuing larger future rewards at the cost of smaller, more immediate ones. Most neuroscience research on future reward pursuit has focused on the initial choice between a smaller reward that will arrive quickly or a larger reward that will arrive later. However, once the choice has been made, persisting in the initial choice of the later reward through the waiting period is perhaps even more critical to success. Here, we identify specific and dissociable causal roles for different regions of the prefrontal cortex in determining people’s ability to adaptively persist. This finding extends our understanding of how the brain supports subjective value maximization in the context of delayed rewards.
Introduction
In order to achieve desirable outcomes, we must often devote time toward a goal. From mastering a new skill to developing meaningful relationships, many realms of human activity benefit from perseverance. This notion has led to the conventional wisdom that grit is a key to success and personal development (Duckworth et al., 2007). In a laboratory setting, delay of gratification paradigms like the well-known marshmallow test (Mischel et al., 1972, 1989; Mischel and Baker, 1975) have measured persistence by evaluating how long people—mostly children—are willing to wait to receive a desirable reward. Individual differences in the ability to delay gratification in these tasks correlate with lifelong success, and longer wait durations are taken to be indicative of greater discipline and self-control (Shoda et al., 1990; Ayduk et al., 2000; Casey et al., 2011).
That being said, there are also situations in which persistence leads to suboptimal outcomes. For example, a test taker who gets stuck on a question and devotes the rest of the time to solving it will most likely get a bad grade overall. Similarly, a tourist who waits in line for hours for the city’s most popular chocolate chip cookie is probably missing out on more exciting sight-seeing experiences. Thus, although grit and perseverance can be necessary for success, they can also be misapplied—any attempt to evaluate the optimality of committing time toward a prospective reward must take the broader decision context into account (Fawcett et al., 2012; McGuire and Kable, 2013; Berkman et al., 2017).
In fact, many psychiatric disorders are characterized by either too much or too little persistence. For instance, obsessive-compulsive disorder (OCD) often involves the repeated performance of rituals that are time-consuming and not causally related to tangible success (Leckman et al., 1997; Stewart et al., 2007). Conversely, depression can lead people to give up sooner than they should, even when the outcome is valuable and approaching quickly (Mukherjee et al., 2020). Individuals who struggle with addiction often have difficulty sticking with the choice of the delayed but ultimately more fulfilling rewards associated with sobriety, especially when faced with the immediacy of the drug. A more comprehensive understanding of how people assess when persistence is worthwhile and when it is not will pave the way for clinical interventions that may help improve decisions.
Previous research has shown that healthy adults do indeed leverage the contextual features of their environment in order to calibrate persistence (McGuire and Kable, 2012, 2015; Kidd and Hayden, 2015; Lempert et al., 2018; Lang et al., 2021). This adaptability has most explicitly been demonstrated in the realm of waiting behavior, when participants must decide how long they are willing to wait for an uncertain future reward. Thus, although persistence can take many different forms, including active effort expenditure, we will focus here on willingness to wait.
We have previously shown that people use the temporal statistics of reward delivery to adaptively determine whether to persist (McGuire and Kable, 2012, 2015). In those experiments, participants waited for a monetary reward in two temporal environments: one in which the distribution of wait durations was heavy-tailed and another in which it was uniform. In the heavy-tailed condition, the expected time remaining until the reward arrived increased the longer one had already waited. The reward-maximizing behavior was therefore to move on to a new trial after a short delay. Conversely, in the uniform condition, the expected time remaining until the reward arrived decreased as one waited, meaning that the optimal behavior was to continue waiting as long as necessary. Participants’ behavior paralleled these normative principles, as they displayed significantly greater willingness to wait in the uniform environment. In general, performance on these sorts of willingness-to-wait (WTW) tasks reflects the results of a dynamic and context-dependent comparison between the anticipated value of the potential reward and the opportunity cost of waiting for it.
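This normative contrast is easy to verify numerically. In the sketch below (Python; the heavy-tailed distribution's shape and scale are illustrative assumptions, not the study's parameters), the expected remaining delay shrinks as time passes under the uniform distribution but grows under the heavy-tailed one:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_remaining(delays, t):
    """Mean time still to wait, among delays longer than t seconds."""
    longer = delays[delays > t]
    return longer.mean() - t

# Uniform environment: reward delays between 0 and 20 s.
uniform = rng.uniform(0.0, 20.0, 100_000)

# Heavy-tailed environment: generalized Pareto delays truncated at 40 s.
# The shape (1.5) and scale (2.0) are illustrative, not the study's values.
raw = rng.pareto(1.5, 500_000) * 2.0
heavy = raw[raw <= 40.0][:100_000]

for t in (1, 5, 10):
    print(t, expected_remaining(uniform, t), expected_remaining(heavy, t))
```

Under the uniform distribution the printed values fall as t grows; under the heavy-tailed distribution they rise, which is exactly why early quitting pays there.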
Neural evidence further corroborates this framework: McGuire and Kable (2015) found that activity in the vmPFC, an area of the brain sensitive to subjective value (Chib et al., 2009; Bartra et al., 2013; Clithero and Rangel, 2014), evolved differently in the different temporal environments. If the distribution of wait durations favored persistence (uniform distribution), the vmPFC signal ramped up progressively as time passed and the wait shortened. Conversely, if the distribution of wait times was such that the value of waiting decreased after a certain point (heavy-tailed distribution), activity in the vmPFC was flat until people chose to quit (see Lamichhane et al., 2022, for a parallel finding). Furthermore, activity in several brain regions—including the insula and dorsal regions of the prefrontal cortex—ramped up before the decision to quit was made (McGuire and Kable, 2015). Beyond providing insight into the neural mechanisms that underlie persistence in willingness-to-wait tasks, these findings are important because both the location and time course of neural activity suggest a dynamic assessment of subjective value.
Although these fMRI results provide evidence that the vmPFC, dmPFC, and anterior insula are involved in adaptive persistence, they cannot demonstrate that these brain regions are necessary for such behavior. To address this question, we collected behavioral data in a sample of 28 participants with lesions to different areas of the frontal cortex, as well as in 18 control participants without brain lesions. Doing so allowed us to assess whether damage to regions known to be engaged during this task causes deficits in people’s ability to calibrate persistence. We separated our sample into three groups based on the location of the lesion: one group (n = 8) had lesions that mostly affected the dorsomedial prefrontal cortex (dmPFC) or anterior insula (AI), another group (n = 10) had lesions to the vmPFC, and the final group (n = 10) had lesions in other areas of the frontal cortex (frontal controls). Behaviorally, we found that participants with lesions to vmPFC waited less in both reward-timing conditions, while participants with lesions to the dmPFC/AI were significantly impaired in calibrating their waiting behavior across conditions. We found no evidence for any impairment in the frontal controls.
Finally, to better understand where in the decision-making process participants with lesions differed from controls, we fit a newly developed computational model of adaptive persistence (Chen et al., 2022, 2024). The model tracks running estimates of the value of waiting at each time step within a trial and probabilistically predicts waiting behavior as a result. It includes five participant-specific parameters that describe the learning process—two parameters that capture the speed of learning from positive (α) or negative (ν) feedback, respectively; an inverse temperature parameter (τ) that determines action selection noise; a discount parameter (γ) for temporally remote rewards; and a parameter that captures participants’ initial willingness to wait (η). We found that lesion location was related to dissociable deficits in the learning process: whereas participants with lesions to the vmPFC tended to exhibit a lower initial willingness to wait (lower η), participants with lesions to the dmPFC/AI might learn more slowly from negative feedback (lower ν). Taken together, these findings suggest that regions of the frontal cortex make computationally distinct contributions to adaptive persistence.
Materials and Methods
Participants
A total of 18 healthy controls (9 female and 11 male) and 31 participants with brain lesions (20 female and 11 male) were recruited from the University of Pennsylvania Focal Lesion Database and the McGill Cognitive Neuroscience Research Registry to participate in this study. They provided informed consent in accordance with the McGill University and University of Pennsylvania Institutional Review Board. The lesion group and controls were matched in terms of age (mean age, 57 ± 9 years for the controls vs 59 ± 12 years for the participants with lesions) and years of education (mean, 14.3 ± 2.1 for the controls vs 14.5 ± 2.5 for the lesion group). Lesions were drawn on a common neuroanatomical template (Montreal Neurological Institute) by neurologists at the research sites who were not aware of task performance.
The results from McGuire and Kable (2015) suggest that different regions of the frontal cortex underlie specific aspects of willingness to wait behavior. Consequently, we separated the participants into subgroups based on the location of their lesions—vmPFC, dmPFC, AI, or elsewhere in the frontal lobe. Before separating the frontal participants into specific subgroups, we first verified that at least 50% of the lesion affected the frontal lobe [as defined by a combination of regions in the Automated Anatomical Labeling (AAL) atlas (Rolls et al., 2020)]. This resulted in the exclusion of three participants whose lesions were primarily outside the frontal lobe. We then further separated participants on the basis of anatomical masks derived from the AAL. For the vmPFC mask, we included parcels 23–28, 31, and 32, which we truncated to exclude voxels superior to z = 10. Any participants whose lesions overlapped with at least 5% of the resulting vmPFC ROI were classified as belonging to the vmPFC group.
In order to establish the dmPFC and AI subgroups, we leveraged a combination of anatomical masks from the AAL and the fMRI findings from McGuire and Kable (2015). Specifically, we intersected the anatomical masks with masks from relevant functional contrasts, since McGuire and Kable found that both the dmPFC and AI showed increased activation for “quit” versus “reward” trials. For the dmPFC anatomical mask, we included parcels 5, 6, 19, 20, 32, and 33, as well as the portions of parcels 23, 24, 31, and 32 that were not part of the vmPFC mask (truncated to exclude voxels at or inferior to z = 10). We created the final dmPFC ROI by taking the conjunction of the atlas-derived dmPFC anatomical mask and the voxels significant for the “quit” versus “reward” contrast in McGuire and Kable (2015). Any participant whose lesion overlapped with at least 5% of this dmPFC ROI was classified as belonging to the dmPFC group. Finally, we created the AI anatomical mask by combining parcels 29 and 30 of the AAL and created our final AI ROI by taking the conjunction of this atlas-derived mask and the significant voxels for the “quit” versus “reward” contrast. Any participant whose lesion overlapped with more than 25% of that ROI was classified as belonging to the AI group. Overall, this procedure resulted in 10 people with vmPFC lesions, two people with dmPFC lesions, and six people with AI lesions. Because the relevant regions of the dmPFC and AI were both differentially active for “quit” versus “reward” trials, we combined the dmPFC and AI lesion groups into a single sample of eight participants with dmPFC/AI lesions (Fig. 1A). Finally, individuals whose lesions were primarily in the frontal lobe (i.e., met the frontal lobe cutoff) but did not meet the criteria for the other groups were classified as frontal controls (n = 10).
A, Illustration of the degree of overlap in lesion location across participants in each lesion group. B, Overview of the willingness to wait task. Participants chose whether to keep waiting for an uncertain future reward in two temporal environments: one in which the distribution of wait durations was uniform (high persistence condition) and another in which it was heavy-tailed (limited persistence condition).
Experimental design
In this experiment, we used a willingness-to-wait task similar to the one described in McGuire and Kable (2015) (Fig. 1B). Participants were tasked with making as much money as possible in 12 min by determining when to sell coins whose value varied within a trial. Specifically, participants received a coin worth 0¢ at the start of every trial. They were told that after a certain amount of time, the length of which varied, the coin would mature and become worth 10¢. At that point, signaled by a change in the coin’s color, they could sell the coin by pressing the space bar, leading to a 10¢ reward. However, participants were also told that at any point in the trial, they could elect to sell the coin, even if it had not yet matured. In that case, they would not receive 10¢, but they would receive the next coin. Whenever participants decided to sell the coin, the word “SOLD” would appear in red in the center of the screen for 1 s. A progress bar on the bottom of the screen indicated how much time remained in the block, out of the total of 12 min.
The blocks were defined according to the distribution of wait durations. In the high-persistence block (HP), wait times were sampled according to a uniform distribution ranging from 0 to 20 s. In this condition, the coin was equally likely to mature at any time within the 20 s interval, and the optimal strategy was always to wait until it matured. In the limited-persistence (LP) environment, wait times were sampled according to a heavy-tailed generalized Pareto distribution truncated at 40 s. As is illustrated in Figure 1B, the optimal strategy in the limited-persistence environment was to wait only 2.3 s for the coin to mature.
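The optimal policies can be derived numerically by scanning candidate giving-up times and computing the implied earnings rate. The sketch below (Python) assumes an illustrative heavy-tailed distribution and a 2 s per-trial overhead, so the exact LP optimum will not match the 2.3 s that holds for the task's true timing parameters, but the qualitative contrast does: wait out every trial in HP, give up early in LP.

```python
import numpy as np

rng = np.random.default_rng(1)

def earnings_rate(delays, giving_up_time, reward=10.0, overhead=2.0):
    """Expected cents per second when quitting at `giving_up_time`."""
    rewarded = delays <= giving_up_time          # trials that pay out 10 cents
    time_per_trial = np.minimum(delays, giving_up_time) + overhead
    return reward * rewarded.mean() / time_per_trial.mean()

uniform = rng.uniform(0.0, 20.0, 200_000)
raw = rng.pareto(1.5, 500_000) * 2.0   # illustrative heavy-tailed delays
heavy = raw[raw <= 40.0][:200_000]     # truncated at 40 s

# Scan giving-up times and pick the rate-maximizing one in each environment.
grid = np.arange(0.5, 20.5, 0.5)
best_hp = grid[np.argmax([earnings_rate(uniform, g) for g in grid])]
best_lp = grid[np.argmax([earnings_rate(heavy, g) for g in grid])]
print(best_hp, best_lp)
```

With these assumptions, the rate-maximizing giving-up time in the uniform environment is the full 20 s, while in the heavy-tailed environment it is only a few seconds.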
Procedure
The experiment was conducted over two sessions, with the order of the blocks held constant across participants to highlight differences across lesion groups. Session 1 began with a high-persistence block followed by a limited-persistence block. Participants later returned for Session 2 (median of 22 d between sessions), which began with a limited-persistence block followed by a high-persistence block. Because participants’ behavior in Block 2 tended to be highly influenced by what they learned in Block 1 (see Fig. 2A for an illustration), we take the first block of each session (HP from Session 1 and LP from Session 2) as instances of learning that were not contaminated by previous experience in a different environment. These blocks are the focus of the analyses that follow. The distinction between the two blocks was signaled by a change in the color of the coin, but participants were not explicitly told about the characteristics of the two wait time distributions.
A, Average area under the curve (AUC) for each lesion group on all four blocks of the willingness to wait task. Overall, Block 2 performance is very influenced by Block 1. For this reason, in the subsequent analyses, we focus on Block 1 as an instance of learning that was uncontaminated by recent experience in a different environment. B, C Within-trial willingness to wait during Block 1 for each group of participants with lesions and healthy controls. B, Each group’s mean survival curve is plotted for the high-persistence environment (left) and the limited-persistence environment (right). C, Top, Area under the survival curve (AUC) for each group of participants in the high- and limited-persistence environments. Opaque circles indicate group means and each transparent circle represents a participant. Bottom, Difference in average time waited in the HP versus the LP condition, separated by lesion group. Range corresponds to the mean ± the standard error for each group.
Statistical analyses
To capture waiting behavior, we conducted three main types of analyses in R (4.2.3).
First, we used the “survival” package to estimate survival curves for each lesion group in the two temporal environments. These survival curves estimated the likelihood that a participant would still be waiting at time t, given that the coin had not yet matured. Because trials in which participants waited until reward receipt provided only a lower bound on how long they would have been willing to wait, we treated these trials as right-censored. Because trials never lasted longer than 20 s in the HP environment, we restricted our analyses to the first 20 s of every trial so that the survival curves covered comparable time ranges in both temporal environments. From this survival curve analysis, we estimated the area under the curve (AUC) using the “DescTools” package in R. The AUC is commonly used as a proxy for mean survival time and provided us with an estimate of how many of the first 20 s participants were, on average, willing to wait.
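A hand-rolled sketch of this analysis (in Python rather than R, with hypothetical trial data) shows how right-censoring enters the Kaplan-Meier estimate and how the AUC is read off the resulting step function:

```python
import numpy as np

def km_survival(times, quit_events):
    """Kaplan-Meier estimate of P(still waiting at time t).
    `quit_events` is True where the participant quit (the event occurred)
    and False where the trial ended in reward delivery (right-censored)."""
    order = np.argsort(times)
    times = np.asarray(times, dtype=float)[order]
    quit_events = np.asarray(quit_events)[order]
    surv, at_risk = 1.0, len(times)
    steps = [(0.0, 1.0)]
    for t, is_quit in zip(times, quit_events):
        if is_quit:
            surv *= (at_risk - 1) / at_risk   # drop only at quit events
            steps.append((t, surv))
        at_risk -= 1                          # censored trials leave the risk set
    return steps

def survival_auc(steps, horizon=20.0):
    """Area under the step-function survival curve over [0, horizon]."""
    area, (t_prev, s_prev) = 0.0, steps[0]
    for t, s in steps[1:]:
        area += s_prev * (min(t, horizon) - t_prev)
        t_prev, s_prev = min(t, horizon), s
    return area + s_prev * (horizon - t_prev)

# Hypothetical trials: two quits, one reward at 12 s, two full 20 s waits.
times = [3.0, 8.0, 12.0, 20.0, 20.0]
quits = [True, True, False, False, False]
print(survival_auc(km_survival(times, quits)))
```

The AUC here is the mean number of the first 20 s this hypothetical participant was willing to wait; censored trials contribute to the risk set without pulling the curve down.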
Second, to account for individual variability, we used the “coxme” package to run a mixed-effects Cox proportional hazards model with participant-level random effects for the effect of temporal environment. Lesion group was modeled as a fixed effect, with the healthy control group as the reference condition. Our predictors of interest were lesion group, temporal environment, and the interaction between the two. A significant positive coefficient for temporal environment would indicate that participants were more likely to give up on a trial (that is, to wait less) in the limited-persistence condition. A significant positive coefficient for a lesion group would indicate that the group waited less than the healthy controls. Finally, a significant negative coefficient for the interaction between lesion group and environment would indicate reduced sensitivity to environmental characteristics compared with healthy controls.
Finally, we followed the approach proposed in McGuire and Kable (2012) to obtain a running estimate of each participant’s quitting threshold across the entire task. The running estimate was initialized to the longest time the participant waited on any trial up to (and including) the first time they quit. Willingness to wait was then updated trial by trial: if a participant waited for longer than the current running estimate, we raised the value to the observed waiting time. If instead they quit at a delay shorter than the current running estimate, we dropped the value to the observed quit time. Finally, if they waited until the end of a trial but the reward came earlier than their current willingness to wait, we did not update the estimate. This procedure allowed us to compute a “best guess” measure of participants’ tolerance for waiting throughout the task. To assess whether participants learned to wait longer in the high-persistence condition, we computed a difference score for each participant by subtracting their running estimate of willingness to wait in the limited-persistence condition from their willingness to wait estimate in the high-persistence condition. We then averaged this difference across each minute of the task and assessed whether it was significantly greater than zero for each lesion group.
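The updating rule can be stated compactly in code. The sketch below (Python; the example trial sequence is hypothetical) follows the three cases described above:

```python
def running_wtw(trials):
    """Running estimate of willingness to wait.

    `trials` is a list of (elapsed_seconds, outcome) pairs in task order,
    where outcome is "quit" or "reward".
    """
    # Initialize to the longest wait on any trial up to and including
    # the first quit (or over all trials, if the participant never quit).
    first_quit = next((i for i, (_, o) in enumerate(trials) if o == "quit"),
                      len(trials) - 1)
    wtw = max(t for t, _ in trials[:first_quit + 1])

    estimates = []
    for elapsed, outcome in trials:
        if elapsed > wtw:
            wtw = elapsed        # waited longer than the estimate: raise it
        elif outcome == "quit":
            wtw = elapsed        # quit earlier than the estimate: lower it
        # rewarded before reaching the estimate: leave it unchanged
        estimates.append(wtw)
    return estimates

trials = [(5, "reward"), (12, "reward"), (4, "quit"), (9, "quit"), (15, "reward")]
print(running_wtw(trials))  # [12, 12, 4, 9, 15]
```

Note how the reward at 5 s leaves the estimate untouched, while quits and long waits move it down and up, respectively.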
Computational model
To model behavior in the willingness-to-wait task, we fit a Q-learning model that has been adapted for this task (Chen et al., 2022, 2024). This model successfully explains behavior in identical willingness-to-wait paradigms, outperforms other competing modeling frameworks like R-learning (Schwartz, 1993) and other parameterizations of Q-learning, and yields reliable parameter estimates.
Task representation
We represent the willingness-to-wait task as a Markov decision process (MDP) in which the waiting period is broken into discrete time steps, each 1 s long. Every trial begins at time step t = −2, and a token appears after a 2 s intertrial interval at t = 0. For each time step after that, the agent must decide whether to keep waiting or to quit. If the agent chooses to wait and the token remains unmatured, the decision process proceeds to the next time step. Alternatively, if the agent quits or the token matures, the agent returns to t = −2 to start a new trial. We will refer to the trial-wise payoff as R, with R = 0 if the agent does not wait until the token matures, or R = 10 if it does. The total number of time steps within a trial is referred to as T.
Model description
Broadly, the Q-learning model assumes that the agent compares the value of quitting to the value of waiting at each time step to arrive at a decision. The value of continued waiting depends on how much time has elapsed since the token appeared and is referred to as q(wait, t). Conversely, the value of quitting, q(quit), is not time dependent, since the result of quitting is the same regardless of how long the waiting period has lasted. The agent makes a wait-or-quit decision every second by applying a softmax function, with inverse temperature τ, to the difference between q(wait, t) and q(quit).
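Concretely, with τ as the inverse temperature, the wait probability can be sketched as a logistic function of the value difference (our parameterization for illustration; the published model may scale terms differently):

```python
import math

def p_wait(q_wait_t, q_quit, tau):
    """Probability of waiting at time step t under a softmax choice rule
    with inverse temperature tau."""
    return 1.0 / (1.0 + math.exp(-tau * (q_wait_t - q_quit)))

# Higher tau makes choices more deterministic around the value difference.
print(p_wait(1.2, 1.0, 2.0), p_wait(1.2, 1.0, 10.0))
```

When the two values are equal the agent waits with probability 0.5; as τ grows, the same small value difference produces increasingly deterministic choices, which is why τ captures action selection noise.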
At the end of a trial, the value of each preceding time step is updated simultaneously according to a feedback signal g(t), a local estimate of the total discounted rewards following time step t. Computing g(t) requires adding up the current-trial reward and the prospective reward from future trials, each modulated by a discount parameter γ. The current-trial reward corresponds to R and is received explicitly at the end of the trial. Conversely, the prospective reward from all future trials is equivalent to the value of quitting, q(quit), which reflects the agent’s best-guess estimate of prospective rewards given the recurrent nature of the task.
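One way to write the end-of-trial update is sketched below (Python). The sign-dependent learning rates α and ν follow the description above, but the exact placement of the γ factors in g(t) is our assumption rather than the published equations:

```python
import numpy as np

def update_values(q_wait, q_quit, T, R, alpha, nu, gamma):
    """End-of-trial update of q(wait, t) for t = 0..T-1 (a sketch).

    g(t) discounts the trial reward R and the continuation value q(quit)
    by gamma once per intervening time step (an illustrative assumption).
    Positive prediction errors are scaled by alpha, negative ones by nu.
    """
    q_wait = q_wait.copy()
    for t in range(T):
        g = gamma ** (T - t) * (R + gamma * q_quit)  # feedback signal
        delta = g - q_wait[t]                        # prediction error
        lr = alpha if delta >= 0 else nu             # asymmetric learning
        q_wait[t] += lr * delta
    return q_wait
```

With γ = 1, a rewarded trial (R = 10) pulls every waited time step toward R + q(quit) at rate α, while a quit trial (R = 0) pulls them down toward the discounted continuation value at the slower rate ν.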
The last component of the model is how the initial value estimates are determined. q(quit) is initialized to the average optimal rate of return across the HP and LP environments, divided by a scaling constant to reflect discounting of future rewards (we take the scaling constant to be 1 − 0.85 = 0.15, which roughly matches the empirical median discount rate).
Model fitting
We fit the model in Stan (Carpenter et al., 2017) separately for each participant. This yielded posterior distributions of parameter estimates sampled using Markov chain Monte Carlo (MCMC) with uniform priors (τ ∈ [0.1, 22], α ∈ [0, 0.3], γ ∈ [0.7, 1], η ∈ [0, 6.5]). To summarize the full posteriors, we used the mean of each distribution as the point estimate of the corresponding parameter in the subsequent sections.
In order to maximize the reliability of the parameter estimates, we included all the data collected from each participant, including the second block from each session that is excluded from the behavioral analyses. We assumed that participants learned continuously throughout their participation in the experiment, so we did not reinitialize value estimates across sessions or blocks. We fit the model with four chains of 8,000 samples for each participant, discarding the first 4,000 samples as burn-in. If the model did not converge for a given participant (R-hat > 1.01, effective sample size per chain < 100, or divergent transitions detected), we fit the model again with four chains and 20,000 samples per chain (10,000 as burn-in). With this procedure, we were able to successfully fit the model on all 46 of our participants.
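The escalation logic can be summarized as a small helper (a sketch; the diagnostic field names are ours, with thresholds taken from the criteria above):

```python
def needs_refit(diag):
    """Return True if a fit fails any convergence criterion and should be
    rerun with longer chains (20,000 samples, 10,000 burn-in)."""
    return (diag["max_rhat"] > 1.01
            or diag["min_ess_per_chain"] < 100
            or diag["n_divergent"] > 0)

fits = [
    {"max_rhat": 1.002, "min_ess_per_chain": 850, "n_divergent": 0},
    {"max_rhat": 1.030, "min_ess_per_chain": 420, "n_divergent": 0},
]
print([needs_refit(f) for f in fits])  # [False, True]
```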
Model validation
The five-parameter Q-learning model described above has been extensively validated in a larger dataset of healthy controls performing the same task (Chen et al., 2022, 2024). We point the reader to these references for tests of parameter recoverability and comparisons to R-learning. Nonetheless, we performed several tests of model fit specifically on our data, including (1) comparisons to a single-learning-rate Q-learning model and a “baseline” no-learning model and (2) comparisons across different model-fitting techniques (fit on only Block 1 data, including or excluding reinitialization of Q-values between sessions).
Overall, we found that the five-parameter Q-learning model outperformed both the four-parameter alternative and the “baseline” model with one free parameter for each participant’s general willingness to wait. Indeed, using WAIC (Vehtari et al., 2017) as our measure of model comparison, the five-parameter model led to greater out-of-sample predictive accuracy compared with the single-learning-rate alternative (paired t test: t(45) = −3.16; p = 0.0028). Similarly, the five-parameter model outperformed the “baseline” model that assumes no learning (t(45) = −6.17; p < 0.001). Turning to the different model-fitting procedures, we considered trial-wise expected predictive accuracy (ELPD) as our measure of model fit, given that the alternatives vary in the total number of data points. We found no difference in ELPD whether we fit the model on Block 1 only or on both Blocks 1 and 2. Similarly, there was no significant difference in ELPD when we reinitialized Q-value estimates between sessions, for both the version fit on Block 1 data and the version fit on the full dataset (regressions of ELPD on fit type: p > 0.3 for all comparisons). Given that we found no significant differences, and given that fitting the model on the full dataset without reinitialization is the most straightforward procedure, this is the version we focus on in the Results section.
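These comparisons amount to paired tests over per-participant model-fit scores. A minimal sketch (Python; the WAIC numbers below are synthetic, purely to illustrate the computation):

```python
import numpy as np

def paired_t(a, b):
    """Paired t statistic for per-participant differences a - b.
    With WAIC (lower = better), a negative t favors the model in `a`."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# Synthetic per-participant WAIC values for 46 participants
# (illustration only, not the study's estimates).
rng = np.random.default_rng(3)
waic_full = rng.normal(1500.0, 50.0, 46)          # five-parameter model
waic_one_lr = waic_full + rng.normal(8.0, 10.0, 46)  # single-learning-rate model
print(paired_t(waic_full, waic_one_lr))
```

Pairing by participant is what gives the test its power here: absolute WAIC varies widely across individuals, but the within-participant difference between models is comparatively stable.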
Results
Behavior
Although participants completed two blocks of the willingness to wait task in each session, behavior on the second block was strongly influenced by what participants learned in Block 1 (Fig. 2A). For this reason, in the following Results section, we focus on Block 1 as an instance of learning that was uncontaminated by recent experience in a different environment.
If participants are sensitive to the temporal statistics of each condition, they should wait longer in the high persistence (HP) than in the limited persistence (LP) environment. Our experimental design presented the timing environments in a fixed order to prioritize sensitivity to between-group differences; nevertheless, we examined whether the healthy control group data contained within-participant patterns consistent with the previously observed effect of timing environment. To quantify the effect, we compared the mean area under the survival curve (AUC) in each of the two environments. Doing so amounted to assessing how many of the first 20 s of each trial participants waited on average, thus factoring out the longer maximum waiting time in the LP condition (40 s in LP vs 20 s in HP). Consistent with previous results (McGuire and Kable, 2012, 2015; Fung et al., 2017; Lempert et al., 2018; Lang et al., 2021), we found that the average AUC was significantly greater in the HP than the LP environment in the healthy control group (t(17) = 4.6; p = 0.00024; Fig. 2C).
The frontal control (FC) group exhibited waiting behavior that was quantitatively similar to that of the healthy controls (Fig. 2B,C). The average AUC for frontal controls was significantly greater in the HP condition than in the LP condition (difference in AUC: t(8) = 2.5; p = 0.031; Fig. 2C). To quantify this similarity, we fit a proportional hazards model that predicted time waited as a function of both lesion group and the interaction between lesion group and temporal environment. Neither of the two coefficients showed a significant effect of the FC group relative to healthy controls (βFC = 0.018; p = 0.80; βFC*env = 0.30; p = 0.58). Together, these findings suggest that the waiting behavior exhibited by the frontal controls follows similar principles of dynamic valuation to that of healthy controls.
The remaining participants showed deficits in their ability to adaptively calibrate wait duration, and their behavior differed significantly from that of the healthy controls (Fig. 2B,C). Although both the vmPFC and dmPFC/AI lesion groups behaved suboptimally, they exhibited different biases with respect to their sensitivity to environmental differences versus overall reward. On the one hand, participants with lesions to the vmPFC were less willing to wait for the reward overall, especially when the temporal statistics of the environment favored persistence. The mean AUC for participants with vmPFC lesions was 8.1 ± 5.6 s in the HP condition, compared with 12.4 ± 4.4 s for the healthy controls (t(26) = 2.1; p = 0.037). Relatedly, the group coefficient from the proportional hazards model was significantly positive for the vmPFC group compared with healthy controls (βvmPFC = 1.09; p < 0.0001; Fig. 2B), indicating a higher hazard of quitting and thus a lower survival rate overall.
Conversely, participants in the dmPFC/AI group did not wait significantly less time than the healthy controls: their average AUC was 10.1 ± 6.3 s across the two conditions, which was statistically indistinguishable from the healthy control group’s overall mean of 10.3 ± 5.6 s (t(23) = 0.09; p = 0.92). However, participants with lesions to the dmPFC/AI did not calibrate their waiting behavior based on the statistics of the temporal environment. This reduced sensitivity to the difference between the high and limited persistence conditions was evident in the effect on wait duration of the interaction between environment and dmPFC/AI lesion group compared with controls (βdmPFC/AI*env = −1.15; p = 0.049). Similarly, we found no significant difference in AUC across environments for participants in the dmPFC/AI group (t(6) = 0.74; p = 0.48; Fig. 2C). To ensure that these results were not biased by our procedure of combining participants with lesions to the dmPFC and AI, we also replicated the analyses focusing only on the six participants with lesions to the AI, and the results remained unchanged.
Thus far, we have focused on trial-wise analyses of waiting behavior that collapse across the entire block. While it is clear that lesion location modulates willingness to wait overall, the process by which lesions might affect how temporal environments are learned remains unclear. Figure 3 shows a local estimate of giving-up time on Block 1, averaged across 1 s time points and across participants, as it evolved throughout learning. This estimate was dynamically updated based on participants’ trial-by-trial behavior. By the end of the 12 min, both the healthy participants and the frontal controls had learned to wait longer in the high- than the limited-persistence environment, as indexed by a reliably positive difference in willingness to wait in HP versus LP (one-sample t tests averaged over the second half of the task: p = 0.002 and p < 0.001, respectively). The evolution of willingness to wait for the dmPFC/AI and vmPFC groups followed a different pattern, however. For the participants with dmPFC/AI lesions, learning appeared to be somewhat erratic, with no sensitivity to temporal environment by the end of the task (one-sample t test averaged over the second half of learning: p = 0.84). Though the vmPFC lesion group did, on average, wait significantly longer in the HP than the LP condition in the second half of the task (one-sample t test: p = 0.02), it is not clear that this difference was the result of learning. Indeed, while the magnitude of the difference in willingness to wait across environments increased with time for both the healthy participants and frontal controls (β = 0.31, p = 0.001 and β = 0.21, p = 0.05, respectively), this progression was not evident for either the dmPFC/AI or vmPFC lesion group (β = 0.28, p = 0.09 and β = −0.02, p = 0.72).
A, Running average (± standard error) of willingness to wait across both blocks of the experiment, sampled at 1 s intervals. B, Evolution of the difference in estimated willingness to wait in the high-persistence versus limited-persistence condition over time. Smoothed line estimated from a cubic fit to the data, including standard error. Filled circles correspond to mean WTW difference across 72 s bins (10 total bins); lines correspond to bootstrapped confidence intervals around each mean.
Parameter comparisons
The model-free analyses we have just described emphasized the functional specificity of different brain regions, with the vmPFC having a role in overall persistence and the dmPFC/AI involved in adaptive calibration. To further understand how different brain regions contribute to waiting behavior, we fit a computational model of willingness to wait to the data, as described in the Materials and Methods section and in Figure 4. This version of the model, which contains five free parameters, outperformed two alternatives: one with a single learning rate for both positive and negative feedback, and another with only one free parameter for average willingness to wait (paired t test relative to the one learning rate model: t(45) = −3.16; p = 0.0028; paired t test relative to the “baseline” model: t(45) = −6.17; p < 0.001; Fig. 5A). Each of the five parameters in the best-performing model was estimated separately for each individual and captures a distinct feature of the decision process. We ensured that the model could recapitulate behavior by simulating both the AUC and full survival curves for each participant based on model-derived estimates of their parameter values; these simulations are plotted in Figure 5B,C. Across both temporal environments, the AUC derived from model simulations did not differ significantly from the true behavioral AUC in any group (smallest p value = 0.31). The behavioral effects we observe were also replicated in the simulated survival curves, which show sensitivity to temporal context only for the healthy and frontal controls. These findings highlight that the model could successfully capture behavioral tendencies as they varied across participants and lesion groups.
Schematic of the five-parameter reinforcement learning model used to computationally characterize willingness to wait. It assumes that participants maintain estimates of the value of quitting and the value of waiting at each time step and compare the two in order to determine choice.
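To make the structure of this process concrete, the sketch below implements a simplified version of the decision rule. This is our illustration under stated assumptions, not the fitted model: we assume the agent holds a value of waiting at each 1 s step and a scalar value of quitting, chooses between them through a softmax with inverse temperature τ, initializes the wait values from an intercept η with an assumed linear decline, and scales the learning rate on non-rewarded outcomes by ν. The variable names mirror the parameters discussed in the text, but the exact functional forms here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_wait_values(eta, n_steps, slope=0.1):
    """Initial value of waiting at each time step: intercept eta with an
    assumed linear decline (the true form of Eq. 6 may differ)."""
    return eta - slope * np.arange(n_steps)

def simulate_trial(q_wait, q_quit, tau, scheduled_delay, dt=1.0):
    """Step through one trial, choosing wait vs quit at each time step."""
    t = 0.0
    while t < len(q_wait) * dt:
        i = int(t / dt)
        # softmax over the two actions; tau = inverse temperature
        p_wait = 1.0 / (1.0 + np.exp(-tau * (q_wait[i] - q_quit)))
        if rng.random() > p_wait:
            return t, False                   # quit before the reward arrived
        t += dt
        if t >= scheduled_delay:
            return scheduled_delay, True      # waited long enough; rewarded
    return t, False                           # trial timed out

def update_wait_values(q_wait, visited_steps, outcome, alpha, nu, rewarded):
    """Valence-dependent update: non-rewarded outcomes are learned from at a
    rate scaled by nu (assumed form; nu < 1 implies weaker learning from
    quit/timeout trials than from rewards)."""
    lr = alpha if rewarded else alpha * nu
    for i in visited_steps:
        q_wait[i] += lr * (outcome - q_wait[i])
    return q_wait
```

Under this formulation, a lower η shifts the point at which the value of waiting falls below the value of quitting earlier in the trial, and a lower ν slows the revision of wait values after quit trials, which are the two effects the parameter comparisons below pick up.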
A, Model comparison. Difference in WAIC for each participant between the two learning rate Q-learning model reported in the text and two possible alternatives: a one learning rate Q-learning model (left), and a baseline model that fits one free parameter for average willingness to wait and assumes no learning (right). More negative values indicate a comparatively smaller WAIC for the two learning rate model, indicating better model fit. Overall, the two learning rate model outperforms both alternatives. B, Comparison between curves simulated using model-derived parameters (top; lighter colors) and actual trial-wise waiting behavior (bottom; darker colors). The model can recapitulate group differences in sensitivity to temporal environments as well as overall waiting behavior. C, Comparison of AUCs from the behavioral data (plotted in darker colors) and model simulations based on the parameter estimates for each participant (plotted in lighter colors). Here too, the model can recapitulate the differences in waiting behavior across lesion groups.
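For reference, the WAIC used in this model comparison can be computed from an S × N matrix of pointwise log-likelihood draws (S posterior samples, N trials). The following is a generic sketch of the standard formula (log pointwise predictive density minus an effective-parameter penalty), not the authors' fitting code.

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale from an (S samples x N observations)
    matrix of pointwise log-likelihoods; lower values indicate better
    expected predictive accuracy."""
    # log pointwise predictive density, computed stably (log-sum-exp trick)
    m = log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.mean(np.exp(log_lik - m), axis=0)))
    # effective number of parameters: posterior variance of the log-likelihood
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)
```

Per-participant WAIC differences of the kind plotted in panel A then follow from applying this to each candidate model's log-likelihood draws and subtracting.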
Turning to parameter comparisons, we found that the model captured the behavioral dissociations outlined above through significant differences in the values of two parameters (Fig. 6). First, people with lesions to the vmPFC had lower values of initial willingness to wait on average, as captured by parameter η (Wilcoxon rank sum test: p = 0.040). As described in Equation 6, η is an intercept term that determines when the value of waiting falls below the value of quitting for each person, before any learning. Higher values of η indicate more patient behavior at baseline, whereas lower values of η indicate an initial reluctance to wait. Second, lesions to the dmPFC or anterior insula resulted in lower estimates of the valence-dependent updating parameter ν (Wilcoxon rank sum test: p = 0.041). This difference suggests that participants with dmPFC/AI lesions were less able to learn from quit trials. Indeed, the average value of ν within the dmPFC/AI group was 0.74 (compared with 2.10 for the healthy controls), indicating a strong bias toward greater belief-updating from positive, as opposed to negative or neutral, outcomes. Finally, though we saw a marginal difference between the value of τ for healthy versus frontal controls—with frontal controls making somewhat less noisy choices (Wilcoxon rank sum test: p = 0.051)—none of the other parameters estimated from the model were reliably different between the healthy control and the frontal control groups.
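The group comparisons above rely on Wilcoxon rank-sum tests, which make no normality assumption about the distribution of parameter estimates across these small groups. A generic normal-approximation version of the test (ignoring ties, so a sketch rather than production statistics code):

```python
import math
import numpy as np

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.
    No tie correction is applied in this sketch."""
    pooled = np.concatenate([x, y])
    order = np.argsort(pooled)
    ranks = np.empty(len(pooled))
    ranks[order] = np.arange(1, len(pooled) + 1)
    w = ranks[: len(x)].sum()                 # rank sum of the first group
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2.0             # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p value
    return z, p
```

Applied to, say, per-participant estimates of η in the vmPFC group versus healthy controls, a negative z indicates lower ranks (lower η) in the first group.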
Parameter estimates derived from the computational model for each participant group. Participants with vmPFC lesions showed reduced initial willingness to wait, while dmPFC/AI damage led to lower valence-dependent updating. There are no other significant group-level differences in parameter estimates, other than a trend toward the inverse temperature parameter being higher for the frontal controls relative to the healthy controls (p = 0.051). Though the interpretability of this difference is limited by sample size, if this is a reliable effect, it would further support our observation that lateral prefrontal cortex is not necessary for the adaptive calibration of waiting behavior.
In order to ensure that these group-level differences were not merely a consequence of our particular modeling choices, we performed several additional checks. First, while the specific values we reported above came from the means of the posterior distributions, we verified that similar group differences held with median-based point estimates for each posterior instead (p = 0.04 for η and p = 0.07 for ν). We also fit the model in three alternative ways: (1) only on Block 1 data, reinitializing Q-values in between sessions, (2) on Block 1 data without reinitializing, and (3) on both blocks of data, reinitializing Q-values in between sessions. Figure 7B highlights that if we average the parameter values for each participant across these four model fits, we continue to find that lesions to vmPFC led to reduced values of η and that participants with lesions to the dmPFC or AI tended to update less from negative feedback (ν). Furthermore, we found no significant differences in model performance across these four alternatives (Fig. 7A). Taken together, these additional checks help ensure that the group differences in parameter estimates we observe, though subtle, are robust.
A, Expected log pointwise predictive density (ELPD), averaged across trials within participant, across four different model-fitting techniques. There are no significant differences in predictive accuracy if the model is fit on only Block 1 data (with or without reinitializing Q-values across sessions) or both blocks of data (with or without reinitializing Q-values across sessions). To maximize data use and minimize assumptions, we report the parameter estimates from the model fit on both blocks without reinitialization in the text (“2 block + No reinit”). B, Group-level differences in parameter estimates relative to healthy controls, using a “composite” estimate derived from averaging across the four different model fits illustrated in A.
Discussion
Here, we provide causal evidence for the necessity of the vmPFC and dmPFC/anterior insula in the adaptive calibration of persistence. While participants with lesions to the vmPFC were less willing to wait for delayed rewards overall, lesions to the dmPFC and anterior insula caused a deficit in people’s ability to learn from quit trials. Crucially, lesions to the lateral prefrontal cortex, a region implicated in self-control (Figner et al., 2010; Casey et al., 2011; Heatherton and Wagner, 2011; Blain et al., 2016), did not affect performance. We validated these findings with a computational model of task performance, which highlighted deviations in initial willingness to wait for the vmPFC group and decreases in the rate of learning from negative feedback for the dmPFC/AI group. Thus, both neural substrates contribute necessary and functionally distinct computations to adaptive persistence.
Our finding that lesions to the vmPFC led to reduced willingness to wait across both temporal environments echoes its known role in the representation of subjective value (Chib et al., 2009; Bartra et al., 2013; Clithero and Rangel, 2014). The subjective nature of this representation is highlighted by the fact that vmPFC activity during the willingness-to-wait task is modulated by temporal context and dynamically changes over the course of the delay interval (McGuire and Kable, 2015). However, we did not find that lesions to the vmPFC specifically disrupted the calibration of waiting behavior to the temporal environment. Rather, we found that lesions caused participants to wait less overall. This result suggests that the vmPFC may play a more general role in maintaining behavior toward delayed rewards, which is in line with previous findings in intertemporal choice tasks. Participants with mPFC lesions exhibit a stronger preference for smaller-sooner reward options, and cortical thickness in the vmPFC and neighboring orbitofrontal cortex (OFC) is correlated with patience in a delay discounting paradigm (Sellitto et al., 2010; Pehlivanova et al., 2018). Thus, the vmPFC may play an important role in tempering impulsivity by holding a neural representation of the upcoming reward. This framework is also in line with reports of increased impulsivity in people who sustain damage to the vmPFC (Bechara et al., 1994; Torregrossa et al., 2008). VmPFC damage appears to bias people away from considering the long-term consequences of behavior, simultaneously leading to more present-oriented behavior (Fellows and Farah, 2005) and—in our experimental context—decreased willingness to wait.
Our finding that lesions to the anterior insula and dmPFC lead to impairments in learning from feedback is consistent with existing evidence that these brain regions play a role in belief updating. For instance, McGuire et al. (2014) found reliable activity in the insula and dmPFC in a change-point detection task that required participants to update their beliefs about reward contingencies. That work identified increased activity in these brain regions when a surprising outcome indicated that participants needed to revise their mental model of the task. Similarly, Stöttinger et al. (2014) showed that participants with insular lesions struggled on tasks that required the updating of mental models across perceptual and nonperceptual domains. These findings suggest that the insula and the dmPFC play a necessary role in the selective integration of task-relevant information, which could be related to their involvement in the brain’s “salience network” (Seeley et al., 2007).
In the context of the willingness to wait task, McGuire and Kable (2015) highlighted a role for the dmPFC and the anterior insula in signaling imminent quit decisions. If we conceptualize the final seconds before a quit decision as particularly salient, then this pattern is in line with the network’s role in flagging relevant information. Furthermore, it may be that the signal in the dmPFC and anterior insula that ramps up right before a quit decision helps the brain to learn from quit events in addition to rewards. Although this interpretation is speculative, it is supported by the results from the computational model; the dmPFC/AI lesion group differed specifically in a parameter that governed learning from quit trials.
So far, we have emphasized the role vmPFC and dmPFC/anterior insula play in the calibration of persistence. Of equal importance, however, is the finding that participants in the frontal control group—who have lesions to more lateral portions of the prefrontal cortex—were not impaired in the willingness to wait task. Self-control is often thought to rely on lateral regions of the prefrontal cortex and especially the inferior frontal gyrus (IFG). For instance, Heatherton and Wagner (2011) put forth a framework in which top-down control from the lateral PFC and the IFG over subcortical areas like the striatum and the amygdala is necessary for successful self-regulation. This theory is inspired by evidence that response inhibition, or the ability to override a prepotent response in favor of the correct choice, relies on lateral PFC activation (Rubia et al., 2003; Aron et al., 2004; Casey et al., 2011; Tabibnia et al., 2011). In our results, however, we see no difference in waiting behavior between participants with lesions to the lateral regions of the prefrontal cortex and healthy controls. If choosing to wait were really a matter of sustained self-discipline, we would expect lesions to the lateral PFC to result in decreased waiting overall, rather than the intact performance we observe.
To support our claims about functionally distinct roles of different regions of the frontal cortex in voluntary persistence, we fit a task-specific reinforcement learning model to the data. This model has previously been validated on datasets with much larger sample sizes than ours (Chen et al., 2022, 2024), following guidelines for the computational modeling of behavioral data outlined in Wilson and Collins (2019). We also performed a variety of performance checks within this smaller sample to help establish its validity. Nonetheless, we recognize that our sample sizes are small and that the group differences are, for the most part, subtle. We thus consider the model to be a helpful complement to the stark behavioral differences, one that clarifies which parts of the learning process might be disrupted. The group-level differences in parameter estimates should nonetheless be validated in larger samples in the future.
Furthermore, in line with the general notion that all models are at least partially wrong, we recognize that many other well-suited modeling alternatives exist. Of particular interest here is the distinction between model-free and model-based reinforcement learning. While the Q-learning model we implemented here is model-free in that it does not assume that people are learning higher-level distributions of externally imposed delay durations, a model-based alternative might assume that participants are forming beliefs about the temporal statistics of each environment (Griffiths and Tenenbaum, 2006). Although this is a reasonable assumption, we recently showed that providing participants with explicit descriptions of the wait durations did not significantly improve their performance (Lempert et al., 2023). Thus, learning in this paradigm may not rely on model-based processing. Formalizing a computational alternative to model-free Q-learning will help us further examine this issue in the future.
With this study, we have provided evidence that distinct regions of the brain play different causal roles in determining how long people are willing to wait for future rewards. Although we focus here on participants with brain lesions, we believe that the willingness to wait task and the broader framework of persistence as valuation is relevant to other clinical domains. Indeed, mental illnesses like anxiety and depression often involve deviations in how reward is processed, particularly when its receipt is delayed or uncertain (Lempert and Pizzagalli, 2010; Takahashi et al., 2011; Pulcu et al., 2014). As such, we expect that clinical populations may exhibit systematically different choice behavior in the paradigm we have described. For instance, Mukherjee et al. (2020) administered the willingness to wait task in a sample of participants with major depression and found that they waited significantly less time in the high-persistence environment than healthy controls. In the future, we hope to continue this line of work in other clinical populations, leveraging what we know about the computations that are required for adaptive behavior to shed light on processes that may go awry in mental illness.
Data Availability
Data are publicly available at https://github.com/cvangeen/wtw_lesion and can also be shared upon request.
Footnotes
We thank Lesley Fellows for facilitating access to participants in Montreal, Anjan Chatterjee and Roy Hamilton for facilitating access to participants in Philadelphia, Christine Déry and Eileen Cardillo for coordinating participants in Montreal and in Philadelphia, and all of the participants themselves, without whom this work would not have been possible. This work was supported by National Institute on Drug Abuse (NIDA) R01-DA029149 to J.W.K., National Institutes of Health (NIH) F32-DA030870, National Science Foundation BCS-1755757, and NIH R21-MH124095 to J.T.M. This research was supported in part by the Intramural Research Program of the NIH, NIDA (ZIA DA000642). The content of this paper is solely the responsibility of the authors and does not reflect the official views of the NIH.
The authors declare no competing financial interests.
- Correspondence should be addressed to Camilla van Geen at cvg{at}sas.upenn.edu.