Tuning the Brake While Raising the Stake: Network Dynamics during Sequential Decision-Making

When gathering valued goods, risk and reward are often coupled and escalate over time, for instance, during foraging, trading, or gambling. This escalating frame requires agents to continuously balance expectations of reward against those of risk. To address how the human brain dynamically computes these tradeoffs, we performed whole-brain fMRI while healthy young individuals engaged in a sequential gambling task. Participants were repeatedly confronted with the option to continue with throwing a die to accumulate monetary reward under escalating risk, or the alternative option to stop to bank the current balance. Within each gambling round, the accumulation of gains gradually increased reaction times for “continue” choices, indicating growing uncertainty in the decision to continue. Neural activity evoked by “continue” choices was associated with growing activity and connectivity of a cortico-subcortical “braking” network that positively scaled with the accumulated gains, including pre-supplementary motor area (pre-SMA), inferior frontal gyrus, caudate, and subthalamic nucleus (STN). The influence of the STN on continue-evoked activity in the pre-SMA was predicted by interindividual differences in risk-aversion attitudes expressed during the gambling task. Furthermore, activity in dorsal anterior cingulate cortex (ACC) reflected individual choice tendencies by showing increased activation when subjects made nondefault “continue” choices despite an increasing tendency to stop, but ACC activity did not change in proportion with subjective choice uncertainty. Together, the results implicate a key role of dorsal ACC, pre-SMA, inferior frontal gyrus, and STN in computing the trade-off between escalating reward and risk in sequential decision-making. 
SIGNIFICANCE STATEMENT Using a paradigm where subjects experienced increasing potential rewards coupled with increasing risk, this study addressed two unresolved questions in the field of decision-making: First, we investigated an “inhibitory” network of regions that has so far been investigated with externally cued action inhibition. In this study, we show that the dynamics in this network under increasingly risky decisions are predictive of subjects' risk attitudes. Second, we contribute to a currently ongoing debate about the anterior cingulate cortex's role in sequential foraging decisions by showing that its activity is related to making nondefault choices rather than to choice uncertainty.


Introduction
When gathering goods to secure survival, agents continuously need to estimate long-run reward expectations and their associated risks. This is particularly pertinent in situations where risk and reward expectations are highly coupled and escalate over time, such as foraging, trading, or gambling. During foraging, for instance, the agent has to continuously balance the desire to collect as much food as possible against the risk of predation (Davies et al., 2012). Likewise, a stock trader has to decide whether to hold or to sell off a climbing stock that can at any time fall.
Recently, foraging-like task settings have been used to delineate neural activity underlying risk and reward computation during sequential decision making (Kolling et al., 2012, 2014; Congdon et al., 2013; Mobbs et al., 2013; Economides et al., 2014). Although these studies have yielded important insights into the neural underpinnings of sequential risk-taking behavior in foraging settings, they do not afford inference about the neural networks involved in the trade-off between competing responses as subjects respond with increasing caution to escalating risk and reward. In the context of action selection, a right-hemispheric inhibitory network consisting of the pre-supplementary motor area (pre-SMA), inferior frontal gyrus (IFG), caudate, and subthalamic nucleus (STN) has been implicated in reactive and proactive stopping (i.e., "braking") as well as response switching, although always in the context of externally cued action selection and inhibition (Aron et al., 2007, 2014; Jahfari et al., 2009; Neubert et al., 2010; Rae et al., 2015). It is thus unclear whether and how this network might be engaged in mapping external risk variables to voluntary action selection and whether its dynamics predict interindividual differences in risk-taking attitude.
To address these questions, we performed whole-brain fMRI while healthy volunteers repeatedly rolled a common six-sided die to accumulate rewards. With every throw, the accumulated reward increased relative to the number of pips. However, in the event of throwing a "1," participants lost their accumulated total for the round (see Fig. 1a). We modeled subjective choice evidence and choice uncertainty based on subjects' choice behavior. For each choice, the realizable reward (i.e., the "accumulated sum") increased in parallel with the risk defined as the range of possible outcomes (Markowitz, 1959; Voon et al., 2011). At each decision point, the player had to balance the escalating risk against the escalating reward when choosing between the two options, namely, to continue rolling the die in the hope of winning more, or to stop playing, contenting oneself with the reward accumulated during previous rolls. In this experimental context, stopping can be conceptualized as voluntary inhibition of the continue choice; thus, it can be expected that the tendency to stop monotonically increases over time during a gambling round. We therefore hypothesized that the inhibitory control network would already generate a steadily augmenting "braking signal" during "continue" choices in proportion to the accumulated sum.
In addition, the task enabled us to address an unresolved topic with respect to the role of dorsal ACC (dACC) in sequential decision-making. The dACC is associated with computing different aspects of the value of choice options in the context of changing temporal constraints (Kennerley et al., 2006; Kolling et al., 2014). Yet, it is controversial whether dACC tracks the value of the nondefault choice (Kolling et al., 2012, 2014) or deals with choice uncertainty (also referred to as decision difficulty) when utilities of alternative choices are similar (Botvinick, 2007; Shenhav et al., 2014). By modeling subjective parameters of choice uncertainty and stop probability, we were able to relate "continue" activity in ACC to choice uncertainty or nondefault choice. A nondefault choice was defined as the choice to continue throwing the die despite increasing subjective evidence for stopping. In most experiments, the repeatedly chosen continue response would be considered a default choice. However, in this specific experimental setting, the context of increasing subjective evidence for the stop decision gradually turns the continue choice into a nondefault action.

Materials and Methods
Participants. We included 20 healthy volunteers (9 female), without a history of drug abuse or neurological or psychiatric disorder. The South Oaks Gambling Screen (Lesieur and Blume, 1987) confirmed that none of the subjects was a pathological gambler. Two participants had to be excluded. One participant misunderstood the instructions, and another one had an abnormal anatomical MR scan. The study was approved by the Copenhagen Ethics Committee (KF 01-131/03), and all subjects gave written informed consent before participating in the study.
Sequential gambling task. The sequential gambling task was a single-player version of the dice game "pig" (see Fig. 1a). The task created an open-ended foraging-like environment where individuals were faced with cumulative rewards of escalating risk. Each gambling trial started with a 1.5-3.5 s (jittered) rolling phase where one of the six sides of a die was chosen randomly, shown for 150 ms and then replaced by another randomly drawn side of the die. Subsequently, the randomly chosen outcome of the trial was shown for 2 s, together with the accumulated gains gathered during this round. If the upper face of the die showed a number of pips >1 (i.e., rewarding throws), subjects were instructed to press a button with either the index finger or the middle finger of the right hand to continue throwing the die or to stop the round and bank the accumulated gains. The association of the choices with index finger and middle finger was counterbalanced across subjects. If subjects continued, a new rolling phase started after the 2 s. If the surface of the die only showed one pip, participants lost their accumulated total for the round. If subjects decided to stop or if the outcome of the throw was a "1," the banked amount (0 in case of a "1") was shown for 2.5 s until a new round started. If the subject did not respond within the 2 s, the round ended with a "0" amount to make sure that participants made relatively fast choices. Subjects were told that they would play the dice game for 25 min and that they would be paid out their average earnings, including lost rounds with zero earnings, after the experiment. Crucially, unlike in the Balloon Analog Risk Task (Rao et al., 2008; Fukunaga et al., 2012; Schonberg et al., 2012), the increasing incentive for stopping in the pig game is only driven by an increasing accumulated sum. Thus, risk is only increasing with the spread in possible outcomes and is independent of the probability of losing, which was 1/6 in all trials.
Furthermore, unlike the Angling Risk Task (Pleskac, 2008), the possible additional gain of continuing is variable and subjects can therefore only decide whether to continue or stop once they see the outcome of a trial if the accumulated sum is driving their behavior. Subjects played the dice game for 25 min, leading to an average of 187 "continue," 42 "stop," and 34 "loss" trials. PsychoPy software (version 1.74.01, www.psychopy.org; Peirce, 2009) was used for task presentation on a back-projected screen that participants viewed with a coil-mounted mirror. After receiving task instructions, participants performed a short training session outside the scanner to familiarize them with the task.
Imaging procedures. Participants were scanned in a Verio 3T scanner with a 32-channel head coil (Siemens). A T2*-weighted EPI sequence (TR 1.65 s, TE 26 ms, flip angle 74°) was used to map task-related changes of the regional BOLD signal as index of regional neural activity. The 910 brain volumes with 32 slices per volume were acquired in ascending order with an in-plane resolution of 3 × 3 mm and a slice thickness of 3.2 mm (FOV 192 × 192 × 134.4, acquisition matrix 64 × 64). The sagittal orientation of the brain volumes was aligned to the anterior-posterior commissure line. Pulse and respiration were recorded with an infrared pulse oximeter and a pneumatic thoracic belt.
Behavioral modeling. As the subjects at each trial (except for loss events) had to make a binary choice between "continue" or "stop," we formulated a simple logistic model of the choice behavior. At trial n, the probability of choosing the stop response was modeled using a logistic regression model as follows: $p(\mathrm{stop} \mid x_n) = \frac{1}{1 + \exp(-w_1 x_n - w_0)}$, where $x_n$ is the accumulated sum in trial n and $w_0$ and $w_1$ are free parameters. The Certainty Equivalent (CE) was defined as the accumulated sum corresponding to a stop probability of 0.5 (see Fig. 1c,d).
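The logistic choice model and the CE derived from it can be illustrated with a minimal sketch. This is not the fitting code used in the study: the Newton-Raphson routine, the function names, and the simulated choices (generated with hypothetical parameters w0 = -4.8 and w1 = 0.03, i.e., a CE of 160 DKK) are assumptions made for illustration only. Setting the stop probability to 0.5 gives w1*x + w0 = 0, so that CE = -w0/w1.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_stop_model(sums, stops, n_iter=100, tol=1e-10):
    """Fit p(stop | x) = 1 / (1 + exp(-w1*x - w0)) by Newton-Raphson.

    sums  : accumulated sum x_n at each choice
    stops : 1 if the subject stopped at that choice, 0 if they continued
    Returns (w0, w1) on the original scale of x.
    """
    n = len(sums)
    mean = sum(sums) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sums) / n)
    z = [(x - mean) / sd for x in sums]  # standardize for a well-conditioned fit
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, stops):
            p = sigmoid(b0 + b1 * zi)
            v = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * zi
            h00 += v                # entries of the (negative) Hessian
            h01 += v * zi
            h11 += v * zi * zi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Map the coefficients back from standardized to original units.
    w1 = b1 / sd
    w0 = b0 - b1 * mean / sd
    return w0, w1


def certainty_equivalent(w0, w1):
    """Accumulated sum at which p(stop) = 0.5, i.e., where w1*x + w0 = 0."""
    return -w0 / w1


def simulate_choices(w0, w1, n_trials, seed=0):
    """Draw hypothetical stop/continue choices from a known logistic model."""
    rng = random.Random(seed)
    sums = [rng.uniform(20, 350) for _ in range(n_trials)]
    stops = [1 if rng.random() < sigmoid(w1 * x + w0) else 0 for x in sums]
    return sums, stops


sums, stops = simulate_choices(w0=-4.8, w1=0.03, n_trials=2000)
w0_hat, w1_hat = fit_stop_model(sums, stops)
print(round(certainty_equivalent(w0_hat, w1_hat), 1))
```

With enough simulated choices, the recovered CE should lie close to the generating value of 160 DKK; a real subject contributes far fewer choices, so individual estimates would be noisier.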
Analysis of the fMRI data. Image processing and analysis were performed with SPM8 (revision no. 4667, Wellcome Department of Imaging Neuroscience, Institute of Neurology; http://www.fil.ion.ucl.ac.uk/spm/). EPI images were slice time corrected to TR/2, realigned to the mean EPI image, normalized to a standard EPI template of the MNI using affine warping and a DCT basis (Ashburner and Friston, 1999), and smoothed with an 8 mm kernel (FWHM). We also processed the images with a smaller 3 mm kernel to increase the spatial resolution of dynamic causal modeling (DCM) and to prevent merging of local activation in STN with activation in neighboring brain structures (de Hollander et al., 2015). All subsequent analysis steps were the same for both types of preprocessed images. To correct for low-frequency drifts, data were filtered in the temporal domain with a high-pass filter with a cutoff frequency of 1/128 Hz. For the imaging analysis, we formulated a voxelwise GLM, which modeled the three main events of interest, "loss," "stop," and "continue" events. Each event was modeled at the onset of the outcome presentation using stick functions convolved with the canonical hemodynamic response function as implemented in SPM8. We also modeled additional regressors of no interest: round feedback screens, rolling phase, a "1" as the first outcome, no-response events, 24 regressors to remove residual movement artifacts (modeled with an expansion [six parameters] of the estimated movement from the rigid body realignment procedure; Friston et al., 1996), and cardiac pulsation (sixth expansion order) and respiration (fourth expansion order), both modeled with the retrospective correction technique RETROICOR (Glover et al., 2000).
In the first-level model, we added parametric first-order polynomial modulations to the three main events of interest. For "stop" and "continue" events, we added (in the order mentioned) the following: (1) the accumulated sum; (2) the trial-specific prediction error (PE) defined as the actual outcome minus the expected outcome reflecting both the additional amount gained plus the size of the loss avoided, i.e., at trial n: $PE_n = y_n - \left(\frac{200 - x_{n-1}}{6}\right)$, where $y_n$ denotes the die outcome in trial n and the expected outcome is the average expected additional win $\left(\frac{200}{6}\right)$ minus the $\left(\frac{1}{6}\right)$ probability of losing the accumulated sum from trial n−1 ($x_{n-1}$); (3) stop probability at the currently accumulated sum; and (4) the trial-specific choice uncertainty reflected by the first derivative of the logistic choice model with respect to the sum at this trial. The parametric modulation of continue events with stop probability reflects increasing evidence for choosing the stop response while subjects nevertheless continue and thus a nondefault choice. The first derivative of the stop probability function is a measure of choice uncertainty because it peaks at the point of maximal steepness of the stop probability function, which by definition is the subject's CE where the subjective utility for continuing and stopping are equal and uncertainty thus is highest. For "loss" events, we only added one parametric modulation using the sum lost. Accumulated sum correlated with several variables, such as prediction error, stop probability, and choice uncertainty (average Pearson r across all 18 subjects: accumulated sum with prediction error = 0.679, accumulated sum with stop probability = 0.839, accumulated sum with choice uncertainty = 0.869, prediction error with stop probability = 0.514, prediction error with choice uncertainty = 0.537, stop probability with choice uncertainty = 0.829). This prompted us to use two different first-level designs.
One model had nonorthogonalized regressors to allow for a more precise interpretation of the variance assigned to the regressors (i.e., where the regressors only explain variance in the neural signal that is not shared with the other regressors) (Mumford et al., 2015). If not mentioned otherwise, any reported results are based on this model. However, given the regressors' correlations, we chose a second design with serially orthogonalized regressors for the analysis of connectivity modulations in the inhibitory network (orthogonalized model). In this design, all regressors are orthogonalized to the one (or ones) before it, whereas in a nonorthogonalized design, the variance shared by our regressors is not attributed to any regressor (Mumford et al., 2015). Although all of our regressors capture distinct aspects of the cognitive processes in this game, they all show an increase in value at least for the first trials of each round. In this second design with serial orthogonalization, accumulated sum is attributed all the variance shared with the other regressors and thus reflects a combination of the effects of concurrently increasing risk and reward, together with the associated increasing tendency to switch responses as well as increasing deliberation about the choice. This regressor thus does not reflect a sharply defined cognitive construct, but multiple effects, which are related to the sequential nature of this gambling paradigm and jointly gave rise to the modulation of the inhibitory network. Given the imprecise psychological interpretation of accumulated sum in this case, we termed it "cumulative gambling" regressor in this model. We ran a second GLM with serial orthogonalization to ascertain that the effects of stop probability and choice uncertainty were not influenced by the order of the orthogonalization of the regressors. This GLM was the same as the first model with serial orthogonalization, only that we did not include the stop probability regressor.
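To make the four parametric modulators and the effect of serial orthogonalization concrete, the following sketch computes them for one hypothetical gambling round and then orthogonalizes each mean-centered regressor with respect to all preceding ones (a Gram-Schmidt-style procedure). It is an illustration under stated assumptions, not the SPM implementation: the pip sequence, the logistic parameters W0 and W1, and all function names are invented, and the die outcome y_n is assumed to be expressed in DKK (pips × 10), which matches the expected additional win of 200/6 DKK.

```python
import math


def stop_probability(x, w0, w1):
    """Logistic stop probability at accumulated sum x."""
    return 1.0 / (1.0 + math.exp(-(w1 * x + w0)))


def choice_uncertainty(x, w0, w1):
    """First derivative of the stop probability with respect to x: w1*p*(1-p)."""
    p = stop_probability(x, w0, w1)
    return w1 * p * (1.0 - p)


def prediction_error(outcome_dkk, previous_sum):
    """PE_n = y_n - (200 - x_{n-1}) / 6, with all quantities in DKK."""
    return outcome_dkk - (200.0 - previous_sum) / 6.0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def center(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def serially_orthogonalize(regressors):
    """Mean-center each regressor, then remove its projection onto all
    earlier (already orthogonalized) regressors, in the order given."""
    out = []
    for r in regressors:
        r = center(r)
        for q in out:
            qq = dot(q, q)
            if qq > 0.0:
                coef = dot(r, q) / qq
                r = [ri - coef * qi for ri, qi in zip(r, q)]
        out.append(r)
    return out


# Hypothetical round: six rewarding throws, 10 DKK per pip.
pips = [4, 6, 3, 5, 2, 6]
W0, W1 = -4.8, 0.03  # hypothetical logistic parameters (CE = 160 DKK)

gains = [10 * p for p in pips]
sums, total = [], 0
for g in gains:
    total += g
    sums.append(total)
previous = [0] + sums[:-1]  # accumulated sum before each throw

# Same order as in the first-level model: sum, PE, stop probability, uncertainty.
modulators = [
    [float(s) for s in sums],
    [prediction_error(g, x) for g, x in zip(gains, previous)],
    [stop_probability(s, W0, W1) for s in sums],
    [choice_uncertainty(s, W0, W1) for s in sums],
]
orthogonalized = serially_orthogonalize(modulators)
```

Because the accumulated sum comes first, it passes through unchanged (apart from mean-centering) and therefore keeps all variance it shares with the later regressors, whereas each later regressor retains only what is not already explained by those before it.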
The obtained voxelwise parameter estimates of the resultant regressors were transformed to whole-brain statistical parametric maps of t values (SPM{t}) of the effect versus baseline. The resulting individual t-contrast maps were then taken into a second-level random effect, between-subjects analysis performing a one-sample t test on the effect of interest across the group. At the second level, we also correlated the individual t-contrast maps of the effects of interest with our subject-specific measure of risk tolerance (CE). We report findings at standard threshold p < 0.001, uncorrected, at whole-brain level (corresponding to a t value > 3.646) with a cluster extent threshold of > 30 voxels and cluster-level p < 0.05, FWE-corrected. Given our a priori interest in the motor-control network, including pre-SMA, right IFG, caudate and STN, we performed small volume corrections (SVCs) for those regions with anatomical masks (for STN: ATAG mask [https://www.nitrc.org/projects/atag], for the other regions: AAL atlas from the WFU pick atlas toolbox in SPM8) if they did not show significant activation at standard threshold in the two relevant contrasts associated with increasingly risky decisions, accumulated sum, and stop probability.
DCM. Current theories of response control highlight three modes of cortico-basal ganglia-cortical connectivity: first, a direct "Go" pathway (cortex-striatum-internal segment of pallidum [GPi]-thalamus-cortex) having a net excitatory influence on cortex; second, an indirect "NoGo" pathway (cortex-striatum-external segment of pallidum [GPe]-STN-GPi-thalamus-cortex) with a net inhibitory influence on cortex; and third, a hyperdirect "NoGo" pathway from cortex directly to STN, also with an inhibitory influence on cortex (Aron, 2011). GLM analysis with a model with orthogonalized regressors revealed that activity gradually increased in these key inhibition areas with the cumulative gambling regressor when subjects chose to continue with gambling (for details, see Results). This raised the question whether this rise in activity reflects a context-dependent modulation of these pathways with escalating risk and reward.
To address this question, we specified a two-state DCM (Marreiros et al., 2008) comprising four key regions in the inhibitory network: right caudate head, pre-SMA, right IFG, and right STN (see Fig. 3a). We used the parametric modulator of cumulative gambling as input modulating the connections in the network. The trial types of "loss," "stop," and "continue" were modeled as direct perturbation of each region. Even though we use all three trial types to model the driving input of events in the course of the dice game, the models are compared with regard to the effect of cumulative gambling on the coupling between regions during "continue" trials only. The model space was defined by all combinations of sum-modulated connections between the regions, where we modeled both indirect (via caudate) and hyperdirect connections from cortex to STN, with a direct connection from STN back to cortex (see Fig. 3d). While we only model a monosynaptic connection from STN to cortex, the actual connection most likely is polysynaptic (e.g., via GPi and thalamus) (Yasoshima et al., 2005). However, in DCM, the net effect of a polysynaptic connection can be modeled by only one parameter. We were interested in comparing different feedforward cortex-to-STN coupling dynamics via the indirect and hyperdirect pathway.
As a first step, we extracted the first eigenvariate from the ROIs for each subject according to the following anatomical criteria: for right caudate head, pre-SMA, right IFG, and right STN we constructed a mask that was the inclusion image of (1) an anatomical mask (STN: ATAG mask [https://www.nitrc.org/projects/atag], for the other regions: AAL atlas from the WFU pick atlas toolbox in SPM8) of the ROI and (2) the second-level SPM{t} of "continue" modulated by "sum" at a strict voxelwise threshold of 0.0001, thus restricting the SMA ROI to pre-SMA (y > 0) and the caudate ROI to caudate head. For right IFG, we used an AAL mask corresponding to the pars opercularis and triangularis. We then created the final ROI by taking the individual SPM{F} images of the effect of interest (F-contrast over the paradigm). For caudate head, pre-SMA, and right IFG, we thresholded these at a liberal p < 0.05 uncorrected and then extracted the first eigenvariate from spheres with 4 mm radius centered at the global maximum from voxels included in both AAL and the second-level contrast map. For right STN, we included all voxels within the ROI defined by the ATAG mask (https://www.nitrc.org/projects/atag). The signal from the STN ROI was derived from the analysis run on images that were smoothed with a 3 mm kernel. The eigenvariates were adjusted with the F-contrast over the paradigm.
The DCM was specified as follows. For a vector of regional activities, x, and a vector of experimental inputs u, the bilinear DCM was formulated as follows: $\dot{x} = \left(A + \sum_i u_i B_i\right) x + C u$. Here A denotes the context-free (u = 0) coupling matrix. We chose a fully reciprocally connected system. For the input vector u, we chose the main conditions of "loss," "stop," and "continue" plus the parametric modulation of sum on "continue" trials (cumulative gambling regressor). The three main conditions entered as direct input to all regions (C matrix). The parametric modulator of sum on "continue" entered as "modulatory" input ($u_i$), multiplied on different $B_i$ matrices that constituted the model space (see Fig. 3a). We used the two-state version of DCM (Marreiros et al., 2008) where each region is modeled with an excitatory and inhibitory subpopulation and interregional connections are constrained to be positive. In this formulation, the elements of the above matrix $A + \sum_i u_i B_i$ take the form $\mu_{ij} \exp(A_{ij} + u B_{ij})$. The $\mu_{ij}$ are the prior couplings between regions i and j and are constrained to be positive for different regions i and j. Thus, posterior estimates of $B_{ij} > 0$ will scale the prior and context-free coupling with a factor $> 1$ ($= \exp(u B_{ij})$) in the context of u.
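How a modulatory input changes coupling in a bilinear model can be sketched with a toy simulation. The code below integrates the standard one-state bilinear state equation, dx/dt = (A + u_mod B) x + C u_drive, for two hypothetical regions; it is not the two-state model estimated in the study, it omits the hemodynamic forward model, and all matrices, names, and values are assumptions chosen only to make the effect visible. The helper coupling_scale shows the multiplicative factor exp(uB) by which a two-state coupling is scaled.

```python
import math


def simulate_bilinear(A, B, C, u_drive, u_mod, dt=0.01, n_steps=3000):
    """Euler-integrate dx/dt = (A + u_mod * B) x + C * u_drive from x = 0
    and return the final state (close to steady state for stable systems)."""
    n = len(A)
    x = [0.0] * n
    for _ in range(n_steps):
        dx = []
        for i in range(n):
            coupling = sum((A[i][j] + u_mod * B[i][j]) * x[j] for j in range(n))
            dx.append(coupling + C[i] * u_drive)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


def coupling_scale(b_ij, u):
    """Factor by which a two-state coupling is scaled in the context of u."""
    return math.exp(u * b_ij)


# Two hypothetical regions: region 0 receives the driving input and
# projects to region 1; only that projection is modulated.
A = [[-1.0, 0.0],
     [0.5, -1.0]]   # context-free coupling (negative self-connections)
B = [[0.0, 0.0],
     [0.3, 0.0]]    # modulation of the 0 -> 1 connection
C = [1.0, 0.0]      # driving input enters region 0 only

baseline = simulate_bilinear(A, B, C, u_drive=1.0, u_mod=0.0)
modulated = simulate_bilinear(A, B, C, u_drive=1.0, u_mod=1.0)
print(round(baseline[1], 3), round(modulated[1], 3))  # prints 0.5 0.8
```

In this toy system, the response of the target region rises from 0.5 to 0.8 when the modulatory input is switched on, although the driving input is identical in both runs; this is the sense in which a B parameter captures a context-dependent change in coupling.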
Model space was explored by comparing 16 models, which featured different context-dependent coupling architectures (B matrices). Models 1-15 correspond to different combinations of indirect or hyperdirect cortico-basal ganglia-subthalamo-cortical pathways (see Fig. 3d). In model 16, the sum regressor was used instead of the "continue" regressor as input to all regions via the C matrix as a "null" model where there is no contextual modulation of connectivity (B matrix of 0). Because the ROIs used in this DCM are identified on the basis of a second-level effect of cumulative gambling, a model where sum does not enter as either a direct or modulatory input is not plausible and is therefore not included in the model space. Seventeen subjects entered subsequent analysis. One subject had flat-line fits and was excluded.

Sequential decision-making under increasing risk
Participants became more hesitant to continue with gambling, the more money they had gathered during a sequential gamble trial. This was reflected by the reaction times for "continue" choices, which increased linearly with the accumulated gain of the gamble (one-sample t test over individual regression weights, p < 0.001, t(17) = 5.47; parameters were normally distributed according to Kolmogorov-Smirnov test, p = 0.62). All but one subject had positive regression weights, showing that this effect was highly consistent across subjects (mean 4.93, range −0.84 to 9.71).
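The two-stage logic of this analysis (one regression weight per subject, followed by a one-sample t test over the weights) can be sketched as follows. The data are simulated and every numerical value, including the units of the slopes, is a hypothetical assumption; the sketch only shows the computation, not the reported result.

```python
import math
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / (n - 1)
    return (m - mu) / math.sqrt(var / n), n - 1


# Simulate 18 hypothetical subjects whose reaction times grow with the sum.
rng = random.Random(0)
slopes = []
for subject in range(18):
    true_slope = rng.gauss(0.5, 0.3)  # hypothetical slope, ms per DKK
    sums = [rng.uniform(20, 300) for _ in range(150)]
    rts = [600 + true_slope * s + rng.gauss(0, 80) for s in sums]
    slopes.append(ols_slope(sums, rts))

t_value, df = one_sample_t(slopes)
print(df, sum(1 for s in slopes if s > 0))
```

Treating each subject's slope as a single observation makes the group test a random-effects inference: consistency across subjects, rather than the number of trials per subject, determines the t value.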
Stopping behavior during the dice game could in principle be determined by a strategy of setting a fixed threshold of either accumulated sum or simply of number of throws, but no subject displayed such a stereotypic behavior. Indeed, all subjects showed a variable stopping behavior with respect to both the accumulated sum and the number of throws when stopping. We then established which of these predictor variables best accounted for the observed stopping behavior using logistic regression. Three plausible strategies would use (1) accumulated sum, (2) number of throws, or (3) a combination of both. Comparing the Bayesian Information Criterion (BIC) scores for the "sum" and the "number of throws" model revealed very strong evidence (Kass and Raftery, 1995) in favor of the "sum" model, showing a mean difference in BIC scores of 11.6. This model had a mean adjusted $r^2$ of 0.47 (SD 0.15). Adding number of throws to the "sum" model only contributed insignificantly, improving the BIC score by 1.51.
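The model comparison can be illustrated with a short sketch of the BIC and of the evidence categories of Kass and Raftery (1995), in which a BIC difference is read as an approximation to twice the log Bayes factor. The log-likelihood values below are hypothetical and were chosen only so that the difference equals the reported mean of 11.6; the number of observations uses the reported average of 187 "continue" and 42 "stop" choices.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a better model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def evidence_label(delta_bic):
    """Kass and Raftery (1995) categories for a BIC difference."""
    d = abs(delta_bic)
    if d > 10:
        return "very strong"
    if d > 6:
        return "strong"
    if d > 2:
        return "positive"
    return "not worth more than a bare mention"


n_choices = 187 + 42  # average number of continue plus stop choices
# Hypothetical log-likelihoods of two logistic models with two parameters each.
bic_sum = bic(log_likelihood=-60.0, n_params=2, n_obs=n_choices)
bic_throws = bic(log_likelihood=-65.8, n_params=2, n_obs=n_choices)

delta = bic_throws - bic_sum
print(round(delta, 1), evidence_label(delta))  # prints 11.6 very strong
```

Because both single-predictor models have the same number of parameters, the penalty terms cancel and the BIC difference reduces to twice the difference in log-likelihood; the penalty matters only when the combined model with an extra parameter is compared.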
For all participants, the "sum" model revealed a positive sigmoidal relation between stop probability and accumulated sum (Fig. 1d). From these functions, we derived the CE as a task-related metric of participants' risk preference: The CE corresponds to the accumulated sum at which the player is indifferent between continuing to throw the die or to stop; in other words, both actions are of equivalent utility (Fig. 1c). Players with low CE values stopped early, contenting themselves with a relatively low monetary reward taking less risk. Players with high CE values tried to achieve a higher payoff, seeking more risk. CE varied considerably across players. There was a single outlier with a very high CE (CE = 365, z = 3.25); the remaining CE values ranged between 94 and 260 Danish Kroner (DKK; 1 DKK ≈ $0.17 US), indicating considerable interindividual differences in sequential risk-taking behavior during gambling (Fig. 1d).
The optimal gambling strategy for maximizing expected value is to continue throwing the die as long as the expected gain per die throw exceeds the absolute expected value of the loss. The crossover point for this game then is the accumulated sum at 200 DKK (Fig. 1b). In other words, an economically rational player should switch from continuing to stopping after having earned 200 DKK (Fig. 1d, red line). However, computing this decision boundary is not intuitively obvious, and none of the subjects used the strategy of abruptly switching from "continue" to "stop" decisions at this amount. At the group level, the mean CE value was significantly lower than the optimal CE of 200 DKK (160.6 ± 33.8 DKK; p = 0.026) and subjects stopped at amounts >200 DKK in only 20.5 ± 21.2% of all stop decisions. This indicates that most subjects were risk averse during sequential gambling. We found no correlations (p = 0.76) between our experimental (i.e., CE value) and external measure of risk attitude, the Domain Specific Risk Taking Scale (DOSPERT) (Blais and Weber, 2006).
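The crossover point of 200 DKK follows directly from the task rules (10 DKK per pip for outcomes 2 to 6, loss of the accumulated sum with probability 1/6) and can be checked with a few lines of exact arithmetic. The function and constant names in this sketch are illustrative assumptions.

```python
from fractions import Fraction

PIP_VALUE_DKK = 10
LOSS_PROBABILITY = Fraction(1, 6)


def expected_gain():
    """Expected additional gain per throw: each of the pips 2..6 has probability 1/6."""
    return sum(Fraction(p * PIP_VALUE_DKK, 6) for p in range(2, 7))


def expected_loss(accumulated_sum):
    """Expected loss per throw: the accumulated sum is lost with probability 1/6."""
    return LOSS_PROBABILITY * accumulated_sum


def net_expected_change(accumulated_sum):
    """Expected change in earnings when throwing once more at this sum."""
    return expected_gain() - expected_loss(accumulated_sum)


def crossover_sum():
    """Accumulated sum at which expected gain and expected loss are equal."""
    return expected_gain() / LOSS_PROBABILITY


print(crossover_sum())  # prints 200
```

Below 200 DKK the net expected change of another throw is positive, and above it the change is negative, which is why a step function at 200 DKK describes the gain-maximizing policy.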

Activity increases scaled to cumulative gambling
When players decided to continue the gamble, there were no significant linear increases in BOLD activity with accumulated sum and only one significant decrease in the left inferior parietal cortex with the standard model without orthogonalization. After SVC, there was significant activity in the right caudate (Table 1).
In the orthogonalized model where all the variance shared with other regressors is attributed to accumulated sum, a set of brain regions showed a linear increase in BOLD signal with this cumulative gambling regressor (Fig. 2a; Table 2). Activations included the predicted "braking" network with pre-SMA, right IFG extending into anterior insula, ventral striatum extending into the head of the caudate nucleus, and STN (Fig. 2b). In addition, a large bilateral cluster encompassing the dorsal portion of the ACC along with bilateral dlPFC and bilateral inferior parietal cortex, the ventral striatum as well as bilateral V3/V4 showed an increase in activity with cumulative gambling. Linear decreases in activity with cumulative gambling were observed in a left inferior parietal cluster, corresponding to area PG in left angular gyrus and precuneus (Table 2).

Figure 1. The pig dice game. a, Trial types. Each trial started with a rolling phase for 1.5-3.5 s, after which the random outcome was shown. If a throw yielded a pip number between 2 and 6, the pip number multiplied by 10 DKK was added as reward to the accumulated earnings for the round. After each rewarding throw, subjects chose whether to continue rolling the die (i.e., CONTINUE trial) or to stop and bank the cumulated gain (i.e., STOP trial). If the pip number of a throw was 1, participants lost the entire gain that had been accumulated during the round (LOSS trial). b, Relation between expected value for additional gains (green line) or losses (red line) and the accumulated earnings in DKK during a gambling round. The expected value of the additional gain per throw (green line) remained constant during a gambling round and was derived by the mean pip number per rewarding throw multiplied by the probability to win and the factor 10: $(2 + 3 + 4 + 5 + 6) \times \frac{1}{6} \times 10$. The expected value of loss per throw (red line) steadily increased during the gambling round and equaled the accumulated sum multiplied by the loss probability (1/6). The coordinate where the red and green lines cross each other indicates the point where expected values for additional gain and loss were equal. At this point of the game, corresponding to accumulated earnings of 200 DKK, the player ought to stop because the expected value of losing starts to outweigh the expected value of the additional gain. c, Logistic regression modeling of individual stop probability. Stop probability was described as a sigmoid function of accumulated sum during sequential gambling. The CE corresponds to the accumulated earnings (in DKK) where the likelihood to stop equals the likelihood to continue throwing a die (dashed lines). d, Logistic stop probability functions of all participants (n = 18). There were considerable interindividual differences in the distribution of choice behavior. No participant showed a choice behavior that matched the optimal gain-maximizing strategy. The objectively optimal choice behavior is illustrated by the red line, representing a step function with an abrupt switch from continuing to stopping when the accumulated earnings are DKK 200.

Dynamic changes in subthalamic-to-cortex connectivity
DCM was used to explore the effect of cumulative gambling on functional connectivity in cortico-basal ganglia-subthalamocortical pathways during "continue" trials (Fig. 3). We extracted the signal from STN based on images smoothed with a smaller 3 mm kernel to achieve higher spatial resolution and to avoid a spill-over of the signal of neighboring brain regions into the STN (de Hollander et al., 2015). Fixed-effects analysis at the group level (Stephan et al., 2010) revealed a winning model in which both the indirect right IFG-caudate-STN and pre-SMA-caudate-STN connections as well as the subthalamo-cortical connections from STN to right IFG and STN to pre-SMA were modulated by cumulative gambling (Fig. 3b,d). This combined model, which featured an indirect connection from both pre-SMA and right IFG via caudate to STN and a direct pathway back to cortex as winning model, had a posterior probability of ~99% (Fig. 3b). For the winning model, mean percent variance explained by the DCM across all subjects was 16% (range 3%-34%).
Extracting coupling values from the winning model's B matrix revealed that the coupling from STN back to both cortical areas increased in proportion with cumulative gambling. The mean exponentiated coupling scale parameter was 1.53 from STN to pre-SMA, which was significantly different from 0 (t(16) = 16.29, p < 0.001, Bonferroni corrected). Parameters were normally distributed (Kolmogorov-Smirnov test, p = 0.878). The mean exponentiated coupling scale parameter was 1.26 from STN to right IFG (t(16) = 17.42, p < 0.001, Bonferroni corrected). Parameters were normally distributed (Kolmogorov-Smirnov test, p = 0.466). Although the modulatory influence of cumulative gambling was modeled for the cortico-caudate-STN connections as well, there was no significant increase or decrease of coupling by cumulative gambling at the group level on these connections. At the individual level, the change in STN-to-pre-SMA connectivity with cumulative gambling, as reflected by the mean coupling scale parameter, predicted interindividual variations in gambling behavior (Fig. 3c). After removing the subject with the highest CE from the correlation based on the Mahalanobis distance measure (CE = 365, Mahalanobis distance = 9.53, p = 0.004), individuals with more cautious gambling behavior showed a stronger influence of STN on pre-SMA during continued reward seeking under escalating risk than individuals displaying more risky behavior. The stronger the increase in STN-to-pre-SMA coupling with cumulative gambling, the smaller were individual CE scores (adjusted r² = 0.26, p = 0.048, Bonferroni corrected).
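Outlier screening by Mahalanobis distance, as used above before computing the correlation, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: all data values except the CE of 365 are hypothetical, the function names are invented for this example, and the chi-square reference distribution with 2 degrees of freedom is a common convention that the text does not confirm.

```python
import math


def mahalanobis_sq_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the sample centroid."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    # Sample covariance entries (denominator n - 1).
    sxx = sum(a * a for a in dx) / (n - 1)
    syy = sum(b * b for b in dy) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    det = sxx * syy - sxy * sxy
    # Quadratic form d^T S^-1 d, using the closed-form inverse of a 2x2 matrix.
    return [(syy * a * a - 2 * sxy * a * b + sxx * b * b) / det
            for a, b in zip(dx, dy)]


def chi2_sf_df2(d2):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-d2 / 2.0)


# Hypothetical CE values (DKK) and coupling parameters for 17 subjects;
# only the extreme CE of 365 is taken from the text.
ce = [60, 80, 90, 100, 110, 120, 130, 140, 150,
      160, 170, 180, 190, 200, 210, 220, 365]
coupling = [1.80, 1.70, 1.68, 1.58, 1.60, 1.50, 1.50, 1.42, 1.42,
            1.34, 1.34, 1.26, 1.26, 1.18, 1.18, 1.10, 2.20]

d2 = mahalanobis_sq_2d(ce, coupling)
p_values = [chi2_sf_df2(v) for v in d2]
# Assumed cutoff of 0.01 for flagging bivariate outliers.
flagged = [i for i, p in enumerate(p_values) if p < 0.01]
```

In this toy data set only the last subject is flagged, because that point is extreme on both variables at once and lies far from the centroid of the remaining points.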

Neural correlates of stop probability
A higher stop probability during a "continue" trial reflects the need to overcome increasing evidence in favor of the alternative "stop" option. A cluster in dACC showed a linear effect of increasing stop probability when subjects nevertheless decided to continue. Further activity scaling with stop probability was seen in parietal and frontal areas (Fig. 4a; Table 1). Because we were specifically interested in how the stopping network of pre-SMA, right IFG, caudate nucleus, and STN would be modulated by increasing sum and stop probability (both parameters being related to increasingly risky decisions), we performed SVC for these predefined areas of interest. Bilateral STN, left caudate, and pre-SMA showed a significant increase in "continue" activity with stop probability, but we found no effects of stop probability on right IFG or right caudate activity (Fig. 4b).

Significant activation peaks (z score) and the corresponding stereotactic x, y, z coordinates in MNI space from the standard model without serial orthogonalization. Results are significant at p < 0.001, uncorrected, at whole-brain level (t > 3.646), cluster extent threshold >30 voxels, cluster-level p < 0.05 FWE-corrected.

Figure 2.
Color-coded statistical parametric maps showing clusters where task-related brain activity during "continue" trials scaled positively with cumulative gambling. a, Linear effect of cumulative gambling from the model with serially orthogonalized regressors at the onset of "continue" trials (one-sample t test, n = 18) in ventral striatum and caudate head bilaterally, pre-SMA/MCC/ACC, bilateral IFG extending into anterior insula, and thalamus. All statistical parametric maps are thresholded at an uncorrected p < 0.001. Peak z scores and corresponding stereotactic coordinates are listed in Table 2. b, An ROI analysis (SVC) in STN revealed a significant linear effect of accumulated sum on "continue" activity in right and left STN. For illustration purposes, we show the unthresholded SPM in the bilateral STN mask used for the SVC.

Figure 3.
Red arrows indicate that only the coupling from STN to pre-SMA and from rSTN to right IFG increases significantly with the cumulative gambling regressor at the group level. Numbers next to the red arrows indicate the exponentiated mean coupling scale parameters (p < 0.001 for STN-to-pre-SMA and p = 0.04 for STN-to-IFG, Bonferroni corrected). c, Linear relation between the CE values and exponentiated coupling value from STN to pre-SMA (n = 16). The individual coupling value is predictive of the risk attitude displayed during the task, as reflected by CE. d, Display of the 16 connectivity models tested within the DCM framework. The four circles represent the regions shown in a. Solid arrows indicate connections modulated by cumulative gambling. Dashed arrows indicate unmodulated connections. Red box represents the winning model shown in b. In the null model (model 16), the connections were not modulated by cumulative gambling, and the perturbing input into the network from "continue" was replaced with cumulative gambling.

Neural correlates of choice uncertainty
Although stop probability reflects the continuously increasing evidence for the subject to take the safe choice, its first derivative with respect to accumulated sum (a bell-shaped function peaking at the CE) indicates in which phase of sequential gambling subjects are most uncertain about whether to continue or to stop. Even though this modulation reflects an ostensibly salient point in the game, we found no brain regions where "continue" activity varied in proportion to choice uncertainty.
To ensure that this null finding was not due to variance shared with the stop probability regressor, which is assigned to neither regressor, we ran another general linear model without the stop probability regressor. Again, we found no significant activations associated with choice uncertainty.
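The relation between stop probability, the CE, and choice uncertainty described in this section can be sketched numerically. This is a minimal illustration, assuming a two-parameter logistic function of the accumulated sum; the parameter values and function names are hypothetical and not taken from the fitted models.

```python
import math


def stop_probability(accumulated, intercept, slope):
    """Logistic stop probability as a function of the accumulated sum (DKK)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * accumulated)))


def indifference_point(intercept, slope):
    """Accumulated sum at which stop probability is 0.5 (the CE in the text)."""
    return -intercept / slope


def choice_uncertainty(accumulated, intercept, slope):
    """First derivative of stop probability with respect to the accumulated sum."""
    p = stop_probability(accumulated, intercept, slope)
    return slope * p * (1.0 - p)


# Hypothetical subject: intercept and slope chosen only for illustration.
INTERCEPT, SLOPE = -6.0, 0.04

if __name__ == "__main__":
    ce = indifference_point(INTERCEPT, SLOPE)
    for amount in (50, 100, ce, 200, 250):
        print(amount,
              round(stop_probability(amount, INTERCEPT, SLOPE), 3),
              round(choice_uncertainty(amount, INTERCEPT, SLOPE), 4))
```

Because the derivative equals slope × p × (1 − p), it is largest where p = 0.5. This gives the bell-shaped uncertainty curve peaking at the CE, whereas stop probability itself keeps rising beyond that point.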

Discussion
We investigated the neural dynamics during the decision between a risky "continue" choice with the possibility of higher reward and a safe "stop" choice with a known outcome. This type of decision is prototypical of foraging behavior where the forager needs to weigh the incentives of future foraging against the probability of losing the opportunity to harvest its gains. It allowed us to investigate the inhibitory network dynamics underlying voluntary response switches during sequential decisions. Furthermore, because subjects were unable to work out the optimal gain-maximizing strategy, this experimental setting allowed the modeling of subjective evidence for the two available options under overt contextual constraints. Comparing the effects of stop probability and choice uncertainty during "continue" choices on ACC activity, we were able to contribute to an ongoing debate about this region's function in sequential decision-making.

Balancing caution and greed
At the behavioral level, participants became gradually more hesitant to continue with gambling the more money they had accumulated. This was reflected by a linear increase in reaction times for "continue" choices with the accumulated sum. Although there was substantial interindividual variation, on average, participants were more likely to stop than to continue at amounts that were significantly lower than the optimal amount of DKK 200. This is in good agreement with the notion that most people are averse to risk with known probability in a gain frame (Kahneman and Tversky, 1979; Tversky and Kahneman, 1992). As a side note, we found no correlation between individual CE values as behavioral index of the individual risk attitude and our external measure of risk taking (DOSPERT scale). It has been shown that several behavioral measures of risk taking do not vary in proportion with the DOSPERT, suggesting that the DOSPERT might not reflect all aspects of risk-taking attitude (Mishra and Lalumière, 2011).

Sequential risk taking and the braking network
In the standard model without orthogonalized regressors, accumulated sum only showed one cluster with a significant decrease in activity in the left inferior parietal cortex, and, after SVC, an increase in right caudate. This shows that most of the variance explained by the cumulative gambling regressor in the orthogonalized model is shared with the other regressors and thus not easily assigned to a specific psychological construct. Several brain networks gradually increased neural activity with the cumulative gambling regressor in the orthogonalized model, showing a pattern of activation similar to the results in a study on the Angling Risk Task. In this model, the cumulative gambling regressor is assigned all the variance shared with the other regressors and thus reflects risk, the anticipated accumulated reward at the next outcome, and increasing deliberation about action selection. Therefore, these regions may trace the increase in possible future reward, the increasing risk caused by the increase in possible loss amount, or they might trade off the increasing reward and risk. The cluster in the right IFG extended into the insula, a region associated with risk processing (Preuschoff et al., 2008). Even though the neural effects of cumulative gambling cannot be interpreted unequivocally, this shared variance contains all the aspects of cumulative gambling that are relevant with regard to their modulatory effects on connectivity between the four key nodes of the inhibitory network. The accumulation of gains resulted in a linear increase of neural activity in the pre-SMA, right IFG, STN, and caudate nucleus. These regions form a functional network that has been implicated in reactive (e.g., stop signal tasks) (Munoz and Everling, 2004; Chikazoe et al., 2009; Rae et al., 2015)

Figure 4.
... Table 1. b, An ROI analysis in STN revealed a bilateral increase in activity with stop probability after SVC. For illustration purposes, we show the unthresholded SPM in the bilateral STN mask used for the SVC.
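The effect of serial orthogonalization discussed in this section, where the first regressor keeps all variance it shares with later ones, can be illustrated with a small Gram-Schmidt sketch. This is an assumption-laden toy example: the regressor values and function names are hypothetical, and whether the analysis software used in the study applies exactly this procedure is not stated in the text.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def mean_center(v):
    """Subtract the mean so that projections reflect shared variance."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def serially_orthogonalize(regressors):
    """Gram-Schmidt in column order: each regressor keeps only the part not
    explained by the regressors before it; the first one is left unchanged."""
    result = []
    for reg in regressors:
        residual = list(reg)
        for earlier in result:
            norm_sq = dot(earlier, earlier)
            if norm_sq > 0:
                weight = dot(residual, earlier) / norm_sq
                residual = [r - weight * e for r, e in zip(residual, earlier)]
        result.append(residual)
    return result


# Hypothetical trial-wise values within one gambling round.
cumulative_sum = mean_center([10, 30, 60, 80, 120, 150])
stop_prob = mean_center([0.05, 0.10, 0.25, 0.40, 0.70, 0.85])

ortho = serially_orthogonalize([cumulative_sum, stop_prob])

if __name__ == "__main__":
    print(dot(cumulative_sum, stop_prob))  # raw regressors overlap strongly
    print(dot(ortho[0], ortho[1]))         # close to zero after orthogonalization
```

The first regressor is returned unchanged and therefore absorbs everything it shares with the second, while the second retains only its unique residual. This ordering dependence is why effects of the first regressor in such a model cannot be attributed to a single construct.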
Connectivity analysis revealed that a combined "indirect" cortico-basal ganglia-cortical pathway best accounted for the ramping-up of the braking network. Importantly, the coupling strength from STN back to cortex increased with cumulative gambling, suggesting that fast interactions between cortical control regions and STN are modulated by the level of reward and risk in the sequence. Two mechanistically different explanations may account for the gradual increase in STN-to-cortex connectivity with increasing sum (Keuken et al., 2015). First, it may reflect an excitation of the "continue" response in combination with inhibition of competing responses (Nambu et al., 2002), gated by the basal ganglia receiving modulatory dopaminergic input. Second, the STN might mediate a gradual buildup of global action inhibition to prevent premature responding and allow adequate response selection under escalating risk (Frank, 2006). Our finding that reaction times grow linearly with cumulative gambling is consistent with the latter interpretation.
The increase in connectivity strength from STN to pre-SMA in the context of increasing stakes predicted interindividual variations in risk attitude as reflected by individual CE values. This finding indicates that subjects with a tendency to stop already at low stake amounts show a faster ramping-up of connectivity from STN to pre-SMA than individuals who stop at higher amounts on average. Subjects with a lower CE are by definition less inclined toward being exposed to the risk of large losses. Hence, the dynamic buildup of STN influence on pre-SMA may reflect a neural correlate of responding with increasing caution. In support of this view, Cavanagh et al. (2014) found that STN activity was related to increased response caution after errors. Pre-SMA has also been found to increase its activity when preparing for a stop signal (Jahfari et al., 2009) and in proportion to increases in accuracy in conflict tasks (Forstmann et al., 2008). Differences in recruitment of STN-to-pre-SMA connectivity during sequential risk taking may therefore provide a partial mechanistic explanation for these interindividual differences. It should be noted that DCM is a model-fitting technique that only allows a relative comparison of the specified models (Lohmann et al., 2012).

Stop probability
In addition to neural networks computing objective levels of risk and reward, the nature of the task enabled us to investigate neural activity related to subject-specific choice tendencies, as reflected by the individual stop probability. Activity in dACC, insula, bilateral IPC, and middle frontal gyrus was scaled to the individual probability to stop. In other words, these areas became gradually more active when subjects nevertheless continued to gamble despite their choice tendency to stop. This ramping-up of activity in a frontoparietal network might reflect anticipatory activity preceding a "stop" choice and is compatible with a contribution to cued behavioral switching (Braver et al., 2003) and action switch in a reward context (Gläscher et al., 2009).

Nondefault choice and choice uncertainty
Investigating the neural correlates of stop probability and its first derivative reflecting the level of choice uncertainty allowed us to distinguish between two competing hypotheses of ACC function. On the one hand, several studies suggest that dACC is involved in tracking the value of the nondefault choice (Kolling et al., 2012, 2014; Boorman et al., 2013; Mobbs et al., 2013), whereas others argue that it encodes choice uncertainty, when the subjective utility of the available options is similar (Botvinick, 2007; Shenhav et al., 2014). It should be noted that, in the latter studies, the options' similarity in value is called decision difficulty. We refer to this measure as choice uncertainty to distinguish it more clearly from our measure of stop probability because the decision to continue despite a high stop probability can also be considered a difficult decision. Recently, one study suggested that the value of the nondefault choice was confounded with choice uncertainty in the studies by Kolling et al. (2012, 2014), arguing for the choice uncertainty hypothesis (Shenhav et al., 2014). In our task, high stop probability values exceeding 0.5 imply a decreasing choice uncertainty as the value difference between the two options increases. We found increasing activity in dACC while subjects chose the nondefault "continue" response despite a high stop probability where choice uncertainty is decreasing. Furthermore, we did not find dACC activity correlating with the choice uncertainty regressor. Both findings together argue against the choice uncertainty theory of dACC function. Compared with the study by Kolling et al. (2012), the continue choice was the most frequently chosen action in our sequential gambling task, and it was the context of the high stop probability that rendered it a nondefault choice.
Therefore, we infer that the dACC activity associated with making a nondefault choice is not just a reflection of having to perform an action that is different from the most frequently chosen response, but rather associated with making a response that is not the normal response in the current state.
In conclusion, this study characterized the neural dynamics associated with sequential risk taking under conditions of escalating reward and risk. We found that sequential risk-taking progressively engages a cortico-striatal-subthalamic "braking" network and that the scaling of connectivity within this network is linked to the expression of low-risk choice behavior. On the other hand, we find that the dACC is involved in choosing the continue choice over an increasing tendency to stop and thus mediating nondefault choices rather than choice uncertainty.