Abstract
Using neuroimaging in combination with computational modeling, this study shows that decision threshold modulation for reward maximization is accompanied by a change in effective connectivity within corticostriatal and cerebellar–striatal brain systems. Research on perceptual decision making suggests that people make decisions by accumulating sensory evidence until a decision threshold is crossed. This threshold can be adjusted to changing circumstances, to maximize rewards. Decision making thus requires effectively managing the amount of accumulated evidence versus the amount of available time. Importantly, the neural substrate of this decision threshold modulation is unknown. Participants performed a perceptual decision-making task in blocks with identical duration but different reward schedules. Behavioral and modeling results indicate that human subjects modulated their decision threshold to maximize net reward. Neuroimaging results indicate that decision threshold modulation was achieved by adjusting effective connectivity within corticostriatal and cerebellar–striatal brain systems, the former being responsible for processing of accumulated sensory evidence and the latter being responsible for automatic, subsecond temporal processing. Participants who adjusted their threshold to a greater extent (and gained more net reward) also showed a greater modulation of effective connectivity. These results reveal a neural mechanism that underlies decision makers' abilities to adjust to changing circumstances to maximize reward.
Introduction
When making everyday decisions, we oftentimes incur opportunity costs, spending time deliberating one judgment at the cost of having time to consider others (Chittka et al., 2009). To maximize returns in situations in which multiple decisions need to be made, decision makers have to adjust their decision criterion (Gold and Shadlen, 2002).
Research on perceptual decision making has provided considerable insight into the general mechanisms of decision making (Shadlen and Newsome, 2001; Romo et al., 2004; Philiastides et al., 2006; Gold and Shadlen, 2007; Heekeren et al., 2004, 2008). The underlying processes have been well described by sequential sampling models (Smith and Ratcliff, 2004). One particularly successful instantiation of this computational framework, the drift diffusion model, provides a well-fitting description of the mechanisms of simple decisions (Smith and Ratcliff, 2004; Bogacz et al., 2006; Ratcliff and McKoon, 2008). Decisions are formed by continuously accumulating the relative evidence for the choice alternatives over time until a response boundary is crossed. The distance of the boundaries from the starting point of the accumulation process determines the accuracy and speed of decisions. More accurate decisions require longer accumulation and are thus slower, because more noisy evidence has to be integrated.
From a computational standpoint, a modulation of the decision threshold can be accomplished by either adjusting the required amount of accumulated evidence or changing deliberation time through the application of a time-dependent stopping rule (Busemeyer and Townsend, 1993; Gold and Shadlen, 2002; Fig. 1A). However, it is not established how an adjustable decision threshold for reward maximization is implemented in the brain.
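The boundary mechanism described above can be illustrated with a minimal random-walk simulation (a sketch with illustrative, hypothetical parameter values, not the fitting procedure used in this study): raising the response boundary trades speed for accuracy.

```python
import random

def simulate_ddm_trial(drift, boundary, rng, noise=1.0, dt=0.001, max_t=10.0):
    """One drift-diffusion trial: integrate noisy evidence from 0 until
    +boundary (correct) or -boundary (error) is reached."""
    x, t = 0.0, 0.0
    while abs(x) < boundary and t < max_t:
        x += drift * dt + noise * dt ** 0.5 * rng.gauss(0.0, 1.0)
        t += dt
    return x >= boundary, t  # (correct?, decision time in s)

def summarize(boundary, n=300, drift=1.0, seed=0):
    """Accuracy and mean decision time over n simulated trials."""
    rng = random.Random(seed)
    results = [simulate_ddm_trial(drift, boundary, rng) for _ in range(n)]
    acc = sum(c for c, _ in results) / n
    mean_rt = sum(t for _, t in results) / n
    return acc, mean_rt

# Lower boundary: faster but less accurate; higher boundary: slower, more accurate.
acc_low, rt_low = summarize(boundary=0.5)
acc_high, rt_high = summarize(boundary=1.5)
```

The run confirms the speed-accuracy tradeoff that the text attributes to boundary placement.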
Recent evidence suggests that decision threshold modulation is implemented in the corticobasal ganglia network (Lo and Wang, 2006; Forstmann et al., 2010; Cavanagh et al., 2011). A change in synaptic strength within this network has been proposed as a candidate neural mechanism supporting decision threshold modulation. This proposal is based on the observation that activity in the basal ganglia generally increases until a neural activity bound is reached and motor execution is initiated. In this context, the most efficient way to speed up or slow down responses to accommodate better or worse stimulus quality is to modulate the synaptic strength between cortical neurons accumulating decision evidence and basal ganglia neurons that trigger motor execution when a neural activity bound is reached (see Materials and Methods).
Consistent with previous findings, we hypothesize that brain mechanisms for perceptual decision-making adjust decision thresholds (to maximize reward) by modulating connectivity among brain regions that are involved in evidence accumulation, time processing, and action selection. These components are represented by the dorsolateral prefrontal cortex (dlPFC), which accumulates the relative evidence for choice alternatives in perceptual decision-making tasks (Heekeren et al., 2004, 2008; Philiastides et al., 2011; Wenzlaff et al., 2011), and the cerebellum, which has been implicated in subsecond temporal processing (Ivry and Keele, 1989; Dreher and Grafman, 2002; Lewis and Miall, 2003; Ivry and Spencer, 2004; Grondin, 2008). These regions communicate with the striatum, the main input structure of the action-selection system (Selemon and Goldman-Rakic, 1985; Graybiel et al., 1994; Middleton and Strick, 1994). Using a multimethod approach, we show that decision threshold modulation for reward maximization is accompanied by changes in effective connectivity [psycho-physiological interactions (PPIs); see Materials and Methods] within corticostriatal and cerebellar–striatal networks.
Materials and Methods
Participants.
Twenty-two subjects (11 females; mean age, 28 ± 3.7 years) were recruited from an in-house database at the Max Planck Institute for Human Development (Berlin, Germany). All subjects had normal or corrected-to-normal vision, were free of neurological and psychiatric history, and were briefed on the nature of methods used. All participants gave informed consent to participate according to the protocol approved by the local ethics committee. Complete datasets of 18 subjects were included in the full analysis, because two participants dropped out during experimental testing and two other participants had severe head motion during scanning (for details, see below).
Task setup.
Subjects had to decide on the direction of motion (binary choice: left or right) of a dynamic random-dot stimulus and indicate their choice with a button press. Dots were white on a black background and were drawn in a circular aperture (∼5° diameter) for the duration of one video frame (60 Hz). Dots were redrawn after ∼50 ms at either a random location or a neighboring spatial location to induce apparent motion. The resulting apparent motion had speeds between 3°/s and 7°/s, and dots were drawn at a density of 16.7 dots per degree per second. The task was implemented with Presentation (version 0.70; Neurobehavioral Systems), the Psychtoolbox3 (www.Psychtoolbox.org), and an adapted version of the Variable Coherence Random Dot Motion Code Collection (www.shadlen.org/Code/VCRDM). Stimuli were displayed using VisuaStim goggles (Magnetic Resonance Technologies), consisting of two small thin-film transistor monitors placed directly in front of the eyes, simulating a distance to a normal computer screen of 100 cm with a resolution of 1024 × 768 pixels and a refresh rate of 60 Hz. Participants used VisuaStim Response Pads (Magnetic Resonance Technologies) to make their responses by pressing a button with either their left or right thumb.
The task was partitioned into blocks using three alternating reward schedules: (1) +50/−25; (2) +50/−50; and (3) +25/−100 (Fig. 1B). These number pairs consist of the points gained for a correct answer (left number) and the points lost for an incorrect answer or a failure to respond (right number). Blocks in all conditions were at most 32 s long. These 32 s were filled with as many trials as possible. For example, if a participant responded with an average reaction time (RT) of 600 ms per trial during a block, she would be able to complete 11 trials [32 s/(0.6 s decision + 1.6 s fixation + 0.6 s feedback) = 11.4]. Within a block, a new trial was only issued if there was enough remaining time for a complete maximum-length trial [3.2 s = fixation cross (1.6 s) + maximum stimulus presentation time (1 s) + feedback (0.6 s)].
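The block-packing rule can be made concrete with a short sketch (using the timing values stated above; the 0.6 s decision time stands for a participant's mean RT):

```python
BLOCK_S = 32.0
FIXATION_S, MAX_STIM_S, FEEDBACK_S = 1.6, 1.0, 0.6
MAX_TRIAL_S = FIXATION_S + MAX_STIM_S + FEEDBACK_S  # 3.2 s worst case

def trials_completed(mean_rt_s):
    """Pack trials into one block: a new trial starts only if a full
    maximum-length trial (3.2 s) still fits in the remaining time."""
    elapsed, n = 0.0, 0
    while elapsed + MAX_TRIAL_S <= BLOCK_S:
        elapsed += FIXATION_S + mean_rt_s + FEEDBACK_S  # actual trial duration
        n += 1
    return n
```

With a 600 ms mean RT this yields the 11 trials given in the worked example; faster responding packs more trials into the same 32 s.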
Participants did not receive explicit response instructions but were informed that overall time for making decisions was limited. Maximizing net reward under different reward schedules thus required adapting decision thresholds. For instance, when gains for correct responses were small and losses for incorrect responses were large, it was reward maximizing to avoid large losses by collecting more evidence and thereby increasing the probability of responding accurately. Conversely, when gains were moderate and losses small, it was reward maximizing to respond faster (albeit, on average, less accurately), because more decisions could then be made during the experiment, increasing net reward. Hence, subjects were required to modulate their decision behavior by using either a lower threshold (leading to faster but less accurate responses) or a higher threshold (leading to slower yet more accurate responses) to maximize net reward in finite time (Fig. 1A). Performance was rewarded with up to €25.
Blocks were presented in randomized order; the reward schedule for each block was displayed to the subject for 3 s at the beginning of the block. The decision phase immediately followed in blocks of trials (the total number of responses depended on the participant's individual response speed). In each trial, subjects saw a fixation cross (1.6 s), after which the stimulus was presented. The stimulus was extinguished by a response or after 1 s, followed by feedback of 0.6 s. At the end of each block, participants were shown their current projected reward in euros (2 s) for the entire experiment: Reward (in €) = (€25/Total Score) × Score_(reward schedule), where Score = Actual Score_(projected) + Additional Score_(projected). Using the 75% accuracy level as base-level performance (set by an adaptive staircase procedure; see below) for the speed condition, we calculated how much the subjects could earn (whereby the fastest included RT had to be >249 ms; RTs <250 ms were assumed to be fast guesses). Including the points per reward schedule, we computed a metric for rewards for the entire experiment with the aforementioned RT constraint. Subjects' scores (which were determined by accuracy and response times) were then compared with this metric, and they were rewarded based on their score.
Participants received a 20 min practice session immediately before they entered the scanner. During practice, subjects performed direction-of-motion discriminations at varying coherence levels and received per-trial feedback on accuracy. Participants did not practice under the influence of reward schedules or response instructions. The subsequent scanning session totaled 1 h. While subjects were lying in the scanner, but before any scanning protocol, an adaptive staircase procedure was used to determine an individual stimulus coherence level yielding 75% performance accuracy (Leek, 2001). This allowed us to obtain enough error responses for a good fit of the diffusion model to RT data (Ratcliff and Tuerlinckx, 2002; Vandekerckhove and Tuerlinckx, 2007). Moreover, a constant coherence level allowed us to exclude effects of stimulus difficulty on decision threshold adaptation (Vandekerckhove and Tuerlinckx, 2007; Ratcliff and McKoon, 2008). The coherence level at which participants achieved 75% accuracy was low (group mean, 12%). We were thus able to minimize differences in attention between reward schedule conditions, on the assumption that low-coherence stimuli demand a comparable focus on the task in all task conditions. Participants were debriefed with a questionnaire and a personal interview.
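The exact staircase rule is not specified above; one standard choice that converges on 75% accuracy is a weighted up-down procedure, sketched below (the step size and the update rule itself are assumptions for illustration, not necessarily the rule used in this study):

```python
def staircase_step(coherence, correct, step=0.01):
    """Weighted up-down rule: decrease coherence by `step` after a correct
    response, increase it by 3 * step after an error. Expected movement is
    zero where p * step == (1 - p) * 3 * step, i.e., at p(correct) = 0.75.
    Step size and the [0, 1] bounds are hypothetical."""
    if correct:
        return max(coherence - step, 0.0)
    return min(coherence + 3 * step, 1.0)
```

Run trial by trial, the coherence level drifts toward, and then hovers around, the observer's 75%-correct threshold.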
Behavioral data were analyzed with a repeated-measures ANOVA. For each individual dataset, all trials with RTs ≥2 SD from the mean were excluded from all analyses. Computational model parameters such as boundary height and drift rate were estimated using the Diffusion Model Analysis Toolbox (DMAT; Vandekerckhove and Tuerlinckx, 2008). We fitted the standard version of the model with constant decision boundaries and an extended version (including a collapsing boundary; courtesy of Jonathan Malmaud and Antonio Rangel, California Institute of Technology, Pasadena, CA) of the diffusion model to the experimental data. The goodness-of-fit values of the models were compared using the Bayesian information criterion (BIC; Schwarz, 1978).
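BIC-based model comparison weighs fit against model complexity; a minimal sketch of the criterion follows (the log-likelihoods, parameter counts, and observation count below are hypothetical placeholders, not values from this study):

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (Schwarz, 1978): lower is better.
    Each extra free parameter costs ln(n_obs)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical fits: a diffusion model with a condition-modulated boundary
# (2 extra parameters) vs. one with a single fixed boundary.
bic_fixed = bic(log_likelihood=-1210.0, n_params=7, n_obs=815)
bic_modulating = bic(log_likelihood=-1180.0, n_params=9, n_obs=815)
preferred = "modulating" if bic_modulating < bic_fixed else "fixed"
```

In this toy case the better fit of the modulating-boundary model outweighs its complexity penalty, which is the logic behind the comparisons in Table 1.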
Neuroimaging.
Subjects were scanned in a whole-body 3T Siemens TRIO MR system by acquiring 46 axial slices (216 mm field of view; 72 × 72 acquisition matrix; in-plane voxel dimensions, 3 × 3 mm; repetition time of 2500 ms; echo time of 29 ms; 80° flip angle) parallel to the anterior commissure–posterior commissure plane and covering the whole brain. Slice gaps were interpolated to generate output data with a spatial resolution of 3 × 3 × 3 mm. High-resolution T1 images (repetition time of 1900 ms; echo time of 2.52 ms; 9° flip angle; 192 sagittal slices; voxel size, 1 × 1 × 1 mm) were acquired, as well as field inhomogeneity measures and a structural fluid-attenuated inversion recovery image.
fMRI data analysis was performed on the High-Performance Computing System Abacus (http://www.zedat.fu-berlin.de/Compute) at Free University of Berlin with FSL [for FMRIB (Functional MRI of the Brain, Oxford, UK) Software Library] version 4.1.2, the open software toolbox from FMRIB (http://www.fmrib.ox.ac.uk/fsl/), and in-house developed MATLAB scripts (R2007a; www.mathworks.de). Regressors were convolved using a double-gamma hemodynamic response function. Motion parameters were checked for outliers [two participants with relative head movement greater than one voxel size (3 mm) were removed from the analysis] and added as regressors to the design matrix to reduce motion-related artifacts. Functional volumes were slice-time and motion corrected, spatially smoothed using a Gaussian kernel of 6 mm full-width at half-maximum, and high-pass filtered with σ = 80 s (2.5 times the maximum block length). Images were registered using linear transformations [FLIRT (for FMRIB Linear Image Registration Tool); Jenkinson et al., 2002] with 7 degrees of freedom (d.o.f.) for individual functional (EPI) space to T1 and with 12 d.o.f. for T1 to standard space. For group-level results, individual-level contrasts were averaged using the FLAME (for FMRIB Local Analysis of Mixed Effects) 1 and 2 module (including automatic deweighting of outliers) in FSL (Beckmann et al., 2003; Woolrich et al., 2004), and one-sample t tests were performed at each voxel for each contrast of interest. Significant clusters at a familywise-error-corrected level of α = 0.05 were identified with the threshold-free cluster enhancement (TFCE) algorithm implemented in the FSL program randomise (Smith and Nichols, 2009). From group-level activation maps of the decision phase, we calculated a conjunction map based on the initial general linear model decision phase group activation maps using a logical AND conjunction at a p = 0.05 corrected TFCE threshold (Fig. 2A).
This conjunction map includes brain regions that show a significant increase in blood oxygenation level-dependent (BOLD) activity during the decision phase in all reward schedules. Considering our hypothesis that threshold adjustments are instantiated by changes in the interaction between the brain regions mediating decision making, we selected regions of interest (ROIs) that are involved in the representation and computation of the decision variables accumulated evidence (dlPFC) and available time (cerebellum) as seeds for a PPI analysis. To verify that the selected ROIs are functionally involved in the stipulated processes of evidence accumulation and available-time processing, respectively, we compared parameter estimates for the decision phase under the different reward schedule conditions. Functional ROIs (Poldrack, 2007; Ramsey et al., 2010) were constrained by using a 50% anatomical probability threshold based on the anatomical probability atlases included in FSL [the Harvard–Oxford Subcortical Atlas (Caviness et al., 1996) for the dlPFC and the MNI Structural Brain Atlas (Diedrichsen et al., 2009) for the cerebellum]. ROIs were subsequently transformed from standard space into each individual's functional space using a linear transformation. Each individual ROI was checked to ensure that it was contained within the brain and within the anatomical ROI and was subsequently used to extract the time series of the seed regions for a PPI analysis (Friston et al., 1997). It is worth considering in detail whether changes in the strength of the cortico(cerebellar)-striatal synapses can be examined with fMRI by analyzing changes in effective connectivity with a PPI analysis. According to Friston and colleagues, a change in "effective connectivity" can be characterized as a change in the correlation of neural activity (and the dependent BOLD signal) between two neuronal populations from one state (or experimental condition) to another (Friston et al., 1993, 1997; Friston, 2011).
Significance testing of the results of the PPIs for two dlPFC seed regions (left and right) and one cerebellar seed region (left) was done in a predetermined basal ganglia mask (based on the Harvard–Oxford Subcortical Atlas, including the bilateral caudate, putamen, nucleus accumbens, thalamus, and pallidum, using no probability threshold), and correction for familywise error was achieved by applying small-volume correction. Next, to investigate whether voxels were correlated with boundary modulation, a whole-brain group analysis (high threshold > low threshold) with the additional covariate of the magnitude of boundary modulation (estimated with the diffusion model) was performed. Additional whole-brain PPI analyses were performed for the 10 most highly activated clusters during the decision phase (see Table 3).
Results
Participants made, on average, 815 decisions during the entire experiment. Training on the task before and during the scanning session minimized error trials. On average, 2.1 responses (0.25%) were fast guesses (i.e., response time <250 ms), and participants failed to respond within 1 s in 7.3 trials (0.85%).
Subjects altered their decision-making behavior to maximize reward by adjusting their decision thresholds in response to the reward schedules. They responded faster (F(1.21,20.57) = 17.361, p < 0.0001) and less accurately (F(1.722,29.28) = 17.817, p < 0.0001) during blocks of trials in which faster responses were reward maximizing compared with blocks in which slower, accurate judgments were more profitable (Fig. 1C). The number of completed trials differed significantly between the high-threshold condition (mean ± SD, 262 ± 6.8 trials) and the low-threshold condition (mean ± SD, 277 ± 9.4; t(17) = 11.8, p < 0.001) (mean ± SD, 276 ± 9.1 in the neutral-threshold condition). The number of completed trials indicates that participants adjusted their decision-making behavior to the reward schedules; however, the examination of reward frequencies (number of accurate trials per condition) shows that this did not happen in an optimal manner: mean ± SD reward frequencies were 207 ± 19 in the low-threshold, 215 ± 20 in the neutral-threshold, and 212 ± 19 in the high-threshold conditions. The high–low and neutral–low differences are statistically significant (t(17) = −2.3, p = 0.035 and t(17) = −3.08, p = 0.006, respectively), but the high–neutral difference is not (t(17) = −0.68, p = 0.49). The lower reward frequency in the low-threshold condition is unexpected because, to optimize rewards, a larger number of completed trials should also lead to a larger number of accurate trials. The reduced reward frequency in the low-threshold condition suggests that participants on average responded too fast in this condition; that is, they implemented a lower than optimal decision threshold given their discrimination ability (as measured with the drift rate v of the diffusion model).
A, Decision threshold modulation is reward maximizing under changing incentive structures (red and green lines). B, Rewards and costs for hits and misses are presented at the beginning of each block of trials. Subjects receive per-trial feedback and the projected reward in euros after each block. Overall time for the task is finite; thus, response times influence the number of decisions that can be made. C, Behavioral results. Top, Mean ± SEM RT: low-threshold state, 576.41 ± 17.48 ms; intermediate state, 597.04 ± 17.30 ms; high-threshold state, 624.07 ± 19.35 ms. Bottom, Mean ± SEM response accuracy (percentage correct): low-threshold state, 75.45 ± 2.13%; intermediate state, 78.45 ± 2.16%; high-threshold state, 81.62 ± 1.89%. Mean RT and response accuracy differed significantly between threshold states (*p < 0.05, **p < 0.001, Bonferroni adjusted). D, Mean ± SEM group boundary parameter values for all threshold states from the best-fitting diffusion model: low-threshold state, 0.0739 ± 0.0035 a.u.; intermediate state, 0.0778 ± 0.0028 a.u.; high-threshold state, 0.0848 ± 0.0027 a.u. Boundary heights differed significantly between threshold states (*p < 0.01, **p < 0.001). E, Magnitude of boundary modulation relates to reward. Reward gain is defined as the difference between the projected reward with no threshold modulation and the actual reward with boundary modulation. F, Quantile probability plots of four randomly selected datasets.
Subjects markedly adjusted their decision process by modulating their decision threshold, as indicated by model fits of the drift diffusion model to observed RT data (Fig. 1F). Model comparisons on different versions of the diffusion model indicated the modulating boundary version as the best-fitting model (lowest BIC score of 39,227 in the standard DMAT framework; see top part of Table 1). Additionally, we fitted an extended model that included an exponentially collapsing boundary parameter (also implemented in DMAT) to the RT data and compared it with the previously used versions. Model fits indicated the standard modulating boundary version as the best-fitting model (lowest BIC score of 25,620; see bottom part of Table 1). Moreover, subjects with a greater threshold modulation between high-threshold and low-threshold states gained more rewards (Fig. 1E). It is important to note that the interindividual differences in threshold modulation we observed are consistent with previous findings, which indicate that not all subjects adjust decision thresholds equally well (Bogacz et al., 2010).
BIC model comparison on different standard (top) and extended (bottom) versions of the drift diffusion model
Within each threshold condition, participants were more accurate in trials with faster RTs than in those with slower RTs. The main effect of response speed on accuracy was significant (F(1,17) = 46.14, p < 0.001). These results are in line with the view that longer RTs between conditions (associated with increased accuracy) are caused by a different process than longer RTs within conditions (associated with reduced accuracy). Within-condition slowing is likely to be a result of trial-to-trial variations in difficulty, in that participants slow down their responses in more difficult trials to maintain accuracy (although with only limited success). This type of speed–accuracy tradeoff is described by the model of Lo and Wang (2006). In contrast, between-condition differences in RTs are not an adaptation to local variations in difficulty but an adaptation to the reward structure.
Corresponding to our hypothesis that threshold modulation is reflected in a change in effective connectivity between brain regions that process task-relevant decision variables and control action selection during the decision phase (Fig. 2A,C), we focused on ROIs that are involved in the representation and computation of accumulated evidence (dlPFC) and available time (cerebellum) and motor execution (basal ganglia). To verify the putative functional roles of those seed regions in the decision phase, we compared their respective parameter estimates for low-, neutral-, and high-threshold conditions (Fig. 2C). BOLD responses in both dlPFCs showed greater activation during the high(er)-threshold condition in which more evidence is accumulated compared with the low(er) evidence accumulation condition, in which faster responses are reward maximizing (Fig. 2C). For the cerebellar ROI, we observed the opposite pattern. Activation was greater in the low-threshold condition, which favors faster decision making compared with the high-threshold condition when slow(er) but on average more accurate decisions are reward maximizing. BOLD responses were greater in the cerebellum when deliberation time was of the essence, suggesting that this ROI is involved in time processing. This is in line with other evidence that ascribes a prominent role for the cerebellum during subsecond (interval) time processing (Ivry and Keele 1989; Dreher and Grafman, 2002; Harrington et al., 2004; Grondin, 2008). Together, these results thus demonstrate a functional role for both the dlPFC and the cerebellum in our task. Note that voxels in the striatum showed significant activation during the decision phase across all reward schedules (Fig. 2A,B).
A, Conjunction map of activated brain areas during the decision phase. B, Threshold modulation covariation (red) overlap with decision phase activity (green). C, Beta values of seed regions for low-threshold (lT), neutral-threshold (nT), and high-threshold (hT) states (red, right DLPFC; blue, left DLPFC; yellow, cerebellum); *p < 0.01.
The PPI analysis of both dlPFC seeds indicates a modulation of neural connectivity between the dlPFC and the striatum when comparing high-threshold with low-threshold states (Fig. 3A,B; Table 2). During high(er)-threshold states (in which thresholds are adjusted for slower and more accurate decisions), effective connectivity between the bilateral dlPFC and the left striatum was significantly increased relative to lower-threshold states (Table 2). According to our connectivity hypotheses, we expected this neural interaction to be reflected in the magnitude of boundary modulation from the diffusion model as well. We calculated Pearson's correlation coefficient between estimates of the diffusion model boundary parameter and the neural effective connectivity parameter (i.e., the PPI), correcting for multiple comparisons. We found a strong, positive correlation for the PPI estimates of the left dlPFC [r = 0.8 (16), p < 0.0001, two-tailed; Fig. 3A] and the right dlPFC [r = 0.798 (16), p < 0.0001, two-tailed; Fig. 3B] with the magnitude of boundary modulation (high–low threshold) of the computational model. This association between PPI estimates and boundary modulation strongly suggests that corticostriatal brain systems for the processing of accumulated evidence contribute to the observed changes in decision parameters. During lower-threshold states, effective connectivity between the cerebellum and the striatum was significantly increased relative to higher-threshold states (Fig. 3C). In line with our hypothesis, neural connectivity between the cerebellum and the striatum correlated negatively with the magnitude of boundary modulation [r = −0.644 (16), p = 0.004, two-tailed; Fig. 3C, using the same analysis direction as in the dlPFC analyses].
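The brain-behavior association above rests on Pearson's product-moment correlation over per-subject (boundary modulation, PPI estimate) pairs; for completeness, a minimal implementation (the sample values in the test are synthetic, not the study's data):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

With n = 18 subjects, the reported r values have n − 2 = 16 degrees of freedom, matching the "(16)" notation in the text.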
A, From top to bottom, Left dlPFC seed ROIs, functionally interacting region of the striatum (z max = 3), association of neural connectivity parameter with boundary modulation estimate (high–low threshold states) from the diffusion model. B, Same as A but for right dlPFC (z max = 3). C, Same as A but for cerebellar seed, interaction with left striatal region (z max = 3.1). Blue squares, red diamonds, and green circles indicate individual subject estimates of neural interaction and magnitude of boundary modulation comparing high- with low-threshold states.
dlPFCs and cerebellar PPI results
To further examine the relationship between condition differences in boundary parameters and brain activation, we performed a voxelwise covariate analysis with the magnitude of boundary modulation. This analysis yielded a cluster of activation in the left striatum, in particular the left putamen [center of gravity (COG) (x, y, z), MNI152 = −24, 0, 2; size = 96 voxels; at z = 2.4, uncorrected, whole brain]. Significance testing using small-volume correction in the same basal ganglia mask used for the PPI analysis resulted in one significantly activated cluster that partly overlaps with decision phase activity [COG (x, y, z), MNI152 = −29, −1, 5; 169 voxels; TFCE corrected at p = 0.05; Fig. 2B]. This cluster showed similar activation during low- and high-threshold decision phases. Importantly, in the left posterior striatum, the clusters showing (1) covariation with boundary modulation, (2) a PPI with the left dlPFC, and (3) a PPI with the cerebellum overlapped [COG (x, y, z), MNI152 = −27, −3, −7; 32 voxels; TFCE corrected at p = 0.05; Fig. 4]. There were no significant whole-brain results from additional PPI analyses using other ROIs as seeds that were also activated during the decision phase (corrected for multiple comparisons; Table 3).
The clusters showing (1) a covariation with boundary modulation, (2) PPI with left DLPFC, and (3) PPI with the cerebellum overlap (COG: −27, −3, −7; 32 voxels, in pink).
PPI results for 10 significantly activated clusters for the decision phase
Discussion
We present converging evidence encompassing behavioral data, neuroimaging data, and computational modeling, which suggests that decision threshold modulation for reward maximization is instantiated through a change in effective connectivity within corticostriatal and cerebellar–striatal functional brain circuits during the decision phase, with the former being responsible for processing of accumulated sensory evidence and the latter being responsible for automatic, subsecond temporal processing.
Neuroimaging studies, along with pharmacological and lesion studies, indicate that the cerebellum is crucial for automatic subsecond temporal information processing (Ivry and Keele, 1989; Hazeltine et al., 1997; O'Reilly et al., 2008; without disregarding the ongoing debate concerning the exact role of the cerebellum in temporal processing, see Mauk and Buonomano, 2004; Coull et al., 2011). We propose that stronger effective connectivity in cerebellar–striatal brain systems places a greater weight on available decision time (corresponding to a time stopping rule; Busemeyer and Townsend, 1993; Gold and Shadlen, 2002; Cisek et al., 2009) compared with evidence accumulation during decision making. Conversely, stronger connectivity between the dlPFC and the striatum during high(er)-threshold states places a greater weight on the accumulated evidence during deliberation.
These results are generally consistent with previous empirical and theoretical accounts (Bogacz et al., 2006; Lo and Wang, 2006; Forstmann et al., 2008; Salinas, 2008; Deco et al., 2010). Notably, however, our results partly diverge from the predictions of the theoretical model proposed by Lo and Wang (2006). We observed stronger connectivity in a corticostriatal network during higher-threshold states, which leads to more accurate and slower decision making. Conversely, the biophysical network model implements strong(er) connectivity in corticostriatal pathways as producing low(er) decision thresholds for fast(er) responses with lower accuracy. Simulations of the computational model show that the optimal synaptic efficacy for reward maximization depends on task difficulty (Lo and Wang, 2006, their Fig. 7B,C). Lo and Wang show that, for more difficult choices (low motion coherence), synaptic efficacy is tuned toward higher thresholds. However, the computational model does not incorporate information about elapsed time, which is crucial for reward maximization in our task. Moreover, motion coherence was not manipulated in our experiment; in our task, more difficult choices are rather instantiated by reward schedules that entail a bigger loss and therefore require more accurate and slower decision making. Additional support for our interpretation of the observed changes in connectivity may emerge when considering the signal-to-noise ratio (SNR) in the cortex (cf. Faisal et al., 2008) and how signal gating modulates it. In the context of adaptation to stimulus quality, a theoretical analysis (Lo and Wang, 2006) suggests that the downregulation of the decision threshold is mediated by an increase of corticostriatal synaptic efficacy. This mechanism could be suboptimal in the context of changing reward schedules, as we describe below.
We assume that the change in synaptic efficacy is mainly instantiated through a gating process, in which stronger gating leads to weaker synaptic efficacy (cf. Salinas and Thier, 2000; Chance et al., 2002). Importantly, gating has the effect of increasing the SNR because it primarily filters out weaker "noise" activity (cf. Purcell et al., 2010). The underlying assumption here is that "signal" spikes generally have higher firing rates than "noise" spikes, and a low synaptic efficacy effectively filters out noise (cf. Purcell et al., 2010). Hence, increasing the synaptic efficacy by downregulating the gating process reduces the SNR. In the context of adaptation to higher quality of perceptual input, for which the model of Lo and Wang (2006) was developed, this reduction of SNR is unproblematic because it is counterbalanced by the greater SNR in cortical accumulation neurons. However, when the synaptic efficacy is adjusted in the context of changing reward schedules, no such counterbalancing occurs, leading to a lower SNR of the accumulation signal sent to striatal neurons.
In our interpretation, the striatum combines evidence (dlPFC) and available time information (cerebellum) for the execution of a motor response (Middleton and Strick, 1994; Hoshi et al., 2005; Balleine et al., 2007; Cohen et al., 2009). In a nutshell, we assume that, independent of orientation toward speed or accuracy, a fixed amount of input to the striatum is required for action execution (cf. Mink, 1996) (consistent with this, we did not find that activation in the striatum differed between conditions). This input can stem either from cortical accumulation regions or from cerebellar timing regions (modulating baseline firing of the striatum). Depending on the response condition, the connectivity between the accumulation and timing regions on the one side and the striatum on the other is adjusted. Note that both decision drivers always influence striatal activation, but their relative importance can vary. More specifically, in higher-threshold states, synaptic efficacy in the corticostriatal network should be greater and in the cerebellar–striatal network smaller compared with the low-threshold state. Thus, the decision threshold and the reward frequency are sensitive to the relative strength of corticostriatal and cerebellar–striatal connectivity: the greater the relative strength of the corticostriatal connectivity, the higher the threshold, and the greater the cerebellar–striatal connectivity, the lower the threshold. This mechanism can also be expressed in a simple equation, in which a vector of firing rates of striatal neurons (NStr) is the weighted average of firing rates of cortical neurons (Nco) and cerebellar neurons (Nce): NStr = α × Nco + β × Nce + x, where x denotes other influences on striatal activity, including noise. From this equation, it is easy to see that a change in synaptic efficacy, i.e., in the weights α and β, changes the correlations of the striatal time series NStr with Nco and Nce, respectively.
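This property of the equation can be illustrated with a minimal simulation (not part of the original analysis; all time series and weight values below are arbitrary choices for illustration only): reweighting α and β shifts the correlation of the simulated striatal signal toward the cortical or the cerebellar input.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of simulated time points

# Independent surrogate firing-rate time series (arbitrary units)
N_co = rng.normal(size=n)           # cortical (dlPFC) accumulation signal
N_ce = rng.normal(size=n)           # cerebellar timing signal
x = rng.normal(scale=0.5, size=n)   # other influences on the striatum, incl. noise

def striatal(alpha, beta):
    # N_Str = alpha * N_co + beta * N_ce + x
    return alpha * N_co + beta * N_ce + x

# Hypothetical high-threshold state: corticostriatal weight dominates
hi = striatal(alpha=1.0, beta=0.2)
# Hypothetical low-threshold state: cerebellar-striatal weight dominates
lo = striatal(alpha=0.2, beta=1.0)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(corr(hi, N_co), corr(hi, N_ce))  # strong cortical, weak cerebellar coupling
print(corr(lo, N_co), corr(lo, N_ce))  # the pattern reverses
```

The correlation pattern flips with the weights, which is the sense in which a change in synaptic efficacy should be visible as a change in measured effective connectivity.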
This formulation also shows that using changes in effective connectivity (as measured using BOLD fMRI) as a marker for changes in synaptic efficacy is well justified.
One interesting aspect of mechanisms for threshold adaptation is whether the observed changes in connectivity need to remain active throughout the entire task. In the mechanism we propose, the persistent changes in connectivity represent different modes of processing under which decision making is performed in our task to maximize rewards. Persistent connectivity modulation is observed because, in every trial, sensory information and elapsed time are integrated until a decision threshold is reached (a threshold that differs depending on the reward schedule) to form a decision. Trial feedback and the resulting adjustments may additionally be reflected in this persistent activity. Note that we see these persistent changes as complementary to processes of state resetting or switching (cf. Rushworth et al., 2002; Forstmann et al., 2008).
An alternative account of boundary modulation, in the context of an urgency model or collapsing boundary model (cf. Cisek et al., 2009), is that decision boundaries collapse faster when the available decision time is shorter. To investigate the plausibility of such a model, we fitted a diffusion model implementing an exponentially collapsing boundary (courtesy of Jonathan Malmaud and Antonio Rangel) to the RT data. However, model comparison indicates that this collapsing boundary version of the diffusion model does not improve the fit compared with a boundary modulation model. Moreover, we do not find significant correlations between the difference in the collapse rate of the boundary parameter and the PPI results. Still, future work should test alternative formulations of the urgency model (e.g., comparing linear, logarithmic, or exponential collapse over time).
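The qualitative behavior of such a model can be sketched in a short simulation. This is not the code fitted in the study (which is not reproduced here); all parameter values, the exponential collapse form b(t) = a0·exp(−t/τ), and the Euler discretization are illustrative assumptions. A faster collapse (smaller τ) forces earlier terminations.

```python
import numpy as np

def ddm_collapsing(drift=0.5, a0=1.0, tau=0.5, dt=0.001, sigma=1.0,
                   t_max=5.0, rng=None):
    """Simulate one trial of a diffusion process whose symmetric boundaries
    collapse exponentially: b(t) = a0 * exp(-t / tau).
    Returns (reaction_time, choice), choice in {+1, -1} or 0 on timeout."""
    rng = rng or np.random.default_rng()
    x, t = 0.0, 0.0
    while t < t_max:
        b = a0 * np.exp(-t / tau)   # current (collapsing) boundary
        if x >= b:
            return t, +1
        if x <= -b:
            return t, -1
        # Euler step of the drift-diffusion process
        x += drift * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
    return t_max, 0

rng = np.random.default_rng(1)
# Faster collapse (small tau) vs. slower collapse (large tau)
fast = [ddm_collapsing(tau=0.2, rng=rng) for _ in range(500)]
slow = [ddm_collapsing(tau=1.0, rng=rng) for _ in range(500)]

def mean_rt(trials):
    return np.mean([rt for rt, _ in trials])

def accuracy(trials):  # drift is positive, so +1 is the "correct" choice
    return np.mean([c == +1 for _, c in trials])

print(mean_rt(fast), accuracy(fast))
print(mean_rt(slow), accuracy(slow))
```

Under these assumptions, faster collapse yields shorter reaction times at the cost of accuracy, which is why the collapse rate can serve the same functional role as a lowered decision threshold.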
One potential alternative interpretation of the PPI results is a simple "reduced noise through attention" hypothesis. Subjects might pay more attention in trials with more at stake, which, by reducing noise in the computations performed in all relevant areas, would increase the observed functional connectivity in trials with an emphasis on accuracy. However, this hypothesis is inconsistent with our finding of a stronger cerebellar–striatal correlation in the low-threshold condition. Hence, the "reduced noise" argument cannot explain all our results. Moreover, if attention were the main driver of performance differences between conditions, accuracy differences should have been greater, because our well-trained participants performed the task at relatively low coherence levels selected for 75% accurate responses in the neutral threshold condition (see Materials and Methods).
As described above, we propose that the striatum combines evidence and available time information for the execution of a motor response, which we assume to be coded in the dlPFC and cerebellum, respectively. The dlPFC has been implicated in the processing of accumulated sensory evidence during decision making (Heekeren et al., 2008; Domenech and Dreher, 2010; Philiastides et al., 2011). Studies on value-based decision making locate the comparator/accumulator in the dorsomedial PFC (dmPFC) and the intraparietal sulcus (IPS) rather than the dlPFC (Basten et al., 2010; Hare et al., 2011). Such divergence in the implicated brain regions (dlPFC vs dmPFC) is not uncommon and is likely attributable to task and stimulus differences. Perceptual and value-based decisions share a common neural mechanism (change in connectivity), but the neural substrate can differ, even within the domain of value-based choices. For example, Basten et al. (2010) locate a cost–benefit comparator mechanism in the ventromedial PFC and the accumulator in the IPS; the general mechanism of difference-based accumulation of evidence is the same as in the study by Hare et al. (2011), but the neural substrate differs. Future research based on single- or multiunit recordings and/or high-resolution fMRI will be required to clarify the invariance of the mechanisms and regions involved in these decision-making processes.
In conclusion, we present converging evidence from behavioral data, neuroimaging data, and computational modeling showing that threshold adjustments to maximize net reward are instantiated through a change in effective connectivity within corticostriatal and cerebellar–striatal brain systems. Our findings on the neural mechanism of decision threshold modulation could be highly relevant for understanding neuropsychiatric disorders that are accompanied by impulsivity and/or inflexibility (Mulder et al., 2010) and how temporal information can be used to inform decision making (cf. Hanks et al., 2011).
Footnotes
This research was supported by the Max Planck Society and German Research Foundation Grant HE 3347/2-1. We thank F. Blankenburg, D. Meshi, and M. Philiastides for comments on this manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Nikos Green, Affective Neuroscience and Psychology of Emotion, Freie Universität Berlin, Habelschwerdter Allee 45, Room JK 24/221e, D-14195 Berlin, Germany. nikos.green{at}fu-berlin.de