## Abstract

The neurotransmitter dopamine is implicated in diverse functions, including reward processing, reinforcement learning, and cognitive control. The tendency to discount future rewards over time has long been discussed in the context of potential dopaminergic modulation. Here we examined the effect of a single dose of the D2 receptor antagonist haloperidol (2 mg) on temporal discounting in healthy female and male human participants. Our approach extends previous pharmacological studies in two ways. First, we applied combined temporal discounting drift diffusion models to examine choice dynamics. Second, we examined dopaminergic modulation of reward magnitude effects on temporal discounting. Hierarchical Bayesian parameter estimation revealed that the data were best accounted for by a temporal discounting drift diffusion model with nonlinear trialwise drift rate scaling. This model showed good parameter recovery, and posterior predictive checks revealed that it accurately reproduced the relationship between decision conflict and response times in individual participants. We observed reduced temporal discounting and substantially faster nondecision times under haloperidol compared with placebo. Discounting was steeper for low versus high reward magnitudes, but this effect was largely unaffected by haloperidol. Results were corroborated by model-free analyses and modeling via more standard approaches. We previously reported elevated caudate activation under haloperidol in this sample of participants, supporting the idea that haloperidol elevated dopamine neurotransmission (e.g., by blocking inhibitory feedback via presynaptic D2 auto-receptors). The present results reveal that this is associated with an augmentation of both lower-level (nondecision time) and higher-level (temporal discounting) components of the decision process.

**SIGNIFICANCE STATEMENT** Dopamine is implicated in reward processing, reinforcement learning, and cognitive control. Here we examined the effects of a single dose of the D2 receptor antagonist haloperidol on temporal discounting and choice dynamics during the decision process. We extend previous studies by applying computational modeling using the drift diffusion model, which revealed that haloperidol reduced the nondecision time and reduced impulsive choice compared with placebo. These findings are compatible with a haloperidol-induced increase in striatal dopamine (e.g., because of a presynaptic mechanism). Our data provide novel insights into the contributions of dopamine to value-based decision-making and highlight how comprehensive model-based analyses using sequential sampling models can inform the effects of pharmacological modulation on choice processes.

## Introduction

Future rewards are discounted in value (Peters and Büchel, 2011) such that humans and many animals prefer smaller-sooner (SS) rewards over larger-but-later (LL) rewards (temporal discounting). Steep discounting of reward value is associated with a range of maladaptive behaviors and conditions, from substance use disorders (Bickel et al., 2014), attention-deficit hyperactivity disorder (Jackson and MacKillop, 2016), and obesity (Amlung et al., 2016) to behavioral addictions, such as gambling disorder (Wiehler and Peters, 2015). Temporal discounting has thus been suggested to constitute a transdiagnostic process (Amlung et al., 2019; Lempert et al., 2019) with relevance for many psychiatric conditions.

Dopamine (DA) plays a central role in addiction (Robinson and Berridge, 1993). In rodents, reductions versus moderate increases in DA transmission led to increases and decreases in discounting, respectively, whereas the corresponding human literature is smaller and more heterogeneous (D'Amour-Horvat and Leyton, 2014). For example, de Wit et al. (2002) found that acute administration of d-amphetamine decreased impulsivity, such that temporal discounting was reduced under d-amphetamine. However, a later study did not replicate this effect (Acheson and de Wit, 2008). Administration of the D2/D3 receptor agonist pramipexole did not affect measures of impulsivity in another study (*n* = 10) from the same group (Hamidovic et al., 2008). In contrast, Pine et al. (2010) observed increased temporal discounting following administration of the catecholamine precursor L-DOPA compared with placebo in healthy control participants (*n* = 13), while the D2-receptor antagonist haloperidol did not modulate discounting. In a recent within-subjects study using L-DOPA in a substantially larger sample (*n* = 87), there was no overall effect on temporal discounting (Petzold et al., 2019). Rather, effects depended on baseline impulsivity, which the authors interpreted in the context of the inverted-U-model of DA effects on cognitive control functions (Cools and D'Esposito, 2011). Two recent studies have reported a reduction in discounting following administration of the selective D2/D3-receptor antagonist amisulpride (Weber et al., 2016) as well as the D2 receptor antagonist metoclopramide (Arrondo et al., 2015). Although the latter is primarily used clinically for its peripheral effects, it can pass the blood-brain barrier and act centrally (Shakhatreh et al., 2019).

A similar heterogeneity is evident when considering model-based reinforcement learning (RL) (Doll et al., 2012), which in some studies (Shenhav et al., 2017), but not others (Solway et al., 2017), was associated with reduced temporal discounting. However, in contrast to temporal discounting (see above), L-DOPA instead increased reliance on model-based RL in healthy controls (Wunderlich et al., 2012) and Parkinson's disease patients (Sharp et al., 2016). Notably, this overall effect was not observed in a recent study in a substantially larger sample (*n* = 65) (Kroemer et al., 2019). Here, increased model-based RL under L-DOPA was restricted to participants with high working memory capacity.

One well-replicated behavioral effect in temporal discounting (the magnitude effect) refers to the observation that the rate of temporal discounting decreases with increasing reward magnitude (Green et al., 1997). In humans, this effect depends on lateral PFC processing (Ballard et al., 2017); and in rodents, d-amphetamine effects on temporal discounting are more pronounced for large-magnitude conditions (Krebs et al., 2016). However, it is unclear whether DA impacts the magnitude effect in humans.

In the present study, we examined these processes using a between-subjects double-blind placebo-controlled pharmacological study with the D2-receptor antagonist haloperidol (2 mg). We previously reported increased dorsal striatal activation under haloperidol versus placebo in these participants (Clos et al., 2019a,b), compatible with a predominantly presynaptic effect of haloperidol that increases striatal dopaminergic signaling. Importantly, we extended previous pharmacological studies by applying a temporal discounting modeling framework based on a combination of discounting models with the drift diffusion model (DDM) (Pedersen et al., 2017; Fontanesi et al., 2019; Shahar et al., 2019; Peters and D'Esposito, 2020), allowing us to comprehensively examine drug effects on response time (RT) components related to both valuation and non–valuation-related processes.

## Materials and Methods

#### Participants

Fifty-four healthy participants were initially enrolled in the study. Participants were screened by a physician for current diseases and current intake of prescription drugs or drugs of abuse. All participants were in good health, had no history of neurologic or psychiatric disorder, and were not currently taking prescription medication. Twenty-seven participants were randomly assigned to each group (placebo/haloperidol). Two participants from the haloperidol group did not complete the temporal discounting task. Technical problems led to working memory data loss from 4 participants (3 from the haloperidol, 1 from the placebo group), but these participants were still included in the temporal discounting data analysis.

Following filtering of RTs (see below; the fastest and slowest 2.5% of trials were excluded per participant), we examined the individual RT histograms for each subject (see Extended Data Fig. 1-1). This revealed that, even after filtering, the 3 participants with the fastest minimum RTs (2 from the haloperidol group and 1 from the placebo group) still showed implausibly fast responses on a number of trials (minimum RTs of 2, 2, and 234 ms in Subjects 24, 25, and 41, respectively), such that their minimum RTs were substantially faster than those of the remaining participants (min(RT) *z* scores of −2.04, −2.04, and −1.7, respectively; see Extended Data Fig. 1-2). These subjects were therefore excluded from further modeling.

We verified that there were no significant differences in demographic background in terms of age or baseline working memory capacity (Table 1). Potential side effects of the medication were monitored via multiple blood pressure and pulse measurements and evaluated via mood questionnaires. These analyses did not reveal significant group differences in terms of reported mood, side effects, or physiological parameters, as reported in our previous study (Clos et al., 2019b). Before enrollment, participants provided informed written consent, and all study procedures were approved by the local institutional review board (Hamburg Board of Physicians).

#### Experimental design

##### General procedure

The study consisted of two testing sessions performed on separate days. On the first day (T0), participants completed a background screening and a set of working memory tasks (see below). On the second day (T1), participants received either placebo or haloperidol (2 mg). In line with the pharmacokinetics of haloperidol (Franken et al., 2017), testing on T1 was performed 5 h after drug administration to ensure appropriate plasma levels of haloperidol. During the first 2.5 h, participants were under constant observation, and pulse as well as blood pressure levels were checked 30 min and 2 h after drug administration. During the waiting period, participants filled out questionnaires on current mood and medication effects. Participants then completed a number of unrelated tasks during an fMRI scanning session (total scan time 2.5 h). Following scanning, they first completed the temporal discounting task outlined below, followed by a set of working memory tasks (digit span forward and backward, block span forward and backward, complex working memory span) (for detailed results, see Clos et al., 2019b).

##### Temporal discounting task

Participants performed 210 trials of a temporal discounting task where on each trial they made a choice between an SS reward available immediately and an LL reward. SS and LL rewards were randomly displayed on the left and right sides of the screen, and participants were free to make their choice at any time. For half the trials, the SS reward consisted of 20€; and for the remaining trials, the SS reward was fixed at 100€. These trials were presented randomly intermixed. LL options were computed via all combinations of a set of LL reward amounts (constructed by multiplying the SS reward with [1.01, 1.02, 1.05, 1.10, 1.20, 1.50, 1.80, 2.50, 2, 3, 4, 5, 7, 10, 13]) and LL delays (1, 2, 3, 5, 8, 30, 60 d), yielding 105 trials in total per magnitude condition. As is typically the case for temporal discounting tasks investigating magnitude effects (Green et al., 1997), all choices were hypothetical.
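The trial set described above can be enumerated directly, which makes the trial counts easy to check. The following is a minimal Python sketch, not the authors' task code; the function name `build_trials`, the dictionary keys, and the rounding of LL amounts to cents are illustrative assumptions.

```python
from itertools import product

# Multipliers and delays as listed in the task description.
LL_MULTIPLIERS = [1.01, 1.02, 1.05, 1.10, 1.20, 1.50, 1.80, 2.50,
                  2, 3, 4, 5, 7, 10, 13]
LL_DELAYS_DAYS = [1, 2, 3, 5, 8, 30, 60]
SS_AMOUNTS = [20, 100]  # euros; one SS amount per magnitude condition


def build_trials(ss_amount):
    """Return one trial per (LL amount, LL delay) combination.

    Rounding to cents is an assumption made for readability only.
    """
    return [
        {"ss_amount": ss_amount,
         "ll_amount": round(ss_amount * multiplier, 2),
         "ll_delay": delay}
        for multiplier, delay in product(LL_MULTIPLIERS, LL_DELAYS_DAYS)
    ]


# Both magnitude conditions together (presentation order was randomized
# in the experiment; no shuffling is done here).
trials = [trial for ss in SS_AMOUNTS for trial in build_trials(ss)]
```

With 15 multipliers and 7 delays, each magnitude condition contributes 105 trials, giving 210 trials overall.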

#### Computational modeling

##### Temporal discounting model

We applied a simple single-parameter hyperbolic discounting model to describe how value changes as a function of delay (Mazur, 1987; Green and Myerson, 2004) as follows:

$$SV(LL_t) = \frac{A_t}{1 + \exp(k + s_k \cdot I_t) \cdot D_t} \tag{1}$$

Here, *A*_{t} is the numerical reward amount of the LL option on trial *t*, *D*_{t} is the LL delay in days on trial *t*, and *I*_{t} is an indicator variable that takes on a value of 0 for trials from the large-magnitude condition (SS amount = 100€) and 1 for trials from the small-magnitude condition (SS amount = 20€). The model has two free parameters: *k* is the hyperbolic discounting rate from the large-magnitude condition (modeled in log-space) and *s*_{k} is a weighting parameter that models the degree of change in discounting for small versus large SS rewards (i.e., higher values in *s*_{k} reflect a greater magnitude effect) (Green et al., 1997).
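As an illustration of the discounting model, the sketch below computes the subjective value of a delayed reward. It assumes the standard hyperbolic form SV = A / (1 + K·D), with the discount rate K = exp(k + s_k·I) expressed in log-space as described in the text. The function and argument names are hypothetical, and the numeric inputs in the usage lines are arbitrary rather than estimates from the study.

```python
import math


def subjective_value(amount, delay_days, log_k, s_k=0.0, small_magnitude=0):
    """Hyperbolically discounted value of a delayed (LL) reward.

    log_k           -- discount rate in log-space (large-magnitude baseline)
    s_k             -- shift in log(k) for the small-magnitude condition
    small_magnitude -- indicator: 1 for SS = 20 EUR trials, 0 for SS = 100 EUR
    """
    k = math.exp(log_k + s_k * small_magnitude)
    return amount / (1.0 + k * delay_days)


# Arbitrary illustration: the same relative LL offer (1.5 x SS, 30 d delay)
# loses a larger share of its value when s_k > 0 and the trial belongs to
# the small-magnitude condition.
large = subjective_value(150.0, 30, log_k=-4.0, s_k=1.0, small_magnitude=0)
small = subjective_value(30.0, 30, log_k=-4.0, s_k=1.0, small_magnitude=1)
print(large / 150.0, small / 30.0)
```

Positive values of `s_k` therefore correspond to steeper discounting for small rewards, that is, a magnitude effect.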

#### Softmax action selection

Softmax action selection models choice probabilities as a sigmoid function of value differences (Sutton and Barto, 1998) as follows:

$$P(LL_t) = \frac{\exp\left((\beta + s_\beta \cdot I_t) \cdot SV(LL_t)\right)}{\exp\left((\beta + s_\beta \cdot I_t) \cdot SV(SS_t)\right) + \exp\left((\beta + s_\beta \cdot I_t) \cdot SV(LL_t)\right)} \tag{2}$$

Here, *SV* is the subjective value of the delayed reward according to Equation 1 and β is an inverse temperature parameter, modeling choice stochasticity (for β = 0, choices are random and as β increases, choices become more dependent on the option values). *SV*(*SS*_{t}) was fixed at 100 for the large-magnitude condition and fixed at 20 for the small-magnitude condition. *I*_{t} is again the dummy-coded condition regressor, and *s*_{β} models the magnitude effect on β.
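For two options, the softmax rule reduces to a logistic function of the value difference, which is convenient to compute. The sketch below assumes that reduction; the function name `p_choose_ll` and its arguments are illustrative, not taken from the model code.

```python
import math


def p_choose_ll(sv_ll, sv_ss, beta, s_beta=0.0, small_magnitude=0):
    """Probability of choosing the LL option under a two-option softmax.

    Uses the identity exp(b*LL) / (exp(b*SS) + exp(b*LL))
    = 1 / (1 + exp(-b * (LL - SS))), evaluated in a way that avoids
    overflow for large value differences.
    """
    b = beta + s_beta * small_magnitude
    x = b * (sv_ll - sv_ss)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

With β = 0 the function returns 0.5 for any pair of values (random choice), and with increasing β the choice probability follows the value difference more closely.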

#### Temporal discounting DDMs

To more comprehensively examine dopaminergic effects on choice dynamics, we additionally replaced Softmax action selection with a series of DDM-based choice rules. In the DDM, choices arise from a noisy evidence accumulation process that terminates as soon as the accumulated evidence exceeds one of two response boundaries. In the present setting, the upper boundary was defined as selection of the LL option, whereas the lower boundary was defined as selection of the SS option.
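To make the accumulation process concrete, the following sketch simulates single trials with a simple Euler scheme. It is only an illustration of the generative idea and not the estimation procedure used here (models were fit via the Wiener first passage time distribution). The function name, the step size `dt`, the unit noise SD, and the `max_time` cutoff are all assumptions.

```python
import random


def simulate_ddm_trial(drift, boundary, start_bias, nondecision, rng,
                       dt=0.001, noise_sd=1.0, max_time=10.0):
    """Simulate one diffusion trial; returns (choice, rt_in_seconds).

    Evidence starts at start_bias * boundary and evolves until it leaves
    the interval (0, boundary). Crossing the upper boundary is coded as an
    LL choice and crossing the lower boundary as an SS choice. If no
    boundary is reached within max_time, the choice is None.
    """
    evidence = start_bias * boundary
    elapsed = 0.0
    step_sd = noise_sd * dt ** 0.5  # SD of the noise added per time step
    while 0.0 < evidence < boundary and elapsed < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
    if evidence >= boundary:
        choice = "LL"
    elif evidence <= 0.0:
        choice = "SS"
    else:
        choice = None
    return choice, elapsed + nondecision


rng = random.Random(1)  # fixed seed so the illustration is reproducible
simulated = [simulate_ddm_trial(2.0, 2.0, 0.5, 0.8, rng) for _ in range(200)]
```

A positive drift produces mostly LL choices, and every simulated RT is at least as long as the nondecision time, because that component is added on top of the accumulation time.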

RTs for choices of the SS option were multiplied by −1 before model fitting. We furthermore used a percentile-based cutoff, such that, for each participant, the fastest and slowest 2.5% of trials were excluded from the analysis. We then first examined a null model (DDM_{0}) without any value modulation. Here, the RT on each trial *t* is distributed according to the Wiener First Passage Time (*wfpt*) as follows:

$$RT_t \sim wfpt(\alpha, \tau, z, \nu) \tag{3}$$

The parameter α models the boundary separation (i.e., the amount of evidence required before committing to a decision), τ models the nondecision time (i.e., components of the RT related to motor preparation and stimulus processing), *z* models the starting point of the evidence accumulation process (i.e., a bias toward one of the response boundaries, with *z* > 0.5 reflecting a bias toward the LL boundary, and *z* < 0.5 reflecting a bias toward the SS boundary), and ν models the rate of evidence accumulation. For each parameter *x*, we also include a parameter *s*_{x} that models the change in that parameter from the high-magnitude (SS = 100€) to the low-magnitude (SS = 20€) condition (coded via the dummy-coded condition regressor *I*_{t}).

As in previous work (Pedersen et al., 2017; Fontanesi et al., 2019; Peters and D'Esposito, 2020), we then set up temporal discounting diffusion models by making trialwise drift rates proportional to the difference in subjective values between options. First, we set up a linear modeling scheme (DDM_{lin}) (Pedersen et al., 2017) as follows:

$$\nu_t = v_{coeff} \cdot \left(SV(LL_t) - SV(SS_t)\right) \tag{4}$$

Here, the drift rate on trial *t* is calculated as the scaled value difference between the LL and SS rewards. As noted above, RTs for SS options were multiplied by −1 before model estimation, such that this formulation predicts SS choices whenever SV(SS) > SV(LL) (the trialwise drift rate is negative) and predicts longest RTs for trials with the highest decision conflict (i.e., in the case of SV(SS) = SV(LL) the trialwise drift rate is zero). We next examined a DDM with nonlinear trialwise drift rate scaling (DDM_{S}) that has recently been reported to account for the value dependency of RTs better than the DDM_{lin} (Fontanesi et al., 2019; Peters and D'Esposito, 2020). In this model, the scaled value difference from Equation 4 is additionally passed through a sigmoid function with asymptote *v*_{max} as follows:

$$S(m) = \frac{2 \cdot v_{max}}{1 + \exp(-m)} - v_{max} \tag{5}$$

$$\nu_t = S\left[v_{coeff} \cdot \left(SV(LL_t) - SV(SS_t)\right)\right] \tag{6}$$

All parameters, including *v*_{coeff} and *v*_{max}, were again allowed to vary according to the reward magnitude condition, such that we included *s*_{x} parameters for each parameter *x* that were multiplied with the dummy-coded condition predictor *I*_{t} (see above).
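The difference between the two drift rate schemes can be sketched in a few lines. This is a minimal illustration that assumes the sigmoid S(m) = 2·v_max / (1 + exp(−m)) − v_max, as in the cited prior work. Function names are hypothetical, and the condition shift parameters are omitted for brevity.

```python
import math


def drift_linear(sv_ll, sv_ss, v_coeff):
    """Linear scheme: drift is the scaled value difference."""
    return v_coeff * (sv_ll - sv_ss)


def drift_sigmoid(sv_ll, sv_ss, v_coeff, v_max):
    """Nonlinear scheme: scaled value difference passed through a sigmoid.

    2 * v_max / (1 + exp(-m)) - v_max equals v_max * tanh(m / 2); the tanh
    form is used because it cannot overflow for extreme value differences.
    """
    m = v_coeff * (sv_ll - sv_ss)
    return v_max * math.tanh(m / 2.0)
```

Under the linear scheme the drift rate grows without bound as value differences increase, whereas under the sigmoid scheme it saturates at ±`v_max`. Both schemes give a drift of zero when the two options have equal subjective value.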

#### Hierarchical linear regression

Here we used the median posterior log(k) parameter of each participant from the DDM_{S} model (see above) to compute the discounted values for all LL options. We then computed the trialwise decision conflict from the absolute difference between the subjective value of the LL reward and the corresponding SS reward. To ensure that the intercept in the regression model corresponds to the RT for the lowest decision conflict and to account for the strongly skewed distribution of value differences, we took the inverse of the absolute difference in SS and discounted LL values in each trial. To avoid numerical instabilities when taking the inverse of absolute differences < 1 (high conflict, e.g., SV(LL) = 20.10€, SS = 20€), absolute value differences below 1 were set to 1 before computing the inverse. We then ran a hierarchical linear regression model in JAGS with 1/RT (to account for the skewed RT distribution) as dependent variable and decision conflict (inverse of the absolute value difference) as a predictor.
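The conflict predictor described above amounts to a floored inverse of the absolute value difference. A minimal sketch, with a hypothetical function name:

```python
def decision_conflict(sv_ll, sv_ss):
    """Inverse absolute value difference, with differences below 1 set to 1.

    The result lies in (0, 1]: values near 0 indicate low conflict (large
    value difference), and 1 indicates maximal conflict.
    """
    abs_diff = abs(sv_ll - sv_ss)
    return 1.0 / max(abs_diff, 1.0)
```

For the example in the text (SV(LL) = 20.10€, SS = 20€), the absolute difference of 0.10 is raised to 1, so the predictor takes its maximum value of 1 instead of 10.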

#### Statistical analyses

##### Hierarchical Bayesian models

Models were fit to all trials from all participants using a hierarchical Bayesian modeling approach with separate group-level distributions for all parameters for the placebo and haloperidol groups. Model fitting was performed using Markov Chain Monte Carlo as implemented in the JAGS software package (Plummer, 2003) (version 4.3) using the Wiener module for JAGS that implements the Wiener First Passage Time (Wabersich and Vandekerckhove, 2014) (see Eq. 3) in combination with R (version 3.4) and the R2Jags package. For group-level means, we used uniform priors defined over numerically plausible parameter ranges (see Code and data availability). For all *s*_{x} parameters modeling condition effects on model parameters, we used Gaussian priors with means of 0 and SDs of 2. For group-level precisions, we used gamma-distributed priors (0.001, 0.001). We initially ran 2 chains with a burn-in period of 900,000 samples and thinning of 2. Chain convergence was then assessed via the Gelman-Rubin convergence diagnostic, and sampling was continued until the convergence criterion was met for all group-level and individual-subject parameters. This occurred after a maximum of 1.3 million samples. For most parameters, (Softmax: all parameters, DDM_{0}: all parameters, DDM_{lin}: 5 parameters, DDM_{S}: 9 parameters). Relative model comparison was performed via the deviance information criterion (DIC), where lower values reflect a superior fit of the model (Spiegelhalter et al., 2002). A total of 10,000 additional samples were then retained for further analysis. We show posterior group distributions for all parameters of interest as well as their 85% and 95% highest density intervals (HDIs). For group comparisons, we report Bayes factors (BFs) for directional effects (Kass and Raftery, 1995) for the hyperparameter difference distributions of placebo − haloperidol, estimated via kernel density estimation using R (version 4.01) via the RStudio (version 1.3) interface. These are computed as the ratio of the integral of the posterior difference distribution from 0 to ∞ versus the integral from −∞ to 0. Using common criteria (Beard et al., 2016), we considered BFs between 1 and 3 as anecdotal evidence, BFs >3 as moderate evidence, and BFs >10 as strong evidence. BFs >30 and >100 were considered as very strong and extreme evidence, respectively, whereas the inverses of these values reflect evidence in favor of the opposite hypothesis.
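The logic of the directional Bayes factor can be sketched as a ratio of posterior mass above versus below zero. Note that the analysis reported here estimated these integrals via kernel density estimation in R; the sketch below swaps in simple sample proportions, which is an assumption made to keep the example dependency-free. Function names are hypothetical.

```python
def directional_bayes_factor(diff_samples):
    """Ratio of posterior mass above 0 to posterior mass below 0.

    diff_samples -- posterior samples of a group difference
                    (e.g., placebo minus haloperidol).
    Approximates the two integrals by counting samples (not by KDE).
    """
    above = sum(1 for d in diff_samples if d > 0)
    below = sum(1 for d in diff_samples if d < 0)
    if below == 0:
        return float("inf")
    return above / below


def evidence_label(bf):
    """Map a directional BF onto the verbal categories used in the text.

    BFs below 1 are inverted first, since they express evidence for the
    opposite direction.
    """
    strength = max(bf, 1.0 / bf) if bf > 0 else float("inf")
    for threshold, label in ((100, "extreme"), (30, "very strong"),
                             (10, "strong"), (3, "moderate")):
        if strength > threshold:
            return label
    return "anecdotal"
```

For example, a directional BF of 0.11 corresponds to a BF of about 9 in favor of the opposite direction, which falls into the moderate category.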

##### Parameter recovery analyses

To ensure that the parameters underlying the data-generating process could be recovered using our modeling procedures, we performed parameter recovery simulations for the best-fitting model (DDM_{S}). During model estimation, we generated 10,000 datasets simulated from the posterior distribution of the DDM_{S}. Ten of these simulated datasets were randomly selected and refit with the DDM_{S} (see previous section) (Fontanesi et al., 2019; Peters and D'Esposito, 2020). Parameter recovery was then assessed in two ways. For group-level parameters, we examined whether the estimated 95% highest posterior density intervals contained the true generating parameters. For subject-level parameters, we examined scatter plots of generating versus estimated single-subject parameters, pooled across all 10 simulations.

##### Posterior predictive checks

To check whether the best-fitting model indeed captured key aspects of the data, in particular the value dependency for RTs, we performed posterior predictive checks (Peters and D'Esposito, 2020) as follows. For each individual participant, we binned trials into five bins, according to the absolute difference in LL versus SS value (“decision conflict,” computed according to each participant's median posterior log(k) parameter from the DDM_{S}, and separately for the high- and low-magnitude conditions). For each participant and condition, we then plotted the mean observed RTs as a function of decision conflict, as well as the mean RTs across 10,000 datasets simulated from the posterior distributions of the DDM_{0}, DDM_{lin} and DDM_{S}.
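The binning step can be sketched as follows. The text does not state how bin edges were chosen, so this example assumes equal-count bins over trials sorted by absolute value difference; the function name is hypothetical.

```python
def mean_rt_by_conflict_bin(value_diffs, rts, n_bins=5):
    """Mean RT per bin of absolute LL-minus-SS value difference.

    Trials are sorted by absolute value difference and split into n_bins
    bins of (nearly) equal size, so the first bin holds the trials with
    the smallest value differences, i.e., the highest decision conflict.
    """
    order = sorted(range(len(value_diffs)), key=lambda i: abs(value_diffs[i]))
    n = len(order)
    means = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if idx:
            means.append(sum(rts[i] for i in idx) / len(idx))
        else:
            means.append(float("nan"))  # fewer trials than bins
    return means
```

Applying the same function to observed and to simulated RTs yields the per-bin means that are compared in a posterior predictive check.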

#### Code and data availability

Model code is available on the Open Science Framework (https://osf.io/wm7ud/). Raw choice data are available from Zenodo.org (https://doi.org/10.5281/zenodo.4006531) for researchers meeting the criteria for access to confidential data.

## Results

### Subjective and physiological drug effects

As reported in detail in our previous papers (Clos et al., 2019a,b), there were no significant group differences with respect to reported side effects, subjective mood, heart rate, or blood pressure relative to baseline. Likewise, groups did not differ with respect to the actual and guessed drug condition (haloperidol vs placebo) (Clos et al., 2019b).

### Model-free analysis of temporal discounting

Figure 1*a* shows the overall RT distributions per group with choices of the LL option coded as positive RTs and choices of the SS option coded as negative RTs. As a model-free measure of temporal discounting, we examined proportions of LL choices as a function of group (placebo vs haloperidol) and condition (100€ vs 20€ reference reward). Raw proportions of LL choices are plotted in Figure 1*b*. ANOVA on arcsine-square-root transformed proportion values with the within-subject factor magnitude (high [100€] vs low [20€] SS reward) and the between-subject factor drug (placebo vs haloperidol) confirmed a significant magnitude effect (*F*_{(1,47)} = 96.86, *p* < 0.001) such that participants overall made more LL selections in the high-magnitude condition. Furthermore, effects of drug (*F*_{(1,47)} = 3.47, *p* = 0.068) and drug × magnitude (*F*_{(1,47)} = 3.31, *p* = 0.075) showed trend-level significance.

### Extended Data Figure 1-1

RT histograms for each individual participant (placebo group: blue, haloperidol group: orange) following exclusion of each participant's 2.5% fastest and 2.5% slowest trials. Subjects 24, 25, and 41 still had a number of implausibly fast trials even after filtering (see Extended Data Figure 1-2 and Materials and Methods) and were therefore excluded from modeling.

### Extended Data Figure 1-2

Histogram of minimum RTs across subjects following percentile-based trial filtering (i.e., following exclusion of each participant's 2.5% fastest and 2.5% slowest trials). Three participants (leftmost bars in the plot) still had implausibly fast minimum RTs (*z* scores < −1.65), corresponding to Subjects 24, 25, and 41 from Extended Data Figure 1-1. These participants were excluded from modeling.

### Softmax choice rule

First, we analyzed our data using a standard Softmax choice rule (Fig. 2). This analysis revealed an overall drug effect on log(k), such that discounting was substantially lower in the haloperidol group compared with the placebo group (Fig. 2*a*). Examination of BFs indicated that a decrease in log(k) in haloperidol versus placebo was ∼116 times more likely than an increase (Table 2).

### Model comparison

We next compared three versions of the DDM that varied in the way that they accounted for the influence of value differences on trialwise drift rates, based on the DIC (Spiegelhalter et al., 2002). In each model, we included separate group-level distributions for the two drug conditions (haloperidol vs placebo). Furthermore, for each parameter *x*, we included a shift parameter *s*_{x} modeling the change in parameter *x* from the high-magnitude condition (SS reward = 100€) to the low-magnitude condition (SS reward = 20€) (see Materials and Methods). These *s*_{x} parameters were modeled with Gaussian priors with means of zero (see Materials and Methods). A DDM_{0} assuming constant drift rates independent of value was also included and compared with two variants of the DDM using either linear (DDM_{lin}) (Pedersen et al., 2017) or nonlinear (sigmoid) drift rate scaling (Fontanesi et al., 2019; Peters and D'Esposito, 2020). In both drug conditions as well as overall (Table 3), the data were best accounted for by a DDM with nonlinear drift rate scaling (DDM_{S}).

We also compared the three diffusion models and the Softmax model with respect to the proportion of binary choices (LL vs SS selections) that they correctly accounted for. As can be seen from Table 4, the DDM_{S} performed numerically on par with the Softmax model, whereas the DDM_{lin} performed slightly worse.

### Overall group differences

We next examined overall group differences in model parameters for the baseline (SS reward =100€) condition. Results are plotted in Figure 3, and BFs for all group comparisons are listed in Table 5. In both groups, there was a positive association between trialwise drift rates and value differences, as the 95% HDI for the drift rate coefficient parameter did not include 0 in either group (Fig. 3*b*). Likewise, there was a slight bias toward the SS option in both groups, as the 95% HDI for bias was <0.5 in both cases (Fig. 3*e*).

We furthermore observed substantially lower group-level discount rates log(k) in the haloperidol group compared with placebo, such that the 95% HDI of the posterior group difference in log(k) was >0 (Fig. 3*a*; Table 5). Interestingly, the nondecision time was likewise substantially lower in the haloperidol group (Fig. 3*c*; Table 5), amounting, on average, to 180 ms faster nondecision times.

### Magnitude effects on model parameters

We next turned to the effects of the magnitude manipulation on diffusion model parameters, that is, the change in each parameter in the low-magnitude condition compared with the high-magnitude baseline condition. Results are plotted in Figure 4, and BFs for all directional group comparisons are listed in Table 5. There was a substantial magnitude effect on log(k), such that discounting was steeper in the low-magnitude condition (Fig. 4*a*). Interestingly, this pattern of results was not mirrored in the magnitude effect on the starting point/bias parameter. Instead, the bias was shifted in the direction of a neutral bias (0.5) in the low-magnitude condition (Fig. 4*e*) in both groups. An additional interesting observation is that the nondecision time was increased in the low-magnitude condition by on average ∼30 ms (Fig. 4*c*).

Both drift rate components (*v*_{coeff} and *v*_{max}) were increased in the 20€ condition (Fig. 4*b*,*f*). This overall effect might in part be attributable to the fact that, in the model, these two parameters effectively scale the trialwise value differences to the appropriate scale of the DDM (Pedersen et al., 2017). Because average value differences spanned a smaller absolute range in the 20€ condition, this is compensated in the model by increasing both *v*_{coeff} (Fig. 4*b*) and *v*_{max} (Fig. 4*f*). Notably, under haloperidol, the drift rate coefficient was somewhat increased, whereas the maximum drift rate was attenuated. There might be some trade-off between the drift rate components, which could contribute to such contrasting effects, such that increases in one component can be compensated by decreases in the other. There was also some evidence for a reduced magnitude effect on the maximum drift rate (Fig. 4*f*) in the haloperidol group. This could be a reflection of the fact that the magnitude effect on LL choice proportions was numerically attenuated under haloperidol (Fig. 1*b*), leading to overall more homogeneous values in the two conditions. Difference distributions in the remaining model parameters were centered at zero, indicating no systematic group differences.

### Correlation of model parameters

For descriptive purposes, we show the full correlation matrices for all single-subject median posterior parameters in Figure 5*a* for haloperidol and Figure 5*b* for placebo.

### Hierarchical linear regression

We also explored whether the qualitative pattern of results could be reproduced using a hierarchical linear regression, modeling trialwise inverse RTs as a function of value differences (see Materials and Methods). Full posterior distributions of all parameters are shown in Figure 6. This analysis reproduced effects observed for the full DDM. For example, the slope was overall negative, reflecting the decrease in 1/RT for increasing conflict (Fig. 6*a*). The intercept was numerically smaller under haloperidol (dBF = 0.11; see Table 6), mirroring the drug effect on the nondecision time in the DDM_{S}. However, a direct comparison with DDM parameters is complicated by the fact that the intercept in the regression model also captures RT components that in the DDM are reflected in the boundary separation, as well as potentially additional nonlinear aspects of the evidence accumulation process that cannot be accounted for by the slope. These effects are visualized in Figure 6*e*, where we plot the 1/RT predicted by this regression model as a function of group, condition, and decision conflict. This illustrates again the slope effect in the baseline condition and the attenuated intercept under haloperidol.

### Associations with working memory span

Exploratory analyses did not reveal associations between model parameters of interest (log(k), nondecision time, drift rate scaling) and working memory score (all |*r*| < 0.38).

### Posterior predictive checks

We next performed extensive posterior predictive checks to ensure that the best-fitting model (DDM_{S}) could account for RTs of individual participants in both groups. To this end, we binned the trials of each individual participant into five bins, according to the absolute difference in LL versus SS value (computed according to each participant's median posterior log(k) parameter from the DDM_{S}). For each bin, participant, and condition, we then plotted the mean observed RT, as well as the mean simulated RT across 10,000 datasets simulated from the posterior distributions of the DDM_{0}, DDM_{lin}, and DDM_{S}. These results are shown in Figure 7 for the placebo group and Figure 8 for the haloperidol group. As can be seen, the DDM_{S} provided a much better account of how RTs vary as a function of decision conflict than the DDM_{lin} in the vast majority of participants in both groups. This was mainly because the DDM_{lin} overestimated RTs with medium decision conflict and underestimated RTs in cases of very low decision conflict (Peters and D'Esposito, 2020).

Some additional nontrivial patterns in the data deserve mention. For example, while the DDM_{S} in most cases predicted longest RTs for choices with the highest decision conflict, this was not always the case (see, e.g., the low-magnitude condition of Participant 34 from the placebo group in Fig. 7). In this case, in the low-magnitude condition, the participant exhibited a relatively small boundary separation (1.84) and drift rate coefficient (0.24), in combination with a bias toward the SS boundary (0.43) and a high discount rate log(k) (−0.7). In such a constellation, the bias toward the SS boundary can only be overcome when value evidence is accumulated for a relatively long time (because *v*_{coeff} is relatively small), giving rise to long RTs for LL choices (which in this case only occurred in the case of low decision conflict).

### Parameter recovery

As a final model check, we ran a series of parameter recovery simulations. Here, we randomly selected 10 datasets simulated from the posterior distribution of the DDM_{S} (see Materials and Methods), and refit these synthetic data with the DDM_{S}. Results are shown in Figure 9 for the baseline (high magnitude 100€) parameters, and Figure 10 for the parameters modeling condition effects. These plots show that, for both baseline and condition effects, group-level parameters (Figs. 9, 10, bottom rows) recovered well, such that the true generating parameters were generally contained in the estimated 95% HDIs.

### Extended Data Figure 9-1

Nonparametric Spearman correlation coefficients (generating versus fitted) for all subjects.

Parameter recovery for individual-subject parameters was excellent for all baseline (100€ magnitude) parameters (Fig. 9, top row) such that the correlation between generating and estimated individual-subject parameters was >0.9 for all parameters. For the parameters modeling condition effects (magnitude effects, Fig. 10, top row), these correlations were lower for some parameters, in particular for condition effects on boundary separation and log(k). The likely reason is that the synthetic data were simulated from the actual posterior distribution, and there was overall little between-subject variance in some of these parameters in our data (see, e.g., Fig. 10*a*,*f*).
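The two recovery criteria used here (generating value inside the 95% HDI for group-level parameters; correlation between generating and fitted values for individual-subject parameters) can be sketched as below. This is an illustrative, dependency-free implementation with hypothetical function names and toy numbers, not the code used for Figures 9 and 10.

```python
import math
from statistics import mean


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior samples."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    widths = [(s[i + k - 1] - s[i], i) for i in range(n - k + 1)]
    _, start = min(widths)
    return s[start], s[start + k - 1]


def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            out[order[m]] = avg_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks.

    Undefined (division by zero) if either input has no variance.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def recovered(true_value, posterior_samples, mass=0.95):
    """True if the generating value lies inside the HDI of the refit posterior."""
    lower, upper = hdi(posterior_samples, mass)
    return lower <= true_value <= upper


# Toy individual-subject log(k) values (made up for illustration).
generating = [-4.1, -3.2, -5.0, -2.7, -3.9]
fitted = [-4.0, -3.4, -4.8, -2.9, -3.7]
rho = spearman(generating, fitted)
```

Because a rank correlation depends on the spread of the generating values, it drops when there is little between-subject variance, even if absolute recovery errors are small. This is consistent with the explanation given above for the lower correlations of some condition-effect parameters.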

## Discussion

We investigated the effects of a single dose of the D2-receptor antagonist haloperidol (2 mg) on temporal discounting in a between-subjects study in a double-blind placebo-controlled setting. A diffusion model-based analysis revealed substantially smaller log(k) parameters and a substantial reduction in nondecision times under haloperidol versus placebo.

We applied a recent class of value-based decision models based on the DDM (Pedersen et al., 2017; Fontanesi et al., 2019; Shahar et al., 2019; Peters and D'Esposito, 2020). Comprehensive RT-based analysis was not possible in previous studies because of the specifics of task timing (Pine et al., 2010) or low trial numbers (Weber et al., 2016; Petzold et al., 2019). Model comparison confirmed previous results (Fontanesi et al., 2019; Peters and D'Esposito, 2020), such that the data were better accounted for by a model assuming a nonlinear trialwise scaling of the drift rate, and this was confirmed via posterior predictive checks of single-subject data. Extensive parameter recovery analyses confirmed that group-level parameters recovered well (Fontanesi et al., 2019; Peters and D'Esposito, 2020). Recovery of individual-subject baseline parameters (100€ magnitude condition) was excellent, whereas recovery of parameters modeling condition effects was somewhat lower. This is likely because some parameters (e.g., boundary separation shift) showed low between-subject variance. Modeling was further validated by the observation that drug effects were fully reproduced using a Softmax choice rule (Sutton and Barto, 1998) and by the finding that the magnitude effect (Green et al., 1997; Ballard et al., 2017; Mellis et al., 2017) was likewise replicated using the DDM-based approach. The qualitative pattern of RT effects was reproduced using a hierarchical linear regression model of trialwise inverse RTs as a function of decision conflict.
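For readers less familiar with the Softmax choice rule mentioned above, the following sketch shows its two-option form combined with hyperbolic discounting. The discounting form, the inverse temperature `beta`, and all numerical values are assumptions chosen for illustration; they are not estimates from this study.

```python
import math


def subjective_value(amount, delay, log_k):
    """Hyperbolic discounting (assumed form): SV = A / (1 + k * D), k = exp(log_k)."""
    return amount / (1.0 + math.exp(log_k) * delay)


def p_choose_ll(sv_ll, sv_ss, beta):
    """Two-option softmax, written as a logistic function of the value difference."""
    return 1.0 / (1.0 + math.exp(-beta * (sv_ll - sv_ss)))


# Hypothetical trial: 20 now versus 40 after 30 days, for two discount rates.
sv_ss = 20.0
p_steep = p_choose_ll(subjective_value(40.0, 30.0, log_k=-2.0), sv_ss, beta=0.3)
p_shallow = p_choose_ll(subjective_value(40.0, 30.0, log_k=-4.0), sv_ss, beta=0.3)
```

A lower log(k) leaves more of the delayed reward's value intact, so `p_shallow` exceeds `p_steep`. This is the direction of the drug effect on choices that both the Softmax and the DDM-based analyses are described as capturing.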

The human literature on DA and impulsivity is heterogeneous (D'Amour-Horvat and Leyton, 2014), and interpretation of these findings is complicated by several factors. First, effects of dopaminergic drugs might depend on baseline DA availability (Cools and D'Esposito, 2011), such that the same drug might impair or enhance performance in different participants, according to an inverted U-shaped function (or a different process-dependent function) (Floresco, 2013). Second, the action of D2-receptor antagonists is often interpreted in terms of a reduction in DA neurotransmission (Pessiglione et al., 2006; Pine et al., 2010). However, such drugs might in fact enhance DA release by predominantly binding at presynaptic DA auto-receptors, at least at lower dosages (Frank and O'Reilly, 2006), as shown in animal (Pehek, 1999; Schwarz et al., 2004) and human studies (Chen et al., 2005).

Interpretation of D2-receptor antagonist effects as a presynaptically mediated elevation of DA release might reconcile a number of conflicting results. First, our finding of reduced temporal discounting under haloperidol is in line with two recent studies that reported reduced temporal discounting following administration of D2/D3-receptor antagonists (Arrondo et al., 2015; Weber et al., 2016). On the other hand, a reduction of temporal discounting following administration of haloperidol was not observed in an earlier within-subjects study in *n* = 13 participants (Pine et al., 2010) that used a slightly lower dosage of 1.5 mg (we used 2 mg). Lower dosages of D2/D3-receptor antagonists might increase (rather than decrease) DA signaling (Frank and O'Reilly, 2006), an effect mediated by inhibitory feedback through presynaptic D2 auto-receptors (Grace, 1991), which may lead to an enhancement of phasic (vs. tonic) DA signaling (Frank and O'Reilly, 2006), a point that we return to below. However, we do acknowledge that such an interpretation is not general consensus in the cognitive literature on DA drug effects (Pessiglione et al., 2006; Pine et al., 2010).

Our results advance previous findings regarding the role of D2/D3-receptor antagonists in temporal discounting in several ways. First, participants performed an unrelated memory task during fMRI directly before completing the temporal discounting task. Those data revealed an overall main effect of drug condition on trial onset-related activity in caudate nucleus (Clos et al., 2019a,b) (i.e., caudate activity was increased under haloperidol). Although this neural read-out was obtained before the discounting task, both the fMRI and temporal discounting time points were well within the time of maximum haloperidol plasma levels (Franken et al., 2017). This observation is arguably more compatible with the idea that the dosage of haloperidol applied here increased (rather than decreased) striatal DA signaling. Similar neural evidence was lacking in most previous human pharmacological studies on DA effects on discounting (de Wit et al., 2002; Hamidovic et al., 2008; Arrondo et al., 2015; Weber et al., 2016). Second, the DDM-based modeling approach adopted in the present study allowed us to examine the dynamics underlying decision-making much more comprehensively than previous human pharmacological studies (de Wit et al., 2002; Hamidovic et al., 2008; Pine et al., 2010; Arrondo et al., 2015; Weber et al., 2016; Petzold et al., 2019). In addition to the drug effect on the discount rate log(k), diffusion modeling revealed substantially shorter nondecision times in the haloperidol group that amounted to ≈180 ms on average. Such a robust enhancement of lower-level motor and/or perceptual RT components is also more compatible with an increase, rather than a decrease, in DA transmission (Weed and Gold, 1998) and resonates with previous findings regarding a dopaminergic enhancement of RT-based response vigor (Guitart-Masip et al., 2011; Beierholm et al., 2013).
An exploratory inspection of parameter correlations revealed that log(k) and nondecision time were positively correlated in both groups, suggesting that they might capture similar aspects of the data and/or might both be modulated by changes in phasic dopaminergic responses. In support of this interpretation, augmentation of DA levels in Parkinson's disease patients reduces temporal discounting (Foerde et al., 2016) and improves model-based RL (Sharp et al., 2016). Finally, this interpretation of available human D2-receptor antagonist effects would also reconcile the human and animal literature on acute dopaminergic effects on impulsivity (D'Amour-Horvat and Leyton, 2014). Together, these considerations lead us to suggest that haloperidol increased (rather than decreased) striatal DA neurotransmission, resulting in enhanced cognitive control (reduced discounting) and a substantial facilitation of motor responding (shorter nondecision times).

By what mechanism might haloperidol attenuate the impact of delay on reward valuation? According to models of basal ganglia contributions to action selection (Maia and Frank, 2011), the probability for selecting a given candidate action depends on the relative difference in activation between the direct (*go*) and the indirect (*nogo*) pathways. A similar striatal gating mechanism might underlie working memory and/or prefrontal control functions (Cools, 2011). By increasing phasic DA responses, haloperidol might increase the signal-to-noise ratio in striatal value representations, thereby increasing the likelihood that objectively smaller and/or more delayed LL rewards gain access to processing in the PFC. Naturally, other modes of action are likewise conceivable. Frontal and striatal regions are interconnected via a series of loops that follow a dorsal-to-ventral organization (Haber and Knutson, 2010), and haloperidol might impact functional interactions within these circuits (Cools, 2011), for example, related to top-down control of value representations (Hare et al., 2009, 2014; Figner et al., 2010; Peters and D'Esposito, 2016). Finally, haloperidol might have directly augmented control processes in specific PFC regions (Figner et al., 2010). However, because of the much greater expression of D2 receptors in striatum compared with PFC (Seamans and Yang, 2004), it is generally assumed that prefrontal action of D2 antagonists requires substantially higher dosages than those applied in the studies examined here (Seamans and Yang, 2004; Frank and O'Reilly, 2006).

The present study has a number of limitations that need to be acknowledged. First, we did not run a within-subjects design, which would have allowed us to account for individual-participant baseline parameters in the analysis of the drug effects. Second, this also precluded us from comprehensively analyzing potential modulatory influences of, for example, individual differences in working memory on the drug effects, which might modulate DA effects on discounting (Petzold et al., 2019) and cognitive control more generally (Cools and D'Esposito, 2011). Third, the proportion of female participants was relatively large. Given the known association of ovarian hormones with the DA system (Yoest et al., 2018), future studies would benefit from testing larger sample sizes that allow for the examination of gender effects and/or from directly controlling menstrual cycle phase. Fourth, rewards were hypothetical because of the inclusion of the high-magnitude condition. However, preferences for real and hypothetical outcomes in temporal discounting tasks show a very good correspondence (Johnson and Bickel, 2002) and rely on similar neural circuits (Bickel et al., 2009). Also, neural haloperidol effects vary across brain regions and functions (Wächtler et al., 2020), complicating interpretation as no task-related imaging data were obtained here.

In conclusion, our data show that the D2-receptor antagonist haloperidol attenuated temporal discounting and substantially shortened nondecision times, as revealed by comprehensive computational modeling of choices and RTs using hierarchical Bayesian parameter estimation. These data are best accounted for by a model in which low dosages of haloperidol lead to an enhancement of phasic DA responses because of reduced feedback inhibition from D2 auto-receptors, leading to an augmentation of both lower-level (nondecision time) and higher-level (temporal discounting) decision components.

## Footnotes

This work was supported by Deutsche Forschungsgemeinschaft PE1627/5-1 to J.P. and SO952/3-1 to T.S.

The authors declare no competing financial interests.

- Correspondence should be addressed to Ben Wagner at ben.jonathan.wagner@uni-koeln.de or Jan Peters at jan.peters@uni-koeln.de