Articles, Behavioral/Systems/Cognitive

Dopaminergic Drugs Modulate Learning Rates and Perseveration in Parkinson's Patients in a Dynamic Foraging Task

Robb B. Rutledge, Stephanie C. Lazzaro, Brian Lau, Catherine E. Myers, Mark A. Gluck and Paul W. Glimcher
Journal of Neuroscience 2 December 2009, 29 (48) 15104-15114; DOI: https://doi.org/10.1523/JNEUROSCI.3524-09.2009

Abstract

Making appropriate choices often requires the ability to learn the value of available options from experience. Parkinson's disease is characterized by a loss of dopamine neurons in the substantia nigra, neurons hypothesized to play a role in reinforcement learning. Although previous studies have shown that Parkinson's patients are impaired in tasks involving learning from feedback, they have not directly tested the widely held hypothesis that dopamine neuron activity specifically encodes the reward prediction error signal used in reinforcement learning models. To test a key prediction of this hypothesis, we fit choice behavior from a dynamic foraging task with reinforcement learning models and show that treatment with dopaminergic drugs alters choice behavior in a manner consistent with the theory. More specifically, we found that dopaminergic drugs selectively modulate learning from positive outcomes. We observed no effect of dopaminergic drugs on learning from negative outcomes. We also found a novel dopamine-dependent effect on decision making that is not accounted for by reinforcement learning models: perseveration in choice, independent of reward history, increases with Parkinson's disease and decreases with dopamine therapy.

Introduction

Midbrain dopamine neurons are thought to play a critical role in reinforcement learning. Electrophysiological recordings from the ventral tegmental area and substantia nigra in animals suggest that dopamine neurons encode the reward prediction error (RPE) signal hypothesized to guide action value learning in contemporary reinforcement learning (RL) models (Schultz et al., 1997; Hollerman and Schultz, 1998; Nakahara et al., 2004; Bayer and Glimcher, 2005). According to standard RL models, action values are updated on a trial-by-trial basis using a RPE term, the difference between the experienced and predicted reward (Rescorla and Wagner, 1972; Sutton and Barto, 1998). The phasic activity of midbrain dopamine neurons is widely hypothesized to carry this error term, possibly after multiplication by a learning rate term (for review of the dopaminergic RPE hypothesis, see Niv and Montague, 2009).

Parkinson's disease is characterized by a dramatic loss of dopamine neurons in the substantia nigra (Dauer and Przedborski, 2003). Several studies have found general learning deficits accompanying this loss in Parkinson's patients (Knowlton et al., 1996; Swainson et al., 2000; Czernecki et al., 2002; Shohamy et al., 2004, 2009), and dopaminergic medication has been found to affect performance in many tasks (Cools et al., 2001, 2006; Frank et al., 2004, 2007b; Shohamy et al., 2006; Bódi et al., 2009). However, no study has ever fit standard RL models to choice behavior in Parkinson's patients on and off dopaminergic medications to quantitatively test predictions the dopaminergic RPE hypothesis makes about learning rates.

Parkinson's disease is typically treated with levodopa (l-DOPA), the biosynthetic precursor to dopamine (Hornykiewicz, 1974), which is thought to increase phasic dopamine release (Keller et al., 1988; Wightman et al., 1988; Harden and Grace, 1995). If phasic dopamine activity encodes a RPE signal of some kind, then l-DOPA should modulate RPE magnitude. According to theory, this effect should manifest itself as a change in the learning rate estimated in standard RL models. This prediction stems from the fact that, in standard models, values placed on actions are updated by the product of the RPE and the learning rate. Thus, if dopamine carries either this product or simply the RPE signal, changes in learning rates will capture the effects of dopaminergic manipulations.

To test this prediction, we studied decision making in Parkinson's patients using a dynamic foraging task in which subjects could be expected to learn the value of actions using a reinforcement learning mechanism. We also tested whether dopaminergic drugs including l-DOPA (combined in some patients with dopamine receptor agonists) differentially affect learning from positive and negative outcomes as some previous studies suggest (Daw et al., 2002; Frank et al., 2004). We tested Parkinson's patients both on and off dopaminergic medication, in addition to testing both healthy young and elderly control subjects, and fit choice behavior with standard RL models to test quantitative predictions of the dopaminergic RPE hypothesis.

Materials and Methods

Subjects.

Seventy-eight paid volunteers participated in the experiment: 26 patients with Parkinson's disease (12 females; mean age, 65.7 years), 26 matched healthy elderly control subjects (12 females; mean age, 67.3 years), and 26 healthy young subjects (14 females; mean age, 22.8 years). Parkinson's patients were diagnosed with idiopathic Parkinson's disease and recruited by neurologists. Patients were at the mild to moderate stages of the disease, with scores on the Hoehn–Yahr scale of motor function (Hoehn and Yahr, 1967) of 2 or 2.5. We used the motor exam (section III) of the Unified Parkinson's Disease Rating Scale (UPDRS) (Lang and Fahn, 1989) to quantify symptom severity at the time of testing. All patients participated in two sessions, one on and one off dopaminergic medication, usually (94%; 49 of 52) in the morning. For the “on” session, patients were tested an average of 1.6 h after a dose of dopaminergic medication. For the “off” session, patients refrained from taking all dopaminergic medications for a minimum of 10 h (mean, 14.4 h). Session order was randomized across patients (11 completed the off session first). All patients were receiving treatment with l-DOPA, the precursor for dopamine, and the majority were also taking a D2 receptor agonist (n = 17). Some patients were also taking serotonergic (n = 9) or cholinergic (n = 4) medications (for medication information, see supplemental data, available at www.jneurosci.org as supplemental material). Care was taken to minimize potential serotonergic and cholinergic medication effects by having subjects take those drugs at the same time before both on and off sessions. However, it is possible that the chronically administered medications taken by our patients, and not by any elderly subjects, might affect serotonergic, cholinergic, adrenergic, or noradrenergic transmission and potentially influence choice behavior. This limitation is common to studies of decision making in Parkinson's patients.

Parkinson's patients and elderly subjects were screened for the presence of any neurological disorder in addition to Parkinson's disease and any history of psychiatric illness including depression. The Parkinson's and elderly subject groups did not differ significantly in age, years of education, verbal intelligence quotient, or several other neuropsychological measures (for details, see supplemental Table 1, available at www.jneurosci.org as supplemental material). All Parkinson's patients and elderly subjects gave informed consent in accordance with the procedures of the Rutgers Institutional Review Board for the Protection of Human Subjects of Rutgers University (Newark, NJ). All healthy young participants gave informed consent in accordance with the procedures of the University Committee on Activities Involving Human Subjects of New York University.

Behavioral task training.

After reading task instructions, subjects answered five multiple-choice questions to ensure that they had a basic understanding of the task. They then completed 200 trials of training in five separate blocks of 40 trials in which they were informed that the reward probabilities were fixed within a block and would not change. On each trial, subjects could choose either the red or the green option, animated crab traps attached to red and green buoys (Fig. 1A). When a reward had been scheduled for their chosen option, the chosen trap was raised from the ocean to reveal a crab inside. Otherwise, the chosen trap was revealed to be empty. The probability ratio specifying the relative values of the two traps in any block was either 6:1 or 1:6, with actual probabilities summing to 0.3 within each block. Subjects were asked to verbally identify the rich (higher reward probability) option after each training block. They were given feedback as to whether or not they were correct. Subjects were not paid according to performance in the training blocks, and all subjects earned $5 for completing the instructions and training.

Figure 1.

Experimental task design. A, Subjects selected one of two crab traps marked by red and green buoys and earned 10 cents for each crab caught. Example unrewarded and rewarded trials are shown. Earnings for the past 40 trials were displayed on the screen. B, Example block sequence. Relative reward rates (6:1, 3:1, 1:3, 1:6) changed in blocks of 70–90 trials separated by unsignaled transitions. The identity of the rich (higher reward probability) option alternated between consecutive blocks. Subjects completed 800 trials in 10 blocks.

Behavioral task experiment.

Subjects then completed 800 trials in the dynamic task environment as they tried to maximize earnings (Fig. 1B). The precise mathematical structure of the task replicated the critical features of the concurrent variable-interval tasks used by Herrnstein (1961) to formulate the matching law, which describes how animals make choices among options that differ in expected value. Monkeys performing this type of task allocate their choices according to reward probabilities and dynamically track changing probabilities (Platt and Glimcher, 1999; Sugrue et al., 2004; Corrado et al., 2005; Lau and Glimcher, 2005). Measuring choice under these conditions thus reveals the subject's expectations about the relative value of possible actions. Subjects completed 10 blocks of 70–90 trials with four possible reward probability ratios (6:1, 3:1, 1:3, 1:6). Blocks were separated by unsignaled transitions in which the identity of the higher reward probability option reversed, but with the reward probability otherwise unpredictable. Once a reward was scheduled for a trap, it remained available until the associated trap was chosen. This meant that the longer a trap remained unchosen, the greater the probability that a reward would be earned by choosing it. We used this reward schedule because monkeys performing a similar task have been shown to make choices consistent with reinforcement learning (Lau and Glimcher, 2005). The display indicated the number of crabs caught over the past 40 trials (or since the start of the experiment for the first 40 trials). Subjects were paid according to performance, earning 10 cents for each crab caught. At the end of the session, the total catch was revealed and subjects were paid accordingly. Subjects had unlimited time to make each choice but typically completed 800 trials in <30 min. Finally, subjects answered six multiple-choice questions about the experiment (for preexperiment and postexperiment questionnaires, see supplemental data, available at www.jneurosci.org as supplemental material). We computed multiple measures of task performance and fit choice data with multiple models. We used both paired and unpaired two-tailed t tests to compare behavioral measures. We used Wald tests to compare model parameter estimates across subject groups.
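
To make the baited reward schedule concrete, the following Python sketch simulates one block under our stated assumptions: rewards are scheduled independently for each option on every trial and, once scheduled, remain available until that option is next chosen. The function names, the choice policy, and the exact scheduling probabilities shown here are illustrative, not the task code used in the experiment.

```python
import random

def simulate_baited_block(n_trials, p_red, p_green, choose):
    """Simulate one block of a baited concurrent variable-interval schedule.

    Once a reward is scheduled on an option it stays available until that
    option is next chosen, so the longer an option goes unchosen, the more
    likely it is to pay off when finally selected. `choose` is any policy
    mapping a trial index to 'red' or 'green'.
    """
    baited = {'red': False, 'green': False}
    outcomes = []
    for t in range(n_trials):
        # Schedule new rewards independently for each option.
        for color, p in (('red', p_red), ('green', p_green)):
            if not baited[color] and random.random() < p:
                baited[color] = True
        choice = choose(t)
        rewarded = baited[choice]
        baited[choice] = False          # harvesting consumes the scheduled reward
        outcomes.append((choice, rewarded))
    return outcomes

# Illustrative 6:1 block with probabilities summing to ~0.3 (as in training),
# sampled by a random choice policy.
block = simulate_baited_block(80, p_red=6 / 7 * 0.3, p_green=1 / 7 * 0.3,
                              choose=lambda t: random.choice(['red', 'green']))
```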

Matching law analysis.

We fit steady-state choice behavior using the logarithmic form of the generalized matching law (Baum, 1974):

$$\log\left(\frac{C_R}{C_G}\right) = a\,\log\left(\frac{R_R}{R_G}\right) + b \quad (1)$$

Here, $C_R$ and $C_G$ are the number of choices to the red and green options, respectively, and $R_R$ and $R_G$ are the number of rewards received from choices to the red and green options, respectively. In each block, we allowed 20 trials for choice behavior to stabilize and then included 50 trials in this analysis, fitting a line by least-squares regression. By the generalized matching law (Baum, 1974), the slope of this line ($a$) is the reward sensitivity, a measure of the sensitivity of choice allocation to reward frequency.
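
A minimal Python sketch of this fit, assuming per-block choice and reward counts as inputs, is given below; blocks in which one option earned no rewards must be excluded beforehand because the log ratio is then undefined (as noted for Fig. 4). The function name is illustrative.

```python
import numpy as np

def reward_sensitivity(red_choices, green_choices, red_rewards, green_rewards):
    """Least-squares fit of the logarithmic generalized matching law (Eq. 1).

    Each argument is an array of per-block counts from the 50-trial
    steady-state window. Blocks where either option earned no rewards
    should be excluded beforehand, since the log ratios are undefined.
    Returns the slope a (reward sensitivity) and intercept b.
    """
    x = np.log(np.asarray(red_rewards, float) / np.asarray(green_rewards, float))
    y = np.log(np.asarray(red_choices, float) / np.asarray(green_choices, float))
    a, b = np.polyfit(x, y, 1)          # polyfit returns [slope, intercept]
    return a, b
```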

Single α reinforcement learning model.

We fit choice data from all subject groups with a standard RL model (Sutton and Barto, 1998). The model uses the sequence of choices and outcomes to estimate the expected value of each option for every trial. The expected values are set to zero at the beginning of the experiment, and after each trial, the value of the chosen option [for example, $V_R(t)$ for the red option at trial $t$] was updated according to the following rule:

$$V_R(t+1) = V_R(t) + \alpha\,\delta(t) \quad (2)$$

$$\delta(t) = R_R(t) - V_R(t) \quad (3)$$

Here, $\delta(t)$ is the RPE, the difference between the experienced and expected reward. $R_R(t)$ represents the outcome received from the red option on trial $t$, with a value of 1 for a reward and 0 otherwise. The learning rate $\alpha$ determines how rapidly the estimate of expected value is updated. If the learning rate is high, recent outcomes have a relatively greater influence on the expected value than less recent outcomes. Given expected values for both options, the probability of choosing the red option $P_R(t)$ is computed using the following softmax rule:

$$P_R(t) = \frac{1}{1 + \exp\{-\beta[V_R(t) - V_G(t)] - c[C_R(t-1) - C_G(t-1)]\}} \quad (4)$$

Here, $\beta$ is a noise parameter and $C_R(t-1)$ and $C_G(t-1)$ represent the choice of the red or green option on the previous trial $t-1$, with a value of 1 for the chosen option and 0 otherwise [$C_R(t) = 1 - C_G(t)$]. The choice perseveration parameter $c$ captures tendencies to perseverate or alternate (when positive or negative, respectively) that are independent of reward history (Lau and Glimcher, 2005; Schönberg et al., 2007). This parameter is similar to $b_1$ in the linear regression model below. The constants $\alpha$ (learning rate), $\beta$ (noise parameter), and $c$ (choice perseveration parameter) were estimated by maximum likelihood (Burnham and Anderson, 2002).
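
For illustration, a minimal Python sketch of Equations 2–4 for a two-option task follows. The function name and the 0/1 coding of choices and outcomes are ours; this is a sketch of the model, not the analysis code used for the reported fits.

```python
import numpy as np

def rl_choice_probabilities(choices, rewards, alpha, beta, c):
    """Trial-by-trial probability of choosing red under the single-alpha RL model.

    choices: sequence of 0/1 (1 = red chosen, 0 = green chosen).
    rewards: sequence of 0/1 outcomes for the chosen option.
    Values start at zero and only the chosen option is updated (Eqs. 2-3);
    choice probability follows the softmax rule with perseveration (Eq. 4).
    """
    v = np.zeros(2)                       # v[1] = red value, v[0] = green value
    p_red = np.zeros(len(choices))
    prev_choice = None
    for t, (ch, r) in enumerate(zip(choices, rewards)):
        persev = 0.0 if prev_choice is None else c * (1.0 if prev_choice == 1 else -1.0)
        p_red[t] = 1.0 / (1.0 + np.exp(-(beta * (v[1] - v[0]) + persev)))
        delta = r - v[ch]                 # reward prediction error
        v[ch] = v[ch] + alpha * delta     # update the chosen option only
        prev_choice = ch
    return p_red
```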

Dual α reinforcement learning model.

We also fit a second learning model closely related to the standard RL model, which has been proposed previously (Daw et al., 2002; Frank et al., 2007a) and for which there is growing evidence from Parkinson's studies (Frank et al., 2004, 2007b; Cools et al., 2006; Bódi et al., 2009). This model is identical with the one specified above except that it uses separate learning rates for positive and negative outcomes. In this model, the value of the chosen option (for example, the red option) was updated according to the following rule:

$$V_R(t+1) = \begin{cases} V_R(t) + \alpha_{\text{positive}}\,\delta(t) & \text{if } \delta(t) \geq 0 \\ V_R(t) + \alpha_{\text{negative}}\,\delta(t) & \text{if } \delta(t) < 0 \end{cases} \quad (5)$$
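
The asymmetric update can be written as a small helper. This is a sketch under the assumption that the split follows the sign of the prediction error, which for binary outcomes and value estimates between 0 and 1 coincides with rewarded versus unrewarded trials.

```python
def dual_alpha_update(v_chosen, reward, alpha_positive, alpha_negative):
    """Value update with separate learning rates (Eq. 5).

    reward is 1 or 0; with value estimates between 0 and 1, the prediction
    error is positive on rewarded trials and negative on unrewarded trials.
    """
    delta = reward - v_chosen
    alpha = alpha_positive if delta >= 0 else alpha_negative
    return v_chosen + alpha * delta
```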

Parameter estimation for reinforcement learning models.

For single and dual α RL models, we fit choice data across Parkinson's and elderly subject groups (excluding young subjects) with a single shared noise parameter and separate learning rate and choice perseveration parameters for each subject group. We estimated all parameters simultaneously by maximum likelihood (Burnham and Anderson, 2002). To determine whether learning rates and choice perseveration parameters are affected by age, we also fit the single α RL model separately for each subject in the Parkinson's and elderly subject groups with a single shared noise parameter across all subjects. We used a shared noise parameter because the learning rate and noise parameter estimates are not fully independent, and Schönberg et al. (2007) have shown that leaving both parameters fully unconstrained can lead to interpretability problems. One solution is to fix one of the parameters to examine specific hypotheses regarding the other parameter, which remains unconstrained. The noise parameter can be fixed, for example, to test for dopaminergic influences on the learning rate. Alternatively, the learning rate can be fixed to test for dopaminergic influences on subject randomness (stochasticity). Since we wanted to test whether dopaminergic manipulations influence behavior in a way that can be captured by changes in the learning rate, we fit a single noise parameter across Parkinson's and elderly subject groups.
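
The shared-noise fitting strategy amounts to a joint maximum-likelihood problem. The Python fragment below is an illustrative sketch: it packs one shared β with a per-group (α, c) pair, sums the negative log likelihood across sessions using the rl_choice_probabilities sketch above, and hands the result to a generic optimizer. The parameter packing, starting values, and optimizer choice are ours, not the exact procedure used for the published fits.

```python
import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(params, sessions):
    """Summed negative log likelihood across sessions.

    sessions: list of (choices, rewards, group) tuples, where group is a
    0-based subject-group index. params = [beta, alpha_0, c_0, alpha_1, c_1, ...]:
    one shared noise parameter followed by a learning rate and choice
    perseveration parameter for each group.
    """
    beta = params[0]
    nll = 0.0
    for choices, rewards, group in sessions:
        alpha = params[1 + 2 * group]
        c = params[2 + 2 * group]
        p_red = rl_choice_probabilities(choices, rewards, alpha, beta, c)
        p_chosen = np.where(np.asarray(choices) == 1, p_red, 1.0 - p_red)
        nll -= np.sum(np.log(np.clip(p_chosen, 1e-12, 1.0)))
    return nll

# Illustrative call for two groups (e.g., patients on and off medication):
# fit = minimize(negative_log_likelihood, x0=[1.0, 0.5, 0.0, 0.5, 0.0],
#                args=(sessions,), method='Nelder-Mead')
```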

Hypothesis testing.

We used these RL models to compare our Parkinson's and elderly subject groups as well as to compare the more and less severely affected of our Parkinson's patients. Knowlton et al. (1996) found that the one-half of their Parkinson's patients with the most severe symptoms had worse performance on a probabilistic classification task than all patients combined. We therefore split our Parkinson's patients into two groups of equal sizes according to symptom severity measured in the off medication session by the UPDRS motor exam to determine whether disease progression affects choice behavior.

The dopaminergic RPE hypothesis predicts that learning rates should be higher in Parkinson's patients on than off dopaminergic medication and lower in patients off medication than in elderly control subjects. It also predicts that learning rates should be lower in the more than in the less severely affected Parkinson's subgroup. The hypothesis makes no specific predictions for how choice perseveration parameters might be modulated by dopamine levels, nor does it explicitly predict how learning rates for positive and negative outcomes might differ across subject groups. For all comparisons for which we had specific predictions, we report uncorrected p values for these ex ante hypotheses. For analyses about which we had no ex ante hypotheses, we also report p values Bonferroni-corrected for multiple comparisons.

l-DOPA is thought to increase phasic dopamine release (Keller et al., 1988; Wightman et al., 1988; Harden and Grace, 1995). Value estimates in the standard RL model (Eq. 2) are updated by the product αδ(t), which we refer to as the error correction term. If phasic dopamine activity encodes a RPE signal, then treatment with l-DOPA should affect value learning by amplifying the error correction term, and this manipulation will be reflected in choice behavior by a change in the learning rate. Despite the fact that the dopaminergic RPE hypothesis is silent about the role of tonic dopamine activity in behavior, it is important to note that both phasic and tonic dopamine signaling may well be affected by the dopamine depletion associated with Parkinson's disease and by the dopaminergic drugs taken to treat the disorder. There is now growing evidence that phasic and tonic dopamine activity may play distinct roles in reinforcement learning (Niv et al., 2007), and as those phasic/tonic models evolve it may become possible to test additional hypotheses with these data.

A final note concerns the interpretation of our hypothesis tests with regard to the dopaminergic RPE hypothesis. Value estimates are updated by the product αδ(t). This implies that a difference in observed learning rates between subject groups might reflect the role of dopamine in encoding α, δ(t), or the product αδ(t). In electrophysiological studies, phasic dopaminergic firing rates are observed to vary under conditions in which α is believed to be constant (Schultz et al., 1997; Bayer and Glimcher, 2005), implying that the phasic activity of dopamine neurons does not encode α alone. These data make it unlikely that l-DOPA, by increasing the dopamine release associated with action potentials, could uniquely influence α without affecting δ(t). It is possible that other dopaminergic manipulations, including dopamine receptor agonists taken by many of our subjects, could uniquely affect α. Regardless, even the electrophysiological data give no guidance for determining whether l-DOPA should influence δ(t) or αδ(t). Model fits would reveal a change in learning rate in either case. For this reason, changes in α estimated from behavior by RL models should be interpreted as evidence that either δ(t) or αδ(t) is influenced by l-DOPA administration.

Linear regression model.

In an effort to test the robustness of any conclusions drawn from the fits of our RL models, we also used a linear regression approach to fit our choice data [as in Lau and Glimcher (2005)]. To perform this robustness check, we assumed that influences of past rewards and choices were linearly combined to determine choice on each trial, with choice probability computed using the softmax rule, as in the RL model. We used logistic regression to estimate weights for rewards received and choices made on previous trials, so the noise parameter of the softmax rule is effectively incorporated into reward weights by the regression. The goal of the regression was to estimate the probabilities $P_R(t)$ and $P_G(t)$ of choosing the red and green options, respectively. Since there are only two options and assuming symmetric weights for the two options, the model for 10 previous trials reduces to the following:

$$\log\frac{P_R(t)}{P_G(t)} = \sum_{i=1}^{10} a_i\,[R_R(t-i) - R_G(t-i)] + \sum_{i=1}^{10} b_i\,[C_R(t-i) - C_G(t-i)] \quad (6)$$

Here, $a$ and $b$ coefficients represent changes in the log odds of choosing the red or green options, with $a_i$ the weight for a reward received $i$ trials ago and $b_i$ the weight for a choice made $i$ trials ago. Negative weights indicate decreases in the log odds of choice as a function of previous rewards ($a_i$) or choices ($b_i$). The log odds of the subject making a given choice on a specific trial is obtained by linearly combining previous choices and outcomes (rewards), weighted by the coefficients extracted by the regression. The logistic regression is linear in log odds but nonlinear in choice probability. Choice probability is recovered from the log odds by exponentiating both sides of Equation 6 and solving for $P_R(t)$. This formulation has been used to study choice and striatal function in monkeys (Lau and Glimcher, 2005, 2008). If all $b_i$ are set to zero, this is similar to the linear reward model that has been used in several previous studies of choice in monkeys (Sugrue et al., 2004; Corrado et al., 2005; Kennerley et al., 2006).
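
To make the regression concrete, the following Python sketch builds the lagged design matrix implied by Equation 6, coding choices as +1 (red) and −1 (green) so that the symmetric reward and choice differences each reduce to a single column per lag. The helper name is ours, and the unpenalized logistic fit is shown only as a commented example of how the weights could be recovered.

```python
import numpy as np

def lagged_design_matrix(choices, rewards, n_lags=10):
    """Design matrix for the regression model of Eq. 6.

    choices: +1 for red, -1 for green; rewards: 1 if rewarded else 0.
    Because only the chosen option can deliver a reward on a given trial,
    choices * rewards equals R_R - R_G and choices equals C_R - C_G.
    The first n_lags trials are dropped so every predictor is defined.
    """
    choices = np.asarray(choices, float)
    reward_diff = choices * np.asarray(rewards, float)
    rows = []
    for t in range(n_lags, len(choices)):
        past_rewards = reward_diff[t - n_lags:t][::-1]   # lag 1 first
        past_choices = choices[t - n_lags:t][::-1]
        rows.append(np.concatenate([past_rewards, past_choices]))
    X = np.array(rows)
    y = (choices[n_lags:] > 0).astype(int)               # 1 = chose red
    return X, y

# An unpenalized logistic regression then recovers the a_i (reward) and
# b_i (choice) weights in log odds, for example with statsmodels:
# import statsmodels.api as sm
# weights = sm.Logit(y, X).fit().params
```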

If the RL models discussed above perfectly described behavior, then the weights ai would decline exponentially with the decay rate specified by the learning rate term in the RL models. The linear regression model thus relaxes the constraint, imposed by the RL models presented above, that weights must decline exponentially. The inclusion of multiple bi terms relaxes the constraint in the specific RL models we used that, independent of rewards, only the previous choice may influence future choice. The set of bi allows long-term trends in choice to be identified. Comparing these less constrained linear regression analyses to the standard RL models thus allows us to examine the robustness of the conclusions drawn with the more structured RL model-based approach.

Model comparison.

To evaluate model fits, we computed a pseudo-$R^2$ statistic (Camerer and Ho, 1999) using the following equation:

$$\text{pseudo-}R^2 = \frac{R - L}{R} \quad (7)$$

Here, $L$ is the maximum log likelihood for the estimated model given the data and $R$ is the log likelihood of the model under random choice. To compare the RL model and linear regression approaches, we penalized model fits for complexity using the Bayesian information criterion (BIC) (Schwarz, 1978). We computed BIC using the following equation:

$$\text{BIC} = -2L + k\,\ln(n) \quad (8)$$

Here, $L$ is the maximum log likelihood for the estimated model given the data, $k$ is the number of free parameters in the model, and $n$ is the number of trials. The model with the lower BIC is preferred.
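
Both statistics follow directly from a fitted model's log likelihood; a minimal Python sketch is given below (function names are illustrative).

```python
import numpy as np

def pseudo_r2(log_likelihood, n_trials):
    """Pseudo-R^2 of Eq. 7: fractional improvement over random choice
    between two options (random log likelihood = n_trials * log(0.5))."""
    random_ll = n_trials * np.log(0.5)
    return (random_ll - log_likelihood) / random_ll

def bic(log_likelihood, n_params, n_trials):
    """Bayesian information criterion of Eq. 8; lower values are preferred."""
    return -2.0 * log_likelihood + n_params * np.log(n_trials)
```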

Results

Behavioral results

Choice data are shown for example subjects from each subject group including on and off medication sessions for one Parkinson's patient (Fig. 2). We found that subjects from all four groups chose the richer option more frequently in the training session (Fig. 3A). Subjects correctly identified the richer option in all five training blocks in most (88%; 84 of 96) sessions. In the dynamic task environment, choice behavior after transitions to different reward probability ratios quickly stabilized at the group level, suggesting that subjects in all four groups adjusted choice behavior according to option reward rates (Fig. 3B).

Figure 2.

Observed and predicted choice for example subjects. A, Healthy young subject HC2011. B, Healthy elderly subject HC2620. C, D, Parkinson's patient PD2710 tested both on and off dopaminergic medication. Data were fit with a learning model with three parameters (learning rate α, noise parameter β, and choice perseveration parameter c). Choice data (black) and model predictions (gray) were smoothed with an 11-trial moving average. The vertical lines indicate unsignaled transitions between blocks. The horizontal lines indicate strict matching behavior (Herrnstein, 1961). Individual parameter estimates are indicated. The block sequence for young example subject HC2011 is displayed in Figure 1B.

Figure 3.

Behavioral results. A, Average choice behavior in the training session for young and elderly control subjects (both n = 26) and Parkinson's patients on and off medication (both n = 22). Subjects instructed to identify the rich (higher reward probability) option chose the rich option more than expected by chance (indicated by a horizontal line). Data are unsmoothed. B, Average choice behavior in the experimental session for all four subject groups (n = 26). Subjects experienced multiple unsignaled transitions between blocks and all groups reacted quickly to transitions. The probability of choosing the option that is richer after the unsignaled transition is plotted with choices aligned on the trial on which a transition occurred (indicated by a vertical line). This convention is used in subsequent panels. Data are unsmoothed. C–F, Average choice behavior in all four subject groups for high-probability (6:1) and low-probability (3:1) ratio blocks. All groups allocated choices according to option reward rates, choosing the rich option more in the high-probability than low-probability ratio blocks. Data are smoothed using an 11-trial moving average. After allowing 20 trials from the beginning of each block for choice behavior to stabilize, choices are averaged separately for high-probability and low-probability ratio blocks for the next 50 trials and plotted at the right of each panel. Error bars reflect ±SEM across subjects.

Parkinson's patients earned similar amounts on (mean ± SEM, $19.50 ± 0.34 excluding one subject who did not complete all 800 trials) and off ($19.45 ± 0.26) dopaminergic medication (paired t test, t(24) = 0.31, p = 0.76). Young control subjects earned more ($20.54 ± 0.23) than Parkinson's patients either on (unpaired t test, t(49) = 2.55, p = 0.014) or off medication (t(50) = 3.17, p = 0.003). Elderly subjects also earned more ($20.13 ± 0.34), although not significantly, than Parkinson's patients either on (t(49) = 1.32, p = 0.19) or off medication (t(50) = 1.60, p = 0.11). All four subject groups allocated more choices to the richer option in high-probability (6:1, 1:6) than low-probability (3:1, 1:3) blocks (young, paired t test, t(25) = 4.50, p = 0.0001; elderly, t(25) = 1.96, p = 0.061; Parkinson's on, t(25) = 2.29, p = 0.031; Parkinson's off, t(25) = 2.80, p = 0.01) (Fig. 3C–F). We fit a line to determine how the log ratio of choices to the two options at steady state related to the log ratio of rewards from the two options, which provides a measure of reward sensitivity (Fig. 4) (for fits, see supplemental Table 2, available at www.jneurosci.org as supplemental material). Linear fits accounted for 42–65% of the variance within each subject group. Choice was lawfully related to reward rates, suggesting that subjects in all four subject groups made their choices based on the relative values of the two options.

Figure 4.

Steady-state choice behavior obeys the matching law. A–D, Log choice ratios (ratio of red to green choices) are plotted as a function of log reward ratios for 50 trials of each block after allowing 20 trials for choice behavior to stabilize (the same period used in Fig. 3C–F). Data are plotted for all subjects in a group and the example subjects (in red) from Figure 2A–D. All four groups obey the matching law (all n = 26), allocating choices as a function of reward ratios. Blocks in which one option was never rewarded in the 50-trial period are excluded. The lines represent least-squares fits of the generalized matching law with the slope, corresponding to reward sensitivity, noted on each plot for subject group (blue) and example subject (red). Fit parameters are listed in supplemental Table 2 (available at www.jneurosci.org as supplemental material).

Reward sensitivity in the young group was comparable with measures from young adult monkeys performing a similar task (Lau and Glimcher, 2005) (see supplemental data, available at www.jneurosci.org as supplemental material). Reward sensitivity for elderly subjects was intermediate but not significantly different from Parkinson's patients either on (unpaired t test, t(472) = 1.11, p = 0.27) or off medication (t(473) = 1.36, p = 0.17). Reward sensitivity was higher in Parkinson's patients on than off medication (t(476) = 2.34, p = 0.02), and this increase in choice allocation to the richer option while on dopaminergic medication is consistent with dopamine playing a role in reinforcement learning.

Reinforcement learning models

We analyzed individual trial-by-trial choice data using a standard RL model with a learning rate α and a noise parameter β. The average individual pseudo-R2 was 0.18 (SD = 0.12; n = 104), significantly better than a model predicting random choice for most sessions (88%, 92 of 104; likelihood ratio test, p < 0.05), suggesting that a standard RL model can account for choice behavior in our task. Because choice behavior might also depend on previous choices independent of previous rewards, we also included a choice perseveration parameter c that captures short-term tendencies to alternate or perseverate (when the value of c is negative or positive, respectively) that are independent of reward history (Lau and Glimcher, 2005; Schönberg et al., 2007). The average individual pseudo-R2 for the RL model with this additional parameter was 0.27 (SD = 0.16; n = 104), significantly better than a model predicting random choice for all but one session (99%, 103 of 104; likelihood ratio test, p < 0.05) and not significantly different between the two Parkinson's groups (paired t test, t(25) = 0.63, p = 0.54) or between elderly subjects and either Parkinson's group (unpaired t test, both t(50) < 0.91, p > 0.3). We plot individual fits for this model for example subjects in Figure 2 (pseudo-R2, 0.30–0.40).

To characterize choice behavior across subject groups, we fit choice data with a single shared noise parameter across Parkinson's and elderly groups (β = 1.73 ± 0.03; SEs are indicated for all parameter estimates) and separate learning rate and choice parameters for each group. The group parameter estimates are plotted in Figure 5 relative to parameter estimates for elderly subjects (α = 0.60 ± 0.02; c = 0.39 ± 0.02; n = 26). Learning rates for Parkinson's patients off medication were similar to learning rates in elderly control subjects (Wald test, p = 0.44).

Figure 5.

Reward and choice effects for Parkinson's and elderly subject groups. A, B, Choice data for Parkinson's and elderly groups (all n = 26) were fit with a single shared noise parameter and group parameter estimates are plotted relative to elderly parameter estimates. The insets show fits for the more affected one-half of the Parkinson's patients with moderate symptoms (n = 13). All differences between Parkinson's patients on and off dopaminergic medication are highly significant (all p < 0.001, indicated by stars). Parkinson's patients have higher learning rates on than off dopaminergic medication. Parkinson's patients perseverated in their choices more, independent of reward history, off than on dopaminergic medication. These differences were maintained in the more affected one-half of the patients with moderate symptoms. Error bars indicate 95% confidence intervals.

We checked whether model parameters were affected by disease progression by splitting the Parkinson's patients according to disease severity, into more and less affected subgroups of equal sizes. Learning rates were lower in the more than in the less affected Parkinson's subgroup off medication (Wald test, p < 0.0001; both n = 13; noise parameter fixed at β = 1.73). This difference cannot be accounted for by increased motor symptoms impairing task performance since reaction times, measured from stimulus onset to response completion, were similar off medication in the more affected (mean ± SEM of median reaction times, 364 ± 56 ms; n = 13) and the less affected Parkinson's patients (368 ± 41 ms, n = 13; paired t test, t(24) = 0.063, p = 0.95). Learning rates were also lower in the more affected Parkinson's patients off medication than in elderly control subjects (Wald test, p = 0.006) (Fig. 5A). Surprisingly, the less affected Parkinson's subgroup had higher learning rates off medication than the elderly control subjects (p < 0.0001).

Most importantly, as predicted by the dopaminergic RPE hypothesis, learning rates were higher in Parkinson's patients on than off dopaminergic medication. This difference was significant at p = 0.0003 (Wald test) (Fig. 5A). We also found an additional dopamine-dependent effect on decision making. Parkinson's patients off dopaminergic medication perseverated more, independent of reward history, than elderly control subjects (Wald test, p < 0.0001; corrected p < 0.0001), and dopaminergic medication reduced this tendency (p < 0.0001; corrected p < 0.0001) (Fig. 5B), making them more like control subjects. The same differences in learning rate and perseveration between Parkinson's patients on and off medication were maintained in the more affected Parkinson's subgroup (both p < 0.0001) (Fig. 5).

Because block lengths varied within a limited range (70–90 trials), subjects might have learned to predict block transitions and to adjust their choice behavior accordingly. For example, learning rates might be higher immediately after block transitions than in later block phases and our analysis might have obscured this fact. However, we found that parameter estimates were similar in early and late block phases, and there was no indication that subjects were able to predict block transitions (see supplemental data, available at www.jneurosci.org as supplemental material).

Previous studies have suggested that dopamine neurons might be differentially involved in learning from positive and negative outcomes (Daw et al., 2002; Frank et al., 2004, 2007b; Bayer and Glimcher, 2005; Cools et al., 2006, 2009; Frank and O'Reilly, 2006; Bayer et al., 2007; Bódi et al., 2009). To test for this possibility, we fit separate learning rates for positive and negative outcomes, as in a recent study of reinforcement learning in healthy young subjects (Frank et al., 2007a), again with a single shared noise parameter (β = 1.12 ± 0.03) and separate learning rate and choice perseveration parameters for each Parkinson's and elderly subject group. This model fit the data significantly better than the single learning rate model after accounting for number of parameters (likelihood ratio test, p < 0.0001). Group choice parameters were similar to those estimated using the single learning rate model (Wald test, all p > 0.19) and differences between group choice parameters remained significant at p < 0.0001. Group parameter estimates are plotted in Figure 6 relative to parameter estimates for elderly subjects (αpositive = 1.20 ± 0.05; αnegative = 0.47 ± 0.02). We found that learning rates for positive outcomes were significantly higher in Parkinson's patients on than off dopaminergic medication (Wald test, p = 0.003; corrected p = 0.016), but dopaminergic medication did not affect learning rates for negative outcomes (p = 0.44; corrected p = 1.0). This finding supports a role for dopamine neurons in learning from positive but not negative outcomes. Learning rates in Parkinson's patients off medication were similar to elderly control subjects for positive outcomes (p = 0.58; corrected p = 1.0) but were marginally higher for negative outcomes (p = 0.032; corrected p = 0.19), indicating a possible increase in sensitivity to negative outcomes in Parkinson's patients, consistent with a previous study (Frank et al., 2004).

Figure 6.

Learning rates for positive and negative outcomes. A, B, Choice data for Parkinson's and elderly subject groups were fit with a shared noise parameter and group parameter estimates are plotted relative to elderly parameter estimates. Parkinson's patients have higher learning rates for positive outcomes on than off dopaminergic medication (p = 0.003, indicated by a star). Parkinson's patients had similar learning rates for negative outcomes on and off dopaminergic medication (p = 0.44). Error bars indicate 95% confidence intervals.

Since the number of dopamine neurons in the substantia nigra (although not in the ventral tegmental area) declines with normal aging (Stark and Pakkenberg, 2004), we examined whether learning rate or choice perseveration parameters correlated with age. Excluding young subjects from this analysis, we fit choice data across Parkinson's and elderly subject groups with a single noise parameter (β = 2.61 ± 0.16) and separate learning rate and choice parameters for all individuals (Fig. 7). We found no significant correlation between learning rate and age for Parkinson's or elderly groups (all r < 0.18; p > 0.3) (Fig. 7A). We did not find a correlation between choice perseveration and age in either Parkinson's group (both r < 0.16; p > 0.3), but we did find a positive correlation between choice perseveration and age in the elderly subjects (r = 0.40; p = 0.043) (Fig. 7B).

Figure 7.

Effect of age on individual parameter estimates. A, B, Parameter estimates for individual subjects are plotted against subject age. Choice data for individual Parkinson's patients and elderly subjects were fit with a shared noise parameter. There was no significant relationship between individual learning rates and age in elderly or Parkinson's subject groups (all p > 0.3). Choice perseveration increased with age for elderly subjects (p = 0.043; the line represents the least-squares fit) but not for either Parkinson's group (both p > 0.3). Young subjects were excluded from these analyses, but we used the noise parameter obtained for Parkinson's and elderly groups to fit their data and plot the results for comparison purposes.

We considered whether other variables might also be correlated with perseveration in elderly subjects. Choice perseveration was significantly correlated with scores on the two memory tests (both r > 0.51; p < 0.007), but not with each of the other six demographic and neuropsychological variables (all r < 0.22; p > 0.29). We note that age and scores on the two memory tests were highly correlated (both r > 0.72; p < 0.0001 for both memory scores) and that multiple regression across demographic and neuropsychological variables revealed no significant correlations at p < 0.05, even without correcting for multiple comparisons. Our results thus indicate that, although perseveration might result from normal aging, perseveration might also be related to memory decline. Our sample size is simply too small to determine to what extent each contributes differentially to perseveration.

Linear regression model of reward influence on choice

Not all recent studies of value-based decision making have used a standard RL model. Several studies in monkeys have used an alternate approach that does not assume that the influence of rewards received on previous trials decays in an exponential manner (Sugrue et al., 2004; Corrado et al., 2005; Lau and Glimcher, 2005, 2008; Kennerley et al., 2006), an assumption effectively embedded in standard RL models by the learning rate α. To examine the robustness of our RL model-based findings, we fit trial-by-trial choice dynamics using a more general linear regression model (Fig. 8). Our results using this approach were broadly consistent with the results obtained using the more constrained RL models.

Figure 8.

Two-stage description of choice. At the first stage, past rewards and choices are used to value the two options. An example sequence of five trials is shown with the most recent trial being a rewarded choice to the red option. We fit choice data with a linear model and, as an example, display reward and choice effects for young subject HC2011 from Figure 2A. The relative weight of rewards received on past trials typically decays with time. Choice effects capture tendencies to alternate or perseverate that are independent of reward history. The negative weight for the most recent choice corresponds to a reward-independent tendency to alternate. A positive weight reflects a tendency to perseverate. At the second stage, the value of the options is compared, incorporating choice effects, and an option is selected using a decision rule.

To perform this analysis, we assumed only that past rewards and choices were weighted and linearly combined, with no restriction on the structure of weights, to determine what choice a subject would make. Best fitting weights for past rewards and choices obtained in this way quantify changes in the probability of a subject choosing a particular option exerted by past events. A positive reward weight for the most recent trial indicates that a reward received from the red trap (Fig. 8) had the effect of increasing the probability of choosing the red trap on the next trial. A negative choice weight on the most recent trial indicates that a choice to the red trap had the effect of decreasing the probability of choosing the red trap on the next trial, independent of past rewards. A linear regression computed in this manner effectively identifies the weights that, when summed, best describe the influence of previous rewards and choices on future choice. If these influences sum to zero, either option is equally likely to be chosen.

Model weights in log odds for rewards received and choices made for 10 previous trials were estimated by logistic regression for each of the four subject groups (Fig. 9). (A similar estimation was not made for the two Parkinson's subgroups because of the small size of those groups relative to the number of parameters being estimated.) Reward effects for all four groups (Fig. 9A,B) decayed in an approximately exponential manner, consistent with subjects using a reinforcement learning mechanism to learn action values. A negative weight for the most recent choice for the young subjects indicates a tendency to alternate independent of reward history (Fig. 9C). Reward and choice weights for young adult subjects are comparable with those for young adult monkeys performing a similar task (Lau and Glimcher, 2005).

Figure 9.

Reward and choice effects estimated by a linear regression model. A, Reward effects for young subjects. B, Reward effects for Parkinson's and elderly subject groups. The linear model described in Figure 8 was fit to choice data for each group (all n = 26 and >20,000 trials) using rewards and choices for 10 past trials. Coefficients are plotted as a function of trials in the past relative to the current trial. The decay in reward weights resembles an exponential function. Maximum 95% confidence intervals for all parameter estimates in each plot are indicated. C, Choice effects for young subjects. D, Choice effects for Parkinson's and elderly groups. The choice weight for the most recent trial captures the greater perseveration of Parkinson's patients off than on dopaminergic medication, as in Figure 5B.

Critically, reward weights decayed more quickly in Parkinson's patients on than off dopaminergic medication (Fig. 9B) as expected from the RL model-based analysis. As an additional check of robustness, we fit an exponential function to the reward weights and compared that time constant in Parkinson's patients on and off medication. This time constant was smaller for Parkinson's patients on than off medication (Parkinson's on, 0.96 ± 0.22; Parkinson's off, 1.67 ± 0.26; Wald test, p = 0.037). This result is again consistent with the higher learning rates we found in Parkinson's patients on than off medication. If subjects learn values according to standard RL models, this iterative computation can be equivalently described as an exponentially weighted average of previous rewards with the rate of decay determined by the learning rate α (Bayer and Glimcher, 2005). Thus, the fact that reward weights decay exponentially, although they are not constrained in any way to do so by the linear regression approach, is consistent with subjects using a RL model with an error correction term to estimate option values. Since the learning rate and noise parameter are not fully independent (Schönberg et al., 2007), it is possible that, because the noise parameter is shared across groups, changes in the noise parameter will be reflected by changes in the learning rate. Importantly, the decay rate of reward weights in the linear regression analysis provides an estimate of the rate of learning that is independent of the noise parameter. A difference in the time constants for this decay in Parkinson's patients on and off medication thus demonstrates that a significant part of the dopaminergic medication effect on learning must be attributable to a change in learning rate and not a change in the noise parameter.
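
As one way to obtain such a time constant, the sketch below fits an exponential decay to the lagged reward weights with SciPy. This is an illustration of the comparison in Python, not necessarily the exact fitting procedure used for the values reported above.

```python
import numpy as np
from scipy.optimize import curve_fit

def reward_weight_time_constant(reward_weights):
    """Fit a * exp(-lag / tau) to regression reward weights.

    reward_weights[i] is the weight for a reward received i + 1 trials ago.
    A smaller time constant tau means faster decay, corresponding to a
    higher learning rate in the RL description of the same behavior.
    """
    lags = np.arange(1, len(reward_weights) + 1)

    def decay(lag, a, tau):
        return a * np.exp(-lag / tau)

    (a, tau), _ = curve_fit(decay, lags, reward_weights, p0=[1.0, 1.0])
    return a, tau
```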

For Parkinson's and elderly groups, a positive weight for the most recent choice captures the tendency, observed in the RL model-based analysis, to repeat the choice just made (Fig. 9D). This tendency to perseverate was higher in Parkinson's patients off dopaminergic medication than elderly control subjects, but was reduced by dopaminergic medication, as indicated by our RL model analysis, with all differences between subject groups significant at p < 0.0001. We fit models with 10 reward weights and up to 20 choice weights and used BIC (Schwarz, 1978) to compare fits penalizing for model complexity. The most preferred model for all four subject groups had at least five choice weights, demonstrating that choice weights for previous trials capture additional variance in choice data. This is an observation made previously for monkeys performing a similar task (Lau and Glimcher, 2005).

Reward effects estimated by the linear regression and RL model approaches are plotted for example subjects (Fig. 10A–D). To verify the appropriateness of the RL model over the more general linear regression model, we penalized model fits for complexity using BIC. We compared individual fits for the three-parameter RL model (α, β, c) to an 11-parameter linear regression model (10 reward weights, 1 choice weight) and found the (more constrained) RL model preferred for the majority (82%; 85 of 104) of sessions (Fig. 10E). To verify that this finding was not attributable to the choice of reward weight number, we fit linear regression models with 1–20 weights and found the learning model always preferred for the majority (at least 59%; 61 of 104) of sessions, showing that the RL model containing an error correction term explains the data better (in the BIC sense) than the more general linear regression model. The three-parameter RL model was also preferred to a two-parameter RL model that omitted the choice parameter in the majority (83%; 86 of 104) of sessions (Fig. 10F), supporting inclusion of the choice perseveration parameter and reiterating the importance of considering reward-independent choice effects on decision making in reinforcement-learning tasks.

Figure 10.

Comparison of RL and linear regression models. A–D, In RL models, the value of an option is computed using an exponentially weighted average with a learning rate α. The linear model in Figure 8 does not constrain reward weights to decay exponentially. For the example subjects in Figure 2, linear model weights are plotted alongside the exponential function with the learning rate estimated by a three-parameter RL model, like that used in Figure 5, as its time constant. E, F, BIC measures indicated that the three-parameter RL model we used is preferred (lower BIC) to both an 11-parameter linear regression model with 10 reward weights and one choice weight and a two-parameter RL model without a choice parameter.

Discussion

We used a dynamic foraging task and fit choice behavior with RL models to test the quantitative hypothesis that dopaminergic drugs affect the error correction term of the RL mechanism in humans. We found that (1) dopaminergic drugs increased learning rates in Parkinson's patients; (2) learning rates were similar in Parkinson's patients off dopamine medication and elderly control subjects; (3) learning rates were lower in more affected Parkinson's patients than either less affected patients or elderly control subjects; (4) dopaminergic drugs selectively increased learning rates for positive but not negative outcomes; and (5) perseveration in choice, independent of reward history, increased with normal aging and Parkinson's disease and decreased with dopamine therapy.

Human reinforcement learning and dopamine

Although dynamic foraging tasks have only recently been adapted for use in humans (Daw et al., 2006; Serences, 2008), they are commonly used to study value-based decision making in monkeys (Platt and Glimcher, 1999; Sugrue et al., 2004; Lau and Glimcher, 2005, 2008; Samejima et al., 2005). Activity in striatal neurons, which receive dense dopaminergic inputs, is correlated with trial-by-trial action values estimated from choice behavior with both approaches we used: RL models (Samejima et al., 2005) and linear regression models (Lau and Glimcher, 2008). In our task, all four subject groups adjusted to unsignaled changes in reward probabilities and allocated choices according to option reward rates, allocating more choices to options with higher reward probabilities. Steady-state choice behavior was well described by the matching law, and we found greater reward sensitivity in Parkinson's patients on than off dopaminergic medication, consistent with previous studies finding that dopaminergic drugs affect performance in learning tasks (Cools et al., 2001, 2006, 2009; Frank et al., 2004, 2007b; Frank and O'Reilly, 2006; Pessiglione et al., 2006; Shohamy et al., 2006; Bódi et al., 2009).

To test quantitative predictions of the dopaminergic RPE hypothesis, we fit choice behavior using a standard RL model. We found higher learning rates in Parkinson's patients on than off dopaminergic medication. If the phasic activity of dopamine neurons encodes RPEs (Schultz et al., 1997; Hollerman and Schultz, 1998; Nakahara et al., 2004; Bayer and Glimcher, 2005), then l-DOPA, by increasing phasic dopamine release (Keller et al., 1988; Wightman et al., 1988; Harden and Grace, 1995), should affect RPE magnitude. In standard models, action values are updated by the product of RPE and the learning rate, so our observation that learning rates increase with dopaminergic medication is predicted by standard RL models. These results suggest that dopamine neurons are involved in encoding the error correction term in RL models and are consistent with the widely held hypothesis that dopamine neurons encode a RPE signal.

Surprisingly, learning rates were similar in Parkinson's patients off medication and elderly control subjects. If the RL mechanism remains relatively intact despite significant loss of dopamine neurons in the substantia nigra during the early phase of the disease, as these data suggest, then dopamine therapy might have the effect of “overdosing” this mechanism with regard to learning behavior (Gotham et al., 1988; Swainson et al., 2000; Cools et al., 2001). This might be reflected by higher learning rates in medicated Parkinson's patients than in control subjects. This is exactly what we observed. This finding may highlight the role of the ventral tegmental area in learning, an area that is relatively intact early in Parkinson's disease (Kish et al., 1988).

To test whether disease progression influences reinforcement learning, we split our Parkinson's patients into more and less affected halves. We found that the more affected subgroup had lower learning rates than both the less affected subgroup and elderly control subjects. This suggests that, as the disease progresses, learning rates decline as dopamine depletion worsens. We also found that learning rates were higher in the less affected patients than control subjects. This may reflect higher motivation in patients than control subjects because the patients knew we were studying the effects of their disorder on decision making (the Hawthorne effect) (Frank et al., 2004). In a similar way, the effects of dopaminergic medication we observed might be explained by nonspecific effects of motivation or arousal that manifest as specific changes in learning rates.

Separating positive and negative reward prediction errors

We also found that the dopamine-dependent effect on learning was selective for learning from positive RPEs and does not appear to affect learning from negative RPEs. That dopaminergic drugs affect learning from positive RPEs (or outcomes) and asymmetrically affect learning from positive and negative RPEs (or outcomes) is consistent with a growing body of evidence. Theoretical (Daw et al., 2002; Dayan and Huys, 2009) and electrophysiological (Bayer and Glimcher, 2005; Bayer et al., 2007) studies suggest that midbrain dopamine neurons may encode positive RPEs by increases in spiking activity and that a nondopaminergic system might encode negative RPEs. Pharmacological studies (Frank et al., 2004, 2007b; Cools et al., 2006, 2009; Frank and O'Reilly, 2006; Bódi et al., 2009) suggest that positive and negative outcomes have differential effects on learned values. Our results are compatible with all of these previous findings. However, our finding that dopaminergic drugs did not affect learning from negative RPEs may be at odds with pharmacological studies finding evidence for greater learning from negative outcomes off than on medication (Frank et al., 2004, 2007b; Cools et al., 2006, 2009; Frank and O'Reilly, 2006; Bódi et al., 2009). The explanation for this inconsistency may lie in the subtle distinction between RPEs and outcomes, or in the precise reward patterns our subjects experienced. Future research will have to resolve this ambiguity. One interesting implication of the asymmetry we found is a possible explanation for the increased prevalence of pathological gambling in Parkinson's patients taking some dopaminergic drugs (Molina et al., 2000; Dodd et al., 2005; Voon et al., 2006; Dagher and Robbins, 2009). When learning rates for positive and negative outcomes are balanced, the probability of selecting an action accurately reflects the true value of that action. In contrast, an RL mechanism that overweights positive relative to negative outcomes would overvalue some options in gambling tasks because gains would effectively loom larger than losses.

Bursts, pauses, and tonic activity

Electrophysiological findings suggest that dopamine neurons encode positive RPEs by phasic bursts of activity (Bayer and Glimcher, 2005), and l-DOPA, by increasing phasic dopamine release, could amplify these signals. However, it is unknown what effect dopaminergic drugs have on the duration of the pauses in dopamine neuron activity that have been correlated with negative RPEs (Bayer et al., 2007), consistent with a computational model of basal ganglia function (Cohen and Frank, 2009). If negative RPEs are, in fact, encoded in pauses and these drugs do not significantly affect pause durations, learning rates for negative outcomes could be similar in Parkinson's patients on and off medication, as we found. Alternatively, dopaminergic drugs might reduce pause durations, decreasing learning rates for negative outcomes, consistent with the results of some studies (Frank et al., 2004, 2007b; Cools et al., 2006; Frank and O'Reilly, 2006).

It should also be noted that both Parkinson's disease and dopaminergic drugs, including D2 receptor agonists taken by the majority (n = 17) of our patients, are likely to affect both phasic and tonic dopamine activity. Phasic and tonic dopamine signaling may play distinct roles in reinforcement learning (Niv et al., 2007), and it is possible that tonic dopamine effects also contribute to the differences we observe between subject groups. It is also possible that dopaminergic drugs might have effects outside of the neural pathways implicated in reinforcement learning or might have nondopaminergic effects that could alter choice behavior in the same way as predicted by RL models. Future experiments might address the role of dopamine neuron pauses in reinforcement learning and the relative contributions of phasic and tonic dopamine activity and of nondopaminergic activity to reinforcement learning and decision making.

Dopamine-dependent effects on perseveration

We also found that perseveration in choice, independent of reward history, increased with normal aging and was higher in Parkinson's patients off dopaminergic medication than in elderly subjects. Dopaminergic drugs reversed this effect, reducing perseveration in Parkinson's patients. Several studies in Parkinson's patients have found deficits in switching attention from one stimulus or task to another (Lees and Smith, 1983; Cooper et al., 1991; Owen et al., 1993; Cools et al., 2001; Lewis et al., 2005; Slabosz et al., 2006), consistent with our finding of perseveration in Parkinson's patients. Parkinson's patients may be better able to switch to a new cue when it is novel (Shohamy et al., 2009). In some tasks, dopaminergic medication has been shown to improve set switching (Owen et al., 1993; Cools et al., 2001), consistent with our finding of reduced perseveration on dopaminergic medication. However, perseveration after reward contingency changes might be accounted for in some tasks by RL models (Suri and Schultz, 1999). In contrast, the perseverative effects we observed cannot be accounted for by existing RL models. This finding emphasizes the importance of considering dopamine effects on decision making that are independent of reward history. This finding may highlight the role of the substantia nigra, which is affected by both normal aging (Stark and Pakkenberg, 2004) and Parkinson's disease (Dauer and Przedborski, 2003), in this non-RL process.
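
One way to express the distinction between reward-driven and reward-independent choice repetition (a minimal sketch under assumed parameter names, not the exact specification we fit) is to add a perseveration bonus for the previously chosen option to a softmax choice rule; setting that bonus to zero recovers a standard RL choice rule.

```python
import numpy as np

def choice_probabilities(q_values, last_choice, beta=5.0, kappa=0.5):
    """Softmax over learned action values plus a reward-independent
    perseveration bonus (kappa) for the previously chosen option.

    kappa > 0 biases choice toward repeating the last response regardless
    of reward history; kappa = 0 reduces to a standard softmax RL rule.
    """
    bonus = np.zeros_like(q_values)
    if last_choice is not None:
        bonus[last_choice] = 1.0
    logits = beta * q_values + kappa * bonus
    logits -= logits.max()              # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# With equal learned values, a perseverative chooser still favors its last choice.
print(choice_probabilities(np.array([0.5, 0.5]), last_choice=0, kappa=1.0))
```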

Assessing the robustness of model findings

To examine the robustness of our findings, we used a more conservative linear regression approach to explain choice behavior and found results consistent with our learning model analysis. This approach is often used to study value-based decision making in animals (Sugrue et al., 2004; Corrado et al., 2005; Lau and Glimcher, 2005; Kennerley et al., 2006). We also confirmed that behavioral results were comparable for young adult subjects and young adult monkeys performing a similar task (Lau and Glimcher, 2005) (see supplemental data, available at www.jneurosci.org as supplemental material). If subjects learn values according to standard RL models, reward weights estimated by linear regression, although not constrained to do so, will decay exponentially. Reward weights across our groups bear a striking resemblance to the exponentially weighted average of reward history in RL models. Reward weights for Parkinson's patients decayed significantly more quickly on than off dopaminergic medication, consistent with higher learning rates in patients on than off dopaminergic medication. Finally, we used model comparison methods to show that the RL model explains the data better than the more general linear regression approach, although results using the two different modeling approaches were consistent.
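
The logic of the regression approach can be sketched as follows (a simplified, hypothetical version for a two-alternative task with binary rewards; the variable names and number of lags are assumptions, and the actual specification used in the analysis differs in detail).

```python
import numpy as np

def reward_history_regressors(choices, rewards, n_lags=10):
    """Build lagged reward-history regressors for predicting the next choice.

    choices: array of 0/1 indicating which option was chosen on each trial
    rewards: array of 0/1 indicating whether that choice was rewarded
    The regressor at lag k is +1 if option 1 was rewarded k trials back,
    -1 if option 0 was rewarded, and 0 if that trial was unrewarded.
    """
    signed = np.where(rewards == 1, np.where(choices == 1, 1.0, -1.0), 0.0)
    X = np.zeros((len(choices), n_lags))
    for k in range(1, n_lags + 1):
        X[k:, k - 1] = signed[:-k]
    return X

# Fitting a logistic regression of choice on these columns (e.g., with
# sklearn's LogisticRegression) gives one weight per lag. If choices follow
# a standard RL model with learning rate alpha, the fitted weights should
# decay roughly in proportion to (1 - alpha) ** lag.
```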

Conclusion

Our results are consistent with the hypothesis that dopamine neurons encode an RPE signal for reinforcement learning. The dopaminergic RPE hypothesis predicts that learning rates estimated from choice behavior will differ in Parkinson's patients on and off dopaminergic medication, and that is what we found. More specifically, the increase in learning rates we observed with dopaminergic drugs was selective for learning from positive outcomes, consistent with the hypothesis that dopamine neurons might differentially encode positive and negative RPEs. We found that reinforcement learning remained relatively intact with aging and Parkinson's disease, but that reward-independent perseveration in choice increased with both; dopaminergic medication reversed this effect. This novel dopamine-dependent effect is not predicted by standard RL models and highlights the importance of considering additional roles for dopamine in decision making that are independent of reward history.

Footnotes

  • This work was supported by a National Defense Science and Engineering Graduate Fellowship (R.B.R.), a Dekker Foundation Award from the Bachmann–Strauss Dystonia and Parkinson Foundation (M.A.G.), and National Institutes of Health Grants F31-AG031656 (R.B.R.), R01-NS047434 (M.A.G.), and R01-EY010536 (P.W.G.). We thank Dan Burghart, Nathaniel Daw, Eric DeWitt, Margaret Grantner, Charles Hass, Ahmed Moustafa, and Daphna Shohamy for helpful comments, Lucien Côté, Jacob Sage, and Susan Bressman for referring Parkinson's patients to this study, Kindiya Geghman, Yasmine Said, and Sarah Vanderbilt for assistance with data collection, and the Creskill Senior Center (Creskill, NJ) for their participation in this study.

  • Correspondence should be addressed to Robb B. Rutledge, Center for Neural Science, New York University, 4 Washington Place, Room 809, New York, NY 10003. robb@cns.nyu.edu

References

  1. Baum WM (1974) On two types of deviation from the matching law: bias and undermatching. J Exp Anal Behav 22:231–242.
  2. Bayer HM, Glimcher PW (2005) Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47:129–141.
  3. Bayer HM, Lau B, Glimcher PW (2007) Statistics of midbrain dopamine neuron spike trains in the awake primate. J Neurophysiol 98:1428–1439.
  4. Bódi N, Kéri S, Nagy H, Moustafa A, Myers CE, Daw N, Dibó G, Takáts A, Bereczki D, Gluck MA (2009) Reward-learning and the novelty-seeking personality: a between- and within-subjects study of the effects of dopamine agonists on young Parkinson's patients. Brain 132:2385–2395.
  5. Burnham KP, Anderson DR (2002) Model selection and multimodel inference (Springer, New York).
  6. Camerer C, Ho TH (1999) Experience-weighted attraction learning in normal form games. Econometrica 67:827–874.
  7. Cohen MX, Frank MJ (2009) Neurocomputational models of basal ganglia function in learning, memory and choice. Behav Brain Res 199:141–156.
  8. Cools R, Barker RA, Sahakian BJ, Robbins TW (2001) Enhanced or impaired cognitive function in Parkinson's disease as a function of dopaminergic medication and task demands. Cereb Cortex 11:1136–1143.
  9. Cools R, Altamirano L, D'Esposito M (2006) Reversal learning in Parkinson's disease depends on medication status and outcome valence. Neuropsychologia 44:1663–1673.
  10. Cools R, Frank MJ, Gibbs SE, Miyakawa A, Jagust W, D'Esposito M (2009) Striatal dopamine predicts outcome-specific reversal learning and its sensitivity to dopaminergic drug administration. J Neurosci 29:1538–1543.
  11. Cooper JA, Sagar HJ, Jordan N, Harvey NS, Sullivan EV (1991) Cognitive impairment in early, untreated Parkinson's disease and its relationship to motor disability. Brain 114:2095–2122.
  12. Corrado GS, Sugrue LP, Seung HS, Newsome WT (2005) Linear-nonlinear-Poisson models of primate choice dynamics. J Exp Anal Behav 84:581–617.
  13. Czernecki V, Pillon B, Houeto JL, Pochon JB, Levy R, Dubois B (2002) Motivation, reward, and Parkinson's disease: influence of dopatherapy. Neuropsychologia 40:2257–2267.
  14. Dagher A, Robbins TW (2009) Personality, addiction, dopamine: insights from Parkinson's disease. Neuron 61:502–510.
  15. Dauer W, Przedborski S (2003) Parkinson's disease: mechanism and models. Neuron 39:889–909.
  16. Daw ND, Kakade S, Dayan P (2002) Opponent interactions between serotonin and dopamine. Neural Netw 15:603–616.
  17. Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans. Nature 441:876–879.
  18. Dayan P, Huys QJM (2009) Serotonin in affective control. Annu Rev Neurosci 32:95–126.
  19. Dodd ML, Klos KJ, Bower JH, Geda YE, Josephs KA, Ahlskog JE (2005) Pathological gambling caused by drugs used to treat Parkinson's disease. Arch Neurol 62:1377–1381.
  20. Frank MJ, O'Reilly RC (2006) A mechanistic account of striatal dopamine function in human cognition: psychopharmacological studies with cabergoline and haloperidol. Behav Neurosci 120:497–517.
  21. Frank MJ, Seeberger LC, O'Reilly RC (2004) By carrot or by stick: cognitive reinforcement learning in Parkinsonism. Science 306:1940–1943.
  22. Frank MJ, Moustafa AA, Haughey HM, Curran T, Hutchison KE (2007a) Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning. Proc Natl Acad Sci U S A 104:16311–16316.
  23. Frank MJ, Samanta J, Moustafa AA, Sherman SJ (2007b) Hold your horses: impulsivity, deep brain stimulation, and medication in Parkinsonism. Science 318:1309–1312.
  24. Gotham AM, Brown RG, Marsden CD (1988) “Frontal” cognitive function in patients with Parkinson's disease “on” and “off” levodopa. Brain 111:299–321.
  25. Harden DG, Grace AA (1995) Activation of dopamine cell firing by repeated l-DOPA administration to dopamine-depleted rats: its potential role in mediating the therapeutic response to l-DOPA treatment. J Neurosci 15:6157–6166.
  26. Herrnstein RJ (1961) Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav 4:267–272.
  27. Hoehn MM, Yahr MD (1967) Parkinsonism: onset, progression and mortality. Neurology 17:427–442.
  28. Hollerman JR, Schultz W (1998) Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1:304–309.
  29. Hornykiewicz O (1974) Mechanisms of action of l-DOPA in Parkinson's disease. Life Sci 15:1249–1259.
  30. Keller RW Jr., Kuhr WG, Wightman RM, Zigmond MJ (1988) The effect of l-DOPA on in vivo dopamine release from nigrostriatal bundle neurons. Brain Res 447:191–194.
  31. Kennerley SW, Walton ME, Behrens TE, Buckley MJ, Rushworth MF (2006) Optimal decision making and the anterior cingulate cortex. Nat Neurosci 9:940–947.
  32. Kish SJ, Shannak K, Hornykiewicz O (1988) Uneven patterns of dopamine loss in the striatum of patients with idiopathic Parkinson's disease. N Engl J Med 318:876–880.
  33. Knowlton BJ, Mangels JA, Squire LR (1996) A neostriatal habit learning system in humans. Science 273:1399–1402.
  34. Lang AE, Fahn S (1989) Assessment of Parkinson's disease, in Quantification of neurologic deficit, ed Munsat TL (Butterworth-Heinemann, Boston), pp 285–309.
  35. Lau B, Glimcher PW (2005) Dynamic response-by-response models of matching behavior in rhesus monkeys. J Exp Anal Behav 84:555–579.
  36. Lau B, Glimcher PW (2008) Value representations in the primate striatum during matching behavior. Neuron 58:451–463.
  37. Lees AJ, Smith E (1983) Cognitive deficits in the early stages of Parkinson's disease. Brain 106:257–270.
  38. Lewis SJ, Slabosz A, Robbins TW, Barker RA, Owen AM (2005) Dopaminergic basis for deficits in working memory but not attentional set-shifting in Parkinson's disease. Neuropsychologia 43:823–832.
  39. Molina JA, Sáinz-Artiga MJ, Fraile A, Jiménez-Jiménez FJ, Villanueva C, Ortí-Pareja M, Bermejo F (2000) Pathologic gambling in Parkinson's disease: a behavioral manifestation of pharmacologic treatment? Mov Disord 15:869–872.
  40. Nakahara H, Itoh H, Kawagoe R, Takikawa Y, Hikosaka O (2004) Dopamine neurons can represent context-dependent prediction error. Neuron 41:269–280.
  41. Niv Y, Montague PR (2009) Theoretical and empirical studies of learning, in Neuroeconomics: decision making and the brain, eds Glimcher PW, Camerer CF, Fehr E, Poldrack RA (Academic, New York), pp 331–351.
  42. Niv Y, Daw ND, Joel D, Dayan P (2007) Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology (Berl) 191:507–520.
  43. Owen AM, Roberts AC, Hodges JR, Summers BA, Polkey CE, Robbins TW (1993) Contrasting mechanisms of impaired attentional set-shifting in patients with frontal lobe damage or Parkinson's disease. Brain 116:1159–1175.
  44. Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD (2006) Dopamine-dependent prediction errors underpin reward-seeking behavior in humans. Nature 442:1042–1045.
  45. Platt ML, Glimcher PW (1999) Neural correlates of decision variables in parietal cortex. Nature 400:233–238.
  46. Rescorla RA, Wagner AR (1972) A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement, in Classical conditioning II: current research and theory, eds Black AH, Prokasy WF (Appleton, New York), pp 64–99.
  47. Samejima K, Ueda Y, Doya K, Kimura M (2005) Representation of action-specific reward values in the striatum. Science 310:1337–1340.
  48. Schönberg T, Daw ND, Joel D, O'Doherty JP (2007) Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making. J Neurosci 27:12860–12867.
  49. Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599.
  50. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464.
  51. Serences JT (2008) Value-based modulations in human visual cortex. Neuron 60:1169–1181.
  52. Shohamy D, Myers CE, Grossman S, Sage J, Gluck MA, Poldrack RA (2004) Cortico-striatal contributions to feedback-based learning: converging data from neuroimaging and neuropsychology. Brain 127:851–859.
  53. Shohamy D, Myers CE, Geghman KD, Sage J, Gluck MA (2006) l-DOPA impairs learning, but spares generalization, in Parkinson's disease. Neuropsychologia 44:774–784.
  54. Shohamy D, Myers CE, Hopkins RO, Sage J, Gluck MA (2009) Distinct hippocampal and basal ganglia contributions to probabilistic learning and reversal. J Cogn Neurosci 21:1820–1832.
  55. Slabosz A, Lewis SJ, Smigasiewicz K, Szymura B, Barker RA, Owen AM (2006) The role of learned irrelevance in attentional set-shifting impairments in Parkinson's disease. Neuropsychology 20:578–588.
  56. Stark AK, Pakkenberg B (2004) Histological changes of the dopaminergic nigrostriatal system in aging. Cell Tissue Res 318:81–92.
  57. Sugrue LP, Corrado GS, Newsome WT (2004) Matching behavior and the representation of value in the parietal cortex. Science 304:1782–1787.
  58. Suri RE, Schultz W (1999) A neural network learns a spatial delayed response task with a dopamine-like reinforcement signal. Neuroscience 91:871–890.
  59. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction (MIT, Cambridge, MA).
  60. Swainson R, Rogers RD, Sahakian BJ, Summers BA, Polkey CE, Robbins TW (2000) Probabilistic learning and reversal deficits in patients with Parkinson's disease or frontal or temporal lobe lesions: possible adverse effects of dopaminergic medication. Neuropsychologia 38:596–612.
  61. Voon V, Hassan K, Zurowski M, Duff-Canning S, de Souza M, Fox S, Lang AE, Miyasaki J (2006) Prospective prevalence of pathologic gambling and medication association in Parkinson disease. Neurology 66:1750–1752.
  62. Wightman RM, Amatore C, Engstrom RC, Hale PD, Kristensen EW, Kuhr WG, May LJ (1988) Real-time characterization of dopamine overflow and uptake in the rat striatum. Neuroscience 25:513–523.