Journal Club

A Role for the Human Substantia Nigra in Reinforcement Learning

Archy O. de Berker and Robb B. Rutledge
Journal of Neuroscience 24 September 2014, 34 (39) 12947-12949; https://doi.org/10.1523/JNEUROSCI.2854-14.2014
Archy O. de Berker (1,2) and Robb B. Rutledge (2,3)

1 Sobell Department of Motor Neuroscience and Movement Disorders, University College London, London WC1N 3BG, United Kingdom
2 Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, United Kingdom
3 Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, United Kingdom

In a world rich with stimuli and potential actions, organisms must learn which objects are rewarding and which actions produce rewards. Dopamine neurons may play a key role in learning the values of stimuli and actions by representing reward prediction errors (RPEs), the difference between experienced and predicted reward. Although many studies report that phasic activity of dopamine neurons in the ventral tegmental area (VTA) and substantia nigra (SN) represents RPEs in humans (Zaghloul et al., 2009) and other animals (Schultz et al., 1997; Cohen et al., 2012), only recently has evidence emerged to support an instrumental role for phasic dopamine in reinforcement learning (Steinberg et al., 2013). However, it remains unclear whether the phasic activity of VTA and SN dopamine neurons plays different roles in reinforcement learning. This issue is of particular importance given the selective deterioration of SN dopamine neurons in Parkinson's disease (Kish et al., 1988).
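For concreteness, the RPE on trial t can be written in its standard Rescorla–Wagner/temporal-difference form, in which the error between received and predicted reward drives a learning-rate-scaled update of the value estimate (a generic textbook formulation, not the specific model of any paper discussed here):

```latex
\delta_t = r_t - V_t, \qquad V_{t+1} = V_t + \alpha\,\delta_t
```

Here alpha is a learning rate between 0 and 1: outcomes better than expected (positive delta) increase the value estimate, and outcomes worse than expected decrease it.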

Ramayya et al. (2014) recently reported the results of a study addressing this important question. During deep brain stimulation (DBS) surgery to treat Parkinson's disease, patients routinely undergo recording and microstimulation of SN neurons to aid surgeons with DBS electrode placement in the nearby subthalamic nucleus. These operations provided the opportunity to record SN neurons during a probability-learning task and to use electrical microstimulation of those neurons to manipulate behavior.

In three blocks of 50 trials each, subjects made 25 choices with one stimulus pair and 25 choices with another stimulus pair, with the two pairs presented in alternating trains of three to six trials. One stimulus in each pair was associated with a high probability of reward (the “high-probability stimulus”) and the other stimulus was associated with a low probability of reward (the “low-probability stimulus”). Importantly, the left–right configuration of the stimuli in each pair was random, and therefore successful task performance required learning the values of stimuli and not the values of actions (i.e., left or right button presses).
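To make the task structure concrete, the following sketch generates one block of trials as described above. The reward probabilities p_high and p_low are illustrative placeholders; this commentary does not state the exact probabilities used by Ramayya et al. (2014).

```python
import random

def make_block(n_trials=50, p_high=0.7, p_low=0.3, seed=0):
    """One block: two stimulus pairs shown in alternating trains of 3-6
    trials (25 trials per pair), with the left-right placement of each
    pair's stimuli randomized on every trial so that only stimulus values,
    not action values, predict reward."""
    rng = random.Random(seed)
    per_pair = n_trials // 2
    counts = [0, 0]
    pair = rng.randrange(2)                      # which pair starts is arbitrary
    trials = []
    while len(trials) < n_trials:
        for _ in range(rng.randint(3, 6)):       # a train of 3-6 trials
            if counts[pair] == per_pair:
                break
            trials.append({
                "pair": pair,
                "high_on_left": rng.random() < 0.5,          # random configuration
                "p_reward": {"high": p_high, "low": p_low},  # assumed values
            })
            counts[pair] += 1
        pair = 1 - pair                          # switch to the other pair
    return trials
```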

In the first block, subjects on average made correct responses in 63% of trials. At the same time, the authors recorded the waveform and phasic spike response to positive feedback for a single SN neuron in each subject. In the second block, phasic microstimulation was applied coincident with all positive feedback resulting from the high-probability stimulus for one of the two stimulus pairs (the STIMPOS pair); this outcome, as a positive RPE, should be associated with an increase in phasic dopamine (Hart et al., 2014). Thus, microstimulation is expected to further increase phasic dopamine release associated with the positive RPE. In the third and final block, phasic microstimulation was applied coincident with all negative feedback from the low-probability stimulus for one of the two stimulus pairs (the STIMNEG pair); this outcome, as a negative RPE, should be associated with a decrease in phasic dopamine (Hart et al., 2014). Thus, microstimulation might counteract the phasic dopamine decrease normally associated with a negative RPE.

The authors reported that correct task performance decreased for the STIMPOS stimulus pair, suggesting that SN microstimulation does not enhance the learning of stimulus values needed to perform the task. This immediately suggests that the SN plays a different role in reinforcement learning from the VTA, where microstimulation potentiates stimulus–reward learning (Steinberg et al., 2013; Arsenault et al., 2014). To test whether this decline in performance was related to an increased emphasis on learning action–reward rather than stimulus–reward associations, the authors developed a hybrid action–stimulus learning model (Fig. 1A). In this model, stimulus and action values are learned separately and then combined, using a weighting parameter, into an aggregate value for each action–stimulus combination. In model simulations, increasing the weighting parameter to give action values a greater influence on combined values resulted in a decline in STIMPOS performance. Model simulations were also consistent with other features of the data, such as a significant correlation across subjects between the decrease in STIMPOS accuracy and the probability of repeating the same action (i.e., left or right button press) after positive feedback, a probability that depends on the extent to which action values influence decision making.
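The following is a minimal sketch of the hybrid model as we read it from the text: stimulus values and action values are learned separately and mixed by the weighting parameter at choice time. The class and variable names (q_stim, q_act, w_a) and the softmax choice rule are our assumptions, not the authors' code.

```python
import numpy as np

class HybridLearner:
    """Hybrid action-stimulus learner: decisions use a weighted mix of
    stimulus and action values; each value is updated by its own RPE."""

    def __init__(self, alpha=0.2, beta=0.2, w_a=0.5, n_pairs=2, seed=0):
        self.alpha, self.beta, self.w_a = alpha, beta, w_a
        self.q_stim = np.zeros((n_pairs, 2))   # per pair: [high, low] stimulus
        self.q_act = np.zeros(2)               # [left, right] button press
        self.rng = np.random.default_rng(seed)

    def choose(self, pair, high_on_left):
        # stimulus index 0 = high-probability stimulus, 1 = low-probability
        stim_by_side = (0, 1) if high_on_left else (1, 0)
        combined = np.array([
            (1 - self.w_a) * self.q_stim[pair, stim_by_side[a]]
            + self.w_a * self.q_act[a]
            for a in (0, 1)                    # a = 0 left, a = 1 right
        ])
        z = self.beta * combined
        p = np.exp(z - z.max())
        p /= p.sum()                           # softmax over the two sides
        action = self.rng.choice(2, p=p)
        return action, stim_by_side[action]

    def update(self, pair, stim, action, reward):
        # separate RPEs update stimulus and action values independently
        self.q_stim[pair, stim] += self.alpha * (reward - self.q_stim[pair, stim])
        self.q_act[action] += self.alpha * (reward - self.q_act[action])
```

With w_a = 0, choices depend only on stimulus values and the task is learnable; increasing w_a lets the task-irrelevant action values intrude on choice, which is how an increased weighting parameter could produce the observed STIMPOS deficit.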

Figure 1.

Model simulations for microstimulation effects on reinforcement learning. A, The hybrid action–stimulus learning model of Ramayya et al. (2014) features separate stimulus values (Q_stimulus) and action values (Q_action) that are aggregated with a weighting parameter W_A into combined values (Q_(stimulus, action)) used to make decisions. Stimulation is suggested to increase W_A (yellow lightning bolt), thereby increasing the influence of action values on decisions. An alternative possibility (red lightning bolt) is that stimulation increases the action-value RPEs used to update action values. B, In the STIMPOS condition, stimulation coincides with positive feedback from the high-value stimulus of the STIMPOS pair. If phasic dopamine activity represents action-value RPEs, then as the magnitude of SN dopamine neuron stimulation increases, the average difference between the two action values increases (red) and action values exert a greater influence on decision making relative to stimulus values (blue). As the influence of action values increases, performance in a stimulus-value learning task decreases (green). C, In the STIMNEG condition, stimulation coincides with negative feedback from the low-value stimulus of the STIMNEG pair. At low magnitudes, stimulation does not increase the influence of action values on decision making. Higher stimulation magnitudes are necessary to obtain deficits comparable to the STIMPOS condition.

The authors also found that task performance did not decrease for the STIMNEG pair, but their model simulations did not offer an explanation for this surprising result. We therefore explored an alternative interpretation of the results using the same hybrid model but with microstimulation increasing the action-value RPE term that SN dopamine neurons might represent, instead of the weighting parameter (Fig. 1A). We generated simulations using the hybrid model with the same parameters (α = 0.2, β = 0.2) and equal weighting for stimulus and action values (WA = 0.5).
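Under this alternative reading, the only change is at the outcome update: microstimulation adds a bonus to the action-value RPE and leaves stimulus-value learning untouched. A sketch of that single step follows; the additive form of the stimulation effect, and all names, are our assumptions.

```python
import numpy as np

def outcome_update(q_stim, q_act, pair, stim, action, reward,
                   stim_mag=0.0, alpha=0.2):
    """One feedback update in the alternative scheme (red lightning bolt
    in Fig. 1A): stim_mag > 0 on stimulated trials inflates the chosen
    action's RPE; stimulus-value learning proceeds normally."""
    stim_rpe = reward - q_stim[pair, stim]           # stimulus-value RPE
    act_rpe = (reward - q_act[action]) + stim_mag    # boosted action-value RPE
    q_stim[pair, stim] += alpha * stim_rpe
    q_act[action] += alpha * act_rpe

# example: a stimulated, rewarded trial with the high-probability stimulus
q_stim = np.zeros((2, 2))
q_act = np.zeros(2)
outcome_update(q_stim, q_act, pair=0, stim=0, action=1, reward=1.0, stim_mag=0.5)
```

In the STIMPOS block, stim_mag is applied on rewarded high-probability trials, inflating the chosen action's value and widening the gap between the two action values; in the STIMNEG block it is applied on unrewarded low-probability trials, where it merely offsets a negative RPE, which is why modest magnitudes produce little deficit.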

Using this approach, we reproduced the main features of the data, including a greater decrease in accuracy for the STIMPOS pair (Fig. 1B) than for the STIMNEG pair (Fig. 1C). For example, a stimulation magnitude sufficient to elicit a 12% accuracy decrease for the STIMPOS pair led to only a 4% accuracy decrease for the STIMNEG pair. This asymmetry has a simple explanation: negative feedback should normally weaken action–reward associations, but stimulation during that feedback might counteract negative action-value RPEs. Paradoxically, small STIMNEG microstimulation magnitudes could actually improve performance in a task that depends only on stimulus values, because they might reduce the average difference in value between the actions, and thus the influence of action values on choice (Fig. 1C). Our alternative approach also has the benefit of obviating the need for the brain to store combined action–stimulus values. Instead, action values and stimulus values can be combined during the decision process, which may be more parsimonious than updating action–stimulus values using a weighting parameter that is only affected by stimulation at the time of outcome delivery. Further research will be needed to determine exactly how the weighting parameter exerts its effect.

The results of Ramayya et al. (2014) suggest that SN dopamine neurons represent action-value RPEs, complementing the possibility that VTA dopamine neurons represent stimulus-value RPEs. This division of labor between the SN and VTA is consistent with the distinct patterns of inputs to VTA and SN dopamine neurons (Watabe-Uchida et al., 2012), with recent findings that stimulating VTA dopamine neurons strengthens stimulus–reward associations (Steinberg et al., 2013; Arsenault et al., 2014), and with the finding that optogenetic stimulation of VTA and SN dopamine neurons has similar effects on operant place preference in mice when, as in many tasks, stimulus-value and action-value learning make similar behavioral predictions (Ilango et al., 2014).

How might the role of the SN in reinforcement learning be further tested? If the SN plays a role in action-value but not stimulus-value learning, microstimulation of SN dopamine neurons in Parkinson's patients should improve STIMPOS performance in tasks that could be learned by either mechanism, and in tasks in which feedback depends only on actions. Furthermore, electrophysiological recordings during these tasks could reveal whether VTA and SN dopamine activity represent different types of RPEs or reflect the weighting parameter used in different tasks.

A prominent proposal (the actor–critic model) suggests that stimulus-value RPEs represented by dopaminergic projections from the VTA to the ventral striatum (the “critic”) underlie learning of stimuli or states necessary for Pavlovian conditioning. The same stimulus-value RPEs are then used to inform instrumental choice, training an “actor” putatively located in the dorsal striatum (Balleine et al., 2008). Neuroimaging (O'Doherty et al., 2004) and pharmacological (Piray et al., 2014) studies support this distinction, but it is unclear what the role of SN dopamine neurons might be in this scheme, since electrophysiological results suggest that SN dopamine neurons do not represent stimulus-value RPEs (Morris et al., 2006). The hybrid action–stimulus learning model introduced by Ramayya et al. (2014) differs from the actor–critic model in that values associated with actions can be learned in isolation from stimulus values (although it remains unclear whether this learning is entirely stimulus-independent or whether action values are separately linked to each stimulus pair). If SN dopamine neurons are not used to learn stimulus values, as these findings suggest, SN microstimulation might have no effect on Pavlovian phenomena such as vigor, which might be reflected in reaction times. Further studies might use microstimulation of the VTA and SN to explicitly compare the predictions of these differing models of basal ganglia function.
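For contrast, a generic tabular actor–critic update looks like this: a single stimulus-value RPE computed by the critic both updates state values and trains the actor's action preferences, so action learning is never independent of stimulus learning. This is a textbook sketch (cf. Balleine et al., 2008), not the exact model of any paper discussed here.

```python
import numpy as np

def actor_critic_step(v, prefs, state, action, reward,
                      alpha_critic=0.1, alpha_actor=0.1):
    """One actor-critic update: the critic's RPE (delta) drives both
    state-value learning ("critic", ventral striatum) and action-preference
    learning ("actor", dorsal striatum)."""
    delta = reward - v[state]                    # stimulus-value RPE
    v[state] += alpha_critic * delta             # critic update
    prefs[state, action] += alpha_actor * delta  # actor trained by same RPE
    return delta

# example: one rewarded trial in state 0 after taking action 1
v = np.zeros(2)
prefs = np.zeros((2, 2))
actor_critic_step(v, prefs, state=0, action=1, reward=1.0)
```

In the hybrid model of Ramayya et al. (2014), by contrast, action values are updated by their own action-value RPE and could in principle be learned without reference to stimulus values at all.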

The finding that changes in accuracy can be understood as changes in the weighting of action and stimulus values raises some interesting possibilities. Early-stage nonmedicated Parkinson's patients might show deficits in an action-value learning task due to loss of SN dopamine neurons but might actually outperform control subjects in a stimulus-value learning task due to a reduced influence of action values on their choices. The weighting parameter introduced in this study may also provide an informative way of characterizing behavior, providing a behavioral assay that may relate to patterns of dopamine depletion in Parkinson's disease.

The demonstration by Ramayya et al. (2014) of a role for the human SN in action-value learning is an important advance in our understanding of reinforcement learning in humans. These results suggest several interesting directions for future research that might clarify the precise roles of VTA and SN dopamine neurons and at the same time advance our understanding of Parkinson's disease and the effects of dopaminergic drugs on behavior.

Footnotes

  • Editor's Note: These short, critical reviews of recent papers in the Journal, written exclusively by graduate students or postdoctoral fellows, are intended to summarize the important findings of the paper and provide additional insight and commentary. For more information on the format and purpose of the Journal Club, please see http://www.jneurosci.org/misc/ifa_features.shtml.

  • This work was supported by the Medical Research Council (A.O.d.B.) and the Max Planck Society (R.B.R.). The Wellcome Trust Centre for Neuroimaging is supported by core funding from Wellcome Trust Grant 091593/Z/10/Z. We thank Peter Dayan for helpful comments.

  • Correspondence should be addressed to either of the following: Archy O. de Berker, Sobell Department of Motor Neuroscience and Movement Disorders, 33 Queen Square, London WC1N 3BG, United Kingdom, archy.berker.12@ucl.ac.uk; or Robb B. Rutledge, Wellcome Trust Centre for Neuroimaging, 12 Queen Square, London WC1N 3BG, United Kingdom, robb.rutledge@ucl.ac.uk

References

  1. Arsenault JT, Rima S, Stemmann H, Vanduffel W (2014) Role of the primate ventral tegmental area in reinforcement and motivation. Curr Biol 24:1347–1353. doi:10.1016/j.cub.2014.04.044, pmid:24881876.
  2. Balleine BW, Daw ND, O'Doherty JP (2008) Multiple forms of value learning and the function of dopamine. In: Neuroeconomics: decision-making and the brain (Glimcher PW, Camerer C, Fehr E, Poldrack RA, eds), pp 367–385. New York: Academic.
  3. Cohen JY, Haesler S, Vong L, Lowell BB, Uchida N (2012) Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482:85–88. doi:10.1038/nature10754, pmid:22258508.
  4. Hart AS, Rutledge RB, Glimcher PW, Phillips PE (2014) Phasic dopamine release in the rat nucleus accumbens symmetrically encodes a reward prediction error term. J Neurosci 34:698–704. doi:10.1523/JNEUROSCI.2489-13.2014, pmid:24431428.
  5. Ilango A, Kesner AJ, Keller KL, Stuber GD, Bonci A, Ikemoto S (2014) Similar roles of substantia nigra and ventral tegmental area dopamine neurons in reward and aversion. J Neurosci 34:817–822. doi:10.1523/JNEUROSCI.1703-13.2014, pmid:24431440.
  6. Kish SJ, Shannak K, Hornykiewicz O (1988) Uneven pattern of dopamine loss in the striatum of patients with idiopathic Parkinson's disease: pathophysiologic and clinical implications. N Engl J Med 318:876–880. doi:10.1056/NEJM198804073181402, pmid:3352672.
  7. Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H (2006) Midbrain dopamine neurons encode decisions for future action. Nat Neurosci 9:1057–1063. doi:10.1038/nn1743, pmid:16862149.
  8. O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ (2004) Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304:452–454. doi:10.1126/science.1094285, pmid:15087550.
  9. Piray P, Zeighami Y, Bahrami F, Eissa AM, Hewedi DH, Moustafa AA (2014) Impulse control disorders in Parkinson's disease are associated with dysfunction in stimulus valuation but not action valuation. J Neurosci 34:7814–7824. doi:10.1523/JNEUROSCI.4063-13.2014, pmid:24899705.
  10. Ramayya AG, Misra A, Baltuch GH, Kahana MJ (2014) Microstimulation of the human substantia nigra alters reinforcement learning. J Neurosci 34:6887–6895. doi:10.1523/JNEUROSCI.5445-13.2014, pmid:24828643.
  11. Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599. doi:10.1126/science.275.5306.1593, pmid:9054347.
  12. Steinberg EE, Keiflin R, Boivin JR, Witten IB, Deisseroth K, Janak PH (2013) A causal link between prediction errors, dopamine neurons and learning. Nat Neurosci 16:966–973. doi:10.1038/nn.3413, pmid:23708143.
  13. Watabe-Uchida M, Zhu L, Ogawa SK, Vamanrao A, Uchida N (2012) Whole-brain mapping of direct inputs to midbrain dopamine neurons. Neuron 74:858–873. doi:10.1016/j.neuron.2012.03.017, pmid:22681690.
  14. Zaghloul KA, Blanco JA, Weidemann CT, McGill K, Jaggi JL, Baltuch GH, Kahana MJ (2009) Human substantia nigra neurons encode unexpected financial rewards. Science 323:1496–1499. doi:10.1126/science.1167342, pmid:19286561.