Deep and beautiful. The reward prediction error hypothesis of dopamine
Introduction
According to the reward-prediction error hypothesis of dopamine (RPEH), the phasic activity of dopaminergic neurons in specific regions of the midbrain signals a discrepancy between the predicted and currently experienced reward of a particular event. The RPEH is widely regarded as one of the greatest successes of computational neuroscience. Terrence Sejnowski, a pioneer in computational neuroscience and prominent cognitive scientist, pointed to the RPEH when, in 2012, he was invited by the online magazine Edge.org to answer the question “What is your favorite deep, elegant, or beautiful explanation?” Many researchers in the cognitive and brain sciences would agree that this hypothesis “has become the standard model [for explaining dopaminergic activity and reward-based learning] within neuroscience” (Caplin & Dean, 2008, p. 663). Even among critics, the “stunning elegance” and “beautiful rigor” of the RPEH are recognized (Berridge, 2007, pp. 399, 403).
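The discrepancy the hypothesis refers to can be written as a single quantity: the difference between the reward an event delivers and the reward the system predicted for it. The following is a minimal, purely illustrative sketch of such an error-driven update; the reward value, learning rate, and trial count are arbitrary assumptions, not taken from the paper.

```python
# Minimal reward-prediction-error sketch (illustrative; parameters arbitrary).
# A fixed reward repeatedly follows an event; the prediction V is nudged
# toward the outcome on each trial, so the error shrinks as the reward
# becomes expected -- the qualitative profile the RPEH ascribes to phasic
# dopaminergic activity.

def run_trials(reward=1.0, alpha=0.2, n_trials=20):
    V = 0.0                  # current reward prediction
    errors = []
    for _ in range(n_trials):
        delta = reward - V   # prediction error: experienced minus predicted
        V += alpha * delta   # error-driven update of the prediction
        errors.append(delta)
    return errors

errors = run_trials()
# errors[0] == 1.0 (reward fully unpredicted); errors[-1] is near 0 (fully predicted)
```

On this picture, a large positive error early in learning corresponds to a phasic burst to an unexpected reward, and a near-zero error late in learning corresponds to the muted response once the reward is predicted.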
However, the type of information coded by dopaminergic transmission, along with its functional role in cognition and behaviour, very likely goes beyond reward-prediction error. The RPEH is not the only available hypothesis about what type of information is encoded by dopaminergic activity in the midbrain (cf. Berridge, 2007, Friston et al., 2012, Graybiel, 2008, Wise, 2004). Current evidence does not speak univocally in favour of this hypothesis, and disagreement remains about the extent to which the RPEH is supported by the available evidence (Dayan and Niv, 2008, O’Doherty, 2012, Redgrave and Gurney, 2006). On the one hand, it has been claimed that “to date no alternative has mustered as convincing and multidirectional experimental support as the prediction-error theory of dopamine” (Niv & Montague, 2009, p. 342; see also Glimcher, 2011, Niv, 2009); on the other hand, it has been countered that the RPEH is an “elegant illusion” and that “[s]o far, incentive salience predictions [that is, predictions of an alternative hypothesis about dopamine] appear to best fit the data from situations that explicitly pit the dopamine hypotheses against each other” (Berridge, 2007, p. 424).
How, then, has the RPEH become so successful? What exactly does it explain? And, granted that it is at least intuitively uncontroversial that the RPEH is beautiful and elegant, in what sense can it justifiably be deemed deeper than alternatives? The present paper addresses these questions by first reconstructing the main historical events that led to the formulation and subsequent success of the RPEH (Section 2).
With this historical account as background, the paper elucidates what the RPEH explains and how, contrasting it with the incentive salience hypothesis, arguably its most prominent current alternative. It clarifies that both hypotheses are concerned only with what type of information is encoded by dopaminergic activity. Specifically, the RPEH has the dual role of accurately describing the dynamic profile of phasic dopaminergic activity in the midbrain during reward-based learning and decision-making, and of explaining this profile by citing the representational role of dopaminergic phasic activity. If the RPEH is true, then a mechanism composed of midbrain dopaminergic neurons and their phasic activity carries out the task of learning what to do in the face of expected rewards, generating decisions accordingly (Section 3).
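One standard way of rendering this dual role computationally is an actor-critic arrangement, in which a single scalar error signal both refines a reward prediction (the critic) and shapes action selection (the actor). The sketch below is a toy illustration of that general arrangement, not the paper's own model; the two actions, their payoffs, the exploration rate, and all other parameters are invented for the example.

```python
import random

# Illustrative actor-critic sketch of the 'dual role' described above:
# one scalar prediction error both updates a reward prediction (critic)
# and adjusts action preferences (actor). Actions, payoffs, and
# parameters are arbitrary assumptions, not taken from the paper.

random.seed(0)
prefs = {"lever_A": 0.0, "lever_B": 0.0}    # actor: action preferences
V = 0.0                                     # critic: expected reward
alpha = 0.1                                 # learning rate
true_reward = {"lever_A": 1.0, "lever_B": 0.0}

def choose(prefs):
    # Greedy choice with a little exploration (a softmax actor would be
    # more standard; this keeps the sketch short).
    if random.random() < 0.1:
        return random.choice(list(prefs))
    return max(prefs, key=prefs.get)

for _ in range(500):
    a = choose(prefs)
    r = true_reward[a]
    delta = r - V               # reward-prediction error
    V += alpha * delta          # critic: update the prediction
    prefs[a] += alpha * delta   # actor: reinforce actions that beat expectations

# After training, the rewarded action is preferred and V tracks its payoff.
```

The point of the sketch is only that the same error term drives both updates, which is the computational sense in which a reward-prediction error signal could support learning what to do and generating decisions accordingly.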
The paper finally explicates under which conditions some explanation of learning, motivation or decision-making phenomena based on the RPEH can be justifiably deemed deeper than some alternative explanation based on the incentive salience hypothesis. Two accounts of explanatory depth are considered. According to one account, deeper explanatory generalizations have wider scope (e.g., Hempel, 1959); according to the other, deeper explanatory generalizations show more degrees of invariance (e.g., Woodward & Hitchcock, 2003). It is argued that, although it is premature to maintain that explanations based on the RPEH are actually deeper—in either of these two senses of explanatory depth—than alternative explanations based on the incentive salience hypothesis, relevant available evidence indicates that they may well be (Section 4). The contribution of the paper to existing literature is summarised in the conclusion.
Section snippets
Reward-prediction error meets dopamine
Dopamine is a neurotransmitter in the brain.
Reward-prediction error and incentive salience: what do they explain?
In light of Montague et al. (1996) and Schultz et al. (1997), the RPEH can now be more precisely characterised. The hypothesis states that the phasic firing of dopaminergic neurons in the ventral tegmental area and substantia nigra “in part” encodes reward-prediction errors. Montague and colleagues did not claim that all types of activity in all dopaminergic neurons encode only (or in all circumstances) reward-prediction errors. Their hypothesis is about “a particular relationship between the causes
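The dynamic profile these studies modelled, a phasic response that migrates from the time of reward delivery to the time of the earliest predictive cue, falls out of a temporal-difference error, delta(t) = r(t) + gamma * V(t+1) - V(t). The sketch below shows this under simplifying assumptions: a tabular value for each time step within a trial, with cue time, reward time, and parameters chosen arbitrarily for illustration.

```python
# Temporal-difference sketch of the classic transfer of the prediction
# error: before learning it peaks when the reward arrives; after learning
# it peaks at the predictive cue. Trial structure and parameters are
# illustrative assumptions.

T = 10                    # time steps per trial
CUE, REWARD = 2, 8        # cue onset and reward delivery times
alpha, gamma = 0.1, 1.0   # learning rate and temporal discount

V = [0.0] * (T + 1)       # value estimate for each within-trial time step

def run_trial(V):
    deltas = [0.0] * T
    for t in range(T):
        r = 1.0 if t == REWARD else 0.0
        deltas[t] = r + gamma * V[t + 1] - V[t]  # TD prediction error
        if t > CUE:                   # the cue itself arrives unpredictably,
            V[t] += alpha * deltas[t] # so pre-cue values stay at zero
    return deltas

first = run_trial(V)
for _ in range(500):
    last = run_trial(V)

# First trial: the error peaks at REWARD; after training it peaks at CUE.
```

This is the signature pattern reported for phasic dopaminergic firing in the conditioning experiments discussed above, which is why the temporal-difference formulation became the standard mathematical expression of the hypothesis.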
Explanatory depth, reward-prediction error and incentive salience
A number of accounts of explanatory depth have recently been proposed in philosophy of science (e.g., Woodward and Hitchcock, 2003, Strevens, 2009, Weslake, 2010). While significantly different, these accounts agree that explanatory depth is a feature of generalizations that express the relationship between an explanans and an explanandum.
According to Woodward and Hitchcock (2003), in order to be genuinely explanatory, a generalization should exhibit patterns of counterfactual dependence
Conclusion
This paper has made two types of contributions to existing literature, which should be of interest to both historians and philosophers of cognitive science. First, the paper has provided a comprehensive historical overview of the main steps that have led to the formulation of the RPEH. Second, in light of this historical overview, it has made explicit what precisely the RPEH and the ISH explain, and under which circumstances neurocomputational explanations of learning and decision-making
Acknowledgements
I am sincerely grateful to Aistis Stankevicius, Charles Rathkopf, Peter Dayan, and especially to Gregory Radick, editor of this journal, and to two anonymous referees, for their encouragement, constructive criticisms and helpful suggestions. The work on this project was supported by the Deutsche Forschungsgemeinschaft (DFG) as part of the priority program “New Frameworks of Rationality” ([SPP 1516]). The usual disclaimers about any remaining error or misconception in the paper apply.
References (110)
Theoretical neuroscience rising. Neuron (2008).
Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron (2005).
What is the role of dopamine in reward: Hedonic impact, reward learning, or incentive salience? Brain Research Reviews (1998).
Dopamine neuron systems in the brain: An update. Trends in Neurosciences (2007).
A map of the rat mesencephalon for electrical self-stimulation. Brain Research (1972).
Computational modelling. Current Opinion in Neurobiology (1994).
Twenty-five lessons from computational neuromodulation. Neuron (2012).
Reinforcement learning: The good, the bad and the ugly. Current Opinion in Neurobiology (2008).
Value-dependent selection in the brain: Simulation in a synthetic neural model. Neuroscience (1994).
Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: A hypothesis for the etiology of schizophrenia. Neuroscience (1991).
Actor-critic models of the basal ganglia: New anatomical and computational perspectives. Neural Networks.
Temporal prediction errors in a passive learning task activate human striatum. Neuron.
Computational neuroimaging: Monitoring reward learning with blood flow.
A computational substrate for incentive salience. Trends in Neurosciences.
Computational psychiatry. Trends in Cognitive Sciences.
Reinforcement learning in the brain. Journal of Mathematical Psychology.
Theoretical and empirical studies of learning.
Dialogues on prediction errors. Trends in Cognitive Sciences.
Temporal difference learning model accounts for responses in human ventral striatum and orbitofrontal cortex during Pavlovian appetitive learning. Neuron.
The neural basis of drug craving: An incentive-sensitization theory of addiction. Brain Research Reviews.
Chemistry of purposive behavior.
Multiple forms of value learning and the function of dopamine.
Statistics of midbrain dopamine neuron spike trains in the awake primate. Journal of Neurophysiology.
Predictability modulates human brain response to reward. Journal of Neuroscience.
The debate over dopamine’s role in reward: The case for incentive salience. Psychopharmacology (Berl).
Taste reactivity analysis of 6-OHDA aphagia without impairment of taste reactivity: Implications for theories of dopamine function. Behavioral Neuroscience.
A motivational view of learning, performance, and behavior modification. Psychological Review.
A mathematical model for simple learning. Psychological Review.
Computational capabilities of single neurons: Relationship to simple forms of associative and nonassociative learning in Aplysia.
Dopamine, reward prediction error, and economics. Quarterly Journal of Economics.
Measuring beliefs and rewards: A neuroeconomic approach. Quarterly Journal of Economics.
The occurrence, distribution, and physiological role of catecholamines in the nervous system. Pharmacological Reviews.
Morphologic and dynamic aspects of dopamine in the central nervous system.
A half-century of neurotransmitter research: Impact on neurology and psychiatry.
The computational brain.
Constitutive relevance and the personal/subpersonal distinction. Philosophical Psychology.
Behavioural aspects of dopamine agonists and antagonists.
Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience.
The functional role of mesotelencephalic dopamine systems. Biological Reviews of the Cambridge Philosophical Society.
Verteilung von Noradrenalin und Dopamin (3-Hydroxytyramin) im Gehirn des Menschen und ihr Verhalten bei Erkrankungen des extrapyramidalen Systems [Distribution of noradrenaline and dopamine (3-hydroxytyramine) in the human brain and their behaviour in diseases of the extrapyramidal system]. Klinische Wochenschrift.
Drugs and reinforcement mechanisms: A critical review of the catecholamine theory. Annual Review of Pharmacology and Toxicology.
Dopamine, affordance and active inference. PLoS Computational Biology.
The logic of Limax learning.
Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis. Proceedings of the National Academy of Sciences USA.
Habits, rituals and the evaluative brain. Annual Review of Neuroscience.
An identified neuron mediates the unconditioned stimulus in associative olfactory learning in honeybees. Nature.
Is there a cell-biological alphabet for simple forms of learning? Psychological Review.
The logic of functional analysis.
The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review.
Dopamine (3-hydroxytyramine) and brain function. Pharmacological Reviews.
Cited by (31)
Interoception as modeling, allostasis as control. Biological Psychology (2022).
Bidirectional regulation of reward, punishment, and arousal by dopamine, the lateral habenula and the rostromedial tegmentum (RMTg). Current Opinion in Behavioral Sciences (2019). Citation excerpt: “The findings that the RMTg and LHb display inverse firing patterns relative to classic DA RPEs suggested that the function of these regions might be understood in relation to the hypothesized role of DA RPEs in learning and motivation. For two decades, DA RPE signals have been posited to serve as ‘teaching signals’ that drive synaptic plasticity in downstream targets [2,26–30]. In an operant task, such as lever pressing for food or drug, increased DA would thus be hypothesized to reinforce just-completed actions, increasing their likelihood of occurrence at future times and places, while reductions in DA would do the opposite, reducing the likelihood of an action’s future occurrence.”
A working memory model improves cognitive control in agents and robots. Cognitive Systems Research (2018).
HIT and brain reward function: A case of mistaken identity (theory). Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences (2017).
Striatal dopamine D1 receptor suppression impairs reward-associative learning. Behavioural Brain Research (2017). Citation excerpt: “Finally, the lack of D1R agonist-induced hyperactivity in the D1R-suppressed mice – observed in GFP-treated mice – supports the D1R specificity of our D1RshRNA. DA is vital for learning, as suggested by the DA reward prediction error hypothesis [7,9]. Numerous aspects of learning exist however, and so identifying the mechanisms underlying the specific learning deficits seen in neuropsychiatric patients is important, such as impaired reward-associative learning in the probabilistic task in patients with schizophrenia [53].”
Explanatory pluralism: An unrewarding prediction error for free energy theorists. Brain and Cognition (2017). Citation excerpt: “Neither does it claim that prediction errors can only be computed by DA operations, nor that all learning and action selection is executed using reward prediction errors or is dependent on DA activity. Given these caveats, RPE, which is arguably a major success story of computational neuroscience (Colombo, 2014), may be reducible to PTB only insofar as DA operations other than encoding reward prediction errors are neglected. But what does PTB claim, exactly, about DA?”