Abstract
Adapting decision making to dynamic and probabilistic changes in action-reward contingencies is critical for survival in a competitive and resource-limited world. Much research has focused on elucidating the neural systems and computations by which the brain identifies whether the consequences of actions are relatively good or bad. In contrast, less empirical work has examined the mechanisms by which reinforcements might be used to guide decision making. Here, I review recent studies that attempt to bridge this gap by characterizing how humans use reward information to guide and optimize decision making. Regions implicated in reinforcement processing, including the striatum, orbitofrontal cortex, and anterior cingulate, also seem to mediate how reinforcements are used to adjust subsequent decision making. This research provides insights into why the brain devotes resources to evaluating reinforcements and suggests a direction for future research: from studying the mechanisms of reinforcement processing to studying the mechanisms of reinforcement learning.
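The computation at the heart of the reviewed work — using reward prediction errors to adjust subsequent choices — can be illustrated with a minimal sketch. This is not the authors' model; it is a generic delta-rule learner on a hypothetical two-armed bandit, with softmax action selection standing in for reinforcement-guided choice (all parameter values are illustrative assumptions).

```python
import math
import random

def softmax(values, beta=3.0):
    """Map learned action values onto choice probabilities (beta = choice determinism)."""
    exps = [math.exp(beta * v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def run_bandit(p_reward=(0.8, 0.2), alpha=0.1, trials=500, seed=0):
    """Learn action values for a two-armed bandit from reward prediction errors.

    p_reward : reward probability of each arm (hypothetical task statistics)
    alpha    : learning rate scaling the prediction-error update
    """
    rng = random.Random(seed)
    values = [0.0, 0.0]  # initial expected reward for each action
    for _ in range(trials):
        probs = softmax(values)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < p_reward[action] else 0.0
        # Reward prediction error: obtained minus expected outcome
        delta = reward - values[action]
        # Reinforcement guides future decisions by updating the chosen action's value
        values[action] += alpha * delta
    return values

values = run_bandit()
```

After learning, the value of the richer arm should exceed that of the poorer arm, so softmax choice comes to favor it — a toy version of reinforcements being "used to adjust subsequent decision making."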
Cohen, M.X. Neurocomputational mechanisms of reinforcement-guided learning in humans: A review. Cognitive, Affective, & Behavioral Neuroscience 8, 113–125 (2008). https://doi.org/10.3758/CABN.8.2.113