Abstract
Adapting decision making to dynamic and probabilistic changes in action-reward contingencies is critical for survival in a competitive and resource-limited world. Much research has focused on elucidating the neural systems and computations by which the brain identifies whether the consequences of actions are relatively good or bad. In contrast, less empirical work has examined the mechanisms by which reinforcements might be used to guide decision making. Here, I review recent studies that attempt to bridge this gap by characterizing how humans use reward information to guide and optimize decision making. Regions implicated in reinforcement processing, including the striatum, orbitofrontal cortex, and anterior cingulate, also seem to mediate how reinforcements are used to adjust subsequent decision making. This research provides insights into why the brain devotes resources to evaluating reinforcements and suggests a direction for future research: from studying the mechanisms of reinforcement processing to studying the mechanisms of reinforcement learning.
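The computation at the heart of the reviewed work — using reward prediction errors to adjust subsequent choices — can be illustrated with a minimal sketch. This is not the authors' model; it is a generic delta-rule learner on a hypothetical two-armed bandit, with softmax action selection standing in for reinforcement-guided choice (all parameter values are illustrative assumptions).

```python
import math
import random

def softmax(values, beta=3.0):
    """Map learned action values onto choice probabilities (beta = choice determinism)."""
    exps = [math.exp(beta * v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def run_bandit(p_reward=(0.8, 0.2), alpha=0.1, trials=500, seed=0):
    """Learn action values for a two-armed bandit from reward prediction errors.

    p_reward : reward probability of each arm (hypothetical task statistics)
    alpha    : learning rate scaling the prediction-error update
    """
    rng = random.Random(seed)
    values = [0.0, 0.0]  # initial expected reward for each action
    for _ in range(trials):
        probs = softmax(values)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < p_reward[action] else 0.0
        # Reward prediction error: obtained minus expected outcome
        delta = reward - values[action]
        # Reinforcement guides future decisions by updating the chosen action's value
        values[action] += alpha * delta
    return values

values = run_bandit()
```

After learning, the value of the richer arm should exceed that of the poorer arm, so softmax choice comes to favor it — a toy version of reinforcements being "used to adjust subsequent decision making."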
Cohen, M.X. Neurocomputational mechanisms of reinforcement-guided learning in humans: A review. Cognitive, Affective, & Behavioral Neuroscience 8, 113–125 (2008). https://doi.org/10.3758/CABN.8.2.113