Abstract
Evaluation of both immediate and future outcomes of one's actions is a critical requirement for intelligent behavior. Using functional magnetic resonance imaging (fMRI), we investigated brain mechanisms for reward prediction at different time scales in a Markov decision task. When human subjects learned actions on the basis of immediate rewards, significant activity was seen in the lateral orbitofrontal cortex and the striatum. When subjects learned to act in order to obtain large future rewards while incurring small immediate losses, the dorsolateral prefrontal cortex, inferior parietal cortex, dorsal raphe nucleus and cerebellum were also activated. Computational model–based regression analysis using the predicted future rewards and prediction errors estimated from subjects' performance data revealed graded maps of time scale within the insula and the striatum: ventroanterior regions were involved in predicting immediate rewards and dorsoposterior regions were involved in predicting future rewards. These results suggest differential involvement of the cortico-basal ganglia loops in reward prediction at different time scales.
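The model-based regressors mentioned above (predicted future reward and prediction error at different time scales) can be illustrated with a minimal sketch. It assumes the textbook tabular temporal-difference (TD(0)) update and a hypothetical three-state cycle with two small losses followed by one large gain; the task, reward values, learning rate `alpha` and the function name `td_signals` are illustrative assumptions, not the task or parameters used in the study.

```python
def td_signals(rewards, next_state, gamma, alpha=0.1, sweeps=2000):
    """Tabular TD(0) along a fixed, deterministic cycle of states.

    rewards[s]    : reward received when leaving state s (hypothetical values)
    next_state[s] : successor of state s
    gamma         : discount factor (0 = only immediate reward counts)
    Returns (V, deltas): learned values and the prediction errors
    observed on the final sweep.
    """
    n = len(rewards)
    V = [0.0] * n
    deltas = []
    for _ in range(sweeps):
        s = 0
        deltas = []
        for _ in range(n):
            s_next = next_state[s]
            # Prediction error: outcome plus discounted next prediction,
            # minus the current prediction.
            delta = rewards[s] + gamma * V[s_next] - V[s]
            V[s] += alpha * delta
            deltas.append(delta)
            s = s_next
    return V, deltas


# Hypothetical cycle 0 -> 1 -> 2 -> 0: two small losses, then a large gain.
REWARDS = [-1.0, -1.0, 5.0]
NEXT_STATE = [1, 2, 0]

if __name__ == "__main__":
    for gamma in (0.0, 0.9):
        values, errors = td_signals(REWARDS, NEXT_STATE, gamma)
        print("gamma =", gamma, "V =", [round(v, 3) for v in values])
```

In this toy setting, a discount factor of 0 assigns the first state a negative value (only the immediate loss is seen), whereas a discount factor of 0.9 assigns it a positive value because the later gain outweighs the losses. Value and error time series of this kind, computed for several discount factors, are the sort of signals a model-based regression would use as explanatory variables.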
Acknowledgements
We thank K. Samejima, N. Schweighofer, M. Haruno, H. Imamizu, S. Higuchi, T. Yoshioka, T. Chaminade and M. Kawato for helpful discussions and technical advice. This research was funded by 'Creating the Brain,' Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Figure 1
An example of the time series of the explanatory variables for one subject. (PDF 34 kb)
Supplementary Figure 2
A schematic diagram of the brain areas involved in reward prediction at different time scales. The dotted lines indicate cortico-cortical connections, and the green arrows indicate the serotonergic pathways from the dorsal raphe. The 'limbic loop' (including lateral OFC and ventral striatum) is involved in short-term reward prediction. The 'cognitive and motor loops' (including DLPFC, PMd and dorsal striatum) are involved in long-term reward prediction. Ventroanterior-to-dorsoposterior topographical projections from the insula to the striatum are involved in short-to-long-term reward prediction (rainbow-colored arrow). The mPFC and dorsal raphe, which are reciprocally connected, may regulate these loops by cortico-cortical and cortico-striatal projections from mPFC and serotonergic projections from dorsal raphe. SNr, substantia nigra pars reticulata. (PDF 415 kb)
Supplementary Table 1
Areas significantly activated in the block-design analysis. (PDF 11 kb)
Supplementary Table 2
Areas with significant correlation with reward prediction V(t) estimated with different discount factors γ. (PDF 15 kb)
Supplementary Table 3
Voxels with significant correlation with reward prediction error δ(t) estimated with different discount factors γ. (PDF 15 kb)
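For readers of the two table captions above, the symbols can be read in the standard temporal-difference sense. The following is a sketch under that assumption; the exact time indexing used in the study may differ.

```latex
% Reward prediction: expected discounted sum of future rewards
V(t) = \mathbb{E}\!\left[\, r(t+1) + \gamma\, r(t+2) + \gamma^{2} r(t+3) + \cdots \right],
\qquad 0 \le \gamma < 1

% Reward prediction error: received reward plus discounted new prediction,
% minus the previous prediction
\delta(t) = r(t) + \gamma\, V(t) - V(t-1)
```

A small discount factor γ makes V(t) track mostly the next reward, while γ close to 1 weights rewards many steps ahead almost as heavily as the next one.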
Cite this article
Tanaka, S., Doya, K., Okada, G. et al. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nat Neurosci 7, 887–893 (2004). https://doi.org/10.1038/nn1279
DOI: https://doi.org/10.1038/nn1279