Tonic dopamine: opportunity costs and the control of response vigor

Psychopharmacology (2007) 191:507–520. https://doi.org/10.1007/s00213-006-0502-4

Abstract

Rationale

Dopamine neurotransmission has long been known to exert a powerful influence over the vigor, strength, or rate of responding. However, there exists no clear understanding of the computational foundation for this effect; predominant accounts of dopamine’s computational function focus on a role for phasic dopamine in controlling the discrete selection between different actions and have nothing to say about response vigor or indeed the free-operant tasks in which it is typically measured.

Objectives

We seek to accommodate free-operant behavioral tasks within the realm of models of optimal control and thereby capture how dopaminergic and motivational manipulations affect response vigor.

Methods

We construct an average reward reinforcement learning model in which subjects choose both which action to perform and also the latency with which to perform it. Optimal control balances the costs of acting quickly against the benefits of getting reward earlier and thereby chooses a best response latency.

Results

In this framework, the long-run average rate of reward plays a key role as an opportunity cost and mediates motivational influences on rates and vigor of responding. We review evidence suggesting that the average reward rate is reported by tonic levels of dopamine putatively in the nucleus accumbens.

Conclusions

Our extension of reinforcement learning models to free-operant tasks unites psychologically and computationally inspired ideas about the role of tonic dopamine in striatum, explaining from a normative point of view why higher levels of dopamine might be associated with more vigorous responding.


Notes

  1. Given that the actions we include in “Other” are typically performed in experimental scenarios despite not being rewarded by the experimenter, we assume these entail some “internal” reward, modeled simply as a negative unit cost.

  2. Realistically, even in a well-learned task, the average reward rate and response rates may not be perfectly stable. For instance, during a session, both would decline progressively as satiety reduces the utility of obtained rewards. However, this is negligible in most free-operant scenarios in which sessions are short or sparsely rewarded.

References

  • Aberman JE, Salamone JD (1999) Nucleus accumbens dopamine depletions make rats more sensitive to high ratio requirements but do not impair primary food reinforcement. Neuroscience 92(2):545–552

  • Ainslie G (1975) Specious reward: a behavioural theory of impulsiveness and impulse control. Psychol Bull 82:463–496

  • Barrett JE, Stanley JA (1980) Effects of ethanol on multiple fixed-interval fixed-ratio schedule performances: dynamic interactions at different fixed-ratio values. J Exp Anal Behav 34(2):185–198

  • Barto AG (1995) Adaptive critics and the basal ganglia. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 215–232

  • Beninger RJ (1983) The role of dopamine in locomotor activity and learning. Brain Res Brain Res Rev 6:173–196

  • Bergstrom BP, Garris PA (2003) ‘Passive stabilization’ of striatal extracellular dopamine across the lesion spectrum encompassing the presymptomatic phase of Parkinson’s disease: a voltammetric study in the 6-OHDA lesioned rat. J Neurochem 87(5):1224–1236

  • Berridge KC (2004) Motivation concepts in behavioral neuroscience. Physiol Behav 81(2):179–209

  • Berridge KC, Robinson TE (1998) What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? Brain Res Brain Res Rev 28:309–369

  • Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena, Belmont

  • Bolles RC (1967) Theory of motivation. Harper and Row, New York

  • Carr GD, White NM (1987) Effects of systemic and intracranial amphetamine injections on behavior in the open field: a detailed analysis. Pharmacol Biochem Behav 27:113–122

  • Catania AC, Reynolds GS (1968) A quantitative analysis of the responding maintained by interval schedules of reinforcement. J Exp Anal Behav 11:327–383

  • Catania AC, Matthews TJ, Silverman PJ, Yohalem R (1977) Yoked variable-ratio and variable-interval responding in pigeons. J Exp Anal Behav 28:155–161

  • Chéramy A, Barbeito L, Godeheu G, Desce J, Pittaluga A, Galli T, Artaud F, Glowinski J (1990) Respective contributions of neuronal activity and presynaptic mechanisms in the control of the in vivo release of dopamine. J Neural Transm Suppl 29:183–193

  • Chesselet MF (1990) Presynaptic regulation of dopamine release. Implications for the functional organization of the basal ganglia. Ann N Y Acad Sci 604:17–22

  • Correa M, Carlson BB, Wisniecki A, Salamone JD (2002) Nucleus accumbens dopamine and work requirements on interval schedules. Behav Brain Res 137:179–187

  • Cousins MS, Atherton A, Turner L, Salamone JD (1996) Nucleus accumbens dopamine depletions alter relative response allocation in a T-maze cost/benefit task. Behav Brain Res 74:189–197

  • Daw ND (2003) Reinforcement learning models of the dopamine system and their behavioral implications. Unpublished doctoral dissertation, Carnegie Mellon University

  • Daw ND, Touretzky DS (2002) Long-term reward prediction in TD models of the dopamine system. Neural Comp 14:2567–2583

  • Daw ND, Kakade S, Dayan P (2002) Opponent interactions between serotonin and dopamine. Neural Netw 15(4–6):603–616

  • Daw ND, Niv Y, Dayan P (2005) Uncertainty based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8(12):1704–1711

  • Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans. Nature 441:876–879

  • Dawson GR, Dickinson A (1990) Performance on ratio and interval schedules with matched reinforcement rates. Q J Exp Psychol B 42:225–239

  • Denk F, Walton ME, Jennings KA, Sharp T, Rushworth MF, Bannerman DM (2005) Differential involvement of serotonin and dopamine systems in cost–benefit decisions about delay or effort. Psychopharmacology (Berl) 179(3):587–596

  • Dickinson A (1985) Actions and habits: the development of behavioural autonomy. Philos Trans R Soc Lond B Biol Sci 308(1135):67–78

  • Dickinson A, Balleine B (1994) Motivational control of goal-directed action. Anim Learn Behav 22:1–18

  • Dickinson A, Balleine B (2002) The role of learning in the operation of motivational systems. In: Pashler H, Gallistel R (eds) Stevens’ handbook of experimental psychology. Learning, motivation and emotion, 3rd edn, vol 3. Wiley, New York, pp 497–533

  • Dickinson A, Smith J, Mirenowicz J (2000) Dissociation of Pavlovian and instrumental incentive learning under dopamine agonists. Behav Neurosci 114(3):468–483

  • Domjan M (2003) Principles of learning and behavior, 5th edn. Thomson/Wadsworth, Belmont

  • Dragoi V, Staddon JER (1999) The dynamics of operant conditioning. Psychol Rev 106(1):20–61

  • Evenden JL, Robbins TW (1983) Increased dopamine switching, perseveration and perseverative switching following d-amphetamine in the rat. Psychopharmacology (Berl) 80:67–73

  • Faure A, Haberland U, Condé F, Massioui NE (2005) Lesion to the nigrostriatal dopamine system disrupts stimulus–response habit formation. J Neurosci 25:2771–2780

  • Fiorillo C, Tobler P, Schultz W (2003) Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299(5614):1898–1902

  • Fletcher PJ, Korth KM (1999) Activation of 5-HT1B receptors in the nucleus accumbens reduces amphetamine-induced enhancement of responding for conditioned reward. Psychopharmacology (Berl) 142:165–174

  • Floresco SB, West AR, Ash B, Moore H, Grace AA (2003) Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nat Neurosci 6(9):968–973

  • Foster TM, Blackman KA, Temple W (1997) Open versus closed economies: performance of domestic hens under fixed-ratio schedules. J Exp Anal Behav 67:67–89

  • Friston KJ, Tononi G, Reeke GNJ, Sporns O, Edelman GM (1994) Value-dependent selection in the brain: simulation in a synthetic neural model. Neuroscience 59(2):229–243

  • Gallistel CR, Gibbon J (2000) Time, rate and conditioning. Psychol Rev 107:289–344

  • Gallistel CR, Stellar J, Bubis E (1974) Parametric analysis of brain stimulation reward in the rat: I. The transient process and the memory-containing process. J Comp Physiol Psychol 87:848–860

  • Gibbon J (1977) Scalar expectancy theory and Weber’s law in animal timing. Psychol Rev 84(3):279–325

  • Goto Y, Grace A (2005) Dopaminergic modulation of limbic and cortical drive of nucleus accumbens in goal-directed behavior. Nat Neurosci 8:805–812

  • Grace AA (1991) Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: a hypothesis for the etiology of schizophrenia. Neuroscience 41(1):1–24

  • Hernandez G, Hamdani S, Rajabi H, Conover K, Stewart J, Arvanitogiannis A, Shizgal P (2006) Prolonged rewarding stimulation of the rat medial forebrain bundle: neurochemical and behavioral consequences. Behav Neurosci 120(4):888–904

  • Herrnstein RJ (1961) Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav 4(3):267–272

  • Herrnstein RJ (1970) On the law of effect. J Exp Anal Behav 13(2):243–266

  • Houk JC, Adams JL, Barto AG (1995) A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 249–270

  • Ikemoto S, Panksepp J (1999) The role of nucleus accumbens dopamine in motivated behavior: a unifying interpretation with special reference to reward-seeking. Brain Res Brain Res Rev 31:6–41

  • Jackson DM, Anden N, Dahlstrom A (1975) A functional effect of dopamine in the nucleus accumbens and in some other dopamine-rich parts of the rat brain. Psychopharmacologia 45:139–149

  • Kacelnik A (1997) Normative and descriptive models of decision making: time discounting and risk sensitivity. In: Bock GR, Cardew G (eds) Characterizing human psychological adaptations: Ciba Foundation symposium 208. Wiley, Chichester, pp 51–70

  • Killeen PR (1995) Economics, ecologies and mechanics: the dynamics of responding under conditions of varying motivation. J Exp Anal Behav 64:405–431

  • Konorski J (1967) Integrative activity of the brain: an interdisciplinary approach. University of Chicago Press, Chicago

  • Lauwereyns J, Watanabe K, Coe B, Hikosaka O (2002) A neural correlate of response bias in monkey caudate nucleus. Nature 418(6896):413–417

  • Le Moal M, Simon H (1991) Mesocorticolimbic dopaminergic network: functional and regulatory roles. Physiol Rev 71:155–234

  • Ljungberg T, Enquist M (1987) Disruptive effects of low doses of d-amphetamine on the ability of rats to organize behaviour into functional sequences. Psychopharmacology (Berl) 93:146–151

  • Ljungberg T, Apicella P, Schultz W (1992) Responses of monkey dopaminergic neurons during learning of behavioral reactions. J Neurophys 67:145–163

  • Lodge DJ, Grace AA (2005) The hippocampus modulates dopamine neuron responsivity by regulating the intensity of phasic neuron activation. Neuropsychopharmacology 31:1356–1361

  • Lodge DJ, Grace AA (2006) The laterodorsal tegmentum is essential for burst firing of ventral tegmental area dopamine neurons. Proc Nat Acad Sci U S A 103(13):5167–5172

  • Lyon M, Robbins TW (1975) The action of central nervous system stimulant drugs: a general theory concerning amphetamine effects. In: Current developments in psychopharmacology. Spectrum, New York, pp 80–163

  • Mahadevan S (1996) Average reward reinforcement learning: foundations, algorithms and empirical results. Mach Learn 22:1–38

  • Mazur JA (1983) Steady-state performance on fixed-, mixed-, and random-ratio schedules. J Exp Anal Behav 39(2):293–307

  • McClure SM, Daw ND, Montague PR (2003) A computational substrate for incentive salience. Trends Neurosci 26(8):423–428

  • Mingote S, Weber SM, Ishiwari K, Correa M, Salamone JD (2005) Ratio and time requirements on operant schedules: effort-related effects of nucleus accumbens dopamine depletions. Eur J Neurosci 21:1749–1757

  • Montague PR (2006) Why choose this book?: how we make decisions. Dutton, New York

  • Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16(5):1936–1947

  • Moore H, West AR, Grace AA (1999) The regulation of forebrain dopamine transmission: relevance to the psychopathology of schizophrenia. Biol Psychiatry 46:40–55

  • Murschall A, Hauber W (2006) Inactivation of the ventral tegmental area abolished the general excitatory influence of Pavlovian cues on instrumental performance. Learn Mem 13:123–126

  • Niv Y, Daw ND, Dayan P (2005a) How fast to work: response vigor, motivation and tonic dopamine. In: Weiss Y, Schölkopf B, Platt J (eds) NIPS 18. MIT Press, Cambridge, pp 1019–1026

  • Niv Y, Daw ND, Joel D, Dayan P (2005b) Motivational effects on behavior: towards a reinforcement learning model of rates of responding. COSYNE 2005, Salt Lake City

  • Niv Y, Joel D, Dayan P (2006) A normative perspective on motivation. Trends Cogn Sci 10:375–381

  • Oades RD (1985) The role of noradrenaline in tuning and dopamine in switching between signals in the CNS. Neurosci Biobehav Rev 9(2):261–282

  • Packard MG, Knowlton BJ (2002) Learning and memory functions of the basal ganglia. Annu Rev Neurosci 25:563–593

  • Phillips PEM, Wightman RM (2004) Extrasynaptic dopamine and phasic neuronal activity. Nat Neurosci 7:199

  • Phillips PEM, Stuber GD, Heien MLAV, Wightman RM, Carelli RM (2003) Subsecond dopamine release promotes cocaine seeking. Nature 422:614–618

  • Redgrave P, Prescott TJ, Gurney K (1999) The basal ganglia: a vertebrate solution to the selection problem? Neuroscience 89:1009–1023

  • Robbins TW, Everitt BJ (1982) Functional studies of the central catecholamines. Int Rev Neurobiol 23:303–365

  • Roitman MF, Stuber GD, Phillips PEM, Wightman RM, Carelli RM (2004) Dopamine operates as a subsecond modulator of food seeking. J Neurosci 24(6):1265–1271

  • Salamone JD, Correa M (2002) Motivational views of reinforcement: implications for understanding the behavioral functions of nucleus accumbens dopamine. Behav Brain Res 137:3–25

  • Salamone JD, Wisniecki A, Carlson BB, Correa M (2001) Nucleus accumbens dopamine depletions make animals highly sensitive to high fixed ratio requirements but do not impair primary food reinforcement. Neuroscience 5(4):863–870

  • Satoh T, Nakai S, Sato T, Kimura M (2003) Correlated coding of motivation and outcome of decision by dopamine neurons. J Neurosci 23(30):9913–9923

  • Schoenbaum G, Setlow B, Nugent S, Saddoris M, Gallagher M (2003) Lesions of orbitofrontal cortex and basolateral amygdala complex disrupt acquisition of odor-guided discriminations and reversals. Learn Mem 10:129–140

  • Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophys 80:1–27

  • Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900–913

  • Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599

  • Schwartz A (1993) A reinforcement learning method for maximizing undiscounted rewards. In: Proceedings of the tenth international conference on machine learning. Morgan Kaufmann, San Francisco, pp 298–305

  • Sokolowski JD, Salamone JD (1998) The role of accumbens dopamine in lever pressing and response allocation: effects of 6-OHDA injected into core and dorsomedial shell. Pharmacol Biochem Behav 59(3):557–566

  • Solomon RL, Corbit JD (1974) An opponent-process theory of motivation. I. Temporal dynamics of affect. Psychol Rev 81:119–145

  • Staddon JER (2001) Adaptive dynamics. MIT Press, Cambridge

  • Sutton RS, Barto AG (1981) Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev 88:135–170

  • Sutton RS, Barto AG (1990) Time-derivative models of Pavlovian reinforcement. In: Gabriel M, Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge, pp 497–537

  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge

  • Taghzouti K, Simon H, Louilot A, Herman J, Le Moal M (1985) Behavioral study after local injection of 6-hydroxydopamine into the nucleus accumbens in the rat. Brain Res 344:9–20

  • Takikawa Y, Kawagoe R, Itoh H, Nakahara H, Hikosaka O (2002) Modulation of saccadic eye movements by predicted reward outcome. Exp Brain Res 142(2):284–291

  • Taylor JR, Robbins TW (1984) Enhanced behavioural control by conditioned reinforcers following microinjections of d-amphetamine into the nucleus accumbens. Psychopharmacology (Berl) 84:405–412

  • Taylor JR, Robbins TW (1986) 6-Hydroxydopamine lesions of the nucleus accumbens, but not of the caudate nucleus, attenuate enhanced responding with reward-related stimuli produced by intra-accumbens d-amphetamine. Psychopharmacology (Berl) 90:390–397

  • Tobler P, Fiorillo C, Schultz W (2005) Adaptive coding of reward value by dopamine neurons. Science 307(5715):1642–1645

  • van den Bos R, Charria Ortiz GA, Bergmans AC, Cools AR (1991) Evidence that dopamine in the nucleus accumbens is involved in the ability of rats to switch to cue-directed behaviours. Behav Brain Res 42:107–114

  • Waelti P, Dickinson A, Schultz W (2001) Dopamine responses comply with basic assumptions of formal learning theory. Nature 412:43–48

  • Walton ME, Kennerley SW, Bannerman DM, Phillips PEM, Rushworth MFS (2006) Weighing up the benefits of work: behavioral and neural analyses of effort-related decision making. Neural networks (in press)

  • Watanabe M, Cromwell H, Tremblay L, Hollerman J, Hikosaka K, Schultz W (2001) Behavioral reactions reflecting differential reward expectations in monkeys. Exp Brain Res 140(4):511–518

  • Weiner I (1990) Neural substrates of latent inhibition: the switching model. Psychol Bull 108:442–461

  • Weiner I, Joel D (2002) Dopamine in schizophrenia: dysfunctional information processing in basal ganglia-thalamocortical split circuits. In: Chiara GD (ed) Handbook of experimental pharmacology, vol 154/II. Dopamine in the CNS II. Springer, Berlin Heidelberg New York, pp 417–472

  • Wickens J (1990) Striatal dopamine in motor activation and reward-mediated learning: steps towards a unifying model. J Neural Transm 80:9–31

  • Wickens J, Kötter R (1995) Cellular models of reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 187–214

  • Wilson C, Nomikos GG, Collu M, Fibiger HC (1995) Dopaminergic correlates of motivated behavior: importance of drive. J Neurosci 15(7):5169–5178

  • Wise RA (2004) Dopamine, learning and motivation. Nat Rev Neurosci 5:483–495

  • Yin HH, Knowlton BJ, Balleine BW (2004) Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19:181–189

  • Zuriff GE (1970) A comparison of variable-ratio and variable-interval schedules of reinforcement. J Exp Anal Behav 13:369–374

Acknowledgements

This work was funded by the Gatsby Charitable Foundation, a Hebrew University Rector Fellowship (Y.N.), the Royal Society (N.D.), and the EU Bayesian Inspired Brain and Artefacts (BIBA) project (N.D. and P.D.). We are grateful to Saleem Nicola, Mark Walton, and Matthew Rushworth for valuable discussions.

Author information

Corresponding author

Correspondence to Yael Niv.

Appendix

Here we describe in more mathematical detail the proposed RL model of free-operant response rates (Niv et al. 2005a) from which the results in this paper were derived.

Formally, the action selection problem faced by the rat can be characterized by a series of states, \(S \in \mathcal{S}\), in each of which the rat must choose an action and a latency \((a,\tau)\), which will entail a unit cost, \(C_{\mathrm{u}}\), and a vigor cost, \(C_{\mathrm{v}}/\tau\), and result in a possible transition to a new state, \(S'\), and a possible immediate reward with utility, \(U_{\mathrm{r}}\). The unit cost constant \(C_{\mathrm{u}}\) and the vigor cost constant \(C_{\mathrm{v}}\) can take different values depending on the identity of the currently chosen action \(a \in \{\mathrm{LP}, \mathrm{NP}, \text{“Other”}\}\) and on that of the previously performed action. The transitions between states and the probability of reward for each action are governed by the schedule of reinforcement. For instance, in a random-ratio 5 (RR5) schedule, every LP action has \(p = 0.2\) probability of inducing a transition from the state in which no food is available in the magazine to that in which food is available. An NP action in the “no-reward-available” state is never rewarded and, conversely, is rewarded with certainty (\(p_{\mathrm{r}} = 1\)) in the “food-available-in-magazine” state. As a simplification, for each reinforcement schedule, we define states that incorporate all the available information relevant to decision making, such as the identity of the previously chosen action, whether or not food is available in the magazine, the time that has elapsed since the last lever press (in random-interval schedules only), and the number of lever presses since the last reward (in fixed-ratio schedules only). The animal’s behavior in the experiment is thus fully described by the successive actions and latencies chosen at the different states the animal encountered, \(\{(a_i, \tau_i, S_i),\ i = 1, 2, 3, \ldots\}\). The average reward rate \(\overline{R}\) is simply the sum of all the rewards obtained minus all the costs incurred, all divided by the total amount of time.
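To make this formulation concrete, the following minimal Python sketch (an illustration, not the authors’ implementation) shows one way an RR5 task might be encoded. For brevity it collapses the state to whether food is available in the magazine, omits the dependence of the cost constants on the previously performed action, and uses arbitrary placeholder values for \(U_{\mathrm{r}}\), \(C_{\mathrm{u}}\), \(C_{\mathrm{v}}\), and the latency grid.

```python
# Minimal sketch of an RR5 free-operant task (illustrative constants only).
import numpy as np

STATES = ["no_food", "food_in_magazine"]       # is reward waiting in the magazine?
ACTIONS = ["LP", "NP", "Other"]                # lever press, nose poke, everything else
LATENCIES = np.linspace(0.5, 10.0, 20)         # discretized response latencies tau (s)

U_R = 10.0                                     # utility of the food reward (placeholder)
C_U = {"LP": 1.0, "NP": 0.5, "Other": -0.1}    # unit costs; "Other" carries a small internal reward
C_V = {"LP": 2.0, "NP": 1.0, "Other": 0.5}     # vigor cost constants

def reward_prob(action, state):
    """Probability that performing `action` in `state` delivers the reward."""
    # Only a nose poke into a baited magazine is rewarded (with certainty).
    return 1.0 if (action == "NP" and state == "food_in_magazine") else 0.0

def transition_prob(next_state, action, state):
    """Schedule-defined transition probabilities for the RR5 task."""
    if action == "LP" and state == "no_food":
        p_bait = 0.2                           # RR5: each press baits the magazine with p = 0.2
        return p_bait if next_state == "food_in_magazine" else 1.0 - p_bait
    if action == "NP" and state == "food_in_magazine":
        return 1.0 if next_state == "no_food" else 0.0   # collecting the food empties the magazine
    return 1.0 if next_state == state else 0.0           # otherwise the state is unchanged
```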

Using this formulation, we can define the differential value of a state, denoted \(V(S)\), as the expected sum of future rewards minus costs encountered from this state onward, compared with the expected average reward rate. Defining the value as an expectation over a sum means that the value can be written recursively as the expected reward minus the cost due to the current action, compared with the immediately forfeited average reward, plus the value of the next state (averaged over the possible next states). To find the optimal differential values of the different states, that is, the values \(V^{*}(S)\) (and the optimal average reward rate \(\overline{R}^{*}\)) given the optimal action selection strategy, we can simultaneously solve the set of equations defining these values:

$$V^{*}(S) = \max_{a,\tau}\left\{\, p_{\mathrm{r}}(a,\tau,S)\,U_{\mathrm{r}} - C_{\mathrm{u}}(a,a_{\mathrm{prev}}) - \frac{C_{\mathrm{v}}(a,a_{\mathrm{prev}})}{\tau} - \overline{R}^{*}\cdot\tau + \sum_{S'\in\mathcal{S}} p(S'\mid a,\tau,S)\,V^{*}(S') \right\},$$
(1)

in which there is one equation for every state \(S \in \mathcal{S}\), and \(p(S'\mid a,\tau,S)\) is the schedule-defined probability of transitioning to state \(S'\) given that \((a,\tau)\) was performed at state \(S\).

The theory of dynamic programming (Bertsekas and Tsitsiklis 1996) ensures that these equations have a single solution for the optimal attainable average reward rate \(\overline{R}^{*}\) and the optimal differential state values \(V^{*}(S)\) (which are defined only up to an additive constant). This solution can be found using iterative dynamic programming methods such as “value iteration” (Bertsekas and Tsitsiklis 1996) or approximated through online sampling of the task dynamics and temporal-difference learning (Schwartz 1993; Mahadevan 1996; Sutton and Barto 1998). Here we used the former and report results using the true optimal differential values. We compare these model results with the steady-state behavior of well-trained animals, as the optimal values correspond to those that would be learned online over the course of extensive training.
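Continuing the sketch above, the following shows one simple way (not necessarily the procedure used in the paper) to solve Eq. 1 numerically: relative value iteration at a fixed candidate rate, combined with bisection on \(\overline{R}\). The differential values drift upward on every sweep when the candidate rate is below the optimal rate and downward when it is above, so \(\overline{R}^{*}\) is the rate at which the drift vanishes.

```python
# Sketch: solve Eq. 1 by relative value iteration plus bisection on the average reward rate.
import numpy as np

def backup(V, r_bar):
    """One sweep of the Bellman operator of Eq. 1 (before renormalization)."""
    V_new = {}
    for s in STATES:
        V_new[s] = max(
            reward_prob(a, s) * U_R - C_U[a] - C_V[a] / tau - r_bar * tau
            + sum(transition_prob(s2, a, s) * V[s2] for s2 in STATES)
            for a in ACTIONS for tau in LATENCIES)
    return V_new

def solve(r_bar, sweeps=200):
    """Relative value iteration at a fixed candidate rate; returns (V, per-sweep drift)."""
    V = {s: 0.0 for s in STATES}
    drift = 0.0
    for _ in range(sweeps):
        V_new = backup(V, r_bar)
        drift = V_new[STATES[0]]                      # shift of the reference state this sweep
        V = {s: v - drift for s, v in V_new.items()}  # keep V(reference state) = 0
    return V, drift

lo, hi = 1e-6, 10.0                                   # assumed bracket containing R_bar*
for _ in range(50):                                   # bisection on the vanishing drift
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if solve(mid)[1] > 0 else (lo, mid)
R_BAR_STAR = 0.5 * (lo + hi)                          # optimal average reward rate
V_STAR, _ = solve(R_BAR_STAR)                         # optimal differential state values
```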

Given the optimal state values, the optimal differential value of an \((a,\tau)\) pair taken at state \(S\), denoted \(Q^{*}(a,\tau,S)\), is simply:

$$Q^{*}(a,\tau,S) = p_{\mathrm{r}}(a,\tau,S)\,U_{\mathrm{r}} - C_{\mathrm{u}}(a,a_{\mathrm{prev}}) - \frac{C_{\mathrm{v}}(a,a_{\mathrm{prev}})}{\tau} - \overline{R}^{*}\cdot\tau + \sum_{S'\in\mathcal{S}} p(S'\mid a,\tau,S)\,V^{*}(S')$$
(2)

The animal can select actions optimally (that is, so as to obtain the maximal possible average reward rate \(\overline{R}^{*}\)) by comparing the differential values of the different \((a,\tau)\) pairs at the current state and choosing the action and latency with the highest value. Alternatively, to allow more flexible behavior and occasional exploratory actions (Daw et al. 2006), response selection can be based on the so-called “soft-max” rule (or Boltzmann distribution), in which the probability of choosing an \((a,\tau)\) pair increases exponentially with its differential value. Under this rule, which is the one we used here, actions that are “almost optimal” are chosen almost as frequently as actions that are strictly optimal. Specifically, the probability of choosing \((a,\tau)\) in state \(S\) is:

$$p(a,\tau,S) = \frac{e^{\beta Q^{*}(a,\tau,S)}}{\sum_{a',\tau'} e^{\beta Q^{*}(a',\tau',S)}},$$
(3)

where β is the inverse temperature controlling the steepness of the soft-max function (a value of zero corresponds to uniform selection of actions, whereas higher values correspond to a more maximizing strategy).
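Continuing the same sketch, Eqs. 2 and 3 translate directly into code; the inverse temperature value below is an arbitrary assumption, not a parameter reported in the paper.

```python
# Sketch: differential Q values (Eq. 2) and soft-max action selection (Eq. 3).
import numpy as np

BETA = 1.0                                          # assumed inverse temperature

def q_values(V, r_bar):
    """Q*(a, tau, S) for every (action, latency) pair in every state (Eq. 2)."""
    return {(s, a, tau): reward_prob(a, s) * U_R - C_U[a] - C_V[a] / tau
                         - r_bar * tau
                         + sum(transition_prob(s2, a, s) * V[s2] for s2 in STATES)
            for s in STATES for a in ACTIONS for tau in LATENCIES}

def softmax_choice(Q, state, rng):
    """Sample an (action, latency) pair with probability proportional to exp(beta * Q)."""
    pairs = [(a, tau) for a in ACTIONS for tau in LATENCIES]
    logits = np.array([BETA * Q[(state, a, tau)] for a, tau in pairs])
    probs = np.exp(logits - logits.max())           # subtract the max for numerical stability
    probs /= probs.sum()
    return pairs[rng.choice(len(pairs), p=probs)]

rng = np.random.default_rng(0)
Q_STAR = q_values(V_STAR, R_BAR_STAR)               # V_STAR, R_BAR_STAR from the sketch above
action, latency = softmax_choice(Q_STAR, "no_food", rng)
```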

To simulate the (immediate) effects of depletion of tonic dopamine (Fig. 3b), \(Q\) values were recomputed from the optimal \(V\) values (using Eq. 2), but taking into account a lower average reward rate (specifically, \(\overline{R}_{\mathrm{depleted}} = 0.4\,\overline{R}^{*}\)). Actions were then chosen as usual, using the soft-max function of these new \(Q\) values, to generate behavior.
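In the sketch above, this depletion simulation amounts to recomputing the \(Q\) values from the optimal differential values with the reduced rate and sampling behavior from the same soft-max rule (again purely illustrative):

```python
# Sketch: simulated tonic-dopamine depletion as a reduction of the opportunity cost of time.
Q_DEPLETED = q_values(V_STAR, 0.4 * R_BAR_STAR)     # only the R_bar * tau term changes
action, latency = softmax_choice(Q_DEPLETED, "no_food", rng)
# With a smaller R_bar, long latencies are penalized less, so the soft-max
# shifts probability toward slower responding for every action.
```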

Finally, note that Eq. 2 is a function relating actions and latencies to values. Accordingly, one way to find the optimal latency is to differentiate Eq. 2 with respect to \(\tau\) and find its maximum. For ratio schedules (in which the identity and value of the subsequent state \(S'\) do not depend on \(\tau\)), this gives:

$$\tau^{*} = \sqrt{\frac{C_{\mathrm{v}}}{\overline{R}^{*}}},$$
(4)

showing that the optimal latency \(\tau^{*}\) depends solely on the vigor cost constant and the average reward rate. This is true regardless of the action \(a\) chosen, which is why a change in the average reward rate has a similar effect on the latencies of all actions. In interval schedules, the situation is slightly more complex because the identity of the subsequent state depends on the latency, and this must be taken into account when taking the derivative. However, in this case as well, the optimal latency is inversely related to the average reward rate.
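For completeness, the differentiation step behind Eq. 4, spelled out: in ratio schedules, the only \(\tau\)-dependent terms of Eq. 2 are the vigor cost and the opportunity cost, so

$$\frac{\partial Q^{*}(a,\tau,S)}{\partial \tau} = \frac{C_{\mathrm{v}}}{\tau^{2}} - \overline{R}^{*} = 0 \quad\Longrightarrow\quad \tau^{*} = \sqrt{\frac{C_{\mathrm{v}}}{\overline{R}^{*}}},$$

and the second derivative, \(-2C_{\mathrm{v}}/\tau^{3} < 0\), confirms that this stationary point is indeed a maximum.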
