Neural Networks

Volume 15, Issues 4–6, June–July 2002, Pages 603-616

2002 Special issue
Opponent interactions between serotonin and dopamine

https://doi.org/10.1016/S0893-6080(02)00052-7

Abstract

Anatomical and pharmacological evidence suggests that the dorsal raphe serotonin system and the ventral tegmental and substantia nigra dopamine system may act as mutual opponents. In the light of the temporal difference model of the involvement of the dopamine system in reward learning, we consider three aspects of motivational opponency involving dopamine and serotonin. We suggest that a tonic serotonergic signal reports the long-run average reward rate as part of an average-case reinforcement learning model; that a tonic dopaminergic signal reports the long-run average punishment rate in a similar context; and finally speculate that a phasic serotonin signal might report an ongoing prediction error for future punishment.

Introduction

From a computational perspective, serotonin (5HT) is the most mysterious of the main vertebrate neuromodulators. Pharmacological investigations reveal that it plays a role in a wide variety of phenomena, including impulsivity, obsessionality, aggression, psychomotor inhibition, latent inhibition, analgesia, hallucinations, eating disorders, attention and mood (Solomon et al., 1980, Soubrié, 1986, Fields et al., 1991, Harrison et al., 1997, Harrison et al., 1999, Westenberg et al., 1996, Buhot, 1997, Edwards and Kravitz, 1997, Hollander, 1998, Aghajanian and Marek, 1999, Masand and Gupta, 1999, Stanford, 1999, De Vry and Schreiber, 2000, Lesch and Merschdorf, 2000, Stahl, 2000). However, there are many complexities in these effects. For instance, drugs whose immediate and selective effect is to inhibit serotonin reuptake, and so prolong its availability at the synapse, take two weeks to have an effect on mood. Also, electrophysiological data (Jacobs and Fornal, 1997, Jacobs and Fornal, 1999, Gao et al., 1988, Gao et al., 1997) show that serotonin cells do not obviously alter their firing rates in response to the sort of significant stimuli that might be expected to control some of the behaviors described earlier. Thus, the experimental data on the involvement of serotonin are confusing, and this has inevitably impeded the development of computational theory.

In this paper, we focus on one important (though emphatically not exclusive) aspect of serotonin suggested by anatomical and pharmacological data, namely an apparent opponent partnership with dopamine (DA, Azmitia, 1978, Azmitia and Segal, 1978, Deakin, 1983, Fletcher, 1991, Fletcher, 1995, Vertes, 1991, Deakin, 1996, Kapur and Remington, 1996, Fletcher and Korth, 1999, Fletcher et al., 1999). Substantial evidence supports the theory that phasic activity of dopamine cells in the ventral tegmental area and substantia nigra pars compacta reports a prediction error for summed future reward (Montague et al., 1996, Schultz et al., 1997, Schultz, 1998) in the context of a temporal difference (TD) model (Sutton, 1988, Sutton and Barto, 1990) of reinforcement learning (Bertsekas and Tsitsiklis, 1996, Sutton and Barto, 1998). To the extent that serotonin acts as an opponent to dopamine, we can use our understanding of the role of dopamine to help constrain aspects of the role of serotonin. Equally, the TD model of dopamine is based on experiments that only probe a small part of the overall scope of reinforcement learning. Extending the model to cope with theoretical issues such as long-run average rewards (Daw and Touretzky, 2000, Daw and Touretzky, 2002) actually leads to the requirement for a signal that acts like an opponent to dopamine. Here, we explore this candidate role for serotonin.

Opponency has a venerable history in psychology and neuroscience. In an implementational form (e.g. Grossberg, 1988), it starts from the simple idea of using two systems to code for events (such as affective events), with one system reporting positive excursions from a baseline (appetitive events), the other system reporting negative excursions (aversive events), and mutual inhibition between the systems and/or opposing effects on common outputs. From a physiological perspective, this neatly circumvents the absence of negative firing rates. However, opponency turns out to have some less obvious mathematical and computational properties, which have been given diverse interpretations in everything from affective systems to circadian timing mechanisms (Grossberg, 1984, Grossberg, 2000).
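To make the implementational idea concrete, the following is a minimal sketch, in our own illustrative notation rather than any published model, of how a signed affective quantity can be carried by two non-negative channels that push a common output in opposite directions.

```python
def opponent_channels(x, baseline=0.0):
    """Split a signed excursion from baseline into two non-negative channels.

    The appetitive channel is active for positive excursions, the aversive
    channel for negative ones; neither channel needs a negative firing rate.
    """
    appetitive = max(x - baseline, 0.0)
    aversive = max(baseline - x, 0.0)
    return appetitive, aversive


def common_output(appetitive, aversive):
    """Downstream readout: the two channels have opposing effects on a common
    output, so the original signed quantity is recovered as their difference."""
    return appetitive - aversive
```

In a fuller model, mutual inhibition between the channels would keep at most one of them active at a time; the rectification in this toy version plays the same role.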

In this paper, we focus on motivational opponency between appetitive and aversive systems. In modeling conditioning, reinforcement learning has largely focused on formalizing a notion of the affective value of stimuli, in terms of the future rewards and punishments their presence implies. Psychologically, this notion of affective value is best thought of as a form of motivational value, and motivational opponency has itself been the focus of substantial experimental study. For instance, following Konorski (1967), Dickinson and Dearing (1979) and Dickinson and Balleine (2002) review evidence (such as transreinforcer blocking; Ganesan & Pearce, 1988) suggesting that it is psychologically reasonable to consider just one appetitive and one aversive motivational system, rather than either multiple appetitive and multiple aversive systems or a single combined system. These two systems are motivational opponents; they also have opposing preparatory behavioral effects, with the appetitive system inducing Pavlovian approach, and the aversive system withdrawal. The reinforcement learning model of dopaminergic activity identifies it as the crucial substrate of the appetitive motivational system; here, following, amongst others, Deakin and Graeff (1991), we model serotonergic activity as a crucial substrate of the aversive motivational system.

Psychological and implementational aspects of opponency have been much, and sometimes confusingly, debated. Psychologically, two main forms of opponency have been considered, one associated with the punctate presentation of conditioned and unconditioned stimuli, the other associated with their long-term delivery. For the former, rewarding unconditioned stimuli are assumed able to excite the appetitive system, as are conditioned stimuli associated with reward. Punishing unconditioned stimuli are assumed to excite the aversive system, as are conditioned stimuli associated with punishment. The inhibitory interaction between the two systems can have various consequences. For instance, extinguishing an appetitive conditioned stimulus could equally result from reducing its ability to drive the appetitive motivational system (passive extinction), or increasing its ability to drive the aversive motivational system (active extinction), or both (see, for example, Osgood, 1953). These possibilities have different experimental implications. Another example is that if a conditioned inhibitor for reward acts by exciting the aversive system, then it should be able to block (Kamin, 1969) learning of a conditioned predictor of shock (Dickinson and Dearing, 1979, Goodman and Fowler, 1983), since it will predict away the activation of the aversive motivational system.

Solomon and Corbit (1974) considered an apparently different and dynamic aspect of opponency in the case that one or both of the appetitive or aversive systems are excited for a substantial time. Stopping the delivery of a long sequence of unexpected rewards is aversive (perhaps characterized by frustration); stopping the delivery of a long sequence of unexpected punishments is appetitive (perhaps characterized by relief).

We seek to model both short- and long-term aspects of opponency. One way to proceed would be to build a phenomenological model, such as Solomon and Corbit's (1974), or a mechanistic one, such as those of Grossberg (2000) or Grossberg and Schmajuk (1987). Solomon and Corbit's (1974) model suggests that the long-term delivery of appetitive unconditioned stimuli excites the aversive opponent system at a slower timescale. When the unconditioned stimuli are removed, the opponent system is also slower to lose excitation, and can thus be motivationally dominant for a short while. Grossberg and his colleagues (e.g. Grossberg, 1984, Grossberg, 1988, Grossberg, 2000, Grossberg and Schmajuk, 1987) have extensively discussed an alternative mechanism for this (involving slow adaptation within the system that reports the original unconditioned stimulus, rather than the slow build-up of the opponent), and have shown how the rich internal dynamics that opponent systems exhibit might themselves be responsible for many otherwise puzzling phenomena.

By contrast with, though not necessarily in contradiction to, these proposals, we seek a computational account. We start by considering long-term aspects, arguing that opponency emerges naturally (Daw and Touretzky, 2000, Daw and Touretzky, 2002) from TD learning in the case of predicting long-run average rewards rather than summed future rewards (Puterman, 1994, Schwartz, 1993, Mahadevan, 1996, Tadepalli and Ok, 1998). As will become apparent, this form of TD learning embodies a natural opponency between the existing phasic dopamine signal, and a newly suggested, tonic, signal, which we identify with serotonin. We extend the scope of the model to predictions of summed future punishment, and thereby postulate mirror opponency, between a tonic dopamine signal and a phasic serotonin signal. Short-term aspects of opponency then arise through consideration of the ways that the predictions of future reward and punishment might be represented.
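As a preview of the formalism developed in the later sections, a standard average-reward TD prediction error (Schwartz, 1993, Mahadevan, 1996, Daw and Touretzky, 2002) has the form below; the notation is ours and is meant only to show where a tonic opponent term enters.

```latex
% \bar{r}: long-run average reward rate (a tonic quantity)
% V(s_t): prediction of future reward measured relative to that average
\delta(t) \;=\; r(t) \;-\; \bar{r} \;+\; V(s_{t+1}) \;-\; V(s_t)
```

The phasic components $r(t) + V(s_{t+1}) - V(s_t)$ correspond to the dopamine-like prediction error, while the tonic average $\bar{r}$ enters with the opposite sign; this is the sense in which the tonic signal, which we suggest may be serotonergic, acts as an opponent.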

In Section 2, we discuss the various aspects of the data on serotonin that have led us to consider it as being involved in aversive processing in general, and as an opponent to dopamine in particular. Section 3 covers the theoretical background to the TD learning model and the resulting link to short- and long-term aspects of opponency; Section 4 discusses long-term aspects of opponency; and Section 5 considers the consequences if serotonin exactly mirrors dopamine. The discussion ties together the various strands of our argument.

Section snippets

Serotonin in conditioning

As suggested by the vast range of its effects listed earlier, serotonin plays an extremely complicated set of roles in the brain, roles that it is impossible at present to encompass within a single theory. Compared with dopamine, it is anatomically more widespread and behaviorally much more diverse. Further, although the activity of serotonin cells has not been systematically tested in the range of conditioning tasks that has been used to probe dopamine cells (Jacobs and Fornal, 1997, Jacobs

Dopamine and temporal difference learning

Electrophysiological data on the activity of dopamine neurons suggest that they report a TD prediction error for predictions of long-run rewards (Montague et al., 1996, Schultz et al., 1997). The TD learning algorithm (Sutton, 1988, Bertsekas and Tsitsiklis, 1996, Sutton and Barto, 1998) uses samples of received rewards to learn a value function, which maps information about the current state of the world (i.e. the current stimuli) to a prediction of the rewards expected in the future. Value
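As a concrete illustration of the algorithm just described, here is a minimal tabular TD(0) sketch; the state names, reward values and learning parameters are purely illustrative and are not taken from the experiments or the model fits.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
    """One temporal-difference update; delta is the prediction error that,
    in the model, phasic dopamine activity is suggested to report."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

# Illustrative episodic use: a conditioned stimulus ("cs") is reliably
# followed by a reward delivered in state "us"; trials end in "end".
V = {"cs": 0.0, "us": 0.0, "end": 0.0}
for trial in range(200):
    td0_update(V, "cs", r=0.0, s_next="us")    # CS presented, no reward yet
    delta_at_reward = td0_update(V, "us", r=1.0, s_next="end")
# delta_at_reward shrinks towards zero as the reward becomes predicted, while
# the learned value V["cs"] (approximately gamma * 1) means that an unexpected
# presentation of the CS now itself elicits a positive prediction error,
# qualitatively matching the phasic dopamine recordings.
```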

Long-term opponency

The assumption made in the standard TD model that events are episodic, coming in separated trials, is clearly unrealistic, since most events, even in the context of behavioral experiments, are really ongoing. Treating them as such requires using a different notion of value; in particular, Eq. (1) must be replaced with a return that is not truncated at the end of each trial. Theoretical treatments of this case avoid the possibility of divergence by considering either discounted values, in which
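For the average-reward case, a minimal sketch (again with illustrative names and parameters) replaces the truncated episodic return with values measured relative to a slowly updated estimate of the reward rate; that tonic estimate is the quantity we suggest a serotonergic opponent signal could report.

```python
def average_reward_td_update(V, rho, s, r, s_next, alpha=0.1, beta=0.01):
    """One average-reward TD update (cf. Schwartz, 1993; Mahadevan, 1996).

    rho is a running estimate of the long-run average reward rate; it enters
    the prediction error with the opposite sign to the phasic reward terms.
    """
    delta = (r - rho) + V[s_next] - V[s]
    V[s] += alpha * delta
    rho += beta * (r - rho)   # slowly track the long-run reward rate
    return rho, delta
```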

Aversive conditioning and mirrored opponency

The key lacuna in the computational model that we have specified is the lack of an account of how the prediction error is reported for aversive events. From a theoretical standpoint, the learning of actions to obtain reward is no different from the learning of actions to avoid punishments. However, as stated earlier, there is both physiological (cells only have positive firing rates) and psychological (data suggest separate appetitive and aversive systems) evidence to support a distinction
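Written in the same illustrative notation as above, and not as a reconstruction of the paper's own equations, the mirrored scheme sketched in this section would attach a separate prediction error to punishments, with the tonic and phasic roles of the two neuromodulators exchanged:

```latex
% p(t): punishment at time t; \bar{p}: long-run average punishment rate;
% V_p: prediction of summed future punishment.
\delta_p(t) \;=\; p(t) \;-\; \bar{p} \;+\; V_p(s_{t+1}) \;-\; V_p(s_t)
```

On this speculation, the phasic part of $\delta_p$ would be carried by serotonin, and the tonic average punishment rate $\bar{p}$ by dopamine, mirroring the appetitive case.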

Discussion

We have suggested various ways in which serotonin, perhaps that released by the dorsal raphe nucleus serotonin system, could act as a motivational opponent system to dopamine in conditioning tasks. We considered how serotonin might report the long-run average reward rate as a tonic opponent to a phasic dopamine signal in the theoretical framework of average-case reinforcement learning. We also considered how dopamine might report the long-run average punishment rate as a tonic opponent to a

Acknowledgements

We are very grateful to Bill Deakin, Kenji Doya, Barry Everitt, Zach Mainen, Trevor Robbins, David Touretzky and Jonathan Williams for helpful discussions, and to Kenji Doya for inviting SK to the Metalearning and Neuromodulation workshop. Funding was from the Gatsby Charitable Foundation and the NSF (ND by a Graduate Fellowship and grants IIS-9978403 and DGE-9987588; SK by a Graduate Fellowship).

References (101)

  • G.K. Aghajanian et al. Serotonin and hallucinogens. Neuropsychopharmacology (1999)
  • B.D.O. Anderson et al. Optimal filtering (1979)
  • E.C. Azmitia. The serotonin-producing neurons of the midbrain median and dorsal raphe nuclei
  • E.C. Azmitia et al. An autoradiographic analysis of the differential ascending projections of the dorsal and median raphe nuclei in the rat. Journal of Comparative Neurology (1978)
  • A.G. Barto et al. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics (1983)
  • K.C. Berridge et al. What is the role of dopamine in reward: Hedonic impact, reward learning, or incentive salience? Brain Research Reviews (1998)
  • D.P. Bertsekas et al. Neuro-dynamic programming (1996)
  • M.C. Buhot. Serotonin receptors in cognitive behaviors. Current Opinion in Neurobiology (1997)
  • J.J. Canales et al. Psychomotor-activating effects mediated by dopamine D2 and D3 receptors in the nucleus accumbens. Pharmacology, Biochemistry and Behavior (2000)
  • N.D. Daw et al. Behavioral considerations suggest an average reward TD model of the dopamine system. Neurocomputing (2000)
  • N.D. Daw et al. Long-term reward prediction in TD models of the dopamine system. Neural Computation (2002)
  • P. Dayan. Motivated reinforcement learning. NIPS (2002)
  • P. Dayan et al. Learning and selective attention. Nature Neuroscience (2000)
  • J. De Vry et al. Effects of selected serotonin 5-HT1 and 5-HT2 receptor agonists on feeding behavior: Possible mechanisms of action. Neuroscience and Biobehavioral Reviews (2000)
  • J.F.W. Deakin. Roles of brain serotonergic neurons in escape, avoidance and other behaviors. Journal of Psychopharmacology (1983)
  • J.F.W. Deakin. 5-HT, antidepressant drugs and the psychosocial origins of depression. Journal of Psychopharmacology (1996)
  • J.F.W. Deakin et al. 5-HT and mechanisms of defence. Journal of Psychopharmacology (1991)
  • M.J. Detke. Extinction of sequential conditioned inhibition. Animal Learning and Behavior (1991)
  • A. Dickinson et al. The role of learning in motivation
  • A. Dickinson et al. Appetitive–aversive interactions and inhibitory processes
  • K. Doya. Metalearning, neuromodulation, and emotion
  • K. Doya. Metalearning and neuromodulation. Neural Networks, this issue (2002)
  • D.H. Edwards et al. Serotonin, social status and aggression. Current Opinion in Neurobiology (1997)
  • B.J. Everitt et al. Associative processes in addiction and reward. The role of amygdala–ventral striatal subsystems. Annals of the New York Academy of Sciences (1999)
  • H.L. Fields et al. Neurotransmitters in nociceptive modulatory circuits. Annual Review of Neuroscience (1991)
  • P.J. Fletcher. Dopamine receptor blockade in nucleus accumbens or caudate nucleus differentially affects feeding induced by 8-OH-DPAT injected into dorsal or median raphe. Brain Research (1991)
  • P.J. Fletcher. Effects of combined or separate 5,7-dihydroxytryptamine lesions of the dorsal and median raphe nuclei on responding maintained by a DRL 20s schedule of food reinforcement. Brain Research (1995)
  • P.J. Fletcher et al. Activation of 5-HT1B receptors in the nucleus accumbens reduces amphetamine-induced enhancement of responding for conditioned reward. Psychopharmacology (1999)
  • P.J. Fletcher et al. Selective destruction of brain serotonin neurons by 5,7-dihydroxytryptamine increases responding for a conditioned reward. Psychopharmacology (1999)
  • P.J. Fletcher et al. Conditioned place preference induced by microinjection of 8-OH-DPAT into the dorsal or median raphe nucleus. Psychopharmacology (1993)
  • P.J. Fletcher et al. Median raphe injections of 8-OH-DPAT lower frequency thresholds for lateral hypothalamic self-stimulation. Pharmacology Biochemistry and Behavior (1995)
  • R. Ganesan et al. Effect of changing the unconditioned stimulus on appetitive blocking. Journal of Experimental Psychology: Animal Behavior Processes (1988)
  • K. Gao et al. Activation of serotonergic neurons in the raphe magnus is not necessary for morphine analgesia. Journal of Neuroscience (1988)
  • K. Gao et al. Serotonergic pontomedullary neurons are not activated by antinociceptive stimulation in the periaqueductal gray. Journal of Neuroscience (1997)
  • P.A. Garris et al. Real-time measurement of electrically evoked extracellular dopamine in the striatum of freely moving rats. Journal of Neurochemistry (1997)
  • I. Geller et al. The effects of meprobamate, barbiturates, d-amphetamine and promazine on experimentally induced conflict in the rat. Psychopharmacology (1960)
  • J. Gibbon. Scalar expectancy theory and Weber's law in animal timing. Psychological Review (1977)
  • J.H. Goodman et al. Blocking and enhancement of fear conditioning by appetitive CSs. Animal Learning and Behavior (1983)
  • S. Grossberg. Some normal and abnormal behavioral syndromes due to transmitter gating of opponent processes. Biological Psychiatry (1984)
  • S. Grossberg. The imbalanced brain: From normal behavior to schizophrenia. Biological Psychiatry (2000)
  • S. Grossberg et al. Neural dynamics of attentionally modulated Pavlovian conditioning: Conditioned reinforcement, inhibition, and opponent processing. Psychobiology (1987)
  • F.A. Guarraci et al. An electrophysiological characterization of ventral tegmental area dopaminergic neurons during differential Pavlovian fear conditioning in the awake rabbit. Behavioural Brain Research (1999)
  • A.A. Harrison et al. Doubly dissociable effects of median- and dorsal-raphe lesions on the performance of the five-choice serial reaction time test of attention in rats. Behavioural Brain Research (1997)
  • A.A. Harrison et al. Central serotonin depletion impairs both the acquisition and performance of a symmetrically reinforced go/no-go conditional visual discrimination. Brain Research (1999)
  • E. Hollander. Treatment of obsessive-compulsive spectrum disorders with SSRIs. British Journal of Psychiatry (1998)
  • J.C. Horvitz. Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience (2000)
  • J.C. Houk et al. A model of how the basal ganglia generate and use neural signals that predict reinforcement
  • S. Ikemoto et al. The role of nucleus accumbens dopamine in motivated behavior: A unifying interpretation with special reference to reward-seeking. Brain Research Reviews (1999)
  • H. Imai et al. The organization of divergent axonal projections from the midbrain raphe nuclei in the rat. Journal of Comparative Neurology (1986)