2002 Special issue
Opponent interactions between serotonin and dopamine
Introduction
From a computational perspective, serotonin (5HT) is the most mysterious of the main vertebrate neuromodulators. Pharmacological investigations reveal that it plays a role in a wide variety of phenomena, including impulsivity, obsessionality, aggression, psychomotor inhibition, latent inhibition, analgesia, hallucinations, eating disorders, attention and mood (Solomon et al., 1980, Soubrié, 1986, Fields et al., 1991, Harrison et al., 1997, Harrison et al., 1999, Westenberg et al., 1996, Buhot, 1997, Edwards and Kravitz, 1997, Hollander, 1998, Aghajanian and Marek, 1999, Masand and Gupta, 1999, Stanford, 1999, De Vry and Schreiber, 2000, Lesch and Merschdorf, 2000, Stahl, 2000). However, these effects involve many complexities. For instance, drugs that immediately and selectively inhibit serotonin reuptake, and so prolong serotonin's synaptic availability, take some two weeks to have an effect on mood. Also, electrophysiological data (Jacobs and Fornal, 1997, Jacobs and Fornal, 1999, Gao et al., 1997, Gao et al., 1998) show that serotonin cells do not obviously alter their firing rates in response to the sort of significant stimuli that might be expected to control some of the behaviors described earlier. Thus, the experimental data on the involvement of serotonin are confusing, and this has inevitably impeded the development of computational theory.
In this paper, we focus on one important (though emphatically not exclusive) aspect of serotonin suggested by anatomical and pharmacological data, namely an apparent opponent partnership with dopamine (DA, Azmitia, 1978, Azmitia and Segal, 1978, Deakin, 1983, Fletcher, 1991, Fletcher, 1995, Vertes, 1991, Deakin, 1996, Kapur and Remington, 1996, Fletcher and Korth, 1999, Fletcher et al., 1999). Substantial evidence supports the theory that phasic activity of dopamine cells in the ventral tegmental area and substantia nigra pars compacta reports a prediction error for summed future reward (Montague et al., 1996, Schultz et al., 1997, Schultz, 1998) in the context of a temporal difference (TD) model (Sutton, 1988, Sutton and Barto, 1990) of reinforcement learning (Bertsekas and Tsitsiklis, 1996, Sutton and Barto, 1998). To the extent that serotonin acts as an opponent to dopamine, we can use our understanding of the role of dopamine to help constrain aspects of the role of serotonin. Equally, the TD model of dopamine is based on experiments that only probe a small part of the overall scope of reinforcement learning. Extending the model to cope with theoretical issues such as long-run average rewards (Daw and Touretzky, 2000, Daw and Touretzky, 2002) actually leads to the requirement for a signal that acts like an opponent to dopamine. Here, we explore this candidate role for serotonin.
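To make the TD account concrete, its core computation can be sketched in a few lines of Python. This is an illustrative toy, not a specification from the literature: the trial structure, parameter values and function names are our own. States are simply time steps within a trial, a reward arrives at the final step, and the prediction error δt = rt + γV(st+1) − V(st) is the quantity that phasic dopamine activity is proposed to report.

```python
import numpy as np

def td_trial(V, rewards, alpha=0.1, gamma=1.0):
    """One trial of TD(0) over a tapped-delay line of states (one per
    time step). delta_t = r_t + gamma*V[t+1] - V[t] is the prediction
    error that phasic dopamine activity is proposed to report."""
    deltas = np.zeros(len(rewards))
    for t in range(len(rewards)):
        v_next = gamma * V[t + 1] if t + 1 < len(V) else 0.0
        deltas[t] = rewards[t] + v_next - V[t]
        V[t] += alpha * deltas[t]
    return deltas

# Six-step trial: the conditioned stimulus starts the trial,
# and reward arrives only at the final step.
rewards = np.zeros(6)
rewards[-1] = 1.0
V = np.zeros(6)

first = td_trial(V, rewards)       # naive: the error appears at the reward
for _ in range(300):
    last = td_trial(V, rewards)    # trained: the reward is predicted away
```

On the first trial the error appears at the reward itself; after training the reward is fully predicted away, and the prediction (here V[0] ≈ 1) has transferred to the start of the trial, so an unsignalled stimulus onset would itself evoke a phasic response, in line with the recordings on which the TD account rests.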
Opponency has a venerable history in psychology and neuroscience. In an implementational form (e.g. Grossberg, 1988), it starts from the simple idea of using two systems to code for events (such as affective events), with one system reporting positive excursions from a baseline (appetitive events), the other system reporting negative excursions (aversive events), and mutual inhibition between the systems and/or opposing effects on common outputs. From a physiological perspective, this neatly circumvents the absence of negative firing rates. However, opponency turns out to have some less obvious mathematical and computational properties, which have been given diverse interpretations in everything from affective systems to circadian timing mechanisms (Grossberg, 1984, Grossberg, 2000).
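As a minimal illustration of this coding scheme (the function names and baseline convention are our own assumptions), a signed affective quantity can be carried by two non-negative channels with opposing effects on a common output:

```python
def opponent_code(x, baseline=0.0):
    """Code a signed value x as two non-negative channel activities:
    the appetitive channel reports positive excursions from baseline,
    the aversive channel negative ones."""
    appetitive = max(x - baseline, 0.0)
    aversive = max(baseline - x, 0.0)
    return appetitive, aversive

def readout(appetitive, aversive):
    # Opposing effects on a common output recover the signed quantity.
    return appetitive - aversive
```

Neither channel ever needs a negative firing rate, yet their difference recovers the full signed quantity; mutual inhibition between the channels additionally ensures that at most one is active at a time.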
In this paper, we focus on motivational opponency between appetitive and aversive systems. In modeling conditioning, reinforcement learning has largely focused on formalizing a notion of the affective value of stimuli, in terms of the future rewards and punishments their presence implies. Psychologically, this notion of affective value is best thought of as a form of motivational value, and motivational opponency has itself been the focus of substantial experimental study. For instance, following Konorski (1967), Dickinson and Dearing (1979) and Dickinson and Balleine (2002) review evidence (such as transreinforcer blocking; Ganesan & Pearce, 1988) suggesting that it is psychologically reasonable to consider just one appetitive and one aversive motivational system, rather than either multiple systems of each kind or a single combined system. These two systems are motivational opponents; they also have opposing preparatory behavioral effects, with the appetitive system inducing Pavlovian approach, and the aversive system withdrawal. The reinforcement learning model of dopaminergic activity identifies it as the crucial substrate of the appetitive motivational system; here, following Deakin and Graeff (1991) amongst others, we model serotonergic activity as a crucial substrate of the aversive motivational system.
Psychological and implementational aspects of opponency have been much, and sometimes confusingly, debated. Psychologically, two main forms of opponency have been considered, one associated with the punctate presentation of conditioned and unconditioned stimuli, the other associated with their long-term delivery. For the former, rewarding unconditioned stimuli are assumed able to excite the appetitive system, as are conditioned stimuli associated with reward. Punishing unconditioned stimuli are assumed to excite the aversive system, as are conditioned stimuli associated with punishment. The inhibitory interaction between the two systems can have various consequences. For instance, extinguishing an appetitive conditioned stimulus could equally result from reducing its ability to drive the appetitive motivational system (passive extinction), or increasing its ability to drive the aversive motivational system (active extinction), or both (see, for example, Osgood, 1953). These possibilities have different experimental implications. Another example is that if a conditioned inhibitor for reward acts by exciting the aversive system, then it should be able to block (Kamin, 1969) learning of a conditioned predictor of shock (Dickinson and Dearing, 1979, Goodman and Fowler, 1983), since it will predict away the activation of the aversive motivational system.
Solomon and Corbit (1974) considered an apparently different and dynamic aspect of opponency in the case that one or both of the appetitive or aversive systems are excited for a substantial time. Stopping the delivery of a long sequence of unexpected rewards is aversive (perhaps characterized by frustration); stopping the delivery of a long sequence of unexpected punishments is appetitive (perhaps characterized by relief).
We seek to model both short- and long-term aspects of opponency. One way to proceed would be to build a phenomenological model, such as Solomon and Corbit's (1974), or a mechanistic one, such as those of Grossberg (2000) or Grossberg and Schmajuk (1987). Solomon and Corbit's (1974) model suggests that the long-term delivery of appetitive unconditioned stimuli excites the aversive opponent system at a slower timescale. When the unconditioned stimuli are removed, the opponent system is also slower to lose excitation, and can thus be motivationally dominant for a short while. Grossberg and his colleagues (e.g. Grossberg, 1984, Grossberg, 1988, Grossberg, 2000, Grossberg and Schmajuk, 1987) have extensively discussed an alternative mechanism (involving slow adaptation within the system that reports the original unconditioned stimulus, rather than a slow build-up of the opponent), and have shown how the rich internal dynamics that opponent systems exhibit might themselves be responsible for many otherwise puzzling phenomena.
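The dynamical flavor of Solomon and Corbit's proposal can be caricatured with two leaky integrators, the opponent one slower. The time constants and variable names below are illustrative assumptions, not fits to any data:

```python
import numpy as np

def opponent_process(stimulus, tau_primary=2.0, tau_opponent=20.0):
    """Primary process a tracks the stimulus quickly; the opponent b
    tracks a slowly. The net affective signal a - b overshoots below
    baseline when a sustained stimulus is removed."""
    a = b = 0.0
    net = []
    for s in stimulus:
        a += (s - a) / tau_primary    # fast leaky integrator
        b += (a - b) / tau_opponent   # slow opponent chasing a
        net.append(a - b)
    return np.array(net)

# Sustained reward delivery for 100 steps, then abrupt removal.
stim = np.r_[np.ones(100), np.zeros(100)]
net = opponent_process(stim)
```

During sustained delivery the net response rises and then habituates as the opponent catches up; at removal, the fast primary process decays first, so the net signal overshoots below baseline (the aversive after-reaction) before returning to rest.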
By contrast with, though not necessarily in contradiction to, these proposals, we seek a computational account. We start by considering long-term aspects, arguing that opponency emerges naturally (Daw and Touretzky, 2000, Daw and Touretzky, 2002) from TD learning in the case of predicting long-run average rewards rather than summed future rewards (Puterman, 1994, Schwartz, 1993, Mahadevan, 1996, Tadepalli and Ok, 1998). As will become apparent, this form of TD learning embodies a natural opponency between the existing phasic dopamine signal, and a newly suggested, tonic, signal, which we identify with serotonin. We extend the scope of the model to predictions of summed future punishment, and thereby postulate mirror opponency, between a tonic dopamine signal and a phasic serotonin signal. Short-term aspects of opponency then arise through consideration of the ways that the predictions of future reward and punishment might be represented.
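For the long-term case, the average-reward form of TD learning at the heart of this proposal can be sketched as follows. This is again an illustrative implementation with our own parameter choices and a toy two-state chain: the phasic prediction error is measured against a slowly updated estimate rho of the long-run reward rate, and it is this tonic term that is cast as the opponent signal.

```python
def average_reward_td_step(V, s, r, s_next, rho, alpha=0.1, alpha_rho=0.01):
    """One step of average-reward TD learning.

    delta = r - rho + V[s_next] - V[s]: the phasic (dopamine-like)
    error is measured against the tonic long-run reward rate rho,
    the proposed opponent signal."""
    delta = r - rho + V[s_next] - V[s]
    V[s] += alpha * delta
    rho += alpha_rho * (r - rho)  # slow running estimate of the reward rate
    return delta, rho

# Two states visited alternately; reward on every visit to state 1,
# so the true long-run reward rate is 0.5.
V, rho, s = [0.0, 0.0], 0.0, 0
for _ in range(5000):
    s_next = 1 - s
    r = float(s == 1)
    delta, rho = average_reward_td_step(V, s, r, s_next, rho)
    s = s_next
```

In this toy chain the reward-rate estimate settles near 0.5, and the values learned are differential ones, defined only relative to that tonic baseline.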
In Section 2, we discuss the various aspects of the data on serotonin that have led us to consider it as being involved in aversive processing in general, and as an opponent to dopamine in particular. Section 3 covers the theoretical background to the TD learning model and the resulting link to short- and long-term aspects of opponency; Section 4 discusses long-term aspects of opponency; and Section 5 considers the consequences if serotonin exactly mirrors dopamine. The discussion ties together the various strands of our argument.
Serotonin in conditioning
As suggested by the vast range of its effects listed earlier, serotonin plays an extremely complicated set of roles in the brain, roles that it is impossible at present to encompass within a single theory. Compared with dopamine, it is anatomically more widespread and behaviorally much more diverse. Further, although the activity of serotonin cells has not been systematically tested in the range of conditioning tasks that has been used to probe dopamine cells (Jacobs and Fornal, 1997, Jacobs and Fornal, 1999) …
Dopamine and temporal difference learning
Electrophysiological data on the activity of dopamine neurons suggest that they report a TD prediction error for predictions of long-run rewards (Montague et al., 1996, Schultz et al., 1997). The TD learning algorithm (Sutton, 1988, Bertsekas and Tsitsiklis, 1996, Sutton and Barto, 1998) uses samples of received rewards to learn a value function, which maps information about the current state of the world (i.e. the current stimuli) to a prediction of the rewards expected in the future. …
Long-term opponency
The assumption made in the standard TD model that events are episodic, coming in separated trials, is clearly unrealistic, since most events, even in the context of behavioral experiments, are really ongoing. Treating them as such requires a different notion of value; in particular, Eq. (1) must be replaced with a return that is not truncated at the end of each trial. Theoretical treatments of this case avoid the possibility of divergence by considering either discounted values, in which rewards are downweighted according to their delay, or values measured relative to the long-run average reward rate. …
Aversive conditioning and mirrored opponency
The key lacuna in the computational model that we have specified is the lack of an account of how the prediction error is reported for aversive events. From a theoretical standpoint, the learning of actions to obtain rewards is no different from the learning of actions to avoid punishments. However, as stated earlier, there is both physiological (cells only have positive firing rates) and psychological (data suggest separate appetitive and aversive systems) evidence to support a distinction between the two. …
Discussion
We have suggested various ways in which serotonin, perhaps that released by the dorsal raphe nucleus serotonin system, could act as a motivational opponent to dopamine in conditioning tasks. We considered how serotonin might report the long-run average reward rate as a tonic opponent to a phasic dopamine signal in the theoretical framework of average-case reinforcement learning. We also considered how dopamine might report the long-run average punishment rate as a tonic opponent to a phasic serotonin signal. …
Acknowledgements
We are very grateful to Bill Deakin, Kenji Doya, Barry Everitt, Zach Mainen, Trevor Robbins, David Touretzky and Jonathan Williams for helpful discussions, and to Kenji Doya for inviting SK to the Metalearning and Neuromodulation workshop. Funding was from the Gatsby Charitable Foundation and the NSF (ND by a Graduate Fellowship and grants IIS-9978403 and DGE-9987588; SK by a Graduate Fellowship).
References (101)

- Aghajanian, G. K., & Marek, G. J. (1999). Serotonin and hallucinogens. Neuropsychopharmacology.
- Anderson, B. D. O., & Moore, J. B. (1979). Optimal filtering.
- Azmitia, E. C. (1978). The serotonin-producing neurons of the midbrain median and dorsal raphe nuclei.
- Azmitia, E. C., & Segal, M. (1978). An autoradiographic analysis of the differential ascending projections of the dorsal and median raphe nuclei in the rat. Journal of Comparative Neurology.
- Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics.
- Berridge, K. C., & Robinson, T. E. (1998). What is the role of dopamine in reward: Hedonic impact, reward learning, or incentive salience? Brain Research Reviews.
- Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming.
- Buhot, M.-C. (1997). Serotonin receptors in cognitive behaviors. Current Opinion in Neurobiology.
- Canales, J. J., & Iversen, S. D. (2000). Psychomotor-activating effects mediated by dopamine D2 and D3 receptors in the nucleus accumbens. Pharmacology, Biochemistry and Behavior.
- Daw, N. D., & Touretzky, D. S. (2000). Behavioral considerations suggest an average reward TD model of the dopamine system. Neurocomputing.