
Neural Networks

Volume 15, Issues 4–6, June–July 2002, Pages 495-506

2002 Special issue
Metalearning and neuromodulation

https://doi.org/10.1016/S0893-6080(02)00044-8

Abstract

This paper presents a computational theory on the roles of the ascending neuromodulatory systems from the viewpoint that they mediate the global signals that regulate the distributed learning mechanisms in the brain. Based on the review of experimental data and theoretical models, it is proposed that dopamine signals the error in reward prediction, serotonin controls the time scale of reward prediction, noradrenaline controls the randomness in action selection, and acetylcholine controls the speed of memory update. The possible interactions between those neuromodulators and the environment are predicted on the basis of computational theory of metalearning.

Introduction

Some of the neurotransmitters that have spatially distributed, temporally extended effects on the recipient neurons and circuits are called neuromodulators (Katz, 1999, Saper, 2000, Marder and Thirumalai, 2002). The best known examples of neuromodulators are dopamine (DA), serotonin (5-HT), noradrenaline (NA; also called norepinephrine, NE), and acetylcholine (ACh). Neuromodulators have traditionally been assumed to be involved in the control of general arousal (Robbins, 1997, Saper, 2000). Recent advances in molecular biological techniques have provided rich data on the spatial localization and physiological effects of different neuromodulators and their receptors. This has prompted us to build a more specific yet still comprehensive theory of the functions of neuromodulators. This paper proposes a computational theory on the roles of the four major neuromodulators above from the viewpoint that neuromodulators are media for signaling specific global variables and parameters that regulate distributed learning modules in the brain (Doya, 2000b).

The computational theory for the acquisition of goal-directed behaviors has been formulated under the name of reinforcement learning (RL) (Barto, 1995b, Sutton and Barto, 1998, Doya, 2000c, Doya et al., 2001). The theory has been successfully applied to a variety of dynamic optimization problems, such as game programs (Tesauro, 1994), robotic control (Morimoto & Doya, 2001), and resource allocation (Singh & Bertsekas, 1997). In practical applications of reinforcement learning, a critical issue is how to set the parameters of the learning algorithm, such as the speed of learning, the size of the noise for exploration, and the time scale of future reward prediction. Such parameters globally affect the way many system parameters change in the course of learning, so they are called metaparameters or hyperparameters.
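As an illustration of how one such metaparameter enters an algorithm, Boltzmann (softmax) action selection uses an inverse temperature to set the size of the exploration noise. The sketch below is a generic illustration, not this paper's implementation; the function name and the toy values are ours:

```python
import math
import random

def softmax_action(q_values, beta):
    """Boltzmann action selection: beta is the inverse temperature.

    beta -> 0 gives uniformly random exploration; a large beta gives
    near-greedy, almost deterministic execution.
    """
    weights = [math.exp(beta * q) for q in q_values]
    threshold = random.random() * sum(weights)
    for action, w in enumerate(weights):
        threshold -= w
        if threshold <= 0.0:
            return action
    return len(weights) - 1

random.seed(0)
q = [1.0, 2.0, 0.5]
# With a large beta, the highest-valued action (index 1) dominates.
greedy_picks = sum(softmax_action(q, 50.0) == 1 for _ in range(1000))
```

With beta = 0, the same code draws each of the three actions with probability 1/3, i.e. pure exploration.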

In statistical learning theory, the need for setting the right metaparameters, such as the degrees of freedom of statistical models and the prior distribution of parameters, is widely recognized. Theories of metaparameter setting have been developed from the viewpoints of risk minimization (Vapnik, 2000) and Bayesian estimation (Neal, 1996). However, many applications of reinforcement learning have depended on heuristic tuning of the metaparameters by human experts. The need for such tuning is one of the major reasons why sophisticated learning algorithms that perform successfully in the laboratory cannot be practically applied in highly variable environments at home or on the street.

Compared to current artificial learning systems, the learning mechanisms implemented in the brain appear to be much more robust and flexible. Humans and animals can learn novel behaviors under a wide variety of environments. This suggests that the brain has a certain mechanism for metalearning, a capability of dynamically adjusting its own metaparameters of learning. This paper presents a hypothesis stating that the ascending neuromodulatory systems (Fig. 1) are the media of metalearning for controlling and coordinating the distributed learning modules in the brain (Doya, 1999). More specifically, we propose the following set of hypotheses to explain the roles of the four major ascending neuromodulators (Doya, 2000b):

  1. Dopamine represents the global learning signal for prediction of rewards and reinforcement of actions.

  2. Serotonin controls the balance between short-term and long-term prediction of reward.

  3. Noradrenaline controls the balance between wide exploration and focused execution.

  4. Acetylcholine controls the balance between memory storage and renewal.

In order to state the above hypotheses in a more computationally well-defined manner, we first review the basic algorithms of reinforcement learning and the roles of major metaparameters. We then propose a set of hypotheses on how such metaparameters are regulated by the above neuromodulators. Finally, we discuss the possible neural mechanisms of metaparameter control and the possible interactions between neuromodulatory systems predicted from the hypotheses.

In this paper, our main focus is on the roles of neuromodulators within the circuit of the basal ganglia, which has been suggested as the major locus of reinforcement learning (Houk et al., 1995, Montague et al., 1996, Doya, 2000a). However, we also discuss how their roles can be generalized to other brain areas, including the cerebral cortex and the cerebellum.


Reinforcement learning algorithm

Reinforcement learning is a computational framework in which an agent learns to take actions in response to the state of the environment so that the acquired reward is maximized in the long run (Fig. 2) (Barto, 1995b, Sutton and Barto, 1998, Doya, 2000c, Doya et al., 2001). What makes reinforcement learning difficult yet interesting is that the selection of an action affects not only the immediate reward but also future rewards, through the dynamic evolution of future states.
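The core of the algorithm described in this framework is the temporal-difference (TD) update of a value function. A minimal tabular sketch follows; the two-step toy task and the particular constants are our own illustration, not the paper's:

```python
ALPHA = 0.1   # learning rate: speed of memory update
GAMMA = 0.9   # discount factor: time scale of reward prediction

def td_update(V, s, r, s_next):
    """One temporal-difference update of the value table V."""
    v_next = 0.0 if s_next is None else V[s_next]
    delta = r + GAMMA * v_next - V[s]  # TD error: the global learning signal
    V[s] += ALPHA * delta              # memory update at speed ALPHA
    return delta

# Toy episodic task: state 0 -> state 1 -> terminal, reward 1 at the end.
V = [0.0, 0.0]
for _ in range(2000):
    td_update(V, 0, 0.0, 1)
    td_update(V, 1, 1.0, None)
# V[1] converges toward 1 (the immediate reward);
# V[0] converges toward GAMMA * V[1] = 0.9.
```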


Hypothetical roles of neuromodulators

Now we restate our hypotheses on the roles of neuromodulators in terms of the global learning signal and metaparameters introduced in the above reinforcement learning algorithm (Doya, 2000b):

  1. Dopamine signals the TD error δ.

  2. Serotonin controls the discount factor γ.

  3. Noradrenaline controls the inverse temperature β.

  4. Acetylcholine controls the learning rate α.

Below, we review the experimental findings and theoretical models that support these hypotheses.
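The hypothesis about the discount factor can be made concrete: under discounting, a reward r delivered k steps in the future is worth γ^k · r now, so a small γ makes the agent short-sighted. The numbers below are purely illustrative:

```python
def present_value(r, k, gamma):
    """Discounted present value of reward r delivered k steps ahead."""
    return (gamma ** k) * r

# A short-sighted agent (small gamma) prefers a small immediate reward;
# a far-sighted agent (gamma near 1) prefers a larger delayed one.
small_now = present_value(1.0, 0, 0.5)             # 1.0 under either gamma
large_later_myopic = present_value(4.0, 5, 0.5)    # 4 * 0.5**5 = 0.125
large_later_patient = present_value(4.0, 5, 0.99)  # 4 * 0.99**5 ~ 3.80
```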

Dynamic interactions of neuromodulators

Based on the above hypotheses on the specific roles of neuromodulators in reinforcement learning, it is possible to theoretically predict how the activities of those modulators should depend on each other. Fig. 9 shows the possible interactions between the neuromodulators, the experience of the agent represented in the form of value functions, and the environment.
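One speculative way to sketch such a dependency in code, loosely in the spirit of metaparameter-control schemes such as Ishii et al. (2002), is to raise the inverse temperature as the running average reward improves, so that a poorly performing agent explores widely while a well-performing one executes its policy almost deterministically. The class, its parameter values, and the linear mapping are all our own assumptions, not a mechanism proposed in this paper:

```python
class BetaController:
    """Speculative sketch: anneal exploration as performance improves.

    Keeps a running average of reward and maps it linearly to the
    inverse temperature beta of softmax action selection.
    """

    def __init__(self, beta_min=0.1, beta_max=10.0, tau=0.05):
        self.avg_reward = 0.0
        self.beta_min = beta_min
        self.beta_max = beta_max
        self.tau = tau  # time constant of the running average

    def update(self, reward):
        """Fold one reward into the running average; return current beta."""
        self.avg_reward += self.tau * (reward - self.avg_reward)
        # Clip the average reward to [0, 1] before mapping to beta.
        frac = min(max(self.avg_reward, 0.0), 1.0)
        return self.beta_min + frac * (self.beta_max - self.beta_min)

# After sustained reward, exploration is annealed toward beta_max.
ctrl = BetaController()
beta = 0.0
for _ in range(200):
    beta = ctrl.update(1.0)
```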

Conclusion

This paper proposed a unified theory on the roles of neuromodulators in mediating the global learning signal and the metaparameters of the distributed learning mechanisms of the brain. We considered how such regulatory mechanisms can be implemented in the neural circuit centered around the basal ganglia. However, many other brain areas and functions require further consideration, for example, the roles of the amygdala and hippocampus in reinforcement learning.

Acknowledgements

The author is grateful to Peter Dayan, Barry Everitt, Takeshi Inoue, Sham Kakade, Go Okada, Yasumasa Okamoto, and Shigeto Yamawaki for valuable discussions on the roles of serotonin and also thanks Nicolas Schweighofer for his comments on the manuscript.

References (83)

  • S Ishii et al.

    Control of exploitation–exploration meta-parameter in reinforcement learning

    Neural Networks

    (2002)
  • D Joel et al.

    Actor–critic models of the basal ganglia: New anatomical and computational perspectives. Dopaminergic modulation in the basal ganglia

    Neural Networks

    (2002)
  • S Kakade et al.

    Dopamine bonuses

    Neural Networks

    (2002)
  • M Kawato

    Internal models for motor control and trajectory planning

    Current Opinion in Neurobiology

    (1999)
  • M Littman et al.

    Learning policies for partially observable environments: Scaling up

  • E Marder et al.

    Cellular, synaptic, and network effects of neuromodulators

    Neural Networks

    (2002)
  • F.A Middleton et al.

    Basal ganglia and cerebellar loops: Motor and cognitive circuits

    Brain Research Reviews

    (2000)
  • J Morimoto et al.

    Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning

    Robotics and Autonomous Systems

    (2001)
  • C.A Paladini et al.

    Striatal, pallidal, and pars reticulata evoked inhibition of nigrostriatal dopaminergic neurons is mediated by GABA-A receptors in vivo

    Neuroscience

    (1999)
  • E Perry et al.

    Acetylcholine in mind: A neurotransmitter correlate of consciousness?

    Trends in Neurosciences

    (1999)
  • S Rahman et al.

    Decision making and neuropsychiatry

    Trends in Cognitive Sciences

    (2001)
  • D.D Rasmusson

    The role of acetylcholine in cortical synaptic plasticity

    Behavioural Brain Research

    (2000)
  • P Redgrave et al.

    Is the short-latency dopamine response too short to signal reward error?

    Trends in Cognitive Sciences

    (1999)
  • J.N.J Reynolds et al.

    Dopamine-dependent plasticity of cortico-striatal synapses

    Neural Networks

    (2002)
  • T.W Robbins

    Arousal systems and attentional processes

    Biological Psychology

    (1997)
  • H Sershen et al.

    Serotonin-mediated striatal dopamine release involves the dopamine uptake site and the serotonin receptor

    Brain Research Bulletin

    (2000)
  • R.E Suri

    TD models of reward predictive responses in dopamine neurons

    Neural Networks

    (2002)
  • M Usher et al.

    Neuromodulation of decision and response selection

    Neural Networks

    (2002)
  • J.R Wickens et al.

    Dopamine reverses the depression of rat corticostriatal synapses which normally follows high-frequency stimulation of cortex in vitro

    Neuroscience

    (1996)
  • R.A Wise

    Neurobiology of addiction

    Current Opinion in Neurobiology

    (1996)
  • A.J Yu et al.

    Acetylcholine in cortical inference

    Neural Networks

    (2002)
  • T Aosaki et al.

    Responses of tonically active neurons in the primate's striatum undergo systematic changes during behavioral sensory-motor conditioning

    Journal of Neuroscience

    (1994)
  • G Aston-Jones et al.

    Locus coeruleus neurons in monkey are selectively activated by attended cues in a vigilance task

    Journal of Neuroscience

    (1994)
  • A.G Barto

    Adaptive critics and the basal ganglia

  • A.G Barto

    Reinforcement learning

  • A.G Barto et al.

    Neuronlike adaptive elements that can solve difficult learning control problems

    IEEE Transactions on Systems, Man, and Cybernetics

    (1983)
  • Baxter, J., & Bartlett, P. L (2000). Reinforcement learning in POMDP's via direct gradient ascent. International...
  • J Brown et al.

    How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues

    Journal of Neuroscience

    (1999)
  • R.N Cardinal et al.

    Impulsive choice induced in rats by lesions of the nucleus accumbens core

    Science

    (2001)
  • P Dayan et al.

    Advances in neural information processing systems

    (2002)
  • P De Deurwaerdère et al.

    Opposite changes of in vivo dopamine release in the rat nucleus accumbens and striatum that follow electrical stimulation of the dorsal raphe nucleus: Role of 5-HT3 receptors

    Journal of Neuroscience

    (1998)