The dopamine system has been thought to play a central role in guiding behavior based on rewards. Recent pharmacological studies suggest that another monoamine neurotransmitter, serotonin, is also involved in reward processing. To elucidate the functional relationship between serotonin neurons and dopamine neurons, we performed single-unit recording in the dorsal raphe nucleus (DRN), a major source of serotonin, and the substantia nigra pars compacta, a major source of dopamine, while monkeys performed saccade tasks in which the position of the target indicated the size of an upcoming reward. After target onset, but before reward delivery, the activity of many DRN neurons was modulated tonically by the expected reward size with either large- or small-reward preference, whereas putative dopamine neurons had phasic responses and only preferred large rewards. After reward delivery, the activity of DRN neurons was modulated tonically by the received reward size with either large- or small-reward preference, whereas the activity of dopamine neurons was not modulated except after the unexpected reversal of the position–reward contingency. Thus, DRN neurons encode the expected and received rewards, whereas dopamine neurons encode the difference between the expected and received rewards. These results suggest that the DRN, probably including serotonin neurons, signals the reward value associated with the current behavior.
Many functions of the brain are modified by various kinds of monoamine neurons. In particular, dopamine and serotonin appear to be the two major modulators of motivational and emotional behaviors (for review, see Daw et al., 2002). The role of dopamine is particularly clear because dopamine neurons in the midbrain in and around the substantia nigra pars compacta (SNc) are excited by a reward or a sensory event that predicts the reward, either of which can change motivational or emotional states. More specifically, the activity of the dopamine neurons encodes the difference between the expected reward and the actual reward, which is often called reward prediction error. This signal is suggested to induce learning and modulate actions (Mirenowicz and Schultz, 1994; Montague et al., 1996; Schultz et al., 1997; Hollerman and Schultz, 1998; Schultz, 1998; Suri and Schultz, 1998).
Several lines of evidence suggest that serotonin is also related to reward-related behaviors (Rogers et al., 1999; Daw et al., 2002; Doya, 2002; Schweighofer et al., 2007), in addition to other functions such as the sleep–wake cycle (McGinty and Harper, 1976; Lydic et al., 1983; Guzman-Marin et al., 2000; Dugovic, 2001), appetite (Curzon, 1990), locomotion (Jacobs and Fornal, 1993), emotion and social behavior (Davidson et al., 2000; Graeff, 2004), stress-coping behavior (Deakin, 1991; Graeff et al., 1996), and learning and memory (Meneses, 1999). Notably, it has been proposed that there are opponent interactions between dopamine and serotonin (for review, see Kapur and Remington, 1996). However, the physiological basis of the function of the serotonin system in cognitive and motivational behavior has not been well understood. Electrophysiological studies of the raphe nuclei have focused mainly on the sleep–wake cycle and motor behavior (for review, see Jacobs and Fornal, 1993). Specifically, it is unknown whether and how serotonin neurons in the raphe nuclei encode reward-related information.
In a series of studies using saccade tasks with a biased reward schedule, we have shown that the activity of neurons in the caudate (Kawagoe et al., 1998; Lauwereyns et al., 2002) and the substantia nigra pars reticulata (SNr) (Sato and Hikosaka, 2002), as well as putative dopamine neurons in SNc (Nakahara et al., 2004; Takikawa et al., 2004), was modulated depending on the expected reward. We also showed that the reward-dependent changes in saccade behavior depended on the physiological dopamine release in the caudate (Nakamura and Hikosaka, 2006). Having found that these tasks engage the basal ganglia and the dopamine system, we hypothesized that they would also recruit the serotonin system. We therefore recorded from the dorsal raphe nucleus (DRN), the principal source of serotonergic innervations in the basal ganglia (van der Kooy and Hattori, 1980; Imai et al., 1986; Corvaja et al., 1993). As a comparison, we also recorded from dopamine neurons using the same tasks in the same animals. We found that neurons in the DRN relay signals related to cognitive and motivational processes, but in a different manner from the dopamine system.
Materials and Methods
We used four hemispheres of two rhesus monkeys (Macaca mulatta; laboratory designations: E, male; L, female). Both animals had been implanted with scleral search coils for measuring eye position and a post for holding the head. The recording chambers were placed over the posterior cortices. All aspects of the behavioral experiment, including presentation of stimuli, monitoring of eye movements, monitoring of neuronal activity, and delivery of reward and electrical stimulation were under the control of a QNX-based real-time experimentation data acquisition system (REX; Laboratory of Sensorimotor Research, National Eye Institute–National Institutes of Health, Bethesda, MD). Eye position was monitored by means of a scleral search coil system with 1 ms resolution. Stimuli generated by an active matrix liquid crystal display projector (PJ550; ViewSonic, Walnut, CA) were rear-projected on a frontoparallel screen 25 cm from the monkey's eyes. On successful completion of each trial, drops of water or juice were delivered as reward through a spigot under control of a solenoid valve. Magnetic resonance images were obtained to determine the position of the electrode. The activity of single neurons was recorded using tungsten electrodes (Frederick Haer, Bowdoinham, ME; diameter, 0.25 mm; 1–3 MΩ). The signal was amplified with a bandpass filter (200 Hz to 5 kHz) (BAK, Mount Airy, MD) and collected at 1 kHz via a custom-made window discriminator (MEX). We also collected the spike waveform for each recorded neuron. All procedures were approved by the Institutional Animal Care and Use Committee and complied with Public Health Service Policy on the humane care and use of laboratory animals.
The animal performed a memory-guided saccade task with a biased reward schedule [one-direction rewarded memory-guided saccade task (1DR-MGS)] (see Fig. 1A). The appearance of a central fixation point (FP) (diameter, 0.6°) signaled the trial initiation. The monkeys were required to fixate on the FP and maintain fixation within a window of ∼3°. After fixation on the FP for 1000–1500 ms (“fixation period”), a cue indicating the future target position (diameter, 1.2°) was presented for 100 ms either to the right or left 20° from the FP. The position of the target was chosen pseudorandomly such that within every “subblock” of four trials each of the two positions was chosen twice. The monkey had to keep fixating on the FP for another 800 ms until the FP went off. The disappearance of the FP was the cue for the monkey to make a saccade toward the memorized cue position. A correct saccade was signaled by the appearance of the target with a 100 ms delay. A liquid reward was delivered with an additional 100 ms delay. If the monkey broke fixation at any time during the fixation period or failed to make a saccade to the cued position, the trial was determined to be an error, and the same trial was repeated until a correct saccade occurred. The intertrial interval, which started at the time of reward offset and lasted until FP onset in the next trial, was 3 s.
The biased reward schedule was introduced in blocks (Kawagoe et al., 1998). In one block of 20–28 trials (10–14 trials for each direction), the amount of reward was always large (0.4 ml) for one direction of the target and small (0 or 0.01 ml) for the other direction (for example, left, large reward; right, small reward). In the next block, the position–reward contingency was reversed (i.e., left, small; right, large). These two kinds of blocks with opposite position–reward contingencies are called the left-large and right-large blocks, and they were alternated two or three times for each recording session (see Fig. 1C).
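The trial and block structure described above can be sketched in a few lines of Python. This is only an illustration, not the REX code used in the experiments. The names `make_block` and `make_session` are hypothetical. Drawing block lengths from 20, 24, or 28 trials is an assumption that follows from combining the four-trial subblocks with the stated 20–28 trial range. The small reward is set to 0.01 ml, one of the two values given in the text. Repetition of error trials is not modeled.

```python
import random

LARGE_ML = 0.4    # large reward volume given in the text
SMALL_ML = 0.01   # text gives 0 or 0.01 ml; 0.01 is chosen here for illustration


def make_block(large_side, n_subblocks, rng):
    """Return a list of (target_side, reward_ml) trials for one block.

    Each subblock of four trials contains each target position exactly twice,
    in shuffled order.
    """
    trials = []
    for _ in range(n_subblocks):
        subblock = ["left", "left", "right", "right"]
        rng.shuffle(subblock)
        for side in subblock:
            reward = LARGE_ML if side == large_side else SMALL_ML
            trials.append((side, reward))
    return trials


def make_session(n_blocks, rng, first_large="left"):
    """Alternate left-large and right-large blocks without any external cue."""
    session = []
    large_side = first_large
    for _ in range(n_blocks):
        # Assumption: 5-7 subblocks of four trials gives 20, 24, or 28 trials.
        n_subblocks = rng.choice([5, 6, 7])
        session.append((large_side, make_block(large_side, n_subblocks, rng)))
        large_side = "right" if large_side == "left" else "left"
    return session
```

For example, `make_session(4, random.Random(0))` would yield four blocks whose large-reward side alternates left, right, left, right.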
In a separate experiment, we also used a visually guided saccade task [one-direction rewarded visually guided saccade task (1DR-VGS)] (see Fig. 1B). After fixation on the FP for 1200 ms (fixation period), the FP disappeared and at the same time, the target (1.2°) appeared either to the right or left 20° from the FP. The monkey then had to make a saccade to the target immediately. The trial sequence and the reward schedule were the same as those in 1DR-MGS.
We used both 1DR-MGS and 1DR-VGS tasks for 64 DRN neurons, 1DR-MGS only for 20 neurons, and 1DR-VGS only for 103 neurons in two monkeys. For dopamine neuron recordings, we used only 1DR-VGS.
Mapping and recording of the DRN.
The location of the DRN was estimated using magnetic resonance imaging and was later verified histologically (see below). A recording chamber, which was angled 38° (monkey E) or 35° (monkey L) posteriorly, was implanted over the midline of the parietal cortex to access the brainstem between the superior colliculi and the inferior colliculi. For electrophysiological recordings, we used a grid system (Crist et al., 1988). A stainless-steel guide tube (outer diameter, 0.6 mm; inner diameter, 0.35 mm) was inserted through a grid hole, and, after penetrating the dura, it was lowered until its tip reached ∼7 mm above the surface of the superior colliculi, which was estimated from magnetic resonance images. Through the guide tube, we inserted an electrode to reach the DRN. The distance of the recording sites from the midline was 1 or 1.5 mm. The anteroposterior extent of the recording sites was 2 mm, which corresponded to 6–8 mm anterior to the level of the ear canals (Horsley–Clarke coordinates) in both monkeys.
The DRN is known to be a major source of serotonin neurons (Dahlstrom and Fuxe, 1964; Leger et al., 2001). It has traditionally been accepted that DRN serotonin neurons spontaneously fire slowly and regularly with broad spikes, whereas nonserotonin neurons generally fire more rapidly and irregularly with narrow spikes (Aghajanian et al., 1978; Sawyer et al., 1985; Jacobs and Fornal, 1991; Hajos et al., 1998). Recent studies, however, report that serotonin neurons do not always differ significantly from nonserotonin neurons in terms of these electrophysiological features (Allers and Sharp, 2003; Kocsis et al., 2006). In this report, therefore, rather than choosing neurons with specific electrophysiological properties, we studied all well isolated neurons in the DRN whose activity changed during saccade tasks.
To record from putative dopamine neurons, we searched in and around the SNc. Dopamine neurons were identified by their irregular and tonic firing around 5 spikes/s with broad spike potentials. In this experiment, we focused on dopamine neurons that responded to reward-predicting stimuli with a phasic excitation.
At the conclusion of the experiments, we made electrolytic microlesions at selected recording sites in monkey L. The animal was then deeply anesthetized with pentobarbital and perfused with 10% formaldehyde. The brain was cut into 50 μm coronal sections and stained with cresyl violet (see Fig. 1D).
A neuron was judged to be task-related if there was a statistical difference in its firing rate across the following seven task periods (Kruskal–Wallis, p < 0.007 ≈ 0.05/7): fixation point onset to cue (target) onset, 0–200 ms after target onset, 700–0 ms before fixation point offset (only for 1DR-MGS), 200 ms before to 200 ms after saccade, and three postreward periods, which were 0–400, 400–1200, and 1200–2000 ms after reward onset.
Because reward-related modulation of neuronal activity was found mainly during a period after target onset (which indicated the size of an upcoming reward) and during a period after reward delivery, we focused our analysis on neuronal activity during the two task periods: (1) a 400 ms period after target onset, which we will call “prereward period,” and (2) a 400 ms period starting 400 ms after reward onset, which we will call “postreward period.” We analyzed the neuronal activity in each task period using a two-way ANOVA [reward (large or small) by direction (contralateral or ipsilateral target to the recording site)].
To examine changes in neuronal activity throughout the trial as a whole, we computed a receiver operating characteristic (ROC) value comparing the firing rate in a test window of 100 ms aligned with respect to a task-related event (e.g., target onset) to the firing rate in a control window of 400 ms before fixation onset. We repeated the ROC analysis on consecutive overlapping test windows (advanced in 20 ms steps), separately for the large-reward, small-reward, contraversive-saccade, and ipsiversive-saccade trials (see Fig. 3A–D). Similarly, to examine the changes in the reward and direction effects, we computed an ROC value comparing the firing rates in the same test window of 100 ms between the large- and small-reward trials (reward effects) (see Fig. 3E) and between the contraversive- and ipsiversive-saccade trials (direction effects) (see Fig. 3F).
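The sliding-window ROC analysis can be sketched as follows. This is a minimal illustration, not the analysis code used for the figures. The function names (`roc_area`, `count_spikes`, `sliding_roc`) and the data layout (one list of event-aligned spike times in milliseconds per trial, plus one precomputed control-window firing rate per trial) are assumptions made for the example. The ROC value is computed as the area under the ROC curve, which equals the probability that a rate drawn from the test window exceeds a rate drawn from the control window, with ties counted as one-half.

```python
def roc_area(test_rates, control_rates):
    """Area under the ROC curve: P(test > control) + 0.5 * P(test == control)."""
    greater = 0
    ties = 0
    for t in test_rates:
        for c in control_rates:
            if t > c:
                greater += 1
            elif t == c:
                ties += 1
    return (greater + 0.5 * ties) / (len(test_rates) * len(control_rates))


def count_spikes(spike_times, start, stop):
    """Number of spikes with start <= time < stop (times in ms)."""
    return sum(1 for s in spike_times if start <= s < stop)


def sliding_roc(aligned_trials, control_rates, t_first, t_last,
                width_ms=100, step_ms=20):
    """ROC value for each overlapping test window.

    aligned_trials: per-trial spike times (ms) relative to the task event.
    control_rates:  per-trial firing rates (spikes/s) in the control window
                    (here assumed to be precomputed from the 400 ms before
                    fixation onset).
    Returns a list of (window_start_ms, roc_value).
    """
    results = []
    start = t_first
    while start <= t_last:
        test_rates = [
            count_spikes(trial, start, start + width_ms) / (width_ms / 1000.0)
            for trial in aligned_trials
        ]
        results.append((start, roc_area(test_rates, control_rates)))
        start += step_ms
    return results
```

A value of 0.5 indicates no difference from the control window, values above 0.5 indicate increased firing, and values below 0.5 indicate decreased firing. The same `roc_area` function applies to the reward and direction effects if the two inputs are the test-window rates for large- versus small-reward trials or for contraversive- versus ipsiversive-saccade trials.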
To examine the changes in the neuronal activity in the prereward and postreward periods after the reversal of position–reward contingency, we normalized the firing rate in each trial as follows: (the firing rate in the trial − the mean firing rate across all trials)/(SD of the firing rate across all trials). We performed this calculation for each direction of saccades. Then we compared the firing rates for the ith (e.g., the first and second) trials before and after the contingency reversal with the firing rates for the last five trials during the new block (Mann–Whitney U test, p < 0.01) (see Fig. 8).
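The per-direction normalization amounts to a z-score computed separately for each saccade direction. A minimal sketch follows. The function name `zscore_by_direction` is hypothetical. The use of the sample SD (`statistics.stdev`) is an assumption, because the text does not state which SD estimator was used. Returning 0 when the SD is zero is a choice made only to keep the example well defined.

```python
from statistics import mean, stdev


def zscore_by_direction(rates, directions):
    """Normalize per-trial firing rates separately for each saccade direction.

    rates:      firing rate on each trial (spikes/s)
    directions: direction label for each trial, same length as rates
    Each direction needs at least two trials for the sample SD to exist.
    """
    normalized = [0.0] * len(rates)
    for direction in set(directions):
        idx = [i for i, d in enumerate(directions) if d == direction]
        values = [rates[i] for i in idx]
        m = mean(values)
        sd = stdev(values)  # assumption: sample SD (n - 1 denominator)
        for i in idx:
            normalized[i] = (rates[i] - m) / sd if sd > 0 else 0.0
    return normalized
```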
We characterized the physiological properties of recorded neurons by (1) spike waveform, (2) baseline firing rate, and (3) irregularity of firing pattern. The typical spike shape consisted of the following waves in order: first, sharp negative; second, sharp positive; third, long-duration negative; fourth, long-duration positive. Thus, we measured the spike duration from the first sharp negative to the peak of the fourth, long-duration positive deflection (Kocsis et al., 2006). It ranged from 1.0 to 3.7 ms (mean, 2.2 ms; SD, 0.58 ms). Baseline firing rate was the mean firing rate during the 1000 ms before the onset of the fixation point on the first trial of each experiment; only the first trial was used because the activity during the intertrial interval was often modulated tonically after the delivery of reward in the preceding trial. Finally, to quantify irregularity of spike trains, we used an irregularity metric introduced by Davies et al. (2006), which they called “IR.” First, the interspike interval (ISI) was computed for each pair of consecutive spikes. If spike(i − 1), spike(i), and spike(i + 1) occurred in this order, the duration between spike(i − 1) and spike(i) corresponds to ISI_i, and the duration between spike(i) and spike(i + 1) corresponds to ISI_{i+1}. Second, the difference between adjacent ISIs was computed as |log(ISI_i/ISI_{i+1})|. The value was then assigned to the time at which spike(i) occurred. Thus, small IR values indicate regular firing and large IR values indicate irregular firing. We then computed the median of all IR values during the whole task period for all correct trials. This measure has an advantage over traditional measures of irregularity, such as the coefficient of variation of the interspike intervals, which require a constant firing rate during the measurement period. This requirement was not met in our experiments because neural responses often changed during the task periods.
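The IR computation can be illustrated with a short Python sketch. The function names `ir_values` and `median_ir` are hypothetical. The natural logarithm is an assumption, because the base of the logarithm is not stated in the text. Spike times within a trial are assumed to be strictly increasing, and IR values are pooled across trials without forming ISIs that span trial boundaries.

```python
import math
from statistics import median


def ir_values(spike_times):
    """IR values for one spike train: |log(ISI_i / ISI_{i+1})| for each
    pair of adjacent interspike intervals.

    spike_times must be strictly increasing; at least three spikes are
    needed to obtain one IR value.
    """
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    # Assumption: natural logarithm (the log base is not given in the text).
    return [abs(math.log(isis[k] / isis[k + 1])) for k in range(len(isis) - 1)]


def median_ir(trials):
    """Median IR over all correct trials; returns None if no IR value exists."""
    pooled = []
    for spike_times in trials:
        pooled.extend(ir_values(spike_times))
    return median(pooled) if pooled else None
```

A perfectly regular train such as spikes at 0, 10, 20, 30 ms gives IR values of 0, whereas alternating intervals of 10 and 20 ms give IR values of log 2 ≈ 0.69. Because each value depends only on the ratio of two neighboring intervals, a slow change in overall firing rate has little effect on the metric.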
We analyzed IR values of DRN neurons, putative dopamine neurons, and putative projection neurons in the caudate (supplemental Fig. 1, available at www.jneurosci.org as supplemental material). The caudate data were obtained in the separate experiments (Davies et al., 2006).
We analyzed the activity of DRN neurons using two tasks with biased reward schedules: a memory-guided saccade task (1DR-MGS) (Fig. 1A) (17 neurons from monkey E; 67 from monkey L) and a visually guided saccade task (1DR-VGS) (Fig. 1B) (96 neurons from monkey E; 71 from monkey L). Because the biased reward schedule was introduced in blocks, on each trial the animal could predict the reward value based on the location of the target cue (Fig. 1C). Indeed, saccadic reaction times were significantly shorter for large-reward than small-reward trials in both monkeys in both tasks (supplemental Table 1, available at www.jneurosci.org as supplemental material; see also Fig. 8G).
The electrode was directed to the DRN through a recording chamber that was implanted over the midline of the parietal cortex. During the initial survey of DRN, the following brain structures were identified and used as landmarks: superior colliculus with receptive fields in the upper visual field with large eccentricities, inferior colliculus with auditory responses, mesencephalic trigeminal nucleus with responses to mouth movements, the locus ceruleus with phasic responses to salient sensory stimuli, and trochlear nucleus with increased firing during downward eye movements. We analyzed neurons located 0–2 mm anterior to the trochlear nucleus.
Traditionally, it has been accepted that serotonin neurons fire broad spikes spontaneously in a slow and regular “clock-like” firing pattern (Aghajanian et al., 1978; Sawyer et al., 1985; Jacobs and Fornal, 1991; Hajos et al., 1998). Therefore, we computed the baseline firing rate, spike duration, and regularity of sampled neurons (see Materials and Methods). The baseline firing rate across neurons ranged from 0 to 22 spikes/s with a mean of 4.9 spikes/s (SD, 4.3; median, 4.0). The spike duration ranged from 1.0 to 3.7 ms (mean, 2.2 ms; SD, 0.58 ms). Different methods have been used to quantify the regularity of neuronal firing (Shinomoto et al., 2003). In this study, we used the irregularity metric IR, which was the median value of the differences between adjacent interspike intervals during the whole task period (Davies et al., 2006) (see Materials and Methods). Smaller IR values indicate more regular firing. There was no significant difference in IR value between 1DR-MGS and 1DR-VGS (Wilcoxon signed rank test, p = 0.79) (supplemental Fig. 1A, available at www.jneurosci.org as supplemental material). The IR values for the DRN neurons we sampled were significantly smaller (i.e., more regular) than those for putative projection neurons in the caudate nucleus (p < 0.0001) and putative dopamine neurons in the substantia nigra pars compacta (p = 0.02) (supplemental Fig. 1B, available at www.jneurosci.org as supplemental material). Among DRN neurons, there was no significant correlation between IR values and spike duration (p = 0.4, Spearman rank correlation) or baseline firing rate (p = 0.05).
Reward-dependent modulations in DRN neuronal activity
DRN neurons exhibited task-related modulations with distinctive features during the performance of the 1DR-MGS. Most notably, DRN neurons often showed reward-dependent modulations in activity after reward onset. Figure 2A shows a representative example. This neuron was characterized by long spike duration (2.76 ms), low baseline activity (2 Hz), and regular firing (median IR, 0.31). The neuron exhibited an increase in activity after the onset of the fixation point (FPon) followed by regular and tonic firing until reward onset. The activity further increased after the onset of a large reward but ceased after the onset of a small reward. This modulation occurred regardless of the direction of the saccade, and lasted for 860 ms after reward onset (permutation test, p < 0.05) (see Materials and Methods). Such reward-dependent modulations during the postreward period lasted longer for other DRN neurons. For example, the neuron in Figure 2B was also characterized by long spike duration (2.6 ms), low baseline activity (6 Hz), and regular firing pattern (median IR, 0.50). For both saccade directions, there was a long-lasting decrease in activity starting 400 ms after the onset of large reward (permutation test, p < 0.05). The activity of the neuron in Figure 2C (baseline firing rate, 3 Hz; spike duration, 1.9 ms; IR = 0.47) was significantly stronger for large- than small-reward trials starting 800 to 1500 ms after reward onset. The neuron in Figure 2D (baseline firing rate, 10 Hz; spike duration, 1.4 ms; IR = 0.48) also exhibited a long-lasting reward effect starting around the time of reward offset. Note that, in all of these examples, the postreward modulations of activity disappeared before the next trial started (supplemental Fig. 2, available at www.jneurosci.org as supplemental material).
In some neurons, reward-dependent modulations were also observed before reward onset during the delay period. The neuron in Figure 2C exhibited stronger activity on small-reward than large-reward trials (p = 0.8 × 10^−6). The neuron in Figure 2D also exhibited stronger activity on small- than large-reward trials, but only when leftward saccades were required (two-way ANOVA, reward effect, p = 0.005; interaction, p = 0.02). Such direction selectivity, however, was relatively rare among DRN neurons.
Reward-dependent modulations in activity during the delay and the postreward periods, as shown in the example neurons in Figure 2, were commonly observed in the population of DRN neurons. Figure 3, A–D, illustrates the time course of these modulations using ROC analysis, by comparing the firing rate of each neuron for each task condition to the baseline activity during 400 ms before fixation onset. During the delay and postreward periods of the task, many DRN neurons had tonic increases in activity (shown in warm colors) or decreases in activity (cool colors).
Figure 3E shows the time course of reward selectivity, using ROC analysis to compare the activity of each neuron between large- and small-reward trials. Figure 3F shows a similar analysis for direction selectivity, comparing contraversive- and ipsiversive-saccade trials. The reward effect was present in many neurons during both task periods before (mainly the delay period) and after reward, whereas direction effects were uncommon.
The data in Figure 3, A and B, reveal a notable difference in the reward-dependent modulations between the prereward period and the postreward period. For each neuron, the changes in activity during the prereward period, compared with the baseline activity, tended to be in the same direction on both large- and small-reward trials (Fig. 3A,B). In contrast, the changes in activity during the postreward period, compared with the baseline activity, tended to be in opposite directions (Fig. 3A,B). For example, for the neuron shown in Figure 2A, the prereward activity increased compared with the baseline on both large- and small-reward trials. However, the postreward activity increased on large-reward trials, but it was inhibited on small-reward trials.
The main cause of the reward effect during the prereward period was that the changes in activity tended to be stronger on large-reward trials than on small-reward trials, which is illustrated by the greater intensity of colors in Figure 3A than in Figure 3B. To quantify the trend, we computed the prereward activity as the firing rate during 400 ms after target onset minus the baseline firing rate, and the results are shown in Figure 4A. Among 22 neurons (22 of 84; 26%) that showed significant reward effects during the prereward period, 20 neurons exhibited significant activity changes on large-reward trials, whereas only 10 neurons did on small-reward trials. This tendency is illustrated by a wider distribution of the prereward activity on large-reward trials than that on the small-reward trials (Fig. 4A, marginal histograms). When the firing rate in the prereward period was compared between the reward conditions, 16 neurons showed higher firing rates on the large-reward trials than on the small-reward trials; the other 6 neurons showed the opposite pattern (two-way ANOVA, p < 0.01).
Reward-dependent modulations were clearer and more prevalent in postreward activity. Among 42 neurons (42 of 84; 50%) that showed significant reward effects during the postreward period, 24 neurons showed changes in activity in opposite directions between large- and small-reward trials (Fig. 4B, data points in the top left and bottom right quadrants). When postreward activity was compared between the reward conditions, 18 neurons showed a large-reward preference (i.e., higher firing rates on large-reward trials than on small-reward trials); the other 24 neurons showed a small-reward preference (two-way ANOVA, p < 0.01).
As discerned from Figure 3, A–D, some DRN neurons also exhibited changes in activity (1) after fixation onset: increases for 23 of 84 (27.4%) or decreases for 12 of 84 (14.3%) neurons (comparison between activity during 400 ms before and 200 ms after fixation onset; Mann–Whitney U test, p < 0.01), and (2) during the later fixation period: increases for 17 of 84 (20.2%) or decreases for 20 of 84 (23.8%) neurons (comparison between activity during 400 ms before fixation onset and 800–400 ms before target onset, p < 0.01).
Comparison of reward-dependent modulations between DRN and dopamine neurons
To understand the functional significance of the reward-related activity of DRN neurons, we compared it to the activity of dopamine neurons in the same two monkeys. For this purpose, we used a visually guided version of the biased-reward saccade task (Fig. 1B, 1DR-VGS). We recorded from 167 DRN neurons (96 from monkey E; 71 from monkey L) and 64 dopamine neurons (20 from monkey E; 44 from monkey L).
The characteristics of the reward-dependent modulations in the activity of DRN neurons in 1DR-VGS were similar to those found in 1DR-MGS. Thus, many DRN neurons exhibited increases or decreases in tonic activity (usually increases) after the onset of the fixation point. These changes became more evident during the prereward period, after the onset of the saccade target that indicated the size of the upcoming reward. As in 1DR-MGS, changes in prereward activity occurred in the same direction on both large- and small-reward trials (Fig. 5A,B), but tended to be greater on large-reward trials (Fig. 6A), thus leading to differences in activity between the two reward conditions (Fig. 5E). Among 44 neurons (44 of 167; 26%) that showed significant reward effects during the prereward period, 34 exhibited significant activity changes on large-reward trials (29 increase and 5 decrease), whereas only 15 did on small-reward trials (13 increase and 2 decrease).
In the postreward period, the same DRN neurons tended to exhibit opposite changes in activity (Fig. 5A,B). Among 74 neurons (74 of 167; 44%) that showed significant reward effects, 40 neurons changed their activity in opposite directions on large- and small-reward trials (Fig. 6B). About one-half (n = 36) showed a large-reward preference, whereas the other 38 neurons showed a small-reward preference (two-way ANOVA, p < 0.01). The direction of the reward preference was not always the same between the prereward and postreward periods (Fig. 6E).
The activity pattern of dopamine neurons was distinctively different from DRN neurons (Fig. 5C,D). Dopamine neurons exhibited a phasic increase in activity after fixation onset, as reported by Takikawa et al. (2004) for 1DR-MGS. They also exhibited a phasic increase in activity after the onset of the target indicating an upcoming large reward (Fig. 5C) and a phasic decrease in activity after the onset of the target indicating an upcoming small reward (Fig. 5D), leading to a strong and transient large-reward preference in the prereward period (Fig. 5F).
In contrast to the prereward period, changes in the postreward period were less clear in dopamine neurons. Small increases in activity were observed in some neurons after a large reward (Fig. 5C), leading to weak reward effects (Fig. 5F). Whereas 53 of 167 DRN neurons (31.7%) exhibited significant activity modulation long after reward (600–1000 ms after reward onset; sign test, p < 0.01), only 5 of 64 dopamine neurons (7.8%) did so. Thus, the duration of the postreward activity in dopamine neurons was shorter than that in DRN neurons (χ² test, p < 0.0001). Overall, most dopamine neurons showed a large-reward preference in the prereward period and some did so in the postreward period (Fig. 6F).
Figure 7 shows the proportions of neurons that exhibited significant reward and direction effects for both DRN and dopamine neurons. Statistical significance was determined using a two-way ANOVA for each task period (p < 0.01). In both DRN and dopamine neurons, reward effects were more prevalent than direction effects. For DRN neurons, the large-reward preference was more common than the small-reward preference in the prereward period, whereas these kinds of preferences were equally common in the postreward period. The reward effect was more robust among dopamine neurons. They predominantly showed the large-reward preference in the prereward period and less commonly in the postreward period. The ratio of large- versus small-reward preference was significantly different between DRN neurons and dopamine neurons (χ² test, p < 0.0001 for both prereward and postreward periods).
Changes of prereward and postreward activity after the reversal of position–reward contingency
In both of our tasks, the contingency between target position and reward value was fixed during one block of trials, but was then reversed with no external cue. This allowed us to examine how the monkey's performance and neuronal activity changed adaptively to the new position–reward contingency. As in previous studies from our laboratory, the saccadic reaction time changed quickly after the reversal of the position–reward contingency (Fig. 8G) (Lauwereyns et al., 2002; Watanabe and Hikosaka, 2005).
We therefore examined the time course of the changes in the activity of DRN and dopamine neurons (Fig. 8). We computed the mean normalized firing rates for the prereward period (0–400 ms after target onset) and the postreward period (400–800 ms after reward onset for DRN neurons; 0–400 ms after reward onset for dopamine neurons) as a function of the trial number after the reversal. To assess the speed of activity change after the reversal, we tested whether the neuronal activity on each trial number was significantly different from the mean activity on the last five trials of the new block (Mann–Whitney U test, p < 0.01). This analysis was restricted to neurons whose firing rates were significantly modulated by reward value (two-way ANOVA, p < 0.01) and was performed separately for the prereward and postreward periods.
The changes in prereward activity after the contingency reversal were qualitatively similar for DRN neurons and dopamine neurons (Fig. 8A,C,E). In both DRN neurons and dopamine neurons, the activity on the first trial after the contingency reversal was not different from the last trial of the block before the reversal. This is not surprising because the changed reward had not yet been delivered when the activity occurred. Interestingly, however, the change in activity of DRN neurons was delayed by one trial after the reversal from large rewards to small rewards (Fig. 8A,C), unlike dopamine neurons (Fig. 8E).
The difference between DRN neurons and dopamine neurons was clearer in the postreward period (Fig. 8B,D,F). Unlike in the prereward period, the changed reward had already been delivered on the first trial after the contingency reversal. The activity of DRN neurons followed the size of the reward faithfully (Fig. 8B,D). In contrast, the activity of dopamine neurons only changed transiently on the first trial, and thereafter returned to a level close to baseline activity (Fig. 8F). Specifically, dopamine neurons decreased their activity on large-to-small reward reversals and increased their activity on small-to-large reversals. These transient changes in activity represent the “reward prediction error,” which is the difference between the expected reward value (e.g., small reward) and the actual reward value (e.g., large reward). This pattern of dopamine neuron activity has been shown previously using other tasks (Hollerman and Schultz, 1998; Takikawa et al., 2004). The results thus indicate that DRN neurons encode the actual reward value, not the reward prediction error.
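The distinction between a signal that tracks the received reward and a signal that tracks the reward prediction error can be made concrete with a toy calculation. This is a conceptual sketch only, not a model fitted to the recorded data. The function name `simulate`, the single-target simplification, and the `learning_rate` parameter are all assumptions introduced for illustration. A learning rate of 1.0 means the expectation is fully updated after one trial.

```python
def simulate(rewards, initial_expectation, learning_rate=1.0):
    """Toy comparison of a value-like and a prediction-error-like signal.

    rewards: received reward (ml) on successive trials to one target position.
    Returns (value_signal, error_signal), one entry per trial.
    """
    expected = initial_expectation
    value_signal = []
    error_signal = []
    for received in rewards:
        value_signal.append(received)             # tracks the received reward
        error_signal.append(received - expected)  # received minus expected
        # Hypothetical update rule: move the expectation toward the outcome.
        expected += learning_rate * (received - expected)
    return value_signal, error_signal
```

With `rewards = [0.4, 0.4, 0.4, 0.0, 0.0, 0.0]` and `initial_expectation = 0.4`, the value-like signal follows the reward on every trial, whereas the error-like signal is nonzero only on the first trial after the large-to-small reversal. This is the qualitative contrast described above for the postreward activity of DRN and dopamine neurons.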
Relationship between the firing pattern and the reward effect of DRN neurons
In the present experiment, we studied all well-isolated neurons in the DRN whose activity changed during saccade tasks. It has traditionally been accepted that serotonin neurons in the DRN show slow and regular firing with broad spikes (Aghajanian et al., 1978; Sawyer et al., 1985; Jacobs and Fornal, 1991; Hajos et al., 1998), although recent studies may not agree with this characterization (Allers and Sharp, 2003; Kocsis et al., 2006). To examine whether such electrophysiological properties were correlated with reward-related modulation, we first grouped 71 DRN neurons (whose spike shapes were successfully recorded) based on their spike durations (shorter or longer than 2 ms) and baseline firing rates (higher or lower than 3 Hz) (Tables 1 and 2). These criteria were chosen based on a previous study reporting that the mean spike duration of immunohistochemically identified serotonin neurons was 2.17 ms (range, 1.67–3.5) and the mean baseline firing rate was 1.67 Hz (range, 0.37–3.0) (Allers and Sharp, 2003). During both the prereward and postreward periods, neurons in specific categories showed no tendency toward specific types of reward modulation (χ2 test, p > 0.5).
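The grouping and the test of independence described above can be sketched as follows. This is an illustration under stated assumptions rather than the original analysis: the handling of values exactly at the 2 ms and 3 Hz boundaries is not specified in the text and is chosen arbitrarily here, only the χ2 statistic and its degrees of freedom are computed (no p value), and the names `categorize` and `chi_square_statistic` are hypothetical.

```python
def categorize(spike_ms, rate_hz, dur_thresh=2.0, rate_thresh=3.0):
    """Assign a neuron to one of four spike-duration/firing-rate groups.

    Assumption: values exactly at a threshold fall into "short" or "high".
    """
    duration = "long" if spike_ms > dur_thresh else "short"
    rate = "low" if rate_hz < rate_thresh else "high"
    return f"{duration}/{rate}"


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts.

    table[i][j]: number of neurons in electrophysiological group i that show
    reward-modulation type j. Every row and column must contain at least one
    neuron; otherwise an expected count is zero and the division fails.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

For example, a neuron with the mean properties reported by Allers and Sharp (2003), a 2.17 ms spike and a 1.67 Hz baseline rate, would be placed in the long-duration, low-rate group.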
We further examined whether the reward-related features of DRN neurons were correlated with any combination of the electrophysiological properties (Fig. 9). There was no significant difference between large- and small-reward preferring neurons in baseline firing rate, spike duration, or irregularity (Kruskal–Wallis test, p > 0.05). Furthermore, multiple regression analysis indicated that reward effects in ROC values could not be significantly predicted by any linear combination of these three variables (prereward, p = 0.17; postreward, p = 0.68).
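For readers unfamiliar with ROC values as a measure of reward effect, the following sketch shows one common way to compute the area under the ROC curve from two sets of firing rates. It is not the code used in this study. The sign convention (values above 0.5 indicating large-reward preference) and the function name `roc_area` are assumptions made for illustration.

```python
def roc_area(large, small):
    """Area under the ROC curve for large- versus small-reward trials.

    Equals the probability that a randomly chosen large-reward trial has a
    higher firing rate than a randomly chosen small-reward trial, with ties
    counted as one half. Under the convention assumed here, 0.5 means no
    reward effect, values near 1 mean large-reward preference, and values
    near 0 mean small-reward preference.
    """
    wins = sum((a > b) + 0.5 * (a == b) for a in large for b in small)
    return wins / (len(large) * len(small))
```

A neuron whose firing rates on large-reward trials always exceed those on small-reward trials would yield a value of 1, and a neuron with identical distributions would yield 0.5.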
Neurochemical identity of recorded DRN neurons
Pharmacological and behavioral studies have suggested that the dorsal and median raphe nuclei (DRN and MRN) are important elements of the brain reward circuitry (Higgins and Fletcher, 2003; Liu and Ikemoto, 2007). However, it was unknown whether and how reward information is represented in the DRN. Our experiments now demonstrate that single neurons in the monkey DRN encode reward information before and after the delivery of reward.
Many serotonergic neurons in the brain (∼40% in cats; 60% in rats) are located in the DRN (Wiklund et al., 1981). The lateral component (the wings) of the DRN, best developed around the trochlear nucleus, is most prominent in primates (Jacobs and Azmitia, 1992), and this was where we sampled most of the neurons. Among heterogeneous DRN neurons containing different neurotransmitters (for review, see Michelsen et al., 2007), previous studies report that a substantial proportion of DRN neurons are serotonergic: ∼30% in rats (Descarries et al., 1982), 70% of medium-sized DRN neurons in cats (Wiklund et al., 1981), and 70% in humans (Baker et al., 1991). Recent combined electrophysiological and immunochemical studies revealed that DRN neurons with “traditional” electrophysiological characteristics, such as long spike duration and low and regular baseline firing, are not always serotonergic (Allers and Sharp, 2003; Kocsis et al., 2006). Nevertheless, as shown in Tables 1 and 2, 52% of our sample neurons exhibited long spike duration (>2 ms) and 50% exhibited low firing rate (<3 Hz), consistent with the proportion of “classical” serotonergic neurons in the DRN. We also found that these neurons did show reward-dependent modulation in activity, indicating that a group of classical serotonergic DRN neurons modulate their activity depending on reward information. In addition, 12% of neurons exhibited baseline firing rates >10 Hz, and they too exhibited reward-dependent modulation. Such DRN neurons with high firing rates may be GABA neurons (Allers and Sharp, 2003, their Table 1).
Reward information is differently coded by dopamine and DRN neurons
Reward-dependent modulations in the activity of DRN neurons were different from those observed in putative dopamine neurons. First, whereas the dopamine neurons predominantly responded to a reward-predicting sensory stimulus, DRN neurons responded to both the reward-predicting stimulus and the reward itself. Second, whereas dopamine neurons responded to a reward only when it was larger or smaller than expected, DRN neurons reliably coded the value of the received reward whether or not it was expected. Unlike DRN neurons, dopamine neurons responded to reward delivery only when the cue position–reward contingency was switched so that the reward was unexpectedly small or large (Fig. 8). In other words, dopamine neurons encoded reward prediction error, as suggested previously (Schultz, 1998; Satoh et al., 2003; Kawagoe et al., 2004), but DRN neurons did not. Third, whereas dopamine neurons invariably preferred larger rewards (i.e., were excited by larger rewards), the DRN contained neurons preferring larger rewards and neurons preferring smaller rewards. Finally, whereas dopamine neurons exhibited phasic responses, DRN neurons typically exhibited tonic responses. Thus, whereas dopamine neurons provide phasic signals related to reward prediction error, DRN neurons provide tonic signals related to expected and received reward values.
The responses of DRN neurons were diverse, compared with relatively stereotyped responses of dopamine neurons. This may be because DRN neurons are heterogeneous, containing different neurotransmitters such as GABA, dopamine, noradrenaline, substance P, nicotine, and acetylcholine (for review, see Michelsen et al., 2007), in addition to serotonin neurons, which constitute 30–70% of DRN neurons (Descarries et al., 1982; Leger and Wiklund, 1982). However, in the current experiment, putative dopamine neurons were selected based on their firing rates, spike shapes, and responsiveness to 1DR-saccade tasks, which might be a reason why their task-related activity was quite homogeneous.
Reward processing in the DRN: inputs
The reward-related signals in DRN neurons may originate from the brain areas that project to the DRN (Aghajanian and Wang, 1977; Sakai et al., 1977; Behzadi et al., 1990; Peyron et al., 1998). Notable among them are (1) dopamine neurons in the substantia nigra pars compacta and the ventral tegmental area and (2) the lateral habenula. The dopamine neurons, which project to both the DRN and MRN (Kitahama et al., 2000), may exert facilitatory effects on putative serotonin neurons in the DRN (Haj-Dahmane, 2001). Because the dopamine neurons are excited by the stimulus that predicts a large reward, DRN neurons would also be excited by the large-reward-predicting stimulus. Indeed, during the prereward period, large-reward preference was more common than small-reward preference. In contrast, DRN neurons are inhibited by electrical stimulation of the lateral habenula (Wang and Aghajanian, 1977; Stern et al., 1979; Varga et al., 2003). Using the same reward-biased saccade tasks, a recent study from our laboratory showed that lateral habenula neurons exhibit strong small-reward preference (i.e., inhibited by stimuli that predict large rewards and excited by stimuli that predict small rewards) (Matsumoto and Hikosaka, 2007). These changes in habenula activity would then be translated into the large-reward preference in DRN neurons.
In contrast, the postreward responses of DRN neurons are unlikely to be derived from dopamine or habenula neurons because neither of them exhibits reliable postreward responses. Possible origins of the postreward information include the hypothalamus (Celada et al., 2002) and the medial prefrontal cortex (Hajos et al., 1998; Varga et al., 2003). Hypothalamic orexin neurons are activated by arousal, feeding, and rewarding stimuli (Mieda and Yanagisawa, 2002; Harris and Aston-Jones, 2006). They project to the DRN in addition to many other areas (Peyron et al., 1998) and facilitate serotonin release (Tao et al., 2006). Medial prefrontal cortex inputs to the DRN and MRN attenuate the increase in serotonin release in response to aversive stimuli (Amat et al., 1998).
In the postreward period, about one-half of DRN neurons showed large-reward preference and the other one-half showed small-reward preference. One possible interpretation is that the two kinds of reward-related signals are represented in other brain areas such as the anterior cingulate cortex (Niki and Watanabe, 1979; Amiez et al., 2006) and that these signals are transmitted to the DRN (Arnsten and Goldman-Rakic, 1984). Another possibility is that reward information is transferred from one group of neurons to the other via inhibitory connections within the DRN. It has been suggested that the ventral medial prefrontal cortex inhibits serotonin neurons in the DRN by targeting local GABAergic interneurons (Varga et al., 2001). Thus, the modulation in activity of some DRN neurons may be in the opposite direction to that of the others, depending on whether the projection from the cortex is direct or indirect.
Reward processing in the DRN: outputs
Among the widespread efferent projections of the DRN (Lavoie and Parent, 1990; Vertes, 1991), those to the basal ganglia structures, especially the striatum and the substantia nigra (van der Kooy and Hattori, 1980; Imai et al., 1986), may be particularly important because they are thought to control reward-dependent saccadic eye movements (Hikosaka et al., 2006).
Many lines of evidence suggest that an inhibition of raphe neurons causes a rewarding effect and that this is mediated, at least partly, by the disinhibition of dopamine neurons. Electrical stimulation of the DRN and MRN causes inhibitions of dopamine neurons, which are mediated by serotonin released in the substantia nigra (Dray et al., 1976; Tsai, 1989; Trent and Tepper, 1991). Self-administration of muscimol into the raphe nuclei causes rewarding effects in behavior, and this effect is dependent on normal dopamine function (Liu and Ikemoto, 2007). It has been suggested that dopamine actions in the basal ganglia are antagonized by serotonin that derives from the DRN or MRN (Kapur and Remington, 1996). Thus, the inhibition of the DRN/MRN followed by the enhancement of dopaminergic transmission in the basal ganglia appears to be rewarding (Fletcher et al., 1993).
The DRN may have a more direct route to influence saccadic eye movements, which is its projection to the substantia nigra pars reticulata (SNr) (Corvaja et al., 1993). The SNr is known to exert tonic GABAergic inhibition on the superior colliculus and to remove this inhibition in response to sensory, memory, and motivational demands (Hikosaka et al., 2006).
Possible functions of the DRN in reward processing
Characteristic features of the activity of DRN neurons were that (1) their reward-related response pattern was tonic, and (2) the changes were of either large- or small-reward preference. Such activation patterns may be useful in integrating appetitive or aversive reward information for a substantial time, as suggested by Solomon and Corbit (1974). This may also explain the experimental results indicating that serotonin-depleted animals show impulsive tendencies. That is, systemic or local depletion of serotonin renders the animal likely to choose a small but immediate reward rather than a large but delayed reward (Wogar et al., 1993; Brunner and Hen, 1997; Harrison et al., 1997; Mobini et al., 2000a,b; Winstanley et al., 2004, 2006; Denk et al., 2005). The human DRN was activated when subjects learned to obtain large future rewards (Tanaka et al., 2004). Long-lasting DRN activity may have other functions as well, because impulsivity has been associated with other serotonin-related behavioral tendencies such as aggression (Mehlman et al., 1994; van Erp and Miczek, 2000) and obsession (Insel et al., 1990).
The coding of delayed rewards has been a long-standing issue in reinforcement learning theories (Cardinal et al., 2001). Recent studies have suggested that multiple neural systems may participate in the representation of rewards at different timescales (McClure et al., 2004; Tanaka et al., 2004). One hypothesis is that serotonin regulates the balance between immediate and delayed rewards (Doya, 2002). Daw et al. (2002) suggested that the current reward value is represented by the phasic activation of dopamine neurons, whereas the average value is represented by the tonic activation of serotonin neurons. Indeed, we found that one-half of DRN neurons exhibited such reward-related tonic activation. However, our results do not completely support the theory because the tonic activation of DRN neurons did not seem to accumulate across trials. Additional experiments using tasks involving long-term reward prediction will be necessary to test this hypothesis.
In conclusion, our experiments demonstrate that many neurons in the monkey DRN encode expected and received rewards. They do so in a manner distinctly different from dopamine neurons. It remains to be determined whether and how the DRN signals are used for the reward-based modulation of motor behavior or learning.
This work was supported by the intramural research program of the National Eye Institute. We thank Dr. Long Ding and Ethan Bromberg-Martin for helpful comments. We thank GC America, Inc., for providing us with dental acrylic.
Correspondence should be addressed to Kae Nakamura at her present address: Department of Physiology, Kansai Medical University, School of Medicine, 10-15, Fumizono-cho, Moriguchi-city, Osaka 570-8506, Japan.