Abstract
The dorsal raphe nucleus and its serotonin-releasing neurons are thought to regulate motivation and reward-seeking. These neurons are known to be active during motivated behavior, but the underlying principles that govern their activity are unknown. Here we show that a group of dorsal raphe neurons encode behavioral tasks in a systematic manner, tracking progress toward upcoming rewards. We analyzed dorsal raphe neuron activity recorded while animals performed two reward-oriented saccade tasks. There was a strong correlation between the tonic activity level of a neuron during behavioral tasks and its encoding of reward-related cues and outcomes. Neurons that were tonically excited during the task predominantly carried positive reward signals. Neurons that were tonically inhibited during the task predominantly carried negative reward signals. Neurons that did not change their tonic activity levels during the task had weak reward signals with no tendency for a positive or negative direction. This form of correlated task and reward coding accounted for the majority of systematic variation in dorsal raphe response patterns in our tasks. A smaller component of neural activity reflected detection of reward delivery. Our data suggest that the dorsal raphe nucleus encodes participation in a behavioral task in terms of its future motivational outcomes.
Introduction
The dorsal raphe nucleus has a role in many brain functions, such as sleeping and waking (Dugovic, 2001), locomotion (Jacobs and Fornal, 1993), emotion and social relationships (Davidson et al., 2000; Moskowitz et al., 2003), and coping with stress (Graeff et al., 1996; Holmes, 2008). In addition to these functions, there is evidence that the dorsal raphe nucleus regulates motivated behavior (Rompre and Miliaressis, 1985; Diotte et al., 2001; Tanaka et al., 2004; Chamberlain et al., 2006; Liu and Ikemoto, 2007). Notably, the dorsal raphe contains serotonin-releasing neurons that project to nearly all of the cerebral cortex and a large number of subcortical areas. Serotonin release then has a potent influence on actions to gain rewards and avoid punishments (Rogers et al., 1999; Daw et al., 2002; Doya, 2008; Dayan and Huys, 2009). The precise mechanism by which dorsal raphe neurons cause these effects is unknown, partly because only a few investigations have examined the activity of these neurons during motivated behavioral tasks (Nakamura et al., 2008; Ranade and Mainen, 2009). In a previous investigation, we found that many dorsal raphe neurons changed their activity in response to reward outcomes and reward-related cues (Nakamura et al., 2008). Their responses took on a variety of forms, because neurons could be either excited or inhibited by multiple reward-related events, often producing gradual tonic changes in activity lasting throughout multiple phases of a behavioral task.
We theorized that these tonic changes in dorsal raphe activity would be ideal to encode sustained aspects of motivated behavior such as the state of expectation of future rewards. This quantity has a central role in theories of reinforcement learning, which posit that organisms continually track the expected value of their current situation, or “state value,” and that changes in this value are used as instructive signals that indicate when the situation becomes better or worse than expected (Schultz et al., 1997). A recent study found that this form of value coding was present in single neurons in the amygdala (Belova et al., 2008). Amygdala neurons tracked progress through a behavioral task, such that the response of a neuron to the start of the task was strongly correlated with its response to reward cues and outcomes. The amygdala receives dense innervation from the dorsal raphe nucleus (Sadikot and Parent, 1990; Freedman and Shi, 2001), suggesting that this form of reward value coding might be present in dorsal raphe neurons, which could broadcast this signal throughout the brain through their diverse projections.
To test this hypothesis, we analyzed dorsal raphe neural activity recorded while animals progressed through several phases of reward-oriented tasks. We found a strong correlation between neural responses to the start of the task and neural responses to its reward cues and outcomes. This form of correlated task and reward coding accounted for the majority of systematic variation in dorsal raphe response patterns, suggesting that the dorsal raphe nucleus encodes participation in a behavioral task primarily in terms of its future motivational outcomes.
Materials and Methods
General.
The subjects in this study were two rhesus monkeys (Macaca mulatta), animal E (male) and animal L (female). All animal care and experimental procedures were approved by the Institute Animal Care and Use Committee and complied with the Public Health Service Policy on the humane care and use of laboratory animals. A head-holding device, a chamber for unit recording, and a scleral search coil were implanted under general anesthesia. During experimental sessions, animals were seated in a primate chair and placed in a sound-attenuated room. All aspects of the behavioral experiment, including presentation of stimuli, monitoring of eye movements, monitoring of neuronal activity, and delivery of reward, were under the control of a QNX-based real-time experimentation data acquisition system (REX; Laboratory of Sensorimotor Research, National Eye Institute/National Institutes of Health, Bethesda, MD). Eye position was monitored at 1 ms resolution. Stimuli generated by an active matrix liquid crystal display projector (PJ550; ViewSonic) were rear-projected on a frontoparallel screen 25 cm from the animal's eyes.
Behavioral tasks.
Animals performed two reward-biased saccade tasks during separate experiments. The first was the memory-guided saccade task (Fig. 1A, MGS). Each trial started with the appearance of a central fixation point (0.6° diameter). The animal was required to shift its gaze to the fixation point and maintain fixation within a window of ∼3°. After 1000–1500 ms fixation, a visual target (1.2° diameter) was presented for 100 ms on the left or right side of the screen (20° eccentricity). The position of the target was chosen pseudorandomly such that, within every “subblock” of four trials, each of the two positions was chosen twice. The animal had to maintain fixation for 800 ms until the fixation point disappeared, at which point the animal was required to saccade toward the memorized target position. A correct saccade was signaled by the appearance of the target with a 100 ms delay. On rewarded trials, a liquid reward was then delivered through a spout controlled by a solenoid valve after an additional 100 ms delay. The reward delivery duration was ∼500 ms. If the animal broke fixation at any time during the fixation period or failed to make a saccade to the cued position, an error tone sounded, the trial was aborted, and the same trial was repeated until the animal performed it correctly. The intertrial interval, which started at the time of reward offset and lasted until fixation onset in the next trial, was fixed at 3000 ms (for five recording sessions, an interval of 4500 or 6000 ms was used instead).
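The subblock rule for target positions can be illustrated with a minimal Python sketch. It is not the REX task code; the function name `target_sequence` and the "left"/"right" labels are hypothetical, and the sketch assumes each subblock is an independently shuffled set of two left and two right trials.

```python
import random

def target_sequence(n_trials, rng=None):
    """Return a list of 'left'/'right' target positions (hypothetical helper).

    Within every subblock of four trials, each position appears exactly twice,
    in shuffled order. A trailing partial subblock is simply truncated.
    """
    rng = rng or random.Random()
    sequence = []
    while len(sequence) < n_trials:
        subblock = ["left", "left", "right", "right"]
        rng.shuffle(subblock)  # randomize order within the subblock
        sequence.extend(subblock)
    return sequence[:n_trials]
```

Passing a seeded `random.Random` instance makes the sequence reproducible. Repetition of error trials, described above, is not modeled here.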
Behavioral tasks. A, Memory-guided saccade task with biased reward schedule. B, Visually guided saccade task (VGS) with biased reward schedule. In each task, the animal was required to hold its gaze on a fixation point that appeared at the center of the screen. In the memory-guided saccade task (MGS), a visual target was briefly flashed on the left or right side of the screen. The animal was required to hold fixation until the fixation point turned off and then saccade to the remembered location of the target. In the visually guided task, a visual target appeared that the animal was required to saccade to immediately. Both tasks had a reward schedule with two blocks of trials. In one block of 20–32 trials, the left target was rewarded and the right target was unrewarded; in the next block of trials, their reward values were reversed.
The second task was the visually guided saccade task (Fig. 1B, VGS). Animals were required to maintain fixation for 1200 ms, at which point the fixation point disappeared and the target appeared on the left or right side of the screen. The animal was required to saccade to the target immediately. The trial sequence and the reward schedule were the same as in the memory-guided task.
The biased reward schedule was introduced in blocks (Kawagoe et al., 1998). In one block of 20–32 trials (10–16 trials for each direction), saccades to one target location were rewarded (0.4 ml of liquid), whereas saccades to the other target location were unrewarded (for two recording sessions, a small reward of ∼0.01 ml was delivered instead). In the next block of trials, the reward values of the two targets were switched without warning to the animal.
Neural recording.
Neural recordings were made using conventional single-neuron recording techniques targeted to the dorsal raphe nucleus, as described previously (Nakamura et al., 2008). Briefly, recording chambers were implanted to allow insertion of tungsten electrodes (diameter, 0.25 mm; 1–3 MΩ; FHC) aimed at the dorsal raphe nucleus. The location of the dorsal raphe nucleus was estimated using magnetic resonance imaging, and the recording location was later verified histologically. The distance of the recording sites from the midline was 1 or 1.5 mm. The anteroposterior extent of the recording sites was 2 mm, which corresponded to 6–8 mm anterior to the level of the ear canals (Horsley–Clarke coordinates). We studied all well isolated dorsal raphe neurons whose activity changed during saccade tasks. The dorsal raphe nucleus is known to be a major source of serotonin (Dahlström and Fuxe, 1964; Leger et al., 2001). It also contains several other types of cells, such as GABAergic interneurons, and the type of a neuron cannot always be identified from its electrophysiological properties (Allers and Sharp, 2003; Kocsis et al., 2006; Hajós et al., 2007). Our recordings are therefore likely to include a mixture of serotonin and non-serotonin neurons.
Database.
We analyzed neurons recorded in a previous study (Nakamura et al., 2008). Our database consisted of 98 neurons in animal E (4 recorded in the memory-guided task, 81 recorded in the visually guided task, 13 recorded in both tasks) and 88 neurons in animal L (17 recorded in the memory-guided task, 21 recorded in the visually guided task, 50 recorded in both tasks). This resulted in a total of 84 neurons for the memory-guided task and 165 neurons for the visually guided task. The analysis included neural data from all correctly performed trials, excluding the first three trials of each block when animals were adapting to the change in reward location (Nakamura et al., 2008). Electrophysiological properties of neurons were quantified in terms of spike duration, spiking irregularity, and baseline firing rate (Nakamura et al., 2008). Spike duration was measured between the peaks of the first negative deflection and second positive deflection of the spike waveform (Nakamura et al., 2008). Spike duration was analyzed in the subset of neurons for which the spike waveform was recorded (memory-guided task, n = 43 of 84; visually guided task, n = 70 of 165). Spiking irregularity was measured for each spike using the irregularity metric |log(I1/I2)|, where I1 is the interspike interval ending in that spike, and I2 is the interspike interval starting with that spike (Davies et al., 2006). The irregularity index of each neuron was defined as the median of the irregularity metrics of all of its spikes recorded during performance of correct trials. The baseline firing rate was measured in a 1000 ms window before fixation point onset.
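The irregularity index defined above can be sketched in a few lines of Python. This is an illustration, not the original analysis code: the function name `irregularity_index` is hypothetical, and the sketch assumes a single continuous train of strictly increasing spike times (how interspike intervals spanning trial boundaries were handled is not stated here).

```python
import math
import statistics

def irregularity_index(spike_times):
    """Median of |log(I1/I2)| over all spikes with a preceding and a following interval.

    I1 is the interspike interval ending in a spike and I2 is the interval
    starting with that spike. Spike times must be strictly increasing.
    """
    intervals = [t1 - t0 for t0, t1 in zip(spike_times, spike_times[1:])]
    metrics = [abs(math.log(i1 / i2)) for i1, i2 in zip(intervals, intervals[1:])]
    return statistics.median(metrics)
```

A perfectly regular train gives an index of 0, and the index grows as successive intervals become more unequal.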
Receiver operating characteristic analysis.
We analyzed neural activity primarily by using receiver operating characteristic (ROC) analysis to compare spike counts collected at two times during a trial or collected during two different task conditions. The ROC area is the probability that a randomly chosen spike count from the first condition has a higher value than a randomly chosen spike count from the second condition (excluding ties) (Green and Swets, 1966). Thus, if the ROC area is 1, then neural activity is always higher in the first condition. If the ROC area is 0.5, then neural activity does not discriminate between the two conditions. If the ROC area is 0, then neural activity is always higher in the second condition.
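As an illustration of this measure, the following Python sketch computes the ROC area directly from its pairwise definition. The function name `roc_area` is hypothetical. The treatment of ties is an assumption: tied pairs are counted as one-half, which is the conventional choice and may differ from the tie handling used in the original analysis.

```python
def roc_area(first, second):
    """ROC area comparing two lists of spike counts.

    Estimates the probability that a count drawn from `first` exceeds a count
    drawn from `second`. Assumption: tied pairs contribute 0.5.
    """
    wins = 0
    ties = 0
    for a in first:
        for b in second:
            if a > b:
                wins += 1
            elif a == b:
                ties += 1
    return (wins + 0.5 * ties) / (len(first) * len(second))
```

For example, `roc_area([5, 6, 7], [1, 2, 3])` is 1.0 because every count in the first condition exceeds every count in the second.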
For the analysis of population activity (see Fig. 3), the normalized activity of each neuron was calculated at each millisecond as the ROC area comparing the spike counts of the neuron collected in a 151 ms window centered on that time versus the spike counts collected during a baseline period represented by three nonoverlapping 151 ms windows starting 2000 ms before fixation point onset. Neurons were classified as “positive reward,” “negative reward,” or “no outcome response” based on their significant positive, significant negative, or nonsignificant reward discrimination in a window 150–450 ms after outcome onset (p < 0.05, Wilcoxon rank-sum test; see below for the definition of reward discrimination).
For the correlation between fixation period activity and reward coding (see Fig. 4), the fixation period response of each neuron was defined as the ROC area comparing a window 500–900 ms after fixation point onset versus a prefixation window 0–400 ms before fixation point onset. The reward discrimination of each neuron was defined as the ROC area comparing spike counts collected on rewarded trials versus spike counts collected on unrewarded trials, separately for the target period (150–450 ms after target onset), go period (150–450 ms after fixation point offset), and outcome period (150–450 ms after reward or nonreward delivery onset). Statistical significance and p values were determined using Wilcoxon rank-sum tests. For the analysis of absolute discrimination strength (see Fig. 4B), the absolute ROC area was computed as 0.5 plus the absolute value of the difference between the ROC area and 0.5. Thus, ROC areas >0.5 were unchanged, but ROC areas <0.5 were “reflected” to become >0.5 (e.g., 0.36 would become 0.64). All correlations were computed as Spearman's rank correlation (rho), and statistical significance was determined using permutation tests (20,000 permutations).
Principal component analysis.
Our inspection of neural activity indicated that neurons had complex response patterns including excitation and inhibition by multiple task events. This raised the possibility that the dorsal raphe nucleus processed multiple task-related signals and that these signals were mixed in different combinations in individual neurons. In an attempt to extract such component signals underlying dorsal raphe activity, we used principal component analysis, a method for representing a high-dimensional dataset as a linear combination of a smaller number of components (Richmond and Optican, 1987; Paz et al., 2005). This analysis was performed separately for the memory-guided and visually guided tasks. For each neuron, we constructed a neural “activity profile” as follows. The mean firing rate of the neuron was calculated at 1 ms resolution, in a time window aligned on 10 separate task events: fixation point onset (separately for the two blocks, ipsilateral-rewarded and contralateral-rewarded), target onset (separately for the four combinations of block × reward outcome), and outcome onset (separately for the four combinations of block × reward outcome). For each task event, the time series of firing rate was smoothed with a 151 ms running average and then subsampled at 25 ms resolution. The 10 resulting time series of firing rates were then concatenated to produce the activity profile of the neuron, a vector that contained either 760 data points (for the memory-guided task) or 632 data points (for the visually guided task). The activity profile of each neuron was then converted from firing rate (spikes per second) to a normalized firing rate by subtracting its mean and dividing it by its SD [thus converting the firing rates into Z-scores (Ranade and Mainen, 2009)]. This normalization step prevented the results from being disproportionately influenced by a small number of neurons with high firing rates. 
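The construction of an activity profile can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the original code: the function names are hypothetical, each event's input trace is assumed to include 75 ms of extra data on either side so that the 151 ms running average never runs past the edges, and the SD is the population SD (NumPy's default), which may differ from the original choice.

```python
import numpy as np

def event_segment(rate_1ms, window=151, step=25):
    """Smooth a 1 ms firing-rate trace with a running average, then subsample.

    Only fully overlapping windows are kept (mode="valid"), so the output
    covers the input minus (window - 1) / 2 samples at each end.
    """
    kernel = np.ones(window) / window
    smoothed = np.convolve(rate_1ms, kernel, mode="valid")
    return smoothed[::step]

def activity_profile(event_rates):
    """Concatenate the segments for all task events and convert to Z-scores."""
    profile = np.concatenate([event_segment(r) for r in event_rates])
    return (profile - profile.mean()) / profile.std()
```

Here `event_rates` would be a list of the 10 mean firing-rate traces for one neuron, one per task event. The resulting vector has zero mean and unit SD, so neurons with high firing rates do not dominate later analyses.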
Finally, principal component analysis was applied to a matrix that consisted of one row for each neuron, representing the activity profile of that neuron. This produced three outputs. The first output was a sequence of principal components, each resembling a neural activity profile (shown for the first two components in Fig. 5A,B). The activity of each neuron can be described as the sum of the mean neural activity profile (Fig. 5C) plus a weighted linear combination of all of the principal components. The second output was the weightings of each component for each neuron (the weights are shown for the first two components in Fig. 5E). Because principal components are only specified up to an arbitrary scaling factor, we chose to scale each principal component so that its single neuron weights had unit variance. The third output was the percentage of the variance in neural activity profiles that was explained by each component (shown for the first eight components with black lines in Fig. 5D). By definition, 100% of the variance in the data could be explained by using a linear combination of all of the components (760 components for the memory-guided task or 632 components for the visually guided task). In our data, ∼30% of the variance could be explained using a linear combination of the first two components (see Fig. 5D).
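The three outputs described above can be sketched with NumPy using a singular value decomposition of the mean-subtracted profile matrix. This is one standard way to compute principal components and is an assumption about implementation; the function name `pca_profiles` is hypothetical. The sketch keeps only components with nonzero variance and rescales each component so that its single-neuron weights have unit variance.

```python
import numpy as np

def pca_profiles(profiles):
    """PCA of an (n_neurons, n_timepoints) matrix of activity profiles.

    Returns the mean profile, the components (one per row), the per-neuron
    weights (one column per component), and the percent variance explained.
    """
    mean_profile = profiles.mean(axis=0)
    centered = profiles - mean_profile
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    keep = s > s.max() * 1e-10          # drop numerically empty components
    u, s, vt = u[:, keep], s[keep], vt[keep]
    explained = 100 * s**2 / np.sum(s**2)
    weights = u * s                     # projection of each neuron on each component
    scale = weights.std(axis=0)
    weights = weights / scale           # unit-variance weights
    components = vt * scale[:, None]    # compensate so reconstruction is unchanged
    return mean_profile, components, weights, explained
```

With these outputs, `mean_profile + weights @ components` reproduces the original matrix, and `explained[:2].sum()` gives the percentage of variance captured by the first two components.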
Ideally, the principal components represent systematic structure in neural activity profiles, such that multiple neurons responded in similar manners at similar times during the task. In contrast, a null hypothesis is that the principal components represent idiosyncratic activity patterns specific to individual neurons, which happened to occur at similar times simply by chance. To represent this null hypothesis, we generated synthetic datasets using a shuffling procedure in which the activity profile of each neuron was shifted by a random temporal offset. Specifically, suppose a neuron was recorded during the memory-guided task, so that its activity profile had 760 data points. The neuron was then assigned a temporal offset randomly chosen between 1 and 760. If the temporal offset of a neuron was 20, then its original activity profile consisting of the data points numbered (1, 2, …, 760) was shifted by 20 data points, so that it now consisted of the data points numbered (21, 22, …, 760, 1, 2, 3, …, 20). Thus, the activity profile of each neuron retained its original “shape” (i.e., the temporal correlation between its firing rates measured at successive moments in time). However, for any pair of neurons, the shuffling caused their activity profiles to become uncorrelated with each other [removing any “signal correlation” (Averbeck et al., 2006) that would reflect a tendency for neurons to be active in similar manners at similar times during the task]. This shuffling procedure was repeated 200 times. The principal component analysis was performed on each shuffled dataset, each time calculating the percentage of variance explained by each principal component. The range of these 200 values is shown as gray error bars in Figure 5D. Similar results were observed using other shuffling procedures, such as shuffling neural responses to each of the 10 task events among neurons (thus removing any tendency for neurons to respond in correlated manners to multiple task events).
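The circular-shift shuffle and the resulting null range of variance explained can be sketched with NumPy as follows. Function names are hypothetical, the random number generator and seed are illustrative choices, and variance explained is computed here from the singular values of the mean-subtracted matrix, which is an assumption about implementation.

```python
import numpy as np

def shift_shuffle(profiles, rng):
    """Circularly shift each neuron's profile by its own random offset.

    An offset of k moves data point k + 1 to the front, so an offset of 20
    yields the order (21, 22, ..., n, 1, 2, ..., 20).
    """
    n_neurons, n_points = profiles.shape
    offsets = rng.integers(1, n_points + 1, size=n_neurons)  # 1..n_points inclusive
    return np.stack([np.roll(row, -k) for row, k in zip(profiles, offsets)])

def variance_explained(profiles):
    """Percent variance explained by each principal component."""
    centered = profiles - profiles.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return 100 * s**2 / np.sum(s**2)

def null_range(profiles, n_shuffles=200, seed=0):
    """Minimum and maximum variance explained per component across shuffles."""
    rng = np.random.default_rng(seed)
    null = np.array([variance_explained(shift_shuffle(profiles, rng))
                     for _ in range(n_shuffles)])
    return null.min(axis=0), null.max(axis=0)
```

A component whose variance explained in the real data lies above the upper bound returned by `null_range` reflects structure shared across neurons rather than chance alignment of idiosyncratic profiles.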
Results
We analyzed the activity of neurons recorded from the dorsal raphe nucleus while monkeys performed eye movement tasks with biased reward schedules (Nakamura et al., 2008) (Fig. 1A,B). Each trial began with the appearance of a fixation point at the center of the screen, which animals were required to fixate. In the visually guided saccade task (n = 165 neurons), a visual target appeared on the left or right side of the screen toward which the animal made an immediate saccade (Fig. 1B, VGS). In the memory-guided saccade task (n = 84 neurons), the target location was cued to the animal with a brief flash of light, which the animal was required to remember and use to guide its later saccade (Fig. 1A, MGS). The tasks were run using a reward-biased procedure with separate blocks in which the two target locations had different reward values. Thus, the targets acted as both goals for future saccades and cues to the upcoming reward outcome. Animals closely tracked the reward values of the two targets, saccading readily to rewarded targets and sluggishly to unrewarded targets (Table 1) (Nakamura et al., 2008).
Saccadic reaction times to the rewarded and unrewarded targets for each animal during the memory-guided saccade task (MGS) and visually guided saccade task (VGS)
To test whether dorsal raphe neurons encoded the reward value of ongoing events, we analyzed the relationship between tonic activity during the behavioral tasks and differential responses to reward cues and outcomes. If neurons encoded behavioral tasks primarily in terms of their reward value, then cells excited by the task should be preferentially excited by reward cues (i.e., carrying positive reward signals), whereas cells inhibited by the task should be preferentially inhibited by reward cues (i.e., carrying negative reward signals). Conversely, if cells encoded behavioral tasks and reward value in independent manners, then there should be no systematic relationship between task onset-related activity and reward-related activity.
Inspection of single-neuron activity suggested that task and reward coding were systematically related. Figure 2A shows a neuron that increased its tonic activity during the memory-guided saccade task. The neuron had a brief phasic response to the fixation point, followed by a sustained elevation in tonic activity during the fixation period. Then when the target appeared, the neuron responded with a positive reward signal, with higher activity in response to the reward-indicating target than the no-reward-indicating target. This differential response continued persistently during the memory delay period, indicating that it encoded the expectation of upcoming rewards rather than simply a visual response to the target stimulus. The differential response continued at the time the reward outcomes were delivered and then decreased during the intertrial interval.
Activity of three dorsal raphe neurons during the memory-guided saccade task. Neural activity is aligned on fixation point onset (left), target onset (middle), and outcome onset (right). Curves indicate average firing rate on all trials (black), rewarded trials (red), or unrewarded trials (blue). Spiking activity was smoothed with a Gaussian kernel (σ = 20 ms). Black asterisks (**) indicate significantly different activity during the 500–900 ms after fixation point onset compared with a prefixation period 0–400 ms before fixation point onset (p ≤ 0.005, Wilcoxon rank-sum test). Red and blue asterisks (**) indicate significantly different activity for the two reward conditions during a 150–450 ms window after target onset, go onset, or outcome onset (from left to right). A, Neuron that increased its tonic activity during the task and emitted positive reward signals in response to the targets and outcomes. B, Neuron that decreased its tonic activity during the task and emitted negative reward signals in response to the target and outcomes. C, Neuron that did not change its tonic activity during the task and had little or no reward signals.
Figure 2B shows a neuron that decreased its tonic activity during the task. This neuron decreased its activity in response to the fixation point and then responded to the target with a negative reward signal, with higher activity on unrewarded trials than rewarded trials. This differential response was maintained during the memory delay, grew larger in response to the outcome delivery, and then faded away during the intertrial interval.
Figure 2C shows a neuron with no tonic change in activity during the task. This neuron had a small phasic response to the fixation point and a small excitatory response to the rewarded target, but its reward value signal was weak and did not continue in a sustained manner.
The pattern seen in these cells was the predominant pattern in the population as a whole. The population average normalized activity of dorsal raphe neurons is shown in Figure 3, separately for the memory-guided and visually guided tasks (left and right columns) and separately for neurons with positive, negative, or no significant reward signals in response to the outcomes (top, middle, and bottom rows). Neurons with positive reward signals for the outcome had elevated activity during the early period of the task (Fig. 3A). If the rewarded target appeared, their activity was elevated further, whereas if the unrewarded target appeared, they returned to near baseline. Neurons with negative reward signals had suppressed activity during the early period of the task (Fig. 3B). If the rewarded target appeared, their activity was further suppressed, whereas if the unrewarded target appeared, they returned to near baseline. Neurons with no significant reward signals had a tendency for small phasic responses to the fixation point and the targets and slightly elevated activity during the task. However, they had no strong tendency for sustained tonic activity triggered by task events (Fig. 3C). Similar activity patterns were found in both tasks. Neurons also had a tendency for small phasic responses to the fixation point and targets, consistent with past observations that some dorsal raphe neurons have transient sensory responses to light flashes and auditory clicks (Heym et al., 1982; Ranade and Mainen, 2009).
Population average activity of dorsal raphe neurons separated by their reward signals in response to the outcome. A–C, Normalized activity is shown for the memory-guided saccade task (MGS, left) and visually guided saccade task (VGS, right), separately for positive-reward cells (A, top), negative-reward cells (B, middle), and non-outcome responsive cells (C, bottom). Neurons were sorted into these categories based on significant reward discrimination during a 150–450 ms window after outcome onset (gray bar on x-axis; p < 0.05, Wilcoxon rank-sum test). The histograms below (C) show the reward discrimination for each neuron, with colors indicating positive-reward cells (red) and negative-reward cells (blue). For the plots of normalized activity, the activity of each neuron was smoothed with a 151 ms running average and normalized to lie between 0 and 1 by computing its ROC area versus the baseline activity of the neuron during the intertrial interval (see Materials and Methods). Thick lines indicate mean normalized activity and the light shaded areas are ±1 SEM. In both tasks, neurons with positive reward discrimination between outcomes had elevated activity during the tasks and positive responses to the rewarded target (A). Neurons with negative reward discrimination between outcomes had suppressed activity during the tasks and negative responses to the rewarded target (B).
The above impressions from inspection of neural data were borne out by statistical analysis (Fig. 4A). We used ROC analysis to measure the change in firing rate of each neuron during the fixation period (Green and Swets, 1966). The ROC area was 0 if the neuron was strongly inhibited by the task, 0.5 if it stayed at its prefixation firing rate, and 1 if the neuron was strongly excited. We used a similar measure of neural discrimination between rewarded and unrewarded trials, which was 0 if the neuron had a higher firing rate on unrewarded trials, 0.5 if it had no discrimination, and 1 if the neuron had a higher firing rate on rewarded trials. We found that the activity of a neuron during the fixation period was strongly positively correlated with its degree of reward discrimination during the periods after target onset and outcome delivery (all p ≤ 0.002, permutation test) (Fig. 4A). During the memory-guided task, activity during the fixation period was also positively correlated with reward discrimination during the later part of the memory delay, when the fixation point turned off and the animal made a saccade to collect the reward outcome (“go period”, p < 0.001) (Fig. 4A). An important point is that fixation activity and reward discrimination were not merely correlated with each other, but there was a lawful relationship between their signs. This can be seen by the fact that the best-fit regression line passed through the center of each plot (Fig. 4A). Thus, most cells with enhanced fixation activity had positive reward signals, most cells with suppressed fixation activity had negative reward signals, and cells with no change in fixation activity had no bias toward positive or negative reward signals.
Correlation between dorsal raphe neuron task coding and reward coding. A, Plot of fixation period response (x-axis) versus reward-related response (y-axis) separately for the memory-guided saccade task (MGS, top) and visually guided saccade task (VGS, bottom). The fixation period response was measured as the ROC area for each neuron for discriminating between its firing rate 500–900 ms after fixation point onset versus a prefixation period 0–400 ms before fixation point onset. Reward discrimination was measured during several time windows during the trial (columns): after target onset, after fixation offset (go period), and after outcome onset. Text indicates rank correlation (rho), and asterisks indicate its p value (*p < 0.05; **p < 0.01; ***p < 0.001, permutation test). Dark dots indicate cells with a significant excitation or inhibition during the fixation period (p < 0.05, Wilcoxon rank-sum test). Colored dots indicate cells with significantly higher activity during rewarded trials (red) or during unrewarded trials (blue) (p < 0.05, Wilcoxon rank-sum test). Black lines indicate the line of best fit calculated with type 2 least-squares regression. Neural activity during the fixation period was positively correlated with reward coding during the target, go, and outcome periods. B, Same as A but using absolute ROC area, which ranges between 0.5 (no discrimination) and 1.0 (perfect discrimination), independent of the direction of activity changes or reward discrimination. Neurons with strong responses during the fixation period had significantly stronger reward signals.
The tonic activity of a neuron during the task also predicted its absolute strength of reward coding (Fig. 4B). This was indicated by an analysis using the absolute ROC area, which ranges between 0.5 for no discrimination and 1.0 for perfect discrimination, regardless of the direction of activity changes. The absolute ROC area for the fixation period response was positively correlated with the absolute ROC area for reward discrimination in both tasks and during all task periods (all p < 0.05, permutation test). Thus, neurons that tracked progress through the behavioral task were the most prominent source of dorsal raphe reward signals.
If dorsal raphe neurons encode the reward value of ongoing events, then they should adapt to changing stimulus reward values with the same speed as the adaptations in animal behavior. To test this, we focused on the second trial of each block that occurred just after the reward values of the two targets had been unexpectedly switched. We tested for changes in behavioral reaction times to the targets as well as changes in neural firing rates during the target period. For this analysis, cells were classified as positive reward or negative reward based on their significant reward discrimination during the target period, excluding the first three trials of each block (p < 0.05, Wilcoxon rank-sum test). The results confirmed a previous analysis showing that both neurons and behavior rapidly adjusted to the new target values (see Nakamura et al., 2008, analysis of changes in neuronal activity with the reversal of position–reward contingency). By the second trial of the block, behavioral reaction times were shorter for the new rewarded target, positive-reward neurons had higher firing rates in response to the new rewarded target, and negative-reward neurons had higher firing rates in response to the new unrewarded target (all p < 0.05 in each task, Wilcoxon signed-rank test; except negative-reward neurons in the memory-guided task, p = 0.067).
Predominant influence of correlated task and reward coding
The above analysis showed that dorsal raphe neurons encoded behavioral tasks and reward outcomes in a correlated manner. However, it did not indicate how important this correlation was to dorsal raphe neural task-related activity. Was it a dominant influence on dorsal raphe neurons, or was it merely one of many systematic forms of task and reward encoding? In particular, the above analysis was limited because it relied on narrow time windows chosen by eye, whereas most neurons had slow, tonic changes in activity spanning multiple phases of the task. We sought a method that could search for patterns in dorsal raphe activity during all task phases in an unbiased manner.
To achieve this goal, we used principal component analysis (Richmond and Optican, 1987; Paz et al., 2005) (Fig. 5). This method describes neural activity as a linear combination of “principal components” such that the first principal component explains the greatest amount of variance, the second principal component explains the second greatest amount of variance, and so on. The rationale for this approach is that the first few principal components represent the most common patterns of neural activity, and the importance of each component to neural coding is suggested by the amount of variance in the data that it explains. To apply this technique to our data, we assigned each neuron an activity profile consisting of its time series of normalized firing rate in response to each of five task events (fixation point, rewarded target, rewarded outcome, unrewarded target, and unrewarded outcome) during each of the two blocks of trials of the task (ipsilateral-rewarded block, contralateral-rewarded block). We then applied principal component analysis to a matrix containing all of the single-neuron activity profiles. This resulted in a sequence of principal components, each representing a time series of increases or decreases in neural activity, such that the activity profile of every neuron could be reconstructed as the sum of the mean neural activity profile (Fig. 5C) plus a weighted combination of the principal components. If a neuron assigned a component positive weight, then its activity was positively related to the time series of that component. If a neuron assigned a component negative weight, then its activity was negatively related to the time series of that component.
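The decomposition described above can be sketched as follows. This is a minimal illustration that assumes the activity profiles are stored as a neurons-by-timepoints NumPy array and that the principal components are obtained from a singular value decomposition of the mean-subtracted matrix. The function name `pca_profiles` and the synthetic data are hypothetical. In the real analysis each profile concatenates the responses to five task events in two blocks, whereas this example uses a single shared ramp for brevity.

```python
import numpy as np


def pca_profiles(profiles):
    """profiles: (n_neurons, n_timepoints) array of normalized firing rates.
    Returns the mean profile, the components (one per row), the
    single-neuron weights, and the fraction of variance per component."""
    mean_profile = profiles.mean(axis=0)
    centered = profiles - mean_profile
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)
    weights = u * s          # (n_neurons, n_components)
    components = vt          # (n_components, n_timepoints)
    # Components are defined only up to a scale factor: rescale so that
    # each component's distribution of single-neuron weights has unit variance.
    scale = weights.std(axis=0)
    scale[scale == 0] = 1.0
    return mean_profile, components * scale[:, None], weights / scale, var_explained


# Synthetic example: one shared tonic ramp, weighted positively in some
# neurons and negatively in others, plus independent noise.
rng = np.random.default_rng(0)
n_neurons, n_time = 40, 30
ramp = np.linspace(0.0, 1.0, n_time)
true_weights = rng.normal(size=n_neurons)
profiles = 1.0 + np.outer(true_weights, ramp) + 0.05 * rng.normal(size=(n_neurons, n_time))

mean_profile, components, weights, var_explained = pca_profiles(profiles)

# Every neuron's profile is the mean profile plus a weighted sum of components.
reconstruction = mean_profile + weights @ components
```

A neuron with a positive weight on the first component has activity that rises along the ramp relative to the mean profile, and a neuron with a negative weight has activity that falls, mirroring the interpretation of positive and negative weights given in the text.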
Principal components of dorsal raphe neuron activity. A, B, The first (A) and second (B) principal components of dorsal raphe neural activity profiles during the memory-guided saccade task (MGS, top) and visually guided saccade task (VGS, bottom). Curves represent the normalized firing rate of the principal component during the fixation period (black), after the onset of the rewarded target (red), and after the onset of the unrewarded target (blue), separately for the contralateral-rewarded block (dark colors) and ipsilateral-rewarded block (light colors). The first principal component (A) indicated tonically increased activity during the task and positive-reward coding during the target, memory, and outcome periods. The second component (B) indicated tonically increased activity in response to reward delivery. C, The mean neural activity profile during the memory-guided task (top) and visually guided task (bottom) consisted of phasic responses to the fixation point and targets with no conspicuous tonic activity. D, Percentage of variance in the neural activity profiles explained by the first eight principal components, separately for the true data (black lines) and shuffled datasets (gray lines). Gray error bars indicate the range of percentage variance explained observed in 200 separate shuffled datasets. Only the first two components explained more variance than expected by chance. E, Weight assigned to the first two principal components by each neuron during the memory-guided task (top) and visually guided task (bottom). Each dot represents a single neuron. Because principal components are only specified up to an arbitrary scaling factor, we chose to scale each principal component so that its distribution of single-neuron weights had unit variance. There was no systematic relationship between the weights assigned to the first and second components.
Figure 5 shows the results of this analysis. The population average activity profile for each task consisted of brief phasic responses to the fixation points and targets, with little conspicuous tonic activity (Fig. 5C). The first principal component was quite similar in both tasks and indicated a positive correlation between task onset-related activity and reward coding (Fig. 5A, “task + reward value”). It consisted of a gradual increase in tonic activity during the intertrial interval and after fixation point onset, followed by an additional increase in tonic activity in response to the rewarded target. During the memory-guided task, this positive reward signal was maintained throughout the memory delay period and after the saccade. Finally, when the reward outcome was delivered, tonic activity continued on rewarded trials but decreased if no reward was delivered. The reward signal then gradually faded during the intertrial interval. The principal component was remarkably similar during the ipsilateral-rewarded and contralateral-rewarded blocks of each task (Fig. 5A, compare light curves with dark curves), indicating that it was reproducible and was independent of reward location and saccade direction. Note that this principal component could be assigned either positive or negative weight, both of which occurred in equal numbers of neurons (Fig. 5E). Thus, some neurons assigned the principal component positive weight, representing cells that were excited by the task and carried positive reward signals. Other neurons assigned the component negative weight, representing cells that were inhibited by the task and carried negative reward signals. In summary, the principal component analysis confirmed our original observation of correlated task and reward value coding (Figs. 2–4) and, in addition, indicated that this was the predominant form of dorsal raphe activity.
The first principal component also included a brief phasic response to the fixation point. This indicates that the brief phasic response seen in the population average activity profile (Fig. 5C) was modulated by the neural direction of task and reward coding. Specifically, if a neuron assigned the first component a positive weight, then its activity resembled the mean activity profile plus a positive contribution from the first component, which enhanced its phasic response. If a neuron assigned the first component a negative weight, then its activity resembled the mean activity profile plus a negative contribution from the first component, which reduced its phasic response. This matches the pattern seen in Figure 3, in which positive reward neurons tended to have an enhanced phasic response (Fig. 3A) and negative reward neurons tended to have a reduced phasic response (Fig. 3B).
A second principal component was also present, which was again quite similar in the two tasks. This component had little change in activity during performance of the task: a slight decrease in activity during the intertrial interval, flat activity during the fixation period, and little or no reward signal in response to the targets. However, once the task was completed, the component had a prolonged, tonic change in activity after a reward was delivered (Fig. 5B). Thus, whereas the first component resembled “task + reward value coding,” the second component resembled “reward delivery coding.” There was no systematic relationship between the neural weights assigned to these two components (Fig. 5E), suggesting that these two forms of reward coding were not found in separate groups of neurons. Instead, they existed as a continuum across the dorsal raphe population as a whole.
How important were these two forms of reward coding to dorsal raphe activity? The first principal component explained the greatest amount of variance in neural activity profiles, accounting for 17–20% of the variance in each task (Fig. 5D). The second component explained 11–13% of the variance in each task, and additional components explained considerably less (Fig. 5D). To determine which principal components represented systematic structure in dorsal raphe neuron activity, we repeated the analysis on shuffled data in which the activity profile of each neuron was shifted by a random temporal offset (Fig. 5D, gray lines). The shuffled data preserved the shape of the activity profile of each individual neuron but removed any tendency for pairs of neurons to respond in similar manners. This represents the null hypothesis that dorsal raphe neuron activity profiles had no correlation with each other. Only the first two principal components explained appreciably more variance than would be expected under the null hypothesis (Fig. 5D). This leads to two conclusions: (1) the first two principal components accounted for most of the systematic structure in dorsal raphe neural activity during the tasks; (2) the majority of this systematic structure was accounted for by the first component, representing correlated coding of the behavioral task and reward value.
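The shuffle control described above can be sketched as follows. This is an illustration under stated assumptions: the random temporal offset is implemented as a circular shift (`np.roll`), the activity profiles are a neurons-by-timepoints array, and the null range is taken over 200 shuffled datasets as in Figure 5D. The function names and the synthetic data are hypothetical and do not reproduce the original analysis code.

```python
import numpy as np


def variance_explained(profiles):
    """Fraction of variance explained by each principal component."""
    centered = profiles - profiles.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s**2 / np.sum(s**2)


def shifted_copy(profiles, rng):
    """Circularly shift each neuron's profile by its own random offset.
    This keeps each profile's shape but breaks alignment across neurons."""
    offsets = rng.integers(0, profiles.shape[1], size=profiles.shape[0])
    return np.stack([np.roll(row, int(k)) for row, k in zip(profiles, offsets)])


def shuffle_range(profiles, n_shuffles=200, seed=0):
    """Range of variance explained per component across shuffled datasets."""
    rng = np.random.default_rng(seed)
    null = np.array([variance_explained(shifted_copy(profiles, rng))
                     for _ in range(n_shuffles)])
    return null.min(axis=0), null.max(axis=0)


# Synthetic example: a response bump shared by all neurons (with signed,
# neuron-specific weights) plus independent noise.
rng = np.random.default_rng(1)
n_neurons, n_time = 60, 50
t = np.arange(n_time)
bump = np.exp(-0.5 * ((t - 25) / 3.0) ** 2)
profiles = np.outer(rng.normal(size=n_neurons), bump) \
    + 0.2 * rng.normal(size=(n_neurons, n_time))

true_var = variance_explained(profiles)
null_min, null_max = shuffle_range(profiles)

# A component reflects systematic structure if it explains more variance
# than was observed in any shuffled dataset.
systematic = true_var > null_max
```

In this synthetic case only the shared bump is aligned across neurons, so the first component of the true data exceeds the shuffled range, which is the same logic used to conclude that only the first two components of the real data reflected systematic structure.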
Little relationship between task-related activity and electrophysiological properties
Previous reports indicated that the neurotransmitter content of dorsal raphe neurons may be related to electrophysiological properties, such as spike duration, spiking irregularity, and baseline firing rate (Aghajanian et al., 1978; Sawyer et al., 1985; Jacobs and Fornal, 1991; Hajós et al., 1998) (but see Allers and Sharp, 2003; Kocsis et al., 2006). We therefore tested whether these properties were correlated with the task-related signals identified in this study (Fig. 6). There was no clear evidence for such correlations. In both memory-guided and visually guided tasks, the spike duration and spiking irregularity of a neuron had no detectable correlation with its fixation period activity or with the weights it assigned to the first two principal components (Fig. 6A,B) (rho between −0.02 and +0.26; p > 0.05 for each electrophysiological property in each task). There was a tendency for modest negative correlations between baseline firing rate and measures of task-related activity, with rank correlations ranging from −0.23 to −0.02, some of which reached statistical significance (Fig. 6C). However, modest negative correlation would be expected by chance simply as a result of a floor effect on neural activity. Neurons with low baseline firing rates could have large task-related increases in activity but could only have small decreases in activity, because their firing rate could not go below zero. Thus, aside from this constraint, electrophysiological properties did not appear to have a consistent relationship with task-related activity.
Little relationship between task-related activity and electrophysiological properties. A, The relationship between spike duration (y-axis) and task-related signals (x-axis) including fixation period activity (left column) and the single-neuron weights assigned to the first two principal components (middle and right columns). Data are shown separately for the memory-guided saccade task (MGS, left subcolumns) and visually guided saccade task (VGS, right subcolumns). Each dot is one neuron. Spike waveforms were recorded in a subset of neurons (memory-guided task, n = 43 of 84; visually guided task, n = 70 of 165). Text indicates the rank correlation (rho) and its p value (permutation test, 2000 permutations). The black lines were fit by least-squares linear regression. B, C, Same as A, for the irregularity index (B) and baseline firing rate (C). Overall, most correlations were small in size and did not reach significance (p > 0.05). The exception was a modest tendency for negative correlation between baseline firing rate and the weights of the principal components (C; rho between −0.23 and −0.02), which may be a result of a floor effect on neural activity.
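The rank correlations and permutation tests reported above can be sketched as follows. This is a minimal illustration that assumes a Spearman-style rank correlation (Pearson correlation of average ranks), a two-sided test, and the common convention of adding one to the numerator and denominator of the p value. These choices, the function names, and the example data are assumptions and are not confirmed details of the original analysis. Constant inputs (zero variance) are not handled.

```python
import random


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_correlation(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p value: how often a random pairing of x and y gives a
    rank correlation at least as large in magnitude as the observed one."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = list(ry)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)


# Hypothetical example: a perfectly monotonic relationship across 20 neurons.
x = list(range(20))
y = [2.0 * v + 1.0 for v in x]
rho = rank_correlation(x, y)
p = permutation_p_value(x, y)
```

Because the test shuffles the pairing between the two variables, it makes no assumption about the shape of their distributions, which suits quantities such as spike duration and principal component weights.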
Discussion
We found that most dorsal raphe neurons responded to the start of a behavioral task in the same direction as they responded to rewarding cues and outcomes and that neurons with stronger task coding also had stronger reward coding. These two signals combined so that the level of dorsal raphe activity tracked progress through the task toward obtaining future rewards.
The correlation between task and reward coding had a large influence on dorsal raphe neural activity, accounting for the majority of the systematic structure in neural response patterns during the task. Note that our analysis using principal components was not limited to reward coding and could in principle have detected a wide variety of activity patterns. For example, it could have produced principal components reflecting neural encoding of saccade execution, visual stimulus location, or reward position, variables that are strongly encoded by neurons in many cortical and subcortical areas (Hikosaka et al., 2000, 2006; Ding and Hikosaka, 2006; Kobayashi et al., 2007). The analysis could also have produced principal components reflecting purely temporal aspects of neural activity, for instance if one population of neurons was active during the fixation period, whereas a second population was active during the outcome period. However, the major principal components of dorsal raphe neuron activity were primarily insensitive to these sensory, motor, contextual, and event-timing properties (Fig. 5A,B), with the exception of small phasic responses to the fixation point and targets (Fig. 5A,C), which may represent a form of transient sensory response (Heym et al., 1982; Ranade and Mainen, 2009). This is consistent with a report that dorsal raphe encoding of these properties is generally idiosyncratic among neurons and rarely follows a systematic pattern (Ranade and Mainen, 2009). In contrast to these diverse task properties, we instead found a predominant principal component representing correlated encoding of task onset and reward value. Furthermore, this component was highly similar during two distinct behavioral tasks: a visually guided task that required simple stimulus-guided actions, and a memory-guided task that required visuospatial memory and internally guided actions. 
This indicates that dorsal raphe neurons systematically encoded the behavioral tasks in terms of their potential for producing future rewards.
The dorsal raphe nucleus sends serotonergic projections to many neural structures involved in reward-oriented behavior. Of particular interest is the amygdala, which receives dense serotonergic input (Sadikot and Parent, 1990; Freedman and Shi, 2001) and in which neurons carry state value signals tracking progress through a task, similar to the signals we observed (Belova et al., 2008). The amygdala also sends projections to the dorsal raphe (Peyron et al., 1998; Lee et al., 2007; Vertes and Linley, 2008). Thus, the amygdala could be either the source or the recipient of task value signals in the dorsal raphe. The dorsal raphe nucleus is well equipped to influence the amygdala with reward-related signals, because it is known that dorsal raphe activation and serotonin alter amygdala neuron activity (Wang and Aghajanian, 1977; Stutzmann et al., 1998; Stutzmann and LeDoux, 1999; Jha et al., 2005), that serotonin levels within the amygdala change during consumption of food rewards (Fallon et al., 2007), and that the expression of specific serotonin receptors alters amygdala responses to emotional events (Hariri et al., 2002, 2005; Cools et al., 2005; Holmes, 2008). This hypothesis could be tested by recording amygdala neuron reward signals while manipulating local serotonin levels. This would be predicted to cause tonic offsets in amygdala neuron activity as if the expected reward value had changed.
The dorsal raphe nucleus contains several different neuron types, including serotonin, GABA, and dopamine-releasing neurons (Michelsen et al., 2007). It would be natural to expect that these neurochemical classes of neurons correspond to the different functional classes of neurons found in the present study, perhaps corresponding to the direction of reward coding (positive vs negative) or the first two principal components of dorsal raphe neuron activity (task + reward value coding vs reward delivery coding). However, we found that task-related signals in these neurons had no clear relationship to electrophysiological properties such as spike duration and spiking irregularity and were expressed along a continuum rather than in distinct clusters of neurons. This suggests that dorsal raphe reward signals may not be confined to specific neuronal types. Additional studies may be able to resolve this question using techniques to identify neurons based on neurotransmitter content (Allers and Sharp, 2003; Kocsis et al., 2006; Hajós et al., 2007). Another goal of future research will be to determine the precise range of behavioral tasks and natural environments in which dorsal raphe neurons signal upcoming reward value. In particular, it will be important to discover whether dorsal raphe tonic activity encodes the negative value of aversive tasks along the same scale as the positive value of rewarding tasks (Daw et al., 2002; Belova et al., 2008; Dayan and Huys, 2009).
There is now considerable evidence that the dorsal raphe nucleus and the serotonin system have a role in evaluating future rewards. Notably, elevated serotonin levels promote persistence to wait for large delayed rewards, whereas depleted serotonin levels cause impulsive choices of small immediate rewards (Schweighofer et al., 2007, 2008; Tanaka et al., 2007). However, the mechanisms by which the raphe nuclei produce these effects are unknown. Here we found that a group of dorsal raphe neurons tracked progress toward future delayed rewards in a consistent manner both after task initiation and after the value of the trial was revealed. Our data suggest that dorsal raphe neurons influence behavior in an adaptive manner based on the anticipated delay and worth of future motivational outcomes.
Footnotes
- This research was supported by the Intramural Research Program at the National Eye Institute. K.N. is supported by Precursory Research for Embryonic Science and Technology, the Takeda Foundation, the Nakayama Foundation, a Grant-in-Aid for Scientific Research B, and a Grant-in-Aid for Scientific Research on Priority Areas.
- Correspondence should be addressed to Dr. Ethan S. Bromberg-Martin, Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, 49 Convent Drive, Building 49, Room 2A50, Bethesda, MD 20892-4435. bromberge{at}mail.nih.gov