The ventral striatum (VS) is thought to serve as a gateway whereby associative information from the amygdala and prefrontal regions can influence motor output to guide behavior. If VS mediates this “limbic–motor” interface, then one might expect neural correlates in VS to reflect this information. Specifically, neural activity should reflect the integration of motivational value with subsequent behavior. To test this prediction, we recorded from single units in VS while rats performed a choice task in which different odor cues indicated that reward was available on the left or on the right. The value of reward associated with a left or rightward movement was manipulated in separate blocks of trials by either varying the delay preceding reward delivery or by changing reward size. Rats' behavior was influenced by the value of the expected reward and the response required to obtain it, and activity in the majority of cue-responsive VS neurons reflected the integration of these two variables. Unlike similar cue-evoked activity reported previously in dopamine neurons, these correlates were only observed if the directional response was subsequently executed. Furthermore, activity was correlated with the speed at which the rats executed the response. These results are consistent with the notion that VS serves to integrate information about the value of an expected reward with motor output during decision making.
The ventral striatum (VS) is thought to serve as a “limbic–motor” interface (Mogenson et al., 1980). This hypothesis has been derived primarily from the connectivity of this area with decision/motor-related areas including the prefrontal cortex, limbic-related areas including the hippocampus, amygdala, orbitofrontal cortex, and midbrain dopamine neurons, along with its outputs to motor regions, such as ventral pallidum (Groenewegen and Russchen, 1984; Heimer et al., 1991; Brog et al., 1993; Wright and Groenewegen, 1995; Voorn et al., 2004; Gruber and O'Donnell, 2009). Through these connections, the ventral striatum is thought to integrate information about the value of expected outcomes with motor information to guide motivated behavior. Consistent with this proposal, manipulations of VS impair changes in response latencies associated with different quantities of reward (Hauber et al., 2000; Giertler et al., 2003) and impact other measures of vigor, salience, and arousal thought to reflect the value of expected rewards (Berridge and Robinson, 1998; Cardinal et al., 2002a,b; Di Chiara, 2002; Nicola, 2007).
From these and other studies (Wadenberg et al., 1990; Ikemoto and Panksepp, 1999; Di Ciano et al., 2001; Di Chiara, 2002; Salamone and Correa, 2002; Wakabayashi et al., 2004; Yun et al., 2004; Gruber et al., 2009), it has been suggested that VS is indeed critical for motivating behavior in response to reward-predicting cues. However, there is ample contradictory evidence (Amalric and Koob, 1987; Cole and Robbins, 1989; Robbins et al., 1990; Reading and Dunnett, 1991; Reading et al., 1991; Brown and Bowman, 1995; Giertler et al., 2004) and little direct single-unit recording data from VS in tasks designed to directly address this question (Hassani et al., 2001; Cromwell and Schultz, 2003). Specifically, most VS studies have not varied both expected reward and response direction. Furthermore, no studies have examined how VS neurons respond when animals are making decisions between differently valued rewards, to assess the relationship between the cue-evoked activity and the decision.
To address these issues, we recorded from single neurons in VS while rats performed a choice task for differently valued rewards (Roesch et al., 2006, 2007a,b). On every trial, rats were either instructed to respond at one of two wells (left or right) or chose between them to receive reward. In different trial blocks, we manipulated the value of the expected reward by increasing either the delay to reward or the size of reward (10% sucrose solution). Here we report that cue-evoked activity in VS neurons integrated the value of the expected reward and the direction of the upcoming movement. Increased firing required that the response be executed and was not observed if the reward was available but the animal chose to execute a different response. Furthermore, increased firing was correlated with the speed at which the rats executed that response. These results are consistent with the notion that VS serves to integrate information about the value of an expected reward with motor output during decision making.
Materials and Methods
Male Long–Evans rats were obtained at 175–200 g from Charles River Laboratories. Rats were tested at the University of Maryland School of Medicine in accordance with School of Medicine and National Institutes of Health guidelines.
Surgical procedures and histology.
Surgical procedures followed guidelines for aseptic technique. Electrodes were manufactured and implanted as in previous recording experiments. Rats had a drivable bundle of ten 25-μm-diameter iron–nickel–chrome wires (Stablohm 675; California Fine Wire) chronically implanted in the left hemisphere dorsal to VS (n = 6; 1.6 mm anterior to bregma, 1.5 mm laterally, and 4.5 mm ventral to the brain surface). Immediately before implantation, these wires were freshly cut with surgical scissors to extend ∼1 mm beyond the cannula and electroplated with platinum (H2PtCl6; Aldrich) to an impedance of ∼300 kΩ. Cephalexin (15 mg/kg, p.o.) was administered twice daily for 2 weeks postoperatively to prevent infection. Rats were ∼3 months old at the time of surgery and were individually housed on a 12 h light/dark cycle; experiments were conducted during the light phase.
Recording was conducted in aluminum chambers ∼18 inches on each side with sloping walls narrowing to an area of 12 × 12 inches at the bottom. A central odor port was located above two adjacent fluid wells on a panel in the right wall of each chamber. Two lights were located above the panel. The odor port was connected to an air flow dilution olfactometer to allow the rapid delivery of olfactory cues. Task control was implemented via computer. Port entry and licking were monitored by disruption of photobeams.
The basic design of a trial is illustrated in Figure 1. Trials were signaled by illumination of the panel lights inside the box. When these lights were on, nose poke into the odor port resulted in delivery of the odor cue to a small hemicylinder located behind this opening. One of three different odors was delivered to the port on each trial, in a pseudorandom order. At odor offset, the rat had 3 s to make a response at one of the two fluid wells located below the port. One odor (Verbena Oliffac) instructed the rat to go to the left to get reward, a second odor (Camekol DH) instructed the rat to go to the right to get reward, and a third odor (Cedryl Acet Trubek) indicated that the rat could obtain reward at either well. Odors were presented in a pseudorandom sequence such that the free-choice odor was presented on 7 of 20 trials and the left/right odors were presented in equal numbers (±1 over 250 trials). In addition, the same odor could be presented on no more than three consecutive trials. Odor identity did not change over the course of the experiment.
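The sequencing constraints above can be illustrated with a short Python sketch. It is not the task-control code used in the experiment: the function names, the odor labels, the use of 20-trial blocks, and the alternating 7/6 split of forced-choice trials (one way to keep left and right counts within ±1) are all assumptions made for illustration.

```python
import random


def max_run(seq):
    """Return the length of the longest run of identical consecutive items."""
    longest = run = 0
    prev = object()  # sentinel that never equals a real odor label
    for item in seq:
        run = run + 1 if item == prev else 1
        longest = max(longest, run)
        prev = item
    return longest


def odor_sequence(n_blocks, rng, max_repeat=3):
    """Build n_blocks * 20 trials with 7 free-choice trials per 20 trials.

    Hypothetical scheme: each 20-trial block is shuffled until no odor
    repeats on more than max_repeat consecutive trials.
    """
    seq = []
    for b in range(n_blocks):
        # 13 forced-choice trials per block, split 7/6; alternate which
        # side gets the extra trial so the totals stay within +/-1.
        n_left = 7 if b % 2 == 0 else 6
        block = ["free"] * 7 + ["left"] * n_left + ["right"] * (13 - n_left)
        while True:
            rng.shuffle(block)
            # Include the tail of the previous block so that runs are
            # also limited across block boundaries.
            if max_run(seq[-max_repeat:] + block) <= max_repeat:
                break
        seq.extend(block)
    return seq
```

For example, `odor_sequence(4, random.Random(0))` yields 80 trial labels in which `"free"` appears 28 times and no label occurs more than three times in a row.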
Once the rats were shaped to perform this basic task, we introduced blocks in which we independently manipulated the size of the reward delivered at a given side and the length of the delay preceding reward delivery. Once the rats were able to maintain accurate responding through these manipulations, we began recording sessions. For recording, one well was randomly designated as short (500 ms) and the other long (1–7 s) at the start of the session (Fig. 1A, Block 1). Rats were required to wait in the well to receive reward. In the second block of trials, these contingencies were switched (Fig. 1A, Block 2). The length of the delay under long conditions was set by the following algorithm. The delay on the side designated as long started at 1 s and increased by 1 s every time that side was chosen until it reached 3 s. If the rat continued to choose that side, the length of the delay increased by 1 s up to a maximum of 7 s. If the rat chose the side designated as long on fewer than 8 of the last 10 choice trials, then the delay was reduced by 1 s to a minimum of 3 s. The reward delay for long forced-choice trials was yoked to the delay in free-choice trials during these blocks. In later blocks, we held the delay preceding reward delivery constant (500 ms) while manipulating the size of the expected reward (Fig. 1A). The reward was a 0.05 ml bolus of 10% sucrose solution. For big reward, an additional bolus was delivered after 500 ms. At least 60 trials per block were collected for each neuron. Rats were mildly water deprived (∼30 min/d water ad libitum) with ad libitum access on weekends.
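The delay-titration rule for the long well can be sketched in Python as follows. This is one reading of the rule, not the original task-control software: the class name and parameters are hypothetical, and the sketch assumes that the delay is updated after every free-choice trial and that the 8-of-10 criterion is applied only once 10 choice trials have accumulated.

```python
from collections import deque


class LongDelayTitrator:
    """Track the delay (in seconds) at the well designated as long."""

    def __init__(self, start=1, floor=3, ceiling=7, step=1,
                 window=10, criterion=8):
        self.delay = start
        self.floor = floor          # minimum delay once titration begins
        self.ceiling = ceiling      # maximum delay
        self.step = step
        self.window = window        # number of recent choice trials considered
        self.criterion = criterion  # long choices needed to avoid a reduction
        self.history = deque(maxlen=window)

    def update(self, chose_long):
        """Register one free-choice trial; return the delay for the next trial."""
        self.history.append(bool(chose_long))
        if self.delay < self.floor:
            # Ramp-up: each long choice adds one step until the floor is reached.
            if chose_long:
                self.delay = min(self.delay + self.step, self.floor)
        elif (len(self.history) == self.window
              and sum(self.history) < self.criterion):
            # Long side chosen on fewer than 8 of the last 10 choice trials:
            # shorten the delay, but never below the floor.
            self.delay = max(self.delay - self.step, self.floor)
        elif chose_long:
            # Continued long choices lengthen the delay up to the ceiling.
            self.delay = min(self.delay + self.step, self.ceiling)
        return self.delay
```

Under these assumptions, ten consecutive long choices drive the delay from 1 s to the 7 s ceiling, and a subsequent run of short choices brings it back down to the 3 s floor once fewer than 8 of the last 10 choices were to the long side.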
Procedures were the same as described previously (Roesch et al., 2006, 2007a). Wires were screened for activity daily; if no activity was detected, the rat was removed, and the electrode assembly was advanced 40 or 80 μm. Otherwise, active wires were selected to be recorded, a session was conducted, and the electrode was advanced at the end of the session. Neural activity was recorded using two identical Plexon Multichannel Acquisition Processor systems, interfaced with odor discrimination training chambers. Signals from the electrode wires were amplified 20× by an operational amplifier head stage (HST/8o50-G20-GR; Plexon), located on the electrode array. Immediately outside the training chamber, the signals were passed through a differential preamplifier (PBX2/16sp-r-G50/16fp-G50; Plexon), in which the single-unit signals were amplified 50× and filtered at 150–9000 Hz. The single-unit signals were then sent to the Multichannel Acquisition Processor box, in which they were further filtered at 250–8000 Hz, digitized at 40 kHz, and amplified at 1–32×. Waveforms (>2.5:1 signal-to-noise) were extracted from active channels and recorded to disk by an associated workstation with event timestamps from the behavior computer. Waveforms were not inverted before data analysis.
Units were sorted using Offline Sorter software from Plexon, using a template-matching algorithm. Sorted files were then processed in Neuroexplorer to extract unit timestamps and relevant event markers. These data were subsequently analyzed in Matlab (MathWorks). To examine activity related to the decision, we examined activity from odor onset to odor port exit. Wilcoxon's tests were used to measure significant shifts from zero in distribution plots (p < 0.05). t tests or ANOVAs were used to measure within-cell differences in firing rate (p < 0.05). Pearson's χ2 tests (p < 0.05) were used to compare the proportions of neurons.
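To make the test on distribution shifts concrete, the following Python sketch computes a normal-approximation z statistic for a Wilcoxon signed-rank test against zero. It is an illustration rather than the analysis code used here (the analyses were run in Matlab): the function name is hypothetical, and the sketch omits tie and continuity corrections, so its values may differ slightly from those of a statistics package.

```python
import math


def signed_rank_z(values):
    """Normal-approximation z for a Wilcoxon signed-rank test against zero."""
    nonzero = [v for v in values if v != 0]  # zero differences are dropped
    n = len(nonzero)
    if n == 0:
        raise ValueError("need at least one nonzero value")

    # Rank the absolute values, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum of ranks belonging to positive values, compared with its
    # expectation under the null hypothesis of symmetry around zero.
    w_plus = sum(r for r, v in zip(ranks, nonzero) if v > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean) / sd
```

A positive z indicates that the distribution of indices is shifted above zero; a distribution that is symmetric around zero gives z = 0.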
Rats were trained on a choice task illustrated in Figure 1A (Roesch et al., 2006, 2007a). On each trial, rats responded to one of two adjacent wells after sampling an odor at a central port. Rats were trained to respond to three different odor cues: one odor that signaled reward in the right well (forced-choice), a second odor that signaled reward in the left well (forced-choice), and a third odor that signaled reward at either well (free-choice). Across blocks of trials, we manipulated value by increasing the length of the delay preceding reward delivery (Fig. 1A, Blocks 1, 2) or by increasing (Fig. 1A, Blocks 3, 4) the number of rewards delivered. Essentially, there were four types of rewards (short-delay, long-delay, big-reward, and small-reward) and two response directions (left and right), resulting in a total of eight conditions.
Rats' behavior on both free- and forced-choice trials reflected manipulations of value. On free-choice trials, rats chose shorter delays and larger rewards over their respective counterparts (t test; df = 119; t values >16; p values <0.0001). Likewise, on forced-choice trials, rats were faster and more accurate when responding for a more immediate or larger reward (t test; df = 119; t values >9; p values <0.0001). Thus, rats perceived the differently delayed and sized rewards as having different values and were more motivated under short-delay and big-reward conditions than under long-delay and small-reward conditions, respectively.
We recorded 257 VS neurons across 75 sessions in six rats during performance of all four trial blocks. Recording locations are illustrated in Figure 2F. Because forced-choice trials present an evenly balanced neural dataset with equal numbers of responses to each well, we will first address our hypothesis by analyzing data from these trials. Thus, we will ask whether neural activity in VS neurons reflects value and direction of responding across blocks, particularly after learning (last 10 trials in each direction).
Activity in VS reflected the value and direction of the upcoming response
As has been reported previously (Carelli and Deadwyler, 1994; Nicola et al., 2004; Taha and Fields, 2006; Robinson and Carelli, 2008), many VS neurons were excited (n = 44; 17%) or inhibited (n = 76; 30%) during cue sampling (odor onset to port exit) versus baseline (1 s before nose poke; t test comparing baseline with cue sampling over all trials collapsed across condition; p < 0.05). An example of the former is illustrated in Figure 2A–D. Consistent with the hypothesis put forth in Introduction, activity of this neuron reflected the integration of associative information about the value of the reward predicted by the cue and the subsequent response. Thus, cue-evoked activity on forced-choice trials after learning was strongest for the cue that indicated reward in the left well, and this neural response was highest when value predicted for that well was high (on short and big trials). To quantify this effect, we performed a two-factor ANOVA with value and direction as factors during the last 10 forced-choice trials in each block (p < 0.05). Of the 44 cue-responsive neurons, 21 (47%) showed a similar significant interaction between direction and value. This count was significantly above chance given our threshold for statistical significance in our unit analysis (χ2 test; p < 0.0001), and there was no directional bias to the left or right across the population (Fig. 2E) (p = 0.98). In contrast, and in keeping with the most rigorous account of the hypothesis that VS integrates value and direction information, only five (11%) showed a main effect of direction alone, and only three (7%) showed a main effect of value alone (Fig. 2E) (ANOVA; p < 0.05); these counts did not exceed chance (χ2 test; p values >0.05).
The overall effect is illustrated in Figure 3, which plots the average activity across all cue-responsive neurons on forced-choice trials during the last 10 trials for all eight conditions. For each cell, direction was referenced to its preferred response before averaging; thus, by definition, activity was higher in the preferred direction (left column). Like the single-cell example, population activity during cue sampling was stronger in the preferred direction when value was high. That is, activity was stronger before a response in the preferred direction of the cell (left column) when the expected outcome was either a short delay (blue) or a large reward (green) compared with a long delay (red) or a small reward (orange), respectively. Notably, although activity in these populations did begin to increase during entry into the odor port, the difference in firing was only present during actual delivery of the odor (Fig. 3, gray shading).
Distributions of delay and size indices for each neuron, defined by the difference between high and low value divided by the sum of the two, are illustrated for each direction (preferred and nonpreferred) during the odor epoch (odor onset to port exit) in Figure 3, E and F. Only when value was manipulated in the preferred direction of the cell was the index significantly shifted above zero, indicating higher firing rates for more valued outcomes (Wilcoxon's test; μ = 0.134; z = 3.56; p < 0.001). Neurons exhibiting stronger firing for high-value reward [n = 16 (18%)] outnumbered those showing the opposite effect [n = 4 (5%); χ2 test; p < 0.008]. Neither the shift in the distribution nor the difference in number of cases in which activity was stronger for high or low value achieved significance in the nonpreferred direction (Fig. 3F) (p values >0.4).
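The index defined above can be written as a small Python function. The function name and the handling of the case in which both firing rates are zero are assumptions for illustration; the text does not state how such cases were treated.

```python
def value_index(high, low):
    """Return (high - low) / (high + low) for two mean firing rates.

    Positive values indicate stronger firing for the high-value outcome
    (short delay or big reward). Returns None when both rates are zero,
    because the ratio is undefined in that case.
    """
    total = high + low
    if total == 0:
        return None
    return (high - low) / total
```

For nonnegative firing rates the index is bounded between −1 and 1, which makes neurons with very different overall firing rates comparable in a single distribution.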
Activity in VS was correlated with motivational level
VS is thought to motivate or invigorate behavior (Robbins and Everitt, 1996; Cardinal et al., 2002a). If the neural signal integrating value and directional response, identified above, relates to that function, then one should expect this activity to be correlated with the motivational differences between high- and low-value reward in our task. To address this question, we next examined the relationship between neural activity and reaction time (speed at which the rat made the decision to move and exited the odor port). In previous sections, we showed that the reaction time was faster and activity was stronger (Fig. 3) when more valued reward (short delay and big reward) was at stake. To ask whether the two were correlated, we plotted neural activity [(high − low)/(high + low)] versus reaction time [(high − low)/(high + low)] independently for preferred and nonpreferred directions. We found that there was a significant negative correlation between the two in the preferred direction of the neuron (Fig. 3G) (p < 0.001; r2 = 0.150). This relationship was not evident in the nonpreferred direction (Fig. 3H) (p = 0.361; r2 = 0.010).
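The correlation analysis can be sketched in Python as follows. The function names are hypothetical and the numbers are invented placeholders, not data from this study; the sketch only shows how the two normalized contrasts are formed and how r and r2 follow from them.

```python
import math


def contrast(high, low):
    """Normalized contrast (high - low) / (high + low)."""
    return (high - low) / (high + low)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented example values: (high-value, low-value) pairs per neuron.
firing_hz = [(12.0, 8.0), (9.0, 9.0), (15.0, 7.0), (6.0, 5.0)]
reaction_ms = [(150, 290), (220, 230), (140, 300), (200, 240)]

neural_idx = [contrast(h, l) for h, l in firing_hz]
rt_idx = [contrast(h, l) for h, l in reaction_ms]

r = pearson_r(neural_idx, rt_idx)
r_squared = r ** 2
```

Because faster responding on high-value trials makes the reaction time contrast negative, a neuron population that fires more when the rat responds faster produces a negative r, as in these placeholder values.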
To examine this phenomenon more closely, we divided sessions into those with a strong versus a weak motivational difference between high- and low-value reward. According to the correlation described above, we would expect activity to be stronger for higher-value reward in sessions in which rats showed a strong difference between high- and low-value outcomes. To test this, we sorted sessions based on each rat's reaction time difference between high- and low-value trial types (small − big; long − short). In the top half of the distribution, the average reaction time on high- and low-value trials was 156 and 285 ms, respectively (t test; df = 43; t = 17; p < 0.0001), whereas in the lower half, reaction times on high- and low-value trials were 207 and 234 ms, respectively (t test; df = 43; t = 5; p < 0.01). Although both halves exhibited significant differences between high- and low-value outcomes, the differences were significantly larger in the top half (t test; df = 43; t = 17; p < 0.0001).
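The session split can be sketched in Python as follows. The function name, the dictionary keys, and the rule for an odd number of sessions (the extra session goes to the weak half) are assumptions for illustration and are not specified in the text.

```python
def split_by_rt_difference(sessions):
    """Split sessions into strong and weak halves by reaction time difference.

    Each session is a dict with mean reaction times in ms under
    'rt_high' (high-value trials) and 'rt_low' (low-value trials).
    The difference is low-value minus high-value, so larger values
    mean the rat was more strongly invigorated by high-value reward.
    """
    ranked = sorted(sessions,
                    key=lambda s: s["rt_low"] - s["rt_high"],
                    reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]
```

For example, sessions with differences of 140, 30, 120, and 20 ms would be split into a strong half (140 and 120 ms) and a weak half (30 and 20 ms).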
Remarkably, the neural signal identified above was only evident in sessions in which the rats were more strongly invigorated by high-value reward (Fig. 4A–D). This is illustrated in both delay and size blocks by higher firing rate during odor sampling for short-delay (blue) and big-reward (green) conditions over long-delay (red) and small-reward (orange) conditions, respectively. Value index distributions were significantly shifted above zero in the preferred direction in these sessions (Fig. 4E) (Wilcoxon's test; μ = 0.188; z = 3; p < 0.002). In sessions in which rats were less concerned about the outcome (Fig. 4G–L), there was only a modest nonsignificant difference in activity in the preferred direction (Wilcoxon's test; μ = 0.080; z = 1; p = 0.138).
Notably, the differences between sessions with strong and weak reaction time differences did not seem to reflect satiation, which has been shown to lead to slower overall reaction times (Holland and Straub, 1979; Sage and Knowlton, 2000). Overall speed of responding was not significantly different between sessions with strong and weak reaction time differences (220 vs 221 ms; t test; df = 43; t = 0.02; p = 0.986), and value correlates were no more likely to be observed early in a session versus late. The number of cells exhibiting value selectivity during the first two blocks of a session did not significantly differ from those observed during the last two blocks of a session (12 neurons or 27% vs 11 neurons or 25%; χ2; p = 0.85).
Rats also appeared to learn the contingencies similarly in the two session types; rats chose the more valuable reward on 69% of trials (strong, 69.1%; weak, 69.4%; t test; df = 43; t = 0.2; p = 0.852). This indicates that latency differences did not reflect a learning effect. Together, these data suggest that differences in reaction time did not result from satiation or insufficient learning. Instead, when rats were goal oriented and strongly motivated by differences in expected value, activity in VS clearly reflected the animals' motivational output.
Activity in VS reflected the value of the decision
Up to this point, we have only analyzed forced-choice trials, in which odors instruct rats to respond to the left or the right well. We have assumed that this directional selectivity reflects the impending movement; however, directional selectivity might also represent the identity of the odor, regardless of whether or not that response is executed. On forced-choice trials, the odor and movement direction are confounded, because one odor means go right and the other means go left.
To address this issue, we compared activity on forced-choice trials with that on free-choice trials. This comparison can resolve this issue because, on free-choice trials, a different odor (distinct from the two forced-choice odors) indicated the freedom to choose either direction (i.e., reward was available on each side). Moreover, rats chose the lower-value direction on a significant number of free-choice trials. Thus, by comparing firing on free- and forced-choice trials, we can disambiguate odor from movement selectivity. If the directional signal identified on forced-choice trials reflects only the impending movement, then it should be identical on free- and forced-choice trials, provided the rat makes the same response. Conversely, if the signal differs on free- and forced-choice trials when the rat makes the same response, then this would suggest that the proposed directional selectivity incorporates information about the sensory features of the odor.
For this analysis, we included all trials after learning (>50% choice performance) and collapsed across delay and size blocks. This procedure allowed us to increase our sample of low-value free-choice trials, which were sparse at the end of trial blocks. To further control for any differences that might arise during learning (rats typically chose low-value outcomes earlier on free-choice trials but were forced to choose low-value outcomes throughout the entire block on forced-choice trials), we paired each free-choice trial with the immediately preceding and following forced-choice trial of the same value.
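The pairing procedure can be sketched in Python as follows. The function names and the trial representation are hypothetical, and the sketch matches on value only; whether additional attributes such as response direction were also matched is not stated here and is left out as an assumption.

```python
def nearest_forced(trials, start, step, value):
    """Index of the closest forced-choice trial with the given value.

    Searches from `start` in the direction given by `step` (-1 for
    preceding trials, +1 for following trials). Returns None if no
    matching trial exists in that direction.
    """
    i = start + step
    while 0 <= i < len(trials):
        trial = trials[i]
        if trial["kind"] == "forced" and trial["value"] == value:
            return i
        i += step
    return None


def pair_free_with_forced(trials):
    """Pair each free-choice trial with its neighboring forced-choice trials.

    Each trial is a dict with 'kind' ('free' or 'forced') and 'value'
    ('high' or 'low'). Returns (free_index, preceding_index,
    following_index) tuples; an index is None when no match exists.
    """
    pairs = []
    for i, trial in enumerate(trials):
        if trial["kind"] != "free":
            continue
        before = nearest_forced(trials, i, -1, trial["value"])
        after = nearest_forced(trials, i, +1, trial["value"])
        pairs.append((i, before, after))
    return pairs
```

Pairing in this way keeps the forced-choice comparison trials close in time to each free-choice trial, so both trial types are sampled from the same phase of learning within a block.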
The results of this analysis are illustrated in Figure 5. Figure 5, A and B, represents the average activity over all neurons that showed a significant interaction between direction and value when rats responded in the preferred (solid) and nonpreferred (dashed) direction of the cell for high-value (black) and low-value (gray) outcomes during forced- and free-choice trials, respectively. As described previously, cue-evoked activity on forced-choice trials was stronger for high-value outcomes but only in one direction (Fig. 5A). Activity during free-choice trials showed exactly the same pattern. Thus, firing was higher on free-choice trials but only when the rat chose the high-value outcome and only when that outcome was in a particular direction (Fig. 5B). This is quantified in Figure 5C, which plots the difference between the preferred outcome/response (e.g., high-value-left) and nonpreferred outcome/response (e.g., low-value-right) of the cell on forced-choice trials (x-axis) versus the same calculation from data on free-choice trials (y-axis). By definition, values are all shifted above zero on the x-axis, because firing in these neurons was always higher for the preferred outcome/response on forced-choice trials. Importantly, values were also shifted above zero on free-choice trials (y-axis; Wilcoxon's; μ = 0.2879; z = 4; p < 0.001). This indicates that neural activity was the same for a particular value and response, although the two trial types (free and forced) involved different odors (Fig. 5C). This pattern suggests that neural signals in VS neurons reflect the value of a particular yet-to-be-executed motor response and are not cue specific. This pattern also indicates that signaling in VS reflects the value of the response that is going to be executed, because firing differed on free-choice trials when different responses were made, although the high-value reward was always available to be selected.
Activity after the decision was stronger in anticipation of the delayed reward
Lesions or other manipulations of VS make animals more likely to abandon a larger, delayed or higher-cost reward in favor of a smaller, more immediate or lower-cost reward (Cousins et al., 1996; Cardinal et al., 2001, 2004; Winstanley et al., 2004; Bezzina et al., 2007; Floresco et al., 2008; Kalenscher and Pennartz, 2008). These studies suggest that VS may be important for maintaining information about reward after the decision has been made. Consistent with this, we found that activity in the cue-responsive VS neurons described above was also elevated during the delay in our task, especially on correct trials. This is apparent in Figures 3 and 5, which show that activity was higher after the response in the preferred direction of the cell on long-delay (red) compared with short-delay (blue) trials. To quantify this effect, Figure 6, A and B, plots the distribution of delay indices [(short − long)/(short + long)] during the 3 s (minimum delay after learning) after the behavioral response in the preferred and nonpreferred direction of the cell. Delay indices were shifted significantly below zero, indicating higher firing after responding to the delayed well (Wilcoxon's test; μ = −0.158; z = 2.2; p < 0.024), and the counts of neurons exhibiting this pattern [n = 24 (55%)] significantly outnumbered those showing the opposite effect [n = 6 (14%)]. Notably, the increased firing after responding to the delayed well always preceded reward, because it occurred before the minimum delay after learning (3 s).
Interestingly, the difference in firing between short- and long-delay trials after the behavioral response was also correlated with reaction time (Fig. 6C) (p < 0.005; r2 = 0.190). However, the direction of this correlation was the opposite of that between reaction times and cue-evoked activity described previously. Thus, slower responding on long-delay trials resulted in stronger firing rates after well entry and before reward delivery. If activity in VS during decision making reflects motivation, as we have suggested, then activity during this period may reflect the exertion of increased will to remain in the well to receive reward or expectation of reward, rather than signaling of other variables such as disappointment. Perhaps loss of this signal after lesions or inactivation of VS reduces the rat's capacity to maintain motivation toward the delayed reward. This suggests that it is necessary for VS to fire more in the delay to keep the rat in the well waiting for reward. Unfortunately, there were too few trials in which the rat left the fluid port prematurely to test this hypothesis.
Inhibitory responses in VS were not correlated with motivation
Finally, we asked whether the 76 neurons (30% of total neurons recorded) that were inhibited during odor sampling reflected motivational value. Inhibitions in VS activity during performance of behavioral tasks have been described previously (Carelli and Deadwyler, 1994; Nicola et al., 2004; Taha and Fields, 2006; Robinson and Carelli, 2008) and might reflect the inhibition of inappropriate behaviors during task performance (i.e., leaving odor port or fluid well early), which might be more critical when a better reward is at stake. Here we address whether or not these neurons were modulated by expected reward value.
The average firing rates over these neurons are illustrated in Figure 7, A and B. As defined in the analysis, activity was inhibited during odor sampling. As the rat moved down to the well, activity briefly returned to baseline but then quickly returned to an inhibited state after entering the well and then subsequently returned to baseline during well exit. As for excitatory neurons, we asked whether the motivational level of the animal modulated neural firing in these neurons. Distributions of value indices were not significantly shifted from zero (Fig. 7E,F) (Wilcoxon's test; z values <2; p values >0.082), and approximately equal numbers of neurons fired more strongly and weakly for high-value reward (Fig. 7E,F, black bars). Furthermore, activity in these neurons was not correlated with reaction time. Thus, inhibitions observed during task performance were not modulated by value as observed for excitations.
VS activity during reward delivery was not modulated by unexpected reward
Previously, in rats performing this same task, we have shown that dopamine neurons fire more strongly at the beginning of trial blocks when an unexpected reward was delivered and less strongly in trial blocks when an expected reward was omitted (Roesch et al., 2007a). Such activity is thought to represent bidirectional prediction error encoding.
Of the sample of 257 VS neurons, activity in 41 neurons was responsive to reward delivery [t test comparing baseline with reward delivery (1 s) over all trials collapsed across condition; p < 0.05]. Of those, 12 were also cue responsive as defined above. Analysis of prediction errors revealed that few VS neurons seem to signal errors in reward prediction. For example, the single cell illustrated in Figure 8A fired more strongly when reward was delivered unexpectedly; firing was maximal immediately after a new reward was instituted and diminished with learning. However, this example was the exception rather than the rule. This is illustrated across the population in Figure 8, B and C, which shows the contrast in activity (early vs late) for all of the reward-responsive VS neurons (n = 41). This contrast is plotted separately for blocks involving unexpected delivery and omission of reward. Neither distribution was shifted significantly above zero, indicating no difference in firing early, after a change in reward, compared with later, after learning (Fig. 8B,C) (Wilcoxon's test; z values <2; p values >0.2610).
Here we show that single neurons in VS integrate information regarding value and impending response during decision making and influence the motivational level associated with responding in a given direction. Cues predicting high-value outcomes had a profound impact on behavior, decreasing reaction time and increasing accuracy. This behavioral effect was correlated with integration of value and impending response during cue sampling in VS neurons. This result is broadly consistent with proposals that VS acts as a limbic–motor interface (Mogenson et al., 1980) and with a number of recent reports showing that VS signals information about impending outcomes at the time a decision is made (Carelli, 2002; Setlow et al., 2003; Janak et al., 2004; Nicola, 2007; Ito and Doya, 2009; van der Meer and Redish, 2009).
Although these results are correlational in nature, they are in agreement with results from several studies in which pharmacological methods were used to show a more causal relationship between VS function and behavior (Berridge and Robinson, 1998; Cardinal et al., 2002a; Nicola, 2007). One set of studies in particular examined the impact of several different VS manipulations on rats' latencies to respond for different quantities of reward (Hauber et al., 2000; Giertler et al., 2003). In this simple reaction time task, discriminative stimuli presented early in each trial predicted the magnitude of the upcoming reward. As in our task, rats were faster to respond when reward was larger. Manipulations of glutamate and dopamine transmission in VS disrupted changes in the speed of responding to stimuli predictive of the upcoming reward magnitude. This is consistent with correlations between reaction time and firing in VS reported above.
Interestingly, the same group reported that lesions or inactivation of VS had no impact on latency measures, suggesting that complete disruption of VS allows for other areas to motivate behavioral output (Brown and Bowman, 1995; Giertler et al., 2004). This may explain why in some sessions in the current study VS activity was not selective for the upcoming reward, yet there remained a weak difference in response latencies. Notably, rats continued to choose the more preferred outcome during free-choice trials, consistent with reports that VS is not required for choosing a large over a small reward (Cousins et al., 1996).
Our results also suggest that VS may play multiple, potentially conflicting roles in delay discounting tasks. On the one hand, activity during the decision is higher preceding an immediate reward and seems to invigorate behavior toward the more valued reward. On the other hand, once a decision to respond for the delayed reward had been made, activity in VS neurons increased, as if maintaining a representation of the anticipated reward. Most of the delay discounting literature suggests that the latter function is the one of importance; lesions or other manipulations of VS make animals more likely to abandon a larger, delayed reward in favor of a smaller, more immediate reward (Cousins et al., 1996; Cardinal et al., 2001, 2004; Winstanley et al., 2004; Bezzina et al., 2007; Floresco et al., 2008; Kalenscher and Pennartz, 2008). However, we would speculate that different training procedures might change the relative contributions of these two functions. For example, if animals were highly trained to reverse behaviors based on discounted reward, as in the recording setting used here, they might be less reliant on VS to maintain the value of the discounted reward. In this situation, the primary effect of VS manipulations might be to reduce the elevated motivation elicited by cues predicting more immediate reward.
Another notable aspect of these data is that VS neurons integrated information regarding value (size and delay) and response, both during forced- and free-choice behavior. Anticipation of differently valued rewards has been shown previously to affect firing in other regions of striatum. For example, many neurons in oculomotor regions of caudate (dorsal medial striatum) encode both direction and motivational value and are thought to be critical in the development of response biases toward desired goals (Lauwereyns et al., 2002). These data differ from our results in several ways. First, neurons in caudate typically exhibit a contralateral bias, firing more strongly for saccades made in the direction opposite to the recording hemisphere. In VS, approximately equal numbers of neurons preferred leftward and rightward movement. These results are consistent with deficits observed after pharmacological manipulations of these areas (Carli et al., 1989). Second, activity in many neurons in caudate has been reported to reflect available movement–reward associations even when the relevant response is not subsequently executed (Lauwereyns et al., 2002; Samejima et al., 2005; Lau and Glimcher, 2008). Such “action-value” or “response-bias” correlates were not present in VS. In this, our results are consistent with recent findings by Ito and Doya (2009), which showed that representations of action value are less dominant in rat VS compared with other types of information. Thus, whereas activity in dorsal striatum (DS) may be critical in representing the value of available actions (behaviorally independent action value), activity in VS seems to be more closely tuned to representing the value of the upcoming response (behaviorally dependent action value). Such activity may reflect an “action-specific reward value” (Samejima et al., 2005), because it is specific to the value of only one of the two actions.
Practically speaking, such a representation could invigorate or motivate a specific behavior (left or right) through downstream motor areas via some sort of winner-take-all mechanism (Pennartz et al., 1994; Redgrave et al., 1999; Nicola, 2007; Taha et al., 2007).
Another possibility is that the correlates observed in VS incorporate information about the expected outcome itself. Such representations would allow behavior to change spontaneously in response to changes in the value of the outcome. Such information might be acquired through inputs from orbitofrontal cortex or basolateral amygdala, both of which send information to VS and are implicated in signaling of information about expected outcomes (Hatfield et al., 1996; Schoenbaum et al., 1998; Gallagher et al., 1999; Gottfried et al., 2003; Ambroggi et al., 2008). Interestingly, data regarding the role of VS in these behavioral settings are sparse and often contradictory. This is also somewhat true of our own results; because we recorded during presentation of the differently valued outcomes (i.e., during learning), we cannot distinguish signaling of such outcome representations from cached estimates of response value.
Critically, such firing cannot represent “cue value” because the signal integrating value and impending response in VS neurons is not present when the rats choose to respond in the opposite direction. Moreover, we have shown previously that responses to the low-value well on these trials are not mistakes; the rats' response latencies on these trials indicate that they know they are responding for the less valuable outcome (Roesch et al., 2007a). As illustrated in Figure 5B, the elevated cue-evoked activity on trials in which the rats responded in the preferred direction of the neuron (bold lines) was not evident when the rat chose to go in the opposite direction (dashed lines). This was true despite the fact that, on these trials, the rats sampled the same odor and had available the same outcome in the preferred direction. Notably, this result differs from what we have reported previously for cue-evoked activity in dopamine neurons in this same task; these neurons signaled the value of the best available option on free-choice trials, even when it was not selected (Roesch et al., 2007a). Thus, firing in ventral tegmental area dopamine neurons reflects the value of the better option during decision making, whereas activity in VS neurons tracks the value of the action that is ultimately chosen.
This notion is consistent with the idea that VS plays a critical role in actor–critic models, optimizing long-term action selection through its connections with midbrain dopamine neurons. In this model, the “critic” stores and learns values of states, which in turn are used to compute prediction errors necessary for learning and adaptive behavior. The “actor” stores and forms a policy on which actions should be selected (Joel et al., 2002; Montague et al., 2004). Recently, the functions of critic and actor have been attributed to ventral and dorsal lateral striatum, respectively, based on connectivity, pharmacology, and functional magnetic resonance imaging (fMRI) (Everitt et al., 1991; Cardinal et al., 2002a; O'Doherty et al., 2004; Voorn et al., 2004; Balleine, 2005; Pessiglione et al., 2006).
Our single-unit results fit well with this hypothesis. Neurons in VS signal the value of the upcoming decision, which may in turn impact downstream dopamine neurons that subsequently modify both the actor (DS) and the critic (VS). In this regard, it is noteworthy that analysis of neural activity in VS during learning in this task revealed no evidence that VS neurons encode the actual reward prediction errors, which are proposed to stamp in associative information. This is consistent with recent suggestions that the strong error signal in VS often reported in human fMRI studies reflects input from other areas and is not an output signal from this region (Knutson and Gibbs, 2007).
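The actor–critic scheme described above can be made concrete with a minimal tabular sketch. This is a generic textbook-style illustration, not a model fitted to the present data: it assumes a single cue state, two responses (left and right) with fixed reward values, a softmax policy, and a trial that ends after the response so that the value of the next state is zero. All names and parameter values (`run_actor_critic`, the learning rates, the reward magnitudes) are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into choice probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(rewards, n_trials=2000, alpha_critic=0.1,
                     alpha_actor=0.1, seed=0):
    """Simulate one cue state with two responses (0 = left, 1 = right).

    rewards: reward value delivered for each response (assumed fixed).
    Returns the critic's learned state value and the actor's final policy.
    """
    rng = random.Random(seed)
    value = 0.0          # critic: learned value of the cue state
    prefs = [0.0, 0.0]   # actor: preferences for left and right

    for _ in range(n_trials):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        reward = rewards[action]

        # Prediction error; the trial ends after the response, so the
        # value of the next state is taken to be zero.
        delta = reward - value

        # Critic update: move the state value toward the obtained reward.
        value += alpha_critic * delta

        # Actor update: the same error signal adjusts the policy,
        # strengthening the chosen response when delta is positive.
        for a in range(2):
            chosen = 1.0 if a == action else 0.0
            prefs[a] += alpha_actor * delta * (chosen - probs[a])

    return value, softmax(prefs)


# Hypothetical values: a small reward on the left, a large one on the right.
learned_value, policy = run_actor_critic(rewards=(0.2, 1.0))
```

In this sketch a single prediction error trains both components, which corresponds to the proposal that dopamine neurons modify both the actor and the critic. The prediction error is computed from the critic's value but is not itself stored in the critic.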
Finally, we also found that many neurons were inhibited during task performance. Previous studies have also reported long-lasting inhibitions during task performance and argued that these correlates may reflect inhibition of competing behaviors (e.g., locomotion, grooming, and running away) (Nicola et al., 2004; Taha and Fields, 2005, 2006). Unlike the instructive signals described above, it is thought that these inhibitory signals should be modulated by appetitive behavior but be independent of the specific response being performed (Taha and Fields, 2006). In our task, it was necessary for rats to remain stationary in the odor port and then in the well to receive reward. During these two periods, activity of many VS neurons was inhibited, perhaps reflecting the need to suppress competing behaviors. However, if this is the case, it seems odd that inhibitory activity was no more pronounced on big-reward and short-delay trials than on small-reward and long-delay trials. This suggests that activity in these cells is not influenced by the value of the reward at stake, despite the fact that the rats attend better and are more motivated on these trials. Of course, maintaining hold in the odor port and then in the fluid well was not difficult, as evidenced by the low number of early non-pokes. It is possible that increasing the requirement to remain still during these periods would provide more evidence for such a function.
This work was supported by National Institute on Drug Abuse Grants R01-DA015718 (G.S.) and K01DA021609 (M.R.R.) and National Institute on Aging Grant R01-AG027097 (G.S.).
Correspondence should be addressed to either Matthew R. Roesch, Department of Psychology, Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD 20742; or Geoffrey Schoenbaum, Department of Anatomy and Neurobiology, University of Maryland School of Medicine, 20 Penn Street, HSF-2 S251, Baltimore, MD 21201.