Abstract
The ventral striatum is critical for evaluating reward information and the initiation of goal-directed behaviors. The many cellular, afferent, and efferent similarities between the ventral striatum's nucleus accumbens and olfactory tubercle (OT) suggest the distributed involvement of neurons within the ventral striatopallidal complex in motivated behaviors. Although the nucleus accumbens has an established role in representing goal-directed actions and their outcomes, it is not known whether this function is localized within the nucleus accumbens or distributed also within the OT. Answering such a fundamental question will expand our understanding of the neural mechanisms underlying motivated behaviors. Here we address whether the OT encodes natural reinforcers and serves as a substrate for motivational information processing. In recordings from mice engaged in a novel water-motivated instrumental task, we report that OT neurons modulate their firing rate during initiation and progression of the instrumental licking behavior, with some activity being internally generated and preceding the first lick. We further found that as motivational drive decreases throughout a session, the activity of OT neurons is enhanced earlier relative to the behavioral action. Additionally, OT neurons discriminate the types and magnitudes of fluid reinforcers. Together, these data suggest that the processing of reward information and the orchestration of goal-directed behaviors are a global principle of the ventral striatum and have important implications for understanding the neural systems subserving addiction and mood disorders.
SIGNIFICANCE STATEMENT Goal-directed behaviors are widespread among animals and underlie complex behaviors ranging from food intake and social behavior to pathological conditions, such as gambling and drug addiction. The ventral striatum is a neural system critical for evaluating reward information and the initiation of goal-directed behaviors. Here we show that neurons in the olfactory tubercle subregion of the ventral striatum robustly encode the onset and progression of motivated behaviors, and discriminate the type and magnitude of a reward. Our findings are novel in showing that olfactory tubercle neurons participate in such coding schemes and are in accordance with the principle that ventral striatum substructures may cooperate to guide motivated behaviors.
- appetitive behavior
- basal ganglia
- consummatory behavior
- motivation
- reward
- ventral striatopallidal complex
Introduction
Goal-directed behaviors are widespread among animals and underlie complex behaviors ranging from food intake and social behavior to pathological conditions, such as gambling and drug intake. All goal-directed behaviors share the necessity to evaluate available reward and motivational information to select an appropriate action, and are defined by their sensitivity to changes in outcome value and the action-outcome contingency (Dickinson and Balleine, 1994; Redgrave et al., 2010). The ventral striatum, containing both the nucleus accumbens (NAc) and olfactory tubercle (OT), serves as a “limbic-motor interface” (Mogenson et al., 1980). The ventral striatum receives a complex array of sensory and contextual information from cortical, amygdalar, hippocampal, thalamic, and midbrain dopaminergic afferents. Further, the ventral striatum sends efferent projections to the ventral pallidum and substantia nigra to influence basal ganglia output structures (for review, see Ikemoto, 2007; Haber, 2011). These connections place the ventral striatum in a critical position for evaluating reward information, and in turn, to influence the motivational control and execution of appropriate behavioral actions (Mogenson et al., 1980; Cardinal et al., 2002; Kelley, 2004; Haber, 2011).
As a part of the ventral striatopallidal complex, the NAc and OT share many morphological and chemical characteristics, with medium spiny neurons being the principal projection neurons of both structures (Alheid and Heimer, 1988). The many cellular, afferent, and efferent similarities between the OT and NAc suggest the distributed involvement of neurons within the ventral striatum in motivated behaviors. Although the NAc has an established role in representing goal-directed actions and their outcomes (Apicella et al., 1991; Setlow et al., 2003; O'Doherty, 2004; Taha and Fields, 2005; Day and Carelli, 2007; Roesch et al., 2009; van der Meer and Redish, 2011; Floresco, 2015), it is not known whether the OT also partakes in this function. This remains a major void in our understanding of ventral striatum function and how motivational information is evaluated to drive goal-directed behaviors.
The OT influences motivated behaviors. Electrical stimulation of the OT is rewarding, with rats and mice readily self-administering stimulation (Prado-Alcalá and Wise, 1984; FitzGerald et al., 2014). Similarly, lesions of the OT decrease mating behavior in male rats (Hitt et al., 1973) and abolish the preference of female mice for male chemosignals (Agustín-Pavón et al., 2014; DiBenedictis et al., 2015). The OT is a target of dopaminergic neurons originating in the ventral tegmental area and may also modulate the salience of drugs of abuse (Ikemoto, 2003, 2007; Ikemoto et al., 2005; Striano et al., 2014). For example, infusions of cocaine into the OT induce conditioned place preference in rats (Ikemoto, 2003). Further, rats self-administer cocaine into the OT more so than the NAc or ventral pallidum (Ikemoto, 2003), and neurons within the OT exhibit changes in firing during the self-administration of cocaine (Striano et al., 2014). Our recent work further revealed that the OT robustly and flexibly encodes the associated meaning of conditioned cues (Gadziola et al., 2015). Together, these findings suggest a critical role for the OT in the encoding of reward-related cues to adaptively guide behavior.
Here, we test the hypothesis that OT neurons encode goal-directed actions and natural reinforcers by implementing a tandem fixed-interval modified fixed-ratio instrumental task in combination with extracellular multi-wire array recordings in mice. We find that the firing rate of OT neurons is modulated by the instrumental behavior (licking) and can encode the type and magnitude of rewards. Our results illustrate the profound capacity for the OT to represent primary reinforcers in manners likely essential for driving motivated behaviors.
Materials and Methods
Animals.
C57BL/6 male mice (n = 15, 2–3 months of age) originating from Harlan Laboratories were bred and maintained within the Case Western Reserve University School of Medicine animal facility. Two animals did not contribute data because they did not reach criterion behavioral performance levels. Three animals were used for behavioral measures only. Mice were housed on a 12 h light/dark cycle with food and water available ad libitum, except when water was restricted for behavioral training (see below). Postsurgical animals were housed individually. All experimental procedures were conducted in accordance with the guidelines of the National Institutes of Health and were approved by the Case Western Reserve University's Institutional Animal Care Committee.
Surgical procedures.
Surgical procedures were conducted as described previously (Gadziola et al., 2015). Briefly, mice were anesthetized with Isoflurane (2–4% in oxygen, Abbott Laboratories), and mounted in a stereotaxic frame with a water-filled heating pad (38°C) beneath to maintain body temperature. An injection of a local anesthetic (0.05% marcaine, 0.1 ml s.c.) was administered before exposing the dorsal skull. A craniotomy was made to access the OT (+1.8 mm bregma, +1.0 mm lateral; Fig. 1). An 8-channel micro-wire electrode array (102 μm diameter PFA-insulated tungsten wire, with 4 electrode wires encased together in a 254 μm diameter polyimide tube) was implanted within the OT (4.9 mm ventral) and cemented in place, along with a headbar for later head fixation. A second craniotomy was drilled over the contralateral cortex for placement of a ground wire (127 μm stainless steel wire). For one cohort of mice (n = 3), electrode arrays were implanted bilaterally within the OT to increase data yield. During a 3 d recovery period, animals received a daily injection of Carprofen (5 mg/kg, s.c., Pfizer Animal Health) and ad libitum access to food and water.
Behavioral task.
Mice were mildly water-restricted for 3 d before behavioral training on a 24 h water restriction schedule. Body weight was monitored daily and maintained at 85% of original weight by means of daily supplemental water. Although C57BL/6 mice normally consume ∼3–5 ml of water per day (Mouse Phenome Database from The Jackson Laboratory; http://www.jax.org/phenome), physiological adaptation and stabilization of body weight occur with chronic restriction of water, resulting in the mice only requiring ∼1–2 ml of water per day to maintain their restricted weight (Bekkevold et al., 2013; Guo et al., 2014).
Mice were trained in cohorts of three. All behavioral procedures were performed during the light hours. Across multiple sessions (<1 h duration), head-fixed mice were trained on a tandem fixed-interval (FI) modified fixed-ratio (M-FR) task to lick a spout positioned in front of their snouts for a 4 μl water reward. Mice were required to lick near continuously throughout a 2 s baseline period before reward delivery, enabling us to independently monitor activity changes in response to the instrumental period and reinforcer. Licking was measured by a pair of infrared photobeams positioned to cross in front of the lick spout by ∼2 mm. Mice were first trained to lick the spout for a water reward according to a FI(1) schedule (Phase 1). Thus, after the 1 s FI had elapsed, mice were eligible to receive a 4 μl reward if a single lick response to the spout was detected. Reward delivery would then initiate the start of a new trial. The FI was progressively increased from 1 to 11 s, incrementing in 1 s steps. In Phase 2, a vacuum epoch (2 s duration, 2 L/min flow rate) occurred within the FI, 6 s after reward delivery, to remove any remaining liquid (Fig. 2A). Once behavioral stability was reached on the FI(11) schedule, in Phase 3, mice were transitioned to a tandem FI M-FR schedule, in which reinforcement only occurred after the two successive schedule requirements had been met (Fig. 2A). The M-FR schedule was progressively increased until licking was maintained for at least 2.5 s before triggering reward delivery. In the M-FR schedule, a pause in lick detection of >300 ms would reset the FR counter back to 1 to ensure that there was a continuous bout of licking before reward delivery. Mice were required to complete ∼20 trials before the M-FR was incremented by 1–2 licks. Final M-FR schedules varied from 16 to 24 licks for different animals based on their rates of dry licking. 
No cue was provided at trial start or to signal when one schedule was complete and the next had begun.
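The tandem FI M-FR contingency described above can be sketched as a small state machine. This is an illustrative reconstruction rather than the control code used in the experiments: the function name, the use of integer millisecond timestamps measured from the previous reward, and the choice to ignore licks made before the FI has elapsed are assumptions made for the example.

```python
def reward_time_ms(lick_times_ms, fi_ms=11000, fr_licks=16, max_gap_ms=300):
    """Return the time of the lick that would trigger reward, or None.

    Hypothetical sketch: times are ms since the previous reward delivery.
    Licks before the fixed interval (FI) has elapsed are assumed not to count.
    """
    count = 0
    prev = None
    for t in sorted(lick_times_ms):
        if t < fi_ms:
            continue  # FI not yet complete, lick is ignored
        if prev is not None and t - prev > max_gap_ms:
            count = 1  # pause longer than 300 ms: counter resets to 1
        else:
            count += 1
        prev = t
        if count >= fr_licks:
            return t  # modified fixed-ratio requirement met
    return None
```

For example, with a requirement of four licks, a bout interrupted by an 800 ms pause has to be restarted before reward is triggered: `reward_time_ms([500, 11000, 11100, 11200, 12000, 12100, 12200, 12300], fr_licks=4)` returns `12300`.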
On separate experimental days mice were evaluated under two different behavioral sessions. In the first session, a water reward was delivered at three different volumes (4, 8, and 12 μl). Mice then received additional training (2–4 sessions) with the three different reward types (water, saccharin, or quinine) to increase the number of trials performed within a single session. For the second experimental session, all three reward types were presented at two different volumes each (4 and 12 μl). Both sessions also included some trials of reward omission (0 μl). Experiments continued until mice stopped initiating new trials or after 1 h had elapsed. In the first session mice performed an average of 104 ± 25 trials, resulting in a range of 15–45 trials per reward type. In the second session mice performed an average of 125 ± 28 trials, resulting in a range of 14–26 trials per reward type. Mice consumed 0.75 ml of water on average within the behavioral task and were provided supplemental water as needed in a dish in their home cage.
Reward delivery.
Reward fluids were delivered through a custom 3D-printed polylactic acid lick spout. The spout contained seven 1 mm holes, with one hole positioned in the center and the other six arranged in a circle around the center hole, with ∼1.7 mm spacing between adjacent holes. Independent stimulus lines terminated onto 20 G blunted needles that passed through the holes and extended to the tip of the spout. In the current task, three adjacent holes on the lick spout were used for reward delivery, three were connected to a vacuum line, and the last unused hole was blocked. Reward types included water, 2 mM saccharin, and 1 mM quinine (Sigma-Aldrich; dissolved in water), and could be delivered at one of three different volumes (4, 8, and 12 μl) by controlling the duration that fluid-limiting solenoid pinch valves were opened. Reward volumes were calibrated for each reward valve at the beginning of each experimental session. Placement of reward lines rotated on different sessions. To dampen any potential auditory cues from the different solenoid valves, valves were housed within a sound-attenuating chamber. Reward types and volumes were pseudo-randomized throughout the session. No predictive cues were associated with rewards.
Measuring changes in motivational drive across a session.
To test whether motivational drive to perform the task (i.e., thirst) changed across the duration of a session, three behavioral measures were examined. First, a cohort of mice without electrode implants (n = 3) was trained on the task to receive a 4 μl water reward each trial. Across two behavioral sessions, these mice were then removed at different time points in the session to measure ad libitum access water consumption. Specifically, on different days the mice were removed either early in the session (after 0.125 ml of water consumption, or 31 trials completed) or late in the session (after 0.625 ml of water consumption, or 156 trials completed), which corresponds to the amount of water typically consumed within the first third and last third of trials, respectively. Following removal, the mice were immediately transferred to a mouse cage for monitoring of water consumption for 30 min via ad libitum access to metal lick tubes, which allowed measures of both the volume of consumption and number of licks (based on the designs of Bachmanov et al., 2002; Hayar et al., 2006; Slotnick, 2009). Mice were maintained at the same weight on both testing days and had previous experience with the ad libitum access behavioral setup, receiving their supplemental water during 15 min sessions for 5 d before initial testing. As a second measure of motivational drive, in 10 mice we analyzed the latency to initiate the first dry lick that resulted in reward delivery after completion of the FI from all experimental sessions. Finally, as a third measure of motivational drive, in the same 10 mice we also analyzed the duration of wet licking observed after delivery of a water reward. This lick bout was defined by the first lick triggering reward delivery and the last lick that occurred before vacuum onset.
Reinforcer devaluation test.
To assess whether our head-fixed licking behavior measurement is subject to devaluation and therefore “goal-directed” (Dickinson and Balleine, 1994; Redgrave et al., 2010), a subset of water-restricted animals (n = 4) were allowed ad libitum access to water 30 min before testing. Mice consumed an average of 1.5 ± 0.31 ml of water, with the majority of intake occurring within the first few minutes. Immediately afterward, mice were head-fixed within the behavioral task so that the amount of instrumental behavior in a sated state could be evaluated. The number of trials completed when sated was compared with the average number of trials completed in the previous five sessions that were under normal water-restriction.
In vivo electrophysiology.
The output of the electrode array was amplified, digitized at 24.4 kHz, filtered (bandpass 300–5000 Hz), and monitored (Tucker-Davis Technologies), along with licking (300 Hz sampling rate), and reward presentation events. One electrode wire was selected to serve as a local reference. Our electrode arrays were fixed in place and no attempt was made to record from unique populations of neurons on different sessions. To compensate for the possibility that the same neurons were recorded across multiple days, two different behavioral tasks were used and statistical comparisons are only made within each task type. After all recording sessions were complete (between 10 and 21 d), mice were overdosed with urethane (i.p.) and transcardially perfused with 0.9% saline and 10% formalin. Brains were stored in 30% sucrose formalin at 4°C. OT recording sites were verified by histological examinations of slide-mounted, 40 μm coronal sections stained with a 1% cresyl violet solution (Fig. 1).
Analysis of behavioral and physiological data.
Single neurons were sorted offline in Spike2 (Cambridge Electronic Design), using a combination of template matching and cluster cutting based on principal component analysis. Single neurons were further defined as having <2% of the spikes occurring within a refractory period of 2 ms. Spike times associated with each trial were extracted and exported to MATLAB (MathWorks) for further analysis. To examine modulations in firing rate within a single trial, spike density functions were calculated by convolving spike trains with a function resembling a postsynaptic potential (Thompson et al., 1996). Mean firing rates across trials were measured in 50 ms bins, along with the 95% confidence interval. Mean baseline firing rate for each neuron was averaged across a 2 s period (−3 to −1 s relative to the onset of the first dry lick), whereas the mean prestimulus background firing rate was calculated over a 2 s period before reward delivery. As we reported previously (Gadziola et al., 2015), baseline firing rates of OT neurons were low with a median firing rate of 0.9 Hz (interquartile range: 0.2–9.6 Hz, range: 0–58 Hz).
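The refractory-period criterion and the spike density function can be illustrated with a short sketch. The kernel below is a generic postsynaptic-potential-like function with a fast rise and slower decay. Its time constants (1 and 20 ms), its 100 ms length, the 1 ms resolution, and the definition of the violation fraction as violating intervals divided by spike count are assumptions chosen for illustration, not values stated in the text. The analyses themselves were run in Spike2 and MATLAB; Python is used here only for the example.

```python
import math

def refractory_violation_fraction(spike_times_ms, refractory_ms=2.0):
    """Fraction of spikes that follow the previous spike by < refractory_ms."""
    times = sorted(spike_times_ms)
    if len(times) < 2:
        return 0.0
    violations = sum(1 for a, b in zip(times, times[1:]) if b - a < refractory_ms)
    return violations / len(times)

def psp_kernel(tau_growth_ms=1.0, tau_decay_ms=20.0, length_ms=100):
    """PSP-like kernel sampled at 1 ms and normalized to unit sum.

    Time constants are illustrative assumptions.
    """
    k = [(1.0 - math.exp(-t / tau_growth_ms)) * math.exp(-t / tau_decay_ms)
         for t in range(length_ms)]
    total = sum(k)
    return [v / total for v in k]

def spike_density(spike_times_ms, duration_ms, kernel):
    """Causal convolution of a spike train with the kernel, in spikes/s."""
    sdf = [0.0] * duration_ms
    for s in spike_times_ms:
        start = int(round(s))
        for i, w in enumerate(kernel):
            t = start + i
            if 0 <= t < duration_ms:
                sdf[t] += w * 1000.0  # per-ms weight converted to spikes/s
    return sdf
```

Because the kernel is zero at its first sample and extends only forward in time, each spike influences the estimated rate only after it occurs, which avoids smearing activity backward across an event such as the first dry lick.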
To assess changes in activity during the dry lick period, spiking was aligned to the first dry lick instead of reward delivery, and background firing was calculated from −3 to −1 s relative to the onset of the first dry lick. On some trials, mice may have been licking before the first recorded dry lick (e.g., if licking was initiated before the completion of the FI, or if any pauses in licking reset the FR counter). Any trial in which the animal licked during the 2 s period before the first recorded dry lick was removed.
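The trial-exclusion rule above amounts to a simple filter. The sketch below assumes a hypothetical representation in which each trial is a pair of the first recorded dry lick time and the list of all detected lick times, both in milliseconds.

```python
def keep_trial(first_dry_lick_ms, lick_times_ms, window_ms=2000):
    """True if no lick fell in the window before the first recorded dry lick."""
    start = first_dry_lick_ms - window_ms
    return not any(start <= t < first_dry_lick_ms for t in lick_times_ms)

def filter_trials(trials, window_ms=2000):
    """Keep only trials with a lick-free period before the first dry lick.

    `trials` is a list of (first_dry_lick_ms, lick_times_ms) pairs,
    a hypothetical layout chosen for this example.
    """
    return [tr for tr in trials if keep_trial(tr[0], tr[1], window_ms)]
```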
All statistical tests were two-sided, and data met assumptions of normality (Kolmogorov–Smirnov test). Statistical analyses were performed in SPSS 22.0 or MATLAB. All data are reported as mean ± SD unless otherwise noted.
Receiver operating characteristic analysis.
The area under the receiver operating characteristic (auROC) is a nonparametric measure of the discriminability of two distributions (Green and Swets, 1966). To normalize activity across neurons, we used an auROC method that quantifies stimulus-related changes in firing rate to the baseline activity on a 0–1 scale (for more details, see Cohen et al., 2012). A value of 0.5 indicates completely overlapping distributions, whereas values of 0 or 1 signal perfect discriminability. We calculated the auROC at each 50 ms time bin over a 4 s period centered on reward onset for each neuron. Values >0.5 indicate the probability that firing rates were increased relative to the prestimulus background (excitation), whereas values <0.5 indicate the probability that firing rates were decreased relative to the prestimulus background (inhibition). Similar trial numbers have been used for calculating auROC (Veit and Nieder, 2013; Gadziola et al., 2015). To obtain mean auROC values, the auROC values of individual neurons were averaged at each time bin. In some cases mean auROC values were computed separately for all excitatory and inhibitory neurons.
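As a minimal sketch of the auROC measure, the area under the ROC curve can be computed as the probability that a randomly drawn response-period firing rate exceeds a randomly drawn background firing rate, with ties counted as one-half. This pairwise formulation is a standard equivalent of sweeping a threshold across the two distributions. It is offered as an illustration and is not necessarily the exact procedure of Cohen et al. (2012); the function names and data layout are assumptions.

```python
def auroc(response, background):
    """Area under the ROC curve for response vs background firing rates.

    0.5 = fully overlapping distributions, >0.5 = excitation,
    <0.5 = inhibition relative to background.
    """
    wins = 0.0
    for r in response:
        for b in background:
            if r > b:
                wins += 1.0
            elif r == b:
                wins += 0.5  # ties contribute one-half
    return wins / (len(response) * len(background))

def auroc_timecourse(trial_bins, background):
    """auROC at each time bin.

    `trial_bins[i][j]` is the firing rate on trial i in bin j (e.g., 80 bins
    of 50 ms for a 4 s window); `background` holds one rate per trial.
    """
    n_bins = len(trial_bins[0])
    return [auroc([trial[j] for trial in trial_bins], background)
            for j in range(n_bins)]
```

Averaging the output of `auroc_timecourse` across neurons, bin by bin, yields a mean auROC trace of the kind described above.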
To evaluate reward-evoked responses, a permutation test was used to create a null distribution of auROC values ∼0.5, where the “response” and “background” firing rate labels were randomly reassigned and calculated 1000 times. Significant auROC bins were determined by testing whether the actual auROC value was outside the 95% confidence interval of this null distribution (Veit and Nieder, 2013). Neurons were considered reward-responsive if there were at least two consecutive significant bins within a 2 s period from reward delivery, to at least one of the presented reward types. To evaluate responses during the dry lick period, the above analysis was repeated, but with spike times aligned to first dry lick instead of reward onset, and periods of significant modulation were evaluated before and after the first dry lick.
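The permutation procedure and the consecutive-bin criterion can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the percentile-based bounds of the null distribution, the fixed random seed, and all function names are choices made for the example rather than details given in the text.

```python
import random

def auroc(response, background):
    """Pairwise auROC; ties count as one-half."""
    wins = 0.0
    for r in response:
        for b in background:
            wins += 1.0 if r > b else (0.5 if r == b else 0.0)
    return wins / (len(response) * len(background))

def is_significant(response, background, n_perm=1000, alpha=0.05, seed=0):
    """Test whether the observed auROC lies outside the central 95% of a null
    distribution built by randomly reassigning the two sets of labels."""
    rng = random.Random(seed)
    observed = auroc(response, background)
    pooled = list(response) + list(background)
    n = len(response)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        null.append(auroc(pooled[:n], pooled[n:]))
    null.sort()
    k = round(n_perm * alpha / 2)  # 25 values trimmed per tail for 1000 permutations
    lower, upper = null[k], null[n_perm - 1 - k]
    return observed < lower or observed > upper

def has_consecutive(flags, run=2):
    """True if `flags` contains at least `run` consecutive True values."""
    streak = 0
    for f in flags:
        streak = streak + 1 if f else 0
        if streak >= run:
            return True
    return False
```

Applying `is_significant` to each 50 ms bin and passing the resulting flags for the 2 s after reward delivery to `has_consecutive` reproduces the logic of the responsiveness criterion: isolated significant bins, which are expected by chance at the 5% level, are not sufficient to classify a neuron as reward-responsive.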
Results
We monitored OT activity from 10 head-fixed mice that were trained to lick a spout according to a tandem FI M-FR schedule for acquisition of a liquid reward (Fig. 2A,B). Water-restricted mice were trained over successive days (see Materials and Methods) to display progressively longer bouts of licking to receive a single reward. Several key components of this task design were implemented to allow for powerful analysis of the neural data. The M-FR schedule ensured that mice would continuously lick the spout for >2.5 s before reward, so that reward delivery was not confounded by the onset of licking. Mice were also required to lick at a rate >3.3 Hz to more closely match the licking behavior observed after reward delivery. Last, the FI schedule guaranteed a minimum intertrial interval of >11 s in which to monitor activity. Licks in the window preceding and following reward delivery are defined as “dry” or “wet” licks, respectively. Thus, this task structure enabled us to monitor changes in activity in response to the instrumental behavior and reinforcer independently.
Trained mice contributed data for two experimental sessions recorded on different days. In the first session, a water reward was delivered at three different volumes (4, 8, and 12 μl). Mice then received additional training with three different reward types (water, saccharin, or quinine) to increase the number of trials performed within a single session. For the second experimental session, all three reward types were presented at two different volumes each (4 and 12 μl). As expected, behavioral performance in our task was considered to be goal-directed because the instrumental action was dramatically suppressed after a subset of mice were sated with ad libitum access to water 30 min before testing in a reinforcer devaluation experiment (113 ± 45 vs 3 ± 7 completed trials; paired t test, t(3) = 5.20, p = 0.014; Fig. 2C).
OT neural dynamics are shaped by appetitive instrumental behavior
We found that OT neurons encode an appetitively driven instrumental behavior. Specifically, after mice learned the task (range = 3–9 d of training), we observed that the majority of OT single neurons modulated their firing rates during the 2 s dry lick period relative to baseline rates. Some neurons progressively increased their discharge throughout the entire dry lick period (Fig. 3A1,A2), whereas others had a more transient discharge around the start of the dry lick period, with only modest occasional firing during the sustained licking (Fig. 3B1,B2). To characterize responses across the population, we first removed any trials in which the animal licked during the 2 s before the first recorded dry lick (see Materials and Methods). We then measured the temporal response profile of each neuron by quantifying changes in firing rate from baseline using an ROC analysis (Cohen et al., 2012; Veit and Nieder, 2013; Gadziola et al., 2015). This analysis revealed that 69% (56/81) of neurons significantly modulated their firing rate during the dry lick period, with 71% of these responsive neurons increasing their firing rates relative to baseline and the remaining neurons suppressing their firing rates relative to baseline. Interestingly, the temporal response profile revealed that the modulation in firing rate occurred before the first dry lick for 79% of the responsive neurons (Fig. 3C,D). On average, the latency of significant response was 186 ms before the first dry lick, with the earliest response occurring as early as 550 ms prior. There were no discernable differences in the temporal response pattern of neurons that increased vs decreased firing rates in response to the dry licking period (Fig. 3C,D), suggesting that these neurons are similarly driven by instrumental behaviors. Thus, OT neurons can represent both the onset and progression of the instrumental licking behavior associated with reward, with the activity before the first lick likely internally generated.
The influence of motivation on OT activity
Is the activity observed before the onset of dry licking related to motivational drive as observed in other systems (Rolls, 2005; Gutierrez et al., 2006)? To address this question we first tested whether the motivational drive to perform the task declines across the session. In other words, do the animals get less thirsty throughout the session? Session trials were split into thirds, and the early and late trial blocks were compared. Three behavioral measures suggested that the reinforcing value of the reward declined over the course of the session, coinciding with an increase in the total amount of water earned. First, when the mice were allowed ad libitum access to water after a session, they consumed 40% less water on average when they were removed during late, compared with early session trials (0.83 ± 0.44 vs 1.40 ± 0.44 ml, respectively; Fig. 4A, left), suggesting that mice were less thirsty. Second, the mean latency to initiate the first dry lick after completion of the FI schedule was significantly delayed in late, compared with early session trials (15.57 ± 12.56 vs 7.75 ± 5.97 s, respectively; paired t test, t(24) = −3.43, p = 0.002; Fig. 4A, middle), indicating that mice were slower to initiate actions that would earn rewards. Finally, the duration of wet licking after reward delivery decreased in late, compared with early, session trials (3.31 ± 0.64 vs 4.08 ± 0.45 s, respectively; paired t test, t(13) = 4.27, p < 0.001; Fig. 4A, right), consistent with a decrease in reinforcer value (Davis, 1973; Davis and Levine, 1977; Travers and Norgren, 1986; Taha and Fields, 2005; Travers, 2005). Together, these results suggest that motivational drive to perform the task declined as the total amount of water earned increased across a recording session. We next compared the temporal response profile of each neuron during early (high motivational drive) and late session trials (low motivational drive).
On late session trials both excitatory and inhibitory OT neurons increased the amount of early modulation in firing rate observed before the first dry lick (Fig. 4B). The mean onset of significant response was statistically earlier for late versus early session trials (−176 ± 291 vs 36 ± 289 ms, respectively; two-sample t test, t(101) = 3.71, p < 0.001). In contrast, firing rates of neurons during the baseline period and first 500 ms postreward onset did not significantly differ between early and late session trials. Thus, as motivational drive declines across an experimental session, the activity of OT neurons is enhanced earlier relative to the instrumental action.
OT neurons encode rewards based upon their type and magnitude
Within these same mice, we next asked whether the firing rates of OT neurons are modulated in response to reward delivery itself, independent of the modulation occurring during the instrumental behavior. An ROC analysis was used to test for significant modulation in firing rate relative to the background firing rates observed during the 2 s dry lick period before reward delivery. Thus, a reward-evoked response must overcome any modulation in firing rate that was already occurring in response to the instrumental behavior. A substantial number of neurons (53/81, 65%) were significantly modulated by reward compared with reward omission trials (Fig. 5A). On average, these responses upon reward presentation were transient, returning to background firing rates within <500 ms from reward delivery (Fig. 5A). Looking across individual neurons, many of the excitatory reward responses were transient (Fig. 5B, horizontal arrow), whereas inhibitory responding neurons were more likely to sustain their decreased firing rate after reward delivery (Fig. 5B, arrowhead).
Do OT neurons encode reward magnitude? In the first experimental session, three different volumes (4, 8, and 12 μl) of water reward were randomly varied throughout the session and were not associated with any predictive cue. As an indicator of perceived reward palatability, we first examined the duration of the licking cluster in response to reward delivery (Davis, 1973; Davis and Levine, 1977; Travers and Norgren, 1986; Spector et al., 1998). As expected, we found that the average duration of licking could discriminate the magnitude of the water reward, with increasing volumes resulting in significantly increased lick cluster durations (Fig. 6A; 3.1 ± 0.2, 3.8 ± 0.2, and 4.4 ± 0.2 s for 4, 8 and 12 μl water, respectively; F(3,48) = 16.9, p < 0.001, repeated-measures ANOVA with Bonferroni correction). This confirms that the mice detected the differences in reward volumes. We next explored whether these differences are reflected among the activity of OT neurons, and found that some neurons robustly encoded reward magnitude. For instance, the example neuron illustrated in Figure 6B displayed a transient excitatory response locked to reward delivery, with increasing firing rates for the three increasing magnitudes of water reward. We examined the number of neurons with significant responses to any of the three reward volumes within the first 500 ms from reward delivery, and found that neurons were not equally responsive to the different reward volumes (χ²(2, N = 53) = 5.99, p < 0.05). Across all responsive neurons, auROC values were greater for the two larger reward sizes compared with the smallest reward size (Fig. 6C,D). These findings indicate that OT neurons encode the magnitude of reward, particularly between small and larger sized volumes.
Do OT neurons encode reward type? In a separate experimental session, reward delivery was randomized among three different reward types: water, saccharin, or quinine, presented at two different magnitudes (4 or 12 μl). As before (Fig. 6A), there was a main effect of volume on licking duration, with large magnitude rewards evoking longer durations of licking compared with small magnitude rewards (Fig. 7A; F(1,10) = 17.8, p < 0.001, repeated-measures ANOVA). There was also a significant interaction between the effects of reward type and magnitude on the duration of the licking cluster (F(2,20) = 4.52, p = 0.024, repeated-measures ANOVA). Although the type of reward did not have an effect at small magnitudes, saccharin evoked a longer duration of licking compared with quinine at large magnitudes (Fig. 7A; p = 0.025, with Bonferroni correction). Together, these results illustrate that mice identify differences among the rewards used, and so we again explored whether these differences are encoded among OT neurons. In this task, 68% (36/53) of neurons were significantly modulated by at least one reward type. Neurons differentially modulated their firing rate and duration of response among reward types, as illustrated by the two example neurons in Figure 7. The first example neuron exhibited a large increase in firing in response to both water and quinine but not to saccharin (Fig. 7B). The second example neuron increased its firing in response to both saccharin and water, but not quinine (Fig. 7C). Across the population of responsive neurons, 33% (12/36) were highly selective for a specific stimulus, responding to just one of the six presented rewards (Fig. 7D). The majority of these selective neurons (83%, 10/12 neurons) were responsive to saccharin (split equally across the 2 volumes), suggesting a strong preference for this highly palatable reinforcer (Fig. 7E).
Among the remaining nonselective neurons (n = 24), the percentage of neurons responding to each reward type was roughly uniform across the different rewards (Fig. 7E). Thus, although some OT neurons are highly selective for just one reward type, the entire population of neurons is able to collectively represent different rewards. As a population, OT neurons encode receipt of a reinforcer and do so based on the type and volume of reward.
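The proportions reported for the reward-type session follow directly from the stated neuron counts. The short Python sketch that follows reproduces that arithmetic; the counts are taken from the text, whereas the variable names and the `percent` helper are introduced here only for illustration.

```python
# Neuron counts as reported in the text for the reward-type session.
n_recorded = 53     # neurons recorded
n_responsive = 36   # significantly modulated by at least one reward type
n_selective = 12    # responded to just one of the six presented rewards
n_saccharin = 10    # selective neurons that responded to saccharin


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Responsive neurons that were not selective for a single reward.
n_nonselective = n_responsive - n_selective

print("responsive:", percent(n_responsive, n_recorded), "%")
print("selective among responsive:", percent(n_selective, n_responsive), "%")
print("saccharin among selective:", percent(n_saccharin, n_selective), "%")
print("nonselective neurons:", n_nonselective)
```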
Discussion
The orchestration of goal-directed behaviors relies on decision-making processes that evaluate available rewards and their current value based on motivational and contextual information. Neural responses to rewards can include distinct anticipatory and consummatory components related to reward receipt, and several brain regions are involved in reward processing, including midbrain dopaminergic nuclei, striatum, orbitofrontal cortex (OFC), and the amygdala (Berridge, 1996; Schoenbaum et al., 1999; Schultz et al., 2000; O'Doherty, 2004; Roesch et al., 2007a; Ilango et al., 2014). As the first study to describe the neural representations of goal-directed actions and their outcomes in the OT, the novel insights reported here advance our understanding of how substructures within the ventral striatum may collectively function to guide motivated behavior.
Known aspects of reward-related encoding in the ventral striatum
A complex array of sensory and contextual information arrives in the ventral striatum from several cortical and subcortical structures in both rodents and primates (Zahm and Brog, 1992; Heimer, 2003; Ikemoto, 2007; Haber, 2011). In rodents, both the NAc and OT receive similar inputs that mediate reward processing, including afferents from the prefrontal cortex (McGeorge and Faull, 1989; Berendse et al., 1992a; Brog et al., 1993), basolateral amygdala (Russchen and Price, 1984; Brog et al., 1993; Wright et al., 1996), subiculum of the hippocampus (Kelley and Domesick, 1982; Groenewegen et al., 1987; Brog et al., 1993), paraventricular thalamic nucleus (Berendse and Groenewegen, 1990; Moga et al., 1995), and ventral tegmental area (Fallon and Moore, 1978; Swanson, 1982; Del-Fava et al., 2007). Efferent projections of the ventral striatum are sent to the ventral pallidum (Heimer et al., 1987, 1991; Zhou et al., 2003), lateral hypothalamus (Berendse et al., 1992b; Usuda et al., 1998), and midbrain dopaminergic nuclei (Berendse et al., 1992b; Usuda et al., 1998) to then develop and execute appropriate action plans. Notably, both the afferent and efferent projections vary with mediolateral topography (Ikemoto, 2007). Despite substantial overlap in their anatomical connections, some of this connectivity is unique between structures, which suggests that the NAc and OT may serve distinct functions in motivated behaviors. For example, only the OT is highly interconnected with olfactory regions (White, 1965; Haberly and Price, 1977; Luskin and Price, 1983; Carriero et al., 2009; Kang et al., 2011; Sosulski et al., 2011; for review, see Wesson and Wilson, 2011) and provides a direct projection to posterior regions of the OFC and agranular cortices (Barbas, 1993; Illig, 2005; Hoover and Vertes, 2011).
Elegant work by numerous groups has established that neurons within the NAc encode conditioned task-related events, including the instruction or trigger cues that signal subsequent outcomes, the preparation, initiation, and execution of behavioral actions, and the sensory properties of reinforcers (Apicella et al., 1991; Schultz et al., 1992; Williams et al., 1993; Hollerman et al., 1998; Hassani et al., 2001; Carelli, 2002; Setlow et al., 2003; Taha and Fields, 2005; Roesch et al., 2009). During goal-directed behaviors, the activity of rodent NAc neurons is characterized by anticipatory changes in firing preceding the operant response, followed by either an increase or decrease in firing after delivery of the reinforcer (Carelli et al., 1993, 2000; Chang et al., 1996; Lee et al., 1998; Martin and Ono, 2000). Further, NAc neurons can differentially encode reward value and motivation (Bissonette et al., 2013), and integrate the value of expected rewards with directions of required movements during decision making (Roesch et al., 2009; van der Meer and Redish, 2009). Dopamine released by ventral tegmental area terminals within the ventral striatum modulates glutamatergic input onto medium spiny neurons (Nicola et al., 2000; O'Donnell, 2003) and is essential for signaling reward and promoting goal-seeking behavior (Wise, 1982; Salamone and Correa, 2002; Nicola, 2007; Tsai et al., 2009; du Hoffmann and Nicola, 2014). When dopamine is depleted within the NAc, animals are less likely to engage in instrumental responses with a high work requirement and often fail to respond to reward-predictive cues (Salamone et al., 2003; Salamone and Correa, 2012). The extensive body of literature on the NAc has led to the proposal that the ventral striatum serves as a “critic” in actor-critic models of reinforcement learning (O'Doherty, 2004; van der Meer and Redish, 2011), providing necessary information to midbrain dopaminergic neurons for updating of reward prediction errors.
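For readers unfamiliar with the actor-critic framework mentioned above, the following is a generic textbook-style sketch of a temporal-difference (TD) critic, written in Python. It is not a model fitted to, or proposed for, the present data. The two-step trial structure ("cue" then "outcome"), the function name `train_critic`, and the parameter values `alpha` and `gamma` are assumptions made purely for illustration.

```python
def train_critic(reward, n_trials=200, alpha=0.1, gamma=1.0):
    """Tabular TD(0) critic for a hypothetical trial: cue -> outcome -> end.

    The critic stores a value estimate for each state. On every transition it
    computes a reward prediction error (delta) and nudges the value of the
    state just left toward the better-informed estimate.
    """
    value = {"cue": 0.0, "outcome": 0.0}
    outcome_errors = []
    for _ in range(n_trials):
        # Transition cue -> outcome: no reward is delivered on this step.
        delta_cue = 0.0 + gamma * value["outcome"] - value["cue"]
        value["cue"] += alpha * delta_cue
        # Transition outcome -> end of trial: reward arrives, terminal value is 0.
        delta_outcome = reward + gamma * 0.0 - value["outcome"]
        value["outcome"] += alpha * delta_outcome
        outcome_errors.append(delta_outcome)
    return value, outcome_errors


# Illustrative run with an arbitrary reward magnitude of 12 (units are arbitrary).
values, errors = train_critic(reward=12.0)
print("learned values:", values)
print("first and last prediction error at outcome:", errors[0], errors[-1])
```

In this toy setting the prediction error at the outcome starts at the full reward magnitude and shrinks toward zero as the critic's value estimates come to anticipate the reward, which is the sense in which a critic supplies the information needed to update reward prediction errors.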
Novel insights into OT representations of reward-related behaviors and outcomes
Our findings suggest that the OT may be a critical site for translating the representation of “reward” into overt action. OT neurons represented the onset and progression of the instrumental licking behavior, similar to what has been observed among NAc neurons (Carelli et al., 1993; Chang et al., 1996). Interestingly, we find that neurons respond before the first lick, and that this response begins even earlier relative to the first lick as the session progresses. Pre-response activity may relate to the involvement of the OT in responding to the associative contingencies of conditioned stimuli (Gadziola et al., 2015), or in the case here, to self-initiated behaviors in anticipation of expected reward. We predict that this increase in pre-response OT activity may be an essential component for invigorating instrumental behavior in states of reduced motivation, and that dopamine has a crucial role in promoting performance within high-effort instrumental tasks, such as the one we used (Salamone et al., 2007; Nicola, 2010). Further studies investigating the causal mechanism of this pre-response activity are required to test this hypothesis, as there may be alternative explanations for the change across a session, such as over-trial learning. The monitoring of licking behavior by OT neurons could also play an important role in the regulation of appetitive consummatory behaviors, as seen in the OFC (Rolls, 2005; Gutierrez et al., 2006). Future studies will need to address whether the OT is necessary or sufficient in regulating licking or other appetitive operant behaviors.
Another major finding is that OT neurons encode natural reinforcers with changes in firing rate. Although these neurons may also respond to the instrumental behavior itself, they nevertheless display a significant change in firing after reward delivery beyond any modulation occurring in response to the instrumental licking. Although excitatory responses were transient, neurons displaying reward-evoked inhibition were more likely to sustain the suppressed firing rate relative to the background period. These suppressive responses likely represent neurons that increase activity during the instrumental period in anticipation of reward and terminate their response upon reward delivery.
Our results revealed robust reward-evoked responses among neurons with different ranges of selectivity to the different reward types and magnitudes (small vs larger volumes) presented, despite the fairly limited stimulus set used. Across the entire population of sampled neurons, the effectiveness of particular rewards at evoking a response was roughly uniform. However, a subset of highly selective neurons displayed a preference for saccharin, suggesting that palatability is a significant factor in reward encoding within the OT. Sensory properties of the reinforcer may underlie this discrimination (including gustatory mechanisms), with OT neurons differentially tuned for different reinforcers. Although not necessarily independent from the above, it is also possible that the responses of OT neurons depend upon the current value of the rewards, something that could be determined with alternative task designs that allow for testing of selective devaluation or contrast (Dickinson and Balleine, 1994; Taha and Fields, 2005), or by evaluating reinforcer selectivity with concentration response functions.
It will be important for future studies to identify how distinct cell classes or regions within the OT are contributing to motivated behavior. For example, optogenetic approaches would allow for identifying distinct cell types within the OT (Millhouse and Heimer, 1984; Chiang and Strowbridge, 2007), which may differentially contribute to the reward response. There is also evidence for functional heterogeneity between the medial and lateral OT (Ikemoto, 2003; Agustín-Pavón et al., 2014; DiBenedictis et al., 2015; Murata et al., 2015) that may be subserved by the mediolateral topographical projection patterns of dopaminergic (Newman and Winans, 1980) and other inputs into the OT (Schwob and Price, 1984; Ikemoto, 2007). Although we did not have a sufficient number of neurons in each OT subregion to address this question, it is possible that the OT is spatially heterogeneous in its encoding of motivated actions and outcomes.
In our task, water-deprived mice engaged in a tandem fixed-interval modified fixed-ratio schedule to obtain a fluid reinforcer. This task structure enabled us to independently monitor activity changes in response to the instrumental period and reinforcer. As licking behavior involves a combination of chemosensory, motor, and motivational responses, the act of licking itself is inextricably tied to reward (Gutierrez et al., 2006). Thus, we expect that both the firing rates of neurons and measures of licking behavior should be effective at discriminating among reward types. It is unlikely that the changes in neural activity we observed were exclusively driven by the act of licking for several reasons. First, because mice are engaged in near continuous licking behavior before and after reward delivery, the reward-evoked activity cannot easily be explained by changes in motor activity or arousal levels. Further, if changes in licking behavior were driving neural activity then one would expect to see a much higher percentage of neurons responding to the large volume rewards compared with small volume rewards.
The duration of licking clusters is used to infer solution palatability, though this is typically tested under ad libitum access conditions (Davis, 1973; Davis and Levine, 1977; Travers and Norgren, 1986; Spector et al., 1998). Although licking cluster durations have not been studied to our knowledge in response to a single drop of fluid, they do reflect palatability for brief (1–2 s) presentations of solution (Taha and Fields, 2005), and the amount of licking is increased after delivery of large compared with small rewards (Bissonette et al., 2013). It is possible that rodents do not discriminate the palatability of tastants as well under conditions of water-deprivation, because the drive to restore fluid balance should override the natural palatability of solutions. Indeed, thirsty rodents ingest similar volumes of water, quinine and sucrose solutions independent of their palatability during the initial period of consumption (Scalera, 2000). We find that mice extend the duration of licking for large saccharin rewards relative to quinine of the same volume. However, it is not clear whether the increased licking duration to large rewards is related to a higher associated value of the stimulus (Bissonette et al., 2013; Burton et al., 2014) or because of the additional time required to consume a larger volume.
Conclusion
Our findings are in accordance with the principle that parallel processing of motivated behaviors and their outcomes is occurring within ventral striatum substructures. Although the NAc and OT share many features, some of their unique connectivity suggests that they serve distinct functions in motivated behaviors. For example, the OT may play a particularly important role in the processing of social and consummatory motivated behaviors (especially those directed by olfactory cues), and in influencing the OFC representation of outcome expectancies (Kringelbach, 2005; Schoenbaum et al., 2006). The current findings, along with our previous work (Gadziola et al., 2015), suggest that the OT is highly sensitive to the associative contingencies of conditioned cues, initiation and maintenance of instrumental behaviors, and outcomes of natural rewards. This accumulating evidence sets reward-related processing within the OT apart from other olfactory cortical regions, such as piriform cortex (Calu et al., 2007; Roesch et al., 2007b; Gire et al., 2013), and appears more in line with reward-related responses observed in the NAc and OFC (Carelli et al., 1993; Schoenbaum and Roesch, 2005; Taha and Fields, 2005; Roesch et al., 2007b; Bissonette et al., 2013). Furthermore, the OT and piriform cortex are also distinct in their anatomical connections with the OFC (Illig, 2005; Hoover and Vertes, 2011).
Both the NAc and OFC are thought to serve as “critics” in actor-critic models of reinforcement learning, providing unique information related to predicted outcome changes and general affective information, respectively (Schoenbaum et al., 2009). Based upon our results, we propose that the OT also plays the role of a critic. Investigating the unique dynamics of each ventral striatum substructure, the cell types involved, and their dependence on one another will have profound impacts on our understanding of how the brain coordinates reward value judgements to ultimately guide motivated behavior (Stott and Redish, 2015). How activity in the ventral striatum may lead to reward preferences and the consumption of natural rewards is fundamental to understanding the mechanisms involved in aberrant reward-associations and anhedonia, which are observed in a variety of psychiatric disorders, including addiction and mood disorders (Lobo and Nestler, 2011; Russo and Nestler, 2013; Ikemoto and Bonci, 2014).
Footnotes
This work was supported by grants from the National Institutes of Health (NIDCD R01DC014443), National Science Foundation (IOS-1121471), Alzheimer's Association (14-305847), and the Mt Sinai Healthcare Foundation. We thank Kate White for assisting with the histology.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Daniel W. Wesson, Department of Neurosciences, Case Western Reserve University School of Medicine, 2109 Adelbert Road, Cleveland, OH 44106. dww53@case.edu