Abstract
Animals can readily learn that stimuli predict the absence of specific appetitive outcomes; however, the neural substrates underlying such outcome-specific conditioned inhibition remain largely unexplored. Here, using female and male rats as subjects, we examined the involvement of the lateral habenula (LHb) and of its inputs onto the rostromedial tegmental nucleus (RMTg) in inhibitory learning. In these experiments, we used backward conditioning and contingency reversal to establish outcome-specific conditioned inhibitors for two distinct appetitive outcomes. Then, using the Pavlovian-instrumental transfer paradigm, we assessed the effects of manipulations of the LHb and the LHb–RMTg pathway on that inhibitory encoding. In control animals, we found that an outcome-specific conditioned inhibitor biased choice away from actions delivering that outcome and toward actions earning other outcomes. Importantly, this bias was abolished by both electrolytic lesions of the LHb and selective ablation of LHb neurons using Cre-dependent Caspase3 expression in Cre-expressing neurons projecting to the RMTg. This deficit was specific to conditioned inhibition; an excitatory predictor of a specific outcome biased choice toward actions delivering the same outcome to a similar degree whether the LHb or the LHb–RMTg network was intact or not. LHb lesions also disrupted the ability of animals to inhibit previously encoded stimulus–outcome contingencies after their reversal, pointing to a critical role of the LHb and of its inputs onto the RMTg in outcome-specific conditioned inhibition in appetitive settings. These findings are consistent with the developing view that the LHb promotes a negative reward prediction error in Pavlovian conditioning.
SIGNIFICANCE STATEMENT Stimuli that positively or negatively predict rewarding outcomes influence choice between actions that deliver those outcomes. Previous studies have found that a positive predictor of a specific outcome biases choice toward actions delivering that outcome. In contrast, a negative predictor of an outcome biases choice away from actions earning that outcome and toward other actions. Here we reveal that the lateral habenula is critical for negative predictors, but not positive predictors, to affect choice. Furthermore, these effects were found to require activation of lateral habenula inputs to the rostromedial tegmental nucleus. These results are consistent with the view that the lateral habenula establishes inhibitory relationships between stimuli and food outcomes and computes a negative prediction error in Pavlovian conditioning.
- lateral habenula
- negative prediction error
- outcome-specific conditioned inhibition
- Pavlovian-instrumental transfer
- reversal learning
- rostromedial tegmental nucleus
Introduction
Animals have the capacity to extract predictive information from the environment to anticipate important events (Hollis, 1984). This capacity entails learning about both excitatory stimuli, which positively predict these events, and about inhibitory stimuli, which negatively predict these events. Excitors and inhibitors have traditionally been dissociated by their respective influence on conditioned responses (Pavlov, 1927). Stimuli that positively predict appetitive outcomes elicit approach whereas negative predictors evoke withdrawal and are often characterized by their ability to generate retardation and subtractive summation effects (Rescorla, 1969). Nevertheless, although these phenomena are useful in classifying predictors as inhibitory, they do not provide any detailed information about the content of inhibitory learning. In recent studies, we have used the outcome-specific Pavlovian-instrumental transfer paradigm to provide this information (Laurent et al., 2015, 2016; Laurent and Balleine, 2015). In specific transfer, a stimulus predicting a particular appetitive outcome guides choice toward actions earning that outcome. In contrast, a stimulus predicting the absence of a particular outcome biases choice away from actions earning that outcome and toward actions delivering a different outcome. Essentially, the pattern of choice produced by excitors is reversed with inhibitors, providing compelling evidence that both positive and negative predictions are established in an outcome-specific manner.
Excitors and inhibitors of appetitive outcomes can also be dissociated at a neuronal level by examining the firing of midbrain dopamine (DA) neurons that compute the prediction error controlling appetitive Pavlovian conditioning (Schultz et al., 1997). This error signal reflects the discrepancy between the actual and expected outcome of a trial (Rescorla and Wagner, 1972); a positive error produces excitatory learning, whereas a negative error drives inhibitory learning. Accordingly, learning about an appetitive excitor has been associated with phasic activation of midbrain DA neurons, whereas the activity of these neurons is depressed by an appetitive inhibitor (Tobler et al., 2003). Recent studies suggest that this depression of midbrain DA activity is mediated by the lateral habenula (LHb). Neurons in this region display activity inverse to that of midbrain DA neurons: they are inhibited by appetitive excitors and are excited by appetitive inhibitors (Matsumoto and Hikosaka, 2007; Bromberg-Martin et al., 2010). Furthermore, the latter excitation precedes the depression observed in midbrain DA neurons. Importantly, this depression is not directly controlled by changes in LHb activity. Rather, the LHb is thought to activate GABAergic neurons in the rostromedial tegmental nucleus (RMTg), which, in turn, silence the activity of midbrain DA neurons (Jhou et al., 2009; Brown et al., 2017). Together, these findings suggest that the LHb mediates negative prediction errors by inhibiting the activity of midbrain DA neurons via its projection to the RMTg (Proulx et al., 2014); however, it is unknown whether this inhibitory influence is required for outcome-specific inhibition.
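The error-correction idea described above (Rescorla and Wagner, 1972) can be illustrated with a minimal sketch in which the change in associative strength on each trial is proportional to the difference between the obtained outcome and the summed prediction of the stimuli present. The single learning-rate parameter `alpha`, the number of trials, and the A+/AX− design are illustrative assumptions rather than parameters or procedures from the present experiments, and the function name is hypothetical.

```python
def rescorla_wagner(trials, alpha=0.2):
    """Return associative strengths after a sequence of conditioning trials.

    trials: list of (stimuli_present, lam) pairs, where lam is 1.0 when the
            outcome occurs and 0.0 when it is omitted.
    alpha:  learning rate (illustrative value, not taken from the paper).
    """
    strengths = {}
    for stimuli, lam in trials:
        expected = sum(strengths.get(s, 0.0) for s in stimuli)
        # Positive error drives excitatory learning; negative error drives
        # inhibitory learning.
        error = lam - expected
        for s in stimuli:
            strengths[s] = strengths.get(s, 0.0) + alpha * error
    return strengths


# Hypothetical design: A alone is followed by the outcome, the AX compound is not.
trials = [(("A",), 1.0), (("A", "X"), 0.0)] * 50
v = rescorla_wagner(trials)
print(v)  # A ends with positive strength, X with negative strength
```

In this sketch, X acquires negative strength only because it is presented when the outcome is expected (because of A) but omitted, which is the negative prediction error discussed in the text.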
The present experiments investigated the role of the LHb and its input to the RMTg in choice between actions driven by outcome-specific inhibition. To achieve this, we disrupted LHb function using either electrolytic lesions (Tian and Uchida, 2015) or the selective ablation of LHb neurons projecting to the RMTg (Yang et al., 2013; Marchant et al., 2016). Rats then received Pavlovian conditioning to establish outcome-specific inhibitors of two distinct outcomes followed by instrumental training, during which these outcomes could be earned by distinct lever-press actions. Finally, we assessed the effect of the predictive stimuli on choice between actions. In control rats, in which LHb function was intact, we expected the inhibitors to bias choice away from the action earning the negatively predicted outcome and toward the action delivering the other outcome (Laurent et al., 2015, 2016; Laurent and Balleine, 2015). If, however, LHb and its inputs to the RMTg mediate the functional effects of outcome-specific conditioned inhibitors, then rats with disrupted LHb function should be unable to use inhibitory information to guide choice.
Materials and Methods
Overview of the experiments
A total of three experiments were completed. The first two experiments examined the effects of electrolytic lesion of the LHb on choice between actions driven by either outcome-specific inhibition (Exp. 1) or outcome-specific excitation (Exp. 2). Experiment 3 used a viral strategy to determine the role of LHb inputs to the RMTg on choice between actions driven by an outcome-specific conditioned inhibitor.
Subjects
The subjects in Experiments 1 and 2 were 49 experimentally naive Hooded-Wistar rats (≥12 weeks old) obtained from the Laboratory Animal Services at the University of Sydney (Sydney, Australia). The subjects in Experiment 3 were 12 female and 12 male experimentally naive Long–Evans rats (≥12 weeks old) obtained from the Animal Resources Center (Perth, Australia). All animals were housed in plastic boxes (2–4 rats per box) located in a climate-controlled colony room and were maintained on a 12 h light/dark cycle (lights on between 7:00 A.M. and 7:00 P.M.). Three days before the behavioral procedures, the rats were handled daily and were put on a food-deprivation schedule to maintain them at ∼85% of their ad libitum feeding weight. The Animal Ethics Committees at the University of Sydney (Exps. 1 and 2) and at the University of New South Wales (Exp. 3) approved all experimental procedures. All these procedures were conducted between 7:00 A.M. and 7:00 P.M.
Surgery
At the time of surgery, male rats weighed between 320 and 390 g whereas female rats weighed between 250 and 300 g. A continuous flow of mixed isoflurane and oxygen gas was used to anesthetize rats, which were then placed in a stereotaxic frame (Kopf Instruments). An incision was made in the scalp to expose the skull and the incisor bar was adjusted to align bregma and lambda on the same horizontal plane. For rats used in Experiments 1 and 2, holes were drilled bilaterally into the skull above the LHb at the following coordinates: −3.4 mm anterior-posterior (AP); ±1 mm medial-lateral (ML); −5.7 mm dorsal-ventral (DV) relative to bregma. The lesion was induced by passing a current at 7–10 V for 20 s with an LM4 lesion maker (Grass Instruments) and an insulated electrode that was bared 1 mm from the tip. The same procedures were applied to rats in the Sham groups except that no current was passed. For rats used in Experiment 3, holes were drilled into the skull above the two targeted brain regions according to the following coordinates: the LHb: −3.4 or −3.6 mm AP; ±1 mm ML; −5.1 or −5.3 mm DV relative to bregma; RMTg: −7.2 or −7.4 mm AP; ±0.8 mm ML; −7.9 or −8.1 mm DV relative to bregma. AAV5-CMV-HI-eGFP-Cre-WPRE-SV40 (AAV-Cre; Penn Vector Core) was then bilaterally infused in the RMTg, whereas AAV5-flex-taCasp3-TEVp (Casp3; Gene Therapy Center Vector Core, University of North Carolina at Chapel Hill) was bilaterally infused in the LHb. Both viruses were delivered at a rate of 0.1 μl/min using a 1 μl Hamilton syringe. A total volume of 0.3 μl of AAV-Cre was used, whereas 0.6 μl of Casp3 was infused into the LHb. Control rats in Experiment 3 were infused with artificial CSF (ACSF) rather than Casp3 in the LHb. After each infusion, the needle of the syringe was left in place for an additional 10 min to allow for the diffusion of the viruses or ACSF. Following lesion or viral infusion, the incision was closed by using wound-closure clips (EZ Clip, Stoelting).
All rats were given a 0.4 ml intraperitoneal injection of procaine penicillin solution (300 mg/kg; Ilium Benicillin) after the surgery. Animals were allowed to recover for 7 d before the behavioral procedures.
Behavioral apparatus
Training and testing took place in 16 Med Associates operant chambers enclosed in sound-resistant and light-resistant shells. Each operant chamber was equipped with a pump fitted with a syringe that could deliver 0.1 ml of a 20% sucrose solution into a recessed magazine. Each chamber was also equipped with a pellet dispenser that could individually deliver grain food pellets (45 mg; BioServe Biotechnologies). The chambers contained two retractable levers that could be inserted to the left and right side of the magazine. An infrared photobeam crossed the magazine opening, allowing for the detection of head entries. A 3 W, 24 V house light provided illumination of the operant chamber, and each chamber contained a Sonalert that, when activated, delivered a 3 kHz pure tone, a 28 V DC mechanical relay that was used to deliver a 2 Hz clicker stimulus, and a white-noise generator (80 dB). A set of two microcomputers running proprietary software (Med-PC, MED Associates) controlled all experimental events and recorded magazine entries and lever presses. Outcome devaluation was conducted in a separate room with 16 plastic feeding boxes in which the outcomes were presented and could be consumed.
Behavioral procedures
Backward training.
Rats used in Experiments 1 and 3 received eight sessions of backward training across 8 consecutive days. These sessions used two distinct auditory stimuli (S1 and S2; clicker and tone) and two distinct food outcomes (O1 and O2; pellets and sucrose solution). Each session involved 24 outcome deliveries (12 of each outcome) followed by presentation of one of the two stimuli, which was turned on 10 s after the rats entered the magazine to consume the outcome. Delivery of O1 was always followed by presentation of S1, and delivery of O2 was always followed by presentation of S2. The stimulus duration varied from 2 to 58 s with an average of 30 s and was separated by an intertrial interval (ITI) that varied from 80 to 200 s with an average of 150 s. We applied these parameters because they have been shown to generate Pavlovian inhibitors in the past (Laurent et al., 2015; Laurent and Balleine, 2015). The stimuli were presented in one of three pseudorandom orders and the stimulus–outcome associations were counterbalanced both between and within groups. Throughout these sessions, both levers were retracted and magazine entries were recorded and separated into a stimulus period and a period before outcome delivery of equal length (Pre; 30 s).
Forward training.
Rats in Experiment 2 received forward training rather than backward training. This involved eight sessions across 8 consecutive days. These sessions used the same auditory stimuli (S1 and S2; clicker and tone) and the same food outcomes (O1 and O2; pellets and sucrose solution) as Experiment 1. During each session, each stimulus was presented four times in a pseudorandom order and every presentation lasted for 2 min followed by a variable ITI, which averaged 5 min. The two outcomes were delivered during each stimulus using a random time 30 s schedule. O1 was delivered during S1; O2 was delivered during S2. The stimuli were presented in one of three pseudorandom orders and the stimulus–outcome associations were counterbalanced both between and within groups. Throughout the session, both levers were retracted and magazine entries were recorded and separated into a stimulus period and a period of equal length prior to outcome delivery (Pre; 30 s).
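As an illustration of the random time 30 s schedule, the sketch below delivers an outcome in each second of a 2 min stimulus with probability 1/30, independently of behavior, which gives an average of 120/30 = 4 deliveries per stimulus presentation. The 1 s time step, the seed, and the function name are assumptions made for this example; the source does not specify how the schedule was programmed in Med-PC.

```python
import random


def random_time_deliveries(stimulus_s=120, mean_interval_s=30, seed=0):
    """Return the seconds (0-based) at which an outcome is delivered.

    Each second of the stimulus carries an independent delivery probability of
    1 / mean_interval_s (assumed 1 s resolution).
    """
    rng = random.Random(seed)
    p_per_second = 1.0 / mean_interval_s
    return [t for t in range(stimulus_s) if rng.random() < p_per_second]


deliveries = random_time_deliveries()
expected_per_stimulus = 120 / 30  # mean number of outcomes per 2 min stimulus
print(deliveries, expected_per_stimulus)
```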
Instrumental training.
Following backward or forward training, all rats received instrumental training. Left and right lever press actions (A1 and A2) were trained to deliver the two outcomes (O1 and O2) in separate sessions daily. The order of the sessions was counterbalanced, as were the action–outcome relationships, which were also counterbalanced with the stimulus–outcome relationships established during backward/forward training. Each session ended when 20 outcomes were earned or when 30 min had elapsed. For the first 2 d, lever pressing was continuously reinforced (i.e., each action earned an outcome). Then, the probability of the outcome given action was gradually shifted over days using increasing random ratio (RR) schedules: a RR5 schedule (p = 0.2) was used on days 3–5 and a RR10 schedule (p = 0.1) was used on days 6–8.
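The schedule progression described above can be sketched as follows: on a random ratio schedule each lever press is reinforced with a fixed probability p, so the expected number of presses per outcome is 1/p (5 on RR5, 10 on RR10). The function name, the seed, and the press cap are hypothetical, and the 30 min session limit is not modeled.

```python
import random


def simulate_random_ratio(p, max_outcomes=20, max_presses=100_000, seed=0):
    """Count the presses needed to earn max_outcomes on a random ratio schedule."""
    rng = random.Random(seed)
    presses = outcomes = 0
    while outcomes < max_outcomes and presses < max_presses:
        presses += 1
        if rng.random() < p:  # each press is reinforced with probability p
            outcomes += 1
    return presses, outcomes


# Probability of reinforcement per press on each training day, as in the text.
SCHEDULE_BY_DAY = {day: 1.0 for day in (1, 2)}             # continuous reinforcement
SCHEDULE_BY_DAY.update({day: 0.2 for day in (3, 4, 5)})    # RR5
SCHEDULE_BY_DAY.update({day: 0.1 for day in (6, 7, 8)})    # RR10

for day, p in sorted(SCHEDULE_BY_DAY.items()):
    print(day, p, simulate_random_ratio(p, seed=day))
```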
Transfer test.
After the final day of RR10 training, rats received a transfer test, during which both levers were inserted into the box, but no outcomes were delivered. Responding was extinguished on both A1 and A2 actions for 8 min to establish a low rate of baseline performance. Then, the rats received four 1 min presentations of each stimulus in the following order: tone–clicker–clicker–tone–clicker–tone–tone–clicker. The ITI was set at 3 min. Magazine entries and lever-pressing rates were recorded and separated into Pre-S and S periods. Rats in Experiment 1 received two transfer tests across 2 consecutive days in the manner just described. Rats in Experiment 2 received a second transfer test conducted after reversal training. This second test was identical to the one just described.
Devaluation test.
The devaluation test was conducted to assess the effect of LHb lesion or disruption of the LHb–RMTg pathway on choice following changes in reward value. The test was conducted in all rats after their final transfer test. Before this test, rats received 2 d of instrumental retraining on the RR10 schedule. On the day of test, all rats received 1 h access to one of the two outcomes before being given a choice test. This test lasted 5 min, during which both levers were inserted into the box, but no outcomes were delivered. The lever-pressing rates were recorded throughout the test. The same procedure was repeated the following day except that the other outcome was devalued. The order of the outcome devalued was counterbalanced within groups.
Reversal training.
Reversal training was administered to the forwardly trained rats used in Experiment 2. It was identical to the initial forward training except that the stimulus–outcome relationships were reversed. Thus, S1 was immediately followed by the delivery of O2 and S2 was followed by the delivery of O1. Eleven sessions of reversal training were given and four sessions of RR10 instrumental training were also provided to reinstate lever-press responding.
Histology
At the end of the behavioral procedures, rats were deeply anesthetized with an injection of sodium pentobarbital and perfused through the heart with ice-cold 4% paraformaldehyde in 0.1 M phosphate buffer. Brains were extracted and postfixed in 4% paraformaldehyde overnight. Using a Vibratome (Leica), we cut 40 μm free-floating coronal sections through the LHb and the RMTg. LHb sections collected in Experiments 1 and 2 were then stained with cresyl violet. The area of lesion was determined under a microscope by a trained observer, who was unaware of the treatment groups, with boundaries defined by the atlas of Paxinos and Watson (Paxinos and Watson, 2006). Animals with inaccurate or extensive damage at the lesion site were excluded from the statistical analysis. Based on these criteria, five animals were excluded. Brain sections collected in Experiment 3 were processed for immunofluorescence.
Immunofluorescence staining
Individualized free-floating sections from the RMTg (Exp. 3) were rinsed 10 min with PBS three times, incubated for 40 min in PBS containing 0.2% Triton X-100 (Sigma-Aldrich), and rinsed 10 min three times with PBS again. The eGFP signal in the RMTg was amplified through incubation (24 h, 4°C) with polyclonal chicken anti-eGFP primary antibody (1:500; #GFP-1020, Aves Labs) diluted in PBS. The following day, RMTg sections were rinsed 10 min with PBS three times and incubated 60 min at room temperature with donkey anti-chicken Alexa 488 (1:400; #703-745-155, Jackson ImmunoResearch Laboratories).
Individualized free-floating sections from the LHb (Exp. 3) were rinsed 10 min with PBS three times, incubated for 40 min in PBS containing 0.5% Triton X-100 and 10% horse serum, and rinsed another three times with PBS. The sections were then incubated (48 h, 4°C) with polyclonal mouse anti-Cre recombinase (1:1000; #MAB3120, Merck Millipore) diluted in PBS containing 0.2% Triton X-100 and 2% horse serum. Two days later, the sections were rinsed 10 min with PBS three times and incubated 60 min at room temperature with donkey anti-mouse Alexa 488 (1:400; #R37114, Thermo Fisher Scientific) diluted in PBS.
Following the second incubation, all sections were rinsed three times for 10 min in PBS, mounted on Superfrost Plus-coated slides (Thermo Fisher Scientific) and let dry for ∼10 min before being coverslipped in Vectashield fluorescence medium (Vector Laboratories).
Immunofluorescence analysis
The extent of the viral infection in the RMTg was assessed using confocal microscopy (FV1000, Olympus). For each subject, six sections were selected along the AP axis to encompass the RMTg. A trained observer blind to subjects' group designation determined the extent of infection (i.e., region where cell bodies were showing viral expression) on each section using the boundaries defined by Paxinos and Watson (Paxinos and Watson, 2006). Confocal microscopy was also used to determine the number of Cre-recombinase-positive neurons in the LHb. For each subject, one section was selected and a trained observer blind to subjects' group designation determined the number of Cre-recombinase-positive neurons in each section using the boundaries defined by Paxinos and Watson (Paxinos and Watson, 2006). Two animals in the control group (ACSF infusion in the LHb) were excluded due to a low level of infection with the AAV-Cre in the RMTg. Interestingly, these animals also failed to show Cre-recombinase expression in the LHb. Two animals were also excluded from the Casp3-infused group due to significant expression of Cre-recombinase in the LHb (the mean numbers of Cre-positive neurons per mm² in these animals were 11.49 and 12.02).
Experimental design and statistical analysis
We used a between-subjects experimental design for all experiments. Experiments 1 and 2 compared performance of LHb-lesion animals to that of control sham-lesion animals, whereas Experiment 3 contrasted performance of animals with a disrupted LHb–RMTg network to that of animals with an intact network. The analyses of Pavlovian responding (i.e., forward and backward training, reversal training) included training days and period (i.e., stimulus vs prestimulus) as within-subject factors while the analyses of instrumental responding used only training days as a within-subject factor. The transfer tests were analyzed with “Same” and “Different” responding as the within-subject factor while the choice tests were analyzed with “Valued” and “Devalued” responding as the within-subject factor. Figure 5C,D added an additional within-subject factor: prereversal test and postreversal test. All experiments included ≥8 animals per group after histological exclusion. The differences between groups or stimuli/compounds were analyzed by means of a planned orthogonal or nonorthogonal contrast testing procedure, which used the Bonferroni inequality method to control the experiment-wise error rates. Within-session changes of responding were assessed by a planned linear trend analysis. All these procedures and analyses have been described by Hays (Hays, 1963) and were conducted in PSY software (School of Psychology, The University of New South Wales, Australia). The type-I error rate was controlled at 0.05 for each contrast tested.
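To make the logic of the Group × (Same vs Different) interaction contrast concrete, the sketch below uses the textbook equivalence for a two-group design: the interaction can be tested by comparing per-animal Same minus Different difference scores between groups, with F(1, N − 2) equal to the squared pooled-variance t statistic. This is an illustration only. It does not reproduce the PSY software, it omits the Bonferroni adjustment, and the function name and data are made up.

```python
from statistics import mean, variance


def interaction_contrast_f(diff_group1, diff_group2):
    """Return (F, (df1, df2)) for a between-group comparison of difference scores.

    diff_group1 / diff_group2: per-animal Same minus Different scores.
    """
    n1, n2 = len(diff_group1), len(diff_group2)
    df_error = n1 + n2 - 2
    pooled_var = (
        (n1 - 1) * variance(diff_group1) + (n2 - 1) * variance(diff_group2)
    ) / df_error
    t = (mean(diff_group1) - mean(diff_group2)) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return t ** 2, (1, df_error)


# Made-up difference scores for two small groups.
f_value, df = interaction_contrast_f([1, 2, 3], [4, 5, 6])
print(f_value, df)
```

With 11 and 10 animals per group, the error degrees of freedom from this function are 19, matching the F(1,19) form of the group comparisons reported for Experiment 1.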
Results
LHb lesions abolish the effect of outcome-specific inhibitors on choice
A stimulus trained as a negative predictor of a specific appetitive outcome has been reported to bias choice away from actions earning that outcome and toward actions delivering other outcomes (Laurent et al., 2015, 2016; Laurent and Balleine, 2015). Therefore, as is clear from these studies, negative predictors reverse the traditional effects produced by positive predictors: they leave unaffected performance on actions earning the outcome that they predict will not occur while increasing responding on actions delivering other outcomes. As elaborated further in these papers, the importance of the latter effect is that it provides evidence for the development of inhibitory action–outcome associations that share the same predictive content with the Pavlovian inhibitors (i.e., in both cases the action or the stimulus predicts the absence of a specific outcome), providing the basis for a consistent explanation for the effects of excitors and inhibitors on action selection via an outcome–response account of specific transfer (Balleine and Ostlund, 2007).
Experiment 1 used this property to determine whether the LHb is required to establish or use outcome-specific conditioned inhibition in an appetitive paradigm. A set of naive rats received either bilateral electrolytic lesions of the LHb (Lesion, n = 10; Fig. 1) or control sham lesion (Sham, n = 11). All rats were then given backward training in which the delivery of two food outcomes, O1 and O2, preceded the presentations of two distinct stimuli, S1 and S2. The delivery of O1 always preceded the presentation of S1. The delivery of O2 always preceded the presentation of S2. We and others have found that such an arrangement imbues S1 and S2 with inhibitory properties, acting as negative predictors of O1 and O2, respectively (Delamater et al., 2003; Laurent et al., 2015; Laurent and Balleine, 2015). After backward conditioning, rats received instrumental training, during which they learned to perform two lever-press actions, A1 and A2, to deliver the two outcomes: A1 earned O1; A2 earned O2. Finally, we assessed the influence of S1 and S2 on choice between A1 and A2 in two tests conducted in the absence of outcome delivery. This enabled us to assess the effects of the inhibitory stimuli on choice in the absence of programmed feedback.
Backward training occurred as expected (Fig. 2A). It did not reveal any difference between the two groups of rats (Sham vs Lesion, F < 0.1), it did not increase conditioned responding across days (F < 1.7), and it produced similar levels of responding whether the stimuli were present or absent (Pre vs S, F < 0.2). Interestingly, the tendency early in training for higher responding in the presence of the stimuli than in their absence was reversed by the end of that training (F(1,19) = 99.76; p < 0.05). This result suggests that the backward arrangement turned S1 and S2 into inhibitors of their respective outcomes. Instrumental training was successful (Fig. 2B), as lever-press responding increased as the ratio parameters increased across days (F(1,19) = 527.69, p < 0.05). LHb lesions had no influence on instrumental responding (F's < 0.3).
The data of greatest interest were those from the two tests in which we assessed the influence of S1 and S2 on choice between the two instrumental actions. This influence is plotted in Figure 2C as the mean number of lever presses per minute when the stimuli served as specific inhibitors of either the same outcome as the action (Same) or the outcome that differed from the action (Different). Thus, A1 was identified as “Same” and A2 as “Different” in the presence of S1 and, conversely, A2 was identified as “Same” and A1 as “Different” during S2. Further, baseline responding was subtracted from these rates of responding as it was similar in the two groups of rats (F < 1.2). Baseline responding was defined as the mean number of lever presses per minute on both actions in the absence of the stimuli. Subtracting this responding enabled us to reveal the net increase in choice performance over baseline; i.e., the net effect of the predictive stimuli. Inspection of the figure clearly indicates that the pattern of choice triggered by the two stimuli was influenced by LHb lesion.
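The Same/Different coding and baseline subtraction described above can be sketched as follows. The response rates, the baseline value, and the function name are made up for illustration; only the coding rule (an action is "Same" when it earns the outcome associated with the stimulus currently present, and "Different" otherwise) comes from the text.

```python
def transfer_scores(rates, baseline, stimulus_outcome, action_outcome):
    """Return mean baseline-subtracted (Same, Different) response rates.

    rates:            {stimulus: {action: presses per minute during that stimulus}}
    baseline:         mean presses per minute on both actions without stimuli
    stimulus_outcome: outcome associated with each stimulus
    action_outcome:   outcome earned by each action
    """
    same, different = [], []
    for stimulus, per_action in rates.items():
        for action, rate in per_action.items():
            net = rate - baseline  # net change in responding over baseline
            if action_outcome[action] == stimulus_outcome[stimulus]:
                same.append(net)
            else:
                different.append(net)
    return sum(same) / len(same), sum(different) / len(different)


# Hypothetical data for one animal (presses per minute).
rates = {"S1": {"A1": 4.0, "A2": 8.0}, "S2": {"A1": 7.0, "A2": 5.0}}
same, different = transfer_scores(
    rates,
    baseline=4.0,
    stimulus_outcome={"S1": "O1", "S2": "O2"},
    action_outcome={"A1": "O1", "A2": "O2"},
)
print(same, different)  # 0.5 3.5
```

In this made-up example, responding on the Different action rises above baseline while responding on the Same action stays near it, which is the pattern expected for inhibitory stimuli.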
Overall, Sham animals displayed less total responding than Lesion animals (F(1,19) = 4.40; p < 0.05) but there was no difference in performance on the Same and Different actions (F < 0.1). More importantly, there was a significant interaction: responding on the Same action relative to responding on the Different action differed between groups (F(1,19) = 7.397, p < 0.05). Unfortunately, simple effect analyses failed to reveal the source of this interaction for either the Sham (F(1,10) = 4.296; p = 0.065) or the Lesion group (F(1,9) = 3.20; p = 0.102). We therefore conducted additional analyses to determine how the stimuli influenced responding on the two actions relative to baseline responding. In the Sham group, performance on the Different action was substantially higher than during baseline (F(1,10) = 7.273; p < 0.05), whereas performance on the Same action did not differ from baseline (F < 0.07). Importantly, rats with LHb lesions did not show this effect or, indeed, any preference for one action over the other. Instead their responding was elevated above baseline on both the Same (F(1,9) = 10.843; p < 0.05) and Different actions (F(1,9) = 5.233; p < 0.05). As such, two main conclusions can be drawn from the present experiment. First, stimuli that act as negative predictors of a particular outcome bias choice away from actions that earn that outcome and toward actions that earn different outcomes. Second, the influence of inhibitors on choice between actions is abolished by LHb lesion. This finding is consistent with the view that the LHb is required for conditioned inhibitors, i.e., stimuli that predict the absence of an appetitive outcome (Proulx et al., 2014), to guide choice between distinct courses of action.
LHb lesions spare the effect of outcome-specific excitors on choice
The previous experiment provides evidence that the LHb is essential for the functional effects of negative predictions in an appetitive paradigm; LHb lesions disrupted the influence of these predictions on choice between actions. However, it remains possible that this disruption was not specific to negative predictions but rather removed the ability of any predictor to guide choice. To address this possibility, Experiment 2 evaluated whether lesions of the LHb affect the influence of positive predictors on choice between actions. A naive cohort of rats received either electrolytic lesions of the LHb (Lesion; n = 12; Fig. 1C) or control sham lesions (Sham; n = 10). After surgery, the rats were given standard forward pairings between two stimuli, S1 and S2, and the delivery of two food outcomes, O1 and O2, respectively. They were then given training on the levers as previously described, after which they received a transfer test in which the influence of S1 and S2 on choice between A1 and A2 was assessed in a single session.
The forward pairing of S1–O1 and S2–O2 occurred smoothly (Fig. 3A); conditioned responding increased across sessions (F(1,21) = 6.77; p < 0.05) and this responding was higher in the presence of the stimuli than in their absence (F(1,21) = 290.06; p < 0.05). Further, the difference between these two periods (Pre vs S) increased with training (F(1,21) = 51.52; p < 0.05). Although there was no overall effect of group on responding or on the overall difference between the two periods of interest (F's < 3.1), LHb lesion did disrupt the extent to which the difference between the two periods increased across training (F(1,21) = 4.43; p < 0.05). Nevertheless, this disruption was subtle and appeared to be largely due to a slight decrease in responding across the last two sessions of training in the Lesion group; the difference in responding in the presence of the stimuli and in their absence late in training clearly indicates successful excitatory learning in the Lesion group. As before, instrumental training was similar in the two groups (F's < 0.8; Fig. 3B) and lever-press responding increased as the ratio parameters increased across days (F(1,21) = 257.56, p < 0.05).
The data from the transfer test are presented in Figure 3C as the mean number of lever presses per minute when the stimulus predicted the same outcome as the action (Same) and when it predicted a different outcome from the action (Different). Baseline responding did not differ between the two groups (F < 1.6) and was subtracted from Same and Different responding. Inspection of the figure suggests that LHb lesion had no effect on choice between actions. Responding on the Same action was increased relative to responding on the Different action overall (F(1,21) = 37.31, p < 0.05) and significantly in both Sham (F(1,10) = 13.94, p < 0.05) and Lesion (F(1,11) = 25.60, p < 0.05) groups. This pattern of responding was not influenced by damage to the LHb (F < 0.03) nor did the LHb lesion produce any overall effect on performance (F < 0.8). Consistent with the literature (Colwill and Rescorla, 1988; Dickinson and Balleine, 1994; Holmes et al., 2010), this experiment reveals, therefore, that a stimulus predicting a specific outcome biases choice toward an action earning the same outcome. Furthermore, this bias was unaffected by lesions of the LHb, confirming that the role of this brain region in Pavlovian-instrumental transfer is specific to the influence of conditioned inhibitors on choice and does not extend to conditioned excitors.
LHb lesions spare the effect of outcome devaluation on choice
So far, we have shown that the LHb is critical for the way conditioned inhibitors signaling the absence of specific appetitive outcomes bias choice between instrumental actions. However, it is well established that choice is also influenced by the value of the outcomes delivered by the instrumental actions (Dickinson and Balleine, 1994); an action delivering a valued outcome will be favored over an action earning a less valued outcome. Here, we assessed whether the LHb was required for this value-based decision-making process. Rats from Experiments 1 and 2 were given instrumental retraining after the final transfer tests and were then given two outcome devaluation choice tests over 2 consecutive days. Before each test, one of the training outcomes was devalued using sensory-specific satiety. This was achieved by giving free access to that outcome for 1 h before assessing choice between the two actions. To prevent any feedback and further learning, choice was evaluated under extinction conditions (i.e., no outcomes were delivered). The second choice test was conducted in the same fashion the following day except the other outcome was devalued.
Instrumental retraining occurred smoothly (Fig. 4A). There was an increase in lever presses across the two sessions (F(1,42) = 16.82; p < 0.05) and performance was similar whether rats had received LHb or sham lesion (F's < 2.4). The data from the two choice tests are presented in Figure 4B. The action delivering the valued outcome was performed at higher rates than the action earning the outcome that had just been devalued (F(1,40) = 28.78; p < 0.05) and lesions of the LHb had no effect on this bias in choice (F's < 2.26). The present experiment indicates, therefore, that the LHb does not mediate the influence of outcome devaluation on choice between goal-directed actions, indicating that it plays no role either in processing the relationship between actions and their respective outcomes, in the changes in outcome value induced by specific satiety, or in the integration of these factors in value-based decision-making. This finding is consistent with the results from the previous experiment; intact action–outcome associations are required for changes in the value of specific outcomes and for excitatory predictors of those outcomes to appropriately bias action selection and choice.
LHb lesions impair the reversal of Pavlovian contingencies
The previous experiments suggest that the LHb is critical for acquiring outcome-specific conditioned inhibition; LHb lesions abolish the ability of inhibitory stimuli to guide choice between actions. Here, we sought to confirm and expand on this role by assessing whether the LHb is necessary during the reversal of appetitive stimulus–outcome contingencies. Such reversal can be studied by initially training animals to learn that two stimuli predict two distinct food outcomes (e.g., S1–O1 and S2–O2) and then reversing these contingencies (i.e., S1–O2; S2–O1). Although such reversal obviously requires learning about the new contingencies, it also involves inhibiting the old ones (Rescorla, 2007); i.e., subsequent to reversal, to successfully respond on the new contingencies the animals must now encode that S1 no longer predicts O1 and that S2 no longer predicts O2. We tested whether the LHb was involved in this specific form of negative prediction. To do so, we gave reversal training to the animals used in Experiment 2, i.e., those that had received forward training (Fig. 3), such that S1 now predicted O2 and S2 signaled O1. We evaluated the impact of this reversal by again assessing how the two stimuli influenced choice between the two previously trained actions in a transfer test.
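The logic of reversal described above can be illustrated with a textbook delta-rule (Rescorla-Wagner-style) update applied separately to each outcome. This is a sketch under assumptions, not the authors' model: the learning rate `alpha`, the number of trials, and the function `train` are hypothetical choices made for illustration. It shows that, once S1 has been paired with O1, switching to S1–O2 produces negative prediction errors for O1 on every reversal trial, which is the kind of signal the text attributes to inhibition of the old contingency.

```python
def train(V, stimulus, outcome, outcomes, alpha=0.2, trials=30):
    """Apply a delta-rule update to every outcome prediction on each trial.

    V maps (stimulus, outcome) to associative strength.
    Returns the list of per-trial prediction errors, keyed by outcome.
    """
    errors = []
    for _ in range(trials):
        trial_errors = {}
        for o in outcomes:
            target = 1.0 if o == outcome else 0.0  # delivered vs. omitted
            delta = target - V[(stimulus, o)]      # prediction error
            V[(stimulus, o)] += alpha * delta
            trial_errors[o] = delta
        errors.append(trial_errors)
    return errors


outcomes = ["O1", "O2"]
V = {(s, o): 0.0 for s in ["S1", "S2"] for o in outcomes}

# Initial training: S1 predicts O1.
train(V, "S1", "O1", outcomes)
initial_v = V[("S1", "O1")]

# Reversal: S1 now predicts O2, so O1 is omitted when S1 is presented.
reversal_errors = train(V, "S1", "O2", outcomes)

print("S1-O1 strength before reversal:", round(initial_v, 3))
print("First reversal-trial error for O1:", round(reversal_errors[0]["O1"], 3))
print("S1-O1 strength after reversal:", round(V[("S1", "O1")], 3))
print("S1-O2 strength after reversal:", round(V[("S1", "O2")], 3))
```

Note that this single-cue sketch only drives the old association back toward zero; it does not model net conditioned inhibition or any neural substrate.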
Reversal training was successful (Fig. 5A). Conditioned responding gradually increased across training (F(1,21) = 10.99; p < 0.05) and was greater in the presence of the stimuli than in their absence (F(1,21) = 199.94; p < 0.05). The difference between these two periods did not grow larger across sessions (F(1,21) = 4.02; p = 0.058), presumably because this difference was already substantial at the start of training. The LHb lesions had no effect on conditioned responding per se (F's < 0.2); however, our aim, using the Pavlovian-instrumental transfer test, was to evaluate what the rats had encoded. Before the test, the rats were given reminder sessions of instrumental training (Fig. 5B) to ensure substantial performance after the previous extinction tests (Figs. 3C, 4B). As expected, this training occurred smoothly and lever-press responding increased across sessions (F(1,21) = 13.23; p < 0.05). The LHb lesion had no effect on this responding (F's < 2.2).
The data from the transfer test are presented in Figure 5C,D. These data were compared against those obtained before reversal training (Fig. 3C) so as to evaluate directly how animals altered their performance according to the new stimulus–outcome contingencies. Performance is shown as the mean number of lever presses per minute when the stimulus predicted the same outcome as the action (Same) and when the stimulus predicted a different outcome from the action (Different) relative to the appropriate contingency. Sham-lesioned rats successfully updated and reversed the previously learned stimulus–outcome contingencies (Fig. 5C). Thus, although overall responding decreased during the second transfer test (F(1,10) = 5.156, p < 0.05), performance on the Same action was increased relative to performance on the Different action (F(1,10) = 14.24; p < 0.05) regardless of whether the test occurred after reversal training or not; i.e., the interaction with test was not significant (F < 3.1). Nevertheless, and unfortunately, the simple effect in the reversal test was not significant, likely due to the depressive effects of further extinction testing on performance. In contrast, LHb lesions prevented animals from altering their choice based on the new stimulus–outcome contingencies (Fig. 5D). Overall responding was similar across the two tests (F < 0.4) and performance on the Same action was higher than performance on the Different action (F(1,11) = 6.78; p < 0.05). However, the pattern of choice depended on whether choice was assessed before reversal training or after; the interaction with test was significant (F(1,11) = 15.32; p < 0.05). Before reversal training, the stimuli biased choice toward the action earning the same outcome as that predicted by the cue (F(1,11) = 25.60; p < 0.05). This bias was, however, absent after reversal training (F < 0.5) such that animals distributed their responses equally on both actions.
Although the need for multiple extinction tests reduced performance and, with it, statistical power, the present results suggest that the LHb may be necessary for the reversal of appetitive contingencies. They are consistent, therefore, with the argument that this region plays an important role in outcome-specific inhibition.
The LHb input to the RMTg is necessary for outcome-specific inhibition
The previous experiments indicate that the LHb may be necessary for outcome-specific inhibition in an appetitive paradigm. This is consistent with the view that the LHb regulates negative prediction error (Proulx et al., 2014). At a neural level, this regulation has been proposed to occur via LHb-mediated activation of inhibitory GABAergic neurons in the RMTg that in turn depresses activity of DA neurons in the VTA (Jhou et al., 2009; Brown et al., 2017). Experiment 3 tested this proposal by using a viral approach that would disrupt communication from the LHb onto the RMTg. To achieve this, a new set of naive rats was infused with AAV-Cre into the RMTg to label LHb neurons projecting to the RMTg with Cre-recombinase (Marchant et al., 2016). Once labeled, these neurons were ablated in one group of rats using the Casp3 virus (Casp3; n = 12; Yang et al., 2013). The neurons were not ablated in another group of control rats (Control; n = 8). All rats then received backward training and instrumental training in the manner described in Experiment 1. Following these two training stages, we administered a transfer test that assessed the influence of the two outcome-specific conditioned inhibitors, S1 and S2, on choice between A1 and A2.
In both groups of rats, immunofluorescence analysis revealed substantial expression of AAV-Cre in the RMTg (Fig. 6A,B). This expression is consistent with the equally substantial expression of Cre-recombinase-positive neurons in the LHb of control rats (Fig. 6C). Importantly, no such neurons were found in the LHb of Casp3-treated rats (Fig. 6D), indicating that the Casp3 virus was successful in ablating LHb neurons projecting to the RMTg (Fig. 6E). At the behavioral level, backward training occurred smoothly (Fig. 6F). Looking overall, conditioned responding decreased across days (F(1,18) = 52.48; p < 0.05) and was higher in the absence of the stimuli than in their presence (Pre vs S; F(1,18) = 16.96; p < 0.05). Interestingly, the tendency early in training for higher responding in the presence of the stimuli than in their absence was reversed by the end of that training (F(1,18) = 104.93; p < 0.05). There was no difference between the two groups of rats (Control vs Casp3; F < 2.8). These results suggest, therefore, that the backward arrangement turned S1 and S2 into inhibitors of their respective outcomes. Instrumental training (Fig. 6G) was successful, as lever-press responding increased as the ratio parameters increased across days (F(1,18) = 155.17; p < 0.05). There was no difference between the two groups of rats on this measure (F < 2.7).
The data from the transfer test are presented in Figure 6H as the mean number of lever presses per minute when the stimuli predicted the absence of either the same outcome as the action (Same) or the outcome that differed from the action (Different). Baseline responding was subtracted from these rates of responding as it was similar in the two groups of rats (F < 0.42). Clearly, the pattern of choice triggered by the negative predictors depended on the integrity of the projections from the LHb to the RMTg. Overall, responding was equivalent between the two groups of rats (F < 0.29) and there was no difference in total performance on the Same and Different actions (F < 0.47). However, critically, responding on the Same action relative to responding on the Different action differed between the two groups (F(1,18) = 7.78; p < 0.05). In the control group, the stimuli biased choice away from the action earning the outcome they negatively predicted and toward the action delivering the other outcome (Different > Same: F(1,18) = 6.20; p < 0.05). This pattern of choice was reversed in the Casp3 group such that the stimuli guided choice toward the action earning the outcome that they predicted would be omitted (F(1,18) = 10.45; p < 0.05). Thus, the negative predictors in the Casp3 group influenced choice in the same manner as positive predictors, such as those in Experiment 2. This finding is consistent with the view that LHb inputs to the RMTg are necessary for the functional effects of outcome-specific conditioned inhibition. It should be noted that this critical role for these LHb inputs was not apparent across backward training in the present experiment nor was it apparent in Experiment 1, in which electrolytic lesions were used: no deficit was observed during the training stage. This lack of deficit in performance confirms that inhibitory learning requires appropriate tests for it to be revealed. 
These tests include subtractive summation, retardation (Rescorla, 1969), or the outcome-specific transfer assessment used in the present study.
Finally, we examined whether projections from the LHb to the RMTg are necessary for value-based choice. To do so, we used the same sensory-specific satiety manipulation as that used in Experiments 1 and 2. Thus, following the transfer test, we gave instrumental reminder training, which was successful (Fig. 6I) although it did not reveal a significant increase in responding (F < 0.4), likely due to high levels of responding on the very first day of retraining. We then gave two postdevaluation choice tests (Fig. 6J). The action delivering the valued outcome was performed at higher rates than the action earning the outcome that had just been devalued (F(1,18) = 26.61; p < 0.05). There was a tendency for the Casp3 animals to display lower instrumental responding overall (F(1,18) = 4.13, p = 0.057) but, critically, performance on the valued and devalued actions remained similar in both groups (F < 1.1). This confirms that LHb inputs to the RMTg are critical for the influence of outcome-specific conditioned inhibitors, but not of outcome value, on choice between actions.
Discussion
The present experiments found that the LHb mediates choice between actions in the presence of outcome-specific inhibitors via its projections to the RMTg. In animals with an intact habenula, a negative predictor of a particular food outcome biased choice away from an action earning that outcome and toward an action earning a different outcome. This bias was absent in animals with LHb lesions or selective ablation of LHb neurons projecting to the RMTg. The impairment produced by these manipulations was specific to the negative prediction carried by the stimulus. A stimulus that positively predicted a particular food outcome biased choice toward an action earning that predicted outcome and in a similar manner whether the LHb had been damaged or not. The role played by the LHb was further clarified by the following findings: (1) the LHb is required to inhibit previous learning after the reversal of stimulus–outcome contingencies, and (2) the LHb and its inputs onto the RMTg do not mediate choice based on the reward value of the outcomes earned by the actions. Together, these findings are consistent with the claim that LHb neurons projecting to the RMTg mediate the encoding of outcome-specific inhibition and/or the influence of these inhibitors on choice between actions.
Experiments 1 and 2 assessed the role played by the LHb using localized electrolytic lesions. This method of assessment presents certain advantages, as it ensures durable silencing of the targeted brain region and it allows accurate determination of the extent of the damage. However, it also has disadvantages, such as the potential development of compensatory mechanisms. Such compensation is, however, unlikely to have contributed to the present findings. The LHb lesion altered the influence of inhibitors, but not of excitors, on choice. Further, although animals trained with the excitors showed no impairment initially, an impairment emerged when these stimulus–outcome contingencies were reversed. Another disadvantage of electrolytic lesions is that they damage fibers of passage as well as brain regions adjacent to the LHb, such as the paraventricular thalamic nucleus (PVT) and the medial habenula (MHb). However, such damage is unlikely to be the source of the present results because the main fiber bundles passing through the LHb mostly target inputs and outputs of the LHb (Tian and Uchida, 2015) and because the current consensus on the role played by the PVT and MHb does not predict a dissociation between excitors and inhibitors (Haight and Flagel, 2014; Proulx et al., 2014). More critically, Experiment 3 used a viral approach that selectively ablated LHb neurons projecting onto the RMTg while leaving intact local fibers and activity in adjacent structures. This approach generated a similar effect to the LHb lesion. Thus, we feel confident that the effects reported in the lesion experiments were derived from a loss of LHb function.
The current experiments confirmed that excitors and inhibitors exert an opposite influence over choice between actions (Laurent et al., 2015, 2016; Laurent and Balleine, 2015). In control animals, outcome-specific excitors biased choice toward actions earning the predicted outcome. This bias was reversed by the conditioned inhibitors, which guided choice away from the action earning the inhibited outcome, toward the action delivering an alternative outcome. Critically, LHb lesions produced highly selective impairments. They spared choice produced by the excitors and only influenced choice driven by the inhibitors. This influence was reproduced when we selectively ablated LHb neurons projecting onto the RMTg, suggesting that these neurons are required for outcome-specific inhibition. One surprising finding, however, was that rats with selective ablation of LHb inputs to the RMTg displayed a similar pattern of choice in the presence of the inhibitors to that induced by the excitors. Indeed, performance on the action delivering the outcome negatively predicted by the stimulus was greater than performance on the action earning the other outcome. A similar nonsignificant trend was observed in animals with the LHb lesion. It is not immediately clear why removing the inhibitory properties of the stimulus should turn that stimulus into an excitor, although inhibition induced by backward conditioning is highly sensitive to the delay between the presentation of the outcome and the stimulus (Maier et al., 1976). Short delays can result in the stimulus becoming an excitor, whereas long delays turn the stimulus into an inhibitor (Delamater et al., 2003; Laurent and Balleine, 2015). However, the development of these properties is not mutually exclusive; a backwardly trained stimulus may gain both excitatory and inhibitory properties (Cole and Miller, 1999).
It is possible, therefore, that the removal of the inhibitory properties via selective ablation of LHb inputs onto the RMTg unmasked excitatory properties of the stimuli, leading to a pattern of choice similar to that produced by excitatory stimuli.
The role of the LHb in the functional effects of inhibitory relationships between stimuli and outcomes was further confirmed using reversal learning. This task required animals to learn new stimulus–outcome contingencies but, to respond appropriately, they also needed to inhibit any previously acquired contingencies (Rescorla, 2007). Control animals were able to bias choice between actions according to the reversed contingencies. In contrast, rats with LHb lesions failed to show this effect. As these rats were initially able to modify choice based on the original stimulus–outcome contingencies, the impairment was not due to a deficit in Pavlovian conditioning per se. Rather, LHb lesions appeared to prevent inhibition of the original contingencies, which is consistent with the general claim that the LHb is involved in learning inhibitory relationships between stimuli and their associated outcomes.
Finally, we extended our understanding of the role played by the LHb in choice by showing that this role does not include choices driven by outcome value. Animals were able to select an action delivering a valued outcome over an action earning a devalued outcome to a similar extent whether the LHb or its inputs to the RMTg had been lesioned or not. This implies that the LHb and its inputs onto the RMTg are not required for learning specific action–outcome contingencies or for integrating those contingencies with changes in outcome value. Consistent with this, LHb-lesioned animals were able to choose appropriately in the presence of the excitors. Thus, the LHb and its inputs onto the RMTg do not contribute to choice per se but rather mediate the functional properties of outcome-specific conditioned inhibitors and the use of these specific stimuli to guide choice between actions.
If the LHb and its inputs onto the RMTg are strictly involved in processing inhibitory relations between stimuli and appetitive outcomes, it remains to be determined how this information is used to influence choice between actions. Evidence suggests that the nucleus accumbens shell (NAcS) is critical for choice driven by predictive stimuli (Corbit et al., 2001; Corbit and Balleine, 2011). Indeed, disrupting activity in the NAcS at the time of choice removes the influence of both excitatory and inhibitory stimuli (Laurent et al., 2015). Interestingly, the NAcS receives dense projections from midbrain DA neurons located in the VTA (Ikemoto, 2007). It is, therefore, possible that the encoding of inhibitory stimulus–outcome relationships in LHb neurons projecting to the RMTg allows activation of local GABAergic neurons to, in turn, generate a negative prediction error signal in the VTA by depressing the activity of local DA neurons (Tobler et al., 2003; Jhou et al., 2009; Proulx et al., 2014), thereby altering DA release in the NAcS and the pattern of choice. An alternative or potentially complementary possibility involves the same circuitry but highlights a role for GABAergic projection neurons in the VTA. These neurons have been shown to target the NAcS and to strongly regulate activity of cholinergic interneurons in that structure (Brown et al., 2012). These interneurons have been shown to be critical for choice between actions in the presence of excitatory or inhibitory stimuli (Bertran-Gonzalez et al., 2013; Laurent et al., 2014, 2015). Thus, the LHb could promote the influence of inhibitory cues on choice by modulating, directly or indirectly (Jhou et al., 2009; Omelchenko et al., 2009), the function of VTA GABAergic neurons that, in turn, control the pattern of activity of cholinergic interneurons in the NAcS to affect choice.
These possibilities are speculative; however, they point to a model in which outcome-specific inhibitors promote choice between actions by activation of neural pathways involving the LHb, the RMTg, the VTA, and the NAcS.
In conclusion, the present experiments reveal that LHb neurons projecting onto the RMTg are critical for outcome-specific inhibition in an appetitive setting. In the absence of activity in these neurons, appetitive inhibitors lose their ability to guide choice between actions and animals are unable to inhibit previously learned stimulus–outcome contingencies. The role of the LHb is specific to inhibitory Pavlovian contingencies; it is not involved in the influence of appetitive excitors or the instrumental contingency on choice. The role of the LHb and its inputs onto the RMTg identified here is, therefore, consistent with its proposed involvement in negative reward prediction errors and highlights the contribution of this brain region to predictive learning.
Footnotes
This work was supported by the Australian Research Council in the form of an Early Career Fellowship to V.L. (DE140100868), a Laureate Fellowship to B.W.B. (FL0992409), and a Discovery Project to both V.L. and B.W.B. (DP130103965). B.W.B. is supported by a Senior Principal Research Fellowship from the National Health and Medical Research Council of Australia (GNT1079561).
The authors declare no competing financial interests.
Correspondence should be addressed to Bernard Balleine, Decision Neuroscience Laboratory, 4th Floor Matthews Building, School of Psychology, University of New South Wales, Kensington, New South Wales 2052, Australia. bernard.balleine@unsw.edu.au