Abstract
The decision to perform, or not perform, actions known to lead to a rewarding outcome is strongly influenced by the current incentive value of the reward. Incentive value is largely determined by the affective experience derived during previous consumption of the reward—the process of incentive learning. We trained rats on a two-lever, seeking–taking chain paradigm for sucrose reward, in which responding on the initial seeking lever of the chain was demonstrably controlled by the incentive value of the reward. We found that infusion of the μ-opioid receptor antagonist, CTOP (d-Phe-Cys-Tyr-d-Trp-Orn-Thr-Pen-Thr-NH2), into the basolateral amygdala (BLA) during posttraining, noncontingent consumption of sucrose in a novel elevated-hunger state (a positive incentive learning opportunity) blocked the encoding of incentive value information normally used to increase subsequent sucrose-seeking responses. Similar treatment with δ [N, N-diallyl-Tyr-Aib-Aib-Phe-Leu-OH (ICI 174,864)] or κ [5′-guanidinonaltrindole (GNTI)] antagonists was without effect. Interestingly, none of these drugs affected the ability of the rats to encode a decrease in incentive value resulting from experiencing the sucrose in a novel reduced-hunger state. However, the μ agonist, DAMGO ([d-Ala2, NMe-Phe4, Gly5-ol]-enkephalin), appeared to attenuate this negative incentive learning. These data suggest that upshifts and downshifts in endogenous opioid transmission in the BLA mediate the encoding of positive and negative shifts in incentive value, respectively, through actions at μ-opioid receptors, and provide insight into a mechanism through which opiates may elicit inappropriate desire resulting in their continued intake in the face of diminishing affective experience.
Introduction
Goal-directed actions are the means through which we exert control over our environment in service of our desires. The decision to engage in such actions is largely controlled by the incentive value of the goal—the degree to which it is desired (Balleine and Dickinson, 1998). Incentive learning is the process through which this value is established so that it may be used to guide future reward seeking. This occurs when an individual has a novel affective experience with the reward (Balleine, 1992, 2001). Interestingly, recent evidence suggests that the neural processes underlying the pleasure elicited during reward contact (‘liking’) and the attribution of incentive value to that reward are dissociable (Wassum et al., 2009).
Endogenous opioid peptides are thought to convey the affective properties of food rewards, based on the palatability-enhancing effects of opiates (Doyle et al., 1993; Kelley et al., 2002), and μ-opioid receptor-mediated hedonic hotspots have been localized to the nucleus accumbens shell and ventral pallidum (Peciña and Berridge, 2005; Smith and Berridge, 2005, 2007). Popular constructs of motivated behavior highlight the dissociation of this reward ‘liking’ and ‘wanting’ (Robinson and Berridge, 2001, 2003). Such theories focus on ‘wanting’ as the general motivational influence that reward-associated cues exert on instrumental actions, mediated primarily through dopamine systems (Berridge, 2007), but also involving amygdala central nucleus opioids (Mahler and Berridge, 2009). However, as previously mentioned, goal-directed actions are heavily influenced by the incentive value of the reward (Balleine and Dickinson, 1998), which is dependent on prior reward experience, and is therefore distinct from the general motivational effects of cues (Dickinson and Balleine, 1994; Corbit and Balleine, 2005).
As alluded to above, this incentive learning process may also be dissociable from ‘liking’. Opioid receptor activation in hedonic hotspots was found to be necessary for deprivation-induced palatability increases, but not for incentive value elevations induced by such deprivation (Wassum et al., 2009). Conversely, activation of opioid receptors in the basolateral amygdala (BLA) was necessary for encoding an incentive value increase, but not for expression of increased palatability (Wassum et al., 2009).
Such a tripartite role of endogenous opioid systems in mediating the consummatory effects of rewards (‘liking’), the process by which such ‘liking’ effects are encoded as incentive value to guide reward-seeking actions (incentive learning), and the invigorating effects of reward-paired stimuli (‘wanting’), may explain the intensely addictive nature of opiates. With respect to incentive learning, this account would be strengthened by evidence that endogenous activation of BLA opioid receptors promotes positive, but not negative, shifts in incentive value, since this might explain why drugs continue to be sought despite negative consequences of their use. Moreover, one might predict exogenous activation of opioid receptors in the BLA to impede the encoding of negative incentive value shifts. These hypotheses were tested here, together with an examination of the opioid receptor subtype in the BLA mediating incentive learning, given the presence of μ-, δ-, and κ-opioid receptors in this region (Mansour et al., 1994; Ding et al., 1996).
Materials and Methods
General approach
An incentive learning paradigm was used wherein rats were trained on a seeking–taking chain of actions to press one lever to gain access to a second lever that delivered a sucrose pellet reward (Balleine et al., 1995; Corbit and Balleine, 2003; Wassum et al., 2009). The importance of this seeking–taking chain for assessment of incentive value, in contrast to the more common single-lever instrumental paradigms, is highlighted by data showing that responding on the lever distal to reward delivery is sensitive to the incentive value of the reward while being relatively immune to the general activational effects of motivational state (in this case, hunger state) and to reward-related cues (Balleine et al., 1995; Corbit and Balleine, 2003). That is, whereas changes in food-deprivation state have immediate general effects on activity on the lever proximal to reward delivery when tested under nonrewarded conditions, changes in responding on the distal-seeking lever require that the rat learn about the value change of the specific outcome earned by the action through prior experience of the reward in the altered motivational state, i.e., incentive learning (Balleine et al., 1995; Corbit and Balleine, 2003; Balleine et al., 2005). For example, an increase in reward-seeking vigor occurs only after the animal has consumed reward in the new hungry state and in so doing has learned that, in this state, the reward is more palatable and now retains a higher incentive value.
Experiment 1 (Table 1) was designed to replicate these previous findings and allow us to focus on the distal reward-seeking response as a measure of the incentive value of the sucrose outcome the rat is working to receive. In experiment 2 (Table 2), rats were trained under low-deprivation conditions and then given the opportunity for positive incentive learning by allowing them to consume noncontingent sucrose pellets when highly food-deprived and in so doing learn that, in this state, the sucrose is more palatable and retains a higher incentive value. Opioid antagonists or vehicle were administered into the BLA immediately before this learning phase. The next day, their responding on the seeking–taking chain was measured, off drug, under nonrewarded conditions while in the heightened food-deprived state to test the effects of the previous day's opportunity for incentive learning on reward-seeking actions. In experiment 3 (Table 3), the food deprivation state was reversed; animals were trained in a high food-deprived state and then given an opportunity for negative incentive learning in a low-deprivation state (i.e., to learn that in the low-deprivation state, sucrose is less palatable and therefore less valuable). Again, opioid antagonists or vehicle were administered into the BLA immediately before this learning phase. This incentive learning opportunity was then followed by a nonrewarded test, off drug, in this low-deprivation condition. Experiment 4 followed from experiment 3 to assess the effects of the μ-opioid receptor agonist DAMGO ([d-Ala2, NMe-Phe4, Gly5-ol]-enkephalin) on negative incentive learning.
Subjects
Male Long–Evans rats (experiment 1, n = 29; experiment 2, n = 35; experiment 3, n = 32; experiment 4, n = 23; Charles River Laboratories) were group housed and handled for 3 d before training. Rats had ad libitum access to tap water in the home cage and were fed ∼3 h after each day's training session according to the deprivation schedules described below. All procedures were conducted in accordance with the NIH Guide for the Care and Use of Laboratory Animals and approved by the UCLA Institutional Animal Care and Use Committee.
Apparatus and training
Training and testing took place during the light phase of the 12:12 h light:dark cycle in eight Med Associates operant chambers described previously (Corbit and Balleine, 2003).
Heterogeneous seeking–delivery chain training
Briefly, rats were trained to earn 45 mg sucrose pellets (Bioserv) on a heterogeneous seeking–taking chain. The training procedures were similar to those we have previously used (Corbit and Balleine, 2003; Wassum et al., 2009). Each session started with the illumination of the house light and insertion of the levers where appropriate and ended with the retraction of the levers and turning off of the house light. Rats received only one training session per day.
Magazine training.
Rats received 3 d of magazine training in which they were exposed to noncontingent sucrose deliveries (20 outcomes over 30 min) in the operant chamber with the levers retracted, to learn where to receive the sucrose pellet reward.
Single-action instrumental training.
Rats were then given 3–4 d of single-action instrumental training on the lever to the right of the magazine, with sucrose delivered on a continuous reinforcement schedule. Each session lasted until 20 outcomes had been earned or 30 min had elapsed.
Training on the reward-delivery chain.
Following single-action instrumental training, the seeking lever (i.e., the lever to the left of the magazine) was introduced into the chamber, initially in the absence of the delivery lever. Rats (n = 2) that failed to acquire the chain in experiment 2 (20 earned outcomes within 30 min) were dropped from the experiment.
Surgery
Standard stereotaxic procedures were used for implantation of bilateral guide cannulae (Wang et al., 2005; Wassum et al., 2009). Rats were anesthetized with isoflurane (4–5% induction, 1–2% maintenance) and implanted bilaterally with 22 gauge stainless steel guide cannulae (Plastics One) 1 mm above the intended injection site in the BLA (coordinates from bregma and the skull surface: anterior-posterior, −3.0; medial-lateral, ±5.1; dorsal-ventral, 8.0).
Experiment 1
In experiment 1 (Table 1), rats, 3 h food deprived, were trained as above to press a seeking lever to gain access to a taking lever, a single press on which delivered sucrose. After training, all rats were placed in the operant chamber with the levers retracted. Half the rats (n = 14) were given 30 noncontingent sucrose pellet presentations over 40 min, and the unexposed group (n = 15) did not receive any pellets. The next day, still 3 h food deprived, all rats were tested for their responding on the chain under nonrewarded conditions for 5 min. This nonrewarded test was conducted just as in training, with rats responding on the seeking lever on random ratio 4 (RR-4) to receive the second taking lever, which was retracted once pressed; no reward was delivered. Rats were then retrained for 2 d on the 3 h food-deprived schedule. For the second phase, all rats were switched to a 23 h food-deprived state and placed in the operant chamber with the levers retracted and either given 30 noncontingent sucrose pellet presentations over 40 min (an opportunity for incentive learning) or no pellets, as before. All rats were then tested the following day, still 23 h food deprived, for their responding in the chain under nonrewarded conditions. The order of the deprivation manipulation was not counterbalanced, based on our preliminary data indicating that experience with the outcome in the increased deprivation state in the first test significantly impacted performance in the second testing series.
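The RR-4 contingency on the seeking lever can be illustrated with a short simulation. This is only a sketch under a stated assumption: that a random-ratio schedule reinforces each press independently with probability 1/ratio, so that on average four seeking presses produce the taking lever. The function names and the seed are hypothetical and are not taken from the original software.

```python
import random


def presses_until_taking_lever(rng, ratio=4):
    """Count seeking-lever presses until the taking lever is inserted.

    Assumption: each press is reinforced independently with
    probability 1/ratio (a random-ratio schedule).
    """
    presses = 1
    while rng.random() >= 1.0 / ratio:
        presses += 1
    return presses


def mean_presses(n_cycles=10000, ratio=4, seed=0):
    """Average number of seeking presses per chain cycle over many cycles."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    total = sum(presses_until_taking_lever(rng, ratio) for _ in range(n_cycles))
    return total / n_cycles
```

Over many simulated cycles the mean should settle near the nominal ratio of 4, although any individual cycle may require far fewer or far more presses.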
Experiment 2
For experiment 2 (Table 2), rats, 3 h food deprived, were trained as above to press a lever to gain access to a second lever, a single press on which delivered a sucrose pellet. Rats were trained before surgery for cannula implantation, then after surgery rats were retrained on the final schedule for 2 d. After the last postsurgery training session, all rats were maintained 3 h food deprived and received an infusion of either intra-BLA vehicle (n = 14), N, N-diallyl-Tyr-Aib-Aib-Phe-Leu-OH (ICI 174,864; n = 6), 5′-guanidinonaltrindole (GNTI; n = 8), or d-Phe-Cys-Tyr-d-Trp-Orn-Thr-Pen-Thr-NH2 (CTOP; n = 7) immediately before being placed in the operant chamber with the levers retracted and given 30 noncontingent sucrose pellet presentations over 40 min (the revaluation). The next day, still 3 h food deprived but devoid of drug, all rats were tested for their responding on the chain under nonrewarded conditions for 5 min. This nonrewarded test was conducted just as in training, with rats responding on the seeking lever on RR-4 to receive the second taking lever, which was retracted once pressed; no reward was delivered. Rats were then retrained for 2 d on the 3 h food-deprived schedule. For the second phase, all rats were switched to a 23 h food-deprived state and received an infusion of the same drug they received during the first phase of the experiment immediately before being placed in the operant chamber with the levers retracted and given 30 noncontingent sucrose pellet presentations over 40 min (the opportunity for positive incentive learning). All rats, still 23 h food deprived, were then tested for their responding in the chain, off drug, under nonrewarded conditions the following day.
Experiment 2 was run in two sets of two replications with the effects of ICI 174,864 and CTOP being compared against vehicle in the first set and the effects of GNTI compared against vehicle in the second. The vehicle groups were collapsed, as a deprivation-by-replication analysis revealed no main effect of replication (F(3,10) = 1.033, p = 0.419).
Experiment 3
In experiment 3 (Table 3), rats, 23 h food deprived, were trained to press a lever to gain access to a second lever, a single press on which delivered a sucrose pellet. Rats were trained before surgery for cannula implantation, then retrained on the final schedule for 2 d after surgery. At test, rats were maintained 23 h deprived and allowed to consume the sucrose after an infusion of CTOP (n = 7), ICI 174,864 (n = 5), GNTI (n = 7), or vehicle (n = 13) into the BLA. The following day, the effect of this revaluation on seeking was tested in a nonrewarded test, conducted off drug in the same 23 h deprived state. After retraining for 2 d, the testing sequence was repeated but with rats being reexposed to the outcome 3 h food deprived following the same drug treatment (the opportunity for negative incentive learning). The effect of this negative incentive learning manipulation on reward seeking actions was then evaluated in a nonrewarded test, conducted in the same 3 h deprived state off drug.
Similar to experiment 2, experiment 3 was run in two replications, with the effects of ICI 174,864 and CTOP being compared against vehicle in the first and the effects of GNTI compared against vehicle in the second. The vehicle groups were collapsed, as a deprivation-by-replication analysis revealed no main effect of replication (F(1,11) = 0.232, p = 0.640).
Experiment 4
In experiment 4, rats, 23 h food deprived, were trained to press a lever to gain access to a second lever, a single press on which delivered sucrose. Rats were trained before surgery, then after surgery they received 2 d of training on the final chain schedule. During the test, rats were maintained 23 h deprived and allowed to consume the sucrose after an infusion of either DAMGO (n = 12) or vehicle (n = 11) into the BLA. The following day, the effect of the revaluation on seeking was tested in a nonrewarded test in the 23 h food-deprived state. After retraining for 2 d, the testing sequence was repeated but with rats being reexposed to the outcome 3 h food deprived following the same drug treatment and then tested 3 h food deprived, off drug, in the nonrewarded test.
Drug administration
All drugs were obtained from Tocris Bioscience and dissolved in sterile water vehicle. Using a microinfusion pump, drugs were infused bilaterally into the BLA in a volume of 0.5 μl over 1 min via an injector inserted into the guide cannula and fabricated to protrude 1 mm ventral to its tip. Injectors were left in place for at least 1 additional minute to ensure full infusion. Immediately thereafter, rats were placed in operant boxes for the noncontingent delivery of sucrose pellets. The doses of CTOP (1 μg), a μ-selective opioid receptor antagonist, ICI 174,864 (10 μg), a δ-selective opioid receptor inverse agonist (Cotton et al., 1984), and GNTI (0.5 μg), a κ-specific opioid receptor antagonist (Jones and Portoghese, 2000), were selected on the basis of their affinities for their respective receptors (CTOP, 9.7 pKi; ICI 174,864, 7.4 pKi; GNTI, 9.9 pKi) and our previous work with the nonselective opioid receptor antagonist naloxone (pKi of 9.0, 7.2, and 8.0 at the μ, δ, and κ receptors, respectively), which, when infused into the BLA at a dose of 1 μg, was shown to block positive incentive learning (Wassum et al., 2009). The dose of DAMGO (0.1 μg), a μ-selective agonist (8.7 pKi), was selected based on previous evidence that this dose is effective in reward-related behaviors (Mahler and Berridge, 2009).
Data analysis
Lever-pressing data are presented as a percentage of baseline response rates, with the baseline being the average rate of performance during the last 2 training sessions before the test. For experiment 1, the seeking response rate data, normalized to baseline levels, were analyzed with a two-way repeated-measures ANOVA with within-subjects variable deprivation (control, 3 vs 23 h deprived) and between-subjects variable exposure. Bonferroni post hoc analyses, correcting for multiple comparisons, were used to evaluate the effects of deprivation within each exposure group. Within the exposed group, we looked at the effects of deprivation over the time course of the unrewarded test with a two-way repeated-measures ANOVA with within-subjects variables deprivation and time bin (five 1 min bins).
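The baseline normalization described above can be written as a small helper. This is a minimal sketch, not the authors' analysis code; the function name and argument names are hypothetical, and rates are assumed to be in presses per minute.

```python
def percent_of_baseline(test_rate, last_two_training_rates):
    """Express a test response rate as a percentage of baseline.

    Baseline is the mean response rate over the last two training
    sessions before the test, as described in the text.
    """
    baseline = sum(last_two_training_rates) / len(last_two_training_rates)
    if baseline <= 0:
        raise ValueError("baseline response rate must be positive")
    return 100.0 * test_rate / baseline
```

For example, a rat that pressed at 5 and 7 presses per minute in its last two training sessions and at 9 presses per minute at test would score 150% of baseline.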
For the results of experiments 2, 3, and 4, data were analyzed separately for each experiment, with a two-way repeated-measures ANOVA with within-subjects variable deprivation state (3 vs 23 h deprived), and between-subjects variable drug treatment (experiment 2 and 3: vehicle, ICI 174,864, GNTI and CTOP; experiment 4: vehicle and DAMGO). Bonferroni post hoc analyses, controlling for multiple comparisons, were then run on these data to compare the effect of deprivation within each drug group, and then separately to compare the effect of drug within the 3 and 23 h food-deprived conditions. We then conducted three additional two-way ANOVAs, combining data from experiments 2 and 3 and separating out drug treatment. For each of the three drug treatments, a two-way ANOVA was run with within-subjects variable deprivation state and between-subjects variable direction of state change (either an upshift or downshift in hunger state). For all hypothesis tests, the α level for significance was set to p < 0.05.
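Because the within-subjects factor here has only two levels (3 vs 23 h), the drug × deprivation interaction in such a mixed design is equivalent to a one-way between-groups ANOVA on each rat's difference score. The sketch below uses that standard equivalence to compute the interaction F ratio and its degrees of freedom; it is an illustration and not necessarily how the reported statistics were computed. The function name and data layout are hypothetical.

```python
def interaction_f(groups):
    """F ratio for a group x two-level within-subject interaction.

    groups maps a group label to a list of (rate_3h, rate_23h) pairs,
    one pair per rat. The interaction is tested as a one-way ANOVA on
    the per-rat difference scores (rate_23h - rate_3h).
    Returns (F, df_between, df_within).
    """
    diffs = {g: [b - a for a, b in pairs] for g, pairs in groups.items()}
    all_d = [d for ds in diffs.values() for d in ds]
    n_total, k = len(all_d), len(diffs)
    grand_mean = sum(all_d) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for ds in diffs.values():
        group_mean = sum(ds) / len(ds)
        ss_between += len(ds) * (group_mean - grand_mean) ** 2
        ss_within += sum((d - group_mean) ** 2 for d in ds)

    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within
```

With four drug groups and 35 rats, this gives degrees of freedom of 3 and 31, matching the form of the interaction test reported for experiment 2.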
In several cases, a manipulation, such as drug treatment, was critically found to have no effect on lever-press actions. In these select cases, we computed Bayes factors for use in supporting the null hypothesis (Gallistel, 2009; Rouder et al., 2009), using a freely available Bayes factor calculator (http://pcl.missouri.edu/bayesfactor) (Rouder et al., 2009). This analysis has been argued to provide an appropriate method for expressing a preference for the null hypothesis (Gallistel, 2009; Rouder et al., 2009).
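For readers who want to see what such a calculator computes, the following sketch evaluates the JZS Bayes factor of Rouder et al. (2009) for a one-sample (or paired) t statistic by numerical integration. It assumes the unit-scale Cauchy prior on effect size and a paired comparison of the 3 and 23 h tests; the settings actually used with the online calculator are not stated in the text. The function name, the change of variables, and the midpoint rule are illustrative choices.

```python
import math


def jzs_bf01(t, n, steps=20000):
    """JZS Bayes factor in favor of the null for a one-sample t test.

    t is the observed t statistic and n the number of subjects
    (n >= 2). Values above 1 favor the null hypothesis.
    """
    nu = n - 1  # degrees of freedom
    null_like = (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

    # Integrate over g in (0, inf) via g = u / (1 - u), u in (0, 1),
    # using the midpoint rule.
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        g = u / (1.0 - u)
        prior = (2.0 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1.0 / (2.0 * g))
        like = (1.0 + n * g) ** -0.5 * (
            1.0 + t * t / ((1.0 + n * g) * nu)
        ) ** (-(nu + 1) / 2.0)
        total += like * prior / (1.0 - u) ** 2  # Jacobian dg/du
    alt_like = total / steps

    return null_like / alt_like
```

A t statistic near zero yields a factor above 1 (support for the null), and the factor falls below 1 as the t statistic grows.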
Histology
Histology was conducted as described previously (Corbit et al., 2007). Rats with misdirected cannulae (n = 4) were removed from the data analysis. See Figure 1 for cannula placements.
Results
Experiment 1: Changes in reward-seeking actions require an opportunity for incentive learning
As illustrated in Table 1, experiment 1 was conducted in three phases: initial training (3 h food deprived), opportunity for incentive learning or unexposed control, and test. All rats acquired and maintained lever-pressing performance and, in the final session of training, performed the seeking-lever response at a rate of 5.73 (SEM, 1.31) and 5.64 (SEM, 0.84) presses per minute in the unexposed and exposed groups, respectively.
The effects of exposure to the reward in an increased deprivation state on subsequent seeking actions are represented in Figure 2. As has been previously shown (Balleine et al., 1995; Corbit and Balleine, 2003), changes in response rate on the seeking lever only occurred in rats given an opportunity for incentive learning (reexposure). Incentive learning occurs when an animal experiences a reward in an altered state (Dickinson and Balleine, 1994, 2002; Balleine, 2001; Balleine and Killcross, 2006). Therefore, exposure to the sucrose pellet reward in an increased hunger state (23 h food deprived) provides an opportunity for incentive learning, whereas exposing the rat to the operant context but not to the sucrose reward should not allow for any incentive learning. An increase in seeking actions after exposure to the sucrose when in the heightened (23 h) food-deprived state would suggest that the incentive value of the sucrose was increased as a consequence of that experience, and that this information was used to direct subsequent reward-seeking actions (Balleine, 1992; Balleine et al., 1995).
Analysis of the data from Figure 2A shows a main effect of deprivation state (F(1,27) = 4.97, p = 0.03), no effect of exposure (F(1,27) = 0.04, p = 0.82), and a marginal exposure × deprivation interaction (F(1,27) = 3.15, p = 0.08). Most importantly, Bonferroni post hoc analysis revealed a significant increase in response rate in the 23 h deprived condition over the 3 h control condition for the exposed (p < 0.05), but not the unexposed (p > 0.05), group. Moreover, Bayes factor analysis (Gallistel, 2009; Rouder et al., 2009) of the seeking data from the unexposed group indicates that the null hypothesis, that there is no difference in seeking response rate between the 3 and 23 h tests, is 4.89 times more likely than the alternative hypothesis.
The incentive learning account is further supported by analysis of the seeking data over the time course of the nonrewarded test, which demonstrates that the exposure effect persists throughout the nonrewarded test. In the unexposed (Fig. 2B) group, there is no main effect of deprivation (F(1,14) = 0.14, p = 0.71) or time (F(4,11) = 1.12, p = 0.39), or an interaction between these factors (F(4,11) = 0.80, p = 0.55). In the exposed group (Fig. 2C), however, there is a significant main effect of deprivation (F(1,13) = 6.15, p = 0.03), with no main effect of time (F(4,10) = 2.50, p = 0.11), and no interaction between these factors (F(4,10) = 0.47, p = 0.76).
Together, these data indicate that a motivational state change alone will not significantly impact reward-seeking responses. A change in reward-seeking response rate resulting from a change in motivational state requires exposure to the reward in the changed state, i.e., incentive learning. This effect is sustained across the duration of the test. Reward-seeking responses are therefore reflective of incentive learning.
Experiment 2: μ-, but not δ- or κ-, opioid receptor blockade blocks positive incentive learning
As illustrated in Table 2, this experiment was conducted in three phases: initial training (3 h food deprived), incentive learning, and test. All of the rats acquired and maintained lever-pressing performance and, in the final session of training, performed the seeking-lever response at a rate of 5.09 (SEM, 0.95), 4.94 (SEM, 0.51), 5.94 (SEM, 0.96), and 7.17 (SEM, 2.45) presses per minute in the vehicle, ICI 174,864, GNTI, and CTOP groups, respectively.
The effects of specific opioid receptor blockade on positive incentive learning are represented in Figure 3. An increase in seeking actions 1 d after exposure to the sucrose when in the heightened (23 h) food-deprived state would suggest that the incentive value of the sucrose was increased as a consequence of that experience, and that this information was used to direct subsequent reward-seeking actions (Balleine, 1992; Balleine et al., 1995). A clear incentive learning effect on reward-seeking actions was indeed observed in rats given vehicle infusion into the BLA. This incentive learning effect was also seen in rats given infusion of ICI 174,864 or GNTI to block δ- and κ-opioid receptors, respectively, during the incentive learning revaluation phase. Importantly, however, the incentive learning effect was not apparent in the rats given intra-BLA CTOP to block μ-opioid receptors.
Statistical analysis of these data (Fig. 3) reveals no main effect of drug (F(3,31) = 0.73, p = 0.54), but does show a significant effect of deprivation (F(1,31) = 18.86, p = 0.0001) and, importantly, a significant drug × deprivation interaction (F(3,31) = 3.36, p = 0.03). Post hoc analyses indicate that the positive incentive learning effect on reward-seeking actions is significant in the vehicle- (p < 0.01), ICI 174,864- (p < 0.05), and GNTI- (p < 0.05) treated groups, but not the CTOP-treated (p > 0.05) group. Moreover, additional post hoc analysis reveals that seeking response rate in the 23 h condition is significantly lower than vehicle for the CTOP (p < 0.01), but not ICI 174,864 or GNTI (p > 0.05) groups. Bayesian analysis (Gallistel, 2009; Rouder et al., 2009) lends additional credence to the finding that intra-BLA CTOP blocked the positive incentive learning effect; the null hypothesis that there was no difference in seeking responses in the 3 and 23 h conditions when reexposed under CTOP was found to be 3.07 times more probable than the alternative hypothesis. Interestingly, the effect of CTOP was limited to learning about the value increase, as intra-BLA CTOP had no effect in the 3 h food-deprived condition (p > 0.05, compared with vehicle), indicating that exposure to the sucrose outcome in the control 3 h deprived state under intra-BLA CTOP did not affect sucrose incentive value per se, but rather acted to block the increase in value brought about by a positive shift in motivational state.
This effect of CTOP to block positive incentive learning as measured in reward-seeking actions was mirrored in the magazine entries during the reexposure incentive learning experience (Table 4). Statistical analysis of the magazine entry rate data collected during the reexposure shows a significant effect of deprivation on magazine entry rate (F(1,29) = 27.65, p < 0.0001), with no significant effect of drug (F(3,29) = 0.88, p = 0.46). There was, however, a marginal drug × deprivation interaction (F(3,29) = 2.65, p = 0.07). Separate analysis of each drug group compared with vehicle reveals a significant effect of deprivation (ICI: F(1,16) = 28.22, p < 0.0001; GNTI: F(1,17) = 28.58, p < 0.0001), with no main effect of drug (ICI: F(1,16) = 0.04, p = 0.83; GNTI: F(1,17) = 2.11, p = 0.16) and no interaction between these factors (ICI: F(1,16) = 0.02, p = 0.88; GNTI: F(1,17) = 2.32, p = 0.15) for the ICI 174,864 and GNTI groups. When comparing magazine entry rate between the vehicle and CTOP-treated rats, there is a significant effect of deprivation (F(1,16) = 9.75, p = 0.006), with no effect of drug (F(1,16) = 0.002, p = 0.96), but a significant interaction between these factors (F(1,16) = 5.77, p = 0.03). Post hoc analysis confirms that there is a significant increase in magazine entry rate when 23 h deprived in the vehicle- (p < 0.001), but not CTOP- (p > 0.05), treated groups. Importantly, despite the lack of increase in the magazine entry rate for the CTOP group, all pellets were consumed during the reexposure, ensuring that these rats did still have the opportunity for incentive learning. These data suggest that increasing food deprivation resulted in rats checking the magazine more often and that this was not affected by the δ- or κ-opioid receptor antagonist, but was blocked by the μ-opioid receptor antagonist.
Experiment 3: Neither μ, δ, nor κ opioid receptor blockade affects negative incentive learning
As illustrated in Table 3, experiment 3 was conducted in three phases: initial training (23 h food deprived), incentive learning, and test. All of the rats acquired and maintained lever-pressing performance and, in the final session of training, performed the seeking-lever response at a rate of 23.28 (SEM, 2.57), 24.59 (SEM, 2.48), 23.34 (SEM, 4.87), and 24.20 (SEM, 3.80) presses per minute in the vehicle, ICI 174,864, GNTI, and CTOP groups, respectively.
The effects of specific opioid receptor blockade on negative incentive learning are represented in Figure 4. A clear negative incentive learning effect was observed in rats given vehicle, ICI 174,864, and GNTI infusion into the BLA. Contrary to the result of experiment 2, the negative incentive learning effect was also apparent in the rats given intra-BLA CTOP to block μ-opioid receptors.
Statistical analysis of these data reveals a significant effect of deprivation (F(1,28) = 62.23, p < 0.0001), but importantly no main effect of drug (F(3,28) = 2.154, p = 0.17), or significant drug × deprivation interaction (F(3,28) = 0.33, p = 0.80).
As with experiment 2, the effects of ICI 174,864, GNTI, and CTOP on subsequent reward-seeking actions were mirrored in the magazine entry rate during the actual reexposure (Table 4). Statistical analysis of the magazine entry rate shows a significant effect of deprivation (F(1,28) = 30.51, p < 0.0001), with no main effect of drug (F(3,28) = 0.60, p = 0.61), or interaction between these factors (F(3,28) = 0.45, p = 0.71). These data suggest that decreasing food deprivation resulted in rats checking the magazine less often and that this was not affected by drug treatment.
Further statistical support for a selective effect of CTOP on positive versus negative incentive learning is derived from combined analyses of the data from experiments 2 and 3, with separate two-way ANOVAs for each drug group. These analyses reveal a significant effect of deprivation (either from 3 h control to 23 h to increase value or from 23 h control to 3 h to decrease value), with no significant interaction between the effect of deprivation and the directional change (either increasing or decreasing) in the vehicle, ICI 174,864, and GNTI-treated groups (vehicle: deprivation, F(1,25) = 20.53, p = 0.0001; deprivation × value change, F(1,25) = 2.03, p = 0.17; ICI 174,864: deprivation, F(1,9) = 20.47, p = 0.001; deprivation × value change, F(1,9) = 1.71, p = 0.22; GNTI: deprivation, F(1,13) = 17.67, p = 0.001; deprivation × value change, F(1,13) = 1.74, p = 0.21). However, in the intra-BLA CTOP group there is no significant effect of deprivation (F(1,12) = 0.98, p = 0.31), but rather a significant deprivation × value change interaction (F(1,12) = 12.07, p = 0.005), indicating that CTOP blocked learning that the value of the outcome had increased, but not that it had decreased.
Experiment 4: A μ-opioid receptor agonist attenuates negative incentive learning
Having shown, in experiment 2, that blockade of BLA μ-opioid receptors blocked the effects of positive incentive learning on reward seeking and, in experiment 3, that μ-opioid receptor blockade did not affect negative incentive learning, we next explored the possibility that infusion of a μ-opioid receptor agonist into the BLA may attenuate negative incentive learning. Experiment 4 was conducted in three phases: initial training (23 h food deprived), incentive learning, and test. All of the rats acquired and maintained lever-pressing performance and, in the final session of training, performed the seeking-lever response at a rate of 16.48 (SEM, 2.12) and 16.79 (SEM, 1.36) presses per minute in the vehicle and DAMGO groups, respectively.
The effects of the specific μ-opioid receptor agonist, DAMGO, on negative incentive learning are represented in Figure 5. A clear negative incentive learning effect was observed in rats given vehicle, but this effect was attenuated in animals receiving intra-BLA infusion of DAMGO. Statistical analysis of these data reveals a significant effect of deprivation (F(1,21) = 15.66, p = 0.0007), but no main effect of drug (F(1,21) = 1.27, p = 0.27). Importantly, there was a significant drug × deprivation interaction (F(1,21) = 4.34, p = 0.049). Post hoc analyses clarify this interaction: when shifting from 23 to 3 h food deprivation, there was a significant decrease in reward-seeking response rate in the vehicle-treated (p < 0.001), but not the DAMGO-treated (p > 0.05), rats. DAMGO alone also appeared to lower seeking response rates in the control 23 h condition, but this was not significant by separate post hoc analysis (p > 0.05).
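As an illustration of how a drug × deprivation interaction of this kind is tested, the following minimal sketch partitions the variance of a balanced two-factor between-subjects design into main effects, interaction, and error, and forms the corresponding F ratios. The response rates are hypothetical values invented purely for illustration and are not data from this study; the function name `two_way_anova` and the equal-group-size assumption are likewise illustrative simplifications rather than a description of the analysis software actually used.

```python
from statistics import mean


def two_way_anova(cells):
    """F ratios for a balanced two-factor between-subjects ANOVA.

    `cells` maps (drug, deprivation) -> list of scores.
    Assumes every cell holds the same number of subjects.
    """
    drugs = sorted({drug for drug, _ in cells})
    deps = sorted({dep for _, dep in cells})
    n = len(next(iter(cells.values())))
    if any(len(scores) != n for scores in cells.values()):
        raise ValueError("this sketch requires equal cell sizes")

    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    grand = mean(cell_mean.values())  # valid because the design is balanced
    drug_mean = {d: mean(cell_mean[(d, h)] for h in deps) for d in drugs}
    dep_mean = {h: mean(cell_mean[(d, h)] for d in drugs) for h in deps}

    # Sums of squares for each source of variance
    ss_drug = n * len(deps) * sum((drug_mean[d] - grand) ** 2 for d in drugs)
    ss_dep = n * len(drugs) * sum((dep_mean[h] - grand) ** 2 for h in deps)
    ss_int = n * sum(
        (cell_mean[(d, h)] - drug_mean[d] - dep_mean[h] + grand) ** 2
        for d in drugs
        for h in deps
    )
    ss_err = sum(
        (x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores
    )

    # Degrees of freedom and mean squares
    df_drug, df_dep = len(drugs) - 1, len(deps) - 1
    df_int = df_drug * df_dep
    df_err = len(cells) * (n - 1)
    ms_err = ss_err / df_err

    return {
        "F_drug": (ss_drug / df_drug) / ms_err,
        "F_deprivation": (ss_dep / df_dep) / ms_err,
        "F_interaction": (ss_int / df_int) / ms_err,
        "df_error": df_err,
    }


# Hypothetical seeking-lever presses per minute (NOT data from this study)
cells = {
    ("vehicle", "23h"): [16, 18, 17, 19, 15, 17],
    ("vehicle", "3h"): [8, 9, 7, 10, 8, 6],
    ("DAMGO", "23h"): [13, 14, 12, 15, 13, 11],
    ("DAMGO", "3h"): [11, 12, 10, 13, 11, 9],
}
result = two_way_anova(cells)
print(result)
```

In this made-up example the deprivation shift lowers responding far more in the vehicle cells than in the DAMGO cells, so the interaction term is large relative to the error variance; this is the pattern that a significant drug × deprivation interaction captures.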
Although intra-BLA DAMGO blocked the effect on subsequent reward seeking of experiencing the reward in a lower deprivation state, it did not as robustly attenuate the decrease in checking for the sucrose pellets during the incentive learning phase (Table 4). Statistical analysis of the magazine entry rate during the reexposure shows a significant effect of deprivation (F(1,21) = 2.25, p = 0.001), with no main effect of drug (F(1,21) = 0.77, p = 0.38), or interaction between these factors (F(1,21) = 0.16, p = 0.68). These data suggest that decreasing food deprivation resulted in rats checking the magazine less often and that this was not affected by drug treatment.
Discussion
The present results extend the previous finding that endogenous opioids in the BLA mediate the assignment of incentive value (Wassum et al., 2009) in three significant ways. First, it is evident from these data that endogenous activation of opioid receptors in the BLA mediates the encoding of positive, but not negative, shifts in value; opioid receptor blockade during revaluation, although preventing the increase in reward seeking following experience of the reward in a heightened motivational state (hunger) (Fig. 3), had no influence on the reduction in reward-seeking actions induced by experience of the reward in a decreased motivational state (Fig. 4). Second, this action of endogenous opioids in the BLA appears to be specifically mediated by μ-opioid receptors (Fig. 3). Third, decreases in endogenous opioid transmission may mediate the encoding of negative shifts in reward value as exogenous activation of the μ-opioid receptor attenuated the reduction in reward-seeking actions induced by experience of the reward in a decreased motivational state, i.e., blocked negative incentive learning (Fig. 5).
The mediation of the incentive learning function by BLA endogenous opioid peptides specifically via the μ subtype of opioid receptors, despite the presence of μ, δ, and κ receptors in this nucleus (Le Merrer et al., 2009), parallels the well documented involvement of this receptor in the consummatory component of the reward experience. However, such consummatory hedonic effects are mediated outside of the BLA, primarily in the ventral pallidum and nucleus accumbens shell (Peciña and Berridge, 2005; Smith and Berridge, 2005, 2007; Wassum et al., 2009). Indeed, naloxone-induced blockade of opioid receptors in the BLA during reward reevaluation was shown to prevent the increase in future reward seeking despite having no effect on the increase in the palatability-related hedonic experience induced by the motivational state change (Wassum et al., 2009). Although the concept that reward ‘liking’ and cue-induced reward ‘wanting’ are dissociable processes has been established for some time (Robinson and Berridge, 1993; Berridge and Robinson, 2003; Peciña et al., 2003), these data established that reward ‘liking’ and the process by which the value of a reward is updated to drive uncued instrumental actions, i.e., incentive learning, are also dissociable. Together with the current data, it appears that endogenous activation of μ-opioid receptors mediates not only the affective qualities of reward consumption (in the nucleus accumbens shell and ventral pallidum), but also, independently (in the BLA), the process by which positive, but not negative, changes in the incentive value of rewards are encoded. As BLA circuitry per se has been implicated in incentive learning generally (Balleine et al., 2003; Corbit and Balleine, 2005; Wang et al., 2005; Ostlund and Balleine, 2008), it appears that within BLA circuits mediating the assignment of incentive value there is a mechanism for encoding the valence of the reward shift.
The current observation that intra-BLA administration of the μ agonist, DAMGO, attenuated negative incentive learning induced by a downward shift in hunger state suggests that encoding the direction of a shift in incentive value may simply be achieved by upward or downward shifts in endogenous opioid tone, with the latter facilitating negative incentive learning.
It is possible that, rather than encoding of incentive value per se, endogenous opioid transmission in the BLA may be necessary for mediating the association that is formed between interoceptive hunger-state cues and the heightened affective experience with the reward, an account that we are pursuing in ongoing experiments. As systemic opioid antagonists have been shown to produce a taste aversion-like effect (Parker and Rennie, 1992), it is also possible that the observed effects of intra-BLA μ-opioid receptor antagonism on positive incentive learning result from competition between conditioned aversion and incentive learning. This account is unlikely, however, as reexposure to the sucrose outcome under intra-BLA CTOP did not alone affect subsequent reward-seeking actions in the unrewarded test (Figs. 3 and 4), i.e., intra-BLA CTOP/sucrose outcome pairings did not produce the devaluation effect that would be predicted by the conditioned taste aversion hypothesis.
An additional alternative interpretation of our data is that, as the opportunity for incentive learning was conducted in the same context as training and test, intra-BLA CTOP may have altered the Pavlovian incentive value of the context, and in so doing blocked the increase in reward-seeking actions detected during the nonrewarded test of incentive learning. However, contextual control is not necessary for the effects of incentive learning to be expressed in reward-seeking actions (Balleine et al., 1995), and responding on the seeking component of the heterogeneous seeking–taking chain has previously been shown to be relatively immune to Pavlovian influences (Corbit and Balleine, 2003), rendering this account similarly unlikely. Moreover, intra-BLA CTOP did not degrade the context–reward association under control conditions, as evidenced by no change in reward seeking in the intra-BLA CTOP group relative to the vehicle group in the control condition (either 3 h in experiment 2 or 23 h in experiment 3). The near-complete blockade of the incentive learning effect by intra-BLA CTOP suggests that μ-opioid receptors are involved in the process by which the instrumental incentive value of the outcome is increased. Nonetheless, the possibility that μ-opioid receptors in the BLA are important for the effects of cues on instrumental performance will be assessed in subsequent experiments using a more targeted Pavlovian-to-instrumental transfer design.
Interesting in this context is the observation of Mahler and Berridge (2009) that μ-opioid receptor activation in the central nucleus of the amygdala enhances the effect of a reward-paired cue on approach to either the cue itself or the food source, i.e., it enhanced and focused cue-induced reward ‘wanting’, to use the authors' terminology. Earlier studies have also shown that the central nucleus of the amygdala is necessary for the general motivational influence of reward-related cues on action performance, whereas the BLA is involved in more reward-specific incentive processes and instrumental incentive learning (Balleine et al., 2003; Corbit and Balleine, 2005; Dwyer and Killcross, 2006). These findings, taken with our current results, begin to elucidate an anatomically dissociable neuromodulatory role of μ-opioid receptor activation in reward ‘liking’ (Peciña and Berridge, 2005; Smith and Berridge, 2005, 2007), cue-induced generalized reward ‘wanting’ (Mahler and Berridge, 2009), and the performance of uncued specific incentive value-driven actions described herein, involving the ventral pallidum/nucleus accumbens shell, amygdala central nucleus, and BLA, respectively. Given that not only opiates, but also palatable foods, cocaine, alcohol, cannabinoids, and nicotine have been shown to acutely and chronically influence the activity of the endogenous opioid systems (Unterwald et al., 1994; Turchan et al., 1998, 1999; Berrendero et al., 2002, 2005; Smith et al., 2002; Fattore et al., 2004; Oswald and Wand, 2004; Ziółkowska et al., 2006), the separable role of these peptides in these three aspects of reward processing may underlie the intensely addictive property of these substances.
Moreover, the current data indicate that opioid receptor activation in the BLA facilitates encoding of increases in incentive value, whereas reduced opioid transmission may underlie encoding of decreased incentive value. This could explain how the use of addictive substances can result in aberrant drug seeking: the incentive value of the drug becomes enhanced, and the negative aspects of drug use that should otherwise lead to devaluation of the drug and a concomitant reduction in drug seeking are overshadowed. Targeting these anatomically dissociable μ-opioid receptor-mediated processes may provide treatments for these addictive disorders that improve the integration of emotional and cognitive processes to result in more appropriate decision-making.
Footnotes
- This research was supported by Grants DA09359 and DA05010 from the National Institute on Drug Abuse to N.T.M., Grants MH56446 from the National Institute of Mental Health and AA18014 from the National Institute on Alcohol Abuse and Alcoholism to B.W.B., and a Ruth L. Kirschstein National Research Service Award DA023774 and Hatos scholarship to K.M.W.
- Correspondence should be addressed to Kate Wassum, Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90024. kwassum{at}ucla.edu