Free-Operant Avoidance Behavior by Rats after Reinforcer Revaluation Using Opioid Agonists and d-Amphetamine

The associative processes that support free-operant instrumental avoidance behavior are still unknown. We used a revaluation procedure to determine whether the performance of an avoidance response is sensitive to the current value of the aversive, negative reinforcer. Rats were trained on an unsignaled, free-operant lever press avoidance paradigm in which each response avoided or escaped shock and produced a 5 s feedback stimulus. The revaluation procedure consisted of noncontingent presentations of the shock in the absence of the lever, either paired or unpaired with systemic morphine or, in a different cohort, with systemic d-amphetamine. Rats were then tested drug free in an extinction test. In both the d-amphetamine and morphine groups, pairing of the drug and shock decreased subsequent avoidance responding during the extinction test, suggesting that avoidance behavior was sensitive to the current incentive value of the aversive negative reinforcer. Experiment 2 used central infusions of [D-Ala2, NMe-Phe4, Gly-ol5]-enkephalin (DAMGO), a mu-opioid receptor agonist, in the periaqueductal gray and nucleus accumbens shell to revalue the shock. Infusions of DAMGO in both regions replicated the effects seen with systemic morphine. These results are the first to demonstrate the impact of revaluation of an aversive reinforcer on avoidance behavior using pharmacological agents, thereby providing potential therapeutic targets for the treatment of avoidance behavior symptomatic of anxiety disorders.


Introduction
Appetitive studies have provided good evidence that instrumental behavior is directly sensitive to the current incentive value of the reinforcer. This conclusion is based on a procedure known as reinforcer revaluation, in which the incentive value of the reinforcer is changed before instrumental responding is tested in extinction, that is, in the absence of the reinforcer. If instrumental responding is sensitive to the reinforcer value, then changes in this value should modulate instrumental behavior during the extinction test. Appetitive instrumental behavior has been shown to be sensitive to reinforcer revaluation in rats (Adams and Dickinson, 1981) and, more recently, in both nonhuman primates (Rhodes and Murray, 2013) and humans (Valentin et al., 2007). The theoretical inference from this finding is that instrumental behavior is goal directed, in the sense of being mediated by representations of the response-reinforcer contingency and of the current incentive value of the reinforcer.
The processes underlying avoidance behavior have received less attention. In contrast to appetitive behavior, the performance of an avoidance response is maintained by a negative contingency between the response and the aversive reinforcer, because the avoidance response prevents its presentation. The outcome of an avoidance response is therefore the omission of the aversive reinforcer, which is thought to be the critical event in establishing and maintaining instrumental avoidance (Mackintosh, 1983). Instrumental avoidance should therefore be sensitive to revaluation only if it is mediated by a representation of the negative action-reinforcer contingency and of the current value of the aversive reinforcer. Three studies have assessed the effects of revaluation of an aversive reinforcer on avoidance behavior: one in rats using heat as the reinforcer (Hendersen and Graham, 1979) and two in humans, one using monetary loss as the reinforcer (Declercq and DeHouwer, 2008) and the other electric shock (Gillan et al., 2013). Although these studies demonstrate that it is possible to revalue an aversive, negative reinforcer of avoidance behavior, none has assessed the neurobiological mechanisms of reinforcer revaluation. The present study addresses these mechanisms, using analgesic agents to revalue an aversive foot shock reinforcer. In Experiment 1, after free-operant lever press avoidance training, rats received revaluation sessions in which the analgesic agent morphine or, in a separate cohort, d-amphetamine (Burrill et al., 1944; Abbot et al., 1995) was administered either before noncontingent presentations of the foot shock reinforcer (paired group) or before sessions without shock (unpaired group). We predicted that, if avoidance responding is goal directed, the paired group would respond less during the extinction test session after having experienced foot shock under analgesia.
In Experiment 2, to identify brain substrates involved in the mediation of shock revaluation, selective infusions of [D-Ala2, NMe-Phe4, Gly-ol5]-enkephalin (DAMGO), a mu-opioid receptor agonist, were administered during the revaluation procedure in paired and unpaired groups. Infusions were made into either the periaqueductal gray (PAG), a substrate of the pain circuitry (McNally, 1999), or the nucleus accumbens shell (NacS), a region rich in opioid receptors that is involved in the processing of emotional stimuli (Mansour et al., 1995; Barrot et al., 2002).

Materials and Methods
Subjects. Experiment 1 was conducted sequentially in two groups. The first group consisted of 17 male Lister hooded rats that received treatment with morphine; the second group consisted of 16 male Lister hooded rats that received treatment with d-amphetamine. Experiment 2 was conducted in a group of 42 rats. Rats were obtained from Charles River Laboratories and weighed 300 g at the start of the experiment. They were housed in groups of four per cage in a reverse light cycle room (12 h light/dark cycle with lights on at 19:00). Training and testing took place during the dark phase, and all procedures complied with the statutory requirements of the UK Animals (Scientific Procedures) Act of 1986.
Apparatus. Fourteen operant conditioning chambers (Med Associates) were used, each measuring 29.5 × 32.5 × 23.5 cm, with a Plexiglas ceiling, front door, and back panel, and metal paneling on the sides of the chamber. The floor of the chamber was a metal grid with a metal tray beneath. Med Associates shock generators (ENV-224AMWN, 115 V AC, 60 Hz) were connected to the metal grid and used to deliver scrambled 0.5 s, 0.5 mA foot shocks. Each chamber was placed within a sound- and light-attenuating box and interfaced to a computer through Whisker control software (Cardinal and Aitken, 2010). The feedback stimulus was either a 2900 Hz tone produced by a Med Associates tone generator (ENV-223AM) or white noise produced by a Med Associates white noise generator (ENV-2255M), counterbalanced across rats. Both generators were attached to the same wall of the chamber, and both stimuli were set to 8 dB above background level. Levers could be extended on either side of a central food magazine on the opposite wall, but no food pellets were ever delivered.
Pretraining. Rats were first habituated to the chamber and the levers for 4 d. During the first 2 d, either the left or the right lever was randomly chosen at the start of the session. The designated lever was extended as the session began, and each response resulted in its retraction for 1 s, followed by its immediate reextension into the chamber. For the last 2 d, the opposite lever was extended and the number of responses was limited so that the number of retractions and extensions was equated across the 2 levers. The house light remained on until the end of each daily 1 h session.
Training. The start of each session was marked by illumination of the house light and extension of a single lever, randomly chosen as either the right or the left lever at the start of each daily session. The lever remained extended for the entire session. The session began with an unsignaled avoidance period of 120–140 s that, in the absence of a lever press, was followed by a shock period of intermittent foot shocks (initially 0.2 mA) with a shock-shock interval of 5 s. After three shocks, the shock period terminated and was followed by the next avoidance period. The maximum number of shocks per session was 30, at which point the session ended. Any lever press during the avoidance or shock periods immediately terminated that period and produced a 120 s auditory feedback stimulus, which was followed by the next avoidance period. The use of a feedback stimulus facilitates avoidance performance in this procedure (Fernando et al., unpublished data). Lever presses during the feedback stimulus had no consequence and did not contribute to the assessment of avoidance responding. Over the course of training, the variability and duration of the avoidance period were increased to a final mean avoidance interval of 120 s (range 10–230 s), the feedback stimulus duration was gradually reduced to 5 s, and the shock intensity was increased in 0.1 mA increments to a final value of 0.5 mA. Subjects were trained for ~30 sessions.
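The session contingencies described above can be expressed as a small discrete-event simulation. The sketch below is a hypothetical illustration, not the Whisker control script used in the study: the function name `simulate_session`, the assumption that the next avoidance period begins immediately after the third shock, the use of the final 5 s feedback duration, and the 1 h session length are all assumptions. Given lever press times and a sequence of avoidance intervals, it returns the times at which shocks would have been delivered.

```python
from itertools import cycle
from typing import Iterable, List


def simulate_session(
    press_times: Iterable[float],
    avoidance_intervals: Iterable[float],
    shock_interval_s: float = 5.0,
    shocks_per_period: int = 3,
    max_shocks: int = 30,
    feedback_s: float = 5.0,
    session_s: float = 3600.0,
) -> List[float]:
    """Return shock onset times (s) for a hypothetical free-operant avoidance session."""
    presses = sorted(press_times)
    intervals = iter(avoidance_intervals)
    shocks: List[float] = []
    i = 0    # index of the next lever press not yet processed
    t = 0.0  # start time of the current avoidance period
    while t < session_s and len(shocks) < max_shocks:
        # Presses made during the feedback stimulus have no consequence.
        while i < len(presses) and presses[i] < t:
            i += 1
        next_shock = t + next(intervals)  # unsignaled avoidance period
        in_period = 0                     # shocks delivered in this shock period
        while True:
            if i < len(presses) and presses[i] < min(next_shock, session_s):
                # A press before the first shock avoids it; a press between
                # shocks escapes the rest of the shock period. Either way the
                # feedback stimulus starts and a new avoidance period follows.
                t = presses[i] + feedback_s
                i += 1
                break
            if next_shock >= session_s:
                t = session_s
                break
            shocks.append(next_shock)
            in_period += 1
            if len(shocks) >= max_shocks or in_period == shocks_per_period:
                # Assumption: the next avoidance period begins at the last shock.
                t = next_shock
                break
            next_shock += shock_interval_s
    return shocks


if __name__ == "__main__":
    # A rat that never presses, with a fixed (hypothetical) 20 s avoidance interval.
    print(simulate_session([], cycle([20.0]), session_s=100.0))
```

For a rat that never presses, the example prints `[20.0, 25.0, 30.0, 50.0, 55.0, 60.0, 80.0, 85.0, 90.0]`: three shocks 5 s apart after each 20 s avoidance period. Passing the variable intervals used in training (mean 120 s, range 10–230 s) instead of a fixed value would reproduce the unsignaled schedule more closely.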
Surgery. Rats in Experiment 2 received surgery after avoidance training. Twenty-four rats were implanted in the PAG with 26 Ga unilateral guide cannulae (Plastics One) at stereotaxic coordinates AP +0.1, ML +0.8, DV −5.6 (from lambda; DV measured from skull). The cannulated hemisphere was counterbalanced between the left and right hemispheres across subjects. Eighteen rats were implanted in the NacS with 26 Ga bilateral guide cannulae (Plastics One) at stereotaxic coordinates AP +1.7, ML +1.0, DV −2.0 (from bregma; DV measured from skull). After surgery, rats were housed individually and left to recover with food and water available ad libitum. After 1 week of recovery, subjects were retrained on the avoidance task until baseline responding had been stable for 3 d, before the revaluation procedure began.
Revaluation. The revaluation procedure lasted 4 d (1 session/d). Rats in the paired group received the analgesic drug before each of 2 sessions in which 15 presentations of a 0.5 mA foot shock were delivered in the absence of the lever and the feedback stimulus. Shocks were delivered in sets of three with a shock-shock interval of 5 s, and each set was followed by the next after an average interval of 588 s (range 240–348 s). On the other 2 d, paired rats received vehicle injections before 30 min sessions in which nothing occurred in the chamber. In the unpaired group, the drug was administered before the sessions in which nothing occurred and vehicle was administered before the sessions with shock.
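The crossing of treatment (drug or control) with session type (shock or no shock), with the pairing set by group, can be made explicit in code. The sketch below is illustrative only: the class `Day`, the functions `revaluation_schedule` and `shock_times`, and the alternating default order of shock and no-shock days are hypothetical, because the text does not specify the order of days. For Experiment 2, where no vehicle infusions were given, the `control_label` argument can be set to `"none"`.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Day:
    treatment: str  # "drug", or the control label (e.g., "vehicle")
    shock: bool     # True if foot shocks are delivered in that session


def revaluation_schedule(
    group: str,
    shock_days: Sequence[bool] = (True, False, True, False),  # hypothetical order
    control_label: str = "vehicle",
) -> List[Day]:
    """Return the 4-day revaluation schedule for a 'paired' or 'unpaired' rat."""
    if group not in ("paired", "unpaired"):
        raise ValueError("group must be 'paired' or 'unpaired'")
    schedule = []
    for shock in shock_days:
        # Paired: drug only on shock days. Unpaired: drug only on no-shock days.
        gets_drug = shock if group == "paired" else not shock
        schedule.append(Day("drug" if gets_drug else control_label, shock))
    return schedule


def shock_times(set_starts: Sequence[float], per_set: int = 3, isi_s: float = 5.0) -> List[float]:
    """Shock onsets (s) in a shock session: sets of `per_set` shocks `isi_s` apart."""
    return [start + k * isi_s for start in set_starts for k in range(per_set)]
```

Both groups receive the drug on 2 days and shock on 2 days, so drug exposure and shock exposure are equated; only their co-occurrence differs. Five set onsets passed to `shock_times` yield the 15 shocks per session described above; the onsets themselves would have to be drawn to match the reported inter-set intervals.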
Drugs and administration procedure. All rats receiving systemic treatments (Experiment 1) received 4 d of intraperitoneal injections: 2 d with the drug (morphine 10 mg/kg or d-amphetamine 1.5 mg/kg, calculated as free base) and 2 d with vehicle (0.9% filtered saline). Doses were chosen because they produce analgesia without motor depressant effects and have selective behavioral effects (morphine: Babbini and Davis, 1972; Babbini et al., 1979; Kuribara et al., 1985; d-amphetamine: Abbot et al., 1995; Fernando et al., 2013b). After injection, subjects were left in single cages in the dark and were placed in the testing chambers either 20 min (morphine) or 15 min (d-amphetamine) later.
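Because doses were calculated as free base, the mass of salt to weigh out must be scaled by the ratio of the salt's molecular weight to the mass of base it contains. The sketch below assumes, hypothetically, that morphine was supplied as morphine sulfate pentahydrate and d-amphetamine as d-amphetamine sulfate (the text does not name the salt forms) and that injections were given at 1 ml/kg (also not stated). The molecular weights are standard values; the function names are illustrative.

```python
# Standard molecular weights (g/mol). The salt forms are assumptions.
MW_MORPHINE = 285.34                   # C17H19NO3
MW_MORPHINE_SULFATE_5H2O = 758.83      # (C17H19NO3)2·H2SO4·5H2O
MW_AMPHETAMINE = 135.21                # C9H13N
MW_D_AMPHETAMINE_SULFATE = 368.49      # (C9H13N)2·H2SO4


def salt_dose(base_mg_per_kg: float, mw_salt: float, mw_base: float, bases_per_salt: int) -> float:
    """Convert a free-base dose (mg/kg) into the equivalent dose of the salt (mg/kg)."""
    return base_mg_per_kg * mw_salt / (bases_per_salt * mw_base)


def solution_mg_per_ml(salt_mg_per_kg: float, ml_per_kg: float = 1.0) -> float:
    """Solution concentration needed for a given injection volume (assumed 1 ml/kg)."""
    return salt_mg_per_kg / ml_per_kg


morphine_salt = salt_dose(10.0, MW_MORPHINE_SULFATE_5H2O, MW_MORPHINE, 2)
amphetamine_salt = salt_dose(1.5, MW_D_AMPHETAMINE_SULFATE, MW_AMPHETAMINE, 2)
print(f"morphine sulfate: {morphine_salt:.2f} mg/kg")          # 13.30 mg/kg
print(f"d-amphetamine sulfate: {amphetamine_salt:.2f} mg/kg")  # 2.04 mg/kg
```

Under these assumptions, 10 mg/kg morphine base corresponds to about 13.3 mg/kg of the pentahydrate salt, and 1.5 mg/kg d-amphetamine base to about 2.04 mg/kg of the sulfate.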
Central treatments. All rats in Experiment 2 received infusions of the mu-opioid agonist DAMGO at a concentration of 0.5 µg/µl and a flow rate of 0.25 µl/min, followed by a 2 min diffusion period. Doses were chosen based on studies of central infusions of opioids in the PAG and NacS (Peciña and Berridge, 2000; Iordanova et al., 2006). Unilateral infusions of 0.5 µl into the PAG were made over 2 min, with the injector extending 1 mm below the guide cannula. Bilateral infusions into the NacS were made over 1 min, with the injectors extending 5.25 mm beyond the guide cannulae, resulting in a total bilateral volume of 0.5 µl of drug. Rats were placed in the test chamber 5 min after infusion. Infusions were given on only 2 of the 4 revaluation days; no vehicle infusions were given on the other days.
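The infusion volumes and the mass of DAMGO delivered follow directly from the concentration, flow rate, and infusion duration given above. The helper below (hypothetical names) performs this arithmetic; the per-side and total masses it reports are derived values, not figures stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Infusion:
    ul_per_side: float
    ul_total: float
    ug_per_side: float
    ug_total: float


def infusion_plan(minutes: float, sides: int, conc_ug_per_ul: float = 0.5,
                  rate_ul_per_min: float = 0.25) -> Infusion:
    """Volume (µl) and DAMGO mass (µg) delivered per side and in total."""
    ul_per_side = rate_ul_per_min * minutes
    return Infusion(
        ul_per_side=ul_per_side,
        ul_total=ul_per_side * sides,
        ug_per_side=conc_ug_per_ul * ul_per_side,
        ug_total=conc_ug_per_ul * ul_per_side * sides,
    )


pag = infusion_plan(minutes=2.0, sides=1)   # unilateral PAG infusion
nacs = infusion_plan(minutes=1.0, sides=2)  # bilateral NacS infusion
print(pag)   # 0.5 µl and 0.25 µg in total
print(nacs)  # 0.25 µl and 0.125 µg per side; 0.5 µl and 0.25 µg in total
```

Both routes therefore deliver the same total volume (0.5 µl) and, on these numbers, the same total mass of DAMGO, with the NacS dose split equally between hemispheres.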
Before test infusions, all rats received mock infusions in which they were habituated to the infusion procedure and infusion room. This procedure was performed so that any behavioral effects of tissue damage mechanically induced by the first infusion occurred before the test session.
Extinction test. The day after the 4 d revaluation procedure, a single 30 min drug-free test session was conducted. It was identical to baseline sessions except that no foot shocks were delivered (i.e., extinction).
Reinforced test. The day after the extinction test, a reinforced test was conducted in which the revalued shock was delivered in the absence of an avoidance response; the session was therefore identical to a baseline session and lasted 1 h.
Data analysis. For all experiments in this study, rates of lever press responding were square root (SQRT) transformed for statistical analysis because the variance increased with mean responding. ANOVA was conducted on the mean SQRT-transformed rate of avoidance responses per minute during the extinction test, with a between-subjects factor of revaluation condition (paired vs unpaired) and, in Experiment 2, an additional between-subjects factor of infusion site (PAG vs NacS). A rejection criterion of p < 0.05 was used, and the Huynh-Feldt adjustment was applied to repeated-measures factors (such as the 5 min time bins of the reinforced test) when sphericity was violated.
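The transformation and the between-subjects test can be sketched in a few lines of plain Python. This is a minimal one-way version with hypothetical function names and made-up press counts, covering only the revaluation factor used in Experiment 1; the two-factor design of Experiment 2 would need a factorial ANOVA. With 17 morphine rats split into two groups, the error degrees of freedom are 17 − 2 = 15, matching the F(1,15) values reported in the Results.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sqrt_rates(responses: Sequence[int], minutes: float) -> List[float]:
    """Responses per minute for each rat, square-root transformed."""
    return [math.sqrt(n / minutes) for n in responses]


def one_way_anova(groups: Dict[str, Sequence[float]]) -> Tuple[float, int, int]:
    """Between-subjects one-way ANOVA; returns (F, df_between, df_within)."""
    values = [x for g in groups.values() for x in g]
    grand_mean = sum(values) / len(values)
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    ss_between = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items())
    ss_within = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical press counts from a 30 min extinction test.
paired = sqrt_rates([30, 48, 27, 75], minutes=30.0)
unpaired = sqrt_rates([270, 192, 300, 243], minutes=30.0)
print(one_way_anova({"paired": paired, "unpaired": unpaired}))
```

The square root compresses high response rates more than low ones, which is why it is a common choice when the variance grows with the mean.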

Results
The final numbers of subjects in each group are reported in Table 1.

Experiment 1
Training
Analysis of both the d-amphetamine and morphine groups revealed no differences in rates of avoidance responses between the assigned revaluation conditions (paired vs unpaired) during the last training session before revaluation (both F < 1). Means (SEMs) of avoidance response rates during the baseline session before revaluation are reported in Table 2. Figure 1 illustrates the reduction in avoidance behavior during the extinction test in rats that had received pairings of shock with either systemic morphine (revaluation F(1,15) = 14.0, p < 0.005) or d-amphetamine (revaluation F(1,14) = 14.7, p < 0.005), relative to their unpaired controls. Figure 4, A and B, provides further evidence for the effectiveness of the revaluation procedure: during the reinforced test, on first experiencing the revalued foot shock drug free, the paired groups showed a reduced rate of avoidance behavior for 5 min relative to their unpaired controls. Statistical analysis revealed a significant reduction in the rate of avoidance responding in the paired group relative to the unpaired group after revaluation with systemic d-amphetamine, and a nonsignificant difference in the same direction after revaluation with systemic morphine (d-amphetamine: revaluation F(1,14) = 4.6, p < 0.05; morphine: revaluation F(1,15) = 1.9, p > 0.15, N.S.). Analysis of the entire reinforced test session, in which the rate of avoidance responding was averaged in 5 min time bins, demonstrates the transitory nature of these effects. Rats in the paired groups increased their rates of avoidance responding across the session (morphine: time F(7.0, 105.7) = 2.9, p < 0.01; d-amphetamine: time F(6.8, 95.0) = 5.1, p < 0.001), so that avoidance behavior no longer differed between the two revaluation groups (morphine: revaluation F < 1; d-amphetamine: revaluation F(1,14) = 1.6, p > 0.2, N.S.).
The transitory nature of the revaluation effect is also reflected in the absence of a significant revaluation × time interaction across the reinforced test session (morphine: revaluation × time F(11,165) = 1.2, p > 0.3, N.S.; d-amphetamine: revaluation × time F(11,154) = 1.6, p > 0.9, N.S.).

Experiment 2
Revaluation of shock with central infusions of DAMGO may result in more persistent effects during the reinforced test.
Figure 2, A and B, shows representative microphotographs of cannula placements in the PAG and the NacS, respectively. Four rats were excluded from the study because their injector cannulae were positioned outside the PAG. Two rats in the PAG group were excluded due to procedural errors, and three rats lost their guide cannulae during the procedure (two in the PAG group and one in the NacS group). There was no gross tissue damage in the vicinity of the injector tracks of the accepted placements.

Training
Analysis of the baseline session before the extinction test revealed no differences in responding either between the surgical groups or between treatment groups (infusion site, revaluation, infusion site × revaluation, all F < 1). Means (SEMs) of avoidance response rates during baseline are presented in Table 3. Figure 3 demonstrates the successful revaluation of shock with infusions of DAMGO in both the PAG and NacS. Infusions in these areas before shock revaluation significantly reduced rates of avoidance responding during the drug-free extinction test only when DAMGO was paired with shock (revaluation F(1,29) = 80.1, p < 0.001). The ANOVA also revealed a main effect of infusion site (F(1,29) = 5.6, p < 0.05), which did not, however, interact with the effect of revaluation (revaluation × infusion site F < 1). Post hoc pairwise comparisons revealed a significant difference in the rate of avoidance responding between the paired and unpaired groups for each region infused (PAG: revaluation p < 0.001; NacS: revaluation p < 0.001). Figure 4, C and D, illustrates the reduced rate of avoidance responding of the paired groups relative to their unpaired controls during the reinforced test after shock revaluation with central infusions of DAMGO in the PAG and NacS, respectively. Statistical analysis confirmed this reduction in the paired group relative to the unpaired group (F(1,35) = 31.9, p < 0.001), which was consistent across both brain regions (infusion site F < 1, N.S.; infusion site × revaluation F(1,35) = 2.1, p > 0.1, N.S.). Furthermore, analysis of rates of avoidance behavior averaged in 5 min time bins across the 1 h session revealed a significant effect of time (F(9.1, 319.3) = 4.3, p < 0.01), and rates of avoidance behavior differed between the two revaluation groups across the session (time × revaluation F(11,385) = 4.2, p < 0.001).
Although rates of avoidance behavior changed differently across the reinforced test session in the NacS and PAG groups (time × infusion site F(11,385) = 1.9, p < 0.05), this did not result in differences in the rate of avoidance behavior between the revaluation groups or between regions (infusion site F < 1, N.S.; infusion site × group F(1,35) = 2.1, p > 0.1, N.S.; time bin × infusion site × group F < 1, N.S.). Central administration of the mu-opioid agonist DAMGO may have produced a more persistent revaluation effect that was resistant to repeated exposure to foot shock.
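The fractional degrees of freedom reported for the time-bin analyses reflect the Huynh-Feldt adjustment: both the effect and error degrees of freedom are multiplied by an estimate ε of the departure from sphericity, so that, for example, F(9.1, 319.3) is consistent with ε ≈ 0.83 applied to (11, 385). The sketch below computes the Greenhouse-Geisser estimate from the pooled within-group covariance of the repeated measures and then the Huynh-Feldt (1976) correction for a design with between-subjects groups. The function names are hypothetical, and some statistics packages use Lecoutre's (1991) revision of the Huynh-Feldt formula, which gives slightly different values.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pooled_covariance(groups: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """Pooled within-group covariance of k repeated measures (e.g., time bins).

    `groups` holds one list per between-subjects group; each row is one rat's
    k repeated measurements.
    """
    k = len(groups[0][0])
    n_total = sum(len(g) for g in groups)
    cov = [[0.0] * k for _ in range(k)]
    for g in groups:
        means = [sum(row[j] for row in g) / len(g) for j in range(k)]
        for row in g:
            dev = [row[j] - means[j] for j in range(k)]
            for a in range(k):
                for b in range(k):
                    cov[a][b] += dev[a] * dev[b]
    dof = n_total - len(groups)
    return [[c / dof for c in r] for r in cov]


def gg_epsilon(cov: Matrix) -> float:
    """Greenhouse-Geisser epsilon from the double-centered covariance matrix."""
    k = len(cov)
    row_means = [sum(r) / k for r in cov]
    grand_mean = sum(row_means) / k
    centered = [[cov[a][b] - row_means[a] - row_means[b] + grand_mean
                 for b in range(k)] for a in range(k)]
    trace = sum(centered[a][a] for a in range(k))
    trace_sq = sum(x * x for r in centered for x in r)  # tr(C^2) for symmetric C
    return trace ** 2 / ((k - 1) * trace_sq)


def hf_epsilon(eps_gg: float, n_subjects: int, n_groups: int, k: int) -> float:
    """Huynh-Feldt (1976) epsilon for a design with between-subjects groups."""
    m = k - 1
    denom = m * (n_subjects - n_groups - m * eps_gg)
    if denom <= 0:
        raise ValueError("too few subjects for the Huynh-Feldt estimate")
    return min(1.0, (n_subjects * m * eps_gg - 2) / denom)


def corrected_df(eps: float, df_effect: float, df_error: float) -> Tuple[float, float]:
    """Scale the effect and error degrees of freedom by epsilon."""
    return eps * df_effect, eps * df_error


print(corrected_df(0.83, 11, 385))  # close to the reported (9.1, 319.3)
```

With 12 time bins (k = 12) the uncorrected time effect has 11 degrees of freedom, so an ε below 1 is what turns the nominal F(11, 385) into the fractional values reported.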

Discussion
This study demonstrates that free-operant lever press avoidance behavior is sensitive to the current value of the aversive foot shock, indicating that responding is mediated by a representation of the negative contingency between the response and reinforcer. The reduction in avoidance responding during the extinction test session was only observed in groups that received prior pairings of analgesic drugs and foot shock during the revaluation procedure. These results will be discussed in terms of their neurobiological mechanisms and their implications for theories of avoidance behavior.
Systemic morphine and d-amphetamine have been shown to produce analgesia in a variety of models of pain sensitivity (Babbini et al., 1979; Tricklebank et al., 1984; Abbot et al., 1995; Sohn et al., 2000; Connor et al., 2010). Administering either analgesic drug only before sessions with foot shock presentations decreased avoidance responding during the subsequent drug-free extinction test. This finding suggests that these drugs diminished the pain experienced during foot shock presentations, leading to its revaluation. Infusions of DAMGO in both the PAG and NacS in the paired revaluation groups likewise decreased avoidance responding during the drug-free test session. Both regions have been implicated in mediating analgesia. Morphine is believed to inhibit the ascending transmission of nociceptive information from the spinal cord dorsal horn, leading to activation of descending circuits that include the PAG (Reynolds, 1969; McNally, 1999). Dopamine receptors (DARs) within the PAG have also been shown to mediate antinociception: infusions into the PAG of apomorphine, a DAR agonist, have a direct antinociceptive effect in hot plate tests (Meyer et al., 2009). The involvement of dopamine (DA) in analgesia has also been demonstrated in the Nac, where DA release parallels antinociceptive responses in drug-naive and morphine-pretreated rats (Schmidt et al., 2002) and, in healthy human controls, fluctuates to predict the magnitude of pain.
Pain can, however, be conceptualized as more than simply nociception. The experience of pain can be influenced not only by its sensory properties but also by the motivational state of the subject (Leknes and Tracey, 2008). The relative contribution of sensory and motivational factors to the experience of pain is therefore flexible, and it has been formalized using signal detection theory, which treats the detection of a stimulus against a background "noise" of environmental stimuli as a statistical decision by the subject (Lloyd and Appel, 1976; Rollman, 1979). The sensitivity with which the stimulus is detected against this background noise is a statistical parameter that could reflect the sensory properties of a painful stimulus. Whether a stimulus is judged to differ from background stimuli, that is, whether it is painful relative to the current state of the individual, can also be altered by the subject's response bias or criterion, which is influenced by their motivational state. These processes, the sensory perception of pain and the motivation of the subject to report a stimulus as painful, may be represented in the PAG and NacS, respectively.
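In the equal-variance Gaussian model of signal detection theory, the two quantities described above are conventionally estimated as sensitivity d′ = z(H) − z(F) and criterion c = −[z(H) + z(F)]/2, where H and F are the hit and false-alarm rates and z is the inverse of the standard normal cumulative distribution function. The sketch below is a generic illustration of these textbook indices, not an analysis performed in this study; it shows how two observers can share the same sensitivity while differing in criterion, the dissociation that the account above attributes to the PAG and NacS.

```python
from statistics import NormalDist
from typing import Tuple


def sdt_indices(hit_rate: float, false_alarm_rate: float) -> Tuple[float, float]:
    """Return (d_prime, criterion_c) under the equal-variance Gaussian model."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < false_alarm_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion


phi = NormalDist().cdf
neutral = sdt_indices(phi(1.0), phi(-1.0))  # d' = 2, c = 0
liberal = sdt_indices(phi(1.5), phi(-0.5))  # d' = 2, c = -0.5 (more "painful" reports)
print(neutral, liberal)
```

A lower (more negative) c means the observer reports the stimulus as present more readily at the same d′, which corresponds to the motivational shift in criterion discussed above.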
The sensory aspects of pain are likely to be mediated by the PAG, a midbrain region that acts as a supraspinal site of opioid-mediated analgesia (Pert and Yaksh, 1975; Yaksh et al., 1976; Sohn et al., 2000). The NacS has also been implicated in aversive processing and the mediation of pain (Becerra et al., 2001; Pezze et al., 2001; Aharon et al., 2006; Martinez et al., 2008; Levita et al., 2009; Badrinarayan et al., 2012). Baliki et al. (2010) observed activation of the putative NacS in humans during application of a painful stimulus and before ratings of pain, suggesting that this activity may act as a predictive signal of pain experience. Infusions of DAMGO within the NacS may have revalued the sensory experience of shock via an analgesic mechanism, as predicted for the PAG; however, they could also have activated an appetitive reward system involving the NacS (Parkinson et al., 1999; Corbit and Balleine, 2011), changing the motivational state of the subject.
Lesions of the NacS have been shown to eliminate appetitive outcome-specific Pavlovian instrumental transfer (Corbit and Balleine, 2011). Lesions of the NacS also abolished the potentiating effect of d-amphetamine on preferential lever pressing for a stimulus previously paired with food (Parkinson et al., 1999). In the present study, infusions of DAMGO in the NacS may have activated the appetitive reward system and thereby altered the motivational state of the subject, changing the criterion at which a stimulus is reported as painful with respect to background stimuli. Support for this prediction comes from a study by Navratilova et al. (2012) using a model of experimental postsurgical pain in rats. The investigators blocked afferent input from the site of injury with a local anesthetic; this relief of ongoing pain changed the motivational state of the rats, as reflected in the conditioned place preference (CPP) they displayed. The elicitation of CPP was associated with increased activity of dopaminergic cells within the ventral tegmental area and enhanced DA release specifically in the medial NacS. Our previous studies have indicated that free-operant avoidance behavior is affected by catecholamine manipulations of the NacS, but not of the Nac core subregion, suggesting some specificity of the NacS in avoidance behavior supported by appetitive-aversive interactions (Fernando et al., 2013c). Furthermore, regions connected with the NacS, such as the ventromedial prefrontal cortex and amygdala, have also been implicated in avoidance behavior, acting as interfaces for appetitive and aversive influences (Wilensky et al., 2000; Kim et al., 2006; Prévost et al., 2011; Moscarello and LeDoux, 2013; Fernando et al., 2013a).
The dual role of the NacS in both appetitive and aversive processing suggests that the mechanism by which revaluation of the shock occurred with infusions of DAMGO differed between the NacS and PAG, with each region potentially mediating a different aspect of pain experience. This may be reflected in Figure 5, C and D, in which paired infusions of DAMGO in the PAG during revaluation of shock resulted in a more persistent reduction of avoidance behavior during the reinforced test than in the other groups, a result predicted if the sensory experience of the shock had been altered.
The diverging mechanisms of these two regions in the revaluation of aversive stimuli could nonetheless be indirectly linked anatomically, as previous studies of pain have reported activation of both regions (Becerra et al., 2001; Leknes et al., 2013). Specifically, revaluation of a painful stimulus in humans has been shown to increase functional connectivity between the PAG and ventral striatum (Leknes et al., 2013).
These experiments, at the very least, demonstrate that the value of the negative reinforcer is encoded in the associative representations that mediate avoidance behavior. The precise nature of these processes, however, remains undetermined. Cognitive theory (Bolles, 1970; Seligman and Johnston, 1973; Lovibond, 2008) argues that subjects learn the outcomes of responding and of not responding and then decide whether to respond or withhold responding on the basis of a comparison between the two expected outcomes. Cognitive theory assumes that the aversive valence of the reinforcer motivates avoidance behavior through knowledge of the negative contingency between response and reinforcer. The implication for the present results is that revaluation of the foot shock may have reduced the preference for performing the lever press, which was negatively correlated with presentation of the aversive reinforcer, over not performing the avoidance response.
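The comparison that cognitive theory posits can be illustrated with a deliberately simple expected-value rule. In the hypothetical sketch below, the shock probabilities, the response cost, and the function name are all illustrative assumptions, and the free-operant contingency (in which each press postpones shock) is collapsed into two fixed probabilities. The point is only that lowering the aversive value of shock shrinks the advantage of pressing over not pressing, even though the response-shock contingency itself is unchanged.

```python
def press_advantage(shock_value: float, p_shock_if_press: float,
                    p_shock_if_no_press: float, response_cost: float = 0.0) -> float:
    """Expected aversiveness avoided by pressing, minus a hypothetical response cost.

    Positive values favor pressing; zero or negative values favor withholding.
    """
    expected_if_press = p_shock_if_press * shock_value
    expected_if_no_press = p_shock_if_no_press * shock_value
    return (expected_if_no_press - expected_if_press) - response_cost


# Same (hypothetical) contingency before and after revaluation; only the value changes.
before = press_advantage(shock_value=1.0, p_shock_if_press=0.05,
                         p_shock_if_no_press=0.9, response_cost=0.1)
after = press_advantage(shock_value=0.3, p_shock_if_press=0.05,
                        p_shock_if_no_press=0.9, response_cost=0.1)
print(before, after)  # the advantage of pressing is smaller after revaluation
```

Because the value term multiplies both expected outcomes, any reduction in the aversiveness of shock reduces the difference between them, which is the sense in which this account predicts sensitivity to revaluation.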
Alternatively, classic two-factor theory (Mowrer, 1947; Konorski, 1967) could also predict the sensitivity of avoidance behavior to revaluation of the aversive reinforcer if it is assumed that the fear motivation generated by Pavlovian conditioning (to the context, in the case of free-operant avoidance) is mediated by some representation of the shock. Aversive Pavlovian reinforcer revaluation studies have demonstrated inflation of the value of the unconditioned stimulus (US). For example, Rescorla (1974) reported that noncontingent exposure to a more intense shock US than that used during Pavlovian conditioning can enhance subsequent conditioned responding, also suggesting a role for a representation of the value of the US.
Whatever the merits of these two theories, the present study confirms that avoidance conditioning is mediated by a representation of the aversive reinforcer. Changing the value of a reinforcer of behavior allows the experimenter to test whether the animal has knowledge of the causal relationship between the response and reinforcer. The reduction in avoidance behavior during the drug-free extinction test in this study suggests that the rats had learned the negative contingency between the avoidance response and aversive foot shock. This negative contingency could thus engender goal-directed processes that underlie avoidance behavior even after an extensive degree of avoidance training.
This study provides evidence for the successful revaluation of a negative, aversive reinforcer of free-operant lever press avoidance behavior, suggesting that rats learned the negative contingency between their avoidance response and the presentation of the aversive foot shock. This is the first study, to our knowledge, to demonstrate the effect of revaluing an aversive reinforcer on free-operant avoidance behavior. The study is novel not only in assessing the neurobiological basis of this process, but also in its use of analgesic drugs to revalue the aversive reinforcer. The sensitivity of free-operant avoidance behavior to this revaluation procedure could thus provide a useful tool with which to study avoidance habits (e.g., after overtraining of the avoidance response), which may be relevant to anxiety disorders such as obsessive compulsive disorder (Gillan et al., 2013).