The ability of animals to extract predictive information from the environment to inform their future actions is a critical component of decision-making. This phenomenon is studied in the laboratory using the pavlovian–instrumental transfer protocol in which a stimulus predicting a specific pavlovian outcome biases choice toward those actions earning the predicted outcome. It is well established that this transfer effect is mediated by corticolimbic afferents on the nucleus accumbens shell (NAc-S), and recent evidence suggests that δ-opioid receptors (DORs) play an essential role in this effect. In DOR-eGFP knock-in mice, we show a persistent, learning-related plasticity in the translocation of DORs to the somatic plasma membrane of cholinergic interneurons (CINs) in the NAc-S during the encoding of the specific stimulus–outcome associations essential for pavlovian–instrumental transfer. We found that increased membrane DOR expression reflected both stimulus-based predictions of reward and the degree to which these stimuli biased choice during the pavlovian–instrumental transfer test. Furthermore, this plasticity altered the firing pattern of CINs, increasing the variance of action potential activity, an effect that was exaggerated by DOR stimulation. The relationship between the induction of membrane DOR expression in CINs and both pavlovian conditioning and pavlovian–instrumental transfer provides a highly specific function for DOR-related modulation in the NAc-S, and it is consistent with an emerging role for striatal CIN activity in the processing of predictive information. Therefore, our results reveal evidence of a long-term, experience-dependent plasticity in opioid receptor expression on striatal modulatory interneurons critical for the cognitive control of action.
Adapting to a complex changing environment requires the capacity to extract predictive information from environmental events to guide future actions. Therefore, such cognitive control involves the integration of two learning processes encoding stimulus- and action-based predictions of rewarding outcomes. This integrative capacity is studied in the laboratory using the outcome-specific pavlovian–instrumental transfer (PIT) paradigm, in which a stimulus predicting a specific pavlovian outcome is shown to bias choice toward those actions that earn the predicted outcome (Colwill and Rescorla, 1988; Dickinson and Balleine, 1994, 2002).
At a neural level, the nucleus accumbens plays an important role in the way reward value and stimuli that predict reward affect the performance of, and choice between, goal-directed actions (Parkinson et al., 2000; Yin et al., 2008; Salamone and Correa, 2012), and it is well established that pavlovian–instrumental transfer is mediated by corticolimbic afferents on the nucleus accumbens shell (NAc-S) (Corbit et al., 2001; Shiflett and Balleine, 2010; Balleine et al., 2011; Corbit and Balleine, 2011). Recent evidence suggests that δ-opioid receptors (DORs) play an essential role in this effect; pharmacological blockade or genetic deletion of DORs removes the ability of predictive learning to influence choice (Laurent et al., 2012). Nevertheless, given that predictive learning and the choices it informs take place at different times, it remains unknown how the integration of pavlovian and instrumental learning emerges from this circuitry.
DORs belong to the superfamily of G-protein-coupled receptors, the largest group of cell-surface receptors involved in countless physiological and neuromodulatory processes (Cahill et al., 2007; Hanyaloglu and von Zastrow, 2008). In the brain, DORs and other opioid receptors have been shown to undergo dynamic membrane trafficking to adjust cellular responses to external stimuli (Cahill et al., 2007), a process that is directly related to neuronal plasticity events such as desensitization, resensitization, and tolerance (Dang and Christie, 2012). Although DORs have a substantial presynaptic distribution, their exact cellular localization has been debated (Svingos et al., 1998, 1999; Cahill et al., 2001). The generation of reporter DOR-eGFP knock-in mice (DOR-eGFPki), allowing the tracking of functional receptors in vivo (Scherrer et al., 2006; Pradhan et al., 2009; Faget et al., 2012), has revealed that, in the accumbens shell, DORs are localized postsynaptically on both GABAergic projection neurons and cholinergic interneurons (Le Moine et al., 1994; Scherrer et al., 2006; Pradhan et al., 2011).
Striatal cholinergic interneurons (CINs), despite representing only 2–3% of the neurons in the striatum, provide the main source of acetylcholine to all striatal regions (Sorimachi and Kataoka, 1975) and strongly modulate dopaminergic actions through complex regulation of presynaptic and postsynaptic acetylcholine receptors (Threlfell and Cragg, 2011). Influential hypotheses of striatal function have highlighted the role played by CINs in the way environmental context controls decision-making and the modulation of action selection (Apicella, 2007; Stocco, 2012; Bradfield et al., 2013). Therefore, given this involvement, the present study sought to establish the relationship between DOR activity in NAc-S CINs and the influence of predictive learning on choice between goal-directed actions.
Materials and Methods
A total of 183 mice were used in the present study (10 for histological experiments, 129 for behavior-fluorescence experiments, and 44 for behavior-electrophysiology experiments). In all experiments, we used homozygous male C57Bl/6 DOR-eGFP knock-in transgenic mice, in which functional delta-opioid receptor gene (Oprd1) fused to enhanced green fluorescent protein gene (eGFP) is inserted in the wild-type Oprd1 locus, which provides fluorescent DORs with maintained cellular functions (Scherrer et al., 2006). Importantly, these mice showed equal conditioned responses to C57BL/6 wild-type mice and expressed normal pavlovian and instrumental learning (data not shown). The initial colony was generously provided by the laboratory of Prof. B. L. Kieffer (CNRS, Illkirch, France). Mice were housed in plastic boxes (two to six mice per box) located in a climate-controlled colony room and were maintained on a 12 h light/dark cycle (light on at 7:00 A.M.). They were at least 8 weeks old at the start of the experiments. Five days before the behavioral procedures, all mice were handled daily and were put on a food deprivation schedule to maintain them at ∼85% of their ad libitum feeding weight. The Animal Ethics Committee at the University of Sydney approved all experimental procedures.
Training and testing took place in 32 operant chambers (MED Associates) enclosed in sound- and light-resistant shells. Each chamber was equipped with a pump fitted with a syringe that could deliver 0.025 ml of a 20% sucrose solution into a recessed magazine in the chamber. Each chamber was also equipped with two pellet dispensers that could individually deliver either grain food pellets (20 mg; Bioserve Biotechnologies) or chocolate food pellets (20 mg) when activated. The chambers contained two retractable levers that could be inserted to the left and right side of the magazine. An infrared photobeam crossed the magazine opening, allowing for the detection of head entries. A 3 W, 24 V house light provided illumination of the operant chamber, and each chamber contained a Sonalert that, when activated, delivered a 3 kHz pure tone, a 28 V DC mechanical relay that was used to deliver a 2 Hz clicker stimulus, and a white noise generator (80 dB). A set of four microcomputers running proprietary software (Med-PC; MED Associates) controlled all experimental events and recorded magazine entries and lever presses.
Contingent pavlovian training involved eight daily sessions during which the levers were retracted. Each session was of 60 min duration and consisted of presenting two conditioned stimuli (CS1 and CS2; noise, clicker, or tone), each paired with one or another food outcome (O1 and O2; sucrose solution, chocolate pellets, or grain pellets). CS–O pairings were counterbalanced in all experiments: half of the mice received CS1-O1 and CS2-O2, and the other half received CS1-O2 and CS2-O1. Each CS lasted 2 min in duration and was presented four times in a pseudorandom order with a variable intertrial interval of 5 min. O1 or O2 was delivered on a random-time 30 s schedule throughout the appropriate CS. Noncontingent pavlovian training was identical to contingent training except the conditioned stimuli and the delivery of the food outcomes were uncorrelated and dispersed across the entire session. Therefore, the S–O predictive relationships were weakened in this group, as the outcomes could be obtained in the presence or absence of either CS. The number of O1 and O2 delivered in one noncontingent training session was identical to the number of O1 and O2 given in a contingent training session. Conditioned responding (CR) during contingent and noncontingent training was analyzed using an elevation ratio of magazine entries. This ratio was obtained by dividing the total number of magazine entries during CS1 and CS2 by the addition of that number with the total number of magazine entries in the pre-CSs period [i.e., CS/(CS + pre-CS)]. A pre-CS period was defined for each CS presentation as the 2 min preceding that presentation. One elevation ratio per animal was calculated on each training day. An elevation ratio of 0.5 indicated that the animals entered the magazine as much during the CSs as outside the CSs (i.e., poor learning). 
In contrast, an elevation ratio close to 1 showed that the animal entered the magazine substantially more in the presence of the CSs than in their absence (i.e., good learning). The CR for each animal was defined as the average elevation ratio displayed across the last 3 d of pavlovian training. A mouse with a CR more than 1 SD below the mean was defined as a low-CR mouse. In contrast, a mouse with a CR more than 1 SD above the mean was defined as a high-CR mouse (see Figs. 3, 4).
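The elevation ratio and the high-CR/low-CR split described above can be illustrated with a minimal Python sketch. The function names, the handling of a day with no magazine entries, and the use of the sample SD are assumptions made for illustration; the text does not specify them.

```python
from statistics import mean, stdev


def elevation_ratio(cs_entries, pre_cs_entries):
    """Return CS / (CS + pre-CS) for one animal on one training day."""
    total = cs_entries + pre_cs_entries
    if total == 0:
        # Assumption: a day with no entries at all is scored as no discrimination.
        return 0.5
    return cs_entries / total


def conditioned_response(daily_ratios, n_days=3):
    """Average the elevation ratio over the final n_days of pavlovian training."""
    return mean(daily_ratios[-n_days:])


def classify_cr(cr_by_mouse):
    """Label each mouse 'low', 'high', or 'mid' relative to the group mean +/- 1 SD."""
    values = list(cr_by_mouse.values())
    group_mean = mean(values)
    group_sd = stdev(values)  # assumption: sample SD (n - 1 denominator)
    labels = {}
    for mouse, cr in cr_by_mouse.items():
        if cr < group_mean - group_sd:
            labels[mouse] = "low"
        elif cr > group_mean + group_sd:
            labels[mouse] = "high"
        else:
            labels[mouse] = "mid"
    return labels
```

For example, a mouse making 30 magazine entries during the CSs and 10 during the pre-CS periods has an elevation ratio of 30/(30 + 10) = 0.75.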
Instrumental training was administered across 8 to 10 d during which two responses (left and right lever presses) were trained with the two different food outcomes (O1 and O2) in separate daily sessions. The order of the sessions was counterbalanced, as were the response–outcome relationships with the CS–outcome relationships established during pavlovian training. Each session ended when 20 outcomes were earned or when 30 min had elapsed. For the first 2 d, lever pressing was continuously reinforced (i.e., each response was reinforced). Then, the probability of the outcome given a response was gradually shifted over days using increasing random ratio (RR) schedules: a RR5 schedule (p = 0.2) was used on days 3–5, and a RR10 (p = 0.1) was used on days 6–8 or 6–10. Performance during instrumental training was assessed using the mean number of lever presses per minute. This rate of lever press was calculated for each animal during each session.
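The progression of reinforcement schedules can be sketched as follows. This is only an illustration of how a random ratio schedule with a fixed per-response probability behaves; the function names are hypothetical, the 30 min time limit is not modeled, and the sketch is not the Med-PC program used in the study.

```python
import random


def schedule_probability(day):
    """Probability that a single lever press is reinforced on a training day (1-indexed)."""
    if day <= 2:
        return 1.0  # continuous reinforcement
    if day <= 5:
        return 0.2  # RR5
    return 0.1      # RR10 (days 6-8 or 6-10)


def simulate_session(day, n_presses, max_outcomes=20, rng=None):
    """Simulate up to n_presses responses; stop once max_outcomes are earned.

    Returns (presses_made, outcomes_earned). The session time limit is ignored.
    """
    rng = rng or random.Random()
    p = schedule_probability(day)
    presses = 0
    outcomes = 0
    for _ in range(n_presses):
        presses += 1
        if rng.random() < p:
            outcomes += 1
            if outcomes >= max_outcomes:
                break
    return presses, outcomes
```

Under continuous reinforcement, `simulate_session(1, 50)` ends after exactly 20 presses, whereas under RR10 an animal needs on average 10 presses per outcome.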
During the pavlovian–instrumental transfer test, both levers were inserted into the chamber, but no outcomes were delivered. Responding was extinguished on both levers for 8 min to establish a low rate of baseline performance. Each CS was presented four times over the next 40 min in the following order: CS2-CS1-CS1-CS2-CS1-CS2-CS2-CS1. Stimulus presentations lasted 2 min and were separated by a 3 min fixed interval. In animals that had received contingent training, performance during the test was evaluated as lever press rate minus baseline when the stimulus predicted the same outcome as the response (Same), and when the CS predicted a different outcome from the response (Different). In animals that had received noncontingent training, Same was defined as left lever press rate minus baseline, and Different corresponded to right lever press rate minus baseline. This pseudorandom allocation of performance was justified because noncontingent training prevented the establishment of specific relationships between the stimuli and the outcomes. In all animals, baseline performance was subtracted because no difference between groups was ever detected (Fs < 1.4). In the correlative analyses conducted in contingent animals, a transfer score was initially calculated as follows: (rate Same/rate Baseline) minus (rate Different/rate Baseline). To reduce the influence of variation in instrumental learning on this score, a normalizing factor (N) was generated based on performance in the 8 min extinction period at the start of the PIT test. N was obtained by dividing the rate of performance (responses per minute) for each subject by the mean rate of performance in the contingent group. The normalized transfer score was calculated by dividing the initial transfer score by N.
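The transfer score and its normalization can be written out as a short Python sketch. The function and parameter names are hypothetical, and the baseline rate and the extinction-period rate are kept as separate arguments because the text does not state whether they are the same measurement.

```python
def transfer_score(rate_same, rate_different, rate_baseline):
    """(rate Same / rate Baseline) minus (rate Different / rate Baseline)."""
    if rate_baseline <= 0:
        raise ValueError("baseline rate must be positive")
    return rate_same / rate_baseline - rate_different / rate_baseline


def normalizing_factor(subject_extinction_rate, group_extinction_rates):
    """N: the subject's extinction-period rate divided by the contingent-group mean rate."""
    group_mean = sum(group_extinction_rates) / len(group_extinction_rates)
    if group_mean <= 0:
        raise ValueError("group mean rate must be positive")
    return subject_extinction_rate / group_mean


def normalized_transfer_score(score, n_factor):
    """Initial transfer score divided by N."""
    if n_factor == 0:
        raise ValueError("normalizing factor must be non-zero")
    return score / n_factor
```

As a worked example, a mouse pressing 12 times per minute on Same, 6 on Different, and 3 at baseline has a transfer score of 12/3 − 6/3 = 2. If its extinction-period rate is half the group mean (N = 0.5), the normalized score is 4.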
Transcardial fixation and sectioning
Immediately after the last behavioral exercise, mice were rapidly anesthetized first through exposure to isoflurane (Laser Animal Health) in a sealed chamber for 10 s (4% in air) and second through a lethal injection of sodium pentobarbital (500 mg/kg i.p.; Virbac) to minimize the stress of the animals before perfusion. After confirmation of a deeply anesthetized state through paw, tail, and eye reflexes, mice were transcardially perfused with 4% paraformaldehyde in 0.1 M sodium phosphate buffer (PB), pH 7.5, with a peristaltic pump (peristaltic flow, 15–20 ml/min) (Gilson Miniplus 3; Gilson). Brains were postfixed overnight in the same solution at 4°C. Coronal sections (30 μm, 1.78–1.1 mm from bregma) were cut with a vibratome (VT1000; Leica Microsystems) and stored at −20°C in a solution containing 30% ethylene glycol, 30% glycerol, and 0.1 M sodium phosphate buffer until they were processed for immunofluorescence.
Individualized free-floating sections were rinsed in Tris-buffered saline (TBS; 0.25 m Tris, 0.5 m NaCl, pH 7.5), incubated for 5 min in TBS containing 3% H2O2 and 10% methanol, and rinsed three times for 10 min in TBS. After a 20 min incubation in 0.2% Triton X-100 in TBS, sections were rinsed three times in TBS again. In all experiments, the DOR-eGFP signal was amplified through incubation with either polyclonal rabbit anti-eGFP primary antibody (1:500, catalog #A11122; Invitrogen) or polyclonal chicken anti-eGFP (1:1000, catalog #GFP-1020; Aves Labs) diluted in TBS (4°C, overnight). In initial immunofluorescence studies, eGFP immunoamplification was combined with DARPP-32 immunodetection through incubation with purified mouse anti-DARPP-32 (1:300, catalog #611520; BD Biosciences) in the same TBS solution. In triple immunofluorescence assays, eGFP immunoamplification was combined with simultaneous detection of choline acetyltransferase (ChAT) and either synapsin I or GAD65/67 through incubation with combined polyclonal goat anti-ChAT (1:500, catalog #AB144P; Millipore), polyclonal rabbit anti-Synapsin I (1:1000, catalog #51-5200; Invitrogen) or polyclonal rabbit anti-GAD65/67 (1:1000, catalog #G5163; Sigma-Aldrich) diluted in TBS (4°C, overnight). Sections were then rinsed 10 min in TBS three times and incubated 60 min at room temperature with compatible sets of fluorescent secondary antibodies diluted 1:400 in TBS (purchased from Jackson ImmunoResearch Laboratories and Invitrogen): donkey anti-rabbit Alexa 488 (eGFP amplification), guinea pig anti-chicken FITC (eGFP amplification when other primary rabbit was used), donkey anti-mouse Cy3 (DARPP-32), donkey anti-goat Alexa Fluor 594 or CY3 (ChAT), and donkey anti-rabbit Alexa Fluor 647 (Synapsin I and GAD65/67). 
Sections were rinsed four times for 10 min in TBS, mounted in Superfrost Plus-coated slides (Thermo Scientific), and let dry for 10 min before being coverslipped in Vectashield mounting medium for fluorescence (Vector Laboratories).
All images were obtained using sequential laser-scanning confocal microscopy (LSM 710, Carl Zeiss; FV300 and FV1000, Olympus), and all fluorescent quantifications were performed with Open Source ImageJ software (MacBiophotonics upgrade version 1.43u, Wayne Rasband, National Institutes of Health, Bethesda, MD). In the initial study of DOR distribution in different striatal subregions (Fig. 1B), low-magnification confocal sections covering a significant part of the structure were taken (surface, 635.2 μm2; optical magnification, 20×; averaging scans, five; pixel depth, 16 bit; resolution, 2.519 pixels/μm). One image per region was obtained in each hemisphere, in a total of six mice (12 hemispheres). Acquisition parameters remained invariable for all images. Before fluorescence quantification, the signal contained in fiber bundles was excluded by applying a threshold (Fig. 1B, insets, thr), so only relevant striatal tissue was considered. Mean gray value was then quantified in every image, and final values per region were expressed as averaged hemispheres. In the study of DOR distribution within CINs (Fig. 2E), biocytin-filled neurons in 250 μm slices were reconstructed through a stack of 44 1024 × 1024 confocal sections (303.64 μm2; step size, 2 μm) and shown as Z-projection (SD). Amplified stacks of the soma, proximal dendrites, and distal dendrites were also acquired (18–36 1024 × 1024 confocal images; 42.51 μm2; step size, 0.7 μm), and single optical planes of eGFP fluorescence containing an optimal section of the soma, proximal dendrite, and distal dendrite were selected (Fig. 2E, right). In subsequent fluorescence studies, the somatic membrane distribution of DORs was studied in individualized cholinergic interneurons of the NAc-S by obtaining a single confocal image of all clearly ChAT-immunoreactive neurons found in the region of interest (bilaterally) in 30 μm tissue sections (total of acquisitions was 3118 neurons in 115 mice). 
For each neuron, a single focal plane with optimal ChAT immunoreactivity was determined in channel 2 (Ch02, HeNe green laser). Sequential 58.93 μm² single confocal images (optical magnification, 60×; digital zoom, 4×; resolution, 17.378 pixels/μm) were obtained for the ChAT signal (Ch02; HeNe green laser intensity, typically 20.0%; photomultiplier tube (PMT), 720 V; offset, 2%) and the corresponding DOR-eGFP+A488/FITC signal (Ch01; Ar laser intensity, typically 15.0%; PMT, 740 V; offset, 2%) with a Kalman filter (five averaging scans). With these images, two complementary fluorescence analyses were used in this study. In the first type of analysis (Fig. 2F), images were converted to RGB (8 bits), and mean gray value line profiles were plotted from 10 to 20 segments distributed along the somatic area of the ChAT image (2 μm length, comprising a continuous line of 35 pixels). Segments were placed perpendicular to the edge, with the center located at the intracellular–extracellular interface defined by the ChAT staining. Values for each segment were then collected from the overlapped eGFP/A488/FITC image. The second type of analysis (Fig. 2G) was performed on raw 12-bit images, where two different regions of interest were defined in the ChAT image of each neuron: ROI 1 comprised the somatic region (located at the intracellular–extracellular interface defined by the ChAT staining), whereas ROI 2 was used as a background correction and comprised the nuclear region (as defined by the central region devoid of ChAT staining). Mean gray value for each ROI was then collected from the overlapped eGFP/A488/FITC image and expressed, for each neuron, as ROI 1 minus ROI 2. One single fluorescence value was finally obtained per animal (average of all the quantified neurons). 
In all cases, microscope acquisitions were performed by an experimenter unaware of the behavioral score underlying the samples, and all image files in each experiment were randomly renumbered using a Microsoft Excel plug-in (fabricated by Romain Bouju, Paris, France) before all quantifications.
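The second type of fluorescence analysis amounts to a background subtraction per neuron followed by an average per animal. A minimal Python sketch is given here under stated assumptions: images are represented as nested lists of gray values, each ROI as a list of (row, column) pixel coordinates, and all names are hypothetical. The study itself used ImageJ, not this code.

```python
from statistics import mean


def mean_gray(image, roi):
    """Mean gray value of the pixels listed in roi, given as (row, col) pairs."""
    return mean(image[r][c] for r, c in roi)


def membrane_signal(egfp_image, roi_membrane, roi_nucleus):
    """ROI 1 (somatic membrane) minus ROI 2 (nuclear background) for one neuron."""
    return mean_gray(egfp_image, roi_membrane) - mean_gray(egfp_image, roi_nucleus)


def animal_value(neuron_signals):
    """Single fluorescence value per animal: the average over all quantified neurons."""
    return mean(neuron_signals)


# Toy 3 x 3 image: bright border (membrane) around a dim center (nucleus).
toy_image = [
    [100, 100, 100],
    [100, 20, 100],
    [100, 100, 100],
]
border = [(r, c) for r in range(3) for c in range(3) if (r, c) != (1, 1)]
center = [(1, 1)]
toy_signal = membrane_signal(toy_image, border, center)
```

In the toy image, the membrane ROI averages 100 and the nuclear ROI is 20, so `toy_signal` is 80.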
Brain slice preparation
Male DOR-eGFP mice (9 weeks old) with prior pavlovian training were killed under deep anesthesia by isoflurane inhalation (4% in air). The brain was rapidly removed and cut using a vibratome (VT1200S; Leica Microsystems) in ice-cold oxygenated sucrose buffer containing (in mM) 241 sucrose, 28 NaHCO3, 11 glucose, 1.4 NaH2PO4, 3.3 KCl, 0.2 CaCl2, and 7 MgCl2. Coronal brain slices (250 μm thick) containing the nucleus accumbens were sampled and maintained at 33°C in a submerged chamber containing physiological saline with the following composition (in mM): 126 NaCl, 2.5 KCl, 1.4 NaH2PO4, 1.2 MgCl2, 2.4 CaCl2, 11 glucose, and 25 NaHCO3, equilibrated with 95% O2 and 5% CO2.
After equilibration for 1 h, slices were transferred to a recording chamber, visualized under an upright microscope (BX50WI; Olympus) using differential interference contrast (DIC) Dodt tube optics, and superfused continuously (1.5 ml/min) with oxygenated physiological saline at 33°C. Cell-attached and whole-cell patch-clamp recordings were made using electrodes (2–5 MΩ) containing internal solution (in mM): 115 K gluconate, 20 NaCl, 1 MgCl2, 10 HEPES, 11 EGTA, 5 Mg-ATP, and 0.33 Na-GTP, pH 7.3; osmolarity, 285–290 mOsm/l. Biocytin (0.1%) was routinely added to the internal solution for marking the sampled neurons during whole-cell recording. Data acquisition was performed with an Axopatch 200A amplifier (Molecular Devices), connected to a Macintosh computer and interface ITC-16 (InstruTECH). Liquid junction potentials of −10 mV were corrected. In cell-attached mode, action potentials were sampled at 10 kHz (low-pass filter, 5 kHz), and whole-cell currents were sampled at 5 kHz (low-pass filter, 2 kHz; Axograph X, Axograph). Whole-cell recordings were established immediately following data collection in cell-attached mode. Stock solutions of all drugs were diluted to working concentrations in the extracellular solution immediately before use and applied by continuous superfusion. Data from cell-attached and whole-cell recordings were only included in analyses if (1) the neurons appeared healthy under DIC on a monitor screen showing smooth, even cell membrane texture and integrity without visible nucleus; (2) cholinergic interneurons were spontaneously active during cell-attached recording; (3) action potential amplitudes were at least 70 mV after establishing whole-cell recording mode; and (4) neurons demonstrated physiological characteristics of cholinergic interneurons, such as presence of hyperpolarization-activated cation current Ih but no plateau low-threshold spiking (Kawaguchi, 1993), to ensure that only highly viable neurons were included.
For biocytin-filled neurons, immediately after whole-cell physiological recording, brain slices were fixed overnight in 4% paraformaldehyde/0.16 m PB solution followed by placing them in 0.3% Triton X-100/PB for 3 d to permeabilize cell membrane. Slices were rinsed in PB and then incubated in Cy3-conjugated ExtrAvidin/PB solution (1:500; Sigma-Aldrich) for 2 h. Stained slices were rinsed, mounted onto glass slides, dried, and coverslipped with Vectashield mounting medium (Vector Laboratories). In post hoc DOR-eGFP analyses in ChAT-immunoreactive neurons in slices, fixation and permeabilization were performed equally, and postpermeabilization immunofluorescence procedures were the same as the ones described above.
Drugs and chemicals
The DOR antagonist naltrindole hydrochloride (Tocris Bioscience) was dissolved in distilled water to obtain a final concentration of 5 mg/ml. It was injected intraperitoneally at a volume of 1 ml/kg before the specific PIT test. Distilled water was used as the vehicle to control for any effect of the injection procedure per se. Biocytin and picrotoxin were purchased from Sigma-Aldrich. dl-AP-5 and CNQX disodium were from Ascent Scientific. [D-Ala2]-deltorphin II was from Tocris Bioscience.
Statistical analyses were conducted using within-subjects or mixed-model ANOVA depending on the experimental design (unless stated otherwise). For all analyses, significance was assessed against a type I error rate of 0.05. ANOVAs were followed by simple main-effects analyses to establish the source of any significant interactions. All correlative studies were analyzed with the Pearson product moment correlation.
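For readers unfamiliar with the correlative analysis, the Pearson product moment correlation can be computed as in the following Python sketch. It is a textbook implementation written for illustration, with a hypothetical function name; the statistical software actually used is not stated in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)
```

In this study the two sequences would be, for instance, one conditioned response value and one membrane fluorescence value per animal.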
DOR detection in the somatic membrane of ventral cholinergic interneurons
We first analyzed the histological distribution of DORs in the striatum of DOR-eGFPki mice using confocal microscopy and confirmed previously reported enrichment of DORs in striatal tissue (Scherrer et al., 2006) (Fig. 1). In ventral regions, enhanced fluorescent signal in the NAc-S contrasted with the weaker signal recorded in adjacent areas of the nucleus accumbens core (NAc-C; Fig. 1A). Fluorescence quantification in different striatal regions revealed a main effect of region (F(1,22) = 32.48, p < 0.001), with the NAc-C expressing significantly less fluorescence than the other striatal regions. Higher-magnification analyses of the NAc-S revealed a complex tissue distribution of DORs (Svingos et al., 1999), with only modest levels of colocalization with the postsynaptic projection marker DARPP-32 (Fig. 1A, insets) and the nerve terminal markers synapsin I and GAD65/67 (Fig. 1C,D). Importantly, we detected the presence of DORs in the somatic membrane of some CINs (Fig. 1C,D), consistent with previous findings in other striatal areas (Le Moine et al., 1994; Scherrer et al., 2006). In striatal slice preparations, we identified CINs in the NAc-S by their electrophysiological profile (Fig. 2A–D) (Kawaguchi et al., 1995; Bertran-Gonzalez et al., 2012), followed by post hoc confocal reconstructions of biocytin spread (Fig. 2E). Analysis of eGFP fluorescence showed a particular enrichment of DORs in the plasma membrane, which was reduced in proximal dendrites and undetectable in distal dendrites (Fig. 2E, right).
Based on the importance of DORs for the influence of reward-related stimuli on choice in PIT (Laurent et al., 2012) and the established role of CINs in associative learning (Apicella, 2007; Brown et al., 2012; Stocco, 2012), we sought to investigate the relationship between specific PIT and membrane DOR distribution in CINs of the NAc-S. Because of the relative low density of CIN somata in the striatum, we developed a method for exhaustive analysis of DOR membrane expression in these neurons through fluorescence quantification in DOR-eGFPki mice, in which the extent of membrane fluorescence was assessed in all individual CINs detected in a NAc-S section (Fig. 2F,G).
Predictive learning induces DOR accumulation in the somatic membrane of NAc-S CINs
In our first experiment, food-deprived DOR-eGFPki mice were exposed to a standard PIT protocol (Fig. 3A). Pavlovian conditioning, in which two different stimuli (S1 and S2) were paired with distinct food outcomes (O1 and O2), was followed by instrumental training in which two different lever press responses (R1 and R2) were trained with each action earning one of the outcomes used in pavlovian conditioning (Fig. 3A; see Materials and Methods). In parallel, two control groups were exposed either to the context alone or to pavlovian training without instrumental training (Non-Inst). In the PIT and Non-Inst groups, conditioned responding during pavlovian training gradually increased across days to both cues (Fig. 3B; F(7,161) = 21.6, p < 0.001), and a gradual increase in lever press responding during instrumental training was observed only in the PIT group (Fig. 3C; F(9,126) = 21, p < 0.001). Mice were then exposed to a PIT test, in which they were allowed to choose between the instrumental actions in the presence of the predictive stimuli. As expected (Rescorla, 1994), only mice that had received the complete training altered their choice performance during the stimuli, selecting the action previously associated with the outcome predicted by the stimulus more than the other action (i.e., S1: R1 > R2 and S2: R1 < R2; Fig. 3D; F(1,14) = 6.5, p < 0.05).
Quantification of DOR membrane distribution revealed an increase in membrane DOR accumulation in NAc-S CINs from mice that were given pavlovian training compared with those only exposed to the context (Fig. 3E,F; F(1,14) = 7.1, p < 0.05). Interestingly, all animals exposed to pavlovian training showed equivalent membrane DOR levels whether or not they underwent instrumental training (Fig. 3F; F = 0.7). To test the possibility that membrane DOR expression is induced specifically by the PIT test, animals were exposed to pavlovian, followed by instrumental, training but were not exposed to the test phase. No difference in membrane DOR expression was found between this group and a group given the PIT test [1152.7 ± 46.8 (n = 8) vs 1278.5 ± 110.0 (n = 7); F = 1.2]. Together, these results suggest that increased membrane DOR expression in NAc-S CINs is induced by pavlovian conditioning, whether or not instrumental training or the PIT test were given. They also suggest that this increase, like the influence of predictive learning on choice, persists for at least the 11 d after cessation of this conditioning before the test.
This finding led us to hypothesize that encoding stimulus–outcome (S–O) contingencies during initial pavlovian training promotes increased DOR expression in the somatic membrane of NAc-S CINs. Indeed, when we calculated the individual conditioned responses (CRs) performed during the last 3 d of pavlovian training (Fig. 3B; see Materials and Methods), mice displaying a high degree of conditioned responding expressed considerably more DORs in the somatic membrane of their NAc-S CINs than those showing a lower degree of responding (Fig. 4A,B). Moreover, when all animals trained for pavlovian conditioning were considered, conditioned responding correlated significantly with membrane DOR expression in CINs (Fig. 4C; r = 0.511, p < 0.01).
Pavlovian conditioning modifies firing of NAc-S CINs
Based on previous reports that the burst–pause firing pattern of CINs is implicated both in plasticity associated with CINs (Ding et al., 2010; Schulz and Reynolds, 2013) and with associative learning in the NAc (Brown et al., 2012), we hypothesized that the accumulation of DORs in the membrane of CINs in response to pavlovian training reported above should (1) be observed shortly after pavlovian training and (2) induce detectable changes in the basal tonic firing pattern of these neurons. We therefore focused our next electrophysiology experiments on the initial S–O learning phase. Immediately after the last pavlovian training session (Fig. 5A), DOR-eGFPki mice showing the highest and the lowest levels of conditioned responding in each session were selected for slice electrophysiology (Fig. 5B). In parallel, using fluorescence analyses of consecutive slices, we confirmed in these samples the increase in membrane DOR accumulation in NAc-S CINs of mice expressing high versus low CRs, but now immediately after pavlovian training (Fig. 5C; F(1,39) = 9.3, p < 0.01), suggesting that membrane DOR accumulation is developed during pavlovian training and not subsequently during instrumental training or the noninstrumental “incubation period.” Furthermore, in cell-attached recordings from the NAc-S, CINs in high-CR mice showed a more irregular action potential firing, which combined periods of intermittent firing with burst-type firing (Fig. 5D), overall resulting in greater variability in the instantaneous action potential frequency compared with mice expressing low CRs (Fig. 5E; Mann–Whitney U test, p < 0.05). However, overall mean action potential frequencies between groups remained unchanged (low CRs: 3.1 ± 0.7 Hz, n = 7; high CRs: 3.7 ± 0.4 Hz, n = 7; Mann–Whitney U test, p = 0.26).
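The dissociation between mean firing rate and firing variability can be illustrated with a short Python sketch. It assumes that instantaneous frequency is the reciprocal of each interspike interval, which is the usual definition but is not spelled out in the text; the function names and the spike trains are invented for illustration.

```python
from statistics import variance


def instantaneous_frequencies(spike_times):
    """Reciprocal of each interspike interval (Hz); spike_times in seconds, ascending."""
    return [1.0 / (t2 - t1) for t1, t2 in zip(spike_times, spike_times[1:])]


def mean_rate(spike_times):
    """Number of intervals divided by the time between the first and last spike (Hz)."""
    return (len(spike_times) - 1) / (spike_times[-1] - spike_times[0])


def inst_freq_variance(spike_times):
    """Sample variance of the instantaneous frequency."""
    return variance(instantaneous_frequencies(spike_times))


# Two invented spike trains with the same mean rate (2 Hz) but different regularity.
regular_train = [0.0, 0.5, 1.0, 1.5]    # evenly spaced spikes
irregular_train = [0.0, 0.1, 0.2, 1.5]  # a short burst followed by a pause
```

Both trains have a mean rate of 2 Hz, but only the irregular train has a nonzero variance in instantaneous frequency, which is the pattern reported for high-CR relative to low-CR mice.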
The stimulus–outcome contingency determines DOR accumulation and CIN activity in the NAc shell
Our data clearly suggest that the S–O contingency acquired during pavlovian conditioning is driving the accumulation of DORs to the membrane of NAc-S CINs, an effect that might influence basal firing of these neurons. To further test this hypothesis, we used a pavlovian conditioning control to which the same number of stimuli and outcomes were presented as to the paired group but were arranged in a random, noncontingent manner (Fig. 6A; see Materials and Methods), a treatment that resulted in a lower level of CR across training (Fig. 6B; F(1,63) = 30.9; p < 0.001). Groups given normal, contingent training and context exposure were run in parallel (Fig. 6A,B). Importantly, the main effect of the noncontingent procedure was to reduce the predictive value of the stimuli rather than simply augment contextual learning. Indeed, the rates of magazine entries elicited by the stimuli in the noncontingent group (7.65 ± 2.24; mean ± SEM) were identical to the rates displayed by this group and the contingent group in the absence of the stimuli (7.14 ± 2.01 and 8.13 ± 2.47, respectively). Confocal microscopy analysis of DOR distribution in NAc-S ChAT-immunoreactive neurons revealed a higher membrane accumulation of the receptor in the contingent group compared with both the context-exposed and the noncontingent control groups (Fig. 6C,D; F(1,11) = 7.62, p < 0.05 and F(1,10) = 5.72, p < 0.05). We next trained contingent and noncontingent groups for slice electrophysiology experiments (F(1,154) = 12, p < 0.01) and found similar firing patterns to those observed in high- and low-CR mice (Fig. 6E,F), i.e., NAc-S CINs of mice in the contingent group displayed significantly higher variance in their instantaneous action potential frequency, a sign of an increased incidence in irregular/burst firing (Fig. 6F; Mann–Whitney U test, p < 0.01). 
Again, mean action potential frequencies remained unchanged between groups (noncontingent: 2.0 ± 0.4 Hz, n = 21; contingent: 2.8 ± 0.5 Hz, n = 21; Mann–Whitney U test, p = 0.22). These results established that effective conditioning of the S–O association not only modified DOR cellular distribution in NAc-S CINs, but also induced clear changes in the pattern of basal action potential firing, which may provide a neuronal plasticity framework to direct future stimulus-guided choice.
Predictive learning anticipates both DOR membrane expression and choice
To confirm the relationship between DOR translocation and choice, we evaluated the extent of DOR membrane expression and the magnitude of outcome-specific transfer observed during PIT. DOR-eGFPki mice received pavlovian training under contingent and noncontingent schedules (Fig. 7A; F(1,140) = 110.4, p < 0.001). As before, noncontingent delivery of the food outcomes abolished the predictive value of the stimuli. The rates of magazine entries in the presence or absence of the stimuli closely matched in noncontingent mice (7.65 ± 2.24 and 7.14 ± 2.01, respectively) and were similar to those displayed by contingent mice when no stimuli were presented (8.13 ± 2.47). Mice were then given instrumental training (Fig. 7B; F(7,140) = 134.9, p < 0.001) before being subjected to a PIT test, in which the effect of the stimuli on choice was assessed (Fig. 7C). Contingent pavlovian training promoted outcome-specific PIT (F(1,21) = 22.18, p < 0.001), whereas noncontingent training abolished this effect (Fig. 7C; F < 1.7). Again, DOR accumulation in the membrane of CINs of the NAc-S was significantly higher in mice that had acquired the S–O contingency during the initial pavlovian training phase (Fig. 7D,E). Of note, this effect was not observed in CINs of other nearby striatal territories (Fig. 7F,G), such as the anterior dorsomedial striatum (aDMS; F < 0.32) and the NAc-C (F < 0.65), demonstrating the specificity of the changes observed in the NAc-S. Importantly, a strong positive correlation was observed between DOR accumulation and choice performance during the PIT test in contingently trained animals (Fig. 7H; r = 0.61, p < 0.05). Thus, these results demonstrate that the plastic changes triggered by learning specific S–O contingencies are intrinsic to the NAc-S and, furthermore, that they correlate not only with performance during pavlovian training, as previously shown here, but also with the later influence of that training on choice between actions.
DORs are functionally involved in contingency-induced CIN firing changes and are essential for stimulus-guided choice
In a final set of experiments, we sought to evaluate the cause–effect relationship between DOR plasticity, predictive learning, and PIT through specific pharmacology. We first gave mice initial pavlovian training under contingent and noncontingent schedules and processed them for slice electrophysiology after the last day of training (Fig. 6A). We found that the DOR agonist deltorphin (300 nM) increased the irregular/burst firing pattern of CINs when bath applied to the NAc-S preparations of contingently trained mice (Fig. 8A), an effect that was absent in noncontingent controls (Fig. 8B,C; Mann–Whitney U test, p < 0.05). This difference was not attributable to outliers; it remained significant when the outliers in each group were removed (p = 0.0188). Overall, the deltorphin-induced change in action potential frequency did not differ between groups (noncontingent: −24 ± 10%, n = 10; contingent: −15 ± 13%, n = 10; Mann–Whitney U test, p = 0.74). Importantly, in a subset of animals (n = 5), we performed membrane DOR expression studies in all CINs detected in a postfixed in vitro slice consecutive to the ones used for electrophysiology recordings and contrasted each individual's data with their behavioral score (mean elevation ratio on the last 3 d of training) and their recorded electrophysiological responses [mean action potential (AP) frequency variance per individual; data not shown]. When we averaged the electrophysiological and histochemical data down to one value per individual, five animals were enough to reveal a significant positive correlation, not only between conditioned responding and the level of membrane DORs as previously shown (r = 0.946, p < 0.05) but also between the latter and the variance in AP firing displayed by CINs, even though the two measures were obtained in different cells (r = 0.9, p < 0.05).
Moreover, comparing levels of conditioned responding and AP firing variance also showed a positive correlation, although it did not reach statistical significance (r = 0.792, p = 0.11). These results suggest, therefore, that the increase of DOR accumulation in the membrane of NAc-S CINs induced by contingent pavlovian training influenced the firing pattern of these neurons and increased the variance in action potential firing.
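As a rough check on how five animals can yield these significance levels, the sketch below converts each reported correlation coefficient into a t statistic with n − 2 = 3 degrees of freedom and a two-tailed p value. It assumes the coefficients are Pearson correlations tested with the standard t transform, which this section does not state. The function names `t_from_r` and `two_tailed_p_df3` are hypothetical, and the closed-form CDF used here holds only for 3 degrees of freedom.

```python
import math


def t_from_r(r, n):
    """t statistic for a Pearson correlation r computed from n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def two_tailed_p_df3(t):
    """Two-tailed p value for Student's t with exactly 3 degrees of freedom.

    Uses the closed-form CDF for df = 3:
    F(t) = 1/2 + (1/pi) * [x / (1 + x^2) + atan(x)], where x = t / sqrt(3).
    """
    x = abs(t) / math.sqrt(3.0)
    cdf = 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi
    return 2.0 * (1.0 - cdf)


# Correlation coefficients reported in the text, each with n = 5 animals.
for r in (0.946, 0.9, 0.792):
    t = t_from_r(r, n=5)
    print(f"r = {r}: t(3) = {t:.2f}, two-tailed p = {two_tailed_p_df3(t):.3f}")
```

Under these assumptions the first two coefficients fall below the 0.05 threshold, and the third gives a p value of about 0.11, in line with the value reported for conditioned responding versus AP firing variance.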
To confirm the functional relevance of DORs at the moment of stimulus-guided choice, we gave a new set of DOR-eGFPki mice both pavlovian and instrumental training (Fig. 8D,E; F(7,91) = 16.7, p < 0.001 and F(7,91) = 33.6, p < 0.001) and challenged them immediately before the PIT test with either vehicle or the specific DOR antagonist naltrindole (5 mg/kg; Fig. 8F). As we recently reported in DOR-knock-out mice and in rats locally injected in the NAc-S (Laurent et al., 2012), DOR blockade using naltrindole prevented the expression of outcome-specific PIT in DOR-eGFPki mice (F < 0.1) compared with vehicle-treated mice (Fig. 8F; F(1,15) = 18.85, p < 0.01), confirming the role of DORs in stimulus-guided choice.
Although information derived from predictive learning is known to bias future choice between goal-directed actions, it has remained unclear what cellular changes are sufficiently persistent to allow such learning to interact with subsequent decision-making in this way. We previously reported that a DOR-related process in the NAc-S is critical for PIT (Laurent et al., 2012). Here we demonstrate that a critical cellular change mediating this effect involves a persistent learning-related translocation of DORs to the membrane of cholinergic interneurons specifically in the shell region of the ventral striatum. This receptor translocation was induced by predictive learning and, although it was not essential for that learning process itself, was found to determine subsequent choice between goal-directed actions in tests assessing the influence of such learning. The most striking finding was the specific involvement of the stimulus–outcome contingency in the accumulation of DORs at the membrane of NAc-S CINs, an effect that not only produced functional changes in the cellular responses of these neurons in vitro but also influenced future stimulus-guided decision-making. Although DORs are found on the processes of other cell types in the striatum (Scherrer et al., 2006), the close relationship between membrane translocation in NAc-S CINs and both the degree of conditioned responding and the degree of PIT revealed in the current experiments provides consistent evidence of experience-dependent plasticity in the primary neuromodulatory system in the striatum. Furthermore, this evidence of learning-related translocation was specific to the NAc-S; no evidence of this effect was found in the adjacent accumbens core or anterior dorsomedial striatum.
Although membrane insertion of DORs in the CNS has been reported previously, following, for example, chronic morphine or chronic inflammation (Commons, 2003; Bie and Pan, 2007; Pradhan et al., 2011), to our knowledge, this is the first demonstration of a persistent, long-term change in the translocation of a G-protein-coupled receptor induced purely by learning and specifically by predictive learning. Its persistence is particularly remarkable. Despite the widespread distribution of DORs (Scherrer et al., 2006), DOR agonists and antagonists appear to have minimal direct effects on neural excitability under naive conditions (Hack et al., 2005; Bie and Pan, 2007). A widely accepted explanation for this is that, under resting conditions, DORs are predominantly localized intracellularly, away from the membrane surface. Chronic morphine and prolonged pain exposure, however, produce an increased sensitivity to DOR agonists (Cahill et al., 2001, 2003; Ma et al., 2006; Chieng and Christie, 2009), an effect that is accompanied by significant membrane translocation of DORs in those neurons when verified ultrastructurally (Cahill et al., 2001, 2003). Although the mechanisms underlying DOR translocation remain unclear, evidence strongly suggests that membrane DOR translocation is responsible for the increased sensitivity to DOR-specific pharmacology. Certainly, in the present study, we found that DOR membrane translocation, acquired during initial conditioning, modified basal firing of NAc-S CINs, which expressed a more irregular/burst pattern of activity, and increased the sensitivity to the DOR-specific agonist deltorphin, providing direct evidence of functional cellular consequences associated with learning-induced changes in DOR distribution.
The finding that predictive learning produced a shift in the firing pattern of NAc-S CINs is important, especially considering recent advances clarifying the involvement of CINs in dopamine transmission and striatal function (Aosaki et al., 2010; Threlfell and Cragg, 2011). When firing rhythmically, CINs appear to exert a uniform inhibitory effect on target neurons, mainly medium spiny projection neurons (MSNs). When activated by their afferents, however, CINs increase in firing rate variability producing a patterned change in acetylcholine release and a commensurately patterned disinhibition at local MSNs (Ding et al., 2010). We found that DOR translocation in the NAc-S generated increased irregular/burst firing without a change in action potential frequency, suggesting that DOR activity may participate in enhancing the burst and prolonging the pause period, a feature shown to be critical for corticostriatal plasticity (Goldberg et al., 2012). Such changes have also previously been observed during pavlovian conditioning; indeed, during conditioning, CINs have been reported to increase their burst–pause firing pattern in a manner time-locked to the pairing of conditioned stimuli with reward (Kimura et al., 1984; Goldberg et al., 2012). This effect has been mostly attributed to activity in the thalamostriatal pathway from which CINs receive a direct excitatory input, which, in the accumbens shell, is from the paraventricular nucleus. There are, however, many potential causes for changes in CIN firing. For example, long-range GABAergic inputs to CINs from the ventral tegmental area have recently been reported to pause CIN tonic firing and to enhance the discrimination of unpaired and paired stimuli in a fear-conditioning paradigm (Brown et al., 2012). 
Here we describe a new mechanism that, through predictive learning, allows CINs in the NAc-S to acquire the capacity of efficiently interrupting their activity through the accumulation of the DOR, an inhibitory G-protein-coupled receptor, in the membrane of these neurons. The endogenous ligand of DORs, enkephalin, is a good candidate for the source of this inhibition to CINs as it is produced by surrounding D2-containing MSNs, one of the two biggest populations of striatal projection neurons, with extensive inputs onto CINs (McGinty, 2007; Gonzales et al., 2013). It is noteworthy that DORs were found to be accumulating mainly in the somatic region of CINs, which is a hot spot of regulation for the efficient interruption of dendro-axonal communication (Freund and Katona, 2007). It is clear, however, that future research will need to address the precise timing of enkephalin regulation and its involvement in the generation and/or maintenance of firing pauses during conditioning events (Goldberg and Reynolds, 2011).
With regard to NAc-S functions, it is important to recognize that, although not important for pavlovian or instrumental conditioning per se, this region plays an essential role in the integration of these two learning processes to guide choice between future actions in the presence of reward-related stimuli (Corbit et al., 2001; Corbit and Balleine, 2011; Laurent et al., 2012; present study). We do not know, however, how changes in NAc-S code for specific contingency events and how new contingencies are integrated within this circuit. One obvious possibility is that specific stimulus–outcome relationships are encoded in other brain regions that would, in turn, control the extent of membrane DOR accumulation in NAc-S CINs. One candidate region in this regard is the basolateral amygdala (BLA). Activity in the BLA is necessary for learning and updating the stimulus–outcome associations established during pavlovian training, and inactivation of the BLA removes the influence of these associations on choice between actions (Ostlund and Balleine, 2008). Moreover, the BLA sends extensive projections to the NAc-S that have been implicated in reward-related behavior (Stuber et al., 2011), and BLA–NAc-S disconnection has been found to remove the influence of reward-related stimuli on choice between actions (Shiflett and Balleine, 2010). Therefore, additional research will be needed to evaluate whether DOR membrane translocation in NAc-S CINs relies on activity within the BLA.
A related question is how new stimulus–outcome learning is integrated with existing contingencies to control actions. Although the precise biochemical strategies that striatal circuits use to integrate predictive information to influence future actions are far from being understood, they likely involve the complex regulation of the neuromodulatory systems that adaptively control corticostriatal transmission. In fact, although relevant to different functions, cholinergic activity in cortex and the hippocampus has long been argued to regulate the interaction between new and existing plasticity (Hasselmo, 1999; Froemke et al., 2007; Newman et al., 2012), and there are good reasons to believe that striatal CINs also serve this kind of function, based on findings suggesting that the thalamic inputs to CINs exert state or contextual control over striatal plasticity (Kimura et al., 1984; Apicella, 2007; Bradfield et al., 2013). Indeed, the fact that the plastic changes reported in our current study occurred in neuromodulatory systems (i.e., cholinergic and opioid systems), rather than glutamatergic and GABAergic systems, adds a new dimension to the concept of neuronal plasticity, suggesting that, over and above the principal projection neurons, modulatory interneurons can also be subject to extensive plastic adaptations (Fino and Venance, 2011).
Finally, it is worth noting that the current findings have important implications for how changes or impediments in experience-mediated neuronal changes influence future neuronal activity and behavior. The integrative process occurring during pavlovian–instrumental transfer is critical for the stimulus control of action, and deficits in this function have been associated with a number of disorders, most notably stimulus-induced relapse in drug seeking after a period of abstinence, the stimulus control of food seeking in obesity, and the loss of control over perseverative actions in a number of psychiatric disorders, including psychotic disorders and depression (Hyman, 2005; Seymour and Dolan, 2008; Simpson et al., 2010). There is, therefore, growing evidence of pathologies in decision-making involving the NAc-S (Kalivas and Volkow, 2005; Simon et al., 2011; Stopper and Floresco, 2011), and, given the fact that DOR ligands differentially regulate DOR trafficking in vivo (Pradhan et al., 2010), the current findings may suggest a potential target for rescuing those deficits.
This work was supported by National Institute of Mental Health Grant MH56646, National Health and Medical Research Council Grant 633267 (B.W.B.), an Australian Laureate Fellowship from the Australian Research Council (B.W.B.), and a National Health and Medical Research Council Senior Principal Research Fellowship (M.J.C.). We thank Ashley Morse for assistance and Prof. Brigitte Kieffer for DOR-eGFP knock-in mice.
Correspondence should be addressed to Bernard W. Balleine, Brain and Mind Research Institute, University of Sydney, 100 Mallett Street, Camperdown, NSW 2050, Australia.