Abstract
To make an appropriate decision, one must anticipate potential future rewarding events, even when they are not readily observable. These expectations are generated by using observable information (e.g., stimuli or available actions) to retrieve often quite detailed memories of available rewards. The basolateral amygdala (BLA) and orbitofrontal cortex (OFC) are two reciprocally connected key nodes in the circuitry supporting such outcome-guided behaviors. But much remains unknown about the contribution of this circuit to decision making, and almost nothing is known about whether any contribution is via direct, monosynaptic projections, or about the direction of information transfer. Therefore, here we used designer receptor-mediated inactivation of OFC→BLA or BLA→OFC projections to evaluate their respective contributions to outcome-guided behaviors in rats. Inactivation of BLA terminals in the OFC, but not OFC terminals in the BLA, disrupted the selective motivating influence of cue-triggered reward representations over reward-seeking decisions as assayed by Pavlovian-to-instrumental transfer. BLA→OFC projections were also required when a cued reward representation was used to modify Pavlovian conditional goal-approach responses according to the reward's current value. These projections were not necessary when actions were guided by reward expectations generated based on learned action-reward contingencies, or when rewards themselves, rather than stored memories, directed action. These data demonstrate that BLA→OFC projections enable the cue-triggered reward expectations that can motivate the execution of specific action plans and allow adaptive conditional responding.
SIGNIFICANCE STATEMENT Deficits anticipating potential future rewarding events are associated with many psychiatric diseases. Presently, we know little about the neural circuits supporting such reward expectation. Here we show that basolateral amygdala to orbitofrontal cortex projections are required for expectations of specific available rewards to influence reward seeking and decision making. The necessity of these projections was limited to situations in which expectations were elicited by reward-predictive cues. These projections therefore facilitate adaptive behavior by enabling the orbitofrontal cortex to use environmental stimuli to generate expectations of potential future rewarding events.
Introduction
Appropriate decision making requires the accurate anticipation of potential rewarding outcomes. Often these rewards are not present or noticeable in the immediate environment. So one must use information that can be observed, such as the presence of stimuli or available actions, to enable the mental representation of the critical information needed to make a choice: future possible outcomes. Indeed, stored knowledge of specific stimulus–outcome or action–outcome relationships permits recollection of the detailed reward memories that facilitate the outcome expectations that influence conditional responses, reward seeking, and decision making (Balleine and Dickinson, 1998; Delamater and Oakeshott, 2007; Delamater, 2012; Fanselow and Wassum, 2015). Detailed reward predictions enable adaptive behavior by allowing individuals to rapidly adjust to environmental changes and to infer the most advantageous option in novel situations. But disruptions in this process can lead to the cognitive symptoms underlying many psychiatric diseases.
The orbitofrontal cortex (OFC) and basolateral amygdala (BLA) are two identified key nodes in the circuitry supporting outcome-guided behaviors. Damage to either region causes performance deficits when specific rewarding events must be anticipated (Gallagher et al., 1999; Blundell et al., 2001; Pickens et al., 2003, 2005; Izquierdo et al., 2004; Wellman et al., 2005; Machado and Bachevalier, 2007; Ostlund and Balleine, 2007b, 2008; Johnson et al., 2009; West et al., 2011; Jones et al., 2012; Scarlet et al., 2012; Rhodes and Murray, 2013; Malvaez et al., 2015). These regions share dense and reciprocal direct connections (Carmichael and Price, 1995; Price, 2007) and associative encoding in one region has generally been shown to be altered by lesions of the other (Schoenbaum et al., 2003; Saddoris et al., 2005; Hampton et al., 2007; Rudebeck et al., 2013, 2017; Lucantonio et al., 2015). The unique contribution of each region is still a matter of debate, but there is some evidence to suggest that the BLA might acquire reward representations, whereas the OFC is more important for using this information to generate the expectations that guide action (Pickens et al., 2003; Wellman et al., 2005; Wassum et al., 2009; Jones et al., 2012; Parkes and Balleine, 2013; Takahashi et al., 2013; Gore et al., 2015). The OFC may be especially needed when critical determining elements of future possible states (e.g., potential rewarding outcomes) are not readily observable (Wilson et al., 2014; Bradfield et al., 2015; Schuck et al., 2016). But understanding of BLA-OFC function is limited by the fact that the contribution of direct, monosynaptic projections and the direction of information transfer are unknown. Therefore, here we used designer receptor-mediated inactivation of OFC→BLA or BLA→OFC monosynaptic projections to evaluate their respective contributions to the ability to use detailed reward expectations to influence reward seeking and decision making. 
Follow-up tests focused on the specific contribution of BLA→OFC projections.
Materials and Methods
Subjects
Subjects were male Long–Evans rats (n = 60 total, Charles River Laboratories) weighing between 300 and 390 g (age ∼3 months) at the beginning of the experiment. Rats were pair housed and handled for ∼5 d before the onset of the experiment. Training and testing took place during the dark phase of the 12:12 h reverse dark/light cycle. Rats had ad libitum access to filtered tap water in the home cage and were maintained on a food-restricted schedule whereby they received 12–14 g of their maintenance diet (Lab Diet) daily to maintain ∼85%–90% of free-feeding body weight. All procedures were conducted in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and approved by the University of California, Los Angeles Institutional Animal Care and Use Committee.
Viral constructs
Transduction of OFC or BLA neurons with the inhibitory designer receptor exclusively activated by designer drug (DREADD) hM4Di was achieved with an adeno-associated virus (AAV) driving the hM4Di-mCherry sequence under the human synapsin promoter (AAV8-hSyn-hM4Di-mCherry, viral concentration 7.4 × 10¹² vg/ml; University of North Carolina Vector Core, Chapel Hill, NC). A virus lacking the hM4Di DREADD gene (AAV8-hSyn-mCherry; viral concentration 4.6 × 10¹² vg/ml; University of North Carolina Vector Core) was used as a control. For ex vivo electrophysiology experiments, hM4Di and the excitatory opsin, channelrhodopsin (ChR2; AAV5-CaMKII-ChR2-EYFP; viral concentration 6.2 × 10¹² vg/ml; University of North Carolina Vector Core), were coexpressed in either the OFC or BLA using a mixture of both viruses. A separate control group received only the ChR2 virus. Behavioral testing began between 6 and 8 weeks after viral injection to allow anterograde transmission and robust axonal expression in terminal regions.
Surgical procedures
Standard aseptic surgical procedures were used under isoflurane anesthesia (5% induction, 1%–2% maintenance). Bilateral virus injections were made via 33-gauge, stainless-steel injectors inserted into either the BLA (anteroposterior −3.0 mm, mediolateral ±5.1 mm, dorsoventral −8.0 or −8.5 mm relative to bregma) or OFC (anteroposterior 3.0 mm, mediolateral ±3.2 mm, dorsoventral −6.0 mm). Viruses were infused in a volume of 0.6 (BLA) or 0.8 (OFC) μl per hemisphere at a flow rate of 0.1 μl/min. Injectors were left in place for an additional 10 min to ensure adequate diffusion and to minimize viral spread up the injector tract. For rats in the behavioral experiments, during the same surgery, 22-gauge, stainless-steel guide cannulae (Plastics One) were implanted bilaterally targeted 1 mm above the BLA (anteroposterior −3.0 mm, mediolateral ±5.1 mm, dorsoventral −7.0 mm) for the OFC viral injection groups, or the OFC (anteroposterior +3.0 mm, mediolateral ±3.2 mm, dorsoventral −5.0 mm) for groups receiving viral injections into the BLA. A nonsteroidal anti-inflammatory agent was administered preoperatively and postoperatively to minimize pain and discomfort. Following surgery, rats were individually housed and allowed to recover for ∼16 d before the onset of any behavioral training.
Behavioral training
Training and testing took place in a set of 16 Med Associates operant chambers, described previously (Wassum et al., 2016).
Pavlovian training.
Each of the 8 daily sessions consisted of 8 tone (1.5 kHz) and 8 white noise conditional stimulus (CS) presentations (75 dB, 2-min duration), during which either sucrose solution (20%, 0.1 ml/delivery) or grain pellets (45 mg; Bio-Serv) were delivered on a 30-s random-time schedule into the food-delivery port, resulting in an average of 4 stimulus–reward pairings per trial. For half the subjects, tone was paired with sucrose and noise with pellets, with the other half receiving the opposite arrangement. CSs were delivered pseudo-randomly with a variable 2–4 min intertrial interval (mean 3 min). Entries into the food-delivery port were recorded for the entire session. Comparison of anticipatory entries during the CS-probe periods (interval between CS onset and first reward) with entries during baseline periods (2 min period before CS onset) provided a measure of Pavlovian conditioning.
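The stated average of 4 pairings per trial follows from the schedule parameters: a 30-s random-time schedule running through a 2-min CS gives 120/30 = 4 deliveries on average. The sketch below checks this by simulation. It assumes the schedule is realized as an independent 1-in-30 chance of delivery in each second, which is one common way to implement a random-time schedule and is not specified in the text; the function names are hypothetical.

```python
import random


def random_time_deliveries(duration_s=120, mean_interval_s=30, rng=None):
    """Simulate one CS presentation under a random-time schedule.

    Assumption (not from the text): each 1-s bin carries an independent
    1/mean_interval_s chance of reward, regardless of the rat's behavior.
    Returns the list of seconds at which a reward was delivered.
    """
    rng = rng or random.Random()
    p_delivery = 1.0 / mean_interval_s
    return [t for t in range(1, duration_s + 1) if rng.random() < p_delivery]


def mean_deliveries_per_cs(n_trials=10000, seed=0):
    """Average number of rewards per simulated 2-min CS."""
    rng = random.Random(seed)
    total = sum(len(random_time_deliveries(rng=rng)) for _ in range(n_trials))
    return total / n_trials


if __name__ == "__main__":
    print(f"mean rewards per 2-min CS: {mean_deliveries_per_cs():.2f}")
```

Because delivery is independent of responding, individual CS presentations can contain more or fewer than 4 rewards; only the long-run mean is fixed by the schedule.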
Instrumental training.
Rats were then given 11 d of instrumental training, receiving 2 separate training sessions per day: one with the lever to the left of the food-delivery port and one with the right lever. Each action was reinforced with a different outcome, either grain pellets or sucrose solution (counterbalanced with respect to the Pavlovian contingencies). Each session terminated after 30 outcomes had been earned or 30 min had elapsed. Actions were continuously reinforced on the first day and then escalated to a random-ratio-20 schedule. The rate of responding on each lever was measured throughout training.
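To give a sense of the response requirement this implies, the sketch below simulates a random-ratio schedule in which every press is reinforced with probability 1/ratio, so that earning 30 outcomes on random-ratio-20 takes about 30 × 20 = 600 presses on average. The function name is hypothetical, and the 30-min session limit is deliberately left out.

```python
import random


def presses_to_finish_session(n_outcomes=30, ratio=20, rng=None):
    """Count presses needed to earn n_outcomes under a random-ratio schedule.

    Each press is reinforced with probability 1/ratio; ratio=1 corresponds
    to continuous reinforcement. The 30-min time limit is ignored here.
    """
    rng = rng or random.Random()
    presses = 0
    earned = 0
    while earned < n_outcomes:
        presses += 1
        if rng.random() < 1.0 / ratio:
            earned += 1
    return presses


if __name__ == "__main__":
    rng = random.Random(0)
    sessions = [presses_to_finish_session(rng=rng) for _ in range(2000)]
    print(f"mean presses per session: {sum(sessions) / len(sessions):.0f}")
```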
Pavlovian-to-instrumental transfer test
Four groups of subjects received Pavlovian-to-instrumental transfer (PIT) tests: OFChM4Di→BLA (n = 10), BLAhM4Di→OFC (n = 10), OFCmCherry→BLA (n = 11), and BLAmCherry→OFC (n = 12). On the day before each PIT test, rats were given a single 30-min extinction session during which both levers were available, but pressing was not reinforced to establish a low level of responding. Each rat was given 2 PIT tests: one following infusion of vehicle and one following infusion of the otherwise inert hM4Di ligand, clozapine-N-oxide (CNO), into the BLA (OFChM4Di→BLA and OFCmCherry→BLA groups) or OFC (BLAhM4Di→OFC and BLAmCherry→OFC groups). Test order was counterbalanced across subjects. During each PIT test, both levers were continuously present, but pressing was not reinforced. After 5 min of lever-pressing extinction, each 2-min CS was presented separately 4 times in pseudorandom order, separated by a fixed, 4-min intertrial interval. No rewards were delivered during CS presentation. The 2-min period before each CS presentation served as the baseline. Rats were given 2 retraining sessions for each instrumental association (2 sessions/d for 2 d) and 1 Pavlovian retraining session in between PIT tests.
Outcome-specific devaluation test
Following training, a second cohort of BLAhM4Di→OFC rats (n = 9) was given a series of two outcome-specific devaluation tests. Before each test, rats were given 1-h, unlimited access to either sucrose solution or food pellets in pre-exposed feeding chambers such that the prefed reward would become devalued, whereas the other reward would remain valued. Immediately after this prefeeding, rats received infusions of either vehicle or CNO into the OFC and were then tested. The test consisted of two phases. In the first, both levers were available, and nonreinforced lever pressing was assessed for 5 min. The levers were then retracted, which started the second, Pavlovian, test phase, in which each 2-min CS was presented separately 2 times, without accompanying reward, in alternating order, separated by a fixed, 4-min intertrial interval. The 2-min period before each CS presentation served as the baseline. Successful devaluation of the earned outcome was confirmed by post-test consumption of each food reward, in which rats ate significantly less of the devalued reward type (mean ± SEM, 1.81 ± 0.43 g) relative to the valued reward (5.38 ± 0.7 g; t(17) = 4.05, p = 0.0008).
After the first test, rats remained in their home cage for 2 d and were then given 2 retraining sessions for each instrumental association (2 sessions/d for 2 d) and 1 Pavlovian retraining session, before the second outcome-specific devaluation test. For the second test, rats were prefed on the opposite food reward (e.g., pellets if sucrose had been prefed on Test 1) and infused with the opposite drug (e.g., CNO, if they had previously received vehicle). Thus, each rat experienced 2 devaluation tests to allow a within-subject drug-treatment design: one following vehicle and one following CNO infusion, counterbalanced for order. Because, in the absence of the hM4Di receptor, CNO itself was found to have no effect on the expression of PIT, which requires both action–outcome and stimulus–outcome associative information, empty-vector controls were not included for this experiment in which the use of either action–outcome or stimulus–outcome associations was assessed.
Outcome-specific reinstatement test
Rats then received 4 d of instrumental retraining before outcome-specific reinstatement testing. On the day before each reinstatement test, rats received a 30 min lever-pressing extinction session. Each rat was given 2 reinstatement tests, one following intra-OFC vehicle infusion and one after CNO infusion, counterbalanced for order. Rats were given instrumental retraining in between the two reinstatement tests. During each reinstatement test, both levers were continuously present, but pressing was never reinforced. After 5 min of extinction, rewards were presented in 8 separate reward-presentation periods (4 sucrose, 4 pellet periods, in pseudorandom order) separated by a fixed 4-min intertrial interval. Each reward presentation period was 2 min in duration and began with 2 deliveries of the appropriate reward, separated by 6 s. The 2-min period before each reward-delivery period served as the baseline.
Drugs
For behavioral experiments, CNO (Tocris Bioscience) was dissolved in aCSF to 1 mM and was intracranially infused over 1 min in a volume of 0.25 μl into the OFC or 0.5 μl into the BLA. Injectors were left in place for at least 1 additional min to allow for drug diffusion. Behavioral testing commenced within 5–10 min following infusion. CNO dose was selected based on evidence of both its behavioral effectiveness and ability to inactivate terminal activity when intracranially infused over hM4Di-expressing terminals (Mahler et al., 2014). CNO was dissolved in aCSF to 100 μM for ex vivo electrophysiology experiments (Stachniak et al., 2014).
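As a worked check of the dose these parameters imply, the sketch below converts concentration and volume to the amount infused per hemisphere. It reads the stated behavioral concentration as 1 millimolar and uses the identity 1 mM = 1 nmol/μl; the helper names are hypothetical.

```python
def infused_amount_nmol(concentration_mM, volume_ul):
    """Amount of drug delivered per hemisphere.

    1 mM equals 1 nmol/ul, so mM x ul gives nmol directly.
    """
    return concentration_mM * volume_ul


def infusion_rate_ul_per_min(volume_ul, duration_min):
    """Flow rate implied by a volume delivered over a fixed duration."""
    return volume_ul / duration_min


if __name__ == "__main__":
    # Volumes and the 1-min duration are taken from the text above.
    for site, volume_ul in (("OFC", 0.25), ("BLA", 0.5)):
        amount = infused_amount_nmol(1.0, volume_ul)
        rate = infusion_rate_ul_per_min(volume_ul, 1.0)
        print(f"{site}: {amount} nmol CNO at {rate} ul/min")
```

Under these assumptions the OFC infusion delivers 0.25 nmol and the BLA infusion 0.5 nmol of CNO per hemisphere.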
Ex vivo electrophysiology
Whole-cell patch-clamp recordings were performed in brain slices from ∼5- to 6-month-old rats (n = 8 rats) 8–13 weeks following AAV injection. To prepare brain slices, rats were deeply anesthetized with isoflurane and perfused transcardially with an ice-cold, oxygenated NMDG-based slicing solution containing the following (in mM): 30 NaHCO3, 20 HEPES, 1.25 NaH2PO4, 102 NMDG, 40 glucose, 3 KCl, 0.5 CaCl2-2H2O, 10 MgSO4H2O (pH adjusted to 7.3–7.35, osmolality 300–310 mOsm/L). Brains were extracted and immediately placed in ice-cold, oxygenated NMDG slicing solution. Coronal slices (350 μm) were cut using a vibrating microtome (VT1000S; Leica Microsystems) and transferred to an incubating chamber containing oxygenated NMDG slicing solution warmed to 32°C–34°C and allowed to recover for 15 min before being transferred to an aCSF solution containing the following (in mM): 130 NaCl, 3 KCl, 1.25 NaH2PO4, 26 NaHCO3, 2 MgCl2, 2 CaCl2, and 10 glucose, oxygenated with 95% O2-5% CO2 (pH 7.2–7.4, osmolality 290–310 mOsm/L, 32°C–34°C). After 15 min, slices were moved to room temperature and allowed to recover for an additional ∼30 min before recording. All recordings were performed using an upright microscope (Olympus BX51WI) equipped with differential interference contrast optics and fluorescence imaging (QIACAM fast 1394 monochromatic camera with Q-Capture Pro software, QImaging).
Whole-cell patch-clamp recordings in voltage-clamp mode were obtained from postsynaptic BLA (OFChM4Di/ChR2→BLA: n = 5 cells, or OFCChR2→BLA: n = 5 cells) or OFC (BLAhM4Di/ChR2→OFC: n = 7 cells, or BLAChR2→OFC: n = 5 cells) neurons using a MultiClamp 700B Amplifier (Molecular Devices) and the pCLAMP 10.3 acquisition software. Visible eYFP-expressing terminals were identified in the OFC or BLA, and recordings were obtained from cells located only in highly fluorescent regions. The patch pipette (3–5 MΩ resistance) contained a cesium methanesulfonate-based internal recording solution (in mM) as follows: 125 Cs-methanesulfonate, 4 NaCl, 1 MgCl2, 5 MgATP, 9 EGTA, 8 HEPES, 1 GTP-Tris, 10 phosphocreatine, and 0.1 leupeptin (pH 7.2, with CsOH, 270–280 mOsm). Biocytin (0.2%, Sigma-Aldrich) was included in the internal recording solution for subsequent postsynaptic cell visualization and identification.
After breaking through the membrane, recordings were obtained from cells while holding the membrane potential at −70 mV. Electrode access resistances were maintained at <30 MΩ. Blue light (470 nm, 5 ms pulse, 8 mW; CoolLED) was delivered through the epifluorescence illumination pathway using Chroma Technologies filter cubes to activate ChR2 and stimulate BLA terminals in the OFC, or OFC terminals in the BLA. All voltage-clamp recordings were performed in the presence of GABAA receptor antagonists, bicuculline or gabazine (10 μM, Tocris Bioscience, R&D Systems). Optically evoked EPSCs were recorded both before and after CNO bath application (100 μM; 20 min). As an additional control, recordings were made with identical timing, but without CNO bath application (n = 4 cells).
Histology
Rats in the behavior experiments were deeply anesthetized with Nembutal and transcardially perfused with PBS followed by 4% PFA. Brains were removed and postfixed in 4% PFA overnight, placed into 30% sucrose solution, then sectioned into 30–40 μm slices using a cryostat, and stored in PBS or cryoprotectant. To visualize hM4Di-mCherry expression in BLA or OFC cell bodies, free-floating coronal sections were mounted onto slides and coverslipped with ProLong Gold mounting medium with DAPI (Invitrogen). The signal for axonal expression of hM4Di-mCherry in terminal regions was immunohistochemically amplified using antibodies directed against mCherry. Floating coronal sections were washed 2 times in 1× PBS for 10 min and then blocked for 1–2 h at room temperature in a solution of 5% normal goat serum and 1% Triton X-100 dissolved in PBS. Sections were then washed 3 times in PBS for 15 min and then incubated in blocking solution containing rabbit anti-DsRed antibody (1:1000; Clontech) with gentle agitation at 4°C for 18–22 h. Sections were next rinsed 3 times in the blocking solution and incubated in AlexaFluor-594-conjugated (red) goat secondary antibody (1:500; Invitrogen) for 2 h. Sections were washed 3 times in PBS for 30 min, mounted on slides, and coverslipped with ProLong Gold mounting medium with DAPI. All images were acquired using a Keyence (BZ-X710) microscope with a 4× or 20× objective (CFI Plan Apo), CCD camera, and BZ-X Analyze software. Data from subjects for which hM4Di-mCherry expression could not be confirmed bilaterally in the target region were omitted from the analysis. We also confirmed that cannula placement was in the target region and coincided with labeled axon terminals.
Following ex vivo recordings, brain slices were fixed in 4% PFA for 24 h. Slices were then washed with 1× PBS, permeabilized with 1% Triton overnight at 4°C, and incubated for 2 h with streptavidin-Marina Blue (365 nm, ThermoFisher Scientific) at room temperature. Fluorescent images were taken of both recorded cells and eYFP or mCherry-expressing terminals using a Zeiss Apotome equipped with 20× and 40× objectives.
Experimental design and statistical analysis
Data were processed with Microsoft Excel and then analyzed with Prism (GraphPad) and SPSS (IBM). For all hypothesis tests, the α level for significance was set to p < 0.05. The behavioral data of primary interest were statistically evaluated with repeated-measures ANOVAs (Geisser–Greenhouse correction). For well-established behavioral effects (PIT, devaluation, reinstatement), multiple pairwise comparisons (paired t test, two-tailed) were used for a priori post hoc comparisons, as advised by Levin et al. (1994) based on a logical extension of Fisher's protected least significant difference procedure for controlling familywise Type I error rates. Bonferroni or Dunnett's corrections were used for post hoc analyses of all drug effects. Electrophysiological data were analyzed with unpaired t tests.
Behavioral data were analyzed for the rate of both lever pressing and entries into food-delivery port. Both drug and test phase were within-subject factors. All data were averaged across trials. For the PIT tests, lever pressing was averaged across levers for the 2-min baseline period and compared with that during the CS period, which was separated for presses on the lever that, during training, earned the same outcome as the cue predicted (i.e., CS-Same presses) versus those on the other available lever (i.e., CS-Different presses). Data from the reinstatement test were analyzed similarly, with reward-period presses separated for those on the lever that previously earned the same outcome as the presented reward (i.e., Reinstated presses) versus those on the alternate lever (i.e., Non-reinstated). For the PIT tests, entries into the food-delivery port were compared between the baseline and CS periods. Food-delivery port entries were analyzed similarly for the Pavlovian phase of the devaluation test; baseline entry rate was compared with entries during presentation of each CS separated for the cue that predicted the valued versus devalued reward type. Lever pressing during the instrumental phase of the devaluation test was separated for actions on the lever that, in training, earned the currently devalued versus valued reward. To specifically examine how CS presentation changed behavior during PIT and the Pavlovian devaluation test, in addition to these analyses, we also evaluated cue-induced change in lever pressing (PIT test) or food-port entries (Pavlovian devaluation test) by calculating an elevation ratio [CS responses/(CS responses + Baseline responses)].
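The elevation ratio defined above can be written as a small helper. This is a minimal sketch of the stated formula only; the function name and the choice to return None when no responses occurred in either period are assumptions, since the text does not say how such trials were handled.

```python
def elevation_ratio(cs_responses, baseline_responses):
    """Elevation ratio: CS responses / (CS responses + Baseline responses).

    A value of 0.5 means responding was unchanged by the cue, values above
    0.5 mean the cue elevated responding, and values below 0.5 mean the cue
    suppressed it. Returns None when both counts are zero (assumption: the
    ratio is treated as undefined in that case).
    """
    total = cs_responses + baseline_responses
    if total == 0:
        return None
    return cs_responses / total


if __name__ == "__main__":
    # Illustrative press rates (presses/min), not data from the study.
    print(elevation_ratio(15, 5))   # cue elevated responding
    print(elevation_ratio(10, 10))  # no change from baseline
```

Because the ratio is bounded between 0 and 1, it normalizes cue-induced changes across subjects with different baseline response rates.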
For electrophysiological data, optically evoked EPSC amplitudes following CNO application were expressed as a percentage of the evoked response before CNO for comparison between AAV groups (hM4Di+ChR2 versus ChR2 only).
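The normalization and group comparison described here can be sketched as follows. The sketch assumes a classic pooled-variance (Student's) unpaired t test, which matches the n1 + n2 − 2 degrees of freedom reported for these comparisons but is otherwise an inference; the function names and the example amplitudes are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def percent_of_baseline(post_cno_amplitude, pre_cno_amplitude):
    """Express the post-CNO evoked EPSC as a percentage of the pre-CNO EPSC."""
    return 100.0 * post_cno_amplitude / pre_cno_amplitude


def unpaired_t(group_a, group_b):
    """Student's unpaired t statistic with pooled variance.

    Returns (t, df), where df = n_a + n_b - 2.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


if __name__ == "__main__":
    # Illustrative (pre, post) amplitudes in pA; NOT data from the study.
    hm4di_cells = [(200.0, 60.0), (150.0, 50.0), (180.0, 70.0)]
    chr2_only_cells = [(210.0, 200.0), (160.0, 150.0), (190.0, 185.0)]
    hm4di_pct = [percent_of_baseline(post, pre) for pre, post in hm4di_cells]
    control_pct = [percent_of_baseline(post, pre) for pre, post in chr2_only_cells]
    t_stat, df = unpaired_t(hm4di_pct, control_pct)
    print(f"t({df}) = {t_stat:.2f}")
```

Normalizing each cell to its own pre-CNO response removes between-cell differences in absolute EPSC amplitude before the groups are compared.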
Results
Pathway-specific chemogenetic OFC-BLA manipulations
We used a chemogenetic approach (Armbruster et al., 2007; Smith et al., 2016) to manipulate monosynaptic OFC→BLA or BLA→OFC projections by taking advantage of the fact that DREADDs are trafficked to axon terminals, where activation of hM4Di by its otherwise inert exogenous ligand, CNO, can attenuate presynaptic activity (Mahler et al., 2014; Stachniak et al., 2014). We first validated presynaptic suppression by terminal hM4Di activation with ex vivo electrophysiology. The Gi-coupled DREADD hM4Di and the excitatory opsin ChR2 were coexpressed in either the OFC (Fig. 1A) or BLA (Fig. 1D), and whole-cell patch-clamp recordings were obtained from postsynaptic cells in the ChR2- and hM4Di-expressing terminal regions (Fig. 1B,C). EPSCs were evoked by blue light activation of ChR2 in both the BLA (Fig. 1E) and OFC (Fig. 1F), and the amplitude of these responses was markedly attenuated in the presence of CNO. The optically evoked EPSC remaining after CNO, expressed as a percentage of the pre-CNO response, was significantly lower in both BLA (t(8) = 5.68, p = 0.0005) and OFC (t(10) = 5.41, p = 0.0003) slices expressing hM4Di relative to ChR2-only controls lacking this receptor (Fig. 1G). Identically timed recordings without CNO application indicated <10% rundown of evoked EPSCs due to time alone (average response = 98.31 ± 4.60% SEM).
For behavioral experiments, a synapsin-driven AAV yielding hM4Di expression was injected into either the OFC (OFChM4Di→BLA group) or BLA (BLAhM4Di→OFC group), yielding robust hM4Di expression (visualized by the mCherry fluorescent reporter protein; Fig. 2A,B,G,H). Guide cannulae were implanted over either the BLA (for OFChM4Di→BLA group) or OFC (for BLAhM4Di→OFC group) terminal fields in close proximity to the area of axonal expression (Fig. 2C,D,E,F) to allow CNO infusion to selectively inactivate OFC terminals in the BLA or BLA terminals in the OFC. We focused on the lateral OFC subregion, which is densely connected with the BLA (Kita and Kitai, 1990; Carmichael and Price, 1995; Ongür and Price, 2000) and heavily implicated in outcome-guided conditional responding and action (Schoenbaum et al., 1998; Ostlund and Balleine, 2007b; Lucantonio et al., 2015).
Contribution of OFC→BLA and BLA→OFC projections to outcome-specific Pavlovian-to-instrumental transfer
Using this approach, we examined the contribution of OFC→BLA and BLA→OFC projections to the ability to retrieve a stored memory of a specific predicted reward and to use this information to influence reward-seeking decisions during outcome-specific PIT (Fig. 3A). Rats were trained to associate two auditory CSs with two distinct food rewards and then to earn each of those two rewards by pressing on independent levers. Rats demonstrated acquisition of the Pavlovian associations by entering the food-delivery port significantly more during the CS probe periods (Average entry rate on the final training session OFChM4Di→BLA group: 11.05 entries/min ±1.25 SEM; BLAhM4Di→OFC group: 11.89 ± 1.51) than during the baseline periods (OFChM4Di→BLA group: 4.52 ± 0.50, t(9) = 5.72, p = 0.0003; BLAhM4Di→OFC group: 6.70 ± 1.44, t(9) = 4.92, p = 0.0008). All rats also acquired the instrumental behavior (Final average press rate OFChM4Di→BLA group: 21.13 ± 1.37 presses/min; BLAhM4Di→OFC group: 21.45 ± 1.54). At the critical PIT test, both levers were present, but lever pressing was not rewarded. Each CS was presented 4 times (also without accompanying reward), with intervening CS-free baseline periods, to assess its influence on action performance and selection in the novel choice scenario. Because the CSs are never associated with the instrumental actions, this test assesses the rats' ability, upon CS presentation, to retrieve a stored memory of the specific predicted reward and to use this information to motivate performance of those actions known to earn the same unique reward (Kruse et al., 1983; Colwill and Motzkin, 1994; Gilroy et al., 2014; Corbit and Balleine, 2016).
CNO-hM4Di inactivation of OFC terminals in the BLA did not alter the expression of outcome-specific PIT (Fig. 3B; Main effect of CS Period: F(2,18) = 10.18, p = 0.001; Drug: F(1,9) = 0.45, p = 0.52; CS × Drug interaction: F(2,18) = 0.04, p = 0.96). Following either vehicle or CNO infusion, CS presentation elevated press rate selectively on the lever that, in training, earned the same predicted reward (CS-Same) relative to both pressing during the CS on the alternate available lever (CS-Different) and baseline press rate (p = 0.001–0.002).
CNO-hM4Di inactivation of BLA terminals in the OFC did, however, attenuate PIT expression (Fig. 3C; CS Period: F(2,18) = 15.64, p = 0.0001; Drug: F(1,9) = 0.63, p = 0.45; CS × Drug: F(2,18) = 3.54, p = 0.05). Robust PIT was demonstrated under vehicle-infused control conditions; the CS elevated performance of the CS-Same action relative to both baseline (p < 0.001) and CS-Different pressing (p = 0.002). Following CNO infusion, there was no significant difference between CS-Same and either CS-Different (p = 0.15) or baseline pressing (p = 0.09), and CS-Same performance was lower following CNO relative to vehicle (p = 0.01). The result was similar when the CS-induced elevation in performance on each action choice was evaluated (Fig. 3C, inset). Under control conditions, the CS induced a greater elevation in performance on action Same than action Different (t(9) = 3.08, p = 0.01), but following CNO infusion there was no significant difference between actions (t(9) = 0.10, p = 0.92). The effect of inactivating BLA terminals in the OFC was restricted to cue-influenced action; lever pressing during the baseline period was not altered by CNO (p = 0.90). CNO-hM4Di inactivation of BLA terminals in the OFC consistently attenuated PIT expression across trials (Drug × CS × Trial: F(6,54) = 1.61, p = 0.20).
Inactivation of neither OFC terminals in the BLA (Fig. 3D; CS Period: F(1,9) = 95.95, p < 0.0001; Drug: F(1,9) = 1.62, p = 0.23; CS × Drug: F(1,9) = 0.08, p = 0.78), nor BLA terminals in the OFC (Fig. 3E; CS Period: F(1,9) = 106.30, p < 0.0001; Drug: F(1,9) = 0.26, p = 0.62; CS × Drug: F(1,9) = 0.49, p = 0.50) altered Pavlovian conditional food-port approach responding. In all cases, CS presentation significantly elevated entries into the food-delivery port (p < 0.0001–0.001).
CNO had no effect on lever pressing during PIT in subjects lacking the hM4Di receptor when it was infused into either the BLA (OFCmCherry→BLA group; Fig. 4A; CS Period: F(2,20) = 7.07, p = 0.005; Drug: F(1,10) = 1.04, p = 0.33; CS × Drug: F(2,20) = 0.20, p = 0.82) or OFC (BLAmCherry→OFC group; Fig. 4B; CS Period: F(2,22) = 34.21, p < 0.0001; Drug: F(1,11) = 0.31, p = 0.59; CS × Drug: F(2,22) = 0.04, p = 0.96).
Contribution of BLA→OFC projections to the sensitivity of instrumental actions and Pavlovian conditional responses to outcome-specific devaluation
The above data suggest that BLA→OFC, but not OFC→BLA, projections are required for a reward-predictive cue to selectively motivate performance of an action that results in the same rewarding outcome. This capacity relies upon retrieval of a representation of the specific shared reward (i.e., outcome) encoded in both the previously learned Pavlovian stimulus–outcome and instrumental action–outcome associations (Dickinson and Balleine, 2002; Corbit and Janak, 2010). The BLA is required for both types of associations (Blundell et al., 2001; Balleine et al., 2003; Ostlund and Balleine, 2008; Johnson et al., 2009). Therefore, we next asked whether BLA→OFC projections are required for reward representations triggered by either Pavlovian reward-predictive stimuli, by the rats' own knowledge of available action–outcome contingencies, or both (Fig. 5A).
A separate group of BLAhM4Di→OFC rats were trained as described above. These subjects demonstrated acquisition of the Pavlovian associations by entering the food-delivery port significantly more during the CS probe periods (12.22 ± 1.08) than the baseline periods (5.03 ± 0.62; t(8) = 7.24, p < 0.0001) and acquired the instrumental behavior (final average press rate 20.54 ± 1.48). Before test, one of the food rewards was devalued by sensory-specific satiety. Rats were then given a brief unrewarded instrumental choice test followed by a test of conditional food-port approach responding, in which levers were retracted and each CS was presented 2 times (without accompanying reward), with intervening CS-free, baseline periods. Infusions were made after the sensory-specific satiety procedure, but before the test to evaluate the influence of inactivation of BLA terminals in the OFC on the retrieval of reward representations, rather than on devaluation learning per se. If rats are able to recall the learned action–outcome contingencies, then, during the instrumental phase of the test, they should be able to select the action that earns the valued reward, downshifting responding on the action that earns the devalued reward. Similarly, if the Pavlovian cues trigger the recall of a memory of their specific predicted reward, then rats should show robust conditional food-port approach responding to the cue signaling the valued reward, but attenuated responding to the cue signaling the devalued reward. Because, in both cases, a specific reward expectation is needed to influence behavior, this test provided an opportunity to evaluate the contribution of BLA→OFC projections to the generation of detailed reward expectancies.
CNO-hM4Di inactivation of BLA terminals in the OFC had no effect on the sensitivity of instrumental choice performance to reward devaluation (Fig. 5B; Devaluation: F(1,8) = 13.50, p = 0.006; Drug: F(1,8) = 0.81, p = 0.39; Devaluation × Drug: F(1,8) = 0.31, p = 0.60). Conversely, this manipulation did impair rats' ability to adjust their Pavlovian conditional food-port approach responding according to the current value of each specific predicted reward (Fig. 5C). The CS-induced elevation in food-port approach responding (Fig. 5C, inset; Devaluation: F(1,8) = 2.78, p = 0.13; Drug: F(1,8) = 0.30, p = 0.60; Devaluation × Drug: F(1,8) = 5.50, p = 0.047) was higher when the CS signaled a valued reward relative to a devalued reward in the vehicle-infused condition (p = 0.047), but responding was equally elevated by both CSs following CNO infusion (p = 0.36). Indeed, following vehicle infusion, rats' food-port entries were significantly elevated above baseline by presentation of the CS previously associated with the valued reward (p = 0.006), but were not significantly elevated when the CS predicting the devalued reward was presented (p = 0.40). Conversely, following CNO infusion, rats' food-port approach responding was elevated above baseline during both CSs (Valued: p = 0.03, Devalued: p = 0.04; Fig. 5C, main; Devaluation: F(2,16) = 25.21, p < 0.0001; Drug: F(1,8) = 0.42, p = 0.53; Devaluation × Drug: F(2,16) = 1.65, p = 0.22).
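As a rough consistency check on the statistics reported above, the p value implied by an F(1,8) statistic can be recomputed by hand, because an F statistic with one numerator degree of freedom equals the square of a t statistic with the same denominator degrees of freedom. The following Python sketch is not part of the original analysis: the function names are hypothetical, and the closed-form series it uses is valid only under the assumption of an even number of denominator degrees of freedom (as with the 8 here).

```python
import math


def two_tailed_p_even_df(t, df):
    """Two-tailed p value for Student's t when df is even (assumption).

    Uses the finite series for P(|T| <= t) with even df:
    sin(theta) * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...],
    where theta = atan(|t| / sqrt(df)) and the series has df/2 terms.
    """
    if df < 2 or df % 2 != 0:
        raise ValueError("this sketch only handles even df >= 2")
    theta = math.atan(abs(t) / math.sqrt(df))
    sin_theta = math.sin(theta)
    cos_sq = math.cos(theta) ** 2
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        # Each successive coefficient multiplies in (2k - 1) / (2k).
        term *= (2 * k - 1) / (2 * k) * cos_sq
        total += term
    return 1.0 - sin_theta * total


def p_from_f_one_numerator_df(f_value, df_denominator):
    """p value for F(1, df_denominator), via F = t^2 (hypothetical helper)."""
    return two_tailed_p_even_df(math.sqrt(f_value), df_denominator)


if __name__ == "__main__":
    # F statistics taken from the text above; all have (1, 8) degrees of freedom.
    for label, f_value in [("Devaluation (choice test)", 13.50),
                           ("Devaluation x Drug (CS elevation)", 5.50)]:
        p = p_from_f_one_numerator_df(f_value, 8)
        print(f"{label}: F(1,8) = {f_value:.2f}, p = {p:.3f}")
```

Under these assumptions, the recomputed values should round to the reported p = 0.006 and p = 0.047. The F(2,16) tests are not covered, because the F = t² identity holds only for a single numerator degree of freedom.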
Contribution of BLA→OFC projections to outcome-specific reinstatement
The data show that activity in BLA→OFC projections is required when a cue-triggered reward representation is used to either selectively motivate instrumental action or to direct adaptive conditional goal-approach responding. In both cases, the critical information, a predicted food reward, is not physically available, but rather must be expected based on previously learned associations. That is, the information was previously observed but is not currently observable. BLA→OFC projections may therefore participate in this reward expectation. Alternatively, these projections may simply be needed for a reward, whether observable or not, to influence action. The BLA is itself required for both (Ostlund and Balleine, 2008). To distinguish between these possibilities, we evaluated the effect of inactivation of BLA terminals in the OFC on outcome-specific reinstatement (Fig. 6A).
Rats were retrained on the instrumental contingencies (final average press rate: 31.77 ± 2.26) and then given a reinstatement test that was similar in structure to the PIT test, but with rewards themselves rather than CSs presented. During this test, rats hold the reward identity in working memory long enough to drive responding on the correct action without requiring access to a stored memory. As a result, reward presentation will selectively reinstate performance of the action that earns the same unique reward. If BLA→OFC projections are selectively required for the motivating influence of cue-elicited expectations of unobservable rewards, then inactivation of these projections should have little effect in this task. If, however, these projections are required for a reward to selectively motivate action regardless of its physical presence, then inactivation of this pathway should impair performance.
The data support the former. CNO-hM4Di inactivation of BLA terminals in the OFC did not significantly affect the expression of outcome-specific reinstatement (Fig. 6B; Reward delivery: F(2,16) = 5.49, p = 0.02; Drug: F(1,8) = 0.15, p = 0.71; Reward × Drug: F(2,16) = 0.37, p = 0.70). Following either vehicle or CNO infusion, reward presentation selectively elevated press rate on the lever that, in training, earned the same reward type (Reinstated) relative to both pressing on the alternate available lever (Non-reinstated) and baseline press rate (p = 0.0002–0.006). There was also no effect on food-port entries in this task (Fig. 6C; Reward delivery: F(1,8) = 19.32, p = 0.002; Drug: F(1,8) = 0.03, p = 0.86; Reward × Drug: F(1,8) = 1.59, p = 0.24).
Discussion
Here we evaluated the contribution of OFC→BLA and BLA→OFC projections to outcome-guided behaviors. Inactivation of BLA terminals in the lateral OFC was found to disrupt the influence of cue-generated reward expectations over both instrumental action choices and Pavlovian goal-approach responses. Activity in these projections was not required when actions were guided by reward expectations based on stored action–outcome contingencies, or when rewards themselves directed action selection. BLA→OFC projections therefore enable the cue-triggered reward expectations that can motivate the execution of specific action plans and allow adaptive conditional responding.
BLA→OFC, but not OFC→BLA, projections mediate the selective motivating influence of reward cues over action
Chemogenetic inactivation was used to evaluate the function of monosynaptic, direction-specific connections between the BLA and OFC. CNO-hM4Di activation was found to suppress terminal output through presynaptic inhibition, consistent with similar findings in other pathways (Stachniak et al., 2014; Yang et al., 2016; Zhu et al., 2016). Projection inactivation was temporally restricted to specifically assess contribution to online behavioral control. CNO-hM4Di inactivation of BLA→OFC, but not OFC→BLA, projections attenuated expression of outcome-specific PIT. In particular, BLA→OFC inactivation blunted the cues' ability to selectively invigorate actions directed at the same unique reward. That this manipulation did not cause the cues to nondiscriminately increase action performance and did not alter discrimination between outcomes during reinstatement argues against a simple deficit in discriminating between the CSs. Rather, activity in BLA→OFC projections was found to be necessary for a reward cue, by way of retrieving a representation of a specific predicted reward, to motivate specific action plans.
This result is generally consistent with findings that surgical BLA-OFC disconnection disrupts outcome-guided choice behavior (Baxter et al., 2000; Zeeb and Winstanley, 2013; Fiuzat et al., 2017) and specifically implicates monosynaptic, bottom-up BLA→OFC projections. It does, however, contrast with data showing that OFC→BLA, but not BLA→OFC, projections are necessary for cue-induced reinstatement of cocaine seeking (Arguello et al., 2017), perhaps indicating that cocaine alters recruitment of OFC→BLA projections. An intact OFC is required for BLA neurons to develop associative encoding of cue-predicted rewards (Saddoris et al., 2005). OFC→BLA projections may therefore be important for stimulus–outcome encoding but not normally required once those associations have been well formed. This hypothesis warrants further investigation.
BLA→OFC projections mediate cue-triggered reward expectancies
Successful PIT requires retrieval of both the previously learned action–outcome and stimulus–outcome associations. Two pieces of evidence here suggest that BLA→OFC projections are not required for rats to access knowledge of the specific consequences of their instrumental actions. First, inactivation of BLA terminals in the OFC did not affect the ability to use the current value of specific anticipated rewards to influence instrumental choice. Second, it also left unaffected the ability of reward delivery to selectively reinstate performance of the action known to earn the same unique reward. These results could be interpreted as inconsistent with findings that BLA-OFC disconnection lesions disrupt the sensitivity of choice behavior to outcome-specific devaluation (Baxter et al., 2000; Zeeb and Winstanley, 2013; Fiuzat et al., 2017). But, in these previous studies, OFC-BLA connectivity was disrupted throughout both the devaluation learning opportunity and the choice test (and, in some cases, the whole of training and test), unlike the present study in which, to focus on memory retrieval, BLA→OFC projections were inactivated after devaluation just before test. While BLA→OFC projections are not needed for value-guided instrumental choice, BLA-OFC connectivity might be necessary for learning about changes in value. This possibility is consistent with evidence that the BLA is required for value encoding (Wassum et al., 2009, 2011; Parkes and Balleine, 2013; Wassum et al., 2016).
BLA→OFC projections were, however, required for cue-triggered outcome expectations to influence behavior. In support of this, inactivation of BLA terminals in the OFC prevented subjects from modulating their Pavlovian conditional goal-approach responding according to the current value of the specific cue-predicted reward. The PIT deficit therefore resulted from an inability of the cue to engender a reward expectation based on a stored stimulus–outcome memory. This could also explain why BLA-OFC disconnection lesions disrupt the sensitivity of instrumental choice behavior to devaluation, given that task demands in these experiments likely required stimulus–outcome information (Baxter et al., 2000; Zeeb and Winstanley, 2013; Fiuzat et al., 2017). That BLA→OFC projections are vital for cue-triggered reward expectations is consistent with evidence that reward cues activate BLA neurons (Paton et al., 2006; Tye and Janak, 2007; Ambroggi et al., 2008; Sangha et al., 2013; Beyeler et al., 2016) and that the OFC specializes in stimulus–outcome representations (Ostlund and Balleine, 2007a, b; Rudebeck et al., 2008, 2017; Camille et al., 2011).
These projections were not, however, necessary for the general, nonspecific motivational influence of the cue. During PIT, the cue-induced elevation in goal-approach responding, which did not require a specific reward expectation because there was a single shared food port, was unaffected by inactivation of BLA→OFC projections. Moreover, following devaluation, food-port entries were elevated by the reward-predictive cue regardless of whether the specific predicted reward was devalued or not. This is consistent with evidence that the BLA is not required for expression of the general form of PIT, in which cues nondiscriminately motivate action (Corbit and Balleine, 2005; Mahler and Berridge, 2012).
The BLA has been suggested to encode motivationally salient, precise reward representations (Schoenbaum et al., 1998; Fanselow and Wassum, 2015; Wassum and Izquierdo, 2015). Such information is needed to generate expectations about the current and potential future states, or situations, that guide decision making. Both the expression of outcome-specific PIT and the sensitivity of Pavlovian conditional responses to devaluation are consistent with the subject using an internally generated state of the environment to guide behavior. In the devaluation test in particular, appropriate responding requires an understanding that, although things have not perceptually changed (e.g., CS presence), the state is nonetheless different because the specific anticipated reward is no longer valuable. The data here can therefore be interpreted as evidence that BLA→OFC projections are required when one must use a cue to generate a state expectation when the critical information, the reward, is not currently observable. In further support of this, these projections were not needed when the reward was itself present to direct action.
Although BLA→OFC projections appear to facilitate decision making, they are unlikely to mediate the actual decision-making process itself. Were this the case, inactivation of BLA terminals in the OFC during PIT would have resulted in a nonspecific cue-induced increase in performance of both Same and Different actions, indicating an inability to select between actions on the basis of the cue-provided expectation. Rather, BLA projections may relay currently unobservable reward-specific information to the OFC for use in making predictions about future states. Indeed, the OFC has been suggested to be important for using reward expectations to guide action (Izquierdo et al., 2004; Delamater, 2007; Balleine et al., 2011; Schoenbaum et al., 2016; Sharpe and Schoenbaum, 2016) perhaps by influencing downstream decision circuits (Keiflin et al., 2013), and lesions to this region do cause nonspecific cue-induced increases in instrumental activity during PIT (Ostlund and Balleine, 2007b). Moreover, activity in the OFC of humans (Gottfried et al., 2003; Klein-Flügge et al., 2013; Howard et al., 2015; Howard and Kahnt, 2017), nonhuman primates (Rich and Wallis, 2016), and rodents (McDannald et al., 2014; Farovik et al., 2015; Lopatina et al., 2015) can represent detailed information about unobservable anticipated events. Correspondingly, OFC lesions or inactivations cause deficits in using anticipated rewarding events to guide behavior (Gallagher et al., 1999; Pickens et al., 2003, 2005; Izquierdo et al., 2004; Ostlund and Balleine, 2007b; West et al., 2011; Jones et al., 2012; Bradfield et al., 2015; Murray et al., 2015). If, as proposed (Wilson et al., 2014; Schuck et al., 2016), the OFC represents the current, not fully observable state, then the results here suggest that projections from the BLA enable reward-predictive cues to provide the OFC with detailed expectations of potential rewards available in that state. 
In concordance with this, an intact BLA is needed for neuronal encoding of anticipated outcomes in the OFC in rats (Schoenbaum et al., 2003; Rudebeck et al., 2013), nonhuman primates (Rudebeck et al., 2013, 2017), and humans (Hampton et al., 2007).
Implications
Evidence suggests that the cognitive symptoms underlying many psychiatric diseases result from a failure to appropriately anticipate potential future events. Indeed, deficits in the cognitive consideration of potential rewarding events have been detected in patients diagnosed with addiction (Hogarth et al., 2013), schizophrenia (Morris et al., 2015), depression (Seymour and Dolan, 2008), and social anxiety disorder (Alvares et al., 2014). Disrupted amygdala and OFC activity and connectivity have also been associated with these diseases (Ressler and Mayberg, 2007; Price and Drevets, 2010; Goldstein and Volkow, 2011; Passamonti et al., 2012; Liu et al., 2014; Sladky et al., 2015). These data therefore have important implications for the understanding and treatment of these psychiatric conditions, and suggest that they might arise, in part, from disrupted transmission of reward information from the BLA to the OFC.
Footnotes
This work was supported by National Institutes of Health Grant DA035443, University of California at Los Angeles Faculty Career Development Award to K.M.W., National Institutes of Health Training Grant DA024635, the Dr. Ursula Mandel Scholarship, and a University of California, Los Angeles Graduate Research Mentorship fellowship to N.T.L. We thank Drs. Alicia Izquierdo, Melissa Malvaez, and Ashleigh Morse for helpful discussions regarding these data.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Kate M. Wassum, Department of Psychology, University of California, Los Angeles, 1285 Franz Hall, Box 951563, Los Angeles, CA 90095. kwassum@ucla.edu