Abstract
Ventromedial prefrontal cortex (vmPFC) is thought to provide regulatory control over Pavlovian fear responses and has recently been implicated in appetitive approach behavior, but much less is known about its role in contexts in which appetitive and aversive outcomes can be obtained and avoided, respectively. To address this issue, we recorded from single neurons in vmPFC while male rats performed our combined approach and avoidance task under reinforced and non-reinforced (extinction) conditions. Surprisingly, we found that cues predicting reward modulated cell firing in vmPFC more often and more robustly than cues preceding avoidable shock; in addition, firing of vmPFC neurons was both response (press or no-press) and outcome (reinforced or extinction) selective. These results suggest a complex role for vmPFC in regulating behavior and support its role in appetitive contexts during both reinforced and non-reinforced conditions.
SIGNIFICANCE STATEMENT Selecting context-appropriate behaviors to gain reward or avoid punishment is critical for survival. Although the role of ventromedial prefrontal cortex (vmPFC) in mediating fear responses is well established, vmPFC has also been implicated in the regulation of reward-guided approach and extinction. Many studies have used indirect methods and simple behavioral procedures to study vmPFC, which leaves the literature incomplete. We recorded vmPFC neural activity during a complex cue-driven combined approach and avoidance task and during extinction. Surprisingly, we found very little vmPFC modulation to cues predicting avoidable shock, whereas cues predicting reward approach robustly modulated vmPFC firing in a response- and outcome-selective manner. This suggests a more complex role for vmPFC than current theories propose, specifically regarding context-specific behavioral optimization.
Introduction
The medial prefrontal cortex (mPFC) is thought to exhibit control over appetitive behavior and Pavlovian fear responses. Broadly, dorsal mPFC (dmPFC) has been implicated in aversive processing and goal-directed behaviors, whereas ventral mPFC (vmPFC) has been more often associated with the formation and expression of extinction behaviors and habit (Ostlund and Balleine, 2005; Sotres-Bayon and Quirk, 2010; Senn et al., 2014; Sun et al., 2018). This dissociation is supported by divergent anatomical projections, with the dmPFC connecting to nucleus accumbens (NAc) core and dorsomedial striatum, whereas the vmPFC projects to NAc shell, amygdala, and bed nucleus of the stria terminalis (Haber et al., 2000; Killcross and Coutureau, 2003).
Although several recording studies have examined the role of dorsal mPFC in behaving rats, fewer have explored the function of single neurons in vmPFC. Generally, vmPFC activity is thought to regulate fear-related behaviors. For example, trace fear conditioning has been shown to transiently increase intrinsic excitability in vmPFC neurons projecting to basolateral amygdala, an effect that was positively correlated with freezing behavior (Song et al., 2015). Others have shown that fear-induced freezing can be overcome by the activation of vmPFC projections to basomedial amygdala, which can differentiate between aversive and safe environments (Adhikari et al., 2015). Although these studies show the importance of vmPFC in managing fear responses, others have supported its role in avoidance behavior (Giustino et al., 2016; Soler-Cedeño et al., 2016; Schwartz et al., 2017); lesions or inactivation of vmPFC lead to failed shock avoidance and impair discrimination between shock and safety cues (Sangha et al., 2014; Adhikari et al., 2015).
Emerging evidence suggests that single neurons in vmPFC also contribute to appetitive behavior (Burgos-Robles et al., 2013; Moorman and Aston-Jones, 2015). Consistent with these findings, recent vmPFC lesion and inactivation studies show suppressed reward-seeking behavior in contexts that were previously associated with gaining reward and suggest that vmPFC connections to NAc are necessary for the expression of reward-predictive cue-driven behavior (Bossert et al., 2011; Keistler et al., 2015; Zeeb et al., 2015). However, the opposite outcome has also been reported: that activating vmPFC projection neurons suppresses cue-induced drug-seeking behavior (Peters et al., 2008; LaLumiere et al., 2012). Such variance could arise from the simultaneous existence of separate, but intermingled, neural ensembles within vmPFC that selectively encode opposing learned cue-driven responses, such as reward seeking versus extinction (Suto et al., 2016; Warren et al., 2016).
More generally, extensive evidence implicates the vmPFC in extinction learning and expression. Several studies have found that the degree to which extinction memories are retrieved scales with firing and burst activity within vmPFC (Milad and Quirk, 2002; Burgos-Robles et al., 2007; Wilber et al., 2011; Maroun et al., 2012). Consistent with this, inactivation of the vmPFC has been shown to increase responding to food-predictive cues during extinction and impair extinction retrieval, whereas stimulation of the vmPFC promoted extinction to fear-related cues (Milad and Quirk, 2002; Eddy et al., 2016). This is consistent with the finding that vmPFC encodes both contextually appropriate behavioral initiation during reward seeking and withholding during extinction (Moorman and Aston-Jones, 2015).
Therefore, previous studies suggest that the vmPFC is involved in reward seeking and fear management during both conditioning and extinction. However, it remains unknown how vmPFC encodes avoidance, whether single neurons are modulated during both approach and avoidance, and how these correlates change when task-relevant cues become unreinforced during extinction. To address these issues, we recorded from single neurons in vmPFC while rats performed a combined approach and avoidance task when behavior was reinforced and unreinforced (i.e., in extinction). We observed distinct correlates within vmPFC during reward approach that were response selective (press; no press) and block selective (reinforced; extinction), but found very few cells that were modulated during avoidance.
Materials and Methods
Animals.
Eight male Sprague Dawley rats were obtained from Charles River Laboratories at 250–300 g (7–8 weeks old). Rats were individually housed in a temperature- and humidity-controlled environment and kept on a 12 h light/dark cycle (0700–1900 in light); all tests were run during the light phase. Rats had access to water ad libitum and body weight was maintained at 85% of baseline weight by food restriction (15 g of standard rat chow provided daily in addition to ∼1 g of sucrose pellets during experimental trials). All procedures were performed in accordance with National Institutes of Health guidelines and the University of Maryland–College Park Institutional Animal Care and Use Committee protocols.
Combined positive and negative reinforcement behavioral task.
Eight rats were run on a combined positive and negative reinforcement behavioral task. We trained rats progressively on this task due to its complexity. First, we trained rats daily on a 45 min FR1 reward-shaping program to establish the lever–response reward contingency. Once the reward contingency was learned (3–4 d), rats were then trained daily on a 45 min foot shock (0.42 mA) escape procedure to establish the lever–response shock termination contingency. Foot shock intensity was selected based on the conditioned foot shock intensity optimization protocol for avoidance behavior outlined previously (Oleson et al., 2012) and on previous success in shock avoidance paradigms (Gentry et al., 2016); we used the moderately aversive stimulus strength of 0.42 mA to balance aversiveness with response probability for shock trials. During each shock escape training session, subjects were simultaneously presented with a lever, cue light, and auditory cue at shock onset; a response on the lever at any point during the session resulted in termination of foot shock and cue light, as well as progression to the intertrial interval (ITI) (20 s). During escape sessions, subjects were gradually shaped toward the lever (safe side, quadrant with lever, orientation toward the lever, rearing, pressing) by the experimenter as needed until escape behavior was acquired (1–2 d). Shock escape was used only as a training mechanism and escape responses were not possible in the final behavioral program.
Once escape behavior was acquired, shock trials were altered to allow for avoidance and positive reinforcement (as described above) and neutral (unreinforced) contingencies were added to the program. The combination of these three trial types (reward, shock avoidance, and neutral) comprised our final task. Importantly, in this final task, rats could only avoid or fail to avoid on shock trials. The lever was extended into the testing chamber at the beginning of the session and remained out for the entire length of the session. At trial onset, a cue light and one of three discriminatory auditory cues (tone, white noise, or clicker) were activated; house lights remained on throughout the session. Five seconds after the onset of the auditory cue and cue light, the lever could be pressed to produce a response; presses before the end of the 5 s cue period were not counted. Lever pressing after this 5 s delay would produce one of three outcomes (dependent upon auditory cue identity): delivery of a sucrose pellet (“reward trial”; positive reinforcement behavior), prevention of foot shock (“shock trial”; negative reinforcement behavior), or no consequence (“neutral trial”; unreinforced). If the animal failed to press the lever within a 10 s period, no food reward was delivered on reward trials, foot shock commenced (2 s duration with automatic termination) on shock trials, or there was no consequence on neutral trials. After response or termination of the trial, an ITI (20 s) was initiated. Auditory cue identities were counterbalanced across rats and trial types were pseudorandomly presented within the session. Rats were very well trained on this task, completing >30 sessions and displaying >60% avoidance responses for >3 consecutive sessions. Rats were brought back up to behavioral criterion after surgical recovery (∼2 weeks) before electrophysiological recordings began. Each reinforced session consisted of an average of 32.6 ± 5.9 trials per trial type.
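For illustration, the pseudorandom presentation of trial types described above (random interleaving without replacement within a session) can be sketched in Python as follows; the function name and trial-type labels are ours and are not taken from the actual task-control code.

```python
import random

def build_trial_sequence(n_per_type, types=("reward", "shock", "neutral"), seed=None):
    """Pseudorandomly interleave trial types within a session: pool equal
    counts of each type, then shuffle the pool (random without replacement,
    so per-type counts are fixed even though order is unpredictable)."""
    rng = random.Random(seed)
    sequence = [t for t in types for _ in range(n_per_type)]
    rng.shuffle(sequence)
    return sequence
```

Shuffling a fixed pool, rather than drawing each trial independently, guarantees that every session contains the intended number of each trial type.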
During each recording session, rats were also run on an extinction program after the regular, reinforced session was completed. This extinction program followed the same format and timing schedule, but no consequences occurred regardless of whether the animal pressed the lever during reward and shock trials. Extinction sessions consisted of an average of 21.5 ± 4.6 trials per trial type. Combined, each daily session lasted ∼75 min (45 min reinforced, 30 min extinction). Behavioral sessions in combination with single-unit recordings were run for ∼2 months (N = 84 sessions).
Intracranial surgical procedures.
All surgical procedures were performed after rats were initially trained on the task. All rats were anesthetized using isoflurane in O2 (5% induction, 1% maintenance) and each of the eight rats was chronically implanted with a drivable bundle of ten 25-μm-diameter FeNiCr (iron, nickel, chromium) wires in the left or right hemisphere in mPFC just dorsal to the infralimbic cortex (+3.0 AP, ±0.6 ML, −4.0 DV from brain). The recording electrode and anchoring screws were stabilized using dental cement (Dentsply) and rats then received postoperative care: subcutaneous injection of 5 ml of saline containing 0.04 ml of carprofen (Rimadyl), topical application of lidocaine cream to the surgical area, and placement on a heating pad until full consciousness was regained. Rats were also given oral antibiotic treatment with cephalexin 1 d before surgery and twice daily for 1 week after surgery to prevent infection of the surgical site. All subjects were allowed at least a week for full recovery before experimentation.
Data acquisition.
Experiments were performed in a Plexiglas behavioral chamber (Med Associates) fitted with shock-grid flooring. A retractable lever, a cue light (above the lever portal), and a food cup were located on the left side of the chamber. Auditory cues were recorded and played back to the rat via an Arduino system.
Electrodes were screened daily to monitor active wires and the electrode assembly was advanced by 40–80 μm/d at the end of the recording session, which allowed us to record from a different neuronal population each day. Neural activity was recorded using Plexon Multichannel Acquisition Processor systems. Signals from the electrode channels were amplified 20 times by an op-amp head stage (Plexon, HST/8o50-G20-GR) located on the electrode array. Immediately outside of the chamber, signals were passed through a differential preamplifier (Plexon, PBX2/16sp-r-G50/16fp-G50), where the single-unit signals were amplified 50 times and filtered at 150–9000 Hz. The single-unit signals were then sent to the Multichannel Acquisition Processor box, where they were further filtered at 250–8000 Hz, digitized at 40 kHz, and amplified 1–32 times. Waveforms >2.5:1 signal-to-noise were extracted from active channels and recorded to disk. Neurons were sorted using Offline Sorter and Neuroexplorer and exported for further analysis in MATLAB (The MathWorks) (Bissonette et al., 2013; Burton et al., 2014).
Experimental design and statistical analysis.
Firing rates in our analysis epochs (cue and baseline) were computed by dividing the number of spikes in each epoch by the epoch duration. The cue epoch was the 5 s period after cue onset and the baseline epoch was a 1 s period beginning 2 s before cue onset. Neurons were characterized by comparing firing rate during baseline with firing rate during the cue epoch, or firing rate during the cue epoch of reward or shock trials with firing rate during the cue epoch of neutral trials, averaged over all trial types (Wilcoxon; p < 0.05). We also computed a reward index (reward − neutral) and shock index (shock − neutral) to normalize firing to neutral trials, to determine whether firing was significantly shifted across the population (Wilcoxon; p < 0.05), and to find correlations between firing on shock and reward trials. χ2 tests were performed to assess differences in the counts of neurons showing significant modulation across groups of interest.
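As a minimal sketch of these computations (the original analyses were run in MATLAB; the data layout and function names here are our own assumptions), the epoch firing rates and reward/shock indices can be expressed as:

```python
CUE_DUR = 5.0  # cue epoch: the 5 s period after cue onset

def epoch_rate(spikes, start, end):
    """Firing rate (spikes/s): spike count in [start, end) / epoch duration."""
    return sum(start <= t < end for t in spikes) / (end - start)

def mean_cue_rate(trials):
    """Mean cue-epoch firing rate across trials of one type; each trial is
    assumed to be a dict with cue-onset time 'cue_on' (s) and a list of
    spike timestamps 'spikes' (s) for one neuron."""
    rates = [epoch_rate(tr["spikes"], tr["cue_on"], tr["cue_on"] + CUE_DUR)
             for tr in trials]
    return sum(rates) / len(rates)

def reward_shock_indices(reward_trials, shock_trials, neutral_trials):
    """Reward index = reward - neutral; shock index = shock - neutral
    (cue-epoch firing normalized to neutral press trials)."""
    neu = mean_cue_rate(neutral_trials)
    return (mean_cue_rate(reward_trials) - neu,
            mean_cue_rate(shock_trials) - neu)
```

The baseline epoch would be obtained the same way, e.g., `epoch_rate(spikes, cue_on - 2.0, cue_on - 1.0)` for the 1 s window beginning 2 s before cue onset.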
Behavior during performance of the task was evaluated by computing percentage press and reaction times (RTs) for each trial type. RT was defined as the time between auditory cue offset and lever press. A two-factor ANOVA (p < 0.05) was performed on these behavioral measures to determine whether they were modulated by trial type (reward, neutral, and shock) and block (reinforced, extinction) or whether there were any interactions between these factors. In addition, we performed a two-factor ANOVA (p < 0.05) on RTs and firing rates to determine whether activity was different across blocks (reinforced, extinction) and time point within block (early, late). ANOVAs were followed with post hoc t tests using a Bonferroni correction to adjust for multiple comparisons.
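The Bonferroni adjustment applied to the post hoc tests works as follows (a generic sketch, not the study's analysis code): with m comparisons, each p-value is tested against α/m or, equivalently, each p-value is multiplied by m and capped at 1.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m comparisons: a test is significant if
    p < alpha / m; equivalently, report adjusted p = min(1, p * m)."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p < alpha / m for p in p_values]
    return adjusted, significant
```

For example, with three post hoc comparisons, a raw p = 0.04 is not significant at the corrected threshold of 0.05/3 ≈ 0.017.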
Behavioral videos recorded during performance of the combined positive and negative reinforcement task and during extinction were scored for freezing and orienting to the lever during the cue epoch (cue onset to cue offset; 5 s) for all trial types. For behavioral analysis, the cue epoch was divided into 2 subepochs (first half and last half) and separate binary (0 or 1) scores were recorded for each behavioral measure during each subepoch. These behavioral analyses were scored blindly and counts were analyzed using χ2 analyses.
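Because each behavioral measure was scored as a binary outcome per subepoch, comparisons between two conditions reduce to 2 × 2 contingency tables of counts. The Pearson chi-square statistic for such a table can be sketched as follows (our own helper, not the study's code; the shortcut formula below is the standard 1-df form without continuity correction):

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the 2x2 contingency table
    [[a, b], [c, d]], e.g., freeze vs no-freeze counts on two trial types."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator
```

A table with identical proportions in both rows yields a statistic of 0, whereas larger departures from independence yield larger values.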
Histology.
After the completion of the study, rats were terminally anesthetized with an overdose of isoflurane (5%) and transcardially perfused with saline and buffered 4% paraformaldehyde. Brain tissue was removed and postfixed with paraformaldehyde at 4°C. Brains were then placed in 30% sucrose solution for 72 h and sectioned coronally (50 μm) using a freezing microtome. Tissue slices were mounted onto slides and stained with thionin for histological reconstruction. Electrode placement was verified under a light microscope and drawn onto plates adapted from the rat brain atlas (Paxinos and Watson, 2007).
Results
Behavior during combined approach–avoidance
Rats (n = 8) were trained on a combined approach–avoidance task (Fig. 1A–C). At the start of each session, a lever was extended into the behavioral chamber and remained extended until session completion. At the start of each trial, 1 of 3 distinct discriminatory auditory cues and a cue light were presented to the rat for 5 s, signaling if the current trial would be a reward, shock, or neutral trial. After termination of the auditory cue, a lever press produced one of three outcomes (dependent upon auditory cue identity): delivery of a food reward (positive reinforcement behavior, i.e., reward trials), prevention of foot shock (negative reinforcement behavior, i.e., shock trials), or no consequence (i.e., neutral trials). If the rat failed to press the lever within 10 s of cue termination, then no food reward was delivered on reward trials, foot shock (0.42 mA, 2 s with automatic termination) commenced on shock trials, and there was no consequence on neutral trials. These three trial types were pseudorandomly interleaved (i.e., random without replacement) within each session. Hereafter, we will refer to this first block of trials as the “reinforced” trial block. Each reinforced session consisted of an average of 32.6 ± 5.9 (SD) trials per trial type. During sessions in which single neurons were recorded, each rat immediately went through extinction (i.e., no shocks or rewards were administered) after completion of the regular reinforced block of trials. Extinction (i.e., non-reinforced) sessions consisted of an average of 21.5 ± 4.6 (SD) trials per trial type. Hereafter, we will refer to this second block of trials as the “extinction” trial block.
Task design. A–C, Sessions consisted of three trial types: reward (A), neutral (B), and shock (C), which could be identified by a unique auditory cue. A lever was introduced into the chamber at the start of each session and remained extended for the duration of the session. At the beginning of each trial, rats were presented with a light cue and trial-specific sound cue for 5 s and then had a maximum of 10 s to press the lever. If rats pressed the lever during this 10 s interval, then they could receive a sucrose pellet reward, avoid an impending foot shock (0.42 mA), or experience no consequence depending on the identity of the sound cue. If rats failed to press the lever within 10 s, they would alternatively receive no sucrose reward, receive a foot shock (0.42 mA; 2 s duration with automatic termination), or experience no consequence depending on the identity of the sound cue. After each consequence, the trial progressed into a 20 s ITI. Trial types were pseudorandomly interleaved within each session (∼45 min) and sound cue identity was counterbalanced across rats. During extinction sessions (∼30 min), cues produced no outcome regardless of previous association with reward, neutral, or shock. D, E, RT and %P computed across reinforced (D) and extinction (E) sessions (n = 84). RT was defined as the time between auditory cue offset and the lever press. Bars with asterisks represent significance (t test; p < 0.05). F, G, Analysis of behavior during the cue period of each trial type in reinforcement and extinction sessions. Percentage freezing (F) and orienting to the lever (G) are shown for press and no-press trials. Asterisks indicate p < 0.05 in χ2; n = 14 sessions from 5 rats, with 4 rats contributing 3 sessions and 1 rat contributing 2 sessions. Error bars indicate SEM.
First, we performed a two-factor ANOVA with trial type (reward, neutral, and shock) and block (reinforced and extinction) as factors. We found a main effect of trial type (RT: F(2,498) = 62.4, p < 0.001; percent press (%P): F(2,498) = 10.55, p < 0.001), as well as a main effect of trial block (RT: F(1,498) = 47.53, p < 0.001; %P: F(1,498) = 579.97, p < 0.001), demonstrating that rats were slower and pressed less often during extinction compared with reinforced trial blocks. There were no interactions between trial type and extinction for either RT or percentage lever press, indicating that the pattern of behavior observed on reward, neutral, and shock trials was similar across trial blocks (RT: F(2,498) = 1.76, p = 0.17; %P: F(2,498) = 0.58, p = 0.56).
Next, we performed post hoc t tests to further evaluate behavior across reinforced and extinction trial blocks. Figure 1, D and E, illustrates behavioral measures across recording sessions (N = 84). During reinforced sessions (Fig. 1D), rats produced the most responses and were fastest to respond on reward trials compared with neutral trials (%P: t(83) = 7.64, p < 0.0001; RT: t(83) = 12.29, p < 0.0001); rats were slowest to press for shock trials (Shk vs Neu RT: t(83) = 7.17, p < 0.0001; Shk vs Rew RT: t(83) = 11.94, p < 0.0001), but pressed significantly more often during shock trials than during neutral trials (%P: t(83) = 2.74, p < 0.01). During extinction sessions (Fig. 1E), rats were still faster to press and pressed more often during reward trials compared with neutral trials (%P: t(83) = 7.60, p < 0.0001; RT: t(83) = 9.24, p < 0.0001), but RTs and percentage press were no longer different between shock and neutral (%P: t(83) = 1.46, p = 0.15; RT: t(83) = 1.72, p = 0.09). Together, these behavioral results suggest that rats can discriminate between the three trial types during both reinforced sessions and extinction and, as expected, behavior declined during extinction when outcomes were omitted.
To further probe behavior during the cue period, we asked whether freezing and orienting toward the lever during the cue differed across trial types in reinforcement and extinction (Fig. 1F,G; n = 14 sessions, with three sessions each from four rats and two sessions from one rat). We found that rats froze (Fig. 1F) significantly more to shock-predictive cues than reward or neutral cues (Shk vs Neu Press: χ2 = 51.39, p < 0.0001, Rew vs Shk Press: χ2 = 54.07, p < 0.0001). Freezing was most prominent when they failed to press on shock trials (Shk vs Neu No Press: χ2 = 24.23, p < 0.0001, Shk vs Rew No Press: χ2 = 31.23, p < 0.0001, Shk Press vs Shk No Press: χ2 = 347.14, p < 0.0001). Although rats still froze more to shock cues than reward or neutral cues during extinction (Shk vs Neu Ext Press: χ2 = 6.80, p < 0.01, Shk vs Rew Ext Press: χ2 = 4.35, p < 0.05, Shk vs Neu Ext No Press: χ2 = 10.53, p < 0.01, Shk vs Rew Ext No Press: χ2 = 14.56, p < 0.001), they froze significantly less to shock cues during extinction sessions than when shock cues were reinforced (Reg vs Ext Shk Press: χ2 = 41.77, p < 0.0001; Reg vs Ext Shk No Press: χ2 = 5.01, p < 0.05).
Rats generally oriented toward the lever more often before pressing (Fig. 1G; Rew Press vs Rew No Press: χ2 = 541.78, p < 0.0001; Neu Press vs Neu No Press: χ2 = 397.92, p < 0.0001; Shk Press vs Shk No Press: χ2 = 347.14, p < 0.0001; Rew Ext Press vs Rew Ext No Press: χ2 = 78.67, p < 0.0001; Neu Ext Press vs Neu Ext No Press: χ2 = 27.25, p < 0.0001; Shk Ext Press vs Shk Ext No Press: χ2 = 21.83, p < 0.0001). Rats oriented toward the lever more often during reinforced reward-predictive cues compared with neutral and shock avoidance cues when they pressed the lever (Rew vs Neu Press: χ2 = 26.07, p < 0.0001; Rew vs Shk Press: χ2 = 40.18, p < 0.0001) and when they pressed during extinction (Rew vs Neu Ext Press: χ2 = 10.62, p < 0.01; Rew vs Shk Ext Press: χ2 = 10.21, p < 0.01). This difference was not present when rats failed to press the lever (Rew vs Neu No Press: χ2 = 0.26, p = 0.26; Rew vs Neu Ext No Press: χ2 = 1.65, p = 0.20; Rew vs Shk Ext No Press: χ2 = 3.84, p = 0.05), with the exception of rats orienting significantly more to the lever for reinforced shock cues compared with neutral cues (Rew vs Shk No Press: χ2 = 5.34, p < 0.05). There was no difference in orienting to shock and neutral cues when rats pressed or failed to press during reinforcement (Shk vs Neu Press: χ2 = 1.54, p = 0.21; Shk vs Neu No Press: χ2 = 0.23, p = 0.23) or extinction (Shk vs Neu Ext Press: χ2 = 0.00, p = 0.95; Shk vs Neu Ext No Press: χ2 = 0.45, p = 0.49). Together, these data reinforce that our rats were able to discriminate between cues despite pressing at a high rate for all trial types. Further, they show that rats did orient to the lever more often on reward trials and froze more often on shock trials, as described previously (Gentry et al., 2016).
Activity in vmPFC was strongly and weakly modulated during reward and shock trials, respectively
To understand the role of vmPFC in our combined approach and avoidance task, we recorded from a total of 289 neurons within the vmPFC of rats (n = 6 rats). In our initial analysis, we broadly determined whether neurons increased (i.e., increasing-type cell) or decreased (i.e., decreasing-type cell) firing rate during the cue epoch (5 s after cue onset) compared with baseline (1 s epoch taken 2 s before cue onset; Wilcoxon; p < 0.05). We found that 60 (22.5%; χ2 = 495.0, p < 0.05) and 15 (5.6%; χ2 = 13.6, p < 0.05) neurons significantly increased or decreased firing rate during the cue epoch, respectively. Figure 2, A–C, shows firing patterns during all three trial types for each of our recorded neurons over the course of our study, sorted by firing during the cue epoch (i.e., decreasing to increasing). Figure 2, D–F, shows a single-neuron example of an increasing-type cell during reward (Fig. 2D), neutral (Fig. 2E), and shock (Fig. 2F) trials.
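The χ2 comparisons against chance test whether the observed count of significant cells exceeds the proportion expected at the test's α level (here 5%). A simplified 1-df goodness-of-fit version can be sketched as follows; the study's exact χ2 procedure is not specified, so the reported statistics should not be expected to match this sketch numerically:

```python
def chi2_vs_chance(n_sig, n_total, p0=0.05):
    """1-df chi-square goodness-of-fit: observed counts of significant vs
    nonsignificant cells against the counts expected by chance (p0)."""
    exp_sig = n_total * p0
    exp_ns = n_total * (1.0 - p0)
    return ((n_sig - exp_sig) ** 2 / exp_sig
            + ((n_total - n_sig) - exp_ns) ** 2 / exp_ns)
```

When the observed proportion equals the chance level, the statistic is 0; counts well above 5% of the population produce large values.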
Firing rates over trial time for individual neurons. A–C, Heat plots depicting normalized cell firing (spikes/s) across trial time (x-axis) for each recorded cell (y-axis; N = 289 cells) for reward press (A), neutral press (B), and shock press (C) trial types. Cells are sorted by firing during the cue epoch. Cue onset and offset are depicted with dashed black lines. D–F, Single-cell example of an increasing-type cell showing activity across each of the three trial types. Activity is aligned to cue onset (binned at 100 ms); cue onset and offset are depicted with gray lines. Each tick mark equals one action potential.
Figure 3, A–D, illustrates the average normalized firing rate of increasing-type and decreasing-type neurons over trial time broken down by trial type (blue = reward, orange = neutral, red = shock) aligned to the start of the cue period (Fig. 3A,B) and to lever press (Fig. 3C,D). In both populations of cells, there were clear differences in firing rate between reward and neutral trials (as expected given how these populations were defined), but no difference between shock and neutral trials. To quantify these results, we computed a reward (reward − neutral) and shock (shock − neutral) index for each cell by subtracting average firing rates during the cue epoch on neutral press trials from reward press and shock press (i.e., avoid) trials. These indices are plotted against each other in Figure 3E. We found no correlation between firing rates for cells that were modulated by reward and shock trials compared with neutral (Fig. 3E; r2 = 0.02, p = 0.26), indicating that the same cells were not significantly modulated by both. For increasing-type neurons, the reward index was significantly shifted in the positive direction (Wilcoxon; Z = 2.22; μ = 0.43; p < 0.05), whereas the reward index for decreasing-type neurons was shifted in the negative direction (Wilcoxon; μ = −0.63; p < 0.05). Therefore, both increasing and decreasing populations were significantly modulated by reward expectation. This was not true, however, for cues that predicted shock; the distribution of shock indices was not significantly shifted from zero in either increasing or decreasing populations (Wilcoxon; Increasing: Z = 0.86, μ = 0.04, p = 0.39; Decreasing: μ = 0.09, p = 0.64). Further, there was no correlation between reward and shock indices for increasing (r2 = 0.013, p = 0.144) or decreasing (r2 = 0.157, p = 0.391) cells when analyzed separately, nor when they were combined (r2 = 0.017; p = 0.259).
Therefore, we conclude that average firing rates in vmPFC were modulated by cues that predict reward but not shock and, furthermore, that these reward effects were not linked to parallel signals reflecting the value or the motivational level associated with avoiding shock.
Increasing- and decreasing-type cells in vmPFC are modulated by cues that predict reward. A, B, Histograms depicting normalized average firing rate (spikes/s) for cells increasing (n = 60) or decreasing (n = 15) within the overall population (N = 289 cells) across trial time for reward (blue), neutral (orange), and shock (red) trial types. Cue onset is depicted with a gray dashed line aligned to time = 0. C, D, Histograms depicting normalized average firing rate (spikes/s) for the same cells pictured in A and B aligned to lever press, which is depicted with a gray dashed line at time = 0. E, Scatter plot depicting combined increasing and decreasing cells (n = 75) along computed reward (reward − neutral; x-axis) and shock (shock − neutral; y-axis) indices for each cell. Indices were calculated by subtracting average firing rates during the cue epoch on neutral press trials from reward press and shock press (i.e., avoid) trials.
From the analysis above, it appears that firing in vmPFC was significantly modulated by reward expectancy during the cue epoch, with little to no modulation during shock trials when cells were divided into increasing-type and decreasing-type neurons. However, it is possible that, by broadly dividing neurons in this way, we overlooked neurons that were shock selective independent of modulation on reward or neutral trial types. To address this issue, we investigated how many neurons within the vmPFC were modulated more or less on reward and shock trials compared with neutral trials. Figure 4A shows the distribution of recording locations for these cells within vmPFC. Figure 4B further quantifies the breakdown of these cells, showing counts and percentages of cells that fired significantly differently between reward and shock trials relative to neutral (Wilcoxon; p < 0.05), indicating how many and what percentage of individual cells increased or decreased to both shock and reward, were modulated only by shock or only by reward, or were not significantly modulated by either.
vmPFC neurons were strongly and weakly modulated during reward and shock cues, respectively, and very few were modulated by both. A, Location of recording sites based on histology (Paxinos and Watson, 2007); PL, prelimbic cortex; DP, dorsopeduncular region. Each symbol represents the location of neurons that showed differential firing (Wilcoxon; p < 0.05) in the analyses described in the text (see Results) and shown in the table in B. Dark blue indicates reward > neutral; light blue, neutral > reward; dark red, shock > neutral; light red, neutral > shock; −, decreasing-type cells; +, increasing-type cells. B, Table quantifying numbers and percentages of cells that were reward > neutral, neutral > reward, shock > neutral, neutral > shock, or none of the above. C–F, Histograms depicting average normalized firing rate (spikes/s) for cells in which reward < neutral (n = 40; C), reward > neutral (n = 30; D), shock < neutral (n = 5; E), and shock > neutral (n = 12; F) within the overall population (N = 289 cells) across trial time for reward (blue), neutral (orange), and shock (red) trial types. Cue onset is depicted with a gray dashed line aligned to time = 0. Insets show scatter plots depicting each cell within each subpopulation (reward < neutral, reward > neutral, shock < neutral, shock > neutral) along computed reward (reward − neutral; x-axis) and shock (shock − neutral; y-axis) indices. Indices were calculated by subtracting average firing rates during the cue epoch on neutral press trials from reward press and shock press (i.e., avoid) trials.
From the total population of neurons (N = 289), 70 (24.2%) cells fired significantly differently to cues on reward compared with neutral trials (χ2 = 224.39, p < 0.0001), whereas only 17 (5.9%) cells fired significantly differently to cues on shock trials, which was not significantly more than expected by chance alone (χ2 = 0.46, p = 0.49). Further, the number of neurons modulated during reward trials significantly exceeded the number modulated during shock trials (χ2 = 27.03, p < 0.0001).
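These proportion tests can be reproduced approximately by comparing the observed count of selective cells against the 5% expected by chance alone at the alpha level of the cell screen. The counts come from the text; the exact chi-square formulation used by the authors is our assumption.

```python
from scipy.stats import chisquare

def exceeds_chance(n_selective, n_total, alpha=0.05):
    """Chi-square test of an observed count of selective cells against
    the count expected by chance at the screen's alpha level."""
    observed = [n_selective, n_total - n_selective]
    expected = [alpha * n_total, (1 - alpha) * n_total]
    return chisquare(observed, f_exp=expected)

chi2_rew, p_rew = exceeds_chance(70, 289)  # reward-modulated cells
chi2_shk, p_shk = exceeds_chance(17, 289)  # shock-modulated cells
```

With these counts, the statistics land close to the values reported above: the reward proportion far exceeds chance, whereas the shock proportion does not.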
The average firing rate for reward- and shock-modulated neurons is plotted across trial time in Figure 4, C–F, along with inset correlations between the reward and shock indices (reward − neutral, shock − neutral) for these cells. Although, by definition, reward-modulated cells exhibited differential firing rates on reward versus neutral trials, they did not also show differential firing during shock trials; likewise, shock-modulated neurons showed differential firing rates on shock versus neutral trials, but no difference during reward trials. Of the 40 (13.8%) cells that fired significantly less on reward trials relative to neutral, only 2 were also modulated by shock cues (χ2 = 0.001, p = 1.0) and, of the 30 (10.4%) that fired significantly more on reward trials relative to neutral, again, only 2 cells were also modulated by shock cues (χ2 = 0.14, p = 0.68). Correlation insets show no correlation between reward and shock indices for either neutral greater than reward (Fig. 4C; r2 = 0.01, p = 0.54) or reward greater than neutral (Fig. 4D; r2 = 0.002, p = 0.82) cells. Of the 5 (1.9%) cells that fired less on shock trials relative to neutral, only 1 was also modulated by reward cues (χ2 = 2.06, p = 0.12) and, of the 12 (4.2%) cells that fired significantly more on shock trials relative to neutral, 3 cells were also modulated by reward (χ2 = 9.69, p < 0.01). Insets show no correlation between reward and shock indices for either neutral greater than shock (Fig. 4E; r2 = 0.61, p = 0.12) or shock greater than neutral (Fig. 4F; r2 = 0.05, p = 0.48) shock-modulated cells. Overall, only 1.4% of all neurons were modulated on both reward and shock trials. These single-unit results are consistent with our population findings in Figure 3E, showing no correlation between reward and shock indices.
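The inset indices and their correlation can be written out explicitly. The data below are simulated to mimic the qualitative pattern (reward modulation present, shock modulation absent); they are a hypothetical illustration of the index definitions, not the recorded dataset.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_cells = 70
neutral = rng.normal(8, 2, n_cells)            # cue-epoch rate, neutral press
reward = neutral + rng.normal(3, 2, n_cells)   # reward modulation present
shock = neutral + rng.normal(0, 2, n_cells)    # little shock modulation

reward_index = reward - neutral                # reward - neutral
shock_index = shock - neutral                  # shock - neutral
r, p = pearsonr(reward_index, shock_index)
r_squared = r ** 2
```

Because the two indices vary independently across these simulated cells, r_squared stays near zero, matching the absence of correlation in the Figure 4 insets.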
Together, these results demonstrate that neurons in vmPFC are strongly and weakly modulated during reward and shock cues, respectively, and that very few neurons were modulated during both reward-seeking and shock-avoidance behaviors.
Neurons selective for outcome during reinforcement trial blocks became nonselective during extinction and vice versa
vmPFC is thought to play a key role in extinction (Milad and Quirk, 2002; Burgos-Robles et al., 2007; Hefner et al., 2008; Wilber et al., 2011; Holmes et al., 2012; Maroun et al., 2012; Giustino et al., 2016). To determine how neural signals encoding the promise of reward and the threat of shock were modulated when outcomes were no longer delivered, we next determined how many neurons that were selective during reinforced trial blocks became nonselective during extinction and vice versa. Because extinction sessions naturally have fewer press trials, we only examined sessions with at least two press trials per trial type during both reinforced and extinction sessions (n = 241) and investigated whether neurons that were selective on press trials during reinforced blocks (i.e., reward > neutral, neutral > reward, shock > neutral, neutral > shock) were also selective on press trials during extinction. Because of the low overall number of neurons that were significantly modulated during shock trials, the following figures are restricted to cells modulated by reward, although data and statistics for shock-modulated cells are still reported here using parallel analyses.
Figure 5, A and B, illustrates the average firing rate over trial time of an extinction-matched subgroup of neurons that showed significantly different firing rates to reward relative to neutral trials during reinforced sessions. This group of neurons did not exhibit differential firing between reward and neutral trial types during extinction (Fig. 5C,D). Figure 6 shows a single-neuron example of a decreasing-type cell exhibiting this pattern; it is clear from the raster display that this neuron showed depressed firing during the cue period early in extinction and lost reward cue selectivity as extinction progressed (i.e., it no longer shows decreased firing to the reward cue during the cue epoch). Of the 32 (13.3%; χ2 = 114.4, p < 0.05) and 25 (10.4%; χ2 = 61.0, p < 0.05) neurons that showed lower (Fig. 5A) and higher (Fig. 5B) firing on reward trials during reinforced sessions, respectively, only 4 (χ2 = 3.63, p = 0.05) and 2 (χ2 = 0.41, p = 0.49) of these cells were also selective during extinction. Similarly, of the 5 (2.1%; χ2 = 0.16, p = 0.69) and 8 (3.3%; χ2 = 0.63, p = 0.43) neurons that showed lower and higher firing on shock trials during conditioning, respectively, none of these cells were selective during extinction. Therefore, we conclude that neurons selective during reinforced sessions were not also selective during extinction when outcomes were omitted.
Neurons were selective for outcome during either conditioning or extinction, but not during both contexts. A–D, Histograms depicting average normalized firing rate (spikes/s) for reward < neutral (n = 32; A, C) and reward > neutral (n = 25; B, D) cells that are modulated when rats press the lever after cues during reinforced trial blocks (A, B), but not after extinguished cues (C, D), for reward (blue), neutral (orange), and shock (red) trial types. E–H, Histograms depicting average normalized firing rate (spikes/s) for reward < neutral (n = 18; E, G) and reward > neutral (n = 6; F, H) cells that are modulated when rats press the lever during extinction trial blocks (E, F), but not for cues during reinforced trial blocks (G, H), for reward (blue), neutral (orange), and shock (red) trial types. Cue onset is depicted with a gray dashed line aligned to time = 0. Cells are drawn from the total population and were behaviorally matched across reinforced and extinction (N = 241).
Single-neuron example of a decreasing-type cell showing activity (spikes/s) during reinforced (A–C) and extinction (D–F) trial blocks for press trials for each trial type: reward (A, D), neutral (B, E), and shock avoidance (C, F). Activity is aligned to cue onset at time = 0 s, indicated by a gray line, and binned at 100 ms. Each tick mark equals one action potential.
Interestingly, another subpopulation of neurons that were not selective during conditioning became selective during extinction, showing significantly different firing rates to reward relative to neutral during extinction (Fig. 5E–H). There were 24 (10%) neurons in total that were significantly modulated by expected reward during extinction (χ2 = 12.4, p < 0.05). Of these, 18 (7.5%; χ2 = 24.2, p < 0.05) and 6 (2.5%; χ2 < 0.001, p = 0.99) neurons exhibited lower (Fig. 5E) and higher (Fig. 5F) firing on reward trials during extinction, respectively. Of these 24 neurons that were reward selective during extinction, only 6 were also selective during conditioning, which is not significantly greater than chance alone (6 of 241; 2.49%; χ2 < 0.001, p = 0.99). Similarly, of the 5 (2.1%; χ2 = 0.16, p = 0.69) and 6 (2.5%; χ2 < 0.001, p = 0.99) neurons that showed lower and higher firing on shock trials during extinction, respectively, none was also selective during the reinforced trial block. Therefore, we conclude that neurons selective during extinction were not selective during trials when outcomes were present.
Emergence of behavioral and neural selectivity during learning
Next, we explored the relationship between neural selectivity and behavior as rats performed reinforced and extinction trial blocks. Although rats were well trained and reinforced blocks were always presented first in each session, we sought to determine whether neural selectivity changed over the course of the first, reinforced block of trials. Likewise, during extinction, we wanted to determine whether neural selectivity reflected behavioral adjustments made by the rats as they came to recognize that rewards and shocks were no longer delivered.
To address this question, we computed behavioral selectivity and neural selectivity indices for the first (early) and last (late) 10% of trials per block (Fig. 7). Neural selectivity during the cue period was computed by subtracting firing rate during neutral press trials from firing rate during reward press trials (reward − neutral) for increasing-type cells and by subtracting reward from neutral trials (neutral − reward) for decreasing-type cells so that positive values for both groups of cells reflect stronger neural selectivity. To compute our behavioral selectivity index, because rats were faster on reward compared with neutral trials, we subtracted RTs on reward press trials from RTs on neutral press trials (neutral − reward). Therefore, for both behavior and neural indices, higher values reflect stronger selectivity.
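The sign conventions for the two indices can be made concrete with a minimal sketch; the function and variable names here are ours, chosen for illustration.

```python
def neural_selectivity(rate_reward, rate_neutral, cell_type):
    """Signed cue-epoch firing difference: positive values always mean
    stronger reward selectivity, regardless of cell type."""
    if cell_type == "increasing":
        return rate_reward - rate_neutral   # reward - neutral
    return rate_neutral - rate_reward       # neutral - reward (decreasing type)

def behavioral_selectivity(rt_reward, rt_neutral):
    """RT difference; positive when rats respond faster on reward trials."""
    return rt_neutral - rt_reward

# An increasing-type cell at 12 vs 8 spikes/s and a decreasing-type cell
# at 5 vs 9 spikes/s both yield a selectivity of +4
inc = neural_selectivity(12.0, 8.0, "increasing")
dec = neural_selectivity(5.0, 9.0, "decreasing")
beh = behavioral_selectivity(0.35, 0.50)    # rat is faster on reward trials
```

Flipping the sign for decreasing-type cells is what allows the two cell classes to be pooled in Figure 7 without their opposite firing directions canceling out.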
Outcome selectivity differs for early and late trials in reinforced and extinction trial blocks. Early and late trials are defined as the first and last 10% of trials in a session, respectively. A–C, Bar graphs depicting differences in behavioral selectivity index (A; combined n = 81) and neural selectivity index (firing rate differences) between reward and neutral trials for neurons selective during reinforced trial blocks (B; n = 57) and extinction trial blocks (C; n = 24). For firing rate during the cue period, selectivity was computed by subtracting neutral from reward trials (reward − neutral) for increasing-type cells and by subtracting reward from neutral trials (neutral − reward) for decreasing-type cells, so that positive values for both groups of cells reflect stronger neural selectivity. For RT, because rats were faster on reward compared with neutral trials, we subtracted reward from neutral RTs (neutral − reward). Therefore, for both measures, higher values signify stronger behavioral and reward selectivity. D–G, Scatter plots pitting behavioral selectivity (y-axis) and neural selectivity (x-axis) indices against each other during early and late phases of reinforced (D, E) and extinction (F, G) trial blocks, respectively.
The above analyses were performed separately on cells that were selective during reinforced and extinction trial blocks (i.e., the neurons shown in Fig. 5). To quantify these results, we performed a two-factor ANOVA with block (reinforcement, extinction) and time point (early, late) as factors for both RT and firing rate measures. We found an interaction between block and time point for behavioral selectivity (Fig. 7A; F(1,320) = 16.28, p < 0.001), indicating that behavioral selectivity increased during reinforced trial blocks and decreased during extinction trial blocks. Late in the first block of trials, the difference between reward and neutral was stronger than early in the trial block (early vs late reinforcement: t(160) = 3.02, p < 0.01). During extinction, the opposite was true; that is, RT differences between reward and neutral were stronger early in the extinction block compared with late (early vs late extinction: t(160) = 3.01, p < 0.01).
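For a balanced design, the block x time point interaction tested here can be computed from the standard two-factor sums of squares. The sketch below is self-contained and illustrative; the authors' exact ANOVA implementation is not specified, and the example data are simulated to show the crossover pattern described in the text.

```python
import numpy as np
from scipy.stats import f as f_dist

def interaction_f(data):
    """Balanced two-factor ANOVA; data[i][j] holds the observations for
    level i of factor A (block) and level j of factor B (time point).
    Returns the F statistic and p-value for the A x B interaction."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = np.array([[np.mean(data[i][j]) for j in range(b)] for i in range(a)])
    grand = cell.mean()
    row = cell.mean(axis=1)   # factor A (block) means
    col = cell.mean(axis=0)   # factor B (time point) means
    ss_ab = n * np.sum((cell - row[:, None] - col[None, :] + grand) ** 2)
    ss_e = sum(np.sum((np.asarray(data[i][j]) - cell[i][j]) ** 2)
               for i in range(a) for j in range(b))
    df_ab, df_e = (a - 1) * (b - 1), a * b * (n - 1)
    f_ab = (ss_ab / df_ab) / (ss_e / df_e)
    return f_ab, f_dist.sf(f_ab, df_ab, df_e)

# Crossover pattern: selectivity rises from early to late under
# reinforcement but falls from early to late under extinction
noise = [-1.0, -0.5, 0.5, 1.0]
data = [
    [[2 + e for e in noise], [8 + e for e in noise]],  # reinforced: early, late
    [[8 + e for e in noise], [2 + e for e in noise]],  # extinction: early, late
]
f_ab, p_ab = interaction_f(data)
```

A crossover of this kind loads almost entirely onto the interaction term, which is why a significant block x time point interaction, rather than a main effect, is the signature of selectivity moving in opposite directions across the two blocks.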
Remarkably, as shown in Figure 7, B and C, neural selectivity followed the same pattern: it emerged during reinforced trial blocks, was present in early trials of extinction blocks, but weakened late in extinction. We found a significant interaction between block and time point for neurons selective for the reward cue during reinforcement (Fig. 7B; F(1,224) = 3.95, p < 0.05) and a nonsignificant trend toward an effect of block (F(1,224) = 3.7, p = 0.06; post hoc extinction early vs late: t(112) = 1.84, p = 0.07). For neurons selective during extinction blocks, there was a significant main effect of block (Fig. 7C; F(1,92) = 5.36, p < 0.05), but no significant interaction between block and time point (F(1,92) = 1.32, p = 0.25). Together, these analyses suggest a relationship between behavioral and neural selectivity, at least at the population level.
To determine whether there was a direct correlation between neural and behavioral selectivity, we plotted behavioral and neural selectivity indices against each other for each neuron during early and late portions of the blocks in which neurons were selective. For those neurons selective during reinforcement blocks, there were no significant correlations between behavior and neural firing in either early (Fig. 7D; r2 = 0.0002, p = 0.91) or late (Fig. 7E; r2 = 0.01, p = 0.45) reinforcement. For neurons selective during extinction blocks, the correlation between behavior and neural firing was not significant early in extinction (r2 = 0.02, p = 0.52), but was significantly negative during late extinction trials (r2 = 0.17, p < 0.05).
Outcome selectivity was response selective during extinction
In the previous sections, we showed that a subpopulation of neurons were outcome selective during either reinforced trials or extinction, but not in both. Next, we wanted to test whether this outcome selectivity seen during extinction was dependent upon the behavioral response (i.e., if the rat pressed or failed to press the lever). For these analyses, we focused on extinction trials from sessions with an adequate number of press and no-press trials for both conditions (i.e., we excluded sessions with too few no-press trials during reinforced trials) and only included sessions in which there were at least two press and no-press trials for each trial type (n = 244).
We found that outcome selectivity during extinction was also response dependent. Figure 8, A–D, illustrates average firing activity over trial time for neurons that fired significantly less (Fig. 8A) or more (Fig. 8B) on reward press trials compared with neutral press trials during extinction. This same population of neurons did not, however, show outcome selectivity on no-press trials during extinction. A total of 24 neurons exhibited significantly different firing on reward versus neutral trials (9.8%; χ2 = 11.9, p < 0.05) during extinction press trials. Of these, 18 (7.4%; χ2 = 23.6, p < 0.05) and 6 (2.5%; χ2 < 0.001, p = 0.98) neurons exhibited significantly lower (Fig. 8A) and higher (Fig. 8B) firing on reward press trials compared with neutral press trials in extinction. Of the 24 neurons selective for reward on press trials, only 4 were also significantly selective during no-press trials in extinction, which is significantly fewer than expected by chance alone (4 of 244; 1.6%; χ2 = 5.7, p < 0.05). Therefore, during extinction, neurons that were reward selective on press trials were not also selective on no-press trials.
Outcome selectivity during extinction was also response selective. A–D, Histograms depicting average normalized firing rate (spikes/s) for reward < neutral (n = 18; A, C) and reward > neutral (n = 6; B, D) cells that are modulated when rats press the lever during extinction (A, B), but not when they fail to press (C, D), for reward (blue), neutral (orange), and shock (red) trial types. E–H, Histograms depicting average normalized firing rate (spikes/s) for reward < neutral (n = 24; E, G) and reward > neutral (n = 2; F, H) cells that are modulated when rats fail to press the lever during extinction cues (E, F), but not when they press (G, H), for reward (blue), neutral (orange), and shock (red) trial types. Cue onset is depicted with a gray dashed line aligned to time = 0. Cells are drawn from the total population and were behaviorally matched across reinforced and extinction (N = 244).
Interestingly, other cells showed the opposite pattern. Figure 8, E–H, illustrates the average firing activity over trial time for neurons that exhibited significantly less (Fig. 8E) or more (Fig. 8F) firing on reward no-press trials compared with neutral no-press trials during extinction. This same population of neurons did not, however, show outcome selectivity on press trials during extinction. A total of 26 neurons showed significantly different firing on reward versus neutral trials (10.7%; χ2 = 16.3, p < 0.05) during extinction no-press trials. Of these, 24 (9.8%; χ2 = 53.6, p < 0.05) and 2 (1%; χ2 = 8.89, p = 0.10) neurons exhibited significantly lower (Fig. 8E) and higher (Fig. 8F) firing on reward no-press trials compared with neutral. Of the 26 neurons that showed significant reward modulation on no-press trials, only 4 also showed selectivity during press trials (1.6%; χ2 = 0.71, p = 0.40). Therefore, during extinction, neurons that were reward selective on no-press trials were not also selective on press trials.
Overall, these results suggest that vmPFC neurons were both outcome and response selective, in that subpopulations of vmPFC neurons showed differential firing on reward or shock trials relative to neutral on either press or no-press trials in either conditioning or extinction sessions, but not in opposing contexts.
Discussion
Summary
Although vmPFC activity is often associated with fear attenuation and extinction learning, little is known about how it processes complex environments that present opportunities for both punishment and reward. Historically, most studies measuring firing within the vmPFC have shown increased activity to cues predicting the extinction of an aversive shock, but, to date, no one has measured activity during active punishment avoidance. Here, we recorded from neurons within vmPFC while rats performed a cued combined approach and avoidance task followed by extinction. We found that neurons within the vmPFC were both outcome and response selective in that subpopulations of vmPFC neurons fired differently on reward or shock trials relative to neutral on either press or no-press trials in either conditioning or extinction sessions, but not in opposite conditions. This effect was more robust for reward trials than shock trials, with very few cells showing modulation by shock avoidance cues.
Neural activity in vmPFC is modulated by cues associated with reward during reinforced or extinction trials
Most strikingly, firing rates were significantly modulated by cues signaling subsequent reward approach, in line with a growing number of studies reporting a role for vmPFC in reward-seeking behavior. In particular, recent electrophysiology studies have shown that single units in vmPFC are modulated during cue-evoked approach responses (Burgos-Robles et al., 2013; Moorman and Aston-Jones, 2015). Our results further demonstrate that vmPFC neurons are modulated by cues that predict reward during positive reinforcement but not during negative reinforcement.
Interestingly, recent work has revealed specific inhibitory projections from CeA to vmPFC that may influence reward-related behaviors. Seo et al. (2016) showed that activating a subset of GABAergic neurons projecting from CeA to vmPFC in mice amplified external reward valuation, increasing nose-poke behavior for sucrose reward in an operant conditioning paradigm, while producing no effect on internal motivation, value states, or overall reward consumption. Our data likewise suggest that vmPFC activity reflects reward approach rather than the intrinsic value or motivational drive of cues.
We also saw context-dependent firing related to block (reinforced or extinction) and response type (press or no-press). This finding is consistent with recent work by Moorman and Aston-Jones (2015) showing that context-dependent firing in vmPFC optimizes behavioral output for reward-seeking and extinction contexts, with neurons firing more strongly for reward approach in reinforced contexts and firing more when behavior was inhibited in extinction. This study is one of many implicating the vmPFC in extinction and context-dependent behavioral control (Milad and Quirk, 2002; Burgos-Robles et al., 2007; Hefner et al., 2008; Camp et al., 2009; Wilber et al., 2011; Holmes et al., 2012). Others have hypothesized that separate, yet intermingled, neural ensembles within the vmPFC encode reward-seeking and extinction because inactivation of food-seeking and extinction-related ensembles decreased and increased food seeking, respectively (Warren et al., 2016). These results are consistent with our current findings in that separate individual neurons throughout vmPFC were modulated by either press or no-press in either reinforcement or extinction contexts.
Neural activity in vmPFC is not modulated by cues associated with shock avoidance during reinforced or extinction trials
Although an abundance of literature emphasizes the role of vmPFC in the suppression of amygdala-driven fear responses, suggesting that it may be a critical player in inhibiting freezing and allowing behaviors that promote avoidance, we saw little vmPFC modulation during cues predicting successful shock avoidance. This result was surprising to us because the avoidance of an aversive foot shock would, much like pursuit of reward, result in a positive outcome (i.e., absence of shock). Indeed, during a task very similar to the one described here, we have shown previously that phasic dopamine release is high to both reward and avoidable shock cues (Gentry et al., 2016). In addition, previous studies have shown that vmPFC lesion or inactivation disrupts avoidance behavior by affecting how the animal responds to shock and safety cues (Sangha et al., 2014; Adhikari et al., 2015). Nevertheless, our data do not necessarily contradict current literature regarding the role of vmPFC in fear suppression. Fear conditioning has been shown to reduce vmPFC excitability and low vmPFC activity may contribute to the encoding of contextual fear, whereas extinction of fear has been shown to increase vmPFC firing rates that were previously low during fearful cues (Cruz et al., 2014; Giustino et al., 2016; Soler-Cedeño et al., 2016). However, these studies did not examine the role of vmPFC during active avoidance. Emerging perspectives are beginning to emphasize the need to study these regions in a broader context than fear extinction by using more naturalistic approach and avoidance paradigms (Bravo-Rivera et al., 2014, 2015; Delgado et al., 2016).
Schwartz et al. (2017) recently found that vmPFC-to-NAc connections are recruited when animals make choices involving conflict and reward; they concluded that activation of this pathway drives animals to choose actions that result in the most rewarding outcome while simultaneously inhibiting actions that may interfere with this choice. In that task, animals had to learn to suppress the drive to avoid a risky pain-predictive cue to gain a reward. Using designer receptors exclusively activated by designer drugs (DREADDs) and microinjections of a GABA agonist, the investigators temporarily inactivated infralimbic cortex (IL) during performance of the approach–avoidance task and reinstated avoidance of the pain-predictive cue in rats. They concluded that, after learning, IL function is needed to overcome the drive to avoid punishment to gain valuable food reward, which differs from fear-conditioning studies showing that IL activity is critical only during learning. Therefore, the role of vmPFC is likely more complex than fear extinction studies have proposed thus far, and theories may need to be revised to account for data from avoidance and approach studies.
There are several possible explanations for why we saw little vmPFC modulation by shock avoidance cues in our task. First, our rats were very highly trained on our combined approach and avoidance task, having completed over 30 sessions before recording began; vmPFC recordings were then collected over an additional 2 months. Although the literature is conflicted regarding the role of vmPFC during and after learning, it might be argued that rats in our study were no longer using predicted outcomes (i.e., promise of reward and threat of foot shock) to guide behavior, but rather were responding habitually. It is possible, then, that responding on avoidance trials could have initially been governed by vmPFC, but that control had been transferred to more habit-oriented regions such as the dorsal striatum. Although this is consistent with our behavioral finding that rats responded at a high rate for all trial types, it seems unlikely because we would also expect reward responses to become habitual over time, yet we still saw significant modulation to reward cues in vmPFC. It also seems unlikely because rats updated their behavior daily depending on whether they were performing reinforced or extinction trials.
Another possibility is that the vmPFC may be more important during early avoidance learning. Others have shown that subregions of vmPFC display conditioned stimulus-evoked responses during early and late extinction, but these responses decrease in magnitude with training (Chang et al., 2010). Further, others have shown that vmPFC is important in discriminating between safety and fear cues (Sangha et al., 2014). Therefore, it is possible that cues predicting successful avoidance with certainty may no longer elicit a fear response when rats are well trained for shock avoidance, as in our study, but are instead interpreted as safety cues. However, we found that rats were slower to respond during avoidance trials compared with reward or neutral trials, suggesting that they were still affected behaviorally by the potential for shock; in addition, we found that rats froze more to shock cues than reward or neutral cues.
It is also possible that reward and neutral cues may be interpreted as safety cues late in training because shock is not possible on these trial types. Rats in our task pressed at a high rate for all trial types, including neutral trials, which are unreinforced regardless of behavior. However, rats still pressed less for neutral cues than for reward or shock cues and were slower to press for neutral than for reward, suggesting that they were less motivated by neutral cues. Rats also froze less for neutral cues than for shock cues and oriented toward the lever less for neutral cues than for reward cues. Therefore, we think rats were "hedging their bets" by pressing for all trial types because pressing requires little effort and could not result in a negative outcome.
Conclusion
The data presented here provide evidence that vmPFC is involved in cue-driven, reward-guided behavioral optimization. This finding is of great interest because this cortical region has commonly been linked to fear extinction and is beginning to be implicated more in reward approach. Here, we found distinct correlates within vmPFC for reward-modulated cues that were response (press; no-press) and block (reinforced; extinction) specific. Surprisingly, we found little vmPFC modulation related to shock avoidance cues. This work provides new insights into the neurobiological underpinnings of approach and avoidance behaviors and extinction learning.
Footnotes
This work was supported by the National Institutes of Health (National Institute on Drug Abuse Grant R01DA040993 and National Institute of Mental Health Grant R01MH071589).
The authors declare no competing financial interests.
Correspondence should be addressed to either Matthew Ryan Roesch or Ronny Nicole Gentry, Department of Psychology, University of Maryland, 1120 Biology-Psychology Building, College Park, MD 20742, mroesch@umd.edu or ronny.gentry@gmail.com