Abstract
Different subregions of the prefrontal cortex (PFC) contribute to the ability to respond flexibly to changes in reward contingencies, with the medial versus orbitofrontal cortex (OFC) subregions contributing differentially to processes such as set-shifting and reversal learning. To date, the manner in which these regions may facilitate reversal learning in situations involving reward uncertainty remains relatively unexplored. We investigated the involvement of five distinct regions of the rat OFC (lateral and medial) and medial PFC (prelimbic, infralimbic, and anterior cingulate) on probabilistic reversal learning wherein “correct” versus “incorrect” responses were rewarded on 80% and 20% of trials, respectively. Contingencies were reversed repeatedly within a session. In well trained rats, inactivation of the medial or lateral OFC induced dissociable impairments in performance (indexed by fewer reversals completed) when outcomes were probabilistic, but not when they were assured. Medial OFC inactivation impaired probabilistic learning during the first discrimination, increased perseverative responding and reduced sensitivity to positive and negative feedback, suggestive of a deficit in incorporating information about previous action outcomes to guide subsequent behavior. Lateral OFC inactivation preferentially impaired performance during reversal phases. In contrast, prelimbic inactivation caused an apparent improvement in performance by increasing the number of reversals completed. This was associated with enhanced sensitivity to recently rewarded actions and reduced sensitivity to negative feedback. Infralimbic inactivation had no effect, whereas the anterior cingulate appeared to play a permissive role in this form of reversal learning. These results clarify the dissociable contributions of different regions of the frontal lobes to probabilistic learning.
SIGNIFICANCE STATEMENT The ability to adjust behavior in response to changes involving uncertain or probabilistic reward contingencies is an essential survival skill that is impaired in a variety of psychiatric disorders. It is well established that different forms of cognitive flexibility are mediated by anatomically distinct regions of the frontal lobes when reinforcement contingencies are assured; however, less is known about the contribution of these regions to probabilistic reinforcement learning. Here we show that different regions of the orbitofrontal and medial prefrontal cortex make distinct contributions to probabilistic reversal learning. These findings provide novel information about the complex interplay between frontal lobe regions in mediating these processes and accordingly provide insight into possible pathophysiology that underlies impairments in cognitive flexibility observed in mental illnesses.
Introduction
It is well established that different regions of the prefrontal cortex (PFC) mediate distinct forms of cognitive flexibility. For example, lesions of the dorsolateral PFC (dlPFC) in primates or medial PFC in rats impair shifts between different strategies or attentional sets (Dias et al., 1996; Ragozzino et al., 1999; Birrell and Brown, 2000). In comparison, shifting between different stimulus–reward associations (reversal learning) is facilitated by the orbitofrontal cortex (OFC) (McAlonan and Brown, 2003; Ghods-Sharifi et al., 2008). These findings have provided valuable insight into the neural mechanisms underlying the ability to adapt behavior to changing circumstances, but have been limited mostly to procedures that provide explicit “correct” or “incorrect” feedback, a scenario that rarely arises in the real world. Indeed, recent data suggest that the specific frontostriatal circuitry involved in behavioral/cognitive flexibility can vary depending on whether feedback is probabilistic or assured (Dalton et al., 2014).
Damage to the OFC in humans and nonhuman primates impairs reversal learning during tasks that provide unequivocal feedback, whereas damage to the dlPFC leaves performance intact (Dias et al., 1996; Fellows and Farah, 2003). Similarly, patients with damage to the OFC display impairments on probabilistic reversal learning (PRL), where more ambiguous feedback is provided, whereas those with damage to lateral frontal regions that excluded the OFC displayed more variable effects on performance (Berlin et al., 2004; Hornak et al., 2004). Note that these latter situations require more complex evaluations of action–outcome associations and tracking of the broader context of reward history to ascertain which response option may be more profitable. Thus, additional frontal regions may be recruited when cognitive demands are increased, an idea supported by imaging studies using tasks where correct responses are rewarded only 70–80% of the time and “incorrect” responses are occasionally rewarded. These studies highlight a central role for the OFC in guiding responding when feedback is ambiguous, but also implicate other PFC regions in this type of learning, including the ventrolateral PFC, dorsal anterior cingulate (dACC), and the dlPFC (Cools et al., 2002; O'Doherty et al., 2003; Remijnse et al., 2005). Tsuchida et al. (2010) directly addressed this issue by using lesion–function mapping in patients with focal frontal lobe damage to identify regions that were critical for PRL. Patients with OFC (but not dACC) lesions were impaired. This latter observation emphasizes that, even though studies with brain-damaged patients can identify general regions of the frontal lobes that may contribute to certain forms of cognitive flexibility, the often diffuse lesions incurred by these individuals make it difficult to localize specific functions to distinct cortical regions. In this regard, preclinical studies may shed additional light on this issue.
The present study conducted a systematic analysis of the contribution of five key regions of the rat frontal lobe to PRL, using an operant task developed for rats (Bari et al., 2010; Dalton et al., 2014). There has been debate over whether rat medial PFC and OFC regions share functional homology with similar regions in primates (Preuss, 1995; Uylings et al., 2003). With respect to anatomical connectivity, the projection patterns of the rat medial OFC (mOFC) and lateral OFC (lOFC) to the striatum and amygdala are similar to those of areas 14 and 12/13 of the primate OFC (Ongür and Price, 2000; Schilman et al., 2008; Wise, 2008; Hoover and Vertes, 2011). Likewise, the rat anterior cingulate, prelimbic, and infralimbic regions display striatal connectivity similar to that of areas 24, 32, and 25 of the primate anterior cingulate (Sesack et al., 1989; Ongür and Price, 2000; Hoover and Vertes, 2007; Wise, 2008). Previous studies in our laboratory have identified a key role for the nucleus accumbens shell in facilitating performance of this task and in mediating reward sensitivity (Dalton et al., 2014). Here, we assessed the effects of inactivation of some of the main OFC and medial PFC inputs to the accumbens in well trained rats to identify possible dissociable roles for these regions in this form of cognitive flexibility.
Materials and Methods
Subjects.
Male Long–Evans rats (280–350 g) were singly housed in a colony maintained at 21°C on a 12 h light/dark cycle (lights on at 07:00 h), with ad libitum access to standard laboratory chow and water. All experiments were performed during the light phase of the cycle. Rats were given 7–8 d to acclimatize to the colony before behavioral procedures began, and were handled and weighed daily during this period and throughout the course of the experiment. During behavioral training, rats had ad libitum access to water and were placed on a restricted laboratory chow diet to maintain 85–90% of the ad libitum weight of age-matched rats. All experiments were conducted in accordance with the standards of the Canadian Council on Animal Care and were approved by the Committee on Animal Care, University of British Columbia.
Apparatus.
All testing was conducted in operant chambers (30.5 × 24 × 21 cm; Med-Associates) enclosed in sound-attenuating boxes. Each box contained a fan to mask outside noises and to provide ventilation. Two retractable levers were located on either side of a central food hopper into which sugar pellet reinforcement (45 mg; BioServ) was delivered. Each chamber was illuminated by a 100 mA house light located in the top-center of the wall opposite the levers. All experimental data were recorded by an IBM personal computer connected to the chambers via an interface.
Orbital/prefrontal regions-of-interest and surgery.
Before training, rats were anesthetized with ketamine (100 mg/kg)/xylazine (7 mg/kg) and implanted with bilateral 23 gauge stainless-steel guide cannulae located above the mOFC (flat skull: anteroposterior = +4.2 mm, mediolateral = ±0.7 mm, dorsoventral = −3.2 mm from dura), the lOFC (flat skull: anteroposterior = +3.8 mm, mediolateral = ±2.6 mm, dorsoventral = −3.2 mm from dura), the prelimbic cortex (flat skull: anteroposterior = +3.4 mm, mediolateral = ±0.7 mm, dorsoventral = −2.8 mm from dura), the infralimbic cortex (flat skull: anteroposterior = +2.8 mm, mediolateral = ±0.7 mm, dorsoventral = −4.1 mm from dura), or the dACC (flat skull: anteroposterior = +2.0 mm, mediolateral = ±0.7 mm, dorsoventral = −1.2 mm from dura), using standard stereotaxic techniques. Guide cannulae were implanted vertically and held in place with stainless steel screws and dental acrylic. Thirty gauge obturators, flush with the ends of the guide cannulae, remained in place until the infusions were made. Rats were given at least 1 week to recover from surgery before behavioral training began. During this period, they were handled for at least 5 min each day and were food restricted to 85% of their free-feeding body weight.
Lever-pressing training.
On the day before their first exposure to the operant chambers, rats were given ∼25 sugar pellet rewards in their home cage. On the first day of training, the food cup contained two to three pellets and crushed pellets were placed on a lever before each rat was placed into the chamber. Rats were first trained to press one of the levers to receive reward on a fixed-ratio 1 schedule to a criterion of 60 presses in 30 min, and were required to press the other lever on the next day (counterbalanced left/right between subjects). Rats were then trained on a simplified version of the full task. These 90 trial sessions began with the levers retracted and the operant chamber in darkness. Every 40 s, a new trial was initiated by illumination of the house light and insertion of one of the two levers into the chamber. If the rat failed to respond on the lever within 10 s, the lever was retracted, the house light was extinguished, and the trial was scored as an omission. A response within 10 s of lever insertion resulted in delivery of a single pellet with 50% probability. This procedure was used to familiarize the rats with the probabilistic nature of the full task. Within every pair of trials, each lever was presented once, with the order randomized. Rats were trained for ∼3–4 d to a criterion of 80 or more successful trials (i.e., ≤10 omissions), after which they were trained on one of two reversal learning tasks.
PRL.
The procedures used in the present study were modified from those described by Bari et al. (2010) through the use of retractable levers (as opposed to nosepoke apertures used in the previous study). Daily sessions consisted of 200 discrete choice trials, with an intertrial interval of 15 s (50 min total). Trials began with illumination of the house light, and 3 s later, insertion of both levers into the chamber. At the start of each session, one of the two levers was randomly selected to be correct and the other incorrect. During this initial discrimination phase, a response on the correct lever delivered a single reward pellet on 80% of trials, whereas an incorrect response delivered reinforcement on only 20% of trials. Failure to press a lever within 10 s of insertion (i.e., trial omission) led to their retraction and extinguishing of the house light until the next trial. Once the correct lever was selected on eight consecutive trials (regardless of whether a correct choice was reinforced), the contingencies were reversed so that the correct lever now became the one that provided a lower probability of reward (i.e., incorrect lever) and vice versa. This pattern was repeated over the course of a daily session. Daily training sessions continued until a group of rats achieved more than three reversals per session for 2 consecutive days. Across all experiments, rats required an average of 11 training sessions (range 10–15) to achieve this criterion. On the following day, rats received their first counterbalanced microinfusion tests.
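For readers wishing to simulate or reanalyze this design, the following sketch illustrates the trial logic described above (80%/20% reward probabilities and a reversal criterion of eight consecutive correct choices). It is an illustrative simulation only, not the behavioral-control software used to run the task; the function and variable names are our own, and omissions and intertrial timing are not modeled.

```python
import random

def simulate_prl_session(choose, n_trials=200, p_correct=0.8, p_incorrect=0.2,
                         reversal_criterion=8, seed=None):
    """Simulate one PRL session: the 'correct' lever is rewarded on 80% of
    trials, the 'incorrect' lever on 20%, and contingencies reverse once the
    correct lever has been chosen on eight consecutive trials."""
    rng = random.Random(seed)
    correct_lever = rng.choice(["left", "right"])
    consecutive_correct = 0
    reversals = 0
    rewards_earned = 0
    for _ in range(n_trials):
        choice = choose()                       # callable returning "left" or "right"
        is_correct = (choice == correct_lever)
        if rng.random() < (p_correct if is_correct else p_incorrect):
            rewards_earned += 1
        # The reversal criterion depends only on consecutive correct choices,
        # regardless of whether each correct choice happened to be rewarded.
        consecutive_correct = consecutive_correct + 1 if is_correct else 0
        if consecutive_correct == reversal_criterion:
            correct_lever = "right" if correct_lever == "left" else "left"
            consecutive_correct = 0
            reversals += 1
    return reversals, rewards_earned

# e.g., a purely random responder completes very few reversals in 200 trials
print(simulate_prl_session(lambda: random.choice(["left", "right"]), seed=1))
```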
Reversal learning with assured outcomes.
We determined a priori that if inactivation of a particular cortical region impaired performance on the PRL task, we would also assess the effect of this manipulation on a simplified version of the task where the outcomes of correct and incorrect choices were assured, rather than probabilistic. This would determine whether impairments in probabilistic learning were attributable to more general impairments in cognitive flexibility or were driven more selectively by disruptions in the ability to alter behavior in response to probabilistic feedback. This task differed from the PRL task only in that a correct response always delivered reinforcement and an incorrect response never did. Separate groups of experimentally naive rats, unfamiliar with the probabilistic reversal procedure, were trained on this task for 7 d, after which they proceeded to the microinfusion test phase of the experiment.
Drugs and microinfusion procedures.
One or 2 d before their first microinfusion test day, rats received a mock infusion procedure, during which obturators were removed from the guide cannulae and replaced with stainless steel injectors for 2 min, but no infusion was delivered.
A within-subjects design was used for all experiments. Inactivation of each brain region was achieved by microinfusion of a solution containing the GABAB agonist baclofen and the GABAA agonist muscimol (100 ng each per side, Sigma-Aldrich). GABA agonists or saline were infused bilaterally (0.4 μl over 88 s) via a 30 gauge injection cannula that protruded 0.8 mm beyond the guide cannula. Injection cannulae were left in place for 60 s to allow for diffusion. Rats remained in their home cages for an additional 10 min period before behavioral testing. Neurophysiological studies have shown that administration of muscimol into the brain induces a significant suppression of neural activity for at least 2 h (van Duuren et al., 2007), which would last throughout the duration of the test sessions used here (50 min).
On the first infusion test day, one-half of the rats in each group received saline infusions, and the other one-half received baclofen/muscimol. The following day all rats received a baseline training day (no infusion). If a rat achieved less than two reversals during this baseline session, it was given an additional day of training before the second infusion test. On the day after baseline performance was reestablished, rats received a second counterbalanced infusion of saline or baclofen/muscimol.
Histology.
After completion of behavioral testing, rats were euthanized in a carbon dioxide chamber. Brains were removed and fixed in a 4% formalin solution. The brains were frozen and sliced into 50 μm sections before being mounted and stained with cresyl violet. Placements were verified with reference to the neuroanatomical atlas of Paxinos and Watson (2005). Data from rats with placements outside the borders of the region-of-interest or with asymmetrical placements were removed from the analysis. In general, animals with inaccurate placements did not display prominent changes in performance following inactivation treatments relative to saline infusions. The locations of infusion sites are displayed in Figure 1.
Figure 1. Histology. Left, Schematics of coronal sections showing the range of acceptable locations of infusions within the medial OFC (filled circles) and lateral OFC (open circles). Right, The range of acceptable locations of infusions within the prelimbic (filled circles), infralimbic (open circles), and anterior cingulate (filled squares) regions of the PFC. Photomicrographs of representative placements in these regions are also presented, with arrows highlighting the location of the cannulae tips.
Data analysis.
The primary dependent variable of interest was the number of reversals completed per session; these were analyzed as a function of the number of completed trials. Specifically, data were transformed using the following formula: [no. of reversals completed per session/(200 − no. of trial omissions)] × 100 (i.e., number of reversals per 100 completed trials). This transformation was used to accommodate any potential increases in the number of trial omissions induced by inactivation treatments, which could complicate interpretation of the raw data because a decrease in the number of reversals/session could either be attributable to impairment in cognitive processes related to reversal learning or merely reflect fewer completed trials. These data were analyzed with repeated-measures one-way ANOVAs.
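As a concrete illustration of this transformation (a minimal sketch; the function name and example numbers are hypothetical rather than values from the dataset):

```python
def reversals_per_100_completed(n_reversals, n_omissions, n_trials=200):
    """Express reversals relative to completed trials:
    [no. of reversals / (n_trials - no. of omissions)] x 100."""
    completed_trials = n_trials - n_omissions
    return (n_reversals / completed_trials) * 100

# e.g., 5 reversals with 12 omitted trials -> 5 / 188 * 100 ≈ 2.7 reversals per 100 completed trials
print(reversals_per_100_completed(5, 12))
```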
Ancillary analyses assessed differences in the number of errors committed to achieve the criterion of eight consecutive correct choices for the first discrimination and the first reversal of a session, as we have described previously (Dalton et al., 2014). These data were typically analyzed with two-way repeated-measures ANOVAs, with treatment and phase (first discrimination, first reversal) as two within-subjects factors.
Whenever inactivation of a particular region altered PRL performance significantly, we also analyzed the number of perseverative errors rats made during the reversal phases of the task. Perseverative errors were defined as consecutive incorrect choices committed after a reversal of reinforcement contingencies (i.e., after eight consecutive correct responses); once a rat made a correct response after a reversal, subsequent errors were no longer counted as perseverative. For these analyses, we compared the average number of perseverative errors made by an individual rat over the minimum number of reversals completed by that rat after both treatments. This was because when rats completed a greater number of reversals, they tended to make fewer perseverative errors during the latter part of the session, which in turn could artificially reduce the average number of perseverative errors per reversal for the session. Thus, this procedure allowed a more unbiased measure of perseveration that we could compare across both treatments. For example, under control conditions, a rat may have completed five reversals, whereas after inactivation treatments, the same rat may have completed only three reversals. In this instance, we computed the average number of perseverative errors made per reversal for only the first three reversals during both control and inactivation treatments. These data were analyzed using repeated-measures one-way ANOVAs.
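The sketch below shows one way this equating procedure could be implemented; the function name and the example error counts are hypothetical and serve only to make the minimum-reversal comparison explicit.

```python
def matched_perseverative_error_means(errors_saline, errors_inactivation):
    """Average perseverative errors per reversal over only the number of
    reversals completed under BOTH treatments, so that extra late-session
    reversals in one condition do not bias the comparison.

    Each argument is a list with one perseverative-error count per completed
    reversal, in the order the reversals occurred."""
    n_shared = min(len(errors_saline), len(errors_inactivation))
    mean_saline = sum(errors_saline[:n_shared]) / n_shared
    mean_inactivation = sum(errors_inactivation[:n_shared]) / n_shared
    return mean_saline, mean_inactivation

# Example from the text: five reversals after saline vs. three after inactivation;
# only the first three reversals of each session enter the averages.
print(matched_perseverative_error_means([4, 2, 3, 1, 1], [6, 5, 4]))
```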
For the PRL task, we also analyzed each animal's choices according to the outcome of the preceding trial, separately for correct and incorrect choices, to assess whether neural inactivation altered sensitivity to reward (“win-stay”) or negative feedback (“lose-shift”) (Bari et al., 2010). Win-stay ratios assessed the likelihood that a subject followed a rewarded choice with another choice of the same type (correct or incorrect). These ratios were calculated from the number of trials on which a rat chose the correct/incorrect lever after being rewarded on the preceding trial, divided by the total number of rewarded correct or incorrect choices. Conversely, lose-shift ratios indexed how likely rats were to switch choices after receiving negative feedback (i.e., reward omission) for a response on the preceding trial. These values were calculated from the number of trials on which a rat switched responding to the other lever after not being rewarded for a correct or incorrect choice on the preceding trial, divided by the total number of non-rewarded correct/incorrect choices. The proportions of win-stay and lose-shift scores for both correct and incorrect choices were analyzed using three-way repeated-measures ANOVAs with treatment, trial type (win-stay and lose-shift), and choice type (correct and incorrect) as three within-subjects factors.
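A compact sketch of how these ratios might be computed from a trial-by-trial record is given below; the data representation (parallel lists of choice labels and reward outcomes) and the function name are assumptions made for illustration, not the analysis code actually used.

```python
def win_stay_lose_shift_ratios(choices, rewards):
    """Compute win-stay and lose-shift ratios separately for correct and
    incorrect choices.

    choices : sequence of "correct"/"incorrect" labels, one per completed trial
    rewards : parallel sequence of booleans (True = pellet delivered)"""
    tallies = {}  # (measure, choice type) -> [qualifying trials, opportunities]
    for prev_choice, prev_rewarded, next_choice in zip(choices, rewards, choices[1:]):
        measure = "win-stay" if prev_rewarded else "lose-shift"
        counts = tallies.setdefault((measure, prev_choice), [0, 0])
        counts[1] += 1                                    # opportunity
        stayed = (next_choice == prev_choice)
        if (prev_rewarded and stayed) or (not prev_rewarded and not stayed):
            counts[0] += 1                                # win-stay or lose-shift event
    return {key: n / total for key, (n, total) in tallies.items() if total > 0}
```

Applied to a single session, this yields the four measures (win-stay and lose-shift ratios after correct and incorrect choices) that enter the three-way ANOVA described above.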
Last, latencies to make a choice and the number of trial omissions (i.e., trials where no response was made within 10 s of lever insertion) were also analyzed with one-way ANOVAs.
Results
mOFC inactivation: PRL performance
Fifteen rats with cannulae implanted into the mOFC were initially trained on the PRL task. Data from three rats were eliminated because of inaccurate placements residing ventral to the mOFC. For the remaining animals (n = 12), infusions of baclofen/muscimol into the mOFC markedly impaired performance, indexed by a decrease in the number of reversals completed (F(1,11) = 23.25, p = 0.001; Fig. 2A). This impairment was not accompanied by changes in the number of trial omissions or choice latencies (both F values <1, both p values >0.35; Table 1).
Figure 2. Inactivation of the medial (top row) or the lateral (bottom row) regions of the OFC differentially impairs PRL. A, Microinfusions of baclofen and muscimol (Bac/Mus) into the mOFC (n = 12) reduced the number of reversals completed per 100 successful trials. For this and all other figures, circles and dashed lines represent data from individual animals following both treatments. B, Errors to achieve criterion performance during the initial discrimination and first reversal phases after inactivation and control treatments. mOFC inactivation increased errors during the first discrimination of the session, and this effect persisted during the first reversal. C, mOFC inactivation increased perseverative errors throughout the task. D, mOFC inactivation caused a decrease in both win-stay and lose-shift behavior after both correct and incorrect choices. E, lOFC inactivation (n = 10) also reduced the number of reversals completed. F, In contrast to mOFC inactivation, lOFC inactivation did not affect error rates during the initial acquisition of the task but did tend to increase the number of errors made during the first reversal. G, lOFC inactivation did not alter perseverative tendencies. H, These treatments reduced both win-stay and lose-shift behaviors only after incorrect choices. Asterisk denotes p < 0.05.
Table 1. Number of trial omissions over 200 trials and average response latencies following inactivation and vehicle treatments in different regions of the OFC and medial PFC
To determine whether differences in performance were attributable to difficulty during reversal shifts or a more general disruption in learning based on probabilistic feedback, we compared the number of errors to achieve criterion for the initial discrimination and first reversal. One rat in this group did not achieve criterion performance on the initial discrimination following inactivation treatments, whereas the remaining 11 rats completed the initial discrimination phase and at least one reversal after both treatments. Analysis of the data from these 11 rats revealed a significant main effect of treatment (F(1,10) = 6.00, p = 0.034), but no treatment × phase interaction (F(1,10) = 0.41, p = 0.54). As displayed in Figure 2B, mOFC inactivation increased errors to criterion during the initial discrimination and first reversal, although visual inspection of these data showed that this effect was numerically larger during the initial discrimination. Furthermore, during the reversal phases of the task, mOFC inactivation increased the average number of perseverative errors per reversal (i.e., consecutive errors following a shift in reinforcement contingencies; F(1,11) = 5.51, p = 0.041; Fig. 2C). Thus, mOFC inactivation not only impaired the use of probabilistic reward feedback to identify the more profitable option at the start of a test session, but also retarded suppression of a particular response upon shifts in reinforcement contingencies.
Additional insight into the deficits induced by mOFC inactivation was obtained from analyses of changes in sensitivity to positive or negative feedback. Under control conditions, rats followed a rewarded correct choice with another correct choice (win-stay behavior) on 70 ± 2% of these occasions, whereas a rewarded incorrect choice was followed by another incorrect choice on 65 ± 4% of these types of trials. In comparison, on trials where rats were not rewarded after a response, they shifted to the alternative lever (lose-shift) on 48 ± 4% and 53 ± 3% of subsequent trials after correct and incorrect choices, respectively. Analysis of these data obtained on saline and inactivation test days revealed a significant main effect of treatment (F(1,11) = 7.56, p = 0.019), but no other interaction effects with the treatment factor were observed (all F values <0.2, all p values >0.90). As displayed in Figure 2D, mOFC inactivation uniformly reduced both win-stay and lose-shift behavior, regardless of whether the preceding choice was correct or incorrect. Thus, mOFC inactivation rendered animals less sensitive to either positive or negative feedback, reducing the impact that recent action outcomes exerted on subsequent choices. Together, these data demonstrate that the mOFC plays a critical role in facilitating probabilistic learning. The marked impairment induced by mOFC inactivation in well trained subjects was apparent during the initial discrimination phase, suggesting that these effects may not reflect deficits exclusive to reversal learning, but rather a more comprehensive impairment in probabilistic reinforcement learning. These impairments were associated with increased perseverative tendencies, along with a general reduction in the ability to incorporate positive or negative feedback to guide subsequent action selection.
lOFC inactivation: probabilistic reversals
Eleven rats with cannulae implanted into the lOFC were used in this experiment. Data from one rat were eliminated because of inaccurate placements ventral to the lOFC, leaving a final n = 10. Inactivation of the lOFC impaired PRL performance, as indexed by a reduction in the number of reversals completed per 100 trials (F(1,9) = 8.33, p = 0.018; Fig. 2E). However, analysis of the errors made during the initial phases of the task suggested that these impairments were qualitatively different from those observed following mOFC inactivation. Analysis of the number of errors to criterion for the initial discrimination and first reversal revealed no significant main effect of treatment (F(1,9) = 1.28, p = 0.29) but did reveal a strong trend toward a significant treatment × phase interaction (F(1,8) = 4.68, p = 0.059). As is apparent from Figure 2F, performance during the initial discrimination phase was not significantly affected by inactivation of the lOFC; however, performance was noticeably impaired during the reversal phase, indicating that rats had difficulty modifying their behavior following a change in reward contingencies. However, this impairment was not associated with enhanced perseverative tendencies (F(1,9) = 1.20, p = 0.30; Fig. 2G). This lack of effect on perseverative responding may be related to the extended training rats received, which may reduce lOFC involvement in response suppression under these circumstances (Boulougouris and Robbins, 2009; Young and Shapiro, 2009; Stalnaker et al., 2015).
Unlike mOFC inactivation, lOFC inactivation significantly increased choice latencies (F(1,9) = 7.85, p = 0.022; Table 1), in a manner similar to the effects of these manipulations on response latencies during probabilistic discounting (St Onge and Floresco, 2010). Accordingly, lOFC inactivation also increased trial omissions (F(1,9) = 7.86, p = 0.021; Table 1), presumably attributable to a slowing of response selection that led to a greater number of trials on which rats did not respond within the allotted 10 s period while the levers were extended. This increase in choice latency may be related to alterations in phasic dopamine transmission, which can invigorate approach behavior toward reward-related cues (Flagel et al., 2011). Similar inactivation of the lOFC has been reported to attenuate phasic dopamine responses induced by reward-related cues during cost/benefit decision making (Jo and Mizumori, 2015). Thus, in addition to mediating accurate performance during PRL, neural activity in the lOFC may also facilitate timely approach toward reward-related stimuli via interactions with the dopamine system.
With respect to changes in win-stay/lose-shift behavior, analysis of these data revealed a significant treatment × choice interaction (F(1,9) = 5.57, p = 0.043; Fig. 2H), but no other significant main effects or interactions with the treatment factor (all F values <1.5, all p values >0.25). Simple main effects analysis further revealed that lOFC inactivation did not affect win-stay or lose-shift tendencies following a correct choice (all F values <2.2, all p values >0.17). Instead, these treatments induced a subtle but statistically significant reduction in both win-stay and lose-shift ratios after incorrect choices (F(1,9) = 5.09, p = 0.05; Fig. 2H, right). Thus, following lOFC inactivation, rats were less likely to shift away from the incorrect lever after a more common non-rewarded response, and were also less likely to select the incorrect lever again on the rarer occasions when an erroneous choice was rewarded.
mOFC and lOFC inactivations: reversal learning with assured outcomes
The finding that inactivation of either the mOFC or lOFC impaired PRL in well trained animals differs from other observations that lesions of the OFC do not impair reversal performance once rats have experienced shifts in reinforcement contingencies (Schoenbaum et al., 2002; McAlonan and Brown, 2003; Boulougouris et al., 2007; Boulougouris and Robbins, 2009). An important difference between the previous and present studies is that, in the former instances, a correct/incorrect response always/never delivered reward. To explore whether impairments induced by inactivation of these OFC regions were related to the probabilistic nature of the task, separate groups of rats were well trained on a similar task in which a correct choice was always rewarded and an incorrect choice was never rewarded before receiving saline or inactivation treatments in either the mOFC or lOFC. In these experiments, rats completed more reversals compared with those trained on the probabilistic task, likely because of the relatively more straightforward reinforcement contingencies. As such, all rats in both the mOFC and lOFC groups completed the first discrimination and at least two reversals after saline and inactivation treatments, permitting us to compare the number of errors made across these three phases.
Infusions of baclofen/muscimol into the mOFC (n = 8) failed to affect performance when correct/incorrect responses were always/never reinforced (F(1,7) = 1.22, p = 0.30; Fig. 3A). Analysis of the number of errors to achieve criterion for the initial discrimination and subsequent two reversals did not reveal a significant main effect of treatment or treatment × phase interactions (all F values <2.84, all p values >0.13; Fig. 3B).
Figure 3. Inactivation of neither the mOFC (top row, n = 8) nor the lOFC (bottom row, n = 6) affected performance of a reversal learning task when feedback was assured. A, C, Number of reversals completed per 100 successful trials following saline or inactivation treatments within the mOFC or lOFC. B, D, Errors to achieve criterion performance during the initial discrimination and first two reversal phases of the assured-outcome reversal task were not affected by inactivation of either the mOFC (B) or the lOFC (D).
Inactivation of the lOFC (n = 6) had a somewhat equivocal effect on performance of this task. Three of the six animals in this experiment showed a considerable reduction in the number of reversals completed after inactivation relative to saline treatments, one rat displayed a marked increase in this measure, and two others showed minimal change in performance (Fig. 3C). Thus, even though these treatments reduced the average number of reversals completed, analyses of these data failed to yield a significant difference between treatment conditions (F(1,5) = 1.08, p = 0.35; Fig. 3C). This trend appeared to be driven by a slight increase in the number of errors committed during the initial discrimination of the session; yet analyses of the errors made across these three phases also failed to reveal a significant difference between treatments (main effect of treatment: F(1,5) = 1.04, p = 0.36; treatment × phase interaction: F(2,10) = 2.35, p = 0.15).
With respect to other performance measures, the number of omissions was unaffected by inactivation of either brain region (both F values <4.28; both p values >0.09; Table 1), whereas response latency was unaffected by inactivation of the mOFC (F(1,7) = 0.43, p = 0.53) but again was significantly increased by lOFC inactivation (F(1,5) = 23.38, p = 0.005; Table 1). Thus, these data confirm that neural activity within the mOFC is not required for efficient reversal performance when animals have experienced shifts in reinforcement contingencies that are assured. Furthermore, the lOFC plays, at best, a relatively limited role in facilitating reversal shifts under these conditions after extended training, consistent with previous findings (Boulougouris and Robbins, 2009; Young and Shapiro, 2009). In comparison, both regions play more prominent, although somewhat different roles in mediating cognitive flexibility when action–outcome contingencies are probabilistic.
Medial PFC regions and PRL
Prelimbic PFC
Eighteen rats with cannulae implanted into the prelimbic area of the PFC were trained on the PRL task for this experiment. Data from four rats were eliminated because of inaccurate placements residing ventral to the prelimbic cortex, leaving a final n = 14. In these animals, inactivation of the prelimbic cortex induced a surprising increase in the number of reversals completed (F(1,13) = 22.11, p < 0.001; Fig. 4A). Rats in this cohort completed fewer reversals/100 trials (1.6 ± 0.2) after saline infusions into the prelimbic cortex when compared with the control performance of rats in the OFC groups (2.6–3.4 reversals completed/100 trials). To confirm that the increase in reversals completed following prelimbic inactivation was not an artifact of the somewhat poorer performance of these rats under control conditions, we analyzed data from a subset of animals whose performance after saline infusions was more comparable to that of rats in the OFC groups. Despite the smaller number of animals included in this analysis (n = 6), we again observed that inactivation of the prelimbic PFC increased the number of reversals completed/100 trials (mean = 3.6 ± 0.2) relative to saline infusions (mean = 2.2 ± 0.1; F(1,5) = 49.82, p < 0.001). The improvement in reversal performance induced by prelimbic inactivation was mirrored by a significant decrease in the number of errors made to reach criterion at both the discrimination and reversal phases (main effect of treatment: F(1,13) = 15.61, p = 0.002); there was no treatment × phase interaction (F(1,13) = 0.67, p = 0.43; Fig. 4B). Inactivation of the prelimbic cortex had no effect on perseverative errors (Fig. 4C), the number of omissions, or latency to respond (all F values <1.5, all p values >0.24; Table 1).
Figure 4. Inactivation of the prelimbic PFC induced an apparent improvement in PRL performance. A, Inactivation of the prelimbic PFC (n = 14) increased the number of reversals completed per 100 trials relative to control treatments. B, Errors to achieve criterion were reduced following prelimbic PFC inactivation. C, Perseverative-type errors were not affected by these treatments. D, Win-stay tendencies were increased following both correct and incorrect choices, while lose-shift behavior was decreased only following correct choices. Asterisks denote p < 0.05.
Additional insight into the apparent improvement in probabilistic reversal performance induced by prelimbic cortex inactivation was obtained by analysis of the win-stay/lose-shift data. This analysis yielded a significant treatment × trial type interaction (F(1,13) = 27.42, p < 0.001) and a significant treatment × choice type interaction (F(1,13) = 10.39, p = 0.007), although the three-way interaction was not significant (F(1,13) = 1.99, p = 0.18). To further clarify the effect of prelimbic inactivation on reward and negative feedback sensitivity, exploratory two-way ANOVAs were conducted on win-stay and lose-shift data obtained after correct and incorrect choices. For correct choices, the analysis again revealed a significant treatment × trial type interaction (F(1,13) = 16.28, p = 0.001; Fig. 4D, left). Partitioning of this interaction confirmed that inactivation of the prelimbic PFC increased the tendency to follow a rewarded correct choice with another correct choice (p < 0.05), while at the same time reducing the tendency to shift responding after a non-rewarded correct choice (p < 0.05). Analysis of win-stay/lose-shift ratios after incorrect choices yielded another significant treatment × trial type interaction (F(1,13) = 15.58, p = 0.002; Fig. 4D, right). On these types of trials, prelimbic inactivation again increased win-stay behavior (p < 0.05). However, these treatments did not alter the likelihood of rats shifting their responding after a non-rewarded incorrect response (p > 0.15). Thus, the enhanced reversal performance induced by prelimbic inactivation was likely driven by a greater tendency for rats to follow a rewarded correct choice with a similar choice, while at the same time making them less likely to shift away from the correct lever on trials when correct choices were not reinforced.
Infralimbic PFC
Fourteen rats with cannulae implanted into the infralimbic PFC were trained on the PRL task, and data from three rats were eliminated because of inaccurate placements residing ventral to the infralimbic cortex. For the remaining animals (n = 11), inactivation of the infralimbic cortex did not significantly affect the number of reversals per 100 completed trials (F(1,10) = 1.41, p = 0.26; Fig. 5A) or errors at either the acquisition or reversal stage of the test (all F values <1.0; Fig. 5B). Notably, there was considerable overlap in the anterior/posterior placements of guide cannulae in this group relative to those in the prelimbic group (Fig. 1). Yet performance of this group under control conditions was comparable to that of the other groups in the study, suggesting that the lower number of reversals completed by rats in the prelimbic group after control treatments was more likely attributable to random variation in performance across groups rather than to nonspecific damage incurred by the indwelling cannulae. There were also no significant main effects or interactions with the treatment factor for win-stay/lose-shift behavior (all F values <1.0; Fig. 5C). The number of omissions made and latency to respond were also unaffected (both F values <1.0; Table 1).
Figure 5. Inactivation of the infralimbic PFC (top row, n = 11) or the anterior cingulate (bottom row, n = 10) did not significantly affect PRL performance. A, D, Number of reversals completed per 100 trials. B, E, Number of errors made to achieve criterion during either the initial acquisition or reversal stages of the task. C, F, Win-stay/lose-shift behavior. Note, however, the trend toward a reduced number of reversals completed induced by anterior cingulate inactivation.
dACC
Thirteen rats were initially trained in this experiment. Data from two rats were eliminated following the postmortem identification of tumors, and data from one rat were eliminated because of an inaccurate, asymmetrical placement, leaving a final n = 10. As displayed in Figure 5D, infusions of baclofen/muscimol into the dACC reduced the number of reversals per 100 completed trials in the majority of animals tested, yet one rat in this experiment showed a marked increase on this measure. This variability precluded detection of a significant effect of treatment (F(1,9) = 3.05, p = 0.12). Despite this trend, analysis of the error data showed that dACC inactivation had no significant main effect on the number of errors made to reach criterion at either the acquisition or reversal stage of the test (all F values <1.0; Fig. 5E). Similarly, no significant effect was found for win-stay/lose-shift behavior (all F values <1.4; Fig. 5F). Latency to respond was significantly increased following inactivation of the dACC (F(1,9) = 13.18, p = 0.005; Table 1), whereas the number of omissions made was unaffected (F(1,9) = 1.62, p = 0.24; Table 1).
Discussion
The present study provides novel insight into the contribution of different OFC and medial PFC regions in reinforcement learning when reward feedback is probabilistic. Inactivation of the mOFC or lOFC induced qualitatively different deficits, with mOFC inactivation impairing probabilistic learning, increasing perseverative responding and reducing the impact of both rewarded and non-rewarded actions on subsequent action selection. lOFC inactivation more selectively impaired reversal performance, driven in part by a disruption in adjusting behavior after non-rewarded incorrect choices. In contrast, prelimbic medial PFC inactivation seemingly improved performance, increasing sensitivity to reinforced actions and reducing sensitivity to non-rewarded correct choices.
Different contributions by OFC subregions to PRL
OFC damage impairs reversal learning with assured outcomes, while leaving initial discrimination learning relatively intact (Dias et al., 1996; Fellows and Farah, 2003; Boulougouris et al., 2007; Ghods-Sharifi et al., 2008). Most rodent studies have focused on the lOFC, whereas comparatively few have examined the contribution of the mOFC to this form of cognitive flexibility (Gourley et al., 2010). The present findings that activity in both OFC regions enables efficient PRL, in combination with the relative lack of effect on performance of a similar task where outcomes were assured, reveal that both OFC regions play fundamental and pervasive roles in facilitating flexible responding when reinforcement contingencies are probabilistic. These data complement those implicating the OFC in guiding behavior under conditions of uncertainty (Rogers et al., 1999; van Duuren et al., 2009) or when task complexity is otherwise increased (Rudebeck and Murray, 2008).
mOFC inactivation increased errors during the initial discrimination, suggestive of an impairment in distinguishing responses that yield high-probability rewards from those that yield lower-probability ones. This is in keeping with suggestions that this region integrates goal value signals (Elliott et al., 2000; Kable and Glimcher, 2009) and mediates action–outcome representations (Mainen and Kepecs, 2009) to guide value-based action selection (Gläscher et al., 2009, 2012; Sul et al., 2010; Stopper et al., 2014). Additional analyses revealed that this suboptimal reward seeking reflected a generalized deficit in retrieving and incorporating information about the outcomes of previous actions to guide subsequent choice, as both win-stay and lose-shift behavior were reduced. Reduced negative feedback sensitivity after a non-rewarded incorrect choice may have contributed to increased perseveration, promoting persistent erroneous responding after a shift in reinforcement contingencies. This complex pattern of effects highlights the importance of the mOFC in facilitating probabilistic learning by integrating information about the likelihood of obtaining rewards following different actions to guide ongoing reward seeking.
In contrast, lOFC inactivation did not affect initial discrimination learning, suggesting that these manipulations left basic motoric and motivational processes intact. Instead, these treatments induced more restricted impairments during the reversal stages, decreasing win-stay and lose-shift behavior selectively after incorrect choices, consistent with the idea that the lOFC mediates adjustments in response selection upon violations of reward expectancies signaled by negative feedback (O'Doherty et al., 2001; Levens et al., 2014). Notably, functional imaging in monkeys performing reversal tasks has revealed outcome-associated activation of the lOFC that was related to win-stay/lose-shift behavior, suggesting that this region is involved in directing behavior that is adaptive to the context, given the recent distribution of reward across choices, to maximize future reward (Chau et al., 2015). Our results suggest that this activity may be particularly important following incorrect actions. Furthermore, the fact that lOFC inactivation did not affect perseveration suggests that the impairments observed in this experiment are less likely to be attributable to a failure to update action–outcome associations after a reversal, but rather may reflect an impairment in maintaining appropriate patterns of choice upon changes in reinforcement contingencies.
These effects of lOFC inactivation complement findings obtained with monkeys with lesions of the OFC encompassing lateral and medial regions, which displayed impaired performance on a three-choice probabilistic learning task, but only during the reversal phases (Walton et al., 2010). This collection of findings provides additional support for the recent theoretical synthesis of Stalnaker et al. (2015), who propose that the lOFC may be recruited in situations that require “a novel value to be computed on the fly using new information or predictions that have been acquired since the original learning.” Thus, the lOFC and mOFC may play distinct yet complementary roles in facilitating PRL. The mOFC facilitates use of probabilistic feedback to identify actions that may yield higher-probability rewards (Noonan et al., 2012). In comparison, the lOFC may identify changes in reinforcement contingencies and signal the mOFC to update appraisals concerning which actions may be more profitable.
Tsuchida et al. (2010) tested humans with damage to both mOFC and lOFC on a PRL task and observed impairments during the initial discrimination and reversal phase, along with increased win-shift tendencies (i.e., reduced win-stay behavior). Our findings suggest that impaired initial discrimination learning and reduced win-stay behavior observed in humans with OFC damage may be attributable to disrupted mOFC function, whereas impaired reversal performance may be related to lOFC damage. These findings emphasize that a more comprehensive understanding of OFC functions will require isolating the dissociable and/or complementary contribution of the medial and lateral portions of this region make to reward seeking, cognitive flexibility, and other aspects of behavior.
Medial PFC regions and PRL
Inactivation of the infralimbic or prelimbic medial PFC did not impair PRL. Similar treatments in the dACC tended to reduce the number of reversals completed, but this was not accompanied by changes in error rates or win-stay/lose-shift behavior. The ACC has been proposed to play a role in response inhibition, error detection, and performance monitoring (Miyake et al., 2000; Miller and Cohen, 2001; Chase et al., 2008), as well as the integration of choice/outcome history (Williams et al., 2004; Rushworth et al., 2007). Note, however, that humans with dACC lesions display normal PRL (Tsuchida et al., 2010). Furthermore, even though dACC inactivation did not significantly impair performance, it did increase choice latencies, suggesting that it plays a permissive role in guiding response selection in these situations.
Prelimbic inactivation not only failed to impair PRL, but actually increased the number of reversals completed. In comparison, lesions or inactivation of this region typically do not affect reversal learning with assured outcomes (Ragozzino et al., 1999; Boulougouris et al., 2007; Floresco et al., 2008). In attempting to understand this surprising effect, it should be noted that the frequent shifts in reinforcement contingencies animals experienced over training would reduce the impact that individual rewarded actions had on subsequent choice. Instead, these conditions promote tracking of the broader context of reward history. In this regard, neurophysiological and inactivation studies have implicated the prelimbic PFC in identifying changes in reinforcement contingencies (Durstewitz et al., 2010) and in monitoring action–outcome relationships to track variations in reward probability (St Onge and Floresco, 2010; St Onge et al., 2012; Orsini et al., 2015). In the present study, prelimbic inactivation increased the likelihood that rats would follow a rewarded choice with another choice of the same type. Thus, rather than integrating their reward history, rats with prelimbic inactivation displayed a form of reward myopia, with response selection more heavily influenced by the most recently rewarded action. Note that increased win-stay behavior was observed regardless of whether the previous choice was the correct action or not. However, correct choices were rewarded much more frequently, which would lead to a greater number of correct versus incorrect choices.
A consequence of the contingencies used here was that 20% of correct choices were not rewarded. Under control conditions, this causes a shift in responding on ∼40% of such trials, which in turn can interrupt a streak of correct choices and delay criterion performance that triggers a reversal. Prelimbic inactivation reduced lose-shift behavior, primarily after non-rewarded correct choices, rather than incorrect ones. This suggests that the ability to adjust behavior after a string of non-rewarded actions (as would occur after most incorrect choices) is relatively spared by prelimbic inactivation. On the other hand, reductions in lose-shift behavior during probabilistic discounting have been observed following disconnection of prelimbic projections to the basolateral amygdala (St Onge et al., 2012). This combination of findings suggests that neural activity in the prelimbic PFC facilitates detection of infrequent errors in reward prediction that occur after non-rewarded actions. Together, the seemingly improved PRL performance after prelimbic inactivation may actually reflect impairments in the ability to monitor different aspects of volatile action–outcome associations, including diminished sensitivity to both the long-term reward history of actions and occasional negative feedback. Within the context of the PRL task structure, these impairments would increase the likelihood of repeating correct choices and reduce shifts in responding, which in this instance, manifested as longer streaks of correct choices and more reversals completed.
Conclusions
The present findings highlight the complex segregation of activity within different regions of the frontal lobes in mediating cognitive flexibility in uncertain situations. Both the mOFC and lOFC cooperate to ascertain courses of action that are more likely to yield rewards and detect changes in reinforcement contingencies. In contrast, the prelimbic PFC appears to monitor action–outcome reward histories and non-rewarded actions. It is noteworthy that deficits in probabilistic learning, cognitive flexibility and altered sensitivity to reward and negative feedback are apparent in a variety of psychiatric disorders such as schizophrenia and depression (Waltz and Gold, 2007; Taylor Tavares et al., 2008; Roiser et al., 2009; Whitton et al., 2015). Further clarification of the mechanisms that mediate these functions in the normal brain may provide insight into the distinct pathophysiologies of different frontal lobe regions that underlie abnormalities in specific aspects of cognition.
Footnotes
This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada to A.G.P. and S.B.F.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Stan B. Floresco, Department of Psychology and Brain Research Center, University of British Columbia, 2136 West Mall, Vancouver, BC V6T 1Z4, Canada. floresco@psych.ubc.ca