Abstract
Different subregions of nucleus accumbens (NAc) have been implicated in reward seeking, promoting flexible approach responses, suppressing nonrewarded actions, and facilitating shifts between different discrimination strategies. Interestingly, the NAc does not appear to mediate shifting between stimulus–reward associations (i.e., reversal learning) when reinforcement is predictable. How these nuclei may facilitate flexible response strategies when reward delivery is uncertain remains unclear. We investigated the effects of inactivation of the NAc shell and core on probabilistic reversal learning using an operant task wherein a “correct” response delivered reward on 80% of trials, and an “incorrect” response was reinforced on 20% of trials. Reinforcement contingencies were reversed repeatedly within a session. In well-trained rats, shell inactivation reduced the number of reversals completed and selectively reduced win–stay behavior. This impairment was apparent during the first discrimination, indicating a more general deficit in the use of probabilistic reward feedback to guide action selection. Shell inactivation also impaired reversal performance on a similar task where correct/incorrect choices always/never delivered reward. However, this impairment only emerged after both levers had been associated with reward. Inactivation of NAc core did not impair reversal performance but increased latencies to approach the response levers. These results suggest the NAc shell and core facilitate reward seeking in a distinct yet complementary manner when the relationship between specific actions and reward is uncertain or ambiguous and cognitive flexibility is required. The core promotes approach toward reward-associated stimuli, whereas the shell refines response selection to those specific actions more likely to yield reward.
Introduction
When seeking out the good things in life, we must often behave in a flexible manner to obtain our goals. Sometimes, rewards are not where they used to be, or require the use of alternative strategies to be obtained. Alternatively, in some circumstances, certain actions may not always yield reward. Situations of uncertainty between action and outcome require different forms of cognitive flexibility mediated by different regions of the frontal lobes. For example, shifts between strategies or attentional sets depend on the dorsolateral prefrontal cortex in primates and the medial prefrontal cortex in rats, whereas shifts between different stimulus–reward associations (i.e., reversal learning) require an intact orbitofrontal cortex (Dias et al., 1996; Ragozzino et al., 1999; Birrell and Brown, 2000; McAlonan and Brown, 2003; Ghods-Sharifi et al., 2008).
Implementation of these different processes is mediated by different corticostriatal circuits. The rat dorsomedial striatum mediates reversal learning, as well as more complex processes related to attentional set formation and shifting between different strategies (Ragozzino et al., 2002; Castañé et al., 2010; Lindgren et al., 2013). Strategy shifting is also critically dependent on circuits linking prefrontal cortex to the nucleus accumbens (NAc) core (Floresco et al., 2006, 2009; Block et al., 2007). In contrast, lesions of the NAc core or shell do not disrupt reversal learning (Burk and Mair, 2001; Castañé et al., 2010), a finding that may appear to contradict the well-established contribution by NAc subregions to reinforcement learning and reward seeking. The core appears to promote a flexible approach toward reward-related locations (Ambroggi et al., 2008; Nicola, 2010), whereas the shell has been implicated in suppression of nonrewarded actions and in learning to ignore irrelevant stimuli (Weiner, 2003; Floresco et al., 2008; Blaiss and Janak, 2009; Ambroggi et al., 2011).
Studies of the neural basis of reversal learning typically use assured reinforcement contingencies, where correct/incorrect responses are always/never rewarded. However, such consistent reinforcement contingencies are not always the norm in the real world, as a “correct” action may not always yield reward. These situations add complexity by requiring subjects to track the broader history of reward to determine which response option is more profitable. It is possible that situations of uncertainty recruit neural systems within the striatum that differ from those involved with simpler forms of reversal learning. Indeed, functional imaging has revealed that probabilistic reversal learning is associated with increased activation of the ventral striatum (Cools et al., 2002; Mell et al., 2009).
A growing interest in the neural mechanisms underlying probabilistic reversal learning can be attributed to studies linking perturbations in this form of learning to clinical conditions, including schizophrenia (Waltz and Gold, 2007), Parkinson's disease (Cools et al., 2001; Peterson et al., 2009), mood disorders (Taylor Tavares et al., 2008; Roiser et al., 2009), and orbitofrontal cortex damage (Tsuchida et al., 2010). One notable study in rats showed that serotonin depletion impaired probabilistic reversal learning by reducing sensitivity to rewards (Bari et al., 2010). Despite such progress, additional preclinical research is required to identify the specific contributions of different corticostriatal circuits to probabilistic reversal learning. The present study addresses this gap by examining the effects of inactivation of the NAc shell or core on probabilistic learning in rats trained on a task requiring multiple reversals within a session. Our focus on the NAc was driven by human imaging studies as well as preclinical studies with rats implicating the NAc in response selection in situations involving reward uncertainty (Cools et al., 2002; Mell et al., 2009; Stopper and Floresco, 2011).
Materials and Methods
Subjects.
Male Long–Evans rats (280–350 g; Charles River Laboratories) were housed individually and initially given free access to standard laboratory chow and water. The colony was maintained at 21°C on a 12:12 h light/dark cycle (lights on at 7:00 A.M.), and all experiments were performed during the light phase of the cycle. Rats were given 7–8 d to acclimatize to the colony before behavioral procedures began. Rats were handled and weighed daily during this period and throughout the course of the experiment. During behavioral procedures, rats had free access to water but were food restricted to maintain 85–90% of the ad libitum weight of age-matched rats. All experiments were conducted in accordance with the standards of the Canadian Council on Animal Care and were approved by the Committee on Animal Care, University of British Columbia.
Apparatus.
All testing was conducted in operant chambers (30.5 cm × 24 cm × 21 cm; Med-Associates) enclosed in sound-attenuating boxes. Each box contained a fan to mask outside noises and to provide ventilation. Two retractable levers were located on either side of a central food hopper into which sugar pellet reinforcement (45 mg; BioServ) was delivered. Each chamber was illuminated by a 100-mA house light located in the top-center of the wall opposite the levers. All experimental data were recorded by an IBM personal computer connected to the chambers via an interface.
Surgery.
Before training, rats were anesthetized with ketamine (100 mg/kg)/xylazine (7 mg/kg) and implanted with bilateral 23-gauge stainless-steel guide cannulae located above either the core or the medial shell region of the nucleus accumbens (shell: flat skull, anteroposterior = 1.3 mm, mediolateral = ±1.0 mm, dorsoventral = −6.3 mm from dura; core: flat skull, anteroposterior = 1.6 mm, mediolateral = ±1.8 mm, dorsoventral = −6.3 mm from dura) using standard stereotaxic techniques. Guide cannulae were implanted vertically and held in place with stainless steel screws and dental acrylic. Thirty-gauge obturators flush with the end of the guide cannulae remained in place until the infusions were made. Rats were given at least 1 week to recover from surgery before behavioral training began. During this period, they were handled for at least 5 min each day and were food restricted to 85% of their free-feeding body weight.
Lever pressing training.
On the day before their first exposure to the operant chambers, rats were given ∼25 reward pellets in their home cage. On the first day of training, the food cup contained 2 or 3 pellets and crushed pellets were placed on a lever before each rat was placed into the chamber. Rats were first trained to press one of the levers to receive reward on a fixed-ratio 1 schedule to a criterion of 60 presses in 30 min and were required to press the other lever on the next day (counterbalanced left/right between subjects). Rats were then trained on a simplified version of the full task. These 90-trial sessions began with the levers retracted and the operant chamber in darkness. Every 40 s, a new trial was initiated by illumination of the houselight and the insertion of one of the two levers into the chamber. If the rat failed to respond on the lever within 10 s, the lever was retracted, the houselight was extinguished, and the trial was scored as an omission. A response within 10 s of lever insertion resulted in delivery of a single pellet with 50% probability. This procedure was used to familiarize the rats with the probabilistic nature of the full task. In every pair of trials, the left and right levers were each presented once, and the order within the pair was randomized. Rats were trained for ∼3–4 d to a criterion of ≥80 successful trials (i.e., ≤10 omissions), after which they were trained on one of two reversal learning tasks.
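The pretraining schedule above (90 trials, one lever per trial, each lever presented once per pair of trials in random order, and a criterion of ≥80 responded trials) can be sketched as follows. This is a hypothetical illustration: the helper names are ours, and reward delivery (50% per response) is not simulated here.

```python
import random


def pretraining_lever_order(n_trials=90, seed=None):
    """Return the lever ('L' or 'R') inserted on each pretraining trial.

    Within every consecutive pair of trials, each lever is presented once and
    the order within the pair is randomized. (Hypothetical helper.)
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even so that trials form complete pairs")
    rng = random.Random(seed)
    order = []
    for _ in range(n_trials // 2):
        pair = ["L", "R"]
        rng.shuffle(pair)  # randomize which lever comes first within this pair
        order.extend(pair)
    return order


def meets_pretraining_criterion(n_omissions, n_trials=90, min_successful=80):
    """True if the rat responded on at least `min_successful` of `n_trials` trials."""
    return n_trials - n_omissions >= min_successful


order = pretraining_lever_order(seed=1)
print(order[:6])
print(meets_pretraining_criterion(10), meets_pretraining_criterion(11))  # True False
```

Pairing the trials guarantees that each lever is presented equally often (45 times) while still preventing the rat from predicting which lever comes next.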
Probabilistic reversal learning.
The procedures used in the present study were modified from those described by Bari et al. (2010) through the use of retractable levers (as opposed to the nosepoke apertures used in the previous study). Daily sessions consisted of 200 discrete choice trials, with an intertrial interval of 15 s (50 min total). Trials began with illumination of the houselight and, 3 s later, insertion of both levers into the chamber. At the start of each session, one of the two levers was randomly selected to be “correct” and the other “incorrect.” During this initial discrimination phase, a response on the “correct” lever delivered a single reward pellet on 80% of trials, whereas an “incorrect” response delivered reinforcement on only 20% of trials. Failure to press a lever within 10 s of insertion (i.e., trial omission) led to retraction of the levers and extinction of the houselight until the next trial. Once the “correct” lever was selected on eight consecutive trials (regardless of whether a correct choice was reinforced), the contingencies were reversed so that the “correct” lever became the “incorrect” lever and vice versa. This pattern was repeated over the course of a daily session. Daily training sessions continued until a group of rats achieved >3 reversals per session for 2 consecutive days (typically 7–10 training sessions). On the following day, rats received their first counterbalanced microinfusion test day.
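The within-session schedule described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the control software actually used: the class and attribute names are hypothetical, and the text does not say how omissions interact with the run of consecutive correct choices, so here an omitted trial is assumed to neither advance nor reset that run.

```python
import random


class ProbabilisticReversalTask:
    """Sketch of the within-session probabilistic reversal schedule.

    Hypothetical assumptions: choices are 'L', 'R', or None (omission), and an
    omitted trial neither advances nor resets the run of correct choices.
    """

    def __init__(self, p_correct=0.8, p_incorrect=0.2, criterion=8, seed=None):
        self.rng = random.Random(seed)
        self.p_correct = p_correct
        self.p_incorrect = p_incorrect
        self.criterion = criterion
        self.correct_lever = self.rng.choice(["L", "R"])  # chosen at random each session
        self.run = 0               # consecutive correct choices in the current phase
        self.phases_completed = 0  # number of times the criterion was reached

    @property
    def reversals_completed(self):
        # The first criterion ends the initial discrimination; each later one
        # completes a reversal phase.
        return max(0, self.phases_completed - 1)

    def step(self, choice):
        """Score one trial; return True/False for reward, or None for an omission."""
        if choice is None:
            return None
        is_correct = choice == self.correct_lever
        p_reward = self.p_correct if is_correct else self.p_incorrect
        rewarded = self.rng.random() < p_reward
        # The run counts correct choices whether or not they were reinforced.
        self.run = self.run + 1 if is_correct else 0
        if self.run == self.criterion:
            self.correct_lever = "R" if self.correct_lever == "L" else "L"
            self.phases_completed += 1
            self.run = 0
        return rewarded


# An idealized agent that always picks the currently correct lever for 200 trials.
task = ProbabilisticReversalTask(seed=0)
for _ in range(200):
    task.step(task.correct_lever)
print(task.phases_completed, task.reversals_completed)  # 25 24
```

Even a perfect chooser is capped at 25 criterion attainments in 200 trials, which puts the observed 3 to 4 reversals per session in perspective. Setting `p_correct=1.0` and `p_incorrect=0.0` gives the assured-outcome variant, because `random()` returns values in [0, 1).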
Reversal learning with assured outcomes.
This task differed from the probabilistic reversal learning task in only one respect: a correct response always delivered reinforcement, and an incorrect response never did. Two groups of experimentally naive rats, separate from those used in the probabilistic reversal experiment, were trained on this task for 7 d, after which they proceeded to the microinfusion test phase of the experiment.
Drugs and microinfusion procedures.
One or two days before their first microinfusion test day, rats received a mock infusion procedure, during which obturators were removed from the guide cannulae and replaced with stainless steel injectors for 2 min, without an infusion.
A within-subjects design was used for all experiments. Inactivation of the NAc shell or core was achieved by microinfusion of a solution containing the GABAB agonist baclofen and the GABAA agonist muscimol (75 ng each per side; Sigma-Aldrich). GABA agonists or saline were infused bilaterally (0.3 μl over 45 s) via a 30-gauge injection cannula that protruded 0.8 mm beyond the guide cannula. Injection cannulae were left in place for 60 s to allow for diffusion. Rats remained in their home cages for an additional 10 min period before behavioral testing. Previous studies using similar infusions of 0.3 μl of baclofen/muscimol solutions reported dissociable effects on behavior with infusions into adjacent brain regions separated by ∼1 mm (Floresco et al., 2006, 2008; Marquis et al., 2007; Moreira et al., 2007). This is consistent with an estimated functional spread of these treatments ≤1 mm in diameter. Based on these estimates, infusions of baclofen/muscimol into the shell would likely inactivate the medial, but not the ventrolateral, part of this subregion. Furthermore, neurophysiological studies have shown that administration of muscimol into the brain induces a significant suppression of neural activity for at least 2 h (van Duuren et al., 2007), which would last throughout the duration of the test sessions used here (50 min).
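As a worked check of the infusion parameters given above, the arithmetic below derives the drug concentration, the infusion rate, and the time from infusion onset to the end of the test session. The constant names are ours; the values are taken from the text, and the 2 h figure is the minimum suppression duration cited from van Duuren et al. (2007). The functional spread of the drugs is not modeled.

```python
# Values taken from the text; variable names are ours.
DOSE_NG_PER_SIDE = 75.0      # ng of each drug (baclofen and muscimol) per side
VOLUME_UL = 0.3              # infusion volume per side (μl)
INFUSION_S = 45.0            # duration of the infusion (s)
DIFFUSION_S = 60.0           # injector left in place afterwards (s)
PRE_TEST_MIN = 10.0          # time in the home cage before testing (min)
SESSION_MIN = 50.0           # 200 trials x 15 s intertrial interval
MIN_SUPPRESSION_MIN = 120.0  # >= 2 h suppression reported for muscimol

concentration_ng_per_ul = DOSE_NG_PER_SIDE / VOLUME_UL  # 250 ng/μl of each drug
rate_ul_per_min = VOLUME_UL / (INFUSION_S / 60.0)        # about 0.4 μl/min
minutes_to_session_end = (INFUSION_S + DIFFUSION_S) / 60.0 + PRE_TEST_MIN + SESSION_MIN

print(f"{concentration_ng_per_ul:.0f} ng/μl, {rate_ul_per_min:.2f} μl/min")
print(f"session ends ~{minutes_to_session_end:.2f} min after infusion onset")
print("within suppression window:", minutes_to_session_end < MIN_SUPPRESSION_MIN)
```

The session ends roughly 62 min after infusion onset, well inside the reported 2 h window of suppressed neural activity.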
On the first infusion test day, half of the rats in each group received saline infusions, and the other half received baclofen/muscimol. The following day, all rats received a baseline training day (no infusion). If a rat achieved <2 reversals during this baseline session, it was given an additional day of training before the second infusion test. On the day after baseline performance was reestablished, rats received a second counterbalanced infusion of saline or baclofen/muscimol.
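The counterbalanced test sequence can be sketched as below. The helper names are hypothetical; the sketch assumes that treatment order is assigned by shuffling the rats within a group (the text does not say how the halves were formed) and that, for odd group sizes, the extra rat receives baclofen/muscimol first.

```python
import random


def assign_infusion_order(rat_ids, seed=None):
    """Counterbalance treatment order within a group (hypothetical helper).

    Half of the rats (rounded down) receive saline on the first test day and
    baclofen/muscimol on the second; the remainder receive the reverse order.
    """
    rng = random.Random(seed)
    ids = list(rat_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    saline_first = ("saline", "baclofen/muscimol")
    inactivation_first = ("baclofen/muscimol", "saline")
    return {rat: saline_first if i < half else inactivation_first for i, rat in enumerate(ids)}


def infusion_day_sequence(order, baseline_reversals, min_reversals=2):
    """Days from the first to the second infusion test for one rat.

    `baseline_reversals` lists reversals on successive no-infusion baseline days;
    baseline days are added until one reaches `min_reversals`.
    """
    days = [f"infusion: {order[0]}"]
    for n in baseline_reversals:
        days.append(f"baseline (reversals: {n})")
        if n >= min_reversals:
            break
    else:
        raise ValueError("baseline performance was never reestablished")
    days.append(f"infusion: {order[1]}")
    return days


orders = assign_infusion_order(range(1, 13), seed=7)
print(sum(o[0] == "saline" for o in orders.values()))  # 6
print(infusion_day_sequence(("saline", "baclofen/muscimol"), [1, 3]))
```

In the usage example, the first baseline day (1 reversal) falls below the <2 reversal cutoff, so a second baseline day is inserted before the second infusion test.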
Histology.
After completion of behavioral testing, rats were killed in a carbon dioxide chamber. Brains were removed and fixed in a 4% formalin solution. The brains were frozen and sliced in 50 μm sections before being mounted and stained with cresyl violet. Placements were verified with reference to the neuroanatomical atlas of Paxinos and Watson (2005). Data from rats with placements outside the borders of the NAc core or shell, asymmetrical placements, or those that had infusions that encroached on or penetrated the lateral ventricle were removed from the analysis. In general, animals with inaccurate placements did not display prominent changes in performance after inactivation treatments relative to saline infusions. The locations of infusion sites are displayed in Figure 1A.
Data analysis.
A main dependent variable of interest was the number of reversals completed per session. However, in these experiments, the number of trial omissions differed between saline and inactivation treatments, which could complicate interpretation of the raw data: a decrease in the number of reversals per session could reflect either an impairment in cognitive processes related to learning or merely fewer completed trials. To account for differences in trial omissions, the data were also analyzed as a function of the number of completed trials. Specifically, data were transformed using the following formula: [no. of reversals completed per session/(200 − no. of trial omissions)] × 100 (i.e., number of reversals per 100 completed trials). These data were analyzed with paired t tests (two-tailed).
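The normalization formula above can be written as a one-line function. The function name is ours and the session values in the example are hypothetical; they illustrate how a session with fewer raw reversals but many omissions can yield the same rate per completed trial.

```python
def reversals_per_100_completed(reversals, omissions, n_trials=200):
    """Reversals per 100 completed trials: reversals / (n_trials - omissions) x 100."""
    completed = n_trials - omissions
    if completed <= 0:
        raise ValueError("no completed trials")
    return reversals / completed * 100


# Hypothetical sessions: fewer raw reversals, but the same rate per completed trial.
print(reversals_per_100_completed(4, omissions=0))   # 2.0
print(reversals_per_100_completed(3, omissions=50))  # 2.0
```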
For the probabilistic reversal learning task, we also analyzed each animal's choices according to the outcome of the choice on the preceding trial to assess whether neural inactivation altered sensitivity to reward (“win–stay”) or to negative feedback (“lose–shift”) (Bari et al., 2010). Win–stay ratios assessed the likelihood that a subject followed a rewarded “correct” choice with another correct choice. These ratios were calculated from the number of trials on which a rat chose the correct lever after being rewarded for a correct choice on the preceding trial, divided by the total number of rewarded correct choices. Conversely, lose–shift ratios indexed choices of the “incorrect” lever after “correct” choices accompanied by misleading negative feedback (i.e., reward omission). These values were calculated from the number of trials on which a rat selected the incorrect lever after not being rewarded for a correct choice on the preceding trial, divided by the total number of nonrewarded correct choices. Win–stay and lose–shift ratios were analyzed using two-way repeated-measures ANOVAs with treatment and trial type (win–stay and lose–shift) as factors. We also analyzed “incorrect” win–stay ratios, calculated from the number of trials on which a rat chose the incorrect lever after being rewarded for an incorrect choice on the preceding trial, divided by the total number of rewarded incorrect choices. These ratios were analyzed separately from those described above using paired t tests, as they represented a different type of response (i.e., choices following incorrect vs correct responses).
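The three ratios defined above can be computed from a trial-by-trial record as sketched below. The names are hypothetical, and two details are assumptions because the text leaves them open: omitted trials are dropped, so the "preceding trial" is the preceding completed choice, and the final choice of a session (which has no following choice) is left out of the denominators.

```python
from typing import NamedTuple, Optional


class Trial(NamedTuple):
    correct: Optional[bool]  # None marks an omitted trial
    rewarded: bool = False


def _ratio(count):
    hits, opportunities = count
    return hits / opportunities if opportunities else None


def feedback_ratios(trials):
    """Win-stay, lose-shift, and 'incorrect' win-stay ratios for one session.

    Assumption: omitted trials are dropped, so the "preceding trial" is the
    preceding completed choice. Correctness is scored against the contingency
    in force on each trial.
    """
    choices = [t for t in trials if t.correct is not None]
    win_stay, lose_shift, incorrect_win_stay = [0, 0], [0, 0], [0, 0]  # [hits, opportunities]
    for prev, nxt in zip(choices, choices[1:]):
        if prev.correct and prev.rewarded:   # rewarded correct choice
            win_stay[0] += nxt.correct
            win_stay[1] += 1
        elif prev.correct:                   # correct but unrewarded (misleading feedback)
            lose_shift[0] += not nxt.correct
            lose_shift[1] += 1
        elif prev.rewarded:                  # rewarded incorrect choice
            incorrect_win_stay[0] += not nxt.correct
            incorrect_win_stay[1] += 1
    return {
        "win_stay": _ratio(win_stay),
        "lose_shift": _ratio(lose_shift),
        "incorrect_win_stay": _ratio(incorrect_win_stay),
    }


session = [Trial(True, True), Trial(True, False), Trial(False), Trial(None),
           Trial(False, True), Trial(False, True), Trial(True, True)]
print(feedback_ratios(session))
# {'win_stay': 1.0, 'lose_shift': 1.0, 'incorrect_win_stay': 0.5}
```

Unrewarded incorrect choices contribute to none of the three ratios, matching the definitions in the text.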
Ancillary analyses assessed differences in the number of errors committed to achieve the criterion of eight consecutive correct choices for the first discrimination of the session and for subsequent reversals. For these analyses, we compared the number of errors to criterion for the minimum number of reversals completed by all rats after both treatments. For example, under control conditions, every rat in a particular group may have completed at least three reversals. However, after inactivation treatments, the same group of rats may have only completed the first discrimination and at least one reversal. In this instance, we analyzed errors to criterion for the first discrimination and first reversal only across both treatments. Unless otherwise stated, these data were analyzed with two-way repeated-measures ANOVAs, with treatment and phase (first discrimination, first reversal, etc.) as two within-subjects factors. Latencies to make a choice and the number of trial omissions were analyzed with paired t tests (two-tailed).
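Errors to criterion per phase, and the number of phases that can be compared across treatments, can be sketched as follows. The function names and example data are hypothetical; the input is assumed to contain one entry per completed (non-omitted) trial, scored against the contingency in force on that trial.

```python
def errors_to_criterion(correct_seq, criterion=8):
    """Errors made in each completed phase (initial discrimination, reversal 1, ...).

    `correct_seq` holds one bool per completed trial. The final, unfinished
    phase is discarded.
    """
    phases, errors, run = [], 0, 0
    for is_correct in correct_seq:
        if is_correct:
            run += 1
            if run == criterion:  # criterion reached: phase ends, contingencies reverse
                phases.append(errors)
                errors, run = 0, 0
        else:
            errors += 1
            run = 0
    return phases


def common_phase_count(phase_lists):
    """Number of phases completed by every rat under every treatment."""
    return min(len(p) for p in phase_lists)


seq = [False, False] + [True] * 8 + [False] + [True] * 3 + [False] + [True] * 10
print(errors_to_criterion(seq))  # [2, 2]
# Hypothetical rat: 4 phases after saline, 2 after inactivation -> compare 2 phases.
print(common_phase_count([[5, 9, 7, 6], [12, 10]]))  # 2
```

Across a group, `common_phase_count` would be applied to every rat's saline and inactivation lists together, giving the number of phases entered into the ANOVA.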
Results
Experiment 1: probabilistic reversal learning
Task acquisition
As displayed in Figure 1B, rats implanted with cannulae in either the NAc shell or core rapidly increased the number of reversals completed per session over 7 d of training, and this was confirmed by a two-way ANOVA (main effect of training session: F(6,132) = 13.55, p < 0.001). The groups did not differ significantly on the rate of acquisition (main effect of group and group × day interaction: both F < 1.0, not significant). Targeted analyses on the last 3 d of training confirmed that, by this point, both groups had achieved asymptotic performance, as there was no difference in the number of reversals completed during this period (both F < 1.6, not significant). Rats subsequently received counterbalanced infusions of either saline or baclofen/muscimol on separate test days. For both groups, there were no differences in the number of reversals completed on the days before the first versus second infusion test days (shell: 4.1 ± 0.4 vs 3.8 ± 0.6; core: 3.6 ± 0.3 vs 3.7 ± 0.4; both t < 0.80, not significant).
NAc shell inactivation
Fifteen rats with cannulae implanted into the NAc shell were tested. Data from 3 rats were eliminated because of inaccurate placements that resided outside the borders of the shell. For the remaining animals (n = 12), infusions of baclofen/muscimol into the NAc shell markedly impaired performance, indexed by a decrease in the number of reversals completed per session (t(11) = 4.21, p < 0.01; Table 1). These treatments also produced a considerable increase in the number of trial omissions (Table 1), although the difference between treatments was not statistically significant (t(11) = 1.81, 0.05 < p < 0.10). To confirm that the effect on reversals completed was not merely the result of a reduction in the overall number of choices made, we analyzed the number of reversals completed as a function of the total number of trials completed (reversals/100 completed trials). Using these transformed data, the analysis again yielded a significant decrease after inactivation of the NAc shell, relative to saline infusions (t(11) = 4.48, p < 0.01; Fig. 2A). In this experiment, shell inactivation also increased choice latencies (t(11) = 2.78, p < 0.05; Table 1).
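The paired, two-tailed t tests reported here compare each rat's score after saline with its own score after inactivation. A minimal stdlib sketch of the statistic is shown below; the data are hypothetical, and if SciPy is available, `scipy.stats.ttest_rel(x, y)` returns the same statistic together with a two-sided p-value.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired observations of equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # sample SD of differences (n - 1 denominator)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical reversals/100 completed trials for four rats.
saline = [4, 5, 3, 4]
inactivation = [2, 3, 2, 3]
t, df = paired_t(saline, inactivation)
print(f"t({df}) = {t:.2f}")  # t(3) = 5.20
```

With n = 12 rats, as in the shell group, the same computation yields the t(11) values reported above.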
A subsequent analysis was conducted to determine how NAc shell inactivation altered reward sensitivity (i.e., win–stay behavior) or sensitivity to misleading negative feedback (i.e., lose–shift behavior). Under control conditions, rats followed a rewarded correct choice with another correct choice on ∼70% of these occasions. In contrast, on trials where rats chose correctly but were not rewarded, they shifted to the incorrect lever on ∼40% of subsequent trials. Analysis of these data obtained on saline and inactivation test days revealed a significant treatment × trial type interaction (F(1,11) = 6.66, p < 0.05; Fig. 2B). Simple main effects analysis further confirmed that NAc shell inactivation did not affect lose–shift behavior (p > 0.30). Instead, these treatments caused a selective decrease in reward sensitivity, as indexed by a reduction of win–stay ratios (p < 0.01). The reduction in win–stay behavior after shell inactivation was not correlated with the number of trial omissions (r = −0.06, not significant). Furthermore, when we analyzed the proportion of win–stay responses that were not preceded by omitted trials, we again observed that shell inactivation reduced win–stay tendencies (0.56 ± 0.03) relative to control treatments (0.71 ± 0.02; t(11) = 3.85, p < 0.01). Thus, these reduced win–stay tendencies did not appear to be attributable to a greater delay between responses and outcomes experienced by rats that made more omissions. In contrast to these effects, win–stay behavior after a rewarded “incorrect” response was not altered by shell inactivation (0.66 ± 0.05) relative to saline infusions (0.61 ± 0.05; t(11) = 0.88, not significant).
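The correlation reported above (r = −0.06 across 12 rats) can be tested against zero by converting r to a t statistic with n − 2 degrees of freedom. The sketch below uses hypothetical helper names and illustrates only the test; it does not reproduce the underlying per-rat data.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))


print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # 1.0
# r = -0.06 with n = 12 gives t(10) of about -0.19, far from the two-tailed
# critical value of about 2.23 at p = 0.05.
print(round(r_to_t(-0.06, 12), 2))  # -0.19
```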
All rats in this group completed the initial discrimination phase and at least one reversal after both treatments. To determine whether the difference in overall performance was attributable to difficulty during reversal shifts or to a more general disruption in learning based on probabilistic feedback, we compared the number of errors to achieve criterion (i.e., 8 consecutive correct choices) for the initial discrimination and first reversal. Analysis of these data revealed a significant main effect of treatment (F(1,11) = 6.85, p < 0.05), but no treatment × phase interaction (F(1,11) = 0.49, not significant). As displayed in Figure 2C, inactivation increased errors to criterion for the initial discrimination, and this impairment continued through the first reversal. The effect of shell inactivation on errors during the initial discrimination did not appear to be related to whether the correct lever at the start of the session was opposite to (19.3 ± 6 errors, n = 4) or the same as (26.9 ± 5 errors, n = 8) the lever that was correct at the end of the session on the preceding day (between-subjects t(10) = 0.94, not significant). Together, these data demonstrate that the NAc shell plays a critical role in facilitating probabilistic learning and reversal. The marked impairment induced by shell inactivation in well-trained subjects was associated with a decrease in reward sensitivity, as rats were less persistent in selecting the correct option after being rewarded for a correct choice on the preceding trial. Furthermore, performance was impaired during the initial discrimination phase, suggesting that these effects may not reflect deficits specific to reversal learning. Instead, this may indicate a more comprehensive impairment in probabilistic reinforcement learning.
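Because both factors in this analysis have two levels (saline vs inactivation; initial discrimination vs first reversal), each F(1, n − 1) from the two-way repeated-measures ANOVA equals the square of a one-sample t test on per-rat contrast scores. The sketch below illustrates that equivalence with hypothetical data and helper names; it is not necessarily how the reported statistics were computed, although standard packages should give the same F values for a 2 × 2 within-subjects design. It raises `ZeroDivisionError` if every rat has the same contrast score.

```python
import math
import statistics


def contrast_F(scores):
    """F(1, n - 1) = t**2 for a one-sample t test of per-rat contrast scores against 0."""
    n = len(scores)
    t = statistics.mean(scores) / (statistics.stdev(scores) / math.sqrt(n))
    return t * t


def rm_anova_2x2(rows):
    """Two-way repeated-measures ANOVA with two levels per factor.

    Each row is one rat's (saline_phase1, saline_phase2, inact_phase1, inact_phase2).
    Each effect has 1 numerator df, so it reduces to a contrast test.
    """
    treatment = [(a + b) - (c + d) for a, b, c, d in rows]
    phase = [(a + c) - (b + d) for a, b, c, d in rows]
    interaction = [(a - b) - (c - d) for a, b, c, d in rows]
    return {
        "treatment": contrast_F(treatment),
        "phase": contrast_F(phase),
        "treatment_x_phase": contrast_F(interaction),
        "df": (1, len(rows) - 1),
    }


# Hypothetical errors to criterion for three rats.
rows = [(10, 12, 20, 22), (8, 10, 19, 20), (12, 11, 25, 24)]
result = rm_anova_2x2(rows)
print({k: (round(v, 2) if isinstance(v, float) else v) for k, v in result.items()})
```

With the 12 rats of the shell group, the denominator df would be 11, matching the F(1,11) values reported above.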
NAc core inactivation
Sixteen rats with cannulae implanted into the NAc core were tested. Data from 4 rats were eliminated because of inaccurate placements that resided outside the borders of the core or were asymmetrical in the mediolateral plane. One additional rat displayed a disproportionate increase in trial omissions after inactivation treatment (171 omissions) that was >2 SDs from the group mean. Data from this animal were also eliminated from the analyses. This left a final n of 11 rats.
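The exclusion rule above (>2 SDs from the group mean) can be sketched as follows. The omission counts are hypothetical, and the text does not say whether the mean and SD were computed with or without the animal being tested, so the sketch assumes the whole group is included.

```python
import statistics


def flag_outliers(values, n_sd=2.0):
    """Indices of values more than `n_sd` sample SDs from the group mean.

    Assumption: the mean and SD are computed from the whole group,
    including the value being tested.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > n_sd * sd]


# Hypothetical omission counts after inactivation for 12 rats.
omissions = [10, 12, 8, 15, 9, 11, 14, 13, 10, 12, 9, 160]
print(flag_outliers(omissions))  # [11]
```

Including the extreme value inflates the SD, which makes this version of the rule somewhat conservative in small groups.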
Infusions of baclofen/muscimol into the NAc core did not affect probabilistic reversal learning significantly. Neither the total number of reversals (t(10) = 1.17, not significant; Table 1) nor the number of reversals/100 completed trials (t(10) = 1.43, not significant; Fig. 3A) differed between treatment conditions. However, core inactivation did increase response latencies (t(10) = 2.20, p = 0.05; Table 1). Analysis of the number of trial omissions, excluding data from the rat mentioned above, revealed no difference on this measure relative to vehicle treatment (t(10) = 0.57, not significant; Table 1). Consistent with the lack of effect on the number of reversals completed, inactivation of the NAc core also did not alter win–stay or lose–shift tendencies (all F < 1.0, not significant; Fig. 3B). Likewise, core inactivation did not alter “incorrect” win–stay ratios (0.59 ± 0.05) relative to control treatments (0.67 ± 0.04; t(10) = 1.24, not significant).
In this experiment, 2 rats did not achieve criterion performance on the initial discrimination phase after inactivation, attributable in part to a marked increase in trial omissions (68 and 86 omissions). The remaining 9 rats completed at least one reversal after both treatments. To account for the missing data from the first reversal, we analyzed the errors to criterion for the initial discrimination and first reversal separately. For all 11 rats, core inactivation tended to increase errors to criterion during the initial discrimination, but this difference did not achieve statistical significance (t(10) = 1.99, 0.05 < p < 0.10; Fig. 3C). Notably, the number of errors made during this phase was strongly correlated with the number of trial omissions, in that rats displaying more trial omissions tended to make more errors during the initial discrimination of the session (r = 0.68, p < 0.05). Thus, the greater delay between responses and outcomes experienced by rats that made more omissions may have made it more difficult to remember which lever was providing reinforcement more reliably. In contrast, for the 9 rats that completed at least one reversal, core inactivation did not affect errors to achieve criterion during the first reversal (t(8) = 0.48, not significant; Fig. 3C). Thus, core inactivation caused a general slowing of performance, as evidenced by an increase in response latencies (Table 1). However, inactivation of the NAc core did not significantly impair probabilistic reversal learning, in marked contrast to the effects observed after similar treatments of the NAc shell. The dissociation between the effects of core and shell inactivation on probabilistic reversal performance was further supported by an additional analysis in which we directly compared the effects of saline/inactivation treatment of either subregion on the number of reversals/100 completed trials.
A two-way ANOVA with group (core, shell) as a between-subjects factor and treatment (saline, inactivation) as a within-subjects factor revealed a significant treatment × group interaction (F(1,21) = 4.41, p < 0.05). Simple main effects analysis again confirmed that inactivation of the NAc shell impaired performance (p < 0.05), whereas core inactivation did not. In addition, even though rats in the core group completed fewer reversals relative to those in the shell group after saline infusions, a direct comparison of these data revealed that this was not a statistically reliable effect (F(1,21) = 3.22, not significant).
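For a 2 (group, between subjects) × 2 (treatment, within subjects) design, the treatment × group interaction F is equivalent to the squared pooled-variance independent t test comparing the two groups' within-rat difference scores (saline minus inactivation), with df = (1, n_shell + n_core − 2). With 12 shell and 11 core rats this gives the F(1,21) reported above. The sketch below illustrates that equivalence with hypothetical data and helper names; it is not presented as the authors' analysis code.

```python
import math
import statistics


def mixed_2x2_interaction_F(group_a, group_b):
    """Treatment x group interaction for a 2 (between) x 2 (within) design.

    Each group is a list of (saline, inactivation) scores, one pair per rat.
    The interaction equals the squared pooled-variance t test comparing the
    groups' within-rat difference scores, with df = (1, n_a + n_b - 2).
    """
    da = [s - i for s, i in group_a]
    db = [s - i for s, i in group_b]
    na, nb = len(da), len(db)
    pooled_var = ((na - 1) * statistics.variance(da)
                  + (nb - 1) * statistics.variance(db)) / (na + nb - 2)
    t = (statistics.mean(da) - statistics.mean(db)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t * t, (1, na + nb - 2)


# Hypothetical reversals/100 completed trials: (saline, inactivation) per rat.
shell = [(4, 2), (5, 3), (3, 2), (4, 3)]
core = [(3, 3), (4, 3), (3, 4)]
F, df = mixed_2x2_interaction_F(shell, core)
print(f"F{df} = {F:.2f}")  # F(1, 5) = 6.43
```

Framing the interaction as a comparison of difference scores makes its meaning explicit: it asks whether inactivation changed performance more in one subregion than in the other.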
Experiment 2: reversal learning with assured outcomes
The finding that inactivation of the NAc shell (but not core) impaired probabilistic reversal learning differs from other observations that lesions of either region do not impair the acquisition or reversal of a spatial discrimination similar to that used here (Burk and Mair, 2001; Castañé et al., 2010). An important difference between the previous and present studies is that, in the former instances, a correct/incorrect response always/never delivered reward. To explore whether impairments induced by NAc shell inactivation were related to probabilistic reinforcement learning, separate groups of rats were trained on a similar task in which a correct choice was always rewarded and an incorrect choice was never rewarded. We hypothesized that inactivation of either region of the NAc would not affect performance under these conditions.
Task acquisition
As displayed in Figure 1C, rats implanted with cannulae in either the NAc shell or core rapidly increased the number of reversals completed per session over 7 d of training, confirmed by a two-way ANOVA (main effect of training session: F(6,54) = 13.37, p < 0.001). No differences between groups were observed in the rate of acquisition (main effect of group and group × training session interaction: both F < 1.8, not significant). Notably, by the end of training, rats in this experiment were completing more than twice as many reversals per session as rats trained on the probabilistic reversal task (Fig. 1B). This was presumably attributable to the use of assured schedules of reinforcement, which facilitated detection of within-session switches in reward contingencies. Targeted analyses on the last 3 d of training confirmed that, by this point, both groups had achieved asymptotic performance, as there were no differences in the number of reversals completed during this period (both F < 3.0, not significant). Rats subsequently received counterbalanced infusions of either saline or baclofen/muscimol on separate test days. As in Experiment 1, for both groups, there were no differences in the number of reversals completed on the days before the first versus second infusion tests (shell: 7.8 ± 0.8 vs 7.7 ± 0.8; core: 7.6 ± 0.4 vs 7.4 ± 0.5; both t < 0.35, not significant).
NAc shell inactivation
Nine rats were initially tested in this experiment. Data from 3 rats were eliminated because of inaccurate placements outside the borders of the shell, leaving a final n = 6 for the data analysis. Contrary to our expectations, inactivation of the NAc shell reduced the number of reversals completed per session (t(5) = 4.70, p < 0.01; Table 1) and reversals/100 completed trials (t(5) = 4.26, p < 0.01; Fig. 4A). However, subsequent analysis of the pattern of errors revealed that the impairments induced by shell inactivation on this task were qualitatively different from those observed during probabilistic reversals. In the present experiment, all rats completed the initial discrimination phase and at least two reversals after each treatment. Analysis of the errors to criterion data across these three phases revealed a significant main effect of treatment (F(1,5) = 18.55, p < 0.01) and, importantly, a significant treatment × phase interaction (F(1,5) = 4.73, p < 0.05). Simple main effects analyses further revealed that NAc shell inactivation did not affect the number of errors made during the initial discrimination of the session or the first reversal (both p > 0.15; Fig. 4B). Thus, when correct/incorrect responses were always/never reinforced, NAc shell inactivation did not impair learning during these initial phases of the task, in contrast to Experiment 1, where inactivation impaired performance during these phases when reinforcement was probabilistic. Instead, shell inactivation increased errors to criterion only once rats had reached the second reversal phase (p < 0.05). Shell inactivation did not significantly affect the number of trial omissions (t(5) = 1.60, not significant; Table 1). In addition, unlike in Experiment 1, response latencies did not differ significantly across treatments (t(5) = 1.70, not significant; Table 1).
NAc core inactivation
Eight rats were initially tested. Data from 3 rats were eliminated because of inaccurate placements outside the borders of the core, leaving a final n = 5. Inactivation of the NAc core did not alter performance of this task, as neither the number of reversals completed per session (t(4) = 0.28, not significant; Table 1) nor reversals/100 completed trials (t(4) = 0.32, not significant; Fig. 4C) differed from control treatments. Similarly, analysis of the errors to criterion data obtained over the initial discrimination phase and the first two reversals yielded no significant differences between treatments (both F < 0.30, not significant; Fig. 4D). Inactivation tended to increase trial omissions, although this effect did not achieve statistical significance (t(4) = 1.85, not significant; Table 1). However, core inactivation again retarded approach to the response levers, as evidenced by a significant increase in response latencies relative to control treatments (t(4) = 2.86, p < 0.05; Table 1).
Discussion
Here we show that inactivation of the NAc shell impairs probabilistic reversal performance, identifying a key role for this nucleus in using probabilistic reward feedback to facilitate discriminative learning and flexibility. Shell inactivation also induced qualitatively different impairments on a similar reversal task with assured response/reward contingencies. In comparison, inactivation of the NAc core caused a general slowing of approach toward the response levers but did not affect performance accuracy.
The NAc shell, reward uncertainty/ambiguity, and action selection
Functional imaging has revealed increased activation of the human ventral striatum during probabilistic reversal learning (Cools et al., 2002, 2007; Hampton and O'Doherty, 2007; Mell et al., 2009). The present data complement and expand on these findings, confirming that intact functioning of the shell is critical when probabilistic reward feedback is used to guide response selection. Examination of the pattern of errors induced by shell inactivations revealed impaired learning during the initial discrimination phase that persisted during subsequent reversals. This contrasts with previous reports that shell lesions/inactivations do not disrupt acquisition or recall of spatial or visual discriminations when a correct/incorrect choice is always/never reinforced (Floresco et al., 2006; Castañé et al., 2010). The pattern of impairments observed here suggests it is unlikely that these effects reflect a specific deficit in cognitive flexibility per se. Rather, they point to a key role for the NAc shell in refining response selection when reinforcement is uncertain and a particular action may not always lead to reward.
Our finding that activity within the NAc shell guides behavior under conditions of reward uncertainty dovetails with studies clarifying how NAc subregions contribute to choice behavior in the context of cost/benefit decision making. For example, when choosing between larger and smaller rewards that are both guaranteed, but where the subjective value of the larger reward is diminished by some cost (e.g., effort or delays), perturbations of the NAc core, but not the shell, shift biases away from larger rewards (Cardinal et al., 2001; Pothuizen et al., 2005; Ghods-Sharifi and Floresco, 2010). In contrast, during a probabilistic discounting task, inactivation of the NAc shell (but not core) reduced preference for larger, uncertain rewards relative to smaller, certain ones, particularly when the uncertain option had greater long-term utility (Stopper and Floresco, 2011). In a similar vein, recent imaging studies in humans reported selective activation of the NAc shell during evaluation of potential gains or losses on a gambling task (Baliki et al., 2013). This burgeoning literature implicating the shell in response selection under reward uncertainty is in keeping with the recent theoretical framework of Baudonnat et al. (2013), who proposed that “NAc shell dopamine is important for signaling the occurrence of novel and potentially salient events, particularly … when there is ambiguity over the cause of that event.”
Impaired probabilistic reversal performance was accompanied by a selective reduction in the tendency to follow a rewarded correct choice with another correct choice (i.e., win–stay behavior). This parallels the effects of NAc shell inactivation on risk/reward decision making, where similar treatments also reduced win–stay tendencies (Stopper and Floresco, 2011). The shell may promote win–stay behavior via patterns of activity that track the outcomes of recent choices, as some ventral striatal cells encode information about current and previous choices when rats perform a probabilistic reversal task (Kim et al., 2009). Thus, the shell appears to facilitate learning when a particular action may not always yield reward by increasing the likelihood that recently reinforced actions bias the direction of subsequent behavior.
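The win–stay measure described above, the proportion of rewarded correct choices that are followed by another correct choice, can be computed from a trial-by-trial record. The sketch below is illustrative only and is not the authors' analysis code: the hypothetical `ProbTrial` record and its fields are assumptions, pairs of trials involving an omission are skipped by assumption, and the lose–shift proportion is included only as the conventional companion measure, not as a result reported in this section.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProbTrial:
    # Hypothetical trial record, assumed for illustration only.
    choice: Optional[str]  # "L" or "R"; None marks an omitted trial
    correct_lever: str     # lever currently reinforced with high probability (e.g., 80%)
    rewarded: bool         # whether this particular choice delivered reward


def win_stay_lose_shift(trials: List[ProbTrial]) -> Tuple[Optional[float], Optional[float]]:
    """Return (win-stay, lose-shift) proportions; None when no trials are eligible.

    win-stay:   P(correct choice on trial t+1 | rewarded correct choice on trial t)
    lose-shift: P(switch levers on trial t+1 | nonrewarded choice on trial t)
    """
    wins = stays = losses = shifts = 0
    for prev, nxt in zip(trials, trials[1:]):
        if prev.choice is None or nxt.choice is None:
            continue  # skip any consecutive pair that involves an omission
        if prev.rewarded and prev.choice == prev.correct_lever:
            wins += 1
            stays += int(nxt.choice == nxt.correct_lever)
        elif not prev.rewarded:
            losses += 1
            shifts += int(nxt.choice != prev.choice)
    win_stay = stays / wins if wins else None
    lose_shift = shifts / losses if losses else None
    return win_stay, lose_shift


session = [
    ProbTrial("L", "L", True),   # rewarded correct choice; next trial is correct (stay)
    ProbTrial("L", "L", True),   # rewarded correct choice; next trial is incorrect
    ProbTrial("R", "L", False),  # nonrewarded; next trial switches levers
    ProbTrial("L", "L", False),  # nonrewarded correct choice; next trial switches levers
    ProbTrial("R", "L", True),   # rewarded incorrect choice (low-probability payoff)
    ProbTrial("L", "L", True),
]
print(win_stay_lose_shift(session))  # (0.5, 1.0)
```

Under this definition, a rewarded choice of the low-probability lever (the fifth trial above) counts toward neither proportion, so win–stay reflects only how reliably reinforcement of the "correct" action biases the next choice.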
NAc shell inactivation also reduced the number of reversals completed on a task using assured outcomes, but this was accompanied by a pattern of errors distinct from that observed on the probabilistic task. Learning during the initial discrimination or first reversal was unaffected when action–outcome relationships were completely reliable, consistent with reports that shell lesions/inactivations do not induce general impairments in discrimination learning or response flexibility (Floresco et al., 2006; Castañé et al., 2010). Instead, impairments only emerged during the second reversal. At first glance, this appears to contradict reports that shell lesions do not disrupt serial reversal learning (Castañé et al., 2010). A notable difference between the previous and present studies is that the former incorporated a single reversal per daily session, whereas the present procedure exposed rats to multiple reversal shifts in relatively close succession. As a session progressed, such rapid shifts would be expected to increase ambiguity regarding the “correct” lever, requiring additional vigilance when monitoring and updating representations of action–outcome contingencies. Indeed, disruptions in performance by shell inactivation only emerged after both levers had been associated with reward. The idea that the shell is recruited to guide behavior when ambiguity exists about the probable location of reward harmonizes with studies exploring its contribution to spatially guided reward seeking. Foraging for food on a 4-arm maze is unaffected by inactivations focused primarily in the shell, yet these manipulations do disrupt search for 4 reward pellets located randomly on a more complex 8-arm maze (Seamans and Phillips, 1994; Floresco et al., 1996).
Integration of these findings with those of the probabilistic reversal experiment suggests a broader role for the NAc shell in promoting flexible responding in situations that present considerable ambiguity about which actions may yield reward more reliably (Floresco, 2007). Such situations include those where multiple stimuli may be associated with reward, when there are repeated shifts of stimulus–reward contingencies, or when actions yield rewards in a probabilistic manner.
Distinct and complementary roles for the NAc core and shell in guiding reward seeking
Previous investigations of the contribution of the NAc to cognitive flexibility identified a key role for the core and its prefrontal, thalamic, and dopamine inputs in set-shifting, facilitating acquisition and maintenance of novel discrimination strategies (Floresco et al., 2006; Block et al., 2007; Haluk and Floresco, 2009). Conversely, disrupting NAc core function does not impair reversal learning when the stimulus associated with reward changes (e.g., left vs right lever) but the basic strategy remains the same (e.g., approach a specific lever) (Castañé et al., 2010). Similarly, excitotoxic lesions of the NAc focused primarily on the core and sparing the dorsomedial shell did not impair reversal performance on a go/no-go task using appetitive (sucrose) and aversive (quinine) reinforcers (Schoenbaum and Setlow, 2003). Likewise, core inactivation did not perturb shifting during either probabilistic or deterministic reversals in the present study.
It has been proposed that the NAc core is involved in processes that invigorate reward seeking and facilitate flexible approach toward stimuli associated with reward (Nicola, 2010). Glutamatergic/dopaminergic blockade within the core reduces approach toward reward-associated manipulanda signaled by discriminative stimuli, particularly after long (10–20 s) intervals (Nicola, 2010; Ambroggi et al., 2011). In the present study, core inactivation caused a similar effect. Task-related cues (houselight, levers) that signaled the end of 15 s intertrial intervals were less effective at inciting approach toward the response levers, as reflected in increased choice latencies. Integrating these data with those obtained from the NAc shell experiments supports the hypothesis that these subregions subserve distinct, yet complementary, functions to facilitate reward seeking when there is uncertainty/ambiguity about the probable location of reward. The core plays a more general motivational role by invigorating approach toward stimuli linked to reward, whereas the shell refines response selection to enable actions more likely to yield reward.
Since the initial conceptualization that the ventral striatum segregates into distinct subregions (Záborszky et al., 1985), numerous studies have attempted to clarify the specific functions of the NAc core and shell. A unified consensus on this topic remains elusive, yet recent findings have begun to paint a clearer picture of how they may cooperate to guide behavior. As discussed above, many studies favor the core as a site that enables reward-related signals to increase the likelihood of approach toward incentive stimuli or locations where rewards may be available (Parkinson et al., 2000; Blaiss and Janak, 2009; Nicola, 2010; Saunders and Robinson, 2012). On the other hand, accumulating evidence suggests that the shell subserves a different function related to suppression of irrelevant behaviors. For example, suppressing shell activity elicits robust feeding in sated animals (Stratford and Kelley, 1997; Reynolds and Berridge, 2001, 2008). Similar treatments enhance responding for drug- or food-related cues during reinstatement tests conducted in extinction (Di Ciano et al., 2008; Floresco et al., 2008; Peters et al., 2008) and increase inappropriate responding when rewards are explicitly unavailable (Blaiss and Janak, 2009; Ambroggi et al., 2011). These findings converge with an established literature implicating the shell in learning about the irrelevance of stimuli, either during avoidance conditioning (Weiner and Feldon, 1997; Gal et al., 2005) or when acquiring novel discrimination strategies (Floresco et al., 2006).
Suppression of irrelevant, nonrewarded, or less profitable actions would be particularly important for facilitating flexibility in situations involving reward uncertainty/ambiguity. For example, efficient performance on a probabilistic task requires that responding be directed toward high-probability “correct” options while responses that rarely yield reward are suppressed. Shell inactivation increased selection of the low-probability option after a rewarded correct choice (i.e., reduced win–stay behavior), which may be interpreted as a failure to suppress task-irrelevant responding. Similarly, after the discrimination and first reversal phases of the reversal task with assured outcomes, both levers had been reinforced, making it more difficult to disambiguate which actions may or may not be rewarded. Under these conditions, impaired suppression of nonrewarded behaviors induced by shell inactivation would also impair performance. Taking these speculations into account permits an expansion of the hypotheses on complementary roles for the NAc core and shell in flexible reward seeking proposed above. Both regions facilitate obtainment of the good things in life, but through different mechanisms. By promoting approach to stimuli signaling reward availability, core activity ensures that an organism moves to where rewards may be procured. Once a bountiful location is reached, the shell suppresses irrelevant, nonrewarded behavior, thus keeping the reward seeker on task and ensuring that rewards may be obtained more efficiently.
Footnotes
This work was supported by the Natural Sciences and Engineering Research Council of Canada to A.G.P. and S.B.F. We thank Chelsea Eades for assistance with behavioral testing and Dr. Joanna Workman for assistance with imaging.
The authors declare no competing financial interests.
- Correspondence should be addressed to Dr. Stan B. Floresco, Department of Psychology and Brain Research Center, University of British Columbia, 2136 West Mall, Vancouver, British Columbia V6T 1Z4, Canada. floresco{at}psych.ubc.ca