Abstract
In associative learning, animals learn to associate external cues or their own actions with appetitive or aversive outcomes. Although the dopamine (DA) system and the striatum/nucleus accumbens have been implicated in both the pavlovian and instrumental forms of associative learning, whether specific neuronal signaling mechanisms underlie one form or the other is unknown. Here, we report that the striatum-enriched isoform of adenylyl cyclase (AC), AC5, is selectively required for appetitive pavlovian learning. Mice with genetic deletion of AC5 (AC5KO) acquired instrumental responding yet were unable to use cues that predicted reward delivery. The specificity of this deficit was confirmed by an inability of AC5KO mice to learn a simple appetitive pavlovian conditioning task. Conversely, AC5KO mice showed intact aversive pavlovian learning, suggesting the deficit was specific for learning about appetitive outcomes. Our results suggest that AC5 is a critical component of DA-dependent strengthening of stimulus–reward contingencies.
Introduction
An animal's ability to associate environmental stimuli or its own actions with appetitive outcomes is essential for goal-directed behavior and environmental adaptation. Although these forms of appetitive learning, pavlovian and instrumental conditioning, are often integrated, they are dissociable under experimental conditions (Hall, 2002; Kelley, 2004). The distinct neural substrates that underlie these specific forms of appetitive associative learning, however, have not been fully determined and characterized.
The cAMP second messenger system is highly conserved and mediates some form of learning in nearly all organisms (Kandel, 2001). Nine membrane-bound isoforms of adenylyl cyclase (AC) are expressed in mammals, each with different expression patterns and regulatory properties (Iwami et al., 1995; Guillou et al., 1999; Hanoune and Defer, 2001). Of these, the calcium/calmodulin (CaCaM)-stimulated AC1 and AC8 have been extensively studied and shown to be critical for hippocampus-based learning and synaptic plasticity because they couple glutamate-mediated increases in intracellular calcium with cAMP production (Wu et al., 1995; Wong et al., 1999; Wang and Storm, 2003). In contrast, the role of AC5 has been less well characterized. AC5 is highly expressed in the striatum, an area strongly associated with reinforcement learning and a major target of dopamine (DA) innervation (Matsuoka et al., 1997). In the striatal regions in which AC5 is expressed, AC1 and AC8 expression is very low (Matsuoka et al., 1997; Cooper et al., 1998; Nicol et al., 2005). The high level of AC5 expression in the striatum suggests that this isoform may be important for certain forms of striatum-dependent learning.
Striatal DA has been demonstrated to be critical for both appetitive pavlovian and instrumental conditioning (Schultz et al., 1997; Reynolds et al., 2001; Dickinson and Balleine, 2002; Yin and Knowlton, 2006; Day et al., 2007) and is required for induction of synaptic plasticity at corticostriatal synapses (Calabresi et al., 1992, 2007; Wickens et al., 1996; Kreitzer and Malenka, 2007). It has been hypothesized that DA facilitates the learning of environmental contingencies by mediating plasticity mechanisms that strengthen or weaken corticostriatal inputs associated with reward delivery (Wickens et al., 1996, 2003; Schultz et al., 1997; Reynolds et al., 2001; Reynolds and Wickens, 2002; Schultz, 2006). AC5 has been shown to be a primary downstream effector of DA receptor signaling, because in mice deficient in AC5 (AC5KO), stimulation of the DA D1 or D2 receptors does not alter cAMP levels (Iwamoto et al., 2003). In addition, loss of AC5 causes a reduction in D1 receptor levels in the striatum (Iwamoto et al., 2003), further suggesting that loss of this isoform may have critical effects on reward learning.
Using AC5KO mice, we examined the role of this AC isoform in instrumental and pavlovian conditioning. Although AC5KO mice acquired instrumental responding, they exhibited a severe impairment in appetitive pavlovian conditioning and, as a result, had difficulty using cues that predicted the availability of reward. This deficit was specific for appetitive conditioning, because aversive conditioning was intact.
Materials and Methods
Mice
ADCY5-deficient (AC5KO) mice were generated as previously described (Iwamoto et al., 2003). AC5KO mice were backcrossed to C57BL/6 for eight generations. Heterozygote offspring were crossed with each other to obtain AC5KO homozygotes and wild-type (WT) controls. All mice tested were 8–12 weeks of age. All animals were group-housed (four to five per cage) in a temperature- and humidity-controlled barrier facility, with lights on/off at 6:00 A.M./6:00 P.M. All testing was conducted during the light phase. All experiments were approved by the Institutional Animal Care and Use Committee of the University of Chicago.
Behavioral procedures
Appetitive and instrumental conditioning experiments were conducted in mouse operant conditioning chambers equipped with two retractable levers, a house light, two signal lights above the levers, a signal light and a nosepoke hole on the back wall, and a feeder with a photobeam (MED Associates). All sessions began with the onset of the house light.
Instrumental conditioning.
In all experiments, mice were fed regular chow ad libitum in their home cage for 2 h after testing. Naive food-restricted mice were first introduced to the conditioning chambers with two magazine training sessions, in which sucrose pellets were delivered noncontingently on a variable-time 180 s (VT180) schedule and both levers were retracted. After magazine training, mice were trained on a fixed-interval 20 (FI20) schedule of reinforcement, in which the first lever press after 20 s was reinforced. Sessions ended after 1 h or when mice received 30 rewards. Mice were trained until they reached a criterion of 30 rewards in a 1 h session. After all mice reached the learning criterion, they were all run in a single FI20 session. After FI20 training, mice were trained for 1 d on a random-ratio 10 (RR10) schedule of reinforcement and 3 d on an RR20 schedule of reinforcement. During training, only the left lever was extended and mice were fed regular chow for 2 h after each session. An event recorder written into the program documented the time of each lever press, head entry, and reward during the session to generate behavioral raster plots.
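The two reinforcement schedules described above can be illustrated with a minimal Python sketch. This is not the control program used in the study; the function names are hypothetical, and it assumes that the fixed-interval timer starts at session onset and restarts at each reward, and that a random-ratio N schedule reinforces each press independently with probability 1/N.

```python
import random


def fixed_interval_rewards(press_times, interval=20.0):
    """Return the lever-press times (s) reinforced under a fixed-interval schedule.

    Assumption: the interval timer starts at session onset (t = 0) and
    restarts each time a press is reinforced.
    """
    rewarded = []
    last_reward = 0.0
    for t in sorted(press_times):
        # The first press made at least `interval` seconds after the last reward pays off.
        if t - last_reward >= interval:
            rewarded.append(t)
            last_reward = t
    return rewarded


def random_ratio_rewards(press_times, ratio=20, rng=None):
    """Return the lever-press times reinforced under a random-ratio schedule.

    Assumption: each press is reinforced independently with probability 1/ratio.
    """
    rng = rng or random.Random()
    return [t for t in press_times if rng.random() < 1.0 / ratio]


if __name__ == "__main__":
    presses = [5.0, 21.0, 25.0, 40.0, 42.0, 70.0]
    print(fixed_interval_rewards(presses))  # presses at 21, 42, and 70 s are reinforced
    print(random_ratio_rewards(presses, ratio=20, rng=random.Random(0)))
```

Under FI20, extra presses inside the interval earn nothing, whereas under RR20 every press has the same chance of reward, so reward rate scales with press rate.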
Outcome devaluation.
Twenty-four hours after the last day of training, mice were tested for 2 consecutive days for sensitivity to outcome devaluation. Mice were placed in feeding cages and fed ad libitum either the reinforcer earned by lever press (sucrose pellets, devalued) or the ad libitum available reinforcer (regular chow, valued) for 1 h. The amount of each reinforcer consumed during prefeeding was recorded. Immediately after prefeeding, mice were tested for lever press behavior in a 5 min extinction session. Mice were counterbalanced for the order of valued or devalued conditions on either day.
Contingency degradation.
One week after testing for outcome devaluation, contingency assessment began. Water-restricted mice were trained to press the right lever for a water reward (25 μl) for 1 d on a fixed-ratio 1 (FR1) schedule of reinforcement, followed by 1 d of RR10 and 3 d of RR20. Then, both levers were extended and food- and water-restricted mice were trained for 4 d to press the left lever for sucrose pellet reward and the right lever for water reward. Both levers provided rewards on an RR20 schedule of reinforcement. After two-lever training, testing for contingency was conducted for 5 d. During these sessions, the right lever delivered water on an RR20 schedule of reinforcement, and pellets were delivered independently of lever pressing on a random-time (RT) 60 s schedule. Sessions ended after 30 pellets were dispensed.
Appetitive pavlovian conditioning.
Naive mice were food restricted before the first conditioning session. Pavlovian conditioning was conducted in the same chambers as instrumental conditioning, with both levers retracted. Sessions began with the onset of the house light. Mice were trained for 14 d, during which each session consisted of 15 daily trials with a 120 s variable intertrial interval (ITI). Each trial consisted of presentation of a 12 s, 85 dB, 2700 Hz tone [conditioned stimulus (CS)] followed by a click of the pellet dispenser and the drop of a single 20 mg sucrose pellet. The conditioned response was measured as head entries into the food receptacle. Head entries were recorded during the intertrial intervals, and during 2 s bins during tone presentation, immediately after pellet drop, and for 10 s after pellet drop. Data were presented as raw number of head entries during each of the 2 s bins of CS presentation and after pellet drop across all 15 trials in the session. ITI rate was calculated as total head entries during ITI divided by ITI time. Total head entry rate was calculated as total head entries in session divided by session time. A 0.33 s delay for detection of head entries was written into the program to reduce excessive head entry counts caused by twitching of the head in the receptacle. Mice were fed regular chow for 2 h after each session.
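The head entry measures described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the acquisition program used in the study: the function names are hypothetical, head entry timestamps are assumed to be in seconds, and each bin is treated as closed on the left and open on the right.

```python
def bin_head_entries(entry_times, cs_onset, cs_duration=12.0, bin_width=2.0):
    """Count head entries in consecutive bins spanning one CS presentation.

    Assumption: a head entry at offset t from CS onset falls in bin
    floor(t / bin_width); entries outside [0, cs_duration) are ignored.
    """
    n_bins = int(cs_duration // bin_width)
    counts = [0] * n_bins
    for t in entry_times:
        offset = t - cs_onset
        if 0 <= offset < cs_duration:
            counts[int(offset // bin_width)] += 1
    return counts


def head_entry_rate(n_entries, duration_s):
    """Head entries per second, e.g. ITI entries divided by total ITI time."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_entries / duration_s


if __name__ == "__main__":
    entries = [99.5, 100.5, 101.0, 103.0, 111.5, 112.0]
    print(bin_head_entries(entries, cs_onset=100.0))  # six 2 s bins across the 12 s tone
    print(head_entry_rate(45, 1800.0))  # hypothetical: 45 ITI entries over 1800 s of ITI
```

Summing the per-trial bin counts over the 15 trials of a session gives the per-bin totals that the text describes as the raw number of head entries.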
Aversive pavlovian conditioning.
The same mice were used for aversive conditioning as those used for appetitive conditioning. Four days after appetitive conditioning, mice were placed in a fear conditioning chamber (Coulbourn Instruments) to test aversive conditioning. Baseline freezing was measured in response to context and cue before conditioning. During a 5 min training session, mice received two conditioning trials (60 s intertrial interval) of a 30 s, 90 dB, 2400 Hz tone followed by a 2 s, 0.5 mA footshock. Twenty-four hours later, mice were placed back in the chamber and contextual freezing was scored for 2 min. The next day, mice were placed back in the chamber, altered for context by placing a gold-colored cardboard triangular cutout inside the fear conditioning chamber. The triangular cutout obscured the walls, changed the dimensions of the chamber by making it both smaller and triangular in shape, and covered the floor of the chamber. The tone CS was given to measure cued freezing. Freezing behavior was monitored every 5 s, and a freezing score was calculated as the total number of 5 s bins in which the subject was immobile divided by the total number of 5 s bins in the session. The cued percentage freezing was calculated as a percentage of total freezing observations during tone presentation.
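The freezing score defined above reduces to a simple proportion, sketched here in Python. The function name and the boolean encoding of observations are assumptions made for illustration; the study's own scoring procedure is not described beyond the text.

```python
def freezing_score(immobile_bins):
    """Fraction of 5 s observation bins in which the mouse was immobile.

    `immobile_bins` holds one boolean per 5 s bin (True = immobile).
    Multiply the result by 100 to express it as a percentage.
    """
    bins = list(immobile_bins)
    if not bins:
        raise ValueError("at least one observation bin is required")
    return sum(1 for b in bins if b) / len(bins)


if __name__ == "__main__":
    # A 2 min context test yields 24 bins of 5 s; here 6 hypothetical bins are scored immobile.
    observations = [True] * 6 + [False] * 18
    print(100 * freezing_score(observations))  # percentage freezing
```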
Immunohistochemistry
Fifteen minutes after injection of either vehicle (0.9% saline) or 6-chloro-2,3,4,5-tetrahydro-1-phenyl-1H-3-benzazepine hydrobromide (SKF81297) (5 mg/kg; Sigma-Aldrich), mice were perfused transcardially with 4% paraformaldehyde, and brains were postfixed overnight in 4% paraformaldehyde. Brains were cryoprotected in 30% sucrose until they sank, and 40 μm coronal sections were cut on a cryostat and then stored at −20°C until use. Successive sections separated by 120 μm were processed for detection of p-ERK1/2 immunoreactivity. Sections were first washed in 0.1 M Tris-buffered saline followed by blocking in 4% donkey serum and 0.1% Triton. Sections were incubated overnight at 4°C in a 1:200 dilution of phospho-p44/42 extracellular signal-regulated kinase 1/2 (ERK1/2) antibody (Cell Signaling; no. 9101) in 4% donkey serum and 0.1% Triton. A biotinylated horse anti-rabbit IgG (1:500; Vector Laboratories) and peroxidase-conjugated avidin–biotin complex (VECTASTAIN Elite ABC kit; Vector Laboratories) were used, and the reaction was visualized by using SigmaFast DAB tablets (Sigma-Aldrich). For counting p-ERK1/2-positive neurons, six successive sections through the nucleus accumbens separated by 120 μm, beginning at ∼1.20 mm anterior to bregma, were used. A 100 μm² counting window was drawn using Stereo Investigator 6 software (MicroBrightField) medial to the anterior commissure for nucleus accumbens (NAcc) shell counts and ventral to the anterior commissure for NAcc core counts. One count was made per section; p-ERK-positive neurons were counted in each of the six consecutive sections, and the average of the six counts was taken for each mouse. Correct location of counting windows was confirmed by referencing a mouse brain atlas (Paxinos and Franklin, 2001).
Statistical analysis
For the instrumental conditioning data, the latency to check the food receptacle, bout length, head entry rate, lever press rate, and trials to reach criterion were analyzed with Student's t test. Outcome latency was analyzed using a two-way ANOVA with repeated-measures design. For appetitive pavlovian conditioning data, CS+ head entry behavior in Figure 2A was analyzed using a three-way ANOVA with repeated-measures design. Head entries after pellet dispenser activation, total head entry rate, and ITI rate were analyzed using a two-way ANOVA with repeated-measures design. Effects of outcome devaluation and contingency degradation were analyzed using a two-way ANOVA with repeated-measures design. Fear conditioning was analyzed using a two-way ANOVA with repeated-measures design, and baseline differences were analyzed using Student's t test. Cell counting data were analyzed using a two-way ANOVA with repeated-measures design. All p values and effects are indicated in the text. All error bars are ±SEM.
Results
AC5KO mice exhibit altered distribution of goal-directed behaviors in operant tasks
To determine the behavioral consequence of AC5 deficiency, mice were first tested for any overt locomotor deficits. AC5KO mice did not differ from WT littermates in distance traveled when tested in the open field, and showed normal dopamine-dependent locomotor activity (supplemental Fig. S1, available at www.jneurosci.org as supplemental material). Because the dorsal striatum and NAcc, areas with high AC5 expression (for AC5 expression pattern, see supplemental Fig. S2, available at www.jneurosci.org as supplemental material), play important roles in associative learning, we tested WT and AC5KO mice in an instrumental learning paradigm. Mice were trained to press a lever for a sucrose reward, and then tested on an RR20 schedule of reinforcement. Analysis of the distribution of responses on the last day of testing revealed marked differences between AC5KO and WT mice. Figure 1, A and B, are representative raster plots of WT and AC5KO mice that show their actions during a single session. Whereas WT mice focused their efforts on lever pressing and only periodically checked the food receptacle, AC5KO mice checked the receptacle frequently (Fig. 1A,B). Comparing the latencies between rewarded and unrewarded lever presses and checking the food receptacle indicated that WT mice discriminated the rewarded lever press from the unrewarded lever press, whereas AC5KO mice did not. WT mice exhibited a long latency to check after unrewarded lever presses and a short latency after rewarded presses (Fig. 1C) (n = 8 WT; latency effect, p < 0.0001). In contrast, AC5KO mice showed no difference in latency to check the food receptacle between rewarded and unrewarded lever presses (Fig. 1C) (n = 8 AC5KO; latency effect, p = 0.47).
In addition, the average length of a bout of lever pressing before checking the food receptacle was significantly shorter in AC5KO mice compared with WT controls [n = 8 per genotype (geno); WT, 8.012 (±2.77 SD); AC5KO, 2.176 (±0.833 SD); genotype effect, p < 0.0001], indicating that completing the required number of presses to obtain a pellet was disrupted in the mutants by unnecessary head entries [mean head entry rate, WT, 2.022 (±0.37 SD); AC5KO, 10.762 (±4.386 SD); genotype effect, p < 0.0001]. Although the AC5KO exhibited significantly more head entries into the food receptacle than WT mice, they pressed at a similar or slightly lower rate [WT, 10.274 (±5.47 SD); AC5KO, 6.65 (±2.9 SD); genotype effect, p = 0.126]. There were no significant differences in total goal-directed actions (lever press plus head entries) or overall reinforcement rate during sessions (supplemental Fig. S3, available at www.jneurosci.org as supplemental material). AC5KO and WT mice acquired the lever press response at an FI20 s schedule of reinforcement (Fig. 1D) (reward bin by geno, p = 0.98) and both groups required a similar number of sessions to reach learning criterion [WT, 3 (±2.62 SD); AC5KO, 3.125 (±1.73 SD); p = 0.912]. Yet they reached different asymptotic performance as AC5KO mice showed a greater latency to receive rewards (Fig. 1D) (genotype effect, p = 0.023), which is consistent with their inefficient performance caused by a higher head entry rate during these sessions (supplemental Fig. S4, available at www.jneurosci.org as supplemental material) (genotype effect, p = 0.0072).
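As a worked illustration of how a genotype comparison such as the bout length result can be checked from the reported summary statistics, the Python sketch below computes a two-sample Student's t statistic from means, SDs, and group sizes. It assumes an unpaired, equal-variance (pooled) test, which the text does not state explicitly; the function name is hypothetical and this is not the analysis code used in the study.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from summary statistics.

    Assumption: independent groups with equal variances (pooled estimate).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


if __name__ == "__main__":
    # Bout length values reported in the text: WT 8.012 (SD 2.77), AC5KO 2.176 (SD 0.833), n = 8 each.
    t_stat, df = pooled_t_from_summary(8.012, 2.77, 8, 2.176, 0.833, 8)
    print(round(t_stat, 2), df)
```

With the reported bout length values, this gives a t statistic of about 5.7 on 14 degrees of freedom.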
AC5KO mice lack reward prediction in appetitive pavlovian conditioning
The instrumental conditioning procedure used has both an instrumental component (learning the lever press action leads to reward outcome) and a pavlovian component [associating the click of the pellet dispenser and sound of pellet drop with the availability of sucrose pellet (Kelley, 2004)]. The instrumental performance of AC5KO mice suggested a deficit in the pavlovian component of the task, that is, an inability to use the cues that indicate reward availability to determine when to press and when to check the food receptacle. To directly assess this possibility, mice were tested in a pavlovian appetitive conditioning task. Mice were presented with a 12 s tone followed by pellet dispenser click and pellet drop (CS). Head entries into the feeder were counted and binned (2 s bins) in histograms around CS presentation and pellet delivery (Fig. 2A). Learning in WT mice was indicated by an increase in discriminative head entries in response to CS presentation (Fig. 2A). Across sessions, WT mice increased anticipatory head entries, whereas AC5KO mice did not (Fig. 2A) (n = 8 per genotype; days by bin by genotype interaction, p < 0.0001). Discriminative head entries immediately after the pellet dispenser activation and pellet drop further highlighted a significant learning curve in WT but not in AC5KO mice (Fig. 2B) (session by genotype interaction, p < 0.0001). Although no significant difference between genotypes was found for head entry rate during the ITI, WT but not AC5KO mice showed a trend of decreasing ITI head entries across sessions (Fig. 2D) (genotype effect, p = 0.27; session by genotype interaction, p = 0.68). Total head entry rate in the session did not differ between AC5KO and WT mice across days (Fig. 2C) (genotype effect, p = 0.58; genotype by session interaction, p = 0.97), indicating that the AC5KO mice have no motor or motivational impairments. 
These data, compared with those in Figure 1, suggest that AC5KO mice do not make excessive head entries; rather, they make indiscriminative head entries.
AC5KO mice form normal action–outcome contingencies
To test whether the AC5KO phenotype in instrumental conditioning derives solely from abnormalities in the pavlovian component of the task or whether they additionally have deficits forming action–outcome contingencies or estimating reward value, mice were assessed for changes in lever-pressing behavior in response to outcome devaluation or contingency degradation. First, mice were tested for their ability to suppress their responding when the reward is devalued by sensory-specific satiety. One day after RR20 training, subjects were fed ad libitum either sucrose pellets (devalued group) or regular chow (valued group) for 1 h before testing for the effect of prefeeding [amount consumed during prefeeding shown in supplemental Fig. S5 (available at www.jneurosci.org as supplemental material)]. Both groups of mice decreased their lever press rate when the outcome (sucrose pellets) had been devalued, suggesting a similar ability to associate the value of the outcome with their instrumental response and adjust responding accordingly (Fig. 3A) (n = 8 per genotype; genotype effect, p = 0.94; value effect, p = 0.02; geno by value, p = 0.86) (supplemental Fig. S6A, head entry rate, available at www.jneurosci.org as supplemental material). Next, mice were tested for the ability to suppress responding when the response–outcome contingency is degraded, that is, when rewards are delivered independent of lever pressing. Food- and water-restricted mice were trained to press one lever for a water reward and another for sucrose pellets on an RR20 schedule of reinforcement. After 4 training days with both levers, the sucrose contingency was degraded by random delivery of sucrose independent of lever pressing. Analysis of lever press rate during the session before contingency degradation compared with the last session of contingency degradation showed that contingency degradation suppressed responding in both WT and AC5KO mice (Fig. 3B) (genotype effect, p = 0.28; degradation effect, p < 0.0001; geno by degradation, p = 0.063) (supplemental Fig. S6B, head entry rate, available at www.jneurosci.org as supplemental material). This effect was specific for the degraded sucrose lever, because the lever press rate for the nondegraded water lever was not significantly altered (Fig. 3C) (genotype effect, p = 0.09; degradation effect, p = 0.94; genotype by degradation, p = 0.32). Both groups of mice exhibited a lower lever press rate for water compared with sucrose, and AC5KO mice exhibited a slightly lower rate of lever pressing on the water lever than WT mice. This is similar to the lower rate of lever pressing for sucrose before contingency degradation (Fig. 3B) and in RR20 training, which could be attributable to the excessive head entries. The lower rate of lever pressing was not attributable to a difference in restriction protocols, or total water consumption, because both groups of mice drank a similar amount of water when restricted (supplemental Fig. S7, available at www.jneurosci.org as supplemental material). Yet, before degradation, AC5KO mice pressed the water lever on average once per minute, whereas in comparable instrumental learning experiments an inactive lever was only sampled once every 4 min (data not shown), suggesting their lever press behavior for water was goal-directed. Importantly, both AC5KO and WT mice showed a clear reduction in lever pressing for sucrose after contingency degradation, suggesting their pressing on the sucrose lever was goal-directed and under the control of a contingency between the action and outcome.
AC5KO mice show normal aversive pavlovian conditioning
To test whether the pavlovian conditioning impairment in AC5KO mice was specific for appetitive conditioning, we tested aversive conditioning in a fear conditioning paradigm. In the conditioning chamber, mice were presented with a 30 s tone (CS) followed by a 2 s, 0.5 mA footshock [unconditioned stimulus (US)]. After conditioning, learning was measured by increases in freezing behavior over preconditioning rates in response to either the contextual cues of the chamber or presentation of the tone CS. There was no significant difference in freezing behavior between AC5KO and WT mice in response to the chamber (context) or tone presentation before conditioning (Fig. 4B) (baseline, cue genotype effect, p = 0.3; context, p = 0.2). Twenty-four hours after training, mice were placed back in the chambers, and freezing as a result of context was measured (Fig. 4A). Contextual freezing did not differ between genotypes (Fig. 4A) (n = 8 WT, 7 AC5KO; genotype effect, p = 0.4; context effect, p = 0.0002; context by geno, p = 0.69). The next day, mice were placed in a modified chamber to eliminate contextual cues (see Materials and Methods) and the tone was presented to measure freezing in response to the CS. Freezing behavior in the altered context was minimal, and no significant difference was seen in freezing behavior before tone delivery (data not shown) (genotype effect, p = 0.13). In response to cue presentation, AC5KO and WT mice exhibited similar freezing behavior (Fig. 4B) (genotype effect, p = 0.3; cue effect, p < 0.0001; cue by geno, p = 0.52), suggesting a similar association between the CS and US was formed in the AC5KO and WT mice. These data also suggested intact sensory processing (i.e., normal ability to hear tones and perceive shock) in AC5KO mice, although the sound of pellet drop in the instrumental conditioning task is quieter than the tone in pavlovian conditioning tasks. 
In situ hybridization studies suggest AC5 levels are very low in the amygdala (Matsuoka et al., 1997) (supplemental Fig. S1, available at www.jneurosci.org as supplemental material), a neural substrate for fear conditioning (LeDoux, 2000), supporting the notion that AC5 deficiency selectively affects striatum/nucleus accumbens-dependent learning.
AC5KO mice have impaired D1 receptor-mediated ERK activation in the NAcc
D1 receptor antagonism can inhibit appetitive pavlovian learning (Eyny and Horvitz, 2003). Although AC5KO mice lack D1-stimulated cAMP production, they retain D1-stimulated locomotor activity, suggesting that specific D1-mediated signaling pathways disrupted by loss of AC5 may be essential for appetitive pavlovian learning. Recent studies indicate a potential role of ERK1/2 in the NAcc in appetitive pavlovian learning (Shiflett et al., 2008). We therefore examined the ability of a D1 agonist to induce activation of ERK in WT and AC5KO mice. Mice were killed 15 min after an injection of either vehicle (0.9% saline) or SKF81297 (5 mg/kg), and brains were prepared for immunohistochemistry. No significant difference in ERK1/2 phosphorylation was seen between genotypes after vehicle injection. However, analysis of D1 agonist-mediated phosphorylation of ERK1/2 in the NAcc revealed marked differences between AC5KO and WT mice. Although injection of D1 agonist produced a robust increase in phosphorylated ERK1/2-positive neurons in the NAcc shell and core of WT mice, this response was severely diminished in AC5KO mice (Fig. 5A,B) (n = 3 per genotype per treatment; core, genotype effect, p = 0.012, drug effect, p = 0.0001, geno by drug interaction, p = 0.0009; shell, genotype effect, p = 0.025, drug effect, p = 0.0015, geno by drug interaction, p = 0.0062). No significant D1-mediated increase in phosphorylated ERK1/2-positive neurons was seen in the dorsal striatum, consistent with published results (Gerfen et al., 2002). This suggests a profound decoupling of D1 receptor activation from downstream activation of ERK1/2 in the NAcc of AC5KO mice, which may underlie the deficits seen in reward learning.
Discussion
The current study indicates a critical role for the striatum-enriched AC5 in appetitive pavlovian learning. In appetitive pavlovian conditioning tasks and in appetitive instrumental conditioning tasks with a pavlovian component, AC5KO mice exhibited impairment in their ability to use cues to predict the availability of reward. In contrast, they acquired an instrumental response for food reward, and their instrumental responding was sensitive to changes in both outcome value and action–outcome contingency. AC5KO mice also showed normal fear conditioning, indicating intact aversive pavlovian learning. These data indicate that AC5 is specifically required for appetitive pavlovian learning.
Distinguishing learning and performance deficits is an enduring challenge in behavioral studies. Pavlovian deficits in AC5KO mice may be explained by a performance deficit rather than by a learning deficit per se. However, a number of observations suggest that this is unlikely. AC5KO mice make the head entry responses at rates similar to WT mice, indicating no motor impairment. Moreover, like WT, they show an increase in head entry behavior between the first and second sessions, indicating they increase their performance of this behavior in response to reward availability. They consume the same quantity of sucrose pellets in both the pavlovian and instrumental tasks, indicating there are no motivational deficits and that the reward is equally desirable for both groups of mice. In addition, AC5KO mice can adjust their instrumental performance in response to changes in the value of reward, suggesting mechanisms linking motivation and performance are intact. In summary, we observe no performance deficits in the AC5KO mice except that they are unable to use cues to predict the availability of reward.
The role of DA in appetitive pavlovian conditioning has been studied extensively (Schultz, 1998; Dalley et al., 2002; Eyny and Horvitz, 2003; Day et al., 2007). It has been demonstrated that DA cells increase their activity in response to unexpected reward, which has led to the “prediction error” hypothesis of DA (Schultz, 1998). In this model, a sudden burst of DA activity in response to unexpected reward serves as a teaching signal, reinforcing an association between the reward and the preceding stimuli so that the animal can better predict reward in the future (Schultz et al., 1997; Schultz, 2002). Current understanding of the dopaminergic modulation of corticostriatal plasticity is consistent with this model (Reynolds et al., 2001; Reynolds and Wickens, 2002). In the presence of low extracellular DA associated with tonic activity, coincident presynaptic and postsynaptic activity at medium spiny neurons (MSNs) results in long-term synaptic depression (Calabresi et al., 2007); however, with transient, high concentrations of DA achieved during phasic DA release, this same coincident activity results in long-term potentiation (Wickens et al., 1996; Reynolds and Wickens, 2002). This arrangement serves to integrate midbrain dopaminergic and cortical glutamatergic input in the striatum (Reynolds et al., 2001; Reynolds and Wickens, 2002) and provides a mechanism whereby DA can act as a teaching signal by facilitating synaptic plasticity in response to reward (Reynolds and Wickens, 2002). Reports that either D1 or NMDA receptor antagonism can inhibit appetitive pavlovian learning (Di Ciano et al., 2001; Eyny and Horvitz, 2003) are consistent with this view. Previous studies have demonstrated a downregulation of D1 receptors in the striatum of AC5KO, and modulation of cAMP levels in response to D1 or D2 stimulation is impaired in the AC5KO mice (Iwamoto et al., 2003). In addition, the data presented here indicated a loss of D1-mediated increases in activation of ERK1/2. 
Thus, it is reasonable to hypothesize that these alterations in D1-mediated signaling cascades in AC5KO mice may have significant effects on corticostriatal plasticity. The loss of pavlovian learning further suggests possible plasticity deficits in the striatum of AC5KO mice.
A competing perspective on the role of DA in reward is the “incentive salience” hypothesis (Berridge, 2007). In this view, DA can attribute incentive salience to the CS and therefore the CS serves to motivate behavior. Thus, it is possible that the AC5KO mice do form the CS–US association, but that the CS does not exert incentive control over head entry behavior. In the outcome devaluation experiment, however, the AC5KO mice modulate their lever-pressing behavior in response to changes in the value of the outcome, suggesting that motivational control of behavior is intact. In addition, both WT and AC5KO mice equally scale up their head entry behavior in response to reward (Fig. 2C); only the AC5KO do it indiscriminately, suggesting a deficit in reward prediction rather than incentive control of behavior.
The results presented here indicate a decoupling of D1 receptor activation from phosphorylation of ERK1/2 in the NAcc of AC5KO mice. It has been suggested that D1 regulates the phosphorylation of ERK1/2 via a cAMP-dependent regulation of DARPP-32, which inhibits protein phosphatase-1, which induces an activation of ERK1/2 (Valjent et al., 2005). Although our data extend previous observations of impaired D1-cAMP signaling in the striatum of AC5KO mice (Iwamoto et al., 2003), and recent studies have reported an increase in ERK1/2 activation after pavlovian conditioning (Shiflett et al., 2008), additional studies will be required to determine whether loss of D1-mediated ERK1/2 activation underlies the deficits in pavlovian conditioning seen in AC5KO mice.
Because appetitive pavlovian learning is an important component of many behaviors, impairments in pavlovian learning could potentially have many consequences. Impaired performance in instrumental behavior is one such consequence, as demonstrated by our data. The reward pathway is often implicated in impulsive choice behavior (Belin et al., 2008). Although the excessive head entry behavior displayed by AC5KO mice in Figure 1 was most likely attributable to indiscriminate head entries, how impairment in appetitive pavlovian learning may affect impulsive choice and addiction remains to be examined in the AC5KO mice.
The present finding that AC5 plays a significant role only in the pavlovian component of instrumental conditioning needs to be reconciled with published work reporting that inhibition of the cAMP pathway in the ventral striatum inhibits learning of an instrumental task (Baldwin et al., 2002). In that study, rats receiving protein kinase A (PKA) inhibitors acquired the instrumental lever press, but acquisition was slowed and performance was inhibited. However, the task incorporated a strong pavlovian component: a correct lever press was always followed by a pavlovian cue (3 s house light offset and red signal light onset) that was then followed by food delivery (Baldwin et al., 2002). In such a design, impaired pavlovian learning would impair instrumental performance. In an alternative task design in which a lever press is immediately followed by reward delivery, minimizing the pavlovian component, PKA inhibitors had no effect on instrumental responding for food reward (Self et al., 1998).
The studies presented here indicate that AC5KO mice are sensitive to changes in outcome value and action–outcome contingency. However, previous studies have reported that lesions of the dorsomedial striatum (Yin et al., 2005) and the NAcc core (Corbit et al., 2001), areas with high AC5 expression, produce insensitivity to outcome devaluation, suggesting a role for these structures in this aspect of instrumental learning. In contrast, the selective genetic manipulation used here, which leaves intact the basal ganglia thalamocortical loop and non-AC5-dependent signaling pathways, preserves sensitivity to outcome value. This suggests two possible interpretations of the role of DA-mediated signaling mechanisms in instrumental learning. One is that DA signaling is critical to these behaviors but is mediated through downstream effectors other than AC5. Although AC5KO mice show severely reduced DA receptor modulation of cAMP content, reduced D1 receptor levels in the striatum, and severely diminished D1-mediated activation of ERK1/2, they respond robustly to D1 receptor stimulation in locomotor assays, suggesting the significance of alternative downstream signaling pathways in MSNs (Iwamoto et al., 2003). Recent studies have identified non-cAMP-dependent DA receptor signaling pathways in MSNs that mediate DA-dependent behaviors (Beaulieu et al., 2005). Alternatively, DA-dependent signaling mechanisms may not be required for some forms of associative learning, as has been reported with genetically engineered DA-deficient mice (Robinson et al., 2005). Discriminating between these possibilities will require future experiments that manipulate other downstream effectors of DA signaling.
In conclusion, the present study demonstrates that a specific adenylyl cyclase isoform, AC5, is required for appetitive pavlovian learning. Genetic deletion of AC5 abolishes the animal's ability to use environmental cues to predict reward availability. This deficit further impairs instrumental performance when the task includes a pavlovian component, that is, the need to use predictive cues. These findings demonstrate that the striatum-enriched AC5 plays an important role in classical pavlovian learning.
Footnotes
- This work was supported by National Institute of Mental Health Grants 1F31MH076422 (M.A.K.) and MH66216, National Institute on Drug Abuse Grants 1F32DA020427 (J.A.B.) and DA022269, National Heart, Lung, and Blood Institute Grant HL059139, National Institute of General Medical Sciences Grant GM067773 (Y.I.), and The Edward Mallinckrodt Jr Foundation (X.Z.). We thank Rui Costa, Linan Chen, Wei-Jen Tang, Jon Horvitz, Peter Balsam, Peggy Mason, and Cristianne Frazier for helpful discussions, and Zhen Fang Huang Cao and Stephanie Tang for technical assistance.
- Correspondence should be addressed to Xiaoxi Zhuang, Department of Neurobiology, The University of Chicago, 924 East 57th Street, Knapp R214, Chicago, IL 60637. xzhuang{at}bsd.uchicago.edu