The present study evaluated the proposal that mice with a targeted deletion of the glutamate receptor 1 (GluR1) subunit of the AMPA receptor are impaired in using an instrumental or pavlovian signal to gain access to a representation of the sensory-specific motivational properties of a primary reward. In experiment 1, mice were trained to approach two goal boxes in a plus-maze; each goal box contained a different reward (sucrose solution vs food pellet). After acquisition, one of the rewards was devalued by an outcome-specific satiety procedure. Subsequent test trials performed in extinction showed an increase in the latency to enter the devalued goal arm, relative to the nondevalued goal arm, in control but not GluR1-/- mice. In experiment 2, a similar outcome-specific satiety procedure was used to examine the effects of reward devaluation on an instrumental nose-poke response. During testing, control but not GluR1-/- mice decreased their rate of responding on a nose poke associated with a devalued reward. A subsequent choice test showed that GluR1-/- mice were able to discriminate between the devalued and nondevalued outcomes used in both experiments. These deficits mirror those seen after lesions of the basolateral amygdala and suggest that GluR1-mediated neurotransmission in this region contributes to encoding the relationship between sensory-specific aspects of reward and their incentive value.
AMPA receptors are hetero-oligomeric proteins composed of subunits glutamate receptor 1 (GluR1) to GluR4 (or GluRA to GluRD) (Shi et al., 2001). Recent research using genetically modified mice has implicated the GluR1 subunit in both long-term potentiation and learning and memory processes (Zamanillo et al., 1999; Reisel et al., 2002; Schmitt et al., 2003) (but see Hoffman et al., 2002).
Mead and Stephens (2003a) reported that mice lacking the GluR1 subunit of the AMPA receptor are capable of forming a pavlovian association between a conditioned stimulus (CS) and the delivery of a reward [unconditioned stimulus (US)] and showed normal pavlovian conditioned approach responses to the site of food delivery. In addition, a pavlovian CS augmented instrumental responding for the same outcome in both knock-out and control mice [pavlovian-instrumental transfer (PIT)]. However, when required to learn a novel instrumental response to obtain presentations of a CS (Mackintosh, 1974) or respond under a second-order schedule of reinforcement, GluR1-/- mice were impaired relative to control mice.
Mead and Stephens (2003a) suggested that the GluR1 deletion disrupted processing of the motivational properties of a US performed by the basolateral amygdala (BLA). Nevertheless, GluR1-/- mice clearly remained sensitive to some aspects of reward as shown by their unimpaired performance in pavlovian and instrumental conditioning tasks. This raises the question of the nature of the reward representation disrupted by the GluR1 mutation. Representations of a US involve both sensory and motivational properties of reinforcement, and cues may form associations with both of these features of a reward representation (for review, see Balleine, 2001). Recent research has shown that lesions of the BLA in rats disrupt the formation of representations involving the sensory properties of a US and their incentive value (Blundell et al., 2001; Balleine et al., 2003). Second-order conditioning and conditioned reinforcement procedures (Mead and Stephens, 2003) do not discriminate between the influence of sensory-specific features of reinforcement and the general motivational properties of reward on performance (cf. Holland and Rescorla, 1975; Stanhope, 1992) (for review, see Gewirtz and Davis, 2000).
In the present study, we used an outcome-specific devaluation procedure to examine whether GluR1-/- mice are able to encode the sensory-specific incentive motivational properties of reward (Balleine and Dickinson, 1991, 1998; Berridge, 1996, 2001; Balleine, 2001). In this procedure, animals are trained to perform different actions, each associated with a different reward. The motivational significance of one of the rewards is then changed, and the propensity of the animal to subsequently perform the action associated with the devalued outcome is assessed (Balleine and Dickinson, 1991, 1998). Typically, responding to the cue associated with the devalued outcome is lower in control animals. In contrast, rats with BLA lesions fail to suppress responding to the devalued cue (Hatfield et al., 1996; Balleine et al., 2003). In two experiments, we examined whether GluR1-/- mice would be sensitive to an outcome-specific devaluation procedure using a runway approach task and an instrumental nose-poke procedure.
Materials and Methods
Experiment 1 was conducted in two replications with age-matched GluR1-/- (n = 10) and wild-type (n = 10) mice. The mice were bred from heterozygous GluR1+/- parents at the Department of Experimental Psychology, University of Oxford, and transferred to the School of Psychology, Cardiff University, for behavioral testing at 8 months of age [for details of genetic construction, breeding, and subsequent genotyping, see Zamanillo et al. (1999)]. The mice were derived from 129S2/SvHsd and C57BL/6JolaHsd background strains. Mice were housed two or three to a cage under a 12 h light/dark cycle (lights on 7:00 A.M. to 7:00 P.M.). Before the start of training, mice were reduced to 85% of their ad libitum weights and weighed 25-30 g at the beginning of the experiment. All testing took place during the light phase between 9:00 A.M. and 5:00 P.M. Experiment 2 was conducted with experimentally naive age-matched GluR1-/- (n = 12) and wild-type (n = 12) mice that were maintained under the same schedule of food deprivation as described above. All experiments complied with United Kingdom Home Office guidelines on the use of animals in scientific procedures.
The elevated plus maze consisted of four arms, which were 8 cm wide, 50 cm long, and 10 cm high. The floor of the maze was made of wood and was painted white. The surrounding walls were made of clear Perspex. The guillotine doors, used to block the start and goal arms, were made of opaque black Perspex. Each arm contained a circular food well sunk into the floor at the end of the runway. In each goal box, a pair of infrared photo beam sensors were located on either side of the food well and were used to time the latency of the mice to traverse the runway from the start box to the goal box. The latency data were recorded using an IBM-compatible personal computer using the Graphic State Notation package (Coulborn Instruments, Allentown, PA). The maze was elevated 90 cm from the floor. Two ceiling-mounted fluorescent lights illuminated the experimental room. A variety of visual cues (e.g., benching, racks, and posters) was displayed on and along each of the four walls of the testing room.
Instrumental conditioning was performed in six identical, standard operant chambers (Med Associates, St. Albans, VT) housed in sound-attenuating boxes. The dimensions of the chambers were 15 cm wide, 12 cm high, and 14 cm deep. The chambers were made from clear polycarbonate, and the front and back of the chambers were fabricated from stainless steel. The floor was a standard grid floor, with 20 stainless steel rods, each with a diameter of 2.5 mm, arranged with centers 5 mm apart. The chambers were fitted with two nose-poke manipulanda, each 10 mm in diameter, and located at identical heights (15 mm) on the left and right sides of the front wall. Each nose poke contained a yellow stimulus light located at the rear of the recessed hole and a photo beam sensor to monitor nose-poke entries. Located between the two nose pokes was a trough-type dual pellet/dipper dispenser, into which either 0.1 ml of liquid reward or food pellets could be delivered. This modular unit featured a 2.5 × 2.5 cm receptacle opening with a photo beam inside. A speaker was mounted to the outside of the chamber on the wall opposite the nose pokes. The speaker was connected to a 3 kHz tone generator. A heavy-duty clicker module was also mounted on this wall and could be switched on and off to emit a 10 Hz train of clicks. The tone and click train were measured and matched to emit a sound level of ∼80 dB. A 28 V, 100 mA house light was mounted at the top center of the inner wall. An IBM-compatible computer equipped with Med-PC software (Med Associates) controlled and recorded all stimuli and responses.
Experiment 1: plus maze acquisition and outcome devaluation
During each experimental phase, the plus maze was used as a simple runway. This effectively created two intersecting runways, each with a start box at one end and a goal arm at the other. One reinforcer was assigned to one goal box and the remaining reinforcer to the alternative goal box. The rewards were individual 20 mg food pellets (Noyes precision pellets, Formula A1; Research Diets, New Brunswick, NJ) and 0.1 ml 20% (wt/vol) sucrose solution. The allocation of goal box and reward type was fully counterbalanced for both the GluR1-/- and control mice.
After 2 d of habituation, which involved 20 min of exposure to the plus maze and the rewards, the mice were then trained to traverse each runway from the start arm to the goal arm. Entries into the goal boxes were rewarded with access to one of the two outcomes. During each of the 10 d of training, each mouse received two 10-trial training sessions, one in each of the alternate runways. These sessions were separated by an ∼4 h interval. The order in which the animals received exposure to each runway, either in the morning or afternoon, was counterbalanced across days and within groups. The arms of the maze were wiped down with 70% alcohol solution between each run in the apparatus.
After this stage of training, the mice received an outcome devaluation test. This was achieved by prefeeding the mice with one of the two outcomes for 120 min in their home cages. Consumption of the outcome was expected over time to induce a progressive reduction in food deprivation, whereas consummatory contact with the outcome provided the opportunity for incentive learning about the reduction in palatability of the outcome (Balleine and Dickinson, 1998). The allocation of the reward for the devaluation treatment was counterbalanced within each group. Immediately after the devaluation treatment, the mice received a series of test trials performed in extinction. Half of the mice (equal numbers of GluR1-/- and control mice) were tested initially on the devalued goal arm, whereas the remaining half was tested on the nondevalued goal arm. Mice were allowed to traverse each runway for a total of 18 trials. However, those animals that failed to complete a single run within 2 min were considered to have extinguished responding, and testing was discontinued. At completion of testing on one goal arm, mice were immediately tested on the alternative goal arm.
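The stopping rule for the extinction test can be expressed compactly. The sketch below is only an illustration of the rule as described above; the function name and the convention that an uncompleted run is logged as a latency above the 2 min cutoff are assumptions, not part of the original procedure.

```python
from typing import Iterable, List

MAX_TRIALS = 18    # maximum runs per goal arm (from the text)
CUTOFF_S = 120.0   # 2 min latency criterion (from the text)


def completed_latencies(latencies_s: Iterable[float]) -> List[float]:
    """Return the latencies of runs completed before testing stopped.

    Testing on an arm ends after MAX_TRIALS runs or at the first run
    that is not completed within CUTOFF_S seconds, whichever is first.
    Assumption: a failed run is recorded as a latency above CUTOFF_S.
    """
    kept: List[float] = []
    for latency in latencies_s:
        if len(kept) == MAX_TRIALS or latency > CUTOFF_S:
            break  # animal treated as having extinguished responding
        kept.append(latency)
    return kept
```

Under this rule, a mouse that fails on its third run contributes only two latencies for that arm, which is why later trials contain progressively fewer animals.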
Experiment 2: instrumental conditioning and outcome devaluation
Stage 1: magazine training. Each animal was assigned to one of six operant chambers and, thereafter, was always trained in that chamber; the assignment of each chamber was counterbalanced between groups. At the start of the session, the house light came on and remained on during the session. Throughout training, the rewards were either a single 20 mg Noyes food pellet or 0.1 ml of 20% sucrose solution. Mice were trained to collect food rewards for 2 d, with two 20 min sessions per day. The rewards were delivered on a random time 60 s schedule. Magazine entry during this training session was recorded. Half the mice (equal numbers of GluR1-/- and control mice) were trained to collect food pellets in the morning session, and half were trained to collect sucrose solution. In the afternoon session, mice received identical training with the alternative reward. The next day, the order of training was reversed, so that each mouse received each reward for one morning and one afternoon session.
Stage 2: nose-poke training. After magazine training, the mice were initially trained to respond on the nose-poke manipulanda during two sessions with a continuous schedule of reinforcement (CRf). Each session lasted for 20 min. The mice received two separate training sessions on each nose poke, with background illumination provided by a house light. After two sessions of training, the house light was turned off to enhance the salience of the nose-poke light.
On the next 4 d of training, each session was 20 min long, and the mice received two training sessions each day, one in the morning and the second (∼4 h later) in the afternoon. Action-outcome assignment was counterbalanced within each group. Throughout training, mice were given two separate sessions each day, one on the right nose poke and the other on the left nose poke, with the action that was trained first on each day alternating from one day to the next. During the training phase, both nose-poke manipulanda were present, but only the active nose poke was illuminated. Mice were initially trained to respond for 2 d with a CRf schedule. If animals did not complete 20 nose pokes, they underwent an additional training session on that nose poke before proceeding to the next stage of training. To increase the overall rate of nose-poke responding in each session, the schedule of reinforcement was made progressively leaner. The mice were first transferred to a fixed ratio 5 (FR-5) schedule, during which every fifth nose poke resulted in the delivery of reward. For the final session of training, the schedule was advanced to an FR-15 schedule. Mice that failed to complete 50 nose pokes during the final day of training were excluded and did not proceed to the discrimination phase of training. A total of four control mice and one GluR1-/- mouse failed to reach this criterion and were subsequently excluded from the remainder of the experiment. Thus, a total of eight control mice and 11 GluR1-/- mice continued to the biconditional discrimination training stage.
Stage 3: biconditional stimulus-response-reinforcer training. In the discrimination phase, which lasted 14 d, each session was 30 min long and consisted of 10 alternating 2 min presentations of either a 3 kHz tone (at 80 dB) or a 10 Hz train of clicks (at 80 dB), with an intertrial interval (ITI) of 1 min. During the discrimination training stage, both nose pokes were illuminated. The assignment of the subjects to the biconditional stimulus-action-outcome discriminations was counterbalanced. For half the mice in each group, activation of the right nose poke during presentations of the tone resulted in the delivery of food pellets, whereas activation of the left nose poke during presentations of the clicker resulted in the delivery of sucrose solution. For the remaining mice in each group, the stimulus-action-outcome assignments were reversed. During the ITI, reward was not available. The first discriminative stimulus presented in each training session was determined by the computer using a pseudorandom sequence that ensured the animals received equal numbers of each trial type in each session. For the first 2 d of training, reward delivery was available on a CRf schedule. On day 3, the mice were trained on a random interval (RI) 5 s schedule. This contingency continued for the following session, after which the reinforcement contingencies were altered to an RI 10 s schedule. An increment in the RI schedule then occurred every 2 d according to the following sequence: 15, 20, 25, and 30 s. Thus, during the final 2 d of discrimination training, reward delivery was made available on an RI 30 s schedule.
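The progression of reinforcement schedules across the 14 d of discrimination training, and the resulting session length, can be laid out explicitly. The following is a minimal sketch that assumes one session per day, 2 d at each schedule value (including RI 10 s), and one ITI per stimulus presentation; the function names are hypothetical.

```python
from typing import List


def discrimination_schedule() -> List[str]:
    """Return the reinforcement schedule for each training day (day 1 first)."""
    schedule = ["CRf", "CRf"]  # days 1-2: continuous reinforcement
    for interval_s in (5, 10, 15, 20, 25, 30):
        # Assumption: every random-interval value is in force for 2 days.
        schedule += [f"RI {interval_s} s"] * 2
    return schedule


def session_minutes(n_presentations: int = 10,
                    stimulus_min: int = 2,
                    iti_min: int = 1) -> int:
    """Session length, assuming each presentation is paired with one ITI."""
    return n_presentations * (stimulus_min + iti_min)
```

With these assumptions, the schedule spans exactly 14 d and ends with 2 d at RI 30 s, and 10 presentations of 2 min with 1 min ITIs give the stated 30 min session.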
Stage 4: biconditional discrimination extinction test. After completion of training, mice received a test session conducted in extinction to examine whether performance was governed by within-session reinforcement contingencies or whether the mice had learned the appropriate instrumental contingencies. The procedure was identical to that used for the training session, but no rewards were delivered. After the extinction test, mice received 4 d of retraining on the original discrimination. For the first 2 d of training, reward delivery was available on a CRf schedule. On day 3, the mice were trained on an RI 15 s schedule. Finally, on day 4, mice were trained on an RI 30 s schedule, during which asymptotic performance was reestablished.
Stage 5: outcome devaluation and extinction test. The outcome devaluation test was conducted on the day after the final reacquisition session. This was achieved by prefeeding the mice with one of the two outcomes for 120 min in their home cages located in the holding room. The allocation of devaluation treatment to each mouse was counterbalanced for the stimulus (tone vs clicker), for the action (left nose poke vs right nose poke), and for the outcome (pellet vs sucrose). Immediately after the devaluation treatment, the mice received an extinction session. The procedure was identical to that described above. Finally, after the extinction test, the mice were placed back into their holding cages and were administered a 30 min choice test, in which both outcomes were presented. Food pellets were presented in a dish located at one end of the home cage. A bottle containing the sucrose reward was located at the opposite end of the cage. The amount of fluid and food consumed was obtained by weighing the containers before and after each choice test.
Results
Experiment 1: plus maze and outcome devaluation
Figure 1a shows the latencies to traverse the runway from the start arm to the goal box for both GluR1-/- and control mice. During training, all mice showed a reduction in latency to retrieve both types of reinforcement. This impression was confirmed by a three-way mixed ANOVA with genotype, reinforcer type, and session as factors, which revealed a main effect of session (F(9,26) = 3.936; p < 0.01). No other main effects or interaction terms were significant (largest F value; genotype, F(1,26) = 3.064; p > 0.08).
Outcome devaluation: extinction test
The results of the extinction test for GluR1-/- and control mice are shown in Figure 1b. During testing, an increasing number of control mice reached the 2 min latency criterion as the extinction test proceeded. Therefore, to provide a meaningful comparison with mutant mice, only the first five trials of the extinction test were analyzed. Up to this stage, all of the control mice successfully completed each trial. Inspection of Figure 1b suggests that control mice showed a gradual increase in latency to reach the goal box associated with the devalued reward across the test, relative to the nondevalued goal box. In contrast, GluR1-/- mice failed to show any evidence of this discrimination.
To evaluate these differences, a three-way mixed ANOVA was conducted with factors of genotype, devaluation treatment, and trial. This revealed no overall effect of genotype or devaluation treatment (largest F value; F(1,24) = 2.283; p > 0.10). However, there was a significant interaction between these two factors (F(1,24) = 4.408; p < 0.05). An analysis of simple main effects revealed that the control mice showed a significantly longer latency to reach the devalued goal box than the nondevalued box (F(1,24) = 6.518; p < 0.05). In contrast, there was no significant difference between latencies to enter the two goal boxes in GluR1-/- mice (F < 1).
Experiment 2: instrumental conditioning and outcome devaluation
Acquisition of conditional discrimination
Figure 2a shows the mean rates of responding in the correct and incorrect nose pokes across training for GluR1-/- and control mice. Inspection of this figure suggests that there was a tendency for GluR1-/- mice to respond at a higher rate than controls. A three-way mixed ANOVA, with genotype, session, and response type (correct vs incorrect) as factors, indicated that the level of responding was not significantly different between the two groups (F(1,72) = 2.219; p > 0.10). However, there was a main effect of session (F(13,936) = 10.374; p < 0.01) and response type (F(1,72) = 4.864; p < 0.01). The ANOVA also revealed a session-response type interaction (F(13,936) = 6.826; p < 0.01). To investigate the nature of this interaction, an analysis of the simple main effects was conducted. This revealed a main effect of response type from sessions 8-14 (smallest F value; session 8, F(1,118) = 4.537; p < 0.05). To ensure that we were sensitive to group differences that were not contaminated by differences in rates of responding, the data were transformed into a discrimination ratio. Discrimination ratios (Fig. 2b) were calculated by dividing the mean rate of responding (in responses per minute) to the reinforced stimuli by the combined mean rate of responding to reinforced and nonreinforced stimuli. A ratio that exceeds 0.5 indicates that responding during the correct stimulus presentations was greater than that during incorrect stimulus presentations. A two-way ANOVA was conducted on the discrimination scores, with genotype and session as factors, and revealed a significant main effect of session (F(13,238) = 8.55; p < 0.01). However, there was no main effect of genotype or interaction between these two factors (F < 1).
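The discrimination ratio described above can be written as a short function. This is a sketch of the calculation as we read it from the text (correct rate divided by the sum of correct and incorrect rates, which is what makes 0.5 the indifference point); the function name and the handling of sessions with no responding are assumptions.

```python
def discrimination_ratio(correct_rate: float, incorrect_rate: float) -> float:
    """Ratio of correct responding to total responding.

    Both arguments are mean response rates in responses per minute.
    Values above 0.5 mean more responding on the correct nose poke.
    Assumption: if the animal made no responses at all, the ratio is
    undefined and NaN is returned rather than an arbitrary value.
    """
    total = correct_rate + incorrect_rate
    if total == 0:
        return float("nan")
    return correct_rate / total
```

For example, a mouse responding at 6 responses per minute on the correct nose poke and 2 on the incorrect one has a ratio of 0.75, whereas equal rates give 0.5 regardless of the overall level of responding. This is why the ratio removes the group difference in absolute response rate from the comparison.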
The results of the extinction test are presented in Figure 3a and show the discrimination ratios for GluR1-/- and control mice for the last session of training and during the extinction test. Both groups of mice maintained the discrimination, indicating that the performance during acquisition was not conditional on cues supplied by the delivery of rewards within a session. A two-way ANOVA, with genotype and phase as factors, revealed a main effect of phase (F(1,17) = 45.595; p < 0.01), reflecting generally higher rates of responding during the extinction test, but no effect of genotype (F(1,17) = 3.871; p > 0.05) or interaction (F < 1) between these factors.
Figure 3b shows the rate of nose-poke responding in 5 min bins across the extinction test. Inspection of this figure shows that the discrimination declined during the extinction session. However, this effect was more apparent in the control mice than in the GluR1-/- mice. A three-way mixed ANOVA, with factors of genotype, response type, and time bin, revealed a main effect of genotype (F(1,72) = 6.143; p < 0.02), response type (F(1,72) = 23.528; p < 0.01), and time bin (F(4,288) = 10.889; p < 0.01). In addition, a genotype-time bin (F(4,288) = 3.246; p < 0.05) and a response type-time bin (F(4,288) = 5.099; p < 0.01) interaction were revealed. However, no additional interactions were significant (F < 1). Simple main effects analysis conducted on the significant genotype-time bin interaction revealed a main effect of genotype at bins 2-4 (smallest value; bin 4, F(1,194) = 4.055; p < 0.05), with both GluR1-/- (F(4,288) = 3.239; p < 0.02) and control mice (F(4,288) = 9.852; p < 0.01) showing a decline in performance across the extinction session. Examination of the response type-time bin interaction revealed a significant effect of response type at each time bin (smallest value; bin 4, F(1,194) = 5.263; p = 0.05), with a progressive decline across the session in correct responses (F(4,288) = 15.073; p < 0.01) but not incorrect responses (F(4,288) = 1.324; p > 0.10). These results indicate that although GluR1-/- mice responded at a higher rate than control mice during extinction, both groups maintained the discrimination and showed a reduction in the rate of responding to the correct nose poke during the extinction test session.
Outcome devaluation: extinction test
Figure 4a shows the mean rates of nose-poke responding from control and mutant mice during the devaluation extinction test in 5 min bins. Inspection of this figure shows that control mice showed a lower rate of responding to the nose poke associated with the devalued outcome from the start of testing. The mice showed higher levels of responding to the nondevalued nose poke that declined during the test. In contrast, the rate of responding to the devalued and nondevalued nose poke was similar across the test for GluR1-/- mice. This impression was confirmed by an ANOVA with genotype, nose poke, and time bin as factors, which revealed a nonsignificant main effect of group (F(1,17) = 3.62; p > 0.05), a nonsignificant main effect of devaluation treatment (F < 1), and a nonsignificant interaction between these factors (F < 1). There was a main effect of time bin (F(4,68) = 4.81; p < 0.01), an interaction of nose poke with time bin (F(4,68) = 5.81; p < 0.01), and, importantly, a significant three-way interaction of group, nose poke, and time bin (F(4,68) = 3.88; p < 0.01).
To interpret the three-way interaction, separate ANOVAs were conducted for each genotype, with a within-subjects factor of devaluation treatment and time bin. For control mice, the analysis revealed a main effect of devaluation (F(1,7) = 10.67; p < 0.05), a main effect of time bin (F(4,28) = 3.66; p < 0.02), and a significant interaction between these factors (F(4,28) = 8.41; p < 0.01). Tests of simple main effects showed that there was a significant difference between responding on the devalued versus nondevalued nose pokes on bin 1 (F(1,7) = 9.45; p < 0.05) and on bin 2 (F(1,7) = 15.57; p < 0.01). There were no significant differences in nose-poke responding during bins 4 or 5 (maximum value; bin 4, F(1,7) = 1.50; p > 0.18).
A similar analysis performed on the data from GluR1-/- mutant mice showed no significant main effect of devalued versus nondevalued nose-poke responding (F < 1), time bin (F(4,40) = 1.93; p > 0.12), or interaction between these factors (F < 1).
An additional analysis with genotype and time bin as factors confirmed that the rate of responding on the devalued nose poke differed significantly between the two groups (F(1,17) = 7.02; p < 0.05). There was no main effect of time bin (F < 1) nor a significant interaction of this factor with group (F(4,68) = 1.91; p > 0.10). Thus, control mice showed a clear devaluation effect, as shown by their differential responding on the devalued and nondevalued nose pokes. In contrast, the devaluation treatment failed to alter nose-poke responding in GluR1-/- mice. They showed comparable levels of performance on the devalued and nondevalued nose pokes.
Figure 4b shows the results of the reward choice test for GluR1-/- and control mice. Both groups consumed more of the nondevalued reward than the devalued reward. A two-way ANOVA confirmed this observation, revealing no main effect of genotype (F(1,17) = 1.991; p > 0.05) but a main effect of food choice (devalued vs nondevalued; F(1,17) = 2.358; p < 0.05). Additionally, no interaction between the two factors was found (F < 1). Thus, the absence of a devaluation effect in the GluR1-/- mice cannot be attributed to a failure of GluR1-/- mice to discriminate between the two outcomes.
Finally, the lack of a devaluation effect in the GluR1-/- mice cannot be attributed to a difference in the number of pellets or the amount of sucrose solution consumed by the two groups during the specific satiety treatment. Control animals ate, on average, 1.29 g of pellets and drank 2.18 ml of sucrose solution, and GluR1-/- mice ate 1.72 g of pellets and drank 2.32 ml of sucrose solution during the prefeeding phase. Analysis of the means revealed no significant differences between the groups (F < 1).
Discussion
The main aim of the present study was to investigate the proposal that GluR1-/- mice are impaired in processing the sensory-specific incentive value of reward. In experiment 1, mice were trained to retrieve two different reinforcements from two different goal boxes in a plus maze. During a test trial in which one of the rewards had been devalued by a satiety devaluation procedure, control mice showed an increase in latency to enter the goal box associated with the devalued outcome compared with the nondevalued goal box. In contrast, GluR1-/- mice failed to show evidence of a devaluation effect, and their latencies failed to differentiate between the two goal boxes. In experiment 2, GluR1-/- and control mice acquired a biconditional discrimination in which two discriminative stimuli signaled different action-outcome contingencies. In a specific satiety devaluation test, control, but not GluR1-/-, mice showed a lower rate of responding to the nose poke associated with the devalued outcome. Thus, although GluR1-/- mice were able to acquire the instrumental discrimination, their performance was unaffected by a manipulation that changed the motivational value of a specific reward.
The deficit in outcome-specific devaluation shown by GluR1 mutant mice is unlikely to reflect a gross impairment in the reinforcing properties of a US. For example, there was no evidence of deficits in the acquisition of either the runway approach task or the instrumental nose-poke discrimination (Mead and Stephens, 2003a). In addition, the deficit in the devaluation treatment is unlikely to reflect a simple impairment in discrimination between the two different rewards. In experiment 2, control and GluR1-/- mice showed a preference for consuming the nonsatiated reward. This pattern of results indicates that GluR1-/- mice remain sensitive to at least some effects of reward and can discriminate between the sensory features of the outcomes.
Recent work has suggested that the BLA is specifically involved in encoding the sensory-specific aspects of motivationally significant events and in associative processes that allow other stimuli to access the incentive value of their associated rewards (cf. Balleine et al., 2003). According to this account, pavlovian and instrumental learning in BLA-lesioned rats are supported by a reward representation system such as a stimulus-response (S-R) reinforcement process or one in which only the generalized properties of reward are encoded. Like rats with BLA lesions, GluR1-/- mice appear to possess a generalized reinforcement process that is sufficient to support instrumental and pavlovian responding (Mead and Stephens, 2003a; present study). In addition, GluR1-/- mice remain sensitive to the direct perception of the palatability of rewards, because they displayed the same ability as control mice to reject the sated reward in the home cages preference test. One conclusion that may be drawn from this pattern of results is that the GluR1-/- mutation disrupts processes that allow a cue to gain access to associatively activated representations of the sensory-specific features of reward and their incentive properties.
Mead and Stephens (2003a) reported that GluR1-/- mice were impaired in second-order conditioning and conditioned-reinforcement paradigms and suggested that the mutant mice were unable to use cues to assess the current motivational properties of reward. Theoretical characterizations of the mechanisms that support second-order conditioning have focused on the following two types of explanation: (1) stimulus-stimulus (S-S) learning in which associations are formed between CS2 and CS1 and an associative chain between CS1 and the reward representation (Rescorla, 1979); and (2) S-R learning in which an association forms between CS2 and a response evoked by CS1 (Rizley and Rescorla, 1972). The second-order conditioning deficit in GluR1 mutant mice may be explained either by impaired learning about the sensory properties of the motivationally significant events or by impaired S-R learning. However, the latter explanation is unlikely given evidence from the present study that GluR1 mutant mice can acquire conditioned responding normally but show impaired reinforcer-specific devaluation. This finding suggests that GluR1-/- mice are able to acquire conditioned responding through an S-R mechanism or one in which the representation of the reward does not include the sensory-specific incentive value of the outcome. Mead and Stephens (2003a) also showed that GluR1-/- mice were impaired in a conditioned reinforcement paradigm. Deficits on this type of task may also be explained by an S-S or an S-R learning impairment. Indeed, there is some evidence that monkeys and rats with BLA lesions can show partial or transient deficits in conditioned reinforcement (Burns et al., 1993; Malkova et al., 1997), which suggests that under some circumstances, S-R learning may be sufficient to support responding for a conditioned reinforcer (for additional discussion, see Blundell et al., 2001).
It is important to note, however, that Mead and Stephens (2003a) also reported that GluR1-/- mice showed a normal PIT effect. The single-outcome version of PIT has been taken to reflect the acquisition of general motivational properties by a first-order CS and is typically unaffected by lesions of the BLA (Blundell et al., 2001). Together with the findings reported by Mead and Stephens (2003a), the present results suggest that the GluR1 mutation leaves intact a process by which the mice learn about the general motivational attributes of rewarding events. Interestingly, Mead and Stephens (2003b) reported that mice possessing a GluR2 deletion showed a pattern of deficits opposite that of GluR1-/- mice on the same procedures and suggested that the GluR2 mutation may impair learning about the general motivational properties of reward. The results from the present study provide the first demonstration that mice possessing the GluR1 deletion are impaired in associatively activating a sensory-specific representation of reward.
The pattern of deficit on reward devaluation tasks in GluR1 mutant mice shows a striking parallel with findings from rats with lesions of the BLA, an area rich in GluR1 expression (McDonald, 1996) (for review, see Cardinal et al., 2002). However, one caveat in assigning impaired BLA function to GluR1 mutant mice is that the deletion of the GluR1 subunit is expressed throughout the brain. The amygdala is part of a wider neural circuit that contributes to processing reward information in associative learning (Cardinal et al., 2002). For example, the BLA has strong reciprocal connections with the orbital frontal cortex (Baxter et al., 2000), and lesions of this region can disrupt performance on outcome devaluation paradigms (Gallagher et al., 1999; Pickens et al., 2003). Similarly, lesions of prelimbic cortex (PLC) disrupt performance in reinforcer devaluation paradigms (Gallagher et al., 1999; Pickens et al., 2003), including an outcome-specific version of this procedure (Balleine and Dickinson, 1998; Corbit and Balleine, 2003). One interpretation that has been offered for the PLC lesion deficit in instrumental learning may also be pertinent to our analysis of the GluR1 mutant mice. It has been suggested that impaired instrumental learning in PLC-lesioned rats may reflect impaired working memory (for discussion, see Corbit and Balleine, 2003) and, thus, impaired selection and initiation of goal-directed actions (Fuster, 1997; Corbit and Balleine, 2003). However, there are behavioral differences between animals with frontal cortical damage and mice possessing the GluR1 deletion. For example, large medial prefrontal cortex lesions produce only mild deficits in spatial rewarded alternation in a T-maze (Shaw and Aggleton, 1993; Aggleton et al., 1995; Dias and Aggleton, 2000), whereas GluR1-/- mice show pronounced deficits even after repeated testing (Reisel et al., 2002; Schmitt et al., 2003).
Nevertheless, an examination of the effects of the GluR1 mutation on learning processes supported by other neural systems that are involved in action-outcome learning may be a fruitful avenue for future research.
In summary, we have shown that mice with a targeted deletion of the GluR1 subunit are insensitive to changes in the value of a US after an outcome-specific satiety devaluation treatment. These findings extend those reported by Mead and Stephens (2003a) by showing that the GluR1 mutation disrupts the associative activation of a sensory-specific representation of the incentive motivational properties of reward, a function that is closely associated with the basolateral amygdala (Balleine et al., 2003).
This research was supported by a PhD studentship provided by the Cardiff University School of Psychology to A.W.J. We thank Simon Killcross for the generous use of equipment and helpful discussions.
Correspondence should be addressed to Dr. Mark A. Good, School of Psychology, Cardiff University, P.O. Box 901, Cardiff CF10 3YG, UK.
Copyright © 2005 Society for Neuroscience 0270-6474/05/252359-07$15.00/0