Acquisition and performance of instrumental actions are assumed to require both action-outcome and stimulus-response (S-R) habit processes. Over the course of extended training, control over instrumental performance shifts from goal-directed action-outcome associations to S-R associations that progressively come to dominate behavior. Lesions of the lateral part of the dorsal striatum disrupt this process: rats with lesions to the lateral striatum remained sensitive to devaluation of the instrumental outcome after extended training (Yin et al., 2004), indicating that this area is necessary for habit formation. The present experiment further explored the basis of this dysfunction by examining whether rats subjected to bilateral 6-hydroxydopamine lesions of the nigrostriatal dopaminergic pathway develop behavioral autonomy with overtraining. Rats were given extended training on two cued instrumental tasks, each associating a stimulus (a tone or a light) with an instrumental action (lever press or chain pull) and a food reward (food pellets or sucrose pellets). The two tasks were run daily in separate sessions. Overtraining was followed by a test of goal sensitivity in which the reward was devalued by specific satiety. In control animals, one action (lever press) was insensitive to reward devaluation, indicating that it had become a habit, whereas the other action (chain pull) remained sensitive to goal devaluation. This result suggests that habit formation may depend on the characteristics of the response. In dopamine-depleted rats, both lever press and chain pull remained sensitive to reward devaluation, indicating a role for striatal dopamine transmission in habit formation.
Current learning theories assume that instrumental performance is mediated by two different associative representations. Early in training, performance is goal directed and essentially controlled by an action-outcome (A-O) association; this behavior is flexible and sensitive to outcome devaluation. After extensive training, performance becomes essentially driven by a stimulus-response (S-R) association; behavior becomes inflexible and is no longer sensitive to devaluation of the outcome. This shift from goal-directed action to habit is thought to reflect the increasing domination of S-R associations over behavior (Adams and Dickinson, 1981; Dickinson, 1985). These two kinds of associative representations are thought to sustain two dissociable forms of memory: declarative memory mediated by A-O associations and procedural memory mediated by S-R associations.
In Parkinson's disease (PD), degeneration of dopaminergic neurons of the substantia nigra (SN) leads to a progressive striatal depletion of dopamine (DA). This pathology is classically associated with motor impairments but also with cognitive deficits, such as skill- and habit-learning impairments, which appear before the onset of clinically identifiable movement disorders. More precisely, procedural learning deficits are observed in PD patients and depend on the severity of the disease and the motor demands of the task. The involvement of the dopaminergic system in parkinsonian cognitive symptoms remains a matter of debate (Dujardin and Laurent, 2003). In rats, lesions or inactivation of the dorsal striatum impair acquisition of instrumental tasks in which a discrete stimulus predicts the availability of reinforcement (Green et al., 1967; Kirkby and Polgar, 1974; Winocur, 1974; Packard and White, 1990; Robbins et al., 1990; Reading et al., 1991; McDonald and White, 1993; El Massioui and Van Golf Racht-Delatour, 1997). Recently, the lateral part of the dorsal striatum has been implicated specifically in performance on procedural tasks (White, 1989, 1997; Packard and McGaugh, 1996; Devan and White, 1999; Devan et al., 1999; Featherstone and McDonald, 2004) and in habit learning, as shown by evidence that lesions to the lateral striatum reestablish control by a goal-directed process (Yin et al., 2004).
Within the striatum, tonically active neurons maintain their increased firing to reward-predicting stimuli after overtraining (Aosaki et al., 1994a), whereas the response of midbrain dopamine neurons to such stimuli decreases (Ljungberg et al., 1992; Aosaki et al., 1994b). This increased firing of striatal neurons to behaviorally significant stimuli is eliminated or attenuated when dopaminergic transmission in the striatum is impaired (Aosaki et al., 1994b; Raz et al., 1996; Watanabe and Kimura, 1998), even after very extensive training and even though the instrumental response is barely altered. This impairment resulting from DA depletion could account for the inability of animals with striatal dysfunction to develop S-R habits, just as impaired dopaminergic transmission in patients with Parkinson's disease is related to habit-learning dysfunction (Salmon and Butters, 1995; Knowlton et al., 1996).
The aim of the present experiments was to study the role of the nigrostriatal dopaminergic pathway in instrumental habit formation. For this purpose, we assessed the S-R status of instrumental performance using satiety-specific devaluation of the reward after overtraining.
Materials and Methods
Twenty-four Sprague Dawley rats (IFFA Credo, Saint-Germain sur l'Arbresle, France), weighing between 260 and 300 g, were housed in pairs in a temperature- and humidity-controlled colony room and maintained under a 12 h light/dark cycle (lights on at 8:00 A.M.). On arrival at the laboratory, rats were given ad libitum access to food and water for 2 weeks and handled daily. After these 2 weeks, rats were placed on a food deprivation schedule to reduce their weight to 85% of their original free-feeding body weight. All experiments were performed in accordance with the recommendations of the European Economic Community (86/609/EEC) and the French National Committee (87/848) for the care and use of laboratory animals.
Animals were trained in two identical Skinner boxes (Campden Instruments, Cambridge, UK) placed in a dark, sound-attenuating cubicle ventilated by a side-mounted exhaust fan (sound intensity, 40 dB). The Skinner boxes (34.3 cm height × 50.8 cm width × 34.3 cm depth) were made of heavy-duty aluminum, except for the front wall, which was a 0.64-cm-thick clear Plexiglas door that opened downward. Depending on the phase of the protocol, a chain and/or a lever was accessible in the boxes. A chain hung from the roof to the right of the magazine, 6 cm from the door and 1.5 cm from the left wall. On the left wall, the lever was located 10 cm from the back wall and 3 cm above the floor. A magazine, located in the middle of the wall, could deliver two different types of precision pellets (sucrose or food precision pellets; 45 mg; Campden Instruments) through two separate pellet dispensers. The magazine was closed by a panel that rats could push with their nose. The floor of the chambers consisted of 16 stainless-steel rods (0.47 cm in diameter, spaced 0.95 cm apart). Each chamber contained five lamps (2 W each): one glass-covered stimulus lamp located above each lever, one located centrally at the top of the intelligence panel, one house lamp in the ceiling of the chamber, and one in the magazine. A single loudspeaker mounted in the back part of the ceiling delivered the auditory stimulus (1200 Hz, 80 dB). The Skinner boxes were connected to a computer via an interface through which stimuli, lever presses, chain pulls, and magazine entries could be monitored and recorded.
Two other apparatuses, located in a different room, were used for prefeeding and for the satiety test. Each prefeeding box was a square Plexiglas box (50 cm height × 27 cm width × 27 cm depth) containing a dish filled with pellets (food or sucrose). The satiety test was conducted in a Plexiglas home cage (18 × 30 × 20 cm) with two small dishes, one on each side.
6-Hydroxydopamine striatal injections
Two groups of rats [6-hydroxydopamine (6-OHDA)-injected, n = 12; sham, n = 12] were anesthetized with pentobarbital (50 mg/kg; Sanofi, Libourne, France) and received one 0.1 ml injection of atropine (0.25 mg/ml, i.m.; Laboratoire Aguettant, Lyon, France). They were then placed in a stereotaxic frame on a thermal barrier to maintain their body temperature (37-38°C). 6-OHDA was injected through a glass micropipette (internal tip diameter, 70-80 μm) glued to the needle of a 10 μl Hamilton syringe filled with liquid paraffin solution.
6-OHDA injections in the striatum were aimed at inducing a partial retrograde bilateral degeneration of the nigrostriatal pathway. Bilateral injections were performed stereotaxically at two different sites of the lateral striatum: anterior sites relative to bregma (Paxinos et al., 1986): anteroposterior (AP), +0.2; mediolateral (ML), ±3.6; dorsoventral (DV), -5.6 and -4; posterior sites relative to bregma: AP, -0.8; ML, ±3.9; DV, -5.6 and -4. These injection sites were close to the caudate-putamen (CPU)/globus pallidus junction where the nigral dopaminergic axons ascend along the internal capsule and enter the CPU as described by Kirik et al. (1998). A volume of 0.3 μl of 6-OHDA (10 μg/μl in PBS, pH 7.4; Sigma, Saint Quentin Fallavier, France) at a concentration of 4 μg/μl was injected at each site, which corresponds to 1.2 μg of 6-OHDA per site. The solution was injected at a rate of 0.1 μl/min, and the glass micropipettes were left in place for 5 additional minutes.
After surgery, animals were given an injection of Valium (0.1 ml, i.p.; 0.2%; Roche, Neuilly-sur-Seine, France). Sham animals were treated identically, except that the micropipette was not introduced into the brain. Surgery was performed 16 weeks before behavioral testing.
In this set of experiments, a specific satiety paradigm was used to devalue the reinforcer and to test the sensitivity of performance to this outcome devaluation. Each rat performed two different instrumental actions (chain pull and lever press), each giving access to a different outcome (sucrose pellets or food pellets) that was available only during a discriminative stimulus (light or tone). This design allows the effect of devaluing the outcome of one action to be compared, within the same animal, with the other action, whose outcome was not devalued. Skinner boxes, running order, and the assignment of stimuli (tone/light) and outcomes (food/sucrose) were counterbalanced within each experimental group. Except for the extinction and reward tests, rats were always run twice daily, with at least 1.5 h between the two sessions.
Pretraining phase. Magazine training: on days 1 and 2, the animals were placed in the Skinner boxes for a 30 min session during which one precision pellet was delivered at variable intervals ranging between 20 and 100 s, with an average of 60 s (VI60 schedule). Each rat received magazine training with food pellets and with sucrose pellets in separate sessions each day.
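The methods give only the range and mean of each variable-interval schedule (20-100 s with a mean of 60 s here, 10-40 s with a mean of 20 s for VI20, and 10-90 s with a mean of 45 s for the intertrial intervals), not the distribution from which intervals were drawn. Because uniform draws would give means of 25 and 50 s for the last two ranges, the intervals cannot simply have been uniform. The Python sketch below assumes, purely for illustration, an exponential distribution truncated to the stated range, with its rate solved numerically so that the expected interval equals the stated mean. The function names are hypothetical, and this is not the software used in the study.

```python
import math
import random
from typing import List, Optional


def _truncated_exp_mean(rate: float, width: float) -> float:
    """Mean of an exponential density with the given rate, truncated to [0, width].

    A negative rate gives an increasing density; a rate near 0 tends to the
    uniform mean width / 2.
    """
    if abs(rate) < 1e-9:
        return width / 2.0
    return 1.0 / rate - width / math.expm1(rate * width)


def _solve_rate(target_offset: float, width: float) -> float:
    """Find the rate whose truncated mean equals target_offset, by bisection.

    The truncated mean decreases monotonically as the rate increases.
    """
    lo, hi = -50.0 / width, 50.0 / width
    if not _truncated_exp_mean(hi, width) < target_offset < _truncated_exp_mean(lo, width):
        raise ValueError("mean is too close to the limits of the range")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if _truncated_exp_mean(mid, width) > target_offset:
            lo = mid  # mean still too large: increase the rate
        else:
            hi = mid
    return (lo + hi) / 2.0


def variable_intervals(n: int, min_s: float, max_s: float, mean_s: float,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Draw n intervals (s) in [min_s, max_s] whose expected value is mean_s.

    Assumption: intervals follow a truncated exponential distribution.
    """
    rng = rng or random.Random()
    width = max_s - min_s
    rate = _solve_rate(mean_s - min_s, width)
    intervals = []
    for _ in range(n):
        u = rng.random()
        if abs(rate) < 1e-9:
            offset = u * width  # effectively uniform
        else:
            # Inverse of the truncated exponential CDF.
            frac = -math.expm1(-rate * width)  # 1 - exp(-rate * width)
            offset = -math.log1p(-u * frac) / rate
        intervals.append(min_s + offset)
    return intervals


# Example: pellet delivery times for one 30 min magazine session on the
# VI60 schedule (20-100 s, mean 60 s).
session = variable_intervals(30, 20.0, 100.0, 60.0, random.Random(0))
```

For the VI60 range the mean equals the midpoint, so the solved rate is close to zero and the draws are effectively uniform; for the VI20 range the solved rate is positive, which skews intervals toward the short end. Any other distribution with the same range and mean could be substituted.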
Instrumental learning: on days 3 and 4, rats were placed in the Skinner boxes for a 30 min session of continuous reinforcement (CRF), in which each lever press or chain pull was reinforced with a precision pellet. The session ended for each rat when it had obtained 50 reinforcers; rats that did not complete this within 30 min received an extra session at the end of the day. Each rat received CRF training for the chain and for the lever in separate sessions each day. On days 5 and 6, the actions were reinforced for 30 min on a variable interval schedule ranging between 10 and 40 s with an average of 20 s (VI20 schedule).
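Under a variable-interval schedule, reinforcement is not earned by every response, as in CRF; it becomes available once the current interval has elapsed and is delivered for the first response made after that point. The minimal Python sketch below illustrates this contingency for the VI20 sessions of days 5 and 6. The function name is hypothetical, and the convention that each new interval starts timing at reinforcer delivery is an assumption, since the methods do not say when timing restarted.

```python
from typing import List, Sequence


def vi_reinforced(response_times: Sequence[float],
                  intervals: Sequence[float]) -> List[bool]:
    """Flag which responses are reinforced under a variable-interval schedule.

    response_times: times (s from session start) of lever presses or chain
    pulls, in increasing order. intervals: the successive scheduled intervals.
    Assumptions: the first interval is timed from session start, and each
    later interval is timed from the previous reinforcer delivery.
    """
    remaining = iter(intervals)
    armed_at = next(remaining, None)  # time at which a reinforcer becomes available
    reinforced = []
    for t in response_times:
        if armed_at is not None and t >= armed_at:
            reinforced.append(True)  # first response after the interval elapsed
            nxt = next(remaining, None)
            armed_at = None if nxt is None else t + nxt
        else:
            reinforced.append(False)  # too early, or no intervals left
    return reinforced


# Responses at 5, 12, 15, 30 and 31 s with scheduled intervals of 10 and 20 s.
print(vi_reinforced([5, 12, 15, 30, 31], [10, 20]))  # prints [False, True, False, False, False]
```

In this sketch, responding faster than the schedule arms reinforcers earns no additional pellets, which is what distinguishes the VI20 sessions from the CRF sessions of days 3 and 4.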
Learning phase. As during pretraining, the rats received two separate sessions each day, one for each action (chain pull or lever press).
During each 20 min training session, 20 presentations of a 15 s light (central light) or tone (1200 Hz, 80 dB) were delivered at variable intervals ranging from 10 to 90 s, with an average of 45 s. Actions were reinforced only during stimulus presentation. The numbers of actions and of magazine entries were recorded during the 15 s stimulus presentation and during the 15 s period preceding the stimulus. The latency of the first action during the stimulus was also recorded. The discrimination learning phase ended when rats reached a predetermined criterion based on a coefficient of acquisition, calculated as the rate of actions during the stimulus (A) divided by the sum of A and the rate of actions during the 15 s before stimulus onset (B), that is, A/(A + B). A coefficient of 0.5 therefore corresponds to equivalent rates of responding before and during the stimuli (i.e., no learning of the predictive value of the stimulus). The coefficient tends toward 1 as the rate of actions increases during the stimuli and decreases between them. Based on preliminary studies, the learning criterion was set at a coefficient of acquisition of 0.7. After reaching the criterion, all rats were overtrained for 22 more sessions.
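The coefficient of acquisition and the learning criterion can be written as a short Python sketch. The function names are hypothetical; counts are assumed to be summed over the same trials for both periods, and "reaching" the criterion is interpreted as a coefficient of at least 0.7.

```python
from typing import Optional, Sequence

WINDOW_S = 15.0  # duration of the stimulus and of the pre-stimulus period (s)


def acquisition_coefficient(n_during: int, n_before: int,
                            window_during: float = WINDOW_S,
                            window_before: float = WINDOW_S) -> Optional[float]:
    """Coefficient A / (A + B) from action counts in the two periods.

    A is the rate of actions during the stimulus, B the rate during the
    period just before stimulus onset. Returns None if there was no
    responding at all, because the ratio is then undefined.
    """
    a = n_during / window_during
    b = n_before / window_before
    if a + b == 0:
        return None
    return a / (a + b)


def sessions_to_criterion(coefficients: Sequence[Optional[float]],
                          criterion: float = 0.7) -> Optional[int]:
    """1-based index of the first session whose coefficient reaches the criterion."""
    for session, coef in enumerate(coefficients, start=1):
        if coef is not None and coef >= criterion:
            return session
    return None  # criterion not reached in the sessions provided


# 30 actions during the stimuli and 10 before them: coefficient of about 0.75.
print(acquisition_coefficient(30, 10))
print(sessions_to_criterion([0.52, 0.61, 0.68, 0.73, 0.75]))  # prints 4
```

Because both periods last 15 s, the coefficient reduces to the ratio of raw counts; converting to rates only matters if the two periods differ in duration.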
Outcome devaluation: extinction test. One day after the end of overtraining, the devaluation treatment was conducted. It consisted of a prefeeding phase, an extinction session in the Skinner boxes, and a satiety test.
In the prefeeding phase, the animals were given 1 h of ad libitum access, in the prefeeding box, to 50 g of one of the two outcomes used during discrimination learning (food or sucrose pellets). The prefed outcome was chosen so as to devalue the outcome associated with the chain in the chain-devalued group and the outcome associated with the lever in the lever-devalued group. Thus, in the lever-devalued group, the lever outcome was devalued but not the chain outcome; in the chain-devalued group, the chain outcome was devalued but not the lever outcome.
In the extinction test, immediately after prefeeding, the rats received a 30 min extinction session in the Skinner boxes. Thirty stimulus presentations (15 light and 15 tone) were delivered in pseudorandom order at variable intervals ranging between 10 and 90 s, with an average of 45 s. Both the lever and the chain were available, but neither action was reinforced. Performance of the action whose outcome was devalued was then compared with performance of the action whose outcome was not devalued.
After the extinction session, the satiety test was performed in another experimental room: rats were given a 5 min choice between two small dishes, one containing 50 sucrose pellets and the other 50 food pellets, and the number of pellets eaten was recorded.
In the reacquisition session, 1 d after the extinction and satiety tests, the rats were retrained in Skinner boxes with the same paradigm as during the discrimination phase.
Outcome devaluation: reward test. The day after retraining, rats were submitted to a reward test in Skinner boxes. This test was exactly the same as the extinction test (prefeeding, reward test in Skinner boxes, and satiety test), except that both actions were reinforced during the appropriate stimuli.
After completion of behavioral testing, rats were killed with an overdose of pentobarbital (120 mg/kg, i.p.; Sanofi) and perfused transcardially with 100 ml of 0.9% sodium chloride containing 5% heparin and 1% sodium nitrite, followed by 300 ml of cold (4°C) 4% paraformaldehyde in 0.1 M phosphate buffer (PB). Brains were removed, postfixed for 4 h at 4°C in the same fixative, and immersed in a graded series of sucrose solutions in phosphate buffer (12, 16, and 18%). Serial coronal sections (40 μm thick) were cut on a freezing microtome and collected in an anatomical series. Sections were stored at -20°C in a cryoprotective solution (30% glycerol and ethylene glycol in 0.1 M PB). Every 12th section was stained with 0.15% gallocyanine (Gurr, Poole, UK) to identify immunostained structures.
Tyrosine hydroxylase (TH) was used as a marker of dopamine neurons in the substantia nigra pars compacta (SNc)/ventral tegmental area (VTA) and dopamine processes in the striatum (Björklund and Lindvall, 1984). Expression of the dopamine and cAMP-regulated phosphoprotein of 32 kDa (DARPP-32) protein was used to assess the integrity of the GABAergic efferent neurons of the striatum (Greengard et al., 1998; Ouimet et al., 1998). Free-floating sections were preincubated in PBS containing 5% normal goat serum (NGS) and 0.3% Triton X-100 for 30 min at room temperature. Sections were then incubated for 48 h at room temperature in PBS containing 3.5% NGS, 0.3% Triton X-100, 0.5% bovine serum albumin, 0.05% sodium azide, and the TH antibody (diluted 1:10,000; Institut Jacques Boy, Reims, France) or the DARPP-32 antibody (diluted 1:30,000; Euromedex, Mundolsheim, France). Sections were then processed by the avidin-biotin peroxidase method with tyramine amplification (Berghorn et al., 1994) using the Vectastain and VIP kits (Vector Laboratories, Burlingame, CA) and the BLAST kit (NEN Life Science Products, Boston, MA).
The extent of TH labeling in the SNc and striatum was reconstructed by microscopic examination of sections with reference to the stereotaxic atlas of Paxinos and Watson (1986) and the chemoarchitectonic atlases of Paxinos et al. (1999a,b). To define the extent of the lesion in the SN, we used the loss of DA cell bodies in the SNc and of DA fibers in the pars compacta and pars reticulata of the SN. In the dorsal striatum, the loss of DA innervation was assessed from TH labeling of DA fibers. The largest and smallest lesions were plotted on schematic sections from the atlas of Paxinos and Watson (1986) (see Fig. 2).
Behavioral data were analyzed with contrast ANOVAs.
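The methods state only that contrast ANOVAs were used. For the simplest single-degree-of-freedom comparisons reported in the Results, and assuming each contrast is tested against its own error term, F equals the square of the corresponding t statistic; this is consistent with the reported degrees of freedom, such as F(1,11) for within-group comparisons in 12 rats and F(1,22) for comparisons between the two groups of 12. The Python sketch below, using only the standard library, illustrates that assumption; it is not the authors' analysis code, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def within_subject_contrast(diffs: Sequence[float]) -> Tuple[float, int, int]:
    """Single-df within-subject contrast on per-rat difference scores.

    diffs: e.g. rate during minus rate before the stimulus, one value per rat.
    F(1, n - 1) = n * mean(d)**2 / var(d), the squared paired t statistic.
    """
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    var_d = statistics.variance(diffs)  # sample variance, n - 1 denominator
    return n * mean_d ** 2 / var_d, 1, n - 1


def between_groups_contrast(group1: Sequence[float],
                            group2: Sequence[float]) -> Tuple[float, int, int]:
    """Single-df contrast between two independent groups (e.g. sham vs 6-OHDA).

    Equivalent to a one-way ANOVA on two groups: F = t**2 with pooled variance.
    """
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * statistics.variance(group1)
              + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    t = (statistics.fmean(group1) - statistics.fmean(group2)) / math.sqrt(
        pooled * (1.0 / n1 + 1.0 / n2))
    return t ** 2, 1, n1 + n2 - 2


print(within_subject_contrast([1.0, 2.0, 3.0]))  # prints (12.0, 1, 2)
```

Converting F to a p value requires the F distribution, for example the survival function of `scipy.stats.f` with the two degrees of freedom returned above. Designs with more than two levels, such as the period × session interactions, need a full repeated-measures ANOVA rather than this shortcut.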
Intrastriatal bilateral injections of 6-OHDA induced a marked loss of dopaminergic neurons in the lateral part of the substantia nigra pars compacta (Figs. 1C,F, 2B), associated with a drastic loss of TH immunoreactivity in the lateral part of the striatum (dorsal and ventral) (Figs. 1B,E, 2A), indicating marked DA deafferentation of this part of the striatum. In contrast, striatal DARPP-32 immunoreactivity was similar in normal and 6-OHDA-injected rats, indicating that the GABAergic efferent neurons were not affected by the introduction of the micropipette or by the 6-OHDA injection (Fig. 1A,E).
Coefficients of learning
Because rats learned the two instrumental actions in parallel, only a global coefficient of instrumental learning, averaging lever press and chain pull performance, could be used to decide when each group reached the learning criterion and could therefore start the 22 overtraining sessions. Response rates did not differ between light and tone or between actions giving access to food pellets and to sucrose pellets, so data were pooled across these two factors.
Sham rats required six sessions to reach the overall criterion (0.7), whereas the 6-OHDA-injected rats did not reach the criterion until the 12th session (Fig. 3A). A between-group comparison of performances during the first 6 d showed that 6-OHDA-injected animals were significantly slower to learn the instrumental actions than sham rats (F(1,22) = 5.74; p < 0.05). After overtraining, the sham and 6-OHDA-injected animals performed similarly (F < 1).
To determine whether the overall difference could be accounted for by one particular action, we analyzed separate learning curves for the lever press (Fig. 3B) and the chain pull (Fig. 3C). Between-groups analysis showed that 6-OHDA-injected animals were slightly slower than sham rats to learn to press the lever for the reinforcer: they reached the criterion on the 12th day, whereas sham rats reached it on the sixth day. Their performance during the first 6 d was also slightly lower than that of sham rats, but this difference was not significant (F(1,22) = 3.64; NS). Moreover, 6-OHDA-injected rats needed 16 sessions to reach the learning criterion for pulling the chain (i.e., they reached the criterion for chain pull only on the fourth day of overtraining), whereas sham rats learned this action in six sessions. 6-OHDA-injected rats also performed significantly worse than sham rats on the chain during the first 6 d of training (F(1,22) = 5.99; p < 0.05). At the end of overtraining, 6-OHDA-injected and sham rats exhibited similar coefficients of acquisition for both the chain pull and the lever press (F < 1), with no difference between the two actions (F < 1).
Rates of responding
A more detailed analysis of learning performance indicated that the mean rate of lever presses progressively increased during stimulus presentations until the criterion was reached in both the sham (F(5,55) = 33.97; p < 0.01) and 6-OHDA groups (F(11,121) = 21.81; p < 0.01), whereas it progressively decreased before the stimulus (sham, F(5,55) = 6.84, p < 0.01; 6-OHDA rats, F(11,121) = 4.04, p < 0.01). The period (before vs during the stimulus) × session interactions were significant (sham, F(5,55) = 24.13, p < 0.01; 6-OHDA-lesioned rats, F(11,121) = 12.85, p < 0.01). On the last day of training, the rates of lever press during the stimulus were higher than before the stimulus in both sham (F(1,11) = 13.9; p < 0.01) and lesioned rats (F(1,11) = 9.70; p < 0.01) (Fig. 3B1). There was no between-groups difference in the rates before or during stimulus presentations (F < 1). Similarly, on the last overtraining session, the rates of lever press during the stimulus were significantly higher than before the stimulus (sham, F(1,11) = 293, p < 0.001; 6-OHDA, F(1,11) = 101.08, p < 0.001) (Fig. 3B2), with no between-groups difference (before stimulus, F < 1; during stimulus, F(1,22) = 1.03). The mean rates of chain pull also increased significantly during the stimulus over training in both groups (sham, F(5,55) = 11.92, p < 0.01; 6-OHDA, F(11,121) = 23.25, p < 0.01). Before the stimulus, the rate of chain pull did not increase significantly in sham animals (F(5,55) = 1.36; NS) and increased slightly in 6-OHDA rats (F(11,121) = 1.99; p < 0.05). However, the period × session interactions were significant (sham, F(5,55) = 12.50, p < 0.01; 6-OHDA, F(11,121) = 5.79, p < 0.01). On the last day of training, the rate of chain pull during the stimulus was higher than before it in sham animals (F(1,11) = 16.80; p < 0.01) but not in 6-OHDA-lesioned rats (F(1,11) = 2.7; NS) (Fig. 3C1).
There was no between-groups difference either before stimulus presentations (F(1,22) = 1.28; NS) or during the stimulus (F(1,22) = 2.49; NS). Finally, on the last overtraining session, the rates of chain pull were significantly higher during the stimulus than before it in both groups (sham, F(1,11) = 80.74, p < 0.001; 6-OHDA, F(1,11) = 30.94, p < 0.001) (Fig. 3C2), with no between-groups difference (F < 1). Moreover, across the 22 overtraining sessions, rates of response before and during the stimulus did not differ between lever press and chain pull in either sham (before stimulus, F < 1; during stimulus, F(1,22) = 2.59; NS) or 6-OHDA-lesioned rats (F < 1).
During training, there was no difference between sham and 6-OHDA-injected animals in the latency, measured from stimulus onset, of the first lever press (F(1,22) = 1.5; NS) or of the first chain pull (F < 1) (Fig. 4). Within-group analyses of latencies during the first 4 d showed that 6-OHDA-injected rats took slightly longer to pull the chain than to press the lever (F(1,11) = 5.24; p < 0.05), whereas sham animals did not (F(1,11) = 3.34; NS). However, across all sessions, latencies did not differ significantly between the two actions in either 6-OHDA-injected or sham animals (F < 1). At the end of overtraining, response latencies were similar for both groups (F < 1) and for both actions (F < 1).
Effect of postovertraining devaluation
Consumption during the prefeeding
Before the extinction test, as before the reward test, there was no difference between 6-OHDA-injected and sham animals in the total number of consumed pellets (food or sucrose) when the lever was devalued (F(1,10) = 1.55, NS and F < 1, respectively) or when the chain was devalued (F < 1).
Consumption during the satiety test
After the extinction test, when the devalued outcome was the one associated with lever pressing, both sham and 6-OHDA-injected rats consumed significantly more of the nonsatiated than of the satiated pellets (F(1,5) = 9.6; p < 0.05 and F(1,5) = 734.76; p < 0.01, respectively) (Fig. 5A). The same effect was observed in rats for which the outcome associated with chain pulling was devalued (sham, F(1,5) = 22.10, p < 0.01; 6-OHDA-injected, F(1,5) = 13.05, p < 0.05) (Fig. 5B). There was no between-groups difference.
After the reward test, sham and 6-OHDA-injected rats for which the outcome associated with lever pressing was devalued (Fig. 5C) consumed significantly more of the nonsatiated than of the satiated pellets (F(1,5) = 36.45; p < 0.01 and F(1,5) = 33.91; p < 0.01, respectively). The same effect was observed in rats for which the outcome associated with chain pulling was devalued (F(1,5) = 16.70; p < 0.01 and F(1,5) = 558.53; p < 0.01, respectively). There was no between-groups difference.
During these tests, the devalued action was always compared with the nondevalued action within each animal. In rats for which the outcome associated with lever pressing was devalued, the lever press (devalued action) was compared with the chain pull (nondevalued action); conversely, in rats for which the outcome associated with chain pulling was devalued, the chain pull (devalued action) was compared with the lever press (nondevalued action). Because response rates decreased rapidly during the extinction test, analyses were restricted to the first six (of 15) extinction trials for each stimulus. Performance in each test was expressed as the percentage difference between the test response rate and a baseline rate: the rate on the last day of overtraining for the extinction test, and the rate on the reacquisition day for the reward test.
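This scoring can be sketched in Python as two steps: keeping only the first six presentations of each stimulus, then expressing each test rate relative to its baseline. The function names are hypothetical, and the formula 100 × (test − baseline) / baseline is an assumption, since the text does not spell out how the percentage difference was computed.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def first_n_per_stimulus(trials: Iterable[Tuple[str, float]],
                         n: int = 6) -> Dict[str, List[float]]:
    """Keep the first n response rates recorded for each stimulus.

    trials: (stimulus, response rate) pairs in presentation order,
    e.g. ("tone", 0.8). Later presentations of a stimulus are ignored.
    """
    kept: Dict[str, List[float]] = defaultdict(list)
    for stimulus, rate in trials:
        if len(kept[stimulus]) < n:
            kept[stimulus].append(rate)
    return dict(kept)


def percent_change(test_rate: float, baseline_rate: float) -> float:
    """Percentage difference of a test rate from its baseline rate.

    Baseline: last overtraining session (extinction test) or the
    reacquisition session (reward test). Negative values mean a decrease.
    """
    if baseline_rate == 0:
        raise ValueError("baseline rate must be non-zero")
    return 100.0 * (test_rate - baseline_rate) / baseline_rate


# Hypothetical rates (actions per 15 s) during an extinction session,
# scored against a hypothetical baseline rate of 4.0.
trials = [("tone", 3.0), ("light", 2.0), ("tone", 1.0), ("light", 2.0)]
early = first_n_per_stimulus(trials, n=6)
print({s: percent_change(fmean(r), 4.0) for s, r in early.items()})
```

Expressing each rat's performance relative to its own baseline removes between-animal differences in overall response rate, so that the devalued and nondevalued actions can be compared on a common scale.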
There was a devaluation effect on performance in both sham and 6-OHDA-injected rats in both the extinction and reward tests (extinction: sham, F(1,10) = 9.83, p < 0.05; 6-OHDA-injected, F(1,10) = 19.83, p < 0.01; reward: sham, F(1,10) = 7.46, p < 0.05; 6-OHDA-injected, F(1,10) = 17.46, p < 0.01). However, the effect of devaluation depended on the action: in sham animals, outcome devaluation had no effect on lever press (extinction, F(1,5) = 3.41, NS; reward, F < 1) but had a clear effect on chain pull (extinction, F(1,5) = 8.98, p < 0.05; reward, F(1,5) = 75.56, p < 0.001), although the action × devaluation interaction was not significant (extinction, F < 1; reward, F(1,10) = 3.46; NS). In 6-OHDA-injected rats, outcome devaluation affected lever press (extinction, F(1,5) = 6.38, p = 0.05; reward, F(1,5) = 14.44, p < 0.05) and chain pull during the extinction test (F(1,5) = 14.2; p < 0.05), and its effect on chain pull was close to significance during the reward test (F(1,5) = 5.59; NS). In these rats, the action × devaluation interaction was significant in the extinction test only (F(1,10) = 6.24; p < 0.05). For lever press, the lesion × devaluation interaction in the reward test approached significance (F(1,20) = 4.18; p = 0.054), consistent with a differential effect of devaluation in sham and 6-OHDA-injected rats. For chain pull, there was no lesion × devaluation interaction in either the extinction or the reward test (F(1,20) = 2.74, NS and F < 1, respectively).
In sham animals, response latencies were not affected by outcome devaluation of lever press (extinction, F < 1; reward, F(1,5) = 1.54; NS), but devaluation had a clear effect on chain pull (extinction, F(1,5) = 7.3, p < 0.05; reward, F(1,5) = 104.54, p < 0.001). Moreover, there was a significant interaction between action and devaluation during the reward test (F(1,10) = 5.11; p < 0.05). In 6-OHDA-injected rats, latencies of both lever press and chain pull were affected by outcome devaluation (extinction: lever press, F(1,5) = 13.34, p < 0.05; chain pull, F(1,5) = 9.94, p < 0.05; reward: lever press, F(1,5) = 35.25, p < 0.01; chain pull, F(1,5) = 8.53, p < 0.05). There was no significant interaction between action and devaluation in these animals (extinction, F(1,10) = 3.30; reward, F < 1).
There was an effect of lesion on the latency of lever press during the extinction test (F(1,20) = 4.88; p < 0.05). This effect approached significance when the lever press was the devalued action (F(1,10) = 4.71; p = 0.055) but not when it was the nondevalued action (F(1,10) = 1.62; NS).
In this study, we examined whether rats subjected to bilateral 6-OHDA lesions of the nigrostriatal dopaminergic axons develop behavioral autonomy with overtraining. Our results showed that, after overtraining, lever press performance in control animals was insensitive to selective outcome devaluation by specific satiety, indicating that this action had become a habit. In contrast, the sensitivity of 6-OHDA-injected animals to goal devaluation indicates that their lever press performance remained goal directed after overtraining. However, chain pull remained sensitive to selective goal devaluation after overtraining in both sham and lesioned animals, suggesting that habit formation depends on the kind of behavior to be learned.
As described previously (Berger et al., 1991; Ichitani et al., 1991), our 6-OHDA injections into the lateral striatum induced a partial retrograde degeneration of dopaminergic cell bodies in the lateral part of the substantia nigra, sparing the mesolimbic dopaminergic pathway, and a complete dopamine deafferentation of the lateral striatum (dorsal and ventral parts). Moreover, under our experimental conditions, the striatal DA depletion was not associated with a loss of efferent neurons. This allows us to attribute the behavioral impairments observed in our study to the dopaminergic deafferentation of the lateral striatum rather than to local striatal lesions, which by themselves are known to induce learning deficits (White, 1997; Adams et al., 2001).
During training of the instrumental tasks, DA denervation of the lateral striatum impaired early acquisition of lever press and chain pull. 6-OHDA-injected rats needed more sessions than controls to reach the criterion and had more difficulty reaching it for the chain than for the lever. During the first days of training, they also responded more slowly to the chain than to the lever. The increase in responding before the stimuli observed in injected rats could account for this delay in the evolution of the coefficients of learning and could reflect perseveration. Similar perseverative behavior has often been associated with the response deficits observed in animals with excitotoxic or 6-OHDA lesions of the lateral striatum (Kirkby, 1969; Amalric et al., 1995; Devan et al., 1996; Eagle et al., 1999; El Massioui et al., 2001; Smith et al., 2002) and has been interpreted either as an inability to inhibit ongoing actions or as a failure to initiate a new response. The absence of a response latency impairment could be attributable to the partial extent of our striatal DA depletion, because 6-OHDA infusion in the medial forebrain bundle that induces similar DA depletion produces no reliable impairment in a reaction time task (Smith et al., 2002) or in skilled paw use (Przedborski et al., 1995; Lee et al., 1996). Significant impairment in skilled paw reaching does not appear unless 80-90% of striatal TH-positive fiber density or 60-80% of TH-positive neurons in the substantia nigra are lost (Barneoud et al., 1991; Lee et al., 1996).
Although rats with partial DA depletion were initially impaired in acquiring lever press and chain pull, they were still capable of acquiring and maintaining a performance similar to that of sham animals during most of overtraining. However, selective outcome devaluation by specific satiety shows that the associative status of lever press and chain pull was not equivalent at the end of overtraining. In sham rats, lever pressing was already mediated by S-R learning after overtraining and was thus no longer sensitive to selective reward devaluation, whereas chain pulling was still controlled by goal expectancy, although both actions had initially been learned at exactly the same rate. According to Dickinson's theoretical position on habit formation, the contribution of the S-R process should increase with the amount of training, and overtraining is presumed to be effective in producing habits only if performance remains at a high, stable asymptotic rate throughout repeated sessions (Dickinson, 1994; Dickinson et al., 1995). In our situation, sham rats maintained their rates of response during the stimulus at asymptotic level for the 22 sessions, with no difference between rates of response on the chain and on the lever. It therefore seems that the remaining sensitivity of the chain pull to reinforcer devaluation, compared with the lever press, cannot be attributed to changes in response rates but rather to other factors.
There is little evidence in the literature that associations between events evolve differently with training depending on the motor response required. The greater difficulty our 6-OHDA-injected animals had in learning the chain pull could indicate that this action had a higher motor demand, making it harder to perform and thus requiring more sessions to become a habit, even in control animals. Previous studies have shown that schedules inducing a high rate of responding, or having substantial response requirements, are highly sensitive to the effects of DA depletion in the ventral striatum (Aberman and Salamone, 1999; Ishiwari et al., 2004). Moreover, the effects of DA antagonists can vary with the instrumental task used or with task requirements (Ettenberg et al., 1981; Caul and Brindle, 2001). These data support the hypothesis that chain pulling had a higher response cost than lever pressing and might therefore require more extensive training to become independent of the goal.
Contemporary theories of instrumental learning assume that acquisition and performance of instrumental actions require interactions between S-R habit and action-outcome processes (Dickinson, 1981; Dickinson and Balleine, 1994; Dickinson et al., 1995) and that, after extended training, S-R processes gain domination over behavior. It is well established that overtraining rats to press a lever for a reward renders performance of that action habitual: lever pressing becomes insensitive to posttraining changes in the value of the reward (Adams and Dickinson, 1981; Adams, 1982; Balleine, 2001). The development of an S-R habit appears to involve the lateral part of the dorsal striatum, as shown by the high sensitivity to the current value of the training reward observed after NMDA-induced lesions of the dorsolateral striatum in rats, which suggests that instrumental performance in these animals was still controlled by goal expectancy (Yin et al., 2004). The idea that the dorsolateral striatum may underpin the development of S-R habits has been supported by previous results (Mishkin et al., 1984; Reading et al., 1991; White, 1997; Jog et al., 1999), but Yin et al. (2004) were the first to directly assess the specific involvement of the lateral part of the striatum in habit learning using a postovertraining outcome devaluation procedure. The present results show that the behavioral control gained by goal-insensitive S-R habit processes during extended lever press training was disrupted by DA depletion in the nigrostriatal pathway. Our DA deafferentation of the lateral striatum, without a local striatal lesion, was sufficient to disrupt habit formation, although lesioned rats were still able to sustain a high rate of responding during overtraining sessions. One possible hypothesis is that, because surgery was performed before training, anatomical reorganization could account for the sustained instrumental responding during overtraining.
However, a previous study in unlesioned rats showed that inactivation of the striatum before each overtraining session did not impair performance in a lever press task, whereas habit formation was prevented when the striatum was blocked before each test of automaticity (Van Golf Racht-Delatour and El Massioui, 2000). These arguments support the conclusion that the shift of performance from action to habit, sustained by S-R associations, was impaired by striatal dopaminergic dysfunction without impairment of the acquired instrumental performance. The present results specify the role of DA modulation in the lateral striatum in this process.
Dopamine activity has long been hypothesized to mediate reward processing and the attribution of motivational value to reward-related events (Robinson and Berridge, 1993; Schultz, 2002). Dopamine activation appears to enable neuronal changes as learning progresses. More precisely, during initial learning, dopaminergic neurons show increased responses to the primary reward, and these responses are progressively transferred to the reward-predicting stimulus as learning proceeds (Ljungberg et al., 1992). During a transient learning period, both the reward and the stimulus can activate dopaminergic neurons. However, once learning is complete, and in the absence of environmental change, neurons are activated only by reward-predicting stimuli and no longer by rewards (Romo and Schultz, 1990; Mirenowicz and Schultz, 1994). A possible interpretation of our results is that degeneration of the dopaminergic nigrostriatal pathway could prevent the shift from goal-directed to stimulus-driven behavior during extended training because it prevents the transition from neuronal activation by both stimulus and reward to activation restricted to the reward-predicting stimulus.
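The transfer of dopaminergic responses from reward to reward-predicting stimulus described above is often modeled with temporal-difference (TD) learning, a computational account that was not used or tested in the present study. The sketch below is a minimal illustration under simplifying assumptions: a fixed cue-reward interval, one learned value per time step after cue onset, an unpredictable cue onset, and arbitrary parameter values. The function name `simulate_td` and its parameters are hypothetical. The TD prediction error stands in for phasic dopaminergic firing: it occurs at the reward on early trials, at both cue and reward during a transient period, and at the cue alone once learning is complete.

```python
def simulate_td(n_trials=200, trial_len=5, alpha=0.2, gamma=1.0):
    """Tabular TD(0) on a fixed cue -> reward sequence (illustrative only).

    Returns one (cue_error, reward_error) pair per trial: the prediction
    error at cue onset and at the time step when reward is delivered.
    """
    # values[t]: predicted future reward t steps after cue onset
    values = [0.0] * trial_len
    history = []
    for _ in range(n_trials):
        # Cue onset is assumed unpredictable, so the pre-cue state keeps
        # value 0 and the error at cue onset is just the discounted cue value.
        cue_error = gamma * values[0] - 0.0
        reward_error = 0.0
        for t in range(trial_len):
            # Reward (r = 1) arrives at the last step of the trial only.
            reward = 1.0 if t == trial_len - 1 else 0.0
            next_value = values[t + 1] if t + 1 < trial_len else 0.0
            delta = reward + gamma * next_value - values[t]  # TD error
            values[t] += alpha * delta
            if t == trial_len - 1:
                reward_error = delta
        history.append((cue_error, reward_error))
    return history


if __name__ == "__main__":
    history = simulate_td()
    for trial in (0, 5, 20, 199):
        cue, rew = history[trial]
        print(f"trial {trial:3d}: cue error {cue:.3f}, reward error {rew:.3f}")
```

With the default values, the first trial yields a cue error of 0 and a reward error of 1; a few trials later both errors are nonzero, and by the last of the 200 trials the pattern is reversed (cue error near 1, reward error near 0). Fixing the pre-cue value at 0 is what keeps an error at cue onset after learning: because the cue itself is not predicted, its arrival remains a surprise.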
In conclusion, in a model of striatal dopamine depletion devoid of striatal neuronal loss, and therefore relevant as a model of Parkinson's disease, the present results demonstrate for the first time the involvement of the nigrostriatal dopaminergic system in habit formation. These results could be linked to the procedural or skill learning deficits classically observed in PD patients. However, in Parkinson's disease, the role played by dopamine depletion in the pathophysiology of cognitive dysfunction remains a subject of debate, and an interaction between DA and acetylcholine within the striatum via cholinergic interneurons cannot be ruled out. Further studies are needed to determine whether such an interaction within the lateral striatum plays a role in habit formation.
We thank Pascale Veyrac and Nathalie Samson for outstanding care of rat colonies and Gérard Dutrieux for assistance in software design and electronic appliances.
Correspondence should be addressed to A. Faure, Laboratoire de Neurobiologie de l'Apprentissage, de la Mémoire et de la Communication, Unité Mixte de Recherche 8620, Bâtiment 446, Université Paris Sud, 91405 Orsay Cedex, France. E-mail:.
Copyright © 2005 Society for Neuroscience 0270-6474/05/252771-10$15.00/0