Abstract
A2A receptors are a major class of G-protein-coupled receptors for adenosine. Highly expressed in the striatum, on the projection neurons giving rise to the striatopallidal or “indirect” pathway, they have been implicated in sleep, addiction, and other processes, yet their role in the control of striatal circuits and behavior remains unclear. Using established assays from the instrumental learning paradigm, we showed that mice with striatum-specific deletion of A2A receptors were selectively impaired in habit formation. After training that generated habitual lever pressing in wild-type controls, the performance of striatum-specific A2A knock-out mice remained goal directed, being highly sensitive to outcome devaluation and reversal of the action–outcome contingency. These data demonstrate a critical role for A2A receptors on striatopallidal medium spiny projection neurons in shaping behavior and decision making, providing the first instance of a selective alteration in instrumental learning after striatum-specific genetic manipulations.
Introduction
A2A adenosine receptors are highly expressed in the striatum, the input nucleus of the basal ganglia, in a major population of medium spiny projection neurons giving rise to the striatopallidal or indirect pathway, and interact with other receptors that modulate glutamatergic transmission, such as D2 dopamine receptors (Giménez-Llort et al., 2007; Schiffmann et al., 2007; Ferré et al., 2008; Azdad et al., 2009). They are thus in a position to control the corticobasal ganglia circuits that are critical for motivated and voluntary behavior, but their functional role in behavior is poorly understood.
In the laboratory, striatum-dependent behaviors can be studied using analytical tools from the instrumental learning paradigm. Any instrumental behavior, such as pressing a lever for food, can be controlled by two distinct central processes. At first, lever pressing is goal directed and sensitive to manipulations like outcome devaluation. Under certain conditions, it can become more habitual and impervious to changes in the value of the outcome or to changes in the action–outcome contingency itself. Studies in flies, mice, rats, horses, monkeys, and humans have shown some version of this transition from more flexible and goal-directed behavior to inflexible and habitual behavior (Miyachi et al., 2002; Yin et al., 2004; Hilario et al., 2007; Brembs, 2009; Parker et al., 2009; Tricomi et al., 2009), as the neural substrate controlling behavior switches from the associative corticobasal ganglia network to the sensorimotor network (Yin and Knowlton, 2006; Yin et al., 2008). This transition is thought to involve synaptic plasticity at glutamatergic synapses in the striatum (Yin et al., 2008, 2009).
A2A receptors are required for long-term potentiation (LTP) of glutamatergic transmission in the hippocampus (Rebola et al., 2008) and in the striatum (Flajolet et al., 2008; Shen et al., 2008b). Blockade of A2A receptors abolished spike-timing-dependent LTP in striatopallidal neurons (Shen et al., 2008b). As different forms of synaptic plasticity are thought to be involved in striatum-dependent forms of learning and memory, we hypothesized that striatal A2A receptors are necessary for habit formation. We tested this hypothesis in striatum-specific A2A knock-out mice (Shen et al., 2008a) using devaluation and omission, two established behavioral assays for habit learning. In outcome devaluation, the value of the reward earned by the previously trained action (e.g., lever press) is reduced by prefeeding of the reward just before a short probe test. If the action is goal directed, then a reduction in the current value of the goal should immediately reduce performance (Dickinson, 1985). If, however, the action is habitual, then devaluation should have no effect on performance, since habits are elicited by antecedent stimuli, which are not affected by devaluation. In omission, the causal relationship between the lever press and the food reward is reversed (Davis and Bitterman, 1971). Instead of earning the reward, the press now prevents reward delivery (Yin et al., 2006). Again, because habitual behavior is not controlled by the action–outcome contingency, it is expected to be less sensitive to the reversal of this contingency.
Materials and Methods
Striatum-specific A2A knock-out mice.
The generation of the striatum-specific A2AR knock-out (KO) mice (st-A2AR KO) by cross-breeding floxed A2AR (A2ARflox/flox) with Dlx5/6-cre transgenic mice has been described previously (Shen et al., 2008a). Dlx5/6-cre transgenic mice display striatum-specific expression of Cre recombinase owing to the restricted striatal activity of the murine Dlx5/6 regulatory element during development (Price et al., 1991; Bulfone et al., 1993a,b). Dlx5/6-driven, Cre-mediated deletion of the A2AR gene in the striatum (with only minimal effects in hippocampus, cerebral cortex, and other brain regions) has been confirmed by PCR analysis of genomic DNA (Shen et al., 2008a). A2AR protein and mRNA in the striatum of the st-A2AR KO mice were shown to be reduced to the background level seen in global A2AR KO (gb-A2AR KO) mice (Shen et al., 2008a). A recent study using the Rosa26-Cre reporter line also confirmed the striatum-specific expression pattern of this Dlx5/6-Cre transgenic line (Ohtsuka et al., 2008).
Instrumental training.
All experiments were conducted in accordance with the Duke University Institutional Animal Care and Use Committee guidelines. Mice were placed on a food deprivation schedule to reduce their weight to 80–85% of their baseline weight. They were fed 1.5–2 g of home chow each day after training. Water was available at all times in the home cages.
Training and testing took place in eight Med Associates operant chambers (21.6 cm length × 17.8 cm width × 12.7 cm height) housed within light-resistant and sound-attenuating walls. Each chamber was equipped with a food magazine that received Bio-Serv 14 mg pellets from a dispenser. Each chamber contained two retractable levers on either side of the magazine and a 3 W, 24 V house light mounted on the wall opposite the levers and magazine. A computer with the Med-PC-IV program was used to control the equipment and record behavior.
Lever-press training.
At the beginning of each session, the house light was turned on and the lever was inserted. At the end of each session, the house light was turned off and the lever was retracted. Initial lever-press training consisted of 4 consecutive days of continuous reinforcement (CRF), during which the animals received a pellet for each lever press. Sessions ended after 90 min or 30 rewards, whichever came first. After CRF, mice were trained with random interval (RI) schedules to generate habitual lever pressing (Dickinson et al., 1983). They were trained 2 d on RI 30 s, with a 0.1 probability of reward availability every 3 s contingent upon lever pressing, followed by 6 d on the 60 s interval schedules (0.1 probability of reward availability contingent upon lever pressing).
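The random interval contingency described above can be illustrated with a short simulation. The Python below is only a sketch under stated assumptions, not the Med-PC-IV program used in the study: the class and function names are hypothetical, reward is assumed to be held until the next press collects it, and the 6 s check period for RI 60 s is inferred from the stated 0.1 probability rather than given in the text.

```python
import random


class RandomIntervalSchedule:
    """Hypothetical sketch of an RI schedule.

    At each periodic check, a reward becomes available with probability p.
    Once available, it is held until the next lever press collects it.
    """

    def __init__(self, check_period_s, p=0.1, rng=None):
        self.check_period_s = check_period_s
        self.p = p
        self.rng = rng or random.Random()
        self.reward_available = False

    @property
    def mean_interval_s(self):
        # Expected time between reward set-ups, e.g. 3 s / 0.1 = 30 s.
        return self.check_period_s / self.p

    def tick(self):
        """Run one periodic check (call once per check period)."""
        if not self.reward_available and self.rng.random() < self.p:
            self.reward_available = True

    def press(self):
        """Return True if this lever press earns a pellet."""
        if self.reward_available:
            self.reward_available = False
            return True
        return False


def simulate_session(schedule, press_times_s, session_length_s):
    """Count pellets earned by presses made at the given times (seconds)."""
    presses = sorted(t for t in press_times_s if t <= session_length_s)
    pellets = 0
    i = 0
    t = schedule.check_period_s
    while t <= session_length_s:
        # Presses made before this check are evaluated first.
        while i < len(presses) and presses[i] < t:
            pellets += schedule.press()
            i += 1
        schedule.tick()
        t += schedule.check_period_s
    # Presses made after the final check.
    while i < len(presses):
        pellets += schedule.press()
        i += 1
    return pellets


ri30 = RandomIntervalSchedule(check_period_s=3, p=0.1)  # RI 30 s
ri60 = RandomIntervalSchedule(check_period_s=6, p=0.1)  # RI 60 s (6 s inferred)
```

Because a reward can only be set up at a check and is then held, pressing faster than the check period earns little additional reward, which is the property that makes interval schedules suitable for generating habitual responding.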
Devaluation tests.
A specific satiety procedure was used for outcome devaluation. This procedure controls the overall level of satiety and motivational state while altering the current value of a specific reward. Mice were given the same amount of either the grain pellets to which they had been exposed in their home cages (non-devalued condition/control), or the purified pellets they normally earned during lever-press sessions (devalued condition). The grain pellet served as a control for overall level of satiety. Immediately after 1 h of unlimited exposure to the pellets, the mice received a 5 min probe test, during which the lever was inserted, but no pellet was delivered. This brief extinction test is designed to test whether the acquired lever pressing of the mice was controlled by the action–outcome instrumental contingency or elicited by antecedent stimuli. On the second day of outcome devaluation, the same procedure was used, except that those animals that received control grain pellets on day 1 received purified pellets on day 2, and vice versa.
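The two-day crossover in this procedure, in which every mouse is tested once in each condition, can be written out as a small scheduling sketch. The function name and the alternating assignment rule are hypothetical; the text does not state how animals were allocated to the two orders.

```python
def crossover_schedule(mouse_ids):
    """Map each mouse to its (day 1, day 2) prefeeding conditions.

    Assumption: the two orders are assigned alternately across mice.
    """
    orders = [
        ("non-devalued", "devalued"),   # grain pellets first, purified second
        ("devalued", "non-devalued"),   # purified pellets first, grain second
    ]
    return {mouse: orders[i % 2] for i, mouse in enumerate(mouse_ids)}


schedule = crossover_schedule(["m1", "m2", "m3", "m4"])
```

Under this design, each animal serves as its own control, so the devalued and non-devalued probe tests can be compared within subject.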
Omission test.
After devaluation, all mice were retrained on RI 60 s for 1 d. The next day, the instrumental contingency was reversed in an omission procedure, which tests the sensitivity of the animal to a change in the prevailing causal relationship between lever pressing and food reward. For the omission training, a pellet was delivered every 20 s without lever pressing, but each press would reset the counter and thus delay the food delivery.
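The omission contingency can be sketched as a resettable timer. The Python below is an illustration under stated assumptions, not the program used in the study: the function name is hypothetical, and a press that coincides exactly with a scheduled delivery is assumed not to cancel that delivery.

```python
def omission_pellet_times(press_times_s, session_length_s, delay_s=20.0):
    """Return the times (seconds) at which free pellets are delivered.

    A pellet is delivered every delay_s seconds in the absence of pressing;
    each lever press resets the timer and so postpones the next pellet.
    """
    pellets = []
    next_delivery = delay_s
    for t in sorted(press_times_s):
        if t > session_length_s:
            break
        # Deliver any pellets that fall due before (or at) this press.
        while next_delivery <= t:
            pellets.append(next_delivery)
            next_delivery += delay_s
        # The press resets the counter.
        next_delivery = t + delay_s
    # Deliver pellets that fall due after the last press.
    while next_delivery <= session_length_s:
        pellets.append(next_delivery)
        next_delivery += delay_s
    return pellets
```

An animal that never presses receives a pellet every 20 s, whereas an animal that presses at least once every 20 s receives none, so continued pressing is directly penalized.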
Results
Initial acquisition
All mice learned to press the lever after four sessions of CRF training, in which each press is reinforced with a food pellet. A two-way mixed ANOVA conducted on the first 8 d of lever-press acquisition, with days and genotype as factors, showed no main effect of genotype (F < 1), a main effect of days (F(7,98) = 40.7, p < 0.05), and no interaction between these factors (F < 1). All mice, regardless of genotype, increased their rate of lever pressing during initial acquisition (Fig. 1A).
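The degrees of freedom reported for this analysis follow from the standard formulas for a two-way mixed ANOVA with one between-subjects factor (genotype) and one within-subjects factor (days). The sketch below assumes that standard design; the function name is hypothetical, and the total of 16 animals is not stated in the text but is the value that reproduces the reported F(7,98) with two genotypes and 8 days.

```python
def mixed_anova_df(n_subjects, n_groups, n_levels):
    """Degrees of freedom (numerator, denominator) for a two-way mixed ANOVA.

    n_groups: levels of the between-subjects factor (e.g. genotype).
    n_levels: levels of the within-subjects factor (e.g. days).
    """
    error_between = n_subjects - n_groups
    error_within = (n_levels - 1) * (n_subjects - n_groups)
    return {
        "between": (n_groups - 1, error_between),
        "within": (n_levels - 1, error_within),
        "interaction": ((n_groups - 1) * (n_levels - 1), error_within),
    }


# Assumed values: 16 mice in total, 2 genotypes, 8 days of acquisition.
df = mixed_anova_df(n_subjects=16, n_groups=2, n_levels=8)
```

With these assumed values, the within-subjects entry is (7, 98), matching the reported main effect of days.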
Devaluation
Planned comparisons on lever-pressing data from the devaluation test showed that the performance of wild-type controls was habitual, with no significant difference between the devalued and non-devalued conditions (p > 0.05). In contrast, the lever pressing of A2A KO mice remained goal directed after extended training, being significantly reduced in the devalued condition (p < 0.05) (Fig. 1B).
Omission
When the action–outcome contingency was reversed in an omission procedure, the A2A KO mice more readily reduced their lever pressing. This observation was confirmed by a mixed two-way ANOVA with time and genotype as factors, which showed a main effect of time (F(5,70) = 15.9, p < 0.05), a main effect of genotype (F(1,70) = 5.3, p < 0.05), and no interaction between these two factors (F(5,70) = 1.4, p > 0.05). Thus, while all mice reduced lever pressing over time, the A2A KO mice more readily reduced their performance (Fig. 1C).
Discussion
Conditions such as overtraining, stress, and exposure to drugs of abuse are known to promote habit formation (Adams, 1982; Nelson and Killcross, 2006; Dias-Ferreira et al., 2009). Although previous work has defined the general circuits involved in goal-directed actions and habit formation, the detailed cellular and molecular mechanisms underlying these processes remain poorly understood (Yin et al., 2004, 2005a,b, 2006, 2008; Yin and Knowlton, 2006; Wassum et al., 2009). Our results demonstrate that striatal A2A receptors are necessary for habit formation. Striatum-specific A2A KO mice did not show any impairments in motor control or motivation, but their lever pressing was more goal directed and flexible than that of wild-type controls given identical training. This is the first report of a striatum-specific genetic manipulation limited to a specific neuronal population leading to a selective deficit in instrumental learning, revealing a novel molecular mechanism for habit formation.
Because LTP of the glutamatergic input to the striatopallidal pathway is known to require the activation of A2A receptors (Shen et al., 2008b), it could be a critical mechanism for habit formation. A recent study showed that overtraining on a skill-learning task results in increased synaptic strength in the sensorimotor striatum, particularly in neurons belonging to the striatopallidal pathway (Yin et al., 2009). It would be interesting to examine, as we have begun to do, the nature of the synaptic plasticity in A2A KO mice, which should shed light on how the lack of A2A receptors affects transmission in the relevant striatal circuits.
Because the striatal A2A receptors are located postsynaptically on projection neurons of the striatopallidal pathway, the current results clarify the mechanisms of habit formation at both the molecular and the circuit level. At the circuit level, they suggest that the indirect pathway is critical for habit formation. In traditional neurological literature, this pathway is thought to be critical for the inhibition of behavior, despite the lack of direct evidence. In light of our data, behavioral inhibition may be too simplistic a description of the functional role of the indirect pathway. At the molecular level, the discovery of the importance of A2A receptors suggests intriguing mechanisms for the control of striatal circuits. Recent work has suggested a functional link between CB1 and A2A receptors (Schiffmann et al., 2007). Indeed, a recent study has linked a deficit in habit learning with genetic deletion of the CB1 cannabinoid receptor (Hilario et al., 2007). CB1 receptors are highly expressed in the striatum, specifically the sensorimotor striatum, but the previous data come from global CB1 knock-outs, making it difficult to define the relative contributions of receptors in different brain regions. The use of striatum-specific A2A KO mice, however, obviates such difficulties with the interpretation of the data.
The differences between CB1 receptors and A2A receptors are striking. A2A receptors are Gs coupled and mainly located on the postsynaptic dendritic spines; CB1 receptors are Gi/o coupled and found on the presynaptic terminals. That genetic deletion of these receptors produces remarkably similar effects confirms a critical insight: signaling pathways considered in isolation are not enough to explain behavior. What is needed is a detailed analysis of how diverse molecular mechanisms are coordinated to control the global states of neural networks, undoubtedly a major challenge for the future. In linking molecular mechanisms to specific neural circuits and operationally defined behavioral phenomena, the present study represents an initial step in this direction.
Footnotes
This work was supported by National Institute on Alcohol Abuse and Alcoholism Grants 018018 and 016991 to H.H.Y. and National Institute of Neurological Disorders and Stroke Grants 41083 and 48995 to J.F.C. We thank Mona Leblond and Alberto Lopez for their help with the experiments.
Correspondence should be addressed to Henry H. Yin at the above address. E-mail: hy43@duke.edu