Abstract
Both humans and rodents can learn to associate specific actions with their outcomes, but with repeated performance or exposure to pathological stimuli, such as drugs of abuse, behaviors assume stimulus-elicited, or “habitual,” qualities. Psychostimulants remodel dorsal striatal neurons, critical determinants of decision-making strategies, but cytoskeletal mechanisms associated with drug-induced habit formation are largely unknown. We first show that cocaine can bias decision-making strategies toward stimulus-response habits by interfering with learning about the predictive relationship between a response and its outcome. In the dorsomedial, but not ventral, striatum, cocaine decreases PSD95 expression and phosphorylation of cortactin, a cytoskeletal regulator that interacts with, and is phosphorylated by, the Abl2 (Arg) kinase. Based on this pattern, we inhibited Abl-family kinase signaling in the dorsomedial striatum, impairing new response-outcome learning. Consistent with evidence that the dorsomedial striatum promotes response-outcome decision-making while the dorsolateral compartment promotes stimulus-response habits, inhibition of Abl-family kinases in the dorsolateral striatum reinstates goal sensitivity in over-trained “habitual” mice. These findings provide a structural mechanism by which even acute exposure to drugs of abuse can reorganize behavioral response strategies and promote outcome-insensitive stimulus-response habits.
Introduction
Goal-directed behaviors are defined as those sensitive to changes in the predictive relationship between an action and the desired outcome. Extended training, certain reinforcement schedules, and drug and alcohol exposure induce a shift from goal-directed to automated, or “habitual,” response strategies that are by contrast insensitive to changes in the relationship between behavior and its outcome (Balleine and Dickinson, 1998; Balleine and O'Doherty, 2010). Converging neuroanatomical evidence characterizes this process as a transition from distinct dorsomedial and dorsolateral striatal systems (DMS and DLS, respectively) that may act in concert to a predominantly DLS circuit that is sensitive to projections from the sensorimotor cortex (Yin et al., 2008, 2009; Kimchi et al., 2009). Although the brain circuits underlying habit formation are becoming clearer, specific molecular mechanisms, particularly those that regulate cellular structure through actions on the actin cytoskeleton, have received less attention.
We hypothesized here that neurobiological events that impact the structural stability of striatal neurons that receive and transmit critical neurochemical signals would directly influence the coordination of actions and habits. This hypothesis is based in part on evidence that prolonged exposure to the stress hormone corticosterone or environmental stressors, which increase DLS medium spiny neuron complexity, impairs response-outcome decision-making (Dias-Ferreira et al., 2009; Gourley et al., 2012a). Likewise, psychostimulant exposure accelerates habit formation (Schoenbaum and Setlow, 2005; Nelson and Killcross, 2006; Nordquist et al., 2007; Zapata et al., 2010) and reduces DMS dendritic spine density while elevating DLS spine density (Jedynak et al., 2007).
Dendritic spines are rich in filamentous actin, which stabilizes cytoskeletal structure or reorganizes it, depending on intracellular signaling factors, such as Src- and Abl-family kinases, which are also implicated in cocaine-seeking and cocaine-induced psychomotor sensitization (Toda et al., 2006; Gourley et al., 2009, 2012b; Warren et al., 2012). Here we show that cocaine exposure immediately after the modification of a familiar response-outcome contingency occludes subsequent goal-directed decision-making and reduces phosphorylation of the Abl-family kinase target cortactin in the DMS. Intra-DMS inhibition of Abl-family kinase signaling recapitulates the effects of acute cocaine, providing a molecular mechanism by which cocaine might facilitate habit formation. As a proof of principle, we also show that intra-DLS inhibition of Abl-family kinase signaling promotes goal-directed decision-making. These findings provide a novel cytoskeletal mechanism that bidirectionally coordinates actions and habits.
Materials and Methods
Subjects.
Male C57BL/6 mice (10–12 weeks old, Charles River Laboratories) were maintained on a 12 h light cycle (0700 on) and provided food and water ad libitum, except during instrumental training when body weights were reduced to ∼90% of baseline to motivate food-reinforced responding. Procedures were approved by the Yale University Institutional Animal Care and Use Committee.
Instrumental training.
Mice were trained to nose poke for food reinforcement (20 mg grain-based pellets; Bioserv) using illuminated Med Associates conditioning chambers with 3 nose poke recesses. Training was initiated with a continuous reinforcement schedule; 30 pellets were available for responding on the two outermost recesses, resulting in 60 pellets/session. Four daily training sessions were conducted, during which all animals acquired the response. Next, mice were shifted to a random interval (RI) 30 s schedule of reinforcement for two sessions; again, 30 pellets were available for responding on each of two apertures. This training protocol would be expected to engender goal-directed responding that is sensitive to changes in the response-outcome associative relationship.
Extended training.
In experiments in which stimulus-response habits were developed, mice were trained for an additional three sessions using an RI 60 s schedule of reinforcement. Again, 30 pellets were available for responding on each of two apertures.
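The random interval schedules described above can be illustrated with a minimal sketch. It assumes that the interval before the next pellet becomes available is drawn from an exponential distribution with the scheduled mean (30 or 60 s); the text does not state how intervals were generated, and the function name, the seed, and the simulated response times are hypothetical.

```python
import random


def simulate_ri_session(response_times, mean_interval_s, max_pellets=30, seed=0):
    """Return the response times that earn a pellet under a random interval schedule.

    Assumption (not stated in the text): the delay until the next pellet is
    "armed" is exponentially distributed with mean `mean_interval_s`.
    """
    rng = random.Random(seed)
    reinforced = []
    # Time at which the first pellet becomes available.
    armed_at = rng.expovariate(1.0 / mean_interval_s)
    for t in sorted(response_times):
        if len(reinforced) >= max_pellets:
            break  # 30 pellets per aperture, as in the training sessions
        if t >= armed_at:
            # First response after arming is reinforced; start a new interval.
            reinforced.append(t)
            armed_at = t + rng.expovariate(1.0 / mean_interval_s)
    return reinforced


# Hypothetical mouse responding once per second for one hour.
responses = [float(t) for t in range(1, 3601)]
print(len(simulate_ri_session(responses, 30.0)))  # pellets earned under RI 30 s
print(len(simulate_ri_session(responses, 60.0)))  # pellets earned under RI 60 s
```

Under this kind of schedule, responding faster than the mean interval earns few additional pellets, which is one reason interval schedules are commonly used when studying habit formation.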
Response-outcome contingency degradation.
To test for sensitivity to changes in the response-outcome associative relationship, the response-outcome contingency associated with one of the nose poke apertures was degraded. Here, one nose poke aperture was occluded, and reinforcers were delivered into the magazine independent of animals' interaction with the remaining aperture. In other words, responses on the available aperture produced no consequences (Hammond, 1980); rather, pellets were delivered into the magazine at a rate that was yoked to each animal's individual reinforcement rate from the previous session when the opposite aperture had been available and nose-poking had been reinforced (Gourley et al., 2012a). At test, both apertures were available for 10 min, and responding was nonreinforced, as is standard practice.
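As an illustration of the yoking procedure, the sketch below converts the number of pellets an animal earned in the previous session into response-independent delivery times for a 25 min degradation session. Even spacing is an assumption made for simplicity; the text specifies only that the delivery rate was matched to each animal's prior reinforcement rate, and the function name is hypothetical.

```python
def yoked_delivery_times(pellets_prev_session, session_s=25 * 60):
    """Return delivery times (s) for response-independent pellets.

    Assumption: pellets are spaced evenly, so that the overall rate equals the
    animal's reinforcement rate in the previous session of the same length.
    """
    if pellets_prev_session <= 0:
        return []
    interval = session_s / pellets_prev_session
    # Deliver at the midpoint of each interval so no pellet falls at t = 0.
    return [interval * (i + 0.5) for i in range(pellets_prev_session)]


# Hypothetical animal that earned 30 pellets in the previous 25 min session.
times = yoked_delivery_times(30)
print(times[0], times[-1], len(times))
```

Because deliveries depend only on the stored schedule, nose pokes on the available aperture have no influence on when pellets arrive, which is the defining feature of the degraded contingency.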
To summarize, experimental events were as follows: (1) instrumental training; (2) reinforced responding on one aperture (25 min); (3) response-outcome contingency degradation associated with the opposite aperture (25 min); and (4) probe test.
Injections or infusions were delivered either immediately after the last day of instrumental training (#1 above) or immediately after response-outcome contingency degradation (#3 above). This latter experimental design allows us to selectively target the consolidation, rather than acquisition or expression, of response-outcome learning.
Response rates on each aperture during the probe test were analyzed by two-factor (aperture × group) ANOVA. In the event of significant interactions, post hoc comparisons were made with Tukey's t tests. Results are indicated in the figure captions.
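The aperture × group interaction can be sketched as follows, assuming that aperture is treated as a repeated (within-mouse) factor, which the text does not state. For a design with two groups and two apertures, the interaction F under that assumption equals the squared pooled-variance t statistic comparing the groups on each mouse's difference score (nondegraded minus degraded). The function name and all response rates below are hypothetical.

```python
from statistics import mean, variance


def interaction_f(group_a, group_b):
    """Aperture x group interaction for a 2 x 2 mixed design.

    Each group is a list of (nondegraded_rate, degraded_rate) pairs, one per
    mouse. Returns (F, (df_numerator, df_denominator)).
    """
    diff_a = [nd - d for nd, d in group_a]
    diff_b = [nd - d for nd, d in group_b]
    n_a, n_b = len(diff_a), len(diff_b)
    # Pooled variance of the difference scores across the two groups.
    pooled = ((n_a - 1) * variance(diff_a) + (n_b - 1) * variance(diff_b)) / (n_a + n_b - 2)
    t = (mean(diff_a) - mean(diff_b)) / (pooled * (1 / n_a + 1 / n_b)) ** 0.5
    return t * t, (1, n_a + n_b - 2)


# Hypothetical response rates (responses/min): controls prefer the
# nondegraded aperture, treated mice respond nonselectively.
control = [(10.0, 4.0), (12.0, 6.0), (9.0, 5.0), (11.0, 3.0)]
treated = [(8.0, 8.0), (7.0, 9.0), (10.0, 8.0), (9.0, 9.0)]
f_value, df = interaction_f(control, treated)
print(f_value, df)
```

A large interaction F in this sketch reflects a group difference in aperture preference, which is the pattern the probe test is designed to detect.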
Cocaine exposure.
In experiments testing the effects of acute cocaine on response-outcome conditioning, animals were injected with cocaine hydrochloride (10 mg/kg, i.p., 1 ml/100 g; generously provided by National Institute on Drug Abuse) immediately after response-outcome contingency degradation and returned to the home cage; then the probe test was conducted the following day when the animals were drug-free. For immunoblotting experiments, mice were injected with 10 or 30 mg/kg cocaine or saline and killed by rapid decapitation 45 min after injection. Brains were immediately frozen on dry ice.
Immunoblotting.
Brains were sectioned into 1 mm coronal sections. Samples were collected from the DMS (adjacent to the lateral ventricles and including the intermediate striatum) and ventral striatum with a 1-mm-diameter tissue corer. Samples were sonicated in lysis buffer: 137 mM NaCl, 20 mM Tris-HCl, pH 8, 1% Igepal, 10% glycerol. Protein concentrations were determined using Bradford colorimetric assays (Pierce), and then 20 μg/sample was added to 10 μl Laemmli buffer (20% glycerol, 2% SDS, Bromphenol blue) and boiled for 10 min. Samples were separated by SDS-PAGE on 8–16% gradient Tris-glycine gels (Invitrogen). Primary antibodies were anti-GAPDH (1:20K; Advanced Immunochemical), anti-PSD95 (1:1000; Cell Signaling Technology), and anti-phospho-cortactin (pY421; 1:750, Abcam). Membranes were incubated for 1 h or overnight and then incubated with IRDye 700 Dx Anti-Rb IgG and IRDye 800 Dx Anti-Ms IgG for 1 h (1:5000; Rockland Immunochemicals).
Bands were quantified using densitometry analysis (LiCor Odyssey Infrared Imaging System). Values were normalized to corresponding GAPDH loading controls. These ratios were converted to a percentage of the control mean from the same membrane to control for variance between gels. Variance in the control group was generated by calculating individual values as a percentage of the mean of the control group. The resulting datasets were analyzed by t test or ANOVA as appropriate. Ventral striatal phospho-cortactin values were arcsin-transformed to preserve normal variance.
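The normalization steps above can be sketched for a single membrane as follows. The function name and the densitometry values are hypothetical. The arcsine transformation is omitted because the text does not specify how values were scaled before transformation.

```python
from statistics import mean


def percent_of_control(band, gapdh, is_control):
    """Normalize band intensities for one membrane.

    band, gapdh: densitometry values for the target protein and the GAPDH
    loading control in each lane; is_control: True for control-group lanes.
    Returns each lane's target/GAPDH ratio as a percentage of the mean ratio
    of the control lanes on the same membrane.
    """
    ratios = [b / g for b, g in zip(band, gapdh)]
    control_mean = mean(r for r, c in zip(ratios, is_control) if c)
    return [100.0 * r / control_mean for r in ratios]


# Hypothetical membrane: two control lanes followed by two cocaine lanes.
band = [50.0, 100.0, 150.0, 75.0]
gapdh = [100.0, 100.0, 100.0, 100.0]
is_control = [True, True, False, False]
print(percent_of_control(band, gapdh, is_control))
```

Control lanes average 100% by construction while retaining their individual spread, which is how variance in the control group is preserved for the subsequent t test or ANOVA.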
Surgery.
Mice were trained to nose poke before surgery; then infusions were delivered either after instrumental response acquisition or after response-outcome contingency degradation training. Experimental groups were designated by matching animals based on response rates during training. Mice were anesthetized with pentobarbital and placed in a stereotaxic frame (David Kopf Instruments). The scalp was incised, skin retracted, bregma and lambda identified, the head leveled, and coordinates located using Kopf's digital coordinate system. The Abl-family kinase inhibitor STI-571 (10 mM; LC Laboratories) or sterile saline was infused over 2 min in a volume of 0.15 μl at anteroposterior +0.74, mediolateral ±2.2, dorsoventral −3.0 for the DMS or anteroposterior +0.5, mediolateral ±2.7, dorsoventral −3.5 for the DLS. Needles were left in place for 3 additional minutes before withdrawal and suturing. Four days later, the experiment proceeded as described. Mice were then killed, and brains were sectioned and needle tracks were visually evaluated.
Eight additional mice were infused with 1% thionin in the same volume by the same surgeon to generate histological representations of the infusion sites. After surgery, brains were extracted and submerged for 48 h in 4% paraformaldehyde, then 30% w/v sucrose. Brains were sectioned into 40-μm-thick sections and imaged on a light microscope.
Results
Cocaine is a widely recognized regulator of dendritic spine density and morphology. Even acute exposure can remodel striatal dendritic spines (Shen et al., 2009), which may in turn impact decision-making processes. We first tested whether a single exposure to cocaine was also sufficient to disrupt the consolidation of response-outcome conditioning and thereby bias decision-making strategies from those based on goal-directed response-outcome associations toward those based on stimulus-response “habit” systems. To this end, we administered cocaine immediately after modifying the response-outcome contingency associated with one of two nose poke apertures in instrumentally trained mice. When sensitivity to this contingency degradation was tested the next day, saline-injected control mice preferentially responded on the “nondegraded” nose poke aperture, showing goal-directed decision-making. Cocaine-exposed mice, however, responded on both nose poke apertures nonselectively, insensitive to modifications in the response-outcome association (interaction F(1,20) = 4.6, p = 0.04) (Fig. 1a). Thus, acute cocaine obstructed new response-outcome conditioning, biasing decision-making strategies toward those governed by familiar, habitual response strategies.
To begin to investigate molecular mechanisms, a separate group of mice was killed 45 min after cocaine injection, a time point at which acute exposure, under certain circumstances, has been shown to remodel spines (Shen et al., 2009). In the DMS, both PSD95 and phosphorylation of the cytoskeletal regulator cortactin were reduced (both t(22) = 2.6, p = 0.02) (Fig. 1b), but we identified no effects of this same behaviorally active low dose in the ventral striatum (Fig. 1c). As a positive control, however, we did confirm that a higher dose increased phospho-cortactin in the ventral striatum, as has been previously reported in rats (F(2,20) = 18.5, p < 0.001) (Fig. 1c) (Toda et al., 2006).
Cortactin is phosphorylated by Abl-family kinases, which enables cortactin to remodel the actin cytoskeleton through the Arp2/3 complex (Lapetina et al., 2009). Thus, we infused the Abl-family kinase inhibitor STI-571 into the DMS before response-outcome contingency degradation to ascertain whether Abl-family kinase inhibition would recapitulate the effects of cocaine and occlude response-outcome conditioning (Fig. 2a). At probe test, saline-infused mice preferentially responded on the nondegraded aperture, but as with cocaine exposure, STI-571-infused mice responded nonselectively (interaction F(1,14) = 6.3, p = 0.03), suggesting habit formation (Fig. 2b).
Since our initial experiments indicated that post-training cocaine exposure was sufficient to bias decision-making strategies toward those based on stimulus-response habits (Fig. 1a), we next confirmed that post-training STI-571 infusions were also sufficient to occlude goal-directed decision-making. Indeed, when infusions were delayed until immediately after response-outcome contingency degradation (i.e., during the presumptive consolidation of new response-outcome learning), subsequent responding was nonselective, or habitual, in STI-571-infused mice at test, whereas saline-infused mice were unaffected (interaction F(1,10) = 5.6, p = 0.04) (Fig. 2c). Together, these findings indicate that, in the DMS, Abl-family kinases facilitate the consolidation of response-outcome conditioning.
To confirm that, with extended training, both groups would ultimately develop habits using this conditioning approach, these mice were trained to nose poke on both apertures, further using an RI 60 s schedule of reinforcement. During this period, STI-571-infused mice executed more responses for the same number of reinforcers than saline-infused mice, consistent with behavioral responding that is insensitive to its outcomes (F(1,9) = 6.6, p = 0.03) (Fig. 2d). After extended training, however, both control and STI-571 mice showed behavioral habits (i.e., responding was nonselective despite “degradation” of the response-outcome contingency associated with one of the apertures; interaction F < 1) (Fig. 2e).
Considerable evidence indicates that the DMS promotes goal-directed decision-making, whereas the DLS coordinates stimulus-response habits (Yin et al., 2008). In this case, STI-571 infusion into the DLS might be expected to block the development of habits by destabilizing critical DLS neurocircuitry. Thus, we infused a separate group of highly trained mice with STI-571 in the DLS before response-outcome contingency degradation. At test, control mice were insensitive to response-outcome contingency degradation as expected, and STI-571 infusion reinstated sensitivity to the modification of familiar response-outcome contingencies, meaning that STI-571-infused mice preferentially responded on the nondegraded aperture (interaction F(1,19) = 4.3, p = 0.05) (Fig. 2f). Thus, STI-571 infusions mimic lesions of the DLS and occlude habit formation (Yin et al., 2004).
In a second group of mice, we tested whether infusion immediately after response-outcome contingency degradation training was sufficient to reinstate goal-directed behavior as in our DMS-targeted experiments. In this case, infusion after training had no effects (interaction F < 1) (Fig. 2g). This dissociation is discussed below, and infusion sites for these experiments are represented in Fig. 2h.
Discussion
Considerable evidence indicates that both humans and rodents learn to associate specific actions with their outcomes and that, when action-outcome relationships are modified, as in the case of response-outcome contingency degradation, DMS interactions with the prefrontal cortex allow for flexible redirection of instrumental behavior (Balleine and O'Doherty, 2010). Insensitivity to response-outcome associative contingencies by contrast results in habitual behavioral patterns that are automated and stimulus-driven. Stimulus-response habit formation is considered a fundamental etiological factor in several psychopathologies, including obsessive-compulsive disorder and addiction; and indeed, a history of chronic psychostimulant exposure facilitates the formation of behavioral habits in animal models (Schoenbaum and Setlow, 2005; Nelson and Killcross, 2006; Nordquist et al., 2007; Zapata et al., 2010). Moreover, habitual responding for cocaine develops more rapidly than for food reinforcers (Miles et al., 2003), and drug-related structural destabilization of critical neurocircuits may play a causal role. For example, psychostimulant exposure, which accelerates habit formation, also reduces DMS dendritic spine density while elevating DLS spine density (Jedynak et al., 2007). These findings are consistent with evidence that the DMS coordinates goal-directed decision-making whereas the DLS is critical for the formation of behavioral habits (Yin et al., 2008).
Together, previous findings suggest a model wherein the structural stability of critical DMS and DLS neurons determines engagement in goal-directed versus stimulus-response decision-making strategies, but this model has not been directly tested. As an initial step, we paired an acute cocaine exposure with food-associated response-outcome contingency degradation because a single cocaine exposure destabilizes striatal neurons (Shen et al., 2009). Acute post-training cocaine exposure blocked sensitivity to response-outcome contingency degradation, resulting in a reliance on familiar, habitual response strategies. Acute cocaine also rapidly decreased phosphorylation of the cytoskeletal regulator cortactin and expression of the postsynaptic marker PSD95. Based on this pattern, we hypothesized that cocaine-induced neuronal destabilization biases decision-making strategies. To isolate potential mechanisms, we infused STI-571, an inhibitor of Abl-family kinases, which orchestrate cytoskeletal motility via substrates, such as cortactin and p190RhoGAP. Abl-family kinase inhibitors block these interactions, disinhibiting the RhoA GTPase and resulting in increased actomyosin contractility and reduced actin polymerization-based protrusion (Bradley and Koleske, 2009). Here, both pretraining and post-training infusions occluded response-outcome conditioning, resulting in nonselective, or “habitual,” responding. This pattern implicates for the first time to our knowledge a specific constellation of cytoskeletal effectors in the consolidation of response-outcome contingency learning.
Bidirectional coordination of actions and habits by Abl-family kinase signaling
The Abl-family kinase inhibitor STI-571 is an anti-cancer drug that is largely selective for Abl-family kinases (Abl and Arg), although it also inhibits c-KIT protein kinase and the PDGF receptor (Buchdunger et al., 2000). Using this inhibitor, we blocked Abl-family kinase signaling in the DMS before modifying a familiar response-outcome contingency. STI-571-infused mice were subsequently unable to differentiate between the “nondegraded” and “degraded” response contingencies, indicating that mice were unable to use new response-outcome information to guide decision-making. To dissociate effects on response acquisition versus response-outcome contingency consolidation, we next infused STI-571 immediately after response-outcome contingency degradation associated with one instrumental aperture; in this case, STI-571 would be expected to impact the consolidation, but not acquisition, of the new response-outcome contingency. Again, STI-571 mice responded equally on both apertures during a subsequent probe test, indicating that destabilization of DMS neurons interfered with the consolidation of new response-outcome associative learning. Interestingly, response rates in general were also decreased. Although nonselective responding nonetheless connotes habit formation, future studies should investigate the role of DMS Abl-family kinase signaling in occluding sensitivity to nonreinforcement because the probe tests here were conducted in extinction, as is standard practice.
Compelling evidence indicates that, unlike the DMS, the DLS is critical for the development of stimulus-response habits (Yin et al., 2008). Thus, we also infused STI-571 into the DLS; infusions before contingency degradation training reinstated goal-directed decision-making in highly trained mice, but notably, infusions after response-outcome contingency degradation had no effects. How might we account for this dissociation? Electrophysiological studies indicate that both DMS and DLS systems are involved in early skill learning (Kimchi et al., 2009) and that the DMS disengages as behaviors become automated (Yin et al., 2008). Moreover, DLS inactivation increases sensitivity to modifications in learned response-outcome relationships (Yin et al., 2006). We argue that targeted destabilization of neuroplasticity or cellular structure in the DLS of highly trained mice re-engages DMS systems by default, allowing for the acquisition of new response-outcome contingencies. If DLS infusions are delayed until after an opportunity to acquire new response-outcome associative information has passed, behavioral response strategies remain unchanged because the DMS was not engaged during the critical learning event. In this model, neuronal stability in the DMS is critical for outcome-based decision-making, whereas stability in the DLS contributes to the maintenance of familiar behavioral patterns.
Involvement of cell adhesion factors in addiction psychopathology
Our focus on Abl-family kinase signaling stems from evidence that inhibition of Arg kinase in the brain eliminates dendritic spines, synapses, and even dopamine D2 receptors (Sfakianos et al., 2007; Gourley et al., 2009, 2012b). Moreover, cortactin is implicated in cocaine-mediated structural plasticity in the ventral striatum (Toda et al., 2006). We show here that acute cocaine reduces cortactin phosphorylation and PSD95 expression in the DMS. Consistent with previous reports, the opposite profile is observed in the ventral striatum (Toda et al., 2006). Given that cortactin phosphorylation regulates actin polymerization-based phenomena (e.g., Lapetina et al., 2009), diminished phosphorylation in the DMS and enhanced phosphorylation in the ventral striatum mirror anatomically selective patterns of psychostimulant-induced structural reorganization in the striatum, that is, spine elimination in the DMS (Jedynak et al., 2007) and spine proliferation in the ventral striatum (Robinson and Kolb, 1997). We thus propose a model wherein these dendritic spine modifications directly influence decision-making by biasing strategies toward those governed by (hypertrophic) DLS circuits at the expense of engagement in goal-directed response strategies that would otherwise depend on atrophied DMS neurocircuits.
Like acute cocaine administered immediately after response-outcome contingency degradation here, a history of repeated psychostimulant exposure (7–21 d before instrumental training) also results in stimulus-response habits (Schoenbaum and Setlow, 2005; Nelson and Killcross, 2006; Nordquist et al., 2007). Whether a history of cocaine exposure confers vulnerability to habits by increasing the control of stimulus-response associations over behavior or by decreasing sensitivity to response-outcome contingencies has been a topic of considerable debate (Schoenbaum and Setlow, 2005; Zapata et al., 2010). Our findings suggest that cocaine decreases sensitivity to response-outcome learning because, unlike in previous reports, cocaine was injected only after response-outcome contingency degradation, leaving all prior conditioning unadulterated. Our data do not, however, preclude the possibility that a history of cocaine exposure also increases sensitivity to reward-related cues and that this potentiates habit formation (Jentsch and Taylor, 1999). Moreover, it is important to note that a history of cocaine exposure results in meta-plastic dendritic spine reorganization in the striatum in response to additional cocaine exposure (i.e., cocaine-induced spine reorganization is both structurally and temporally distinct in cocaine-experienced animals relative to animals without a history of cocaine exposure) (Shen et al., 2009). Thus, acute cocaine (as here) and repeated cocaine (as in prior reports) may exert differential influences on DMS neural structure and DMS-dependent decision-making, but these distinctions remain unclarified.
In conclusion, since the seminal work of Robinson and Kolb (1997), the relationship between psychostimulant exposure and cytoskeletal morphology in corticostriatal circuits has been intensely investigated. Experiments to this end generally evaluate the effects of repeated, sensitizing cocaine exposure on ventral striatal spine density or morphology, whereas experiments addressing whether the instability of cytoskeletal structure triggers or exacerbates behavioral vulnerabilities are lacking. In one exception, cytoskeletal destabilization via latrunculin A or cofilin inhibition in the ventral striatum potentiated cocaine-primed reinstatement, an animal model of relapse (Toda et al., 2006). Prefrontal cortical latrunculin A infusions also potentiate cocaine-induced psychomotor sensitization (Gourley et al., 2012b), and Arg, which is directly upstream of cortactin, regulates cocaine-induced psychomotor sensitivity and cognitive resilience through interactions with β1 integrin (Warren et al., 2012). Given that integrin expression is regulated by cocaine exposure in mice (Wiggins et al., 2009) and implicated in cocaine addiction in humans (Mash et al., 2007; Drgon et al., 2010), a better understanding of the role of this and other cell adhesion molecules may inform upon novel pharmacotherapeutic approaches to psychostimulant addiction and other disorders characterized by unremitting, automated, inflexible behavioral patterns.
Footnotes
This work was supported by PHS DA011717, DA027844, the Interdisciplinary Research Consortium on Stress, Self-control and Addiction (UL1-DE19586 and the NIH Roadmap for Medical Research/Common Fund, AA017537), and the Connecticut Department of Mental Health. S.L.G. is supported by Children's Healthcare of Atlanta, the Emory Egleston Children's Research Center, and, as a researcher at the Yerkes National Primate Research Center, the Office of Research Infrastructure Programs/OD P51OD11132. We thank Drs. Meiyu Xu and Christopher Pittenger for developing the stereotaxic coordinates used here and Drs. Mary Torregrossa and Anthony Koleske for valuable feedback.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Jane R. Taylor, Yale University, Department of Psychiatry, Division of Molecular Psychiatry, Connecticut Mental Health Center, Ribicoff Laboratories, 34 Park Street, New Haven, CT 06508. jane.taylor@yale.edu