Abstract
Stress modulates instrumental action in favor of habitual stimulus-response processes that are insensitive to changes in outcome value and at the expense of goal-directed action-outcome processes. The neuroendocrine mechanism underlying this phenomenon is unknown. Here, we tested the hypothesis that concurrent glucocorticoid and noradrenergic activity biases instrumental behavior toward habitual performance. To this end, healthy men and women received hydrocortisone, the α2-adrenoceptor antagonist yohimbine, or both orally before they were trained in two instrumental actions leading to two distinct food outcomes. After training, one of the outcomes was devalued by inviting participants to eat that food to satiety. A subsequent extinction test revealed whether instrumental performance was goal-directed or habitual. Participants who received hydrocortisone or yohimbine alone decreased responding to the devalued action in the extinction test, i.e., their behavior was goal-directed. The combined administration of hydrocortisone and yohimbine, however, rendered participants' behavior insensitive to changes in the value of the goal (i.e., habitual). These findings demonstrate that the concerted action of glucocorticoids and noradrenergic activity shifts instrumental behavior from goal-directed to habitual control.
Introduction
The acquisition and performance of instrumental actions which are directed at achieving specific rewards or avoiding punishments can be controlled by two distinct processes: (1) a goal-directed process that learns action-outcome contingencies and is sensitive to changes in goal value and (2) a habitual, stimulus-response process that is largely independent of the current value of the goal (Adams, 1982; Balleine and Dickinson, 1991, 1998; Dickinson et al., 1995). Converging lines of evidence, from lesion studies in rodents and human neuroimaging studies, demonstrate that these two processes rely on distinct neural networks, with the prefrontal cortex, the dorsomedial thalamus and the dorsomedial striatum supporting goal-directed action (Balleine and Dickinson, 1998; Corbit et al., 2003; Killcross and Coutureau, 2003; Yin et al., 2005; Valentin et al., 2007; de Wit et al., 2009) and the dorsolateral striatum subserving habitual instrumental action (Yin et al., 2004; Tricomi et al., 2009; for review, see Yin and Knowlton, 2006).
Recent evidence indicates that stress may alter the contribution of goal-directed and habitual processes to instrumental behavior. Rats that were subjected to chronic stress showed a significant bias toward more habitual responding (Dias-Ferreira et al., 2009). Similarly, an acute stressor before instrumental learning rendered the behavior of healthy humans insensitive to changes in the value of the outcome, i.e., stress made participants' behavior habitual (Schwabe and Wolf, 2009). The neuroendocrine mechanism underlying the stress-induced modulation of goal-directed and habitual action is unknown.
Stress effects on declarative memory necessitate a co-occurrence of glucocorticoids (GCs; mainly cortisol in humans), the steroid hormones that are released from the adrenal cortex in response to stress, and arousal-induced noradrenergic activation (for review, see Roozendaal et al., 2006b, 2009; Wolf, 2009). For example, administration of the β-adrenergic antagonist propranolol blocked the effects of GCs on memory (Roozendaal et al., 2004b; de Quervain et al., 2007). Conversely, coadministering the α2-adrenoceptor antagonist yohimbine, which increases noradrenaline levels in the brain, with GCs in low-arousal conditions, reinstated GC effects on memory (Roozendaal et al., 2006a).
In the present experiment, we hypothesized that stress effects on goal-directed and habitual instrumental action also require co-occurring GC and noradrenergic activity. To test this hypothesis, participants received a placebo, hydrocortisone, yohimbine, or a combination of hydrocortisone and yohimbine before they were trained in two instrumental actions leading to two distinct food rewards. After training, one of the two outcomes was devalued by inviting subjects to eat that food to satiety. Goal-directed and habitual behavior was revealed in an extinction test presented after outcome devaluation (Balleine and Dickinson, 1998; Valentin et al., 2007). Decreased responding to the action associated with the devalued outcome indicates goal-directed behavior. Insensitivity of instrumental behavior to the change in outcome value reflects habitual behavior. If habitual action is promoted by co-occurring GC and noradrenergic activity, then instrumental responding should be insensitive to outcome devaluation after the administration of both hydrocortisone and yohimbine but not after either drug alone.
Materials and Methods
Eighty healthy, normal-weight individuals [40 men, 40 women; age: mean (M) = 23.76 years, SEM = 0.33 years; body-mass index: M = 23.25 kg/m2, SEM = 0.38 kg/m2] participated in this experiment. The participants were preassessed in a standardized telephone interview to exclude those who met any of the following criteria: present or lifetime history of psychiatric disorders; cardiovascular disease; asthma; current treatment with psychotropic medications, narcotics, β-blockers or steroids; drug abuse; smoking. Furthermore, subjects were prescreened before participation to ensure that they had no food intolerance, were not on a diet and found the food rewards that were used in this study (orange juice, oranges, chocolate milk, and chocolate pudding) pleasant. Nevertheless, 13 participants had to be excluded from the analyses because they indicated during the experiment that they disliked at least one of the food rewards, defined as a pleasantness rating below 20 on a scale from 0 (“not pleasant”) to 100 (“very pleasant”) together with choosing the corresponding high-probability action in <30% of the training trials. This left a sample of 67 participants.
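The exclusion rule for disliked rewards can be written out as a short check. The following Python sketch is only an illustration: it assumes that both criteria (low pleasantness rating and low choice rate) had to hold for the same reward, and the function names and the input layout are hypothetical rather than taken from the study's analysis scripts.

```python
from typing import Dict, Tuple


def disliked_reward(pleasantness: float, high_prob_choice_rate: float) -> bool:
    """True if a reward counts as disliked under the rule sketched here.

    pleasantness: rating on the 0-100 pleasantness scale.
    high_prob_choice_rate: fraction (0-1) of training trials on which the
        high-probability action leading to this reward was chosen.
    Assumption: both criteria must be met together.
    """
    return pleasantness < 20 and high_prob_choice_rate < 0.30


def exclude_participant(rewards: Dict[str, Tuple[float, float]]) -> bool:
    """Exclude a participant if at least one food reward is disliked.

    rewards maps a reward name to (pleasantness, high_prob_choice_rate).
    """
    return any(disliked_reward(p, r) for p, r in rewards.values())
```

For example, a participant who rated chocolate milk at 10 and chose its high-probability action on 20% of training trials would be excluded under this reading, whereas a low rating paired with a choice rate of 80% would not trigger exclusion.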
Participants were asked to refrain from excessive exercise, caffeine and eating within the 3 h before testing. Informed consent was obtained from all participants. The study protocol was approved by the ethics committee of the medical faculty of the Ruhr-University Bochum.
Experimental design and procedure.
We used a double-blind, placebo-controlled between-subjects design in which participants were randomly assigned to one of four experimental conditions: (1) oral placebo (7 men, 8 women; plac); (2) oral placebo and hydrocortisone (20 mg, Jenapharm) (8 men, 7 women; plac+cort); (3) oral placebo and yohimbine (20 mg, Desma), a blocker of the α2 adrenergic receptor that stimulates central noradrenergic activity (9 men, 9 women; plac+yoh); (4) oral hydrocortisone and yohimbine (10 men, 9 women; cort+yoh). Drug doses were chosen in accordance with earlier studies (Buchanan and Lovallo, 2001; van Stegeren et al., 2010).
All testing took place between 2:00 PM and 6:30 PM, and all phases of the experiment (drug intake, learning, devaluation, extinction testing) took place in the same room. After participants' arrival at the laboratory, baseline blood pressure was measured and a first saliva sample was taken. Depending on the experimental group, participants then received placebo, hydrocortisone, and/or yohimbine pills. After a break of 45 min, in which subjects were allowed to read, blood pressure was measured again and another saliva sample was taken. Before the learning session started, ratings of hunger level and pleasantness of the foods that were presented in the learning task (orange juice, chocolate milk, peppermint tea, and water) were collected on a scale from 0 (“not hungry/pleasant”) to 100 (“very hungry/pleasant”). After participants had completed the instrumental learning task, they again rated their hunger and the food pleasantness. Participants were then invited to eat either oranges or chocolate pudding to satiety. Immediately after this outcome devaluation, ratings of hunger level and food pleasantness were collected again. Following another blood pressure measurement and another saliva sample, participants performed the instrumental learning task in extinction. Finally, we assessed participants' explicit action-outcome knowledge in free recall and cued recall tests.
Instrumental learning task.
The experimental learning task that was used in the present experiment has been described in detail previously (Valentin et al., 2007; Schwabe and Wolf, 2009). Briefly, participants were presented with three trial types: chocolate, orange, and neutral. On each trial, participants were asked to choose between two actions represented by two distinct symbols on a computer screen. After subjects had selected one of the symbols by moving the mouse cursor to the symbol and pressing the left mouse button, the corresponding symbol was highlighted for 3 s and either 1 ml of a liquid food or no liquid was delivered, according to the reward schedule associated with the chosen action. The liquids were delivered with separate electronic pumps (one pump for each liquid) and transferred via 3-m-long tubes (diameter: 3 mm) to the participants, who kept the ends of the tubes between their lips. Importantly, the two actions per trial type differed in the probability with which a food outcome was delivered. While one action was followed by a food outcome with a probability of p = 0.70 (high-probability action), the probability of a food outcome was p = 0.20 for the other action (low-probability action). On the chocolate and orange trials, the high-probability action led to chocolate milk and orange juice, respectively, with a probability of p = 0.50, and to a common outcome (peppermint tea) with a probability of p = 0.20 (the reward and the common outcome were never presented in the same trial). On both trial types, the low-probability action was never associated with the rewards but led only to the common outcome with a probability of p = 0.20. In neutral trials, water was delivered, either with a probability of p = 0.70 (high-probability action) or p = 0.20 (low-probability action). This neutral condition served as a control to assess the effect of the rewards (chocolate milk, orange juice) on participants' choice behavior.
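The training reward schedule described above can be summarized as a small lookup table. The Python sketch below is an illustration only and does not reproduce the software used in the experiment; the names `SCHEDULE`, `sample_outcome`, and `p_any_liquid` are hypothetical, and the sampling scheme (a single uniform draw per trial) is an assumption chosen so that at most one liquid is delivered per trial.

```python
import random

# Outcome probabilities per (trial type, action), as given in the text.
# Each entry lists (outcome, probability); the remaining probability mass
# corresponds to no liquid being delivered.
SCHEDULE = {
    ("chocolate", "high"): [("chocolate milk", 0.50), ("peppermint tea", 0.20)],
    ("chocolate", "low"): [("peppermint tea", 0.20)],
    ("orange", "high"): [("orange juice", 0.50), ("peppermint tea", 0.20)],
    ("orange", "low"): [("peppermint tea", 0.20)],
    ("neutral", "high"): [("water", 0.70)],
    ("neutral", "low"): [("water", 0.20)],
}


def sample_outcome(trial_type, action, rng):
    """Draw the outcome of one trial; returns None if no liquid is delivered."""
    u = rng.random()
    cumulative = 0.0
    for outcome, p in SCHEDULE[(trial_type, action)]:
        cumulative += p
        if u < cumulative:
            return outcome
    return None


def p_any_liquid(trial_type, action):
    """Overall probability that some liquid follows the chosen action."""
    return sum(p for _, p in SCHEDULE[(trial_type, action)])
```

Calling `p_any_liquid("chocolate", "high")` gives the overall food-outcome probability of the high-probability action (reward plus common outcome), while `sample_outcome("chocolate", "low", random.Random(1))` can only ever return peppermint tea or `None`.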
Participants completed 75 trials for each trial type, resulting in 225 trials in total (intertrial interval: 8 s). The occurrence of the trial types was fully randomized. The specific assignment of the symbols and the positions on the computer screen to each action was held constant for each subject but counterbalanced across participants.
Selective outcome devaluation.
After subjects had completed the learning task, they were invited to eat either oranges or chocolate pudding (Optiwell, 150 g per cup) until they did not want to eat any more. This served to decrease the value of one food outcome (e.g., eating chocolate pudding to satiety should decrease the value of chocolate milk), whereas the value of the other outcome (orange juice in the example) should remain intact. The specific food used for devaluation was counterbalanced across participants.
Extinction test.
The effect of the selective outcome devaluation on instrumental behavior was assessed in an extinction test given shortly after the devaluation procedure (∼100 min after pill intake). Participants again completed 75 trials of each of the three trial types in which they were asked to choose between the two possible actions. The basic procedure was the same as during the learning session. This time, however, the rewards (the devalued and nondevalued food outcomes) were no longer presented, i.e., participants were tested in extinction for these outcomes. In both the chocolate and the orange trials, the two alternative actions delivered the common outcome (peppermint tea) with a probability of p = 0.20. In the neutral trials, water was now available with an equal probability of p = 0.20 for both actions.
Performance in this extinction test revealed whether instrumental behavior was goal-directed or habitual. Decreased responding to the action associated with the devalued food outcome relative to the action associated with the valued food outcome indicated goal-directed behavior. Continued choice of the devalued instrumental action was indicative of habitual behavior.
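The behavioral readout can be illustrated with a short computation of the percentage of high-probability choices per 15-trial block, followed by a valued-minus-devalued difference. This Python sketch is an illustration under stated assumptions: the function names are hypothetical, the per-block difference is a descriptive summary introduced here and not the statistical analysis reported in the Results, and the input is assumed to be one boolean per trial (True if the high-probability action was chosen).

```python
def percent_high_by_block(chose_high, block_size=15):
    """Percentage of high-probability choices in consecutive trial blocks."""
    if len(chose_high) % block_size != 0:
        raise ValueError("number of trials must be a multiple of block_size")
    return [
        100.0 * sum(chose_high[i:i + block_size]) / block_size
        for i in range(0, len(chose_high), block_size)
    ]


def devaluation_effect(valued, devalued, block_size=15):
    """Per-block difference (valued minus devalued) in percentage points.

    Clearly positive values correspond to goal-directed responding;
    values near zero correspond to insensitivity to devaluation.
    """
    v = percent_high_by_block(valued, block_size)
    d = percent_high_by_block(devalued, block_size)
    return [a - b for a, b in zip(v, d)]
```

With 75 trials per trial type and a block size of 15, each function returns five values, one per extinction block.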
Explicit memory test.
In a free recall test immediately after the extinction test, participants were asked about the actions that had to be performed to receive chocolate milk, orange juice, and water, respectively. We gave one point for each correctly named symbol and symbol position (e.g., participants received two points if they correctly mentioned that they had to click on the circle in the upper left corner to receive chocolate milk), i.e., a maximum score of six points could be reached.
In addition, explicit action-outcome knowledge was assessed in a cued-recall test. Participants were presented a multiple-choice questionnaire, in which they were required to indicate (1) in which position on the screen each of the symbols had been presented and (2) which symbols were associated with the delivery of chocolate milk, orange juice, and water, respectively. One point was given for each correct answer. As there were nine multiple choice questions, participants could reach a maximum score of nine points.
Saliva sampling and cortisol measurement.
Saliva samples were collected before the pill intake, immediately before the beginning of the instrumental learning session, and before the extinction test session with the help of Salivette (Sarstedt) collection devices. Saliva samples were stored at −20°C until analysis. The biologically active, free fraction of the stress hormone cortisol was analyzed from saliva using an immunoassay (IBL). Inter- and intra-assay coefficients of variation were below 9%.
Blood pressure measurement.
As an indicator of autonomic nervous system activity, we measured blood pressure before the pill intake, before the instrumental learning session, and before the extinction test session by means of the Dinamap system (Critikon); the cuff was placed at the left upper arm.
Results
Physiological changes following cortisol and yohimbine intake
Salivary cortisol
As expected and shown in Figure 1A, the intake of hydrocortisone (i.e., cortisol) resulted in a significant increase in salivary cortisol. A mixed-design ANOVA with the within-subject factor time point of measurement and the between-subjects factors cortisol (yes vs no) and yohimbine (yes vs no) revealed a main effect of cortisol and a cortisol × time interaction effect (both F values >82.56, both p values <0.001, both η2 > 0.57). There was no effect of yohimbine on salivary cortisol, nor was there a cortisol × yohimbine interaction (all p > 0.13).
Figure 1. Salivary cortisol and blood pressure changes following yohimbine and hydrocortisone intake. A, Participants who received hydrocortisone had significantly elevated salivary cortisol concentrations before learning and before the extinction test. B, C, Yohimbine intake resulted in significantly higher systolic (B) and diastolic (C) blood pressure indicating increased activation of the autonomic nervous system. Data represent mean ± SEM. ***Significantly higher cortisol concentrations in the Plac+Cort and Cort+Yoh groups compared with the Plac and Plac+Yoh groups (all p values <0.01, LSD post hoc tests); **significantly higher systolic blood pressure in the Plac+Yoh and Cort+Yoh groups compared with the Plac and Plac+Cort groups (all p values <0.05, LSD post hoc tests); *significantly higher diastolic blood pressure in the Plac+Yoh group than in the Plac and Plac+Cort groups (LSD post hoc tests, all p values <0.05; Cort+Yoh vs Plac/Plac+Cort: all p values <0.15).
Blood pressure
Yohimbine caused a significant activation of the autonomic nervous system as reflected in significant increases in systolic (main effect yohimbine and yohimbine × time interaction: both F values > 13.66, both p values <0.001, both η2 > 0.18) (Fig. 1B) and diastolic blood pressure (both F values >5.46, both p values <0.03, both η2 > 0.08) (Fig. 1C). Cortisol had no effect on systolic or diastolic blood pressure (all p > 0.16).
Cortisol and yohimbine do not affect learning curves
Figure 2 shows the learning curves of the four groups. Over training, participants in all groups increasingly preferred the high-probability actions associated with the rewards (chocolate milk and orange juice, respectively) over the corresponding low-probability actions. This indicates successful instrumental learning. In neutral trials, however, subjects did not choose the high-probability action more often than the low-probability action, showing that they were indifferent as to whether they received the effectively neutral liquid or not. Importantly, the four treatment groups did not differ in their learning curves. This is supported by a mixed-design ANOVA with the within-subjects factors time (5 blocks with 15 trials per block) and trial type (chocolate, orange, neutral) and the between-subjects factors cortisol and yohimbine, which yielded significant effects of time (F(4,252) = 20.97, p < 0.001, η2 = 0.25), trial type (F(2,126) = 46.77, p < 0.001, η2 = 0.43) and a significant time × trial type interaction (F(8,504) = 5.31, p < 0.001, η2 = 0.08) but no effects of cortisol, yohimbine or any interaction effects involving cortisol or yohimbine (all p values >0.13).
Figure 2. Percentage of high-probability actions in the three trial types (chocolate, orange, and neutral) across the learning session. All participants learned to choose the instrumental action associated with the food rewards; they increasingly preferred the high-probability actions associated with chocolate milk and orange juice, respectively, over their low-probability counterparts (*p < 0.05). In neutral trials, subjects did not favor the high-probability action over the low-probability action. Cortisol and yohimbine had no influence on learning curves. The gray line marks the percentage of high-probability actions of 50%, where subjects were completely indifferent between high- and low-probability actions. Data represent mean ± SEM; 1 block = 15 trials.
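The degrees of freedom reported for the learning-curve ANOVA can be checked against the group sizes given in the Methods. The Python sketch below is a consistency check only: it assumes a standard 2 × 2 between-subjects layout with four cells and uncorrected degrees of freedom (no sphericity correction), and the function names are hypothetical.

```python
# Group sizes after exclusions, taken from the experimental design section.
GROUP_SIZES = {"plac": 15, "plac+cort": 15, "plac+yoh": 18, "cort+yoh": 19}


def between_error_df(group_sizes):
    """Error df for between-subjects effects: N minus number of cells."""
    return sum(group_sizes.values()) - len(group_sizes)


def within_effect_df(n_levels, group_sizes):
    """(effect df, error df) for a within-subjects main effect."""
    effect = n_levels - 1
    return effect, effect * between_error_df(group_sizes)


def within_interaction_df(levels_a, levels_b, group_sizes):
    """(effect df, error df) for an interaction of two within-subjects factors."""
    effect = (levels_a - 1) * (levels_b - 1)
    return effect, effect * between_error_df(group_sizes)


print(sum(GROUP_SIZES.values()))                    # total sample size
print(within_effect_df(5, GROUP_SIZES))             # time (5 blocks)
print(within_effect_df(3, GROUP_SIZES))             # trial type (3 types)
print(within_interaction_df(5, 3, GROUP_SIZES))     # time x trial type
```

Under these assumptions, the four groups sum to 67 participants, the between-subjects error term has 63 degrees of freedom, and the within-subjects terms come out as (4, 252), (2, 126), and (8, 504), which agrees with the F statistics reported above.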
Selective outcome devaluation remains unaffected by cortisol and yohimbine
During the selective outcome devaluation after learning, subjects ate on average 2.09 cups of chocolate pudding (SEM: 0.19) or 2.07 oranges (SEM: 0.14). This led to a significant drop in subjective hunger ratings from 61.6 (SEM: 3.1) before the learning session and 58.4 (SEM: 3.5) immediately before outcome devaluation to 36.3 (SEM: 3.4) after outcome devaluation. Neither the amount of food consumed nor the subjective hunger ratings were affected by yohimbine or cortisol (all F values <1.85, all p values >0.17, all η2 <0.03).
The subjective pleasantness ratings confirmed that the outcome devaluation was indeed specific to the food eaten to satiety (Fig. 3). An ANOVA with time (before vs after devaluation) and value (valued vs devalued) as within-subjects factors and yohimbine and cortisol as between-subjects factors revealed that pleasantness ratings decreased sharply after feeding for the devalued but not for the valued outcome (time × value interaction: F(1,63) = 30.76, p < 0.001, η2 = 0.33). There were no effects of cortisol or yohimbine on pleasantness ratings (all F values <1.69, all p values >0.20, all η2 < 0.03).
Figure 3. Subjective pleasantness ratings on a scale from 0 (“not pleasant”) to 100 (“very pleasant”) before training, before devaluation, and after devaluation. Before the selective outcome devaluation, participants found the rewards (valued and devalued outcome) more pleasant than the common and neutral outcomes. After devaluation, pleasantness ratings decreased significantly for the food eaten (devalued outcome) relative to the food not eaten (valued outcome). Data represent mean ± SEM.
Concurrent cortisol and noradrenergic activity render instrumental behavior insensitive to outcome devaluation
Figure 4 shows subjects' responses in the extinction test. Participants who had received a placebo before learning behaved in a goal-directed manner. Consistent with their pleasantness ratings, they preferred the high-probability action associated with the valued outcome over the high-probability action associated with the devalued outcome across extinction testing (F(1,14) = 4.88, p < 0.05, η2 = 0.26). In the first 15-trial block, before they could know that the rewards were no longer presented, they favored the valued high-probability action over its low-probability counterpart (binomial test, t(14) = 8.05, p < 0.001). No such trend was found for the devalued high-probability action. On the contrary, participants in the plac group even tended to avoid the high-probability action associated with the devalued outcome in the first extinction block (t(14) = 1.91, p = 0.08).
Figure 4. Percentage of valued, devalued, and neutral high-probability actions across the extinction test session. All participants favored the valued high-probability action over its low-probability counterpart in the first extinction block (*p < 0.05). However, only participants who had received both cortisol and yohimbine chose the high-probability action associated with the devalued outcome more often than the corresponding low-probability action (#p < 0.05), indicating that their responding was insensitive to changes in the value of the outcome (i.e., habitual). The gray line marks the percentage of high-probability actions of 50%, where subjects were completely indifferent between high- and low-probability actions. Data represent mean ± SEM; 1 block = 15 trials.
As in the plac group, participants in the plac+cort and plac+yoh groups behaved in a goal-directed manner. They chose the valued high-probability actions more often than the devalued high-probability actions across extinction testing (both F values >8.03, both p values <0.02, both η2 > 0.33) and preferred the valued but not the devalued high-probability action over the respective low-probability action in the first 15-trial block (valued: both t values >5.71, both p values <0.001; devalued: both t values <1.55, both p values >0.14). This indicates that cortisol or yohimbine alone did not change instrumental responding.
The combined administration of cortisol and yohimbine, however, altered instrumental behavior significantly. Participants in the cort+yoh group did not prefer the valued high-probability action over the devalued high-probability action (F(1,19) = 0.19, p = 0.67, η2 = 0.01). Furthermore, subjects in the cort+yoh group favored both the valued and the devalued high-probability actions over their low-probability counterparts in the first 15 extinction test trials (both t(18) > 6.54, both p < 0.001). In sum, participants who had received both cortisol and yohimbine indicated that they no longer wanted the devalued outcome but still performed the action associated with it, i.e., they responded habitually.
Accordingly, a value (valued vs devalued) × time (five 15 trial blocks) × cortisol (yes vs no) × yohimbine (yes vs no) mixed-design ANOVA yielded a significant value × cortisol × yohimbine interaction effect (F(1,63) = 3.88, p = 0.05, η2 = 0.06). Follow-up ANOVAs showed a significant cortisol × yohimbine interaction for devalued (F(1,63) = 4.87, p < 0.05, η2 = 0.07) but not for valued trials (F(1,63) = 0.00, p = 0.97, η2 = 0.00).
Since the treatment effect appeared to be most pronounced in the first extinction test block, we compared the performance of the four groups in the first extinction block with that in the last training block by means of a value (valued vs devalued) × time (last 15 training trials vs first 15 extinction trials) × cortisol × yohimbine ANOVA. This analysis revealed a four-way interaction effect (F(1,63) = 4.25, p < 0.05, η2 = 0.06). Follow-up tests indicated that there was a time × cortisol × yohimbine interaction in devalued (F(1,63) = 4.67, p < 0.05, η2 = 0.07) but not in valued trials (F(1,63) = 0.00, p = 0.99, η2 = 0.00). As shown in Figure 5, all participants except those in the cort+yoh group markedly decreased responding to the devalued action from training to extinction testing (i.e., after outcome devaluation).
Figure 5. Comparison of valued and devalued high-probability actions in the last 15-trial block of training and the first 15-trial block of extinction testing. All but the participants of the Cort+Yoh group decreased responding to the devalued high-probability action after selective outcome devaluation (**p < 0.001). Data represent mean ± SEM.
All participants decreased responding to the valued high-probability action as extinction testing proceeded (time × cortisol × yohimbine ANOVA; main effect time: F(4,252) = 11.72, p < 0.001, η2 = 0.16; for all other effects: p > 0.27) which suggests successful extinction learning.
Explicit action-outcome knowledge is not influenced by cortisol or yohimbine
Overall, participants performed very well in the tests of action-outcome knowledge. On average, they reached 4.28 points (SEM: 0.23; maximum score: 6 points) in the free recall test and 8.04 points (SEM: 0.13; maximum score: 9 points) in the cued recall test. There was no effect of cortisol or yohimbine on explicit action-outcome knowledge, nor was there a significant cortisol × yohimbine interaction (all p > 0.15).
No effect of cortisol, yohimbine, or outcome devaluation on reaction times
As in previous reports, reaction times were not modulated by the treatment or the outcome devaluation (Valentin et al., 2007; Schwabe and Wolf, 2009, 2010). A mixed-design ANOVA with the between-subjects factors cortisol (yes vs no) and yohimbine (yes vs no) and the within-subjects factors time (five 15-trial extinction test blocks) and value (valued vs devalued) indicated that reaction times decreased significantly across the extinction session (F(4,252) = 3.69, p = 0.01, η2 = 0.06), yet there were no other main or interaction effects (all p values >0.10).
Discussion
We demonstrated recently that acute stress favors habit behavior at the expense of goal-directed instrumental behavior (Schwabe and Wolf, 2009). Here, we present the putative neuroendocrine mechanism. Our results indicate that a combination of high cortisol concentrations and increased noradrenergic activity renders individuals' behavior insensitive to changes in goal value. Elevated cortisol or stimulation of the noradrenergic system alone did not affect sensitivity to outcome devaluation. Thus, the present findings show that cortisol and noradrenergic arousal interact synergistically to shift instrumental behavior from goal-directed to habitual action. Interestingly, this change in instrumental responding came without changes in explicit task knowledge which might indicate that the combined administration of cortisol and yohimbine impaired participants' ability to integrate cognitive and emotional information (Pessoa, 2008).
GCs have been assigned a key role in stress effects on learning and memory processes (Lupien and McEwen, 1997; Joëls et al., 2006; Schwabe et al., 2010). However, it is by now well established that GC effects on hippocampus-dependent spatial or declarative memory necessitate co-occurring noradrenergic activation (Roozendaal et al., 2006a; de Quervain et al., 2007; Schwabe et al., 2009; for review, see Roozendaal et al., 2009). Similar effects have been reported for prefrontal cortex-dependent working memory (Roozendaal et al., 2004a). In line with these data, we show here that (acute) increases in cortisol alone do not lead to more habitual instrumental responding. Interestingly, the basolateral part of the amygdala has been identified as a locus of the GC–noradrenaline interaction and as a modulator of memory processes in other brain areas, such as the hippocampus or prefrontal cortex (McGaugh and Roozendaal, 2002; Roozendaal et al., 2006b). The present study shows for the first time that GC effects on the interplay of multiple memory systems (here the prefrontal cortex and the dorsolateral striatum) also require co-occurring noradrenergic activation. Given that the basolateral amygdala operates as a mediator of the interactive effects of GCs and noradrenaline on other prefrontal cortex-dependent processes (e.g., working memory), it is tempting to speculate that the present effects on prefrontal cortex-based goal-directed and dorsolateral striatum-based habit performance might be mediated by the basolateral amygdala as well. Indeed, there is accumulating evidence for a role of the basolateral amygdala in instrumental learning (Balleine et al., 2003; Balleine and Killcross, 2006).
Beyond such a modulatory role of the basolateral amygdala, GCs and noradrenaline may have exerted their effects directly on the brain areas responsible for goal-directed and habitual behavior, i.e., the prefrontal cortex and the dorsolateral striatum (Balleine and Dickinson, 1998; Yin et al., 2004; Valentin et al., 2007; Tricomi et al., 2009). The prefrontal cortex expresses stress hormone receptors at a high density (Patel et al., 2000) and stress or stress hormones impair neuroplasticity in the prefrontal cortex (Diamond et al., 2007). In contrast, the dorsolateral striatum expresses stress hormone receptors to a lesser extent (Patel et al., 2000), which suggests a lower sensitivity of this brain area to stress hormones. Yet, there is recent evidence showing that GCs enhance dorsolateral striatum-based memory processes (Medina et al., 2007; Quirarte et al., 2009). Hence, it could be hypothesized that, in the present study, GCs and noradrenaline had opposite effects on the neural circuits involved in goal-directed and habitual action, thus leading to habit behavior. In line with this hypothesis, chronic stress causes atrophy in the prefrontal cortex but hypertrophy in the dorsolateral striatum and these structural changes are paralleled by a shift toward habit behavior (Dias-Ferreira et al., 2009). However, the effects of chronic stress or repeated GC exposure are typically more pronounced than the effects of acute stress (or a single stress hormone dose), in particular at the morphological level. At this stage, we can only speculate about the neural correlates of the present effect. Determining the brain mechanism underlying the interactive effect of a single dose of GCs and noradrenaline on instrumental action remains a challenge for future neuroimaging studies in humans and lesion studies in rodents.
In the present study, cortisol and noradrenergic activity were significantly elevated before learning and before extinction testing. Therefore, stress hormones could have influenced the acquisition and the expression of goal-directed and habitual behavior. We showed recently that participants lose sensitivity to outcome devaluation when they are exposed to a stressor before extinction testing, i.e., after learning and after outcome devaluation (Schwabe and Wolf, 2010). This demonstrated clearly that stress and GCs may facilitate habitual responding without affecting instrumental learning. However, the effect of stress before learning appeared to be more pronounced and more long-lasting than the effect of stress before retrieval. In addition, prelearning stress reduced action-outcome knowledge while pre-retrieval stress did not (Schwabe and Wolf, 2009, 2010), suggesting that prelearning stress affected not only the expression but also the acquisition of instrumental behavior. Unfortunately, stress hormone effects on the acquisition of goal-directed and habitual responses cannot be separated from the expression of these responses in the devaluation paradigm that was used here and in most other studies on instrumental action, because in this paradigm goal-directed and habitual action can only be distinguished based on the choice behavior in the extinction test after outcome devaluation. Novel experimental paradigms are desirable that allow the on-line assessment of the habitual status of an instrumental response during learning. Moreover, functional magnetic resonance imaging could be used to assess changes in prefrontal cortex and dorsolateral striatum activity across training.
Our data show that the previously reported switch from goal-directed to habitual action after stress (Schwabe and Wolf, 2009) can be mimicked by pharmacological elevations of cortisol and noradrenergic activity. This, however, does not exclude an influence of other stress mediators on instrumental action. During stressful experiences, numerous hormones and neurotransmitters, in addition to GCs and noradrenaline, are released (Joëls and Baram, 2009). Some of these may also affect instrumental responding. For instance, increased dopaminergic activity has been related to an accelerated transition from goal-directed to habitual performance (for review, see Wickens et al., 2007). Furthermore, an intact endogenous opioid system, implicated in reward processing, is required for goal-directed learning and opioid receptor blockade enhances habitual action (Wassum et al., 2009).
One discrepancy between the present and our previous findings needs to be addressed. Although stress before instrumental learning reduced action-outcome knowledge (Schwabe and Wolf, 2009), we found no effect of cortisol or noradrenergic activation on action-outcome knowledge here. Thus, while the pharmacological manipulations of glucocorticoid levels and noradrenergic activation mimicked the stress effect at the behavioral level, there are some differences between the effect of the stress exposure and that of the present pharmacological treatment, which might be due, for example, to affective changes following a stressful experience. An alternative explanation takes the level of performance in the tests of action-outcome knowledge into account. Action-outcome knowledge was overall very high in the present study, which suggests a ceiling effect in performance. That is, the (explicit) memory component of the instrumental learning task was rather undemanding and the test of action-outcome knowledge was not very sensitive to possible mnemonic effects of the treatments.
Finally, one limitation of the present study is that we had no direct measure of noradrenaline availability; noradrenergic activity was measured indirectly via blood pressure. Measuring noradrenaline from plasma samples would have yielded more precise data about the actual noradrenaline concentration in the different experimental groups. We decided against taking blood samples because the blood sampling procedure might have induced substantial arousal and thus interfered with our experimental manipulation. Without such data, however, we cannot fully rule out explanations that are based solely on noradrenaline concentrations, e.g., that the combined administration of hydrocortisone and yohimbine led to an above-threshold concentration of noradrenaline that is necessary for the shift toward habit behavior and was not reached by yohimbine alone.
In summary, our findings show that GCs and noradrenaline act in concert to promote the switch from goal-directed to habitual performance. Neither GCs nor noradrenergic activation alone was sufficient to produce this switch. These findings may have important implications for our understanding of drug addiction and other compulsive disorders that are viewed as endpoints of a transition from initial goal-directed to habitual control of behavior (Berke and Hyman, 2000; Everitt and Robbins, 2005) and can be triggered by stress and GCs (Piazza and Le Moal, 1998). Moreover, they suggest a potential use of β-blockers or glucocorticoid receptor antagonists in such diseases.
Footnotes
This work was supported by Deutsche Forschungsgemeinschaft Grant SCHW1357/2-1. We gratefully acknowledge the assistance of Florian Watzlawik and Carsten Siebert during data collection. We thank Tobias Otto for his technical assistance.
- Correspondence should be addressed to Dr. Lars Schwabe, Department of Cognitive Psychology, Ruhr-University Bochum, Universitaetsstrasse 150, 44780 Bochum, Germany. Lars.Schwabe{at}rub.de