Abstract
Stress promotes a shift from goal-directed action–outcome learning toward habitual stimulus–response learning. This shift is mediated by an interaction of noradrenergic activity and glucocorticoid stress hormones. In the present experiment, we examined the neural correlates of the stress (hormone)-induced shift from goal-directed to habit learning in the human brain. Healthy participants were administered hydrocortisone, the α2-adrenoceptor antagonist yohimbine, or both before they were trained in two instrumental actions leading to two distinct food rewards. After training, one of the rewards was devalued by feeding participants to satiety on that food. Finally, participants were presented the two instrumental actions in extinction. We collected functional magnetic resonance images both during instrumental training and during extinction testing. Our behavioral data confirmed that the simultaneous administration of hydrocortisone and yohimbine renders instrumental behavior insensitive to the outcome devaluation (and thus habitual), whereas hydrocortisone or yohimbine alone have no such effect. At the neural level, the combined administration of hydrocortisone and yohimbine reduced the sensitivity of the orbitofrontal and medial prefrontal cortex to changes in outcome value. Brain areas that have been previously implicated in habit learning were not modulated by hydrocortisone and yohimbine. These findings suggest that concurrent glucocorticoid and noradrenergic activity disrupts the neural bases of goal-directed action and thus renders behavior habitual.
Introduction
Successful adaptation to varying environments requires the ability to predict and control the consequences of one's actions. Learning how to obtain rewards and avoid punishments is referred to as instrumental learning and can be controlled by two distinct processes: (1) a goal-directed process that learns the relationship between an action and the motivational value of the outcome and (2) a habitual stimulus–response process that encodes the association between a response and preceding stimuli, without any link to the outcome that is engendered by the response (Dickinson, 1985). Converging evidence from lesion studies in rodents and human neuroimaging studies demonstrates that goal-directed and habit processes are supported by distinct neural networks (for review, see Balleine and O'Doherty, 2010). Whereas goal-directed learning relies on the prefrontal cortex, the dorsomedial striatum, and the dorsomedial thalamus (Balleine and Dickinson, 1998a; Corbit et al., 2003; Yin et al., 2005; Valentin et al., 2007), habit learning is dependent on the dorsolateral striatum (Yin et al., 2004, 2005; Tricomi et al., 2009).
Stressful experiences may modulate the processes involved in instrumental learning in a manner that favors habit learning, at the expense of goal-directed learning (Dias-Ferreira et al., 2009; Schwabe and Wolf, 2009, 2010). This stress effect can be mimicked by the simultaneous administration of glucocorticoids and an α2-adrenoceptor antagonist that increases noradrenergic stimulation (Schwabe et al., 2010b). Glucocorticoids or noradrenergic activation alone, however, did not alter instrumental learning, suggesting that the stress-induced shift toward habit learning requires, similar to stress effects on hippocampus-dependent memory (Roozendaal et al., 2006a,b), concurrent glucocorticoid and noradrenergic activity. In line with this idea, the effects of stress on instrumental learning can be prevented by a β-adrenergic antagonist (Schwabe et al., 2011). However, the neural mechanism underlying the stress (hormone)-induced shift from goal-directed to habitual control of instrumental learning is still unknown.
In this experiment, we used functional magnetic resonance imaging (fMRI) to examine how stress hormones promote the shift from goal-directed to habit action in the human brain. Healthy participants were administered a placebo, the synthetic glucocorticoid hydrocortisone, the α2-adrenoceptor antagonist yohimbine, or a combination of both drugs before they were trained, while lying in the scanner, in two instrumental actions leading to two distinct food outcomes. Participants were then fed to satiety on one of the food outcomes, to devalue that food. After devaluation, participants performed the two actions in extinction in the scanner. If performance is goal-directed, it should be sensitive to the outcome devaluation; the absence of this sensitivity indicates habitual performance. Based on our previous behavioral study (Schwabe et al., 2010b), we expected that only the combined administration of hydrocortisone and yohimbine would induce habit learning. At the neural level, we focused on interactive influences of hydrocortisone and yohimbine on activity in the orbitofrontal cortex, the medial prefrontal cortex, the caudate nucleus, and the putamen because these areas have been implicated in goal-directed and habitual learning in earlier neuroimaging studies (Valentin et al., 2007; Tricomi et al., 2009).
Materials and Methods
Eighty healthy, normal-weight students of the Ruhr-University Bochum with normal or corrected-to-normal vision participated in this experiment [40 men, 40 women; age: mean = 23.6 years, SEM = 0.3 years; body-mass index (in kg/m²): mean = 23.2, SEM = 0.3]. Participation was limited to right-handed nonsmokers between 18 and 32 years of age, without medication intake, with no reported history of any neurological or psychiatric disorder, no food intolerance, and no contraindications for MRI. Moreover, we prescreened participants to ensure that they liked the food rewards that were used in this study (orange juice, chocolate milk). Nevertheless, the data of 11 participants had to be excluded from analysis because these participants revealed during the experiment that they did not like at least one of the food rewards (pleasantness rating and percentage of high-probability actions for the respective food <2 SD of the mean) (Valentin et al., 2007; Schwabe et al., 2011). The number of participants who had to be excluded did not differ between experimental groups (χ²(3) = 2.00, p = 0.57).
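The exclusion statistic reported above (a χ² value of 2.00 with 3 degrees of freedom, p = 0.57) can be checked by hand. The following is a minimal sketch, not part of the original analysis: it evaluates the upper-tail probability of a χ² distribution using the closed form that holds for exactly 3 degrees of freedom. The function name `chi2_sf_df3` is hypothetical.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    For 3 degrees of freedom the survival function has the closed form
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    tail = math.erfc(math.sqrt(x / 2.0))
    correction = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    return tail + correction

p_value = chi2_sf_df3(2.00)
print(round(p_value, 2))  # 0.57
```

The 3 degrees of freedom correspond to a 4 (group) × 2 (excluded vs retained) contingency table.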
We used a fully crossed, placebo-controlled between-subject design with the factors hydrocortisone (placebo vs hydrocortisone) and yohimbine (placebo vs yohimbine). Thus, participants were randomly assigned to one of four groups: placebo/placebo (PLAC; n = 19), placebo/hydrocortisone (PLAC/CORT; n = 17), placebo/yohimbine (PLAC/YOH; n = 16), or hydrocortisone/yohimbine (CORT/YOH; n = 17). All participants provided written informed consent for participation in this study, which was approved by the ethics committee of the Medical Faculty of the Ruhr-University Bochum.
Drug administration and manipulation check.
Participants were administered 20 mg of hydrocortisone (Jenapharm) and/or 20 mg of yohimbine (Desma) orally ∼50 min before learning. Timing and dosage of drug administration were chosen to be in line with our previous study (Schwabe et al., 2010b). To verify the action of the drugs, we took saliva samples at several time points across the experiment. Saliva samples were stored at −20°C until analysis. We analyzed the biologically active, free fraction of the stress hormone cortisol, the major glucocorticoid in humans, as well as the enzyme α-amylase, an indicator of adrenergic activity (Chatterton et al., 1996; Nater and Rohleder, 2009). Free cortisol levels were determined by a commercially available luminescence immunoassay (IBL) (Westermann et al., 2004; Granger et al., 2007). Mean intra-assay and interassay coefficients of variation are typically <8% and 12%, respectively. Levels of salivary α-amylase were determined from the saliva samples using a commercially available kinetic reaction assay (Salimetrics) (Granger et al., 2007). Mean intra-assay and interassay coefficients of variation of the salivary α-amylase analyses are typically <8% and 6%, respectively. In addition, we controlled for potential changes in subjective mood by means of a German mood questionnaire [Mehrdimensionaler Befindlichkeitsfragebogen (MDBF) (Steyer et al., 1994)] that measures mood states on three dimensions (wakefulness vs sleepiness, calmness vs restlessness, and elevated vs depressed mood).
Instrumental learning task.
The instrumental learning task that we used in the present study has been described in detail previously (Valentin et al., 2007; Schwabe and Wolf, 2009). In brief, participants were presented three trial types: chocolate, orange, and neutral, whose occurrence was completely randomized. On each trial, participants were asked to choose between two actions that were represented by two distinct symbols on a computer screen (Fig. 1A). Symbols were presented in one of four locations on the screen: top left corner, top right corner, bottom left corner, or bottom right corner. Participants selected an action by pressing one of four buttons on a response box that corresponded to these four locations (1, top left corner; 2, top right corner; 3, bottom left corner; 4, bottom right corner), i.e., the two actions between which participants could choose on each trial were two button presses corresponding to the locations of the presented symbols. The specific assignment of the symbols and the positions on the computer screen to each action was held constant for each participant but counterbalanced across participants.
Instrumental learning task and experimental procedure. A, Participants were presented three trial types: chocolate, orange, and neutral. On each trial, they were asked to choose between two actions represented by distinct symbols. One of the actions had a high probability of a food outcome and the other had a low probability of a food outcome. Depending on the trial type, the high-probability action yielded chocolate milk or orange juice with a probability of p = 0.5, a common outcome (peppermint tea) with a probability of p = 0.2, or nothing. The low-probability action led to the common liquid with a probability of p = 0.2. In neutral trials, water was delivered with a probability of p = 0.7 for the high-probability action and with a probability of p = 0.2 for the low-probability action. After an action was chosen, the corresponding symbol was highlighted for 3 s before 1 ml of the liquid was delivered. B, Participants received placebo, hydrocortisone (20 mg), and/or yohimbine (20 mg) ∼50 min before they performed the instrumental learning task in the scanner. After training, participants were satiated with either oranges or chocolate pudding (out of the scanner). This served to devalue selectively one of the food rewards (orange juice or chocolate milk). Finally, participants completed an extinction test, in which the food rewards were no longer presented, while lying in the scanner.
If no response was registered within 3 s, the trial was aborted. Once participants had selected one of the actions, the corresponding symbol was highlighted for 3 s; afterward, either 1 ml of a liquid food or no liquid was delivered, according to the reward schedule associated with the chosen action. The liquids were delivered with separate electronic pumps (one pump for each liquid) and transferred via 8 m long tubes (diameter: 3 mm) to the participants, who kept the ends of the tubes between their lips. Importantly, the two actions per trial type differed in the probability with which a food outcome was delivered. While one action had a probability of p = 0.70 that it would be followed by a food outcome (high-probability action), the other action had a probability of a food outcome of p = 0.20 (low-probability action). In non-reinforced trials, no liquid was delivered. On the chocolate and orange trials, the high-probability action led to chocolate milk or orange juice, respectively, with a probability of p = 0.50 and to a common outcome (peppermint tea) with a probability of p = 0.20 (the reward and the common outcome were never presented in the same trial). On both trial types, the low-probability action was never associated with the rewards but led only to the common outcome with a probability of p = 0.20. In neutral trials, water was delivered, either with a probability of p = 0.70 (high-probability action) or p = 0.20 (low-probability action); the common outcome (peppermint tea) was never presented in neutral trials. By comparing performance in these trials with performance in chocolate and orange trials, the neutral trials served as a control to assess the effect of the rewards (chocolate milk, orange juice) on participants' choice behavior.
Participants completed 50 trials for each trial type, resulting in 150 trials in total. Between trials, a fixation cross was presented for 5–9 s (random jitter: 0–4 s) on the center of the screen.
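The training-phase reward schedule described above can be written down compactly. The following is a minimal sketch, not the task code used in the study: the probabilities are taken from the text, whereas the data structure, the function names, and the assumption that the fixation jitter is uniformly distributed over 0–4 s are illustrative choices.

```python
import random

# Training-phase outcome probabilities as described in the text.
# Keys: (trial_type, action); values: list of (outcome, probability).
# Any remaining probability mass means that no liquid is delivered.
TRAINING_SCHEDULE = {
    ("chocolate", "high"): [("chocolate milk", 0.50), ("peppermint tea", 0.20)],
    ("chocolate", "low"): [("peppermint tea", 0.20)],
    ("orange", "high"): [("orange juice", 0.50), ("peppermint tea", 0.20)],
    ("orange", "low"): [("peppermint tea", 0.20)],
    ("neutral", "high"): [("water", 0.70)],
    ("neutral", "low"): [("water", 0.20)],
}

def total_outcome_probability(trial_type, action):
    """Overall probability that any liquid follows the chosen action."""
    return sum(p for _, p in TRAINING_SCHEDULE[(trial_type, action)])

def draw_outcome(trial_type, action, rng):
    """Sample one outcome for the chosen action (None = no liquid).

    Reward and common outcome are mutually exclusive, so a single
    uniform draw is compared against cumulative probabilities.
    """
    u = rng.random()
    cumulative = 0.0
    for outcome, prob in TRAINING_SCHEDULE[(trial_type, action)]:
        cumulative += prob
        if u < cumulative:
            return outcome
    return None

def draw_fixation_duration(rng):
    """Fixation interval in seconds: 5 s base plus 0-4 s jitter
    (uniform jitter is an assumption; the text gives only the range)."""
    return 5.0 + rng.uniform(0.0, 4.0)

rng = random.Random(0)  # fixed seed for a reproducible illustration
example = draw_outcome("chocolate", "high", rng)
```

Because the reward (p = 0.50) and the common outcome (p = 0.20) never co-occur, the high-probability action on chocolate and orange trials yields some liquid with an overall probability of 0.70, which matches the high-probability action on neutral trials.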
Selective outcome devaluation.
After the training session, participants were invited to eat either as much chocolate pudding or as many oranges as they wanted. This served to decrease the value of one food outcome, while the value of the other food outcome should remain high. Eating oranges to satiety should devalue the orange juice (but not the chocolate milk), whereas eating chocolate pudding to satiety should devalue the chocolate milk (but not the orange juice). Which specific food was used for devaluation (oranges or chocolate pudding) was counterbalanced across participants. There was no time limit for the food intake but most participants stopped eating after ∼10 min. To assess the effectiveness of the outcome devaluation, we asked participants to rate their hunger and the subjective pleasantness of the foods on a scale from 0 (not hungry/pleasant) to 100 (very hungry/pleasant) before and after the outcome devaluation.
Extinction test.
After the selective outcome devaluation, participants were again presented 50 trials for each of the three trial types (chocolate, orange, and neutral) in randomized order. Again, they were presented the two actions, represented by the two distinct symbols, on each trial and asked to select one of them by pressing the corresponding button on the response box. This time, however, the rewards (chocolate milk and orange juice) were no longer presented, i.e., participants were tested in extinction. Both in the chocolate and in the orange trials, the two alternative actions delivered the common outcome (peppermint tea) with a probability of p = 0.20. To maintain some degree of responding on both actions (even the devalued one), we still presented the common outcome so that the overall outcome was now available with equal probability on the two alternative actions on both trial types (Valentin et al., 2007). In the neutral trials, water was now available with the equal probability of p = 0.20 for both actions. This extinction procedure ensured that participants could use information about the value of the outcome only by drawing on the previously learned associations between that outcome and a particular action.
A decrease in the choice of the action associated with the devalued food outcome indicated goal-directed performance, whereas the ongoing choice of the action associated with the devalued food outcome was indicative of habit performance.
Experimental procedure.
To control for the diurnal rhythm of the stress hormone cortisol, all testing took place in the afternoon between 1 and 6:30 P.M. After participants' arrival at the lab, a first saliva sample was collected and a baseline measurement of mood state was taken. Then, participants took placebo, yohimbine, and/or hydrocortisone pills, depending on the experimental condition. After a break of 30 min, during which participants were allowed to read, mood state was measured again and another saliva sample was collected. Afterward, participants received the instructions for the learning task. Immediately before scanning at 3 tesla, participants gave another saliva sample and rated their hunger and the pleasantness of the foods that were presented in the task. After a 4 min anatomical scan and ∼50 min after pill intake, participants completed the instrumental learning task in the scanner (duration: ∼30 min). After finishing the training session, participants were taken out of the scanner. They gave another saliva sample, rated their hunger and the pleasantness of the foods again, and were then satiated with either oranges or chocolate pudding. Participants rated their hunger and the pleasantness of the foods again before they performed the extinction test in the scanner. At the end of the experiment, participants gave a final saliva sample out of the scanner. The basic procedure is summarized in Figure 1B.
fMRI data acquisition.
Imaging was conducted by using a 3.0 tesla Philips Achieva scanner equipped with a 32-channel head coil. For each participant, one high-resolution T1-weighted anatomical scan was acquired before the training session and one before the extinction session (for both scans: 220 slices, slice thickness 1 mm, TR = 8.2 ms, TE = 3.8 ms). The functional scans during instrumental learning and extinction testing (950 volumes each) were acquired parallel to the anterior commissural–posterior commissural plane with the following parameters: 30 slices, slice thickness 3 mm, TR = 2.0 s, TE = 30 ms, flip angle = 90°, 64 × 64 matrix, 2 × 2 mm pixel size, field of view = 200 × 200 mm. The first three images were discarded to allow T1 equilibration.
Data analysis.
Salivary cortisol, α-amylase, and subjective mood data were analyzed by mixed-design ANOVAs with the within-subject factor time point of measurement and the between-subject factors hydrocortisone (hydrocortisone vs placebo) and yohimbine (yohimbine vs placebo). Similarly, participants' responses in the learning and extinction sessions were subjected to mixed-design ANOVAs with the within-subject factors trial type (valued vs devalued vs neutral) and block (5 blocks with 10 trials per block) and the between-subject factors hydrocortisone and yohimbine. Significant interaction effects were followed by appropriate post hoc tests. Greenhouse–Geisser correction was used to correct for violations of sphericity. All reported p values are two-tailed.
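The block factor used in the behavioral analyses amounts to binning each participant's 50 trials per trial type into five consecutive blocks of 10 trials. The following is a minimal sketch of that binning, assuming choices are coded as booleans (True = high-probability action); the function name and the example data are hypothetical and do not reproduce the authors' analysis scripts.

```python
def block_percentages(choices, block_size=10):
    """Split a trial sequence into consecutive blocks and return the
    percentage of high-probability choices in each block.

    `choices` is a sequence of booleans (True = high-probability action).
    """
    if len(choices) % block_size != 0:
        raise ValueError("number of trials must be a multiple of block_size")
    blocks = [choices[i:i + block_size] for i in range(0, len(choices), block_size)]
    return [100.0 * sum(block) / block_size for block in blocks]

# Hypothetical choices of one participant on devalued trials (50 trials each).
training = [False] * 5 + [True] * 45
extinction = [False] * 8 + [True] * 42

train_blocks = block_percentages(training)    # [50.0, 100.0, 100.0, 100.0, 100.0]
test_blocks = block_percentages(extinction)   # [20.0, 100.0, 100.0, 100.0, 100.0]

# Change from the last training block to the first extinction block;
# a negative value means fewer high-probability choices after devaluation.
change = test_blocks[0] - train_blocks[-1]
print(change)  # -80.0
```

The per-block percentages would then enter the mixed-design ANOVA as the within-subject block factor.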
Preprocessing and analysis of the event-related fMRI data were performed using SPM8 (Wellcome Trust Center for Neuroimaging, University College London, London, UK). Functional imaging data were corrected for slice-timing and head motion. Structural images were segmented into gray matter, white matter, and CSF. Gray matter images were normalized to the MNI template image. Normalized gray matter images were used for normalization of the structural and functional images. Finally, data were spatially smoothed using an 8 mm full-width half-maximum Gaussian kernel and filtered in the temporal domain using a nonlinear high-pass filter with a 128 s cutoff.
Functional data were analyzed using a general linear model. For each participant, we constructed fMRI design matrices for the learning session and for the extinction session by modeling the following regressors: valued high-probability action (VAL_H), valued low-probability action (VAL_L), devalued high-probability action (DEV_H), devalued low-probability action (DEV_L), neutral high-probability action (NEUT_H), and neutral low-probability action (NEUT_L). These action regressors were modeled as stick functions at the time of action selection. Moreover, we included the fixation and the six movement regressors containing information about motion correction in our model. All regressors were convolved with the canonical hemodynamic response function.
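To illustrate how a stick-function regressor is built, the sketch below convolves unit sticks at action-selection onsets with a double-gamma hemodynamic response function and samples the result once per volume. This is an approximation for illustration only, not SPM8's implementation: the HRF parameters (peak at 6 s, undershoot at 16 s, ratio 1/6), the 0.1 s time grid, the function names, and the onsets are all assumptions; only the TR of 2.0 s comes from the text.

```python
import math

TR = 2.0  # repetition time in seconds, as reported for the functional scans

def gamma_pdf(t, shape, scale=1.0):
    """Density of a gamma distribution, defined as 0 for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)

def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF: a positive response minus a scaled undershoot."""
    return gamma_pdf(t, peak) - ratio * gamma_pdf(t, undershoot)

def stick_regressor(onsets, n_volumes, tr=TR, dt=0.1, hrf_length=32.0):
    """Convolve unit sticks at `onsets` (s) with the HRF on a fine time
    grid and sample the result at the start of each volume."""
    n_fine = int(round(n_volumes * tr / dt))
    sticks = [0.0] * n_fine
    for onset in onsets:
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += 1.0
    kernel = [double_gamma_hrf(k * dt) for k in range(int(round(hrf_length / dt)))]
    step = int(round(tr / dt))
    regressor = []
    for v in range(n_volumes):
        i = v * step  # fine-grid index of this volume's acquisition time
        value = 0.0
        for k, h in enumerate(kernel):
            if i - k < 0:
                break
            value += sticks[i - k] * h
        regressor.append(value)
    return regressor

# Hypothetical onsets (s) of valued high-probability action selections.
val_h = stick_regressor(onsets=[10.0, 40.0, 75.0], n_volumes=50)
```

One such column would be built per action regressor (VAL_H, VAL_L, DEV_H, DEV_L, and the two neutral regressors) and per session, and the columns would then be entered into the design matrix.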
Linear contrasts of regressor coefficients were computed at the single-subject level to enable comparison between the VAL_H, VAL_L, DEV_H, DEV_L, NEUT_H, and NEUT_L actions. The single-subject parameter estimates were included in subsequent random-effects analyses. For these second-level analyses, a full factorial model was used, with hydrocortisone and yohimbine as between-subject factors. We focused on activations in a priori regions of interest (ROIs). A priori ROIs were the orbitofrontal and medial prefrontal cortex, the caudate nucleus, and the putamen, as these structures have been implicated in goal-directed and habitual instrumental learning in earlier studies (Valentin et al., 2007; Tricomi et al., 2009). The corresponding masks were taken from the Harvard–Oxford cortical and subcortical atlases (provided by the Harvard Center for Morphometric Analysis; http://www.cma.mgh.harvard.edu). ROI analyses were performed using the small volume correction options of SPM8 (p < 0.05). In addition to the ROI analyses, we performed explorative whole-brain analyses. For the explorative whole-brain analyses, the significance threshold was set to p < 0.05 at the voxel level, corrected for multiple testing [family-wise error (FWE) correction], with a minimum cluster size of five voxels. If not stated otherwise, the reported imaging findings are from the ROI analyses.
Results
Manipulation check
Table 1 shows the physiological and subjective effects of hydrocortisone and yohimbine. As expected, hydrocortisone intake led to a significant increase in salivary cortisol (time point of measurement × hydrocortisone interaction: F(1.92,124.55) = 25.30, p < 0.001, η2 = 0.28), which was not observed after yohimbine intake (time point of measurement × yohimbine interaction: p = 0.61). Salivary α-amylase, an indicator of adrenergic activity, however, increased after yohimbine intake (F(3.53,222.20) = 3.10, p = 0.016, η2 = 0.05) but not after hydrocortisone intake (p = 0.22). There were no interaction effects between hydrocortisone and yohimbine, neither for salivary cortisol nor for salivary α-amylase (both p > 0.44). Subjective mood state remained unaffected by hydrocortisone and yohimbine (all main or interaction effects: all p > 0.15).
Physiological and subjective changes after hydrocortisone and yohimbine intake
Behavioral data
Hydrocortisone and yohimbine did not affect instrumental learning
Figure 2 shows the percentage of high-probability actions in chocolate, orange, and neutral trials over the course of training. As training proceeded, all participants, regardless of the experimental group, favored increasingly those actions that led with a high probability to a reward (chocolate milk or orange juice) over their low-probability counterparts (main effects training block: both F > 11, both p < 0.001, both η2 ≥ 0.15). This indicates successful instrumental learning. In neutral trials, participants also showed an increase in the choice of the high-probability action across training (F(3.39,220.54) = 5.39, p < 0.01, η2 = 0.08), yet this increase was significantly less pronounced in neutral trials compared with chocolate and orange trials (training block × trial type interaction: F(6.46,419.78) = 2.36, p = 0.026, η2 = 0.04). Overall, the high-probability action was significantly more often chosen in chocolate and orange trials than in neutral trials (main effect trial type: F(1.64,110.09) = 44.81, p < 0.001, η2 = 0.41). In the last 10 training trials, participants chose the high-probability action more often than the low-probability action in chocolate and orange trials (both t(68) > 14, both p < 0.001) but not in neutral trials (t(68) = 2.28, p = 0.13), suggesting that participants were indifferent as to whether they received the neutral outcome or not.
Percentage of high-probability actions across the training session (1 block = 10 trials). From early training on, all participants, regardless of the experimental group, favored the high-probability actions associated with the rewards (chocolate milk, orange juice) over the corresponding low-probability action (*p < 0.01 in all groups). The high-probability action was significantly more often chosen in the chocolate and orange trials than in the neutral trials. The dashed line marks the 50% mark for high-probability actions, where participants were completely indifferent toward the low- and high-probability actions. Data represent means ± SEM.
The number of high-probability actions associated with the food that was subsequently not eaten correlated significantly with the pleasantness rating for the respective food after the learning session (r = 0.27, p = 0.02); for the food that was subsequently devalued this correlation did not reach statistical significance (r = 0.19, p = 0.12). Notably, hydrocortisone and yohimbine had no effects on learning curves in the instrumental task (all main and interaction effects: all p > 0.15, all η2 ≤ 0.03).
Selective outcome devaluation was not affected by hydrocortisone or yohimbine
During the selective outcome devaluation after instrumental training, participants ate on average 2.91 cups (150 g each) of chocolate pudding (SEM: 0.15) or 2.49 oranges (SEM: 0.10). Hunger ratings dropped from 46 (SEM: 3.53) before satiety to 19 (SEM: 2.60) after satiety (t(68) = 9.14, p < 0.001). The subjective pleasantness ratings confirmed that the devaluation procedure selectively reduced the value of the food eaten to satiety, whereas the motivational value of the other food reward remained intact. As shown in Figure 3, pleasantness ratings decreased markedly for the food eaten to satiety but not for the food that was not eaten (time × food interaction: F(1,65) = 41.95, p < 0.001, η2 = 0.39).
Subjective pleasantness ratings before training, after training (i.e., before the outcome devaluation), and before the extinction test (i.e., after the outcome devaluation). Pleasantness ratings were given on a scale from 0 (not pleasant) to 100 (very pleasant). Before the outcome devaluation, participants found the rewards (valued and devalued outcomes) more pleasant than the common and the neutral outcomes. After participants were satiated with oranges or chocolate pudding, the pleasantness ratings decreased for the food eaten to satiety (devalued outcome) relative to the food not eaten (valued outcome). Data represent means ± SEM.
It is important to note that hydrocortisone and yohimbine did not affect the amount of food that was eaten during the devaluation, nor the subjective hunger and pleasantness ratings (all p > 0.13).
Simultaneous glucocorticoid and noradrenergic activity rendered instrumental behavior habitual
Figure 4 shows participants' choices in the extinction test. Participants who had received a placebo before learning performed in a goal-directed manner. In line with their pleasantness ratings, they chose the high-probability action that was previously associated with the now devalued outcome significantly less often than the high-probability action that had previously led to the valued outcome (F(1,18) = 18.79, p < 0.001, η2 = 0.51). In the first 10-trial extinction block, before they could know that the rewards were no longer presented, the participants in the PLAC group still preferred the valued high-probability action over its low-probability counterpart (t(18) = 5.22, p < 0.01; binomial test). In contrast, the PLAC group avoided the devalued high-probability action at the beginning of the extinction test (t(18) = −4.59, p < 0.01).
Percentage of high-probability actions across the extinction session (1 block = 10 trials). At the beginning of the extinction test, before they had the chance to learn that the rewards were no longer presented, all participants favored the high-probability action that was previously associated with the valued outcome over its low-probability counterpart (*p < 0.01 in all groups). In contrast to the valued high-probability action, the devalued high-probability action was avoided by participants who had received a placebo (PLAC), hydrocortisone alone (PLAC/CORT), or yohimbine alone (PLAC/YOH) (§p < 0.01). Participants who had received both hydrocortisone and yohimbine (CORT/YOH), however, still preferred the devalued high-probability action over the corresponding low-probability action as well (#p < 0.01), suggesting that the behavior of those participants was insensitive to the change in the value of the outcome. The dashed line marks the 50% mark for high-probability actions, where participants were completely indifferent to the low- and high-probability actions. Data represent means ± SEM.
Participants who were administered either hydrocortisone or yohimbine alone before instrumental learning behaved similarly to those in the PLAC group. Both the PLAC/CORT and the PLAC/YOH groups selected the valued high-probability action significantly more often than the devalued high-probability action (both F > 14, both p < 0.001, both η2 > 0.48). Moreover, both groups preferred the valued high-probability action over the corresponding low-probability action in the first extinction block (both t > 4.92, both p < 0.01), whereas they tended to avoid the devalued high-probability action (both t < −2.30, both p ≤ 0.06).
In sharp contrast to the other three groups, the behavior of participants who had received both hydrocortisone and yohimbine was insensitive to the change in the value of the outcome and thus habitual. Participants in the CORT/YOH group chose the valued and the devalued high-probability actions equally often in the extinction test (F(1,16) = 1.47, p = 0.24, η2 = 0.08). Although they had also indicated, like the other groups, that they no longer wanted the outcome that was eaten to satiety, they still favored both the valued and the devalued high-probability actions over the corresponding low-probability actions in the first extinction block (both t(16) > 4.79, both p < 0.01).
In support of these interpretations, a trial type (valued vs devalued) × block (five 10-trial extinction blocks) × hydrocortisone (hydrocortisone vs placebo) × yohimbine (yohimbine vs placebo) ANOVA yielded a significant three-way interaction between trial type, hydrocortisone, and yohimbine (F(1,65) = 5.84, p = 0.019, η2 = 0.08). Both drugs interactively altered behavior in devalued trials (hydrocortisone × yohimbine interaction: F(1,65) = 4.01, p < 0.05, η2 = 0.06), whereas there were no hydrocortisone or yohimbine effects in valued trials (F(1,65) = 0.19, p = 0.67, η2 < 0.01). Compared with the other three groups, the CORT/YOH group chose the devalued high-probability action significantly more often (all p < 0.001, least significant difference post hoc tests).
Because the sensitivity of participants' behavior to the outcome devaluation should be clearest at the beginning of the extinction test, we next compared the change in behavior from the last training block to the first extinction block in valued and devalued trials by means of a trial type × block (last 10 training trials vs first 10 extinction trials) × hydrocortisone × yohimbine ANOVA. This analysis yielded a significant four-way interaction (F(1,65) = 3.91, p = 0.05, η2 = 0.06) showing that hydrocortisone and yohimbine affected the change from training to test for the devalued high-probability action (block × hydrocortisone × yohimbine interaction: F(1,65) = 5.57, p = 0.02, η2 = 0.08) but not for the valued high-probability action (F(1,65) = 0.03, p = 0.87, η2 < 0.01). Follow-up tests revealed that responding to the devalued high-probability action decreased after the outcome devaluation in the PLAC, PLAC/CORT, and PLAC/YOH groups (all F > 48, all p < 0.001, all η2 > 0.75) but not in the CORT/YOH group (F(1,16) = 1.24, p = 0.28, η2 = 0.07; Fig. 5). These data underline that the simultaneous administration of hydrocortisone and yohimbine rendered participants' instrumental behavior insensitive to changes in outcome value.
Changes in instrumental behavior from the last 10 training trials to the first 10 extinction test trials. Participants that were administered a placebo (PLAC), hydrocortisone alone (PLAC/CORT), or yohimbine alone (PLAC/YOH) before instrumental training showed a marked decrease in responding to the devalued high-probability action after the devaluation (**p < 0.001), which is indicative of goal-directed action. No such decrease was observed in participants that had received both hydrocortisone and yohimbine (CORT/YOH), suggesting that their behavior was under habitual control. Data represent means ± SEM.
Imaging data
Neural correlates of instrumental learning
To identify brain regions that are involved in instrumental learning, we contrasted responses during the selection of high-probability actions associated with rewards (i.e., chocolate milk and orange juice) with those during the selection of neutral high-probability actions. This analysis revealed significant activations in the orbitofrontal cortex (left: x = −24, y = 34, z = −24; Z = 3.79, p = 0.05, FWE corrected; right: x = 26, y = 32, z = −18; Z = 3.22, p < 0.001; Fig. 6A), in the putamen (right: x = 30, y = −16, z = −4; Z = 3.84, p = 0.018, FWE corrected; left: x = −30, y = −2, z = 6; Z = 3.04, p = 0.001; Fig. 6B), and in the caudate nucleus (right: x = 10, y = 0, z = 12; Z = 3.54, p = 0.035, FWE corrected; left: x = −12, y = 2, z = 12; Z = 2.85, p < 0.001; Fig. 6C). Moreover, we obtained significant correlations between the subjective pleasantness of the rewards and activity in the putamen (right: x = 16, y = 14, z = −6; Z = 2.90, p < 0.002) and medial prefrontal cortex (x = 2, y = 38, z = −14; Z = 2.83, p < 0.002) during the selection of high-probability actions associated with rewards. Importantly, there were no main or interaction effects of hydrocortisone and yohimbine on the neural correlates of instrumental learning. Subjective mood also did not affect the brain areas involved in instrumental learning, as shown by an analysis of covariance with the mood ratings as covariates, which did not affect our pattern of results.
Brain activity associated with instrumental learning. A–C, Increased activity during high-probability actions associated with rewards compared with neutral high-probability actions was observed in the orbitofrontal cortex (MNI coordinates of peak voxel: −24, 34, −24; A), in the putamen (30, −16, −4; B), and in the caudate nucleus (10, 0, 12; C). Shown are coronal, sagittal, and horizontal sections, superimposed on a T1-template image.
However, α-amylase (but not cortisol) levels before training correlated significantly with activity in the putamen during the training session (right: x = 32, y = −2, z = −2; Z = 3.56, p = 0.038, FWE corrected; left: x = −26, y = −2, z = −6; Z = 2.60, p < 0.005). To assess whether noradrenergic activation alone correlated with putamen activity or whether this correlation was mainly carried by the CORT/YOH group, we analyzed the correlation between α-amylase and putamen activity in the groups that were administered hydrocortisone (i.e., the PLAC/CORT and CORT/YOH groups) and the groups that were not administered hydrocortisone (i.e., the PLAC/PLAC and PLAC/YOH groups) separately. This analysis showed that α-amylase correlated with putamen activity in the hydrocortisone groups (x = −26, y = −10, z = 6; Z = 3.65, p = 0.045, FWE corrected), whereas there was no correlation between α-amylase and putamen activity in the no-hydrocortisone groups.
In addition to activation in the orbitofrontal cortex, exploratory whole-brain analyses with a threshold of p < 0.001 (uncorrected) yielded activation in the left frontal gyrus (x = −46, y = 8, z = 54; Z = 4.19), in the right hippocampus (x = 22, y = −28, z = −8; Z = 4.26), in the superior parietal lobe (x = 24, y = −60, z = 56; Z = 4.02), and in the right pallidum (x = 28, y = −8, z = 0; Z = 3.67) for reward-related high-probability actions compared with neutral high-probability actions. None of these activations were affected by hydrocortisone or yohimbine.
Interactive effect of hydrocortisone and yohimbine on the neural correlates of goal-directed action
Brain regions involved in the goal-directed control of instrumental learning should respond differently to high- compared with low-probability actions in valued relative to devalued trials in the extinction test (Valentin et al., 2007). Therefore, we examined significant trial type by action type interactions (i.e., the contrast [(VAL_H − VAL_L) − (DEV_H − DEV_L)]) during the first 20 extinction test trials, when the effect of the devaluation should be strongest and the participants did not yet know that the rewards were no longer being presented. Corroborating earlier findings that implicated this brain region in goal-directed control (Valentin et al., 2007), the orbitofrontal cortex showed activation in this contrast (x = 24, y = 16, z = −20; Z = 2.97, p < 0.001). Next, we submitted this contrast to a full factorial model with the factors hydrocortisone and yohimbine to identify areas that were modulated by glucocorticoids and noradrenergic activity. This analysis yielded no main effects of hydrocortisone or yohimbine, but did yield a significant interaction effect of both drugs in the orbitofrontal cortex (right: x = 24, y = 10, z = −20; Z = 4.05, p = 0.012, FWE corrected; left: x = −36, y = 26, z = −18; Z = 3.10, p < 0.001; Fig. 7A) and, at a more lenient threshold, in the medial prefrontal cortex (x = 4, y = 46, z = −20; Z = 2.71, p < 0.005). Follow-up analyses revealed that hydrocortisone had no effect on brain activity when participants received hydrocortisone alone. However, when participants were also administered yohimbine, hydrocortisone decreased activation in the orbitofrontal cortex (PLAC/YOH > CORT/YOH; right: x = 26, y = 14, z = −18; Z = 4.43, p = 0.003, FWE corrected; left: x = −14, y = 20, z = −16; Z = 3.05, p < 0.001; Fig. 7B) and in the medial prefrontal cortex (x = −2, y = 38, z = −18; Z = 3.63, p = 0.036, FWE corrected).
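To make the logic of the interaction contrast [(VAL_H − VAL_L) − (DEV_H − DEV_L)] concrete, the following minimal sketch applies the corresponding contrast weights to a set of condition-wise parameter estimates. The numbers are hypothetical and chosen purely for illustration; they are not data from this study, and the function and variable names are likewise assumptions.

```python
from typing import Mapping

# Weights implied by (VAL_H - VAL_L) - (DEV_H - DEV_L).
CONTRAST_WEIGHTS = {"VAL_H": 1.0, "VAL_L": -1.0, "DEV_H": -1.0, "DEV_L": 1.0}


def interaction_contrast(estimates: Mapping[str, float]) -> float:
    """Trial type x action type interaction for one set of parameter estimates."""
    return sum(weight * estimates[condition]
               for condition, weight in CONTRAST_WEIGHTS.items())


# Hypothetical estimates (arbitrary units), NOT taken from the study.
# A value-sensitive region: the devalued high-probability action is represented
# negatively, the devalued low-probability action positively.
value_sensitive = {"VAL_H": 1.0, "VAL_L": 0.25, "DEV_H": -0.5, "DEV_L": 0.5}
# A value-insensitive region: devalued trials look like valued trials.
value_insensitive = {"VAL_H": 1.0, "VAL_L": 0.25, "DEV_H": 1.0, "DEV_L": 0.25}

print(interaction_contrast(value_sensitive))    # 1.75
print(interaction_contrast(value_insensitive))  # 0.0
```

In this toy example, the contrast is large when the response to high- versus low-probability actions reverses after devaluation and vanishes when devaluation leaves the response pattern unchanged, which is the signature the analysis above tests for.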
The parameter estimates shown in Figure 7C suggest that the interaction of hydrocortisone and yohimbine mainly changed the representations of the devalued high- and low-probability actions, whereas the representations of the valued high- and low-probability actions were comparable to those in the other three groups. Compared with the other three groups, the CORT/YOH group appeared to respond less negatively to the devalued high-probability action and less positively to the devalued low-probability action, which provided an opportunity to avoid the devalued outcome.
Interactive effect of hydrocortisone and yohimbine on the neural basis of goal-directed action. Neural correlates of goal-directed action are expressed as interaction contrast between trial type (valued, devalued) and action type (high-probability, low-probability). A, Hydrocortisone and yohimbine interactively altered activity in the medial prefrontal and orbitofrontal cortex (MNI coordinates of peak voxel in the orbitofrontal cortex: 24, 10, −20) in this contrast. B, Follow-up tests showed that activity in the medial prefrontal and orbitofrontal cortex (26, 14, −18) was reduced in participants who had received both hydrocortisone and yohimbine compared with participants who were administered yohimbine alone (PLAC/YOH > CORT/YOH). C, Parameter estimates of the peak voxel in the orbitofrontal cortex for the high- and low-probability actions in valued and devalued trials for all four experimental groups. Data represent means ± SEM.
In addition to the above-mentioned ROIs, exploratory whole-brain analyses revealed significant activations in the inferior temporal gyrus (x = 48, y = 24, z = −20; Z = 4.26) and the anterior cingulate cortex (x = −4, y = 28, z = 0; Z = 4.07) in the trial type × action type × hydrocortisone × yohimbine contrast, at the threshold of p < 0.001 (uncorrected).
We did not obtain any significant correlations between brain activity and the number of valued and devalued high-probability actions. Nevertheless, to rule out the possibility that group differences in brain activity are simply due to differences in the number of valued and, particularly, devalued high-probability actions, we analyzed our data with an ANCOVA in which the valued and devalued high-probability actions were entered as covariates. This analysis showed that our pattern of results remained when we controlled for differences in valued and devalued high-probability actions, thus ruling out the possibility that group differences in the number of correct and incorrect responses could account for the obtained differences in brain activity. Furthermore, analyses of covariance with subjective mood, subjective pleasantness ratings, or the amount of food that was consumed during the devaluation as covariates did not change the pattern of results, suggesting that none of these factors mediated the observed interactive effect of yohimbine and hydrocortisone. There were also no correlations between salivary cortisol or α-amylase levels and the activity of the orbitofrontal or medial prefrontal cortex (or any other brain area).
Although it is rather unlikely that habits developed after 50 trials per trial type, in the next step, we looked for brain areas that were insensitive to the outcome devaluation and continued to respond to both the valued and the devalued high-probability action. For this purpose, we performed a conjunction analysis in which we tested for regions that showed significant activation during the choice of the valued high-probability action and during the choice of the devalued high-probability action compared with neutral high-probability actions. This conjunction analysis, however, yielded no significant activations. Activation in the putamen, the human homolog of the dorsolateral striatum in rodents (Balleine and O'Doherty, 2010), was observed only at a very liberal threshold (x = −16, y = 6, z = −4; Z = 1.85, p = 0.03) and should therefore not be overstated. Hydrocortisone and yohimbine did not modulate activation in the conjunction contrast.
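The idea behind a conjunction analysis can be sketched as follows. This is only an illustration of one common approach (a minimum-statistic conjunction, in which a voxel counts only if both contrasts exceed the threshold); the text does not specify which conjunction test was used. The Z values and the threshold of 3.09 (roughly p = 0.001, one-tailed) are assumptions made for the example, not values from this study.

```python
from typing import List


def conjunction_map(z_first: List[float], z_second: List[float],
                    z_threshold: float) -> List[bool]:
    """Flag voxels whose smaller Z value across two contrasts exceeds the threshold."""
    if len(z_first) != len(z_second):
        raise ValueError("Both statistic maps must cover the same voxels.")
    return [min(a, b) > z_threshold for a, b in zip(z_first, z_second)]


# Hypothetical voxel-wise Z values for two contrasts against the neutral action.
z_valued_vs_neutral = [4.1, 3.5, 0.2, 3.3]
z_devalued_vs_neutral = [3.6, 1.0, 3.9, 3.2]

surviving = conjunction_map(z_valued_vs_neutral, z_devalued_vs_neutral, 3.09)
print(surviving)  # [True, False, False, True]
```

Only voxels that respond in both the valued and the devalued contrast survive, which is why such a map would identify regions that keep responding after devaluation.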
Discussion
The concerted action of glucocorticoid stress hormones and noradrenergic activity may promote a shift from goal-directed to habitual instrumental learning (for review, see Schwabe and Wolf, 2011). Using fMRI, we present here the putative neural mechanism underlying this effect. In line with our previous data (Schwabe et al., 2010b), we found that the combined administration of hydrocortisone and the α2-receptor antagonist yohimbine, but not the administration of hydrocortisone or yohimbine alone, rendered instrumental behavior insensitive to outcome devaluation, thus indicating that glucocorticoids and noradrenergic activation operate in concert to shift instrumental learning from goal-directed to habitual control. Our imaging data suggest that the simultaneous glucocorticoid and noradrenergic activity disrupted the neural basis of goal-directed action.
In particular, we show here that the orbitofrontal cortex was active during instrumental learning and, more importantly, that its activity reflected changes in the incentive value of an outcome. This result supports earlier findings suggesting a critical role of the orbitofrontal cortex in goal-directed learning (Balleine and Dickinson, 1998b; Ostlund and Balleine, 2005; Valentin et al., 2007). Concurrent glucocorticoid and noradrenergic activity reduced the sensitivity of the orbitofrontal (and medial prefrontal) cortex to changes in outcome value. The prefrontal cortex is one of the brain regions with the highest density of stress hormone receptors (Patel et al., 2000) and impairing effects of stress on the functioning of the prefrontal cortex are well documented (for review, see Arnsten, 2009). Stress suppresses neuroplasticity processes in the prefrontal cortex (Diamond et al., 2007) and hampers prefrontal cortex-dependent cognitive control (Lyons et al., 2000; Scholz et al., 2009). Moreover, the present results are also compatible with evidence showing that the impairing effects of stress on prefrontal cortex-dependent working memory are mediated by an interaction of glucocorticoids and noradrenergic activity (Roozendaal et al., 2004).
Although glucocorticoids and noradrenergic activity affected the neural substrate of goal-directed learning, the habit component appeared to be unaffected. Goal-directed and habit processes are thought to operate in tandem, and habit behavior usually develops only after extended training (Adams, 1982; Dickinson and Balleine, 1994; Balleine and Dickinson, 1998b; Killcross and Coutureau, 2003). Here, we used a moderate number of trials because we did not intend to examine habit behavior per se but to unravel the neural correlates of the stress (hormone)-induced shift toward habit behavior previously seen after moderate training (Schwabe and Wolf, 2009; Schwabe et al., 2011). Nevertheless, it has been proposed that some brain areas may show responses consistent with habitualization even after limited training (Killcross and Coutureau, 2003; Daw et al., 2005). During instrumental training we observed, in addition to activation in the orbitofrontal cortex, activation in areas of the dorsal striatum that have been related to habit behavior before (Knowlton et al., 1996; Valentin et al., 2007; Tricomi et al., 2009). However, the dorsal striatum is functionally heterogeneous, with its medial part (the caudate nucleus in humans) being involved in goal-directed learning and its lateral part (the putamen in humans) being involved in habit learning (Yin et al., 2004, 2005; Balleine and O'Doherty, 2010). The caudate nucleus has been implicated in the integration of reward with action control (Balleine and O'Doherty, 2010). It is therefore not surprising that we obtained caudate activation during instrumental learning. The observed activation in the putamen during instrumental learning might indeed reflect an early habit component.
Interestingly, noradrenergic activation in the face of elevated glucocorticoid levels was associated with stronger activation of the putamen, suggesting that stress hormones might strengthen the neural correlates of habit action (Tricomi et al., 2009) already relatively early during instrumental learning. However, putamen activity did not manifest as a locus of habitualization in the extinction test, at least not to an extent that could be reliably detected with fMRI. Neither caudate nor putamen activity was modulated by glucocorticoids and noradrenergic activation during extinction testing. Thus, we suggest that the observed shift toward habit performance after combined hydrocortisone and yohimbine administration is mainly due to the reduced capacity of the goal-directed system to adequately represent changes in outcome value. Acute stress hormone elevations seem not to enhance the habit system during moderate training. This, however, might be different after repeated stress and more extended training (Dias-Ferreira et al., 2009).
Moreover, using already established habits or conditions in which the incentive value of a reward has already been increased through extensive training may have led to different results. In particular, the brain areas involved in habit behavior, such as the putamen, should be more active in this case. Whether glucocorticoids and noradrenergic activation would have different effects on instrumental behavior and its neural correlates when already established habits are used during learning is an interesting question for future research.
In previous behavioral studies, stress hormones were elevated both during instrumental learning and extinction testing (Dias-Ferreira et al., 2009; Schwabe and Wolf, 2009; Schwabe et al., 2010b). It was therefore difficult to disentangle possible stress effects on the acquisition of instrumental behavior from those on its expression. One study showed the stress-induced bias toward habits when stress was administered before extinction testing, thus ruling out effects on the acquisition of instrumental behavior (Schwabe and Wolf, 2010). However, the influence of stress appeared to be stronger when participants were exposed to a stressor before learning (Schwabe and Wolf, 2009), leaving the possibility that stress affects the acquisition of goal-directed and habit behavior. Using fMRI, we examined in the present study whether stress hormones may already influence the (neural correlates of the) acquisition of instrumental behavior. Our imaging data argue clearly in favor of the view that stress hormones affect the expression of instrumental behavior but not its acquisition. Glucocorticoids and noradrenergic activity were elevated both during the learning session and during the extinction session. However, the stress hormones did not influence learning curves in the instrumental task, nor did they change the neural correlates of instrumental learning. The interactive effect of glucocorticoids and noradrenergic arousal only became apparent after one of the outcomes had been devalued (but did not affect the devaluation itself as suggested by the hunger and pleasantness ratings). This pattern of results suggests that simultaneous glucocorticoid and noradrenergic activity interfered primarily with one of the key functions of the orbitofrontal cortex, the flexible encoding of (changing) values of expected outcomes (Thorpe et al., 1983; Gottfried et al., 2003; Schoenbaum et al., 2003).
The stress-induced shift from goal-directed toward habitual instrumental behavior is another indication of the impact of stress on the engagement of multiple learning and memory systems (Schwabe et al., 2010a). In spatial navigation tasks, stress promotes a shift from hippocampus-dependent spatial to caudate nucleus-dependent stimulus–response learning (Kim et al., 2001; Schwabe et al., 2007, 2010c). Recently, we showed that stress also favors striatal procedural learning over hippocampal declarative learning in probabilistic classification learning and that this effect is due to an impairment of the declarative system (Schwabe and Wolf, 2012). Together, these studies suggest that stress modulates learning and memory systems in favor of habit learning and at the expense of cognitive learning. This switch appears to be due to an impairment of the cognitive systems, which shifts the balance between memory systems toward the habit system.
Finally, two potential limitations of the present study should be noted. First, this study used foods both as rewards and for the outcome devaluation, and food intake is known to increase glucocorticoid levels (Rosmond et al., 1998). Indeed, we obtained moderate increases in salivary cortisol from before the devaluation to the end of the extinction session, which might be due to the food intake during the devaluation. However, the amount of food consumed during the devaluation was comparable in our experimental groups and the elevations seen after the devaluation are relatively minor compared with those seen after drug intake. Moreover, the observed group differences in brain activity remained after controlling for the amount of food consumed by means of an ANCOVA. Thus, we do not think that eating-related changes in stress hormone levels had a major influence on our results. Second, yohimbine has been reported to increase subjective stress and anxiety (Charney et al., 1984; Morgan et al., 1993). Although yohimbine led to significant elevations in α-amylase, we did not find any changes in subjective mood after yohimbine intake in the present study. Potential explanations for the lack of a yohimbine effect on subjective feeling include the relatively moderate dose of the drug, the route of drug administration (oral vs i.v.), and the sensitivity of our mood questionnaire (MDBF), which was designed to assess subjective mood but not stress or anxiety in particular.
An aberrant engagement of habit processes is thought to be involved in several psychiatric disorders, including drug addiction (Robbins and Everitt, 1999; Everitt and Robbins, 2005). It is well known that stress is a major risk factor for addiction, particularly for relapse to addictive behavior (Piazza and Le Moal, 1998; Sinha, 2007). The findings that the stress-induced shift from goal-directed to habit behavior necessitates both glucocorticoid and noradrenergic activity and that a blockade of noradrenergic activity prevents this shift (Schwabe et al., 2011) may therefore have important implications for the treatment of addictive disorders. In the present study, we provide the first evidence of how the stress-induced shift from goal-directed to habit action may be represented in the human brain. Our data suggest that concurrent glucocorticoid and noradrenergic activity impairs the capacity of the orbitofrontal cortex to adequately encode changes in outcome value. This impairment of the goal-directed system produces habit behavior, which is likely to impede adaptation to varying environments.
Footnotes
This work was supported by Deutsche Forschungsgemeinschaft Grant SCHW1357/2-2. We gratefully acknowledge the assistance of Florian Watzlawik and Carsten Siebert during data collection. We thank Tobias Otto for his technical assistance.
Correspondence should be addressed to Dr. Lars Schwabe, Department of Cognitive Psychology, Ruhr-University Bochum, Universitaetsstrasse 150, 44780 Bochum, Germany. Lars.Schwabe@rub.de