Abstract
In humans, presence of an A1 allele of the DRD2/ANKK1-TaqIa polymorphism is associated with reduced expression of dopamine (DA) D2 receptors in the striatum. Recently, it was observed that carriers of the A1 allele (A1+ subjects) showed impaired learning from negative feedback in a reinforcement learning task. Here, using functional MRI (fMRI), we investigated carriers and noncarriers of the A1 allele while they performed a probabilistic reversal learning task. A1+ subjects showed subtle deficits in reversal learning. In particular, these deficits consisted of an impairment in sustaining the newly rewarded response after a reversal and in a generally decreased tendency to stick with a rewarded response. Both genetic groups showed increased fMRI signal in response to negative feedback in the rostral cingulate zone (RCZ) and anterior insula. Negative feedback that incurred a change in behavior additionally engaged the ventral striatum and a region of the midbrain consistent with the location of dopaminergic cell groups. The response of the RCZ to negative feedback increased as a function of preceding negative feedback. However, this graded response was not observed in the A1+ group. Furthermore, the A1+ group also showed diminished recruitment of the right ventral striatum and the right lateral orbitofrontal cortex (lOFC) during reversals. Together, these results suggest that a genetically driven reduction in DA D2 receptors leads to deficient feedback integration in RCZ. This, in turn, was accompanied by impaired recruitment of the ventral striatum and the right lOFC during reversals, which might explain the behavioral differences between the genetic groups.
Introduction
Surviving in a changing environment requires constant evaluation of action outcomes. One experimental paradigm representing changing environments is reversal learning. Subjects learn to respond to one specific stimulus to receive a reward. After a number of trials, task contingencies are reversed, and the alternative stimulus is rewarded. Neuroimaging studies in humans and lesion studies in humans and animals show that reversal learning requires the integrity of the ventral prefrontal cortex (PFC) and the dorsomedial and ventral striatum (Divac et al., 1967; Iversen and Mishkin, 1970; Dias et al., 1996; Cools et al., 2002; Fellows and Farah, 2003; Hornak et al., 2004; Izquierdo et al., 2004; Clarke et al., 2008). Furthermore, performance monitoring and implementation of flexible, adaptive behavior seem to engage the anterior cingulate cortex (ACC), particularly the rostral cingulate zone (RCZ) (Ullsperger and von Cramon, 2003; Ridderinkhof et al., 2004; Debener et al., 2005; Rushworth et al., 2007). Negative feedback calling for behavioral adjustments (Kringelbach and Rolls, 2004) activates the orbitofrontal cortex (OFC), often accompanied by ACC activation (Kringelbach, 2005).
Dopamine (DA), particularly D2 receptors, seems to be required for reversal learning (Cools et al., 2001, 2007). Both genetic deletion of the D2 receptor (Kruzich and Grandy, 2004) and pharmacological blockade (Ridley et al., 1981; Lee et al., 2007) were shown to be detrimental for reversal learning in rodents and monkeys. However, agonism of D2 receptors has also been shown to impair reversal learning (Smith et al., 1999; Mehta et al., 2001). Furthermore, it was shown in the rat that DA in the nucleus accumbens part of the ventral striatum is necessary for reversal learning (Taghzouti et al., 1985). In addition to pharmacological interventions and lesions to study the neurochemical bases of reversal learning, there are also genetic polymorphisms leading to natural variations in DA transmission. The DRD2/ANKK1-TaqIa polymorphism modulates the density of DA D2 receptors. The A1 allele is associated with a reduction in striatal D2 receptor density of up to 30% (Thompson et al., 1997; Pohjalainen et al., 1998; Jönsson et al., 1999; Ritchie and Noble, 2003). This reduction is particularly prominent in ventral parts of caudate and putamen. Additionally, reduced glucose metabolism is observed in carriers of the A1 allele, not only in the striatum but also in remote areas such as ventral and medial PFC (Noble et al., 1997). Given the importance of these areas for reversal learning, we hypothesized that the reduction of glucose metabolism and the reduced density of D2 receptors in the ventral striatum in carriers of the A1 allele should lead to impaired reversal learning and reversal-related brain activity in these areas. Furthermore, given the role of the ACC/RCZ in integrating action outcomes over multiple trials (Kennerley et al., 2006), we postulated that activation of the RCZ by negative feedback would be dependent on the outcome of previous trials. We tested whether this history-driven response of the RCZ is impaired in A1+ subjects.
To address these hypotheses, we scanned carriers and noncarriers of the A1 allele with functional MRI (fMRI) while they performed a probabilistic response reversal learning task.
Materials and Methods
Participants.
Thirty-five male, Caucasian subjects, aged 20 to 32 years, participated in the study. Subjects were invited with respect to their DRD2/ANKK1-TaqIa polymorphism configuration from a larger sample which was in Hardy–Weinberg equilibrium. Seven subjects had to be excluded, one because of malfunction of the presentation system, the others because they did not perform the task satisfactorily [≤10 final reversal errors (see below) or an excess of switching behavior, i.e., > 100 switches between the responses throughout the experiment]. Of those six subjects, four belonged to the A1− group and two to the A1+ group. Thus, 15 subjects of the A1− group and 13 of the A1+ group remained (mean age A1−, 26.33; mean age for A1+, 25.92; difference not significant). We included only male subjects in our study to avoid menstrual cycle-dependent interactions between the dopaminergic system and gonadal steroids (Becker et al., 1982; Becker and Cha, 1989; Creutz and Kritzer, 2004; Dreher et al., 2007). The study was approved by the Research Ethics Committee of the University of Leipzig, Germany.
Genetic analyses.
The DRD2/ANKK1-TaqIa polymorphism is a restriction fragment polymorphism on chromosome 11 at q22–q23 (Noble, 2003; Reuter et al., 2005). Three genotypes of the dopamine DRD2/ANKK1-TaqIa locus can be differentiated: the A1A1 genotype, the A1A2 genotype, and the A2A2 genotype. Because of the small prevalence of the A1A1 genotype (3% of healthy Caucasians), A1A1 and A1A2 subjects are commonly grouped as A1+ subjects, whereas A2A2 subjects are referred to as A1− subjects. The presence of at least one A1 allele (A1+ group) leads to an up to 30% reduction in D2 receptor density (Thompson et al., 1997; Pohjalainen et al., 1998; Jönsson et al., 1999; Ritchie and Noble, 2003).
The direct impact of the DRD2/ANKK1-TaqIa polymorphism (rs1800497) on D2 receptor density has recently been questioned (Lucht and Rosskopf, 2008), because this single nucleotide polymorphism (SNP) is located <10 kb downstream of the DRD2 gene within a protein-coding region of the adjacent ANKK1 gene (Neville et al., 2004). Zhang et al. (2007) investigated 23 SNPs within the D2 gene and found a decreased expression of the short splice variant of the D2 receptor compared with the long splice variant caused by two intronic SNPs (rs2283265 and rs1076560). Interestingly, in the study by Zhang et al. (2007), the minor allele of the two SNPs shows strong linkage disequilibrium with the A1 allele of the DRD2/ANKK1-TaqIa polymorphism (D′ = 0.855). These data indicate that, because of linkage, the DRD2/ANKK1-TaqIa polymorphism is a marker for DA receptor density.
DNA was extracted from buccal cells. Automated purification of genomic DNA was conducted by means of the MagNA Pure LC system using a commercial extraction kit (MagNA Pure LC DNA isolation kit; Roche Diagnostics). Genotyping of the three SNPs (rs1800497, rs1076560, rs2283265) was performed by real-time PCR using fluorescence melting curve detection analysis by means of the Light Cycler System 1.5 (Roche Diagnostics). The primers and hybridization probes (TIB MOLBIOL) are as follows: DRD2/ANKK1-TaqIa (rs1800497): forward primer: 5′-CGGCTGGCCAAGTTGTCTAA-3′, reverse primer: 5′-AGCACCTTCCTGAGTGTCATCA-3′; anchor hybridization probe: 5′-LCRed640-TGAGGATGGC-TGTGTTGCCCTT-phosphate-3′; sensor hybridization probe: 5′-CTGCCTCGACCAGCACT-fluorescein-3′; rs1076560: forward primer: 5′-GGGTATTGAGGCTGCATGA-3′, reverse primer: 5′-GGTAAAGCCGGACAAGTT-3′; anchor hybridization probe: 5′-LCRed640-GGGTGACCCTGTGGTGTTTGC-phosphate-3′; sensor hybridization probe [G]: 5′-CCTTTCCCCCTCTGAAGACTCC-fluorescein-3′; rs2283265: forward primer: 5′-TCTTGGGCTAGACGCAT-3′, reverse primer: 5′-GTGGAATCCTCAAGACCACC-3′; anchor hybridization probe: 5′-LCRed640-CCTGTTTCCTCATCTGTTAAATGGGAAT-phosphate-3′; sensor hybridization probe [T]: 5′-TTAGGCAAGTTTCTTACCTTCTATGA-fluorescein-3′.
Haplotype analysis.
Linkage analyses between SNPs and construction of haplotype blocks were conducted by means of Haploview 3.32 (http://www.broad.mit.edu/mpg/haploview/index.php). Two subjects did not give their approval for reanalyzing their genetic samples for rs2283265 and rs1076560. Therefore, the sample size for the haplotype analyses was n = 26 (one subject from each DRD2/ANKK1-TaqIa group missing). Individual haplotypes were calculated with PHASE, version 2.1. PHASE implements a Bayesian statistical method for reconstructing haplotypes from population genotype data. In simulation experiments, it turned out that the mean error rate using PHASE was approximately half that obtained by the expectation-maximization algorithm (Stephens et al., 2001).
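For readers unfamiliar with the linkage measure used here, the following is a minimal sketch of how D′ between two biallelic loci can be computed from haplotype and allele frequencies. It is not the Haploview implementation; the function name `d_prime` and the example frequencies are illustrative assumptions and are not values from this study.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute value of Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    Both loci are assumed to be polymorphic (0 < p_a, p_b < 1).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d == 0:
        return 0.0
    if d > 0:
        # Largest D attainable given the allele frequencies
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max


# Hypothetical example: allele B only ever occurs together with allele A,
# so linkage is complete and D' is (numerically) 1.
print(d_prime(p_ab=0.2, p_a=0.25, p_b=0.2))
```

A value of D′ = 1 means that at least one of the four possible haplotypes is absent, which is why D′ can be 1 even when two SNPs have different allele frequencies.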
Experimental design.
We used a probabilistic response reversal task (Cools et al., 2002). In each trial, subjects were required to choose between two identical stimuli (two symbolic square buttons of the same color) located to the left and to the right of a central fixation cross. Subjects had to indicate their response with the index finger of the left or right hand. One of the two responses (left or right) was rewarded in 75% of the trials, whereas in the remaining 25% of trials, the other response was rewarded. Reward allocation to one of the two responses was, thus, mutually exclusive. After a predefined block length of 18–24 trials (randomly jittered), the contingencies reversed, and the other response was now rewarded in 75% of the trials. Note that this reversal learning task is entirely response based, implementing a reversal in response–reward mapping. This is in contrast to the task used by Cools et al. (2002), which implements a reversal in the stimulus–reward mapping.
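The reward schedule described above can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: the function name `make_schedule`, the uniform sampling of block lengths, and the random choice of the initially rewarded side are ours and are not taken from the presentation software used in the study. Because block lengths are drawn at random here, the total number of trials will generally differ from the fixed total used in the experiment.

```python
import random


def make_schedule(n_blocks=19, min_len=18, max_len=24, p_reward=0.75, seed=0):
    """Return one (block, good_response, rewarded_response) tuple per trial.

    good_response is the side rewarded on 75% of trials in the current block;
    rewarded_response is the side that actually pays off on that trial.
    """
    rng = random.Random(seed)
    good = rng.choice(["left", "right"])
    trials = []
    for block in range(n_blocks):
        other = "right" if good == "left" else "left"
        for _ in range(rng.randint(min_len, max_len)):
            # Mutually exclusive allocation: exactly one response is rewarded
            rewarded = good if rng.random() < p_reward else other
            trials.append((block, good, rewarded))
        good = other  # contingency reversal at the end of the block
    return trials


schedule = make_schedule()
n_reversals = sum(
    1 for prev, cur in zip(schedule, schedule[1:]) if prev[1] != cur[1]
)
print(len(schedule), "trials,", n_reversals, "reversals")
```

With 19 blocks the schedule always contains 18 reversals, matching the design, whereas the trial count depends on the sampled block lengths.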
Participants were instructed to switch to the other response only when they were sure that the rule had changed. Subjects underwent 19 blocks (and thus 18 contingency reversals), totaling 382 trials. Mean trial duration was 5 s. Additionally, 46 null events of the same duration were randomly interspersed with the experimental trials. During null events, only the fixation cross was presented. The entire experiment lasted slightly less than 36 min. In each trial (Fig. 1A), the central fixation cross was presented, followed after a variable interval (randomly jittered between 300, 700, 1200, 1800 and 2500 ms) by presentation of the two stimuli. The two stimuli remained on screen until the subject made a response or until 1000 ms had elapsed. After a response was made, the corresponding button on the screen was depressed to mark the subject's choice. Feedback consisted of a smiling face for correct responses and a frowning face for incorrect responses. If no response was made within the 1000 ms response window, a face with a question mark was presented. Feedback was presented centrally between the two stimuli with a delay of 100 ms after the response and remained on screen for 800 ms. After feedback offset, only the fixation cross remained on the screen until the end of the trial. For each positive feedback, participants received 0.01 Euros. The cumulative reward was paid at the end of the experiment. Before scanning, subjects underwent a 30 trial training session to become familiar with the concept of probabilistic errors (Cools et al., 2002).
Figure 1. A, Sequence of events within a trial of the probabilistic reversal learning task. After selection of one of the two stimuli, the choice was visualized to the subject by depression and darkening of the respective button on the screen. This was followed after 100 ms by positive or negative feedback, according to the task schedule. B, Example of a sequence of trials and the categorization of the trials according to the subject's response and the feedback obtained.
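The session length stated in the experimental design follows directly from the trial counts and the mean trial duration given there. The short calculation below only restates those numbers; the variable names are ours.

```python
n_trials = 382           # experimental trials
n_null_events = 46       # null events with fixation cross only
mean_duration_s = 5      # mean duration of a trial or null event, in seconds

total_min = (n_trials + n_null_events) * mean_duration_s / 60
print(f"{total_min:.2f} min")  # 35.67 min
```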
Image acquisition.
Data acquisition was performed at 3T on a Siemens Magnetom Trio equipped with a standard birdcage head coil. Thirty slices (3 mm thickness, 0.3 mm interslice gap) were obtained parallel to the anterior commissure–posterior commissure line using a single-shot gradient echo-planar imaging (EPI) sequence (repetition time, 2000 ms; echo time, 30 ms; bandwidth, 116 kHz; flip angle, 90°; 64 × 64 pixel matrix; field of view, 192 mm) sensitive to blood oxygen level-dependent (BOLD) contrast. To improve the localization of activations, a high resolution brain image (three-dimensional reference data set) was recorded from each participant in a separate session using a modified driven equilibrium Fourier transform sequence.
Image processing and analysis.
Analysis of fMRI data was performed using tools from the Functional Magnetic Resonance Imaging of the Brain (FMRIB) Software Library (FSL) (Smith et al., 2004). Functional data were motion-corrected using rigid-body registration to the central volume (Jenkinson et al., 2002). Low-frequency signals were removed by highpass filtering with Gaussian-weighted least-squares straight-line fitting (cutoff, 1/100 Hz). Spatial smoothing was applied using a Gaussian filter with 8 mm full-width at half-maximum. Differences in slice acquisition time were corrected using Hanning-windowed sinc interpolation. Registration of the EPI images with the high resolution brain images and normalization into standard (MNI) space was performed using affine registration (Jenkinson and Smith, 2001). A general linear model was fitted in prewhitened data space to account for local autocorrelations (Woolrich et al., 2001). Analysis I aimed at investigating effects of negative and positive feedback in general. Analysis II considered negative feedback in relation to reversals in task contingencies and behavioral changes. For Analysis I, negative and positive feedback were modeled at feedback onset and the contrast between negative and positive feedback (ALLNEG vs ALLPOS) was assessed. For Analysis II, a different trial classification was used, similar to the one used by Cools et al. (2002): negative feedback that was delivered after a correct response because of the probabilistic task schedule was termed a probabilistic error. When task contingencies reversed and subjects received negative feedback because they still applied the previously correct response, this was called a reversal error (REVERR), but only if the error was not followed by a change of behavior on the subsequent trial. In contrast, reversal errors that were followed by a switch to the then correct response on the next trial were considered to be final reversal errors (FINREVERR) (Fig. 1B). All positive feedback trials were grouped together and included in the model.
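The trial classification used for Analysis II can be made concrete with a small sketch. This is an illustration under simplifying assumptions, not the analysis code used in the study: the function name and label strings are ours, every negative feedback received on the currently unrewarded response is treated as a reversal error, and trials without a response are not handled.

```python
def classify_trials(choices, good_responses, feedback_positive):
    """Label each trial as POS, PROBERR, REVERR, or FINREVERR.

    choices:           response made on each trial ("left" or "right")
    good_responses:    response that is rewarded on 75% of trials at that time
    feedback_positive: True for positive, False for negative feedback
    """
    labels = []
    n = len(choices)
    for t in range(n):
        if feedback_positive[t]:
            labels.append("POS")
        elif choices[t] == good_responses[t]:
            # Correct response, but punished by the probabilistic schedule
            labels.append("PROBERR")
        else:
            # Subject still applies the no longer rewarded response
            switched_next = t + 1 < n and choices[t + 1] != choices[t]
            labels.append("FINREVERR" if switched_next else "REVERR")
    return labels


# Hypothetical sequence: the contingency reverses before the third trial
choices = ["left", "left", "left", "left", "right", "right"]
good = ["left", "left", "right", "right", "right", "right"]
feedback = [True, False, False, False, True, True]
print(classify_trials(choices, good, feedback))
# ['POS', 'PROBERR', 'REVERR', 'FINREVERR', 'POS', 'POS']
```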
For both analyses, regressors were convolved with a synthetic hemodynamic response function (HRF; double gamma function) and its first derivative. For group analyses, individual contrast images derived from contrasts between parameter estimates for the different events were entered into a second-level mixed effects analysis (Woolrich et al., 2004), for which a general linear model was fitted to estimate the group mean effect of the regressors. Analyses were first performed separately for both genotypes to detect patterns of activation. Subsequently, t tests were performed to assess differences in brain activity between the two genetic groups. Unless stated differently, results are reported on the whole-brain level. The statistical threshold was set to p < 0.001, uncorrected.
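To illustrate what convolving event regressors with a double-gamma HRF means, here is a minimal pure-Python sketch. The shape parameters (6 and 16) and the undershoot ratio (1/6) are commonly used textbook values that we assume for illustration; they are not necessarily the values FSL applied in this analysis, and the temporal derivative regressor is omitted. Only the repetition time of 2 s is taken from the acquisition parameters.

```python
import math


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1 / 6):
    """Double-gamma HRF at time t (s after event onset); parameters are assumed."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - ratio * under  # positive response minus late undershoot


def build_regressor(onsets_s, n_scans, tr=2.0, kernel_len_s=32.0):
    """Sum HRFs placed at the event onsets and sample the result once per TR."""
    regressor = []
    for scan in range(n_scans):
        t_scan = scan * tr
        value = sum(
            double_gamma_hrf(t_scan - onset)
            for onset in onsets_s
            if 0 <= t_scan - onset <= kernel_len_s
        )
        regressor.append(value)
    return regressor


# Hypothetical feedback onsets (in seconds) for one event type
regressor = build_regressor(onsets_s=[0.0, 20.0], n_scans=20)
print([round(v, 3) for v in regressor[:6]])
```

Because the model is linear, overlapping responses from closely spaced events simply add up in the regressor.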
The following contrasts were calculated and assessed within and between the two groups: For the effects of negative feedback in general, the contrast ALLNEG versus ALLPOS was analyzed. To investigate activity on error trials that was specific to reversals, we compared final reversal errors with reversal errors (FINREVERR vs REVERR). Furthermore, we tested whether activity to negative feedback was higher when this was immediately preceded by one (NEG + 1) or two (NEG + 2) negative feedback trials compared with the first negative feedback (NEG + 0) after positive feedback trials. Therefore, the contrasts NEG + 2 versus NEG + 1, NEG + 2 versus NEG + 0, and NEG + 1 versus NEG + 0 were calculated. Trials that fell into neither class were modeled as events of no interest.
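The history-dependent labeling of negative feedback can be sketched as follows. The function name is ours, and two details are assumptions rather than statements from the text: negative feedback preceded by three or more consecutive negative feedback trials is left unlabeled (an event of no interest), and a negative feedback on the very first trial is counted as NEG + 0.

```python
def label_negative_history(feedback_positive):
    """Label negative feedback by the number of directly preceding negatives."""
    labels = []
    run = 0  # consecutive negative feedback trials immediately before this one
    for positive in feedback_positive:
        if positive:
            labels.append(None)  # positive feedback is modeled separately
            run = 0
        else:
            labels.append(f"NEG+{run}" if run <= 2 else None)
            run += 1
    return labels


print(label_negative_history([True, False, False, False, False, True, False]))
# [None, 'NEG+0', 'NEG+1', 'NEG+2', None, None, 'NEG+0']
```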
In addition, one would assume that subjects weight feedback differently, depending on whether it occurred early or late after a successful reversal, given that after a reversal a certain number of trials had to elapse before contingencies reversed again. We, therefore, performed a comparison between trials occurring early and late after contingency reversal. Specifically, all trials occurring after the subject's final reversal error up to the next reversal in task contingencies were split into two halves of equal length, called HALF1 and HALF2. If the number of trials to divide was odd, the trial in the middle was modeled as an event of no interest. Excluding all trials between a reversal in task contingencies and a subject's final reversal error enabled us to investigate positional effects of feedback independent of the effect caused by the accumulation of negative feedback because of a rule reversal. We investigated the effect of positive and negative feedback within both halves separately (POS vs NEG and NEG vs POS in HALF1 and HALF2) as well as between the two halves (HALF1 vs HALF2 and HALF2 vs HALF1, for positive and negative feedback, respectively). These contrasts were then compared between the two genetic groups.
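The split into HALF1 and HALF2, including the handling of an odd number of trials, can be written in a few lines. This is a minimal sketch; the function name and the use of trial indices as input are our assumptions.

```python
def split_halves(trial_indices):
    """Split post-reversal trials into HALF1, HALF2, and an optional middle trial.

    trial_indices: trials after the final reversal error up to the next
                   contingency reversal, in chronological order
    """
    n = len(trial_indices)
    half = n // 2
    half1 = trial_indices[:half]
    half2 = trial_indices[n - half:]
    middle = trial_indices[half:n - half]  # one trial if n is odd, else empty
    return half1, half2, middle


print(split_halves(list(range(10, 17))))
# ([10, 11, 12], [14, 15, 16], [13])
```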
Finally, time courses of the hemodynamic response function to final reversal errors and to NEG + 0, NEG + 1 and NEG + 2 were extracted from regions of interest in the ventral striatum, the lateral orbitofrontal cortex (lOFC) and mesial frontal cortex using PEATE (perl event-related average time course extraction), a companion tool to FSL (http://www.jonaskaplan.com/peate/peate-cocoa.html).
Behavioral data and time courses of the hemodynamic response were tested for group differences using one-tailed t tests for independent samples. A p value <0.05 was considered statistically significant. One-tailed tests were used for the following reasons: in the case of the fMRI data, we expected attenuated responses in the lOFC, ventral striatum and mesial prefrontal cortex based on previous work showing reduced glucose metabolism in these areas in carriers of the A1 allele (Noble et al., 1997). Our behavioral predictions of increased switching and reduced persistence were driven by the reports of an association of the A1 allele with increased impulsivity (Limosin et al., 2003; Eisenberg et al., 2007) and the observation that ventral striatal D2 receptor expression is reduced in rats with increased levels of trait impulsivity (Dalley et al., 2007).
Results
Genetic analyses
The genotype frequencies of all three SNPs under investigation were in Hardy–Weinberg equilibrium: DRD2/ANKK1-TaqIa (rs1800497): A1/A1: n = 1, A1/A2: n = 11, A2/A2: n = 14, Chi2 = 0.43, df = 1, n.s.; rs1076560: C/C: n = 16, C/A: n = 9, A/A: n = 1, Chi2 = 0.04, df = 1, n.s.; rs2283265: G/G: n = 16, G/T: n = 9, T/T: n = 1, Chi2 = 0.04, df = 1, n.s. The three SNPs build a haplotype block (see supplemental Fig. S1, available at www.jneurosci.org as supplemental material) spanning 15 kb according to the method by Gabriel et al. (2002). D′ was 1.0 for all linkages except the one between rs1800497 and rs2283265 (D′ = 0.78). Three different haplotypes could be identified (supplemental Table S1, available at www.jneurosci.org as supplemental material) resulting in four different haplotype combinations (supplemental Table S2, available at www.jneurosci.org as supplemental material). Results of the haplotype analysis suggested testing the most frequent haplotype combination (CCG–CCG) against the rest. All CCG–CCG haplotype carriers belonged to the A1− group. Thus, our haplotype analysis corroborates the reported linkage between rs1800497 (DRD2/ANKK1-TaqIa) and the two other SNPs (rs1076560 and rs2283265) influencing the splicing of the DRD2 gene (Zhang et al., 2007). In the case of an A2 allele in rs1800497, the alleles on rs1076560 and rs2283265 can be perfectly predicted. For the A1 allele of rs1800497, the linkage is not perfect, resulting in alternative allele combinations. Grouping by haplotypes, thus, does not yield any further information than that provided by the DRD2/ANKK1-TaqIa SNP. The fMRI and the behavioral data were, therefore, analyzed by grouping participants according to the DRD2/ANKK1-TaqIa alleles.
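The Hardy–Weinberg statistics reported above can be checked from the genotype counts with a standard goodness-of-fit test. The sketch below is our own illustration (function name assumed) of the usual Pearson chi-square with allele frequencies estimated from the sample, which leaves one degree of freedom. Applied to the counts in the text, it gives the reported values.

```python
def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square for deviation from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(round(hwe_chi2(1, 11, 14), 2))  # rs1800497: 0.43
print(round(hwe_chi2(1, 9, 16), 2))   # rs1076560 and rs2283265: 0.04
```

Both values are far below 3.84, the critical chi-square value for df = 1 at p = 0.05, consistent with the genotypes being in equilibrium.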
Behavioral data
The overall amount of rewards collected did not differ between the two genetic groups (p > 0.9). The total number of reversal errors did not differ between groups (p > 0.16). The total average number of reversal errors was (mean ± SEM) 63.76 ± 4.38 for the A1− group and 57.00 ± 5.18 for the A1+ group (supplemental Fig. S2A, available at www.jneurosci.org as supplemental material). However, subjects from the A1+ group switched between the two response alternatives more frequently (p < 0.045) (supplemental Fig. S2B, available at www.jneurosci.org as supplemental material) than the A1− group. Interestingly, even immediately after having received positive feedback, subjects from the A1+ group frequently switched to the other response on the next trial, a behavior that was rarely observed in the A1− group (p < 0.015) (supplemental Fig. S2C, available at www.jneurosci.org as supplemental material). To investigate this response pattern in more detail, we analyzed to what extent subjects sustained their new response after a reversal because of a change in task contingency. Specifically, we examined the eight trials after a final reversal error and determined, across the 18 reversals, the proportion in which subjects maintained the newly correct response up to each trial before they switched back to the (now) incorrect response. Figure 2 shows that with increasing number of trials after the final reversal error the likelihood that subjects consistently maintain the newly correct response decreases. Subjects from the A1− group were more likely to maintain the newly correct response. Two-way ANOVA with TRIAL (eight trials) and GROUP (two groups) as factors revealed an effect of TRIAL (F(7,182) = 26.79, p < 0.001) and GROUP (F(1,26) = 4.74, p < 0.04), and a tendency for a TRIAL × GROUP interaction (F(7,182) = 2.45, p = 0.104). Post hoc t tests showed that subjects from the A1− group maintained the newly correct response longer than those from the A1+ group at all time points after the first trial following the final reversal error (all p < 0.041); on that first trial, each subject has, by definition, a score of 100%.
Figure 2. Persistence of behavioral adaptations in the two genotypes. Shown on the x-axis is the number of trials after a successful reversal of behavior, i.e., trial n + 1 is the trial immediately following the final reversal error. The values on the y-axis are the percentage of the 18 reversals in which the subjects maintain this newly correct response on trials n + 1 to n + 8. *p < 0.05.
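The persistence score plotted in Figure 2 can be computed per subject as sketched below. This is our reading of the description rather than the original analysis code: a reversal counts toward trial n + k only if the newly correct response was chosen on every trial from n + 1 through n + k. The function name and input format are assumptions.

```python
def persistence_curve(post_reversal_choices, new_correct, n_trials=8):
    """Percentage of reversals with the new response maintained up to each trial.

    post_reversal_choices: one list per reversal with the choices on trials
                           n + 1 ... n + 8 after the final reversal error
    new_correct:           the newly correct response for each reversal
    """
    counts = [0] * n_trials
    for choices, target in zip(post_reversal_choices, new_correct):
        for k in range(n_trials):
            if k >= len(choices) or choices[k] != target:
                break  # subject switched back; later trials no longer count
            counts[k] += 1
    n_reversals = len(post_reversal_choices)
    return [100.0 * c / n_reversals for c in counts]


# Hypothetical data for two reversals
choices = [["right"] * 8, ["left", "left", "left", "right"] + ["left"] * 4]
print(persistence_curve(choices, ["right", "left"]))
# [100.0, 100.0, 100.0, 50.0, 50.0, 50.0, 50.0, 50.0]
```

Trial n + 1 is 100% by construction whenever the input starts at the switch that follows the final reversal error.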
We divided the trials remaining in each block after a final reversal error into two halves (HALF1 and HALF2; see Materials and Methods, “Image processing and analysis”). Next, we calculated the probability of staying after positive and shifting after negative feedback separately for the two halves and compared these probabilities within and between groups. Two-way repeated measures ANOVA with the factors HALF (two halves) and GROUP (two groups) was used to assess these differences. The probability of shifting after negative feedback was higher in the second compared with the first half of the block (effect of HALF: F(1,27) = 18.67, p < 0.001). This lose–shift probability was not different between groups (no effect of GROUP, no GROUP × HALF interaction; p > 0.252). In contrast, the probability of staying after positive feedback was higher in the first than in the second half of the block (effect of HALF: F(1,26) = 87.39, p < 0.001). This win–stay probability was higher in the A1− compared with the A1+ group (effect of GROUP: F(1,26) = 4.41, p = 0.045) (supplemental Fig. S3, available at www.jneurosci.org as supplemental material), and this group difference was present in both halves (p < 0.044).
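The win–stay and lose–shift probabilities compared here can be obtained from a sequence of choices and feedback as in the following minimal sketch. The function name is ours, and the example sequence is hypothetical; in the analysis above, such probabilities were computed separately for HALF1 and HALF2.

```python
def win_stay_lose_shift(choices, feedback_positive):
    """Return (P(stay | positive feedback), P(shift | negative feedback))."""
    wins = stays_after_win = losses = shifts_after_loss = 0
    for t in range(len(choices) - 1):
        stayed = choices[t + 1] == choices[t]
        if feedback_positive[t]:
            wins += 1
            stays_after_win += stayed
        else:
            losses += 1
            shifts_after_loss += not stayed
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift


choices = ["left", "left", "right", "right", "left"]
feedback = [True, False, True, True, False]
print(win_stay_lose_shift(choices, feedback))
```

In this example the subject stays after two of three wins and shifts after the single loss that has a following trial.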
Imaging data
Negative feedback (ALLNEG vs ALLPOS) induced a significant increase in BOLD signal in the RCZ, the bilateral ventral anterior insula and the left middle frontal gyrus (Fig. 3; see supplemental Table S3, available at www.jneurosci.org as supplemental material, for a comprehensive list of activations in both groups). However, none of these foci was statistically different between the two genotypes.
Figure 3. Signal change in response to negative feedback (ALLNEG − ALLPOS) superimposed on the MNI template brain. In both groups (A1−, top row; A1+, bottom row), there was increased activity in the RCZ (left), the ventral anterior insula (middle), and the middle frontal gyrus (right). Images are thresholded at z = 3.09. The color bar indicates z-scores. See supplemental Table S3, available at www.jneurosci.org as supplemental material, for a comprehensive list of all activations.
Reversal-related activity (FINREVERR vs REVERR) was found in the same regions as described above. Additional signal change was found in the lOFC bilaterally. Furthermore, there was widespread increase of signal in the bilateral striatum and in a region of the ventral midbrain consistent with the location of the dopaminergic ventral tegmental area (VTA) and pars compacta of the substantia nigra (SNPC) (Fig. 4). However, this striatal and mesencephalic activation was only observed in the A1− group (a comprehensive list of activations is given in supplemental Table S4, available at www.jneurosci.org as supplemental material). Direct group comparison revealed a cluster of 124 mm³ in the right ventral striatum (MNI: x = 17, y = 5, z = −7) that showed, at a threshold of p < 0.01, increased hemodynamic activity in the A1− compared with the A1+ group (Fig. 5A). Additionally, we extracted time courses of hemodynamic activity from a sphere with 3 mm radius centered at this peak coordinate. The hemodynamic response to final reversal errors had a markedly higher amplitude in the A1− compared with the A1+ group (p < 0.05 in all intervals between 5 and 9 s after onset) (Fig. 5B). To test whether the ventral striatal response to the final reversal error was predictive of future behavior, we correlated the amplitude of the hemodynamic response with subjects' tendency to maintain the newly correct response after a reversal. The ventral striatal response correlated positively with the response persistence on the eighth trial (indexing stability of behavioral adaptation) after a final reversal error (r = 0.324, p < 0.047) (supplemental Fig. S5, available at www.jneurosci.org as supplemental material). Thus, the higher the ventral striatal response to the final reversal error, the less rapidly subjects switched away from the newly correct response.
Figure 4. Signal change in response to final reversal errors (FINREVERR − REVERR) superimposed on the MNI template brain. In the A1− group (top), there was significant signal change in the ventral anterior insula, lateral orbitofrontal cortex (left), the RCZ, and in dorsal and ventral aspects of the striatum (right). In addition, a region consistent with the location of the mesencephalic dopamine cell groups was found to be active (left). A similar pattern was found in the A1+ group (bottom); however, note the absence of the midbrain focus and the clearly diminished extent of lateral orbitofrontal activity (left). Furthermore, no significant signal change was found in the striatum, with the exception of a small focus of less than five voxels in the dorsolateral putamen (right). Images are thresholded at z = 3.09. The color bar indicates z-scores. See supplemental Table S4, available at www.jneurosci.org as supplemental material, for a comprehensive list of all activations.
Figure 5. A, Direct comparison between the two genetic groups of the hemodynamic response to final reversal errors (FINREVERR − REVERR). At p < 0.01, there was an enhanced response in the right ventral striatum for the A1− group. The color bar indicates z-scores. B, Time course of hemodynamic activity in response to final reversal errors extracted from a 3 mm sphere centered around the peak coordinate of the contrast shown in A at MNI coordinates x = 17, y = 5, z = −7. *p < 0.05.
A further group difference was observed in the right lateral orbitofrontal cortex (MNI: x = 53, y = 37, z = −5; p < 0.025). Time courses were extracted from a sphere (3 mm radius) centered at the peak coordinate showing an increased response to final reversal errors in the A1− group compared with the A1+ group. The amplitude of the hemodynamic response to final reversal errors was higher in the A1− compared with the A1+ group at 5 and 6 s after event onset (p < 0.016) (Fig. 6). The amplitude of the lateral orbitofrontal response was not significantly correlated with the response persistence (r = 0.210, p = 0.141).
Figure 6. Time course of hemodynamic activity in response to final reversal errors extracted from a 3 mm sphere centered around the peak coordinate of the group difference in the right lateral orbitofrontal cortex (MNI coordinates: x = 53, y = 37, z = −5) between the A1− and the A1+ group. *p < 0.05.
We also investigated whether negative feedback encoding is dependent on the outcome (positive or negative) of the immediately preceding trials. In both groups, negative feedback evoked, on the whole-brain level, a stronger response in the RCZ when it was preceded by one (NEG + 1) or by two trials (NEG + 2) with negative feedback. Comparing NEG + 2 versus NEG + 0 between the genotypes revealed that this contrast was diminished in the A1+ group (Fig. 7A) (p < 0.005, 336 mm3 at MNI x = 4, y = 30, z = 37). Again, we extracted time courses of hemodynamic activity from a sphere with 3 mm radius centered at this peak coordinate for NEG + 0, NEG + 1, and NEG + 2 trials. The BOLD response to NEG + 2 trials is clearly diminished in the A1+ group (Fig. 7B) at all time points from 3 to 10 s after event onset (all p < 0.05). Furthermore, comparisons of the NEG + 0, NEG + 1, and NEG + 2 at the time points 4, 5, and 6 s after event onset revealed that in the A1− group for all three time points NEG + 2 > NEG + 1 > NEG + 0 (all p < 0.034, with the exception of time point six, where NEG + 2 versus NEG + 1; p = 0.064). In contrast, in the A1+ group, NEG + 1 evoked a stronger response than NEG + 0 (all p < 0.016), but there was no further increase from NEG + 1 to NEG + 2 (all p > 0.152). To ensure that these effects could not simply be ascribed to a still elevated baseline as a consequence of carry-over effects from the preceding trial, we also calculated the amplitude of the HRF from baseline to peak. We took the mean from the time points −4 s until event onset as the baseline and the mean of the time points from 4 to 6 s after event onset as the peak of the response, the difference between the two representing the amplitude. The obtained amplitudes were then compared within and between the groups. Amplitudes of the HRF were higher in the A1− group than in the A1+ group at the time point NEG + 2 (p < 0.001), but not at NEG + 0 or NEG + 1 (p = 0.46 and p = 0.13). 
Paired t tests revealed that, in both groups, the amplitude of the HRF increased from NEG + 0 to NEG + 1 (p < 0.015). A further increase from NEG + 1 to NEG + 2 was only present in the A1− group; however, this effect only approached statistical significance (p = 0.059) (Fig. 7C). We also tested whether the increase in HRF amplitudes was attributable to an increased percentage of trials that are followed by a behavioral switch on the next trial (“final reversal error”). The percentage of final reversal errors in the NEG + 0, NEG + 1, and NEG + 2 trials did not differ between groups (p = 0.113, p = 0.309, and p = 0.531, respectively). Additionally, there was no correlation between HRF amplitudes and the percentage of final reversal errors contained in the NEG + 0 (p = 0.839), NEG + 1 (p = 0.868), and NEG + 2 (p = 0.201) trials. Finally, we investigated whether the increase in the RCZ response from NEG + 0 to NEG + 2 was predictive of subjects' propensity to maintain the newly correct behavior after a reversal (i.e., the persistency score shown in Fig. 2). Like the HRF to the final reversal error in the ventral striatum, the difference in BOLD amplitude from NEG + 0 to NEG + 2 trials correlated positively with the response persistency score on the eighth trial after a final reversal error (r = 0.367, p < 0.028) (supplemental Fig. S5, available at www.jneurosci.org as supplemental material).
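The baseline-to-peak amplitude described above can be illustrated with a minimal sketch. It is not the analysis code used in the study. The function name `hrf_amplitude`, the 1 s sampling grid, and the choice to include the time point at event onset (0 s) in the baseline window are assumptions made for illustration only.

```python
from statistics import mean


def hrf_amplitude(times, signal, baseline_window=(-4.0, 0.0), peak_window=(4.0, 6.0)):
    """Baseline-to-peak amplitude of an event-locked time course.

    times  : time points in seconds relative to event onset
    signal : extracted signal at those time points
    Both windows are treated as inclusive (an assumption of this sketch).
    """
    def window_mean(lo, hi):
        values = [s for t, s in zip(times, signal) if lo <= t <= hi]
        if not values:
            raise ValueError(f"no samples between {lo} s and {hi} s")
        return mean(values)

    baseline = window_mean(*baseline_window)  # mean from -4 s to event onset
    peak = window_mean(*peak_window)          # mean from 4 to 6 s after onset
    return peak - baseline


# Hypothetical time course sampled once per second from -4 s to 10 s.
example_times = list(range(-4, 11))
example_signal = [0.0, 0.0, 0.0, 0.0, 0.0,        # -4 s to 0 s
                  0.1, 0.3, 0.6, 0.9, 1.0, 0.8,   # 1 s to 6 s
                  0.5, 0.3, 0.1, 0.0]             # 7 s to 10 s
example_amplitude = hrf_amplitude(example_times, example_signal)
```

Subtracting a pre-event baseline in this way is what makes the measure insensitive to a signal that is still elevated from the preceding trial.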
A, Contrast between negative feedback preceded by negative feedback in both of the two previous trials and negative feedback not preceded by any negative feedback (NEG + 2 − NEG + 0), compared between the two groups. At p < 0.005, there was an enhanced response in the RCZ for the A1− group. The color bar indicates z-scores. B, Time course of hemodynamic activity in NEG + 0, NEG + 1, and NEG + 2 trials. Time courses were extracted from a 3 mm sphere centered on the peak coordinate of the contrast shown in A at MNI x = 4, y = 30, z = 37. Time courses for the A1− group are shown in blue, for the A1+ group in red. *p < 0.05; NEG + 2 in A1− versus NEG + 2 in A1+. C, Amplitudes of the HRF calculated as the difference from baseline (mean from −4 s to event onset) to peak (mean from the time points 4 to 6 s after event onset). #p < 0.001.
Similar to the behavioral analyses, we investigated whether the hemodynamic response to positive and negative feedback differed between the first and second half of each block. Positive feedback in HALF1 (contrast: positive feedback in HALF1 vs negative feedback in HALF1) evoked a marked signal increase in the striatum, in particular in the A1− group, where this effect was present bilaterally (MNI: x = −28, y = −4, z = 7 and x = 32, y = −1, z = −8); in the A1+ group, it was present unilaterally (MNI: x = −29, y = 5, z = −2). These activations cover large extents of the putamen, particularly in the A1− group, with the peaks located more posterior and dorsal compared with the peak of the final reversal error activation (supplemental Fig. S4, available at www.jneurosci.org as supplemental material). In contrast, in HALF2, positive feedback failed to significantly engage the striatum (contrast: positive feedback in HALF2 vs negative feedback in HALF2). Negative feedback, in contrast, exerted a markedly stronger influence on RCZ activity (MNI: x = 5, y = 14, z = 44 and x = 1, y = 12, z = 49, for the A1− and A1+ groups, respectively) when it occurred in HALF2 compared with HALF1 (contrast: negative feedback in HALF2 vs negative feedback in HALF1). None of these effects differed between the two genetic groups.
Discussion
In the present study, the overall network of brain regions we found to be activated by negative feedback per se (anterior insula, RCZ, middle frontal gyrus) and by final reversal errors (lateral orbitofrontal cortex, ventral striatum) is consistent with the literature (Cools et al., 2002; Cohen et al., 2007; Dodds et al., 2008). In addition, our results demonstrate that a genetically driven reduction in striatal D2 receptors affects performance in a probabilistic reversal learning task. The behavioral alteration did not consist of increased perseverative errors. Rather, A1+ subjects, having reduced D2 receptor density compared with A1− subjects, had difficulty in maintaining the newly rewarded response after behavioral adaptation in response to a change in task rule. Moreover, these subjects were in general more likely to switch back and forth between the response alternatives. In particular, A1+ subjects frequently switched to the other response although they had just been reinforced for the response they made. These subtle behavioral differences were accompanied by changes in feedback-related BOLD signals. The final reversal error engaged the ventral striatum and the lOFC in the A1− group to a greater extent than in the A1+ group. Interestingly, the amplitude of the ventral striatal response to the final reversal error was also predictive of subjects' propensity to maintain the newly correct response: the higher the ventral striatal response, the less rapidly subjects switched back to the incorrect response.
Furthermore, activity in the RCZ increased as a function of preceding negative feedback. That is, the more negative feedback trials preceded a negative outcome, the stronger was the response in the RCZ. This graded response to consecutive negative outcomes was absent in the A1+ group: while in these subjects, activity in the RCZ increased from the first to the second negative feedback, no further increase from the second to the third negative feedback was observed.
Interestingly, the graded response of the RCZ to negative feedback was also predictive of subjects' behavior after a final reversal error: the more the activity in the RCZ increased with the number of preceding negative feedback trials, the more likely subjects were to maintain the newly correct response.
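To make the trial classification behind this graded response concrete, the following minimal sketch labels each negative feedback trial by the number of consecutive negative feedback trials immediately preceding it. It is an illustration, not the procedure used in the study. The function name `label_negative_feedback`, the Boolean coding of outcomes, and the decision to count runs longer than two as NEG+2 are assumptions.

```python
def label_negative_feedback(outcomes, max_run=2):
    """Label negative feedback trials as NEG+0, NEG+1, or NEG+2.

    outcomes : sequence of booleans, True = positive feedback,
               False = negative feedback (hypothetical coding)
    max_run  : preceding runs longer than this are capped (assumption)
    Returns one label per trial; positive feedback trials get None.
    """
    labels = []
    run = 0  # consecutive negative feedback trials directly before the current trial
    for positive in outcomes:
        if positive:
            labels.append(None)
            run = 0  # a positive outcome interrupts the run
        else:
            labels.append(f"NEG+{min(run, max_run)}")
            run += 1
    return labels


# Hypothetical outcome sequence: one reward, four losses, one reward, one loss.
example_outcomes = [True, False, False, False, False, True, False]
example_labels = label_negative_feedback(example_outcomes)
```

In this example, the four consecutive losses are labeled NEG+0, NEG+1, NEG+2, and NEG+2, and the isolated final loss is labeled NEG+0.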
Additionally, we observed that feedback differentially influenced subjects' behavior, depending on its position in the block. In general, lose–shift behavior occurred more frequently in the second half of each block, and win–stay behavior was more frequent in the first half. However, win–stay behavior in both parts of the block was less frequent in subjects from the A1+ group, consistent with their decreased tendency to maintain the newly correct response after a reversal. These effects of feedback position were also found in the fMRI data. A pronounced striatal response to positive feedback, located, in particular, in large extents of the posterior two thirds of the putamen, was only observed in the first half of the block. Here, positive feedback can be thought of as being most informative, particularly in the first trials after reversal, when rewards confirm that the decision to switch was correct. Nevertheless, hemodynamic response amplitudes to positive feedback in the first half were not correlated with the tendency to maintain the newly correct response. Negative feedback, in contrast, evoked clear-cut activation of the RCZ in both parts of the block, but the response of the RCZ was markedly stronger in the second half. Therefore, it seems that subjects ascribe more relevance to negative feedback that occurs later in a block, which is paralleled by the increased incidence of lose–shift behavior in the second half of the block.
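The win–stay and lose–shift measures discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring used in the study: the function name `stay_shift_rates_by_half` is hypothetical, a block is split at its midpoint by trial count, and each trial is assigned to the half in which its feedback occurred.

```python
def stay_shift_rates_by_half(choices, outcomes):
    """Win-stay and lose-shift rates for the two halves of one block.

    choices  : sequence of response labels, one per trial
    outcomes : sequence of booleans, True = positive feedback
    A trial counts as win-stay if it was rewarded and the next choice is the
    same, and as lose-shift if it was not rewarded and the next choice differs.
    The last trial has no following choice and is not scored.
    """
    mid = len(choices) // 2  # midpoint split is an assumption of this sketch
    tallies = {
        half: {"win": 0, "win_stay": 0, "lose": 0, "lose_shift": 0}
        for half in ("HALF1", "HALF2")
    }
    for i in range(len(choices) - 1):
        half = "HALF1" if i < mid else "HALF2"
        stayed = choices[i + 1] == choices[i]
        tally = tallies[half]
        if outcomes[i]:
            tally["win"] += 1
            tally["win_stay"] += int(stayed)
        else:
            tally["lose"] += 1
            tally["lose_shift"] += int(not stayed)

    rates = {}
    for half, tally in tallies.items():
        rates[half] = {
            "win_stay": tally["win_stay"] / tally["win"] if tally["win"] else None,
            "lose_shift": tally["lose_shift"] / tally["lose"] if tally["lose"] else None,
        }
    return rates


# Hypothetical block of eight trials with two response alternatives.
example_choices = ["A", "A", "B", "B", "B", "A", "A", "A"]
example_feedback = [True, False, True, True, False, False, True, True]
example_rates = stay_shift_rates_by_half(example_choices, example_feedback)
```

A rate of None indicates that the corresponding feedback type did not occur in that half, so no proportion can be computed.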
The reduced ventral striatal response to final reversal errors in the A1+ group may either be a direct consequence of the reduced D2 receptor density in this region or secondary to the reduced glucose metabolism in the RCZ (Noble et al., 1997), which might entail an impaired integration of negative feedback. We would speculate that, as subjects accrue more and more negative feedback during reversals, activity in the RCZ gradually increases up to a certain threshold. When activity exceeds this threshold, the RCZ engages the striatum via its efferents (Müller-Preuss and Jürgens, 1976; Yeterian and Van Hoesen, 1978; Baleydier and Mauguiere, 1980; Devinsky et al., 1995; Takada et al., 2001) to trigger a behavioral adaptation. In A1+ subjects equipped with reduced striatal D2 receptor density, the already altered information arriving from the RCZ might be further degraded in the ventral striatum by maladaptive corticostriatal integration, attributable to the relative lack of D2 receptors.
We also found an area of the midbrain comprising the dopaminergic nuclei of the VTA and SNPC to be activated on final reversal errors. This suggests that the dopaminergic midbrain is recruited by the OFC, RCZ [but see Frankle et al. (2006) for sparse cortical projections to the midbrain], or ventral striatum during reversals. Alternatively, reversal-related activity in the OFC and RCZ might also be modulated by engagement of the VTA and SNPC.
The impaired maintenance of the correct response after a behavioral switch shown by the A1+ subjects may be attributable to inefficient updating of stimulus–reward associations (Rolls, 2000). This behavior parallels findings from patients with bilateral lesions in the OFC showing the same win–shift behavior in a visual discrimination reversal task (Hornak et al., 2004).
Both D2 receptor agonism and antagonism have been shown to impair reversal learning (Ridley et al., 1981; Smith et al., 1999; Mehta et al., 2001; Lee et al., 2007). A recent study found that reversal-related activity in the ventral striatum was diminished by the catecholamine-releasing drug methylphenidate but not by the D2 receptor antagonist sulpiride (Dodds et al., 2008). As the authors discussed, this lack of effect of sulpiride may be a consequence of the dose they used (400 mg), which might have been too low to occupy a substantial proportion of D2 receptors. In another recent study, however, Lee et al. (2007) reported that antagonizing D2 receptors with raclopride diminished behavioral flexibility in monkeys, thereby increasing the number of reversal errors in a response reversal task.
It is not clear if the effects of D2 receptor agonists and antagonists on reversal learning are mediated by action on receptors in the striatum (ventral or dorsomedial) or in other (for instance, fronto-cortical) brain regions. The reduction of striatal D2 receptors in the A1+ subjects suggests that the effect is indeed mediated by striatal D2 receptors (but see Calaminus and Hauber, 2007). However, it is not clear yet whether the DRD2/ANKK1-TaqIa polymorphism also affects D2 receptor density in brain areas other than the striatum. Speaking in favor of a central role for intact dopaminergic transmission in the striatum, Frank and colleagues, using elaborate computational models, provided evidence that disrupted DA signaling in the ventral striatum might be held responsible for impairments in reversing behavior after a switch in task contingencies in a probabilistic learning task (Frank, 2005; Frank and Claus, 2006).
In our task, subjects were instructed to switch to the alternative response only when they were sure that the contingencies had reversed. Therefore, it is not surprising that we only found a slight, nonsignificant reduction in the overall number of reversal errors in the A1+ group. This is consistent with findings of another study that also compared reversal learning in A1+ and A1− subjects (Cohen et al., 2007). However, our analysis of the behavioral pattern after the reversal of task contingency revealed that, although A1+ subjects did not take longer to switch to the correct response, they were less likely than A1− subjects to sustain this new response and frequently reverted back to the previously correct response. This pattern is remarkably reminiscent of the deficit Kennerley et al. (2006) observed in macaque monkeys with lesions of the anterior cingulate sulcus. Lesioned animals did not take more trials to switch to the correct response, but they were impaired at maintaining this new response on the next trials. Even after having collected several rewards for the new response, they were still likely to revert back to the previously reinforced response. The authors argue that one function of the dorsal ACC/RCZ is to integrate action–outcome associations over multiple trials (“reinforcement history”) and that the lesion interfered with this function. In agreement with this interpretation, we found that negative action outcomes were not encoded in a uniform manner in the RCZ. Rather, the response of this brain region to negative feedback depended on the outcome of previous trials: the more consecutive negative outcomes preceded a negative feedback, the more pronounced was the BOLD response in the RCZ. This integrative function seems to be reduced in carriers of the A1 allele. Our results concur with previous findings showing that A1+ subjects had difficulty in learning from negative feedback in a reinforcement learning task. 
This was also accompanied by diminished responses of the RCZ to negative feedback in this group (Klein et al., 2007). This suggests a general role of D2 receptors in feedback-based learning.
Together, the results of the present study show that in a probabilistic reversal learning task, negative action outcomes are integrated over multiple trials in the RCZ. During behavioral adaptation to a reversal of task contingencies, the lateral orbitofrontal cortex and ventral striatum are engaged. Carriers of the A1 allele show deficient integration of feedback in the RCZ and reduced recruitment of the ventral striatum and the lOFC during reversal. This diminished engagement of reversal-relevant brain areas likely makes the subjects' decision less stable and thereby causes them to revert back to previously successful actions more frequently. Our findings suggest that striatal and possibly cortical D2 receptors are crucial for the integration of action outcomes and successful reversal learning.
Footnotes
- This work was supported by a grant from the National Institutes of Health (R01 MH74457; to J.N.) and by a grant from the Deutsche Forschungsgemeinschaft (JO-787/1-1; to G.J.).
- Correspondence should be addressed to Gerhard Jocham, Cognitive Neurology Research Group, Max Planck Institute for Neurological Research, Gleueler Strasse 50, D-50931 Cologne, Germany. jocham{at}nf.mpg.de