Abstract
The striatum is known to play a key role in reinforcement learning, specifically in the encoding of teaching signals such as reward prediction errors (RPEs). It has been proposed that aberrant salience attribution is associated with impaired coding of RPE and heightened dopamine turnover in the striatum, and might be linked to the development of psychotic symptoms. However, the relationship of aberrant salience attribution, RPE coding, and dopamine synthesis capacity has not been directly investigated. Here we assessed the association between a behavioral measure of aberrant salience attribution, derived from the salience attribution test, and neural correlates of RPEs measured via functional magnetic resonance imaging while healthy participants (n = 58) performed an instrumental learning task. A subset of participants (n = 27) also underwent positron emission tomography with the radiotracer [18F]fluoro-l-DOPA to quantify striatal presynaptic dopamine synthesis capacity. Individual variability in aberrant salience measures related negatively to ventral striatal and prefrontal RPE signals and, in an exploratory analysis, was positively associated with ventral striatal presynaptic dopamine levels. These data provide the first evidence for a specific link between the constructs of aberrant salience attribution, reduced RPE processing, and potentially increased presynaptic dopamine function.
Introduction
Aberrant attribution of salience, i.e., to irrelevant or otherwise neutral stimuli, has been proposed as the central mechanism in the development of psychotic experiences (Heinz, 2002; Kapur, 2003). It has been suggested that aberrant salience attribution may arise from elevated dopaminergic neurotransmission (Heinz and Schlagenhauf, 2010; Winton-Brown et al., 2014), consistent with the replicated finding of heightened presynaptic dopamine levels in schizophrenia and individuals at risk of psychosis (Howes and Kapur, 2009; Howes et al., 2012) and with data from animal research connecting dopamine levels and models of salience attribution (Bay-Richter et al., 2009; Weiner and Arad, 2009; O'Callaghan et al., 2014).
In healthy individuals, dopamine plays a key role in signaling incentive salience (Robinson and Berridge, 1993) and correlates with a teaching signal central in theories of reinforcement learning (Sutton and Barto, 1998). Teaching signals help adapt to an ever-changing environment. Phasic firing of midbrain dopamine neurons was demonstrated to encode the difference between received outcome and expected value (Montague et al., 1996; Schultz, 1997), termed the reward prediction error (RPE). These findings were mirrored in humans using fMRI: coding of RPEs in areas innervated by dopaminergic neurons was repeatedly shown, particularly in the striatum (O'Doherty et al., 2003; D'Ardenne et al., 2008). In schizophrenia, attenuated striatal RPE signals were observed (Murray et al., 2008; Gradin et al., 2011; Deserno et al., 2013; Schlagenhauf et al., 2014). An association of such signals with delusion severity was reported (Corlett et al., 2007; Schlagenhauf et al., 2009; Romaniuk et al., 2010) and attenuated striatal RPE signals were associated with self-reported delusion-like beliefs in healthy participants (Corlett and Fletcher, 2012). In a similar population, high levels of aberrant salience have been reported (Roiser et al., 2013) suggesting that there might be an association of the constructs of RPE and aberrant salience.
Recently, a paradigm was designed to measure the degree of aberrant salience—the salience attribution test (SAT; Roiser et al., 2009). Participants had to respond to different stimuli, some of which predicted reward while others were uninformative. Aberrant salience was operationalized as implicit (the speeding of responses to irrelevant stimulus features) and explicit measures (subjective probability ratings about reward contingencies). The implicit measure had the highest construct validity in healthy volunteers (Schmidt and Roiser, 2009). So far, it has been demonstrated that schizophrenia patients with delusions exhibited higher explicit aberrant salience than patients without (Roiser et al., 2009) as did individuals in an at-risk state for psychosis compared with healthy volunteers (Roiser et al., 2013). We found significantly higher implicit aberrant salience in a large sample of schizophrenia patients (A. Pankow, T. Katthagen, S. Diner, L. Deserno, R. Boehme, T. Gleich, M. Gaebler, H., A. Heinz, F., unpublished observations). fMRI results showed that the ventral striatum (VS) was activated by task-adaptive reward prediction during the SAT, while aberrant reward prediction responses have been observed inconsistently in the VS (Roiser et al., 2010, 2013).
Although there is an influential theoretical background motivating the idea that aberrant salience attribution may relate to weakened learning signals in the VS elicited by task-relevant cues (Heinz, 2002; Kapur, 2003; Corlett et al., 2007; Schlagenhauf et al., 2009; Heinz and Schlagenhauf, 2010; Winton-Brown et al., 2014), the association between these processes has yet to be investigated directly. Here, we related a behavioral measure of aberrant salience attribution derived from the SAT to RPEs assessed by fMRI during operant learning in a large sample of healthy participants. We hypothesized that high levels of aberrant salience attribution are associated with decreased VS RPE signals; we also investigated whether aberrant salience levels are associated with prefrontal RPE encoding. A subgroup underwent [18F]fluoro-l-DOPA (FDOPA) PET to measure presynaptic striatal dopamine levels. Based on the proposal that aberrant salience attribution is accompanied by elevated presynaptic dopamine levels (Howes et al., 2012), we predicted a positive association with aberrant salience scores.
Materials and Methods
Participants.
Fifty-eight healthy volunteers (23 female; mean age: 25.8 ± 5.9 years, range 18–43) were included; participants were recruited through university mailing lists and from the institute subject database. They were free of Axis I psychiatric disorders, with no history of psychiatric disorders in first-degree relatives, and no current or past alcohol or substance abuse (other than nicotine; First et al., 2001). There were three separate appointments for the SAT, fMRI, and PET measurements. The Peters et al. Delusions Inventory (PDI; Peters et al., 1999) and the Schizotypal Personality Questionnaire (SPQ; Raine, 1991) were used to approximate psychosis-like experiences. The local ethics committee approved the study and written informed consent was obtained after complete study description. A subgroup of this sample also performed a different behavioral task during fMRI, published separately (Deserno et al., 2015).
Aberrant salience task.
The SAT was used to measure aberrant salience behaviorally. A more detailed description is provided in the original publication (Roiser et al., 2009). In short, a stimulus appeared on the screen as a cue, which could vary across two dimensions: color (red or blue) and form (animal or household object; Fig. 1). While the stimulus features on one dimension predicted reward availability (e.g., red vs blue: 87.5 vs 12.5%), the other dimension carried no predictive information about the occurrence of reward and was therefore irrelevant (e.g., 50% reward for both animal and household features). Right after the cue, participants had to respond to the presentation of a square (the probe) to win money. They were instructed that faster responses yielded higher rewards, but reward was not always available. If the trial was not reinforced, the message “Sorry—no money available” was displayed after the probe disappeared. If reinforced, “hit” responses (made before the probe disappeared) that were slower than the participant's own mean reaction time (RT) (measured during an earlier practice session) resulted in the message “Hit—good: 10 cents.” For hit responses faster than the participant's mean practice RT the following messages appeared: “Quick—very good: X cents” (for responses up to 1.5 SDs above their mean RT) and “Very quick—excellent: X cents” (for responses faster than 1.5 SD). The scaling of reward in each trial was calculated using the equation: X = 10 + 45 × (mean RT − trial RT)/(3 × SD), up to a maximum of 40 cents per trial. Implicit aberrant salience attribution was measured through RTs and calculated as follows: subtraction of the RTs to subjective low reward stimuli from the RTs to subjective high reward stimuli followed by square root transformation to reduce skew. Participants performed the task in two separate blocks of equal length, over which values were averaged. 
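The reward scaling and the implicit aberrant salience score described above can be illustrated with a minimal Python sketch. The function names, the zero payout for missed or non-reinforced trials, and the use of the absolute RT difference (so that the square root is always defined) are assumptions made for illustration and are not taken from the task code. The score is computed for one block; averaging over the two blocks would follow.

```python
import math


def sat_reward_cents(trial_rt, mean_rt, sd_rt, reinforced=True, hit=True):
    """Reward in cents for one SAT trial (illustrative sketch)."""
    # Assumption: non-reinforced trials and missed probes pay nothing.
    if not (reinforced and hit):
        return 0.0
    # Hits that are not faster than the practice mean RT earn a flat 10 cents.
    if trial_rt >= mean_rt:
        return 10.0
    # Faster hits: X = 10 + 45 * (mean RT - trial RT) / (3 * SD), capped at 40.
    scaled = 10.0 + 45.0 * (mean_rt - trial_rt) / (3.0 * sd_rt)
    return min(scaled, 40.0)


def implicit_aberrant_salience(rts_subjective_high, rts_subjective_low):
    """Square-root-transformed RT difference between the two levels
    of the task-irrelevant stimulus dimension."""
    mean_high = sum(rts_subjective_high) / len(rts_subjective_high)
    mean_low = sum(rts_subjective_low) / len(rts_subjective_low)
    # Assumption: the absolute difference is taken before the square root.
    return math.sqrt(abs(mean_high - mean_low))
```

With a practice mean RT of 300 ms and an SD of 30 ms, a 270 ms hit would earn 25 cents under this sketch, and the cap of 40 cents is reached at 2 SDs faster than the mean.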
We used this implicit measure of aberrant salience, which has been shown to have a high construct validity (Schmidt and Roiser, 2009), to create two groups of 29 subjects each (“high” and “low”) via a median split. Additionally, we explored the robustness of the results using an extreme group approach with two groups of 20 people each including the highest and lowest values of implicit aberrant salience attribution. In addition to the implicit measure, we also obtained the explicit measure, based on an individual rating after each of the two blocks. Participants rated the individual stimuli on a visual analog scale by estimating how often each of the four stimulus types had been reinforced (0–100). To analyze the development of salience attribution over the first block of the experiment, RTs were centered using the mean RT from the practice session. RTs were separated into high and low predictive probability trials and irrelevant trials. The latter were categorized as subjective good or bad (based on mean RT/based on individual ratings). RTs were averaged over time bins of five trials, resulting in six data points.
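The grouping and binning steps described above can be sketched as follows. This is a minimal illustration in Python; the function names are hypothetical, and splitting by rank (rather than by comparison with the numerical median) is an assumption that matters only when scores are tied or the sample size is odd.

```python
def median_split(scores):
    """Return (low, high) lists of participant indices, split at the median rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    half = len(order) // 2
    return order[:half], order[half:]


def extreme_groups(scores, n_per_group=20):
    """Return indices of the n lowest-scoring and n highest-scoring participants."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:n_per_group], order[-n_per_group:]


def bin_means(values, bin_size=5):
    """Average consecutive trials in bins of bin_size.

    Assumption: an incomplete final bin is dropped.
    """
    n_bins = len(values) // bin_size
    return [
        sum(values[b * bin_size:(b + 1) * bin_size]) / bin_size
        for b in range(n_bins)
    ]
```

Applied to 58 scores, `median_split` yields two groups of 29, and `bin_means` turns 30 trials into the six data points mentioned above.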
Figure 1. One trial of the SAT. Participants had to respond to a probe preceded by stimuli that varied in two dimensions (color, shape), only one of which was relevant for predicting reward (80 vs 20%). The other dimension carried no predictive information (50% win probability). Probe duration was calculated from individual reaction times in the second practice session [mean ± 2 SDs from the fastest half of trials (SDF)]. Faster responses yielded higher rewards, about which feedback was given following each trial.
Operant learning task.
For operant learning, a probabilistic reinforcement learning paradigm was used. Participants were instructed to choose one of two different stimuli appearing simultaneously on the screen during a decision window of 1.5 s. Stimuli were randomly assigned to the left or right side (Fig. 2A). If no response occurred within the decision window, the message “too slow!” was displayed and the next trial began. One hundred and sixty trials were administered. Each trial consisted of stimulus presentation (1.5 s); feedback (0.5 s); and a jittered, exponentially distributed intertrial interval (min 1 s, max 12.5 s). On each trial, participants could win or lose money. They were instructed to try to win as much money as possible. One symbol was associated with a high reward probability (80%) and a low loss probability (20%), and the inverse probabilities were assigned to the other symbol. This probability distribution switched at various points in the experiment: at the beginning and end were stable periods without switching (first block: 55 trials; last block: 35 trials); in the middle, shorter blocks alternated between 15 and 20 trials (Fig. 2B). Feedback for win trials consisted of the message “Win! +10 Cent” and for lose trials of the message “Loss! −10 Cent.” Each participant practiced the task during a training session without reversals of reward contingencies, but was notified that there might be switches during the main experiment. A minimum of 3€ and a maximum of 10€ could be won on the task. Behavioral performance was quantified as the percentage of “correct” responses, i.e., choosing the currently better symbol. Groups were compared using two-sample t tests. Behavioral data were analyzed using SPSS 19.
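The block structure and the performance measure can be made concrete with a short Python sketch. The order of the middle blocks (15, 20, 15, 20 trials) is an assumption chosen so that the block lengths sum to the 160 trials reported; the function names and the exclusion of missed trials from the denominator are likewise illustrative assumptions.

```python
def build_schedule(block_lengths=(55, 15, 20, 15, 20, 35)):
    """Return, for each trial, the index (0 or 1) of the currently better symbol.

    Assumption: the middle blocks follow the order 15, 20, 15, 20.
    """
    schedule = []
    better = 0
    for length in block_lengths:
        schedule.extend([better] * length)
        better = 1 - better  # contingencies reverse at each block boundary
    return schedule


def percent_correct(choices, schedule):
    """Percentage of answered trials on which the currently better symbol was chosen.

    Assumption: missed trials are coded as None and excluded.
    """
    answered = [(c, s) for c, s in zip(choices, schedule) if c is not None]
    correct = sum(1 for c, s in answered if c == s)
    return 100.0 * correct / len(answered)
```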
Figure 2. Operant learning task. A, Participants had to choose the symbol with the higher reward probability to win money. B, Reward probability associated with one of the symbols changed over the course of the whole experiment (the other symbol's reward probability was the inverse).
Reinforcement learning model.
A reinforcement learning model was applied to the operant learning task to quantify learning dynamically and to generate trial-by-trial RPEs for each individual to serve as regressors in the fMRI analysis. We used a Q-learning algorithm with five free parameters, estimated for every participant, to model their choices during the reinforcement learning paradigm. The algorithm tracks the expected outcome value (the “Q-value”) of the chosen stimulus a (Sutton and Barto, 1998) on each trial t. This expected value was adjusted according to the RPE δQa,t, which is defined as the difference between the received outcome Rt and the expected outcome Qa,t for the chosen stimulus:
$$\delta_{Q_a,t} = R_t - Q_{a,t}$$
Rt takes the value of one of two separate free parameters, one for rewarded and one for punished trials. Instead of setting Rt for reward and punishment to fixed values (1 and −1), we allowed them to vary individually as free parameters to capture variation in reward and punishment sensitivity (Schlagenhauf et al., 2014). The RPE is used as a teaching signal to update expected values iteratively trial by trial:
$$Q_{a,t+1} = Q_{a,t} + \alpha \, \delta_{Q_a,t}$$
Here, α determines the learning rate, which was also estimated separately for reward and punishment outcomes. The learning rate describes how quickly expectations change with respect to the current RPE. A further free parameter, Qi, specified the initial Q-values for one option (a bias to choose one or the other stimulus initially; Schlagenhauf et al., 2014). A softmax rule was used to estimate the probability of choices (pa(t)) based on expected values:
$$p_a(t) = \frac{\exp(Q_{a,t})}{\sum_{a'} \exp(Q_{a',t})}$$
The softmax equation used here does not contain a free temperature parameter, because reward and punishment sensitivities cover the same behavioral variation and render the temperature redundant. Parameters of the model were estimated by applying expectation maximization with empirical priors and the model evidence was approximated by integrating out the free parameters over the likelihood via sampling from the prior distribution (Huys et al., 2011, 2012). For between-group comparison, all five parameters, i.e., reward and punishment learning rates, reward and punishment sensitivities, and initial Q-value, were entered into a multivariate ANOVA (MANOVA).
Only subjects whose choice behavior was explained better than chance (based on the likelihood of the observed behavioral data given the parameters) were included in further modeling-based analysis (52 of 58 subjects: four from the low and two from the high aberrant salience group were excluded). Thus, time series derived from these parameters reflect important aspects of the behavioral data and can be regressed against imaging data in a meaningful way.
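The model described in this section can be sketched in Python as a forward pass that, for a given parameter set, returns the trial-wise RPEs and the log-likelihood of the observed choices. This is an illustration only: it does not perform the expectation-maximization fitting, the function names are hypothetical, the initial Q-value is assumed to apply to option 0 with the other option starting at zero, and the better-than-chance criterion is assumed to be a comparison with the log-likelihood of random choice.

```python
import math


def run_q_learning(choices, wins, alpha_rew, alpha_pun, r_rew, r_pun, q_init):
    """Forward pass of the Q-learning model for one participant.

    choices: 0/1 index of the chosen stimulus on each answered trial.
    wins: True for a win, False for a loss.
    Returns (trial-wise RPEs, log-likelihood of the choices).
    """
    # Assumption: q_init applies to option 0; option 1 starts at zero.
    q = [q_init, 0.0]
    rpes = []
    log_lik = 0.0
    for a, win in zip(choices, wins):
        # Softmax without a temperature parameter.
        q_max = max(q)  # subtracted for numerical stability only
        exp_q = [math.exp(v - q_max) for v in q]
        log_lik += math.log(exp_q[a] / sum(exp_q))
        # Outcome value and learning rate depend on the outcome type.
        outcome = r_rew if win else r_pun
        alpha = alpha_rew if win else alpha_pun
        delta = outcome - q[a]  # reward prediction error
        q[a] += alpha * delta   # value update for the chosen stimulus
        rpes.append(delta)
    return rpes, log_lik


def better_than_chance(log_lik, n_trials, n_options=2):
    """Assumed criterion: log-likelihood exceeds that of random choice."""
    return log_lik > n_trials * math.log(1.0 / n_options)
```

The returned RPE series is what would enter the fMRI analysis as a parametric modulator at feedback onset.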
fMRI.
A 3.0 tesla Siemens Trio scanner with a 12-channel head coil was used to acquire 423 T2-weighted EPIs containing 40 slices (TR = 2090 ms, TE = 22 ms, slice thickness 2.5 mm, matrix size 64 × 64, field of view 488 × 488 mm2, in-plane voxel resolution 3 mm2, flip angle = 90°). Field distortion maps and T1-weighted anatomical images were also acquired. fMRI data were analyzed using statistical parametric mapping (SPM8; Wellcome Department of Imaging Neuroscience, London, UK; http://www.fil.ion.ucl.ac.uk/spm) in MATLAB R2010b (The MathWorks). The following steps were performed: slice-time and motion correction including unwarping, coregistration of the mean EPI and the anatomical image, spatial normalization to the MNI T1 template, and segmentation of the T1 image using the unified segmentation approach (Ashburner and Friston, 2005). Normalization parameters were applied to all EPIs. Finally, all images were spatially smoothed with an isotropic Gaussian kernel of 6 mm full-width at half-maximum.
For statistical analysis of the BOLD response, the general linear model approach was used as implemented in SPM8. In an event-related design, feedback onsets were convolved with the hemodynamic response function and trial-by-trial RPEs derived from the learning model were added as a parametric modulator. One additional regressor marked trials where no answer occurred. To account for movement-associated variance, realignment parameters and their first temporal derivative of translational movement plus an additional regressor marking scans with >1 mm scan-to-scan movement were included as regressors of no interest. Individual contrast images were taken to a random effects group-level analysis (one-sample t tests for within-group and two-sample t tests for between-group comparisons). To correct for multiple comparisons, statistics are reported using FWE correction at the voxel level across the whole brain. Based on our a priori hypothesis about the negative association between ventral striatal RPE signal and measures of aberrant salience, parameter estimates from the fMRI activations were extracted from a 2 mm sphere around within-group (over all subjects) peak voxels and subjected to a correlation analysis with aberrant salience measures derived from the SAT (one-tailed, based on our a priori hypothesis) using SPSS19. For group comparisons, the search volume was restricted to those voxels showing a significant main RPE effect at p < 0.05 (whole-brain corrected) across the entire sample (Table 1).
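The movement-related nuisance regressors mentioned above can be illustrated with a small Python sketch. It is not the SPM8 implementation: the function names are hypothetical, and defining scan-to-scan movement as the Euclidean distance between successive translation estimates is an assumption.

```python
import math


def first_derivative(series):
    """First temporal difference of a realignment parameter (0 for the first scan)."""
    return [0.0] + [b - a for a, b in zip(series, series[1:])]


def flag_high_motion(translations_mm, threshold_mm=1.0):
    """Return a 0/1 regressor marking scans with large scan-to-scan movement.

    translations_mm: one (x, y, z) translation estimate per scan, in mm.
    """
    flags = [0]
    for prev, cur in zip(translations_mm, translations_mm[1:]):
        # Assumption: movement is the Euclidean distance between successive scans.
        displacement = math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur)))
        flags.append(1 if displacement > threshold_mm else 0)
    return flags
```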
Table 1. Prediction error-related activation
PET.
A subgroup of 27 subjects (mean age 28.2 ± 6, 12 female) also underwent FDOPA PET. Data were acquired using a Philips Gemini TF16 time-of-flight PET/CT scanner in 3D mode. After a low-dose transmission CT scan, a 60 min dynamic 3D “list-mode” emission recording started after intravenous bolus administration of 200 MBq FDOPA. The following steps were performed: CT-based tissue attenuation and scatter correction, reconstruction (OSEM, 16 iterations, six subsets) and framing (30 frames: 3 × 20 s, 3 × 1 min, 3 × 2 min, 3 × 3 min, 7 × 5 min, 1 × 6 min) of list-mode data, coregistration of a mean emission image and the individual T1 image, and spatial normalization of the T1 image using the unified segmentation approach (Ashburner and Friston, 2005). Normalization parameters were applied to all frames. Presynaptic dopamine synthesis capacity was quantified as FDOPA Ki (min−1) voxel by voxel. Ki was estimated using the frames from 20 to 60 min of the recording by applying Gjedde–Patlak linear graphical analysis (Patlak and Blasberg, 1985). Values from a standard cerebellum mask (excluding the vermis, WFU PickAtlas) were used as input function. Ki values were extracted after normalization to MNI space from four bilateral ROIs: from limbic (ventral), sensorimotor, and associative striatum (Martinez et al., 2003; Howes et al., 2012) and additionally from 2 mm spheres around the peak voxel of ventral striatal RPE-associated activation obtained in the fMRI analysis over all participants. Ki values were compared between high and low aberrant salience groups using a MANOVA and correlated with aberrant salience scores using Pearson correlations within SPSS19. Correlations were corrected for multiple comparisons via Bonferroni corrections.
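The graphical analysis can be illustrated with a minimal Python sketch of the reference-tissue form of the Gjedde–Patlak plot, in which Ki is the slope of the target-to-reference ratio plotted against the normalized integral of the reference curve. The function names, the use of frame mid-times as time points, and the trapezoidal integration starting at the first sample are assumptions; this is not the pipeline used in the study.

```python
def cumulative_trapezoid(times, values):
    """Running trapezoidal integral of values over times, starting at 0."""
    cum = [0.0]
    for i in range(1, len(times)):
        step = 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
        cum.append(cum[-1] + step)
    return cum


def patlak_ki(times_min, c_target, c_reference, t_start=20.0, t_end=60.0):
    """Slope of the reference-tissue Patlak plot over [t_start, t_end] minutes.

    x = integral of reference activity up to t, divided by reference activity at t
    y = target activity at t divided by reference activity at t
    """
    integral = cumulative_trapezoid(times_min, c_reference)
    xs, ys = [], []
    for t, ct, cr, area in zip(times_min, c_target, c_reference, integral):
        if t_start <= t <= t_end:
            xs.append(area / cr)
            ys.append(ct / cr)
    # Ordinary least-squares slope.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Because the time axis is in minutes, the returned slope has units of min−1, matching the Ki values reported here.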
Results
Behavior during the SAT
The implicit measure of aberrant salience (AbSal) had a mean of 3.1 ± 1.1 (range 0.7–6.55). Compared with previously published data, this sample displays a rather small overall range (values before transformation for comparability: 11.6 ± 7.8 ms; see Roiser et al., 2009, where the healthy sample was found around 16.5 ± 10.3 and 14.5 ± 14.9, Roiser et al., 2013: 24.6 ± 21.9 ms, and Roiser et al., 2010: 15.7 ± 8.5 ms). During the first block of the experiment, RTs for the subjective “good” cue decreased relative to the subjective “bad” cue (Table 2), indicating that AbSal attribution was acquired during the experiment and not based on a pre-existing bias.
Table 2. Development of RTs over the course of the first block
On the basis of the AbSal measure, two groups of 29 participants each were formed via a median split. The low AbSal group had a mean score of 2.27 ± 0.56 (range 0.7–3.06), and the high AbSal group had a mean score of 3.9 ± 0.77 (range 3.09–6.55; for more details see Table 3). These two groups did not differ in age (t(55) = 1.03, p = 0.32), gender [χ2(1, N = 58) = 0.29, p = 0.9], or verbal intelligence (t(54) <0.001, p > 0.99). There was no difference in the used measures of delusion-like experiences (PDI: t(56) = 0.86, p = 0.4, mean: low AbSal = 6.7 ± 5.7, high AbSal = 7.5 ± 5.6, overall range 0–17; SPQ: t(45) = 0.7, p = 0.48, mean: low AbSal = 18.35 ± 16, high AbSal = 15.5 ± 11.5, overall range 1–73) or in the measure of adaptive salience, i.e., reaction time differences between the cues, which actually predicted reward probabilities (t = −0.93, p = 0.4).
Table 3. Overview of modeling parameters and aberrant salience attribution for low and high aberrant salience groups
Mean AbSal values in the extreme groups were 2 ± 0.5 for the low AbSal group and 4.3 ± 0.7 for the high AbSal group. There was a significant difference on the PDI measure between the two extreme groups: t = 2.5, p = 0.018 (mean: low AbSal group = 4 ± 5, high AbSal group = 8.9 ± 4.9).
Behavior during the operant task
On the operant learning task, performance of the two groups did not differ significantly: mean percentage correct responses were 74.75 ± 11.5% for the low AbSal and 76.7 ± 7.3% for the high AbSal group (t(56) = 0.77, p = 0.44).
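As a worked check of the group comparison above, the t statistic can be recomputed from the reported summary statistics. The sketch below assumes a pooled-variance two-sample t test and 29 participants per group (from the median split); the function name is hypothetical.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic (pooled variance) from group means and SDs."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / se, df


# Low AbSal: 74.75 +/- 11.5 %, high AbSal: 76.7 +/- 7.3 %, n = 29 each.
t_value, df = pooled_t_from_summary(74.75, 11.5, 29, 76.7, 7.3, 29)
```

Under these assumptions the statistic comes out at approximately 0.77 with 56 degrees of freedom, in agreement with the reported value.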
With respect to the modeling analysis, a parameter × group MANOVA revealed a significant group difference only for reward learning rate α (mean α: low AbSal group = 0.68 ± 0.13, high AbSal group = 0.57 ± 0.19, F(1,57) = 7.62, p = 0.0078; an overview over model parameters is given in Table 3). After exclusion of the six participants whose behavior was not explained better than chance by the model, this effect remained significant (F(1,51) = 5.58, p = 0.02). The likelihood of the observed choice data was not related to AbSal scores.
The extreme group approach rendered the effect of the learning rate difference nonsignificant (t(40) = 1.2, p = 0.25).
Neural correlates of RPEs
RPEs covaried with BOLD response in a frontostriatal network including VS, cingulate cortex, parietal and temporal cortex, medial prefrontal cortex, orbitofrontal cortex (OFC), hippocampus, and cerebellum, which survived whole-brain FWE correction (FWE-WB; see Table 1).
The difference between the high compared with the low AbSal group in RPE-related activation extracted at the peak coordinate in the VS (right: [14 10 −10], Z = 5.94, PFWE-WB <0.001; left: [−10 12 −10], Z = 5.09, PFWE-WB = 0.021) was significant on the left [t(50) = 1.8, p = 0.04 (one-tailed)] and showed a trend toward significance on the right [t(50) = 1.54, p = 0.063 (one-tailed)], indicating an association of higher levels of AbSal with reduced RPE signaling in the VS. When correlated over both groups, AbSal and RPE activation showed a negative association in the right VS [r = −0.23, p = 0.045 (one-tailed), R2 = 0.05; Fig. 3] and also approached significance in the left VS [r = −0.2, p = 0.08 (one-tailed), R2 = 0.04].
Figure 3. Ventral striatal prediction error signals correlate negatively with aberrant salience scores. A, Prediction error-associated activation in the striatum (y = 10). The image is thresholded at p < 0.05 (FWE-WB) and the color bar indicates t values. B, Correlation of parameter estimates from the right ventral striatum [14 10 −10] with aberrant salience scores [r = −0.23, p < 0.05 (one-tailed), R2 = 0.05]. L, left; R, right.
In the extreme group approach, the group difference was significant in the right VS [t(40) = 2.044, p = 0.049 (two-tailed)] but not in the left VS [t(40) = 1.4, p = 0.17 (two-tailed)].
In addition to our a priori hypothesis with respect to the VS, we performed a voxelwise analysis (restricted to regions showing a main effect of RPE, as above), which revealed a significant group difference in the left OFC (Z = 4.52, [−30 46 −14], PSVC main effect = 0.003; Fig. 4A): while participants in the low AbSal group encoded RPEs in this region, this signal was reduced in the high AbSal group (Fig. 4B).
Figure 4. Low and high AbSal groups differ in prediction error-associated activation in the left orbitofrontal cortex. A, Contrast between low and high aberrant salience groups, displayed at p <0.001 (uncorrected). B, Mean parameter estimates for the groups of low and high aberrant salience at [−30 46 −14]. Error bars indicate SEM and the color bar indicates t values. L, left; R, right.
This finding was robust when the two extreme AbSal groups were compared (Z = 3.99, PSVC main effect = 0.023).
Presynaptic dopamine levels and aberrant salience measures
To relate presynaptic dopamine levels (Ki) to aberrant salience attribution, a subsample of 27 subjects also underwent FDOPA PET (low AbSal: n = 14, mean AbSal score = 2.2 ± 0.38, high AbSal n = 13, mean AbSal score = 3.9 ± 1.05). These AbSal subgroups did not differ in any of the demographic parameters (t values <1.1, p values >0.26). There was no significant difference in Ki levels between the AbSal groups in the limbic, associative, or sensorimotor striatum (t values <1.5, p values >0.16). There was no correlation between Ki values in ROIs of striatal subregions (limbic, associative, and sensorimotor) and the implicit aberrant salience score (r values <0.25, p values >0.2). In an exploratory analysis, Ki values were extracted from a sphere around the peak voxels of RPE signals in the VS. These were positively correlated with aberrant salience measures in the right VS (r = 0.54, p = 0.004, R2 = 0.29; Fig. 5), while no association was found in the left VS (r = 0.19, p = 0.35, R2 = 0.036). No significant correlation of RPE BOLD signal with Ki was observed in this ROI (p = 0.89).
Figure 5. PET Ki values (indicated by the color bar) quantifying presynaptic dopamine levels correlate positively with aberrant salience scores. A, PET Ki at y = 10, mean over all participants. B, Correlation of aberrant salience scores with Ki values extracted from peak coordinate of the prediction error-related activation in the right ventral striatum ([14 10 −10], indicated in blue in A; r = 0.54, p < 0.01, R2 = 0.29). L, left; R, right.
Explicit aberrant salience
Following the suggestion of one of our reviewers we also explored associations with the explicit measure of aberrant salience (mean = 2.2 ± 1.4, range = 0–5.5). This measure was not correlated with the implicit aberrant salience measure (r = −0.18, p = 0.18) and there were no relationships of this measure with fMRI (r values <0.09, p values >0.5) or PET-derived values (r values <0.15, p values >0.43).
Discussion
Here we present first evidence for an association of aberrant salience attribution derived from the SAT (Roiser et al., 2009) with reinforcement learning signals and neurochemical measures of presynaptic dopamine in a sample of healthy controls: (1) participants with higher levels of aberrant salience attribution displayed reduced coding of RPEs in VS and OFC during instrumental learning and (2) aberrant salience attribution correlated positively with ventral striatal presynaptic dopamine levels measured in the area activated by RPEs. These results provide initial evidence that interindividual differences in a measure of aberrant salience attribution relate to behavioral and neural correlates of reinforcement learning and measures of dopaminergic neurotransmission.
Activation of the VS in response to RPEs has been repeatedly demonstrated (O'Doherty et al., 2003; D'Ardenne et al., 2008); these learning signals are important for trial-by-trial updating of reward expectations and underlie behavioral adaptation (Kehagia et al., 2010; Boureau and Dayan, 2011; Deserno et al., 2013; Ullsperger et al., 2014). Here we show that coding of such adaptive learning signals is relatively lower in subjects displaying higher aberrant salience attribution.
The finding of a negative correlation between implicit aberrant salience attribution and VS RPEs, here in healthy controls, is in line with studies showing that people with psychotic experiences show reduced RPE signals (Corlett et al., 2007; Schlagenhauf et al., 2009; Romaniuk et al., 2010; Corlett and Fletcher, 2012). Note, though, that we only obtained an association with one self-report scale of delusion-like experiences in the extreme group approach. This does not necessarily contradict the continuum model of psychosis, because aberrant salience measures of our healthy sample covered the lower range of possible aberrant salience values, which might have been insufficient to detect an association with psychopathology (Roiser et al., 2009, 2010, 2013). The RT measure might be too fine-grained to correlate with the coarser self-assessment scales. It is also conceivable that a number of factors have to interact to lead to psychosis-like experiences; aberrant salience attribution might be only one of them.
Alternatively, high aberrant salience participants might engage in a different strategy to solve the task, which might lead to reduced fMRI signals (Schönberg et al., 2007). We cannot make any claims about possible group differences here, because we did not compare behavioral strategies. Instead we used one computational model for all participants to estimate meaningful regressors. Both subgroups were well explained by the learning model and did not differ regarding their reward or punishment sensitivity. However, the finding of a lower learning rate in the group with higher aberrant salience measures indicates slower updating after a reward. Decreased reward learning rates might lead to slower updating of the conditioned response after an aberrant relationship between expectations and irrelevant cues was established initially. To some extent our findings challenge the notion that aberrant salience and flexible reward-based learning represent independent constructs. If the aberrant salience measure constitutes a different conceptualization of the same mechanism evaluated by RPEs, aberrant salience would simply reflect altered RPE-based learning. In this case, correlating these measures would be redundant, but the use of reinforcement learning paradigms for investigations of the aberrant salience hypothesis in psychotic patients would be further supported. On a biological level, there is also evidence for independent dopaminergic processing pathways of RPEs and salience: studies in monkeys suggest that different subpopulations of dopaminergic neurons encode RPEs and salience (Matsumoto and Hikosaka, 2009; Matsumoto and Takada, 2013). For future studies, pharmacological challenges of the dopamine system represent a powerful tool to test conceptual overlap as well as independence of both constructs on a psychological and biological level. A first study recently approached this question in Parkinson's disease (Nagy et al., 2012).
Furthermore, we found first evidence that aberrant salience attribution was positively related to presynaptic dopamine levels in the same region of the VS that was activated by RPEs. Increased striatal dopamine levels were proposed to be involved in stochastic assignment of salience to irrelevant stimuli (Heinz, 2002; Kapur, 2003; Poletti et al., 2014). Elevated levels of striatal presynaptic dopamine synthesis capacity are a well-established finding in schizophrenia patients (Howes et al., 2012), and have been observed in individuals in an at-risk mental state for psychosis (Egerton et al., 2013). Animal models and human PET imaging suggest an overactivation of the dopaminergic system as a possible mechanism: because of altered regulatory processes the spontaneous activity of dopaminergic neurons might be increased (Abi-Dargham et al., 2000; Lodge and Grace, 2007; Goto and Grace, 2008; Mizrahi et al., 2012). A general upregulation of tonic activity, leading to heightened phasic responses, could in turn render all stimulus-driven inputs important, contributing to aberrant salience attribution to irrelevant stimuli (Grace, 2012). In a similar vein, heightened aberrant salience attribution might be associated with decreased VS RPE BOLD signals because stochastic VS activity weakens the covariation between RPE size and BOLD signal. Our data complement these lines of research and suggest that aberrant salience attribution is directly correlated with elevated dopamine synthesis capacity and reduced encoding of RPEs.
In the whole-brain analysis, we found decreased OFC activation in subjects with higher values of aberrant salience attribution. The OFC is associated with reward processing in animals (Roesch and Olson, 2004; Padoa-Schioppa and Assad, 2006) and has repeatedly been implicated in the reward circuit in humans (Kringelbach and Rolls, 2004). Functional and anatomical abnormalities of this region are found in major psychiatric disorders (Jackowski et al., 2012), especially in obsessive-compulsive disorder (Greenberg et al., 2000; Menzies et al., 2008), but also in schizophrenia (Meador-Woodruff et al., 1997; Malchow et al., 2015). Accumulating evidence indicates that the OFC contributes to updating stimulus-outcome associations by providing information about expectations to downstream areas (Stalnaker et al., 2007; Schoenbaum et al., 2009; Takahashi et al., 2009), including the VS (Haber et al., 1995). Interestingly, animal studies found that intact OFC function is needed to learn from unexpected outcomes, i.e., from RPEs (Takahashi et al., 2009), and lesions in the OFC led to alterations of activity in dopaminergic neurons (Takahashi et al., 2011) and changes in striatal dopamine levels (Clarke et al., 2014), while stimulation of the VS influenced OFC activity and possibly connectivity (Ewing and Grace, 2013). Furthermore, it has been hypothesized that the OFC contributes information about current external states and is therefore crucially involved in the model-based aspect of learning (Takahashi et al., 2011). These findings suggest that the reduced OFC activation in participants with a high level of aberrant salience attribution might be related to the reduced RPE coding in the VS.
One limitation is that our correlational approach does not allow any claims about causality. We focused here on the relationship between task-relevant, adaptive RPE signals during reinforcement learning and aberrant salience attribution measures. We did not measure brain activation related to aberrant salience attribution during the SAT. Roiser et al. (2010) found VS activation in response to adaptive salience during the SAT, i.e., the contrast of high-probability versus low-probability cues, in healthy participants, but no VS activation elicited by aberrant salience attribution to neutral cues. In another study, explicit aberrant salience ratings correlated positively with VS activation to irrelevant cue features across individuals (Roiser et al., 2013). Thus, the relation between task-relevant neural learning signals such as VS RPE (Roiser et al., 2013) and neural correlates of task-irrelevant, aberrant signals remains to be established. Studies in prodromal subjects found increased dopamine levels in the associative, dorsal striatum (Howes et al., 2009; Fusar-Poli et al., 2010). We did not observe any differences in this striatal subdivision in relation to aberrant salience in this sample of healthy participants. However, an exploratory analysis revealed an association between aberrant salience attribution and presynaptic dopamine levels using values derived from the peak of the RPE signal, which was located in the VS. This finding suggests an interdependence of dopamine levels and salience attribution, given the assumption that RPEs and salience are associated concepts, but it has to be replicated in an independent sample. Whether this result is related to the finding of increased dopamine levels in prodromal and schizophrenic subjects remains to be elucidated.
In conclusion, our findings are in line with predictions of the aberrant salience account of psychosis (Heinz, 2002; Kapur, 2003; Heinz and Schlagenhauf, 2010; Winton-Brown et al., 2014): we provide evidence that the attribution of salience to irrelevant cues is associated with reduced encoding of RPEs in the VS and OFC during reinforcement learning and with increased presynaptic dopamine levels in the VS region associated with RPE encoding. Although observed here in healthy subjects, the same relationship may contribute to the development of positive symptoms in the psychotic disease spectrum.
Footnotes
This work was supported by grants from Deutsche Forschungsgemeinschaft (SCHL1969/1-1, GRK-1123). We thank Yu Fukuda, Saineb Alaa-Eddine, Sarah Diner, and Jakob Kaminski for assistance during data acquisition.
The authors declare no competing financial interests.
Correspondence should be addressed to Rebecca Boehme, Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin, Campus Mitte, Charitéplatz 1, 10117 Berlin, Germany. rebecca.boehme{at}charite.de