Journal of Neuroscience

Articles, Behavioral/Cognitive

Neural Correlates of the Divergence of Instrumental Probability Distributions

Mimi Liljeholm, Shuo Wang, June Zhang and John P. O'Doherty
Journal of Neuroscience 24 July 2013, 33 (30) 12519-12527; https://doi.org/10.1523/JNEUROSCI.1353-13.2013
Author affiliations: Division of the Humanities and Social Sciences and Computation and Neural Systems Program, California Institute of Technology, Pasadena, California 91125

Abstract

Flexible action selection requires knowledge about how alternative actions impact the environment: a “cognitive map” of instrumental contingencies. Reinforcement learning theories formalize this map as a set of stochastic relationships between actions and states, such that for any given action considered in a current state, a probability distribution is specified over possible outcome states. Here, we show that activity in the human inferior parietal lobule correlates with the divergence of such outcome distributions–a measure that reflects whether discrimination between alternative actions increases the controllability of the future–and, further, that this effect is dissociable from those of other information theoretic and motivational variables, such as outcome entropy, action values, and outcome utilities. Our results suggest that, although ultimately combined with reward estimates to generate action values, outcome probability distributions associated with alternative actions may be contrasted independently of valence computations, to narrow the scope of the action selection problem.

Introduction

Theories of goal-directed behavior originated with a seminal series of early demonstrations that animals are able to learn about the structure of their environment in the absence of primary rewards (Blodgett, 1929; Tolman and Honzik, 1930). Specifically, in stark contrast to the then-dominant view of behavior as controlled exclusively by the incremental modulation of stimulus-response (S-R) associations based on contingent reward or punishment (Thorndike, 1933), these studies suggested that, when given the opportunity to explore a maze, nonrewarded rats constructed a valence-neutral "cognitive map" of instrumental and environmental relationships that could be flexibly integrated with subsequent reward information to generate an optimal course of action (Tolman, 1948).

Contemporary accounts of behavioral control characterize instrumental performance as being governed both by the reinforcement-based, S-R, component and by a more cognitive, goal-directed, system (Balleine and Dickinson, 1998). These separate strategies have been formalized as distinct classes of reinforcement learning (RL): An automatic “model-free” system, in which the values of actions are acquired by means of a reward prediction error (RPE), and a “model-based” class that constructs a mental map of the environment and generates decisions by flexibly combining estimates of state-transition probabilities with outcome utilities (Doya et al., 2002; Daw et al., 2005). Thus, in model-based RL, relationships between actions and future states of the world are represented explicitly and independently of associated motivational features.

Notably, given equivalent costs, actions that yield identical outcome states need not be contrasted further in terms of motivational features, reducing the demand for a computationally costly binding of outcome probabilities with utilities. Consequently, the extent to which actions differ in terms of their relationships to future states, that is, the divergence of their outcome probability distributions, can be used to prune searches of the mental map. Instrumental divergence also serves as a measure of agency–the more actions differ with respect to contingent states, the more flexible control an agent has over its environment. Because of these important characteristics, we hypothesized that a neural signature of instrumental divergence, dissociable from that of motivational variables, would be discernible during human goal-directed performance.

We scanned human participants with functional magnetic resonance imaging (fMRI) as they performed a simple task in which available actions yielded various food rewards with different probabilities (Fig. 1A). Our primary objective was to assess neural correlates of the difference between outcome distributions associated with alternative actions, formalized as Jensen–Shannon (JS) divergence–a measure that quantifies the distance between probability distributions. The relationship between JS divergence and other decision variables is illustrated in Figure 1B. First, in this example, uncertainty about which outcome will be obtained (i.e., outcome entropy) is the same across actions; for each action, one can be almost certain that a particular food reward will occur while the alternative food and the “no food” outcome will not. Likewise, provided that one enjoys oranges as much as Twix bars, the actions are equivalent in terms of their expected value. And yet, the two actions clearly differ with respect to contingent states; this difference is captured by JS divergence.

Figure 1.

Illustration of the task and the concept of outcome divergence. A, Illustration of trial in the choice phase: at the onset of a trial, two of four alternative actions are highlighted in white, indicating their availability, together with depictions of potential trial outcomes. After participants choose, the chosen action is highlighted in green for 2 s, followed by either a picture of the obtained food outcome (on rewarded trials) or a white line in the center of the screen (nonrewarded trials). Trials are separated by a jittered 6 s intertrial interval. B, An illustration of the relationship between actions, outcomes, and associated probabilities. The graph shows two available actions, A1 and A2 (coded in blue and red, respectively), where the bars represent the probability distribution of each action across a set of three potential outcomes: a Twix bar, an orange, and a “no-food” outcome state, indicated by a gray bar. JS divergence is a measure of the distance between the two distributions.
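The scenario in Figure 1B can be made concrete with a short numerical sketch. The specific probabilities and utilities below are illustrative assumptions, not the study's actual values; the point is that two actions can match on outcome entropy and expected value while still differing sharply in their contingent states:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def js_divergence(m, n):
    """Jensen-Shannon divergence (bits) between two discrete distributions."""
    m, n = np.asarray(m, dtype=float), np.asarray(n, dtype=float)
    mid = 0.5 * (m + n)  # mixture distribution
    def kl(p, q):
        nz = p > 0
        return np.sum(p[nz] * np.log2(p[nz] / q[nz]))
    return 0.5 * kl(m, mid) + 0.5 * kl(n, mid)

# Hypothetical outcome distributions over [Twix, orange, no food]
a1 = [0.9, 0.05, 0.05]       # action 1 almost always yields the Twix
a2 = [0.05, 0.9, 0.05]       # action 2 almost always yields the orange
utilities = [1.0, 1.0, 0.0]  # oranges valued as much as Twix bars

# Same outcome entropy and same expected value...
assert np.isclose(entropy(a1), entropy(a2))
assert np.isclose(np.dot(a1, utilities), np.dot(a2, utilities))
# ...yet JS divergence between the two outcome distributions is far from zero
print(round(js_divergence(a1, a2), 3))
```

With these example numbers the divergence comes out near 0.67 bits, even though neither entropy nor expected value distinguishes the two actions.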

Materials and Methods

Participants.

Twenty-two healthy normal volunteers (mean age = 23.0 ± 4.3 years; range = 19–38; 10 females) participated in the study. The volunteers were pre-assessed to exclude those with a history of neurological or psychiatric illness. The eating attitudes test (EAT-26) (Garner et al., 1982) was administered and indicated no eating disorders in any of the subjects (mean score, 3.6 ± 2.8; range, 0–13; all scores were under the 20-point cutoff). Before being scheduled for the experiment, the subjects were prescreened to ensure that they enjoyed sweet and salty treats, that they had no allergies or intolerances, and that they were not overweight, on a diet, or planning to go on a diet. Subjects were asked to fast for at least 4 h before their scheduled arrival time at the laboratory, but were permitted to drink water. All subjects gave informed consent and the study was approved by the Institutional Review Board of the California Institute of Technology.

Task and procedure.

A simple instrumental task was used, in which four action alternatives (i.e., button presses) yielded various food rewards with different probabilities. Specifically, on any given trial, two available actions could differ with respect to the probability with which they produced their respective rewards, with respect to the subjective utility of those rewards, and/or with respect to the integrated action value. Thus, by manipulating both probabilities and utilities, we were able to largely decorrelate the different components of the decision problem. To ensure sufficient variance in experienced probabilities and utilities, each subject participated in three consecutive sessions during a single visit to the lab, with the same four actions but with a novel set of food outcomes and outcome probabilities in each session. Throughout the task, available actions were indicated by corresponding rectangles on the computer screen, together with images of the food outcomes potentially produced by those actions (Fig. 1A). At the start of the experiment, participants were informed that they would have to remain in the laboratory for 30 min after completing the task, during which they would be allowed to consume any earned treats.

The probabilities with which actions produced their outcomes were generated so as to minimize correlations between our three decision variables (i.e., between outcome probabilities, outcome values, and action values), and thus varied slightly across subjects. Nonetheless, across sessions and action alternatives, probabilities of 0, 0.2, 0.5, 0.7, and 1.0 were used for each subject. In addition, each subject had two probabilities chosen from the set [0.3, 0.6, 0.8, 0.9]. The probabilities drawn from this set differed depending on the decorrelation constraints imposed by subjective outcome utilities. A minimum of two and maximum of four distinct probabilities were used in each session.

The subjective values of 36 potential food rewards (listed in Table 1), represented by photographic images, were assessed using evaluative ratings of their pleasantness (on a scale from 0 to 9), as well as a Becker–DeGroot–Marschak (BDM) auction procedure that has been shown to elicit an individual's willingness to pay for a consumer good (Becker et al., 1964). The pleasantness ratings were used to set the utility of food stimuli in subsequent phases of the experiment. The BDM auction was used to obtain convergent evidence for these ratings, providing a measure of inter-rater reliability. Specifically, in the BDM auction, participants were endowed with $5 with which to bid on the various food items. They were instructed that, on each trial, they would have to indicate an amount of money from $0 to $5 that they were willing to pay for the food item displayed on that trial and that, at the end of the experiment, the computer would randomly select one trial from all presented in that phase, as well as randomly draw an amount between $0 and $5. Participants were further told that if the bid that they had indicated on the randomly drawn trial was less than the amount generated by the computer, they would not receive the food item, but would get to keep the $5, and that otherwise they would have to pay the amount generated by the computer and would get to consume the food item.
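The BDM payoff rule described above can be sketched as follows. This is a minimal illustration of the resolution of a single auction trial as the text describes it (the function name and return structure are hypothetical, not from the study's code):

```python
import random

def bdm_outcome(bid, endowment=5.0, rng=random):
    """Resolve one BDM auction trial: a random price between $0 and the
    endowment is drawn; if the bid is below the price, the bidder keeps
    the endowment and gets nothing; otherwise the bidder pays the drawn
    price (not the bid) and receives the food item."""
    price = rng.uniform(0.0, endowment)
    if bid < price:
        return {"wins_item": False, "pays": 0.0}
    return {"wins_item": True, "pays": price}
```

Because the price paid is the computer's random draw rather than the bid itself, the bidder's optimal strategy is to bid exactly their true valuation, which is what makes the mechanism incentive compatible.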

Table 1.

List of the 36 food treats used in BDM auction

In each session, participants first went through a “contingency learning” phase, in which each action was sampled 10 times with associated food rewards occurring with respective probabilities. Only one action was available on each trial in this phase, to ensure complete sampling, and outcome occurrences were generated using predetermined sequences such that if, for example, the probability of an outcome was 0.2, that outcome was presented on exactly 2 of the 10 trials. Furthermore, participants were instructed that they would not actually earn any of the food rewards produced by the actions in this phase, but that it was simply meant to expose them to action–outcome relationships. They then proceeded to a “choice phase” in which, on each of 48 trials, they chose between two of the four action alternatives (Fig. 1A). They were instructed that, at the end of the experiment, three trials would be drawn from this phase, and that they would be allowed to consume any rewards earned on those trials upon completion of the task. Following the choice phase, participants provided judgments of the existence and strength of each action–outcome relationship. Active scanning was only performed during the choice phases.

Computational learning model.

We implemented a model-based RL learner, which uses experience with state transitions to update a matrix, T(s,a,s′), of state-transition probabilities, where each element of T(s,a,s′) holds the current estimate of the probability of transitioning from state s to s′ given action a. In our task, as illustrated in Figure 1A, on each trial participants were presented with a choice screen displaying two available actions together with the food outcomes potentially produced by those two actions. Thus, each initial state was defined by the particular available actions and their potential outcomes. The two available actions were drawn from a total of four; consequently, in the choice phase of each session, there were six distinct initial states, repeatedly encountered across 48 trials. The state-transition estimates were initialized to the preprogrammed distributions from the contingency learning phase. At each step, leaving state s and arriving in state s′ having taken action a, the FORWARD learner computes a state prediction error (SPE), δSPE = 1 − T(s,a,s′), and updates the probability of the observed transition via T(s,a,s′) ← T(s,a,s′) + η·δSPE, where η is a free parameter controlling the learning rate. Estimated transition probabilities are used together with the rewards at the end states, r(s′) (the magnitudes of which were based on pleasantness ratings and taken as given, since potential rewards were displayed together with the actions), to compute state-action values, Q(s,a), as the expectation over the value of the successor state. This is done by defining the state-action values at each level in terms of reward anticipated at the next level: Q(s,a) = Σs′ T(s,a,s′)·r(s′).
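The SPE update and the Q-value computation can be sketched in a few lines. Note that the renormalization of the competing transitions after the update is one common convention for keeping each row a valid probability distribution; the text does not spell out this detail, so treat it as an assumption:

```python
import numpy as np

def spe_update(T, s, a, s_next, eta):
    """One FORWARD-learner step: delta_SPE = 1 - T(s,a,s'), then the
    observed transition's probability is nudged upward by eta * delta.
    The row renormalization is an assumed convention (see lead-in)."""
    delta = 1.0 - T[s, a, s_next]       # state prediction error
    T[s, a, s_next] += eta * delta
    T[s, a] /= T[s, a].sum()            # keep the row a distribution
    return delta

def q_values(T, r):
    """Q(s,a) = sum over s' of T(s,a,s') * r(s'): the expected reward
    of the successor state under the current transition estimates."""
    return T @ r
```

For example, with a uniform 3-outcome prior (each probability 1/3), observing a transition yields an SPE of 2/3, and the observed transition's probability rises while the alternatives shrink.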

The model additionally assumes that participants select actions stochastically using probabilities generated by a softmax distribution, such that P(s,a) = exp(τ·Q(s,a)) / Σb=1..n exp(τ·Q(s,b)), where the free "inverse temperature" parameter τ controls the degree to which choices are biased toward the highest-valued action. To account for the difference in salience between food pictures and the "no outcome" display (which simply consisted of a white line), we used separate learning rate parameters, η and η′, on rewarded and nonrewarded trials, respectively. Free parameters were fit to behavioral data by minimizing the negative log-likelihood, −Σ log(P(s,a)), of obtained choices for each individual.
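The softmax rule and the objective minimized during fitting can be written compactly. This is a sketch of the two formulas only; the study's actual optimizer and the joint fitting of η and η′ are not shown:

```python
import numpy as np

def softmax_choice_probs(q, tau):
    """P(a) proportional to exp(tau * Q(a)); tau is the inverse temperature."""
    z = tau * np.asarray(q, dtype=float)
    z -= z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def neg_log_likelihood(tau, q_per_trial, choices):
    """-sum(log P(chosen action)) over trials: the quantity minimized
    when fitting the free parameters to an individual's choices."""
    return -sum(np.log(softmax_choice_probs(q, tau)[c])
                for q, c in zip(q_per_trial, choices))
```

In practice this objective could be handed to a generic scalar or multivariate minimizer (e.g., scipy.optimize); larger τ concentrates choice probability on the highest-valued action.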

Information theoretic variables.

We computed the JS divergence of the outcome probability distributions for the two actions available on a given trial. A finite and symmetrized version of the Kullback–Leibler (KL) divergence, JS divergence specifies the distance between probability distributions M and N as follows: JS(M, N) = ½·KL(M ∥ A) + ½·KL(N ∥ A), where A = ½(M + N) is the equal-weight mixture of the two distributions. It is worth noting that, while we have used JS divergence here to quantify the degree to which alternative actions differ with respect to contingent transitions in environmental states, it is not the only computational variable that captures this conceptual point. For example, mutual information (between actions and outcomes) is a highly related information theoretic measure that would be identical to JS divergence for our current purposes, as would be the χ² divergence. A particularly compelling aspect of JS divergence is its remarkable generality: it applies to nominal and numerical, discrete and continuous random variables, and it generalizes intuitively to an arbitrary number of probability distributions. The applicability to multiple distributions (Lin, 1991) is especially important as it eliminates the need for a complex, and presumably computationally costly, process of comparing multiple available actions in a pairwise fashion.
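The multi-distribution generalization cited above (Lin, 1991) has an especially simple form: the entropy of the weighted mixture minus the weighted average of the component entropies. A sketch:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def js_divergence_n(dists, weights=None):
    """Generalized JS divergence (Lin, 1991) for any number of discrete
    distributions: H(mixture) - weighted mean of component entropies.
    With two equal-weight distributions this reduces to the pairwise
    JS divergence."""
    dists = np.asarray(dists, dtype=float)
    if weights is None:
        weights = np.full(len(dists), 1.0 / len(dists))
    weights = np.asarray(weights, dtype=float)
    mixture = weights @ dists
    return entropy(mixture) - np.dot(weights, [entropy(d) for d in dists])
```

This form makes the pairwise-comparison point concrete: n available actions require only one mixture computation rather than n(n−1)/2 pairwise distances.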

Another important information theoretic variable that can be extracted from the state-transition matrix, and that has been previously shown to profoundly affect decision making (Paulus et al., 2002; Feinstein et al., 2006), is the uncertainty, or entropy, of outcome states. Whereas JS divergence reflects the distance between outcome probability distributions, entropy reflects the degree of uncertainty in an outcome, which is greatest when the probability distribution over outcomes is uniform (i.e., all outcomes are equally likely) and smallest when the probability of a particular outcome is 1 or 0. We computed the Shannon entropy of the outcome variable X conditional on the action variable Y, defined as H(X|Y) = Σx∈X, y∈Y p(x,y)·log[p(y)/p(x,y)], both for the case where the (chosen) action is known (p(y) = [1,0]) and for the case where the two available actions are equally likely (p(y) = [0.5,0.5]). To illustrate the relationship between outcome probabilities and information theoretic variables, a representative set of probability distributions, with corresponding levels of JS divergence and entropy, is shown in Figure 2.
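The conditional entropy above can equivalently be written as H(X|Y) = Σy p(y)·H(X|Y=y), which makes the two cases in the text transparent: with p(y) = [1,0] it is simply the entropy of the chosen action's outcome distribution, and with p(y) = [0.5,0.5] it is the average entropy over the two available actions. A sketch of this form:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of one discrete distribution."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def conditional_entropy(outcome_dists, p_action):
    """H(X|Y) = sum over y of p(y) * H(X|Y=y), equivalent to the joint
    form sum p(x,y) * log(p(y)/p(x,y)). Rows of outcome_dists are the
    outcome distributions p(x|y) for each available action."""
    return sum(p_y * entropy_bits(d)
               for p_y, d in zip(p_action, outcome_dists))
```

For example, if the chosen action yields two outcomes with probability 0.5 each, the chosen-action entropy is exactly 1 bit.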

Figure 2.

A representative set of cases used in the experiment. Each case consists of two probability distributions, one for each available action, across three potential outcome states (O1, O2, and O3) including the no-food outcome. A1 and A2 indicate the two actions available on a given trial, drawn from a total of four possible action alternatives. X indicates the levels of JS divergence (blue) and entropy (red) for each case.

Imaging procedure and analysis.

A 3 T scanner (MAGNETOM Trio; Siemens) was used to acquire structural T1-weighted images and T2*-weighted echoplanar images (repetition time = 2.65 s; echo time = 30 ms; flip angle = 90°; 45 transverse slices; matrix = 64 × 64; field of view = 192 mm; thickness = 3 mm; slice gap = 0 mm) with blood oxygenation level-dependent (BOLD) contrast. To recover signal loss from dropout in the medial orbitofrontal cortex (O'Doherty et al., 2002), each horizontal section was acquired at 30° to the anterior commissure–posterior commissure axis. Image processing and statistical analyses were performed using SPM8 (http://www.fil.ion.ucl.ac.uk/spm). The first four volumes of images were discarded to avoid T1 equilibrium effects. All remaining volumes were corrected for differences in the time of slice acquisition, realigned to the first volume, spatially normalized to the Montreal Neurological Institute (MNI) echoplanar imaging template, and spatially smoothed with a Gaussian kernel (8 mm, full-width at half-maximum). We used a high-pass filter with a cutoff of 128 s.

For each subject, we constructed an fMRI design matrix, merged across the three sessions, with two regressors modeling the distinct time periods of each trial. The first, a choice-period regressor, modeled a BOLD response from the onset of each trial until the chosen action was performed; a second stick function modeled the onset of the feedback screen. For the choice-period regressor, we entered as parametric modulators, in order, the expected value of the chosen action, the sum of and the absolute difference between the expected values of the two available actions, the utility of the outcome potentially produced by the chosen action, and the sum and absolute difference in utility of the two potential outcomes depicted on the screen. Absolute differences, rather than signed differences between chosen and unchosen values, were used to minimize regressor redundancies. Finally, for this trial period, we entered as modulators the entropy conditional on the chosen action, the entropy conditional on both available actions, and the JS divergence of the outcome probability distributions of the available actions. For the outcome regressor, we entered as modulators, in order, the RPE, the SPE, and the utility of the received outcome. Orthogonalization was applied according to order, such that each parametric modulator was orthogonalized to all preceding modulators associated with the same onset regressor. To rule out motor-execution components, the response time on each trial was added as a regressor of no interest, as were two regressors indicating the three sessions and six regressors accounting for the residual effects of head motion. All regressors were convolved with a canonical hemodynamic response function. Group-level random-effects statistics were generated by entering contrasts of parameter estimates for the different modulators into a between-subjects analysis.
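The order-dependent orthogonalization described above can be sketched as a serial residualization: each modulator column is regressed on all preceding columns and replaced by its residual. This is an illustration of the scheme, not SPM's actual implementation:

```python
import numpy as np

def serial_orthogonalize(modulators):
    """Serially orthogonalize parametric modulators: column j is
    residualized against columns 0..j-1, so later modulators only
    retain variance not explained by earlier ones (a sketch of the
    order-dependent behavior described in the text)."""
    X = np.asarray(modulators, dtype=float).copy()
    for j in range(1, X.shape[1]):
        prev = X[:, :j]
        beta, *_ = np.linalg.lstsq(prev, X[:, j], rcond=None)
        X[:, j] -= prev @ beta
    return X
```

A consequence worth noting, and the reason the authors re-ran the JS divergence analysis with no orthogonalization, is that the first column is untouched while every later column is stripped of shared variance.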

We specifically looked for neural effects in areas previously shown to be involved in the implementation of our modeled decision variables. First, in a recent neuroimaging study, Gläscher et al. (2010) assessed neural correlates of SPEs, finding effects in the lateral prefrontal cortex (LPFC) and intraparietal sulcus (IPS). We predicted that activity in these areas would likewise correlate with SPEs in the current study. We also predicted that the dorsolateral prefrontal cortex (DLPFC) would encode the entropy of outcome probability distributions: activity in this area has been shown to scale with the amount of uncertainty associated with a decision, to predict risk aversion (Weber and Huettel, 2008) and, when disrupted by transcranial magnetic stimulation, to increase selection of risky options (Knoch et al., 2006). With respect to our motivational variables, studies of goal-directed performance, which emphasize the causal relationships between actions and outcomes (Tanaka et al., 2008; Liljeholm et al., 2011) or higher-order relational structures (Hampton et al., 2006), have implicated the ventromedial prefrontal cortex (VMPFC) in the encoding of action values. In contrast, activity in the medial orbitofrontal cortex (mOFC), the insula, and ventral striatum (VS) has been shown to correlate with the utility, or value, of a stimulus (Hare et al., 2008; Abler et al., 2009; Schmidt et al., 2012). The anterior insula has also been implicated in the affective evaluation of food pictures (Pelchat et al., 2004; Wang et al., 2004) and in the anticipation and experience of appetitive tastes (O'Doherty et al., 2001, 2002). Finally, the VS has been shown, by myriad studies, to encode an RPE (O'Doherty et al., 2003, 2004).

Small volume corrections (SVCs) were performed on a priori regions of interest (ROIs), using a 10 mm sphere with center coordinates obtained by averaging across relevant studies (coordinates are listed in Table 2). All other effects are reported at p < 0.05, using cluster size thresholding (CST) to adjust for multiple comparisons (Forman et al., 1995). AlphaSim, a Monte Carlo simulation tool in the AFNI package, was used to determine cluster size and significance: with an individual-voxel probability threshold of p = 0.001, a minimum cluster size of 134 MNI-transformed voxels yielded an overall significance of p < 0.05. To eliminate nonindependence bias in plots of parameter estimates, a leave-one-subject-out (LOSO) approach (Esterman et al., 2010) was used, in which 22 general linear models (GLMs) were run with one subject left out of each, and with each GLM defining the voxel cluster for the left-out subject. Using rfxplot (Gläscher, 2009), mean β-weights were extracted from 10 mm spheres centered on the LOSO peaks (identified within ROIs for SVCs) and were averaged across subjects to plot overall effect sizes.

Table 2.

Center coordinates for SVC, averaged across local maxima and studies

Results

Behavioral data and model fits

Pleasantness ratings were fairly evenly distributed across the scale (Fig. 3A), and were highly correlated (r² = 0.99) with bids in the BDM auction (Fig. 3B). Moreover, participants' judgments of action–outcome relationships, collected at the end of each session, were close to the programmed contingencies (r² = 0.98; Fig. 3C). Finally, a comparison of the model-derived choice probabilities with participants' actual choices suggested that the model captured behavior well (r² = 0.98; Fig. 3D).

Figure 3.

Behavioral results. A, Frequencies of pleasantness ratings for the 36 food stimuli across participants. B, Scatter plot showing the mean bid in the BDM auction, across participants and food stimuli, as a function of pleasantness ratings. C, Scatter plot showing the mean rated action–outcome probability, across participants, as a function of programmed probabilities (binned). D, Scatter plot showing participants' mean choices as a function of the model-generated choice probabilities (binned). Error bars indicate SEM.

Neuroimaging results

All results described below are corrected for multiple comparisons at p < 0.05 using either CST across the whole brain, or SVC based on coordinates averaged across previous studies reporting effects of relevant decision variables (see Materials and Methods for details of multiple-comparison correction strategy). Coordinates and cluster sizes for all the activated areas described below are reported in Table 3.

Table 3.

Coordinates and significance values for imaging contrasts

State-transition variables

An exploratory test for areas in which activity during the choice period (i.e., from trial onset until a response was performed) correlated with the distance between the outcome probability distributions of the two available actions (i.e., with JS divergence) yielded effects in the right anterior supramarginal gyrus of the inferior parietal lobule (IPL; Fig. 4A), as well as in the supplementary motor area (SMA) extending into the right precentral gyrus, surviving CST correction. Critically, these effects also emerged when no orthogonalization was applied, ruling out the possibility that activity in these areas was selectively modulated by the orthogonalized component. Neural activity correlating with the entropy of the chosen action during the same period emerged in the right DLPFC (SVC; Fig. 4B). At the time of outcome delivery, activity in both the LPFC and IPS was significantly modulated by the SPE (SVC), as was activity extending throughout the middle and posterior cingulate cortex (PCC; CST) (Fig. 4C).

Figure 4.

Imaging effects of state-transition variables. A, Maps of the t statistics for tests of neural modulation by JS divergence, showing effects in the right supramarginal gyrus of the IPL. B, Map of the t statistics for tests of neural modulation by the entropy of the outcome probability distribution for the chosen action, showing effects in the DLPFC. C, Map of the t statistics for tests of neural modulation by SPEs during the feedback period of each trial, showing effects in the LPFC, IPS, and PCC. Bar plot shows responses to SPEs in the LPFC. Bar plots showing mean β-weights across variable values are binned into the 25th, 50th, 75th, and 100th percentiles. Error bars indicate SEM. a.u., arbitrary units.

Simpler representations of outcome probabilities

A possible alternative explanation for our effects of JS divergence is that the IPL and SMA are encoding simpler representations of outcome probabilities; for example, if participants did not attend to the sensory-specific or potential motivational differences between food outcomes but instead encoded, for each action, the probability of obtaining any food reward, activity in the IPL or SMA might be scaling with the simple difference between or sum of these probabilities across available actions. As illustrated in Figure 1B, JS divergence can deviate quite dramatically from the difference between reward probabilities, with the former being relatively high and the latter being zero in this example. Indeed, the difference between reward probabilities was not strongly correlated with JS divergence in our task and, moreover, weak correlations were in different directions across subjects (mean absolute value of r = 0.18, SEM = 0.03). Our task also included several instances for which divergence varied independently of the sum of reward probabilities; for example, again using Figure 1B, consider a case in which the probabilities associated with the Twix bar are shifted to the no-food outcome state and vice versa; this would yield the same level of JS divergence but a dramatic reduction in the sum of reward probabilities. Nonetheless, the sum of reward probabilities across available actions was strongly positively correlated with JS divergence for each subject (mean r = 0.78, SEM = 0.05).
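The shifted-probability argument above can be checked numerically: relabeling outcome states in the same way for both actions leaves JS divergence untouched, while the summed food-reward probability changes sharply. The probabilities below are illustrative assumptions in the spirit of Figure 1B, not the study's actual values:

```python
import numpy as np

def js_divergence(m, n):
    """Pairwise Jensen-Shannon divergence in bits."""
    m, n = np.asarray(m, dtype=float), np.asarray(n, dtype=float)
    mid = 0.5 * (m + n)
    def kl(p, q):
        nz = p > 0
        return np.sum(p[nz] * np.log2(p[nz] / q[nz]))
    return 0.5 * kl(m, mid) + 0.5 * kl(n, mid)

# Outcomes ordered [Twix, orange, no food]
a1, a2 = [0.9, 0.05, 0.05], [0.05, 0.9, 0.05]
# Shift the Twix probability to the no-food state (and vice versa)
# within each action's distribution:
b1, b2 = [a1[2], a1[1], a1[0]], [a2[2], a2[1], a2[0]]

# JS divergence is unchanged by a shared relabeling of outcome states...
assert np.isclose(js_divergence(a1, a2), js_divergence(b1, b2))
# ...but the summed food-reward probability across actions drops
sum_before = a1[0] + a1[1] + a2[0] + a2[1]   # ~1.9
sum_after = b1[0] + b1[1] + b2[0] + b2[1]    # ~1.05
```

This is exactly the dissociation the Bayesian model selection analysis below exploits: divergence and the sum (or difference) of reward probabilities make distinguishable predictions in such cases.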

To empirically assess the neural effects of simpler representations of outcome probabilities relative to those of JS divergence, we specified two additional GLMs that were identical to our original model except for the replacement of the JS divergence modulator with a regressor modeling the difference between or the sum of reward probabilities, respectively, in the second and third model. Separate models were specified for two main reasons: first, to avoid excessive collinearity of regressor variables and, second, to rule out the possibility that neural correlates varied only with those components of JS divergence that are orthogonal to linear representations of probabilities. No significant effects of JS divergence, the difference between, or sum of reward probabilities emerged when a single GLM was used to model all three variables (in addition to those previously specified), suggesting that there was indeed too much shared variance between these regressors. We also found no effects of either the difference or the sum of reward probabilities at our threshold of statistical significance when using separate GLMs, although effects did emerge at an uncorrected threshold of p < 0.05. To formally determine which of the three variables provided the best account of neural activity in the SMA and IPL, we performed a Bayesian model selection analysis. Specifically, we used the first-level Bayesian estimation procedure in SPM8 to compute a voxelwise whole-brain log-model evidence map for every subject and each model (Penny et al., 2005). Then, to model inference at the group level, we applied a random effects approach (Rosa et al., 2010) at every voxel of the log evidence data falling within anatomical masks of the right supramarginal gyrus and SMA, constructing an exceedance posterior probability (EPP) map for each model and for each anatomical area.

We found that the difference between reward probabilities did indeed provide the best account of neural activity in the largest portion of the SMA, generating EPPs >0.33 in 413 voxels, followed by JS divergence with EPPs >0.33 in 325 voxels; the sum of reward probabilities generated EPPs >0.33 in only 14 voxels. In contrast, as shown in Figure 5, JS divergence provided a better account of neural activity in a dramatically greater portion of the right IPL than did either the difference between or the sum of reward probabilities, with EPPs >0.33 in 342 voxels for JS divergence, 123 voxels for the difference, and only 2 voxels for the sum. It is of course entirely possible that more than one computation is performed in a large anatomical area, and the variable that is superior in the largest number of voxels is not necessarily the most essential, particularly when the difference in cluster size is relatively small and when only one variable generates significant effects in a classical analysis, as is the case here with JS divergence. Nonetheless, because we cannot completely rule out the difference between reward probabilities as the source of our effects in the SMA, we refrain from further discussion of this area.

Figure 5.

Results of a Bayesian model selection analysis. EPP maps in an anatomical mask of the right supramarginal gyrus, generated based on three GLMs that were identical except for the inclusion of either JS divergence (left), the difference between reward probabilities (middle), or the sum of reward probabilities (right). The EPP maps are thresholded at 0.333 with probabilities of 1.00 indicated by black.

Stimulus utility and RPEs

During the choice period, the summed utilities (i.e., pleasantness ratings) of the two food rewards that could potentially be obtained on a given trial correlated with activity in the left anterior insula (SVC) and in the left lateral VS (SVC) (Fig. 6A). Weaker effects also emerged in the right anterior insula and right lateral VS at p < 0.005 uncorrected. No other effects of stimulus utility emerged during this trial period. At the time of outcome delivery, activity in the medial VS was significantly correlated with the RPE (SVC), as was activity throughout the medial frontal and visual cortex (CST; Fig. 6B). Notably, there was no effect of the utility of the delivered outcome; this null result likely reflects the high correlation between this variable and the RPE (r = 0.7). To verify this, we orthogonalized the RPE with respect to outcome utility, rather than the other way around, assigning the shared variance to outcome utility. Using this model, a test for the utility of the delivered outcome yielded significant effects in the mOFC (SVC) and throughout the lingual gyrus and calcarine sulcus (CST).
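The orthogonalization step can be sketched as a simple Gram-Schmidt residualization. The regressors below are simulated stand-ins with roughly the correlation reported in the text (r ≈ 0.7), not the actual design-matrix columns:

```python
import numpy as np

def orthogonalize(x, y):
    """Residualize regressor x with respect to y, so that variance shared
    by the two is credited to y (here: to outcome utility rather than
    to the RPE)."""
    x = x - x.mean()
    y = y - y.mean()
    beta = (x @ y) / (y @ y)   # least-squares projection of x onto y
    return x - beta * y

rng = np.random.default_rng(0)
utility = rng.normal(size=200)
# Construct an RPE regressor correlated ~0.7 with utility.
rpe = 0.7 * utility + np.sqrt(1 - 0.7**2) * rng.normal(size=200)
rpe_orth = orthogonalize(rpe, utility)
# rpe_orth is now uncorrelated with utility, so a GLM containing both
# regressors assigns all of their shared variance to utility.
```

Which regressor is residualized determines which variable gets the shared explanatory power, which is why the two model variants described above yield different significance patterns.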

Figure 6.

Imaging effects of motivational variables. A, Maps of the t statistics for tests of neural modulation by the summed utility of food outcomes obtainable on a given trial, showing effects in the anterior insula and lateral VS. Bar plot shows responses in the left anterior insula. B, Map of the t statistics for tests of neural modulation by RPEs during the feedback period of each trial, showing effects throughout the medial PFC and in the medial VS. Bar plot shows responses to RPEs in the VS. C, Map of the t statistics for tests of neural modulation by the Q value of the chosen action, showing effects in the dorsal SMA and VMPFC. Bar plots show responses in the VMPFC. Bar plots show mean β-weights binned into the 25th, 50th, 75th, and 100th percentiles of variable values. Error bars indicate SEM. a.u., arbitrary units.

Action values

The expected value of the chosen action during the choice period was significantly correlated with activity in VMPFC (SVC) (Fig. 6C), as well as with activity in dorsal SMA, extending throughout the paracentral lobule and into adjacent frontal and parietal areas (CST). No effects were found for the sum of, or difference between, the expected values of available actions.

Discussion

Despite their conceptual appeal and recent popularity, very little is yet known about how well model-based RL theories capture the neural computations underlying goal-directed behavior. In particular, the formalization of Tolman's cognitive map as a probability distribution over instrumental actions and outcomes has remained largely untested. Here, we found that activity in the supramarginal gyrus of the IPL correlated with the divergence of outcome probability distributions associated with available actions. In contrast, activity in the DLPFC varied with the entropy of outcome distributions for chosen actions, while activity in the IPS and LPFC reflected the error-based updating of state transitions. Importantly, these effects were dissociable from those of motivational variables, such as the utility of potential outcomes, which elicited activity in the insula and VS, and the expected (Q) value of the chosen action, which correlated with activity in VMPFC. Our findings complement recent data suggesting that animals develop and maintain a rich internal model of their environment (den Ouden et al., 2009; Gläscher et al., 2010; Abe and Lee, 2011).

Although tasks similar to ours have previously been used to address action–outcome learning, our specific hypothesis, that outcome divergence may capture how multiple instrumental relationships are simultaneously contrasted in a meaningful way, is to our knowledge a novel proposal. Our results suggest that the IPL, an area previously implicated in the planning, execution, and observation of goal-directed actions (Fincham et al., 2002; Liljeholm et al., 2011, 2012), as well as in the experience of agency (Chaminade and Decety, 2002; Farrer et al., 2008; Sperduti et al., 2011), implements a comparison of instrumental probability distributions. This finding has broad implications, potentially generalizing to other types of predictive relationships and providing a means of linking action–outcome learning to more abstract features of goal-directed performance, such as agency and intent attribution.

A closely related topic is how the brain represents various outcome identities (Hamilton and Grafton, 2006, 2008; Stalnaker et al., 2010; Abe and Lee, 2011; Klein-Flügge et al., 2013), over which instrumental divergence can be defined. In a recent neuroimaging study, Klein-Flügge et al. (2013) assessed repetition suppression of BOLD responses to cues that signaled either the same or different food rewards, essentially yielding a low versus high level of outcome divergence. They found that the identities of food outcomes were encoded by the mOFC, with no such effects emerging in the IPL. Notably, Klein-Flügge et al. (2013) eliminated any effects of stimuli that predicted neutral events, to show that mOFC encodes only the identities of rewarding outcomes. In contrast, the anterior IPL has been shown to exhibit repetition suppression of BOLD responses to neutral outcome identities (Hamilton and Grafton, 2006, 2008). Here, we investigate a valence-neutral “cognitive map” of action–outcome contingencies, treating nonreward and rewarding outcome states as equivalent in our computation of divergence. Although several other factors differed across Klein-Flügge et al.'s (2013) task and ours (e.g., our use of probabilistic and instrumental contingencies), we suspect that their exclusion of any areas in which the identities of both neutral and rewarding outcomes were encoded might account for the differences in neural results.

It should be noted that the IPL has been strongly implicated in visuospatial attention and salience; however, such effects tend to emerge in a much more posterior region of the inferior parietal cortex than that identified here (Müri et al., 1996; Gottlieb et al., 1998; Kastner et al., 1999; Corbetta and Shulman, 2002; Husain and Rorden, 2003; Mevorach et al., 2006; Buschman and Miller, 2007; Arcizet et al., 2011; Leathers and Olson, 2012). Indeed, the anterior portion of the supramarginal gyrus identified in the current study has been anatomically established as clearly distinct from more posterior parietal regions (Mars et al., 2011), and has been functionally implicated in the representation of action outcomes with paradigms that largely rule out attentional confounds (Hamilton and Grafton, 2006, 2008).

A few previous neuroimaging studies have used economic decision tasks to separate information-theoretic from motivational variables (Luhmann et al., 2008; Abler et al., 2009; Smith et al., 2009; Symmonds et al., 2011). The current experiment differs from such studies in several critical respects. First, in our study, outcome probabilities were acquired through trial-by-trial exposure to action–outcome contingencies, rather than being verbally or graphically instructed; substantial behavioral evidence suggests that decisions based on descriptive information can differ dramatically from those based on direct experience (Hertwig, 2012). Second, we used instrumental contingencies, whereas in gambling studies decisions are stimulus based, with stimuli randomly assigned to particular actions on each trial. Finally, unlike previous studies addressing the entropy, risk, or probability associated with a single decision, we assessed neural encoding of the divergence of outcome distributions associated with simultaneously available action alternatives.

Our approach yields unique insights. To our knowledge, the IPL activity observed here is quite different from the parietal activity that emerges in gambling paradigms, which has been more posterior and which has not been unambiguously attributable to probability magnitudes versus entropy or risk (Ernst et al., 2004; Weber and Huettel, 2008; Smith et al., 2009; Symmonds et al., 2011). Our effects in the more anterior portion of the IPL may reflect the use of instrumental contingencies: Gläscher et al. (2009) found that effects in this area were stronger for action-based than for stimulus-based decisions. Another novel contribution of the current study is the dissociation of goal and action values. Both Abler et al. (2009) and Smith et al. (2009) found that activity in the insula increased with reward magnitude, consistent with our effects in this area of the utility of potential food outcomes. However, whereas neither previous study reported any areas of activation for action values beyond those observed for reward magnitude, we found that VMPFC activity increased with the value of the chosen action. This discrepancy may reflect selective encoding of experienced over instructed information: Fitzgerald et al. (2010) found greater VMPFC activation for experientially acquired than for described value signals.

Other previous work has directly investigated neural correlates of model-based versus model-free RL. However, these studies focus primarily on value signals (e.g., Daw et al., 2005), without exploring the possibility that valence-neutral state transitions are represented independently of associated motivational features. One notable exception is a study by Simon and Daw (2011) in which the degree of model-based branching, essentially the complexity of forward planning, was defined as the number of choices available in the current state, as well as the expected number of choices in subsequent states. They found positive neural correlates of these measures in the lateral precentral cortex, the anterior insula, and the anterior cingulate/SMA, but not in the IPL. There are, however, two critical differences between their analyses and ours: First, they were modeling the number of available actions, rather than contingent outcome states; this distinction is particularly important given previous findings that the right anterior IPL distinguishes between outcome identities, but not between action kinematics (Hamilton and Grafton, 2006, 2008). Second, Simon and Daw modeled the number of expected future options summed across, rather than as a divergence across, currently available options. It is not surprising, therefore, that the results differed substantially from those obtained here.

Goal-directed performance is characterized by a sensitivity to changes in the instrumental contingency that has been reliably demonstrated in rodents as well as humans (Hammond, 1980; Shanks and Dickinson, 1991; Balleine and Dickinson, 1998; Liljeholm et al., 2011). In a previous study, using a free-operant task in which the rate of executing a single rewarded action is self-paced, we found that activity in the IPL correlated with changes in instrumental contingency, formalized as the difference between the probabilities of reward in the presence versus absence of an action (Liljeholm et al., 2011). Unlike outcome divergence, instrumental contingency conflates the probabilities and values of outcomes. Moreover, instrumental contingency is signed, reflecting the relative advantage of performing or not performing a particular action (indeed, in our previous study, activity in the left IPL correlated negatively with instrumental contingency, perhaps reflecting the relative advantage of withholding a response). Nonetheless, outcome divergence can be characterized as a general, symmetric extension of instrumental contingency to the case of multiple actions and outcomes. As such, our current demonstration of a role for the IPL in encoding divergence is consistent with our previous work implicating this area in the probabilistic integration of action alternatives.
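The relation between the two measures can be sketched as follows (a hedged illustration with made-up probabilities over two outcome states, [reward, no reward]): ΔP is signed and defined for a single action, whereas JS divergence is symmetric, nonnegative, and extends to any number of actions and outcomes.

```python
import numpy as np

def delta_p(p_o_given_a, p_o_given_not_a):
    """Signed instrumental contingency for one action and one outcome."""
    return p_o_given_a - p_o_given_not_a

def js_divergence(p, q):
    """JS divergence (bits) between two outcome distributions, e.g.,
    outcomes given an action vs. outcomes given its omission."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)

    def entropy(d):
        d = d[d > 0]
        return -np.sum(d * np.log2(d))

    return entropy(m) - 0.5 * (entropy(p) + entropy(q))

# Outcome states: [reward, no reward] (illustrative values).
act, no_act = [0.8, 0.2], [0.2, 0.8]
print(delta_p(0.8, 0.2))          # +0.6: signed contingency
print(js_divergence(act, no_act)) # ~0.278 bits, identical if roles swap
```

Swapping the roles of acting and not acting flips the sign of ΔP but leaves the divergence unchanged, which is the sense in which divergence is a symmetric generalization of contingency.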

In conclusion, we show modulation of the IPL by the divergence of outcome probability distributions associated with alternative actions, and dissociate this decision variable from both stimulus utilities and action values. As applied here, JS divergence reflects the extent to which discriminating between available actions has any impact on the occurrence, and predictability, of future states. Conversely, this information-theoretic measure captures the attributability of a current environment to distinct antecedent actions. As such, it is likely to play a central role in goal-directed encoding of action–outcome contingencies, a suggestion supported by the current findings.

Footnotes

  • This work was funded by a National Institutes of Health Grant (DA033077-01) to J.P.O. The authors thank Daniel McNamee for helpful discussions.

  • Correspondence should be addressed to Mimi Liljeholm, Division of the Humanities and Social Sciences and Computation and Neural Systems Program, California Institute of Technology, Pasadena, CA 91125. mlil@hss.caltech.edu

References

  1. Abe H, Lee D (2011) Distributed coding of actual and hypothetical outcomes in the orbital and dorsolateral prefrontal cortex. Neuron 70:731–741.
  2. Abler B, Herrnberger B, Grön G, Spitzer M (2009) From uncertainty to reward: BOLD characteristics differentiate signaling pathways. BMC Neurosci 10:154.
  3. Arcizet F, Mirpour K, Bisley JW (2011) A pure salience response in posterior parietal cortex. Cereb Cortex 21:2498–2506.
  4. Balleine BW, Dickinson A (1998) Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37:407–419.
  5. Becker GM, DeGroot MH, Marschak J (1964) Measuring utility by a single-response sequential method. Behav Sci 9:226–232.
  6. Blodgett HC (1929) The effect of the introduction of reward upon the maze performance of rats. University of California Publications in Psychology 4:113–134.
  7. Buschman TJ, Miller EK (2007) Top-down versus bottom-up control of attention in the prefrontal and posterior parietal cortices. Science 315:1860–1862.
  8. Chaminade T, Decety J (2002) Leader or follower? Involvement of the inferior parietal lobule in agency. Neuroreport 13:1975–1978.
  9. Corbetta M, Shulman GL (2002) Control of goal-directed and stimulus-driven attention in the brain. Nat Rev Neurosci 3:201–215.
  10. Daw ND, Niv Y, Dayan P (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8:1704–1711.
  11. den Ouden HE, Friston KJ, Daw ND, McIntosh AR, Stephan KE (2009) A dual role for prediction error in associative learning. Cereb Cortex 19:1175–1185.
  12. Doya K, Samejima K, Katagiri K, Kawato M (2002) Multiple model-based reinforcement learning. Neural Comput 14:1347–1369.
  13. Ernst M, Nelson EE, McClure EB, Monk CS, Munson S, Eshel N, Zarahn E, Leibenluft E, Zametkin A, Towbin K, Blair J, Charney D, Pine DS (2004) Choice selection and reward anticipation: an fMRI study. Neuropsychologia 42:1585–1597.
  14. Esterman M, Tamber-Rosenau BJ, Chiu YC, Yantis S (2010) Avoiding nonindependence in fMRI data analysis: leave one subject out. Neuroimage 50:572–576.
  15. Farrer C, Frey SH, Van Horn JD, Tunik E, Turk D, Inati S, Grafton ST (2008) The angular gyrus computes action awareness representations. Cereb Cortex 18:254–261.
  16. Feinstein JS, Stein MB, Paulus MP (2006) Anterior insula reactivity during certain decisions is associated with neuroticism. Soc Cogn Affect Neurosci 1:136–142.
  17. Fincham JM, Carter CS, van Veen V, Stenger VA, Anderson JR (2002) Neural mechanisms of planning: a computational analysis using event-related fMRI. Proc Natl Acad Sci U S A 99:3346–3351.
  18. Fitzgerald TH, Seymour B, Bach DR, Dolan RJ (2010) Differentiable neural substrates for learned and described value and risk. Curr Biol 20:1823–1829.
  19. Forman SD, Cohen JD, Fitzgerald M, Eddy WF, Mintun MA, Noll DC (1995) Improved assessment of significant activation in functional magnetic resonance imaging (fMRI): use of a cluster-size threshold. Magn Reson Med 33:636–647.
  20. Garner DM, Olmsted MP, Bohr Y, Garfinkel PE (1982) The eating attitudes test: psychometric features and clinical correlates. Psychol Med 12:871–878.
  21. Gläscher J (2009) Visualization of group inference data in functional neuroimaging. Neuroinformatics 7:73–82.
  22. Gläscher J, Hampton AN, O'Doherty JP (2009) Determining a role for ventromedial prefrontal cortex in encoding action-based value signals during reward-related decision making. Cereb Cortex 19:483–495.
  23. Gläscher J, Daw N, Dayan P, O'Doherty JP (2010) States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66:585–595.
  24. Gottlieb JP, Kusunoki M, Goldberg ME (1998) The representation of visual salience in monkey parietal cortex. Nature 391:481–484.
  25. Hamilton AF, Grafton ST (2006) Goal representation in human anterior intraparietal sulcus. J Neurosci 26:1133–1137.
  26. Hamilton AF, Grafton ST (2008) Action outcomes are represented in human inferior frontoparietal cortex. Cereb Cortex 18:1160–1168.
  27. Hammond LJ (1980) The effect of contingency upon the appetitive conditioning of free-operant behavior. J Exp Anal Behav 34:297–304.
  28. Hampton AN, Bossaerts P, O'Doherty JP (2006) The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. J Neurosci 26:8360–8367.
  29. Hare TA, O'Doherty J, Camerer CF, Schultz W, Rangel A (2008) Dissociating the role of the orbitofrontal cortex and the striatum in the computation of goal values and prediction errors. J Neurosci 28:5623–5630.
  30. Hertwig R (2012) The psychology and rationality of decisions from experience. Synthese 187:269–292.
  31. Husain M, Rorden C (2003) Non-spatially lateralized mechanisms in hemispatial neglect. Nat Rev Neurosci 4:26–36.
  32. Kastner S, Pinsk MA, De Weerd P, Desimone R, Ungerleider LG (1999) Increased activity in human visual cortex during directed attention in the absence of visual stimulation. Neuron 22:751–761.
  33. Klein-Flügge MC, Barron HC, Brodersen KH, Dolan RJ, Behrens TE (2013) Segregated encoding of reward-identity and stimulus-reward associations in human orbitofrontal cortex. J Neurosci 33:3202–3211.
  34. Knoch D, Gianotti LR, Pascual-Leone A, Treyer V, Regard M, Hohmann M, Brugger P (2006) Disruption of right prefrontal cortex by low-frequency repetitive transcranial magnetic stimulation induces risk-taking behavior. J Neurosci 26:6469–6472.
  35. Leathers ML, Olson CR (2012) In monkeys making value-based decisions, LIP neurons encode cue salience and not action value. Science 338:132–135.
  36. Liljeholm M, Tricomi E, O'Doherty JP, Balleine BW (2011) Neural correlates of instrumental contingency learning: differential effects of action-reward conjunction and disjunction. J Neurosci 31:2474–2480.
  37. Liljeholm M, Molloy CJ, O'Doherty JP (2012) Dissociable brain systems mediate vicarious learning of stimulus-response and action-outcome contingencies. J Neurosci 32:9878–9886.
  38. Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inform Theory 37:145–151.
  39. Luhmann CC, Chun MM, Yi DJ, Lee D, Wang XJ (2008) Neural dissociation of delay and uncertainty in intertemporal choice. J Neurosci 28:14459–14466.
  40. Mars RB, Jbabdi S, Sallet J, O'Reilly JX, Croxson PL, Olivier E, Noonan MP, Bergmann C, Mitchell AS, Baxter MG, Behrens TE, Johansen-Berg H, Tomassini V, Miller KL, Rushworth MF (2011) Diffusion-weighted imaging tractography-based parcellation of the human parietal cortex and comparison with human and macaque resting-state functional connectivity. J Neurosci 31:4087–4100.
  41. Mevorach C, Humphreys GW, Shalev L (2006) Opposite biases in salience-based selection for the left and right posterior parietal cortex. Nat Neurosci 9:740–742.
  42. Müri RM, Iba-Zizen MT, Derosier C, Cabanis EA, Pierrot-Deseilligny C (1996) Location of the human posterior eye field with functional magnetic resonance imaging. J Neurol Neurosurg Psychiatry 60:445–448.
  43. O'Doherty JP, Deichmann R, Critchley HD, Dolan RJ (2002) Neural responses during anticipation of a primary taste reward. Neuron 33:815–826.
  44. O'Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ (2003) Temporal difference models and reward-related learning in the human brain. Neuron 38:329–337.
  45. O'Doherty J, Rolls ET, Francis S, Bowtell R, McGlone F (2001) Representation of pleasant and aversive taste in the human brain. J Neurophysiol 85:1315–1321.
  46. O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ (2004) Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304:452–454.
  47. Paulus MP, Hozack N, Frank L, Brown GG (2002) Error rate and outcome predictability affect neural activation in prefrontal cortex and anterior cingulate during decision-making. Neuroimage 15:836–846.
  48. Pelchat ML, Johnson A, Chan R, Valdez J, Ragland JD (2004) Images of desire: food-craving activation during fMRI. Neuroimage 23:1486–1493.
  49. Penny WD, Trujillo-Barreto NJ, Friston KJ (2005) Bayesian fMRI time series analysis with spatial priors. Neuroimage 24:350–362.
  50. Plassman H, O'Doherty JP, Rangel A (2010) Appetitive and aversive goal values are encoded in the medial orbitofrontal cortex at the time of decision making. J Neurosci 30:10799–10808.
  51. Rosa MJ, Bestmann S, Harrison L, Penny W (2010) Bayesian model selection maps for group studies. Neuroimage 49:217–224.
  52. Schmidt L, Lebreton M, Cléry-Melin ML, Daunizeau J, Pessiglione M (2012) Neural mechanisms underlying motivation of mental versus physical effort. PLoS Biol 10:e1001266.
  53. Shanks DR, Dickinson A (1991) Instrumental judgment and performance under variations in action-outcome contingency and contiguity. Mem Cognit 19:353–360.
  54. Simon DA, Daw ND (2011) Neural correlates of forward planning in a spatial decision task in humans. J Neurosci 31:5526–5539.
  55. Smith BW, Mitchell DG, Hardin MG, Jazbec S, Fridberg D, Blair RJ, Ernst M (2009) Neural substrates of reward magnitude, probability, and risk during a wheel of fortune decision-making task. Neuroimage 44:600–609.
  56. Sperduti M, Delaveau P, Fossati P, Nadel J (2011) Different brain structures related to self- and external-agency attribution: a brief review and meta-analysis. Brain Struct Funct 216:151–157.
  57. Stalnaker TA, Calhoon GG, Ogawa M, Roesch MR, Schoenbaum G (2010) Neural correlates of stimulus-response and response-outcome associations in dorsolateral versus dorsomedial striatum. Front Integr Neurosci 4:12.
  58. Symmonds M, Wright ND, Bach DR, Dolan RJ (2011) Deconstructing risk: separable encoding of variance and skewness in the brain. Neuroimage 58:1139–1149.
  59. Tanaka SC, Balleine BW, O'Doherty JP (2008) Calculating consequences: brain systems that encode the causal effects of actions. J Neurosci 28:6750–6755.
  60. Thorndike EL (1933) A proof of the Law of Effect. Science 77:173–175.
  61. Tolman EC (1948) Cognitive maps in rats and men. Psychol Rev 55:189–208.
  62. Tolman EC, Honzik CH (1930) "Insight" in rats. University of California Publications in Psychology 4:215–232.
  63. Wang GJ, Volkow ND, Telang F, Jayne M, Ma J, Rao M, Zhu W, Wong CT, Pappas NR, Geliebter A, Fowler JS (2004) Exposure to appetitive food stimuli markedly activates the human brain. Neuroimage 21:1790–1797.
  64. Weber BJ, Huettel SA (2008) The neural substrates of probabilistic and intertemporal decision making. Brain Res 1234:104–115.