The Journal of Neuroscience, August 9, 2006

The Role of the Ventromedial Prefrontal Cortex in Abstract State-Based Inference during Decision Making in Humans
J. Neurosci. Hampton et al.
26: 8360
Supplemental data
Files in this Data Supplement:
Supplemental Methods
Supplementary Figure 1. Comparison of the behavioral fit of the state-based decision model with the fits of several standard reinforcement learning (RL) algorithms, showing that the state-based model fits subjects’ behavioral data better than any of the RL algorithms tested. The RL algorithms fitted to the subjects’ behavioral data were: Q-learning with a Rescorla-Wagner update rule, Q(rw); Q-learning with intermediate time steps, Q(td), which assumes multiple time steps within a trial and calculates the expected future reward at each of them (Sutton and Barto, 1981; O'Doherty et al., 2003b); actor-critic with a Rescorla-Wagner update rule, AC(rw); actor-critic with intermediate time steps, AC(td); and advantage learning (Adv.), an extension of the actor-critic (Baird, 1993). a, The log likelihoods of each model's action predictions (switch vs. stay) show that the state-based model provides the best fit to the data. The second-best-fitting model is Q-learning with a Rescorla-Wagner update rule, Q(rw), which is the RL algorithm we compared with the state-based model in the fMRI analyses. RL models are depicted in light blue. b, The state-based model still fits the data better when the Bayesian Information Criterion (BIC) is used to account for its having as many or more free parameters than the RL models (the number of free parameters was 5 for the state-based model, 3 for Q(rw), 5 for Q(td), 2 for AC(rw), 4 for AC(td), and 4 for advantage learning).
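The two fit measures in this caption can be illustrated with a minimal sketch. It is not the authors' fitting code: it assumes a two-option task, a softmax choice rule, and only two free parameters (a learning rate `alpha` and an inverse temperature `beta`), whereas the Q(rw) model reported above has three. All function names are hypothetical, and the BIC is taken in its standard form, -2 ln L + k ln n.

```python
import math


def rescorla_wagner_update(q, reward, alpha):
    """Move the value estimate q toward the received reward by a fraction alpha."""
    return q + alpha * (reward - q)


def choice_prob(q_chosen, q_other, beta):
    """Two-option softmax: probability of picking the option valued q_chosen."""
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))


def q_rw_log_likelihood(choices, rewards, alpha, beta):
    """Log likelihood of a choice sequence (0 or 1 per trial) under Q(rw).

    Assumption: both action values start at 0 and only the chosen
    action's value is updated on each trial.
    """
    q = [0.0, 0.0]
    log_lik = 0.0
    for choice, reward in zip(choices, rewards):
        log_lik += math.log(choice_prob(q[choice], q[1 - choice], beta))
        q[choice] = rescorla_wagner_update(q[choice], reward, alpha)
    return log_lik


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

Because the penalty term grows with the number of free parameters, a model with five parameters must achieve a sufficiently higher log likelihood than a model with two or three before its BIC comes out lower.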
Supplementary Figure 2. Behavioral data and model predictions for three randomly chosen subjects (subjects 1, 7, and 13). The predictions of the state-based decision model (blue line) as to when to switch correspond more closely to the subjects’ actual switching behavior (red bar = switch) than do the predictions of the best-fitting standard RL algorithm (green line). Above each graph is the history of received rewards (blue) and punishments (red); subjects usually switch after a string of punishments.
Supplementary Figure 3. Plot of the model-predicted choice probabilities derived from the best-fitting RL algorithm before and after subjects switch their choice. One possible alternative explanation for the difference between the predictions of the abstract-state-based model and those of the RL model is that for the latter we show predictions of value rather than of choice probability. In many RL variants, such as the actor-critic, an anti-correlation between actions is built in when the choice probabilities are computed. Thus, it could be argued that had we plotted choice probabilities from an RL model instead of values, the predictions of the two models would be much more similar. Here we plot the choice probabilities from the best-fitting RL model, which incorporates a form of anti-correlation between actions. Even so, the choice probabilities predicted by the RL model do not show the pattern of results we observe for the abstract-state-based model (where the probability of the correct choice jumps up following reversal). This illustrates that a normalized choice probability signal from standard RL does not emulate the effect that is predicted by the state-based model and observed in ventromedial prefrontal cortex.
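The built-in anti-correlation mentioned in this caption can be sketched as follows. This is an illustration under stated assumptions, not the model fitted in the paper: it assumes a softmax choice rule, a Rescorla-Wagner value update, a punishment coded as -1.0, and arbitrary starting values and parameters. The function names are hypothetical.

```python
import math


def softmax(values, beta):
    """Convert action values into choice probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - peak)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_punishments(q_start, alpha, beta, n_trials, punishment=-1.0):
    """Repeatedly punish action 0 and record values and choice probabilities.

    Returns a list of (values, probabilities) pairs, one per trial.
    Assumption: only the chosen action (action 0) is updated.
    """
    q = list(q_start)
    history = []
    for _ in range(n_trials):
        q[0] += alpha * (punishment - q[0])
        history.append((list(q), softmax(q, beta)))
    return history


if __name__ == "__main__":
    for values, probs in simulate_punishments([0.6, 0.2], alpha=0.3, beta=3.0, n_trials=3):
        print([round(v, 3) for v in values], [round(p, 3) for p in probs])
```

In this sketch the value of the unchosen action never changes, yet its choice probability rises as the punished action's value falls, because the probabilities are normalized to sum to one. That coupling is the anti-correlation the caption refers to; the caption's point is that it still does not reproduce the jump predicted by the state-based model.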