Figure 9. Simulation of a task involving action selection considered by Roesch et al. (2007) and Takahashi et al. (2011): explanation of the neural and behavioral data, and predictions. A, Left, Presumed state transitions during individual task trials (similar to those proposed in a model developed by Takahashi et al., 2011). S1, S2, … represent the subject's states, which are defined by external events (i.e., cue onset, cue offset, reward bolus), by the subject's own movements, or internally (i.e., internal states), and A1, A2, … represent the option(s) that can be taken at each state (e.g., “plan/prepare for moving to the left,” “move to the left,” or “keep rest”). At the beginning of a trial (leftmost in the diagram), the subject is presented with one of three odor cues. Two of them (cues 1 and 3) indicated that reward would be given if the animal entered the left-side or the right-side well, respectively (forced-choice trials), whereas the remaining cue (cue 2) indicated that the animal would be rewarded at either well (free-choice trials). In both trial types, the amount of reward, either small (1 bolus of sucrose solution at S8 or S9) or large (2 boluses, with the second bolus delivered right after the first, at S10 or S11), was determined by a predetermined direction-amount contingency that was fixed within a block (at least 60 trials in the experiments; 120 trials in our simulations) and then reversed. Forced-choice and free-choice trials were pseudorandomly intermingled. In our model, when the subject enters each state, subset(s) of CCS cells in the OFC are assumed to represent the combination(s) of that state and the options that can be taken there [e.g., at state S2, one subset represents “S2 − A2” (plan/prepare for moving to the left) and another represents “S2 − A3” (plan/prepare for moving to the right)].
Subsequently, the circuit is assumed to operate as considered in Figure 8C, and one of the options is selected (and executed if it is a movement); if there is only a single option, it is selected (and executed). Meanwhile, the dopamine neurons compute the TD reward prediction error (regardless of whether there are multiple options or a single option), and the strengths of corticostriatal (i.e., CCS–dMSN and CPn/PT–iMSN) connections are plastically modified according to this error. Notably, the three time points S2 (cue onset), S4/S5 (cue offset), and S6/S7 correspond to $t_i$, $t_{i+1}$, and $t_{i+2}$ in Figure 8E; thus, reaction time is presumably modulated by the activity of dMSNs at cue offset. Right, Assumed soft-max function for choice. B, Dopamine neuronal responses sorted by experimental condition: the top and bottom rows show the experimental results (Roesch et al., 2007) and our simulation results, respectively. The vertical black line indicates the timing of cue onset. Left, Dashed and solid lines indicate the average response of dopamine neurons in the first and last 10 forced-choice select-large (green) or select-small (orange) trials in the blocks, respectively. Middle and Right, Green and orange lines indicate the average response of dopamine neurons in forced-choice (middle) or free-choice (right) select-large and select-small trials, respectively. To control for learning, and for the possibility that disadvantageous choices might occur more often early in the block, only trials after the rate of selecting the more valuable option exceeded 50% were included, and each free-choice trial was paired with the immediately preceding and following forced-choice trials of the same reward amount [as done in the original work (Roesch et al., 2007)]. C, Population activity of dMSNs (top row) and iMSNs (bottom row) predicted by our model, sorted by the same experimental conditions as in B.
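The choice and learning steps in panel A can be illustrated with a minimal Python sketch. It assumes a generic soft-max over option values with a hypothetical inverse temperature `beta`, and a generic TD error, delta = r + gamma * V(next) - V(current), with hypothetical parameters `gamma` and `alpha`. The single value update is a simplification standing in for the model's dopamine-dependent plasticity of CCS–dMSN and CPn/PT–iMSN connections; it is not the model's actual rule.

```python
import math
import random


def softmax_choice_probs(values, beta):
    """Soft-max choice rule: P(a) = exp(beta*Q_a) / sum_b exp(beta*Q_b).

    beta (inverse temperature) is a hypothetical free parameter.
    """
    m = max(values)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def choose(values, beta, rng=random):
    """Sample the index of one option according to the soft-max probabilities."""
    probs = softmax_choice_probs(values, beta)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


def td_error(reward, v_current, v_next, gamma):
    """Generic TD reward prediction error: delta = r + gamma*V(next) - V(current)."""
    return reward + gamma * v_next - v_current


def update_value(v, delta, alpha):
    """Hypothetical simplification of dopamine-dependent plasticity:
    a single value moves toward the target by a fraction alpha of delta."""
    return v + alpha * delta
```

For example, two options with equal values are each chosen with probability 0.5, and increasing `beta` makes the choice of the higher-valued option more deterministic.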
D, Impact of reward amount on choice behavior in free-choice trials. Line graphs show the percentage of choices before and after the switch from big to small reward (indicated by the vertical line), and the inset bar graphs show the percentage of choices of the large versus the small reward across all free-choice trials. The left and right panels show the experimental results (Roesch et al., 2007) and our simulation results, respectively. E, Average reaction time in the forced-choice (left column) and free-choice (right column) trials included in the data shown in the middle and right panels of B, respectively. The top row shows the experimental results (Roesch et al., 2007), and the bottom row shows our simulation results. F, Predictions from our model about the effects of a D1 (left) or D2 (right) antagonist on reaction time in forced-choice trials (top) and free-choice trials (bottom). G, Predictions from our model about the effects of optogenetic stimulation of dMSNs (left) or iMSNs (right) on choice behavior. In a new set of simulations, virtual optogenetic stimulation of either dMSNs or iMSNs was applied at one of the two wells (left or right), coincident with reward (at S8 or S9), and an extra bolus of reward was given at the subsequent time point at both wells; the assignment of stimulation (on/off) to wells was fixed within a block and alternated across blocks. The bar graphs show the percentage of choices of the well with versus without optogenetic stimulation across all free-choice trials, either without dopamine receptor antagonists (light gray bars) or with both D1 and D2 antagonists (dark gray bars). The top rows in B and E and the left panels of D were taken from Roesch et al. (2007).
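The block design described in panel A can likewise be sketched as a trial generator. This is a hypothetical illustration: the equal probability of the three cues and the starting side of the large reward are assumptions, since the caption states only that forced- and free-choice trials were pseudorandomly intermingled and that the direction-amount contingency reversed after each block (120 trials in the simulations).

```python
import random
from dataclasses import dataclass


@dataclass
class Trial:
    block: int
    cue: int         # 1 = forced left, 2 = free choice, 3 = forced right
    large_side: str  # well ("left"/"right") that yields 2 boluses in this block


def boluses(side, large_side):
    """Reward amount: 2 boluses at the large-reward well, 1 bolus at the other."""
    return 2 if side == large_side else 1


def allowed_sides(cue):
    """Wells at which reward is available for each odor cue."""
    return {1: ["left"], 2: ["left", "right"], 3: ["right"]}[cue]


def make_schedule(n_blocks, trials_per_block=120, seed=0):
    rng = random.Random(seed)
    large_side = "left"  # hypothetical starting contingency
    schedule = []
    for b in range(n_blocks):
        for _ in range(trials_per_block):
            # Hypothetical: each cue is drawn with equal probability, which
            # intermingles forced- and free-choice trials pseudorandomly.
            schedule.append(Trial(b, rng.choice([1, 2, 3]), large_side))
        # Reverse the direction-amount contingency at the end of every block.
        large_side = "right" if large_side == "left" else "left"
    return schedule
```

Drawing cues independently on each trial is the simplest choice; a constrained shuffle (e.g., a fixed number of free-choice trials per block) would be an equally valid reading of "pseudorandomly intermingled."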