Abstract
Our choices often require appropriate actions to obtain a preferred outcome, but the neural underpinnings that link decision making and action selection remain largely undetermined. Recent theories propose that action selection occurs simultaneously, i.e., parallel in time, with the decision process. Specifically, it is thought that action selection in motor regions originates from a competitive process that is gradually biased by evidence signals originating in other regions, such as those specialized in value computations. Biases reflecting the evaluation of choice options should thus emerge in the motor system before the decision process is complete. Using transcranial magnetic stimulation, we sought direct physiological evidence for this prediction by measuring changes in corticospinal excitability in human motor cortex during value-based decisions. We found that excitability for chosen versus unchosen actions distinguishes the forthcoming choice before completion of the decision process. Both excitability and reaction times varied as a function of the subjective value-difference between chosen and unchosen actions, consistent with this effect being value-driven. This relationship was not observed in the absence of a decision. Our data provide novel evidence in humans that internally generated value-based decisions influence the competition between action representations in motor cortex before the decision process is complete. This is incompatible with models of serial processing of stimulus, decision, and action.
Introduction
Our decisions constantly require the selection of appropriate actions to obtain the outcomes we desire. The physiological mechanisms that transform decisions into actions, however, remain largely unclear. Recent work provides insights into the formation of action plans (Bastian et al., 2003; Romo et al., 2004; Bestmann et al., 2008; Pastor-Bernier and Cisek, 2011) and suggests specific roles for different prefrontal areas in guiding important aspects of reward-based decisions (Daw et al., 2006; Kable and Glimcher, 2007; Boorman et al., 2009; Wunderlich et al., 2010; Noonan et al., 2011). Surprisingly, however, there is still a substantial disconnect in our knowledge on how decision processes ultimately influence activity in motor cortex to select the appropriate action (Freedman and Assad, 2011; Padoa-Schioppa, 2011).
While traditional views temporally segregate sensation, decision, and action (Flash and Hogan, 1985; Kawato et al., 1990; Bhushan and Shadmehr, 1999), recent accounts suggest that decision processes influence action selection via continuous updates that occur simultaneously with the decision process (Mumford, 1992; Cisek, 2006, 2007; Friston, 2008; Shadlen et al., 2008; Cisek and Kalaska, 2010). Recent evidence indeed suggests such an intimate temporal relationship between decision making and action selection. For example, decision evidence can be integrated into an on-going movement plan (Resulaj et al., 2009) and is conveyed through increases in effective connectivity between decision and motor regions (Hare et al., 2011). Decision processes bias sensorimotor regions before movement onset (Donner et al., 2009) and these influences may be reflected in value signals in (pre-)motor regions (Roesch and Olson, 2003; Gupta and Aron, 2011; Pastor-Bernier and Cisek, 2011; Sul et al., 2011).
Collectively, the above findings suggest that influences on action representations in motor cortex should occur during the period of decision making, but this prediction remains untested. More specifically, it is predicted that such temporally parallel influences on motor cortex trigger competition between alternative action representations (Cisek and Kalaska, 2010). This competition is thought to be fuelled by evidence signals originating in regions such as those specialized in value-computations, which gradually bias activity of alternative action representations in motor cortex. To date, it remains unclear whether the biasing influences thus far observed in motor cortex indeed occur during decision processing, or instead reflect motor preparatory processes that follow the decision. Addressing this question requires a direct physiological read-out of competing action representations with high temporal precision, and the comparison of no-choice and choice situations.
To this end, we used transcranial magnetic stimulation (TMS) in healthy human participants to measure corticospinal excitability (CSE) changes during choice and forced choice trials of a value-decision task. We found that on choice trials, CSE distinguishes between chosen and unchosen actions before the end of the decision process, and reflects the value difference between chosen and unchosen actions. This provides novel electrophysiological evidence that value-decisions bias action selection before the decision is complete.
Materials and Methods
Participants
Sixteen healthy volunteers (11 female, 5 male; age range, 19–31 years; mean age, 21.75 years) with no history of neurological or psychiatric disorder and with normal or corrected-to-normal vision participated in the experiment, with local ethics committee approval and in accordance with the Declaration of Helsinki. All participants gave written informed consent. One participant was excluded from the analyses due to an insufficient number of trials (see Data preprocessing and statistical analyses, below).
Behavioral task
Participants performed a value-decision task with two types of trials, choice (C) and forced choice (FC) trials. On choice trials (66%), participants had to make a choice between two options that were presented to the left and right of a central fixation point. Each option was associated with a reward magnitude and a probability of obtaining this monetary reward. Reward magnitudes were displayed as the length of a horizontal bar; reward probabilities were shown as numbers (Fig. 1). The probabilities of winning were independent across the two options. This meant that on a given trial both, neither or one of the options could lead to a reward. On forced choice trials (33%), only one option was presented, which was either on the left or right of the fixation (with equal probability), while the other side of the screen showed an X (“not an option”). In both conditions, the chosen option was indicated via a button press with the left or right index finger [maximum allowed reaction time (RT): 3 s]. The chosen option was highlighted (white border; 500 ms) and the outcome signaled (green border: win; red border: no reward; 500 ms). The cumulative winnings were updated after each trial by increasing a yellow bar shown on top of the screen in proportion to the received reward magnitude.
Participants were familiarized with the task during two training sessions of 108 trials each. The main part of the experiment consisted of four experimental blocks of 144 trials each, resulting in a total of 576 trials (384 choice and 192 forced choice trials). Participants were paid £12 for their participation, plus an amount proportional to their winnings, with a maximum of £2 per block. On average, participants earned £6.00 ± 0.06 on the task.
Reward probabilities (range: 0.2–0.95) and magnitudes (range: 0–2 pence) were chosen according to the following criteria: (1) right and left hand choices had the same overall expected pay-off, (2) the two reward magnitudes or probabilities offered on a given trial were never identical, (3) the same combination of magnitudes and probabilities was never repeated (i.e., every choice pair was novel), (4) the majority of trials were hard in that one option was associated with a higher magnitude but the alternative with a higher probability, and the differences in expected values were <0.15, and (5) the offered probabilities (pS1, pS2) and magnitudes (OS1, OS2), the chosen and unchosen subjective probabilities [w(pch), w(punch)] and magnitudes [v(Och), v(Ounch)], as well as the chosen and unchosen subjective expected values [U(Sch), U(Sunch)] of choice trials were maximally decorrelated (see below). Apart from these constraints, values were randomly generated.
Correlations of the offered reward magnitudes and probabilities were thus experimentally controlled and the average correlation for the six pairwise comparisons between OS1, OS2, pS1, and pS2 was −0.027 ± 0.018. To additionally minimize the correlation between chosen and unchosen expected values, magnitudes, and probabilities, we simulated participants' choices using prospect theory according to Equations 1–3 given below (Tversky and Kahneman, 1992), with parameters α = 0.8, β = 4, γ = 0.8 (Wu et al., 2011). We generated a large number of stimulus sets and choices, and chose those sets that minimized these correlations. This would later allow us to test separately for correlations of our data with chosen and unchosen value, magnitude, or probability. Participants' actual choices led to similar correlations as the simulated choices, namely corr(v(Och), v(Ounch)) = 0.19 ± 0.003, corr(w(pch),w(punch)) = 0.27 ± 0.03, and corr(U(Sch),U(Sunch)) = 0.48 ± 0.02.
The generation of probabilities and magnitudes ensured that participants could not learn any stimulus-response or value-response mappings. On a given trial, responses could not be prepared in advance of the presentation of the options and any motor preparation should have resulted from an evaluation of the options presented. This allowed us to investigate how the value-based decision process influenced action selection.
Behavioral modeling
To infer the subjective values that participants placed on the options offered on each trial, we used cumulative prospect theory (Tversky and Kahneman, 1992; Wu et al., 2011; Hunt et al., 2012) to estimate the parameters for subjective distortions of probability and reward magnitude that best explained participants' choices.
The expected utility U(S) of an option S = (p,O) with associated probability, p, and reward outcome, O, is the product of subjective probability w(p) and subjective reward v(O): U(S) = w(p)v(O).
The value function v characterizes the distortion of information about the reward outcome O, and the probability-weighting function w models the distortion of probability information. The latter usually has an inverse S-shape, with an overweighting of small and an underweighting of large probabilities. Let S1 and S2 be the two options offered on a given trial; the probability that the participant chooses S1 is then given by a softmax function of their subjective values. The temperature β determines the steepness of the softmax function, i.e., how sensitive the choice probability is to differences in subjective value between S1 and S2. For each subject, we thus fitted three parameters (α, β, γ) using a maximum log-likelihood estimation in Matlab (Mathworks).
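The functional forms of v, w, and the softmax are not reproduced in the text above. The following is a minimal sketch, assuming the standard one-parameter forms from Tversky and Kahneman (1992), v(O) = O^α and w(p) = p^γ/(p^γ + (1 − p)^γ)^(1/γ), and a logistic softmax on U(S1) − U(S2). These forms and all function names are assumptions made for illustration; this is not the authors' Matlab code.

```python
import math

def value(outcome, alpha):
    """Subjective reward v(O); assumed power form O**alpha."""
    return outcome ** alpha

def weight(p, gamma):
    """Subjective probability w(p); assumed Tversky-Kahneman (1992) form."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)

def utility(p, outcome, alpha, gamma):
    """U(S) = w(p) * v(O), as defined in the text."""
    return weight(p, gamma) * value(outcome, alpha)

def p_choose_s1(s1, s2, alpha, beta, gamma):
    """Assumed logistic softmax on U(S1) - U(S2); beta sets the steepness."""
    diff = utility(*s1, alpha, gamma) - utility(*s2, alpha, gamma)
    return 1.0 / (1.0 + math.exp(-beta * diff))

def neg_log_likelihood(params, trials):
    """Negative log-likelihood of observed choices.

    trials: iterable of (s1, s2, chose_s1), where each option s = (p, O).
    """
    alpha, beta, gamma = params
    nll = 0.0
    for s1, s2, chose_s1 in trials:
        p1 = p_choose_s1(s1, s2, alpha, beta, gamma)
        p1 = min(max(p1, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        nll -= math.log(p1 if chose_s1 else 1.0 - p1)
    return nll
```

Minimizing neg_log_likelihood over (α, β, γ) with any general-purpose optimizer would correspond to the maximum log-likelihood fit described above. Under the assumed weighting function with γ = 0.8, weight(0.1, 0.8) exceeds 0.1 and weight(0.9, 0.8) falls below 0.9, which is the inverse S-shape mentioned in the text.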
Reading out changes in corticospinal excitability during decision processing
We used single pulses of TMS to read-out changes in CSE in the period between the presentation of the options and the actual response, as indicated via a button press. Previous work has used single-pulse TMS to investigate changes in CSE during action preparation (Mars et al., 2007; van Elswijk et al., 2007; Bestmann et al., 2008; Duque and Ivry, 2009; Duque et al., 2010). This procedure was adapted to address how CSE changes evolve over the course of a value-decision process.
Participants rested their head on a chinrest to reduce head motion and wore a tight-fitting bathing cap on which the optimal site for stimulation of the right first dorsal interosseous (FDI) muscle was marked. On every trial, a single TMS pulse was delivered to the left motor cortex through a 50 mm figure-of-eight-shaped coil connected to a monophasic Magstim 200² stimulator (Magstim). The coil intensity was adjusted to elicit a motor-evoked potential (MEP) of ∼1.5 mV in the right FDI muscle at rest. The coil was held tangentially to the skull with the handle oriented posteriorly at ∼45° from the midsagittal axis. The mean TMS pulse intensity was 53 ± 2% of the maximum stimulator output. Electromyographic (EMG) responses were recorded from the right FDI using Ag-AgCl surface electrodes in a tendon-belly montage. The EMG signal was sampled at 1000 Hz, bandpass filtered between 3 and 3000 Hz with an additional 50 Hz notch filter, fed into a CED 1902 signal conditioner, digitized using a CED micro 1401 Mk.II A/D converter, and stored on a PC running Spike2 (all from Cambridge Electronic Design).
The critical question in this experiment was whether biases reflecting the evaluation of choice options emerge in the motor system before the decision process is complete. We therefore measured changes in CSE at different times in a trial to compare the temporal evolution of CSE between C and FC trials. On every trial, TMS was applied at one of six different time points, spaced between trial onset and response. These times were adjusted for each individual, based on the RTs of the C or FC condition measured in the second training session (see below for the validity of this approach). On FC trials, the TMS times (t1–t6) corresponded to 10%, 35%, 50%, 60%, 70%, and 80% of the participant's individual mean FC-RT, thus spanning the entire trial from onset to response (Fig. 1).
On C trials, the first time point, t1, was kept identical to FC trials (10% of mean FC-RT) to allow for a direct comparison of CSE between the two conditions. This time point served as the baseline because it was too early for any decision- or action-related process to be initiated. However, rather than spreading the remaining five TMS time points equally across the trial, they were specifically targeted to the period of a trial when the decision process was most likely to take place.
In pilot data (data not shown), CSE for right hand (RH) and left hand (LH) responses separated at ∼45% of the FC-RT. Given that visual stimulus processing should take at least as long on C as on FC trials, we expected relevant decision processes on C trials to start at the earliest around that time. Therefore, t2 was set to 45% of the individual FC-RT. Furthermore, the time required for the decision process will be reflected in the RT difference between C and FC trials, ΔRT (Fig. 1), which can differ across participants. The TMS time points t2–t6 on choice trials were therefore spaced equidistantly between 45% of the FC-RT and the end of each participant's individual decision processing time, i.e., 45% FC-RT + ΔRT. For the TMS time points t3–t6, this corresponded to 45% FC-RT + 0.25*ΔRT (t3), 45% FC-RT + 0.5*ΔRT (t4), 45% FC-RT + 0.75*ΔRT (t5), and 45% FC-RT + ΔRT (t6).
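The timing scheme for both trial types can be summarized in a short sketch. The function name and the example RTs (500 ms for FC, 800 ms for C) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def tms_times(mean_fc_rt, mean_c_rt):
    """Return (fc_times, c_times) in ms for TMS time points t1..t6.

    mean_fc_rt, mean_c_rt: a participant's mean RTs (ms) from the
    second training session.
    """
    delta_rt = mean_c_rt - mean_fc_rt  # estimated decision time
    # FC trials: fixed fractions of the mean FC-RT
    fc_times = [f * mean_fc_rt for f in (0.10, 0.35, 0.50, 0.60, 0.70, 0.80)]
    # C trials: t1 as on FC trials, then t2..t6 equidistant between
    # 45% FC-RT and 45% FC-RT + delta_rt
    start = 0.45 * mean_fc_rt
    c_times = [0.10 * mean_fc_rt]
    c_times += [start + k * 0.25 * delta_rt for k in range(5)]
    return fc_times, c_times

# Hypothetical participant: mean FC-RT = 500 ms, mean C-RT = 800 ms
fc_times, c_times = tms_times(500.0, 800.0)
```

For this hypothetical participant, ΔRT is 300 ms, so the choice-trial pulses fall at approximately 50, 225, 300, 375, 450, and 525 ms after trial onset.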
All TMS delivery times used in the main experiment were based on the average RTs of each participant measured in the second training session. In pilot experiments, we had established that in our task, RTs become stable after ∼100 trials of practice, i.e., toward the end of the first training session. They remained stable from the beginning of the second training session onwards. Indeed, average RTs did not differ between the second training session and the main experiment (t test on mean RT in training session 2 vs experiment: C: t14 = 0.20, p = 0.85; FC: t14 = −0.50, p = 0.63; Fig. 2a). This demonstrates that the specific TMS times used in the main experiment targeted the trial period we were most interested in.
Data preprocessing.
EMG data were exported to Matlab and peak-to-peak amplitudes of the TMS-evoked MEP measured for every trial. Small (amplitude < 0.12 mV) and outlier MEPs (Grubbs test, p < 0.005) as well as those from trials with precontraction in the target FDI muscle (signal > 0.1 mV in the 50 ms preceding the pulse) were discarded. The remaining MEPs of each block were z-normalized to ensure comparability across experimental blocks and to ensure that all participants were given the same weight in subsequent statistical analyses (Hasbroucq et al., 1999; Burle et al., 2002; Davranche et al., 2007; van Elswijk et al., 2007; van den Wildenberg et al., 2010; Tandonnet et al., 2012). Note that in our figures, we additionally show the corresponding average raw MEP values.
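A minimal sketch of the amplitude and precontraction criteria and the per-block z-normalization follows. It assumes that EMG is available as lists of samples in mV, that the precontraction criterion applies to the absolute signal, and that the sample standard deviation is used for normalization. The Grubbs outlier test is omitted, and all names are hypothetical.

```python
from statistics import mean, stdev

MIN_MEP_MV = 0.12        # MEPs with smaller peak-to-peak amplitude are discarded
PRECONTRACTION_MV = 0.1  # threshold on the EMG in the 50 ms before the pulse

def peak_to_peak(mep_window):
    """Peak-to-peak amplitude (mV) of the EMG samples containing the MEP."""
    return max(mep_window) - min(mep_window)

def keep_trial(mep_window, pre_pulse_window):
    """Apply the amplitude and precontraction criteria to one trial.

    pre_pulse_window: EMG samples (mV) from the 50 ms preceding the pulse.
    """
    if peak_to_peak(mep_window) < MIN_MEP_MV:
        return False
    # Assumption: the criterion refers to the absolute EMG signal
    if max(abs(x) for x in pre_pulse_window) > PRECONTRACTION_MV:
        return False
    return True

def z_normalize(amplitudes):
    """z-normalize the retained MEP amplitudes of one experimental block."""
    m, s = mean(amplitudes), stdev(amplitudes)
    return [(a - m) / s for a in amplitudes]
```

Normalizing within each block, rather than across the whole session, removes slow drifts in MEP size between blocks before trials are pooled.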
Trials in which participants responded prematurely (<100 ms) or too late (>3000 ms) were excluded from the analyses. In one of 16 participants, 51% of trials had to be excluded based on these criteria and the data were not further analyzed because of an insufficient number of trials for each condition. On average, in all remaining participants, 14 ± 2% of trials were discarded (small MEP amplitude: 5%; outliers: 0.02%; precontraction: 9%; premature/late response: 0.1%), leaving an average of 494 ± 11 trials of a total of 576 for further analyses. We note that the relatively high proportion of excluded trials with precontraction is due to the fact that, because of the inherent variability in participants' RTs, TMS would sometimes occur immediately before, or even during the overt response and therefore coincide with its corresponding burst in electromyographic activity.
Even though we carefully discarded any trials with muscle precontraction in the target FDI muscle, the reported effects might potentially be influenced by trials with precontraction levels that were not detected by the above exclusion criterion. To rule out this possibility, we calculated the root mean square (RMS) of the EMG signal in the 50 ms before the TMS pulse (Mars et al., 2009), which reflects such subthreshold fluctuations in the signal. This measure, instead of the MEP amplitude, was then subjected to all analyses and statistical tests reported for MEPs.
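The RMS measure can be sketched as follows, assuming the pre-pulse EMG is available as a list of samples in mV; the function name is hypothetical.

```python
import math

def rms(samples):
    """Root mean square of the EMG samples from the 50 ms before the pulse."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))
```

Because the RMS is computed on every retained trial, it can be passed through the same averaging and statistical tests as the MEP amplitudes.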
Stimulus-locked MEP analysis.
In the first analysis, MEPs were averaged according to the absolute time after trial onset at which TMS was delivered (stimulus-locked) separately for RH and LH responses. This corresponded exactly to the six TMS delivery points (Fig. 1; t1–t6). Because the six time points depended on individual RTs measured in the second training block, absolute TMS times varied across participants, but critically were comparable in relation to the RTs of each participant. In the summary graph (Fig. 3a), the points on the x-axis therefore correspond to the average time of TMS delivery across participants.
Response-locked MEP analysis.
To assess whether any difference in CSE between chosen and unchosen actions might be observed on choice trials during the decision process, and to reveal when a bias in CSE is observed with respect to the actual response on a given trial, we conducted a second analysis. Trials were now sorted according to the time between the TMS pulse and the actual response (response-locked). By experimental design (Fig. 1), we sought to avoid applying TMS too close to the response, but this could still occur on some trials with unexpectedly short RTs. Therefore, all trials in which TMS was delivered <100 ms before the response (−100 ms) were discarded, including those trials where TMS was applied during or after the response (mean number of excluded trials: 31 ± 5). This avoided ramping effects (Evarts, 1966; Starr et al., 1988; Leocani et al., 2000) but was also necessary because there was not enough data available for robust averaging in this time range. Similarly, we discarded trials in which TMS was delivered earlier than 100% of the participant's mean RT, before response (−100%), separately for FC and C (mean number of excluded trials: 40 ± 3). This only excluded trials with unusually long RTs (where TMS occurred very early with respect to the reaction time of that trial). Thus, the distance of all TMS times with respect to the response now ranged between [−average RT] and [−100 ms].
To allow for averaging across participants, we normalized TMS times with respect to the group mean RT, separately for C and FC trials. To this end, we multiplied each trial's TMS time by a constant factor (Group mean RT)/(Participant mean RT) (range: 0.66–1.29 for C; 0.76–1.17 for FC). Consequently, TMS times for all participants with respect to the response now fell between [−group RT] and [∼−100 ms].
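The exclusion and rescaling steps of the response-locked analysis can be sketched as follows. Times are expressed relative to the response, so negative values mean the pulse preceded the response. The function and argument names are hypothetical, and the handling of trials falling exactly on a cutoff is an assumption.

```python
def response_locked_times(tms_ms, rt_ms, participant_mean_rt, group_mean_rt):
    """Return rescaled TMS-to-response intervals (ms) for retained trials.

    tms_ms, rt_ms: per-trial TMS delivery times and RTs from trial onset.
    """
    scale = group_mean_rt / participant_mean_rt
    retained = []
    for tms, rt in zip(tms_ms, rt_ms):
        rel = tms - rt  # negative when the pulse precedes the response
        if rel > -100:
            continue  # pulse <100 ms before, during, or after the response
        if rel < -participant_mean_rt:
            continue  # pulse earlier than 100% of the mean RT before response
        retained.append(rel * scale)
    return retained
```

With a participant mean RT of 800 ms and a group mean RT of 600 ms, a pulse delivered 500 ms before the response is kept and mapped to −375 ms, whereas pulses 50 ms or 1150 ms before the response are dropped.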
This procedure led to an almost continuous distribution of TMS times, relative to the RT on a given trial, due to the inherent variability of RTs across trials. We therefore discretized these data to allow for statistical comparison and display of the CSE changes over time. Based on the time between the TMS pulse and response, MEPs of choice trials were grouped and averaged using equally spaced 60 ms bins with divisions at [−750, −690, −630, −570, −510, −450, −390, −330, −270, −210] ms. These bins were chosen such that the number of observations in each bin was approximately matched (average: 12.44 ± 0.83 MEPs) and such that no time bin included fewer than five MEPs in any participant. Importantly, the choice of bins did not change our main conclusion.
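One way to assign choice trials to these bins is sketched below. It assumes that the ten divisions define eleven bins, with the two outermost bins open-ended, and that a value lying exactly on a division belongs to the later bin. Bins are numbered backwards from the response (−11 for the earliest, −1 for the latest); the function name is hypothetical.

```python
from bisect import bisect_right

# Bin divisions in ms relative to the response, as listed in the text
EDGES_MS = [-750, -690, -630, -570, -510, -450, -390, -330, -270, -210]

def bin_label(rel_ms):
    """Map a TMS-to-response interval (ms, negative) to a bin number.

    Returns -11 for intervals before -750 ms, up to -1 for intervals
    at or after -210 ms.
    """
    idx = bisect_right(EDGES_MS, rel_ms)  # number of divisions <= rel_ms
    return idx - (len(EDGES_MS) + 1)
```

Under these assumptions, a pulse delivered 500 ms before the response falls into bin −6 (−510 to −450 ms) and a pulse delivered 300 ms before the response into bin −3 (−330 to −270 ms).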
It was critical that we used the same bins for the FC condition because this enabled a direct comparison of the exact time at which a bias in CSE between chosen and unchosen action representations can be observed. Critically, it also allowed us to test whether the bias occurred earlier in C compared with FC trials. This comparison had not previously been made due to earlier experiments lacking a no-choice control condition when reporting choice biases in motor regions (Donner et al., 2009). Here, the same bins were used for FC trials, starting at −450 ms (before the response) because overall RTs on FC trials were significantly shorter, and thus the earlier bins did not contain any data. Altogether, we therefore obtained 11 choice and six forced choice time points, which were numbered backwards according to their distance from the response (t−11 to t−1 and t−6 to t−1; Fig. 3b).
CSE changes during the decision period: isolating the decision process.
The third and most conservative analysis was performed using a much more stringent exclusion criterion for choice trials. Assuming a strictly serial processing from sensation to action, one would expect that action selection and specification will commence only once the decision is complete. In this case, any difference in CSE between chosen and unchosen actions would be trivially explained by pure action selection and specification. However, one would then not expect any difference between (future) chosen and unchosen actions in the period when the decision is being processed. We therefore now excluded all trials in which TMS might have been applied at a time when the decision process was already complete, thus ruling out any contribution from the action selection and specification process. A divergence between the CSE of chosen and unchosen actions in this pure decision period would then provide evidence that action selection is influenced before the decision process is complete.
One can estimate the time necessary for action selection and specification independent of the decision process from FC trials. Because FC trials do not require a value-decision, the time of first CSE divergence, relative to the time when the overt response is made, provides an estimate for the time necessary for action selection and specification. This assumes that this time is comparable in C and FC trials. Following this logic, excluding any MEPs on C trials that were recorded at a time when CSE was biased on FC trials (going backwards from the response) conservatively controls for simple action selection and specification, leaving only those choice trials where TMS was applied during stimulus or decision processing (for a graphic display of this procedure, see Fig. 4c).
To obtain the time at which a bias in CSE first occurs on FC trials, while enabling comparison across trials with different RTs, the time of TMS delivery was expressed as the percentage of a trial's RT. Zero percent now corresponded to trial start and 100% to the response. This showed that, on FC trials, CSE first distinguished between chosen and unchosen action at 38% of the FC-RT (Fig. 4a), and therefore indicates the time at which action selection processes start. For averaging, we chose the largest number of equally sized bins that would still lead to at least five data points per bin. This allowed for robust statistical comparisons. The time at which the bias was first observed (38%) was approximately consistent across other choices of bins.
Therefore, the absolute time (in ms) that corresponded to the last 62% (100% − 38%) of the FC-RT was regarded as the action selection period and was discarded from C trials in this conservative analysis; only if TMS was applied before this cutoff time was the trial included in the analysis (Fig. 4c). This was done for each participant based on their individual mean FC-RT (from the main experiment). On average, 86 ± 8 trials were additionally excluded based on this criterion, leaving an overall number of 409 ± 16 trials (of 576). For display purposes, we rescaled the remaining time of a given trial (i.e., RT − 0.62*RT_FC) to 100%. Thus, the TMS time of each choice trial, i, was now expressed between stimulus onset (0%) and the end of what we considered the decision period: TMS_%,i = TMS_ms,i/(RT_ms,i − 0.62*RT_FC). Data were averaged across six equally large bins (Fig. 4b).
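The conservative exclusion and rescaling can be sketched as a single function. The name is hypothetical, the result is expressed in percent (the ratio above multiplied by 100), and returning None for excluded trials is an illustrative convention.

```python
def decision_period_percent(tms_ms, rt_ms, mean_fc_rt, action_fraction=0.62):
    """Express a choice trial's TMS time as a percentage of the decision period.

    The decision period is taken to end at RT - 0.62 * mean FC-RT.
    Returns None if the pulse fell at or after that cutoff (trial excluded).
    """
    decision_end = rt_ms - action_fraction * mean_fc_rt
    if decision_end <= 0 or tms_ms >= decision_end:
        return None
    return 100.0 * tms_ms / decision_end
```

For a hypothetical trial with an RT of 800 ms and a participant mean FC-RT of 500 ms, the decision period ends at 490 ms, so a pulse at 200 ms maps to roughly 41%, while a pulse at 500 ms is excluded.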
Relating value-difference to changes in CSE.
We next assessed whether CSE would be biased according to the evidence that a chosen action is better than an unchosen alternative action. In other words, the strength of the biasing influences motor cortex receives from regions processing the decision should depend on how much more valuable the chosen action is compared with its alternative. Thus, MEPs for an RH response should be large when the value for the RH option is clearly larger than that of the LH option, but it should be smaller when the two options have more similar values. In contrast, when the LH is chosen, MEPs from the left hemisphere should be more suppressed the larger the difference in value between LH and RH actions. In brief, there should be a positive correlation between value-difference and MEP size for RH responses, and a negative correlation between value-difference and MEP size for LH responses.
To test this prediction, we used the modeled subjective values of each participant to calculate the difference in value between the chosen and unchosen options for each trial. This value-difference was then regressed against MEP size using a linear fit separately for RH and LH responses. To maximize statistical power, this analysis included all MEPs for which a bias was detectable in the stimulus-locked analysis (t2–t6; see Results, below). Because wrong choices are likely to be due to noise in the decision process (Cisek, 2006), only those trials were included in which the subjectively better option was chosen (average number of MEPs included in this analysis: 107 ± 5 RH, 127 ± 5 LH). The resulting slopes of all subjects were subjected to a t test, again separately for RH and LH.
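A minimal sketch of this two-stage analysis follows: a least-squares slope per participant and hand, then a one-sample t statistic on the slopes across participants. The function names are hypothetical, and testing the slopes against zero is an assumption consistent with the predicted positive (RH) and negative (LH) correlations.

```python
import math
from statistics import mean, stdev

def ols_slope(value_diff, mep):
    """Least-squares slope of MEP size regressed on value-difference."""
    mx, my = mean(value_diff), mean(mep)
    sxx = sum((x - mx) ** 2 for x in value_diff)
    sxy = sum((x - mx) * (y - my) for x, y in zip(value_diff, mep))
    return sxy / sxx

def one_sample_t(slopes):
    """t statistic (df = n - 1) testing the mean slope against zero."""
    n = len(slopes)
    return mean(slopes) / (stdev(slopes) / math.sqrt(n))
```

Here ols_slope would be called once per participant for RH trials and once for LH trials, and one_sample_t once per hand on the resulting slopes.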
Statistical analysis
All statistical analyses of RTs and MEPs were performed using repeated-measures ANOVAs and t tests; all t tests were two-sided unless stated otherwise. p values in the figures denote comparisons surviving conservative Bonferroni-correction with ** (p < 0.05/6 = 0.0083 for six, p < 0.05/9 = 0.0056 for nine, and p < 0.05/11 = 0.0045 for eleven comparisons), and those significant at an uncorrected p < 0.05 with *. When sphericity assumptions were violated, results are reported with Huynh-Feldt correction.
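The corrected thresholds follow directly from dividing the significance level by the number of comparisons, as this short sketch shows; the function name is hypothetical.

```python
def bonferroni_threshold(n_comparisons, alpha=0.05):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons

# Thresholds for six, nine, and eleven comparisons, rounded to four decimals
thresholds = {n: round(bonferroni_threshold(n), 4) for n in (6, 9, 11)}
```

Rounded to four decimals, these are the values 0.0083, 0.0056, and 0.0045 quoted above.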
Results
Behavioral choices are guided by value
Reaction times were significantly longer for choice versus forced choice trials (C: 809 ± 33 ms, FC: 539 ± 17 ms, t14 = 10.29, p = 6.54 × 10⁻⁸; Fig. 2b). The RT difference between C and FC trials, ΔRT, provided an estimate of the time required to make a value-decision on choice trials in this task.
The behavioral model fits explained participants' choices well (Fig. 2c, right; Table 1). Consistent with the pattern typically reported (Wu et al., 2011; Hunt et al., 2012), we observed an overweighting of small and underweighting of large probabilities (Fig. 2c, top left), as well as a concave utility function (Fig. 2c, bottom left). According to the model-estimated values, participants chose the subjectively more valuable option on 83 ± 2% of trials. Participants were also faster for easy versus hard choices (i.e., large vs small difference in subjective expected values between the options; median split: easy: 759 ± 30 ms, hard: 850 ± 37 ms, t14 = 6.90, p = 7.27 × 10⁻⁶; Fig. 2b). This confirmed that action selection was governed by value-based decisions.
CSE distinguishes between chosen and unchosen actions before the value-decision process is complete
To test whether activity in the motor system reflects the chosen action during the decision process, single-pulse TMS was delivered to left M1 at one of six different time points (t1–t6; Fig. 1). This provided a measure of the temporal evolution of CSE. Because TMS was applied to left M1, a RH button press will be referred to as “chosen” (the stimulated hemisphere was the hemisphere chosen to make the action), and a LH button press as “unchosen” (left M1 was not used to perform the action).
A stimulus-locked analysis of changes in CSE revealed the earliest time when a difference between the two alternative actions became evident. On both FC and C trials, excitability between chosen (RH) and unchosen (LH) actions diverged ∼200 ms after trial onset and at least 300 ms before action (2 × 2 × 6 ANOVA with factors C/FC × hand × time: hand × time interaction, F(5,70) = 30.87, p < 0.001, η² = 0.16; post hoc t tests on RH vs LH: t1, pC > 0.2, pFC > 0.2; t2 to t6, all pC < 0.02, pFC < 0.02; Fig. 3a). Excitability increased for the chosen and decreased for the unchosen actions (2 × 6 ANOVA with factors C/FC × time, separately for RH and LH: effect of time, RH: F(3.9,54.2) = 9.06, p < 0.001, η² = 0.27; LH: F(1.8,24.9) = 18.99, p < 0.001, η² = 0.90; post hoc t tests against t1 for RH: t3 to t6 all p < 0.03; LH: t2 to t6 all p < 0.001), consistent with a competitive process that boosts the chosen and suppresses the unchosen action representation. This difference was stronger on FC trials (interaction C/FC × hand × time: F(5,70) = 13.08, p < 0.001, η² = 0.05; post hoc t test RH-LH in C vs FC: t3 to t6 all p < 3 × 10⁻³) where evidence for one option was unambiguously provided and shorter RTs were observed. But critically, this effect was also present on choice trials where evidence for one versus the other action resulted from the value-decision process.
We note that on choice trials, we observed a trend for MEPs of the chosen hand to decrease from t1 to t2 (p = 0.06, t14 = 1.98), after which they progressively increased. This may reflect a “hold your horses” signal that has previously been described in situations of response conflict (Frank, 2006; Aron et al., 2007; Duque and Ivry, 2009). Consistent with that idea, this trend was not observed on FC trials (p = 0.55). There is work to suggest that the subthalamic nucleus may cause this initial suppression of CSE to allow all information to be integrated before a choice is made (Frank, 2006; Aron et al., 2007). This mechanism could co-occur with that of biasing influences gradually building up toward the choice. However, because our experiment was not specifically designed to address this question, we refrain from further discussion of this observation.
Importantly, the stimulus-locked analysis does not resolve our main question of whether excitability differences between chosen and unchosen actions emerged during the decision process, rather than at the stage of action selection following completion of the decision. We tested this in two further analyses. First, in a response-locked analysis, data were grouped according to the distance between TMS pulse and response. Crucially, the same time bins were used for averaging in the FC and C conditions, which enabled a direct comparison of the time at which any bias between action representations can first be observed. This controlled for action selection and specification in the absence of choice.
The response-locked analysis revealed a significant bias for chosen versus unchosen actions starting ∼330 ms before the response in FC trials, but this bias arose even earlier when a value-based decision was required (∼510 ms before the response; 2 × 6 ANOVA C/FC × time on the data for RH-LH at t−6 to t−1 shows trend-wise interaction: F(2.5,35.1) = 2.53, p = 0.082, η² = 0.014; direct comparison of C versus FC at t−4: p = 0.08; t tests RH vs LH for C: t−11 to t−9, all pC > 0.6; t−8, pC = 0.058; t−7, pC > 0.6; t−6, pC = 0.0024; t−5, pC = 0.095; t−4 to t−1, all pC < 0.004; FC: t−6 to t−4, all pFC > 0.1; t−3 to t−1, all pFC < 0.001; Fig. 3b). Hence, the competition among alternative actions on C trials started earlier than would be expected if it merely reflected action selection following completion of the decision. This confirmed our main hypothesis, showing that activity in motor cortex distinguishes between chosen and unchosen actions before completion of the decision process.
To verify this effect, we performed a third and more conservative analysis in which we sought to isolate the decision period while ruling out any contribution from the action selection and specification process (see Materials and Methods, above).
The TMS time of FC trials was expressed as a percentage of the individual trial's RT. This revealed that excitability for chosen and unchosen options diverged at ∼38% of the RT on FC trials—and thus was present during the final 62% of the RT (Fig. 4a). This was taken as an estimate for the time during which CSE reflects action selection processes that are independent of a decision process.
To examine whether CSE distinguishes between chosen and unchosen actions before the decision process is complete on choice trials, we excluded choice trials in which TMS application fell within the part of the trial corresponding to the final 62% of the FC-RT (in absolute ms). Consequently, trials with fast RTs, in which TMS occurred relatively close to the overt action, were more likely to be discarded (Fig. 4c); the exact cutoff depended on each participant's average FC-RT. This ruled out the trivial explanation that an effect in CSE reflected pure action selection because TMS had been applied at a time when the decision was already complete. On all remaining trials, TMS should have occurred before or during the decision process, but not after. Critically, after this conservative correction, CSE could still distinguish between chosen and unchosen actions (2 × 6 ANOVA: interaction hand × time F(3.8,52.7) = 5.6, p < 0.001, η² = 0.13; post hoc t tests: t4, p = 0.004; t5, p = 0.014; t6, p = 0.004; Fig. 4b).
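One reading of this exclusion criterion can be sketched as follows. The sketch assumes that the excluded window is the final 62% of a participant's mean FC-RT, measured backward from the response on each choice trial; the function names and example values are hypothetical.

```python
from statistics import mean

FC_DIVERGENCE_FRACTION = 0.38  # from the text: divergence at ~38% of the FC-RT


def action_selection_window_ms(fc_rts_ms, divergence_fraction=FC_DIVERGENCE_FRACTION):
    """Time before the response attributed to action selection (final ~62% of mean FC-RT)."""
    return (1.0 - divergence_fraction) * mean(fc_rts_ms)


def keep_choice_trial(rt_ms, tms_ms, window_ms):
    """Keep a choice trial only if TMS preceded the response by more than window_ms."""
    return (rt_ms - tms_ms) > window_ms


fc_rts = [480, 500, 520]  # made-up FC reaction times (ms) for one participant
window = action_selection_window_ms(fc_rts)  # 62% of the 500 ms mean FC-RT
choice_trials = [(900, 400), (600, 400), (800, 500)]  # (RT, TMS time) in ms
kept = [t for t in choice_trials if keep_choice_trial(t[0], t[1], window)]
```

In this toy example only the slowest trial survives, which mirrors the observation that trials with fast RTs were more likely to be discarded.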
CSE correlates with subjective value-difference
A final analysis revealed that the CSE difference between chosen and unchosen action representations was likely to be value-driven, and thus reflected the evidence for the chosen versus the unchosen action. Based on recent direct recordings from dorsal premotor cortex in nonhuman primates (Pastor-Bernier and Cisek, 2011), we predicted that excitability should correlate with relative expected value, i.e., the difference in value between chosen and unchosen options. We thus tested for a correlation between value-difference and CSE using the data from time points t2–t6 shown in Figure 3a.
We found that CSE related to relative subjective value (value-difference) separately for chosen and unchosen effectors. On average, CSE of chosen actions correlated positively with value-difference, and significantly more positively than CSE of unchosen actions (average slope for RH: 0.7 ± 0.25, t14 = 2.84, p = 0.007; average difference in slope RH vs LH: 1 ± 0.35, t14 = −2.85, p = 0.006; all t tests reported in this section are one-sided because of our a priori hypotheses; Fig. 5). CSE of unchosen actions showed a trend to correlate negatively with value-difference (average slope: −0.3 ± 0.22, t14 = −1.38, p = 0.094). None of these effects were observed when considering the relative differences between expected probability or expected magnitude (all p > 0.1). Effects were weaker when considering the absolute subjective expected value, i.e., the value of just the chosen option (p = 0.27, p = 0.08, and p = 0.03 for the same three one-sided t tests). This suggests that competition for action is relative and depends on a weighting of current decision variables (here the product of probability and magnitude). Thus, CSE reflects the accumulated evidence for how much an action is worth taking relative to its alternative.
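The logic of the slope analysis can be sketched as below, assuming a per-participant least-squares slope of CSE on value-difference followed by a one-sample t statistic across participants. The exact regression used in the study is not restated in this section, so the helper names and all numbers here are illustrative only.

```python
import math
from statistics import mean, stdev


def expected_value(probability, magnitude):
    """Subjective expected value as the product of probability and magnitude."""
    return probability * magnitude


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def one_sample_t(values):
    """t statistic (mean divided by its standard error) and degrees of freedom."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n)), n - 1


# Made-up single participant: value-difference per trial and chosen-hand CSE.
value_diff = [
    expected_value(0.8, 5) - expected_value(0.5, 6),
    expected_value(0.9, 6) - expected_value(0.4, 6),
    expected_value(0.7, 10) - expected_value(0.5, 4),
]
cse_chosen = [1.1, 1.3, 1.6]
slope = ols_slope(value_diff, cse_chosen)

# Made-up slopes from three hypothetical participants.
t_stat, df = one_sample_t([0.5, 1.0, 0.6])
```

A one-sided p value would then follow from the t distribution with `df` degrees of freedom (for instance via `scipy.stats.t.sf(t_stat, df)`). If the ± values above denote the standard error of the mean, which is an assumption, the reported statistics are consistent with this computation, e.g., 0.7/0.25 ≈ 2.8.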
Of note is that in all but two participants, we found an additional correlation between reaction times and value-difference (all p < 0.01 for both hands). This is expected because a faster accumulation of decision evidence for large versus small value-difference leads to faster reaction times (Fig. 2b). Consequently, the correlation with CSE might have been driven by the decision evidence (i.e., value-difference), increased action preparation (i.e., shorter reaction times) resulting from larger value-differences, or both. Restricting our analysis to the decision period (Fig. 4b, shaded region) reduced the correlations between value-difference and CSE (p = 0.15 for RH, p = 0.23 for LH), though we note that in this analysis only 15% of choice trials remained.
CSE changes during value-decisions are not caused by muscle precontraction
Even though trials with muscle precontraction in the target muscle were discarded from all analyses, we wanted to ensure that the reported excitability effects were not caused by trials with subthreshold precontraction not detected by our exclusion criterion. We therefore repeated all analyses and statistical tests reported for MEPs on the RMS (see Materials and Methods, above). There was an effect of hand × time and C/FC × hand × time (both p < 0.001) in the ANOVA of the first nonconservative stimulus-locked analysis (compare Fig. 3a), where TMS close to the response was not conservatively controlled for. However, further post hoc comparisons revealed that this effect was caused by a significant difference in the RMS between chosen and unchosen hand only at the last time point, t6. This could therefore not have caused the very different effect in CSE observed at the earlier time points. Furthermore, and most importantly, none of the other more conservatively controlled effects reported for the MEP data reached significance when performed on the RMS. This shows that muscle precontraction was not driving any of the critical reported MEP effects.
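As a brief illustration of the control measure, the sketch below assumes that RMS denotes the root mean square of the EMG in a window preceding the TMS pulse. The window itself is defined in Materials and Methods and is not restated here; the sample values are made up.

```python
import math
from statistics import mean


def rms(samples):
    """Root mean square of an EMG segment (any consistent unit)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Made-up pre-stimulus EMG segments, grouped by whether the hand was later chosen.
pre_tms_emg = {
    "chosen": [[3.0, -4.0, 3.0, -4.0], [1.0, -1.0, 1.0, -1.0]],
    "unchosen": [[2.0, -2.0, 2.0, -2.0], [0.0, 0.0, 0.0, 0.0]],
}
mean_rms = {role: mean(rms(seg) for seg in segs) for role, segs in pre_tms_emg.items()}
```

Running the same statistical tests on such RMS values instead of MEP amplitudes indicates whether background muscle activity alone could reproduce an effect.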
Discussion
Despite intensive research on the brain processes concerning either decision making or the preparation and selection of actions, it is much less clear how the two processes are linked and how our decisions are transformed into actions (Freedman and Assad, 2011; Padoa-Schioppa, 2011). We investigated how decision making informs the competitive process through which actions are selected using corticospinal excitability measures from human motor cortex. We show that this competition arises during the decision process and relates to the expected value of the action. Our main findings are discussed in detail below.
Our data show that CSE progressively increased for the chosen and decreased for the unchosen action, consistent with a competitive process among alternative action representations (Cisek, 2006). Previous work has shown evidence for response competition at the level of human M1 for purely perceptual decisions (Michelet et al., 2010) but it was unclear whether a similar mechanism can account for internally generated, subjectively motivated actions such as those driven by expected values. Theory predicts that alternative action representations will be boosted (or suppressed) by the affordances guiding the selection of the appropriate action (“affordance competition”) (Cisek, 2006), whether perceptual or value-related (Basten et al., 2010). This is of broader interest because it points toward similar physiological principles for the selection of representations in perception and action. Similar competition processes have previously been described in the visual system (Desimone and Duncan, 1995).
Importantly, we observed that this competition between representations of chosen and unchosen actions occurred parallel in time with the value-based decision process, which was the main hypothesis tested in this study. Recent models that unify action, perception, and learning (Cisek, 2006; Cisek and Kalaska, 2010; Friston et al., 2010) make exactly this prediction, and current evidence indeed suggests an intimate temporal relationship between decision making and action selection (Donner et al., 2009; Resulaj et al., 2009; Hare et al., 2011). However, direct evidence supporting a simultaneous specification of actions during a decision process has thus far been lacking. In particular, it has not been directly tested whether physiological changes in (pre-)motor regions before an action might simply reflect motor preparatory processes that follow the decision. Here we used a novel approach that measured the temporal evolution of CSE during the course of a trial using TMS, and compared this temporal evolution in a choice and no-choice context. We provide evidence that CSE is biased earlier, with respect to movement onset, in choice compared with no-choice contexts. This indicates that the decision process influences action selection via continuous updates, i.e., parallel in time with the decision process. The processing of decision variables guiding behavior therefore immediately influences the specification of the resulting behavior. This contrasts with situations where decisions are abstract and thus not immediately linked to any action (Wunderlich et al., 2010; Padoa-Schioppa, 2011).
It could be argued that the early divergence of CSE on choice trials just reflects the beginning of action preparation and selection, which may take longer on choice compared with forced choice trials. This is entirely consistent with our interpretation: the CSE changes observed during the decision period on choice trials are only functionally meaningful if they contribute to the preparation and selection of the required action. These CSE changes are thus the earliest expression of motor preparation, but critically they are triggered by the underlying decision process and reflect the currently available decision evidence. This is different from the effects of action preparation and selection observed on forced choice trials, where the evidence for the forthcoming action is unambiguous and does not require a time-consuming decision process.
Our second finding suggests that CSE on choice trials changes in relation to expected value. Converging evidence from work in rodents and nonhuman primates indeed shows that variables relevant for value-decisions influence activity in premotor areas (Roesch and Olson, 2003; Pastor-Bernier and Cisek, 2011; Sul et al., 2011). So far, this evidence has been lacking in humans (but for a task where participants did not make choices, see Kapogiannis et al., 2008). Based on our current data, we speculate that value-related signals can be observed from human CSE recordings during decision making and reflect the influence of the value computation that motivates behavior. However, in the present case, both reaction times and CSE were related to value-difference, so that the respective contributions of these variables remain to be determined. Nevertheless, we note that changes in CSE were better explained by the value-difference between the options than by the absolute value chosen on a given trial. The computation of a value-difference signal is critical for the comparison of choice options and thus the ultimate decision computation, and is thought to rely on specialized regions in medial and lateral prefrontal cortex (Boorman et al., 2009; Wunderlich et al., 2010; Hare et al., 2011; Hunt et al., 2012).
The presence of value signals in action representations recorded from motor cortex should however not be interpreted as evidence that this region is concerned with the computation or comparison of values. Instead, these signals are likely to reflect an accumulation of evidence for a decision that is taking place elsewhere. We speculate that the motor system has an active role in action selection in the sense that it continuously predicts what the most likely course of action will be, based on the evidence received from other regions. Although not directly tested here, this would be consistent with our data and current theory about hierarchical dynamical models of brain function (Friston et al., 2010).
The anatomical pathways via which decision evidence is broadcast to motor and premotor cortices require further investigation. Decisions of hand choice, for example, have been shown to rely on influences from posterior parietal cortex (Oliveira et al., 2010). In contrast, perceptual decisions are likely to be driven by evidence originating in regions such as intraparietal (Shadlen and Newsome, 2001) and dorsolateral prefrontal (Heekeren et al., 2004, 2006; Philiastides et al., 2011) cortices. For the computation of value signals, the relevant decision variable in our task, several key brain regions have been identified (for review, see Kable and Glimcher, 2009; Rangel and Hare, 2010). A likely candidate for exerting value-related influences on (pre-)motor regions is the ventromedial prefrontal cortex (VMPFC) because of its suggested role in value computation and comparisons (Boorman et al., 2009; Hare et al., 2011). VMPFC has indirect anatomical connections to motor cortices via the anterior cingulate sulcus (Dum and Strick, 1991; Price, 2005), or alternatively via the ventral striatum, which can influence motor cortex via frontostriatal loops (Middleton and Strick, 2000; Haber et al., 2006). Other candidate regions are the lateral prefrontal and parietal cortices given their suggested role in the selection and implementation of value-decisions (Kable and Glimcher, 2009; Rangel and Hare, 2010).
The single-pulse TMS approach used here does not allow us to identify whether the observed changes in CSE arise via corticocortical or basal ganglia pathways, and future work will need to examine the specific contributions from these regions during decisions-for-actions. For example, Neubert et al. (2010) recently used a combined double-coil and diffusion tensor imaging approach, showing that short-latency and longer latency influences from inferior frontal gyrus/presupplementary motor area to M1 during action reprogramming were mediated by corticocortical and subcortical pathways, respectively. Moreover, our approach did not distinguish between the specific intracortical and spinal contributions to the observed changes in CSE. Paired-pulse TMS techniques can dissociate these (Kujirai et al., 1993) and furthermore help to identify the specific intracortical circuits mediating the observed CSE effects. For example, different protocols can identify excitability changes in intracortical GABAA and GABAB circuits (Di Lazzaro et al., 1998, 2000, 2002). These could provide selective targets for the value signals broadcast to motor cortex.
In conclusion, we have demonstrated that decision processes influence the human motor system in a temporally parallel fashion, consistent with recent models for decision and action (Cisek, 2006). We provide the first direct physiological evidence in humans that internally generated processes such as value-based decisions shape our actions in a competitive and effector-specific way even before the decision process is complete.
Footnotes
This study was supported by the Wellcome Trust (086120/Z08/Z), the European Research Council (ActSelectContext, 260424), and the Biotechnology and Biological Sciences Research Council.
The authors declare no financial conflicts of interest.
Correspondence should be addressed to Miriam C. Klein-Flügge, Sobell Department of Motor Neuroscience and Movement Disorders, 33 Queen Square, WC1N 3BG London, United Kingdom. m.klein{at}ucl.ac.uk