Abstract
It has been shown that when incentives are provided during movement preparation, activity in parieto-frontal regions reflects both expected value and motivational salience. Yet behavioral work suggests that rewards are processed faster than punishments, raising the possibility that expected value and motivational salience manifest at different latencies during movement planning. Given the role of beta oscillations (13–30 Hz) in movement preparation and in communication within the reward circuit, this study investigated how beta activity is modulated by positive and negative monetary incentives during reach planning, and in particular whether it reflects expected value and motivational salience at different latencies. Electroencephalography was recorded while male and female humans performed a reaching task in which reward or punishment delivery depended on movement accuracy. Before a preparatory delay period, participants were informed of the consequences of hitting or missing the target, according to four experimental conditions: Neutral (hit/miss:+0/−0¢), Reward (hit/miss:+5/−0¢), Punish (hit/miss:+0/−5¢) and Mixed (hit/miss:+5/−5¢). Results revealed that beta power over parieto-frontal regions was strongly modulated by incentives during the delay period, with power positively correlating with movement times. Interestingly, beta power was selectively sensitive to potential rewards early in the delay period, after which it came to reflect motivational salience as movement onset neared. These results demonstrate that beta activity reflects expected value and motivational salience on different time scales during reach planning. They also provide support for models that link beta activity with basal ganglia and dopamine for the allocation of neural resources according to behavioral salience.
SIGNIFICANCE STATEMENT The present work demonstrates that pre-movement parieto-frontal beta power is modulated by monetary incentives in a goal-directed reaching task. Specifically, beta power transiently scaled with the availability of rewards early in movement planning, before reflecting motivational salience as movement onset neared. Moreover, pre-movement beta activity correlated with the vigor of the upcoming movement. These findings suggest that beta oscillations reflect neural processes that mediate the invigorating effect of incentives on motor performance, possibly through dopamine-mediated interactions with the basal ganglia.
Introduction
Several studies have shown that the value of a target affects the vigor of saccades (Xu-Wilson et al., 2009; Manohar et al., 2015; Reppert et al., 2015) and upper limb movements (Opris et al., 2011; Sackaloo et al., 2015; Summerside et al., 2018). Consistent with this, considerable evidence from monkey electrophysiology has revealed that regions of the dorsal parieto-frontal cortex, which directly contribute to goal-directed movements (Andersen and Cui, 2009), are modulated by incentives during movement preparation (Platt and Glimcher, 1999; Roesch and Olson, 2003, 2004; Musallam et al., 2004; Leathers and Olson, 2012). An important issue over the past 20 years has been to understand what these modulations represent.
Following the seminal work of Platt and Glimcher (1999), showing that spiking activity in the parietal cortex scaled with the reward a monkey could expect from an impending saccade, several studies have demonstrated that parietal neurons encode the gain that is expected from an upcoming action (i.e., expected value; Musallam et al., 2004; Sugrue et al., 2004; Yang and Shadlen, 2007). Most of these studies, however, only used positive incentives, making it unclear whether neural activity reflected the signed value of a prospective outcome or its behavioral importance (i.e., motivational salience), which would increase both with rewards and punishments (Bromberg-Martin et al., 2010). Indeed, avoiding a punishment may mobilize action preparatory mechanisms as much as obtaining a reward. In line with this, other studies have shown that some neurons in the parietal (Leathers and Olson, 2012) and premotor cortex (Roesch and Olson, 2004) respond similarly to reward- and punishment-predicting cues, suggesting that motivational salience is also encoded within the parieto-frontal network during movement preparation.
Recent neuroimaging studies in humans have largely supported the above-mentioned work in monkeys, providing evidence for the encoding of both expected value and motivational salience in parietal (Iyer et al., 2010; Kahnt and Tobler, 2013; Kahnt et al., 2014; Barbaro et al., 2017) and premotor cortex (Iyer et al., 2010). Because of the modest temporal resolution of functional magnetic resonance imaging (fMRI), however, the time course of the two representations has remained unclear. An interesting possibility is that expected value and motivational salience may be expressed at different moments during movement preparation. Support for this comes from behavioral work by Chapman et al. (2015), who demonstrated that when forced to respond rapidly (reaction times ≤425 ms), action selection is quicker when the best possible outcome entails obtaining a reward compared with evading a punishment. Importantly, however, this reward-related processing advantage disappeared when response initiation was delayed by 500 ms, suggesting that when there are no net positive options, it takes extra time for the brain to implement rational behavior. In light of this temporal processing asymmetry between rewards and punishments, it could be hypothesized that neural activity subtending reach preparation may initially represent expected value, but gradually come to reflect motivational salience given sufficient time.
The present study tested this hypothesis by exploiting the high temporal resolution of electroencephalography (EEG). Focus was put on the beta frequency band (13–30 Hz), because a reduction in beta power has long been linked to action preparation (Jenkinson and Brown, 2011; Kilavik et al., 2013). Furthermore, beta activity has been shown to be modulated by reward-predicting cues in nonreaching tasks (Bunzeck et al., 2011; Doñamayor et al., 2012; Apitz and Bunzeck, 2014; Meyniel and Pessiglione, 2014) and has been linked with the upregulation and downregulation of dopaminergic activity (Jenkinson and Brown, 2011), which mediate motivation and movement vigor (Niv et al., 2007; Bromberg-Martin et al., 2010; Berke, 2018). Results revealed that monetary incentives strongly modulate beta activity over bilateral parieto-frontal scalp sites, with power positively correlating with movement times. Interestingly, beta power was selectively sensitive to potential rewards early during movement planning, after which it came to reflect motivational salience as movement initiation neared.
Materials and Methods
Participants
Twenty-three university students [16 females, 22 ± 1 (mean ± SD) years old] with normal or corrected-to-normal vision took part in the experiment. All were right-handed based on self-report and were free of any known neurological condition. Before agreeing to take part in the study, participants were informed that they would begin the experiment with a monetary compensation of $20 CAD, but that the earnings/losses incurred during the experiment would be added/subtracted from the $20. In all cases, participants finished the experiment with a net monetary gain averaging $19.30 ± 0.70. Before participation, participants gave their informed written consent. All procedures were approved by the Université de Sherbrooke institutional review board and ethics committee. Some data from this experiment have been used to address a separate question, namely how target hits and misses impact post-movement EEG activity (Hamel et al., 2018).
Experimental setup
The experimental setup consisted of a table-mounted steel armature, which supported a 20 inch computer monitor (Dell P1130) that projected visual stimuli onto a semi-silvered mirror positioned in front of participants (Fig. 1A). The monitor (resolution: 1024 × 768; refresh rate: 150 Hz) was mounted face down 29 cm above the semi-silvered mirror, which was itself positioned 29 cm above the table surface. Participants' hand movements were recorded by way of a custom-built two-joint manipulandum composed of two lightweight metal rods, which lay on the table surface below the mirror. To move the manipulandum, participants used a short steel handle located at its mobile end. Two potentiometers, located at the manipulandum's hinges, allowed for the recording of movement-induced changes in rod angle at 100 Hz, from which planar hand displacements were measured. To minimize friction between the manipulandum and table surface, a smooth plastic sheet was fixed to the table and felt pads were affixed beneath the manipulandum's hinges. This setup allowed participants to view the visual stimuli in the same plane as their hand. Moreover, because all experiments were conducted in the dark, the semi-silvered mirror prevented participants from seeing their hand during the experiment. This setup has been used in previously published work (Benazet et al., 2016; Canaveral et al., 2017; Hamel et al., 2017; Hamel-Thibault et al., 2018).
Experimental task
Overview.
Participants were required to produce right-handed reaching movements toward visual targets without online visual feedback of their hand. All movements were initiated from the same starting position (30 cm in front of participants' midline), which was identified with a visual landmark (gray circle, diameter: 2 cm; Fig. 1B). On each trial, reaches were made to a visual target located 10 cm away from the starting position. There were three possible target locations to prevent movement repetitiveness: either straight ahead of participants' midline or offset by ±4° with respect to midline. The visual targets were circular and consisted of an inner and an outer region. The inner region was the target goal to be achieved (i.e., the area in which the participant's hand had to land in order for the trial to be deemed successful). The size of the outer region was identical across participants and conditions (diameter 2.47 cm). The purpose of having an inner and an outer region, both of which were color coded, was to inform participants of the monetary consequences associated with landing on or off target (see details in Manipulation of INCENTIVE section). In addition to manipulating monetary incentives (INCENTIVE), the probability of success (PROBABILITY) was also manipulated by changing the size of the inner region, which could be large or small (see details below).
Trial timeline.
The task used in this study resembled the monetary incentive delay task described by Knutson et al. (2000). All trials were initiated when participants actively brought their hand into the starting position. If this position was maintained for 500 ms, the visual target, which indicated the monetary consequences of hitting/missing the target on that specific trial, was presented. This was followed by a 2000 ms delay period, at the end of which an auditory tone was heard, instructing participants to initiate their reach (i.e., go-cue). To emphasize movement accuracy, participants were not instructed to react as fast as possible after the go-cue, but to initiate their movements whenever they felt ready after the go-cue (albeit as consistently as possible throughout the experiment). However, they were required to (1) produce straight movement trajectories, (2) avoid online movement corrections, and (3) keep movement times in the vicinity of 300 ms. To ensure proper behavior, participants underwent a 30 trial familiarization session, which was prolonged until they felt they could meet the aforementioned requirements with consistency. During all phases of the experiment (i.e., familiarization, pre-experiment, and main experiment), whenever movement times exceeded 350 ms on three consecutive trials, participants were instructed, during the intertrial interval, to increase their movement speed. This ensured that movement times did not slow down during the experiment. Immediately at movement end, the target was extinguished and replaced with a red fixation cross centered at the same location, such that participants could maintain fixation and avoid saccades. Simultaneously, feedback concerning task performance was presented. Specifically, the incurred monetary outcome was displayed (i.e., “0¢” or “±5¢”), along with a green circular cursor (diameter: 0.58 cm) representing participants' final hand position.
For a trial to be considered a “hit”, at least one of the cursor's pixels had to overlap with the area corresponding to the extinguished target's inner region. Failure to attain this criterion resulted in a target “miss”. Participants were required to maintain their final hand position for 1500 ms after movement end, after which all but the cursor and starting position were extinguished, prompting participants to bring their hand back to the starting position for the next trial. The cursor was kept visible during this intertrial interval to help participants bring their hand back to the starting position, after which it was extinguished.
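The hit criterion described above can be sketched geometrically. The sketch below assumes that both the cursor and the inner region can be treated as ideal circles, so that "at least one overlapping pixel" becomes "center distance smaller than the sum of the two radii"; the actual pixel-based implementation is not described in the text, and the function and parameter names are hypothetical.

```python
import math

CURSOR_DIAMETER_CM = 0.58  # cursor diameter reported in the text


def is_hit(hand_xy, target_xy, inner_diameter_cm,
           cursor_diameter_cm=CURSOR_DIAMETER_CM):
    """Return True when the cursor circle overlaps the inner target region.

    Assumption: cursor and inner region are ideal circles, so they overlap
    when the distance between their centers is below the sum of their radii.
    """
    distance = math.hypot(hand_xy[0] - target_xy[0],
                          hand_xy[1] - target_xy[1])
    return distance < (inner_diameter_cm + cursor_diameter_cm) / 2.0


# Example with the 1 cm inner target used in the pre-experiment:
# the sum of radii is 0.79 cm, so 0.7 cm away is a hit and 1.0 cm is a miss.
near = is_hit((0.7, 0.0), (0.0, 0.0), inner_diameter_cm=1.0)
far = is_hit((1.0, 0.0), (0.0, 0.0), inner_diameter_cm=1.0)
```

Under this circular approximation, the effective hit radius is larger than the inner region itself, because any part of the cursor touching the region counts.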
Manipulation of INCENTIVE.
On each trial, the colors of the inner and outer regions of the target informed participants of the monetary consequences associated with landing on or off target, according to four conditions (Fig. 1C). In the Neutral condition (inner region, gray; outer region, gray), target hits and misses were neither rewarded nor punished (0¢). In the Reward condition (inner region, green; outer region, gray), target hits were rewarded with a 5¢ monetary gain, whereas target misses yielded no monetary consequence. In the Punish condition (inner region, gray; outer region, red), target hits yielded no monetary consequence, but target misses were punished with a 5¢ monetary loss. Finally, in the Mixed condition (inner region, green; outer region, red), target hits were rewarded with a 5¢ monetary gain, whereas target misses were punished with a 5¢ monetary loss. Importantly, participants were instructed to try their best to land on the target on every trial, regardless of condition.
Manipulation of PROBABILITY.
In addition to INCENTIVE, the probability of landing on target (i.e., PROBABILITY) was manipulated. This was based on multiple lines of evidence showing that the probability of obtaining a reward modulates neural activity during action planning (Platt and Glimcher, 1999; Musallam et al., 2004; Sugrue et al., 2004; Yang and Shadlen, 2007). To induce a high and a low probability of hit in the present study, the size of the inner target was manipulated to yield either a 30% or a 70% probability of landing on the target. To determine the inner target size needed to achieve this, participants took part in a pre-experiment before the main experiment (i.e., on the same day), which was free of monetary consequences. Specifically, after the familiarization session, participants completed 120 trials of the experimental task with the inner target diameter set at 1 cm. The movement endpoint data for each individual participant were then fitted with ellipses to determine the inner target diameter needed to achieve success rates of 30% (0.84 ± 0.21 cm) and 70% (1.49 ± 0.36 cm) in the main experiment. Analyses confirmed that this procedure effectively yielded different success rates for Low (40.0 ± 0.1%) and High (75.9 ± 0.1%) PROBABILITY conditions (paired-samples t test: t(22) = 19.52, p = 2.21 × 10⁻¹⁵). Once the pre-experiment was completed, the EEG cap was fitted to the participant's head and the main experiment started.
The experiment consisted of 8 blocks of 72 trials (i.e., 576 trials). There were 144 trials in each of the four levels of the INCENTIVE factor (i.e., Neutral, Reward, Punish, Mixed). Half of the trials at each INCENTIVE factor level were presented with the small target (i.e., Low PROBABILITY), whereas the other half were presented with the big target (i.e., High PROBABILITY). Hence overall, 72 trials were collected in each of the eight experimental conditions [INCENTIVE (4 levels) × PROBABILITY (2 levels)]. Trial ordering was pseudorandomized, such that each condition was presented nine times in each block.
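The trial counts above can be checked with a short sketch that builds one possible pseudorandomized session. It assumes that "pseudorandomized" means a plain shuffle within each block of nine repetitions of the eight conditions; any further ordering constraints, as well as the assignment of the three target locations, are not specified in the text and are left out. All names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

INCENTIVES = ("Neutral", "Reward", "Punish", "Mixed")
PROBABILITIES = ("Low", "High")
N_BLOCKS = 8
REPEATS_PER_BLOCK = 9  # each of the 8 conditions appears 9 times per block


def make_block(rng):
    """One block: 8 conditions x 9 repetitions = 72 trials, shuffled."""
    block = list(product(INCENTIVES, PROBABILITIES)) * REPEATS_PER_BLOCK
    rng.shuffle(block)
    return block


def make_session(seed=0):
    """Eight independently shuffled blocks (seed is arbitrary)."""
    rng = random.Random(seed)
    return [make_block(rng) for _ in range(N_BLOCKS)]


session = make_session()
all_trials = [trial for block in session for trial in block]
per_condition = Counter(all_trials)                    # 72 per condition
per_incentive = Counter(inc for inc, _ in all_trials)  # 144 per INCENTIVE level
```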
Expected neural responses to INCENTIVE and PROBABILITY.
It should be noted that the INCENTIVE and PROBABILITY factor levels allowed for the disentangling of expected value versus motivational salience encoding during the delay period. Given that expected value reflects the true value of prospective outcomes (Iyer et al., 2010), it should always be higher in contexts offering a greater reward opportunity. Therefore, expected value should be highest in Reward and lowest in Punish, with Neutral and Mixed being intermediate and depending on PROBABILITY level as follows:

\[
\text{Expected value} = (\text{Gain} \times p_{\text{hit}}) + (\text{Loss} \times p_{\text{miss}}) \tag{1}
\]

where Gain and Loss are the signed monetary outcomes of a hit and a miss, respectively, and \(p_{\text{hit}}\) and \(p_{\text{miss}} = 1 - p_{\text{hit}}\) are their probabilities.

In contrast to expected value, motivational salience refers to the absolute value of prospective outcomes (Iyer et al., 2010), implying that it should be highest when monetary stakes are large and lowest when they are small. Therefore, motivational salience should be highest in Mixed and lowest in Neutral, with Reward and Punish being intermediate and depending on PROBABILITY level as follows:

\[
\text{Motivational salience} = (|\text{Gain}| \times p_{\text{hit}}) + (|\text{Loss}| \times p_{\text{miss}}) \tag{2}
\]
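A minimal numerical sketch of the two models may help make the predicted orderings concrete. It assumes the standard probability-weighted forms (signed outcomes for expected value, absolute outcomes for motivational salience) and uses the nominal 30% and 70% hit probabilities; the function names and the `OUTCOMES` table are hypothetical.

```python
# Signed monetary outcomes in cents: (outcome of a hit, outcome of a miss).
OUTCOMES = {
    "Neutral": (0.0, 0.0),
    "Reward": (5.0, 0.0),
    "Punish": (0.0, -5.0),
    "Mixed": (5.0, -5.0),
}


def expected_value(p_hit, hit_outcome, miss_outcome):
    """Probability-weighted signed value of the prospective outcomes."""
    return p_hit * hit_outcome + (1.0 - p_hit) * miss_outcome


def motivational_salience(p_hit, hit_outcome, miss_outcome):
    """Probability-weighted absolute value of the prospective outcomes."""
    return p_hit * abs(hit_outcome) + (1.0 - p_hit) * abs(miss_outcome)


def ranked(model, p_hit):
    """Condition names sorted from the highest to the lowest prediction."""
    scores = {name: model(p_hit, *vals) for name, vals in OUTCOMES.items()}
    return sorted(scores, key=scores.get, reverse=True)


ev_high = ranked(expected_value, 0.7)          # High PROBABILITY
ev_low = ranked(expected_value, 0.3)           # Low PROBABILITY
sal_high = ranked(motivational_salience, 0.7)
sal_low = ranked(motivational_salience, 0.3)
```

Under these assumptions, Reward and Punish stay at the extremes of expected value while Neutral and Mixed swap places between probability levels; for motivational salience, Mixed and Neutral stay at the extremes while Reward and Punish swap.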
Movement-related data recording and analysis
All visual stimuli were presented using Psychtoolbox (Brainard, 1997; Pelli, 1997), running on MATLAB v2014a (MathWorks) using the Windows 7 operating system (Microsoft) on a desktop computer (Dell Optiplex 9010). Hand position data estimated from potentiometers in the manipulandum were analyzed off-line using custom MATLAB code. Movement initiation was defined as the first time point when the position of the hand was recorded outside the starting position after the go-cue. Movement end was defined as the first time point when hand velocity fell below 1 pixel/s after movement initiation. Reaction time (RT) was calculated as the difference between the go-cue and movement initiation, whereas movement time (MT) was calculated as the difference between movement initiation and movement end. Endpoint error, which was used as a measure of accuracy, was defined as the resultant (x, y) distance between final hand position and the target center. Movement endpoint variability along the x- and y-axes was calculated as the variance of the x and y endpoint errors, respectively. Finally, the distance covered by the hand between each (x, y) sample during movement was summed to determine the length of the movement trajectory (i.e., path length).
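The movement-related measures defined above reduce to a few lines of arithmetic. The sketch below is illustrative only: the original analysis used custom MATLAB code, and the function names, units, and sample data here are hypothetical.

```python
import math


def reaction_time(go_cue_ms, movement_onset_ms):
    """RT: time from the go-cue to movement initiation."""
    return movement_onset_ms - go_cue_ms


def movement_time(movement_onset_ms, movement_end_ms):
    """MT: time from movement initiation to movement end."""
    return movement_end_ms - movement_onset_ms


def endpoint_error(final_xy, target_xy):
    """Resultant (x, y) distance between final hand position and target center."""
    return math.hypot(final_xy[0] - target_xy[0], final_xy[1] - target_xy[1])


def path_length(samples):
    """Sum of the distances between successive (x, y) hand samples."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(samples, samples[1:]))


# Made-up trajectory with two straight segments of 5 and 6 units.
trajectory = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
length = path_length(trajectory)
error = endpoint_error(trajectory[-1], (3.0, 10.5))
```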
Movement-related trial rejection
To prevent outlier trials from affecting movement-related outcomes, trials for which RT, MT, or endpoint error fell more than 2 SD from the mean were discarded on a per-participant basis. Overall, this led to the rejection of 1250/13,248 total trials (i.e., ∼9%).
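The rejection rule can be sketched as follows for one participant. The sketch assumes that the mean and sample SD are computed over all of that participant's trials (the text does not state whether this was done per condition), and the function names and data are hypothetical.

```python
from statistics import mean, stdev


def within_sd_mask(values, n_sd=2.0):
    """True for every value lying within n_sd sample SDs of the mean."""
    center, spread = mean(values), stdev(values)
    return [abs(v - center) <= n_sd * spread for v in values]


def keep_trials(rt, mt, endpoint_error, n_sd=2.0):
    """A trial is kept only if RT, MT, and endpoint error all pass the rule."""
    masks = [within_sd_mask(v, n_sd) for v in (rt, mt, endpoint_error)]
    return [all(flags) for flags in zip(*masks)]


# Made-up data for ten trials; the last trial has an aberrant RT.
rt = [300, 310, 290, 305, 295, 300, 310, 290, 305, 900]
mt = [300, 305, 295, 310, 290, 300, 305, 295, 310, 290]
err = [0.50, 0.60, 0.40, 0.55, 0.45, 0.50, 0.60, 0.40, 0.55, 0.45]
kept = keep_trials(rt, mt, err)
```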
EEG data acquisition, processing, and time-frequency decomposition
A 64-electrode actiCAP (extended 10/20 system, Brain Products) and a BrainAmp system (Brain Products) were used to record scalp EEG. Special care was taken to ensure that electrode Cz was always positioned over the vertex of participants' head before recording. The EEG data were sampled at 500 Hz and digitized online using the BrainVision Recorder software v2.0 (Brain Products), using a separate laptop (Dell Latitude E6530) running on Windows 7 (Microsoft). All EEG data were analyzed offline using custom MATLAB code and functions from the EEGLAB (Delorme and Makeig, 2004) and CSD (Kayser and Tenke, 2006) toolboxes. EEG signals were bandpass filtered between 1 and 125 Hz, with a 59–61 Hz notch, and re-referenced to the average scalp potential. The data were then epoched around the go-cue (−4000 to +2000 ms) and baseline-corrected to the average potential recorded between −2500 and −2000 ms. This period was chosen because (1) it corresponded to a moment when participants were immobile; and (2) the target cue, which informed participants of the upcoming condition, had not yet been revealed, providing a period with a neutral context. Once baseline-corrected, the EEG data were screened for scalp potential deflections >150 μV, and epochs that met this criterion (859/13,248; ∼6%) were discarded from both the EEG and movement-related datasets before further processing.
Independent component analysis, a blind source separation technique that decomposes the EEG signal into maximally independent components (Jung et al., 2000), was then applied to the EEG data using the “runica” algorithm from the EEGLAB toolbox. Briefly, independent component analysis is a standard method for removing artifactual EEG activity without discarding entire epochs (Jung et al., 2000; Whittingstall et al., 2010). Independent components were identified as being artifactual and discarded if they met two of the following three criteria: (1) their time course showed spurious bursts of activity; (2) their spectral power did not generally decrease as a function of frequency, as expected for EEG spectral power (Buzsáki, 2006); and (3) their scalp map showed activity concentrated at the far edges of the scalp, which is often indicative of muscle and/or ocular artifacts (Jung et al., 2000). After backprojecting the “clean” components to electrode space, trials labeled as outliers based on movement-related variables (see Movement-related trial rejection) were discarded from the EEG datasets. This was done after the artifactual components were removed to provide the runica algorithm with as many trials as possible. Finally, the data were transformed into current source density using the Surface Laplacian transform (m constant, 4; head radius, 10 cm; smoothing constant, 10⁻⁵; Kayser and Tenke, 2006). This was done because the Surface Laplacian transform reduces the contribution of distant sources to the EEG signal (Nunez and Srinivasan, 2006), thus minimizing electromyography signal contamination (Fitzgibbon et al., 2015). Moreover, the Surface Laplacian transform improves the spatiotemporal resolution of the EEG time series compared with monopolar recordings (Burle et al., 2015; Vidal et al., 2015).
With respect to time-frequency analyses, the EEG time-series of each electrode and trial were convolved with a series of complex Morlet wavelets (1–50 Hz, 1 Hz steps). Spectral power estimates were obtained by multiplying the resulting complex signal by its complex conjugate. Wavelet cycles were linearly increased from 3 to 7.9 in 0.1 steps to improve frequency resolution at higher frequencies (Cohen, 2014). The spectral power data were then baseline-normalized separately in each condition by dividing each data point, averaged across trials, by the average power during the baseline period and log-transforming the result. Finally, the spectral power data were downsampled to 100 Hz.
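The time-frequency steps above (complex Morlet convolution, power as the product of the complex signal with its conjugate, and baseline normalization) can be sketched in plain Python. Several details are assumptions not stated in the text: the one-to-one mapping of 3 to 7.9 cycles onto 1 to 50 Hz, the wavelet truncation at ±3 temporal SDs, the unit-sum normalization of the envelope, and the base-10 logarithm. All names are hypothetical, and the original analysis was run in MATLAB.

```python
import cmath
import math


def cycles_for(freq_hz):
    """Assumed mapping: 3 cycles at 1 Hz, +0.1 per Hz, 7.9 cycles at 50 Hz."""
    return 3.0 + 0.1 * (freq_hz - 1)


def morlet(freq_hz, srate):
    """Complex Morlet wavelet, truncated at +/-3 temporal SDs (assumption)."""
    sigma_t = cycles_for(freq_hz) / (2.0 * math.pi * freq_hz)
    half = int(round(3.0 * sigma_t * srate))
    taps = []
    for k in range(-half, half + 1):
        t = k / srate
        taps.append(cmath.exp(2j * math.pi * freq_hz * t)
                    * math.exp(-t * t / (2.0 * sigma_t ** 2)))
    norm = sum(abs(v) for v in taps)  # unit-sum Gaussian envelope
    return [v / norm for v in taps]


def wavelet_power(signal, freq_hz, srate):
    """Convolve with the wavelet; power = complex result times its conjugate."""
    taps = morlet(freq_hz, srate)
    half = len(taps) // 2
    n = len(signal)
    power = []
    for i in range(n):
        acc = 0j
        for k, tap in enumerate(taps):
            j = i + half - k  # convolution index (zero-padded edges)
            if 0 <= j < n:
                acc += signal[j] * tap
        power.append((acc * acc.conjugate()).real)
    return power


def baseline_normalize(power, baseline_idx):
    """Divide by mean baseline power, then log-transform (base 10 assumed)."""
    base = sum(power[i] for i in baseline_idx) / len(baseline_idx)
    return [math.log10(p / base) for p in power]


# One second of a unit-amplitude 20 Hz sine sampled at 500 Hz.
SRATE = 500
signal = [math.sin(2.0 * math.pi * 20.0 * i / SRATE) for i in range(SRATE)]
power_20 = wavelet_power(signal, 20, SRATE)  # wavelet matched to the signal
power_10 = wavelet_power(signal, 10, SRATE)  # wavelet one octave below
```

With this normalization, a unit-amplitude sine at the wavelet's center frequency yields a power of about 0.25 away from the edges, whereas the 10 Hz wavelet responds only weakly to the 20 Hz signal.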
Experimental design and statistical analysis
All statistical analyses were conducted on the 23 participants that took part in this study. To compare the means of all movement-related dependent variables (MT, endpoint error, RT, x-axis endpoint variability, y-axis endpoint variability, path length), two-way repeated-measures ANOVAs were used, with INCENTIVE (4 levels: Neutral, Reward, Punish and Mixed) and PROBABILITY (2 levels: High and Low) as the two within-subject factors. To control for type 1 error inflation because of dissimilar intercondition variance (i.e., violation of the sphericity assumption), the Huynh–Feldt correction was applied to the degrees of freedom of all ANOVAs. Two-tailed dependent t tests were used to identify within factor differences whenever significant main effects were identified for INCENTIVE, PROBABILITY, or an interaction between them. To counter the inflation of type 1 errors, paired comparisons were Bonferroni-corrected by multiplying their p values by the number of comparisons needed to break down all significant factors (specified in Results). The statistical analysis of the movement-related variables was done with IBM SPSS statistics v23.
With respect to the EEG data, the goal was to assess whether beta (13–30 Hz) spectral power was modulated by either INCENTIVE or PROBABILITY during the delay period. To investigate this, nonparametric permutation tests were conducted to identify clusters of spatially and temporally adjacent samples (or electrode/time pairs) showing statistically significant differences across conditions (Maris and Oostenveld, 2007). This type of analysis is particularly well suited to EEG because (1) it does not make assumptions about the distribution of the data, and (2) it provides an efficient solution to the multiple-comparisons problem. Briefly, for every contrast, two-tailed dependent t tests were computed for all electrode/time pairs in the EEG data. Clusters, defined as adjacent electrode/time pairs whose test statistic exceeded the threshold for statistical significance (t(22) = 2.074, α = 0.05, two-tailed), were then identified. To be considered a cluster, at least three electrodes had to show statistically significant t values within a radius of 4 cm. The size of a cluster was obtained by summing the t values across its constituent electrode/time pairs. Following each permutation (N = 2000), which consisted of randomly shuffling the experimental condition labels across participants, the largest “permutation” cluster was identified. The Monte Carlo estimate (i.e., the proportion of permutations whose largest cluster exceeded the size of the cluster identified in the true data) was used to yield p values for each cluster.
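The logic of the cluster-based permutation test can be illustrated with a simplified, time-only sketch. It is not the FieldTrip implementation used in the study: it ignores the spatial adjacency criterion (three electrodes within 4 cm), it assumes that label shuffling amounts to swapping the two conditions within each participant (a sign flip of the paired difference), and it hard-codes the critical t value for 23 participants. All names and the simulated data are hypothetical.

```python
import math
import random

T_CRIT = 2.074  # two-tailed critical t for df = 22, alpha = 0.05 (from the text)


def paired_t(diffs):
    """One t value per time point from per-participant difference scores."""
    n = len(diffs)
    t_values = []
    for samples in zip(*diffs):  # iterate over time points
        m = sum(samples) / n
        var = sum((v - m) ** 2 for v in samples) / (n - 1)
        t_values.append(m / math.sqrt(var / n) if var > 0 else 0.0)
    return t_values


def cluster_sizes(t_values, t_crit=T_CRIT):
    """Sum t over runs of adjacent supra-threshold samples of the same sign."""
    sizes, current, sign = [], 0.0, 0
    for t in t_values:
        s = (t > t_crit) - (t < -t_crit)
        if s != 0 and s == sign:
            current += t
        else:
            if sign != 0:
                sizes.append(current)
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        sizes.append(current)
    return sizes


def cluster_permutation_p(diffs, n_perm=2000, seed=0):
    """Return (cluster size, Monte Carlo p) for each observed cluster."""
    rng = random.Random(seed)
    observed = cluster_sizes(paired_t(diffs))
    null_max = []
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            flip = rng.choice((-1.0, 1.0))  # swap labels within a participant
            flipped.append([flip * v for v in row])
        sizes = cluster_sizes(paired_t(flipped))
        null_max.append(max((abs(s) for s in sizes), default=0.0))
    return [(size, sum(m >= abs(size) for m in null_max) / n_perm)
            for size in observed]


# Simulated differences: 23 participants, 20 time points, effect at samples 5-9.
data_rng = random.Random(1)
diffs = [[data_rng.gauss(0.0, 1.0) + (2.0 if 5 <= t < 10 else 0.0)
          for t in range(20)] for _ in range(23)]
results = cluster_permutation_p(diffs, n_perm=500)
```

Each permutation contributes only its largest cluster to the null distribution, which is what controls the family-wise error rate across all time points.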
All nonparametric permutation tests were conducted throughout the delay period (−2000 to 0 ms). To probe for differences across the INCENTIVE factor, data were pooled across PROBABILITY levels and dependent t tests were used to compare the four INCENTIVE levels against each other (6 paired comparisons). To probe for differences across the PROBABILITY factor, data were pooled across INCENTIVE levels and dependent t tests were used to compare High PROBABILITY and Low PROBABILITY trials (1 paired comparison). To probe for an interaction between the INCENTIVE and PROBABILITY factors, dependent t tests were used to compare the differences between Low PROBABILITY and High PROBABILITY trials across INCENTIVE levels (6 paired comparisons). Clusters were deemed statistically significant if their Bonferroni-corrected p value (i.e., p * 13) was smaller than or equal to the significance threshold (α = 0.05, two-tailed). All nonparametric permutation tests were done using functions from the Fieldtrip toolbox (Oostenveld et al., 2011).
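The Bonferroni step is simple arithmetic, sketched below. Capping the corrected value at 1 is an assumption (it is consistent with the p = 1.00 values reported elsewhere in the text); the function name is hypothetical.

```python
N_COMPARISONS = 13  # 6 INCENTIVE + 1 PROBABILITY + 6 interaction contrasts
ALPHA = 0.05


def bonferroni(p_value, n_comparisons=N_COMPARISONS):
    """Multiply the raw p value by the number of comparisons, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# With 2000 permutations, a raw Monte Carlo p value of 1/2000 becomes
# 0.0005 * 13 = 0.0065 after correction, and 2/2000 becomes 0.013.
p_one_in_2000 = bonferroni(1 / 2000)
p_two_in_2000 = bonferroni(2 / 2000)
is_significant = p_one_in_2000 <= ALPHA
```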
For all targeted comparisons (i.e., dependent t tests), Cohen's dz was calculated and used as a measure of effect size (Lakens, 2013). For the EEG data, the dz was calculated using the average t value of a given cluster (denoted as tmean in the text). According to Cohen (1988), dz is considered small, moderate, or large if it exceeds 0.2, 0.5, or 0.8, respectively.
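As a worked example, Cohen's dz for a dependent t test can be obtained from the t value and the sample size as dz = t / √n (Lakens, 2013). The sketch below reports the magnitude of dz and labels it with the thresholds given above; the function names and the "negligible" label for values at or below 0.2 are hypothetical additions.

```python
import math


def cohens_dz(t_value, n_participants):
    """Magnitude of Cohen's dz for a dependent t test: |t| / sqrt(n)."""
    return abs(t_value) / math.sqrt(n_participants)


def effect_label(dz):
    """Cohen (1988) labels; 'negligible' is an added label for dz <= 0.2."""
    if dz > 0.8:
        return "large"
    if dz > 0.5:
        return "moderate"
    if dz > 0.2:
        return "small"
    return "negligible"


# Example with a cluster-averaged t value of -3.14 and 23 participants.
dz = cohens_dz(-3.14, 23)
label = effect_label(dz)
```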
Results
Motor behavior
The first set of analyses sought to investigate whether the INCENTIVE or PROBABILITY factors influenced any of the movement-related variables. With respect to MTs, the repeated-measures ANOVA revealed a significant main effect of INCENTIVE (F(2.4,53.6) = 33.23, p = 3.9 × 10⁻¹¹), but not PROBABILITY (F(1,22) = 0.98, p = 0.33). Importantly, an INCENTIVE × PROBABILITY interaction effect was also identified (F(2.7,60.2) = 4.17, p = 0.011; Fig. 2A). To break down the INCENTIVE × PROBABILITY interaction, dependent t tests were used to compare Neutral, Reward, Punish, and Mixed at each PROBABILITY level (Bonferroni correction: p*12). For the High PROBABILITY trials (i.e., large target), MTs were significantly longer in Neutral compared with all other conditions [t(22) range = (3.98, 5.73), p range = (1 × 10⁻⁴, 0.0075), dz range = (0.83, 1.20)]. No significant differences were identified between Reward, Punish and Mixed [t(22) range = (−2.82, 2.35), p range = (0.12, 1.00), dz range = (0.22, 0.59)]. For the Low PROBABILITY trials (i.e., small target), MTs were also found to be significantly longer in Neutral than in all other conditions [t(22) range = (4.12, 7.9), p range = (8.76 × 10⁻⁷, 0.0055), dz range = (0.86, 1.65)]. Although MTs in Reward and Punish were not found to differ in Low PROBABILITY trials (t(22) = −0.02, p = 1.00, dz = 0.003), both conditions showed longer MTs compared with the Mixed condition [t(22) range = (3.39, 4.06), p range = (0.0063, 0.032), dz range = (0.71, 0.86)]. Globally, these results indicate that movement vigor tends to increase as a function of the stakes associated with the movement.
Concerning endpoint errors, a significant main effect of INCENTIVE was identified (F(3.0,65.8) = 5.32, p = 0.002), but no effect of PROBABILITY (F(1,22) = 0.21, p = 0.65) and no INCENTIVE × PROBABILITY interaction (F(2.9,64.8) = 0.15, p = 0.93; Fig. 2B). Paired comparisons (Bonferroni correction: p*6) revealed that endpoint errors were significantly larger in Neutral compared with both Reward and Mixed [t(22) range = (3.05, 3.27), p range = (0.02, 0.04), dz range = (0.64, 0.68)], but not Punish (t(22) = −1.01, p = 1.00, dz = 0.21). There was no significant difference between Reward, Punish, and Mixed [t(22) range = (0.24, 2.70), p range = (0.08, 1.00), dz range = (0.05, 0.56)]. Overall, these results show that participants were more accurate on trials for which there was the possibility of a monetary gain (i.e., Reward and Mixed) compared with when there were no incentives.
For RTs, significant main effects were identified for both INCENTIVE (F(2.7,60.0) = 3.76, p = 0.018) and PROBABILITY (F(1,22) = 6.82, p = 0.016), but no significant interaction was found between these factors (F(2.3,51.6) = 0.58, p = 0.59; Fig. 2C). Paired comparisons (Bonferroni correction: p*6) revealed that RTs in Reward were significantly shorter than in Punish (t(22) = −3.35, p = 0.017, dz = 0.70), with all other comparisons being nonsignificant [t(22) range = (−2.31, 2.09), p range = (0.18, 1.00), dz range = (0.21, 1.00)]. Thus participants were quicker to respond on trials in which there was a possible gain and nothing to lose compared with trials in which the best outcome was to avoid a punishment. As for the PROBABILITY main effect, it revealed that RTs were significantly shorter for High PROBABILITY compared with Low PROBABILITY trials (i.e., large vs small targets).
Concerning x-axis endpoint variability, a significant main effect was found for INCENTIVE (F(2.0,43.8) = 6.01, p = 0.005), but not for PROBABILITY (F(1,22) = 0.014, p = 0.91) or the INCENTIVE × PROBABILITY interaction (F(2.3,51.4) = 0.73, p = 0.51; Fig. 2D). Paired comparisons (Bonferroni correction: p*6) revealed that x-axis endpoint variability in Neutral was significantly greater than in Punish (t(22) = 3.65, p = 0.008, dz = 0.76) and Mixed (t(22) = 4.27, p = 0.0018, dz = 0.89), but did not differ from Reward (t(22) = 2.13, p = 0.27, dz = 0.44). No significant difference was identified between Reward, Punish and Mixed for this variable [t(22) range = (0.33, 1.38), all p = 1.00, dz range = (0.07, 0.29)]. Thus, trials with monetary incentives generally showed smaller x-axis endpoint variability than Neutral trials, although this difference was significant only for Punish and Mixed.
No significant effects were found for y-axis endpoint variability (INCENTIVE: F(2.9,64.0) = 2.01, p = 0.12; PROBABILITY: F(1,22) = 0.06, p = 0.80; INCENTIVE × PROBABILITY: F(2.3,51.4) = 0.47, p = 0.66; Fig. 2E) and path length (INCENTIVE: F(3,66) = 2.28, p = 0.09; PROBABILITY: F(1,22) = 2.11, p = 0.16; INCENTIVE × PROBABILITY: F(3,66) = 0.70, p = 0.56; Fig. 2F).
Beta power responses
The next analysis sought to investigate whether INCENTIVE, PROBABILITY, or INCENTIVE × PROBABILITY influenced beta oscillatory power during the delay period. To this end, nonparametric permutation tests were used to identify statistically significant spatiotemporal clusters between condition pairs (Bonferroni correction: p*13).
First, beta power was compared across each level of the INCENTIVE factor. As shown in Figure 3A, a large bilateral and caudal cluster revealed that beta power was significantly lower in Reward than in Punish. This cluster was observed early in the delay period, from −1630 to −1050 ms (size = −1881.49, tmean = −3.14, p = 0.0065, dz = 0.65). Beta power was also lower in Reward compared with Neutral during much of the delay period (Fig. 3B), although only one significant cluster, which spanned several bilateral electrodes, was observed between −1600 and −1000 ms (size = −4724.47, tmean = −3.17, p = 0.0065, dz = 0.66). As shown in Figure 3C, beta power was higher in Reward than in Mixed, especially late in the delay period. Yet, despite a strong trend, no cluster reached significance (all clusters p ≥ 0.058). As for Punish and Neutral (Fig. 3D), there was no significant difference throughout the delay period (all clusters p ≥ 0.4). As presented in Figure 3E, beta power was significantly higher in Punish compared with Mixed. This difference was revealed by two clusters, which spanned a broad bilateral and mostly caudal area from −1650 to −880 ms (size = 2007.63, tmean = 3.22, p = 0.013, dz = 0.67) and from −830 to −80 ms (size = 2825.26, tmean = 3.43, p = 0.013, dz = 0.72). Finally, Figure 3F shows that beta power was significantly higher in Neutral compared with Mixed at all but the most rostral electrodes from −1530 to 0 ms (size = 1199.61, tmean = 3.16, p = 0.0065, dz = 0.66). To visually appreciate the temporal evolution of beta activity at each level of the INCENTIVE factor, time courses of beta power were produced by averaging the data across all electrodes caudal to FCz (Fig. 3G; see inset for electrodes). These electrodes were chosen because they showed significant differences in nearly all contrasts.
These data show that early in the delay period (∼−1600 to −1000 ms), beta power was specifically sensitive to the presence of a reward (i.e., lower in Reward and Mixed compared with Neutral and Punish), whereas late in the delay period (∼−1000 to 0 ms), beta power qualitatively scaled with the amount of money at stake (i.e., lowest in Mixed, intermediate in Reward and Punish, and highest in Neutral). The Reward and Punish conditions demonstrate this reversal most strikingly, differing significantly early in the delay period and then becoming nearly identical later in the delay period.
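The effect sizes reported for the INCENTIVE contrasts can be related to the cluster statistics through the usual definition of Cohen's dz for dependent samples, dz = |t| / √n. The short Python check below applies this to the reported tmean values. The sample size of 23 is an assumption: it is not stated in this section and is only inferred from the N = 92 given for the pooled hit probability (92 = 23 × 4 INCENTIVE levels).

```python
import math

# Assumption: 23 participants (inferred, not stated in this section).
N_PARTICIPANTS = 23

# (mean t of the cluster, reported dz) for the significant INCENTIVE contrasts
reported = {
    "Reward vs Punish": (-3.14, 0.65),
    "Reward vs Neutral": (-3.17, 0.66),
    "Punish vs Mixed (early)": (3.22, 0.67),
    "Punish vs Mixed (late)": (3.43, 0.72),
    "Neutral vs Mixed": (3.16, 0.66),
}


def cohens_dz(t_value, n):
    """Cohen's dz for a dependent-samples t test with n pairs."""
    return abs(t_value) / math.sqrt(n)


for label, (t_mean, dz_reported) in reported.items():
    dz = cohens_dz(t_mean, N_PARTICIPANTS)
    print(f"{label}: dz = {dz:.2f} (reported {dz_reported})")
```

Under this assumed sample size, the computed values agree with the reported dz values to two decimals.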
Second, beta power was compared across the two levels of the PROBABILITY factor (Fig. 4A). However, no statistically significant differences were found (all clusters p = 1.00). Third, beta power differences between Low and High PROBABILITY trials were compared across INCENTIVE factor levels to probe for an interaction (see Materials and Methods, Experimental design and statistical analysis; Fig. 4B–G). Similar to the previous analysis, no statistically significant differences were identified (all clusters p = 1.00).
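One reading of the three families of comparisons that would account for the Bonferroni factor of 13 is sketched below in Python: six pairwise INCENTIVE contrasts, one PROBABILITY contrast, and six pairwise comparisons of the Low minus High difference across INCENTIVE levels. This enumeration is an inference from the figure panels (Fig. 3A–F, Fig. 4A, Fig. 4B–G) rather than a statement of the procedure, and the helper names are hypothetical.

```python
from itertools import combinations

INCENTIVE = ["Neutral", "Reward", "Punish", "Mixed"]

# Main effect of INCENTIVE: every pair of levels
incentive_contrasts = [f"{a} vs {b}" for a, b in combinations(INCENTIVE, 2)]

# Main effect of PROBABILITY: a single pair
probability_contrasts = ["Low vs High"]

# Interaction: compare the (Low - High) difference between pairs of INCENTIVE levels
interaction_contrasts = [
    f"(Low - High in {a}) vs (Low - High in {b})"
    for a, b in combinations(INCENTIVE, 2)
]

all_contrasts = incentive_contrasts + probability_contrasts + interaction_contrasts


def bonferroni(p, n_tests):
    """Multiply an uncorrected p value by the number of tests, capped at 1."""
    return min(1.0, p * n_tests)


print(len(all_contrasts))  # number of comparisons under this reading
for name in all_contrasts:
    print(name)
```

The cap at 1 in `bonferroni` is consistent with the nonsignificant clusters being reported as p = 1.00 after correction.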
Relationship between beta power and models
The above analyses revealed that the INCENTIVE factor had a significant effect on delay period beta power. The next analysis thus sought to evaluate whether these beta modulations were correlated with the expected value (Eq. 1) and/or motivational salience (Eq. 2) models. To do so, beta power in each of the four INCENTIVE factor levels (i.e., Neutral, Reward, Punish, and Mixed) was correlated with the models' predictions at each electrode/time pair throughout the delay period (−2000 to 0 ms) using Spearman's rank correlation. Given that there was no effect of PROBABILITY on beta power, data were pooled across this factor, such that for the models' equations, hit probability for each INCENTIVE level was defined as the proportion of trials in which participants successfully hit the target across both PROBABILITY levels (58 ± 9%, N = 92). Given that permutation tests require that the data from at least two conditions be shuffled to yield a null statistical distribution, a one-sample t test could not be used to assess whether the correlations significantly differed from zero. Instead, permutation analyses were conducted with dependent t tests that compared the correlation data to a vector of zeros at each electrode/time pair, which is the mathematical equivalent of a one-sample t test. Clusters were deemed significant if their Bonferroni-corrected p value was smaller than or equal to the statistical significance threshold (α = 0.05, two-tailed). For these correlational analyses, the Bonferroni correction entailed multiplying the cluster p values by five, as beta power was correlated with five different variables (i.e., expected value, motivational salience, MT, endpoint errors, and RT; for correlations with the latter three variables, see Relationship between beta power and movement-related variables).
Additionally, to determine whether one model explained the data better than the other, the correlations obtained for each model at each electrode/time pair were compared against each other using dependent t tests (no Bonferroni correction). In the following lines, the variable rmean, defined as the average correlation for a given cluster, is reported to provide an idea of how strong the correlation was for the entire cluster.
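To make the correlational step concrete, the Python sketch below computes Spearman's rank correlation between beta power in the four INCENTIVE levels and each model's predictions for a single participant at a single electrode/time pair. Because Eq. 1 and Eq. 2 are not reproduced in this section, the two model functions use commonly used forms (signed vs unsigned probability-weighted outcomes) as assumptions, and the beta power values are invented purely for illustration.

```python
import math

P_HIT = 0.58  # pooled hit probability reported in the text

# (gain on hit, loss on miss) in cents for each INCENTIVE level
CONDITIONS = {"Neutral": (0, 0), "Reward": (5, 0), "Punish": (0, 5), "Mixed": (5, 5)}


def expected_value(gain, loss, p_hit=P_HIT):
    # Assumed form: signed outcomes weighted by their probability
    return p_hit * gain - (1 - p_hit) * loss


def motivational_salience(gain, loss, p_hit=P_HIT):
    # Assumed form: unsigned outcome magnitudes weighted by their probability
    return p_hit * abs(gain) + (1 - p_hit) * abs(loss)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical beta power (arbitrary units) late in the delay period
beta = {"Neutral": 1.00, "Reward": 0.82, "Punish": 0.85, "Mixed": 0.70}

levels = list(CONDITIONS)
beta_values = [beta[c] for c in levels]
ev = [expected_value(*CONDITIONS[c]) for c in levels]
ms = [motivational_salience(*CONDITIONS[c]) for c in levels]

rho_ev = spearman(beta_values, ev)
rho_ms = spearman(beta_values, ms)
print(f"expected value: rho = {rho_ev:.2f}; motivational salience: rho = {rho_ms:.2f}")
```

In the analysis described above, one such correlation per participant and per electrode/time pair would then enter the cluster-based permutation test against zero, and the two sets of correlations would be compared with dependent t tests.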
As shown in Figure 5A, a negative correlation was identified between beta power and the expected value model early in the delay period. This was revealed by a significant cluster that spanned a broad caudal region from −1630 to −980 ms (size = −3215.76, tmean = −3.37, p = 0.0065, dz = 0.70, rmean = −0.33). Interestingly, beta power was also found to negatively correlate with the motivational salience model (Fig. 5B). This was revealed by a significant cluster spanning all but the most rostral electrodes from −1530 to −40 ms (size = −9624.13, tmean = −3.32, p = 0.0065, dz = 0.69, rmean = −0.35). Hence, in contrast to the expected value cluster, the motivational salience cluster remained statistically significant and widespread throughout movement preparation. Accordingly, direct comparison of the two models revealed that the motivational salience correlations were stronger late in the delay period compared with the expected value correlations (Fig. 5C). This difference was revealed by two significant clusters which were identified over parieto-occipital scalp sites from −320 to −190 ms (size = −292.76, tmean = −3.02, p = 0.01, dz = 0.63) and from −270 to −150 ms (size = −250.02, tmean = −2.91, p = 0.02, dz = 0.61). To better appreciate the beta power versus model correlations across the delay period, the average correlation strength and area (i.e., the number of electrodes comprising the cluster) were plotted as a function of time for each significant cluster (Fig. 5D,E). Overall, these data suggest that although beta power responses were inversely related to both the expected value and motivational salience models early in the delay period, they more strongly reflected motivational salience later in movement preparation.
Relationship between beta power and movement-related variables
Given the significant MT, endpoint error, and RT differences observed across conditions, the next analysis sought to determine whether delay period beta power responses correlated with any of these variables. To do so, correlational analyses akin to those described previously (see Relationship between beta power and models) were performed, except that mean MT, endpoint error, and RT for each INCENTIVE factor level, rather than model predictions, were correlated with beta power. As shown in Figure 6A, a significant positive correlation was identified between beta power and MT. This was revealed by two significant clusters that spanned all but the most rostral scalp sites from −1490 to −560 ms (size = 4390.42, tmean = 3.07, p = 0.0065, dz = 0.64, rmean = 0.33) and from −530 to −180 ms (size = 1268.96, tmean = 3.24, p = 0.015, dz = 0.68, rmean = 0.33). No significant positive (all clusters p = 1.00) or negative (all clusters p = 1.00) correlations were identified between beta power and either endpoint error (Fig. 6B) or RT (Fig. 6C). To better appreciate the beta power versus MT correlation throughout the delay period, the average correlation strength and area of the significant clusters were plotted as a function of time (Fig. 6D,E). Overall, these data suggest a link between pre-movement beta power and movement vigor.
Discussion
This study investigated how positive and negative monetary incentives influence pre-movement beta activity, which has long been associated with movement planning and execution (Jenkinson and Brown, 2011; Kilavik et al., 2013). Participants performed goal-directed reaching movements and were informed of the monetary consequences of hitting or missing the target before each trial (i.e., 0¢ or ±5¢). Results revealed strong modulations at parieto-frontal scalp sites, with beta power positively correlating with MTs. Critically, beta activity presented a transient sensitivity to potential rewards, but not punishments, early in the delay period, before reflecting motivational salience as movement onset neared. These results suggest that expected value and motivational salience are expressed on different time scales during reach planning, with beta reflecting the neural processes that mediate the invigorating effect of incentives on motor performance.
An important finding of the present study is that performance-contingent monetary incentives led to a marked decrease in pre-movement beta power. These modulations were most prominent over bilateral parieto-occipital regions, but also encompassed more frontal scalp sites. It is likely that the reduction in beta power reflected activity within reach-related regions of the posterior parietal cortex and premotor cortex. This would be supported by two studies that have specifically investigated the influence of incentives on reach preparatory activity in monkeys and humans, respectively, revealing increased spiking activity in the parietal reach region (Musallam et al., 2004) as well as increased blood oxygen level-dependent (BOLD) responses in the superior parietal lobule and premotor cortex (Iyer et al., 2010). An interesting possibility is that the beta modulations stemmed from interactions between these sensorimotor regions and the basal ganglia (Jenkinson and Brown, 2011), which are believed to influence movement vigor through motivation-related dopamine signals that dictate the worth of voluntary actions (Shadmehr and Krakauer, 2008; Turner and Desmurget, 2010; Berke, 2018). For instance, it has been shown that activity in the ventral striatum, which receives dopaminergic inputs from the substantia nigra pars compacta and ventral tegmental area (Tritsch et al., 2012), predicts incentive-driven changes in motor performance (Chib et al., 2012, 2014) and vigor (Opris et al., 2011). Hence, the positive correlation between pre-movement beta activity and MTs in the present study strongly, though indirectly, suggests a link between beta activity and nigrostriatal activity. Additionally, the increase in vigor was not accompanied by a decrease in accuracy, which rules out a simple speed-accuracy tradeoff (in fact, both vigor and accuracy were increased in Reward and Mixed compared with Neutral). This fits with recent results from Manohar et al. (2015), who showed that positive incentives can increase movement speed without having a detrimental effect on accuracy during an oculomotor capture task. Interestingly, these authors found that this ability to “break” the speed–accuracy tradeoff was diminished in Parkinson's patients, leading them to suggest that dopaminergic activity might contribute to the ergogenic effect of incentives.
A link between beta and dopamine fits well with the emerging framework that beta oscillations linking basal ganglia and neocortex are inversely related to net dopamine levels (Jenkinson and Brown, 2011). In support, Apitz and Bunzeck (2014) used magnetoencephalography and showed that administration of levodopa, a dopamine precursor, reduced beta activity over parietal and frontal brain regions. It is thought that dopamine release would scale with the behavioral salience of external cues and exert a net suppressive effect on beta power (Jenkinson and Brown, 2011). Given that dopamine increases the signal-to-noise ratio within neuronal networks (Servan-Schreiber et al., 1990; Kroener et al., 2009), it has been suggested that the function of beta power reductions would be to enhance computational power within task-relevant circuits for the prospective resourcing and preparation of potential actions (Brittain et al., 2014). Hence a plausible interpretation of the present findings is that by increasing motivation, incentives exert a suppressive effect on beta power through increased dopamine levels, which facilitates and enhances sensorimotor processing during movement planning.
A key aspect of the current study design was the use of both reward- and punishment-predicting cues, which made it possible to determine whether delay period beta activity reflected expected value and motivational salience at different latencies. Results revealed an interesting evolution over the course of the delay period. Specifically, beta power early in the delay period was selectively sensitive to the possibility of obtaining a reward, being lower in contexts in which a reward could (i.e., Reward and Mixed), versus could not (i.e., Neutral and Punish), be obtained. Later in the delay period, however, beta power became best described by the motivational salience model, being lowest in the Mixed condition, intermediate and similar in the Reward and Punish conditions, and highest in the Neutral condition. These results indicate that delay period beta activity in parieto-frontal regions was subject to two sequential phases: an early response showing preferential sensitivity to features associated with a potential reward (i.e., green color), followed by a more rational encoding of the reach plan scaling with the monetary stakes associated with the movement (i.e., similar weighing of rewards and commensurate punishments). This pattern strikingly resembles the two-component dopamine response that occurs during stimulus detection (Schultz, 2016). Specifically, midbrain dopamine neurons show an initial burst of activity tuned to the reward-predicting characteristics of a stimulus. Thereafter, a second response ensues, which is thought to reflect the subjective valuation of the stimulus and thus dictate its behavioral importance. In that regard, it is interesting to note that Platt and Glimcher (1999) had reported that spiking activity in the lateral intraparietal area was best correlated with expected reward shortly after stimulus presentation (≤400 ms). Much like in the present study, however, this relationship tended to weaken over the course of the delay period.
A likely explanation for the present pattern of results is that beta activity was modulated more quickly following reward-predicting cues compared with punishment-predicting cues. Electrophysiologically, this echoes results showing that lateral intraparietal neurons initiate their firing more quickly in response to rewarding versus non-rewarding stimuli (Peck et al., 2009). The present findings may thus provide neurophysiological grounds for the behavioral evidence that decision-making unfolds more quickly in contexts where the best option is a net gain compared with when it is loss avoidance (Chapman et al., 2015). Additionally, these results may help explain recent evidence for mixed representations in the literature, in that the observed temporal shift in encoding may have been obscured by less temporally resolved techniques such as fMRI. For instance, using a protocol similar to the present one, Iyer et al. (2010) found that BOLD responses associated with reach planning in the parietal and premotor cortex were most consistent with motivational salience models, but were also reasonably well fit by expected value models. That being said, given the spatial limitations of non-invasive EEG recordings, it is also possible that the present transition in beta response resulted from the multiplexing of signals emanating from different brain regions with a fixed representation of expected value or motivational salience (Kahnt et al., 2014; Barbaro et al., 2017), but whose relative contributions changed throughout the delay period. For instance, Barbaro et al. (2017) found that ventral visual areas are more sensitive to positive, compared with negative, stimuli, whereas regions neighboring the intraparietal sulcus are sensitive to both positive and negative stimuli in a visual search task.
They argued that the occipital encoding of value may be faster to develop than the more rational parietal representation, which would only manifest when movement preparation is required. Although more work is warranted to disentangle these possibilities, an unambiguous finding of the present study is that parieto-frontal beta power modulations best represented motivational salience as movement initiation neared.
Although beta activity showed clear sensitivity to the INCENTIVE factor, the PROBABILITY factor had no detectable effect. This finding is perplexing given that reward probability is known to modulate neural activity during action planning (Platt and Glimcher, 1999; Musallam et al., 2004; Sugrue et al., 2004; Yang and Shadlen, 2007). One possibility is that High and Low PROBABILITY cues, because of their relatively small difference in size (∼0.65 cm), led participants to plan their movements similarly for both PROBABILITY levels. This interpretation would be supported by the fact that endpoint errors were not significantly different across PROBABILITY levels.
To conclude, the present study shows that performance-contingent monetary incentives impact pre-movement parieto-frontal beta activity and increase movement vigor without reducing accuracy. The pattern of beta activity evolved during planning from an early sensitivity to potential rewards to a later encoding of motivational salience, reconciling some of the mixed representations reported in recent studies. Together, these findings suggest that pre-movement beta activity reflects the neural processes subtending the invigorating effect of incentives on motor behavior, likely mediated through increased net dopamine levels.
Footnotes
This work was supported by the Natural Sciences and Engineering Research Council (Grant 418589). We thank Maxime Bellefeuille and Mia Wu for help with data collection and statistical analyses, and Charles Étienne and Anne Catherine for insightful discussions during the drafting of the paper.
The authors declare no competing financial interests.
Correspondence should be addressed to Pierre-Michel Bernier at Pierre-Michel.Bernier{at}USherbrooke.ca