Abstract
Recent theories of voluntary control predict that multiple motor strategies can be precomputed and expressed throughout movement. We examined online decisional processing in humans by asking them to make reaching movements with obstacles located just to the sides of a direct path between start and end targets. On random trials, the limb was perturbed with one of four mechanical loads that varied in direction and amplitude. Notably, we observed two different strategies when we applied a perturbation (left medium-sized) that deviated the participants' hand directly toward an obstacle. In some trials, subjects directed their hand between the obstacles and in other trials to the left of the obstacles. Importantly, changes in the muscle stretch response between these two strategies were observed in <60 ms after perturbation, during the R2 long-latency epoch (∼45–75 ms). As predicted, the selected strategy depended on the estimated position of the limb when it was perturbed. In our second experiment, we presented either one or three potential goal targets. Movements initially directed to the closest target could be quickly redirected to other potential targets after a perturbation. Differences in muscle stretch responses for redirected movements were observed ∼75 ms after perturbation during the R3 long-latency epoch (∼75–105 ms). The results show that decisional processes are rapidly implemented during movement execution. In addition, our data suggest a hierarchical process with corrective responses on “how” to attain a behavioral goal expressed during the R2 epoch and responses on “what” goal to attain during the R3 epoch.
Introduction
Athletes often exemplify the ability to rapidly alter a motor action when circumstances change. For example, physical contact in sports, such as football or rugby, can push an athlete off a selected course, prompting a rapid decision to change running direction. This flexibility raises an important question: How does feedback influence decision-making between competing movements or goals during an ongoing motor action? Recent work highlights the motor system's ability to consider multiple potential targets before movement onset (Cisek and Kalaska, 2005; Chapman et al., 2010). Various factors influence movement selection, such as the expected gain associated with movement outcomes (Trommershäuser et al., 2003, 2008), limb biomechanics (Cos et al., 2011), and the reliability of sensory information (Roitman and Shadlen, 2002; Körding, 2007; Resulaj et al., 2009). These decisions reflect a serial process of planning followed by movement execution. However, our interest is to understand how sensory feedback for online movement control can influence how we choose to move in a complex environment.
Optimal feedback control (OFC) offers a theoretical framework to address how motor actions should be updated according to the task goal, taking movement variability and environmental perturbations into account (Todorov and Jordan, 2002; Scott, 2004). Within this framework, motor commands are determined by a cost function that reflects the objective of the motor action. Importantly, these motor commands depend on the estimated position of the limb at each point in time so that the expected remaining cost to accumulate until attaining the movement goal, called the cost-to-go, is minimized. The model makes an important prediction when multiple ways or goals are available to achieve task success. If a perturbation pushes the movement away from the initial best option, another option may become more desirable, inducing a change in movement trajectory or goal. Hence, the model predicts that motor decisions to select new trajectories or alternate goals may result from changes in the position of the limb according to the cost-to-go. If this decisional process is performed during movement, then we hypothesize that rapid motor responses to perturbations (∼50–100 ms) may express distinct strategies depending on the position of the hand. Alternatively, if decisional processes and motor execution are dissociated, then changes in motor strategies during movement should be associated with reaction times similar to voluntary reaction times (EMG > 100 ms).
To test this hypothesis, we investigated how rapidly the motor system can select how to navigate around obstacles after a mechanical perturbation (Experiment 1) or to a new target (Experiment 2). We developed an optimal control model to characterize how alternate movement trajectories could be selected and found that human subjects generated qualitatively similar corrective responses. Critically, task-specific changes in motor responses were observed in as little as 60 ms. Furthermore, our simulations based on OFC reproduced the dependency of the decision on the estimated instantaneous position of the limb, suggesting that an online monitoring of the state of the limb might be used to produce a feedback response that selected the best out of multiple options.
Materials and Methods
Participants
A total of 34 subjects (23 males and 11 females, aged 21–36 years, all right hand dominant) participated in one of 4 experiments. Experiment 1A included 10 subjects, whereas all other experiments included 8 subjects. All subjects were neurologically healthy and gave informed consent according to a protocol approved by the Queen's University Research Ethics Board. Experiments lasted ∼2 h, and subjects were financially compensated for their time.
Apparatus and experimental design
Experiments used a robotic device (KINARM Exoskeleton, BKIN Technologies) permitting elbow and shoulder movement in the horizontal plane (Scott, 1999; Singh and Scott, 2003; Nashed et al., 2012). In addition to recording flexion/extension movement of each joint, the KINARM robot can displace the arm by applying mechanical loads. Projected target lights and hand feedback were presented in the plane of the arm using a TV monitor and a semitransparent mirror. The experiments were performed with and without visual hand feedback during movement.
Muscle recordings
Surface electromyographic (EMG) recordings were obtained from the lateral triceps, an elbow extensor. Full details of the procedures are described in our earlier study (Nashed et al., 2012). One subject was excluded from the analysis because of excessive cocontraction (normalized preperturbation muscle activity, >3 SD larger than baseline differences observed across all subjects).
Experiment 1A: obstacle avoidance after mechanical perturbations
Calibration block: determining obstacle placement. This experiment examined how feedback influenced the decisional process to navigate around obstacles in the environment following a variety of mechanical perturbations. The experiments began with a preliminary test of each subject's corrective response to perturbations without the presence of obstacles. Subjects (n = 10) performed reaching movements from a start target (limb configuration: shoulder −5°; elbow 95°) to an end target (radius = 1 cm) located 10 cm in front of the start target (Fig. 1A). All trials began with the gradual onset (ramp up: 100 ms) of a 0.75 Nm elbow torque flexion load, which excited the elbow extensors. This elbow flexion background load was present throughout each trial. Subjects were required to initially stabilize and hold (random time 1–2 s) their hand within the start target (radius = 0.5 cm) before movement. After the hold period, the end target appeared and visual hand feedback was removed. Subjects were instructed to perform simple reaching movements from the start to the end target. As subjects approached the end target (within a 2 cm radius), visual feedback was restored so that they could easily attain the spatial goal. Upon trial completion, subjects were notified as to whether they attained predetermined speed and accuracy criteria (success: end target filled green; too fast, <500 ms: end target filled red; too slow, >800 ms: end target filled yellow).
Experimental setup. A, Depiction of experimental setup in Experiment 1A with circular obstacles within the workspace (filled circles) and one circular goal. Subjects made reaching movements to a circular end target (open circles, radius = 1 cm); and occasionally, the limb was perturbed rightward or leftward with applied joint torques. τe, Elbow torque; τs, shoulder torque. Thick arrow indicates evoked motion. The black line indicates corrective responses of an exemplar subject without obstacles. B, Schematic of the torques applied to the shoulder and elbow. The background load (Bkg load) was ramped up over 100 ms, and step torque perturbations (vertical second dashed line) were applied just after movement onset (vertical solid line). C, Same as in A, depiction of experimental setups of Experiment 1B (top) and Experiment 1C (bottom) with circular obstacles within the workspace (top; filled circles) and either three circular goals (top; solid circles) or a bar target (bottom; dashed line). Similar to A, the limb was occasionally perturbed rightward or leftward with applied joint torques. τe, Elbow torque; τs, shoulder torque. Thick arrow indicates evoked motion. D, Depiction of Experiment 2 setup with either one (solid circle) or three (solid and dashed circles) targets in the workspace. Subjects made reaching movements in both target conditions (circles, radius = 1 cm); and occasionally, the limb was perturbed rightward or leftward with applied joint torques. τe, Elbow torque; τs, shoulder torque. Thick arrow indicates evoked motion.
On random trials, step torques were applied to the limb just after movement onset when shoulder and elbow angles were ∼0° and ∼90°, respectively (Fig. 1A). Either flexion/flexion (elbow: 1 Nm; shoulder: 1 Nm) or extension/extension (elbow: −1 Nm; shoulder: −1 Nm) torques were applied, which deviated the hand to the left or right, respectively. Furthermore, timing constraints were loosened on perturbation trials, such that subjects had more time to attain the spatial goal (too slow >1200 ms). After a short familiarization block, subjects performed 1 baseline block, which interleaved 30 unperturbed trials and 20 perturbation trials (10 right and 10 left), for a total of 50 trials.
Main experiment: obstacle avoidance when reaching to a small circular goal
After the preliminary test, subjects were asked to perform similar reaching movements with the added constraint of virtual obstacles (radii = 1 cm) in the environment. Two circular virtual obstacles (mechanical feedback provided when contacted) were presented to subjects and located to the right and left of the unperturbed hand trajectory (Fig. 1A). The obstacles were positioned to block the corrective responses elicited in the preliminary test (Fig. 1A). Intersubject differences in obstacle placement did not exceed 1.4 cm in any experiment. On selected trials, one of four possible joint torque perturbations was applied (Fig. 1B): (1) rightward (elbow: −1 Nm; shoulder: −1 Nm); (2) small leftward (elbow: 0.5 Nm; shoulder: 0.5 Nm); (3) medium leftward (elbow: 1 Nm; shoulder: 1 Nm); or (4) large leftward (elbow: 2 Nm; shoulder: 2 Nm). Subjects readily countered the loads and avoided the obstacles (∼80% success). Subjects performed 3 blocks, each of which interleaved 40 unperturbed trials, 28 leftward perturbation trials (6 large, 16 medium, and 6 small), and 12 rightward perturbation trials, for a total of 120 unperturbed trials and 120 perturbation trials.
Experiments 1B and 1C: obstacle avoidance and target selection when reaching to three small circular goals or a rectangular bar
These experiments identified whether changes in the behavioral goal altered the timing of corrective responses to avoid the obstacle. The start position and obstacle placements were similar to Experiment 1A. Experiment 1B examined whether subjects would select a new end target when avoiding an obstacle during movement, following a mechanical perturbation. After the preliminary test, subjects performed reaching movements to one of three potential targets (radius = 1 cm each). The central end target was positioned exactly as in Experiment 1A. The additional targets were located to the left and right of the central end target and were directly behind the obstacles (Fig. 1C, top). Experiment 1C extended our previous work on corrective responses to a rectangular bar (Nashed et al., 2012) by exploring how the placement of obstacles influenced this response (Fig. 1C, bottom). The center of the rectangular bar was positioned in the same location as the end target from Experiment 1A (bar length = 40 cm, width = 3 cm). All other aspects of these experimental protocols were identical to Experiment 1A.
Experiment 2: target selection after mechanical perturbations
This experiment examined how the presence of multiple potential goals influenced corrective responses, but in this case, without any obstacles in the environment. Similar to Experiment 1A, a preliminary test was completed in which subjects made reaching movements to a central end target without obstacles and with perturbations applied on random trials. This preliminary test did not influence the subsequent experiment but allowed subjects to become familiar with the basic experiment and provided consistency with the protocols in Experiment 1.
In the main experiment, subjects were asked to perform reaching movements to either a single circular target (Fig. 1D; size and location same as in Experiment 1A) or three circular targets (Fig. 1D; size and location same as in Experiment 1A, plus two other identical targets located 5 cm to the left and right). On selected trials, joint torque perturbations were applied to the limb (Fig. 1B). The location, magnitude, and frequency of these perturbations were the same as in Experiment 1.
Model
Optimal control derives a control function that minimizes a cost function expressed in terms of state and control variables of a dynamical system (Bryson and Ho, 1975; Todorov and Jordan, 2002; Todorov, 2004). The solution is usually derived from the Hamilton-Jacobi-Bellman equation that characterizes the cost-to-go, that is, the cost of the remaining trajectory under the optimal control policy. In principle, this approach provides an optimal state-dependent control policy, u*(x), which can handle arbitrary constraints (x represents the system state, and u represents the control variable). However, current numerical methods for nonlinear systems are based on quadratic approximations of the cost function that cannot easily represent constraints of higher order, such as the presence of an obstacle (Li and Todorov, 2007).
For the first experiment, we countered this difficulty by coupling two simple optimal control problems as follows. Each control problem was modeled as the translation of a point mass (m = 1) in the horizontal plane, given by the following differential equations:
where p(t) represents the 2D coordinate vector, G is the viscous constant, F and Fext represent the controlled force and external force, respectively, and u represents the 2D control vector. The dynamics of each system were discretized to include stochastic noise and given as follows:
where xk represents the state vector, including position, velocity, and controlled and external forces. The corresponding feedback signal at each time step is as follows:
The first controller is derived to pass through a via-point that is located between the two obstacles, and the second controller is derived to pass through a via-point that is located on the left side of both obstacles. The task is to pass through a via-point p(tv), at a given time interval during the reach (0.5 s < tv < 0.6 s) and to stop at the target (p*), at the end of the movement duration (tf = 1 s). The task error is defined as follows:
where the first term enforces passing through the via-point, the second and third terms enforce stopping at the end target. The cost matrices Qi expressed that the coordinates of the mass in the plane were selectively constrained. For the via-point (Q1), only the x-coordinate was constrained for the time interval 0.5 s < tv < 0.6 s. The endpoint cost (Q2) expressed the shape of the target goal (dot, bar or multiple targets). The endpoint velocity cost (Q3) always constrained the two dimensions of the plane to enforce that the movement stopped in all cases of target shape or configuration.
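As a reading aid, the model described above can be sketched in the following form, which is consistent with the symbol definitions in the text but is not taken from the original equations. The time constant τ linking the control vector to the controlled force, the matrices A, B, and H, the noise terms ξ_k and ω_k, and the via-point location p_v are assumptions introduced here for illustration; the formulation actually used is described in Nashed et al. (2012).

```latex
\begin{align*}
m\,\ddot{p}(t) &= -G\,\dot{p}(t) + F(t) + F_{\mathrm{ext}}(t)
  && \text{point-mass dynamics, } m = 1 \\
\tau\,\dot{F}(t) &= u(t) - F(t)
  && \text{assumed first-order link from } u \text{ to } F \\
x_{k+1} &= A\,x_k + B\,u_k + \xi_k
  && \text{discretized dynamics with motor noise} \\
y_k &= H\,x_k + \omega_k
  && \text{feedback signal with sensory noise} \\
J &= \bigl(p(t_v) - p_v\bigr)^{\top} Q_1 \bigl(p(t_v) - p_v\bigr)
   + \bigl(p(t_f) - p^{*}\bigr)^{\top} Q_2 \bigl(p(t_f) - p^{*}\bigr)
   + \dot{p}(t_f)^{\top} Q_3\, \dot{p}(t_f)
  && \text{task error}
\end{align*}
```

In this sketch the three terms of J correspond, in order, to the via-point constraint, the endpoint position constraint, and the endpoint velocity constraint described above.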
The control system and model parameters are fully described in our earlier study (Nashed et al., 2012). As our primary objective was to simply characterize state-dependent changes in movement strategy, we did not manipulate the model or noise parameters in an attempt to reproduce the exact trajectory of the subjects' limb. Regarding the obstacle avoidance procedure, we define J1(xk, uk) and J2(xk, uk) as the cost-to-go functions associated with the center and left via-points, respectively (or target 1 and 2, respectively). We define a modified control problem where the cost-to-go is as follows:
and the associated optimal control action is readily given by the following:
We now concentrate on the fact that the process is corrupted by motor noise and state feedback is only available through delayed and noisy sensors. We use standard techniques based on system augmentation to handle time delays in the feedback loop. The feedback delay was set to 50 ms. In theory, the cost-to-go functions are given by the following:
where ek is the estimation error, Si,x and Si,e are known matrices, and si is a given non-negative scalar quantity. These parameters (Si,e, Si,x, and si) follow directly from the derivation of the optimal feedback gains and optimal Kalman gains (Todorov, 2005; Crevecoeur et al., 2011). Because the Kalman filter produces unbiased state estimation, the controller can derive an unbiased estimate of the cost-to-go by ignoring the estimation error and compute the following:
using the estimated state instead of the true state. The full control algorithm was implemented as follows: (1) derive the optimal control policy and linear state estimator associated with each via-point trajectory; and (2) apply the control policy associated with the minimum cost-to-go across the two possible trajectories (Eq. 8).
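The selection step in (2) can be illustrated with a minimal Python sketch. It assumes that each candidate policy supplies a quadratic cost-to-go evaluated at the state estimate, of the form x̂ᵀ S x̂ + s, and simply picks the smaller one. The one-dimensional state, the augmented vector [x, 1], the via-point locations, and the constant offsets are all hypothetical values chosen for illustration, not parameters of the model used in the study.

```python
# Sketch of the policy-selection step; all numbers below are illustrative
# assumptions, not parameters from the study.

def quad_form(x, S):
    """Return x^T S x for a vector x (list) and square matrix S (list of rows)."""
    n = len(x)
    return sum(x[i] * S[i][j] * x[j] for i in range(n) for j in range(n))


def estimated_cost_to_go(x_hat, S_x, s):
    """Cost-to-go evaluated at the state estimate, ignoring the estimation error."""
    return quad_form(x_hat, S_x) + s


def select_policy(x_hat, policies):
    """Return the index of the policy with the smallest estimated cost-to-go."""
    costs = [estimated_cost_to_go(x_hat, S_x, s) for S_x, s in policies]
    return min(range(len(costs)), key=costs.__getitem__)


def via_point_cost(c):
    """Matrix S such that [x, 1] S [x, 1]^T = (x - c)^2 (toy 1D example)."""
    return [[1.0, -c], [-c, c * c]]


# Policy 0: via-point between the obstacles (x = 0).
# Policy 1: via-point left of both obstacles (x = -4), with a constant
# offset standing in for the extra cost of the longer path.
policies = [(via_point_cost(0.0), 0.0), (via_point_cost(-4.0), 4.0)]

print(select_policy([-1.0, 1.0], policies))  # estimate close to the center path
print(select_policy([-3.5, 1.0], policies))  # estimate pushed far to the left
```

With these toy values, an estimated hand position close to the center path selects policy 0, whereas an estimate displaced far to the left selects policy 1, which mirrors the state-dependent switch described in the text.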
Finally, we emphasize that OFC is a formal model used to describe behavior: our theoretical approach does not make any prediction about the underlying neural implementation, and the optimization formalism may not be practically useful to that end. However, it is possible that a rather simple neural implementation generates state-dependent feedback control that approaches the prediction formulated in the context of OFC. Indeed, it is reasonable to expect that a rather simple sensorimotor map of response gains can be adjusted to take the presence of obstacles into account.
Data analysis
Filtering and normalization.
All data were aligned on perturbation onset. EMG was normalized by its mean activity in the final end target of the setup block where subjects maintained a constant posture against the medium perturbation torques applied to the elbow. Full details of the filtering procedures are described in our earlier studies (Pruszynski et al., 2008; Nashed et al., 2012).
Kinematics
For the obstacle conditions (Experiment 1), we were most interested in comparing the distribution of trials where the hand passed between versus around the obstacles for the medium-sized perturbation. To pool all subject data, we normalized all trials by subtracting the subject's mean from each individual trial and dividing the result by the respective subject's SD. All means and SDs were calculated in Cartesian x,y space. Thus, datasets were aligned on their means and had similar overall distributions. Kolmogorov-Smirnov (K-S) tests were then performed to determine significant differences between the distributions associated with each “strategy”: hand passing between versus around the obstacles (Massey, 1951). Distributions (all trials and separate populations for each strategy) were fit with unimodal and bimodal distributions, and the Akaike Information Criterion was used to determine the goodness of fit (Ljung, 2001). Hand distributions were estimated using a kernel density estimate, for illustration purposes (Bowman and Azzalini, 1997).
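The normalization and comparison steps can be sketched as follows. This is an illustrative Python example under simplifying assumptions: it handles only the x-coordinate, uses the sample SD, and computes only the two-sample K-S statistic D (not its p value). The function names and the "between"/"around" labels are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, stdev


def zscore(values):
    """Subtract the mean and divide by the sample SD of one subject's data."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def pool_by_strategy(subjects):
    """Normalize within each subject, then pool trials by strategy.

    subjects: list of (x_positions, labels) pairs, one pair per subject,
    where each label is "between" or "around".
    """
    pooled = {"between": [], "around": []}
    for x_positions, labels in subjects:
        for z, label in zip(zscore(x_positions), labels):
            pooled[label].append(z)
    return pooled["between"], pooled["around"]


def ks_statistic(a, b):
    """Two-sample K-S statistic D: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Normalizing within each subject before pooling keeps a subject with an unusually wide or offset spread of hand positions from dominating the pooled distributions.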
To address whether the trials preceding medium perturbations influenced the decision to move around or between the obstacles, we quantified the effect of the lateral deviation of the previous trial on the subsequent trial's initial reach direction. We quantified the maximum lateral deviation (x-position) caused by each perturbation and the effects on the initial reach direction of the subsequent medium perturbation trial. Ultimately, we compared the movement strategies of the medium perturbation as a function of the previous trial and tested the difference in distributions using a K-S test.
Muscle activity
Stretch response epochs of muscle activity were based on earlier reports: Baseline (Pre) = −100 to 0 ms; R1 = 20–45 ms; R2 = 45–75 ms; R3 = 75–105 ms; and early voluntary (EV) = 105–135 ms (Lee and Tatton, 1975; Crago et al., 1976; Mortimer et al., 1981; Nakazawa et al., 1997; Pruszynski et al., 2008; Nashed et al., 2012). The EV epoch was chosen such that it was similar in size (30 ms) to the preceding epochs.
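For concreteness, averaging a single EMG trace within these epochs might look like the sketch below. It assumes a 1 kHz sampling rate (one sample per ms) and treats each window as half-open, [start, end); both are assumptions made for illustration, as is the function name.

```python
from statistics import mean

# Epoch boundaries in ms relative to perturbation onset, as listed above.
# Treating each window as half-open [start, end) is an assumption.
EPOCHS = {
    "Pre": (-100, 0),
    "R1": (20, 45),
    "R2": (45, 75),
    "R3": (75, 105),
    "EV": (105, 135),
}


def epoch_means(emg, onset_index):
    """Mean EMG per epoch for a trace sampled at 1 kHz.

    emg: list of samples; onset_index: index of the perturbation-onset sample.
    """
    return {
        name: mean(emg[onset_index + start:onset_index + end])
        for name, (start, end) in EPOCHS.items()
    }
```

The trace must contain at least 100 samples before and 135 samples after perturbation onset for every window to be fully populated.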
We used a receiver-operator characteristic (ROC) technique to determine when the muscle activity was reliably different between the two movement “strategies” following the medium perturbation (Green and Swets, 1974). For each time step (1 ms), we generated an ROC curve representing the probability of discrimination between the two responses based on muscle activity for the same perturbation. Values of 0 and 1 indicate perfect discrimination, whereas a value of 0.5 indicates performance at chance. We determined that muscle activity was reliably different when the ROC curve surpassed a threshold of 0.75 for 5 consecutive ms. We then calculated the point when the ROC curve began to deviate from chance (Thompson et al., 1996; Pruszynski et al., 2008; Nashed et al., 2012), termed the “knee,” by fitting a regression line to the ROC values in the 15 ms before the discrimination point and then calculating the time at which this line intersected the preperturbation ROC values.
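The ROC procedure can be sketched in Python as follows. This is an illustrative reading of the description above, not the analysis code used in the study: it computes the area under the ROC curve at each sample as the probability that activity on one trial type exceeds that on the other, tests only the upper threshold (0.75), and takes the preperturbation level as a single baseline value supplied by the caller. All function names are hypothetical.

```python
from statistics import mean


def auc(a, b):
    """Probability that a sample from a exceeds one from b (ties count 0.5).

    This equals the area under the ROC curve for discriminating a from b.
    """
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))


def roc_timecourse(trials_a, trials_b):
    """Area under the ROC curve at each time sample for two sets of trials."""
    n_samples = len(trials_a[0])
    return [
        auc([t[k] for t in trials_a], [t[k] for t in trials_b])
        for k in range(n_samples)
    ]


def discrimination_index(roc, threshold=0.75, run=5):
    """First index at which roc exceeds threshold for `run` consecutive samples."""
    for k in range(len(roc) - run + 1):
        if all(v > threshold for v in roc[k:k + run]):
            return k
    return None


def knee_index(roc, disc, baseline, window=15):
    """Fit a line to the `window` samples before `disc` and return the
    (fractional) index at which that line crosses `baseline`."""
    xs = list(range(disc - window, disc))
    ys = roc[disc - window:disc]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return mx + (baseline - my) / slope
```

Indices are in samples; with 1 ms time steps and a known onset sample, they convert directly to latencies relative to perturbation onset. The sketch assumes the discrimination point lies at least 15 samples into the trace.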
Results
Experiment 1A: obstacle avoidance when reaching to a small single goal
Model
We used an OFC model to conceptualize ideal performance in each experiment. Before movement initiation, each control policy corresponding to a potential reach path is determined by its own cost-to-go, which is the minimum cost expected to accumulate to reach the behavioral goal (Åström, 1970; Bryson and Ho, 1975; Todorov and Jordan, 2002; Todorov, 2004). During movement execution, the controller simply selects the policy associated with the minimum estimated cost-to-go based on the present estimated position of the hand. The simulations shown in Figure 2A parallel the behavior of Experiment 1A. The unperturbed reaching movements were relatively straight with bell-shaped velocity profiles directed between the two obstacles (Fig. 2A). For small perturbations, the load was quickly countered and the movement path continued between the obstacles to reach the end target (Fig. 2A). The largest perturbations produced movements that navigated around both obstacles to the end target (Fig. 2A). Medium perturbations, which were directed toward the obstacles, produced a mixture of these two strategies, with some trials navigating between the obstacles and others navigating to the left of the obstacles (Fig. 2A). Which strategy was selected on a given trial depended on the estimated position of the hand, which combines internal predictions and sensory feedback about the actual hand position. Thus, trials in which the hand was to the right at perturbation onset tended to go between the obstacles, whereas trials in which it was to the left tended to go to the left of the obstacles. Because sensory, prediction, and motor signals are affected by noise, and sensory feedback is delayed, the estimated hand location is uncertain, which explains why there is no strict separation in distribution between the hand coordinate associated with the two strategies (Fig. 3A,B). However, these distributions are significantly different (K-S test, D = 0.21, p = 0.014) as a consequence of the state-dependent control policy. In addition, the cost-to-go (and therefore the switch in motor strategy) depends on other variables, such as the hand velocity, which further blurs the separation of the hand coordinate between strategies.
Theoretical and empirical movement trajectories for Experiment 1A. A, Trajectories generated by the optimal feedback controller for each load condition. Red and blue trajectories represent trials that went outside of both obstacles and between both obstacles, respectively. The black arrows indicate the relative perturbation magnitudes and spatial location where they were applied. B, Hand trajectories from a representative subject for each perturbation magnitude.
Initial hand trajectories and position for the medium-sized leftward perturbation in Experiment 1A. A, Mean and SE of x-axis trajectories generated by the optimal control model through the first 200 ms of movement for reaches that went between the obstacles (blue) and left of both obstacles (red). B, Distributions of hand positions along the x-axis 50 ms after perturbation onset produced by the optimal feedback controller. The arrows indicate the mean of each distribution (colors same as in A). Black line indicates distribution for all trials (blue and red combined). C, D, Corresponding results of pooled subject data.
How estimates of the perturbation load interact with estimates of the hand location is an interesting question for prospective studies. Simulations indicate that the state-dependent switch in reaching path vanishes after increasing the variance about the medium perturbation load. The reason is that the actual load magnitude has a greater impact on the decision than the hand location. Hence, future studies can use our paradigm to address how well the brain estimates the hand location and the perturbation magnitude by determining the amount of perturbation variance beyond which the state-dependent switch is no longer observed.
Human behavior
In human subjects, the unperturbed reaches were straight with bell-shaped velocity profiles and unaffected by the presence of the obstacles (Fig. 4). Figure 4 illustrates the mean kinematic behavior of the unperturbed reaching movements for a representative subject. Random perturbations applied just after movement onset resulted in distinct strategies to avoid the obstacle and reach for the target. When small rightward or leftward perturbations were applied, subjects easily corrected the deviation and continued to pass between the two obstacles as in the unperturbed case. For large leftward perturbations, subjects switched their intended trajectory and navigated a new path to the left of the obstacles. With medium leftward perturbations, subjects used one of these two strategies, with some movements passing between the two obstacles and other movements passing left of both obstacles. Figure 2B illustrates the behavior for an exemplar subject, which qualitatively matched the behavior produced by the model. Figure 5 illustrates the percentage of trials in which each subject navigated between the obstacles for the small, medium, and large leftward perturbations. On average, ∼45% of trials went between rather than around the obstacles for these medium-sized perturbations. These strategies were observed for each subject with and without vision during the reaching movements (data with vision not shown).
Kinematics of an exemplar subject in Experiment 1A. A, The mean and SD of the Y-position over the course of the reaching movement. Red and blue trajectories represent trials that went outside of both obstacles and between both obstacles, respectively. The black trace indicates the unperturbed movements. The vertical black line indicates perturbation onset. B, Mean and SD of speed in Y-position for the reaching movement. C, Mean and SD of shoulder angle over the course of the movement. D, Mean and SD of the X-position over the course of the reaching movement. E, Mean and SD of speed in X-position for the reaching movement. F, Mean and SD of elbow angle over the course of the movement.
Movement strategies for each perturbation magnitude. Representation of the percentage of trials that proceeded between the obstacles for each subject and flexion load size in Experiment 1A.
The overall distribution of hand x-positions at perturbation onset for all the medium-sized perturbations was unremarkable, as it paralleled the overall distributions observed for unperturbed trials (K-S test, D = 0.12, p = 0.101) and for the small (K-S test, D = 0.10, p = 0.236) and large perturbations (K-S test, D = 0.12, p = 0.119), suggesting that, even for perturbed reaches, subjects were planning to reach straight between the obstacles.
We were most interested in the two strategies observed after the medium perturbation (Figs. 2B and 4). We found that the strategy to avoid the obstacle for the medium-sized perturbations depended on hand position at the beginning of the perturbation (Fig. 3C). Trials in which the subject went left of both obstacles tended to be more leftward 50 ms after perturbation, when corrective responses had not yet influenced the limb (Fig. 3D; K-S test, p ≪ 0.001), than trials in which the subject remained between the two obstacles. This position-dependent difference was present even 1 ms before the perturbation (K-S test, D = 0.20, p = 0.03). Although we focused our analysis on the x-positions at perturbation onset, we should emphasize that the decision to navigate around obstacles likely considers not only position but velocity as well.
Finally, we quantified the maximum lateral displacement of the previous trial to determine its effect on the initial reaching direction on the subsequent trials. We determined that lateral error on the previous trial did not significantly influence the initial reaching direction on the subsequent trial (K-S test, D = 0.17, p = 0.084).
We recorded the activity of the muscle that was stretched by the perturbation (triceps lateralis) to identify the time when the motor system reflected each strategy. We were most interested in comparing the perturbation-related activity between trials that navigated between the obstacles versus around both obstacles after the medium perturbation. The perturbation-related muscle activity was quantified for each subject by taking the mean of the perturbed trials for each strategy (between vs around) and subtracting the mean of the unperturbed trials (Kurtzer et al., 2009). Paired t tests were performed to compare changes in means of corresponding epochs of muscle activity between movement “strategies” (around vs between obstacles) after the medium perturbation. In triceps lateralis (Fig. 6A–C), the R1 response (20–45 ms) was similar for the two strategies (paired t test, T(8) = 1.03, p = 0.331). Significant increases in EMG were observed in the R2 (45–75 ms) and R3 (75–105 ms) long-latency time periods (R2: T(8) = 4.13, p = 0.004; and R3: T(8) = 4.38, p = 0.003) and in the EV epoch (105–135 ms; T(8) = 4.11, p = 0.005) when the subject generated a large corrective response to remain between the obstacles. Analysis of individual subjects yielded similar results to the group (t test, p < 0.05), with 1, 7, 8, and 9 of 9 subjects demonstrating modulation of R1, R2, R3, and EV epochs, respectively. Preperturbation activity was significantly higher, by ∼8% (T(8) = 2.90, p = 0.026), for trials that navigated between the obstacles, although the magnitude of this effect was very small. These changes in preperturbation muscle activity are likely a reflection of the state (i.e., position and velocity) of the limb. Specifically, greater preperturbation activity in lateral triceps would result in the hand being more to the right before the perturbation, and this position then leads to a higher likelihood of moving between the obstacles when perturbed.
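The subtraction and paired comparison described above can be sketched as follows. This is an illustrative Python example with hypothetical function names; it returns only the t statistic and degrees of freedom, and leaves the p value to a statistics package or table.

```python
from math import sqrt
from statistics import mean, stdev


def evoked_response(perturbed_trials, unperturbed_trials):
    """Mean epoch activity on perturbed trials minus mean on unperturbed trials."""
    return mean(perturbed_trials) - mean(unperturbed_trials)


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-subject values a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

Here `a` and `b` would hold one evoked-response value per subject for the "between" and "around" strategies within a given epoch; with nine subjects the test has 8 degrees of freedom, matching the T(8) values reported in the text.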
Muscle activity for the medium-sized leftward perturbation. A, Mean activity of lateral triceps across all subjects aligned to perturbation onset (vertical line). Red and blue trajectories represent trials that went outside of both obstacles and between both obstacles, respectively. The black line indicates muscle activity for the unperturbed reaches. B, Perturbation-evoked response of lateral triceps obtained after subtracting the activity of unperturbed reaches for each individual subject. Black line indicates mean, and shaded color represents SE. C, Difference in muscle activity between the two responses in B (mean ± SE). *p < 0.05. D, Mean activity of posterior deltoid across all subjects aligned to perturbation onset (vertical line). E, Perturbation-evoked response of posterior deltoid obtained after subtracting the activity of unperturbed reaches for each individual subject. F, Difference in muscle activity between the two responses in E (mean ± SE). *p < 0.05.
Our previous work suggests that such a small load-related change in preperturbation muscle activity could not produce such large task-dependent changes in the long latency (Pruszynski et al., 2009; Nashed et al., 2012). The R1 response is known to be most sensitive to changes in preperturbation muscle activity. However, our data failed to demonstrate any significant difference in the R1 epochs. Furthermore, the long latency epoch has shown reduced or little sensitivity to changes in preperturbation muscle activity (Pruszynski et al., 2009), particularly during reaching (Nashed et al., 2012). This suggests that the small but significant changes observed before perturbation could not account for the large task-dependent differences observed in the long latency epochs.
To verify further that preperturbation activity did not influence postperturbation epochs, trials that showed increased muscle activity just before perturbation onset were removed (∼10% of trials) from the analysis. After removing these trials, the preperturbation muscle activity was effectively the same (paired t test, T(8) = 1.84, p = 0.18). The R1 epoch was similar between the two decisions (T(8) = 1.23, p = 0.24). However, we still observed a consistent significant difference in the long latency epoch even with the removal of some trials (R2: T(8) = 2.98, p = 0.02; R3: T(8) = 3.76, p = 0.006). ROC analysis revealed differences in muscle activity for the two types of strategies (between vs around obstacles) that deviated from chance at 57 ms (knee).
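A time-resolved ROC analysis of this kind can be sketched as below. The area under the ROC curve is computed at each time sample from single-trial values for the two strategies. The onset rule shown here (first sustained threshold crossing) is a simple stand-in chosen for illustration; it is not the knee-detection procedure used in the study, and the threshold and run length are hypothetical values.

```python
def roc_auc(group_a, group_b):
    """Probability that a random value from group_a exceeds one from group_b.

    Ties count as one half. A value of 0.5 means the groups cannot be discriminated.
    """
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def auc_time_series(trials_a, trials_b):
    """ROC area at each time sample; each trial is a list of samples of equal length."""
    n_samples = len(trials_a[0])
    return [
        roc_auc([trial[t] for trial in trials_a], [trial[t] for trial in trials_b])
        for t in range(n_samples)
    ]


def first_sustained_crossing(auc, threshold=0.75, run_length=3):
    """Index of the first sample that starts run_length consecutive values >= threshold.

    Returns None if no such run exists. Threshold and run length are illustrative.
    """
    run = 0
    for t, value in enumerate(auc):
        run = run + 1 if value >= threshold else 0
        if run == run_length:
            return t - run_length + 1
    return None
```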
We observed similar trends in the shoulder muscle, namely, posterior deltoid (Fig. 6D–F). The R1 (T(8) = 1.43, p = 0.164) was similar between the two strategies. In contrast, significant increases were observed for the R2 (T(8) = 4.13, p = 0.004), R3 (T(8) = 4.13, p = 0.004), and EV epochs (T(8) = 4.38, p = 0.003) for the strategy to navigate between obstacles compared with movement around both obstacles.
Experiment 1B: obstacle avoidance when reaching to three potential goals
As in Experiment 1A, we found that small perturbations did not deter subjects from continuing between the obstacles, whereas the large perturbation caused subjects to navigate around both obstacles most of the time (Fig. 7A). Medium perturbations resulted in a mixture of strategies, including passing between and around the obstacles (Fig. 7A,B). Movement strategies expressed for the medium perturbation again appeared to depend on hand position (Fig. 7C,D). Those trials that went around the obstacles tended to be more leftward at 50 ms after perturbation, whereas those that navigated between both obstacles tended to be more rightward at 50 ms after perturbation (K-S test, D = 0.23, p = 0.002). Furthermore, the overall distributions of hand positions of each load magnitude (small, medium, and large) at perturbation onset were similar to that of the unperturbed trials (K-S test, D = 0.01, p ≫ 0.05). We determined that lateral error on the previous trial did not significantly influence the initial reaching direction on the subsequent trial (K-S test, D = 0.18, p = 0.062).
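The D values reported for the K-S tests are the largest vertical gap between the two empirical cumulative distributions being compared. A minimal sketch of that statistic follows; the function name and the example inputs are hypothetical, and the p value calculation is omitted.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical cumulative
    distribution functions of the two samples, evaluated at every observed value.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)  # fraction of a at or below value
        cdf_b = bisect_right(b, value) / len(b)  # fraction of b at or below value
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Here `sample_a` and `sample_b` would be, for example, lateral hand positions 50 ms after perturbation for trials that went around versus between the obstacles. A D near 0 indicates overlapping distributions; a D of 1 indicates complete separation.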
Behavior in each experimental condition in Experiment 1B. A, Hand trajectories from a representative subject for each perturbation magnitude. Red and blue trajectories represent trials that went outside of both obstacles and between both obstacles, respectively. The black arrows indicate the relative perturbation magnitudes and spatial location where they were applied. The red arrow highlights rightward deflections toward the central target of some trials. B, Representation of the percentage of trials that proceeded between the obstacles for each subject and flexion load size in Experiment 1B. C, Mean and SE of x-axis hand trajectories through the first 200 ms of movement across all subjects for reaches that go between the obstacles (blue) and left of both obstacles (red). D, Distributions of hand positions along the x-axis 50 ms after perturbation onset across all subjects. The arrows indicate the mean of each distribution (colors same as in A). Black line indicates distribution for all trials (blue and red combined).
We observed differences in the selection of the end goal based on whether subjects navigated their hand between or around the obstacles. Subjects who went around both obstacles almost always (96%) switched to a new target regardless of the perturbation magnitude. Conversely, if subjects navigated their hand between the obstacles, their terminal hand position was at the center target regardless of the perturbation. However, on some trials, it appears that the selection of the left end target may have occurred after avoiding the obstacle. For example, Figure 7A (left) illustrates a rightward deflection just after passing the obstacle in some hand trajectories for the largest perturbation (small red arrow). This redirection suggests that the decision to select an alternate target may have occurred after the selection to avoid the obstacle.
We quantified perturbation-related muscle activity in the triceps lateralis for the medium perturbation trials that navigated around both obstacles compared with those trials that navigated between both obstacles (Fig. 8). The difference in evoked muscle activity is somewhat reduced compared with Experiment 1A, which may be the result of the presence of the lateral target. However, qualitatively, they follow the same trends. The R1 (20–45 ms) was similar for the two strategies (paired t test, T(7) = 1.47, p = 0.176). Significant increases in muscle activity were observed in the R2 (45–75 ms) and R3 (75–105 ms) long-latency time periods (R2: T(7) = 2.61, p = 0.029; and R3: T(7) = 4.50, p = 0.001) and EV epochs of time (T(7) = 7.91, p ≪ 0.001; 105–130 ms) when the subjects generated a larger corrective response to remain between the obstacles (Fig. 8). Similar to the group results, analysis of individual subjects revealed (t test, p < 0.05) that 1, 6, 7, and 8 of 8 subjects modulated the R1, R2, R3, and EV epochs, respectively. Preperturbation activity was ∼14% higher and statistically significant (T(7) = 2.58, p = 0.03) for trials that navigated between the obstacles compared with around obstacles. However, the magnitude of this effect was very small and cannot account for the differences observed during the long-latency and voluntary epochs. Similar to Experiment 1A, we verified that preperturbation activity had no influence on the later response epochs. When we removed trials (∼10%) that showed increased muscle activity just before perturbation onset from the analysis, we observed significant differences only in the R2, R3, and the EV epochs (p < 0.05). ROC analysis revealed differences in muscle activity for the two types of strategies (between vs around obstacles) that deviated from chance at 52 ms (knee).
Elbow extensor muscle activity for the medium-sized leftward perturbation in Experiment 1B. A, Mean activity of lateral triceps across all subjects aligned to perturbation onset (vertical line). Red and blue trajectories represent trials that went outside of both obstacles and between both obstacles, respectively. The black line indicates muscle activity for the unperturbed reaches. B, Perturbation-evoked response obtained after subtracting the activity of unperturbed reaches for each individual subject. Black line indicates mean, and shaded color represents SE. C, Difference in muscle activity between the two responses in B (mean ± SE). *p < 0.05.
The optimal control model produced qualitatively similar results (Fig. 9A,B). Trials in which the hand passed around the two obstacles always selected the left end target. The small rightward deviations toward the central target observed for human subjects were not reproduced by the model. However, this simply reflects that the model only considers two competing feedback control policies to capture the state-dependent switch in movement path or goal target. As such, and by design, the model predicts that changes in movement path and changes in endpoint goal occur at the same time. However, our data indicate that there may be multiple stages in this process (see Discussion).
Theoretical movement trajectories and position for the medium-sized leftward perturbation in Experiments 1B and 1C. A, Trajectories generated by the optimal feedback controller for Experiment 1B. Red and blue trajectories represent trials that went outside of both obstacles and between both obstacles, respectively. B, Mean and SE of x-axis trajectories in Experiment 1B generated by the optimal control model through the first 200 ms of movement (colors same as in A). C, Trajectories generated by the optimal feedback controller for Experiment 1C (colors same as in A). D, Mean and SE of x-axis trajectories in Experiment 1C generated by the optimal control model through the first 200 ms of movement (colors same as in A).
Experiment 1C: obstacle avoidance when reaching to a bar goal
The use of a rectangular bar as an end goal highlighted qualitatively similar results to the previous experiments. Medium perturbation produced a mixture of behaviors, with some trials going around and others between both obstacles (Fig. 10A). The decision to navigate around or between both obstacles appeared to be position dependent. Trials in which the hand navigated to the left of both obstacles were associated with more leftward hand positions 50 ms after perturbation (Fig. 10B). We observed differences in the final hand position that appeared to be dependent on whether subjects navigated their hand between or around the obstacles. In trials when subjects navigated around the obstacles they reached the end target significantly more leftward of the center position (−3.6 ± 1.5 cm) than those trials that reached the end target (−0.7 ± 0.8 cm) by navigating between the obstacles (paired t test, T(7) = 3.36, p = 0.011). The overall distributions of hand positions of each load magnitude (small, medium, and large) at perturbation onset were similar to that of the unperturbed trials (K-S test, p ≫ 0.05). Again, we found little effect of the previous trial's displacement on the current trial in both variants just as in Experiment 1 (K-S test, D = 0.16, p = 0.071).
Behavior and muscle activity for the medium-sized leftward perturbation in Experiment 1C. A, Hand trajectories from a representative subject for the medium perturbation magnitude. Red and blue trajectories represent trials that went outside of both obstacles and between both obstacles, respectively. B, Distributions of hand positions along the x-axis 50 ms after perturbation onset across all subjects. The arrows indicate the mean of each distribution (colors same as in A). Black line indicates distribution for all trials (blue and red combined). C, Perturbation-evoked response obtained after subtracting the activity of unperturbed reaches for each individual subject. Black line indicates mean, and shaded color represents SE. D, Difference in muscle activity between the two responses in C (mean ± SE). *p < 0.05.
We quantified the perturbation-related muscle activity in triceps lateralis for the movement strategies between and around obstacles. The results are illustrated in Figure 10C. The R1 (20–45 ms) was similar for the two strategies (Fig. 10C; paired t test, T(7) = 1.59, p = 0.155). Significant differences in muscle activity remained in the R2 and R3 long-latency time periods (R2: T(7) = 3.21, p = 0.015; and R3: T(7) = 5.46, p = 0.001) and EV epochs of time (T(7) = 5.20, p = 0.005) when the subjects generated a larger corrective response to remain between the obstacles (Fig. 10C). Analysis of individual subjects yielded similar results to the group (t test, p < 0.05), with 2, 5, 7, and 7 of 8 subjects demonstrating modulation of R1, R2, R3, and EV epochs, respectively. Preperturbation activity was again statistically higher (∼18% larger; T(7) = 2.90, p = 0.026) for trials that navigated between obstacles compared with around obstacles. However, this difference is unlikely to account for the differences observed during the long-latency and voluntary epochs. Similar to above, we removed trials (∼10%) with significantly higher preperturbation muscle activity from the analysis to examine the effects on the subsequent epochs of muscle activity. We found no differences in the R1 epoch (T(7) = 1.59, p = 0.155), but significant differences remained in the R2, R3, and EV epochs (R2: T(7) = 3.21, p = 0.015; R3: T(7) = 5.46, p ≪ 0.01; EV: T(7) = 4.69, p ≪ 0.01). ROC analysis revealed differences in muscle activity for the two types of strategies (between vs around obstacles) that deviated from chance at 51 ms (knee).
The optimal control model produced qualitatively similar results (Fig. 9C,D). Trials in which the hand passed around the two obstacles always selected positions on the bar to the left. The dispersion of hand positions was larger than those observed for movements to the circular targets reflecting the redundancy available with these larger spatial targets (Nashed et al., 2012). Trials that navigated outside of both obstacles resulted in significantly different (K-S test, D = 0.31, p ≪ 0.01) terminal end positions on the bar compared (−4.85 ± 2.77 cm) to those that navigated between obstacles (−0.08 ± 2.34 cm).
Experiment 2: target selection after mechanical perturbations
The unperturbed reaches were straight and qualitatively similar whether there were one or three end targets presented (Fig. 1D). In the three-target condition, subjects could reach any of the three end targets, but they always chose to reach to the central target that was closest to the start position (Fig. 11A). In the three-target condition, we observed that larger perturbations always resulted in switching from the central goal to the leftward goal. Medium perturbations resulted in similar responses to the larger perturbation with ∼85% of trials causing a switch to the leftward goal in the three-target case (Fig. 11A). Small perturbations resulted in an ∼60% switch rate between movement strategies, with some trials being corrected back to the originally intended central goal and others to the leftward goal. In contrast, for the one-target condition, random perturbations applied just after movement onset resulted in rapid corrective responses toward the end target for all perturbation magnitudes (medium perturbation illustrated in Fig. 11B).
Behavior and muscle activity for the medium-sized leftward perturbation in Experiment 2. A, Hand trajectories from a representative subject for the medium perturbation magnitude in the three-target condition. Red and blue trajectories represent trials that navigated to a new target and returned to the originally intended target, respectively. The black lines indicate the unperturbed reaching conditions. B, Hand trajectories from a representative subject for the medium perturbation magnitude in the one-target condition. Blue trajectories represent trials that returned to the originally intended target. C, Perturbation-evoked response obtained after subtracting the activity of unperturbed reaches for each individual subject. Black line indicates mean, and shaded color represents SE for each target condition. D, Difference in muscle activity between the two responses in C (mean ± SE). *p < 0.05. E, F, Trajectories generated by the optimal feedback controller for each target condition. Red and blue trajectories represent trials following the medium perturbation in the three-target and one-target conditions, respectively.
We compared the rapid motor responses generated for the medium perturbation for the one-target and three-target condition (Fig. 11C,D). Preperturbation activity was statistically similar across the two conditions (Fig. 11C,D; paired t test, T(7) = 0.61, p = 0.569). The R1 and R2 were similar for the two conditions (Fig. 11C,D; R1: T(7) = −0.41, p = 0.707; and R2: T(7) = 0.55, p = 0.608). Significant differences in muscle activity were observed in the R3 and EV epochs of time (R3: T(7) = 2.61, p = 0.03; and EV: T(7) = 2.93, p = 0.028; 105–130 ms) with greater EMG for trials when the subjects corrected back to the single central target (Fig. 11C,D). Analysis of individual subjects yielded similar results to the group (t test, p < 0.05), with 0, 2, 7, and 8 of 8 subjects demonstrating modulation of R1, R2, R3, and EV epochs, respectively. ROC analysis revealed differences in muscle activity for the two conditions (one-target vs three-target) that deviated from chance at 71 ms (knee). The presence of the initial deviation occurring before the start of the R3 epoch (75 ms) may explain why a few individual subjects displayed a significant change in the R2 epoch. The timing of these ROC results is delayed by ∼15 ms compared with the decision to avoid the obstacle in Experiment 1.
Optimal control models generated qualitatively similar results in that movements were initially directed to the central target and were redirected to the peripheral targets when perturbations were applied (Fig. 11E,F). However, the hand was redirected for all perturbed trials as the cost-to-go to attain the peripheral targets was always found to be smaller than to correct back to the central target. For smaller perturbations, adding a specific cost to switch end targets, or adjusting several other aspects of the model, could allow the model to reproduce the ability to switch or maintain the same end goal for the same perturbation size.
Discussion
Our study highlights how rapid motor responses to mechanical perturbations are modulated with the presence of obstacles and multiple goals. Previous studies have shown that long-latency responses possess considerable flexibility, including goal-directed corrections (Hammond, 1956; Pruszynski et al., 2008; Dimitriou et al., 2012; Nashed et al., 2012; Pruszynski and Scott, 2012; Crevecoeur et al., 2013; Omrani et al., 2013), stability (Nichols and Houk, 1976; Akazawa et al., 1982; Krutky et al., 2010), and knowledge of limb mechanics (Lacquaniti and Soechting, 1984; Kurtzer et al., 2008, 2009; Pruszynski et al., 2011). However, in each case, there was only one nominal strategy expressed in the corrective response. Here we show that two distinct motor patterns or strategies can be expressed during the long-latency time period.
This ability to make corrective movements that avoid obstacles in the environment or select among alternate behavioral goals highlights the intimate link between decision making and motor control (Cisek, 2012; Wolpert and Landy, 2012). For example, recent work highlights how movements to a spatial goal can be redirected by an ongoing perceptual decision (Resulaj et al., 2009) and that long-latency motor responses are continuously modulated during this decisional process (Selen et al., 2012). As well, properties of the physics of the limb (Cos et al., 2011), extrinsic constraints, such as obstacles or penalties in the environment (Sabes et al., 1998; Trommershäuser et al., 2003), and the number and position of targets (Chapman et al., 2010) also influence our decisions on how to move in the world. Our work illustrates that factors that have been shown to influence decisional processing before moving can also be taken into account during movement when certain conditions arise, such as external disturbances to the motor system.
An important question is to what degree the rapid motor responses to avoid obstacles or move to alternate goals are preplanned. The ability to evoke long-latency responses based on subject intent has a long, rich history since the seminal studies by Hammond (1956) (for a review, see Pruszynski and Scott, 2012). However, these previous studies typically involved the rapid initiation of movement from a stationary posture. Further, there was only one nominal goal, verbal or spatial. In contrast, it seems unlikely that either of the motor responses to avoid the obstacles was entirely preplanned. First, we observed no difference in the overall positional distributions (combined movement strategies) for the medium perturbation condition, suggesting that there was no systematic bias in the initial aiming direction. In Experiment 1A, although hand positions at perturbation onset were similar, when the medium perturbation was decomposed into the two strategies (between vs around obstacles), it is clear that the position of the hand influenced which strategy was expressed (Fig. 3A,B). Second, subjects did not know the presence, size, or direction of the perturbation applied (Fig. 1B). Therefore, the actual perturbation direction and magnitude, unknown at movement initiation, was clearly influencing the decision to navigate between or around the obstacles.
At the same time, it is difficult to imagine that the entire decisional process to avoid the obstacle or attain an alternate goal was performed after the perturbation, given the speed of the motor responses (∼60 ms and ∼75 ms, respectively). OFC provides an important didactic model to describe how multiple potential motor strategies during movement can be precomputed and how the selection of the best motor command determined by the cost-to-go is expressed in the feedback control policy. In the framework of OFC, the initiation of movement toward a behavioral goal is selected to minimize the expected remaining cost. This cost-to-go also dictates how to select motor commands at each point in time based on the estimated state of the system (e.g., position, velocity). In our optimal control models, corrective responses to the right or left of the obstacle were dictated by the instantaneous estimated position of the hand. In other words, both strategies to avoid the obstacle are computed and represented in the feedback control policy. Which strategy is evoked on a given trial simply reflects the estimated instantaneous position (and velocity and acceleration) of the hand following the perturbation, an interplay between variability in the trajectory generated by intrinsic noise in the motor system and internal knowledge of the motion evoked by the perturbation likely available during the long-latency epoch (Crevecoeur and Scott, 2013). Although our model used two competing cost-to-go functions, in principle, a single (more complex) state-dependent control policy can capture such changes in movement strategy, including rerouting around the obstacles or changes in movement goal (Bryson and Ho, 1975).
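The state-dependent selection between two precomputed strategies can be illustrated with a toy example. This sketch is not the optimal feedback control model used in the study: the quadratic cost form, the via points, the weights, and the baseline costs are all hypothetical values chosen only to show how the strategy with the smaller cost-to-go is selected from the estimated lateral hand position.

```python
def cost_to_go(x_hat_cm, via_point_cm, effort_weight, baseline):
    """Toy cost-to-go: a fixed baseline plus a quadratic penalty on the lateral
    distance between the estimated hand position and the strategy's via point."""
    return baseline + effort_weight * (x_hat_cm - via_point_cm) ** 2


# Hypothetical parameters. "between" passes through the gap at x = 0 cm;
# "around" passes left of both obstacles and carries a longer-path baseline cost.
STRATEGIES = {
    "between": {"via_point_cm": 0.0, "effort_weight": 1.0, "baseline": 0.0},
    "around": {"via_point_cm": -6.0, "effort_weight": 1.0, "baseline": 4.0},
}


def select_strategy(x_hat_cm, strategies=STRATEGIES):
    """Return the name of the strategy with the smallest cost-to-go, plus all costs."""
    costs = {
        name: cost_to_go(x_hat_cm, **params) for name, params in strategies.items()
    }
    return min(costs, key=costs.get), costs
```

With these illustrative numbers, an estimated position of −1 cm selects "between" and −5 cm selects "around". Both options exist before the perturbation arrives, and the estimated state alone determines which one is expressed.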
It is important to note that, although OFC provides a useful normative model to describe these corrective responses, we do not propose that the brain explicitly implements its mathematics (Scott, 2012). Such models identify what good control ought to look like; deviations from “optimal behavior,” such as timing differences for corrections to avoid an obstacle versus attain an alternate goal, provide important insight on the simplifying strategies used by the brain to control motor actions (see below).
In the present case, optimal control highlights that voluntary actions reflect a similar interplay between preplanning and online feedback control. The preplanning phase requires not just setting motor circuits to generate a spatiotemporal pattern of muscle activities to generate movement to the desired goal. Rather, the motor circuits must also provide appropriate corrective responses that consider the features of the behavioral goal, environmental conditions, and properties of the limb. A small disturbance or noise in the motor system leads to a small corrective response (Crevecoeur et al., 2012). However, a large disturbance may require a more complex response such as to avoid an object in the environment or to choose an alternate goal, as in the present study. This preselection process permits rapid “decisional” processes to be generated simply based on estimates of the present state of the limb.
Thus, we propose that the motor system calculates many potential responses to attain a behavioral goal with the specific pattern that is actually expressed being dependent on the estimated state of the limb during movement. Given the nonlinear complexities of the motor system and environment, it is clearly not possible to calculate all possible solutions, so there must be simplifying approaches and limits, both spatial and temporal, to the number of alternate strategies that can be considered in a control policy. For example, the motor system may only prepare online corrective actions for a short time period (e.g., ∼100 ms in advance), much like model-predictive control generates locally optimal feedback control over a finite time horizon (Lee, 2011). Further, there may be limits as to the number of obstacles or alternate paths that can be preplanned in the control policy. Previous work has shown that we can plan multiple potential actions before movement initiation (Cisek and Kalaska, 2005), and we suggest that similar processes can be maintained and used during online control. Thus, an aspect of elite motor performance may be the ability to prepare multiple ways to perform a task so that alternate strategies can be rapidly selected based on sensory feedback during movement.
Differences in the timing of corrective responses associated with avoiding obstacles (Experiment 1) versus correcting to alternate goals (Experiment 2) suggest that, unlike a single-stage optimal feedback controller, the brain has a hierarchical structure for processing different aspects of motor corrections. The long-latency stretch response has traditionally been divided into two separate time periods, consisting of the R2 epoch (∼45–75 ms) and the R3 epoch (∼75–105 ms). A distinction between R2 and R3 was first proposed by Lee and Tatton (1975), and subsequent work connected R2 with transcortical feedback through primary somatosensory and motor cortex, and R3 with cerebellocortical feedback involving the dentate nucleus (Meyer-Lohmann et al., 1975; Thach, 1975; Evarts and Tanji, 1976; Strick, 1976, 1983; Vilis et al., 1976). Studies on the sophistication of rapid motor responses for the proximal arm have generally found that task-dependent changes occur during the R2 epoch, including knowledge of limb dynamics, stability, and intended movement (Hammond, 1956; Nichols and Houk, 1976; Akazawa et al., 1982; Lacquaniti and Soechting, 1984; Kurtzer et al., 2008, 2009; Pruszynski et al., 2008, 2011; Krutky et al., 2010; Dimitriou et al., 2012; Nashed et al., 2012; Pruszynski and Scott, 2012; Omrani et al., 2013). The avoidance of obstacles observed in the present study also led to changes in rapid motor responses in the R2 epoch. In contrast, the selection of alternate goals in Experiment 2 of the present study is later and appears to elicit a response only during the R3 epoch. These new features of R2 and R3 processing may reflect primary somatosensory and dentate input, respectively, to primary motor cortex, although many other pathways may also be involved (Scott, 2012).
From a behavioral perspective, R2 appears to reflect corrections for a specific behavioral goal, including whether this goal has been prepared and simply needs to be launched following a mechanical perturbation (e.g., Hammond, 1956; Evarts and Tanji, 1976; Pruszynski et al., 2008). In contrast, the R3 epoch may provide a higher-level corrective process to elicit rapid motor responses to select new or alternate behavioral goals not presently selected. Thus, R2 reflects corrective responses on “how” to attain a behavioral goal, whereas R3 reflects responses on “what” goal to attain. Whether these processes reflect different feedback pathways and/or brain regions remains to be elucidated.
Footnotes
This work was supported by the Natural Sciences and Engineering Research Council of Canada. We thank Kim Moore and Justin Peterson for their technical support.
S.H.S. is associated with BKIN Technologies, which commercializes the KINARM device used in this study. The remaining authors declare no competing financial interests.
Correspondence should be addressed to Dr. Stephen H. Scott, Room 219, Botterell Hall, Queen's University, Kingston, Ontario K7L 3N6, Canada. steve.scott@queensu.ca