Everyday movements pursue diverse and often conflicting mixtures of task goals, requiring sensorimotor strategies customized for the task at hand. Such customization is mostly ignored by traditional theories emphasizing movement geometry and servo control. In contrast, the relationship between the task and the strategy most suitable for accomplishing it lies at the core of our optimal feedback control theory of coordination. Here, we show that the predicted sensitivity to task goals affords natural explanations to a number of novel psychophysical findings. Our point of departure is the little-known fact that corrections for target perturbations introduced late in a reaching movement are incomplete. We show that this is not simply attributable to lack of time, in contradiction with alternative models and, somewhat paradoxically, in agreement with our model. Analysis of optimal feedback gains reveals that the effect is partly attributable to a previously unknown trade-off between stability and accuracy. This yields a testable prediction: if stability requirements are decreased, then accuracy should increase. We confirm the prediction experimentally in three-dimensional obstacle avoidance and interception tasks in which subjects hit a robotic target with programmable impedance. In additional agreement with the theory, we find that subjects do not rely on rigid control strategies but instead exploit every opportunity for increased performance. The modeling methodology needed to capture this extra flexibility is more general than the linear-quadratic methods we used previously. The results suggest that the remarkable flexibility of motor behavior arises from sensorimotor control laws optimized for composite cost functions.
- arm movement
- optimal feedback control
- composite cost
- obstacle avoidance
- stability–accuracy trade-off
- behavioral flexibility
Humans interact with a diverse and uncertain environment requiring flexible motor behavior. Here, we show that the optimal feedback control theory that we (Todorov and Jordan, 2002a,b; Todorov, 2004, 2005; Todorov et al., 2005) and others (Meyer et al., 1988; Loeb et al., 1990; Hoff, 1992; Kuo, 1995; Scott, 2004) have pursued affords the flexibility apparent in behavioral data, in contrast with more traditional theories (Flash and Hogan, 1985; Uno et al., 1989; Bizzi et al., 1992; Feldman and Levin, 1995). We distinguish two forms of flexibility. From the perspective of motor planning or preparation, flexibility entails taking into account multiple task requirements and properties of the environment known before movement, and preparing a sensorimotor strategy with both open-loop and closed-loop components customized for the present task and circumstances. From the perspective of motor execution, a strategy is flexible if its closed-loop component makes on-line adjustments that exploit the multiple ways in which a redundant musculoskeletal plant can achieve the same behavioral goal. Both forms of flexibility are obvious desiderata for a well designed estimation-and-control system such as the sensorimotor system.
Previous work has emphasized the evidence for flexibility during execution, in particular the structure of motor variability (which is larger in task-irrelevant dimensions) and the goal-directed nature of on-line corrections (Bernstein, 1967; Cole and Abbs, 1987; Scholz and Schoner, 1999; Domkin et al., 2002; Todorov and Jordan, 2002b). We explained such phenomena with the minimal intervention principle, which states that task-irrelevant deviations from the average behavior should be left uncorrected to maximize performance (Todorov and Jordan, 2002a,b; Todorov, 2004). This argument is further advanced here by showing that (1) target displacement can cause correction before the hand has cleared an intermediate obstacle, ruling out the imaginary via points postulated by alternative models, and (2) end-point variability matches the shape of an elongated target and feedback corrections in the redundant dimension are suppressed, as the minimal intervention principle predicts. Apart from these findings, however, our emphasis here is on flexibility in motor planning/preparation.
With the exception of speed–accuracy trade-offs (Jeannerod, 1988), the systematic relationship between sensorimotor strategies and mixtures of task goals (as well as properties of the environment) has received surprisingly little attention. Optimal control models, which dominate the thinking on trajectory planning, have traditionally optimized a homogeneous cost and treated all other goals as hard constraints; the latter are supposed to be specified externally, outside the scope of such models. The homogeneous cost could be energy consumption (Nelson, 1983; Anderson and Pandy, 2001), the derivative of hand acceleration (Flash and Hogan, 1985), the derivative of joint torque (Uno et al., 1989), or endpoint variance (Harris and Wolpert, 1998). The constraints include endpoint position, final velocity and acceleration (typically zero), movement time, and intermediate points along the trajectory. However, these hypothetical constraints are rarely explicit in real-world tasks, raising two questions: (1) how are their values chosen, and (2) are their values “chosen” in the first place, or are they stochastic outcomes of the complex interactions among sensorimotor strategy, noise, musculoskeletal dynamics, and environment, like any other feature of individual movements? Our previous analysis (Todorov and Jordan, 2002a,b) showed that choosing desired values for movement parameters that are not explicitly specified by the task is suboptimal, no matter how the choice is made. This answers question 2 and renders question 1 irrelevant. Instead of satisfying self-imposed constraints, we propose that the CNS relies on sensorimotor strategies optimized for composite cost functions.
In the present experiments, the relevant cost components encourage energetic efficiency, endpoint positional accuracy (measured as bias and variance), endpoint stability (defined as bringing the movement to a complete stop), and movement speed (avoiding the time-out errors incurred when duration exceeds a threshold). We show that, as the relative importance of these components is varied by the experimenter, subjects modify their strategy in agreement with our theory. As in previous stochastic optimal control models (Harris and Wolpert, 1998; Todorov, 2002; Todorov and Jordan, 2002b), taking into account the empirically established signal-dependent nature of motor noise (Sutton and Sykes, 1967; Schmidt et al., 1979; Todorov, 2002; Hamilton et al., 2004) turns out to be important.
Although our focus is on effects taking place before movement, the effects in question correspond to changes in a control strategy with both open-loop and closed-loop components, which in turn is best studied using perturbations. Perturbing the target of a reaching movement in an unpredictable direction has been a productive paradigm for investigating the mechanisms of on-line visuomotor corrections (Pelisson et al., 1986; Prablanc and Martin, 1992; Desmurget and Grafton, 2000). Most previous studies have introduced perturbations around the time of movement onset, and found that the hand path is smoothly corrected to reach the displaced target, in agreement with multiple models of motor control (Hinton, 1984; Flash and Henis, 1991; Hoff and Arbib, 1993; Torres and Zipser, 2002). However, perturbations introduced late in the movement may be more informative because they are not fully corrected, in contradiction with alternative models and, somewhat paradoxically, in agreement with optimal feedback control. Such phenomena have been observed with both visual target perturbations (Komilis et al., 1993) and mechanical limb perturbations (Popescu and Rymer, 2000). In the case of limb perturbations, the correction reflects both neural feedback and musculoskeletal impedance, which are seamlessly integrated (Nichols and Houk, 1976) and difficult to disentangle. Therefore, we focus on target perturbations. We first design a two-dimensional reaching experiment to rule out the trivial explanation that the incomplete correction is simply attributable to lack of time. We then replicate the phenomenon in our model, and find that it reflects a previously unknown trade-off between endpoint accuracy and stability. This yields a novel prediction: if stability requirements are decreased, then accuracy should increase. The prediction is confirmed in three-dimensional obstacle avoidance and interception experiments.
The latter experiments give rise to rich motor behavior, allowing us to make a number of additional observations consistent with our theory. These include shaping variability patterns to buffer noise in redundant dimensions, adjusting movement duration to take advantage of temporal error margins, exploiting target impedance and surface friction to achieve endpoint stability, reallocating corrective action among redundant actuators to balance signal-dependent noise and inertial constraints, and correcting for task-relevant perturbations before having reached the task-irrelevant subgoals hypothesized by alternative models.
Materials and Methods
Experiment 1.
Seven subjects made planar reaching movements on a table positioned at chest level. A 21 inch flatscreen monitor was mounted above the table facing down and was viewed in a see-through horizontal mirror. In this way, computer-generated images could be physically aligned with the hand workspace. Movement kinematics was recorded with an Optotrak 3020 infrared sensor at 100 Hz. A small pointer, which had an Optotrak marker and a light-emitting diode (LED) attached near its tip, was held in the dominant right hand. The task was to move the LED to a starting position, wait for a target to appear, and move to the target when ready. Movement onset was detected on-line using a 1 cm threshold on the distance between the pointer and the starting position. Analysis of speed profiles (see Fig. 1b) revealed that the actual movement started ∼100 ms before the distance threshold was reached. Therefore we define the origin of the time axis to be 100 ms before the on-line detection of movement onset and report all times relative to this corrected origin.
The end of the movement was defined as the first point in time when the hand speed had remained <0.5 cm/s for 40 ms. The LED was turned off at movement onset, turned on at the end of the movement, and remained on in the repositioning phase. The room was dark. The target was always visible. Thus, movements were made without visual feedback of the hand, although subjects could see their endpoint error as soon as the movement ended. Movement duration was required to be between 600 and 800 ms. If the duration on any trial fell outside these boundaries, the computer displayed a “slow down” or “speed up” message, respectively. Movement amplitude was 30 cm. The main movement was in the lateral direction from right to left (although in Figs. 1a and 2a, we plot the movements from left to right, for consistency with the space–time plots).
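The end-of-movement criterion above can be sketched in a few lines. This is a minimal illustration, not the acquisition code used in the experiment: the function name and the reading of "40 ms" as four consecutive samples at 100 Hz are assumptions.

```python
import numpy as np

def detect_movement_end(speed, sample_rate_hz=100.0, speed_threshold=0.5, hold_ms=40.0):
    """Return the index of the first sample at which the speed (cm/s) has stayed
    below speed_threshold for hold_ms, or None if that never happens.

    Assumption: "remained below threshold for 40 ms" is taken to mean
    round(hold_ms * sample_rate_hz / 1000) consecutive below-threshold samples.
    """
    n_hold = int(round(hold_ms * sample_rate_hz / 1000.0))
    below = np.asarray(speed, dtype=float) < speed_threshold
    run = 0  # length of the current run of below-threshold samples
    for i, flag in enumerate(below):
        run = run + 1 if flag else 0
        if run >= n_hold:
            return i
    return None

# Hypothetical speed trace (cm/s) sampled at 100 Hz
speed_trace = [5.0, 3.0, 0.4, 0.6, 0.3, 0.2, 0.1, 0.4, 0.2]
end_index = detect_movement_end(speed_trace)
```

A single sample above threshold resets the run, so brief dips below 0.5 cm/s during the movement do not trigger a premature end.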
After a brief familiarization session, every subject performed 240 trials. Within each trial, the target could either remain stationary or jump 5 cm forward or backward, orthogonal to the main movement direction. Subjects were instructed that jumps may occur, and asked to always move to the final target position and stop there within the allowed time interval. The jumps occurred at 100, 200, or 300 ms. There were 180 perturbed trials (30 for every possible latency–direction combination) and 60 nonperturbed trials, presented in random permutation order.
Experiment 2.
Eight subjects made three-dimensional arm movements around a horizontal obstacle while aiming for the center of a vertical target (see Fig. 4a). Movement dimensions are illustrated in Figure 4b. The target was a 20 × 5 cm wooden board with a bull's-eye pattern, and was mounted on a 3DOF robot (Delta Haptic Device; Force Dimension, Lausanne, Switzerland). Subjects held in their right hand a 7 cm wooden pointer with an electromagnetic Polhemus (Colchester, VT) Liberty sensor attached to it. The sensor measured three-dimensional position and orientation (at 240 Hz), making it possible to compute the position of the tip of the pointer. The latter is referred to as the “hand.” Another sensor was attached to the target (which was also tracked via the encoders of the robot at 1000 Hz). Before each trial, the robot moved the target away and waited for the subject to initiate the trial, by inserting the tip of the pointer in a small receptacle mounted below the obstacle and remaining stationary for 100 ms. Then, the robot “presented” the target by moving toward the subject, at which time the subject was free to move when ready. In case of an anticipation error, the computer played a sound and aborted the trial. Hand movement onset was detected with a 1 cm position threshold on the distance from the starting position. Subsequent analysis of speed profiles revealed that the movement started ∼50 ms before it was detected. Therefore, we define the origin of the time axis to be 50 ms earlier.
Movement end was detected when the hand speed remained <10 cm/s for 40 ms, or when the target was displaced because of impact with the hand by >0.4 cm. The maximum allowed movement duration was 900 ms. Time-out errors were signaled by moving the target away and playing a loud sound. During the hand movement, the target was either stationary or rapidly displaced by the robot 9 cm left or right. The robot trajectory is illustrated in Figure 4, c and f; it was generated by a model-based controller with both open-loop and closed-loop components. The target jump could be initiated at 50 ms (early) or 350 ms (late).
After brief familiarization, subjects performed 20 trials without perturbations, followed by two experimental conditions/sessions with perturbations, 120 trials each. Each experimental session included 40 early jumps, 40 late jumps, and 40 baseline trials presented in random permutation order. Left and right jumps were equally probable. Subjects were instructed that jumps may occur and asked to always move to the final target position within the allowed time interval. The two experimental sessions differed in the stopping requirements of the task. In the stop condition, subjects were asked to slow down their hand movement before contact and touch the target gently, so that the impact would not displace the target by >0.4 cm. If it did, the computer played an explosive sound and the target was moved away rapidly. This indicated to the subject that the target has been hit too hard. In the hit condition, hitting the target hard was no longer an error. The robot was placed in high-gain servo mode (with carefully chosen nonlinearities to avoid instability) and was able to absorb the impact with the hand. The maximum force output of the robot is 25 N. Subjects were not explicitly asked to hit the target harder, but they quickly discovered the benefits of using such a strategy. One-half of the subjects performed the hit condition first and then the stop condition; the order was reversed for the other one-half of the subjects.
Experiment 3.
Ten subjects performed a task similar to experiment 2, with the following modifications. A pressure sensor (FSR; Interlink Electronics, Camarillo, CA) was installed inside the starting position receptacle and used to detect movement onset earlier and more reliably. An ATI Mini-40 six-axis force-torque sensor (2000 Hz sampling) was mounted behind the target. It allowed more reliable detection of contact (which was now defined as the movement end) as well as direct measurement of contact force. Subjects still initiated the movement when ready. As soon as hand movement was detected, the robot began to move downward at a constant speed (6.67 cm/s). This motion continued until the trial ended because of hand–target contact, or until the target hit the horizontal edge of a wooden board mounted underneath. This happened 900 ms after movement onset and was defined as a time-out error. The downward motion was repeatable and easily predictable, providing subjects with an explicit representation of allowed movement duration. When the target jumped (9 cm left or right), the rapid lateral motion was superimposed on the slow downward motion. Instead of the bull's-eye pattern, the target now had a pattern of vertical stripes (5 × 1 cm each), with gray levels increasing with lateral distance from the center stripe (which was white). Subjects were asked to make contact with the target as close to the center stripe as possible. In the stop condition, the threshold for hitting too hard was now defined in terms of force rather than displacement (0.8 N in the first 8 ms after contact). The bookshelf obstacle from Figure 4a was now replaced with a horizontal bar, and moved backward to induce a more curved hand movement (dimensions are illustrated in Fig. 5c). Early jumps were triggered at movement onset; late jumps were triggered at 400 ms after movement onset; allowed movement duration was 900 ms. 
Every subject still participated in two conditions, hit and stop, in counterbalanced order. In each session, we scheduled 25 baseline trials, 25 early jumps, and 25 late jumps in random permutation order. Any failed trials (time-out errors or hitting-hard errors) were now rescheduled at a random time later in the same session, yielding a more balanced database of analyzable trials. Instead of 20 no-perturbation trials before the experiment, we now had 10 trials before and 10 trials after the experiment. These were used to compute variability in the absence of perturbations (see Fig. 8, “baseline”).
In experiments 2 and 3, the wrist was immobilized with an orthopedic brace to avoid corrective movements using the wrist. There were two reasons for this restriction. First, pilot experiments revealed different involvement of the wrist in early versus late corrections (see Fig. 6c), making the comparison between conditions difficult. Second, our models assume point-mass dynamics and do not capture wrist movements.
In experiments 2 and 3, both the time-out errors and the hitting-hard errors were signaled immediately, in a way that disrupted the behavior, and therefore error trials could not be included in the analysis. Only no-error trials were analyzed. They constituted 79% of all trials in experiment 2 and 58% in experiment 3. The higher overall error rate in experiment 3 is because we repeated failed trials, and so subjects performed more trials in the more difficult conditions (late/stop in particular). In experiment 1, errors were signaled by the computer only after the movement had stopped. Thus, the error signals could not disrupt the behavior, allowing us to include trials whose duration was slightly over the time limit (up to 100 ms). Presumably these movements were generated by the same underlying mechanism and the longer duration was simply attributable to trial-to-trial variability. Ninety-three percent of all trials in experiment 1 were analyzed.
All statistical tests were based on n-factor ANOVA (“anovan” in the Matlab Statistics Toolbox). We avoided averaging to the extent possible. In the comparisons of undershoot and duration (see Figs. 1d,g, 4d,e,g,h) and wrist contribution (see Fig. 6c), individual trials were treated as repeated measures, and the factors were the experimental conditions (perturbation time, stop vs hit when applicable) as well as the subject identity. Thus, we had two factors in experiment 1 and three factors in experiments 2 and 3 and the pilot experiment in Figure 6c. The subject identity was modeled as a factor with random effects because subjects are drawn randomly from the population. In the comparisons of SDs (see Figs. 3e, 6a), time-out error rates (see Fig. 1h), and lateral velocities (see Fig. 6b), all trials that a subject performed in a given condition were combined to obtain a single number. All comparisons of means were based on Tukey's criterion for post hoc hypothesis testing. Differences are reported as significant when p < 0.05. The error bars shown in the figures correspond to ±1 pooled SE of the mean, as computed by the multiple-comparison function used to perform Tukey's test (“multcompare” in the Matlab Statistics Toolbox).
Optimal feedback control model (linear-quadratic-Gaussian).
We model the hand as an m = 1 kg point mass moving in a horizontal plane, with viscosity b = 10 Ns/m approximating intrinsic muscle damping. The point mass is driven by two orthogonal force actuators that can both pull and push (approximating two pairs of agonist–antagonist muscles). The actuators act as muscle-like first-order low-pass filters of the control signals, with time constant τ = 0.05 s. These settings of m, b, and τ were chosen to be compatible with biomechanics and were not adjusted to fit the data.
Let p(t), v(t), a(t), u(t) be the two-dimensional hand position, velocity, actuator state, and control signal, respectively. The corresponding units are meters, meters/second, newtons, and newtons. The time index is t ∈ [0, tf]. The final time tf is specified (taken from the experimental data in Fig. 1g). The plant dynamics in continuous time are modeled as follows:

$$
\begin{aligned}
dp(t) &= v(t)\,dt \\
m\,dv(t) &= \bigl(a(t) - b\,v(t)\bigr)\,dt \\
\tau\,da(t) &= \bigl(u(t) - a(t)\bigr)\,dt + M(u(t))\,dw(t)
\end{aligned}
$$

w(t) is standard Brownian motion. M(u(t)) represents control-multiplicative or signal-dependent motor noise, and is given by the following:

$$
M(u) = \begin{bmatrix} c_1 u_1 & -c_2 u_2 \\ c_1 u_2 & c_2 u_1 \end{bmatrix}
$$

c1 = 0.15 corresponds to two-dimensional noise in the same direction as the control vector u(t), whereas c2 = 0.05 corresponds to two-dimensional noise in the direction orthogonal to u(t). The parallel noise component is larger because muscles pulling in the direction of net muscle force are more active and therefore more affected by signal-dependent noise. The parameters c1, c2, as well as the sensory noise magnitude σ described below, are adjusted so that the baseline variability predicted by the model (see Fig. 2e) is similar to the experimental data (see Fig. 1e). Note that the shape of these curves cannot be fully captured by three scalar parameters, and so the good fit mostly reflects the quality of the model. Denoting the target position p*(t), we can assemble all variables into an eight-dimensional state vector as follows:

$$
x(t) = \bigl[\,p(t);\; v(t);\; a(t);\; p^*(t)\,\bigr]
$$

and write its dynamics in general first-order form as follows:

$$
dx(t) = \bigl(A\,x(t) + B\,u(t)\bigr)\,dt + C\,M(u(t))\,dw(t)
$$

with A, B, and C obtained from the above equations.
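The directional structure of the signal-dependent noise can be illustrated numerically. The sketch below assumes that M(u) is the 2 × 2 matrix whose first column is c1·u (parallel to the control) and whose second column is c2 times u rotated by 90 degrees; the function name, the test control vector, and the use of unit Gaussian samples in place of Brownian increments are choices made for illustration only.

```python
import numpy as np

def noise_matrix(u, c1=0.15, c2=0.05):
    """Assumed form of M(u): columns c1 * u (parallel) and c2 * u_perp (orthogonal)."""
    u = np.asarray(u, dtype=float)
    u_perp = np.array([-u[1], u[0]])  # u rotated by 90 degrees
    return np.column_stack((c1 * u, c2 * u_perp))

rng = np.random.default_rng(0)
u = np.array([3.0, 4.0])  # hypothetical control vector, |u| = 5 N

# Noise samples M(u) * xi, with xi a two-dimensional unit Gaussian
samples = noise_matrix(u) @ rng.standard_normal((2, 100_000))

unit = u / np.linalg.norm(u)
unit_perp = np.array([-unit[1], unit[0]])
sd_parallel = np.std(unit @ samples)         # expected near c1 * |u| = 0.75
sd_orthogonal = np.std(unit_perp @ samples)  # expected near c2 * |u| = 0.25
```

Both standard deviations scale linearly with the magnitude of u, which is what makes the noise signal dependent, and with these values of c1 and c2 the parallel component is three times the orthogonal one.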
To define an optimal control problem, we also need a cost function. As in our previous work (Todorov and Jordan, 2002b), we use a mixed cost function defined as follows: The three cost terms encourage endpoint positional accuracy, stopping at the target, and energetic efficiency, respectively. The activations a (t) are scaled by sa = 0.1 because their numerical values turn out to be an order of magnitude larger than positions and velocities. The weight wenergy = 0.00005 of the control term is a free parameter (it is not clear how to estimate this parameter independently). The weight wstop determines the relative importance of coming to a complete stop (i.e., achieving zero velocity and acceleration) at the end of the movement. We use wstop = 1 for the stop condition and wstop = 0.01 for the hit condition. These values are chosen to capture the qualitative differences between the stop and hit conditions. Note that we do not model the three-dimensional experiments explicitly.
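One cost consistent with the three terms described above can be written as follows. This is a sketch under stated assumptions rather than a quotation of the model: the grouping of velocity and scaled activation under a single weight, and the absence of any normalization of the integral, are assumed, and $t_f$, $s_a$, $w_{\mathrm{stop}}$, $w_{\mathrm{energy}}$ stand for tf, sa, wstop, wenergy.

```latex
J \;=\;
\underbrace{\bigl\| p(t_f) - p^*(t_f) \bigr\|^2}_{\text{endpoint accuracy}}
\;+\;
\underbrace{w_{\mathrm{stop}} \Bigl( \bigl\| v(t_f) \bigr\|^2 + \bigl\| s_a\, a(t_f) \bigr\|^2 \Bigr)}_{\text{stopping at the target}}
\;+\;
\underbrace{w_{\mathrm{energy}} \int_0^{t_f} \bigl\| u(t) \bigr\|^2 \, dt}_{\text{energetic efficiency}}
```

Under this form, lowering $w_{\mathrm{stop}}$ from 1 to 0.01 leaves the accuracy and energy terms unchanged and only relaxes the penalty on residual velocity and activation at the final time, which is the manipulation that distinguishes the stop and hit conditions.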
The state of the plant x(t) is not directly observable but has to be inferred from noisy observations whose time integral y(t) satisfies the following:

$$
dy(t) = x(t)\,dt + G\,dz(t)
$$

The sensory noise covariance is diagonal: G = σ diag(1, 1, 1, 1, 1/sa, 1/sa, 0, 0), with σ = 0.015 adjusted to reproduce the observed movement variability. z(t) is standard Brownian motion. In comparison with our previous model (Todorov and Jordan, 2002b), the present model is simpler in that here we use first-order rather than second-order muscle filters and do not explicitly represent sensory delays. We do not model explicitly the visuomotor delays or uncertainty in detecting the target observations. To obtain correct reaction times, we simply model each target perturbation as occurring 120 ms later than the corresponding experimental perturbation.
With these definitions, we discretize the time axis (with 1 ms time step) and obtain a discrete-time linear-quadratic-Gaussian (LQG) optimal control problem. The reason for formulating the model in continuous time and then discretizing the time axis, as opposed to working in discrete time all along, is that in a discrete-time formulation the model parameters are affected by the time step. If one were to change the time step, it would not be obvious how the model parameters should scale. Continuous-time formulations have the advantage of being independent of discretization time steps. For details on how to discretize a continuous-time system, see Li and Todorov (2007).
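The continuous-to-discrete step can be sketched for the noise-free part of the plant. The code below assumes the dynamics dp = v dt, m dv = (a − b v) dt, τ da = (u − a) dt with state x = [p; v; a; p*], and uses a forward-Euler discretization; the Euler scheme is an assumption made here for brevity and is not necessarily the scheme of Li and Todorov (2007).

```python
import numpy as np

m, b, tau = 1.0, 10.0, 0.05  # kg, N*s/m, s (values given in the text)
dt = 0.001                   # 1 ms time step

I2, Z2 = np.eye(2), np.zeros((2, 2))

# Continuous-time matrices for x = [p; v; a; p*] (assumed noise-free dynamics)
A = np.block([
    [Z2, I2,            Z2,        Z2],  # dp  = v dt
    [Z2, -(b / m) * I2, I2 / m,    Z2],  # dv  = (a - b v) / m dt
    [Z2, Z2,            -I2 / tau, Z2],  # da  = (u - a) / tau dt
    [Z2, Z2,            Z2,        Z2],  # dp* = 0 (target is constant between jumps)
])
B = np.vstack([Z2, Z2, I2 / tau, Z2])

# Forward-Euler discretization: x[k+1] = A_d x[k] + B_d u[k]
A_d = np.eye(8) + dt * A
B_d = dt * B

def simulate(x0, u, n_steps):
    """Apply a constant control u for n_steps and return the final state."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = A_d @ x + B_d @ u
    return x

# A constant 1 N command along the first axis, held for 1 s
x_final = simulate(np.zeros(8), np.array([1.0, 0.0]), 1000)
```

Because m, b, and τ are defined in continuous time, changing dt only changes A_d and B_d; the physical parameters stay the same. In a stochastic simulation the noise term would additionally be scaled by the square root of dt, as in the Euler-Maruyama scheme.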
The presence of signal-dependent noise complicates matters; however, we derived an efficient algorithm for solving such problems previously (Todorov, 2005). That algorithm is applied here to yield a modified Kalman filter for computing the optimal state estimate, x̂(t), and an optimal feedback controller of the following form:

$$
u(t) = -L(t)\,\hat{x}(t)
$$

Once the filter and controller are available, the state is initialized with the experimentally defined starting position and v(0) = a(0) = 0, and the system is simulated until the final time tf.
The time-varying matrix of feedback gains L(t) is 2 × 8 and is in principle described by 16 numbers. However, in the present problem, it turns out to have a lot of structure that can be captured by only three independent parameters. In particular, the control law can be written as follows:

$$
u(t) = k_p(t)\,\bigl(p^*(t) - \hat{p}(t)\bigr) - k_v(t)\,\hat{v}(t) - k_a(t)\,\hat{a}(t)
$$

where kp, kv, and ka are time-varying scalar gains illustrated in Figure 3, a and c.
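The reduction from 16 numbers to three scalars can be made concrete. The sketch below assumes the state ordering [p; v; a; p*], the sign convention u = −L x̂, and a gain matrix built from 2 × 2 identity blocks; the numerical gains and the state estimate are hypothetical values chosen only to check that the two forms agree.

```python
import numpy as np

def gain_matrix(kp, kv, ka):
    """Assumed block structure of the 2 x 8 gain matrix for x = [p; v; a; p*]."""
    I2 = np.eye(2)
    return np.hstack([kp * I2, kv * I2, ka * I2, -kp * I2])

# Hypothetical gains at one instant and a hypothetical state estimate
kp, kv, ka = 40.0, 6.0, 0.3
x_hat = np.array([0.10, 0.02, 0.5, -0.1, 2.0, 1.0, 0.30, 0.05])
p, v, a, p_star = x_hat[0:2], x_hat[2:4], x_hat[4:6], x_hat[6:8]

u_matrix = -gain_matrix(kp, kv, ka) @ x_hat          # matrix form, u = -L x_hat
u_scalar = kp * (p_star - p) - kv * v - ka * a       # three-gain form
```

The position gain appears twice with opposite signs because only the difference between target and hand position matters, and each block is a multiple of the identity because the two axes are treated identically.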
When a perturbation is introduced in the model, the final time tf is adjusted according to Figure 1g, and the optimal estimator and controller for the remainder of the movement are recomputed (given the new final time and target position). This is necessary because the optimal feedback gains are originally scheduled up to the duration of the unperturbed movement. Consequently, the predicted feedback corrections are not generated by exactly the same feedback gains as shown in Figure 3, a and c.
However, the change attributable to the recomputation is small, and so our intuitive analysis of feedback gains is valid. Similar recomputation is involved in the minimum-jerk feedback controller. A more general optimal feedback control model capable of predicting the changes in movement duration is described later.
The model parameters, and the criteria for choosing their values, can be summarized as shown in Table 1.
Modified minimum-jerk model.
The original minimum-jerk model (Flash and Hogan, 1985) postulates that the hand moves from a starting position p0 to a target position p* along a trajectory that minimizes the time integral of the squared jerk (third derivative of position) as follows:

$$
\int_0^{t_f} \bigl\| \dddot{p}(t) \bigr\|^2 \, dt
$$

To make this optimization problem well posed, one has to specify the velocity and acceleration at the endpoints. Let v0 and a0 be the initial velocity and acceleration (possibly nonzero) and suppose the final velocity and acceleration are 0. Then the constraints are as follows:

$$
p(0) = p_0,\quad \dot{p}(0) = v_0,\quad \ddot{p}(0) = a_0,\quad p(t_f) = p^*,\quad \dot{p}(t_f) = 0,\quad \ddot{p}(t_f) = 0
$$

One can find the solution to this minimization problem using the calculus of variations (Flash and Hogan, 1985). The expression for the optimal trajectory p(t) is a somewhat complicated function of t, tf, p0, v0, a0, p*. Differentiating that function with respect to t three times yields the following:

$$
\dddot{p}(t) = \frac{60}{(t_f - t)^3}\bigl(p^* - p(t)\bigr) - \frac{36}{(t_f - t)^2}\,\dot{p}(t) - \frac{9}{t_f - t}\,\ddot{p}(t)
$$

where tf − t is the remaining movement time. The rationale for working with third derivatives is that we are given all derivatives up to second order at the initial time, and if we have a way of computing the third derivative at each time, we could simply integrate and obtain the entire trajectory. This suggests a feedback-control formulation (Hoff and Arbib, 1993) in which p⃛ is only computed at the current time and is treated as an instantaneous control signal u as follows:

$$
u(t) = \frac{60}{(t_f - t)^3}\bigl(p^*(t) - p(t)\bigr) - \frac{36}{(t_f - t)^2}\,\dot{p}(t) - \frac{9}{t_f - t}\,\ddot{p}(t)
$$

The three scalar coefficients in the above expression are time-varying feedback gains for a third-order system with state vector

$$
\bigl[\,p(t);\; \dot{p}(t);\; \ddot{p}(t)\,\bigr]
$$

Note that the feedback-control formulation allows us to make the target position time varying.
In the absence of perturbations, the trajectory predicted by this modified minimum-jerk model is identical with the prediction of the original minimum-jerk model (Hoff and Arbib, 1993). The advantage of the modified formulation is that it can generate feedback corrections and thus serve as a model of perturbation experiments. Note that the modified minimum-jerk model reflects a very different philosophy compared with the original model, because a trajectory plan for the rest of the movement is no longer needed. In that sense, it is closer to our optimal feedback control model (Todorov and Jordan, 2002b). The empirical success of jerk minimization has often been interpreted as evidence for trajectory planning. Such interpretations are unjustified given that the same predictions can be made without the assumption of trajectory planning.
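The equivalence between the feedback formulation and the original minimum-jerk trajectory can be checked numerically. The sketch below assumes the standard feedback gains 60, 36, and 9 divided by the third, second, and first power of the remaining time, which follow from solving the quintic boundary-value problem. The 30 cm amplitude, the 0.7 s duration, and the 5 cm jump at 300 ms are illustrative values in the range of experiment 1, and the integration stops 50 ms before the final time because the gains diverge as the remaining time goes to zero.

```python
import numpy as np

def minimum_jerk_feedback(p, v, a, target, time_left):
    """Jerk command of the feedback form of the minimum-jerk model."""
    return (60.0 * (target - p) / time_left**3
            - 36.0 * v / time_left**2
            - 9.0 * a / time_left)

def simulate(p0, target_fn, tf, dt=1e-4, stop_before=0.05):
    """Integrate the third-order system until stop_before seconds remain."""
    p = np.array(p0, dtype=float)
    v = np.zeros_like(p)
    a = np.zeros_like(p)
    n_steps = int(round((tf - stop_before) / dt))
    times, path = [0.0], [p.copy()]
    for k in range(n_steps):
        t = k * dt
        u = minimum_jerk_feedback(p, v, a, target_fn(t), tf - t)
        # Exact update for a jerk held constant over one time step
        p = p + v * dt + a * dt**2 / 2 + u * dt**3 / 6
        v = v + a * dt + u * dt**2 / 2
        a = a + u * dt
        times.append(t + dt)
        path.append(p.copy())
    return np.array(times), np.array(path)

def minimum_jerk_closed_form(p0, target, tf, t):
    """Original minimum-jerk trajectory for zero boundary velocity and acceleration."""
    s = (t / tf)[:, None]
    return p0 + (target - p0) * (10 * s**3 - 15 * s**4 + 6 * s**5)

tf = 0.7                                            # s
start, goal = np.array([0.0, 0.0]), np.array([0.30, 0.0])  # m

# Unperturbed movement: feedback form versus closed form
t, path = simulate(start, lambda t: goal, tf)
max_dev = np.max(np.abs(path - minimum_jerk_closed_form(start, goal, tf, t)))

# Target jumps 5 cm orthogonally at 300 ms
jumped = np.array([0.30, 0.05])
_, path_jump = simulate(start, lambda t: jumped if t >= 0.3 else goal, tf)
```

In the unperturbed case the two trajectories agree up to integration error. In the perturbed case the hand is driven essentially all the way to the displaced target by the final time, so this model on its own predicts a complete correction whenever any time remains.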
Optimal feedback control model (Markov decision process).
Here, we describe a different optimal feedback control model in which movement duration is no longer predefined and task constraints are enforced more explicitly. It is constructed using the more general but less efficient methodology of Markov decision processes: the continuous state and action spaces are discretized (Kushner and Dupuis, 2001) and the resulting discrete optimization problem is solved via dynamic programming (Bertsekas, 2001). Discretization methods suffer from the curse of dimensionality and only apply to low-dimensional problems. This necessitates a simplification of the dynamics model: the arm is now modeled as a fully observable second-order plant with state vector containing hand position p and velocity v and control vector u corresponding to hand acceleration. All quantities are expressed in units of centimeters and seconds. The initial state is p (0) = v (0) = [0;0]. The default target position is p* = [20;0] but can be perturbed to either [20;5] or [20;−5]. Instead of perturbing the target, we perturb the hand in the opposite direction (without changing hand velocity) and then correct the hand and target positions in the subsequent analysis. In this way, the target can be treated as constant and omitted from the state vector.
Each trial ends when the horizontal hand position exceeds 20 cm (i.e., the hand reaches the target plane) or when the duration exceeds a maximum allowed duration of 0.6 s, whichever comes first. Let tf denote the duration of a given trial. The total cost to be minimized is defined as follows: The final cost, computed at the end of the movement, is defined as follows: The endpoint velocity threshold vmax is 5 cm/s in the stop condition and 20 cm/s in the hit condition. These values were chosen to match the observed endpoint velocities. The main movement amplitude (20 cm), perturbation amplitude (5 cm), maximum movement duration (0.6 s), constraint violation cost (100), and other simulation parameters described below were chosen in advance and were not adjusted. The only parameters that were adjusted to fit the data were wenergy = 0.00003 and wtime = 20. This was done by solving the problem multiple times for different points in wenergy − wtime space. The qualitative pattern of results shown in Figures 7 and 8 depended weakly on wtime but was sensitive to wenergy. The latter parameter, denoted r in other studies, has proven to be important in almost every optimal control model we ever constructed.
The sizes of the discretization grids were 101 × 61 points in position (20 × 12 cm; step size, 0.2 cm), 25 × 25 points in velocity (80 × 80 cm/s; step size, 3.33 cm/s), 11 × 11 points in acceleration (1666.67 × 1666.67 cm/s2; step size, 166.67 cm/s2), and 31 points in time (0.6 s; step size, 0.02 s). These numbers were carefully balanced so that the grid density was sufficient to allow accurate approximation, the grid range was sufficient to cover the optimal state-control trajectories, and yet the number of grid points was not intractably large. The noise was uniform and additive, and perturbed the hand velocity by up to ±2 grid points in each time step.
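The step sizes quoted above follow from the number of grid points and the range covered along each axis, which can be verified directly. The dictionary keys below are labels introduced for illustration; the counts and ranges are taken from the text.

```python
# (points per axis, range covered per axis) as quoted in the text
grids = {
    "position_x_cm": (101, 20.0),
    "position_y_cm": (61, 12.0),
    "velocity_cm_per_s": (25, 80.0),
    "acceleration_cm_per_s2": (11, 1666.67),
    "time_s": (31, 0.6),
}

# n points span (n - 1) intervals
steps = {name: span / (n - 1) for name, (n, span) in grids.items()}

# State-time nodes: two position axes, two velocity axes, and time
n_state_time_nodes = 101 * 61 * 25 * 25 * 31

# Candidate controls evaluated at each node: two acceleration axes
n_controls = 11 * 11
```

The product gives roughly 1.2 × 10^8 state-time nodes, each with 121 candidate accelerations to evaluate, which illustrates why adding even one more state dimension would make the problem intractable.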
The feedback control law is in the following general form:

$$u = \pi(p, v, t)$$

where u, p, v, and t are constrained to the corresponding grids. Unlike the LQG framework in which the function π is linear and can be represented with a small number of feedback gains, here we do not know in advance the form of π. Instead, we represent it as a lookup table that specifies the value of u for every possible combination of p, v, and t. This table consists of ∼220 million numbers computed by the dynamic programming algorithm in about half an hour of CPU time. To speed up the computation and be able to explore the effects of various parameters, we also implemented the algorithm on an nVidia GeForce 8800 GTX video card with 128 parallel processors. This reduced the running time of the algorithm by about a factor of 30. Because the video card only supports single-precision floating point arithmetic, we used it for model exploration and ran the final model on an Intel CPU with double precision. Once the control laws for the stop and hit conditions were obtained (the only difference being the value of vmax), we applied them to the stochastic plant and simulated 3000 movement trajectories per condition: 1000 without perturbation, 1000 with perturbation at 0.1 s, and 1000 with perturbation at 0.3 s.
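To illustrate how a lookup-table control law of this kind is obtained, the sketch below runs backward-induction dynamic programming on a deliberately tiny problem. It is a toy under stated assumptions, not the model described above: the plant is one-dimensional and deterministic, the final cost is an assumed quadratic, the horizon is fixed, and all grid sizes and weights are hypothetical values chosen so that the example runs instantly.

```python
import numpy as np


def solve_toy_dp(n_p=21, n_v=11, n_u=5, n_t=10, dt=0.1,
                 w_energy=0.01, target=1.0):
    """Backward induction on a 1D point mass; returns (policy, value, grids)."""
    p_grid = np.linspace(0.0, 2.0, n_p)
    v_grid = np.linspace(-2.0, 2.0, n_v)
    u_grid = np.linspace(-10.0, 10.0, n_u)
    P, V = np.meshgrid(p_grid, v_grid, indexing="ij")

    def nearest(x, grid):
        # Snap continuous values to the nearest grid index, clipping at the edges.
        step = grid[1] - grid[0]
        idx = np.rint((x - grid[0]) / step).astype(int)
        return np.clip(idx, 0, len(grid) - 1)

    value = (P - target) ** 2 + V ** 2        # assumed quadratic final cost
    policy = np.zeros((n_t, n_p, n_v))        # lookup table u = pi(p, v, t)
    ip = nearest(P + V * dt, p_grid)          # next position index (independent of u)
    for t in reversed(range(n_t)):
        best = np.full(P.shape, np.inf)
        best_u = np.zeros(P.shape)
        for u in u_grid:
            iv = nearest(V + u * dt, v_grid)  # next velocity index under control u
            q = w_energy * u ** 2 * dt + value[ip, iv]
            better = q < best
            best[better] = q[better]
            best_u[better] = u
        value = best                          # cost-to-go at time step t
        policy[t] = best_u
    return policy, value, (p_grid, v_grid, u_grid)
```

The returned `policy` array plays the role of the table π: indexing it by time step, position index, and velocity index gives the control to apply. The full model differs in that the state is four-dimensional, the transitions are stochastic, and the trial can end before the maximum duration.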
Although perturbations were applied in the testing phase to characterize the response of the optimal feedback control laws, the control laws themselves were optimized for an environment without perturbations. To study possible adaptation effects, we computed a second pair of feedback control laws optimized for a perturbed environment. In the latter environment, the target could jump either up or down (at 0.2 s) or remain stationary. The three types of trials had equal probability. Target perturbations were taken into account in the optimization process by incorporating an appropriate position noise term (with trimodal distribution) in the dynamics model.
Undershoot in reaching to perturbed targets
We define “undershoot” as endpoint error in the direction in which the target was displaced. In reaching movements, undershoot (or incomplete correction) for late target perturbations has already been demonstrated (Komilis et al., 1993). However, a key question remains unanswered: is the effect simply attributable to lack of time, or is there a more subtle reason? Experiment 1 was designed to rule out the first possibility. Subjects made lateral reaching movements on a horizontal table, without vision of the hand, while the target was displaced in an orthogonal direction (forward or backward relative to the subject) at different times during movement. Reach duration was experimentally controlled to ensure that even the latest perturbation could have been fully corrected if that was the only objective of the underlying control strategy. More precisely, the remaining time after the onset of the latest correction was substantially larger than the time necessary to make the same movement in isolation.
Figure 1a shows the average hand paths for different perturbation times as well as for baseline (unperturbed) movements. Note the undershoot for 300 ms perturbations. In the rest of the analysis, the backward-perturbed trials are mirrored around the horizontal axis and pooled with the corresponding forward-perturbed trials. Figure 1b shows the tangential speed profiles. The early correction is incorporated so smoothly that its effect on the speed profile is hardly visible. The late correction, in contrast, causes a clear deviation from the bell-shaped baseline profile. One could interpret this as a discrete submovement superimposed on the main movement; however, we will see below that the same effect can arise from a continuous optimal controller. The corrective movement, defined as movement in the forward direction, is shown in Figure 1c. The undershoot for the 300 ms perturbation is significantly larger than the undershoot for the 200 and 100 ms perturbations (Fig. 1d). Note that subjects are moving without visual feedback of the hand, and therefore some misalignment between vision and proprioception (Van Beers et al., 1999) should be expected. This may be the cause for the slight overshoot in the 100 ms conditions (indeed no overshoot was observed in the remaining experiments that were performed with visual feedback). Such misalignment should not depend on the time of the target perturbation, and so the comparison between conditions is meaningful. There is also some systematic endpoint error in the lateral direction, although it shows a weaker and opposite trend (Fig. 1d); we return to it later.
Figure 1f shows the acceleration profile of each corrective movement, aligned on the time when forward acceleration first exceeds 5% of peak forward acceleration. The correction for the late perturbation lasts ∼400 ms, which is more than sufficient to make a movement with the amplitude needed for complete correction. Indeed, an accurate 5 cm movement to a 1-cm-diameter target should take a little under 300 ms according to Fitts' Law (Jeannerod, 1988). Thus, the lack of complete correction is not simply attributable to lack of time. Yet it has something to do with time: the overall movement duration was significantly increased in late-perturbation trials, all the way up to the 800 ms time limit (Fig. 1g), and consequently the percentage of time-out errors was increased (Fig. 1h). Although we asked subjects to treat the time limit as a hard constraint, they treated it on equal footing with the instruction to reach as close as possible to the center of the target and found a balance between these two task requirements. It is reasonable to assume that if we had convinced subjects to avoid time-out errors at all cost, the undershoot would have been even larger.
Optimal feedback control versus alternative models
The undershoot phenomenon is inconsistent with all previous models of motor control we are aware of. One such model (Flash and Henis, 1991) is an extension of the minimum-jerk model of trajectory planning (Flash and Hogan, 1985) to the domain of feedback corrections. It postulates that the hand tracks a planned minimum-jerk trajectory, and if the target is displaced, another minimum-jerk trajectory connecting the original and displaced target positions is added vectorially to the original plan. Naturally, that model predicts full correction in all cases (Fig. 2f). A related model (Hoff and Arbib, 1993), discussed in more detail below, is a feedback-control version of the minimum-jerk model. In our task, it makes the same predictions as the additive model (Flash and Henis, 1991) in Figure 2f. Another incompatible class of models consists of equilibrium-point control (Feldman and Levin, 1995; Bizzi et al., 1992) as well as other schemes (Hinton, 1984; Hoff and Arbib, 1993; Torres and Zipser, 2002) in which the hand is drawn to the target by some virtual spring. In such models, stopping can only occur when the hand reaches the target, which is in general contradiction with systematic endpoint errors that cannot be attributed to sensory-motor misalignment. In addition to the undershoot studied here, phenomena that are problematic for these models include the undershoot of primary saccades (Harris, 1995), the overshoot of rapid wrist movements (Hoffman and Strick, 1999; Haruno and Wolpert, 2005), and the lack of equifinality (or failure to reach the target) in certain adaptation paradigms (Lackner and Dizio, 1994; Hinder and Milner, 2003).
Our optimal feedback control model (Todorov and Jordan, 2002b), which we have previously used to explain a number of unrelated phenomena, turns out to be compatible with the undershoot. One might have thought (as we did) that an optimal feedback controller should make larger on-line corrections and get closer to the displaced target than any other controller. This is not so, for interesting reasons explained in the next section. Here, we only present the simulation results. The model uses a linear approximation to arm dynamics. This is justifiable because the detailed nonlinearities of the arm are unlikely to influence the response to visual perturbations significantly (they are much more relevant when it comes to resisting mechanical perturbations). The model incorporates signal-dependent motor noise (Sutton and Sykes, 1967; Schmidt et al., 1979; Harris and Wolpert, 1998; Todorov, 2002) as well as sensory noise, and optimizes a mixed cost encouraging endpoint positional accuracy, endpoint stability (stopping), and energetic efficiency.
The model predictions (Fig. 2a–e) are plotted in the same format as the data (Fig. 1a–e). Note the close correspondence and, in particular, the undershoot for the late perturbation. When a target perturbation occurs in the model, we increase the remaining movement time as in the experimental data (Fig. 1g) and recompute the optimal controller (see Materials and Methods). If we use the unmodified optimal controller, which always ends the movement at the same time, the predicted undershoot is greatly increased (Fig. 2c, dashed lines). Thus, increasing movement duration is essential for avoiding a much larger undershoot in late perturbations, which may be why subjects were so reluctant to finish the movement on time as instructed.
In Figure 2d, we also see endpoint error in the lateral direction, but its magnitude decreases with increasing perturbation time, as in the experimental data. This is because the target is not perturbed in the lateral direction, and yet the overall movement time is increased, so the control costs that cause this endpoint error (energy consumption, and reduced accuracy because of signal-dependent noise) are effectively smaller. Note also the secondary speed bump in Figure 2b, which could be mistaken for a discrete submovement.
In addition to reproducing average behavior, the model faithfully captures the positional variability pattern of the hand trajectories in unperturbed trials (Figs. 1e, 2e). The larger variability in the lateral (main movement) direction reflects signal-dependent motor noise, which is larger in actuators that are more active. The reduction seen toward the end of the movement is an example of structured movement variability consistent with the minimal intervention principle (Todorov and Jordan, 2002b). Another manifestation of signal-dependent noise is the increased variability of the undershoot for late perturbations (Fig. 3e). This phenomenon is observed in the model and all experiments, and is only present in the corrective movement direction. Although any corrective movement incurs signal-dependent noise in that direction, the feedback controller has less time to suppress it when the noise is introduced late.
Analysis of feedback gains and new predictions
The above simulation results show that the optimal thing to do is make an incomplete correction. Why is this seemingly paradoxical strategy optimal? Such questions are often meaningless: the solution to a complex optimization problem is what it is, and the relationship between the problem and its solution does not have to be intuitive. Nevertheless, analysis of the model yields intuitive answers here. Key to our analysis is the fact that the optimal feedback gains are time varying.
As explained in Materials and Methods, the optimal feedback controller can be written as follows:

$$u = k_p (p^* - \hat{p}) - k_v \hat{v} - k_a \hat{a}$$

where kp, kv, and ka are the optimal feedback gains; p̂, v̂, and â are the optimal estimates of hand position, velocity, and muscle activation state (obtained by a modified Kalman filter); p* is the target position; and u is the optimal control signal. The optimal feedback gains are illustrated in Figure 3a. We see that the positional gain kp peaks early and then decreases in the last phase of the movement. In that phase, the velocity gain kv as well as the activation gain ka, which can be thought of as force feedback, are large. Although these gain fluctuations are hard to understand quantitatively, qualitatively they have a simple interpretation: near the end of the movement the optimal controller enters a regime in which it is less sensitive to positional errors and instead aims to stop the movement in a stable manner. In retrospect, this is not surprising. If we think of a mass–spring–damper system, a large spring constant (or positional gain) will make the system underdamped and cause oscillations, which is in conflict with the requirement to stop. The optimal controller effectively makes the system overdamped, achieving stability while compromising its ability to fully respond to last-minute positional errors. Thus, our analysis has uncovered a trade-off between endpoint stability and positional accuracy.
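The mass–spring–damper intuition can be made concrete with the standard damping ratio ζ = b / (2√(mk)), where m is the mass, k the spring constant (positional gain), and b the damping coefficient (velocity gain). The sketch below uses illustrative numbers that are not fitted to the arm or taken from the model; it only shows that raising k with b fixed pushes the system toward the underdamped regime.

```python
import math


def damping_ratio(mass, stiffness, damping):
    """Damping ratio of m*x'' + b*x' + k*x = 0."""
    return damping / (2.0 * math.sqrt(stiffness * mass))


def regime(zeta):
    """Classify the response from the damping ratio."""
    if zeta < 1.0:
        return "underdamped"    # oscillates around the target
    if zeta > 1.0:
        return "overdamped"     # approaches the target without oscillating
    return "critically damped"


if __name__ == "__main__":
    for k in (1.0, 4.0, 100.0):  # illustrative positional gains
        z = damping_ratio(mass=1.0, stiffness=k, damping=4.0)
        print(f"k = {k:6.1f}  zeta = {z:.2f}  {regime(z)}")
```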
Another intuitive explanation for the undershoot observed in the model is the control cost associated with large and rapid last-minute corrections. Large control signals are penalized in two ways. One is a direct energy cost; the other is an indirect accuracy cost resulting from the signal-dependent nature of motor noise. Increased noise is particularly undesirable near the end of the movement when the feedback loop no longer has time to correct for it (Fig. 3e). In agreement with the latter interpretation, the undershoot predicted by the model increases when either the energy cost or the signal-dependent noise magnitude is increased (results not shown).
Analysis of feedback gains is also illuminating with regard to the modified minimum-jerk model and its failure to predict the undershoot (Fig. 2f). Consider the following feedback-control formulation (Hoff and Arbib, 1993) of the minimum-jerk model. At each point in time, a new minimum-jerk trajectory is formed, starting at the current hand position, velocity, and acceleration, and ending at the target with zero velocity and acceleration. The initial portion of this trajectory is used to control the movement, and then the procedure is repeated, making it possible to correct for on-line disturbances. More precisely, the hand is treated as a third-order system in which the position p, velocity v, and acceleration a are state variables, and the control signal u is defined as the derivative of acceleration (or jerk). It can be shown (see Materials and Methods) that the minimum-jerk feedback controller has the same general form as the optimal feedback controller, but with different feedback gains as follows:

$$k_p = \frac{60}{(t_f - t)^3}, \qquad k_v = \frac{36}{(t_f - t)^2}, \qquad k_a = \frac{9}{t_f - t}$$

We now see something that is in retrospect obvious: the only way the minimum-jerk feedback controller can always make a full correction, regardless of how late the perturbation arises, is to use infinite feedback gains at the end of the movement. As the time t approaches the final time tf, all three feedback gains go to infinity, with kp increasing faster than kv and ka. Note that we could apply this minimum-jerk controller to a partially observable system, in which state estimates are obtained by a Kalman filter, and obtain a control scheme that overall is very similar to optimal feedback control. The only important difference is in the sequence of time-varying feedback gains being used. The optimal feedback gains (Fig. 3a) not only predict behavior that better corresponds to the experimental data, but also guarantee minimal expected cost, and are finite rather than infinite (which is more biologically plausible).
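The divergence of the minimum-jerk gains can be checked numerically. The sketch below assumes the standard result that a minimum-jerk trajectory is a fifth-order polynomial in time; it solves for the polynomial that starts at (p, v, a) and ends at the target with zero velocity and acceleration, and compares its initial jerk with the closed-form feedback law using gains 60/D³, 36/D², and 9/D, where D = tf − t is the remaining time. The function names are illustrative.

```python
import numpy as np


def min_jerk_initial_jerk(p, v, a, target, duration):
    """Initial jerk of the quintic from (p, v, a) to (target, 0, 0)."""
    D = duration
    # x(tau) = p + v*tau + a/2*tau^2 + c3*tau^3 + c4*tau^4 + c5*tau^5
    A = np.array([
        [D ** 3, D ** 4, D ** 5],                # position at tau = D
        [3 * D ** 2, 4 * D ** 3, 5 * D ** 4],    # velocity at tau = D
        [6 * D, 12 * D ** 2, 20 * D ** 3],       # acceleration at tau = D
    ])
    b = np.array([
        target - (p + v * D + 0.5 * a * D ** 2),
        -(v + a * D),
        -a,
    ])
    c3, _, _ = np.linalg.solve(A, b)
    return 6.0 * c3                              # third derivative at tau = 0


def min_jerk_gains(time_remaining):
    """Closed-form gains (k_p, k_v, k_a) as functions of t_f - t."""
    D = time_remaining
    return 60.0 / D ** 3, 36.0 / D ** 2, 9.0 / D


def min_jerk_feedback(p, v, a, target, time_remaining):
    """Jerk command u = k_p (p* - p) - k_v v - k_a a."""
    kp, kv, ka = min_jerk_gains(time_remaining)
    return kp * (target - p) - kv * v - ka * a
```

Halving the remaining time multiplies kp by 8, kv by 4, and ka by 2, which is the sense in which kp grows fastest as t approaches tf.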
We now return to the stability–accuracy trade-off, and use the model to obtain novel predictions reflecting this trade-off more directly. In the above analysis, the reason for the reduced sensitivity to positional errors was the need to stop at the target. What would happen if the importance of stopping decreases relative to the importance of reaching the target? In the model, stopping is enforced with a cost term quadratic in the final velocity and activation. If we scale down this cost term, then the optimal feedback gains change as shown in Figure 3c. Note that the positional gain kp now peaks much later. Consequently, the predicted undershoot is almost eliminated (Fig. 3b). The optimal controller for the modified cost function takes advantage of the relaxed stopping requirement, and no longer brings the velocity to zero at the specified final time, particularly for late perturbations (Fig. 3d).
Experimental confirmation of model predictions
The above predictions were tested in experiment 2, which compared two conditions: asking subjects to stop at the target versus allowing them to hit the target. Subjects made three-dimensional movements around a horizontal obstacle and aimed for a physical target attached to a three-dimensional robot (see Materials and Methods) (Fig. 4a). The obstacle was introduced to increase movement duration (so that we no longer had to impose a lower limit) and also to test the different predictions of optimal feedback control and alternative models with regard to obstacle avoidance (see next section). On randomly chosen trials, the robot rapidly displaced the target, left or right, either 50 or 350 ms after movement onset. Exceeding the maximum allowed duration (900 ms) resulted in a time-out error. Average movement trajectories are shown in Figure 4b.
Each subject was now tested in two conditions. In the stop condition, subjects were required to slow down their movement and touch the target gently. The robot used a low-gain servo controller so that the target could be easily displaced by the hand; a displacement >0.4 cm resulted in a “hitting-hard” error. In the hit condition, subjects were allowed to hit the target, although they were not instructed to do so. The robot used a high-gain servo controller and could absorb the impact with the hand; displacing the target no longer resulted in an error. The difference in target impedance made the distinction between the two conditions more ecologically valid.
The experimental results confirmed our model predictions. The undershoot for late perturbations was still present (Fig. 4c,d), but it was significantly smaller in the hit condition compared with the stop condition. For early perturbations, the undershoot was smaller (compared with late perturbations) and the difference between the hit and stop conditions was not significant. As before, late perturbations caused an increase in movement duration (Fig. 4e), and a substantial percentage of time-out errors in late/stop trials (40%). Movement duration in the stop condition was larger compared with the hit condition, and yet the correction was smaller (i.e., the undershoot was larger). Thus, as in experiment 1, subjects could have made a larger correction in the stop condition if that was their only objective.
Although at this point we had a convincing story, we remained puzzled by subjects' reluctance to treat the time limit as a hard constraint, even though in experiment 2 we made time-out errors much more salient. We reasoned that this may be because the time limit does not correspond to any physical property of the environment, and instead is signaled by the computer on the basis of an (invisible) timer. Could the outcome change if we provided an explicit and ecologically valid time cue? More importantly, could such a time cue reduce the uncertainty in the task and somehow enable subjects to eliminate the undershoot? These issues were addressed in experiment 3 in which we used an interception task rather than a pointing task (see Materials and Methods). The main change was that, as soon as hand movement was detected, the robot began to move the target downward at a low constant speed. Lateral target jumps were superimposed on this downward motion. Subjects were instructed to make contact with the target before it hit the horizontal edge of a board mounted underneath. The downward motion was repeatable and easily predictable, providing an explicit representation of allowed movement duration.
The results from experiment 3 (Fig. 4f–h) were similar to those of experiment 2 and in agreement with our model predictions. The undershoot in the stop condition was again larger than in the hit condition; the difference was now significant even for early perturbations (perhaps because we modified the method for detecting hitting-hard errors, making the threshold effectively smaller). Movement duration was again increased in late-perturbation trials and was larger in the stop condition compared with the hit condition. The time-out error rate in late/stop trials was reduced to 31%, indicating that the explicit time cue had an effect, but this rate was still higher than what would be expected if subjects treated the time limit as a constraint. The time-out error rate in late/hit trials was much lower (12% in experiment 2 and 7% in experiment 3).
Absence of imaginary targets in obstacle avoidance
Optimal feedback control differs from most alternative models in that it does not invent arbitrary subgoals, such as desired trajectories or imaginary targets, but instead uses all available resources to pursue the high-level movement goal. In obstacle avoidance tasks, it predicts that the hand should clear the obstacle without aiming for a specific imaginary target to the side of the obstacle. In contrast, deterministic trajectory planning models (Flash and Hogan, 1985; Uno et al., 1989) as well as other models (Rosenbaum et al., 1999) need such imaginary targets to avoid obstacles (and make curved movements in general). Note that stochastic optimal control models can avoid this limitation by taking into account the probability of collision attributable to random deviations from the average trajectory (Hamilton and Wolpert, 2002).
Here, we present two lines of evidence that subjects do not use imaginary targets in obstacle avoidance. First, we analyze the variability pattern of the hand paths in unperturbed trials in experiments 2 and 3. If subjects were aiming for an intermediate target, their hand paths should be less variable in the vicinity of that target, as we showed previously with real targets (Todorov and Jordan, 2002b). The variability pattern is plotted spatially in Figure 5, a and c, and as a scalar quantity (variability per dimension) in Figure 5, b and d. For both experiments, and for both the hit and stop conditions, we see that the pattern is bell-shaped. In particular, there is no evidence for a reduction of variability in the middle of the movement, where the imaginary target should be.
Second, we analyze the onset of the lateral correction relative to the time when the hand clears the obstacle and starts moving toward the robot. If subjects were aiming for an imaginary target to clear the obstacle, that target should be close to the reversal point and should not move when the robot displaces the final target. Therefore, the corrective movement should not start before the reversal point. We focus on experiment 3, which was specifically designed to address this question by shifting the obstacle farther away from the starting position (thus delaying the reversal) and detecting the onset of hand movement with a pressure sensor (allowing an earlier perturbation). Figure 5e shows that the reversal occurs at ∼300 ms in both the hit and stop conditions, whereas the corrective lateral acceleration in early perturbations starts at 100–150 ms. Given the filtering of the musculoskeletal system, the neural command driving the correction must have been generated even earlier. Thus, subjects begin to correct for the target jump before having reached the hypothetical via-point, casting serious doubt on the existence of the latter.
Flexible strategies for opportunistic control
The sensorimotor strategies predicted by optimal feedback control exhibit great flexibility, in the sense that they are adapted to the task, body, and environment, and take advantage of every opportunity for achieving higher performance. This is in sharp contrast with traditional trajectory-planning models (Flash and Hogan, 1985; Uno et al., 1989), which essentially view all tasks as being the same as long as the average trajectory is the same. Here, we make four additional experimental observations illustrating the flexibility inherent in the optimal control framework.
First, in the hit condition in experiments 2 and 3, subjects actually hit the target harder, although they were not instructed to do so (and one-half of them had already performed the task in the stop condition). In both experiments, the forward hand velocity before impact was two to three times larger in the hit condition compared with the stop condition. In experiment 3, in which we used a force sensor, the normal force in the first 50 ms after contact was approximately three times larger in the hit condition compared with the stop condition. Both the model and the experimental results suggest that the stopping requirement causes decreased sensitivity to positional errors late in the movement. Thus, the relaxed stopping requirement in the hit condition is exploited to increase positional accuracy. It should be noted, however, that the stopping requirement was not fully eliminated in the hit condition. In experiment 2 (unperturbed trials, hit condition), subjects reduced their hand speed from 175 cm/s peak to 39 cm/s before contact, a 78% reduction; in experiment 3, this reduction was 84%. For comparison, the speed reduction in the stop condition was 96 and 92%, respectively.
Second, subjects exploited the relaxed accuracy requirement in the vertical direction in experiment 3, in which the target was a vertical stripe rather than a circle. Focusing on unperturbed trials, we see that lateral and vertical endpoint errors are equally variable in experiment 2, but vertical “errors” are significantly more variable than lateral errors in experiment 3 (Fig. 6a). Furthermore, in experiment 3, subjects did not fully use feedback to adjust their vertical hand position relative to the falling target. Indeed, variability of the vertical endpoint position in absolute coordinates (relative to the room) was smaller than variability relative to the target. We know that subjects are able to correct in the lateral direction: the correction is incomplete, but the undershoot is much smaller than the correction itself. Thus, the difference between absolute and relative variance in the vertical direction does not reflect an inability to correct, but rather an absence of a need to correct, in agreement with the minimal intervention principle (Todorov and Jordan, 2002b).
Third, subjects found a way to exploit the different methods we used to detect the movement end in experiments 2 and 3. Here, we focus on late perturbation trials and analyze the hand velocity immediately before contact with the target. In experiment 2, we used a speed threshold that required both forward and lateral velocity to be reduced to end the trial. In the hit condition, the necessary reduction in forward and lateral velocity could result from contact with the target, through target impedance and friction, respectively. In the stop condition, however, the target could not be exploited to stop the movement in either direction, and thus the lateral velocity in experiment 2 was small (Fig. 6b). In experiment 3, we used a force sensor to detect contact with the target, and defined the time of contact as the movement end, so lateral velocity did not have to be reduced as much. Subjects took advantage of this: the difference in lateral velocity between the stop and hit conditions in experiment 3 was much smaller than in experiment 2, and was not significant (Fig. 6b). We already know that the stopping requirement conflicts with positional accuracy. It is then likely that finding a way to partially avoid this requirement (in the lateral direction) afforded improved positional accuracy in experiment 3.
Fourth, subjects exploited the biomechanical redundancy of the arm when they had a chance. In experiments 2 and 3, redundancy was reduced by bracing the wrist; however, we performed a previous pilot experiment in which the wrist was not braced (otherwise, it was similar to experiment 2). In that case, the lateral correction was accomplished with a combination of wrist flexion/extension and humeral rotation. We found that the percentage contribution of the wrist was larger in late perturbations compared with early perturbations, in both the hit and stop conditions (Fig. 6c). The fact that the wrist contributes <30% in early corrections suggests that the preferred strategy is to use humeral rotation. This may be because for a given force level larger muscles are less affected by signal-dependent noise (Hamilton et al., 2004). In late corrections, however, it is perhaps more difficult to accelerate and decelerate the entire forearm within the remaining time, and thus the wrist contribution increases.
Modeling changes in duration and variability
The LQG framework, which we used in the above model as well as in most of our previous work on optimal feedback control, is computationally efficient but has a number of limitations. In the present context, its limitations are as follows: (1) movement duration cannot be modified on-line in response to target perturbations; (2) stopping constraints have to be modeled with quadratic costs instead of more natural step-function costs; (3) the controller cannot be adapted to the statistics of the target perturbations. Here, we present an optimal feedback control model that avoids these limitations.
The new model is constructed using more general but less efficient discretization techniques that require a simpler second-order model of arm dynamics. Movement duration is defined as the point in time when the hand first reaches the target plane. A term proportional to movement duration is included in the cost function. Time-out errors and hitting-hard errors are penalized with step-function costs. The trimodal distribution of final target positions is taken into account in the optimization process. For details, see Materials and Methods.
The new model (Fig. 7) accounts for the salient findings in experiment 2 (Fig. 4c–e) and experiment 3 (Fig. 4f–h). The speed of the corrective movement (slope of the positional traces in Figs. 7a and 4c,f) is smaller in early perturbations. The undershoot in late perturbations is larger in the stop versus hit condition. The undershoot in early perturbations is smaller compared with late perturbations, in both stop and hit conditions. Movement duration in baseline and early perturbations is larger in the stop versus hit condition. For late perturbations, movement duration increases in both the stop and hit conditions. Note that the changes in movement duration are now predicted by the model, as opposed to being taken from the data as in the LQG model. This is possible because the new controller can adjust the duration on-line, by modulating the speed in the main movement direction and thus reaching the target plane at different times.
The new model allows us to address an additional phenomenon that is beyond the scope of LQG models. The phenomenon (Fig. 8) is that trajectory variability on unperturbed trials was larger in experimental sessions with perturbations compared with baseline sessions without perturbations. Results from the hit and stop conditions are averaged in this analysis. Figure 8 shows that frequent perturbations lead to some adaptive change in the sensorimotor system, which in turn leads to increased variability on trials without perturbations. The precise time course of such adaptation is difficult to estimate because measuring variability requires many trials.
What could be the nature of this adaptive change? One possible explanation is that, in sessions with perturbations, trial-to-trial adaptation (Thoroughman and Shadmehr, 2000; Donchin et al., 2003) causes the system to be in a different state every time an unperturbed trial is encountered. However, in target perturbation paradigms, trial-to-trial adaptation is negligible (Diedrichsen et al., 2005) (see also below). Another possible explanation is that target perturbations are for some reason misinterpreted as an increase in sensory noise, in which case the “optimal” thing to do is reduce the reliance on sensory feedback, causing suboptimal performance. A third explanation, which we pursue below, is an adaptive change in the feedback controller.
In environments with large unpredictable perturbations, one would expect the optimal feedback controller to be more concerned with correcting the perturbations than the smaller errors attributable to internal noise. This is confirmed by our simulations. In Figure 8a, we compare the trajectory variability of two feedback controllers, one optimized for an unperturbed environment and the other one for a perturbed environment matching our task. As expected, the latter controller is better at correcting for perturbations (data not shown); however, it allows higher variability on trials without perturbations. This is broadly consistent with the minimal intervention principle as well as with the idea of Pareto optimality: improving any aspect of the behavior of an optimal controller requires sacrifices elsewhere. Analyzing the specific changes in feedback gains that lead to increased variability is interesting but beyond the scope of the present paper.
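The trade-off described above can be illustrated with a deliberately simple toy, which is not the model used in this paper. The sketch assumes a scalar hand-target error x that is corrected through a noisy sensory reading with a fixed proportional gain; the gain values, noise levels, and function names are all hypothetical. A higher gain removes a target jump faster but feeds more sensory noise into the movement, so steady-state variability on unperturbed trials rises.

```python
import numpy as np

def stationary_variance(gain, sensory_sd, motor_sd):
    """Steady-state variance of x for x[t+1] = x[t] - gain*(x[t] + v[t]) + m[t].

    v is sensory noise and m is motor noise, both zero-mean and independent.
    This scalar loop is an illustrative assumption, not the paper's controller.
    """
    if not 0.0 < gain < 2.0:
        raise ValueError("gain must lie in (0, 2) for a stable loop")
    decay = 1.0 - gain  # closed-loop decay factor of the error
    return (gain**2 * sensory_sd**2 + motor_sd**2) / (1.0 - decay**2)

def residual_after_jump(gain, steps):
    """Fraction of a unit target jump still uncorrected after `steps` updates (noise-free mean)."""
    return (1.0 - gain) ** steps

# Assumed, illustrative values (not fitted to any data).
SENSORY_SD, MOTOR_SD, STEPS = 1.0, 0.1, 5
low_gain, high_gain = 0.1, 0.5  # stand-ins for "unperturbed" vs "perturbed" tuning

summary = {
    g: (stationary_variance(g, SENSORY_SD, MOTOR_SD), residual_after_jump(g, STEPS))
    for g in (low_gain, high_gain)
}
for g, (var, res) in summary.items():
    print(f"gain={g}: unperturbed variance={var:.3f}, residual after jump={res:.3f}")
```

In this toy the gains are chosen by hand rather than optimized, so it only shows the direction of the effect: the controller that corrects perturbations better is also the more variable one when no perturbation occurs.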
Lack of trial-to-trial adaptation
Trial-to-trial adaptation has been found in perturbations of the hand (Thoroughman and Shadmehr, 2000; Donchin et al., 2003) but not the target (Diedrichsen et al., 2005). Here, we replicate the latter finding: in our target perturbation experiments, trial-to-trial adaptation is negligible.
To quantify such adaptation, we adopted a version of the state-space approach, namely the following linear model:

z(n + 1) = A z(n) + B w(n) + η(n)
y(n) = C z(n) + D w(n) + γ(n)

The correction on trial n is denoted y(n) and has two elements: distance in the corrective direction from the average unperturbed trajectory, measured at the time of the late perturbation and at the end of the movement. The perturbation w(n) has two elements specifying the target position (again in the corrective direction) immediately before the late perturbation and at the end of the movement. The vector z(n) is the internal learning state. It also has two elements that the model is free to use in whatever way is needed to fit the data. We quantify the correction and the perturbation using pairs of measurements to allow the model to capture the difference between early and late perturbations. The noise terms η(n) and γ(n) are independent zero-mean two-dimensional Gaussian random variables with covariances Q and S, respectively.
The next learning state z(n + 1) may in general depend on the current learning state z(n), the perturbation w(n), and the correction y(n). However, y is a linear function of z and w, and so we do not include a y term in the first equation. Note also that if there is any trial-to-trial learning here, it should be related to predicting the final target position (and initiating a normal movement aimed at that position); thus, it makes more sense to learn from w rather than y.
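The model just described can be simulated directly, which helps make the roles of the matrices concrete. The sketch below assumes the linear form implied by the text, z(n + 1) = A z(n) + B w(n) + η(n) and y(n) = C z(n) + D w(n) + γ(n). All parameter values, the trial-type coding of w, and the function name are hypothetical choices for illustration, not values from the experiments.

```python
import numpy as np

def simulate_learning_model(A, B, C, D, Q, S, w, rng):
    """Simulate z(n+1) = A z(n) + B w(n) + eta(n), y(n) = C z(n) + D w(n) + gamma(n).

    w has one row per trial. Returns the learning states z(0..N-1) and corrections y.
    The initial learning state is assumed to be zero.
    """
    n_trials = w.shape[0]
    z = np.zeros((n_trials + 1, A.shape[0]))
    y = np.zeros((n_trials, C.shape[0]))
    for n in range(n_trials):
        gamma = rng.multivariate_normal(np.zeros(C.shape[0]), S)  # output noise
        eta = rng.multivariate_normal(np.zeros(A.shape[0]), Q)    # state noise
        y[n] = C @ z[n] + D @ w[n] + gamma
        z[n + 1] = A @ z[n] + B @ w[n] + eta
    return z[:-1], y

# Hypothetical coding of w: (target before late perturbation, target at movement end).
BASELINE, EARLY_JUMP, LATE_JUMP = (0.0, 0.0), (1.0, 1.0), (0.0, 1.0)

rng = np.random.default_rng(0)
trial_types = rng.integers(0, 3, size=120)
w = np.array([(BASELINE, EARLY_JUMP, LATE_JUMP)[t] for t in trial_types])

# Assumed parameters: weak learning (small A, B), feedback correction carried by D.
A, B = 0.5 * np.eye(2), 0.05 * np.eye(2)
C, D = np.eye(2), np.array([[0.9, 0.0], [0.0, 0.7]])
Q, S = 0.001 * np.eye(2), 0.01 * np.eye(2)

z, y = simulate_learning_model(A, B, C, D, Q, S, w, rng)
print(z.shape, y.shape)
```

Setting A and B to zero removes trial-to-trial learning entirely, leaving y determined by the within-trial feedback term D w(n) plus noise.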
The sequences of corrections y(n) and perturbations w(n) were measured. Given these measurements, the most likely values of A, B, C, D, Q, S as well as the sequence of learning states z(n) were computed using the expectation maximization (EM) algorithm (Cheng and Sabes, 2006). Because EM can get trapped in local optima of the likelihood, model fitting was run multiple times with different initial conditions and the best result was used. We fit the model separately for each experiment (2 vs 3), condition (stop vs hit), and subject. The first two-thirds of the data in each experimental session was used for model fitting and the last one-third for model testing.
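The bookkeeping around the fit (a chronological two-thirds/one-third split and keeping the best of several initializations) can be sketched as follows. This is a minimal illustration under stated assumptions: it does not implement EM, and `toy_fit` with its double-peaked objective is a hypothetical stand-in for an iterative fitting routine that can stall at a local optimum. All function names are illustrative.

```python
import numpy as np

def split_session(n_trials):
    """Return (fit_idx, test_idx): first two-thirds for fitting, last third for testing."""
    n_fit = (2 * n_trials) // 3
    return np.arange(n_fit), np.arange(n_fit, n_trials)

def best_of_restarts(fit_once, initial_guesses):
    """Run `fit_once(guess)` for every initial guess and keep the highest-scoring result.

    `fit_once` is any callable returning (params, score), where score plays the
    role of the log-likelihood reached from that starting point.
    """
    best_params, best_score = None, -np.inf
    for guess in initial_guesses:
        params, score = fit_once(guess)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_objective(x):
    """Stand-in objective with a local peak near x = -1 and a higher peak near x = +1."""
    return -(x**2 - 1.0) ** 2 + 0.3 * x

def toy_fit(x0, n_steps=200, step=0.05):
    """Plain gradient ascent from x0; it converges to whichever peak is nearest."""
    x = float(x0)
    for _ in range(n_steps):
        grad = -4.0 * x * (x**2 - 1.0) + 0.3
        x += step * grad
    return x, toy_objective(x)

fit_idx, test_idx = split_session(90)
best_x, best_score = best_of_restarts(toy_fit, [-1.5, -0.5, 0.5, 1.5])
print(len(fit_idx), len(test_idx))
print(round(best_x, 3), round(best_score, 3))
```

Starting points on the negative side settle at the lower peak, which is why a single run is not enough and the result with the highest score is retained.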
To evaluate the performance of the learning model, we regressed each component of y either on z and w, or on w alone, or on z alone. The first regression measures the performance of the full model, whereas the latter two regressions measure the contribution of feedback correction (w) and learning component (z), respectively. The R² and p values for the regressions were averaged over subjects, experiments, and conditions. The regressions were first done on all trials, and then separately on the baseline (no perturbation) trials, early jump trials, and late jump trials, because the models are likely to perform differently on different trial types. Table 2 shows the average R² values multiplied by 100 (to obtain a measure of variance explained), only for the cases in which p < 0.05 on average. In the remaining cases, we found p > 0.3; thus, there was clear separation between significant and nonsignificant regression fits.
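The three regressions can be sketched with ordinary least squares. The example below is an illustration on synthetic data, not the analysis code used for Table 2: the data-generating values, the inclusion of an intercept, and the function names are assumptions, and only R² is computed (p values are omitted).

```python
import numpy as np

def r_squared(predictors, target):
    """R^2 of an ordinary least-squares fit of `target` on `predictors` plus an intercept."""
    X = np.column_stack([np.ones(len(target)), predictors])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    centered = target - target.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def compare_regressions(z, w, y_component):
    """R^2 for one component of y regressed on (z, w), on w alone, and on z alone."""
    return {
        "z,w": r_squared(np.column_stack([z, w]), y_component),
        "w": r_squared(w, y_component),
        "z": r_squared(z, y_component),
    }

# Synthetic stand-in data: corrections driven by the perturbation, not by z.
rng = np.random.default_rng(0)
n_trials = 200
trial_types = rng.integers(0, 3, size=n_trials)   # 0 baseline, 1 early, 2 late
w = np.array([[(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)][t] for t in trial_types])
z = rng.normal(scale=0.1, size=(n_trials, 2))     # learning state unrelated to y here
y_end = 0.8 * w[:, 1] + rng.normal(scale=0.05, size=n_trials)

scores = compare_regressions(z, w, y_end)
for name, value in scores.items():
    print(f"{name}: {100 * value:.1f}% variance explained")
```

Because the model with z and w nests the model with w alone, its in-sample R² can never be lower; a small improvement therefore needs held-out data or a significance test before it is read as evidence of learning.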
None of the regressions on z alone are significant, suggesting lack of trial-to-trial learning. The regressions on w are as expected: given the target position at the middle and at the end of the movement, one can predict the hand position at the end of the movement (y₂) in both types of perturbations as well as the hand position at the middle of the movement (y₁) in early perturbations. Note that the combined model (z, w) is slightly but systematically better than w alone, suggesting that there may be a small learning effect. Given how small this effect is, it is not surprising that regressions on z alone were far from significant.
Discussion
The biological processes that continuously improve behavior closely resemble iterative optimization. This makes optimal control theory a natural framework for studying the neural control of movement. It is also a very successful framework in terms of explaining the details of experimental data (Todorov, 2004). However, one of its most appealing features remains mostly untapped: the ability to predict task-specific sensorimotor strategies and thereby changes in behavior that result from systematic task variation. This is a gap not only in optimal control models but in the field of motor control in general. A substantial number of studies (including most of the literature on motor adaptation) have used a single task: reaching. The emphasis on servo control has created the impression that, as long as a desired trajectory can somehow be planned, motor execution (and all sensorimotor processing during movement) is the same no matter what the organism is trying to accomplish. Planning models have focused on the geometry of limb trajectories and have mostly ignored the context that gives functional meaning to these trajectories. We see this as a substantial gap in the current understanding of sensorimotor function; the present paper is a step toward filling that gap.
To vary the task systematically, we need a compact and experimentally accessible representation of the task space. Optimal control provides the perfect tool: composite cost functions. A central argument of this paper is that subjects optimize a composite cost as opposed to a homogeneous cost under multiple hard constraints. Indeed, we did everything we could to enforce a hard constraint on movement duration, and yet subjects never treated it as such. Instead, they always found a balance between undershoot and time-out errors. The changes in our experimental design affected the relative importance of these errors; in particular, the switch to the intercept task (which made the duration threshold explicit) resulted in the lowest percentage of time-out errors. In addition to accuracy and duration, we proposed that the composite cost includes endpoint stability (stopping in particular) and energy consumption. We showed directly that stability is part of the cost, by allowing subjects to interact with a high-impedance target and finding that they take advantage of it. The only evidence for the energy cost was indirect: it was needed to make our model fit the data. However, other studies have provided more direct evidence: increased muscle cocontraction has been found to yield more accurate movements (Burdet et al., 2001; Gribble et al., 2003), and yet this is not a strategy that subjects normally use, suggesting that they care about energetic efficiency in addition to accuracy. It is also notable that successful optimal control models of full-body movements are predominantly based on energy minimization (Anderson and Pandy, 2001; Pandy, 2001). In obstacle avoidance tasks (experiments 2 and 3), there is likely to be a fifth component of the cost having to do with avoiding the obstacle. Although we did not model this cost, we showed that obstacle avoidance does not rely on hard constraints such as fixed imaginary targets to the side of the obstacle.
The main effect we analyzed, the incomplete correction for late perturbations, reflects the closed-loop component of the sensorimotor strategy. The fact that the effect decreased in the hit condition means that the visuomotor loop operated differently, as predicted by our model. Thus, changes in stopping requirements (as well as target impedance) caused changes in the way visual feedback is used to make on-line corrections. This may be the first demonstration that visuomotor feedback loops are affected by the task and in particular by nonvisual components of the task. In addition to demonstrating task sensitivity, we provided additional evidence that sensorimotor strategies are consistent with the minimal intervention principle of optimal feedback control (Todorov and Jordan, 2002b). We found that positional variability is large during movement (especially three-dimensional movement) and is only reduced near the end, where accuracy is needed. We also found that, when the target is a vertical stripe, endpoint variability is larger in the vertical direction and visual feedback is not fully used to suppress variability in that direction. These results reaffirm the usefulness of looking beyond average trajectories, studying variability patterns and responses to perturbations, and modeling the sensorimotor strategies responsible for such effects.
Motor adaptation is a phenomenon that has not yet been addressed in the optimal control framework, but in principle is easy to address, as we showed in our model of increased variability attributable to frequent perturbations. One can impose any change in the task or environment, compute the new optimal controller, and use it as a model of adapted behavior. Of course, adaptation is rarely complete; thus, the predicted adaptation effect should be somewhere in between the baseline and fully adapted optimal controllers. An interesting open question is how to relate trial-to-trial dynamics of learning to asymptotic predictions regarding optimal adaptation. One way to do this is to model trial-to-trial changes as arising from an iterative optimization algorithm, which in the limit converges to the adapted optimal controller. This approach may yield richer models of learning dynamics than the linear state-space models currently used.
This work was supported by National Institutes of Health Grant NS-045915 and National Science Foundation Grants ECS-0524761 and SBE-0542013. We thank Javier Movellan and Howard Poizner for comments on this manuscript.
- Correspondence should be addressed to Emanuel Todorov at the above address.