Abstract
Optimal integration of different sensory modalities weights each modality as a function of its degree of certainty (maximum likelihood). Humans rely on near-optimal integration in decision-making tasks (involving, e.g., auditory, visual, and/or tactile afferents), and some support for these processes has also been provided for discrete sensorimotor tasks. Here, we tested optimal integration during the continuous execution of a motor task, using a cyclical bimanual coordination pattern in which feedback was provided by means of proprioception and augmented visual feedback (AVF, the position of both wrists being displayed as the orthogonal coordinates of a single cursor). Assuming maximum likelihood integration, the following predictions were addressed: (1) the coordination variability with both AVF and proprioception available is smaller than with only one of the two modalities, and should reach an optimal level; (2) if the AVF is artificially corrupted by noise, variability should increase but saturate toward the level without AVF; (3) if the AVF is imperceptibly phase shifted, the stabilized pattern should be partly adapted to compensate for this phase shift, whereby the amount of compensation reflects the weight assigned to AVF in the computation of the integrated signal. Whereas performance variability gradually decreased over 5 d of practice, we showed that these model-based predictions were already verified on the first day. This suggests not only that the performer integrated proprioceptive feedback and AVF online during task execution by tending to optimize the signal statistics, but also that this occurred before reaching an asymptotic performance level.
Introduction
The integration of different sensory modalities occurs at various nodes in the human brain (Holmes and Spence, 2005; Macaluso, 2006; Driver and Noesselt, 2008; Stein and Stanford, 2008). This process is thought to be important to merge multiple, often redundant, sources of information into a common representation of the outside world. It has been studied across a broad range of behaviors, and can be modeled assuming optimality, i.e., maximum likelihood estimation (MLE): different signal sources are weighted by the inverse of their variance, greater weight being given to the more reliable source (Clark and Yuille, 1990; Hillis et al., 2002). This is seen in the combination of different sensory modalities (van Beers et al., 1999; Ernst and Banks, 2002; Sober and Sabes, 2003; Alais and Burr, 2004; Helbig and Ernst, 2007; Körding et al., 2007), or in the inference from prior knowledge about stimulus distribution and actual sensory input (Bayesian statistics) (Kersten and Yuille, 2003; Körding and Wolpert, 2004, 2006, 2007; Tassinari et al., 2006; Vaziri et al., 2006). Most of these studies focused on the statistics of decision-making processes (asking, e.g., “Are the visual and auditory sources congruent or incongruent?”) or on the endpoint distribution of discrete reaching movements, i.e., movements with clearly defined beginning points and endpoints (Magill, 2006; Hogan and Sternad, 2007).
Here, we focused on optimal multisensory integration in the context of coordinated movement control. Human subjects were instructed to continuously stabilize a complex bimanual pattern: cyclical movement of the two wrists, one being a quarter of the cycle ahead (90° out of phase). Previous studies have demonstrated that learning and stabilization of this movement are facilitated by providing artificially augmented visual feedback (AVF) online during movement execution (Lee et al., 1995; Swinnen et al., 1997; Debaere et al., 2003, 2004): whereas the between-hand phase variability of all performers decreases across practice, the variability of those who receive AVF decreases faster and to a lower asymptote. We investigated whether the facilitation provided by AVF results from the integration of visual and proprioceptive afferent information and reflects some optimality principles (MLE) by testing the following predictions: (1) The variability of the pattern with both AVF and proprioception available would reach an optimal level and thus be smaller than with only one modality. (2) If AVF provides less and less salient information about the movement, it would be gradually disregarded. (3) If the AVF is imperceptibly phase shifted, the movement would be partly adapted to compensate for this bias, as the integrated estimate depends partly on the corrupted AVF.
Since the task was learned through practice, we particularly focused on the time course of multisensory integration. Specifically, we investigated whether the model-based predictions were equally validated at the different learning stages: MLE would either be reached after some practice, e.g., with a time course similar to that of the task, or be acquired much more rapidly. In the latter case, multisensory integration reflecting optimality principles (MLE) would be identified as a process that occurs independently of expertise level in task execution.
Materials and Methods
Subjects.
Fourteen healthy subjects (7 females, 7 males) participated (13 were right handed, 1 left handed according to Oldfield's handedness questionnaire). Their ages ranged between 18 and 35 years (mean 23). All subjects were naive with respect to the experimental goals and were paid for participation. The experimental procedures were approved by the ethical committee of Katholieke Universiteit Leuven, in compliance with the Declaration of Helsinki.
Apparatus and task.
Subjects were seated in front of a computer screen (Fig. 1b). They inserted both hands into two rotating manipulanda, with the palm in neutral position (thumb upward). Both wrists were free to move, with their axis of rotation aligned with the rotational degree of freedom of the manipulanda. Each unit was fitted with a rest that supported the forearm in a natural position, and the forearms, wrists, and hands were occluded from direct vision.
Performers had to coordinate cyclical oscillatory movements of their wrists such that the right wrist always led the left wrist by a quarter cycle (Fig. 1a). If the trajectories of both wrists followed a sinusoid, maintaining the phase lag would correspond to ϕ = 90° of phase offset between both wrists. This movement is not intrinsic to the motor system (Kelso, 1995), and requires practice before being properly mastered (Zanone and Kelso, 1992; Swinnen et al., 1997; Debaere et al., 2004). Indeed, this pattern is located between the two most natural coordination patterns, which act as strong attractors in the movement state space (Haken et al., 1985; Kelso, 1995; Swinnen, 2002): in-phase corresponds to simultaneous activation of homologous muscles (ϕ = 0°), and antiphase corresponds to simultaneous activation of nonhomologous muscles (ϕ = 180°) or isodirectional movements in extrinsic space when moving in the mediolateral plane. Accordingly, the 90° out-of-phase pattern studied here was located exactly in between both aforementioned patterns.
During some trials (see below), a black cursor was displayed on the computer screen (Fig. 1b). Subjects were informed that they controlled the moving cursor by their wrists, each of them moving the cursor along one coordinate of an orthogonal frame (Fig. 1c), such that the positions of both wrists were integrated in a single visual gestalt (Lissajous figure). Successful 90° out-of-phase performance was then characterized by tracing an anticlockwise circle on the screen (see supplemental Movie 1, available at www.jneurosci.org as supplemental material). Three right-handed subjects showed a clear preference for performing the movement with the left wrist ahead (corresponding to a clockwise circle on the screen), and these data were mirrored in the reported analysis. As soon as the first circles appeared on the screen, actual data acquisition started according to the protocol depicted in Figure 1d. The whole experiment consisted of five sessions of ∼1 h each, which were completed over 4 or 5 consecutive days. Each session was divided into three parts (see below), each containing some blocks of trials, the duration of each trial being 30 s. Regardless of trial condition, movement frequency was softly constrained around 1 Hz (1 full movement cycle per second) by changing the background color of the screen as soon as the time elapsing between two successive minima of any wrist trajectory deviated from the target period by >10%.
The experimental session consisted of three parts. Part I consisted of a single block of four trials during which the participant performed in-phase cyclical movements (ϕ = 0°), without AVF. This control condition was included to assess whether learning the new 90° out-of-phase movement would induce any change in performance from the most stable bimanual coordination pattern, i.e., the mirror-symmetric in-phase movement.
In parts II and III, the target movement was the 90° out-of-phase pattern described above (ϕ = 90°). Part II consisted of four “learning” or “phase-shifted” (see below) blocks, each containing eight trials. During the first and three other (randomly selected) trials (i.e., 50% of the total), the cursor was visible on the screen, such that AVF was provided. In the remaining four trials, the cursor disappeared after 2 s. Part II also contained a “shakers” block for which we attached tendon shakers to the dorsal and palmar regions of both of the subject's wrists, to degrade the quality of the proprioceptive feedback (Bock et al., 2007). While unilateral tendon vibration has been extensively used to bias the proprioceptive inflows toward either flexion or extension (the movement illusion effect) (McCloskey et al., 1983; Casini et al., 2006; Weerakkody et al., 2007), Gilhodes et al. (1986) showed that stimulation of both antagonistic muscles at the same frequency (as we did here) did not induce any overt motor effects. For that reason, simultaneous vibration of agonist and antagonist muscles may provide a useful method to degrade proprioceptive responsiveness, without directly biasing motor output. The shakers' vibration frequency was tuned to 80 Hz, corresponding to the upper limit of Ia afferents firing harmonically with the vibration (Roll et al., 1989). This “shakers” block was randomly inserted in between the four remaining blocks, with the restriction that on day 1 it never occurred before the third normal block had been completed. Similar to the other blocks in part II, these “shakers” blocks also contained four trials (always including the first one) in which the subject received AVF and four trials without AVF, the last seven trials being randomly distributed between trials with and without AVF.
The “phase-shifted” blocks (displayed in gray in Fig. 1d) were inserted to establish the relevance of the third prediction. These blocks were very similar to the learning blocks, except that, when present, the AVF was slightly phase shifted with respect to the actual movement. For example, if the phase shift was 10°, the subject saw an ellipse corresponding to the 100° out-of-phase pattern on the screen when actually executing the 90° out-of-phase pattern. Hence, the subject would have to execute the 80° out-of-phase pattern to see the expected circle on the screen (see supplemental Movie 2, available at www.jneurosci.org as supplemental material). A set of alternative predictions about the subject's behavior in the reference frame of the visual feedback is proposed in Figure 2, depending on the introduced phase shift and the level of compensation in the coordination pattern. Importantly, these blocks replaced normal learning blocks without prior warning, such that the phase shift remained undetected by the performer and the corresponding motor adaptation occurred unconsciously. Post-experiment interviews revealed that eight subjects did not perceive any discrepancy between their movements and the visual display, while six might have detected the biggest phase shift. Importantly, all the latter subjects thought that the perceived mismatch was due to weak performance and not the result of experimental trickery. The phase shift was implemented as follows: assuming sinusoidal trajectories, let θ_{r} = A sin(ωt) and θ_{l} = A sin(ωt + ϕ) denote the angular trajectories of the right and left wrists, respectively. A and ω represent the movement amplitude and frequency, and ϕ is the phase difference between both wrists.
When the following quantities are displayed: θ_{r}^{disp} = cos(ϕ′/2)θ_{r} − sin(ϕ′/2)θ̇_{r}/ω and θ_{l}^{disp} = cos(ϕ′/2)θ_{l} + sin(ϕ′/2)θ̇_{l}/ω (calculated on the basis of online estimates of the velocities θ̇_{r} and θ̇_{l}), it can be shown that θ_{r}^{disp} = A sin(ωt − ϕ′/2) and θ_{l}^{disp} = A sin(ωt + ϕ + ϕ′/2), such that a supplemental phase shift of ϕ′ is artificially introduced in the visual display. We tested visual phase shifts equal to ϕ′ = −15°, −10°, −5°, 5°, 10°, and 15°. Five trials of each were randomly inserted into the seven “phase-shift” blocks, which still contained ∼50% of trials without visual feedback, to preserve the subjects' belief that these blocks were normal. Those last trials were not recorded for analysis. Note that each phase shift was applied for only 30 s (the duration of one trial), a time that we expected to be too short to induce sensory recalibration (see, e.g., Burge et al., 2008). On days 3 and 5, the last learning block was kept normal to wash out any potential effect of the phase-shift blocks on the degree of certainty of the AVF, before starting part III of the experimental session. These “washout” blocks were not included in the analyses.
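The display transformation described above can be checked numerically. The following is a minimal sketch, assuming ideal sinusoidal trajectories and exact velocities (the experiment used online velocity estimates); the function name `displayed_angles` and the numerical values are illustrative assumptions, not taken from the study.

```python
import math

def displayed_angles(theta_r, theta_l, vel_r, vel_l, omega, shift):
    """Return the displayed (right, left) angles for a visual phase shift `shift` (rad).

    Each displayed signal mixes a wrist position with its velocity scaled by
    1/omega, which rotates the phase of a sinusoid by -shift/2 (right) or
    +shift/2 (left).
    """
    c, s = math.cos(shift / 2), math.sin(shift / 2)
    disp_r = c * theta_r - s * vel_r / omega
    disp_l = c * theta_l + s * vel_l / omega
    return disp_r, disp_l

# Illustrative values (assumptions): unit amplitude, 1 Hz, 90 deg pattern, 10 deg shift.
A, omega = 1.0, 2 * math.pi
phi, shift = math.radians(90), math.radians(10)
t = 0.123  # arbitrary time sample (s)

theta_r = A * math.sin(omega * t)
theta_l = A * math.sin(omega * t + phi)
vel_r = A * omega * math.cos(omega * t)
vel_l = A * omega * math.cos(omega * t + phi)

disp_r, disp_l = displayed_angles(theta_r, theta_l, vel_r, vel_l, omega, shift)
```

Under these assumptions, the displayed pair behaves like a pattern with phase difference ϕ + ϕ′, i.e., 100° on screen while 90° is executed.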
Part III was done on days 1, 3, and 5 and consisted of three “visual noise” blocks, each containing seven trials in which AVF was provided but corrupted by noise. In those blocks, the displayed signals were θ_{r}^{disp} = αθ_{r} + λ_{r} and θ_{l}^{disp} = αθ_{l} + λ_{l}, α weighting the presence of the actual movement in the display, and λ_{r} and λ_{l} being two independent noise vectors obeying the dynamics of a damped spring excited by a random force of Gaussian distribution (see supplemental Movie 3, available at www.jneurosci.org as supplemental material). In discrete time, their dynamics then obeyed λ[k] = (−λ[k − 2] + (2 + ξτ)λ[k − 1] + τ^{2}μ[k])/(1 + ξτ + κτ^{2}), where ξ = 10, κ = 100, and τ = 0.005 s are the damping and stiffness of the spring, and the display refresh period, respectively. The force acting on this virtual spring, i.e., μ[k], was a random vector of Gaussian distribution (zero mean, Λ variance). Six of the seven trials per block were tuned as follows: α = 1 and Λ = 2500, 5000, 7500, 10,000, 12,500, and 15,000, such that different noise levels were superimposed on the actual signal. The seventh trial was tuned with α = 0 and Λ = 30,000, such that the visual display was uncorrelated with the actual movement. The seven conditions were randomized within each block.
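The noise recursion can be sketched in a few lines. This is an illustration only, assuming the spring starts at rest and using Python's standard pseudo-random generator; the function names `spring_noise` and `displayed_signal` and the seed are hypothetical choices.

```python
import random

def spring_noise(n_samples, variance, xi=10.0, kappa=100.0, tau=0.005, seed=0):
    """Simulate lambda[k] for a damped spring driven by Gaussian force mu[k].

    Implements lambda[k] = (-lambda[k-2] + (2 + xi*tau)*lambda[k-1]
                            + tau**2 * mu[k]) / (1 + xi*tau + kappa*tau**2).
    Assumption: the spring is at rest before the first sample.
    """
    rng = random.Random(seed)
    sd = variance ** 0.5
    prev2, prev1 = 0.0, 0.0  # lambda[k-2], lambda[k-1]
    denom = 1.0 + xi * tau + kappa * tau ** 2
    out = []
    for _ in range(n_samples):
        mu = rng.gauss(0.0, sd)  # zero-mean force with the requested variance
        new = (-prev2 + (2.0 + xi * tau) * prev1 + tau ** 2 * mu) / denom
        out.append(new)
        prev2, prev1 = prev1, new
    return out

def displayed_signal(theta, noise, alpha=1.0):
    """Combine the actual wrist angle and the noise: alpha * theta + lambda."""
    return [alpha * th + n for th, n in zip(theta, noise)]

# One 30 s trial at tau = 0.005 s is 6000 samples; variance 2500 is the lowest level.
noise = spring_noise(6000, 2500.0)
```

With α = 0 the displayed signal reduces to the noise alone, which corresponds to the seventh, uncorrelated condition.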
A “day” label is indicated on the bottom-right corner of many blocks in Figure 1d: D0 refers to the block whose data represented the initial level; D1–D5 refer to the blocks whose data were preserved to represent the skill levels from days 1 to 5; and Da stands for the phase-shifted blocks, whose data were merged together for the corresponding analysis. The blocks without D label were not preserved for analysis.
Data analysis.
The first 3 s of data from each trial were removed since we were not interested in movement initiation. The angular position of both wrists was filtered by a Butterworth filter (forward and backward, cutoff frequency of 8 Hz). The signals were also detrended by subtracting the best-fitting second-order polynomial (parabola), to remove any low-frequency drift. Angular velocity of the wrists was computed by a centered differentiation algorithm.
The movement frequency was calculated as the mean of the inverse of the time elapsed between two adjacent maxima in the trajectory, then averaged over both wrists. The continuous phase difference between both wrists was calculated as follows:
ϕ = arctan[θ̇_{r}/(2πf̄θ_{r})] − arctan[θ̇_{l}/(2πf̄θ_{l})], (1)
where θ_{r}, θ̇_{r}, θ_{l}, and θ̇_{l} denote the position and velocity of the right and left wrists, and f̄ is the mean movement frequency over the corresponding trial. The mean and SD of ϕ over a trial were calculated according to circular statistics standards (Fisher, 1983):
μ_{ϕ} = arg[(1/T)∫_{0}^{T} e^{iϕ(t)} dt], (2)
ς_{ϕ} = √(−2 ln|(1/T)∫_{0}^{T} e^{iϕ(t)} dt|), (3)
where T is the trial duration. The mean of the estimate of the phase difference between both wrists (i.e., μ_{ϕ}), whatever the feedback condition, was computed as the mean of the actual phase difference during the corresponding trial. The variance ς^{2} was computed as the square of the SD of this phase difference.
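The phase and circular-statistics computations can be illustrated as follows. This is a sketch under stated assumptions: the phase of each wrist is taken as atan2(2πf̄θ, θ̇), the difference is left minus right (so that θ_r = A sin(ωt) and θ_l = A sin(ωt + ϕ) return ϕ), and the circular SD is √(−2 ln R), with R the mean resultant length. The function names are hypothetical.

```python
import cmath
import math

def continuous_phase(theta_r, vel_r, theta_l, vel_l, f_mean):
    """Phase difference (rad) between wrists from position and velocity samples.

    Assumption: each wrist's phase is atan2(2*pi*f_mean*theta, velocity), which
    avoids dividing by a position that crosses zero.
    """
    w = 2.0 * math.pi * f_mean
    phase_r = math.atan2(w * theta_r, vel_r)
    phase_l = math.atan2(w * theta_l, vel_l)
    return phase_l - phase_r

def circular_mean_sd(phases):
    """Circular mean and SD (rad) of a list of phase samples."""
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    length = min(abs(resultant), 1.0)  # guard against rounding just above 1
    mean = cmath.phase(resultant)
    sd = math.inf if length == 0.0 else math.sqrt(-2.0 * math.log(length))
    return mean, sd

# Synthetic check: noise-free 1 Hz sinusoids, 90 deg apart, sampled every 5 ms.
f, phi = 1.0, math.radians(90)
w = 2.0 * math.pi * f
samples = []
for k in range(1000):
    t = k * 0.005
    samples.append(continuous_phase(math.sin(w * t), w * math.cos(w * t),
                                    math.sin(w * t + phi), w * math.cos(w * t + phi), f))
mean, sd = circular_mean_sd(samples)
```

Working with complex exponentials makes the statistics insensitive to 2π wrapping of individual samples, which a plain arithmetic mean would not be.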
To quantify the learning rate of the task, exponential curves were fitted to the SD of ϕ across days of practice. These curves took the following form:
ς_{ϕ}(d) = ς_{ϕ,∞} + ς_{ϕ,1}e^{−(d − 1)/δ}, (4)
where d stands for the day considered. The three parameters of Equation 4 (i.e., the asymptote ς_{ϕ,∞}, the decrease amplitude ς_{ϕ,1}, and the decrease time constant δ, in days) were estimated by a nonlinear least-squares curve-fitting algorithm (The MathWorks).
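A three-parameter exponential fit of this kind can be sketched without any toolbox. The example below is not the authors' fitting routine: it assumes the curve equals asymptote + amplitude on day 1, scans a grid of candidate time constants, and solves the remaining two parameters by linear least squares. All names and the synthetic data are illustrative.

```python
import math

def learning_curve(day, asymptote, amplitude, tau_days):
    """Exponential learning curve; equals asymptote + amplitude on day 1 (assumption)."""
    return asymptote + amplitude * math.exp(-(day - 1) / tau_days)

def fit_learning_curve(days, sds, tau_grid):
    """Grid search over the time constant; asymptote and amplitude solved linearly."""
    best = None
    n = len(days)
    for tau in tau_grid:
        x = [math.exp(-(d - 1) / tau) for d in days]
        sx, sy = sum(x), sum(sds)
        sxx = sum(v * v for v in x)
        sxy = sum(v * y for v, y in zip(x, sds))
        det = n * sxx - sx * sx
        if abs(det) < 1e-12:
            continue  # regressor is (numerically) constant for this tau
        amplitude = (n * sxy - sx * sy) / det
        asymptote = (sy - amplitude * sx) / n
        sse = sum((asymptote + amplitude * v - y) ** 2 for v, y in zip(x, sds))
        if best is None or sse < best[0]:
            best = (sse, asymptote, amplitude, tau)
    return best[1:]

# Synthetic (made-up) data generated from known parameters, for illustration only.
days = [1, 2, 3, 4, 5]
data = [learning_curve(d, 10.0, 8.0, 2.5) for d in days]
tau_grid = [i / 10 for i in range(5, 61)]  # 0.5 to 6.0 days in 0.1 steps
asymptote, amplitude, tau = fit_learning_curve(days, data, tau_grid)
```

A gradient-based nonlinear solver would refine the time constant beyond the grid resolution; the grid search is used here only to keep the sketch dependency-free.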
Influence of practice and type of feedback provided was assessed by factorial ANOVA designs. The level of significance was set to p < 0.05.
Maximum likelihood model.
A model assuming the integration between proprioception and AVF computes the integrated estimate of the phase difference between both limbs from the estimates given by each modality alone, i.e.:
μ_{V+P} = w_{P}μ_{P} + (1 − w_{P})μ_{V}, (5)
where μ_{P} and μ_{V} are the estimates provided by proprioception and visual feedback alone, respectively, and 0 ≤ w_{P} ≤ 1 weights the contribution of the proprioceptive feedback in the global estimate. This model assumes that the integrated variable is linear, while the phase actually lies on a circle. Conversely, the tangent of the phase would be a linear variable, but it can be shown that θ ≃ tan θ for θ ∈ [−40°, 40°]. We assume that the linear model is thus valid for phase deviations around steady state belonging to that range. Given Equation 5, the variance of the integrated estimate is thus ς_{V+P}^{2} = w_{P}^{2}ς_{P}^{2} + (1 − w_{P})^{2}ς_{V}^{2}, assuming that ς_{P}^{2} and ς_{V}^{2} denote the variance of the proprioceptive and visual feedback, respectively, and that both are corrupted by independent noise exhibiting Gaussian distribution. The model is optimal if it maximizes the likelihood of the integrated signal by selecting w_{P} to minimize the variance ς_{V+P}^{2}, such that the average square error, i.e., (μ_{V+P} − 90°)^{2}, is minimized through the trial. It can be demonstrated (see the references above) that ς_{V+P}^{2} is minimized if the following is true:
w_{P} = ς_{V}^{2}/(ς_{P}^{2} + ς_{V}^{2}), (6)
i.e., if the weight of each modality is inversely proportional to its own variance. Accordingly, the variability of the optimally integrated estimate is equal to the following:
ς_{V+P}^{2} = ς_{P}^{2}ς_{V}^{2}/(ς_{P}^{2} + ς_{V}^{2}), (7)
and is smaller than the variability of either modality alone [ς_{V+P}^{2} < min(ς_{P}^{2}, ς_{V}^{2})], except if one sensory source is perfect (ς^{2} = 0) or is infinitely noisy (ς^{2} = ∞). The maximal advantage of multisensory combination is obtained when both modalities are of equal variance, since the variance of their integration becomes ς_{V+P}^{2} = ς_{P}^{2}/2 = ς_{V}^{2}/2.
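The two-modality weighting rule can be illustrated numerically. This is a minimal sketch assuming independent Gaussian noise on both modalities; the function names and the variance values are illustrative, not measured data.

```python
def mle_two_cues(var_p, var_v):
    """Optimal proprioceptive weight and variance of the integrated estimate.

    Each modality is weighted in inverse proportion to its own variance.
    """
    w_p = var_v / (var_p + var_v)
    var_combined = var_p * var_v / (var_p + var_v)
    return w_p, var_combined

def combined_variance(w_p, var_p, var_v):
    """Variance of w_p * mu_P + (1 - w_p) * mu_V for independent noise sources."""
    return w_p ** 2 * var_p + (1.0 - w_p) ** 2 * var_v

# Equal variances: each modality gets weight 0.5 and the variance is halved.
w_equal, var_equal = mle_two_cues(4.0, 4.0)

# Unequal variances (illustrative): the more reliable visual cue gets more weight.
w_opt, var_opt = mle_two_cues(9.0, 3.0)
```

Any weight other than the optimal one yields a larger integrated variance, which can be verified by evaluating `combined_variance` around `w_opt`.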
This simple model cannot be directly applied to our data as such, for the following two reasons: (1) the variability corresponding to the AVF (i.e., ς_{V}^{2}) cannot be estimated, since it would correspond to a condition in which all proprioceptive afferents are shut down, while the tendon shakers used in this study only increased the noise level on some of the proprioceptive afferents (Bock et al., 2007); (2) the 90° out-of-phase pattern is most likely stabilized by a mix between feedback-driven motor commands (based on the proprioceptive and visual inflows) and feedforward motor commands, which are generated from internal predictions about the system's dynamical evolution (Wolpert et al., 1998; Kawato, 1999; Sabes, 2000), and whose influence has been neglected in the model (Eq. 7). For these reasons, we augmented the two-component model (Eq. 7) to a three-component model, integrating the sensory inflows coming from the visual system (V), the proprioceptive system, which is impaired by the shakers (P for short), and the sum of the residual proprioceptive information and the feedforward command (FF for short). Similarly to what has been done for the two-component model, it can be shown that optimal integration of those three components into a common belief about the phase offset would weight each modality according to the following:
w_{i} = (1/ς_{i}^{2})/(1/ς_{V}^{2} + 1/ς_{P}^{2} + 1/ς_{FF}^{2}), with i ∈ {V, P, FF}, (8)
and that the resulting variability would be equal to the following:
ς_{V+P+FF}^{2} = 1/(1/ς_{V}^{2} + 1/ς_{P}^{2} + 1/ς_{FF}^{2}). (9)
Therefore, the three conditions corresponding to the trials without AVF and/or with the tendon shakers can be used to compute the sensory variability of each source alone, i.e., ς_{V}^{2}, ς_{P}^{2}, and ς_{FF}^{2}, assuming the following restrictions: (1) the performance variability (i.e., the variability measured from Eq. 3) fairly reflects the sensory estimation variability; and (2) the maximum likelihood estimator is appropriate to model the stationary performance, while an optimal control model (which would need to consider, e.g., the task's dynamics) might be a more complete model of the behavior (see Discussion). If the available information sources are optimally integrated, we find the following: ς̂_{BS}^{2} = ς_{FF}^{2}, ς̂_{B}^{2} = ς_{P}^{2}ς_{FF}^{2}/(ς_{P}^{2} + ς_{FF}^{2}), and ς̂_{S}^{2} = ς_{V}^{2}ς_{FF}^{2}/(ς_{V}^{2} + ς_{FF}^{2}), where ς̂_{BS}^{2}, ς̂_{B}^{2}, and ς̂_{S}^{2} refer to the performance variability that was measured in the trials without AVF and with the shakers (blind+shakers), without AVF and shakers (blind only), and with AVF and shakers (shakers only), respectively. The equations can be inverted to obtain an estimate for ς_{V}^{2}, ς_{P}^{2}, and ς_{FF}^{2}. From there, a model-based prediction about the performance variability with both intact proprioception and AVF can be estimated from Equation 9 and compared with the actual data:
ς_{V+P+FF}^{2} = 1/(1/ς̂_{S}^{2} + 1/ς̂_{B}^{2} − 1/ς̂_{BS}^{2}). (10)
Specifically, the model-based predictions can now be rephrased as follows, on the basis of the derived equations.

The variability of performance with both AVF and intact proprioception available is smaller than with any other sensory combination, and variability should reach the optimal level predicted by the maximum likelihood integrator (Eq. 9 or 10).

If the AVF is artificially corrupted by noise (such that ς_{V}^{2} increases), the movement variability increases but saturates toward the level without AVF. Indeed, according to Equation 9, if ς_{V}^{2} goes to infinity, ς_{V+P+FF}^{2} tends to—and never exceeds—ς_{P+FF}^{2} = ς_{P}^{2}ς_{FF}^{2}/(ς_{P}^{2} + ς_{FF}^{2}).

If the AVF is imperceptibly phase shifted with respect to the actual movement (such that μ_{V} does not equal μ_{P} anymore), the movement is partly adapted to compensate for this bias, since the integrated estimate depends on both the uncorrupted (feedforward+proprioception) and the corrupted (AVF) modalities (see Eq. 5). The extent of this compensation could be derived from the weights assigned to the different modalities by the maximum likelihood integrator model (Eq. 8).
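The inversion of the three measured variabilities into per-source variances, and the resulting prediction for the condition with all modalities, can be sketched as follows. This assumes the three-component relations stated above hold exactly; function names and numerical values are illustrative assumptions, not data from the study.

```python
def invert_three_component(var_bs, var_b, var_s):
    """Recover (var_V, var_P, var_FF) from the three measured conditions.

    var_bs: blind + shakers (FF only); var_b: blind only (P + FF);
    var_s: shakers only (V + FF). Precisions (1/variance) add under optimal
    integration, so each source's precision is obtained by subtraction.
    """
    inv_ff = 1.0 / var_bs
    inv_p = 1.0 / var_b - inv_ff
    inv_v = 1.0 / var_s - inv_ff
    if inv_p <= 0.0 or inv_v <= 0.0:
        raise ValueError("measurements are inconsistent with the model")
    return 1.0 / inv_v, 1.0 / inv_p, 1.0 / inv_ff

def predict_all_modalities(var_v, var_p, var_ff):
    """Predicted variance and weights when V, P, and FF are all available."""
    precision = 1.0 / var_v + 1.0 / var_p + 1.0 / var_ff
    var_all = 1.0 / precision
    weights = {"V": var_all / var_v, "P": var_all / var_p, "FF": var_all / var_ff}
    return var_all, weights

# Illustrative measured variances (deg^2), made up for the example.
var_v, var_p, var_ff = invert_three_component(var_bs=6.0, var_b=4.0, var_s=2.4)
var_all, weights = predict_all_modalities(var_v, var_p, var_ff)
```

The `ValueError` branch reflects a practical caveat: if the shakers have almost no measurable effect, the blind and blind+shakers variabilities are nearly equal and the recovered proprioceptive variance becomes very large or undefined.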
Results
The following sections will sequentially address the results related to the three model-based predictions.
Prediction 1
The first prediction was validated by comparison of the four different feedback conditions: intact proprioception+feedforward (visual display not provided), proprioception+vision+feedforward (visual display provided), “vision+feedforward” (with distorted proprioception), and “feedforward only” (visual display not provided and distorted proprioception). Since proprioception cannot be reversibly eliminated without invasive procedures, proprioceptive inflows in the two latter conditions were distorted by placing tendon shakers on the palmar and dorsal side of both wrists (Bock et al., 2007). The results are displayed in Figure 3. Figure 3a displays the SD of the phase difference between both wrists across the 5 d of practice and the five types of movements [in-phase, “feedforward only” (with the shakers and no AVF), intact proprioception and feedforward (no AVF), “vision and feedforward” (with the shakers and AVF), and with the three modalities (no shakers, with AVF)]. It shows a clear tendency for the SD to decrease (i.e., for stability to increase) with practice under the four movement conditions corresponding to the 90° out-of-phase pattern, while in-phase pattern stability remained constant across days. Factorial ANOVA (5 × 5, five days, five movement types: four conditions with the 90° out-of-phase pattern, and in-phase) supported this finding as shown by a significant practice day effect (F_{(4,325)} = 14.9, p < 0.0001). The movement type effect also reached significance (F_{(4,325)} = 11.24, p < 0.0001), suggesting that the task was most difficult to stabilize without AVF and with the shakers (i.e., “feedforward only,” orange), then without AVF (proprioception+feedforward, red), then with the shakers (“vision+feedforward,” blue), and finally with the three modalities available and uncorrupted (magenta). The in-phase pattern was more successfully stabilized than any of the latter four conditions.
One-way ANOVA on the data corresponding only to in-phase movements (with the day of practice as single factor) did not reach significance (p > 0.7), suggesting that the extensive practice of the 90° out-of-phase pattern under various feedback conditions did not modify the performance level of the control in-phase condition.
The qualitative aspect of prediction 1 is clearly visible in Figure 3a: the more feedback the subjects received, the more stable their performance. This result suggests that the integration between the available modalities (proprioceptive feedback, AVF, and internally generated feedforward command) reflects some maximum likelihood mechanisms. However, the shakers' effect was small: while it reached significance with AVF [the difference between the blue and the magenta curves in Fig. 3a was assessed by a dedicated factorial ANOVA that reached significance with F_{(1,130)} = 7.8 (p < 0.01)], it did not reach significance between the two conditions where AVF was not provided [the difference between the orange and the red curves in Fig. 3a was assessed by a dedicated factorial ANOVA that did not reach significance (p > 0.2)]. This result suggests that quantitative matching of maximum likelihood predictions would require other techniques to entirely shut down the proprioceptive afferents. Those techniques are, however, more invasive.
Performance variability could actually be predicted from the three other measurements (see Eq. 10). This prediction is depicted with the dashed green curve in Figure 3a, while Figure 3b represents the difference between this prediction and the actual performance variability (the magenta curve in Fig. 3a). Visual inspection of the figure reveals that the model-based prediction underestimated the actual performance by ∼1° (or ∼8% of the asymptotic variability). Indeed, since the data suggested that the shakers' effect was tiny without AVF, the model predicts an even smaller effect when combined with another sensory modality, i.e., AVF. The weight assigned to the different modalities could be estimated from Equation 8. These weights are displayed in Figure 4, which suggests that the weight assigned to the AVF slightly decreased in the final learning stages (days 3–5) in favor of the weights assigned to the proprioceptive and internal (feedforward) modalities.
To further investigate the differences in the learning time courses, we fitted exponential learning curves onto the data of Figure 3, a and b (dotted lines). The equation of these fits is given by Equation 4 and contains three parameters that were optimized for the group data and for each subject individually. In Figure 3c, the large dots denote the group data, and the error bars denote the mean ± SE of the individual fits. The three fitted parameters are as follows: (1) the asymptote ς_{ϕ,∞}, i.e., the level reached at the end of the learning; (2) the amplitude ς_{ϕ,1}, i.e., the difference between the level at the first day of practice and the asymptote; and (3) the time constant δ, characterizing the learning speed. Given the asymptotes, the figure confirms that the 90° out-of-phase pattern was more difficult to stabilize with the tendon shakers on the wrists than without, and even more difficult when the visual feedback was not provided. The in-phase movement was the most stable one. The asymptote of the difference between the proprioception+vision condition and the model-based prediction (green) reached a value of about −0.7, confirming that the actual data outperformed the model-based prediction by this small amount. More interestingly, this figure shows that two amplitude parameters were not significantly different from zero (t tests, p > 0.18): (1) the amplitude of the model-error fit (green) was not different from zero, revealing that this fit was equally close to zero over the whole learning course; (2) the amplitude of the fit made on the in-phase data (black) was also not different from zero, confirming that performance in this condition was stable, its variability being constant across the 5 d. The time constants of the conditions that were actually learned through practice (i.e., the four movement conditions stabilizing the 90° out-of-phase pattern that corresponded to nonzero amplitude) were ∼2–4 d.
The time constants of the individual fits for the in-phase movement and the difference between the model prediction and the “all modalities” condition were not displayed, since the learning amplitude was not different from zero in both cases. In sum, the performers learned to stabilize the 90° out-of-phase movement along the 5 d of practice, regardless of the available sensory modality(ies), but it seems that their integration was equally close to the predicted optimal level from the first to the last session.
Prediction 2
The second prediction was investigated through specific blocks, executed at the end of sessions 1, 3, and 5. Here, the AVF was always present, but corrupted by some degree of noise: on top of the actual movement, the cursor obeyed the dynamics of a randomly actuated damped spring (see Materials and Methods). The noise strength varied across six levels, and a seventh condition was added in which the cursor moved completely randomly, i.e., uncorrelated with the actual movement. The SDs of the phase difference between both wrists in these noisy conditions are displayed in Figure 5, together with the “vision+feedforward,” the “proprioception+feedforward,” and the “vision+proprioception+feedforward” conditions of the same days (same as in Fig. 3a). As expected, increasing the noise level resulted in a performance deterioration: factorial ANOVA (3 × 8, three days, eight noise levels including baseline V+P+FF) reached significance with F_{(7,312)} = 21.1 (p < 0.001) with the noise level as factor. The effect of day of practice also reached significance, due to the learning effect identified previously (F_{(2,312)} = 53, p < 0.001), as did the interaction between both factors (F_{(14,312)} = 2.7, p < 0.001). More importantly, the variability achieved under all of these noise conditions never exceeded the level reached without any visual feedback (i.e., the P+FF condition). Specifically, the level reached in the “infinitely noisy” condition (no correlation between movement and visual display) was not significantly different from the one reached without visual feedback (t tests, all p > 0.16). This is highly consistent with the second prediction: when the AVF does not provide any salient information about the movement, it is simply disregarded such that performance equals the level obtained without any AVF. The same figure displays qualitative model-based predictions (dotted line): the variability increases smoothly, saturating at the “proprioception+feedforward” (P+FF) level.
Once again, the matching between the prediction and the actual data is of similar quality across days.
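The saturation behavior of the second prediction follows directly from the integration rule and can be sketched numerically. The example assumes that the display noise adds independently to the baseline visual variance, which is a simplification; all variance values are illustrative and the function name is hypothetical.

```python
import math

def integrated_variance(var_v, var_p, var_ff):
    """Variance of the optimally integrated estimate of three independent sources."""
    return 1.0 / (1.0 / var_v + 1.0 / var_p + 1.0 / var_ff)

# Illustrative per-source variances (deg^2); not measured values.
base_var_v, var_p, var_ff = 4.0, 12.0, 6.0

# Level reached without any visual feedback (P + FF only).
ceiling = var_p * var_ff / (var_p + var_ff)

# Assumption: added display noise sums with the baseline visual variance.
extra_noise = [0.0, 5.0, 20.0, 100.0, 1000.0, math.inf]
curve = [integrated_variance(base_var_v + extra, var_p, var_ff) for extra in extra_noise]
```

As the visual variance grows, its precision term tends to zero, so the curve rises monotonically and levels off at the P+FF value without exceeding it.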
Prediction 3
To validate the third model-based prediction, we replaced some “normal” learning blocks of days 3, 4, and 5 by specific blocks (see Fig. 1d) where the visual display (when present) was phase shifted (see Materials and Methods). Importantly, the subjects were not informed about this experimental manipulation, and the introduced phase shifts were kept small enough (±15°) such that they were not perceived. Accordingly, performers did not detect the mismatch between what they did (and felt via proprioceptive feedback) and what they saw, suggesting that any potential movement compensation was unconscious. Postexperiment interviews indicated that some subjects did perceive the largest phase shifts, but always attributed the mismatch to inaccurate movement execution. Several possibilities about movement and visual display were presented in Figure 2, depending on the degree of compensation in the coordination pattern. The two extremes were either no compensation (the stabilized movement is always the same, and the AVF is fully skewed) or full compensation (the stabilized movement is fully adapted to correspond to a circular AVF, whatever the phase shift). In Figure 6a, the actual average trajectories are displayed, together with the corresponding visual display (AVF). One can see that the subjects partly compensated for the introduced phase shifts, i.e., between the two extremes described above, such that the visual display was also only partly skewed. Correspondingly, Figure 6b represents the phase difference error, i.e., the mean of the phase difference between the two wrists (computed by Eq. 5) normalized by the mean of this difference across conditions. Had the phase shifts not been compensated, this line would have been horizontal (the stabilized pattern would have been the same whatever the shape of the visual display). In contrast, full compensation would have led to a diagonal line of slope −1. The actual data lie in between, suggesting that AVF was weighted at ∼50% in the estimate of the phase difference between both wrists. Partial compensation was validated by an ANOVA, which reached significance with F_{(6,91)} = 57.4 (p < 0.0001), with the phase-shift level as single factor (seven phase shifts were tested). Full compensation was furthermore invalidated by individual t tests showing significant differences between the six data points and the diagonal line of slope −1 (all p < 0.004).
The compensatory behavior could have been predicted from the results related to the first prediction, since the three modalities should be combined according to weights inversely proportional to their own variance (see Eq. 8 and Fig. 4). Using the data from the learning blocks on days 3, 4, and 5 (i.e., the same days as those where the “phase-shift” blocks were inserted), Figure 4 suggests that the visual feedback should be weighted ∼50–60% in the integrated estimate. The predicted compensation is represented by the dashed gray line in Figure 6b. The figure reveals that the predicted and actual degrees of compensation are very close to each other and are not statistically different for the majority of phase shifts (the asterisks, t tests, p > 0.05).
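The link between single-modality variability and predicted compensation can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the study. The SDs are hypothetical values picked so that the visual weight falls in the ∼50–60% range mentioned above. The seven phase shifts are assumed to be spaced by 5° within ±15°. The mean phase-difference error is assumed to equal minus the visual weight times the phase shift.

```python
def mle_weights(sds):
    """Normalized inverse-variance weights for independent Gaussian sources."""
    precisions = {name: 1.0 / sd**2 for name, sd in sds.items()}
    total = sum(precisions.values())
    return {name: p / total for name, p in precisions.items()}

# Hypothetical single-source SDs in degrees (illustrative only).
sds = {"vision": 12.0, "proprioception": 18.0, "feedforward": 20.0}
weights = mle_weights(sds)
w_vision = weights["vision"]

# Assumed set of AVF phase shifts in degrees.
phase_shifts = [-15, -10, -5, 0, 5, 10, 15]

# Only the visual source is biased by the shift, so the predicted mean
# phase-difference error has slope -w_vision (0 = none, -1 = full compensation).
predicted_error = [-w_vision * s for s in phase_shifts]

for s, e in zip(phase_shifts, predicted_error):
    print(f"shift {s:+3d} deg -> predicted error {e:+.2f} deg")
```

With these illustrative SDs the visual weight lies between 0.5 and 0.6, so the predicted line sits roughly halfway between the horizontal (no compensation) and the slope −1 diagonal (full compensation).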
Finally, we investigated whether the introduced phase shifts in the AVF increased task difficulty for the performer. This is unlikely since the pattern variability stayed constant across the tested phase shifts (see Fig. 6c). ANOVA did not reach significance for this variable (F_{(6,91)} = 0.24, p > 0.96), with the phase-shift level as single factor.
Discussion
Multiple sensory modalities are integrated by the human brain to obtain a common belief about the world state, and multisensory-training protocols could better approximate natural settings and boost perceptual learning (Shams and Seitz, 2008). Optimal computational models predict that the integrated signal variability will be minimized if the signal on each modality is weighted by the inverse of its variance, such that the average error between the perceived and the actual world state will be minimized. This optimal integration process is often referred to as maximum likelihood estimation (MLE, assuming all Gaussian likelihoods and symmetrical cost functions). Human capabilities relying on MLE have been broadly demonstrated in the past (Trommershäuser et al., 2008), during decision-making tasks (e.g., source localization) (van Beers et al., 1999; Ernst and Banks, 2002; Alais and Burr, 2004; Helbig and Ernst, 2007; Körding et al., 2007), point-to-point reaching, or saccadic movements (Körding and Wolpert, 2004; Tassinari et al., 2006; Vaziri et al., 2006).
Here, we investigated multisensory integration during the execution of a cyclical bimanual task requiring extensive learning. In particular, the present contribution was twofold. First, we showed that multisensory integration between proprioception and AVF can be performed online, during the execution of a rhythmic movement, while previous studies focused on discrete tasks where multisensory integration and motor action could be separated in time. For example, Sober and Sabes (2003, 2005) illustrated flexible strategies for the integration of proprioceptive and visual feedback, yet restricted to the early stages of movement planning. Second, we demonstrated that the integration process was already in place during the initial stage of practice, whereas the task necessitated substantial training before reaching an asymptotic level of performance.
We investigated whether the performers' behavior reflected integration mechanisms through MLE by testing three predictions that required independent manipulations: (1) integration of the different modalities through learning; (2) corrupting AVF by noise, such that it progressively became less useful; and (3) imperceptibly skewing AVF, such that it provided biased information. Our results confirmed that the 90° out-of-phase coordination pattern does not belong to the intrinsic bimanual repertoire (Zanone and Kelso, 1992; Lee et al., 1995; Swinnen et al., 1997; Debaere et al., 2003, 2004), since its SD was substantially higher than for the in-phase condition, and was only gradually reduced over five daily practice sessions. This was the case regardless of available modalities. Importantly, the phase difference between both limbs is a rather artificial quantity that was most likely not spontaneously encoded by the subjects' brains, although we show that it nicely captures the subjects' performance and reflects multisensory integration mechanisms.
The performance variability that was observed with all three modalities available was slightly better than predicted by the model (by 1°, or 8% of the asymptotic value), the model prediction being obtained by inferring the variability of each modality alone from the other experimental conditions. This highlights an important concern about the experimental manipulations we introduced to validate the first prediction: the tendon shakers used to reversibly degrade proprioception actually only partly affected the proprioceptive inflow from the corresponding muscle (Roll et al., 1989), without completely masking sensory information. Similarly, cutaneous receptors were unaffected by the shakers, and so could provide uncorrupted timing feedback on turning points (maximum flexion or extension). For those reasons, the “shakers” conditions most likely retained some residual proprioception, and the movement variability in these conditions might have resulted from the exploitation of this remaining inflow. The MLE model was thus less thoroughly tested with tendon shakers than it would have been if proprioception had been entirely masked through some more invasive technique. However, the predicted qualitative trend was observed: the available modalities were exploited to reduce the performance variability. Importantly, the two other predictions, which further strengthened the MLE hypothesis, were not based on tendon vibration conditions.
The experimental manipulations associated with prediction 2 directly impaired AVF quality by adding some noise to its dynamics. The prediction from MLE was that performance should never be worse than without AVF. This might seem counterintuitive, since noisy AVF could act as a distractor and induce large errors in the stabilized pattern. This was not the case, and the subjects nicely integrated their proprioceptive inputs with the noisy AVF to optimize their movement stability. In particular, performance with AVF that was uncorrelated with the actual movement (and thus was virtually useless) was not distinguishable from performance with proprioception only. In a recent paper, Burge et al. (2008) found that adding random noise in the visual feedback during reaching movements does not affect the adaptation rate to visual offsets, contrasting with optimal predictions from a Kalman filter. Our results are not incompatible, since here we studied the effect of random noise on steady-state performance variability (not adaptation). Burge et al. (2008) did not report whether the added random noise had an impact on the variability of the reaching endpoint.
The performers were not aware of the experimental manipulation, which aimed at testing how they adapted the stabilized pattern when the AVF was imperceptibly skewed (prediction 3). We confirmed here that the uncorrupted inputs and the (corrupted) AVF were integrated to compute the perceived phase offset between both limbs. Qualitative examination of Figure 6b suggests that the visual modality should account for ∼50% of the estimate, since the actual data slope was located exactly in between the two extremes: the “no-compensation” condition (if the shape of the visual display did not influence the stabilized pattern) and the “full-compensation” condition (if the movement was fully adapted to see a circle on the screen, whatever the phase shift). This result was quite consistent with the result from the first prediction, which weighted AVF at ∼50–60% (Fig. 4).
The partial compensation we observed in this experimental manipulation could also arise from the performers' inability to stabilize any arbitrary pattern: e.g., for a phase shift of 5°, full compensation required the production of the 85° out-of-phase pattern, which could require extensive practice to be properly mastered. However, this explanation seems unlikely for the following two reasons. First, analyses of pattern variability across the different phase shifts did not expose any significant differences (Fig. 6c), suggesting that the required pattern was equally difficult to stabilize across all phase shifts. Second, the 85° out-of-phase pattern was stabilized successfully, but only when the phase shift equaled 10° rather than 5° (Fig. 6b). So for this particular instance, the performers demonstrated that the pattern corresponding to full compensation at 5° could be achieved, albeit in response to a different AVF phase shift. It seems reasonable to generalize this example to the whole range of coordination patterns covered by the experimental manipulation, since the data plotted in Figure 6b are highly linear, and do not suggest any kind of saturation in the stabilized phase difference between both wrists.
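The arithmetic behind the 85° example can be written out explicitly. The block below is a sketch that assumes a linear relation between the stabilized relative phase and the AVF phase shift $s$, with a visual weight $w$ read approximately from Figure 6b; the symbols $\phi$, $w$, and $s$ and the sign convention are introduced here for illustration only.

```latex
% Assumed linear model: stabilized relative phase for an AVF phase shift s
\phi(s) = 90^\circ - w\,s, \qquad 0 \le w \le 1

% Full compensation (w = 1) at s = 5 degrees
\phi(5^\circ) = 90^\circ - 1 \times 5^\circ = 85^\circ

% Partial compensation (w \approx 0.5) at s = 5 degrees
\phi(5^\circ) \approx 90^\circ - 0.5 \times 5^\circ = 87.5^\circ

% Partial compensation (w \approx 0.5) at s = 10 degrees
\phi(10^\circ) \approx 90^\circ - 0.5 \times 10^\circ = 85^\circ
```

Under this assumed model, the pattern that full compensation would require at a 5° shift is the same pattern that is actually produced at a 10° shift, which is why an inability to produce the 85° pattern cannot explain the partial compensation.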
The experimental manipulations revealed the ability to integrate two sensory modalities with some feedforward contribution. This third potential source of information could result from internal forward model computations (Wolpert et al., 1998; Kawato, 1999; Sabes, 2000), and was incorporated as a third sensory source in our model. The way we introduced this information source is an important issue, since feedforward and feedback-driven information are usually integrated with Kalman filters in the optimal control framework (Liu and Todorov, 2007). Predictions from Kalman filtering mainly deviate from those of MLE because motor noise sets both lower and upper bounds on the output variability, yet they reflect the same trend: the more sensory modalities, the more stable. Testing Kalman-based predictions in a design such as ours would be challenging, since it requires separating motor from sensory noise, while, as we stated above, proprioceptive inflows cannot be entirely masked without highly invasive procedures. Moreover, our model fits are strong, suggesting that in this task, the difference between the two approaches would be small.
The ability to integrate different sensory modalities during cyclical movements was observed throughout the learning process, since model-based predictions 1 and 2 were supported over the 5 d of practice (prediction 3 was not analyzed with the day of practice as design factor). The ability to rely on near-optimal multisensory integration, even early on during skill learning, is generally consistent with previous investigations, demonstrating that the multisensory systems in adults are quite plastic across relatively short timescales (Ernst, 2007, 2008). A similar separation over two distinct timescales has recently been demonstrated in reaching movements within a stochastic curl field (Izawa et al., 2008): while the variability in the field induced rapid changes in the movement strategy, the overall optimization of reaching performance was only achieved after several days of practice. This last result is thus coherent with ours, demonstrating a separation between fast multisensory integrative capabilities and slow learning processes.
Our contribution tackles a new area dealing with multisensory integration in the context of learning dynamic behaviors. We hypothesize that multisensory integration is achieved by mechanisms that are not task specific, and that it can operate online in a nearly statistically optimal manner, independent of the course of learning the task itself.
Footnotes

Support for this study was provided through a grant from the Research Fund Katholieke Universiteit Leuven (OT/07/073) and the Flanders Fund for Scientific Research (G.0241.05, G.0292.05, G.0593.08). This work was also supported by Grant P6/29 from the Interuniversity Attraction Poles program of the Belgian federal government. R.R. was funded by the Francqui Foundation. R.C.M. was funded by the Wellcome Trust.
 Correspondence should be addressed to Stephan P. Swinnen, Motor Control Laboratory, Department of Biomedical Kinesiology, Katholieke Universiteit Leuven, Tervuursevest 101—bus 1501, B3001 Heverlee, Belgium. stephan.swinnen{at}faber.kuleuven.be