Abstract
Previous research has shown that the brain uses statistical knowledge of both sensory and motor accuracy to optimize behavioral performance. Here, we present the results of a novel experiment in which participants could control both of these quantities at once. Specifically, maximum performance demanded the simultaneous choice of viewing and movement durations, which directly impacted visual and motor accuracy. Participants reached to a target indicated imprecisely by a two-dimensional distribution of dots within a 1200 ms time limit. By choosing when to reach, participants selected the quality of visual information regarding target location as well as the remaining time available to execute the reach. New dots, and consequently more visual information, appeared until the reach was initiated; after reach initiation, no new dots appeared. However, speed-accuracy trade-offs in motor control make early reaches (much remaining time) precise and late reaches (little remaining time) imprecise. Based on each participant's visual- and motor-only target-hitting performances, we computed an “ideal reacher” that selects reach initiation times that minimize predicted reach endpoint deviations from the true target location. Participants' timing choices were qualitatively consistent with ideal predictions: choices varied with stimulus changes (but less than the predicted magnitude) and resulted in near-optimal performance despite the absence of direct feedback defining ideal performance. Our results suggest that visual estimates and their respective accuracies are passed to motor planning systems, which in turn predict the precision of potential reaches and control viewing and movement timing to favorably trade off visual and motor accuracy.
Introduction
Sensorimotor decisions involve three distinct components: world states, potential actions, and the rewards associated with different combinations of the first and second components. Inferring the state of the world is an implicitly uncertain process because of incomplete and noisy sensory data, and the outcome of intended actions is inherently variable for reasons that include neural firing variability and imprecise muscle responses to motor commands. Recent studies reveal that the brain contains a rich representation of these components of sensorimotor decisions, including perceptual uncertainty, expected motor output variability, and monetary rewards (Trommershäuser et al., 2003a,b; Wu et al., 2006). Because states of the world are often dynamic and actions unfold in time, decisions must also take time into account. Trommershäuser et al. (2005, 2006a,b) showed that near-optimal target selection may occur under tight time constraints and even in cases when the target or reward information varies during the reach. Our study asks: does the brain represent visual and motor variability as functions of time? Moreover, when task performance is degraded by both visual and motor errors, can the brain manipulate the sources of these errors to maximize task performance?
Many visually guided motor behaviors have time constraints to completion that induce a tradeoff between time allotted to gather sufficient visual information for action planning and time allotted for action execution. Often there is exactly one action to be performed within a finite time; competition for the limited time allowed by the task requires the simultaneous choice of viewing and movement durations. Longer viewing durations improve the quality of visual information, whereas longer movement durations decrease motor errors, and thus the choices of these durations directly impact performance.
For example, consider driving on a mountain road during a snowstorm. As visual information accrues suggesting a possible obstacle ahead, like a boulder or car, a plan must rapidly be formed to continue on course, slam the brakes, or perform a risky swerve. Viewing and action time intervals are controllable, and their durations directly affect task performance. Waiting lets you base your decision on greater sensory information but leaves less time to execute the action. Reacting immediately may allow sufficient time to execute the action but affords little information to decide which action is best. Therefore, time-constrained visuomotor tasks require balancing visual and motor timing to minimize visual and motor errors. The goal of this study is to psychophysically test how well people can achieve the optimal performance tradeoff in an experimentally controlled visuomotor task.
We asked participants to reach to a target within a fixed time limit. Our experiment was designed to allow participants to trade visual accuracy for motor accuracy by choosing when to initiate their reaches. We evaluated how participants balanced this tradeoff to infer whether they knew, and were able to control, their visual and motor variability.
Materials and Methods
Apparatus
Participants performed trials in a virtual workbench consisting of the graphical display of a scene rendered under accurate perspective projection and a haptic interface (PHANToM force-feedback device; Sensable Technologies, Woburn, MA) that simulated the feel of objects in the scene. The graphics were displayed on a 21-inch CRT monitor (pixel resolution, 1600 × 1200; 85 Hz). The image of the monitor was reflected off a full-silvered mirror so that the graphics appeared coaligned with the haptic workspace. The graphics and haptics were calibrated to always be consistent with the three-dimensional virtual scene. The virtual scene consisted of a frontoparallel task surface with contour information to enrich perspective cues, a start button, a countdown sand-timer, and the target stimulus. Participants wore eye patches over their left eyes to remove potential stereodisparity cue conflict. Participants placed their fingertips in a thimble attached to the haptic interface, which tracked the fingertip position. Visual feedback of fingertip position was continuously provided as a 1.3-mm-diameter sphere.
Target
On each trial, each participant's task was to place his or her fingertip on a 2.5-mm-diameter “start button” and reach to a 2.75 mm arc-length target (Fig. 1). The target itself was invisible but always lay collinear with a visible, gray “guide arc,” which was an arc-length at a radius of 8 cm from the start button (arc interval between 176 and 236° counterclockwise from the positive x-axis). Also, the position of the target was indicated by a dynamic visual stimulus that appeared at the start of each trial. The stimulus was composed of very small dots scattered around the target position by sampling their positions from a two-dimensional (2D) normal distribution with a mean at the target position and with an SD (termed dot scatter level) that was varied across conditions. Three dot scatter levels were used: 4, 7, and 11 mm (low, medium, and high, respectively).
At the trial start, there were always five dots visible. As the trial time elapsed, the number of visible dots increased quadratically until a reach was initiated. The number of dots at any particular elapsed trial time was given by the following: where floor(x) rounds x down to the nearest integer, t is elapsed trial time, and N(t) is the number of dots as a function of t.
An ideal observer simply computes the sample mean of the dot positions as the optimal target location estimate. Therefore, the ideal observer's SD from the true target location is the dot scatter level divided by the square root of N(t); that is, it is inversely proportional to the square root of N(t). We chose the quadratic relationship between N(t) and t so that the square root of N(t), and hence the ideal observer's reliability (the reciprocal of its SD), increased approximately linearly with t.
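The ideal-observer argument above can be checked numerically. The following is a minimal sketch, assuming one-dimensional Gaussian dot positions with SD equal to the dot scatter level; the function names and the simulation settings are illustrative assumptions and are not taken from the experiment code.

```python
import math
import random
import statistics


def ideal_observer_sd(dot_scatter_mm, n_dots):
    """Analytic SD (mm) of the sample mean of n_dots independent draws."""
    return dot_scatter_mm / math.sqrt(n_dots)


def simulate_centroid_sd(dot_scatter_mm, n_dots, n_trials=20000, seed=0):
    """Empirical SD (one axis, mm) of the dot centroid around a target at 0.

    Assumption: each dot coordinate is an independent Gaussian draw centered
    on the target, which mirrors the stimulus description in the text.
    """
    rng = random.Random(seed)
    centroids = [
        statistics.fmean(rng.gauss(0.0, dot_scatter_mm) for _ in range(n_dots))
        for _ in range(n_trials)
    ]
    return statistics.pstdev(centroids)


# Analytic SDs for the three dot scatter levels and the dot counts
# used in the visual baseline sub-blocks (5, 11, 19, 28, 39 dots).
for scatter in (4.0, 7.0, 11.0):
    sds = [round(ideal_observer_sd(scatter, n), 2) for n in (5, 11, 19, 28, 39)]
    print(scatter, sds)
```

Comparing `simulate_centroid_sd` with `ideal_observer_sd` for the same inputs gives a quick check that averaging N dots reduces the SD by a factor of √N.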
Procedure: Experiment 1
Three conditions were run: combined test condition (CC), visual baseline (VB), and motor baseline (MB). The participant's task varied slightly between conditions. Before each block, participants performed a series of practice trials to familiarize them with the task.
Practice sessions.
Before every block in every condition, participants were given ∼50 practice trials that were identical to the trials of the subsequent block but were not recorded or further analyzed. The purpose of these practice trials was to allow the participant to warm up and familiarize himself or herself with the particular trial procedure of the subsequent block.
Combined test condition.
In each trial, the participant held the fingertip on the start button for 500 ms (Fig. 1) to signal he or she was ready to begin. When the computer detected the ready state, the trial started. At the trial start, the invisible target was placed at a random location on the guide arc. Simultaneously, a rectangular sand-timer appeared above the start button and began counting down the fixed trial time (1200 ms). The width of the sand-timer was always proportional to the remaining trial time, thus providing a visual cue to the remaining trial time.
To complete a reach before timing out, the participant had to move the fingertip from the start button to the guide arc within 1200 ms of the trial start. The time at which the fingertip first contacted the guide arc was called the trial end. If the participant did not complete his or her reach within the time limit, the trial was considered a “timeout,” and repeated later with a novel stimulus and target location. To make a successful reach, the participant had to be in contact with the target at the trial end. No matter what, the trial always ended once the fingertip first contacted the guide arc, meaning the participant could not cross the guide arc and then return to cross it again at another location. The computer recorded the position at which the fingertip first crossed the guide arc; this was considered to be the participant's indication of the target position for that trial. The distance, measured along the guide arc in millimeters, between the center of the target and the position at which the finger first crossed the guide arc is termed the reach endpoint offset and signifies the spatial inaccuracy of a reach. Because the target was a 2.75 mm arc-length, any reach endpoint offset ≤1.375 mm was considered successful, whereas any offset >1.375 mm was considered unsuccessful.
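The scoring rule described above can be sketched as follows. This is an illustration only: representing the target and the crossing point as angles on the guide arc is our assumption, and the function names are hypothetical.

```python
import math

GUIDE_ARC_RADIUS_MM = 80.0    # guide arc lies 8 cm from the start button
TARGET_ARC_LENGTH_MM = 2.75   # extent of the target along the guide arc


def endpoint_offset_mm(target_angle_deg, crossing_angle_deg):
    """Signed offset along the guide arc (mm) between crossing point and target center.

    Assumption: both positions are given as angles (degrees) on the guide arc.
    """
    return GUIDE_ARC_RADIUS_MM * math.radians(crossing_angle_deg - target_angle_deg)


def is_successful(offset_mm):
    """A reach succeeds when the offset is within half the target arc length."""
    return abs(offset_mm) <= TARGET_ARC_LENGTH_MM / 2.0


offset = endpoint_offset_mm(200.0, 201.0)  # crossing 1 degree away from the target center
print(round(offset, 3), is_successful(offset))
```

At a radius of 8 cm, one degree of arc corresponds to roughly 1.4 mm, so the 1.375 mm criterion is slightly tighter than a one-degree angular error.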
The computer also recorded the duration of two time intervals during each trial, viewing time (tV) and movement time (tM). tV was the interval between trial start and the reach initiation time, when the fingertip left the start button to move toward the target. tM was the interval between the reach initiation time and the trial end, when the fingertip first contacted the guide arc. The computer recorded the tV and tM durations with a temporal resolution of 11.8 ms.
Participants ran six blocks of 150 trials each. There were three dot scatter level subconditions (low, medium, and high), with two blocks in each subcondition. Each block took ∼20 min to complete and was divided into 30 trial sub-blocks. Between sub-blocks, participants were allowed to rest for a few minutes before continuing. Also, the participant was informed of the cumulative number of successful trials, in which the fingertip had contacted the target, for that block. This was the only form of feedback about task performance provided to the participants during the CC condition of Experiment 1.
Visual baseline condition.
The VB condition was designed to quantify the relationship between the number of visible dots and visual variability for each participant, without any effect of motor variability. The visual stimulus presentation that indicated the target location was similar to the CC condition, but instead new dots continued to appear after the reach was initiated until a predetermined time that was varied between sub-blocks. Therefore, participants could freely move their fingertips without affecting the appearance of the visual stimulus. Also, to indicate the target location, the participant positioned the fingertip on his or her estimate of the target location and depressed the left mouse button. Participants in the VB condition were given 5000 ms to complete the task (instead of the 1200 ms in the CC condition), which was more than adequate time to precisely position their fingertip. Participants rarely used the full 5000 ms; typically, they made target location selections within 2000–3000 ms. Effectively, these alterations eliminated target localization imprecision because of motor variability so that we could isolate the effect of the number of dots on visual variability. The computer recorded tv as the predetermined duration of the interval over which new dots appeared as well as reach endpoint offsets, which were analyzed to determine their relationship.
Participants ran three blocks of 150 trials each. There were three dot scatter level subconditions (low, medium, and high), with one block in each subcondition. Each block took ∼20 min to complete and was divided into 30-trial sub-blocks. The time intervals over which new dots appeared varied across sub-blocks. From the first to the fifth sub-block, the time intervals over which new dots appeared were 0, 212, 425, 637, and 850 ms, giving a total of 5, 11, 19, 28, and 39 dots, respectively. Between sub-blocks, participants were allowed to rest for a few minutes before continuing. Also, the participant was informed of the cumulative number of trials in which he or she had completed a successful reach for that block. This was the only form of feedback about task performance provided to the participants during the VB condition of Experiment 1.
Motor baseline condition.
The MB condition was designed to quantify the relationship between movement duration and motor variability for each participant, without any effect of visual variability. The MB condition was similar to the CC condition, with several differences. Instead of scattered dots, the visual stimulus that indicated the target location was now a small 2.75 mm white arc-length that perfectly specified the target location. Also, the trial time limit was not constant across the MB condition; it was varied between sub-blocks and was always ≤1200 ms. The initial width of the countdown sand-timer was shortened to match the trial time limit of the sub-block. Once the trial started, the sand-timer shortened at the same rate as in all other conditions so that the width was proportional to the remaining trial time. Also, the trial start was determined by the reach initiation time; that is, trial time began to elapse once the reach was initiated. The participant was instructed to use as much time as possible to complete his or her reach. Effectively, these alterations removed the effects of visual target localization variability to isolate the relationship between movement duration and motor variability. The computer recorded tM and reach endpoint offsets, which were analyzed to determine their relationship. Note that in the analysis, we related reach precision to the measured tM, not the cued movement time.
Participants ran two blocks of 150 trials each. Each block took ∼20 min to complete and was divided into 30 trial sub-blocks. The cued movement time varied across the five sub-blocks. From the first to fifth sub-blocks, the cued movement times were 1200, 988, 775, 563, and 350 ms. Between sub-blocks, participants were allowed to rest for a few minutes before continuing. Also, the participant was informed of the cumulative number of trials in which he or she had completed a successful reach for that block. This was the only form of feedback about task performance provided to the participants during the MB condition of Experiment 1.
Procedure: Experiment 2
We conducted Experiment 2 to assess the effect of performance feedback on reaching behavior. Experiment 2 was similar to Experiment 1 with several key differences. First, participants received performance feedback after every trial, in addition to the sub-block success summaries. Performance feedback consisted of the haptic sensation of a bump when the target had been contacted, as well as visual presentation of an illuminated arc-length at the location of the target that was green if the target had been successfully contacted and red if the target had been missed. Second, we did not include the medium dot scatter level condition, thus only the 4 and 11 mm dot scatter levels were used. Third, two Experiment 1 participants were unavailable, so only five of the original seven participants participated in Experiment 2.
In Experiment 2, we repeated the VB and MB conditions identically to Experiment 1 (no performance feedback) for each participant and compared performance with Experiment 1.
Participants and compensation.
Seven (four females, three males) naive, right-handed University of Minnesota students with normal or corrected-to-normal vision participated in this experiment. Participants gave informed consent in accordance with University of Minnesota Institutional Review Board standards, were compensated $8.00/h of participation and received bonus money depending on performance.
Participants could earn bonus money in all conditions. Whenever the participant's reach contacted the target, the trial was considered a success. For each successful trial, the participant earned $0.02 bonus. This bonus money was awarded in addition to the $8.00/h compensation for participation time.
Model
Visual and motor accuracy tradeoff.
In each CC trial, once the reach was initiated, no new dots appeared, meaning the visual stimulus no longer improved. Therefore, the reach initiation time effectively divided the trial into two distinct intervals: viewing time and movement time. Because visual localization accuracy improved with increased viewing time, tV, and reach precision improved with increased movement time, tM, the reach initiation time implicitly imposed a tradeoff between visual and motor accuracy. Ideal CC condition performance required the participant to select tV and tM to jointly maximize visual and motor accuracy.
To make predictions for participants' maximal CC performances, we computed an “ideal reacher” by combining participants' individual predicted visual and motor variability, measured in their VB and MB conditions, respectively. Figure 2 depicts ideal predicted CC performance. Each box in Figure 2 shows a different dot scatter level condition. For illustrative purposes, Figure 2 constrains tM to be equal to (1200 − tV) so that tV and tM can be represented on a single axis. Notice how the tradeoff between visual and motor accuracy produces predicted CC performance curves that vary with viewing time, tV. The viewing time that minimizes the predicted CC performance function (Fig. 2, arrows with numbers) increases as dot scatter level increases.
Ideal reacher derivation.
CC condition reaches were modeled as follows. The process of planning and executing a reach included two components. First, the participant visually estimated the location of the target. Let X represent the true target location and X̂ represent the visually estimated target location. We assumed there was visual variability that contributed to additive errors, εV, between X and X̂ such that X̂ = X + εV. Second, the participant directed a reach toward the estimated location. Let Z represent the reach endpoint. We assumed there was motor variability that contributed to additive errors, εM, between X̂ and Z such that Z = X̂ + εM. Together, these two error sources result in a total error, εC = εV + εM, that constitutes the reach endpoint offset between Z and X.
Visual errors, εV, varied with tV and σd, had mean equal to 0, and variance, σV(tV, σd)², given by Equation 3, where ρe and ωe are free parameters that were fit by maximum likelihood estimation (MLE) to the VB data. The derivation of Equation 3 and the fitting procedure are described in the Appendix. Note that for the rest of this study, we will suppress explicit dependency expressions after their initial presentation; for instance, σV(tV, σd)² will simply be written σV².
Motor errors, εM, varied with tM, had mean equal to 0, and variance, σM², given by Equation 4, where α, β, and γ are free parameters that were fit by MLE to the MB data, and D was the distance between the start button and the guide arc (8 cm). The derivation of Equation 4 and the fitting procedure are described in the Appendix.
As mentioned above, in each CC condition trial, we allowed participants to choose a combination of tV and tM values. We will refer to the combination of tV and tM chosen on a single CC condition trial as [tV, tM]. Because the variance of the sum of two independent random variables is the sum of their variances, the variance of εC, denoted by σC², is the sum of σV² and σM²:
$$\sigma_C^2(t_V, t_M, \sigma_d) = \sigma_V^2(t_V, \sigma_d) + \sigma_M^2(t_M) \qquad (5)$$
Note that the combined variance depends on both timing choices, tV and tM, as well as the dot scatter level, σd.
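The additivity of the visual and motor variances can be illustrated with a short simulation. This is a minimal sketch, assuming independent zero-mean Gaussian errors; the specific SD values and function names are illustrative assumptions, not fitted quantities from the experiment.

```python
import random
import statistics


def combined_sd(sigma_v, sigma_m):
    """Predicted SD of the total error when visual and motor errors are independent."""
    return (sigma_v ** 2 + sigma_m ** 2) ** 0.5


def simulate_reach_offsets(sigma_v, sigma_m, n_trials=50000, seed=1):
    """Simulate reach endpoint offsets under the two-stage additive error model."""
    rng = random.Random(seed)
    offsets = []
    for _ in range(n_trials):
        target = 0.0                                     # true target location X
        estimate = target + rng.gauss(0.0, sigma_v)      # visual estimate = X + visual error
        endpoint = estimate + rng.gauss(0.0, sigma_m)    # endpoint Z = estimate + motor error
        offsets.append(endpoint - target)                # total error = visual + motor error
    return offsets


offsets = simulate_reach_offsets(sigma_v=3.0, sigma_m=4.0)
print(round(statistics.pstdev(offsets), 2), combined_sd(3.0, 4.0))
```

With enough simulated trials, the empirical SD of the offsets should approach the square root of the summed variances.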
Because the VB and MB conditions measured how tV, tM, and σd affected participants' visual and motor variability independently, we used these relationships (Eq. 5) to formulate an ideal reacher model to predict how combining these sources of variability would affect participants' reach endpoint offsets in the CC condition. Specifically, by fitting the free parameters of σV2 and σM2 (see Appendix), we were able to compute predicted σC landscapes that depended on tV and tM, for each of the three experimental σd levels. Errors in fitted parameters (estimated by bootstrapping the data across MLE fits) were propagated by computing predictions for each set of bootstrap parameter fits. The ideal reacher is subject to the constraint that tM is less than or equal to the remaining trial time after reach initiation (1200 − tV), just as in the CC condition. We assumed that participants learned the three σd levels from the practice and VB conditions and immediately implemented that knowledge in the CC condition.
For each σd level, we predicted ideal timing choices, [tV*, tM*], by numerically minimizing Equation 5. Note that the predictions of our ideal model are based on the assumption that timing choices will be executed without error. However, we expect that humans' viewing times and reaching times will differ from their intended choices because of noise. Although we investigated the effects of noisy execution of timing choices on theoretical predictions, we found that a full analysis provides only a modest increase in our ability to predict performance in the CC condition, and thus for simplicity, we ignored it.
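The numerical minimization can be sketched with a simple grid search. Important caveat: the two variability functions below are hypothetical stand-ins chosen only to have the right qualitative shape (visual SD falling with viewing time, motor SD falling with movement time). They are not Equations 3 and 4, and their constants are not fitted values. The grid step and the minimum movement time are also assumptions.

```python
import math

TRIAL_LIMIT_MS = 1200  # fixed trial time limit in the CC condition


def sigma_v_standin(t_v, sigma_d):
    """Hypothetical visual SD (mm). Stand-in only, NOT the paper's Equation 3."""
    return sigma_d / (math.sqrt(5.0) + 0.0047 * t_v)


def sigma_m_standin(t_m):
    """Hypothetical motor SD (mm). Stand-in only, NOT the paper's Equation 4."""
    return 1.0 + 600.0 / t_m


def combined_variance(t_v, t_m, sigma_d):
    """Sum of visual and motor variances for one timing choice."""
    return sigma_v_standin(t_v, sigma_d) ** 2 + sigma_m_standin(t_m) ** 2


def ideal_timing(sigma_d, step_ms=10, min_t_m=50):
    """Grid search for the timing pair minimizing the combined variance.

    Only pairs with t_m <= TRIAL_LIMIT_MS - t_v are visited, which enforces
    the constraint that the reach must finish within the trial time limit.
    """
    best_pair, best_var = None, float("inf")
    for t_v in range(0, TRIAL_LIMIT_MS - min_t_m + 1, step_ms):
        for t_m in range(min_t_m, TRIAL_LIMIT_MS - t_v + 1, step_ms):
            var = combined_variance(t_v, t_m, sigma_d)
            if var < best_var:
                best_pair, best_var = (t_v, t_m), var
    return best_pair, math.sqrt(best_var)


for level in (4.0, 7.0, 11.0):  # low, medium, and high dot scatter levels (mm)
    print(level, ideal_timing(level))
```

Even with these stand-in functions, the search reproduces the qualitative prediction described in the text: the minimizing viewing time grows as the dot scatter level increases, and the minimizing movement time uses all of the remaining trial time.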
Results
Experiment 1
We compared each human participant's performance in the CC task to his or her ideal predicted performance. A critical test is whether participants' choices vary with systematic changes in the quality of visual information (dot scatter level) as the ideal predicted choices do. To ensure our results reflected normal behavior and not learned associations, no performance feedback was given to participants in the reach task except a cumulative score presented every 30 trials (we later conducted Experiment 2 with performance feedback provided to assess what, if any, impact this had on performance).
Baseline conditions
The VB condition measured participants' σV functions. Because the number of dots, N(tV), increased with viewing time, tV, and more dots gave more information about the target location, visual offsets should decrease as tV increases. The dashed curves in Figure 2 show one participant's σV function, as described by the visual variability model (Eq. 3) with best-fit parameters. Figure 2 illustrates that as tV increases, the participant's visual variability decreases.
Figure 3 depicts participants' errors in estimating dot centroid locations for various numbers of dots, N(tV), and dot scatter levels, σd, in the VB condition. These errors are computed relative to the mean, or centroid, of the dot scatter of each trial (rather than the underlying target location), to discount visual variability because of centroid misestimation. CC condition performance depends on the overall visual error, εV, which combines the errors shown in Figure 3 with errors because of deviations between the centroid of the dots and the true target location. The relationship between the different components of participants' visual errors and σV is provided in the Appendix. Tassinari et al. (2006) reported a similar analysis of human dot centroid mislocalizations, and although they used slightly different dot scatter levels and numbers of dots, their results are consistent with ours when extrapolated to our particular conditions.
The MB condition measured each participant's motor precision (quantified as expected motor offset from the target location) as a function of tM. Remember that tM is the remaining trial time after a reach initiation, so shorter viewing times, tV, allow greater movement times, tM. The dotted curves in Figure 2 show one participant's σM function, as described by our motor precision model (Eq. 4) with best-fit parameters. Figure 2 illustrates that as tM increases (earlier reach initiations), participants' motor variability decreases. σM was used to predict the contribution of motor variability to σC in the CC condition. Table 1 represents mean (±SEM) values for the fitted parameters from our motor precision model (Eq. 4).
CC condition: timing choices
Different dot scatter levels, σd, produce different σC functions (Eq. 5). Because the minimal σC varies with dot scatter level, the ideal performer adjusts its [tV*, tM*] choices for different values of σd (Fig. 2, solid, vertical lines with arrows and numbers indicating the tV value). Specifically, the ideal performer should increase tV and decrease tM as dot scatter level increases. We compared each participant's actual timing choices to his or her respective ideal choice predictions for different dot scatter levels to assess how well humans chose to trade off visual and motor variability.
Figure 4 shows participant 4's measured [tV, tM] choices from every CC condition trial (points) superimposed over the predicted σC landscape as a function of [tV*, tM*] (grayscale contours). Lighter shades represent smaller values of σC from target location; darker shades represent larger values of σC. We interpreted the mean of the observed timing choices as the participant's estimate of the timings that would maximize his or her performance (minimize reach endpoint offsets) and the covariance as resulting from variability in both decision-making and motor output.
To simplify the analysis and provide an intuitive measure of participants' timing choices, we computed the perpendicular projections of each [tV, tM] vector onto the nearest point on the line with slope −1 and y-intercept 1200 ms (termed the total time axis) and called these values tC. The projections of the ideal performer's [tV*, tM*] choices onto the total time axis are called tC*. The total time axis can be thought of as the set of timing choices for which tV + tM = 1200 ms. Each tC represents the nearest point on the total time axis to the participant's [tV, tM] choice. This axis is meaningful, because the minimum of σC for ideal predictions shifted along that axis as a function of σd.
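The projection onto the total time axis is a standard perpendicular projection onto the line tV + tM = 1200 ms. A minimal sketch follows; the function name is hypothetical, and returning both coordinates of the projected point (rather than a single scalar) is our choice, because the text does not state which scalar coordinate along the axis was used for tC.

```python
def project_onto_total_time_axis(t_v, t_m, total_ms=1200.0):
    """Perpendicular projection of [t_v, t_m] onto the line t_v + t_m = total_ms.

    The line has slope -1, so its normal direction is (1, 1). Moving both
    coordinates by the same amount reaches the nearest point on the line.
    """
    shift = (total_ms - (t_v + t_m)) / 2.0
    return t_v + shift, t_m + shift


# A reach that used only 1000 ms of the trial is moved 100 ms along each axis.
print(project_onto_total_time_axis(400.0, 600.0))
```

A timing choice that already uses the full 1200 ms lies on the axis and is returned unchanged.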
Figure 5 shows a scatterplot of one participant's timing choices and their respective tC values. The x-axis represents viewing time, the y-axis represents movement time, each dot represents a [tV, tM] choice from one trial, and the diagonally aligned bar graph represents a histogram of tC on the total time axis. Blue, green, and red dots/bars represent trials from the low, medium, and high dot scatter level conditions, respectively. Notice that as dot scatter level was increased, tC choices shifted along the total time axis as tV increased and tM decreased.
We performed first-order linear regression analysis on the tC values as a function of dot scatter level. Figure 6 summarizes the regression slopes [with 95% confidence intervals (CIs)] for each participant. All slopes were significantly greater than zero (p < 0.05), indicating all participants shifted tC in the appropriate direction given σC across dot scatter levels.
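The slope of such a regression can be computed with ordinary least squares, as sketched below. The tC values in the example are invented purely to exercise the function and are not experimental data; the sketch also omits the confidence intervals reported in the text.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical example: dot scatter level (mm) per trial and made-up tC values (ms).
scatter_levels = [4, 4, 7, 7, 11, 11]
t_c_values = [410.0, 430.0, 470.0, 450.0, 560.0, 580.0]
slope, intercept = ols_slope_intercept(scatter_levels, t_c_values)
print(slope > 0)  # a positive slope means tC shifts later as scatter increases
```

A positive slope corresponds to the pattern described above, in which viewing time lengthens as the dot scatter level increases.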
Qualitatively, all participants adjusted their timing choices in concert with the ideal performer. A quantitative comparison between participants' actual tC choices and their ideal performers' predicted tC* choices is shown in Figure 7. For each participant, the means of measured tC values were plotted against ideal performer's tC* values that would minimize σC.
Figure 7 confirms that all participants shifted their tC values in the direction of the ideal predicted shift but sometimes with a lesser magnitude than predicted (see Discussion). One notable pattern in this figure is that all participants but one (participant 2) show a greater tC shift between high (Fig. 7, rightmost points) and medium (middle points) dot scatter levels versus between medium and low (leftmost points) dot scatter levels. One potential explanation for this pattern is that participants may be less sensitive to their own internal visual target certainty at lower uncertainty levels. Another potential explanation may be an artifact of our methodology: we conducted the high dot scatter level condition several days after the low and medium dot scatter level conditions, the blocks of which were interleaved. Perhaps interleaving the low and medium scatter level blocks promoted strategy generalization among the participants in which they chose [tV, tM]s that were good for both scatter levels.
CC condition: task performance
To validate our predictive model of σC given baseline condition measurements, we compared participants' measured reach endpoints to their ideal performers' σC function. Figure 8 represents all participants' measured endpoint offset SDs plotted against the ideal predicted offset SDs. Each measured endpoint offset SD was computed by taking the SD of a participant's reach endpoint offsets for a particular dot scatter level, and the ideal predicted offset SD is simply the value of the ideal performer's predicted σC at its minimum. Figure 9 characterizes the relationship of predicted and measured performance by “combination efficiency.” The percentages in Figure 9 were computed as follows: for each CC condition reach, the measured reach endpoint offset was divided by the value of σC at the participant's measured [tV, tM] choice in that trial. We took the root mean square of these ratios and multiplied it by 100 to compute average efficiency, in percentage, of the participant's performance with respect to the model.
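The combination efficiency computation described above can be sketched as follows. The function name and the input layout (one measured offset and one predicted σC per trial) are assumptions made for illustration.

```python
import math


def combination_efficiency(offsets_mm, predicted_sigma_c_mm):
    """Root mean square of offset / predicted-SD ratios, expressed as a percentage.

    offsets_mm: measured reach endpoint offset on each CC trial.
    predicted_sigma_c_mm: predicted combined SD at that trial's timing choice.
    """
    ratios = [o / s for o, s in zip(offsets_mm, predicted_sigma_c_mm)]
    rms = math.sqrt(sum(r * r for r in ratios) / len(ratios))
    return 100.0 * rms


# Offsets exactly one predicted SD from the target give 100%.
print(combination_efficiency([1.0, -1.0], [1.0, 1.0]))
```

Under this definition, offsets whose spread matches the predicted σC on every trial yield a value near 100%, and offsets that are smaller than predicted yield a lower value.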
Generally, measured performances slightly exceeded predicted performance despite actual timing choices shifting slightly less across dot scatter levels than the ideal prediction. This slight overperformance may be attributable to increased participant motivation in the CC condition resulting from the increased difficulty of the task over the baseline conditions (Fig. 2) and greater allowance for participants' choices, because they chose tV and tM in the CC task as opposed to being cued to particular values of each, as in the MB and VB conditions.
Experiment 2
A final question addressed whether providing direct performance feedback to participants would substantially change their timing choices, perhaps helping them choose [tV, tM] nearer to the minimum of their ideal performer's σC function.
As described in Materials and Methods, Experiment 2 provided performance feedback indicating whether reaches were successful in the CC condition. If participants' timing choices remained unchanged between Experiments 1 and 2, we could conclude that participants did not require performance feedback to select [tV, tM] with low values of σC.
We found no significant difference in either CC task performance or [tV, tM] distributions when direct feedback was provided. Figure 10 scatterplots CC task performance between Experiments 1 and 2. To quantify performance, we used the percentage of CC task trials in which the participant successfully reached the target. The x-axis represents Experiment 1 performances, and the y-axis represents Experiment 2 performances. Each point represents the performances for one dot scatter level, for one participant. The correlation between the points is 0.96 (p < 0.00001). This supports the view that each participant performed consistently across Experiments 1 and 2.
Figure 11 scatterplots distributional areas of timing choices between Experiments 1 and 2. To quantify distributional area, we used the square root of the trace of the covariance matrix of each participant's 2D [tV, tM] distribution. We plotted the distributional areas for Experiment 1 against those for Experiment 2 to assess the consistency of participants' timing choice variability across the two experiments. The correlation between the points is 0.41, which suggests participants' [tV, tM] distributions had similar areas, with and without direct feedback. If we consider the point with the “X” over it (participant 4, low dot scatter level condition) an outlier and remove it from this analysis, the correlation rises to 0.67 and becomes significantly different from zero (p < 0.05). Because there was little difference between distributional areas in Experiments 1 and 2, we conclude the variability in [tV, tM] choices was unrelated to participants' knowledge of their own performance. Potential explanations for the timing variability are explored below.
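The distributional area measure can be sketched in a few lines. This is an illustrative sketch only: the function name and example timings are hypothetical, and the use of sample variances (n − 1 denominator) is an assumption, because the text does not specify the estimator.

```python
import math
import statistics


def distributional_area(t_v, t_m):
    """sqrt(trace(covariance)) of paired [tV, tM] choices.

    The trace of a 2x2 covariance matrix is the sum of the two variances,
    so the off-diagonal covariance term is not needed.
    Sample variances (n - 1 denominator) are an assumption of this sketch.
    """
    if len(t_v) != len(t_m) or len(t_v) < 2:
        raise ValueError("need at least two paired timing choices")
    return math.sqrt(statistics.variance(t_v) + statistics.variance(t_m))


# Hypothetical timing choices in ms for one participant and scatter level.
area = distributional_area([600.0, 700.0, 650.0], [500.0, 450.0, 520.0])
print(area)
```

Because only the trace is used, the measure reflects the total spread of the timing choices in ms and ignores any correlation between tV and tM.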
Discussion
We interpreted our model and participants' behaviors in the context of statistical decision theory, whose application to sensorimotor neuroscience stems from a long tradition of treating perception and action as statistical computation problems (Attneave, 1954; Fitts, 1954; Barlow, 1961). Our results suggest that when performing visually guided motor behaviors, the brain represents both the quality of the visual information and potential motor output. Moreover, the brain understands how visual and motor variability depend on time and selects viewing and movement durations to minimize consequent errors.
These results are not necessarily surprising in isolation, as previous studies have shown human performance of various sensorimotor tasks reflects key elements of statistically optimal decision making (Maloney, 2002; Körding and Wolpert, 2006), including near-optimal use of sensory information (Kersten, 1987; Geisler, 1989, 2003; Legge et al., 1997; Knill, 1998; Kersten et al., 2004), reliability-weighted sensory information combination (Landy et al., 1995; Knill, 1998; Jacobs, 1999; Ernst and Banks, 2002; Battaglia et al., 2003; Alais and Burr, 2004; Ernst and Bülthoff, 2004; Shams et al., 2005), knowledge of the generative processes of sensory inputs (Knill and Kersten, 1991; Bloj et al., 1999; Battaglia et al., 2005), use of prior information (Mamassian and Landy, 2001; Weiss et al., 2002; Adams et al., 2004; Körding and Wolpert, 2004; Tassinari et al., 2006), internal motor output variability representations (Harris and Wolpert, 1998; Todorov and Jordan, 2002; Todorov, 2004), and selection of gain-maximizing actions (Schrater and Kersten, 2000; Trommershäuser et al., 2003a,b, 2005, 2006a,b; Wu et al., 2006). What is surprising is that these elements cooperate to allow human performance to approach optimal performance for a novel visuomotor task. Minimally, visual estimates and their respective accuracies are passed to motor-planning systems, which predict the precision of potential reaches and control viewing and movement timing to favorably trade off visual and motor variability.
Based on participants' isolated visual (VB) and motor (MB) performances, we predicted timing choices expected to minimize reaching errors in the combined visuomotor condition (CC). Participants clearly adjusted their timing choices in a manner predicted to improve task performance. Because participants chose viewing and movement durations predicted to yield low offsets with nearly zero performance feedback, we conclude that this behavior is not merely an association between timings and success but in fact an internal representation of task structure. This conclusion is necessary to explain the different timing choice strategies across dot scatter levels. Because the additional direct performance feedback in Experiment 2 did not substantially improve (or otherwise modify) participants' behavior, it seems that inherent knowledge of visual and motor variability is sufficient to support optimal task performance, and little learning is required to optimize that knowledge. A potential future experiment could impose an alternative manipulation of the optimal trade-off time by varying the rate at which new dots appear. If carefully controlled, this manipulation could help expose the relationship between neural integration windows and information accumulation periods for perceptual decision making (Mazurek et al., 2003; Uchida et al., 2006).
One puzzling feature of our data is the variability in participants' measured timing choices in the CC condition. Across participants, the SDs of tM measured in the MB condition were consistent with those of Zelaznik et al. (1988). The average SD of tM measured in the CC condition across participants was 98.9 ms (±18.0 ms; 95% CI), which was 1.8 times (with 1.3 as the lower bound of the 95% confidence interval) the minimum that our MB measurements, as well as those of Zelaznik et al. (1988), predict for participants. This means participants allowed significantly more (p < 0.05) variability in movement time, tM, in the CC condition than in the MB condition. We conclude that participants did not deem tM variance minimization to be critically important in the CC condition. Likewise, CC condition viewing times, tV, had high variability, with an average SD of 122.1 ms (±22.7 ms; 95% CI) across participants.
We could not concretely explain these high degrees of variability, but there are several possibilities. First, participants' choices may reflect a principle from learning theory called the exploration/exploitation tradeoff. This holds that when learning to improve task performance, optimal task behavior may be deliberately sacrificed to test whether novel behaviors may potentially yield greater performance. In our task, participants may choose novel [tV, tM] values that are not consistent with their estimates of the optimal [tV*, tM*] choices to investigate whether these novel timings may improve performance. Despite the lack of performance feedback in Experiment 1 beyond the 30 trial cumulative scores, participants may internally monitor their reach performance to provide supervision for learning, akin to “bootstrapped-learning.” A second possibility is that participants have some uncertainty about the exact tV and tM combinations that minimize σC. When people are uncertain which timing is best, they may simply consider a set of timings to be “good enough” and thus explicitly allow [tV, tM] choices to vary within that set. This is qualitatively consistent with the “minimum intervention principle” posited by Todorov and Jordan (2003). A third possibility is that because the σC functions have relatively flat minima, the predicted difference in monetary reward is so small, perhaps as small as a few cents, that the cost of controlling [tV, tM] choices is higher than the small payoff such control may yield.
A potentially related issue was the smaller-than-predicted timing shifts across dot scatter levels. One possibility again relates to participants devoting some trials to task exploration, as described above. In this case, the extreme dot scatter levels (e.g., low and high) would have some [tV, tM] choices distributed toward the middle of the available timing range, thus lowering the measured shifts across scatter levels. A second possibility involves a potential mismatch between the assumptions in the ideal reacher model and human behavior. Ideal timing choices contain no temporal scatter and have no restrictions on tM (e.g., tM = 0 is possible). Perhaps in the presence of temporal scatter and additional unmodeled costs associated with extremely rapid movements, the range of admissible tM choices was reduced, consequently reducing the range of tV choices as well. A third possibility is that, again, because of the relatively flat minimum of σC, participants may not expect sufficient reward to warrant fully changing their choice strategy with dot scatter level.
One limitation of our model is that it only places cost on missing the target. The model does not acknowledge any cost for timing out. Therefore, the optimal [tV*, tM*] choices of our model always lie on the total trial time axis (slope, −1; y-intercept, 1200 ms). Timing choices that lie on the total trial time axis imply that the participant used all 1200 ms of the total trial time (i.e., tV + tM = 1200 ms). This means that if the participant's tV or tM lasted even 1 ms longer, the trial would time out. Because people cannot control the duration of tV or tM to within 1 ms, the optimal decision is to shorten tV and/or tM to avoid timeout costs. In our experiment, participants often shortened tV and/or tM such that tV + tM < 1200 ms. We believe implicit timeout costs explain why many timing choices are displaced away from the total time axis. Although we did not impose a monetary penalty for timeouts, and each timeout trial is repeated later, it is reasonable to assume participants would prefer to avoid repeating timeout trials. We applied simple estimates of timeout costs to our ideal performer model, but the estimates were inaccurate and participants' consistency with these timeout-penalized optimal timing choices only improved marginally, so we did not report those analyses.
Although our experiment was purely psychophysical, neurophysiological studies support the existence of neural representations and computations required for optimal task performance, including time, probability, and reward-driven decision making (Schall, 2001; Platt, 2002; Sugrue et al., 2005). Recent studies have provided evidence for temporal probability representations in lateral intraparietal areas (LIPs) and parietal cortex in general. Leon and Shadlen (2003) reported evidence of neurons in macaque LIP that code the value as well as uncertainty of a remembered temporal duration (or at least monkeys' judgments of such quantities), whereas Janssen and Shadlen (2005) reported evidence of macaque LIP neurons encoding the probability of the occurrence of an event as a function of time. These results highlight potential cortical substrates for processing components our task requires, specifically representing trial time to decide when, and for how long, to execute reach movements.
In our task, the brain represents visual and motor uncertainty relationships and combines them to perform a joint task. A natural question is whether such behavior relies on cortically separate visual and motor representations or a unified representation of visuomotor uncertainty. Compelling arguments exist for both of these views. Coordinate transformations, temporal syncing, and the propagation of task goals to individual visual and motor decisions may be better served by a unified representation of visuomotor uncertainty. Conversely, independent noise corruption, sensory or motor recalibration, and general organizational simplicity may favor separate representations (for review, see Pouget et al., 2002).
In conclusion, our results supported the view that people can represent their visual and motor variability as functions of time. Moreover, they can combine these components to predict their performance in a task that depends on both, and select viewing and movement durations to minimize reaching errors. This behavior is consistent with Bayesian Decision Theoretic performance of visuomotor tasks.
Appendix
Visual variability model
We assumed that participants computed X̂ by estimating the centroid of the dot positions, taking their mean. The overall deviation of X̂ from X was given by εV, as mentioned, but this term can be further split into two discrete sources of error such that
εV = εμ + εe. (A1)
εV has mean equal to 0 and variance σV², which is the sum of the variances of εμ and εe (by conditional independence) such that
σV² = σμ² + σe². (A2)
The first source of error, εμ, results from deviations between the mean of the dot positions, μd(tV), and X, resulting from randomness in sampling the dot positions. Because the dot positions were normally distributed, εμ was normally distributed with mean 0 and SD σμ. Formally, σμ was a function of viewing time, tV, and dot scatter level, σd, such that
σμ(tV, σd) = σd / √N(tV), (A3)
using N(tV) from Equation 1. Notice that there are no free parameters to fit from data.
The second source of uncertainty, εe, results from participants' misestimates of the positions of individual dots, which we assumed were corrupted by independent, mean 0, normally distributed positional uncertainty. Therefore, εe was normally distributed with mean 0 and SD σe. For simplicity, σe was assumed to be a first-order linear function of σμ and thus dependent on tV and σd as well such that where ρe and ωe were free parameters.
By substituting Equations A3 and A4 into A2, εV has mean equal to 0, and variance σV² such that as given in Equation 3 in Materials and Methods. We fit the free parameters, ρe and ωe, by MLE using the VB data separately for each participant, for each dot scatter level, as described below.
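The substitution described in this section can be written out as a short worked derivation. This is a hedged reconstruction from the verbal definitions and not a copy of the published equations. It takes σμ to be the standard error of the mean of N(tV) normally distributed dot positions, and it assumes that ρe is the slope and ωe the intercept of the linear relation between σe and σμ; the text states only that the relation is first-order linear with these two free parameters.

```latex
\begin{align*}
\sigma_e &= \rho_e\,\sigma_\mu + \omega_e
  && \text{(assumed roles of } \rho_e \text{ and } \omega_e\text{)} \\
\sigma_V^2(t_V, \sigma_d) &= \sigma_\mu^2 + \sigma_e^2
  && \text{(sum of the two variances)} \\
  &= \frac{\sigma_d^2}{N(t_V)}
   + \left( \rho_e\,\frac{\sigma_d}{\sqrt{N(t_V)}} + \omega_e \right)^{2}
  && \text{(with } \sigma_\mu = \sigma_d / \sqrt{N(t_V)}\text{)}
\end{align*}
```

Under these assumptions, σV² shrinks as N(tV) grows with viewing time, but it cannot fall below ωe², which would act as a floor on visual precision.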
Motor variability model
We assumed that errors between X̂ and Z, εM, were attributable to motor noise and normally distributed, with mean 0, and SD σM. Classically, Fitts' law expresses a relationship between average movement duration and target width that has several variants (MacKenzie and Buxton, 1992). It has been used to model reach endpoint SDs as a function of movement durations (Schmidt et al., 1978; Harris and Wolpert, 1998). We modified the simplest form of the equation as follows: where T is expected movement duration, D is target distance, and W is target width. a and b are free parameters that vary across tasks and participants.
In particular, we characterized σM as a function of tM, or σM(tM). We related W to εM by observing that 95% of reach endpoints will lie within the target width when Note that we have equated the expected movement time, T, with the movement time choice, tM. To formulate endpoint variability as a function of tM, we solve the above expression for W, replace W with σM/c, where c appropriately scales the endpoint variability to match success criterion for acquiring the target (i.e., the target width). Also, we add a constant, γ, that represents the minimum achievable movement endpoint offset (γ can be thought of as the offset achievable if given a very long time to complete the reach). The resulting expression is fit to individual participant data: as given by Equation 4 in Materials and Methods, where absorbs the effects of c. We fit the free parameters α, β, and γ using MLE as described below.
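The steps described above (solve for W, substitute σM/c, add γ) can be illustrated under one assumed starting point. This is a sketch and not the published equations: it takes the original Fitts form with a base-2 logarithm of 2D/W as the "simplest form," which is an assumption because several variants exist, and the grouping of constants into α and β in the last line is one possible choice.

```latex
\begin{align*}
T &= a + b\,\log_2\!\left(\frac{2D}{W}\right)
  && \text{(assumed starting form)} \\
W &= 2D \cdot 2^{-(T - a)/b}
  && \text{(solve for } W\text{)} \\
\sigma_M(t_M) &= 2cD \cdot 2^{a/b} \cdot 2^{-t_M/b} + \gamma
  && \text{(set } T = t_M,\ W = \sigma_M / c,\ \text{add } \gamma\text{)} \\
  &= \alpha\, 2^{-\beta t_M} + \gamma,
  && \alpha = 2cD \cdot 2^{a/b}, \quad \beta = 1/b
\end{align*}
```

In this grouping the scale factor c enters only through α, and for b > 0 the predicted endpoint SD decays toward the floor γ as the movement duration tM increases.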
Maximum likelihood estimation of model parameters
Model parameters were fit separately for each participant by MLE, using the baseline condition data. The log-likelihood of the data was computed by evaluating, under the model, the probability of the offset of each reach at various model parameter values. Those model parameters that produced the maximum likelihood were considered to be best-fit parameters. For example, the likelihood of {α, β, γ} is given by the following expression, where φ(·) is a Gaussian density function, Yi is the offset for the ith reach, and k is the total number of reaches: The likelihood for the visual data is similar. These expressions were numerically maximized to find optimal parameters.
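The fitting procedure can be sketched as a grid search over candidate parameter values. This is a minimal illustration under stated assumptions, not the authors' implementation: the functional form `sigma_m` (a base-2 exponential decay toward a floor γ) is assumed for illustration, the grid search stands in for whatever numerical optimizer was actually used, and all function names and data values are hypothetical.

```python
import itertools
import math


def sigma_m(t_m, alpha, beta, gamma):
    """Assumed (hypothetical) motor SD model: decay toward the floor gamma."""
    return alpha * 2.0 ** (-beta * t_m) + gamma


def log_likelihood(params, durations, offsets):
    """Sum over reaches of the log Gaussian density of each offset.

    Each offset Y_i is modeled as normal with mean 0 and SD sigma_m(t_m_i).
    """
    alpha, beta, gamma = params
    total = 0.0
    for t_m, y in zip(durations, offsets):
        sd = sigma_m(t_m, alpha, beta, gamma)
        total += -0.5 * math.log(2.0 * math.pi * sd ** 2) - y ** 2 / (2.0 * sd ** 2)
    return total


def grid_mle(durations, offsets, alphas, betas, gammas):
    """Return the (alpha, beta, gamma) grid point with the highest log-likelihood."""
    return max(
        itertools.product(alphas, betas, gammas),
        key=lambda p: log_likelihood(p, durations, offsets),
    )


# Hypothetical data: movement durations in s, endpoint offsets in mm.
durations = [0.2, 0.3, 0.4, 0.5, 0.6]
offsets = [9.0, -6.5, 4.0, -3.0, 2.5]
best = grid_mle(durations, offsets, [10.0, 20.0, 40.0], [2.0, 4.0, 8.0], [1.0, 2.0, 4.0])
print(best)
```

A grid search is used here only because it mirrors the description of evaluating the likelihood at various parameter values; a gradient-based or simplex optimizer would serve the same purpose with finer resolution.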
Bootstrapped confidence intervals/SEMs
Reported results are accompanied by 95% confidence intervals or SEMs. When necessary for predictions, these confidence intervals or SEMs were computed by bootstrapped resampling of the raw data (Efron and Tibshirani, 1994). Specifically, we sampled the original data with replacement 50 times (or more if computational costs were not prohibitive) and performed the reported analysis on all 50 resampled data sets. From the set of 50 results, we computed the mean and SD; the mean is the value reported, and the bootstrap SD represents the SEM. The 95% confidence intervals were computed as 1.96 times the SEM.
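The resampling procedure can be sketched as follows. This is an illustrative sketch and not the authors' code: the function name, the fixed random seed, and the example data are hypothetical, and the sample SD (n − 1 denominator) of the bootstrap results is assumed as the SEM estimate.

```python
import random
import statistics


def bootstrap_summary(data, statistic, n_resamples=50, seed=0):
    """Bootstrap a statistic: return (mean, SEM, 95% CI half-width).

    data:        list of raw observations
    statistic:   function mapping a resampled list to a single number
    n_resamples: number of resampled data sets (50 in the text)
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    results = []
    for _ in range(n_resamples):
        # Draw len(data) observations with replacement.
        resample = [rng.choice(data) for _ in data]
        results.append(statistic(resample))
    mean = statistics.fmean(results)
    sem = statistics.stdev(results)  # SD of the bootstrap results
    return mean, sem, 1.96 * sem


# Hypothetical endpoint offsets (mm); the analysis here is the sample SD.
offsets = [2.1, -3.4, 0.5, 4.2, -1.8, 3.3, -2.7, 1.1, -0.6, 2.9]
mean, sem, half_width = bootstrap_summary(offsets, statistics.stdev)
print(mean, sem, half_width)
```

Any analysis that maps a data set to a single number can be passed as `statistic`, so the same routine covers means, SDs, or fitted model predictions.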
Footnotes
- This work was supported by National Institutes of Health Grant R01-EY015261, Office of Naval Research Grant N00014-05-1-0124, and the National Science Foundation Graduate Research Fellowship program. We thank Dr. Daniel Kersten for valuable advice regarding the experimental design and this manuscript. We also thank our reviewers for insightful comments and suggestions for revisions.
- Correspondence should be addressed to Peter W. Battaglia, Department of Psychology, University of Minnesota, Twin Cities, Elliott Hall, 75 East River Road, Minneapolis, MN 55455. batt0086{at}umn.edu