Abstract
Establishing a coherent internal reference frame for visuospatial representation and maintaining the integrity of this frame during eye movements are thought to be crucial for both perception and motor control. A stable headcentric representation could be constructed by internally comparing retinal signals with eye position. Alternatively, visual memory traces could be actively remapped within an oculocentric frame to compensate for each eye movement. We tested these models by measuring errors in manual pointing (in complete darkness) toward briefly flashed central targets during three oculomotor paradigms; subjects pointed accurately when gaze was maintained on the target location (control paradigm). However, when steadily fixating peripheral locations (static paradigm), subjects exaggerated the retinal eccentricity of the central target by 13.4 ± 5.1%. In the key “dynamic” paradigm, subjects briefly foveated the central target and then saccaded peripherally before pointing toward the remembered location of the target. Our headcentric model predicted accurate pointing (as seen in the control paradigm) independent of the saccade, whereas our oculocentric model predicted misestimation (as seen in the static paradigm) of an internally shifted retinotopic trace. In fact, pointing errors were significantly larger than were control errors (p ≤ 0.003) and were indistinguishable (p ≥ 0.25) from the static paradigm errors. Scatter plots of pointing errors (dynamic vs static paradigm) for various final fixation directions showed an overall slope of 0.97, contradicting the headcentric prediction (0.0) and supporting the oculocentric prediction (1.0). Varying both fixation and pointing-target direction confirmed that these errors were a function of retinotopically shifted memory traces rather than eye position per se. To reconcile these results with previous pointing experiments, we propose a “conversion-on-demand” model of visuomotor control in which multiple visual targets are stored and rotated (noncommutatively) within the oculocentric frame, whereas only select targets are transformed further into head- or bodycentric frames for motor execution.
How does the brain represent and store visual space? When the eye is at rest, incident light from different points in the visual field stimulates receptors at unique locations on the retina. The total activity profile of these cells thus provides a retinotopic map of target locations, where the central fovea of the retina always corresponds to the object of current regard. This organization is passed on relatively undisturbed to further cortical and subcortical visuomotor maps (e.g., Hubel and Wiesel, 1959; Sparks, 1989). However, this mapping is insufficient in itself to store spatial locations, because the spatial registry between the retina and external space changes every time the eyes move (Howard, 1982; Miller and Bockisch, 1997).
Nineteenth century investigators recognized that this “space constancy” problem could only be solved if the brain somehow takes eye movement into account, either via outflow (von Helmholtz, 1867; Stark and Bridgeman, 1983) or inflow (Steinbach, 1987; Gauthier et al., 1990) signals. Subsequent oculomotor experiments have shown that this problem is indeed solved, in the sense that humans and monkeys can saccade to correct target locations after intervening eye movements (Matin et al., 1969; Hallett and Lightstone, 1976; Mays and Sparks, 1980; Miller, 1980; Schiller and Sandell, 1983; Sparks and Porter, 1983; McKenzie and Lisberger, 1986; Honda, 1989; Schlag-Rey et al., 1989; Schlag et al., 1990). Further experiments have suggested two possible neural mechanisms for such oculomotor space constancy, each with very different implications (e.g., Andersen et al., 1985; Goldberg and Bruce, 1990). The traditional explanation is that the brain continuously compares eye-centered retinal inputs with an internal representation of eye position to derive a headcentric map of visual space (Zee et al., 1976; Howard, 1982; Andersen et al., 1985). By further taking head and body position into account (Soechting et al., 1991; Flanders et al., 1992; Brotchie et al., 1995), this mechanism could potentially provide a stable internal map of absolute space.
Alternatively, visual space could be represented in a dynamic retinotopic map (Moschovakis et al., 1988; Goldberg and Bruce, 1990; Waitzman et al., 1991). The correct spatial registry of this oculocentric map with the external world would be maintained by internally remapping retinotopic representations to compensate for each eye movement. A complete oculocentric map of external space could potentially be formed by extending this internal map beyond the actual retinal range and adding depth information, but our subjective intuitions of a stable internal map of absolute space would then be somewhat illusory.
Thus far, these two models have proven surprisingly difficult to distinguish experimentally. In theory, the eye-to-head reference frame transformation in the headcentric model could be subserved by subtle eye-position dependencies in posterior parietal cortex called “gain fields” (Andersen et al., 1985; Zipser and Andersen, 1988), but this still requires output to a headcentric map. With only a few exceptions (e.g., Schlag and Schlag-Rey, 1987), the latter are rare compared to the expanse of retinotopic maps in the cortex (for review, see Moschovakis and Highstein, 1994). In line with the headcentric model, perturbations in eye position can affect perceived target location, as judged by pointing (Steinbach, 1987; Gauthier et al., 1990), but we would point out that this proprioceptive modulation could very well occur after the visual storage mechanism. The evidence cited for dynamic retinotopic mapping is equally indirect, relying on subtle shifts in sustained neural activity (Duhamel et al., 1992; Walker et al., 1995) and psychophysically measured perceptual distortions near the occurrence of saccades (Matin et al., 1969; Honda, 1989; Miller, 1989; Cai et al., 1997; Ross et al., 1997). The purpose of the current study was to provide a two-tailed, mutually exclusive behavioral test between these two models, based on localization errors that we observed when eye movements occurred between viewing a target and pointing toward the remembered location of the target.
Theory and logic behind the test
Normal human subjects are relatively accurate when pointing toward a remembered target along the current direction of gaze (Gauthier et al., 1990; Flanders et al., 1992), but when asked to point (open visual loop) toward a peripheral target, they usually exaggerate the angular retinal eccentricity of the stimulus (Bock, 1986; Enright, 1995). This presumably occurs at some point in the visuomotor transformation for arm movement, but exactly how or why this happens need not concern us here. The important point is that it must happen at a level where afferent spatial information is still encoded retinotopically. As a result, this visuomotor error would have to occur at different stages relative to the short-term memory storage stage of the oculocentric and headcentric models. This forms the basic premise for our test.
Figure 1 illustrates the logic behind this test and the different predictions of the headcentric and oculocentric models. The “subject” initially looks straight ahead toward a distant, briefly flashed target (solid circle) (Fig. 1A, center). Again, humans are normally quite accurate at pointing toward such a target, but we now add a new variation. After viewing the target, the subject rotates the eyes 30° leftward (Fig. 1B, center) and only then (Fig. 1C,D) points toward the remembered target location. What will be the effect of this intervening eye movement?
According to the headcentric model (Fig. 1, left), the retinal code would be read out almost immediately to form a headcentric map of visuomotor space. Thus, during initial target perception (Fig. 1A), the foveal stimulus would be compared with current eye position (straight ahead) to compute and store a headcentric target direction vector (dashed line). This vector should remain stable after the subsequent eye movement (Fig. 1B). (This indeed is the point of this model.) Assuming for the moment (this will be tested below) that the static position of the eye itself does not induce pointing errors (Hill, 1972; Morgan, 1978), the headcentric model predicts that an intervening eye movement will have little systematic effect on pointing accuracy (Fig. 1C).
In contrast, the oculocentric model (Fig. 1, right) states that the retinal code is continuously updated and available until the decision is made to execute a movement. This model stores a retinotopic memory trace, defining an eye-centered target vector (Fig. 1A, right). This model must then compensate for the leftward gaze shift (Fig. 1B) by countershifting this retinotopic trace, in effect rotating the oculocentric direction vector 30° to the right (gray sector). Thus, if the subject makes use of the updated representation (which is the purpose of this model), he or she would point based on a peripherally shifted retinotopic memory trace (Fig. 1D, gray sector). As mentioned previously, human subjects are systematically inaccurate at pointing toward peripheral retinal targets, usually exaggerating the angular eccentricity of the targets (Bock, 1986). Therefore, even though the target was only viewed with the fovea, such subjects should now show an angular overshoot in pointing direction (Fig. 1D, black sector) opposite to the current line of gaze (thin arrow). This prediction is clearly different from that of the headcentric model. Because pointing behavior appears early in development and agrees well with verbal reports of perceived target direction (Gauthier et al., 1990), this test (Fig. 1C vs D) should provide a secure behavioral window into the mechanism of short-term spatial memory.
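The contrast between the two predictions can be reduced to a few lines of arithmetic. The sketch below (an illustration of our own, in Python) assumes a fixed readout gain of 1.134, i.e., the ∼13.4% exaggeration of retinal eccentricity measured in our static paradigm (see Results); the function names and the placement of the gain are simplifying assumptions, not a claim about neural implementation.

```python
# Minimal sketch of the Fig. 1 thought experiment. Azimuths in degrees,
# positive = rightward. GAIN is an assumed readout gain (~13.4% overshoot).
GAIN = 1.134

def headcentric_pointing(target_az, final_gaze_az):
    # Readout (with its distortion) occurs at perception, while the target
    # is foveated; the stored headcentric vector then ignores the saccade.
    retinal_at_perception = 0.0                      # target was foveated
    return target_az + GAIN * retinal_at_perception  # final gaze irrelevant

def oculocentric_pointing(target_az, final_gaze_az):
    # The retinal trace is counter-shifted for the saccade and read out
    # (with the same distortion) only at pointing time.
    remapped_retinal = target_az - final_gaze_az  # +30 deg after 30 deg left
    return final_gaze_az + GAIN * remapped_retinal

print(headcentric_pointing(0.0, -30.0))   # 0.0 deg: accurate pointing
print(oculocentric_pointing(0.0, -30.0))  # ~+4.0 deg: overshoot away from gaze
```

With a 30° leftward fixation, the oculocentric readout overshoots the central target by roughly 4° to the right, which is exactly the class of error our dynamic paradigm was designed to detect.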
MATERIALS AND METHODS
Subjects. Nine right-handed human subjects participated in the experiment; the first two subjects were aware of the design and purpose of the experiment, another two were aware of its general nature but not the specific test, and the rest were naive. There was no qualitative difference or statistically significant difference (p ≥ 0.198) in overall pointing responses between these three groups in the final quantification (see Fig. 7). Of the original nine subjects, two were later excluded from quantitative analysis because they consistently failed to meet the ocular fixation criteria delineated below (although qualitatively they showed the same effects described below). Those remaining were four females and three males, aged 23–42 years, with no known neuromuscular deficits. This experiment was preapproved by the York Human Participants Review Subcommittee.
Equipment. Subjects were seated in complete darkness with their right arms resting unencumbered on their laps. The heads were mechanically stabilized with the use of a bite bar attached to a personalized dental impression. Eye and arm orientations were measured using the three-dimensional (3-D) search coil technique (Tweed et al., 1990; Hore et al., 1992). Subjects were seated such that their heads and upper arms stayed within the linear range at the center of three mutually orthogonal pairs of Helmholtz coils 2 m in diameter. Skalar (Delft, The Netherlands) eye coils [either two-dimensional (2-D) or 3-D] were inserted into the anesthetized right eye of a subject at the beginning of the experiment, and a more robust “homemade” dual 3-D coil was secured with tape to the skin of the lateral upper arm. All data were sampled by a personal computer at 50 Hz. As dictated by the 3-D coil method, the Helmholtz coils were precalibrated with the use of similar coils (Tweed et al., 1990). However, we also used a second, more standard calibration procedure, in which gains and biases were adjusted to match the expected signals when subjects looked and pointed toward continuously illuminated targets at known positions (0°, 15°, and 30° horizontal eccentricity). The latter data were gathered at the end of the experiment (and adjustments were made off-line), so that this visual feedback could not influence experimental performance.
A matte black tangent screen was fixed at exactly 110 cm from the center of a subject’s eyes, parallel to the vertical and horizontal magnetic fields. The subject’s seated height was adjusted so that the right eye was directly aligned with the central target on this screen. Targets consisted of 3 mm light-emitting diodes (LEDs) (0.17° in diameter and 2.0 mcd luminance) controlled peripherally by a second computer. The Helmholtz coils were also painted matte black, and the forward portion of the coils was coated further with black velvet to eliminate reflections from the LEDs. During experiments, feedback signals from the LEDs were also recorded to ascertain their exact illumination durations. Finally, signals were recorded from a push-button held in the subject’s left hand that was used to indicate when the subject believed that the right arm was pointing accurately at the target. Peripheral electronic modules, computers, an oscilloscope for on-line monitoring of eye and arm position, and the experimenters were located in an adjacent closed room.
Experimental paradigms. A related previous experiment used a central ocular fixation target and varied the eccentricity of the pointing target to show that retinal displacement was exaggerated (Bock, 1986). However, different displacements of arm position could bring in confounding motor effects (e.g., Bock and Eckmiller, 1986). Therefore, we asked subjects to point (with the arm fully extended) toward a central target light (T) mounted directly in front of the right eye, and we varied the horizontal angle of the illuminated fixation light (F). Subjects were instructed to point as accurately as possible in all paradigms, but only when all stimulus lights were extinguished such that arm movements were made in the complete absence of visual cues.
Figure 2 provides temporal information on LED illumination and schematic eye (dashed line) and arm (solid line) trajectories for our three basic experimental paradigms. In the basic control paradigm, subjects visually fixated T (illuminated for 1.4 sec) and then pointed toward the remembered location of T after it was extinguished (Fig. 2A). An auditory tone (*) signaled the subject to lower the arm to its resting position and prepare for the next trial. The purpose of this paradigm was to establish the basic accuracy of pointing toward a remembered central target. In the second control-type paradigm (Fig. 2B), subjects looked continuously at an eccentric F. After 0.7 sec had passed, T appeared for 0.7 sec, and then both lights were extinguished. Subjects then pointed to the remembered location of T, while keeping gaze fixed at the location of F. Consequently, in this paradigm, the eyes did not move during the trial, and they never fixated on T. We called this second control the “static” paradigm to reflect the stationary posture of the eyes. The purpose of this paradigm was to establish a control pattern of pointing errors for each subject when pointing toward a retinally peripheral (but craniotopically central) target.
The final paradigm provided the test illustrated in Figure 1. In this paradigm (Fig. 2C), the subjects were required to initially look at the central T, which remained illuminated for 750 msec. At the moment that T was extinguished, F was illuminated for 500 msec. Subjects were required to saccade toward and fixate F. When F was extinguished, subjects were required to continue fixating their gaze on the location of F and to point toward the remembered location of T (in complete darkness). The resulting time course was long enough that the final pointing direction could not be affected by transient visuomotor distortions related to saccades or target jumps (e.g., Miller, 1989; van Sonderen et al., 1989; Cai et al., 1997). We called this the “dynamic” paradigm to emphasize the saccade that intervened between viewing and pointing toward the central target.
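For reference, the stimulus schedules of the three paradigms can be written out explicitly. The event names and the representation below are our own; the times (in seconds from trial onset) are transcribed from the durations given above.

```python
# Illustrative transcription of the Fig. 2 LED schedules (times in sec).
# T = pointing target, F = fixation light; "point" = earliest pointing cue.
PARADIGMS = {
    "control": [("T_on", 0.0), ("T_off", 1.4), ("point", 1.4)],
    "static":  [("F_on", 0.0), ("T_on", 0.7), ("T_off", 1.4),
                ("F_off", 1.4), ("point", 1.4)],
    "dynamic": [("T_on", 0.0), ("T_off", 0.75), ("F_on", 0.75),
                ("F_off", 1.25), ("point", 1.25)],
}
```

In all three schedules, pointing begins only after every LED is off, so the arm always moved in complete darkness.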
Based on the results of a previous study (Bock, 1986), we positioned LEDs 15° to the right and left of the central target to serve as our standard fixation lights, but we also tested a wider range of Fs along the horizontal meridian. Furthermore, Bock (1986) varied both the pointing-target and fixation direction to confirm that the observed pointing errors were a function of retinal displacement rather than of eye position. We used the same approach with our new dynamic paradigm as a final control, i.e., we repeated our measurements with two additional horizontal positions for the T. In preliminary trials, in which leftward and rightward fixations were interleaved, subjects sometimes reported a vague (presumably proprioceptive) sensation that they were pointing in different directions (i.e., see Fig. 3), even though they believed that they were faithfully pointing toward the remembered location of the central target light. Henceforth, and in all data reported here, we did not interleave fixation lights but rather repeated the same F for several trials, with a pause before the next set of trials. With this change, subjects reported no conscious awareness that they were making any pointing errors. Subjects were required to practice our paradigms for ∼15 min within 2 d of the experiment to avoid confusion during the experiment but did not receive any visual feedback on their performance until the calibration trials at the end of the actual experiment.
These factors led to the following order of paradigms in each experiment: (1) control paradigm, 20 trials; (2) static paradigm, F 15° left, 20 trials; (3) static paradigm, F 15° right, 20 trials; (4) static “series” (F ordered 30° left, 15° left, 5° left, 0°, 5° right, 15° right, then 30° right; five trials then pause at each F); (5) dynamic paradigm, F 15° left, 20 trials; (6) dynamic paradigm, F 15° right, 20 trials; (7) standard dynamic series (F ordered 30° left, 15° left, 5° left, 0°, 5° right, 15° right, then 30° right; five trials then pause at each F); (8) dynamic series, T shifted 15° right; (9) dynamic series, T shifted 30° right; and (10) calibrations.
Data analysis. The three components recorded from the “normal” ocular coil were treated as a “gaze vector,” calibrated such that it pointed straight forward along the visual axis when subjects stared at T. Our 2-D figures show the vertical and horizontal components of these vectors as they project onto the plane of the tangent screen. For quantification, the orthogonal projections of these vectors have been converted into angular measures of vertical and horizontal orientation (Crawford and Guitton, 1997). Arm coil signals were used to compute quaternions (Tweed et al., 1990; Hore et al., 1992), which were then converted into “pointing vectors” (Tweed et al., 1990) similar to our gaze vectors and also into angular measures of upper arm position. For final orientations with the elbow fully extended and locked, this uniquely specified the pointing direction of the arm. Because the arm does not point directly at the target during visual pointing, but rather aligns the finger tip with the visual axis (Soechting et al., 1991), a description of arm position could be complex. However, this potential pitfall was easily avoided with the quaternion technique, because coil signals recorded while pointing at T during calibrations became the reference position to which all other positions were referred. Thus, all eye and arm positions were measured relative to the orientations recorded while looking and pointing toward the central light. This provided a sensitive and accurate measure of relative pointing errors for our test, while allowing us to confirm that the eyes were held or shifted according to instructions.
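As a rough illustration of this step, the sketch below converts a unit gaze or pointing vector into horizontal and vertical angles and expresses a pointing response relative to the calibration reference; the axis convention (x forward, y left, z up) and the function names are assumptions for illustration, not our actual analysis code.

```python
import numpy as np

def vector_to_angles(v):
    # v = (x, y, z): forward, left, up. Returns (horizontal, vertical) in
    # degrees; positive horizontal = left of center, positive vertical = up.
    x, y, z = np.asarray(v, float) / np.linalg.norm(v)
    return np.degrees(np.arctan2(y, x)), np.degrees(np.arcsin(z))

def relative_pointing_error(pointing_vec, reference_vec):
    # Error relative to the orientation recorded while pointing at T
    # during calibration (the reference position described above).
    ph, pv = vector_to_angles(pointing_vec)
    rh, rv = vector_to_angles(reference_vec)
    return ph - rh, pv - rv

# Example: a response ~2 deg right of the calibrated central reference.
print(relative_pointing_error((1.0, -0.035, 0.0), (1.0, 0.0, 0.0)))
```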
Final fixation and pointing positions (Fig. 2, downward arrows) were selected visually according to the criteria that the pointing button must be depressed, arm position must have reached its greatest degree of stability, and the eye and arm movements correctly followed (in timing and magnitude) the requirements of the paradigm. Data trials were also rejected if subjects failed to maintain gaze eccentricity within 80% of F. This allowed a maximum allowable ocular fixation error of 1, 3, and 6°, respectively, at 5, 15, and 30° eccentricity of F. In practice, almost all fixation errors were smaller, and because of the nonlinear saturating nature of the effect described below, even maximal fixation errors would apparently have little effect on arm position. Statistical analysis was performed with the SPSS Statistical Package.
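In code form, this rejection rule amounts to a 20% tolerance on gaze eccentricity (1, 3, and 6° at 5, 15, and 30°). The sketch below is an illustrative reconstruction; the handling of the central (0°) fixation light is our assumption, because the text specifies tolerances only for eccentric Fs.

```python
def fixation_accepted(final_gaze_deg, fixation_deg, tolerance=0.20):
    # Gaze must stay within 20% of the fixation light's eccentricity.
    if fixation_deg == 0.0:
        return abs(final_gaze_deg) <= 1.0   # assumed bound for central F
    return abs(final_gaze_deg - fixation_deg) <= tolerance * abs(fixation_deg)

assert fixation_accepted(13.2, 15.0)        # 1.8 deg error at 15 deg: kept
assert not fixation_accepted(23.0, 30.0)    # 7.0 deg error at 30 deg: rejected
```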
Subjective experiment. While examining the quantitative data below, it may help the reader to note that our three paradigms can be repeated subjectively without experimental apparatus. For example, foveate a central visual target, close the eyes while maintaining fixation, point at the target, and then open the eyes to see the result (control paradigm); next, without moving the head, view the central target while fixating peripherally, close the eyes while carefully maintaining peripheral fixation, point, and then open the eyes (static paradigm); finally, foveate the central target, close the eyes, saccade horizontally, point while maintaining peripheral fixation, and then open the eyes (dynamic paradigm). An effect can sometimes be detected in this subjective version of our experiment, particularly if fixations in both directions are used. However, the effect seems to dissipate (at least temporarily) over time, perhaps because of visual feedback (which was not available in the experiment below).
RESULTS
Standard 15° stimuli
Figure 3A–C illustrates our three basic paradigms and the typical performance of one subject over the time course of five trials. In each case, the horizontal position of the central target (black box), the 15° leftward fixation light (hatched box), the eye (thin trace), and the arm (thick trace) are plotted as a function of time and aligned across five consecutive trials. Figure 3A illustrates the control paradigm, in which gaze is maintained on the target even after it is extinguished. The upper arm was initially at its resting position and then shifted leftward (and upward) when the light was extinguished, finally coming to rest at an accurate pointing orientation. In the second static paradigm (Fig. 3B), subjects continually looked toward a peripheral fixation light, 15° left, while the central target light was flashed. In this case, final pointing orientation was not accurate but rather missed by several degrees to the right. Finally, the dynamic paradigm is illustrated in Figure 3C. As indicated by the eye trace, subjects initially fixated on the flashed central target and then saccaded toward the briefly illuminated fixation light after the first light was extinguished. Again, the final pointing direction missed to the right. Note that in each of our three paradigms, final gaze direction was maintained during pointing, and the arm only began moving after all LEDs had been extinguished, such that pointing occurred in complete darkness.
Figure 3D–F shows the same five consecutive trials for each paradigm but now provides two-dimensional eye (solid diamonds) and arm (open squares) trajectories (excluding the return trajectories shown in A–C). The curving orientation trajectory of the upper right arm is now evident, as it went from the resting position toward full extension. In the control condition (Fig. 3D), one can again see that the central target was continuously fixated by 2-D gaze (solid diamonds) and was accurately acquired by the arm. In the static paradigm (Fig. 3E), gaze was continuously maintained on the 15° leftward fixation light, and the arm (open squares) now arced toward an inaccurate final pointing direction, missing to the right when compared with the control. Finally, in the dynamic paradigm (Fig. 3F), 2-D gaze was initially maintained centrally (solid diamonds) but then shifted 15° to the left where it was held. Once again, the subject missed, now pointing to the right. From this figure, one might get the impression that the peripheral deviations of the eye caused the arm to undershoot its target (Fig. 3E,F), but as we shall see, this was only part of a more complex pattern of errors. To illustrate the full pattern, we will henceforth focus on the final steady-state directions of ocular fixation and pointing.
Figure 4 shows final 2-D gaze (circles) and pointing (squares) directions plotted for 20 trials in one archetypical subject (left) and for the computed means (across 20 trials) of all seven quantified subjects (right). Figure 4, A and B, shows data from the control paradigm. In this case, pointing responses were relatively accurate (compared with the other paradigms), with an overall slight leftward and downward bias. On average, across the individual means (Fig. 4B), subjects erred by only 1.70° left [±0.80° (SE)] and 1.94° down [±0.94° (SE)]. Because the vertical undershoot did not vary between paradigms, and because only horizontal eye position was manipulated, we will henceforth focus on horizontal pointing performance.
In Fig. 4C–F, solid squares represent pointing performance during 15° leftward fixation (solid circles), whereas open squares denote performance with 15° rightward fixation (open circles). Results from both the static (Fig. 4C,D) and dynamic (Fig. 4E,F) paradigms are shown. Pointing was much less accurate in the static task (Fig. 4C,D) than in controls. The stochastic variations within subjects (e.g., Fig. 4C) were not significantly different from controls (p ≥ 0.198), but the mean pointing values (Fig. 4D) were less accurate in the static paradigm and more variable between subjects. On average, subjects erred 2.17° ± 0.67 to the right (solid squares) when fixating to the left (solid circles) and erred 1.84° ± 0.97 to the left (open squares) when fixating to the right (open circles) (mean ± SE between subjects). Averaged together and compared with the actual retinal eccentricity (∼15°), this represented an overestimation of 13.4% (± 5.1 SE between subjects). Some of this variance between subjects was reduced when the bias present in the controls (Fig. 4B) was subtracted out. Statistical analysis (pairwise t tests) showed that final pointing directions in the static task were significantly different (p ≤ 0.01) from the directions in controls in all cases, except during rightward fixation in one subject. This confirmed that subjects made systematic errors in pointing toward remembered targets when gaze was deviated peripherally, as required for our dynamic paradigm test to work (Fig. 1). Although not present in all subjects, the trend was for subjects to overshoot the target in the direction opposite to the fixation point, as reported previously (Bock, 1986; Enright, 1995).
Next came the crucial dynamic test (Fig. 4E,F). Recall that the headcentric model predicted a final pointing distribution similar to that of the control (Fig. 4A,B), whereas the oculocentric model predicted errors similar to those seen in the static paradigm (Fig. 4C,D). The latter prediction held; the dynamic data (Fig. 4E,F) were qualitatively indistinguishable from the static data. Again, variability within subjects was not significantly increased (p ≥ 0.159), but the individual pointing distributions (solid squares, open squares) for left (solid circles) and right (open circles) fixations were significantly different from those for controls (p ≤ 0.001 in all but one fixation direction in one subject, where p was ≤ 0.05). Mean horizontal pointing error, relative to fixation direction, averaged across subjects was 3.40° ± 0.98 (SE) for leftward fixation and 1.70° ± 1.20 for rightward fixation. This represented an overall mean exaggeration of 17% (± 6.9%, SE across subjects). Moreover, average horizontal pointing errors across subjects (Fig. 4D,F) were not significantly different between the static and dynamic tasks (p ≥ 0.25), whereas both were significantly different from those in the control task (p ≤ 0.003).
Figure 4G graphically illustrates the similarities between the results of the static and dynamic paradigms by plotting the mean horizontal “dynamic error” as a function of mean horizontal “static error” for each subject. By calculating regression lines and correlations for these data, we generated a set of predictions that were independent of the degree of error in any individual subject: the headcentric model predicts a slope and correlation of zero (i.e., no systematic relationship between dynamic and static pointing error), whereas the oculocentric model predicts an ideal slope and correlation of 1.0 (dynamic error = static error; assuming that the internal remapping mechanism worked perfectly). Actual slopes and correlations were 1.17 and 0.897 (dotted line) for rightward fixation (open squares), 1.06 and 0.721 (dashed line) for leftward fixation (solid squares), and 1.15 and 0.926 (solid line) when both data sets were combined. One might wonder whether some of this correlation might be caused by biases that were constant across tasks within subjects but varied between subjects. To control for this, we subtracted the control paradigm errors from the static and dynamic errors in each subject and then replotted the comparison (Fig. 4H). This reduced the usable variance in the leftward fixation data (solid squares), but the overall slope and correlation remained high (1.39 and 0.912), consistent with the oculocentric model.
Static and dynamic series
The preceding results were consistent with the predictions of the oculocentric model (Fig. 1) across subjects but did not supply a direct quantitative measure for performance within individual subjects. To provide such a test, we asked each subject to repeat both the static and dynamic paradigms five times for each of a series of seven fixation lights in the horizontal plane. Figure 5 illustrates the pointing performance in this task, again showing final 2-D gaze (solid circles) and pointing (open squares) vectors for one subject (this subject showed a pattern of pointing errors closely resembling the average pattern shown below but with a greater than average magnitude). Data for all seven fixation targets are shown (from 30° left to 30° right), but each plot is staggered vertically by 8° to reduce overlap. Dashed lines join the corresponding groups of gaze and pointing directions. Both the static paradigm (Fig. 5A) and the dynamic paradigm (Fig. 5B) produced a characteristic pattern of pointing errors as a function of final fixation position. The important point is that this pattern is almost indistinguishable in these two conditions; as final gaze direction (solid circles) proceeded from left to right, pointing errors (open squares) proceeded from right to left.
To quantify these individual patterns, we averaged the horizontal pointing errors (across trials) for each fixation target and plotted these as a function of horizontal eye position. Figure 6 shows such plots for both the static (dashed lines) and dynamic (solid lines) series, for each of the seven individual subjects (Fig. 6A–G). There was considerable variability in the pattern between subjects, particularly in their overall horizontal bias. However, the grand mean (Fig. 6H) across the individual curves showed the same saturating pattern of retinal overestimation reported by Bock (1986), with a moderate right–left asymmetry. More importantly, in each case there was a striking similarity between the pattern of errors in the static and dynamic series. Using pairwise t tests for the two directions and comparing the mean responses of all subjects, we found that the dynamic and static series were not significantly different for the leftward fixation directions (n = 3 lights × 7 subjects = 21; p = 0.518) or the rightward fixation directions (n = 21; p = 0.524).
The final step of this analysis was to collapse the data in Figure 6 into a single, direct measure of fit to the two models. This was accomplished by plotting mean horizontal dynamic pointing error as a function of mean static error at each of the final fixation targets, providing a total of seven data points for each subject. (This is similar to the approach taken in Fig. 4G, but now we are quantifying performance within individual subjects.) Regression fits to these plots thus provided a test that was independent of individual variations in the error pattern (Fig. 6A–G). Figure 7A shows the slopes predicted by the two models. Again, the headcentric model predicted a slope (dotted line) and correlation of zero, because dynamic errors should equal control errors independent of static paradigm errors at peripheral targets. In contrast, the oculocentric model predicted that dynamic error would equal static error (assuming that the internal remapping mechanism worked perfectly and assuming zero biological noise). Thus, the oculocentric model predicted an ideal slope (Fig. 7A, dashed line) and correlation of 1.0. Figure 7B shows the actual computed slope for one subject, and Figure 7C shows slopes for all subjects. Figure 7D shows the regression fit (solid line) to the mean across-subject errors shown in Figure 6H. These averaged data had a slope of 1.20 and a correlation of 0.99. Figure 7D also shows the grand mean of all the individual slopes (dashed line), which was 0.97 ± 0.13 (± SE between subjects). Thus, according to the predictions of our models (Fig. 1), the data clearly supported the oculocentric hypothesis.
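The regression test of Figure 7 is straightforward to reproduce. In the sketch below, the seven error values per condition are made-up placeholders (not our published data); a slope near 1.0 favors the oculocentric model, and a slope near 0.0 favors the headcentric model.

```python
import numpy as np

# Placeholder mean horizontal errors (deg) at the seven fixation targets.
static_err  = np.array([-4.0, -2.5, -1.0, 0.3, 1.5, 2.8, 3.6])
dynamic_err = np.array([-3.8, -2.9, -1.2, 0.1, 1.8, 2.5, 3.9])

slope, intercept = np.polyfit(static_err, dynamic_err, 1)
r = np.corrcoef(static_err, dynamic_err)[0, 1]
print(f"slope = {slope:.2f}, r = {r:.2f}")  # slope near 1.0: oculocentric
```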
Effect of retinal error versus eye position
The pattern of pointing errors that we have described cannot be accounted for by differences in initial or final body and head posture, initial arm position, or the pointing-target position, because all of these were held constant. However, our test would be invalid if the dynamic paradigm pointing errors were caused by a headcentric dependence of pointing direction on final eye position, rather than retinal displacement per se. In particular, if instantaneous eye position distorted the moment-to-moment perception of target direction, as suggested by some (Hill, 1972; Morgan, 1978), then pointing errors like those described above could also occur with the headcentric model of visual memory (Fig. 1C). Thus, a final experiment was required to control for this contingency. Bock (1986) did this (in a test similar to our static paradigm) by varying both the pointing-target and the fixation direction. To similarly control for this in our new dynamic paradigm, we asked subjects to repeat the multiple-fixation dynamic series (Fig. 6) twice more but now with the T at 15° and 30° to the right.
Figure 8A shows the mean (across subjects) pointing error curves for these three data sets, plotted as a function of final horizontal eye position. The two new curves were similar to the original but shifted by ∼15° intervals (Fig. 8A). The points that appear to be shifted vertically with respect to each other at 0°, 15°, and 30° right were indeed significantly different (across subjects, p ≤ 0.001). However, note that these statistical differences disappeared and the plots collapsed into a single curve (Fig. 8B) when replotted as a function of gaze displacement relative to the target light (the reverse of angular retinal displacement of the target). Thus, the pointing errors observed in these paradigms were clearly a function of gaze-centered retinal displacement rather than eye position or any other head-centered variable, confirming the assumption of our previous test (Fig. 1).
DISCUSSION
Miscalibrations in reading the retinal code
The practical implication of this study is that human subjects misestimate the retinal eccentricity of peripheral targets when pointing in the absence of visual feedback, even if they have only momentarily glanced away from a central target (e.g., Fig. 4E). The general trend of both our static and dynamic paradigm results confirmed that peripheral retinal displacement is exaggerated in most subjects (Bock, 1986; Enright, 1995). For example, if an artist were to glance at a specific site on a painting, look away toward the subject, and then dab blindly at the remembered location, he or she would very likely miss in the direction opposite to current gaze (for a more dramatic example, think of aiming a handgun in this manner). Presumably these visuomotor errors occur because we have little experience in pointing at or manipulating visual targets that we are not simultaneously foveating, and thus we have not properly calibrated the system for this task (although it is unclear why horizontal retinal error would consistently be exaggerated). Thus, as suggested in Materials and Methods, this effect would probably dissipate if subjects were trained to point toward retinally peripheral targets in the presence of visual feedback.
Although we cannot know exactly where or how these visuomotor distortions occurred in the brain, four points are evident. First, because we used a constant central pointing target and only varied eye position, these errors cannot be attributed to a purely arm-related motor effect. Second, because they occurred to the same degree in both our dynamic and static paradigms (in which the eyes did not move), they cannot be attributed to errors in an eye movement efference copy. Third, they cannot be attributed to distortions in the retinotopic maps for vision; because any point on a topographic sensory map can potentially be mapped onto any type and magnitude of motor output, this is clearly a question of visuomotor calibration, or what one might call the visual readout mechanism. This point is illustrated by several recent experiments in which the visuomotor calibration process was shown to be quite local, e.g., to a single arm (Thach et al., 1992) or even to certain positions of one arm (Ghahramani et al., 1996). Fourth, our final test (Fig. 8) confirmed that the calibration errors in our experiment were a function of retinal displacement (Bock, 1986) rather than eye orientation (Hill, 1972; Morgan, 1978) or any other headcentric variable. In other words, they can be simulated as gain errors in models that use a retinotopic frame at any point (e.g., Zipser and Andersen, 1988; Moschovakis and Highstein, 1994) but cannot be simulated within an exclusively headcentric frame. This last point is central to the main goal of this study.
Storage and remapping of visual space in an oculocentric frame
The main purpose of this investigation was to use the visuomotor errors described above to gain insight into the internal mechanism for short-term storage of spatial vision. Provided that (1) the visuomotor readout mechanism for pointing distorts peripheral retinal codes, (2) the headcentric model requires this readout process to occur before storage, and (3) the oculocentric model predicts shifts in retinotopic memory traces and allows these traces to be read out after the storage stage, then these models make the following mutually exclusive predictions (Fig. 1C vs D). The headcentric model predicts that the intervening eye movements in our dynamic paradigm would have no systematic effect on pointing performance, whereas the oculocentric model predicts that the dynamic paradigm would induce a pattern of errors indistinguishable from that observed in our static paradigm. Clearly, our data (Figs. 4–7) support the predictions of the oculocentric model. The average dynamic and static error slope (0.97) was remarkably similar to that predicted by the oculocentric model (1.0). Furthermore, this model can account for the individual subject variations in the static and dynamic error slopes (Fig. 7C) as small biases and gain errors in the oculocentric remapping process. The headcentric model has no provision for explaining these data.
The remarkable biological implication is that, contrary to subjective intuition, the brain does not necessarily possess a stable map of absolute or even bodycentric visual space. Instead, it seems to represent space relative to current gaze direction, such that internal representations of visual targets must be remapped for each eye movement to retain the correct spatial registry with the world (Moschovakis et al., 1988; Goldberg and Bruce, 1990; Duhamel et al., 1992; Moschovakis and Highstein, 1994; Walker et al., 1995; Mazzoni et al., 1996). Although this mechanism was first suggested by the oculomotor studies cited above, our results suggest that it is a more general spatial mechanism. [There is some debate whether such mechanisms pertain more closely to vision (Duhamel et al., 1992; Tian et al., 1996) or to motor intent (Mazzoni et al., 1996; Snyder et al., 1997), but our data cannot make this distinction.] This is consistent both with recent single-unit recordings in the primate frontal cortex (Tian et al., 1996; Mushiake et al., 1997) and with the well documented reports of transient perceptual distortions around the time of a saccade (Miller, 1989; Cai et al., 1997; Ross et al., 1997). Note that the brain could use several reference frames to map sensory space, so long as they are properly interconverted for behavior (Harris et al., 1980; Jay and Sparks, 1984). For example, we may have obtained a different result if we had asked subjects to judge the craniotopic “straight ahead” without a visual target. However, gaze-centered remapping is very likely the dominant mechanism in storing visual information. The potential advantage is clear; it capitalizes on abundantly available retinotopic cortical machinery to keep the spatial reference point (current gaze) centered within the visual field, on the object of greatest interest, and at the region of highest neural acuity (the fovea).
In light of these conclusions, it is timely to point out that current models for this process will not work in real 3-D space. These models subtract translation-like saccade vectors from similar retinotopic vectors (retinal error) to provide final retinal error (Goldberg and Bruce, 1990; Moschovakis and Highstein, 1994). However, eye movements, being rotations, do not add or subtract commutatively. This problem was first raised in the context of ocular motor control (Tweed and Vilis, 1987), in which it may in part have a muscular solution (Demer et al., 1995; Crawford and Guitton, 1997). In contrast, retinotopic remapping is purely an issue of internal representation, so there can be no trivial mechanical solution here. The problem of noncommutativity is most easily seen when the eyes and head rotate in the torsional/roll dimension (Crawford and Vilis, 1995; Crawford and Guitton, 1997). Ocular torsion disrupts the registry between the world and the retina, but the rotation vectors for such movements do not subtract from retinal error vectors in any meaningful way. Moreover, similar problems can be demonstrated for horizontal and vertical movements (see Tweed et al., 1994, their Appendix), and the resulting errors would tend to accumulate over the course of several saccades. However, these problems are eliminated if we replace the idea of vector subtraction with a noncommutative model that multiplicatively rotates retinal representations by the inverse of each eye rotation in space. Toward stimulating further research in this vein, we have supplied a mathematical model in Appendix A that will correctly simulate remapping during 3-D eye, head, and body rotations. Appendix B then describes how this model can be tested via 3-D extensions of previous multiple eye movement tasks (e.g., Matin et al., 1969; Duhamel et al., 1992).
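The torsional case makes this failure easy to demonstrate numerically. The sketch below (our illustration; standard Hamilton quaternion conventions, with x along the line of sight, y left, and z up) remaps a memory trace 10° right of the fovea through a 90° ocular roll: multiplicative remapping correctly circles the trace around the fovea to 10° up, whereas subtracting the horizontal/vertical components of the torsional “saccade vector” (both zero) would wrongly leave the trace unchanged.

```python
import numpy as np

def qmul(a, b):
    # Hamilton product of quaternions [w, x, y, z].
    w1, x1, y1, z1 = a; w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def rotate(q, v):
    # Rotate vector v by unit quaternion q: q [0, v] q*.
    return qmul(qmul(q, np.r_[0.0, v]), qconj(q))[1:]

def axis_angle(axis, deg):
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    half = np.radians(deg) / 2.0
    return np.r_[np.cos(half), np.sin(half) * axis]

# Memory trace: target 10 deg right of the fovea (x = line of sight).
g_old = rotate(axis_angle([0, 0, 1], -10.0), np.array([1.0, 0.0, 0.0]))

# 90 deg torsional (roll) eye rotation about the line of sight.
E = axis_angle([1, 0, 0], 90.0)

# Correct remapping: counter-rotate the trace by the eye rotation.
g_new = rotate(qconj(E), g_old)

print(np.round(g_old, 3))  # [0.985 -0.174  0.   ] : 10 deg right
print(np.round(g_new, 3))  # [0.985  0.     0.174] : 10 deg up after the roll
```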
Visual representation versus visuomotor control
At first glance, our conclusions seem to contradict previous arguments that an eye position-dependent eye-to-head reference frame transformation is required for the execution of visually guided behaviors (Andersen et al., 1985; Gauthier et al., 1990; Flanders et al., 1992). In particular, we have recently argued that the 3-D geometry of the eye requires such a transformation for saccades to be accurate and kinematically correct from all initial eye positions (Crawford and Guitton, 1997; Klier and Crawford, 1997). This apparent contradiction stems from a historically biased mind-set. The eye-to-head reference frame transformation used in some oculomotor models (e.g., Zee et al., 1976) has classically been equated with both spatial perception and motor control. However, it is easily reconciled with dynamic retinotopic mapping if one accepts that these are two separate mechanisms for two separate processes. According to this composite view, dynamic retinotopic mapping pertains to the initial perception and memory of visual target locations, whereas an internal comparison with eye/head position and construction of signals defined relative to the head/body (Flanders et al., 1992; Brotchie et al., 1995; Crawford and Guitton, 1997) is something that occurs functionally downstream, in the kinematic computations required for the execution of movement.
These two separate stages of representation and visuomotor execution are illustrated schematically in Figure 9, in a “conversion-on-demand” model of visuomotor control. (The predictions of this model for single-unit recording are provided in Appendix B.) In the primary stages, i.e., initial perception and memory (Fig. 9A), visual target direction is stored dynamically in various retinotopic maps. As described above, these oculocentric representations must be remapped for each eye movement (Fig. 9B). This is the first stage, which we believe operates at a relatively global level, either on a global representation of visual space (Duhamel et al., 1992; Tian et al., 1996), on multiple intended targets (Mazzoni et al., 1996; Snyder et al., 1997), or perhaps on both within different parts of the cortex. The second stage begins with the selection of particular subsets of visual data (through further attentional and intentional mechanisms) relevant for behavior (Mazzoni et al., 1996). The first geometric transformation in this second process (Fig. 9C) would be a position-dependent eye-to-head reference frame transformation (Zipser and Andersen, 1988; Gauthier et al., 1990; Crawford and Guitton, 1997), followed by a series of transformations (Soechting et al., 1991, 1995; Flanders et al., 1992) necessary for motor execution (Fig. 9D–F). Owing to the computational complexity of such transformations (e.g., Crawford and Guitton, 1997), it is biologically economical to place the global representation stage as early as possible in this sequence, thereby avoiding unnecessary computations on inessential data. Thus, the optimal solution seems to be storage of visual signals in a sensory frame, held available on demand for the motor control systems of the brain.
Appendix A
3-D model for noncommutative retinotopic remapping
In our model, retinal error is represented by vectors directed toward the target in eye coordinates (Crawford and Guitton, 1997). The task then is to counter-rotate the vector representations within this map each time the eye rotates in space. Our model thus takes the mathematical form given below, where G is an oculocentric unit vector pointing toward a given target, the vertical and horizontal components of which are specified in the retinotopic map; c is a scalar representation of depth coded by binocular disparity; and Er, Hr, and Br are quaternion representations of eye, head, and body rotation derived from efference and afference copies during an orienting gaze shift (Radau et al., 1994). Rotation of the world relative to the eye (We) is first computed by inverting the multiplicative composite of Er, Hr, and Br:

We = (Er Hr Br)^−1

(see Tweed and Vilis, 1987, for definitions of quaternion multiplication and inversion). This is then used to rotate globally all retinotopic direction/depth vectors, for i = 1 to N, into the correct registry with the world:

ciGi(new) = We [ciGi(old)] We^−1.

Via attentional/intentional mechanisms, some vectors are then selected for visuomotor reference frame transformation, modeled as a rotation by the inverse of an eye position quaternion (Ep) into head coordinates:

cG(head) = Ep^−1 [cG(eye)] Ep

(Crawford and Guitton, 1997). This may undergo further reference frame transformations and be input into the equations that compute the 3-D kinematics for a specific movement (e.g., Fig. 9).
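For completeness, the equations above can be transcribed directly into code. The sketch below (our illustration; quaternion helpers as in the earlier example, all rotations unit quaternions) implements the global remapping and the selective eye-to-head transformation; the variable names mirror the symbols defined above.

```python
import numpy as np

def qmul(a, b):
    # Hamilton product of quaternions [w, x, y, z].
    w1, x1, y1, z1 = a; w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qinv(q):
    return np.array([q[0], -q[1], -q[2], -q[3]]) / np.dot(q, q)

def rotate(q, v):
    # q [0, v] q^-1: quaternion rotation of a 3-vector.
    return qmul(qmul(q, np.r_[0.0, v]), qinv(q))[1:]

def remap_all(cG, Er, Hr, Br):
    # We = (Er Hr Br)^-1; ciGi(new) = We [ciGi(old)] We^-1
    We = qinv(qmul(qmul(Er, Hr), Br))
    return [rotate(We, v) for v in cG]

def to_head_frame(cG_eye, Ep):
    # cG(head) = Ep^-1 [cG(eye)] Ep, for a target selected for action.
    return rotate(qinv(Ep), cG_eye)

# Usage: a 20 deg leftward eye-only gaze shift (head and body stationary)
# shifts a stored foveal trace 20 deg in the opposite direction.
Er = np.r_[np.cos(np.radians(10.0)), 0.0, 0.0, np.sin(np.radians(10.0))]
I = np.array([1.0, 0.0, 0.0, 0.0])
traces = [np.array([1.0, 0.0, 0.0])]                # c = 1, foveal G
print(np.round(remap_all(traces, Er, I, I)[0], 3))  # [0.94 -0.342 0.]
```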
Appendix B
Model predictions
Behavioral predictions
(1) In contrast to the vector-subtraction model of retinotopic remapping, our noncommutative model predicts that visual targets will be accurately remembered when a torsional eye movement occurs between seeing and saccading or pointing toward the target. (2) The same holds for roll movements of the head. (3) If simulated in three dimensions, the vector-subtraction model will predict a specific sequential accumulation of errors as the subject performs a “round-the-clock” pattern of saccades before indicating remembered target direction, approximately analogous to the errors simulated by Tweed and Vilis (1987). The noncommutative model predicts no such systematic accumulation of errors.
Single-unit recording
(4) As the neural basis for predictions 1 and 2 above, our model predicts that remapping will be observed in frontal, parietal, and collicular single-unit activity (e.g., Duhamel et al., 1992; Walker et al., 1995; Tian et al., 1996) during torsional eye and head movements, with peripheral representations in effect “circling” around the foveal region. (5) Single units have now been identified in posterior parietal cortex that carry spatial information during a delayed-response task and that specifically encode arm movements (Snyder et al., 1997). Our model predicts that these will be organized in retinotopic coordinates and will show remapping during a double saccade task. [If these same neurons possess gain fields (Andersen et al., 1985; Brotchie et al., 1995), they could theoretically serve a dual role in the visuomotor eye-to-body reference frame transformation.] (6) Neurons have recently been reported in premotor cortex that encode arm movement direction in retinal coordinates (Mushiake et al., 1997). If these carry sustained activity during delay periods, our model predicts that they will also show remapping. (7) Arm-related primary motor cortex neurons were not organized in retinal coordinates (Mushiake et al., 1997). Because this region also shows spatially selective activity that arises quite early in delay periods (Georgopoulos et al., 1982), this poses an apparent problem for our model, if the conversion-on-demand only occurs at the time of movement execution. However, this term refers to selective conversion, not temporal events. Therefore, the model can allow selective spatial information to pass into body coordinates at an early point, but this can still be updated closer to the time of the movement. This can be tested by recording from monkey primary motor cortex during our dynamic look–saccade–point task. If monkeys make visuomotor calibration errors like those reported here, we predict that only a small directional modulation (equal to just the bodycentric error arising from peripheral remapping), if any, will be detectable around the time of the saccade. If these last three predictions hold, this will suggest that our conversion-on-demand hypothesis describes the neural transformations between posterior parietal/premotor cortex and primary motor cortex.
Footnotes
This work was supported by a Natural Sciences and Engineering Research Council of Canada Grant to J.D.C. and by the Sloan Foundation. J.D.C. is a Canadian Medical Research Council Scholar and an Alfred P. Sloan Fellow. We thank Drs. I. Howard, M. Steinbach, K. Grasse, and H. Ono and two anonymous referees for critical comments. We also thank L. Harris for creative input into Figure 1 and J. Lawrence for technical assistance.
Correspondence should be addressed to Dr. J. D. Crawford, Department of Psychology, York University, 4700 Keele Street, Toronto, Ontario, Canada, M3J 1P3.