Abstract
We analyzed the coordination between gaze behavior, fingertip movements, and movements of the manipulated object when subjects reached for and grasped a bar and moved it to press a target-switch. Subjects almost exclusively fixated certain landmarks critical for the control of the task. Landmarks at which contact events took place were obligatory gaze targets. These included the grasp site on the bar, the target, and the support surface where the bar was returned after target contact. Any obstacle in the direct movement path and the tip of the bar were optional landmarks. Subjects never fixated the hand or the moving bar. Gaze and hand/bar movements were linked concerning landmarks, with gaze leading. The instant that gaze exited a given landmark coincided with a kinematic event at that landmark in a manner suggesting that subjects monitored critical kinematic events for phasic verification of task progress and subgoal completion. For both the obstacle and target, subjects directed saccades and fixations to sites that were offset from the physical extension of the objects. Fixations related to an obstacle appeared to specify a location around which the extending tip of the bar should travel. We conclude that gaze supports hand movement planning by marking key positions to which the fingertips or grasped object are subsequently directed. The salience of gaze targets arises from the functional sensorimotor requirements of the task. We further suggest that gaze control contributes to the development and maintenance of sensorimotor correlation matrices that support predictive motor control in manipulation.
- eye–hand coordination
- object manipulation
- grasping
- obstacle avoidance
- hand movement
- saccadic eye movements
Human gaze behavior has been studied in various natural dynamic activities, including driving (Land, 1992; Land and Lee, 1994; Land and Horwood, 1995), music reading (Goolsby, 1994; Kinsler and Carpenter, 1995; Land and Furneaux, 1997), typing (Inhoff and Wang, 1992), walking (Patla and Vickers, 1997), throwing in basketball (Vickers, 1996), putting in golf (Vickers, 1992), and batting in cricket (Land and McLeod, 2000). Although the use of gaze in these activities is highly task-specific, a common finding is that subjects appear to control gaze shifts and fixations proactively to gather visual information for guiding movements. Concerning control of dexterous object manipulation, despite the importance of vision in general terms, only a few studies have examined gaze strategies in natural manipulation. Land and colleagues (1999) investigated gaze behavior during “tea making” in terms of object-oriented actions (e.g., “lift kettle,” “lid to kettle,” and “milk to mug”). Each of these actions is typically associated with four to six fixations directed to objects involved in the act, with vision typically leading action by 1 sec or less. Ballard and colleagues (1992, 1995; Smeets et al., 1996) examined eye–hand coordination when subjects moved blocks from a pickup area and placed them according to a visible model. Subjects invariably fixated a block before picking it up and the landing surface before placing the block. However, neither of these studies examined the precise spatial and temporal relation between gaze fixations and object-oriented actions. Thus, fundamental questions remain regarding the role of gaze fixations and shifts in the control of manipulation.
Under the hypothesis that the brain uses gaze fixations to obtain spatial information for controlling manipulatory actions, a central issue is whether there are critical landmarks to which the gaze is drawn and how these landmarks impinge on the action. For example, when subjects direct their gaze to an object to be grasped and subsequently moved, do they fixate specific parts of the object such as the grasp site (to guide finger contact) or protruding edges (to gain shape information to be used in motion planning)? When moving a grasped object around an obstacle, does fixation of the obstacle support motion planning? A related issue concerns the probability with which various landmarks are fixated depending on their role in the task. Still another issue is the temporal relation between extraction of spatial information by gaze fixations and manual actions that make use of this information. Does this timing vary across phases of the task, and are there specific epochs during which gaze and hand events are coupled?
The present account provides novel insights into these questions. We analyzed the coordination between fingertip movements, movements of a manipulated object, and gaze behavior in a task that required grasping and lifting of an object and subsequent motion planning with the object in hand. Specifically, subjects were asked to reach for and grasp a bar and then move it to contact a target, either directly or around various obstacles.
MATERIALS AND METHODS
Subjects and general procedure
Four women and five men between 22 and 52 years of age participated in the experiments after providing informed consent, and the experimental protocol was conducted according to the Declaration of Helsinki. The subjects were all right-handed, did not require corrective lenses, and had no history of ophthalmological or neurological disease. Figure 1A illustrates the experimental setup. While seated behind a table, subjects used the tips of the right index finger and thumb to grasp and manipulate a bar (2 × 2 × 8 cm) located on a horizontal support surface formed by the top of a wooden stand placed on the table. The color of the bar was gray. All goal-directed bar movements took place in a frontal plane, termed the work plane, located 39 cm in front of the center of the subjects' eyes. We recorded the position of gaze, expressed as the point of intersection between the work plane and the line of sight of the right eye. An electronic shutter located between the eye and the work plane could be used to block the view of the scene at any time; the view of the left eye was always blocked. A black drape positioned 1 m behind the work plane provided a dark background (not illustrated in Fig. 1A). In addition to gaze, we recorded the three-dimensional position and orientation of the bar and the tips of the right index finger and thumb using sensors in the object and attached to the fingernails. Between trials, subjects grasped a “parking bar” between the right index finger and thumb (Fig. 1A). This bar was fixed on the tabletop 29 cm below the support surface. The subjects wore soundproof earphones with white noise to eliminate auditory cues related to changes of the experimental setting. These earphones were also used for verbal instructions.
Apparatus
Gaze recording. An infrared video-based eye-tracking system (RK-726PCI pupil/corneal tracking system; ISCAN, Inc., Burlington, MA) was used to record the position of gaze in the work plane at 120 samples/sec. The eye-imaging camera, the infrared light source (eccentric), and the dichroic mirror were mounted on a wooden frame that was fixed to the table (Fig. 1A). To stabilize the head, subjects bit on a U-shaped stainless steel plate (Protar, KaVo; EWL, Leutkirch, Germany) anchored to the support frame of the apparatus. Both sides of the plate were coated with wax (Alminax; Associated Dental Products Ltd., Wiltshire, UK), and impressions of the dentition in the wax provided high stability of the head. To obtain such impressions, the subjects initially bit on the wax after it had been prewarmed. The rectangular area of the work plane calibrated for tracking the gaze position of the right eye was 14 cm high and 19 cm wide. The line of sight of the right eye was perpendicular to the work plane when the subjects gazed at its horizontal center, 4 cm below its upper limit, and the calibrated area extended 2 cm below the support surface.
Recording of hand and object movements. We recorded at 30 samples/sec the three-dimensional positions and orientation (elevation, azimuth, and roll angles) of the bar and of the tips of the index finger and thumb using miniature electromagnetic position-angle sensors (FASTRAK; Polhemus, Colchester, VT). One sensor was fitted inside the bar, and the connecting cable came out at its rear lower edge. Each fingertip sensor (spherical, 11 mm diameter) was mounted on a small Perspex plate shaped to the profile of the fingernail. The plate was attached to the fingernail by double-sided sticky tape, and the connecting cables were taped to the digits. The connecting cables delivered with the position-angle sensors were all substituted with custom-made, light, flexible cables that were painted black. In the experimental work-space, the accuracy of the position measurement was >0.5 mm (resolution: 0.12 mm) and that of the angle measurement >1° (resolution: 0.025°).
We represented the positions of the tips of the index finger and thumb by the contact sites that subjects preferred to use when grasping the bar. These sites were estimated while the subjects grasped the parking bar that had the same depth (corresponding to grasp width) and height as the manipulated bar. Symmetrically located on each side of the parking bar were two hemispherical bumps (diameter 3 mm). The subjects were asked to grasp the bar at these bumps located 13.5 cm to the right of the vertical line through the center of the work plane. We used the known locations of the bumps in the calibrated space to offset the fingertip sensors with respect to orientation and position of the preferred contact sites of the digits.
Electronic shutter and fixation light. The electronic shutter (Speedglas; Hörnell International AB, Gagnef, Sweden), located ∼8 cm in front of the right eye, had opening and closure times of 15 and 10 msec, respectively (Fig. 1A). A fixation light (3 mm diameter red LED) against a black background could be presented to the subjects' left eye through a mirror that was located in front of the left eye behind the plane of the shutter. The viewing distance of the LED was 39 cm. When the subjects fixated the light, the gaze position was in the upper right quadrant of the work plane, located nominally 4 cm to the right of the vertical midline of the work plane (12.2 cm to the right of the target; see below) and 9 cm above the support surface.
Tasks
We report data obtained from a target contact task performed by the subjects as a part of a series of bar manipulation tasks studied in the same experimental session. In the target contact task, subjects grasped the bar by its right end and moved it such that its left end contacted a target. The target was a red 1.2 cm cube positioned 12.5 cm above the support surface on the top of a red stand at the left side in the work plane (Fig. 1B,C). The target was mounted on a spring-loaded micro-switch that distinctly yielded when the subjects had displaced its right surface by 2 mm in the left direction. After contacting the target, the subjects replaced the bar on the support surface. The subjects first performed four consecutive trials without an obstacle in the path of the bar and then four trials with each of two obstacles that had to be avoided. The red obstacle was mounted on the same post as the target, below the target. One obstacle had a rectangular shape and one was triangular (Fig. 1B) (see Fig. 4A for all three obstacle conditions). The depth of the obstacles was 4 cm, and the side facing the subjects was aligned to the work plane. The presentation of the quartet of trials with each of the two obstacles was balanced across the subjects.
Each trial was initiated by an auditory cue (1 kHz beep for 200 msec) followed by the opening of the shutter allowing vision of the object. When the trial was completed, the shutter closed. This was triggered by the digits arriving in the zone of the parking bar (see below). To obtain a reasonably uniform start position of gaze, the subjects gazed at the fixation light during the inter-trial periods. The fixation light was turned off when the shutter was opened and the subjects performed the task. However, in one additional test series with the rectangular obstacle (four trials), the fixation light remained on during that task, and eye movements were prevented by requiring subjects to hold gaze on the light. In all test series, between trials we arbitrarily varied the distance between the left tip of the bar and the stand from 0.8 to 4.6 cm (mean 1.8 cm). The time interval between trials was 5–8 sec.
Before the test series, the experimenter demonstrated each type of trial, and the subjects were instructed to do the task at their preferred speed. The subjects were asked to hold the parking bar with the tips of the index finger and thumb between trials. All instructions were fed verbally through the earphones; the experimenter toggled a switch that interrupted the noise and connected the sound recorded by an ambient microphone to the earphones.
Gaze calibration procedure
We used a two-step calibration procedure to obtain gaze data with satisfactory spatial accuracy. For initial calibration, we used the point-of-regard calibration routine of ISCAN's Line-of-site Plane Intersection Software. The subject was asked to look sequentially at five 3-mm-diameter LEDs that were illuminated one by one. These were mounted on a flat surface aligned with the work plane, with one LED located in the center and one in each corner of the work plane. For the final calibration, we used calibration measurements taken repeatedly during the experiments. Before the first test series and between every third block of trials, the subjects gazed sequentially at nine points on the same surface. These included the same five used during the initial calibration and four additional points located at the midpoint of the four lines that defined the rectangular work plane. Each sampled data point obtained during the experiment was calibrated off-line using data obtained from the nearest calibration measurement before and after the point. A satisfactory gaze recording required that the eyelid did not partly cover the pupil during any phase of the tasks (except during blinks). Therefore, we fixed the subjects' eyebrow in an uplifted position by attaching a tape between the eyebrow and the forehead in a manner that did not prevent the subjects from blinking.
Analysis
Data were sampled and analyzed using the SC/ZOOM system (Physiology Section, IMB, Umeå University). All signals (gaze and kinematic data) were time synchronized and stored at 200 Hz using linear interpolation between consecutive measurements. Data were sampled from 1 sec before the opening of the shutter until its closure. To analyze gaze–hand coordination in a common frame of reference, we projected data pertaining to the line of sight and the positions and orientations of the fingertips and bar to the work plane defined in the world-coordinates of the FASTRAK system.
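The resampling step can be illustrated with a short sketch. It is not the SC/ZOOM implementation, which is not described here; the function name `resample_linear` and its arguments are hypothetical, and the only thing taken from the text is the idea of placing lower-rate samples (e.g., 30 or 120 samples/sec) on a common uniform grid (200 Hz by default) by linear interpolation between consecutive measurements.

```python
from bisect import bisect_right


def resample_linear(times, values, rate_hz=200.0):
    """Resample (times, values) onto a uniform grid by linear interpolation.

    `times` must be strictly increasing (seconds). Hypothetical helper for
    illustration; not the original analysis software.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) samples")
    step = 1.0 / rate_hz
    # Number of grid points that fit between the first and last sample.
    n_out = int((times[-1] - times[0]) / step + 1e-9) + 1
    out_t, out_v = [], []
    for k in range(n_out):
        t = times[0] + k * step
        # Index of the measurement at or just before t, clamped so that
        # i + 1 is always a valid index.
        i = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
        frac = (t - times[i]) / (times[i + 1] - times[i])
        out_t.append(t)
        out_v.append(values[i] + frac * (values[i + 1] - values[i]))
    return out_t, out_v
```

Applying the same function to every gaze and kinematic channel gives signals that share one time base, which is what allows events in different channels to be compared sample by sample.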
Phases of the trials defined kinematically. We divided the target contact task into eight consecutive phases (Fig. 1B,C). (1) Pre-reach phase: the period from the opening of the shutter until the hand began to move, defined as the point when the tip of the index finger had moved 2 cm from its parking position; (2) reach phase: the period from the start of hand movement (as defined above) until the moment the straight distance between the index finger and the forthcoming grasp site became <5 cm. The grasp site was defined as the position of the index finger in object coordinates at the start of bar movement. This moment was instrumentally defined as the time when the bar velocity exceeded 2 cm/sec in any direction; (3) grasp phase: the period from the end of the reach phase until the start of bar movement. The time of contact between digits and the bar was defined as the moment the straight distance between the index finger and the grasp site (computed in three-dimensional space) went below 2 mm; (4) up phase: the period from the start of bar movement as defined above until the straight distance between the left tip of the bar and the target was 3 cm; (5) target phase: the period during which the left tip of the bar was within 3 cm of the center of the target contact surface. The yield of the switch was 2 mm. Therefore, the time of target contact was defined as the moment the horizontal position of the tip was <2 mm from the position recorded when the bar fully depressed the target switch. The switch release time was the moment the horizontal velocity of the tip of the bar first exceeded 2 cm/sec during the retraction from the target; (6) down phase: the period after the target phase when the bar was moved toward the table support. The end of this phase was the moment the vertical distance between the centroid of the bar and its final position on the support surface became <3 cm; (7) replace phase: the period between the end of the down phase and the moment the bar was repositioned on the support surface, defined as the moment bar velocity dropped below 2 cm/sec in any direction and the distance between the lowermost point of the bar and the support surface was <2 mm; and (8) reset phase: the transport of the hand to the parking position after the replace phase. The reset was completed when the distance between the tip of the index finger and its bump on the parking bar went below 2 cm, which triggered closure of the shutter.
Gaze signals. To determine the gaze position in the work plane, for each coordinate of measurement (horizontal, vertical) we combined two signals provided by the ISCAN Line-of-site Computation and Plane Intersection Software. One signal was the initially calibrated “scene image point-of-regard position.” This signal provided a low-pass-filtered representation of gaze position (lagging moving average of 10 samples at 120 Hz). In addition to low bandwidth, this signal suffers from an apparent delay in the representation of gaze position, and in contrast to specifications given by the ISCAN Company, this filter could not be changed or removed. The other signal was the “pupil position” signal, which provided the corresponding nonfiltered data but did not benefit from the corneal reflection to correct for slow drifts caused by head movements, etc. To obtain a measure of gaze position (G) with adequate temporal resolution and spatial accuracy, the point-of-regard signal (R) and pupil position signal (P) were combined as follows:

$$G_n = (P_n - P_{n-1}) + G_{n-1} + 0.042\,(R_n - G_{n-1}) + 0.125\,(R_n - R_{n-1}),$$

where $G_n$ and $G_{n-1}$ stand for the corrected signal at a given sample ($n$) and at the previous sample ($n-1$), respectively. The coefficients 0.042 (0.025 at 200 Hz sample rate) and 0.125 essentially reflected the properties of the built-in low-pass filter of the ISCAN system. Figure 2A illustrates at high time resolution the derived gaze position (G) in the horizontal and vertical dimensions together with the recorded R and P signals.
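The recursion for G can be sketched in a few lines. This is an illustration under stated assumptions rather than the original analysis code: the function name `combine_gaze` is hypothetical, the R term in the pull-back bracket is read as the current sample, and the first output sample is seeded with the first point-of-regard value because the text does not say how the recursion was started.

```python
def combine_gaze(P, R, k_pull=0.042, k_lead=0.125):
    """Combine pupil-position (P) and point-of-regard (R) samples into G.

    One coordinate (horizontal or vertical) at a time; P and R must be
    calibrated to the same units and have equal length.
    """
    if len(P) != len(R) or not P:
        raise ValueError("P and R must be non-empty and of equal length")
    G = [R[0]]  # assumption: seed with the first point-of-regard sample
    for n in range(1, len(P)):
        g_prev = G[-1]
        G.append(
            (P[n] - P[n - 1])               # fast changes follow the unfiltered pupil signal
            + g_prev
            + k_pull * (R[n] - g_prev)      # slow pull toward the drift-corrected signal
            + k_lead * (R[n] - R[n - 1])    # partial compensation for the filter lag
        )
    return G
```

With constant inputs the output stays at the seed value, and a step in P alone appears in G immediately and then decays toward R at a rate set by `k_pull`.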
Before computing G, both signals (R and P) were subjected to the following off-line calibration procedure. To calibrate each data point obtained during the target contact task, we used the nearest nine-point calibration measurement before and after the data point. Separate multiple linear regressions in the horizontal (x) and vertical (y) dimensions were applied to data obtained from both calibration measurements. The terms included in the regression were $x$, $y$, $x^2$, $y^2$, $xy$, $x^2y$, and $xy^2$. The resultant regression coefficients were used to scale the data point obtained in the interval between the two calibration measurements. Figure 2B shows worst-case estimates of the final error in gaze measurements for the x and y coordinates, respectively. We obtained these error distributions by computing the difference between the gaze positions recorded during all nine-point calibration measurements and the corresponding positions predicted from the calibration episode before and after (data pooled from all 9 calibration points obtained in 10 calibration measurements for each of the subjects). The SDs of the error distributions in x and y were ±0.34 and ±0.36 cm in the work plane, respectively. This corresponds to 0.50 and 0.52° angle of gaze.
Measurements of gaze. We measured the locations and duration of all gaze fixations. In addition, we measured the frequency of fixations at specific landmark regions in the visual scene and the sequence of fixations of these regions as described in Results. We defined a gaze fixation as the period between the end of a saccade and the start of the subsequent saccade. Unless indicated otherwise, the position of the gaze during a fixation was defined as the mean values of all sampled x and y values during the entire epoch of fixation.
We detected the occurrences of saccades based on a filter applied to the gaze position signals. First, gaze velocity in the work plane was assessed from the vectorial sum of the first time differentials of the gaze position signals in x and y using ±6 point numerical differentiation (±30 msec moving window; each sample point had the same weight). We then computed the second time differential of the gaze velocity, again using ±6 point numerical differentiation, and a saccade was scored when the amplitude of the negative peak of this differential exceeded 150 m/sec³ (22,000°/sec³). The peak gaze velocity of the detected saccades was assessed by the vectorial sum of the first time differentials of the gaze position signals in x and y using a ±1 point numerical differentiation. Likewise, the start and end of a saccade were defined by the first and second maximums of the second time derivative of the gaze velocity signal assessed by ±2 point numerical differentiation (see Fig. 2A). The straight distance in the work plane between gaze positions at the start and end of the saccade represented the gaze displacement during a saccade, i.e., the saccade amplitude. Blinks were detected from a transient reduction in the pupil size measurement provided by the eye tracking system. Gaze shifts >1 cm during a blink were regarded as blink saccades. Blink saccades constituted only 2.6% of the total number of observed saccades.
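A rough sketch of the detection criterion follows. It rests on several assumptions that the text does not settle: the "±6 point numerical differentiation" is implemented as a symmetric difference across ±6 samples, the signals are sampled at 200 Hz (so that ±6 samples span ±30 msec), positions are in centimeters (150 m/sec³ = 15,000 cm/sec³), and the second time differential is obtained by applying the same operator twice to the gaze speed. The function names are hypothetical, and the sketch only scores saccades; it does not locate their start and end.

```python
import math


def central_diff(x, k, dt):
    """Symmetric difference over +/-k samples; edge samples are left at 0.0."""
    out = [0.0] * len(x)
    for n in range(k, len(x) - k):
        out[n] = (x[n + k] - x[n - k]) / (2 * k * dt)
    return out


def detect_saccades(xs, ys, k=6, dt=0.005, threshold=15000.0):
    """Return one sample index per detected saccade.

    xs, ys: gaze position in the work plane (cm). A saccade is scored where
    the second time differential of gaze speed dips below -threshold.
    """
    vx = central_diff(xs, k, dt)
    vy = central_diff(ys, k, dt)
    speed = [math.hypot(a, b) for a, b in zip(vx, vy)]  # vectorial sum
    second = central_diff(central_diff(speed, k, dt), k, dt)
    hits, n = [], 0
    while n < len(second):
        if second[n] < -threshold:
            start = n
            while n < len(second) and second[n] < -threshold:
                n += 1
            # Report the most negative sample of each supra-threshold run.
            hits.append(min(range(start, n), key=second.__getitem__))
        else:
            n += 1
    return hits
```

Because the criterion looks at the curvature of the speed profile rather than at speed itself, slow drifts during fixation produce values far below threshold even when they cover a measurable distance.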
To our knowledge, basic saccade parameters have not been reported previously in a natural visuomotor task involving real objects. However, several factors are known to influence the characteristics of saccades. In addition to idiosyncratic factors, the characteristics of saccades depend on the orbital direction of the eye movement (e.g., centrifugal or centripetal, temporal or nasal), the way the target position is designated (visually or by recall from memory, etc.), and the attentional state of the subject (for review see Becker, 1991). In Figure 2C–F we summarize some saccade parameters based on 1316 saccade-fixation episodes recorded in the target contact task (blink saccades not included). The distribution of saccade duration was skewed positively with a median duration of 43 msec (25–75th percentile: 35–53 msec) (Fig. 2C). Similarly, the saccade amplitudes were positively skewed with a median value of 2.2 cm (1.0–4.8 cm), which corresponds to a 3.2° gaze shift (1.5–7.1°) (Fig. 2D). In accordance with previous observations concerning saccades (Becker, 1991), increased saccade amplitude was associated with an increased gaze velocity (p < 0.001; Spearman rank correlation) with an obvious saturation tendency for gaze velocity (Fig. 2E). Likewise, increased saccade amplitude was accompanied by increased saccade duration (p < 0.001) (Fig. 2F) (Robinson, 1964; Baloh et al., 1975; Körner, 1975; Collewijn et al., 1988). A linear regression between saccade amplitude and duration provided coefficients that matched those reported previously for human saccades (Becker, 1991); the y-axis intercept in Figure 2F was 34 msec, and the saccade duration increased by 2 msec per degree of amplitude.
In addition to saccadic shifts, the gaze position could drift during the fixation periods between saccades. This drift (median: 0.5 cm or 0.7°) was nearly an order of magnitude smaller than the gaze shifts mediated by saccades and could not be explained by calibration drift. There was no reliable correlation between the amplitude of the gaze drift and fixation duration, suggesting that this drift largely belonged to the post-saccadic eye movements referred to as glissades (Bahill and Clark, 1975; Bahill et al., 1978; Kapoula et al., 1986; Collewijn et al., 1988; Kowler, 1991). Figure 2G shows the distribution of duration of all fixations recorded during the target contact tasks. The duration ranged from 25 msec to 1.9 sec and was positively skewed (median: 286 msec; 25–75th percentile: 197–536 msec).
Nominal landmark and landmark zones. We defined gaze landmark zones in the work plane for the grasp site, the left tip of the bar, the target, the protruding element of the obstacle, and the support surface (for rationale, see Results). The grasp site landmark was represented as a point on the bar as defined above, i.e., as the position of the index finger in object coordinates at the start of bar movement. The tip of the bar was represented by the midpoint of the left end of the bar, and the target was represented by the midpoint of the right surface of the target. For the triangular object, the landmark was the right protruding tip of the triangle, and for the rectangular object, it was a vertical line coinciding with the right vertical edge of the obstacle. A horizontal line that coincided with the edge of the support surface in the work plane represented the landmark of the support surface. Unless specified otherwise, gaze landmark zones corresponded to an area radiating 2 cm (3° visual angle) in the work plane in all directions from the corresponding landmark.
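The correspondence between distances in the work plane and visual angle, and the zone membership test, can be sketched as follows. The helper names are hypothetical. The angle conversion assumes that the offset is measured from the point where the line of sight is perpendicular to the work plane, at the 39 cm viewing distance given above; for gaze positions far from that point it is only approximate. Point landmarks (grasp site, tip of the bar, target, triangular obstacle) are given as (x, y), and line landmarks (rectangular obstacle edge, support surface) as a pair of end points.

```python
import math

VIEW_DISTANCE_CM = 39.0  # eye to work plane, from the setup description


def cm_to_deg(offset_cm, distance_cm=VIEW_DISTANCE_CM):
    """Visual angle (degrees) subtended by an offset in the work plane."""
    return math.degrees(math.atan2(offset_cm, distance_cm))


def dist_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_zone(gaze, landmark, radius_cm=2.0):
    """True if a fixation lies within radius_cm of a point or line landmark."""
    if isinstance(landmark[0], (int, float)):  # point landmark (x, y)
        return math.hypot(gaze[0] - landmark[0], gaze[1] - landmark[1]) <= radius_cm
    return dist_to_segment(gaze, landmark[0], landmark[1]) <= radius_cm
```

For example, `cm_to_deg(2.0)` gives roughly 2.9°, consistent with the 2 cm zone radius being quoted as 3° of visual angle.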
Statistics. In addition to linear-regression analysis (least-squares fit), we used nonparametric statistics (Siegel and Castellan, 1988) as indicated in Results. The level of probability chosen as statistically significant was p < 0.05. Unless stated otherwise, data distribution parameters (e.g., median and percentiles) given in the text refer to data from all subjects pooled.
RESULTS
The results are divided into eight sections. First, we introduce some general features of eye and hand movements based on a single target contact trial (1). We then analyze landmarks in the scene that attract subjects' gaze during our task (2) and assess the pattern of sequential landmark fixations within and across trials and across obstacle conditions (3). The spatiotemporal coordination of gaze and hand actions is then described (4). In the subsequent section we specifically address how the obstacle condition influenced various fixation parameters and establish that certain landmarks are obligatory and others are optional (5). We then analyze the temporal coordination between gaze shifts entering and exiting landmark zones and the specific kinematic events associated with the landmark (6). We also address the spatial accuracy of saccadic gaze shifts in manipulation (7). Finally, we deal with shortcomings in manipulatory behavior if eye movements are prevented and subjects have to rely entirely on peripheral vision and memory (8).
Gaze–hand coordination in a single target contact trial
Figure 3 shows the pattern of eye and hand movements for a single trial involving the triangular obstacle. Figure 3A shows movements up until the bar contacts the target, and Figure 3B shows movements from that moment until the end of the trial. The dashed line in each panel represents the position of the tip of the index finger, and the solid line represents gaze position. The numbered circles indicate gaze fixations and their sequence. A corresponding number indicates the path of the index finger during each fixation and subsequent saccade. Consecutive fixation, saccade, and hand path units are represented in alternating colors of gray and black.
Several points can be gleaned from Figure 3. First, gaze and hand movements were linked with respect to key landmarks, with gaze leading the hand. Gaze fixated the region of the grasp site (2 and 3) throughout the reach toward the bar (Fig. 3A). When the index finger arrived at the grasp site, gaze shifted to the tip of the bar (4), and when the bar started to move, a saccade was made to the tip of the obstacle (5, 6). Gaze then shifted to the vicinity of the target (7, 8, 9) as the bar rounded the obstacle. When the bar was still in contact with the target, gaze shifted back to the obstacle (Fig. 3B, 10, 11). As the bar and hand rounded the obstacle, the gaze shifted to the support surface (12) and remained there (13, 14) until the bar was replaced. The leftward gaze motion between 13 and 14 represents a pronounced gaze drift that occurred during fixation 14. The amplitude of this drift was at the upper extreme of the drifts that we observed. As illustrated in the figure, each landmark could be fixated more than once. The number of fixations in this trial was close to the average number of fixations per trial in the triangular obstacle condition (see further below). Note that subjects never fixated or tracked the hand or moving bar during the task.
Fixation landmarks
Subjects thus directed gaze almost exclusively to objects involved in the task. Furthermore, gaze was directed to landmarks on these objects that were important in the task. These included the forthcoming grasp site on the bar, the left tip of the bar used to contact the target, the protruding point(s) on the obstacle, the target, and the support surface.
Figure 4A shows the distribution of all gaze fixations, from all subjects and trials, from the time gaze first left the fixation zone to the moment the bar had been released and the hand moved to the parking position. Separate plots are shown for each obstacle condition (none, rectangular, and triangular). In addition, for each condition, we show two plots representing fixations that started before (left) and after (right) the tip of the bar entered the target zone. The black circles represent fixations within 3° (2 cm in the work plane) of one of the five landmarks, i.e., fixations within the landmark zones (see Materials and Methods). The gray circles represent fixations outside these landmark zones. The area of each black and gray circle is proportional to the duration of the fixation. The solid bars indicate the mean positions of the bar at the start (left panels) and end (right panels) of the trial; the dashed extensions represent the range of positions.
Figure 4A illustrates that the majority of fixations were located close to the landmarks. The 1351 fixations shown in Figure 4A corresponded to an average of 12.5 fixations per trial. Of the total number of fixations, 1109 (82%) were within 3° of one of the landmarks. In addition, the durations of fixations within the landmark zones (median = 0.34 sec) were significantly longer than those outside the zones (median = 0.19 sec) (Mann–Whitney U test; p < 0.001). On average, subjects directed their gaze toward landmarks 90% of the total time spent looking at the scene.
The scatter of fixation points within the grasp site and tip of bar zones is relatively large because the horizontal position of the bar varied from trial to trial. The scatter of fixation points within the grasp site zone was further affected by subjects' choice of grasp site. Figure 4B shows the distribution of gaze fixations within these landmark zones normalized for variations in bar position and grasp site, with the positions of grasp site fixations referenced to the mean grasp site for all trials by all subjects. Figure 4B also shows the distribution of gaze fixations within the landmark zones of the target and the obstacle. The dotted-line circles in Figure 4B represent the landmark zones (3° radius), and the dots that represent fixation location are not scaled for duration of fixation. Gaze fixations were not evenly distributed within the 3° radius that we used to define our landmark zones. Instead they tended to be clustered, and the centers of these clusters did not align perfectly with the nominal landmarks. Moreover, the degree of clustering appeared to vary across landmarks. To quantify this clustering, we first determined the center of gaze as the mean horizontal (x) and vertical (y) position of all fixations within each landmark zone. We then computed, for each landmark, the diameter of a circle about the center of gaze that captured 90% of the fixations (Fig. 4B, solid-line circles). Diameters were obtained for the tip of the bar, the grasp site, and the target (combining data from all three obstacle conditions) as well as for the tip of the triangular obstacle for the periods before and after the tip of the bar entered the target zone. The diameters obtained for the target (3.7°) and tip of the bar (3.3°) were smaller than those obtained for the grasp site (5.0°) and the obstacle for fixations before (5.2°) and after (5.6°) the tip of the bar entered the target zone, respectively.
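The clustering measure described above can be sketched as follows. The function name `capture_circle` is hypothetical, and one detail is an assumption: the text does not say how the 90% boundary was resolved when it falls between two fixations, so this sketch takes the smallest circle about the center of gaze that contains at least the requested fraction.

```python
import math


def capture_circle(fixations, fraction=0.9):
    """Center of gaze and diameter of the circle capturing `fraction` of fixations.

    fixations: list of (x, y) positions within one landmark zone.
    Returns ((cx, cy), diameter) in the same units as the input.
    """
    if not fixations:
        raise ValueError("no fixations given")
    n = len(fixations)
    cx = sum(x for x, _ in fixations) / n  # center of gaze: mean x and y
    cy = sum(y for _, y in fixations) / n
    dists = sorted(math.hypot(x - cx, y - cy) for x, y in fixations)
    needed = math.ceil(fraction * n)       # fixations that must lie inside
    radius = dists[needed - 1]
    return (cx, cy), 2.0 * radius
```

A smaller diameter for the same number of fixations indicates tighter clustering, which is how the target and tip of the bar compare with the grasp site and obstacle in the values reported above.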
Inspection of Figure 4, A and B, reveals that there were offsets between the center of gaze and the nominal landmark for the target, tip of bar, and obstacle. Most notably, for the target and the obstacle, gaze was directed, on average, to a point in space displaced from the physical landmark. The centers of the solid-line circles in Figure 4B defining the gaze distributions for the target and obstacle were located 7 mm from the center of target contact surface and 3 mm from the tip of the obstacle, respectively. For fixations related to the obstacle, one intriguing idea is that they served as a virtual target through which the tip of the bar traveled en route to the target. However, this appeared not to be the case, because the closest approach of the tip of the bar was considerably farther from the obstacle (Fig. 4A). Moreover, the location of gaze fixation did not correlate across trials with the position of the tip of bar at its closest approach.
For the grasp site zone, the center of gaze was located ∼0.5 cm above the grasp site, but its horizontal position was close to the grasp site. Importantly, the horizontal position of the forthcoming grasp site was directly related to gaze position on a trial-by-trial basis. Figure 4C shows, for all trials, the location of the grasp site (○) in bar coordinates and the gaze position of grasp site fixations (●). The gaze position is represented as the mean gaze position of grasp site fixations for each trial; in a given trial, there could be several fixations within the grasp site zone. The scatter plot in Figure 4C plots, for the same data, the horizontal position of the grasp site against that of gaze fixation. These positions were positively correlated (r = 0.76; p < 0.001), and the slope was close to 1. Thus, on a trial-by-trial basis, the gaze position appeared to predict the forthcoming grasp position.
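The trial-by-trial relation between gaze and grasp position can be summarized by a correlation coefficient and a regression slope. The following is a minimal sketch, assuming one mean horizontal gaze position and one horizontal grasp position per trial; `pearson_and_slope` and its argument names are hypothetical, and an ordinary least-squares fit of grasp position on gaze position is assumed.

```python
import math

def pearson_and_slope(gaze_x, grasp_x):
    """Return (r, slope) for grasp position regressed on gaze position.

    gaze_x:  per-trial mean horizontal gaze position (assumed input)
    grasp_x: per-trial horizontal grasp site position (assumed input)
    """
    n = len(gaze_x)
    mean_gaze = sum(gaze_x) / n
    mean_grasp = sum(grasp_x) / n
    sxx = sum((x - mean_gaze) ** 2 for x in gaze_x)
    syy = sum((y - mean_grasp) ** 2 for y in grasp_x)
    sxy = sum((x - mean_gaze) * (y - mean_grasp)
              for x, y in zip(gaze_x, grasp_x))
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    slope = sxy / sxx                # least-squares slope
    return r, slope
```

A slope near 1 together with a high r, as reported above, would mean that a shift in gaze position corresponds to an almost equal shift in the grasp site.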
Sequence of landmarks fixated
Although the sequence of landmarks fixated could vary across subjects and trials within subjects, during any given trial the sequence was clearly linked to the progress of the task. To assess the pattern of sequential landmark fixations, we first determined, for each trial, the sequence of landmarks that were fixated. On the basis of these sequences, we then determined how often the gaze went from a given landmark to each of the other landmarks and expressed this number as a proportion of all shifts between landmarks. Note that the number of landmark shifts was smaller than the total number of saccades because of multiple fixations within landmark zones and the occurrence of fixations outside these zones. Likewise, as will be shown below, not all landmarks were always fixated. Occasionally, subjects could revisit a given landmark during a trial and thus fixate it more than once.
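The tabulation of gaze shifts between landmarks can be sketched as follows. This is an illustrative sketch only: the function name, the labels, and the use of `None` for fixations outside all landmark zones are assumptions. It also assumes that fixations outside the zones are ignored and that consecutive fixations within the same zone count as a single visit.

```python
from collections import Counter

def landmark_shift_proportions(trials):
    """Return {(from_landmark, to_landmark): proportion of all shifts}.

    trials: list of per-trial fixation label sequences; None marks a
    fixation outside every landmark zone (assumed encoding).
    """
    counts = Counter()
    for seq in trials:
        # Ignore fixations outside the landmark zones
        visited = [lm for lm in seq if lm is not None]
        # Collapse multiple successive fixations within one zone
        collapsed = [lm for i, lm in enumerate(visited)
                     if i == 0 or lm != visited[i - 1]]
        # Count each shift from one landmark to the next
        counts.update(zip(collapsed, collapsed[1:]))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()} if total else {}
```

Because revisits are kept, a landmark fixated twice in a trial contributes to two separate shifts, in line with the note above that subjects could occasionally revisit a landmark.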
The arrows in Figure 5 illustrate the flow of gaze fixations between landmarks for each obstacle condition based on landmark-shift data from all trials by all subjects. For each obstacle condition, the width of each arrow represents the proportion of all gaze shifts between landmarks during the task. The left panels show gaze shifts between landmarks en route to the target, and the right panels show gaze shifts between landmarks away from the target en route to the support surface. In all obstacle conditions, the grasp site was usually the first landmark fixated as illustrated by the thick arrows from the fixation zone (circle) to the grasp site. In the no-obstacle condition, there were two main paths en route to the target (Fig. 5A, left panel). After fixating the grasp site, subjects either shifted gaze directly to the target or indirectly via the tip of the bar. Gaze was then shifted to the support surface (right panel). A similar pattern was observed in the obstacle conditions, except that the obstacle was frequently fixated (after the grasp site or tip of bar) en route to the target (Fig. 5B,C, left panels) and was fixated again en route between the target and the support surface (right panels). However, with the triangular obstacle in particular, gaze could shift directly from the grasp site to the target and from the target to the support surface, avoiding the obstacle in both cases (Fig. 5C).
Spatiotemporal coordination of gaze and hand
Figure 6 shows the spatiotemporal coordination of gaze and hand actions based on data pooled across all subjects and trials involving the triangular obstacle. To preserve phase information while combining data from different trials in Figure 6, we normalized the time base by scaling each phase of each trial to the median duration of that phase (Fig. 7E, striped bars). Each of the panels in Figure 6A shows the distance between gaze and one of the landmarks as a function of time. The dots represent gaze position at the start of each fixation, and the horizontal lines connected to each dot represent the duration of the fixation. The solid curve represents the distance between the median gaze position and the landmark. The dashed curves in Figure 6A refer to kinematic data displayed on the same time base as the gaze data. These curves give the median distance between the tip of the index finger (a), the tip of the bar (c, d), or the lowest point of the bar (e) and the indicated landmark. The vertical lines mark the different phases of the task, and the horizontal rectangles represent the 2 cm or 3° landmark zones. Figure 6B illustrates the time-varying probabilities of gaze fixating the different landmarks (computed in 100 msec bins). The thick solid curve in each panel shows the instantaneous probability of gaze fixation being within the 3° landmark zone. The contour of the gray area shows the probability of gaze fixation being within 2° of the landmark. The thin solid curve represents the probability of there having been a fixation within 3° of the landmark at any previous time during the trial.
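The phase-wise normalization of the time base can be illustrated with a short sketch. It assumes a piecewise-linear mapping in which each phase of a trial is stretched or compressed to the median duration of that phase; the function and parameter names are hypothetical, and every phase is assumed to have a nonzero duration.

```python
def normalize_time(t, phase_bounds, median_durations):
    """Map a time t within one trial onto the normalized time base.

    phase_bounds:     increasing phase boundary times for this trial,
                      one more entry than there are phases (assumed)
    median_durations: median duration of each phase across trials
    """
    elapsed = 0.0  # normalized time accumulated over completed phases
    last = len(median_durations) - 1
    for i, median in enumerate(median_durations):
        start, end = phase_bounds[i], phase_bounds[i + 1]
        if t <= end or i == last:
            # Fraction of the current phase completed at time t
            frac = (t - start) / (end - start)
            return elapsed + frac * median
        elapsed += median
```

With this mapping, events that occur at the same relative point within a phase line up across trials even when the trials differ in absolute duration.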
During the pre-reach phase, gaze began to shift to the grasp site, and the median gaze fixation position had reached the grasp site zone before the end of this phase (Fig. 6A,a). This occurred ∼1 sec before the median position of the fingertip arrived at the grasp zone. However, by the end of the pre-reach phase, the tip of the bar and the obstacle had been fixated in ∼20 and 10% of the trials, respectively (Fig. 6B, thin curves in b and c). By the middle of the reach phase, the probability of the grasp site having been fixated was close to 1 (Fig. 6B,a). The instantaneous probability of fixating the grasp site fell during the grasp phase and approached zero by the start of the up phase. During the grasp phase, the median position of fixation started to leave the grasp site while the fingertip was still approaching the grasp site (Fig. 6A,a).
From the grasp site, fixation shifted to one of three landmarks, the tip of the bar, the obstacle, or the target. However, the timing of shifts varied across landmarks. During the grasp phase, fixation shifted from the grasp site to the bar tip in approximately one-quarter of all trials (Fig. 6B,b). The instantaneous probability of fixating the tip of the bar started to increase when the instantaneous probability of fixating the grasp site began to decrease. The stepwise increase in the probability of having previously fixated the tip of the bar indicates that these trials were not those in which the tip of the bar had been fixated during the pre-reach phase. In other words, the subjects rarely returned to the tip of the bar if it was previously fixated in the trial.
Fixations started to shift from the bar to the obstacle and the target ∼0.5 sec later (Fig. 6B,c). Thus, up to the end of the grasp phase, fixations were mainly directed to the bar (grasp site or tip) but were directed elsewhere once the bar began to move at the start of the up phase. After leaving the bar, gaze typically shifted first to the obstacle and then to the target, which was fixated in all trials. However, in ∼20% of the trials, gaze shifted directly to the target. The early peak, indicated by an arrow, in the probability curves for the target in Figure 6B,d, reflects such gaze shifts. The subsequent increase in the instantaneous probability of fixating the target was closely mirrored by the decrease in the instantaneous probability of fixating the obstacle. The median gaze position entered the target landmark zone ∼0.8 sec before the tip of the bar contacted the target, whereas the median gaze position left this zone ∼0.2 sec before the tip of the bar moved away from the target (Fig. 6A,d). During the target phase, the instantaneous probability of fixating the landmark zone representing the tip of the bar increased because the tip entered the target zone (Fig. 6A,b,B,b). However, subjects never tracked the tip of the moving bar, and we never observed gaze shifts between the target and the tip of bar during the target phase.
From the target, gaze shifted to the support surface either directly or via the obstacle. When the obstacle was fixated in either the up or down phases, the fixation tended to be brief and typically began ∼0.5 sec before the tip of the bar reached its closest point to the obstacle (Fig. 6A,c).
The instantaneous probability of fixating the support surface when replacing the bar was close to 1 (Fig. 6A,e,B,e). Note that the probability of fixating the support surface zone was also quite high during the early phases of the trial before bar movement. However, this early high probability was attributable to the fact that the landmark of the support surface overlapped with the landmark of the grasp site. As shown in Figure 6A,e, when the grasp site was fixated during the early phases of the trial, the median gaze position was ∼1 cm above that observed when the support surface was fixated. In a similar vein, modest peaks in the instantaneous probabilities of fixating the grasp site and tip of the bar landmarks were observed when the bar was replaced. Note that in all three cases in which the probability increased because of overlapping landmarks, the probability decreased markedly when the landmark zone was reduced from 3 to 2° (Fig. 6B,a,b,e). This indicates that these landmarks were not the primary gaze targets.
As shown in Figure 6C, fixation duration varied during the course of the trials. Longer fixations were observed when gaze was directed at the grasp site, support surface, and, in particular, the target. These are the three landmarks that were contacted with either the fingertips or the bar. Figure 6D shows that during the course of the trial, a large number of fixations occurred, with a median value of 16.
Fixation parameters across landmarks and conditions
Figure 7A shows, for each obstacle condition, the probability of fixating each of the landmarks within 3° (2 cm in the work plane) during a trial. In all obstacle conditions, the grasp site, target, and support surface were fixated in almost every trial and can be considered obligatory gaze landmarks. The tip of the bar and the protruding point(s) on the obstacle were fixated with lower probability and can therefore be considered optional gaze landmarks. The rectangular obstacle was fixated with greater probability than the triangular obstacle during both the upward and downward movements (χ²; p < 0.001 in both cases). The grasp site, target, and support surface were fixated for a substantially longer time than the optional landmarks (Fig. 7B) (Mann–Whitney U; p < 0.001).
We were particularly interested in whether subjects would reduce the total fixation duration at the grasp site and target in the presence of an obstacle. Planned comparisons revealed that the total fixation duration at the target was reliably greater in the no-obstacle condition than in the two obstacle conditions combined (Mann–Whitney U; p < 0.003). However, the obstacle condition did not significantly influence the total fixation duration at the grasp site (Mann–Whitney U; p = 0.50).
The number of fixations per landmark primarily influenced the variation in total fixation duration across landmarks (Fig. 7C), but the duration of the individual fixations also contributed (Fig. 7D). The number of fixations at the obligatory landmarks (grasp site, target, and support surface) was significantly greater (Mann–Whitney U; p < 0.001) than the number at the optional landmarks (tip of bar and the obstacle in both the up and down phases of the task). Likewise, the durations of individual fixations were reliably longer for the obligatory landmarks (Mann–Whitney U; p < 0.001). The effect of obstacle on the total fixation duration at the target was attributable to the number of fixations and not the duration per fixation (Fig. 7, compare D, C). In Figure 7A–D, we have combined data from all four successive trials in each obstacle condition; neither the probabilities of fixating each landmark nor the fixation duration changed across trials.
Figure 7E shows the duration of each phase for all three obstacle conditions. The overall duration of the trial was not significantly affected by obstacle condition (Kruskal–Wallis test; p = 0.33). However, there were reliable differences across conditions in the durations of the up, target, and down phases (Kruskal–Wallis; p < 0.001 in all cases). As expected, the durations of the up and down phases were shorter in the no-obstacle condition. In contrast, the duration of the target phase was longer when no obstacle was present, and this matches the greater total fixation of the target under this condition.
Time relations between gaze shifts and kinematic events
Figure 8 analyzes the timing of gaze shifts entering and exiting landmark zones referenced in time to kinematic events associated with the landmark. Gaze landmark zones were defined using the 3° distance of the landmark (see Materials and Methods).
Considering the grasp site zone (Fig. 8A), on average, gaze entered this zone almost 2 sec before the index finger contacted the bar. This occurred about the time of reach onset (Fig. 6A). However, the exit times were distributed about the point in time of grasp contact. On average, gaze exited the grasp site zone just before contact (median = 163 msec), but left after contact in one-quarter of the trials. In virtually all trials gaze had already exited the grasp site zone by the time the bar started to move. Gaze arrived at the grasp site slightly earlier when there was no obstacle (Mann–Whitney U; p < 0.04). This apparent obstacle effect was likely caused by the greater pre-reach, reach, and grasp phase durations observed in the first test series, which was run without an obstacle (Fig. 7E). However, the presence of an obstacle did not influence the distribution of gaze exit times from the grasp site (p = 0.90), despite the fact that the obstacle was usually fixated. Thus, subjects did not sacrifice the visual control of fingertip contact to fixate the obstacle when present. Note that the distribution of gaze exit times, although far sharper than that of the entry times, was nevertheless quite variable. The SD of the time between gaze exit and grasp contact onset was 241 msec (collapsing across all obstacle conditions), and the corresponding variability for gaze entry was 624 msec.
Figure 8B shows when gaze entered and exited the obstacle zone with reference to the time at which the tip of the bar passed closest to the obstacle during the up phase. Despite the fact that the obstacle was an optional fixation landmark, the gaze entry and especially the gaze exit time distributions were relatively tight, indicating a strong coupling between gaze and bar movement. The SD for the exit time was 188 msec, whereas that of the entry time was 392 msec. On average, gaze arrived at the obstacle ∼0.5 sec (median = 0.54 sec) before the tip of the bar made its closest approach and departed almost at the same time as the closest approach (median time difference = 1 msec).
Considering the target landmark zone, neither gaze entry nor exit time was well aligned to the moment the tip of the bar initially contacted the target. On average, the gaze entered the target zone ∼1 sec before target contact (median = 1.13 sec) and exited the zone well after initial target contact (median = 0.71 sec). Moreover, the coupling between these gaze and contact events was rather loose; the SDs of the time difference distributions were 0.54 and 0.47 sec, respectively, and were similar for the different obstacle conditions. However, gaze was tightly coupled to the moment the target switch was released, which represents the goal completion of the target contact phase. Figure 8C shows when gaze entered and exited the target zone referenced to switch release. Note that gaze exit times were distributed evenly about the moment of switch release (median time difference = −58 msec) with a SD of 341 msec. Gaze exit times were not influenced by the presence of the obstacle despite the fact that the duration of the target phase and total target fixation duration were both greater in the no-obstacle condition (Fig. 7). On average, gaze arrived in the target zone ∼2 sec before switch release (median = 1.79 sec) but with a reliably greater lead in the no-obstacle condition in which gaze did not fixate the obstacle en route to the target (Mann–Whitney U; p < 0.003). Thus, subjects began to fixate the target zone well before the target contact phase and continued to fixate the target zone until the switch task was completed as signaled by switch release.
Figure 8D shows when gaze entered and exited the obstacle zone with reference to the time at which the tip of the bar passed closest to the obstacle during the down phase. As during the up phase, the entry and exit time distributions were relatively narrow, indicating a strong coupling between gaze and bar movement. The SDs for the entry and exit distributions were 253 and 245 msec, respectively. As during the up phase, gaze arrived at the obstacle ∼0.5 sec before the tip of the bar made its closest approach (median = 0.44 sec) and departed at the same time as the closest approach (median time difference = −66 msec). That is, the variability of gaze exit times at the obstacle (referenced to closest approach) for both the up and down phases was no greater than that observed at the grasp site and target, although the obstacle was an optional fixation landmark and no actual contact event occurred.
Considering finally the landmark zone of the support surface, the gaze arrived at this zone ∼1 sec (median = 1.04 sec) before the bar contacted the surface, and for all trials, gaze stayed there until after contact (Fig. 8E). Compared with the gaze exit times for the other landmark zones represented in Figure 8, the gaze exit times from the support surface zone were loosely coupled to the kinematic event (SD = 730 msec). However, at the time the bar contacted the support surface, the manipulatory task was completed and there was no forthcoming landmark that attracted gaze. As a result, the gaze either stayed in the region of the support surface until the shutter closed and the fixation light was activated, or it could shift to a point in the vicinity of the bar (see long-duration fixations in the right middle panel of Fig. 4A), and occasionally it could shift back to the target during the reset phase (Fig. 6B,d).
In sum, for all landmarks, gaze arrived in the landmark zone well before the hand or tip of the bar. The time at which gaze exited each landmark zone (with the exception of the support surface fixated at the end of the trial) was closely aligned with a contact event. The contact event could be making (grasp site) or breaking (target) of a contact or a potential contact (obstacle).
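The coupling measures reported in this section (median time difference and SD of gaze exit or entry relative to a kinematic event) can be sketched as follows. The function name and inputs are hypothetical. The sign convention (negative values meaning that gaze exited before the event) and the use of the sample standard deviation are assumptions, not details taken from the analysis itself.

```python
import statistics

def gaze_event_coupling(gaze_times_ms, event_times_ms):
    """Return (median, SD) of gaze time minus event time across trials.

    gaze_times_ms:  per-trial time at which gaze exited (or entered)
                    a landmark zone, in msec (assumed input)
    event_times_ms: per-trial time of the associated kinematic event,
                    e.g. grasp contact or switch release (assumed input)
    """
    # Negative difference: gaze shift preceded the kinematic event
    diffs = [g - e for g, e in zip(gaze_times_ms, event_times_ms)]
    return statistics.median(diffs), statistics.stdev(diffs)
```

A median near zero with a small SD would correspond to the tight coupling described for gaze exits at the obstacle, whereas a large SD would correspond to the loose coupling seen at the support surface.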
Accuracy of saccades to landmarks
Previous work on saccade generation using point light targets has shown that the initial shift in gaze typically undershoots the target and is followed by one or more corrective saccades (Becker, 1991). We examined whether this behavior is also observed in the context of object manipulation with natural gaze targets. We focused on saccades that shifted fixation from the grasp site, tip of the bar, or obstacle landmarks to within 3 cm (4.4°) of the centroid of all fixations in the target landmark zone. (The centroid, defined by the average x and y fixation positions, was computed for each subject separately and is our best estimate of the true gaze target.) We also included local saccades that changed fixation within this area, and data from all three obstacle conditions were combined. For each of these saccades, we computed the resultant distance between the fixation at the end of the saccade and the target fixation centroid. We considered this distance as a measure of saccadic error. Figure 9A shows the resultant distance or error as a function of saccade amplitude, and Figure 9, B and C, shows the separate x and y errors as a function of saccade amplitude. The large amplitude saccades (∼10 cm) originated from the grasp site and tip of the bar, the medium amplitude saccades (∼5 cm) originated from the obstacle, and the small amplitude saccades were refixations within the target zone. Significant correlations were observed between saccade amplitude and resultant distance and between saccade amplitude and the separate x and y distance (p < 0.001 in all three cases). As can be seen in Figure 9A, the saccadic error increased with saccade amplitude. With large amplitude saccades, the initial fixation tended to undershoot the target in the vertical (y) direction and tended to be located to the right (x) of the target.
The finding that large saccades that brought gaze from one landmark to another typically undershot the final gaze position and that the undershoot scaled with the amplitude of the required gaze shift is consistent with most previous studies of target-directed saccades (Becker, 1991).
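The saccadic error measure used here (resultant distance from the post-saccadic fixation to the target fixation centroid, together with its separate x and y components) can be illustrated with a minimal sketch. The names `saccade_errors`, `landing_points`, and `target_fixations` are hypothetical, and the centroid is assumed to be computed from one subject's fixations in the target zone.

```python
import math

def saccade_errors(landing_points, target_fixations):
    """Return a list of (resultant, dx, dy) errors, one per saccade.

    landing_points:   (x, y) fixation positions at the end of each saccade
    target_fixations: (x, y) positions of all fixations in the target
                      landmark zone for one subject (assumed input)
    """
    n = len(target_fixations)
    # Centroid: average x and y fixation position in the target zone
    cx = sum(x for x, _ in target_fixations) / n
    cy = sum(y for _, y in target_fixations) / n
    return [(math.hypot(x - cx, y - cy), x - cx, y - cy)
            for x, y in landing_points]
```

Plotting the resultant and the signed components against saccade amplitude would give scatter plots of the kind shown in Figure 9, A–C.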
We also observed that the duration of the fixation after a saccade decreased as the distance from the fixation to the centroid increased (Fig. 9D). Fixations that were farther than ∼1 cm from the centroid were generally very brief (between 50 and 200 msec), whereas closer fixations were an order of magnitude longer (note the vertical log scale in Fig. 9D). This strongly suggests that the more distant fixations were followed rapidly by corrective saccades. Indeed, it is established that the latency of the first corrective saccade decreases with the magnitude of the error that necessitates the correction but that this function approaches an asymptotic minimum (Deubel et al., 1982; Kapoula and Robinson, 1986). However, not every small saccade was necessarily a correction improving the gaze position with respect to target position. Lemij and Collewijn (1989), for example, noted secondary saccades after accurate primary (long) saccades (landing within 0.1° of the target) on ∼50% of the trials.
That most fixations of long duration were gathered within ∼1 cm from the target fixation centroid (Fig. 9D) implied that the preferred gaze location was within a diameter of ∼2 cm or ∼3° of vision (Fig. 4). This suggests that the extent of the “functional fovea” for the target-related fixations corresponded to ∼3° angle of vision. In view of the errors in our measurements (see Materials and Methods), this angle appears consistent with previous estimates of the size of the fovea as the central 2° of vision (Rayner, 1998). Besides, targets within 4° or so of central vision are still perceived at ∼50% of maximal acuity (Carpenter, 1991).
It is well documented that when saccades are made to two localized targets in reasonably close proximity (e.g., <10 or 20° in eccentricity), the first saccade can go to some intermediate location. This is referred to as the global or center of gravity effect (Findlay, 1982; Deubel et al., 1984; Ottes et al., 1984). Likewise, if one element is larger (e.g., target and obstacle in our experiments), then the saccade tends to land closer to the larger element in comparison to a condition in which the two elements are identical. Given this background we were interested in whether saccade accuracy to the target zone was affected by the presence of an obstacle in our manipulatory task. To test this, we compared the accuracy of saccades from the grasp site with the target in the no-obstacle condition and the two obstacle conditions combined. The deviation between the initial incoming fixation and the target centroid was not affected by the presence of an obstacle. Saccades from the target directly to the support surface also exhibited undershoot. However, as in the case of saccades from the grasp site to the target, the saccade error was not affected by the presence of an obstacle.
Performance without eye movements
We have observed that subjects chose to generate saccadic eye movements that brought important landmarks into central vision in a manner related to the phase of the task. Essentially, gaze appeared to lead the hand throughout the task. Presumably, these eye movements provided retinal and extra-retinal information that was useful to control the hand. However, people can manipulate objects without always gazing at the objects involved, as when we grasp our morning coffee while reading the newspaper. This suggests that peripheral vision and/or memory can be adequate for guiding manipulatory hand movements in some contexts. Because the task that we examined was stable across repeated trials and varied only slightly across conditions, subjects should have been able to make effective use of peripheral vision and memory.
To study the importance of the saccadic eye fixations in our task, we asked our subjects to repeat the task with the rectangular obstacle while fixating the fixation light throughout the trial (see Materials and Methods). We then compared the performance during gaze locking with that observed during free gaze movements by analyzing the durations of the phases of the task and the efficiency of grasping, target attainment, and obstacle clearance. Overall, there was only a modest degradation of performance when eye movements were prevented. The phase durations were unaffected with the exception of the target phase, which tended to be shorter and more variable across trials during gaze locking (p < 0.05; Kolmogorov–Smirnov). We quantified grasping efficiency as the distance traveled by the tip of the index finger during the contact phase; errors and subsequent corrections in positioning the fingertip would be associated with greater distances. However, prevention of eye movements did not influence grasping efficiency. We likewise quantified the efficiency of target attainment as the distance traveled by the tip of the bar during the target phase. In contrast to grasp efficiency, the efficiency of target attainment was influenced by gaze fixation (p < 0.001; Kolmogorov–Smirnov). As shown in Figure 10, without eye movements the distribution of travel distances in the fixation condition was skewed toward greater values than with free gaze movements, and fixation increased the travel distance in approximately one-third of the trials. However, the decrement in performance was not as severe as when vision was occluded after an initial 3 sec viewing time before action (Fig. 10, thin line curve).
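The efficiency measures used above are distances traveled during a task phase. As a minimal sketch, assuming the fingertip or bar-tip position is available as a sequence of sampled (x, y) points within the phase, the traveled distance can be approximated by summing the segment lengths; the function name and input format are assumptions.

```python
import math

def path_length(positions):
    """Sum of straight-line distances between successive position samples.

    positions: sampled (x, y) points of the fingertip or bar tip during
    one phase of a trial (assumed input format).
    """
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))
```

Corrective movements add extra segments to the path, so a trial with positioning errors would yield a larger value than a trial with a direct approach.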
The minimum distance between the obstacle and the tip of the bar was also influenced by gaze locking (p < 0.01; Kolmogorov–Smirnov). Without eye movements, the minimum distance was more variable across trials and, on average, was smaller. Furthermore, the bar contacted the obstacle in 14% of the trials without eye movements but never did so when eye movements were unconstrained. These results, combined with the decreased efficiency of target attainment, suggest that the tip of bar motion was less sensitive to the obstacle and target when eye movements were prevented. Many subjects spontaneously remarked on the high concentration required to perform the task with gaze locked at the fixation light.
DISCUSSION
We examined gaze–hand coordination in a natural manipulation task in which subjects grasped and moved a bar to a target, either directly or around an obstacle, and then returned the bar to the support surface. Consistent with the findings of Land and colleagues (1999), subjects directed gaze almost exclusively toward objects involved in the task. Furthermore, subjects fixated certain landmarks associated with these objects. Landmarks at which contact events took place were obligatory gaze targets and included the grasp site on the bar, the target, and the support surface in advance of contact. However, other landmarks were optional; the obstacle and tip of the bar were fixated in some trials but not others. Note that the obstacle represents a potential contact point, whereas the tip of the bar is a contact point when referenced to the position of the bar rather than the background (world coordinates). Subjects never fixated the hand or the moving bar.
Land and colleagues (1999) considered four functions of gaze fixation in manipulation tasks: locating objects, directing the hand or object in hand to contact an object, guiding contact between two objects that are approaching one another, and checking the state of task-related variables. After this taxonomy, most of the fixations in our task were directing fixations. Thus, subjects fixated the grasp site as the hand approached the bar and the target and support surface as the bar approached. The fixations related to the obstacle may also be considered directing fixations because they apparently specified a location around which the tip of the bar should travel. In addition to directing fixations, we observed locating fixations. Subjects often fixated the tip of the bar early in the task before it moved, presumably to obtain spatial calibration about the extent of the bar useful for motion planning. Likewise, subjects occasionally fixated the obstacle or target before the grasp site. Finally, the persistent fixations of the target zone after initial bar contact may have represented checking fixations that monitored the depression of the target switch. In our task, only a single object was manipulated, and thus guiding fixations were not observed.
Contribution of peripheral vision and memory
Subjects performed our manipulation task reasonably well without eye movements. However, subjects occasionally contacted the obstacle or missed the target. Thus, one advantage of shifting gaze is to guard against these occasional errors. Interestingly, many subjects reported that the fixation condition required a great deal of effort. We presume that this “effort” is required to suppress eye movements during shifts in attention that would normally be linked with shifts in fixation (Kustov and Robinson, 1996). The location of the objects involved in our task in a single plane approximately equidistant from the eye could have facilitated use of peripheral vision in guiding movements. Eye movements may confer stronger performance benefits in environments where the locations of objects vary in depth. Furthermore, that the positions of the objects involved either were held constant across trials (target, obstacle, support surface) or varied slightly (bar) could have facilitated use of visual or kinesthetic memory. However, we observed that the probabilities of fixating the obstacle and the tip of the bar did not change across trials with the same obstacle. Using a task in which subjects arranged blocks to match a model, Ballard and colleagues (1992, 1995) showed that subjects prefer to continuously refixate the model between each block movement rather than rely on memory of the model.
Saccade guidance
How eye movements are directed to objects of interest is an important question in natural manipulation tasks. Although numerous studies have examined the process of saccade target selection using small localized targets (Schall, 1995; for review, see Desmurget et al., 1998; Schall and Thompson, 1999), little attention has been paid to saccades directed to objects. Kowler and colleagues (He and Kowler, 1991; Kowler and Blaser, 1995; Melcher and Kowler, 1999) examined saccades to objects of various shapes presented in peripheral vision. They found that gaze was consistently directed to the geometric center-of-mass or area of the object. However, our results demonstrate that in manipulation, subjects directed their gaze to specific locations that appeared critical for the control of the task rather than to visually noticeable intrinsic features of objects as is typical in visual perception tasks (cf. Steinman, 1965; Findlay, 1982; Findlay et al., 1993; McGowan et al., 1998; Melcher and Kowler, 1999). Considering the bar, subjects directed saccades toward the grasp site and tip of bar. Furthermore, for target and obstacle, saccades were directed to sites that were actually offset from the physical extension of the objects. Thus, the target and the protruding element(s) of the obstacle offered exocentric localization cues for directing the saccades. As such, there is evidence that objects in the visual scene, other than the immediate saccadic target, may facilitate target encoding (Hayhoe et al., 1992; Dassonville et al., 1995; Karn et al., 1997). Likewise, there is evidence that extrafoveal cues can support gaze stabilization during the fixations (Epelboim and Kowler, 1993).
Discrete event-driven sensory control
It is well established that people's eye movements depend on the task and the particular cognitive strategy that is used (Yarbus, 1967; Viviani, 1990; Rayner, 1998; Liversedge and Findlay, 2000). In scene perception and other self-paced visual tasks such as reading and visual search, variation in duration of fixations has been attributed to the difficulty of foveal-based cognitive processes and computational processes that specify the parameters of the following saccade (Carpenter, 1988; Rayner, 1998; Schall and Thompson, 1999). However, our findings indicate that the kinematics of the task rather than visual perceptual processing determined when to shift gaze between landmarks. The time at which gaze exited a given landmark zone was tuned to the time of an important kinematic event at that landmark. Specifically, gaze exited the grasp site zone around the time of grasp contact, the obstacle zone around the time of the nearest approach of the tip of the bar, and the target zone around the time of switch release. Consequently, the processes accounting for gaze shifts in manipulation are phasically coupled to the neural programs that control the hand.
The motor output during dexterous manipulation largely relies on predictive control mechanisms, the formation and updating of which depend on correlations between motor output signals and their sensory consequences as established by manipulatory experiences. Initial state information about object size, shape, and local surface geometry of grasp sites provided by vision is commonly used for predictions of required fingertip forces (Gordon et al., 1991, 1993; Jenmalm and Johansson, 1997; Flanagan and Beltzner, 2000; Jenmalm et al., 2000), and digital somatosensory afferent signals are known to mediate critical sensory consequences (Johansson and Westling, 1984, 1987; Westling and Johansson, 1987; Jenmalm and Johansson, 1997; Jenmalm et al., 2000). In the control policy described for manipulatory tasks termed “discrete event, sensory-driven control” (Johansson and Cole, 1994; Johansson, 1998), the CNS stipulates sensorimotor programs that specify both the required fingertip actions and the expected sensory consequences associated with the execution of each phase of the task. Thus, in precision lifting, the CNS expects to receive tactile information about specific mechanical events, such as the digits contacting the object and object acceleration, that confirm the successful completion of phases of the task, such as reaching to the object and lift-off. If the sensory event either occurs too early or does not occur at the expected time, the brain invokes automated corrective actions. Thus, the motor system reacts rapidly to both the presence of an unexpected somatosensory event and the absence of an expected somatosensory event.
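The logic of discrete event, sensory-driven control described above can be illustrated with a minimal sketch. It is only a schematic of the comparison between expected and observed sensory events, not a model of the underlying neural implementation; the names `Phase` and `check_phase`, the outcome labels, and all numerical values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phase:
    """One phase of a manipulation task with its predicted sensory event."""
    name: str             # e.g. "reach" (illustrative label)
    expected_event: str   # sensory event that marks completion of the phase
    expected_time: float  # predicted event time, s from phase onset (hypothetical)
    tolerance: float      # accepted deviation from the prediction, s (hypothetical)


def check_phase(phase: Phase, observed_time: Optional[float]) -> str:
    """Compare the observed event time with the prediction for this phase.

    Returns "advance" when the event confirms phase completion,
    "correct_early" when the event arrives before it is expected, and
    "correct_absent" when it has not occurred at the expected time
    (either never observed or observed too late).
    """
    if observed_time is None:
        return "correct_absent"
    error = observed_time - phase.expected_time
    if error < -phase.tolerance:
        return "correct_early"
    if error > phase.tolerance:
        return "correct_absent"
    return "advance"


# Illustrative use with made-up numbers (not data from this study).
reach = Phase("reach", "digit contact", expected_time=0.5, tolerance=0.1)
outcome = check_phase(reach, observed_time=0.52)
```

In this sketch, both a premature event and a missing event lead to a corrective branch, mirroring the observation that the motor system reacts to unexpected events as well as to the absence of expected ones.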
The “anchoring” of gaze–hand coordination to actual or potential contact points observed in our task suggests a similar role for visual sensory information. However, compared with discrete event somatosensory-driven control, the temporal coordination between the visual sensory event, when represented as gaze shift, and the kinematic events is rather coarse, with SDs of the time difference being ∼0.25 sec. Yet, this gaze strategy reflects monitoring of critical kinematic events and would be useful for verification of goal completion for various phases of the evolving task. Specifically, we propose that the anchoring behaviors represent spatiotemporal checkpoints for the development, maintenance, and adaptation of correlations between visual and somatosensory information (proprioceptive and tactile) and efferent copy signals required for predictions of motor commands in natural manipulatory tasks. The frontoparietal networks of the primate brain, for instance, seem well suited for correlation of multimodal sensory information and efferent copy signals (Andersen et al., 1997).
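The coarseness of this temporal coupling is expressed as the SD of the time difference between gaze exit and the corresponding kinematic event. As a minimal sketch of how such a measure could be computed, assuming one gaze exit time and one event time per trial for a given landmark, the following may serve; the function name `exit_lag_stats` and the sample values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def exit_lag_stats(gaze_exit_times: Sequence[float],
                   event_times: Sequence[float]) -> Tuple[float, float]:
    """Return mean and sample SD of (gaze exit time - kinematic event time).

    Both inputs are in seconds, one value per trial, in matching trial order.
    A positive lag means that gaze left the landmark zone after the event.
    """
    if len(gaze_exit_times) != len(event_times):
        raise ValueError("need one gaze exit time per kinematic event")
    if len(gaze_exit_times) < 2:
        raise ValueError("need at least two trials to estimate an SD")
    lags = [g - e for g, e in zip(gaze_exit_times, event_times)]
    return mean(lags), stdev(lags)


# Made-up example: three trials at one landmark.
mean_lag, sd_lag = exit_lag_stats([1.0, 2.0, 3.0], [1.0, 1.5, 2.0])
```

A small SD relative to the duration of the task phases would indicate tight coupling of gaze shifts to the kinematic event, whereas a larger SD would indicate the coarser coupling discussed here.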
In summary, we conclude that gaze supports the planning and control of manipulatory actions by marking key positions (actual and potential contact points) to which the fingertips or grasped objects are subsequently directed. Thus, the salience of potential gaze targets was largely determined by the demands of the sensorimotor task. Furthermore, our results are compatible with a function of gaze in contributing to the development and maintenance of sensorimotor correlation matrices that support predictive motor control in manipulation.
Footnotes
This study was supported by the Swedish Medical Research Council (Project 08667), the Göran Gustafsson Foundation for Research in Natural Sciences and Medicine, Sweden, and the Canadian Institutes of Health Research.
Correspondence should be addressed to Roland S. Johansson, Section for Physiology, Department of Integrative Medical Biology, Umeå University, SE-90187 Umeå, Sweden. E-mail: Roland.S.Johansson{at}physiol.umu.se.