Abstract
Brief stimuli presented near the onset of saccades are grossly mislocalized in space. In this study, we investigated whether the Bayesian hypothesis of optimal sensory fusion could account for this mislocalization. We required subjects to localize visual, auditory, and audiovisual stimuli at the time of saccades (relative to an earlier presented probe). During fixation, vision dominates and spatially “captures” the auditory stimulus (the ventriloquist effect). But for perisaccadic presentations, auditory localization becomes more important, so the mislocalized visual stimulus is seen closer to its veridical position. The precision of bimodal localization (as measured by localization thresholds or just-noticeable differences) was better than for either the visual or the acoustic stimulus presented in isolation. Both the perceived position of the bimodal stimuli and the improved precision were well predicted by assuming statistically optimal Bayesian-like combination of visual and auditory signals. Furthermore, the time course of localization was well predicted by the Bayesian approach. We present a detailed model that simulates the time-course data, assuming that perceived position is given by the sum of retinal position and a sluggish, noisy eye-position signal, obtained by optimally integrating the output of two populations of neural activity: one centered at the current point of gaze, the other centered at the future point of gaze.
Introduction
When asked to report the position of a visual stimulus flashed just before a saccade, subjects grossly mislocalize it, primarily in the direction of the saccade (Matin and Pearce, 1965; Mateeff, 1978; Honda, 1989). Typically, localization errors start up to 100 ms before saccadic onset and continue well after the saccade finishes, being maximal at saccadic onset. The dynamics of the effect led Matin (1972) to suggest that perceived position is given by the sum of retinal position and a sluggish extraretinal eye-position signal.
In recent times, Bayes' (1764) theorem has been successfully applied to perceptual research, describing how separate sensory cues can be integrated in a statistically optimal manner to yield the most probable percept (Clark and Yuille, 1990), even when this leads to illusions (Weiss et al., 2002). This approach has been applied successfully to many fields of perception, particularly multimodal perception (Ernst and Banks, 2002; Ernst and Bulthoff, 2004). An interesting example is the so-called “ventriloquist effect,” in which a sound appears to emanate from a puppet rather than from the more distant ventriloquist; in other circumstances, however, the sound can “capture” the visual stimulus and cause it to be seen in an erroneous position (Alais and Burr, 2004). The Bayesian framework is well suited to explain these cross-modal effects, in which the perceived position of an event is given by combining auditory and visual information, after weighting each source according to its reliability (intertrial consistency). Normally, vision is more reliable than audition and dominates; but if the visual stimulus is blurred, audition can become the more reliable and hence dominant cue. Importantly, when both forms of information are present, precision for localizing the stimuli improves, even though the stimuli are seen in erroneous positions (Alais and Burr, 2004).
The purpose of this study was to test whether the transient spatial distortions at the time of saccades may be consistent with statistically optimal “Bayesian” integration [as has been applied to other aspects of saccadic misperceptions (Niemeier et al., 2003)]. We take advantage of the fact that saccades have little effect on auditory space perception (Harris and Lieberman, 1996; Klingenhoefer and Bremmer, 2004) to investigate audiovisual localization in normal and saccadic conditions. We find that the auditory signal becomes more important for localization at the time of saccades, suggesting that the visual signal has become transiently noisy, and therefore receives less weight. Importantly, localization precision is better for the multimodal presentation than for the unimodal presentation, a strong prediction of the Bayesian theory of information fusion. We also show that, by assuming the optimal integration of sensorimotor information, the illusory mislocalization of visual stimuli during saccades can be well modeled.
Materials and Methods
Subjects and apparatus.
Four subjects (two authors and two observers naive to the goals of the experiment) participated in the entire experiment; in one of the experimental conditions (main experiment, visual stimulation), three additional naive subjects were tested. All participants had normal or corrected-to-normal vision and normal hearing. They were seated facing a vast quasi-hemispheric screen (Vision Station VS102; Elumens, Cary, NC) subtending ∼140 × 140° at the viewing distance of 60 cm, illuminated by a liquid crystal display projector (Epson/Elumens SPIClops API; refresh rate, 67 Hz), with head movement minimized by a chin rest. Ten small speakers (Philips MMS 221/00) were mounted behind the screen, at 8° intervals (the minimum interval allowed by their overall size).
Stimuli.
Visual stimuli were briefly displayed (15 ms) as 2° blue blobs [Commission Internationale de l'Eclairage (CIE) coordinates: x = 0.142, y = 0.065; luminance, 1.21 cd/m2] presented against a red background (CIE coordinates: x = 0.59, y = 0.35; mean luminance, 2.6 cd/m2). Auditory stimuli were short (15 ms) bursts of white noise at 40 dB intensity, played through one of the 10 speakers (selected by a computer-controlled digital switch). Both visual and auditory stimuli were presented 8° above gaze level, at various eccentricities. For bimodal presentations, the stimuli were presented together, within a tolerance of 7.5 ms (verified by direct measurement). Visual stimuli were generated by a framestore (VSG 2/4F; Cambridge Research System, Cambridge, UK) housed in a personal computer (PC) controlled by Matlab programs. Auditory stimuli were generated by the computer soundboard and gated to the appropriate speaker via a switch controlled by digital output from the VSG board.
Task and procedure.
As illustrated in Figure 1, trials began with subjects fixating a 1° dark fixation spot (F0), presented at x = −16°, y = 0° [where (0, 0) refers to straight ahead, with positive implying rightward or upward]. After 400 ms, the fixation spot was extinguished, and a saccadic target (F1, also a 1° dark spot) was displayed at (+16°, 0°), to which subjects saccaded immediately. Besides the fixation spot and the saccadic target, two additional stimuli were presented sequentially: a probe stimulus, displayed above F0 (−16°, 8°) well before the eye movement, and a test stimulus, presented at variable eccentricities, some time after the appearance of the saccadic target (arriving before, during, or after the eye movement). Subjects reported the perceived location of the test stimulus relative to the probe (two-alternative forced-choice left/right judgments; responses were voiced and recorded by the experimenter). The eccentricity of the test was varied in steps of 8° (the minimum distance between adjacent positions of auditory stimuli), using the adaptive QUEST method [which adjusted the eccentricity for each trial on the basis of the subject's previous responses; for more details, see Watson and Pelli (1983)]. The probe and the test stimuli could be visual, acoustic, or bimodal congruent audiovisual. In the latter case, each was composed of two unimodal presentations aligned in space: the probe stimulus comprised a blob and a sound, both presented above F0 (−16°, 8°); the test stimulus comprised a blob and a sound, both presented in 1 of the 10 speaker locations. For each subject and stimulus type, a minimum of two sessions of 100 trials each were run. Visual, auditory, and bimodal stimuli were tested in separate sessions, interleaved in a pseudo-randomized order; visual localization was also tested in the three additional subjects. Data were fitted with cumulative Gaussian functions (such as those in Fig. 2). The median gives an estimate of the point of subjective equality (PSE; the position where the test was perceived as aligned with the probe), and the difference between the PSE and the probe position defines the localization bias; the SD gives the localization threshold, or just-noticeable difference.
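To make this analysis concrete, the sketch below fits a cumulative Gaussian to left/right response counts by maximum likelihood, returning the PSE and the localization threshold (SD). It is a minimal illustrative sketch in Python, not the authors' analysis code; the stimulus positions and response counts are made up.

```python
# Minimal sketch: fit a cumulative Gaussian psychometric function to
# left/right judgments to recover PSE (hence bias) and threshold (SD).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

probe_x = -16.0                                       # probe position (deg)
test_x = np.array([-32., -24., -16., -8., 0., 8.])    # test eccentricities (deg)
n_right = np.array([1, 3, 9, 12, 17, 19])             # "test right of probe" counts
n_trials = np.full(test_x.size, 20)                   # trials per eccentricity

def neg_log_likelihood(params):
    pse, sigma = params
    sigma = max(abs(sigma), 1e-6)                     # keep the scale positive
    p = norm.cdf(test_x, loc=pse, scale=sigma)        # P("right") at each position
    p = np.clip(p, 1e-6, 1 - 1e-6)                    # guard against log(0)
    return -np.sum(n_right * np.log(p) + (n_trials - n_right) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=[probe_x, 5.0], method="Nelder-Mead")
pse, threshold = fit.x
print(f"PSE = {pse:.1f} deg, bias = {pse - probe_x:.1f} deg, "
      f"threshold (JND) = {abs(threshold):.1f} deg")
```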
We also tested a control condition, aiming to study unimodal and bimodal localization in a situation that mimicked the one observed in the main experiment, but in the absence of eye movements. Procedure, task, data collection/analysis, and participants were all the same as in the main experiment, except that subjects maintained fixation at F0 (−16°, 0°) and that the eccentricity of visual test stimuli was varied in steps of 1° (to obtain precise measurements of visual localization thresholds; hardware constraints prevented us from reducing the step size also for auditory stimuli). The stimulus arrangement was chosen on the basis of our expectations about the effects of eye movements on unimodal localization. We expected saccades to introduce systematic errors in visual but not auditory localization, with two consequences: (1) that the visual PSE would be systematically different from the auditory PSE and, more importantly, (2) that visual and auditory cues to the location of perisaccadic bimodal test stimuli would be systematically conflicting (although the stimuli were physically congruent). To mimic the difference in PSE values, we presented the visual probe stimulus in a different location than the auditory probe: the visual probe was moved to (−24°, 8°), whereas the auditory probe remained at (−16°, 8°). To mimic the conflict between unimodal cues to the location of the bimodal test stimulus, we introduced a physical displacement between its visual and auditory components (the average displacement was −8°, meaning that the auditory stimulus was, on average, 8° to the left of the visual stimulus), while keeping the bimodal probe physically congruent. This was presented at the same location as the visual probe stimulus (−24°, 8°), so that the bimodal PSE would be close to the visual PSE in the case of visual dominance and close to the auditory PSE in the case of auditory dominance.
Eye-movement measurement.
Eye movements were recorded by means of an infrared limbus eye tracker (HVS SP150; horizontal resolution, 0.01°; precision, 0.1°): the infrared sensor was mounted below the left eye on transparent wraparound plastic goggles, through which subjects viewed the display binocularly. There were practically no visual references because the display was very large (borders at ∼70° eccentricity, very hard to see), and the sensor was mounted below the eye without obscuring the view. The PC sampled eye position at 1000 Hz and stored the trace in digital form. The program identified the onset of the eye movement by convolving the trace with a difference-of-Gaussians function and searching for the maximum (this method marks the cusp of the eye trace, so that its estimate of the saccadic onset is not systematically delayed, as are estimates based on velocity derivation; in contrast, it requires the eye trace to be stored before being analyzed, so estimates of saccadic onset can be provided only at the end of the trial). In a later analysis, the experimenter checked the quality of the saccades and, when necessary (in ∼2% of trials, in which the saccade was preceded by a blink, or fixation was noisy, or in the presence of 50 Hz noise), adjusted the estimate of the saccadic onset by visually estimating the point of intersection between the presaccadic and intrasaccadic segments of the eye trace, or eliminated the trial. A few additional trials were excluded from the analysis because the primary saccade was followed by a corrective saccade.
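The onset-detection idea can be illustrated on a synthetic eye trace, as in the sketch below. The kernel widths, noise level, and threshold are illustrative assumptions, not the values used in the actual analysis; the key point is that a zero-mean difference-of-Gaussians kernel responds only near the cusps of a piecewise-linear trace.

```python
# Minimal sketch: detect saccadic onset by convolving the eye trace with a
# difference-of-Gaussians (band-pass) kernel; the response peaks at the cusp.
import numpy as np

fs = 1000.0                                   # sampling rate (Hz)
t = np.arange(0.0, 0.6, 1.0 / fs)             # 600 ms trial
# Synthetic 32 deg saccade starting at t = 0.3 s, lasting 50 ms, plus noise.
eye = -16.0 + 32.0 * np.clip((t - 0.3) / 0.05, 0.0, 1.0)
eye += np.random.normal(0.0, 0.05, eye.size)

def dog_kernel(sigma_narrow, sigma_broad, half_width_samples, fs):
    k_t = np.arange(-half_width_samples, half_width_samples + 1) / fs
    gauss = lambda s: np.exp(-k_t**2 / (2.0 * s**2)) / s
    return gauss(sigma_narrow) - gauss(sigma_broad)   # ~zero-mean band-pass

kernel = dog_kernel(0.005, 0.020, 60, fs)
resp = np.abs(np.convolve(eye, kernel, mode="same"))
cross = np.argmax(resp > 0.5 * resp.max())            # first strong response
onset_index = cross + np.argmax(resp[cross:cross + 100])  # local peak = cusp
print(f"estimated saccadic onset: {t[onset_index] * 1000:.0f} ms (true: 300 ms)")
```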
Modeling audiovisual fusion.
We assume that auditory and visual spatial cues are conditionally independent and are combined in a statistically optimal manner (Ernst and Banks, 2002; Alais and Burr, 2004). The combined audiovisual estimate of spatial position $\hat{S}_{AV}$ is given by the following:

$$\hat{S}_{AV} = w_A \hat{S}_A + w_V \hat{S}_V, \tag{1}$$

where $\hat{S}_A$ and $\hat{S}_V$ are the auditory and visual estimates of spatial location, and $w_A$ and $w_V$ are the weights by which the unimodal estimates are scaled, inversely related to the variances (squared thresholds) $\sigma_A^2$ and $\sigma_V^2$ for audition and vision:

$$w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2}, \qquad w_V = \frac{1/\sigma_V^2}{1/\sigma_A^2 + 1/\sigma_V^2}, \tag{2}$$

ensuring that the weights sum to unity.
The predicted bimodal variance $\hat{\sigma}_{AV}^2$ is given by the following:

$$\hat{\sigma}_{AV}^2 = \frac{\sigma_A^2 \sigma_V^2}{\sigma_A^2 + \sigma_V^2}. \tag{3}$$

$\hat{\sigma}_{AV}$ will always be less than $\min(\sigma_V, \sigma_A)$, and the reduction in threshold will be greatest (a factor of $\sqrt{2}$) when $\sigma_V \approx \sigma_A$.
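A minimal numerical sketch of Equations 1–3 follows; the values are illustrative, loosely resembling the perisaccadic case (a biased, imprecise visual estimate and an unbiased, more precise auditory one), not the paper's data.

```python
# Minimal sketch of Eqs. 1-3: optimal fusion of auditory and visual estimates.
import numpy as np

s_v, sigma_v = -1.0, 10.0   # visual PSE shifted ~15 deg rightward of -16 deg probe
s_a, sigma_a = -16.0, 6.0   # auditory estimate near the veridical probe position

w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)   # Eq. 2
w_v = 1.0 - w_a                                              # weights sum to unity

s_av = w_a * s_a + w_v * s_v                                 # Eq. 1: bimodal PSE
sigma_av = np.sqrt((sigma_a**2 * sigma_v**2) /
                   (sigma_a**2 + sigma_v**2))                # Eq. 3: bimodal SD

# The predicted bimodal threshold is always below min(sigma_a, sigma_v).
print(f"bimodal PSE = {s_av:.1f} deg, bimodal threshold = {sigma_av:.1f} deg")
```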
Results
Spatial localization
Subjects were asked to report whether a test stimulus appeared to the left or the right of a previously presented probe. In different sessions, the two stimuli were presented either unimodally (visually or acoustically) or bimodally (visually and acoustically), either during fixation or at the time of a saccade. In the latter case, the probe stimulus was always presented well before (at least 500 ms) saccadic onset, whereas the test could fall before, during, or after the eye movement. In all conditions, subjects reported perceiving bimodal stimuli as single unified events, both when they were physically congruent (a blob and a sound presented at the same location) and when they were conflicting (locations of blob and sound differed by 8°, on average).
Example results from subject P.B. together with the best-fitting psychometric functions are shown in Figure 2. During fixation (Fig. 2A), the localization of visual and auditory stimuli was unbiased, meaning that the PSE corresponded to the physical position of the probe stimuli. The width (SD) of the curves gives an estimate of threshold, ∼1° for visual stimuli and between 5° and 11° for auditory stimuli. Bimodal localization was studied by comparing the apparent position of a conflicting test stimulus with a bimodal congruent probe. The precision of the localization for bimodal stimuli was always similar to that for visual stimuli (as predicted by Eq. 3; see Materials and Methods). As predicted by Equation 1, conflicting stimuli were always localized toward the location of the visual component, regardless of the position of the auditory component, as Alais and Burr (2004) showed for small, unblurred stimuli.
Figure 2B shows example curves for stimuli presented within the interval of −25 to 0 ms (where 0 refers to saccadic onset). Perisaccadically, visual localization was grossly biased in the direction of the saccade (in subject P.B., the PSE was displaced ∼15° from probe position), and the localization was less precise (the curve is far broader) than during fixation. However, auditory localization remained approximately veridical (PSE near the physical location of the probe) and as precise as during fixation. Bimodal stimuli, although physically congruent, were mislocalized in the direction of the eye movement, like visual stimuli. However, the mislocalization was less than for unimodal visual stimuli. Importantly, the precision of the localization (given by the slope of the curves) was better than either the visual or the auditory unimodal case, as predicted by Equation 3 in Materials and Methods.
Figure 3 shows all individual results for perisaccadic localization, plotting localization threshold against the corresponding values of bias. Figure 3, A and B, reports data from perisaccadic visual and auditory localization, respectively (dashed lines indicate average values observed during fixation). In all subjects, auditory localization was nearly unbiased, with precision similar to that during fixation. In three of the four subjects who completed all experimental conditions, as well as in three naive subjects who were additionally tested in this condition (see figure legend), visual localization was grossly biased and ∼10 times less precise than during fixation. In one subject (G.A.), precision was even worse, but his bias (in the opposite direction) was not significant.
Figure 3C reports the perisaccadic bimodal localization performance (filled symbols), with data from each subject in a distinct panel (see text insets); unimodal localization is also represented for comparison (open symbols). As predicted by Equation 3 in Materials and Methods, bimodal thresholds are either lower than or similar to unimodal thresholds. The relative distance between bimodal and visual (or auditory) values of bias (i.e., the distance on the abscissa between the filled symbol and the open ones) gives a direct estimate of the relative dominance of visual (or auditory) cues in determining each subject's bimodal localization performance. For example, in subject J.V., auditory localization is far more precise than visual localization: as predicted by Equations 1 and 2 in Materials and Methods, bimodal localization is dominated by auditory cues.
Figure 4 plots the observed bimodal biases (A) and thresholds (B) against the corresponding predicted values. In all subjects, including the atypical subject G.A., the perceived position of bimodal stimuli and the precision of bimodal localization are well predicted by assuming Bayesian integration of visual and auditory spatial cues. The average biases and thresholds for unimodal (visual and auditory) and bimodal (observed and predicted) localization are shown in Figure 5: to minimize intersubject variability, thresholds were all normalized to the observed bimodal value. For both bias and thresholds, measured bimodal values are very close to the predicted values and significantly different from the unimodal values; as predicted by Equation 3 in Materials and Methods, the average bimodal localization threshold is significantly lower than either visual or auditory thresholds.
All results shown so far were for perisaccadic stimulus presentation, −25 to 0 ms before saccadic onset. But a stronger test would be for the Bayesian approach to predict the time course of the mislocalization effect. Figure 6 plots the time course of the average bias and threshold for three subjects (excluding G.A., for whom there was very little data across the time course). Visual mislocalization followed the characteristic dynamics (Morrone et al., 1997; Ross et al., 1997), rising to a maximum near the time of saccadic onset. Discrimination thresholds follow a similar time course, reaching a maximum just before saccadic onset. Stimuli presented early or late had localization biases and thresholds similar to fixation (Fig. 6, dashed lines). Auditory localization threshold and bias were virtually unaffected by the saccades, showing no systematic dependency on time relative to saccadic onset. Bimodal localization (Fig. 6, middle panels) followed a similar time course to the vision results, with less variation in bias and thresholds. Importantly, the variation over time followed closely the Bayesian predictions from the unimodal visual and auditory curves (Fig. 6, middle panels, thick gray curves) (see Eqs. 1–3 in Materials and Methods).
Modeling
The previous section shows that the time courses of both bias and precision of bimodal localization are well predicted by a simple Bayesian model of optimally weighted combination of the unimodal signals. Because auditory PSEs and thresholds were virtually unaffected by saccades, the variation in the bimodal performance was basically caused by the perisaccadic mislocalization and decrease in precision of vision. Here, we model the time course of visual localization, showing how the action of simple physiologically plausible mechanisms can account for the data.
The experimental task required subjects to localize stimuli relative to an external reference (the probe) while making horizontal saccades and therefore to estimate stimulus positions in head-centered coordinates (a coordinate system insensitive to gaze shifts). In these conditions, the head-centered location of a stimulus over time, X(t), can be derived from its retinotopic location estimate, x(t), through a simple translation of the horizontal coordinate by a function E(t) representing the change of eye position over time:

$$X(t) = x(t) + E(t). \tag{4}$$

Notice that correct (i.e., unbiased) head-centered location estimates would require a correct estimate of the function E(t), presumably provided by an extraretinal eye-position signal (Robinson, 1975; Dassonville et al., 1992; Morrone et al., 1997; Deneve et al., 2001; Niemeier et al., 2003; Pola, 2004). On the contrary, the dynamics of the perisaccadic mislocalization effect (Fig. 6) (Matin and Pearce, 1965; Honda, 1989; Morrone et al., 1997) suggest that the E(t) signal does not simply follow the actual eye trace but anticipates the saccade by at least 50 ms. It has been suggested that such anticipation may reflect the presence, before the actual saccadic onset, of a signal representing gaze at the intended postsaccadic location (an intention-to-move signal) (Dassonville et al., 1992; Bahcall and Kowler, 1999). We propose that an anticipatory signal E(t) may result from the combination of two competing gaze-related signals, one indicating gaze in the presaccadic direction and the other indicating gaze at the future, intended postsaccadic position.
More specifically, we assume E(t) to be the maximum likelihood estimate of two gaze-related activities, each arising from a population of neurons signaling gaze in a particular direction. During normal fixation, neurons signaling gaze in the current direction (the presaccadic location, indicated as F0) are highly active, and all others are practically dormant. Around the time of saccades, the activation of neurons at the current fixation decreases, and those representing the new postsaccadic direction (the saccadic target F1) begin to respond.
We model the gaze-related population activities as Gaussian distributions centered at F0 and F1, the SDs σ0 and σ1 of which vary over time:

$$f_0(x,t) = \frac{1}{\sqrt{2\pi}\,\sigma_0(t)} \exp\!\left(-\frac{(x - F_0)^2}{2\sigma_0(t)^2}\right), \tag{5a}$$

$$f_1(x,t) = \frac{1}{\sqrt{2\pi}\,\sigma_1(t)} \exp\!\left(-\frac{(x - F_1)^2}{2\sigma_1(t)^2}\right). \tag{5b}$$

Notice that the maximum height of the distributions (i.e., the maximum population activity) is inversely proportional to their SDs and therefore also varies over time:

$$A_0(t) = \frac{1}{\sqrt{2\pi}\,\sigma_0(t)}, \qquad A_1(t) = \frac{1}{\sqrt{2\pi}\,\sigma_1(t)}. \tag{6a, 6b}$$

Because eye position is assumed to be computed from the maximum likelihood estimate of f0(x,t) and f1(x,t), at each time the estimated position of the eyes, E(t), is given by the weighted sum of the positions x = F0 and x = F1, with the weights inversely related to the variances σ0² and σ1² of the two distributions, and the SD of the E(t) signal, σ(E(t)), is proportional to the product of σ0 and σ1:

$$E(t) = w_0(t)\,F_0 + w_1(t)\,F_1, \tag{7}$$

where

$$w_0(t) = \frac{\sigma_1(t)^2}{\sigma_0(t)^2 + \sigma_1(t)^2}, \qquad w_1(t) = \frac{\sigma_0(t)^2}{\sigma_0(t)^2 + \sigma_1(t)^2}, \tag{8}$$

and

$$\sigma(E(t)) = \frac{\sigma_0(t)\,\sigma_1(t)}{\sqrt{\sigma_0(t)^2 + \sigma_1(t)^2}}. \tag{9}$$

Thus, E(t) and σ(E(t)) are completely determined once the maxima of the two population activities (or the variances σ0² and σ1²) are known. Notice, however, that E(t) depends only on the relative strength of the two activities, whereas σ(E(t)) depends also on their absolute strengths: for example, if both populations are strongly activated, σ(E(t)) will be low; if both are weakly activated, σ(E(t)) will be high; but in either case, E(t) will signal midway between F0 and F1.
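The sketch below implements Equations 4–9 in the forward direction, with assumed, illustrative time courses for the two population activities (the actual model derives these time courses from the data rather than assuming them; see the next paragraph).

```python
# Minimal sketch of the forward model (Eqs. 4-9). The logistic activity
# profiles below are illustrative assumptions, not fitted values.
import numpy as np

F0, F1 = -16.0, 16.0                      # pre- and postsaccadic gaze (deg)
t = np.linspace(-100, 100, 201)           # time re: saccadic onset (ms)

# Assumed activities: the F0 population decays, the F1 population builds up,
# and both are transiently weak around saccadic onset.
A0 = 0.05 + 0.95 / (1 + np.exp((t + 20) / 15))
A1 = 0.05 + 0.95 / (1 + np.exp(-(t - 20) / 15))

sigma0 = 1.0 / A0                         # Eqs. 6a, 6b: height inversely prop. to SD
sigma1 = 1.0 / A1

w0 = sigma1**2 / (sigma0**2 + sigma1**2)  # Eq. 8
w1 = 1.0 - w0
E = w0 * F0 + w1 * F1                     # Eq. 7: eye-position signal
sigma_E = sigma0 * sigma1 / np.sqrt(sigma0**2 + sigma1**2)   # Eq. 9

# Eq. 4: perceived head-centered position of a brief stimulus flashed at
# head-centered 0 deg. For simplicity the true eye movement is idealized as
# an instantaneous jump at t = 0.
E_true = np.where(t < 0, F0, F1)
x_retinal = 0.0 - E_true                  # retinal position of the stimulus
X_perceived = x_retinal + E               # localization bias = X_perceived - 0
```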
Because we have no physiological data on how the strengths of these putative gaze-related activities may vary over time, we adopt another approach, working backward from our data to predict the gaze functions. As stated above, we assume that the eye-position signal is computed from the maximum likelihood estimate of gaze-related activities and calculate how these activities should vary over time to produce our empirical data.
The results of this simulation are illustrated in Figure 7. Figure 7A–D shows the transition of the eye-position signal from initial fixation (F0) to the saccadic target (F1): A shows the time courses of the maximum activities at F0 (light gray line) and at F1 (dark gray line), and B–D show the spatial distributions of the responses at three times. Well before the saccade (Fig. 7B), there is a clear, sharply tuned signal at F0 (light gray curve), causing the maximum likelihood estimate of gaze position (thin black curve) to be centered there. This signal gradually decreases while the signal at F1 begins to build up. Well after saccadic offset (Fig. 7D), the strong signal for eye position at F1 (dark gray curve) dominates the maximum likelihood estimate, but between the two extremes (Fig. 7C), both signals contribute to the maximum likelihood estimate, causing it to be between the two fixation points. Importantly, during this intermediate period when both responses are low, the signal has high variance and hence low reliability.
Figure 7E shows the time course of the eye-position signal E(t) and its associated SD, σ(E(t)). E(t) moves progressively from F0 to F1, signaling midway for an extended period. During this period, the SD of the signal, σ(E(t)), is particularly high. Figure 7F shows the effects of the saccade on retinotopic position estimates x(t): the saccade displaces the retinal image, and hence the position signal arising from it, but we assume that it does not affect the reliability of the estimated retinal positions of the stimuli. Figure 7G shows the sum of the two signals. As can be seen, it closely follows the data, both for the changes in localization bias and for the changes in precision.
On the one hand, the fact that the model fits the data well is unsurprising, given that its parameters were estimated from the psychophysical results. On the other hand, it shows that a system of this sort can produce these results. With two basic assumptions, that perceived position is given by the sum of an eye-position signal and retinal information and that the eye-position signal is computed as the maximum likelihood estimate of two Gaussian-shaped population activities centered at F0 and F1, the time courses of activity at F0 and F1 are completely constrained by the measured values of localization bias and threshold, with no other free parameters. Consider, for example, the time course of activity at F0 (Fig. 7A, light gray line): activity initially decreases but shows a “rebound” just after saccadic onset. The observed time courses of bias and threshold jointly determine this variation: around saccadic onset, the threshold [and therefore σ(E(t))] is maximal, forcing both gaze-related activities to be weak (Eq. 9). Some 25 ms after saccadic onset, the threshold begins to recover to its usual value, forcing either the activity at F0 or that at F1 to increase (Eq. 9); meanwhile, the localization bias changes at a rate similar to that of the retinal position of the stimuli, which keeps E(t) approximately constant (Eq. 4) and forces the activities at F0 and F1 to have similar strengths (Eqs. 7, 8). The activity at F0 is eventually allowed to decrease near the end of the saccade, when σ(E(t)) decreases and E(t) progressively grows toward F1. Thus, whether the rebound has any physiological significance is yet to be determined, but it is a clear prediction of the present model.
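This backward inference can be made explicit: given E(t) from the measured bias and σ(E(t)) from the measured threshold, Equations 7–9 invert in closed form. The sketch below uses illustrative values, not the paper's data.

```python
# Minimal sketch of the "working backward" step: recover the population
# variances (hence maximum activities) from the eye-position signal and its SD.
import numpy as np

F0, F1 = -16.0, 16.0
E = np.array([-15.0, -8.0, 0.0, 8.0, 15.0])        # eye-position signal (deg)
sigma_E = np.array([2.0, 6.0, 8.0, 6.0, 2.0])      # its SD (deg)

w1 = (E - F0) / (F1 - F0)                          # invert Eq. 7
w0 = 1.0 - w1

# From Eq. 8, w0 = sigma1^2 / (sigma0^2 + sigma1^2); combined with Eq. 9,
# sigma_E^2 = sigma0^2 * w0, giving closed-form solutions:
sigma0 = sigma_E / np.sqrt(w0)
sigma1 = sigma_E / np.sqrt(w1)

A0, A1 = 1.0 / sigma0, 1.0 / sigma1                # Eqs. 6a, 6b (up to a constant)
```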
Discussion
The general aim of this study was to investigate whether spatial mislocalization at the time of saccades may be consistent with statistically optimal Bayesian integration, by measuring spatial localization of perisaccadic visual, auditory, and bimodal audiovisual stimuli. We found that visual localization changes dramatically in the perisaccadic interval: not only does it become biased toward the saccadic target, but it also becomes far less precise. Auditory localization, however, was virtually unaffected by eye movements. Bimodal localization was affected in a way that was well predicted by optimal Bayesian integration of visual and auditory spatial cues, with localization depending less on visual signals at the time of saccades than during fixation.
For one of the four subjects tested extensively (G.A.), the pattern of perisaccadic visual mislocalization was atypical: unlike the other subjects, he mislocalized perisaccadic visual stimuli only slightly, and the direction of the bias was opposite to that of the saccade. Although atypical, localization errors against the direction of the saccade have been observed previously for stimuli presented beyond the saccadic target (Morrone et al., 1997), leading to an apparent compression of visual space. It may be that, in this atypical subject, compression of visual space was stronger than usual, also affecting stimuli presented close to the fixation spot. However, despite this idiosyncrasy, his results were in line with those of the other subjects with respect to the main findings of this study: the precision of visual localization decreased in the perisaccadic interval relative to normal fixation, and the perisaccadic bimodal localization performance was well predicted by assuming optimal integration of visual and auditory cues.
Saccades have little influence on auditory localization
Auditory localization was almost unaffected by saccades. Perisaccadic auditory stimuli were localized with the same precision as during fixation, as Harris and Lieberman (1996) previously showed for the detection of auditory stimuli. Similarly, there was no statistically significant bias in location, and the direction of the bias varied across subjects. These results agree well with previous reports showing that auditory localization during eye movements is unbiased (Vliegen et al., 2004) or that perisaccadic localization biases are small (Klingenhoefer and Bremmer, 2004).
Saccades influence visual space perception
In agreement with previous reports, in three of our primary subjects and in three additional observers, we found that visual localization was grossly biased around the time of saccades, with stimuli presented near the initial fixation spot shifted in the direction of the saccade; the effect was maximal at saccadic onset but commenced some 50 ms earlier, as observed by several research groups (Matin, 1972; Honda, 1989; Morrone et al., 1997; Ross et al., 1997). These errors have been proposed to result from a mismatch between the actual gaze position and the eye-position signal used by the visual system to recalibrate visual spatial maps across saccades (Pola, 2004), the latter anticipating the actual eye trace. As suggested by studies of saccadic adaptation and double-step saccades, the reason for this anticipation may be the presence of an intention-to-move signal (Dassonville et al., 1992; Bahcall and Kowler, 1999), a signal representing gaze at the intended postsaccadic location before the actual onset of the eye movement.
The current study is the first to specifically measure the precision of visual localization during saccades, using a bias-free technique (by measuring psychometric functions, we could derive independent estimates of random and systematic errors). The results from all tested subjects, showing a substantial decrease in precision, agree qualitatively with previous studies (Ross et al., 1997; Bockisch and Miller, 1999).
One possible cause of the degradation in spatial localization could be the diversion of attentional resources away from the localization task at the time of oculomotor planning (Kowler et al., 1995). But it seems unlikely that attention could account for such a large (factor of ten) effect, because the effects of attention are on the order of a factor of two or three, at most (Lee et al., 1999; Morrone et al., 2002).
Poor localization precision is more probably related directly to the systematic mislocalization, given that the two phenomena have similar temporal dynamics. Our model suggests that they have the very same cause: both can be accounted for by assuming that the actual shift of retinal images during saccades is compensated by a predictive, but sluggish and noisy, eye-position signal. Such a signal could arise from the optimal integration (maximum likelihood estimate) of two competing gaze-related signals, one representing gaze in the presaccadic direction and the other representing gaze in the postsaccadic one (possibly, the so-called intention-to-move signal).
Current computational work suggests that maximum likelihood estimation is a plausible rule for encoding information at the population level (Jazayeri and Movshon, 2006). Neurophysiological evidence indicates that some information about the amplitude and direction of impending saccades reaches a subpopulation of visually driven neurons before actual saccadic onset and affects their function in the perisaccadic interval (Sommer and Wurtz, 2006). These neurons are said to demonstrate “predictive remapping” and often reside in parietal areas (Duhamel et al., 1992; Walker et al., 1995; Kusunoki et al., 1997; Umeno and Goldberg, 1997; Batista et al., 1999; Nakamura and Colby, 2000). In qualitative agreement with our simulations, the visual receptive fields of these neurons abruptly change location just before the saccade, “jumping” from their presaccadic to their postsaccadic locations, with no response at intermediate locations.
It is possible that the activity of these visually driven neurons implicitly encodes the gaze-related signal used to keep the perceived positions of visual stimuli stable despite the shift of their retinal images. We suggest that, at the time of saccades, this gaze-related signal does not faithfully reproduce the actual displacement of the eyes [causing perisaccadic mislocalizations, as proposed previously by several authors (Matin, 1972; Dassonville et al., 1992; Miller, 1996; Morrone et al., 1997; Pola, 2004)] and is also far noisier than during normal fixation (causing the perisaccadic decrease in localization precision). Our simulations suggest that this may occur because, in this time window, gaze-related activities are weaker than usual. Several factors could account for the gaze-related activity being weaker. For example, the fact that remapping does not proceed simultaneously in all cells would introduce a blurring of the response (Kusunoki and Goldberg, 2003). Another factor could be saccadic suppression (Burr et al., 1994), which is strongest for the magnocellular system that projects predominantly to parietal cortex.
This particular model, the core parameters of which were derived from our psychophysical observations, rather than from independent physiological data, is not essential for the Bayesian approach to be useful, but it is one that seems to fit well with current physiology (Sommer and Wurtz, 2006).
Multimodal integration
Over the past few years, numerous experiments have shown that information from independent sensory channels is integrated with a near-optimal or Bayesian strategy, so the perceived size or position results from the weighted sum of all of the available sensory cues, with weights proportional to the reliability of each sense (Knill and Kersten, 1991; Ernst and Banks, 2002; Alais and Burr, 2004; Ernst and Bulthoff, 2004). Here, we test these predictions in an atypical situation: the conflict between multimodal cues and the degradation of sensory information were not produced by manipulating the stimuli but by taking advantage of the distortions of visual space that occur at the time of saccades. That the Bayesian framework works in these “natural” conditions provides very strong support for it.
Within this framework, our data provide firm evidence that the weight assigned to each sense is not fixed but is updated dynamically and rapidly so as to match the reliability of that sense at that particular moment. This is most clearly seen in Figure 6, where visual bias and sensitivity, and the Bayesian prediction of bimodal performance, vary smoothly over the interval 50 ms on either side of saccadic onset. This dynamic updating of sensory weights implies that the nervous system can measure the error associated with each sensory estimate. It is still far from clear how neural mechanisms achieve this (Witten and Knudsen, 2005).
Conclusion
Seeing is usually believing: for about two-thirds of our waking lives, when we fixate, we perceive objects where vision tells us they are, which, more often than not, coincides with their actual position. In the remaining time, the visual system sends us erroneous spatial information, presumably because it is engaged in correcting the troublesome consequences of eye movements on the retinal input. When this happens, we disbelieve visual information; if available, spatial cues from other senses become dominant. Here, we show that not only multimodal perception but also visual space perception can be modeled by assuming that, at any time, the perceptual system optimally integrates the available sources of information.
Footnotes
This work was supported by the Italian Ministry of University and Research, the Australian National Health and Medical Research Council, and the European Commission Sixth Framework Program (NEST, MEMORY). We are particularly grateful to Dr. Marco Cicchini for advice and help with the modeling.
Correspondence should be addressed to Dr. David Burr, Department of Psychology, Università Degli Studi di Firenze, Florence, Italy. E-mail: dave@in.cnr.it
References
Alais and Burr, 2004.
Bahcall and Kowler, 1999.
Batista et al., 1999.
Bayes, 1764.
Bockisch and Miller, 1999.
Burr et al., 1994.
Clark and Yuille, 1990.
Dassonville et al., 1992.
Deneve et al., 2001.
Duhamel et al., 1992.
Ernst and Banks, 2002.
Ernst and Bulthoff, 2004.
Harris and Lieberman, 1996.
Honda, 1989.
Jazayeri and Movshon, 2006.
Klingenhoefer and Bremmer, 2004.
Knill and Kersten, 1991.
Kowler et al., 1995.
Kusunoki and Goldberg, 2003.
Kusunoki et al., 1997.
Lee et al., 1999.
Mateeff, 1978.
Matin, 1972.
Matin and Pearce, 1965.
Miller, 1996.
Morrone et al., 1997.
Morrone et al., 2002.
Nakamura and Colby, 2000.
Niemeier et al., 2003.
Pola, 2004.
Robinson, 1975.
Ross et al., 1997.
Sommer and Wurtz, 2006.
Umeno and Goldberg, 1997.
Vliegen et al., 2004.
Walker et al., 1995.
Watson and Pelli, 1983.
Weiss et al., 2002.
Witten and Knudsen, 2005.