Abstract
Both space and time are grossly distorted during saccades. Here we show that the two distortions are strongly linked, and that both could be a consequence of the transient remapping mechanisms that affect visual neurons perisaccadically. We measured perisaccadic spatial and temporal distortions simultaneously by asking subjects to report both the perceived spatial location of a perisaccadic vertical bar (relative to a remembered ruler), and its perceived timing (relative to two sounds straddling the bar). During fixation and well before or after saccades, bars were localized veridically in space and in time. In different epochs of the perisaccadic interval, temporal perception was subject to different biases. At about the time of the saccadic onset, bars were temporally mislocalized 50–100 ms later than their actual presentation and spatially mislocalized toward the saccadic target. Importantly, the magnitude of the temporal distortions co-varied with the spatial localization bias and the two phenomena had similar dynamics. Within a brief period about 50 ms before saccadic onset, stimuli were perceived with shorter latencies than at other delays relative to saccadic onset, suggesting that the perceived passage of time transiently inverted its direction. Based on this result we could predict the inversion of perceived temporal order for two briefly flashed visual stimuli. We developed a model that simulates the perisaccadic transient change of neuronal receptive fields predicting well the reported temporal distortions. The key aspects of the model are the dynamics of the “remapped” activity and the use of decoder operators that are optimal during fixation, but are not updated perisaccadically.
Introduction
How temporal information is encoded by the brain remains a mystery. While traditional theories rely on dedicated mechanisms to represent explicitly the passage of time (Treisman, 1963), recent approaches suggest that accurate timing over the sub-second scale may be achieved by distributed, modality-specific mechanisms (Morrone et al., 2005; Johnston et al., 2006; Binda et al., 2007b). Time may be encoded not explicitly but implicitly in the pattern of activity of a neural net which ultimately represents the spatiotemporal configuration of the stimuli (Buonomano and Merzenich, 1995; Buhusi and Meck, 2005; Eagleman, 2008), particularly intervals in the range of 100 ms, similar to the decay constant of early visual mechanisms (DeLange, 1958; Burr, 1981).
Many studies have shown that time can be strongly influenced by action (Haggard et al., 2002), especially saccadic eye-movements (Yarrow et al., 2001; Morrone et al., 2005). Saccades also produce well documented changes in the spatial selectivity and response dynamics of neurons of many visual areas. These cells respond perisaccadically to stimuli flashed to the spatial position that will become their receptive field after the saccade has been completed, mediated by a corollary discharge signal [“predictive remapping” (Duhamel et al., 1992; Umeno and Goldberg, 1997; Kubischik, 2002; Nakamura and Colby, 2002; Sommer and Wurtz, 2002; Krekelberg et al., 2003; Kusunoki and Goldberg, 2003; Sommer and Wurtz, 2006)]. The dynamics at the “future receptive field” differ from usual responses, most notably with a longer response latency: about 80 ms in V3A and frontal eye-fields (Nakamura and Colby, 2002; Sommer and Wurtz, 2006), and >100 ms in LIP (Kusunoki and Goldberg, 2003). The functional role of the increase in latency is not clear, but could result from the time needed to combine the corollary discharge signal with the visual response (Wurtz, 2008). The perisaccadic response in the classical receptive field also changes, becoming more transient with decreased latency [for example, compare responses of Nakamura and Colby (2002), their Fig. 2]. These changes in response cause a transient elongation in space-time in the neuronal receptive field.
Psychophysical studies show that saccades have major perceptual consequences, both in space and in time. Stimuli flashed briefly just before or during a saccade are erroneously localized in space, displaced toward the saccadic target (Matin, 1972; Honda, 1989; Morrone et al., 1997; Ross et al., 1997; Lappe et al., 2000). The temporal separation between two perisaccadic flashes is systematically underestimated and the perceived order of the two perisaccadic stimuli can be inverted (Morrone et al., 2005). In this study we measured simultaneously the perceived location and timing of a visual stimulus briefly flashed at the time of a saccade, showing that spatial and temporal errors follow similar dynamics and are strongly correlated with each other. The results are well simulated by a model based on physiologically observed predictive remapping. Much of this work has been reported previously in abstract form (Binda et al., 2007a; Morrone et al., 2008).
Materials and Methods
Apparatus
The experiments were performed in a dimly illuminated and quiet room. Subjects sat before a monitor screen (70° by 50°) at a distance of 30 cm, with the head stabilized by a chin rest and eye level aligned to screen center. Stimuli were generated by a dedicated stimulus generator (Cambridge Research Systems VSG2/5 framestore), and presented on a CRT color monitor (Barco Calibrator) at a resolution of 464 × 243 pixels and refresh rate of 250 Hz. Visual stimuli (usually vertical bars) were presented against a red background (Commission Internationale de l'Éclairage (CIE) coordinates: x = 0.624; y = 0.344; luminance: 18 cd/m²). Auditory stimuli were generated by the computer sound board and gated to a speaker placed above the monitor via a switch controlled by digital output from the VSG framestore. Audiovisual synchrony was checked and found to be accurate.
Eye movements
Eye movements were recorded by an infrared limbus eye tracker (HVS SP150 or ASL 310), with sensor mounted below the left eye on transparent wraparound plastic goggles, through which subjects viewed the display binocularly. The VSG framestore sampled eye position at 1000 Hz and stored the trace in digital form. In off-line analysis, saccadic onset was determined by an automated fitting procedure, and checked by eye. The experimenter also checked the quality of saccades, and, when necessary, discarded the trial (<10% of trials for a corrective saccade or for unsteady fixation). At the beginning of each session, a calibration routine was run and the horizontal eye position signal was linearized with reference to a ruler displayed on the monitor screen. Subjects were asked to memorize the ruler during the calibration routine and use this as a reference for spatial localization tasks.
Data analysis
Analyses and data fitting were performed using custom software developed in Matlab 7.4 (MathWorks). Psychometric functions were fit with cumulative Gaussian distributions, using the Maximum Likelihood method (Watson, 1979). SEs were computed by bootstrap (Efron and Tibshirani, 1994), resampling the data (with replacement) and repeating the fitting process 500 times.
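The fitting procedure can be sketched as follows. This is an illustrative Python fragment, not the custom Matlab software used for the study: the grid search, the clipping constant and all function names are assumptions made for the example, and a different optimizer could equally be used. It fits a cumulative Gaussian by maximum likelihood and estimates the SE of the PSE by resampling trials with replacement.

```python
import math
import random
from statistics import NormalDist, stdev


def neg_log_likelihood(mu, sigma, trials):
    """Negative log-likelihood of binary responses under a cumulative Gaussian.

    trials is a list of (x, response) pairs, with response equal to 1 or 0.
    """
    nd = NormalDist(mu, sigma)
    total = 0.0
    for x, resp in trials:
        # Clip probabilities so that log() stays finite (assumed constant).
        p = min(max(nd.cdf(x), 1e-6), 1.0 - 1e-6)
        total -= math.log(p) if resp else math.log(1.0 - p)
    return total


def fit_cumulative_gaussian(trials, mus, sigmas):
    """Grid-search maximum-likelihood estimate; returns (PSE, JND)."""
    grid = [(m, s) for m in mus for s in sigmas]
    return min(grid, key=lambda ms: neg_log_likelihood(ms[0], ms[1], trials))


def bootstrap_se(trials, mus, sigmas, n_boot=500, seed=0):
    """SE of the PSE from n_boot resamples of the trials (with replacement)."""
    rng = random.Random(seed)
    pses = []
    for _ in range(n_boot):
        resample = rng.choices(trials, k=len(trials))
        pses.append(fit_cumulative_gaussian(resample, mus, sigmas)[0])
    return stdev(pses)
```

A coarse grid limits the resolution of both the fit and the bootstrap SE, so in practice the grid step should be small relative to the expected JND.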
Experiment 1: audiovisual bisection and temporal order judgment tasks
We measured perceived time and space concurrently for perisaccadic visual events by asking subjects to perform a double task: they reported the perceived timing of a briefly presented visual stimulus, either relative to two auditory markers (bisection) or relative to a single sound (temporal order judgment: TOJ); they also reported its apparent location, relative to the ruler they used for calibration.
At the beginning of each trial, subjects fixated a 1° black dot presented 10° left of center. After a variable delay of about 1000 ms, subjects saccaded to the saccadic target, a 1° diameter black dot presented 10° right of center. At about the time of the saccadic onset, a green vertical bar (2° × 50°, CIE coordinates: x = 0.286; y = 0.585; luminance: 54.5 cd/m²) was presented at the center of the screen for one monitor frame (4 ms). Before and after the visual stimulus, two flanking auditory markers (16 ms white noise bursts, 65 dB at the subject's ears) were presented. The flankers were separated from each other by 200 ms, and presented at a time varied randomly over a 200 ms range, straddling the visual stimulus with a variable delay (see Fig. 1A, lower panel). Subjects judged in two-alternative forced choice which of the two sounds appeared to be temporally closer to the bar (bisection task), and also reported the perceived location of the bar. To avoid stereotyping of the responses, particular care was taken to randomize the delay of the stimuli relative to the saccadic target, collecting responses for all delays. In the off-line analysis, trials were binned relative to saccadic onset, and data from each bin were analyzed separately.
Four subjects participated in the experiment (three authors and one naive to the goals of the experiment), all with normal or corrected-to-normal vision, and normal hearing. Each subject completed a minimum of 10 sessions, each of 50 trials, and 2 sessions of 50 trials for the control condition (steady fixation). Experimental procedures were approved by the local ethics committees and were in line with the Declaration of Helsinki.
Perceived bar location was computed as the average reported bar location in each time-bin (see Fig. 5B). Two procedures were adopted for the analysis of temporal judgments. First they were analyzed as psychometric functions (see Fig. 2A), where the proportion of trials in which the bar was perceived as temporally closer to the second sound (bisection task) was plotted against the time of the sound. The data were fitted with a cumulative Gaussian function, to yield an estimate of the PSE (Point of Subjective Equality, corresponding to the median). The SD of the function gives the JND (Just Noticeable Difference) or precision of temporal judgments. As well as calculating psychometric functions, we computed the proportion of “closer to the second” responses over all the tested bar-sound asynchronies (–100 to 100 ms), and z-transformed this value with an assumed SD of 60 ms (an approximation of the average JND observed across all subjects in the saccade condition). This simplified procedure yielded an estimate of bar perceived time with a higher temporal resolution, allowing a finer-grained comparison with the model.
From one author and one naive subject we collected additional data on an audiovisual temporal order task (subjects judged the temporal order between the visual bar and a single noise burst). With this task we also measured performance with a “simulated” saccade, where subjects observed the monitor through a mirror that rotated with similar dynamics to a saccade (simulating the retinal motion produced by a real saccade, while subjects kept fixation). Only trials with the visual stimulus presented just before (<25 ms) the real or simulated saccade were considered. Judgments were analyzed as psychometric functions (see Fig. 2, lower panels), plotting the proportion of trials in which the bar was perceived as following the sound against the asynchrony between the two stimuli.
Experiment 2: visual temporal order judgments
Two green vertical bars (2° × 20°; CIE coordinates: x = 0.286; y = 0.585; luminance: 54.5 cd/m²; duration: 4 ms) were presented at the center of the screen, one in the upper and the other in the lower hemifield (vertical position: ±10°), at a fixed temporal separation (Stimulus Onset Asynchrony: SOA; see Fig. 1B). Subjects reported which of the two bars was presented first (2-AFC Visual Temporal Order Judgment task). Data were collected from three participants who had served as subjects for Experiment 1 (three authors). For each subject the SOA between the bars was set to a value allowing above-chance but non-perfect performance in steady fixation conditions. At least 10 sessions of 50 trials were run for each participant, plus 1 session of 50 trials in the steady-fixation condition. Data were binned according to the delay of the first bar presentation relative to saccadic onset and the proportion of correct responses computed for each time-bin.
The performance of each subject on the visual TOJ task (the probability of reporting the correct temporal order of the two flashed visual stimuli) was predicted from his/her own performance on the audiovisual bisection task (Experiment 1). From data of Experiment 1, we estimated the perceptual time-lines (like those in Fig. 4) for two stimuli presented at different delays from the saccade, the first stimulus at t, the second at t + SOA. The difference between the two perceptual time-lines estimated the perceived temporal distance between the two stimuli, with negative differences implying that the second stimulus was perceived as leading the first. The probability of perceived temporal order of the two stimuli was computed by a Monte Carlo simulation of the data, re-sampling (with replacement) from the individual trials (500 repetitions) and counting the number of times the perceived timing of first stimulus preceded that of the second. From this count the probability of perceiving the order veridically was calculated, and reported by open symbols in Figure 6.
Results
Audiovisual temporal judgments
In the first experiment, subjects reported both the apparent temporal position (relative to two noise bursts), and the apparent spatial location (relative to a previously memorized ruler) of a bar flashed around the time the observer made 20° voluntary saccades (see Materials and Methods) (Fig. 1A). Figure 2A shows results for the four tested subjects, for perisaccadic trials (bar displayed between ± 25 ms relative to saccadic onset) and also during fixation. The psychometric functions show a clear effect. In steady fixation (hollow symbols), temporal judgments were veridical, with the median of the psychometric function (PSE) close to 0 ms. However, when the bar was flashed just before or during a saccade (filled symbols), time was systematically misjudged, with PSE around 50 ms, suggesting that the bar was perceived delayed by 50 ms. The precision of the temporal judgments (given by the SD of the curves) was poorer in the perisaccadic than the fixation condition, 58.5 ± 8.3 ms compared with 30.9 ± 5.2 ms on average.
To verify that the effect is not specific to the bisection task, in two subjects we measured temporal order judgments between an acoustic and visual target presented perisaccadically. Figure 2B shows that, in these conditions, the perisaccadic visual stimulus is delayed by about 100 ms with respect to the steady fixation condition (filled and open symbols, respectively). Crosses show the measurements performed with simulated saccades, where subjects maintained steady fixation and stimuli were observed through a mirror rotating at saccade speed. Simulated saccades did not induce a delay of bar apparent time (indeed, if anything, produced a slight advancement in subject JED), suggesting that the effect is mediated by an internal non-retinal signal, and it is not merely the by-product of spurious retinal motion. The larger perisaccadic effect with the TOJ task may reflect the smaller time windows of bar presentation with respect to the saccadic onset used for this task. Given that subjects found the bisection task easier than the TOJ task, this was used to collect most data.
We studied the dynamics of the temporal bias in the bisection task by estimating perceived timing for bars presented at various delays from the saccade (Fig. 3). The symbols show estimates of PSEs from psychometric functions for appropriately binned data. The data clearly show that for all subjects visual stimuli are perceived as progressively delayed when presented near saccadic onset. However, for all subjects, PSEs show an abrupt inversion 50–70 ms before the saccadic onset. This effect is restricted to a very small time-window, indicated by the ellipse. The panels at right show the psychometric functions for the two points indicated by the ellipses. In each case, the curve for the later presented stimulus is to the left of the earlier stimulus, implying that it is seen earlier. That is, time is inverted in this period.
The thick curves passing near data points in the leftmost panels are B-splines calculated from the z-value technique described in Materials and Methods. This curve agrees well with the PSE values, and captures the important aspects, the delay near saccadic onset, and the inversion at –70 to –50 ms.
Two of the four subjects show a tendency for time judgments to have shorter latencies for very early presentations, 100–150 ms before saccadic onset. However, as these stimuli were presented very near the time of display of the saccadic target, they are subject to attentional prior-entry-like effects that may well explain their perceived advancement in time. In any event, these effects are not central to the arguments being presented here.
It is possible to evaluate how perceptual time varies with physical time (Fig. 4), by adding to the temporal bias the physical time of bar presentation (using z-score data). Perceived time accelerates relative to physical time just before the saccade (slope of the time-line >1) and strongly decelerates during the saccade (slope <1); at about 50 ms before the saccadic onset, perceived time inverts in direction (brief segments with slope <0).
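The construction of the perceptual time-line can be illustrated with a short Python sketch. The function names and the numerical values in the usage note are hypothetical and serve only to show the computation: perceived time is the physical presentation time plus the temporal bias, the local slope of the time-line indicates acceleration (>1) or deceleration (<1), and a negative slope marks an inversion.

```python
def perceptual_timeline(physical_times, biases):
    """Perceived time of each bar: physical time plus temporal bias (ms)."""
    return [t + b for t, b in zip(physical_times, biases)]


def local_slopes(physical_times, perceived_times):
    """Slope of the time-line between consecutive presentation times."""
    points = list(zip(physical_times, perceived_times))
    return [(p1 - p0) / (t1 - t0)
            for (t0, p0), (t1, p1) in zip(points, points[1:])]


def inversion_intervals(physical_times, perceived_times):
    """Intervals of physical time over which perceived time runs backwards."""
    slopes = local_slopes(physical_times, perceived_times)
    spans = zip(physical_times, physical_times[1:])
    return [span for span, slope in zip(spans, slopes) if slope < 0]
```

For example, with made-up presentation times of -100, -75, -50, -25 and 0 ms and made-up biases of 0, 40, 0, 30 and 60 ms, the only segment with negative slope lies between -75 and -50 ms.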
Figure 5A shows the average temporal mislocalization as a function of time relative to saccadic onset. Data from all subjects were pooled together, and PSEs calculated for the various time bins. Note that this has the effect of blurring (but not eliminating) the time inversion, as it did not occur at exactly the same time for all subjects. As mentioned in Materials and Methods, subjects were always required to report both the perceived time of the bar and its apparent location. Figure 5B shows the time course of the spatial mislocalization errors averaged across all subjects. While the spatial and temporal curves have slightly different dynamics, they are clearly very similar, with spatial and temporal mislocalization occurring mainly in the time range ± 50 ms from saccadic onset.
As the spatial and temporal judgments were collected together on each trial, it is possible to investigate the relationship between the two distortions. Figure 5C plots the temporal bias against the apparent mislocalization of the bar. All data from all subjects were pooled and binned according to the degree of spatial mislocalization (two-degree intervals). From these data a single psychometric function of temporal mislocalization was calculated to estimate the PSEs. The farther the bars are perceived from their actual position (screen center) the more they are perceived as delayed: the temporal bias varies linearly with the spatial mislocalization (R² = 0.96).
Visual-visual temporal order judgments
One prominent result of Experiment 1 is that, in a brief epoch about 50 ms before saccadic onset, perceived time flows backwards, against the direction of physical time. This agrees qualitatively with findings by Morrone et al. (2005), where the perceived order of two brief visual stimuli was found to be systematically inverted in a small interval before a saccade. Here we replicate the temporal inversion under conditions similar to those of Experiment 1, to test whether the inversions can be quantitatively predicted from the perceptual time-lines estimated in Experiment 1.
The same subjects used in Experiment 1 reported the order of appearance of two bars presented in the upper/lower hemifields. The SOA (temporal asynchrony) between the bars was selected for each subject to yield ∼75% correct responses in the steady fixation condition. Figure 6 (filled dots) shows data from the three tested subjects at short bar separation. The proportion of trials in which temporal order was correctly reported is plotted against the time of presentation of the first bar relative to the saccadic onset. For all subjects, performance varied considerably as a function of presentation time. At about 50 ms before saccadic onset, the probability of reporting the correct order falls below 50%, implying that temporal order is systematically perceived as inverted. In one subject (CM), the effect is clearly observed in two different epochs, about 50 and 80 ms before the saccade. Perisaccadic performance is similar to, or perhaps better than, fixation performance; after the saccadic onset it tends to be worse than during fixation, with probability correct approaching 50%.
Open symbols in Figure 6 report the performance predicted for each subject from his/her perceived time-line, as measured in Experiment 1 (data in Fig. 4). Predicted performance drops below chance at the same time and by a similar amount as observed performance. This suggests that perisaccadic temporal inversion can be accurately predicted by assuming that each of the two visual stimuli is subject to the same perisaccadic distortion of perceived time, and that the temporal distortion can be estimated reliably in audiovisual alignment tasks such as those performed in Experiment 1.
Modeling spatiotemporal distortions
We modeled the observed perceptual distortions by assuming that time and space are concurrently encoded within a linear-nonlinear-linear model where the nonlinear stage simulates a transient reorganization of the input just before saccades. Predictive remapping is a profound change observed in the spatial selectivity of neurons in many areas of the visual system, including V3, V4, LIP, SC, and FEF. A diagram of the phenomenon is shown in Figure 7 (rightmost panels) based on reported data of LIP neurons (Duhamel et al., 1992; Kusunoki and Goldberg, 2003). At saccadic onset, the receptive fields (RFs) of LIP neurons move in the direction of the saccade to encompass the future RFs. The phenomenon starts about 100 ms before the saccade, and continues throughout the saccade. In addition to the spatial displacement of the receptive field, the response at the future RF position becomes substantially delayed. The left-hand panels of Figure 7 illustrate the spatiotemporal changes to the visual receptive field profile, showing how it elongates in space-time. The shift in receptive field integrated over a time window of about 50 ms produces a cell sensitivity profile oriented in space-time (lower panel) with earlier responses at the classic RF and later responses at the future RF. The phenomenon has been called predictive, because the cell will respond to stimuli located at the future position of the RF before the eye movement starts.
To capture these aspects of RF deformation we propose a model with four stages: linear input, nonlinear remapping, a second linear encoding and a final decoding stage. All stages are operators defined in space (the horizontal position of the stimuli) and time. A diagram of the model is illustrated in Figure 8A.
Retinotopic input layer
The input layer represents stimuli in retinotopic coordinates. The operator that relays visual information is linear and has a spatiotemporal impulse response function in which t is the physical time from stimulus onset and x is space in retinotopic coordinates (retinal eccentricities). The spread of the Gaussian receptive field (RF) is given by the space constant σr. The temporal impulse response function is a simple causal filter where τr determines both the peak of the response and its decay. Figure 8A (leftmost box) illustrates the spatiotemporal transfer function of the input layer with peak activity at τr.
Its response OR to any stimulus I(x,t) is given by the convolution of I(x,t) with this impulse response function. We simulated the localization performance in space and time in response to a delta function I(x,t) whose location x varies with the retinal eccentricity of the stimulus at the time of its presentation.
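As a hedged illustration of the input layer, the Python sketch below assumes one specific functional form for the impulse response: a Gaussian in space multiplied by the causal temporal filter (t/τr)·exp(−t/τr), which peaks at t = τr and decays with the same constant. This form is chosen only because it matches the verbal description; the function name and the exact expression are assumptions of the example, and the normalization is arbitrary. The default values of σr and τr are those used for the fits reported at the end of the modeling section.

```python
import math


def input_irf(x, t, sigma_r=0.5, tau_r=30.0):
    """Assumed impulse response of the retinotopic input layer.

    x is retinal eccentricity (deg), t is time from stimulus onset (ms).
    """
    if t < 0:
        return 0.0  # causal filter: no response before the stimulus
    spatial = math.exp(-x * x / (2.0 * sigma_r ** 2))  # Gaussian RF
    temporal = (t / tau_r) * math.exp(-t / tau_r)      # peaks at t = tau_r
    return spatial * temporal
```

For a delta-function stimulus, the response of a linear layer is simply this impulse response shifted to the retinal position and time of the stimulus.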
Remapping encoding layer
The nonlinear stage updates the retinal information with eye position information by shifting the retinotopic representation according to an internal signal of eye position, CDS (corollary discharge signal). The CDS is a piecewise linear function which takes a value xF (“Fixation Point”) long before a saccade and a value xT (“Saccadic Target”) long after the saccade; the transition between the two values occurs linearly in the interval between Ton and Toff: CDS(t) = xF for t < Ton, CDS(t) = xF − v(t − Ton) for Ton ≤ t ≤ Toff, and CDS(t) = xT for t > Toff, where v is the velocity of the corollary discharge during the transition, v = (xF − xT)/(Toff − Ton). The onset and offset of the corollary discharge transition, Ton and Toff, do not necessarily correspond to the onset and offset of the saccade (for reference, see Pola, 2004).
Such an operation performs a coordinate transformation, changing the output of the retinal stage OR(x,t) into OR*(x + CDS(t),t). This operation is neither linear nor time invariant and hence cannot be modeled within the linear filtering stage. Although a coordinate transformation of this sort may seem physiologically complex, there exist several biologically plausible ideas in the literature. One successful model has been “gain fields” (Zipser and Andersen, 1988; Xing and Andersen, 2000; Cassanello and Ferrera, 2007), another population coding with basis functions that perform the combination of eye position signals and retinal activity (Deneve et al., 2001).
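The corollary discharge signal and the coordinate transformation it drives can be sketched in a few lines of Python. The function names are hypothetical. The default values are taken from elsewhere in the text (fixation and target at 10° left and right of center; Ton = –50 ms and Toff = 110 ms, as fitted for subject DB) and are used here only as an example.

```python
def cds(t, x_f=-10.0, x_t=10.0, t_on=-50.0, t_off=110.0):
    """Piecewise-linear corollary discharge signal (deg) at time t (ms).

    Holds x_f before t_on, x_t after t_off, and moves linearly in between.
    """
    if t <= t_on:
        return x_f
    if t >= t_off:
        return x_t
    return x_f + (x_t - x_f) * (t - t_on) / (t_off - t_on)


def remapped_position(x_retinal, t, **cds_params):
    """Shift a retinal position by the CDS value at time t: x + CDS(t)."""
    return x_retinal + cds(t, **cds_params)
```

With these defaults, a bar at screen center seen from the fixation point has a retinal eccentricity of +10°. It is mapped to 0° while the CDS still holds its fixation value, and to 20° if the same retinal activity is read out once the CDS has reached its post-saccadic value, which corresponds to a mislocalization as large as the saccade.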
The encoding layer is a linear filter with impulse response function E(x,t), where σe defines the spatial spread of the receptive field and τe the temporal integration constant. Note that neither of these parameters is crucial for the model (see below). Figure 8A (rightmost box) illustrates the spatiotemporal IRF of this layer.
Figure 8, B and C, illustrate the response of the encoding layer to a delta function during fixation, or just before the onset of a saccade. During fixation, the corollary discharge signal is constant and the response of the encoding layer to a stimulus I(x,t) is the convolution between the output of the retinotopic input layer (after coordinate transformation with x → x + xF) and the impulse response function E(x,t). During remapping, the system is no longer a time-invariant linear system, since the spatial coordinates depend on the value of CDS(t). However, we can approximate the response of the encoding stage at a fixed time t0 as the convolution with the corresponding operator OR* for t0; summing these contributions over time yields the overall response of the layer.
The response of the encoding layer is the response of retinal activity that moves along the space-time trajectory of the CDS while decaying in time, as illustrated by the lower 3-D sketch in Figure 8A. If the duration of the CDS shift is equal to or shorter than the duration of the retinal response decay, the pattern of the result will produce three peaks: one at fixation, one along the CDS trajectory and one at saccadic target (see Fig. 8C). The first peak occurs because the change in coordinate system of the nonlinear stage starts after the initiation of the response, the second corresponds to the peak of the response of the retinal input and the last at saccadic target when the coordinate transformation is completed and there is an accumulation of the energy of the tail of the retinotopic input decay.
Decoding stage
We decoded the response patterns implementing an optimal decoder similar to that described by Jazayeri and Movshon (2006). We first applied a hard threshold on OE at a level θ, then convolved the response of the encoder with the logarithm of the spatiotemporal tuning curve of the encoding layer (i.e., the response to a delta stimulus presented in steady fixation). The decoder (that behaves optimally in fixation conditions) computes a likelihood distribution of stimulus presentation as a function of space and time. The maximum of this distribution is taken as an estimate of the spatiotemporal coordinates in external space of the stimulus (which are therefore estimated according to an MLE rule). We assume that the set of spatiotemporal tuning functions used in the decoding stage remains the same at the time of saccades, and this ultimately produces large spatial and temporal errors in perisaccadic decoding.
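The logic of the decoding stage can be illustrated in one dimension (time only) with the Python sketch below. It is a simplified stand-in for the full spatiotemporal decoder, not the implementation used for the simulations: the tuning-curve shape, the floor that keeps the logarithm finite, and all names are assumptions of the example. The response is thresholded at a fraction θ of the fixation peak and then weighted by the logarithm of the fixation tuning curve shifted to each candidate stimulus time; the candidate with the largest log-likelihood is the estimate.

```python
import math

TAU = 30.0    # ms, assumed time constant of the fixation tuning curve
FLOOR = 1e-9  # assumed floor keeping log() finite where the tuning is zero


def tuning(u):
    """Assumed fixation tuning curve versus time since the stimulus (ms)."""
    return (u / TAU) * math.exp(-u / TAU) if u > 0 else 0.0


def decode_time(response, times, candidates, theta):
    """Maximum-likelihood estimate of stimulus time from a thresholded response.

    response[i] is the activity at times[i]; theta is the threshold expressed
    as a fraction of the peak of the fixation tuning curve.
    """
    peak = tuning(TAU)
    kept = [(t, r) for t, r in zip(times, response) if r > theta * peak]

    def log_likelihood(s):
        return sum(r * math.log(max(tuning(t - s), FLOOR)) for t, r in kept)

    return max(candidates, key=log_likelihood)
```

When the response has the same shape as the fixation tuning curve, the estimate coincides with the true stimulus time. A response reshaped by the remapping is read out with the same, unchanged templates, which is the source of decoding errors in the model.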
Model behavior
In fixation conditions, the response in the encoding layer is a unimodal distribution in space and time. The convolution with the tuning functions of the encoding units produces a unimodal likelihood function that corresponds to veridical localization of the space and time of the visual stimulus, apart from a constant latency. For example, the decoded activity in Figure 8B will generate localization at position 0 and time 220 ms (the peak of the encoded response), which can be considered as the latency of the visual responses.
Stimuli presented during the remapping result in patterns of responses in the encoding layer that are oriented in space-time, generating delayed responses at various spatial positions. Decoding errors fall into two main categories, depending on the timing of stimulus presentation. For stimuli presented when the remapping has just started, the bulk of the response is concentrated at a location corresponding to the physical stimulus position with late and weaker distributed activity along the trajectory of the CDS. The response at the original location is smaller and briefer than in fixation because the remapping is projecting some of the retinal stage activity at other positions along the CDS trajectory. Thresholding in these conditions is critical: a considerable part of the “tail” of activity will fall below the threshold and only the strongest and earliest responses are passed on to the decoder (Fig. 9A). The overall result is read by the decoder as an advancement in time and a veridical judgment in space. Conversely, input arriving toward the end of the remapping causes first a weak and distributed response, then a stronger late response corresponding to the time when the CDS assumes its post-saccadic value. This leads to biased spatial localization (a mislocalization in the direction of the saccade) and delayed estimates of the presentation time. An example of this is illustrated in Figure 8C: the stimulus is no longer localized at 220 ms delay, but at about 320 ms and at position 20 deg instead of 0 deg, corresponding to a mislocalization as large as the saccadic amplitude.
We studied the behavior of the model for different values of parameters and found that the duration of the CDS shift relative to the decay constant of the retinal temporal impulse response is the crucial parameter determining both the spatial and temporal distortions. Figure 9 shows the model temporal output as a function of decoder threshold θ (A) and of the time constant of the retinotopic filter τr (C). With no threshold there is no inversion of time, but even the smallest threshold (0.1% of peak response during fixation) produces an inversion. In all other respects, the model behavior remains largely similar irrespective of the threshold level. The thresholding need not be a hard threshold; it could easily be generated by an accelerating nonlinearity or normalizing gain control.
For time decays longer than 30 ms (with a fixed duration of the CDS equal to 160 ms), perceived time undergoes rapid changes resembling a step function, with a strong delay for perisaccadic stimuli. For temporal constants <20 ms there is mainly an advancement of predicted time and practically no perisaccadic delay. Figure 9, B and D, show the predicted spatial mislocalization as a function of the same parameters as in A and C. Except for very short values of τr (less than 20 ms) the spatial output is strongly biased in the direction of the saccade and stable with changes of the parameters. The immunity of the spatial error to threshold level and to τr is partially conferred by the use of a spatial operator with small spread. Also, no errors are observed for targets presented during the small temporal inversion, for any of the parameter values. However, for large errors, it is interesting to note that the dynamics of the temporal and spatial distortions are very similar, given that both are coupled by the transient shift of the CDS. The model predicts a maximum spatial error equal to the size of the saccade at saccadic onset, in contrast to the half saccadic-size error usually observed (see Fig. 5B). Perhaps the discrepancy can be explained by considering a shift combined with a compression of relative distances (Morrone et al., 1997).
The other parameters are not crucial. The temporal constant of the IRF of the encoder τe produces an overall scaling of the decoding errors, and also interacts with the threshold level to determine the final pattern of temporal decoding errors. The width of the spatial tuning curves (σr and σe) does not change the overall pattern of the temporal localization.
We also used other decoding strategies of the thresholded OE response, such as population vectors and template-matching filters determined for the fixation responses. All these attempts produced essentially the same results (and threshold dependence), indicating that both suboptimal and optimal decoding strategies generate similar errors when faced with the unusual activation patterns occurring in the encoding layer at the time of saccades. The key to all the simulations was the use of a strategy for the decoding of perisaccadic activities that was optimized for the steady fixation conditions. If the decoder were optimized for the perisaccadic pattern of activity, mislocalization errors should have been greatly reduced. But as transient perisaccadic stimuli are rare outside the laboratory, there would be little advantage for the visual system to implement this strategy.
Figure 4 superimposes the predictions of the model on the temporal judgments of all subjects. The best fit to the data for each subject was obtained for different corollary discharge onset times (Ton is equal to –60, –40, –50 and –40 ms for subjects CM, PB, DB and PC respectively) and offset times Toff (equal to 110 ms for CM, PB, DB and to 100 ms for PC). Threshold values θ for subjects CM, PB, DB and PC were respectively 0.3%, 0.4%, 0.2% and 0.3% of the maximum activity in the encoding layer in steady fixation conditions. We were able to fit the data for all subjects by keeping constant the parameters of the spatiotemporal tuning curves of both the retinotopic input layer and the encoding layer (the widths of the spatial tuning curves σr and σe were both 0.5, approximating a spatial delta function; the time constant of the retinotopic IRF τr was set to 30 ms and the time constant of the IRF for the encoding layer τe was 75 ms).
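For convenience, the fitted values listed above can be collected in a small data structure. The sketch below is only a bookkeeping aid: the class and field names are hypothetical, and treating the CDS duration as Toff minus Ton is an assumption of this example rather than a definition given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectFit:
    t_on_ms: float     # corollary discharge onset Ton, relative to saccadic onset
    t_off_ms: float    # corollary discharge offset Toff
    theta_pct: float   # decoder threshold, % of maximum fixation activity

    @property
    def cds_duration_ms(self) -> float:
        # Assumption of this sketch: duration taken as Toff - Ton.
        return self.t_off_ms - self.t_on_ms


# Per-subject values as reported in the text.
FITS = {
    "CM": SubjectFit(t_on_ms=-60, t_off_ms=110, theta_pct=0.3),
    "PB": SubjectFit(t_on_ms=-40, t_off_ms=110, theta_pct=0.4),
    "DB": SubjectFit(t_on_ms=-50, t_off_ms=110, theta_pct=0.2),
    "PC": SubjectFit(t_on_ms=-40, t_off_ms=100, theta_pct=0.3),
}

# Parameters held constant across subjects (spatial widths in the units of the text).
SHARED = {"sigma_r": 0.5, "sigma_e": 0.5, "tau_r_ms": 30.0, "tau_e_ms": 75.0}

durations = {name: fit.cds_duration_ms for name, fit in FITS.items()}
```

Under this assumption the fitted durations range from 140 ms (PC) to 170 ms (CM), bracketing the 160 ms value used in the parameter exploration.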
In summary, the model simulates at the encoding stage a spatiotemporally oriented neuronal receptive field encompassing the present and the future RF positions, with an increased latency for the future position. This transient change of the neuronal RF is able to predict most of the time distortions observed here.
Discussion
We measured the apparent timing of transient visual stimuli flashed during saccadic eye movements. Perceived time was strongly delayed for stimuli presented just before and during a saccade, and advanced for stimuli presented 50 ms before saccadic onset. The overall change in perceived timing resulted in a tendency for perisaccadic stimuli to be seen either before or after the saccade. In the same trials, we also measured the perceived spatial location of the stimulus and found that the spatial mislocalizations were strongly correlated with the distortions in time.
In a small pre-saccadic temporal window, the perceived passage of time not only decelerates, but runs backwards: the perceived temporal order of two successively presented visual stimuli was inverted. The apparent temporal order could be predicted by assuming that the two stimuli are represented along equally distorted independent time-lines. Just before a saccade the first stimulus is perceived veridically, but the second falls in the first phase of temporal distortion and is perceived as advanced, producing the perceived inversion. These findings replicate previous reports of perisaccadic inversion of perceived time (Morrone et al., 2005; Kitazawa et al., 2007). They also show that consistent measures of the phenomenon can be obtained with two different techniques, involving different perceptual tasks (a cross-modal temporal bisection task and a unimodal visual TOJ task) that engage memory and attention in different ways.
It is known that attention affects time perception in many ways, one being that attended stimuli are perceived earlier than unattended stimuli (Titchener, 1908; Shore et al., 2001; Park et al., 2003; Tse et al., 2004; Eagleman, 2008; Ivry and Schlerf, 2008). As attention is linked to saccade programming (Deubel and Schneider, 1996; Goldberg et al., 2006), prior entry could be relevant for the perceptual inversion. However, prior entry effects have slower dynamics than the temporally precise effects reported here (Reeves and Sperling, 1986), so it is unlikely that our findings could be explained by saccade-related modulation of attentive resources.
The delay of perceived time by 50–100 ms during saccades is at first sight similar to an effect termed “chronostasis” (Yarrow et al., 2001). However, the two phenomena have very different spatial and temporal properties. Chronostasis is specific to spatial positions that are attended (Georg and Lappe, 2007), while the effect reported here is observed for the position halfway between saccadic target and fixation, outside the focus of attention. In pilot studies we also measured other positions, and found no major difference in the temporal distortion pattern. Again, our phenomenon has very brief and strongly coupled dynamics, while chronostasis effects are perceived over more than a second after saccadic landing.
We believe that the reported perceptual effects reflect the action of perceptual mechanisms implicated in the transformation of the coordinate system from one gaze direction to another. Visual perception remains stable across saccades, despite the abrupt displacement of retinal images. To achieve stability, the visual system anticipates the displacement by predictively remapping visual representations. Illusory phenomena, such as the distortion of visual space, probably reflect this remapping (Ross et al., 2001). We propose that the remapping of visual information also impacts on the temporal latencies of visual representation, ultimately biasing the perceived timing of perisaccadic flashed stimuli.
As mentioned in the introduction, remapping has been demonstrated in neurons in many visual areas and is well documented in humans (Merriam et al., 2003, 2007). The proportion of cells showing remapping increases from 10% in V2 and 30% in V3 to >50% in parietal-frontal regions such as LIP or FEF (Nakamura and Colby, 2000; Kusunoki and Goldberg, 2003). The shape and size of the receptive fields are also affected by remapping. In area LIP the spatial receptive field elongates to encompass the presaccadic (current RF) and the postsaccadic (future RF) retinal position (Wang et al., 2008); in FEF the receptive field at specific times is composed of two foci, one at the current and one at the future RF position (Sommer and Wurtz, 2006); in V4 the RF tightens and deforms around the saccadic target (Tolias et al., 2001; Kubischik, 2002). In all cases, the neuronal representation of the visual field is transiently distorted, presumably causing the systematic mislocalization of visual stimuli flashed at about the time of a saccade. Interestingly, neuronal latencies also change perisaccadically, that of the “future” receptive field increasing by 80–100 ms in V3A, frontal eye fields and LIP (Nakamura and Colby, 2002; Kusunoki and Goldberg, 2003; Sommer and Wurtz, 2006). The perisaccadic response in the classical receptive field also changes, becoming more transient (Nakamura and Colby, 2002). All these changes in neuronal response generate a receptive field with a transient elongation in space-time.
We propose a concept model with similar neuronal behavior, in which encoding units orient their spatial receptive field and delay the response at the post-saccadic (“future”) RF. This transient remapping of visual information simulates both the spatial and temporal perceptual errors reported here. The model is a simple LNL encoder-decoder system whereby a stimulus, initially encoded in retinotopic coordinates, is remapped in response to an on-line eye-position signal. The remapping pattern of activity is read out by an optimal decoder, which estimates the most likely location and timing of the stimulus. In its present implementation, the encoding process is spatiotopic, as the corollary discharge affects the position of the receptive field, even during fixation. However, this is not a crucial stage: it would be easy to assign the spatiotopic component to the decoder stage or even at a subsequent memory stage. What is crucial to generate the spatiotemporal distortions is the transient perisaccadic tilting of the receptive field in the space-time domain. The tilting could be smooth and continuous across locations, as our model simulates, or built by two foci of activity at different times, as demonstrated for FEF neurons (Sommer and Wurtz, 2006; Wurtz, 2008). In both cases, it would predict the observed perceptual distortions. The correlation between spatial and temporal errors (Fig. 5C) further supports the idea that neural space and time are firmly interconnected (Morrone et al., 2005; Burr and Morrone, 2006).
The model presented here predicts strong perisaccadic spatial and temporal errors but does not account for the perisaccadic compression of space and time observed previously (Ross et al., 1997; Morrone et al., 2005). However, an extension of the model presented here could in principle predict compression of time and space. Consider the following space-space analogy. If two parallel lines are oriented at 45°, the most reliable estimate of the separation between them is given by the distance orthogonal to the lines orientation. In our model the most reliable estimate of the space-time separation is given along the direction orthogonal to the remapping velocity (Cicchini et al., 2009), and along this direction the space-time distance will be compressed. The prediction would be that the two lines are perceived closer in space and in time, as observed experimentally. The effect should be greater for faster predictive remapping.
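The geometry of the space-space analogy can be made concrete with a short calculation. The sketch below covers only the analogy itself (two parallel lines in a plane), not the space-time case, which would require a choice of units relating space to time; the function name is hypothetical.

```python
import math


def perpendicular_separation(axis_separation, angle_deg):
    """Orthogonal distance between two parallel lines.

    axis_separation: distance between the lines measured along the horizontal axis.
    angle_deg: orientation of the lines relative to that axis.
    """
    return axis_separation * math.sin(math.radians(angle_deg))


# Lines oriented at 45 deg and one unit apart along the horizontal axis.
d45 = perpendicular_separation(1.0, 45.0)

# Vertical lines (90 deg): the orthogonal distance equals the axis distance.
d90 = perpendicular_separation(1.0, 90.0)
```

For lines at 45° the orthogonal separation is 1/√2, about 71% of the separation measured along the axis, so an estimate taken along the orthogonal direction is compressed relative to one taken along either axis.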
The decoding strategy that we adopted is similar to the one proposed by Jazayeri and Movshon (2006) to model MT function, in that it provides a maximum likelihood estimate of the stimulus properties from the activity generated in the encoding layer. By adopting this strategy, the model yielded an optimal estimate of stimulus timing during fixation and an erroneous estimate at the time of saccades. Interestingly, time is decoded without explicitly using clocks, counters or accumulator mechanisms, but directly from the temporal profile of the response. This fits well with evidence suggesting that the perception of duration relies on multiple spatially distributed mechanisms (Johnston et al., 2006) organized in spatiotopic coordinates (Burr et al., 2007).
The model can also simulate the pre-saccadic inversion of time. In these intervals the decoder is confused by a curtailing of the neuronal discharge in the pre-saccadic receptive field, and interprets the visual event as occurring earlier in time. This curtailing of the response seems to be observed in neurons whose response is affected by saccades (Nakamura and Colby, 2002; Kusunoki and Goldberg, 2003; Sommer and Wurtz, 2006).
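The logic of a read-out that is calibrated during fixation and then misled by an altered response profile can be illustrated with a deliberately simplified toy. The sketch below is not the maximum likelihood decoder of the model: it assumes a single unit with an exponentially decaying response and a centroid read-out, and the 50 ms cutoff is a hypothetical value. Only τe = 75 ms and the 80 ms latency increase for the future RF are taken from the text.

```python
import math

N = 1000        # number of 1-ms samples in the simulated response
TAU_E = 75.0    # encoder time constant in ms, as in the fitted model


def exp_response(onset, tau, cutoff=None):
    """Exponentially decaying response starting at `onset` (ms).

    If `cutoff` is given, the response is curtailed `cutoff` ms after onset.
    """
    out = []
    for t in range(N):
        elapsed = t - onset
        if elapsed < 0 or (cutoff is not None and elapsed > cutoff):
            out.append(0.0)
        else:
            out.append(math.exp(-elapsed / tau))
    return out


def centroid(samples):
    """Temporal centre of mass of the response, in ms."""
    return sum(t * s for t, s in enumerate(samples)) / sum(samples)


# Calibration during fixation: lag between stimulus time and response centroid.
FIXATION_OFFSET = centroid(exp_response(0, TAU_E))


def decode_time(samples):
    """Estimate stimulus time, assuming the fixation response profile."""
    return centroid(samples) - FIXATION_OFFSET


t_stim = 200
fixation = decode_time(exp_response(t_stim, TAU_E))

# Curtailed discharge (hypothetical 50 ms cutoff): decoded as earlier.
curtailed = decode_time(exp_response(t_stim, TAU_E, cutoff=50))

# Additional response at the future RF, delayed by 80 ms: decoded as later.
current = exp_response(t_stim, TAU_E)
future = exp_response(t_stim + 80, TAU_E)
delayed = decode_time([a + b for a, b in zip(current, future)])
```

In this toy the fixation estimate is veridical, the curtailed response is assigned an earlier time and the response with a delayed component is assigned a later time, mirroring qualitatively the advance and delay described above. The centroid read-out was chosen for transparency; whether other simple read-outs behave the same way depends on the response profile and on the thresholding.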
In conclusion, many complex perceptual localization phenomena have been predicted by a simple model that simulates the space-time transformations that occur in the receptive fields of many visual neurons at the time of saccades. This line of reasoning shares many analogies with the physical case where the observer and the phenomenon to be measured are in different inertial frames of reference that move at different velocities (Special Relativity) and the analogy may be useful to elucidate the phenomenon of perisaccadic distortion of visual perception (Burr and Morrone, 2006; Morrone et al., 2009).
Footnotes
- This research was supported by the Italian Ministry of Universities and Research and by European Community Projects “MEMORY” (FP6-NEST) and “STANIB” (FP7 ERC).
- Correspondence should be addressed to Prof. M. Concetta Morrone, Department of Physiological Sciences, Università di Pisa, Via San Zeno 31, 56123 Pisa, Italy. concetta{at}in.cnr.it