Abstract
Eye movements in daily life occur in rapid succession and often without a predefined goal. Using a free viewing task, we examined how fixation duration prior to a saccade correlates with visual saliency and neuronal activity in the superior colliculus (SC) at the saccade goal. Rhesus monkeys (three male) watched videos of natural, dynamic scenes while their eye movements were tracked and, simultaneously, neurons were recorded in the superficial and intermediate layers of the superior colliculus (SCs and SCi, respectively), a midbrain structure closely associated with gaze, attention, and saliency coding. Saccades directed into the neuron's receptive field (RF) were extracted from the data. To characterize the complex visual input, saliency at the RF location was computed during the pre-saccadic fixation period using a computational saliency model. We analyzed whether visual saliency and neural activity at the saccade goal predicted pre-saccadic fixation duration. We report three major findings: (1) Saliency at the saccade goal was inversely correlated with fixation duration, with motion and edge information being the strongest predictors. (2) Visual saliency responses in both SCs and SCi were inversely related to fixation duration. (3) SCs neurons, but not SCi neurons, showed higher activation for two consecutive short fixations, suggestive of concurrent saccade processing during free viewing. These results reveal a close correspondence between visual saliency, SC processing, and the timing of saccade initiation during free viewing, and they are discussed in relation to their implications for understanding saccade initiation during real-world gaze behavior.
Significance Statement
Unlike in traditional studies using controlled stimuli and tasks, eye movements in day-to-day life are not discrete events but occur in (rapid) succession and often without a predefined goal. The study of visual processing during free viewing of dynamic scenes is therefore an essential step in understanding visual processing in its functional context. We present an investigation of saliency and visual responses in the superior colliculus (SC) during task-free viewing of dynamic videos and their correspondence to saccade initiation. In short, these results show the correspondence between fixation duration, pre-saccadic visual saliency at the saccade goal, and SC processing, and they provide the first evidence of a neural correlate of concurrent visual processing across a chain of saccades in the SC during free viewing.
Introduction
Free viewing behavior is characterized by a series of discrete eye movements (saccades) interspersed with periods of visual fixation, the duration of which depends largely on processes related to the analysis of currently foveated objects and on processes related to selecting the next peripheral target. Studies of gaze in task-free, dynamic viewing conditions (Berg et al., 2009; Mills et al., 2011; Nuthmann, 2017; White et al., 2017b) reveal that fixation durations are often shorter than those required in structured tasks [V4 (Ogawa and Komatsu, 2004), LIP (Thomas and Paré, 2007), FEF (Cohen et al., 2009), and the SC (McPeek and Keller, 2002; Basso and May, 2017)] and sometimes even shorter than the latency of the earliest visually triggered saccades, known as express saccades (Paré and Munoz, 1996; Dorris et al., 1997; Marino et al., 2015; Heeman et al., 2019). This discrepancy between viewing behavior in controlled experiments and during free viewing raises the question of how saccade target selection is affected by visual processing in a less controlled setting, where eye movements more closely resemble the way we scan our environment in day-to-day situations.
In controlled eye movement studies, saccadic reaction times (SRTs) are thought to reflect the time required to process visual input and are used to examine the sensorimotor processes associated with selecting the new saccade target (Heeman et al., 2017, 2019). These studies have provided important insights into the role of stimulus properties in the control of saccades (e.g., express saccades) as well as into the underlying mechanisms and neural circuits (Marino et al., 2015; White et al., 2017a). During free viewing, however, there is no controlled “target” per se. Assuming that a saccade is directed toward a region of interest, the duration of the fixation prior to the saccade may be considered the free viewing equivalent of the SRT in controlled experiments. To gain insight into the mechanisms of visual processing related to saccade target selection in a dynamic context, we examined how fixation duration prior to a saccade, as an estimate of the time available to process visual input, correlates with neuronal activation and visual saliency at the saccade goal.
In the current study, we combine eye tracking, extracellular recordings, and computational modeling to correlate neuronal responses and visual saliency with fixation duration. Eye movements of three rhesus monkeys were recorded while they freely viewed a series of dynamic, natural videos (Fig. 1A). Simultaneously, we recorded the activity of neurons in the superior colliculus (SC), a multilayered midbrain structure that is critical for control of gaze (Adamuk, 1870; Wurtz and Albano, 1980; Lee et al., 1988; Munoz et al., 2000), attention (Wurtz and Goldberg, 1972a; Krauzlis et al., 2013), and saliency coding (White et al., 2017a, b; Conroy et al., 2023). We used a six-channel computational saliency model (Itti et al., 1998; Itti and Koch, 2001) to measure the saliency of the visual input associated with the dynamic video (Fig. 1B). Saccades directed into the receptive field (RF) of a neuron were selected, and the pre-saccadic neuronal data were correlated with the fixation duration prior to the selected saccades (Fig. 1C,D). Furthermore, and unique to task-free gaze behavior, there are often sequences of quick saccades preceded by (sequences of) very short fixations (Kiat et al., 2021). We sought to understand whether and how SC neurons encode visual inputs during these consecutive fixations and to what extent salient inputs may be processed on such a short time scale.
Free viewing of dynamic video. A, Illustration of a single video frame with eye position (“+”) and the outline of the receptive field (RF) of a single SC neuron (reprinted from White et al., 2017b). B, Saliency map from a computational model that extracts visual features and models a neuronal saliency map based on the log-polar transform of visual space in the primate superior colliculus (see Methods section in White et al., 2017b). C, Hypothetical scan path during free viewing. Lines represent saccades, with dark lines and large arrows representing saccades directed into the RF (illustrated by small light gray circles). The small gray “x” symbols illustrate fixations; the bold black “x” symbols represent fixations before saccades that were directed into the RF. D, Illustration of the temporal order and alignment of saccade–fixation pairs used in the analysis of fixations and saccades in the neuronal data.
Specifically, we address three main questions. (1) Does model predicted saliency at the saccade goal predict fixation duration during free viewing? (2) Does the neural correlate of saliency in the SC predict fixation duration? (3) How do SC neurons encode visual saliency across a sequence of saccades?
Materials and Methods
The data used for the present study and detailed methods were published previously (White et al., 2017b). To address questions stated in the Introduction of the current study, the data were analyzed in a novel way. We briefly summarize the key methods below and describe the analyses specific to the research questions of the current study. The data that support the findings of this study are available from the corresponding author upon reasonable request.
Subjects
Data were collected from three male rhesus macaque monkeys (Macaca mulatta) weighing between 10 and 12 kg. The surgical procedures and extracellular recording techniques have been described in detail previously (Marino et al., 2008). All animal care and experimental procedures were approved by the Queen's University Animal Care Committee and were in accordance with the guidelines of the Canadian Council on Animal Care.
Stimuli and data acquisition
All behavioral tasks, data acquisition, and recording techniques have been detailed previously (White et al., 2017b). Stimuli were presented in a dark room on a high-definition (HD) LCD video monitor (Sony Bravia 550″, Model KDL-46XBR6) at a screen resolution of 1,920 × 1,080 pixels (60 Hz noninterlaced, 24-bit color depth, 8 bits per channel). The viewing distance of 70 cm resulted in a display spanning 82° × 52° of visual angle. In two monkeys (Monkeys Q and Y), eye position was recorded using the search coil technique (Robinson, 1963). In the third monkey (Monkey I), a video-based eye tracker (EyeLink 1000, SR Research) recorded the eye movements. The data were digitized and recorded using a multichannel data acquisition system (Plexon). Spike waveforms were sampled at 40 kHz, and spike times were digitized at 1,000 Hz.
A total of 516 HD video clips lasting 4–35 s (102,161 video frames) were presented to the animals in random order. Video frame timing was recorded with a photodiode and used offline to synchronize video, eye movements, and extracellular recordings. In total, 4,267 clip presentations were viewed across the three monkeys.
Procedure
Monkeys were head-fixed and seated in a primate chair. Tungsten microelectrodes (2.0 MΩ, Alpha Omega) were lowered into the dorsal SC. When a neuron was isolated, a rapid visual stimulation procedure (Marino et al., 2015) was used to map its visual receptive field, and a delayed saccade task was used to categorize the visual or motor response of the neuron using previously established methods (White et al., 2009; White and Munoz, 2011; Marino et al., 2015). RF mapping and categorization were then followed by the main part of the experiment: the free viewing task. Sessions lasted 2 to 3 h. Minimal liquid reward was given since the monkeys were naturally engaged in the videos.
Neuron classification
A total of 60 neurons formed the basis of the analysis (Monkey Q, n = 27; Monkey Y, n = 12; Monkey I, n = 21). Spikes were convolved with a function that resembled an excitatory postsynaptic potential (Thompson et al., 1996), with rise and decay values of 5 and 20 ms, respectively.
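As an illustration, the spike density computation can be sketched as below. This is a minimal sketch, not the authors' code: it assumes the commonly used Thompson et al. (1996) form R(t) = (1 − exp(−t/τ_rise)) · exp(−t/τ_decay) with τ_rise = 5 ms and τ_decay = 20 ms, 1 ms bins, and a kernel normalized to unit area. The function names `epsp_kernel` and `spike_density` are hypothetical.

```python
import numpy as np

def epsp_kernel(tau_rise=5.0, tau_decay=20.0, dt=1.0, length_ms=200.0):
    """Causal EPSP-like kernel R(t) = (1 - exp(-t/tau_rise)) * exp(-t/tau_decay).

    Normalized to unit area, so convolving a spike-count train and scaling by
    1000/dt yields a rate in spikes/s.
    """
    t = np.arange(0.0, length_ms, dt)
    k = (1.0 - np.exp(-t / tau_rise)) * np.exp(-t / tau_decay)
    return k / k.sum()

def spike_density(spike_times_ms, duration_ms, dt=1.0):
    """Convert spike times (ms) into a spike density function (spikes/s)."""
    n_bins = int(np.ceil(duration_ms / dt))
    train = np.zeros(n_bins)
    idx = (np.asarray(spike_times_ms, float) / dt).astype(int)
    idx = idx[(idx >= 0) & (idx < n_bins)]
    np.add.at(train, idx, 1.0)  # count spikes per bin
    k = epsp_kernel(dt=dt)
    # Full convolution truncated to the trial length: because k[0] == 0 and the
    # kernel only extends forward in time, each spike affects later bins only.
    sdf = np.convolve(train, k)[:n_bins]
    return sdf * (1000.0 / dt)
```

Because the kernel is causal, the spike density function does not rise before a spike occurs, unlike a symmetric Gaussian kernel; with these time constants the kernel peaks about 8 ms after each spike.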
The SC is composed of two dominant functional layers (Wurtz and Albano, 1980; White and Munoz, 2011), a visual-only superficial layer (SCs), and a multisensory/cognitive/motor-related intermediate layer (SCi). The neurons were functionally classified as visual SCs or visuomotor SCi based on their discharge characteristics using a visual RF mapping procedure (Marino et al., 2012b) to determine the presence of a visual component, and a delayed saccade task to determine the presence of a motor component, using previously established methods (White et al., 2009; White and Munoz, 2011; Marino et al., 2012b). In total, 26 neurons were classified as visuomotor SCi, and the remaining 34 were classified as visual SCs.
Eye tracking
Prior to starting the experiment, each monkey performed a thorough 72-point calibration procedure as described previously (White et al., 2017b). Briefly, the animals made saccades to a series of targets that spanned most of the screen (nine eccentricities, eight radial orientations). The targets appeared in random order, with multiple repetitions per location, and the animals were required to fixate each target for a minimum of 300 ms to receive a liquid reward. Saccades were defined as eye movements that exceeded a velocity criterion of 50° of visual angle per second and had a minimum amplitude of 1° of visual angle.
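The saccade criteria above (a velocity threshold of 50°/s and a minimum amplitude of 1°) can be illustrated with a simple velocity-threshold detector. This is a hedged sketch rather than the detection pipeline actually used: it assumes eye position in degrees sampled at 1 kHz, estimates speed with central differences, and uses a hypothetical function name, `detect_saccades`.

```python
import numpy as np

def detect_saccades(x_deg, y_deg, fs=1000.0, vel_thresh=50.0, min_amp=1.0):
    """Return (onset, offset) sample indices of detected saccades.

    Samples whose eye speed exceeds vel_thresh (deg/s) are grouped into
    contiguous runs; runs whose start-to-end displacement is smaller than
    min_amp (deg) are discarded.
    """
    x = np.asarray(x_deg, float)
    y = np.asarray(y_deg, float)
    # Eye speed in deg/s from central differences of position.
    speed = np.hypot(np.gradient(x), np.gradient(y)) * fs
    above = speed > vel_thresh
    # Rising (+1) and falling (-1) edges of the supra-threshold mask.
    edges = np.diff(np.concatenate(([0], above.astype(int), [0])))
    onsets = np.flatnonzero(edges == 1)
    offsets = np.flatnonzero(edges == -1) - 1  # last supra-threshold sample
    saccades = []
    for on, off in zip(onsets, offsets):
        amplitude = np.hypot(x[off] - x[on], y[off] - y[on])
        if amplitude >= min_amp:
            saccades.append((int(on), int(off)))
    return saccades
```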
Fixation durations and epochs
Fixation duration was defined as the interval between the end of one saccade and the start of the next saccade (Fig. 1C,D). The data contained a total of 97,660 fixations. Of these, 12,682 (12.99%) preceded saccades directed into the RF (Fig. 2). To ensure a sufficient saccade-free visual integration period (Heeman et al., 2019) while retaining a large amount of data, fixations shorter than 75 ms or longer than 600 ms were removed from the analyses (Fig. 2D–F). The remaining 12,166 (12.46%) fixation–saccade pairs formed the basis of the data analyses in the current study. Model saliency was computed at the RF location for direct comparison with the behavioral fixation/saccade data.
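A minimal sketch of the fixation selection, assuming saccades are given as (onset, offset) sample indices and that each fixation has been paired with the saccade that ends it and flagged according to whether that saccade landed in the RF. The `FixationSaccadePair` record and the helper functions are hypothetical; only the 75–600 ms bounds and the counts used in the usage note come from the text.

```python
from dataclasses import dataclass

@dataclass
class FixationSaccadePair:
    fixation_ms: float  # duration of the fixation preceding the saccade
    into_rf: bool       # True if the following saccade landed in the neuron's RF

def fixation_durations(saccades, fs=1000.0):
    """Durations (ms) from the offset of each saccade to the onset of the next."""
    return [(next_on - prev_off) * 1000.0 / fs
            for (_, prev_off), (next_on, _) in zip(saccades[:-1], saccades[1:])]

def select_pairs(pairs, lo_ms=75.0, hi_ms=600.0):
    """Keep pairs whose saccade entered the RF and whose fixation lasted lo_ms..hi_ms."""
    return [p for p in pairs if p.into_rf and lo_ms <= p.fixation_ms <= hi_ms]

def percent_of(n, total):
    """Percentage of n relative to total."""
    return 100.0 * n / total
```

For example, `percent_of(12682, 97660)` and `percent_of(12166, 97660)` reproduce the 12.99% and 12.46% reported above after rounding to two decimals.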
Distributions of fixation durations. A–C, Distribution of fixation durations for each animal, with gray representing fixations after which the next saccade could be directed in any direction (full sample) and black representing the subset of fixations for which the following saccade was directed into the RF. The lower bound of fixations included in the analyses (long solid line), quartiles of the full sample (solid lines), and quartiles of the subset (dashed lines) are indicated in the plots. D–F, Cumulative probability distributions for each animal of only the fixations within the restricted fixation duration range used in this study (75–600 ms; gray represents the “all directions” case and black the “into the RF” case; only the latter was analyzed in this study). See Extended Data Figure 2-1 for additional information regarding fixation duration distributions and saccade metrics.
Figure 2-1
Extended Data of saccade metrics. A, Saccade metrics sorted according to fixation duration prior to the saccade, comparing peak velocity, acceleration, saccade duration, and saccade amplitude (left to right) between quartiles of fixation duration from very short to very long. Error bars indicate SEM. B, Scatter plot of the amplitude of the saccade into the receptive field as a function of the fixation duration prior to that saccade (•). C, Scatter plot of previous fixation duration as a function of the fixation duration prior to the saccade into the receptive field (•).
The saliency and neuronal data were divided into either three levels of saliency at the RF location (tertiles: low, medium, and high saliency) or four levels of fixation duration (quartiles: very short, short, long, and very long fixations). See Extended Data Figure 2-1 for comparisons of saccade metrics per quartile of fixation duration (peak velocity, acceleration, saccade duration, and saccade amplitude; Extended Data Fig. 2-1A), a scatterplot of saccade amplitude as a function of fixation duration (Extended Data Fig. 2-1B), and a scatterplot of previous fixation duration as a function of current fixation duration (Extended Data Fig. 2-1C). Data were averaged within a 50 ms window, either from 25 to 75 ms after fixation onset (i.e., at the beginning of the fixation) for fixation-aligned data or from 75 to 25 ms before saccade onset (i.e., at the end of the fixation) for saccade-aligned data.
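The binning and window averaging described above can be sketched as follows, assuming spike density functions sampled at 1 ms and equal-count bins derived from empirical quantiles. The names `quantile_bins` and `window_mean` are hypothetical, and the handling of values that fall exactly on a bin edge is a choice made for this sketch.

```python
import numpy as np

def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins equal-count bins (0 = lowest)."""
    values = np.asarray(values, float)
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    # Searching only the interior edges yields bin indices 0..n_bins-1.
    return np.searchsorted(edges[1:-1], values, side="right")

def window_mean(sdf, align_idx, start_ms, stop_ms, dt=1.0):
    """Mean of a trace over [start_ms, stop_ms) relative to the alignment sample."""
    a = align_idx + int(round(start_ms / dt))
    b = align_idx + int(round(stop_ms / dt))
    return float(np.mean(sdf[a:b]))
```

With these helpers, `quantile_bins(durations, 4)` gives the very short to very long quartiles, `quantile_bins(saliency, 3)` the saliency tertiles, `window_mean(sdf, fix_on, 25, 75)` the fixation-aligned value, and `window_mean(sdf, sacc_on, -75, -25)` the saccade-aligned value.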
Data normalization
The neural discharge rate for each neuron was normalized on a scale from 0 to 1 using the following equation:
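Assuming per-neuron min–max scaling, one form consistent with the 0-to-1 range stated above, the normalization can be written as r_norm = (r − r_min) / (r_max − r_min). Which samples define r_min and r_max is an assumption of this sketch (here, all samples of a given neuron), and the function name is hypothetical.

```python
import numpy as np

def minmax_normalize(rate):
    """Scale one neuron's firing rate to [0, 1] (assumed min-max form)."""
    r = np.asarray(rate, float)
    lo, hi = r.min(), r.max()
    if hi == lo:
        # A constant trace has no range to scale; map it to zeros.
        return np.zeros_like(r)
    return (r - lo) / (hi - lo)
```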
Saliency model overview
The general architecture of the saliency model has been described in detail (Koch and Ullman, 1987; Itti et al., 1998); the model was created and run under Linux using the iLab C++ Neuromorphic Vision Toolkit (Itti, 2004). Briefly, the model is feedforward and consists of six high-level feature maps that are coarsely tuned to the primate visual system (Yoshida et al., 2012): luminance center–surround contrast, red–green opponency, blue–yellow opponency, orientation/edges, flicker center–surround (abrupt onsets and offsets), and motion center–surround (responding to local motion that differed from the surrounding full-field motion; Fig. 1). The high-level feature maps were linearly combined with equal weighting to create a feature-agnostic saliency map (Fig. 1B). The feature and saliency maps were computed in a gaze-contingent manner (each video frame was shifted to fovea-centered coordinates). The model was computed on a log-polar transformation of the input image (Fig. 1B) to approximate the nonhomogeneous mapping of visual space onto SC space (Ottes et al., 1986). Because of the computational cost of processing all 102,161 video frames in a gaze-contingent, log-polar-transformed manner (mapping each frame to SC space), saliency was computed only at the RF location. Saliency within a neuron's RF was computed as the normalized sum of the saliency output within the region specified by the RF. For full details of the methods, see White et al. (2017b).
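To make the feature combination and RF readout concrete, the sketch below averages six feature maps with equal weights, reads out saliency within a boolean RF mask, and maps visual coordinates onto SC coordinates with the Ottes et al. (1986) log-polar formula (A = 3°, Bu = 1.4 mm, Bv = 1.8 mm/rad). It is a simplified illustration, not the iLab Neuromorphic Vision Toolkit: it omits the toolkit's within-map normalization and multiscale processing, and it assumes that the "normalized sum" divides the summed saliency by the RF area. All function names are hypothetical.

```python
import numpy as np

FEATURES = ("luminance", "red_green", "blue_yellow", "orientation", "flicker", "motion")

def combine_equal_weights(feature_maps):
    """Equal-weight linear combination (mean) of the six feature maps."""
    maps = [np.asarray(feature_maps[name], float) for name in FEATURES]
    return np.mean(maps, axis=0)

def rf_saliency(saliency_map, rf_mask):
    """Summed saliency inside a boolean RF mask, normalized by RF area (assumed)."""
    sal = np.asarray(saliency_map, float)
    mask = np.asarray(rf_mask, bool)
    return float(sal[mask].sum() / mask.sum())

def ottes_sc_coords(r_deg, phi_rad, A=3.0, Bu=1.4, Bv=1.8):
    """Visual eccentricity/direction -> SC coordinates (u, v) in mm (Ottes et al., 1986).

    u = Bu * ln(sqrt(R^2 + 2*A*R*cos(phi) + A^2) / A)
    v = Bv * atan(R*sin(phi) / (R*cos(phi) + A))
    """
    x = r_deg * np.cos(phi_rad)
    y = r_deg * np.sin(phi_rad)
    u = Bu * np.log(np.sqrt(r_deg**2 + 2.0 * A * x + A**2) / A)
    v = Bv * np.arctan2(y, x + A)  # equals atan(y / (x + A)) while x + A > 0
    return u, v
```

The Ottes mapping as written is meant for the contralateral visual hemifield (x ≥ 0), where x + A stays positive; it compresses the periphery so that, for example, a target at 10° eccentricity on the horizontal meridian maps to about 2.05 mm from the rostral pole.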
Statistical analysis
During 60 recording sessions, each lasting 2–3 h, close to 100,000 saccadic eye movements were recorded. In each session, a single neuron in either the SCs or the SCi was isolated and recorded. From this large dataset, it was possible to extract the eye movements whose saccade vector spatially coincided with the RF location of the isolated neuron. This yielded 12,682 saccades for which the neuronal response at the saccade endpoint was known (Fig. 2). Using the computational saliency model developed by Itti et al. (1998), we calculated the visual saliency at the saccade endpoints, yielding detailed information on both the visual properties and the neuronal response at the saccade goal within a free viewing environment (Fig. 1). This information enabled us to examine the processes of target selection leading up to a saccadic eye movement in a dynamic context. We show the dominant visual features that evoked these responses, as well as a neuronal correlate of these processes in the SC.
Saccade and fixation metrics
Before the main analyses, we tested whether the selected subset of 12,157 fixations was representative of the full set of 97,660 fixations. To this end, the nonparametric two-sample Kolmogorov–Smirnov test was applied to the durations of all fixations versus the subset of fixations before saccades into the RF. The test was statistically significant (D(60) = 0.0242; p < 0.001), indicating some difference between the distributions. This difference can largely be attributed to the very large sample sizes (∼100,000 vs ∼13,000), which render even small differences between distributions significant. Therefore, the distributions of fixation durations were also compared quantitatively and qualitatively, and they were similar across all animals and all subsets of saccades used for the current analyses (Fig. 2, Table 1). Mean, median, mode, skewness, and kurtosis for all fixation durations versus the subset of fixations before saccades into the RF are listed in Table 1.
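The distribution comparison and the descriptive statistics reported in Table 1 can be sketched with SciPy. This is an illustration under assumptions: durations are rounded to whole milliseconds to find the mode, kurtosis is reported as excess (Fisher) kurtosis, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats

def compare_distributions(all_fix_ms, subset_fix_ms):
    """Two-sample Kolmogorov-Smirnov test; returns (D statistic, p-value)."""
    res = stats.ks_2samp(all_fix_ms, subset_fix_ms)
    return float(res.statistic), float(res.pvalue)

def describe(durations_ms):
    """Mean, median, mode, skewness, and excess kurtosis of fixation durations."""
    x = np.asarray(durations_ms, float)
    vals, counts = np.unique(np.round(x), return_counts=True)
    return {
        "mean": float(x.mean()),
        "median": float(np.median(x)),
        "mode": float(vals[np.argmax(counts)]),  # most frequent whole-ms value
        "skewness": float(stats.skew(x)),
        "kurtosis": float(stats.kurtosis(x)),    # Fisher definition: normal -> 0
    }
```

With samples of roughly 100,000 and 13,000 fixations, even a small D statistic can reach p < 0.001, which is why a descriptive comparison is useful alongside the test.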
Descriptive statistics for distributions of fixation durations before all saccades (all) and before saccades into the RF (subset)
For further analysis of the saliency and neuronal data, the subset of fixation durations was split into quartiles: very short (M = 146 ms; SD = 12.2 ms), short (M = 191 ms; SD = 18.4 ms), long (M = 235 ms; SD = 21.8 ms), and very long (M = 346 ms; SD = 36.4 ms).
Saccade metrics (saccade amplitude, peak velocity, acceleration, and duration) were compared across the quartiles of the subset of fixation durations; they were comparable overall, and none of the comparisons was significant (Extended Data Fig. 2-1A).
Saliency
To analyze the saliency at the saccade goal, we averaged the model-predicted saliency at the receptive field location within a 50 ms window from 25 to 75 ms after fixation onset (see Materials and Methods). A one-sample Kolmogorov–Smirnov test for normality indicated that saliency was not normally distributed (D(60) = 0.601; p < 0.001); therefore, nonparametric tests were used where appropriate. To analyze the relation between model-predicted saliency and fixation duration, fixation duration was plotted as a function of saliency, and a Wilcoxon signed-rank test was performed to test whether the resulting regression coefficients differed from 0, which would indicate a correlation between saliency and fixation duration. The model also allowed us to analyze the six individual saliency features (motion, flicker, edge, luminance, blue/yellow color opponency, and red/green color opponency). A Kruskal–Wallis test compared the fixation durations at three saliency levels (low, medium, high) for each of the features and per monkey. As an additional measure to facilitate comparisons with future models (e.g., a model that tunes the weight of each saliency feature), we report the root mean square error (RMSE), that is, the square root of the mean squared residual between the observed data and the model fit.
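The saliency analyses can be sketched as follows, under stated assumptions: a least-squares slope of fixation duration on saliency is computed per recording session (the unit over which the signed-rank test is applied is an assumption here), the slopes are tested against 0 with a Wilcoxon signed-rank test, feature saliency is split into tertiles for a Kruskal–Wallis test, and RMSE is the square root of the mean squared residual. Function names are hypothetical.

```python
import numpy as np
from scipy import stats

def session_slope(saliency, fixation_ms):
    """Least-squares slope and intercept of fixation duration on saliency."""
    slope, intercept = np.polyfit(saliency, fixation_ms, 1)
    return slope, intercept

def rmse(observed, predicted):
    """Root mean square error between observed values and model predictions."""
    obs = np.asarray(observed, float)
    pred = np.asarray(predicted, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def slopes_differ_from_zero(slopes):
    """Wilcoxon signed-rank test of per-session slopes against 0."""
    return stats.wilcoxon(slopes)

def tertile_kruskal(feature_saliency, fixation_ms):
    """Kruskal-Wallis test of fixation duration across low/medium/high tertiles."""
    s = np.asarray(feature_saliency, float)
    f = np.asarray(fixation_ms, float)
    edges = np.quantile(s, [1.0 / 3.0, 2.0 / 3.0])
    bins = np.searchsorted(edges, s, side="right")  # 0 = low, 1 = medium, 2 = high
    groups = [f[bins == k] for k in range(3)]
    return stats.kruskal(*groups)
```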
Neuronal firing rate
The neuronal firing rates of the visual SCs neurons and the visuomotor SCi neurons were analyzed independently. A nonparametric Kruskal–Wallis test was used to test for differences in neuronal firing rate between the four quartiles of fixation duration [very short (vs), short (s), long (l), very long (vl)] within the analysis window of each alignment (25 to 75 ms after fixation onset and 75 to 25 ms before saccade onset). To gain further insight into the differences in neuronal firing rate as a function of fixation duration, six Holm–Bonferroni-corrected post hoc comparisons (vs-s, vs-l, vs-vl, s-l, s-vl, l-vl) were performed using Wilcoxon rank-sum tests. Additionally, for each individual neuron, we tested for differences in firing rate between the quartiles of fixation duration (very short, short, long, very long) with a Kruskal–Wallis test, resulting in a significant or nonsignificant difference per neuron.
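A sketch of the omnibus and post hoc procedure, assuming each quartile's firing rates are supplied as a list. Holm's step-down method is implemented directly: p-values are sorted in ascending order, the k-th smallest (k = 1, …, m) is compared with α/(m − k + 1), and testing stops at the first non-significant comparison. The function names are hypothetical; `scipy.stats.kruskal` and `scipy.stats.ranksums` are SciPy functions.

```python
from itertools import combinations
from scipy import stats

LABELS = ("vs", "s", "l", "vl")  # very short, short, long, very long

def holm_bonferroni(pvals, alpha=0.05):
    """Booleans marking which null hypotheses Holm's step-down procedure rejects."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all remaining (larger) p-values are retained
    return reject

def quartile_posthoc(rates_by_quartile, alpha=0.05):
    """Kruskal-Wallis omnibus test, then Holm-corrected pairwise rank-sum tests."""
    omnibus = stats.kruskal(*(rates_by_quartile[k] for k in LABELS))
    pairs = list(combinations(LABELS, 2))  # the six comparisons
    pvals = [stats.ranksums(rates_by_quartile[a], rates_by_quartile[b]).pvalue
             for a, b in pairs]
    reject = holm_bonferroni(pvals, alpha)
    names = [f"{a}-{b}" for a, b in pairs]
    return omnibus, dict(zip(names, zip(pvals, reject)))
```

Holm's procedure controls the family-wise error rate like a plain Bonferroni correction but is uniformly more powerful, because only the smallest p-value faces the full α/m threshold.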
Finally, a key advantage of a dynamic free viewing task is that it allows the investigation of sequential gaze behavior rather than of single events in isolation. To analyze the neuronal firing rate across a larger temporal window (up to ∼500 ms before the saccade into the RF), the neuronal data were analyzed based on the two fixations prior to the saccade into the RF (Fig. 5A). These two fixations are referred to as the current fixation, denoting the fixation immediately before the saccade into the RF, and the previous fixation, denoting the fixation before the current fixation. The neuronal firing rates for the very short and the very long current fixations, as defined in the previous paragraph, were categorized based on the duration of the previous fixation. This resulted in four conditions for the analysis of the neuronal firing rate across a larger temporal window: short fixations followed by very short fixations (s/vs), long fixations followed by very short fixations (l/vs), short fixations followed by very long fixations (s/vl), and long fixations followed by very long fixations (l/vl; Fig. 5A). The data were then aligned on fixation onset, and two epochs were defined: Epoch A, from 400 to 25 ms before fixation onset, and Epoch B, from 25 to 75 ms after fixation onset. The analysis of each epoch was identical to the previous analyses: first, a Kruskal–Wallis test identified the presence of an overall difference between the neuronal traces; second, Holm–Bonferroni-corrected Wilcoxon rank-sum tests on six post hoc comparisons (s/vs-l/vs, s/vs-s/vl, s/vs-l/vl, l/vs-s/vl, l/vs-l/vl, s/vl-l/vl) identified which comparisons were responsible for any significant differences. SCs and SCi neurons were analyzed independently.
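The assignment of the four sequence conditions and the two epochs can be sketched as below. This is hypothetical: very short and very long current fixations are taken as the first and fourth quartiles of current fixation durations, as in the previous paragraph, but the criterion separating short from long previous fixations is not specified here, so the sketch uses a median split that can be overridden through `prev_split_ms`. Spike density functions are assumed to be sampled at 1 ms, and the function names are hypothetical.

```python
import numpy as np

EPOCHS = {"A": (-400, -25), "B": (25, 75)}  # ms relative to fixation onset

def label_sequences(prev_ms, curr_ms, prev_split_ms=None):
    """Label each (previous, current) fixation pair as s/vs, l/vs, s/vl, l/vl or ''."""
    prev = np.asarray(prev_ms, float)
    curr = np.asarray(curr_ms, float)
    q1, q3 = np.quantile(curr, [0.25, 0.75])
    if prev_split_ms is None:
        prev_split_ms = float(np.median(prev))  # hypothetical short/long criterion
    cur_tag = np.where(curr <= q1, "vs", np.where(curr > q3, "vl", ""))
    prev_tag = np.where(prev <= prev_split_ms, "s", "l")
    labels = np.full(curr.shape, "", dtype=object)
    for k in range(len(curr)):
        if cur_tag[k]:  # intermediate current fixations stay unlabeled
            labels[k] = f"{prev_tag[k]}/{cur_tag[k]}"
    return labels

def epoch_means(sdf, fix_onset_idx):
    """Mean activity in Epoch A and Epoch B for one fixation-aligned trace."""
    return {name: float(np.mean(sdf[fix_onset_idx + a: fix_onset_idx + b]))
            for name, (a, b) in EPOCHS.items()}
```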
Results
Saliency at the saccade goal predicts fixation duration
Free viewing gaze is typically directed toward salient stimuli (Itti et al., 1998; Itti and Koch, 2001), and in controlled studies, SRT is inversely related to stimulus intensity and contrast (i.e., saliency; White et al., 2006; Marino et al., 2012a). We therefore hypothesized that fixation duration would also be inversely related to saliency at the saccade goal. Figure 3A shows fixation duration as a function of model-predicted saliency at the saccade goal, computed from 25 to 75 ms after fixation onset (fixation durations of <75 ms were removed from the analyses; see Materials and Methods). We observed a significant negative correlation between saliency and fixation duration in all three animals (r = −0.66, −0.64, and −0.63; p = 0.032, 0.002, and 0.006 for Monkeys I, Q, and Y, respectively); that is, fixation duration became shorter with increasing saliency at the saccade goal. The RMSE values between the observed data and the predictions of the regression lines in Figure 3A were as follows: Monkey I, 94.35; Monkey Q, 81.67; Monkey Y, 80.94; all monkeys, 85.49.
Fixation duration as a function of saliency. A, Fixation duration as a function of saliency at the saccade goal during the window from 25 to 75 ms after the start of the fixation (fixation aligned) for each saccade into the RF. Overall trends across all animals (solid line) and the individual animals (dashed lines) are represented by the regression lines. B, Contribution of each feature channel to the modulation of fixation duration for each animal (dashed lines) as well as combined across all animals (solid lines). Error bars indicate SEM.
To provide insight into the individual feature channels, Figure 3B plots the mean fixation duration as a function of saliency for each of the six basic features of the model (binned into tertiles of low, medium, and high feature saliency at the saccade goal). Fixation duration decreased systematically with increasing motion-related (χ²(2) = 11.12; p = 0.004) and edge-related (χ²(2) = 9.86; p = 0.007) feature saliency. None of the other features showed statistically significant main effects, although all showed trends in the expected direction (flicker: χ²(2) = 2.41, p = 0.299; luminance: χ²(2) = 1.84, p = 0.398; blue–yellow: χ²(2) = 1.27, p = 0.530; red–green: χ²(2) = 1.68, p = 0.431). Thus, while the trend was in the predicted direction for most of the basic model features, motion and edge information produced the most reliable pattern.
Combined, these results indicate that higher saliency at the RF location was associated with shorter pre-saccadic fixation duration, with motion and edge information, more so than color, luminance, and flicker information, being the strongest predictors. This aligns with controlled studies of the visual properties that affect saccade initiation and SRT (White et al., 2006; Mills et al., 2011; Marino et al., 2012b).
SC visual saliency response is inversely related to fixation duration
We tested the hypothesis that, during free viewing, the magnitude of the visually evoked response in the SC is inversely related to the duration of fixations before saccades directed into the RF. Figure 4 shows averaged spike density functions as a function of increasing fixation duration for the sample of SCs (red; n = 34) and SCi neurons (blue; n = 26). For fixation-aligned traces (Fig. 4A,C), the statistical analyses were restricted to a 50 ms saccade-free epoch from 25 to 75 ms after fixation onset. Similarly, for saccade-aligned traces (Fig. 4B,D), the statistical analyses were restricted to a 50 ms saccade-free epoch from 75 to 25 ms before saccade onset (i.e., at the end of the fixation). When aligned on fixation onset, there was a systematic change in the magnitude and latency of the response curves for both SCs (Fig. 4A) and SCi neurons (Fig. 4C; SCs: χ²(3) = 34.811, p < 0.001; SCi: χ²(3) = 68.123, p < 0.001). Activation within the epoch increased systematically as fixation duration decreased for both SCs and SCi neurons. However, when aligned on saccade onset (Fig. 4B,D), only SCs neurons retained this systematic inverse relationship between neural activation and fixation duration (SCs: χ²(3) = 17.884, p < 0.001; SCi: χ²(3) = 7.539, p = 0.057, n.s.). Correspondingly, these differences in activation between the start and the end of the fixation were also visible in the individual neurons. For SCs neurons, 41% (14 of 34) when aligned on fixation and 26% (9 of 34) when aligned on the saccade showed the inverse correlation between fixation duration and neural discharge rate. For SCi neurons, however, the discrepancy between fixation-aligned and saccade-aligned data was much larger: 81% (21 of 26) of the SCi neurons showed an inverse correlation between fixation duration and neural discharge rate when fixation aligned, versus only 15% (4 of 26) when saccade aligned.
Taken together, the disappearance of the inverse correlation between neural discharge rate and fixation duration just prior to the saccade for SCi neurons suggests that most of the variability in SCi activation across fixation durations was attributable to motor activity. In contrast, for SCs neurons, the differences in activation persisted up until the saccade, indicating that the systematic modulation of activation was not purely motor related.
A, C, Averaged normalized neural discharge rate aligned on fixation onset for the SCs (red; n = 34) and SCi (blue; n = 26) neurons, respectively. B, D, Averaged normalized neural discharge rate aligned on saccade onset for the SCs and SCi neurons, respectively. Insets in all panels show the mean (±SEM) discharge rate calculated over a 50 ms window indicated by the vertical shaded bars (see Materials and Methods). Results per neuron are represented by thin gray lines in the insets. Asterisks indicate p < 0.05, Holm–Bonferroni corrected. Details of all statistics are in the text.
Furthermore, differences between the spike density functions for the different fixation durations were sustained across a wider temporal window for SCs neurons than for SCi neurons. For SCi neurons, the curves started to diverge ∼150 ms before fixation onset, whereas for SCs neurons, the divergence began as early as ∼500 ms before fixation onset (Fig. 4A,B). This led us to examine whether SCs neurons encode multiple sensorimotor plans spanning multiple saccades and fixations at the same time (i.e., concurrent processing) during free viewing.
SCs neurons have a neural correlate of concurrent visual processing
Because the effects of fixation duration were sustained across a wide temporal window around the alignment, in particular for SCs neurons, we investigated sequential gaze behavior rather than treating saccades as single events in isolation. An interesting aspect of our free viewing data is the large number of very short fixations (by 200 ms after fixation onset, 45.2% of the saccades had already been initiated; Fig. 2). These short fixations leave very little time for independent sensorimotor processes and raise the question of to what degree the neural activity related to these short fixations reflects concurrent processing of multiple sensorimotor plans. Specifically, the concurrent processing hypothesis, which states that the saccadic system can simultaneously program two or more saccades to different goals (McPeek et al., 2000; McPeek and Keller, 2002; McSorley et al., 2016, 2020), predicts that some information from a previous sensorimotor plan can be carried over from one fixation to the next. Concurrent processing can be expected when the inter-saccade interval is short and is therefore likely to occur in sequences of saccades separated by short fixations (McPeek and Keller, 2002), as is the case for the consecutive short fixations in our data. Based on the large number of very short fixations together with the early elevated SCs firing rate described above, we hypothesized that the level of activation before and during the current fixation is an indication of concurrent processing and depends on whether the current fixation was preceded by a short or a long fixation. The rationale is that concurrent sensorimotor plans would overlap and therefore produce additive, elevated activation around the peri-fixation period.
As described in the Materials and Methods and visualized in Figure 5A, we assessed this hypothesis by aligning the data to fixation onset and comparing the activation at the beginning of the shortest versus the longest current fixations, depending on whether the previous fixation was short or long. This resulted in four conditions: a short previous fixation followed by a very short current fixation (s/vs), a long previous fixation followed by a very short current fixation (l/vs), a short previous fixation followed by a very long current fixation (s/vl), and a long previous fixation followed by a very long current fixation (l/vl). In Figure 5, activation during very short and very long current fixations is denoted by dark and light shading, respectively, and whether the previous fixation was short or long is denoted by solid and dashed curves, respectively.
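The four-way split can be sketched as a simple labeling step. This is an illustrative sketch only. The cut-offs (`prev_split_ms`, `current_low_ms`, `current_high_ms`) are hypothetical parameters, because the actual criteria are given in the Materials and Methods. The input is assumed to be a list of (previous, current) fixation durations in which each current fixation ends with a saccade into the RF.

```python
from typing import Dict, List, Optional, Tuple

CONDITIONS = ("s/vs", "l/vs", "s/vl", "l/vl")

def classify_pair(previous_ms: float, current_ms: float, prev_split_ms: float,
                  current_low_ms: float, current_high_ms: float) -> Optional[str]:
    """Label one (previous, current) fixation pair; None if the current
    fixation is neither very short nor very long (hypothetical cut-offs)."""
    prev = "s" if previous_ms < prev_split_ms else "l"
    if current_ms <= current_low_ms:
        current = "vs"
    elif current_ms >= current_high_ms:
        current = "vl"
    else:
        return None
    return f"{prev}/{current}"

def group_pairs(pairs: List[Tuple[float, float]], prev_split_ms: float,
                current_low_ms: float, current_high_ms: float) -> Dict[str, List[int]]:
    """Map each condition label to the indices of the pairs that fall into it."""
    groups: Dict[str, List[int]] = {label: [] for label in CONDITIONS}
    for idx, (previous_ms, current_ms) in enumerate(pairs):
        label = classify_pair(previous_ms, current_ms, prev_split_ms,
                              current_low_ms, current_high_ms)
        if label is not None:
            groups[label].append(idx)
    return groups

# Made-up durations (ms); the last pair has an intermediate current fixation.
pairs = [(100, 80), (400, 90), (120, 500), (450, 600), (300, 200)]
print(group_pairs(pairs, prev_split_ms=250, current_low_ms=150, current_high_ms=400))
# {'s/vs': [0], 'l/vs': [1], 's/vl': [2], 'l/vl': [3]}
```

Excluding intermediate current fixations keeps the very short and very long groups well separated, at the cost of discarding data in between.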
Figure 5
A, Schematic of the selection of fixation durations based on the duration of the previous fixation. B, C, Average normalized spike density functions for each of the four fixation duration conditions for our sample of SCs neurons (in B, red), and SCi neurons (in C, blue). Shaded gray regions denote the two epochs (Epoch A and Epoch B) where the activation was averaged for statistical tests. D–G, Averaged firing rate during the pre-fixation (Epoch A) and post-fixation (Epoch B) aligned epochs denoted by the gray shaded regions (in B, C). Asterisks indicate p < 0.05, Holm–Bonferroni corrected. Details of all statistics are in the text. See Extended Data Figure 5-1 and Table 5-1 for additional information regarding saccade amplitude and normalized firing rate.
Figure 5-1
Extended data on saccade amplitude. A, C, Schematic of the selection of fixation durations based on the amplitude of the preceding saccade, and average normalized spike density functions for each of the four resulting conditions for our sample of SCs neurons (in A, red) and SCi neurons (in C, blue). The gray shaded area denotes the analysis epoch (Epoch A) over which activation was averaged for statistical tests. B, D, Averaged firing rate during the pre-fixation aligned epoch denoted by the black bar (in A, C). Statistical tests are Holm–Bonferroni corrected.
Table 5-1
Differences in normalized firing rate between fixation durations based on the duration of the previous fixation (see Fig. 5) and fixation durations based on the amplitude of the preceding saccade (see Fig. 5-1).
Figure 5, B, D, and E, shows the averaged normalized neuronal firing rate across the four conditions for SCs neurons (red in figures). A Kruskal–Wallis test showed that the differences between the curves for SCs were significant for both Epoch A (400–25 ms before the start of fixation, χ²(3) = 23.143, p < 0.001) and Epoch B (25–75 ms after the start of fixation, χ²(3) = 32.236, p < 0.001). Thus, during both epochs, SCs activation varied with the sequence of fixation durations. In the post hoc analyses, we first considered Epoch B (Fig. 5B,E). For the very short current fixation conditions, SCs activation differed depending on whether the previous fixation was short or long (s/vs-l/vs: z = 2.226, p = 0.026, solid vs dashed dark red): when the previous fixation was short (s/vs, solid dark red), activation in Epoch B was elevated compared with when the previous fixation was long (l/vs, dashed dark red). This is consistent with the concurrent processing hypothesis: the s/vs condition involved two quick sequential sensorimotor plans whose activation patterns merged or overlapped, producing the elevated response. The alternative explanation, that this pattern results from consecutive small-amplitude saccades producing overlapping activation within the same receptive field, was excluded (Extended Data Fig. 5-1, Table 5-1). For the two very long current fixation conditions (s/vl and l/vl, solid and dashed light red), SCs activation was generally reduced compared with the very short current fixation conditions, regardless of the duration of the previous fixation (light red vs dark red pairs: s/vl-s/vs: z = 3.129, p = 0.002; s/vl-l/vs: z = 1.500, p = 0.147; l/vl-s/vs: z = 5.488, p < 0.001; l/vl-l/vs: z = 3.781, p < 0.001); only the s/vl-l/vs comparison did not reach significance.
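For readers who want to see what the omnibus test computes, here is a minimal, dependency-free sketch of the Kruskal–Wallis H statistic with the standard tie correction. Its p value comes from the chi-square distribution with k − 1 degrees of freedom (χ²(3) for four conditions). This is a generic textbook implementation, not the authors' analysis code; in practice a library routine such as `scipy.stats.kruskal` would normally be used.

```python
import math
from typing import List, Sequence, Tuple

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Q(2m, x) = exp(-x/2) * sum_{j<m} (x/2)^j / j!
        term = math.exp(-half)
        total = term
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: start from Q(1, x) = erfc(sqrt(x/2)) and add one term per 2 df.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    k = 1
    while k < df:
        total += term
        k += 2
        term *= half / (k / 2.0)
    return total

def average_ranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """1-based ranks with ties given their average rank, plus tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def kruskal_wallis(*groups: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square p value.

    Assumes at least two groups and that not all pooled values are tied.
    """
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)
    ties = sum(t ** 3 - t for t in tie_sizes)
    h /= 1.0 - ties / (n_total ** 3 - n_total)
    return h, chi2_sf(h, len(groups) - 1)
```

As a sanity check, `chi2_sf(7.696, 3)` evaluates to about 0.053, the p value reported for SCi during Epoch A.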
As can be seen in Figure 5B, this reduction in activation is most prominent for the l/vl condition (dashed light red). This is again consistent with the concurrent processing hypothesis: in the l/vl condition, the temporal separation between the previous and the current sensorimotor plans is largest, so their processes would overlap least, resulting in a more attenuated activation pattern during Epoch B.
Interestingly, the pattern during Epoch A was slightly different (Fig. 5B,D). Here, for SCs neurons, the activation difference depended on whether the current fixation was very short or very long (light red vs dark red pairs: s/vl-s/vs: z = 3.492, p < 0.001; l/vl-s/vs: z = 3.717, p < 0.001; s/vl-l/vs: z = 3.522, p = 0.002; l/vl-l/vs: z = 3.229, p = 0.001). In other words, very short current fixations were associated with elevated activation during Epoch A (Fig. 5B,D, dark red), whereas very long current fixations were associated with attenuated activation during Epoch A, regardless of whether the previous fixation was short or long (Fig. 5B,D, light red). This division between short and long current fixations is consistent with the idea that both very short conditions would show elevated activation (relative to the very long conditions) during Epoch A if their sensorimotor processes were in a heightened state because of an imminent saccade. Because these are predominantly sensory (visual) neurons, this ramp-up of activation during Epoch A in the very short conditions is presumably not motor related per se; instead, it may reflect sensory input that drives increasing activation ahead of the quick response, as the eyes pause only briefly before the saccade into the RF.
In contrast to SCs neurons, the pattern for SCi neurons (blue in figures) was somewhat different (Fig. 5C,F,G). A Kruskal–Wallis test showed a significant difference between the curves during Epoch B (χ²(3) = 66.288, p < 0.001). There was a strong difference between the very short and the very long current fixation conditions (Fig. 5C,G, dark blue vs light blue; s/vs-s/vl: z = 5.866, p < 0.001; s/vs-l/vl: z = 5.628, p < 0.001; l/vs-s/vl: z = 5.847, p < 0.001; l/vs-l/vl: z = 5.536, p < 0.001). However, neither the very short current fixation conditions (solid vs dashed dark blue; s/vs-l/vs: z = 0.302, p = 0.763) nor the very long current fixation conditions (solid vs dashed light blue; s/vl-l/vl: z = 0.366, p = 0.714) showed a difference in activation related to the duration of the previous fixation. This is consistent with the SCi response pattern being tied mostly to the motor response itself, which was delayed in the very long conditions. The Epoch A pattern for SCi (Fig. 5C) also differed markedly from that for SCs (Fig. 5B): the differences between the SCi curves during Epoch A did not reach significance (Fig. 5F; χ²(3) = 7.696, p = 0.053).
Discussion
We examined the relationship between visual saliency and fixation duration and how this relationship is reflected in neuronal activity in the SC. We addressed three main questions. First, we demonstrated that model saliency predicted fixation duration: higher saliency at the saccade goal was associated with shorter pre-saccadic fixation durations (Fig. 3A), confirming that, during free viewing, fixation duration is a valid instantiation of SRT. Second, we showed that neuronal firing rate in the SC at the location coding for the upcoming saccade predicted pre-saccadic fixation duration (Fig. 4). Finally, our recordings revealed that SCs neurons, but not SCi neurons, showed higher activation during two consecutive short fixations preceding a saccade into the RF (Fig. 5B), suggestive of concurrent processing during free viewing.
During free viewing, saccade latency (i.e., SRT) is not trivially equivalent to visual processing time, because visual stimulation is abundant throughout the experiment. Often, the eventual saccade target has already been on the screen for several saccades before it becomes the target. Therefore, it is not at all clear when processing of the saccade goal commences. Our results are consistent with behavioral models that relate saliency to fixation duration, such as the LATEST model (Tatler et al., 2017) and the SceneWalk model (Schwetlick et al., 2023). They confirm that, during free viewing, fixation duration is a valid behavioral correlate of visual processing in the SC and of saliency at the saccade goal, similar to SRT in controlled tasks.
Short fixation durations, associated with high saliency at the saccade goal, were accompanied by higher neural activity in the SC. Independent of the upcoming motor activity, the more visually tuned SCs neurons showed elevated activity related to fixation duration (as shown by the differences in activation just prior to saccade initiation; Fig. 4B). This relationship between fixation duration and activity just prior to the saccade was not present in the SCi: SCi activity just prior to the saccade was very similar across all fixation durations (Fig. 4D), suggesting that this activity was driven not by saliency at the saccade goal but most likely by the impending motor command.
The distinction between neuronal activity in the SCs and the SCi corroborates existing research on the functions of the different SC layers (Basso and May, 2017; White et al., 2017b; Conroy et al., 2023). The visually tuned superficial layers, which receive input from the visual cortex and retina, respond directly to saliency at the saccade goal. This is reflected in fixation duration: when nothing salient is happening at the saccade goal, fixations are longer; conversely, when the saccade goal is salient, a saccade toward it is imminent and the fixation is short. The SCi operates differently. It integrates input from multiple areas, such as FEF, LIP, and the SCs, into a single motor command that is sent to the brainstem premotor circuitry and translated into a saccade by the oculomotor muscles (Thompson et al., 1996; Conroy et al., 2023). The motor command is initiated when SCi activity surpasses the saccade initiation threshold (Pare and Hanes, 2003; Marino et al., 2015; Stine et al., 2023). Because this threshold is similar for all saccades, SCi activity just prior to the saccade would be independent of saliency at the saccade goal and of fixation duration, as we observed. So, although fixation duration and SCi activity are inversely correlated at the start of the fixation (Fig. 4C), this correlation is no longer present as the saccade approaches (Fig. 4D). This evidence supports the idea of saliency coding in the SCs and priority coding in the SCi (Basso and May, 2017; White et al., 2017b) and extends this division of labor between the SCs and SCi to dynamic free viewing.
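The threshold argument in this paragraph can be made concrete with a linear rise-to-threshold accumulator. This is a deliberately simplified sketch rather than a model fitted in this study. The accumulation rates (standing in, hypothetically, for more or less salient saccade goals), the baseline, and the threshold are all assumed values.

```python
def latency_ms(rate_per_ms, threshold=1.0, baseline=0.0):
    """Time for a linear accumulator to rise from baseline to threshold."""
    return (threshold - baseline) / rate_per_ms

def activity(t_ms, rate_per_ms, threshold=1.0, baseline=0.0):
    """Accumulator level t_ms after fixation onset, capped at the threshold."""
    return min(threshold, baseline + rate_per_ms * t_ms)

# Hypothetical rates: faster accumulation stands in for a more salient goal.
for rate in (0.016, 0.008, 0.004):
    lat = latency_ms(rate)
    early = activity(50.0, rate)  # early in the fixation
    late = activity(lat, rate)    # at saccade onset
    print(f"rate={rate:.3f}/ms  latency={lat:6.1f} ms  "
          f"early={early:.2f}  at saccade={late:.2f}")
```

In this toy, activity 50 ms into the fixation grows with the accumulation rate and therefore falls with latency, whereas activity at saccade onset equals the threshold for every rate. That qualitatively mirrors an inverse correlation early in the fixation (Fig. 4C) that disappears just before the saccade (Fig. 4D).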
Interestingly, SCs neurons showed elevated activity for saccades into their RF during an extended temporal window (∼500 ms) before the onset of the fixation that preceded the saccade, and this elevation seemed independent of the motor-related activity in the SCi. The enhancement was especially apparent for sequences of consecutive short intersaccadic intervals. The shortest observed intersaccadic intervals were shorter than the minimal afferent and efferent neuronal conduction delays (i.e., the minimal time necessary to convert sensory input into an eye movement; Munoz et al., 2000; Marino et al., 2015; Heeman et al., 2019), meaning that these intervals were too short to allow visual processing “from scratch”. Thus, the limited time available for sensory processing between saccades suggests that multiple sensorimotor plans develop in parallel and across multiple saccades.
Wurtz and Goldberg (1972b) first described the enhancement of visual responses in awake, behaving monkeys performing an eye movement task. They observed a burst of neural activity, specifically in the SCs, before monkeys made saccades to visual stimuli. Similarly, in our free viewing data, visual responses at the location of the future saccade goal were enhanced in the SCs. Moreover, when fixations were short, these responses were already enhanced at the start of the fixation, more so than at the start of longer fixations (Fig. 4A). This enhancement was especially apparent for a train of two short fixations (Fig. 5B). This increase in SCs activity at the start of a fixation, combined with the observed short intersaccadic intervals, can therefore be viewed as proactive visual processing spanning more than one saccade, which we interpret as evidence for parallel or concurrent processing. Concurrent processing has been identified in both humans (McSorley et al., 2016, 2020) and monkeys (McPeek and Keller, 2001; Caspi et al., 2004; Shen and Paré, 2014) performing various oculomotor tasks; in these controlled experiments, rapid saccades sometimes had almost no intersaccadic interval. Additionally, during the execution of an initial saccade, activity related to the goal of a quickly following second saccade can be maintained in the SC (McPeek and Keller, 2002). The current analyses reveal overlapping activation patterns associated with quick sequential eye movements during free viewing, which, for our sample of SCs neurons, is consistent with the concurrent processing hypothesis (McPeek et al., 2000; McPeek and Keller, 2002). The enhanced activity appears to signal the selection of the second saccade goal even before the first saccade has ended.
These results indicate that, at the time of an initial saccade, the SC does not necessarily act as a strict winner-take-all network for the upcoming saccade alone; rather, the saliency of a second visual goal can be maintained simultaneously. Taken together, these results provide evidence for concurrent processing in the SCs during dynamic free viewing.
Finally, the idea that short fixation durations are associated with concurrent processing can also be related to the fastest visually triggered saccades, express saccades (Fischer and Boch, 1983). A saccade is made when saccade-related activity in the SCi surpasses the saccadic response threshold (Marino et al., 2015). Express saccades are triggered when the visual transient response passes through the SC and becomes the motor command in the SCi (Edelman and Keller, 1996; Dorris et al., 1997; Basso and May, 2017). Normally, a visual transient does not trigger an express saccade, but with sufficient pre-target build-up of activity in the SCi, this activity combines with the visual transient to cross the saccade threshold. We suspect that many of the shortest fixation durations in our data reflect express saccades in response to visual transients produced by clip changes, over which we had no experimental control. We observed that neural activity during the shortest fixations was elevated relative to longer fixations. As in more structured tasks, this elevated activity, representing the combination of preparatory activity and saliency-related visual activity at one location in the visual field, likely triggered saccades after very short fixations. This pre-visual “build-up” preceding saccades after short fixations is consistent with priority coding in the SCi, where input from multiple sources is integrated into the saccadic response (White et al., 2017a). Express saccades may therefore be a highly strategic manifestation of concurrent saccade processing.
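The build-up account of express saccades reduces to a threshold comparison. The toy rule below illustrates it under stated assumptions and is not a model fitted to these data. The build-up level, transient amplitude, threshold, transient arrival time, and post-transient ramp rate are all hypothetical.

```python
from typing import Tuple

def toy_saccade_latency(buildup: float, transient: float, threshold: float = 1.0,
                        transient_arrival_ms: float = 70.0,
                        ramp_per_ms: float = 0.005) -> Tuple[float, str]:
    """Hypothetical latency rule for a single saccade.

    If pre-target build-up plus the visual transient reaches threshold, the
    transient itself triggers the saccade ("express"). Otherwise, activity is
    assumed to keep ramping linearly from the build-up level after the
    transient arrives until it reaches threshold ("regular").
    """
    if buildup + transient >= threshold:
        return transient_arrival_ms, "express"
    return transient_arrival_ms + (threshold - buildup) / ramp_per_ms, "regular"

for buildup in (0.2, 0.4, 0.6):
    latency, kind = toy_saccade_latency(buildup, transient=0.5)
    print(f"build-up={buildup:.1f}: {kind:7s} latency ~ {latency:.0f} ms")
```

With these made-up values, only the highest build-up level lets the transient cross threshold directly, producing the shortest latency; with lower build-up, the saccade waits for the slower ramp.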
In the laboratory, free viewing tasks are often used as a proxy for real-world viewing. We confirm that fixation duration is a valid behavioral indicator of neural processing in the SC and of saliency at the saccade goal. The results of the current study show (1) a close relation between fixation duration and saliency at the saccade goal, with motion and edge information being the strongest saliency predictors; (2) a close relation between fixation duration and visual saliency responses in both the SCs and the SCi; and (3) higher activation of SCs neurons for two consecutive short fixations, pointing toward concurrent processing in the SCs, of which express saccades may be one result.
Footnotes
VIDI Grant (452-13-008) from the Netherlands Organization for Scientific Research to S.V.D.S.; ERC Advanced Grant 833029 (LEARNATTEND) to J.T.; Canadian Institutes of Health Research Grant (MOP-FDN-148418) and Canada Research Chair Program to D.P.M.; VU Talent Fund, Jo Kolk Studiefonds, and Stichting Talent Support to J.H.
The authors declare no competing financial interests.
Correspondence should be addressed to Douglas P. Munoz at doug.munoz@queensu.ca or Jessica Heeman at j.heeman@uu.nl.