Neurophysiological and phenomenological data on sensorimotor decision making are growing so rapidly that it is now necessary and achievable to capture them in biologically inspired models, for advancing our understanding in both research and clinical settings. However, the main impediment in moving from elegant models with few free parameters to more complex biological models in humans lies in constraining the more numerous parameters with behavioral data (without human single-cell recording). Here we show that a behavioral effect called “saccadic inhibition” (1) is predicted by existing complex (neuronal field) models, (2) constrains crucial temporal parameters of the model, precisely enough to address individual differences, and (3) is not accounted for by current simple decision models, even after significant additions. Visual onsets appearing while an observer plans a saccade knock out a subpopulation of saccadic latencies that would otherwise occur, producing a clear dip in the latency distribution. This overlooked phenomenon is remarkably well time locked across conditions and observers, revealing and characterizing a fast automatic component of visual input to oculomotor competition. The neural field model not only captures this but predicts additional features that are borne out: the dips show spatial specificity, are lawfully modulated by contrast, and occur with S-cone stimuli invisible to the retinotectal route. Overall, we provide a way forward for applying precise neurophysiological models of saccade planning in humans at the individual level.
A limitation of human behavioral measures is that they record only the outcome of the decision process, i.e., the option eventually chosen, whereas the internal dynamics of the system must be inferred. For this reason, progress in understanding human action decisions has been driven by the application of models, derived from mathematical decision theory and from neurophysiological recordings in monkeys (Glimcher, 2001; Schall, 2004; Smith and Ratcliff, 2004).
Although the most influential models emphasize theoretical elegance and parsimony (Carpenter and Williams, 1995; Shadlen et al., 1996; Ratcliff et al., 2003), their success also rested on showing functional correspondence with key properties of presaccadic neurons (Hanes and Schall, 1996). A more recent generation of models aims at representing the behavior of specific neuronal maps in more detail (Kopecz, 1995; Trappenberg et al., 2001; Wilimzig et al., 2006; Cutsuridis et al., 2007; Lo et al., 2009; Meeter et al., 2010), can capture a wider range of behavioral paradigms, and potentially promises a more sophisticated understanding of saccadic planning. However, for complex models to provide diagnostic and predictive power in clinical or research applications, it is essential that their numerous parameters are meaningful behaviorally, can be constrained in humans, and help disentangle individual differences. Here, we show how the precise measurement of a simple effect, combined with an established neuronal field model, is rich enough to be used as a “behavioral electrode” to constrain, for individual observers, the temporal properties of exogenous and endogenous signals in oculomotor decision.
Irrelevant stimuli (“distractors”) appearing after a saccade target can have a characteristic effect on the saccadic latency distribution, producing a clear dip in the number of saccades ∼90 ms after distractor onset. This phenomenon, called “saccadic inhibition,” was first reported in reading studies (Reingold and Stampe, 1999, 2000, 2003, 2004) and then shown to generalize to other eye movement tasks (Reingold and Stampe, 2002; Buonocore and McIntosh, 2008; Edelman and Xu, 2009). It is thought to arise through rapid visual input to, and inhibitory connections within, the superior colliculus (SC) (Reingold and Stampe, 2000, 2002). However, the wider community has largely overlooked its importance for revealing what is normally hidden in behavioral paradigms: the precise character of automatic activity from a competing, but not chosen, stimulus.
We first show that simple models (Carpenter and Williams, 1995) cannot account for dips (even with significant additions), whereas dips are predicted by an existing neuronal field model (Trappenberg et al., 2001). We then demonstrate the high temporal consistency of the dips across temporal conditions and between observers and use these data to quantitatively constrain crucial free parameters of the neuronal field model, which previously could only be estimated from monkey data. We then illustrate how the model flexibly accounts for empirical manipulation of contrast and chromaticity in previous data (Bompas and Sumner, 2009a,b) by systematic changes in the strength and speed of exogenous signals. Finally, we update the model by testing whether facilitation or inhibition occurs for distractors appearing at the same location as the target.
Materials and Methods
For clarity, we present all the behavioral methodology together, followed by the modeling.
Experiment 1 (dip timing)
Observers and material.
Four experienced observers participated (one female and three males). All had normal vision and received payment. Stimuli were displayed binocularly with 72 cm viewing distance on a Sony Trinitron 19 inch GDM-F400T9 monitor, driven by a Cambridge Research Systems (CRS) ViSaGe graphics board at 100 Hz, calibrated with a CRS ColorCal and associated software. Eye movements were recorded using the CRS high-speed (250 Hz) video eye tracker mounted on a combined chin and head rest.
Stimuli and procedure.
The fixation point was a small light gray square (32 cd/m², occupying 0.1 × 0.1 deg²) and appeared at the start of the trial, on a gray background (MacLeod–Boynton coordinates, 0.643, 0.021) with 25 cd/m² luminance. A fixed delay (700 ms) later, the target stimulus, a small black square (10 cd/m², occupying 0.25 × 0.25 deg²), appeared randomly on the left or on the right of fixation (8°). Observers were instructed to saccade rapidly to the target, ignoring any other stimuli. Fixation and target stimuli extinguished together after 300 ms, and fixation reappeared 500 ms later to begin the next trial. On 83% of trials, a distractor appeared opposite the target for 50 ms (this duration was chosen for consistency with previous experiments; Bompas and Sumner, 2009a,b). Distractor stimuli consisted of larger gray squares (1 deg², 30 cd/m²) also centered at 8° eccentricity and were presented with five different stimulus onset asynchronies (SOAs), ranging from 0 (simultaneous) to 80 ms after the target in steps of 20 ms. These distractor-present conditions were randomly mixed with no-distractor trials, giving six conditions per target direction, each repeated 600 times, producing 1200 saccades per condition after averaging directions. The total 7200 saccades per observer were split into 15 blocks of 15 min each.
Saccades were detected using a velocity criterion of 100°/s, and saccade onset was defined as the time at which velocity reached 24°/s. This automatic saccade detection was visually checked in every trial and corrected when necessary. Trials were excluded when the amplitude of the first saccade did not reach half the stimulus eccentricity. This conservative rule takes into account that saccade amplitude is shortened in the presence of contralateral distractors during the dip (by 28% on average; Edelman and Xu, 2009). Saccades with sufficient amplitude were then categorized as errors or correct responses depending on their horizontal direction. Latencies <75 ms or >500 ms were also excluded.
Saccade latency distributions were obtained with a bin size of 4 ms (the temporal precision of the eye tracker). To evaluate the amplitude and timing of dips, we calculated, for each time point, the proportional change of saccades in the distractor-present distribution relative to the number in the baseline distribution, i.e., (baseline − distractor distribution)/baseline, hereafter referred to as the distraction ratio (following Reingold and Stampe, 2004). Using the ratio rather than the difference in saccade count between the distractor and no-distractor conditions ensures that dip parameters are independent of when the dip occurs within the distribution, i.e., whether it occurs when there are many or few saccades in the baseline distribution. The beginning of the dip (blue dot) was obtained by going backward in time until the ratio became smaller than 2% (or the no-distractor bin was empty, which occurs when simultaneous distractors already affect the very start of the distribution and results in overestimating dip onset time). To improve stability of these estimates, distributions were lightly smoothed using a Gaussian kernel with 5 ms window and 1 ms SD and interpolated to obtain 1 ms precision, before the ratio was calculated. The interpolation had the consequence of systematically anticipating the beginning of dips by 4 ms. Because this applied equally to all conditions and to our analysis of simulated data, we did not correct for it.
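The distraction-ratio procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function name `dip_estimates`, the order of interpolation and smoothing, and the `min_baseline` safeguard (which restricts the search for the dip maximum to time points with enough baseline saccades) are all hypothetical choices. It also assumes that the two conditions contain the same number of trials, as in experiment 1; otherwise the counts would first need to be normalized.

```python
import numpy as np

def dip_estimates(baseline_ms, distractor_ms, bin_ms=4, t_max=500,
                  onset_crit=0.02, min_baseline=20.0):
    """Return (dip onset, dip maximum, peak ratio); times in ms from target onset."""
    edges = np.arange(0, t_max + bin_ms, bin_ms)   # 4 ms bins
    centers = edges[:-1] + bin_ms / 2
    t = np.arange(t_max)                           # 1 ms grid

    x = np.arange(-2, 3)                           # 5 ms window
    kernel = np.exp(-x**2 / 2.0)                   # Gaussian, SD = 1 ms
    kernel /= kernel.sum()

    def smooth(latencies):
        counts, _ = np.histogram(latencies, bins=edges)
        fine = np.interp(t, centers, counts)       # interpolate to 1 ms
        return np.convolve(fine, kernel, mode="same")

    base, dist = smooth(baseline_ms), smooth(distractor_ms)

    # Distraction ratio: (baseline - distractor) / baseline, 0 where baseline is empty.
    ratio = np.zeros_like(base)
    np.divide(base - dist, base, out=ratio, where=base > 0)

    # Dip maximum: largest ratio among time points with enough baseline saccades
    # (min_baseline is an assumed safeguard, not taken from the source).
    i_max = int(np.argmax(np.where(base >= min_baseline, ratio, -np.inf)))

    # Dip onset: walk backward until the ratio drops below 2% or baseline is empty.
    i_on = i_max
    while i_on > 0 and base[i_on - 1] > 0 and ratio[i_on - 1] >= onset_crit:
        i_on -= 1
    return int(t[i_on]), int(t[i_max]), float(ratio[i_max])
```

Called with two arrays of saccade latencies (baseline and distractor-present), the function returns the estimated dip onset, the time of maximum dip, and the peak ratio.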
Contrast data (from Bompas and Sumner, 2009a).
The material and procedure were the same as for experiment 1, except for the details below. Three observers participated (two female). The gray distractors had one of seven different contrast levels (8, 12, 18, 27, 40.5, 61, and 91%), and there were nine different SOAs, ranging from 80 ms before to 80 ms after the target in steps of 20 ms. Each distractor-present condition occurred 45 times for each target direction, giving 90 saccades per condition after averaging directions (630 for the no-distractor baseline) and a total of 6300 saccades per observer. We also measured (in a separate block) saccade latency to the various stimuli used as distractors in the main task.
Chromatic data (from Bompas and Sumner, 2009b).
The material and procedure were the same as for experiment 1, except for the details below. Five observers participated (four female). The distractors were either gray or lilac (50%), the latter being calibrated for each observer to be visible only to S-cones on the gray background, which was made up of many squares modulated randomly in luminance between limits (Sumner et al., 2002, 2006; Smithson et al., 2003). The gray distractors were individually matched in salience with lilac stimuli. Distractors occurred on half the trials, with nine possible SOAs, ranging from 80 ms before to 80 ms after the target in steps of 20 ms. The 36 distractor conditions (distractor color × SOA × target direction) were presented 45 times each, giving a total of 3240 trials per observer and 90 saccades per condition after averaging direction.
Experiment 2 (distractor location)
The material and procedure were the same as for experiment 1, except for the details below. Three observers participated (one female), all of whom also participated in experiment 1. There were both ipsilateral and contralateral distractors, centered, respectively, at the same location as the target or at its mirror location. We used one SOA for each observer, selected from the results for experiment 1 to reveal a clear dip (50, 20, and 60 ms for observers 1, 2, and 3). Each condition (ipsilateral, contralateral, or no distractor) occurred 1200 times for each target direction, giving 2400 saccades per condition after averaging directions.
The (almost) linear model: architecture
We attempted to capture the dip phenomenon with the influential linear accumulator models of saccade decision, such as LATER (linear approach to threshold with ergodic rate) (Carpenter and Williams, 1995), by adding an endogenous control signal that inhibits distractor activity and mutual inhibition, similar to previous interactive models (Boucher et al., 2007) (Fig. 1A). In this extension of LATER, we simulate the activity of two neurons, corresponding to the target and the distractor. The activity ui of neuron i starts rising across time t after a delay δvis, from a fixed baseline value u0, according to the following equation: where μi describes the mean rate of accumulation of visually driven activity and is modulated by noise which remains fixed during the trial, with η a normally distributed random variable η = N(0,1) whose amplitude is modulated by aη (Fig. 1B). Noise terms are independent but of equal amplitudes for target and distractor. Mean rate of accumulation is either equal in both neurons (distractor present) or one is set to 0 (no distractor condition). Neurons inhibit each other in proportion to their activity above baseline, with synaptic weight wij. They receive endogenous inhibition Iendo, which occurs a short delay (δendo) after stimulus-driven activity commences and for simplicity remains constant (aendo) afterward. aendo is set to 0 for the target. The accumulation process is simulated in steps of 1 ms until target or distractor activity reaches threshold and thus the response choice is determined. Saccade latency is the time that threshold is reached plus a constant output delay δout.
In addition, to account for the distinct early mode (“express saccades”) present in the latency distributions of our observers as well as their non-null error rates at SOA = 0, we added two extra LATER units (for target and distractor locations), with a large SD aηe and a μe = 0 (exactly as in the study by Carpenter and Williams, 1995). Importantly, however, these units “race” to threshold independently of the main LATER units and of each other. They are not affected by lateral or endogenous inhibition.
Importantly, when the target is presented alone, the model behaves exactly as LATER, but in the distractor conditions, the inhibition mechanisms mean that activity profiles become slightly nonlinear (Fig. 1B,C). Thus, we refer to our extension of LATER as “approximately linear inhibition-governed approach to threshold with ergodic rate” (ALIGATER).
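To make the ALIGATER architecture concrete, the following Python sketch simulates a single trial in 1 ms steps. It is an illustration built from the verbal description above, not the authors' implementation. Several details are assumptions introduced here: the function name `aligater_trial`, the rectification of mutual inhibition at baseline, the clipping of activity at u0, the measurement of δendo from the start of the distractor's visually driven rise, and the omission of the independent express units. Default parameter values are those quoted in the text for observer 1.

```python
import numpy as np

def aligater_trial(soa=40, distractor=True, mu=12.1, a_eta=4.09, w=10.0,
                   a_endo=20.0, d_vis=40, d_endo=80, d_out=20,
                   rng=None, t_max=1000):
    """Simulate one trial; return (latency_ms, winner) or (None, None)."""
    dt = 0.001                      # 1 ms step expressed in seconds (rates in s^-1)
    u0, threshold = 0.0, 1.0
    # Noise is drawn once and stays fixed for the whole trial (noise-free if rng is None).
    eta = rng.standard_normal(2) if rng is not None else np.zeros(2)
    rate_t = mu + a_eta * eta[0]
    rate_d = (mu + a_eta * eta[1]) if distractor else 0.0

    start_t, start_d = d_vis, soa + d_vis   # onset of visually driven rise
    u_t = u_d = u0
    for t in range(t_max):
        drive_t = rate_t if t >= start_t else 0.0
        drive_d = rate_d if t >= start_d else 0.0
        # Endogenous inhibition acts on the distractor only (a_endo = 0 for the target).
        endo_d = a_endo if (distractor and t >= start_d + d_endo) else 0.0
        # Mutual inhibition in proportion to the other unit's activity above baseline.
        inh_on_t = w * max(u_d - u0, 0.0)
        inh_on_d = w * max(u_t - u0, 0.0)
        # Assumption: activity cannot fall below baseline.
        u_t = max(u_t + dt * (drive_t - inh_on_t), u0)
        u_d = max(u_d + dt * (drive_d - inh_on_d - endo_d), u0)
        if u_t >= threshold or u_d >= threshold:
            winner = "target" if u_t >= u_d else "distractor"
            return t + 1 + d_out, winner    # add constant output delay
    return None, None
```

With `distractor=False` the distractor unit stays at baseline and the target unit rises linearly, so the sketch reduces to a plain LATER unit, as the text requires.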
Neuronal field model: architecture
We used the existing neurophysiologically inspired neural field model of Trappenberg et al. (2001), which shares key features with similar models (Kopecz, 1995; Usher and McClelland, 2001) and with the conceptual account of Reingold and Stampe (2000, 2002). Saccade generation results from activity in an oculocentric motor map constituted of buildup and burst neurons, like those observed in intermediate layers of superior colliculus. In our version, we only modeled buildup neurons, because these are the ones effectively responsible for decision, whereas burst neurons in the model triggered execution when the buildup neurons reached a threshold. The activity of each buildup neuron rises or decreases across time as a function of the inputs it receives, until the activity in one neuron reaches an initiation threshold, which triggers a saccade to the corresponding location. Fixation neurons are treated as buildup neurons coding for a null saccade. The model is a leaky competing accumulator, in which the average spiking rate Ai of neuron i is a logistic function of its internal state ui: The internal state ui varies across time t according to the following equation: where the essential features of the model are the separate transient (exogenous) and sustained (endogenous) input signals (Iexo and Iendo), and the influence of lateral excitation and inhibition from the activity Aj of other neurons j (Fig. 1D,E). For the lateral interactions, wij describes the synaptic weight between neurons, and n is the number of nodes. The model also includes leakage (−u), with decay time constant τ, effectively setting how fast activity can rise or fall, a constant u0 describing the initial state (here set to 0), and noise, which varies at each time step (random walk), where η is a normally distributed random variable η = N(0,1), whose amplitude is modulated by aη (for details, see Trappenberg et al., 2001; Satel et al., 2011). 
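For orientation, a leaky competing accumulator with the ingredients listed above is commonly written in the following form. This is a sketch assembled from the definitions in the text, not a reproduction of the published equations: the slope parameter β of the logistic function and the 1/n normalization of the lateral term are assumptions introduced here.

```latex
A_i(t) = \frac{1}{1 + \exp\left(-\beta\, u_i(t)\right)}

\tau \frac{du_i}{dt} = -u_i(t) + \frac{1}{n}\sum_{j} w_{ij}\, A_j(t)
    + I_{\mathrm{exo},i}(t) + I_{\mathrm{endo},i}(t) + u_0 + a_\eta\, \eta
```

In this form the leak term −u_i pulls the internal state back toward u_0 with time constant τ, whereas the lateral term and the two input signals drive it away from that resting level.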
The key difference between this model and models like ALIGATER is that target stimuli elicit, within the same population of interconnected neurons, a dual input signal causing a rapid and transient exogenous (automatic) rise in activity, followed by a sustained endogenous (selective) rise, which together make the accumulation to threshold highly nonlinear (Fig. 1E,F). We therefore refer to this class of model as “dual-input neural accumulation with selective and automatic rises” (DINASAUR).
Exogenous and endogenous inputs.
Endogenous and exogenous inputs are spatially extended with a Gaussian profile centered on the fixation, target, and/or distractor locations, as imposed by the experimental design. The main temporal parameters are the delays (δexo and δendo) between stimulus onset and the exogenous and endogenous signals and the temporal profile of these signals. Each visual onset translates into a transient excitatory input Iexo, which has maximum intensity aexo at the center of stimulation and at t = tonset + δexo and decreases with time and distance. A visual onset centered on a node j would affect a distant node i at any time t ≥ tonset + δexo according to the following equation: Endogenous signals were modeled as constants, with maximum intensity aendo at the desired location, with the transition from fixation to target happening with a delay δendo after target onset. For a desired location centered on node j, the endogenous excitatory input would expand laterally to an adjacent node i according to the following:
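One way to express these two input signals in code is sketched below. The text specifies only that the exogenous input is transient, peaks at aexo at t = tonset + δexo, and falls off with time and distance, and that the endogenous input is constant in time with a Gaussian spatial profile. The exponential form of the temporal decay, its time constant `tau_on`, the default values of `a_exo` and `delta_exo`, and the use of σ = 0.7 mm for the input width are all assumptions made for illustration.

```python
import numpy as np

def exo_input(t, dist_mm, t_onset, delta_exo=50.0, a_exo=1.0,
              tau_on=10.0, sigma=0.7):
    """Transient exogenous input at time t (ms), dist_mm away from the stimulated node."""
    if t < t_onset + delta_exo:
        return 0.0                                  # signal has not yet arrived
    decay = np.exp(-(t - t_onset - delta_exo) / tau_on)   # assumed exponential decay
    spread = np.exp(-dist_mm**2 / (2 * sigma**2))          # Gaussian spatial profile
    return a_exo * decay * spread

def endo_input(dist_mm, a_endo=10.0, sigma=0.7):
    """Sustained endogenous input, constant in time, Gaussian in space."""
    return a_endo * np.exp(-dist_mm**2 / (2 * sigma**2))
```

For example, `exo_input(50, 0.0, 0)` returns the peak value `a_exo`, and the value shrinks for later times or for nodes farther from the stimulated location.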
Spatial interaction within the map.
In the study by Trappenberg et al. (2001), the synaptic weights followed a “Mexican hat profile” (difference of Gaussians). This was changed to a simpler Gaussian with a negative (inhibitory) baseline (Fig. 1D) following subsequent recordings within the SC (Dorris et al., 2007). We follow this in our DINASAUR version, effectively assuming that the basic shape of short-distance excitation and long-distance inhibition remains fairly stable between macaque and humans: the synaptic weight between two neurons i and j within the SC can be described as follows: where Dij is the distance (in millimeters) of SC between i and j, and Act and Inh represent, respectively, the values of peak self-excitation and maximum long-range inhibition. The spatial parameters were not free (for how we set the free parameters to match behavioral data, see Results) and were chosen to be as close as possible to those in the model of Trappenberg, themselves chosen to be compatible with neurophysiological data. However, note that the numerical values reported here differ from those reported by Trappenberg et al. (2001), which were corrupted by a computational error (personal communication). We chose Act = 250, Inh = 345, and σ = 0.7 mm to stay as close as possible to the behavior proposed by Trappenberg but with a Gaussian profile instead of Mexican hat, as suggested by Dorris et al. (2007). Our stimuli appeared at 8° eccentricity, which corresponds to 1.82 mm of SC (Ottes et al., 1986). We used a one-dimensional array of n = 200 nodes, representing 2 × 5 mm of SC, so that Dij = (i − j) × 10/n.
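The spatial quantities in this paragraph can be checked with a short Python sketch. The mapping from eccentricity to collicular distance uses the logarithmic formula of Ottes et al. (1986) along the horizontal meridian, with the commonly quoted parameters Bu = 1.4 mm and A = 3°; these parameter values are assumed here rather than stated in the text. The weight profile is one form consistent with the verbal description (a Gaussian on a negative baseline, with peak self-excitation Act and maximum long-range inhibition Inh) and should be read as an assumption, not as the published equation.

```python
import math
import numpy as np

def sc_mm_from_eccentricity(ecc_deg, b_u=1.4, a=3.0):
    """Collicular distance (mm) for a target on the horizontal meridian."""
    return b_u * math.log((ecc_deg + a) / a)

def weight_matrix(n=200, act=250.0, inh=345.0, sigma=0.7, map_mm=10.0):
    """Gaussian lateral weights on an inhibitory baseline (assumed form)."""
    idx = np.arange(n)
    d = (idx[:, None] - idx[None, :]) * map_mm / n      # D_ij = (i - j) * 10 / n
    # w = Act at zero distance, tending to -Inh at long range.
    return (act + inh) * np.exp(-d**2 / (2 * sigma**2)) - inh

print(round(sc_mm_from_eccentricity(8.0), 2))  # 1.82
```

With n = 200 nodes covering 10 mm, neighboring nodes are 0.05 mm apart, so the 8° target sits about 36 nodes from fixation.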
We did not include an effect of visual offsets in the model. In Trappenberg's model, visual offsets were assumed to have an opposite effect to onsets but with a longer decay time. For brief distractors, however, this would create a reversed distractor effect, which does not occur in the data (Bompas and Sumner, 2009a,b). Furthermore, subsequent behavioral evidence suggests that visual offsets do not produce distractor effects when the target is a visual onset (Boot et al., 2005; Hermens and Walker, 2010).
Implementation of the model and fixed parameters.
Differential equations were solved by a Matlab implementation of the explicit Runge–Kutta (4,5) pair of Dormand and Prince (ode45) with an integration step of 1 ms and a time constant τ of 10 ms. Decision threshold Th was fixed at an arbitrary value of 0.85, and the other parameters were scaled according to it. The amount of activity at the fixation node, at least for the 300 ms preceding the target, was entirely constrained by aendo, which was fixed at 10 (as in Trappenberg's model). This is because the transient caused by fixation onset rapidly vanishes and there is no effect of visual offsets in DINASAUR; therefore, aexo for fixation does not have any effect. aendo and aexo were both set to 0 for target and distractor locations during fixation. Via lateral excitation and inhibition, this combination resulted in a baseline activity of B = 0.07 for the target and distractor locations and 0.65 for fixation location, toward the end of the fixation period.
The basic dip phenomenon
In all our experimental data, we found clear dips in the latency distributions following distractors that appeared after the target. The latency distributions for baseline and distractor conditions in experiment 1 can be seen in Figure 2 (first column). The distractor stimuli leave the distribution entirely undisturbed until a precise time point (T0, marked in blue). From this point, a growing proportion of saccades are disturbed, up to some maximum (TM, marked in red). This is followed by a period during which the distractor condition rises above the no-distractor condition.
Dips started on average 67 ms after distractor onset and reached their maximum 91 ms after distractor onset, the range of both values being ±10 ms across observers and conditions. These values are highly consistent with those reported by Reingold and Stampe (2002) and by Buonocore and McIntosh (2008), who found (in group means of 10–14 participants) dip maximums from 86 to 101 ms and onsets (or times of 50% maximum) from 66 to 70 ms across several tasks. The timing is also consistent with measures obtained in another paradigm for the shortest times needed for saccade plans to be changed by new visual information (Ludwig et al., 2007).
Amplitude of dips (see Materials and Methods) varied greatly between conditions and participants, covering a range from no measurable dip up to a ratio of 99%. Although there were some consistent differences between participants, the amplitude of dips was mainly affected by SOA, with the average amplitude across participants decreasing from 90 to 74, 63, 53, and 48% for SOA = 0 to 80 ms. We will return to this point later in Results.
A proportion of the “missing” saccades during dips are accounted for by erroneous saccades toward the distractor rather than delayed saccades to the target (Figs. 2, 3). Observers made directional errors on 2% of the trials on average in the no-distractor condition and on 18, 9, 4, 3, and 2% of the trials for SOA 0 to 80 ms, respectively (strong individual differences were observed). Note that directional errors fully accounted for any difference in the area under the baseline and distractor distributions, because the number of trials in which no saccade was performed (correct or incorrect) was extremely low and did not increase in the presence of a distractor.
Before we present the behavioral data in more detail, we examine whether the two types of model [ALIGATER, the extension of LATER, and DINASAUR, the neural field model based on Trappenberg et al. (2001)] are able to simulate the basic shape of this phenomenon at all.
Modeling with ALIGATER
The second column of Figure 2 illustrates our best attempt with ALIGATER to match the latency distribution for observer 1 in the no-distractor condition and produce dips at SOA = 40 ms. As can be seen on the top row, the model can produce some distractor effect for distractors that are simultaneous with or 20 ms after the target, for which saccadic inhibition appears as an overall shift, and there is not expected to be a dip within the distribution (note that our algorithm still detects a dip in these early distractor conditions, hence the blue and red dots; see also Fig. 1E). However, for late distractors, the model entirely failed to produce the clear and temporally constrained dips characteristic of saccadic inhibition.
To obtain the latency distributions presented in Figure 2, we first fitted the no-distractor distribution for observer 1 to determine the rise rate μ and the noise aη for the main LATER units and the noise aηe for the independent LATER unit responsible for express saccades. This was done using SPIC (Carpenter, 1994), because the no-distractor condition behaves exactly like a LATER unit. Although SPIC offers the possibility to also fit a global delay term (δ = δvis + δout), best fit always occurred for δ = 0, which is not realistic. Therefore, rather than fitting δ, we used δ = 60 ms, in agreement with Carpenter et al. (2009). The best fit using Kolmogorov–Smirnov statistics (D = 0.031, p = 0.275) suggested the following values: μ = 12.1 s⁻¹, aη = 4.09 s⁻¹, aηe = 7.44 s⁻¹ (assuming a baseline u0 = 0 and a threshold of 1), which we used in our model for both target and distractor (μ and aηe were set to 0 for distractors in the no-distractor condition). We split the 60 ms delay into δvis = 40 ms and δout = 20 ms, the latter value being widely accepted in the literature (Smit and van Gisbergen, 1989; Munoz and Wurtz, 1993).
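As a rough check on what these fitted values imply, the no-distractor condition can be simulated as a race between one main LATER unit and one express unit. The sketch below is illustrative only: the function name `later_latencies` is hypothetical, the race is implemented by taking the faster of the two trial-wise rise rates, and trials in which neither rate is positive are simply dropped because threshold is never reached.

```python
import numpy as np

def later_latencies(n_trials, mu=12.1, a_eta=4.09, a_eta_express=7.44,
                    delay_ms=60.0, u0=0.0, threshold=1.0, seed=0):
    """Simulated no-distractor latencies (ms) for a main unit racing an express unit."""
    rng = np.random.default_rng(seed)
    main = rng.normal(mu, a_eta, n_trials)               # rise rates in s^-1
    express = rng.normal(0.0, a_eta_express, n_trials)   # express unit: mean rate 0
    rate = np.maximum(main, express)                     # the faster unit wins the race
    valid = rate > 0                                     # non-positive rates never cross
    # Time to threshold (converted to ms) plus the fixed delay delta = 60 ms.
    return delay_ms + 1000.0 * (threshold - u0) / rate[valid]

lat = later_latencies(20000)
print(np.median(lat))
```

Because latency is the reciprocal of a normally distributed rate, the simulated distribution is skewed toward long latencies, and the express unit adds a small early mode.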
We then extensively explored the free-parameter space in search of a combination that would produce a dip at SOA = 40 ms. Our free parameters were the mutual inhibition w, the endogenous inhibition strength aendo, and delay δendo. Note that, in the distractor condition, the nonlinearity in ALIGATER (and DINASAUR) makes it impractical to precisely fit the parameters of the models to a given behavioral dataset. Instead, we used a brute force optimization technique and ran simulations at SOA = 40 ms for every combination of w (5–40 in steps of 5 s⁻¹), aendo (5–40 in steps of 5 s⁻¹), and δendo (50–90 in steps of 10 ms) to find the combination that maximized the distraction ratio (see Materials and Methods) calculated on a 40 ms bin between 110 and 150 ms (corresponding to when a dip is observed behaviorally at SOA = 40 ms). Although the behavioral distraction ratio for this time bin was 50% for observer 1 at SOA = 40 ms, no combination of parameters in the model was able to give a ratio higher than 10% and none had the shape of a sharp dip. The “best” combination, illustrated in Figure 2 (middle column), was for w = 10, aendo = 20, δendo = 80 ms.
The reason for the failure of ALIGATER is clearly illustrated in Figure 1B: the activity of a late distractor is too strongly inhibited by the already rising target activity to have any great effect. The only way to overcome this problem would be to increase the rise rate of the distractor, but this is not justifiable within the framework of the model. Rise rates in such models are fully constrained by saccade latencies, so they cannot be higher for distractors than for targets unless the distractor stimuli would elicit shorter saccade latencies when used as targets, and this is not the case for the stimuli we use (Bompas and Sumner, 2009b). Note that the ALIGATER model would be able to produce a larger distractor effect at SOA = 0 ms with stronger mutual inhibition, but this would reduce even further any effect of late distractors. Note also that the fit proposed by SPIC for the express mode was quite poor, and the fitted parameters, when used in ALIGATER, produced too few express saccades and errors (Fig. 2, middle column). Both can be corrected by doubling aηe, but this still failed to produce dips (the ratio between 110 and 150 ms remained under 9%), and allowing the rise rate of express signals to vary freely did not produce dips either.
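The brute force search over the three free parameters amounts to evaluating an objective on a regular grid. A minimal Python sketch follows; the `objective` callable is a hypothetical stand-in for running the ALIGATER simulations at SOA = 40 ms and returning the distraction ratio in the 110–150 ms window, which is not implemented here.

```python
from itertools import product

W_VALUES = range(5, 45, 5)          # mutual inhibition w: 5-40 in steps of 5
A_ENDO_VALUES = range(5, 45, 5)     # endogenous inhibition a_endo: 5-40 in steps of 5
D_ENDO_VALUES = range(50, 100, 10)  # delay delta_endo: 50-90 ms in steps of 10

def brute_force(objective):
    """Evaluate objective(w, a_endo, d_endo) on the full grid; return the best point."""
    grid = list(product(W_VALUES, A_ENDO_VALUES, D_ENDO_VALUES))
    scores = [objective(w, a, d) for w, a, d in grid]
    best = max(range(len(grid)), key=scores.__getitem__)
    return grid[best], scores[best], len(grid)
```

The grid contains 8 × 8 × 5 = 320 parameter combinations, each of which requires a full set of simulated trials, which is why a coarse step size is used.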
Modeling with a neuronal field
Conversely, dips are predicted by DINASAUR (Fig. 2, right column), the model based on the neurophysiologically inspired neural field model of Trappenberg et al. (2001). Without the need to change the architecture of the model, we find good simulation of nearly all our human behavior, as elaborated below. Figure 1E presents an example of neural activity profiles in a noise-free condition for DINASAUR. We can see that the sharp rise in activity triggered by transient visual input allows the distractor to delay target activity significantly, even when the distractor appears 40 ms after the target. Furthermore, the dips constrain the model parameters very well, as elaborated below, which was not possible with previous behavioral data, such as the mean distractor effect or the gap effect. This is because dips characterize the delay and strength of the exogenous input. Because the combination of exogenous and endogenous inputs determines the reaction time, constraining the exogenous inputs makes it much easier to constrain the parameters of the endogenous inputs.
Interestingly, the model also suggests aspects of saccadic inhibition that are not accessible in behavioral data alone. For instance, although from the behavioral data one might assume that the saccades missing from the dips recover in the post-dip period, the model instead suggests that the delay varies from almost 0 up to 200 ms. This implies a continuous distribution of recovering saccades from the very beginning of the dip, so that many of the disrupted saccades from the start of the dip may already have recovered during the dip itself, effectively reducing the measured size of the dip. Consequently, we do not expect the crossover point to mark a clear dissociation between periods of disruption and recovery. Relatedly, the model predicts a decrease of dip amplitude with SOA, a trend that is indeed present in our behavioral data, because reduction of fixation activity with time reduces the inhibition on target activity and thus allows faster recovery of disturbed saccades. Because recovery is already happening during the dip, faster recovery reduces dip amplitude.
The following sections describe how we use our behavioral data to constrain the parameters of the model. Note that the highly nonlinear rise in DINASAUR makes it impossible to perform formal fitting. Instead, we adopt an iterative simulation-driven approach to constrain our free parameters in a sequential manner, to produce latency distributions similar to those of observer 1 (Fig. 2) at SOA = 40 ms.
Using dips to constrain exogenous signals (experiment 1)
Following the logic set out by Reingold and Stampe (2000, 2002), δexo was determined directly by the timing of the beginning of the dips (the first instance saccades are influenced by the distractors) minus the post-threshold motor output time δout (note that mutual inhibition is immediate in the Trappenberg model). In our data, saccadic inhibition is a remarkably reliable phenomenon. Dips occurred on every possible occasion, across observers and SOA conditions, in which the distractor occurred at an appropriate interval to allow a dip to be noticeable in the baseline distribution (Fig. 3). The timing of the dips is highly consistent across temporal conditions, illustrated by the strong linear relationship between distractor SOA and dip onset time (r² = 1; T0 = 0.9 × SOA + 72) or peak time (r² = 0.98; TM = SOA + 91; Fig. 4, left). This relationship was also very clear for each observer individually (all r² > 0.9).
The very high temporal consistency of dips allows us to average the different SOAs into a single distribution time locked from distractor onset rather than target onset (Reingold and Stampe, 2002). We do this for individual observers to best reveal any differences between them (Fig. 4, right). This analysis gave overall T0 = [66, 72, 58, 62 ms] and TM = [100, 104, 89, 90 ms] with amplitudes 84, 57, 69, and 59% for observers 1–4. The inter-observer consistency in timing for dips is much higher than that for saccades themselves, which always show large differences between observers even in the simplest settings (e.g., compare the baseline distributions of the four observers in Fig. 3; mean latency was 151, 126, 163, and 115 ms, respectively). The highly similar temporal properties of the dip across observers is consistent with Reingold and Stampe (2004), who found that a clear dip with similar timing occurred even when saccades from 50 individuals were combined into one distribution.
Such temporal consistency across conditions and observers makes the dips ideal for constraining the temporal properties of DINASAUR. δexo is given directly by dip onset minus motor output time (δout = 20 ms) plus our smoothing factor (4 ms; see Materials and Methods), giving values of 50, 56, 42, and 46 ms across observers. These timings are consistent with those envisaged in the discussion of Reingold and Stampe (2002) but are significantly shorter than the values generally chosen in neural field models. For example, Trappenberg et al. (2001), Cutsuridis et al. (2007), and Meeter et al. (2010) all used δexo = 70 ms.
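The arithmetic behind these δexo values can be written out explicitly. The sketch below is in Python; the function and variable names are ours for illustration, while the dip onsets, δout, and the smoothing factor are the values reported in the text.

```python
DELTA_OUT_MS = 20  # post-threshold motor output time (from the text)
SMOOTHING_MS = 4   # smoothing factor (from the text)


def delta_exo(dip_onset_ms, delta_out_ms=DELTA_OUT_MS, smoothing_ms=SMOOTHING_MS):
    """Exogenous delay = dip onset - motor output time + smoothing factor."""
    return dip_onset_ms - delta_out_ms + smoothing_ms


# Dip onsets T0 (ms after distractor onset) reported for observers 1-4.
dip_onsets = {1: 66, 2: 72, 3: 58, 4: 62}
delta_exo_by_observer = {obs: delta_exo(t0) for obs, t0 in dip_onsets.items()}
print(delta_exo_by_observer)  # {1: 50, 2: 56, 3: 42, 4: 46}
```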
Once δexo is fixed, the other parameter of Iexo, aexo, can be adjusted to account for individual differences in the amplitude of dips. More details about the effect of modulating δexo and aexo will be given below (see Linking the exogenous signal to stimulus properties), which demonstrates how these parameters are systematically modulated by stimulus properties.
Once the exogenous signal is constrained using the dips, it becomes easier to constrain the parameters of the endogenous signal Iendo (aendo and δendo) to capture the large differences in individuals' latency distributions. These parameters were adjusted iteratively, because their effects are interdependent. Figure 5 illustrates the consequences of varying aendo and δendo independently. Increasing aendo from 12 to 14 to 16 (while δendo = 75 ms; Fig. 5, three top lines) decreases the mean latency [174, 151, 137 ms] and spread [standard deviation = 69, 48, 34 ms] of the latency distribution. Importantly, increasing aendo does not affect the dip onset at all [T0 = 64, 64, 64 ms] but reduces the amplitude of dips [77, 72, 65%] (and, consequently, the time of maximum dip [TM = 96, 92, 87 ms]) and the error rate [0.45, 0.25, 0%], because the distractor activity is more strongly inhibited by the target activity. Activity at fixation also receives more inhibition from the target and therefore applies less inhibition on the target in return, resulting in faster recovery time of disturbed saccades, which also contributes to reducing the size of the dip.
Increasing δendo from 75 to 85 ms (while aendo = 14; Fig. 5, bottom line) has a similar effect to decreasing aendo: it increases mean reaction time [151, 175 ms], SD [48, 53 ms], error rate [0.25, 0.59%], the amplitude of dips [72, 89%], and (to a minor extent) the time of maximum dip [TM = 92, 96 ms] but leaves the onset time of dips unchanged [T0 = 64, 64 ms]. However, crucially, δendo also has a specific effect: it decreases the overlap between the two modes (express and main modes), making the express mode much more apparent, which was not the case when modulating aendo. Thus, decreases in aendo and increases in δendo are not purely interchangeable, provided we have the ability to constrain each.
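The simulation outcomes reported in the two preceding paragraphs can be gathered into a small table to make the key pattern easy to check: dip onset stays fixed while the other measures move with the endogenous parameters. The Python sketch below only tabulates the values quoted in the text; the class and field names are ours, and no part of the model itself is simulated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepResult:
    a_endo: float             # endogenous amplitude
    delta_endo_ms: float      # endogenous delay
    mean_latency_ms: float
    sd_ms: float
    dip_onset_ms: float       # T0
    dip_peak_ms: float        # TM
    dip_amplitude_pct: float
    error_rate_pct: float


# Values as reported in the text for the four simulated settings (Fig. 5).
REPORTED = [
    SweepResult(12, 75, 174, 69, 64, 96, 77, 0.45),
    SweepResult(14, 75, 151, 48, 64, 92, 72, 0.25),
    SweepResult(16, 75, 137, 34, 64, 87, 65, 0.0),
    SweepResult(14, 85, 175, 53, 64, 96, 89, 0.59),
]

# Dip onset is identical in every setting.
onsets = {r.dip_onset_ms for r in REPORTED}

# With the delay held at 75 ms, dip amplitude falls as a_endo rises.
amplitudes_at_75 = [
    r.dip_amplitude_pct
    for r in sorted(REPORTED, key=lambda r: r.a_endo)
    if r.delta_endo_ms == 75
]
```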
Such variations in the strength or the timing of endogenous signals are likely to occur, for instance, when comparing an overlap condition (in which the fixation stays after target appearance) with a step condition (in which fixation offset coincides with target onset). Our conclusions therefore appear very consistent with the finding of Reingold and Stampe (2002) that dip amplitude is larger in the overlap condition.
The simulated distributions in Figure 2 (right column) were obtained with the following numerical values, chosen to best match the performance of observer 1: δexo = 50 ms, aexo = 80 for target and distractor, δendo = 75 ms, aendo = 14 for target and 10 for fixation, aη = 50 (note that these values are scaled in reference to the threshold, which is arbitrarily chosen). When used in our model, these values produce a reasonable simulation of both the dip durations and the post-dip recovery periods without any additional fitting. The behavior of the model is fairly stable around these numerical values and can be systematically adapted to match individual performance and suit the specificities of the experimental design as well as the stimuli used (see below).
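For readers who wish to keep track of these settings in their own simulations, one convenient option is a small immutable parameter record. The Python sketch below is only an illustration: the class and field names are hypothetical and do not come from the original model code. The defaults are the values listed above for observer 1; the second record changes only δexo to the value derived for observer 2, and its remaining fields are placeholders because the other parameters for that observer are not given in this section.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DinasaurParams:
    """Hypothetical container; defaults are the observer 1 values in the text."""
    delta_exo_ms: float = 50.0     # exogenous delay
    a_exo: float = 80.0            # exogenous amplitude (target and distractor)
    delta_endo_ms: float = 75.0    # endogenous delay
    a_endo_target: float = 14.0    # endogenous amplitude at the target
    a_endo_fixation: float = 10.0  # endogenous amplitude at fixation
    a_noise: float = 50.0          # noise amplitude (a_eta in the text)


observer_1 = DinasaurParams()

# Only delta_exo is taken from the text for observer 2; every other field is
# simply carried over from observer 1 as a starting point (an assumption).
observer_2_start = replace(observer_1, delta_exo_ms=56.0)
```

Making the record frozen means each variant has to be created explicitly with `replace`, which keeps a trace of which parameter was changed relative to the reference set.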
Linking the exogenous signal to stimulus properties
Capturing the dips of saccadic inhibition in DINASAUR makes the strong prediction that they are a product of the same exogenous signal that contributes to all normal stimulus-driven saccade planning. If this is the case, then they should be modulated by the stimulus properties of the distractors in the same way as saccades are when directed toward such stimuli. Stimulus contrast (of targets) is known to lawfully affect saccade latency, which can be captured in the model by lawful concurrent modulation of the latency, δexo, and amplitude, aexo, of the exogenous signal. This makes a strong and unambiguous prediction for the latency and amplitude of dips when distractor contrast is modulated. The top right panel of Figure 6 presents the distractor-to-saccade latency distribution when low-contrast (δexo = 65, aexo = 50) and high-contrast (δexo = 50, aexo = 80) distractors are simulated by the model. This prediction was borne out in the reanalysis of our previous data (Bompas and Sumner, 2009b): increasing the contrast of distractors from 8 to 91% produced dips of larger amplitude (from 27 to 67%) and of shorter timing (from 81 to 60 ms for T0; from 129 to 89 ms for TM) (Fig. 6, left panels). Crucially, this modulation was linearly related to mean saccade latency to these stimuli when used as targets (Fig. 6, bottom right; amplitude, r² = 0.93, p = 0.0005; T0: r² = 0.68, p = 0.022; TM, r² = 0.95, p = 0.0002). Note that the results are consistent with those of Reingold and Stampe (2004), who found that reducing the salience of large field irrelevant stimuli (displacement of a field of text) also delayed and reduced the dips, but in that paradigm, saccade latency toward the distracting stimuli could not be measured for comparison.
Chromaticity also affects saccade latency and thus is predicted to affect dips in the same way. Saccades to stimuli visible only to the chromatic S-cone channel are 20–40 ms slower than to luminance stimuli matched in salience or detectability (Bompas and Sumner, 2008), and it has been reported recently that signals from isoluminant chromatic stimuli arrive in SC 15–30 ms later than luminance signals, although these signals do not appear to differ in size (White et al., 2009; White and Munoz, 2011). Thus, the model predicts that chromatic signals would produce delayed dips but of an equivalent amplitude to those of luminance distractors (Fig. 7). In agreement with this, by reanalyzing data from Bompas and Sumner (2009a), we observe that stimuli restricted to a chromatic pathway produce dips that occur later in time (T0 = 95 ms, TM = 126 ms) than those for salience-matched achromatic distractors (T0 = 70 ms, TM = 116 ms) but with no evidence for reduced amplitude (53 vs 56%; Fig. 7). These results also have implications for which sensory pathways are required to produce saccadic inhibition, and we return to this issue in Discussion.
Short-range spatial facilitation and refractory periods
In a final experiment, we tested the effect of distractor location: ipsilateral versus contralateral. If dips represent a nonspecific inhibition of saccadic execution to allow processing of new stimuli, by activation of fixation neurons, for example (the “extended fixation zone hypothesis”; for discussion, see Walker et al., 1997; Reingold and Stampe, 2002), they should be expected to occur regardless of distractor location. Consistent with this, Reingold and Stampe (2003, 2004) found dips for stimuli both ipsilateral and contralateral to the upcoming saccade in a reading task, whereas for reflexive saccades, Edelman and Xu (2009) found dips for ipsilateral distractors 22.5° away from the target vector. However, in these studies, the size of the dips was smaller for ipsilateral stimuli, at least for small distractors, indicating some degree of spatial tuning. Conversely, the short-range facilitation in the model of Trappenberg actually predicts that distractors near to the target should produce an “inverted dip,” i.e., an increase rather than a decrease in saccade frequency with similar temporal properties to dips (Fig. 8, left). Exactly such reversed dips are present in the data of Edelman and Xu (2009) for memory-guided saccades, which represent the only data for which distractors were presented exactly at the saccade goal location.
Using 2400 saccades per condition per observer, we found virtually no effect of late distractors appearing at the location of the target (Fig. 8; a numerical advantage in mean reaction time of only 1 ms on average), whereas for contralateral distractors the timing of the dip onset and peak was highly consistent with those found in experiment 1 (T0 = [60, 62, 64] and TM = [96, 102, 92] ms for observers 1–3). The precise overlap of the saccade distributions with and without ipsilateral distractors has two important implications. First, the clear absence of any dip contrasts with the clear dips for contralateral distractors, confirming that dips are spatially selective (Edelman and Xu, 2009) and suggesting that they are the product of competition on a neuronal map, as modeled by DINASAUR. Second, the clear absence of an inverted dip contrasts with the predictions of local facilitation in the model but is consistent with previous behavioral literature reporting that distractors do not affect the latency of visually triggered saccades to nearby targets (Walker et al., 1997). Our result suggests that the exogenous signals are subject to a refractory period of at least 60 ms that prevents a second transient from boosting the first when two onsets occur close in time and space (maybe sharing the same mechanism as that suggested to account for inhibition of return by Satel et al., 2011). Importantly, such an addition to the model would still predict inverted dips for memory-guided saccades (Edelman and Xu, 2009), because the distractor transient is separated in time from the previous stimulus at that location (which indicated the required saccade goal, whereas the saccade itself was initiated after a subsequent signal at fixation).
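The refractory-period proposal can be stated as a simple gating rule: a second onset generates its own exogenous transient unless it falls at the same location as a previous onset and within the refractory window. The Python sketch below only illustrates that logic under our own assumptions. It is not part of the published model; the function name and the example timings are hypothetical, and the 60 ms window is the lower bound suggested in the text rather than an estimate of the true value.

```python
REFRACTORY_MS = 60.0  # lower bound suggested in the text; true value unknown


def second_transient_passes(first_onset_ms, second_onset_ms, same_location,
                            refractory_ms=REFRACTORY_MS):
    """Return True if the second onset may produce its own exogenous transient."""
    if not same_location:
        # Onsets at different locations are not gated by this rule.
        return True
    return (second_onset_ms - first_onset_ms) >= refractory_ms


# Hypothetical timings, for illustration only.
# Distractor shortly after the target, at the target location: gated out.
ipsilateral_late = second_transient_passes(0.0, 40.0, same_location=True)
# Same timing at a contralateral location: transient passes, so a dip can occur.
contralateral_late = second_transient_passes(0.0, 40.0, same_location=False)
# Memory-guided case: the cue at that location appeared long before.
memory_guided = second_transient_passes(0.0, 1000.0, same_location=True)
```

Under this rule, the absence of an inverted dip for visually triggered saccades and its presence for memory-guided saccades follow from the same mechanism, differing only in the time elapsed since the previous onset at that location.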
Express saccades and errors (all experiments)
Using a framework such as DINASAUR to model saccadic inhibition makes two additional predictions that can be tested using our data. The exogenous transient responsible for the dips is the same as that responsible for erroneous saccades toward the distractor, and its counterpart for the target stimulus is assumed to be the signal driving “express saccades,” the earliest stimulus-driven saccades humans can make. Therefore, we expect the latency distributions for errors and for express saccades to be similar to the timing and width of the dips. Although we did not use a paradigm optimally favorable to express saccades, in two of our observers (1 and 3), an express saccade mode was visible in their latency distributions (Figs. 3, 8; in Fig. 4, express saccades are smoothed away by averaging on distractor onset). The timing of this express mode (peaks ∼100 and 90 ms, respectively) corresponds well to the timing of these observers' dips (TM = 100 and 89 ms, respectively). Erroneous saccades to the distractors were also present in a small percentage of trials in all observers for at least some conditions. Similarly, the latencies of these errors tend to coincide with the dips (Fig. 3, thin black lines). However, although the model produces very few errors after the endogenous signal favoring the target starts, i.e., errors with latency longer than 95 ms, we do observe some “relatively slow” errors in all our observers. This could suggest that the endogenous signals are not fully reliable and sometimes start favoring the distractor location. An alternative possibility would be that temporal noise in the endogenous signal (which is not present in the model) could allow errors with long latency to occur on occasions when the endogenous signal is activated particularly late.
The tension between theoretically elegant models and more complex biologically inspired models has existed in many areas of neuroscience and psychology and will continue to drive future debate. On the one hand, elegant models of saccadic decision, such as LATER (Carpenter and Williams, 1995), BA (Ballistic Accumulation model, Brown and Heathcote, 2005) or LBA (Linear Ballistic Accumulation, Brown and Heathcote, 2008), have proved remarkably powerful given their parsimony, but they are acknowledged to be unable to capture many behavioral situations. At the other extreme, in more complex models, the very proliferation of parameters that makes them closer to biology also makes them potentially less useful. Many researchers believe that large advances could be made in basic science and clinical settings if we can gradually make the transition from simple abstract models to more complex “biological” models of brain mechanisms in humans. For example, developing the elegant “independent horse race” model for stopping behavior (Logan and Cowan, 1984) into models that better reflect neuronal activity during such behavior has led to an improved understanding of both the inhibitory mechanism and the main source of variability involved (Boucher et al., 2007; Wong-Lin et al., 2010). However, for complex models to be applied to human behavior as standard practice, three things are required beyond the neurophysiological knowledge that inspired the models in the first place. We need to know what critical features the model must contain to be biologically plausible (i.e., where the balance between parsimony and accurate biological description should be). We need to know how to constrain those parameters for humans, and we need to demonstrate that the model can do more than describe the behavior that was in mind when it was created. The data we report provide a step forward in all these regards.
The data confirm that the precise behavioral signature of visual stimuli on saccade plans, known as saccadic inhibition (Reingold and Stampe, 2002; Buonocore and McIntosh, 2008; Edelman and Xu, 2009), is remarkably consistent across conditions and observers and reflects an automatic transient that is a normal part of the saccadic movement planning process. The critical features a model must contain to capture this behavior are lateral inhibition and a nonlinear input: a fast transient automatic phase followed by a selective phase. Importantly, dual input to motor decision is not sufficient to produce dips if they are conceived as independent processes: the transient and the sustained inputs must be integrated and must mutually inhibit each other. Moreover, the dips tell us precisely the onset time of the fast phase and how it is modulated by stimulus features. This leads to the ability to also constrain the second phase, at the level of individual participants, which is critical for clinical application. Without the information from dips, there would be many combinations of these free parameters that would satisfactorily fit normal saccadic latency distributions, and thus previously these parameters had to be roughly estimated from neurophysiological measures or chosen intuitively (Trappenberg et al., 2001; Cutsuridis et al., 2007; Meeter et al., 2010). Most reassuringly, the behavior is predicted by a preexisting biologically inspired model that was not created with this behavior in mind. Reingold and Stampe (2000, 2002) had suggested that fast visual input to, and lateral inhibition within, the SC could potentially account for saccadic inhibition, but importantly, Trappenberg et al. (2001) were not aware of this behavioral phenomenon when they designed their model based on the SC, so the ability to model dips provides an independent validation.
Fast and slow inputs
The concept of separate fast transient and slow sustained signals has of course been present in perceptual and motor research for decades. It echoes the distinction between magnocellular and parvocellular, or between retinotectal and cortical, sensory pathways for vision. It also echoes the distinction between automatic (reflexive) and controlled (voluntary) motor behavior. For saccades, the distinction has been embodied in discussions of exogenously or endogenously driven saccades (Godijn and Theeuwes, 2002) and made explicit in models (DINASAUR models) mainly because of single-cell recordings during the antisaccade task, in which a saccade must be made in the opposite location to a stimulus, and cells in the SC or frontal eye fields (FEF) show a fast response to the visual stimulus and a slower response in the direction of the actual saccade (Munoz and Everling, 2004).
Although such a dual-stage temporal profile is not necessary for some other paradigms, including the effect of distractors presented before or simultaneously with the target, the necessity for an initial fast rise becomes obvious for late distractors to have any effect at all: with a single approximately linear rise (like in ALIGATER, our version of LATER with lateral interaction and top-down selective control; see Fig. 1), the distractor activity would be too strongly inhibited by the already rising target activity to have any effect. In contrast, the separation of the input in DINASAUR models into an initial fast rise followed by a slower phase gives more strength to the transient activity.
Thus, the dips tell us that the input to saccadic decision is temporally highly nonlinear, and we argue that progress in understanding motor competition and decision will be best achieved if this is explicitly taken into account. Furthermore, eye movements have become a popular tool for investigating the interplay between automaticity/impulsivity and control in clinical research (Leigh and Kennard, 2004; Antoniades et al., 2007; Yoshida et al., 2008; Ludwig et al., 2009; Smyrnis et al., 2009; Temel et al., 2009), and if models are to be maximally helpful in discriminating between different deficits, they need to reflect this key feature of healthy saccade activation and control.
Dips, express saccades, and errors
The fact that we explicitly hypothesize, via our choice of model (Trappenberg et al., 2001), that both target and distractor produce a fast transient signal makes the prediction that dips and express saccades are two manifestations of the same mechanism. On some trials, noise could allow the fast transient activity for the target to reach threshold, and this is what produces express saccades in the model. This idea contrasts clearly with that of a separate parallel process for express saccades in LATER (Carpenter and Williams, 1995). We note that, where express saccades were evident in our data, they showed a temporal delay from the target highly similar to that of the dips from the distractor. In summary, dips, express saccades, and most errors all occur with a tight and similar temporal profile after the stimulus that triggered them.
However, contrary to express saccades or errors, dips do not require exogenous signals to reach threshold to be observed. Therefore, even if an observer's threshold is high, resulting in very few errors and express saccades, dips are still easily measurable, making them a more useful tool for studying automatic signals. Express saccades are often tricky to observe at all and to distinguish from the main mode. In some studies, early saccades can be accounted for simply by anticipation (Anderson and Carpenter, 2008), although in more favorable conditions, there seems little doubt that a distinct mode exists beyond noise or anticipation (which would be equally distributed between correct and incorrect responses). In the model, the separation between the two modes will be a function of the difference between the exogenous and endogenous delays (which can be adjusted to match any individual's data), whereas the size of the express mode depends on the strength of the exogenous signals and the initiation threshold.
Saccadic dead time
The dips provide a clear measure of what has been termed “saccadic dead time”: the shortest time needed for new visual information to influence a saccade plan and assumed to correspond to the sum of sensory input time and motor output time (beyond the point of no return). Ludwig et al. (2007) pointed out that, if the measure of dead time requires a different saccade to be produced after the new stimulus, then dead time contains some “decision processes” needed for the selection of the new saccade, and this accounted for the variations in dead time under different conditions. According to our framework (and see Reingold and Stampe, 2002), dip onset is not influenced by such selection processes, except for the minimal delay for lateral inhibition. However, in agreement with Ludwig et al. (2007), the rest of the dip shape, and thus the exact time of the peak, will be affected by decision processes, because they are influenced by the strength of lateral inhibition and by the endogenous signal (Fig. 5).
Current DINASAUR models explicitly refer to intermediate layers of the SC as the integration locus for exogenous and endogenous signals and lateral facilitation/inhibition. This is not to say that other areas (such as FEF or basal ganglia) do not also contribute to this role. For example, although it is clear that SC neurons show functional long- and short-range lateral interactions—activity in SC saccade neurons is enhanced or suppressed by visual distractors close or far from the saccade target (Dorris et al., 2007)—it remains unclear whether these interactions are embodied by direct neuronal connections within the SC alone.
Most researchers assume that the source of endogenous (“planned”) signals involves dorsolateral prefrontal cortex, supplementary eye field, and FEF (Everling and Munoz, 2000; Bichot and Schall, 2002; DeSouza et al., 2003; Everling et al., 2006). As for exogenous signals, DINASAUR predicts that the dips are caused by the same signal that causes other phenomena, such as the remote distractor effect and express saccades. Classically, such exogenous signals were thought to be conveyed by the retinotectal pathway (Posner and Cohen, 1980; Weiskrantz, 1986; Rafal et al., 1990), but more recent studies have shown distractor effects and express saccades with stimuli thought to be invisible to this pathway, and thus the exogenous signal may be transmitted via multiple pathways (Sumner et al., 2006; Bompas and Sumner, 2008, 2009a; Bompas et al., 2008). Here, we have seen that such stimuli, visible only to short wave cones (S-cones) and presented on a background of luminance noise, were perfectly able to produce dips of similar amplitude to those elicited by luminance stimuli matched in salience. However, such S-cone dips were delayed, consistent with previous reports that signals from isoluminant chromatic stimuli arrive in SC 15–30 ms later than luminance signals but are comparable in amplitude (White et al., 2009; White and Munoz, 2011).
We provide detailed behavioral measures that reveal the automatic input from visual stimuli into saccade planning, and demonstrate that fast transient activity is a critical feature with behavioral consequences for saccade models. Using an existing model, we show how our data constrain the temporal properties of endogenous and exogenous signals in humans, thus providing the necessary complement to monkey neurophysiology. We argue that this offers a way forward in applying such biologically inspired models for human behavioral and clinical research.
The authors declare no competing financial interests.
The study was funded by BBSRC, ESRC, WICN, and BRACE. We thank Robin Walker, Thomas Trappenberg, and Frouke Hermens for their help on the modeling and comments on this manuscript, and Martynas Dervinis for collecting data.
Correspondence should be addressed to Aline Bompas, Cardiff University Brain Research Imaging Centre, School of Psychology, Cardiff University, Tower Building, Park Place, CF103AT, Cardiff, UK.