Abstract
We studied the influences of competing visual and auditory stimuli on horizontal gaze shifts in humans. Gaze shifts were made to visual or auditory targets in the presence of either an irrelevant visual or auditory cue. Within an experiment, the target and irrelevant cue were either aligned (enhancer condition) or misaligned (distractor condition) in space. The times of presentation of the target and irrelevant cue were varied so that the target could have been presented before the irrelevant cue, or the irrelevant cue before the target. We compared subject performance in the enhancer and distractor conditions, measuring reaction latencies and the frequency of incorrect gaze shifts. Performance differed the most when the irrelevant cue was presented before the target and differed the least when the target was presented before the irrelevant cue. Our results reveal that, in addition to the spatial and temporal register of the stimuli, the experimental context in which the stimuli are presented also influences multisensory integration: an irrelevant auditory cue influenced gaze shifts to visual targets differently than an irrelevant visual cue influenced gaze shifts to auditory targets. Furthermore, we observed patterns of influence unique to either visual or auditory irrelevant cues that occurred regardless of the modality of the target. We believe that subjects adopted a state of motor readiness that reflected the unique demands of target selection in each experiment and that this state modulated the influences of the irrelevant cue on the target.
- human
- gaze shifts
- visual
- auditory
- target selection
- multisensory integration
- visual fixation
- motor readiness
Gaze shifts are coordinated movements of the eyes (eyes-re-head) and head (head-re-space) that rapidly reorient the visual axis (eyes-re-space) to a target of interest. Reaction latencies for gaze shifts to combined auditory and visual stimuli presented in close spatial and temporal register are less than those to either stimulus presented alone (Engelken et al., 1989; Perrott et al., 1990; Hughes et al., 1994; Nozawa et al., 1994; Frens et al., 1995; Goldring et al., 1996), suggesting that the integration of multisensory information may play an important role in forming appropriate motor behaviors (Stein and Meredith, 1993).
One explanation of the reductions in reaction latencies afforded by combining auditory and visual stimuli is based on statistical facilitation (Raab, 1962). Briefly, gaze shifts to combined auditory and visual stimuli are generated sooner because they can be driven by either the auditory stimulus or the visual stimulus, assuming that both stimuli are processed independently and that their reaction latency distributions overlap. Models using such statistical facilitation have been termed race models, because whichever of the two sensorimotor processing streams is completed first drives the gaze shift. Conceptually, the upper limit of facilitation predicted by race models (that is, the largest reduction in reaction latencies to combined auditory and visual stimuli) is given by the sum of the cumulative reaction latency distributions to the auditory stimulus and the visual stimulus alone (Miller, 1982).
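For readers who prefer a computational statement of this limit, the following sketch (Python, with illustrative latency samples; all function and variable names are our own) estimates the upper bound by summing two empirical cumulative reaction latency distributions and capping the sum at 1, in the spirit of Miller (1982).

```python
import numpy as np

def cumulative_rt_distribution(latencies_ms, t_grid_ms):
    """Empirical cumulative reaction latency distribution P(RT <= t)."""
    rts = np.asarray(latencies_ms, dtype=float)
    return np.array([(rts <= t).mean() for t in t_grid_ms])

def race_model_upper_limit(aud_latencies_ms, vis_latencies_ms, t_grid_ms):
    """Upper limit of facilitation for combined stimuli (Miller, 1982):
    P(RT <= t | AV) <= P(RT <= t | A) + P(RT <= t | V), capped at 1."""
    p_aud = cumulative_rt_distribution(aud_latencies_ms, t_grid_ms)
    p_vis = cumulative_rt_distribution(vis_latencies_ms, t_grid_ms)
    return np.minimum(p_aud + p_vis, 1.0)

# Illustrative (made-up) latency samples in msec.
rng = np.random.default_rng(0)
t_grid = np.arange(0, 501)
bound = race_model_upper_limit(rng.normal(160, 25, 200),
                               rng.normal(210, 30, 200), t_grid)
```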
Saccadic reaction latencies to combined auditory and visual stimuli are shorter than those predicted by race models (Hughes et al., 1994; Nozawa et al., 1994). These results imply the convergence of multimodal information at some locus or loci within the brain. However, in the majority of multimodal studies, the auditory and visual stimuli have the same behavioral significance as potential targets for the gaze shift (Engelken and Stevens, 1989; Perrott et al., 1990; Nozawa et al., 1994; Goldring et al., 1996) (for one exception, see Frens et al., 1995). In natural behavior, by contrast, gaze shifts are made to specific targets in the presence of many other competing stimuli.
Our main goal was to study the importance of experimental context in a multisensory protocol by contrasting the effects of irrelevant auditory or visual cues on gaze shifts to visual or auditory targets, respectively. Subject performance is compared between an enhancer condition, in which the designated target and irrelevant cue are presented at the same point in space (Fig. 1A), and a distractor condition, in which the stimuli are presented on opposite sides of the vertical meridian (Fig. 1B). To ascertain the temporal range over which the irrelevant cue can differentially influence gaze shifts made in the enhancer or distractor conditions, the relative times of presentation of the target and irrelevant cue are systematically varied. We also compare subject performance to that predicted by the upper limit of race models, given that this limit has been exceeded in previous studies of multimodal reaction latencies (Hughes et al., 1994; Nozawa et al., 1994).
Some of the results presented here have appeared in abstract form (Corneil and Munoz, 1994, 1996).
MATERIALS AND METHODS
Experimental setup. All paradigms were reviewed and approved by the Queen’s University Human Research Ethics Board. Three male subjects (ages 24, 27, and 35 years) and two female subjects (ages 25 and 37 years) were informed of the general nature of the study and consented to participate before the experiments were initiated. One subject (dm, an author of the paper) was well informed about the goals of the experiments, but his data were consistent with those of the other four subjects, who were naive about the goals of the study. Subjects were seated in a straight-back chair in the center of a sound-attenuated, light-tight room and faced a translucent visual screen 100 cm in front of the eyes that subtended 70° of visual angle. The screen was diffusely illuminated (1.0 cd/m2) between trials to prevent dark adaptation. The experiments were performed in silence and darkness except for the presence of light-emitting diodes (LEDs) and/or noise bursts emitted from small speakers. The background lights were extinguished 250 msec before an LED, referred to as the fixation point (FP; 2.0 cd/m2; CIE chromaticity coordinates: x = 0.78, y = 0.21), was back-projected onto the center of the screen signaling the start of a trial. The peripheral LEDs and speakers were mounted into small boxes placed just beyond the edges of the screen at 40° eccentricity at the same vertical height as the FP. The FP was illuminated for 1000 msec and was then extinguished for 200 msec, during which the subjects were in complete darkness before presentation of peripheral stimuli (Fig. 1A,B). Peripheral stimuli consisted of a target (T) and an irrelevant cue (i) that were presented either 40° to the right or left of the FP on the horizontal meridian after the 200 msec gap. Subjects were instructed to first look at the central FP and then look to the peripheral target as quickly as possible. Such an emphasis on speed rather than accuracy has been shown to increase the incidence of incorrect gaze shifts to the irrelevant cue (Ottes et al., 1985; Munoz and Corneil, 1995). Subjects were free to adopt any combination of eye and head movements they desired to perform the gaze shift. Subjects were not given any specific instructions on how to behave with regard to the irrelevant cue, although they were informed that the irrelevant cue would be presented in each trial. In the enhancer condition (Fig. 1A), the target and irrelevant cue were presented at the same point in space. In the distractor condition (Fig. 1B), the target and irrelevant cue were presented on opposite sides of the vertical meridian.
In all experiments, subjects were instructed to look at either a visual stimulus (red LED; CIE: x = 0.73, y = 0.26) or a broad-band auditory stimulus. These target stimuli were presented for a period of 1000 msec. In preliminary experiments, in which only one of the stimuli was presented after the 200 msec gap period, we systematically varied the intensity of the visual target from 0.10 to 4.7 cd/m2 and the auditory target from 44 to 88 dB at 4 kHz. Reaction latencies for gaze shifts were reduced to a minimum as intensity was increased to 0.7 cd/m2 or 70 dB. For all experiments described here, the intensity of the red LED visual stimulus and the broad-band auditory stimulus was fixed at 4.7 cd/m2 and 74 dB at 4 kHz, respectively.
Experimental paradigms. Subjects were required to perform a series of six experiments. In the first two control experiments, only one of the stimuli (either the red LED or the broad-band noise burst) was presented as the target, and no other competing stimuli were presented. These experiments were performed to determine single-target control reaction latencies for visually guided and aurally guided gaze shifts. Trials were run in blocks, in which the target was presented randomly to either the right or the left side using the above described experimental setup.
In the remaining four experiments, we used a separate combination of modalities for the target and the irrelevant cue (Table 1). In these experiments, there was always a period of 200 msec of no stimuli from the time of central FP disappearance until the onset of the first presented peripheral stimulus (Fig. 1A,B). The intervening period of darkness between FP offset and target onset has been shown to increase the incidence of incorrect gaze shifts to the irrelevant cue in the distractor condition compared to when the central FP remains illuminated during target presentation (Munoz and Corneil, 1995). The relative presentation times of the target and irrelevant cue were randomly varied within each experiment. For convention, the code T100i means that the target T was presented 100 msec before the irrelevant cue i (Fig. 1A). Conversely, the code i100T means that the irrelevant cue was presented 100 msec before the target (Fig. 1B).
In the first multiple-target experimental paradigm, the red LED (4.7 cd/m2; CIE: x = 0.73, y = 0.26) was used as the target and the broad-band auditory stimulus (74 dB at 4 kHz) as the irrelevant cue. In the second paradigm, the broad-band auditory stimulus was used as the target and the red LED as the irrelevant cue. In the third paradigm, the red LED was used as the target and a yellow LED (18 cd/m2; CIE: x = 0.54, y = 0.46) as the irrelevant cue. In the fourth experiment, the broad-band auditory stimulus was used as the target and a pure tone auditory stimulus (75 dB, 2 kHz) was used as the irrelevant cue. In each paradigm, a set of 10 temporal asynchronies was introduced between the presentation of the two stimuli (Table 1). The asynchronies were selected to surround the range that would approximate central arrival of the target and the irrelevant cue (boldface asynchronies in Table 1) after accounting for the appropriate sensory transduction and conduction delays, estimated at ∼50 msec for visual information (Gouras, 1967) and 2–10 msec for auditory information (Kraus and McGee, 1992). Different experiments were run on different days. Subjects were given sufficient practice before each new experimental session, and recording was begun after the subject reported being comfortable with the new task. All variations within an experiment (target at 40° left or right; enhancer or distractor condition; 10 temporal asynchronies) were each randomly interleaved in a block of 200 trials by a 486 computer that controlled the experiment at a rate of 1000 Hz, and all variations were presented an equal number of times within a single block of trials. Subjects completed six blocks of trials of 10–15 min each over a period of 2 d for each experiment, with intervening breaks between blocks to maintain subject alertness.
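As a concrete illustration of this design, the sketch below (Python) builds one fully counterbalanced, randomly interleaved block. The asynchrony labels are placeholders standing in for the ten values of Table 1, and all names are our own.

```python
import itertools
import random

# Factors described in the text; the asynchrony labels are placeholders.
sides = ["left", "right"]
conditions = ["enhancer", "distractor"]
asynchronies = ["i200T", "i100T", "i80T", "i60T", "i40T",
                "T0i", "T40i", "T60i", "T80i", "T100i"]

variations = list(itertools.product(sides, conditions, asynchronies))  # 2 x 2 x 10 = 40
trials_per_block = 200
repeats = trials_per_block // len(variations)   # each variation appears 5 times per block

block = variations * repeats
random.shuffle(block)                           # randomly interleave all variations
```

With 2 target sides, 2 spatial conditions, and 10 asynchronies, each of the 40 variations appears five times in a 200-trial block, which satisfies the equal-frequency constraint described above.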
Data collection and analysis. Horizontal eye movements were measured using bitemporal DC electro-oculography (EOG) and were filtered and amplified with a Grass P18 DC preamplifier. Horizontal head rotation was measured by having subjects wear a hockey helmet attached to a low-torque potentiometer that was then fitted to a shaft anchored to the ceiling. The potentiometer signal was first calibrated to known angles of rotation. Subjects were then asked to maintain fixation upon the central FP and deviate their heads to the right and left. The gain of the EOG signal was adjusted to be equal and opposite to that of the potentiometer signal.
Horizontal eye and head position signals were filtered (50 Hz, low pass), amplified, and digitized at a rate of 500 Hz. Digitized data were stored on a hard disk, and subsequent off-line analysis was performed on a Sun Sparc 2 workstation. Horizontal gaze position (eye position in space) was reconstructed off-line by adding the calibrated eye and head position signals together. Gaze shifts were scored as correct if directed toward the target and incorrect if initially directed away from the target. Reaction latencies were measured from the time of target onset to the onset of the gaze shift (indicated when the gaze velocity exceeded 50°/sec), as derived from the gaze position trace by applying a finite impulse response filter. After confirmation that results of individual subjects to the right and the left were statistically similar (Student’s t test,p > 0.05), responses obtained for gaze shifts in both directions were pooled. Mean reaction latencies were computed from trials with reaction latencies between 100 msec after the first presented stimulus and 500 msec after the second presented stimulus. Note that negative reaction latencies may be recorded at certain temporal asynchronies, given that reaction latencies were measured relative to the time of target onset (e.g., at the i200T asynchrony, if the subject reacts 150 msec after the irrelevant cue, we would record a reaction latency of −50 msec relative to the time of target onset). Gaze shifts were classified as anticipatory and were excluded from the analysis if they were initiated <100 msec after the onset of the first presented stimulus. This anticipatory cutoff was obtained in a previous series of experiments in our laboratory in which subjects were instructed to anticipate the appearance of a visual or auditory target at 40° eccentricity. Movements that were begun <100 msec after onset of a single stimulus were correct ∼50% of the time, whereas movements initiated >100 msec after stimulus onset were correct >95% of the time. Gaze shifts with reaction times > 500 msec after the onset of the second presented stimulus were also excluded because of presumed lack of subject alertness. Differences in mean reaction latency for correct gaze shifts in enhancer and distractor conditions were calculated at each asynchrony. Occasionally, there were very few correct gaze shifts in the distractor condition at a given asynchrony. A minimum of five correct gaze shifts from the distractor condition were required at a given asynchrony for the mean reaction latency difference to be calculated.
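The latency-scoring rules above can be summarized in code. The following sketch (Python) marks gaze shift onset at the first sample whose speed exceeds 50°/sec and applies the 100 msec anticipatory and 500 msec alertness cutoffs; the central-difference velocity filter and all names are assumptions, since the exact finite impulse response filter is not specified in the text.

```python
import numpy as np

def gaze_velocity(gaze_position_deg, fs_hz=500):
    """Differentiate the gaze position trace; a central-difference kernel
    stands in for the unspecified finite impulse response filter."""
    kernel = np.array([1.0, 0.0, -1.0]) * (fs_hz / 2.0)
    return np.convolve(gaze_position_deg, kernel, mode="same")

def reaction_latency_ms(gaze_position_deg, target_on_ms, first_stim_on_ms,
                        second_stim_on_ms, fs_hz=500, threshold_deg_s=50.0):
    """Latency re: target onset of the first sample whose speed exceeds
    50 deg/sec, or None if the trial is excluded by the cutoffs."""
    speed = np.abs(gaze_velocity(gaze_position_deg, fs_hz))
    onset_idx = int(np.argmax(speed > threshold_deg_s))
    if speed[onset_idx] <= threshold_deg_s:
        return None                                    # no movement detected
    onset_ms = onset_idx * 1000.0 / fs_hz
    if onset_ms < first_stim_on_ms + 100.0:
        return None                                    # anticipatory movement
    if onset_ms > second_stim_on_ms + 500.0:
        return None                                    # presumed lapse of alertness
    return onset_ms - target_on_ms                     # can be negative at iXT asynchronies
```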
Calculations of upper limit of race model predictions. A race model predicts that the upper limit of facilitation afforded by paired stimuli is given by the sum of the cumulative reaction latency distributions for gaze shifts to each stimulus alone (Miller, 1982). This upper limit of the race model was used to predict the performance of subjects given optimal statistical facilitation. For each subject, the cumulative reaction latency distribution to the target (Fig.2A) was generated from the reaction latencies obtained at the extreme target-leading-cue asynchrony (see Table 1), because at this asynchrony the irrelevant cue was delayed too long after the target to influence the gaze shift. The cumulative reaction latency distribution for gaze shifts to the irrelevant cue (Fig. 2B) was obtained in different ways in multimodal and unimodal experiments. For the multimodal experiments, the reaction latencies for the irrelevant cue were obtained using data from the converse experiment in which the irrelevant cue served as the target, again from the appropriate extreme target-leading-cue asynchrony. For example, to construct the distribution for an irrelevant auditory cue in the visual target/irrelevant auditory cue experiment, data from the extreme target-leading-cue asynchrony in the auditory target/irrelevant visual cue experiment was used. For the unimodal experiments, the reaction latencies for incorrect gaze shifts at the most extreme cue-leading-target asynchronies were used to construct the cumulative reaction latency distributions for the irrelevant cue. An alternative method of using the reaction latencies from the single target control experiments to construct the cumulative reaction latency distributions for the target and irrelevant cue produced similar predictions of the race model.
The cumulative reaction latency distribution for the irrelevant cue was shifted relative to the distribution of the target for each temporal asynchrony tested (Fig. 2B). For example, the distribution for the irrelevant cue was shifted 40 msec earlier relative to the distribution for the target at the i40T asynchrony (Fig. 2C) and 40 msec later at the T40i asynchrony (Fig. 2D). The shifted distribution for the irrelevant cue was then added to the distribution for the target to obtain a summed cumulative reaction latency distribution unique to each temporal asynchrony (solid lines in Fig. 2C,D). Two measures were calculated from each of the 10 summed cumulative reaction latency distributions (one for each temporal asynchrony). First, the predicted percentage of incorrect gaze shifts was calculated as the percentage of gaze shifts in the summed distribution that were driven to the irrelevant cue (see Fig. 2C). Second, the predicted difference in mean reaction latencies in the enhancer and distractor conditions was obtained as follows (refer to Fig. 2D during this explanation). For the distractor condition, the predicted mean reaction latency for correct gaze shifts was obtained from the cumulative reaction latency distribution for the target, because correct gaze shifts must be driven to the target in this condition. For the enhancer condition, the predicted mean reaction latency for correct gaze shifts was obtained from the summed cumulative reaction latency distribution, because correct gaze shifts could be driven to either the target or the irrelevant cue in this condition. The predicted mean reaction latency difference was then calculated by subtracting the predicted mean reaction latency in the enhancer condition from that in the distractor condition.
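A compact version of this calculation is sketched below (Python). It shifts the cue distribution by the asynchrony, caps the summed distribution at 1, reads the predicted percentage of errors from the cue's contribution at the point where the summed distribution saturates, and takes the enhancer and distractor latency predictions from the summed and target-only distributions, respectively. The time grid, the saturation-based error estimate, and all names are our own assumptions about how to operationalize the graphical procedure described above.

```python
import numpy as np

def empirical_cdf(latencies_ms, t_grid_ms):
    rts = np.asarray(latencies_ms, dtype=float)
    return np.array([(rts <= t).mean() for t in t_grid_ms])

def race_upper_limit_prediction(target_rts_ms, cue_rts_ms, asynchrony_ms,
                                t_grid_ms=np.arange(-300.0, 601.0)):
    """asynchrony_ms > 0 when the cue leads the target (e.g., i40T -> +40),
    < 0 when the target leads the cue (T40i -> -40); times re: target onset."""
    p_target = empirical_cdf(target_rts_ms, t_grid_ms)
    p_cue = empirical_cdf(np.asarray(cue_rts_ms, float) - asynchrony_ms, t_grid_ms)
    p_sum = np.minimum(p_target + p_cue, 1.0)            # summed cumulative distribution

    # Predicted errors: the cue's contribution to the summed distribution at
    # the time it saturates (all predicted movements have occurred by then).
    saturation_idx = int(np.argmax(p_sum >= 1.0))
    pct_incorrect = 100.0 * p_cue[saturation_idx]

    # Enhancer: mean of the summed distribution (either stimulus may drive a
    # correct movement). Distractor: mean of the target distribution alone.
    increments = np.diff(np.concatenate(([0.0], p_sum)))
    enhancer_rt = float(np.sum(t_grid_ms * increments) / p_sum[-1])
    distractor_rt = float(np.mean(target_rts_ms))
    return pct_incorrect, distractor_rt - enhancer_rt
```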
RESULTS
Visual target/irrelevant auditory cue
The time of presentation of the irrelevant auditory cue relative to the visual target determined whether subject performance differed in the enhancer and distractor conditions. The earlier the irrelevant auditory cue was presented relative to the onset of the visual target, the higher the percentage of incorrect gaze shifts in the distractor condition (Fig. 3A–C). No incorrect gaze shifts were generated when the visual target was presented well before the irrelevant auditory cue (Fig. 3D). Figure 4 displays the reaction latency histograms for correct and incorrect gaze shifts in the enhancer and distractor conditions for the same four temporal asynchronies shown in Figure 3. When the irrelevant auditory cue was presented well before the visual target, a majority of gaze shifts were directed to the irrelevant auditory cue, leading to a high incidence of incorrect gaze shifts in the distractor condition and to reaction latencies in the enhancer condition that were too short to have been driven by the visual target (Fig. 4A). In contrast, when the visual target was presented well before the irrelevant auditory cue, correct gaze shifts were initiated at nearly the same time in the enhancer and distractor conditions (Fig. 4D). A transition between these two extremes of performance occurred when the irrelevant auditory cue was presented at around the same time as the visual target: as the irrelevant auditory cue exerted a greater influence on subject performance, the incidence of incorrect gaze shifts in the distractor condition and the difference between reaction latencies for correct gaze shifts in the enhancer and distractor conditions increased (Fig. 4B,C).
A full analysis of the performance of a single subject in the visual target/irrelevant auditory cue experiment is shown in Figure 5. The percentage of incorrect gaze shifts in the distractor condition at each asynchrony generates the incorrect curve (Fig. 5A). This subject generated progressively fewer incorrect gaze shifts in the distractor condition as the presentation of the irrelevant auditory cue was delayed relative to the visual target. No incorrect gaze shifts were generated when the visual target was presented at least 60 msec before the irrelevant auditory cue. The mean reaction latencies for correct gaze shifts in the enhancer and distractor conditions are shown in Figure 5B. For this subject, these reaction latencies were more variable in the enhancer condition (63–167 msec) than in the distractor condition (158–178 msec). Furthermore, reaction latencies in the enhancer and distractor conditions were about equal when the visual target was presented at least 80 msec before the irrelevant auditory cue. The differences between mean reaction latencies in the enhancer and distractor conditions at each asynchrony were used to construct the reaction latency difference curve (solid line in Fig. 5B). The shape of the reaction latency difference curve (Fig. 5B) was similar to the shape of the incorrect curve (Fig. 5A), in that the magnitudes of both measurements tended to be larger the earlier the irrelevant auditory cue was presented relative to the visual target.
The percentage of incorrect gaze shifts and the reaction latency differences predicted by the upper limit of race models for the same subject are shown in Figure 5, C and D, respectively (dashed lines). At each asynchrony, the predicted value of each parameter (dashed line) is subtracted from the observed value (dotted line) to construct a race comparison curve (solid line). Positive values for the race comparison curves mean that the observed values were greater than predicted by the upper limit of a race model; negative values imply that the observed index did not exceed the upper limit of race model predictions. The observed curves were compared to the predicted curves using a Mann–Whitney Rank Sum test (p < 0.05). Although this subject generated slightly fewer incorrect gaze shifts than predicted by the upper limit of a race model (solid line in Fig. 5C) and had larger reaction latency differences than predicted by the upper limit of a race model (solid line in Fig. 5D), neither of the observed curves differed significantly from the predicted curves.
The observed incorrect curves (Fig. 6A) and reaction latency difference curves (Fig. 6B) for all five subjects and the sample mean show that both measurements were larger than zero when the irrelevant auditory cue was presented before the visual target, and converged progressively to zero when the visual target was presented ∼60 msec before the irrelevant auditory cue. On average, subject performance differed in the enhancer and distractor conditions if the irrelevant auditory cue was presented up to 60 msec after the visual target; auditory information delayed >60 msec after the visual target did not differentially influence gaze shifts in the enhancer and distractor conditions. Figure 6, C and D, shows that the observed results did not differ drastically from the performance predicted by the upper limit of a race model. All subjects generated slightly larger, albeit nonsignificant, reaction latency differences than predicted by the upper limit of a race model (Fig. 6D) and generated slightly fewer incorrect gaze shifts than predicted by the upper limit of a race model (Fig. 6C).
Auditory target/irrelevant visual cue
The preceding analysis was repeated for the experiment in which the auditory stimulus served as the target and the visual stimulus as the irrelevant cue. Subject performance in this experiment was very different from that in the converse visual target/irrelevant auditory cue experiment. All subjects generated fewer incorrect gaze shifts to the irrelevant visual cue than to the irrelevant auditory cue (compare Fig. 7A with 6A). For all subjects, the observed number of incorrect gaze shifts was significantly smaller than predicted by the upper limit of a race model (Fig. 7C). Correct gaze shifts in the enhancer condition were initiated on average 37–52 msec sooner than in the distractor condition for all asynchronies except T100i (Fig. 7B); the observed differences were significantly smaller than those predicted by the upper limit of a race model for three of the five subjects (Fig. 7D). In summary, the influence of a suprathreshold visual stimulus on aurally guided gaze shifts was not the same as the influence of a suprathreshold auditory stimulus on visually guided gaze shifts, and subject performance fell far short of that predicted by the upper limit of a race model.
Visual target/irrelevant visual cue
The results from the multimodal experiments demonstrated clear differences in the influences of auditory and visual information on gaze shifts to targets of the other modality. To determine whether these influences were attributable to the modality of either the target or the irrelevant cue, we ran the subjects in unimodal multiple target experiments to determine the influences of irrelevant visual and auditory cues on gaze shifts to targets of the same modality.
The group results from the visual target/irrelevant visual cue experiment are shown in Figure 8. An irrelevant visual cue was able to induce a high number of incorrect gaze shifts (Fig.8A) and large differences in reaction latencies in the enhancer and distractor conditions when presented well before the visual target (Fig. 8B). At the extreme i200T asynchrony, the large incidence of incorrect gaze shifts agreed moderately well with that predicted by the upper limit of a race model (Fig. 8C), although the observed reaction latency differences were not as large as expected (Fig. 8D). When the irrelevant visual cue was presented at around the same time as the visual target (around T0i), the observed number of incorrect gaze shifts fell far short of that predicted by the upper limit of a race model (Fig. 8C). The influence of the irrelevant visual cue persisted even when delayed up to 100 msec after the presentation of the visual target, in accordance with the upper limits of a race model (Fig. 8C,D). Four out of five subjects generated significantly fewer incorrect gaze shifts than predicted (Fig.8C), and the reaction latency differences were significantly smaller in three of the five subjects (Fig. 8D).
Auditory target/irrelevant auditory cue
The group results from the auditory target/irrelevant auditory cue experiment are shown in Figure 9. Four out of the five subjects generated significantly fewer incorrect gaze shifts to the irrelevant auditory cue than predicted by the upper limit of a race model (Fig. 9A,C). Furthermore, the reaction latency differences that were observed when the irrelevant auditory cue was presented well before the target were significantly smaller than those predicted by the upper limit of a race model in three of the five subjects (Fig. 9B,D). When the irrelevant auditory cue was presented at around the same time as or slightly after the auditory target, the observed reaction latency differences concurred well with the predictions of the upper limit of a race model (Fig. 9B,D).
Comparison of experiments
There was a strong correlation between the incidence of incorrect gaze shifts and the reaction latency differences in all experiments (Fig. 10). A linear regression analysis through all 40 data points in Figure 10 produced a line with a slope of 0.59 and a correlation value of 0.94 (p < 0.01). Thus, for every 2 msec difference between reaction latencies in the enhancer and distractor conditions, subjects generated incorrect gaze shifts ∼1% more frequently.
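The relationship can be reproduced numerically. The sketch below (Python, with synthetic stand-in data, since the individual points are not tabulated here) fits the same kind of regression; with a slope near 0.59 %/msec, a 2 msec increase in the latency difference corresponds to roughly a 1% increase in errors.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-ins for the 40 pooled points (one per asynchrony per
# experiment): x = enhancer/distractor latency difference (msec),
# y = percentage of incorrect gaze shifts in the distractor condition.
latency_diff_ms = rng.uniform(0.0, 100.0, 40)
pct_incorrect = 0.59 * latency_diff_ms + rng.normal(0.0, 5.0, 40)

fit = stats.linregress(latency_diff_ms, pct_incorrect)
print(fit.slope, fit.rvalue)   # slope in %/msec; ~0.59 -> ~1% more errors per ~2 msec
```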
The sample mean curves for the five subjects from each experiment are contrasted in Figure 11 and reveal some consistent trends. Subjects tended to make more incorrect gaze shifts to an irrelevant cue at extreme cue-leading-target asynchronies in experiments with a visual target than in those with an auditory target, regardless of the modality of the irrelevant cue (Fig. 11A). Furthermore, the observed number of incorrect gaze shifts was smaller than predicted by the upper limit of a race model (Fig. 11C). The reaction latency differences observed for correct gaze shifts in the enhancer and distractor conditions (Fig. 11B) were usually far smaller than predicted by the upper limit of a race model (Fig. 11D); only in the visual target/irrelevant auditory cue experiment did the reaction latency differences observed at any cue-leading-target asynchronies exceed the upper limit of race model predictions.
Reaction latencies from the extreme target-leading-cue asynchronies, which are effectively single-target trials within the unimodal and multimodal experiments, can be contrasted with reaction latencies in the single-target control experiments in which no irrelevant cue was presented in a block of trials. These are shown for all subjects for gaze shifts to visual and auditory targets in Figure 12. There were consistent differences in these reaction latencies across experiments, and the pattern of these differences was the same for gaze shifts to visual and auditory targets. All subjects adopted a strategy of reacting with the longest latencies in the unimodal experiments and the shortest latencies in the single-target control experiments. Additionally, the sample means for all subjects were not significantly different for visually guided and aurally guided gaze shifts in the single-target control, multimodal, and unimodal experiments (t test, p > 0.05). These observations have important implications for the behavioral strategies used by the subjects in the various experiments.
Reaction latency differences in the enhancer and distractor conditions were primarily attributable to reaction latency reductions in the enhancer condition. Reaction latency increases in the distractor condition could partially account for some of the reaction latency differences only at certain temporal asynchronies in experiments with an irrelevant visual cue. Figure 13 shows the single-subject and sample mean traces for the mean reaction latencies of correct gaze shifts in the enhancer (solid lines) and distractor (dashed lines) conditions. The arrowheads at the right of each graph denote the extreme target-leading-cue reaction latency. Using this extreme target-leading-cue reaction latency as a comparative level, facilitating processes in the enhancer condition that shortened reaction latencies were indicated by mean reaction latencies lying below this level; inhibitory processes in the distractor condition that increased reaction latencies were indicated by mean reaction latencies lying above this level. There was a consistent trend of facilitation in the enhancer condition for all experiments at cue-leading-target asynchronies. Note, however, that in the distractor condition there were consistent increases in reaction latencies around T0i only for experiments using an irrelevant visual cue (Fig. 13B,C). Using the level defined by the extreme target-leading-cue reaction latency as a base, the area of the curve lying above and below the base was calculated between i100T and T100i to quantify the amount of inhibition or facilitation, respectively, in all experiments (Fig. 14). A two-way ANOVA with the modality of the target and the irrelevant cue as the factors revealed that an irrelevant auditory cue provided significantly more facilitation in the enhancer condition than an irrelevant visual cue (p < 0.05; open, inverted bars in Fig. 14). In the distractor condition, an irrelevant visual cue exerted a significantly larger inhibitory influence than an irrelevant auditory cue (p < 0.05; filled bars in Fig. 14).
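The area measure can be written out explicitly. The sketch below (Python) integrates the mean-latency curve above and below the extreme target-leading-cue baseline between i100T and T100i; trapezoidal integration and all names are our own assumptions.

```python
import numpy as np

def _trapz(y, x):
    """Trapezoidal integral (kept local to avoid NumPy version differences)."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def facilitation_and_inhibition(asynchronies_ms, mean_rts_ms, baseline_rt_ms):
    """Area of the mean latency curve below (facilitation) and above
    (inhibition) the extreme target-leading-cue baseline, restricted to the
    i100T ... T100i range. Positive asynchronies denote cue-leading (iXT)."""
    a = np.asarray(asynchronies_ms, dtype=float)
    d = np.asarray(mean_rts_ms, dtype=float) - baseline_rt_ms
    keep = (a >= -100.0) & (a <= 100.0)
    order = np.argsort(a[keep])
    a, d = a[keep][order], d[keep][order]
    inhibition = _trapz(np.clip(d, 0.0, None), a)      # latency increases above baseline
    facilitation = _trapz(np.clip(-d, 0.0, None), a)   # latency reductions below baseline
    return facilitation, inhibition
```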
DISCUSSION
Our results emphasize that the processes involved in unimodal and multimodal target selection dampen the integration normally afforded by presenting stimuli in spatial and temporal register. We used the upper limit of race model predictions as a benchmark for the greatest statistical facilitation possible, a limit that has been exceeded in previous unimodal and multimodal studies of paired stimuli (for review, see Townsend and Nozawa, 1995). Except for reaction latency differences in the visual target/irrelevant auditory cue experiment, all observed measurements were lower than those predicted by a race model (see Fig. 11C,D). Clearly, the demands of target selection impose certain constraints on the time of gaze shift initiation.
Reaction latencies from the most extreme target-leading-cue asynchronies provided valuable approximations of strictly target-driven reaction latencies in the different experimental conditions. The differences in these reaction latencies across the unimodal, multimodal, and single-target control experiments demonstrated large and consistent effects of the requirements for target localization in each experiment (see Fig. 12). Target localization was easiest in the single-target control experiments: subjects needed only to shift their gaze to the single target without the presence of any other competing stimuli. In the multimodal experiments, subjects needed to orient to the instructed target modality while suppressing movements to a cue of a different modality. No physical feature of the target other than its location was important. In the unimodal experiments, subjects had to extract pertinent information from both stimuli in addition to their locations. The additional demands for target localization in the unimodal experiments increased mean reaction latencies when compared to the simpler processing required in the multimodal and single-target control experiments. Subjects therefore adopted, either consciously or unconsciously, a strategy in each experiment that reflected the difficulty of target localization. We believe that these strategies indicate the relative state of motor readiness achieved at target onset. The lower the state of motor readiness at target onset, the longer the time for target localization, and hence the longer the time until the gaze shift is initiated.
One component of motor readiness is likely the state of visual fixation at the time of target onset, because the state of visual fixation influences the time to gaze shift initiation (for review, see Fischer and Weber, 1993). The gap effect, which is the reduction in reaction times observed when the central fixation point is extinguished before target onset, has been attributed to both the removal of the foveal visual stimulus and the alerting information provided by offset of the central fixation point warning of impending target presentation (Reuter-Lorenz et al., 1991, 1995; Kingstone and Klein, 1993). Interestingly, alerting information can be provided by the onset or offset of stimuli anywhere in the visual field (Ross and Ross, 1980,1981; Frens et al., 1995; Reuter-Lorenz et al., 1995). Thus, it is possible that onset of the irrelevant cue in our experiments aided in the disengagement of visual fixation. However, we have shown that onset of an irrelevant cue conveyed alerting information only when visual fixation was engaged on a visible central fixation point; no alerting information from the irrelevant cue was observed in the gap task (Munoz and Corneil, 1995). Alerting effects represent a spatially independent mechanism by which any stimulus, either aligned or misaligned with the target, can lower reaction latencies to a target. Because we have directly compared subject performance in the enhancer and distractor conditions, we can be sure that performance differences in the two conditions stemmed from the alignment or misalignment of the stimuli; spatially independent phenomena are subtracted out as they are presumably equal in both conditions.
Our results reveal important differences in the time of central processing of visual and auditory stimuli before gaze shift initiation, as well as a relationship between central processing time and the magnitude of the observed results. Given the shorter afferent delay for auditory information (2–10 msec; Kraus and McGee, 1992) versus visual information (∼50 msec; Gouras, 1967), and assuming equivalent efferent delays for visually guided and aurally guided gaze shifts, the equal reaction latencies for visual and auditory experiments requiring the same strategy for target discrimination (Fig. 12) show that the central processing time required for visually guided gaze shifts is ∼40 msec less than for aurally guided gaze shifts. Less central processing time for visual information leaves less time for complete target selection, leading to more incorrect gaze shifts in the distractor condition and correspondingly larger reaction latency differences (see correlation in Fig. 10) when compared to experiments using an auditory target.
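The arithmetic behind this estimate can be made explicit. Writing a reaction latency as the sum of afferent, central, and efferent delays, and assuming equal efferent delays and equal total latencies for the two modalities (as argued above), a rough statement is:

```latex
\begin{aligned}
RT &= t_{\text{afferent}} + t_{\text{central}} + t_{\text{efferent}}, \qquad
RT_{V} = RT_{A}, \qquad t_{\text{efferent},V} = t_{\text{efferent},A} \\
\Rightarrow\quad t_{\text{central},V} - t_{\text{central},A}
 &= t_{\text{afferent},A} - t_{\text{afferent},V}
 \approx (2\text{--}10\ \text{msec}) - 50\ \text{msec}
 \approx -40\ \text{msec}.
\end{aligned}
```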
Incorrect gaze shifts in the distractor condition were usually initiated sooner than correct gaze shifts (see Figs. 3, 4), a pattern similar to that observed for errors in other distractor conditions (Ottes et al., 1985, 1987; Munoz and Corneil, 1995) and antisaccade paradigms (Guitton et al., 1985; Fischer and Weber, 1992; Cavegn, 1996). These results suggest a speed–accuracy tradeoff, in that the earlier a movement is initiated, the higher the probability that an error is generated. It has been suggested that such a speed–accuracy tradeoff results from target selection and gaze shift initiation being coordinated but distinct processes; incorrect gaze shifts result from the initiation of a gaze shift before the completion of proper target selection (Schall, 1995). Given the differential requirements for target localization in the different experiments, the time required for target selection varies accordingly; thus, our results are not consistent with an absolute speed–accuracy tradeoff. For example, although reaction latencies in the visual target/irrelevant visual cue experiment were among the longest (Fig. 12), a large incidence of incorrect gaze shifts was observed (Fig. 11). Instead, a tradeoff between gaze shift accuracy and initiation occurs around the specific time required for target selection in each experiment, so that a speed–accuracy tradeoff is only applicable within each experiment, not between experiments. Subjects can be instructed to favor either speed or accuracy, resulting in a higher or lower incidence of incorrect gaze shifts, respectively (Ottes et al., 1985). Furthermore, a speed–accuracy tradeoff applies between human subjects performing the same experiment: subjects capable of reacting at the shortest latencies produced a higher proportion of incorrect gaze shifts than those reacting at longer latencies (Munoz and Corneil, 1995).
Analysis of reaction latencies in all experiments showed that, regardless of target modality, irrelevant auditory cues reduced reaction latencies in the enhancer condition more so than irrelevant visual cues, whereas irrelevant visual cues increased reaction latencies for correct gaze shifts in the distractor condition more so than irrelevant auditory cues (Figs. 13, 14). The interference generated by an irrelevant visual cue in the distractor condition is transient, influencing the processing of the visual target only when the irrelevant visual cue was presented within ±100 msec of the target (see Fig. 13B,C). The time to initiate a smooth pursuit eye movement to a moving visual target is also increased in the presence of an irrelevant visual cue moving in the opposite direction and reduced when the irrelevant visual cue moves in the same direction as the target (Ferrera and Lisberger, 1995). In fact, the increase in the latency of pursuit eye movements when the irrelevant cue moved in the opposite direction was greater than the amount of facilitation observed when the irrelevant cue moved in the same direction as the target, an observation similar to our results in the visual target/irrelevant visual cue experiment (Figs. 13, 14). Taken together, these observations suggest that the processes underlying the influences of irrelevant visual cues on visual targets may be similar in different oculomotor tasks.
It is well established that neurons in the deeper layers of the superior colliculus (SC) play a very important role in the initiation of gaze shifts whether the head is restrained or unrestrained (for review, see Sparks and Hartwich-Young, 1989; Guitton, 1992; Moschovakis and Highstein, 1994). Collicular multisensory neurons show profound enhancement when multimodal stimuli are presented in spatial and temporal coincidence and depression when multimodal stimuli are spatially disparate (Meredith and Stein, 1996). Such convergence of auditory and visual information onto the SC provides at least one area with the necessary architecture to support multimodal interactions. Current neurophysiological evidence suggests that, at least for the visual system, target selection is accomplished in a number of cortical areas by the gradual upregulation of activity relating to a visual target and a concomitant downregulation of activity relating to irrelevant visual cues (for review, see Schall, 1995). It will be of interest to see whether certain areas involved in visual target selection play a similar role in discrimination of auditory stimuli. In multimodal experiments, we speculate that target selection arises from similar selective processing of a target of one modality over a cue or cues of a different modality in a multimodal center such as the SC. Recording in the SC and other multimodal areas during multimodal target selection tasks will be required to assess this prediction. Furthermore, the analysis of the neural activity preceding the generation of incorrect gaze shifts may prove especially illuminating in cases in which complete target selection is preempted by the initiation of a gaze shift.
Footnotes
This work was supported by the Medical Research Council of Canada. D.P.M. is a research scholar of the EJLB Foundation. B.D.C. was supported by an Ontario Graduate Scholarship. We thank D. Hamburger and K. Grant for technical assistance and all of the subjects who participated in this study. We are also grateful to Drs. F. J. Richmond and G. E. Loeb for their comments on early versions of this manuscript.
Correspondence should be addressed to Douglas P. Munoz, Department of Physiology, Queen’s University, Kingston, Ontario, Canada K7L 3N6.