Introduction

The deployment of visual attention has been broadly characterized as relying on two functionally distinct control processes, one exogenous in nature, driving attention towards salient objects, and the other endogenous in nature, driving attention towards objects that are of strategic import (Jonides 1981). The relative importance of these two sources of control has been a hotly contested topic in the vision literature. This debate is often reduced to a dichotomy: on the one hand is the idea that endogenous control is contingent on exogenous processes, with the result that attention can be driven by the external environment (Theeuwes 1994; Itti and Koch 2001). Evidence for this perspective tends to come from studies demonstrating that attention can be captured by salient, task-irrelevant visual stimuli (e.g. Hickey et al. 2006). On the other hand is the perspective that endogenous control of attention is established immediately following stimulation such that attention is invariably deployed according to top-down settings (Folk et al. 1992). Evidence for this category of alternatives comes from studies demonstrating that salient stimuli can be effectively ignored (e.g. Eimer and Kiss 2008; Lien et al. 2008).

The juxtaposition of endogenous and exogenous control creates the impression that selection relies solely on one or the other. This is patently false; the experimental literature makes it abundantly clear that both sources of control are critical to neural representation of visual stimuli and behavior (e.g. Beck and Kastner 2005; Egeth and Yantis 1997). The interesting question is not whether exogenous environmental characteristics or endogenous cognitive states influence selection, but rather how these independent sources of control combine. Recent studies of eye movements suggest that the timing of selection is critically important to this question. The eyes are often oriented to very salient stimuli even when these objects are task irrelevant (Theeuwes et al. 1998, 1999; van Zoest et al. 2004). Importantly, this is more likely to occur when saccades are initiated soon after stimulus onset (Godijn and Theeuwes 2002; Ludwig and Gilchrist 2003). As saccadic latency increases it becomes more likely that the initial saccade will be directed to the target (van Zoest and Donk 2005; van Zoest et al. 2004). Recent results demonstrate that short-latency directional manual responses made using a joystick show a similar pattern (Hunt et al. 2007).

The timing identified in studies of overt selection provides a framework in which ostensibly conflicting results in the capture literature might be reconciled: capture might occur when attention is rapidly deployed. However, to date studies of the time course of selection have relied on measures of overt behavior, and particularly on the measurement of saccadic eye movements. Though the attentional system is closely linked to the motor system—and to the oculomotor system specifically (Rizzolatti et al. 1994)—there is substantial evidence that the attention and eye movement systems have some functional independence (e.g. Hunt and Kingstone 2003a, b; Belopolsky and Theeuwes 2009; Wu and Remington 2003). This raises the possibility that the time course of overt selection may be specific to motor response rather than reflecting underlying patterns in the covert deployment of attention.

The goal of the current study was to determine whether the time course identified in studies of overt attention reflects a pattern of covert selection. To do this we examined event-related potential (ERP) data collected while participants completed a visual search task based on the additional singleton paradigm (Theeuwes 1991). In this type of task participants are required to covertly select a uniquely shaped target presented among distractors. Unique stimuli like this are known as singletons, reflecting the fact that they differ from their neighbors in a single feature dimension. In the additional singleton paradigm the shape singleton target is often the only unique item present in the display, but in a critical subset of trials one of the distractors is rendered an irrelevant singleton by giving it a unique color. The presence of this salient distractor is found to increase target response times and reduce accuracy (Bacon and Egeth 1994; Hickey et al. 2006; Theeuwes 1991, 1992).

We used a specific component of the ERP known as the N2pc in order to index the deployment of attention. The N2pc is an increased negativity in the ERP elicited over visual cortex contralateral to an attended item (Luck and Hillyard 1994). It has been linked to attentional processes involved in target discrimination such as the suppression of distractor information (Hickey et al. 2009; Luck et al. 1997). Critically, the lateral topography of the N2pc means that it can be used to determine whether attention is deployed to the left or right visual hemifield (e.g. Woodman and Luck 2003), and its latency can be used to determine relative differences in the timing of selection (e.g. Brisson et al. 2007).

An earlier paper of ours was based in part on an analysis of the data collected in this experiment (Hickey et al. 2006, Experiment 1). The central finding in that paper was that a distractor-elicited N2pc could be identified in the ERP. Moreover, in Experiment 2 of Hickey et al. (2006) we found that under certain circumstances this distractor-elicited N2pc preceded a later target-elicited N2pc. This pattern is consistent with the idea that attention was captured to the location of the distractor before being reoriented to the target. In the current paper we re-approach data from Experiment 1 of Hickey et al. (2006) with new hypotheses and methodology. We analyzed results from two critical experimental conditions as a function of response latency. These two critical conditions correspond to two configurations of stimuli: when the target was on the vertical meridian and the distractor was lateralized, and when the distractor was on the vertical meridian and the target was lateralized. These configurations are important because under these circumstances lateralized ERP activity elicited by the target can be isolated from lateralized ERP activity elicited by the distractor, and vice versa (Hickey et al. 2009). This is the case because stimuli presented on the vertical meridian are represented equally in both visual cortices and thus cannot create lateralized ERP activity (Woodman and Luck 2003). We chose to use Experiment 1 of Hickey et al. (2006) for these analyses rather than Experiment 2 because Experiment 1 included more trials per critical condition and thus yielded cleaner ERPs.

Our latency analysis was based on manual response to the target stimulus. It is important to note that this approach differs from that employed in prior studies of overt selection, where analysis was based on the timing of the saccade. In those studies it is the short-latency trials that show evidence of the exogenous influence of salient distractor stimuli (e.g. van Zoest et al. 2004). In contrast, in the current experiment we expected to find evidence of increased distractor processing in the trials where eventual target response was relatively slow; it is in these trials that attention was most likely to have been captured by the distractor, presuming that capture causes the need for a reorientation of attention to the target and a longer RT.

Method

Participants

Eighteen neurologically typical students of the Vrije Universiteit Amsterdam gave informed consent before participation. Data from two participants were discarded due to excessive eye movement artifacts in the EEG. Data from one additional participant were discarded as the N2pc was not evident in any experimental condition. Two of the remaining 15 participants (5 women; age 21 ± 2.4 years, mean ± SD) were left-handed.

Experimental stimuli and procedure

The stimuli and procedure employed have been described at length in Hickey et al. (2006). Briefly, visual search displays contained ten object outlines presented in a circle around a fixation dot. Objects could be diamonds or circles, with every display containing a uniquely shaped item (either a diamond among circles or vice versa; see Fig. 1 for examples). All objects contained gray lines; participants were instructed to report the orientation of the line contained in the uniquely shaped object. In two-thirds of trials one of the nine identically shaped objects was presented in a unique color, either red while all other objects were green or vice versa. In one quarter of these color-singleton-present trials the uniquely colored object was presented on the vertical meridian of the display and the unique shape was presented at one of the eight lateralized positions (see Fig. 1a). In another quarter the unique shape was presented on the vertical meridian with the uniquely colored object presented at a lateral location (see Fig. 1b). In the remaining color-singleton-present trials the unique shape was presented at one of the eight lateral positions and the uniquely colored object was presented at one of the four positions in the contralateral visual hemifield.

Fig. 1 Examples of visual search arrays. A broken circle represents the distractor singleton. Actual experimental stimuli were composed of colour outlines on a black background. a An example of a display that would create a target-elicited N2pc. b An example of a display that would create a distractor-elicited N2pc

Trials began with a fixation dot for 600–1,600 ms. The visual search display followed and remained onscreen until 100 ms after response was made (at which time a new trial began). Participant response was based on the orientation of the line within the shape singleton. Each participant completed 30 experimental blocks and at least one practice block, with each block consisting of 48 trials.
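The nominal number of trials available per critical condition follows from the block structure and the trial proportions given above. The sketch below works through that arithmetic, assuming the stated proportions were realized exactly for every participant; the function name and dictionary keys are illustrative, and the counts are upper bounds before rejection of incorrect, slow, or artifact-contaminated trials.

```python
from fractions import Fraction


def nominal_trial_counts(blocks=30, trials_per_block=48):
    """Nominal trial counts per participant, before any trial rejection."""
    total = blocks * trials_per_block
    # Two-thirds of trials contained a color singleton distractor.
    present = total * Fraction(2, 3)
    # One quarter of distractor-present trials fell in each critical configuration.
    per_critical = present * Fraction(1, 4)
    # Each critical configuration was split into RT quartiles.
    per_quartile = per_critical / 4
    return {
        "total": total,
        "distractor_present": int(present),
        "per_critical_condition": int(per_critical),
        "per_quartile": int(per_quartile),
    }


counts = nominal_trial_counts()
print(counts)
```

Under these assumptions each RT quartile in a critical condition is drawn from at most 60 trials per participant.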

EEG recording and analysis

EEG was recorded from 30 tin scalp electrodes placed at a subset of the international 10/10 system sites. Vertical electrooculogram (VEOG) was bipolarly recorded from electrodes placed above and below the right eye and horizontal electrooculogram (HEOG) was bipolarly recorded from electrodes placed 1 cm lateral to the external canthi. VEOG was used in the detection of blink artifacts and HEOG was used in the detection of eye movement artifacts. All electrodes (except VEOG and HEOG) were referenced during recording to the left mastoid and later re-referenced to the algebraic average of the left and right mastoids. The EEG, VEOG, and HEOG were amplified with a gain of 5000 and a passband of 0.05–100 Hz, digitized at 500 Hz, and stored on a microcomputer. Artifacts stemming from the eyes or amplifier blocking were detected using an automated artifact-rejection algorithm based on signal amplitude, and trials containing such artifacts were excluded from further analysis. This led to the rejection of 8.2% of correct trials; in no analyzed dataset were more than 15% of correct trials rejected.
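The offline re-referencing step can be illustrated with a short sketch. It assumes that the right mastoid was itself recorded as a channel against the left-mastoid reference, a detail not stated above; under that assumption, a signal referenced to the average of the two mastoids is obtained by subtracting half of the right-mastoid signal from each channel. The function name is hypothetical.

```python
def rereference_to_average_mastoids(channel, right_mastoid):
    """Re-reference one channel from left mastoid to the mastoid average.

    Both inputs are lists of samples (in microvolts) recorded against the
    left mastoid. Because V_ch - (V_LM + V_RM) / 2 equals
    (V_ch - V_LM) - (V_RM - V_LM) / 2, subtracting half of the right-mastoid
    channel is sufficient.
    """
    return [c - 0.5 * rm for c, rm in zip(channel, right_mastoid)]


po7 = [10.0, 4.0]
rm = [2.0, -2.0]
print(rereference_to_average_mastoids(po7, rm))
```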

The ERPs presented in the figures were digitally low-pass filtered to remove high-frequency noise produced by muscle activity and external electrical sources (6 dB attenuation at 22 Hz with a 6 dB transition bandwidth of 4 Hz). For statistical and display purposes ERP amplitude was computed with respect to a 100-ms pre-stimulus baseline. All analyses of ERP amplitude were conducted prior to digital filtering.
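Baseline correction of this kind amounts to subtracting the mean pre-stimulus voltage from every sample in the epoch. The following is a minimal sketch, assuming the baseline interval runs from -100 ms up to but not including stimulus onset at 0 ms; the function name and the interval convention are assumptions made for illustration.

```python
def baseline_correct(epoch, times_ms, baseline=(-100, 0)):
    """Subtract the mean of the pre-stimulus interval from every sample.

    epoch: list of voltages; times_ms: matching sample times relative to
    stimulus onset. The interval is treated as start-inclusive, end-exclusive.
    """
    start, end = baseline
    base = [v for v, t in zip(epoch, times_ms) if start <= t < end]
    if not base:
        raise ValueError("no samples fall inside the baseline interval")
    offset = sum(base) / len(base)
    return [v - offset for v in epoch]


# With 500 Hz sampling, samples are 2 ms apart.
print(baseline_correct([2.0, 4.0, 5.0, 1.0], [-4, -2, 0, 2]))
```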

Trials in the two critical experimental conditions were sorted into RT quartiles for each subject and ERPs were computed for the fastest and slowest quartiles. Descriptive statistics for these quartiles are presented in Table 1. Only EEG-artifact-free trials in which correct performance occurred between 100 and 1,500 ms post-stimulus were used in computation of RT quartiles. All analyses were conducted on ERPs recorded at lateral occipital electrode sites PO7 and PO8. The N2pc is generally maximal at these sites, as was the case in the present data.
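The quartile split can be sketched as follows for one subject and one condition. This is an illustration rather than the analysis code used here: it assumes the RT window is inclusive at both ends and that any trials left over when the count is not divisible by four fall in the middle quartiles. The input is assumed to contain only correct, artifact-free trials.

```python
def rt_quartile_trials(rts, lo_ms=100, hi_ms=1500):
    """Return (fastest, slowest) lists of trial indices for one condition.

    rts: response times in ms for correct, artifact-free trials.
    """
    # Keep only trials whose RT falls inside the accepted window.
    valid = [i for i, rt in enumerate(rts) if lo_ms <= rt <= hi_ms]
    # Order the remaining trials from fastest to slowest.
    ordered = sorted(valid, key=lambda i: rts[i])
    n = len(ordered) // 4  # quartile size; remainder stays in the middle
    fastest = ordered[:n]
    slowest = ordered[len(ordered) - n:]
    return fastest, slowest


rts = [400, 900, 50, 650, 500, 1600, 700, 550, 800, 600]
print(rt_quartile_trials(rts))
```

ERPs for the fast and slow conditions would then be averaged over the epochs indexed by each list.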

Table 1 Mean response times (RTs) for each of the two critical stimulus configurations, for each of the fastest and slowest RT quartiles

Results

Behavioral results

A total of 16.2% of trials were rejected from analysis, 0.8% due to excessively slow response (>1,500 ms) and 15.4% due to incorrect response. Mean RT in the distractor-singleton-present condition was 691 ms and in the distractor-singleton-absent condition was 588 ms. This 103-ms difference was found significant in a repeated measures analysis of variance (RANOVA) with a single factor for distractor presence, F(1,14) = 60.52, p < 0.001. A similar pattern was observed in analysis of error rates, with fewer errors when the distractor was absent (14.3%) than when it was present (16.3%), F(1,14) = 10.99, p = 0.005. Mean RTs and criteria used in the definition of data quartiles for ERP analysis are presented in Table 1.

Electrophysiological results

Figure 2 presents the ERPs elicited at lateral occipital electrode sites PO7 and PO8 in the fastest and slowest data quartiles for the two critical stimulus configurations. All four ERPs show the N2pc in the divergence of ipsilateral and contralateral waveforms between 200 and 300 ms poststimulus. Figure 3 presents the N2pc components isolated by subtracting the ipsilateral waveform from the contralateral waveform.
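The contralateral-minus-ipsilateral subtraction can be sketched as below. PO7 lies over the left hemisphere and PO8 over the right, so PO8 is contralateral to a left-hemifield stimulus and PO7 is contralateral to a right-hemifield stimulus. Collapsing across left and right stimulus positions by simple averaging is an assumption of this sketch, as is the function name.

```python
def n2pc_difference_wave(po7_left, po8_left, po7_right, po8_right):
    """Contralateral-minus-ipsilateral difference wave.

    Each argument is a list of ERP amplitudes (microvolts) per sample:
    po7_left is the PO7 waveform for a lateral stimulus in the left
    hemifield, po8_right the PO8 waveform for one in the right, and so on.
    """
    # Contralateral: PO8 for left stimuli, PO7 for right stimuli.
    contra = [(a + b) / 2 for a, b in zip(po8_left, po7_right)]
    # Ipsilateral: PO7 for left stimuli, PO8 for right stimuli.
    ipsi = [(a + b) / 2 for a, b in zip(po7_left, po8_right)]
    return [c - i for c, i in zip(contra, ipsi)]


diff = n2pc_difference_wave(
    po7_left=[0.0, 1.0], po8_left=[0.0, -1.0],
    po7_right=[0.0, -2.0], po8_right=[0.0, 2.0],
)
print(diff)
```

A negative value in the difference wave indicates greater negativity contralateral to the lateralized stimulus, which is the signature of the N2pc.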

Fig. 2 Grand-averaged ERPs elicited in the fastest and slowest RT quartiles for each of the two stimulus configurations illustrated in Fig. 1. These ERPs were recorded at posterior lateral electrode sites PO7 and PO8. Note that negative is plotted upward and stimulus onset was at 0 ms. a Target-elicited N2pc–Fastest Quartile. b Target-elicited N2pc–Slowest Quartile. c Distractor-elicited N2pc–Fastest Quartile. d Distractor-elicited N2pc–Slowest Quartile

Fig. 3 N2pc contralateral-minus-ipsilateral difference waves computed from the ERPs presented in Fig. 2. The plot in the background is of the difference waves across the duration employed in Fig. 2. The plot in the foreground is a magnification through the N2pc latency interval. This foreground plot is provided for easy comparison of the amplitude and latency of the N2pc components elicited in the four experimental conditions

The peak of the N2pc elicited by the target is large in amplitude when participants responded quickly and small when participants responded slowly. In contrast, the distractor-elicited N2pc is large in amplitude when participants responded slowly and small when participants responded quickly. In order to numerically and statistically assess this pattern for each condition, we measured mean N2pc amplitude per subject across a 30-ms latency window centered on the peak latency observed in the grand average. Results from this analysis are illustrated in Fig. 4a. An interaction between response latency (fast vs. slow) and eliciting stimulus (target vs. distractor) is apparent in this figure. A RANOVA with factors for response latency and eliciting stimulus showed that neither the main effect of response latency nor the main effect of eliciting stimulus was significant, Fs < 1 and F(1,14) = 1.447, p = 0.249, respectively, but that the interaction between the factors was reliable, F(1,14) = 5.934, p = 0.029. Follow-up t test contrasts demonstrated that the target-elicited N2pc amplitude in the fast condition was not reliably different from target-elicited N2pc amplitude in the slow condition, t(14) = 1.256, p = 0.230, but the distractor-elicited N2pc amplitude in the fast condition was reliably different from distractor-elicited N2pc amplitude in the slow condition, t(14) = 1.896, p = 0.039.
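The amplitude measure described above can be sketched in two steps: locate the N2pc peak in the grand-average difference wave, then average each subject's difference wave over a 30-ms window centered on that latency. The 200 to 300 ms search interval used to find the peak is an assumption of this sketch, as are the function names; the N2pc is treated as a negative-going component, so the peak is the minimum.

```python
def peak_latency(grand_avg, times_ms, search=(200, 300)):
    """Latency (ms) of the most negative point within the search interval."""
    candidates = [(v, t) for v, t in zip(grand_avg, times_ms)
                  if search[0] <= t <= search[1]]
    return min(candidates)[1]


def mean_window_amplitude(diff_wave, times_ms, peak_ms, width_ms=30):
    """Mean amplitude of one subject's difference wave around peak_ms."""
    half = width_ms / 2
    vals = [v for v, t in zip(diff_wave, times_ms)
            if peak_ms - half <= t <= peak_ms + half]
    if not vals:
        raise ValueError("no samples fall inside the measurement window")
    return sum(vals) / len(vals)


# Toy triangular component peaking at 250 ms, sampled every 2 ms (500 Hz).
times = list(range(200, 302, 2))
grand = [-(1 - abs(t - 250) / 50) for t in times]
peak = peak_latency(grand, times)
print(peak, mean_window_amplitude(grand, times, peak))
```

In practice the second function would be applied to each subject's waveform in each of the four conditions, using the peak latency of the corresponding grand average.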

Fig. 4 a Peak amplitude of the N2pc for each of the four experimental conditions. b Onset latency of the N2pc for each of the four experimental conditions

The N2pc components elicited in the four experimental conditions appear to additionally differ in terms of onset latency. Critically, the N2pc elicited by the distractor in slow trials appears to precede the N2pc elicited by the distractor in fast trials and the target-elicited N2pc found in both fast and slow trials. In contrast, the N2pc elicited by the target in slow trials appears to occur later in time than the N2pc elicited by the target in fast trials. In order to numerically and statistically assess this pattern we measured the N2pc onset latency for each of the four conditions using a jackknife procedure in which latency was defined as the point in time at which the N2pc reached 70% of its maximum amplitude (see Kiesel et al. 2008, for details). Results from this analysis are presented in Fig. 4b and suggest an interaction between response latency (fast vs. slow) and eliciting stimulus (target vs. distractor). In a RANOVA based on onset latencies with these factors the main effect of response latency was not significant, F < 1, but both the main effect of eliciting stimulus, F(1,14) = 10.23, p = 0.006, and the interaction, F(1,14) = 10.41, p = 0.006, were reliable. Follow-up t test contrasts revealed that response latency had a marginally significant effect on the distractor-N2pc, with the distractor-N2pc elicited in slow trials preceding the distractor-N2pc elicited in fast trials by 36 ms, t(14) = 1.849, p = 0.0433. In contrast, a t test contrast on the target-elicited N2pc revealed that the target-N2pc elicited in slow trials followed the target-N2pc elicited in fast trials by 20 ms, t(14) = 1.922, p = 0.0376. Note that all statistics reported for the latency analysis have been corrected for the artificial reduction of variability caused by the jackknife procedure (see Kiesel et al. 2008; Ulrich and Miller 2001).
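The jackknife logic can be sketched as follows. Onset latency is measured not on single-subject waveforms but on leave-one-out grand averages, each omitting one participant, and the resulting test statistic is then shrunk to compensate for the artificially small variance among those averages: t is divided by n - 1 (equivalently, F by (n - 1) squared). The sketch takes the first sample that reaches the criterion without interpolating between samples, which is a simplification, and all function names are hypothetical.

```python
import math


def onset_latency(wave, times_ms, fraction=0.7):
    """First time at which a negative-going wave reaches fraction * peak."""
    criterion = fraction * min(wave)  # the N2pc peak is the minimum
    for v, t in zip(wave, times_ms):
        if v <= criterion:
            return t
    return None  # only reached if the wave has no negative peak


def leave_one_out_averages(subject_waves):
    """Grand averages that each omit one subject's waveform."""
    n = len(subject_waves)
    total = [sum(col) for col in zip(*subject_waves)]
    return [[(tot - w[k]) / (n - 1) for k, tot in enumerate(total)]
            for w in subject_waves]


def jackknife_t(lat_a, lat_b):
    """Paired t on jackknifed latencies, corrected by dividing by (n - 1).

    Assumes the latency differences are not all identical (nonzero variance).
    """
    n = len(lat_a)
    diffs = [a - b for a, b in zip(lat_a, lat_b)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t_uncorrected = mean / math.sqrt(var / n)
    return t_uncorrected / (n - 1)


print(onset_latency([0.0, -0.5, -0.8, -1.0, -0.6], [200, 210, 220, 230, 240]))
```

For a contrast between two conditions, `onset_latency` would be applied to every leave-one-out average in each condition and the two resulting latency lists passed to `jackknife_t`.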

Discussion

Analysis of the N2pc results as a function of RT yielded two critical findings. First, as illustrated in Figs. 3 and 4a, N2pc amplitude varied as a function of response speed: the target-elicited N2pc was larger when observers were quick to respond as compared to when they were slow (though this effect was not statistically reliable). In contrast, the distractor-elicited N2pc was larger when observers were slow to respond than when they were quick. Second, as illustrated in Figs. 3 and 4b, the N2pc elicited by distractors in the slow RT condition was earlier than the N2pc elicited by distractors in the fast RT condition, whereas the N2pc elicited by targets in the slow RT condition was later than the N2pc elicited by targets in the fast RT condition.

While discussing these results it is critical to distinguish between the speed of attentional deployment and the speed of manual response. The results demonstrate that rapid deployment of attention—as reflected in early onset N2pc—was associated with slow RT. This is the case because when attention was rapidly deployed it was directed to the distractor, causing the need for the redeployment of attention to the target location and a subsequent increase in RT.

Studies of saccadic eye movements have shown that short-latency eye movements are more often directed towards salient stimuli, whereas longer-latency eye movements are more often directed to target stimuli (e.g. van Zoest et al. 2004). In a recent study, Hunt et al. (2007) found a similar pattern in manual response, demonstrating that directional joystick movements were biased toward salient distractors. Hunt et al. suggested that this time course reflects a development in the quality of visual information over time. According to this perspective, the initial afferent visual response becomes more sophisticated through the integration of other information sources, including those that underlie endogenous attentional control.

We wholeheartedly agree with Hunt et al.’s proposal and believe the current results provide additional evidence for the idea that visual representations develop over time. In the current results the critical factor that determines whether attention was deployed to the target or to the distractor is the latency at which the deployment of attention occurs. Fast deployments of attention, likely based on representations of the visual environment largely exogenous in nature, were directed to the salient distractor. Slower deployments of attention, based on representations that integrated endogenous information, were target directed.

The current results constitute the first direct evidence of such a time course in covert attentional selection, but are surprisingly consistent with results and theoretical models from both the saccadic eye movement and animal electrophysiology literatures. In the eye movement literature Godijn and Theeuwes (2002) proposed a model of saccadic selection they termed the competitive integration model. According to this proposal the programming of saccadic eye movements relies on the integration of exogenous and endogenous control factors in a common retinotopic salience map. Activation loci on this salience map compete with one another such that activation at relatively close locations summates, while relatively distant activation loci inhibit one another. Critically, the competitive integration model proposes that the accrual of endogenous information in the salience map takes time, while exogenous influences on the map occur rapidly.

The competitive integration model is rather specifically tailored to account for results from the eye movement literature, but the idea of competition has also motivated models of covert selection, notably in the biased competition framework of Desimone and Duncan (1995; see also Luck et al. 1997; Tsotsos et al. 1995). The central premise of biased competition is that object representations in visual cortex compete for neural representation. This competition is in part driven by raw salience, with the result that high-salience items are more likely to be represented in late processing than low-salience items. Endogenous biases on this competitive process ensure that low-salience objects with behavioral relevance will continue to be represented in the system.

Importantly, endogenous competitive biases appear to lag behind the initial afferent response in visual cortex. This was demonstrated by Reynolds and Desimone (2003), who recorded activity from V4 neurons while monkeys viewed displays that contained two objects that fell within the same V4 receptive field. When the stimuli were unattended, neural response was dominated by the higher-contrast of the two objects. In contrast, when attention was deployed to the less salient stimulus the cellular response came to be driven by the attended stimulus. Critically, the effect of attention took a brief period to develop, 23 ms on average across the sampled neurons. Interestingly, this corresponds roughly to the average of the 57-ms difference between the onset of the distractor-elicited and target-elicited N2pc components observed in the slow-RT condition of the present experiment and the 0-ms effect observed in the fast-RT condition. In other words, the average delay between the exogenous N2pc response and the endogenous N2pc response in the present study is quite similar to the delay between the initial exogenous neural response and the first evidence of endogenous attention in monkey cortex.

Reentrant processing in visual cortex may underlie the time course of attentional selection. Endogenous control of attention very likely relies on feedback connections from higher-level cortex to lower-level sensory areas. It may be that the influence of these higher-level areas on visual cortex is established only once the afferent, feedforward sweep of information through cortex signals the onset of visual processing. This would mean that the first reentrant feedback into low-level visual areas would lag behind the onset of visual activity in low-level cortex. During this time, processing would be solely driven by exogenous input. This general framework has been suggested in other work (e.g. Reynolds and Desimone 2003) and is in line with formal characterizations of attention such as the selective tuning model of Tsotsos et al. (1995).

In the Introduction we proposed that differences in the time course of attention may underlie conflicting results in the capture literature. Evidence of capture has been primarily provided by studies using visual search arrays where target and distractor are presented concurrently, as is the case in the present study (e.g. Theeuwes 1991, 1992; Hickey et al. 2006). In these studies the position of the target is uncertain and the active deployment of attention is required in order for observers to locate the target. In contrast, evidence that salient distractors can be ignored comes from studies that often employ a rapid serial visual presentation (RSVP) paradigm where distractors precede targets in time (e.g. Folk et al. 2002; Leblanc et al. 2008). In these studies the location of the target is consistent and observers are not required to shift attention to find the target. Differences in results observed across these paradigms may stem from this difference in experimental design. The current study provides evidence that timing plays an important role in the integration of endogenous and exogenous attentional control factors in visual search. Further research is needed to determine to what degree this factor can account for conflict in extant experimental results.

In conclusion, the present results provide evidence for a time course in the control of visual attention. We find that when attention is deployed soon after stimulus onset it is exogenously driven, whereas when attention is deployed later in time it is endogenously driven. These results are in line with many models of visual attention, and may provide a framework to resolve debate regarding the ability of salient, task-irrelevant stimuli to capture attention.