Abstract
Covert spatial attention has long been thought to speed visual processing. Psychophysics studies have shown that target information accrues faster at attended locations than at unattended locations. However, with behavioral evidence alone, it is difficult to determine whether attention speeds visual processing of the target or subsequent postperceptual stages of processing (e.g., converting sensory responses into decision signals). Moreover, although many studies have shown that attention can boost the amplitude of visually evoked neural responses, no robust effect has been observed on the latency of those neural responses. Here, we offer new evidence that may reconcile the neural and behavioral findings. We examined whether covert attention influenced the latency of the N2pc component, an electrophysiological marker of visual selection that has been linked with object individuation—the formation of an object representation that is distinct from the background and from other objects in the scene. To this end, we manipulated whether or not human observers (male and female) covertly attended the location of an impending search target. We found that the target evoked N2pc onset ∼20 ms earlier when the target location was cued than when it was not cued. In a second experiment, we provided a direct replication of this effect, confirming that the effect of attention on N2pc latency is robust. Thus, although attention may not speed the earliest stages of sensory processing, attention does speed the critical transition between raw sensory encoding and the formation of individuated object representations.
SIGNIFICANCE STATEMENT Covert spatial attention improves processing at attended locations. Past behavioral studies have shown that information about visual targets accrues faster at attended than at unattended locations. However, it has remained unclear whether attention speeds perceptual analysis or subsequent postperceptual stages of processing. Here, we present robust evidence that attention speeds the N2pc, an electrophysiological signal that indexes the formation of individuated object representations. Our findings show that attention speeds a relatively early stage of perceptual processing while also elucidating the specific perceptual process that is speeded.
Introduction
Covert spatial attention is thought to improve both the fidelity and speed of visual processing (Titchener, 1908; Shore et al., 2001; Carrasco, 2011). However, conclusive evidence that attention speeds visual processing is lacking. The strongest evidence to date comes from psychophysics studies that have obtained independent measures of processing speed and visual discriminability to show that target information accrues faster at attended locations (Carrasco and McElree, 2001; Carrasco et al., 2006). However, because this work relies on behavioral responses, it does not establish whether attention speeds perceptual analysis of the target or postperceptual processes (e.g., decision-making and response selection). Adding to this uncertainty, electroencephalography (EEG) work in humans has revealed that attention influences the amplitude but not the latency of early visually evoked neural responses (Hillyard et al., 1998; Di Russo et al., 2003; McDonald et al., 2005). Likewise, studies in nonhuman primates have reported no effect of covert attention on the latency of responses in the visual cortex (Reynolds et al., 2000; Lee et al., 2007) or differences of 1–2 ms (Sundberg et al., 2012), which cannot account for the effects on processing speed estimated from behavior (Carrasco and McElree, 2001). Thus, neural and behavioral data have not converged on a common answer to the fundamental question of whether attention speeds visual processing.
To address this discrepancy, we examined how attention affects the N2pc, a contralateral negativity measured with EEG, which indexes visual selection (Luck and Hillyard, 1994). Although the N2pc was initially thought to reflect a shift of covert attention to a target (Eimer, 1996), recent work suggests that it instead tracks a different aspect of target processing. Kiss et al. (2008) showed that the amplitude of the N2pc evoked by a target during visual search was equivalent when the target's hemifield was cued in advance—enabling a preparatory shift of attention—and when an uninformative cue was presented. Thus, they concluded that the N2pc does not index shifts of attention per se. An alternative view is that the N2pc indexes object individuation (Mazza and Caramazza, 2015), the formation of an object representation that is segregated from the background and other items in the display (Kahneman et al., 1992; Xu and Chun, 2009). In support of this view, N2pc amplitude increases with the number of items that are individuated during rapid enumeration tasks (Pagano and Mazza, 2012) but not in tasks that do not require target individuation (Mazza and Caramazza, 2011). Furthermore, N2pc set-size effects predict individual differences in tasks that require object individuation (Drew and Vogel, 2008; Ester et al., 2012). Thus, current evidence suggests that the N2pc reflects the transformation of sensory information into individuated object representations.
Here, we leveraged the N2pc to test whether covert attention speeds the object individuation stage of perceptual processing. In a cued-search task, we manipulated whether or not human observers attended the location of an impending search target. In two experiments, we found that the N2pc occurred ∼20 ms earlier when the location of a search target was cued in advance. This finding points to a reconciliation of past behavioral and neural studies of how spatial attention influences the latency of visual processing. Specifically, our results suggest that, although attention may not speed very early sensory responses (McDonald et al., 2005; Lee et al., 2007), attention does speed the transformation from raw sensory input to discrete representations of individuated objects, as indexed by the N2pc. Furthermore, our results provide new support for the individuation account of the N2pc by addressing an important caveat to the Kiss et al. (2008) study. Kiss et al. cued observers to the relevant hemifield rather than the precise target position. Thus, the target-evoked N2pc in their cued condition may have reflected the refocusing of attention within the cued hemifield. By contrast, we cued the precise target location and used multivariate decoding of alpha-band (8–12 Hz) oscillations to verify that observers attended the cued location. Even having verified that attention was precisely focused at the target's location, we observed a robust N2pc, as predicted if the N2pc reflects target individuation.
Materials and Methods
Subjects
Sixty-three volunteers (25 in Experiment 1 and 38 in Experiment 2) participated in the experiments for monetary compensation ($15/h). Subjects were between 18 and 35 years old, reported normal or corrected-to-normal visual acuity, and provided informed consent according to procedures approved by the University of Chicago Institutional Review Board.
Experiment 1.
Our target sample size was 16 subjects in Experiment 1, in keeping with our past work using the inverted encoding model (IEM) to track spatial attention with alpha-band oscillations (Foster et al., 2017). This sample size is also typical for studies measuring the N2pc component. We excluded subjects who had <600 artifact-free trials per condition (see Artifact rejection). We excluded a total of nine subjects from our sample. Seven subjects were excluded because too few trials remained after artifact rejection. In addition, data collection was terminated early for two subjects because the data were unusable due to excessive artifacts and/or poor task performance. Thus, our final sample included 16 subjects (9 male, 7 female; mean age = 22.9 years, SD = 2.8). Subjects in the final sample provided 802 trials on average (SD = 95) in the informative-cue condition, and 807 trials on average (SD = 80) in the uninformative-cue condition.
Experiment 2.
We ran Experiment 2 to replicate the findings from Experiment 1. In this experiment, we increased our target sample size to 24 subjects (50% larger than Experiment 1) to increase statistical power given that we expected N2pc latency effects on the order of 20 ms based on Experiment 1. Again, we excluded subjects who had <600 artifact-free trials per condition. In addition, we terminated data collection for subjects for whom we could not obtain usable eye-tracking data. We excluded a total of 12 subjects from our sample. Four subjects were excluded because too few trials remained after artifact rejection. Furthermore, data collection was terminated early for eight subjects, and these subjects were excluded from our sample, for the following reasons: the subject was making too many eye movements during trials (one subject), we were unable to obtain usable eye-tracking data and the session was running behind schedule (three subjects), the subject was feeling unwell (one subject), the subject withdrew from the study (one subject), the experimenter forgot to save the EEG data (one subject), and an equipment failure disrupted data collection (one subject). Thus, the final sample included 26 subjects (11 male, 15 female; mean age = 22.9 years, SD = 3.9). We overshot our target sample size of 24 because the sessions for our final subjects were scheduled before we reached our target sample size. Subjects in the final sample provided 833 trials on average (SD = 100) in the informative-cue condition and 841 trials on average (SD = 99) in the uninformative-cue condition. Three subjects in Experiment 2 also participated in Experiment 1. Only one of these subjects was included in the final sample (i.e., after artifact rejection) for both experiments.
Apparatus and stimuli
We tested the subjects in a dimly lit, electrically shielded chamber. Stimuli were generated using MATLAB (MathWorks) and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) and were presented on a 24 in. LCD monitor (refresh rate: 120 Hz, resolution: 1080 × 1920 pixels) at a viewing distance of ∼79 cm for Experiment 1 and 75 cm for Experiment 2. Stimuli were rendered in dark gray against a medium-gray background.
Task procedures
Experiment 1.
Subjects performed a cued-search task (Kiss et al., 2008). On each trial, they searched for a target (a diamond) among seven distractors (squares) and reported whether the target was missing the left or right corner (Fig. 1). In some blocks, a cue indicated the exact location of the target in advance, which enabled observers to attend the target location before the onset of the search array (informative cue). In other blocks, the cue provided no information about the location of the impending target (uninformative cue).
A fixation point (0.15° in diameter) was present throughout each block of trials. Each trial began with a 100 ms cue. In informative-cue blocks, the cue was a bar (0.125° long, 0.08° wide) that extended from the fixation point and pointed to the location where the target would appear. In uninformative-cue blocks, the fixation point increased in size (to 0.2°). Thus, uninformative cues provided the same temporal information as informative cues but provided no information about the target location. After a 600 ms interstimulus interval, a search array appeared for 200 ms. Each search array comprised eight stimuli (a target among seven distractors) equally spaced around fixation at an eccentricity of 4° (Fig. 1). The distractors were squares (1.6° × 1.6°). The target was a diamond (identical in size to the distractors) that was missing the left or right corner. Subjects reported which corner was missing by pressing the “z” key (left hand) or the “/” key (right hand). Subjects were instructed to respond as quickly as possible while maintaining high accuracy. Each trial was terminated by the subject's response and was followed by a variable intertrial interval (ITI) between 1500 and 1800 ms. Subjects were provided with feedback about their performance (mean response time and accuracy) at the end of each block of trials. To minimize ocular artifacts during the trials, we instructed subjects to maintain fixation throughout each block of trials and to blink shortly after their response, before the next trial began.
Subjects completed 32 blocks of 64 trials, with two exceptions: one subject completed 30 blocks, and another subject completed 27 blocks. Within each block, the target appeared at each of the eight possible locations equally often. The cue conditions (informative or uninformative) were alternated across blocks, and the order of conditions (informative or uninformative first) was counterbalanced across subjects. Subjects completed between two and four practice blocks (as needed based on task performance).
Experiment 2.
Subjects completed the cued-search task used in Experiment 1 with the following changes. First, we used a visually balanced cue to rule out any impact of asymmetric cue displays. The cue comprised eight bars that extended from the fixation point (0.125° long, 0.08° wide), one pointing toward each of the eight stimulus positions (Fig. 1). In informative-cue blocks, one bar that was a different color from the rest (either a red bar among blue bars or a blue bar among red bars, counterbalanced across subjects) pointed toward the target location. In uninformative-cue blocks, all bars were the same color (blue or red, counterbalanced across subjects). The red and blue colors used for the cues were closely matched for luminance. In addition, we made two changes to make it easier for subjects to blink between trials to avoid contamination of trial epochs with blinks. First, we used a longer ITI (jittered between 1500 and 1900 ms) so that subjects had a longer window during which they could blink between each trial. Second, to cue subjects to blink during the ITI, the fixation point disappeared 200 ms after subjects made their response (i.e., 200 ms after the ITI began) and reappeared 500–600 ms before the cue for the next trial, and we instructed subjects to blink when the fixation point was absent. Finally, we recalibrated the eye tracker between trials if necessary (see Eye tracking) to ensure we obtained usable eye-tracking data for all participants. After calibration between trials, the fixation point was presented for 500 ms before starting the next trial.
Subjects completed 32 blocks except for two subjects who completed 30 blocks, one subject who completed 24 blocks, and another subject who completed 36 blocks. For the subject who completed 36 blocks, EEG data were not recorded for five blocks due to experimenter error (see Glitches), so data were analyzed only for the 31 blocks with EEG data.
EEG acquisition
We recorded EEG activity using 30 active Ag/AgCl electrodes mounted in an elastic cap (Brain Products). We recorded from International 10/20 sites Fp1, Fp2, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, C3, Cz, C4, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, PO7, PO3, PO4, PO8, O1, Oz, and O2. Two additional electrodes were placed on the left and right mastoids, and a ground electrode was placed at position Fpz. All sites were recorded with a right-mastoid reference and were re-referenced offline to the algebraic average of the left and right mastoids. We recorded electro-oculogram (EOG) data using passive electrodes, with a ground electrode placed on the left cheek. Horizontal EOG was recorded with a bipolar pair of electrodes placed ∼1 cm from the external canthus of each eye, and vertical EOG with a bipolar pair of electrodes placed above and below the right eye. Data were filtered online (low cutoff = 0.01 Hz, high cutoff = 80 Hz, slope from low-to-high cutoff = 12 dB/octave) and were digitized at 500 Hz using BrainVision Recorder (Brain Products) running on a PC. We maintained impedances <10 kΩ.
Eye tracking
We recorded gaze position using a desk-mounted infrared eye-tracking system (EyeLink 1000 Plus, SR Research). According to the manufacturer, this system provides spatial resolution of 0.01° of visual angle and average accuracy of 0.25°-0.50° of visual angle. Gaze position was sampled at 1000 Hz. Stable head position was maintained during the task using a chin rest. The eye tracker was recalibrated as needed throughout the session, including whenever subjects removed their chin from the chin rest. We drift-corrected gaze position data on each trial by subtracting the mean gaze position measured in the 300 ms window immediately before cue onset from each time point during the trial.
In Experiment 1, we were unable to obtain usable eye-tracking data for one subject. Furthermore, for another five subjects, between 60 and 240 trials (after artifact rejection) were missing eye-tracking data due to a glitch with the eye tracker (see Glitches). These trials were omitted from the analysis of gaze position. In Experiment 2, we obtained usable eye-tracking data for all subjects.
Artifact rejection
We used a semiautomated procedure to remove trials that were contaminated by ocular artifacts (blinks and eye movements) or by EEG artifacts (amplifier saturation, excessive muscle noise, skin potentials). We used an automated routine to flag trials that contained artifacts (see next section for details). This automated routine served as a guideline for which trials were rejected. However, which trials were rejected was ultimately determined by visual inspection. Experimenters were blind to condition when inspecting the data for artifacts. We excluded trials contaminated by artifacts from all analyses (including behavioral analyses). We discarded electrodes Fp1 and Fp2 for all subjects because data quality was often poor (i.e., excessive high-frequency noise or slow drifts) at these sites. Furthermore, we discarded data from one or two additional electrodes for some subjects because of low-quality data (excessive high-frequency noise, drifts, or sudden steps in voltage). Subjects were excluded from the final sample if they had <600 artifact-free trials per condition (see Subjects).
Automated artifact detection
Here, we summarize the criteria used by the automated artifact detection routine, which was used to flag trials that contained artifacts.
Eye movements.
Trials were flagged as containing a saccade if the Euclidean distance between the mean gaze positions in the first and second halves of a 60 ms sliding window (advanced in 10 ms increments) was >0.5° of visual angle. Furthermore, we flagged trials for rejection if the recorded gaze position was farther than 1.5° of visual angle from the fixation point. When eye-tracking data were not available, we used horizontal EOG to detect saccades. Trials were flagged as containing a saccade if the mean voltages during the first and second halves of a 100 ms sliding window (advanced in 10 ms steps) differed by >20 μV.
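As a rough illustration of the gaze-based saccade criterion described above, the following Python sketch flags a trial when the mean gaze position shifts by more than 0.5° between the two halves of a 60 ms window that advances in 10 ms steps. It is a minimal sketch, not the authors' released routine: the function name `flag_saccade`, its parameters, and the assumption that gaze samples are already expressed in degrees of visual angle at 1000 Hz are all illustrative choices.

```python
import math

def flag_saccade(gaze_x, gaze_y, sample_rate_hz=1000, window_ms=60,
                 step_ms=10, threshold_deg=0.5):
    """Return True if any sliding window shows a gaze shift above threshold.

    gaze_x, gaze_y: gaze position in degrees of visual angle, one value per
    sample (hypothetical input format assumed for this sketch).
    """
    win = int(window_ms * sample_rate_hz / 1000)   # samples per window
    step = int(step_ms * sample_rate_hz / 1000)    # samples per advance
    half = win // 2
    for start in range(0, len(gaze_x) - win + 1, step):
        mid, end = start + half, start + 2 * half
        # Mean gaze position in the second half minus the first half.
        dx = sum(gaze_x[mid:end]) / half - sum(gaze_x[start:mid]) / half
        dy = sum(gaze_y[mid:end]) / half - sum(gaze_y[start:mid]) / half
        if math.hypot(dx, dy) > threshold_deg:
            return True
    return False
```

Comparing window halves rather than single samples makes the check sensitive to sustained shifts in gaze while averaging out sample-to-sample tracker noise.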
Blinks.
Trials were flagged as containing a blink if the eye tracker could not detect the pupil at some point during the trial. When eye-tracking data were not available, we used vertical EOG to detect blinks. Trials were flagged as containing a blink if the mean voltages during the first and second halves of a 150 ms sliding window (advanced in 10 ms steps) differed by >50 μV.
EEG artifacts.
We flagged trials as containing voltage drifts (e.g., skin potentials) if the absolute change in voltage from the first quarter of the trial to the last quarter of the trial exceeded 40 μV. We flagged trials as including a sudden step in voltage (which can occur when an electrode is damaged) if the mean voltage during the first and second halves of a 250 ms sliding window (advanced in 20 ms increments) differed by >60 μV. We marked trials as containing high-frequency noise (e.g., muscle artifacts) if any electrode had a peak-to-peak amplitude >120 μV within a 15 ms sliding window (advanced in 15 ms increments). Finally, we flagged trials as containing amplifier saturation if any electrode had 60 time points within a 100 ms sliding window (advanced in 50 ms increments) that were within 1 μV of each other.
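Two of the EEG criteria above (drift and sudden steps) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the drift check is assumed to compare the mean voltage of the first and last quarters of the epoch, and the 500 Hz sampling rate is taken from the acquisition settings reported earlier.

```python
def mean(values):
    return sum(values) / len(values)

def flag_drift(voltage_uv, threshold_uv=40.0):
    """Flag a slow drift: first-quarter vs last-quarter mean voltage (assumed)."""
    quarter = len(voltage_uv) // 4
    change = mean(voltage_uv[-quarter:]) - mean(voltage_uv[:quarter])
    return abs(change) > threshold_uv

def flag_step(voltage_uv, sample_rate_hz=500, window_ms=250, step_ms=20,
              threshold_uv=60.0):
    """Flag a sudden step: halves of a sliding window differ by > threshold."""
    win = int(window_ms * sample_rate_hz / 1000)   # 125 samples at 500 Hz
    step = int(step_ms * sample_rate_hz / 1000)    # 10 samples at 500 Hz
    half = win // 2
    for start in range(0, len(voltage_uv) - win + 1, step):
        first = mean(voltage_uv[start:start + half])
        second = mean(voltage_uv[start + half:start + 2 * half])
        if abs(second - first) > threshold_uv:
            return True
    return False
```

A slow ramp trips the drift check but not the step check, whereas an abrupt jump trips both, which is why the two criteria are kept separate.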
N2pc analysis
For trials with correct responses, we measured the target-evoked N2pc locked to the search array by calculating a contralateral-minus-ipsilateral difference wave averaging across posterior electrodes P7/8, PO7/8, PO3/4, and O1/2. For one subject, EEG data were unusable for electrode P7; therefore, we measured the N2pc at electrodes PO7/8, PO3/4, and O1/2. We baseline corrected the waveforms by subtracting the mean voltage in the 200 ms period before onset of the search array.
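The difference-wave computation can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the input format (a dictionary mapping electrode names to trial-averaged voltages for targets in one hemifield), the function name `n2pc_difference`, and the order of operations (difference first, then baseline correction) are illustrative and are not taken from the authors' code.

```python
# (left-hemisphere, right-hemisphere) electrode pairs used for the N2pc
PAIRS = [("P7", "P8"), ("PO7", "PO8"), ("PO3", "PO4"), ("O1", "O2")]

def n2pc_difference(erp, target_side, times_ms, baseline=(-200, 0)):
    """Contralateral-minus-ipsilateral wave, averaged over electrode pairs.

    erp: dict mapping electrode name -> list of voltages (in microvolts),
         averaged over trials with the target in `target_side` ("left" or
         "right"). Hypothetical input format for this sketch.
    """
    n = len(times_ms)
    diff = [0.0] * n
    for left, right in PAIRS:
        # A left-hemifield target is contralateral to right-hemisphere sites.
        contra, ipsi = (right, left) if target_side == "left" else (left, right)
        for k in range(n):
            diff[k] += (erp[contra][k] - erp[ipsi][k]) / len(PAIRS)
    # Subtract the mean voltage in the pre-array baseline window.
    base = [v for v, t in zip(diff, times_ms) if baseline[0] <= t < baseline[1]]
    offset = sum(base) / len(base)
    return [v - offset for v in diff]
```

In practice, waves for left and right targets would be combined before measuring latency; the sketch handles one hemifield at a time to keep the contralateral and ipsilateral assignment explicit.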
We used a jackknife procedure (Miller et al., 1998) to test for differences in the onset latency of the N2pc following informative and uninformative cues. N2pc onset latency was measured at the earliest time at which the N2pc difference wave reached 50% of its maximum amplitude. The latency difference between conditions, D, was measured as the difference in onset latency between conditions in the subject-averaged N2pc difference waves. We used a jackknife procedure to estimate the SE of the latency difference, SED, from the latency differences obtained for subsamples that included all but one subject (Miller et al., 1998). Specifically, the latency differences, D−i (for i = 1, …, N, where N is the sample size), were calculated, where D−i was the latency difference for the sample with all subjects except for subject i. The jackknife estimate of the SED was calculated as follows:

$$SE_D = \sqrt{\frac{N-1}{N}\sum_{i=1}^{N}\left(D_{-i}-\bar{J}\right)^{2}},$$

where J̄ is the mean of the differences obtained for all subsamples (i.e., J̄ = ΣD−i/N). A jackknifed t statistic, tj, was then calculated as follows:

$$t_j = \frac{D}{SE_D},$$

which follows an approximate t distribution with N − 1 degrees of freedom under the null hypothesis. These tests were one-tailed because we had the clear directional hypothesis that N2pc onset would occur earlier following informative cues than following uninformative cues.
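The jackknife procedure can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: it assumes the standard jackknife standard error, sqrt((N − 1)/N × Σ(D−i − J̄)²), measures onset at sample resolution without interpolation, and defines D as uninformative minus informative onset so that a positive value means an earlier N2pc after informative cues. All function names are hypothetical.

```python
import math

def onset_latency(wave, times_ms, fraction=0.5):
    """Earliest time at which a negative-going wave reaches `fraction` of its peak."""
    criterion = fraction * min(wave)   # N2pc is a negativity: peak = minimum
    for v, t in zip(wave, times_ms):
        if v <= criterion:
            return t
    raise ValueError("wave never reaches the criterion")

def grand_average(waves):
    """Average a list of per-subject waves time point by time point."""
    return [sum(col) / len(waves) for col in zip(*waves)]

def latency_difference(informative, uninformative, times_ms):
    """Onset difference (uninformative - informative) in the averaged waves."""
    return (onset_latency(grand_average(uninformative), times_ms)
            - onset_latency(grand_average(informative), times_ms))

def jackknife_latency_test(informative, uninformative, times_ms):
    """Return (D, SE_D, t_j) for per-subject difference waves in two conditions."""
    n = len(informative)
    d = latency_difference(informative, uninformative, times_ms)
    # Latency difference for each subsample that omits one subject.
    d_loo = [latency_difference(informative[:i] + informative[i + 1:],
                                uninformative[:i] + uninformative[i + 1:],
                                times_ms)
             for i in range(n)]
    j_bar = sum(d_loo) / n
    se = math.sqrt((n - 1) / n * sum((x - j_bar) ** 2 for x in d_loo))
    return d, se, d / se
```

The resulting statistic would be compared against a t distribution with N − 1 degrees of freedom. Measuring latency in averaged waves, rather than in noisy single-subject waves, is the reason the standard error has to be recovered from leave-one-out subsamples.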
Time-frequency analysis
To calculate alpha-band power at each electrode, we first bandpass filtered the raw EEG data between 8 and 12 Hz using the “eegfilt.m” function in EEGLAB (Delorme and Makeig, 2004). We applied a Hilbert transform (MATLAB Signal Processing Toolbox) to the bandpass-filtered data to obtain the complex analytic signal. Instantaneous power was calculated by squaring the complex magnitude of the complex analytic signal. We used a 375-ms-long filter kernel (i.e., three cycles with a low cutoff of 8 Hz). Thus, blurring in the time domain due to filtering did not extend beyond 188 ms before or after each time point.
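The power computation can be illustrated with a short NumPy sketch. It is a sketch under assumptions rather than the authors' pipeline: the bandpass-filtering step is omitted (the input is assumed to be already filtered to 8–12 Hz), the analytic signal is built directly with the FFT instead of calling a toolbox Hilbert routine, and the function names are hypothetical. The helper `kernel_length_ms` reproduces the kernel-length arithmetic stated above (three cycles at the 8 Hz low cutoff).

```python
import numpy as np

def kernel_length_ms(low_cutoff_hz, n_cycles=3):
    """Filter-kernel length for n_cycles of the low cutoff frequency."""
    return 1000.0 * n_cycles / low_cutoff_hz

def analytic_signal(x):
    """Analytic signal via the FFT: keep DC, double positive frequencies,
    and zero out negative frequencies."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0          # Nyquist bin for even-length input
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def instantaneous_power(bandpassed):
    """Squared magnitude of the complex analytic signal."""
    return np.abs(analytic_signal(bandpassed)) ** 2
```

For a 10 Hz cosine of amplitude 3, the instantaneous power is 9 at every time point, and `kernel_length_ms(8.0)` gives 375 ms, half of which is the roughly 188 ms of temporal blurring noted above.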
Inverted encoding model
Following our past work (Foster et al., 2016, 2017), we used an IEM (Brouwer and Heeger, 2009, 2011; Sprague and Serences, 2013) to reconstruct spatially selective channel-tuning functions (CTFs) from the pattern of alpha-band (8–12 Hz) power across electrodes. This analysis assumed that alpha power at each electrode reflects the weighted sum of eight spatially selective channels (i.e., neuronal populations), each tuned for a different position in the visual field. Specifically, we modeled alpha power at each electrode as the weighted sum of eight spatial channels tuned for eight locations equally spaced around the central fixation point corresponding to positions at which the stimuli in the search array appeared (Fig. 1). We modeled the response profile of each spatial channel across angular locations as a half sinusoid raised to the 25th power:

$$R = \sin(0.5\theta)^{25},$$

where θ is the angular location (0–359°), and R is the response of the spatial channel in arbitrary units. We circularly shifted this basis function to obtain a set of eight basis functions tuned for the eight equally spaced angular locations (0°, 45°, 90°, etc.).
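To make the basis set concrete, the sketch below builds eight circularly shifted tuning functions in Python. It assumes the half-sinusoid form sin(0.5θ) raised to the 25th power, which peaks at θ = 180° and is therefore shifted so that each channel responds maximally at its own preferred location. The function names and the channels-by-locations layout are illustrative assumptions.

```python
import math

N_CHANNELS = 8
LOCATIONS_DEG = [i * 360.0 / N_CHANNELS for i in range(N_CHANNELS)]

def channel_response(offset_deg, power=25):
    """Half sinusoid raised to `power`, equal to 1 at an offset of 0 degrees."""
    # sin(0.5 * theta) peaks at theta = 180 deg, so add 180 deg to centre
    # the peak on zero offset before evaluating the function.
    theta = math.radians((offset_deg + 180.0) % 360.0)
    return math.sin(0.5 * theta) ** power

def basis_set():
    """Rows are channels (by preferred location); columns are stimulus locations."""
    return [[channel_response(loc - pref) for loc in LOCATIONS_DEG]
            for pref in LOCATIONS_DEG]
```

With the exponent of 25, the response falls to roughly 0.14 at the neighbouring location (45° away) and is effectively zero beyond 90°, so each channel is narrowly tuned relative to the 45° spacing.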
An IEM analysis was applied to each time point to obtain time-resolved CTFs. We partitioned our data into independent sets of training data and test data (see the Training and test data section). The analysis proceeded in two stages (training and test). In the training stage, training data (B1) were used to estimate weights that approximate the relative contribution of the eight spatial channels to the observed response measured at each electrode. Let B1 (m electrodes × n1 measurements) be the power at each electrode for each measurement in the training set, C1 (k channels × n1 measurements) be the predicted response of each spatial channel (determined by the basis functions) for each measurement, and W (m electrodes × k channels) be a weight matrix that characterizes a linear mapping from “channel space” to “electrode space.” The relationship between B1, C1, and W can be described by a general linear model of the following form:

$$B_1 = W C_1.$$

The weight matrix was obtained via least-squares estimation as follows:

$$\hat{W} = B_1 C_1^{T}\left(C_1 C_1^{T}\right)^{-1}.$$

In the test stage, we inverted the model to transform the independent test data, B2 (m electrodes × n2 measurements), into estimated channel responses, Ĉ2 (k channels × n2 measurements), using the estimated weight matrix, Ŵ, that we obtained in the training phase:

$$\hat{C}_2 = \left(\hat{W}^{T}\hat{W}\right)^{-1}\hat{W}^{T} B_2.$$

Each estimated channel-response function was then circularly shifted to a common center, so the center channel was the channel tuned for the position of the cued/target location (a channel offset of 0°); then, these shifted channel-response functions were averaged across the eight position bins to obtain a CTF. The IEM analysis was applied to each subject separately because the exact contributions of each spatial channel to each electrode (i.e., the channel weights, W) will likely vary across individuals.
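The two stages of the IEM can be sketched with NumPy for a single time point. This is a minimal sketch under assumptions, not the authors' released code: it uses the standard least-squares estimate of the weights and the standard pseudoinverse-based inversion, builds its own basis set from a half sinusoid raised to the 25th power, and uses hypothetical function names and a simulated weight matrix in the usage example.

```python
import numpy as np

def make_basis(n_channels=8, power=25):
    """Basis set: rows are channels, columns are stimulus locations."""
    prefs = np.arange(n_channels) * 360.0 / n_channels
    # Offset of each location from each channel's preferred location,
    # shifted by 180 deg because sin(0.5 * theta) peaks at theta = 180 deg.
    offsets = (prefs[None, :] - prefs[:, None] + 180.0) % 360.0
    return np.sin(0.5 * np.deg2rad(offsets)) ** power

def train_weights(B1, C1):
    """Least-squares estimate of W in the model B1 = W @ C1."""
    return B1 @ C1.T @ np.linalg.inv(C1 @ C1.T)

def invert_model(W_hat, B2):
    """Estimate channel responses for test data B2 from the weights."""
    return np.linalg.inv(W_hat.T @ W_hat) @ W_hat.T @ B2

def center_and_average(C2, test_locations, n_channels=8):
    """Shift each channel-response profile so the stimulus channel sits at
    the centre index, then average across test measurements."""
    center = n_channels // 2
    shifted = [np.roll(C2[:, j], center - loc)
               for j, loc in enumerate(test_locations)]
    return np.mean(shifted, axis=0)
```

As a usage example, simulating electrode data from a known weight matrix (e.g., `W_true = np.random.default_rng(0).normal(size=(28, 8))` with `B1 = W_true @ C1`) lets one confirm that `train_weights` recovers the weights and that the averaged CTF peaks at the centre channel. With real data the recovery is only approximate, which is why training and test sets must be independent.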
Training and test data.
For the IEM procedure, we partitioned artifact-free trials into independent sets of training data and test data for each subject. When comparing CTF properties across conditions, it is important to estimate a single encoding model that is then used to reconstruct CTFs for each condition separately. If this condition is not met, then it is difficult to interpret differences in CTF selectivity between conditions because these might result from differences between the training sets (Sprague et al., 2018). Here, we used data from the informative-cue condition to estimate the encoding model in the training phase, and we reconstructed CTFs for each condition (informative cue and uninformative cue) separately. Specifically, we partitioned data in each condition into three independent sets, equating the number of trials for each location within each of the six sets (three informative-cue sets and three uninformative-cue sets). For each set, we averaged across trials for each stimulus location bin to obtain estimates of alpha power values across all electrodes for each target location (electrodes × locations, for each time point). We used a leave-one-out cross-validation routine such that two of the three sets of informative-cue data served as the training data. The remaining set of informative-cue data served as the test data for that condition, and one of the sets of data from the uninformative-cue condition served as the test data for that condition. We applied the IEM routine using each of the three matrices within each condition as the test data and the remaining two informative-cue sets as the training set. The resulting CTFs were averaged across the three test sets for each condition.
Because we equated the number of trials for each target location within each set of trials, some trials were not assigned to any set. Thus, we used an iterative approach to make use of all available trials. For each iteration, we randomly partitioned the trials into six sets (as just described) and performed the IEM procedure on the resulting training and test data (such that the trials that were not included in any set varied across iterations), and we averaged the resulting channel-response profiles across iterations. We performed a total of 50 iterations per subject.
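The random partitioning can be sketched as follows for one condition. This is an illustrative sketch, not the authors' code: the function name is hypothetical, and it assumes the per-set trial count for every location is set by the location with the fewest trials in that condition (the source does not state whether the minimum is also taken across conditions).

```python
import random

def partition_trials(locations, n_sets=3, rng=None):
    """Split trials into n_sets with equal trial counts per location.

    locations: location bin (0-7) of each artifact-free trial.
    Returns a list of n_sets lists of trial indices; leftover trials are
    not assigned to any set on this iteration.
    """
    rng = rng or random.Random()
    by_location = {}
    for index, loc in enumerate(locations):
        by_location.setdefault(loc, []).append(index)
    # Trials per location per set, limited by the rarest location.
    per_set = min(len(trials) for trials in by_location.values()) // n_sets
    sets = [[] for _ in range(n_sets)]
    for trials in by_location.values():
        shuffled = trials[:]
        rng.shuffle(shuffled)
        for s in range(n_sets):
            sets[s].extend(shuffled[s * per_set:(s + 1) * per_set])
    return sets
```

Because the shuffle differs on every call, repeating the partition (50 times per subject in the procedure above) changes which trials are left out, so all trials contribute to the averaged result.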
CTF selectivity.
To quantify the spatial selectivity of alpha-band CTFs, we used linear regression to estimate CTF slope (Foster et al., 2016). Specifically, we calculated the slope of the channel responses as a function of spatial channels after collapsing across channels that were equidistant from the channel tuned for the position of the stimulus (i.e., a channel offset of 0°). Higher CTF slope indicates greater spatial selectivity.
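The slope measure can be sketched in Python as below. The sketch assumes an eight-channel CTF centred at index 4 (offsets from −180° to 135°) and regresses the collapsed responses on an arbitrary unit-spaced predictor running from the farthest to the nearest channel; the function name and the predictor units are illustrative assumptions, so absolute slope values are comparable only within this convention.

```python
def ctf_slope(ctf):
    """Slope of channel response vs. proximity to the stimulus channel.

    ctf: channel responses ordered by offset, with the stimulus-tuned
    channel (offset 0 deg) at the centre index len(ctf) // 2.
    """
    center = len(ctf) // 2
    # Collapse across channels equidistant from the centre, ordered from
    # the farthest offset (index 0, i.e. 180 deg) to the centre channel.
    collapsed = [ctf[0]]
    for k in range(center - 1, 0, -1):
        collapsed.append((ctf[center - k] + ctf[center + k]) / 2)
    collapsed.append(ctf[center])
    # Ordinary least-squares slope against x = 1, 2, ..., len(collapsed).
    x = list(range(1, len(collapsed) + 1))
    x_bar = sum(x) / len(x)
    y_bar = sum(collapsed) / len(collapsed)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, collapsed))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator
```

A flat CTF yields a slope of zero, and a profile that rises toward the stimulus-tuned channel yields a positive slope, which is the sense in which higher slope indicates greater spatial selectivity.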
Cluster-based permutation test.
We used a cluster-based permutation test, which corrects for multiple comparisons, to identify when CTF selectivity was reliably above chance (Maris and Oostenveld, 2007; Cohen, 2014). We identified clusters in which CTF selectivity was greater than zero by performing a one-sample t test (against zero) at each time point. We then identified clusters of contiguous points that exceeded a t statistic threshold corresponding to a one-sided p value of 0.05. For each cluster, we calculated a test statistic by summing all t values in the cluster. We used a Monte Carlo randomization procedure to empirically approximate a null distribution for this test statistic. Specifically, we repeated the IEM procedure 1000 times but randomized the position labels within each training/test set, such that the labels were random with respect to the observed response at each electrode. For each run of the analysis with randomized position labels, we identified clusters as described above and recorded the largest test statistic, resulting in a null distribution of 1000 cluster test statistics. We then identified clusters in our unpermuted data that had test statistics larger than the 95th percentile of the null distribution. Thus, our cluster test was a one-tailed test with an alpha level of 0.05, corrected for multiple comparisons.
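The cluster statistic and its comparison with the null distribution can be sketched as follows. This Python sketch is illustrative only: it takes the per-time-point t values as given (the permuted IEM runs are assumed to have been computed elsewhere), expresses the 95th-percentile criterion as a Monte Carlo p value below 0.05, and uses hypothetical function names.

```python
def cluster_sums(t_values, threshold):
    """Sum of t values within each run of contiguous supra-threshold points."""
    sums, current, in_cluster = [], 0.0, False
    for t in t_values:
        if t > threshold:
            current += t
            in_cluster = True
        elif in_cluster:
            sums.append(current)
            current, in_cluster = 0.0, False
    if in_cluster:
        sums.append(current)
    return sums

def cluster_p_values(observed_t, null_t_runs, threshold):
    """Monte Carlo p value for each observed cluster.

    null_t_runs: one list of t values per permutation (position labels
    randomized). Only the largest cluster sum of each run enters the null.
    """
    null_max = [max(cluster_sums(run, threshold), default=0.0)
                for run in null_t_runs]
    results = []
    for stat in cluster_sums(observed_t, threshold):
        exceed = sum(1 for value in null_max if value >= stat)
        results.append((stat, exceed / len(null_max)))
    return results
```

Taking the largest cluster statistic from each permutation is what controls the family-wise error rate across time points: an observed cluster is compared with the most extreme cluster that label shuffling alone produces.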
Glitches
Experiment 1.
For seven subjects, eye-tracking data were not recorded for a subset of trials due to a glitch with the eye tracker. Five of these subjects were included in our final sample. For these subjects, between 60 and 240 of the trials that remained after exclusion of trials with artifacts and incorrect responses were missing eye-tracking data. These trials were omitted from analyses of gaze position (see Eye tracking).
For one subject, the stimulus presentation computer crashed during one block of the task. We excluded data from this block from all analyses.
For another subject, we failed to record EEG data for 25 trials due to an equipment failure. We excluded these trials from all analyses.
Experiment 2.
For a subset of subjects, the EOG electrodes were plugged in incorrectly. This problem did not affect our analyses because we relied exclusively on eye-tracking data for detection of ocular artifacts and analyses of gaze position.
For one subject, the experimenter forgot to resume saving EEG data after a break between blocks. Thus, we are missing EEG data for five blocks (320 trials). We excluded these blocks from all analyses. This subject completed a total of 36 blocks. Thus, we have 31 usable blocks of data for this subject.
We terminated data collection for one subject because of technical difficulties with the EEG amplifier.
Data/software availability
All data and code are available on Open Science Framework at https://osf.io/a9mvb/.
Results
In two experiments, we tested whether spatial cues in advance of a search array influence the latency of the N2pc component. Human observers performed a cued-search task (Fig. 1). On each trial, observers searched for a target—a diamond among squares—and reported whether the target was missing its left or right corner (Kiss et al., 2008). We used a spatial cue to manipulate whether or not covert attention was focused at the target location before the onset of the search array. In some blocks, the cue indicated the precise location of the impending target (informative cue), which allowed observers to focus covert attention on the target location in advance. In other blocks, the cue was spatially uninformative (uninformative cue), such that observers needed to monitor all eight positions for the target. Experiment 2 was a close replication of Experiment 1 with a larger sample size to increase statistical power (see Materials and Methods). In Experiment 2, we used a visually balanced cue (Fig. 1) to rule out any impact of asymmetric cues on alpha activity tracking the orienting of covert attention or the target-evoked N2pc. We also made some small changes to make it easier for observers to blink during the ITI. Specifically, we lengthened the ITI, and we removed the fixation dot during the ITI to cue observers to blink during this period (see Materials and Methods).
Task performance
Cueing the position of the target improved search performance in both experiments. Figure 2 shows median response times (RTs) and accuracy (percentage correct) as a function of cue type (informative vs uninformative). In Experiment 1, median RTs were 45 ms faster on average following informative cues (M = 503 ms, SD = 46) than following uninformative cues (M = 548 ms, SD = 39; t(15) = 7.49, p < 0.001), and accuracy was higher following informative cues (M = 97.7%, SD = 1.3) than following uninformative cues (M = 95.5%, SD = 2.7; t(15) = −4.94, p < 0.001). We replicated this pattern in Experiment 2: median RTs were 54 ms faster on average following informative cues (M = 496 ms, SD = 62) than following uninformative cues (M = 550 ms, SD = 73; t(25) = 12.79, p < 0.001), and accuracy was higher following informative cues (M = 96.9%, SD = 2.3) than following uninformative cues (M = 95.2%, SD = 2.9; t(25) = −5.58, p < 0.001). These results show that observers made use of informative cues, attending the target location in advance. Our analysis of alpha activity during the interval between cue and target onset (reported below) corroborates this conclusion.
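The comparisons above are paired tests on per-subject summary values. The following is a minimal sketch of that computation, not the analysis code released with the paper. The function names are hypothetical, and the sign convention (uninformative minus informative) is an assumption inferred from the signs of the reported t values.

```python
import math
import statistics


def median_rt_per_subject(rts_by_subject):
    """Median RT (ms) for each subject, given that subject's list of trial RTs."""
    return [statistics.median(rts) for rts in rts_by_subject]


def paired_t(uninformative, informative):
    """Paired t statistic and degrees of freedom.

    ASSUMPTION: differences are taken as uninformative minus informative,
    so slower RTs after uninformative cues give a positive t.
    """
    diffs = [u - i for u, i in zip(uninformative, informative)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1
```

Each input list holds one value per subject (for example, the median RTs from one cue condition), in the same subject order for both conditions.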
Covert spatial attention speeds the N2pc component
To test whether covert attention speeds visual processing, we tested whether our manipulation of covert attention (informative vs uninformative cues) influenced the latency of the target-evoked N2pc seen following the onset of the search array. Figure 3 shows the contralateral-minus-ipsilateral difference waves locked to the onset of the search array for both experiments. The target-evoked N2pc is the negative deflection in the difference wave occurring between 150 and 300 ms after onset of the search array. We observed a robust N2pc following both informative and uninformative cues. We measured the onset of the N2pc as the time at which the N2pc reached 50% of its maximum amplitude, and used a jackknife procedure to test whether N2pc latency differed between informative and uninformative cue conditions (see Materials and Methods). In Experiment 1, we found that the target-evoked N2pc started 18 ms earlier following informative cues (192 ms after search array onset) than following uninformative cues (210 ms after array onset; t(15) = 2.19, p = 0.022, one-tailed test). The purpose of Experiment 2 was to replicate this effect of attention on the latency of the N2pc component with a larger sample size to increase statistical power (see Materials and Methods). In Experiment 2, the N2pc started 22 ms earlier following informative cues (184 ms after array onset) than following uninformative cues (206 ms after array onset; t(25) = 3.82, p < 0.001, one-tailed test). Together, these results provide clear evidence that the N2pc onset occurs earlier when the target location is cued in advance. Thus, attention speeds the N2pc.
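The latency measure and jackknife test described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation (see Materials and Methods and the released code for that). It assumes that the peak is the most negative sample between 150 and 300 ms, that the 50% crossing is found by stepping backward from the peak with linear interpolation, and that the standard error comes from a standard leave-one-out jackknife on grand-average waveforms. All function names are hypothetical.

```python
import math


def onset_latency(times, wave, window=(150.0, 300.0), fraction=0.5):
    """Time (ms) at which a negative-going wave reaches `fraction` of its peak.

    ASSUMPTION: the peak is the most negative sample inside `window`; the
    crossing is located by stepping backward from the peak and interpolating
    linearly between the two samples that straddle the criterion.
    """
    in_window = [i for i, t in enumerate(times) if window[0] <= t <= window[1]]
    peak = min(in_window, key=lambda i: wave[i])
    if wave[peak] >= 0:
        raise ValueError("no negative deflection in the measurement window")
    criterion = fraction * wave[peak]
    i = peak
    while i > 0 and wave[i - 1] <= criterion:
        i -= 1
    if i == 0:
        return times[0]  # criterion already met at the first sample
    t0, t1, v0, v1 = times[i - 1], times[i], wave[i - 1], wave[i]
    return t0 + (criterion - v0) * (t1 - t0) / (v1 - v0)


def jackknife_latency_test(times, waves_a, waves_b):
    """Jackknife test of the onset-latency difference (condition b minus a).

    waves_a, waves_b: one difference wave per subject, same subject order.
    Returns (latency difference in ms from the full grand averages,
    t statistic with n - 1 degrees of freedom).
    """
    n = len(waves_a)

    def grand_average(waves, leave_out):
        kept = [w for s, w in enumerate(waves) if s != leave_out]
        return [sum(samples) / len(kept) for samples in zip(*kept)]

    def latency_difference(leave_out=None):
        return (onset_latency(times, grand_average(waves_b, leave_out))
                - onset_latency(times, grand_average(waves_a, leave_out)))

    full = latency_difference()
    leave_one_out = [latency_difference(s) for s in range(n)]
    mean_loo = sum(leave_one_out) / n
    # Jackknife standard error: leave-one-out estimates vary far less than
    # single-subject estimates, hence the (n - 1) / n scaling of the sum.
    se = math.sqrt((n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out))
    return full, full / se
```

Measuring latency on leave-one-out grand averages rather than on single-subject waveforms is the point of the jackknife here: onset estimates from noisy individual difference waves are unreliable, whereas each leave-one-out average is nearly as clean as the full grand average.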
The effect of cueing on N2pc latency cannot be explained by residual biases in eye position
We recorded eye position using an infrared eye tracker and used stringent criteria for rejecting trials with blinks or eye movements (see Materials and Methods). Nevertheless, we found that very small biases in eye position toward the cued location following informative cues remained after artifact rejection. Figure 4 shows the mean gaze position during the search array (700–900 ms after cue onset) as a function of target position following informative and uninformative cues. Following informative cues, mean gaze position varied by <0.15° (approximately the size of our fixation point). Thus, although there was a detectable bias in gaze position toward the cued location, this bias was very small, as would be expected after artifact rejection. Following uninformative cues, no such variation in gaze position was seen. As a result of this small bias in gaze position, the target stimulus in the search display appeared marginally closer to the fovea on average following informative cues than following uninformative cues. Thus, we tested whether this very small bias in eye position toward the target location could explain the earlier onset of the N2pc component following informative cues than following uninformative cues (Fig. 3). We performed this analysis using data from Experiment 2, in which we obtained reliable eye-tracking data for all subjects (see Materials and Methods). For each subject, we median split trials on the basis of the distance between mean gaze position (during the search array) and the target location. From this median split, the 50% of trials for which gaze position was closest to the target were classified as biased-toward trials, and the 50% of trials for which gaze position was farthest from the target location were classified as biased-away trials. 
Note that this sorting of trials based on eye position is relative: a trial that was categorized as “biased toward” does not necessarily imply that the gaze was positioned to the target side of the fixation dot for that trial. We sorted trials based on gaze position for each cue condition (informative and uninformative) separately. Figure 5 shows the mean gaze coordinates for the biased-toward and biased-away trials for each cue condition. This plot shows that sorting the trials based on gaze position created substantial bias in gaze position toward or away from the target position.
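The median split described above can be sketched as follows. This is a minimal illustration rather than the authors' code. The function name is hypothetical, and assigning the middle trial to the biased-away half when the trial count is odd is an assumption.

```python
import math


def split_by_gaze_bias(gaze_xy, target_xy):
    """Median-split trials by the distance between mean gaze and the target.

    gaze_xy:   (x, y) mean gaze position during the search array, per trial
    target_xy: (x, y) target position, per trial (same units, e.g. degrees)
    Returns (biased_toward, biased_away) as sorted lists of trial indices.
    ASSUMPTION: with an odd trial count, the middle trial goes to biased_away.
    """
    distance = [math.dist(g, t) for g, t in zip(gaze_xy, target_xy)]
    order = sorted(range(len(distance)), key=distance.__getitem__)
    half = len(order) // 2
    return sorted(order[:half]), sorted(order[half:])
```

Because the split is relative, a trial with gaze exactly on fixation can be classified as biased away if most other trials in that condition had gaze slightly nearer the target. The function would be applied to each cue condition separately, as in the analysis above.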
To test whether bias in gaze position influences the latency of the N2pc component, we examined the N2pc as a function of gaze position. Figure 6 shows the N2pc difference waves for biased-toward and biased-away trials for each condition. Using a jackknife-based procedure, we found that the N2pc onset was not earlier for biased-toward trials (186 ms after search array onset) than for biased-away trials (182 ms after array onset) for the informative-cue condition (t(25) = −0.74, p = 0.77, one-tailed) or for the uninformative-cue condition (206 and 206 ms, respectively; t(25) = 0.0, p = 0.50, one-tailed). However, the latency difference between the cue conditions was significant both for biased-toward trials (186 vs 206 ms; t(25) = 3.50, p < 0.001, one-tailed) and for biased-away trials (182 vs 206 ms; t(25) = 4.05, p < 0.001, one-tailed). Thus, the effect of spatial cues on the latency of the N2pc component cannot be explained by small residual biases in eye position that persist after artifact rejection.
The N2pc does not index shifts of spatial attention
The N2pc has often been interpreted as an index of a shift of spatial attention to a target stimulus (Luck and Hillyard, 1994; Woodman and Luck, 1999). However, this view has been challenged in recent years (Kiss et al., 2008; Ester et al., 2012; Mazza and Caramazza, 2015; Tan and Wyble, 2015; Zivony et al., 2018). For example, Kiss et al. (2008), using a cued-search paradigm similar to ours, found that the amplitude of the N2pc evoked by a target during visual search was equivalent following informative and uninformative cues. Based on this finding, they argued that the N2pc does not index a shift of spatial attention because observers had shifted attention to the cued location before the search array following informative cues. However, Kiss et al. cued the hemifield (left or right) that the target would appear in rather than its exact position. Thus, it is likely that observers broadly attended the cued side of space following the cue and then focused attention on the target once the search array appeared. The N2pc that Kiss et al. observed may have reflected this refocusing of spatial attention within the target's hemifield. By contrast, informative cues in our experiments indicated exactly where the target would appear, allowing observers to precisely attend the target location in advance. Moreover, we verified the observers' prior focus of attention by examining alpha-band (8–12 Hz) oscillations that precisely track when and where spatial attention is deployed (Worden et al., 2000; Samaha et al., 2016; Foster et al., 2017).
To this end, we used an IEM (Brouwer and Heeger, 2009, 2011; Sprague and Serences, 2013) to reconstruct CTFs from the pattern of alpha across the scalp that track the allocation of spatial attention (Foster et al., 2017). This approach assumes that alpha power at each electrode reflects the joint activity of a number of spatially tuned channels (or neuronal populations). By first estimating the contributions of these channels to activity measured at each electrode on the scalp, the model can then be inverted so that the underlying response of these spatial channels can be estimated from the pattern of alpha power across the scalp (Foster et al., 2016, 2017). We used data from the informative-cue condition to train the model (i.e., estimate the contribution of each spatial channel to each electrode) and then inverted the model to reconstruct the profile of activity across the spatially selective channels for each of the conditions separately (see Materials and Methods). The resulting alpha CTFs reflect the spatial selectivity of population-level alpha activity measured with EEG.
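The two steps of the model (estimating channel-to-electrode weights, then inverting them) can be sketched with ordinary least squares. This is a simplified illustration under stated assumptions, not the authors' pipeline. The basis function (a cosine raised to the seventh power), the use of eight channels, and all function names are assumptions, and the sketch omits the partitioning of data into independent training and test sets that a real analysis requires.

```python
import numpy as np


def basis_set(n_channels=8, n_positions=8, power=7):
    """Predicted channel responses, shape (channels, positions).

    ASSUMPTION: each channel has a graded tuning profile that peaks at its
    preferred angular position and falls to zero at the opposite position.
    """
    centers = np.arange(n_channels) * 2 * np.pi / n_channels
    positions = np.arange(n_positions) * 2 * np.pi / n_positions
    delta = positions[None, :] - centers[:, None]
    return np.abs(np.cos(delta / 2)) ** power


def train_iem(B_train, C_train):
    """Estimate weights W (electrodes x channels) from B = W @ C.

    B_train: alpha power, shape (electrodes, trials)
    C_train: predicted channel responses, shape (channels, trials)
    """
    return B_train @ np.linalg.pinv(C_train)


def invert_iem(W, B_test):
    """Estimate channel responses (channels x trials) from new alpha power."""
    return np.linalg.pinv(W) @ B_test
```

For a stimulus at position index `p`, the predicted responses are the column `basis_set()[:, p]`. Stacking one such column per training trial gives `C_train`.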
In line with past work (Foster et al., 2017), alpha activity precisely tracked the cued location. Figure 7 shows the channel responses as a function of the impending target location following informative cues (left) and uninformative cues (right). Because filtering alpha-band activity leads to temporal smearing, we focused on a window 300–500 ms after cue onset (ending 200 ms before the onset of the search array). Channel responses in this window purely reflect activity before the onset of the search array (see Materials and Methods). Following informative cues, we found that the peak channel response tracked the cued target location, with the peak channel response always in the channel tuned for the cued target location or in the neighboring channel. As expected, the peak channel response did not track the target location (which was not cued) following uninformative cues. Figure 8 shows the time course of the spatial selectivity of alpha-band CTFs (measured as CTF slope; see Materials and Methods) in both experiments. Following informative cues, we found that spatially tuned alpha activity started shortly after cue onset (clusters of reliable CTF selectivity started 156 ms after cue onset in Experiment 1 and 214 ms after cue onset in Experiment 2; see markers at the top of the plots in Fig. 8) and persisted through the search array until the end of the trial. Thus, observers oriented covert spatial attention to the cued location quickly following informative cues. In contrast, following uninformative cues, we found that spatially tuned alpha activity did not emerge until after the onset of the search array (clusters of reliable CTF selectivity started 138 ms after search array onset in Experiment 1 and 94 ms after array onset in Experiment 2). We also note that alpha CTF selectivity was greater after array onset for the informative-cue condition than in the uninformative-cue condition in both experiments.
This finding suggests that spatially specific modulations of alpha power during this period were stronger in the informative-cue condition or were more consistent across trials, resulting in a stronger modulation in the trial-averaged data. One possibility is that observers more consistently attended the target location during this period in the informative-cue condition than in the uninformative-cue condition.
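CTF selectivity is summarized above as a slope. One common way to compute such a slope is sketched below. It assumes that channels equidistant from the tuned channel are averaged and that a line is then fitted from the farthest channel to the tuned channel. The exact procedure is given in Materials and Methods; the function name and the unit (response change per channel step) are assumptions.

```python
from statistics import mean


def ctf_slope(channel_responses, tuned_channel=0):
    """Slope of a channel tuning function (CTF).

    Channels at the same circular distance from `tuned_channel` are averaged,
    and a least-squares line is fitted from the farthest channel to the tuned
    channel. A positive slope indicates spatial selectivity; a flat profile
    gives a slope of zero.
    """
    n = len(channel_responses)
    by_distance = {}
    for channel, response in enumerate(channel_responses):
        d = min((channel - tuned_channel) % n, (tuned_channel - channel) % n)
        by_distance.setdefault(d, []).append(response)
    # Order from the farthest channel to the tuned channel.
    y = [mean(by_distance[d]) for d in sorted(by_distance, reverse=True)]
    x = range(len(y))
    x_bar, y_bar = mean(x), mean(y)
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
```

Computing this slope at each time point yields a time course of spatial selectivity of the kind plotted in Figure 8.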
Together, these findings provide evidence that observers precisely attended the cued locations before the search display following informative cues. Thus, we found a robust N2pc following informative cues (Fig. 3) even though observers had attended the target location before the search array. These findings provide strong support for the Kiss et al. (2008) conclusion that the N2pc does not index shifts of attention per se. Thus, our results are consistent with the view that the N2pc indexes target individuation (Ester et al., 2012; Mazza and Caramazza, 2015)—the formation of an object representation that is segregated from the background and from other items in the display (Kahneman et al., 1992; Xu and Chun, 2009). This process would have unfolded following the onset of the search array, even though spatial attention was already focused on the correct location during cued trials.
Discussion
The idea that covert spatial attention speeds visual processing has a long history (James, 1890; Titchener, 1908; Shore et al., 2001). Behavioral studies have supported this idea, suggesting that information about a visual target accrues faster from attended locations than from unattended locations (Carrasco and McElree, 2001; Carrasco et al., 2004, 2006; Giordano et al., 2009). However, with behavioral evidence alone, it is difficult to tell whether attention speeds the perceptual processing of the target or enables more efficient use of this information in subsequent postperceptual stages of processing, such as decision-making and response selection. Adding to this doubt, work that focused on the latency of target-evoked neural responses has challenged the claim that sensory processing is speeded by attention, showing increased amplitude of early visual responses to attended stimuli but little or no effect on the latency of these responses (Hillyard and Anllo-Vento, 1998; Hillyard et al., 1998; Reynolds et al., 2000; Di Russo et al., 2003; McDonald et al., 2005; Lee et al., 2007; Sundberg et al., 2012). Our findings offer a reconciliation of these different lines of work.
We provide clear evidence that attention does in fact speed visually evoked responses. Observers performed a cued-search task. In some blocks, a spatial cue indicated the exact position of the upcoming target, allowing observers to focus spatial attention at the relevant location. In other blocks, the cue was uninformative. We examined how this manipulation of spatial attention influenced the N2pc, a negative deflection seen at posterior sites contralateral to visually selected stimuli. In two experiments, we found that the N2pc onset occurred earlier following informative relative to uninformative cues. In light of evidence that the N2pc indexes the formation of individuated object representations (Mazza and Caramazza, 2011, 2015; Ester et al., 2012; Pagano and Mazza, 2012), our findings suggest that spatial attention speeds this aspect of object perception.
Thus, our findings reconcile the apparent conflict between evidence from psychophysics that attention speeds perceptual processing and neural measurements that have indicated that attention does not substantially alter the latency of visually evoked responses. We propose that although attention may not speed the first feedforward sweep of visually evoked activity, attention does speed the subsequent formation of individuated object representations. Although individuation is critical to the formation of discrete perceptual representations, individuation follows the initial registration of low-level stimulus features (Mazza and Caramazza, 2015). Thus, earlier formation of individuated target representations could explain behavioral observations of faster evidence accumulation (Carrasco and McElree, 2001), even if the latency of early sensory processing is unchanged.
Although the changes in N2pc latency suggest covert attention speeded target individuation, this does not preclude latency effects at earlier or later stages of processing. Indeed, we found that the effect of spatial cueing on the latency of behavioral responses was ∼50 ms, which was considerably larger than the 20 ms effect seen in N2pc latency. This discrepancy may reflect that attention also speeds later stages of perceptual processing and/or postperceptual processes. It remains to be seen whether attention also speeds earlier stages of perceptual processing that precede the N2pc. However, a broad array of electrophysiological studies have not yet provided compelling evidence for changes in the latency of earlier sensory responses. For example, the earliest stimulus-driven responses in V4, occurring 60–100 ms after stimulus onset, show robust increases in amplitude but little or no evidence for earlier latency of responses at attended positions (Reynolds et al., 2000; Lee et al., 2007; Sundberg et al., 2012). Likewise, human EEG studies of the P1 component, which occurs 80–130 ms after stimulus onset and is thought to reflect the first feedforward sweep of activity in the extrastriate cortex (Hillyard and Anllo-Vento, 1998; Zhang and Luck, 2009), reveal clear increases in amplitude but no change in the latency of responses to stimuli at attended positions (Hillyard and Anllo-Vento, 1998; Hillyard et al., 1998; Di Russo et al., 2003; McDonald et al., 2005). In contrast to these very early responses, the N2pc component, which we have shown is speeded by attention, is a midlatency component that occurs between 150 and 300 ms after stimulus onset (Luck and Hillyard, 1994; Eimer, 1996). Thus, although current evidence suggests that attention has little effect on the latency of the very earliest sensory responses, our findings provide positive evidence that attention speeds the formation of discrete object representations as indexed by the N2pc.
Implications for the interpretation of the N2pc component
In our experiments, we observed a robust target-evoked N2pc even when we verified with alpha-band activity that observers had precisely attended the target location in advance. This finding is difficult to reconcile with the early view that the N2pc reflects a shift of covert spatial attention to a target stimulus (Eimer, 1996; Woodman and Luck, 1999) but dovetails with other recent evidence that the N2pc can be dissociated from shifts of spatial attention (Zivony et al., 2018). This result can be naturally explained by an individuation account of the N2pc (Mazza and Caramazza, 2015). Even when covert attention is deployed to the target in advance, observers must form an individuated representation of the target to perform the task (i.e., reporting which corner the target diamond was missing). Therefore, the individuation account predicts a clear target-evoked N2pc when covert attention is already focused at the target location.
We have favored an individuation account of the N2pc because of the evidence that the amplitude of the N2pc tracks the number of individuated representations (Mazza and Caramazza, 2011, 2015; Ester et al., 2012; Mazza et al., 2013). However, it must be noted that there is an ongoing debate regarding the specific perceptual process that the N2pc reflects. For example, Tan and Wyble (2015) proposed that the N2pc reflects a target localization process. This account is closely related to the individuation account because localization is considered a necessary component of object individuation (Kahneman et al., 1992; Xu and Chun, 2009). Zivony et al. (2018) proposed another possibility. Specifically, Zivony et al. suggested that the N2pc reflects attentional engagement, which they distinguished from the deployment of spatial attention. By attentional engagement, they meant the deployment of higher-level processes that enable identification and binding of a target's features, and consolidation into working memory. It is worth noting that object individuation falls within this definition of attentional engagement because object individuation is an essential part of engaging with a stimulus. Although this debate is unresolved, it is important to note that there is broad consensus that the N2pc reflects a midlatency stage of object processing rather than a shift of spatial attention to a target stimulus. Thus, regardless of the specific interpretation of the N2pc, our finding that attention speeded the N2pc provides neural evidence that attention speeds visual processing that comes just after the first wave of sensory activity in the visual cortex.
Conclusions
We showed that the target-evoked N2pc, a neural marker of object individuation, is speeded for targets that appear at attended locations. This finding provides neural evidence that bolsters the conclusions of past behavioral studies that attention speeds visual processing, while reconciling these findings with work that has not found latency shifts in the earliest visually evoked neural responses. Although attention may not speed the earliest stages of sensory processing, our results suggest that attention does speed the critical transition between raw sensory encoding and the formation of individuated object representations.
Footnotes
This work was supported by National Institute of Mental Health Grant 5RO1 MH087214-08. We thank Mei Arditi, Ariana Gale, and Russell Jaffe for assistance with data collection.
The authors declare no competing financial interests.
Correspondence should be addressed to Joshua J. Foster at jjfoster@bu.edu