Abstract
Spatial attention is often conceptualized as a flexible “zoom lens” that can dynamically adjust its focus, but most evidence stems from studies of voluntary attention. Our study investigates whether involuntary, reflexive attention exhibits similar adaptability in attentional scope. Using behavioral and electroencephalographic (EEG) experiments with exogenous cues of varying spatial extent, we examined how attentional gradients dynamically adjust when attention is involuntarily captured. Male and female human participants performed visual search tasks preceded by narrow- or broad-cue displays at different onset asynchronies. We applied inverted encoding models to alpha-band neural activity to precisely track the locus and breadth of attentional tuning. Across experiments, we found that reflexive attentional gradients flexibly adapt to match cue characteristics. Behaviorally, narrow cues yielded progressively sharper attentional gradients compared with broad cues, with differences emerging over time. Critically, EEG analyses revealed that alpha-band activity tracked these dynamic adjustments, with differences in spatial selectivity emerging rapidly (∼200 ms postcue) and continuing to evolve. Contrary to previous suggestions that involuntary attention primarily influences response efficiency, our results demonstrate that exogenous cues modulate attentional resources across the visual field at early processing stages.
Significance Statement
Our study provides novel evidence of the dynamic nature of involuntary, reflexive attention, contributing to an ongoing debate about the mechanisms of exogenous spatial attention. By employing advanced neuroimaging techniques, we demonstrate that attentional gradients can flexibly adapt to cue characteristics, revealing a more nuanced understanding of how attention is allocated during rapid, involuntary shifts of focus. These findings extend the zoom lens model beyond voluntary attention, highlighting the brain's sophisticated mechanisms of spatial attention.
Introduction
Whether watching a soccer match or searching for keys on a cluttered desk, our visual system flexibly adjusts the scope of attention from broad to narrow views as needed. While early cueing paradigms, wherein a visual cue guides attention to a specific region of space (Posner, 1980; Posner et al., 1980), conceptualized attention as a spotlight with a fixed size, the influential zoom lens model proposed a more dynamic framework where attentional resources flexibly distribute across different spatial scales (Eriksen and St. James, 1986). This model posits two key features: attention spreads with a gradient of processing quality that decreases with distance from the focus (Beck and Ambler, 1973; Downing, 1988), and an inverse relationship exists between attentional scope and perceptual enhancement, such that a narrower focus results in greater perceptual gain (Castiello and Umiltà, 1990).
Empirical support for the zoom lens model arises from voluntary shifts of covert attention. Using a precueing paradigm, Eriksen and St. James (1986) demonstrated that the breadth of attentional focus varies with the number of potential target locations, with narrower focus enhancing processing efficiency. This enhanced selectivity correlates with increased neural activity in retinotopic visual areas, where a broader attentional field expands the activated cortex but reduces local neural activity (Müller et al., 2003). Additionally, Feldmann-Wüstefeld and Awh (2020) observed increased spatial selectivity in alpha activity when attention was voluntarily narrowed. Together, these findings demonstrate that behavioral evidence supporting the zoom lens metaphor reflects changes in early stages of visual processing through sensory enhancement (Heinze et al., 1990) rather than changes in the efficiency of postperceptual processes (Eckstein et al., 2002).
Despite extensive research on voluntary attentional scaling, the neural dynamics underlying involuntary shifts remain unclear. Transient attention likely operates at early visual cortical stages (Hopfinger et al., 2000; Müller and Kleinschmidt, 2003), suggesting a similar trade-off between scope and processing enhancement. While some findings support this (Mounts and Edwards, 2016), others argue that exogenous attention primarily affects response efficiency rather than perceptual quality (Prinzmetal et al., 2005, 2009). Furthermore, transient attention enhances spatial resolution—measured as improved performance in a texture discrimination task—when drawn by a small cue but not a large one (Yeshurun and Carrasco, 2008).
Further complicating this picture, the effects of exogenous attention appear to be highly context-dependent, as initial sensory enhancement following cue onset can sometimes turn into suppression, particularly at longer cue–target intervals—an effect commonly associated with inhibition of return (IOR) (Posner and Cohen, 1984; Klein, 2000). These findings further question whether inadvertent shifts of attention in response to exogenous cues can truly be characterized by a flexible zoom lens model, as the impact on perceptual quality varies depending on spatial and temporal factors. To address this gap in the literature, we employed an inverted encoding analytic approach that precisely tracks the locus and timing of covert spatial attention (Foster et al., 2017).
To characterize whether inadvertent attentional shifts toward either a narrow or a broad region of space yield different attentional tuning profiles across time, we conducted one behavioral and one electroencephalographic (EEG) experiment in which search displays were preceded by exogenous cue displays at varying onset asynchronies (Fig. 1). Previous work has demonstrated that alpha activity not only tracks the centroid of the attended region (Samaha et al., 2016; Foster et al., 2017; van Moorselaar et al., 2018) but also indexes the “breadth” of the selected region of space (Feldmann-Wüstefeld and Awh, 2020). In the EEG experiment, we therefore used an inverted encoding model to measure the attentional tuning profile elicited by exogenous cues directing attention toward either a narrow or a broad region of space within a circular configuration. Together, these experiments allowed us to establish when, if at all, tuning profiles in response to these cues started to diverge as predicted by the zoom lens metaphor and, if so, whether this differential tuning was sustained across time.
Figure 1. Schematic of the experimental procedure. On each trial, either a narrow or a broad region of space was cued by three spaced white circles. These exogenous cues had no predictive value. Following a variable interstimulus interval (150 or 650 ms in Experiment 1; 650 ms in Experiment 2), a probe display appeared. Participants were instructed to report the identity of the digit (1–9) in this display, where seven letters surrounded the digit in a circular configuration. The probe display was only visible for 50 ms after which it was masked, and participants had unlimited time to respond.
Materials and Methods
Participants
In Experiment 1, participants were recruited via the online platform Prolific. To participate in the experiment, prospective participants had to be between the ages of 18 and 40, be fluent in English, have a 95% or better acceptance rate on the platform, and have participated in at least five other experiments. Datasets were only analyzed when an experiment was completed in full. The final sample (N = 49; mean age, ∼28; range, 18–40; 16 females) was obtained after excluding two participants whose mean accuracy was at or below the chance level. The final sample in Experiment 2 (N = 24; mean age, ∼20; range, 18–21; 18 females), recruited through the University's participant pool, was obtained after replacement of two participants for whom inverted encoding models (IEMs; see below) failed to produce reliable tuning curves. The sample size of Experiment 2 was based on the Feldmann-Wüstefeld and Awh (2020) study and our previous work using inverted encoding models to temporally track covert attention across conditions (van Moorselaar et al., 2018; van Moorselaar and Slagter, 2019). All participants gave their informed consent prior to the start of the study, which was approved by the Ethical Review Committee of the Faculty of Behavioural and Movement Sciences of the Vrije Universiteit Amsterdam.
Apparatus, material, and procedure
In Experiment 1, participants were required to use a desktop or laptop computer. As Experiment 1 was conducted online, via the hosting website JATOS (Lange et al., 2015), we had little control over the experimental setting, and for replication purposes, we thus report pixel values to describe the stimuli. The experiment was created in OpenSesame v3 (Mathôt et al., 2012) using OSWEB (version 1.4). Each trial began with a 750 ms gray background (RGB: 128, 128, 128) containing a black and white circular fixation point (Thaler et al., 2013) at screen center. A 100 ms cue display then appeared, showing either a black circle (radius, 22 pixels) during the staircase phase or three white circles during the experimental phase, centered at one of the upcoming search locations. Following a stimulus-onset asynchrony (SOA) of either 250 ms (short) or 750 ms (long), the probe display appeared. The probe display consisted of eight items (font size 25) arranged in an equally spaced circular array (radius, 125 pixels) around fixation. Each probe display contained one randomly selected digit (1–9) and seven letters randomly selected without replacement from the set: A, C, D, E, F, H, K, L, M, N, P, R, S, T, U, V, W, X, Y, Z. The probe display was presented for 50 ms before being replaced by a mask display showing pound signs (#) at all stimulus locations. The mask remained visible until participants responded. Cue (i.e., center cue position) and probe locations were counterbalanced. Participants were instructed to maintain fixation at the center of the screen and report the identity of the digit via numeric key press. Responses were unspeeded, with emphasis placed on accuracy rather than response time.
The experiment started with a staircase procedure, as in Feldmann-Wüstefeld and Awh (2020), to determine the ideal contrast between background and probe stimuli (i.e., performance neither at floor nor at the ceiling level). In the staircase procedure, the cue was a black circle that indicated the upcoming digit location with 100% validity. Participants started with a contrast of 127 (difference in RGB values between background and probe items). Whenever participants responded correctly, contrast was reduced by 10%, and whenever participants responded incorrectly, contrast was increased by 20% by adjusting the RGB values of the probe items. This was done for minimally two blocks of 64 trials, with blocks being continuously added until average performance was above 40%. The ideal contrast was then determined as the average contrast of the last 11 trials of the staircase procedure.
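The staircase rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `run_staircase`, the multiplicative interpretation of the 10% and 20% steps, and the clamping bounds are assumptions made for the example.

```python
def run_staircase(responses, start=127.0, down=0.10, up=0.20,
                  lo=1.0, hi=127.0, n_last=11):
    """Simulate the contrast staircase for a sequence of responses.

    responses: iterable of booleans (True = correct digit report).
    Returns (estimated contrast, list of contrasts shown per trial).
    Assumption: steps are multiplicative and contrast is clamped
    to [lo, hi]; neither detail is specified in the text.
    """
    contrast = start
    history = []
    for correct in responses:
        history.append(contrast)      # contrast presented on this trial
        if correct:
            contrast *= 1.0 - down    # harder after a correct response
        else:
            contrast *= 1.0 + up      # easier after an error
        contrast = min(max(contrast, lo), hi)
    last = history[-n_last:]          # average of the final trials
    return sum(last) / len(last), history
```

Because the downward step is smaller than the upward step, a rule of this kind settles at a contrast where errors remain frequent, which keeps performance away from ceiling.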
Following the staircase procedure, the main experiment used three white circles as cues with 12.5% validity (chance level). There were two cue conditions, which differed in the region of space occupied by the cue. In the narrow-cue condition, the central circle appeared at a probe location, with flanking circles positioned halfway to adjacent probe locations, encompassing a single probe position. In contrast, in the broad-cue condition, the flanking circles were positioned halfway between one and two item positions away from the center circle, encompassing three probe positions. On valid cue trials, the target digit appeared within the cued region. For the broad-cue condition, the target appeared with equal probability across all three possible positions within the cued region.
The experimental task contained 768 trials equally distributed across SOA and cue condition. Participants received feedback on their mean accuracy every 64 trials and were offered optional breaks.
Experiment 2 took place in a dimly lit room on a 23.8 in ASUS ROG STRIX XG248 LED monitor with a 240 Hz refresh rate. Stimuli were presented using PsychoPy functionality (Peirce, 2009). Participants were positioned 70 cm away from the screen using a desk-mounted chinrest. The eyes were tracked on- and offline using an Eyelink 1000 (SR Research) eye tracker tracking the left eye with a 1,000 Hz sampling frequency, and participants heard a beep each time fixation was broken by >2° of visual angle in the period before probe display onset. At the start of the experiment, the eye tracker was calibrated via a five-dot calibration procedure. Drift correction was applied every 96 trials, and when deemed necessary, the calibration procedure was repeated. EEG data were recorded at a sampling rate of 512 Hz with default settings using a 64-electrode cap with electrodes placed according to the 10–10 system (Biosemi ActiveTwo system; biosemi.com), with reference electrodes placed at the left and right earlobes. Vertical and horizontal EOG (VEOG/HEOG) were recorded via external electrodes placed ∼2 cm above and below the eye and ∼1 cm lateral to the external canthi, respectively.
Trials started with a randomly jittered fixation display (1,050–1,350 ms), and the subsequent cue display (radius cue circles, 0.45°) was followed by a 650 ms interval before probe onset. Probes, with a height of 0.58°, were presented on an imaginary circle with a radius of 2.5°. The experiment started with the same staircase procedure as in Experiment 1, with the difference that individual blocks contained 256 trials, which continued after at least two blocks until performance was above 33%.
Following the staircase procedure, participants performed four blocks of 384 trials each, equally distributed between cue conditions, with cue (i.e., center cue position) and probe locations again counterbalanced per condition. Participants were offered an optional 20 s break every 96 trials and received feedback on their mean accuracy after each block.
Behavioral analysis
In Experiment 1, accuracy was analyzed as a function of three factors: cued region size [narrow (one position) vs broad (three positions)], SOA [short (250 ms) vs long (750 ms)], and target distance from the cue. Target distance was calculated by subtracting the digit position from the center cue position, resulting in values from −4 to 4, where negative values indicate counterclockwise distances and positive values indicate clockwise distances. Valid trials were defined as those where the target appeared at Distance 0 in the narrow-cue condition, or at distances of −1, 0, or 1 in the broad-cue condition.
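As an illustration of the distance coding, the sketch below computes a signed circular cue–target distance on the eight-position array and applies the validity rule. The helper names are hypothetical, and the wrapping convention is an assumption: here the position directly opposite the cue is coded as −4 only, whereas the text reports a range of −4 to 4.

```python
N_POSITIONS = 8  # items on the circular probe array


def cue_target_distance(cue_pos, target_pos, n=N_POSITIONS):
    """Signed circular distance (cue minus target), wrapped to [-n/2, n/2)."""
    half = n // 2
    return (cue_pos - target_pos + half) % n - half


def is_valid(distance, cue_type):
    """Valid trials: distance 0 (narrow cue) or -1, 0, 1 (broad cue)."""
    if cue_type == "narrow":
        return distance == 0
    return abs(distance) <= 1
```

For example, with the cue centered on position 0 and the digit at position 7, the wrapped distance is 1 rather than −7, so the trial counts as valid under a broad cue but not under a narrow cue.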
To quantify the attentional gradient across cue–target distances, we first calculated linear slopes of accuracy as a function of distance from the center of the cued region for each participant, separately for each cue condition and SOA. We implemented a linear polynomial fitting procedure to derive slope estimates. These slope values serve as quantitative indicators of spatial tuning strength, with steeper positive slopes reflecting enhanced selectivity for the cued location. Resulting data were analyzed with repeated-measures ANOVAs, where reported p values are Greenhouse–Geisser corrected in case of sphericity violations, followed by planned comparisons with paired t tests using the JASP software (JASP Team, 2018).
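One way such a slope could be computed is sketched below. The folding step (averaging accuracy at equal absolute distances) and the sign convention (distances negated so that a profile peaking at the cue gives a positive slope) are assumptions for illustration; the text only states that a first-order polynomial was fitted.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def gradient_slope(acc_by_distance):
    """Slope of accuracy across cue-target distance.

    acc_by_distance: dict mapping signed distance -> mean accuracy.
    Accuracy is first averaged over equal absolute distances (assumed
    folding step); distances are negated so that higher accuracy near
    the cue produces a positive slope.
    """
    folded = {}
    for dist, acc in acc_by_distance.items():
        folded.setdefault(abs(dist), []).append(acc)
    dists = sorted(folded)
    means = [sum(folded[d]) / len(folded[d]) for d in dists]
    return linear_slope([-d for d in dists], means)
```

A flat accuracy profile returns a slope of zero, and a profile that falls off linearly with distance returns the per-position drop in accuracy.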
For a more detailed characterization of the attentional gradient, accuracy as a function of cue–target distance was also fit to an exponential cosine function separately for all conditions (van Moorselaar and Slagter, 2019). This fitting procedure yielded four key parameters: the center (i.e., mean), the baseline (i.e., vertical offset), the concentration (i.e., inverse of the width), and the amplitude (i.e., vertical scaling) of the function. Higher-concentration values indicate sharper tuning profiles, while larger amplitude values reflect stronger attentional modulation at the center of the profile.
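For reference, a common parameterization of an exponential cosine tuning function is y = baseline + amplitude × exp(concentration × (cos(x − center) − 1)). The sketch below evaluates that form on the eight-position array; whether this exact parameterization matches the fitted function is an assumption, and the conversion from item positions to radians is added for the example.

```python
import math


def exp_cosine(distance, center, baseline, concentration, amplitude,
               n_positions=8):
    """Exponential cosine evaluated at a cue-target distance.

    distance and center are in item positions and are converted to
    radians on an n_positions circle (illustrative assumption).
    """
    to_rad = 2 * math.pi / n_positions
    angle = (distance - center) * to_rad
    return baseline + amplitude * math.exp(
        concentration * (math.cos(angle) - 1.0))
```

At the center the exponent is zero, so the function equals baseline plus amplitude; increasing the concentration makes the value fall off faster with distance, which is the sense in which higher concentration indicates sharper tuning.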
Bootstrap testing
To assess the statistical significance of differences in amplitude, baseline, and concentration parameters between conditions, we employed a bootstrap procedure combined with permutation testing. Bootstrap resampling was performed separately for each condition and parameter using 1,000 iterations. For each bootstrap iteration, we randomly sampled participants with replacement and computed the parameter of interest (amplitude, baseline, or concentration) by averaging the parameters from individual fits. Interaction effects were calculated as the difference between the differences of the condition means [i.e., (A1 − A2) − (B1 − B2)]. The resulting interaction distribution was used to estimate 95% confidence intervals. To correct for potential bias and skewness, we computed bias-corrected and accelerated (BCa) confidence intervals rather than standard percentile intervals. The BCa method accounts for asymmetry and bias in the resampled distributions, providing more accurate interval estimation (Efron, 1987). If the 95% BCa confidence interval does not include zero, this points to a statistically significant effect.
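The interaction contrast and a BCa interval can be sketched as follows. This is a simplified stand-alone illustration using only the standard library, not the authors' code: the function names, the tuple layout of the per-participant parameters, and the use of a jackknife for the acceleration term (the standard construction in Efron, 1987) are choices made for the example.

```python
import random
from statistics import NormalDist, mean


def interaction(rows):
    """Mean of (A1 - A2) - (B1 - B2); rows are (a1, a2, b1, b2) tuples."""
    return mean((a1 - a2) - (b1 - b2) for a1, a2, b1, b2 in rows)


def bca_interval(rows, stat, n_boot=1000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated bootstrap confidence interval."""
    rng = random.Random(seed)
    n = len(rows)
    norm = NormalDist()
    theta_hat = stat(rows)
    # Resample participants with replacement and recompute the statistic.
    boots = sorted(stat([rows[rng.randrange(n)] for _ in range(n)])
                   for _ in range(n_boot))
    # Bias correction: share of bootstrap values below the estimate.
    prop = sum(b < theta_hat for b in boots) / n_boot
    prop = min(max(prop, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(prop)
    # Acceleration from leave-one-participant-out (jackknife) estimates.
    jack = [stat(rows[:i] + rows[i + 1:]) for i in range(n)]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den else 0.0

    def adjusted(q):
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))

    def pick(q):
        return boots[min(n_boot - 1, max(0, int(q * n_boot)))]

    return pick(adjusted(alpha / 2)), pick(adjusted(1 - alpha / 2))
```

An interval returned by `bca_interval(rows, interaction)` that excludes zero would correspond to the significance criterion stated above.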
EEG preprocessing
All preprocessing steps and subsequent analytical procedures were executed using custom Python scripts available in a public repository (https://github.com/dvanmoorselaar/DvM). These scripts primarily utilize functionality implemented in the MNE package (Gramfort et al., 2014).
The recorded EEG data underwent offline rereferencing to the average of both earlobes, followed by zero-phase “firwin” high-pass filtering at 0.01 Hz to eliminate slow signal drifts. Electrodes identified as malfunctioning during recording (M = 0.33; range, 0–3) were temporarily excluded from analysis to prevent contamination of subsequent preprocessing steps. The continuous recordings were segmented into epochs spanning from 750 ms before to 1,250 ms after cue display onset, with our primary window of interest being −250 to 750 ms. Following independent component analysis (using MNE's “picard” method on 1 Hz filtered epochs), eyeblink components were identified and removed.
To eliminate noise-contaminated epochs, we implemented an automated artifact rejection procedure. The EEG signal was bandpass filtered between 110 and 140 Hz to isolate muscle activity, which was then converted to z-scores. We established participant-specific z-score thresholds based on the variance within the windows of interest. Rather than immediately discarding epochs exceeding this threshold, we employed an iterative approach (Duncan et al., 2023) wherein the five electrodes contributing most substantially to the accumulated z-score within the marked artifact period were identified and sequentially interpolated using spherical splines (Perrin et al., 1989). Only epochs that continued to exceed the threshold after this interpolation procedure were ultimately rejected, resulting in an average exclusion of 8.7% of trials (range, 0–22.6%). In the final preprocessing step, any previously identified malfunctioning electrodes were interpolated using spherical splines.
Eye position samples were synchronized with the EEG data during offline analysis and converted to visual degrees representing deviation from central fixation. To maintain interpretative validity, we excluded epochs where gaze position deviated >1° from central fixation during the interval from −100 to 400 ms relative to cue onset. For instances with missing eye-tracking data, we detected eye movements using an algorithm examining HEOG amplitude changes (window size, 200 ms; step size, 10 ms, threshold: 15 µV). This procedure resulted in the exclusion of an additional 4.8% of the previously cleaned data (range: 0.9–24.4%).
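A sliding-window step detector of the kind described for the HEOG fallback might look like the sketch below. The split-half criterion (comparing the mean of the first and second half of each window) is a common choice but an assumption here, as is the function name; only the window size, step size, and threshold come from the text.

```python
def detect_heog_step(heog_uv, sfreq=512.0, window_ms=200.0,
                     step_ms=10.0, threshold_uv=15.0):
    """Return True if any window contains a step-like HEOG change.

    heog_uv: sequence of HEOG samples in microvolts for one epoch.
    Each window is split in half and flagged when the absolute
    difference between the half means exceeds the threshold
    (assumed criterion).
    """
    win = int(round(window_ms / 1000.0 * sfreq))
    step = max(1, int(round(step_ms / 1000.0 * sfreq)))
    half = win // 2
    for start in range(0, len(heog_uv) - win + 1, step):
        first = heog_uv[start:start + half]
        second = heog_uv[start + half:start + win]
        diff = sum(second) / len(second) - sum(first) / len(first)
        if abs(diff) > threshold_uv:
            return True
    return False
```

A sustained horizontal shift in the HEOG trace, as produced by a saccade, yields a large difference between window halves, whereas zero-mean noise largely averages out within each half.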
Time–frequency analysis
Analyses presented here are based on posterior electrodes only (i.e., 32 posterior electrodes). Time–frequency decomposition was conducted to quantify both total and evoked alpha power. Although we expected effects to be most pronounced within the alpha-band (8–12 Hz), we also examined a broad range of frequencies (4–40 Hz with bands set in 25 logarithmic steps). The artifact-free EEG signals were processed with a fifth-order Butterworth bandpass filter implemented in MNE. Subsequently, evoked power and total power were calculated after extraction of the complex analytic signal via a Hilbert transform. Evoked power was computed by averaging the complex analytic signal across trials before squaring its complex magnitude, whereas for total power the averaging was done after power extraction (i.e., after squaring the complex magnitude of each trial's analytic signal). Consequently, evoked power reflects activity phase-locked to stimulus onset, and total power reflects ongoing activity irrespective of its phase relationship to the onset of the cue display.
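The distinction between the two power measures comes down to the order of averaging and squaring. The sketch below assumes the analytic signal has already been obtained (for example via a Hilbert transform of band-pass-filtered data) and is passed in as nested lists of complex samples; the function names are illustrative.

```python
def evoked_power(analytic):
    """Average the complex signal over trials, then square the magnitude.

    analytic: list of trials, each a list of complex samples over time.
    """
    n_trials = len(analytic)
    return [abs(sum(samples) / n_trials) ** 2
            for samples in zip(*analytic)]


def total_power(analytic):
    """Square the magnitude per trial, then average over trials."""
    n_trials = len(analytic)
    return [sum(abs(z) ** 2 for z in samples) / n_trials
            for samples in zip(*analytic)]
```

If two trials have equal amplitude but opposite phase at a given time point, the complex average cancels and evoked power is near zero, while total power is unaffected. When phases are identical across trials, the two measures coincide.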
Since the calculation of power necessitates trial averaging, we divided our dataset into separate training and test sets (detailed below). For each cue location (i.e., center of the cue), both power measures were calculated by averaging across corresponding trials using the procedures described above.
Inverted encoding model
To reconstruct location-selective channel tuning functions (CTFs) from scalp distributions of frequency power, we implemented an IEM following Foster et al. (2015). This analytical approach assumes that power measurements (see above, Time–frequency analysis) at each electrode reflect the weighted sum of eight spatial channels (neural populations), each selectively tuned to different angular locations. The response profile of each spatial channel was modeled using a half-sinusoid function as follows:
$$R = \sin(0.5\theta)^{7},$$
where θ is the angular location (0–359°) and R is the response of the spatial channel in arbitrary units (Foster et al., 2015).
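The IEM procedure was applied to power values at each time point through a two-stage process utilizing independent training and test datasets (see below, Dataset partitioning). In the training stage, we used the training data (B1) to estimate a weight matrix representing each spatial channel's contribution to electrode-specific power measurements. With B1 (m electrodes × n1 observations) representing power values, C1 (k channels × n1 observations) denoting predicted channel responses based on basis functions, and W (m electrodes × k channels) characterizing the linear mapping from channel space to electrode space, we formulated a general linear model:
$$B_1 = W C_1.$$
Following Foster et al. (2015), the weight matrix was estimated by ordinary least squares:
$$\hat{W} = B_1 C_1^{T} \left(C_1 C_1^{T}\right)^{-1}.$$
In the test stage, the model was inverted to transform the test data B2 (m electrodes × n2 observations) into estimated channel responses C2 (k channels × n2 observations):
$$\hat{C}_2 = \left(\hat{W}^{T} \hat{W}\right)^{-1} \hat{W}^{T} B_2.$$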
We circularly shifted these channel responses to center them at a common reference point (0°), corresponding to the cue's central location. The resulting channel responses (8 channels × 8 location bins) were averaged across location bins to quantify general spatial selectivity in tracking the cued position, independent of specific locations.
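The two IEM stages and the alignment step can be sketched with NumPy as below. This is an illustrative reconstruction under stated assumptions rather than the code in the authors' repository: the basis set uses the half-sinusoid raised to the seventh power from Foster et al. (2015), the weights are estimated by ordinary least squares, and all function names are hypothetical.

```python
import numpy as np


def basis_set(n_channels=8, n_locations=8, power=7):
    """Idealized channel responses (channels x locations).

    |cos(offset / 2)| ** power is a half-sinusoid that peaks when a
    location matches the channel's preferred angle and falls to zero
    at the opposite side of the circle.
    """
    centers = np.arange(n_channels) * 2 * np.pi / n_channels
    locations = np.arange(n_locations) * 2 * np.pi / n_locations
    offset = locations[None, :] - centers[:, None]
    return np.abs(np.cos(offset / 2)) ** power


def train_iem(B1, C1):
    """Least-squares weights W (electrodes x channels) from B1 = W C1."""
    return B1 @ C1.T @ np.linalg.inv(C1 @ C1.T)


def invert_iem(W, B2):
    """Estimated channel responses (channels x observations) for B2."""
    return np.linalg.inv(W.T @ W) @ W.T @ B2


def align_ctf(C2, cue_locations):
    """Shift each observation so the cued channel sits at the center
    index, then average across observations."""
    k = C2.shape[0]
    shifted = [np.roll(C2[:, j], k // 2 - loc)
               for j, loc in enumerate(cue_locations)]
    return np.mean(shifted, axis=0)
```

In a noiseless simulation where electrode power is generated from known weights, `invert_iem(train_iem(B1, C1), B2)` recovers the channel responses that generated the test data, and the aligned profile peaks at the center channel.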
Dataset partitioning
For all IEM analyses, separately for each condition of interest, we randomly divided our data into three independent sets, with two serving as training data (B1) and the third as test data (B2). Importantly, we used a leave-one-out cross-validation routine, which was repeated until each set had served as the test set (i.e., B2). To ensure balance across locations and conditions, we equalized trial numbers across cued locations by selectively discarding excess trials. For each dataset partition, we averaged across trials for individual cue locations to calculate power.
To enhance signal reliability, we implemented an iterative approach for CTF estimation. The procedure of random data partitioning into training and test sets was repeated across 10 iterations, with the resulting CTFs averaged to obtain the final estimates.
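A balanced three-way partition with leave-one-set-out folds could be implemented along these lines. The sketch is an assumption-laden illustration: the function names are hypothetical, and trial counts are equalized by truncating every location to the largest count that all locations can supply to each set.

```python
import random


def partition_trials(cue_locations, n_sets=3, seed=0):
    """Split trial indices into n_sets sets, balanced over cue location.

    cue_locations: list with the cued location of each trial.
    Excess trials are discarded so every location contributes the
    same number of trials to every set.
    """
    rng = random.Random(seed)
    by_location = {}
    for idx, loc in enumerate(cue_locations):
        by_location.setdefault(loc, []).append(idx)
    per_set = min(len(v) for v in by_location.values()) // n_sets
    sets = [[] for _ in range(n_sets)]
    for indices in by_location.values():
        rng.shuffle(indices)
        for s in range(n_sets):
            sets[s].extend(indices[s * per_set:(s + 1) * per_set])
    return sets


def cross_validation_folds(sets):
    """Yield (train_indices, test_indices), each set serving once as test."""
    for i, test in enumerate(sets):
        train = [idx for j, s in enumerate(sets) if j != i for idx in s]
        yield train, test
```

Calling `partition_trials` with a different `seed` on each of the 10 iterations and averaging the resulting CTFs would mirror the repeated random partitioning described above.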
CTF evaluation
To evaluate the spatial selectivity exhibited by the CTFs, we implemented the same linear polynomial fitting procedure to derive slope estimates as in Experiment 1.
For statistical validation of the reconstructed CTFs and examination of between-condition differences, we analyzed these slope values using nonparametric cluster-based statistics. Specifically, we employed one-sample t tests (applied to the paired difference between conditions when contrasting conditions) in conjunction with Monte Carlo randomization techniques, implemented through MNE's permutation_cluster_1samp_test function. This statistical approach effectively addresses the temporal correlation inherent in the data while controlling for multiple-comparison issues (Maris and Oostenveld, 2007). The statistical procedure followed a resampling protocol wherein each iteration generated a new dataset randomly drawn from the observed data. For each resampled dataset, the signs were randomly flipped, and clusters were formed by identifying adjacent time points with t values exceeding a predetermined threshold. Only the cluster with the highest cumulative t value was preserved for each iteration. This resampling and cluster identification process was executed 1,024 times (default setting), generating a permutation distribution of cluster statistics. The empirically observed clusters from the actual (nonpermuted) data were then evaluated against this permutation distribution. Statistical significance was established when a cluster's test statistic surpassed the 95th percentile threshold of the permutation distribution, indicating that either the CTF slope itself or the difference between conditions was statistically reliable.
Using the same procedure as in Experiment 1, estimated CTFs were also fit to an exponential cosine function separately for narrow and broad cues, which yielded estimates for the center (i.e., mean), the baseline (i.e., vertical offset), the concentration (i.e., inverse of the width), and the amplitude (i.e., vertical scaling) of the function.
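The logic of a sign-flip cluster permutation test over time can be illustrated with the simplified, standard-library sketch below. It is a stand-in written for this example and not MNE's implementation: the fixed t threshold of 2.0, the use of summed absolute t values as cluster mass, and the function names are all assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev


def t_values(data):
    """One-sample t value per time point; data is participants x time."""
    n = len(data)
    out = []
    for column in zip(*data):
        sd = stdev(column)
        out.append(mean(column) / (sd / sqrt(n)) if sd > 0 else 0.0)
    return out


def find_clusters(ts, threshold):
    """Runs of adjacent same-sign supra-threshold points.

    Returns (start, stop_exclusive, summed |t|) for each run.
    """
    clusters, start, mass, sign = [], None, 0.0, 0
    for i, t in enumerate(ts + [0.0]):   # sentinel closes a final run
        s = (t > threshold) - (t < -threshold)
        if s != 0 and s == sign:
            mass += abs(t)
            continue
        if sign != 0:
            clusters.append((start, i, mass))
        start, mass, sign = (i, abs(t), s) if s != 0 else (None, 0.0, 0)
    return clusters


def cluster_permutation_test(data, threshold=2.0, n_perm=1024, seed=0):
    """Return (start, stop, p) for each observed cluster."""
    rng = random.Random(seed)
    observed = find_clusters(t_values(data), threshold)
    null = []
    for _ in range(n_perm):
        flipped = []
        for row in data:
            s = rng.choice((-1, 1))       # one sign flip per participant
            flipped.append([x * s for x in row])
        masses = [m for _, _, m in
                  find_clusters(t_values(flipped), threshold)]
        null.append(max(masses, default=0.0))  # keep largest cluster only
    return [(a, b, sum(m0 >= m for m0 in null) / n_perm)
            for a, b, m in observed]
```

Because only the largest cluster mass is retained on each permutation, the resulting p values are corrected for the number of time points tested.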
Results
In Experiment 1, we aimed to establish whether exogenous cues of varying spatial extents (narrow vs broad) would elicit different attentional tuning profiles at the behavioral level. Specifically, we investigated whether the spatial characteristics of attention-capturing stimuli would influence the gradient of spatial attention and how these potential differences might evolve over time. By using a precueing paradigm with cues of different sizes and measuring performance across short and long SOAs, we sought to test whether attentional gradients dynamically adapt to match the spatial distribution of exogenous cues. This behavioral experiment provided an initial test of the hypothesis that the breadth of attentional focus is not fixed but flexibly adjusts to the characteristics of attention-capturing stimuli, setting the stage for a more detailed neural investigation in Experiment 2.
To examine whether the dynamics of perceptual selection were modulated by the breadth of attentional cues, estimated linear slopes were entered into a repeated-measures ANOVA with within-subject factors cue condition (narrow vs broad) and SOA (short vs long). The main effects of cue condition and SOA (all F's > 5.0; all p's < 0.031) were accompanied by a significant interaction (F(1, 48) = 9.5; p = 0.003;
Figure 2. Results of Experiment 1: narrow cues yield progressively sharper attentional gradients compared with broad cues. Markers, red circles and black squares, show mean accuracy as a function of cue–target distance for narrow- and broad-cue conditions, respectively, for both short SOA (left panel) and long SOA (right panel). Error bars in this and all subsequent plots represent 95% within-subject confidence intervals (Morey, 2008). Solid lines display the model fit from the exponential cosine function fitted to the aggregate data. Dashed vertical gray lines represent locations that fell within both the broad- and narrow-cued regions, with narrow cues only encompassing Distance 0. Insets in the figure display the data used to estimate linear slope values, which are displayed by solid lines.
As slopes are a rather crude measure, we also fitted an exponential cosine function to better characterize the observed differences as a function of cue–target distance in circular space. These fits are displayed by the solid lines within Figure 2. While the estimated baseline (i.e., vertical offset) of the function showed no interaction (95% BCa CI [0.017, 0.018]), both the amplitude (i.e., vertical scaling) and concentration (i.e., inverse of the width) mimicked the slope findings. Specifically, cue condition by SOA interactions confirmed that over time narrow cues yielded increasingly higher amplitudes (95% BCa CI [−0.151, −0.147]) and sharper tuning profiles (95% BCa CI [−13.067, −12.503]) compared with broad cues. Notably, even at the earliest SOA, the observed difference between narrow- and broad-cue conditions was already reliable for both amplitude (95% BCa CI [−0.021, −0.017]) and concentration parameters (95% BCa CI [−0.562, −0.173]).
Together these findings demonstrate that reflexive attentional gradients are not fixed but dynamically adjust to match the spatial distribution of the attentional cue. However, given the online setting without continuous eye movement monitoring, eye movements remain a significant concern, particularly at the longest SOA where participants had more time to execute saccades or gaze drifts toward the cued location. This leaves it unclear to what extent the observed interaction between cue condition and SOA reflects genuine attentional processes versus the influence of eye movements. Nevertheless, the reliable differences between narrow- and broad-cue conditions observed at the earliest SOA, where eye movements are less likely to occur, suggest that cues of different sizes differentially adjust the breadth of attentional tuning. Therefore, we conducted Experiment 2 in a controlled laboratory setting with continuous eye tracking to examine whether we could replicate this dynamic attentional process and identify its neural correlates.
While Experiment 1 demonstrated behaviorally that attentional gradients dynamically adapt to cue characteristics, the neural mechanisms underlying this process remain unexplored. In Experiment 2, we sought to directly examine the neural dynamics of this adaptive process by employing IEMs to reconstruct the spatial distribution of attention from alpha-band EEG activity. This approach allows us to track the temporal evolution of attentional tuning with millisecond precision, potentially revealing how differences in attentional gradients emerge and develop in response to narrow versus broad exogenous cues.
Additionally, conducting Experiment 2 in a controlled laboratory setting with EEG recording addresses a limitation of Experiment 1 by allowing us to monitor eye movements, thereby clarifying whether the observed temporal dynamics reflect genuine attentional processes rather than overt orienting. By linking neural activity to behavioral performance, we aimed to provide converging evidence for the dynamic and adaptive nature of reflexive attentional gradients.
As visualized in Figure 3, counter to Experiment 1, across the board behavioral accuracy was higher in the broad- compared with the narrow-cue condition, and the estimated linear slopes did not differ between conditions (t = 0.3; p = 0.75). However, a more nuanced analysis revealed that despite the overall less clear behavioral pattern, the attentional gradient still varied as a function of cue size. Specifically, the parameters of the exponential cosine function suggested that narrow cues elicited a faster drop in discrimination accuracy as distance from the cued region increased. The confidence intervals for both the amplitude (i.e., vertical scaling; 95% CI [0.005, 0.008]) and concentration (i.e., inverse of the width; 95% CI [2.538, 3.059]) did not include zero, suggesting differences between narrow- and broad-cue conditions. This statistical discrepancy highlights the subtlety of the effects observed in Experiment 2 and suggests that while trends were consistent with Experiment 1, they were not as robust in the laboratory setting. The difference between slopes and estimated tuning parameters further indicates that the exponential cosine function may have captured nuanced aspects of attentional tuning that were not detected by the simpler linear slope analysis.
Behavioral and ERP results of Experiment 2: narrow cues yield sharper attentional gradients compared with broad cues. A, Mean performance across conditions as a function of cue–target distance. Red circles and black squares show mean accuracy as a function of cue–target distance for narrow- and broad-cue conditions, respectively. Solid lines display the model fit from the exponential cosine function fitted to the aggregate data. Dashed vertical gray lines represent locations that fell within the broad- and narrow-cued regions, with narrow cues encompassing only Distance 0. Insets in the figure display the data used to estimate linear slope values, which are displayed by solid lines. B, Lateralized difference waveforms (contralateral vs ipsilateral at PO7/8, PO3/4, O1/2) elicited by the probe display for targets at the horizontal midline and below the midline as a function of the absolute cue–target distance. *Denotes significant estimated gradients in the selected window of interest (100–220 ms relative to probe onset), which is marked by the black horizontal line.
These findings suggest that while linear slope estimates did not differ between conditions, the exponential cosine function revealed more nuanced differences in attentional gradients. Specifically, narrow cues continued to produce a more pronounced decline across spatial locations compared with broad cues, even though this effect was not captured by the simpler linear slope analysis. Notably, despite the controlled laboratory setting of Experiment 2, which allowed for precise monitoring of eye movements and eliminated potential confounds present in the online experiment, the behavioral patterns were less clear-cut than in Experiment 1 (see Discussion for potential reasons).
Event-related potential (ERP) results
Given these modest behavioral effects, we conducted an exploratory ERP analysis to examine whether neural markers might provide clearer evidence of attentional gradient differences between cue conditions. We focused on the posterior positivity contralateral (PpC) component elicited by the probe display, as this component has been linked to imbalances in low-level sensory properties between visual hemifields (Corriveau et al., 2012; Fortier-Gauthier et al., 2012). We reasoned that spatially attended stimuli may be rendered more salient through attentional enhancement, potentially creating functional imbalances that could elicit a PpC even when physical stimulus properties are matched across locations. If different cue types modulate the spatial distribution of attention, this might be reflected in systematic variations of PpC amplitude as a function of distance from the cued location. This exploratory analysis aimed to determine whether neural measures might reveal spatial attention effects that were not robustly evident in the behavioral data.
For this analysis, preprocessed data were low-pass filtered at 30 Hz to focus on slower ERP components, after which they were baseline corrected using a 200 ms window before probe display onset. Lateralized difference waveforms were obtained by subtracting ipsilateral waveforms from contralateral waveforms at electrodes PO7/8, PO3/4, and O1/2, which were selected based on visual inspection. To quantify the PpC, mean amplitudes were computed within a time window from 100 to 220 ms following probe onset (Fig. 3B). For each cue condition, these mean amplitudes were used to compute linear slopes across five cue–target distances (0–4), where 0 indicated perfect alignment between the cue center and target location. This analysis revealed significant negative slopes in both the narrow- (t(23) = 3.85; p < 0.001; d = 0.79) and broad-cue (t(23) = 2.31; p = 0.03; d = 0.47) conditions, indicating the presence of a spatial attentional gradient in both conditions. Crucially, the slope was significantly more negative in the narrow- compared with the broad-cue condition (t(23) = 2.26; p = 0.034; d = 0.47), suggesting that attention was more tightly focused when the cue occupied a smaller spatial region. Note that this ERP analysis was exploratory, conducted in response to reviewer comments, with electrode selection and time windows determined through visual inspection rather than a priori hypotheses. These results should be interpreted cautiously and would benefit from confirmatory replication.
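The slope analysis described above can be sketched as follows. This is an illustrative outline only: the array layout, the function names `window_slopes` and `t_against_zero`, and the simulated amplitudes are assumptions, and filtering, baseline correction, and electrode pooling are taken as already done.

```python
import numpy as np

def window_slopes(contra, ipsi, times_ms, window=(100, 220)):
    """Per-participant slope of mean lateralized amplitude across distances.

    contra, ipsi: arrays of shape (participants, distances, timepoints).
    times_ms: sample times in ms relative to probe onset.
    """
    diff = contra - ipsi                               # contralateral minus ipsilateral
    in_window = (times_ms >= window[0]) & (times_ms <= window[1])
    mean_amp = diff[:, :, in_window].mean(axis=2)      # (participants, distances)
    distances = np.arange(mean_amp.shape[1])           # 0 = cue center
    # polyfit fits every column of a 2-D y separately; row 0 holds the slopes.
    return np.polyfit(distances, mean_amp.T, deg=1)[0]

def t_against_zero(x):
    """One-sample t statistic against zero (df = len(x) - 1)."""
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

# Simulated data: 24 participants, 5 distances, amplitude falling with distance.
rng = np.random.default_rng(0)
times_ms = np.arange(-200, 500, 4)
n_sub, n_dist = 24, 5

def simulate(slope):
    gradient = 1.0 + slope * np.arange(n_dist)
    contra = gradient[None, :, None] + rng.normal(0, 0.05, (n_sub, n_dist, times_ms.size))
    ipsi = rng.normal(0, 0.05, contra.shape)
    return contra, ipsi

narrow = window_slopes(*simulate(-0.2), times_ms)
broad = window_slopes(*simulate(-0.1), times_ms)
t_narrow = t_against_zero(narrow)          # gradient present in the narrow condition?
t_diff = t_against_zero(narrow - broad)    # paired comparison of the two slopes
```

Testing the per-participant slopes against zero asks whether a gradient exists within a condition, whereas the paired difference asks whether the gradient is steeper for one cue type than for the other.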
These ERP findings provided converging evidence for differential attentional gradients between narrow- and broad-cue conditions, complementing the behavioral results by demonstrating clear neural signatures of spatial attention modulation. We next examined to what extent this attentional gradient modulation by cue breadth could be tracked by changes in neural oscillations in the alpha frequency band, a brain rhythm that has been robustly linked with modulations of incoming sensory information.
IEM results
Having established that reflexive attentional gradients continued to dynamically adjust to the spatial distribution of the attentional cue when eye movements were properly controlled, we next examined to what extent the time course of reconstructed attentional tuning profiles tracked this dynamic process. Based on previous research (Samaha et al., 2016; Foster et al., 2017; van Moorselaar et al., 2018), we expected any effects to be most pronounced within the alpha-band. As visualized in Figure 4A, and in line with previous findings, a range of low frequencies (4–15 Hz) with the peak centered around the upper range of the theta-band tracked the centroid of the cued locations via evoked power, with no differences between conditions. In contrast, total power reconstructions were most pronounced within the alpha-band (Fig. 4B), and critically, total alpha power slopes were also reliably larger in the narrow- compared with the broad-cue condition (see dotted white outline for cluster-based permutation differences).
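The distinction between evoked and total power comes down to the order of averaging and squaring. The sketch below assumes that a band-passed, complex analytic signal (for example from a Hilbert transform) is already available per trial; the function name, sampling rate, and simulated signals are hypothetical and serve only to illustrate the principle.

```python
import numpy as np

def evoked_and_total_power(analytic):
    """analytic: complex array (trials, timepoints) of band-passed analytic signal."""
    # Evoked power: average across trials first, so non-phase-locked activity cancels.
    evoked = np.abs(analytic.mean(axis=0)) ** 2
    # Total power: square each trial first, so power survives regardless of phase.
    total = (np.abs(analytic) ** 2).mean(axis=0)
    return evoked, total

# Simulated 10 Hz oscillation of unit amplitude, 200 trials.
rng = np.random.default_rng(1)
n_trials, n_times, srate, freq = 200, 100, 250.0, 10.0
t = np.arange(n_times) / srate
carrier = 2 * np.pi * freq * t

locked = np.exp(1j * np.tile(carrier, (n_trials, 1)))        # same phase on every trial
jitter = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))       # random phase per trial
unlocked = np.exp(1j * (carrier[None, :] + jitter))

evoked_locked, total_locked = evoked_and_total_power(locked)
evoked_unlocked, total_unlocked = evoked_and_total_power(unlocked)
```

With identical phase on every trial the two measures coincide, whereas with random phase across trials evoked power collapses toward zero while total power is unchanged. This is why a cue-condition difference can appear in total alpha power without a corresponding difference in evoked power.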
Time–frequency representation of CTF slopes: total alpha power is sensitive to differences between broad and narrow cues. A, Identification of the cue location via evoked power (i.e., phase-locked to stimulus onset) across frequency bands as indexed by CTF slopes. B, Identification of the cue location via total power (i.e., power irrespective of phase relationship to cue onset) across frequency bands as indexed by CTF slopes. All nonsignificant values were set to zero in a two-step procedure. First, each individual data point was tested against zero with a paired-sampled t test. After setting nonsignificant values to zero, data were evaluated using cluster-based permutation. White outline denotes significant slope difference between narrow- and broad-cue conditions (p < 0.05).
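As a rough illustration of the cluster-based permutation logic, the sketch below runs a simplified one-dimensional, one-sided sign-flip test on per-participant slope time courses. It is not the procedure used for the time–frequency maps: the cluster-forming threshold (about the two-tailed critical t for 23 degrees of freedom), the number of permutations, the restriction to positive clusters along a single dimension, and the simulated data are all assumptions.

```python
import numpy as np

def t_against_zero(data):
    """Pointwise one-sample t values for data of shape (participants, timepoints)."""
    return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(data.shape[0]))

def cluster_masses(t_vals, threshold):
    """Summed t values within each run of contiguous supra-threshold points."""
    masses, current = [], 0.0
    for t in t_vals:
        if t > threshold:
            current += t
        elif current:
            masses.append(current)
            current = 0.0
    if current:
        masses.append(current)
    return masses

def cluster_permutation_p(data, threshold=2.07, n_perm=1000, seed=0):
    """p value of the largest positive cluster under random sign flips."""
    rng = np.random.default_rng(seed)
    observed = max(cluster_masses(t_against_zero(data), threshold), default=0.0)
    if observed == 0.0:
        return 1.0                      # no supra-threshold points at all
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Flip the sign of each participant's whole time course.
        signs = rng.choice([-1.0, 1.0], size=(data.shape[0], 1))
        null[i] = max(cluster_masses(t_against_zero(data * signs), threshold),
                      default=0.0)
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

# Simulated slopes: 24 participants, 60 timepoints, an effect at samples 20-39.
rng = np.random.default_rng(2)
slopes = rng.normal(0.0, 1.0, size=(24, 60))
slopes[:, 20:40] += 1.0
p_value = cluster_permutation_p(slopes)
```

Because the test statistic is the mass of the largest cluster rather than any single point, the familywise error rate is controlled across all time points without a pointwise correction.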
To better understand how the breadth of spatial attentional cues influenced spatial tuning, subsequent analysis focused on total power within the alpha-band (8–12 Hz). Figure 5A shows the slopes of the reconstructed tuning profile across time. The reconstructed CTFs were again fitted with an exponential cosine function following the same procedure used to analyze behavior. This was done in two separate 100 ms time windows, one relatively early following the cue (i.e., 150–250 ms) and one before probe display onset (i.e., 650–750 ms), mimicking the short and the long SOA in Experiment 1. As visualized in Figure 5B, reconstructed tuning profiles already showed differences between cueing conditions in the early interval, which appeared to become more pronounced over time. Indeed, cue condition by interval interactions confirmed that although overall tuning was attenuated over time, narrow cues yielded increasingly higher amplitudes (95% CI [0.04, 0.05]) and sharper tuning profiles (95% CI [1.97, 2.21]) compared with broad cues, with no such difference in baseline estimates (95% CI [−0.08, 0.05]).
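For readers unfamiliar with how a CTF and its slope are obtained, the following is a minimal sketch of an inverted encoding model on simulated alpha power. The basis function (a half-cosine raised to the seventh power, a shape commonly used in earlier alpha-band IEM work), the number of electrodes, the train/test split, and all function names are assumptions for illustration rather than a description of the pipeline used here.

```python
import numpy as np

N_LOC = 8  # number of locations and of hypothetical spatial channels

def basis_set(n=N_LOC, power=7):
    """basis[c, l]: idealized response of channel c to a stimulus at location l."""
    theta = np.arange(n) * 2 * np.pi / n
    diff = theta[:, None] - theta[None, :]
    return np.abs(np.cos(diff / 2.0)) ** power     # 1 where channel == location

def train_iem(power_train, loc_train, basis):
    """Least-squares weights (electrodes, channels) from training power (electrodes, trials)."""
    C1 = basis[:, loc_train]                       # predicted channel responses
    return power_train @ C1.T @ np.linalg.inv(C1 @ C1.T)

def invert_iem(weights, power_test):
    """Estimated channel responses (channels, trials) for held-out data."""
    return np.linalg.pinv(weights) @ power_test

def centered_ctf(channel_resp, loc_test, n=N_LOC):
    """Shift each trial so the cued channel sits at index n // 2, then average."""
    shifted = [np.roll(channel_resp[:, i], n // 2 - loc)
               for i, loc in enumerate(loc_test)]
    return np.mean(shifted, axis=0)

def ctf_slope(ctf):
    """Slope of response against closeness to the cued channel (larger = sharper)."""
    n, center = len(ctf), len(ctf) // 2
    offset = np.abs(np.arange(n) - center)
    offset = np.minimum(offset, n - offset)        # circular distance from center
    folded = [ctf[offset == d].mean() for d in range(center, -1, -1)]  # far to near
    return np.polyfit(np.arange(len(folded)), folded, 1)[0]

# Simulated data: 20 electrodes, 400 trials balanced over locations.
rng = np.random.default_rng(3)
basis = basis_set()
locs = np.tile(np.arange(N_LOC), 50)
true_weights = rng.normal(size=(20, N_LOC))
power = true_weights @ basis[:, locs] + rng.normal(0, 0.05, (20, locs.size))

weights = train_iem(power[:, :200], locs[:200], basis)
channel_resp = invert_iem(weights, power[:, 200:])
ctf = centered_ctf(channel_resp, locs[200:])
slope = ctf_slope(ctf)
```

A flat CTF yields a slope near zero, so the slope summarizes spatial selectivity in a single number per time point, while the exponential cosine fit separates that selectivity into amplitude, concentration, and baseline.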
Alpha-band IEM results of Experiment 2: narrow cues yield progressively sharper tuning profiles compared with broad cues. A, Average alpha-band channel–tuning function (CTF) across time for all eight locations. B, Alpha power CTF slopes tuned to the centroid of the cued locations across time. Colored dashed bars on the x axis (black; red) represent time points where conditions differ significantly from 0 after cluster correction (p < 0.05), and the black and red dashed line indicates cluster correction differences between conditions. C, Estimated channel responses for narrow- (red circles) and broad-cue (black squares) conditions for an epoch from 150 to 250 ms (left panel; early time window) and an epoch from 650 to 750 ms (right panel; late time window) postcue onset.
These neural findings provide a compelling neural signature that mirrors the behavioral patterns observed in both experiments while simultaneously revealing a more refined and sensitive index of attentional orienting. The parallel between the temporal dynamics of alpha-band tuning and behavioral performance suggests that IEMs can effectively track the evolving spatial distribution of attention as it dynamically adjusts to match cue characteristics. Notably, the neural data revealed that attentional tuning differences between narrow and broad cues emerge rapidly (within 150–250 ms postcue) and become increasingly pronounced over time—a trend consistent with the SOA-dependent effects observed behaviorally in Experiment 1, but captured with a more fine-grained temporal resolution by the alpha-band activity. This convergence between neural and behavioral measures underscores the sensitivity of alpha-band activity as an index of spatial attention, offering direct electrophysiological evidence for the dynamic and adaptive nature of reflexive attentional gradients that behavioral measures could only partially resolve.
Discussion
The present study investigated whether and how reflexive attentional gradients dynamically adjust to match the spatial distribution of exogenous cues. Across two experiments, we found compelling evidence that the breadth of attentional focus is not fixed but flexibly adapts to the characteristics of attentional cues, even when attention is captured involuntarily. In Experiment 1, we demonstrated behaviorally that narrow cues yield progressively sharper attentional gradients compared with broad cues, with this difference becoming more pronounced over time. Extending these findings, Experiment 2 provided neural evidence that alpha-band activity tracks these dynamic adjustments in attentional tuning, revealing that differences in spatial selectivity emerge rapidly and continue evolving. Such dynamic adjustment aligns with theories proposing that attention operates through multiple stages, with initial capture followed by more fine-tuned selection processes (Nakayama and Mackeben, 1989; Van Zoest et al., 2004).
A key innovation is the use of inverted encoding models to track not only the locus but also the breadth of reflexive attentional tuning with high temporal precision. Time–frequency analyses revealed that evoked power tracked the centroid of cued locations following cue onset across low frequencies (4–15 Hz), peaking in upper theta, but showed no differences between cue conditions. In contrast, total power—capturing ongoing, nonphase-locked dynamics—was most pronounced in the alpha-band, and critically, alpha-based reconstructions revealed reliable differences between cue conditions. However, because these analyses rely on temporally filtered data, which inevitably introduces temporal smearing, it is difficult to pinpoint the exact onset of these effects. Nevertheless, evoked alpha-band results suggest that despite clear differences in sensory input, initial attentional orienting did not differ markedly between conditions. If early attentional orienting had differed, we would have expected these differences to manifest in the evoked power, given its sensitivity to early, stimulus-locked processing.
These time courses stand in contrast to voluntary shifts of attention directed to either narrow or broad regions of space. A comparable study by Feldmann-Wüstefeld and Awh (2020), which used physically identical cues for narrow versus broad focus conditions, observed that attentional tuning required ∼200 ms to be instantiated and up to 600 ms for gradients to diverge. While such delayed divergence is expected following voluntary shifts of attention (Müller and Rabbitt, 1989), our exogenous attention paradigm revealed faster emergence of condition differences in ongoing alpha activity. Although our study focused specifically on exogenous attention and did not include a direct within-subject comparison with voluntary attention, these cross-study temporal differences suggest that the mechanisms governing attentional breadth may operate differently across attention systems. Intriguingly, despite these rapid condition differences, initial attentional orienting did not appear to differ between cue conditions. Although speculative, one potential explanation for this phenomenon is the well-established “global effect” in oculomotor research (Van der Stigchel and Nijboer, 2011). Just as saccades tend to land at the center of gravity of visual configurations (Coren and Hoenig, 1972; Findlay, 1982), covert attention may initially orient to the center of a stimulus array regardless of its spatial extent. This tendency has been attributed to the integration of activity across overlapping receptive fields in structures like the superior colliculus (Vokoun et al., 2014), resulting in a single central peak of activity. Our findings suggest that a similar mechanism may govern exogenous attention, whereby attention is initially drawn to a weighted spatial average of the cue configuration, with subsequent refinement of attentional breadth requiring additional processing time to properly match the spatial distribution of the eliciting stimuli.
This interpretation is consistent with the idea that exogenous shifts of attention in particular are associated with the execution of a saccadic eye movement (Smith et al., 2004). For example, Belopolsky and Theeuwes (2009) showed that voluntary attention shifts are not necessarily associated with eye movements, whereas exogenous shifts were unequivocally associated with saccade preparation [see also Smith et al. (2012) for a similar result].
These findings make significant contributions to our understanding of spatial attention. While the zoom lens model has been extensively validated in studies of voluntary attention (Eriksen and St. James, 1986; Castiello and Umiltà, 1990; Feldmann-Wüstefeld and Awh, 2020), our results provide novel evidence that involuntary shifts of attention exhibit similar flexibility in attentional scope. This speaks to an ongoing debate in attention research, where some have argued that involuntary attention primarily influences response efficiency rather than enhancing the quality of visual perception through sensory enhancement (Prinzmetal et al., 2005, 2009).
Wang et al. (2021) also proposed that alpha activity reflects a spatial attention mechanism operating like a zoom lens. However, their study focused on working memory, where memory arrays of varying spatial extent were presented after attention was voluntarily directed to one hemifield. Under these conditions, they observed a global reduction in alpha power. As such, the link to the zoom lens metaphor was, at best, indirectly inferred (Woodman et al., 2022), an interpretation that is difficult to support, given that global alpha reductions can result from a variety of nonspatial task demands. Indeed, global alpha power suppression has been linked to memory load (Fukuda et al., 2015), distractor suppression (Wang et al., 2021), and task difficulty in general (Klimesch, 1999), making it difficult to attribute such effects specifically to changes in attentional scope. It should be noted that the current study differs substantially from that of Wang et al. (2021) in both experimental design and theoretical focus. Rather than examining alpha modulations following the onset of a memory array after voluntary attentional shifts, as in Wang et al. (2021), our paradigm employed a reflexive precueing design with masked letter targets, allowing us to isolate involuntary shifts of attention and their immediate effects on sensory encoding. This approach directly tests whether exogenous attention can flexibly adjust spatial resolution—a hypothesis that has received comparatively little empirical support relative to its voluntary counterpart.
Moreover, the use of inverted encoding models applied to alpha-band activity represents a key methodological advance, as it provides a direct measure of attentional modulation at the level of sensory processing that is temporally precise and independent of behavioral responses (Samaha et al., 2016; Foster et al., 2017; Foster and Awh, 2018; van Moorselaar et al., 2018). The current EEG results revealed distinct alpha tuning profiles for narrow- and broad-cue conditions, demonstrating that involuntary attention, like voluntary attention, can operate in a spatially graded and flexible manner.
Importantly, the robustness of these neural effects contrasts with the relatively weak and inconsistent behavioral effects, particularly in Experiment 2. One potential explanation for this dissociation is offered by the normalization model of attention (Reynolds and Heeger, 2009), which proposes that attention can modulate visual processing through either response gain or contrast gain, depending on the relative size of the attention field and the stimulus. In the present study, where the luminance of targets and distractors was individually staircased, resulting in low-visibility stimuli, differences in the spatial extent of the attentional field (as induced by broad vs narrow cues) may have engaged distinct gain mechanisms, potentially reducing the observable differences in performance across conditions (Herrmann et al., 2010; Itthipuripat et al., 2014). Specifically, broader attention fields may have yielded a contrast gain effect that improved visibility across a wider area, while narrower fields may have resulted in more localized response gain—ultimately yielding comparable levels of performance. Moreover, the temporal dynamics of voluntary versus involuntary attention may also contribute to this behavioral–neural dissociation. While endogenous attention shows robust effects at longer intervals, exogenous attention is more transient, peaking ∼100–200 ms after cue onset (Müller and Rabbitt, 1989). Our 650 ms cue–target interval was chosen to isolate sustained attentional tuning changes while avoiding early sensory transients, but this timing may have occurred outside the optimal window for observing strong behavioral benefits of exogenous orienting. This temporal consideration may help explain why studies using similar paradigms with voluntary attention (Feldmann-Wüstefeld and Awh, 2020) observed more robust behavioral effects alongside their neural findings. 
Additionally, beyond the mechanisms proposed by the normalization model, it has also been shown that involuntary attention may be more sensitive to cue uncertainty than to cue size per se (Huang et al., 2016). It is also important to acknowledge that behavioral performance can be influenced by numerous factors unrelated to the spatial distribution of attention, such as higher-level strategies, response history, individual differences in experience and learning, and other sources of variability (Yeshurun, 2019), further obscuring subtle attentional effects. Taken together, these considerations help explain why strong condition effects emerged in the neural data but were less robust in behavior, and they underscore the value of electrophysiological measures, such as alpha-band IEMs, for revealing fine-grained attentional dynamics that behavioral metrics alone may not readily capture.
A noteworthy aspect of our findings is the absence of inhibition of return (IOR) in both the neural and the behavioral data. While IOR is typically observed at longer cue–target intervals, leading to suppressed processing at previously attended locations (Posner and Cohen, 1984; Klein, 2000), the alpha-band analyses provided no indication that the cued location became suppressed over time, nor was processing impaired at the cued location. However, IOR is most robustly observed in response time measures, particularly in simple detection tasks (Klein, 2000), and is considerably less reliable in accuracy-based tasks like ours (Samuel and Kat, 2003). Moreover, according to the reorienting hypothesis, IOR emerges only after attention has disengaged from the cued location, a process that often depends on a central fixation cue (Klein, 2000). Since our design did not include such a mechanism to actively reorient attention, the necessary conditions for IOR may not have been met, despite the involvement of exogenous attention. Finally, since IOR likely arises from oculomotor and motor preparation circuits (Sapir et al., 1999; Taylor and Klein, 2000), its effects may be confined to later processing stages or response selection rather than the early sensory modulation captured by alpha-band activity.
In conclusion, our findings demonstrate that reflexive attentional gradients adapt to match the spatial distribution of attentional cues, with narrow cues yielding sharper attentional gradients compared with broad cues. This flexibility in attentional scope, previously established in studies of voluntary attention, extends to situations where attention is captured involuntarily. The temporal dynamics, characterized behaviorally and through alpha-band neural activity, reveal continuous refinement of attentional selection following initial capture. These results provide insights into the mechanisms of spatial attention and highlight the dynamic nature of attentional control, even when attention is shifted reflexively.
Footnotes
This research was supported by a NWO VICI Grant to S.S. and a European Research Council (ERC) Advanced Grant (833029) to J.T. We thank Eleonora Assarioti for her invaluable assistance in data collection.
The authors declare no competing financial interests.
Correspondence should be addressed to Dirk van Moorselaar at dirkvanmoorselaar{at}gmail.com.











