Abstract
Research on feature-based attention has shown that selecting a specific visual feature (e.g., the color red) results in enhanced processing in early visual cortex, providing the neural basis for the efficient identification of relevant features in many everyday tasks. However, many situations require the selection of entire feature ranges instead of just a single feature value, and recent accounts have proposed that broadly tuned attentional templates are often critical for guiding selection in cluttered visual scenes. To assess the neural implementation of such broad tuning of feature-based attention, we recorded frequency-tagged potentials in human observers (male and female) while they attended to narrow or broad ranges of colors of spatially intermingled dot fields. Our results show clear increases in the signal strength for the attended colors relative to unattended colors for both narrow and broad color ranges, though this increase was reduced for the broad-range condition, suggesting that limits in the breadth of attentional tuning arise at early processing stages. Overall, the present findings indicate that feature-selective attention can amplify multiple contiguous color values in early visual cortex, shedding light on the neural mechanisms underlying broad search templates. More generally, they illustrate how feature-based attention can dynamically “zoom in” and “zoom out” in feature space, mirroring models of spatial attention.
Significance Statement
Many daily situations require the human brain to focus attention on entire sets of feature values, for example, when looking for apples in the supermarket, which may range from red to yellow to green. How is such broad selection of perceptually contiguous features accomplished? Using electroencephalography, we directly measured early visual processing while participants attended to different color ranges. Our results demonstrate that processing of entire sets of colors is enhanced in early visual cortex, though the magnitude of this enhancement is modulated by the selected range. This result is important for our understanding of how attention is allocated in complex visual scenes, in which relevant inputs are often variable and not defined by a single feature value.
Introduction
To deal with the overwhelming influx of sensory information, it is essential that our brain focuses its resources on behaviorally relevant information and prioritizes its processing. This process of attentional selection can operate based on spatial locations or visual features and results in enhanced processing of attended information in both cases (Carrasco, 2011).
Influential models of spatial attention have put a strong emphasis on the size of the attentional focus, stating that spatial selection can flexibly adjust to select smaller or larger regions of the visual field (Eriksen and St. James, 1986; Eriksen and Yeh, 1985). This concept of attention as a “zoom-lens” has been incorporated into more recent computational models of attention, such as the normalization model (Reynolds and Heeger, 2009), which allows the size of the attentional focus to be broadened or narrowed and has been useful in reconciling different effects of attention on stimulus processing (Herrmann et al., 2010; Itthipuripat et al., 2014). Importantly, this model assumes that the size of the selection window can be varied not only in location space but also in feature space, as it treats spatial and feature-based attention alike. Thus, within this framework, just as the spatial scale of selection can be adjusted, the scale of feature-based attention can be adjusted to include smaller or larger ranges of feature values. Consistent with this account of a flexible scaling mechanism, other recent work on visual search has proposed that attention can be guided by relatively coarse and broad feature templates, which are often sufficient to support the rapid localization of relevant items in cluttered visual scenes (Yu et al., 2023).
Thus, current models of attention often assume that selecting broad ranges of feature values is possible, yet empirical evidence for this is scarce. To date, only a few behavioral studies have examined it. In one study, participants were cued with either a high- or low-uncertainty orientation cue to broaden or narrow their focus of attention, respectively; the cue enhanced participants’ performance in an orientation discrimination task equally well regardless of cue uncertainty, suggesting that they allocated attention effectively in both conditions (Herrmann et al., 2012). Another study, in which participants were cued to attend to smaller or larger ranges of color values, demonstrated an impressive flexibility in participants’ ability to select broad ranges of colors with only minimal costs in performance (Chapman and Störmer, 2023). Together, these data are consistent with an attention system that scales its focus according to current task demands. However, a key question left unanswered by these behavioral findings is whether attending to a range of features is supported by enhanced processing in early visual cortex, as has been shown for single feature values (Müller et al., 2006; Andersen et al., 2008), or whether the selection of multiple feature values is implemented at later processing stages, with only individual feature values being enhanced at early processing stages.
We tested this by directly assessing early visual processing of attended and unattended feature values across different feature ranges using electroencephalography. We sought to address two questions: first, does feature-based attention increase the neural processing of a range of feature values, and second, how does the width of that range affect the magnitude of these modulations in early visual cortex? We used a sustained feature-based attention task in which we systematically varied the range of the to-be-attended colors and measured steady-state visual evoked potentials (SSVEPs) elicited by these stimuli. We found evidence consistent with attentional enhancement for both small and large color ranges, indicative of a flexible focus of feature-based attention that enhances the signal strength of contiguous neural representations in early visual cortex. We also found that the magnitude of this enhancement was reduced for wider feature ranges, suggesting that the limits of selecting broad ranges of feature values arise at early processing stages.
Materials and Methods
Participants
Based on pilot data and our previous behavioral study (Chapman and Störmer, 2023), we collected data from 30 undergraduate students (female and male) from Dartmouth College, aged between 18 and 35 years, who received course credit or monetary compensation ($20 per hour) for participation. Data from two participants were discarded from the final analyses after having >40% of trials rejected due to artifacts. Data from one more participant were discarded for not showing clear peaks at the stimulation frequencies. All participants had normal or corrected-to-normal color vision, as assessed via Ishihara’s plates for color deficiencies (Clark, 1924). Prior to the experiment, participants signed an informed consent form as approved by the Institutional Review Board at Dartmouth College. The study protocol was approved by the Committee for the Protection of Human Subjects.
Stimuli
Stimuli were presented on a black background (RGB: [0, 0, 0]). A fixation stimulus consisting of an inner circle surrounded by four quarter-circle wedges that formed a crosshair shape (“bull's eye and cross hair”; Thaler et al., 2013) was presented at the center of the screen throughout the experiment. Surrounding this fixation stimulus, two spatially intermingled arrays of colored circular dots were presented. The dots were randomly placed within an invisible circular aperture whose inner and outer borders were 0.75° and 4° of visual angle from the fixation point. There were a total of 360 dots (radius, 0.25°), and the dots' colors were sampled from an isoluminant color wheel in CIELab space drawn with a radius of 49 units around the white point L*a*b* = [54, 21.5, 11.5]. For half of the dots, the colors were chosen from a uniform distribution spanning either 20° (narrow-focus) or 60° (broad-focus) along the color wheel, resulting in a mix of perceptually contiguous color values. For the remaining half of the dots, a single color value was chosen that was always maximally distinct from the first color array, namely, 180° away on the color wheel from the central color of the chosen range. Across the experiment, we used six different color bins sampled along the color wheel (the CIELab coordinates of the mean color values were [54.8, 72.3, 33.0], [54.6, 22.1, 57.2], [54.0, −26.4, 26.5], [53.5, −18.0, −23.0], [53.9, 42.3, −30.8], and [54.8, 77.0, 13.7]; illumination equivalent to D50). The spatial positions of all colored dots were randomly intermixed. During each trial, the dots moved in a haphazard fashion at a speed of 0.7°/s, changing direction randomly every 0.05–0.12 s (sampled from a uniform distribution). If a dot moved out of the aperture due to the motion parameters, it was immediately redrawn at a random position.
Across the experiment, the two arrays flickered at distinct frequencies (8.57 Hz or 7.5 Hz) to elicit separable steady-state visual evoked potentials. Frequencies were counterbalanced across target and nontarget arrays.
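As an illustration, the color sampling described above can be sketched as follows. This is a simplified sketch in hue-angle space; the function name, default dot count, and random-number handling are our own, and the actual experiment mapped these angles onto the 49-unit CIELab wheel described above.

```python
import numpy as np

def sample_trial_hues(center_deg, range_deg, n_target=180, rng=None):
    """Sample hue angles (in degrees along the color wheel) for one trial.

    Target dots: drawn uniformly from a window of `range_deg`
    (20 for the narrow focus, 60 for the broad focus) centered on
    `center_deg`. Distractor dots: a single hue 180 degrees away
    from the center of the target range.
    """
    rng = np.random.default_rng() if rng is None else rng
    half = range_deg / 2.0
    target_hues = (center_deg + rng.uniform(-half, half, n_target)) % 360
    distractor_hue = (center_deg + 180.0) % 360
    return target_hues, distractor_hue
```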
Procedure
All stimuli were presented on a ViewPixx monitor (1,920 × 1,080 pixels at 120 Hz; VPixx Technologies) placed 38 cm from the participant, whose head was stabilized by a chin rest. Each trial started with a target display in which only the to-be-attended target array was presented in color whereas the to-be-ignored array was shown in gray (RGB: [129, 129, 129]), ensuring that participants knew which colors to attend, despite these colors varying in their values across different ranges. After 400 ms, the nontarget dots turned colorful, and 300 ms later the testing interval (2 s) began as each array started flickering at a distinct frequency. Participants were instructed to selectively attend to the target array based on the colors shown in the target display and to monitor for a brief interval of coherent motion (250 ms), during which 80% of the dots moved in one of four directions (up, down, left, right). The coherent motion event occurred at a randomly selected time, with the constraint that it could not happen within the first 250 ms or the last 300 ms of the testing interval. On half of the trials, the dots in the target array moved coherently, and on the remaining half, the dots in the nontarget array moved coherently. After each trial, participants were first asked to report whether the target array contained a coherent motion event by pressing one of two keys (“Z”: yes; “X”: no). If they responded “yes,” regardless of the accuracy of this initial response, they were then asked to indicate the direction of motion using the arrow keys. Participants were told that the first question was the principal measurement and were asked only to try their best on the second question. After the response(s), the next trial started following an intertrial interval of 750 ms. This experimental design was based on prior work from our lab that used this task in a series of behavioral experiments showing that participants attend to the full color range.
Specifically, in a previous experiment (Exp. 4 of Chapman and Störmer, 2023), the to-be-detected target motion could appear either in the center of the color distribution or at the edge. Participants performed equally well across both of these conditions, indicating that they distributed attention across the entire color range and did not only select the mean color.
The experiment consisted of 25 blocks of 16 trials each, resulting in 400 trials overall. Participants received feedback about their overall detection rate (hits and false alarms) and direction accuracy after each experimental block. An experimenter closely monitored participants' performance and introduced 1–3 min rest periods if the hit rate was below 60% and/or the false alarm rate was above 20%. The number of trials was counterbalanced across focus conditions and flicker frequencies, and all trial types occurred in random order throughout the experiment.
Prior to the main experiment, participants completed a short practice run of the task, followed by a thresholding procedure in which the speed of the motion event was varied to adjust task difficulty. The trial structure of the thresholding task was identical to the main session except for the response window, in which participants were asked either to report the direction of the motion event if the target array moved or to press “X” if the distractor array moved. There were 96 trials per focus condition, in only half of which the target array contained the motion event. A 1-up-3-down algorithm (Prins and Kingdom, 2018) updated the speed of the coherent motion after every target event. When participants reported the correct direction of the target motion, the response was counted as correct; all other responses were recorded as incorrect. Two separate thresholding algorithms were implemented, one for each focus condition, and the trials were randomly intermixed. Participants did not receive any feedback during this thresholding task. For each participant and focus condition, the average speed across the last five trials was used as the speed of the coherent motion in the main experiment.
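The 1-up-3-down rule can be sketched as follows. This is a minimal sketch assuming that slower coherent motion is harder to detect; the step size, floor, and function name are illustrative and not the study's actual parameters.

```python
def staircase_update(speed, correct, streak, step=0.1, floor=0.05):
    """One update of a 1-up-3-down staircase on motion speed (deg/s).

    After three consecutive correct responses the speed is lowered
    (making the task harder); after any incorrect response it is
    raised (easier). This rule converges on ~79% correct
    (Prins and Kingdom, 2018).
    Returns the updated (speed, streak) pair.
    """
    if correct:
        streak += 1
        if streak == 3:
            return max(floor, speed - step), 0  # 3 down: make harder
        return speed, streak
    return speed + step, 0                       # 1 up: make easier
```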
Statistical analyses
All statistical analyses were performed in R v4.3.1 (R Core Team, 2023). The data and analysis scripts are available on OSF (https://osf.io/qx94k/?view_only=e256bebd6e7c4f888a696fb35919913d).
Behavioral analysis
Hit rates and false alarm rates were computed from the detection responses (the first response on each trial), such that a “yes” response was counted as a hit when participants correctly reported that the target array moved and as a false alarm when the motion event occurred in the distractor array. D-primes (d′) were calculated and corrected according to the log–linear rule (Hautus, 1995) for each focus condition separately. Next, we fit a linear mixed-effects model (lmerTest package in R; Kuznetsova et al., 2017) with the following formula: d′ ∼ focus + (1|participant).
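The log–linear correction can be sketched as follows (a minimal sketch using only the Python standard library; the function name is ours):

```python
from statistics import NormalDist

def dprime_loglinear(hits, misses, fas, crs):
    """d' with the log-linear correction (Hautus, 1995).

    0.5 is added to the hit and false-alarm counts and 1 to the
    number of signal and noise trials before computing the rates,
    which keeps d' finite even at hit/false-alarm rates of 0 or 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (fas + 0.5) / (fas + crs + 1)
    return z(hit_rate) - z(fa_rate)
```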
First, to test whether participants were able to reliably identify the target events in the target array in each condition, we tested the overall d-primes against the intercept. We report t-tests with degrees of freedom estimated using Satterthwaite's method. Second, we ran a t-test to check whether d-primes differed between the focus conditions. Similarly, we ran a t-test to compare the threshold speeds obtained prior to the main experiment between the focus conditions. Accuracies for the direction discrimination responses in the main task were calculated separately for hit trials (correct detection of a motion event in the target array) and false alarm trials (reported motion that did not occur in the target array). To test the effects of cueing and focus condition on accuracy for these responses, we fit a linear mixed-effects model with the following formula: accuracy ∼ cue * focus + (1|participant).
The fitted mixed models were then used to conduct a two-way repeated-measures analysis of variance (ANOVA). When results were significant, we conducted follow-up tests via the emmeans package (Lenth, 2017). First, we performed a series of pairwise t-tests with Tukey adjustment contrasting the model coefficients for the significant factors. We also report the estimated marginal means (βEMM) with standard error (SE) and estimated 95% confidence intervals (CI). For the paired t-tests, we report the differences between estimated marginal means as βcontrast per contrast. For all t-tests, effect sizes were estimated by converting the t values to Cohen's d (effectsize package in R; Ben-Shachar et al., 2020). Finally, we report Bayes factors (BF) for the linear mixed-effects models via the BayesFactor package in R (Morey et al., 2018), comparing the full model to a reduced model with the associated term removed. The Bayes factor quantifies the strength of evidence as the ratio of the marginal likelihoods of two hypotheses for a given dataset; because it is a ratio of likelihoods, BFs larger than 1 indicate greater support for the full model over the reduced model.
Our behavioral models did not include the richer random-effects structure used for the EEG data (see below) because more complex models failed to converge, likely due to an insufficient number of observations: the behavioral models were fit to summary statistics (e.g., accuracy and d′) collapsed across multiple trials, whereas the EEG models were fit to single-trial data.
Electrophysiological recordings and analysis
EEG was continuously recorded from 32 Ag/AgCl electrodes mounted in an elastic cap, and the signal was amplified by a BrainVision ActiCHamp amplifier (Brain Products). The electrode configuration was based on the 10–20 system, except that three frontal electrodes (FC5, FCz, FC6) were reconfigured and placed below the occipital electrodes (as I3, Iz, I4) to increase spatial sampling over the posterior scalp. The horizontal electrooculogram (HEOG) was recorded through a bipolar pair of electrodes positioned next to the lateral ocular canthi and grounded with an additional electrode placed at the right side of the neck. Electrode FP1 (above the left eye) was used to measure the vertical electrooculogram (VEOG) to detect blinks. The scalp electrodes were referenced online to an electrode on the right mastoid. The recordings were sampled at 500 Hz with an online high-pass filter of 0.01 Hz and a low-pass filter of 250 Hz. All impedances were kept below 10 kΩ. To avoid artifacts from blinks and eye movements, participants were instructed to keep their eyes on the fixation stimulus and to avoid blinking throughout each trial. The experimenter monitored the EEG and HEOG recordings and gave participants feedback if blinks or eye movements were noticed during the recording.
Each participant's EEG data was preprocessed using the packages MNE (Gramfort et al., 2013) and autoreject (Jas et al., 2017) in Python 3. In the first step, we detected noisy channels using a random sample consensus algorithm (RANSAC; Fischler and Bolles, 1981) and checked for electrodes that were bridged by gel using the intrinsic Hjorth algorithm (Tenke and Kayser, 2001; Greischar et al., 2004). Noisy or bridged channels flagged by these algorithms were visually inspected and then excluded from further analysis. None of the central-occipital channels used in the main analysis were affected by this. To correct for eye artifacts (blinks and horizontal eye movements), we implemented an independent component analysis (ICA) with the number of components set to one fewer than the number of available channels after exclusion. The resulting components and their scalp distributions were visually inspected. If distinct components for blinks or eye movements were detected, the corresponding components were excluded and the EEG signal was reconstructed (no more than one component per participant was excluded). To ensure that the number of blinks did not differ reliably between the conditions of interest, which could affect the SSVEP amplitudes, we calculated the number of blinks detected per condition prior to the ICA and found averages of 5.9 ± 1.82 and 6.57 ± 2.15 in the narrow- versus broad-focus condition, respectively (p = 0.44). Excluded channels, if any, were then interpolated via spline interpolation, after which the EEG data was referenced to the average activity of all electrodes and low-pass (70 Hz) and high-pass (0.1 Hz) filtered using a finite impulse response (FIR) filter. EEG data were then epoched from 100 ms before to 2,000 ms after flicker onset, and each epoch was baseline-corrected to the −100 to 0 ms window relative to stimulus onset and linearly detrended.
Finally, the autoreject algorithm was used to detect and then drop (mean ± SD = 9.4% ± 6.4% of epochs) or repair (<1% of the data) noisy epochs based solely on the occipital channels.
For the main SSVEP analysis, we first averaged across five occipital electrode sites (POz, O1, Oz, O2, Iz). These electrodes were selected a priori based on a series of pilot studies which used similar stimuli and showed that these electrodes yielded the highest amplitudes at the stimulation frequencies. Data epochs from 250 ms poststimulus onset to 2,000 ms were zero-padded (extending the time points from 875 to 1,051), and a Fast Fourier transform algorithm was implemented (frequency resolution: 0.475 Hz). To determine the peak amplitudes associated with each stimulation frequency, we used a search window (± 0.475 Hz) around each stimulation frequency and selected the maximum value within that window. However, since the two frequencies were separated by 0.95 Hz only, to prevent double assignment, the algorithm assigned the peak frequency to the slower stimulation frequency first, after which the assigned bin was removed from the search window if it coincided with the faster frequency's search window. The search algorithm was implemented for each trial separately.
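The peak-search procedure can be sketched as follows. This is a simplified single-channel version; in the actual pipeline the five occipital electrodes were averaged first, and the function name and defaults here are illustrative.

```python
import numpy as np

def ssvep_peaks(epoch, sfreq=500.0, n_fft=1051, freqs=(7.5, 8.57), win=0.475):
    """Single-trial SSVEP peak amplitudes.

    The epoch is zero-padded to `n_fft` samples and Fourier
    transformed; for each stimulation frequency, the maximum
    amplitude within +/- `win` Hz is taken. The slower frequency is
    assigned first, and its bin is then removed from the faster
    frequency's search window to prevent double assignment.
    """
    amp = np.abs(np.fft.rfft(epoch, n=n_fft)) / len(epoch)
    freq_axis = np.fft.rfftfreq(n_fft, d=1.0 / sfreq)
    taken, peaks = set(), {}
    for f in sorted(freqs):  # slower stimulation frequency first
        candidates = [i for i in np.where(np.abs(freq_axis - f) <= win)[0]
                      if i not in taken]
        best = max(candidates, key=lambda i: amp[i])
        taken.add(best)
        peaks[f] = amp[best]
    return peaks
```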
To test the effects of attentional cueing and focus condition, we fit a linear mixed-effects model. We used a mixed-effects model to account for within-subject variability in SSVEP amplitudes across stimulation frequencies, specific color values, and experimental blocks. Hence, we included random intercepts for stimulation frequency, target color center, and experimental block, resulting in the following linear mixed-effects model: amplitude ∼ cue * focus + (1|participant / (frequency + color + block)).
The operator “/” indicates that the random intercepts for the terms within the parentheses were estimated nested within participants. To prevent overfitting and to determine which random intercepts improved the model fit, we added each random effect one by one in the given order and ran a series of likelihood ratio tests comparing nested models. If a pairwise comparison returned a significant result (p < 0.05), the more complex model was carried forward to the next comparison; otherwise, the simpler model was retained. In this forward random-effects selection procedure, the model including all of the random effects (see above) produced the best fit, with the lowest Akaike information criterion (AIC), the lowest Bayesian information criterion (BIC), and the highest log-likelihood. The rest of the statistical pipeline was the same as for the behavioral data analyses.
Finally, for plotting the SSVEP results, we first averaged the amplitudes for each level of the fixed- and random-effects factors per participant. Each amplitude was then divided by the mean of itself and the corresponding amplitude from the contrasting cue condition, with all other factors held constant [e.g., FFT(distractor, narrow, 8.57 Hz, 1st block, 3rd color center) divided by the mean of itself and FFT(target, narrow, 8.57 Hz, 1st block, 3rd color center)], yielding attention indices that were subsequently collapsed across the random-effects factors per participant. Values above 1 indicate attentional modulation. Error bars indicate ±1 within-subject standard error (Morey, 2008).
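The attention-index normalization can be sketched as follows (a minimal sketch for one matched pair of conditions; the function name is ours):

```python
def attention_index(target_amp, distractor_amp):
    """Normalize a matched pair of FFT amplitudes by their mean.

    Each amplitude is divided by the mean of the target and
    distractor amplitudes with all other factors held constant,
    so values above 1 indicate attentional enhancement and the two
    indices average to exactly 1.
    """
    pair_mean = (target_amp + distractor_amp) / 2.0
    return target_amp / pair_mean, distractor_amp / pair_mean
```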
Results
Behavior
Across all measurements we found that participants were able to attend to a range of colors effectively and that there was no reliable cost of attending to the larger versus the narrower color range. Figure 1C shows the motion speed thresholds derived from the initial thresholding procedure, separately for each focus condition. There was no significant effect of focus condition (mean ± within-subject standard error, narrow: 1.32 ± 0.02°/s vs broad: 1.36 ± 0.02°/s; t(17) = 1.36, p = 0.186, BF = 0.54). Figure 2A shows the d′ values from the detection response across the two focus conditions. The d′ values were significantly above 0 (β ± SE = 1.32 ± 0.09, t(27) = 3.61, p < 0.002, Cohen’s d = 1.39), indicating that participants reliably detected events in the target array; there was no effect of focus condition (t(27) = 0.26, p = 0.79, BF = 0.28). Next, we conducted a two-way repeated-measures ANOVA on the discrimination accuracy data (participants’ secondary response); participants were more accurate for the target array than for the distractor array (target events: βEMM ± SE = 96.4% ± 2.06% vs distractor events: 84.4% ± 2.06%; F(1,81) = 61.26, p < 10−10, BF = 123.11, Cohen’s d = 0.84; see Fig. 2B), demonstrating that participants selectively attended the target array relative to the distractor array. There was again no effect of focus condition (F(1,81) = 0.16, p = 0.69, BF = 0.29) and no interaction (F(1,81) = 0.41, p = 0.52, BF = 0.19). These behavioral results indicate that participants attended to the target arrays as instructed and responded effectively to target events in both the narrow- and broad-focus conditions. The critical next question was whether these behavioral effects were reflected in increased signal strength of the neural representations in early visual cortex.
Trial structure, stimulus selection, and behavioral results from the thresholding task. A, Each trial started with a target display in which only the target array was presented in color whereas the distractor array was presented in gray. After a short period, the testing interval began as each array started flickering at either 8.57 or 7.5 Hz, counterbalanced across conditions. At a random point in time, either the target or the distractor array moved briefly in one of four directions. At the end of the testing interval, participants reported whether the target array moved by pressing one of two keys. If they responded that they saw coherent motion, they were asked to indicate the motion direction. B, The colors of each dot array were selected from an isoluminant color wheel in CIE L*a*b* space and sampled from six color ranges that extended either 20° (narrow attentional focus) or 60° (broad attentional focus). The distractor array always contained one homogeneous color that was 180° away from the center of the target colors. C, Results from the thresholding task, in which the speed of the motion events was adjusted to ensure that task difficulty was at a similar level across participants and conditions.
Task performance. A, Performance was well above chance in both color range conditions. B, Participants were more accurate in reporting the motion direction of the target array (i.e., hits) compared with that of the distractor array (i.e., false alarms). There was no main effect of focus condition and no interaction.
Steady-state visual evoked potentials
As can be seen in Figure 3A, which depicts the FFT amplitudes for each frequency collapsed across the two focus conditions, the flickering arrays elicited clear steady-state visual evoked potentials (SSVEPs), and the amplitude of the SSVEP elicited by the attended array was larger than that elicited by the unattended array. Figure 3B displays the scalp distributions of the average SSVEP amplitudes separately for each stimulation frequency and focus condition, showing the clear occipital locus of the SSVEP amplitudes. Finally, Figure 3D shows the estimated marginal means and 95% confidence intervals from our statistical model. Our analyses confirmed a significant cueing effect such that amplitudes were larger for the target array (βEMM ± SE = 0.51 ± 0.034 µV, CI = [0.44, 0.58] µV) relative to the distractor array (0.49 ± 0.034 µV, [0.42, 0.56] µV; F(1,19451) = 67.84, p < 10−15, Cohen’s d = 1.62). There was no significant main effect of focus condition (p = 0.63, BF = 0.02), but there was a significant interaction between the focus and cue conditions (F(1,19451) = 9.68, p < 0.002), such that the attentional modulation was stronger in the narrow relative to the broad-focus condition. To understand this interaction in more detail, we conducted a series of post hoc pairwise t-tests comparing each level of the cue and focus conditions, which revealed a significant cueing effect in the narrow-focus condition (βcontrast ± SE = 0.028 ± 0.004 μV, t(19458.62) = 8.02, pTukey < 10−8, Cohen's d = 1.57) and a reliable, but smaller, cueing effect in the broad-focus condition (0.012 ± 0.004 μV, t(19457.1) = 3.62, pTukey < 0.002, Cohen's d = 0.71). We also tested whether the condition differences were driven by differences between the target amplitudes or the distractor amplitudes.
We found no difference between the two distractor arrays (pTukey = 0.26) but found a trend such that the FFT amplitudes associated with the target array were higher in the narrow- versus broad-focus condition (0.009 ± 0.004 μV, t(19958.07) = 2.53, pTukey = 0.056, Cohen's d = 0.50).
SSVEP data and results. A, Average FFT amplitudes collapsed across focus conditions, shown separately for each target frequency (7.5 Hz in red, 8.57 Hz in blue). When participants attended the target array, the amplitude at the target flicker frequency increased compared with when the same frequency was associated with the distractor array. For illustration purposes, epochs with the same target frequency were collapsed and zero-padded before implementing the FFTs. B, Average scalp distributions of the FFT amplitudes, separately for each stimulation frequency and focus condition and collapsed across the cue conditions. Each stimulation frequency and focus condition produced a similar scalp distribution, with amplitudes peaking over central-occipital electrodes. C, Normalized attention effects per focus and cue condition. The target arrays induced higher amplitudes than the distractor arrays, and this effect was more pronounced in the narrow relative to the broad-focus condition. Error bars indicate ±1 within-subject standard error. D, Estimated marginal means and 95% confidence intervals. The thick middle line shows the estimated marginal means (EMMs), the boxes span ±1 standard error from the EMMs, and the whiskers denote the estimated 95% confidence intervals. As with the normalized amplitudes, the target SSVEPs were higher than the distractor SSVEPs, and this attentional modulation was larger in the narrow versus the broad-focus condition. Post hoc analyses directly comparing the two target arrays and the two distractor arrays suggested that the difference between conditions was more prominent between the target amplitudes, with less of a difference between the distractor arrays. The y-axis is compressed between 0 and 0.4 μV for illustration purposes.
Finally, we assessed whether the difference in attentional modulation between the broad and narrow cueing conditions would be captured by a simple resource model of attention, which assumes that attentional modulations are linearly proportional to the increase in color range. Specifically, as the color range increases by a factor of 3 (from 20 to 60°), such a model predicts that the attention effect should also change by a factor of 3. This is not what we found: when calculating baseline-corrected amplitudes, (target − distractor)/distractor, we observed a ∼8% amplitude increase in the narrow condition and a ∼4.7% increase in the broad condition (a reduction by a factor of ∼1.7, not 3). This is consistent with more complex models of attentional capacity and resource allocation, in which limits are determined not by the number of features or the degrees along a color wheel but by perceptual grouping and Gestalt principles (Chapman and Störmer, 2023, 2024; Driver et al., 2001). Overall, our results indicate that color processing of targets was reliably enhanced in early visual cortex across both range conditions but that this enhancement was somewhat reduced for the larger color range of 60°.
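This comparison can be verified directly (the gain values are taken from the percentages reported above):

```python
# Baseline-corrected attentional gain: (target - distractor) / distractor
narrow_gain = 0.080  # ~8% enhancement for the 20-deg range
broad_gain = 0.047   # ~4.7% enhancement for the 60-deg range

observed_reduction = narrow_gain / broad_gain  # ~1.7-fold reduction
predicted_reduction = 60 / 20                  # 3-fold, per a simple resource model
```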
Discussion
The current study investigated the flexibility of feature-based attention and its effects on early visual-cortical processing. We found that selecting both broader and narrower ranges in color space resulted in an increased signal for the attended colors at early visual processing stages, demonstrating that attention can be tuned effectively to contiguous regions in feature space. We also found that this enhancement was reduced for the broader color range (60°), indicating that there are limits in how broadly attention can be tuned and that these limits arise at an early sensory processing stage. The observed decrease in attentional modulation for larger ranges parallels models of spatial attention that have long assumed a flexible focus that can include smaller or larger regions in the visual field, but where attentional benefits also decrease as a function of the size of the attended area (Eriksen and St James, 1986; Castiello and Umiltà, 1990).
The cost of broadening attention in color space could arise for multiple reasons. One possibility is that attention expanded its focus inefficiently, such that only a subset of the to-be-attended colors was enhanced. In this case, the SSVEP amplitude in the broad-focus condition would reflect a heterogeneous population of neurons comprising a mixture of those that were modulated by attention and those that were not, resulting in an overall weaker attentional gain relative to the narrow-focus condition. Another possibility is that the full range of colors was selected successfully but that the overall gain was reduced because resources needed to be spread more widely. This interpretation is consistent with a recent behavioral study that used a similar stimulus and task as here, but in which the target event (coherent motion) could occur either in the middle or outer part of the selected color range (Exp. 4; Chapman and Störmer, 2023). Results showed similar discrimination performance regardless of where in feature space the target event occurred, suggesting that attention was distributed relatively uniformly along the entire feature range (even when target colors spanned 120° along the color wheel). Given the similarities between this previous behavioral experiment and the current design, we believe it is most likely that participants attended to the entire range in the current study and not just the mean color. This interpretation is also consistent with the normalization model of attention, which proposes that neural responses depend on the distribution of attentional resources across spatial and feature dimensions (Reynolds and Heeger, 2009). According to the model, the stimulus input is multiplied by attentional gain but subsequently undergoes divisive normalization by the activity pooled across the population.
When the attentional field is increased (i.e., the focus is broad), the normalization model predicts that overall attentional efficiency will be reduced, a prediction borne out in previous behavioral studies (Herrmann et al., 2010). The present finding of a reduced SSVEP response in the broad-focus condition is consistent with this account: when attentional resources are spread across a broader range of task-relevant colors, the magnitude of normalization applied to the population increases, decreasing the overall neural response. However, because our experiment varied both the size of the attentional focus and the range of visual features present in each attention condition, it leaves open the possibility that the observed effects are, at least in part, driven by a change in the stimulus input and not solely due to differences in attention. Future studies could systematically vary the size of the attentional focus and the size of the stimulation field in feature space to test the full range of predictions from the normalization model of attention.
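This prediction can be illustrated with a minimal one-dimensional sketch of divisive normalization in feature space, in the spirit of Reynolds and Heeger (2009). The Gaussian tuning widths, gain value, and normalization constant below are arbitrary illustrative choices, not fitted parameters, and the normalization pool is simplified to a global mean over the population.

```python
import numpy as np

# Minimal 1-D sketch of divisive normalization applied to color space.
# All parameters are illustrative, not fitted to data.

def population_response(attn_width, stim_width=20.0, gain=1.0, sigma=0.1):
    """Response of a population of color-tuned neurons under attention.

    attn_width : width (deg) of the attentional field in color space.
    stim_width : width (deg) of the stimulus drive in color space.
    """
    features = np.linspace(-180, 180, 361)  # preferred colors (deg)
    stim = np.exp(-0.5 * (features / stim_width) ** 2)               # stimulus drive
    attn = 1.0 + gain * np.exp(-0.5 * (features / attn_width) ** 2)  # attention field
    excitatory = stim * attn  # multiplicative attentional gain
    # Divisive normalization by the pooled (mean) population activity.
    return excitatory / (sigma + excitatory.mean())

narrow = population_response(attn_width=20.0)  # narrow attentional focus
broad = population_response(attn_width=60.0)   # broad attentional focus

# With a broader attentional field, the normalization pool grows while the
# peak excitatory drive does not, so the peak normalized response is reduced.
print(narrow.max(), broad.max())
```

In this toy model, widening the attention field leaves the peak excitatory drive unchanged but enlarges the normalization pool, so the peak normalized response drops, qualitatively matching the reduced SSVEP modulation in the broad-focus condition.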
Our findings also provide important insights for theories of visual search, and in particular recent accounts arguing that attentional templates are highly flexible and can be tuned more narrowly or broadly depending on the visual scene and the characteristics of the target item (Yu et al., 2023). For example, participants can maintain highly precise templates when searching for a very specific item with low uncertainty but can adjust this template to be broadly tuned to allow for some uncertainty around the potential target item (Geng and Witkowski, 2019). The present results provide the neural basis for how such broad feature templates may be implemented in early visual cortex, demonstrating that entire feature ranges can be enhanced in parallel. Importantly, our data indicate that such broadly tuned selection does not need to be diffuse but can be efficient and precise, encompassing all relevant feature values. Such a mechanism that allows broadly tuned enhancement appears particularly useful when searching for target stimuli with higher feature variability and uncertainty, which is common in many everyday situations, especially in the case of color where lighting conditions or the surrounding scene can change their appearance (Brainard et al., 2006; Lafer-Sousa and Conway, 2017).
The present study relates to other work investigating the ability to select multiple feature values at once, for example, by cueing participants to select two separate features (e.g., red and green). Several studies have found behavioral and neural costs in attending to two versus one feature (Liu and Jigo, 2017; Liu et al., 2013; Andersen et al., 2013; Störmer and Alvarez, 2014), suggesting that there are stark limits in how many features can be selected at once (see also Huang and Pashler, 2007). In one recent study (Martinovic et al., 2024), feature-based attention was directed to a single target color or divided between distinct colors on opposite sides of color space (red and green, 180° apart). The results showed that SSVEP amplitudes were approximately halved in the two-target condition, suggesting that attentional resources were divided roughly equally across the distinct targets. In contrast, the largest distance between target colors in our study was 60° (although most targets were more similar, due to our uniform sampling within this range). Further, the attention effects were not linearly proportional across the two conditions: while the attentional focus was three times larger in the broad relative to the narrow-focus condition (in terms of degrees along the perceptually uniform color wheel), the decrease in SSVEP modulation was much smaller. This may suggest that attending to contiguous feature distributions allows attention to spread more easily across ranges of colors than selecting two noncontiguous, distinct color values. Specifically, assuming that similar colors are represented near each other in color-sensitive regions of visual cortex (Brouwer and Heeger, 2009; Conway et al., 2007; Bohon et al., 2016), it seems plausible that attention can effectively enhance colors that are represented along continuous parts of this well-organized feature map.
This is consistent with the idea that selective attention operates over perceptual groups and that selection limits are determined by grouping principles rather than a certain number of features/items (Driver et al., 2001; Chapman and Störmer, 2024).
While the current study demonstrates clear changes in early attentional modulations due to the size of the focus of feature-based attention, we did not observe any behavioral differences between the two focus conditions. While several previous studies have shown strong links between SSVEP modulations and behavioral measures (Andersen and Müller, 2010; Andersen and Hillyard, 2024; Störmer et al., 2013), others have not found such clear relations between SSVEP amplitude changes and performance (Andersen et al., 2013; Gundlach et al., 2024; Störmer et al., 2014), as in the present study. For example, a recent study reported no link between changes in SSVEP amplitudes and performance but instead showed that modulations in the alpha-band predicted behavioral performance in a spatial cueing task (Gundlach et al., 2024); this was interpreted as alpha oscillations being possibly related to processes of sensory read-out, whereas SSVEP signals reflect early sensory processing, which might limit their ability to predict behavior in the same way (Gundlach et al., 2024; see also Zhigalov and Jensen, 2020). These previous studies, together with the present results, support a framework of attention in which selection operates across multiple processing stages, not all of which are necessarily related to a specific behavioral outcome in the same way (Maunsell, 2015; Buschman, 2015; Störmer et al., 2014). Yet, even though we found no clear behavioral differences between conditions in the present study, we believe the current finding of reduced attentional gain for larger feature ranges likely lays the foundation for behavioral costs that have been observed when ranges are much larger, for example, when participants are cued to attend to ranges of 90° or more along the color wheel (Chapman and Störmer, 2023).
Overall, the present study demonstrates that feature-based attention can enhance color signals in early visual cortex across entire ranges of task-relevant features with high efficiency, providing the neural foundation of how highly variable visual information can be selected flexibly in complex visual scenes. Our findings also add to models of selective attention that assume joint principles for location- and feature-based attention, highlighting the dynamic flexibility by which selective attention can “zoom in and out” in feature space.
Footnotes
We thank the research assistants Luke B. Putelo and Justin A. Santana for their help with data collection. This work was supported by grants from the National Science Foundation (BCS-1850738) and the National Institutes of Health (R01MH133689) to V.S.S.
The authors declare no competing financial interests.
Correspondence should be addressed to Viola Störmer at Viola.S.Stoermer{at}dartmouth.edu.