Abstract
Covert spatial attention has a variety of effects on the responses of individual neurons. However, relatively little is known about the net effect of these changes on sensory population codes, even though perception ultimately depends on population activity. Here, we measured the EEG in human observers (male and female), and isolated stimulus-evoked activity that was phase-locked to the onset of attended and ignored visual stimuli. Using an encoding model, we reconstructed spatially selective population tuning functions from the pattern of stimulus-evoked activity across the scalp. Our EEG-based approach allowed us to measure very early visually evoked responses occurring ∼100 ms after stimulus onset. In Experiment 1, we found that covert attention increased the amplitude of spatially tuned population responses at this early stage of sensory processing. In Experiment 2, we parametrically varied stimulus contrast to test how this effect scaled with stimulus contrast. We found that the effect of attention on the amplitude of spatially tuned responses increased with stimulus contrast, and was well described by an increase in response gain (i.e., a multiplicative scaling of the population response). Together, our results show that attention increases the gain of spatial population codes during the first wave of visual processing.
Significance Statement
We know relatively little about how attention improves population codes, even though perception is thought to critically depend on population activity. In this study, we used an encoding-model approach to test how attention modulates the spatial tuning of stimulus-evoked population responses measured with EEG. We found that attention multiplicatively scales the amplitude of spatially tuned population responses. Furthermore, this effect was present within 100 ms of stimulus onset. Thus, our results show that attention improves spatial population codes by increasing their gain at this early stage of processing.
Introduction
Covert spatial attention improves perception by improving neural representations in visual cortex (Maunsell, 2015; Sprague et al., 2015). At the level of individual neurons, spatial attention not only increases the amplitude of responses (Luck et al., 1997; McAdams and Maunsell, 1999), but also has a variety of effects on the spatial tuning of neurons: receptive fields shift toward attended locations, and attention increases the size of the receptive field of some neurons while decreasing the size of others (Connor et al., 1997; Womelsdorf et al., 2006, 2008; Anton-Erxleben et al., 2009; for review, see Anton-Erxleben and Carrasco, 2013; Sprague et al., 2015). Ultimately, however, perception depends on the joint activity of large ensembles of cells (Pouget et al., 2000). Thus, there is strong motivation to understand the net effect of these local changes for population representations (Sprague et al., 2015).
There is clear evidence that attended stimuli evoke larger population responses than unattended stimuli. For instance, covert attention increases the amplitude of visually evoked potentials measured with EEG (e.g., van Voorhis and Hillyard, 1977; Itthipuripat et al., 2014a), which reflect the aggregate activity of many neurons (Nunez and Srinivasan, 2006). However, studies that measure changes in the overall amplitude of population responses do not reveal how attention influences the information content of population activity (Serences and Saproo, 2012). Thus, researchers have turned to multivariate methods. Sprague and Serences (2013), for example, used an inverted encoding model (IEM) to reconstruct population-level representations of stimulus position from patterns of activity measured with fMRI. They found that spatially attending a stimulus increased the amplitude of spatial representations across the visual hierarchy without reliably changing their size (see also Vo et al., 2017; Itthipuripat et al., 2019; but see Fischer and Whitney, 2009).
Although fMRI is a powerful tool for assaying population codes, two major limitations prevent clear conclusions regarding the effect of attention on stimulus-driven activity. First, the sluggish BOLD signal that is measured with fMRI provides little information about when attention modulates population codes. Second, growing evidence suggests that the effect of attention on the BOLD signal does not reflect a modulation of the stimulus-evoked response at all, but instead reflects a stimulus-independent shift in baseline activity. These studies varied stimulus contrast to measure neural contrast-response functions (CRFs), which can be modulated by attention in several ways (Fig. 1). Whereas unit-recording and EEG studies have found that attentional modulation of neural responses depends on stimulus contrast, either multiplicatively scaling the CRF (response gain, Fig. 1a) or shifting the CRF to the left (contrast gain, Fig. 1b) (Reynolds et al., 2000; Martínez-Trujillo and Treue, 2002; Kim et al., 2007; Itthipuripat et al., 2014a,b, 2019), fMRI studies have found that spatial attention increases the BOLD signal in visual cortex by the same amount regardless of stimulus contrast, even when no stimulus is presented at all (an additive shift, Fig. 1c) (Buracas and Boynton, 2007; Murray, 2008; Pestilli et al., 2011; Sprague et al., 2018b; Itthipuripat et al., 2019; but see Li et al., 2008). This finding suggests that the effect of attention on the BOLD response reflects top-down inputs to visual cortex rather than a modulation of stimulus-driven activity (Murray, 2008; Itthipuripat et al., 2014a). Therefore, extant work has not yet determined how attention changes stimulus-driven population codes.
Here, we used EEG to examine how spatial attention modulates the spatial tuning of stimulus-driven population responses. We measured stimulus-evoked activity (i.e., activity that is phase-locked to stimulus onset) to isolate the stimulus-driven response from ongoing activity that is independent of the stimulus. We used an IEM (Brouwer and Heeger, 2009) to reconstruct spatially selective channel-tuning functions (CTFs) from the pattern of stimulus-evoked activity across the scalp. The resulting CTFs reflect the spatial tuning of the population activity that is measured with EEG. We focused our analysis in an early window, ∼100 ms after stimulus onset. Activity at this latency is thought to primarily reflect the first wave of sensory activity evoked by a stimulus in extrastriate cortex (Clark and Hillyard, 1996; Martínez et al., 1999). In Experiment 1, we found that attention increased the amplitude of stimulus-evoked CTFs. Thus, attention increased the gain of spatial population codes at this early stage of sensory processing. In Experiment 2, we further characterized the effect of attention on spatial population codes by parametrically varying stimulus contrast. We found that the effect of attention on the amplitude of stimulus-evoked CTFs increased with stimulus contrast, and was well described as an increase in response gain (Fig. 1a). Together, our results show that attention increases the gain of stimulus-evoked population codes at early stages of sensory processing.
Materials and Methods
Subjects
Forty-five volunteers (21 in Experiment 1 and 24 in Experiment 2) participated in the experiments for monetary compensation ($15/h). Subjects were between 18 and 35 years old, reported normal or corrected-to-normal visual acuity, and provided informed consent according to procedures approved by the University of Chicago Institutional Review Board.
Experiment 1
Our target sample size was 16 subjects in Experiment 1, following our past work using an IEM to reconstruct spatial CTFs from EEG activity (Foster et al., 2016). Twenty-one volunteers participated in Experiment 1 (8 male, 13 female; mean age = 22.7 years, SD = 3.2). Four subjects were excluded from the final sample for the following reasons: we were unable to prepare the subject for EEG (n = 1); we were unable to obtain eye tracking data (n = 1); the subject did not complete enough blocks of the task (n = 1); and residual bias in eye position (see Eye movement controls) was too large (n = 1). The final sample size was 17 (6 male, 11 female; mean age = 22.7 years, SD = 3.4). We overshot our target sample size of 16 because the final subject was scheduled to participate before we reached our target sample size.
Experiment 2
In Experiment 2, we increased our target sample size to 20 subjects to increase statistical power because we sought to test how the effect of attention changes with stimulus contrast. Twenty-four volunteers participated in Experiment 2 (6 male, 18 female; mean age = 24.0 years, SD = 3.0), 4 of whom had previously participated in Experiment 1. For 4 subjects, we terminated data collection and excluded the subject from the final sample for the following reasons: we were unable to obtain eye tracking data (n = 1); the subject had difficulty performing the task (n = 1); the subject made too many eye movements (n = 2). The final sample size was 20 (5 male, 15 female; mean age = 24.0 years, SD = 2.8).
Apparatus and stimuli
We tested the subjects in a dimly lit, electrically shielded chamber. Stimuli were generated using MATLAB (The MathWorks) and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). Subjects viewed the stimuli on a γ-corrected 24 inch LCD monitor (refresh rate: 120 Hz, resolution: 1080 × 1920 pixels) with their chin on a padded chin rest (viewing distance: 76 cm in Experiment 1, 75 cm in Experiment 2). Stimuli were presented against a mid-gray background (∼61 cd/m²).
Task procedures
On each trial, observers viewed a sequence of four bullseye stimuli (Fig. 2a). Across blocks, we manipulated whether observers attended the bullseye stimuli (attend-stimulus condition) or attended the central fixation dot (attend-fixation condition). In the attend-stimulus condition, observers monitored the sequence for one bullseye that was lower contrast than the rest (a bullseye target). In the attend-fixation condition, observers monitored the fixation dot for a 100 ms decrement in contrast (a fixation target). Contrast decrements for both the bullseye targets and fixation targets occurred on half of the trials in both conditions, and the trials that contained bullseye targets and fixation targets were determined independently. We instructed subjects to disregard changes in the unattended stimulus. Although past work has suggested that there may be differences in the cortical regions that support attention to peripheral locations and attention to fixated locations (Kelley et al., 2008), we contrasted target-evoked responses in these conditions because of the powerful effect that this manipulation of attention has on stimulus-evoked responses. Furthermore, recent studies that have used fMRI to examine the effect of attention on spatially tuned population responses have manipulated attention in the same way (e.g., Sprague and Serences, 2013; Itthipuripat et al., 2019). Therefore, this manipulation of attention allows for comparison with these past studies.
Observers fixated a central dot (0.1° in diameter, 56.3% Weber contrast, i.e., 100 × (L – Lb)/Lb, where L is stimulus luminance and Lb is the background luminance) before pressing the spacebar to initiate each trial. Each trial began with a 400 ms fixation display. A peripheral cue (0.25° in diameter, 32.8% Weber contrast) was then presented for 300 ms at the location where the bullseye stimuli would appear. On each trial, the bullseyes appeared at one of eight locations equally spaced around fixation at an eccentricity of 4°. Each bullseye (1.6° in diameter, 0.12 cycles/°) appeared for 100 ms. The cue and each of the bullseyes were separated by a variable interstimulus interval between 500 and 800 ms. Bullseye targets (the bullseye that was lower contrast than the others) were never the first bullseye in the sequence. Thus, the first bullseye of each trial established the pedestal contrast of the trial (i.e., the contrast of the nontarget bullseyes). Fixation targets (a 100 ms decrement in the contrast of the fixation dot) occurred at the same time as one of the bullseye stimuli; and like bullseye targets, fixation targets never occurred during the presentation of the first bullseye of the trial. Both bullseye and fixation targets occurred on 50% of trials, determined randomly and independently for each stimulus to preclude accurate performance based on attention to the wrong aspect of the display. On trials with both a bullseye target and fixation target (25% of trials), the timing of each target was determined independently, such that the targets co-occurred on ∼33% of these trials. The final bullseye in each trial was followed by a 500 ms blank display before the response screen appeared. Each trial ended with a response screen that prompted subjects to report whether or not a target was presented in the relevant stimulus. Subjects responded using the number pad of a standard keyboard (1 = target present, 2 = target absent). 
The subject's response appeared above the fixation dot, and they could correct their response if they pressed the wrong key. Finally, subjects confirmed their response by pressing the spacebar.
Experiment 1
In Experiment 1, the pedestal contrast of the bullseye was always 89.1% Michelson contrast (100 × (Lmax – Lmin)/(Lmax + Lmin), where Lmax is the maximum luminance and Lmin is the minimum luminance). Subjects completed a 3.5 h session. The session began with a staircase procedure to adjust task difficulty (see Staircase procedures). Subjects then completed 12-20 blocks (40 trials each) during which we recorded EEG. Thus, subjects completed between 480 and 800 trials (1920-3200 stimulus presentations). The blocks alternated between the attend-stimulus and attend-fixation conditions, and we counterbalanced task order across subjects.
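The two contrast measures used for the stimuli (Weber contrast for the fixation dot and cue, Michelson contrast for the bullseyes) can be written as small helper functions. This is a minimal sketch; the function names and the luminance values in the usage lines are illustrative assumptions, not values from the study.

```python
def weber_contrast(lum, background):
    """Weber contrast in percent: 100 * (L - Lb) / Lb."""
    return 100.0 * (lum - background) / background


def michelson_contrast(lum_max, lum_min):
    """Michelson contrast in percent: 100 * (Lmax - Lmin) / (Lmax + Lmin)."""
    return 100.0 * (lum_max - lum_min) / (lum_max + lum_min)


# Hypothetical luminances (cd/m^2), chosen only for illustration.
print(weber_contrast(90.0, 60.0))       # 50.0
print(michelson_contrast(90.0, 30.0))   # 50.0
```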
Experiment 2
In Experiment 2, we manipulated the contrast of the bullseye stimuli. We included 5 pedestal contrasts (6.25%, 12.5%, 25.0%, 50.0%, and 90.6% Michelson contrast). Thus, there were 10 conditions in total (2 attention conditions × 5 pedestal contrasts). Subjects completed three sessions: a 2.5 h behavior session to adjust task difficulty in each condition (see Staircase procedures), followed by two 3.5 h EEG sessions. All sessions were completed within a 10 d period. Each block consisted of 104 trials: eight trials for each of the 10 conditions, and an additional 12 trials in each condition at the highest pedestal contrast (90.6% contrast) for the purpose of training the encoding model (see Training and testing data). Each block included a break at the halfway point. As in Experiment 1, the blocks alternated between the attend-stimulus and attend-fixation conditions, and we counterbalanced task order across subjects. We aimed to have each subject complete 20 blocks across the EEG sessions to obtain 160 testing trials for each condition (640 stimulus presentations), and 480 training trials (1920 stimulus presentations). All subjects completed 20 blocks with the following exceptions: 3 subjects completed 18 blocks, and 1 subject completed 24 blocks.
In Experiment 2, we made one minor change from Experiment 1: by pressing a key outside the recording chamber, the experimenter could manually provide feedback to the observer when the experimenter noticed a blink or eye movement during the trial. When feedback was provided, the text “blink” or “eye movement” was presented in red for 500 ms after the observer had made their response.
Staircase procedures
In both experiments, we used a staircase procedure to match difficulty across conditions. We adjusted difficulty by adjusting the size of the contrast decrement for each condition independently.
Experiment 1
In Experiment 1, subjects completed six staircase blocks of 40 trials (three blocks for each condition) before we started the EEG blocks of the task. Thus, subjects completed 120 staircase trials for each condition. We used a 3-down-1-up procedure to adjust task difficulty: after three correct responses in a row, we reduced the size of the contrast decrement by 2%; after an incorrect response, we increased the size of the contrast decrement by 2%. This procedure was designed to hold accuracy at ∼80% correct (García-Pérez, 1998). The final size of the contrast decrements in the staircase blocks was used for the EEG blocks. During the EEG blocks, we examined accuracy in each condition every four blocks (two blocks of each condition), and adjusted the size of the contrast decrements to hold accuracy as close to 80% as possible.
Experiment 2
In Experiment 2, subjects completed a 2.5 h staircase session before the EEG sessions. We adjusted difficulty for each of the 10 conditions independently (2 attention conditions × 5 pedestal contrasts). Subjects completed 16 blocks of 40 trials, alternating between the attend-fixation and attend-stimulus conditions. The five contrast levels were randomized within each block. Thus, observers completed 64 staircase trials for each of the 10 conditions. We used a weighted up/down procedure to adjust task difficulty: after a correct response, we reduced the size of the contrast decrement by 5%; after an incorrect response, we increased the size of the contrast decrement by 17.6%. This procedure held accuracy fixed at ∼76%. The staircase procedure continued to operate throughout the EEG sessions.
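The accuracy levels targeted by the two staircase rules in this section can be checked with a short calculation. The sketch below rests on two assumptions that the text does not state: that the 3-down-1-up rule uses equal up and down steps, and that the 5% and 17.6% adjustments in the weighted rule are proportional changes to the current decrement, so that the steps balance on a log scale. The function names are illustrative.

```python
import math


def three_down_one_up_target():
    """Accuracy p at which a step down (three correct in a row, probability
    p**3) is as likely as not: p**3 = 0.5."""
    return 0.5 ** (1.0 / 3.0)


def weighted_up_down_target(down_pct, up_pct):
    """Accuracy p at which the expected log-step is zero:
    p * log(1 - down) + (1 - p) * log(1 + up) = 0."""
    down = -math.log(1.0 - down_pct / 100.0)
    up = math.log(1.0 + up_pct / 100.0)
    return up / (up + down)


print(round(three_down_one_up_target(), 3))           # 0.794
print(round(weighted_up_down_target(5.0, 17.6), 3))   # 0.76
```

Under these assumptions, the two rules equilibrate near the ∼80% and ∼76% accuracy levels reported for Experiments 1 and 2, respectively.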
EEG acquisition
We recorded EEG activity from 30 active Ag/AgCl electrodes mounted in an elastic cap (Brain Products actiCHamp). We recorded from International 10-20 sites: Fp1, Fp2, F7, F3, Fz, F4, F8, FT9, FC5, FC1, FC2, FC6, FT10, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, O1, Oz, O2. Two additional electrodes were affixed with stickers to the left and right mastoids, and a ground electrode was placed in the elastic cap at position Fpz. All sites were recorded with a right-mastoid reference and were rereferenced offline to the algebraic average of the left and right mastoids. We recorded EOG data using passive electrodes, with a ground electrode placed on the left cheek. Horizontal EOG was recorded from a bipolar pair of electrodes placed ∼1 cm from the external canthus of each eye. Vertical EOG was recorded from a bipolar pair of electrodes placed above and below the right eye. Data were filtered online (low cutoff = 0.01 Hz, high cutoff = 80 Hz, slope from low- to high-cutoff = 12 dB/octave), and were digitized at 500 Hz using BrainVision Recorder (Brain Products) running on a PC. Impedance values were kept to <10 kΩ.
Eye tracking
We monitored gaze position using a desk-mounted EyeLink 1000 Plus infrared eye-tracking camera (SR Research). Gaze position was sampled at 1000 Hz. Head position was stabilized with a chin rest. According to the manufacturer, this system provides spatial resolution of 0.01° of visual angle and average accuracy of 0.25-0.50° of visual angle. We calibrated the eye tracker every 1 or 2 blocks of the task, and between trials during the blocks if necessary. We drift-corrected the eye tracking data for each trial by subtracting the mean gaze position measured during a 200 ms window immediately before the onset of the cue.
Artifact rejection
We excluded data from some electrodes for some subjects because of low-quality data (excessive high-frequency noise or sudden steps in voltage). In Experiment 1, we excluded one or two electrodes for 3 subjects in our final sample. In Experiment 2, we excluded electrodes Fp1 and Fp2 for all subjects because we obtained poor-quality data (high-frequency noise and slow drifts) at these sites for most subjects, and we excluded data for one additional electrode for 2 subjects in our final sample. In both experiments, all excluded electrodes were located at frontal or central sites. Our window of interest was from 200 ms before stimulus onset until 500 ms after stimulus onset. We segmented the EEG data into epochs time-locked to the onset of each bullseye stimulus (starting 1200 ms before stimulus onset and ending 1500 ms after stimulus onset). We segmented data into longer epochs so that the epochs were long enough to apply a high-pass filter (see Evoked power), and so that our window of interest was not contaminated with edge artifacts when filtering the data. We baseline-corrected the EEG data by subtracting mean voltage during the 200 ms window immediately before stimulus onset. We visually inspected the segmented EEG data for artifacts (amplifier saturation, excessive muscle noise, and skin potentials), and the eye tracking data for ocular artifacts (blinks, eye movements, and deviations in eye position from fixation), and discarded any epochs contaminated by artifacts. In Experiment 1, all subjects included in the final sample had at least 800 artifact-free epochs for each condition. In Experiment 2, all subjects included in the final sample had at least 450 artifact-free epochs for testing the IEM in each condition, and at least 1500 artifact-free epochs for training the IEM (see Training and testing data).
Eye movement controls
After artifact rejection, for each subject, we inspected mean gaze position as a function of stimulus position for the attend-stimulus and attend-fixation conditions separately. For all subjects in the final samples, mean gaze position varied by <0.2° of visual angle across stimulus positions. One subject in Experiment 1 was excluded from the final sample because the subject did not meet this criterion. To verify that removal of ocular artifacts was effective, we inspected mean gaze position (during the 100 ms presentation of each stimulus) as a function of stimulus position for the attend-stimulus and attend-fixation conditions separately. In both experiments, we observed very little variation in mean gaze position (across subjects) as a function of stimulus position (<0.05° of visual angle) for both the attend-stimulus and attend-fixation conditions (Fig. 3), confirming that we achieved an extremely high standard of fixation compliance after epochs with artifacts were removed. Thus, the effects of attention reported below cannot be attributed to variation in eye position.
Controlling for stimulus contrast
On half of trials, one of the four bullseyes was lower contrast than the rest (i.e., a target). Thus, the average contrast of the bullseyes was slightly lower than the pedestal contrast (i.e., the contrast of the nontarget bullseyes), and small differences in average contrast may have emerged between conditions after rejection of data that were contaminated by EEG artifacts or eye movements. However, the difference in mean contrast between the attend-stimulus and attend-fixation conditions after artifact rejection was negligible. In Experiment 1, mean contrast of the bullseye stimuli was 87.4% (SD = 0.97%) in the attend-stimulus condition and 87.5% (SD = 0.92%) in the attend-fixation condition. Similarly, in Experiment 2, the mean contrast of the bullseye stimuli was comparable for the attend-stimulus and attend-fixation conditions for all pedestal contrasts (Table 1).
Evoked power
A Hilbert transform (MATLAB Signal Processing Toolbox) was applied to the segmented EEG data to obtain the complex analytic signal, $z(t)$, of the EEG, as follows:
$$z(t) = f(t) + i\tilde{f}(t)$$
where $f(t)$ is the EEG signal, $\tilde{f}(t)$ is the Hilbert transform of $f(t)$, and $i = \sqrt{-1}$. The complex analytic signal was extracted for each electrode using the following Matlab syntax: hilbert(data')', where data are a 2D matrix of segmented EEG (number of trials × number of samples). We calculated evoked power by first averaging the complex analytic signals across trials, and then squaring the complex magnitude of the averaged analytic signal. Evoked power isolates activity phase-locked to stimulus onset because only activity with consistent phase across trials remains after averaging the complex analytic signal across trials. Trial averaging was performed for each stimulus position separately within each block of training or test data for the IEM analyses (see Training and testing data).
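The logic of evoked power can be illustrated with simulated data. The sketch below is not the analysis code used in the study: it computes the analytic signal with an FFT-based routine (the standard construction behind Hilbert-transform functions), and the 10 Hz oscillations, trial count, and function names are illustrative assumptions.

```python
import numpy as np


def analytic_signal(data):
    """Complex analytic signal of each row (trials x samples), via the FFT."""
    n = data.shape[-1]
    spectrum = np.fft.fft(data, axis=-1)
    gain = np.zeros(n)
    gain[0] = 1.0                       # keep the DC component
    if n % 2 == 0:
        gain[n // 2] = 1.0              # keep the Nyquist component
        gain[1:n // 2] = 2.0            # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain, axis=-1)


def evoked_power(data):
    """Average the analytic signal across trials, then square its magnitude."""
    return np.abs(analytic_signal(data).mean(axis=0)) ** 2


# Simulated 10 Hz oscillations: 200 trials, 1 s sampled at 500 Hz.
rng = np.random.default_rng(0)
t = np.arange(500) / 500.0
locked = np.cos(2 * np.pi * 10 * t + np.zeros((200, 1)))
jittered = np.cos(2 * np.pi * 10 * t + rng.uniform(0, 2 * np.pi, (200, 1)))

print(evoked_power(locked).mean())     # phase-locked activity survives averaging
print(evoked_power(jittered).mean())   # random-phase activity largely cancels
```

Both simulated datasets have the same single-trial power; only the phase-locked one retains substantial power after the analytic signal is averaged across trials.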
For some analyses, we high-pass filtered the data with a low-cutoff of 4 Hz to remove low-frequency activity before calculating evoked power. We used EEGLAB's “eegfilt.m” function (Delorme and Makeig, 2004), which implements a two-way least-squares finite impulse response filter. This filtering method uses a zero-phase forward and reverse operation, which ensures that phase values are not distorted, as can occur with forward-only filtering methods.
Alpha-band power
To calculate alpha-band power at each electrode, we bandpass filtered the raw EEG data between 8 and 12 Hz using the “eegfilt.m” function in EEGLAB (Delorme and Makeig, 2004), and applied a Hilbert transform (MATLAB Signal Processing Toolbox) to the bandpass-filtered data to obtain the complex analytic signal. Instantaneous power was calculated by squaring the complex magnitude of the complex analytic signal.
Inverted encoding model
We used an inverted encoding model (Brouwer and Heeger, 2009, 2011) to reconstruct spatially selective CTFs from the distribution of power across electrodes (Foster et al., 2016). We assumed that the power at each electrode reflects the weighted sum of eight spatially selective channels (i.e., neuronal populations), each tuned for a different angular position (Fig. 2b). We modeled the response profile of each spatial channel across angular locations as a half sinusoid raised to the 25th power as follows:
$$R = \sin(0.5\theta)^{25}$$
where θ is angular location (0°-359°) and R is the response of the spatial channel in arbitrary units. This response profile was circularly shifted for each channel such that the peak response of each spatial channel was centered over one of the eight locations at which the bullseye stimuli could appear (0°, 45°, 90°, etc.).
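The basis set described above can be sketched as follows. The helper name is hypothetical, and the shifted profile is written in the equivalent form |cos(0.5(θ − center))|^25, which is what the half sinusoid becomes once its peak is moved from 180° to the channel's center.

```python
import math


def channel_response(theta_deg, center_deg, power=25):
    """Half sinusoid raised to `power`, circularly shifted to peak at center_deg.

    With the peak at 180 deg, R = sin(0.5 * theta) ** power; shifting the peak
    to center_deg gives |cos(0.5 * (theta - center))| ** power.
    """
    half_offset = math.radians(0.5 * (theta_deg - center_deg))
    return abs(math.cos(half_offset)) ** power


centers = [45 * i for i in range(8)]   # 0, 45, ..., 315 degrees
# basis[i][j]: response of channel i to a stimulus at location j
basis = [[channel_response(loc, c) for loc in centers] for c in centers]

for row in basis:
    print(" ".join(f"{r:.3f}" for r in row))
```

Each row is a circularly shifted copy of the same narrow profile, with a response of 1 at the channel's preferred location and a sharp fall-off at neighboring locations.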
An IEM routine was applied to each time point. We partitioned our data into independent sets of training data and test data (see Training and testing data). The analysis proceeded in two stages (training and test). In the training stage (Fig. 2c), training data (B1) were used to estimate weights that approximate the relative contribution of the eight spatial channels to the observed response measured at each electrode. Let B1 (m electrodes × n1 measurements) be the power at each electrode for each measurement in the training set, C1 (k channels × n1 measurements) be the predicted response of each spatial channel (determined by the basis functions; see Fig. 2b) for each measurement, and W (m electrodes × k channels) be a weight matrix that characterizes a linear mapping from “channel space” to “electrode space.” The relationship between B1, C1, and W can be described by a GLM of the following form:
$$B_1 = W C_1$$
The weight matrix was obtained via least-squares estimation as follows:
$$\hat{W} = B_1 C_1^{T} \left( C_1 C_1^{T} \right)^{-1}$$
In the testing stage (Fig. 2d), we inverted the model to transform the observed test data B2 (m electrodes × n2 measurements) into estimated channel responses, $\hat{C}_2$ (k channels × n2 measurements), using the estimated weight matrix, $\hat{W}$, that we obtained in the training phase as follows:
$$\hat{C}_2 = \left( \hat{W}^{T} \hat{W} \right)^{-1} \hat{W}^{T} B_2$$
Each estimated channel response function was then circularly shifted to a common center, so the center channel was the channel tuned for the position of the bullseye stimulus (i.e., 0° on the channel offset axes), then averaged to obtain a CTF. Finally, because the exact contributions of each spatial channel to each electrode (i.e., the channel weights, W) likely vary across subjects, we applied the IEM routine separately for each subject.
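The training and testing stages can be illustrated with a toy simulation. The sketch below uses the standard least-squares form of the inverted encoding model (Brouwer and Heeger, 2009); the random ground-truth weights, noise level, and simulated "electrode" data are assumptions made for illustration and stand in for the evoked power measured in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
k, m = 8, 30                                  # channels, electrodes
centers = np.arange(k) * 45.0                 # channel centers / stimulus locations (deg)

# Basis set: rows are channels, columns are stimulus locations.
offsets = np.radians(0.5 * (centers[None, :] - centers[:, None]))
basis = np.abs(np.cos(offsets)) ** 25


def simulate(w_true, positions, noise_sd):
    """Simulated electrode data B = W C + noise for the given stimulus positions."""
    c = basis[:, positions]
    noise = noise_sd * rng.standard_normal((m, len(positions)))
    return w_true @ c + noise, c


w_true = rng.standard_normal((m, k))          # hypothetical ground-truth weights
train_pos = np.tile(np.arange(k), 2)          # two training measurements per location
test_pos = np.arange(k)
b1, c1 = simulate(w_true, train_pos, 0.1)
b2, _ = simulate(w_true, test_pos, 0.1)

# Training: W_hat = B1 C1^T (C1 C1^T)^-1
w_hat = b1 @ c1.T @ np.linalg.inv(c1 @ c1.T)
# Testing: C2_hat = (W_hat^T W_hat)^-1 W_hat^T B2
c2_hat = np.linalg.inv(w_hat.T @ w_hat) @ w_hat.T @ b2

# Shift each estimated response so the stimulus-tuned channel sits at the center
# (index 4, i.e. 0 deg on an offset axis from -180 to 135 deg), then average.
shifted = [np.roll(c2_hat[:, j], 4 - test_pos[j]) for j in range(len(test_pos))]
ctf = np.mean(shifted, axis=0)
print(np.round(ctf, 2))
```

In this simulation the averaged CTF peaks at the center channel, as expected when the test data carry spatially selective information that the trained weights can recover.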
Training and testing data
For the IEM analysis, we partitioned artifact-free epochs into three independent sets: two training sets and one test set. Within each set, we calculated power across the epochs for each stimulus position to obtain a matrix of power values across all electrodes for each stimulus position (electrodes × stimulus positions, for each time point). We equated the number of epochs for each stimulus position in each set. Some excess epochs were not assigned to any set because of this constraint. Thus, we used an iterative approach to make use of all available epochs. For each of 500 iterations, we randomly partitioned the data into training and test data (see below for details of how data were partitioned into training and test sets in each experiment), and we averaged the resulting CTFs across iterations.
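One iteration of the partitioning step can be sketched as follows. This is an illustration of the constraint described above (equal numbers of epochs per stimulus position in each set, with excess epochs left unassigned), not the code used in the study; the function name and the example epoch counts are assumptions.

```python
import random


def partition_epochs(positions, n_sets=3, seed=None):
    """Randomly assign epoch indices to n_sets sets with equal counts per position.

    positions: stimulus position label for each artifact-free epoch.
    Returns a list of n_sets lists of epoch indices; excess epochs are left out.
    """
    rng = random.Random(seed)
    by_position = {}
    for idx, pos in enumerate(positions):
        by_position.setdefault(pos, []).append(idx)
    # The rarest position limits how many epochs each set gets per position.
    per_set = min(len(idxs) for idxs in by_position.values()) // n_sets
    sets = [[] for _ in range(n_sets)]
    for idxs in by_position.values():
        rng.shuffle(idxs)
        for s in range(n_sets):
            sets[s].extend(idxs[s * per_set:(s + 1) * per_set])
    return sets


# Hypothetical example: 8 positions with unequal epoch counts (20 to 27 each).
labels = [p for p in range(8) for _ in range(20 + p)]
train_a, train_b, test = partition_epochs(labels, seed=0)
print(len(train_a), len(train_b), len(test))   # 48 48 48
```

Repeating the call with a different seed on each iteration draws a new random partition, so that epochs left out on one iteration can be used on another.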
Experiment 1
When comparing CTF parameters across conditions, it is critical to estimate a fixed encoding model (i.e., train the encoding model on a common training set) that is then used to reconstruct CTFs for each condition separately (for discussion of this issue, see Sprague et al., 2018a, 2019). Thus, for Experiment 1, we estimated the encoding model using a training set that included equal numbers of trials from each condition. While we trained our encoding model on a mix of the attend-stimulus and attend-fixation conditions, training on a mix of data from both conditions is not necessary for the purposes of estimating the encoding model. Rather, what is critical is to estimate channel weights just once using the same training set, so that the reconstructed CTFs for each condition can be compared on an equal footing (Sprague et al., 2018a, 2019). We opted to use a mix of the two conditions for estimating the encoding model so that observers were not completing considerably more trials in one attention condition than in the other. Specifically, in Experiment 1, we partitioned data for each condition (attend-stimulus and attend-fixation) into three sets (with the constraint that the number of trials per location in each set was also equated across conditions). We obtained training data by combining data across the two conditions before calculating power, resulting in two training sets that included equal numbers of trials from each condition. We then tested the model using the remaining set of data for each condition separately. Thus, we used the same training data to estimate a single encoding model, and varied only the test data that were used to reconstruct CTFs for each condition.
Experiment 2
In Experiment 2, we included additional trials in the 90.6% contrast conditions (half from the attend-stimulus condition and half from the attend-fixation condition) to train the encoding model (see Task procedures, Experiment 2). We used high-contrast stimuli to estimate channel weights because high-contrast stimuli should drive a strong stimulus-evoked response. For each iteration of the analysis, we partitioned these data into two training sets and generated a single testing set for each of the 10 conditions separately. We equated the number of trials included for each stimulus position in each of the testing sets.
Quantifying changes in channel-tuning functions
To characterize how CTFs change across conditions, we fitted CTFs with an exponentiated cosine function (Fig. 2e) of the following form:
$$f(x) = a\,e^{k(\cos(\mu - x) - 1)} + b$$
where x is channel offset (−180°, −135°, −90°…, 135°). We fixed the µ parameter, which determines the center of the tuning function, at a channel offset of 0° such that the peak of the function was fixed at the channel tuned for the stimulus position. The function had three free parameters: baseline (b), which determines the vertical offset of the function from zero; amplitude (a), which determines the height of the peak of the function above baseline; and concentration (k), which determines the width of the function. We fitted the function with a GLM combined with a grid search procedure (Ester et al., 2015). We report the concentration as width, measured as FWHM: the width of the function in angular degrees halfway between baseline and the peak.
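The fitting approach can be sketched as a grid search over the concentration parameter, with amplitude and baseline solved by linear least squares at each grid point. This is a minimal illustration under assumptions: the grid of k values, the synthetic CTF, and the function names are hypothetical, and the FWHM conversion follows one reading of the definition above (the width at the height halfway between b and a + b).

```python
import math

OFFSETS = [-180, -135, -90, -45, 0, 45, 90, 135]   # channel offsets (deg)


def exp_cosine(x_deg, a, b, k, mu_deg=0.0):
    """f(x) = a * exp(k * (cos(mu - x) - 1)) + b, with x and mu in degrees."""
    return a * math.exp(k * (math.cos(math.radians(mu_deg - x_deg)) - 1.0)) + b


def fit_ctf(ctf, k_grid):
    """Grid search over k; for each k, solve for a and b by linear least squares."""
    best = None
    n = len(ctf)
    for k in k_grid:
        g = [exp_cosine(x, 1.0, 0.0, k) for x in OFFSETS]   # predictor for this k
        g_mean, y_mean = sum(g) / n, sum(ctf) / n
        sxx = sum((gi - g_mean) ** 2 for gi in g)
        sxy = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, ctf))
        a = sxy / sxx
        b = y_mean - a * g_mean
        sse = sum((a * gi + b - yi) ** 2 for gi, yi in zip(g, ctf))
        if best is None or sse < best[0]:
            best = (sse, a, b, k)
    return best[1:]


def fwhm_deg(k):
    """Full width (deg) at the height halfway between baseline b and peak a + b."""
    return 2.0 * math.degrees(math.acos(1.0 + math.log(0.5) / k))


# Recover known parameters from a noiseless synthetic CTF (illustrative values).
k_grid = [0.5 + 0.1 * i for i in range(100)]
synthetic = [exp_cosine(x, 0.8, 0.1, 3.0) for x in OFFSETS]
a, b, k = fit_ctf(synthetic, k_grid)
print(round(a, 3), round(b, 3), round(k, 1), round(fwhm_deg(k), 1))
```

Because a and b enter the function linearly once k is fixed, only k needs to be searched over, which keeps the fit fast enough to repeat for every bootstrap sample.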
We used a subject-level resampling procedure to test for differences in the parameters of the fitted function across conditions. We drew 100,000 bootstrap samples, each containing N-many subjects sampled with replacement, where N is the sample size. For each bootstrap sample, we fitted the exponentiated cosine function described above to the mean CTF across subjects in the bootstrap sample.
In Experiment 1, to test for differences between conditions in each parameter, we calculated the difference for the parameter between the attend-stimulus and attend-fixation conditions for each bootstrap sample, which yielded a distribution of 100,000 values. We tested whether these difference distributions significantly differed from zero in either direction, by calculating the proportion of values >0 or <0. We doubled the smaller value to obtain a two-sided p value.
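The resampling test described above can be sketched in a few lines. In this illustration, `estimate` is a hypothetical stand-in for fitting the exponentiated cosine to the mean CTF and returning one parameter, and `peak_minus_min` is a deliberately crude placeholder estimator; neither is taken from the original analysis. The number of bootstrap samples is reduced here for speed.

```python
import random

def mean_ctf(ctfs):
    """Average channel responses across the subjects in a sample."""
    n = len(ctfs)
    return [sum(col) / n for col in zip(*ctfs)]

def peak_minus_min(ctf):
    """Placeholder estimator (assumption): a rough amplitude measure."""
    return max(ctf) - min(ctf)

def bootstrap_diffs(ctfs_attend, ctfs_ignore, estimate, n_boot=2000, seed=0):
    """Resample subjects with replacement; return attend minus ignore estimates."""
    rng = random.Random(seed)
    n_sub = len(ctfs_attend)
    diffs = []
    for _ in range(n_boot):
        # The same resampled subjects are used for both conditions.
        idx = [rng.randrange(n_sub) for _ in range(n_sub)]
        est_attend = estimate(mean_ctf([ctfs_attend[i] for i in idx]))
        est_ignore = estimate(mean_ctf([ctfs_ignore[i] for i in idx]))
        diffs.append(est_attend - est_ignore)
    return diffs

def two_sided_p(diffs):
    """Double the smaller of the proportions of values above and below zero."""
    n = len(diffs)
    p_greater = sum(d > 0 for d in diffs) / n
    p_less = sum(d < 0 for d in diffs) / n
    return min(1.0, 2.0 * min(p_greater, p_less))
```

With 100,000 bootstrap samples, the smallest nonzero p value this procedure can return is 2/100,000 = 0.00002.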
In Experiment 2, for each parameter we tested for main effects of attention and contrast, and for an attention × contrast interaction. To test for a main effect of attention, we averaged parameter estimates across contrast levels for each bootstrap sample, and calculated the difference in each parameter estimate between attention conditions for each bootstrap sample. We tested whether these difference distributions significantly differed from zero in either direction, by calculating the proportion of values >0 or <0. To test for a main effect of contrast, we averaged the parameter estimates across the attention conditions, and fitted a linear function to the parameter estimates as a function of contrast. For each bootstrap sample, we calculated the slope of the best-fit linear function. We tested whether the resulting distribution of slope values significantly differed from zero in either direction by calculating the proportion of values >0 or <0. Finally, to test for an attention × contrast interaction, we fitted a linear function to the parameter estimates as a function of contrast for the attend-stimulus and attend-fixation conditions separately. For each bootstrap sample, we calculated the difference in the slope of these functions between the attend-stimulus and attend-fixation conditions. We tested whether the resulting distribution of differences-in-slope values significantly differed from zero in either direction by calculating the proportion of values >0 or <0. For both main effects and the interaction, we doubled the smaller proportion to obtain a two-sided p value.
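For a single bootstrap sample, the three test statistics can be computed as in the sketch below. It assumes that the linear function is fitted against contrast expressed in percent on a linear axis (the text does not say whether contrast was transformed), and the function names are illustrative.

```python
def slope(x, y):
    """Slope of the least-squares line through the points (x, y)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def effects_for_sample(contrasts, attend, ignore):
    """Return (attention effect, contrast slope, interaction) for one sample.

    attend / ignore: parameter estimates at each contrast level.
    """
    # Main effect of attention: difference after averaging over contrast.
    attention = sum(attend) / len(attend) - sum(ignore) / len(ignore)
    # Main effect of contrast: slope after averaging over attention conditions.
    averaged = [(a + i) / 2.0 for a, i in zip(attend, ignore)]
    contrast_slope = slope(contrasts, averaged)
    # Interaction: difference in slope between attention conditions.
    interaction = slope(contrasts, attend) - slope(contrasts, ignore)
    return attention, contrast_slope, interaction
```

Collecting each statistic across all bootstrap samples yields the three distributions that are compared against zero.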
Quantifying contrast-response functions
We found that the effect of attention on the amplitude of stimulus-evoked CTFs varied with stimulus contrast. To further characterize this effect, we fitted the amplitude of stimulus-evoked CTFs across stimulus contrasts for each condition with a Naka-Rushton function of the following form:

$$A(c) = G_r \frac{c^n}{c^n + G_c^n} + b$$

where A is the amplitude of stimulus-evoked CTFs and c is stimulus contrast. The function had four free parameters: baseline (b), which determines the offset of the function from zero; response gain ($G_r$), which determines how far the function rises above baseline; contrast gain ($G_c$), which determines the semi-saturation point; and an exponent (n), which determines the slope of the function. We used MATLAB's “fmincon” function to minimize the sum of squared errors between the data and the Naka-Rushton function. We restricted the b and $G_r$ parameters to be between 0 and 10 (with 10 being a value that far exceeds the observed amplitudes of stimulus-evoked CTFs), $G_c$ to be between 0% and 100% contrast, and n to be between 0.1 and 10. As Itthipuripat et al. (2019) have pointed out, in the absence of a saturating function, one might obtain inflated estimates of $G_r$ when the function saturates outside the range of possible contrast values. For example, if the best-fit function saturates above 100% contrast, the maximum value of the function can exceed the largest response seen across the range of contrasts that were actually presented by a substantial margin. Thus, following Itthipuripat et al. (2019), rather than reporting $G_r$ and $G_c$, we instead obtained a measure of response gain (Rmax) by calculating the amplitude of the best-fit Naka-Rushton function at 100% contrast and subtracting the baseline (i.e., $R_{max} = A(100) - b$), and a measure of contrast gain by calculating the contrast at which the function reaches half the amplitude seen at 100% contrast (C50).
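The derived measures Rmax and C50 can be computed directly from fitted parameters, as in the sketch below. It assumes the common Naka-Rushton form A(c) = g_r · cⁿ / (cⁿ + g_cⁿ) + b with contrast in percent; the parameter names `g_r` and `g_c` are labels chosen for this example. C50 is obtained in closed form by solving for the contrast at which the response above baseline equals half of Rmax.

```python
def naka_rushton(c, b, g_r, g_c, n):
    """Amplitude at contrast c (percent) under the assumed Naka-Rushton form."""
    return g_r * c ** n / (c ** n + g_c ** n) + b

def rmax(b, g_r, g_c, n):
    """Response gain: amplitude at 100% contrast minus baseline."""
    return naka_rushton(100.0, b, g_r, g_c, n) - b

def c50(b, g_r, g_c, n):
    """Contrast at which the function reaches half its amplitude at 100% contrast."""
    # Fraction of g_r reached at the half-amplitude point.
    h = 0.5 * 100.0 ** n / (100.0 ** n + g_c ** n)
    # Solve c^n / (c^n + g_c^n) = h for c.
    return g_c * (h / (1.0 - h)) ** (1.0 / n)
```

As an illustration of why the derived measures are preferred: with b = 0.1, g_r = 1.0, g_c = 100, and n = 2, the function has only reached half of g_r at 100% contrast, so Rmax is 0.5 rather than 1.0, and C50 is about 57.7% contrast rather than 100%.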
We used a subject-level resampling procedure to test for differences in the parameters of the fitted Naka-Rushton function across conditions. We drew 100,000 bootstrap samples, each containing N-many subjects sampled with replacement, where N is the sample size. For each bootstrap sample, we fitted the Naka-Rushton function to the amplitude of mean stimulus-evoked CTFs across subjects in the bootstrap sample. We calculated the difference for each parameter between the attend-stimulus and attend-fixation conditions for each bootstrap sample, which yielded a distribution of 100,000 values. We tested whether these difference distributions significantly differed from zero in either direction, by calculating the proportion of values >0 or <0, and doubling the smaller value to obtain a two-sided p value.
Electrode selectivity
We calculated an F statistic to determine the extent to which responses at each electrode differentiated between spatial positions of the stimulus. For each subject in Experiment 1, we partitioned all data into 15 independent sets (collapsing across the attend-stimulus and attend-fixation conditions, and equating the number of epochs for each stimulus position across sets). We calculated evoked power (averaging across 100 ms windows) for each stimulus position within each set. For each electrode, we calculated the ANOVA F statistic on evoked power across the eight stimulus positions, with each of the 15 sets serving as an independent observation. Higher F statistic values indicate that evoked power varied with stimulus position to a greater degree. As with our IEM analyses, we randomly partitioned the data into sets 500 times, and averaged the F statistic across iterations.
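The per-electrode statistic can be sketched as a one-way ANOVA in which each stimulus position is a group and each of the 15 sets contributes one observation per group. Treating the design as a simple between-groups one-way ANOVA is an assumption based on the description above, and the function name is illustrative.

```python
def anova_f(groups):
    """One-way ANOVA F statistic.

    groups: list of lists, one list of evoked-power values per stimulus position.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variability of the position means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variability of observations around their own position mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

In the analysis described above, `groups` would hold eight lists of 15 values for a given electrode, and the resulting F would be averaged over the 500 random partitions.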
Data/software availability
All data and code are available on Open Science Framework at https://osf.io/hmvzc/.
Results
Experiment 1
In Experiment 1, we tested how spatial attention modulated spatially selective stimulus-evoked activity measured with EEG. On each trial, observers viewed a series of bullseye stimuli, and we manipulated whether spatial attention was directed toward or away from these stimuli (Fig. 2a). Each trial began with a peripheral cue that indicated where the bullseye stimuli would appear. In attend-stimulus blocks, observers covertly monitored the sequence of bullseyes for one bullseye that was lower contrast than the rest. In attend-fixation blocks, observers ignored the bullseye stimuli, and instead monitored the fixation dot for a brief decrement in contrast. At the end of each trial, observers reported whether or not a contrast decrement occurred in the attended stimulus. We matched difficulty across the two conditions by adjusting the size of the contrast decrement for each condition (see Staircase procedures). Thus, accuracy was comparable in the attend-stimulus (M = 81.0%, SD = 3.7) and the attend-fixation (M = 80.0%, SD = 2.2) conditions.
To test how spatial attention modulates the spatial selectivity of stimulus-driven activity, we measured the power of broadband EEG activity evoked by the bullseye stimuli (i.e., the power of activity phase-locked to stimulus onset; see Evoked power), and we used an IEM (Brouwer and Heeger, 2009, 2011; Sprague and Serences, 2013; Foster et al., 2016) to reconstruct spatially selective CTFs from the scalp distribution of stimulus-evoked power (see Inverted encoding model). Figure 4a shows stimulus-evoked CTFs across time in the attend-stimulus and attend-fixation conditions. We found that stimulus-evoked CTFs were tuned for the stimulus location, with a peak response in the channel tuned for the stimulus location, and this spatial tuning emerged 70-80 ms after stimulus onset. Human ERP studies have found that visually evoked responses are modulated by attention as early as 80 ms after stimulus onset (for review, see Hillyard and Anllo-Vento, 1998). For instance, many studies have reported that attention increases the amplitude of the posterior P1 component (e.g., van Voorhis and Hillyard, 1977; Martínez et al., 1999; Itthipuripat et al., 2014a), which is typically seen ∼100 ms after stimulus onset. Thus, we focused our analysis in an early window, 80-130 ms after stimulus onset, to capture the early stimulus-evoked response. Figure 4b shows the reconstructed channel responses during our window of interest for each of the eight stimulus positions, separately for the attend-stimulus and attend-fixation conditions. We found that the peak response always occurred in the channel tuned for the spatial position of the stimulus. Thus, stimulus position is precisely encoded by stimulus-evoked power. 
To determine which electrodes carry information about the spatial position of the stimulus, we calculated an F statistic across stimulus locations for each electrode (see Electrode selectivity), where larger values indicate that stimulus-evoked power varies with stimulus location to a greater extent (Fig. 4c). We found that posterior electrodes carried the most information about stimulus location. Although the cortical source of EEG signals cannot be fully resolved based on EEG scalp recordings, this analysis as well as the timing of the observed activity suggest that the spatially selective activity that our IEM analysis capitalized on is generated in posterior visual areas.
Having established that stimulus-evoked power precisely encodes stimulus position, we examined the effect of attention on the tuning properties of the stimulus-evoked CTFs. Figure 5a shows the stimulus-evoked CTFs in our window of interest. We fitted the CTFs in each condition with an exponentiated cosine function to estimate baseline, amplitude, and width parameters (Fig. 2e; see Quantifying changes in channel-tuning functions). Figure 5b shows the parameters of the best-fitting functions by condition. We found that stimulus-evoked CTFs were both higher in amplitude (p < 0.0001) and more broadly tuned (p < 0.0001) in the attend-stimulus condition than in the attend-fixation condition, and we observed no difference in baseline between the conditions (p = 0.974). However, as we will see next, the finding that CTFs were broader in the attend-stimulus condition than in the attend-fixation condition appears to be an artifact of lingering activity from the preceding stimulus event. Furthermore, this effect did not replicate in Experiment 2. Thus, the primary effect of attention is to improve the stimulus representation via an increase in the amplitude of the CTF that tracks the target's position.
Controlling for lingering activity evoked by the preceding stimulus in the sequence
We designed our task to measure activity evoked by each of the four stimuli presented within each trial. To this end, we jittered the interstimulus interval between each stimulus (between 500 and 800 ms) to ensure that activity evoked by one stimulus in the sequence would not be phase-locked to the onsets of the stimuli before or after it in the sequence. However, when we examined the amplitude of stimulus-evoked CTFs through time (Fig. 5c), we found prestimulus tuning (in the 200 ms preceding stimulus onset) that was higher amplitude in the attend-stimulus than the attend-fixation condition (p = 0.036). We hypothesized that this prestimulus spatially selective activity may reflect activity evoked by the preceding stimulus in the sequence that was sufficiently low frequency that it was not eliminated by the temporal jitter between stimulus onsets. Because this prestimulus activity was higher amplitude in the attend-stimulus condition than in the attend-fixation condition, it could have contaminated the apparent attentional modulations of stimulus-evoked activity (both the increase in amplitude and the broadening of stimulus-evoked CTFs) that we observed 80-130 ms after stimulus onset. Thus, we examined the effect of this lingering activity by examining CTFs as a function of position in the sequence of four stimuli within each trial. Within each trial, the second, third, and fourth stimuli were preceded by a bullseye stimulus that should drive a strong visually evoked response, whereas the first stimulus was preceded by a small, low-contrast cue that should drive a much weaker visually evoked response (Fig. 1). Thus, we expected that stimulus-evoked activity for the first bullseye stimulus in the sequence should be contaminated by activity evoked by the preceding stimulus to a lesser degree than subsequent stimuli in the sequence. Figure 6 shows the reconstructed CTFs from activity evoked by stimuli in each position in the sequence.
For this analysis, we trained the IEM on all but the tested stimulus. For example, when testing on the first stimulus in the sequence, we trained on stimuli in serial positions 2-4. We found a robust effect of attention on the amplitude of the stimulus-evoked CTFs across stimuli in all positions in the sequence (all p values < 0.05). In contrast, we found that the CTFs were broader in the attend-stimulus condition than in the attend-fixation condition for the second, third, and fourth stimuli in the sequence (all p values < 0.05), but not for the first stimulus in the sequence (p = 0.540), when the influence of lingering stimulus-evoked activity should be greatly reduced. This finding suggests that the increase in CTF width was driven by lingering activity evoked by the preceding stimulus in the sequence. It is not entirely clear why lingering activity from the preceding stimulus increased the width of CTFs rather than simply increasing CTF amplitude. One possibility is that spatially tuned activity evoked by a visual stimulus is more broadly tuned at later latencies than during the initial encoding of the stimulus.
Next, to obtain converging evidence for this conclusion, we took a different approach to eliminate lingering activity evoked by the preceding stimulus while still collapsing across all stimulus positions in the sequence. It is primarily low-frequency components that survive temporal jitter. Thus, we reanalyzed the data, this time applying a 4 Hz high-pass filter to remove very low-frequency activity. We found that high-pass filtering the data eliminated the prestimulus difference in spatial selectivity between the attend-stimulus and attend-fixation conditions (p = 0.458; see Fig. 7c), suggesting that the prestimulus activity was restricted to low frequencies. Having established that a high-pass filter eliminated prestimulus activity, we reexamined stimulus-evoked CTFs in our window of interest (80-130 ms) after high-pass filtering (Fig. 7a,b). Again, we found that the CTFs were higher amplitude when the stimulus was attended (p < 0.0001). We also found that CTFs were more broadly tuned when the stimulus was attended (p < 0.01). However, as we will see, this small effect of attention on CTF width did not replicate in Experiment 2, suggesting that the primary effect of attention is to increase the amplitude of stimulus-evoked CTFs.
Experiment 2
Past fMRI work has found that spatially attending a stimulus increases the amplitude of spatial representations in visual cortex (Sprague and Serences, 2013; Vo et al., 2017). However, this effect of attention on the amplitude of this spatially specific activity is additive with stimulus contrast, such that attention effects are equivalent across all levels of stimulus contrast (Buracas and Boynton, 2007; Murray, 2008; Sprague et al., 2018b; Itthipuripat et al., 2019). Therefore, these changes in spatially specific activity measured with fMRI appear to reflect a stimulus-independent, additive shift in cortical activity that does not provide insight into how attention affects stimulus-evoked sensory processing. In contrast, the CTFs reconstructed from stimulus-evoked EEG activity provide a more direct window into how attention affects stimulus-driven sensory activity by isolating activity that is phase-locked to target onset. Therefore, in Experiment 2, we manipulated stimulus contrast to test how the effect of attention on stimulus-evoked population codes scales with stimulus contrast.
Observers performed the same task as in Experiment 1 (Fig. 2a), but we parametrically varied the pedestal contrast of the bullseye stimulus from 6.25% to 90.6% across trials. We adjusted the size of the contrast decrement independently for each of the conditions using a staircase procedure designed to hold accuracy at ∼76% correct (see Staircase procedures). Accuracy was well matched across conditions: mean accuracy across subjects did not deviate from 76% by >1% in any condition (Table 2). We reconstructed CTFs independently for each condition, having first estimated channel weights using additional trials (with a pedestal contrast of 90.6%) that were collected for this purpose (see Training and testing data). In Experiment 2, we again used a 4 Hz high-pass filter to remove lingering activity evoked by the preceding stimulus in the sequence. Figure 8a, b shows the stimulus-evoked CTFs as a function of contrast with the best-fit functions for the attend-stimulus and attend-fixation conditions, respectively. For each of the three parameters (amplitude, baseline, and width), we performed a resampling test to test for a main effect of contrast, a main effect of attention, and an attention × contrast interaction (see Quantifying changes in channel-tuning functions). First, we examined CTF amplitude (Fig. 8c). We found that CTF amplitude increased with stimulus contrast (main effect of contrast: p < 0.0001), and CTF amplitude was larger in the attend-stimulus condition than in the attend-fixation condition (main effect of attention: p < 0.0001). Critically, the effect of attention on CTF amplitude increased with stimulus contrast (attention × contrast interaction, p < 0.0001). This finding provides clear evidence that the effect of attention on stimulus-evoked CTFs is not additive with stimulus contrast, as is the case with BOLD activity measured by fMRI (Buracas and Boynton, 2007; Murray, 2008; Sprague et al., 2018b; Itthipuripat et al., 2019).
To further characterize this effect, we fitted the amplitude parameter with a Naka-Rushton function (see Quantifying contrast-response functions). The curves in Figure 8c show the best-fit functions for each condition. We estimated four parameters of the Naka-Rushton function: a baseline parameter (b), which determines the offset of the function from zero; a response gain parameter (Rmax), which determines how much the function rises above baseline; a contrast gain parameter (C50), which measures horizontal shifts in the function; and a slope parameter (n), which determines how steeply the function rises. We found that Rmax was reliably higher in the attend-stimulus condition than in the attend-fixation condition (resampling test, p = 0.036). However, we did not find reliable differences between conditions for the C50, b, or n parameters (resampling tests, p = 0.104, p = 0.126, and p = 0.376, respectively; for descriptive statistics, see Table 3). Thus, we found that attention primarily changed the amplitude of stimulus-evoked CTFs via an increase in response gain.
Next, we examined CTF width (Fig. 8d). We found that estimates of CTF width were very noisy for the 6.25% and 12.5% contrast conditions because of the low amplitude of the CTFs in these conditions, precluding confidence in those estimates. Thus, we restricted our analysis to the higher contrast conditions (25.0%, 50.0%, and 90.6% contrast). We found no main effect of attention (p = 0.851), and no main effect of contrast (p = 0.130). However, we found a reliable attention × contrast interaction (p = 0.035), such that CTFs were narrower when the stimulus was attended for the 90.6% contrast condition and 50% contrast condition, and were broader for the 25% contrast condition, but none of these differences between the attend-stimulus and attend-fixation conditions survived Bonferroni correction (p = 0.043, p = 0.277, and p = 0.258, respectively; αcorrected = 0.05/3 = 0.017). Thus, we did not replicate the finding from Experiment 1 that stimulus-evoked CTFs were broader when the stimulus was attended. Finally, we examined CTF baseline (Fig. 8e). Although CTF baseline was generally higher in the attend-stimulus condition than in the attend-fixation condition, this difference was not significant (main effect of attention, p = 0.055), nor was the main effect of contrast (p = 0.708) or the attention × contrast interaction (p = 0.289).
Attention produces a baseline shift in spatially selective alpha band power
Past work has closely linked alpha band (8-12 Hz) oscillations with covert spatial attention. A plethora of studies has shown that posterior alpha band power is reduced contralateral to an attended location (e.g., Worden et al., 2000; Kelly et al., 2006; Thut et al., 2006). Furthermore, alpha band activity precisely tracks where in the visual field spatial attention is deployed (Rihs et al., 2007; Samaha et al., 2016; Foster et al., 2017). For example, we and others have reconstructed spatial CTFs from alpha band activity that track the spatial and temporal dynamics of covert attention (e.g., Foster et al., 2017). Importantly, the relationship between alpha topography and attention appears to include a stimulus-independent component because alpha activity tracks the allocation of spatial attention in blank or visually balanced displays (Worden et al., 2000; Thut et al., 2006). More recent work has provided further evidence in favor of this view. Itthipuripat et al. (2019) parametrically varied the contrast of a lateral stimulus and cued observers to either attend the stimulus or attend the fixation dot (similar to the task we use in the current study). Itthipuripat et al. (2019) found that the effects of attention and stimulus contrast on posterior alpha band power contralateral to the stimulus were additive: although contralateral alpha power declined as stimulus contrast increased, directing attention to the stimulus reduced contralateral alpha power by the same margin regardless of stimulus contrast. This finding suggests that alpha band activity indexes the locus of spatial attention in a stimulus-independent manner.
If alpha band activity reflects a stimulus-independent aspect of spatial attention, then fluctuations of alpha power should be additive with stimulus contrast in Experiment 2. Thus, we examined CTFs reconstructed from total alpha band power (i.e., the power of alpha band activity regardless of its phase relationship to stimulus onset) in a poststimulus window (0-500 ms after stimulus onset). Figure 9a, b shows the reconstructed alpha band CTFs for the attend-stimulus and attend-fixation conditions, respectively. Figure 9c–e shows the amplitude, width, and baseline parameters as a function of condition. We found that the amplitude of alpha band CTFs (Fig. 9c) increased with stimulus contrast (main effect of contrast: p < 0.0001), and CTF amplitude was greater in the attend-stimulus condition than in the attend-fixation condition (main effect of attention: p = 0.0005). Importantly, we did not find a reliable interaction between attention and stimulus contrast on CTF amplitude (attention × contrast interaction, p = 0.438). Thus, the effects of contrast and attention on the amplitude of alpha band CTFs were additive. Although spatial CTFs were generally broader in the attend-stimulus condition than in the attend-fixation condition (Fig. 9d), we did not find a reliable main effect of attention (p = 0.094), nor did we find a main effect of contrast (p = 0.869) or an attention × contrast interaction (p = 0.908). Finally, we found that baseline was reliably lower in the attend-stimulus condition than in the attend-fixation condition (Fig. 9e; main effect of attention: p < 0.001). Thus, attending the stimulus not only increased activity in the channel tuned for the attended location, but also reduced activity in channels tuned for distant locations. We did not find a reliable main effect of contrast (p = 0.080), or an attention × contrast interaction (p = 0.900).
To summarize, spatial attention primarily influenced the amplitude and baseline of alpha band CTFs, and these effects were additive with the effect of stimulus contrast. Thus, the effect of attention on alpha band power reflects a stimulus-independent baseline shift in spatially selective alpha band power, much like the effect of attention on spatially specific BOLD activity in past fMRI studies of attention (Murray, 2008; Itthipuripat et al., 2019).
Discussion
To examine how and when covert spatial attention shapes the selectivity of stimulus-driven spatial population codes, we reconstructed spatially selective CTFs from stimulus-evoked EEG signals that were phase-locked to stimulus onset. Across two experiments, we found that attention increased the amplitude of stimulus-evoked CTFs that were tuned for the location of the stimulus. We did not find convincing evidence that attention changed the width of stimulus-evoked CTFs. Although we found that stimulus-evoked CTFs were broader for attended stimuli than for unattended stimuli in Experiment 1, this effect was greatly reduced when the influence of prior stimulus events was accounted for, and did not replicate in Experiment 2. Therefore, our results show that spatial attention primarily increases the amplitude of stimulus-evoked population tuning functions.
A core strength of our EEG-based approach is that it allowed us to isolate early visually evoked activity. We focused our analysis on stimulus-evoked activity in a window 80-130 ms after stimulus onset. Visually evoked EEG activity at this latency reflects the first wave of stimulus-driven activity in extrastriate cortex (Clark and Hillyard, 1996; Martínez et al., 1999) but likely also captures early recurrent feedback signals (e.g., Boehler et al., 2008). Many ERP studies have shown that spatial attention increases the amplitude of evoked responses at this early latency. For example, spatial attention increases the amplitude of the posterior P1 component observed ∼100 ms after stimulus onset (van Voorhis and Hillyard, 1977; Martínez et al., 1999; Itthipuripat et al., 2014a). However, it is unclear how changes in the overall amplitude of visually evoked potentials correspond to changes in underlying population codes. For instance, a larger overall population response could reflect an increase in the amplitude of the spatial population code, or it could reflect a broadening of the spatially tuned population response without increasing its amplitude, such that the stimulus evoked a response in a larger population of neurons. Here, we provide the first clear evidence that attention enhances the amplitude of the stimulus-evoked spatial population codes during this early stage of sensory processing.
In Experiment 2, we confirmed that we were observing an attentional modulation of stimulus-evoked activity rather than a stimulus-independent increase in baseline activity. Here, we found that the effect of attention on the amplitude of stimulus-evoked CTFs increased with stimulus contrast. Model fitting revealed that this effect was best described by an increase in response gain (i.e., a multiplicative scaling of the CRF), which dovetails with past work that has found that attention increases response gain of the P1 component and of steady-state visually evoked potentials (Kim et al., 2007; Itthipuripat et al., 2014a, 2014b, 2019). Although our results are most consistent with an increase in response gain, it must be noted that our CRFs did not clearly saturate at higher stimulus contrast, which makes it difficult to unambiguously differentiate between response gain and contrast gain because contrast gain can mimic response gain in the absence of clear saturation (e.g., consider the left half of the functions in Fig. 1b, which closely resemble a change in response gain). We also note that our finding that attention increased response gain may depend on the fact that we cued the precise location of the bullseye stimulus. The normalization model of attention (Reynolds and Heeger, 2009), an influential computational model of attention, predicts that whether attention produces a change in response gain or contrast gain depends on the spread of spatial attention relative to the size of the stimulus. Specifically, the model predicts that attention will change response gain when attention is tightly focused on a stimulus but will change contrast gain (shifting the CRF to the left) when the spatial spread of attention is large relative to the stimulus (Reynolds and Heeger, 2009). 
Indeed, past EEG and psychophysical studies that have manipulated the spatial spread of attention relative to the size of the stimulus have supported this prediction (Herrmann et al., 2010; Itthipuripat et al., 2014b). Thus, further work is needed to test whether the change in response gain that we observed in the amplitude of the spatially tuned population response is specific to situations in which observers can focus spatial attention very tightly on the stimulus. Nevertheless, Experiment 2 provides unambiguous evidence that the effect of attention on the amplitude of spatially tuned population responses reflects a modulation of stimulus-driven activity rather than a stimulus-independent, additive shift as is measured with fMRI (Buracas and Boynton, 2007; Murray, 2008; Pestilli et al., 2011; Sprague et al., 2018b; Itthipuripat et al., 2019; but see Li et al., 2008).
Other aspects of our findings, however, are consistent with the stimulus-independent effects that have been observed in BOLD activity. There is substantial evidence that attention is linked with spatially specific changes in alpha band power (for review, see Jensen and Mazaheri, 2010; Foster and Awh, 2019). Many studies have shown that alpha power is reduced contralateral to attended locations (e.g., Worden et al., 2000; Thut et al., 2006). This reduction is thought to reflect a stimulus-independent shift in alpha power because it is seen in the absence of visual input (Sauseng et al., 2005; Foster et al., 2020). Recently, Itthipuripat et al. (2019) provided new support for this view. They found that spatially attending a lateralized stimulus reduced contralateral alpha power by the same margin regardless of stimulus contrast. We conceptually replicated and extended this finding. Attention-related modulations of alpha power track the precise location that is attended within the visual field (Rihs et al., 2007; Samaha et al., 2016; Foster et al., 2017). Thus, we examined the effect of attention on poststimulus alpha band CTFs. Consistent with the results of Itthipuripat et al. (2019), we found that the effect of attention on poststimulus alpha band CTFs was additive with the effect of stimulus contrast, such that spatial attention increased the amplitude of spatially tuned alpha band CTFs by the same amount regardless of stimulus contrast. Thus, our results add to growing evidence that attention-related changes in alpha band power are stimulus-independent.
In conclusion, decades of work have established that spatial attention modulates relatively early stages of sensory processing, but there has been limited evidence regarding how attention changes population-level sensory codes. Here, we have provided robust evidence that spatial attention increases the amplitude of spatially tuned neural activity evoked by attended items within 100 ms of stimulus onset. Thus, attention increases the gain of spatial population codes during the first wave of sensory activity.
Footnotes
This work was supported by National Institute of Mental Health Grant 5RO1 MH087214-08. We thank Mei Arditi, Emma Bsales, and Naomi Nero for assistance with data collection.
The authors declare no competing financial interests.
Correspondence should be addressed to Joshua J. Foster at jjfoster@bu.edu