Abstract
Functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) are two noninvasive methods commonly used to study neural mechanisms supporting visual attention in humans. Studies using these tools, which have complementary spatial and temporal resolutions, implicitly assume they index similar underlying neural modulations related to external stimulus and internal attentional manipulations. Accordingly, they are often used interchangeably for constraining understanding about the impact of bottom-up and top-down factors on neural modulations. To test this core assumption, we simultaneously manipulated bottom-up sensory inputs by varying stimulus contrast and top-down cognitive modulations by changing the focus of spatial attention. Each of the male and female subjects participated in both fMRI and EEG sessions performing the same experimental paradigm. We found categorically different patterns of attentional modulation on fMRI activity in early visual cortex and early stimulus-evoked potentials measured via EEG (e.g., the P1 component and steady-state visually-evoked potentials): fMRI activation scaled additively with attention, whereas evoked EEG components scaled multiplicatively with attention. However, across longer time scales, a contralateral negative-going potential and oscillatory EEG signals in the alpha band revealed additive attentional modulation patterns like those observed with fMRI. These results challenge prior assumptions that fMRI and early stimulus-evoked potentials measured with EEG can be interchangeably used to index the same neural mechanisms of attentional modulations at different spatiotemporal scales. Instead, fMRI measures of attentional modulations are more closely linked with later EEG components and alpha-band oscillations. Considered together, hemodynamic and electrophysiological signals can jointly constrain understanding of the neural mechanisms supporting cognition.
Significance Statement
fMRI and EEG have been used as tools to measure the location and timing of attentional modulations in visual cortex and are often used interchangeably for constraining computational models under the assumption that they index similar underlying neural processes. However, by varying attentional and stimulus parameters, we found differential patterns of attentional modulations of fMRI activity in early visual cortex and commonly used stimulus-evoked potentials measured via EEG. Instead, across longer time scales, a contralateral negative-going potential and EEG oscillations in the alpha band exhibited attentional modulations similar to those observed with fMRI. Together, these results suggest that different physiological processes assayed by these complementary techniques must be jointly considered when making inferences about the neural underpinnings of cognitive operations.
Introduction
Functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) are commonly used as complementary methods to study the neural mechanisms that support human visual attention. fMRI and EEG are different assays of neural activity, with fMRI measuring changes in blood volume and the ratio of oxygenated to deoxygenated hemoglobin (Logothetis et al., 2001; Logothetis, 2002, 2008) and EEG measuring electrical potentials on the scalp generated by coherent activity in large populations of cortical neurons (Luck, 2012; Lopes da Silva, 2013). Implicit in many studies is the assumption that attentional modulations of early sensory responses measured using fMRI and EEG reflect the same underlying changes in neural activity at different spatial and temporal resolutions. For example, there is a long tradition of using both fMRI and EEG to assess attention-induced gain amplification of early sensory signals, with the former measure used for fine-grained spatial localization and the latter for tracking the precise timing of attention-related modulations (Hillyard and Anllo-Vento, 1998; Mangun et al., 1998; Martínez et al., 1999, 2001; Di Russo et al., 2002, 2005, 2007; Noesselt et al., 2002; Busse et al., 2005; Novitskiy et al., 2011; Zhang et al., 2012; Chen et al., 2014; Di Russo and Pitzalis, 2014; Green et al., 2017).
However, there are hints in the literature that fMRI and commonly measured early sensory EEG responses are not simply two complementary means of assaying the same neural modulations. For example, when neural responses are measured as a function of stimulus contrast to obtain contrast response functions (CRFs), different types of attentional modulations have been observed across techniques (Fig. 1a). Results from fMRI often support a mechanism whereby attention increases the evoked response to all stimuli equally, regardless of their contrast, called an “additive shift” (Buracas and Boynton, 2007; Murray, 2008; but see Li et al., 2008; Pestilli et al., 2011; Hara and Gardner, 2014; Sprague et al., 2018b). In contrast, results from EEG and other electrophysiological measurements in visual cortex often support a contrast-dependent response modulation (either a horizontal shift of the CRF, called “contrast gain”, or a multiplicative scaling of the CRF, called “response gain”: Reynolds et al., 2000; Di Russo et al., 2001; Martínez-Trujillo and Treue, 2002; but see Williford and Maunsell, 2006; Kim et al., 2007; Lee and Maunsell, 2009; Lauritzen et al., 2010; Wang and Wade, 2011; Andersen et al., 2012; Itthipuripat et al., 2014a,b, 2017, 2018). Although these different types of attention effects could be because of differences in task designs, stimulus properties, recording sites, training duration, cognitive demand, and subjects' attentional strategy and expertise (Reynolds and Heeger, 2009; Herrmann et al., 2010; Itthipuripat et al., 2014a, 2017; Ruff and Cohen, 2014, 2016; Zhang et al., 2016; Maniglia and Seitz, 2018; Ni et al., 2018), another reasonable source of divergence is the neural response properties to which each measurement is most sensitive (Boynton, 2011; Itthipuripat and Serences, 2016).
Here, we directly evaluated the extent to which results from different measurement techniques covary across parametric manipulations of stimulus intensity and cognitive demands, two factors that are often related to changes in measured signal properties. We sought to test the hypothesis that, when all sources of variability (e.g., task designs, task difficulty, and subjects) are controlled for to the best extent possible, fMRI and EEG actually index similar types of neural modulations across stimulus and task manipulations. Mapping out the full pattern of attentional modulations across different stimulus intensity levels is critical, as measuring modulations at one stimulus intensity will only index changes because of cognitive demands, and using a single task condition will only index stimulus-related modulations (Hermes et al., 2017). Accordingly, each manipulation on its own does not provide enough data to evaluate whether fMRI and EEG responses follow the same profile (Fig. 1a).
We sequentially recorded fMRI and EEG in the same subjects while they performed an attention-demanding contrast detection task with equated task difficulty across stimulus contrast levels, attention conditions, and measurement techniques. Contrary to the implicit hypothesis that fMRI and EEG are just complementary methods that measure the same modulations at different spatiotemporal scales, we found that fMRI responses qualitatively diverged from commonly-measured early evoked EEG signals such as the P1 and steady-state visually evoked potentials (SSVEPs). However, a later low-frequency contralateral negative-going potential and induced oscillatory signals in the alpha band (10–12 Hz) revealed attentional modulations that more closely tracked those recorded with fMRI.
Materials and Methods
Participants.
Seven neurologically healthy human observers (19–32 years old, 3 females, 1 left-handed) with normal or corrected-to-normal vision were recruited from the University of California, San Diego (UCSD) community. All participants provided written informed consent as approved by the human subjects Institutional Review Board at UCSD, and the experiment was conducted under a protocol that followed the Declaration of Helsinki. The participants were compensated $10, $15, and $20 per hour for participating in behavioral training, EEG, and fMRI recording sessions, respectively. Two of the participants are authors (S.I. and T.C.S.) and were not compensated.
Stimulus presentation.
During behavioral and EEG recording sessions, stimuli were presented on a PC running Windows XP using MATLAB (MathWorks) and the Psychophysics Toolbox v3.0.8. Participants sat 60 cm from the CRT monitor (60 Hz refresh rate) in a sound-attenuated and electromagnetically shielded room (ETS-Lindgren). During fMRI scanning sessions, we presented stimuli using a contrast-linearized LCD projector (60 Hz) on a rear-projection screen mounted at the foot of the scanner (110 cm wide, ∼4.1 m viewing distance). All stimuli appeared on a neutral gray background.
fMRI and EEG main tasks.
Throughout the main fMRI and EEG task, we instructed participants to fixate on the dark gray fixation point located at the center of the gray screen. Individual trials started with a 500 ms color cue, instructing participants either to covertly attend to the checkerboard stimulus (radius = 1.45° visual angle) located 3.6° to the left (a red cue) or right (a blue cue) relative to fixation (attend-stimulus condition). Alternatively, subjects were instructed to maintain fixation on the gray fixation dot while ignoring the peripheral stimulus (indicated with a green cue; attend-fixation condition). The checkerboard stimulus appeared 500–1000 ms after cue onset and the stimulus continued flickering at 15 Hz (contrast reversal) for 2000 ms. On each trial, the checkerboard stimulus had one of the following Michelson contrast levels: 0, 4.375, 8.75, 17.5, 35, and 70% (logarithmically-spaced). In fMRI sessions, the intertrial interval varied from 3000 to 7000 ms, and in EEG experiments, it varied from 1500 to 2000 ms. On 25% of the attend-stimulus trials, the stimulus contained a constant contrast increment (target trials). On 25% of the attend-fixation trials, the gray fixation dot contained a constant contrast increment (target trials). The contrast increment in both attend-stimulus and attend-fixation target trials appeared anytime from 600 to 1300 ms after the stimulus onset. Subjects were instructed to press a button with their right index finger as quickly and accurately as possible when they saw this contrast increment. The contrast increment of the fixation dot and the contrast increment of the checkerboard stimulus were separately determined for each pedestal contrast level on a block-by-block basis to fix the hit rate at ∼75% across all stimulus and task conditions.
Each subject completed one fMRI session and two EEG sessions, with each session completed on a different day. The fMRI experiment contained six blocks of trials, whereas the EEG experiment contained 20 blocks of trials in total. Every two blocks of the main task contained 96 trials in total where all stimulus and attention conditions were counterbalanced: 2 attention conditions (attend-stimulus/attend-fixation) × 2 stimulus locations (left/right hemifield) × 6 pedestal contrasts × 4 repeats. The order of stimulus and attention conditions was pseudorandomized within these two-block sequences. To control for any possible effects of learning that might occur across sessions, four participants first underwent two EEG sessions followed by an fMRI session, whereas the other three participants first underwent an fMRI session followed by two EEG sessions. We obtained fewer trials of fMRI data than EEG data because the observed fMRI signals had a relatively high signal-to-noise ratio.
Before participants began the first recording session (either EEG or fMRI), they underwent a 2.5 h behavioral training session on an identical task, except that there were targets on 50% of the trials instead of 25%, there was no response deadline, and subjects had to report whether each trial was a target trial (a stimulus with a contrast increment) or a non-target trial by pressing one of the two corresponding buttons on a keyboard. During this training session, the contrast thresholds were estimated using a staircase procedure that was applied independently for each attention condition and each pedestal contrast level. Three successive correct responses (either hits or correct rejections) led to a 0.5% decrease in the Δc that defined the target stimulus, whereas one incorrect response (either a miss or a false alarm) led to a 0.5% increase in Δc. Trials from the first five reversals were excluded and the mean values of the contrast increments from the remaining trials were used as contrast detection thresholds in the first block of the first EEG or fMRI recording session.
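The three-correct-down, one-incorrect-up rule described above can be sketched as follows. This is a minimal illustration rather than the authors' MATLAB implementation; the function name `update_increment`, the starting increment of 5%, and the lower bound `floor` are assumptions introduced for the example.

```python
def update_increment(delta_c, correct, streak, step=0.5, floor=0.5):
    """Apply one trial of a 3-down/1-up staircase to the contrast increment.

    delta_c : current contrast increment (% contrast)
    correct : True for a hit or correct rejection, False for a miss or false alarm
    streak  : number of consecutive correct responses before this trial
    step    : step size in % contrast (0.5 in the text)
    floor   : smallest allowed increment (hypothetical; not stated in the text)
    Returns the updated (delta_c, streak).
    """
    if not correct:
        return delta_c + step, 0              # one error: make the target easier
    streak += 1
    if streak == 3:
        return max(floor, delta_c - step), 0  # three in a row: make it harder
    return delta_c, streak


# Example: start at a hypothetical 5% increment and run a short response sequence.
delta_c, streak = 5.0, 0
for correct in [True, True, True, False, True, True, True, True]:
    delta_c, streak = update_increment(delta_c, correct, streak)
```

A rule of this kind converges on an increment at which correct responses are clearly more likely than errors, which is why it is a common choice for seeding detection thresholds before the recording sessions.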
Behavioral analysis.
We first computed perceptual sensitivity, or behavioral d-prime (d′), using the following equation: d′ = Z(hit rate) − Z(false alarm rate), where Z is the inverse of the cumulative distribution function of the Gaussian distribution. To test whether d′ was equated across contrast levels (6 levels), attention conditions (attend-stimulus/attend-fixation), and measurement modalities (fMRI/EEG), we used a three-way repeated-measures ANOVA with these variables as within-subject factors. In addition, we used a separate two-way repeated-measures ANOVA to test the main effects of contrast and measurement modality and their interaction on the behavioral contrast detection thresholds.
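As a concrete illustration of the d′ equation, the sketch below uses the inverse Gaussian CDF from the Python standard library. The clipping of extreme rates (`eps`) is an assumption added so that hit or false alarm rates of exactly 0 or 1 do not produce infinite values; the text does not say how such cases were handled.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate, eps=1e-3):
    """d' = Z(hit rate) - Z(false alarm rate), Z = inverse Gaussian CDF.

    Rates of exactly 0 or 1 give infinite Z values, so they are clipped to
    [eps, 1 - eps]. The clipping rule is an assumption, not from the text.
    """
    z = NormalDist().inv_cdf

    def clip(p):
        return min(max(p, eps), 1.0 - eps)

    return z(clip(hit_rate)) - z(clip(false_alarm_rate))


# Hit rate at the 75% target with a hypothetical 10% false alarm rate.
example = d_prime(0.75, 0.10)
```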
fMRI functional localizers.
Participants performed 1–2 blocks of a functional localizer task to identify voxels that were visually responsive to the portion of the visual field subtended by the stimulus in the main task. Subjects maintained fixation while ignoring the localizer stimulus. The peripheral checkerboard stimulus was 100% contrast and presented at the same size and location as in the main task. It flickered at 15 Hz and alternately appeared in the left and right stimulus locations for 8 s/trial. Subjects responded with a button press when they perceived a brief and small contrast change at the fixation point; contrast detection targets could appear between 2 and 3 times per 8 s trial.
To estimate the spatial sensitivity profile of each voxel during the “training” phase of the inverted encoding model (IEM) analysis (see fMRI acquisition, preprocessing and analysis), participants performed 7–8 blocks of a spatial mapping task. All participants performed 4 blocks using low-contrast mapping stimuli (50% contrast); one participant performed 3 blocks using high-contrast mapping stimuli (100% contrast), whereas the remaining participants performed 4 high-contrast mapping blocks. To ensure IEMs for all participants were estimated with an equivalent amount of data, we used the low-contrast mapping data for all analyses reported here. On each trial, a 15 Hz flickering checkerboard stimulus 2.90° in diameter appeared at a different location on the screen, selected from an 8 × 4 square grid (1.45° horizontal/vertical spacing) and jittered on each trial (±0.725° in X, Y independently). We also included six null trials in which no checkerboard stimulus appeared. On all trials (stimulus-present and null trials), participants carefully monitored for a brief dimming of the fixation point, which acted as a target stimulus (1 target per trial; targets appeared on 50% of trials), and participants responded with a button press when the dimming occurred.
fMRI retinotopic mapping procedure.
Striate and extra-striate visual areas (V1, V2v, V3v, V2d, V3d, hV4) were defined by standard retinotopic mapping procedures, using a rotating counter-phase flickering checkerboard in conjunction with bowtie stimuli subtending the vertical and horizontal visual field meridians on alternating blocks, during separate scanner runs. The data were projected onto a computationally inflated gray/white matter boundary surface reconstruction for visualization (Engel et al., 1994; Sereno et al., 1995). V2 and V3 were combined by concatenating voxels so that any slight errors in drawing the horizontal meridian boundaries that separate them would not bias the inclusion of the localizer-defined voxels into one region or the other. This was especially important because the visual stimuli in this study were presented along the horizontal meridian, so imperfections in ROI definitions could result in erroneous conclusions about differences between V2 and V3, which we do not believe are possible to fairly assay with this stimulus setup. For the V1-hV4 ROI, we concatenated all voxels from all ROIs (V1, V2/V3, and hV4).
fMRI acquisition, preprocessing and analysis.
We acquired fMRI data on a 3-tesla research-dedicated GE MR750 scanner located at the Keck Center for Functional MRI at UCSD. We scanned all participants twice: once for a retinotopic mapping session and once for the main task session (each scan ∼2 h). During each session we acquired a high-resolution whole-head anatomical image used to align to the retinotopic mapping session (T1-weighted fast-spoiled gradient echo sequence, 25.6 × 25.6 cm FOV, 256 × 192 acquisition matrix, 8.136/3.172 ms TR/TE, 192 slices, 9° flip angle, 1 mm isotropic voxel size).
We acquired task data using a Nova 32-channel head coil (Nova Medical) at 3 mm isotropic resolution using axial slices spanning occipital cortex, and also including parietal and frontal cortex (TR = 2000 ms, TE = 30 ms, flip angle = 90°, 35 interleaved slices, 3 mm thickness, 0 mm gap, 19.2 × 19.2 cm FOV, 64 × 64 acquisition matrix). We acquired 179 volumes of data per run for the main task, 91 volumes for the IEM mapping task, and 118 volumes for the stimulus localizer task.
Preprocessing included unwarping using custom scripts implementing procedures from AFNI and FSL. All subsequent preprocessing occurred in BrainVoyager 2.6.1, including slice time correction, six-parameter rigid-body motion correction, high-pass temporal filtering to remove slow signal drifts over the course of each run, and transformation of data into aligned Talairach space. Then, the BOLD signal was normalized within each voxel for each run separately to Z-scores. All other analyses involved custom MATLAB scripts.
For fMRI analyses, we extracted the signal at each voxel on each trial using a GLM framework. We modeled each trial independently for the IEM mapping task and main contrast discrimination task [hemodynamic response functions (HRFs): two-gamma, time-to-peak 5 s, undershoot peak at 15 s, response undershoot ratio 6, response and undershoot dispersion of 1]. To extract deconvolved HRFs, we used a finite impulse response model, modeling time points from −2 to 16 s (spanning 10 TRs). Each condition (6 contrasts × 2 attention conditions × 2 positions × 2 target presence conditions) was modeled together, along with six run-specific constant terms, resulting in a model with 246 predictors. We solved this model using standard linear regression, and plotted HRFs averaged across voxels within localizer-defined ROIs (after sorting trials based on the stimulus location relative to ROI hemisphere; Fig. 2). Error bars represent within-participants SEM, which we computed by removing the mean across all time points and conditions within each participant individually, then computing SEM at each time point within each condition.
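The within-participants SEM described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the nested-list layout `data[participant][condition][time]` and the use of the sample standard deviation (n − 1 denominator) are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def within_participant_sem(data):
    """data[p][c][t]: response of participant p in condition c at time point t.

    Removes each participant's mean across all conditions and time points,
    then returns sem[c][t] computed across participants.
    """
    centered = []
    for subj in data:
        grand = mean([v for cond in subj for v in cond])  # one mean per participant
        centered.append([[v - grand for v in cond] for cond in subj])
    n = len(centered)
    n_cond, n_time = len(centered[0]), len(centered[0][0])
    return [[stdev([s[c][t] for s in centered]) / sqrt(n)
             for t in range(n_time)] for c in range(n_cond)]


# Three hypothetical participants whose responses differ only by a constant
# offset: the between-participant offset is removed, so every SEM is zero.
data = [[[1.0, 2.0], [3.0, 4.0]],
        [[11.0, 12.0], [13.0, 14.0]],
        [[21.0, 22.0], [23.0, 24.0]]]
sem = within_participant_sem(data)
```

Removing each participant's overall mean keeps the error bars from reflecting stable between-participant differences in signal level, which are irrelevant to within-participant comparisons across conditions.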
For the stimulus localizer task, we modeled all trials using a “left” and a “right” regressor. For univariate analyses, we extracted activation from voxels significantly activated by the localizer task (q = 0.05, whole-brain FDR corrected), averaged across voxels responsive to the left or right stimulus, and sorted trials by contrast and attention condition. This resulted in a range of ROI sizes across participants and hemispheres (V1: 27–285 voxels; V2/V3: 7–579 voxels; hV4: 0–215 voxels; note for the participant with 0 voxels in one hemisphere, we only included data for which the non-empty ROI was stimulated before averaging responses across trials).
For multivariate analyses, we used all voxels across both hemispheres in retinotopically-defined ROIs (V1: 536–1086 voxels, V2/V3: 854–1732 voxels; hV4: 314–746 voxels). For these analyses, we modeled the response of each voxel as a linear combination of a discrete set of spatial filters, or “information channels” (for a detailed description of the analysis framework, see Sprague et al., 2016). We modeled channels as a rectangular grid, 9 × 5, of 1.81° full-width half-maximum (FWHM) round filters, spaced by 1.449° horizontally/vertically:
$$ f(r) = \left(0.5 + 0.5\cos\frac{\pi r}{s}\right)^{7} \quad \text{for } r < s;\ 0 \text{ otherwise} \tag{1} $$
where r is the distance from the filter center and s is a “size constant” reflecting the distance from the center of each spatial filter at which the filter returns to 0. Filter values at distances greater than s are set to 0, resulting in a single smooth round filter at each position along the rectangular grid (s = 4.554°).
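The relationship between the size constant and the filter width can be checked numerically. The sketch below assumes the raised-cosine filter form used in earlier IEM reports from this group, (0.5 + 0.5 cos(πr/s))^7 for r < s and 0 otherwise; the exponent of 7 is an assumption not stated in the surrounding text, as are the function names. Under that assumption, s = 4.554° yields a FWHM close to the stated 1.81°.

```python
from math import cos, pi


def spatial_filter(r, s=4.554):
    """Assumed raised-cosine filter: (0.5 + 0.5*cos(pi*r/s))**7 for r < s, else 0.

    r : distance from the filter center (deg)
    s : size constant, the distance at which the filter returns to 0 (deg)
    """
    if r >= s:
        return 0.0
    return (0.5 + 0.5 * cos(pi * r / s)) ** 7


def fwhm(s=4.554, step=1e-4):
    """Find the full width at half maximum by stepping outward from the center."""
    r = 0.0
    while spatial_filter(r, s) > 0.5:
        r += step
    return 2.0 * r


width = fwhm()  # close to the 1.81 deg FWHM quoted in the text
```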
This rectangular grid of filters forms the set of information channels and each mapping task stimulus is converted from a contrast mask (1's for each pixel subtended by the stimulus, 0's elsewhere) to a set of filter activation levels by taking the dot product of the vectorized stimulus mask and the sensitivity profile of each filter. Once all filter activation levels are estimated, we normalize so that the maximum filter activation is 1.
Following previous reports (Brouwer and Heeger, 2009; Sprague and Serences, 2013), we model the response in each voxel as a weighted sum of filter responses:
$$ B_1 = C_1 W \tag{2} $$
where B1 (n trials × m voxels) is the observed BOLD activation level of each voxel during the spatial mapping task (beta weight estimated from single-trial GLM; low-contrast mapping runs), C1 (n trials × k channels) is the modeled response of each spatial filter, or information channel, on each non-target trial of the mapping task (normalized from 0 to 1 across all channels and trials), and W is a weight matrix (k channels × m voxels) quantifying the contribution of each information channel to each voxel. Because we have more stimulus positions than modeled information channels, we can solve for W using ordinary least-squares linear regression:
$$ \hat{W} = \left(C_1^{T} C_1\right)^{-1} C_1^{T} B_1 \tag{3} $$
This step is univariate and can be computed for each voxel in a region independently. Next, we used all estimated voxel encoding models within an ROI (Ŵ) and a novel pattern of activation from the visual attention task (beta weight for each voxel estimated from single-trial GLM) to compute an estimate of the activation of each channel (C2, n trials × k channels) which gave rise to that observed activation pattern across all voxels within that ROI (B2, n trials × m voxels):
$$ \hat{C}_2 = B_2 \hat{W}^{T} \left(\hat{W}\hat{W}^{T}\right)^{-1} \tag{4} $$
Once channel activation patterns are computed (Eq. 4), we compute spatial reconstructions by weighting each filter's spatial profile by the corresponding channel's reconstructed activation level and summing all weighted filters together. This step aids in visualization, quantification, and coregistration of trials across stimulus positions, but does not confer additional information. To visualize these responses, we multiplied each channel's filter profile by its activation measured during the task, and horizontally flipped trials in which the stimulus appeared on the left to align reconstructions as though all trials consisted of right stimuli.
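The two stages of the inverted encoding model (estimating channel weights from mapping data, then inverting them on task data) can be sketched with NumPy on simulated data. This is a minimal sketch, not the authors' MATLAB code: the matrix sizes, the noise level, and the random "true" weights are all hypothetical, and the matrix names simply follow the dimensions given in the text (B: trials × voxels, C: trials × channels, W: channels × voxels).

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, k, m = 60, 10, 6, 40  # hypothetical trials, channels, voxels

# Simulated mapping ("training") data: B1 = C1 @ W + noise
C1 = rng.random((n_train, k))                  # modeled channel responses
W_true = rng.normal(size=(k, m))               # unknown channel-to-voxel weights
B1 = C1 @ W_true + 0.01 * rng.normal(size=(n_train, m))

# Stage 1: ordinary least squares, W_hat = (C1' C1)^-1 C1' B1
W_hat, *_ = np.linalg.lstsq(C1, B1, rcond=None)          # shape (k, m)

# Stage 2: invert the model on new activation patterns,
# C2_hat = B2 W_hat' (W_hat W_hat')^-1
C2_true = rng.random((n_test, k))
B2 = C2_true @ W_true
C2_hat = B2 @ W_hat.T @ np.linalg.inv(W_hat @ W_hat.T)   # shape (n_test, k)
```

With low noise and more voxels than channels, `C2_hat` recovers `C2_true` closely. Stage 1 is solved column by column (one voxel at a time), whereas Stage 2 pools all voxels in the ROI, which is what makes the reconstruction multivariate.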
Finally, to quantify these reconstructed stimulus representations, we fit a smooth surface to averaged reconstructions at each contrast and attention condition, within each ROI and participant:
$$ f(r) = B_r + A\left(0.5 + 0.5\cos\frac{\pi r}{s}\right)^{7} \quad \text{for } r < s \tag{5} $$
where r is the distance from the center (x, y) of the surface, s is the size constant (proportional to FWHM), A is the amplitude, and Br is the reconstruction offset. We implemented a coarse-to-fine fitting procedure, in which we first sampled a grid spanning center position −3 to 6° (0.33° spacing) horizontally (x), −3 to 3° (0.33° spacing) vertically (y), and FWHM (scaled version of s) from 0.25 to 22.25° (0.5° spacing), and fit amplitude (A) and reconstruction offset (Br) to a surface generated by the parameters at each point in the grid using least-squares regression. Then, we used the best-fit values from this initial coarse grid, defined by lowest RMSE, to seed a constrained optimization procedure that minimized RMSE. The constraints on x, y, and FWHM were identical to the ranges spanned by the grid, and amplitude and baseline were additionally restricted to the range of −5:10 each. This resulted in one best-fit surface, parameterized by its amplitude (A), size (σ or FWHM), and reconstruction offset (Br) for each contrast and attention condition within each ROI for every participant. As described in the following section (Quantifying contrast response functions), the amplitude parameter from this analysis was subjected to a second set of analyses to infer the contrast response function of each ROI for each attention condition.
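The coarse stage of this fitting procedure can be sketched in plain Python. The sketch makes several assumptions that are not in the text: the surface is taken to be a raised cosine with exponent 7, the ratio of s to FWHM is taken from the filter values quoted earlier (4.554/1.81), the pixel layout and grid spacing are coarser than those in the paper to keep the example fast, and all function names are hypothetical. The constrained optimization that follows the grid search is omitted.

```python
from math import cos, hypot, pi

S_PER_FWHM = 4.554 / 1.81  # assumed ratio of size constant s to FWHM


def bump(r, fwhm):
    """Assumed unit-amplitude surface profile at distance r from its center."""
    s = S_PER_FWHM * fwhm
    return (0.5 + 0.5 * cos(pi * r / s)) ** 7 if r < s else 0.0


def fit_amp_offset(g, y):
    """Least-squares amplitude A and offset Br for y ~ Br + A * g."""
    n = len(y)
    g_bar, y_bar = sum(g) / n, sum(y) / n
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    if sxx == 0.0:                       # flat profile: only an offset can be fit
        return 0.0, y_bar
    a = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y)) / sxx
    return a, y_bar - a * g_bar


def coarse_grid_fit(pixels, values, xs, ys, fwhms):
    """Return (rmse, x, y, fwhm, A, Br) for the grid point with lowest RMSE."""
    best = None
    for x0 in xs:
        for y0 in ys:
            for w in fwhms:
                g = [bump(hypot(px - x0, py - y0), w) for px, py in pixels]
                a, b = fit_amp_offset(g, values)
                err = sum((b + a * gi - v) ** 2 for gi, v in zip(g, values))
                rmse = (err / len(values)) ** 0.5
                if best is None or rmse < best[0]:
                    best = (rmse, x0, y0, w, a, b)
    return best


# Synthetic reconstruction with known parameters (all values hypothetical).
pixels = [(x, y) for x in range(-6, 7) for y in range(-3, 4)]
values = [0.1 + 1.2 * bump(hypot(px - 3.5, py), 2.25) for px, py in pixels]
xs = [-3 + 0.5 * i for i in range(19)]       # -3 to 6 deg
ys = [-3 + 0.5 * i for i in range(13)]       # -3 to 3 deg
fwhms = [0.25 + 0.5 * i for i in range(13)]  # 0.25 to 6.25 deg
best = coarse_grid_fit(pixels, values, xs, ys, fwhms)
```

Because amplitude and offset enter the surface linearly, they can be solved in closed form at every grid point, leaving only position and size to be searched.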
Throughout the paper, we report results from analyses performed using data from the low-contrast mapping task to estimate encoding models used for reconstruction. That said, results are consistent when we instead estimated encoding models using data from the high-contrast mapping task (Fig. 1-1). Additionally, these high-contrast mapping data are included in the data repository online (see Data/software availability) should the readers be interested in comparing results across analysis procedures.
EEG recording, preprocessing, and analysis.
We recorded EEG data at a sampling rate of 512 Hz with a 64 + 8 electrode Biosemi ActiveTwo system (Biosemi Instrumentation), and placed two reference electrodes on the left and right mastoids. We also monitored blinks and vertical eye movements with four external electrodes placed above and below the eyes and horizontal eye movements with another pair of external electrodes placed near the outer canthi of the left and right eyes. The data were referenced on-line to the CMS-DRL electrode and the data offsets in all electrodes were maintained <20 μV (a standard criterion for this active electrode system).
We preprocessed and analyzed EEG data using a combination of EEGLAB 11.0.3.1b and custom MATLAB scripts. We first re-referenced the continuous EEG data to the average of the EEG recorded from the two mastoid electrodes. Then, we applied 0.25 Hz high-pass and 55 Hz low-pass Butterworth filters (third-order) and segmented the data into epochs extending from 2500 ms before to 2500 ms after the stimulus onset. Artifact rejection was performed off-line by discarding epochs contaminated by eye blinks and vertical eye movements (>±80–150 μV deviation from 0; exact thresholds were determined on a subject-by-subject basis because of differences in the amplitudes of eye blink and vertical eye movement artifacts), horizontal eye movements (>±75 μV deviation from 0), excessive muscle activity, or drifts using threshold rejection and visual inspection on a trial-by-trial basis, resulting in the removal of 12.61% (SD = 6.63%) of trials across subjects.
To obtain SSVEPs, Fourier coefficients were calculated at 30 Hz (the second harmonic of the contrast-reversal flicker frequency of 15 Hz) and surrounding frequencies (0.5–256 Hz in consecutive 0.5 Hz steps) over the 2 s stimulus interval. Next, the absolute values of the Fourier coefficients averaged across all artifact-free trials were computed separately for each attention condition (attend-stimulus/attend-fixation), each stimulus location (left/right), each stimulus contrast level (0–70%), and each electrode. The signal-to-noise ratios (SNR) of the SSVEP response for each stimulus contrast level and attention condition were calculated by dividing the amplitude of the second harmonic of the stimulus frequency (30 Hz) by the mean amplitude in the two frequency bins below and above the center frequency of 30 Hz (28.5–29 Hz and 31–31.5 Hz, respectively). We adopted this SNR metric following previous SSVEP studies to ensure that the modulations of the SSVEP were not confounded by any changes in broadband power at beta frequencies (Ding et al., 2006; Kim and Verghese, 2012; Verghese et al., 2012; Garcia et al., 2013; Itthipuripat et al., 2014a). We rearranged the SSVEP data so that electrodes ipsilateral and contralateral to the stimulus are positioned on the left and the right of the topographical map, respectively. We then collapsed the data obtained when the stimulus was presented in the left and the right hemifields. Finally, we plotted the SSVEP signals as a function of stimulus contrast to obtain the neural CRFs based on the SSVEP responses. We focused our SSVEP analysis on three posterior-occipital electrodes where the SSVEP SNR, averaged across all contrast levels, attention conditions, and participants, was maximal.
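The SNR computation can be illustrated on a synthetic waveform. The sketch below is an assumption-laden simplification: it operates on a single waveform rather than on coefficients averaged across trials, it evaluates the discrete Fourier coefficient directly at the five frequencies of interest, and the function names and test signal are hypothetical. With a 512 Hz sampling rate and a 2 s window, the frequency resolution is 0.5 Hz, so 28.5, 29, 30, 31, and 31.5 Hz all fall on exact bins.

```python
import cmath
from math import pi, sin

FS = 512                                  # sampling rate (Hz), from the text
N = 2 * FS                                # 2 s window -> 0.5 Hz resolution
NEIGHBOURS = (28.5, 29.0, 31.0, 31.5)     # noise bins (Hz), from the text


def fourier_amplitude(x, freq, fs=FS):
    """Magnitude of the discrete Fourier coefficient of x at `freq` (Hz)."""
    coef = sum(v * cmath.exp(-2j * pi * freq * t / fs) for t, v in enumerate(x))
    return abs(coef) / len(x)


def ssvep_snr(x, target=30.0, neighbours=NEIGHBOURS):
    """Amplitude at the target frequency divided by the mean neighbouring amplitude."""
    noise = sum(fourier_amplitude(x, f) for f in neighbours) / len(neighbours)
    return fourier_amplitude(x, target) / noise


# Synthetic signal: a unit-amplitude 30 Hz response plus weaker activity
# (amplitude 0.1) in each neighbouring bin, so the expected SNR is 10.
signal = [sin(2 * pi * 30.0 * t / FS)
          + 0.1 * sum(sin(2 * pi * f * t / FS) for f in NEIGHBOURS)
          for t in range(N)]
snr = ssvep_snr(signal)
```

Dividing by the neighbouring bins expresses the 30 Hz response relative to local background activity, so a broadband change in power that raises all bins equally leaves the ratio unchanged.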
To obtain event-related potentials (ERPs), we baseline corrected from −200 to 0 ms relative to the stimulus onset and then computed the algebraic mean of the EEG data previously sorted into different contrast and attention conditions. The ERP data were also rearranged so that electrodes ipsilateral and contralateral to the stimulus are positioned on the left and the right of the topographical map, respectively, and we collapsed the data obtained when the stimulus was presented in the left and the right hemifields. We focused our ERP analysis on four ERP components, including the visual P1 component from 120 to 130 ms at the posterior occipital electrodes, the visual N1 component from 150 to 170 ms at the contralateral posterior occipital electrodes, the late positive deflection [(LPD) or P3] from 250 to 350 ms at the midline posterior electrodes, and the contralateral late negative-going wave (CLN) from 800 to 2000 ms at the posterior occipital electrodes. The electrodes-of-interest for each of these ERP components were different sets of three electrodes that showed the maximal response amplitude averaged across all contrast levels, attention conditions, and participants. Similar to the analysis of SSVEP responses, we plotted the amplitude of these ERP components from the electrodes of interest as a function of stimulus contrast to obtain the neural CRFs.
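The baseline correction and component measurement can be sketched as follows. The data layout (a list of single-electrode epochs with stimulus onset at a known sample index), the rounding of time windows to samples, and the function names are assumptions made for the example; the 512 Hz sampling rate, the −200 to 0 ms baseline, and the 120–130 ms window come from the text.

```python
FS = 512  # sampling rate (Hz), from the text


def ms_to_samples(ms, fs=FS):
    """Convert a duration in ms to the nearest whole number of samples."""
    return round(ms * fs / 1000)


def erp(epochs, onset, fs=FS, baseline_ms=200):
    """Baseline-correct each epoch to the 200 ms before onset, then average trials."""
    n_base = ms_to_samples(baseline_ms, fs)
    corrected = []
    for ep in epochs:
        base = sum(ep[onset - n_base:onset]) / n_base
        corrected.append([v - base for v in ep])
    n = len(corrected)
    return [sum(col) / n for col in zip(*corrected)]


def mean_amplitude(wave, onset, start_ms, end_ms, fs=FS):
    """Mean amplitude of `wave` between start_ms and end_ms after onset."""
    i0 = onset + ms_to_samples(start_ms, fs)
    i1 = onset + ms_to_samples(end_ms, fs)
    window = wave[i0:i1 + 1]
    return sum(window) / len(window)


# Two synthetic trials with different DC offsets and poststimulus steps of
# 2 and 4 microvolts; after baseline correction the average step is 3.
onset = 256
trial_a = [5.0] * onset + [7.0] * 512
trial_b = [-3.0] * onset + [1.0] * 512
wave = erp([trial_a, trial_b], onset)
p1 = mean_amplitude(wave, onset, 120, 130)
```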
Last, we examined poststimulus changes in posterior alpha activity. To do so, we wavelet-filtered the artifact-free epoched EEG data using a Gaussian filter centered at 10–12 Hz with a time-domain SD ranging from 83 to 100 ms (see similar methods by Canolty et al., 2006, 2007; Itthipuripat et al., 2013). Next, we computed changes in the alpha amplitude during the 2 s stimulus duration relative to the pre-cue period by subtracting out the mean alpha amplitude averaged across −500 to 0 ms before the cue onset. Finally, we plotted poststimulus alpha amplitude as a function of stimulus contrast to obtain the neural CRFs, and we focused our alpha analysis on three contralateral posterior-occipital electrodes where the reduction in alpha amplitude, averaged across all contrast levels, attention conditions, and subjects, was maximal.
Quantifying contrast response functions.
Throughout all analyses of BOLD and EEG-based CRFs, we only analyzed trials in which a target stimulus (a stimulus with contrast increment) was not present. This was done in part because target trials contain a physical luminance change in the stimulus display that does not occur on non-target trials. Moreover, participants might have realized that immediately after the target appeared, they could cease attending for the remainder of the trial. Finally, neural signals on these trials might be corrupted by motor preparation/execution processes. To minimize the unknown impacts of these potential confounds on measured neural CRFs, we excluded these target-present trials (25%) from further analyses.
To examine whether attention induces response gain, contrast gain, or baseline shifting in the CRF as measured using fMRI and EEG (see details in fMRI acquisition, preprocessing and analysis and EEG recording, preprocessing and analysis), we fit a Naka-Rushton function to the data as follows. First, we used a bootstrapping procedure to resample subjects with replacement and computed the averaged response for each contrast level and each attention condition across the resampled subject labels. Then we fit the resampled data (12 data points: 2 attention conditions × 6 contrast levels) with the following Naka-Rushton equation:

$$r(c) = G_r \frac{c^n}{c^n + G_c^n} + b_c \qquad (6)$$

The fitting procedure was performed with 8 free parameters: 2 response gain factors (Gr), 2 contrast gain factors (Gc), 2 baseline parameters (bc), and 2 exponents (n); one for each attention condition (attend-stimulus and attend-fixation). We used the MATLAB function “fmincon” to minimize the root mean squared error between the data and the fit function, under a set of constraints. For fMRI fits, Gr was restricted to be positive, with a maximum of 5 BOLD Z-score units (univariate analyses) or 5 arbitrary units (multivariate analyses); Gc was restricted within the range of 0–100 (% contrast), and CRF baseline activity (bc) was restricted to an absolute value of at most 3 BOLD Z-score units (univariate analyses) or 3 arbitrary units (multivariate analyses). For EEG fits, Gr was restricted to be within a range of −20 to 20 μV; Gc was restricted within the range of 0–100 (% contrast), and bc was restricted to be within a range of −6 to 6 μV. For both EEG and fMRI fits, the exponent n was restricted to between 0.1 and 5. Although there are many variants of the Naka-Rushton equation, we decided to use this version (Eq. 6) to make contact with the large number of past studies that have also used this Naka-Rushton equation to fit contrast response functions measured using a variety of measurement techniques (e.g., psychophysics, single-unit electrophysiology, EEG, and fMRI; Martínez-Trujillo and Treue, 2002; Kim et al., 2007; Herrmann et al., 2010; Pestilli et al., 2011; Carandini and Heeger, 2011; Itthipuripat et al., 2014a,b, 2017, 2018; Reynolds and Heeger, 2009).
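The constrained fit described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: it assumes the common Naka-Rushton form r(c) = Gr·cⁿ/(cⁿ + Gcⁿ) + bc, it replaces "fmincon" with a simple bounded random search, and the contrast levels and helper names (`naka_rushton`, `rmse`, `fit_random_search`) are hypothetical. Because each attention condition has its own four parameters, the 8-parameter fit can be treated as two independent 4-parameter fits, which is what the sketch does for a single condition.

```python
import math
import random

# Bounds quoted in the text for EEG fits: Gr, Gc, bc, n.
EEG_BOUNDS = [(-20.0, 20.0), (0.0, 100.0), (-6.0, 6.0), (0.1, 5.0)]


def naka_rushton(c, gr, gc, b, n):
    """Assumed Naka-Rushton form: baseline plus a saturating contrast response."""
    if c <= 0:
        return b  # no stimulus drive at 0% contrast, only the baseline remains
    cn = c ** n
    return gr * cn / (cn + gc ** n) + b


def rmse(params, contrasts, responses):
    """Root mean squared error between the data and the fit function."""
    gr, gc, b, n = params
    sq = [(naka_rushton(c, gr, gc, b, n) - r) ** 2
          for c, r in zip(contrasts, responses)]
    return math.sqrt(sum(sq) / len(sq))


def fit_random_search(contrasts, responses, bounds, n_iter=20000, seed=0):
    """Stand-in for a constrained optimizer: sample parameters inside the
    bounds and keep the set with the lowest RMSE."""
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(n_iter):
        params = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        err = rmse(params, contrasts, responses)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


if __name__ == "__main__":
    # Hypothetical contrast levels (% contrast) and synthetic responses.
    contrasts = [0.0, 2.2, 5.2, 12.3, 29.3, 70.0]
    responses = [naka_rushton(c, 4.0, 20.0, -1.0, 2.0) for c in contrasts]
    params, err = fit_random_search(contrasts, responses, EEG_BOUNDS)
    print(params, err)
```

A random search is used here only to keep the sketch dependency-free; a gradient-based constrained optimizer would converge far more efficiently on real data.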
Because the Gr and Gc parameters control the response and contrast gain of the function where the contrast axis ranges from zero to ∞, the Gr and Gc parameters could in principle exceed the realistic range of stimulus contrast (0–100% contrast). Thus, instead of directly comparing Gr and Gc parameters across conditions we obtained parameters that described the gain of neural responses relative to baseline (Rmax) by evaluating the best-fit Naka–Rushton equation at c = 100% and subtracting the baseline (bc), and the contrast at which neural responses reach half their maximum (C50) by finding the contrast at which r = bc+Rmax/2. These two derived parameters, respectively, capture response gain and contrast gain of the CRFs over the realistic range of stimulus contrast values. The C50 parameter is sometimes called the semi-saturation constant. However, because not all of our observed CRFs saturate at high contrasts, we instead refer to this parameter as the half-max contrast. For fMRI data, we fit either the univariate mean BOLD response contralateral to the stimulus position, or the amplitude of the best-fit surface to the IEM-based image reconstructions. For EEG data, this fitting procedure was done separately for each of the EEG components (i.e., SSVEP, P1, N1, LPD, CLN, and alpha) obtained from different sets of three electrodes that exhibit the maximal response amplitude averaged across all contrast levels, attention conditions, and participants. We also performed the same analysis on each of 22 electrodes in the occipital and posterior sites, and corrected for multiple comparisons using the false discovery rate (FDR) method (Benjamini and Hochberg, 1995). To obtain bootstrapped distributions of Rmax, C50, bc, and n parameters, this resampling and refitting procedure was performed 10,000 times, with each iteration resampling over participants with replacement. 
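The two derived parameters can be computed from a set of best-fit parameters as sketched below. The sketch assumes the common Naka-Rushton form r(c) = Gr·cⁿ/(cⁿ + Gcⁿ) + bc; the function names are hypothetical, and bisection is one possible way of "finding the contrast" at which the response reaches half of Rmax, not necessarily the authors' method.

```python
def naka_rushton(c, gr, gc, b, n):
    """Assumed Naka-Rushton form: baseline plus a saturating contrast response."""
    if c <= 0:
        return b
    cn = c ** n
    return gr * cn / (cn + gc ** n) + b


def derived_params(gr, gc, b, n, c_max=100.0):
    """Return (Rmax, C50) over the realistic contrast range 0..c_max.

    Rmax is the response at c_max minus the baseline; C50 is the contrast
    at which the response equals baseline + Rmax / 2.
    """
    r_max = naka_rushton(c_max, gr, gc, b, n) - b
    if r_max == 0:
        raise ValueError("flat CRF: half-max contrast is undefined")
    target = b + r_max / 2.0
    rising = r_max > 0  # negative-going signals (e.g., some ERPs) fall with contrast
    lo, hi = 0.0, c_max
    for _ in range(200):  # bisection; the CRF is monotonic in contrast
        mid = (lo + hi) / 2.0
        if (naka_rushton(mid, gr, gc, b, n) < target) == rising:
            lo = mid
        else:
            hi = mid
    return r_max, (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical parameters: Gc sits at the edge of the contrast range,
    # so the half-max contrast is well below Gc itself.
    print(derived_params(gr=2.0, gc=100.0, b=0.5, n=1.0))
```

This illustrates why C50 is reported instead of Gc: when the fitted CRF does not saturate within 0–100% contrast, Gc and the half-max contrast over the realistic range can differ substantially.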
To test the significance of the attention effect on each of these parameters, we compiled the bootstrapped distribution of the differences between the estimated fit parameters in the attend-stimulus and attend-fixation conditions and computed the proportion of values in the tail of this distribution that were less than or greater than zero. Because we used two-tailed statistical tests throughout to be conservative, we doubled this proportion to obtain each p value.
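The resampling test can be sketched as follows. This is a simplified illustration under stated assumptions: the per-iteration statistic is the mean attend-stimulus minus attend-fixation difference across resampled subjects, standing in for the refit CRF parameters used in the actual analysis; values exactly equal to zero are counted in both tails; and the function names and input values are hypothetical.

```python
import random


def bootstrap_differences(attend_stim, attend_fix, n_iter=10000, seed=0):
    """Resample subjects with replacement and return, for each iteration,
    the mean attend-stimulus minus attend-fixation difference."""
    rng = random.Random(seed)
    n_sub = len(attend_stim)
    diffs = []
    for _ in range(n_iter):
        idx = [rng.randrange(n_sub) for _ in range(n_sub)]
        diffs.append(sum(attend_stim[i] - attend_fix[i] for i in idx) / n_sub)
    return diffs


def two_tailed_p(diffs):
    """Proportion of the bootstrap distribution in the smaller tail
    relative to zero, doubled for a two-tailed test and capped at 1."""
    n = len(diffs)
    below = sum(d <= 0 for d in diffs) / n
    above = sum(d >= 0 for d in diffs) / n
    return min(1.0, 2.0 * min(below, above))


if __name__ == "__main__":
    # Hypothetical per-subject parameter estimates for seven participants.
    stim = [1.2, 0.9, 1.5, 1.1, 0.8, 1.3, 1.0]
    fix = [0.7, 1.0, 0.9, 0.6, 0.9, 0.8, 0.5]
    print(two_tailed_p(bootstrap_differences(stim, fix)))
```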
Data/software availability.
All data and analysis code supporting the reported results are available on the Open Science Framework at https://osf.io/savfp/.
Results
Behavior
To determine the extent to which measures derived from fMRI and EEG index different aspects of neural activity, we used both techniques to measure attentional modulations of neural CRFs in the same human subjects performing the same visual spatial attention task under matched stimulus conditions and difficulty levels (Fig. 1b). Across fMRI and EEG recording sessions, seven participants detected a rare incremental change in the contrast of a target (present on 25% of trials) at the fixation point or at the stimulus location (left or right of fixation) following a central color cue. As shown in Figure 1c, behavioral perceptual sensitivity (d′) was equated across stimulus contrast levels (0–70% pedestal contrast, equally spaced on a logarithmic scale), attention conditions (attend-fixation vs attend-stimulus), and measurement modalities (fMRI vs EEG). Thus, there was no main effect of stimulus contrast (F(5,30) = 2.46, p = 0.056), attention (F(1,6) = 0.16, p = 0.700), or measurement modality (F(1,6) = 0.00, p = 0.961), nor an interaction between any combination of these three factors (contrast × attention: F(5,30) = 0.67, p = 0.652; contrast × modality: F(5,30) = 1.40, p = 0.251; attention × modality: F(1,6) = 0.36, p = 0.572, and contrast × attention × modality: F(5,30) = 0.91, p = 0.489). The contrast detection thresholds increased as a function of stimulus contrast, consistent with many past studies (a significant main effect of contrast: F(5,30) = 233.40, p <0.001; Legge and Foley, 1980; Ross et al., 1993; Boynton et al., 1999; Gorea and Sagi, 2001; Huang and Dobkins, 2005; Pestilli et al., 2011; Itthipuripat et al., 2014b, 2017, 2018). Moreover, behavioral performance did not differ significantly across fMRI and EEG sessions (Fig. 1d), showing no main effect of measurement modality (F(1,6) = 0.61, p = 0.464) and no interaction between contrast and measurement modality (F(5,30) = 2.46, p = 0.055).
Overall, the similarity of behavioral results across measurement modalities ensured that any differences in attentional modulations measured via fMRI and EEG could not be because of any difference in factors such as task difficulty or strategy.
Figure 1-1
Univariate fMRI
First, we compared evoked BOLD responses across contrast and attention conditions for each early visual retinotopic ROI (V1, V2/V3, and hV4; as well as an aggregate ROI including all voxels from V1-hV4; Fig. 2). In this and all subsequently-reported neural analyses we only considered trials in which no target stimulus (change in contrast of fixation or checkerboard stimulus) appeared, which resulted in dropping 25% of trials (see Materials and Methods, Quantifying contrast response functions). We used a finite impulse response (FIR) model to deconvolve HRFs from each voxel for each condition, then averaged responses across voxels within each unilateral ROI. Finally, we sorted unilateral ROIs based on their location relative to the visual stimulus and averaged across participants. Qualitatively, attention increased the evoked BOLD response in each region, and this effect was most pronounced at lower contrasts.
To quantify this effect, we first plotted the average response of stimulus–responsive voxels (selected using an independent localizer task) as we manipulated stimulus contrast and spatial attention. We found that spatial attention induced an additive shift in the fMRI-based CRFs (i.e., attention increases BOLD response; Figs. 2a, 3), and that this modulation was greatest at the lowest contrast (i.e., when the stimulus was absent).
To quantify the shape of the CRFs and their modulation with attention, we fit a standard Naka–Rushton equation and derived parameters for the maximum response relative to baseline (the difference between the response at 0 and 100% contrast or Rmax), the point at which the response reaches 50% of its maximum relative to baseline (the half-max contrast or C50), and the CRF baseline activity or y-intercept of the CRF (bc; see Materials and Methods, Quantifying contrast response functions). Across contralateral retinotopic early visual ROIs V1, V2/V3, and hV4, attention reliably increased the CRF baseline (bc, resampling tests described in Materials and Methods; all p values ≤0.001 across all visual areas; Table 1). In addition, in these regions we observed a significant attention-induced reduction in Rmax (resampling tests, all p values <0.001). Note that this reduced Rmax was because of the robust increase in the baseline activity of the BOLD CRFs because Rmax was computed relative to the baseline parameter (bc), which indexes the degree of attentional modulations of the BOLD response when no stimulus was present. Inspection of CRFs revealed that the amount of attentional modulation seemed to decrease with increasing contrast: attentional modulations were strong at low contrasts, and somewhat weaker at higher contrasts (Figs. 2, 3). This may be because of saturation of the BOLD response at high contrasts, and is consistent with prior reports (Pestilli et al., 2011). There were less consistent effects of attention on the other parameters of the CRFs across different visual areas: p values ranged from 0.026 to 0.914 for C50, and p values ranged from 0.025 to 0.112 for the exponent n (the steepness of the fit CRF; Fig. 3c; resampling tests).
Multivariate fMRI
The univariate analyses focused on the mean response of all voxels responsive to the stimulus locations based on a separate localizer experiment. Previous work has shown that voxels show differential effects of attention based on their preferred position relative to the visual stimulus (Tootell et al., 1998; Silver et al., 2007). Indeed, some voxels have negative evoked responses, which are also subject to attentional modulations (Müller and Kleinschmidt, 2004; Fischer and Whitney, 2009; Bressler et al., 2013; Gouws et al., 2014; Puckett et al., 2014; Puckett and DeYoe, 2015). Moreover, changes in single-voxel spatial response properties have been extensively documented (Sprague and Serences, 2013; Klein et al., 2014; de Haas et al., 2014; Kay et al., 2015; Ling et al., 2015; Sheremata and Silver, 2015; Vo et al., 2017; van Es et al., 2018), and the univariate BOLD signal averaged across stimulus–responsive voxels may not be sensitive to the subtle impact these selectivity changes might have on region-level activation patterns. We wondered: could it be the case that the pattern of response modulations across voxels subtending the entire visual field constrains a selective neural representation of the visual stimulus that exhibits a different pattern of response modulation with attention? That is, could the varied effects of attention seen across voxels jointly constrain a stimulus representation that also exhibits response gain in addition to the observed baseline shift seen in the univariate analyses?
To address this possibility, we reconstructed model-based spatial representations of visual stimuli at each contrast and under each attention condition using a multivariate IEM applied to visual areas V1-hV4 (see Materials and Methods; Brouwer and Heeger, 2009; Sprague and Serences, 2013; Sprague et al., 2014, 2016, 2018a,b; Vo et al., 2017). This method first estimates a fixed encoding model based on an independent set of “mapping” data, then based on this best-fit model, computes a mapping from measured voxel space to a modeled information space (visual retinotopic coordinates). The result is a reconstruction of the entire visual field carried by activation patterns on each trial based on the chosen neural encoding model (Fig. 4a). We evaluated changes in these spatial reconstructions by quantifying and comparing the amplitude (A), size (σ), and reconstruction offset (Br) measured by fitting a 2-D surface (Fig. 4b).
First, we evaluated whether each reconstruction parameter varied across contrast and/or attention conditions. Two-way repeated-measures ANOVAs showed that there were significant main effects of contrast (F(5,30) = 24.63, p <0.001) and attention (F(1,6) = 22.00, p = 0.003) but no significant interaction between the two factors on the amplitude parameter of the reconstructions collapsed across V1-hV4 (F(5,30) = 0.75, p = 0.593), similar to the univariate results. Unlike reconstruction amplitude, there was no main effect of contrast (F(5,30) values ≤0.96, p values ≥ 0.458), no main effect of attention (F(1,6) values ≤0.80, p values ≥ 0.406), and no interaction between contrast and attention on the size (σ) and reconstruction offset (Br) of the reconstruction (F(5,30) values ≤1.62, p values ≥ 0.599). Because these additional reconstruction parameters did not vary with manipulations of interest, we did not further quantify changes in these parameters across stimulus or task conditions using the Naka–Rushton equation (see Materials and Methods).
We then used the amplitude parameter (A) of the best fitting surface to generate a CRF based on the reconstructions. Like the univariate fMRI result, we fit each CRF using a Naka–Rushton equation and found that the CRF baseline parameter (bc) increased with attention in nearly all visual areas (Figs. 4b,c; resampling tests, p values <0.001 for V1, V2/V3, and V1-hV4, except that p = 0.310 for hV4; Table 2). However, attention effects on the other parameters including Rmax, C50, and n were less robust and less consistent across different visual areas: p values ranged from 0.044 to 0.946, from 0.311 to 0.704, and from 0.465 to 0.753 for Rmax, C50, and n, respectively (Table 2). Together, the univariate and multivariate fMRI analyses provide evidence that attention primarily operates to increase the baseline offset of CRFs (Fig. 1a, right), and a model-based multivariate assay of information content demonstrates this effect is confined to a change in the SNR of the stimulus representation, indexed by the amplitude parameter (Buracas and Boynton, 2007; Murray, 2008; Pestilli et al., 2011; Gouws et al., 2014; Hara and Gardner, 2014; Sprague et al., 2018b).
Although the multivariate results yield qualitatively similar patterns of attentional modulations on the baseline shift of neural CRFs as the univariate results, we observed differences in saturation between the multivariate and univariate fMRI-based CRFs. Particularly, V2/V3 exhibited a substantial univariate decrease in Rmax, but the multivariate CRF did not show the same decrease in Rmax. We interpret this Rmax difference to likely reflect the stronger impact of saturation on the “strongest” BOLD signals (those within voxels responsive to the stimulus location) when considering only those stimulus-localized voxels. When all other voxels were incorporated in the multivariate IEM analysis, the saturation effect (decrease in Rmax) was slightly mitigated. However, the response at the highest contrast was always slightly higher when the stimulus was attended, and this decrease in Rmax reflects only a shrinking of the dynamic range of the CRF (in large part because of the strong effect of attention at low contrasts in univariate BOLD measurements).
Additionally, the multivariate reconstruction analysis demonstrates that attention to an empty region of space marked by a placeholder results in a relatively focal enhancement of the attended retinotopic position (Figs. 4a,b, first rows). Visual inspection of reconstructions of the Attended, 0% contrast condition shows focused reconstruction activation on the attended region of space rather than a diffuse enhancement of the entire hemifield. Quantification of the reconstructions for the 0% contrast condition confirms that the size is not substantially different from that of stimulus-present conditions (Fig. 4b, second row). This result is consistent with previous work showing the maintenance of information in visual spatial working memory, which is thought to rely on visual attention for rehearsal (Awh and Jonides, 2001; Awh et al., 2006; Kiyonaga and Egner, 2013), and results in similar focal representations in stimulus reconstructions (Sprague et al., 2014, 2016).
SSVEP
If fMRI and EEG tap into similar types of top-down and bottom-up attentional modulations of early sensory responses, we should see similar types of attentional modulations of the CRFs based on BOLD activity and attentional modulations of the CRFs based on early sensory evoked potentials measured by EEG. However, in contrast to changes in the CRF baseline activity that were observed in the fMRI data, stimulus-evoked responses measured via EEG revealed a qualitatively different pattern of neural modulations. First, we examined attentional modulations of SSVEPs, the phase-locked EEG responses in visual cortex that oscillate at the second harmonic (i.e., 30 Hz) of the frequency of the contrast-reversing flickering visual stimulus at 15 Hz (Fig. 5a, middle/right; Kim et al., 2007, 2011; Norcia et al., 2015). For the SSVEP-based CRF, we observed an increase in Rmax (response gain) as well as an increase in C50 (a rightwards shift of the CRF) for the attend-stimulus condition compared with the attend-fixation condition (Fig. 5a, left; resampling tests, p = 0.048 and p = 0.021 for Rmax and C50, respectively; Table 3). However, the bc and n parameters of the SSVEP-based CRFs did not differ across attention conditions (resampling tests, p values ≥ 0.479; Table 3). Note that the SSVEP-based CRFs did not saturate at high contrast levels; such non-saturating SSVEP CRFs have been observed in prior studies (Kim et al., 2007, 2011; Itthipuripat et al., 2014a). That said, the attentional modulation we found at the highest contrast level in the SSVEP-based CRFs still stands in contrast to the observed attentional modulations in the BOLD-based CRFs that were more pronounced at 0% and low contrasts (Figs. 3, 4).
Stimulus-evoked ERPs
We also observed qualitatively similar results for early stimulus-evoked ERPs, including the contralateral occipital P1 component (120–130 ms) and the central-posterior LPD (or P3; 250–350 ms). Specifically, the Rmax parameters associated with the P1- and the LPD-based CRFs increased with attention (Figs. 5b,c, Fig. 6; resampling tests, p values <0.001 for both the P1 and LPD; Table 3). Also, the CRF baseline activity (bc) of these two ERP components was significantly more negative in the attend-stimulus condition than in the attend-fixation condition (resampling tests, p = 0.014 and p = 0.005 for the P1 and LPD components, respectively; Table 3). However, note that the direction of these attentional modulations on the CRF baseline parameter was opposite to the polarity of these ERP components and to the direction of contrast modulations, standing in contrast with the fMRI results. We believe this apparent reversal was likely driven by a slow negative-going ERP induced by sustained covert spatial attention (Woodman and Luck, 1999; Vogel and Machizawa, 2004; Vogel et al., 2005; Woodman et al., 2009; Carlisle et al., 2011; Kuo et al., 2012; Tsubomi et al., 2013; Fig. 7). The other best-fit CRF parameters based on the P1 and LPD components, including the C50 and n parameters, did not change (resampling tests, p values ≥ 0.298; Table 3). In addition, while the amplitude of the contralateral visual N1 (150–170 ms) increased as a function of contrast (i.e., became more negative), we observed no attentional modulation of any of the CRF parameters (Fig. 5d; resampling tests, p values ≥ 0.366; Table 3). Overall, the results based on stimulus-evoked responses including SSVEP, P1, and LPD provide converging evidence that attention increases the response gain of neural CRFs (Fig. 1a, left).
This observation that attention results in stronger modulation at higher contrasts supports our speculation above that decreases in Rmax with attention in univariate BOLD signals (Fig. 2) may reflect saturation of the BOLD signal at high contrasts. In the same participants at the same level of behavioral performance, these evoked EEG signals demonstrate that attention results in stronger visual responses at high contrasts and these stronger responses are hidden in the equivalent BOLD measurements.
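The distinction between the additive pattern seen in BOLD and the multiplicative pattern seen in the evoked EEG signals can be made concrete with a small sketch. It assumes the common Naka-Rushton form r(c) = Gr·cⁿ/(cⁿ + Gcⁿ) + bc; all parameter values and contrast levels are hypothetical and are not fit to the reported data. Under a baseline shift only bc changes, so the attention effect is the same at every contrast (including 0%); under response gain only Gr changes, so the effect is absent at 0% contrast and grows with contrast.

```python
def naka_rushton(c, gr, gc, b, n):
    """Assumed Naka-Rushton form: baseline plus a saturating contrast response."""
    if c <= 0:
        return b
    cn = c ** n
    return gr * cn / (cn + gc ** n) + b


def attention_effect(contrasts, unattended, attended):
    """Attended minus unattended response at each contrast.

    `unattended` and `attended` are (Gr, Gc, bc, n) tuples.
    """
    return [naka_rushton(c, *attended) - naka_rushton(c, *unattended)
            for c in contrasts]


# Hypothetical contrast levels (% contrast) and unattended CRF parameters.
CONTRASTS = [0.0, 2.2, 5.2, 12.3, 29.3, 70.0]
UNATTENDED = (1.0, 20.0, 0.0, 2.0)

# Baseline shift: only bc increases with attention (additive).
BASELINE_SHIFT = (1.0, 20.0, 0.5, 2.0)
# Response gain: only Gr increases with attention (multiplicative).
RESPONSE_GAIN = (1.5, 20.0, 0.0, 2.0)

if __name__ == "__main__":
    print(attention_effect(CONTRASTS, UNATTENDED, BASELINE_SHIFT))
    print(attention_effect(CONTRASTS, UNATTENDED, RESPONSE_GAIN))
```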
Late slow-going ERP
Discrepancies between fMRI and EEG measures of early stimulus-evoked responses (SSVEP, P1, and LPD/P3) left us wondering whether we could find any similarity in attentional modulations measured with these two methods. We first examined the contralateral slow negative-going wave that emerged ∼800–2000 ms after stimulus onset (termed here the CLN), which has recently been found to track the focus of spatial attention (Itthipuripat et al., 2018; Hakim et al., 2019). The CLN component has characteristics (e.g., temporal window, polarity, and electrode location) that resemble those of the contralateral delay activity, which is a marker of the active maintenance of attention during visual search and the maintenance of information in working memory (Woodman and Luck, 1999; Vogel and Machizawa, 2004; Vogel et al., 2005; Woodman et al., 2009; Carlisle et al., 2011; Kuo et al., 2012; Tsubomi et al., 2013). Because the CLN is a negative-going ERP, we expected the Rmax of the CLN-based CRF to become more negative if attention enhanced response gain of this ERP component. On the other hand, we expected the bc parameter of the CLN-based CRF to become more negative if attention induced a baseline shift of this ERP component like it did in the BOLD data.
Consistent with the second prediction, we found that there was an increase in the CRF baseline parameter (bc) associated with the EEG-based CRF driven by the CLN in occipital electrodes that were contralateral to the attended target (Fig. 5e; resampling test, p <0.001; Table 3). This modulation had a selective impact on the CRF baseline parameter, as the Rmax, C50, and n parameters were not different across attention conditions (resampling tests, p values ≥ 0.764; Table 3). The present data show that this ERP component also indexes sustained covert spatial attention even in the absence of a stimulus and that the magnitude of attentional modulation was independent of stimulus contrast, consistent with our control over behavioral performance (Fig. 1c). This marker thus exhibits a pattern of additive modulation with attention that is similar in nature to the pattern of BOLD responses.
Alpha reduction
We also measured the reduction of contralateral posterior alpha activity in the EEG data (10–12 Hz) relative to pre-cue baseline (Fig. 5f), which has been used to index the allocation of visuospatial attention (Foxe et al., 1998; Fries et al., 2001, 2008; Sauseng et al., 2005; Kelly et al., 2006, 2009; Klimesch et al., 2007; Rihs et al., 2007; Bosman et al., 2012; Foster et al., 2016, 2017; Liu et al., 2016; Samaha et al., 2016; Voytek et al., 2017; Hakim et al., 2019). We hypothesized that attentional modulations of alpha oscillations might be similar to those of BOLD activation for two reasons. First, recent studies measuring fMRI and intracortical EEG found a strong correlation between BOLD responses and alpha oscillations in local field potentials (LFPs) recorded from human visual cortex (V1–V3; Conner et al., 2011; Hermes et al., 2017). Second, alpha oscillations have been shown to track the allocation of visuospatial attention in a retinotopically selective manner (Foster et al., 2017), similar to large-scale patterns of fMRI activity (Kastner et al., 1998, 1999; Gandhi et al., 1999; Sprague and Serences, 2013; Vo et al., 2017; Sprague et al., 2018b). Because attention should reduce the amplitude of the alpha activity, we expected the bc parameter of the alpha-based CRF to become more negative with attention.
As predicted, we found that CRF baseline activity (bc) based on poststimulus alpha amplitude was significantly reduced with attention (resampling test, p = 0.003; Table 3). However, the other parameters including Rmax, C50, and n did not differ across attention conditions (resampling tests, p values ≥ 0.362; Table 3). Recently, studies have shown that the topographic patterns of alpha reduction after attention cues and during a working memory delay period contain information about attended and remembered spatial locations even in the absence of continuous sensory input (Sauseng et al., 2005; Kelly et al., 2009; Foxe and Snyder, 2011; Rohenkohl and Nobre, 2011; Bosman et al., 2012; Foster et al., 2016, 2017; Fukuda et al., 2016; Samaha et al., 2016; Green et al., 2017). Our results are consistent with these previous findings and add that attention-induced changes in contralateral alpha power are independent of stimulus contrast, much like the BOLD response.
Discussion
The present results demonstrate that BOLD signals in V1-hV4 measured using fMRI and SSVEPs and evoked potentials (P1 and LPD/P3) measured using EEG index qualitatively different attentional modulations in the same human subjects performing the same behavioral task. Visual fMRI responses scaled with both contrast and attention, with greater BOLD responses observed for higher stimulus contrast values when attention was directed to the stimulus (Figs. 2, 3 and 4). However, the effect of attention was largely independent of stimulus contrast, and, if anything, decreased with increasing contrast, consistent with prior results (Pestilli et al., 2011). This independence suggests that bottom-up visual stimulation and top-down spatial attention independently impact BOLD responses in visual cortex, consistent with a 'baseline shift' mechanism (Fig. 1a, right). In contrast, evoked EEG responses, including the SSVEP, P1, and LPD signals, scaled with stimulus contrast, and this contrast scaling was impacted by visual attention, with greater effects of attention at higher stimulus contrasts (Figs. 5a–c). This is consistent with a response gain mechanism (Fig. 1a, left). Finally, a slow-going ERP component (i.e., CLN) and poststimulus amplitude changes in EEG alpha oscillations were sensitive to the locus of visual attention even without external stimulation, and these attentional modulations were not sensitive to stimulus contrast. Because each signal type is differentially sensitive to manipulations of stimulus and task features, caution should be taken when treating these signals as different spatiotemporal assays of a common neural source.
Notably, visually evoked fMRI responses are most commonly linked in the literature with modulations of the early stimulus-evoked EEG responses like the P1 component and SSVEPs, so the present demonstration of divergent modulations calls this practice into question (Hillyard and Anllo-Vento, 1998; Mangun et al., 1998; Martínez et al., 1999, 2001; Di Russo et al., 2002, 2005, 2007; Noesselt et al., 2002; Busse et al., 2005; Novitskiy et al., 2011; Zhang et al., 2012; Chen et al., 2014; Di Russo and Pitzalis, 2014; Green et al., 2017). However, we discovered that there were some similarities between the CRFs based on BOLD signals and the CRFs based on the CLN and poststimulus alpha oscillations. These results suggest that early stimulus-evoked potentials measured using EEG interact with spatial attention, giving rise to response gain modulations in the neural CRFs. However, fMRI signals and other EEG measurements including the CLN and poststimulus alpha activity reflect top-down attentional modulations that are spatially selective but that do not scale proportionally with increases in exogenous stimulus drive.
Among previous fMRI studies, different patterns of CRF attentional modulations have been observed. While a majority of these studies have shown that spatial attention produced an increase in the CRF baseline activity measured via fMRI (Buracas and Boynton, 2007; Murray, 2008; Pestilli et al., 2011; Gouws et al., 2014; Hara and Gardner, 2014; Sprague et al., 2018b), one study showed evidence for contrast gain (Li et al., 2008). We speculate that the contrast gain observed in this study was at least partially influenced by different levels of difficulty across stimulus contrast and attention conditions, which we carefully controlled in the present study (Fig. 1c). We also show that attention-induced baseline shifts in the fMRI-based CRFs did not depend on the specific analysis method applied to the fMRI data: traditional univariate methods that quantify the mean response across all voxels in a visual area and the multivariate IEM both yielded similar additive shifts with attention. In either case, the additive effects of attention on the fMRI-based CRFs are qualitatively different from the response gain that was observed in the SSVEP and ERP measures recorded from the same experimental design and the same participants.
Interestingly, in some analyses, we observed a decrease in Rmax, parameterizing gain in neural responses with increasing contrast, when attention was directed toward a stimulus compared with when it was directed toward fixation. Because these decreases in gain always co-occur with increases in baseline, it is likely the case that BOLD saturation at high stimulus contrasts results in a smaller effect of attention for high contrasts than for low contrasts. Additionally, although Rmax decreases, it is important to note that the measured response at our maximum contrast level (70%) remains larger with attention.
Variants of computational models based on divisive normalization have previously provided at least two alternative explanations for the shift in baseline activity of the fMRI-based CRFs. First, these baseline shifts could be related to the spatial extent of attention and the spatial extent of the stimulus, with additive shifts most prominent when these two factors are similar in size (Reynolds and Heeger, 2009). Alternatively, this attention-related increase in CRF baseline activity could be because of the fact that fMRI signals reflect aggregated neural responses pooled from populations of neurons that exhibit different gain patterns (i.e., contrast or response gain; Boynton, 2011; Hara et al., 2014), as well as local synaptic activity which can result from long-range projections (Logothetis and Wandell, 2004; Goense and Logothetis, 2008; Magri et al., 2012). We argue that these explanations, though possible, are not likely to completely account for the observed differences in fMRI and EEG modulations with attention. First, subjects in the present study performed the exact same task using identical visual stimuli and their behavioral performance was equated across measurement modalities. Moreover, fMRI and EEG are both population-level measures that aggregate information across large populations of responsive neurons, yet they still exhibit substantially different patterns of modulation with changes in cognitive demands. This is especially important because recent fMRI and EEG studies using quantitative modeling to link attentional modulations of neural CRFs and psychophysical performance have reached very different conclusions about neural mechanisms underlying attentional selection (Pestilli et al., 2011; Hara and Gardner, 2014; Itthipuripat et al., 2014b, 2017). 
Specifically, fMRI data suggested that sensory gain (i.e., the increase in CRF slopes) and noise modulation (e.g., the decrease in trial-by-trial variability in neural responses) had limited roles in supporting attention-induced changes in behavioral performance (Pestilli et al., 2011; Hara and Gardner, 2014). In contrast, EEG data suggested that sensory gain modulations and in some cases noise reduction could sufficiently account for attentional benefits in behavior (Mangun and Hillyard, 1990; Störmer et al., 2009; Itthipuripat et al., 2014b, 2017). The results from the present study help to reconcile these divergent conclusions and point to differences in the physiological sensitivity of neural recording techniques rather than other task and stimulus parameters. Finally, it is possible that participants' engagement with the task varied across stimulus contrasts, despite equated task difficulty (Fig. 1). However, both difficulty levels (d′) and perceptual thresholds (incremental contrasts of target stimuli) were comparable across the fMRI and EEG experiments. Therefore, this cannot account for the divergent observations between top-down attentional modulations of fMRI and early sensory evoked potentials measured by EEG.
Instead, our data are consistent with a recent proposal suggesting spatial attention enhances synaptic efficacy between neurons in the early visual system (Briggs et al., 2013; Hembrook-Short et al., 2019). Such an increase in synaptic efficacy would increase local signaling, which can result in increased broadband LFPs, even at low (or 0%) contrasts. When stimulus drive (contrast) is increased, this could result in stronger synchronous, evoked signals, which could be detected at the scalp with EEG. Previous work linking BOLD fMRI and intracortical EEG in the same participants has suggested that BOLD signals are strongly related to a combination of asynchronous broadband activity (difficult to measure at the scalp; but see Kupers et al., 2018) and a decrease in slow synchronous signals in the alpha band (Harvey et al., 2013; Winawer et al., 2013; Hermes et al., 2017). Furthermore, a recent report demonstrated shifts in broadband LFPs recorded at the cortical surface across visual, parietal, and frontal cortex in human participants directing spatial attention in the absence of a stimulus (Martin et al., 2019), consistent with this framework. Additionally, changes in posterior alpha activity and sustained scalp potentials have each been related to top-down input from frontal and parietal regions (Reinhart et al., 2012; Marshall et al., 2015; Liu et al., 2016; Popov et al., 2017). If these top-down inputs work to simultaneously improve synaptic efficacy, thus resulting in increased broadband responses, and to decrease alpha, this could provide a harmonious explanation for all of our observations.
In contrast to the findings that link BOLD to asynchronous broadband and alpha activity in the LFPs (Hermes et al., 2017), BOLD signals were not found to closely relate to changes in visually evoked potentials measured by scalp EEG, such as the P1 and SSVEP signals we report here. Consistent with our findings, a lack of correlation between BOLD and stimulus-evoked visual responses has been shown across studies recording intracortical EEG in humans and electrophysiology in monkeys (Sirotin and Das, 2009; Winawer et al., 2013; Hermes et al., 2017). However, thus far, no study has documented changes in these electrophysiological signals and BOLD responses in the same participants across attentional task manipulations, precluding the direct comparison between the attentional modulations of electrophysiological signals and of BOLD fMRI performed here. In the present study, stimulus-evoked EEG signals, which primarily index synchronous neural population activity reflecting a more linear integration of neural signals (Winawer et al., 2013), are shown to increase their gain with attention.
Collectively, our data challenge the general assumption that fMRI and stimulus-evoked early EEG responses reflect the same neural processes measured at different spatial and temporal resolutions. Instead, we find that a later low-frequency ERP component and oscillatory activity in the alpha band more closely track attentional modulations of fMRI responses, and thus may be robust indicators of top-down modulatory inputs. The discrepancy between the patterns of attentional modulations of BOLD signals and of stimulus-evoked potentials measured by scalp EEG suggests that attention has impacts across all stimulus contrast levels, but these effects are differentially accessible to different measurement techniques. These results show that different measurement modalities (e.g., fMRI and EEG) assay different aspects of neural signals beyond just a tradeoff in temporal and spatial resolution. Thus, different physiological processes assayed by these complementary techniques must be jointly considered when making inferences about the neural underpinnings of cognitive operations (Logothetis, 2008; Boynton, 2011; Hara et al., 2014; Itthipuripat and Serences, 2016) and when using them as diagnostic tools that measure disruptions in cognitive and sensory functions in clinical populations (Calderone et al., 2013).
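The distinction between additive and multiplicative attentional modulation that runs through this discussion can be illustrated with a minimal sketch. It assumes a Naka-Rushton form for the contrast response function (CRF), which is a common choice but is used here only for illustration; the function name `naka_rushton` and all parameter values (`r_max`, `c50`, `n`, `baseline`, and the size of the attention effects) are hypothetical and are not taken from the fits reported in this study.

```python
# Illustrative sketch only: parameter values are arbitrary assumptions,
# not estimates from the fMRI or EEG data described in the text.

def naka_rushton(contrast, r_max=1.0, c50=0.2, n=2.0, baseline=0.0):
    """Naka-Rushton contrast response function (assumed form).

    contrast : stimulus contrast in [0, 1]
    r_max    : maximum stimulus-driven response (response gain)
    c50      : semi-saturation contrast
    n        : exponent controlling the steepness of the CRF
    baseline : response in the absence of a stimulus
    """
    c_n = contrast ** n
    return baseline + r_max * c_n / (c_n + c50 ** n)

contrasts = [0.0, 0.05, 0.1, 0.2, 0.4, 0.8]

# Unattended CRF.
unattended = [naka_rushton(c) for c in contrasts]

# Additive modulation: attention raises the baseline, so the response
# shifts by the same amount at every contrast, including 0%.
additive = [naka_rushton(c, baseline=0.3) for c in contrasts]

# Multiplicative modulation: attention scales the stimulus-driven
# response, so the effect is absent at 0% contrast and grows with contrast.
multiplicative = [naka_rushton(c, r_max=1.3) for c in contrasts]

# Attention effect (attended minus unattended) at each contrast.
additive_effect = [a - u for a, u in zip(additive, unattended)]
multiplicative_effect = [m - u for m, u in zip(multiplicative, unattended)]

if __name__ == "__main__":
    for c, a, m in zip(contrasts, additive_effect, multiplicative_effect):
        print(f"contrast={c:.2f}  additive={a:.3f}  multiplicative={m:.3f}")
```

Under these assumptions, the additive effect is constant across contrasts, which is the pattern described here for fMRI, the later contralateral negativity, and alpha-band activity, whereas the multiplicative effect is zero at 0% contrast and increases with contrast, which is the pattern described for the P1 and SSVEP.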
Footnotes
This work was supported by NIH R01-EY025872 (J.T.S.), the James S. McDonnell Foundation (J.T.S.), the Howard Hughes Medical Institute International program (S.I.), a Royal Thai Scholarship from the Ministry of Science and Technology in Thailand (S.I.), NIH T32-MH020002 (T.C.S.), NIH T32-EY007136 (T.C.S.), and NIH F32-EY023438 (T.C.S.).
The authors declare no competing financial interests.
Correspondence should be addressed to Sirawaj Itthipuripat at itthipuripat.sirawaj{at}gmail.com or Thomas C. Sprague at tsprague{at}ucsb.edu.