Abstract
Attentional selection mechanisms in visual cortex involve changes in oscillatory activity in the EEG alpha band (8–12 Hz), with decreased alpha indicating focal cortical enhancement and increased alpha indicating suppression. This has been observed for spatial selective attention and attention to stimulus features such as color versus motion. We investigated whether attention to objects involves similar alpha-mediated changes in focal cortical excitability. In experiment 1, 20 volunteers (8 males; 12 females) were cued (80% predictive) on a trial-by-trial basis to different objects (faces, scenes, or tools). Support vector machine decoding of alpha power patterns revealed that, late (>500 ms latency) in the cue-to-target foreperiod, EEG alpha power (but not theta, beta, or gamma power) differed with the to-be-attended object category. In experiment 2, to eliminate the possibility that decoding of the physical features of cues led to our results, 25 participants (9 males; 16 females) performed a similar task where cues were nonpredictive of the object category. Alpha decoding was now only significant in the early (<200 ms) foreperiod. In experiment 3, to eliminate the possibility that task set differences between the different object categories led to our experiment 1 results, 12 participants (5 males; 7 females) performed a predictive cuing task where the discrimination task for different objects was identical across object categories. The results replicated experiment 1. Together, these findings support the hypothesis that the neural mechanisms of visual selective attention involve focal cortical changes in alpha power not only for simple spatial and feature attention, but also for high-level object attention in humans.
SIGNIFICANCE STATEMENT Attention is the cognitive function that enables relevant information to be selected from sensory inputs so it can be processed in support of goal-directed behavior. Visual attention is widely studied, yet the neural mechanisms underlying the selection of visual information remain unclear. Oscillatory EEG activity in the alpha range (8–12 Hz) of neural populations receptive to target visual stimuli may be part of the mechanism, because alpha is thought to reflect focal neural excitability. Here, we show that alpha-band activity, as measured by scalp EEG from human participants, varies with the specific category of object selected by attention. This finding supports the hypothesis that alpha-band activity is a fundamental component of the neural mechanisms of attention.
Introduction
Selective attention is a fundamental cognitive ability that facilitates the processing of task-relevant perceptual information and suppresses distracting signals. The influence of attention on perception has been demonstrated in improvements in behavioral performance (Posner, 1980) and changes in psychophysical tuning curves (Carrasco and Barbot, 2019). In humans, these perceptual benefits for attended stimuli co-occur with enhanced sensory-evoked potentials (Van Voorhis and Hillyard, 1977; Eason, 1981; Mangun and Hillyard, 1991; Eimer, 1996; Luck et al., 2000) and increased hemodynamic responses (Corbetta et al., 1990; Heinze et al., 1994; Mangun et al., 1998; Tootell et al., 1998; Martínez et al., 1999; Hopfinger et al., 2000; Giesbrecht et al., 2003). In animals, electrophysiological recordings indicate that sensory neurons responsive to attended stimuli have higher firing rates than those of unattended stimuli (Moran and Desimone, 1985; Luck et al., 1997), improved signal-to-noise in information transmission (Mitchell et al., 2009; Briggs et al., 2013), and increased oscillatory responses (Fries et al., 2001) that support higher interareal functional connectivity (Bosman et al., 2012).
Most models of selective attention posit that top-down attentional control signals arising in higher-level cortical networks bias processing in sensory systems (Nobre et al., 1997; Kastner et al., 1999; Corbetta et al., 2000; Hopfinger et al., 2000; Corbetta and Shulman, 2002; Petersen and Posner, 2012). However, precisely how top-down signals influence sensory processing within sensory cortex remains unclear. One possible mechanism involves the modulation of EEG alpha oscillations (8–12 Hz). When covert attention is directed to one side of the visual field, the alpha signal is more strongly suppressed over the contralateral hemisphere (Worden et al., 2000; Sauseng et al., 2005; Thut et al., 2006; Rajagovindan and Ding, 2011). This lateralized alpha reduction is thought to reflect an increase in cortical excitability in task-relevant sensory neurons to facilitate the processing of upcoming stimuli (Romei et al., 2008; Jensen and Mazaheri, 2010; Klimesch, 2012). A link between top-down activity in the frontal-parietal attentional control system and alpha in sensory cortex has been suggested by studies using transcranial magnetic stimulation to control regions (Capotosto et al., 2009, 2017), simultaneous EEG-fMRI studies (Zumer et al., 2014; Liu et al., 2016) and magnetoencephalography (Popov et al., 2017).
Although the majority of studies of the role of alpha in selective visual attention have focused on spatial attention, alpha mechanisms may be more general (Jensen and Mazaheri, 2010). Selective attention to low-level visual features (motion versus color) has also been shown to modulate alpha activity that EEG source modeling in humans localized to areas MT and V4 (Snyder and Foxe, 2010). Therefore, it appears that attention-related alpha modulation can occur at multiple early sensory-processing levels in the visual system, with the locus of alpha modulation functionally corresponding to the type of visual information being targeted by attention. It is unknown whether the alpha mechanism is also involved in attentional control over higher levels of cortical visual processing, such as attention to objects. In the present study, we tested the hypothesis that alpha modulation is a mechanism for selective attention to objects by recording EEG from participants performing an anticipatory object attention task using the following three categories of objects: faces, scenes, and tools. Using EEG decoding methods, we provide support for this hypothesis by revealing object-specific modulations of alpha during anticipatory attention to different object categories.
Materials and Methods
Overview
The present study consisted of three experiments. Experiment 1 is the main experiment in which we tested whether EEG alpha-band topographies could be differentiated between object-based attention conditions. Analysis of EEG data included topographic power difference map construction and support vector machine (SVM) decoding of alpha-band power to quantitatively assess whether the EEG alpha band contained information about the object category being attended. In experiments 2 and 3, we tested two alternative interpretations of our results from experiment 1. In particular, in experiment 2, we tested whether decoding accuracy in the preparatory period between the cue onset and the target onset found in experiment 1 might have been based on differences in the sensory processes evoked in the visual system by the different cue stimuli, because the physical stimulus properties of the cues for the three different object attention conditions differed from one another (triangle vs square vs circle). In experiment 3, we investigated whether differences in alpha topography across object attention conditions in experiment 1 may have been the result of different task sets across the three object attention conditions, rather than reflecting object-based attention mechanisms in visual cortex.
Participants
All participants were healthy undergraduate volunteers from the University of California, Davis; had normal or corrected-to-normal vision; gave informed consent; and received course credit or monetary compensation for their time. In experiment 1, EEG data were recorded from 22 volunteers (8 males; 14 females). Two volunteers opted to discontinue their participation midway through the experiment; data from the remaining 20 participants (8 males; 12 females) were used for all analyses. In experiment 2, EEG data were recorded from 29 undergraduates; datasets from 4 participants were rejected on the basis of irreconcilable noise in the data or subject noncompliance, yielding a final dataset from 25 participants (9 males; 16 females) that was used for further decoding analysis. In experiment 3, EEG data were recorded from 12 volunteers (5 males; 7 females). Datasets from two participants were rejected on the basis of irreconcilable noise in the EEG data, yielding a final dataset of EEG data from 10 participants (5 males and 5 females) that was used for further decoding analysis.
Experimental design
The study used a within-subjects design. In experiments 1 and 3, we investigated the distributions of EEG alpha power at the scalp, as a function of attended object category, in an anticipatory cued attention task with three categories of objects (faces, scenes, and tools). In experiment 2, we investigated the distributions of EEG alpha power at the scalp during the postcue period when the three object categories were not attended in advance. Details of the cued object-based attention task, the noncued task, and the statistical analyses are presented in the following.
Statistical analysis
Behavioral response data were analyzed with a gamma-distributed generalized linear mixed model (Lo and Andrews, 2015) with a random effect of subject and fixed effects of object category and cue validity to quantitatively assess the effect of cue validity on reaction time (RT).
Differences in EEG alpha power scalp topographies as a function of cue condition were statistically analyzed using an SVM decoding approach combined with a nonparametric cluster-based permutation test based on Monte Carlo simulation. The cluster-based statistical test was used to control for the multiple-comparison problems that arise when t tests are performed at all time points over the epoch (Bae and Luck, 2018). The details of the statistical test for EEG alpha power are described in the following.
Experiment 1
Apparatus and stimuli.
Participants were comfortably seated in an electrically shielded, sound-attenuating room (ETS-Lindgren). Stimuli were presented on a VIEWPixx/EEG LED monitor (model VPX-VPX-2006A, VPixx Technologies) at a viewing distance of 85 cm, vertically centered at eye level. The display measured 23.6 inches diagonally, with a native resolution of 1920 × 1080 pixels and a refresh rate of 120 Hz. The recording room and objects in the room were painted black to avoid reflected light, and it was dimly illuminated using DC lights.
Each trial began with the pseudorandomly selected presentation of one of three possible cue types for 200 ms (1° × 1° triangle, square, or circle, using PsychToolbox; Brainard, 1997; Fig. 1A). Valid cues informed participants which target object category (face, scene, or tool, respectively) was likely to subsequently appear (80% probability). Cues were presented 1° above the central fixation point. Following pseudorandomly selected stimulus-onset asynchronies (SOAs; 1000–2500 ms) from cue onset, target stimuli (5° × 5° square image) were presented at fixation for 100 ms. On a random 20% of trials, the cues were invalid, incorrectly informing participants about the upcoming target object category. For these invalid trials, the target image was drawn with equal probability from either of the two noncued object categories. All stimuli were presented against a gray background. A white fixation dot was continuously present in the center of the display.
Target images (Fig. 1B) were selected from 60 possible images for each object category. All target images were gathered from the Internet. Face images were front-face, neutral expression, white-ethnicity faces, cropped and placed against a white background (Righi et al., 2012). Full-frame scene images were drawn from the University of Texas at Austin natural scene collection (Geisler and Perry, 2011) and campus scene collection (Burge and Geisler, 2011). Tool images, cropped and placed against a white background, were drawn from the Bank of Standardized Stimuli (Brodeur et al., 2014). A pseudorandomly distributed intertrial interval (ITI; 1500–2500 ms) separated target offset from the cue onset of the next trial. Each set of 60 object images comprised 30 images of the following different subcategories: male/female faces, urban/natural scenes, and powered/nonpowered tools.
EEG recording.
Raw EEG data were acquired with a 64-channel Brain Products actiCAP active electrode system (Brain Products), and digitized using a Neuroscan SynAmps2 input board and amplifier (Compumedics). Signals were recorded with Scan 4.5 acquisition software (Compumedics) at a sampling rate of 1000 Hz and a DC to 200 Hz online bandpass. Sixty-four Ag/AgCl active electrodes were placed in fitted elastic caps using the following montage, in accordance with the international 10–10 system (Jurcak et al., 2007): FP1, FP2, AF7, AF3, AFz, AF4, AF8, F7, F5, F3, F1, Fz, F2, F4, F6, F8, FT9, FT7, FC5, FC3, FC1, FCz, FC2, FC4, FC6, FT8, FT10, T7, C5, C3, C1, Cz, C2, C4, C6, T8, TP9, TP7, CP5, CP3, CP1, CPz, CP2, CP4, CP6, TP8, TP10, P7, P5, P3, P1, Pz, P2, P4, P6, P8, PO7, PO3, POz, PO4, PO8, PO9, O1, Oz, O2, and PO10; with channels AFz and FCz assigned as ground and online reference, respectively. Additionally, electrodes at sites TP9 and TP10 were placed directly on the left and right mastoids. The Cz electrode was positioned at the vertex of each participant's head, as determined by measuring anterior to posterior from nasion to inion, and right to left between the preauricular points. High-viscosity electrolyte gel was administered at each electrode site to facilitate conduction between electrode and scalp, and impedance values were kept at <25 kΩ. Continuous data were saved in individual files corresponding to each trial block of the stimulus paradigm.
EEG preprocessing.
All data preprocessing procedures were completed with the EEGLAB MATLAB toolbox (Delorme and Makeig, 2004). For each participant, all EEG data files were merged into a single dataset before data processing. Each dataset was visually inspected for the presence of bad channels, but no such channels were observed. The data were Hamming window sinc FIR (finite impulse response) filtered (1–83 Hz), and then downsampled to 250 Hz. Data were algebraically rereferenced to the average of all electrodes, and then further low-pass filtered to 40 Hz. Data were epoched from 500 ms before cue onset to 1000 ms after cue onset, so that anticipatory data from all trials could be examined together. Data were visually inspected to flag and reject trials with blinks and eye movements that occurred during cue presentation. Independent component analysis decomposition was then used to remove artifacts associated with blinks and eye movements.
EEG analysis.
We used a power spectral density procedure, with the Matlab periodogram(x) function (window length, 500 ms; step length, 40 ms), to extract alpha-band power for each electrode, and for each participant and cue condition. Alpha-band power was calculated as an average of power from 8 to 12 Hz. Within each participant and cue condition, power spectral density results were computed on individual trials and then averaged across trials. Averaged power spectral density results were used to visually examine alpha-band power topographies across cue conditions.
We implemented a decoding analysis to quantitatively assess whether object attention was systematically associated with changes in phase-independent alpha-band power topography across conditions. This analysis routine was adapted from a routine to decode working memory representations from scalp EEG (Bae and Luck, 2018).
Decoding was performed independently at each time point within the epochs. We implemented our decoding model with the Matlab fitcecoc(x) function, which combines binary SVM classifiers under an error-correcting output codes (ECOC) scheme. A separate binary classifier was trained for each pair of cue conditions, following a one-versus-one approach, with classifier outputs combined under the ECOC approach. Thus, decoding was considered correct when the classifier correctly determined the cue condition from among the three possible cue conditions, and chance performance was 33.33% (one-third).
The decoding for each time point followed a sixfold cross-validation procedure. Data from five-sixths of the trials, randomly selected, were used to train the classifier with the correct labeling. The remaining one-sixth of the trials was used to test the classifier, using the Matlab predict(x) function. This entire training and testing procedure was iterated 10 times, with new training and testing data assigned randomly in each iteration. For each cue condition, each participant, and each time point, decoding accuracy was calculated by summing the number of correct labelings across trials and iterations, and dividing by the total number of labelings.
We averaged together the decoding results for all 10 iterations to examine decoding accuracy across participants at every time point in the epoch. At any given time point, above-chance decoding accuracy suggests that alpha topography contains information about the attended object category. However, a comparison of decoding accuracy to chance, by itself, is not sufficient for assessing whether an inference made on the basis of decoding accuracy is reliable. Although a one-sample t test of decoding accuracies across subjects against chance would provide a t value and a statistical significance result for the time point in question, conducting the same test at each of the 375 time points included in our epoch would require a correction for multiple comparisons that would result in overly conservative statistical tests. Therefore, following the study by Bae and Luck (2018), we used a Monte Carlo simulation-based significance assessment to reveal statistically significant clusters of decoding accuracies.
For the Monte Carlo statistical method, the condition label at each time point was replaced with a randomly chosen integer (1, 2, or 3) representing an experimental condition, and decoding accuracy was assessed against these shuffled labels. A t test of classification accuracy across participants against chance was performed at each time point for the shuffled data. Clusters of consecutive time points with decoding accuracies determined to be statistically significant by t test were identified, and a cluster t mass was calculated for each cluster by summing the t values given by each constituent t test; each cluster t mass was saved. This procedure was iterated 1000 times to generate a null distribution of cluster t masses, representing the cluster t masses that could be expected by random chance. The 95% cutoff t mass value was determined from this permutation-based null distribution and used as the threshold against which cluster t masses calculated from our original decoding data were compared. Clusters of consecutive time points in the original decoding results with t masses exceeding the permutation-based threshold were deemed statistically significant.
We performed the same decoding routine on phase-independent EEG oscillatory activity in the theta range (4–7 Hz), beta range (16–31 Hz), and gamma range (32–40 Hz) to test the hypothesis that object attention-based modulations of EEG activity are specific to the alpha range. For filtering EEG data to the beta and gamma bands, we set the minimum filter order to three times the number of samples in the experimental epoch. For filtering to the theta band, we set the minimum filter order to two times the number of samples, because the epoch was not long enough to accommodate a filter order of three times the number of samples.
Procedure.
Participants were instructed to maintain fixation on the center of the screen during each trial and to anticipate the cued object category until the target image appeared. They were further instructed to indicate the target image object subcategory (e.g., male/female) with a button press as quickly and accurately as possible on target presentation, using the index finger button for male (face), natural (scene), and powered (tool), and the middle finger button for female (face), urban (scene), and nonpowered (tool). Responses were only recorded during the ITI between target onset and the next trial. Trials were classified as correct when the recorded response matched the target image subcategory, and incorrect when the response did not match or when there was no recorded response. Each experiment block included 42 trials, lasting ∼3 min. Each participant completed 10 blocks of the experiment.
Experiment 2
The recording and analysis protocols were identical to those of experiment 1. Given that the purpose of this experiment was to test whether decoding accuracy in the preparatory period between the cue onset and the target onset might have been based on differences in the sensory processes evoked in the visual system by the different cue stimuli, we modified experiment 1 by making the cues nonpredictive of the upcoming target category. In keeping with this modification, we instructed participants that the cue shape was not informative and that the cue presentation was simply to alert them that the target stimulus would soon appear. Participants were not explicitly instructed to ignore the cue shape. While the time course of differences in sensory responses in scalp EEG filtered to alpha-band frequencies is difficult to gauge, on the basis of the previous literature (Bae and Luck, 2018), we predicted that even for alpha, any differentiable stimulus-evoked sensory activity would be restricted to a window of time within 200 ms after the cue onset. Each participant completed 10 blocks of the experiment, with each block comprising 42 trials.
Experiment 3
The recording and analysis protocols were identical to those of experiment 1. The purpose of this experiment was to investigate whether differences in alpha topography across object attention conditions in experiment 1 may have been the result of different task sets across the three object attention conditions, rather than reflecting object-based attention mechanisms in visual cortex. Specifically, in the attend-face condition of experiment 1, participants were instructed to discriminate whether the presented face was male or female, and to indicate their choice using a button box with two buttons under the index finger and middle finger. In the attend-scene condition, the task was to discriminate urban from natural scenes using the same two buttons, and in the attend-tool condition, the task was to discriminate powered from nonpowered tools using the same two buttons. Because the categories being discriminated were different across the different cue conditions (male/female, urban/natural, powered/nonpowered), it is possible that participants were preparing different task sets across the different cue conditions during the preparatory period. After being presented with a triangle cue, for example, a participant would need to cognitively map their index finger response to the identification of a male face and their middle finger response to the identification of a female face, whereas this mapping would be different if the participant were presented with a square cue. These different task sets and mappings from visual cortex to motor response preparation could possibly have been driving the different alpha scalp topographies over the preparatory period.
This explanation is not mutually exclusive of our interpretation that alpha scalp topographies reflect differential preparatory attentional biasing in object category-selective visual areas, but, given the design of experiment 1, there is no way to know whether one, the other, or both are reflected in the differing alpha patterns. Therefore, we conducted an experiment that equated the task across all object attention conditions to eliminate any task set differences that were present in the original experiment. Based on our model that alpha is a mechanism for selective attention to objects in visual cortex, in this new design we should still observe different patterns of alpha signals for preparatory attention to object categories, which should be revealed in successful decoding late in the cue-to-target period.
Apparatus and stimuli.
The general structure of the paradigm for experiment 3 followed the paradigm of experiment 1. On each trial, a cue shape appeared, indicating the object category to attend. Cue shapes were identical to those in experiment 1. As before, a preparatory period followed the cue, and then a stimulus image appeared. An ITI separated the stimulus image and the onset of the next trial. Behavioral responses were collected during this ITI. SOA and ITI ranges were kept the same as in experiment 1.
The behavioral task for this experiment was to determine, on each trial, whether the briefly presented target image belonging to the cued object category (faces, scenes, or tools) was in focus or blurry. Unlike experiments 1 and 2, the stimuli to be discriminated were composites of an image belonging to the target category superimposed with an image belonging to a noncued, distractor category. Crucially, both the target image and the distractor image in the blend could be in focus or blurry independent of each other; therefore, the task could not be performed solely on the basis of attending to and responding to the presence or absence of blur (see experiment 3 Results for example).
Twenty percent of trials were invalidly cued, allowing us to assess the effect of cue validity on behavioral performance. For the invalid trials, the stimulus image was a composite of an image from a randomly chosen noncued object category, superimposed with a black-and-white checkerboard. The checkerboard could also be blurry or in focus independent of the object image. Participants were instructed that whenever they encountered a trial where the blended stimulus did not include an image belonging to the cued object category, but instead contained only one object image and a checkerboard overlay, then they had to indicate whether the noncued object image in the stimulus was blurry or in focus. We predicted that participants would be slower to respond on invalidly cued trials, analogous to the behavioral effect of validity observed in cued spatial attention paradigms.
The stimulus images spanned a square 5° × 5° of visual angle. To create blurred images, Gaussian blur with an SD of 2 was applied to the images.
All three object categories included 40 different individual images. On each trial, random images were drawn to produce the composite stimulus image. Scene and tool images were drawn from the same image sets as those for the original experiment. However, face images were drawn from a different image set (Ma et al., 2015) because the face images used in the original experiment were not of high enough resolution to yield reliably noticeable differences in blurred versus nonblurred conditions. All face images were cropped to ovals centered on the face and placed against a white background.
Unlike scene images, which contained visual details spanning the entire 5° × 5° square, face and tool images were set against white backgrounds and so did not contain visual information up to all the image boundaries. Therefore, to eliminate the possibility that participants could use cue information to focus spatial attention instead of object-based attention to perform the blurry/in focus discrimination on any trial where a face or tool image was included in the composite stimulus, the position of that face or tool image was randomly jittered from the center.
Procedure.
Participants were instructed to respond as quickly as they could to the target stimulus, making it vital that the participants engaged preparatory attention toward the cued object category during the preparatory period. All participants were trained with at least 126 trials of the task and were able to achieve at least 60% response accuracy before performing it under EEG data collection; to achieve this, stimulus duration was adjusted on an individual participant basis during the initial training phase. Experiment 3 was conducted in the same laboratory environment as the original experiment, and environmental setup variables were equated to those of the original experiment.
Each participant completed 15 blocks of the experiment, with each block comprising 42 trials, which represented, on average, 210 more trials per subject than experiment 1.
Results
Experiment 1
Behavioral results
Observed response accuracies were high and uniform across all object conditions and validity conditions (invalid face, 96.6%; invalid scene, 97.1%; valid face, 96.8%; valid scene, 96.7%; valid tool, 93.1%) with the exception of the invalid attend-tool condition (87.5%), which we address below.
To determine whether our task elicited a behavioral attention effect, we compared RT for target discriminations between validly and invalidly cued trials. We observed faster mean RTs for valid trials than for invalid trials, averaging across conditions (Fig. 2A) and for each condition separately (Fig. 2B).
To quantitatively assess the effect of cue validity on RT, we fit a gamma-distributed generalized linear mixed model to the RT data (Lo and Andrews, 2015). We found a significant effect of validity (valid vs invalid, p < 0.001). The model also revealed a significant main effect of object category (p < 0.001) due to the slower overall reaction times in the tool category. Thus, subjects were less accurate and slower in their responses to the tool category. Despite these slight performance decrements for the tool category, there was nonetheless a significant behavioral attention effect for the tool category, providing evidence that the subjects used all three cue types to prepare to discriminate and respond to the upcoming objects.
Alpha topography results
To qualitatively assess whether the pattern of alpha power across electrodes was different for anticipatory attention to the three cued categories of objects, we inspected topographic plots of alpha power for each object condition at different time periods following the cues, but before the onset of the target stimuli. In order to highlight differences between the alpha topographies between conditions and to control for nonspecific effects of behavioral arousal, we created pairwise alpha topography difference maps of one object-attention condition subtracted from another object-attention condition.
We observed that differences in alpha topography between object conditions emerged and evolved over the anticipatory (cue-to-target) period (Fig. 3). In the attend-face minus attend-scene topographies (Fig. 3A), we observed increased alpha power over the left posterior scalp and decreased alpha power over the right posterior scalp during the course of the anticipatory period, with the lateralization becoming most prominent at longer postcue latencies. In the attend-face minus attend-tool topographies (Fig. 3B), the pattern was similar at the longest latencies, but was more variable in intermediate periods of time. In the attend-tool minus attend-scene topographies (Fig. 3C), the pattern of alpha differences was distinctive from those involving attend-face conditions; at the longest postcue latencies, the pattern of alpha power over the scalp was reversed from that in the other difference maps, with alpha power being higher over the left than the right posterior scalp. Overall, the presence of these differences among conditions is consistent with variations in the underlying patterns of cortical alpha power during anticipatory attention to faces, scenes, and tools. However, given the variability across subjects, and the inherent difficulty in quantifying difference maps between subjects across attention conditions, we turned to the method of EEG decoding to quantify the differences in alpha power across the conditions that are qualitatively described in the foregoing.
SVM decoding results
SVM decoding results (Fig. 4) revealed statistically significant decoding accuracies in two clusters of time points around the range of 500–800 ms postcue and pretarget (Fig. 4, turquoise dots). Decoding accuracies in the range of −100 to +200 ms around the onset of the cue did not reach the threshold for statistical significance.
SVM decoding results for theta, beta, and gamma-band oscillatory EEG activity revealed no statistically significant decoding in the anticipatory period (Fig. 5).
Experiment 2
We observed a statistically significant cluster of above-chance decoding accuracy time points in the cue presentation window only. No further clusters of significantly above-chance decoding occurred anywhere from 200 to 1000 ms (Fig. 6).
The results of this control experiment argue against the possibility that the late-period alpha-band decoding we observed in our original experiment was simply a result of differential bottom-up sensory processes across the three cue conditions. Because the paradigm for experiment 2 was identical to that of experiment 1 in every respect other than cue predictiveness, and because we ran the same SVM decoding pipeline on the alpha-band EEG data from experiment 2 as we did in experiment 1, we could directly assess whether the pattern of decoding results we obtained from the original experiment was attributable to bottom-up sensory processes.
We collected data from more participants for experiment 2 than we did for our original experiment so that we could have more power in assessing the magnitude and the temporal extent of the decoding that could be achieved purely on the basis of stimulus-evoked activity. Our results support the idea that the long-latency above-chance decoding in experiment 1 is not attributable to purely sensory activity driven by physical stimulus differences, because we found that in experiment 2 statistically significant above-chance decoding occurred only in a cluster of time points at short postcue latency (<200 ms after cue onset; Fig. 6).
Experiment 3
Behavioral results
In the task of discriminating blurry from focused images (Fig. 7), we observed differences in RT between valid and invalid trials, for all object categories, such that validly cued trials elicited faster responses than invalidly cued trials (Fig. 8). In fitting a gamma-distributed generalized linear mixed model to the RT data, we found a significant effect of validity (p < 0.001).
SVM decoding results
Using the same EEG analysis and SVM decoding pipeline as for experiment 1, we found statistically significant clusters of time points exhibiting above-chance decoding accuracy (Fig. 9). Just as in experiment 1, these statistically significant clusters were observed in the second half of the preparatory period, >500 ms after the cue onset. Notably, there also appears to be a group of above-chance time points in the cue presentation window of 0–200 ms, in the same period where we observed statistically significant decoding in experiment 2 that was attributable to the cue-evoked sensory activity. However, in the results of experiment 3, as in experiment 1, decoding in this cue presentation time period (<200 ms latency) did not reach the level of statistical significance (whereas with the larger number of participants in experiment 2, it did).
The behavioral results of experiment 3 suggest that participants were engaging object-based attention during the preparatory period. Participants were faster to discriminate object images as blurry or in focus when their category was cued. Analogous to the cued spatial attention paradigms, on invalidly cued trials participants were attending to one object category during the preparatory period but then, on stimulus presentation, reoriented their attention to be able to discriminate whether an image from an uncued object category was blurry or in focus.
With the behavioral effect between valid and invalid trials in line with that from our original experiment, we are confident that the experimental design in experiment 3 was engendering the same form of top-down object-based attention as was captured by experiment 1. Therefore, in observing statistically significant above-chance decoding in the same general window of time after cue onset for experiments 1 and 3, we interpret this finding as evidence that object-based attention, and not task set or motor response preparation differences, is driving the longer-latency decoding result before the onset of the targets.
Discussion
Object-based attention is a fundamental component of natural vision. People navigate the world principally on the basis of interactions with objects, which abound in typical environments (O'Craven et al., 1999; Scholl, 2001). The primacy of objects means that adaptive interaction with the world requires high-level object representations that are distinct from low-level visual features in the same region of space. Therefore, an effect of attention directly on object representations is a critical aspect of perception (Woodman et al., 2009). Attention has been shown to operate on object representations (Tipper and Behrmann, 1996; Behrmann et al., 1998), so identifying the neural mechanisms by which attention influences object representations is a key goal in cognitive neuroscience.
Physiologic studies show that the performance benefits of attention correlate with neural activity changes in perceptual systems. Cortical structures coding attended information show increased signal amplitude, synchrony, and/or functional connectivity (Moore and Zirnsak, 2017). How the nervous system mechanistically controls this cortical excitability and processing efficiency remains incompletely understood, but most models suggest that top-down control signals from higher-order networks in frontal and parietal cortex alter processing in sensory/perceptual cortical regions coding attended and unattended information (Petersen and Posner, 2012). One hypothesized neural signature of top-down control at the level of sensory/perceptual cortex is focal alpha power (Jensen and Mazaheri, 2010). Changes in alpha power occur during spatial attention (Worden et al., 2000), and feature attention (Snyder and Foxe, 2010). Here we investigated alpha-based mechanisms mediating selective attention to objects by cuing attention to different objects and measuring changes in scalp-recorded EEG alpha power. We combined behavior with EEG topographic mapping and decoding to test the hypothesis that object attention involves selective alpha power modulations in object-specific cortex.
We chose faces, scenes, and tools as attentional targets because these objects have been shown to activate circumscribed areas in the visual cortex. The fusiform face area (FFA) is selectively responsive to images of upright faces (Allison et al., 1994; Kanwisher and Yovel, 2006); faces can be considered objects because, for example, evidence from patients with prosopagnosia suggests that similar mechanisms underlie face recognition and object recognition (Gauthier et al., 1999). The parahippocampal place area (PPA) is responsive to scenes (Epstein et al., 1999), and specifically to scene category (Henriksson et al., 2019). Areas responsive to tools have been identified in the ventral and dorsal visual pathways (Kersey et al., 2016). In line with the prediction that object-based attention modulates alpha in visual areas specialized for processing the attended object category, attention to faces should selectively decrease alpha-band activity in face-selective visual areas like FFA, attention to scenes should decrease alpha-band activity in place-selective areas like PPA, and attention to tools should decrease alpha-band activity over tool-selective regions of the ventral and dorsal visual pathways. EEG is not a strong method for localizing the neural sources of brain activity, but, given that the FFA, PPA, and postulated tool-specialized areas are located in different cortical regions, the patterns of alpha modulations with attention in these areas would be expected to produce differential EEG alpha patterns on the scalp. Given that such patterns might be expected to be only subtly different, and in ways difficult to predict, one avenue for assessing different patterns of alpha for attention to different objects is to incorporate machine learning to decode scalp EEG alpha patterns. Such differences should only be expected if focal modulation of alpha is also involved in selective object attention.
Our reaction time results showed that participants were faster to identify objects when their category had been cued, consistent with their having engaged object-based attention in advance. Theoretically, when cued to anticipate a particular object category, participants would bias neural activity within the cortical areas specialized for that object type and perhaps also bias activity within cortical areas processing all the lower-level visual features associated with that object (Cohen and Tong, 2015). When the target appears, its visual properties would thus be integrated, facilitating the required perceptual discrimination. When the object appearing is from an unanticipated (uncued) category, activity in object-selective areas and associated visual feature areas for the uncued objects would be relatively suppressed, delaying the integration and semantic parsing of uncued target images, and slowing reaction times.
Topographic alpha difference maps varied with the object category that was attended. Differing alpha topographies were consistent with scalp EEG patterns that would be expected if the alpha modulations were occurring in different underlying cortical generators (cortical patches or areas) for the three object categories. The wealth of evidence about underlying neuroanatomical substrates of face, scene, and tool processing from imaging studies allows some predictions about our data with respect to the hypothesized nature of the focal cortical activity contributing to our topographic and decoding findings. The right hemisphere-emphasized FFA (Kanwisher et al., 1997), and the more bilaterally distributed PPA (Epstein and Kanwisher, 1998), would, in principle, predict a differential scalp alpha distribution, and perhaps lower alpha power broadly over the right occipital scalp, when attending faces. Our attend-face minus attend-scene alpha topography was generally consistent with this prediction (Fig. 3A), and this pattern was different from that in the attend-face minus attend-tool difference plot (Fig. 3B). We hope to make exceptionally clear, however, that we are not proposing that we can localize the underlying cortical generators of scalp-recorded activity using the methods we used here; hence, we turned to decoding.
Our decoding analyses provide strong support for the claim that attention modulates alpha topographies in an object category-specific manner and is in line with the time courses of the differences in alpha patterns observed in the scalp topographic difference plots. In our decoding analyses, statistically significant above-chance decoding accuracy provides straightforward evidence that alpha topography contains information about the selected object category, and, therefore, that top-down object-based attention modulates alpha topography according to the cued (attended) object category. We observed that statistically significant decoding occurred in the 500–800 ms range postcue/pretarget, indicating that patterns of alpha topography at the scalp were reliably modulated by our attention manipulation in this time range (Fig. 4). Importantly, the 500–800 ms range corresponds to the periods in the alpha topographic difference plots where the patterns stabilized.
In order to test whether our decoding results were specific to the alpha band, we performed the same SVM decoding routine on theta-, beta-, and gamma-band power and found no significant above-chance decoding in the anticipatory period for those frequency bands (Fig. 5). This result is consistent with the hypothesis that oscillatory neural activity in the alpha band is mechanistically involved in anticipatory attention, whereas activity in the other frequency bands we measured is not similarly modulated in target-relevant visual areas.
In two follow-up experiments, we directly assessed two alternative interpretations of our decoding results from experiment 1. First, differences in alpha scalp topography postcue might reflect purely sensory processing associated with each cue (e.g., triangle vs circle). This should be applicable only to the above-chance (although not significant by our tests) decoding observable in the early postcue period (∼0–200 ms) in Figure 4, and not to the significant longer-latency decoding. Indeed, we verified this in experiment 2, in which participants performed the same task, and saw the same cues and targets as in experiment 1, but the cue shape did not predict the upcoming object category. We observed statistically significant decoding in the postcue/pretarget period from 0 to 200 ms attributable to physical cue features (Bae and Luck, 2018), but no significant decoding later in the cue-to-target interval.
A second alternative explanation of our decoding results from experiment 1 is that they were driven by task set differences across cued object conditions. The task for faces, for example, was to discriminate gender, while for scenes it was to distinguish between urban scenes and natural scenes, leaving open the possibility that our decoding late in the postcue period reflected task set differences (Hubbard et al., 2019) rather than attentional control over object selection as we propose. We can reject this alternative based on the results of experiment 3, in which the cues predicted the relevant target object, but the discrimination task was the same for all object categories: to discriminate whether the cued object was in focus or blurred. We were thus able to replicate the longer-latency alpha-related preparatory attention effects reported in experiment 1 while controlling task set factors.
Our findings show that EEG alpha modulation is linked to object-based selective attention, extending previous findings that alpha modulation is associated with attention to spatial locations and low-level visual features. Using an SVM decoding approach, we identified differences in the topographic patterns of alpha power during selective attention to different object categories. Further, we linked the time range during which statistically significant decoding was achieved to alpha power topographic maps and observed that alpha modulation was consistent with the time course of preparatory attention observed in prior research. Overall, these findings support the model that alpha-band neural activity functions as an attentional modulator of sensory processing for both low-level visual features and high-order neural representations such as those for objects.
Footnotes
This work was supported by National Institutes of Health Grant MH117991 to G.R.M. and M.D. S.N. was supported by T32EY015387. We thank Michael J. Tarr, Center for the Neural Basis of Cognition and Department of Psychology, Carnegie Mellon University (http://www.tarrlab.org/) for face stimulus images, with funding provided by National Science Foundation Award 0339122. We also thank Steven J. Luck and Gi-Yeul Bae for advice on analyses using decoding methods, and Atish Kumar and Tamim Hassan for assistance with data collection.
The authors declare no competing financial interests.
Correspondence should be addressed to George R. Mangun at grmangun@ucdavis.edu