Abstract
The selection of behaviorally relevant information from cluttered visual scenes (often referred to as “attention”) is mediated by a cortical large-scale network consisting of areas in occipital, temporal, parietal, and frontal cortex that is organized into a functional hierarchy of feedforward and feedback pathways. In the human brain, little is known about the temporal dynamics of attentional processing from studies at the mesoscopic level of electrocorticography (ECoG), which combines millisecond temporal resolution with precise anatomical localization of recording sites. We analyzed high-frequency broadband (HFB) responses from 626 electrodes implanted in 8 epilepsy patients who performed a spatial attention task. Electrode locations were reconstructed using a probabilistic atlas of the human visual system. HFB responses showed high spatial selectivity and tuning, constituting ECoG response fields (RFs), within and outside the topographic visual system. In accordance with monkey physiology studies, both RF widths and onset latencies increased systematically across the visual processing hierarchy. We used the spatial specificity of HFB responses to quantitatively study spatial attention effects and their temporal dynamics to probe a hierarchical top-down model suggesting that feedback signals propagate backward through the visual processing hierarchy. Consistent with such a model, the strengths of attentional modulation were found to be greater and modulation latencies to be shorter in posterior parietal cortex, middle temporal cortex and ventral extrastriate cortex compared with early visual cortex. However, inconsistent with such a model, attention effects were weaker and more delayed in anterior parietal and frontal cortex.
SIGNIFICANCE STATEMENT In the human brain, visual attention has been predominantly studied using methods with high spatial, but poor temporal resolution such as fMRI, or high temporal, but poor spatial resolution such as EEG/MEG. Here, we investigate temporal dynamics and attention effects across the human visual system at a mesoscopic level that combines precise spatial and temporal measurements by using electrocorticography in epilepsy patients performing a classical spatial attention task. Electrode locations were reconstructed using a probabilistic atlas of the human visual system, thereby relating them to topography and processing hierarchy. We demonstrate regional differences in temporal dynamics across the attention network. Our findings do not fully support a top-down model that promotes influences on visual cortex by reversing the processing hierarchy.
Introduction
The selection of information from cluttered visual environments (often referred to as “attention”) is a fundamental problem in cognitive neuroscience. This process is mediated by a cortical large-scale network consisting of areas in occipital, temporal, parietal, and frontal cortex (Desimone and Duncan, 1995; Kastner and Ungerleider, 2000; Corbetta and Shulman, 2002; Saalmann and Kastner, 2011; Buschman and Kastner, 2015; Caspari et al., 2015; Moore and Zirnsak, 2017). Anatomical and functional studies indicate that this network is organized into a hierarchy of feedforward and feedback pathways that are dynamically modulated by attention for selective routing of information. Anatomically, this processing hierarchy is constrained by specific laminar projection patterns that index feedforward and feedback connectivity (Felleman and Van Essen, 1991; Markov et al., 2014). Functionally, it is characterized by inter-areal interactions that use distinct frequency channels indexing feedforward and feedback signaling (van Kerkoerle et al., 2014; Bastos et al., 2015; Michalareas et al., 2016). Evidence from studies in patients suffering from attentional deficits because of brain damage, as well as inactivation and microstimulation studies in nonhuman primates, indicates that frontoparietal areas generate attention-related modulatory signals that are fed back to sensory cortex (Barceló et al., 2000; Moore and Armstrong, 2003; Corbetta and Shulman, 2011). Consistent with such a feedback model of attention control, it has been shown in monkey physiology studies that modulatory attention effects are greater and modulation latencies are shorter in higher-order compared with lower-order cortex, suggesting that attention-related feedback signals reverse the visual processing hierarchy (Mehta et al., 2000; Buffalo et al., 2010).
In the human brain, selective attention has been predominantly studied with methods that emphasize network-level analyses and have either relatively high spatial, but poor temporal resolution such as fMRI, or high temporal, but relatively poor spatial resolution such as MEG/EEG. The functional hierarchy of feedforward and feedback pathways based on inter-areal interactions has been recently reported for the human visual system using MEG (Michalareas et al., 2016). However, the precise temporal dynamics during feedforward and feedback selective visual processing are not known. Only a few studies have been performed at the mesoscopic level of intracranial EEG, or electrocorticography (ECoG), which combines millisecond temporal resolution with precise anatomical localization of recording sites (for review, see Parvizi and Kastner, 2018). In particular, high-frequency broadband (HFB) responses >70 Hz show time-locking to specific sensory, motor, and cognitive events (Kreiman et al., 2006; Flinker et al., 2011; Hermes et al., 2012; Mesgarani et al., 2014). Thus far, spatially and feature-specific attentional modulation of HFB responses has been reported in visual cortex (Yoshor et al., 2007; Davidesco et al., 2013; Szczepanski et al., 2014).
Here, we studied HFB responses from hundreds of electrodes covering occipital, temporal, parietal, and frontal cortex in patients performing a classical spatial attention task. Electrode locations were reconstructed using a probabilistic atlas of the human visual system (Wang et al., 2015), thereby relating them to topography and processing hierarchy. We characterized the spatial specificity of HFB responses and used this property to quantitatively study spatial attention effects on baseline and visually-evoked activity across topographic and nontopographic cortex. Further, we investigated response onset and attentional modulation latencies to characterize the temporal dynamics of feedforward and feedback processing across the visual system during spatial attention.
Materials and Methods
Subjects
Eight subjects (S1–S8, 6 males, age: 35 ± 5, mean ± SEM; for further information, see Table 1), who underwent presurgical epilepsy evaluation, provided written informed consent to participate in the study. Experimental procedures were approved by the Institutional Review Boards of the participating institutions. Anti-epileptic medications were discontinued for 2–3 d before testing, and subjects were seizure free for at least 5 h before testing. Subjects had normal or corrected-to-normal vision.
Subjects were implanted with 52–128 electrodes (1 cm spacing in grids and strips), covering extensive parts of frontal, parietal, occipital, and temporal cortex in their left (7 subjects) and right (1 subject) hemispheres (for electrode locations from all subjects, see Fig. 1; for coverage information of each subject, see Table 1). The positioning of electrode grids and strips was entirely based on clinical criteria pertaining to diagnostic procedures.
Visual display, stimuli, and task
Visual displays were generated on a Dell Precision M4600 laptop (Dell) using Presentation software (Neurobehavioral Systems). Light gray stimuli were presented on a darker gray background at 50% contrast (Fig. 2A). The timing of visual and auditory stimulus presentations was verified using a custom photodiode and microphone system. A microphone recorded auditory cues (starting tone and response feedback sounds; see next paragraph for task description). A photodiode placed at the lower right corner of the monitor recorded timing of each visual stimulus using a simultaneous light square presented at the location of the photodiode receptor. The computer screen was placed at a distance of ∼80 cm from the subject's eyes.
Subjects performed a variant of the Eriksen flanker task (Eriksen and Eriksen, 1974; Eriksen, 1995; Saalmann et al., 2012), discriminating between one of two target shapes that were shown embedded in a circular array of distracter shapes (Fig. 2A). Subjects were instructed to maintain fixation throughout the duration of each trial. Following a 2 s intertrial interval, each trial started with the presentation of a central fixation point (0.5°) and a coincidental tone. After 1100 ms, a circular spatial cue (1.5°) was displayed for 100 ms at a pseudorandomly chosen peripheral location (7° eccentricity), followed by a variable delay period (300–700 ms) and the presentation of a circular array of equally spaced barrel and bowtie shapes (each ∼2 × 2°). The array was displayed for 2000 ms or until the subject responded, indicating with a left or right mouse-button press, respectively, whether a barrel or bowtie shape was presented at the cued location. Barrel and bowtie target stimuli were presented randomly with equal likelihood, and flanking shapes were either congruent (same shape in nearest neighboring positions) or incongruent (different shape in nearest neighboring positions). Feedback on performance was given to the subject upon completion of each trial via tones signaling a correct or incorrect response. To minimize stress for the patients, they were instructed to emphasize accuracy rather than speed of responses. Following task instructions, subjects performed a training block to familiarize themselves with the task. During the experiment, trials were presented in blocks of 50, and 3–6 blocks were recorded per subject (Table 1). The number of cued locations and shapes in the target array was 8 (1 subject), 14 (5 subjects), or 16 (2 subjects).
To confirm fixation performance throughout the task, eye movements were visually monitored by the experimenter, and video recordings of the patient's face and eyes were performed throughout the experiment in the epilepsy monitoring care unit. No systematic saccadic eye movements were observed during task performance.
Data acquisition
Electrophysiological and peripheral (photodiode and microphone) channels were recorded using a 128-channel Tucker-Davis Technologies recording system at Stanford, a 128-channel Stellate Harmonic or Blackrock recording system at Johns Hopkins, a 128-channel Nihon Kohden recording system at Children's Hospital, and a 256-channel Nihon Kohden recording system (model JE120A) at UC Irvine. Signals were sampled at 3052 Hz (Tucker-Davis), 1000 Hz (Stellate), 5000 Hz (Nihon Kohden), or 10,000 Hz (Blackrock), amplified and filtered (0.5–300 Hz at Stanford; 0.1–350 Hz [Stellate] or 0.3–2500 Hz [Blackrock] at Johns Hopkins), using a subdural electrode reference and a scalp ground. Data were digitized and resampled off-line at 1000 Hz to equate analysis across sites.
Electrode localization
For subjects S1–S6, postoperative CT images of the implanted electrodes were aligned with preoperative structural MRIs. For localization of electrodes within the visual system, a probabilistic atlas of visuospatial topographic areas, which is based on fMRI retinotopic mapping data from 53 healthy subjects (Wang et al., 2015), was combined with each subject's structural MRI. Specifically, after obtaining coregistration parameters between the MRI and CT images using normalized mutual information algorithms implemented in Bioimage Suite software, electrode locations were mapped onto a rendering of the 3-D brain surface that was generated from the subject's structural MRI volume using FreeSurfer software (Dale et al., 1999; Fischl et al., 1999) and converted to a standard surface template using SUMA (Saad et al., 2004) and AFNI software. The probabilistic atlas of visuospatial topographic areas (Wang et al., 2015) was then superimposed onto each subject's brain surface. Using the maximum probability map, which assigns each node in the standard space to the topographic area with the highest probability, each electrode location that overlapped with the atlas was assigned to its maximally probable area. Sites that did not overlap the maximum probability map but were within one grid spacing (N = 17, 10 mm spacing) to the nearest maximally probable area were included with the area. For subjects S7 and S8, the electrode locations were reconstructed on a standard surface based on postoperative drawings of the electrode positions. The electrode grids in these two subjects did not overlap with the probabilistic atlas. Recording sites outside visuospatial topographic areas were located using the Harvard-Oxford cortical parcellation that is based on anatomical markers (Desikan et al., 2006).
Data analysis
Behavioral data.
For each subject, accuracy (as the proportion of correct trials relative to the number of all trials) and mean reaction times (RTs; averaged across all correct trials) were computed. Trials with RTs >3 SD from the mean were excluded from analyses (median 2% of trials, min = 0.5%, max = 3.5%). We also computed accuracy as a function of flanker condition to determine behavioral flanker effects (i.e., higher accuracy for congruent than incongruent conditions). Because response speed was not emphasized in our task, RTs were not a reliable measure of flanker effects. For the analyses of neural data, only trials with correct responses and appropriate RTs were included; there were insufficient numbers of incorrect trials for reliable analysis.
Neural data: preprocessing and time frequency analysis.
A neurologist manually inspected all ECoG channels to identify those with interictal or ictal epileptiform activity and artifacts. Channels and epochs contaminated by epileptiform activity or abnormal signals (e.g., poor contact, excess drift, high-frequency noise) as well as those located over MRI defined abnormal sites were excluded from analysis (Table 1 shows the number of electrodes recorded and analyzed per subject). We excluded 16% of recorded electrodes based on these criteria (122/758). Off-line, the intracranial field potentials (IFPs) from the remaining 636 electrodes recorded across the eight subjects were referenced to each subject's common average. Power line noise and its harmonics were removed using a two-way zero phase-lag finite impulse response notch filter (±2 Hz).
All analyses were performed using the EEGLAB toolbox (Delorme and Makeig, 2004) and customized scripts written in MATLAB (MathWorks). Time series were aligned separately to the cue and array onset and sorted by cue location. To increase the number of trials available for each analysis, trials from each cue location were combined with the two closest locations on either side (only in cases of 14–16 cue locations). This resulted in spatial smoothing around each location of ∼25° of visual angle, yielding a minimum of 25 correct trials per cue location.
For each electrode, power spectra were calculated by applying a Hilbert transform to bandpass filtered ECoG IFPs. First, the IFPs were filtered using a two-way zero phase-lag finite impulse response filter. We defined the filter order as 3r, where r is the ratio of the sampling rate to the low-frequency cutoff of the filter, rounded down, in each of the analyzed pass bands. For full-spectrum analyses, we used multiple logarithmically-spaced pass bands with partially overlapping bands from 0.5–250 Hz (as in Voytek et al., 2013): the first pass band was seeded such that fp(1) = (0.5, 0.9), and in subsequent bands fL(n) = 0.85 × fH(n-1) and fH(n) = 1.1 × (fH(n-1) − fL(n-1)) + fL(n). We applied the Hilbert transform to each filtered time series x to acquire the analytic amplitude ax(n). The instantaneous power in band fp(n) at each time point in x is the mean over trials of ax(n). In this paper, we focus our analyses on task-related power modulations in HFB responses >70 Hz because of their high spatial specificity and temporal precision (Crone et al., 1998, 2006; Cheung et al., 2016; Parvizi and Kastner, 2018). Although the neural basis of HFB responses is still not entirely clear, these signals have been shown to correlate with multiunit activity obtained from thousands of neurons in the immediate vicinity of the recording electrode (Ray et al., 2008a; Ray and Maunsell, 2011; Rich and Wallis, 2017; Watson et al., 2018). More recent findings indicate Ca2+ dendritic spikes in supragranular cortex as a principal contributor to pial HFB responses (Leszczyński et al., Unpublished observations). Here, HFB responses were defined as the average power between the pass bands centered at 70 and 200 Hz. These band definitions applied to the logarithmically-spaced bands yielded averages between 61.6 and 206.6 Hz.
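The band recursion and the filter-order rule described above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' MATLAB code: the function names `log_spaced_bands` and `filter_order` are hypothetical, and the stopping rule (discard any band whose upper edge would exceed 250 Hz) is an assumption, since the text gives only the overall 0.5–250 Hz range.

```python
def log_spaced_bands(first=(0.5, 0.9), f_max=250.0):
    """Build partially overlapping pass bands (f_low, f_high) in Hz.

    Recursion from the text:
        fL(n) = 0.85 * fH(n-1)
        fH(n) = 1.1 * (fH(n-1) - fL(n-1)) + fL(n)
    Assumption: stop before a band's upper edge exceeds f_max.
    """
    bands = [first]
    while True:
        f_lo_prev, f_hi_prev = bands[-1]
        f_lo = 0.85 * f_hi_prev
        f_hi = 1.1 * (f_hi_prev - f_lo_prev) + f_lo
        if f_hi > f_max:
            break
        bands.append((f_lo, f_hi))
    return bands


def filter_order(sampling_rate, f_low):
    """Filter order 3r, with r = floor(sampling rate / low-frequency cutoff)."""
    return 3 * int(sampling_rate // f_low)


bands = log_spaced_bands()
first_order = filter_order(1000, bands[0][0])  # 1000 Hz data, 0.5 Hz cutoff
```

Each band is 10% wider than the previous one and starts at 85% of the previous upper edge, which is what produces the partial overlap between neighbors.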
Outlier time points (HFB power modulations >6 SD of the mean for time points in the 50–400 ms following cue and array onset), and trials with outlier cue- or array-evoked power compared with other trials of that same condition (each trial mean in the interval 50–300 ms following the cue or array >6 SD of the mean across all trials in that condition) were eliminated. Typically, <6% of trials per electrode were excluded (median 5%, min = 0%, max = 16%).
Identification of task-related activity.
For each electrode, the mean IFP HFB power was calculated for each of the 8–16 peripheral locations and for four task-related epochs: cue-evoked (50–250 ms after cue onset), delay-related (200 ms before array onset), early array-evoked (50–200 ms after array onset), and late array-evoked (300–500 ms after array onset). HFB power fluctuations during these epochs were compared with baseline activity occurring 200 ms before cue onset. Because there is no sharp transition in the signals between cue-evoked and delay activity, we defined the length of the presumed cue-evoked time interval post hoc based on the time course of cue-evoked activity in topographic area V1d/v, which showed a sharp decline of cue-evoked responses after 250 ms and did not appear to show any elevated delay activity in our recordings (see Fig. 4A, red trace). To avoid contamination of cue-evoked (i.e., sensory-driven) with delay-related (i.e., driven by the cognitive state) activity, only trials with delays >450 ms (the median split of trials) were used for all analyses regarding delay-related activity. Similarly, to avoid contamination from motor responses, trials with reaction times <500 ms were excluded from analyses of array-related activity (median 0, min = 0, max = 9).
Task-responsive recording sites were identified based on the following criteria. First, a nonparametric cluster method (see Tests of Statistical Significance) was used to determine whether significant cue-evoked HFB power (compared with baseline) was sustained for at least 100 consecutive milliseconds at any of the peripheral locations. Second, the reliability of the trial-wise power at those locations was measured by generating bootstrapped distributions of the mean power during the cue-related epoch (1000 resamplings over trials of the cue-evoked HFB power relative to the mean baseline power); sites were included only if the 95% confidence interval (CI) of the bootstrapped distribution was greater than zero. Sites with significant delay- or array-related HFB power modulation were identified using the second criterion applied to the respective epochs.
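The second inclusion criterion (a bootstrapped 95% CI of the mean that lies above zero) can be illustrated as follows. This is a minimal sketch under stated assumptions: `bootstrap_ci_of_mean` is a hypothetical helper, the input is taken to be one value per trial (cue-evoked HFB power minus mean baseline power), and the CI is read off as simple percentiles of the resampled means.

```python
import random


def bootstrap_ci_of_mean(values, n_boot=1000, seed=0):
    """Percentile 95% CI of the mean from n_boot resamplings over trials."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n  # resample trials with replacement
        for _ in range(n_boot)
    )
    lower = means[round(0.025 * n_boot)]
    upper = means[round(0.975 * n_boot) - 1]
    return lower, upper


def is_reliably_elevated(values, **kwargs):
    """True if the whole 95% CI of the mean lies above zero."""
    lower, _ = bootstrap_ci_of_mean(values, **kwargs)
    return lower > 0


# Toy data only: one site with elevated power, one with power around zero.
elevated_site = [1.0, 1.2, 0.8, 1.1, 0.9] * 5
flat_site = [-1.0, 1.0] * 20
```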
Spatial tuning functions.
After identifying sites with significant task-evoked responses in the HFB power of the IFPs for at least one peripheral location, we examined their relative responses across all peripheral locations to determine their spatial tuning properties. In cases of spatial tuning, we defined a response field center (RFC) as the location evoking the strongest power relative to baseline in response to the cue. Each site was considered to have a spatially-tuned IFP RF, if its tuning curve met three criteria. First, it had a significant task-evoked response at RFC. Second, we determined whether the IFP responses were spatially selective by comparing the peak of the tuning curve (defined as RFC) to the opposite location (RFnull) using a bootstrapped randomization. We generated a null distribution of randomized differences between RFC and RFnull means by drawing with replacement from a pool of all RFC and RFnull trials, including the number of RFC trials in one mean and the number of RFnull trials in the other. The difference between these randomly generated means was added to a null distribution of randomized differences. The quantile of the real difference (RFC − RFnull) in the null distribution of randomized differences was taken as the p value of the real difference. We rejected the null hypothesis that activity in RFC and RFnull trials was drawn from the same distribution of responses for p values < 0.01. And third, we determined whether each tuning curve was well described by a Gaussian function, where the variance explained by the fit of a Gaussian function was >60% (r² > 0.6). Because task-evoked responses were recorded at locations arranged around a circular array at a constant eccentricity of 7°, the measured widths were converted from degrees of visual angle (dva) to circular distance around the arc: wid = dva × 2π × 7°/360°. A few sites were excluded due to exceptionally wide variance of the Gaussian fit (excluded if σ > 240 dva; N = 6).
Across all sites that met these criteria, the median σ was 52 dva, which corresponds to an arc length of 6° (min = 2°, max = 17°). Spatial tuning was similarly determined for delay and late array activity by comparing HFB responses when attention was directed to RFC (or neighboring locations) compared with RFnull. We refer to the spatial tuning properties during the delay as “memory field”, and those in response to the attended (vs unattended) array as “attention field” (for examples, see Fig. 3).
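The randomization test for spatial selectivity and the arc-length conversion can be sketched as below. The names `randomization_p_value` and `arc_length_deg` are hypothetical, the number of iterations is an assumption (the text does not state it), and the p value is taken here as the fraction of randomized differences at least as large as the observed RFC − RFnull difference, which is one reading of "the quantile of the real difference in the null distribution".

```python
import math
import random


def randomization_p_value(rfc, rfnull, n_iter=1000, seed=0):
    """One-sided p value for mean(rfc) - mean(rfnull) against a pooled null."""
    rng = random.Random(seed)
    pooled = list(rfc) + list(rfnull)
    observed = sum(rfc) / len(rfc) - sum(rfnull) / len(rfnull)
    at_least_as_large = 0
    for _ in range(n_iter):
        # Draw with replacement from the pool, keeping the original trial counts.
        a = rng.choices(pooled, k=len(rfc))
        b = rng.choices(pooled, k=len(rfnull))
        if sum(a) / len(a) - sum(b) / len(b) >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_iter


def arc_length_deg(angle_deg, eccentricity_deg=7.0):
    """Convert an angle around the stimulus circle to arc length (same units
    as the eccentricity): angle x 2*pi*eccentricity / 360."""
    return angle_deg * 2 * math.pi * eccentricity_deg / 360.0


# Toy data only: a clearly selective site versus an unselective one.
selective_rfc = [5 + 0.1 * i for i in range(30)]
selective_null = [0.1 * i for i in range(30)]
p_selective = randomization_p_value(selective_rfc, selective_null)
p_flat = randomization_p_value(selective_null, selective_null)
median_arc = arc_length_deg(52)  # about 6.35, reported as 6 in the text
```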
Spatial tuning functions were generated by centering the mean power at each location on RFC, and the tuning width was measured as half the area under the normalized tuning curve. Since subjects had different numbers of cue locations, we found the cubic spline interpolation of each tuning curve using the least common multiple of the subjects' location counts, which allowed us to compare spatial tuning of HFB responses from all recording sites within a cortical area. Each tuning curve was then normalized to its peak. The population response is shown as the mean of the smoothed, normalized tuning curves within each area. Error bars correspond to the 95% CIs of bootstrapped distributions generated by resampling 500 times with replacement from trials in each condition at each site.
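As a small illustration of the resampling grid and the centering step, the sketch below computes the least common multiple of the location counts used in this study (8, 14, and 16) and rotates a tuning curve so that its peak sits at the center before normalizing to that peak. The helper names are hypothetical, the cubic spline interpolation itself is omitted, and the example curve is made up.

```python
from functools import reduce
from math import gcd


def common_grid_size(location_counts):
    """Least common multiple of the per-subject numbers of cue locations."""
    return reduce(lambda a, b: a * b // gcd(a, b), location_counts)


def center_and_normalize(curve):
    """Circularly shift a tuning curve so its peak (RFC) is at the middle
    index, then divide by the peak. Assumes a positive peak value."""
    n = len(curve)
    peak_index = max(range(n), key=curve.__getitem__)
    shift = n // 2 - peak_index
    rotated = [curve[(i - shift) % n] for i in range(n)]
    peak_value = rotated[n // 2]
    return [v / peak_value for v in rotated]


grid = common_grid_size([8, 14, 16])
example = center_and_normalize([1.0, 2.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0])
```

With a common grid, curves from subjects tested with 8, 14, or 16 locations can be sampled at the same positions and averaged within an area.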
Response onset latencies.
For each electrode, the onset latency of HFB responses was measured as the time-to-half-peak at RFC in response to the cue, following analytical steps as in Lee et al. (2007). We first smoothed the HFB time series of each trial at RFC with an 8 ms σ Gaussian kernel. A distribution of baseline trialwise means (blm) was generated by randomly selecting power values 1000 times from all the baseline times and trials, equivalent to the number of trials (Ntr) and times (Nti) at RFC, then taking the mean over the Ntr to generate a distribution of 1000 randomized baseline time series. The response peak was defined as the maximum at RFC in the 50–250 ms following cue onset that was >99.9% of the blm distribution (p < 0.001). To ensure that we measured elevated, increasing responses, we set the minimum response time (L0) as the first time at least 50 ms after cue onset that the response was more than half the peak value. The response onset latency was then taken as the first time point between L0 and 250 ms after cue onset that the power exceeded half the peak. Only sites with response onset latencies during this time period were considered to have cue-evoked responses. To compare array onset latencies to cue responses, we also performed this analysis using array-evoked activity in the attend-to-RFnull condition, defined below.
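A simplified version of the time-to-half-peak measure is sketched here. It is not the full procedure: the baseline significance test on the peak and the L0 constraint are left out, the response is assumed to be a trial-averaged time series in 1 ms bins starting at cue onset, and the helper names are hypothetical. The 8 ms Gaussian smoothing follows the text, with the kernel truncated at ±3σ as an assumption.

```python
import math


def gaussian_smooth(signal, sigma=8):
    """Smooth a 1 ms-binned series with a Gaussian kernel (sigma in ms),
    renormalizing at the edges where the kernel is truncated."""
    radius = 3 * sigma
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(signal)
    smoothed = []
    for i in range(n):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < n:
                num += w * signal[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def half_peak_latency(response, t_min=50, t_max=250):
    """First time (ms after onset) in [t_min, t_max] at which the response
    reaches half of its peak within that window; None if never reached."""
    half_peak = max(response[t_min:t_max + 1]) / 2
    for t in range(t_min, t_max + 1):
        if response[t] >= half_peak:
            return t
    return None


# Toy response: flat until 100 ms, linear rise to 1.0 at 150 ms, then flat.
toy = [min(1.0, max(0.0, (t - 100) / 50)) for t in range(300)]
raw_latency = half_peak_latency(toy)
smoothed_latency = half_peak_latency(gaussian_smooth(toy))
```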
Attentional modulation: magnitude and topography of effects.
To determine the strengths of attentional modulation during the delay and in response to the array, we compared mean HFB power from trials when attention was directed to RFC (the attend-to-RFC condition) to trials when attention was directed away from RFC toward the opposite field location (the attend-to-RFnull condition). We compared these trial-wise means by calculating an attentional modulation index (MI) of the normalized means in each epoch. For each site, the time series of the responses in the attend-to-RFC and the attend-to-RFnull conditions were normalized to the maximum value in the 500 ms window following cue onset (for delay effects) or array onset. The population time series for each area was the mean of these normalized time series across sites. The modulation index was the mean difference between the normalized attend-to-RFC and the attend-to-RFnull time series in the time window of interest, yielding the proportion of the maximum response. A distribution of bootstrapped MI values was found for each area by repeating the MI calculation 1000 times after resampling with replacement from trials in the attend-to-RFC and attend-to-RFnull conditions.
MI values were determined for each site, and sites were assigned to an enhanced (MI > 0) or suppressed (MI < 0) group within each area, and then averaged across sites to yield population data. Note that the assignment of sites to these groups did not rely on a significance test, and was presumed to include noise around zero.
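The modulation index for a single site can be sketched as follows. One point in the description is open to interpretation, so the sketch makes an explicit assumption: both conditions are normalized by a single shared maximum taken over the first 500 ms of the two time series. The function names are hypothetical and the data are toy values.

```python
def modulation_index(attend_rfc, attend_null, window, norm_window=500):
    """Mean normalized difference (attend-to-RFC minus attend-to-RFnull)
    over window = (start, stop) in ms, as a proportion of the maximum.

    Assumption: one shared maximum over the first norm_window samples of
    both conditions is used for normalization.
    """
    peak = max(max(attend_rfc[:norm_window]), max(attend_null[:norm_window]))
    start, stop = window
    diffs = [(attend_rfc[t] - attend_null[t]) / peak for t in range(start, stop)]
    return sum(diffs) / len(diffs)


def modulation_group(mi):
    """Assign a site to the enhanced or suppressed group by the sign of MI
    (no significance test, as in the text; MI of exactly 0 is unassigned)."""
    if mi > 0:
        return "enhanced"
    if mi < 0:
        return "suppressed"
    return "none"


# Toy time series in 1 ms bins; late array window of 300-500 ms.
mi_enhanced = modulation_index([2.0] * 500, [1.0] * 500, (300, 500))
mi_suppressed = modulation_index([1.0] * 500, [2.0] * 500, (300, 500))
```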
The MI values during the delay and in the late array window were mapped onto brain surfaces and combined across subjects onto a surface in common space to yield their topography. Specifically, electrode coordinates of each subject were first identified in their native brain space then realigned to a normalized brain. For sites with a response field, the topography of attentional modulation effects during the delay and late array windows across subjects were plotted in this common space with color indicating MI spread cortically using a Gaussian kernel of 4 cm. Large dots denote the topographic sites, and small dots the nontopographic ones.
Attentional modulation: latencies.
Attentional modulation latencies were calculated based on the time courses of HFB responses evoked by the array in the attend-to-RFC condition versus the attend-to-RFnull condition. Time series were averaged across recording sites from the same area with an enhanced (or separately for suppressed) modulation index to yield population data; the modulation latencies were determined based on these population data. The modulation latency was defined as the first time point in a series of at least 50 consecutive milliseconds after the array onset latency (defined above as the time to half peak of the response at RFnull) during which the responses in the attend-to-RFC condition were greater (or smaller in the case of suppressive effects) than in the attend-to-RFnull condition using the cluster method described in the following section. Our approach is similar to other studies measuring attentional modulation latencies, using the first of several consecutive significant time points (Gregoriou et al., 2009; Buffalo et al., 2010); however, we required longer clusters of significance (50 ms compared with 30 ms) and smaller time bins (1 ms compared with 10 ms) due to the differences in signal quality in HFB power compared with spiking activity.
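The "first of at least 50 consecutive significant milliseconds" rule can be illustrated with a short run-length search. This sketch covers only that final step: it assumes a per-millisecond list of booleans marking where attend-to-RFC differs significantly from attend-to-RFnull has already been obtained, and it does not implement the cluster statistics themselves. `first_sustained_run` is a hypothetical name.

```python
def first_sustained_run(significant, min_len=50, start=0):
    """Return the first index >= start that begins a run of at least
    min_len consecutive True values, or None if there is no such run."""
    run_start = None
    for t in range(start, len(significant)):
        if significant[t]:
            if run_start is None:
                run_start = t
            if t - run_start + 1 >= min_len:
                return run_start
        else:
            run_start = None
    return None


# Toy mask: a 20 ms blip at 100 ms, then a sustained effect from 130 ms.
mask = [False] * 100 + [True] * 20 + [False] * 10 + [True] * 60 + [False] * 10
latency_50ms = first_sustained_run(mask, min_len=50)
latency_20ms = first_sustained_run(mask, min_len=20)
```

In this toy example the short blip at 100 ms is ignored under the 50 ms criterion but would be picked up under a shorter one, which shows how the run-length requirement guards against brief spurious differences.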
Tests of statistical significance.
To compare effects between areas, we generated bootstrapped distributions of the population means across sites within each area by randomly resampling 500 times with replacement from the trials in each condition. For example, for tuning widths we resampled from trials at each cue location to generate a randomized mean for each site at that cue location, then took the mean across the sites in the area, repeated 500 times to generate a distribution across the population of sites in that area. Using these distributions, we compared the means between every pair of areas using ANOVA, and the significance of each difference was determined by applying the Holm–Bonferroni sequential correction for multiple comparisons on the resulting p values. In this method, a single target α level is applied across the set of tests, yielding a single p value for all tests. Across all comparisons, the p values from the ANOVA were ranked from the smallest to the largest and compared with a ranked α level determined by the following: α_rank = Target Alpha Level/(n − rank + 1), where n was the number of tests, and the Target Alpha Level was set as 0.05. For instance, 15 areas were included in the comparison of tuning widths (see Table 5), so the number of tests n was 105. In order of their rank, if a test had p-value_rank < α_rank, then that test was considered significant at the Target Alpha Level. The first test with p-value_rank ≥ α_rank was not significant, nor were any subsequent tests.
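The sequential correction can be sketched as a short function. It follows the standard Holm–Bonferroni step-down rule, in which the p value at a given rank (1 = smallest) is compared with the target α divided by (n − rank + 1), and testing stops at the first failure. The function names and the example p values are hypothetical.

```python
def holm_bonferroni(p_values, target_alpha=0.05):
    """Return a list of booleans (same order as p_values) marking which
    tests are significant under the Holm-Bonferroni step-down procedure."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # smallest p first
    significant = [False] * n
    for rank, idx in enumerate(order, start=1):
        alpha_rank = target_alpha / (n - rank + 1)
        if p_values[idx] < alpha_rank:
            significant[idx] = True
        else:
            break  # this test and all larger p values are not significant
    return significant


def n_pairwise_tests(n_areas):
    """Number of distinct pairs among n_areas areas."""
    return n_areas * (n_areas - 1) // 2


# Made-up p values, for illustration only.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
tests_for_15_areas = n_pairwise_tests(15)
```

In the example, 0.005 and 0.01 pass their thresholds (0.0125 and about 0.0167), 0.03 fails against 0.025, and 0.04 is then not tested further.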
To determine whether an effect within an area was significantly different from zero, we found the 95% CI of the bootstrapped distribution. Areas with CI that did not overlap zero were significantly modulated (p < 0.05). We used Spearman's rank correlation to determine the relationship between cue-evoked tuning widths and latencies.
For measurements of sustained cue-evoked activity, we used a nonparametric cluster method (Maris and Oostenveld, 2007) to determine the number of sequential time points with significant enhancement relative to baseline. With this method, we set a threshold for significance (p < 0.05) and found clusters of sequential time points after the cue onset latency with significantly elevated power at RFC. We used the quantile of the RFC power at each time point relative to a randomized distribution of baseline mean values as the test statistic at each time point. The cluster level statistic was the sum of the test statistics in the cluster. We compared veridical cluster level statistics to a null distribution of cluster level statistics generated by randomly assigning time points as event-related or baseline. Clusters of time points were significant if their veridical cluster level statistic was >99% of the randomly generated cluster level statistics in the null distribution (p < 0.01).
To determine the attentional modulation latencies after array onset, we repeated the assessment of sustained activity but used the time series after the array onset and compared the attend-to-RFC condition to the attend-to-RFnull condition rather than to baseline. The latency of attentional modulation was the first time point of the first cluster after the array onset latency when attend-to-RFC was greater than attend-to-RFnull (or smaller in the case of suppression effects).
Only areas with at least half of the bootstrapped calculations yielding a modulation latency were included in the groupwise comparison; thus, areas IPS4+, frontal eye fields (FEF), and the nontopographic regions of occipital cortex were excluded from the group of modulation latencies. For area V1d/v enhanced sites, the distribution of bootstrapped modulation latencies was bimodal, so we separated the population of those latencies into two groups, which had an early (V1c1) and a late (V1c2) component. The distributions for V1c1 and V1c2 were used in the groupwise comparisons.
Results
We recorded IFPs from 758 subdural electrodes implanted over parietal, occipital, temporal, and frontal cortex in 8 patients, who underwent presurgical epilepsy evaluation (Table 1; Fig. 1) while performing a spatial attention task. We eliminated 122 electrode channels that were compromised because of noise or epileptiform activity, yielding 636 channels for analysis.
Electrode localization
In each patient, structural MRI and CT images of the implanted electrodes were used to reconstruct their locations in occipital (N = 54), temporal (N = 170), parietal (N = 280), and frontal cortex (N = 132). To relate electrode positions more specifically to topographically organized areas of the visual system, we combined the structural MRI of each individual patient with a probabilistic atlas of visuospatial topographic cortex (Wang et al., 2015). Electrode locations from all patients in relation to this probabilistic atlas are shown in Figure 1, rendered onto the left hemispheric surface of a standard brain and displaying posterior, lateral, and medial views. One hundred and thirty-three electrodes were located in the topographic visual system, including in early visual (V1-V3d/v, N = 36), dorsal extrastriate (V3A/B, TO1-2, N = 24), ventral extrastriate (hV4, LO1-2, VO1-2, PHC1-2, N = 24), and posterior parietal cortex, particularly in areas along the intraparietal sulcus (IPS; N = 42), as well as in the superior parietal lobule (SPL1, N = 3), and in frontal cortex (FEF, N = 4). The remaining 503 electrodes were implanted outside visuospatial topographic areas. Using the Harvard-Oxford parcellation that differentiates cortical areas using anatomical markers (Desikan et al., 2006), these electrodes were broadly localized by lobe into occipital, temporal, parietal, and frontal categories. Because we did not find systematic differences in our analyses within a given category, results were combined by lobe (designated “Nontopographic, occipital” etc.). The electrodes in nontopographic cortex were distributed across parietal (N = 235), temporal (N = 134), and frontal lobes (N = 128), with only six electrodes in the occipital lobe located outside topographic cortex.
Task design and behavioral results
The patients were tested in a variant of the Eriksen flanker task (Eriksen and Eriksen, 1974; Eriksen, 1995), a classical spatial attention task that we also use in parallel monkey electrophysiology studies (Saalmann et al., 2012). Each trial of the task (Fig. 2A) was initiated by an auditory tone and the presentation of a fixation point on a computer monitor. After a fixation period of 1100 ms, a cue was flashed briefly in a pseudo-randomly selected location arranged in a circular manner around the fixation point at a fixed eccentricity of 7°. The cue indicated with 100% validity the location of a subsequently presented target shape. After a variable delay period (300–700 ms), a circular array of barrel and bow tie shapes was presented, and the patients indicated with a left or right mouse button press which shape (i.e., barrel or bow tie) appeared at the cued location. Patients performed between 150 and 300 trials of this task (Table 1) and achieved high accuracies ranging from 83 to 96% (mean = 93 ± 2%). Importantly, the patients showed the classical flanker effect, with higher accuracies for targets that were flanked by congruent shapes than targets that were flanked by incongruent shapes (congruent: mean = 96 ± 2%, incongruent: mean = 90 ± 3%; t test, p < 0.04). This behavioral pattern indicates that the patients were engaged in the task and able to successfully perform it. To characterize the temporal dynamics of visual processing and its influences by attentional task demands, we report here on electrophysiological results from three epochs of the flanker task: cue-evoked (i.e., “bottom-up” visual stimulation), delay period-related (i.e., maintenance of location information in the absence of visual stimulation), and array-evoked (i.e., the selection of behaviorally relevant stimuli among distracters).
Spatial selectivity of cue-evoked HFB RFs
We first examined the spatial selectivity of event-related power fluctuations of the IFPs recorded from each electrode. A representative example of a response profile from an IFP evoked by cue stimuli is shown in Figure 2. The recording site was located in left dorsal V3 (cortical location shown in Fig. 3A, electrode E). Cue-evoked power modulations (50–250 ms after the cue onset) were compared with a baseline period (200 ms before cue onset). Averaged across all trials, a cue-evoked enhancement in power was observed across a broad band of high frequencies (30–200 Hz) with a concomitant suppression of power in a narrow band of lower frequencies (7–20 Hz; Fig. 2B), similar to typical profiles of IFP power fluctuations in response to visual stimuli previously reported in ECoG studies (Lachaux et al., 2005).
By examining power modulations relative to baseline as a function of time, we found that cue and array stimuli evoked a robust increase in the HFB power with a precise temporal profile marking the onset of the visual stimulation (Figs. 2C,D, top, 3E). In this report, we focus our analyses on modulations in HFB power between 70 and 200 Hz to exclude frequency bands that have been shown to have oscillatory properties such as gamma, beta, alpha, or theta activity (Fries, 2009; Engel and Fries, 2010; Lisman and Jensen, 2013). However, control analyses on broadband activity that included gamma and beta frequency bands with the HFB responses yielded similar results. For the example electrode from dorsal V3, we sorted HFB responses in each trial based on cue location and found that the highest power was consistently evoked by the cue presented in positions 5 and 6 in the lower right quadrant (Figs. 2D, center, 3E, orange polar plot). Cues presented at locations farther from the peak locations evoked progressively smaller HFB responses, thereby showing the typical profile of the cross section of a response field, which presents as a spatial tuning curve (Fig. 2D, right). Thus, the visually-evoked increases in HFB power recorded from this site were highly spatially specific, constituting a contralateral ECoG HFB response field. We defined the location that evoked the strongest HFB responses as the RFC (Fig. 3E, position 6) and the opposite field location as RFnull (Fig. 3E, position 13). It is noteworthy that trialwise responses for each cue position were reliable, with consistently stronger responses at RFC (412 ± 32% of baseline, bootstrap randomization test p < 0.001) and consistently weaker or absent responses at the opposite field location (RFnull, 0.4 ± 3% of baseline, p = 0.8).
Cue-evoked HFB responses showed a high degree of spatial specificity across cortex, both within topographic visual cortex and outside of topographic areas. We obtained distinct spatial profiles even from adjacent electrodes, as illustrated in Figure 3 for electrodes that were part of a strip with 10 mm spacing. In addition to the example V3d electrode (Fig. 3A, electrode E), three nearby electrodes with ECoG HFB response fields were implanted in areas IPS0 (Fig. 3A, electrodes B and C, separated by 10 mm), and in V3B (Fig. 3A, electrode D bordering V3A, separated from C and E by 10 and 14 mm respectively). We did not find HFB response fields in two other electrodes of this strip (Fig. 3A, blank circles). The peaks of the HFB response fields shifted from position 5, just below the right horizontal meridian (Fig. 3B) to position 3 in the top right quadrant (Fig. 3C) within IPS0, and from position 3 in the top right quadrant within V3B (Fig. 3D) to position 6 in the bottom right quadrant of V3d (Fig. 3E). This topographic pattern of peak responses reflects the visual field sign reversals of the underlying topographic maps (Konen and Kastner, 2008; Silver and Kastner, 2009; Arcaro et al., 2011; Wang et al., 2015). Thus, HFB responses reflected activity from spatially selective, local neuronal populations, and these signals did not appear to be compromised by volume conduction from more distant sites (Buzsáki et al., 2012), corroborating and extending previous reports on the specificity of HFB responses (Crone et al., 1998; Canolty et al., 2007; Parvizi et al., 2012). The spatial selectivity of HFB responses across the human visual system formed the basis for our quantitative analyses of the temporal dynamics and modulatory effects of selective attention on baseline and visually-evoked activity.
Next, we determined the spatial tuning properties of cue-evoked HFB responses based on the following criteria. First, for each recording site, we required responses to be visually selective, such that cue-evoked HFB power increased significantly relative to baseline in response to at least one cue presentation location and cue-evoked responses differed significantly between the preferred location (RFC) and the opposite location (RFnull). Second, we required that the response profile of the spatial tuning curve centered on RFC had a regular shape (i.e., a Gaussian fit centered on RFC explained at least 60% of the variance, and the tuning widths were <240° of visual angle). Third, to capture cue-evoked spatial tuning only (and not delay-related tuning), we required that the response onset latency at RFC was within 50–250 ms of cue onset (latencies are discussed in the following section).
Using these criteria, 45% of electrodes located in topographic areas exhibited spatially-tuned, cue-evoked responses (60/133) with a well defined response field. The vast majority of these had their RFC in the contralateral hemifield (58/60, 97%). Additionally, in ventral and dorsal parts of visual areas V1–V3, spatial tuning was predominantly limited to the respective upper and lower visual field quadrants. Eighty-two percent, or 9/11 of the dorsal sites had their RFC in the lower contralateral quadrant, and 2/2 of the ventral sites had their RFC in the upper contralateral quadrant. Of the recording sites outside of topographic visual areas, 12% exhibited spatially tuned, cue-evoked HFB responses (60/503), typically with their RFC contralateral to the implanted hemisphere (46/60, 77%). These sites were located in parietal (N = 35 selective, 27 with contralateral RFC), temporal (N = 14 selective, 12 with contralateral RFC), and frontal lobes (N = 11 selective, 7 with contralateral RFC). Except if noted otherwise, only the sites with a cue-evoked RF were included in further analyses.
Cue-evoked response onset latencies
We then examined the temporal dynamics of feedforward processing across the human visual system by analyzing HFB cue response onset latencies at RFC in topographic and nontopographic areas. We defined onset latency as the time to half peak of the power increase at RFC in response to the cue (Lee et al., 2007). For each recording site, we compared the mean time series of HFB power at RFC to a bootstrapped distribution of baseline means, finding the peak power in the cue interval that was greater than at least 99.9% of the bootstrapped baseline distribution. The onset latency was taken as the first time point at which the power was greater than half the peak. In the example area V3d electrode, the cue-evoked responses at RFC were highly consistent across trials and had a reliable onset latency of 59 ± 8 ms (Fig. 2D, top). As expected from monkey single-unit recording studies (Schmolesky et al., 1998), HFB latencies increased systematically across the ventral and dorsal processing pathways (Fig. 4; Tables 2, 3).
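The half-peak definition of onset latency can be illustrated with a short sketch. It assumes that power is expressed as change from baseline (so that baseline is near zero), and that a cutoff corresponding to the 99.9th percentile of the bootstrapped baseline means has already been computed; the function and parameter names are hypothetical.

```python
def half_peak_onset(times_ms, power, baseline_cutoff):
    """Return the first time point at which power exceeds half of the peak.

    times_ms        -- sample times relative to cue onset (ms)
    power           -- mean HFB power at RFC, as change from baseline (assumption)
    baseline_cutoff -- e.g. the 99.9th percentile of bootstrapped baseline means

    Returns None when the peak does not exceed the baseline cutoff, i.e. when
    no reliable response is present.
    """
    peak = max(power)
    if peak <= baseline_cutoff:
        return None
    half = peak / 2.0
    for t, p in zip(times_ms, power):
        if p > half:          # first crossing of the half-peak level
            return t
    return None
```

Because the first crossing of the half-peak level necessarily occurs at or before the peak itself, no separate search for the peak time is needed in this simplified version.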
Response onset latencies increased along the dorsal pathway from early visual areas (V1-V3d/v mean = 73 ± 4 ms) to dorsal extrastriate areas (V3A/B and TO1–2 mean = 107 ± 9 ms, p < 0.05; Table 3 shows all area-wise comparisons) and IPS0 (106 ± 5 ms). IPS0 latencies were faster than those in more anterior IPS areas. Response onset latencies in the ventral pathway were quite long, with ventral extrastriate area responses (mean = 149 ± 5 ms) on the order of those in the anterior IPS, and slower than in dorsal extrastriate and posterior IPS. These findings were not only observed in the population data, but they were remarkably consistent across the four individual patients with extensive electrode coverage of the visual system (results not illustrated). Interestingly, as in previous monkey studies (Schmolesky et al., 1998), area FEF had a fast latency of 62 ± 5 ms, on the order of the population latencies in early visual cortex. This fast latency likely reflects projections from the superior colliculus that bypass the cortex. Although this latency was obtained from only two sites, these fast latencies were quite consistent (Fig. 4C), and they were recorded from two patients (S3 and S6). Conduction delays between subsequent processing stages along the dorsal pathway were estimated to be on the order of ∼15 ms by examining the progression from V1-V2-V3-V3A-IPS0 (Table 2). In nontopographic sites, response onset latencies in the frontal (84 ± 5 ms), parietal (100 ± 3 ms), and temporal lobes (109 ± 5 ms) were slower than early visual areas and faster than the anterior topographic IPS and ventral extrastriate areas (Fig. 4C).
To determine whether the cue onset latencies were biased by particular stimulus properties such as shape and size, we also compared array onset latencies of trials in which attention was not at RFC (the attend-to-RFnull condition) to the cue onset latencies, and found no differences in latencies across the topographic areas (t test, p = 0.6). Thus, response onset latencies did not appear to depend on the different stimulus configurations used in our study.
Together, the temporal dynamics of cue-evoked responses along the dorsal and ventral visual pathways were consistent with the notion of a hierarchical feedforward architecture of visual processing.
Attentional modulation effects and their topography
To determine dynamic task-related modulations of visual processing and probe feedback effects, we examined attention effects on baseline activity in the absence of visual stimulation (i.e., during the delay) and in response to the array by comparing responses from trials when attention was allocated at RFC to trials when attention was allocated at RFnull, similar to approaches typically taken in monkey physiology studies (Reynolds and Chelazzi, 2004). First, we characterized the different types of attentional modulation and their topography across the human visual system and nontopographic cortex. The vast majority of attention effects were enhancement of HFB responses during the delay and in response to the array, as shown for an example electrode located in area TO and for the TO population response in Figure 5 (top). Such enhancement effects were not only observed at RFC, but typically had a spatial extent that was similar to the cue-evoked HFB RF, as can be seen in the examples shown in Figure 3 (modulation of array-evoked responses, solid purple plot; modulation of responses during delay, dashed purple plots). Collectively, the modulation at the different spatial locations relative to the response at RFnull gave rise to an attention field. Similarly, response enhancement during the delay was spatially tuned and gave rise to a memory field (see section on spatial tuning for further results).
Attention and memory fields were observed in many extrastriate sites but were markedly absent in early visual cortex (Fig. 6), especially during the delay. Of the sites in early visual cortex that had a RF, only one site showed significant attentional modulation during the delay (Ndelay = 1/13, 8%). Ventral extrastriate areas also had a low proportion of sites with a significant delay enhancement effect (Ndelay = 2/11, 18%). In comparison, in dorsal extrastriate and IPS areas ∼50% of sites showed significantly enhanced delay activity (dorsal extrastriate: Ndelay = 5/11, 45%; IPS0-2: Ndelay = 6/12, 50%; IPS3–5 and SPL1: Ndelay = 6/11, 54%). Among nontopographic areas, 20% of the sites that showed cue-evoked spatial tuning exhibited significant modulation of activity during the delay (Ndelay = 12/60). Early visual areas also had relatively few sites with a significant effect of attention in response to the array (Narray = 5/13, 38% in the late array period) compared with dorsal extrastriate areas and posterior IPS, which had a high proportion with a significant attentional enhancement during the late array period (V3A, V3B, TO1-2: Narray = 7/11, 64%; IPS0-2: Narray = 7/12, 58%).
It is notable that the topography of attentional enhancement effects during the delay and in response to the array was not identical (Fig. 6, red areas). In particular, although ventral extrastriate areas LO/VO had a low proportion of sites that showed significant enhancement during the delay (18%), these areas had a majority of sites showing an enhancement effect in response to the array (Narray = 7/11, 64%). In nontopographic parietal areas, only 15% of sites showed enhanced delay activity (N = 9/60), whereas 40% exhibited attentional enhancement in response to the array (N = 24/60). Conversely, although anterior IPS areas IPS4+ had a high proportion of sites with a significant effect during the delay (54%), they had only a few sites with significant enhancement in response to the array (Narray = 3/11, 27%). Thus, only dorsal extrastriate areas and posterior IPS had roughly half or more of their sites enhanced by attention both during the delay and in response to the array (dorsal extrastriate: delay 45%, array 64%; IPS0–2: delay 50%, array 58%).
We also observed attentional suppression effects during the delay or in response to the array, albeit less frequently (Fig. 6, green areas). The example electrode shown in Figure 5 (middle, left) was located in V1 and showed a reduction of ∼50% in HFB responses to the array when attention was directed to RFC compared with RFnull. Attentional suppression has been previously observed in monkey physiology studies as a decrease of LFP power and spike-field coherence in gamma frequency bands (40–60 Hz; Chalk et al., 2010). Given that we used an array of stimuli it is likely that inhibitory center-surround interactions and top-down influences contributed to these effects (Ito and Gilbert, 1999; Angelucci et al., 2002; Bair et al., 2003; Ozeki et al., 2009; Zhang et al., 2014; Cox et al., 2017). A similar result was obtained for the population of V1d/v sites, with an overall suppression effect of ∼10% in response to the array (Fig. 5, middle, right). Attentional suppression effects were also found in IPS areas (Fig. 6, green areas). Interestingly, array-related suppression in IPS could be observed with elevated delay activity, as shown in Figure 5 (bottom left) for an electrode located in area IPS3 (for sites with such effects, see Fig. 6A,B, blue arrows). Because both array-related attentional enhancement and suppression effects were found in this area, no net effect of modulation resulted in the population response (Fig. 5, bottom right; mean = 5 ± 6% enhancement, bootstrap randomization test p = 0.06).
Strengths of attentional modulation effects
Hierarchical top-down models assume modulatory attention effects to reverse the bottom-up processing hierarchy. One prediction of such a model is that effects of attention are stronger at advanced compared with early stages of visual processing. Therefore, we probed the strengths of modulatory effects across the human visual system as well as in nontopographic cortex. We quantified the attention effects obtained during the delay and in response to the array using a modulation index (MI), defined as the difference between the mean power in the attend-to-RFC and attend-to-RFnull conditions, normalized to the maximum response. The MI therefore expresses the modulation effect as a proportion of the maximum HFB response. We calculated the MI for the delay period (200 ms before array onset, only including trials with cue-target intervals >450 ms to capture attention effects that were not contaminated by cue-evoked responses), early array (50–200 ms), and late array period (300–500 ms). Positive values indicate enhancement effects (Fig. 6, red) and negative values indicate suppression effects (Fig. 6, green).
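As a worked illustration of the index, the following sketch computes the MI for one site and one time window. How the maximum response is obtained is not specified here, so it is passed in as an argument; the function and argument names are assumptions made for the example.

```python
def modulation_index(power_rfc, power_rfnull, max_response):
    """MI = (mean attend-to-RFC power - mean attend-to-RFnull power) / max response.

    power_rfc, power_rfnull -- per-trial mean HFB power in the window of interest
    max_response            -- maximum HFB response of the site (assumed to be
                               computed elsewhere and to be positive)
    """
    mean_rfc = sum(power_rfc) / len(power_rfc)
    mean_rfnull = sum(power_rfnull) / len(power_rfnull)
    return (mean_rfc - mean_rfnull) / max_response
```

For example, with mean powers of 4 and 2 in the two conditions and a maximum response of 10, the index is 0.2, i.e., an enhancement equal to 20% of the maximum response; swapping the two conditions yields −0.2, a suppression effect.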
To compare the effects of attention between areas, we separately generated bootstrapped distributions of MIs using the population of sites with either enhanced or suppressed effects in each area. Importantly, sites were not assigned to those groups based on any measure of significance, but strictly based on whether their MI was positive or negative. Statistical analyses for each area were then performed on the population means of each of those groups. During the delay, we found significant enhancement effects of attention in dorsal and ventral extrastriate areas (V3A, TO1-2, LO1-2, hV4, VO1-2), as well as in IPS areas IPS0-3 (bootstrap randomization test, each p < 0.001; Figs. 6A, red areas, 7A). No significant enhancement effects were found in early visual areas V1d/v, V2d/v, or V3d/v, nor in dorsal extrastriate area V3B or anterior IPS areas IPS4-5, SPL1, and FEF (each p ∼ 0.1; Figs. 6A, red areas, 7A). Of the areas with a significant effect, V3A (MIdelay = 12 ± 8%, N = 3) and LO (MIdelay = 20 ± 14%, N = 8) showed weaker modulation during the delay than IPS areas (IPS0: MIdelay = 23 ± 8%, N = 8; IPS1–2: MIdelay = 28 ± 16%, N = 2; IPS3: MIdelay = 24 ± 8%, N = 7) and dorsal extrastriate area TO (MIdelay = 37 ± 17%, N = 3). The significance of each comparison is shown in Table 4.
Outside visual topographic cortex, we found significant population enhancement effects in parietal (MIdelay = 34 ± 10%, N = 23), frontal (MIdelay = 33 ± 16%, N = 7), and temporal lobes (MIdelay = 25 ± 16%, N = 8; each p < 0.001; Figs. 6A, red areas, 7A), with modulatory effects similar in strength to higher-order topographic areas (Table 4).
Of the sites with a negative MI, attention significantly suppressed HFB power modulations in the population of V1d/v (MIdelay = 11 ± 7%, N = 3), IPS4+ (MIdelay = 33 ± 27%, N = 1), and nontopographic temporal lobe sites (MIdelay = 21 ± 18%, N = 6; bootstrap randomization test, all p < 0.001; Figs. 6A, green areas, 7A). Notably, although the positive effects were not always significant in these areas, when we examined the effect across all sites in each area we found no overall effect of attention during the delay (V1d/v: p = 0.06, IPS4+: p = 0.4, nontopographic temporal sites: p = 0.8).
Next, we investigated attentional modulation of array-evoked activity. Attention effects can typically be observed in later time windows, since the feedforward cascade of visual stimulation strongly activates sites within the visual system regardless of whether they are attended to or not. For the time period of 300–500 ms after array onset, we found significant positive modulation effects in early, dorsal and ventral extrastriate visual areas (p < 0.001), as well as consistently strong effects in IPS areas (p < 0.001; Fig. 7C). The strength of the modulation generally increased across the cortical hierarchy through IPS0, with the weakest modulation in early visual areas (V1d/v MIarray = 15 ± 7%), and the strongest modulation in dorsal extrastriate area TO (MIarray = 40 ± 14%), ventral extrastriate areas LO/VO (MIarray = 40 ± 7%), and posterior parietal area IPS0 (MIarray = 53 ± 7%; Figs. 6B red areas, 7C; significance of all comparisons shown in Table 4). Interestingly, the anterior IPS areas were as weakly modulated as early visual area V1d/v (IPS4+ MIarray = 16 ± 10%; Fig. 7C; Table 4). We also observed significant suppression in areas V1d/v and IPS3 (MIarray = −27 ± 9% and −25 ± 14%, respectively; Fig. 7C), which were the only areas with this effect either across the population or from individual sites (sites with significant array suppression in V1d/v: Narray = 2 from patient S1; IPS3: Narray = 1 from S5). In contrast, during the early array period, when attention effects and visual onset activity interact, only topographic areas TO, IPS0–2, and LO/VO were significantly modulated (bootstrapped mean ± 95% CI, TO MIarray = 41 ± 14%; IPS0 MIarray = 14 ± 7%; IPS1–2 MIarray = 22 ± 14%; LO/VO MIarray = 18 ± 8%; Fig. 7B).
In summary, TO, IPS0–2, and LO/VO exhibited stronger attentional modulation effects than early visual and anterior IPS areas both during the delay and in the late array window, and these were the only topographic areas that were significantly modulated during their early response to the array. Although the stronger attention effects in extrastriate and posterior parietal cortex relative to early visual cortex are consistent with hierarchical top-down models of attention, the weak or absent attention effects in the anterior IPS and frontal cortex, particularly during visual processing, are in conflict with such models.
Attentional modulation latencies
Just as the temporal order of visual onset responses informs about the temporal dynamics of feedforward visual processing, the timing of selective processing after the array onset provides insight into the temporal dynamics of feedback attentional modulation. Hierarchical top-down models predict that the latencies of attentional modulation systematically increase from advanced to early processing stages as a further indication for a reversal of the processing hierarchy during attentional selection. To determine the latency of attentional modulation after array onset, we examined the population time courses of each area sorted by modulation effects (i.e., enhancement or suppression based on each site's MI in the late array window). First, we determined which time points showed a significant effect of attention in response to the array (attend-to-RFC > attend-to-RFnull, bootstrap randomization p < 0.05). Then, we identified clusters of consecutive significant time points after array onset that lasted for at least 50 ms (Maris and Oostenveld, 2007). The first time point in the first such cluster after array onset was defined as the attentional modulation latency (see Materials and Methods for more details). To compare latencies across areas, we generated bootstrapped distributions of attentional modulation latencies by resampling 500 times, with replacement, from trials in each condition by site and recalculating the latency based on that set of trials. We determined whether two areas had significantly different latencies by comparing the population means of the distributions, then applying Holm's sequential Bonferroni correction for multiple comparisons at α level p < 0.05 across all the comparisons. The results are summarized in Tables 2 and 3.
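For readers unfamiliar with Holm's sequential Bonferroni correction, the step-down rule can be sketched as below. This is a generic illustration of the standard procedure, not the study's analysis code; the function name and the example p values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans (in input order) marking which comparisons
    remain significant after Holm's step-down correction."""
    m = len(p_values)
    # Visit comparisons from the smallest to the largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p value (rank = k - 1) is tested against alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p values also fail
    return reject
```

With four comparisons yielding p = 0.01, 0.04, 0.03, and 0.005, only the first and last survive correction at α = 0.05: the smallest p value is tested against 0.05/4, the next against 0.05/3, and the procedure stops at the first failure (0.03 > 0.05/2).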
Consistent with the idea that feedback signals are generated in higher-order cortex and modulate early sensory processing areas via corticocortical feedback, we found that modulation latencies were longest in early visual cortex (Figs. 8, 9; Tables 2, 3). Modulation latencies were slowest in V1d/v (late component, 315 ± 33 ms), followed by V2d/v (295 ± 16 ms), V3d/v (233 ± 12 ms), V3A (246 ± 8 ms), and V3B (268 ± 22 ms; significance of all comparisons shown in Table 3). The attentional modulation latencies in posterior IPS (IPS0: 156 ± 18 ms, IPS1–2: 119 ± 22 ms), dorsal extrastriate area TO1–2 (129 ± 3 ms) and ventral extrastriate areas LO/VO (172 ± 7 ms) were significantly faster than those in early visual areas. However, the modulation latency in area IPS3, located anterior to IPS0–2, was significantly longer than the latencies in the posterior IPS and on the order of latencies in early visual areas (IPS3 latency = 225 ± 11 ms; Figs. 8, 9; Tables 2, 3). Although the more anterior IPS areas of IPS4+ had significant modulation effects (Fig. 7C), the responses were not robust across trials and sites, yielding <50% of bootstrapped time series with a significant modulation effect. Therefore, the latencies calculated in this area were not considered significant (see Materials and Methods). However, it is worth mentioning that the trend of increasing latencies through the higher-order IPS areas continued in IPS4+: of the bootstrapped time series where we were able to determine a modulation effect, the latency was even slower than IPS3 and on the order of the slow V1d/v effects (latency = 352 ± 22 ms from 41% of the bootstrapped time series). Further, modulation latencies could not be determined in FEF due to the absence of modulation effects (Fig. 7). 
Thus, the pattern of attentional modulation latencies did not strictly follow the concept of top-down feedback from higher to lower order cortex, with the fastest latencies found instead in intermediate areas of the processing streams.
In V1d/v, we found that the distribution of modulation latencies was bimodal, reflecting two components (Fig. 9B, red traces). A fast component indicated the effect of attention as early as 80 ms in V1 after array onset (83 ± 9 ms; Fig. 8, V1), which was the fastest effect of attentional enhancement that we observed across all areas. Although these responses are too fast to reflect corticocortical feedback modulation, they are consistent with the very fast attention latencies reported in LGN magnocellular populations (McAlonan et al., 2008), suggesting that a feedforward attentional modulation may be passed onto V1 from LGN. We also measured the response onset latencies of the suppression effects (Fig. 7C). In V1d/v, the suppression effect was even earlier than the fast component of the enhanced responses (68 ± 7 ms, bootstrap randomization p < 0.01). The suppression effects in IPS3 sites with a negative MI were late (265 ± 142 ms), on the order of the late enhancement effects found in IPS3 (p = 0.7).
Outside the topographic areas, parietal lobe sites had fast modulation latencies similar to those observed in IPS0 and IPS1/2 (124 ± 7 ms), and temporal lobe sites had modulation latencies on the order of those in ventral extrastriate areas (223 ± 16 ms).
In a further test of the effect of attention on response onset latencies, we examined whether array onset responses were faster with attention. Previous studies of response onset latencies in extrastriate cortex of macaques had found a small, but consistent lag in response to ignored stimuli (Sundberg et al., 2012). However, we did not observe any systematic increases or lags in onset latencies with attention across the topographic areas (t test, p ∼ 0.6).
Spatial tuning of response, memory, and attention fields
Although our task was not designed to probe spatial tuning properties systematically and in detail (e.g., as a function of eccentricity), we examined spatial tuning properties at a fixed peripheral eccentricity (i.e., 7°, which was the constant eccentricity at which the cue was presented) across the human visual system as well as outside of topographic visual cortex. Across all recording sites in each area that exhibited cue-evoked, spatially-tuned HFB response fields, we determined the population HFB spatial tuning curves (Fig. 10A), and the population widths at half-height of the tuning curves (Fig. 10B), as well as their individual distributions by area (Fig. 10C; see Materials and Methods for further details). We compared the tuning widths between the areas by generating bootstrapped distributions of mean tuning widths in each area after resampling, 500 times, from trials in each condition. The significance of the differences between these bootstrapped distributions was determined by applying the Holm–Bonferroni sequential correction for multiple comparisons at the target α level of p < 0.05 (see Materials and Methods).
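One simple way to obtain a width at half-height from a sampled tuning curve is to interpolate linearly between neighboring cue positions on either side of the peak. The sketch below illustrates this idea only; it assumes that responses are expressed relative to a zero baseline and that positions are ordered and centered on RFC, and it is not necessarily the procedure used for the fitted curves described in Materials and Methods. All names are hypothetical.

```python
def width_at_half_height(angles_deg, response):
    """Full width at half of the peak response, by linear interpolation.

    angles_deg -- cue positions in ascending order, centered on RFC (assumption)
    response   -- mean response at each position, relative to a zero baseline
    Returns None if the curve never falls to half-height on one side.
    """
    peak_i = max(range(len(response)), key=response.__getitem__)
    if response[peak_i] <= 0:
        return None
    half = response[peak_i] / 2.0

    def crossing(indices):
        prev = peak_i
        for i in indices:
            if response[i] <= half:
                # Interpolate between the last point above half and this one.
                frac = (response[prev] - half) / (response[prev] - response[i])
                return angles_deg[prev] + frac * (angles_deg[i] - angles_deg[prev])
            prev = i
        return None

    left = crossing(range(peak_i - 1, -1, -1))
    right = crossing(range(peak_i + 1, len(response)))
    if left is None or right is None:
        return None
    return right - left
```

For a symmetric curve sampled at −40°, −20°, 0°, 20°, and 40° with responses 0, 2, 4, 2, and 0, the half-height level of 2 is reached exactly at ±20°, giving a width of 40°.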
As expected from a wealth of fMRI studies in humans and electrophysiology studies in monkeys (Felleman and Van Essen, 1987; Dumoulin and Wandell, 2008; Wandell and Winawer, 2015), spatial tuning widths increased systematically across both the dorsal and ventral visual processing pathways (Fig. 10; Table 5). This progression was apparent in the population data (Fig. 10B), as well as in the distribution of tuning widths from individual recording sites (Fig. 10C). Early visual areas V1-V3d/v had significantly narrower tuning widths (mean = 9.5 ± 0.1°, N = 13) than dorsal extrastriate areas (V3A/B and TO1–2 mean = 13.1 ± 0.2°, N = 12; p < 0.05; Table 5 shows all area-wise comparisons) and ventral extrastriate areas (LO1–2, hV4, and VO1–2 mean = 13.2 ± 0.4°, N = 11). Dorsal and ventral extrastriate areas were in turn more sharply tuned than posterior and anterior IPS areas (IPS0–2 mean = 15.5 ± 0.5°, N = 12; IPS3–5 and SPL1 mean = 15.8 ± 0.6°, N = 11). Tuning widths of areas along the IPS were comparable. Nontopographic sites had tuning widths similar to higher-order topographic areas, with parietal lobe sites' tuning widths on the order of the topographic IPS sites (mean = 14.4 ± 0.2°, N = 35), and temporal lobe sites' widths comparable to the dorsal and ventral extrastriate sites (mean = 11.9 ± 0.7°, N = 14).
We also determined the spatial tuning widths during the delay period (memory field) and in response to the array (attention field). At individual sites, a general broadening of the attention fields relative to the cue-evoked response fields was observed (Fig. 3B–E, purple compared with orange polar plots). At the population level, we investigated the effect of attention on the response field widths by examining the population of sites in each area that had a significant population enhancement effect (Fig. 7, sites from areas with a significant positive MI). We generated trial-wise bootstrapped distributions of mean memory and attention fields, from which we calculated the widths during the delay and in the late array window. We found that memory fields were significantly broader than response fields in TO1–2 and ventral extrastriate areas (increase of 2.7 ± 0.3% and 7.5 ± 0.5%, respectively; bootstrap randomization test p < 0.001), as well as in the nontopographic areas (mean increase = 8.6 ± 0.5%, p < 0.01). In contrast, we did not find significant differences between response and memory field widths in area V3A or in IPS areas IPS0–3 (p ∼ 0.1; Fig. 11A).
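The width measure and the percentage changes reported above can be made concrete with a short sketch. It assumes a tuning curve sampled at discrete positions, a half-height defined midway between the curve's minimum and its peak, and linear interpolation between samples; the actual procedure is described in Materials and Methods and may differ. The function names and example values are hypothetical.

```python
def width_at_half_height(positions, responses):
    """Width of a single-peaked tuning curve at half-height.

    Half-height is taken midway between the minimum and the peak response
    (an assumption); crossings are found by linear interpolation.
    """
    peak = max(range(len(responses)), key=lambda i: responses[i])
    floor = min(responses)
    if responses[peak] == floor:
        raise ValueError("flat tuning curve has no defined width")
    half = floor + 0.5 * (responses[peak] - floor)

    def crossing(step):
        # Walk away from the peak until the curve drops to or below half-height.
        i = peak
        while 0 <= i + step < len(responses):
            j = i + step
            if responses[j] <= half:
                frac = (responses[i] - half) / (responses[i] - responses[j])
                return positions[i] + frac * (positions[j] - positions[i])
            i = j
        return positions[i]  # curve never falls to half-height on this side

    return crossing(+1) - crossing(-1)


def percent_change(reference_width, test_width):
    """Broadening (positive) or narrowing (negative) relative to the reference."""
    return 100.0 * (test_width - reference_width) / reference_width


if __name__ == "__main__":
    # Hypothetical triangular tuning curve sampled every 10 degrees.
    pos = [-20, -10, 0, 10, 20]
    cue_evoked = [0.0, 0.5, 1.0, 0.5, 0.0]
    print(width_at_half_height(pos, cue_evoked))
    print(percent_change(10.0, 10.8))
```

Expressing the change relative to the cue-evoked width makes broadening comparable across areas whose absolute tuning widths differ.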
All of the topographic areas that were significantly enhanced by attention (Fig. 7C, positive MI) showed increased attention field widths relative to their respective cue-evoked RFs (Fig. 11B). The effect was remarkably similar across the topographic areas, suggesting a global effect of attentional modulation on visual space, consistent with a recent fMRI study (Klein et al., 2014). Except for areas V1d/v and V3d/v, which had spatial attention field widths ∼8% broader than their response fields, all other topographic areas and the nontopographic sites showed broadening of spatial attention tuning widths on the order of 3% (mean increase = 2.6 ± 0.3%, all p < 0.001). Such broadening may be due to expansion of RFs, as observed in single neurons when attention is allocated next to the RF (Anton-Erxleben et al., 2009). At the same time, there is also evidence that RFs shrink in extent when attention is allocated (Womelsdorf et al., 2006). Reconciling these contradictory observations with our findings may imply that, at the IFP spatial scale, the overall effect appears to be broadening of the IFP RF because of the many contributing individual neurons' RFs expanding and only a smaller number of individual neurons' RFs shrinking. Such broadening appears to occur only in response to visual stimuli, since we did not observe the same effect for memory fields. Relative to the memory field widths, attention fields were broader in areas TO1–2, IPS1–2, and IPS3 (p < 0.001), narrower in ventral extrastriate areas LO/VO and nontopographic areas (p < 0.001), and similar in V3A and IPS0 (p = 0.4; Fig. 11C).
Discussion
We analyzed HFB responses from intracranial recordings of 626 electrodes implanted in 8 epilepsy patients, who performed a spatial attention task, to characterize a dynamic visual processing architecture, modulated by attentional task demands, in the human brain. Electrode locations were reconstructed using a probabilistic atlas of the human visual system (Wang et al., 2015). HFB responses showed high spatial selectivity and tuning, constituting ECoG RFs that were found within and outside the topographic visual system. Both RF widths and onset latencies increased systematically across the visual processing hierarchy. We used the spatial specificity of ECoG responses to quantitatively study spatial attention effects on baseline and visually-evoked activity. Attention effects were stronger, and attention modulation latencies were shorter, in extrastriate and posterior parietal cortex than in early visual cortex. However, attention effects in anterior IPS and frontal cortex were weaker, and modulation latencies in anterior IPS were longer, than in posterior IPS. Together, the temporal dynamics and modulatory effects of spatial attention revealed in these studies only partially support attentional top-down models that assume a reversal of the visual processing hierarchy.
The electrophysiological basis of HFB responses is still an area of active investigation. HFB power fluctuations have been shown to correlate with multiunit activity from large populations of neurons in the vicinity of the recording electrode (Ray et al., 2008a; Ray and Maunsell, 2011; Rich and Wallis, 2017; Watson et al., 2018). More recent findings indicate Ca2+ dendritic spikes in supragranular cortex as a principal contributor to pial HFB responses (Leszczyński et al., Unpublished observations). However, models of HFB responses have also shown that power increases are predicted by increases in neuronal synchronization (Ray et al., 2008a). The underlying firing patterns may consist of multiple band-limited neuronal oscillations at different peak frequencies within the gamma band (Crone et al., 2011). Thus, it is possible that HFB responses index, to some extent, neuronal synchronization. We used the high spatial and temporal precision of HFB responses to track the temporal dynamics of visual and attentional processing.
Spatial specificity of ECoG response fields
Similar to previous reports from human early visual cortex (Yoshor et al., 2007; Winawer and Parvizi, 2016) and monkey visual cortex (Bosman et al., 2012), we found spatially confined ECoG RFs based on cue-evoked HFB responses. The spatial configurations of the RFs reflected the visual field representations of the underlying maps that are known from fMRI studies (Konen and Kastner, 2008; Silver and Kastner, 2009; Arcaro et al., 2011; Wang et al., 2015). Remarkably, electrodes that were located as little as 1 cm apart showed visual field sign reversals along the horizontal meridian with RF peaks in the upper and lower quadrants, respectively, underlining the impressive specificity of HFB responses, shown in several other domains (Crone et al., 1998; Canolty et al., 2007; Parvizi et al., 2012; Daitch et al., 2016). Interestingly, a large proportion of electrodes with ECoG RFs was found outside topographic cortex, equally distributed across the major lobes. The identification of spatially-selective, but relatively isolated sites outside of visual maps is difficult with techniques such as MEG/EEG and fMRI, which have a poor signal-to-noise ratio. Thus, spatially selective responses appear to be surprisingly ubiquitous outside of the topographic visual system.
Temporal dynamics of feedforward processing
This is the first report of systematic HFB response onset latencies across the human visual system (for LFP onset latencies, see Yoshor et al., 2007). Onset latencies increased gradually across the dorsal processing pathway, where we had systematic coverage, with estimated conduction delays of 10–15 ms between areas. Responses in V1 were recorded as early as 50 ms after stimulus onset. In general, these results are in excellent agreement with monkey physiology studies (Schmolesky et al., 1998). Notably, we also found extremely short latencies in FEF that were comparable to the onset latencies in early visual cortex. However, a few of our findings were not predictable from what is known about the monkey visual system and therefore may be unique features of the human visual system. First, in humans, onset latencies in TO (the human MT/MST complex) were well above 100 ms and significantly longer than those in other dorsal extrastriate areas such as V3d/v or V3A. In contrast, in the monkey, onset latencies in these areas are typically shorter and similar to one another (∼70 ms; Schmolesky et al., 1998; but see large range shown by Raiguel et al., 1989 and Azzopardi et al., 2003). Second, onset latencies in higher-order dorsal and ventral extrastriate areas (IPS1–4 and LO/VO) were similar in humans. In contrast, onset latencies between dorsal and ventral higher-order cortex differ significantly in monkeys due to the relatively greater magnocellular input to the dorsal pathway. For example, neurons in LIP respond to shape stimuli with a latency of ∼60 ms, whereas neurons in anterior inferotemporal cortex respond after ∼100 ms (Lehky and Sereno, 2007). This discrepancy, as well as the longer latencies in TO, may be attributable to the greater capacity of the human dorsal pathway to represent shape and object information (Konen and Kastner, 2008; Freud et al., 2016; Kastner et al., 2017).
FMRI studies have shown that the human ventral and dorsal visual pathways represent nonspatial shape and object information similarly (Konen and Kastner, 2008), and thus the human dorsal pathway must receive a relatively greater input from the slower parvocellular system compared with the monkey dorsal pathway, which in turn might explain the longer onset latencies in TO and IPS. Despite these notable human-specific features in the dynamics of feedforward processing, as indexed by response onset latencies, our results provide strong support for a hierarchical visual processing architecture in the human brain.
Spatial attention effects and modulation latencies
The temporal dynamics and strengths of attentional modulation have been interpreted as evidence in support of a top-down feedback model of selective attention. Specifically, monkey physiology studies have shown that attentional modulation latencies were shorter and the strength of attentional modulation was greater in higher-order cortex than in lower-order cortex. For example, Buffalo et al. (2010) recorded from areas V1, V2, and V4 and found that attention effects reversed modulation strengths and temporal order such that attentional enhancement was found to be larger and earlier in V4 and smaller and later in V1, with V2 showing intermediate results, similar to earlier findings by Mehta et al. (2000). These studies have provided support for the idea of a backward propagation of attentional feedback signals across the visual processing hierarchy.
We found widespread spatially-selective attention effects on HFB responses both on baseline activity during the delay and in response to the array, thereby corroborating previous ECoG studies on selective sensory processing (Ray et al., 2008b; Szczepanski et al., 2010; Davidesco et al., 2013; Zion Golumbic et al., 2013). In accordance with a large body of literature from monkey physiology (Luck et al., 1997; Cook and Maunsell, 2002) and human brain imaging (O'Connor et al., 2002; Siegel et al., 2008), attentional modulation was generally stronger in higher-order compared with lower-order areas.
Specifically, our recordings focused on a multitude of areas along the dorsal processing pathway. Indeed, we found a systematic “backward propagation” in early visual cortex, from areas V3 to V2 and V1 with increasingly longer attentional modulation latencies, and these latencies were also significantly longer than those obtained in dorsal extrastriate cortex. However, the temporal dynamics in dorsal extrastriate and posterior parietal cortex were more complex. For example, area TO and IPS0 had significantly shorter latencies than IPS3. Thus, these modulation latencies did not appear to follow a strictly hierarchical processing order that was reversed during spatial attention, and they do not lend unequivocal support to the top-down feedback model. However, our assumptions on the visual processing hierarchy along the human dorsal pathway can only be tentative. Based on the anatomical locations of areas, one would assume that TO projects to and receives feedback from the IPS areas, and the same would hold for the posterior relative to the anterior IPS areas, but detailed anatomical studies on structural connectivity are lacking. Connectivity, both structural and functional, may be increasingly divergent in higher-order cortex, thereby promoting parallel rather than hierarchical processing. For example, anterior IPS shows grip- and reach-related activations (Konen et al., 2013) as well as representations of tool and manipulable object information (Mruczek et al., 2013). Further, posterior IPS, but not IPS3–5, has been reported to interact with other frontoparietal attention areas, like FEF and supplementary eye field, in visuospatial attention tasks (Szczepanski et al., 2013). Anterior IPS may thus contribute to a different network than posterior IPS, which may predominantly serve visuospatial attention and oculomotor functions.
Interestingly, based on analyses of directed feedforward and feedback signaling indexed by synchronization in certain frequency channels, Michalareas et al. (2016) placed the anterior IPS areas below the posterior IPS areas in their functional hierarchy, which is further evidence for the more complex inter-areal dynamics during attentional processing particularly in human parietal cortex. Further, it is noteworthy that cortical network interactions are influenced by additional sources such as thalamic nuclei, which complicates the interpretation of temporal corticocortical interactions (for an extensive discussion of alternative attention control models, see Halassa and Kastner, 2017).
Attentional modulation in V1
Attention effects on array-evoked activity were moderate in early visual cortex. Both enhancement and suppression effects were found in V1, without a net effect of attention. The strongest attention effect that we obtained in V1 was attentional suppression, likely due to modulation of activity in extra-RF surrounds. These findings are consistent with previous monkey physiology studies that have shown attention-related decreases in LFP gamma power in the 40–60 Hz frequency band and spike-field coherence in V1 using stimuli that engaged suppressive extra-RF surrounds (Chalk et al., 2010), as well as with findings of attention-related increases of LFP gamma power when extra-RF surrounds were less stimulated (Bosman et al., 2012). Thus, it is possible that HFB responses also reflect neuronal synchronization processes, because attention-related modulation of spiking activity is typically moderate (Motter, 1993; Luck et al., 1997; McAdams and Maunsell, 1999; Grunewald et al., 2002; Marcus and Van Essen, 2002; Yoshor et al., 2007).
Interestingly, we also found evidence of attentional feedforward modulation in V1, where three modulatory temporal components were found: two early components, reflecting attentional suppression and enhancement, that were observed at array onset, and a late component that was observed with attentional enhancement and followed the top-down feedback model discussed above. In monkey physiology studies, attentional feedforward modulation has been found in LGN and thalamic reticular nucleus (TRN; McAlonan et al., 2008). This modulation may be mediated through direct influences of prefrontal cortex on the TRN that bypass corticocortical feedback, as shown in the mouse model (Wimmer et al., 2015). The feedforward modulation observed in LGN-TRN may be passed on to V1 and thus account for our observations. In human EEG studies, attention effects on the earliest component (the “C1”; ∼50 ms onset) that is typically attributed to a generator in striate cortex have been controversial (Martínez et al., 1999; Di Russo et al., 2003; Kelly et al., 2008). Our findings of two early components support the possibility that the earliest EEG component may be modulated by spatial attention.
Footnotes
This work was supported by the National Institute of Mental Health Conte Center Grant 1P50MH109429 (S.K., R.T.K., and J.P.), the National Institute of Mental Health Grants R01MH064043 (S.K. and R.T.K.), R01MH109954 (J.P.), and R01MH110311 (Y.B.S.), the National Institute of Neurological Disorders and Stroke Grant R01R37NS21135 (R.T.K.), and the James S. McDonnell Foundation 21st Century Science Initiative, Understanding Human Cognition, collaborative Grant (S.K. and R.T.K.). We thank Michael Arcaro for help with implementing the probabilistic atlas.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Sabine Kastner, Princeton Neuroscience Institute and Department of Psychology, Washington Road, Princeton, NJ 08544. skastner@princeton.edu