Abstract
The perception of dynamic visual stimuli relies on two apparently conflicting perceptual mechanisms: rapid visual input must sometimes be integrated into unitary percepts but at other times must be segregated or parsed into separate objects or events. Though they have opposite effects on our perceptual experience, the deployment of spatial attention benefits both operations. Little is known about the neural mechanisms underlying this impact of spatial attention on temporal perception. Here, we record magnetoencephalography (MEG) in male and female humans to demonstrate that the deployment of spatial attention for the purpose of segregating or integrating visual stimuli impacts prestimulus oscillatory activity in retinotopic visual brain areas where the attended location is represented. Alpha band oscillations contralateral to an attended location are therefore faster than ipsilateral oscillations when stimuli appearing at this location will need to be segregated, but slower in expectation of the need for integration, consistent with the idea that α frequency is linked to perceptual sampling rate. These results demonstrate a novel interaction between temporal visual processing and the allocation of attention in space.
SIGNIFICANCE STATEMENT Our environment is dynamic and visual input therefore varies over time. To make sense of continuously changing information, our visual system balances two complementary processes: temporal segregation in order to identify changes, and temporal integration to identify consistencies in time. When we know that a circumstance requires use of one or the other of these operations, we are able to prepare for this, and this preparation can be tracked in oscillatory brain activity. Here, we show how this preparation for temporal processing can be focused spatially. When we expect to integrate or segregate visual stimuli that will appear at a specific location, oscillatory brain activity changes in visual areas responsible for the representation of that location. In this way, spatial and temporal mechanisms interact to support adaptive, efficient perception.
Introduction
Spatial attention improves our ability to resolve static images, and the neural mechanisms underlying this benefit have been investigated in depth (Desimone and Duncan, 1995; Moore and Zirnsak, 2017). However, real-world stimuli are commonly characterized by their temporal dynamics, and it is less clear how attention impacts the neural processing of stimuli that change over time (Nobre and van Ede, 2018). The perception of dynamic visual input relies on two apparently opposing functions: rapid sequential stimuli must sometimes be integrated to form unitary percepts, but at other times must be segregated or parsed into separate objects and events. Behavioral research has shown that spatial attention can benefit both of these operations: stimuli appearing at cued locations are better segregated when this is required by the task (Hein et al., 2006; Sharp et al., 2018), but better integrated when this is useful (Akyürek et al., 2007, 2008; Sharp et al., 2018; Hochmitz et al., 2021), even when the cue provides no implicit information about stimulus timing (Sharp et al., 2019). It is striking that spatial attention can flexibly benefit both segregation and integration, given that these operations have entirely opposite influences on perception, and we know little about how this flexibility might be implemented in the brain.
One possibility is that the deployment of attention has an impact on temporal processing in retinotopic cells in visual cortex. Cells in striate and extrastriate visual areas tend to have a spatial organization, responding to stimulation within specific areas of the retina (Hubel and Wiesel, 1962), and attentional benefits on the representation of static stimuli are largely implemented through effects on activity in these cells. For example, the deployment of attention shrinks the effective receptive field of retinotopic cells such that they become selective for a smaller area at the attended location and therefore carry more specific information (Womelsdorf et al., 2008). The deployment of attention may have an analogous impact on temporal processing, shrinking or stretching the temporal scope over which retinotopic cells summarize visual input.
Consistent with this idea, temporal expectation has an impact on oscillatory α-band activity (7–14 Hz) recorded over posterior visual cortex, where retinotopic cells are located. Individual differences in average α rate predict the likelihood that a participant will report two sequential flashes as a single event (Samaha and Postle, 2015; although see Buergers and Noppeney, 2022), and manipulation of average α rate, either by sensory entrainment (Ronconi et al., 2018) or stimulation (Cecere et al., 2015; Minami and Amano, 2017; Mioni et al., 2020), has an impact on behavior that suggests a stretching or shrinking of the perceptual window. These effects on α appear strategic and cognitively accessible: if participants are cued to segregate or integrate stimuli, average α rate immediately before stimulus onset becomes faster when segregation is required and slower when integration is required (Wutz et al., 2018). However, existing results have not demonstrated that this neural implementation of temporal expectation interacts with the influence of spatial attention.
Here, we test the hypothesis that the impact of spatial attention on temporal processing is instantiated in part through effects on α frequency in retinotopic visual cortex. We recorded magnetoencephalography (MEG) while participants completed a task requiring them to integrate or segregate sequential visual stimuli that appeared at cued locations. The need for integration or segregation was manipulated across blocks, such that participants knew what was required of them before the stimuli appeared. Our expectation was that average α rate should be faster contralateral than ipsilateral to a cued location when participants were prepared to segregate visual input, but slower when participants were prepared to integrate.
Materials and Methods
Participants
Twenty-nine healthy participants (11 male; age 24 ± 2.7 years, mean ± SD) gave informed consent before completing the experiment. All had normal or corrected-to-normal vision. Participants provided consent in accordance with the Declaration of Helsinki and approval for the study was granted by the ethics committee of the University of Trento.
Task structure
The stimuli and task were generated with the Psychophysics Toolbox (Brainard, 1997) in MATLAB (MathWorks). Using a digital-light-processing (DLP) projector (PROPixx, VPixx Technologies Inc.), stimuli were projected at 120 Hz onto a translucent screen (projected screen size 510 × 380 mm) in a dimly lit, magnetically shielded room at a viewing distance of 1 m. The timing of stimulus presentation was recorded with a photodiode placed on the lower right corner of the projection screen and used to correct for the delay between the trigger and stimulation onset.
The trial structure is shown in Figure 1. A small red “X” (0.2° visual angle) acted as the fixation cross and remained on screen throughout each trial. At the beginning of 75% of the trials, one of the arms of this cross changed from red to green to cue the quadrant of the visual field where the target was likely to appear. In the remaining 25% of trials the cue was neutral, with the tips of all four arms of the cross changing color (such that roughly the same number of pixels changed from red to green as in the cue condition). When present, the spatial cue was valid 75% of the time and participants were explicitly informed of this.
After a jittered cue interval of 850–1350 ms (randomly selected from a rectangular distribution), the fixation cross became entirely red and the first display appeared on screen for 16.67 ms. This display had circles at seven locations on a four by four grid (see Fig. 1). Each circle was formed from two arc elements so that the gaps in the circle defined a polar orientation randomly selected to be between 45° and 315°. Each complete circle was 1.2° (visual angle) in diameter, the grid of possible locations measured 8.4° by 8.4° (visual angle), and one position in the display contained only a single arc, defining a half circle.
Following a fixed interstimulus interval (ISI) of 48.33 ms, a second display appeared for 16.67 ms. This also had circles at seven locations and a half circle at one location. Crucially, the half circle in display 2 completed the half circle defined by the single arc presented in display 1, such that if the two displays were superimposed the two arc elements formed a standard circle stimulus. The locations of the seven circles for each display never overlapped, so that if the two displays were superimposed only one of the sixteen possible locations remained empty. This hypothetical superimposition is illustrated in Figure 1. To mitigate the influence of an increasing hazard rate over the cue interval, 10% of trials were catch trials in which a blue fixation cross appeared instead of any displays. No response was required and these catch trials were excluded from analysis of behavior.
A response probe appeared 400 ms after the offset of display 2 and remained on screen until a response was made. This comprised a grid of squares where each square identified one of the sixteen possible target locations. Participants indicated the location where they had perceived the target by moving a highlighted square around the response screen (using two buttons pressed with their left hand) and confirming their choice (with a button pressed with the right hand; button boxes: DataPixx, VPixx Technologies Inc.).
There were two versions of the task that varied across experimental blocks. Importantly, stimulus presentation in both versions was identical, with only task instructions changing between conditions. In one version of the task, the target was the half circle. Successful identification of the half circle required that the two displays be parsed, and we refer to this as the segregation task. In the other version of the task, the target was the location where no circle appeared in either display. Successful identification of this location required a combined percept of the two displays, and we refer to this as the integration task (Hogben and Di Lollo, 1974). In both tasks, the target location was randomly determined for each trial, and the blocks were ordered randomly (with the constraint that participants saw an equal number of integration and segregation blocks). Participants were explicitly instructed to fixate the cross in the center of the screen throughout stimulus presentation and instructed at the beginning of each block to locate either the half circle or the empty position.
Before entering the MEG, participants completed 30 cued practice trials for each task version in a room adjacent to the scanner. Participants repeated these two practice blocks until they were able to perform at better than 25% accuracy in both tasks (note that chance level in this task is 6.25%). Participants then completed ten blocks of the main experiment while MEG was recorded, with each block comprising 67 trials.
Eye tracking and analysis
An Eyelink 1000 eye tracker (SR Research) was fixed to the stimulus presentation screen at a distance of 1 m from the MEG helmet and the position of the right eye was recorded at a sampling rate of 1 kHz. The vertical and horizontal electrooculogram (EOG) was additionally recorded and low-pass filtered offline with a 25 Hz cutoff. Trials were rejected from analysis when eye-tracking data in the 850-ms interval preceding onset of the first display showed velocity exceeding 300°/s or when any sample indicated that eye position was >1.4° visual angle from the center of the display. Trials were also rejected from analysis when vertical EOG in the 100-ms interval preceding onset of the first display exceeded a maximum value, defined per participant, that characterized blinks. All results from automated rejection were verified (and occasionally adjusted) following visual inspection. This resulted in removal of 3 ± 4% of trials per participant for blinks and 7 ± 7% of trials per participant for saccades (mean ± SD).
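For illustration, the automated rejection criteria can be sketched as follows; the variable preStimGaze and the velocity computation are assumptions, with only the thresholds taken from the description above.

```matlab
% A minimal sketch of the automated rejection criteria described above.
% preStimGaze is a hypothetical nSamples x 2 matrix of [x y] gaze position,
% in degrees of visual angle relative to screen center, covering the 850 ms
% preceding onset of the first display. Thresholds follow the text.
srate     = 1000;                                           % eye-tracker sampling rate (Hz)
velThresh = 300;                                            % velocity threshold (deg/s)
posThresh = 1.4;                                            % eccentricity threshold (deg)

speed        = sqrt(sum(diff(preStimGaze).^2, 2)) * srate;  % sample-to-sample speed
eccentricity = sqrt(sum(preStimGaze.^2, 2));                % distance from fixation
rejectTrial  = any(speed > velThresh) || any(eccentricity > posThresh);
```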
MEG recording and analysis
MEG was recorded at 1 kHz using a Neuromag306 system with 102 magnetometers and 204 planar gradiometers (Elekta). A subject-specific head frame was digitized before the experiment began (3Space Fastrack; Polhemus). Each head frame featured the three cardinal landmarks (nasion and left and right preauricular points), the position of five head position indicator (HPI) coils, and between 200 and 300 other head shape sample points. The head frame was used to localize the position of the participant's head in relation to the MEG sensors at the beginning of each block. The data were processed using the FieldTrip toolbox (Oostenveld et al., 2011) for MATLAB (MathWorks).
Infomax independent component analysis (ICA; Bell and Sejnowski, 1995) was employed to identify heartbeat and residual eye-movement variance that remained after eye-tracker-guided trial exclusion. This was removed from the data (2 ± 1 components rejected per participant, mean ± SD). MEG channels with nonbiological noise were identified by visual inspection of the raw data, leading to the removal and interpolation of 10 ± 1 channels per participant (mean ± SD). The data were then Maxfiltered (Elekta Neuromag) to remove noise originating from outside the MEG helmet and to align head position across runs before being epoched in reference to photodiode-corrected trigger timings.
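In FieldTrip, this ICA cleanup step might look roughly as follows; the data variable and the rejected component indices are hypothetical, and 'runica' is the FieldTrip/EEGLAB implementation of Infomax ICA.

```matlab
% A FieldTrip-style sketch of the ICA cleanup step. data_meg and the rejected
% component indices are hypothetical placeholders.
cfg           = [];
cfg.method    = 'runica';                    % Infomax ICA
comp          = ft_componentanalysis(cfg, data_meg);

% after visual inspection of component topographies and time courses,
% remove the components capturing heartbeat and residual ocular activity
cfg           = [];
cfg.component = [3 11];                      % hypothetical component indices
data_clean    = ft_rejectcomponent(cfg, comp, data_meg);
```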
Frequency spectra were computed for conditional signals observed at magnetometers posterior to the central sulcus in the 1 s preceding stimulus onset. These were computed by fast Fourier transform (FFT) of the Hamming-windowed signal. A time series of instantaneous α frequency was separately estimated using frequency sliding (Cohen, 2014; Samaha and Postle, 2015; Wutz et al., 2018). To this end, a Hamming-windowed least-squares FIR filter was employed to isolate the 7- to 14-Hz frequency band. This generated a plateau passband in frequency space with low passband ripple and a transition bandwidth of 15% (−6 dB at 5.95 Hz and 16.1 Hz). An estimate of instantaneous phase angle was subsequently obtained through the Hilbert transform. From this, an estimate of instantaneous frequency was calculated as the first temporal derivative of phase angle. To remove analytic artefacts, the instantaneous frequency time series was then median filtered 10 times, with window lengths increasing linearly from 10 to 400 ms, and the median of these 10 filtered values was taken as the estimate of instantaneous frequency at each time point.
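A minimal MATLAB sketch of this frequency-sliding procedure is given below. The filter order and the variable signal (a 1 × nSamples magnetometer epoch) are illustrative assumptions; the band edges, transition bandwidth, and median-filter windows follow the description above.

```matlab
% A sketch of the frequency-sliding estimate of instantaneous alpha frequency
% (after Cohen, 2014).
srate    = 1000;                               % sampling rate (Hz)
band     = [7 14];                             % alpha passband (Hz)
trans    = 0.15;                               % 15% transition bandwidth

% Hamming-windowed least-squares FIR bandpass filter
frex     = [0 band(1)*(1-trans) band band(2)*(1+trans) srate/2] / (srate/2);
gain     = [0 0 1 1 0 0];
order    = round(2*srate/band(1));             % ~2 cycles of 7 Hz (assumption)
fkern    = firls(order, frex, gain) .* hamming(order+1)';

filtsig  = filtfilt(fkern, 1, signal);         % zero-phase bandpass filtering
phas     = angle(hilbert(filtsig));            % instantaneous phase angle
instfreq = srate * diff(unwrap(phas)) / (2*pi);% first temporal derivative of phase

% median filtering with 10 window lengths (10-400 ms) to suppress phase slips;
% the median across window lengths is the final estimate
winlens  = round(linspace(0.01, 0.4, 10) * srate);
medmat   = zeros(numel(winlens), numel(instfreq));
for wi = 1:numel(winlens)
    medmat(wi,:) = medfilt1(instfreq, winlens(wi));
end
instfreq_clean = median(medmat, 1);
```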
Source analysis
Source localization began with the combination of individual subject head digitization data with anatomic MRI data to create realistic single-shell head models. Anatomical MRI scans were available for 18 of the 29 participants (Bruker BioSpin MedSpec 4T MR scanner; T1-weighted MPRAGE, TR = 2700 ms, TE = 4.18 ms, flip angle = 7°, isotropic voxel size = 1 mm³). For the remaining participants a standard MNI template (https://brainweb.bic.mni.mcgill.ca/brainweb/) was used with a coherent point drift fitting procedure (Myronenko and Song, 2010). Source localization was based on grid points defined in common MNI space (10-mm spacing, 3294 grid points), which were warped to individual space during estimation and restored to normalized space to create a consistent normalized grid across participants.
The head models were used to estimate time courses of activity for each virtual sensor in source space using a linearly constrained minimum variance (LCMV) beamformer approach (Van Veen et al., 1997). Covariance was estimated for 1- to 30-Hz bandpass filtered magnetometer data from a −300- to 0-ms time window before onset of the first target display and this was used to create a common spatial filter. The estimated time series at every grid point was then subjected to the same instantaneous frequency analysis described above for the sensor-level data, with results averaged over the 300-ms prestimulus time window. Analysis was restricted to the parietal and occipital cortex, where visual areas maintain retinotopy and where differences in the α band have been identified in earlier work (Wutz et al., 2018). Regions of interest were created using the WFU PickAtlas (Maldjian et al., 2003) under SPM12 (www.fil.ion.ucl.ac.uk/spm). Resulting areas were labeled according to the AAL atlas (Tzourio-Mazoyer et al., 2002).
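The beamformer step can be sketched in FieldTrip terms as follows; all configuration values and variable names (data_meg, grid_mni_warped, headmodel_singleshell) are assumptions for illustration rather than the exact pipeline settings.

```matlab
% A FieldTrip-style sketch of the LCMV beamformer step.
cfg                 = [];
cfg.bpfilter        = 'yes';
cfg.bpfreq          = [1 30];                    % 1- to 30-Hz bandpass
data_bp             = ft_preprocessing(cfg, data_meg);

cfg                 = [];
cfg.toilim          = [-0.3 0];                  % -300 to 0 ms before display 1
data_pre            = ft_redefinetrial(cfg, data_bp);

cfg                 = [];
cfg.covariance      = 'yes';                     % covariance for the spatial filter
tlock               = ft_timelockanalysis(cfg, data_pre);

cfg                 = [];
cfg.method          = 'lcmv';
cfg.sourcemodel     = grid_mni_warped;           % 10-mm MNI grid warped to subject space
cfg.headmodel       = headmodel_singleshell;     % realistic single-shell head model
cfg.lcmv.keepfilter = 'yes';                     % keep the common spatial filter
source              = ft_sourceanalysis(cfg, tlock);
```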
Numeric and statistical analysis
To test the effect of the experimental manipulations on accuracy, we performed a two-way repeated measures ANOVA (RANOVA) with factors for task (segregation, integration) and cue type (valid, neutral, invalid). Where assumptions of sphericity were not met, statistics have been Greenhouse–Geisser corrected. Post hoc testing of behavioral data was conducted by Bonferroni-corrected pairwise comparisons using two-tailed t tests. Error bars in figures indicate within-subjects 95% confidence intervals (Morey, 2008).
In line with prior studies of average α rate (Wutz et al., 2018), all analysis of MEG results was based on magnetometer data. In sensor-level and source-level analyses of bilateral shifts in average α rate, conditional differences were identified in neutral-cue trials using cluster-based permutation analysis (Maris and Oostenveld, 2007; 10⁵ permutations; cluster-defining threshold p < 0.05; null distribution generated by randomization of condition labels across participants). Statistical analysis of cue effects on average α is based on mean signal in the 300 ms preceding onset of the first stimulus display. All off-midline sensors posterior to the central sulcus were identified as being ipsilateral or contralateral to the cue location.
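A FieldTrip-style sketch of the sensor-level cluster-based permutation contrast is given below; the per-subject data structures (instfreq_seg, instfreq_int) are hypothetical, and the settings mirror the parameters stated above.

```matlab
% Cluster-based permutation contrast of segregation vs integration
% instantaneous frequency in neutral-cue trials (sketch).
cfg_nb        = [];
cfg_nb.method = 'triangulation';
neighbours    = ft_prepare_neighbours(cfg_nb, instfreq_seg{1});

nsub                 = 29;
cfg                  = [];
cfg.method           = 'montecarlo';
cfg.statistic        = 'ft_statfun_depsamplesT';   % within-participant contrast
cfg.correctm         = 'cluster';
cfg.clusteralpha     = 0.05;                       % cluster-defining threshold
cfg.numrandomization = 1e5;                        % 10^5 permutations
cfg.neighbours       = neighbours;
cfg.design           = [ones(1,nsub) 2*ones(1,nsub); 1:nsub 1:nsub];
cfg.ivar             = 1;                          % condition (row 1 of design)
cfg.uvar             = 2;                          % subject (row 2 of design)
stat = ft_timelockstatistics(cfg, instfreq_seg{:}, instfreq_int{:});
```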
Average α frequency is known to show substantial individual variability in the population (Cecere et al., 2015), and this introduces noise to raw estimates of α frequency in a sample. In order to reduce the impact of this noise on tests of the difference between ipsilateral and contralateral signals, we centered data for each of the integration and segregation conditions on results observed following the neutral cue. For example, to generate the contralateral signal in the left-cue condition, the signal recorded at each right-hemisphere sensor in the neutral-cue condition was subtracted from the signal recorded at that same sensor in the left-cue condition. To generate the ipsilateral signal in the left-cue condition, the signal recorded at each left-hemisphere sensor in the neutral-cue condition was subtracted from the signal recorded at that same sensor in the left-cue condition. Note that because this baselining procedure is applied within each of the integration and segregation conditions separately, it removes the effect of the task manipulation that is shared between the cue and neutral-cue conditions. Results were subsequently entered into a RANOVA with factors for task (segregation vs integration), cortical hemisphere (left vs right), and sensor laterality (contralateral to cued location vs ipsilateral to cued location). Post hoc contrasts of average α frequency relied on permutation tests (10⁵ permutations). Analysis of lateral power relied on a similar procedure.
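Schematically, the neutral-cue baselining for one cue direction looks as follows; the variable names are hypothetical, with each variable holding the average α frequency over the −300 to 0 ms window per posterior sensor.

```matlab
% A schematic sketch of the neutral-cue baselining for the left-cue condition.
contra_leftCue = leftCue_rightSensors - neutral_rightSensors;  % contralateral signal
ipsi_leftCue   = leftCue_leftSensors  - neutral_leftSensors;   % ipsilateral signal
% the right-cue condition is treated symmetrically, and the baselined values
% enter the task x hemisphere x laterality RANOVA described above
```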
Statistical analysis of the effect of the cue on instantaneous frequency at the source level relied on a different analytic approach. In order to maintain and identify asymmetries in effect magnitude across the left and right cortices, we conducted cluster-based permutation contrasts between the segregation and integration conditions separately for each of the left and right cue conditions (10⁵ permutations; cluster-defining threshold p < 0.05). Because the baseline for the cued segregation data (that is, neutral-cue data from segregation trials) is different from the baseline for the cued integration data (that is, neutral-cue data from integration trials), some care was needed in the design of this analysis. A direct contrast of baselined contralateral segregation-task data and baselined contralateral integration-task data would confound any differences in the signals with differences in the baselines. However, because there is no distinction between contralateral and ipsilateral signals in the neutral-cue condition, a shift in the neutral-cue baseline emerges equally in ipsilateral and contralateral signals. This made it possible to generate an empirical null distribution of cluster probability from data in ipsilateral cortex that incorporated any influence of a shift in baseline (through randomization of condition labels across participants). This null distribution was subsequently employed for statistical identification of the effect of task in contralateral source space.
Results
Analysis of accuracy identified robust spatial cueing effects in both the integration and segregation tasks (Fig. 2). Statistical analysis identified significant main effects of cue (F(2,56) = 127.47, p < 0.001, ηp² = 0.820, ηG² = 0.604), reflecting the improvement in performance with valid cues, and task (F(1,28) = 19.62, p < 0.001, ηp² = 0.412, ηG² = 0.163), reflecting higher accuracy in the segregation task. A significant interaction between cue and task also emerged (F(2,56) = 21.83, p < 0.001, ηp² = 0.438, ηG² = 0.070), driven by a greater effect of the cue in the integration task than in the segregation task. Post hoc tests limited to results from the segregation task identified both a significant benefit of the valid cue (t(28) = 4.77, p < 0.001) and a significant cost of the invalid cue (t(28) = 6.40, p < 0.001). Similar results emerged from analysis of the integration task, with a benefit of valid cueing (t(28) = 10.72, p < 0.001) and a cost of invalid cueing (t(28) = 13.79, p < 0.001). These behavioral results broadly replicate Sharp et al. (2018).
Analysis of eye-tracking data identified a residual bias in eye position toward the cue direction in the 300 ms preceding onset of the first display (despite the identification and rejection of trials with eye movement artifacts described above). Mean bias was 0.05° visual angle toward the cue direction, likely reflecting a combination of cue effects on eye drift and microsaccade direction. Importantly, this cue-elicited bias was not sensitive to the experimental task. In a within-participant ANOVA with factors for cue direction (left vs right) and task (segregation vs integration), the effect of cue direction on x-axis position of the eyes was significant (F(1,28) = 11.34, p = 0.002) but the effect of task was not (F(1,28) = 2.86, p = 0.102) and there was no hint of interaction (F < 1).
Frequency spectra for MEG data in the 1 s preceding the first target display are illustrated in Figure 3. These were computed to evaluate the suitability of the data for use of frequency sliding to measure average α frequency. Neural data tends to have 1/f-like structure, with low frequencies having greater power than high frequencies, and this can create a negative bias in the estimation of instantaneous frequency in a bandpass filtered signal (Donoghue et al., 2020). This bias increases as a function of the slope of 1/f structure in the passband signal, creating a situation where shifts in the aperiodic frequency structure (Podvalny et al., 2015) can masquerade as changes in the average instantaneous frequency of a filtered signal. Comparing instantaneous frequency between conditions can therefore be problematic when (1) conditional manipulation introduces a change in the 1/f slope of the data, (2) the oscillatory signal is small or the analytic passband misses the peak in frequency space, or (3) there are substantive power differences between conditions in the passband (Samaha and Cohen, 2022). As illustrated in Figure 3, the current results show none of these characteristics. Alpha is prominent and the passband includes the entirety of the “α bump” for all participants, there is no reliable difference in power between segregation and integration conditions in the average across the target passband (two-tailed t(28) = 1.43, p = 0.166), and slope of the 1/f structure does not appear to differ between experimental conditions.
Results from instantaneous frequency analysis of neutral cue trials are illustrated in Figure 4 and reveal an increase in mean α rate in the segregation condition. Cluster analysis of sensor-level data identified a significant spatiotemporal effect beginning 396 ms before the onset of the first display and sustaining until 20 ms before the onset of the display (Fig. 4A). This effect emerged in a cluster of sensors extending from bilateral occipito-parietal cortex to right lateralized frontal cortex. The spatial and temporal extent of this effect overlaps with that previously observed by Wutz et al. (2018), particularly reproducing the right hemisphere lateralization observed in that study, though the spatial extent is larger and the temporal scope is smaller. This may reflect differences in the noise structure of the data, linked to the fact that the current study had fewer trials per condition than did Wutz et al. (2018) but more total participants. Source analysis of results from neutral-cue conditions is illustrated in Figure 4B. This analysis is based on results from the 300 ms preceding onset of the first display, as was also the case for source analysis by Wutz et al. (2018), and closely replicates the neural sources identified in that earlier study.
To test the idea that strategic effects on prestimulus instantaneous frequency tracked the deployment of spatial attention, we compared results in the 300 ms preceding onset of the first display at posterior sensors ipsilateral and contralateral to the cued location for each of the segregation and integration tasks. We approached the data with the idea that any retinotopic effect must emerge over posterior cortex, but did not otherwise have strong expectations regarding location. With this in mind, we conducted a within-participant ANOVA with factors for task (integration vs segregation), cortical hemisphere (left vs right), and sensor laterality (ipsilateral vs contralateral) for each of the 23 pairs of homologous lateral sensors located posterior to the central sulcus. As illustrated in Figure 5A, the critical Bonferroni-corrected statistical interaction of task and sensor laterality emerged at two contiguous sensor pairs (anterior pair: F(1,28) = 13.42, praw = 0.001, pcorr = 0.024, ηp² = 0.324, ηG² = 0.043; posterior pair: F(1,28) = 21.12, praw < 0.001, pcorr = 0.002, ηp² = 0.430, ηG² = 0.047). Figures 5B,C illustrate the combined signal at these sensors, showing that the α rate contralateral to the attended location was faster than the α rate at ipsilateral sensors when participants expected to segregate sequential stimuli (Fig. 5B), but was slower when participants expected to integrate (Fig. 5C). No other effect emerged at either sensor pair (anterior pair: task, F(1,28) = 2.49, p = 0.126; sensor laterality, F(1,28) = 1.17, p = 0.290; task × sensor hemisphere × sensor laterality, F(1,28) = 1.36, p = 0.254; posterior pair: sensor laterality, F(1,28) = 3.08, p = 0.090; sensor laterality × sensor hemisphere, F(1,28) = 3.12, p = 0.088; all other Fs < 1). Post hoc contrasts were conducted on the combined signal observed at sensor pairs where the interaction survived statistical correction. The difference between ipsilateral and contralateral signals was reliable in the segregation condition (p < 0.001) but not the integration condition (p = 0.181). The lateral effect on average α frequency is illustrated over a longer time course in Figure 5D.
Results from similar analysis of lateral power are illustrated in Figure 6. The purpose of this analysis was to determine whether the variance in instantaneous frequency identified above was mirrored by changes in lateral oscillatory α power at the same sensor locations. With this in mind, power over the 300 ms preceding the first display was analyzed at the sensors identified in Figure 5A, where the interaction of task and sensor laterality emerged in instantaneous frequency. Results from a within-participant ANOVA with factors for task, sensor hemisphere, and sensor laterality identified no significant effects (sensor hemisphere: F(1,28) = 2.58, p = 0.120; task × sensor hemisphere: F(1,28) = 1.39, p = 0.248; task × sensor laterality: F(1,28) = 1.38, p = 0.250; task × sensor hemisphere × sensor laterality: F(1,28) = 3.65, p = 0.070; all other Fs < 1). Results at these sensors therefore do not express a reliable decrease in α power in the hemisphere contralateral to the locus of attention, and do not show any reliable difference in the lateral effect between segregation and integration conditions. This is consistent with the broader literature, where the contralateral decrease in α power associated with the deployment of spatial attention tends to be sourced to ventrolateral visual cortex (Capilla et al., 2014), and therefore does not emerge at dorsomedial sensors like those identified in the analysis of instantaneous frequency here.
In source analysis of instantaneous frequency, a cluster showing increased α frequency for segregation over integration was found in the left hemisphere when the spatial cue indicated locations in the right visual field (MNI x: 8, y: −96, z: −6, p = 0.010; Fig. 7). A similar cluster emerged in the right hemisphere when the spatial cue indicated locations in the left visual field, though this did not reach independent significance (MNI x: −6, y: −100, z: 0, p = 0.130). Both clusters were located in early visual areas at the occipital pole (left and right calcarine, respectively). The left hemisphere cluster extends dorsally along the medial surface of the brain, roughly following the medial expression of areas V1, V2, and V3.
Discussion
Our results demonstrate a previously unknown interaction between temporal and spatial processing in early visual cortex. Reproducing existing results, we found that prestimulus α frequency in posterior cortex was faster when participants knew they needed to segregate sequential visual stimuli than when they were preparing to integrate visual stimuli (Wutz et al., 2018). Going beyond this, we found that this shift in α frequency varied with spatial topography in striate and extrastriate cortex. That is, when the cue identified a location where stimuli needed to be segregated, contralateral α in retinotopic visual cortex was significantly faster than ipsilateral α, whereas when the cue identified a location where stimuli needed to be integrated, contralateral α was nominally slower than ipsilateral α. These results suggest that spatial attention enhances operations underlying the resolution of features defined in the temporal dynamics of visual stimuli, much as it supports the resolution of static visual features like color or orientation.
It is important to note that the experimental and analytic approach we adopt in this study means that the comparison of ipsilateral and contralateral signals within each of the integration and segregation conditions is informative, as is the contrast of these differences that is captured in the statistical interaction, but there is no simple opportunity to directly compare contralateral (or ipsilateral) signals across the integration and segregation conditions. As described in Materials and Methods, we use results observed in the neutral-cue condition as a baseline for results in the cue conditions. This removes broad individual differences in α frequency, increasing statistical sensitivity for conditional variance, but also removes task-related bilateral shifts of α frequency that emerge as a function of temporal preparation in both neutral-cue and cue data. As a result, lateral signals in each of the segregation and integration conditions are not defined in reference to the same baseline, making comparison of contralateral with contralateral (or ipsilateral with ipsilateral) signals potentially misleading. For example, Figure 5 appears to suggest that α in ipsilateral cortex is slower in the segregation task than it is in the integration task. However, as illustrated in Figure 4A, α is in fact broadly, bilaterally faster in the segregation task than it is in the integration task. The illusory impression garnered by comparison of ipsilateral results in Figure 5 is a product of the independence of the baseline data in each of the segregation and integration tasks. This issue also applies in source analysis, where the contralateral difference between segregation and integration is statistically compared against the ipsilateral difference.
This means that we cannot be sure that the conditional difference in lateral α frequency we identify between segregation and integration conditions is entirely instantiated in contralateral cortex. The alternative is that the deployment of spatial attention for segregation or integration has effects on contralateral and ipsilateral cortex of opposite polarity. This would not be entirely surprising. The deployment of attention is known to have this kind of reciprocal lateral effect in other contexts, for example in the ipsilateral increase of α power that is associated with α decrease in cortex contralateral to the locus of attention (Haegens et al., 2011), or in negative fMRI signal in retinotopic cortex responsible for unattended locations that is associated with emergence of positive signal in cortex responsible for attended locations (Tootell et al., 1998). One strategy for further investigation of this issue would be to employ baseline conditions where attention is deployed along the vertical meridian of the display (Hickey et al., 2009; Van Zoest et al., 2021).
A related observation is that spatial attention does not appear to be the sole determinant of temporal processing in these results. That is, analysis of neutral-cue data identified a shift in α frequency in posterior cortex (reproducing Wutz et al., 2018). This effect also emerges in cue trials, but is analytically removed in analysis of lateral results by our baselining procedure. Temporal visual processing thus appears to be sensitive to strategic preparation, independent of the deployment of spatial attention, but spatial attention accentuates this broader influence. In this way, attention tunes vision beyond what is otherwise possible, improving sensitivity for specific types of dynamic events at the attended location.
The pattern of results we observe, with integration associated with slower contralateral α and segregation associated with faster contralateral α, is consistent with the idea that perceptual sampling is directly or indirectly reflected in the speed of α (Akyürek et al., 2007; Wutz et al., 2016, 2018; Mierau et al., 2017). However, the contralateral shift in instantaneous frequency we observe is only independently significant in analysis of the segregation task. One possibility is that the deployment of attention for segregation has an impact on temporal brain dynamics that does not occur in the deployment of attention for integration. However, as segregation and integration lie along a continuum of temporal operations, and are defined only in reference to one another, this would be surprising. A likelier alternative is that, by default, the deployment of attention creates a lateralization of α frequency that supports an increase in temporal resolution at the attended location. This is in line with recent results showing that an increase in prestimulus α speed in visual cortex contralateral to the locus of attention is associated with perceptual sensitivity in the detection of fleeting visual stimuli (Di Gregorio et al., 2022). In the current study, the default contralateral increase in α speed associated with spatial attention may be accentuated when attention is deployed for the explicit purpose of temporal segregation, but diminished (and nominally reversed) when attention is deployed for the explicit purpose of temporal integration.
There is ongoing debate regarding the locus of influence for temporal attention in visual cognition. On one hand is the idea that temporal attention speeds decision-making without impacting the quality of sensory evidence (Seibold et al., 2011; van den Brink et al., 2021). This sort of account is broadly in line with the influential theory that temporal attention may act during the establishment of episodic tokens and transfer of information to working memory (Kanwisher, 1991; Bowman and Wyble, 2007). On the other hand, there are compelling empirical demonstrations of effects of attention on sensory information (Schroeder and Lakatos, 2009; Rohenkohl et al., 2012). This debate has tended to focus on the impact of temporal expectation on visual processing of static stimuli, which is subtly different from the expectation of temporal dynamics that is manipulated in the current study. It is moreover possible that temporal attention acts via multiple mechanisms. However, the current results identify a preparatory effect in early visual cortex that is easily reconciled with the notion of a proactive mechanism that acts on the quality and structure of sensory information, but harder to explain as a mechanism that acts on postsensory decision-making.
The current results are broadly in line with theoretical interpretation of the functional significance of α in vision. Alpha has been empirically linked with neural inhibition (Scheeringa et al., 2009; Haegens et al., 2011; Spaak et al., 2012), and this has motivated the proposal that it reflects rhythmic inhibition in the visual system that gates the propagation of representations through the visual hierarchy (Klimesch et al., 2007; Jensen and Mazaheri, 2010; Mathewson et al., 2011). On this account, posterior α is thought to pause information transfer between visual areas so that local mechanisms can operate without interference from feedforward and feedback connections. The current results suggest that the deployment of attention can flexibly adapt oscillatory activity to strategically optimize the duration of input that fits within the “open” portion of an α cycle. This reveals a novel spatiotemporal aspect of attentional modulation of visual processing that goes beyond spatial or temporal effects in isolation.
The effect of spatial attention on the speed of α that is identified here is small in absolute terms, but has the potential for substantive impact within the framework sketched above. This is perhaps most easily illustrated in consideration of a toy model. Assume that posterior lateral α cycles are composed of equal-duration “open” and “closed” windows, and that perceptual segregation occurs only when stimulus onsets fall within the “open” windows of separate α cycles. Further consider two perceptual events that occur with 65 ms asynchrony (as in the current experiment) in a situation where the deployment of spatial attention for segregation rather than integration causes a total shift of around 0.15 Hz (as in the current results). If baseline α oscillates at 10 Hz in this scenario, the deployment of attention for segregation rather than integration will cause the onsets of the two sensory events to fall in discrete “open” windows of contralateral α ∼5% more often. This is already a substantive difference, but the influence grows as baseline α frequency decreases. At 8 Hz, a 0.15 Hz shift creates an increase of 46%. A small absolute change in retinotopic oscillatory frequency may in this way create an outsize impact on the neural encoding of temporally ambiguous sensory events.
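The arithmetic of this toy model can be made concrete with a short sketch. The version below assumes that the onset of the first event is uniformly distributed over the α cycle and that segregation requires the two onsets to land in the “open” halves of consecutive cycles; the exact percentages obtained depend on these assumptions.

```matlab
% A worked sketch of the toy model described above.
soa = 0.065;                                          % onset asynchrony (s)

relIncrease10 = 100 * (pToy(10.15, soa) - pToy(10, soa)) / pToy(10, soa);
relIncrease8  = 100 * (pToy( 8.15, soa) - pToy( 8, soa)) / pToy( 8, soa);

function p = pToy(f, soa)
    % probability that two onsets separated by soa land in the open halves
    % of consecutive alpha cycles at frequency f (for alpha-range f, the
    % second onset cannot overshoot the following open window)
    T    = 1/f;                    % cycle duration (s)
    open = T/2;                    % duration of the "open" window
    tmin = max(0, T - soa);        % earliest first-onset time for which the
                                   % second onset reaches the next open window
    p    = max(0, open - tmin) / T;
end
```

The qualitative point carried by this sketch is that the same 0.15 Hz shift buys a much larger relative change in segregation probability when baseline α is slow, because at lower frequencies the second onset only just reaches the following cycle's “open” window.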
To conclude, we demonstrate that the strategic deployment of spatial attention optimizes temporal processing by changing the frequency structure of oscillatory activity in retinotopic visual cortex. This combined influence of attention on spatial and temporal processing appears to support efficient and adaptive perceptual processing in dynamic, four-dimensional environments.
Footnotes
This work was supported by the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Program Grant Agreements 313658 (to D.M.) and 804360 (to C.H.). We thank Jeff Anesi for assistance with data collection and David Acunzo for discussion.
The authors declare no competing financial interests.
Correspondence should be addressed to Clayton Hickey at c.m.hickey@bham.ac.uk