Abstract
Dynamic vision requires both stability of the current perceptual representation and sensitivity to the accumulation of sensory evidence over time. Here we study the electrophysiological signatures of this intricate balance between temporal segregation and integration in vision. Within a forward masking paradigm with short and long stimulus onset asynchronies (SOA), we manipulated the temporal overlap of the visual persistence of two successive transients. Human observers enumerated the items presented in the second target display as a measure of the informational capacity read-out from this partly temporally integrated visual percept. We observed higher β-power immediately before mask display onset in incorrect trials, in which enumeration failed due to stronger integration of mask and target visual information. This effect was timescale specific, distinguishing between segregation and integration of visual transients that were distant in time (long SOA). Conversely, for short SOA trials, mask onset evoked a stronger visual response when mask and targets were correctly segregated in time. Examination of the target-related response profile revealed the importance of an evoked α-phase reset for the segregation of those rapid visual transients. Investigating this precise mapping of the temporal relationships of visual signals onto electrophysiological responses highlights how the stream of visual information is carved up into discrete temporal windows that mediate between segregated and integrated percepts. Fragmenting the stream of visual information provides a means to stabilize perceptual events within one instant in time.
Introduction
Relevant information from the visual environment can change dynamically due to either real-world transitions (i.e., change or motion) or internal shifts in focus (i.e., spatial attention, eye movements). However, we experience our sensory surrounding to be coherent and stable in time and space (Melcher, 2011). Perceiving visual stability requires an intricate balance between reading out spatiotemporally invariant representations (i.e., objects) and simultaneously accumulating further sensory evidence over time. Intermediate level vision has to mediate virtually in real time between segregating individual objects detached from their immediate spatiotemporal reference and integration of sensory flux (Oğmen, 1993; Oğmen and Herzog, 2010; Wutz et al., 2012; Wutz and Melcher, 2013).
We investigated the electrophysiological (magnetoencephalographic; MEG) signatures of temporal segregation and integration in vision by presenting observers with two successive sensory signals. Our study took advantage of a forward masking paradigm to manipulate the temporal overlap between two visual transients: mask and target (Di Lollo, 1980; Wutz et al., 2012). The task was enumeration which, unlike simple target detection, requires structuring operations in intermediate level vision [object individuation (Xu and Chun, 2009) and visual routines (Ullman, 1984)] whose outputs in the form of object files can provide visual stability over time (Pylyshyn, 1989; Kahneman et al., 1992). Critically, the capacity of individuation depends upon the stimulus onset asynchrony (SOA) between target and mask, which determines the degree to which visual persistence of the two stimuli is integrated (Wutz et al., 2012; Wutz and Melcher, 2013). We compared correct trials, in which mask and target were successfully segregated in time, to those in which integration by masking was stronger and enumeration failed.
Electrophysiological signatures of temporal segregation and integration in vision were expected to be predominant in three key time periods (Fig. 1). First, as suggested by previous paradigms probing the influence of ongoing brain activity (Varela et al., 1981; Thut et al., 2006; Romei et al., 2008) or top-down control on perception (Hanslmayr et al., 2007a; van Dijk et al., 2008; Keil et al., 2012, 2013; Volberg et al., 2013), we predicted higher power within the alpha (8–12 Hz) to beta frequency range (13–30 Hz) before mask onset for incorrect trials. Second, effects of temporal integration should be observable in the magnitude of the response evoked by the first masking transient (Winkler et al., 1993; Hamada et al., 2001). Finally, we compared target-related processing selectively for short and long SOAs. Adding the specific SOA to the latency of the initial mask-evoked response provides an estimate of when signals related to visual processing of a target display can be expected on a particular sensor (Rieger et al., 2005). We examined the temporal relationship between these expected and observed responses related to individuating target information from masking persistence in close (short SOA) and distant (long SOA) temporal proximity.
This precise mapping of the temporal relationships of visual signals onto electrophysiological responses allowed us to investigate the role of discrete temporal windows in segregation and integration of visual information, as a means to stabilize vision over time.
Materials and Methods
Subjects
Sixteen participants volunteered after giving written informed consent. Two participants were excluded from analysis: one due to excessive artifacts in the MEG data, which contaminated >50% of the trials, the other because of exceptionally bad behavioral performance (<60% of correct responses in the easiest experimental condition; one target item with 200 ms SOA). Fourteen subjects remained in the sample (11 female; mean age M = 25.1 years, SD = 1.9 years; 13 right handed). All participants had normal or corrected-to-normal vision and took part in exchange for payment. The experimental protocol was approved by the local ethics committee.
Stimuli and procedure
Before the experimental runs, each participant completed 50 practice trials to familiarize them with the visual stimulation and the response collection devices. The experimental procedure started only after the mapping between response finger and button box became relatively automatic (at least 20 consecutive fast and correct responses). Visual stimuli were presented to subjects in a dimly lit magnetically shielded room. The visual stimuli were generated on an HP Intel Quad core computer using MATLAB 7.9 (MathWorks) and Psychophysics Toolbox Version 3 (Brainard, 1997; Pelli, 1997). A DLP projector (Panasonic PT-D7700E) projected the visual stimuli at a refresh rate of 60 Hz centered onto a translucent screen [22° (horizontal) × 17° (vertical) of visual angle], located 127 cm from the subjects. The precise timing of the visual stimulation was monitored via a photodiode placed at the upper left corner of the projection screen, and the delay between trigger and stimulation onset was corrected with this method.
Each trial began with a central fixation dot (black, 0.15°) on a white background for 500 ms, followed by a blank white screen for a jittered pre-mask interval (800–1300 ms). The visual stimulus consisted of a forward mask and a target display superimposed onto the masking pattern. On each trial a different pattern of 2250 randomly oriented, partially crossing black lines [mean line length = 0.5° visual angle, mean line width = 0.04°, mean size of whole pattern = 4° (horizontal) × 5.6° (vertical)] was presented centered on a white background first (Fig. 1). This pattern remained on the screen and after a variable onset delay (SOA), from 0 to 4 target items appeared superimposed upon the random line pattern by use of the image processing technique called “alpha blending” (alpha = 0.5 for both displays; Fig. 1). Two diagonally crossing lines (“X”) represented one target item. All items were colored in black, were 0.3° (horizontal) × 0.5° (vertical) of visual angle in size and were placed randomly on one of 16 possible locations within an invisible, central rectangle of 2.4° (horizontal) × 3.3° (vertical) of visual angle in eccentricity with a minimum buffer of 0.4° between the locations (Fig. 1). The physical properties of both mask and target elements, i.e., contrast, mean line length, and mean line width, were equated. Furthermore the alpha blending procedure edited the transparency/opacity values of the visual stimuli assuring a mathematically correct superimposition of local element contrast, without creating any discontinuities in luminance. All these adjustments assured that mask and target elements only differed in their temporal onset exclusively creating partial overlap between the visual persistence of the two transients (Di Lollo, 1980; Wutz et al., 2012; Wutz and Melcher, 2013).
The target display was presented for 50 ms. The preceding masking pattern, however, was on the screen during target presentation plus the independently varied SOA between mask and target display. There were four different SOAs: 0 ms, 33 ms, 50 ms, or 200 ms (Fig. 1). After mask and target offset a white blank screen was presented until the subject's response (which initiated the next trial) or for a maximum of 2 s. The participants' task was to indicate the quantity of perceived items in the target display by lifting the finger in the corresponding optical fiber button boxes, which were assigned one particular number each before the experiment (five boxes for responses 0–4). The finger-response mapping was balanced across subjects. In total each participant ran 20 blocks with 102 trials per block (∼6 min duration).
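As a rough check on the stimulus timing, the sketch below converts the nominal SOAs into whole refresh frames of the 60 Hz projector. It assumes that onsets were locked to the display refresh, which the text does not state explicitly; the names `REFRESH_HZ`, `to_frames`, and `soa_frames` are illustrative only.

```python
# Assumption: stimulus onsets are locked to the 60 Hz display refresh,
# so every nominal duration is realized as a whole number of frames.
REFRESH_HZ = 60
FRAME_MS = 1000 / REFRESH_HZ  # about 16.7 ms per frame


def to_frames(duration_ms):
    """Nearest whole number of refresh frames for a nominal duration."""
    return round(duration_ms / FRAME_MS)


# Nominal SOAs from the text, in milliseconds.
soa_frames = {soa: to_frames(soa) for soa in (0, 33, 50, 200)}
target_frames = to_frames(50)  # target display duration

for soa, frames in soa_frames.items():
    print(f"SOA {soa:>3} ms -> {frames:>2} frames ({frames * FRAME_MS:.1f} ms)")
```

Under this assumption the nominal 33 ms SOA corresponds to two frames and the 50 ms target display to three frames.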
Each of the 16 possible combinations of SOA (0–200 ms) and set size (0–3) was presented six times per block in random order. Displays containing four target items were shown only six times per block and always in trials with an SOA of 200 ms. This set size was included as catch trials to prevent a response bias to always report the highest number when in doubt. Showing target displays of different set sizes is important to measure integration masking per se. However, for the analysis of the electrophysiological activity, we combined the responses for the different set sizes. Since no response bias for the set size of four items was evident, we collapsed the data for all set sizes and only contrasted correct and incorrect trials regardless of the actual set size. Moreover, trials with no targets constitute a separate experimental condition. These mask-only trials, collapsed over the different SOAs, serve as a control condition for mask-evoked activity without additional target processing.
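The block structure described above can be sketched as follows. This is a minimal illustration of how the 102 trials per block arise from the design, not the authors' stimulus code (which was written in MATLAB); the function name `build_block` and the fixed random seed are hypothetical.

```python
import random

SOAS_MS = [0, 33, 50, 200]   # mask-to-target onset asynchronies from the text
SET_SIZES = [0, 1, 2, 3]     # number of target items in the regular design
REPEATS_PER_BLOCK = 6        # each SOA x set-size cell is shown six times
CATCH_TRIALS = 6             # four-item displays, always at 200 ms SOA


def build_block(rng):
    """Return one shuffled block as a list of (soa_ms, set_size) tuples."""
    trials = [(soa, n)
              for soa in SOAS_MS
              for n in SET_SIZES
              for _ in range(REPEATS_PER_BLOCK)]
    trials += [(200, 4)] * CATCH_TRIALS
    rng.shuffle(trials)
    return trials


block = build_block(random.Random(0))
# 16 cells * 6 repeats + 6 catch trials
print(len(block), "trials per block;", 20 * len(block), "trials in 20 blocks")
```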
MEG measurement
Electrophysiological activity was recorded with an on-line sampling rate of 1000 Hz using a whole-head MEG with 102 magnetometers and 204 planar gradiometers (Neuromag306 system; Elekta) in a magnetically shielded room. This system consists of 102 sensor locations each containing a triplet of one magnetometer and two gradiometers. In particular, gradiometer information is sensitive to sources close to the sensor location, i.e., neural generators at the cortical surface. To localize the head position of the subject within the MEG helmet, a subject-specific head-frame coordinate reference was defined before the experimental runs. The cardinal points of the head (nasion and left and right pre-auricular points), the location of five head-position indicator (HPI) coils, and a minimum of 200 other head-shape samples were digitized for motion tracking (3Space Fastrack; Polhemus) at the start of each session. The subject's head position relative to the HPI coils and the MEG sensors was estimated before each experimental run to ensure that no large movements occurred during the data-acquisition procedure.
MEG data analysis
The data were analyzed using the FieldTrip toolbox (Oostenveld et al., 2011) in combination with MATLAB 7.12 (MathWorks). These data were segmented from 800 ms before to 1500 ms after mask onset, downsampled off-line to 250 Hz, and bandpass filtered between 1 and 40 Hz with a two-pass Butterworth filter of order 4. A semi-automatic artifact detection routine identified trials and channels that deviated in amplitude using a summary statistic (variance) of the entire dataset. These trials and channels were removed from the dataset. Next, the data were visually inspected and any remaining trials and channels with artifacts were removed manually. The rejected channels were interpolated with the nearest-neighbors approach for sensor level analysis. For source localization, information from interpolated channels was not used. Finally, the experimental conditions of interest (correct, incorrect, and mask-only trials) were equated in trial number by selecting a random subsample of trials from the conditions with more trials.
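The final equating step can be illustrated with a short sketch. It assumes trials are identified by integer indices and that all conditions are reduced to the size of the smallest one; the function name `equate_trial_counts` and the toy trial counts are hypothetical and do not reflect the actual dataset.

```python
import random


def equate_trial_counts(trials_by_condition, rng):
    """Randomly subsample every condition down to the smallest condition size.

    trials_by_condition maps a condition label to a list of trial indices.
    """
    n_min = min(len(trials) for trials in trials_by_condition.values())
    return {cond: sorted(rng.sample(trials, n_min))
            for cond, trials in trials_by_condition.items()}


# Toy trial indices, for illustration only.
toy = {
    "correct": list(range(10)),
    "incorrect": list(range(4)),
    "mask_only": list(range(6)),
}
equated = equate_trial_counts(toy, random.Random(0))
print({cond: len(trials) for cond, trials in equated.items()})
```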
Event-related fields.
Before calculating event-related fields (ERFs), data were bandpass filtered using a two-pass Butterworth filter with a filter order of 4 and a frequency cutoff between 2 and 20 Hz. After calculating the ERF as the average in amplitude across trials, data from planar gradient pairs were combined using vector addition. ERFs were baseline corrected using an interval of −200 to 0 ms before mask onset.
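The order of operations for the ERF (average across trials, combine the planar gradient pair, baseline correct) can be sketched in plain Python. The sketch reads "vector addition" as the Euclidean norm of the two orthogonal gradient components, which is an assumption about the combination rule, and it treats both ends of the baseline interval as inclusive; all function names are illustrative.

```python
import math


def average_trials(trials):
    """Sample-wise mean across trials (each trial is a list of amplitudes)."""
    n = len(trials)
    return [sum(trial[k] for trial in trials) / n for k in range(len(trials[0]))]


def combine_planar(horizontal, vertical):
    """Combine a planar gradiometer pair sample by sample.

    Assumption: the combined amplitude is the length of the 2D gradient vector.
    """
    return [math.hypot(h, v) for h, v in zip(horizontal, vertical)]


def baseline_correct(signal, times_ms, start_ms=-200, stop_ms=0):
    """Subtract the mean of the baseline interval (ends inclusive, an assumption)."""
    base = [v for v, t in zip(signal, times_ms) if start_ms <= t <= stop_ms]
    offset = sum(base) / len(base)
    return [v - offset for v in signal]


# Toy example with four time points and two trials per gradiometer.
times = [-200, -100, 0, 100]
erf_h = average_trials([[3.0, 3.0, 3.0, 6.0], [3.0, 3.0, 3.0, 6.0]])
erf_v = average_trials([[4.0, 4.0, 4.0, 8.0], [4.0, 4.0, 4.0, 8.0]])
erf = baseline_correct(combine_planar(erf_h, erf_v), times)
print(erf)
```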
Time–frequency analysis.
We calculated time–frequency representations using a Fourier transform approach applied to short sliding time windows in steps of 10 ms. The power estimates were computed by means of a Hanning taper of single trial data for the frequency range from 5 to 40 Hz. The window length of the taper was five cycles per frequency of interest (cpf). This procedure yields good spectral resolution at low frequencies and good temporal resolution at high frequencies. At lower frequencies, however, prestimulus activity close to mask onset could possibly be confounded by stimulus-evoked activity after mask onset due to the inherent temporal smoothing of the Hanning tapering ((1/f)*cpf in seconds; f, frequency; here: 5 cpf). Therefore, we confirmed that the reported findings are not dependent on the length of the windowing function, since similar effects were found using a shorter window length (2 cpf; data not shown). The power values were calculated for the horizontal and vertical component of the planar gradient and then combined via their vector sum.
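The temporal smoothing mentioned above follows directly from the window length formula (1/f)*cpf. The sketch below evaluates it for a few frequencies and for both window settings, and shows how far a window centered at −50 ms reaches past mask onset. The helper names are illustrative, and the choice of −50 ms as the window center is only an example taken from the latencies discussed in the Results.

```python
def window_length_s(freq_hz, cycles=5):
    """Length of the sliding window in seconds: (1 / f) * cycles per frequency."""
    return cycles / freq_hz


def latest_sample_ms(center_ms, freq_hz, cycles=5):
    """Latest time point (ms) contributing to an estimate centered at center_ms."""
    return center_ms + 1000 * window_length_s(freq_hz, cycles) / 2


for cycles in (5, 2):
    for freq in (10, 15, 40):
        length_ms = 1000 * window_length_s(freq, cycles)
        reach_ms = latest_sample_ms(-50, freq, cycles)
        print(f"{cycles} cpf, {freq:>2} Hz: window {length_ms:5.0f} ms, "
              f"estimate at -50 ms includes data up to {reach_ms:+.0f} ms")
```

With five cycles at 15 Hz the window spans about 333 ms, so an estimate centered shortly before mask onset includes post-onset samples, whereas the two-cycle window shortens that overlap considerably.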
Intertrial coherence.
To disentangle the effects of an increase in amplitude and an increase in phase consistency across trials of the visual-evoked response, we computed the intertrial coherence (ITC; Makeig et al., 2004), also called the phase-locking factor (PLF; Tallon-Baudry et al., 1996), within the interval of −200 to 500 ms around mask onset and the frequency range between 5 and 35 Hz, with identical Hanning taper characteristics as described for the calculation of oscillatory power. At each time, frequency, and sensor sample, the result of the Hanning tapering and Fourier transform for each trial is a complex number with a real and an imaginary part. To control for differences in amplitude, the lengths of the complex vectors (representing amplitude and phase) were normalized to one for all trials. Thus, only the information about the phase of the spectral estimate of each trial is taken into account. The extent of phase consistency across trials is quantified by the length of the resultant of these normalized complex vectors along the unit circle. The ITC measure can take values between 0 and 1. A value of 0 represents a random phase angle distribution across trials and a value of 1 indicates perfect synchronization across trials between MEG data and the time-locking events. The ITC values were calculated for the horizontal and vertical component of the planar gradient and then combined via vector addition.
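The ITC definition above can be written down in a few lines. The sketch takes one complex Fourier coefficient per trial (for a single time, frequency, and sensor sample), normalizes each to unit length, and returns the length of the mean resultant vector. The function name `intertrial_coherence` and the toy inputs are illustrative, not taken from the analysis code.

```python
import cmath
import math


def intertrial_coherence(fourier_coeffs):
    """Length of the mean resultant of unit-normalized complex coefficients.

    fourier_coeffs holds one complex spectral estimate per trial.
    Coefficients of exactly zero carry no phase and are skipped.
    """
    unit_vectors = [c / abs(c) for c in fourier_coeffs if abs(c) > 0]
    return abs(sum(unit_vectors)) / len(unit_vectors)


# Identical phase but very different amplitudes: ITC is close to 1,
# because the amplitude is normalized away.
aligned = [cmath.rect(amp, 0.3) for amp in (1.0, 5.0, 0.2)]

# Phases spread evenly around the circle: ITC is close to 0.
spread = [cmath.rect(1.0, k * math.pi / 2) for k in range(4)]

print(intertrial_coherence(aligned), intertrial_coherence(spread))
```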
Source localization.
A structural magnetic resonance image (MRI) was available for 12 of 14 participants. We coregistered the brain surface from their individual segmented MRIs (Nolte, 2003) with a single-shell head model. For 2 of the 14 participants no individual MRI scan was available. For those subjects we obtained the canonical cortical anatomy from the affine transformation of a Montreal Neurological Institute (MNI)-template brain to the subject's digitized head shape. Source activities were projected onto these approximate individual anatomical MRI images and subsequently normalized onto a standard MNI brain (Montreal, Canada; http://www.bic.mni.mcgill.ca/brainweb) using SPM8 (http://www.fil.ion.ucl.ac.uk/spm) to accomplish group statistics and for illustrative purposes. Anatomical structures corresponding to the localized sources of the statistical effects were found using the MNI brain and Talairach atlas (MRC Cognition and Brain Sciences Unit, Cambridge, England; see http://imaging.mrc-cbu.cam.ac.uk/imaging/MniTalairach).
Dynamic imaging of coherent sources beamforming of oscillatory sources.
The neural generators of the effects found in the time-frequency domain were identified by means of dynamic imaging of coherent sources (DICS; Gross et al., 2001), a frequency-domain adaptive spatial filtering algorithm. This algorithm has proven to be particularly powerful when localizing oscillatory sources (Liljeström et al., 2005). A common spatial filter derived from all trials has been applied separately to the different conditions (correct, incorrect). Based on the sensor level effects, power and cross-spectral densities were calculated for 15 Hz (±3 Hz smoothing) and within −500 to 0 ms relative to mask onset. As the pre-mask activity was mainly of interest here, source analysis outputs for both conditions (correct vs incorrect trials) were compared directly without prior normalization.
Linear constrained minimum variance beamforming of evoked sources.
The sources of effects found in the time series analysis were localized using a linear constrained minimum variance beamformer algorithm (LCMV; Van Veen et al., 1997). A common spatial filter based on the signal in all trials has been applied separately to the different conditions (correct, incorrect). The covariance matrix has been derived from the bandpass filtered (cutoff frequencies: 2–20 Hz) signal within the time course +50 to +200 ms after mask onset. No baseline adjustment has been applied.
Information from both the magnetometer and the planar gradiometer sensor systems was used for source localization after appropriately adjusting the balancing matrix according to the distance of the gradiometers (17 mm). A separate analysis using only the planar gradiometers yielded very similar results (data not shown).
Statistical analysis.
Oscillatory and evoked visual activity were compared between the conditions by means of nonparametric cluster-based permutation (dependent samples) t statistics (Maris and Oostenveld, 2007). This procedure effectively controls for the type I error accumulation arising from multiple statistical comparisons at multiple time, frequency, and sensor samples. First, clusters of spatiotemporal-spectral adjacent suprathreshold differences (dependent samples t statistics exceeding p < 0.05, two-sided) were identified. Within one cluster t values were summed up to reveal a cluster level test statistic. Then, random permutations of these data were drawn by exchanging the data between experimental conditions within the participants. The maximum cluster level statistic was recorded after each permutation run, revealing a reference distribution of cluster level statistics (approximated with a Monte Carlo procedure of 1000 permutations in the present study). Cluster-level p values were then estimated as the proportion of values in the corresponding reference distribution exceeding the cluster statistic obtained in the actual data. Source-level comparisons were calculated using dependent-samples t tests within the effects of interest identified on the sensor level.
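The logic of the cluster-based permutation procedure can be sketched in a reduced form. The version below is a simplification made for illustration: it handles a single dimension (adjacent time samples only, no sensor or frequency neighborhood), evaluates only the largest cluster, and takes the critical t value as an argument (2.160 is the two-sided 5% value for 13 degrees of freedom). Exchanging conditions within a participant is implemented as a sign flip of that participant's difference values. All names and the simulated data are hypothetical.

```python
import math
import random


def paired_t(diffs):
    """Dependent-samples t value from per-subject condition differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        return 0.0  # degenerate case, treated as no evidence in this sketch
    return mean / math.sqrt(var / n)


def max_cluster_stat(diff_matrix, t_crit):
    """Largest absolute sum of t values over runs of adjacent suprathreshold samples.

    diff_matrix[s][k] is the condition difference of subject s at sample k.
    """
    n_samples = len(diff_matrix[0])
    t_vals = [paired_t([row[k] for row in diff_matrix]) for k in range(n_samples)]
    best, run, sign = 0.0, 0.0, 0
    for t in t_vals:
        s = (t > t_crit) - (t < -t_crit)  # +1, -1, or 0 (below threshold)
        if s != 0 and s == sign:
            run += t                      # extend the current cluster
        else:
            run, sign = (t, s) if s != 0 else (0.0, 0)
        best = max(best, abs(run))
    return best


def cluster_permutation_p(diff_matrix, t_crit, n_perm=1000, seed=0):
    """Monte Carlo p value of the largest observed cluster."""
    rng = random.Random(seed)
    observed = max_cluster_stat(diff_matrix, t_crit)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in diff_matrix:
            sgn = rng.choice((1, -1))     # swap conditions within one subject
            flipped.append([sgn * v for v in row])
        # Ties are counted as exceeding, a conservative choice made here.
        if max_cluster_stat(flipped, t_crit) >= observed:
            exceed += 1
    return observed, exceed / n_perm


# Simulated data: 14 subjects, 20 samples, a true effect at samples 8-12.
rng = random.Random(1)
demo = [[rng.gauss(0, 1) + (3.0 if 8 <= k < 13 else 0.0) for k in range(20)]
        for _ in range(14)]
stat, p = cluster_permutation_p(demo, t_crit=2.160)
print(f"max cluster statistic = {stat:.1f}, Monte Carlo p = {p:.3f}")
```

A full implementation would additionally define adjacency across sensors and frequencies and would report a p value for every cluster against the same reference distribution.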
Results
Behavioral data
The proportions of correct trials within the set sizes and SOAs of interest were fed into a within-subjects ANOVA. Consistent with previous results (Wutz et al., 2012; Wutz and Melcher, 2013), enumeration performance improved with increasing SOA within the set sizes (1–3 items) and SOAs (33–200 ms) of interest (SOA: F(2,26) = 89.36, p < 0.001). In fact, a closer examination of the error distributions revealed that the most frequent incorrect response was to report one item less than was actually presented (≈50% of all incorrect responses). Observers seldom missed detecting the onset of the second target display entirely (erroneously responding 0 targets comprised <5% of all trials), but instead failed to converge toward the correct response within the effective persistence of the visual image. As noted previously, this pattern of results is most likely due to increasing enumeration performance with less temporal integration (and hence better temporal segregation) of mask and target visual information (Scheerer, 1973; Di Lollo, 1980; Enns and Di Lollo, 2000; Wutz et al., 2012; Wutz and Melcher, 2013). The second main effect (set size), showing better performance with smaller presented numerosities (F(2,26) = 43.82, p < 0.001), and the interaction term were also significant (SOA × set size: F(4,52) = 7.57, p < 0.001). Replicating previous findings (Wutz et al., 2012), small set size displays could be efficiently enumerated with short SOAs and already reached asymptotic performance (≈90% correct) thereafter, whereas performance with higher set sizes improved also with longer SOAs, within this ordinal interaction (Table 1).
Since enumeration performance was not above chance level for the 0 ms SOA (common onset masking), this condition was not included as a cell in the ANOVA design. The immense difference in the proportion of correct trials between common onset masking (0 ms SOA) and the 33 ms SOA condition (>60%), however, shows what enormous impact a temporal lag as small as 33 ms has on task performance and the entire psychometric function (Table 1). Indeed, since detection performance is only marginally impaired at such a small temporal onset asynchrony an enumeration task is required to be sensitive to the accumulation of information in this short time frame (Wutz and Melcher, 2013). Moreover, set size 0 trials were not included in the ANOVA design, since there is a fundamental conceptual difference between detecting and individuating physically present target items and detecting the absence of visual targets (Table 1).
MEG data
To identify electrophysiological signatures of temporal segregation and integration prior and in response to the presentation of visual transients, we first globally contrasted correct and incorrect trials collapsed over the three different SOAs (33, 50, and 200 ms). In a subsequent step we investigated whether those signatures might be timescale specific, yielding different patterns for short (33 and 50 ms) and long (200 ms) SOA trials. Mask-only trials served as a control condition associated with processing of a single stimulus.
Pre-mask oscillatory power
We started off by comparing differences in oscillatory power between correct and incorrect trials in the frequency range from 5 to 40 Hz averaged over the entire pre- to peri-mask interval (−500 to +50 ms around mask onset). A cluster of central-to-right occipital parietal gradiometer sensor locations (Fig. 2B,C) showed significant negative differences (correct < incorrect) in the lower beta band at ∼15 Hz (p < 0.04; Fig. 2A). A closer look at the time course of the differences in oscillatory power revealed two temporal maxima at approximately −350 and −50 ms before mask onset (Fig. 2A,E). DICS beamforming in the pre-mask interval (−500 to mask onset; 0 ms) at 15 Hz (±3 Hz smoothing) suggested that neural generators at the right occipital pole (peak difference t(13) = −4.54; MNI coordinates [34.0 −88.0 0]) and in left ventral occipital to inferior temporal areas (t(13) = −4.2; MNI coordinates [−49.0 −48.0 −9.0]) were involved in this power difference seen at the sensor level (Fig. 2D).
β-Power decreased both for correct and incorrect trials with approaching mask onset, indicating that mask onset may have been anticipated by the participants (Fig. 2E). This pattern suggests that the observed effect may be due to cognitively induced prestimulus activity as opposed to fluctuations in an ongoing occipital beta rhythm. It is noteworthy that we also observed oscillatory power differences in the alpha frequency range (8–12 Hz; Fig. 2A) conforming to previous findings (Hanslmayr et al., 2007a; van Dijk et al., 2008). The higher power in incorrect trials compared with correct trials, however, did not reach significance on a cluster level (that controls for multiple comparisons) in the alpha band.
Subsequently, we investigated whether the observed effect in oscillatory power in the lower beta frequency band (∼15 Hz) in the 500 ms interval before mask onset was timescale specific for the different levels of SOA. Therefore we ran a similar cluster t statistic (frequency of interest from 5 to 40 Hz, gradiometer sensors, averaged over the interval −500 to +50 ms around mask onset) between correct and incorrect trials, now each divided within the three different SOAs (33, 50, and 200 ms). In short SOA trials (33 and 50 ms) no significant clusters of power differences were found (Fig. 3A,B). For long SOA trials (200 ms), however, a cluster of right occipital sensor locations showed significant differences at 15 Hz (p < 0.025; Fig. 3C). The general trend of higher β-power within incorrect compared with correct trials in the pre-mask period, however, was observable for all SOAs, but strongest for the long SOA trials (200 ms). This effect reached its maximum immediately before mask onset (−50 ms; Fig. 3D).
Visual-evoked response
In the second stage of the analysis, we examined the evoked activity to the two transients: the forward mask and the addition of 1–4 target items. A cluster-based permutation procedure applied on all gradiometer sensor locations and the time interval −500 to +1000 ms relative to mask onset revealed a cluster of central parietal sensors (Fig. 4B,C) that showed a significant positive difference (p < 0.003; correct > incorrect) at ∼100 ms after mask onset (Fig. 4A). The time series of visual-evoked amplitude averaged over correct and incorrect trials, respectively, differed significantly around the peak-positive deflection (Fig. 4A; +60 to +130 ms relative to mask onset). The LCMV source solution yielded relatively widespread activity differences in the interval +50 to +200 ms relative to mask onset onto mostly left hemispheric parietal areas (peak difference t(13) = 6.1; left inferior parietal; MNI coordinates [−49.0 −55.0 52.0]; Fig. 4D).
ITC versus amplitude of the visual-evoked field
The higher evoked response to the forward masking event in trials in which the masking effect was weak (correct trials) is counterintuitive, if one assumes a positive linear relationship between response amplitude and masking efficacy. Therefore we tried to estimate to what extent the observed effect is actually due to increases in response amplitude. Theoretically there are two equally plausible explanations as to the generation of measurable differences in the visual-evoked field averaged over trials. First, in one condition the evoking stimulus (or stimuli) could have resulted in higher (or lower) amplitude ∼100 ms after stimulus onset in the underlying source of the signal. Second, in one condition the evoking stimulus (or stimuli) could have resulted in a more (or less) consistent phase reset peaking ∼100 ms after stimulus onset in the underlying source of the signal. In both cases, the averaged amplitude over trials would be higher (or lower) in the respective condition.
We estimated the consistency of the phase alignment in response to the external event by computing the ITC for correct and incorrect trials on all gradiometer channels within the frequency range of 5–35 Hz and in the interval from −200 to +500 ms relative to mask onset. Whereas ITC over all (correct and incorrect) single trials was low in prestimulus intervals (≈0.1 average ITC for all trials −200 to −100 ms before mask onset at 10 Hz within sensor cluster; Fig. 4E; top inset), α-phase synchronized within a brief temporal window in response to mask onset (≈0.7 average ITC for all trials +50 to +150 ms after mask) and returned relatively quickly to prestimulus levels thereafter (≈0.2 average ITC for all trials +300 to +450 ms after mask). Within this short-lived temporal window, however, ITC in the alpha frequency range (7–12 Hz) differed extensively between conditions within a cluster of bilateral occipital sensors (Fig. 4E; correct > incorrect, p < 0.001). A similar nonparametric permutation procedure within the same time range and sensors applied on power did not reveal significant differences between correct and incorrect trials on the cluster level (smallest cluster p > 0.7). In fact Figure 4E shows only those spectral-temporal samples significant in ITC on the cluster level (p < 0.05) that did not also show a concomitant significant increase in power on the uncorrected, single sensor level (p < 0.05).
Results indicate that the increased amplitude of visual-evoked responses for correct as compared with incorrect trials was at least in part due to a stronger phase synchronization locked to mask onset in the first condition. Note, however, that we are not making any general claim that the visual-evoked field was generated by a phase reset (Makeig et al., 2002; Hanslmayr et al., 2007b); instead, we emphasize that the observed difference in evoked activity between the experimental conditions of interest in our paradigm (correct and incorrect trials) was to a large extent due to phase consistency over trials and not to an amplitude increase. Effects in α-phase synchronization were most evident within a short-lived time interval after mask-evoked reset (100–200 ms) in which short SOA targets are supposed to arrive (133, 150 ms), but relatively low after this temporal window when long SOA target signals (300 ms) are expected (Figs. 1, 4E).
Visual-evoked activity for short and long SOAs
Next, we wanted to further characterize the temporal dynamics in response to these masking event(s) separately for visual transients in close and distant time intervals (short vs long SOA). We ran a similar cluster t statistic on the ERF (time series of interest from −500 to 1000 ms relative to mask onset, gradiometer sensors) in correct and incorrect trials, now each divided within the three different SOAs (33, 50, and 200 ms). In long SOA trials (200 ms) no significant clusters of amplitude differences were found (smallest cluster p > 0.4; Fig. 5C). For short SOA trials (33 and 50 ms), however, a cluster of central parietal sensor locations showed significant differences ∼100 ms after mask onset (for 33 ms SOA: p < 0.002 from +70 to +140 ms; for 50 ms SOA: p < 0.025 from +100 to +120 ms relative to mask onset; Fig. 5A,B). The general trend of stronger evoked amplitude within correct compared with incorrect trials around the peak-positive deflection, however, was observable for all SOAs, but strongest for the short SOA trials (33 and 50 ms; Fig. 5D).
Interestingly, the same pattern was observable in purely phase-locked evoked activity measured in ITC. Although significant within an occipital sensor cluster in the alpha frequency band (7–12 Hz) for all SOAs (same parameters as in the first analysis step; frequency of interest from 5 to 35 Hz, gradiometer sensors, time series of interest −200 to +500 ms relative to mask onset; for 33 ms SOA: p < 0.001 from 0 to +300 ms; for 50 ms SOA: p < 0.001 from 0 to +300 ms; for 200 ms SOA: p < 0.04 from +100 to +300 ms relative to mask onset), the effect size between correct and incorrect trials in ITC was almost 7 (for 33 ms SOA) or 4.5 (for 50 ms SOA) times bigger in short SOA trials compared with long SOA trials.
Target-related response profile
Evidence for timescale selectivity in the evoked response is further fostered by a differential target-related response profile between short and long SOA trials. Evoked magnetic fields of mask plus target trials (correct and incorrect trials) deviated from control mask-only trials consistently across SOAs in a later time window that varied with SOA (Fig. 5A–C). Due to this temporal dependency of this effect to target onset, it was probably not observable in the more global first analysis step. For 33 ms SOA, correct and incorrect trials differed significantly from mask-only activity in a left occipital cluster of sensors within the interval from +230 to +300 ms relative to mask onset (p < 0.001; Fig. 5A; but not very visible on this particular sensor). For 50 ms SOA, an occipital cluster of sensors showed this effect within the interval from +260 to +330 ms relative to mask onset (p < 0.02; Fig. 5B). For 200 ms SOA, this effect was located in an occipital parietal cluster of sensors within the interval from +320 to +460 ms relative to mask onset (p < 0.001; Fig. 5C).
The latency of the mask-evoked response (+100 ms) can serve as a temporal reference for when evoked responses in general are expected to arrive at this particular cluster of sensors (Fig. 1). Adding the specific SOA to this reference thus provides an estimate of when signals related to target processing can be expected (Rieger et al., 2005). In long SOA trials (200 ms), expected (100 + 200 ms) and observed (320 ms after mask onset) latencies match quite well. It is important to note that no other significant indicator of only target-related activity was found until 1 s after mask onset on the cluster level within long SOA trials (smallest p > 0.1). In contrast, for short SOA trials (33 and 50 ms) there is a temporal delay in target-related responses between expected (100 + 33/50 ms) and observed values (230/260 ms after mask onset). A response to short SOA target displays is first measurable ∼100 ms later than would be expected if physical time mapped affinely onto electrophysiological time (Figs. 1, 5A–C).
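The expected-versus-observed comparison in this paragraph is simple arithmetic and can be written out as a small sketch. The latencies are the values reported in the text; the variable and function names are hypothetical.

```python
# Latency of the mask-evoked response used as the temporal reference (ms).
MASK_RESPONSE_LATENCY_MS = 100

# Observed onset of the target-related cluster (ms after mask onset), per SOA (ms).
observed_onset_ms = {33: 230, 50: 260, 200: 320}


def expected_target_latency(soa_ms, reference_ms=MASK_RESPONSE_LATENCY_MS):
    """Expected target-related latency if physical time maps affinely onto response time."""
    return reference_ms + soa_ms


# Delay = observed minus expected latency, in ms, for each SOA.
delays_ms = {
    soa: observed - expected_target_latency(soa)
    for soa, observed in observed_onset_ms.items()
}
```

Under these numbers the delay is 97 ms for the 33 ms SOA and 110 ms for the 50 ms SOA, but only 20 ms for the 200 ms SOA.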
Target displays with a longer SOA also evoked a stronger visual response quantified in cluster effect size compared with short SOA trials (almost four times bigger than in 33 ms and almost nine times bigger than in 50 ms SOA trials). This tendency of decreased attenuation of the evoked response to the second stimulus with increasing onset asynchrony between two successive stimuli is commonly observable within paired stimulus paradigms (Hamada et al., 2002). The findings are consistent with the characterization of the evoked response as an indicator of the lifetime of sensory memory, based on the attenuation profile of somatosensory responses (Wikström et al., 1996; Hamada et al., 2001; Wühle et al., 2010).
Interaction between pre-mask oscillatory power and visual-evoked response across SOA
Signatures of temporal segregation and integration can be found both in prestimulus and poststimulus intervals. A more thorough examination of the data, however, reveals that these signatures are timescale specific. Whereas differences in oscillatory power before mask onset occur mainly within long SOA trials (200 ms), effects in evoked activity are found foremost within short SOA trials (33 and 50 ms). To pin down this interaction statistically, we directly compared the effect sizes between prestimulus and poststimulus effects across short (33 and 50 ms) and long SOAs (200 ms). Because of the large differences in magnitude between prestimulus and evoked activity (a factor of $10^{12}$), the data points of interest [power within the occipital cluster of sensors (Fig. 2C) at 15 Hz and −50 ms relative to mask onset, and amplitude at the central parietal sensor (Fig. 4B,C) at +115 ms after mask onset; Figs. 3D and 5D] were standardized to bring them onto a common scale (z-scoring). First, we calculated for each subject the average prestimulus and poststimulus activity across correct/incorrect trials and different SOAs at the data points of interest. Based on these individual values we estimated the mean (M) and its SD across subjects separately for prestimulus and poststimulus activity and re-referenced each data point to its corresponding sample estimates [z-score: $z_{i,j} = (x_{i,j} - M_{:,j})/SD_{:,j}$; e.g., $z_{\text{subj1},\text{pre}} = (x_{\text{subj1},\text{pre}} - M_{:,\text{pre}})/SD_{:,\text{pre}}$]. Then effect size was calculated separately for prestimulus and poststimulus effects as the difference of these z-scores between correct and incorrect trials for each SOA. Since prestimulus and poststimulus effects have different algebraic signs ($z_{\text{cor},\text{pre}} < z_{\text{incor},\text{pre}}$; $z_{\text{cor},\text{post}} > z_{\text{incor},\text{post}}$), the reference category had to be inverted for prestimulus and poststimulus effect size calculation to yield effect sizes of equal direction ($\Delta z\text{-score}_{\text{pre}} = z_{\text{incor}} - z_{\text{cor}}$; $\Delta z\text{-score}_{\text{post}} = z_{\text{cor}} - z_{\text{incor}}$).
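The standardization steps above can be sketched as follows. This is a minimal illustration on made-up numbers, not the study's data or code; the data layout, the function name `delta_z`, and the use of the sample SD (n − 1 denominator) are assumptions.

```python
from statistics import mean, stdev


def delta_z(data, invert):
    """Per-subject effect size (difference of z-scores) for each SOA.

    data: one dict per subject mapping (outcome, soa_ms) -> raw value, where
          outcome is "correct" or "incorrect" (hypothetical layout).
    invert: True for prestimulus power (incorrect - correct),
            False for poststimulus amplitude (correct - incorrect).
    """
    # Step 1: average each subject's values across outcomes and SOAs.
    subject_means = [mean(list(cells.values())) for cells in data]
    # Step 2: mean and SD of these averages across subjects (sample SD assumed).
    m, sd = mean(subject_means), stdev(subject_means)
    soas = sorted({soa for cells in data for (_, soa) in cells})
    sign = -1.0 if invert else 1.0
    effects = []
    for cells in data:
        # Step 3: re-reference every data point to the sample estimates.
        z = {key: (x - m) / sd for key, x in cells.items()}
        # Step 4: signed difference between correct and incorrect trials.
        effects.append(
            {soa: sign * (z[("correct", soa)] - z[("incorrect", soa)]) for soa in soas}
        )
    return effects


# Hypothetical toy values for three subjects and a single SOA.
toy_pre = [
    {("correct", 200): 0.5, ("incorrect", 200): 1.5},
    {("correct", 200): 1.5, ("incorrect", 200): 2.5},
    {("correct", 200): 2.5, ("incorrect", 200): 3.5},
]
toy_effects = delta_z(toy_pre, invert=True)
```

Because prestimulus and poststimulus values are each divided by their own SD, the resulting effect sizes are unit-free and can enter the same analysis despite the difference in raw magnitude.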
A 2 × 3 within-subjects ANOVA (prestimulus/poststimulus effect × 3 levels of SOA) revealed no significant main effects (prestimulus/poststimulus: F(1,13) = 0.68, p > 0.4; SOA: F(2,26) = 0.45, p > 0.6). More importantly, however, the interaction term yielded a highly significant effect (prestimulus/poststimulus × SOA: F(2,26) = 4.71, p < 0.018). As expected, strong effects for pre-mask oscillatory power were mainly found within long SOA trials (200 ms). Conversely, short SOA trials (33 and 50 ms) showed pronounced effects particularly for evoked activity (Fig. 6).
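For readers who want to see how the interaction term of such a design is formed, the sketch below computes the F ratio for the interaction in a two-factor, fully within-subjects layout using the textbook sums-of-squares decomposition. It is an illustration under assumptions: the data are randomly generated toy values (14 simulated subjects, matching the reported degrees of freedom), the function name is hypothetical, no sphericity correction is applied, and the reported F values are not reproduced.

```python
import random


def rm_anova_interaction(y):
    """F ratio for the A x B interaction; y[s][a][b] is subject s, level a, level b.

    Returns (F, df_effect, df_error).
    """
    n, na, nb = len(y), len(y[0]), len(y[0][0])
    S, A, B = range(n), range(na), range(nb)
    gm = sum(y[s][a][b] for s in S for a in A for b in B) / (n * na * nb)
    m_a = [sum(y[s][a][b] for s in S for b in B) / (n * nb) for a in A]
    m_b = [sum(y[s][a][b] for s in S for a in A) / (n * na) for b in B]
    m_s = [sum(y[s][a][b] for a in A for b in B) / (na * nb) for s in S]
    m_ab = [[sum(y[s][a][b] for s in S) / n for b in B] for a in A]
    m_sa = [[sum(y[s][a][b] for b in B) / nb for a in A] for s in S]
    m_sb = [[sum(y[s][a][b] for a in A) / na for b in B] for s in S]
    # Interaction sum of squares: cell means with both main effects removed.
    ss_ab = n * sum((m_ab[a][b] - m_a[a] - m_b[b] + gm) ** 2 for a in A for b in B)
    # Error term: subject x A x B residual.
    ss_err = sum(
        (y[s][a][b] - m_ab[a][b] - m_sa[s][a] - m_sb[s][b]
         + m_a[a] + m_b[b] + m_s[s] - gm) ** 2
        for s in S for a in A for b in B
    )
    df_ab = (na - 1) * (nb - 1)
    df_err = df_ab * (n - 1)
    return (ss_ab / df_ab) / (ss_err / df_err), df_ab, df_err


# Hypothetical crossover pattern: rows are prestimulus/poststimulus effects,
# columns are the three SOAs (33, 50, 200 ms); Gaussian noise is added per subject.
rng = random.Random(0)
pattern = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
toy = [
    [[pattern[a][b] + rng.gauss(0.0, 0.3) for b in range(3)] for a in range(2)]
    for _ in range(14)
]
f_value, df_effect, df_error = rm_anova_interaction(toy)
```

With 14 subjects, 2 levels of effect type, and 3 SOAs, the degrees of freedom come out as (2, 26), in line with the interaction test reported above.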
Discussion
We found two main signatures of temporal segregation and integration mechanisms in our paradigm: pre-mask β-oscillatory power and the evoked α-phase-locked component of the visual response to transient onset. These seem to be two relatively independent prestimulus and poststimulus effects that can be distinguished based on their relative contribution to segregating temporally close (<100 ms) or farther apart (200 ms) visual transients. Considering also the third critical time period of evoked responses to the target, short and long SOA trials can be further distinguished based on latency differences. Visual responses to targets in close temporal proximity to a previous mask (short SOA trials) were delayed by ∼100 ms compared with responses to targets presented later in time (long SOA trials). Within this 100 ms window the consistency of the phase of α-oscillations (i.e., approximately within one cycle) was indicative of correct or incorrect performance, but only for short SOA trials. On long SOA trials, on the other hand, modulations in pre-mask β-oscillatory power were associated with task performance.
Prestimulus oscillatory power
Incorrect trials, in which mask and target information were more closely integrated into a single percept, showed strong lower β-oscillations (∼15 Hz) throughout the entire prestimulus interval (Figs. 2, 3). Periods of strong β-power have been implicated in top-down control in integration of multisensory signals (Keil et al., 2012, 2013) and spatial contour elements (Volberg et al., 2013). A key attribute of integration across different domains is that local units are combined into a common perceptual or cognitive set (Wertheimer, 1923; Field et al., 1993). β-Oscillatory activity, in particular, has been linked to computational states of maintenance or persistence of the current perceptual/cognitive set, in opposition to a bias toward enhanced sensitivity to new information and expectancy of change (Engel and Fries, 2010).
In temporal vision, integrating successive processing iterations within the current perceptual set can help to provide coherence and continuity of the sensory environment. Both correct and, to a lesser extent, incorrect trials reveal a clear tendency toward decreasing power at 15 Hz as mask display onset approaches, and therefore with increasing anticipation of perceptual change (Figs. 2, 3). Integration of sensory information within the current perceptual set via strong β-oscillations might be the default state of the visual system to emphasize continuity.
Within correct segregation trials, β-oscillations were effectively downregulated in a time-locked manner immediately before mask display onset (Figs. 2, 3). This temporal profile suggests that participants were able to predict the impending onset of the stimuli, enabling them to induce a specific neural state to exert top-down control.
Classically, β-oscillatory regulation has been reported within sensory-motor tasks like motor imagery (Bai et al., 2008; Waldert et al., 2008), voluntary movement control (Pogosyan et al., 2009), and anticipatory perceptual decision making (Donner et al., 2009). But regulating β-oscillations has, in particular, been implicated in top-down control in perceptual and cognitive operations (for review, see Engel and Fries, 2010), including predictive coding of upcoming perceptual events (Roelfsema et al., 1997; Bastos et al., 2012), visual search (Buschman and Miller, 2007), perceptual change in bistable images (Okazaki et al., 2008), and ambiguous auditory sounds (Iversen et al., 2009).
In the current paradigm, top-down regulation of β-oscillations could signal the observers' anticipation of the upcoming sensory change. This induced sensitivity to new information, however, has a limited temporal resolution. Induced oscillatory amplitude regulations before stimulus onset determine whether temporally distant visual transients (long SOA trials) are segregated or integrated, but play a negligible role for short SOA trials (<100 ms). If the sensory signal has strong bottom-up constraints, as in short SOA trials in which (temporal) proximity serves as a strong integration cue (Feldman, 2001; Elder and Goldberg, 2002), predictive top-down regulation is rendered ineffective. For such fast visual transients more fine-grained temporal coding is needed.
Poststimulus phase reset
As a second main signature of temporal segregation and integration, we observed effects in the visual-evoked field at ∼100 ms after mask display onset. This effect was characterized by stronger phase consistency within approximately one α-cycle (from 100 to 200 ms after mask onset) for correct compared with incorrect trials (Fig. 4E). Its observed bandwidth matches psychophysical measures of the effective duration of integration masking (Scheerer, 1973; Enns and Di Lollo, 2000; Breitmeyer and Ogmen, 2006; Wutz et al., 2012), which in turn reflects the trace a visual stimulus leaves in iconic memory (Sperling, 1963; Di Lollo, 1980; Loftus et al., 1992). In fact, strong phase locking is particularly important for segregation of fast visual transients, whose traces overlap within this temporal window, as in short SOA trials (Figs. 1, 5). Exact phase information within this integration window may be key to correct individuation of target information from temporally overlapping masking persistence.
Perturbations in phase consistency could be a consequence either of mask onset alone, acting as a reset event, or of interactions between mask and target transients. A strong reset event in close temporal proximity might ignite an informational trace with higher temporal resolution in which evidence can be accumulated more efficiently (Dehaene, 2011; Zylberberg et al., 2011). Recently, evidence for the importance of a reset event on subsequent psychophysical α-oscillations has been established behaviorally (Landau and Fries, 2012). The importance of ongoing oscillations in prestimulus intervals for fluctuations in detection thresholds of barely noticeable visual stimuli has been demonstrated in electroencephalographic studies (Busch et al., 2009; Mathewson et al., 2009). The strong forward mask might have reset the phase to different states of such an ongoing sampling rhythm at the time when target-evoked signals are expected to arrive. At least within the time window in which phase synchrony is high (∼100 ms, short SOA trials) this might help to explain differences in target detection between conditions (although observers rarely missed the target display entirely; see results of error analysis in behavioral data). A thorough examination of single trial distributions of relative phase angles between correct and incorrect conditions in the alpha band (analysis similar to Busch et al., 2009), however, did not reveal any indication of phase opposition reflecting (sub-) optimal processing states in mask or target-related epochs (data not shown).
Alternatively, the difference in phase consistency across trials might reflect interaction between mask and target. Within a similar paradigm, but together with backward masking, visual-evoked signals sum highly nonlinearly on the scalp topography, especially when mask and target follow in close temporal succession (Rieger et al., 2005). Hence, interactions of signals from transients in close temporal proximity might perturb phase-locked responses. Phase information has been theorized to be important in connecting coupled systems, like visual signals and visual systems, through coherence (Fries, 2005) or synchrony (von der Malsburg and Schneider, 1986; Singer, 1999; Engel et al., 2001). In particular, visual saliency has been associated with a translation into a phase code via timed release of inhibition (Van Rullen and Thorpe, 2001; Klimesch et al., 2007; Jensen et al., 2012). Exact phase coding around transient onset may therefore provide a precise temporal integration window within which structuring and individuation of the sensory image relies on this inhibitory timing to accurately encode visual information. Perturbations to this mechanism mediate between segregated and integrated mask-target percepts.
Temporal integration windows
Our perceptual impression reflects the need to construct stable and coherent objects and scenes while also remaining sensitive to new information with high temporal resolution (Melcher, 2011). Given that sensory input arrives continuously, the visual system must mediate between stable and flexible representations virtually in real time (Öğmen and Herzog, 2010). Here we show that when the sensory environment changes rapidly, as in short SOA trials, segregation of these changes depends on precise phase coding within a brief temporal window. Conversely, temporal segregation of sensory changes exceeding this critical time frame depends on slower power modulations before stimulus onset.
In contrast to previous studies on the temporal dynamics of target detection, we took advantage of a more sensitive enumeration task to probe the early structuring computations within the sensory image [object individuation (Xu and Chun, 2009) and visual routines (Ullman, 1984)], whose outputs can provide visual stability over time by indexing salient items (Pylyshyn, 1989; Kahneman et al., 1992). When two visual transients are presented in rapid succession, their persistence is partly integrated and thus the time to access the sensory trace of each single stimulus is reduced. In this way, temporal integration limits the computational capacity of individuation of multiple items from a single iconic trace (Wutz et al., 2012). The current study provides evidence that these mask-target interactions occur within a rapid temporal integration window (≈100 ms) that maintains the trace of visual persistence.
Precise temporal synchrony (through, e.g., short-lived eigenfrequency damped oscillations; Buzsáki and Draguhn, 2004) within this integration window could provide the high temporal resolution necessary for stability of the perceptual representation despite rapid sensory changes. The transmission of a continuous signal into discrete individual entities, however, is necessarily limited by the bandwidth of the carrier function (Shannon, 1948). Thus, such temporal windows constrain the real-time dynamics of visual processing, but likewise offer an explanation for its limited informational capacity (Sperling, 1963; Cowan, 2001). Fragmenting the continuous stream of visual information into different windows of temporal integration provides a neuronal mechanism to maintain the equilibrium between the competing challenges of providing fine temporal and informational resolution of the environment, stabilizing vision within a perceptual instant of time.
Footnotes
This research was supported by a European Research Council Grant (agreement 313658). We thank Gianpaolo Demarchi and Gianpiero Monittola for help and advice in MEG data acquisition.
The authors declare no competing financial interests.
Correspondence should be addressed to Andreas Wutz, Center for Mind and Brain Sciences, University of Trento, Palazzo Fedrigotti-corso Bettini 31, 38068 Rovereto (TN), Italy. andreas.wutz-1@unitn.it
This article is freely available online through the J Neurosci Author Open Choice option.