Abstract
During binocular rivalry, conflicting images are presented one to each eye and perception alternates stochastically between them. Despite stable percepts between alternations, modeling suggests that neural signals representing the two images change gradually, and that the durations of stable percepts are determined by the time required for these signals to reach a threshold that triggers an alternation. However, direct physiological evidence for such signals has been lacking. Here, we identify a neural signal in the human visual cortex that shows these predicted properties. We measured steady-state visual evoked potentials (SSVEPs) in 84 human participants (62 females, 22 males) who were presented with orthogonal gratings, one to each eye, flickering at different frequencies. Participants indicated their percept while EEG data were collected. The time courses of the SSVEP amplitudes at the two frequencies were then compared across different percept durations, within participants. For all durations, the amplitude of signals corresponding to the suppressed stimulus increased and the amplitude corresponding to the dominant stimulus decreased throughout the percept. Critically, longer percepts were characterized by more gradual increases in the suppressed signal and more gradual decreases of the dominant signal. Changes in signals were similar and rapid at the end of all percepts, presumably reflecting perceptual transitions. These features of the SSVEP time courses are well predicted by a model in which perceptual transitions are produced by the accumulation of noisy signals. Identification of this signal underlying binocular rivalry should allow strong tests of neural models of rivalry, bistable perception, and neural suppression.
Significance Statement
During binocular rivalry, two conflicting images are presented to the two eyes and perception alternates between them, with switches occurring at seemingly random times. Rivalry is an important and longstanding model system in neuroscience, used for understanding neural suppression, intrinsic neural dynamics, and even the neural correlates of consciousness. All models of rivalry propose that it depends on gradually changing neural activity that on reaching some threshold triggers the perceptual switches. This manuscript reports the first physiological measurement of neural signals with that set of properties in human participants. The signals, measured with EEG in human observers, closely match the predictions of recent models of rivalry, and should pave the way for much future work.
Introduction
When the two eyes are presented with incompatible patterns, often only one of the patterns is perceived at a time, and perception alternates between the two (Wheatstone, 1838). Such binocular rivalry provides a rare behavioral window to systematically study the processes controlling intrinsic neural dynamics and awareness. It is widely agreed that suppression of one pattern results from competition between populations of neurons, likely at multiple levels in the visual system, with some “winning” and suppressing the others (e.g., Blake, 1989; Blake and Logothetis, 2002; Wilson, 2003).
What causes the perceptual alternations in binocular rivalry remains more uncertain. Transitions in rivalry occur at seemingly random times, without conscious control, but the durations of stable percepts are indeed highly lawful, and follow an almost identical γ distribution across many different stimulus variations (e.g., Levelt, 1965; Brascamp et al., 2006; Cao et al., 2018; Skerswetat and Bex, 2023). To account for the timing of transitions, theories and models of rivalry propose that a dynamic neural process underlies the stable perceptual periods, for example, gradual changes because of adaptation of the neural population representing the dominant stimulus (e.g., Wilson, 2007; Shpiro et al., 2009) or noisy accumulation of activity in the neural populations representing both stimuli (Lankheet, 2006; Cao et al., 2016, 2021). Alternations arise when the gradually changing activity crosses some threshold that allows the previously suppressed population to “win” the competition, become dominant, and suppress the previously dominant one. Without such gradual changes in an underlying and noisy signal, models cannot reproduce the characteristic shape of the behavioral percept duration distributions.
Studies of online behavior support the idea of a gradually changing signal during rivalry. Visual sensitivity in each eye during rivalry changes slowly over the course of a stable percept, with sensitivity of the dominant eye decreasing and the suppressed eye increasing following a transition (Alais et al., 2010). Continuous psychophysical tracking of perception with eye tracking and joysticks also provides some evidence for gradual changes (Naber et al., 2011; Skerswetat and Bex, 2023).
Neural signals corresponding to the monocular stimuli, including those measured by steady-state visual evoked potential (SSVEP) amplitude (Brown and Norcia, 1997; Katyal et al., 2016), BOLD signals (Tong et al., 1998; Haynes and Rees, 2005; Wunderlich et al., 2005), and coherence between EEG or MEG channels (Cosmelli et al., 2004), show strong modulations during rivalry, strengthening and weakening in synchrony with perceptual dominance and suppression, beginning in the lateral geniculate nucleus and continuing throughout visual cortex. Given this match to perception, the periods of strengthening and weakening are naturally shorter in individuals with more rapid behavioral alternation rates (Spiegel et al., 2019; Bock et al., 2023).
Less is known about the time course of neural signals between transitions. Activity in higher-level visual areas appears to change gradually compared with changes produced by matched abrupt alternations of nonrivalrous stimuli (de Jong et al., 2020), and a recent paper reports an intriguing trend for these changes to be more gradual in individuals with slower alternation rates (Bock et al., 2023). However, to show that one has measured a gradually changing signal that underlies transitions in rivalry, it is key to show, within subjects, that once the signal reaches a threshold, a perceptual switch occurs. An equivalent formulation is that the time required for the signal to attain a particular level (i.e., the rate of signal change), or its slope during the time course, predicts percept duration. Previous work has not attempted to identify signals with this key property.
Below we identify a gradually changing neural signal during rivalry whose rate of change determines percept duration, matching theoretical predictions. We used EEG measurements of SSVEPs, taken from a large previously existing dataset (Katyal et al., 2019). We found that activity corresponding to the dominant and suppressed percepts changed gradually leading up to perceptual switches. Critically, the changes in SSVEP amplitude were more rapid during shorter percepts and more gradual during longer ones. These trends can be produced by a simple accumulator model, fit to the behavioral data only.
Materials and Methods
Experimental design
Dataset
We used a previously reported dataset, comprising 84 participants (62 females, 22 males) from a study on binocular rivalry (Katyal et al., 2019). EEG signals were recorded from 34 channels in the 10/20 system, and preprocessed with standard methods. The present report focuses on data recorded during the binocular rivalry task. To aid statistical reliability of our results, we used a smaller sample of 21 participants of the 84 for exploratory analyses and the full set of 84 participants to validate the analysis (Katyal et al., 2019). Datasets have been posted on the Data Repository for U of M (https://doi.org/10.13020/9sy5-a716).
Task and stimuli
Twelve 120 s runs of a binocular rivalry task were acquired for each participant, during which they were presented with orthogonal (±45°) grayscale gratings, one to each eye, as illustrated in Figure 1A (top left box). One grating flickered at 14.4 Hz and the other at 18.0 Hz in each run, counterbalanced between eyes across runs. Participants were instructed to press one of three buttons, indicating a dominant percept of “tilt left,” “tilt right,” or a mixed percept, whenever their perception changed. They were asked to report dominance once one grating filled >90% of the stimulus field and mixed otherwise (Katyal et al., 2019).
Statistical analysis
EEG preprocessing
Raw EEG data were first downsampled from 1024 to 360 Hz, and then filtered sequentially with a 0.1-179 Hz bandpass filter and a band-stop filter around the electrical line-noise frequencies (60 and 120 Hz). To remove ocular and muscle artifacts, independent component analysis was applied. In addition, the data were transformed with the Current Source Density toolbox (Perrin et al., 1989; Kayser and Tenke, 2006, 2015) to improve the specificity of EEG signals. The preprocessing procedure has been described previously (Katyal et al., 2019).
Time-frequency analysis
We first calculated the signal-to-noise ratio (SNR) for each electrode. Amplitudes within ±0.02 Hz around the signal frequencies (14.4 and 18.0 Hz) as well as a noise frequency (16.2 Hz) were calculated using Fourier transforms for each run of each participant. We then averaged across runs, and SNR was estimated as the difference of the mean amplitudes at the signal and noise frequencies divided by the amplitude at the noise frequency.
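The SNR definition above can be illustrated with a minimal sketch on synthetic data. It is not the analysis code used in the study: the function names `amplitude_at` and `snr` are hypothetical, a single Fourier component stands in for the ±0.02 Hz band, and averaging across runs is omitted.

```python
import cmath
import math


def amplitude_at(signal, fs, freq):
    """Amplitude of the Fourier component of `signal` at `freq` (Hz)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, x in enumerate(signal))
    return 2.0 * abs(acc) / n


def snr(signal, fs, signal_freq, noise_freq):
    """(signal amplitude - noise amplitude) / noise amplitude."""
    a_sig = amplitude_at(signal, fs, signal_freq)
    a_noise = amplitude_at(signal, fs, noise_freq)
    return (a_sig - a_noise) / a_noise


# Synthetic example (assumed values): 10 s sampled at 360 Hz, containing a
# 14.4 Hz component of amplitude 1.0 and a 16.2 Hz component of amplitude 0.25.
fs = 360.0
eeg = [1.0 * math.sin(2 * math.pi * 14.4 * k / fs)
       + 0.25 * math.sin(2 * math.pi * 16.2 * k / fs)
       for k in range(3600)]

# (1.0 - 0.25) / 0.25 = 3.0, up to floating-point error
print(round(snr(eeg, fs, 14.4, 16.2), 2))
```

Both test frequencies complete a whole number of cycles in the 10 s window, so the two components do not leak into each other in this toy case; real recordings would not be so clean.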
The frequencies used for our analyses were not precisely the frequencies that were specified in the empirical design. When we used the specified frequencies, we observed small linear changes in the phase over time during scans indicating that the true stimulus frequency was slightly different, likely because of display software timing. To identify the “true” stimulus frequencies, we conducted a grid search of frequencies and found the frequencies that minimized the phase shift during scans. These frequencies were determined to be 14.4016 and 18.0016 Hz.
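To give a sense of scale, a constant frequency offset Δf accumulates a phase drift of 2πΔfT over a run of length T. Taking the 0.0016 Hz offset reported above and the 120 s run length, and assuming the offset is constant within a run, the drift over one run would be roughly:

```latex
\Delta\phi = 2\pi\,\Delta f\,T
           = 2\pi \times 0.0016\ \mathrm{Hz} \times 120\ \mathrm{s}
           \approx 1.21\ \mathrm{rad} \quad (\approx 69^{\circ})
```

A drift of this size is large relative to the 0.1 rad precision of the phase search described below, which is consistent with the need to correct the frequencies before phase-specific filtering.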
To estimate the SSVEP amplitudes over time, defined as the strength of EEG signal modulation related to the two frequency-tagged stimuli, we used a phase-specific filter, assuming the phase of SSVEP was a constant shift relative to the stimulus phase (see also Jamison et al., 2015; Bock et al., 2023). The phase-specific filtering was computed by multiplying the EEG series A(t) by a sinusoidal wave at the stimulus frequency fϕ and phase ψ. Then, we smoothed the resultant by a Gaussian window
Because the phase offset of neural response in each scan was unknown and phase varied slightly scan to scan because of delay in the stimulus presentation software, we determined the phase ψ empirically. We selected the phase that maximized the integral of the amplitude function, for each participant and each scan, using a grid search with the precision of 0.1 rad as follows:
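The display equations for this step are not reproduced in this text. Based only on the verbal description above, they would plausibly take the following form. The notation is an assumption: $A_{f}(t;\psi)$ is the estimated amplitude time course, $g(t)$ is the Gaussian window (its width is not stated here), $*$ denotes convolution, and $T$ is the run length.

```latex
A_{f}(t;\psi) = \bigl[\,A(t)\,\sin\!\left(2\pi f_{\phi}\,t + \psi\right)\bigr] * g(t)

\hat{\psi} = \underset{\psi \in \{0,\,0.1,\,0.2,\,\ldots\}}{\arg\max}\; \int_{0}^{T} A_{f}(t;\psi)\,dt
```

Any normalization constants in the original equations cannot be recovered from the description and are omitted.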
To aid in combining data across runs and observers, the SSVEP amplitudes for each run were z-scored across time. This yielded a time course of SSVEP amplitudes for each frequency that was used in our analyses below.
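The estimation steps described in this subsection (multiplication by a sinusoid, Gaussian smoothing, phase selection by grid search, and z-scoring) can be sketched as follows. This is an illustrative sketch rather than the code used in the study: the function names are hypothetical, the window width `sigma_s` is an assumed value, and edge samples are handled by treating data outside the run as zero.

```python
import math


def gaussian_kernel(sigma_samples):
    """Unit-sum Gaussian window truncated at +/-3 sigma."""
    half = int(3 * sigma_samples)
    w = [math.exp(-0.5 * (k / sigma_samples) ** 2)
         for k in range(-half, half + 1)]
    total = sum(w)
    return [v / total for v in w]


def smooth(x, kernel):
    """Same-length convolution; samples beyond the edges count as zero."""
    half = len(kernel) // 2
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < n:
                acc += w * x[k]
        out.append(acc)
    return out


def phase_specific_amplitude(eeg, fs, freq, sigma_s, step=0.1):
    """Return (selected phase in rad, z-scored amplitude time course)."""
    omega = 2 * math.pi * freq
    kernel = gaussian_kernel(sigma_s * fs)
    # sin(wt + psi) = sin(wt)cos(psi) + cos(wt)sin(psi), and smoothing is
    # linear, so both quadrature products are smoothed once and then
    # recombined for every candidate phase.
    s = smooth([x * math.sin(omega * k / fs) for k, x in enumerate(eeg)], kernel)
    c = smooth([x * math.cos(omega * k / fs) for k, x in enumerate(eeg)], kernel)
    sum_s, sum_c = sum(s), sum(c)
    best_phase, best_total = 0.0, -math.inf
    for i in range(int(2 * math.pi / step) + 1):
        psi = i * step
        total = math.cos(psi) * sum_s + math.sin(psi) * sum_c  # integral of amplitude
        if total > best_total:
            best_phase, best_total = psi, total
    amp = [math.cos(best_phase) * a + math.sin(best_phase) * b
           for a, b in zip(s, c)]
    mean = sum(amp) / len(amp)
    sd = math.sqrt(sum((v - mean) ** 2 for v in amp) / len(amp))
    return best_phase, [(v - mean) / sd for v in amp]


# Synthetic check (assumed values): a 14.4 Hz response whose amplitude ramps
# up over 2 s, with a true phase offset of 1.0 rad.
fs = 360.0
eeg = [(1.0 + k / fs) * math.sin(2 * math.pi * 14.4 * k / fs + 1.0)
       for k in range(720)]
phase, z = phase_specific_amplitude(eeg, fs, 14.4, sigma_s=0.1)
print(round(phase, 1))
```

Smoothing the two quadrature products once, rather than re-filtering for every candidate phase, is purely a computational convenience; it yields the same amplitude function as filtering separately at each phase.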
Data epoching and period selection
To analyze effects of percept duration, we extracted the SSVEP amplitude time course between the start and end of each uninterrupted period of perceptual dominance (Drew et al., 2022). As illustrated in Figure 1C, these periods corresponded to the time between an initial button press indicating an unmixed percept and a second button press indicating the opposite unmixed percept or the mixed percept, with no presses between. To obtain a reliable estimate of the time course, we excluded percepts shorter than 1.5 s, because the SSVEP signal during short periods may be affected by the temporal smoothing of the filter used to estimate amplitude. Periods longer than 5 s were also excluded, because some participants had very few of that length and because they may simply have resulted from missed reporting of a switch during an attentional lapse.
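The selection rule can be made concrete with a short sketch. The function name, the label strings, and the treatment of a repeated press of the same button (skipped here) are assumptions made for illustration and are not specified in the text.

```python
def select_dominance_periods(presses, min_dur=1.5, max_dur=5.0):
    """Select uninterrupted dominance periods from a sorted press log.

    presses: list of (time_s, label), label in {"left", "right", "mixed"}.
    Returns (start_s, end_s, label) for each unmixed percept that is ended
    by a different report and lasts between min_dur and max_dur seconds.
    """
    periods = []
    for (t0, lab0), (t1, lab1) in zip(presses, presses[1:]):
        if lab0 == "mixed" or lab1 == lab0:
            continue  # period must start unmixed and end with a different report
        if min_dur <= t1 - t0 <= max_dur:
            periods.append((t0, t1, lab0))
    return periods


# Hypothetical press log for one run
presses = [(0.0, "left"), (2.0, "right"), (3.0, "mixed"),
           (4.0, "left"), (10.0, "right"), (13.0, "mixed")]
print(select_dominance_periods(presses))
# keeps the 2 s "left" period and the 3 s "right" period; the 1 s and 6 s
# periods fall outside the 1.5-5 s range, and the period starting with a
# mixed report is skipped
```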
Time point-wise comparison
Our first analysis simply tested whether the amplitudes of the SSVEP signals from percepts of different duration differed at individual time points. We aligned the SSVEP amplitudes to either the start or the end of periods and used a simple linear model to test whether SSVEP amplitudes at each time point were linearly related to percept duration (i.e., higher amplitude in longer duration periods and lower for shorter periods). Linear models were fit separately for each participant and significance of parameters was tested with a simple t test across participants.
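The two-level procedure (a regression within each participant, then a t test on the resulting coefficients across participants) can be sketched for a single time point as follows. The data are made up for illustration and the function names are hypothetical; only the t statistic is computed, which would then be referred to a t distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def one_sample_t(values):
    """t statistic for the null hypothesis that the mean of `values` is zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))


# Made-up data: for each participant, percept durations (s) and the z-scored
# SSVEP amplitude at one fixed time point of each percept.
participants = [
    ([2.0, 3.0, 4.0], [1.0, 1.5, 2.0]),
    ([2.0, 3.0, 4.0], [1.6, 1.9, 2.2]),
    ([2.0, 3.0, 4.0], [-1.2, -0.8, -0.4]),
]
slopes = [ols_slope(dur, amp) for dur, amp in participants]
t_stat = one_sample_t(slopes)
print([round(s, 2) for s in slopes], round(t_stat, 2))
```

Repeating this at every time point, separately for start-aligned and end-aligned data, gives the time-resolved test described above.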
Inter-time point modeling for the SSVEP amplitudes
As noted above, we predicted that the neural signals corresponding to the dominant and suppressed stimuli would change gradually during the period, and that further, this change would be more gradual for longer periods. To test this, we fit a model to the set of SSVEP time courses, of varying period duration, rather than just to individual time points. We used the simplest possible model of amplitude change, a linear slope. The model contained a term representing the slope, and another factor for the interaction between that slope and period duration, allowing it to be shallower for longer periods. We fit separate models to the dominant and the suppressed amplitudes for all periods simultaneously.
Specifically, the models were of the following form:
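The equation itself is not reproduced in this text. A form consistent with the coefficient definitions given in the Results (β0, global mean; β1, global slope; β2, linear effect of duration on the mean; β3, linear effect of duration on the slope) would be the following, where the symbols $A$ (SSVEP amplitude), $t$ (time within the period), $D$ (period duration), and $\varepsilon$ (residual) are notation assumed here:

```latex
A(t, D) = \beta_0 + \beta_1\,t + \beta_2\,D + \beta_3\,(t \times D) + \varepsilon
```

Under this form the slope of the time course for a period of duration $D$ is $\beta_1 + \beta_3 D$, so a $\beta_3$ of opposite sign to $\beta_1$ corresponds to shallower slopes for longer periods.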
Again, linear models were fit separately for each participant, and significance of parameters was tested with a t test across participants. We excluded the last 500 ms of each period from this analysis, as this segment likely included the transition between percepts and also the response generation for the next button press (Drew et al., 2022), and used the remaining portion. To ease computation, we also centered the time index at the center of the period, so that t = 0 fell at the middle time point of each trimmed period.
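A minimal sketch of this fit for one participant is given below, assuming a model with an intercept, a slope, an effect of duration on the mean, and a slope-by-duration interaction (coefficients b0 to b3), with the final 500 ms trimmed and time centered within each trimmed period. The function names and the synthetic data are hypothetical, and a small standard-library solver is used only to keep the example dependency-free; a numerical library's least-squares routine would be the usual choice.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_slope_by_duration(periods, fs, trim_s=0.5):
    """periods: list of (duration_s, amplitude samples). Returns [b0, b1, b2, b3]."""
    rows, ys = [], []
    for dur, amp in periods:
        keep = amp[: len(amp) - int(round(trim_s * fs))]  # drop the last 500 ms
        centre = (len(keep) - 1) / 2.0
        for k, y in enumerate(keep):
            t = (k - centre) / fs  # t = 0 at the middle of the trimmed period
            rows.append([1.0, t, dur, t * dur])
            ys.append(y)
    # Ordinary least squares via the normal equations (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(4)]
    return solve(xtx, xty)


# Synthetic, noise-free "dominant" time courses generated from known
# coefficients: the slope is b1 + b3 * D, i.e. shallower for longer periods.
fs = 10.0
true = [0.2, -1.0, 0.1, 0.2]
periods = []
for dur in (2.0, 3.0, 4.0):
    n = int(dur * fs)
    centre = (n - 5 - 1) / 2.0
    amp = [true[0] + true[1] * t + true[2] * dur + true[3] * t * dur
           for t in ((k - centre) / fs for k in range(n))]
    periods.append((dur, amp))

coeffs = fit_slope_by_duration(periods, fs)
print([round(c, 3) for c in coeffs])
```

In the analysis described above, one such coefficient vector would be obtained per participant and per signal (dominant, suppressed), and each coefficient would then be tested against zero across participants.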
Accumulation-to-threshold simulation
To demonstrate that an accumulation-to-threshold model (Cao et al., 2016) predicts signals similar to our observed SSVEP amplitudes, we fit a simple version of it to our behavioral data. We used a standard one-sided drift-diffusion model (Ratcliff and Smith, 2004; Cao et al., 2016) and assumed the drifting signal controlled the rivalry time course. Starting from x0, the signal accumulated to the threshold θ with a fixed drifting rate μ and Gaussian noise with variance σ throughout the percept period. Perceptual switches occurred when the signal reached the threshold. We fixed the distance to the threshold,
For comparison with our data, we used the model to simulate neural activity in 2 min simulated blocks, assuming alternating percepts (i.e., no mixed percepts). We did this by fitting subsequent percepts with drift in the opposite direction, with x0 set to the ending point of the previous duration and θ alternating between positive and negative values. We then smoothed the time course with the same smoothing filter used for our SSVEP amplitude estimation, and normalized the data by z-scoring the signal from each run, again as was done for the SSVEP amplitudes. We plot the results of simulating 100 blocks and binning and averaging data.
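The alternating accumulation-to-threshold process can be sketched as a discrete-time simulation. This is an illustration only: the function name and the parameter values (`mu`, `sigma`, `theta`, `dt`) are assumptions rather than the fitted values, `sigma` is treated here as the noise standard deviation per square-root second, and the smoothing and z-scoring steps are omitted.

```python
import math
import random


def simulate_block(mu, sigma, theta, block_s=120.0, dt=0.01, seed=0):
    """Simulate one block of alternating drift to thresholds at +/-theta.

    Returns (signal trace, list of percept durations in seconds).
    """
    rng = random.Random(seed)
    x, direction = -theta, 1  # start at one bound, drifting toward the other
    trace, switch_times = [], []
    for k in range(round(block_s / dt)):
        x += direction * mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if direction * x >= theta:
            # Threshold reached: record the switch, then invert the drift and
            # the threshold, continuing from the current value of the signal.
            switch_times.append((k + 1) * dt)
            direction = -direction
        trace.append(x)
    durations = [b - a for a, b in zip(switch_times, switch_times[1:])]
    return trace, durations


# Assumed parameters: the expected percept duration is about 2 * theta / mu
trace, durations = simulate_block(mu=0.7, sigma=0.5, theta=1.0)
print(len(durations), round(sum(durations) / len(durations), 2))
```

Because every percept covers roughly the same distance between the two thresholds, percepts that happen to be short necessarily have steeper average trajectories than long ones, which is the property compared against the SSVEP time courses.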
Results
Opposing and gradual changes in SSVEP signals
Eighty-four observers viewed two orthogonal sinusoidal grating patches presented one to each eye (Fig. 1A) and reported their percept with a button press. We measured perceptual dominance duration as the time period between two consecutive button presses, with one indicating perception of “tilt left” and one “tilt right,” or vice versa, thus excluding periods of perceiving a mixture of the two gratings from analysis.
Figure 1. Illustration of the paradigm and the SNR. A, Stimuli were flickering at 14.4 and 18 Hz (counterbalanced across runs) to induce SSVEP. B, SNR for different electrodes. C, Magenta and green curves represent illustrative SSVEP amplitudes for the two frequency-tagged stimuli. We epoched the data based on uninterrupted perceptual periods. Periods with dominance durations <1.5 s or >5 s were excluded. D, Histogram represents the distribution of all reported percept durations from all participants.
Durations of percepts during rivalry followed the typical distribution shape (Fig. 1D). On average, participants' dominance periods lasted 2.825 s (SD = 0.609 s). For the time course analyses, we selected dominance periods longer than 1.5 s but shorter than 5 s, so that enough periods were available for robust estimation of the time course within and across observers (see Materials and Methods).
Neural signals during rivalry showed an opposing pattern expected from previous work. Figure 2A plots the time course of SSVEP amplitudes during stable perceptual periods between successive button presses whose timings are indicated by the starts and ends of the horizontal rasters. Neural signals at the frequency of the perceptually dominant stimulus were high at the time of the initial button presses and fell monotonically until the next button press (which indicated that perception had transitioned to a mixed percept or to the other grating). The suppressed grating showed the inverse pattern, starting low and rising throughout the period to a peak around the time of the second button press.
Figure 2. Time courses for perceptual periods of different lengths. A, SSVEP amplitudes (from Oz) as a function of dominance duration, averaged within each of 100 bins for visualization. SSVEP amplitudes associated with the dominant stimulus decrease over periods, while the SSVEP amplitude associated with the suppressed increases. B, SSVEP amplitudes averaged for short, medium, and long perceptual durations, for both dominant and suppressed stimuli. We aligned each period to the start (left) and end (right) of the period separately. The slope succeeding the period start differs as a function of duration, but differences are smaller preceding period end. Solid horizontal lines indicate significant effects of duration on the amplitude (t test, p < 0.01), tested with linear regression at each time point (see Materials and Methods).
The large dataset allowed us to examine trends in SSVEP amplitudes as a function of percept duration. For both dominant and suppressed signals, amplitudes changed gradually throughout the period, and generally most rapidly toward the end, particularly 500 ms before the end. The initial gradual change appeared to lengthen as duration increased, while the late rapid change appeared to not depend greatly on duration. To better visualize differences in the time course as a function of duration, we averaged signals within three bins with different duration ranges; Figure 2B plots binned averages for time points following and aligned to the first button press (left) and preceding and aligned to the second button press (right).
Slower changes in SSVEP signals for longer durations
In Figure 2B, the slopes of the time courses for different bins diverged following the first button press, as early as 0.5 s. But preceding the second button press, the SSVEP time courses were relatively similar beginning ∼1 s before the button press. This pattern is what one would expect if percept duration was determined by neural signals accumulating at different rates until they reach a threshold, where similar transitions are initiated regardless of duration.
To test formally how the SSVEP time courses varied with percept duration, we fit linear models to the data (see Materials and Methods). We first tested for differences in the SSVEP amplitudes as a function of duration for each time point independently (Fig. 2B). While the figure plots binned and averaged data, statistical tests were conducted on unaveraged data. Time points where there was a significant effect of duration on amplitude are shown by the horizontal lines. Effects were visible beginning at ∼600 ms following the first button press, with shorter durations showing lower dominant amplitudes and higher suppressed amplitudes, indicative of more rapid change. Leading up to the second button press, time courses were more similar across durations, with the dominant amplitudes appearing almost identical across durations and the suppressed signal rising to a slightly higher peak for shorter durations.
To more directly test whether the SSVEP time courses differed in rate of change across durations, we fit lines to the time courses from all durations, and tested for differences in slope. We excluded the last 500 ms of the time courses because theories predict little difference in signal there. We used a model that included a global mean (β0) and global slope (β1), but also terms modeling linear effects of duration on the mean (β2) and on the slope (β3); the last allows a test of whether the slope of the time course changed with duration (see Materials and Methods). The model was fit for each participant separately, for both dominant and suppressed signals, and the significance of each coefficient was tested across participants with t tests. Figure 3A plots results from Oz based on its highest SNR.
Figure 3. Linear model fits to SSVEP time courses. A, Model coefficients for electrode Oz, for SSVEP amplitudes at the dominant stimulus frequency (purple) and the suppressed stimulus frequency (green). Colored dots plot individual participant coefficients. Black symbols plot across participant means and SEMs. To aid interpretation, the intercept β0 is not shown; instead, we show the average amplitude
As expected, a significant decreasing trend during the period was observed in the amplitudes for the dominant stimulus as well as a significant increasing trend for the suppressed stimulus (
We repeated the model fitting for data from all electrodes. The same trends were visible in most electrodes; that is, amplitudes decreased for the dominant stimulus during the period, and increased for the suppressed stimulus (β1) and these slopes depended on the dominance duration (β3). β1 and β3 were significantly different from zero in almost all electrodes for the dominant stimulus, but fewer for the suppressed stimulus (Fig. 3C). We also fit the model to the last 500 ms of periods, and found few effects of duration, confirming theoretical predictions of smaller differences immediately before responses.
This pattern of results matches the predictions of theories that propose rivalry is determined by noisy neural signals accumulating until they reach a threshold, with shorter durations resulting from more rapid accumulation, and longer durations from slower accumulation, because of the cumulative effects of noise. To demonstrate formally that such a model predicts the patterns observed in SSVEP data, we fit a simple drift-diffusion model to our behavioral data and simulated neural time courses from the drifting signal in the model (see Materials and Methods). The simulation reproduced the major trends in our data, the visible varying slope at the beginning of percept periods and almost the same time course before the end (Fig. 4). We emphasize that the model used in the simulation was fit to the behavioral distributions alone; the neural signals are true predictions of the model, not fits. Because our simulation did not include an estimate of the time required to generate a response and press the button (i.e., reaction time), the peaks and troughs are shifted by ∼250 ms in time relative to the SSVEP data.
Figure 4. Results of a simple accumulator model. A, The signal accumulates (drifts) with a fixed rate μ and Gaussian noise with variance σ. When it reaches the threshold θ, the threshold and the drifting rate invert and the signal accumulates to the other percept. B, Model accumulator signals averaged within 100 duration bins, as in Figure 2A. C, Averaged model signals within 3 bins aligned to the start (left) and end (right) of the period separately, as was done for SSVEP data in Figure 2B. The slope succeeding the period start differs as a function of duration, but time courses converge preceding the period end, similar to the patterns seen in the SSVEP data.
Discussion
Our results provide clear evidence for an accumulating perceptual signal during stable percepts in rivalry: Longer percept durations were associated with more gradual changes in the SSVEP amplitudes, and shorter durations with more rapid ones. This pattern strongly suggests that the SSVEP contains a neural signal whose evolution controls the timing of perceptual alternations.
Our findings are in general agreement with past behavioral work, and one study using intracranial recordings, suggesting that the time course of signals related to rivalry changes gradually during perceptual periods (Alais et al., 2010; Naber et al., 2011; de Jong et al., 2020; Skerswetat and Bex, 2023). These studies did not examine correlates of percept duration, however. Intrinsic neural oscillations do appear to wax and wane during a percept in a way that predicts its duration (Doesburg et al., 2005, 2009; Drew et al., 2022). Our frequency-tagged SSVEP signals are more closely tied to stimulus representations, and so should be more attractive targets for neural modeling (see below). The frequency tagging made it difficult to measure intrinsic oscillations in our study, and future work could examine whether and how intrinsic oscillations interact with stimulus representations during rivalry. Both rate of change in SSVEP amplitudes and frequency of intrinsic oscillations are also related to individual differences in rivalry switch rate (Fesi and Mendola, 2015; Katyal et al., 2019; Bock et al., 2023).
Our results provide physiological support to models of rivalry that attempt to capture the stochastic properties of percept durations between alternations (e.g., Brascamp et al., 2006; Moreno-Bote et al., 2007; Wilson, 2007; Cao et al., 2021). In most current models, different populations of neurons encode the two visual stimuli, and the populations' input contains independent additive noise. Competitive inhibition between the populations ensures only one is highly active at a time, corresponding to perceptual dominance of the corresponding stimulus. To allow switching, most models include a gradual change in activity, with the dominant population decreasing and the suppressed population increasing over time. Once activity levels pass some threshold, the suppressed population “escapes” suppression and perception flips. Earlier theories assumed neural adaptation caused the decrease in the dominant population's response (Lehky, 1995; Lankheet, 2006; Wilson, 2007), but later modeling studies suggest that an alternative theory, in which activity in the two populations accumulates over time, can better predict the distribution of perceptual durations across conditions (Brascamp et al., 2006; Moreno-Bote et al., 2007; Cao et al., 2014, 2021).
Our results are most consistent with this latter theory. Adaptation theories generally also predict that the changes in activity occur at a fixed rate, but with nonaccumulating additive noise pushing activity across threshold at different times. Accordingly, these theories predict similar slopes for the changing activity in different percept durations. The accumulation theories, on the other hand, model changes in activity because of a stochastic random walk to a threshold, with greater or lesser amounts of accumulating noise causing variability in time to reach threshold. These models predict that shorter durations are caused by more rapid random walks (i.e., signal changes with steeper slopes) (Cao et al., 2016).
A simple drift-diffusion model was used to bridge between the accumulation theories and our observed neural signals (Fig. 4). Such a model, fit only to behavioral results, predicts gradually changing neural signals that closely resemble our SSVEP amplitudes, which demonstrates that drift-diffusion models can in principle account for the observed accumulation.
One other hallmark of a signal that is noisily accumulating to a threshold (at least in most models) is that, immediately before the threshold is reached, time courses converge to have similar slopes (e.g., O'Connell et al., 2012). We observed this pattern in both our model simulations and in our SSVEP data, where the end of time courses was similar regardless of duration. We did, however, see an elevation of suppressed signal at the end of shorter periods. This effect was relatively small, and was statistically reliable only at the Oz electrode. Accordingly, we do not speculate on its functional significance.
The accumulation theory, along with the increasing evidence of top-down modulation of perceptual rivalry, puts rivalry in the framework of decision-making (Frassle et al., 2014): Our results resemble the accumulation to threshold of evidence that is observed in such tasks (e.g., O'Connell et al., 2012; Schall, 2019). That is, the magnitude of the SSVEP may reflect the strength of accumulated evidence that is used by later areas to reach a decision. The decision-making framework has recently been extended to include the notion of value as part of a further reconceptualization of bistable perception (Safavi and Dayan, 2022). The approach generally, and our results specifically, agree with past work that finds that the SSVEP does not necessarily match conscious perceptual reports (Davidson et al., 2020).
The accumulating signal, particularly for the suppressed stimulus, was observed primarily in posterior electrodes. This likely reflects the origin of the SSVEP response, which is believed to be in occipital visual areas V1-V4 (Di Russo et al., 2007; Zhang et al., 2011; Jamison et al., 2015). Future work can measure this signal with methods that possess higher spatial precision. Nevertheless, our results represent one of the first reports of accumulating signals at this relatively early stage, with information persisting until the perceptual transition. Evidence for informational persistence in the early visual cortex has also been found in working memory tasks (Harrison and Tong, 2009; Zhao et al., 2022). Accumulating signals have been most frequently identified in later visual areas, such as the lateral intraparietal area (e.g., Shadlen and Newsome, 1996; Roitman and Shadlen, 2002; O'Connell et al., 2012) in work that failed to find accumulation at earlier stages (e.g., area MT). The conditions under which information persists and/or accumulates in earlier visual areas remain an important open question.
Similar accumulating signals have also been modeled as buildup of predictive error (Weilnhammer et al., 2017). That is, signal magnitude may reflect a growing difference between current perception and sensory input. These error-prediction signals were primarily found in frontal and insular cortices, and so our results in posterior electrodes generally favor an evidence accumulation account. However, we cannot rule out the possibility that the posterior signals reflect feedback from higher areas.
Together, our results strongly constrain theories about and models of binocular rivalry. The approach taken here may also be applicable to many other bistable percepts, which could be controlled by a similar accumulating noisy signal (Cao et al., 2016). In rivalry, the accumulating signal we identified may help answer many additional questions about rivalry's neural bases and computational mechanisms. For example, it should be possible to investigate the origin of the accumulating noise; fluctuations in attention, which may modulate rivalry (Paffen and Alais, 2011; Li et al., 2017; Drew et al., 2022), are a promising candidate.
Footnotes
The authors declare no competing financial interests.
Correspondence should be addressed to Shaozhi Nie at nie00043@umn.edu