Abstract
Human vision combines inputs from the two eyes into one percept. Small differences “fuse” together, whereas larger differences are seen “rivalrously” from one eye at a time. These outcomes are typically treated as mutually exclusive processes, with paradigms targeting one or the other and fusion being unreported in most rivalry studies. Is fusion truly a default, stable state that only breaks into rivalry for non-fusible stimuli? Or are the two monocular percepts and the fused percept three sub-states of one dynamical system? To determine whether fusion and rivalry are separate processes, we measured human perception of Gabor patches with a range of interocular orientation disparities. Observers (10 female, 5 male) reported rivalrous, fused, and uncertain percepts over time. We found a dynamic “tristable” zone spanning ∼25–35° of orientation disparity where fused, left-eye-, or right-eye-dominant percepts could all occur. The temporal characteristics of fusion and non-fusion periods during tristability matched those of other bistable processes. We tested our findings against statistical models in which fusion is a higher-level bistable process alternating with rivalry. None of these fit our data, but a simple bistable model extended to have three states reproduced many of our observations. We conclude that rivalry and fusion are multistable substates capable of direct competition, rather than separate bistable processes.
SIGNIFICANCE STATEMENT When inputs to the two eyes differ, they can either fuse together or engage in binocular rivalry, where each eye's view is seen exclusively in turn. Visual stimuli have often been tailored to produce either fusion or rivalry, implicitly treating them as separate mutually-exclusive perceptual processes. We have found that some similar-but-different stimuli can result in both outcomes over time. Comparing various simple models with our results suggests that rivalry and fusion are not independent processes, but compete within a single multistable system. This conceptual shift is a step toward unifying fusion and rivalry, and understanding how they both contribute to the visual system's production of a unified interpretation of the conflicting images cast on the retina by real-world scenes.
Introduction
The human visual system creates singular percepts from two monocular inputs. Small differences are “fused” into intermediate percepts, whereas larger differences are perceived from one eye at a time in a stochastic process called binocular rivalry (Wheatstone, 1838). These phenomena have provided insight into binocular combination effects (Blake and Fox, 1973; Blake et al., 1981) and perceptual suppression (Blake and Logothetis, 2002; Blake and Wilson, 2011), respectively. How do these processes interact to produce single vision? Is fusion a default, stable state that only breaks down into rivalry when stimuli cross a threshold of non-fusibility? Or are monocular and fused percepts better seen as sub-states of a single dynamic system, where either could result for an intermediate stimulus?
Fusion and rivalry are often assumed to be mutually exclusive and stable outcomes for a static stimulus, but this view is supported mostly by paradigms using stimuli designed to robustly elicit one or the other. Binocular rivalry has long been studied using orthogonal gratings (Fox and Herrmann, 1967; Levelt, 1965), whose dominant image can be identified by orientation. Fusion and stereopsis have been explored using near-vertical gratings with small orientation differences, which fuse and tilt in depth (von der Heydt et al., 1981; Gillam and Rogers, 1991; Adams and Mamassian, 2002). A handful of studies have included intermediate orientation disparities (Wade, 1974; Kitterle and Thomas, 1980; O'Shea, 1998), but these disallowed reports of fusion. Studies examining fusion and rivalry together have typically used more than two stimulus components, pitting potentially fusible pairings against potentially rivalrous ones. Such studies found that fusion can suppress rivalry (Blake and Boothroyd, 1985; O'Shea, 1987; Blake, 1989, 2001; Blake et al., 1991) and vice versa (Erkelens, 1988; Blake, 1989; Harrad et al., 1994). Although these studies provide insight into how stimuli are paired, they have not addressed how fusion and rivalry relate over time for a single stimulus. Nonetheless, this literature has been taken to imply that for a given static stimulus, stable fusion or rivalry will result depending on the disparity present (Wilson, 1977, 2017; Blake, 1989; Buckthought et al., 2008).
We aimed to determine whether fusion and rivalry could compete over time in a static stimulus of intermediate orientation disparity. Prior studies have shown a stimulus history-dependent hysteresis effect in that some stimuli can give rise to fusion or rivalry (Wilson, 1977; Buckthought et al., 2008) dependent on how a different, previously shown stimulus was perceived. Our study differs from these studies in asking whether perceptual states of fusion and rivalry can dynamically change over time when viewing a single static stimulus. To test this, we recorded perception of stimuli with a range of orientation disparities. We found that at around 30° of orientation difference, fusion and rivalry could coexist in a “tristable” dynamic state where perception alternated between fusion and monocular views. Our tristable stimuli reveal novel dynamic properties of fusion that challenge its portrayal as a stable state. Several models with fusion and rivalry as separate bistable processes could not reproduce our findings, but a simple bistable model extended to have three states could. Fusion and both rivalrous dominance conditions should thus be considered multistable substates of a larger perceptual process rather than results of separate bistable processes.
Materials and Methods
Observers.
Observers were 15 undergraduate students (10 female, 5 male) from Stanford University who received credit in an introductory psychology course for participation. All had normal or corrected-to-normal visual acuity and exhibited normal binocular perception thresholds (50 s of arc or less at 16 inches, as determined by a RANDOT test; Stereo Optical; Fawcett and Birch, 2000). All observers were naive to the purpose of the experiment. Prior written informed consent was provided according to a protocol approved by the Institutional Review Board of Stanford University.
Apparatus.
Stimuli were generated using MATLAB (The MathWorks) and MGL (Gardner et al., 2018). They were presented through a half-silvered pellicle haploscope (Planar, model SD2620W) that allowed the images from two computer screens to be overlaid and seen exclusively by one eye each through polarized glasses. Observers sat in a darkened room one meter from the monitors, with perpendicular lines of sight to both. Monitors had a resolution of 1920 × 1200, a refresh rate of 60 Hz, and subtended 57.6° × 37.9° of visual angle. A keyboard resting on the observer's lap was used to record responses.
Stimuli.
Single Gabor patches were presented separately to each eye, fusing into a vertically oriented percept or engaging in rivalry depending on their orientation difference. They had a carrier spatial frequency of 5 cycles per degree, and the Gaussian window had a standard deviation of 0.14° (see Fig. 1A). The patches were kept small in an effort to avoid incomplete rivalry, because with larger stimuli, subregions of a scene can be perceived through different eyes at the same time (Wilson et al., 2001; Kang et al., 2009). Psychophysical studies have shown that smaller stimuli are more likely to result in periods of complete dominance (Blake et al., 1992).
We provided a number of features to help observers maintain fixation on the Gabors with both eyes. A black fixation point marked the center of each patch, and three black circles from 0.5° to 0.7° in radius were shown in both eyes as a strongly fusible reference frame. In each trial, two additional dots flanked the patches. These served as references for the true vertical or horizontal axes. All of the above features were present in both eyes. Nonius “v's” also emanated from the fixation point, pointing upwards in one eye and downwards in the other so that they formed a binocular “x” during proper vergence to further aid in maintaining fixation. The rest of the display was set to mean luminance (see Fig. 1A).
Experimental design.
Observers were shown pairs of monocular Gabor patches with orientation disparities ranging from 0° to 90°, and were asked to continuously report their perceived orientations over 1 min trials. In the main experiments, each stimulus was symmetrical about the vertical axis. Observers were told to hold down the left arrow key if the stimulus appeared rotated counter-clockwise from vertical and the right arrow key if it appeared rotated clockwise, thereby tracking periods of perceptual dominance. The up arrow key was used to indicate a perfectly vertical orientation. This would indicate fusion, as for small disparities the stimuli would be seen as a single vertical Gabor tilted in depth about the horizontal axis. This tilt was not mentioned to observers, given that a fused Gabor should appear perfectly aligned with the reference dots and be reported as vertical. All buttons were to be released if the stimulus appeared patchy or of uncertain orientation, until it resolved into one of these three percepts. Observers responded with one of the three given keys 92.86% of the time, suggesting that patchy or mixed percepts only made up a small proportion of our results. After being given these instructions, observers were shown an example stimulus with 30° of disparity and asked to practice their responses to ensure they understood the task. They were encouraged to ask any questions they had about the procedure. After this practice period, each stimulus was shown for 1 min with the next stimulus pending a “ready” button press that allowed for a break between trials. Nine levels of orientation disparity (0°, 10°, 20°, 25°, 30°, 35°, 40°, 50°, and 90°) were shown for two 1 min trials each, with the stimuli swapped between eyes to control for ocular dominance. Presented in shuffled order, these produced an 18 min trial block. Sample trial records for three subjects at 0°, 30°, and 90° can be seen in Figure 1B.
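The trial block described above can be sketched in a few lines. This is an illustrative Python sketch rather than the MATLAB/MGL code used in the study; the function name `build_block` and the `swapped` flag (marking which eye received which orientation) are hypothetical.

```python
import random

# Orientation disparities (degrees) listed in the experimental design.
DISPARITIES = [0, 10, 20, 25, 30, 35, 40, 50, 90]


def build_block(seed=None):
    """Return a shuffled list of (disparity, swapped) trials.

    Each disparity appears twice, once per eye assignment, giving
    9 x 2 = 18 one-minute trials.
    """
    rng = random.Random(seed)
    trials = [(d, swapped) for d in DISPARITIES for swapped in (False, True)]
    rng.shuffle(trials)
    return trials


block = build_block(seed=1)
print(len(block), "trials, about", len(block), "min of viewing")
```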
To test whether fusion and rivalry could coexist in a tristable state without the involvement of stereopsis, another block of trials was run using a horizontal reference axis. All stimuli were the same, but rotated 90°. Gabor patches with small orientation disparities about the horizontal axis (i.e., vertical disparities) are seen as a single fused horizontal stimulus (Kertesz and Jones, 1970) but do not elicit a perception of depth (Cumming et al., 1991). This allowed fusion to be elicited in the absence of any depth cues from stereopsis. Observers were given the same instructions as before, but with the up arrow key now indicating a perfectly horizontal stimulus.
To better characterize the dynamics of tristable stimuli, each observer was given six more one-minute trials with the “most tristable” stimulus disparity from the vertical trial block, whose orientation we will refer to as that subject's tristable point. To determine the tristable point, the difference between the total duration of fusion and the mean of the two rivalrous durations was calculated for each disparity. The stimulus disparity with the smallest difference for each observer was chosen as their tristable point and presented three times with crossed- and un-crossed disparities in a 6 min trial block.
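As a minimal sketch of this selection rule, the snippet below picks the disparity at which total fusion time is closest to the mean of the two rivalrous totals. The dictionary layout and field names are hypothetical, the numbers are made up for illustration, and taking the absolute value of the difference is an assumption about how "smallest difference" was evaluated.

```python
from statistics import mean


def tristable_point(totals_by_disparity):
    """Return the disparity whose fusion total best matches the rivalrous mean.

    totals_by_disparity maps disparity (deg) to a dict of total seconds
    reported as 'fused', 'left', and 'right' (hypothetical field names).
    """
    def imbalance(totals):
        rivalrous = mean([totals["left"], totals["right"]])
        return abs(totals["fused"] - rivalrous)  # absolute difference: an assumption

    return min(totals_by_disparity, key=lambda d: imbalance(totals_by_disparity[d]))


# Made-up totals (seconds) for one hypothetical observer.
example = {
    20: {"fused": 100.0, "left": 5.0, "right": 7.0},
    25: {"fused": 60.0, "left": 25.0, "right": 27.0},
    30: {"fused": 38.0, "left": 36.0, "right": 40.0},
    35: {"fused": 10.0, "left": 50.0, "right": 52.0},
}
print(tristable_point(example))
```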
To confidently treat vertical reports as incidences of fusion, it was important to ensure that observers were not reporting all slightly-oriented stimuli as vertical. Each observer was given a simple orientation discrimination task at the beginning of the session to confirm their ability to distinguish small orientations. Observers were asked to classify a series of binocularly matched Gabor patches, otherwise identical to those in the main experiment, as rotated clockwise or counter-clockwise from vertical. Each was shown for 350 ms, then replaced by a random noise mask within the same Gaussian envelope to prevent observers from using iconic memory to compare subsequent stimuli (Sperling, 1960). This mask remained until the right or left arrow keys were used to indicate clockwise and counterclockwise judgments after each presentation. Forty trials were presented with orientations chosen according to a 3-down 1-up staircase beginning at 1.5° and advancing by steps of 0.25°. Our observers showed an average discrimination threshold (calculated by taking the mean of all of their performance reversals) of 2.65° of orientation, with a standard deviation of 1.00° and a total range of 1.28° to 4.36°. These results are consistent with previous work finding that orientation discrimination thresholds in humans generally lie at 2° of orientation or less (Heeley and Timney, 1988; Paradiso and Carney, 1988; Heeley and Buchanan-Smith, 1990). As the smallest nonzero orientations in our task were 10°, we conclude that vertical reports outside of the 0° condition indicate truly fused percepts rather than imperceptibly rotated monocular images.
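The staircase procedure can be sketched as follows. This is an illustrative Python version, not the study's code: the `respond` callback, the lower bound `floor`, and the deterministic simulated observer are all assumptions made so the example runs on its own.

```python
from statistics import mean


def run_staircase(respond, start=1.5, step=0.25, n_trials=40, floor=0.25):
    """Run a 3-down 1-up staircase; return (levels, reversal_levels).

    respond(level) returns True when the tilt was classified correctly.
    floor is a hypothetical lower bound that keeps the tilt above zero.
    """
    level, streak, last_direction = start, 0, 0
    levels, reversals = [], []
    for _ in range(n_trials):
        levels.append(level)
        if respond(level):
            streak += 1
            direction = -1 if streak == 3 else 0  # three correct in a row: harder
        else:
            direction = 1                         # any error: easier
        if direction != 0:
            streak = 0
            if last_direction != 0 and direction != last_direction:
                reversals.append(level)           # staircase changed direction here
            last_direction = direction
            level = max(floor, level + direction * step)
    return levels, reversals


# Simulated observer (assumption): always correct at or above 1.0 deg of tilt.
levels, reversals = run_staircase(lambda level: level >= 1.0)
threshold = mean(reversals)  # threshold taken as the mean of the reversal levels
print(round(threshold, 2))
```

With a real observer the responses are noisy, so the reversal levels scatter around the tilt at which three consecutive correct answers are about as likely as a single error.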
Model simulations.
We tested a range of models where fusion and rivalry were separate bistable systems, each model simulated by sampling from the actual distributions of durations our observers reported. We also tested a single-process model that was a variant of an existing model of rivalry (Wilson, 2003). Fusion durations were taken from the eight total minutes of data recorded at the tristable point. Interfusion durations were sampled from the same data, where the total time elapsed between two periods of fusion was considered to be an interfusion period. The final periods in all trials were excluded from sampling as their durations were artificially truncated by the trial's ending. Rivalry durations were sampled from the 40°, 50°, and 90° disparity stimuli in both horizontal and vertical reference trial blocks. This was based on evidence that rivalry rates do not vary with orientation disparity (Wade, 1974), and so those trials would be a good representation of “underlying” rivalry at the tristable point. Some conflicting studies (Kitterle and Thomas, 1980; O'Shea, 1998) have found a decrease in rivalry alternations over time at smaller orientation disparities. However, “rivalry alternations over time” is not a direct measurement of percept duration and the observed decrease could actually result from the insertion of fusion periods—a possibility acknowledged by the authors. We used these duration distributions to simulate various possible ways the rivalry process could behave during periods of fusion.
Two-process stopping model.
One possibility is that the process of rivalry stops when fusion is perceived and then continues from where it left off after fusion subsides; we call this the “stopping model.” To simulate the stopping model, rivalry was first simulated by taking alternating samples from the left- and right-eye dominance duration distributions until they spanned at least eight 60 s periods. The same process was used to produce a series of fusion and interfusion durations. Then the fusion periods were placed between the rivalry periods, using the interfusion durations to space them. This effectively resulted in a rivalry process that was intermittently frozen during periods of fusion.
To relax this rather strict assumption that the rivalry process was completely stopped during fusion, we also built models in which rivalry was allowed to slow down during fusion. We tested 1.5× and 3× “slowdown” models. These models were produced in the same way as the stopping model, only with some portion of the rivalry series that occurred after the fusion insertion point removed to simulate that the rivalry process was continuing during fusion. The amount removed was related to the length of the fusion duration such that the ratio of fusion duration to rivalry removal was either 1.5× or 3× (e.g., 1 s of rivalry was skipped after a 3 s period of fusion for the 3× slowdown model).
Two-process continuation model.
The process of rivalry could also continue at a normal rate during perceived fusion, a model we term the “continuation model.” The same process as above was used to simulate this possibility, but the fusion was overlaid instead of interleaved into the rivalry. This can be thought of as a 1× slowdown model, where for each period of fusion an equally long period of rivalry was cut from the rivalry series. This simulated rivalry as a totally separate process carrying on unaffected underneath fused percepts.
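The stopping, slowdown, and continuation variants differ only in how much of the underlying rivalry series is discarded per period of fusion, so they can be sketched with one routine. The Python below is an illustration under that reading, not the authors' simulation code; the function names, the tuple representation of periods, and the duration pools are hypothetical.

```python
import math
import random


def sample_alternating(pool_a, pool_b, labels, total, rng):
    """Alternate draws from two duration pools until they span `total` seconds."""
    series, elapsed, k = [], 0.0, 0
    while elapsed < total:
        duration = rng.choice(pool_a if k % 2 == 0 else pool_b)
        series.append((labels[k % 2], duration))
        elapsed += duration
        k += 1
    return series


def take(series, seconds):
    """Remove `seconds` from the front of `series`; return the removed pieces."""
    taken = []
    while seconds > 1e-12 and series:
        label, duration = series[0]
        used = min(duration, seconds)
        taken.append((label, used))
        seconds -= used
        if used < duration:
            series[0] = (label, duration - used)
        else:
            series.pop(0)
    return taken


def two_process(rivalry, fusion, slowdown=math.inf):
    """Combine a rivalry series with a fusion/interfusion series.

    slowdown=inf: stopping model (rivalry frozen during fusion)
    slowdown=3:   1 s of rivalry skipped per 3 s of fusion
    slowdown=1:   continuation model (rivalry runs on underneath fusion)
    """
    rivalry = list(rivalry)
    percepts = []
    for label, duration in fusion:
        if label == "interfusion":
            percepts.extend(take(rivalry, duration))  # rivalry visible between fusions
        else:
            percepts.append(("fused", duration))
            take(rivalry, duration / slowdown)        # hidden rivalry progress, discarded
    return percepts


rng = random.Random(0)
rivalry = sample_alternating([2.0, 3.5, 5.0], [1.5, 4.0], ("left", "right"), 480, rng)
fusion = sample_alternating([3.0, 6.0], [2.0, 4.5], ("interfusion", "fused"), 480, rng)
print(two_process(rivalry, fusion, slowdown=3)[:5])
```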
Two-process interruption model.
The rivalry process might alternatively be completely disrupted during fusion, so that rivalrous percepts seen after periods of fusion are akin to those seen following new stimulus onsets. Psychophysical experiments have shown that following interruptions, rivalrous percepts tend to return to their prior states (Leopold et al., 2002; Pearson and Brascamp, 2008). Most relevantly for our paradigm, interruptions of rivalrous stimuli by fusible ones have demonstrated similar effects (Kanai et al., 2007). To capture these effects, we used a simple model previously developed to account for the early dynamics of rivalrous perception that lead to perceptual stabilization across interruptions (Noest et al., 2007). This model predicts the course of rivalry across a series of simulated “on” and “off” periods. It uses two variables per eye to do so: a “local field” value H representing the stimulus-related membrane potential of responding neurons, and a value A, which implements adaptation through shunting-style gain control. These variables were updated at each time step according to the following equations:
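The display equations are not reproduced at this point. Based on the term-by-term description in the paragraph that follows (input X, gain control −(1 + A)H, the βA term, cross-inhibition −γS[H], and A approaching a sigmoidal transformation of H), they can be sketched as below. The derivative notation and the placement of the time constants τH and τA are assumptions, so this should be read as a reconstruction rather than a verbatim copy of the published equations.

```latex
\begin{align}
\tau_H \frac{dH_i}{dt} &= X_i - (1 + A_i)\,H_i + \beta A_i - \gamma\, S[H_j] \\
\tau_A \frac{dA_i}{dt} &= -A_i + \alpha\, S[H_i]
\end{align}
```

Here i and j index the two competing stimulus representations, and the equations for unit j follow by exchanging the indices.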
In our case, Hi represents the response of neurons to the counter-clockwise-rotated stimulus, whose strength is Xi. The gain control term −(1 + Ai)Hi serves to lower the response to the stimulus over time, with Ai approaching a sigmoidal transformation of Hi. The term βAi is critical for breaking symmetry and producing the percept stabilization effects upon each initial presentation of the stimuli. Finally, −γS[Hj] is the cross-inhibition from the neurons responding to the competing stimulus. Aside from τA, we used the same parameter values as the original paper (X = 1, α = 5, and the original settings of τH, γ, and β), as well as the same sigmoid function S, for which S(z ≤ 0) = 0. We set the initial H values to 0 and the initial A values to be low and asymmetric as suggested in the paper (Ai = 0.5, Aj = 0.45). The original model used τA = 1 s, and specified that any arbitrarily faster time constant for H gave similar results (τH << 1). We kept τH = 20 ms, but changed τA for each subject to fit the mean rivalrous percept durations produced by the model to those the subjects perceived in the 40°, 50°, and 90° disparity conditions. This ensured that the model would simulate a similar underlying rate of rivalry to each subject's observed rate. A grid search revealed that the model's mean durations depended linearly on the value of τA as follows: mean duration = 0.55 × τA + 507 ms. The resulting τA values ranged from 4.30 to 10.19 s, with a mean of 7.02 s. The mean rivalry durations produced by the model without interruptions then had a correlation of 0.996 with the rivalry durations reported by the subjects in the 40°, 50°, and 90° rivalry conditions.
To compare this model's performance with our data, we used the fusion and interfusion durations reported by our subjects as stimulus off and on durations, respectively. We simulated 8 min for each subject, with a simulation time step of 1 ms. To convert the results to a button press format, time steps where Hi > Hj were considered clockwise reports, Hj > Hi were considered counterclockwise reports, and any points where both values fell below 0.4 were considered vertical (fused) reports. Bootstrapping was achieved by drawing eight trials, with replacement, from the subject's eight 1 min trials to use as input each time.
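The conversion from simulated local-field traces to button-press labels, and the trial resampling used for bootstrapping, can be sketched as follows. This Python is illustrative only: the function names are hypothetical, and giving the "both below 0.4" rule priority over the comparison of the two traces is an assumption (without it, fused reports could never occur).

```python
import random


def to_reports(h_i, h_j, fused_below=0.4):
    """Label each time step from the two local-field traces."""
    reports = []
    for hi, hj in zip(h_i, h_j):
        if hi < fused_below and hj < fused_below:
            reports.append("vertical")          # both fields low: fused report
        elif hi > hj:
            reports.append("clockwise")
        elif hj > hi:
            reports.append("counterclockwise")
        else:
            reports.append("none")              # exact tie; handling is an assumption
    return reports


def resample_trials(trials, rng):
    """Draw as many trials as were recorded, with replacement."""
    return rng.choices(trials, k=len(trials))


print(to_reports([0.9, 0.2, 0.5], [0.3, 0.1, 0.8]))
print(resample_trials(list(range(8)), random.Random(0)))
```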
Single-process model.
Rather than being separate bistable processes, fusion and rivalry might exist within a single process that can alternate between all three states. We simulated such a single-process model by expanding a portion of an existing model that could produce binocular rivalry (Wilson, 2003) to have three competing units. The original model contained left- and right-eye excitatory units that each activated inhibitory neurons to suppress the other, and self-adapted over time. We added a third unit to this model, connected to each of the original units with the same parameters and interactions to preserve symmetry. Each unit (left, right, and fused) thus had three variables that changed over time: an excitatory strength EV, inhibitory strength IV, and adaptation term HV. All three values were initially set to zero and then updated according to the following equations (shown here for the “left” unit):
EVleft was the excitatory activity of the unit representing a counter-clockwise-rotated percept when viewing the tristable stimulus, whose input strength Vleft was set to 10 for all values of t. The two other units representing a clockwise or fused percept had equivalent variables, EVright and EVfused. Each unit received inhibitory input from its two neighbors with strength g = 0.45. The dynamics of the excitatory activity had a time constant τE = 20 ms. The asymptotic value of each unit's excitatory activity was described by a Naka–Rushton-like equation; that is, it rose sigmoidally to a maximum firing rate as a function of the difference between the input and inhibition from the other two units. This difference was half-rectified such that if the inhibition exceeded the input the difference was considered zero, i.e., for the left unit [Vleft(t) − gIVright − gIVfused]+, as seen in Equation 3. Each unit suppressed the other two with an inhibitory activity IV that approached its excitatory activity EV with a time constant of τI = 11 ms (i.e., IVleft approaches EVleft, as seen in Eq. 4). Our model also included a noise component in the inhibitory strength equation, which caused simulated percept durations to become log-normally distributed (Fox and Herrmann, 1967; Lehky, 1988). The noise values were drawn from a Gaussian distribution with mean 0 and standard deviation 400 (Eq. 5). Finally, the strength of slow self-adaptation approached h = 0.47 of the excitatory activity with a time constant τH. This time constant was fit so that the model produced mean durations similar to those of each subject at tristability. In a grid search, the model was found to produce mean duration times approximately 1.066 times the value of τH. This relationship was used to determine τH for each subject. The resulting τH values ranged from 2.94 to 5.92 s with a mean of 4.17 s. Simulated durations had a correlation of 0.94 with observed durations.
Apart from these fitted τH values, the addition of noise, and inhibition from two units rather than one, all equations and parameters were preserved from the Wilson model.
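To make the structure of the three-unit model concrete, here is a hedged Euler-integration sketch in Python. It uses the values stated above (V = 10, g = 0.45, h = 0.47, τE = 20 ms, τI = 11 ms, noise SD 400, and τH taken as mean duration / 1.066). The Naka–Rushton constants (maximum rate 100, semi-saturation 10, exponent 2, with adaptation added to the semi-saturation term) and the way the noise is scaled per time step are assumptions based on the general form of Wilson-style rivalry models. They are not given in the text, so the output should not be expected to reproduce the reported simulations.

```python
import random


def tau_h_from_mean(mean_duration_ms):
    """Invert the reported grid-search relation: mean duration ~ 1.066 * tau_H."""
    return mean_duration_ms / 1.066


def simulate_three_units(tau_h_ms, seconds=10.0, dt=1.0, seed=0,
                         v=10.0, g=0.45, h=0.47, tau_e=20.0, tau_i=11.0,
                         noise_sd=400.0, r_max=100.0, semi=10.0):
    """Euler-integrate three mutually inhibiting, self-adapting units.

    Returns one (E_left, E_right, E_fused) tuple per time step (dt in ms).
    r_max, semi, the squared nonlinearity, and the noise scaling are assumptions.
    """
    rng = random.Random(seed)
    n_units = 3
    E = [0.0] * n_units  # excitatory activity
    I = [0.0] * n_units  # inhibitory activity
    H = [0.0] * n_units  # slow self-adaptation
    trace = []
    for _ in range(int(seconds * 1000 / dt)):
        new_E, new_I, new_H = [], [], []
        for u in range(n_units):
            inhibition = g * sum(I[k] for k in range(n_units) if k != u)
            drive = max(v - inhibition, 0.0)  # half-rectified input minus inhibition
            target = r_max * drive ** 2 / ((semi + H[u]) ** 2 + drive ** 2)
            new_E.append(E[u] + dt / tau_e * (target - E[u]))
            noise = rng.gauss(0.0, noise_sd) * dt / 1000.0  # assumed noise scaling
            new_I.append(I[u] + dt / tau_i * (E[u] - I[u]) + noise)
            new_H.append(H[u] + dt / tau_h_ms * (h * E[u] - H[u]))
        E, I, H = new_E, new_I, new_H
        trace.append(tuple(E))
    return trace


trace = simulate_three_units(tau_h_from_mean(4450.0), seconds=2.0)
print(len(trace), "time steps simulated")
```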
The single-process model was simulated for 8 min for each subject with a simulation step of 1 ms. To extract simulated button-press records, the unit with the highest excitatory value at a given time step was considered to be perceptually dominant. Durations shorter than 150 ms were discarded to account for subjects' limited response speeds. The eight-minute simulated response record for each observer was then processed in the same way as the psychophysical data for comparison (see Fig. 9).
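The extraction of button-press records described above amounts to run-length encoding the most active unit and dropping very brief periods. A minimal Python sketch follows; the function name and the input layout (one tuple of excitatory values per 1 ms step) are hypothetical, and leaving neighboring periods unmerged after a short one is dropped is an assumption.

```python
def dominance_periods(trace, labels=("left", "right", "fused"),
                      dt_ms=1.0, min_ms=150.0):
    """Run-length encode the winning unit; drop periods shorter than min_ms."""
    periods = []
    for activities in trace:
        # The unit with the highest excitatory value is taken as dominant.
        winner = labels[max(range(len(activities)), key=activities.__getitem__)]
        if periods and periods[-1][0] == winner:
            periods[-1][1] += dt_ms
        else:
            periods.append([winner, dt_ms])
    return [(label, ms) for label, ms in periods if ms >= min_ms]


# Toy trace: 200 ms left, 100 ms right (too brief to report), 300 ms fused.
toy = [(1.0, 0.0, 0.0)] * 200 + [(0.0, 1.0, 0.0)] * 100 + [(0.0, 0.0, 1.0)] * 300
print(dominance_periods(toy))
```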
Code accessibility.
The custom code used to simulate our models is available upon request from the corresponding author.
Duration distribution fitting.
To better assess the duration distributions of rivalry and fusion percepts, times from multiple trials were combined. Duration distributions for rivalry were taken from the 40°, 50°, and 90° disparity trials in both horizontal and vertical reference trial blocks. Those of fusion were taken from the 25°, 30°, and 35° trials of the vertical trials. In all cases, the final period of each trial was omitted due to its truncation by the trial's end. Log-normal distributions were fit by finding the mean and standard deviation which maximized the log-likelihood of the z-scored data. The Nelder–Mead method was used to search over parameters, and bootstrapping was used to get 95% confidence interval (CI) estimates.
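A log-normal fit with a bootstrapped confidence interval can be sketched as follows. Note the substitution: instead of the Nelder–Mead search described above, this sketch uses the closed-form maximum-likelihood estimates for a log-normal (the mean and standard deviation of the log durations), which is the optimum a numerical search over the same likelihood should approach. It also skips the z-scoring step, and the percentile bootstrap is an assumed choice of CI method.

```python
import math
import random


def fit_lognormal(durations):
    """Closed-form maximum-likelihood (mu, sigma) of the log durations."""
    logs = [math.log(d) for d in durations]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


def bootstrap_ci_mu(durations, n_boot=1000, seed=0):
    """Percentile 95% CI for mu, resampling durations with replacement."""
    rng = random.Random(seed)
    mus = sorted(
        fit_lognormal(rng.choices(durations, k=len(durations)))[0]
        for _ in range(n_boot)
    )
    return mus[int(0.025 * n_boot)], mus[int(0.975 * n_boot) - 1]


# Synthetic durations (seconds) drawn from a log-normal, for illustration only.
rng = random.Random(1)
sample = [rng.lognormvariate(1.2, 0.6) for _ in range(200)]
print(fit_lognormal(sample), bootstrap_ci_mu(sample))
```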
Statistical analysis.
Kolmogorov–Smirnov (KS) tests were used when comparing the duration distributions of percepts. These tests determined whether observed values could be considered statistically distinct from each other (when comparing triads around fused and monocular percepts), or from simulated values sampled from a fitted distribution (when testing for log normal fits).
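For readers unfamiliar with the test, a two-sample KS comparison can be sketched in plain Python as below. The statistic is the largest gap between the two empirical cumulative distributions. The p-value uses the standard asymptotic Kolmogorov series with a common small-sample correction, so it is an approximation and not necessarily the routine used in the original analysis.

```python
import math


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov series."""
    if d <= 0.0:
        return 1.0
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(max(p, 0.0), 1.0)


fusion = [2.1, 3.4, 4.0, 5.2, 6.8, 3.3, 4.7]   # made-up durations (s)
rivalry = [1.9, 3.1, 4.4, 5.0, 6.1, 2.8, 4.2]  # made-up durations (s)
d = ks_statistic(fusion, rivalry)
print(d, ks_pvalue(d, len(fusion), len(rivalry)))
```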
Autocorrelation of perceptual durations.
To measure the degree to which each perceptual duration predicted future durations, we computed the correlations between the duration of each percept and those of its neighbors from one to 10 periods later. Each observer's perceptual durations were shifted by between zero and 10 periods and correlated with themselves. The values were normalized so that the autocorrelation values (for shifts of zero) equaled one. The results for all observers were then averaged to produce the final plots (see Figs. 5, 9C).
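The autocorrelation computation can be sketched as follows. This Python version subtracts the mean once over the whole series and divides every lag by the zero-lag sum, which is one common normalization; whether the original analysis handled the mean and the shrinking overlap at larger lags in exactly this way is an assumption.

```python
def autocorrelation(durations, max_lag=10):
    """Autocorrelation of a duration series for lags 0..max_lag (lag 0 equals 1)."""
    n = len(durations)
    mu = sum(durations) / n
    centered = [d - mu for d in durations]
    denom = sum(c * c for c in centered)  # zero-lag sum used for normalization
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


# A strictly alternating series is strongly anticorrelated at lag 1.
ac = autocorrelation([1.0, 3.0] * 10)
print([round(v, 2) for v in ac[:3]])
```

Independent successive durations would instead give values near zero at every nonzero lag, which is the pattern reported for rivalry.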
Observer exclusion criteria.
We removed four observers from our analysis on the basis of unreliable fusion or rivalry reports. Specifically, two observers (S02 and S08; see Fig. 3) had an average rivalry duration for 90° disparity trials that was >1.5 times the interquartile range (IQR) above the mean of all observers (>∼6 s). These observers sometimes reported seeing one orientation for nearly whole trials, and so were considered to be outliers in their perception of rivalry. Two observers (S05 and S06; see Fig. 3) saw stimuli with 0° or 10° of disparity as non-vertical (unfused) an average of >1.5 IQR more than the rest of the observers (>∼17/240 s total). As stimuli with 10° of disparity should appear mostly fused and 0° have no possibility of a rotated appearance, these observers who reported significant non-fusion were considered outliers in their perception of fusion.
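The exclusion rule for rivalry can be sketched as a cutoff of the group mean plus 1.5 times the IQR. The Python below is illustrative: the values are made up, the quartile method is whatever `statistics.quantiles` uses by default (the original analysis may have computed quartiles differently), and including the candidate outlier in the group mean is an assumption.

```python
from statistics import mean, quantiles


def high_outliers(values, k=1.5):
    """Indices of values more than k * IQR above the mean of all values."""
    q1, _, q3 = quantiles(values, n=4)
    cutoff = mean(values) + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v > cutoff]


# Made-up mean rivalry durations (s) at 90 deg for seven hypothetical observers.
mean_durations = [3.0, 3.2, 3.5, 3.1, 3.4, 3.3, 9.0]
print(high_outliers(mean_durations))
```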
Results
Perceptual reports
Previous experiments suggesting that fusion and rivalry operate in exclusive stimulus regimes have supported the assumption that they are separate processes. To test whether the perception of fusion and rivalry can dynamically change over time when viewing a single static stimulus, we had observers report the orientations of dichoptic Gabor patches over 60 s trials. Each pair had some orientation disparity evenly split about the vertical axis, so that dominance by either eye gave a rotated percept while fusion resulted in a vertical appearance (Fig. 1A). The fused stimuli also supported perception of tilt in depth, but we did not ask the participants to report tilt, only the perceived orientation. Over 18 trials, we tested nine different orientation disparities. The shifting and unusual nature of dichoptic percepts makes observer instruction especially important; all cases must be accounted for and assigned a clear response to ensure reliable results. Observers were told to hold down the right or left arrow keys to indicate clockwise or counter-clockwise appearances and use the up-arrow key to indicate verticality. If the stimulus appeared mixed or of uncertain orientation, they were to release all keys until it resolved (see Fig. 1B for example reports).
Figure 1. Example stimuli and perceptual reports. A, Examples of stimuli presented dichoptically in our paradigm. B, Sample responses from three observers to 60 s trials of each of the example stimuli.
The 0° and 90° disparity conditions served as internal controls showing that observer responses reliably indicated their percepts. The 0° disparity stimulus was seen as vertical 95.2% of the time and rotated clockwise or counter-clockwise only 0.6% (Fig. 2A). 90° disparity stimuli were seen as rivalrous 90.5% of the time and vertical only 0.1% of the time. Observers with anomalous response patterns in these conditions were excluded from further analysis (though their results are presented in the observer-by-observer analysis of Fig. 3). Altogether, four observers were excluded on the basis of their aberrant perception of fusion or rivalry (see Materials and Methods for a full description of the exclusion criteria).
Figure 2. Rivalry and fusion are not separated by a hard threshold. A, Mean proportion of time each outcome was reported across orientation disparities about the vertical axis. B, Same plot for orientation disparities about the horizontal axis (i.e., without the involvement of stereopsis). Markers indicate the average across observers, linearly interpolated by lines. Error bars show 95% CIs for the means after bootstrapping across observers.
Figure 3. Individual observer responses to disparities about the vertical. Conventions are as in Figure 2. Red labels indicate outliers in either fusion (S05, S06) or rivalry (S02, S08) reports.
We found that stimuli with similar orientations fused and those with distant orientations rivaled as expected, but we also observed a previously unreported range of orientation disparities in which stimuli could appear either fused or rivalrous over time (Fig. 2A). At small orientation differences (<∼25°), fusion predominated (lavender curve). For larger differences (>∼35°), rivalry took hold (yellow and green curves indicating monocular views). Notably, periods of both fusion and rivalry were reported between ∼25° and 35°, a region we will refer to as the “tristable zone.” This region does not simply result from averaging over observers with varying thresholds between fusion and rivalry, as individual observers (Fig. 3) each reported periods of fusion, left- and right-eye dominance for some static stimuli.
To determine whether rivalry and fusion could be tristable even without stereopsis, the same task was repeated with orientation disparities about the horizontal axis (i.e., vertical disparities). Unlike the vertical reference trial block, disparities here should not have resulted in any appearance of tilt in depth. This removed the possibility of extraneous depth percepts interfering with observers' orientation reports. Despite eliminating stereopsis, observers reported a similar pattern of perception with some stimuli resulting in fusion, left- and right-eye dominant views over time (Fig. 2B). The range of fusibility did appear narrowed, with the tristable zone now spanning ∼20–25°. This is consistent with the horizontally elongated distribution of ecologically observed disparities (Read and Cumming, 2004) and the attendant broader horizontal tuning for disparity in primate visual neurons (Cumming, 2002).
A hallmark of binocular rivalry is the shape of its duration distribution, which we found to match that of the periods of fusion during tristability. The duration distribution of dominance periods in rivalry has historically been described as “gamma-like” (Fox and Herrmann, 1967; Levelt, 1967; Walker, 1975), but has more recently been found to be best described by a log normal fit (Lehky, 1995; Carter and Pettigrew, 2003; Zhou et al., 2004). As expected, our observers exhibited log-normally-distributed perceptual duration times for orthogonal stimuli (KS test p = 0.59; Fig. 4A). Although fusion is typically seen as indefinitely stable and without a duration distribution, the distribution of fusion times at tristability was also well described as log normal (KS test p = 0.19; Fig. 4B). Despite trading off with periods of fusion, the rivalrous percepts at tristability also continued to appear log-normally distributed (KS test p = 0.53; Fig. 4C). The mean durations of rivalry and fusion at tristability were similar by selection (4.51 ± 0.24 s for fusion, 4.06 ± 0.29 s for rivalry; 10 of 11 observers had overlapping 95% confidence intervals), as the tristable point was chosen for each observer as the condition with the closest total amounts of each state. However, the standard deviations were also similar (4.90 ± 0.44 s for fusion and 3.42 ± 0.30 s for rivalry; 6 of 11 observers had overlapping 95% confidence intervals), which was not a given. Thus, all three perceptual outcomes observed at the tristable point continued to exhibit a pattern of durations typical of normal bistable rivalry.
Perceptual duration distributions are log normal in rivalry and tristability. A, Distribution of rivalry durations for orthogonal conditions. B, Distribution of fusion durations at the tristable points of each observer. C, Same as B for rivalry durations. Inset plots in each contain histogram versions of the same data.
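The log-normal comparison described above can be illustrated with a minimal sketch. It assumes the log-normal parameters are estimated from the logarithms of the durations and that the Kolmogorov-Smirnov p value comes from the asymptotic Kolmogorov series; the function name `lognormal_ks` and the simulated sample are hypothetical, and the fitting procedure behind Figure 4 is not specified in this section. Because the parameters are estimated from the same sample that is tested, the p value from this sketch is conservative (it is less likely to reject than a corrected test).

```python
import math
import random


def lognormal_ks(durations):
    """Fit a log-normal to durations and return (mu, sigma, D, p).

    mu and sigma are the mean and SD of log(duration); D is the
    one-sample KS statistic against the fitted distribution; p is an
    asymptotic approximation (an assumption of this sketch).
    """
    logs = sorted(math.log(d) for d in durations)
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / (n - 1))

    # Largest gap between the empirical CDF and the fitted normal CDF
    # of the log durations (equivalent to testing log-normality).
    d_stat = 0.0
    for i, x in enumerate(logs):
        cdf = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
        d_stat = max(d_stat, (i + 1) / n - cdf, cdf - i / n)

    # Asymptotic Kolmogorov series with a small-sample correction.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d_stat
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    p = min(max(2.0 * series, 0.0), 1.0)
    return mu, sigma, d_stat, p


# Hypothetical usage: synthetic durations stand in for one observer's reports.
rng = random.Random(7)
sample = [rng.lognormvariate(1.2, 0.6) for _ in range(500)]
mu, sigma, d_stat, p = lognormal_ks(sample)
```

A large p value here means only that the sample is not distinguishable from a log-normal distribution at the chosen sample size; it does not by itself favor the log-normal over a gamma fit.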
Another hallmark of bistable rivalry is the independence of subsequent perceptual durations (Fox and Herrmann, 1967; Blake et al., 1971; Walker, 1975; Lehky, 1995), and this was also true for fusion during tristability. For orthogonal Gabor patches, we observed as expected that the duration of a period of rivalry gave little to no information about how long the next would last (near-zero correlations with the next nine periods; Fig. 5A). We did the same analysis on all fused and rivalrous percepts at the tristable point and found that they shared this property (Fig. 5B). Finally, to see whether the alternation between fusion and “non-fusion” (i.e., either monocular image or no response) could be seen as its own bistable process, we tested whether subsequent fusion and interfusion periods were uncorrelated. The results showed that successive fusion and interfusion periods were indeed uncorrelated (Fig. 5C), so we cannot rule out models of fusion as a bistable state from these data alone.
Subsequent durations are independent in rivalry and tristability. A, Autocorrelation function for rivalry periods in orthogonal conditions. B, Autocorrelation function for all (fused and rivalrous) periods at the tristable points of each observer. C, Autocorrelation function for fusion and interfusion periods at the tristable points of each observer. Inset icons provide examples of which durations are being compared. Markers show average values across observers, interpolated by lines. Error bars show 95% CIs after bootstrapping across observers.
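The independence analysis above amounts to correlating each duration with the one that follows it at lags of one to nine periods. The following is a minimal sketch of that computation for a single sequence of durations; the function names are hypothetical, and the bootstrapping across observers used for the error bars in Figure 5 is not reproduced.

```python
import random


def lagged_correlation(durations, lag):
    """Pearson correlation between each duration and the one `lag` periods later."""
    x = durations[:-lag]
    y = durations[lag:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def autocorrelation(durations, max_lag=9):
    """Correlations at lags 1..max_lag, one value per lag."""
    return [lagged_correlation(durations, k) for k in range(1, max_lag + 1)]


# Hypothetical usage: independently drawn durations should give values near zero.
rng = random.Random(3)
independent = [rng.lognormvariate(1.2, 0.6) for _ in range(2000)]
acf = autocorrelation(independent)
```

For independently drawn durations the values scatter around zero with a spread that shrinks as the sequence lengthens, which is the pattern the text describes for rivalry and for fusion at tristability.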
Simulations suggest fusion and rivalry are not separate bistable processes
The log-normal distribution and independence of subsequent durations of rivalry are thought to result from noisy adaptation of left- and right-eye units in a mutually-inhibitory arrangement (Lehky, 1988; Laing and Chow, 2002). One possible interpretation of seeing these properties in fusion at the tristable point is that fusion too is an adapting process engaged in mutual inhibition with rivalry. The question then becomes whether all three states are part of a multistable system with mutual inhibition between each pair (single-process model; Fig. 6A), or if rivalry remains a separate bistable process that competes with fusion/non-fusion (separate-process model; Fig. 6A). The separate-process framing would maintain fusion as a distinct process from rivalry, with transitions to and from fusion distinct from those between rivalrous states. However, the generation of tristable perception from two bistable processes would leave distinct fingerprints which can be tested for.
Simulation of models of rivalry and fusion as separate bistable processes, with rivalry behaving in different ways during periods of fusion. A, Diagram depictions of a separate-process model (left) where fusion is a higher-level bistable process that alternates with rivalry, and a single-process model (right) where fusion alternates with each monocular percept directly. B, Modeling rivalry as being unaffected by periods of fusion. C, Modeling rivalry as slowing by a factor of 1.5 during periods of fusion. D, Modeling rivalry as slowing by a factor of 3 during periods of fusion. E, Modeling rivalry as stopping during periods of fusion. F, Modeling rivalry as interrupted during periods of fusion, using the fusion and interfusion durations as model inputs TOFF and TON.
If binocular rivalry and fusion/non-fusion are conceived of as separate bistable processes, they could generate a three-state output in a few simple ways. When the fused percept is dominant, rivalry could continue, slow, stop, or be interrupted. That is, the rivalry might continue unconsciously (continuation model; Fig. 6B), as has been observed for some “invisible” rivaling stimuli (Zou et al., 2016). Alternatively, the rivalry could slow down or even appear to stop (slowdown models, Fig. 6C,D; stopping model, Fig. 6E), as observed in experiments where attention is removed from dichoptic stimuli (Paffen et al., 2006; Zhang et al., 2011). Finally, the rivalrous percepts themselves could be interrupted and begin rivalry anew upon their reappearance, as occurs with intermittent presentations of ambiguous stimuli (Leopold et al., 2002; Kanai et al., 2007; Noest et al., 2007; Pearson and Brascamp, 2008) (interruption model; Fig. 6F). Each of these arrangements leads to different patterns in the resulting tristable time course; in particular, the perceptual dominance seen immediately before and after periods of fusion is diagnostic. The proportion of returns to the previously dominant eye after fusion, compared with transitions to the other eye, indicates how the rivalry process behaves after being suppressed. We simulated each model by sampling from measured fusion and rivalry duration distributions to generate time courses of their resulting tristable outputs.
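The continuation, slowdown, and stopping variants described above can be sketched with a single parameter that scales how fast a hidden rivalry clock runs during fusion. This is an illustration under stated assumptions rather than the simulation code used for Figure 6: durations are drawn from log-normal distributions parameterized by the group means and standard deviations reported above, the interfusion distribution is assumed (hypothetically) to equal the fusion distribution, and the interruption model is not covered. All function and parameter names are hypothetical.

```python
import math
import random


def draw(rng, mean, sd):
    """Sample a log-normal duration (s) with the given arithmetic mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return rng.lognormvariate(math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2))


def advance(rng, eye, remaining, clock_time, rivalry):
    """Run the rivalry clock forward, flipping the dominant eye at each expiry."""
    while clock_time >= remaining:
        clock_time -= remaining
        eye = "R" if eye == "L" else "L"
        remaining = draw(rng, *rivalry)
    return eye, remaining - clock_time


def simulate_separate_process(slowdown, total_time=480.0, seed=0,
                              fusion=(4.51, 4.90),
                              interfusion=(4.51, 4.90),  # assumption
                              rivalry=(4.06, 3.42)):
    """Return (returns, transitions) counted across fusion periods.

    slowdown = 1 is the continuation model, 1.5 or 3 a slowdown model,
    and math.inf the stopping model.
    """
    rng = random.Random(seed)
    eye = rng.choice("LR")
    remaining = draw(rng, *rivalry)
    returns = transitions = 0
    t = 0.0
    while t < total_time:
        off = draw(rng, *interfusion)
        on = draw(rng, *fusion)
        # Between fusion periods, rivalry is visible and runs at full speed.
        eye, remaining = advance(rng, eye, remaining, off, rivalry)
        before = eye
        # During fusion, the hidden rivalry clock runs `slowdown` times slower.
        eye, remaining = advance(rng, eye, remaining, on / slowdown, rivalry)
        if eye == before:
            returns += 1
        else:
            transitions += 1
        t += off + on
    return returns, transitions


# Hypothetical usage: one 8 min time course per model variant.
stopping = simulate_separate_process(math.inf)
continuation = simulate_separate_process(1.0)
```

With an infinite slowdown the rivalry clock does not advance during fusion, so every fusion period in this sketch ends in a return; finite slowdowns allow an odd number of hidden switches and therefore some transitions.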
Our data show a preference for dominance transitions to a new eye after periods of fusion that is not captured by any of the separate-process models we simulated. In bistable processes, one state always gives way to the other, but with tristability there is a choice of two states at each transition. Of particular interest when considering models where fusion states are special is which eye regains dominance after fusion. Does perception “return” to the previously-dominant eye, or “transition” to the one which had been suppressed? The stopping model makes the strongest prediction: it directly stores the previous state of dominance and produces a return to it after every period of fusion (Fig. 7A, red simulation points show close to zero transitions relative to blue psychophysical data). Computing the proportion of transitions across observers for each of the 200 stopping model simulations gave a 95% CI of [0.00–0.00]. Transitions can only happen here if the fusion period happens to occur at a natural exchange in rivalry. This prediction is softened with less-extreme slowing of the rivalry process during fusion, as seen in the slowdown models, which produce more transitions (Fig. 7B,C; 95% CIs of [0.19–0.25] and [0.29–0.37]). However, even the continuation model, where the rivalry process continues unabated during fusion, predicts more returns than transitions overall (Fig. 7D; 95% CI [0.35–0.42]). The interruption model also predicts more returns than transitions, as rivalrous percepts are seen to “stabilize” (return to the previously dominant state) when briefly removed from view (Noest et al., 2007; Fig. 7E; 95% CI [0.17–0.23], orange points are above the diagonal). In contrast to all of our simulations, only the real data consistently exhibit more transitions than returns (blue dots in all panels of Fig. 7 below the diagonal), with a 95% CI of [0.51–0.62]. This pattern has also been observed in other multistable systems with three competing states (Naber et al., 2010; Huguet et al., 2014), as percepts adapt while dominant and are thus less likely to return to dominance after being recently seen.
Separate-process models fail to capture the observed preference for transitions in rivalry dominance after fusion. Points show means and 95% CIs of counts for returns and transitions across 200 8-min simulations for individual observers, with gray lines indicating an equal number of transitions and returns. Psychophysical data (blue) were bootstrapped across trials for each observer. A, The stopping model almost never exhibits transitions. B, The 3× slowed model allows for rivalry to continue slowly during fusion, producing some transitions. C, The 1.5× slowed model shows still more transitions. D, The continuation model (effectively a 1× slowed model) shows the most transitions, but still predicts more returns overall. E, The interruption model shows a strong preference for returns. All are compared with the real data, which favor transitions.
Another property we see in our data but not these separate-process models is an increasing likelihood of return to the previously-dominant eye after longer fusion durations. In our data, short fusion periods are often followed by a transition, whereas longer ones approach a more even split between outcomes (Fig. 8A). None of our simulated separate-process models showed this pattern (Fig. 8B–F). Although these models do not exhibit this property, it is a natural result of distributed competition and adaptation among three percepts. Long periods of fusion allow more time for the previously dominant (and thus adapted) eye to recover and return to dominance, whereas short ones will favor a transition to the less-adapted representation. This property has again been observed in multistable perceptual systems with three outcomes (Naber et al., 2010; Huguet et al., 2014) as a result of the competitive disadvantage of recent dominance due to adaptation. Moreover, we find a statistically indistinguishable pattern of the frequency of returns after an intervening monocular percept as a function of the duration of the intervening monocular state. Specifically, the distributions of intervening durations were statistically indistinguishable for both transitions (KS test p = 0.20) and returns (KS test p = 0.48) around monocular states. Although this suggests that the monocular and fusion states are interchangeable in this respect, we would be careful to point out that the probability of being in any one of these states is not necessarily balanced as that depends on the exact orientation disparity (Fig. 2) and there is no guarantee that the fusion state should have the same adaptation constant as monocular states.
Separate-process models do not capture the increased chance of returns after longer periods of fusion seen in the real data. A, Returns approach half from below as fusion periods lengthen in the real data (correlation of fusion durations with transition probabilities for meaned bins, corr = 0.866). Data were bootstrapped across trials for each observer. B, Transitions tend not to occur for the stopping model. C, The 3× slowed model approaches 50% returns from above as fusion periods lengthen (corr = −0.957). D, The 1.5× slowed model shows the same pattern (corr = −0.946). E, The continuation model (effectively a 1× slowdown model) also shows the same pattern as the slowed models (corr = −0.912). F, The interruption model predicts more returns than transitions across all observed fusion durations. Error bars show 95% CIs across 200 simulations. All panels show the means of 10 equal-sized bins of the total data across observers.
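The binned analysis summarized in this caption can be sketched as follows: fusion periods are sorted by duration, split into 10 equal-sized bins, and the mean duration of each bin is correlated with the fraction of returns in that bin. The function names and the synthetic events are hypothetical, and the sketch assumes the correlation is computed on the fraction of returns; the bootstrapping used for the error bars is not reproduced.

```python
def binned_return_rate(events, n_bins=10):
    """Bin (fusion_duration_s, returned) pairs into equal-sized duration bins.

    Returns one (mean_duration, return_fraction) pair per bin.
    Requires at least n_bins events.
    """
    ordered = sorted(events)
    n = len(ordered)
    bins = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        mean_duration = sum(d for d, _ in chunk) / len(chunk)
        return_fraction = sum(1 for _, r in chunk if r) / len(chunk)
        bins.append((mean_duration, return_fraction))
    return bins


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical usage: synthetic events in which only long fusion periods
# are followed by a return to the previously dominant eye.
events = [(0.1 * i, i > 60) for i in range(1, 101)]
bins = binned_return_rate(events)
corr = pearson([d for d, _ in bins], [f for _, f in bins])
```

A positive correlation in this convention corresponds to returns becoming more likely after longer fusion periods, and a negative one to the opposite trend.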
Taken together, the above patterns in our data suggest that fusion and rivalry are not well described by dual-process models. Here, we consider whether a single-process model of the type used to describe multistable perception could be adapted to model tristability. To demonstrate how a simple multistable model based on competition and adaptation could reproduce many of the features we observed in tristability, we extended an existing model of rivalry to simulate a three-way “single-process” system. The model that we modified (Wilson, 2003) consisted of two units which competed through mutual inhibition and adapted over time. We added a third unit, so that each unit received inhibition from the other two. We also added a noise term to the adaptation process to make the model stochastic and fit the time constant of adaptation to match each subject's mean percept duration at tristability. The resulting model alternated among the three states over time (Fig. 9A) and produced log-normally distributed duration times (Fig. 9B) which were not correlated (Fig. 9C), as we saw in our psychophysical data. Similar to other three-state multistable systems (Naber et al., 2010; Huguet et al., 2014), each unit adapted over time when dominant, which resulted in recently-dominant percepts being less likely to regain dominance. As a result, the system tended to transition to the previously-suppressed unit after periods of fusion (Fig. 9D) and this tendency diminished for longer periods of fusion (Fig. 9E). All of these features were observed after fitting the model only to the mean percept durations of each observer.
A “single-process” model of tristability as three competing, self-adapting nodes is able to reproduce many features of our data. A, Example simulation time course with τH = 5.25 s. B, The single-process model produces a good fit to a log-normal duration distribution. C, The single-process model does not show correlation between successive periods of perception. Error bars indicate 95% CIs. D, The single-process model shows a tendency to transition to the previously-suppressed eye after periods of fusion. E, The single-process model's tendency toward transitions after fusion is dependent on the duration of fusion, with returns becoming more likely as fusion durations increase.
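The idea of three mutually inhibiting, self-adapting units can be illustrated with a minimal sketch. It does not reproduce the Wilson (2003) equations or the fitted parameters, which are not given in this section; instead it assumes a simpler threshold-linear rate model in which each unit receives a constant drive, inhibition from the other two units, and subtractive adaptation with additive noise. Every name and parameter value below (`tau`, `tau_h`, `beta`, `g`, `noise`) is a hypothetical choice made for illustration.

```python
import random


def simulate_three_units(duration=60.0, dt=0.002, tau=0.02, tau_h=4.0,
                         beta=2.0, g=3.0, noise=0.02, seed=1):
    """Euler-integrate three competing rate units with noisy adaptation.

    Returns a list of (unit_index, dominance_duration_s) in order of
    occurrence, where the dominant unit is the one with the highest rate.
    """
    rng = random.Random(seed)
    e = [0.1, 0.0, 0.0]  # firing rates; unit 0 gets a small head start
    h = [0.0, 0.0, 0.0]  # adaptation variables
    periods = []
    current, onset = 0, 0.0
    for step in range(int(duration / dt)):
        total = sum(e)
        new_e = []
        for i in range(3):
            # Constant drive minus inhibition from the other two units
            # and minus the unit's own adaptation.
            drive = 1.0 - beta * (total - e[i]) - g * h[i]
            new_e.append(e[i] + dt / tau * (-e[i] + max(drive, 0.0)))
        for i in range(3):
            # Adaptation tracks the unit's own rate, with added noise.
            h[i] += (dt / tau_h * (-h[i] + e[i])
                     + noise * dt ** 0.5 * rng.gauss(0.0, 1.0))
        e = new_e
        winner = max(range(3), key=lambda i: e[i])
        if winner != current:
            t = (step + 1) * dt
            periods.append((current, t - onset))
            current, onset = winner, t
    periods.append((current, duration - onset))
    return periods


# Hypothetical usage: unit 2 could stand in for the fused percept.
periods = simulate_three_units()
```

In this sketch the adaptation time constant `tau_h` sets the time scale of alternation, so lengthening it lengthens dominance periods; a suppressed unit escapes once the dominant unit has adapted enough, and the unit that has been suppressed longest is the least adapted, which favors transitions over returns.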
Discussion
We found that rivalry and fusion are not mutually exclusive; periods of fusion, left- and right-eye dominance can all result from one stimulus. Although these outcomes are mutually exclusive in each moment, the underlying processes and stimuli which engage them are not. This overturns the notion that fusion precludes rivalry (Blake, 1989; Blake et al., 1991; O'Shea, 1998). We found that simple models of fusion and rivalry as separate bistable processes, where fusion alternates with rivalry, cannot reproduce our findings. Specifically, we observed that rivalrous dominance tended toward a different eye's view after fusion periods and that this bias decreased over longer fusion intervals. An existing model of binocular rivalry expanded with a third node was able to reproduce this property. This implies the perceptual states are trading off based on adaptation levels, as seen in other multistable perceptual processes (Naber et al., 2010; Huguet et al., 2014).
We propose that fusion and rivalry be seen as components of one multistable system. Our tristable percepts exhibit the uncorrelated percept durations and log-normal duration distributions characteristic of bistable percepts such as rivalry (Borsellino et al., 1972; Zhou et al., 2004; Brascamp et al., 2005; van Ee, 2005; Pressnitzer and Hupé, 2006; O'Shea et al., 2009). Other studies of perceptual multistability with three or more states have also documented these properties (Suzuki and Grabowecky, 2002; Hupé, 2010; Naber et al., 2010; Huguet et al., 2014). We argue that the exclusivity of fusion and rivalry in very similar or different stimuli results from their extremity, with gradual mixing possible for intermediate cases. This framing places fused and rivalrous percepts on equal footing, all capable of taking place with rivalry-like dynamics for a given stimulus.
Our findings challenge the prevailing assumption that fusion and rivalry are mutually exclusive, indefinitely stable processes separated by a stimulus threshold. This threshold concept has origins in early studies of retinal fusion and diplopia (double vision). Points and lines fuse within a certain distance and result in diplopia otherwise (Panum, 1858). The allowable distance was shown to exhibit hysteresis; fused points could be drawn farther apart than before and remain so (Fender and Julesz, 1967). This phenomenology was intuitively extended to the higher-level phenomena of fusion and rivalry. Rivalry, like diplopia, was seen as a failure of fusion resulting from excessive disparities: “… binocular rivalry is the default outcome when interocular features differ by an amount too great to be fused… According to my theory, the presence of matching features in the two eyes' images makes those features exempt from binocular suppression.” (Blake, 1989). Although our findings support the mutual exclusivity of rivalry and fusion at any particular moment in time, they overturn the notion that the features of a fused stimulus are exempt from binocular suppression. This idea was supported by a previous experiment where a single vertical grating was matched with a horizontal-vertical plaid in the other eye (Blake and Boothroyd, 1985). The vertical components were seen to fuse together, with the unmatched horizontal feature remaining visible to form a stable plaid. Despite the presence of orthogonal dichoptic gratings, no rivalry was observed; it seemed suppressed by the fusion. However, when the vertical gratings in each eye are fused, the horizontal grating is paired with a blank field in the other eye. Features with no matching contours in the other eye are known to enjoy a stable perceptual dominance (Levelt, 1965). Therefore, this result shows that fusible pairs of stimuli are combined preferentially compared with rivalrous pairs. This question of stimulus pairing does not suggest whether fusion or rivalry will win out for a particular stimulus which might support either outcome. To address this question, a single pair of dichoptic yet fusible stimuli is needed, as we used in our paradigm.
Our finding that a stimulus can result in both fusion and rivalry is supported by some related results. A fusible patch can rival when placed in a larger rivalrous context (Takase et al., 2008). Brief presentations of some stimuli can result in either outcome on different trials (Ono et al., 1977; Braddick, 1979). Moving dot stimuli with disparities in direction-of-motion have been shown to exhibit both rivalry and fusion over time (Blake et al., 1985). When dichoptic stimuli are rotated together or apart over time, fusion or rivalry can occur at the same orientation disparity in a hysteresis effect that depends on the stimulus history (Buckthought et al., 2008). These findings show that a stimulus does not always fall clearly on one side of some threshold of fusibility. However, none of the previous studies have examined the perceptual dynamics involved, having reported at most the overall proportions of each state. As the moving dot stimuli were constantly changing, it is also possible that they happened to become more or less fusible over time. Although our results cannot speak to the dynamics of fusion and rivalry for moving or changing stimuli, our findings are distinct in using static stimuli presented over longer durations of time. It is possible that our stimuli changed in appearance over time due to perceptual noise, but the small (2.6° mean) orientation acuity thresholds we measured in our subjects suggest that this was not a major contributor to the tristable zone that we observed across ∼10° of stimulus disparity. The periods of fusion and rivalry we observed over time were thus not likely the result of stimulus history-related effects, though they are not inconsistent with previous observations of hysteresis in binocular vision. Although other studies have challenged the idea that a stimulus should result in a single outcome, none have shown that both rivalry and fusion can result for a static stimulus over time.
Some models aiming to account for a threshold between rivalry and fusion portray them as separate mutually-inhibitory, self-adapting processes (Julesz and Tyler, 1976; Tyler and Julesz, 1976; Buckthought et al., 2008) which could conceivably produce a tristable state. Self-adaptation and mutual inhibition of monocular populations have been central to modeling binocular rivalry (Sperling, 1970; Lehky, 1988; Blake, 1989; Laing and Chow, 2002). Therefore, for inputs that activate both fusion and rivalry mechanisms, such models would be well disposed to produce rivalry-like alternations between those outcomes. However, we found that such separate-process models cannot reproduce our data, no matter whether the rivalry process is frozen, interrupted or allowed to continue during fusion. On the other hand, a simple single-process model of tristability fit only to subjects' mean percept durations was able to capture many features of our data. We conclude that models of fusion and rivalry as separate bistable processes cannot account for the tristable state we observed, but a model with three-way competition can.
Our finding that ambiguous percepts are perceived tristably is consistent with a growing body of work that views rivalry as having an integral role in consistent scene interpretation, rather than being an aberrant and unecological failure mode of binocular vision. Previous work has shown that regions of binocular conflict can provide cues for depth perception, particularly around occluding edges (Nakayama and Shimojo, 1990; Grossberg and McLoughlin, 1997; Tsirlin et al., 2014; Goncalves and Welchman, 2017). Any nonhorizontal depth edge necessarily results in a binocular mismatch where only one eye can see a patch of space around it. In these cases, binocular conflict and suppression work alongside the fusion process to achieve stereopsis. Rather than conceiving of rivalry and fusion as mutually-exclusive and conflicting processes, perhaps monocular and fused percepts can be seen as potential (and necessary) interpretations which help the visual system to settle on a unified view of uncertain inputs. Treating fusion and rivalry as multistable outcomes of a single perceptual system rather than separable and independent processes is an important first step toward a unified understanding of their roles in binocular perception.
Footnotes
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship (Grant #DGE-1147470). Research reported in this publication was also supported by a training grant from the National Institutes of Health (Award #T32MH020016). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation. We acknowledge the generous support of Research to Prevent Blindness, the Lions Clubs International Foundation, and the Hellman Fellows Fund (J.L.G.).
The authors declare no competing financial interests.
Correspondence should be addressed to Guillaume Riesen at griesen@stanford.edu