## Abstract

The timing of perceptual decisions depends on both deterministic and stochastic factors, as the gradual accumulation of sensory evidence (deterministic) is contaminated by sensory and/or internal noise (stochastic). When human observers view multistable visual displays, successive episodes of stochastic accumulation culminate in repeated reversals of visual appearance. Treating reversal timing as a “first-passage time” problem, we ask how the observed timing densities constrain the underlying stochastic accumulation. Importantly, mean reversal times (i.e., deterministic factors) differ enormously between displays/observers/stimulation levels, whereas the variance and skewness of reversal times (i.e., stochastic factors) keep characteristic proportions of the mean. What sort of stochastic process could reproduce this highly consistent “scaling property?” Here we show that the collective activity of a finite population of bistable units (i.e., a generalized Ehrenfest process) quantitatively reproduces all aspects of the scaling property of multistable phenomena, in contrast to other processes under consideration (Poisson, Wiener, or Ornstein-Uhlenbeck process). The postulated units express the spontaneous dynamics of attractor assemblies transitioning between distinct activity states. Plausible candidates are cortical columns, or clusters of columns, as they are preferentially connected and spontaneously explore a restricted repertoire of activity states. Our findings suggests that perceptual representations are granular, probabilistic, and operate far from equilibrium, thereby offering a suitable substrate for statistical inference.

**SIGNIFICANCE STATEMENT** Spontaneous reversals of high-level perception, so-called multistable perception, conform to highly consistent and characteristic statistics, constraining plausible neural representations. We show that the observed perceptual dynamics would be reproduced quantitatively by a finite population of distinct neural assemblies, each with locally bistable activity, operating far from the collective equilibrium (generalized Ehrenfest process). Such a representation would be consistent with the intrinsic stochastic dynamics of neocortical activity, which is dominated by preferentially connected assemblies, such as cortical columns or clusters of columns. We predict that local neuron assemblies will express bistable dynamics, with spontaneous active-inactive transitions, whenever they contribute to high-level perception.

- attractor cell assemblies
- birth-death process
- cortical columns
- first-passage time
- multistable perception
- scaling property

## Introduction

Response time distributions reveal much about the neural mechanisms underlying perceptual decisions (Ratcliff and Smith, 2004; Smith and Ratcliff, 2009). A gradual and noisy accumulation of sensory information, which triggers a choice response when exceeding a criterion amount, explains the respective response time distributions of correct and error responses as well as their dependence on task factors, such as choice utility or the desired speed or accuracy of response. In several instances, neurophysiological studies were able to confirm and extend this quantitative understanding (Schall, 2001; Sugrue et al., 2005; Gold and Shadlen, 2007), revealing distinct cortical substrates for accumulating and for evaluating sensory information (Resulaj et al., 2009; Shadlen and Kiani, 2013).

Here we extend this approach to the multistable appearance of ambiguous visual displays (Leopold and Logothetis, 1999; Blake and Logothetis, 2002; Sterzer et al., 2009). Multistable situations are characterized by spontaneous perceptual decisions in the form of occasional, sudden reversals of visual appearance (Leopold and Logothetis, 1999; Braun and Mattia, 2010). The timing of such reversals depends primarily on stimulus factors (Levelt, 1967; Brascamp et al., 2006) and only secondarily on cognitive or task factors (e.g., Pearson et al., 2008).

Even though a dominant appearance is perceived as stable until a reversal transpires, its robustness gradually declines, as revealed by perturbation experiments (Wolfe, 1984; Nawrot and Blake, 1989; Petersik, 2002; Kang and Blake, 2010). It is thought that support for the currently dominant appearance degrades while, simultaneously, support for an alternative appearance accumulates, until a critical differential is reached (Laing and Chow, 2002; Wilson, 2003; Moreno-Bote et al., 2007; Noest et al., 2007; Hohwy et al., 2008). The cortical sites associated with, respectively, accumulating and evaluating competing evidence from multistable displays (e.g., Knapen et al., 2011) appear to overlap with the sites identified for choice tasks (Resulaj et al., 2009; Shadlen and Kiani, 2013).

The reversal times of ambiguous displays have been characterized in exquisite detail. Astonishingly, the distribution of reversal times maintains a characteristic, gamma-like shape (Levelt, 1967; Blake et al., 1971; Walker, 1975; De Marco et al., 1977; Murata et al., 2003; Pastukhov and Braun, 2007), even though mean reversal times vary widely between observers, displays, and input levels (Fox and Herrmann, 1967; Borsellino et al., 1972; Walker, 1975; Zhou et al., 2004; Brascamp et al., 2005). Moreover, the distribution in question is considerably more variable (“wider”) and less skewed (“shorter-tailed”) than typical choice time distributions (Carpenter, 2012; Pearson et al., 2014). The constancy of distribution shape constitutes a “scaling property”, as it implies that higher moments scale as appropriate powers of the mean.

Here we ask the following: what sort of stochastic process could reproduce both the scaling property and the short-tailed skewness of reversal time distributions? What could guarantee, for all displays and observers, that gradual accumulation of sensory information leading to perceptual reversals is contaminated by proportional noise: slower accumulations by less, and faster accumulations by more noise? And what sort of process could ensure that reversal time distributions are consistently short-tailed?

As a possible solution to this conundrum, we propose a finite population of cortical columns or clusters that are individually bistable in that they transition spontaneously between active and inactive states (generalized birth-death or generalized Ehrenfest [GE] process) (Karlin and McGregor, 1965; van Kampen, 1981; Cao et al., 2014). The discreteness of such a representation would automatically keep normalized moments constant and ensure a scaling property. Moreover, the asymmetric fluctuations of a finite population would ensure a short-tailed distribution. The proposed representation mimics the intrinsic stochastic dynamics of neocortical activity, which is dominated by preferentially connected local assemblies, such as cortical columns or clusters of columns.

## Materials and Methods

#### Perceptual observations

##### Observers.

Procedures were in accordance with the Declaration of Helsinki and were approved by the medical ethics board of the Otto-von-Guericke University (Magdeburg, Germany). All participants had normal or corrected-to-normal vision, were naive to the purpose of experiments, and were paid for participating.

##### Apparatus and general procedure.

Stimuli were generated with MATLAB (The MathWorks). Observers responded using a keyboard. Background luminance was kept at 36 cd/m^{2}. The experimental room was lit dimly (ambient luminance at 80 cd/m^{2}). For visual multistability, we considered situations in which strong retinal inputs are spontaneously suppressed: motion-induced-blindness (MIB), binocular rivalry (BR) (Campbell and Howell, 1972; Leopold and Logothetis, 1999; Bonneh et al., 2001), spontaneous reversals of illusory rotation in depth (kinetic depth effect [KDE]) (Wallach and O'Connell, 1953; Sperling and Dosher, 1994), spontaneous shifts in the apparent direction of motion (moving plaids [MP]) (Adelson and Movshon, 1982; von Grünau and Dubé, 1993), and spontaneous changes in the 3D appearance of line drawings (Necker cube [NC]) (Meng and Tong, 2004). The datasets on KDE, NC, and MP, as well as one of six datasets on BR, were published previously (Pastukhov and Braun, 2007; Pastukhov et al., 2013). New datasets were collected on MIB and on BR.

For auditory multistability, we considered spontaneous transitions between auditory segregation and integration (auditory streaming) (Bregman, 1994; Winkler et al., 2012). This dataset was kindly provided by I. Winkler and S. Denham (personal communication).

For choice reaction tasks, we reanalyzed published information on saccadic reaction times (saccade reaction time) (Carpenter, 2012) and on working memory retrieval times (memory reaction time) (Pearson et al., 2014). Both studies report timing densities for individual observers.

##### Binocular rivalry.

Six observers participated in the experiment (4 male, 2 female). Stimuli were displayed on an LCD screen (EIZO ColorEdge CG303W, resolution 2560 × 1600 pixels, viewing distance 104 cm, single pixel subtended 0.014°, refresh rate 60 Hz) and were viewed through a mirror stereoscope. Chin and head rests were used to stabilize viewing position.

Two grayscale circular orthogonally oriented gratings (45° and −45°) were presented foveally to each eye. Gratings had diameter of 1.6° (spatial period 2 cyc/deg). To avoid a sharp edge, grating contrast was modulated with Gaussian envelope (starting inner radius 0.6°, σ = 0.2°). Tilt and phase of gratings were randomized for each block. Five contrast levels were used as follows: 6.25%, 12.5%, 25%, 50%, and 100%. Contrast of each grating was systematically manipulated, so that each contrast pair was presented in two blocks (50 blocks in total). Each block was 2 min long and separated by a compulsory 1 min break. Observers reported on the tilt of the visible grating by continuously pressing one of two arrow keys. They were instructed to press only during exclusive visibility of one of the gratings, so that mixed percepts were indicated by neither key being pressed (25 ± 8% of total presentation time). To facilitate binocular fusion, gratings were surrounded by a dichoptically presented square frame (outer size 9.8°, inner size 2.8°).

##### Motion-induced blindness.

Twenty observers participated in the experiment (12 male, 8 female). Stimuli were presented on a CRT screen (Iiyama VisionMaster Pro 514, www.iiyama.com; resolution 1600 × 1200 pixels, refresh rate 100 Hz). The viewing distance was 73 cm so that each pixel subtended ∼0.019°. Target was a yellow circle (diameter 0.2°), presented 1° above the fixation. Mask rotated at 1 Hz and consisted of 8 × 8 grid of crosses (arm length 0.6°, intercross distance 0.15°). Observers reported episodes of target disappearance by keeping the space key pressed for the entire duration of the episode. Blocks lasted 1 min and were separated by a compulsory 30 s. In total, 32 experimental blocks were measured for each observer.

#### Scaling property

##### Definitions and relevant quantities.

A distribution of samples *t* is characterized by moments of the distribution (density). Distribution shape (Table 1) may be quantified in terms of the mean μ_{1} ≡ 〈*t*〉 and the central moments μ_{2} ≡ 〈(*t* − μ_{1})^{2}〉, μ_{3} ≡ 〈(*t* − μ_{1})^{3}〉, etc., or, equivalently, in terms of normalized moments, such as the coefficient of variation *c _{v}* = μ

_{2}

^{1/2}/μ

_{1}and the skewness γ

_{1}= μ

_{3}/μ

_{2}

^{3/2}.

A scaling property obtains if central moments are proportional to corresponding powers of the mean or, equivalently, if normalized moments are constant as follows:
Denoting individual realizations of activity with *x*(*t*) and the ensemble average with 〈*x*(*t*)〉, the accumulation and dispersion rates are defined as the respective time-derivatives of the mean and variance as follows:
In the general case, these rates will be both input- and state-dependent. In other words, they will change with both sensory input *s* and instantaneous activity *x*(*t*) as follows:
If *v _{drift}* increases with input and activity

*x*(

*t*) traverses a range

*x*

_{0}≤

*x*(

*t*) ≤ θ, the mean first-passage time (FPT) μ

_{1}= 〈

*t*〉 is expected to decrease with input

_{fp}*s*and increase with threshold θ (compare Fig. 2), consistent with experimental observations. The same is true for the central moments μ

_{2}and μ

_{3}.

When activity *x*(*t*) remains far below an input-dependent steady-state *x _{in}*(

*s*) (drift-dominated regimen), the accumulation and dispersion rates are independent of

*x*(

*t*) and therefore stationary as follows: Together with threshold θ, the stationary rates ν

_{drift}* and ν

_{noise}* are the minimal ingredients for capturing the observed statistics of reversal times.

##### Sufficient condition for a scaling property.

We now formulate a sufficient condition between θ, ν_{drift}*, and ν_{noise}* for coefficients of variation of FPTs to remain constant with input. This condition holds for any drift-dominated accumulation (*x* ≪ *x _{in}*; see Fig. 2

*A*) in which the variance of FPTs σ

_{t}

^{2}= 〈

*t*

^{2}〉 − 〈

*t*〉

^{2}is proportional to variance of activity σ

_{x}

^{2}= 〈

*x*

^{2}〉 − 〈

*x*〉

^{2}at time 〈

*t*〉, so that the respective coefficients of variation are approximately the same (Wiener, Ornstein-Uhlenbeck, Poisson, or Ehrenfest processes) as follows: Experimental evidence shows that increased input decreases the mean time 〈

*t*〉 of first passing a threshold θ decreases. It follows that increased input must increase accumulation rates ν

_{drift}* as follows: If ν

_{noise}* remains constant with input, it further follows that the variance at 〈

*t*〉 decreases with ν

_{drift}* and the coefficient of variation at 〈

*t*〉 decreases with the square root of ν

_{drift}* as follows: so that no scaling property is obtained.

However, if both ν_{noise}* and ν_{drift}* increase with input in the same proportions, the coefficient of variation remains constant as follows:
and the necessary condition for a scaling property is satisfied.

#### Stochastic processes under study

We summarize the salient characteristics of the stochastic processes under discussion, in particular, the moments of the FPT distributions. Where explicit expressions are unavailable, we resorted to a numerical integration of the related Langevin equation with standard first-order methods (Risken, 1984). For each combination of *x _{in}* and θ, we simulated 10

^{5}FPTs in time-steps of 0.01

*s*.

#### Balanced Poisson (BP) process

This process approximates excitatory and inhibitory inputs incrementing and decrementing the membrane potential of a neuron (Tuckwell, 1988). Excitatory (respectively inhibitory) spikes arrive with a fixed rate ν* _{E}* (respectively ν

*), so that either spike counts form independent Poisson processes*

_{I}*N*(

_{E}*t*) (respectively

*N*(

_{I}*t*)). FPT is defined as time

*t*from initial count

*N*(0) =

_{E}*N*(0) = 0 to a threshold count θ. The distribution of FPT has mean μ

_{I}_{1}= 〈

*t*〉, coefficient of variation

*c*, and skewness γ

_{v}_{1}(Tuckwell, 1988) as follows:

#### Discrete Ehrenfest process

This generalized birth-death process is used in many contexts, including statistical mechanics, chemical physics, and theoretical biology, to model the statistical dynamics of macroscopic ensembles composed of discrete microscopic units (Karlin and McGregor, 1965; van Kampen, 1981; Emch and Liu, 2002; Ghosh et al., 2006; Cao et al., 2014). We consider a population of *N* stochastic and bistable units, each transitioning spontaneously and independently between inactive and active states. Transition rates ν_{+} (activation) and ν_{−} (inactivation) are assumed to be stationary. If *n* is the number of units active at a given time (with 0 ≤ *n* ≤ *N*), the overall rate of upward transitions within the population is (*N* − *n*)ν_{+}, and the rate of downward transitions is *n*ν_{−}. Because of this state dependence, the step distribution of the discrete Ehrenfest process is asymmetric, especially when *n* approaches the extremes of its range: *n* ≈ 0 or *n* ≈ *N*. Moments of the FPT density may be obtained from the renewal equation as finite sums over Krawtchouk polynomials, as detailed previously (Cao et al., 2014).

#### Wiener process

A Wiener process combines constant (but input-dependent) accumulation with constant dispersion and produces a noisy linear accumulation (Cox and Miller, 1972). It satisfies a Langevin equation with constant infinitesimal drift and variance,
and normally distributed “white noise” ξ(*t*) is defined as follows:
From initial condition *x*_{0} = 0, the distribution of FPT to a threshold θ is an inverse Gaussian or Wald distribution (Tuckwell, 1988), with moments as follows:

#### Ornstein-Uhlenbeck process

An Ornstein-Uhlenbeck process combines a state-dependent (as well as input-dependent) accumulation with constant dispersion, resulting in a noisy exponential relaxation to a steady-state state *x _{ss}* with zero drift (Cox and Miller, 1972; Risken, 1984). It satisfies a Langevin equation with state- and input-dependent infinitesimal drift, and constant infinitesimal variance as follows:
Moments of the FPT distribution have been derived in terms of infinite series (Inoue et al., 1995), or in terms of nested integrals (Brunel, 2000).

#### Continuous Ehrenfest process

In the continuous limit, discrete random walks may be approximated by Gaussian diffusion processes (Cox and Miller, 1972; Risken, 1984). For a GE process, the continuous limit is a Cox-Ingersoll-Ross process (Cox et al., 1985) in which both infinitesimal drift and variance are state- and input-dependent (van Kampen, 1981) as follows:
where

#### Escape process

To model instantaneous escape across an adapting threshold, we assume that the instantaneous escape probability reflects normally distributed noise with mean 0 and variance σ^{2}. Specifically, we compute the instantaneous probability that the noise exceeds the distance to the (time-varying) threshold θ(*t*) as follows:
where ν is an escape rate. Dividing time into discrete intervals Δ*t*, we approximate the FPT density *f*(*t*) as follows:
and evaluated the moments numerically.

#### Overview

##### Spiking neuron simulations.

To illustrate a possible neural realization of a GE process, we simulated “strongly coupled” assemblies of excitatory and inhibitory neurons (see Fig. 8*A*,*B*), with each assembly expressing a spontaneous and independent bistable dynamics (Mattia et al., 2013). Each such assembly comprised 125 excitatory leaky-integrate-and-fire neurons (“foreground”), which were weakly coupled to an additional 875 excitatory neurons (“background”), as well as 250 inhibitory neurons. The background population is not strictly necessary and included here only for the sake of verisimilitude. The connection probability between any two neurons was *c* = 80%. Excitatory synaptic efficacy between “foreground” neurons, “background” neurons, and between the two was *J _{fore}* = 0.618 mV,

*J*= 0.438 mV, and

_{back}*J*= 0.402 mV, respectively. Inhibitory synaptic efficacy was

_{inter}*J*= −1.50 mV, and the efficacy of excitatory synapses onto inhibitory neurons was

_{I}*J*= 0.560 mV. Finally, “foreground” neurons, “background” neurons, and “inhibitory” neurons each received independent Poisson spike trains of 2340, 2280, and 2280 Hz, respectively. Other settings were as in Mattia et al. (2013). As a result of these settings, “foreground” activity transitioned spontaneously between a low state of ∼3 Hz and a high state of ∼40 Hz.

_{IE}Weakly coupled assemblies (see Fig. 8*C*) were obtained by reducing synaptic efficacies *J _{fore}* = 0.566 mV and

*J*= 0.431 mV, as well as increasing efficacies

_{back}*J*= 0.409 mV (to maintain overall level of activity). For “foreground” neurons (each with external Poisson inputs of 2400 Hz), the firing rate was ∼3 Hz. To reproduce the gradual accumulation of activity by strongly coupled assemblies, we increased (at a suitable pace) external Poisson inputs to 2760 Hz, eventually raising the firing rate to ∼30 Hz.

_{inter}To analyze the spiking activity of “foreground” neurons, we established the instantaneous average rate *r*(*t*) and generated a equally sized population of “surrogate” neurons as nonhomogeneous Poisson point processes with the same rate *r*(*t*). The statistics of “foreground” neuron spiking could then be compared with that of “surrogate” neurons (see Fig. 8*D*,*E*).

#### Formal treatment of stochastic processes

Discrete and continuous random walks require different formal treatments, namely, a Master Equation or a Fokker-Planck-equation, respectively. We provide a brief synopsis of these approaches, to justify, or at least indicate the origin of, the various results cited in the main text. A full treatment may be found in Tuckwell (1988) and Risken (1984).

#### Discrete random walks

Continuous-time random walks in a discrete state-space *K* constitute a subclass of Markov processes and their individual realizations form lattice paths. Such processes are governed by a set of gain-loss relations on the probability of each state *n* ∈, called the Master Equation as follows:
where *W* denotes the transition rate from one state to another.

##### BP process.

Given transition rates ν* _{E}*, ν

*, the Master Equation reads as follows: This can be solved analytically by means of the generating function*

_{I}*G*(

*z*,

*t*) = ∑

*(*

_{k}P_{k}*t*)

*z*, assuming initial conditions

^{k}*P*(

_{n}*t*= 0) = δ

_{0n}as follows: illustrated in Figure 7

*F*, where is the modified Bessel function of the first kind.

##### GE process.

Given transition rates ν_{+}, ν_{−}, the Master Equation reads as follows:
For initial condition, *P _{n}*(

*t*= 0) = δ

_{0n}, the Master Equation may be solved as before to obtain the following: where τ

*=*

_{in}*p*= ν

_{+}τ

*=*

_{in}*x*, and

_{in}*q*= 1 −

*p*= ν

_{−}τ

*. The time evolution of this distribution is illustrated in Figure 7*

_{in}*E*.

#### Continuous random walks

Gaussian random walks in a continuous state-space can derive from the continuous limits of discrete state-space diffusion processes (Cox and Miller, 1972). Typically, continuous processes are more tractable, both analytically and numerically, than their discrete counterparts. The time-dependent probability density *p*(*x*, *t*) of a continuous random variable *x*(*t*) is governed by a Fokker-Planck equation (Risken, 1984) as follows:
where μ(*x*, *x _{in}*) and σ(

*x*,

*x*) represent “accumulation” or “drift” and “dispersion” or “noise,” respectively. In general, both terms may depend on the current state

_{in}*x*and on external input

*x*. For instance, in the continuous limit of the GE process, accumulation and dispersion rates reflect the input-dependent individual transition rates as well as the number of units available for activation or inactivation. The total rates of increments ρ

_{in}_{+}and decrements ρ

_{−}(see Fig. 7

*A*,

*B*) are as follows: where

*x*is instantaneous activity (fraction of active units), and are shown in Fig. 7

*C*,

*D*. Therefore, the infinitesimal drift and variance are as follows: and are shown in Figure 7

*C*,

*D*.

The dynamics of single realizations of such a process are described by a stochastic differential equation (Langevin equation) as follows:
where ξ(*t*) is normally distributed “white noise.” The Langevin formulation makes clear that the drift term describes a deterministic force that is identical on every trial, whereas the noise term captures stochastic forces that differ from trial to trial.

## Results

To assess how well (or poorly) multistable phenomena conform to a scaling property, we obtained and compared observations from six canonical multistable situations (five visual, one auditory). We also considered two choice reaction tasks as examples of “normal” vision.

### Wide range of mean dominance periods

In multistable phenomena, mean dominance periods vary widely with stimulus strength, observer, and display type. A particularly well-studied situation is BR, where both left and right eye contrast may be varied (Levelt, 1967; Moreno-Bote et al., 2010). A representative set of observations is summarized in Figure 1*A*. For a given observer, mean dominance periods varied fourfold between 25 contrast combinations (a 5 × 5 matrix of left and right eye contrasts; see Materials and Methods). Importantly, mean dominance periods decreased with stimulus contrast. Between observers, mean dominance periods differed by a further factor of 4. Between different display types, mean dominance periods ranged even more widely, as illustrated in Figure 1*B*, *C*. More than six types of multistable visual displays (BR, KDE, MIB, MB, NC) and one multistable auditory scene (auditory streaming), mean dominance periods varied over 2 orders of magnitude, from <1 *s* to >100 *s* (see Materials and Methods).

Compared with mean dominance periods, choice reaction times are far less variable. A twofold range with stimulus strengths or between observers appears typical (Carpenter, 2012; Pearson et al., 2014). The two choice reaction tasks illustrated here span the full range of human reaction times, from 100 *m*s to 1 s (Fig. 1*B*,*C*).

### Invariant shape of dominance distribution

The intuitive definition of a scaling property is that distributions exhibit identical shape when scaled to their respective distribution mean. Formally, a scaling property obtains if central moments μ_{2} and μ_{3} are proportional to corresponding powers of the mean μ_{1} as follows:
or, equivalently, if normalized moments, such as the coefficient of variation *c _{v}* or the skewness γ

_{1}are constant (see Materials and Methods) as follows: Multistable phenomena maintain a remarkably strict scaling property over different stimulus strengths, observers, and display types. For example, the SD of the dominance distribution is (nearly) proportional to the mean, as illustrated by the linear regression lines in Figure 1

*A*.

To assess the generality of this finding, we established the mean, SD, coefficient of variation, and skewness of the distribution of dominance periods for different visual and auditory conditions: auditory streaming, MIB, NC, and KDE, BR for stimulus contrasts *c* ∈ {0.06, 0.12, 0.25, 0.5, 0.6, 1.0}, and MP for continuous display, intermittent display, and intermittent display without attention (Pastukhov and Braun, 2007). Figure 1*B*, *C* illustrates the results in terms of the mean and SEM across observers. The coefficient of variation remained consistently near *c _{v}* ≈ 0.6 (Fig. 1

*B*) and the skewness remained consistently near near γ

_{1}≈ 2

*c*(Fig. 1

_{v}*C*), the value corresponding to a Gamma distribution. In other words, a strict scaling property was maintained over all investigated situations.

Choice reaction times do not conform this pattern. Distribution means are less variable and distribution shapes combine smaller variance ((*c _{v}* ≈ 0.2) with greater skewness (γ > 6

*c*).

_{v}To summarize, the dominance distribution of multistable phenomena exhibits a scaling property with three aspects: (1) distribution shape (normalized moments) is highly preserved across conditions (stimulus strengths, observers, and display types); (2) distribution variance is comparatively large (*c _{v}* ≈ 0.6) and distribution skewness comparatively small (γ

_{1}≈ 2

*c*), corresponding to a gamma distribution; and (3) distribution mean varies widely across conditions and decreases with stimulus strength.

_{v}### Dominance durations as FPTs

Our aim is to compare the ability of various stochastic processes to reproduce the scaling property of multistable perception. An FPT framework (Fig. 2) provides convenient and concise terms for this purpose. In this framework, the timing of perceptual events (such as a choice response or a perceptual reversal) is interpreted as the first passage of a threshold by some underlying neural activity. Specifically, this activity is assumed to grow from some initial level toward the threshold level due to both deterministic and stochastic factors (i.e., factors that remain the same and that change randomly, respectively, from trial to trial).

FPT models account well for response time densities in many task situations (Ratcliff and Smith, 2004; Smith and Ratcliff, 2009; Carpenter, 2012; Pearson et al., 2014). They should not be taken literally in the sense that activity necessarily accumulates in a single neural population. More likely, sensory information accumulates concurrently in several competing populations, so that a threshold is reached when the differential activity exceeds a certain value (e.g., Usher and McClelland, 2001; Smith and Ratcliff, 2009; Bundesen et al., 2015).

In the case of multistable reversals, the available evidence supports a differential threshold in that the currently dominant representation appears to degrade while, concurrently, the currently suppressed representation appears to recover (Wolfe, 1984; Nawrot and Blake, 1989; Petersik, 2002; Kang and Blake, 2010). One conceivable scenario is that gradated representations at sensory levels interact with categorical representations at perceptual levels. Perhaps differential activity at sensory levels progressively diverges from differential activity at perceptual levels, until the latter becomes destabilized and dominance reverses. This temporarily realigns the two levels and the cycle begins anew. However, here we are not concerned with such details. We simply assume that dominance periods reflect the time at which stochastically accumulating (differential) activity exceeds a fixed threshold for the first time, in other words, that reversal times represent FPTs of one sort or another.

FPT models may operate in two distinct regimens, illustrated in Figure 2. In drift-dominated regimens, deterministic forces are sufficient to drive activity to threshold. Stochastic forces merely introduce some variability (Fig. 2*A*). In noise-dominated regimens, deterministic forces drive activity toward a steady-state, which lies some distance *below* threshold (Fig. 2*B*). Stochastic forces are needed to drive activity *away* from steady-state (filled arrow), such as to cover the remaining distance to threshold (open arrow). The difference is best appreciated in an effective energy landscape visualization: the energy gradient takes opposite sign at threshold (Fig. 2*A*,*B*, open arrows).

It is intuitive that FPT density depends sensitively on both deterministic and stochastic forces over the entire activity range traversed. A change to any single variable, deterministic force, stochastic force, or traversed range, will alter the FPT density and all of its moments. In general, first and higher moments will not change in a proportional manner. The following sections will show that two or more variables must change in a proportional manner for a scaling property to come about.

### Comparison of stochastic processes

Stochastic processes may be compared in terms of the rates with which they deterministically “accumulate” and stochastically “disperse” activity over time (for definition, see Materials and Methods). In general, accumulation rate ν* _{drift}*(

*s*,

*x*) and dispersion rate ν

*(*

_{noise}*s*,

*x*) will depend both on sensory input

*s*and on instantaneous activity

*x*(

*t*). Together with initial activity and threshold activity θ, these quantities determine the statistics of reversal times.

The following sections compare various stochastic processes in search of a parsimonious explanation for a scaling property. We begin with current models of perceptual decisions and multistable reversals, which use continuous stochastic processes with constant noise (Wiener process, Ornstein-Uhlenbeck process). Next, we modify these models with input-dependent noise. Finally, we consider discrete stochastic processes (Poisson process, Ehrenfest process), where deterministic and stochastic forces share a common physical origin.

### Continuous stochastic processes with constant noise

Perceptual decisions are commonly modeled in terms of diffusion-like processes (Smith, 2000; Ratcliff and Smith, 2004; Smith and Ratcliff, 2009; Smith et al., 2014). Whereas mean accumulation (drift) is assumed to vary with input, stochastic dispersion (noise) and threshold level are assumed to be constant (Wiener process). The possibility that accumulation varies with activity and vanishes for a particular steady-state level (Ornstein Uhlenbeck process) has also been considered (Smith, 2000; Smith et al., 2014). Multistable reversals are widely modeled in similar terms but additionally invoke an adapting threshold (Laing and Chow, 2002; Moreno-Bote et al., 2007; Noest et al., 2007; Wilson, 2007; Shpiro et al., 2009; Moreno-Bote et al., 2010; Seely and Chow, 2011). We now consider each of these possibilities in turn.

In a Wiener process, the accumulation rate changes with sensory input *s*, but the dispersion rate (noise) remains constant as follows:
where τ is a characteristic time and σ is noise amplitude. Drift must be positive, and the development of *x*(*t*) is always drift-dominated (Fig. 2*A*). The moments of the FTP depend on *x _{in}*(

*s*) and threshold θ and are illustrated in Figure 3

*A*(middle columns). Whereas the mean μ

_{1}shortens with

*x*and lengthens with threshold θ, the coefficient of variation

_{in}*c*decreases with both input

_{v}*x*and θ. The skewness is consistently high with γ

_{in}_{1}= 3

*c*. The experimentally observed scaling property is not reproduced (Fig. 3

_{v}*A*, rightmost column).

In an Ornstein-Uhlenbeck process, the infinitesimal drift varies with activity and vanishes for *x*(*t*) = *x _{in}* as follows:
The threshold-crossing of

*x*(

*t*) is drift-dominated when

*x*> θ, but noise-dominated when

_{in}*x*< θ (Fig. 2). In both regimens, FPT moments may be obtained analytically (Inoue et al., 1995; Brunel, 2000) or numerically (Fig. 3

_{in}*B*, middle columns). A scaling property is reproduced neither by noise-dominated nor by drift-dominated regimens. In both cases, the coefficient of variation

*c*varies with

_{v}*x*and skewness γ

_{in}_{1}≈ 3

*c*is too high.

_{v}Finally, current models of multistable perception additionally assume a time-dependent threshold θ(*t*) that progressively decreases with a characteristic time τ* _{A}*. Assuming that

*x*(

*t*) fluctuates spontaneously about a constant mean and that a reversal occurs instantaneously whenever

*x*(

*t*) > θ(

*t*), the moments of FPT may be obtained analytically (Molini et al., 2011) or by numerical simulation (Fig. 3

*C*, middle columns). Similar to the Ornstein-Uhlenbeck process, both noise-dominated and drift-dominated regimens are obtained. In both regimens, the coefficient of variation

*c*varies with input.

_{v}Interestingly, the skewness ranges from γ_{1} ≈ 3*c _{v}* in the noise-dominated regimen to γ

_{1}≈

*c*in the drift-dominated regimen. The experimentally observed skewness of γ

_{v}_{1}≈ 2

*c*obtains only for one particular combination of of

_{v}*x*and θ (Fig. 3

_{in}*C*, middle columns, arrow). This confirms and extends previous reports that a precise balance between adaptation and noise is required to attain the characteristic, gamma-like distribution of multistable reversals (van Ee, 2009; Pastukhov et al., 2013).

To summarize, if the accumulation rate ν_{drift}* varies with input while the dispersion rate ν_{noise}* remains constant, the normalized FPT moments vary with input and a scaling property is not obtained. This conclusion holds independently of details, such as the presence or absence of a steady-state or of an adapting threshold.

### Continuous stochastic processes with input-dependent noise

The previous section showed that a scaling property cannot be obtained if only accumulation rate, but not dispersion rate, changes with input. Accordingly, we now consider scenarios where accumulation and dispersion change proportionally with sensory input *s*.

A modified Wiener process with input-dependent dispersion (noise) is as follows:
has the FPT moments illustrated in Figure 4*A* (middle columns). As expected, the normalized moments *c _{v}* and γ

_{1}remain constant with

*x*, whereas mean FPT spans a wide range of values. The constant α determines the requisite threshold precision. The higher its value, the narrower the range of thresholds over which a given range of

_{in}*c*is obtained. However, the skewness is γ

_{v}_{1}≈ 3

*c*higher than experimentally observed (Fig. 4

_{v}*A*, rightmost column).

A modified Ornstein-Uhlenbeck process with input-dependent dispersion (noise) (Cox et al., 1985) is as follows:
exhibits the FPT moments illustrated in Figure 4*B* (middle columns). In the drift-dominated regimen, the FPT density is modulated similarly as in a Wiener process. In the noise-dominated regimen, the FPT density resembles that of the original, unmodified Ornstein-Uhlenbeck process. As before, skewness γ_{1} ≈ 3*c _{v}* is consistently higher than experimentally observed (Fig. 4

*B*, rightmost column).

To summarize, maintaining the proportionality ν_{noise}* ∝ ν_{drift}* ensures the constancy of normalized moment in drift-dominated regimens. However, the experimentally observed skewness of γ_{1} ≈ 2*c _{v}* cannot be reproduced. The reason is that all processes with symmetric noise, so-called Gaussian diffusion processes, are characterized by an inverse Gaussian or Wald distribution of FPT with γ

_{1}≈ 3

*c*(Tuckwell, 1988).

_{v}### Processes with discrete stochastic events

The two previous sections showed that a scaling property obtains when sensory input affects accumulation and dispersion proportionally. A natural way to ensure this proportionality are processes comprising discrete and spontaneous events, such as Poisson events, simply because event rate will affect both accumulation and dispersion. Several levels of neural activity may be described in terms of Poisson events (e.g., spikes, up-state and down-state transitions, avalanches). We now consider two complementary examples: a BP process and a GE process.

A BP process combines two independent Poisson processes contributing, respectively, activity increments and decrements of unit size with homogeneous rates ν* _{E}* and ν

*Tuckwell (1988). Several realizations*

_{I}*n*(

*t*) of such a process are illustrated in Figure 5

*A*. In the illustrated example, the initial distribution is deterministic with

*p*(

*n*,

*t*= 0) = δ(

*n*− 4) and ν

*= 4.8 Hz is comfortably larger than ν*

_{E}*= 3.2 Hz, ensuring that mean and variance of*

_{I}*x*(

*t*) increase linearly over time. The evolution of

*p*(

*n*,

*t*) over time is illustrated in Figure 7

*E*.

In general, the accumulation and dispersion rates are as follows:
and a scaling property is guaranteed (Fig. 6*A*), provided sensory input *s* changes the event rates in a proportional manner, so that the proportionality between accumulation and dispersion is maintained as follows:
However, the skewness γ_{1} = 3*c _{v}* is once again larger than experimentally observed. The reason is that a BP process remains a diffusive process (despite its discrete nature), in that activity disperses symmetrically, as shown in Figure 7

*F*.

We now compare and contrast a GE process, in which discrete increments and decrements of activity originate within a finite population of spontaneously bistable units (Fig. 5*B–D*). Such a population may become saturated or depleted, so that activation (inactivation) rates will vary with the number of inactive (active) units. This renders both accumulation and dispersion rates dependent on current activation *x*, with important consequences for FPT moments. Originally introduced in 1907 as the famous “dog-flea” model of diffusion, this generalized birth-death process provides a minimal model for the statistical dynamics of many microscopic and mesoscopic systems (Karlin and McGregor, 1965; van Kampen, 1981; Emch and Liu, 2002; Ghosh et al., 2006; Cao et al., 2014).

We consider a population of *N* bistable units, each of which transitions spontaneously between inactive and active states. The transitions are independent Poisson events with rates ν_{+} = ν_{+}(*s*) (activation) and ν_{−} = ν_{−}(*s*) (inactivation), set by sensory input *s*. Additionally, we assume that the ratio ν_{+}/ν_{−} grows monotonically with sensory input *s* as follows:
On average, *x*(*t*) relaxes exponentially to a steady-state activation *x _{in}* with a characteristic time τ

*, both of which change with sensory input*

_{in}*s*as follows: This simple behavior of the average is deceptive, however, for the detailed dynamics is considerably more complex. Several realizations

*x*(

*t*) are illustrated in Figure 5

*B*. Starting from a deterministic initial state (

*n*

_{0}= 4), mean and variance of

*x*(

*t*) increase linearly over time, as ν

_{+}= 0.06 Hz is comfortably larger than ν

_{−}= 0.04 Hz. The realizations of the GE process remain more compact than those of the BP in Figure 5

*A*, even though the total increment and decrement rates are identical (ν

_{+}

*N*= 4.8 Hz and ν

_{−}

*N*= 3.2 Hz). The difference is even more evident in the evolution of activity

*p*(

*x*,

*t*) over time. Comparing the GE process (Fig. 7

*F*) with the BP process (Fig. 7

*E*), two qualitative differences are apparent: the GE accumulation is faster and the GE dispersion is asymmetric. We now consider the reasons for these important differences.

In a GE process, accumulation and dispersion rates reflect the input-dependent transition rates as well as the number of units available for activation or inactivation. As the two populations are typically of unequal size, increment and decrement rates may differ considerably. The resulting infinitesimal drift and variance (see Materials and Methods) are as follows:
and are shown in Figure 7*C*, *D*. At low activation levels *x* ≪ 0.5, increments are far more frequent than decrements, explaining the two qualitative features, fast accumulation and asymmetric dispersion, mentioned above.

Next, we turn to the moments of the FPT density and their dependence on *x _{in}* and θ. Given an activity range

*x*

_{0}≤

*x*(

*t*) ≤ θ, FPT moments may be obtained analytically (Cao et al., 2014) and are illustrated in Figure 6

*B*(middle columns). In the drift-dominated regimen

*x*> θ (indicated by arrows), normalized moments

_{in}*c*and γ

_{v}_{1}remain nearly constant with changing input

*x*. Moreover, the skewness asymptotes to the experimentally observed value γ

_{in}_{1}≈ 2

*c*(Fig. 6

_{v}*B*, rightmost column). Only a small distance θ −

*x*

_{0}between threshold and initial condition is required to obtain

*c*≈ 0.6. These conclusions hold for any initial condition and threshold in the low-activity regimen

_{v}*x*≪ 0.5 where the coefficient of variation is approximately (Cao et al., 2014) the following: In the limit of large

*x*, the FPT density of a GE process approaches an exact gamma distribution, with skewness γ

_{in}_{1}= 2

*c*and excess kurtosis γ

_{v}_{2}= 6

*c*

_{v}

^{2}.

To summarize, a GE process operating far from steady-state (*x* ≪ *x _{in}*) reproduces the scalar property of multistable perception for two reasons. First, both accumulation and drift increase proportionally with

*x*, ensuring nearly constant normalized moments

_{in}*c*and γ

_{v}_{1}(Fig. 7

*C*,

*D*). Second, positive accumulation is strong and negative accumulation negligible (Fig. 7

*A*,

*B*). This entails an asymmetric (non-Gaussian) dispersion that “thins out” realizations lingering near

*x*

_{0}and keeps the activity distribution compact and asymmetric (Fig. 7

*E*). The comparatively compact dispersion of activity then translates into a short-tailed FPT distribution with skewness γ

_{1}≈ 2

*c*.

_{v}Finally, to conclusively demonstrate the importance of asymmetric noise, we compare the discrete GE process with its continuous equivalent, a Cox-Ingersoll-Ross process where the infinitesimal drift and variance μ(*x*, *x _{in}*) and σ

^{2}(

*x*,

*x*), are both input- and state-dependent. In the drift-dominated regimen, accumulation and dispersion rates are as follows: Even though this process matches both the accumulation rates ν

_{in}_{drift}* and the dispersion rates ν

_{noise}* of the discrete GE process, it is a Gaussian diffusive process and operates with symmetric noise. The moments of the FPT density are illustrated in Fig. 6

*C*(middle columns). As with the original, discrete process, both

*c*and γ

_{v}_{1}remain constant with changing

*x*. However, unlike the discrete process, the skewness asymptotes to the larger value γ

_{in}_{1}≈ 3

*c*of an inverse Gaussian distribution (Fig. 6

_{v}*C*, rightmost column).

We conclude by listing the necessary and sufficient conditions for reproducing the scaling property of multistable perception as follows: (1) a population of *N* discrete, locally bistable units that spontaneously activate and inactivate with rates ν_{+} and ν_{−}, respectively; (2) the relative rate ν_{+}/ν_{−}, and thus also the steady-state activation *x _{in}* = ν

_{+}/(ν

_{−}+ ν

_{+}), grows with sensory input

*s*; (3) both initial and threshold activation are comparatively small and far from steady-state,

*x*≪

*x*; (4) population size

_{in}*N*is limited by threshold precision; and (5) to obtain the experimentally observed coefficients of variation

*c*≈ 0.6, the distance between threshold and initial condition must be small, θ −

_{v}*x*

_{0}≈ 0.05.

### Neuronal realization of GE process

A neuronal realization of collective activity accumulating in a population of locally bistable assemblies is illustrated in Figure 8. We simulated 20 assemblies of strongly coupled spiking neurons, each balancing excitation and inhibition, such as to express a bistable attractor dynamics. Activations and inactivations occurred spontaneously, driven by endogeneous activity fluctuations (finite-size noise), and both transition rates changed with external input (for details, see Materials and Methods). To initiate a stochastic accumulation of collective activity, we abruptly altered external input, such as to raise the steady-state fraction of active assemblies from near zero to near unity. The spike raster in Figure 8*A* shows individual assemblies activating at different (random) times and, superimposed, the progressive increase of the mean firing rate over all assemblies. The trajectory of collective activity is more predictable than are individual activations. If the threshold for collective activity is low, so that the entire accumulation proceeds far from steady-state, the FPTs of collective activity will reproduce the scaling property of multistable perception (compare Fig. 5*B–D*). Figure 8*B* shows a slightly different situation, with an initial steady-state fraction of *x*_{0} = 0.2 active assemblies. In this case, individual assemblies both activate and (less often) inactivate at different (random) times. Once again, the progressive increase of collective activity can reproduce the scaling property of multistable perception.

For comparison, we also illustrate an alternative scenario that accumulates collective activity in a comparable way but cannot reproduce a scaling property (Fig. 8*C*). For this purpose, we simulated assemblies of weakly coupled spiking neurons, which do not express a bistable dynamics. Instead, the activity of each assembly fluctuates endogenously (finite-size noise) about a steady-state that is set by external input. To obtain a stochastic accumulation, we gradually increased external input, such as to raise the steady-state from low to high activity. The spike raster shows individual assemblies becoming progressively more active and, superimposed, the collective activity from all assemblies.

By construction, the alternative scenarios illustrated by Figure 8*A*, *C* are indistinguishable in terms of the time-varying average spike rate *r*(*t*). (This is true for arbitrary subsets of neurons.) Nevertheless, even at the single-cell level, a bistable dynamics of individual assemblies is detectable in terms of the variability of firing rates and the distribution of interspike intervals. These telltale features are particularly evident compared with inhomogeneous Poisson spikes with identical average rate *r*(*t*). Specifically, the SD of the instantaneous firing rate increases significantly during the rising phase of the accumulation (Fig. 8*D*) and the interspike interval distribution emphasizes extremes (short and long intervals) at the expense of the middle (Fig. 8*E*).

## Discussion

We have examined the possibility that the highly consistent statistics of multistable phenomena may reflect the qualitative nature of perceptual representations. The macroscopic laws of physical systems often depend more on the qualitative nature than on the details of their microscopic constituents (Haken, 1975), and this may hold also for large-scale brain activity (Breakspear and Jirsa, 2007).

### Current models of multistability

Current models of multistable perception seek to capture the interplay between stabilizing and destabilizing factors thought to underlie perceptual reversals. First, competition between alternative appearances stabilizes whichever currently dominates (Blake et al., 1990; Alais et al., 2010). Second, neural adaptation progressively weakens the dominant representation, shortening its persistence (Wolfe, 1984; Petersik, 2002; Kang and Blake, 2010). Third, neural noise initiates stochastic perceptual reversals (Hollins, 1980; Brascamp et al., 2006; Kim et al., 2006; Sterzer and Rees, 2008). Typically, such models seek to account for the occurrence of reversals, the rate of reversals, and its dependence on stimulus strength (Laing and Chow, 2002; Wilson, 2003; Moldakarimov et al., 2005; Moreno-Bote et al., 2007; Noest et al., 2007; Shpiro et al., 2007; Wilson, 2007; Shpiro et al., 2009; Seely and Chow, 2011; Pastukhov et al., 2013). Because these models combine a fast, stochastic process (noisy competition for dominance) with a slow, deterministic process (adaptation), they can be idealized as a fast escape with slowly adapting threshold.

The reasons why current models fail to reproduce the experimentally observed statistics are straightforward. First, they assume that the slow process (adaptation, corresponding to our accumulation rate ν* _{drift}*) changes with sensory input, whereas the fast process does not (noise, corresponding to our dispersion rate ν

*). To obtain a scaling property, both adaptation and noise would have to vary with sensory input*

_{noise}*s*(see Materials and Methods). Second, current models assume symmetric dispersion, which necessarily yields heavier-tailed FPT distributions with γ

_{1}≈ 3

*c*(see Figs. 3

_{v}*A*,

*B*, 4

*A*,

*B*, 6

*C*, 7

*E*).

It is true that, for a given input strength, the experimentally observed skewness (γ_{1} ≈ 2*c _{v}*) may be obtained by precisely matching adaptation rate and noise amplitude (Pastukhov et al., 2013) (Fig. 3

*C*, middle panels, arrows). However, to obtain a scaling property, both adaptation rate and noise amplitude would have to be matched anew for each input strength.

### A finite population of bistable units accumulating activity far from equilibrium

To explain the observed statistics of multistable perception, we propose to replace the slow, deterministic process of current models (adaptation) with a discrete random walk (GE process). This is consistent with a suggestion of van Ee (2009), who has argued previously that reversal timing reflects a slow random walk rather than a deterministic relaxation, on the grounds that reversal timing is not entirely “memoryless” (Pastukhov and Braun, 2011). We do not see a need to change the fast, stochastic processes of current models, which provide the mechanism for reversals, and which in any case are beyond the scope of the present work.

Specifically, we propose that the neural representations underlying alternative perceptual appearances exhibit three features: (1) granularity in the sense of comprising a finite number of discrete units; (2) local bistability in the sense that individual units transition spontaneously and independently between inactive and active states; and (3) far-from-equilibrium operation in the sense that the entire stochastic accumulation of discrete activity, from initial state to threshold, proceeds far from the steady-state.

Accumulating activity with bistable units ensures a scaling property because both accumulation rate ν* _{drift}* and dispersion rate ν

*change proportionally with sensory input. Accumulating activity over a small range (from initial to threshold level) ensures the high variability of FPT that is experimentally observed. And accumulating activity far from equilibrium, where activity disperses asymmetrically, ensures the shorter-tailed FPT distributions that are observed (Figs. 3*

_{noise}*C*, 7

*F*). Two previous attempts to model multistable perception with populations of bistable units (Taylor and Aldridge, 1974; Gigante et al., 2009) failed to reproduce these statistics (De Marco et al., 1977) because neither operated far enough from equilibrium.

### Collective dynamics of cortical columns or clusters

To what extent is the scenario outlined above consistent with, or even supported by, the available evidence about the dynamics of neocortical activity?

The postulated units idealize the well-known dynamics of attractor assemblies (Amit, 1995). A neuronal assembly with balanced excitation and inhibition may express two distinct steady-states, a low activity (inactive state) and a high activity state (active state), provided that recurrent excitation amplifies inputs nonlinearly (Amit and Brunel, 1997; Wang, 2002; Major and Tank, 2004). If this is the case, collective activity is bistable and, driven by endogeneous fluctuations (finite-size noise), transitions spontaneously between the two steady-states. External inputs alter the relative stability of inactive and active states and therefore modulate the transition rates. In sum, balanced attractor assemblies exhibit all postulated properties: local bistability, noise-driven transitions, and input-dependent rates.

The crucial ingredient of attractor dynamics, recurrent excitatory connectivity, is well established for cortical columns and clusters of columns. In the superficial layers of primate neocortex, neurons are organized into radial columns with similar response properties (Snodderly and Gur, 1995; Mountcastle, 1997; Maier et al., 2010) and preferential recurrent connectivity (Douglas and Martin, 2004). Nearby columns with similar properties are linked by “patchy” horizontal connections into locally dispersed clusters (Bosking et al., 1997; Lund et al., 2003; Tanigawa et al., 2005; Muir and Douglas, 2011). The preferential excitatory connectivity within clusters of columns shapes both the response properties of individual neurons (Angelucci and Bressloff, 2006) and the “patchy” patterns of activity evoked by stimulation (Arieli et al., 1996; Bosking et al., 1997; Tsodyks et al., 1999).

Neocortical activity varies spontaneously even in the absence of external stimulation and to a degree comparable to that evoked by stimulation (Arieli et al., 1996; Destexhe et al., 2003). Interestingly, the spatiotemporal structure of spontaneous activity often corresponds closely to activity patterns evoked by sensory stimulation (Kenet et al., 2003; Petersen et al., 2003; MacLean et al., 2005; Luczak et al., 2007; Sakata and Harris, 2009; Harris et al., 2011). On the basis of these observations, it has been suggested that cortex expresses only a limited vocabulary of activity patterns and that this vocabulary is explored more extensively by spontaneous activity than by evoked activity (Luczak et al., 2009; Harris et al., 2011).

A slow exploration of a limited vocabulary of activity states is exactly the dynamics expressed by interacting attractor assemblies (Durstewitz and Deco, 2008; Litwin-Kumar and Doiron, 2012; Wang, 2012). If excitatory connectivity is not uniform but forms clusters of preferentially connected neurons, then these clusters may express a slow dynamics (<10 Hz), transiently increasing or decreasing their collective activity (Litwin-Kumar and Doiron, 2012). Despite the slow correlation, the timing of individual spikes in a cluster remains asynchronous (Renart et al., 2010). External stimulation biases the slow dynamics and, for stimulated clusters, selectively extends the periods of increased activity.

Whether or not cortical columns or clusters do indeed express bona fide attractor dynamics remains an open question. Evidence from anesthetized rodents and from *in vitro* preparations shows that, at a minimum, cortical networks may enter into transient states of enhanced excitability, which are characterized by a consistently depolarized membrane potential, asynchronous spiking activity, and nonlinear amplification of inputs (Haider and McCormick, 2009; Harris et al., 2011). These states are maintained by a barrage of excitatory and inhibitory synaptic potentials from recurrent connectivity.

Recent evidence from awake rodents and primates demonstrates that many (not all) cortical neurons are strongly coupled to local population activity (Okun et al., 2015), consistent with a local attractor assembly. Sophisticated statistical approaches have recovered evidence for sudden transitions between discrete activity states in sensory, motor, and parietal cortices of behaving animals (Abeles et al., 1995; Jones et al., 2007; Latimer et al., 2015). Even more intriguingly, laminar recording in extrastriate visual cortex of awake primates revealed columnar activity to be bimodal, transitioning between inactive and active states “nearly simultaneously throughout the cortical dept” (Engel et al., 2015). Transition times were random (Poisson statistics), and transition rates were modulated by stimulus and task conditions, exactly as expected from locally bistable assemblies (Engel et al., 2015). In premotor cortex of awake primates, local multiunit activity was found to be bimodal at 57 of 107 recorded locations (Mattia et al., 2013). Additionally, the differential power spectrum of multiunit activity was found to be consistent with bistable attractor dynamics at a subset of recorded locations.

### Conclusions and outlook

We have shown that the characteristic statistics of multistable perception would be reproduced by perceptual representations that comprise populations of attractor assemblies, such as cortical columns or clusters of columns, each expressing a bistable local dynamics, and that accumulate activity in a narrow range far from equilibrium. This idealized scenario may be relaxed without compromising the FPT statistics: the population may be heterogeneous in that assemblies may display different transition rates, may encode different aspects of the stimulus, and may be situated at different levels of a cortical hierarchy, provided all assemblies contribute additively to collective activity.

The perceptual representation postulated here is well suited to support reliable probabilistic inference by stochastic sampling (Hinton and Sejnowski, 1986; Buesing et al., 2011). Specifically, Langevin sampling by a modified Boltzmann machine (Geman and Geman, 1984; Hoyer and Hyvärinen, 2003; Ma et al., 2006; Sundareswara and Schrater, 2008; Churchland et al., 2011; Gershman et al., 2012) offers a functional rationale as to why competing perceptual representations should operate far from equilibrium and at a low level of fractional activation: in this regimen, microtrajectories involving only a few transitions may be expected to acquire maximal influence (maximum caliber) on the competition outcome (Ghosh et al., 2006; Pressé et al., 2013), permitting inferences to be drawn with optimal reliability.

## Footnotes

This work was supported by European Union FP7 FET-ICT-269459 Coronet, Deutsche Forschungsgemeinschaft BR 987/3-1 and Land Sachsen-Anhalt. We thank Istvan Winkler and Sue Denham for sharing data on auditory multistability.

The authors declare no competing financial interests.

- Correspondence should be addressed to Dr. Jochen Braun, Institute of Biology, Otto-von-Guericke University, Leipziger Strasse 44, Haus 91, 39120 Magdeburg, Germany. E-mail: jochen.braun{at}ovgu.de