## Abstract

Action potentials, taking place over milliseconds, are the basis of neural computation. However, the dynamics of excitability over longer, behaviorally relevant timescales remain underexplored. A recent experiment used long-term recordings from single neurons to reveal multiple timescale fluctuations in response to constant stimuli, along with more reliable responses to variable stimuli. Here, we demonstrate that this apparent paradox is resolved if neurons operate in a marginally stable dynamic regime, which we reveal using a novel inference method. Excitability in this regime is characterized by large fluctuations while retaining high sensitivity to external varying stimuli. A new model with a dynamic recovery timescale that interacts with excitability captures this dynamic regime and predicts the neurons' response with high accuracy. The model explains most experimental observations under several stimulus statistics. The compact structure of our model permits further exploration on the network level.

**SIGNIFICANCE STATEMENT** Excitability is the basis for all neural computations, and its long-term dynamics reveal a complex combination of many timescales. We discovered that neural excitability operates under a marginally stable regime in which the system is dominated by internal fluctuations while retaining high sensitivity to externally varying stimuli. We offer a novel approach to modeling excitability dynamics by assuming that the recovery timescale is itself a dynamic variable. Our model is able to capture a wide range of experimental phenomena using few parameters with significantly higher predictive power than previous models.

## Introduction

Action potentials, or spikes, are the basic information currency of the nervous system. The dynamics of excitability, providing the conditions for spike emission, are thus crucial for all cognitive functions.

Measurements of neural excitability revealed fluctuations across timescales ranging from milliseconds to hours and across many levels of organization (Ulanovsky et al., 2004; Anderson, 2001; Monto et al., 2008; Marom, 2010). Despite this, natural (or natural-like) stimuli are very effective at reducing this variability both *in vitro* and *in vivo* (Mainen and Sejnowski, 1995; de Ruyter van Steveninck et al., 1997; Schneidman et al., 1998; Churchland et al., 2010). Recently, it was shown that such phenomena exist over much longer timescales at the single neuron level (Gal et al., 2010; Gal and Marom, 2013). Isolated cultured rat cortical neurons were stimulated by periodic pulse trains of 10–20 Hz for several hours. Each neuron responded intermittently to the stimuli, revealing slow fluctuations correlated across many timescales. In contrast, pulse trains with variable intervals were able to entrain the neuron's response, revealing a much shorter timescale. This was particularly evident for power-law-distributed (natural-like) stimulus intervals. Considering all of the slow biophysical processes that could give rise to intermittency, it is not trivial to explain the relatively fast entrainment. These two seemingly paradoxical properties pose a challenge to our understanding of long-term single neuron excitability dynamics.

Slow dynamics of excitability and its constituents have been studied experimentally and theoretically from ion channels through neurons to networks (Adelman and Palti, 1969; Fox, 1976; Rudy, 1978; Aldrich et al., 1983; Quandt, 1988; Ogata and Tatebayashi, 1992; LeMasson et al., 1993; Marder et al., 1996; Richmond et al., 1998; Toib et al., 1998; Mickus et al., 1999; Vilin et al., 1999; Ellerkmann et al., 2001; Beggs and Plenz, 2004; Ulbricht, 2005). It is convenient to describe the main finding at the level of a single channel that can be either active (open or closed) or inactive. The finding is that the duration of inactive epochs is widely distributed. Corresponding observations include, among others, the overall conductance in a population of channels, the combined excitability of a single neuron, and the response of a network to external stimulus. Among the proposed mechanisms underlying these observations are slow sodium and potassium channel inactivation and slow intracellular calcium dynamics. For instance, the large number of protein conformations corresponding to inactive channels led to models of excitability dynamics as a diffusion along a Markov chain of unavailable states (McManus and Magleby, 1988; Millhauser et al., 1988). Similar models have been made at the single neuron level, treating the neuron as an ensemble of channels, with excitability being set by the fraction of active channels (Gilboa et al., 2005; Soudry and Meir, 2010).

Our aim is to provide a model with few parameters that addresses both the existence of slow fluctuations and their reduction by variable stimuli. Such a model can be useful both for robust fitting of data (Pozzorini et al., 2013; Mensi et al., 2016) and to explore the effect of parameters on network function. A model with very few parameters was suggested by Marom (2009), in which an adaptive recovery timescale replaces the ensemble of states. In this approach, the details of ion channel populations are neglected and instead an effective recovery timescale is given as a direct function of the current excitability level. However, this model was neither compared directly with experimental results nor considered under variable stimuli.

Here, we show that the two paradoxical aspects of single neuron excitability (intermittency and entrainment) can be understood within a framework of marginally stable dynamics, in which excitability sits marginally below criticality with a weakly sustained equilibrium while at the same time maintaining high responsiveness to external stimuli. We introduce a new model in the spirit of the adaptive timescale to capture these dynamics, and we propose a novel method based on statistical inference for analyzing long-term excitability experiments, through which we could measure the recovery timescale from data. The measured results reveal a dynamical timescale that provides a compact description of various features of the data.

The model, similar to experimental results, is sensitive to stimulus statistics over large timescales, suggesting a potential memory mechanism (Marder et al., 1996). The relatively simple form of the model opens new avenues for exploration in a network setting.

## Materials and Methods

This section contains five parts: (1) a brief description of the experimental data; (2) the statistical observables that were extracted both from the data and from the various models; (3) a definition of all models used and analytical approximations where appropriate; (4) methods used to infer model parameters; and (5) tables of the resulting parameters and notes about their robustness.

### Experimental details

The full details of the experiment were described previously (Gal and Marom, 2013), but we will repeat the main points here. Synaptic blockers were applied to a culture of rat cortical neurons that was plated on a multi electrode array, effectively creating a population of isolated single neurons. A single electrode was chosen to deliver stimuli, whereas other electrodes that contained well isolated spikes were used for recording.

A natural concern when dealing with long timescales is isolating experimental drift from physiological processes. This concern was addressed as detailed previously (Gal et al., 2010). Controls included monitoring spike shape and amplitude and ensuring that there are no correlations between inactive periods of two synaptically isolated neurons that were recorded simultaneously.

Spikes were extracted from the extracellularly recorded voltage trace by setting a threshold at three times the SD. Spike latency was neglected because we aimed to study behavior in the timescale beyond the stimulation interval (0.1 s).

### Statistical observables

#### Probability trace

Input pulses and output spikes were collected in 1 s nonoverlapping bins. A continuous probability function was defined as follows:

*P*_{t}^{m} = *R*_{t}^{m}/*I*_{t}^{m},

where *R*_{t}^{m} and *I*_{t}^{m} are the number of output spikes and input pulses, respectively, in time bin *t* and trial *m*.
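A minimal sketch of this computation (toy counts, not experimental data):

```python
import numpy as np

# Probability trace: P[t, m] = R[t, m] / I[t, m], where R counts output
# spikes and I counts input pulses per 1 s bin (toy numbers, not data).
def probability_trace(R, I):
    """Per-bin, per-trial response probability; NaN where no pulses arrived."""
    R = np.asarray(R, dtype=float)
    I = np.asarray(I, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(I > 0, R / I, np.nan)

R = np.array([[10, 7, 5], [11, 6, 0]])     # spikes per bin, two trials
I = np.array([[11, 11, 11], [11, 11, 0]])  # pulses per bin (last bin empty)
P = probability_trace(R, I)
```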

#### Fano factor

The Fano factor is used to quantify the fluctuation level relative to the mean under constant stimulus. It is defined as follows:

*F*(*T*) = σ^{m}(*T*)^{2}/<*N*^{m}(*T*)>,

where *N*^{m}(*T*) is the number of spikes per time window *T* and σ^{m}(*T*) is the SD of *N*^{m}(*T*). The average is taken over all time windows of size *T*.
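A direct sketch of this computation, validated on a Poisson surrogate whose Fano factor is 1 at every window size:

```python
import numpy as np

def fano_factor(spike_counts, T):
    """Fano factor: variance/mean of spike counts N(T) over nonoverlapping
    windows of T bins. spike_counts is a 1D array of per-bin counts."""
    n = (len(spike_counts) // T) * T
    N = np.asarray(spike_counts[:n]).reshape(-1, T).sum(axis=1)
    return N.var() / N.mean()

rng = np.random.default_rng(0)
poisson = rng.poisson(5.0, size=10_000)  # Poisson surrogate: Fano ~ 1
f1 = fano_factor(poisson, 1)
f32 = fano_factor(poisson, 32)
```

For the data, a Fano factor that stays well above zero as `T` grows is the signature of fluctuations without a detectable upper timescale.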

#### Autocorrelation

Autocorrelation is used to study the fluctuations under constant stimulus:

*A*(Δ*t*) = <(*P*_{t}^{m} − *P̄*^{m})(*P*_{t+Δt}^{m} − *P̄*^{m})>/var(*P*_{t}^{m}),

where var(*P*_{t}^{m}) and *P̄*^{m} are the variance and mean of the probability trace, respectively. *A*(0) = 1 and is not displayed in the figures.
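A NumPy sketch of this estimator, checked on an AR(1) surrogate with known correlation structure (illustrative, not data):

```python
import numpy as np

def autocorrelation(P, max_lag):
    """Normalized autocorrelation of a trace: A(0) = 1 by construction."""
    d = np.asarray(P, dtype=float) - np.mean(P)
    v = d.var()
    return np.array([np.mean(d[:len(d) - k] * d[k:]) / v
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
a, x = 0.9, np.zeros(5000)
for t in range(1, 5000):           # AR(1): lag-1 correlation ~ a
    x[t] = a * x[t - 1] + rng.normal()
A = autocorrelation(x, 10)
```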

#### Covariance function

Covariance between input and output was used to characterize the impact of the stimulus on the output across time:

*Cov*(Δ*t*) = <(*I*_{t} − *Ī*)(*P*_{t+Δt} − *P̄*)>,

where *Ī* is the average stimulus intensity.

#### Reproducibility

This measure was defined to quantify the reproducibility of the output while eliminating the effect of trivial input dependency. Specifically, the mean response to a given input level was subtracted before calculating the intertrial correlation, where *P*(*I*) is the average response probability conditioned on a certain stimulus intensity *I* at 1 s resolution.
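A sketch of this measure on synthetic trials sharing a slow fluctuation on top of an input-driven component; conditioning on exact intensity levels is an assumption suited to the toy data:

```python
import numpy as np

def reproducibility(P, I):
    """Mean intertrial correlation of residuals after subtracting the
    conditional mean response P(I). P, I: (trials, bins) arrays; assumes
    the intensity sequence repeats exactly across trials."""
    P, I = np.asarray(P, float), np.asarray(I, float)
    resid = P.copy()
    for level in np.unique(I):
        mask = I == level
        resid[mask] -= P[mask].mean()       # remove input-driven component
    m = resid.shape[0]
    cors = [np.corrcoef(resid[i], resid[j])[0, 1]
            for i in range(m) for j in range(i + 1, m)]
    return float(np.mean(cors))

rng = np.random.default_rng(2)
I = np.tile(rng.integers(5, 20, size=200), (4, 1))  # same stimulus, 4 trials
shared = rng.normal(size=200)                       # shared slow fluctuation
P = 0.02 * I + 0.5 * shared + 0.1 * rng.normal(size=(4, 200))
r = reproducibility(P, I)
```

Because the input-driven part is removed, a high `r` reflects reproducible internal dynamics rather than trivial stimulus following.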

### Models

#### Single timescale model

This is the simplest model, which also forms the basis for all other models herein. It describes the dynamics of an abstract excitability variable *x* driven by incoming input pulses at stimulus times *t*_{i} and feedback from successful spikes *y*_{i}:

d*x*/d*t* = (1 − *x*)/τ − *U* ∑_{i} *y*_{i} δ(*t* − *t*_{i}) + σξ(*t*)

Both *x*_{i}^{−} and *t*_{i}^{−} refer to the values just before the stimulus and spike. *y*_{i} is the binary output response, sampled from the conditional probability *P*(*y*_{i}|*x*_{i}^{−}) given by a sigmoid function with slope β. After an output spike (*y*_{i} = 1), *x* decreases by a utilization fraction *U* and later recovers with timescale τ. Additive noise of magnitude σ enters through the Gaussian variable ξ. The model assumes that an input pulse that does not lead to an output spike does not affect excitability. The differential equation is simulated by the Euler method with a time step of 0.01 s and the noise scaled by √Δ*t* according to Ito's formula.
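The update rules above can be sketched as a direct simulation; the sigmoid midpoint (0.5) and the parameter values here are illustrative assumptions, not the paper's fitted values:

```python
import numpy as np

# Single timescale model sketch: x recovers toward 1 with timescale tau,
# drops by U after each output spike, and spiking is Bernoulli with a
# sigmoid probability of slope beta (midpoint x0 = 0.5 assumed here).
def simulate_single_timescale(stim_times, tau=10.0, U=0.01, beta=10.0,
                              sigma=0.0, dt=0.01, T=600.0, x0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    stim = np.zeros(n, dtype=bool)
    stim[(np.asarray(stim_times) / dt).astype(int)] = True
    x, xs, spikes = 1.0, np.empty(n), []
    for t in range(n):
        # Euler step; noise scaled by sqrt(dt) (Euler-Maruyama)
        x += dt * (1.0 - x) / tau + sigma * np.sqrt(dt) * rng.normal()
        if stim[t]:
            p = 1.0 / (1.0 + np.exp(-beta * (x - x0)))
            if rng.random() < p:       # output spike depletes excitability
                x -= U
                spikes.append(t * dt)
        xs[t] = x
    return xs, np.array(spikes)

stim_times = np.arange(0.0, 600.0, 1 / 11.5)  # ~11.5 Hz periodic stimulus
xs, spikes = simulate_single_timescale(stim_times)
```

With these illustrative parameters the simulated response drops from a one-to-one regime toward a partial-response steady state, mirroring the qualitative behavior described in the Results.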

##### Linearized analysis.

All the simulations were done using discrete input and output pulse trains. Most of the mathematical analysis, however, uses a continuous approximation, a coarse-grained version of the original equations, to facilitate the analysis. We replace *y*_{t} with a continuous function *f*(*x*) and replace δ(*t* − *t*_{i}) with a stimulus intensity *I*(*t*). The noise associated with the independent Bernoulli spiking is absorbed into the additive Gaussian noise with effective magnitude σ′, approximating the Bernoulli distribution by a normal one:

d*x*/d*t* = (1 − *x*)/τ − *U f*(*x*)*I*(*t*) + σ′ξ(*t*)

Linearizing around the fixed point *x̄* results in the following Langevin equation:

dδ*x*/d*t* = −θ δ*x* + σ′ξ(*t*), with θ = 1/τ + *U Ī f*′(*x̄*)

Despite the large fluctuations in response to a constant stimulus, the average response probability *P̄* can still constrain the model parameters through the steady-state equation

(1 − *x̄*)/τ = *U f*(*x̄*)*Ī*,

where we approximate <*f*(*x*)> = *f*(*x̄*). Substituting the actual sigmoid into Equation 11 gives rise to the equation that is the basis for deriving inequality 41 in the main text. The Langevin dynamics dictate the statistics of excitability as *x* ∼ *N*(*x̄*, σ′^{2}/2θ), with the autocorrelation timescale of *x* (and also of *P*) given by *T*_{ac} = 1/θ. To estimate the response to the fluctuations of the white noise stimulus, we consider the input's deviation from the mean using linear analysis (the fluctuation of 2.6 Hz is not too large compared with the mean of 11 Hz). Equation 16 allows us to evaluate the sensitivity of the response probability to an input perturbation δ*I* = ∑_{i} Δ*I*_{i} δ(*t* − *t*_{i}).

#### Adaptive timescale model

The adaptive timescale model (Marom, 2009) is the single timescale model with the fixed recovery timescale replaced by an excitability-dependent one, τ(*x*) = τ_{0}*x*^{−γ}. The model's coarse-grained version is

d*x*/d*t* = *F*(*x*, *I*) + σ′ξ(*t*),

which leads to the fixed point condition *F*(*x̄*, *I*) = 0. Marginal stability corresponds to a low value of ∂*F*(*x̄*, *I*)/∂*x*, and criticality happens when ∂*F*(*x̄*, *I*)/∂*x* ∼ 0. Given the average response probability *P̄*, we can calculate a critical exponent γ_{c} from Equation 22, which can also be determined graphically by estimating the slope of the nullcline in log scale.

#### Dynamical timescale model

The dynamical timescale model augments the adaptive timescale model by including an equation for the evolution of the recovery timescale as follows:
In addition to the parameters of the adaptive timescale model, we now need to infer τ_{r}. The most obvious observable effect of τ_{r} is the different τ(*x*) relations under the three stimuli. We can therefore get an intuitive estimate of τ_{r} by looking at the separation between the τ(*x*) curves for the scale-free and constant stimuli: a larger separation indicates a larger value of τ_{r}. The values obtained from the fitting procedure described below confirm this intuition.

#### Multiple timescale model

This model is a combination of two single timescales with arbitrary linear weighting as follows:
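A sketch of such a combination, with two excitability variables recovering on different timescales and combined linearly; depleting both variables on each spike and the sigmoid midpoint are illustrative assumptions:

```python
import numpy as np

# Multiple timescale model sketch: two excitability variables with recovery
# timescales tau1 and tau2 are combined linearly, x = w*x1 + (1-w)*x2, and x
# drives the same sigmoid spiking rule as the single timescale model.
def simulate_two_timescales(n_stim=5000, isi=0.087, tau1=2.0, tau2=100.0,
                            w=0.5, U=0.01, beta=10.0, x0=0.5, dt=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x1 = x2 = 1.0
    probs, k = [], 0
    for s in range(int(n_stim * isi / dt) + 1):
        x1 += dt * (1.0 - x1) / tau1      # fast recovery
        x2 += dt * (1.0 - x2) / tau2      # slow recovery
        if k < n_stim and s * dt >= k * isi:
            x = w * x1 + (1.0 - w) * x2
            p = 1.0 / (1.0 + np.exp(-beta * (x - x0)))
            probs.append(p)
            if rng.random() < p:          # spike depletes both variables
                x1 -= U
                x2 -= U
            k += 1
    return np.array(probs)

probs = simulate_two_timescales()
```

Because the slow variable keeps a long memory of past spiking, the effective timescale of such a model depends on the mixture weights rather than adapting to the stimulus statistics.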

### Inference methods

#### Inference of τ(*t*)

Given the stimulation times *t*_{i}, the model parameters *U* and β, and the data-observed probability trace *P*(*t*) = *R*(*t*)/*I*(*t*), where *R*(*t*) and *I*(*t*) are the number of spikes and stimuli in a given time window, respectively, we are interested in inferring τ(*t*) such that the resulting probability trace *P̂*(*t*) is as similar as possible to the data. We used an iterative procedure that is a simplified version of expectation maximization (Dempster et al., 1977). The main idea is to use the current estimate of τ(*t*) to obtain *P̂*(*t*) and then to use the difference *P̂*(*t*) − *P*(*t*) to change τ(*t*). This is done iteratively until convergence.

The resolution of τ(*t*) is chosen to be sufficiently low to allow the model dynamics to have an effect (see below).

##### Step 1: Initialization.

We cut the data into segments by assuming that τ does not change significantly during an individual segment, allowing us to use the single timescale steady-state condition as follows:
To avoid local minima, we add Gaussian noise with SD equal to *U* and interpolate between individual segments to obtain a continuous τ(*t*).

##### Step 2: Simulation.

Using τ(*t*), we simulate the two Equations (1 and 2) and get an output *P̂*(*t*).

##### Step 3: Correction.

Using the probability (Eq. 1) derived from the model output, we modify τ(*t*) according to *P̂*(*t*) − *P*(*t*) using

Δτ_{j} = η<τ>(*P*_{j} − *P̂*_{j}) + *k*(τ_{j−1} + τ_{j+1} − 2τ_{j}),

where η and *k* are the learning rate and the smoothness constraint, respectively. The reason for the scaling by <τ> is that the signal-to-noise ratio increases with the number of data points; we therefore put more weight on segments with more data points. We chose η = 0.2 and *k* = 0.03 according to a fitting score defined as the correlation between the model and data probability traces.

Steps 2 and 3 were iterated until convergence of τ(*t*) (∼100 iterations).
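The iteration can be sketched with a simplified forward model; the quasi-steady-state shortcut, the sign of the update, and the edge handling of the smoothness term are illustrative choices rather than the paper's implementation:

```python
import numpy as np

# Toy sketch of the iterative tau(t) inference (Steps 2-3). For tractability
# the forward model is a quasi-steady-state approximation: each segment's
# predicted probability follows from the fixed point
# (1 - x)/tau_j = U * f(x) * I, solved by bisection.
U, beta, I, x0 = 0.01, 10.0, 11.5, 0.5

def f(x):
    return 1.0 / (1.0 + np.exp(-beta * (x - x0)))

def p_hat(tau):
    """Steady-state response probability for a given recovery timescale."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if (1.0 - mid) / tau > U * f(mid) * I:
            lo = mid          # recovery beats depletion: fixed point is higher
        else:
            hi = mid
    return f(0.5 * (lo + hi))

true_tau = 10.0 + 3.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
P = np.array([p_hat(t) for t in true_tau])       # synthetic "data" trace

tau = np.full(50, 10.0)                           # flat initial guess
eta, k = 0.2, 0.03
for _ in range(200):
    P_model = np.array([p_hat(t) for t in tau])   # Step 2: simulate
    pad = np.pad(tau, 1, mode="edge")             # Step 3: correct + smooth
    laplacian = pad[:-2] - 2.0 * tau + pad[2:]
    tau += -eta * tau.mean() * (P - P_model) + k * laplacian
err = float(np.abs(tau - true_tau).mean())
```

Starting from a flat guess (mean absolute error ~1.9), the iteration recovers the sinusoidal τ(*t*) to within a small residual set by the smoothness term.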

##### Choice of estimation time window.

To estimate the accuracy of the inference method, we first applied it to output generated by a model with known τ(*t*), produced by the dynamical timescale equations (24–27), giving a response probability with the same statistics as the data. The estimation window cannot be smaller than the longest interstimulus interval of the scale-free stimulus (∼5 s) because empty data segments are highly detrimental to the convergence of the process. Conversely, the window should not be too large, to avoid missing the dynamics, leading us to choose a window of 5 s.

##### Effect of U and β on estimation process.

When validating on the model, *U* and β are known. For real data, however, we need to choose some values that might affect τ(*t*). We thus repeated the estimation procedure for different *U* and β values and found that the estimation error is similar whenever *U* and β are within a reasonable range. This range could be obtained by the following arguments: First, if *U*β is very small, then the dynamics are too slow to follow any stimulus and the covariance to white noise stimulus will not fit. Second, if *U*β is very large, then the estimation process diverges on scale-free stimulus. The reasonable range is set around *U*β = ∼0.1. To compare the results across neurons, we fix the same *U* and β values for the entire dataset as *U* = 0.01, β = 10.

As a sanity check, we compared the results of the model with τ(*t*) to Bernoulli sampling using the probability at each time segment. In a comparison at 1 s resolution, the model performed much better, indicating that fluctuations faster than 5 s can be explained by depletion and recovery processes within the time segment.

#### Evaluating dynamical relation of *x*

If we assume that τ(*t*) is actually a function of *x*, then we arrive at the following equation:
Both *F*(*x*, *I*) and σ′ were estimated from the model output from the increments of *x* over each stimulus interval δ*t*. For constant stimulus, we only need to consider the histogram of these increments conditioned on *x*. For variable stimulus, however, we need to condition on different *I* values as well. In particular, because we are interested in the slow dynamics of excitability, we low-pass filtered the input using a 5 s time window.
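The conditional-increment estimate can be checked on a synthetic trajectory with a known drift; the Ornstein-Uhlenbeck parameters below are illustrative:

```python
import numpy as np

# Estimate drift F(x) and diffusion sigma' by binning a trajectory by x and
# taking conditional statistics of the increments. The surrogate is an OU
# process with known drift -theta*(x - x_bar), standing in for the
# inferred excitability dynamics.
rng = np.random.default_rng(3)
theta, x_bar, sigma, dt, n = 0.5, 0.55, 0.2, 0.1, 200_000
x = np.empty(n)
x[0] = x_bar
for t in range(1, n):
    x[t] = (x[t - 1] - theta * (x[t - 1] - x_bar) * dt
            + sigma * np.sqrt(dt) * rng.normal())

dx = np.diff(x)
edges = np.linspace(x_bar - 0.3, x_bar + 0.3, 13)
centers = 0.5 * (edges[:-1] + edges[1:])
drift = np.full(len(centers), np.nan)
for i in range(len(centers)):
    mask = (x[:-1] >= edges[i]) & (x[:-1] < edges[i + 1])
    if mask.sum() > 100:
        drift[i] = dx[mask].mean() / dt            # conditional mean of dx/dt
valid = ~np.isnan(drift)
slope = np.polyfit(centers[valid], drift[valid], 1)[0]   # ~ -theta

central = np.abs(x[:-1] - x_bar) < 0.05
sigma_est = dx[central].std() / np.sqrt(dt)              # ~ sigma
```

The fitted slope of the conditional drift recovers −θ, and the conditional SD of the increments recovers σ′, which is exactly the information used to classify the regime as marginally stable or strongly attracting.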

#### Inference of γ for adaptive timescale

The parameter γ can be estimated by the following method:

1. Using fixed values of *U* = 0.01 and β = 10, estimate τ(*t*) from the data.
2. Plot the resulting τ(*t*).
3. Using the same *U* and β values, simulate the dynamics using τ = τ_{0}*x*^{−γ}. This is done for several values of γ between 0 and the critical γ. For each γ, τ_{0} is calculated from the mean field equation (Eq. 21).
4. Plot all resulting τ(*x*) curves and choose γ based on the curve closest to the τ(*x*) estimated from the data.

In the next section, we show an alternative method of extracting γ. The values obtained there are consistent with the method described above.

#### General parameter fitting

In addition to the custom method described for γ above, we used a more general approach of cross-validation to fit the parameters of all models, which allowed us to compare the predictive power of different models and avoid over fitting.

Divide the data into training and test sessions: the white noise stimulus is chosen as the training session, and the constant and scale-free stimuli are chosen as test sessions. This is because white noise has both well defined autocorrelation and input–output covariance functions. Furthermore, the scale-free data provide a good test because both the slow and fast features need to be captured to get a good fit.

Scan the parameter space and then choose the best parameters according to a training score composed of different features of the data:

Score = *S*(*P*_{w}(*t*))^{2} + *S*(*Auto*_{w}(*t*))^{2} + *S*(*Cov*_{w}(*t*))^{2} − 0.1σ/*U*.

The four terms are as follows: the Euclidean distance between the probability traces at 1 s resolution; the Euclidean distance between the autocorrelation functions at 1 s resolution; the input–output covariance at 1 s resolution; and a cost associated with σ/*U* as a bias toward solutions with low additive noise. Each component is normalized by the range of values in the entire parameter set and then raised to the power of 2. The results are quite insensitive to the power as long as it is at least 2.

To capture the variability of the data, we also use one feature from the constant stimulus: only parameters that produce a Fano factor (in 32 s windows) larger than 70% of that computed from the data were considered.
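A sketch of the score computation with two stand-in features; the sign convention (maximizing negated squared normalized distances minus the noise cost) and the toy features are assumptions for illustration:

```python
import numpy as np

def normalized_distances(model_feats, data_feat):
    """Euclidean distance of each candidate's feature to the data feature,
    normalized by the range over all candidates."""
    d = np.array([np.linalg.norm(f - data_feat) for f in model_feats])
    span = d.max() - d.min()
    return d / span if span > 0 else d

rng = np.random.default_rng(4)
data_prob = rng.normal(size=100)       # stand-ins for the data features
data_auto = rng.normal(size=50)
candidates = []                        # three hypothetical parameter sets
for eps in (0.1, 0.5, 1.0):
    candidates.append({
        "prob": data_prob + eps * rng.normal(size=100),
        "auto": data_auto + eps * rng.normal(size=50),
        "sigma_over_U": eps,
    })
S_prob = normalized_distances([c["prob"] for c in candidates], data_prob)
S_auto = normalized_distances([c["auto"] for c in candidates], data_auto)
scores = [-(S_prob[i] ** 2 + S_auto[i] ** 2)
          - 0.1 * candidates[i]["sigma_over_U"] for i in range(3)]
best = int(np.argmax(scores))
```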

The test score is calculated as the average distance between probability traces averaged in 1 s bins for the constant and scale-free stimuli.

Because of the intertrial variability in the data, we do not expect errors to go to zero. We therefore calculated the average distance between two probability traces derived from two halves of the trials. This error serves as a reference for the fitting scores and is displayed as a dashed line in Figure 6*C*.

In addition to the cross-validation study, we also estimated the training error on the entire dataset. We used a training score similar to that of the cross-validation session, but with more features of the data:

Score = *S*(*P*_{c}(*t*))^{2} + *S*(*P*_{w}(*t*))^{2} + *S*(*P*_{f}(*t*))^{2} + *S*(*r*_{c}(*t*))^{2} + *S*(*r*_{w}(*t*))^{2} + *S*(*r*_{f}(*t*))^{2} + *S*(*Auto*_{c}(*t*))^{5} + *S*(*Auto*_{w}(*t*))^{2} + *S*(*Cov*_{w}(*t*))^{5} − 0.1σ/*U*.

The additional elements are *r*_{z}, which stands for the mean response probability for stimulus *z*. The autocorrelation in constant stimulus and the covariance in white noise stimulus are raised to a higher power to better differentiate them because they are the principal features of the data. The selection of the optimal parameters is also constrained by the constant-stimulus Fano factor as above.

### Parameter tables

Tables 1 and 2 show the parameters used to fit the autocorrelation to constant stimulus and input–output covariance to white noise stimulus, respectively, which are used to generate Figure 2.

Tables 3, 4, 5, and 6 list the parameters trained with the white noise stimulus for the various models, which are used to generate Figure 6. Robustness of the parameters was checked with ±10% variation around the chosen value. The sensitivity of the results to parameter variation differs across neurons. For the half of the neurons that are only slightly marginal, a 10% variation of each parameter makes almost no visible impact. For the most critical neuron, the results are quite sensitive to parameter changes, especially for γ, β, and τ_{1}, which affect criticality: increasing γ by 10% or decreasing β by 10% would make the neural dynamics unstable.

Table 7 shows the parameters trained by all three stimuli for the dynamical timescale model, which are used to generate Figures 7 and 8.

## Results

### Experimental observations

To study the long-term dynamics of single neuron excitability, we analyzed data obtained by Gal and Marom (2013). Briefly, seven synaptically isolated neurons situated in a culture of rat cortical neurons were stimulated by extracellular pulse trains lasting 10 min each. The pulse trains were designed as 10 repetitions of identical time series to study the reproducibility of responses. The output spikes were sorted from extracellular recordings. Several controls were made to ensure that the measured variability arises from slow excitability dynamics and not experimental drifts (see Materials and Methods). Because we were interested in long-term dynamics, the output spikes were averaged in 1 s time bins to produce a continuous firing rate for further analysis. This choice led us to denote a periodic pulse train as “constant stimulus” herein to avoid confusion with an oscillatory stimulation rate. The actual stimulus is of course different from intracellular experiments in which the same term is used for constant current injection. Importantly, basing the analysis on this continuous firing rate rather than on discrete spikes still allows us to describe the major experimental observations of intermittency and entrainment, as will be detailed below.

Intermittency is mostly observed for a constant stimulus. In this case, the neuron initially responds with an action potential to every stimulus pulse (Fig. 1*A*,*E*, top right). After ∼30 s, however, the probability of response drops to ∼60% and fluctuates around this value (Fig. 1*E*, top right, shows an input of 11.5 Hz leading to an output of around 7 Hz). The simplest explanation is of some fatigue process characterized by a timescale that is longer than the interstimulus intervals, such as slow sodium inactivation and potassium activation (Adelman and Palti, 1969; Soudry and Meir, 2012). A closer look at the fluctuations, however, reveals a more complex picture. Even if we average the response over 5 s bins, the fluctuations do not seem to subside (Fig. 1*B*, cyan vs purple). Following Gal et al. (2010), we quantify this effect by plotting the Fano factor as a function of time window for three different neurons (Fig. 1*C*). The degree of intermittency in the response varies between different neurons. For all neurons, however, the Fano factor does not go to zero, indicating that we cannot detect an upper bound for the recovery timescale. We also characterized these fluctuations by computing the autocorrelation of the response (Fig. 1*D*) that echoes the differences in response variability seen in the Fano factor.

In contrast to this variability, a much more reliable response profile is revealed when the stimulus itself presents large variability. Entrainment of response to stimulus was studied by stimuli with either white noise or scale-free interpulse statistics (Fig. 1*E*, left). All three input protocols were matched to the same mean rate of 11.5 Hz. As can be observed from Figure 1*E* (right), the neuron's output is better entrained by the stimulus as the latter's variability increases. We later quantify entrainment by the correlation between input and output firing rates and the reproducibility across trials (see Figs. 2, 5, 7, and 8).

### A minimal model suggests stimulus-dependent timescales

Intermittency and entrainment represent two seemingly contradictory features of the neuron's response to stimulus. To provide one framework for explaining both aspects while keeping the structure as simple as possible, we started by looking for a minimal model.

The classical approach to neural excitability is through the Hodgkin–Huxley formalism, in which excitability could be regarded as response sensitivity, depending mainly on the ratio between sodium and potassium conductance (Gilboa et al., 2005). The original Hodgkin–Huxley model focuses on the fast millisecond dynamics of spiking. Although slow timescale extensions to this formalism were proposed (Adelman and Palti, 1969; Rudy, 1978; Schneidman et al., 1998), we chose to pursue a different route to keep our focus only on prolonged and slow excitability dynamics beyond 1 s. We therefore use a simplified framework that is formulated directly in this scale.

Specifically, our minimal model consists of an abstract excitability variable *x* that represents resources (e.g., fraction of active sodium channels). Upon a spike, *x* decreases by a utilization fraction *U* and later recovers with a timescale τ (Fig. 2*A*). The model also includes an additive noise term to capture variability of the output. The probability of spiking given an input pulse is determined by the excitability *x* according to a sigmoid function (Gilboa et al., 2005). The model, with parameters *U*, τ_{0}, β, and σ, is given by the following equations:
where ξ is taken from a normal distribution with zero mean and unit variance. By an appropriate choice of parameters, the model can capture the overall response probability in response to a constant stimulus. The neuron initially responds to the stimulus in a one-to-one manner and neural excitability drops upon each spike. After ∼50 s, the neuron begins to react stochastically, which in turn weakens depletion and increases recovery. This model, however, is not able to capture both the intermittency and entrainment aspects of the data, as will be shown below.

To gain insight about the model, we use a coarse-grained version that will serve as a basis for subsequent models here as well. Specifically, we replace both the discrete input and output with smoothed continuous versions and approximate stochastic spiking with an additive Gaussian noise (see Materials and Methods). Note that the model is still simulated with discrete input and output pulses, but the continuous version below facilitates our understanding of it:
For the constant and white noise stimuli, we linearize the dynamics around the fixed point *x̄* implicitly defined by the following:
The resulting linearized equations describe an Ornstein–Uhlenbeck process for the excitability as follows:
The formulas above lead to the following observables: the autocorrelation function of the output probability under constant stimulus (via its timescale *T*_{ac}) and the input–output covariance function under white noise stimulus (via its amplitude χ and timescale *T*_{cv}), where <δ*I*> is the SD of the input averaged in 1 s time bins. The two observables are not independent because the partial derivatives are related, giving rise to a constraint: *T*_{ac} is inversely correlated with χ, imposing a tradeoff between long-term fluctuations and high sensitivity to input.

Measuring these observables from the data reveals a long autocorrelation time *T*_{ac} under constant stimulus (Fig. 2*C*, black: 5–50 s), but also a large input–output covariance χ under white noise stimulus (Fig. 2*D*, black, −0.1 to −0.2), violating the tradeoff imposed by the minimal model. The parameter set satisfying the autocorrelation in constant stimulus (Fig. 2*C*, red) is inadequate to describe the neuron's response to a fluctuating input for most of the dataset, as characterized by the input–output covariance under white noise stimulus (Fig. 2*D*, red). Conversely, fitting the input–output covariance in white noise stimulus (Fig. 2*D*, green) results in a much shorter autocorrelation timescale in constant stimulus (Fig. 2*C*, green). This failure of the single timescale model indicates that the effective timescales in the same neuron differ by an order of magnitude. A possible explanation could be multiple timescales operating in parallel, but we show below (in the section "Dynamical timescale model") that this does not describe the data well. Instead, we opt for a phenomenological description of a single timescale that adapts to different input statistics.

Another paradox revolves around the noise level. In the data, the Fano factor is large across all timescales, suggesting σ ≫ *U* (Fig. 2*B*). At the same time, we observe a relatively high reproducibility across trials to white noise and scale-free stimuli (Fig. 8*B*,*C*), suggesting a high signal-to-noise ratio, implying σ ≪ *U*. Fitting the single timescale model to the observed Fano factor results in negligible reproducibility for fluctuating stimuli (0.06 ± 0.02 model vs 0.42 ± 0.16 data to white noise and 0.11 ± 0.08 vs 0.55 ± 0.15 for scale-free stimulus). This apparent paradox raises questions on the origin of “noise”: are the observed fluctuations really noise or do they point to unknown biological fluctuations that interact with the internal state of the neuron? This issue will also be discussed in the following section.

### Inferred dynamic timescale suggests a marginally stable regime

If the best choice of a timescale depends on the stimulus regime, could this optimal timescale also change on a finer scale? To explore this possibility, we took a novel analysis approach: measuring a dynamic timescale. We assumed that τ is a function of time and developed a simplified expectation-maximization algorithm to infer τ(*t*) from the data (see Materials and Methods). The other parameters, *U* and β, were chosen to fit the input–output covariance under fluctuating stimuli. Given the thousands of parameters tuned to the data, it is not surprising that the model fits the data perfectly (Fig. 3*A*). We will show that, despite this overfitting, we can still gain insights from these results.

As we suspected, τ(*t*) varies not only between stimulus regimes, but also within the same stimulus regime (Fig. 3*B*). To gain insight into these fluctuations, we return to the coarse grained dynamics of Equation 3 and extend them to the case where τ depends on time as follows:

$$\frac{dx}{dt} = F(x, I, t) + \sigma'\xi, \qquad F(x, I, t) = \frac{1-x}{\tau(t)} - U I f(x) \tag{42}$$

Equation 42 can be viewed as a generalized drift diffusion equation for excitability, with the drift term *F*(*x*, *I*, *t*) describing the deterministic effects and the diffusion term σ′ξ describing randomness inside the system.
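The drift-diffusion picture can be illustrated with a short Euler–Maruyama simulation. This is a sketch under assumptions: the sigmoid transfer function `f`, its gain and threshold, and all numerical values are illustrative choices rather than values fitted to the data, and τ is held fixed for simplicity.

```python
import numpy as np

def simulate_excitability(T=2000, dt=0.05, tau=5.0, U=0.1, I=11.5,
                          beta=8.0, theta=0.5, sigma=0.02, seed=0):
    """Euler-Maruyama integration of the drift-diffusion excitability
    equation dx/dt = (1 - x)/tau - U*I*f(x) + sigma*xi(t).
    f is an illustrative sigmoid transfer function; all parameter
    values are assumptions, not fits to the data."""
    rng = np.random.default_rng(seed)
    f = lambda x: 1.0 / (1.0 + np.exp(-beta * (x - theta)))
    n = int(T / dt)
    x = np.empty(n)
    x[0] = 0.5
    for t in range(n - 1):
        drift = (1.0 - x[t]) / tau - U * I * f(x[t])
        x[t + 1] = x[t] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        x[t + 1] = min(max(x[t + 1], 0.0), 1.0)  # excitability stays in [0, 1]
    return x
```

With these illustrative values the trajectory settles near a fixed point where recovery balances utilization, with noise-driven fluctuations around it.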

We make a further simplification by assuming that the drift term depends on time only indirectly. Specifically, we replace *F*(*x*, *I*, *t*) with *F*(*x*, *I*) by averaging across all time bins that share the same value of *x* and *I* (see Materials and Methods).

The qualitative behavior of the system is strongly affected by the ratio of drift to diffusion, *F*(*x*, *I*)/σ′. We thus measure *F*(*x*, *I*)/σ′ directly from the data via our τ estimation procedure. For the constant stimulus (*I* = 11.5 Hz), *F*(*x*, *I*) depends only on *x*. The red curves in Figure 3*D* show the dependence of *F*(*x*)/σ′ on *x*, with the shaded area indicating the noise dominated regime (*F*(*x*) < σ′). A stable fixed point exists at *x̄* ≈ 0.55, revealed by the negative slope around *F*(*x̄*) = 0. This slope, although negative, is small compared with the noise, indicating weak, or marginal, stability. We will show that this marginally stable regime, characterized by a weakly attracting equilibrium with noise-driven dynamics, can explain the main experimental observations.
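The drift can be estimated from a sampled excitability trace by binning the state and averaging the increments within each bin. This is a minimal sketch of the estimation idea, not the τ-based procedure of Materials and Methods; the bin count and sample threshold are arbitrary illustrative choices.

```python
import numpy as np

def estimate_drift(x, dt, n_bins=20):
    """Bin the state variable and average the increments dx/dt within
    each bin to estimate the drift F(x); the diffusion scale sigma' is
    estimated from the residual increment variance. A minimal sketch of
    the estimation idea, not the paper's full procedure."""
    dx = np.diff(x) / dt
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x[:-1], edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    F = np.full(n_bins, np.nan)
    resid = []
    for b in range(n_bins):
        mask = idx == b
        if mask.sum() > 10:                 # skip sparsely visited bins
            F[b] = dx[mask].mean()
            resid.append(dx[mask] - F[b])
    sigma = np.concatenate(resid).std() * np.sqrt(dt)  # diffusion scale
    return centers, F, sigma
```

Applied to an inferred *x*(*t*), bins where |*F*(*x*)| < σ′ would mark the noise dominated regime.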

These dynamics are qualitatively different from those obtained from a single timescale model, as shown in the previous section. To show this, we apply the same estimation process to output generated from a single timescale model with the same *U*, β, calibrated to the same fixed point. The associated drift diffusion ratio *F*′(*x*)/σ′ is much steeper than that measured from the data, revealing a stable attractor (blue line in Fig. 3*D*). In this case, we can determine the slope analytically: ∂*F*′(*x̄*)/∂*x* = −(1/τ_{0} + *U I f*′(*x̄*)). Because we use a relatively large *U* and a short τ_{0} to fit the response to the white noise stimulus, |∂*F*′(*x̄*)/∂*x*| is also relatively large. Since the diffusion term is the same for both models, we have |∂*F*(*x*)/∂*x*| ≪ |∂*F*′(*x*)/∂*x*|.

This observation leads to a new explanation of the origin of excitability fluctuations: marginally stable dynamics. Because, for most *x* values estimated from the data, *F*(*x*) is smaller than σ′, the dynamics of *x* are fluctuation dominated and resemble a random walk. Combined with the nonlinear transfer function, the resulting spike train is characterized by long-term correlated fluctuations (Soen and Braun, 2000). Note that this is a different source of noise from the large additive noise introduced in the single timescale model.
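This mechanism can be illustrated with a toy simulation: a bounded random walk in excitability, passed through an assumed sigmoid transfer function, yields spike counts whose Fano factor grows with the counting window, unlike a rate-matched Bernoulli control. All values here are illustrative, not fitted to the data.

```python
import numpy as np

def fano_curve(spikes, window_sizes):
    """Fano factor (variance / mean of spike counts) computed in
    windows of increasing size."""
    out = []
    for w in window_sizes:
        n = len(spikes) // w
        counts = spikes[:n * w].reshape(n, w).sum(axis=1)
        out.append(counts.var() / counts.mean())
    return np.array(out)

rng = np.random.default_rng(1)
T = 200_000
# excitability performs a bounded random walk (fluctuation-dominated regime)
x = np.clip(np.cumsum(0.01 * rng.standard_normal(T)) + 0.5, 0.05, 0.95)
p = 1.0 / (1.0 + np.exp(-8.0 * (x - 0.5)))   # nonlinear transfer to spike prob.
walk_spikes = rng.random(T) < p
flat_spikes = rng.random(T) < p.mean()        # rate-matched Bernoulli control
ws = [10, 100, 1000]
```

The random-walk-driven train shows a Fano factor rising with window size, the signature of long-term correlated fluctuations, while the control stays near the Bernoulli value.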

We now consider how entrainment by variable stimuli can occur in such a setting. The neuron's response to fast changes in the input is addressed by choosing *U* to fit white noise susceptibility. What remains to be explained is entrainment over slower timescales, which is determined by changes to *F*(*x*, *I*)/σ′ as a function of *I*.

We thus measure the dependence of *F*(*x*, *I*)/σ′ on *I* at a fixed value of *x* for white noise and scale-free stimuli. Figure 3*E* shows that the drift is larger than the noise for many input values used in the experiment. This property can be formalized as an inequality of the linearized dynamics as follows:

$$\left|\frac{\partial F}{\partial I}\,\delta I\right| > \sigma' > \left|\frac{\partial F}{\partial x}\,\delta x\right|$$

This inequality reconciles the long autocorrelation timescale with the high input–output covariance. The dynamics are diffusion dominated for a stationary input (intermittency, large Fano factor) and drift dominated in response to external perturbations (entrainment, high reproducibility). These two features are hallmarks of the marginally stable regime.

### Adaptive recovery timescale accounts for marginal stability

Our description above of *F*(*x*, *I*) used the estimation of τ only as an intermediate step. Specifically, τ(*t*) was used to derive *x*(*t*), from which we derived *F*(*x*, *I*). We now use our observations to formulate a compact model that captures the main aspects of the data. To that end, we determine what functional dependence τ(*x*) leads to marginal stability. In the extreme case of *F*(*x*, *I*) = 0, this amounts to τ(*x*) = (1−*x*)/(*U I f*(*x*)), as depicted in Figure 4*C*. This functional form implies a history-dependent recovery timescale (Toib et al., 1998; Lundstrom et al., 2008) and is very similar to a power law dependence, as suggested by the adaptive transition rate model (Marom, 2009). The biophysical intuition behind that model is of an ensemble of channels that reside in various states along an abstract inactivation axis (Fig. 4*A*). Recovery from the inactive domain depends on diffusion and is thus slower for states that are farther to the right. The current excitability acts as a reporter on the distribution of inactive states, leading to the relationship τ ∝ *x*^{−γ}. In this paper we refer to this model as the adaptive timescale model rather than the adaptive transition rate model, and it is given by the following:

$$\frac{dx}{dt} = \frac{1-x}{\tau(x)} - U I f(x), \qquad \tau(x) = \tau_0 x^{-\gamma} \tag{45}$$
Different levels of marginal stability associated with each neuron are captured by different values of the exponent γ (Fig. 4*B*, *C*). The critical value of γ, for which the dynamics are closest to the critical state, is obtained from ∂*F*(*x*, *I*)/∂*x* = 0 (see Materials and Methods). With marginal stability, this model is able to fit the long-term fluctuations under the constant stimulus, as revealed by the Fano factor measured over 32 s (1.3 ± 0.9 model vs 1.4 ± 1 data), while maintaining a strong signal-to-noise ratio for fluctuating input, as revealed by reproducibility (0.21 ± 0.09 model vs 0.42 ± 0.16 data for white noise stimulus and 0.55 ± 0.15 model vs 0.46 ± 0.24 data for scale-free stimulus with best fit). Note that Equation 45 shows that the sensitivity to input is not affected by marginality, so the inequality between external and internal sensitivity holds for a large range of input values as follows:

$$\left|\frac{\partial F}{\partial I}\,\delta I\right| > \sigma' > \left|\frac{\partial F}{\partial x}\,\delta x\right|$$
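The critical exponent can be located numerically by scanning γ and asking where the fixed point slope ∂*F*/∂*x* is flattest. This is a sketch under assumptions: the sigmoid `f` and all parameter values stand in for the fitted ones and are not taken from the paper.

```python
import numpy as np

def critical_gamma(U=0.1, I=11.5, tau0=0.5, beta=8.0, theta=0.5):
    """Numerically locate the exponent gamma that brings the adaptive
    timescale dynamics closest to marginal stability: scan gamma, find
    the fixed points of F(x) = (1 - x) * x**gamma / tau0 - U*I*f(x) on
    a grid, and return the gamma whose fixed point has the flattest
    slope dF/dx. The sigmoid f and all parameter values are
    illustrative assumptions, not fits to the data."""
    f = lambda x: 1.0 / (1.0 + np.exp(-beta * (x - theta)))
    xs = np.linspace(1e-3, 0.999, 2000)

    def F(x, g):
        return (1.0 - x) * x ** g / tau0 - U * I * f(x)

    best_slope, best_gamma = np.inf, None
    for g in np.linspace(0.1, 4.0, 400):
        Fx = F(xs, g)
        # indices where F changes sign: fixed points of the dynamics
        cross = np.nonzero(np.sign(Fx[:-1]) != np.sign(Fx[1:]))[0]
        for c in cross:
            xbar = 0.5 * (xs[c] + xs[c + 1])
            h = 1e-5
            slope = (F(xbar + h, g) - F(xbar - h, g)) / (2 * h)
            if abs(slope) < best_slope:
                best_slope, best_gamma = abs(slope), g
    return best_gamma
```

The flattest slope occurs near the value of γ at which the stable and unstable fixed points merge, which is where the dynamics become marginally stable.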
Even without additive noise (σ = 0), the model captures the marginal stability and fits both the autocorrelation (Fig. 5*A*) and the Fano factor (Fig. 5*C*). Additive noise can offer an alternative account of response fluctuations (Fig. 5*C*). The resulting dynamics, however, are different. If σ is responsible for the variability, then excitability performs a random walk independent of the input. As a result, we get lower reproducibility (Fig. 5*D*). In contrast, variability originating from marginal stability causes the magnitude of the random walk to be heavily influenced by the stimulus, leading to high reproducibility between trials (Fig. 5*D*). Therefore, by measuring the Fano factor and reproducibility together, we can infer the origin of the fluctuations.

Apart from providing a compact description of the data, our estimation method offers the first measurement of the exponent γ directly from data (see Materials and Methods). Closer inspection, however, reveals the shortcomings of this model. The covariance function for the white noise stimulus shows a significant delay that is not observed in the real data and cannot be eliminated by varying parameters (Fig. 5*B*). Another discrepancy is observed in the response to the scale-free stimulus: there is a systematic error associated with high input intensity (Fig. 5*E*), showing that the input–output relation is not described correctly. These failures limit the predictive power of the model, which is only mildly better than that of the single timescale model (discussed below; Fig. 6*C*).

The systematic error in the fit to the scale-free stimulus suggests that τ might not be completely determined by *x*. Probing this relationship directly, Figure 5*F* shows that τ(*x*) indeed differs between white noise and scale-free stimuli. This discrepancy leads us to our final model.

### Dynamical timescale model

Guided by the findings above, we reexamine the underlying intuition behind the adaptive timescale model (Fig. 4*A*). Specifically, we note that, for excitability to be a reporter of the recovery timescale, the distribution of channels has to reach a steady state. If, however, the diffusion timescale is comparable to or longer than the transition timescale between active and inactive states, this is no longer the case (Fig. 6*A*). We thus assume that τ relaxes toward a function of *x* as follows:

$$\frac{d\tau}{dt} = \frac{\tau_1 x^{-\gamma} - \tau}{\tau_r}$$
This simple modification greatly improves the ability of the model to describe the data, as revealed by the following cross-validation. We fit the model parameters for each neuron using the entire white noise stimulus and the Fano factor from the constant stimulus (Fig. 6*B*, middle). Using the same parameters, the model provides a good description of the response to constant and scale-free stimuli (Fig. 6*B*). We quantified both the test error and the training error (the ability to fit all three stimuli) for all models described so far, showing a jump in performance when moving to the dynamic timescale model (Fig. 6*C*). The relaxation timescale τ_{r} accounts for the separate *x* − τ relationships derived from constant and scale-free stimuli (Fig. 7*A*). Figures 7 and 8 summarize various comparisons of the model to the data, showing that we are able to address all the main shortcomings of the adaptive timescale model. The systematic error in the input–output relation for the scale-free stimulus is corrected (Fig. 7*B*). The model precisely captures both the autocorrelation function under constant stimulus (Fig. 7*C*) and the covariance function under white noise stimulus, without any delay (Fig. 7*D*). Figure 8, *A* and *B*, shows that the model fits the average response probability and its variance at the individual neuron level.
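A minimal simulation sketch of such a two-variable system is given below, assuming a power law relaxation target τ₁*x*^{−γ} for τ; the target function and all parameter values are assumptions for illustration, not the fitted model.

```python
import numpy as np

def simulate_dynamic_timescale(I, dt=0.05, U=0.1, tau0=0.4, tau1=0.4,
                               beta=8.0, gamma=1.5, tau_r=3.9):
    """Sketch of the two-variable dynamical timescale model:
       dx/dt   = (1 - x)/tau - U * I(t) * f(x)
       dtau/dt = (tau1 * x**(-gamma) - tau) / tau_r
    The power law relaxation target and all parameter values are
    illustrative assumptions, not the paper's fitted model."""
    f = lambda x: 1.0 / (1.0 + np.exp(-beta * (x - 0.5)))
    n = len(I)
    x, tau = np.empty(n), np.empty(n)
    x[0], tau[0] = 0.5, tau0
    for t in range(n - 1):
        x[t + 1] = x[t] + dt * ((1.0 - x[t]) / tau[t] - U * I[t] * f(x[t]))
        x[t + 1] = min(max(x[t + 1], 1e-3), 1.0)   # keep excitability in (0, 1]
        tau[t + 1] = tau[t] + dt * (tau1 * x[t] ** -gamma - tau[t]) / tau_r
    return x, tau
```

Under a constant input, *x* and τ settle jointly, recovering an adaptive-timescale-like fixed point; a time-varying *I*(*t*) drives the slower τ dynamics on top of the fast *x* response.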

Owing to marginal stability, the model captures both intermittency and entrainment rather well, as measured by the Fano factor (1.3 ± 0.5 data vs 1.4 ± 1 model) and the average reproducibility (0.32 ± 0.08 model vs 0.42 ± 0.16 data for white noise stimulus and 0.55 ± 0.15 model vs 0.55 ± 0.12 data for scale-free stimulus with best fit). Although all neurons display an intermittent response to constant stimulus and entrainment to variable input, the balance between these two features varies. Figure 8, *B–E*, shows different measures of this tradeoff, with lines connecting the model and data for each neuron. For most neurons, these lines are short, indicating that the model is able to capture the different balances between intermittency and entrainment across the full experimental range.

Some insight into the model can be gained by dividing the parameters into those responsible for fast responses (*U*, τ_{0}, β) and those governing slow dynamics (γ, τ_{1}). The former are mainly revealed by the immediate response to input perturbations, whereas the latter manifest mainly in the long-term fluctuations and history dependence. The stimulus-dependent τ(*x*) profile (Fig. 7*A*) is mostly shaped by the slow dynamics. Through this separation of timescales, the dynamical timescale model answers the original question about the distinct timescales of the output autocorrelation and the input–output covariance. The parameters obtained indicate that the relaxation process is substantially slower than the recovery process (3.9 ± 3 s for τ_{r} vs 0.4 ± 0.2 s for τ_{0}), decoupling the immediate impact of input perturbations from the slow process. Because the dynamical timescale model provides a good description of both slow and fast processes, it is able to predict the response to the scale-free stimulus, which contains both slow and fast components.

The dynamical timescale is an abstraction of many processes with different timescales. To compare the model's predictive power with a more explicit representation of multiple processes, we fit a model with 10 parameters that combines two independent timescales (see Materials and Methods). As shown in Figure 6, *C–E*, although this model fits the training data as well as the dynamical timescale model does, its predictive power is worse. In particular, it is not able to capture intermittency and entrainment simultaneously.

## Discussion

We analyzed the response of several neurons to different types of extended stimuli. Consistent with previous experimental reports, we found response fluctuations over long timescales for constant inputs (intermittency) and less variable responses for more variable stimuli (entrainment). Our analysis revealed that the neurons operate in a marginally stable regime that reconciles these two seemingly paradoxical aspects. This regime is characterized by a separation of timescales in the reaction to internal and external perturbations, which are linked directly to intermittency and entrainment, respectively. The regime thus explains both the scale-free fluctuations observed under constant stimulus and the strong input coupling observed under white noise stimulus.

We proposed a novel approach to derive a model for this multiple timescale adaptation. Using a simplified expectation maximization method, we inferred from the data both a continuous adaptation timescale, τ(*t*), and the corresponding excitability, *x*(*t*). Inspecting the inferred excitability trace revealed the marginally stable regime described above.

We showed that the qualitative features of the data can be approximated by the adaptive timescale model proposed by Marom (2009). We provided the first experimental estimate of the model's power law exponent, γ. This model can only match the observed multiple timescale fluctuations when γ approaches a critical value, corresponding to a random walk along an approximate line attractor.

Although the adaptive framework provides a decent approximation to the data, several systematic errors remain, primarily the dependence of the neuron's response on long-term statistics. To address these issues, we proposed the dynamical timescale model. This model treats the recovery timescale as a dynamic variable, giving rise to a 2D dynamical system. The model explains all major observations on the data.

The structure of the model suggests a possibility of a hierarchy of adaptive timescales in which τ_{r} itself is determined by another differential equation, similar in form to that of the recovery timescale τ. For the data that we studied, our two-layer model proved to be sufficient. It might be speculated that a full hierarchical structure would be revealed by longer experiments.

The issue of variability reduction by natural stimuli was also addressed on shorter timescales, both experimentally (Mainen and Sejnowski, 1995) and theoretically (Schneidman et al., 1998). The model proposed there consisted of a stochastic Hodgkin–Huxley model poised near threshold. Although formulated for very different timescales, the essence of that dynamical state is similar to our proposed marginal stability. A similar regime was also explored in a very different setting: networks with balanced excitation and inhibition operating in a fluctuation driven regime for a large range of parameters (van Vreeswijk and Sompolinsky, 1996, 1998). Note that, in our case as well, marginal stability does not depend on a finely tuned choice of parameters or inputs.

Our model draws inspiration from the experiments and models developed for ion channel dynamics. Channel gating was described as a discrete Markov process (McManus and Magleby, 1988; Millhauser et al., 1988) or as a deterministic fractal system (Liebovitch et al., 1987; Lowen et al., 1999). Despite their different origin, both descriptions gave rise to similar wide distributions (up to power law) of residence durations (Korn and Horn, 1988). Our model deals with longer timescales, probably corresponding to channel inactivation. Our approach is an abstraction of the Markov chain approach, in which the effective timescale captures the distribution among states in the chain, but the large number of states gives rise to a continuous version that is formally similar to the fractal models. The simple structure of the model allows us to extract the parameters directly from data and predict responses to novel stimuli, but does not allow for a clear biophysical interpretation of model parameters.

At the single neuron level, proposed models of slow dynamics include fractional differentiation (Lundstrom et al., 2008) and power law kernels (Pozzorini et al., 2013), among others. These models integrate incoming stimuli with a linear kernel. The specific kernel that was able to fit many aspects of their data is equivalent to a sum of exponential kernels with slow and fast timescales. Although they provide excellent predictions for the experiments for which they were developed (intracellular current injections over tens of seconds), their linear nature is unlikely to fit the experiments that we analyzed. Several aspects of our data suggest a nonlinear model: the response rate increases with stimulus variance despite the mean input rate remaining the same. This can be understood as a strong response to isolated bursts compared with periodic pulses. In the dynamic timescale model, short bursts are not long enough to increase τ and decrease the output, because τ has a certain inertia, whereas a linear model would respond maximally to periodic input. The main effect studied here, high reproducibility under scale-free input together with strong fluctuations under constant stimulus, might also be beyond linear models. Specifically, our parallel two-timescale model was not able to fit this behavior despite having more parameters than the dynamic timescale model. It might be speculated that an interaction between the different timescales is necessary and that the dynamic timescale is an abstraction of such an interaction.

Recently, a dynamically adaptive threshold model reminiscent of the dynamical timescale was suggested (Mensi et al., 2016). The nonlinear dynamical processes in that study enhanced sensitivity to input fluctuations at different levels of baseline input, preserving sensitivity across a broad range of input statistics.

A multiple timescale architecture was also suggested as a form of synaptic plasticity (Fusi et al., 2005). A cascade of hidden metaplastic states governed the actual synaptic efficacy, leading to a near optimal combination of memory retention and initial signal-to-noise ratio. Although the underlying biophysics are different, the cascade model shares some similarities with our work. Relating the metaplastic states of the synapse to the space of inactive channels in the neuron, we can draw some analogies. The decay of stimulus impact, measured by input–output covariance, follows a multiple timescale process instead of a single-timescale exponential. This decay time is, in a sense, a measure of memory retention of the stimulus. For a given memory retention value, the dynamical timescale model has a better signal-to-noise ratio than the single timescale model. Therefore, our dynamical timescale formalism might translate to the synaptic domain as well.

Our model provides a compact description of neural responses on timescales much longer than those associated with spiking. What could be the functional consequences of such slow processes? Specifically, could excitability dynamics be a form of memory? This question was addressed by another model applied to similar data, suggesting that the long-term fluctuations of neural excitability are not a signature of long memory (Soudry and Meir, 2014). That model consisted of a parallel architecture of slow and rapid processes. The response to input is mostly shaped by the fast processes, whereas the slow processes only lead to internal fluctuations. Our model bears some similarities to that analysis because the response to external perturbations is primarily governed by fast processes (*U* and τ_{0}). Nevertheless, the serial nature of our model leads to a larger effect of the slow processes on input processing. Input history could be encoded in the recovery timescale, which in turn can affect the dynamics of excitability.

Finally, part of the motivation for formulating a compact model is to enable its integration into network models. Although outside the scope of the current study, we can speculate on possible implications. Marginal stability can cause a population to develop a heterogeneous response to homogeneous input, which would lead to better coding efficiency (Marder and Goaillard, 2006). Whether synaptic coupling would cancel this heterogeneity remains to be seen (Soen and Braun, 2000). The dynamic timescale offers a “hidden depth” to the neuron, which could be a substrate for information processing of time series. This might be particularly useful for processing time series with natural statistics because they contain many timescales (Drew and Abbott, 2006; Lundstrom et al., 2008, 2010; Pozzorini et al., 2013). In addition, a powerful element of recurrent neural networks in the machine learning community is the “long short-term memory” (LSTM) unit (Hochreiter and Schmidhuber, 1997). This element is endowed with a “forget” gate (Gers et al., 2000) that can be described as inducing an effective timescale. It would be interesting to explore whether the activity-induced changes to the LSTM timescale resemble those explored in our work and how our dynamical timescale could inspire new forms of specialized units for machine learning purposes.
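The analogy to the forget gate can be made concrete: a constant forget gate value *f* multiplies the cell state at each step, so the state decays as *f*^{t}, an exponential with effective timescale τ_eff = −1/ln *f*. A minimal sketch of this reading:

```python
import numpy as np

def forget_gate_timescale(f):
    """Effective decay timescale (in steps) implied by an LSTM forget
    gate value f: the cell state decays as f**t = exp(-t / tau_eff)
    with tau_eff = -1 / log(f). A simple way to read the forget gate
    as a state-dependent timescale, analogous to the dynamic tau
    discussed above."""
    return -1.0 / np.log(f)
```

Because the gate is computed from the current input and hidden state, τ_eff changes with activity, which is the sense in which the forget gate induces a dynamic timescale.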

## Footnotes

O.B. is supported by ERC FP7 CIG 2013-618543, Fondation Adelis, and the Israel Science Foundation 346/16. We thank Naama Brenner, Shimon Marom, and Daniel Soudry for valuable comments.

The authors declare no competing financial interests.

- Correspondence should be addressed to Omri Barak, Rappaport Faculty of Medicine, Network Biology Research Laboratories, Technion, Haifa, Israel, 3200003. omri.barak@gmail.com