## Abstract

Cortical networks can maintain memories for decades despite the short lifetime of synaptic strengths. Can a neural network store long-lasting memories in unstable synapses? Here, we study the effects of ongoing spike-timing-dependent plasticity (STDP) on the stability of memory patterns stored in synapses of an attractor neural network. We show that certain classes of STDP rules can stabilize all stored memory patterns despite a short lifetime of synapses. In our model, unstructured neural noise, after passing through the recurrent network connections, carries the imprint of all memory patterns in temporal correlations. STDP, combined with these correlations, leads to reinforcement of all stored patterns, even those that are never explicitly visited. Our findings may provide the functional reason for irregular spiking displayed by cortical neurons and justify models of system memory consolidation. Therefore, we propose that irregular neural activity is the feature that helps cortical networks maintain stable connections.

## Introduction

Changing synaptic strengths is widely regarded as the mechanism by which long-term memory is encoded and stored in the brain (Martin et al., 2000). Long-term potentiation (LTP) and long-term depression (LTD) of synaptic conductances exhibit many features that make them candidates for the cellular mechanism of memory storage. By correlating presynaptic and postsynaptic activities, LTP/LTD can implement Hebbian plasticity that is at the basis of many learning and memory models (Abbott and Nelson, 2000). Because LTP is observed in many preparations, including freely behaving animals (Whitlock et al., 2006), and in many brain regions (Cooke and Bliss, 2006), it matches the description of the basic mechanism of learning and memory.

To fully satisfy the requirements for a memory mechanism, the persistence of synaptic changes induced by LTP/LTD has to be reconciled with the persistence of memory traces. Fundamentally, it is not clear whether these two timescales have to match. Although memories can be stored by the brain for dozens of years, the lifetime of LTP appears to be shorter. In hippocampal slice preparations, the persistence of synaptic changes is limited to several hours (Reymann et al., 1985), whereas in most cases *in vivo*, synaptic changes can last 4–5 weeks (Shors and Matzel, 1997; Abraham, 2003). In rare instances, synaptic changes can last for approximately a year; however, these examples require special conditions (Abraham et al., 2002). This occurs despite the observation that at least some components of consolidated long-term memory can be attributed to the hippocampal complex (Nadel and Moscovitch, 1997, 2001). In the cortex, LTP has not been demonstrated to last beyond a period of several weeks (Trepel and Racine, 1998; Ivanco and Racine, 2000). Although changes in structural connectivity can persist for more than a month (Grutzendler et al., 2002; Trachtenberg et al., 2002; Knott et al., 2006; Alvarez and Sabatini, 2007; Fu and Zuo, 2011), they may reflect ongoing changes in sensory inputs rather than carry memory traces. The same applies to other examples of cortical plasticity observed after sensory deprivation (Feldman, 2009). Whether synaptic strengths can persist throughout the lifetime is an open question. Cascade synaptic models (Fusi et al., 2005), for example, propose that individual synapses contain long-lasting internal states that are not directly related to synaptic strength. An alternative explanation is that robust long-term memories can somehow be maintained for decades without the requirement of stable synapses. This hypothesis is investigated here.

Here, we propose a mechanism for persistent memory storage that uses short-lived synapses. In our model, long-term memory can be stored in the network for a very long time despite a short time constant of LTP persistence. To be preserved, memory states do not have to be revisited. We analyze a simple mathematical model of an attractor neural network, which includes several realistic elements such as stochastic neural noise, short synaptic lifetime, and ongoing synaptic plasticity described by spike-timing-dependent plasticity (STDP). The network activity resides near a set of states that represent activity relevant to its current environment. The average activity of neurons samples only these current memory states. However, we demonstrate that, because of the presence of noise, the correlations in neural activity carry imprints of all memory traces, including old ones. These correlations, under carefully chosen conditions, can allow the old traces to be rehearsed and maintained by the network even though they are not explicitly visited. We thus propose that old memory states can be reinforced by rehearsal, even though these memories are never visited or accessed. Because our rehearsal mechanism does not involve explicit reactivation of old memories, we call the proposed mechanism “implicit rehearsal.” We show that for implicit rehearsal to be effective, STDP rules must satisfy certain strict conditions. This mechanism works with the antisymmetric STDP that is often observed (Bi and Poo, 2001; Froemke and Dan, 2002), but does not work with the symmetric, non-negative form of LTP. We therefore show that neural noise combined with synaptic plasticity can lead to stability of old memory traces despite individual synapses being unstable. Our model has experimentally testable predictions.

## Materials and Methods

##### Description of the model.

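Let *N* be the total number of neurons. We will assume that there are *p* patterns represented by *N*-dimensional vectors, the elements of which are ±1:

$$\mathbf{p}^{a} = \left(p_{1}^{a}, \ldots, p_{N}^{a}\right), \qquad p_{i}^{a} = \pm 1, \qquad a = 1, \ldots, p \tag{1}$$

Different patterns are chosen to be orthogonal to each other as follows:

$$\mathbf{p}^{a} \cdot \mathbf{p}^{b} = N\,\delta_{ab} \tag{2}$$

Equation 2 allows us to define *p* projection operators as follows:

$$P_{ij}^{a} = \frac{1}{N}\, p_{i}^{a}\, p_{j}^{a} \tag{3}$$

When projection operator number *a* is applied to a given activity vector, it yields the component of that activity along pattern *a*.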
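As a minimal numerical sketch of these definitions, the following Python snippet builds mutually orthogonal ±1 patterns and checks that the associated operators behave as projectors. Taking the patterns from the rows of a Sylvester-type Hadamard matrix is an illustrative choice, not something stated in the text, and the 1/*N* normalization and the helper names `hadamard_patterns` and `projector` are assumptions.

```python
import numpy as np

def hadamard_patterns(n_neurons, n_patterns):
    """Return n_patterns mutually orthogonal +/-1 vectors of length n_neurons.

    Assumption: n_neurons is a power of two, so that a Sylvester-type
    Hadamard matrix of that size exists.
    """
    h = np.array([[1]])
    while h.shape[0] < n_neurons:
        # Sylvester doubling step: [[H, H], [H, -H]]
        h = np.kron(np.array([[1, 1], [1, -1]]), h)
    if h.shape[0] != n_neurons:
        raise ValueError("n_neurons must be a power of two")
    # Row 0 is all ones; skip it so that every pattern is balanced.
    return h[1:n_patterns + 1].astype(float)

def projector(pattern):
    """Projection operator onto one pattern, normalized by the number of neurons."""
    return np.outer(pattern, pattern) / pattern.size

patterns = hadamard_patterns(16, 3)
P = [projector(p) for p in patterns]

# Overlaps p^a . p^b should equal N on the diagonal and 0 elsewhere.
print(patterns @ patterns.T)
# A projector applied twice gives the same result as applying it once.
print(np.allclose(P[0] @ P[0], P[0]))
```

Applying `P[a]` to any activity vector returns the part of that vector lying along pattern *a*, which is the operation used later to isolate the fluctuations associated with each stored memory.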

Memories about the patterns are stored in a synaptic weight matrix using the conventional learning rule (Dayan and Abbott, 2001), as follows:

$$W_{ij}(t) = \frac{1}{N}\sum_{a=1}^{p} c_{a}(t)\, p_{i}^{a}\, p_{j}^{a} \tag{4}$$

where the set of coefficients, *c*_{a}(*t*), represents the strengths of individual patterns.
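The role of the coefficients can be illustrated with a short sketch. It assumes the Hopfield-like form *W*_{ij} = (1/*N*) Σ_{a} *c*_{a} *p*_{i}^{a} *p*_{j}^{a} with exactly orthogonal patterns; the three hand-built patterns, the strength values, and the function name `weight_matrix` are hypothetical.

```python
import numpy as np

def weight_matrix(patterns, strengths):
    """Build W_ij = (1/N) * sum_a c_a p_i^a p_j^a (assumed form of the learning rule)."""
    n_neurons = patterns.shape[1]
    return (patterns.T * strengths) @ patterns / n_neurons

# Three mutually orthogonal +/-1 patterns on N = 8 neurons (illustrative).
p1 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
p2 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
p3 = p1 * p2
patterns = np.stack([p1, p2, p3])
strengths = np.array([1.5, 0.8, 0.3])  # hypothetical values of c_a

W = weight_matrix(patterns, strengths)

# Each pattern is an eigenvector of W, and its eigenvalue is its strength c_a.
print(np.allclose(W @ p1, strengths[0] * p1))
print(np.linalg.eigvalsh(W))
```

Under these assumptions the nonzero eigenvalues of **W** are the pattern strengths, so tracking *c*_{a}(*t*) amounts to tracking how the spectrum of the weight matrix changes under plasticity.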

In our model, the input current and firing rate of neuron *i*, *u*_{i}(*t*) and *f*_{i}(*t*), are related through the activation function *F* as follows:

$$f_{i}(t) = F\left(u_{i}(t)\right) \tag{5}$$

Input currents are described by the following equation:

In Equation 6, τ is a constant that determines how rapidly the current varies and ξ(*t*) is the Gaussian random white noise. We assume that ξ(*t*) has the following properties:

$$\langle \xi(t) \rangle = 0, \qquad \langle \xi(t)\,\xi(t') \rangle = \xi^{2}\,\delta(t - t') \tag{7}$$

where 〈…〉 denotes the average over the noise ensemble. Subsequently, we assume that the amplitude of noise, ξ, is very small, and we use this fact along with the short timescale of ξ to treat noise as a perturbation.
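To make the role of τ and ξ concrete, the sketch below integrates a noisy rate equation with the Euler–Maruyama method. The form of the dynamics, τ d*u*/d*t* = −*u* + **W***F*(*u*) + ξ(*t*), is an assumption chosen to be consistent with the description above rather than a quotation of Equation 6, and the choice *F* = tanh, the single-pattern weight matrix, and all parameter values are illustrative.

```python
import numpy as np

def simulate_currents(W, u0, tau=5.0, xi=0.0, dt=0.1, n_steps=5000, seed=0):
    """Euler-Maruyama integration of tau * du/dt = -u + W @ F(u) + noise.

    Assumptions: F = tanh, and the noise is white with intensity xi**2,
    independent for every neuron. Times are in milliseconds.
    """
    rng = np.random.default_rng(seed)
    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        drift = (-u + W @ np.tanh(u)) / tau
        # White noise of intensity xi**2 contributes xi * sqrt(dt) / tau per step.
        noise = xi * np.sqrt(dt) / tau * rng.standard_normal(u.size)
        u = u + dt * drift + noise
    return u

n = 16
pattern = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
W = 2.0 * np.outer(pattern, pattern) / n  # one stored pattern with strength 2

# Without noise, the currents settle on the attractor aligned with the pattern.
u_final = simulate_currents(W, u0=pattern)
overlap = np.tanh(u_final) @ pattern / n
print(overlap)

# With weak noise, the currents fluctuate around the same attractor.
u_noisy = simulate_currents(W, u0=pattern, xi=0.5)
print(np.tanh(u_noisy) @ pattern / n)
```

In the noise-free run the overlap converges to the solution of *b* = tanh(2*b*), the self-consistent firing amplitude for this toy weight matrix.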

Plasticity in the network is defined by spike-timing-dependent learning rules. For a pair of cells, the strength of synapses is updated with a rate of update that is dependent on presynaptic and postsynaptic activities as follows:
Here, *f*_{i}(*t*) is the firing rate of neuron number *i* at time *t*, and γ is the learning rate. Three terms in the r.h.s. of this equation describe the decay of synaptic strength with time and the modification due to presynaptic and postsynaptic firing, respectively. The relationship between learning in the nonstationary rate-based model (Equation 8) and pairwise STDP in spiking models has been studied by Kempter et al. (1999). The STDP kernel, *K*(Δ*t*), in our model contains two components, short-range and long-range: *K*(Δ*t*) = *K*_{s}(Δ*t*) + *K*_{l}(Δ*t*). We assume that the short-range component varies within the timescale of several hundred milliseconds, which is defined as follows:

The long-range STDP kernel, *K*_{l}(Δ*t*), is needed in our model to constrain the overall magnitude of firing rates and can originate from metabolic and other constraints. We assume that it varies very slowly on the timescales of the order of hours or more. The only constraint on *K*_{l}(Δ*t*) that is important in our model is that it makes the integral of the entire STDP kernel over time positive (i.e., ∫*K*(Δ*t*) *d*Δ*t* > 0).

The STDP rule (Equation 8) can also be rewritten in an equivalent integral form as follows:
This equation shows that τ_{0} defines the forgetting time constant. The old memory is expected to decay after this time with the exception of memory that is rehearsed (i.e., relearned within the timescale τ_{0}). This rehearsal process is the topic of our present study.
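For intuition about the short-range kernel, the sketch below evaluates one commonly used antisymmetric exponential form and compares its numerical time integral with the closed-form value. The functional form, the sign convention, and the amplitudes `a_plus` and `a_minus` are assumptions made for illustration; only the time constants τ_{+} = 50 ms and τ_{−} = 100 ms are taken from the simulation parameters reported in this paper.

```python
import numpy as np

def short_range_kernel(dt_ms, a_plus=1.0, a_minus=0.6, tau_plus=50.0, tau_minus=100.0):
    """Assumed kernel: potentiation for dt >= 0, depression for dt < 0."""
    dt_ms = np.asarray(dt_ms, dtype=float)
    potentiation = a_plus * np.exp(-np.abs(dt_ms) / tau_plus)
    depression = -a_minus * np.exp(-np.abs(dt_ms) / tau_minus)
    return np.where(dt_ms >= 0, potentiation, depression)

def kernel_integral(a_plus=1.0, a_minus=0.6, tau_plus=50.0, tau_minus=100.0):
    """Closed-form integral of the assumed kernel over all time lags."""
    return a_plus * tau_plus - a_minus * tau_minus

# Trapezoidal integration on a fine grid that extends far beyond both time constants.
grid = np.linspace(-2000.0, 2000.0, 400001)
values = short_range_kernel(grid)
numeric = np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0

print(numeric, kernel_integral())
```

With these hypothetical amplitudes the short-range integral is negative, so in this toy setting a slowly varying positive contribution of the kind attributed to *K*_{l} would be required for the integral of the full kernel to be positive.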

Due to random noise, ξ(*t*), the input currents fluctuate near constant values: **u**(*t*) = **u** + δ**u**(*t*). We assume that fluctuations are weak and can be treated as small perturbations (see the discussion after Equation 15 for the justification of this assumption). Therefore, by Equation 5, the firing rates fluctuate around stationary rates as follows:

$$f_{i}(t) = f_{i} + g\,\delta u_{i}(t) \tag{11}$$

where *f*_{i} = *F*(*u*_{i}) and *g* = *F*′(*u*_{i}). Because neural noise is short range, its timescales are measured in milliseconds and we can decompose the dynamics of the system into two components: the fast-changing component, associated with noise, and the slowly varying component, determined by Hebbian learning. These two components are represented by the two terms in Equation 11. The equation for the fast-changing component is as follows:
Our goal is to derive the contribution of the fast-changing component (i.e., noise) to the slowly varying component. This interplay could be interpreted as rehearsal. In the subsequent discussion, we treat noise as a small perturbation to the firing rates of the network; that is, we will assume that ξ^{2} is small. The detailed conditions for the validity of this approximation can be found in the subsection below titled “Validity of approximations made in this study.”

For the given set of network weights, we assume that there are two types of attractors. One set of attractors is never visited by the network. We call these states implicit. The other set of attractors is explicitly visited by the system. Because our main goal is to consider the dynamics of implicit attractors (i.e., ones that are never visited), for simplicity, we assume that the explicit attractors are represented by only one attractor. Here, we discuss briefly the explicit attractor state and its stability with respect to learning.

Let us assume that the explicit attractor has an index *a* = 1. The stationary firing rates associated with this state are proportional to the pattern **p**^{1}; that is, **f** = *b***p**^{1}. Here, *b* is a constant determined by the function *F* (we assume that *F*(*x*) = −*F*(−*x*); e.g., *F* is the sigmoid function). Assuming that the effects of noise are negligible, one can obtain the stationary value of the weight matrix that results from the explicit attractor. This contribution is present because the explicit attractor relearns itself through the STDP rules, and it could be viewed as resulting from explicit rehearsal (i.e., rehearsal of patterns that are currently in working memory). For this type of rehearsal, noise is not necessary. Equation 10 allows us to determine this component of the weight matrix as follows:
where δ*W*_{ij} =

*b* is determined by the self-consistent equation as follows:

Note that Equation 14 depends only on the total integral of the kernel. The temporal details of STDP do not affect the equation. According to Equation 10, the component δ*W* → 0 in the network without noise. Our goal here is to determine the behavior of δ*W* (i.e., the contribution of implicit patterns that are never explicitly visited) in the network with noise.

Although the long-range kernel *K*_{l}(*t*) enters our model, we will not discuss this kernel further. From this point on, by STDP kernel we mean the short-range kernel, *K*_{s}(*t*), which varies on the timescales of hundreds of milliseconds and is usually measured in LTP/LTD experiments (Abbott and Nelson, 2000).

According to Equation 10, without noise, the weight matrix contains only pattern *P*^{1} with stationary strength *c*_{1}. All other coefficients *c*_{a≠1} are zero. With Gaussian white noise included in Equation 6, other patterns (implicit) are represented in the activity of the network. Therefore, these patterns are present in the synaptic weight matrix δ*W*(*t*). Our goal is to find how coefficients *c*_{a≠1}(*t*) evolve with time in this case.

To accomplish this goal, we consider noise a small perturbation and use perturbation theory with the amplitude of noise, ξ^{2}, as a small parameter. The noise-induced fluctuation of the input currents, δ**u**(*t*), in the direction of pattern number *a* is δ**u**^{a}(*t*) = **P**^{a} · δ**u**(*t*), and the corresponding component of random inputs is **ξ**^{a}(*t*) = **P**^{a} · **ξ**(*t*). By applying projector **P**^{a}, defined in Equation 3, to both sides of Equation 12, and because *c*_{a} changes much slower than δ**u**(*t*), we find the following:

Next, we derive the condition for the validity of perturbation theory (i.e., the amplitude of fluctuations due to noise is smaller than the zeroth-order solution obtained without noise). By choosing ξ to be small, we can make δ*u*_{i} much smaller than *u*_{i}, which allows us to treat noise as a perturbation in Equation 11. We give the detailed conditions for the validity of perturbation calculations in the subsection below titled “Validity of approximations made in this study” (Equation 30).
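The projection step can be written out under an assumed form of the linearized dynamics. Suppose the fast component obeys τ *d*δ**u**/*dt* = −δ**u** + *g***W**δ**u** + **ξ**(*t*), with a gain *g* common to all neurons, and that the weight matrix is a sum of projectors weighted by the strengths *c*_{b}. Both are assumptions consistent with the definitions above, not quotations of the numbered equations. Because the patterns are orthogonal, **P**^{a}**P**^{b} = δ_{ab}**P**^{a}, and applying **P**^{a} to both sides gives:

```latex
\tau \frac{d\,\delta\mathbf{u}^{a}}{dt}
  = -\,\delta\mathbf{u}^{a}
    + g \sum_{b} c_{b}\, \mathbf{P}^{a}\mathbf{P}^{b}\, \delta\mathbf{u}
    + \boldsymbol{\xi}^{a}(t)
  = -\left(1 - g\,c_{a}\right) \delta\mathbf{u}^{a} + \boldsymbol{\xi}^{a}(t)
```

In this sketch each pattern direction relaxes independently at the rate (1 − *gc*_{a})/τ, so a stronger pattern slows the relaxation and lets noise-driven fluctuations along that pattern grow larger. The rate vanishes at *c*_{a} = 1/*g*, where the linearized description breaks down.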

By Equations 3 and 7, we have the following:
Using Equations 7 and 15, we obtain the average correlation function of fluctuations as follows:
Substituting Equations 11 and 13 into Equation 8 and using the correlation function in Equation 17, the fluctuating part of Equation 8 gives the equations that describe the dynamics of “unused” components of the weight matrix (*a* = 2, …, *p*):

Coefficients *c*_{a}(*t*) are defined in Equations 46 and 4. Here, we defined the parameters of the short-range STDP kernel as follows:

Equation 18 describes how the strength of each unused pattern's representation in the network weights changes over time when neurons receive random noise. This equation is the main result of this study. The dependence of the right side of Equation 18 on *c*_{a} is shown in Figures 2*B* and 4*B*. It is evident from Equation 18 that *c*_{a}(*t*) = 1/*g* is a critical value at which the equation for *c*_{a}(*t*) becomes singular.

The equation of evolution for the first (explicit) pattern is as follows:
where *c*_{1} satisfies the self-consistency equation *F*[*bc*_{1}] = *b* and *h* = *F*″(*u*_{i}) is the second derivative of the activation function.

##### Firing rate model simulation.

In the simulations, we construct a number of random patterns such that their elements are independent and take the value +1 or −1 with equal probability. These patterns are the memories that we store in the network. From these patterns, we choose an arbitrary one to be the explicit pattern that is stored in the network and constantly visited. Network parameters are chosen according to the bistability conditions given in the Results section. The explicit form of the activation function *F* is not important because all we need is its first-order derivative and its value at a specific point; for example, we can choose it to be a power function.
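Random patterns of this kind are only approximately orthogonal, with overlaps that shrink as the network grows. The sketch below illustrates this; the network size, the number of patterns, and the random seed are arbitrary choices, not the values used in the simulations reported here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so that the example is reproducible
n_neurons, n_patterns = 1000, 5

# Independent elements taking the value +1 or -1 with equal probability.
patterns = rng.choice([-1.0, 1.0], size=(n_patterns, n_neurons))

# Normalized overlaps: exactly 1 on the diagonal, of order 1/sqrt(N) elsewhere.
overlaps = patterns @ patterns.T / n_neurons
off_diagonal = overlaps[~np.eye(n_patterns, dtype=bool)]

print(np.round(overlaps, 3))
print("largest cross-overlap:", np.abs(off_diagonal).max())
```

For *N* = 1000 the cross-overlaps have a standard deviation of about 1/√*N* ≈ 0.03, so the exact orthogonality assumed in the analysis holds only approximately in a simulation of this type.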

At each time step, we first generate random Gaussian white noise for each neuron. Then, using Equation 12, we calculate the changes to the fluctuating part of the input current, δ**u**(*t*), due to noise. After updating δ**u**(*t*), we use Equation 11 to calculate the new firing rates **f**(*t*). Synaptic weights are updated according to Equation 8 but without averaging over noise.

The simulation consists of two phases. In the first phase, we prepare the network. We start with a synaptic weight matrix that contains only the explicit pattern with an arbitrary strength. Then, at each step, the weight matrix **W**(*t*) is updated according to Equation 8, and we stop when it stabilizes. The strength of the explicit pattern can also be read from the weight matrix. In the second phase, we test the results presented in the main text. First, we introduce the implicit pattern(s) to the network. Then, we let the network evolve according to Equation 8 and record the strength of the patterns at each time step. We found that if the initial strength of the implicit pattern is set below a certain value (the transition point in Fig. 4), the implicit pattern decays with time. Conversely, if we set the initial strength of the implicit pattern above the transition point, then the implicit pattern is kept in the network for a long time. In this case, the strength of the implicit pattern fluctuates around a certain value (the second stable point in Fig. 4) above the transition point. In both cases, the strength of the explicit pattern never decays but fluctuates around its initial value, which is found when we prepare the network in the first phase. These observations are in good agreement with our model prediction (i.e., Equation 8 and Figure 4). As another test, we turned off the noise and found that the implicit pattern always decays. This also agrees with our analysis in the main text.
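Reading a pattern strength from the weight matrix can be sketched as a projection. The snippet assumes the Hopfield-like form *W*_{ij} = (1/*N*) Σ_{a} *c*_{a} *p*_{i}^{a} *p*_{j}^{a} with orthogonal patterns, in which case *c*_{a} = **p**^{a} · **W** **p**^{a} / *N*. The function name `pattern_strength` and the example values are hypothetical.

```python
import numpy as np

def pattern_strength(W, pattern):
    """Estimate c_a as p . (W p) / N, assuming W = (1/N) * sum_a c_a p^a (p^a)^T."""
    return pattern @ W @ pattern / pattern.size

# Two orthogonal +/-1 patterns on N = 8 neurons (illustrative).
explicit = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
implicit = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
n = explicit.size

# Weight matrix holding the explicit pattern at strength 1.2 and the implicit one at 0.4.
W = (1.2 * np.outer(explicit, explicit) + 0.4 * np.outer(implicit, implicit)) / n

print(pattern_strength(W, explicit))
print(pattern_strength(W, implicit))
```

For random rather than exactly orthogonal patterns, the same projection gives the strength only up to cross-talk terms of order 1/√*N*.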

In the simulations, we find that increasing τ_{0} improves the performance of our model; that is, the implicit patterns are maintained for a longer time. This is in agreement with the analysis in the subsection below titled “Validity of approximations made in this study”; that is, in this limit, the mean field approximation (MFA) works better and therefore the strength of the implicit pattern is kept at the second stable point. However, to keep the running time of simulations feasible, we could not have τ_{0} too large. In the simulations, we chose τ = 5 ms, τ_{+} = 50 ms, τ_{−} = 100 ms, and τ_{0} = 2 × 10^{5} ms. Other parameters are indicated in the captions of appropriate figures.

##### Validity of approximations made in this study.

To analyze the behavior of our model, we relied primarily on analytical calculations. This method yields results in closed form that can be understood without the use of a computer. Therefore, our main result, Equation 18, describes the learning dynamics of the implicit component of memory and can be analyzed for various sets of parameters without computer simulations. The advantage of this method is that the dynamics of network weights can be understood without limits on the network size or on the parameters used. To obtain these results, however, some approximations had to be made. Below, we derive the conditions under which our approximations can be considered valid and their effects are under control. Briefly, we assumed that the amplitude of fluctuations induced by noise is small, which allowed us to use an approximation called perturbation theory (Equation 11). The effects of noise on the network weights can still be large, however, because they are accumulated over time. Below, in this section of the Materials and Methods (Equation 30), we show that the parameters of the model can be chosen so that both perturbation theory is valid and bistability of network weights exists, as described in Figure 4. We show, for example, that perturbation theory is valid for large firing rates, large neuronal gains, or large STDP time windows τ_{+} and τ_{−}. The second approximation that we used is the MFA (Equation 8). MFA is often used in network theory and has allowed the derivation of many important results, such as the memory capacity of the Hopfield model (Hertz et al., 1991). In the context of our model, MFA means that the instantaneous values of activity correlations entering STDP rules can be replaced by their average values. Below, we derive the conditions under which MFA is valid by analyzing the effects of relaxing this assumption. We show that MFA is accurate if the STDP time windows, τ_{+} and τ_{−}, are substantially smaller than the synaptic strength lifetime, τ_{0} (Equation 31). Because the former set of timescales is approximately hundreds of milliseconds, whereas the latter is measured in weeks, this condition appears to hold well in reality, thus motivating the use of MFA in our calculations. This comparison also discloses the challenges faced by realistic computer simulations in this setting. Because computer models have to integrate both millisecond neuronal timescales and long-term behaviors of the network lasting years, such simulations are challenging even for modern computers, especially because network size has to be kept large. Despite these challenges, we succeeded in reproducing computationally the predicted behavior of networks in keeping implicit memory states stable (Figs. 7, 8, 9). Cortical networks, however, can easily overcome these challenges due to their inherent parallelism and can access the range of parameters that is otherwise available only in our analytical calculations.

##### Validity of perturbation theory approximation.

In our study, we considered fluctuations induced by noise to be small (Equation 11). This means that noise was considered a small perturbation; that is, within a perturbation theory. In this section, we discuss the validity of this approximation. Although we used this approximation (perturbation theory) to solve equations of our model, our mechanism may take place even when the equations cannot be solved using this method.

More precisely, smallness of the amplitude of noise was needed when we used Taylor expansion around the value *u*_{i} in Equation 11. This equation does not include the second-order term *F*″(*u*_{i})δ*u*_{i}^{2}/2. This approximation is valid if:

where *b* is given by Equation 14. Because δ*u*_{i} ∼ ξ_{i}, this condition imposes a constraint on the noise amplitude ξ. To see this, solving Equation 12, we get the following:

From this and Equation 8, we find the following:

As follows from this equation, this quantity averaged over neurons is as follows:

Here, *c*_{i} is the *i*-th eigenvalue of matrix **W**. Because there is only a small number of patterns with finite corresponding *c*_{i}, and most of the eigenvalues *c*_{i} are close to zero, we have the following:

Combining this with Equation 22, we find the condition for perturbation calculation to be valid is as follows:

The amplitude of noise is therefore limited by ξ^{2} ≪ τ(*F*′(*u*)/*F*″(*u*))^{2}_{u = F^{−1}(b)} ∼ τ*u*^{2} ∼ τ*f*^{2}/*g*^{2}. Here, *u* is a typical value of membrane voltage and *f* is the typical value of the firing rates. Therefore, the levels of noise have to be sufficiently low for the perturbation theory analysis to be valid.
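As a worked illustration of the upper limit τ(*F*′(*u*)/*F*″(*u*))^{2} evaluated at *u* = *F*^{−1}(*b*), the sketch below computes it for a hyperbolic tangent activation. The choice *F* = tanh is an assumption made only for this example (the text requires only an odd, sigmoid-like function), and the values of *b* and τ are illustrative.

```python
import numpy as np

def noise_bound(b, tau):
    """Return tau * (F'(u) / F''(u))**2 at u = F^{-1}(b), assuming F = tanh."""
    u = np.arctanh(b)
    first = 1.0 - np.tanh(u) ** 2        # F'(u) for F = tanh
    second = -2.0 * np.tanh(u) * first   # F''(u) for F = tanh
    return tau * (first / second) ** 2

# For tanh the ratio F'/F'' equals -1/(2b), so the bound reduces to tau / (4 * b**2).
bound = noise_bound(b=0.5, tau=5.0)
print(bound)
```

In this example the bound equals τ/(4*b*^{2}), which tightens as the stationary firing amplitude *b* approaches saturation. The noise intensity ξ^{2} has to stay well below this value for the first-order expansion to hold.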

For bistability, we need the value of noise to be larger than a certain threshold. The detailed conditions for this criterion are described in the section titled “Conditions of bistability.” Therefore, our analysis can be used when the level of noise is big enough for the bistability to exist and small enough for the Taylor expansion in Equation 11 to be valid. Can such a regime exist? Here, we provide a simple estimate for the existence of such a window of parameters. The perturbation theory is valid if noise is weak; that is:
As follows from the discussion in this study, the bistability exists if, loosely speaking, the learning rate is sufficiently strong; that is:
Both conditions can be satisfied, if the amplitude of noise ξ^{2} lies within the range defined as follows:
This range exists if the boundaries of the range differ in the correct direction; that is:
which implies the following:
Therefore, if the learning rate γ*A*_{±} is sufficiently big, the perturbation theory analysis (Taylor series expansion in Equation 11) is valid and the bistability necessary for our mechanism exists. This is possible because the firing rate equations and, consequently, the Taylor series expansion do not depend on the learning rates. Therefore, learning rates can be used as an independent parameter to reach the conditions of bistability. In addition, the effects of noise can be big even though we assume weak noise in the perturbation theory. This is because our assumption of the weakness of noise concerns only the validity of the Taylor series expansion. Therefore, the overall impact of noise can be substantial despite its small amplitude.

##### Validity of the MFA.

In this section, we show that the MFA calculations presented above in Materials and Methods are justified. In Equations 8 and 10, we assumed that the learning rates are determined by the averages of the firing rates over the ensemble of noise. In reality, these equations should be used without such averaging. To derive our results, we therefore used an approximation that could be called the MFA. Under what condition can the instantaneous values of the pairwise products of the firing rates be replaced by their correlations? Below, we will show that this condition is determined by the timescale of synaptic modifications. In particular, it is determined by the time constant of synaptic decay τ_{0}. It is this timescale that determines the duration of time over which the firing rates are averaged in Equations 8 and 10. We will show that when the duration of STDP learning kernels τ_{±} (Equation 9) is much smaller than the forgetting timescale, that is, τ_{±} ≪ τ_{0}, the MFA can accurately describe the behavior of the network. Because, in reality, the STDP learning kernel lasts ∼100 ms whereas the forgetting timescale extends over several weeks, τ_{0} ∼ 10^{9} ms, the variance of the deviations from the mean field values is small, as follows:
This estimate argues that the MFA is a valid method.

To derive the condition in Equation 31, we start from Equation 8. Without averaging of noise, Equation 8 has the following form:
Let *c*_{a}(*t*) be the strength of implicit pattern *a*. In the main text, where we used MFA analysis, the equation for *c*_{a} is given in Equation 18. The quantity described by that equation will be called *c*_{MF}(*t*). Here, we are interested in the difference between the mean field result and the result without averaging. To make notations simpler, we will omit the subscript *a* in the remaining part of this section, and it is understood that our calculation is about a certain implicit pattern *a*, the strength of which is *c*.

By projecting Equation 32 onto state *a* using operator **P**^{a}, we obtain the following:

where *A*(*t*) is given by the following:

where:

is the projection of the membrane voltage onto state *a*. This quantity can be related to a Gaussian variable describing noise as follows:

where:

Here, it is easy to see that 〈ξ(*t*)〉 = 0 and 〈ξ(*t*)ξ(*t*′)〉 = ξ^{2}δ(*t* − *t*′). It is also straightforward to show that 〈*u*(*t*)〉 = 0 and:

From Equation 18, we know that the following is true:

To determine how well we can approximate *A*(*t*) by *A*, we need to calculate the variance of *A*(*t*) as follows:

In the calculation, we use the fact, which follows from the properties of Gaussian white noise, that:

By straightforward calculations using Equations 34 and 41, we find that:

Here, *c*_{+}, *c*_{−}, and *c*_{0} are all functions of *A*_{±}, τ_{±}, τ, and *g*. To simplify the results, we define the following three variables:

With *t*_{±} and *n*, different terms in Equation 42 can be written as follows:

From our analysis, we know that, near the second stable point, *t*_{±} are both of order 1; that is, *t*_{±} ∼ *O*(1). Therefore, *c*_{+}, *c*_{−}, and *c*_{0} are all of the order of 1.

From the previous discussion, we can write *A*(*t*) = *A*_{MF} + δ*A*(*t*), such that 〈δ*A*(*t*)〉 = 0. To estimate δ*A*(*t*), note that τ is approximately a few milliseconds, τ_{±} are approximately a few hundred milliseconds, and τ_{0} is approximately a few weeks; that is, τ ≪ τ_{±} ≪ τ_{0}. By Equation 42, we have the following:

We can now write *c*(*t*) = *c*_{MF}(*t*) + δ*c*(*t*) such that τ_{0} *dc*_{MF}/*dt* = −*c*_{MF} + *A*_{MF} and τ_{0} *d*δ*c*/*dt* = −δ*c*(*t*) + δ*A*(*t*). The first equation leads to the mean-field solution that is presented in the main text. The second equation describes the fluctuations around the mean field results. Solving the second equation, we get the following:

From Equation 43, we find the following:

from which Equation 31 follows directly. If we choose the synaptic decay time τ_{0} to be 2 weeks, then τ_{0} ∼ 10^{9} ms, and the STDP window τ_{±} is a few hundred milliseconds; for example, τ_{±} ∼ 100 ms. By Equation 45, we have δ*c*/*c*_{MF} ∼ 10^{−3}; that is, the correction δ*c* to the mean field solution *A*_{MF} is very small. This proves that we can approximate *c*(*t*) with *c*_{MF}(*t*) and that the MFA calculations above (e.g., Equation 18) are indeed valid approximations.
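The separation of timescales invoked here can be checked with simple arithmetic. The sketch below converts 2 weeks to milliseconds and compares the ratio τ_{±}/τ_{0} for the biological estimate with the ratio for the simulation parameters reported above (τ_{−} = 100 ms, τ_{0} = 2 × 10^{5} ms). It illustrates only the size of these ratios and does not reproduce Equation 31 or Equation 45. The variable names are hypothetical.

```python
MS_PER_WEEK = 7 * 24 * 3600 * 1000

# Biological estimate used in the text: a decay time of 2 weeks and a ~100 ms STDP window.
tau_0_bio_ms = 2 * MS_PER_WEEK
tau_stdp_ms = 100.0
ratio_bio = tau_stdp_ms / tau_0_bio_ms

# Simulation parameters reported in this paper: tau_0 = 2e5 ms, tau_minus = 100 ms.
tau_0_sim_ms = 2e5
ratio_sim = tau_stdp_ms / tau_0_sim_ms

print(tau_0_bio_ms)   # about 1.2e9 ms, i.e. of the order of 10^9 ms
print(ratio_bio)      # of the order of 10^-7
print(ratio_sim)      # 5e-4, a much weaker separation than the biological estimate
```

The comparison shows why the mean-field description is expected to be more accurate for realistic decay times than for the shortened decay time that keeps simulations feasible.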

## Results

### Patterns of neural activity stored in network weights correspond to network attractors

In this study, we analyze attractor neural networks with features similar to the continuous Hopfield model (Hopfield, 1984; Hertz et al., 1991). Such networks can exhibit two types of memory: long-term, contained in the recurrent network weights, and working memory, contained in the firing rates of neurons (Amit, 1989).

The network can store the long-term memory of a set of patterns in prespecified network weights (Hertz et al., 1991). If the activity of neuron number *i* that is associated with pattern number *a* is *p*_{i}^{a}, then, as within the conventional Hopfield model, the connection strength between two neurons *i* and *j* is given by the Hebbian-like learning rule:

$$W_{ij} = \frac{1}{N}\sum_{a=1}^{p} c_{a}\, p_{i}^{a}\, p_{j}^{a} \tag{46}$$

This means that the more patterns in which a given pair of neurons is coactive, the stronger the connection between these neurons. Here, *N* and *p* denote the total number of neurons and the number of stored patterns, respectively. We also introduced a set of coefficients, *c*_{a}, that describe how strongly a given pattern is included in the network connections. In the standard Hopfield model, these coefficients are initialized and remain equal to one. In this study, these coefficients are affected by the ongoing activity in the network. The goal of our study is to understand the long-term behavior of the strengths of the patterns *c*_{a}(*t*) that result from ongoing learning.

In this network, patterns that are embedded in the recurrent weights, according to Equation 46, become network attractors (Hopfield, 1982, 1984). This means that if the activity of neurons matches one of the patterns at some moment in time, the pattern will be maintained by recurrent connections despite small perturbations and noise. Because several patterns are simultaneously embedded in the weights, the network may have several stable attractor states provided that the number of patterns, *p*, is not too large (Hertz et al., 1991). Because network firing rates can persist only near an attractor, in the absence of external inputs, the network must choose where to reside. This decision can be viewed as an implementation of short-term memory of the “which attractor I am near” kind. Therefore, Hopfield nets can support both short (working) and long-term memory (Bird and Burgess, 2008; Cowan, 2008) (Fig. 1).
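The attractor property can be illustrated with a small recall experiment. The sketch below uses a discrete-time map **f** ← tanh(*g***Wf**) as a simplified stand-in for the continuous dynamics, with exactly orthogonal patterns and all strengths set to one. The update rule, the gain value, and the helper name `sylvester_hadamard` are assumptions made for illustration.

```python
import numpy as np

def sylvester_hadamard(n):
    """Sylvester-type Hadamard matrix of size n (n must be a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), h)
    return h

n = 64
patterns = sylvester_hadamard(n)[1:4]   # three orthogonal +/-1 patterns
W = patterns.T @ patterns / n           # Hopfield-like weights with c_a = 1

# Corrupt the first pattern by flipping 8 of its 64 elements.
cue = patterns[0].copy()
cue[:8] *= -1

# Iterate the rate map; the gain of 4 is an illustrative choice.
f = cue
for _ in range(20):
    f = np.tanh(4.0 * (W @ f))

recovered = np.sign(f)
print(np.array_equal(recovered, patterns[0]))
```

Starting from the corrupted cue, the activity relaxes back to the stored pattern, which is the sense in which the network "chooses" an attractor to reside near.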

### STDP rules applied to the attractor neural network lead to the deterioration of stored memories

What is the effect of synaptic plasticity on the attractors that are embedded into the network? From the point of view of plasticity, it is important to distinguish two types of attractors. First, there are attractors that represent memories that the network is constantly visiting. For example, because of external stimuli, the network can hop around states that are relevant to the particular task or environment. These attractor states represent recent memory. More precisely, recent states are defined as those that are visited within the time constant of synaptic decay. We call states of this type explicit attractors. The other type of state represents memories that were embedded into the network a long time ago and have not been accessed recently. These states will be called implicit (Fig. 1). Note that our terminology is somewhat different from the convention that uses the terms explicit/implicit to denote different classes of memory; that is, declarative versus procedural (Schacter, 1987).

Synaptic learning leads to different outcomes for explicit and implicit memory states. Because explicit memories are replayed in network activities, they are constantly rehearsed and therefore their contribution within the weight matrix is stable. In the Materials and Methods section, we evaluate the component of the weight matrix that carries explicit states (Equation 13). Specifically, we show that this component does not decay with time and is reinforced by learning in the network.

The behavior of implicit memories is quite different. If the attractors that correspond to implicit patterns are not visited within the time window of the decay of synaptic strength, defined in our model by parameter τ_{0}, these memory states disappear from the network weight matrix (Fig. 2). This observation is not surprising because rehearsal that reinforces the explicit attractors is not available for implicit attractor states. This is because the latter are not present in the network activity.

### Noise added to the network can implement rehearsal of old (implicit) memory states

Next, we included noise in the inputs of neurons to determine whether noise can reinforce implicit memory states. We reasoned that, if white unstructured noise were added to the input of every neuron, the activity of the network would contain implicit memory states, which may potentially stabilize old memories through the process of rehearsal. We call the process of rehearsal that is based on random noise implicit rehearsal. This process is distinct from the rehearsal of explicit states that occurs due to the network actually visiting explicit attractors.

The dynamics of implicit rehearsal are as follows. Random unstructured noise is added to the inputs of every neuron in the network. The term unstructured implies that the noise added to neurons does not contain the patterns being rehearsed. In this study, the amount of noise is assumed to be the same for all neurons for simplicity. Because neurons are connected by recurrent weights that do contain implicit patterns, when noise passes through recurrent connections, it becomes structured. This means that neural activity acquires correlations that contain implicit patterns (Equation 17). This is because the recurrent weight matrix (Equation 46) provides positive feedback along the directions of implicit states, so fluctuations along these directions are amplified by recurrent connections. Therefore, despite the network staying near explicit attractors and never visiting the implicit states, the presence of implicit states in the weight matrix shapes network fluctuations along the directions that represent old memories. Implicit memory is contained in the correlations of network activity as opposed to the explicit memory that is contained in the mean firing rate.
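The way in which unstructured noise acquires structure can be sketched numerically. The example below is a simplified illustration under stated assumptions, not the model of this study: fluctuations are treated as a linear system, `tau * dx/dt = -(I - W) x + sigma * noise`, the weight matrix holds a single stored pattern with strength `c` below 1, and the stationary covariance is taken from the standard result for a linear system driven by white noise with symmetric `W`. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
tau, sigma = 0.01, 1.0   # rate time constant (s) and noise amplitude, illustrative values
c = 0.6                  # strength of the stored pattern in W, kept below 1 for stability

# One stored (implicit) pattern and one control pattern that is absent from the weights.
stored = rng.choice([-1.0, 1.0], size=n)
control = rng.choice([-1.0, 1.0], size=n)
control -= (control @ stored) / n * stored   # make the control orthogonal to the stored pattern

w = c * np.outer(stored, stored) / n

# For symmetric W, the stationary covariance of the linear system is
# sigma^2 / (2 * tau) * (I - W)^(-1).
cov = sigma**2 / (2 * tau) * np.linalg.inv(np.eye(n) - w)

def variance_along(direction):
    """Variance of the fluctuations projected onto a unit vector along `direction`."""
    u = direction / np.linalg.norm(direction)
    return u @ cov @ u

amplification = variance_along(stored) / variance_along(control)
print(f"variance along stored pattern / control: {amplification:.2f}")  # equals 1 / (1 - c)
```

Although the input noise is identical for all neurons, the variance along the stored direction exceeds that along the control direction by the factor 1/(1 − c) in this linear sketch, which is the sense in which correlations carry the imprint of a pattern that is never visited.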

### Non-negative symmetric STDP rules applied to the network with white noise do not stabilize old (implicit) memory

What is the effect of implicit rehearsal in the case when STDP rules are ongoing in the network? For an STDP learning rule, a change in synaptic efficacy is dependent on the relative timing of presynaptic and postsynaptic action potentials for every synapse. In the simplest case, the synapse becomes stronger both when presynaptic spikes precede the action potential in the postsynaptic neuron and, in the opposite case, when postsynaptic spikes precede presynaptic spikes. We call this form of learning rule symmetric non-negative (Fig. 3). Our results show that this type of learning rule, applied in the presence of neural noise, does not make unused (implicit) memory more stable. The contribution of a memory state to the weight matrix is determined by the coefficient *c*, defined by Equation 46. The rate of change in this contribution defines the behavior of old memories with time. Figure 3*B* shows that, in the case of non-negative symmetric STDP, the only stable point for this coefficient is zero, which means that implicit memory is destined to disappear (for more detail, see Equation 18). Interestingly, if this coefficient is sufficiently large and passes the transition point in Figure 3*B*, the coefficient becomes unstable. This instability implies that the old pattern will emerge spontaneously in the network when the strength of the pattern is sufficiently large. In both regimes of small and unstable *c*, the network cannot maintain the old memory in a reliable manner.

### Antisymmetric STDP rules combined with white noise can stabilize old memory states that are not revisited

We examined the stability of implicit states when STDP rules have a form that is often observed experimentally (i.e., antisymmetric) (Bi and Poo, 2001; Froemke and Dan, 2002; Sjöström et al., 2008). We assumed that if a presynaptic spike precedes the postsynaptic spike, the synapse is strengthened due to LTP. If the timing of the spikes is reversed, the synapse is weakened; that is, the contribution of such events to the synaptic strength is negative (Fig. 4*A*), which corresponds to LTD. We find that, in this case, an implicit (unused) memory state can have two stable points. The stable points are defined for the contribution of the pattern to the weight matrix *c* (Equation 46). Stable points can be determined by examining the rate of change of this contribution (Fig. 4*B*), which is given by Equation 18. If, for a certain value of contribution, the rate of change is zero, this value is called the stationary point. If the contribution of a pattern is placed exactly into one of the stationary points, it will remain there because the rate of change of *c* is zero. Figure 4*B* shows three stationary points in this case. These three points differ in their response to small perturbations that deflect the contribution, *c*, slightly from a stationary state. For two of the stationary states in Figure 4*B*, the resulting rate of change returns the contribution back to the state. This is illustrated by the arrows on the horizontal axis. Therefore, these two states are stable. The third stationary point is unstable and is called the transition point.
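The two kernel shapes discussed in this section can be compared with a short sketch. It assumes exponential STDP windows, a common parameterization that is not necessarily the one used in Figures 3*A* and 4*A*. The amplitudes `a_plus`, `a_minus` and time constants `tau_plus`, `tau_minus` are hypothetical values chosen only so that the LTD lobe outweighs the LTP lobe.

```python
import numpy as np

def antisymmetric_kernel(dt, a_plus, a_minus, tau_plus, tau_minus):
    """LTP when pre precedes post (dt >= 0), LTD when post precedes pre (dt < 0)."""
    dt = np.asarray(dt, dtype=float)
    return np.where(dt >= 0,
                    a_plus * np.exp(-dt / tau_plus),
                    -a_minus * np.exp(dt / tau_minus))

def symmetric_kernel(dt, a, tau_stdp):
    """Potentiation for either spike order (symmetric non-negative rule)."""
    dt = np.asarray(dt, dtype=float)
    return a * np.exp(-np.abs(dt) / tau_stdp)

def integrate(y, x):
    """Trapezoidal rule written out explicitly."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Hypothetical parameters: equal-order amplitudes, LTD window twice as long as LTP.
a_plus, a_minus = 1.0, 1.05
tau_plus, tau_minus = 0.017, 0.034   # seconds

dt = np.linspace(-1.0, 1.0, 20001)   # spike-time differences, post minus pre (s)
numeric = integrate(antisymmetric_kernel(dt, a_plus, a_minus, tau_plus, tau_minus), dt)
closed_form = a_plus * tau_plus - a_minus * tau_minus
symmetric_area = integrate(symmetric_kernel(dt, a_plus, tau_plus), dt)

print(f"antisymmetric kernel integral: {numeric:.4f} (closed form {closed_form:.4f})")
print(f"symmetric kernel integral:     {symmetric_area:.4f}")
```

With these assumed values, the antisymmetric kernel has a negative integral because the LTD lobe carries more area than the LTP lobe, whereas the symmetric non-negative kernel always integrates to a positive number.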

For the two stable states, the contribution of the pattern is either low (stable point 1) or high (stable point 2). The former state corresponds to the weak representation of the pattern in the network that is indistinguishable from noise. The high contribution point (stable point 2) corresponds to the memory that is substantively present in the weight matrix. The system is capable of maintaining either high or low levels of a pattern in the weight matrix virtually indefinitely. This is despite the decay of synaptic strength that is ongoing in the system. Contribution is maintained because noise implements implicit rehearsal. Although the average values of firing rates are near the explicit attractor, the correlations in the firing rates between cells, induced by noise, carry information about other patterns that are not visited (Equation 17). Because learning rules are dependent on correlations, Hebbian learning is capable of maintaining implicit states in memory. This correlation-induced rehearsal results in the stability of patterns as a function of time. Although we presented the results for a single pattern, other implicit states are stabilized similarly due to their independence (Equation 2). Implicit rehearsal can stabilize several patterns simultaneously.
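The classification of stationary points into stable points and a transition point can be sketched for a one-dimensional rate of change. The function `toy_rate` below is a hypothetical cubic with three zeros, chosen only to reproduce the qualitative layout described above; it is not Equation 18, and the positions of the zeros are arbitrary.

```python
import numpy as np

def toy_rate(c, low=0.05, transition=0.4, high=0.8):
    """Hypothetical dc/dt with three zeros; NOT the rate of change derived in this study."""
    return -(c - low) * (c - transition) * (c - high)

def stationary_points(f, lo, hi, n_grid=1000, eps=1e-6):
    """Find zeros of f on [lo, hi] by bracketing and bisection, then classify each one."""
    grid = np.linspace(lo, hi, n_grid)
    points = []
    for a, b in zip(grid[:-1], grid[1:]):
        fa, fb = f(a), f(b)
        if fa == 0.0 or fa * fb > 0.0:
            continue                      # no sign change inside this interval
        for _ in range(60):               # bisection refinement
            mid = 0.5 * (a + b)
            if fa * f(mid) <= 0.0:
                b = mid
            else:
                a, fa = mid, f(mid)
        root = 0.5 * (a + b)
        # A negative slope means perturbations are pulled back toward the point.
        slope = (f(root + eps) - f(root - eps)) / (2 * eps)
        points.append((root, "stable" if slope < 0 else "unstable"))
    return points

points = stationary_points(toy_rate, 0.0, 1.0)
for root, kind in points:
    print(f"c = {root:.3f}: {kind}")
```

In this toy example, the outer two zeros have a negative slope and therefore attract nearby values of *c*, while the middle zero has a positive slope and separates the two basins, which is the role played by the transition point.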

### Conditions of bistability

For the pattern contribution to have two stable points, several conditions have to be met. First, the integral of STDP kernel (Fig. 4*A*) must be negative. This implies that the LTD part of the STDP curve is stronger than the LTP part. Second, we need the following equation:
which can be satisfied if the timescales of STDP, τ_{±}, are larger than that of the firing rate, τ. This condition guarantees that *c*_{a}(*t*) < 1/*g*. To have the second stable point at finite *c*_{a}(*t*), we also need the local maximum to be positive. In the case that γ*g*^{2}ξ^{2}(*A*_{+} + *A*_{−}) is much larger than […] *B* shows the typical behavior of […]

### Our model predicts correlations between network weights and neural noise

What are the implications of our findings for an individual synapse? To maintain a set of memories in the network, the synapses have to preserve their strength. Because, in our model, synaptic strength decays with time, it has to be maintained at a constant level by the correlations in the presynaptic and postsynaptic activities. This means that stronger synapses have higher correlations in presynaptic and postsynaptic activities. Therefore, the form of rehearsal proposed here can be detected by measuring the correlations in activity for individual synapses and observing their relationships with synaptic strength. A more precise definition of this relationship is given by Equation 10.

### Computer simulations

We have presented the results of our analyses that can be described by equations in closed form, such as Equation 8. These results have the advantage that we can immediately see what combination of parameters can implement the proposed mechanism, as shown in Figure 4. We also can analyze the behavior of very large networks with an unlimited number of neurons. The mathematical methods used in the previous sections, including the separation of short timescales (STDP ∼0.1 s) and long timescales (synaptic decay time ∼month), are often used in the studies of dynamical stabilization in mechanical, atomic, and plasma confinement (Landau and Lifshits, 1976; Paul, 1990; Hoang et al., 2013). However, to derive these results, some assumptions had to be made (see Materials and Methods). One of the assumptions is that learning in the network is determined by average noise correlations, rather than instantaneous values of noise (Equation 8). This assumption is generally called the mean-field approximation (Hertz et al., 1991). The second assumption is that the patterns stored in the network are strictly orthogonal. Finally, to derive our results, we assumed that the timescale of noise is much faster than the rate of learning, which allowed us to consider noise a perturbation. To validate these assumptions, in this section, we present the results of computer modeling of the implicit attractors in the network of neurons described by the firing rate model.

First, we start with the simplest case where the network has only one explicit pattern (an active memory) and one implicit pattern (memory that is never reactivated). Our simulations confirm that without noise the implicit pattern will decay with time and only the explicit pattern survives (Fig. 5).

With random noise and the antisymmetric STDP rules (Fig. 6), we observe that the network behaves as suggested by our previous analysis: the strength of the explicit pattern fluctuates around a constant value and never decays. However, the implicit pattern may decay to the noise level, defined as the amplitude of an arbitrary pattern contained in the random noise, if the strength of the pattern is set at a value lower than the transition point in Figure 4*B*. Conversely, if the strength of the implicit pattern is initially set to a value higher than the transition point, it will fluctuate around the second stable point of Figure 4*B* for a longer time (Fig. 7). In this case, the implicit memory is rehearsed by random noise and is kept in the network for a time much longer than the synaptic decay time τ_{0}.
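The dependence on the initial strength can be illustrated with a reduced one-dimensional sketch. It is not the network simulation reported here: the pattern strength is assumed to follow a hypothetical bistable drift (`toy_drift`, with arbitrary stationary points at 0.05, 0.4, and 0.8) plus weak additive noise, integrated with the Euler-Maruyama method. The noise amplitude, time step, and random seed are illustrative assumptions.

```python
import numpy as np

def toy_drift(c, low=0.05, transition=0.4, high=0.8):
    """Hypothetical bistable drift for the pattern strength; NOT the paper's Equation 18."""
    return -(c - low) * (c - transition) * (c - high)

def simulate(c0, sigma=0.01, dt=0.01, steps=20000, seed=0):
    """Euler-Maruyama integration of dc = drift(c) dt + sigma dW for several start values."""
    rng = np.random.default_rng(seed)
    c = np.array(c0, dtype=float)
    for _ in range(steps):
        noise = sigma * np.sqrt(dt) * rng.standard_normal(c.shape)
        c += toy_drift(c) * dt + noise
    return c

# One trace starts below the transition point (0.4), the other above it.
final_below, final_above = simulate([0.3, 0.5])
print(f"started at 0.3 -> {final_below:.2f}, started at 0.5 -> {final_above:.2f}")
```

In this sketch, the trace that starts below the transition point relaxes toward the low stationary point, and the trace that starts above it settles near the high one. Rare noise-driven crossings of the transition point, which the toy noise level makes very unlikely, would correspond to forgetting.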

Next, we simulate the dynamics of a network with one explicit pattern and multiple implicit patterns. Similar behaviors are observed: the strength of the explicit pattern fluctuates around a constant value; implicit patterns are maintained by random noise. When the initial strengths of implicit patterns are set above the transition point, the implicit memories can be maintained for a period of time considerably longer than the typical decay time τ_{0}. When the number of implicit patterns increases, some patterns may start to decay earlier than others (Fig. 8).

Figure 9 demonstrates behavior of the network that contains one explicit and five implicit attractors. During the simulation shown, two of the implicit attractor states decayed to the baseline, indicating that these implicit states have been forgotten. Forgetting was initiated when their strength fell below the transition point briefly due to a fluctuation. This behavior is expected in our simulations because we used the timescale of synaptic decay equal to 200 s. This choice was forced by the limits on the amount of time needed to run these simulations that also included the millisecond timescales. We anticipate that if the synaptic lifetime is close to several weeks, as in biological networks, the implicit patterns are more stable (for more detail, see Materials and Methods, the section “Validity of the MFA,” and Equation 31). Therefore, although the computer simulations attempted here can validate the behaviors described by the analytic calculations presented in the previous sections, the computer model is constrained to overestimate the effects of global fluctuations leading to transitions in some implicit states.

## Discussion

In this study, we examined the behavior of memory states stored in the weights of an attractor neural network. Our network included some realistic features, such as STDP learning rules, neural noise, and limited LTP/LTD lifetime. We assumed that learning occurs in the network on a continuous basis; that is, the weights are continuously updated to reflect ongoing activity. In these conditions, the network weights should reflect the ongoing activity that we described by the term explicit attractors. The other set of states, which we called implicit attractors, represent memories that were stored at some point in the past and that have not been revisited recently or within the lifetime of synaptic strengths. Such states are expected to disappear from the network weight matrix because of the decay of synaptic strengths. How can the network maintain implicit memory states despite synaptic decay?

We show that unstructured noise can substantially alter the dynamics of forgetting. Although input noise is unstructured in our model (that is, it does not contain stored patterns), when noise passes through the recurrent synaptic weight matrix, it becomes colored. This means that the correlations in neural activity that are induced by noise reflect all memory patterns stored in the network, both explicit and implicit. This is because the memory states represent the directions in neural activities that are amplified by the positive feedback present in the recurrent network. Therefore, although the average neural activity represents recent states only, the correlations reflect the entirety of memories, including the old ones (i.e., implicit). Because synaptic learning is dependent on correlations, in principle, it can reinforce implicit memory states, when certain conditions are met. We show that the antisymmetric STDP rule, which contains both positive and negative components (i.e., both LTP and LTD; Bi and Poo, 2001; Froemke and Dan, 2002), can reinforce old memory traces without explicitly visiting them (Fig. 4). In contrast, non-negative symmetric STDP cannot stabilize old memories (Fig. 3).

In our model, the old memory traces (implicit) are never visited or accessed by the network. The network always resides near the set of newer states that are relevant to current behavior and, therefore, are called explicit. However, we propose that implicit states can be rehearsed (Fig. 10). The rehearsal occurs because the fluctuations, rather than the average activity, reflect the old states. We call this form of rehearsal, in which the old memory is never directly accessed, implicit rehearsal.

A candidate mechanism for making memory more robust involves rehearsals whereby old memories are constantly revisited and relearned via an ongoing process. This mechanism has been proposed to resolve both the problem of unstable synapses (Wittenberg et al., 2002) and the catastrophic interference problem (McClelland et al., 1995; Robins, 1996; Robins and McCallum, 1998). Because all old memory states must be explicitly visited within the time window of LTP decay, presumably during sleep, it is unclear whether such a mechanism is realistic, especially if the number of patterns is large. Within our classification, the class of models proposed by Wittenberg et al. (2002) could be called explicit rehearsal networks.

Our model suggests the functional reason for the high degree of irregularity observed in firing of cortical neurons (Softky and Koch, 1993). In our model, irregular neuronal firing implements rehearsal of old memory patterns. When neural noise is passed through the network weight matrix, it captures the information about patterns stored in these connections. The ongoing synaptic plasticity can subsequently reinforce the stored patterns. Neural noise plays an essential role in generating correlations in neural activity. Similar roles can be played by the unreliable nature of synaptic vesicle release (Sudhof, 2004). Therefore, we propose that unreliable neural activity is the feature that helps cortical networks maintain stable connections.

In our model, both network weights and firing rates exhibit attractor behaviors. Firing rates can have several discrete states that are robust with respect to small perturbations and noise and are called network attractors. The identity of these states depends on the strengths of recurrent connections between neurons (Amit, 1989; Amit et al., 1994). In addition, these states represent long-term memories that are stored in the network weight matrix. In our model, network weights also exhibit attractor behaviors. We show that, because of ongoing synaptic plasticity, a weight matrix can have self-maintaining stable states that could also be called attractors. In Figure 4*B*, we show that synaptic weight matrix can have two stable states that correspond to a given memory pattern being present or absent from the network connectivity. Once the weight matrix is placed near the state that includes a given memory pattern, it will stay there for a long time, which ensures the stability of the memory of the pattern. The attractors of the weight matrix, in our model, are maintained by ongoing neural activity generated by network noise. As such, neural activity helps synaptic weights form stable states (i.e., attractors). In our model, firing rates and synaptic connections form two dual systems of attractors: synaptic weights help firing rates to exhibit discrete stable states and firing rates stabilize discrete self-maintaining states within network connections.

Our model can provide a rationale for the standard model of systems memory consolidation (Dudai, 2004). We proposed here that certain memory traces can be maintained in stable states over long periods by implicit rehearsal. The problem of placing the network into these states is not addressed here. However, we notice that the regions of stability surrounding stable memory states are narrow (stable point 2 in Fig. 4*B*). The parameters of network weights that describe the contribution of a given memory pattern must be tuned to a relatively precise value for the pattern to be stable. We argue that the function of placing the network in a narrow parameter range, which is necessary for long-term storage, is performed by memory consolidation. Once consolidated, that is, placed near stable point 2 (Fig. 4*B*), a memory pattern can be maintained by implicit rehearsal. Therefore, we argue that the functional role of systems memory consolidation is to place the network weights within the narrow range of parameters where the memory trace can persist for a long time.

Our study provides experimentally testable predictions. Within the implicit rehearsal mechanism, synaptic strength is larger for synapses with stronger correlations between presynaptic and postsynaptic activities (Equation 10). The remainder of the network produces these correlations, which then reinforce the synaptic strength. This prediction can be tested if synaptic strength is measured simultaneously with ongoing neural activity for individual synapses. In doing so, one should separate the correlations of activity induced by the measured synapse from those induced by the remainder of the network. This could be done pharmacologically or by including correlations over certain timescales, such as one temporal semi-axis for unidirectional synapses. The specific STDP kernel could also be surmised based on studies of synaptic plasticity or could be derived from the best match between synaptic strength and activity correlations. Overall, we propose that synaptic strengths are maintained by ongoing irregular spiking, which can be tested experimentally.

In our model, individual synapses are unstable, which is described by a “forgetting” term in Equation 9. This feature limits the strength of synapses for a given value of activity correlations, thus introducing a soft bound on synaptic strength. Soft bound implies that synaptic strengths are constrained but the synapses are not bound by a specific value. The behavior of individual synapses in our model is not expected to be different from the models that include hard limits on synaptic strengths (Amit and Fusi, 1994; Fusi, 2002; Fusi et al., 2005; Fusi and Abbott, 2007). In contrast to the models with a hard limit on synaptic strengths, in our model, strong synapses are possible, but their existence is less likely. In adopting this assumption, we were motivated by the observation of log-normal distribution of synaptic strength (Song et al., 2005; Koulakov et al., 2009; Mizuseki and Buzsáki, 2013), which implies that strong synapses are quite possible.

A related question has been addressed in the attempt to build molecular models of LTP (Crick, 1984). Although LTP lifetime, in most cases, is measured in weeks (Abraham, 2003), it is believed that molecules in synapses undergo turnover every several days (Lisman and Hell, 2008). Therefore, the persistence of LTP has to be reconciled with the dynamics of molecules that have relatively short lifetimes. Several studies have proposed how short-lived molecules can build a lasting synapse, including bistability (Lisman and Zhabotinsky, 2001; Miller et al., 2005) and self-sustaining molecular clusters (Shouval, 2005). This problem has many parallels with the question studied here because, in these models, relatively stable synapses result from activities of unstable molecules.

A related question, known in the computational literature as the plasticity-stability dilemma, posits that a memory system must evolve to be able to both store new memories promptly and retain old information (Grossberg, 1987; Abraham and Robins, 2005). As a consequence of these contradicting requirements, neural networks have to overcome what is known as the catastrophic forgetting or catastrophic interference phenomenon, whereby old memories are continually overwritten by new ones (Mézard et al., 1986; Nadal et al., 1986; McCloskey and Cohen, 1989). Some solutions to catastrophic interference have been proposed (Carpenter and Grossberg, 1987). Here, we argue that, even without the challenge from novel memories, the known lifetime of LTP is not in agreement with the long-lasting nature of long-term memory.

An interesting observation is that stability of long-term memory in our model seems to imply stability of individual synapses, which is in contrast to the fleeting nature of synaptic strengths discussed in the introduction. Although our model does stabilize memory states, it also allows individual synapses to be unstable. Because, in our study, memory is delocalized and each memory trace is represented by nearly all synapses in the network, variability in individual synapses does not imply memory decay. Our model offers two distinct scenarios for how memory states are corrupted. First, they can completely disappear through a discontinuous jump between two stable points, as in Figure 4*B*. Second, they can slowly change, one individual synapse at a time, whereby a memory state morphs into something else. Because the number of synapses involved in each trace is large, this process may progress for a long time without substantial change in the representation of memory. Overall, our model predicts that synaptic lifetime observed in reduced preparations, such as in slices, should be shorter than *in vivo*, because the latter lifetime is improved by the ongoing activity. Rare instances when synapses show stability (Abraham et al., 2002) could be attributed to the mechanism of stabilization proposed here. *In vivo*, synaptic persistence can be shorter than memory retention time because memory is a collective property of a large ensemble of synapses.

### Conclusions

Here, we studied the stability of long-term memory patterns stored in a recurrent neural network. Our model includes ongoing synaptic plasticity regulated by STDP rules. We show that old memory traces can be stabilized by fluctuations of neural activity when STDP rules satisfy certain constraints. Old memory patterns become stable, self-maintaining, and persistent states of the network weight matrix. Our model provides a mechanism for the extension of memory lifetime via the combination of ongoing synaptic plasticity and neural noise.

## Footnotes

This work was supported by the Swartz Foundation and the National Institutes of Health (Grants NIH R01MH092928 and NIH R01DA036913).

The authors declare no competing financial interests.

- Correspondence should be addressed to Alexei Koulakov, Cold Spring Harbor Laboratory, 1 Bungtown Rd., Cold Spring Harbor, NY 11724. akula{at}cshl.edu