## Abstract

Triggered by recent experimental results, temporally asymmetric Hebbian (TAH) plasticity is considered as a candidate model for the biological implementation of competitive synaptic learning, a key concept for the experience-based development of cortical circuitry. However, because of the well-known positive feedback instability of correlation-based plasticity, the stability of the resulting learning process has remained a central problem. Plagued by either a runaway of the synaptic efficacies or a greatly reduced sensitivity to input correlations, the learning performance of current models is limited. Here we introduce a novel generalized nonlinear TAH learning rule that allows a balance between stability and sensitivity of learning. Using this rule, we study the capacity of the system to learn patterns of correlations between afferent spike trains. Specifically, we address the question of under which conditions learning induces spontaneous symmetry breaking and leads to inhomogeneous synaptic distributions that capture the structure of the input correlations. To study the efficiency of learning temporal relationships between afferent spike trains through TAH plasticity, we introduce a novel sensitivity measure that quantifies the amount of information about the correlation structure in the input that a learning rule is capable of storing in the synaptic weights. We demonstrate that by adjusting the weight dependence of the synaptic changes in TAH plasticity, it is possible to enhance the synaptic representation of temporal input correlations while maintaining the system in a stable learning regime. Indeed, for a given distribution of inputs, the learning efficiency can be optimized.

- Hebbian learning
- spike-timing-dependent plasticity
- synaptic updating
- symmetry breaking
- unsupervised learning
- infomax
- activity-dependent development

## Introduction

Correlation-based plasticity has long been proposed as a mechanism for unsupervised experience-based development of neuronal circuitry, particularly in the cortex. However, the specifics of a biologically plausible model of plasticity that can also account for the observed synaptic patterns have remained elusive. Two major issues are stability and competition (Miller and MacKay, 1994; Miller, 1996; Abbott and Nelson, 2000; Song et al., 2000; van Rossum et al., 2000; Rao and Sejnowski, 2001; van Ooyen, 2001). If maps, such as ocular dominance maps, emerge from initially random (but statistically homogeneous) synaptic configurations by a Hebbian mechanism (but see Crowley and Katz, 2000), this would imply that there is an inherent instability in the dynamics of synaptic learning that destabilizes an initially homogeneous synaptic pattern. However, this raises the question as to what mechanism prevents synapses from growing to unrealistic values when driven by unstable dynamics. The emergence of inhomogeneous synaptic patterns also requires a competition mechanism that makes some synapses decrease their efficacies as other synapses grow in strength. Such competition is absent in the most naive Hebb rule, which contains only a mechanism for synaptic enhancement. Recent experiments have led to an important refinement of correlation-based or Hebbian learning, by showing that activity-induced synaptic changes can be temporally asymmetric with respect to the timing of presynaptic and postsynaptic action potentials, with a precision down to tens of milliseconds. Causal temporal ordering of presynaptic and postsynaptic spikes induces synaptic potentiation, whereas the reverse ordering induces synaptic depression (Levy and Steward, 1983; Debanne et al., 1994, 1998; Magee and Johnston, 1997; Markram et al., 1997; Bi and Poo, 1998, 2001; Zhang et al., 1998; Feldman, 2000; Sjöström et al., 2001).

In this work, we address the question of whether temporally asymmetric Hebbian (TAH) plasticity rules provide an adequate mechanism for unsupervised learning of input correlations. Two models of TAH plasticity have been studied recently that differ in the way that they implement the weight dependence of the synaptic changes and the boundaries of the allowed range of synaptic efficacies. The additive model (Abbott and Blum, 1996; Gerstner et al., 1996; Eurich et al., 1999; Kempter et al., 1999, 2001; Roberts, 1999; Song et al., 2000; Levy et al., 2001; Câteau et al., 2002) assumes that changes in synaptic efficacies do not scale with synaptic strength, and the boundaries are imposed as hard constraints. This model retains inherently unstable dynamics while exhibiting strong competition between afferent synapses. Because this model yields binary synaptic distributions, its ability to generate graded representations of input features is restricted. Moreover, because of the strong competition, patterns in the synaptic distribution can emerge that do not reflect patterns of correlated activity in the input. On the other hand, the multiplicative model (Kistler and van Hemmen, 2000; van Rossum et al., 2000; Rubin et al., 2001) assumes linear attenuation of potentiating and depressing synaptic changes as the corresponding upper or lower boundary is approached. This model results in stable synaptic dynamics. However, because of reduced competition, all synapses are driven to a similar equilibrium value, even at moderately strong input correlations. Thus, neither the additive nor the multiplicative model provides a satisfactory scenario for a robust learning rule that implements a synaptic storage mechanism of temporal structures in the inputs. Here, we introduce a nonlinear TAH (NLTAH) model, a novel generalized updating rule that allows for continuous interpolation between the additive and multiplicative models.
We demonstrate that by appropriately scaling the weight dependence of the updating, it is possible to learn synaptic representations of input correlations while maintaining the system in a stable regime. Preliminary results have been published previously in abstract form (Aharonov et al., 2001; Gütig et al., 2001).

## Materials and Methods

*Temporally asymmetric Hebbian plasticity.* We describe TAH plasticity as a change in the synaptic efficacy w between a pair of cells, where the range of w is normalized to [0, 1]. A single pair of presynaptic and postsynaptic action potentials with time difference Δt ≡ t_{post} − t_{pre} induces a change in synaptic efficacy Δw given by:

Equation 1:
Δw = λ f_{+}(w)K(Δt) for Δt > 0,  Δw = −λ f_{−}(w)K(Δt) for Δt < 0

The temporal filter K(Δt) = exp(−|Δt|/τ) (Song et al., 2000; van Rossum et al., 2000; Rubin et al., 2001) implements the spike-timing dependence of the learning. The time constant τ of the exponential decay determines the temporal extent of the learning window. Following experimental measurements (Bi and Poo, 1998), we let τ = 20 msec throughout this paper. The learning rate λ, 0 < λ ≪ 1, scales the magnitude of individual weight changes. The temporal asymmetry of the learning is represented by the opposite signs of the weight changes for positive and negative time differences. The updating functions f_{+}(w), f_{−}(w) ≥ 0, which are in general weight dependent, scale the synaptic changes and implement synaptic potentiation for causal time differences (Δt > 0), and depression otherwise. Here, we introduce a family of nonlinear updating functions in which the weight dependence has the form of a power law with a non-negative exponent μ:
Equation 2:
f_{+}(w) = (1 − w)^{μ},  f_{−}(w) = αw^{μ}

with α > 0 denoting a possible asymmetry between the scales of potentiation and depression. Figure 1*A* shows the updating curves (Eq. 2) for several values of μ. For μ = 0, the updating functions are independent of the current synaptic efficacy, and the rule recovers the additive TAH learning model. This model requires that weights, which would have left the allowed range after an updating step, are clipped to the appropriate boundary (0 or 1). The case μ = 1 corresponds to the multiplicative model, in which the updating functions linearly attenuate positive and negative synaptic changes as a synapse approaches the upper or lower boundary of the allowed range. Intermediate values of the updating parameter μ determine the range of the boundary effects on the changes in w. Note that any non-zero μ, given a sufficiently small learning rate, automatically prevents the synaptic efficacies from leaving the allowed range [0, 1], thereby preventing the runaway problem of synaptic efficacies and removing the necessity of artificially clipping synaptic weights. Figure 1*B* provides an illustrative example of the effects of the parameter μ on a sequence of synaptic weight changes (see legend for details).
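The single-pair update of Equations 1 and 2, with f_{+}(w) = (1 − w)^μ and f_{−}(w) = αw^μ, can be sketched as follows. This is a minimal illustration; the function names and the default values of α and λ are ours, chosen to match the text:

```python
import math

TAU = 20.0  # time constant of the learning window (msec)

def f_plus(w, mu):
    """Weight-dependent scale of potentiation, f+(w) = (1 - w)^mu (Eq. 2)."""
    return (1.0 - w) ** mu

def f_minus(w, mu, alpha):
    """Weight-dependent scale of depression, f-(w) = alpha * w^mu (Eq. 2)."""
    return alpha * w ** mu

def delta_w(w, dt, mu, alpha=1.05, lam=0.001):
    """Synaptic change for one spike pair with dt = t_post - t_pre (Eq. 1)."""
    k = math.exp(-abs(dt) / TAU)  # temporal filter K(dt)
    if dt > 0:   # causal ordering -> potentiation
        return lam * f_plus(w, mu) * k
    else:        # acausal ordering -> depression
        return -lam * f_minus(w, mu, alpha) * k

# mu = 0: additive rule -- updates are independent of w (clipping needed)
# mu = 1: multiplicative rule -- updates attenuate linearly near the boundaries
```

For any μ > 0 and sufficiently small λ, `w + delta_w(w, ...)` remains inside [0, 1] without clipping, whereas the additive case μ = 0 requires explicit clipping at the boundaries.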

Following previous work (Kempter et al., 1999;Song et al., 2000; Rubin et al., 2001; but see van Rossum et al., 2000), the plasticity effects of individual spike pairs are assumed to sum independently: given a postsynaptic spike, each synapse is potentiated according to Equations1 and 2 by pairing the output spike with all preceding synaptic events. Conversely, a synapse is depressed when a presynaptic event occurs, using all pairs that the synaptic event forms with preceding output spikes.
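The independent summation over spike pairs described above can be sketched as two event-driven updates (hypothetical helper functions, not code from the original; the kernel and updating functions follow Eqs. 1 and 2):

```python
import math

TAU, LAM = 20.0, 0.001  # learning window (msec) and learning rate

def potentiation_on_post_spike(w, t_post, pre_times, mu):
    """On a postsynaptic spike, potentiate the synapse by pairing the output
    spike with all preceding presynaptic events (Eqs. 1, 2)."""
    dw = 0.0
    for t_pre in pre_times:
        if t_pre <= t_post:
            dw += LAM * (1.0 - w) ** mu * math.exp(-(t_post - t_pre) / TAU)
    return dw

def depression_on_pre_spike(w, t_pre, post_times, mu, alpha=1.05):
    """On a presynaptic event, depress the synapse using all pairs the event
    forms with preceding output spikes."""
    dw = 0.0
    for t_post in post_times:
        if t_post <= t_pre:
            dw -= LAM * alpha * w ** mu * math.exp(-(t_pre - t_post) / TAU)
    return dw
```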

*Mean synaptic dynamics.* Because in general the spike times of the presynaptic and postsynaptic neurons are stochastic, the dynamics of synaptic changes are also a stochastic process. However, if the learning rate λ is small, the noise accumulated over an appreciable amount of time is small relative to the mean change in the synaptic efficacies, called the synaptic drift. This drift, denoted as ẇ, is the mean rate of change of the synaptic efficacy. Using Fokker–Planck mean field theory, the synaptic drifts are described in terms of the correlations between the presynaptic and postsynaptic activity (Kempter et al., 1999, 2001; Kistler and van Hemmen, 2000; Rubin et al., 2001). We consider a pair of stationary presynaptic and postsynaptic processes described by the pulse trains ρ^{pre}(t) = ∑_{k}δ(t − t_{k}^{pre}) and ρ^{post}(t) = ∑_{k}δ(t − t_{k}^{post}), with mean rates r^{pre} = 〈ρ^{pre}〉 and r^{post} = 〈ρ^{post}〉 and the raw cross-correlation function:

Equation 3:
Γ_{pre,post}(Δt) = 〈ρ^{pre}(t)ρ^{post}(t + Δt)〉

The angular brackets denote averaging over time t while keeping the time lag Δt between the two spike trains fixed. This cross-correlation is the probability density for the occurrence of pairs of presynaptic and postsynaptic spikes with time difference Δt. Using this probability density, the synaptic drift ẇ is given by integrating the synaptic changes Δw (Eq. 1) over the time differences Δt weighted by the probabilities Γ_{pre,post}(Δt):
Equation 4:
ẇ = λ [−f_{−}(w) ∫_{−∞}^{0} dΔt K(Δt)Γ_{pre,post}(Δt) + f_{+}(w) ∫_{0}^{∞} dΔt K(Δt)Γ_{pre,post}(Δt)]
The integral in the first term represents the synaptic depression that stems from all input–output correlations with negative time lag (i.e., acausal correlations). These correlations are filtered by the temporal window K(Δt) of the learning. This contribution from the spike-timing dependence of the learning is multiplied by the weight-dependent scale of depressing synaptic changes f_{−}(w). Conversely, the second term represents the potentiating drift originating from causal input–output correlations, which is scaled by f_{+}(w). Note that the weight-dependent scales f_{±}(w) are evaluated outside the time integrals, because when λ is small, w does not change appreciably during the time scale of the temporal filter of learning. In summary, the dynamic evolution of the synaptic weights depends on the properties of the correlation between the presynaptic and postsynaptic activity Γ_{pre,post}(Δt), which in turn depend on the details of the spike generation mechanism of the postsynaptic cell as well as on the statistics of the afferent inputs (Kuhn et al., 2003).

*Integrate-and-fire neuron.* To study the implications of the above NLTAH plasticity model in a biologically motivated spiking neuron, we simulate a leaky integrate-and-fire neuron, with parameters similar to those of Song et al. (2000). The membrane potential of the neuron is described by:

C_{m} dV/dt = (V_{rest} − V)/R_{m} + g_{exc}(t)(E_{exc} − V) + g_{inh}(t)(E_{inh} − V)

with membrane capacitance C_{m} = 200 pF, membrane resistance R_{m} = 100 MΩ, resting potential V_{rest} = −70 mV, and excitatory and inhibitory synaptic reversal potentials E_{exc} = 0 mV and E_{inh} = −70 mV, respectively. Whenever the membrane potential exceeds a threshold of −54 mV, an action potential is generated and the neuron is reset to the resting potential with no refractory period. Modeling synaptic conductance dynamics by α-shaped response functions, excitatory and inhibitory conductances are given by:

g_{exc}(t) = ḡ_{exc} ∑_{j} w_{j} ∑_{t_{j} < t} ((t − t_{j})/τ_{exc}) exp(−(t − t_{j})/τ_{exc})

and

g_{inh}(t) = ḡ_{inh} ∑_{j} ∑_{t_{j} < t} ((t − t_{j})/τ_{inh}) exp(−(t − t_{j})/τ_{inh})

respectively, where the t_{j} values are the spike times of synapse j and τ_{exc} = τ_{inh} = 5 msec. The values ḡ_{exc} = 30 nS and ḡ_{inh} = 50 nS were chosen such that the total charge injected per spike (at the threshold potential) is Q_{exc} = 0.04 pC and Q_{inh} = 0.02 pC, respectively. While the efficacies w of the N = N_{exc} = 1000 excitatory synapses are plastic and governed by the TAH learning rule, all N_{inh} = 200 inhibitory efficacies are held fixed at −1. In the numerical simulations, the integrate-and-fire neuron is driven by Bernoulli (i.e., zero-one) processes defined over discrete time bins of duration ΔT = 0.1 msec, approximating Poisson spike trains with a stationary rate r. For the inhibitory inputs, r = 10 Hz. All equilibrium synaptic distributions obtained with this model neuron result from an initially uniform synaptic state with all efficacies set to 0.5. In each case, the learning process (learning rate, λ = 0.001) is simulated until the shape of the synaptic distribution ceases to change.
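A rough simulation sketch of this model neuron is given below. It is not the original implementation: the α-shaped conductance is realized as two coupled exponentials, its amplitude is normalized (our choice) so that the charge injected per spike at the threshold potential matches the quoted Q_exc and Q_inh, plasticity is omitted, and the function name is ours:

```python
import numpy as np

# parameters from the model description
C_M, R_M = 200e-12, 100e6            # membrane capacitance (F), resistance (Ohm)
V_REST, V_TH = -70e-3, -54e-3        # resting and threshold potentials (V)
E_EXC, E_INH = 0.0, -70e-3           # synaptic reversal potentials (V)
TAU_SYN = 5e-3                       # alpha-function time constant (s)
DT = 1e-4                            # 0.1 msec time bins
Q_EXC, Q_INH = 0.04e-12, 0.02e-12    # charge per spike at threshold (C)

def simulate_lif(w_exc, n_inh=200, r_exc=10.0, r_inh=10.0, t_max=0.5, seed=1):
    """Leaky integrate-and-fire neuron with alpha-shaped conductances, driven
    by Bernoulli (Poisson-like) inputs.  Impulse amplitudes are chosen so the
    conductance integral times the driving force at V_TH equals Q_EXC/Q_INH."""
    rng = np.random.default_rng(seed)
    n_exc = len(w_exc)
    a_exc = Q_EXC / (TAU_SYN * (E_EXC - V_TH))   # impulse into the exc. kernel
    a_inh = Q_INH / (TAU_SYN * (V_TH - E_INH))   # impulse into the inh. kernel
    v, g_e, z_e, g_i, z_i = V_REST, 0.0, 0.0, 0.0, 0.0
    spikes = []
    for step in range(int(t_max / DT)):
        pre_e = rng.random(n_exc) < r_exc * DT   # Bernoulli input spikes
        pre_i = rng.random(n_inh) < r_inh * DT
        z_e += a_exc * w_exc[pre_e].sum()        # weighted excitatory events
        z_i += a_inh * pre_i.sum()               # inhibitory efficacies fixed
        # alpha function g(t) ~ t*exp(-t/tau) via two coupled exponentials
        g_e += DT * (z_e - g_e) / TAU_SYN; z_e *= 1.0 - DT / TAU_SYN
        g_i += DT * (z_i - g_i) / TAU_SYN; z_i *= 1.0 - DT / TAU_SYN
        # C dV/dt = (V_rest - V)/R + g_e (E_exc - V) + g_i (E_inh - V)
        v += DT * ((V_REST - v) / R_M + g_e * (E_EXC - v) + g_i * (E_INH - v)) / C_M
        if v >= V_TH:
            spikes.append(step * DT)
            v = V_REST                           # reset, no refractory period
    return spikes
```

For example, `simulate_lif(np.full(1000, 0.5))` drives the neuron with 1000 excitatory synapses of efficacy 0.5 and 200 fixed inhibitory synapses, all at 10 Hz.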

*Linear Poisson neuron.* To investigate analytically the properties of the TAH learning rule, we consider in addition a linear Poisson neuron (Kempter et al., 2001). The spiking activity of this neuron ρ^{post}(t) is a realization of a Poisson process with the underlying instantaneous rate function:

Equation 5:
R^{post}(t) = (1/N) ∑_{j=1}^{N} w_{j}(t)ρ_{j}(t − ε)

where, as before, the N presynaptic input spike trains and the output spikes are characterized by a series of δ pulses [i.e., ρ_{j}(t) = ∑_{k}δ(t − t_{k}^{j}) and ρ^{post}(t) = ∑_{k}δ(t − t_{k}^{post})]. The parameter 0 < ε ≪ τ denotes a small constant delay in the output. Because this delay is small compared with the temporal window of learning, we approximate exp(−ε/τ) ≈ 1 throughout this work. As before, w_{j}(t) ∈ [0, 1] denotes the efficacy of the j th synapse. Except for Figures 5 and 9, in which we investigate the large N limit, we let N = 100 throughout this work.

In Figure 2*A*, we numerically simulate the linear Poisson neuron receiving uncorrelated Poisson input spike trains. Generating the spike arrival times in continuous time (down to machine precision), the postsynaptic process defined in Equation 5 is implemented by generating a postsynaptic spike with probability w_{i}/N whenever a presynaptic spike arrives at synapse i of the neuron.
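A minimal sketch of this spike-by-spike implementation (names are ours; the small output delay ε is neglected):

```python
import random

def linear_poisson_output(pre_spikes, w, seed=0):
    """Linear Poisson neuron (Eq. 5): each presynaptic spike at synapse i
    elicits a postsynaptic spike with probability w[i]/N.
    pre_spikes is a list of (time, synapse_index) pairs."""
    rng = random.Random(seed)
    n = len(w)
    out = []
    for t, i in sorted(pre_spikes):
        if rng.random() < w[i] / n:
            out.append(t)
    return out
```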

*Mean synaptic dynamics for the linear Poisson neuron.* For the integrate-and-fire neuron, there is no simple exact expression relating the correlations between the presynaptic and postsynaptic spike trains to the system parameters such as the rates and input correlations. However, because of the linear summation of inputs in the linear Poisson neuron (Eq. 5), this model permits the expression of the input–output correlations Γ_{pre,post}(Δt) in closed form. Considering the case that all input spike trains have a common rate r, we obtain from Equations 3 and 5 that the correlation of the activity at synapse i with the output activity is:

Γ_{i,post}(Δt) = (r^{2}/N) ∑_{j} w_{j}[1 + Γ̃_{ij}(Δt − ε)]

Substituting the above in Equation 4, and rearranging the terms, we obtain the drift of the i th synapse:

ẇ_{i} = λr^{2} (1/N) ∑_{j} w_{j} [−f_{−}(w_{i}) ∫_{−∞}^{0} dΔt K(Δt)(1 + Γ̃_{ij}(Δt − ε)) + f_{+}(w_{i}) ∫_{0}^{∞} dΔt K(Δt)(1 + Γ̃_{ij}(Δt − ε))]

where we define the normalized cross-correlations between the input spike trains by:

Equation 6:
Γ̃_{ij}(Δt) = 〈ρ_{i}(t)ρ_{j}(t + Δt)〉/r^{2} − 1

We denote the integrated normalized cross-correlations appearing in the above drift equation by:

Equation 7:
C_{ij}^{±} = (1/τ) ∫_{0}^{∞} dΔt K(Δt)Γ̃_{ij}(±Δt − ε)

These matrices are the effective between-input correlations for positive and negative time lags. If C_{ij}^{+} > 0, the activity at synapse j temporally follows that at synapse i, such that its contribution to the postsynaptic activity results in a potentiating drift on synapse i. Conversely, if C_{ij}^{−} > 0, the activity at synapse j precedes that at synapse i and contributes to its depression. Note that the effective correlations C_{ij}^{±} are zero if the i th and j th input spike trains are uncorrelated. Finally, the synaptic drifts can be written as:

Equation 8:
ẇ_{i} = λr^{2}τ [−Δf(w_{i}) (1/N)∑_{j}w_{j} + f_{+}(w_{i}) (1/N)∑_{j}C_{ij}^{+}w_{j} − f_{−}(w_{i}) (1/N)∑_{j}C_{ij}^{−}w_{j}]

with Δf = f_{−} − f_{+}. Note that the first term in Equation 8 describes competition between the synapses when Δf > 0; independently from the input correlations, the amount of induced depression on a given synapse w_{i} is large when other synapses w_{j} are strong. The second term represents the cooperative increase of the synaptic weights inherent in TAH learning. In contrast, the last term denotes depressive synaptic interactions stemming from negative time correlations in the input activity.
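The mean synaptic dynamics can be integrated numerically. The sketch below assumes our reconstruction of the drift, ẇ_i ∝ −Δf(w_i)·(1/N)Σ_j w_j + f_+(w_i)·(1/N)Σ_j C⁺_ij w_j, with instantaneous input correlations (so the C⁻ term vanishes) and with the constant prefactor λr²τ absorbed into the step size:

```python
def drift(w, c_plus, mu, alpha=1.05):
    """Mean synaptic drifts (competition + cooperation terms); the common
    prefactor lambda*r^2*tau is dropped, and C- is assumed to be zero."""
    n = len(w)
    wbar = sum(w) / n
    dw = []
    for i in range(n):
        f_p = (1.0 - w[i]) ** mu                 # potentiation scale (Eq. 2)
        f_m = alpha * w[i] ** mu                 # depression scale (Eq. 2)
        coop = sum(c_plus[i][j] * w[j] for j in range(n)) / n
        dw.append(-(f_m - f_p) * wbar + f_p * coop)
    return dw

def relax(w, c_plus, mu, eta=0.01, steps=5000):
    """Euler-integrate the drift until (approximate) equilibrium,
    clipping weights to [0, 1]."""
    for _ in range(steps):
        w = [min(1.0, max(0.0, wi + eta * d))
             for wi, d in zip(w, drift(w, c_plus, mu))]
    return w
```

For example, with four uncorrelated inputs and τr = 0.2, the effective correlation matrix is C⁺ = I/0.2, and the multiplicative rule (μ = 1) relaxes to the homogeneous fixed point w* = (1 + C₀)/(α + 1 + C₀) with C₀ = 1.25.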

*Generating correlated inputs.* We consider input spike trains with rate r and instantaneous correlations defined by:

Equation 9:
Γ̃_{ij}(Δt) = r^{−1}c_{ij}δ(Δt)

where δ(t) is the Dirac δ function and c_{ij} is non-negative. In this case,

Equation 10:
C_{ij}^{+} = c_{ij}/(τr),  C_{ij}^{−} = 0

The backward effective correlations C_{ij}^{−} vanish because the argument of Γ̃_{ij}(−Δt − ε) in Equation 7 is never 0. Recall that for a Poisson process ρ(t) with rate r, the raw autocorrelation is 〈ρ(t)ρ(t + Δt)〉 = r^{2} + rδ(Δt). Hence, the normalized autocorrelation is c_{ii} = 1 and the between-input correlations are c_{ij} ≤ 1, with equality only if the two spike trains i and j are identical. In the numerical simulations, we generate populations of correlated spike trains by conditioning the binwise spike probabilities at time bin T on the activity of a common reference Bernoulli spike train X_{0}(T) with the binwise spike probability rΔT. To obtain a positive pairwise correlation coefficient 0 ≤ c = Cov(X_{i}(T), X_{j}(T))/Var(X(T)) between two spike trains X_{i}(T) and X_{j}(T), the conditional probabilities ϑ = P(X_{k}(T) = 1|X_{0}(T) = 1) and ϕ = P(X_{k}(T) = 1|X_{0}(T) = 0) for k = i, j are determined by:

Equation 11:
ϑ = rΔT + √c(1 − rΔT),  ϕ = rΔT(1 − √c)

This choice of ϑ and ϕ for all spike trains within the correlated group guarantees that the spike trains have rates r and an instantaneous pairwise correlation coefficient c (see Appendix). For small bin sizes, this process mimics the instantaneously correlated Poisson point processes defined above (Eq. 9). We will also consider the case of delayed correlations of the form Γ̃_{ij}(Δt) = r^{−1}c_{ij}δ(Δt − D_{ij}). These correlations are obtained by shifting the instantaneously correlated input spike trains relative to each other by a time delay D_{ij}.
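The construction of correlated Bernoulli trains through a common reference train can be sketched as below. Because the closed form of Equation 11 did not survive extraction, the expressions for ϑ and ϕ used here are our derivation from the stated constraints (each train keeps rate r, and each pair attains correlation coefficient c via conditional independence given X₀):

```python
import numpy as np

def correlated_trains(n_trains, n_bins, rate, c, dt=1e-4, seed=0):
    """Correlated Bernoulli spike trains conditioned on a common reference
    train X0.  With theta = P(X=1|X0=1) = r*dt + sqrt(c)*(1 - r*dt) and
    phi = P(X=1|X0=0) = r*dt*(1 - sqrt(c)), every train has binwise spike
    probability r*dt and each pair has correlation coefficient c."""
    rng = np.random.default_rng(seed)
    p = rate * dt
    theta = p + np.sqrt(c) * (1.0 - p)
    phi = p * (1.0 - np.sqrt(c))
    x0 = rng.random(n_bins) < p                   # common reference train
    prob = np.where(x0, theta, phi)               # conditional probabilities
    return rng.random((n_trains, n_bins)) < prob  # boolean spike matrix
```

A quick sanity check: each train's correlation with X₀ is √c, so conditional independence given X₀ yields a pairwise correlation of √c·√c = c.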

*Measuring the performance of learning rules.* A natural way to measure the performance of a learning rule is to quantify its ability to imprint the statistical features of the neuronal input onto the distribution of the learned synaptic weights. One measure of this ability is the mutual information between the neuronal inputs and the synaptic weights. However, direct calculation of the mutual information in cases in which the number of synaptic weights is large is computationally not feasible. Instead, we use here a related quantity that measures the effect of a small change in the statistics of the input on the learned synaptic weights. We denote the features of an ensemble of neuronal inputs by the vector Φ = (Φ_{1}, … , Φ_{R}), where the Φ_{i} parameterize specific input features (e.g., mean strength of the inputs or temporal correlations between different inputs). Given these features, we calculate the N × R susceptibility matrix χ, the elements of which are:

Equation 12:
χ_{ij} = ∂w_{i}/∂Φ_{j}

The ij th element measures the amount of change in the i th synaptic efficacy that is incurred by a small change in the j th input feature, Φ_{j}. A global sensitivity measure S is constructed from this matrix by calculating:

Equation 13:
S = ½ ln det(χ^{T}χ)

where det(·) denotes the determinant. The average sensitivity S_{avg} is defined as 〈S〉_{Φ}, where the average 〈·〉_{Φ} is taken over the distribution of the feature vector Φ. The rationale for calculating S is that it is closely related to the mutual information between the input features and the weight distribution. Specifically, if the mapping from the feature space to the synaptic weight space induced by the learning dynamics is invertible, maximizing S_{avg} is equivalent to maximizing the mutual information (Bell and Sejnowski, 1995; Shriki et al., 2001) in the limit of a small learning rate λ. In this work, we focus on the equilibrium properties of the TAH learning rule (i.e., the weight distributions that result after the learning dynamics have converged to a stable stationary state). Therefore, χ_{ij} is evaluated at the fixed point solution *w** of the drift equations in the linear Poisson neuron, Equation 8 (see Appendix). The possibility of using the analytic expressions for χ_{ij} in calculating the sensitivity S is the main advantage of using it as a measure of performance of various learning models.
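For illustration, the sensitivity can be evaluated from a given susceptibility matrix. Since the exact form of Equation 13 was not fully recoverable from the extracted text, we use S = ½ ln det(χᵀχ), a standard choice for a non-square N × R matrix that reduces to ln|det χ| when N = R and is consistent with the surrounding description:

```python
import numpy as np

def sensitivity(chi):
    """Global sensitivity from the N x R susceptibility matrix
    chi_ij = dw_i/dPhi_j, computed as S = 0.5 * ln det(chi^T chi)."""
    chi = np.asarray(chi, dtype=float)
    sign, logabsdet = np.linalg.slogdet(chi.T @ chi)  # numerically stable log-det
    return 0.5 * logabsdet
```

In practice χ would come from the analytic expressions at the fixed point, or from finite differences of the learned weights with respect to the input features.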

## Results

To understand learning phenomena in biological nervous systems in terms of neural network function, it is crucial to bridge the gap between the microscopic mechanisms that implement experience-based changes in neuronal signaling pathways and the macroscopic properties of the learning system composed of these pathways. In this paper, we focus on two general goals of learning that can be defined at the network level and also investigate the importance of the updating parameter μ of the learning rule in these contexts. First, we consider the question of how a network can develop a functional connectivity architecture, as for example in ocular dominance columns. As noted in the Introduction, this type of learning task typically requires the synaptic learning dynamics to be competitive, to allow segregation between initially homogeneous synaptic populations. Moreover, it is important that the learning process is robust in the sense that the learned synaptic patterns faithfully reflect meaningful features in the neuronal input activity, rather than being dominated by contributions from random noise. Therefore, we study here how the interplay between competition and stability in TAH plasticity affects the learned synaptic distributions. In the second part of Results, we turn to the conceptually different learning task of imprinting information about the input activity of a neuron into the respective synaptic efficacies. In this context, the sensitivity of the learning dynamics to features in the neuronal input becomes crucial. Thus, using the sensitivity measure introduced in Materials and Methods, the second part of Results concentrates on a quantitative evaluation of the performance of different TAH learning rules.

### The emergence of synaptic patterns by symmetry breaking in TAH learning

One of the basic requirements for the activity-driven formation of cortical maps is the ability of the learning to generate spatially inhomogeneous synaptic patterns from a population of synapses with statistically homogeneous inputs. The emergence of such symmetry breaking is an essential property of current cortical plasticity models (Miller, 1996). In this section, we study the conditions under which the TAH learning models introduced above exhibit symmetry breaking and, hence, qualify as candidate models for the development of functional maps. Moreover, because the learning dynamics may also lead to symmetry breaking that overrides the correlation structure of the afferent activity, it is important to ask what learning rules ensure a faithful representation of the input activity within the learned synaptic connections. We address these questions in three basic types of homogeneous afferent activities that differ with respect to the correlation structure of the input spike trains: uncorrelated inputs, uniformly correlated inputs, and uniformly correlated subpopulations without correlations between the subpopulations ("correlated subgroups"). Before treating these specific cases, we highlight the general features of the synaptic learning dynamics in a population of synapses with statistically homogeneous input activities. These results apply to all three cases of homogeneous populations of inputs.

#### Dynamics of a population of synapses with homogeneous inputs

To study the symmetry breaking in the synaptic patterns, we consider the learning dynamics in cases in which the input statistics are spatially homogeneous. This means that each input obeys the same spike statistics and has the same pattern of correlations with the other inputs. This assumption implies that the presynaptic rates r_{i} (where i denotes the index of the different afferents) are all equal. Likewise, the total sum of the correlations that each input has with the rest of the inputs is the same. In particular, the mean effective causal correlation, C_{0}:

Equation 14:
C_{0} = (1/N) ∑_{j=1}^{N} C_{ij}^{+}

is the same for all input channels i, and similarly for the backward correlations (Kempter et al., 2001).

To understand the implications of spatial homogeneity in the presynaptic inputs on the learning dynamics, it is useful to concentrate on the linear Poisson neuron model (Eqs. 5, 8). For convenience, we assume that all correlations between input spikes are instantaneous (see Materials and Methods, Eq. 10).

The important consequence of the spatial homogeneity across the presynaptic inputs is that the product of the effective correlation matrix C^{+} with a homogeneous vector of synaptic efficacies *w*_{o} = (w_{o}, w_{o}, … , w_{o}) remains homogeneous, because C^{+}*w*_{o} = NC_{0}*w*_{o}. Hence, the homogeneous synaptic state is an eigenvector of the effective correlation matrix C^{+} with eigenvalue NC_{0}. The existence of this homogeneous eigenvector is important for the synaptic learning, because it means that in a homogeneous synaptic state *w*_{o}, all synapses experience identical drifts (Eq. 8). Moreover, if there is a w_{o} = w* such that the synaptic drifts become zero, the learning dynamics have a steady-state solution, with w_{i} = w* for all synapses. We call a solution in which all of the learned synapses are equal a homogeneous solution. Indeed, we show in the Appendix that for all non-zero values of the updating parameter μ in our model (Eq. 2), there exists a steady-state homogeneous solution of the learning (ẇ_{i} = 0 in Eq. 8), with w* being the solution of the equation:

Equation 15:
f_{−}(w*)/f_{+}(w*) = α(w*/(1 − w*))^{μ} = 1 + C_{0}

This equation expresses the weight-dependent balance between depression and potentiation that is controlled by the mean effective correlation C_{0}.
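Solving the balance condition f_{−}(w*)/f_{+}(w*) = α(w*/(1 − w*))^μ = 1 + C₀ for w* gives a one-line closed form (a sketch for μ > 0; the function name is ours):

```python
def w_star(mu, alpha, c0):
    """Homogeneous fixed point: alpha * (w/(1-w))^mu = 1 + C0, solved for w.
    With q = ((1 + C0)/alpha)^(1/mu), the solution is w* = q/(1 + q)."""
    q = ((1.0 + c0) / alpha) ** (1.0 / mu)
    return q / (1.0 + q)
```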

Although the homogeneous synaptic steady state always exists, it may be unstable with respect to small perturbations of the synaptic efficacies, driving them into inhomogeneous states. Because of the important functional consequences of this emergence of inhomogeneous synaptic patterns at the network level, it is important to understand the features of the learning dynamics that give rise to this phenomenon of symmetry breaking. Therefore, we analyze the effects of small deviations of the synaptic efficacies from the homogeneous synaptic steady state *w**. For each synapse w_{i}, we denote a corresponding small deviation from the homogeneous solution by δw_{i} = w_{i} − w* and express its temporal evolution as a function of all deviations δw_{j}. As we show in the Appendix, this temporal evolution is determined by three separate contributions:

δẇ_{i} = λr^{2}τ [−g_{o}δw_{i} − Δf(w*) (1/N)∑_{j}δw_{j} + f_{+}(w*) (1/N)∑_{j}C_{ij}^{+}δw_{j}]

where g_{o} = w*f_{+}(w*) [d(f_{−}(w)/f_{+}(w))/dw]_{w=w*} = αμw*^{μ}/(1 − w*) > 0 (Appendix, Eqs. 20, 21). The first term is a local stabilizing term. It counteracts individual deviations from the homogeneous solution, maintaining the synaptic efficacies at the same value w*. To understand the origin of this stabilizing term in the learning dynamics, we consider the effect of a single synaptic deviation δw_{i} on the balance between depression and potentiation. If a synapse is strengthened by a deviation δw_{i} > 0, the resulting scale of potentiation f_{+}(w* + δw_{i}) decreases, whereas the scale of depression f_{−}(w* + δw_{i}) increases (Eq. 2). Conversely, a weakening deviation δw_{i} < 0 shifts the balance between potentiation and depression in favor of potentiation. Because this stabilizing drift stems from the weight dependence of the ratio f_{−}(w*)/f_{+}(w*) (Appendix), it is not present in the additive model (μ = 0), where the f_{±} values themselves are constant. The second term is proportional to the net drift −Δf(w*) = f_{+}(w*) − f_{−}(w*). This drift is negative, because at the homogeneous solution, f_{−}(w*) > f_{+}(w*) when depression balances the potentiating correlations (see Eq. 15, and recall that C_{0} > 0). The negative drift is multiplied by the total perturbation ∑_{j}δw_{j}, which denotes the change in the output rate attributable to the changes in the synaptic efficacies. Thus, this term represents the competition between the synapses. This competition results from the fact that strengthening the efficacy of any synapse increases the output rate, thereby increasing the frequency of occurrence of net negative drift in all of the synapses. It is important to note that this competition acts between all synapses, unrelated to the correlation structure in the afferent input.

Finally, the last term is a cooperative term. Synapses that are positively correlated cooperate to elevate their weights. This cooperation is driven by the potentiating component of the TAH learning and depends on the pattern of correlations among the input channels. We emphasize that the cooperativity in the synaptic learning in general does not originate from a possible advantage of correlated synapses to drive a potentially nonlinear spike generator of the postsynaptic cell, but rather already occurs because of an inherently increased probability of correlated synapses to precede postsynaptic spikes, even when nonlinear cooperative effects in the spike generator are absent.

The stability of the homogeneous synaptic steady state results from the interplay between the stabilizing, the competitive, and the cooperative drifts in the learning dynamics. As we derive in the Appendix, perturbations of the steady state that slightly change all weights by the same amount δw (homogeneous perturbations) decay to zero with time and, hence, do not destabilize the learning of a homogeneous synaptic distribution. In contrast, inhomogeneous perturbations (i.e., perturbations in which the deviations of the synaptic efficacies from w* are not identical) can grow exponentially through the learning dynamics and drive the system into inhomogeneous synaptic states. In the Appendix, we specifically show that the homogeneous synaptic state becomes unstable if the largest real part of all inhomogeneous eigenvalues (eigenvalues corresponding to inhomogeneous eigenvectors) of the effective correlation matrix C^{+} is sufficiently large. Denoting this eigenvalue by NC_{1}, we find that when:

Equation 16:
C_{1}f_{+}(w*) > g_{o}

the homogeneous state is unstable. This inequality means that symmetry breaking occurs whenever the cooperation between synapses C_{1} is strong enough to outweigh the stabilizing term g_{o}. Note that the competition coefficient Δf does not enter directly into the stability criterion. This is because the competition term is proportional to the total weight value, and hence is not sensitive to inhomogeneous perturbations that do not change this value. Nevertheless, this term has a crucial role in the stability of the learning, because it suppresses the homogeneous growth of all synapses. As is shown in the Appendix, Equation 16 implies that the homogeneous solution is always stable in the multiplicative model (μ = 1).
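A numerical check of this criterion can be sketched as follows. It encodes our reading of the instability condition, C₁f₊(w*) > g₀ with g₀ = αμw*^μ/(1 − w*) and w* the homogeneous fixed point of Eq. 15 (function name is ours, μ > 0 assumed):

```python
def is_homogeneous_state_unstable(mu, alpha, c0, c1):
    """Symmetry-breaking criterion: the homogeneous fixed point w* is
    unstable when C1 * f+(w*) exceeds the stabilizing term g_o."""
    q = ((1.0 + c0) / alpha) ** (1.0 / mu)
    w = q / (1.0 + q)                          # homogeneous fixed point (Eq. 15)
    g_o = alpha * mu * w ** mu / (1.0 - w)     # local stabilizing term
    return c1 * (1.0 - w) ** mu > g_o          # cooperation vs. stabilization
```

With α = 1.05 and the uncorrelated-input values C₀ = C₁ = 0.05 (τrN = 20), the homogeneous state is unstable for sufficiently small μ but stable in the multiplicative case μ = 1, consistent with the text.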

Although this analysis was performed using the plasticity equations of the linear Poisson neuron, it is qualitatively valid as well for other neuron models, as we show for specific cases. Below we study how the emergence of symmetry breaking (i.e., transitions from homogeneous to inhomogeneous synaptic distributions) depends on the nonlinearity of the TAH dynamics, namely the parameter μ, as well as on the asymmetry between depression and potentiation α, and on the size of the synaptic population N.

#### Uncorrelated inputs: linear neuron

In this section, we investigate the synaptic distributions that result from the TAH learning process when the postsynaptic neuron is driven by independent Poisson spike trains of equal rate r. For this input regime, it has been found in an integrate-and-fire neuron that additive learning (μ = 0) breaks the symmetry of the statistically identical presynaptic inputs and leads to a bimodal weight distribution (Song et al., 2000; Rubin et al., 2001). However, it was shown by Rubin et al. (2001) that multiplicative learning (μ = 1) leads to a unimodal distribution of synapses. As shown in the preceding section, these qualitatively different learning behaviors originate in the stabilizing effect of the weight dependence of the synaptic changes on the homogeneous synaptic state. Here, we study the generalized nonlinear TAH rule with arbitrary μ ∈ [0, 1].
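The interpolation between the additive and multiplicative limits can be sketched as follows. Scaling potentiation by (1 − w)^μ and depression by αw^μ is a common parameterization of weight-dependent TAH plasticity consistent with the limits discussed here; the learning-rate and time-window numbers are illustrative.

```python
import numpy as np

def tah_update(w, dt, mu, alpha, lam=0.005, tau=20.0):
    """Weight change for one pre/post spike pair with lag dt = t_post - t_pre (ms).

    Sketch of a weight-dependent TAH rule: potentiation scales as
    (1 - w)**mu and depression as alpha * w**mu, so mu = 0 recovers the
    additive rule and mu = 1 the multiplicative rule. lam and tau are
    illustrative learning-rate and time-window parameters.
    """
    if dt > 0:                               # causal: pre before post
        return lam * (1.0 - w) ** mu * np.exp(-dt / tau)
    else:                                    # acausal: post before pre
        return -lam * alpha * w ** mu * np.exp(dt / tau)

# additive limit (mu = 0): update magnitude is independent of w
assert tah_update(0.2, 10, 0.0, 1.05) == tah_update(0.8, 10, 0.0, 1.05)
# multiplicative limit (mu = 1): potentiation vanishes as w -> 1
assert tah_update(1.0, 10, 1.0, 1.05) == 0.0
```

Intermediate μ values weaken the weight dependence smoothly, which is the knob studied throughout this section.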

In the uncorrelated case, c_{ij} = δ_{ij}, and hence C^{+}_{ij} = δ_{ij}/(τr). Both its homogeneous and inhomogeneous eigenvalues normalized by N are C_{0} = C_{1} = 1/(τrN) (Eq. 17). Thus, the only cooperation in the learning dynamics stems from the positive feedback induced by the correlation of each synapse with its own contribution to the postsynaptic activity. The effects of this self-correlation on the learning dynamics decrease inversely to the effective size of the presynaptic population τrN (i.e., the expected number of spikes that arrive within the learning time window).

By inserting the above expression for C_{0} into Equation 15, we obtain the steady-state efficacy w* of the synaptic population when the learned synaptic state is homogeneous (see Appendix, Eq. 19). In this case, the output rate of the linear neuron is given by this steady-state efficacy times the rate of the presynaptic inputs r (compare Eq. 5). Figure 2*A* depicts the output rate of the postsynaptic neuron as a function of the presynaptic input rate r, for α = 1.05. We focus on this value of α here, because we want to compare the nonlinear rules with the additive rule. In the latter case, α must be close to 1; otherwise, practically all synapses will become zero (see Appendix). For μ = 1 (multiplicative TAH), the efficacy w* is fairly independent of r and, hence, the output rate grows linearly with the input rate. However, if μ is sufficiently small, w* decreases inversely with the input rate, resulting in a nearly constant output rate.
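The dependence of w* on μ and on the input rate can be explored with a schematic version of the steady-state condition. The balance below, with the causal self-correlation entering through ε = 1/(τrN), is our simplified stand-in for Equations 15 and 19; the prefactors are not the paper's exact ones.

```python
def w_star(mu, alpha, tau_r_N, tol=1e-12):
    """Homogeneous steady-state efficacy from the schematic balance
    (1 + eps) * (1 - w)**mu = alpha * w**mu, with eps = 1/(tau*r*N)
    standing in for the causal self-correlation (requires mu > 0).
    Solved by bisection; f is monotonically decreasing in w.
    """
    eps = 1.0 / tau_r_N
    f = lambda w: (1 + eps) * (1 - w) ** mu - alpha * w ** mu
    lo, hi = tol, 1.0 - tol          # f(lo) > 0 > f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# multiplicative rule (mu = 1) has the closed form (1+eps)/(1+eps+alpha)
eps = 1.0 / 200.0
assert abs(w_star(1.0, 1.05, 200.0) - (1 + eps) / (1 + eps + 1.05)) < 1e-9
# for small mu, w* drops as the input rate (hence tau*r*N) grows
assert w_star(0.05, 1.05, 400.0) < w_star(0.05, 1.05, 100.0)
```

Under this sketch, w* for μ = 1 is nearly rate independent, whereas for small μ it falls with increasing input rate, in line with the two regimes described for Figure 2*A*.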

To study the regime in which the synaptic learning dynamics break the symmetry of the uncorrelated input population, we substitute Equation 17 into Equations 15 and 16, computing the homogeneous solution w* (Eq. 19) and the regime of its stability. Figure 3*A* depicts the critical contour lines according to the stability condition (Eq. 16). Each line traces the critical combination of the parameters μ and τrN for a fixed value of α, such that the growth rate of inhomogeneous perturbations vanishes. Outside the corresponding contour (where this growth rate is negative), the homogeneous synaptic state is stable, and thus learning generally results in all synapses having the same efficacy. In contrast, inside the contour line, the learning dynamics induce symmetry breaking.

Figure 3*A* shows how the outcome of TAH learning depends on the effective size of the presynaptic population. For a sufficiently small τrN, the relative contribution of each input channel to the postsynaptic activity is large and, hence, the resulting strong positive feedback drives all synapses to a stable homogeneous state near the upper boundary (Fig. 3*B*, squares). In contrast, as τrN is increased, the effect of a single synapse on the postsynaptic activity decreases. Therefore, for a sufficiently large τrN, the stabilizing force induced by the weight dependence of the synaptic changes dominates the learning dynamics for any non-zero μ, resulting in a stable homogeneous synaptic state (Fig. 3*B*, triangles). In between the two extremes of small and large τrN, there is a regime of intermediate effective population sizes for which symmetry breaking may occur, with the synaptic population segregating into a strong and a weak group. Such a case is shown in Figure 3*B* (circles).

Importantly, Figure 3*A* demonstrates that as the number of afferents N increases, the regime of values of μ for which the homogeneous solution is unstable shrinks to zero. The inset of Figure 3*A* shows the value of μ at the border between stability and instability of the homogeneous solution as a function of the effective population size. This critical μ decreases linearly with 1/(τrN) when τrN is large (also see Appendix). Hence, for any sizable degree of weight dependence and large synaptic populations, symmetry breaking does not occur.

In the purely additive TAH model, synaptic changes do not scale at all with the efficacy of a synapse, and the weights have to be constrained by an additional clipping to prevent unrealistic synaptic growth. As a result, the additive learning dynamics do not possess stationary synaptic states in the above sense that the individual synaptic drifts become zero. Instead, synapses with positive drifts are held at the upper boundary, whereas synapses with negative drifts saturate at the minimum allowed efficacy. Our treatment of the additive model in the Appendix shows that the numbers of synapses gathering at the upper and lower boundaries critically depend on the ratio of depression and potentiation α, as well as on the effective population size τrN. As in the nonlinear TAH learning model, small effective synaptic populations (τrN < 1/(2(α − 1))) will lead to all synapses saturating at the upper boundary because of the strong positive feedback. However, as τrN increases beyond this critical value, the synaptic population breaks into two groups, one of which remains saturated at the upper boundary while the other, losing the competition, saturates at the lower boundary. The fraction of synapses saturating at the top boundary is n_{up} = 1/(2τrN(α − 1)) (see Appendix). Because this fraction is inversely proportional to the input rate r, the output rate of the postsynaptic neuron becomes independent of the input rate, as shown in Figure 2*A*.

#### Uncorrelated inputs: integrate-and-fire neuron

We now turn to the behavior of TAH learning in the integrate-and-fire neuron driven by uncorrelated inputs. Figure 2*B* shows the output rate of this neuron model versus the input rate for different values of μ. As the figure demonstrates, the output-rate normalization quickly deteriorates as μ departs from the additive model and synaptic changes become dependent on the efficacy of the synapse. Figure 2*C* demonstrates that the sensitivity of the output rate to the parameter α rapidly diminishes as μ increases. Comparing *A* and *B* of Figure 2 shows the qualitative similarity between the output rate responses of the linear Poisson and the integrate-and-fire model neurons. Note that we have not attempted to match the overall scale of the output rates in the two models. The output rate of the linear neuron can be arbitrarily changed by a gain factor without affecting any other results.

Figure 4 displays the histograms of the equilibrium distributions of learned synaptic efficacies as a function of the updating parameter μ. Recovering the behavior of additive (Song et al., 2000) and multiplicative (Rubin et al., 2001) updating models for μ = 0 and μ = 1, respectively, the plot reveals the transition between these models for intermediate values of μ. Specifically, it shows the emergence of symmetry breaking as μ approaches zero.

As expected from the analysis of the linear neuron, we find that in the integrate-and-fire neuron also, the critical value of μ at which the synaptic distribution becomes bimodal decreases as the effective population size τrN increases. Increasing the rate of the input processes from 10 Hz (Fig. 4*A*) to 40 Hz (Fig. 4*B*) lowers the first occurrence of a bimodal weight distribution from μ_{crit} = 0.023 to μ_{crit} = 0.017. The inset in each panel depicts the equilibrium weight distribution for the intermediate value of μ = 0.019, showing a clearly bimodal distribution for the 10 Hz input (Fig. 4*A*) and a clearly unimodal distribution for the 40 Hz input (Fig. 4*B*). Moreover, as expected from the equations describing the homogeneous steady state in the linear neuron (Eqs. 15 and 17), the synaptic efficacy of the homogeneous state at a given μ decreases when the input rate increases.

It is interesting to note the close similarity in the μ dependence of the learned synaptic distributions in the linear and the integrate-and-fire neurons. For example, in both cases, the critical μ for symmetry breaking is close to 0.023 for input rates of 10 Hz [compare Fig. 4*A* with Fig. 3*B* (circles)]. This is despite the fact that the two models have very different spike generators and different sizes of synaptic populations. The reason for this similarity is that the input–output correlations in the integrate-and-fire neuron with 1000 synapses turn out to match in magnitude the corresponding correlations of the linear neuron with 100 synapses (data not shown).

In summary, for uncorrelated inputs and biologically realistic sizes of the presynaptic population, N, on the order of thousands, and for rates on the order of ≥10 Hz, the regime in μ and α in which symmetry breaking between uncorrelated inputs as well as output rate normalization occur is extremely narrow. Thus, the learning behavior changes qualitatively as soon as synaptic plasticity becomes weight dependent.

#### Uniformly correlated inputs

We briefly discuss here the case in which the presynaptic inputs have positive uniform instantaneous correlations, namely that for all i ≠ j, the c_{ij} (Eq. 9) are equal. This situation may, for instance, occur when the entire presynaptic pool of a neuron is driven by a common source. Treating the behavior of the linear Poisson neuron, we show in the Appendix that positive uniform correlation increases the value of the synaptic efficacy in the homogeneous synaptic steady state. Moreover, the uniform correlation does not alter the 1/(τrN) dependence of the destabilizing drifts. As a result, in nonadditive learning, when the effective synaptic population is sufficiently large, the homogeneous steady state remains stable for any positive uniform correlation strength. In fact, these correlations increase the stability of the homogeneous state (see Appendix) and, hence, oppose the emergence of spontaneous symmetry breaking.

#### Correlated subgroups

We now consider afferent input activity to a neuron that is composed of M equally sized groups. These groups are defined by a uniform within-group correlation coefficient c_{ij} = c > 0 (compare Eq. 9) that is the same in all groups. For pairs of inputs belonging to different groups, the cross-correlation is zero. In this scenario, the M presynaptic groups compete for control over the firing of the postsynaptic neuron. We first treat the linear neuron and, for simplicity, focus on the case in which the overall number of presynaptic input channels N is large. In this limit, the homogeneous and largest inhomogeneous eigenvalues of C^{+} normalized by N are both C_{0} = C_{1} = c/(τrM).
Comparing these expressions with their respective values in the case of N uncorrelated inputs (Eq. 17), we note that the learning behavior in both input scenarios is equivalent when N is identified with M/c, the number of correlated subgroups divided by the strength of the within-group correlation. The stability of the homogeneous synaptic steady state in a large network comprising M correlated synaptic subgroups, each with within-group correlation c, behaves as in an uncorrelated network of finite size M/c. Thus, in the limit of a large presynaptic population, the correlation strength c scales the effective number of presynaptic inputs from M for c = 1 to infinity for c = 0. Importantly, the largest inhomogeneous eigenvector is such that when the homogeneous solution loses stability, the symmetry is broken between the correlated subgroups and not within each subgroup.
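The equivalence between a large correlated network and an uncorrelated network of effective size M/c can be checked numerically. The sizes below are illustrative, and the scaling of the effective correlation matrix as the input correlation matrix divided by τr is our assumption, carried over from the uncorrelated case.

```python
import numpy as np

# M groups of size N/M with within-group correlation c, none between
# groups (illustrative sizes)
N, M, c, tau_r = 1000, 4, 0.1, 0.2
block = (1 - c) * np.eye(N // M) + c * np.ones((N // M, N // M))
corr = np.kron(np.eye(M), block)

eigs = np.linalg.eigvalsh(corr / (tau_r * N))   # normalized by N
largest = eigs[-1]       # exact value: (1 + c*(N/M - 1)) / (tau*r*N)

# for large N this approaches c/(tau*r*M), i.e. the eigenvalue of an
# uncorrelated network of effective size M/c
assert np.isclose(largest, c / (tau_r * M), rtol=0.05)
# the top eigenvalue is M-fold degenerate: one mode per subgroup, so
# symmetry breaks between subgroups rather than within them
assert np.sum(np.isclose(eigs, largest)) == M
```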

The nature of the synaptic pattern that emerges once the homogeneous synaptic state loses stability depends on the number of afferent subgroups. Here we focus on the example of two equally sized subgroups (i.e., M = 2). A similar scenario, motivated by the problem of the activity-driven development of ocular dominance maps, has recently been studied by Miller and MacKay (1994) and Song and Abbott (2001). The regimes of symmetry breaking in which the learned synaptic efficacies segregate according to the two correlated input groups are depicted in Figure 5*A* (this figure is equivalent to Fig. 3*A*, with c replacing 2/N). Thus, symmetry breaking between two correlated subgroups can occur in nonadditive TAH learning models even when the number of presynaptic inputs N is large. This is demonstrated in Figure 5*B* (solid black line), which plots the learned synaptic efficacies as μ is varied, with c held fixed at 0.11. As is evident from Figure 5, for this level of correlation, symmetry breaking occurs below a fairly high value of μ ≈ 0.15. Note that in contrast to our treatment of the uncorrelated inputs, here we do not use α close to 1 but rather set it to a generic value of α = 1.5.

Figure 5, *C* and *D*, describes the behavior of the system as the within-group correlation is gradually turned on. As expected from the analysis of the uncorrelated input scenario, the substantial weight dependence of the synaptic changes induced when μ = 0.15 (solid black lines) yields a stable homogeneous synaptic state if the within-group correlation is sufficiently weak. However, when the correlation reaches a critical value, the homogeneous state becomes unstable and the synaptic efficacies segregate into the two input groups, with the one winning the competition suppressing the other. As the correlation increases still further, another transition may occur at a higher value of c, above which the homogeneous synaptic state becomes stable again. The presence of this second transition (which is discontinuous) depends on the values of τr, the expected number of input spikes per synapse arriving within its learning time window, and the ratio of depression and potentiation α (Fig. 5, compare *C* and *D*). Importantly, for large values of μ, in particular in the multiplicative model (μ = 1), the stabilizing force is so strong that the homogeneous synaptic state remains stable for all positive correlation strengths (Fig. 5*C*, *D*, dashed black lines; also see Appendix), and no segregation is possible.

The behavior described above for the linear neuron is reproduced qualitatively in simulations of the integrate-and-fire neuron, as shown in Figure 6*A*. To address the question of whether symmetry breaking in the integrate-and-fire neuron can also occur at higher values of μ, we follow the linear Poisson neuron analysis shown in Figure 5*A*, which suggests that increasing the value of α extends the μ range of bimodal synaptic distributions. Figure 6*B* displays the learned synaptic distributions as a function of μ for a within-group correlation c = 0.05, with α = 1.5 and r = 10 Hz. Similar to the linear neuron findings shown in Figure 5*B*, symmetry breaking occurs here in a wide regime of μ.

To emphasize important differences between symmetry breaking in nonlinear versus additive TAH learning, Figure 7 shows corresponding learned synaptic efficacies for selected cases of low, intermediate, and high within-group correlations. Figure 7, *A–D*, depicts learned weight distributions from Figure 6*A* for which μ = 0.019. For each correlation, synaptic efficacies resulting from additive learning are depicted on the right (Fig. 7*E–H*). Except in Figure 7, *A* and *B*, where c = 0 (i.e., no input subgroups are defined), the synaptic distribution of the subgroup with higher mean efficacy is depicted in light gray, whereas that of the subgroup with lower mean efficacy is displayed in dark gray.

Inspection of Figure 7, *A* and *B* versus *E* and *F*, shows that in the regime of low correlations, the learning behavior induced by the two types of plasticity is qualitatively different. While in nonlinear TAH learning (Fig. 7*A,B*), the homogeneous synaptic state is stable and all synapses distribute around the same mean efficacy, unstable additive learning induces symmetry breaking (Fig. 7*E,F*). Importantly, this symmetry breaking in general does not reflect the correlation structure in the afferent input. As shown in Figure 7*F*, when the within-group correlation is 0.03, the 500 synapses of the group winning the competition (light gray) split into two fractions of 325 versus 175 synapses, of which the larger fraction tends to zero efficacy and mixes with the efficacies of the losing input group. In contrast, in the nonlinear TAH model, unfaithful splitting of the weights occurs only for extremely small values of μ, of the order of 1/τrN (Fig. 7, compare *A* and *B* with *E* and *F*). This is because symmetry breaking within a uniformly correlated group does not occur for μ > 1/τrN, and hence the weights within each subgroup remain the same.

For intermediate strengths of the within-group correlation, both learning rules induce symmetry breaking that faithfully reflects the structure of the input correlation, with the synaptic distributions of the two input groups well separated. This is shown in Figure 7, *C* and *G*, for a correlation of c = 0.1. Note, however, that whereas in additive learning the efficacies of both input groups reach the respective boundaries of the allowed range (i.e., are clipped to saturation), the weights resulting from nonlinear TAH learning do not saturate. As we show in the next section, this property of nonlinear TAH plasticity enhances the sensitivity of the synaptic population to changes in the strength of the within-group correlation. Finally, when the within-group correlation is strong, all efficacies become large in both types of learning (Fig. 7*D,H*).

Clearly, the detailed quantitative properties of the learned synaptic patterns, as well as the parameter values at which symmetry breaking occurs, depend on the neuron model, and specifically on the spike generating mechanism. Nevertheless, the striking qualitative similarity in the findings from both neuron models investigated here suggests that the symmetry breaking induced by the within-group correlations is a general property of the nonlinear TAH rule with small but non-zero μ, independent of the specifics of the spike generator.

### Synaptic representation of input correlations

In the previous section, we studied the emergence of symmetry breaking in homogeneous synaptic populations for different types of instantaneously correlated input activity. In this section, we study the more general issue of how information about the spatiotemporal structure of the afferent input is imprinted into the learned synaptic efficacies by TAH plasticity. Specifically, we investigate how the weight dependence of the synaptic changes affects the sensitivity of the learning to features embedded in the input spike trains.

An example of the associated phenomena is shown in Figure 8. Here we study the effect of weight dependence on the steady-state synaptic efficacies of the integrate-and-fire neuron receiving 1000 Poisson inputs that comprise a small subgroup of 50 correlated synapses (c = 0.1) while all other input cross-correlations are zero. In this scenario, the subgroup is statistically distinct from the rest of the synaptic population. The coherence of spikes within the subgroup increases the causal correlation of the member synapses with the spiking activity of the postsynaptic neuron. Because of the ensuing cooperation between the correlated synapses, they grow stronger than those of the uncorrelated background. Figure 8 shows how the strength of the stabilizing drift induced by the weight dependence of the synaptic updating modulates the degree of separation between the two subpopulations. For decreasing values of μ, learning becomes increasingly affected by the correlation structure in the input, and the separation between the subgroup and the background is more pronounced. However, below a critical μ, the homogeneous state of the uncorrelated population loses stability and splits, resulting in a bimodal distribution of the background synapses. As a consequence, the representation of the afferent correlation structure in associated groups of synaptic efficacies is confounded by the mixing of the high-efficacy mode of the background with the subgroup of correlated synapses. This example raises the general problem of finding an optimal learning rule that, for a given type of input activity, compromises best between sensitivity and stability.

To address this question, we need a quantitative measure for the performance of a learning rule in imprinting information about the input correlations onto the synaptic efficacies. Here we apply the sensitivity measure S (Eq. 13, Materials and Methods), which quantifies the sensitivity of the learned synaptic state to changes in features embedded in the input correlation structure. When S is high, small changes in the input features are picked up by learning and induce a large change in the learned synaptic efficacies. We emphasize that the goal of this performance measure is to quantify and compare general properties of different plasticity rules. It is therefore based only on the relationship between the afferent neuronal inputs and the learned synaptic efficacies. In particular, it avoids direct reference to the neuronal output activity.

We first illustrate the application of the sensitivity measure by considering a simple example in which the input feature to be represented by the learned synaptic efficacies is only one dimensional (i.e., a scalar quantity). Specifically, we apply S to the scenario discussed in the previous section, of two independent input groups with within-group correlation c. We investigate the behavior of the linear Poisson neuron and quantify how the sensitivity of the learned synaptic distribution to the strength of the within-group correlation is affected by the weight dependence of the synaptic changes. We consider the sensitivity of the learning as a function of μ for a fixed correlation of c = 0.11. As shown in Figure 5, *B* and *C*, this correlation represents an intermediate correlation strength in the linear Poisson neuron treatment. Using the steady-state synaptic efficacies from Figure 5*B*, we compute S for values of μ between 0 and 0.5 (see Materials and Methods and Appendix). Figure 9 shows the resulting sensitivity curve. We note that each point quantifies the sensitivity of the learned synaptic weights to small changes in the correlation strength around c = 0.11.

As can be seen in Figure 5*B*, there are two qualitatively distinct regimes of synaptic distributions emerging from learning in this case. For high values of μ, no symmetry breaking takes place, and the correlation strength is represented by the common mean value of the synaptic efficacies. In this regime (μ ≳ 0.15), S decreases monotonically with increasing μ (Fig. 9), because the higher weight dependence strengthens the confinement of the homogeneous synaptic state to the center range of the synaptic efficacies. For lower values of μ, symmetry breaking occurs, and the correlation strength is represented by the mean efficacy values of the two resulting groups. In this regime, S is nonmonotonic in μ. For very low μ, the synaptic efficacies are close to saturation at the boundaries and, hence, a change in the correlation strength cannot induce a large change in the efficacies. However, the centralizing drift induced by a large μ reduces the sensitivity. Thus, S has a maximum at an intermediate μ (in the present case around μ = 0.02). Finally, at the transition between the regions of homogeneous and bimodal synaptic distributions (μ ≈ 0.15), sensitivity is large, because here a small change in c may cause an abrupt and large change in the synaptic efficacies, namely a bifurcation from a homogeneous to an inhomogeneous synaptic distribution. Note, however, that this transition region in μ is narrow.

We now turn to a richer input scenario in which the afferent correlation structure is inhomogeneous and the input feature space to be represented by the learned synaptic efficacies is high-dimensional. Specifically, we consider presynaptic activity in which each synapse receives spike inputs with a specific relative latency with respect to the remaining synaptic population. Such latency or delay-line scenarios have been studied previously in the context of additive TAH learning (Gerstner et al., 1996; Song et al., 2000) and can, for instance, be motivated by their analogy to certain delay-line models in auditory processing (Jeffress, 1948).

We consider the input activity to consist of N time-shifted versions of one common Poisson spike train with rate r. Because the synaptic learning process depends on the relative timing of the input spikes, we fix one presynaptic input as reference, and treat the remaining N − 1 delays Δ = (Δ_{1}, … , Δ_{N−1}) as an R = N − 1 dimensional vector of input features to be represented by the learned synaptic weights. Whereas the delays Δ fully specify the temporal correlation structure of the neuronal input activity, S measures the sensitivity of the learned synaptic efficacies to small independent changes in the individual delays. Because of the temporal sensitivity of TAH plasticity, it is intuitively clear that the learning dynamics will critically depend on the temporal scale of the relative delays. Although it is a natural choice to set this temporal scale through the SD of a Gaussian distribution from which the delays are drawn (Song et al., 2000; Aharonov et al., 2001; Gütig et al., 2001), we here apply the sensitivity measure to the simpler case in which we fix Δ such that the delays between the N inputs are uniformly spaced at a fixed delay ς/(N − 1) [i.e., Δ_{i} = iς/(N − 1) (i = 1, 2, … , N − 1)]. We have checked that the qualitative behavior of S in the case of a fixed delay spacing is similar to that of S_{avg} (see Materials and Methods) obtained from averaging over an ensemble of Gaussian delay vectors with SD ς (Aharonov et al., 2001; Gütig et al., 2001).
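The equidistant delay-line input ensemble can be generated as follows; this is a sketch, and the function name, units, and random seed are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def delay_line_inputs(N, rate, T, sigma):
    """N time-shifted copies of one Poisson train (rate in Hz, T in s).

    Input i is the master train delayed by i*sigma/(N-1) seconds: the
    equidistant-delay scenario with total temporal spread sigma.
    """
    n_spikes = rng.poisson(rate * T)
    master = np.sort(rng.uniform(0.0, T, n_spikes))
    delays = np.arange(N) * sigma / (N - 1)
    return [master + d for d in delays], delays

trains, delays = delay_line_inputs(N=10, rate=20.0, T=5.0, sigma=0.02)
# uniform spacing: successive delays differ by sigma/(N-1)
assert np.allclose(np.diff(delays), 0.02 / 9)
# every train is the same spike pattern, rigidly shifted
assert np.allclose(trains[3] - trains[0], delays[3])
```

Because each train is a rigid shift of the same master train, the pairwise cross-correlations are sharp peaks at the delay differences, which is exactly the structure TAH plasticity reads out.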

We investigate here the behavior of the linear Poisson neuron. One important difference between the delay-line input scenario considered here and the input correlations treated above is that here non-zero cross-correlations between input spike trains also exist at negative time lags. Specifically, if the delays of the input activities of synapses i and j are given by Δ_{i} and Δ_{j}, respectively, and the additional delay of the postsynaptic neuron is ε (Eq. 5), the delay difference Δ_{i} − (Δ_{j} + ε) determines the temporal position of the sharp peak in the otherwise zero effective correlation between the two shifted Poisson inputs (Eq. 7). If this delay difference is negative, the output activity contributed by the j th synapse lags behind the input spikes at the i th synapse. Hence, the j th synapse contributes to the potentiation of synapse i, and the respective effective causal correlation C^{+}_{ij} is positive. Correspondingly, in this case the backward effective correlation C^{−}_{ij} contributed by synapse j to the depression of synapse i is zero. Conversely, if the delay difference between the i th and j th input spike trains is positive, synapse i is depressed by the activity of synapse j, because C^{−}_{ij} becomes positive. In both cases, the magnitude of the effective correlation is scaled by the exponentially decaying time dependence of the learning rule (Eq. 1). The full expressions for the effective correlation matrices C^{+} and C^{−} are given in the Appendix.

To calculate S for a given delay vector Δ, we numerically solve the drift equation of the synaptic learning (Eq. 8) for the synaptic steady state. Using the resulting learned synaptic distributions, we compute the susceptibility matrix χ (Eq. 12), giving S (Eq. 13). Figure 10*A* shows the sensitivity S as a function of μ for different values of the temporal delay spacing ς. The curves clearly show an optimal weight dependence of the synaptic changes for which the sensitivity peaks. For larger values of μ, the performance of the learning deteriorates because the increasing confinement of the synaptic weights to the central range of efficacies restricts the sensitivity of the learning to changes in the input correlation structure. Conversely, for lower values of μ, the sensitivity is impaired because the synaptic efficacies begin to saturate at the boundaries of the allowed range as bimodal efficacy distributions emerge. The value of μ that optimally adjusts the weight dependence of the synaptic changes depends both on the system parameters and on the input correlations determined by the relative time delays between the inputs. Increasing ς (i.e., increasing the relative delays) weakens the effective correlations between the presynaptic inputs because of the exponentially decaying temporal extent of the learning rule (Eq. 7). Hence, a lower weight dependence of the synaptic changes (corresponding to a lower value of μ) is needed to pick up the correlations and allow sufficient sensitivity of the learning to the input delays. The effect of this change in the temporal extent of synaptic interactions on the learned efficacies is shown in Figure 11, which for each μ depicts all N synaptic efficacies for ς = τ (Fig. 11*A*) and ς = 4τ (Fig. 11*B*).
Note that because of the equidistant delays, the relationship between the relative temporal position of a synapse within the presynaptic population and its steady-state efficacy is monotonic, with the leading synapse (Δ = 0) taking the largest weight. In the foreground, the corresponding sensitivity curves are shown. The plots clearly demonstrate that the saturation regime in which most synaptic weights accumulate at the boundaries of the allowed range (black and white) begins at higher values of μ when the temporal dispersion of the inputs is small (Fig. 11*A*) (i.e., the synaptic interactions are strong). The plot also reveals that in both cases for low values of μ, only the leading synapse remains at a high value. Finally, it can be seen that the peaks in the sensitivity curves approximately coincide with those values of μ for which the synaptic weights smoothly cover a large range of efficacies, as shown by the gradual change from dark to light values in the corresponding vertical cross sections.

Finally, we ask how the learning sensitivity depends on the statistics of the input delays for a fixed value of μ. To answer this question, Figure 10*B* shows S as a function of the delay-line spacing ς, demonstrating that S does not vary monotonically with ς, but rather has a maximum at an optimal temporal separation of the inputs. This is because tight spacing leads to strong effective correlations between the inputs, driving the synapses toward saturation. On the other hand, loose spacing reduces the effective correlations between the presynaptic inputs to the extent that the learning behaves essentially as if driven by an uncorrelated presynaptic population.
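The numerical procedure underlying these sensitivity curves, relaxing the drift dynamics to a steady state and differentiating that state with respect to the input features, can be sketched generically. The toy drift below stands in for the paper's Equation 8, and the relaxation scheme and finite-difference step are our own choices.

```python
import numpy as np

def fixed_point(drift, delta, w0, dt=0.1, steps=5000):
    """Relax dw/dt = drift(w, delta) to its stable steady state,
    keeping efficacies clipped to [0, 1] (simple explicit scheme)."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = np.clip(w + dt * drift(w, delta), 0.0, 1.0)
    return w

def susceptibility(drift, delta, w0, h=1e-5):
    """chi[k, a] = d w*_k / d delta_a by central finite differences."""
    delta = np.array(delta, dtype=float)
    w_base = fixed_point(drift, delta, w0)
    chi = np.zeros((w_base.size, delta.size))
    for a in range(delta.size):
        dp, dm = delta.copy(), delta.copy()
        dp[a] += h
        dm[a] -= h
        chi[:, a] = (fixed_point(drift, dp, w0)
                     - fixed_point(drift, dm, w0)) / (2 * h)
    return chi

# toy drift standing in for the learning dynamics: each weight relaxes
# to a target exp(-delta_k), so chi is diagonal with -exp(-delta_k)
toy = lambda w, d: np.exp(-d) - w
chi = susceptibility(toy, [0.1, 0.5], [0.5, 0.5])
assert np.allclose(chi, np.diag(-np.exp(-np.array([0.1, 0.5]))), atol=1e-4)
```

With the true drift equation in place of the toy, a scalar sensitivity such as S would then be computed from χ according to Equation 13.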

## Discussion

The understanding of activity-dependent refinement of neural networks has long been one of the central interests of synaptic learning studies. In this context, most investigations of unsupervised learning using correlation-based plasticity rules have been conducted in the framework of additive plasticity models, which do not incorporate explicit weight dependence in the changes of synaptic efficacies. These simple models suffer from stability problems: either all synapses decay to zero or they grow without bound. An additional problem inherent to simple Hebbian models is the lack of robust competition. Indeed, it has been found that even the inclusion of synaptic depression mechanisms does not provide a robust source of synaptic competition, unless synaptic plasticity is fine-tuned to approximately balance the amount of potentiation and depression (Miller, 1996).

Recent studies of experimentally observed temporally asymmetric Hebbian learning rules have added two new ideas. One idea is that under these plasticity mechanisms, synapses compete against each other in controlling the time of firing of the target cell and, thus, engage in competition in the time domain. Although TAH learning rules are indeed inherently sensitive to temporal correlations between the afferent inputs, we have shown here that this sensitivity alone is not sufficient to resolve the problems associated with either stability or competition. In the additive model of TAH plasticity, hard constraints need to be imposed on the maximal and minimal synaptic efficacies to prevent the runaway of synaptic strength. In addition, as was shown here, in this model synaptic learning is competitive only when the ratio between depression and potentiation is fine-tuned, and even then the emergent synaptic patterns do not necessarily segregate the synaptic population according to the correlation structure in the neuronal input. The second idea is that TAH rules would exhibit novel behavior because of the role of the nonlinear spike-generation mechanism of the postsynaptic cell (Song et al., 2000). In fact, we have shown in this work that the qualitative features of TAH plasticity are strikingly insensitive to the nonlinear integration of inputs in the target cell (see also Kempter et al., 2001). For the parameter choices studied, the properties of the synaptic steady states in the integrate-and-fire neuron are qualitatively similar to those found in a linear input–output model for neuronal firing. Nevertheless, we note that there are substantial quantitative differences between the two models, particularly with respect to the parameters τrN and c, which effectively determine the correlations between the presynaptic and postsynaptic spike trains. 
Although a quantitative analysis of these differences is beyond the scope of our work, such a study might reveal interesting insights into the quantitative effects of the details of the postsynaptic spike generator on the learned synaptic distributions. In addition, it is possible that the details of the spike-generation mechanism will affect the transient phase (i.e., the dynamics) of the synaptic learning process.

From the present work, we conclude that some of the underlying difficulties in correlation-based learning are alleviated by nonlinear plasticity rules such as the NLTAH rule. The nonlinear weight dependence of the synaptic changes provides a natural mechanism to prevent runaway of synaptic strength. As in additive TAH learning, synaptic competition is provided by the mixture of depression and potentiation. However, in NLTAH plasticity, the balance between depression and potentiation is maintained dynamically by adjusting the steady-state value of the synaptic efficacies. Indeed, we have shown that this competition is sufficient to generate symmetry breaking between two independent groups of correlated presynaptic inputs. However, for this to occur, the stabilizing drift induced by the weight dependence of the synaptic changes should not be too strong. In particular, the simple linear weight dependence (Eq. 2, with μ = 1) assumed in the original multiplicative model is incapable of breaking the symmetry between competing input groups. In fact, we have shown that with μ = 1, the homogeneous synaptic state is stable for any pattern of homogeneous input correlations, provided there are no negative correlations in the afferent activity. The present power-law plasticity rule with 0 < μ < 1 provides a reasonable balance between the need for a stabilizing force and a potential for spontaneous emergence of synaptic patterns. Our study of symmetry breaking between two competing groups of correlated synapses is inspired by the activity-dependent development of ocular dominance selectivity. This scenario has also been studied recently by Song and Abbott (2001) using the additive version of TAH plasticity. In their model, achieving a faithful splitting between the two competing input groups with weak correlations requires relatively tight tuning of the depression to potentiation ratio, α.

One of the surprising results of our investigation is the possibility that when the correlation within input groups is made strong, the stability of the homogeneous synaptic state may be restored. We have shown that this apparently counterintuitive behavior, predicted by the analytical study of the mean synaptic dynamics of the linear Poisson neuron, is also seen in simulations of the full learning rule in the integrate-and-fire neuron. It would be interesting to explore possible experimental testing of this result, perhaps in the context of the development of ocular dominance. In this work, we have limited ourselves to correlated subpopulations of inputs with positive within-group correlations. However, in general, negative correlations are an additional potential source of competition (Miller and MacKay, 1994). Furthermore, we have not addressed the important issue of competition between synapses that target different cells. Lateral inhibitory connections between target neurons may provide a source of such competition.

The last part of the present work addresses situations with inhomogeneous input statistics. Different inputs are distinct in their temporal relationship to the rest of the input population. Here the issue is not whether a spatially modulated pattern of synaptic efficacies will form through TAH learning, but rather whether this pattern will efficiently imprint the information embedded in the input statistics. To quantify the imprinting efficiency of the learning rule, we introduced a new method for measuring learning rule sensitivity. In the present context, this measure quantifies the amount of information about the temporal structure in the inputs that a TAH rule can store. Using this method to study the novel class of NLTAH plasticity rules introduced here, we find that the optimal learning rule depends on the input statistics, in the present example on the characteristic time scale of the temporal correlations between the inputs. This finding suggests that biological systems may have acquired mechanisms for metaplasticity to adapt the learning rule to slow temporal changes in the input statistics. It should be pointed out that the sensitivity measure S focuses entirely on how the learned synaptic distribution changes as a result of changes in the correlation pattern among the input channels. It does not, however, address the problem of “readout,” namely how the resulting changes in the synaptic distribution affect the firing pattern of the output cell. A measure that takes the postsynaptic spike train into account will in general depend on the details of the spike-generating mechanism rather than only capture the properties of the learning rule. In general, however, any readout mechanism will depend on the information that is available in the learned synaptic state. Hence, if the learning itself is insensitive to changes in the input features, the synaptic efficacies will fail to represent these changes and no readout mechanism will be able to extract them. 
The sensitivity measure S therefore provides an upper bound on the learning performance of the full neural system (including readout). In summary, while quantitative claims about the optimality of specific learning rules have to consider specific readout mechanisms, our study of the general properties of the investigated plasticity rules provides insights into the mechanisms that enable unsupervised synaptic learning to remain sensitive to input features during learning.

Present experimental results (Bi and Poo, 1998) based on the averaging of individual efficacy changes in different synapses suggest the possibility that indeed the ratio of depressing and potentiating synaptic changes increases in a stabilizing manner as synapses grow stronger (cf. van Rossum et al., 2000). However, available data do not provide conclusive evidence regarding the details of the weight dependence of the efficacy changes. Our work clearly demonstrates the importance of the weight dependence of the TAH updating rule. Synaptic learning rules that implement a stabilizing weight dependence of the type introduced in this work have several advantageous properties for learning in neural networks. Specifically, our results predict that synaptic changes should be neither additive nor multiplicative, but rather should feature intermediate weight dependencies that could, for instance, result from a gradual saturation of the potentiating and depressing mechanisms. It will be interesting to see whether future experimental results will confirm such a prediction. In this context, it is also important to note that recent experiments and modeling studies reveal important nonlinearities in the accumulation of synaptic changes induced by different spike pairs (Castellani et al., 2001; Senn et al., 2001; Sjöström et al., 2001) as well as evidence for complex intrinsic synaptic dynamics that challenges the simple notion of a scalar synaptic efficacy (Markram and Tsodyks, 1996). The theoretical implications of these sources of nonlinearity and intrinsic dynamics remain to be explored.

## Appendix

### Generating correlated spike trains

We show here that two spike trains that are generated by conditioning their binwise spike probabilities on the activity of a common reference spike train X_{0}(T), as described in Materials and Methods, have a pairwise correlation coefficient c. For clarity, we denote X_{i}(T) simply by X_{i}. The pairwise correlation coefficient is defined by Cov(X_{i}, X_{j})/√[Var(X_{i})Var(X_{j})], where the covariance is Cov(X_{i}, X_{j}) = E[X_{i}X_{j}] − E[X_{i}]E[X_{j}]. Because X_{i} is either 0 or 1, E[X_{i}] = P(X_{i} = 1), and therefore:
where in the final step we use Equation 11. Note that because E[X_{i}] = rΔT, the spike train X_{i} has rate r. Similarly:
Finally, because Var(X_{i}) = (rΔT)(1 − rΔT), the binwise correlation coefficient becomes:
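The construction above can be sketched in a few lines of code. The exact binwise prescription is Equation 11 of Materials and Methods, which is not reproduced here; we assume the common form in which each train's conditional spike probability is biased toward the reference train with weight √c, P(X_i = 1 | X_0) = rΔT + √c(X_0 − rΔT), which preserves the rate r and yields a pairwise correlation coefficient of c. Variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

r, dT = 20.0, 1e-3           # rate (Hz) and bin width (s), so E[X_i] = r*dT
c, N, T = 0.2, 5, 200_000    # target correlation, no. of trains, no. of bins
p = r * dT

# Reference train X0 and assumed conditional spike probabilities:
# P(X_i = 1 | X0) = p + sqrt(c) * (X0 - p). Marginally each X_i is
# Bernoulli(p), and Cov(X_i, X_j) = c * p * (1 - p), giving correlation c.
X0 = rng.random(T) < p
prob = p + np.sqrt(c) * (X0 - p)
X = rng.random((N, T)) < prob     # N correlated trains, conditioned on X0

corr = np.corrcoef(X)             # empirical pairwise correlation matrix
off_diag = corr[~np.eye(N, dtype=bool)]
print(off_diag.mean())            # ≈ 0.2
```

With 200,000 bins the empirical pairwise correlations settle within a few thousandths of the target c.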

### Homogeneous synaptic steady state for a homogeneous population of synapses

We derive the homogeneous synaptic steady-state solution by setting ẇ_{i} = 0 and w_{i} = w* in Equation 8 with C^{−} = 0:
Using Equation 14, this yields:
which, discarding the trivial solution in which all synapses are zero, implies that at the homogeneous steady state Δf(w*) = C_{0}f_{+}(w*) (Equation 18). From this, we find that the homogeneous solution is given by w_{i} = w*, where w* is the solution to Equation 15. Because C_{0} is positive, this equation has a unique solution 0 < w* < 1 for any μ > 0. Note, however, that in the additive model where μ = 0, the ratio f_{−}(w)/f_{+}(w) = α and, hence, in general there is no homogeneous synaptic steady state unless w* = 0 or all weights are clipped to the upper boundary.

For uncorrelated input activity, C^{+}_{ij} = δ_{ij}/(τr) and hence C_{0} = 1/(τrN). Substituting C_{0} into Equation 15, we obtain the homogeneous solution for this input scenario (Equation 19).
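The fixed point can be computed in closed form. Assuming the power-law weight dependence of the main text, f_{+}(w) = (1 − w)^μ and f_{−}(w) = αw^μ, the steady-state condition f_{−}(w*)/f_{+}(w*) = 1 + C_{0} (our reading of Eq. 15) gives w* = q/(1 + q) with q = [(1 + C_{0})/α]^{1/μ}. A minimal sketch, with parameter values chosen only for illustration:

```python
import numpy as np

def homogeneous_fixed_point(mu, alpha, C0):
    """Solve f_-(w*)/f_+(w*) = 1 + C0 for the power-law rule
    f_+(w) = (1 - w)**mu, f_-(w) = alpha * w**mu (mu > 0, alpha > 0)."""
    q = ((1.0 + C0) / alpha) ** (1.0 / mu)   # q = w* / (1 - w*)
    return q / (1.0 + q)                      # unique solution in (0, 1)

# Example: uncorrelated inputs, C0 = 1/(tau*r*N)
tau, r, N = 0.02, 20.0, 100
C0 = 1.0 / (tau * r * N)
for mu in (0.05, 0.5, 1.0):
    w = homogeneous_fixed_point(mu, alpha=1.05, C0=C0)
    # verify the fixed-point condition f_-(w)/f_+(w) = 1 + C0
    assert abs(1.05 * w**mu / (1 - w)**mu - (1 + C0)) < 1e-9
    print(mu, w)
```

The solution lies strictly inside (0, 1) for every μ > 0, in line with the uniqueness argument above; at μ = 0 the ratio f_{−}/f_{+} degenerates to the constant α and no interior solution exists.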

### Stability of the homogeneous synaptic steady state

We analyze the stability of the homogeneous synaptic steady state by deriving the time evolution of small perturbations δw_{i} = w_{i} − w* of the synaptic efficacies w_{i} around the steady-state value w*. If these perturbations decay to zero, the homogeneous steady state is stable. For small perturbations, the time evolution is given by:
Using the expression for the synaptic drifts from Equation8, we obtain:
where:
Equation 20
In the second step, we used Equation 15 to substitute for 1 + C_{0}. Note that the factor g_{o} is proportional to the derivative of the ratio between the scales of negative and positive synaptic changes with respect to the weights. Hence, g_{o} = 0 in the additive model. For μ > 0:
Equation 21
is positive because 0 < w* < 1. In matrix notation, the time evolution of a synaptic perturbation δ**w** can be rewritten as:
with the matrix:
If the eigenvalues of J are negative, all perturbations of the homogeneous state are attenuated by the learning dynamics and, hence, this synaptic state is stable. In contrast, if any eigenvalue of J is positive, a perturbation along the direction of the corresponding eigenvector will grow exponentially. The matrix J has a homogeneous eigenvector with eigenvalue −Ng_{o} − NΔf(w*) + NC_{0}f_{+}(w*), which using Equation 18 reduces to −Ng_{o}. Because this eigenvalue is always negative, the homogeneous component of any perturbation δ**w** decays to zero with rate λτr^{2}g_{o}. In contrast, the temporal evolution of the strictly inhomogeneous component of δ**w** (whose elements sum to zero) comprises a spectrum of rates that are determined by the various eigenvalues of J that correspond to inhomogeneous eigenvectors. The largest inhomogeneous eigenvalue of J is f_{+}(w*)NC_{1} − Ng_{o}, where NC_{1} denotes the largest inhomogeneous eigenvalue of C^{+}. Hence, the homogeneous synaptic state is stable if f_{+}(w*)NC_{1} − Ng_{o} < 0, which gives the stability criterion stated in Results (Eq. 16). Inserting Equations 15 and 21 into the criterion (Eq. 16), we obtain an upper bound for μ_{crit}, the largest value of μ for which the homogeneous solution is unstable (Equation 22). An important observation is that C_{1} ≤ C_{0}, and hence this bound is necessarily smaller than 1, implying that the homogeneous solution is always stable in the multiplicative model where μ = 1. To see that C_{1} ≤ C_{0}, recall that because NC_{1} is an eigenvalue of C^{+}, there is an eigenvector **v** such that C^{+}**v** = NC_{1}**v**. Specifically, for v_{m}, the largest component of **v**, this implies that NC_{1}v_{m} = ∑_{j} C^{+}_{mj}v_{j}, and hence NC_{1} = ∑_{j} C^{+}_{mj}(v_{j}/v_{m}) ≤ ∑_{j} C^{+}_{mj}. But because ∑_{j} C^{+}_{mj} = NC_{0} (Eq. 14), this yields C_{1} ≤ C_{0}.

For uncorrelated input activity, the above bound for μ_{crit} becomes:
(Equation 23), where we used C_{0} and C_{1} from Equation 17. Hence, for large τrN, the regime of μ for which symmetry breaking exists vanishes at least as fast as 1/(τrN).
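The bound C_{1} ≤ C_{0} used above can be checked numerically for any symmetric, nonnegative effective correlation matrix with homogeneous row sums. A circulant matrix is a convenient test case because its row sums are equal by construction; the kernel below is our own toy choice, not one of the paper's correlation structures:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64

# Symmetric, nonnegative circulant matrix: all row sums are equal, so the
# homogeneous vector is an eigenvector with eigenvalue N*C0 (the analog of
# the effective correlation matrix C+ in the text).
k0 = rng.random(N)
kernel = (k0 + np.roll(k0[::-1], 1)) / 2.0   # kernel[m] == kernel[(N - m) % N]
idx = (np.arange(N)[None, :] - np.arange(N)[:, None]) % N
C = kernel[idx]                               # C[i, j] = kernel[(j - i) % N]

evals = np.sort(np.linalg.eigvalsh(C))
NC0 = kernel.sum()        # common row sum: homogeneous (Perron) eigenvalue
NC1 = evals[-2]           # largest inhomogeneous eigenvalue
assert abs(evals[-1] - NC0) < 1e-8
print(NC1 <= NC0)         # True: C1 <= C0 for nonnegative correlations
```

Because the entries are nonnegative, the Perron eigenvalue equals the common row sum NC_{0}, and every inhomogeneous eigenvalue lies below it, exactly as the eigenvector argument above requires.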

### Additive TAH in the linear neuron: uncorrelated inputs

The drift of the ith synapse of a neuron receiving uncorrelated inputs and implementing the additive model is given by setting μ = 0 in Equation 8 with C^{+}_{ii} = 1/(τr) and all other effective correlations equal to 0 (Equation 24). As explained above, this linear system has no steady state. Imposing the boundary conditions by clipping the efficacies results in all synapses taking the value of either 0 or 1. Thus, the learned synaptic distribution is fully described by n_{up}, the fraction of synapses that are saturated at the upper boundary. For a fraction n_{up} to be consistent, the drift of a synapse with efficacy 0 must be nonpositive, whereas the drift of a synapse with efficacy 1 must be non-negative. From imposing these conditions in Equation 24 we get:
The first inequality implies α ≥ 1 if n_{up} > 0. It is important to note that the regime of α < 1 would simply yield saturation of all efficacies at the upper boundary because all synapses experience a positive drift. The second condition yields:
However, it can be shown using methods similar to those of Rubin et al. (2001) that n_{up} = 1/[2τrN(α − 1)] (Equation 25), where if this quantity is >1, n_{up} = 1. Therefore, if α ≤ 1 + 1/(2τrN), all synapses will saturate at the upper boundary, whereas if α > 1 + 1/(2τr), even a single synapse at the upper boundary will experience a negative drift, and hence no synapse will saturate at 1.

Moreover, the firing rate of the linear neuron is given by r^{post} = n_{up}r = 1/[2τN(α − 1)], which is independent of the input firing rate r [except for very low rates, where all synapses become strong (i.e., n_{up} = 1 and r^{post} = r)]. Thus, output rate normalization is a property of the linear neuron when the additive model is used.

### Uniformly correlated inputs in the linear neuron

When the presynaptic inputs are uniformly correlated, namely c_{ij} = c ≥ 0 for all i ≠ j (c_{ii} = 1), the effective correlation matrix is C^{+}_{ij} = [c + δ_{ij}(1 − c)]/(τr), and hence C_{0} = [1 + (N − 1)c]/(τrN) and C_{1} = (1 − c)/(τrN) (Equation 26). Because C_{0} increases with c, the correlation increases the value of the synaptic efficacy in the homogeneous synaptic state w* (Eq. 15). To see the effect of the correlation on the stability of the homogeneous solution, note that because C_{1} decreases with c, the correlation decreases the value of the critical μ (Eq. 16), and hence increases the stability of the homogeneous solution. Moreover, because both C_{0} and C_{1} decrease with 1/(τrN), for large τrN the critical μ (Eq. 22) approaches zero. Thus, for any μ > 0, the homogeneous synaptic state is stable when the effective population is sufficiently large.
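The eigenstructure of the uniformly correlated case can be verified directly: the homogeneous eigenvalue of C^{+} is NC_{0} = [1 + (N − 1)c]/(τr) and the (N − 1)-fold inhomogeneous eigenvalue is NC_{1} = (1 − c)/(τr) (our reading of Equation 26). A quick numerical check:

```python
import numpy as np

tau, r, N, c = 0.02, 20.0, 50, 0.3
# C+_{ij} = [c + delta_ij * (1 - c)] / (tau * r): off-diagonal c, diagonal 1
Cp = (c + np.eye(N) * (1.0 - c)) / (tau * r)

evals = np.sort(np.linalg.eigvalsh(Cp))
NC0 = (1.0 + (N - 1) * c) / (tau * r)  # homogeneous eigenvalue (row sum)
NC1 = (1.0 - c) / (tau * r)            # (N-1)-fold inhomogeneous eigenvalue

assert abs(evals[-1] - NC0) < 1e-8
assert np.allclose(evals[:-1], NC1)
# C0 grows with c while C1 shrinks with c: strong uniform correlations
# raise w* but stabilize the homogeneous state.
```

Increasing c trades the two eigenvalues against each other, which is exactly the mechanism behind the counterintuitive stabilization by strong correlations discussed above.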

### Computing the susceptibility matrix χ for the linear neuron

Here we compute the susceptibility matrix χ (Eq. 12, Materials and Methods) used in Results to evaluate the sensitivity measure S for the learning process in the linear Poisson neuron model. This matrix is obtained by the implicit function theorem. In the synaptic steady state (**w***), the synaptic drifts are zero by definition, and hence:
where R is the dimension of the space of input features. Using:
and denoting:
we obtain:
The matrices χ and M^{o} are of dimensions N by R, and M is an N by N matrix. Below we derive χ for the two input scenarios studied in Results.
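The implicit-function-theorem computation, χ = −M^{−1}M^{o} with M = ∂ẇ/∂**w** and M^{o} = ∂ẇ/∂Φ, can be illustrated on a toy linear drift. The drift function below is our own stand-in, not the paper's Equation 8; the point is only the χ recipe and its finite-difference check:

```python
import numpy as np

# Toy drift with a steady state depending on a scalar feature phi:
# F_i(w, phi) = phi * b_i - (A @ w)_i, so w*(phi) = phi * A^{-1} b
# and the susceptibility is chi = dw*/dphi = A^{-1} b = -M^{-1} M_o.
rng = np.random.default_rng(2)
N = 6
A = np.eye(N) * 2.0 + 0.1 * rng.random((N, N))   # diagonally dominant, invertible
b = rng.random(N)

def steady_state(phi):
    return np.linalg.solve(A, phi * b)

phi = 0.5
M = -A                           # M = dF/dw at the steady state
M_o = b.reshape(N, 1)            # M_o = dF/dphi (R = 1 feature)
chi = -np.linalg.solve(M, M_o)   # implicit function theorem: chi = -M^{-1} M_o

# finite-difference check of dw*/dphi
eps = 1e-6
chi_fd = (steady_state(phi + eps) - steady_state(phi - eps)) / (2 * eps)
assert np.allclose(chi.ravel(), chi_fd)
```

For the paper's scenarios, M and M^{o} are the analytic Jacobians of the drift in Equation 8 with respect to the weights and the input features, evaluated at the numerically obtained steady state, and χ has one column per feature (R columns).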

#### Two correlated input groups

For the case of two correlated subgroups, the sensitivity to the within-group correlation is measured. Hence, the input feature is Φ = c with R = 1. Using Equation 8 with C^{−} = 0, C^{+}_{ij} = c/(τr) if i ≠ j are in the same subgroup (C^{+}_{ii} = 1/(τr)), and C^{+}_{ij} = 0 otherwise, we derive the expressions for M and M^{o}:
where j ∈ 𝒢_{i} if synapses j and i are in the same group.

#### Delayed Poisson inputs

For a neuron receiving time-shifted versions of a common Poisson spike train ρ_{0}, the input processes are ρ_{i}(t) = ρ_{0}(t − Δ_{i}) (i = 1, … , N), where Δ_{i} is the delay of the activity at synapse i. Substituting these into Equation 6, we obtain the effective correlation matrices (Eq. 7):
(Equation 27), where Θ denotes the Heaviside step function (Θ(x) = 1 if x ≥ 0 and Θ(x) = 0 otherwise). In this case, the input features are Φ = Δ with R = N − 1. Based on Equations 8 and 27, we derive the expressions for M and M^{o}. For compactness, we define the matrix of relative delays D_{ij} = (ε + Δ_{j} − Δ_{i}), with Δ_{N} = 0 corresponding to the delay of the reference spike train. Thus:
where all sums are taken over the N weights. The expressions are evaluated at the synaptic steady state **w***, which is obtained by numerically simulating the learning equations for all N synapses.

## Footnotes

↵* R.G. and R.A. contributed equally to this work.

Partial funding from the Studienstiftung des deutschen Volkes, the Institut für Grenzgebiete der Psychologie, the Large Scale Facility Program of the European Commission, the Horowitz Foundation, the German–Israeli Foundation for Scientific Research and Development, the Volkswagen Foundation, the Israel Science Foundation (Center of Excellence 8006/00), and the USA–Israel Binational Science Foundation is gratefully acknowledged. We thank the staff of the Methods in Computational Neuroscience 2000 summer course at Woods Hole, Prof. E. Ruppin, and O. Shriki for useful discussions. We gratefully acknowledge the valuable discussions with Prof. L. Abbott that inspired our present investigation. We thank Prof. A. Aertsen and the anonymous referees for helpful comments and suggestions.

Correspondence should be addressed to Dr. Haim Sompolinsky, Racah Institute of Physics, Hebrew University of Jerusalem, Jerusalem 91904, Israel. E-mail: haim{at}fiz.huji.ac.il.

R. Gütig's present address: Institute for Theoretical Biology, Humboldt University, 10115 Berlin, Germany.