Abstract
A common problem in tasks involving the integration of spatial information from multiple senses, or in sensorimotor coordination, is that different modalities represent space in different frames of reference. Coordinate transformations between different reference frames are therefore required. One way to achieve this relies on the encoding of spatial information with population codes. The set of network responses to stimuli in different locations (tuning curves) constitutes a set of basis functions that can be combined linearly through weighted synaptic connections to approximate nonlinear transformations of the input variables. The question then arises: how is the appropriate synaptic connectivity obtained? Here we show that a network of spiking neurons can learn the coordinate transformation from one frame of reference to another, with connectivity that develops continuously in an unsupervised manner, based only on the correlations available in the environment and with a biologically realistic plasticity mechanism (spike timing-dependent plasticity).
- spatial transformations
- population coding
- correlation-based learning
- spike timing-dependent plasticity
- proprioception
- vision
Introduction
To integrate information from different senses and to perform sensory-guided motor tasks, the brain must perform transformations between different frames of reference. In general, these transformations must be learned and remain plastic, to some degree, in adult life.
A biologically plausible approach to performing such computations is to use basis functions (Salinas and Abbott, 1995; Pouget and Sejnowski, 1997; Pouget and Snyder, 2000), a general method for approximating nonlinear functions. In a neuronal context, the population response patterns for different stimulus locations serve as a set of basis functions. The desired transformation function can be approximated by taking a weighted sum of basis functions, i.e., by connecting the population representing the input variable to a population representing the transformed variable, with appropriate synaptic weights. Basis function networks have been used to model both sensorimotor transformations (Salinas and Abbott, 1995; Pouget and Sejnowski, 1997) and cross-modal integration (Deneve et al., 2001) with predictions that are in good agreement with experimental results.
For basis function networks to be a viable candidate for the actual mechanism of spatial transformations in biological nervous systems, we must show that the appropriate pattern of weights can be learned based only on interactions with the environment and using biologically realistic learning rules. These criteria are not met by existing models of basis function networks, for which synaptic weights are set by hand (Pouget and Sejnowski, 1994; Deneve et al., 2001; van Rossum and Renart, 2004), learned with a supervised learning rule (Pouget and Sejnowski, 1997) or, with a Hebbian-like rule, learned during a separate training phase, in which the outputs are not driven by the inputs but are set to the desired values (Salinas and Abbott, 1995).
We hypothesized that spike timing-dependent synaptic plasticity (STDP) (for review, see Dan and Poo, 2004) could form the substrate for learning spatial transformations in a basis function network. In a spiking network model with population coding of stimulus location and with STDP, the interaction of input correlations and lateral connections imposes structure on the connectivity pattern so as to form cortical maps (Song and Abbott, 2001), i.e., the simple topographic mapping x → x. It occurred to us that if a different structure could be imposed, then more complex transformations, x → f (x), could be learned.
Because potentiation–depression is favored by correlated–uncorrelated inputs, respectively (Song et al., 2000), the appropriate pattern of weights may be achieved by imposing an appropriate pattern of correlations. For learning cross-modal or sensorimotor spatial transformations, an obvious source for these correlations is the input from the other modality or measurement of the motor outcome. An output population having a preexisting, simple topographic mapping from one modality may learn to perform a more complex transformation from a different modality, because connections from input neurons selective for x to those output neurons being activated by a “training” input f (x) (through the preexisting topographic connections) will be potentiated, whereas other connections will be depressed. Ultimately, an input x will elicit an output f (x), even in the absence of the training input.
Here we show that such a system is indeed able to learn coordinate transformations without an external supervisor. We explore the conditions for successful learning and make specific predictions, for example, of interactions between the temporal scales of STDP and of the learning environment and of the transformation accuracy as a function of stimulus location.
Materials and Methods
Model description.
For functions of one variable, the network consists of three populations, each consisting of a one-dimensional array of cells, of size N (see Fig. 1). The “input population” makes all-to-all excitatory connections to the “output population.” The strength of these connections is modifiable by STDP, with a maximum synaptic weight of wmax. Connections from the “training population” to the output population are topographic and local, in the sense that neurons in the middle of the presynaptic population project only to neurons in the middle of the postsynaptic population, etc.: where wijTr is the connection weight between presynaptic neuron i and postsynaptic neuron j (see text after Eq. 4), dij is the shortest distance between i and j given the periodic boundary conditions (0 ≤ i,j < N), σTr is the range of connections, and ρTr is the ratio of training–output to input–output maximum weights. (All parameter values are given in supplemental Table 1, available at www.jneurosci.org as supplemental material). In the standard model, the strength of these connections is fixed. We also consider the case in which these connections are modified by STDP. In this case, we also add local excitatory and global (all-to-all) inhibitory connections within the output population. Specification of excitatory connections follows Equation 1, but with ρTr and σTr replaced with ρRc and σRc, respectively. The strength of the global inhibitory connections is given by ρInh wmax. We use periodic boundary conditions, i.e., a neuron in the training population at position N will make connections to the neuron at position 1 as well as that at position N−1. It is possible to learn multiple functions of a given input variable simultaneously: for each function, one training and one output population is required. For functions of two variables, the input population is a two-dimensional array of size N× N. The synaptic time delays are zero for the connections from the input and training populations to the output population and for inhibitory connections within the output population, and 2 ms for excitatory connections within the output population. Each cell in the output population receives excitatory, “background” Poisson input at 1000 Hz through a synapse with weight wb. This single high-frequency input is equivalent to the combined inputs of a large number of lower-frequency afferents.
Because the input and training populations do not themselves receive any input, they are modeled simply as spike sources. The output population cells are modeled as integrate-and-fire neurons. The membrane potential of neuron j, Vj, obeys the following: where τm is the membrane time constant, Rm is the membrane resistance and where Ijb is the background noise synaptic current and Iijk(t) (k = {In, Tr, Rc, Inh}) are the synaptic currents from the input, training, recurrent excitatory, and recurrent inhibitory connections. Note that the recurrent connections are only used for the subset of simulations in which the training–output connections are not fixed but are subject to plasticity (see Fig. 9), in which case recurrent connections are necessary for correct weight-pattern formation. The cell fires an action potential when Vj reaches a threshold value of 1 and is then reset to 0. There is no refractory period. Excitatory synaptic inputs are modeled as a step change in the synaptic current followed by exponential decay, i.e., the excitatory currents Iijk(t)(k={In,Tr,Rc}) obey the following: Iijk(t) is incremented by an amount wijkI0exc after each spike, where wijk is the synaptic weight and the value of I0exc is calculated such that a single presynaptic spike with wijk = 1 will bring the membrane potential from rest (zero) precisely to spike threshold (1). This normalization means that the precise value of Rm is unimportant. The background noise current evolves similarly. Inhibitory synaptic currents follow an α-function-like time course. The inhibitory current Iijinh(t) obeys the following: ξij is incremented by an amount winhI0inh by every spike. The value of I0inh is calculated such that a single presynaptic spike with winh = 1 will bring the membrane potential from rest (zero) precisely to a membrane potential of −1.
Use of a direct change in synaptic current rather than a change in conductance for the synaptic inputs is somewhat less realistic but has the advantage that the times of threshold crossings can be obtained by successive approximations that converge to the true value (Hines and Carnevale, 2004) rather than by integrating the differential equations. The former method is much faster than the latter. To test that this simplification did not appreciably affect our conclusions, we repeated some simulations with conductance-based (excitatory) synapses [the model described by Song and Abbott (2001)]; we found no qualitative difference in the formation of the weight pattern. The default value of wmax used in our simulations was chosen to match approximately the peak conductance parameter gmax used by Song and Abbott (2001). From the resting membrane potential, their default value of gmax = 0.02 (in units of the leakage conductance) produces a depolarization of amplitude 0.233 mV, or 1.2% of the difference between rest and firing threshold. This amplitude would be somewhat smaller near threshold, because the driving force would be reduced from 74 to 54 mV. In our model, the default value of wmax = 0.02 produces a depolarization the amplitude of which is 2% of the difference between rest and threshold. Thus, the current-based and conductance-based models are comparable, although not identical.
STDP models.
In most of our simulations we use the mechanism for STDP proposed by Song et al. (2000). This takes into account multiple spike interactions and not just the most recent spikes, with two functions, P(t) and M(t), for each synapse. P(t) starts at zero and is incremented an amount A+ by each presynaptic spike. M(t) starts at zero and is decremented an amount A− by each postsynaptic spike. Between increments–decrements, P(t) and M(t) decay exponentially toward zero with time constants τ+ and τ−, respectively. When a presynaptic spike occurs, the synaptic weight is modified according to w → w + wmaxM(t), where wmax is the maximum possible synaptic weight (for brevity, we henceforth use w to represent any of the connection weights wijk). Because M(t) is zero or negative, this amounts to a depression of the weight. If this would take w below zero, w is set to zero. When a postsynaptic spike occurs, the synaptic weight is potentiated according to w → w + wmax P(t). If this would make w > wmax, w is set to wmax. For a single pair of spikes, a presynaptic spike at time tpre and a postsynaptic spike at time tpost, with no recent preceding activity, the change in synaptic weight is given by the following (supplemental Fig. 1A, available at www.jneurosci.org as supplemental material): The values of A+ that we used, 0.01 and 0.001, were in the region of the value (0.005) determined previously (Song et al., 2000) from experimental measurements by dividing the amount of potentiation caused by multiple spike pairs by the number of pairs. The ratio A−τ−/A+τ+ was taken as 1.06, as in Song and Abbott (2001), although the precise value of this ratio is not critical (see supplemental note and supplemental Fig. 6, available at www.jneurosci.org as supplemental material).
We also consider a model with soft bounds for the synaptic weights, rather than the hard bounds at zero and wmax. In this case, w → w + wM(t) for depression and w → w + (wmax − w)P(t) for potentiation, giving, for a single pair of spikes, the following: In some simulations, we consider an STDP mechanism that is symmetric, i.e., whether the change is positive or negative does not depend on the sign of the interval Δt between presynaptic and postsynaptic spikes but only on its magnitude. For short intervals, we have potentiation; for long intervals, we have depression (supplemental Fig. 1B, available at www.jneurosci.org as supplemental material). This rule does not use the P(t) and M(t) functions, and so only interactions between nearest-neighbor spikes are considered, as follows: where Δt = tpost − tpre, Asymm controls the step size for synaptic change, and τa and τb are parameters that control the shape of the weight-change function f(Δt) (supplemental Fig. 1B, available at www.jneurosci.org as supplemental material). τa is the interval at which f(Δt) crosses from positive to negative, and τb controls the range of intervals within which STDP occurs and is analogous to τ+ and τ− in the Song–Miller–Abbott model (Song et al., 2000). Such a symmetric form for STDP has been found at GABAergic synapses in hippocampal neurons (Woodin et al., 2003). The values for τa and τb were determined by approximate fitting of the weight pattern to that obtained with the symmetric rule (supplemental Fig. 2, available at www.jneurosci.org as supplemental material).
In the Song–Miller–Abbott model (Song et al., 2000), the balance of potentiation and depression is controlled by the ratio A+τ+/A−τ−. In the symmetric STDP model, the balance is controlled by the ratio τa/τb. If this ratio is greater than √2, the integral of f(Δt) from zero to infinity is positive, and so potentiation will dominate.
Learning procedure.
The input and training neurons generate spikes according to a Poisson process with a time-varying mean firing rate. The following procedure is identical to that of Song and Abbott (2001), but with a different spatial distribution of firing rates. An interval T is chosen from an exponential distribution with mean interval τcorr. A random location θ in the input reference frame is chosen from a uniform distribution between 0 and 2π. From this, a location f(θ) in the training/desired output reference frame is calculated, where f(·) is the function that we wish the network to learn, and mapped onto the range 0 to 2π. The mean firing rate for a cell with a tuning curve that is centered at location Θ is then the following: where Thus R = Rmax at Θ = θ′. Note that R(·) is a periodic function, i.e., a neuron with a receptive field centered at one end of the input range will respond also to inputs at the opposite end of the range. Use of such periodic boundary conditions avoids the problem of edge effects, whereby locations at the edge of the input range would be encoded by a smaller population than locations in the center, although checks showed that the choice of periodic boundary conditions was not critical for the model (supplemental Fig. 7, available at www.jneurosci.org as supplemental material). The cells fire with rates determined by R(θ, Θ) for a time T, and then another interval and a new location are chosen and the process is repeated. Spike arrival times may be delayed by a constant latency. For our default case, the latency was chosen to be the same for the input and training populations, but the effect of non-zero differences in latency was investigated (see Results).
For the two-dimensional case, the mean firing rate for an input population neuron at location (Θ, Φ) is as follows: where For the training population neurons, the tuning curve is the same as for the one-dimensional case.
Position estimation from population spiking activity.
The question of how nervous systems can optimally read out a population code (in our case, estimate the stimulus location) is the subject of much study (Seung and Sompolinsky, 1993; Lewis and Kristan, 1998; Deneve et al., 1999). Because this is not the subject of our current investigation, we chose to use a simple, non-neural-based method as follows: each spike train was convolved with a Gaussian kernel with SD 100 ms and area 1, to give a smooth measure of mean firing rate over time. The mean location, ȳ, was calculated at 10 ms intervals by summing the neuron indices weighted by the firing rate of that neuron at that time and dividing by the sum of the firing rates across the population (Georgopoulos et al., 1986), as follows: where Ri is the firing rate of the neuron and N is the population size. The periodic boundary conditions introduced a little complexity into this method, because a peak could be split, one part at one end of the array and one at the other. The method was therefore modified as follows. For ξ ≤ N/2: whereas for ξ > N/2: where ξ is the nearest integer to ȳ. To determine ξ, every i was tested as a value of ξ, and the i that gave both the smallest difference | ξ − i | and the smallest variance for y was used for ξ.
Numerical methods.
The model was implemented in the NEURON simulation environment in an event-based framework (Hines and Carnevale, 2004), with the CVODE integration method and an absolute tolerance parameter of 0.001. The NEURON code for the model is available from the SenseLab ModelDB database (http://senselab.med.yale.edu/SenseLab/ModelDB).
Results
For clarity, we consider a specific example of a coordinate transformation: calculating the position in visual, Cartesian coordinates of the hand on an arm with one or two degrees of freedom, based on the joint angle(s) obtained from proprioception. The arm is highly simplified, with two segments of equal length with free rotation in a plane about each of two joints.
During random arm movement, the network receives input from both sensory systems and learns the correlations between these signals at different points in space. After sufficient time, the visual input is no longer necessary, and the network can calculate the position of the hand in visual coordinates based only on the proprioceptive information. The network is shown in Figure 1. It consists of one population of cells for each input modality and one for the network output. Each population represents a spatial variable or location using a population code. The pattern of connections from the visual population to the output population is fixed and is assumed to have been developed at an earlier stage. The strengths of connections from the proprioceptive population to the output population are initially random and subject to change according to STDP. Full details are given in Materials and Methods.
Although we focus on a particular example, the network is capable, in principle, of learning any single-valued function of one variable. For generality, therefore, we label the three populations as the training (here vision), input (here proprioception) and output populations.
Learning spatial transformations
We consider first a one-dimensional coordinate transformation, which corresponds to an arm with one joint fixed. The function to be learned is f(θ) = sin(θ). After <1000 s, an “S”-shaped band becomes visible in the matrix of synaptic weights (Fig. 2A). As the training proceeds, connections within this band (henceforth described as the S-band) are potentiated, whereas those outside are depressed. The process converges by ∼20,000 s (Fig. 2B). The final distribution of weights is strongly bimodal, in accord with the findings of Song et al. (2000). The mean firing rate across the population changes little over the course of the learning: from 7.0 Hz during the first 10 s to 8.4 Hz after 50,000 s. Despite the small change in mean firing rate across the population and a reduction in the total weight of all synapses onto a given cell during the course of training, there is a large increase in the incidence of small interspike intervals (ISIs) (Fig. 2C). This is in accord with previous findings that STDP tends to reduce spike latency (Song et al., 2000; Guyonneau et al., 2005).
To test whether this S-shaped pattern of weights does indeed cause the network to perform the transformation sin(θ) (i.e., enable the system to predict the location of the hand based only on proprioceptive information, with no visual input), the training input was removed (“eyes closed”), the STDP mechanism was inactivated, and then θ was swept over the entire range. The output population does indeed produce a good approximation to sin(θ) (Fig. 2D), although it tends to underestimate near the extremes (sin(θ) = ±1). The reasons for this underestimation were investigated further (see below, Influence of population tuning curve shape and Fig. 6).
The mechanism for the formation of the bands of potentiated weights is the potentiation of correlated inputs shown by Song et al. (2000). Only weights that are activated simultaneously by both the input population and the training population are potentiated. Because of competition between synapses (Song et al., 2000), connections outside the bands are depressed. The center of each band is defined by those connections that are activated simultaneously by the peak of both input and training tuning curves. The distribution of postsynaptic–presynaptic pair intervals is plotted in Figure 3 as a function of the distance from the center of the S-band. Close to the mid-band line, we see an excess of small intervals between postsynaptic and presynaptic spikes and an excess of positive-over-negative intervals (Fig. 3, left panel). In the region of near-zero weight connections, we see the reverse effect: a deficit of short intervals in general and of short positive intervals in particular (Fig. 3, right panel). It is the location-dependent excess–deficit of positive intervals that leads to the formation of the weight patterns with the asymmetric STDP rule; however, the even greater relative excess–deficit of short intervals in general suggests that a symmetric STDP rule would give the same pattern with a shorter learning time. This proves to be the case, although the outcome is strongly dependent on the size of the time window for plasticity, because this controls the balance of potentiation–depression (supplemental Fig. 2, available at www.jneurosci.org as supplemental material).
Figure 4 illustrates some of the other functions that can be learned by the network. For linear functions, the function approximation is equally accurate for all input values, although the accuracy decreases as the number of input values that give the same output value increases. For nonlinear functions, the accuracy is poor near the extremes of the output range, particularly for trigonometric functions. The higher the order of the trigonometric function, the poorer the accuracy. This is the same phenomenon as the underestimation noted two paragraphs previously.
The network performs equally well for coordinate transformations in two dimensions (Fig. 5), although the time required for learning to converge is increased.
Influence of population tuning curve shape
We hypothesized that the poorer accuracy of trigonometric function approximation near the extremes of the function range, noted above, is attributable to the width of the tuning curves. Our hypothesis was that for functions that have multiple input values giving a single output value, the input values compete with one another, because each is correlated with the training input. The degree of competition is larger the larger the width of the tuning curve. The winning input is the one that has the most neighbors, i.e., the one in the widest part of the band at a given output neuron index. This can be seen in Figure 6A. For high values of σR, the peaks of the S-band are “stunted” (i.e., the peak amplitude is smaller and the peak is flatter) compared with the small-σR cases. The effect of this is that the network response at the extremes of the range has a bias toward the center (Fig. 6B). The effect is also larger the closer together are the weight bands; hence the deterioration of approximation accuracy in the series of functions sin(nθ) as n increases, seen in Figure 4. For linear functions, competition also exists, but the width of the weight band is constant across the function range, and so the effect on function accuracy is not systematic, although it may increase the noise in the approximation. This effect is in accordance with standard results from traditional radial basis function networks: too large a basis function width leads to over-smoothed output functions and too narrow a width leads to under-smoothed output functions (Bishop, 1995).
Thus, in general, the smaller the width of the population tuning curve, the better the function approximation for nonlinear, many-to-one functions such as the trigonometric functions. The narrower the tuning curve, however, the higher the input firing rates required, and there is clearly a lower limit at the point where the tuning curve is so narrow that it activates only a single cell. Close to this limit, the approximation becomes noisy (see the graphs for σR = 0.01 in Fig. 6A,B and the plot of approximation error against σR in Fig. 6C).
Turning from the width to the height of the tuning curve, the network is robust to changes in the peak input firing rate (we tested 15 Hz ≤ Rmax ≤ 240 Hz) when the asymmetric STDP rule is used, but it is much less robust with the symmetric rule (supplemental Fig. 3, available at www.jneurosci.org as supplemental material). The lower limit for Rmax is hard to determine because of the low rate of learning when there are few spikes. At 5 Hz, a faint S-band very slowly becomes visible, but the off-band synapses do not become depressed, so the noise in the transformation would be considerable.
The learning procedure is also robust to differences between the peak firing rates of the input and training populations. For a ratio of RmaxTraining/RmaxInput in the range 0.6 to 1.4, there is almost no discernible change in the weight pattern, and the pattern remains identifiable, although degraded, for ratios as small as 0.1: a peak training firing rate of only 6 Hz for the default input Rmax of 60 Hz. With no activity in the training population, the preferred input location for a given output neuron is randomly located, as shown in Song and Abbott (2001), their Figure 2.
The effect of having a non-location-specific baseline input firing rate, i.e., non-zero Rmin, is similar to the effect of increasing the background activity in the output population and is discussed below (see below, Influence of network activity and connectivity).
Influence of the spatiotemporal structure of learning
The procedure that we use for generating correlated activity [taken from Song and Abbott (2001)] has a number of drawbacks as a biologically realistic training–learning procedure. First, it assumes that all points in the input space are visited with equal frequency; second, it requires dwell times (i.e., the time that an input stays in a particular location) that are approximately the same as the STDP time constants (∼20 ms with our default parameters); and third, it ignores travel time between input locations. The second and third assumptions may be reasonable for saccades but not for arm movements, our principal example here.
To test the first point, whether all input points must be visited with equal frequency, we performed simulations in which the input angle θ was chosen from a normal distribution (mean, 180°; SD, 72°). The S-band forms first for synapses from neurons encoding locations that were most often visited and only slowly forms for rarely visited locations; however, by ∼40,000 s the entire pattern has formed and stabilized, as for uniform coverage. In contrast, when the coverage is uniform except for a region that is never visited (of width 72°), a horizontal band of synapses retains their original weights, and the output population is unable to respond to inputs at that location.
Concerning the second point, the requirement for short dwell times, Song and Abbott (2001) found that there was a fall-off in the correlation-induced potentiation as the correlation time (the mean dwell-time), τcorr, increased, but that this fall-off was reduced considerably by increasing the time constant of synaptic depression, τ−. We tested how this fall-off affects the formation of weight patterns and found, as expected from the result of Song and Abbott (2001), that increasing τcorr degrades the pattern severely and ultimately degrades the pattern completely; however, if a longer time constant (Feldman, 2000), on the same order as the correlation time, is used for STDP, or if a symmetric STDP rule is used, the pattern formation is preserved (Fig. 7). The symmetric STDP rule is more robust because in those regions where the input and training inputs are not strongly correlated, the deficits of both positive and negative small postsynaptic–presynaptic intervals (compare Fig. 3, right panel) contribute to the dominance of depression, whereas with the asymmetric rule, the deficit of positive intervals must outweigh the deficit of negative intervals, which is only the case when τcorr is small. The shape of the distribution of dwell times and the individual spike train statistics do not appreciably affect the weight pattern (for details, see supplemental note, available at www.jneurosci.org as supplemental material).
To address the third point, the instantaneous movement between input locations, we trained the network with input locations that were changing smoothly and continuously rather than jumping from point to point. The input location moved alternately clockwise and counterclockwise, with a constant speed, for random time periods drawn from a uniform distribution between 100 and 1100 ms. This input pattern results in successful training, provided the speed of motion is sufficiently fast. For τ− = 20 ms, the speed must be ≥2°/ms; for τ− = 100 ms, a speed of 0.2°/ms is sufficient. The latter value is similar to the maximum joint speeds seen in infants during spontaneous arm movements (Thelen et al., 1993), although more typical values are in the region of 0.05°/ms (Bhat et al., 2005). For slower movements, regions in which the rates are correlated but low are potentiated in addition to the usual S-band of connections with correlated, high firing rates. The requirement for high speeds is related to that for short dwell times in the standard, saltatory input movement: inputs must fire together to produce potentiation but then quickly stop firing to avoid depression. With further tuning of parameters, it is probable that slower, smooth movement could give results comparable to saltatory movement; this is a subject for future study.
We would expect that the temporal structure of learning will also interact with the membrane time constant. The effect of the membrane time constant is rather complex. The optimal value, given the default parameters, is in the range 10 to 20 ms. Outside these values, the weight pattern begins to degrade, although, interestingly, after a nadir around τm = 40 ms, the pattern recovers considerably. The root mean squared positional error (compare Fig. 6C) is 4.7% of the range at the default τm = 20 ms, 18.0% at 40 ms, and 6.3% at 400 ms.
We have so far assumed that the input and training signals are exactly coincidental; however, the latency required for a signal to travel from peripheral receptors to the region in which coordinate transformations are performed (e.g., the parietal cortex) is likely to be different for the two modalities. For example, in the ventral intraparietal area of macaque monkeys, the mean latency of visual signals is ∼86 ms, whereas that of tactile signals is ∼31 ms (Avillac et al., 2005). Furthermore, in our example of a moving arm, tracking eye movements are required to follow the movement, so the visual input is likely to lag behind the proprioceptive. We therefore investigated how robust the learning is to differences in signal latency between the input and training signals (Fig. 8).
We define the difference in latency, Δlat, to be positive when the input signal precedes (has lower latency than) the training signal and to be negative otherwise. We found that the upper limit of the window of Δlat within which the correct transformation is learned is approximately proportional to (approximately three to five times) the correlation time τcorr. The above-mentioned latency difference found experimentally in macaque falls within this window. The lower limit is smaller in amplitude and depends on both τcorr and the time constant for the depression part of the STDP curve, τ−. When both quantities are 100 ms, the limiting value for Δlat is approximately −30 ms. As the difference in latency becomes more negative than this, there is another window in which the reverse weight pattern is learned (Fig. 8A).
The mechanism for this may be understood with reference to the histogram of postsynaptic–presynaptic intervals, Δt (Fig. 3). Initially, the postsynaptic firing time is determined mainly by spikes from the training population, because those from the input population have random weights. If the latency difference, Δlat, is positive (input earlier than training), the peak in the interval histogram is pushed toward positive Δt, into the potentiation part of the STDP curve. Therefore, learning will only begin to fail when the peak of the interval histogram exceeds the correlation time. In the converse case, in which Δlat is negative, the histogram peak is pushed toward negative Δt, because the training spike causes an action potential and then the input spike (with longer latency) arrives. With no difference in latency, there is an excess of postsynaptic–presynaptic pairs with small positive intervals. When a sufficient amount of this excess has been pushed into the depression part of the STDP curve, weights that are normally potentiated will be depressed. Because of the competitive nature of the Song–Miller–Abbott (Song et al., 2000) STDP rule, weights for which there is reduced correlation between inputs will be potentiated, leading to the observed reversal in the weight pattern.
Influence of training pathways
Learning the correct pattern of synaptic weights for the connections from the input population to the output population clearly requires a topographic pattern of connections from the training population to the output population; however, the precise strength and range of these connections are not critical (see supplemental note, available at www.jneurosci.org as supplemental material), but there is a certain minimum level of input to the output population from the training population that is required to mold the weight patterns effectively.
We now asked whether the weight pattern is stable if the training input is removed altogether but the STDP mechanism is still active. This is important because the intention is to be able to predict, for example, the location of the hand based only on proprioceptive signals without visual input. This has straightforward applications such as being in the dark or in the case of becoming blind. Must there be an upper boundary to the critical developmental period, after which weights must be frozen, or can plasticity be ongoing?
In fact, the weight pattern is very stable (supplemental Fig. 4, available at www.jneurosci.org as supplemental material), although the degree of stability depends on the peak step-size of the weight change (A+ and A−). We conclude that differences between the plasticity rules operating during development and during adulthood are not required (nor are they precluded) and that this mechanism can operate in the adult brain and subserve slow adaptation. If a completely different training pattern is applied (consider wearing prism glasses, for example), the originally learned weight pattern is quickly forgotten and then a new one forms.
Up to this point we have made the assumption that the connection weights from the training population to the output population (visual synapses, in our example) have been determined by an earlier phase of development and no longer undergo plasticity. We believe that this assumption is a reasonable one (see Discussion) but nevertheless examine situations in which plasticity is ongoing in the training connections. If we start from a situation in which the training connections have been prespecified, the questions are whether the preexisting pattern is stable with the addition of the new modality and whether the input to output weights develop as desired. Figure 9A indicates that this does seem to be the case: the diagonal band undergoes some changes but remains recognizably diagonal.
If both input and training connection weights start from random values, then recurrent excitatory and inhibitory connections within the output population (see Materials and Methods) and larger values of A+ (i.e., 0.1 rather than 0.01) are required for smooth weight bands to form. Even then, the connections develop with an arbitrary phase, i.e., the network maps θ to sin(θ + ψ), where ψ takes a random value and tends to wander as training continues (Fig. 9B). If the global inhibitory weights within the output population are too weak, then the recurrent excitatory weights tend to saturate in such a manner that ψ precesses, moving always in the same direction. On the basis of this, we conclude that it is necessary for the training pathways (visual pathways, in our example) to develop first (see Discussion) but that plasticity mechanisms can remain active even after this development has taken place.
Influence of plasticity rules
We have shown above that both asymmetric and symmetric STDP rules can produce the desired weight patterns, although the symmetric rule is more robust to the temporal structure of the sensory inputs, whereas the asymmetric rule is much less dependent on the amplitude of the population tuning curve. The precise amplitudes and time constants of potentiation and depression are not critical, provided they obey certain conditions (see supplemental note and supplemental Figs. 5 and 6, available at www.jneurosci.org as supplemental material); however, the dependence of the amplitude of weight changes on the current weight is considerably more important. Clearly, the formation of the banded pattern relies on the bimodal distribution of synaptic weights produced by the hard boundaries (no dependence of Δw on w) of the Song–Miller–Abbott (Song et al., 2000) STDP mechanism. For soft weight boundaries (linear dependence of Δw on w) that give a unimodal distribution (Gütig et al., 2003), a banded pattern also forms, but the modulation of the weight pattern by correlation is much weaker; however, the location calculated from the population activity still accurately follows the sine curve, although it is much noisier than with the bimodal distribution (Fig. 10). The combination of a unimodal distribution with pruning of weaker synapses produces patterns similar to the hard-boundary mechanism; however, with the default parameters used in our simulations, the threshold for pruning has to be a substantial fraction (∼0.8) of the mean synaptic weight.
Influence of network activity and connectivity
The network behavior is not strongly dependent on the size or connectivity of the network, provided that the level of activity to each cell is maintained approximately the same by adjusting other parameters. For example, a network of 500 neurons per population in which each input cell connects to 20% of output cells behaves similarly to a network with populations of 100 neurons in which each input cell connects to all output cells. For networks smaller than this, increasing the maximum firing rate Rmax proportionally to the decrease in network size has a similar effect. The lower limit on network size is dependent on the number of input locations that give a similar output (compare Fig. 4), but the network performs surprisingly well with very few neurons. With N = 20, the network can learn to perform f(θ) = sin(3θ) about as well as a network of 100 neurons per population but fails on f(θ) = sin(4θ), whereas the network with N = 100 succeeds.
We chose to use periodic boundary conditions to avoid edge effects. For problems involving a body-centered coordinate system, such boundary conditions are consistent with the physical situation, because a stimulus could be at any point on a full circle around the body, although whether such conditions are implemented in the brain in the pattern of connections and how such connections could be formed are open questions. For problems involving arm movements, periodic boundary conditions are not appropriate, because the ranges of joint motion are limited. We performed simulations with nonperiodic boundary conditions in which the arrays of neurons were padded with extra neurons with receptive fields centered outside the input range. With no padding, the population estimate is inaccurate when the output location is near the edge of the array, because there is only half a tuning curve at the edge. When the padding is 0.2 N cells at each edge (about half the width of the S-band), there is a little nonspecific activity in the neurons at the edge of the array, but this has little effect on the population estimate, which is comparable in accuracy to the network with periodic boundary conditions (supplemental Fig. 7, available at www.jneurosci.org as supplemental material).
For an asymmetric learning rule with hard bounds on the weights, increased activity biases the distribution toward the peak at zero (Song et al., 2000). If there is insufficient activity in the network, the “off-band” weights are not depressed toward zero. If there is too much, then all weights are depressed toward zero, and the band of potentiated weights fades out (supplemental Fig. 8, available at www.jneurosci.org as supplemental material). The effect has a complex dependence on whether the activity is presynaptic or postsynaptic. Adding background noise to each postsynaptic cell, adding a non-location-specific term (Rmin) to the tuning curve, increasing the height of the tuning curve (Rmax), or increasing the peak synaptic weight (wmax) all separately have the similar effect of depressing off-band weights toward zero; however, combining both background noise and nonspecific input leads to a failure of learning. With Rmin = 10 Hz and low levels of background noise, the weight band forms as desired. If the background noise is increased, however, then off-band weights are not depressed (the opposite to what would be seen with Rmin = 0), and the weight pattern appears almost random.
Discussion
The principal conclusion of this study is that the connectivity patterns required for performing coordinate transformations between sensory frames of reference can be learned in an unsupervised manner, with the STDP learning rule, based on simultaneous observations of moving stimuli with two sensory modalities. This adds support to the hypothesis that basis function networks are actually used in biological neural systems [see Pouget and Sejnowski (1997) for a fuller discussion of the evidence supporting the existence of basis function networks in parietal cortex].
Correlation-based learning in spiking models
The idea of using basis function networks to perform coordinate transformations has been developed extensively (Salinas and Abbott, 1995; Pouget and Sejnowski, 1997; Deneve et al., 2001; Deneve and Pouget, 2003), but most existing models consider only the mean firing rate. The principal disadvantage of this approach is the difficulty in relating the model to underlying biological mechanisms and hence in constraining it with biological data. In contrast, an approach that models synaptic conductances and individual action potentials gives a more direct correspondence between model parameters and experimentally measurable quantities and hence allows the model to be more tightly constrained by experiments and to make more testable predictions. An example in this study is the interaction between the experimentally determined time constants of the STDP rule and the temporal structure of the motion that drives learning (Fig. 7). Another example is the prediction that position estimation should be poorer near the limits of the movement range, which is a direct consequence of the intersynapse competition inherent in the STDP rule used. One existing report has shown that a basis function network can be implemented with spiking neurons (van Rossum and Renart, 2004), but this was shown only for calculation of the sum of two variables and with synaptic weights set by hand.
Because the output of a basis function network is a linear weighted sum of the basis functions, determining the correct weights requires simply linear regression, calculated off-line or by using a supervised learning rule; however, neither one of these methods is available to biological nervous systems. A method that uses a Hebbian-like learning rule was developed and analyzed by Salinas and Abbott (1995). As in our model, the coordinate transformation is learned based on the correlations between randomly generated inputs and independent measurements of the transformed locations. During the learning phase, however, the input and output populations are decoupled, and the desired output response is applied directly to the output population: the input neurons do not drive the output neurons. In contrast, our model has no need for a separate training phase with decoupled neurons.
The idea of learning transformations through observation of randomly generated movements [“motor babbling” (Bullock et al., 1993)] has been widely used in models of sensorimotor transformations (Kuperstein, 1988; Gaudiano and Grossberg, 1991; Burnod et al., 1992; Bullock et al., 1993), with weights determined by error-correcting or generalized operant conditioning methods. Studies in kittens and infant monkeys have shown that visual feedback from limb movements is essential in the development of visually guided reaching (Hein, 1974; Held and Bauer, 1974). For humans, opposing external forces during spontaneous movements requires sight of the limb in newborns (van der Meer et al., 1995) but not in 3-month-old infants (Dibiasi and Einspieler, 2004). This also provides support for our finding that visual connections to the coordinate transformation network must develop before proprioceptive connections. This idea can also be applied to cross-modal integration as opposed to sensorimotor transformations, because the random inputs can be generated by an external mechanism rather than by the organism itself. The only requirement is for two independent measurements of the location.
Time-scale issues
The strongest influence on the learning behavior of the network is the interaction of the different time-scales in the model: of the STDP rule, of the rapidity of change in input location, and of the membrane time constant. To summarize, for saltatory movement, the mean dwell time at any one location must be on the same order as the STDP time constant for depression, τ− (Fig. 7), although this requirement is considerably relaxed for the symmetric STDP rule. Similarly, for continuous motion, the speed must be sufficiently high that the arm moves a distance approximately equal to the receptive field size during a time τ−. This is a strong constraint, which may possibly be eased if the two modalities do not always attend to the same stimulus, e.g., arm movements could be slower if the eyes make saccades to other regions of the visual field (during which the two modalities are not correlated) and only return periodically to observe the hand. The faster of the modalities would then be the limiting one. In general, however, we should expect to find either asymmetric STDP with τ− ≫ τ+ or symmetric STDP in the brain regions that are responsible for coordinate transformations. If the mismatch between STDP and movement timescale is too great, further mechanisms would be required.
Despite the temporal precision of the STDP rule, the model is quite robust to differences in response latency between the two modalities, particularly if the input signal precedes the training signal, as is found experimentally for tactile and visual signals in ventral intraparietal cortex in the macaque (Avillac et al., 2005). Here again, the robustness of the model is increased if τ− is longer than τ+.
Biological realism of the model
This model uses integrate-and-fire neurons, which are considerably lacking in complexity compared with real neurons. We have attempted, however, to set firing rates and excitatory postsynaptic potential sizes to biologically realistic values.
Our standard model relies on the training–output synaptic weights being fixed, and the input–output synapses being subject to plasticity (although we show that a preexisting pattern of plastic training weights is not appreciably degraded by the addition of the input connections). There are several cases in which different classes of excitatory synapses onto a single postsynaptic neuron have been shown to exhibit such differential plasticity. A classic example is the different NMDA dependence of commissural–associational and mossy fiber synaptic plasticity in hippocampal CA3 pyramidal neurons (Nicoll and Malenka, 1995). It has been shown more recently that synapses onto hippocampal CA1 interneurons that express group II metabotropic glutamate receptors (mGluRs) presynaptically exhibit mGluR-dependent long-term potentiation, whereas those that express group III mGluRs do not (Lapointe et al., 2004). To our knowledge, differential plasticity at different synapses has not been shown for postsynaptically expressed plasticity, although it has been shown that the subunit composition of AMPA receptors on hippocampal interneurons can be determined by the afferent identity (Tóth and McBain, 1998), which implies that the enzymatic machinery underlying plasticity with a postsynaptic locus might also be so regulated.
A criticism (van Rossum et al., 2000) of STDP mechanisms with hard bounds on the synaptic weights, such as that of Song et al. (2000), is that the resulting bimodal distribution of synaptic weights does not match the distributions of miniature EPSC amplitudes or AMPA receptor densities measured experimentally (cf. O’Brien et al., 1998), which are unimodal with positive skew. The validity of our model does not depend critically on a bimodal distribution of synaptic weights, because a network with soft-bounded STDP is also able to learn the desired transformation, albeit with increased output noise. We point out, however, that a bimodal distribution of weights where the maximum synaptic weight wmax follows a unimodal distribution may be difficult to discriminate from a unimodal distribution of weights where wmax has a single value, because synapses with zero weight [silent synapses (Malenka and Nicoll, 1997)] are likely to be difficult to detect and hence undercounted experimentally.
We have shown that the appropriate patterns of connectivity for basis function networks could be learned by using a spike timing-dependent, local learning rule. We do not make strong claims regarding in which areas of the brain such networks may exist. A strong case has been made previously that the response properties of neurons in the parietal cortex are in excellent agreement with the predictions of a basis function network model for the specific case of coordinate transformations from retinal to head-centered coordinates (Pouget and Sejnowski, 1997). The superior colliculus is another area in which there is strong evidence for sensorimotor–coordinate transformations (Jay and Sparks, 1984; Stuphorn et al., 2000).
Although we use the transformation from joint angle to visual coordinates as our principal example, the model we have presented here is not intended as a detailed model of arm movements (Baraduc et al., 2001; Joshi and Maass, 2005). In particular, proprioceptive tuning curves appear to be monotonic (Helms Tillery et al., 1996) rather than Gaussian as in our model, although two monotonic tuning curves with opposite slopes can be combined to give a peaked tuning curve (Salinas and Abbott, 1995).
Additional studies are therefore required to determine whether the principles elucidated in this study will still apply when the neuronal and circuit diversity of known cross-modal–coordinate transformation areas is included in the model and when the model is extended to encompass the entire sensorimotor loop.
Footnotes
- Received June 20, 2005.
- Revision received April 7, 2006.
- Accepted April 11, 2006.
-
This work was supported by the European Union under the Life-Like Perception Programme, project reference IST-2001-34712 (SenseMaker) and the Bio-inspired Intelligent Information Systems program, project reference IST-2004--15879 (FACETS). We thank A. Destexhe and M. Rudolph for helpful comments on this manuscript.
- Correspondence should be addressed to Andrew P. Davison, Unité de Neurosciences Intégratives et Computationnelles, Centre National de la Recherche Scientifique, 91198 Gif sur Yvette, France. Email: davison{at}iaf.cnrs-gif.fr