Abstract
A sensory stimulus evokes activity in many neurons, creating a population response that must be “decoded” by the brain to estimate the parameters of that stimulus. Most decoding models have suggested complex neural circuits that compute optimal estimates of sensory parameters on the basis of responses in many sensory neurons. We propose a slightly suboptimal but practically simpler decoder. Decoding neurons integrate their inputs across 100 ms, incoming spikes are weighted by the preferred stimulus of the neuron of origin, and a local, cellular nonlinearity approximates divisive normalization without dividing explicitly. The suboptimal decoder includes two simplifying approximations. It uses estimates of firing rate across the population rather than computing the total population response, and it implements divisive normalization with local cellular mechanisms of single neurons rather than more complicated neural circuit mechanisms. When applied to the practical problem of estimating target speed from a realistic simulation of the population response in extrastriate visual area MT, the suboptimal decoder has almost the same accuracy and precision as traditional decoding models. It succeeds in predicting the precision and imprecision of motor behavior using a suboptimal decoding computation because it adds only a small amount of imprecision to the code for target speed in MT, which is itself imprecise.
Introduction
In the cerebral cortex, information about a sensory stimulus or an impending movement is represented by the profile of activity across a population of neurons. Cortical neurons tend to be tuned for specific properties of a stimulus or movement, responding most for a preferred stimulus or movement and less in relation to how much the stimulus or movement differs from the preferred one. Because the exact stimulus or movement is encoded by which neurons are the most active, rather than the level of activity in the full population, we call the cortical representation a “place code.” For motor activity and perception, the place code present in any cortical area is converted, either explicitly or implicitly, to a “rate code,” in which properties of the sensory stimulus or motor act are represented in terms of the firing rate of individual neurons (Groh, 2001). Sensorimotor integration provides examples of explicit conversions from place to rate codes, for example, the conversion from the place code in the superior colliculus into a rate code for the generation of saccadic eye movements (Lee et al., 1988) or from the place code for visual motion in visual area MT into the cerebellar rate code that drives the initiation of smooth pursuit eye movements (Lisberger and Fuchs, 1978; Churchland and Lisberger, 2001).
Here, we ask how the brain decodes information in a place code to estimate a specific parameter of a sensory stimulus. Most of the decoding computations proposed in previous papers—such as “vector averaging” (Churchland and Lisberger, 2001), “maximum likelihood” (Jazayeri and Movshon, 2006), and more complicated statistical methods (Beck et al., 2008)—are motivated by considerations such as algorithmic elegance or theoretical optimality. They indicate how decoding could be performed in the brain. Our goal is to turn the conversation to implementation. Although it is possible that the brain extracts estimates optimally (Salinas and Abbott, 1994; Jazayeri and Movshon, 2006; Beck et al., 2008), our premise is that optimal computations may require neural circuits that are large and complex, posing a burden for biological implementation.
Divisive normalization is a special case of this broader concern. Normalization has been used to account for many neural responses (Heeger, 1992; Simoncelli and Heeger, 1998; Bonin et al., 2005; Zoccolan et al., 2005; Solomon et al., 2006) but has been implemented using extensive neural circuitry to sum the population response and an as yet unproven mechanism to accomplish division (Groh, 2001; Chance et al., 2002). In the present paper, we propose a decoding model that is qualitatively different from previous models and decidedly suboptimal, yet works well. It estimates the parameters of a sensory stimulus from the occurrence of a few spikes in each of many noisy sensory neurons bases normalization on an imperfect estimate of the total population activity and implements division through a local, cellular mechanism of gain modulation.
Materials and Methods
Much of work here is analytical. We sketch the derivations in Results and fill in details in Appendix. For the part that is based on simulations, we simulated the response of a model MT population with parameters adapted from Huang and Lisberger (2009). For target motion at speed S (in degrees per second), the mean response μk of the kth cell (in number of spikes) is predicted by its tuning curve: where M = 100 spikes/s is the maximum mean cell response (firing rate), and T = 0.1 s is the length of the time window under consideration. The n model neurons have a Fano factor equal to unity so that the variance of the response across trials is equal to its mean. The correlation Ckl between the output of the kth and lth cells is required to satisfy the following: where r = 0.36 is the maximum pairwise correlation among cells. Cells with similar speed tuning have larger noise correlations, which fall off with “distance” constant αλ, expressed in terms of the full range of the cell tunings (λ = log2Sn − log2S1); typically, we use α = 0.3.
To impose the noise and correlation structure defined above, we computed the Cholesky decomposition D of the correlation matrix C. We defined ρk as a vector of n = 1600 pseudorandom numbers, normal, centered at zero and with unity variance, and we defined the noise so that the respective spike counts of the cells in the time window T are Nk = μk + νk. Each Nk was rounded to the nearest non-negative integer, although we confirmed in simulations that allowing νk to be continuous and/or negative does not materially affect our conclusions. This numerical method generates spike count noise as though drawn from a multivariate normal distribution satisfying the empirically measured statistical properties of MT responses (Huang and Lisberger, 2009): each unit has noise variance equal to its mean firing rate, and pairwise noise correlations are given by the correlation matrix C.
Once we calculated the spike counts Nk, we simulated the spiking activity of each model neuron as an independent Poisson process, with the constraint that the calculated number of events occurs within the time window. Because the decoder looks at interspike intervals (ISIs) of its full input spike train to a decoder rather than the input from any individual sensory neuron, the effect of adding an absolutely refractory period of 2 ms to the sensory cells should be negligible. We verified this prediction in simulations.
To decode the population response with maximum likelihood estimation, we calculated the likelihood L(S′|Nk) of target motion at speed S′ given spike counts Nk and assuming a uniform prior for speed. Under the assumption of a multivariate normal distribution, the likelihood is proportional to the following:
where μk(S′) are the expected mean firing rates given target motion at speed S′, Σ(S′) = Δ(S′)CΔ(S′) defines the expected covariance matrix, and Δ(S′) is the diagonal matrix with terms
Results
Population decoding by vector averaging of firing rates
Consider a population of n neurons (called “sensory cells”) that are tuned for the value of a stimulus variable x that has the true value x = X. The kth cell, tuned to a value x = Xk, has a response quantified as rk = r(X;Xk); for example, r(X;Xk) might a Gaussian function centered at Xk. As diagrammed in Figure 1, for an example population of n = 3 neurons that are tuned for sensory variable x, the brain must use the responses of multiple sensory neurons to decode an estimate X′ that is reliably related to X.
The estimates of X by the brain have variation that we can estimate by making repeated measurements of perceptual or motor behaviors that are driven by those estimates. In cases in which we know that the behavioral variation arises from noise or errors in sensory processing (Osborne et al., 2005), the decoded estimates in our models should have means and variances of X′ that match the measurements from our data. If the Xk tile the discernable range for x in a reasonably regular way, then it is intuitive to think of the population response as a probability density function (Ma et al., 2006). A simple and, under some assumptions, optimal estimator of X is provided by (Salinas and Abbott, 1994) the following: Equation 6 gives, by definition, the center of mass of a population response and is often called its “vector average.” The vector average is one of many equations that can estimate sensory parameters from a population response (Seung and Sompolinsky, 1993), and we choose it as a starting point because (1) it has been successful in accounting for the relationship between sensory population responses and motor behavior in the system we study (Churchland and Lisberger, 2001; Huang and Lisberger, 2009; Yang and Lisberger, 2009) and (2) it is a relatively simple and generic way to estimate a stimulus parameter from such a population. We regard vector averaging not as something the brain necessarily does but rather as a tool that we use for reading out the mean and variance of the representation of a sensory parameter in a population response. Thus, our goal is not to find a neural implementation of vector averaging per se but rather to find a neutrally plausible decoding implementation that has the same biologically realistic performance as does vector averaging.
If the magnitude of the overall population response Σkrk is known and constant across all stimulus conditions, then Equation 6 has a straightforward interpretation as a neural circuit model (Groh, 2001). The quantity in the numerator can be computed by using static synaptic weights proportional to the preferred stimulus Xk of each neuron to create the weighted sum of the population response (Fig. 1). If the denominator is the same for all stimuli, then the sum would simply be proportional to X and divisive normalization would be unnecessary. In practice, however, Σkrk is a dynamic quantity of the population response: for example, the overall response level of motion-tuned neurons in visual area MT is sensitive to other parameters of target motion such as stimulus form and contrast (Sclar et al., 1990; Krekelberg et al., 2006; Priebe et al., 2006). Decoding must include a mechanism to render the estimate of X independent of the summed activity of the population response, for example via the explicit divisive normalization in Equation 6.
Population decoding based on spike trains
One of the challenges of population decoding is to estimate the parameters of a sensory stimulus in a short temporal interval from the spike trains of many neurons. We start by creating a structure for doing so, keeping in mind the 100 ms interval of sensory inputs that drive pursuit eye movements (Lisberger and Westbrook, 1985). In that interval, the relevant sensory neurons in area MT will discharge just a few spikes each (Osborne et al., 2004).
Formally, the kth sensory cell in a population response has a spike train bk = (tk,1, tk,2, tk,3, … tk,Nk), defined as the ordered sequence of times at which the cell generated spikes between times 0 and T. Within the relevant time window T, the kth cell generates Nk spikes so that its firing rate rk is estimated by Nk/T.
From the perspective of a decoding neuron such as the one illustrated in Figure 1, we can merge the incoming spike trains bk from the k sensory neurons and sort them by their arrival time at the decoding neuron (Fig. 2) to compile the input stream to the decoder: B = [(x1,t1),(x2,t2),(x3,t3), … (xN,tN)]. This is a formal description of the full population response converging onto the decoder: each pair (xj, tj) corresponds to one input spike that occurred at time tj in the kth neuron, which has preferred stimulus Xk that becomes the value of xj. Pairs are time ordered (i.e., tj+1 > tj). The total number of spikes across all sensory cells over the full time window of length T is given by N = Σk = 1nNk.
Now, we can translate Equation 6 from a sum over sensory cells, indexed by k, into a sum over the input spikes to a decoder unit, indexed by j (detailed derivation in Appendix): Normalization for different population response amplitudes is provided by counting the number of spikes N in the input stream B.
Our next step is to estimate N from the intervals between spikes in the sequence Β. We define Δtj ≡ tj − tj − 1 (and t0 ≡ 0) and make the reasonable assumption for a sufficiently dense population input that the last spike occurs at time tN near the end of the time window (i.e., T ≈ tN). Then, we obtain N ≈ T/<Δtj>j, where the angle brackets denote the mean over all input spikes j. This expression is related to the statement that the average interspike interval Δtj is simply the length of the time window divided by the number of spikes. Now, Equation 7 becomes The estimator in Equation 8 is equivalent to Equation 6. However, it yields a different interpretation as a model: it is based on weighting incoming spikes rather than estimating firing rates. The decoder is an accumulator: each spike input (j) causes an increment in the decoder output according to the preferred stimulus or “label” on the incoming axon. The assumption of accumulation, or temporal integration, is reasonable when the time intervals used to estimate stimulus properties are short, as they are for pursuit eye movements in primates (∼100 ms). Neurons expressing much longer temporal integrations have been discussed in relation to a number of other neural functions (Shadlen and Newsome, 2001; Aksay et al., 2003).
Normalization without division
In population decoding, normalization compensates for dynamic population response amplitudes. Equation 8 shows that normalization can be accomplished by using the mean interspike interval in a merged spike train seen by a decoder as a proxy for the overall population response N. The value of <Δtj>j estimates the total activity in the neural population, even as it is an exact assessment of the total input to a decoding neuron.
If a decoding neuron went one step further by estimating <Δtj>j and using that estimate to set the gain of neural processing in a biological way, then it would be possible to normalize for population response amplitude without either summing the full population activity or dividing. We suggest that such an approach would be much easier to implement with neurons, and we show later that estimating population activity in this way does not add very much noise to the decoded estimate X′, given the correlated noise already present in the population activity.
Suppose the overall gain of a decoding cell has a response nonlinearity that depends on the duration of the time interval (Δtj) between the jth and (j − 1)th spike in the incoming spike train B, without regard for the axon of origin of the two spikes. The response of the decoding neuron to a spike in a sensory neuron would be Δtjxj and would accumulate the quantity In Equation 9, the effect of each incoming spike depends on (1) a static gain determined by the preferred speed of its neuron of origin, perhaps implemented as a synaptic strength, and (2) a dynamic gain factor that is related to the interval since the previous incoming spike. Because the dynamic gain factor depends on the most recent interspike interval in the merged input spike train, it estimates the total response of the population and can be used for normalization in place of the actual total population response. We show in Appendix that Equation 9 differs from Equations 6 and 8 by a term that averages to zero under reasonable assumptions. Therefore, Equation 9 is an unbiased estimator of the actual value of the stimulus variable, X.
A hypothetical decoding unit based on Equation 9 would respond to each input spike with a gain that is proportional to the time Δtj since the prior spike in the collection of all the spikes in its inputs. It would normalize automatically the population response that serves as its input; its output would be related to the preferred stimulus parameter for the most active neurons in the population response. Stochastic behavior in firing across a large population tends to create a Δtj distribution sharply peaked at zero (Fig. 3, dashed curve); in practice, it would be sufficient for a Δtj-dependent response gain to be continuous and increasing for the range of interspike intervals that dominate the probability distribution of the Δtj for the aggregated inputs to the decoding cell. To take into account the realities of a biological system, we have allowed the Δtj-dependent gain to saturate (Fig. 3, solid curve).
If there were only a single decoding neuron with inputs from many sensory cells, then the decoder would need to operate on a very fast timescale to be able to discern differences between very short interspike intervals in the total incoming spike train. If, however, there were m decoding units, each receiving inputs from a small subset of the sensory cells, then the gain function could be based on longer interspike intervals. Once the place code of a population response is translated into noisy rate codes in a group of decoding units, the computationally simple step of averaging across decoding units could provide a better rate code. Furthermore, the number of decoding neurons could be “chosen” to keep the number of inputs to each decoding unit in a range that is on the rising phase of the Δtj nonlinearity. The decoding stage also could be modularized for other reasons, such as robustness or redundancy.
A specific implementation
As a specific example of the general problem of sensory population decoding, we use smooth pursuit eye movements. Sensory signals for pursuit arise in extrastriate visual area MT, in which neurons respond selectively to motion across the visual field and are tuned for target speed and direction (Maunsell and Van Essen, 1983). We apply the decoder described in Equation 9 to a population of model neurons designed to mimic the tuning of MT neurons for the speed of visual motion. Our example is chosen to elucidate how a decoder of target direction and speed could be implemented to convert the population response in MT into estimates of target speed with means and variances defined by measurements of the pursuit response (Osborne et al., 2005, 2007). Pursuit latency is on the order of 100 ms with respect to the initiation of target motion, implying that an estimate of target velocity is made from the population response within a time window of that duration.
Our model MT population is explicitly constructed to mimic statistical features—both at the level of single cells and noise correlations—of the population response measured experimentally (Maunsell and Van Essen, 1983; Osborne et al., 2004; Huang and Lisberger, 2009). It consists of n = 1600 model neurons, each with Gaussian tuning as a function of the logarithm of the speed S (measured in degrees per second) of a moving visual target. The mean response of each model neuron is normal in log2S with a width of 1.45 log units and a peak response of 100 spikes/s at log2Sk; the Sk are spaced uniformly on a log2 scale between S1 = 0.1°/s and S1600 = 512°/s; some example tuning curves are shown in Figure 4B. The variance of the spike count of each model sensory neuron is constrained to replicate the variance found empirically, yielding a population response like that in Figure 4C. The variance in the model MT population also is constrained to have the correlation structure found empirically (Huang and Lisberger, 2009); in particular, the noise correlation between two cells is a function of the square of the difference between their preferred log speeds, peaking at 0.36 and falling exponentially with a “distance” constant that is 30% of the full (logarithmic) range tiled by the preferred speeds of sensory cells (i.e., 3.7 log units). The structure and amplitude of the neuron–neuron correlations, depicted in Figure 4A, are particularly important because they place an upper limit on the coding precision for target speed in the population response. Because of the correlations, the addition of more sensory cells after the first several hundred has negligible impact on the mean or variance of the decoded estimates of target speed. Thus our results are not sensitive to our specific choice for the size (n = 1600) of the model population. In reality, MT neurons are tuned for both speed and direction of motion. For simplicity, we only consider the speed component; our results generalize to the two dimensional case.
We simulated the response of the model MT population to 500 target motions at random speeds uniformly distributed between 2 and 64°/s. This corresponds approximately to the range of target motion velocities over which smooth pursuit is studied behaviorally and is within the range over which the distribution of the preferred speeds in our model population is sufficiently uniform that we can decode target speed directly, without nonlinear compensation for the fact that the population speed-tuning distribution and individual speed tuning are natively logarithmic (Lisberger and Movshon, 1999; Nover et al., 2005; Huang and Lisberger, 2009). For each target motion, we decoded the response of the population with the model we propose using ISI in Equation 9 and compare it with the true vector average shown by Equation 6 and a maximum likelihood decoder, which is asymptotically optimal. For the latter decoding model, we take the speed S′ of the pair (S′, M′) that maximizes the likelihood in Equation 5. We quantified the performance of each decoder via the fractional decoding errors [(S′ − S)/S]), where S′ is the decoded speed, and S is the actual target speed. We measured decoder bias as the mean fractional decoding error and decoder variation as the variance of the fractional decoding error.
Figure 5 shows that our model and the vector average decoding methods yield nearly identical answers in terms of both the bias and the variation of the estimates of target speed; the additional error introduced by our model is small. The top row of Figure 5 shows the similar performance of the three decoders in terms of decoded speed as a function of target speed. For all three decoders, the variation of decoded speed increases as a function of the value of the decoded speed, as found in pursuit (Osborne et al., 2005) and motor control in general (Harris and Wolpert, 1998).
In agreement with empirical data (Nover et al., 2005), the fractional decoding error of all the decoders, rather the decoding error per se, remains approximately constant as a function of target speed. Pooling the responses to similar target speeds revealed that the vector average and ISI decoders showed a similar small bias at low target speeds, indicated by the nonzero values of the mean fractional coding errors (Fig. 5, bottom row). This slight bias is an artifact not of the decoders but rather of reading out linear speed from a model MT population that natively codes speed logarithmically. The bias disappears, and the performance of the decoders improves to 14.0 and 14.1%, respectively, if we decode the logarithm of the target speed directly, a conversion that requires a nonlinear transformation. The maximum likelihood decoder performs the logarithmic conversion intrinsically.
The variance of the fractional decoding errors indicated that the maximum likelihood decoder produces slightly less variable estimates of target speed than do the other two models. Although the maximum likelihood decoder takes into account the neuron–neuron correlations in the model MT population, it cannot eliminate decoding error altogether, and we conclude that its 11.4% variance is an estimate of the error inherent in the population response itself. Vector averaging and our proposed model are slightly suboptimal, in that their variances (15.5 and 15.6%) are larger than those of the optimal decoder. Given the variation of the decoding error inherent in the population, the marginal decrease in performance for the two suboptimal decoders is relatively small. The size of the errors predicted by all three decoders is comparable with data from behavioral experiments (Osborne et al., 2005), meaning that measurements of pursuit variation alone cannot distinguish among possible decoding schemes.
We verified that the normalization method in our proposed decoder does, indeed, decode target speed in a way that is independent of the amplitude of the population response. For values of population response amplitude (M in Eq. 1) spanning from 12.5 to 100 spikes/s, the mean and variance of the fractional decoding error remain nearly constant (Fig. 6A), with only a slight improvement as the population response size grows. However, even for population response amplitudes several times smaller than values measured experimentally (Huang and Lisberger, 2009), the quality of decoding approaches its limit.
The ISI decoder also produces estimates of target speed that evolve rapidly in time and track either increases or decrease in target speed. For target speeds of either 20 or 40°/s, the speed estimate from both vector averaging and the ISI decoder can reach asymptotic performance within <2 ms (Fig. 6B) and after <100 spikes (Fig. 6C) have been produced by the population of model sensory neurons. If the model decoder neurons are implemented as “integrate-and-fire” units, then they also track changes in target speed on a timescale determined by the interspike intervals of the decoding neuron. The rapid response of the ISI decoder with instantaneous changes in the normalizing gain nonlinearity points out that decoder output would respond more slowly in proportion to the presence of any longer temporal dynamics within the nonlinearity.
Our results did not depend materially on the parameters of the model population, as long as it retained stochastic spike counts in short time intervals. The number m of independent decoding units was unimportant, as were the characteristics of the saturation (if any) of the gain modulation by Δt (Fig. 3), as long as each decoding unit received most of its input with Δtj in the nonsaturation range. The model also worked when we implemented decoding units as simulated integrate-and-fire neurons.
Figure 7 documents the similar effects on the decoding performance of all three models we tested of noise correlation parameters R and α in Equation 2, which determine the peak correlation amplitude and the decay of the correlation across the population. For the ISI decoder, vector averaging, and maximum likelihood, increasing the length constant for the decay of correlations decreased the fractional decoding error, whereas increasing the maximum correlation increased the fractional decoding error, as reported previously (Huang and Lisberger, 2009; Lisberger, 2010). All three decoders show similar performance for the parameters of the model that best reproduced the structure of noise correlations across the MT population response (black circle in all three graphs). In the biologically unrealistic regimen of very high peak noise correlations (in which R approaches unity), the difference in performance between maximum likelihood and the other two decoding algorithms becomes more pronounced, although all three decoders perform poorly under these conditions.
Finally, we verified that the ISI decoding algorithm is robust to possible non-uniformities in the distribution of the preferred speeds of sensory cells. Our generic model population contained 1600 sensory cells with preferred speeds spaced evenly along a logarithmic scale between S1 = 0.1°/s and S1600 = 512°/s. To test the effects of disrupting the uniformity, we jittered the preferred speeds by choosing them according to log SJ,k = log(Sk) + ZJ,k, where the ZJ,k are randomly drawn from a normal distribution with a mean of zero and a width of JΔ(log S). Thus, J is a measure of the deviation from even spacing of the distribution of the preferred speeds of sensory cells. Figure 8 shows that the variation of the decoding estimates was independent of the magnitude of the jitter in all three decoding models. The mean errors of the estimates of target speed retain the small deviations from zero illustrated in Figure 5D–F for all three decoding models.
We conclude that the imprecision inherent in encoding information in the finite spike counts of a correlated neuron population can be much larger than additional sources of error that are introduced by simple population decoders. As a result, suboptimal decoders add relatively little estimation error above that inherent in the population and have performance that mimics biological observations to a satisfactory degree.
Discussion
It is common to assess neural processing models in terms of optimality in an information-theoretical sense. We have outlined a different approach, one that we think is more plausible in terms of implementation with neurons. Previous work has shown that the imprecision inherent in a population code of neurons with noise correlations is passed all the way to behavioral outputs, for both perception (Zohary et al., 1994; Bair et al., 2001) and action (Osborne et al., 2005; Huang and Lisberger, 2009). We propose that the inexorable, correlated noise in the sensory encoding creates inescapable imprecision in estimates of sensory parameters. This allows decoding mechanisms to be suboptimal in an information processing sense and permits increased efficiency of implementation without compromising behavioral precision. In the present paper, we have outlined our proposal analytically and demonstrated its successes through simulation of a specific example in which a sensory place code is known to drive movement after transformation into a rate code. Perhaps the brain, in decoding population responses, does not act as an ideal observer. Rather than striving for optimal information processing, it may settle for “good enough.”
Our proposed decoding mechanism provides a numerical example of a more general principle that non-optimal processing may be sufficient for a system already beset by a nontrivial amount of correlated noise. The principle of operation of our model is to estimate neural parameters rather than computing them. Instead of assuming knowledge of firing rate, it estimates firing rate from the Poisson spike trains of the sensory neurons. Instead of computing the total population response for divisive normalization, it estimates the total population response in real time from the interspike interval in the net incoming spike train to a decoding neuron. Instead of dividing, it uses a local, nonlinear modulation of neural gain. The philosophy of performing computations locally is shared by a previous model of contrast-invariant orientation tuning (Kayser et al., 2001).
Implementation of our model relies on three biologically plausible functions. (1) Inputs to decoding units are weighted according to the preferred speed of their neuron of origin. Weighting according to the preferred speed of the input neuron has been an accepted feature of population vector decoding (Salinas and Abbott, 1994; Churchland and Lisberger, 2001) and could be implemented by setting synaptic strengths. (2) Decoding units are integrators. Integrator neurons or circuits have been postulated to account for the oculomotor velocity-to-position integrator in the brainstem (Aksay et al., 2003) and the ramp up to threshold of neurons in the lateral intraparietal area (Shadlen and Newsome, 2001). The duration of temporal integration in our model is much shorter than in the previous models. (3) Decoding units are rendered insensitive to the overall response amplitude across the population through a local, nonlinear modulation of the gain of each input according to the recent history of spikes. The nonlinear gain modulation we propose suggests a postsynaptic analog to short-term synaptic depression (Chance et al., 1998). It depends, however, on the total spike train seen by the postsynaptic neuron rather than on the spikes of any individual presynaptic axon. The cellular mechanism we propose has not been demonstrated in analyses of neural electrophysiology but could be compatible with input nonlinearities that arise in dendritic spike attenuation (Polsky et al., 2004), input attenuation resulting from depletion of extracellular resources (Montague, 1996), or other local, mutually suppressive interactions between synaptic inputs. Evidence for time-dependent sublinear input summation has been observed in pyramidal cells (Urban and Barrionuevo, 1998).
The use of a cellular mechanism to implement divisive normalization has the advantage of greatly reducing the amount of neural circuitry needed to render the decoded estimates of sensory parameters insensitive to the amplitude of the population response. Previous models have implemented divisive normalization by accumulating the total activity across the population response and then using those signals as the input to division on another group of neurons. In principle, division could be implemented to some degree by shunting inhibition (Carandini et al., 1997) or the interplay of noisy excitatory and inhibitory inputs (Chance et al., 2002). In our formulation, a separate set of “normalizing neurons” could estimate the total population activity by using the postsynaptic gain mechanism in our model while sampling from a population of model sensory neurons. As long as the sampling is sparse, the estimate could be achieved with relatively little circuitry and then used to perform divisive normalization with the mechanisms already suggested by others (Carandini et al., 1997; Groh, 2001; Chance et al., 2002). Of course, the cellular mechanism we propose for normalization would be applicable only when normalization is controlled only by the same inputs that drive the responses of a neuron. Thus, either a different kind of cellular mechanism or a network normalization mechanism would be needed to explain phenomena such as contrast normalization from outside the receptive field in the primary visual cortex (Geisler and Albrecht, 1992). In addition, the specific mechanism we suggest is just one of many possible cellular mechanisms of normalization that would have similar properties.
A cellular mechanism of divisive normalization would allow a neuron to normalize its inputs automatically. Divisive normalization also is one of the key elements in Heeger's original model (Heeger, 1993) of the generation of motion sensitive responses in primary visual cortex V1 and of other accounts of cortical responses (Simoncelli and Heeger, 1998; Bonin et al., 2005; Zoccolan et al., 2005; Solomon et al., 2006). It is also a common ingredient in neural networks (Deneve et al., 1999). Furthermore, a cellular mechanism of self-normalization might be adaptively regulated on a neuron-by-neuron basis according to sensible rules of plasticity. Our model shows the utility of the concept that important neural computations might be handled by local cellular mechanisms rather than by neural circuits. Indeed, the solution by the brain to a specific problem such as population decoding may be very different from the mechanisms implied by even the most intuitively appealing equations. Vector averaging and the other extant decoding computations are perhaps at best metaphors for what happens in the brain.
Our decoding model also has the feature that, like vector averaging, it is agnostic to many of the specific details of the population response it is decoding. It has a simple implementation but ignores features such as shapes of the tuning curves of individual sensory cells and the correlation structure within the population. Although these ignored features could, in principle, be exploited to improve the performance of a decoder (Latham and Nirenberg, 2005; Jazayeri and Movshon, 2006; Beck et al., 2008), recent results show that, at least in MT, tuning properties and neuron–neuron correlations can be dynamic (Cohen and Newsome, 2008; Yang and Lisberger, 2009). Any decoding algorithm seeking to extract information optimally must be able to update its data about neural response properties on a fast timescale, lest it be optimized for only an arbitrary subset of conditions. A maximum likelihood decoder with incorrect information about the correlation structure of its inputs can perform worse than a naive vector averaging model or the model we propose.
The mechanistic simplifications in our model allow sensory decoding to proceed with a relatively small number of decoding neurons and synaptic connections. In principle, it would be possible to apply the same simplifications to other decoding approaches, such as some maximum likelihood (Jazayeri and Movshon, 2006) or Bayesian (Beck et al., 2008) approaches. However, the need to do so by reducing the number of neurons and synapses would make their estimates suboptimal, contradicting the philosophy of optimal estimation on which the models are based. The point of our model is perhaps not so much embodied in the exact implementation but rather in the demonstration that compact, suboptimal neuron circuitry can perform well enough to account for neural data. Simplifications of the decoding computation would work because the precision of sensory estimates is limited by the presence of substantial correlated noise in the sensory population response. In the system we study, smooth pursuit eye movements, sensory coding, and behavioral performance both are decidedly non-optimal, and these observations are linked (Osborne et al., 2005, 2007; Medina and Lisberger, 2007; Huang and Lisberger, 2009; Osborne and Lisberger, 2009; Lisberger, 2010). This allows sensory decoding to be suboptimal without compromising behavioral performance much further. We imagine that the same is true for many, although possibly not all, neural systems.
The organization of computations in the brain must be constrained by the limitations of computational properties of neurons and neural circuits and by the need to perform complex functions reliably with minimal elements to keep the volume and energy use of the brain manageable (Montague, 1996). Especially in situations in which the sensory code is already suboptimal, there may be a tradeoff between the constraints of implementation and the drive for optimal performance. A small decrement in optimality of information processing could allow a large improvement in the efficiency of implementation. To accomplish this efficiency, we imagine that the brain might co-opt any one of a plethora of possible cellular mechanisms to perform complex computations without using a ton of neural wire.
Appendix
Derivation of the decoding computation
In Results, we omitted some details about how to translate the vector average defined by Equation 1 from a sum over sensory cells (indexed by k) into a sum over population spikes (indexed by j). Those details are presented here.
We use the observation that the firing rate of each sensory neuron rk is defined as Nk/T and multiply the top and bottom of Equation 6 by T: Recognizing that the denominator in Equation 10 is simply the total number of input spikes N and that summing over the preferred stimuli for the input neurons in the numerator is equivalent to summing the labels retained in the spike train B that describes the total input to the decoding neuron, we end up with Equation 7:
Inaccuracies in estimates of sensory parameters
The use of interspike intervals to estimate the total population response has the potential to introduce estimation errors. Here, we show that Equation 9 differs from Equations 6 and 8 by a term that averages to zero, under reasonable assumptions. Using the individual values of Δtj in the estimator rather than the average <Δtj>j means that each term in the summand of Equation 9 contains a term attributable to the residual δtj ≡ Δtj − <Δtj′>j′. If we substitute into Equation 9, we obtain the following:
This estimator is equivalent to that in Equations 6 and 8, up to the extra
For the specific example of estimating target speed from a model MT population response, the imprecision of the true vector average is limited by the mismatch between spike counts in a finite time window and underlying firing rates in a finite time window and by the neuron–neuron correlations in the population noise. Given the noise correlations found in MT neurons, the error in the estimates of target speed from Equation 1 asymptotes at approximately the ±11.4% variance seen in Figure 5; this error is inherent in the population response itself (Huang and Lisberger, 2009). Thus, approximation of the vector average using Equation 9 differs from the vector average by δX′ (Eq. 13). The additional contribution to the error from δX′ scales as 1/
Footnotes
This research was supported by National Institutes of Health Grant EY017210, including an American Recovery and Reinvestment Act supplement, the Howard Hughes Medical Institute, and the Swartz Foundation. We thank Philip Sabes for helpful discussions and Loren Frank, Michael Shadlen, and the members of the Lisberger laboratory for comments on a previous version of this manuscript.
- Correspondence should be addressed to Kris Chaisanguanthum, Department of Physiology, Box 0444, University of California, San Francisco, 513 Parnassus Avenue, Room HSE-802A, San Francisco, CA 94143-0444. chaisang{at}phy.ucsf.edu